Finally, we want to deploy the model to an endpoint in Vertex AI. If you use a TensorFlow model, as in the codelab tutorial, this step is much easier because you can serve the model from a prebuilt container. For LightGBM, you must build the serving container yourself, guided by the Vertex AI requirements for prediction containers. This includes setting up a small web server. We use Flask here because it is well known and makes setting up a small web server like ours easy. There are plenty of alternatives, so make sure you review the available choices before deploying a prediction image to production.
Now let’s prepare our prediction container. First, enter
Assuming you’ve still set the variable $PROJECT_ID from the codelab in your terminal, we define the URI of our prediction image:
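The exact definition isn't shown here, but a plausible version looks like the following. The image name "mpg-lightgbm" and the tag are our own choices, not prescribed by Vertex AI:

```shell
# Hypothetical image URI; assumes $PROJECT_ID is already set as described above.
export IMAGE_URI="gcr.io/$PROJECT_ID/mpg-lightgbm:v1"
```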
If you're doing this tutorial in a region outside the US, you may want to modify "gcr.io" accordingly, e.g. "eu.gcr.io" for a European region. Now we build our prediction container and push it to Google's Container Registry:
docker build ./ -t $IMAGE_URI && docker push $IMAGE_URI
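The build assumes a Dockerfile in the current directory. We don't reproduce the original here, but a minimal sketch might look like this; the Python version, file names, and the contents of requirements.txt (roughly: flask, lightgbm, google-cloud-storage) are assumptions:

```dockerfile
FROM python:3.10-slim

WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY app.py .

# Vertex AI routes prediction traffic to the port in AIP_HTTP_PORT (8080 by default).
EXPOSE 8080
CMD ["python", "app.py"]
```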
The container we've just built contains the code for a complete web server. You can see the code in the file app.py. You may notice that we're using the Cloud Storage Python SDK here instead of Cloud Storage FUSE to download our model. This is necessary because, at the time of this writing, prediction images in Vertex AI don't have automatic Cloud Storage FUSE support, in contrast to training images.
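We don't reproduce app.py in full, but a minimal sketch of such a server might look like the following. The model file name "model.txt" is an assumption; the AIP_* environment variables are the ones Vertex AI sets for custom prediction containers:

```python
import os

from flask import Flask, jsonify, request

app = Flask(__name__)
_model = None


def _load_model():
    """Download the model from Cloud Storage on first use.

    Prediction images don't get automatic Cloud Storage FUSE support,
    so we fetch the artifact with the Python client library instead.
    """
    global _model
    if _model is None:
        import lightgbm as lgb
        from google.cloud import storage

        # Vertex AI passes the artifact URI (our --artifact-uri) in AIP_STORAGE_URI.
        bucket_name, _, prefix = (
            os.environ["AIP_STORAGE_URI"].removeprefix("gs://").partition("/")
        )
        blob_name = f"{prefix}/model.txt" if prefix else "model.txt"
        storage.Client().bucket(bucket_name).blob(blob_name).download_to_filename(
            "/tmp/model.txt"
        )
        _model = lgb.Booster(model_file="/tmp/model.txt")
    return _model


# Vertex AI sends health checks and prediction requests to these routes.
@app.route(os.environ.get("AIP_HEALTH_ROUTE", "/health"))
def health():
    return "ok", 200


@app.route(os.environ.get("AIP_PREDICT_ROUTE", "/predict"), methods=["POST"])
def predict():
    # Vertex AI wraps inputs as {"instances": [...]} and expects {"predictions": [...]}.
    instances = request.get_json()["instances"]
    predictions = _load_model().predict(instances)
    return jsonify({"predictions": predictions.tolist()})


if __name__ == "__main__":
    app.run(host="0.0.0.0", port=int(os.environ.get("AIP_HTTP_PORT", "8080")))
```

The lazy loading in _load_model keeps container startup fast and lets the health check succeed before the model has been downloaded.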
Now you can import your model into Vertex AI using either the Cloud Console or the following command:
gcloud ai models upload --region=us-central1 --display-name=mpg --container-image-uri=$IMAGE_URI --artifact-uri=gs://<your-model-bucket-name>
Make sure you replace "<your-model-bucket-name>" with the name of the bucket that contains your model. You may also want to replace "us-central1" if you're working in a different region.
Finally, you can deploy your model as described in step 6 of the codelab.
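Once the endpoint is up, you can send it a quick test request. The request body below is only a hypothetical placeholder: the nine feature values are dummies and must match the preprocessing your model was trained with:

```shell
# Hypothetical test request; replace the dummy feature values with real,
# correctly preprocessed inputs for your model.
cat > request.json <<'EOF'
{"instances": [[8, 390.0, 190.0, 3850.0, 8.5, 70, 0, 0, 1]]}
EOF
```

You can then send the request with `gcloud ai endpoints predict <your-endpoint-id> --region=us-central1 --json-request=request.json`, replacing "<your-endpoint-id>" with the id Vertex AI assigned to your endpoint.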