LightGBM on Vertex AI

On Google Cloud, Vertex AI is the MLOps platform. It is very flexible, and you can use basically any modelling framework you like. However, some frameworks are easier to use than others: TensorFlow, XGBoost and scikit-learn are supported by prebuilt images, which are very helpful. This blog post shows how you can train and deploy models built with a framework that has no prebuilt images. We will use a LightGBM model as an example, but the workflow can easily be transferred to any other modelling package.

Overview

Workflows in Vertex AI are heavily based on containers. For this blog post, we will need two containers, one for training and one for deployment. We will follow this Google codelab with some variations. The codelab shows you how to train and deploy a TensorFlow model using prebuilt containers, and we will modify it by using our own containers and code to get from TensorFlow to LightGBM.

Training a LightGBM model instead of TensorFlow

For the training part, start by going through steps 1 through 3 of the codelab without changes. In step 4, as soon as you have opened your terminal, enter the following command to download our modified code:


git clone https://github.com/Allgoerithm/lightgbm_vertex_ai.git

This command creates a directory lightgbm_vertex_ai with subdirectories training and prediction. Let’s carry on with the codelab and use the code we just downloaded for some modifications: When we containerize the training code, we need to replace the Dockerfile with our own file (located in lightgbm_vertex_ai/training/Dockerfile). And of course, the TensorFlow code for the training has to be replaced by LightGBM training code (see lightgbm_vertex_ai/training/trainer/train.py). Just as in the tutorial, you have to change BUCKET_NAME in the Python code to the name of the storage bucket you created for the model.

Observe that we must use a slightly different syntax for the bucket name here. When using TensorFlow as in the codelab, you can use the standard notation starting with “gs://”. When using LightGBM, this notation is not available. However, there is a convenient replacement: Just replace “gs://” with “/gcs/”, and everything works fine. Vertex AI uses Cloud Storage FUSE to mount Google Cloud Storage at “/gcs/”, so we can use Cloud Storage conveniently via the local file system without having to resort to the Google Cloud Python API.
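The path substitution described above can be sketched in a few lines of Python. This is only an illustration, not the code from train.py: the helper name gcs_fuse_path, the bucket name and the file name model.txt are assumptions made for this example.

```python
def gcs_fuse_path(uri: str) -> str:
    """Translate a gs:// URI into the matching Cloud Storage FUSE path."""
    prefix = "gs://"
    if not uri.startswith(prefix):
        raise ValueError(f"expected a gs:// URI, got {uri!r}")
    return "/gcs/" + uri[len(prefix):]


# Hypothetical bucket and file names, for illustration only.
BUCKET_NAME = "gs://your-model-bucket"
MODEL_PATH = gcs_fuse_path(BUCKET_NAME) + "/model.txt"
print(MODEL_PATH)

# Inside the training container, a trained LightGBM Booster could then be
# written like any local file, for example:
#     booster.save_model(MODEL_PATH)
```

The “/gcs/” path only exists inside the Vertex AI training container, so the save call is left as a comment here.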

To complete training, run step 5 of the tutorial. When you kick off the training job, choose “no prediction container” instead of “custom container”. Otherwise, perform step 5 as described in the codelab (just be prepared for the training time to be shorter than the 10 to 15 minutes announced in the tutorial).

Deploying the LightGBM model to a Vertex AI endpoint

Finally, we want to deploy to an endpoint in Vertex AI. When you use a TensorFlow model as in the codelab tutorial, this part is made much easier as you can use a prebuilt container for serving your model. In the case of LightGBM, you must build the container yourself, guided by the Vertex AI requirements for prediction containers. This includes setting up a small web server. We use Flask here, because it is well known and makes setting up a small web server like ours easy. There are lots of alternatives, so please make sure you check the available choices before deploying a prediction image to production.
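To make the container requirements a bit more concrete, here is a minimal sketch of the request and response shapes a prediction server has to handle. It is not the code from app.py, and it deliberately leaves out Flask so that it runs with the standard library only. Vertex AI passes the port and routes to the container in the environment variables AIP_HTTP_PORT, AIP_HEALTH_ROUTE and AIP_PREDICT_ROUTE; the fallback values, the function names handle_predict and dummy_predict, and the stand-in model are assumptions made for this example.

```python
import json
import os

# Vertex AI sets these variables in the serving container. The fallback
# values are assumptions so the sketch also runs locally.
PORT = int(os.environ.get("AIP_HTTP_PORT", "8080"))
HEALTH_ROUTE = os.environ.get("AIP_HEALTH_ROUTE", "/health")
PREDICT_ROUTE = os.environ.get("AIP_PREDICT_ROUTE", "/predict")


def handle_predict(body: str, predict_fn) -> str:
    """Turn a Vertex AI prediction request body into a response body."""
    payload = json.loads(body)
    instances = payload["instances"]  # one entry per row to score
    predictions = [float(p) for p in predict_fn(instances)]
    return json.dumps({"predictions": predictions})


def dummy_predict(instances):
    """Stand-in for a real model: returns the sum of each feature row."""
    return [sum(row) for row in instances]


request_body = json.dumps({"instances": [[1.0, 2.0], [3.0, 4.0]]})
print(handle_predict(request_body, dummy_predict))
```

In a Flask app, a function like handle_predict would be called from the view registered for PREDICT_ROUTE, with the predict method of the loaded LightGBM model in place of dummy_predict. The view for HEALTH_ROUTE only needs to return a 200 status.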

Now let’s prepare our prediction container. First, enter

cd ~/lightgbm_vertex_ai/prediction/

Assuming you’ve still set the variable $PROJECT_ID from the codelab in your terminal, we define the URI of our prediction image:

IMAGE_URI="gcr.io/$PROJECT_ID/mpg-prediction:v1"

You may want to modify “gcr.io” if you want to do this tutorial in a region outside the US, e.g. “eu.gcr.io” if you’re using a European region. Now we build our prediction container and push it to Google’s container registry:

docker build ./ -t $IMAGE_URI && docker push $IMAGE_URI

The container we’ve just built contains the code for a complete webserver. You can see the code in the file app.py. You may notice that we’re using the Python SDK here instead of Cloud Storage FUSE to download our model. This is necessary because at the time of this writing, prediction images in Vertex AI don’t have automatic Cloud Storage FUSE support, in contrast to training.
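As a rough sketch of what downloading the model with the Python SDK can look like (again, not the actual code from app.py): Vertex AI exposes the location of the model artifacts to the prediction container in the environment variable AIP_STORAGE_URI, and the google-cloud-storage client can copy a file from there to local disk. The helper names, the file name model.txt and the destination path are assumptions made for this example.

```python
import os
from typing import Tuple


def split_gcs_uri(uri: str) -> Tuple[str, str]:
    """Split gs://bucket/some/prefix into (bucket, 'some/prefix')."""
    scheme = "gs://"
    if not uri.startswith(scheme):
        raise ValueError(f"expected a gs:// URI, got {uri!r}")
    bucket, _, prefix = uri[len(scheme):].partition("/")
    return bucket, prefix.strip("/")


def download_model(artifact_uri: str,
                   filename: str = "model.txt",       # assumed file name
                   dest: str = "/tmp/model.txt") -> str:
    """Copy the model file from Cloud Storage to the local file system."""
    # Imported lazily so the sketch can be read and run without the package.
    from google.cloud import storage

    bucket_name, prefix = split_gcs_uri(artifact_uri)
    blob_name = f"{prefix}/{filename}" if prefix else filename
    client = storage.Client()
    client.bucket(bucket_name).blob(blob_name).download_to_filename(dest)
    return dest


# Inside the prediction container this would typically be called as:
#     local_path = download_model(os.environ["AIP_STORAGE_URI"])
print(split_gcs_uri("gs://your-model-bucket/mpg/"))
```

Once the file is on local disk, it can be loaded with LightGBM like any other model file.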

Now you can import your model to Vertex AI using either the cloud console or the following command:

gcloud ai models upload \
  --region=us-central1 \
  --display-name=mpg \
  --container-image-uri=$IMAGE_URI \
  --artifact-uri=gs://<your-model-bucket-name>

Make sure you replace “<your-model-bucket-name>” with the name of the bucket that contains your model. You may also want to replace “us-central1” if you’re working in a different region.
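If you prefer to stay in Python, the same import can probably be done with the Vertex AI SDK (package google-cloud-aiplatform). The following is a sketch under that assumption, mirroring the flags of the gcloud command above; the helper names and the placeholder project, image and bucket values are made up for this example.

```python
def build_upload_args(image_uri: str, bucket_name: str,
                      display_name: str = "mpg") -> dict:
    """Collect the keyword arguments that mirror the gcloud flags above."""
    return {
        "display_name": display_name,
        "serving_container_image_uri": image_uri,
        "artifact_uri": f"gs://{bucket_name}",
    }


def upload_model(project: str, region: str, image_uri: str, bucket_name: str):
    """Import the model into Vertex AI (needs credentials and the SDK)."""
    # Imported lazily so the sketch can be read and run without the package.
    from google.cloud import aiplatform

    aiplatform.init(project=project, location=region)
    return aiplatform.Model.upload(**build_upload_args(image_uri, bucket_name))


# Placeholder values, for illustration only.
print(build_upload_args("gcr.io/your-project/mpg-prediction:v1",
                        "your-model-bucket"))
```

Calling upload_model with your own project, region, image URI and bucket name performs the actual import; the print statement only shows the arguments that would be sent.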

Finally, you can deploy your model as described in step 6 of the codelab.

Congrats! Your first LightGBM model is now being served on Vertex AI. Don’t forget to clean up any artifacts you created in this tutorial (cf. also step 7 of the codelab).

Now that you’ve trained and deployed a model, you’re flexible enough to use Vertex AI with any modelling framework you like. The next step is to embed training and deployment into an automated workflow. Have a look at my colleague’s great blog posts to see how to do that!