Quickstart

In this quick walkthrough, we will train a machine learning model that predicts which passengers survived the Titanic disaster and deploy it for real-time inference. We will use the well-known Kaggle Titanic dataset as our example.

note

Layer only supports Linux and macOS systems. If you are a Windows user, we recommend using Windows Subsystem for Linux (WSL).

Layer SDK might work if you run PowerShell as an Administrator, but Layer does not guarantee support for it.

Prerequisites

  • An internet connection
  • A Linux or macOS system to run the Layer SDK on
  • A Safari 12+ or Chrome 70+ web browser to run the Layer app
  • Python version 3.8 installed
  • pip installed
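
The Python and pip prerequisites can be checked from Python itself. The following is a minimal sketch, not part of the Layer SDK: `check_prerequisites` is a hypothetical helper, and it assumes that "Python version 3.8" means the interpreter's major and minor version are exactly 3.8.

```python
import importlib.util
import platform
import sys


def check_prerequisites(required=(3, 8)):
    """Return a dict describing whether this interpreter meets the listed prerequisites."""
    current = sys.version_info[:2]
    return {
        # Linux and macOS report "Linux" and "Darwin" respectively.
        "supported_os": platform.system() in ("Linux", "Darwin"),
        "python_version": "%d.%d" % current,
        # Assumption: the required version must match exactly.
        "python_matches": current == tuple(required),
        # find_spec returns None when the pip module cannot be found.
        "pip_available": importlib.util.find_spec("pip") is not None,
    }


if __name__ == "__main__":
    for name, value in check_prerequisites().items():
        print(f"{name}: {value}")
```

Any entry reported as False points to a prerequisite worth revisiting before installing the SDK.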

Install Layer SDK

The Layer SDK is a Python package distributed on PyPI and can be installed with pip. We recommend installing it in a virtual environment.

  1. Install the Layer SDK.
pip install layer-sdk

If you have permission problems, then try:

pip install --user layer-sdk
  2. Check your installation. In the command line, enter layer --help.

    If the Layer help is displayed, then installation was successful.
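
The installation check can also be scripted. This is a rough sketch under stated assumptions: `cli_responds` is a hypothetical helper, and it assumes that `layer --help` exits with status 0 when the SDK is installed correctly, which is typical for command-line tools but is not stated by the Layer documentation.

```python
import shutil
import subprocess


def cli_responds(command, timeout=30):
    """Return True if `command` (a list of arguments) runs and exits with status 0."""
    if shutil.which(command[0]) is None:
        return False  # the executable is not on PATH
    try:
        completed = subprocess.run(
            command,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    # Assumption: a zero exit status means the help text was displayed.
    return completed.returncode == 0


if __name__ == "__main__":
    print("layer CLI available:", cli_responds(["layer", "--help"]))
```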

Log in to Layer

You need to log in before running any Layer SDK commands. Enter the following command in the command line.

layer login

A browser window opens at https://beta.layer.co so you can sign in. If you don't have a Layer account yet, you can create one now.

Check out the Titanic project

Use the Layer SDK to clone the Titanic example we've created for you.

layer clone https://github.com/layerml/examples
cd examples/titanic

If you want to learn more about example projects, refer to Layer example projects.

note

If you have trouble with layer clone, then you can clone the Layer example repository with Git and then cd into that directory in the terminal.
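
The Git fallback described in this note could be scripted roughly as follows. This is only a sketch: `git_clone_command` and `clone_titanic_example` are hypothetical helper names, and it assumes Git is installed and the repository is cloned into a local directory named `examples`.

```python
import subprocess
from pathlib import Path

# Repository URL taken from the layer clone step above.
EXAMPLES_REPO = "https://github.com/layerml/examples"


def git_clone_command(repo_url=EXAMPLES_REPO, destination="examples"):
    """Return the argument list for a plain `git clone` of the examples repository."""
    return ["git", "clone", repo_url, destination]


def clone_titanic_example(destination="examples"):
    """Clone the repository with Git and return the path of the Titanic project."""
    # check=True raises CalledProcessError if git exits with a non-zero status.
    subprocess.run(git_clone_command(destination=destination), check=True)
    return Path(destination) / "titanic"
```

Calling `clone_titanic_example()` returns the project path, which you can then change into from the terminal before continuing.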

Once the checkout is complete, you will find the following files which declaratively define our MLOps pipeline:

.
├── .layer
├── data
│   ├── passenger_features
│   │   ├── dataset.yml        # Definition of `passenger_features`, with each SQL feature definition below.
│   │   ├── ageband.sql
│   │   ├── embarked.sql
│   │   ├── fareband.sql
│   │   ├── is_alone.sql
│   │   ├── sex.sql
│   │   ├── survived.sql
│   │   └── title.sql
│   └── titanic_data
│       └── dataset.yml        # Declares where our source data is
├── models
│   └── survival_model
│       ├── model.py           # Definition of our model
│       ├── model.yml          # Training directives of our model
│       └── requirements.txt   # Environment config file
├── notebooks
│   └── TitanicSurvivalNotebook.ipynb
└── README.md

The Titanic example project contains four main directories:

| 📂 Directory | Description |
| --- | --- |
| data/titanic_data | Contains a YAML file that connects the Titanic dataset to this project. We have uploaded this source data into our demo database, which is the main data source for this project. |
| data/passenger_features | Contains SQL files, each of which defines a feature you will use to create training data. The dataset.yml here is the descriptor for the featureset. |
| models | Contains model.py, which is the implementation of the model. model.yml is the descriptor for this ML model. |
| notebooks | Contains a Jupyter Notebook file. |
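
Before running the project, it can be useful to confirm that the checkout contains the files shown in the directory listing. The sketch below is an illustration only: `missing_project_files` is a hypothetical helper, not a Layer SDK function, and it assumes you run it from inside examples/titanic.

```python
from pathlib import Path

# Paths taken from the directory listing above, relative to examples/titanic.
EXPECTED_PATHS = [
    "data/passenger_features/dataset.yml",
    "data/titanic_data/dataset.yml",
    "models/survival_model/model.py",
    "models/survival_model/model.yml",
    "models/survival_model/requirements.txt",
    "notebooks/TitanicSurvivalNotebook.ipynb",
]


def missing_project_files(project_root="."):
    """Return the expected paths that do not exist as files under `project_root`."""
    root = Path(project_root)
    return [p for p in EXPECTED_PATHS if not (root / p).is_file()]


if __name__ == "__main__":
    missing = missing_project_files()
    print("All expected files found." if not missing else f"Missing: {missing}")
```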

Ready to go!

You are now ready to run your first Layer project.

layer start

Layer finds the featuresets and ML models in this project, builds them, and puts them into two catalogs:

  • Data Catalog: The starting point for finding data when developing projects and models. It's the central repository for all featuresets across all your projects.
  • Model Catalog: The central repository for your ML models. Here you can review the results of your experiments or deploy your models for inference with a single click.

In your terminal, you will see something like this:

Layer run finished

You can enter the URLs printed in the terminal into your browser to see the featureset and model information for this build.

warning

If you are working in a shared organization and someone else has also run this project, then Layer iterates over the existing project rather than creating a new one.

Deploy your model

In your web browser, navigate to the Layer Model Catalog, where your trained model is ready to be deployed.

Click on your first model, Survival Model. You will see many details about your model, including the accuracy metric logged during training and the signature of your model.

Click on the + Deploy button on the top right. It will deploy your model to run on the Layer cloud server. This might take a minute or two.

Once deployed, that button will return to green. Click the 🔗 to copy the deployment URL. You'll need it in the next section.

Make a prediction

The model is now deployed for real-time inference. To make a prediction, it expects six inputs, each corresponding to a feature in passenger_features. The seventh feature, survived, is the value being predicted, so it is not sent. Let's test the model with a sample input (make sure to replace $MODEL_DEPLOYMENT_URL with the deployment URL you copied):

curl --header "Content-Type: application/json; format=pandas-records" \
--request POST \
--data '[{"Sex":0, "EmbarkStatus":0, "Title":0, "IsAlone":0, "AgeBand":1, "FareBand":2}]' \
$MODEL_DEPLOYMENT_URL

The API returns either a [0] or a [1], predicting whether that passenger survived.
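
The same request can be sent from Python using only the standard library. This is a minimal sketch rather than an official client: `build_prediction_request` and `predict` are hypothetical helper names, the URL in the usage section is a placeholder, and the code assumes the response body is JSON such as [0] or [1].

```python
import json
import urllib.request

# One record with the same six fields as the curl example.
SAMPLE_PASSENGER = {
    "Sex": 0, "EmbarkStatus": 0, "Title": 0,
    "IsAlone": 0, "AgeBand": 1, "FareBand": 2,
}


def build_prediction_request(deployment_url, records):
    """Build (but do not send) a POST request carrying `records` as JSON."""
    return urllib.request.Request(
        deployment_url,
        data=json.dumps(records).encode("utf-8"),
        headers={"Content-Type": "application/json; format=pandas-records"},
        method="POST",
    )


def predict(deployment_url, records, timeout=30):
    """Send the request and return the decoded body (assumed to be JSON)."""
    request = build_prediction_request(deployment_url, records)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Placeholder: replace with the deployment URL copied from the Model Catalog.
    url = "https://example.invalid/replace-with-your-deployment-url"
    request = build_prediction_request(url, [SAMPLE_PASSENGER])
    print(request.get_method(), request.full_url)
```

Separating request construction from sending lets you inspect the payload and headers before contacting the deployed model. Call `predict(url, [SAMPLE_PASSENGER])` once the placeholder URL has been replaced.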

Congratulations!

You have successfully run your first Layer project!

You can stop now or, if you have Jupyter Notebook, continue and see how you can reuse the features and models you created in Layer.

Import the features in a Jupyter Notebook

Let's now look at how you can re-use features after training a machine learning model with Layer. Features generated by Layer are stored in the Layer Data Catalog. In this Titanic example, we created features known as passenger_features.

note

If you prefer, you can also go through the TitanicSurvivalNotebook.ipynb and get pretty much the same results.

  1. Open Jupyter Notebook. On most systems, you can run jupyter notebook in the terminal to open the program in your web browser.
  2. Enter the following commands in cells:
# Import layer
import layer
# Authenticate your Layer account
layer.login()
# Fetch the features
passenger_features_df = layer.get_features(["passenger_features"])
# Display the features
passenger_features_df

The last cell displays the fetched features as a table.

The layer.get_features() function accepts a list of the featuresets you'd like to fetch and returns their features as a DataFrame. Once you have the DataFrame, you can proceed with your analysis and model building as you'd like.

passenger_features_df.head()
passenger_features_df.info()
passenger_features_df.describe()

Run the model in a Jupyter Notebook

Now let's run the model.

  1. Enter the following commands in cells.

    model = layer.get_model("survival_model")
    trained_classifier = model.get_train()
    trained_classifier

    Jupyter Notebook returns RandomForestClassifier().

  2. Test the model by entering the following:

    trained_classifier.predict([[2,2,1,2,1,0]])

    It returns an array containing a single prediction, either 0 or 1.

Example Notebook

We have created an example notebook for the Titanic Example. It includes all the examples above from fetching a featureset to making an inference. Feel free to check it out here: https://github.com/layerml/examples/blob/main/titanic/notebooks/TitanicSurvivalNotebook.ipynb

Congratulations!

You have successfully run your first Layer project and reused the results in Jupyter Notebook!