For those completely new to Layer, it can be helpful to get a high-level overview of the concepts before diving into complex topics.
A Layer project is a directory that contains YAML configuration files along with corresponding model and feature definitions in Python/SQL. You tell Layer what to accomplish, rather than describe how to accomplish it. We call this Declarative MLOps.
Layer utilizes your existing data warehouses and data lakes to run computations needed to build the features. Through data source definitions, you can specify the target schemas where Layer materializes the featuresets. Layer doesn't store any of your data -- it only stores metadata and code. You own your data and Layer processes it within your computation platform.
In machine learning, a feature is an individual measurable property of the thing being modeled. Features are the inputs to predictive models. For example, if an e-commerce company is trying to predict whether a customer is going to churn, the total number of orders that customer placed in the last 7 days (e.g. orders_last_7d) could be very useful. Features are the basic building blocks of ML models, and feature engineering is crucial for true collaboration in a healthy data organization.
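To make the orders_last_7d example concrete, here is a minimal sketch in plain Python of how such a feature could be computed from raw order events. It is not Layer's API: the function signature, the (customer_id, ordered_at) tuple layout, and the sample data are all assumptions made for illustration.

```python
from collections import Counter
from datetime import datetime, timedelta


def orders_last_7d(orders, as_of):
    """Count orders per customer in the 7 days ending at `as_of`.

    `orders` is an iterable of (customer_id, ordered_at) tuples.
    This layout is hypothetical and chosen only for the example.
    """
    window_start = as_of - timedelta(days=7)
    counts = Counter()
    for customer_id, ordered_at in orders:
        # Keep only orders inside the half-open window (window_start, as_of].
        if window_start < ordered_at <= as_of:
            counts[customer_id] += 1
    return dict(counts)


# Made-up sample events, for illustration only.
orders = [
    ("c1", datetime(2022, 1, 1)),
    ("c1", datetime(2022, 1, 5)),
    ("c1", datetime(2022, 1, 9)),
    ("c2", datetime(2022, 1, 8)),
    ("c2", datetime(2021, 12, 20)),
]

features = orders_last_7d(orders, as_of=datetime(2022, 1, 10))
print(features)
```

Customers with no orders in the window are absent from the result; a real pipeline would likely fill those in with zero.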
A featureset is a group of features that provides a high-level interface for accessing the individual features. Featuresets differ from static datasets or ordinary database tables in that they provide time-travel capability: you can retrieve the point-in-time correct values of their underlying features.
Featuresets are first-class entities in Layer. They are integral to and built within a Layer Project. They are versioned and stored in the Layer Data Catalog.
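The point-in-time idea can be illustrated with a small sketch: a lookup "as of" a timestamp returns the most recent value recorded at or before that timestamp, never a later one. The FeatureHistory class below is hypothetical and written only for this illustration; it does not reflect how Layer stores or serves featuresets.

```python
from bisect import bisect_right
from datetime import datetime


class FeatureHistory:
    """Hypothetical in-memory history of one feature value for one entity."""

    def __init__(self):
        self._timestamps = []  # kept sorted in ascending order
        self._values = []      # _values[i] became valid at _timestamps[i]

    def record(self, valid_from, value):
        # Insert while preserving timestamp order.
        idx = bisect_right(self._timestamps, valid_from)
        self._timestamps.insert(idx, valid_from)
        self._values.insert(idx, value)

    def value_at(self, as_of):
        # Latest value whose timestamp is <= as_of, or None if none exists.
        idx = bisect_right(self._timestamps, as_of)
        return self._values[idx - 1] if idx else None


# Made-up history of orders_last_7d for a single customer.
history = FeatureHistory()
history.record(datetime(2022, 1, 1), 3)
history.record(datetime(2022, 1, 8), 5)
history.record(datetime(2022, 1, 15), 1)

mid_january = history.value_at(datetime(2022, 1, 10))
before_any = history.value_at(datetime(2021, 12, 31))
```

Looking up values this way when assembling training data avoids leaking information from the future into a training example, which is the main reason point-in-time correctness matters.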
An ML Model is the output of a machine learning algorithm run on data. A model represents what was learned by a machine learning algorithm.
ML Models are first-class entities in Layer. They are integral to and built/trained via a Layer Project. They are versioned and stored in the Layer Model Catalog.
The Layer Data Catalog is the central repository for your Datasets and Featuresets. It provides both batch and real-time interfaces to serve Featuresets. The Data Catalog lets everyone in your organization find, understand, and govern the data, and it enables company-wide collaboration on that data.
The Layer Model Catalog is the central repository for your ML Models. It provides extensive features to manage the lifecycle of your ML Models, from versioning and experiment tracking to deployment and monitoring. It is deeply integrated with the Layer Data Catalog, which enables end-to-end data lineage for every stage of your ML models.