A featureset is a group of calculated features that provides a high-level interface for accessing the individual features. Unlike static datasets or ordinary database tables, featuresets support time travel: you can retrieve point-in-time values of their underlying features.
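To make the idea of a point-in-time value concrete, here is a minimal sketch in plain Python. It is not Layer's API or storage format: the `history` list and the `value_as_of` helper are hypothetical names chosen for illustration, and the sketch assumes a feature's history is a list of (timestamp, value) pairs sorted by timestamp.

```python
from bisect import bisect_right
from datetime import datetime

# Hypothetical history of one feature for one entity, sorted by timestamp.
# This only illustrates the concept; it is not how Layer stores features.
history = [
    (datetime(2021, 1, 1), 10),
    (datetime(2021, 3, 1), 12),
    (datetime(2021, 6, 1), 15),
]


def value_as_of(history, when):
    """Return the latest value recorded at or before `when`, or None."""
    timestamps = [ts for ts, _ in history]
    # bisect_right counts the entries with a timestamp <= when.
    idx = bisect_right(timestamps, when)
    return history[idx - 1][1] if idx > 0 else None
```

Asking for the value as of 15 April 2021 returns the entry recorded on 1 March, because that was the most recent value known at that moment. A static table would only expose the latest value.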

Featuresets are first-class entities in Layer. They are built within a Layer project and stored in the Layer Data Catalog.

Each featureset is defined in a directory with a dataset.yml file at its root, which links to one or more SQL or Python files. Featuresets have the following basic layout:

├── my_featureset/
│ ├── dataset.yml
│ ├── my_feature_1.sql
│ ├──

Featureset configuration

Featuresets are configured in a dataset.yml file. An example is shown below, alongside field definitions.


Version of the featureset definition file. This is used to make sure that backwards-incompatible changes in the config format do not break the Layer CLI.


Name of the featureset. The name will be used to identify the featureset in the Data Catalog.


Description of the dataset. We recommend writing a description that future coworkers (or future you) will be grateful for. This description is displayed on the featureset card in the Data Catalog.


Determines what type of data is contained in this dataset. featureset is the only option for featuresets.


List of features.


Name of the feature.


Description of the feature.


Source code of the feature. A SQL query file indicates a SQL feature; a Python source file indicates either a Python or a Spark feature.


Contains primary_keys. This field is used to join the features in a featureset. Every feature has an ID column and a Value column, and the primary_keys field tells Layer which column to join the features on.


This field is required.


table is the only option.

target or integration

Name of the integration where this dataset lives, that is, where the features are materialized. You assign names to integrations in Layer Settings > Integrations.

# required.
apiVersion: 1
# required.
name: "my_featureset"
# optional.
description: "Car features with transmission and age"
# required.
type: featureset
# required.
features:
  - name: my_feature_1
    description: "My SQL Feature's description"
    source: my_feature_1.sql
  - name: my_feature_2
    description: "My Python Feature's description"
# required.
primary_keys: ["ID"]
# required.
type: table
target: my_db
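
The role of primary_keys in the configuration above can be sketched in plain Python. This is only an illustration of joining features on a shared ID, not Layer's implementation: the feature tables, the `join_features` helper, and the choice of an outer join (missing values become None) are all assumptions made for the example.

```python
# Hypothetical feature tables: each maps a primary-key value (ID) to a Value.
transmission = {1: "manual", 2: "automatic", 3: "manual"}
age = {1: 4, 2: 7}


def join_features(features):
    """Join named {ID: Value} tables into one row per ID.

    Assumption: an outer join, so an ID missing from a feature gets None.
    """
    ids = sorted(set().union(*(table.keys() for table in features.values())))
    return [
        {"ID": id_, **{name: table.get(id_) for name, table in features.items()}}
        for id_ in ids
    ]


rows = join_features({"transmission": transmission, "age": age})
```

Each resulting row carries the ID plus one column per feature, which is the shape a featureset exposes when its features share the same primary key.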

Defining features

There are two ways to define features in a Layer project:

  • SQL features: Use SQL queries to define the transformations that extract features from your dataset.
  • Python features: For advanced feature extraction, develop Python scripts with the help of libraries such as nltk or scikit-learn.
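
As a rough comparison of the two styles, the sketch below computes the same feature once as a SQL query and once as a Python function. It uses Python's built-in sqlite3 module purely as a stand-in engine; the `cars` table, the `age_feature` function, and the reference year are hypothetical, and the function signature Layer actually expects from a Python feature file is not shown here.

```python
import sqlite3

# Hypothetical source table; names and values are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cars (ID INTEGER PRIMARY KEY, year INTEGER)")
conn.executemany("INSERT INTO cars VALUES (?, ?)", [(1, 2015), (2, 2019)])

# SQL feature: a query that returns an ID column and a value column.
SQL_FEATURE = "SELECT ID, 2021 - year AS age FROM cars ORDER BY ID"
sql_rows = conn.execute(SQL_FEATURE).fetchall()


# Python feature: the same transformation written as a plain function.
def age_feature(cars, reference_year=2021):
    """Return (ID, age) pairs for an iterable of (ID, year) rows."""
    return [(car_id, reference_year - year) for car_id, year in cars]


source_rows = conn.execute("SELECT ID, year FROM cars ORDER BY ID").fetchall()
python_rows = age_feature(source_rows)
```

Both versions yield one (ID, value) pair per car. SQL tends to suit simple column arithmetic like this, while a Python function leaves room for library-based processing that is awkward to express in a query.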