A featureset is a group of calculated features that provides a high-level interface for accessing the individual features. Featuresets differ from static datasets and ordinary database tables in that they support time travel: you can retrieve point-in-time values of their underlying features.
Featuresets are first-class entities in Layer. They are integral to and built within a Layer project. They are stored in the Layer Data Catalog.
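To make the idea of point-in-time values concrete, here is a minimal sketch in plain Python. It does not use the Layer SDK and is not how Layer implements time travel. The feature name, the history data, and the value_as_of helper are all hypothetical, chosen only to show what "the value of a feature as of a given time" means.

```python
from bisect import bisect_right
from datetime import datetime

# Hypothetical history of one feature for one entity: (timestamp, value),
# sorted by timestamp. The name and the data are made up for illustration.
order_count_history = [
    (datetime(2021, 1, 1), 3),
    (datetime(2021, 2, 1), 5),
    (datetime(2021, 3, 1), 8),
]


def value_as_of(history, as_of):
    """Return the latest value recorded at or before `as_of`, or None."""
    timestamps = [ts for ts, _ in history]
    # bisect_right returns how many timestamps are <= as_of.
    idx = bisect_right(timestamps, as_of)
    if idx == 0:
        return None  # nothing had been recorded yet at that time
    return history[idx - 1][1]


print(value_as_of(order_count_history, datetime(2021, 2, 15)))  # 5
```

A static table would only hold the latest value (8 in this sketch). A point-in-time lookup returns what was known at the requested moment instead.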
All featuresets are defined in a directory, with a dataset.yml file at the root linked to one or more SQL or Python files. Featuresets have the following basic layout:
Featuresets are configured in a dataset.yml file. An example is shown below, alongside field definitions.
Version of the definition file. This is used to make sure that backwards-incompatible changes in the config format do not break the Layer CLI.
Name of the featureset. The name will be used to identify the featureset in the Data Catalog.
Description of the dataset. We recommend writing a description that future coworkers (or future you) will be grateful for. This description is displayed on the featureset card in the Data Catalog.
Determines what type of data is contained in this dataset. featureset is the only option for featuresets.
List of features.
Name of the feature.
Description of the feature.
Source code of the feature. A SQL query file indicates a SQL feature; a Python source file indicates either a Python or a Spark feature.
primary_keys. This field is used to join the features under a featureset. Every single feature has an ID and a Value column. The primary_key field tells Layer how to join the features.
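The join described above can be pictured with a small sketch in plain Python. This is an illustration only, not Layer's implementation: the feature names, the sample rows, and the join_features helper are hypothetical. The sketch keeps every ID that appears in any feature and fills missing values with None, which is an assumption, since the join semantics are not stated here.

```python
# Hypothetical feature tables, each with an ID and a Value column.
total_spend = [{"ID": 1, "Value": 120.0}, {"ID": 2, "Value": 45.5}]
order_count = [{"ID": 1, "Value": 3}, {"ID": 2, "Value": 1}, {"ID": 3, "Value": 7}]


def join_features(features):
    """Join feature tables on ID; values missing for an ID become None."""
    joined = {}
    for name, rows in features.items():
        for row in rows:
            # One output record per ID, with one column per feature name.
            record = joined.setdefault(row["ID"], {"ID": row["ID"]})
            record[name] = row["Value"]
    # Fill gaps so every record has a column for every feature.
    for record in joined.values():
        for name in features:
            record.setdefault(name, None)
    return [joined[key] for key in sorted(joined)]


rows = join_features({"total_spend": total_spend, "order_count": order_count})
for row in rows:
    print(row)
```

Each feature contributes one column, named after the feature, and the shared ID column is what lines the rows up.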
There are two ways you can define your features in a Layer Project.