Spam Detection

An NLP example that uses the nltk library to predict whether an SMS message is spam. In this project, we use Python Features to remove stop words and lemmatize messages. We also load an ML model from the Discover > Models tab to create training data for the spam_detection model.
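
As a rough illustration of the cleaning step described above, here is a minimal sketch; it is not the project's actual `feature.py`. The function name `clean_message`, the tiny stop-word set, and the toy `strip_plural` lemmatizer are hypothetical stand-ins chosen so the example runs without any downloads. With nltk you would typically pass `set(nltk.corpus.stopwords.words("english"))` and `nltk.stem.WordNetLemmatizer().lemmatize` instead.

```python
import re
from typing import Callable, Iterable

# Hypothetical, deliberately tiny stop-word set (an assumption, not nltk's list).
TOY_STOP_WORDS = frozenset({"a", "an", "the", "to", "you", "have", "is", "are"})


def strip_plural(token: str) -> str:
    """Toy stand-in for a lemmatizer: drops a trailing 's' from longer words."""
    if len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
        return token[:-1]
    return token


def clean_message(
    text: str,
    stop_words: Iterable[str] = TOY_STOP_WORDS,
    lemmatize: Callable[[str], str] = strip_plural,
) -> str:
    """Lowercase, keep alphabetic tokens, drop stop words, lemmatize the rest."""
    stop = set(stop_words)
    tokens = re.findall(r"[a-z]+", text.lower())
    return " ".join(lemmatize(t) for t in tokens if t not in stop)


print(clean_message("You have WON 2 free tickets to the finals!"))
# prints: won free ticket final
```

Passing the stop words and the lemmatizer as parameters keeps the cleaning logic independent of where those resources come from, so the toy versions can be swapped for the nltk ones without touching the function body.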

What will you learn?

  • Extract advanced features from our data with Python Features, using the nltk and scikit-learn libraries
  • Use a model to create training data for another model
  • Track experiments by logging metrics: f1_score, accuracy and mean_scores
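
To make two of the logged metrics concrete, the sketch below computes accuracy and the F1 score by hand on made-up labels. The helper names `accuracy` and `f1` and the sample data are hypothetical. In practice, `sklearn.metrics.accuracy_score` and `sklearn.metrics.f1_score` should give the same numbers. How the values are logged to Layer is not shown here, and `mean_scores` is left out because the source does not define it.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def f1(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """Harmonic mean of precision and recall for the positive (spam) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Made-up labels for illustration only: 1 = spam, 0 = not spam.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

metrics = {"accuracy": accuracy(y_true, y_pred), "f1_score": f1(y_true, y_pred)}
print(metrics)
```

F1 is usually the more informative of the two for spam detection, since spam tends to be the minority class and accuracy alone can look good for a model that rarely predicts spam.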

Install and run

To check out the Layer Spam Detection example, run:

layer clone https://github.com/layerml/examples
cd examples/spam-detection

To build the project:

layer start

File structure

.
├── .layer
├── data
│   ├── sms_featureset
│   │   ├── is_spam
│   │   │   ├── feature.py          # Source code of the `is_spam` feature. We do basic label encoding.
│   │   │   └── requirements.txt    # Environment config file for the `is_spam` feature
│   │   ├── message
│   │   │   ├── feature.py          # Source code of the `message` feature. We remove stop words and lemmatize messages.
│   │   │   └── requirements.txt    # Environment config file for the `message` feature
│   │   └── dataset.yaml
│   └── spam_data
│       └── dataset.yaml            # Declares where our source `spam_messages` dataset is
├── models
│   ├── spam_detection
│   │   ├── model.py                # Source code of the `Spam Detection` model
│   │   ├── model.yaml              # Training directives of our model
│   │   └── requirements.txt        # Environment config file
│   └── vectorizer
│       ├── model.py                # Source code of the `Vectorizer` model
│       ├── model.yaml              # Training directives of our model
│       └── requirements.txt        # Environment config file
└── README.md
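
The `is_spam` feature is described above as basic label encoding. A minimal sketch of what that could look like follows. It assumes the raw labels are the strings "ham" and "spam", which is common for SMS spam datasets but not confirmed by the source; the names `LABEL_TO_INT` and `encode_is_spam` are hypothetical and do not come from the project's `feature.py`.

```python
from typing import Iterable, List

# Assumed raw label values; the source does not show the dataset's labels.
LABEL_TO_INT = {"ham": 0, "spam": 1}


def encode_is_spam(labels: Iterable[str]) -> List[int]:
    """Map each raw label to 0 or 1, failing loudly on unexpected values."""
    encoded = []
    for label in labels:
        key = label.strip().lower()  # tolerate stray whitespace and casing
        if key not in LABEL_TO_INT:
            raise ValueError(f"unexpected label: {label!r}")
        encoded.append(LABEL_TO_INT[key])
    return encoded


print(encode_is_spam(["ham", "spam", "Spam ", "ham"]))
# prints: [0, 1, 1, 0]
```

Raising on unknown labels, rather than silently mapping them to 0, keeps mislabeled rows from quietly ending up in the training data.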