# Feast: Feature Store for Machine Learning
The architecture shown above is the minimal Feast deployment. Want to run the full Feast on Snowflake/GCP/AWS? See *Running Feast with Snowflake/GCP/AWS* in the documentation.
## 🐣 Getting Started
### 1. Install Feast
```commandline
pip install feast
```
### 2. Create a feature repository
```commandline
feast init my_feature_repo
cd my_feature_repo/feature_repo
```
### 3. Register your feature definitions and set up your feature store
```commandline
feast apply
```
### 4. Explore your data in the web UI (experimental)
```commandline
feast ui
```
### 5. Build a training dataset
```python
from feast import FeatureStore
import pandas as pd
from datetime import datetime

entity_df = pd.DataFrame.from_dict({
    "driver_id": [1001, 1002, 1003, 1004],
    "event_timestamp": [
        datetime(2021, 4, 12, 10, 59, 42),
        datetime(2021, 4, 12, 8, 12, 10),
        datetime(2021, 4, 12, 16, 40, 26),
        datetime(2021, 4, 12, 15, 1, 12)
    ]
})

store = FeatureStore(repo_path=".")

training_df = store.get_historical_features(
    entity_df=entity_df,
    features=[
        'driver_hourly_stats:conv_rate',
        'driver_hourly_stats:acc_rate',
        'driver_hourly_stats:avg_daily_trips'
    ],
).to_df()

print(training_df.head())

# Train model
# model = ml.fit(training_df)
```
```commandline
             event_timestamp  driver_id  conv_rate  acc_rate  avg_daily_trips
0  2021-04-12 08:12:10+00:00       1002   0.713465  0.597095              531
1  2021-04-12 10:59:42+00:00       1001   0.072752  0.044344               11
2  2021-04-12 15:01:12+00:00       1004   0.658182  0.079150              220
3  2021-04-12 16:40:26+00:00       1003   0.162092  0.309035              959
```
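`get_historical_features` performs a point-in-time join. For each row of `entity_df`, it selects the most recent feature values whose timestamp is at or before that row's `event_timestamp`, which keeps future data out of the training set. The pure-Python sketch below illustrates the idea on in-memory lists. It is not Feast's implementation: the feature rows and the `point_in_time_join` helper are hypothetical, and the sketch ignores details Feast handles, such as feature view TTLs.

```python
from datetime import datetime

# Hypothetical feature rows: (driver_id, feature_timestamp, conv_rate)
feature_rows = [
    (1001, datetime(2021, 4, 12, 9, 0), 0.10),
    (1001, datetime(2021, 4, 12, 11, 0), 0.20),  # later than 1001's entity row, so it must be skipped
    (1002, datetime(2021, 4, 12, 7, 0), 0.70),
]

# Entity rows: (driver_id, event_timestamp), mirroring part of entity_df above
entity_rows = [
    (1001, datetime(2021, 4, 12, 10, 59, 42)),
    (1002, datetime(2021, 4, 12, 8, 12, 10)),
    (1003, datetime(2021, 4, 12, 16, 40, 26)),
]


def point_in_time_join(entity_rows, feature_rows):
    """Hypothetical helper: for each entity row, attach the latest feature value
    at or before its event timestamp, or None if no such value exists."""
    result = []
    for driver_id, event_ts in entity_rows:
        candidates = [
            (ts, value)
            for fid, ts, value in feature_rows
            if fid == driver_id and ts <= event_ts  # never look into the future
        ]
        latest = max(candidates, key=lambda c: c[0])[1] if candidates else None
        result.append((driver_id, event_ts, latest))
    return result


for row in point_in_time_join(entity_rows, feature_rows):
    print(row)
```

Driver 1001 receives 0.10 rather than 0.20 because the 11:00 value was recorded after its 10:59:42 event. Driver 1003 has no feature history, so it gets `None`.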
### 6. Load feature values into your online store
Option 1: Incremental materialization (recommended)
```commandline
CURRENT_TIME=$(date -u +"%Y-%m-%dT%H:%M:%S")
feast materialize-incremental $CURRENT_TIME
```
Option 2: Full materialization with timestamps
```commandline
CURRENT_TIME=$(date -u +"%Y-%m-%dT%H:%M:%S")
feast materialize 2021-04-12T00:00:00 $CURRENT_TIME
```
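Options 1 and 2 differ in where each run starts. `materialize-incremental` resumes from the end time of the previous materialization run. `materialize` always uses the explicit start and end you pass. The minimal sketch below models that bookkeeping with a hypothetical in-memory `IncrementalWindowTracker`. Feast itself records materialization intervals in its registry, and its first-run start time may be derived differently, for example from the feature view's TTL. The last lines show a Python equivalent of the `CURRENT_TIME` shell command.

```python
from datetime import datetime, timezone


class IncrementalWindowTracker:
    """Hypothetical helper: remembers the last materialized end time per feature view."""

    def __init__(self):
        self._last_end = {}

    def next_window(self, feature_view, end_time, first_start):
        # Resume where the previous run ended; use first_start only on the first run
        start_time = self._last_end.get(feature_view, first_start)
        self._last_end[feature_view] = end_time
        return start_time, end_time


tracker = IncrementalWindowTracker()
initial_start = datetime(2021, 4, 14, tzinfo=timezone.utc)
first = tracker.next_window("driver_hourly_stats",
                            datetime(2021, 4, 15, tzinfo=timezone.utc), initial_start)
second = tracker.next_window("driver_hourly_stats",
                             datetime(2021, 4, 16, tzinfo=timezone.utc), initial_start)
print(first)   # window 2021-04-14 -> 2021-04-15
print(second)  # window 2021-04-15 -> 2021-04-16 (resumes from the previous end)

# Python equivalent of: CURRENT_TIME=$(date -u +"%Y-%m-%dT%H:%M:%S")
current_time = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S")
print(current_time)
```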
Option 3: Simple materialization without timestamps
```commandline
feast materialize --disable-event-timestamp
```
The `--disable-event-timestamp` flag materializes all available feature data using the current datetime as the event timestamp, so you don't need to specify start and end timestamps. This is useful when your source data lacks proper event timestamp columns.
```commandline
Materializing feature view driver_hourly_stats from 2021-04-14 to 2021-04-15 done!
```
### 7. Read online features at low latency
```python
from pprint import pprint
from feast import FeatureStore

store = FeatureStore(repo_path=".")

feature_vector = store.get_online_features(
    features=[
        'driver_hourly_stats:conv_rate',
        'driver_hourly_stats:acc_rate',
        'driver_hourly_stats:avg_daily_trips'
    ],
    entity_rows=[{"driver_id": 1001}]
).to_dict()

pprint(feature_vector)

# Make prediction
# model.predict(feature_vector)
```
```json
{
    "driver_id": [1001],
    "driver_hourly_stats__conv_rate": [0.49274],
    "driver_hourly_stats__acc_rate": [0.92743],
    "driver_hourly_stats__avg_daily_trips": [72]
}
```
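`to_dict()` returns features in columnar form. Each key maps to a list with one value per entity row passed in `entity_rows`. Many model APIs expect one feature vector per row instead, so a small conversion step is common before calling `predict`. The sketch below uses only the standard library and the sample output above. The `columns_to_rows` helper and the fixed `feature_order` are hypothetical choices, not part of the Feast API.

```python
# Columnar output in the shape shown above (one list entry per entity row)
feature_vector = {
    "driver_id": [1001],
    "driver_hourly_stats__conv_rate": [0.49274],
    "driver_hourly_stats__acc_rate": [0.92743],
    "driver_hourly_stats__avg_daily_trips": [72],
}


def columns_to_rows(columns):
    """Hypothetical helper: turn {name: [v0, v1, ...]} into [{name: v0}, {name: v1}, ...]."""
    names = list(columns)
    return [dict(zip(names, values)) for values in zip(*columns.values())]


rows = columns_to_rows(feature_vector)
print(rows)

# Hypothetical model input: feature values in a fixed order, excluding the entity key
feature_order = [
    "driver_hourly_stats__conv_rate",
    "driver_hourly_stats__acc_rate",
    "driver_hourly_stats__avg_daily_trips",
]
model_input = [[row[name] for name in feature_order] for row in rows]
print(model_input)
```

Fixing the column order explicitly guards against silently feeding features to the model in a different order than it was trained on.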
## 📦 Functionality and Roadmap
The list below tracks Feast functionality: checked items ([x]) are already available, and unchecked items ([ ]) are planned or in progress.
We welcome contributions to all items in the roadmap!
- **Natural Language Processing**
  - [x] Vector Search (Alpha release. See RFC)
  - [ ] Enhanced Feature Server and SDK for native support for NLP
- **Data Sources**
  - [x] Snowflake source
  - [x] Redshift source
  - [x] BigQuery source
  - [x] Parquet file source
  - [x] Azure Synapse + Azure SQL source (contrib plugin)
  - [x] Hive (community plugin)
  - [x] Postgres (contrib plugin)
  - [x] Spark (contrib plugin)
  - [x] Couchbase (contrib plugin)
  - [x] Kafka / Kinesis sources (via push support into the online store)
- **Offline Stores**
  - [x] Snowflake
  - [x] Redshift
  - [x] BigQuery
  - [x] Azure Synapse + Azure SQL (contrib plugin)
  - [x] Hive (community plugin)
  - [x] Postgres (contrib plugin)
  - [x] Trino (contrib plugin)
  - [x] Spark (contrib plugin)
  - [x] Couchbase (contrib plugin)
  - [x] In-memory / Pandas
  - [x] Custom offline store support
- **Online Stores**
  - [x] Snowflake
  - [x] DynamoDB
  - [x] Redis
  - [x] Datastore
  - [x] Bigtable
  - [x] SQLite
  - [x] Dragonfly
  - [x] IKV - Inlined Key Value Store
  - [x] Azure Cache for Redis (community plugin)
  - [x] Postgres (contrib plugin)
  - [x] Cassandra / AstraDB (contrib plugin)
  - [x] ScyllaDB (contrib plugin)
  - [x] Couchbase (contrib plugin)
  - [x] Custom online store support
- **Feature Engineering**
  - [x] On-demand Transformations (On Read) (Beta release. See RFC)
  - [x] Streaming Transformations (Alpha release. See RFC)
  - [ ] Batch transformation (In progress. See RFC)
  - [x] On-demand Transformations (On Write) (Beta release. See GitHub Issue)
- **Streaming**
  - [x] Custom streaming ingestion job support
  - [x] Push based streaming data ingestion to online store
  - [x] Push based streaming data ingestion to offline store
- **Deployments**
  - [x] AWS Lambda (Alpha release. See RFC)
  - [x] Kubernetes (See guide)
- **Feature Serving**
  - [x] Python Client
  - [x] Python feature server
  - [x] Feast Operator (alpha)
  - [x] Java feature server (alpha)
  - [x] Go feature server (alpha)
  - [x] Offline Feature Server (alpha)
  - [x] Registry server (alpha)
- **Data Quality Management (See RFC)**
  - [x] Data profiling and validation (Great Expectations)
- **Feature Discovery and Governance**
  - [x] Python SDK for browsing feature registry
  - [x] CLI for browsing feature registry
  - [x] Model-centric feature tracking (feature services)
  - [x] Amundsen integration (see Feast extractor)
  - [x] DataHub integration (see DataHub Feast docs)
  - [x] Feast Web UI (Beta release. See docs)
  - [ ] Feast Lineage Explorer
## 🎓 Important Resources
Please refer to the official documentation:

- Documentation
- Quickstart
- Tutorials
- Examples
- Running Feast with Snowflake/GCP/AWS
- Change Log
## 👋 Contributing
Feast is a community project and is still under active development. Please have a look at our contributing and development guides if you want to contribute to the project:
- Contribution Process for Feast
- Development Guide for Feast
- Development Guide for the Main Feast Repository
