feast
Introduction: Feature Store for Machine Learning

### 1. Install Feast
```commandline
pip install feast
```
### 2. Create a feature repository
```commandline
feast init my_feature_repo
cd my_feature_repo/feature_repo
```
### 3. Register your feature definitions and set up your feature store
```commandline
feast apply
```
### 4. Explore your data in the web UI (experimental)

```commandline
feast ui
```
### 5. Build a training dataset
```python
from datetime import datetime

import pandas as pd
from feast import FeatureStore

# Entities to fetch features for, and the time at which each row's
# feature values should be looked up.
entity_df = pd.DataFrame.from_dict({
    "driver_id": [1001, 1002, 1003, 1004],
    "event_timestamp": [
        datetime(2021, 4, 12, 10, 59, 42),
        datetime(2021, 4, 12, 8, 12, 10),
        datetime(2021, 4, 12, 16, 40, 26),
        datetime(2021, 4, 12, 15, 1, 12),
    ],
})

# repo_path="." assumes the script runs from inside the feature repository.
store = FeatureStore(repo_path=".")

training_df = store.get_historical_features(
    entity_df=entity_df,
    features=[
        'driver_hourly_stats:conv_rate',
        'driver_hourly_stats:acc_rate',
        'driver_hourly_stats:avg_daily_trips',
    ],
).to_df()

print(training_df.head())

# Train model
# model = ml.fit(training_df)
```
```commandline
            event_timestamp  driver_id  conv_rate  acc_rate  avg_daily_trips
0 2021-04-12 08:12:10+00:00       1002   0.713465  0.597095              531
1 2021-04-12 10:59:42+00:00       1001   0.072752  0.044344               11
2 2021-04-12 15:01:12+00:00       1004   0.658182  0.079150              220
3 2021-04-12 16:40:26+00:00       1003   0.162092  0.309035              959
```
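Each row of `entity_df` is joined with the feature values that were known as of that row's `event_timestamp`, so the training data does not pick up values from the future. The following is a minimal conceptual sketch of such a point-in-time lookup in plain Python. It is not Feast's implementation; the `feature_log` rows and the `point_in_time_lookup` helper are hypothetical and exist only for illustration.

```python
from bisect import bisect_right
from datetime import datetime

# Hypothetical feature log for a single driver: (timestamp, conv_rate),
# sorted by timestamp.
feature_log = [
    (datetime(2021, 4, 12, 8, 0), 0.10),
    (datetime(2021, 4, 12, 9, 0), 0.20),
    (datetime(2021, 4, 12, 10, 0), 0.30),
]


def point_in_time_lookup(log, event_timestamp):
    """Return the latest value recorded at or before event_timestamp, or None."""
    timestamps = [ts for ts, _ in log]
    # Index of the first entry strictly after event_timestamp.
    idx = bisect_right(timestamps, event_timestamp)
    return log[idx - 1][1] if idx > 0 else None


# A 09:30 event sees the 09:00 value, never the later 10:00 value.
value_at_0930 = point_in_time_lookup(feature_log, datetime(2021, 4, 12, 9, 30))
# An event earlier than every logged value has nothing to join against.
value_before_log = point_in_time_lookup(feature_log, datetime(2021, 4, 12, 7, 0))
print(value_at_0930, value_before_log)
```

The sketch ignores details a real feature store has to handle, such as multiple entities and expiry of stale values.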
### 6. Load feature values into your online store
```commandline
CURRENT_TIME=$(date -u +"%Y-%m-%dT%H:%M:%S")
feast materialize-incremental "$CURRENT_TIME"
```
```commandline
Materializing feature view driver_hourly_stats from 2021-04-14 to 2021-04-15 done!
```
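If this step is triggered from a Python job rather than a shell, the same UTC timestamp can be built with the standard library. The sketch below only constructs the command arguments used above; the `build_materialize_command` helper is hypothetical, and actually executing the command (for example with `subprocess.run`) assumes the `feast` CLI is installed and the working directory is the feature repository.

```python
from datetime import datetime, timezone


def build_materialize_command(now=None):
    """Build the argument list for `feast materialize-incremental <end time>`."""
    now = now or datetime.now(timezone.utc)
    # Same format as `date -u +"%Y-%m-%dT%H:%M:%S"` in the shell example.
    end_time = now.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S")
    return ["feast", "materialize-incremental", end_time]


# A fixed timestamp keeps the example reproducible.
command = build_materialize_command(
    datetime(2021, 4, 15, 12, 30, 0, tzinfo=timezone.utc)
)
print(command)
```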
### 7. Read online features at low latency
```python
from pprint import pprint

from feast import FeatureStore

# repo_path="." assumes the script runs from inside the feature repository.
store = FeatureStore(repo_path=".")

# Fetch the latest materialized feature values for one driver.
feature_vector = store.get_online_features(
    features=[
        'driver_hourly_stats:conv_rate',
        'driver_hourly_stats:acc_rate',
        'driver_hourly_stats:avg_daily_trips',
    ],
    entity_rows=[{"driver_id": 1001}],
).to_dict()

pprint(feature_vector)

# Make prediction
# model.predict(feature_vector)
```
```json
{
    "driver_id": [1001],
    "driver_hourly_stats__conv_rate": [0.49274],
    "driver_hourly_stats__acc_rate": [0.92743],
    "driver_hourly_stats__avg_daily_trips": [72]
}
```
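The dictionary above maps each name to a list holding one value per entity row, whereas many model `predict` methods expect one row of feature values per entity. A minimal sketch of that reshaping, using the sample output above, might look as follows. The `to_feature_rows` helper and the chosen column order are assumptions for illustration and are not part of Feast.

```python
# Sample output copied from the example above.
feature_vector = {
    "driver_id": [1001],
    "driver_hourly_stats__conv_rate": [0.49274],
    "driver_hourly_stats__acc_rate": [0.92743],
    "driver_hourly_stats__avg_daily_trips": [72],
}

# Assumed column order; it has to match the order used when training the model.
feature_columns = [
    "driver_hourly_stats__conv_rate",
    "driver_hourly_stats__acc_rate",
    "driver_hourly_stats__avg_daily_trips",
]


def to_feature_rows(vector, columns):
    """Turn a {name: [values...]} dict into one list of values per entity row."""
    return [list(row) for row in zip(*(vector[name] for name in columns))]


rows = to_feature_rows(feature_vector, feature_columns)
print(rows)
```

The entity key `driver_id` is left out of `feature_columns` on purpose, since it identifies the row rather than describing it.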
## 📦 Functionality and Roadmap
The list below contains the functionality that contributors are planning to develop for Feast.
We welcome contributions to all items in the roadmap!
- Natural Language Processing
  - [x] Vector Search (Alpha release. See RFC)
  - [ ] Enhanced Feature Server and SDK for native support for NLP
- Data Sources
  - [x] Snowflake source
  - [x] Redshift source
  - [x] BigQuery source
  - [x] Parquet file source
  - [x] Azure Synapse + Azure SQL source (contrib plugin)
  - [x] Hive (community plugin)
  - [x] Postgres (contrib plugin)
  - [x] Spark (contrib plugin)
  - [x] Couchbase (contrib plugin)
  - [x] Kafka / Kinesis sources (via push support into the online store)
- Offline Stores
  - [x] Snowflake
  - [x] Redshift
  - [x] BigQuery
  - [x] Azure Synapse + Azure SQL (contrib plugin)
  - [x] Hive (community plugin)
  - [x] Postgres (contrib plugin)
  - [x] Trino (contrib plugin)
  - [x] Spark (contrib plugin)
  - [x] Couchbase (contrib plugin)
  - [x] In-memory / Pandas
  - [x] Custom offline store support
- Online Stores
  - [x] Snowflake
  - [x] DynamoDB
  - [x] Redis
  - [x] Datastore
  - [x] Bigtable
  - [x] SQLite
  - [x] Dragonfly
  - [x] IKV - Inlined Key Value Store
  - [x] Azure Cache for Redis (community plugin)
  - [x] Postgres (contrib plugin)
  - [x] Cassandra / AstraDB (contrib plugin)
  - [x] ScyllaDB (contrib plugin)
  - [x] Couchbase (contrib plugin)
  - [x] Custom online store support
- Feature Engineering
  - [x] On-demand Transformations (On Read) (Beta release. See RFC)
  - [x] Streaming Transformations (Alpha release. See RFC)
  - [ ] Batch transformation (In progress. See RFC)
  - [x] On-demand Transformations (On Write) (Beta release. See GitHub Issue)
- Streaming
  - [x] Custom streaming ingestion job support
  - [x] Push based streaming data ingestion to online store
  - [x] Push based streaming data ingestion to offline store
- Deployments
  - [x] AWS Lambda (Alpha release. See RFC)
  - [x] Kubernetes (See guide)
- Feature Serving
  - [x] Python Client
  - [x] Python feature server
  - [x] Feast Operator (alpha)
  - [x] Java feature server (alpha)
  - [x] Go feature server (alpha)
  - [x] Offline Feature Server (alpha)
  - [x] Registry server (alpha)
- Data Quality Management (See RFC)
  - [x] Data profiling and validation (Great Expectations)
- Feature Discovery and Governance
  - [x] Python SDK for browsing feature registry
  - [x] CLI for browsing feature registry
  - [x] Model-centric feature tracking (feature services)
  - [x] Amundsen integration (see Feast extractor)
  - [x] DataHub integration (see DataHub Feast docs)
  - [x] Feast Web UI (Beta release. See docs)
  - [ ] Feast Lineage Explorer
## 🎓 Important Resources
Please refer to the official documentation and the following resources:
- Quickstart
- Tutorials
- Examples
- Running Feast with Snowflake/GCP/AWS
- Change Log
## 👋 Contributing
Feast is a community project and is still under active development. Please have a look at our contributing and development guides if you want to contribute to the project:
- Contribution Process for Feast
- Development Guide for Feast
- Development Guide for the Main Feast Repository