feast

Project Url: gojek/feast
Introduction: Feature Store for Machine Learning


## Join us on Slack!

👋👋👋 Come say hi on Slack!

Check out our DeepWiki!

## Overview

Feast (**Fea**ture **St**ore) is an open source feature store for machine learning. Feast is the fastest path to manage existing infrastructure to productionize analytic data for model training and online inference.

Feast allows ML platform teams to:

* **Make features consistently available for training and serving** by managing an *offline store* (to process historical data for scale-out batch scoring or model training), a low-latency *online store* (to power real-time prediction), and a battle-tested *feature server* (to serve pre-computed features online).
* **Avoid data leakage** by generating point-in-time correct feature sets so data scientists can focus on feature engineering rather than debugging error-prone dataset joining logic. This ensures that future feature values do not leak to models during training.
* **Decouple ML from data infrastructure** by providing a single data access layer that abstracts feature storage from feature retrieval, ensuring models remain portable as you move from training models to serving models, from batch models to realtime models, and from one data infra system to another.

Please see our documentation for more information about the project.

## 📐 Architecture

The above architecture is the minimal Feast deployment. Want to run the full Feast on Snowflake/GCP/AWS? Click here.

## 🐣 Getting Started

### 1. Install Feast

```commandline
pip install feast
```

### 2. Create a feature repository

```commandline
feast init my_feature_repo
cd my_feature_repo/feature_repo
```

### 3. Register your feature definitions and set up your feature store

```commandline
feast apply
```

### 4. Explore your data in the web UI (experimental)

```commandline
feast ui
```

### 5. Build a training dataset

```python
from feast import FeatureStore
import pandas as pd
from datetime import datetime

entity_df = pd.DataFrame.from_dict({
    "driver_id": [1001, 1002, 1003, 1004],
    "event_timestamp": [
        datetime(2021, 4, 12, 10, 59, 42),
        datetime(2021, 4, 12, 8, 12, 10),
        datetime(2021, 4, 12, 16, 40, 26),
        datetime(2021, 4, 12, 15, 1, 12)
    ]
})

store = FeatureStore(repo_path=".")

training_df = store.get_historical_features(
    entity_df=entity_df,
    features=[
        'driver_hourly_stats:conv_rate',
        'driver_hourly_stats:acc_rate',
        'driver_hourly_stats:avg_daily_trips'
    ],
).to_df()

print(training_df.head())

# Train model
# model = ml.fit(training_df)
```

```commandline
            event_timestamp  driver_id  conv_rate  acc_rate  avg_daily_trips
0 2021-04-12 08:12:10+00:00       1002   0.713465  0.597095              531
1 2021-04-12 10:59:42+00:00       1001   0.072752  0.044344               11
2 2021-04-12 15:01:12+00:00       1004   0.658182  0.079150              220
3 2021-04-12 16:40:26+00:00       1003   0.162092  0.309035              959
```

### 6. Load feature values into your online store

```commandline
CURRENT_TIME=$(date -u +"%Y-%m-%dT%H:%M:%S")
feast materialize-incremental $CURRENT_TIME
```

```commandline
Materializing feature view driver_hourly_stats from 2021-04-14 to 2021-04-15 done!
```

### 7. Read online features at low latency

```python
from pprint import pprint
from feast import FeatureStore

store = FeatureStore(repo_path=".")

feature_vector = store.get_online_features(
    features=[
        'driver_hourly_stats:conv_rate',
        'driver_hourly_stats:acc_rate',
        'driver_hourly_stats:avg_daily_trips'
    ],
    entity_rows=[{"driver_id": 1001}]
).to_dict()

pprint(feature_vector)

# Make prediction
# model.predict(feature_vector)
```

```json
{
    "driver_id": [1001],
    "driver_hourly_stats__conv_rate": [0.49274],
    "driver_hourly_stats__acc_rate": [0.92743],
    "driver_hourly_stats__avg_daily_trips": [72]
}
```

## 📦 Functionality and Roadmap

The list below contains the functionality that contributors are planning to develop for Feast. We welcome contributions to all items in the roadmap!

* **Natural Language Processing**
  * [x] Vector Search (Alpha release. See RFC)
  * [ ] Enhanced Feature Server and SDK for native support for NLP
* **Data Sources**
  * [x] Snowflake source
  * [x] Redshift source
  * [x] BigQuery source
  * [x] Parquet file source
  * [x] Azure Synapse + Azure SQL source (contrib plugin)
  * [x] Hive (community plugin)
  * [x] Postgres (contrib plugin)
  * [x] Spark (contrib plugin)
  * [x] Couchbase (contrib plugin)
  * [x] Kafka / Kinesis sources (via push support into the online store)
* **Offline Stores**
  * [x] Snowflake
  * [x] Redshift
  * [x] BigQuery
  * [x] Azure Synapse + Azure SQL (contrib plugin)
  * [x] Hive (community plugin)
  * [x] Postgres (contrib plugin)
  * [x] Trino (contrib plugin)
  * [x] Spark (contrib plugin)
  * [x] Couchbase (contrib plugin)
  * [x] In-memory / Pandas
  * [x] Custom offline store support
* **Online Stores**
  * [x] Snowflake
  * [x] DynamoDB
  * [x] Redis
  * [x] Datastore
  * [x] Bigtable
  * [x] SQLite
  * [x] Dragonfly
  * [x] IKV - Inlined Key Value Store
  * [x] Azure Cache for Redis (community plugin)
  * [x] Postgres (contrib plugin)
  * [x] Cassandra / AstraDB (contrib plugin)
  * [x] ScyllaDB (contrib plugin)
  * [x] Couchbase (contrib plugin)
  * [x] Custom online store support
* **Feature Engineering**
  * [x] On-demand Transformations (On Read) (Beta release. See RFC)
  * [x] Streaming Transformations (Alpha release. See RFC)
  * [ ] Batch transformation (In progress. See RFC)
  * [x] On-demand Transformations (On Write) (Beta release. See GitHub Issue)
* **Streaming**
  * [x] Custom streaming ingestion job support
  * [x] Push based streaming data ingestion to online store
  * [x] Push based streaming data ingestion to offline store
* **Deployments**
  * [x] AWS Lambda (Alpha release. See RFC)
  * [x] Kubernetes (See guide)
* **Feature Serving**
  * [x] Python Client
  * [x] Python feature server
  * [x] Feast Operator (alpha)
  * [x] Java feature server (alpha)
  * [x] Go feature server (alpha)
  * [x] Offline Feature Server (alpha)
  * [x] Registry server (alpha)
* **Data Quality Management (See RFC)**
  * [x] Data profiling and validation (Great Expectations)
* **Feature Discovery and Governance**
  * [x] Python SDK for browsing feature registry
  * [x] CLI for browsing feature registry
  * [x] Model-centric feature tracking (feature services)
  * [x] Amundsen integration (see Feast extractor)
  * [x] DataHub integration (see DataHub Feast docs)
  * [x] Feast Web UI (Beta release. See docs)
  * [ ] Feast Lineage Explorer

## 🎓 Important Resources

Please refer to the official documentation at Documentation:

* Quickstart
* Tutorials
* Examples
* Running Feast with Snowflake/GCP/AWS
* Change Log

## 👋 Contributing

Feast is a community project and is still under active development. Please have a look at our contributing and development guides if you want to contribute to the project:

- Contribution Process for Feast
- Development Guide for Feast
- Development Guide for the Main Feast Repository

## 🌟 GitHub Star History

Star History Chart

Thanks goes to these incredible people:
