apfel
Use the FREE local Apple Intelligence LLM on your Mac - your model, your machine, your way.
No API keys. No cloud. No subscriptions. No per-token billing. The AI is already on your computer - apfel lets you use it.
What is this
Every Mac with Apple Silicon has a built-in LLM - Apple's on-device foundation model, shipped as part of Apple Intelligence. Apple provides the FoundationModels framework (macOS 26+) to access it, but out of the box the model only surfaces through Siri and system features. apfel wraps it in a CLI and an HTTP server - so you can actually use it. All inference runs on-device, no network calls.
- UNIX tool - `echo "summarize this" | apfel` - pipe-friendly, file attachments, JSON output, exit codes
- OpenAI-compatible server - `apfel --serve` - drop-in replacement at `localhost:11434`, works with any OpenAI SDK
- Tool calling - function calling with schema conversion, full round-trip support
- Zero cost - no API keys, no cloud, no subscriptions, 4096-token context window

Requirements & Install
- Apple Silicon Mac, macOS 26 Tahoe or newer, Apple Intelligence enabled
- Building from source requires Command Line Tools with macOS 26.4 SDK (ships Swift 6.3). No Xcode required.
Homebrew (recommended):
brew tap Arthur-Ficial/tap
brew install apfel
Update:
brew upgrade apfel
Build from source:
git clone https://github.com/Arthur-Ficial/apfel.git
cd apfel
make install
Troubleshooting: docs/install.md
Quick Start
UNIX tool
Shell note: if your prompt contains !, prefer single quotes in zsh/bash so history expansion does not break copy-paste. Example: apfel 'Hello, Mac!'
# Single prompt
apfel "What is the capital of Austria?"
# Permissive mode -- reduces guardrail false positives for creative/long prompts
apfel --permissive "Write a dramatic opening for a thriller novel"
# Stream output
apfel --stream "Write a haiku about code"
# Pipe input
echo "Summarize: $(cat README.md)" | apfel
# Attach file content to prompt
apfel -f README.md "Summarize this project"
# Attach multiple files
apfel -f old.swift -f new.swift "What changed between these two files?"
# Combine files with piped input
git diff HEAD~1 | apfel -f CONVENTIONS.md "Review this diff against our conventions"
# JSON output for scripting
apfel -o json "Translate to German: hello" | jq .content
# System prompt
apfel -s "You are a pirate" "What is recursion?"
# System prompt from file
apfel --system-file persona.txt "Explain TCP/IP"
# Quiet mode for shell scripts
result=$(apfel -q "Capital of France? One word.")
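The `-o json` output can also be consumed without jq. A minimal sketch, assuming the payload carries a `content` field as the `jq .content` example above implies (the full JSON shape is not documented here, and the `ask` helper is illustrative):

```python
import json
import subprocess

def ask(prompt):
    """Call apfel with JSON output and return the content field."""
    out = subprocess.run(
        ["apfel", "-o", "json", prompt],
        capture_output=True, text=True, check=True,
    ).stdout
    return json.loads(out)["content"]

# Parsing works the same on a captured payload:
sample = '{"content": "hallo"}'
print(json.loads(sample)["content"])  # hallo
```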
OpenAI-compatible server
Start the server:
apfel --serve
Then in another terminal:
curl http://localhost:11434/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model":"apple-foundationmodel","messages":[{"role":"user","content":"Hello"}]}'
Works with the official Python client:
from openai import OpenAI
client = OpenAI(base_url="http://localhost:11434/v1", api_key="unused")
resp = client.chat.completions.create(
model="apple-foundationmodel",
messages=[{"role": "user", "content": "What is 1+1?"}],
)
print(resp.choices[0].message.content)
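Streaming goes over the same endpoint with `stream: true`, delivered as server-sent events. A sketch of a chunk parser, assuming the stream follows the standard OpenAI SSE chunk format that apfel advertises compatibility with (the sample lines below are illustrative):

```python
import json

def sse_deltas(lines):
    """Yield content deltas from an OpenAI-style SSE chat stream."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data: ") or line == "data: [DONE]":
            continue
        chunk = json.loads(line[len("data: "):])
        delta = chunk["choices"][0]["delta"].get("content")
        if delta is not None:
            yield delta

# Two content chunks followed by the stream terminator:
sample = [
    'data: {"choices":[{"delta":{"content":"Hel"}}]}',
    'data: {"choices":[{"delta":{"content":"lo"}}]}',
    "data: [DONE]",
]
print("".join(sse_deltas(sample)))  # Hello
```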
Interactive chat
apfel --chat
apfel --chat -s "You are a helpful coding assistant"
apfel --chat --mcp ./mcp/calculator/server.py # chat with MCP tools
apfel --chat --debug # debug output to stderr
Ctrl-C exits cleanly. Context window is managed automatically with configurable strategies:
apfel --chat --context-strategy newest-first # default: keep recent turns
apfel --chat --context-strategy oldest-first # keep earliest turns
apfel --chat --context-strategy sliding-window --context-max-turns 6
apfel --chat --context-strategy summarize # compress old turns via on-device model
apfel --chat --context-strategy strict # error on overflow, no trimming
apfel --chat --context-output-reserve 256 # custom output token reserve
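The trimming strategies above can be pictured as pure functions over the turn history. A hypothetical sketch of `sliding-window` and `newest-first` (apfel's real accounting works on token counts via the SDK tokenizer, not the toy cost function used here):

```python
def sliding_window(turns, max_turns):
    """Keep only the most recent max_turns turns."""
    return turns[-max_turns:] if max_turns > 0 else []

def newest_first(turns, budget, cost):
    """Keep the newest turns whose summed cost fits within budget."""
    kept, used = [], 0
    for turn in reversed(turns):
        c = cost(turn)
        if used + c > budget:
            break
        kept.append(turn)
        used += c
    return list(reversed(kept))

history = ["t1", "t2", "t3", "t4", "t5"]
print(sliding_window(history, 3))     # ['t3', 't4', 't5']
print(newest_first(history, 4, len))  # ['t4', 't5'] (each turn costs 2)
```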
Updating
apfel --update
Or directly via Homebrew: brew upgrade apfel
Benchmarking
Measure internal overhead on the installed release binary:
apfel --benchmark -o json
make benchmark
File attachments (-f/--file)
Attach files to any prompt with -f (repeatable). Contents are prepended to your prompt.
apfel -f main.swift "Explain what this code does"
apfel -f before.txt -f after.txt "What are the differences?"
git diff HEAD~1 | apfel -f style-guide.md "Any style violations in this diff?"
apfel -f data.csv -o json "Extract the top 5 rows" | jq .content
Files, stdin, and prompt arguments all compose:
apfel -f poem.txt # file only
apfel -f poem.txt "Translate this to German" # file + prompt
echo "some text" | apfel "Summarize this" # stdin + prompt
echo "ctx" | apfel -f code.swift "Explain with context" # all three
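A sketch of how those three inputs might compose into a single prompt. This is illustrative only - apfel's exact prepend format and separators are not specified here:

```python
def compose_prompt(files, stdin_text, prompt):
    """Prepend file contents, then piped input, then the prompt text."""
    parts = [f"--- {name} ---\n{body}" for name, body in files]
    if stdin_text:
        parts.append(stdin_text)
    if prompt:
        parts.append(prompt)
    return "\n\n".join(parts)

p = compose_prompt([("code.swift", "let x = 1")], "ctx", "Explain with context")
print(p.startswith("--- code.swift ---"))  # True
```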
Demos
See demo/ for real-world shell scripts powered by apfel.
cmd - natural language to shell command:
demo/cmd "find all .log files modified today"
# $ find . -name "*.log" -type f -mtime -1
demo/cmd -x "show disk usage sorted by size" # -x = execute after confirm
demo/cmd -c "list open ports" # -c = copy to clipboard
Shell function version - add to your .zshrc and use cmd from anywhere:
# cmd - natural language to shell command (apfel). Add to .zshrc:
cmd(){ local x c r a; while [[ $1 == -* ]]; do case $1 in -x)x=1;shift;; -c)c=1;shift;; *)break;; esac; done; r=$(apfel -q -s 'Output only a shell command.' "$*" | sed '/^```/d;/^#/d;s/\x1b\[[0-9;]*[a-zA-Z]//g;s/^[[:space:]]*//;/^$/d' | head -1); [[ $r ]] || { echo "no command generated"; return 1; }; printf '\e[32m$\e[0m %s\n' "$r"; [[ $c ]] && printf %s "$r" | pbcopy && echo "(copied)"; [[ $x ]] && { printf 'Run? [y/N] '; read -r a; [[ $a == y ]] && eval "$r"; }; return 0; }
cmd find all swift files larger than 1MB # shows: $ find . -name "*.swift" -size +1M
cmd -c show disk usage sorted by size # shows command + copies to clipboard
cmd -x what process is using port 3000 # shows command + asks to run it
cmd list all git branches merged into main
cmd count lines of code by language
oneliner - complex pipe chains from plain English:
demo/oneliner "sum the third column of a CSV"
# $ awk -F',' '{sum += $3} END {print sum}' file.csv
demo/oneliner "count unique IPs in access.log"
# $ awk '{print $1}' access.log | sort | uniq -c | sort -rn
mac-narrator - your Mac's inner monologue:
demo/mac-narrator # one-shot: what's happening right now?
demo/mac-narrator --watch # continuous narration every 60s
Also in demo/:
- wtd - "what's this directory?" - instant project orientation
- explain - explain a command, error, or code snippet
- naming - naming suggestions for functions, variables, files
- port - what's using this port?
- gitsum - summarize recent git activity
Debug GUI
brew install Arthur-Ficial/tap/apfel-gui

Native SwiftUI debug inspector with request timeline, MCP protocol viewer, chat, and TTS/STT - all on-device. See the apfel-gui repo.
MCP Tool Support
Attach MCP tool servers with --mcp. apfel discovers tools, executes them automatically, and returns the final answer. No glue code needed.
apfel --mcp ./mcp/calculator/server.py "What is 15 times 27?"
mcp: ./mcp/calculator/server.py - add, subtract, multiply, divide, sqrt, power ← stderr
tool: multiply({"a": 15, "b": 27}) = 405 ← stderr
15 times 27 is 405. ← stdout
Tool info goes to stderr; only the answer goes to stdout. Use -q to suppress tool info.
apfel --mcp ./server_a.py --mcp ./server_b.py "Use both tools" # multiple servers
apfel --serve --mcp ./mcp/calculator/server.py # server mode
apfel --chat --mcp ./mcp/calculator/server.py # chat mode
Ships with a calculator MCP server at mcp/calculator/. See MCP docs for details.
OpenAI API Compatibility
Base URL: http://localhost:11434/v1
| Feature | Status | Notes |
|---|---|---|
| `POST /v1/chat/completions` | Supported | Streaming + non-streaming |
| `GET /v1/models` | Supported | Returns `apple-foundationmodel` |
| `GET /health` | Supported | Model availability, context window, languages |
| `GET /v1/logs`, `/v1/logs/stats` | Debug only | Requires `--debug` |
| Tool calling | Supported | Native ToolDefinition + JSON detection. See Tool Calling Guide |
| `response_format: json_object` | Supported | Via system prompt injection |
| `temperature`, `max_tokens`, `seed` | Supported | Mapped to GenerationOptions |
| `stream: true` | Supported | SSE with usage stats in final chunk |
| `finish_reason` | Supported | `stop`, `tool_calls`, `length` |
| Context strategies | Supported | `x_context_strategy`, `x_context_max_turns`, `x_context_output_reserve` extension fields |
| CORS | Supported | Enable with `--cors` |
| `POST /v1/completions` | 501 | Legacy text completions not supported |
| `POST /v1/embeddings` | 501 | Embeddings not available on-device |
| `logprobs=true`, `n>1`, `stop`, `presence_penalty`, `frequency_penalty` | 400 | Rejected explicitly. `n=1` and `logprobs=false` are accepted as no-ops |
| Multi-modal (images) | 400 | Rejected with clear error |
| `Authorization` header | Supported | Required when `--token` is set. See Server Security |
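Tool calling uses the standard OpenAI function-calling schema on the wire; the server converts it to native ToolDefinitions internally. A client-side sketch of one round trip (the `multiply` tool and the sample call payload are illustrative, not part of apfel):

```python
import json

# Declare a tool in the standard OpenAI function-calling schema.
tools = [{
    "type": "function",
    "function": {
        "name": "multiply",
        "description": "Multiply two numbers",
        "parameters": {
            "type": "object",
            "properties": {"a": {"type": "number"}, "b": {"type": "number"}},
            "required": ["a", "b"],
        },
    },
}]

def handle_tool_call(call):
    """Execute the requested tool locally and build the follow-up message."""
    args = json.loads(call["function"]["arguments"])
    result = args["a"] * args["b"]
    return {"role": "tool", "tool_call_id": call["id"], "content": str(result)}

# A response with finish_reason == "tool_calls" carries calls shaped like this:
call = {"id": "call_1",
        "function": {"name": "multiply", "arguments": '{"a": 15, "b": 27}'}}
print(handle_tool_call(call)["content"])  # 405
```

The tool message is then appended to `messages` and the conversation is re-sent, which is the round trip the compatibility table refers to.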
Full API spec: openai/openai-openapi
Limitations
| Constraint | Detail |
|---|---|
| Context window | 4096 tokens (input + output combined) |
| Platform | macOS 26+, Apple Silicon only |
| Model | One model (apple-foundationmodel), not configurable |
| Guardrails | Apple's safety system may block benign prompts (use --permissive to reduce false positives). See comparison |
| Speed | On-device, not cloud-scale - a few seconds per response |
| No embeddings / vision | Not available on-device |
CLI Reference
MODES
apfel <prompt> Single prompt (default)
apfel --stream <prompt> Stream response tokens
apfel --chat Interactive conversation
apfel --serve Start OpenAI-compatible server
apfel --benchmark Run internal performance benchmarks
INPUT
apfel -f, --file <path> <prompt> Attach file content (repeatable)
apfel -s, --system <text> <prompt> Set system prompt
apfel --system-file <path> <prompt> Read system prompt from file
apfel --mcp <server.py> <prompt> Attach MCP tool server (repeatable)
OUTPUT
-o, --output <fmt> Output format: plain, json
-q, --quiet Suppress non-essential output
--no-color Disable ANSI colors
MODEL
--temperature <n> Sampling temperature (e.g., 0.7)
--seed <n> Random seed for reproducibility
--max-tokens <n> Maximum response tokens
--permissive Relaxed guardrails (reduces false positives)
--retry [n] Retry transient errors with backoff (default: 3)
--debug Enable debug logging to stderr (all modes)
CONTEXT (--chat)
--context-strategy <s> newest-first, oldest-first, sliding-window, summarize, strict
--context-max-turns <n> Max history turns (sliding-window only)
--context-output-reserve <n> Tokens reserved for output (default: 512)
SERVER (--serve)
--port <n> Server port (default: 11434)
--host <addr> Bind address (default: 127.0.0.1)
--cors Enable CORS headers
--allowed-origins <origins> Comma-separated allowed origins
--no-origin-check Disable origin checking
--token <secret> Require Bearer token auth
--token-auto Generate random Bearer token
--public-health Keep /health unauthenticated
--footgun Disable all protections
--max-concurrent <n> Max concurrent requests (default: 5)
META
-v, --version Print version
-h, --help Show help
--release Detailed build info
--model-info Print model capabilities
--update Check for updates via Homebrew
Examples by flag:
# -f, --file — attach file content to prompt (repeatable)
apfel -f main.swift "Explain this code"
apfel -f before.txt -f after.txt "What changed?"
# -s, --system — set a system prompt
apfel -s "You are a pirate" "What is recursion?"
apfel -s "Reply in JSON only" "List 3 colors"
# --system-file — read system prompt from a file
apfel --system-file persona.txt "Introduce yourself"
# --mcp — attach MCP tool servers (repeatable)
apfel --mcp ./mcp/calculator/server.py "What is 15 times 27?"
apfel --mcp ./calc.py --mcp ./weather.py "Use both tools"
# -o, --output
apfel -o json "Translate to German: hello" | jq .content
# -q, --quiet
apfel -q "Give me a UUID"
# --no-color
NO_COLOR=1 apfel "Hello"
# --temperature
apfel --temperature 0.0 "What is 2+2?"
apfel --temperature 1.5 "Write a wild poem"
# --seed
apfel --seed 42 "Tell me a joke"
# --max-tokens
apfel --max-tokens 50 "Explain quantum computing"
# --permissive — relaxed guardrails (see docs/PERMISSIVE.md for comparison)
apfel --permissive "Write a villain monologue"
apfel --permissive -f long-document.md "Summarize this"
# --retry
apfel --retry "What is 2+2?"
# --debug
apfel --debug "Hello world"
# --stream
apfel --stream "Write a haiku about code"
# --chat
apfel --chat
apfel --chat -s "You are a helpful coding assistant"
# --context-strategy
apfel --chat --context-strategy newest-first # default
apfel --chat --context-strategy sliding-window --context-max-turns 6
apfel --chat --context-strategy summarize # compress old turns
# --serve
apfel --serve
apfel --serve --port 3000 --host 0.0.0.0
# --cors, --token, --footgun
apfel --serve --cors
apfel --serve --token "my-secret-token"
apfel --serve --footgun # only for local development!
# --token-auto, --public-health
apfel --serve --token-auto --host 0.0.0.0 --public-health
# --allowed-origins, --no-origin-check
apfel --serve --allowed-origins "https://myapp.com,https://staging.myapp.com"
apfel --serve --no-origin-check
# --max-concurrent
apfel --serve --max-concurrent 2
# --debug (server: also enables /v1/logs)
apfel --serve --debug
# --context-output-reserve
apfel --chat --context-output-reserve 256
# --benchmark, --model-info, --update, --release, --version, --help
apfel --benchmark -o json | jq '.benchmarks[] | {name, speedup_ratio}'
apfel --model-info
apfel --update
apfel --release
apfel --version
apfel --help
See Server Security for detailed documentation on security options.
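When `--token` is set, clients must send an `Authorization: Bearer` header. A sketch of building such a request with the standard library (the request is constructed but never sent here; the token value is illustrative):

```python
import json
import urllib.request

def authed_request(url, payload, token):
    """Build a chat request carrying the Bearer token required by --token."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
    )

req = authed_request(
    "http://localhost:11434/v1/chat/completions",
    {"model": "apple-foundationmodel", "messages": []},
    "my-secret-token",
)
print(req.get_header("Authorization"))  # Bearer my-secret-token
```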
Exit Codes
| Code | Meaning |
|---|---|
| 0 | Success |
| 1 | Runtime error |
| 2 | Usage error (bad flags) |
| 3 | Guardrail blocked |
| 4 | Context overflow |
| 5 | Model unavailable |
| 6 | Rate limited |
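Scripts can branch on those codes. A sketch of a wrapper; only the mapping itself comes from the table above, the `run_apfel` helper is illustrative:

```python
import subprocess

EXIT_MEANINGS = {
    0: "success",
    1: "runtime error",
    2: "usage error (bad flags)",
    3: "guardrail blocked",
    4: "context overflow",
    5: "model unavailable",
    6: "rate limited",
}

def run_apfel(args):
    """Run apfel and return (stdout, human-readable exit status)."""
    proc = subprocess.run(["apfel", *args], capture_output=True, text=True)
    return proc.stdout, EXIT_MEANINGS.get(proc.returncode, "unknown")

print(EXIT_MEANINGS[3])  # guardrail blocked
```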
Environment Variables
| Variable | Description |
|---|---|
| `APFEL_SYSTEM_PROMPT` | Default system prompt |
| `APFEL_HOST` | Server bind address |
| `APFEL_PORT` | Server port |
| `APFEL_TOKEN` | Bearer token for server authentication |
| `APFEL_TEMPERATURE` | Default temperature |
| `APFEL_MAX_TOKENS` | Default max tokens |
| `APFEL_CONTEXT_STRATEGY` | Default context strategy |
| `APFEL_CONTEXT_MAX_TURNS` | Max turns for sliding-window |
| `APFEL_CONTEXT_OUTPUT_RESERVE` | Tokens reserved for output |
| `NO_COLOR` | Disable colors (no-color.org) |
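These variables act as defaults, which explicit flags can override. A sketch of the assumed resolution order (flag, then environment variable, then built-in default - this precedence is a common CLI convention, not verified against apfel's source):

```python
import os

def resolve(flag_value, env_name, default):
    """Pick a setting: explicit flag beats env var beats built-in default."""
    if flag_value is not None:
        return flag_value
    return os.environ.get(env_name, default)

os.environ["APFEL_PORT"] = "3000"
print(resolve(None, "APFEL_PORT", "11434"))    # 3000 (from environment)
print(resolve("8080", "APFEL_PORT", "11434"))  # 8080 (flag wins)
```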
Architecture
CLI (single/stream/chat) ──┐
├─→ FoundationModels.SystemLanguageModel
HTTP Server (/v1/*) ───────┘ (100% on-device, zero network)
ContextManager → Transcript API
SchemaConverter → native ToolDefinitions
TokenCounter → real token counts (SDK 26.4)
Swift 6.3 strict concurrency. Three targets: ApfelCore (pure logic, unit-testable), apfel (CLI + server), apfel-tests (pure Swift runner, no XCTest). No Xcode required.
Build & Test
make install # build release + install to /usr/local/bin
make build # build release only
make version # print current version
make release-minor # bump minor: 0.6.x -> 0.7.0
swift build # quick debug build (no version bump)
swift run apfel-tests # unit tests
python3 -m pytest Tests/integration/ -v # integration tests (auto-starts servers)
apfel --benchmark -o json # performance report
Every make build/make install auto-bumps the patch version, updates the README badge, and generates build metadata (.version is the single source of truth).
Related Projects
- apfel-clip - AI clipboard actions from the menu bar
- apfel-gui - Native macOS debug GUI (inspector, MCP viewer, TTS/STT)
Examples
See docs/EXAMPLES.md for 50+ real prompts with unedited model output.
