📚 book-to-skill

Turn any technical book or document into a Claude Code skill — ready to study, reference, and use while you work.

Why · What it generates · Usage · Requirements · How it works · FAQ · Install


You buy a great technical book. You read it once. Three months later you can't remember chapter 7 existed.

The usual workarounds don't help:

  • 📄 "Let me just search the PDF" → you get a list of pages, not answers
  • 🧠 "I'll ask Claude about this book" → it either hallucinates or says it doesn't have the content
  • 📝 "I'll take notes as I read" → you end up with a 200-line doc you never open again

book-to-skill solves this by turning the book into a structured skill Claude loads on demand.

Once installed, you just type /your-book-slug replication and Claude reads the right chapter and answers from the actual content. No hallucination. No digging through PDFs. The book becomes part of your workflow.


📦 What it generates

Running /book-to-skill your-book.pdf (or .epub) creates a full skill at ~/.claude/skills/<slug>/:

| File | Purpose | Size |
| --- | --- | --- |
| SKILL.md | Core mental models + chapter index | ~4,000 tokens |
| chapters/ch01-*.md | One file per chapter, loaded on-demand | ~1,000 tokens each |
| glossary.md | Every key term, alphabetically sorted with chapter refs | ~1,500 tokens |
| patterns.md | All techniques, algorithms, and design patterns | ~2,000 tokens |
| cheatsheet.md | Decision tables and quick-reference rules | ~1,000 tokens |

Chapter files are loaded on-demand — they don't count against the skill budget until you ask about that topic.
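
As a rough illustration of that layout, the sketch below lists the paths a generated skill would contain. `skill_layout` is a hypothetical helper, not part of book-to-skill, and the `chNN-<chapter-slug>.md` naming is an assumption based on the `ch01-*.md` pattern in the table.

```python
from pathlib import Path


def skill_layout(slug: str, chapter_slugs: list[str]) -> list[Path]:
    """List the files a generated skill is expected to contain (illustrative only)."""
    root = Path.home() / ".claude" / "skills" / slug
    fixed = ["SKILL.md", "glossary.md", "patterns.md", "cheatsheet.md"]
    paths = [root / name for name in fixed]
    for number, chapter_slug in enumerate(chapter_slugs, start=1):
        # chNN- prefix as in the table; the suffix format is an assumption.
        paths.append(root / "chapters" / f"ch{number:02d}-{chapter_slug}.md")
    return paths


for path in skill_layout("clean-code", ["clean-code", "meaningful-names"]):
    print(path)
```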


🚀 Usage

/book-to-skill <path-to-document> [skill-name-slug]

Supported document formats: PDF, EPUB, DOCX, TXT, Markdown, reStructuredText, AsciiDoc, HTML, RTF, MOBI/AZW/AZW3.

Examples:

# PDF — derive skill name from filename
/book-to-skill ~/Downloads/designing-data-intensive-applications.pdf

# EPUB — specify a custom slug
/book-to-skill ~/books/clean-code.epub clean-code

# Full path with explicit name
/book-to-skill /tmp/ddd-evans.pdf domain-driven-design
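
When no slug is given, the name is derived from the filename. The exact rule is not documented here, so the sketch below shows one plausible approach: lowercase the file stem and collapse every non-alphanumeric run into a hyphen. `derive_slug` and its `max_length` parameter are hypothetical.

```python
import re
from pathlib import Path


def derive_slug(document_path: str, max_length: int = 64) -> str:
    """Hypothetical helper: turn a document filename into a skill-name slug."""
    stem = Path(document_path).stem  # drop directories and the extension
    slug = re.sub(r"[^a-z0-9]+", "-", stem.lower()).strip("-")
    # Trim to the limit, then drop any hyphen left dangling by the cut.
    return slug[:max_length].rstrip("-") or "book"


print(derive_slug("/tmp/ddd-evans.pdf"))       # ddd-evans
print(derive_slug("~/books/Clean Code.epub"))  # clean-code
```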

After the skill is created, use it like any other Claude Code skill. The examples below assume it was created with the slug designing-data-intensive-apps:

/designing-data-intensive-apps                  # load core mental models
/designing-data-intensive-apps replication      # find and explain a topic
/designing-data-intensive-apps ch05             # dive into chapter 5
/designing-data-intensive-apps "what chapters do you have?"

🔧 Requirements

The extractor tries tools in order per format and uses the first one available. If none is installed, it tells you which command to run. Plain text, Markdown, reStructuredText, and AsciiDoc need no extra dependencies.

PDF — choose by book type:

| Book type | Tool | Install | Speed |
| --- | --- | --- | --- |
| Text-heavy (prose, few tables) | pdftotext (poppler) | sudo apt install poppler-utils | ⚡ instant |
| Text-heavy fallback | PyPDF2 | pip3 install PyPDF2 | ⚡ instant |
| Text-heavy fallback | pdfminer.six | pip3 install pdfminer.six | ⚡ instant |
| Technical (code, tables, formulas) | docling | pip3 install docling | ~1.5s/page |

Before extraction begins, the skill asks you whether the book is technical or text-heavy and picks the right tool automatically. Docling preserves markdown tables and code blocks; pdftotext is faster for prose-only books.
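
The "first available tool wins" behavior can be sketched as follows. This is not the code in scripts/extract.py: `PDF_CANDIDATES`, `is_available`, and `pick_pdf_tool` are hypothetical names, and the sketch assumes technical mode does not fall back to the text tools when docling is missing.

```python
import importlib.util
import shutil
from typing import Optional

# Candidate order per mode, following the table above.
# Each entry is (label, kind, name to probe); kind is "cli" or "module".
PDF_CANDIDATES = {
    "text": [
        ("pdftotext", "cli", "pdftotext"),
        ("PyPDF2", "module", "PyPDF2"),
        ("pdfminer.six", "module", "pdfminer"),
    ],
    "technical": [("docling", "module", "docling")],
}


def is_available(kind: str, name: str) -> bool:
    """Check for an executable on PATH or an importable top-level module."""
    if kind == "cli":
        return shutil.which(name) is not None
    return importlib.util.find_spec(name) is not None


def pick_pdf_tool(mode: str, probe=is_available) -> Optional[str]:
    """Return the first usable tool for the mode, or None if nothing is installed."""
    for label, kind, name in PDF_CANDIDATES[mode]:
        if probe(kind, name):
            return label
    return None


print(pick_pdf_tool("text"))
```

Passing `probe` as a parameter keeps the selection logic testable without installing any of the tools.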

EPUB:

| Tool | Install | Quality |
| --- | --- | --- |
| ebooklib + beautifulsoup4 | pip3 install ebooklib beautifulsoup4 | ⭐⭐⭐ Best |
| stdlib zipfile | built-in, no install needed | ⭐⭐ Always available |
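
The stdlib fallback works because an EPUB is a ZIP archive of (X)HTML files. Below is a minimal sketch of that idea, assuming UTF-8 content. `epub_to_text` is a hypothetical function, not the project's implementation, and it reads members in archive order rather than the reading order declared in the EPUB's OPF spine.

```python
import zipfile
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def epub_to_text(source) -> str:
    """Extract text from every (X)HTML member of an EPUB (path or file object)."""
    chunks = []
    with zipfile.ZipFile(source) as archive:
        for name in archive.namelist():
            if name.lower().endswith((".xhtml", ".html", ".htm")):
                parser = _TextCollector()
                parser.feed(archive.read(name).decode("utf-8", errors="replace"))
                parser.close()
                chunks.append("\n".join(parser.parts))
    return "\n\n".join(chunks)
```

Calling `epub_to_text("book.epub")` returns plain text with a blank line between files. Tables and code formatting are lost, which is one reason a dedicated EPUB library gives better results.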

Other formats:

| Format | Tool | Install |
| --- | --- | --- |
| DOCX | python-docx (fallback: stdlib ZIP/XML) | pip3 install python-docx |
| HTML | beautifulsoup4 (fallback: stdlib html.parser) | pip3 install beautifulsoup4 |
| RTF | striprtf (fallback: regex) | pip3 install striprtf |
| MOBI / AZW / AZW3 | Calibre ebook-convert (external app, not pip) | https://calibre-ebook.com/download |
| TXT / Markdown / reStructuredText / AsciiDoc | built-in | none |
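
The "stdlib ZIP/XML" fallback for DOCX relies on the fact that a .docx file is a ZIP archive whose main text lives in word/document.xml. A minimal sketch of that approach, with the hypothetical name `docx_to_text` (the project's own fallback may differ):

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by the elements in word/document.xml.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_to_text(source) -> str:
    """Return one line per non-empty paragraph of a .docx main document part."""
    with zipfile.ZipFile(source) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for paragraph in root.iter(f"{W_NS}p"):
        # A paragraph's text is split across runs; each run stores it in <w:t>.
        text = "".join(node.text or "" for node in paragraph.iter(f"{W_NS}t"))
        if text.strip():
            paragraphs.append(text)
    return "\n".join(paragraphs)
```

This ignores headers, footnotes, and styling, so heading levels are not recoverable from the output.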

⚙️ How it works

PDF or EPUB
     │
     ▼
Step 1.5 — "Technical or text-heavy book?"
     │
     ├── technical → Docling  (tables + code blocks as markdown, ~1.5s/page)
     └── text      → pdftotext → PyPDF2 → pdfminer  (instant)
     │
     ▼
scripts/extract.py --mode <technical|text>
  EPUB → ebooklib → stdlib zipfile
     │
     ├── /tmp/book_skill_work/full_text.txt
     └── /tmp/book_skill_work/metadata.json
               │
               ▼
          Claude analyzes structure
          (title, author, chapters, ToC)
               │
               ▼
          Generates per-chapter summaries  (800–1,200 tokens each)
          technical → includes Code Examples + Reference Tables sections
          Generates glossary, patterns, cheatsheet
          Generates master SKILL.md with core mental models
               │
               ▼
          ~/.claude/skills/<slug>/  ✅ written
          /tmp/book_skill_work/     🗑️  cleaned up

Extraction benchmark (103-page technical book, CPU only):

| Method | Time | Tokens | Tables | Code blocks |
| --- | --- | --- | --- | --- |
| pdftotext | 0.1s | 27K | 0 | 0 |
| Docling | 164s | 27K (+1.2%) | 48 | 36 |
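
To translate the benchmark into an estimate for your own book, the arithmetic below extrapolates linearly from this single run. It is a rough guide only: actual Docling time depends on hardware and page content, and `estimate_docling_minutes` is a hypothetical helper.

```python
DOCLING_SECONDS = 164   # Docling time from the benchmark table
PAGES = 103             # page count of the benchmarked book
TOKENS = 27_000         # "27K" in the table, so a rounded figure

seconds_per_page = DOCLING_SECONDS / PAGES
tokens_per_page = TOKENS / PAGES


def estimate_docling_minutes(page_count: int) -> float:
    """Linear extrapolation from the single benchmark run above."""
    return page_count * seconds_per_page / 60


print(f"{seconds_per_page:.2f} s/page")                               # 1.59 s/page
print(f"{estimate_docling_minutes(400):.1f} min for 400 pages")       # 10.6 min for 400 pages
```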

Design principles:
  1. Density over completeness — a 1,000-token summary beats a 10,000-token excerpt
  2. Practitioner voice — "Use X when Y", not "The book explains X"
  3. Front-loaded SKILL.md — compaction keeps the first ~5,000 tokens; the most important content comes first
  4. On-demand chapters — the topic index tells Claude which file to read; chapters load only when needed
  5. Never raw text — always synthesize, summarize, extract signal from the source
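
Principle 4 can be pictured as a lookup from a topic to a chapter file. In practice Claude does this by reading the index in SKILL.md, so the sketch below is only an analogy. `TOPIC_INDEX`, `resolve_chapter`, and the chapter filenames are hypothetical.

```python
from typing import Optional

# Hypothetical index: topic keyword -> chapter file. How the generated SKILL.md
# actually stores its index is not specified in this README.
TOPIC_INDEX = {
    "replication": "chapters/ch05-replication.md",
    "partitioning": "chapters/ch06-partitioning.md",
    "transactions": "chapters/ch07-transactions.md",
}


def resolve_chapter(query: str, index: dict = TOPIC_INDEX) -> Optional[str]:
    """Return the chapter file for a topic word or a chNN reference, if any."""
    query = query.strip().lower()
    if query in index:
        return index[query]
    # A "ch05"-style argument selects a chapter by its number prefix.
    for path in index.values():
        if path.split("/")[-1].startswith(query + "-"):
            return path
    return None


print(resolve_chapter("replication"))  # chapters/ch05-replication.md
print(resolve_chapter("ch05"))         # chapters/ch05-replication.md
```

Only the file returned by the lookup needs to be read; every other chapter stays on disk.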

❓ FAQ

"Can't I just dump the PDF/EPUB into my Claude project context?"

You can — but every conversation will burn that token budget upfront. A 400-page book is ~200K tokens. With a skill, only the chapters relevant to your question load. The rest stays on disk until you need it.
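
A back-of-the-envelope comparison using the figures from this README (~200K tokens for the full text, ~4,000 for SKILL.md, ~1,000 per chapter file). It assumes only SKILL.md and the requested chapters are loaded, and `skill_tokens` is a hypothetical helper.

```python
FULL_BOOK_TOKENS = 200_000  # the ~200K estimate above for a 400-page book
SKILL_MD_TOKENS = 4_000     # SKILL.md size from the generated-files table
CHAPTER_TOKENS = 1_000      # approximate size of one chapter file


def skill_tokens(chapters_loaded: int) -> int:
    """Tokens in context when SKILL.md plus N chapter files are loaded."""
    return SKILL_MD_TOKENS + chapters_loaded * CHAPTER_TOKENS


for n in (0, 1, 3):
    used = skill_tokens(n)
    print(f"{n} chapters: {used} tokens, {used / FULL_BOOK_TOKENS:.1%} of the full text")
```

Under these assumptions, a question that touches three chapters loads about 7,000 tokens, or 3.5% of the full text.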

More importantly: raw text injection is retrieval. A skill is reasoning. When you load a chapter file, Claude isn't searching for keyword matches — it's working with pre-extracted named frameworks, principles, and mental models structured for application, not for reading.


"Isn't this just RAG?"

RAG works at query time: chunk the book → embed everything → find similar vectors → inject into prompt. It's optimized for "find me the part that talks about X."

book-to-skill works at compile time: one deep analysis run extracts the author's actual frameworks, names them, describes when to use each, and captures the anti-patterns. The output is the structure the author spent years building, not a similarity search over their sentences.

RAG answers: "here are chunks close to your query."
A skill answers: "here are the 12 frameworks this author built, ready to reason with."

For searching across 50+ books, RAG wins. For going deep on one book and using its frameworks while you work, a skill wins.
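
For contrast, the query-time RAG loop described in this answer looks roughly like the toy below. Word counts stand in for a real embedding model, so this illustrates the shape of the pipeline (chunk, embed, rank by similarity), not a production retriever. All function names are hypothetical.

```python
import math
import re
from collections import Counter


def chunk(text: str, size: int = 40) -> list[str]:
    """Split text into consecutive chunks of `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def embed(text: str) -> Counter:
    # Stand-in for a real embedding model: raw word counts.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]
```

The retrieved chunks are then pasted into the prompt as-is. A skill skips this step because the analysis already happened when the skill was generated.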


"Popular books are already in Claude's training data. Why bother?"

For widely-known books (Clean Code, DDIA, Pragmatic Programmer), Claude has general knowledge — but it's compressed, averaged across the entire internet's discussion of the book, and may hallucinate specific quotes or chapter locations.

book-to-skill works from your actual copy. Every framework name, every anti-pattern list, every chapter number is grounded in the text you provided. No training data drift, no hallucinated chapter titles.

It also shines for books Claude doesn't know at all: niche technical references, internal company documentation, recent publications, translated works.


"NotebookLM handles multiple books better."

Absolutely true — if your workflow is "I have 80 books and I want to search across all of them," NotebookLM is the right tool.

book-to-skill is built for a different job: you want to go deep on one book and have its frameworks embedded in your coding or writing workflow, not in a separate browser tab. It's less "library search" and more "the author is sitting next to you while you work."


📥 Install

Copy this into your Claude Code session:

Install book-to-skill: https://raw.githubusercontent.com/virgiliojr94/book-to-skill/master/SKILL.md

Or manually:

mkdir -p ~/.claude/skills/book-to-skill/scripts

# -f makes curl fail on HTTP errors instead of saving the error page as the file
curl -fsSL -o ~/.claude/skills/book-to-skill/SKILL.md \
  https://raw.githubusercontent.com/virgiliojr94/book-to-skill/master/SKILL.md

curl -fsSL -o ~/.claude/skills/book-to-skill/scripts/extract.py \
  https://raw.githubusercontent.com/virgiliojr94/book-to-skill/master/scripts/extract.py

Then in any Claude Code session:

/book-to-skill ~/path/to/your-book.pdf
# or
/book-to-skill ~/path/to/your-book.epub

📁 Repository structure

book-to-skill/
├── SKILL.md              # Skill definition + step-by-step instructions
├── scripts/
│   └── extract.py        # PDF + EPUB extraction (pdftotext / PyPDF2 / pdfminer / ebooklib / zipfile)
└── README.md             # This file

License

MIT
