DSPy: Programming Language Models
Table of Contents
1. Why DSPy?
2. Getting Started
3. Core Concepts
4. Building Your First Program
5. Optimization with Teleprompters
6. Advanced: GEPA Evolution
7. Practical Applications
8. Best Practices
9. Quick Reference
Introduction
DSPy revolutionizes how we work with language models by treating them as programmable systems rather than black boxes requiring manual prompt engineering. Instead of spending hours crafting the perfect prompt, DSPy lets you define what you want and automatically optimizes how to get it.
This guide will take you from DSPy basics to advanced optimization techniques, including GEPA (Genetic-Pareto), a reflective evolutionary optimizer for evolving high-performance LLM programs.
By the end, you'll understand how to build self-improving AI systems that learn better instructions and demonstrations automatically, dramatically reducing development time while improving performance.
Why This Guide Matters
DSPy represents a paradigm shift in LLM development. Understanding it will help you:
- Build more reliable and reproducible LLM applications
- Reduce prompt engineering time from days to minutes
- Create self-improving systems that optimize themselves
- Balance quality, cost, and latency automatically
1. Why DSPy?
The Problem: Traditional prompt engineering is brittle, time-consuming, and doesn't scale. You manually craft prompts, test them, tweak them, and repeat — often breaking what worked when you make changes.
DSPy's Solution
DSPy introduces a declarative approach where you define what you want (the task) and let optimizers figure out how to achieve it (the prompts and examples).
- Declarative Programming: Define inputs and outputs with Signatures
- Modular Composition: Combine simple modules into complex pipelines
- Automatic Optimization: Let teleprompters learn optimal prompts and demonstrations
- Reproducible Workflows: Version and deploy optimized programs reliably
Perfect For
Classification Tasks
Sentiment analysis, intent detection, categorization
RAG Systems
Question answering with retrieval augmentation
Multi-step Reasoning
Complex agents and chain-of-thought tasks
Tool Use
Function calling and API integration
Think of DSPy as "PyTorch for prompts" — it provides the building blocks and optimization algorithms to build sophisticated LLM applications without manual prompt engineering.
2. Getting Started
Installing and configuring DSPy is straightforward. You'll need a recent version of Python (3.9 or newer, depending on your DSPy release) and an API key for your preferred language model provider.
Installation
pip install dspy-ai
Basic Configuration
import dspy
# Configure with OpenAI
lm = dspy.LM("openai/gpt-4o-mini", api_key="your-api-key")
dspy.configure(lm=lm)
# Or use Anthropic
lm = dspy.LM("anthropic/claude-3-haiku-20240307", api_key="your-api-key")
dspy.configure(lm=lm)
# Or use local models
lm = dspy.LM("ollama/llama3.2", api_base="http://localhost:11434")
dspy.configure(lm=lm)
Supported Providers
- OpenAI (GPT-3.5, GPT-4, GPT-4o)
- Anthropic (Claude 3 family)
- Google (Gemini models)
- Local models via Ollama
- Any OpenAI-compatible API
Start with smaller, cheaper models during development. You can always switch to more powerful models for production after optimization.
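For example, here's a minimal sketch of one way to set that up (model names are just examples); dspy.context temporarily overrides the globally configured LM for a block of calls.
import dspy

# Cheap model for iteration, stronger model reserved for final runs
dev_lm = dspy.LM("openai/gpt-4o-mini", api_key="...")
prod_lm = dspy.LM("openai/gpt-4o", api_key="...")

dspy.configure(lm=dev_lm)  # global default

qa = dspy.Predict("question -> answer")
print(qa(question="What is DSPy?").answer)       # runs on gpt-4o-mini

# Temporarily override the default for a single block of calls
with dspy.context(lm=prod_lm):
    print(qa(question="What is DSPy?").answer)   # runs on gpt-4o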
3. Core Concepts
DSPy is built on four fundamental concepts that work together to create powerful, self-optimizing LLM programs.
1. Signatures: Typed I/O Contracts
Signatures define the structure of your task — what goes in and what comes out. They're like function signatures but for LLM operations.
class Summarize(dspy.Signature):
    """Summarize the document in one paragraph."""
    document: str = dspy.InputField()
    summary: str = dspy.OutputField()

class QuestionAnswer(dspy.Signature):
    """Answer questions using the provided context."""
    context: str = dspy.InputField()
    question: str = dspy.InputField()
    answer: str = dspy.OutputField()
2. Modules: Composable Building Blocks
Modules are the workhorses of DSPy. They take signatures and turn them into executable components that can be chained together.
class Summarizer(dspy.Module):
    def __init__(self):
        super().__init__()
        self.summarize = dspy.Predict(Summarize)

    def forward(self, document):
        return self.summarize(document=document).summary

# For complex reasoning, use ChainOfThought
class Reasoner(dspy.Module):
    def __init__(self):
        super().__init__()
        # ChainOfThought adds a `reasoning` output field automatically
        self.reason = dspy.ChainOfThought("question -> answer")

    def forward(self, question):
        return self.reason(question=question)
3. Predictors: Execution Strategies
- Predict: Basic LLM call with signature
- ChainOfThought: Adds reasoning steps before the answer
- ProgramOfThought: Generates and executes code
- ReAct: Combines reasoning with action (tool use)
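To make these concrete, here's a small sketch; the search_wikipedia helper is a stand-in you'd replace with a real tool.
import dspy

# A stand-in tool for illustration; any plain Python function can serve as a ReAct tool
def search_wikipedia(query: str) -> str:
    """Return a short snippet for the query (stub for illustration)."""
    return f"(stub result for: {query})"

# All predictors share the same signature-driven interface
predict = dspy.Predict("question -> answer")
cot = dspy.ChainOfThought("question -> answer")  # adds a `reasoning` output field
react = dspy.ReAct("question -> answer", tools=[search_wikipedia])

print(cot(question="What is 17 * 24?").answer)
print(react(question="Which legendary figure is said to have founded Rome?").answer)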
4. Teleprompters: Automatic Optimization
Teleprompters are optimizers that automatically improve your programs by learning better instructions and selecting optimal few-shot examples.
- BootstrapFewShot: Learns from a small training set
- BootstrapFewShotWithRandomSearch: Explores multiple prompt variations
- MIPROv2: Jointly optimizes instructions and few-shot demonstrations across multi-module pipelines
- GEPA: Reflective evolutionary search over instructions with Pareto-based candidate selection
4. Building Your First Program
Let's build a simple explanation system that takes complex topics and explains them in simple terms. This example demonstrates the core DSPy workflow.
Step 1: Define the Signature
import dspy
class ExplainSimply(dspy.Signature):
    """Explain a complex topic in simple terms."""
    topic: str = dspy.InputField(desc="The complex topic to explain")
    explanation: str = dspy.OutputField(desc="Simple explanation suitable for a 10-year-old")
Step 2: Create a Module
class SimpleExplainer(dspy.Module):
    def __init__(self):
        super().__init__()
        # Use ChainOfThought for better reasoning
        self.explain = dspy.ChainOfThought(ExplainSimply)

    def forward(self, topic):
        # Return the full prediction so callers (and metrics) can access its fields
        return self.explain(topic=topic)
Step 3: Use the Program
# Configure DSPy with your LLM
lm = dspy.LM("openai/gpt-4o-mini", api_key="...")
dspy.configure(lm=lm)
# Create and use the explainer
explainer = SimpleExplainer()
result = explainer(topic="quantum computing")
print(result.explanation)
# Output: "Quantum computing is like having a super-powerful calculator that can
# try many different answers at the same time..."
Start simple with Predict, then upgrade to ChainOfThought when you need reasoning. The beauty of DSPy is you can swap predictors without changing your module structure.
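As a sketch of that swap (a hypothetical SimpleExplainerV2 reusing the ExplainSimply signature from above), only one line changes between strategies:
class SimpleExplainerV2(dspy.Module):
    def __init__(self, use_reasoning: bool = True):
        super().__init__()
        # Only this line changes between strategies; the rest of the module is untouched
        predictor_cls = dspy.ChainOfThought if use_reasoning else dspy.Predict
        self.explain = predictor_cls(ExplainSimply)

    def forward(self, topic):
        return self.explain(topic=topic)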
5. Optimization with Teleprompters
The real power of DSPy comes from automatic optimization. Instead of manually crafting prompts, you provide examples and a metric, and DSPy learns the best instructions and demonstrations.
Creating a Training Set
# Prepare training examples as dspy.Example objects, marking which fields are inputs
trainset = [
    dspy.Example(topic="inflation", explanation="Imagine everyone wants the same toy...").with_inputs("topic"),
    dspy.Example(topic="DNA", explanation="DNA is like a recipe book for your body...").with_inputs("topic"),
    dspy.Example(topic="black holes", explanation="A black hole is like a cosmic vacuum...").with_inputs("topic"),
    # Add 10-20 examples for best results
]
Defining a Metric
def simplicity_metric(example, prediction, trace=None):
    # Check that an explanation was produced
    if not prediction.explanation:
        return 0.0
    # Simple heuristics for quality
    words = prediction.explanation.split()
    # Penalize if too short or too long
    if len(words) < 20 or len(words) > 100:
        return 0.5
    # Check for complex words (you could use a readability library here)
    complex_words = ["quantum", "algorithm", "theoretical", "hypothesis"]
    complexity_score = sum(1 for w in words if w.lower() in complex_words)
    # Return a score between 0 and 1
    return max(0, 1 - (complexity_score * 0.1))
Running Optimization
from dspy.teleprompt import BootstrapFewShot
# Create the teleprompter
teleprompter = BootstrapFewShot(
    metric=simplicity_metric,
    max_bootstrapped_demos=3,  # Max model-generated demos to include per predictor
    max_labeled_demos=5,       # Max raw labeled examples from the trainset to include
    max_rounds=2               # Bootstrapping rounds to attempt
)
# Compile the program
explainer = SimpleExplainer()
optimized_explainer = teleprompter.compile(
    student=explainer,
    trainset=trainset
)
# The optimized version now has:
# 1. Better instructions in the prompt
# 2. Carefully selected few-shot examples
# 3. Consistent high-quality outputs
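To sanity-check and keep the result, you can run the compiled program and save it (a minimal sketch; the file name is arbitrary):
# Try the compiled program
result = optimized_explainer(topic="photosynthesis")
print(result.explanation)

# Persist the learned instructions and demos for later reuse
optimized_explainer.save("simple_explainer_v1.json")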
Your metric is crucial — it defines what "good" means for your task. Spend time crafting a metric that captures what you actually care about, including edge cases and failure modes.
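One useful pattern: DSPy passes a non-None trace while bootstrapping demonstrations, so your metric can be stricter there than during evaluation (the 0.8 threshold below is an arbitrary choice):
def simplicity_metric_strict(example, prediction, trace=None):
    score = simplicity_metric(example, prediction)
    if trace is not None:
        # Bootstrapping: only accept demonstrations that clear a high bar
        return score >= 0.8
    # Evaluation: return the graded score
    return score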
6. Advanced: GEPA Evolution
GEPA (Genetic-Pareto) represents the cutting edge of DSPy optimization. It evolves your program's instructions through an evolutionary search, guided by an LLM that reflects on execution traces and metric feedback.
How GEPA Works
- Candidates: Starts from your program and maintains a pool of evolved variants
- Reflection: An LLM reads execution traces and metric feedback to propose improved instructions
- Evolution: Promising candidates are mutated and merged over successive rounds
- Pareto Selection: Keeps the frontier of candidates that are best on at least some examples, preserving diversity
Using GEPA
# GEPA is available as dspy.GEPA in recent DSPy releases
# (exact parameter names may vary slightly across versions)

# GEPA metrics can return a plain score, or a score plus textual feedback
# that the reflection step uses when rewriting instructions
def simplicity_metric_with_feedback(gold, pred, trace=None, pred_name=None, pred_trace=None):
    score = simplicity_metric(gold, pred)
    if score >= 0.8:
        feedback = "Good explanation."
    else:
        feedback = "Use shorter sentences, avoid jargon, and add a concrete analogy."
    return dspy.Prediction(score=score, feedback=feedback)

# Configure GEPA; a strong reflection model helps it propose better instructions
gepa_optimizer = dspy.GEPA(
    metric=simplicity_metric_with_feedback,
    auto="light",                # budget preset: "light", "medium", or "heavy"
    reflection_lm=dspy.LM("openai/gpt-4o", api_key="..."),
    track_stats=True             # keep details about the evolved candidates
)

# Evolve the program
evolved_explainer = gepa_optimizer.compile(
    SimpleExplainer(),
    trainset=trainset,
    valset=trainset  # use a separate held-out validation split when you have one
)

# GEPA maintains a Pareto frontier of candidate programs (each best on at least
# some examples) and returns the strongest overall candidate
GEPA Advantages
Pareto-Aware Search
Keeps a frontier of strong candidates rather than a single winner
Self-Reflection
Programs learn from their mistakes
No Heavy RL
Achieves strong gains without complex RL setups
Interpretable
Inspect reflections to understand behavior
GEPA is particularly powerful when you can express what "good" looks like both as a score and as textual feedback. Reach for it after simpler teleprompters plateau, especially on multi-module programs.
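Whichever optimizer you use, you can inspect what actually changed; here's a quick sketch using standard DSPy introspection helpers:
# See the evolved instructions and attached demos for each predictor
for name, predictor in evolved_explainer.named_predictors():
    print(f"== {name} ==")
    print(predictor.signature.instructions)
    print(f"{len(predictor.demos)} demos attached")

# Show the most recent prompt(s) actually sent to the LM
dspy.inspect_history(n=1)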
7. Practical Applications
DSPy excels in real-world applications. Here are concrete examples you can adapt for your projects.
RAG System in 15 Lines
class RAGSignature(dspy.Signature):
    """Answer questions using retrieved context."""
    question: str = dspy.InputField()
    context: list[str] = dspy.InputField()
    answer: str = dspy.OutputField()

class RAGSystem(dspy.Module):
    def __init__(self, retriever):
        super().__init__()
        self.retriever = retriever
        self.answer = dspy.ChainOfThought(RAGSignature)

    def forward(self, question):
        # Retrieve relevant documents
        context = self.retriever(question, k=3)
        # Generate answer with context
        result = self.answer(question=question, context=context)
        return result.answer
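Any callable that returns a list of passages can serve as the retriever. Here's a toy in-memory retriever for illustration; in practice you'd plug in a vector store or a DSPy retrieval client.
docs = [
    "DSPy separates program logic from prompt wording.",
    "Teleprompters optimize instructions and demonstrations.",
    "Signatures declare typed inputs and outputs.",
]

def toy_retriever(question, k=3):
    # Naive scoring: rank documents by word overlap with the question
    q_words = set(question.lower().split())
    ranked = sorted(docs, key=lambda d: -len(q_words & set(d.lower().split())))
    return ranked[:k]

rag = RAGSystem(retriever=toy_retriever)
print(rag(question="What do teleprompters do in DSPy?"))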
Multi-step Agent
class PlanAndExecute(dspy.Module):
    def __init__(self):
        super().__init__()
        self.plan = dspy.ChainOfThought("task -> steps")
        self.execute = dspy.Predict("step -> result")
        self.synthesize = dspy.Predict("results -> final_answer")

    def forward(self, task):
        # Plan the steps
        plan = self.plan(task=task)
        # Execute each step
        results = []
        for step in plan.steps.split('\n'):
            if not step.strip():
                continue  # skip blank lines in the plan
            result = self.execute(step=step)
            results.append(result.result)
        # Synthesize final answer
        answer = self.synthesize(results=results)
        return answer.final_answer
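Using it is the same as calling any other module (the task text below is just an example):
agent = PlanAndExecute()
answer = agent(task="Research three renewable energy options for a small town and recommend one")
print(answer)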
Classification with Confidence
class ClassifyWithConfidence(dspy.Signature):
    """Classify text with a confidence score."""
    text: str = dspy.InputField()
    category: str = dspy.OutputField()
    confidence: float = dspy.OutputField(desc="Confidence between 0 and 1")

# ChainOfThought adds a `reasoning` output field automatically
classifier = dspy.ChainOfThought(ClassifyWithConfidence)
result = classifier(text="DSPy makes LLM programming easy")
print(f"{result.category} (confidence: {result.confidence})")
print(f"Reasoning: {result.reasoning}")
8. Best Practices
Following these practices will help you build robust, efficient DSPy programs.
Development Workflow
- Start Simple: Begin with basic Predict, add complexity gradually
- Test Locally: Use small models (Llama 3.2) for rapid iteration
- Create Quality Data: 20 high-quality examples beat 200 noisy ones
- Define Clear Metrics: Your metric is your north star
- Optimize Systematically: Try BootstrapFewShot first, then GEPA
- Version Everything: Save compiled programs with their metrics
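One lightweight way to follow the last two points is to score the compiled program on a held-out split and save it alongside its score. A minimal sketch; the file names and metadata fields are arbitrary choices, not a DSPy convention.
import json

# devset: a held-out list of dspy.Example objects, same shape as trainset
scores = [simplicity_metric(ex, optimized_explainer(topic=ex.topic)) for ex in devset]
avg_score = sum(scores) / len(scores)

optimized_explainer.save("explainer_v3.json")
with open("explainer_v3.meta.json", "w") as f:
    json.dump({"metric": "simplicity_metric", "score": avg_score, "model": "gpt-4o-mini"}, f)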
Common Patterns
When to Use ChainOfThought
Use for: Math problems, multi-step reasoning, complex analysis
Avoid for: Simple extraction, classification, straightforward tasks
Handling Errors
Add retry logic, validate outputs against your metric, and handle empty or malformed responses (see the sketch after this list)
Cost Optimization
Start with cheap models, cache results, batch requests when possible
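A minimal sketch of the retry-and-validate pattern mentioned above, reusing the optimization metric as an output validator (the attempt count and threshold are arbitrary):
def explain_with_retry(explainer, topic, max_attempts=3, min_score=0.6):
    last = None
    for _ in range(max_attempts):
        last = explainer(topic=topic)
        # simplicity_metric ignores its `example` argument, so None is fine here
        if simplicity_metric(None, last) >= min_score:
            return last
    return last  # fall back to the final attempt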
Common Pitfalls to Avoid
- Treating DSPy as a prompt wrapper: It's a programming model, embrace the abstraction
- Optimizing without good metrics: Garbage in, garbage out — invest in metric design
- Over-optimizing on tiny datasets: Use held-out test sets to check generalization
- Ignoring costs: Track tokens and API costs, especially during optimization
- Not versioning: Save your compiled programs — they're valuable artifacts
Remember: DSPy compiled programs are model-specific. If you switch models, you'll need to recompile. Always test compiled programs with your production model before deployment.
9. Quick Reference
Here's a concise reference for the key DSPy concepts and components:
| Concept | Summary |
|---|---|
| Signatures | Typed I/O contracts for LLM tasks |
| Modules | Composable steps that process signatures |
| Predict | Basic LLM call with signature |
| ChainOfThought | Step-by-step reasoning for complex tasks |
| Teleprompters | Optimizers that improve prompts automatically |
| GEPA | Genetic-Pareto evolution of program instructions |
| Metrics | Define success criteria for optimization |
Command Cheatsheet
# Install DSPy
pip install dspy-ai

# Basic setup
import dspy
lm = dspy.LM("openai/gpt-4o-mini", api_key="...")
dspy.configure(lm=lm)

# Define signature
class Task(dspy.Signature):
    input: str = dspy.InputField()
    output: str = dspy.OutputField()

# Create module
class Program(dspy.Module):
    def __init__(self):
        super().__init__()
        self.predict = dspy.Predict(Task)

    def forward(self, input):
        return self.predict(input=input).output

# Optimize
from dspy.teleprompt import BootstrapFewShot
optimizer = BootstrapFewShot(metric=my_metric)
optimized = optimizer.compile(Program(), trainset=data)

# Save/Load
optimized.save("my_program.json")
loaded = Program()
loaded.load("my_program.json")
Next Steps
- Try the examples in this guide with your own use case
- Experiment with different predictors (Predict vs ChainOfThought)
- Build a simple RAG system with optimization
- Explore GEPA for multi-objective optimization
- Join the DSPy community for support and updates
