VERSALIST GUIDES

DSPy: Programming Language Models

Introduction

DSPy revolutionizes how we work with language models by treating them as programmable systems rather than black boxes requiring manual prompt engineering. Instead of spending hours crafting the perfect prompt, DSPy lets you define what you want and automatically optimizes how to get it.

This guide will take you from DSPy basics to advanced optimization techniques, including GEPA (Genetic-Pareto) for evolving high-performance LLM programs.

By the end, you'll understand how to build self-improving AI systems that learn better instructions and demonstrations automatically, dramatically reducing development time while improving performance.

Why This Guide Matters

DSPy represents a paradigm shift in LLM development. Understanding it will help you:

  • Build more reliable and reproducible LLM applications
  • Reduce prompt engineering time from days to minutes
  • Create self-improving systems that optimize themselves
  • Balance quality, cost, and latency automatically

1. Why DSPy?

The Problem: Traditional prompt engineering is brittle, time-consuming, and doesn't scale. You manually craft prompts, test them, tweak them, and repeat — often breaking what worked when you make changes.

DSPy's Solution

DSPy introduces a declarative approach where you define what you want (the task) and let optimizers figure out how to achieve it (the prompts and examples).

  • Declarative Programming: Define inputs and outputs with Signatures
  • Modular Composition: Combine simple modules into complex pipelines
  • Automatic Optimization: Let teleprompters learn optimal prompts and demonstrations
  • Reproducible Workflows: Version and deploy optimized programs reliably

Perfect For

Classification Tasks

Sentiment analysis, intent detection, categorization

RAG Systems

Question answering with retrieval augmentation

Multi-step Reasoning

Complex agents and chain-of-thought tasks

Tool Use

Function calling and API integration

Think of DSPy as "PyTorch for prompts" — it provides the building blocks and optimization algorithms to build sophisticated LLM applications without manual prompt engineering.

2. Getting Started

Installing and configuring DSPy is straightforward. You'll need Python 3.8+ and an API key for your preferred language model provider.

Installation

pip install dspy-ai

Basic Configuration

import dspy

# Configure with OpenAI
lm = dspy.LM("openai/gpt-4o-mini", api_key="your-api-key")
dspy.configure(lm=lm)

# Or use Anthropic
lm = dspy.LM("anthropic/claude-3-haiku-20240307", api_key="your-api-key")
dspy.configure(lm=lm)

# Or use local models
lm = dspy.LM("ollama/llama3.2", api_base="http://localhost:11434")
dspy.configure(lm=lm)

Supported Providers

  • OpenAI (GPT-3.5, GPT-4, GPT-4o)
  • Anthropic (Claude 3 family)
  • Google (Gemini models)
  • Local models via Ollama
  • Any OpenAI-compatible API

Start with smaller, cheaper models during development. You can always switch to more powerful models for production after optimization.
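For instance, you can keep a cheap default model and temporarily override it for specific calls with dspy.context (the model names below are just examples):

import dspy

# Cheap default model for day-to-day development
dspy.configure(lm=dspy.LM("openai/gpt-4o-mini", api_key="your-api-key"))

qa = dspy.Predict("question -> answer")
print(qa(question="What is DSPy?").answer)  # answered by gpt-4o-mini

# Temporarily switch to a stronger model for this block only
with dspy.context(lm=dspy.LM("openai/gpt-4o", api_key="your-api-key")):
    print(qa(question="What is DSPy?").answer)  # answered by gpt-4o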

3. Core Concepts

DSPy is built on four fundamental concepts that work together to create powerful, self-optimizing LLM programs.

1. Signatures: Typed I/O Contracts

Signatures define the structure of your task — what goes in and what comes out. They're like function signatures but for LLM operations.

class Summarize(dspy.Signature):
    """Summarize the document in one paragraph."""
    document: str = dspy.InputField()
    summary: str = dspy.OutputField()

class QuestionAnswer(dspy.Signature):
    """Answer questions using the provided context."""
    context: str = dspy.InputField()
    question: str = dspy.InputField()
    answer: str = dspy.OutputField()

2. Modules: Composable Building Blocks

Modules are the workhorses of DSPy. They take signatures and turn them into executable components that can be chained together.

class Summarizer(dspy.Module):
    def __init__(self):
        super().__init__()
        self.summarize = dspy.Predict(Summarize)
    
    def forward(self, document):
        return self.summarize(document=document).summary

# For complex reasoning, use ChainOfThought
class Reasoner(dspy.Module):
    def __init__(self):
        super().__init__()
        self.reason = dspy.ChainOfThought("question -> answer")  # reasoning is added automatically
    
    def forward(self, question):
        return self.reason(question=question)

3. Predictors: Execution Strategies

  • Predict: Basic LLM call with signature
  • ChainOfThought: Adds reasoning steps before the answer
  • ProgramOfThought: Generates and executes code
  • ReAct: Combines reasoning with action (tool use)
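
As a small sketch of the tool-use case: dspy.ReAct takes a signature plus ordinary Python functions as tools (the word_count tool below is purely illustrative, and an LM must already be configured as in Section 2):

import dspy

def word_count(text: str) -> int:
    """Count the number of words in a piece of text."""
    return len(text.split())

# ReAct interleaves reasoning steps with tool calls until it can answer
agent = dspy.ReAct("question -> answer", tools=[word_count])
result = agent(question="How many words are in the phrase 'programming, not prompting'?")
print(result.answer)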

4. Teleprompters: Automatic Optimization

Teleprompters are optimizers that automatically improve your programs by learning better instructions and selecting optimal few-shot examples.

  • BootstrapFewShot: Learns from a small training set
  • BootstrapFewShotWithRandomSearch: Explores multiple prompt variations
  • MIPROv2: Jointly optimizes instructions and few-shot examples for multi-stage pipelines
  • GEPA: Reflective prompt evolution with Pareto-based candidate selection

4. Building Your First Program

Let's build a simple explanation system that takes complex topics and explains them in simple terms. This example demonstrates the core DSPy workflow.

Step 1: Define the Signature

import dspy

class ExplainSimply(dspy.Signature):
    """Explain a complex topic in simple terms."""
    topic: str = dspy.InputField(desc="The complex topic to explain")
    explanation: str = dspy.OutputField(desc="Simple explanation suitable for a 10-year-old")

Step 2: Create a Module

class SimpleExplainer(dspy.Module):
    def __init__(self):
        super().__init__()
        # Use ChainOfThought for better reasoning
        self.explain = dspy.ChainOfThought(ExplainSimply)
    
    def forward(self, topic):
        # Return the full prediction so downstream code (and metrics) can access its fields
        return self.explain(topic=topic)

Step 3: Use the Program

# Configure DSPy with your LLM
lm = dspy.LM("openai/gpt-4o-mini", api_key="...")
dspy.configure(lm=lm)

# Create and use the explainer
explainer = SimpleExplainer()
result = explainer(topic="quantum computing")
print(result.explanation)

# Output: "Quantum computing is like having a super-powerful calculator that can 
# try many different answers at the same time..."

Start simple with Predict, then upgrade to ChainOfThought when you need reasoning. The beauty of DSPy is you can swap predictors without changing your module structure.
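
One way to make that swap explicit is a small variation on the module above (the use_reasoning flag is just an illustration, not part of DSPy):

class ConfigurableExplainer(dspy.Module):
    def __init__(self, use_reasoning: bool = True):
        super().__init__()
        # Same signature, different execution strategy
        predictor = dspy.ChainOfThought if use_reasoning else dspy.Predict
        self.explain = predictor(ExplainSimply)
    
    def forward(self, topic):
        return self.explain(topic=topic)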

5. Optimization with Teleprompters

The real power of DSPy comes from automatic optimization. Instead of manually crafting prompts, you provide examples and a metric, and DSPy learns the best instructions and demonstrations.

Creating a Training Set

# Prepare training examples as dspy.Example objects,
# marking which fields are inputs
trainset = [
    dspy.Example(topic="inflation", explanation="Imagine everyone wants the same toy...").with_inputs("topic"),
    dspy.Example(topic="DNA", explanation="DNA is like a recipe book for your body...").with_inputs("topic"),
    dspy.Example(topic="black holes", explanation="A black hole is like a cosmic vacuum...").with_inputs("topic"),
    # Add 10-20 examples for best results
]

Defining a Metric

def simplicity_metric(example, prediction, trace=None):
    # Check if explanation exists and is simple enough
    if not prediction.explanation:
        return 0.0
    
    # Simple heuristics for quality
    words = prediction.explanation.split()
    
    # Penalize if too short or too long
    if len(words) < 20 or len(words) > 100:
        return 0.5
    
    # Check for complex words (you could use a readability library here)
    complex_words = ["quantum", "algorithm", "theoretical", "hypothesis"]
    complexity_score = sum(1 for w in words if w.lower() in complex_words)
    
    # Return score (0 to 1)
    return max(0, 1 - (complexity_score * 0.1))
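
Before optimizing, it helps to sanity-check the metric against a hand-written prediction; dspy.Prediction lets you construct one directly (the example text and expected score below just follow the heuristics above):

# Quick sanity check of the metric
fake_pred = dspy.Prediction(
    explanation="Inflation means prices slowly go up, so the same money buys a little less each year than it did before."
)
print(simplicity_metric(trainset[0], fake_pred))  # 1.0: right length, no jargon words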

Running Optimization

from dspy.teleprompt import BootstrapFewShot

# Create the teleprompter
teleprompter = BootstrapFewShot(
    metric=simplicity_metric,
    max_bootstrapped_demos=3,  # Max model-generated demos to include in the prompt
    max_labeled_demos=5,       # Max demos taken directly from the trainset
    max_rounds=2               # Bootstrapping rounds over the trainset
)

# Compile the program
explainer = SimpleExplainer()
optimized_explainer = teleprompter.compile(
    student=explainer,
    trainset=trainset
)

# The optimized version now has:
# 1. Better instructions in the prompt
# 2. Carefully selected few-shot examples
# 3. Consistent high-quality outputs

Your metric is crucial — it defines what "good" means for your task. Spend time crafting a metric that captures what you actually care about, including edge cases and failure modes.
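
To see whether optimization actually moved the metric, DSPy ships an evaluator you can run before and after compiling (a sketch; the devset here reuses the training examples purely for illustration, which you should avoid in practice):

from dspy.evaluate import Evaluate

# Score a program over a dev set with your metric
evaluate = Evaluate(devset=trainset, metric=simplicity_metric, display_progress=True)

print("Before optimization:", evaluate(explainer))
print("After optimization:", evaluate(optimized_explainer))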

6. Advanced: GEPA Evolution

GEPA (Genetic-Pareto) represents the cutting edge of DSPy optimization. It evolves a program's prompts with an evolutionary search, uses natural-language reflection on execution traces and metric feedback to propose better instructions, and keeps a Pareto frontier of candidate programs rather than betting on a single winner.

How GEPA Works

  1. Population: Starts from your program and maintains a pool of candidate prompt variants
  2. Evolution: Mutates and combines the most promising candidates
  3. Reflection: An LLM reads execution traces and metric feedback to explain why candidates succeed or fail
  4. Pareto Selection: Keeps every candidate that is best on at least one example, preserving diverse strategies

Using GEPA

# GEPA is exposed as dspy.GEPA in recent DSPy releases; exact constructor
# arguments vary across versions, so check the current documentation.

# GEPA works best when the metric also returns textual feedback,
# which the reflection step uses to propose better instructions.
def simplicity_with_feedback(gold, pred, trace=None, pred_name=None, pred_trace=None):
    score = simplicity_metric(gold, pred)
    if score >= 0.8:
        feedback = "Good: short, concrete, and jargon-free."
    else:
        feedback = "Use shorter sentences, avoid technical jargon, and keep it around 20-100 words."
    return dspy.Prediction(score=score, feedback=feedback)

# Configure GEPA
gepa_optimizer = dspy.GEPA(
    metric=simplicity_with_feedback,
    auto="light",                             # optimization budget: "light", "medium", or "heavy"
    reflection_lm=dspy.LM("openai/gpt-4o"),   # a stronger model for the reflection step helps
    track_stats=True
)

# Evolve the program
evolved_explainer = gepa_optimizer.compile(
    SimpleExplainer(),
    trainset=trainset,
    valset=trainset  # use a separate validation set in practice
)

# GEPA keeps a Pareto frontier of candidates (the best program per validation
# example) and returns the strongest overall program from that frontier.

GEPA Advantages

Pareto Selection

Keeps a frontier of strong, diverse candidates instead of a single winner

Self-Reflection

Programs learn from their mistakes

No Heavy RL

Achieves strong gains without complex RL setups

Interpretable

Inspect reflections to understand behavior

GEPA is particularly powerful when your metric can also express feedback in words and you need strong results from a limited optimization budget. Use it when you have clear metrics and simpler optimizers like BootstrapFewShot plateau.

7. Practical Applications

DSPy excels in real-world applications. Here are concrete examples you can adapt for your projects.

RAG System in 15 Lines

class RAGSignature(dspy.Signature):
    """Answer questions using retrieved context."""
    question: str = dspy.InputField()
    context: list[str] = dspy.InputField()
    answer: str = dspy.OutputField()

class RAGSystem(dspy.Module):
    def __init__(self, retriever):
        super().__init__()
        self.retriever = retriever
        self.answer = dspy.ChainOfThought(RAGSignature)
    
    def forward(self, question):
        # Retrieve relevant documents
        context = self.retriever(question, k=3)
        # Generate answer with context
        result = self.answer(question=question, context=context)
        return result.answer
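
Any callable that returns a list of passages can serve as the retriever, so you can smoke-test the pipeline without a vector database (the toy corpus and keyword-overlap ranking below are purely illustrative):

# A toy retriever over a hard-coded corpus
corpus = [
    "DSPy separates task definitions (signatures) from prompt optimization.",
    "Teleprompters compile programs by selecting instructions and demonstrations.",
    "Signatures declare typed inputs and outputs for LLM calls.",
]

def toy_retriever(question, k=3):
    # Rank passages by naive keyword overlap; swap in a real vector store later
    q_words = set(question.lower().split())
    scored = sorted(corpus, key=lambda doc: -len(q_words & set(doc.lower().split())))
    return scored[:k]

rag = RAGSystem(retriever=toy_retriever)
print(rag(question="What do teleprompters do in DSPy?"))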

Multi-step Agent

class PlanAndExecute(dspy.Module):
    def __init__(self):
        super().__init__()
        self.plan = dspy.ChainOfThought("task -> steps")
        self.execute = dspy.Predict("step -> result")
        self.synthesize = dspy.Predict("results -> final_answer")
    
    def forward(self, task):
        # Plan the steps
        plan = self.plan(task=task)
        
        # Execute each step
        results = []
        for step in plan.steps.split('\n'):
            result = self.execute(step=step)
            results.append(result.result)
        
        # Synthesize final answer
        answer = self.synthesize(results="\n".join(results))
        return answer.final_answer
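
Usage looks the same as any other module; the planned steps and final answer will vary by model:

agent = PlanAndExecute()
print(agent(task="Plan a simple three-step approach to learning DSPy"))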

Classification with Confidence

class ClassifyWithConfidence(dspy.Signature):
    """Classify text with confidence score."""
    text: str = dspy.InputField()
    category: str = dspy.OutputField()
    confidence: float = dspy.OutputField()
    reasoning: str = dspy.OutputField()

classifier = dspy.ChainOfThought(ClassifyWithConfidence)
result = classifier(text="DSPy makes LLM programming easy")
print(f"{result.category} (confidence: {result.confidence})")
print(f"Reasoning: {result.reasoning}")

8. Best Practices

Following these practices will help you build robust, efficient DSPy programs.

Development Workflow

  1. Start Simple: Begin with basic Predict, add complexity gradually
  2. Test Locally: Use small models (Llama 3.2) for rapid iteration
  3. Create Quality Data: 20 high-quality examples beat 200 noisy ones
  4. Define Clear Metrics: Your metric is your north star
  5. Optimize Systematically: Try BootstrapFewShot first, then GEPA
  6. Version Everything: Save compiled programs with their metrics

Common Patterns

When to Use ChainOfThought

Use for: Math problems, multi-step reasoning, complex analysis
Avoid for: Simple extraction, classification, straightforward tasks

Handling Errors

Add retry logic, validate outputs inside your modules, and penalize failure modes in your metrics

Cost Optimization

Start with cheap models, cache results, batch requests when possible
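
A couple of built-ins help here during development (a sketch; history details can differ between DSPy versions, and SimpleExplainer is the module from Section 4):

import dspy

lm = dspy.LM("openai/gpt-4o-mini", api_key="your-api-key")  # identical calls are cached by default
dspy.configure(lm=lm)

explainer = SimpleExplainer()
explainer(topic="photosynthesis")

# Print the most recent prompt/response actually sent to the model
dspy.inspect_history(n=1)

# The LM object records its calls, which you can use to keep an eye on volume
print(len(lm.history), "LM calls so far")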

Common Pitfalls to Avoid

  • Treating DSPy as a prompt wrapper: It's a programming model, embrace the abstraction
  • Optimizing without good metrics: Garbage in, garbage out — invest in metric design
  • Over-optimizing on tiny datasets: Use held-out test sets to check generalization
  • Ignoring costs: Track tokens and API costs, especially during optimization
  • Not versioning: Save your compiled programs — they're valuable artifacts

Remember: DSPy compiled programs are model-specific. If you switch models, you'll need to recompile. Always test compiled programs with your production model before deployment.

9. Quick Reference

Here's a concise reference for the key DSPy concepts and components:

Concept          Summary
Signatures       Typed I/O contracts for LLM tasks
Modules          Composable steps that process signatures
Predict          Basic LLM call with signature
ChainOfThought   Step-by-step reasoning for complex tasks
Teleprompters    Optimizers that improve prompts automatically
GEPA             Genetic evolution for program optimization
Metrics          Define success criteria for optimization

Command Cheatsheet

# Install DSPy
pip install dspy-ai

# Basic setup
import dspy
lm = dspy.LM("openai/gpt-4o-mini", api_key="...")
dspy.configure(lm=lm)

# Define signature
class Task(dspy.Signature):
    input: str = dspy.InputField()
    output: str = dspy.OutputField()

# Create module
class Program(dspy.Module):
    def __init__(self):
        super().__init__()
        self.predict = dspy.Predict(Task)
    def forward(self, input):
        return self.predict(input=input).output

# Optimize
from dspy.teleprompt import BootstrapFewShot
optimizer = BootstrapFewShot(metric=my_metric)
optimized = optimizer.compile(Program(), trainset=data)

# Save/Load
optimized.save("my_program.json")
loaded = Program()
loaded.load("my_program.json")

Next Steps

  1. Try the examples in this guide with your own use case
  2. Experiment with different predictors (Predict vs ChainOfThought)
  3. Build a simple RAG system with optimization
  4. Explore GEPA for multi-objective optimization
  5. Join the DSPy community for support and updates