DSPy: Programming Language Models
Table of Contents
1. Why DSPy?
2. Getting Started
3. Core Concepts
4. Building Your First Program
5. Optimization with Teleprompters
6. Advanced: GEPA Evolution
7. Practical Applications
8. Best Practices
9. Quick Reference
Introduction
DSPy revolutionizes how we work with language models by treating them as programmable systems rather than black boxes requiring manual prompt engineering. Instead of spending hours crafting the perfect prompt, DSPy lets you define what you want and automatically optimizes how to get it.
This guide will take you from DSPy basics to advanced optimization techniques, including GEPA (Genetic-Pareto), a reflective evolutionary optimizer for evolving high-performance LLM programs.
By the end, you'll understand how to build self-improving AI systems that learn better instructions and demonstrations automatically, dramatically reducing development time while improving performance.
Why This Guide Matters
DSPy represents a paradigm shift in LLM development. Understanding it will help you:
- Build more reliable and reproducible LLM applications
- Reduce prompt engineering time from days to minutes
- Create self-improving systems that optimize themselves
- Balance quality, cost, and latency automatically
1. Why DSPy?
The Problem: Traditional prompt engineering is brittle, time-consuming, and doesn't scale. You manually craft prompts, test them, tweak them, and repeat — often breaking what worked when you make changes.
DSPy's Solution
DSPy introduces a declarative approach where you define what you want (the task) and let optimizers figure out how to achieve it (the prompts and examples).
- Declarative Programming: Define inputs and outputs with Signatures
- Modular Composition: Combine simple modules into complex pipelines
- Automatic Optimization: Let teleprompters learn optimal prompts and demonstrations
- Reproducible Workflows: Version and deploy optimized programs reliably
Perfect For
Classification Tasks
Sentiment analysis, intent detection, categorization
RAG Systems
Question answering with retrieval augmentation
Multi-step Reasoning
Complex agents and chain-of-thought tasks
Tool Use
Function calling and API integration
Think of DSPy as "PyTorch for prompts" — it provides the building blocks and optimization algorithms to build sophisticated LLM applications without manual prompt engineering.
2. Getting Started
Installing and configuring DSPy is straightforward. You'll need a recent version of Python (3.9 or newer, depending on your DSPy release) and an API key for your preferred language model provider.
Installation
pip install dspy-ai
Basic Configuration
import dspy
# Configure with OpenAI
lm = dspy.LM("openai/gpt-4o-mini", api_key="your-api-key")
dspy.configure(lm=lm)
# Or use Anthropic
lm = dspy.LM("anthropic/claude-3-haiku-20240307", api_key="your-api-key")
dspy.configure(lm=lm)
# Or use local models
lm = dspy.LM("ollama/llama3.2", api_base="http://localhost:11434")
dspy.configure(lm=lm)
Supported Providers
- OpenAI (GPT-3.5, GPT-4, GPT-4o)
- Anthropic (Claude 3 family)
- Google (Gemini models)
- Local models via Ollama
- Any OpenAI-compatible API
Start with smaller, cheaper models during development. You can always switch to more powerful models for production after optimization.
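For example, here's a minimal sketch of one way to set that up (model names are just examples); dspy.context temporarily overrides the globally configured LM for a block of calls.
import dspy

# Cheap model for iteration, stronger model reserved for final runs
dev_lm = dspy.LM("openai/gpt-4o-mini", api_key="...")
prod_lm = dspy.LM("openai/gpt-4o", api_key="...")

dspy.configure(lm=dev_lm)  # global default

qa = dspy.Predict("question -> answer")
print(qa(question="What is DSPy?").answer)       # runs on gpt-4o-mini

# Temporarily override the default for a single block of calls
with dspy.context(lm=prod_lm):
    print(qa(question="What is DSPy?").answer)   # runs on gpt-4o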
3. Core Concepts
DSPy is built on four fundamental concepts that work together to create powerful, self-optimizing LLM programs.
1. Signatures: Typed I/O Contracts
Signatures define the structure of your task — what goes in and what comes out. They're like function signatures but for LLM operations.
class Summarize(dspy.Signature):
    """Summarize the document in one paragraph."""
    document: str = dspy.InputField()
    summary: str = dspy.OutputField()

class QuestionAnswer(dspy.Signature):
    """Answer questions using the provided context."""
    context: str = dspy.InputField()
    question: str = dspy.InputField()
    answer: str = dspy.OutputField()
2. Modules: Composable Building Blocks
Modules are the workhorses of DSPy. They take signatures and turn them into executable components that can be chained together.
class Summarizer(dspy.Module):
    def __init__(self):
        super().__init__()
        self.summarize = dspy.Predict(Summarize)

    def forward(self, document):
        return self.summarize(document=document).summary

# For complex reasoning, use ChainOfThought
class Reasoner(dspy.Module):
    def __init__(self):
        super().__init__()
        # ChainOfThought adds a `reasoning` output field automatically
        self.reason = dspy.ChainOfThought("question -> answer")

    def forward(self, question):
        return self.reason(question=question)
3. Predictors: Execution Strategies
- Predict: Basic LLM call with signature
- ChainOfThought: Adds reasoning steps before the answer
- ProgramOfThought: Generates and executes code
- ReAct: Combines reasoning with action (tool use)
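To make these concrete, here's a small sketch; the search_wikipedia helper is a stand-in you'd replace with a real tool.
import dspy

# A stand-in tool for illustration; any plain Python function can serve as a ReAct tool
def search_wikipedia(query: str) -> str:
    """Return a short snippet for the query (stub for illustration)."""
    return f"(stub result for: {query})"

# All predictors share the same signature-driven interface
predict = dspy.Predict("question -> answer")
cot = dspy.ChainOfThought("question -> answer")  # adds a `reasoning` output field
react = dspy.ReAct("question -> answer", tools=[search_wikipedia])

print(cot(question="What is 17 * 24?").answer)
print(react(question="Which legendary figure is said to have founded Rome?").answer)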
4. Teleprompters: Automatic Optimization
Teleprompters are optimizers that automatically improve your programs by learning better instructions and selecting optimal few-shot examples.
- BootstrapFewShot: Learns from a small training set
- BootstrapFewShotWithRandomSearch: Explores multiple prompt variations
- MIPROv2: Jointly optimizes instructions and few-shot demonstrations across multi-module pipelines
- GEPA: Reflective evolutionary search over instructions with Pareto-based candidate selection
4. Building Your First Program
Let's build a simple explanation system that takes complex topics and explains them in simple terms. This example demonstrates the core DSPy workflow.
Step 1: Define the Signature
import dspy
class ExplainSimply(dspy.Signature):
    """Explain a complex topic in simple terms."""
    topic: str = dspy.InputField(desc="The complex topic to explain")
    explanation: str = dspy.OutputField(desc="Simple explanation suitable for a 10-year-old")
Step 2: Create a Module
class SimpleExplainer(dspy.Module):
    def __init__(self):
        super().__init__()
        # Use ChainOfThought for better reasoning
        self.explain = dspy.ChainOfThought(ExplainSimply)

    def forward(self, topic):
        # Return the full prediction so callers (and metrics) can access its fields
        return self.explain(topic=topic)
Step 3: Use the Program
# Configure DSPy with your LLM
lm = dspy.LM("openai/gpt-4o-mini", api_key="...")
dspy.configure(lm=lm)
# Create and use the explainer
explainer = SimpleExplainer()
result = explainer(topic="quantum computing")
print(result.explanation)
# Output: "Quantum computing is like having a super-powerful calculator that can
# try many different answers at the same time..."
Start simple with Predict, then upgrade to ChainOfThought when you need reasoning. The beauty of DSPy is you can swap predictors without changing your module structure.
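As a sketch of that swap (a hypothetical SimpleExplainerV2 reusing the ExplainSimply signature from above), only one line changes between strategies:
class SimpleExplainerV2(dspy.Module):
    def __init__(self, use_reasoning: bool = True):
        super().__init__()
        # Only this line changes between strategies; the rest of the module is untouched
        predictor_cls = dspy.ChainOfThought if use_reasoning else dspy.Predict
        self.explain = predictor_cls(ExplainSimply)

    def forward(self, topic):
        return self.explain(topic=topic)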
5. Optimization with Teleprompters
The real power of DSPy comes from automatic optimization. Instead of manually crafting prompts, you provide examples and a metric, and DSPy learns the best instructions and demonstrations.
Creating a Training Set
# Prepare training examples as dspy.Example objects, marking which fields are inputs
trainset = [
    dspy.Example(topic="inflation", explanation="Imagine everyone wants the same toy...").with_inputs("topic"),
    dspy.Example(topic="DNA", explanation="DNA is like a recipe book for your body...").with_inputs("topic"),
    dspy.Example(topic="black holes", explanation="A black hole is like a cosmic vacuum...").with_inputs("topic"),
    # Add 10-20 examples for best results
]
Defining a Metric
def simplicity_metric(example, prediction, trace=None):
    # Check that an explanation was produced
    if not prediction.explanation:
        return 0.0
    # Simple heuristics for quality
    words = prediction.explanation.split()
    # Penalize if too short or too long
    if len(words) < 20 or len(words) > 100:
        return 0.5
    # Check for complex words (you could use a readability library here)
    complex_words = ["quantum", "algorithm", "theoretical", "hypothesis"]
    complexity_score = sum(1 for w in words if w.lower() in complex_words)
    # Return a score between 0 and 1
    return max(0, 1 - (complexity_score * 0.1))
Running Optimization
from dspy.teleprompt import BootstrapFewShot
# Create the teleprompter
teleprompter = BootstrapFewShot(
    metric=simplicity_metric,
    max_bootstrapped_demos=3,  # Max model-generated demos to include per predictor
    max_labeled_demos=5,       # Max raw labeled examples from the trainset to include
    max_rounds=2               # Bootstrapping rounds to attempt
)
# Compile the program
explainer = SimpleExplainer()
optimized_explainer = teleprompter.compile(
    student=explainer,
    trainset=trainset
)
# The optimized version now has:
# 1. Better instructions in the prompt
# 2. Carefully selected few-shot examples
# 3. Consistent high-quality outputs
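To sanity-check and keep the result, you can run the compiled program and save it (a minimal sketch; the file name is arbitrary):
# Try the compiled program
result = optimized_explainer(topic="photosynthesis")
print(result.explanation)

# Persist the learned instructions and demos for later reuse
optimized_explainer.save("simple_explainer_v1.json")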
Your metric is crucial — it defines what "good" means for your task. Spend time crafting a metric that captures what you actually care about, including edge cases and failure modes.
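One useful pattern: DSPy passes a non-None trace while bootstrapping demonstrations, so your metric can be stricter there than during evaluation (the 0.8 threshold below is an arbitrary choice):
def simplicity_metric_strict(example, prediction, trace=None):
    score = simplicity_metric(example, prediction)
    if trace is not None:
        # Bootstrapping: only accept demonstrations that clear a high bar
        return score >= 0.8
    # Evaluation: return the graded score
    return score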
6. Advanced: GEPA Evolution
GEPA (Genetic-Pareto) represents the cutting edge of DSPy optimization. It evolves your program's instructions through an evolutionary search, guided by an LLM that reflects on execution traces and metric feedback.
How GEPA Works
- Candidates: Starts from your program and maintains a pool of evolved variants
- Reflection: An LLM reads execution traces and metric feedback to propose improved instructions
- Evolution: Promising candidates are mutated and merged over successive rounds
- Pareto Selection: Keeps the frontier of candidates that are best on at least some examples, preserving diversity
Using GEPA
# GEPA is available as dspy.GEPA in recent DSPy releases
# (exact parameter names may vary slightly across versions)

# GEPA metrics can return a plain score, or a score plus textual feedback
# that the reflection step uses when rewriting instructions
def simplicity_metric_with_feedback(gold, pred, trace=None, pred_name=None, pred_trace=None):
    score = simplicity_metric(gold, pred)
    if score >= 0.8:
        feedback = "Good explanation."
    else:
        feedback = "Use shorter sentences, avoid jargon, and add a concrete analogy."
    return dspy.Prediction(score=score, feedback=feedback)

# Configure GEPA; a strong reflection model helps it propose better instructions
gepa_optimizer = dspy.GEPA(
    metric=simplicity_metric_with_feedback,
    auto="light",                # budget preset: "light", "medium", or "heavy"
    reflection_lm=dspy.LM("openai/gpt-4o", api_key="..."),
    track_stats=True             # keep details about the evolved candidates
)

# Evolve the program
evolved_explainer = gepa_optimizer.compile(
    SimpleExplainer(),
    trainset=trainset,
    valset=trainset  # use a separate held-out validation split when you have one
)

# GEPA maintains a Pareto frontier of candidate programs (each best on at least
# some examples) and returns the strongest overall candidate
GEPA Advantages
Pareto-Aware Search
Keeps a frontier of strong candidates rather than a single winner
Self-Reflection
Programs learn from their mistakes
No Heavy RL
Achieves strong gains without complex RL setups
Interpretable
Inspect reflections to understand behavior
GEPA is particularly powerful when you can express what "good" looks like both as a score and as textual feedback. Reach for it after simpler teleprompters plateau, especially on multi-module programs.
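Whichever optimizer you use, you can inspect what actually changed; here's a quick sketch using standard DSPy introspection helpers:
# See the evolved instructions and attached demos for each predictor
for name, predictor in evolved_explainer.named_predictors():
    print(f"== {name} ==")
    print(predictor.signature.instructions)
    print(f"{len(predictor.demos)} demos attached")

# Show the most recent prompt(s) actually sent to the LM
dspy.inspect_history(n=1)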
7. Practical Applications
DSPy excels in real-world applications. Here are concrete examples you can adapt for your projects.
RAG System in 15 Lines
class RAGSignature(dspy.Signature):
    """Answer questions using retrieved context."""
    question: str = dspy.InputField()
    context: list[str] = dspy.InputField()
    answer: str = dspy.OutputField()

class RAGSystem(dspy.Module):
    def __init__(self, retriever):
        super().__init__()
        self.retriever = retriever
        self.answer = dspy.ChainOfThought(RAGSignature)

    def forward(self, question):
        # Retrieve relevant documents
        context = self.retriever(question, k=3)
        # Generate answer with context
        result = self.answer(question=question, context=context)
        return result.answer
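Any callable that returns a list of passages can serve as the retriever. Here's a toy in-memory retriever for illustration; in practice you'd plug in a vector store or a DSPy retrieval client.
docs = [
    "DSPy separates program logic from prompt wording.",
    "Teleprompters optimize instructions and demonstrations.",
    "Signatures declare typed inputs and outputs.",
]

def toy_retriever(question, k=3):
    # Naive scoring: rank documents by word overlap with the question
    q_words = set(question.lower().split())
    ranked = sorted(docs, key=lambda d: -len(q_words & set(d.lower().split())))
    return ranked[:k]

rag = RAGSystem(retriever=toy_retriever)
print(rag(question="What do teleprompters do in DSPy?"))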
Multi-step Agent
class PlanAndExecute(dspy.Module):
    def __init__(self):
        super().__init__()
        self.plan = dspy.ChainOfThought("task -> steps")
        self.execute = dspy.Predict("step -> result")
        self.synthesize = dspy.Predict("results -> final_answer")

    def forward(self, task):
        # Plan the steps
        plan = self.plan(task=task)
        # Execute each step
        results = []
        for step in plan.steps.split('\n'):
            if not step.strip():
                continue  # skip blank lines in the plan
            result = self.execute(step=step)
            results.append(result.result)
        # Synthesize final answer
        answer = self.synthesize(results=results)
        return answer.final_answer
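Using it is the same as calling any other module (the task text below is just an example):
agent = PlanAndExecute()
answer = agent(task="Research three renewable energy options for a small town and recommend one")
print(answer)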
Classification with Confidence
class ClassifyWithConfidence(dspy.Signature):
    """Classify text with a confidence score."""
    text: str = dspy.InputField()
    category: str = dspy.OutputField()
    confidence: float = dspy.OutputField(desc="Confidence between 0 and 1")

# ChainOfThought adds a `reasoning` output field automatically
classifier = dspy.ChainOfThought(ClassifyWithConfidence)
result = classifier(text="DSPy makes LLM programming easy")
print(f"{result.category} (confidence: {result.confidence})")
print(f"Reasoning: {result.reasoning}")
8. Best Practices
Following these practices will help you build robust, efficient DSPy programs.
Development Workflow
- Start Simple: Begin with basic Predict, add complexity gradually
- Test Locally: Use small models (Llama 3.2) for rapid iteration
- Create Quality Data: 20 high-quality examples beat 200 noisy ones
- Define Clear Metrics: Your metric is your north star
- Optimize Systematically: Try BootstrapFewShot first, then GEPA
- Version Everything: Save compiled programs with their metrics
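One lightweight way to follow the last two points is to score the compiled program on a held-out split and save it alongside its score. A minimal sketch; the file names and metadata fields are arbitrary choices, not a DSPy convention.
import json

# devset: a held-out list of dspy.Example objects, same shape as trainset
scores = [simplicity_metric(ex, optimized_explainer(topic=ex.topic)) for ex in devset]
avg_score = sum(scores) / len(scores)

optimized_explainer.save("explainer_v3.json")
with open("explainer_v3.meta.json", "w") as f:
    json.dump({"metric": "simplicity_metric", "score": avg_score, "model": "gpt-4o-mini"}, f)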
Common Patterns
When to Use ChainOfThought
Use for: Math problems, multi-step reasoning, complex analysis
Avoid for: Simple extraction, classification, straightforward tasks
Handling Errors
Add retry logic, validate outputs against your metric, and handle empty or malformed responses (see the sketch after this list)
Cost Optimization
Start with cheap models, cache results, batch requests when possible
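A minimal sketch of the retry-and-validate pattern mentioned above, reusing the optimization metric as an output validator (the attempt count and threshold are arbitrary):
def explain_with_retry(explainer, topic, max_attempts=3, min_score=0.6):
    last = None
    for _ in range(max_attempts):
        last = explainer(topic=topic)
        # simplicity_metric ignores its `example` argument, so None is fine here
        if simplicity_metric(None, last) >= min_score:
            return last
    return last  # fall back to the final attempt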
Common Pitfalls to Avoid
- Treating DSPy as a prompt wrapper: It's a programming model, embrace the abstraction
- Optimizing without good metrics: Garbage in, garbage out — invest in metric design
- Over-optimizing on tiny datasets: Use held-out test sets to check generalization
- Ignoring costs: Track tokens and API costs, especially during optimization
- Not versioning: Save your compiled programs — they're valuable artifacts
Remember: DSPy compiled programs are model-specific. If you switch models, you'll need to recompile. Always test compiled programs with your production model before deployment.
9. Quick Reference
Here's a concise reference for the key DSPy concepts and components:
| Concept | Summary |
|---|---|
| Signatures | Typed I/O contracts for LLM tasks |
| Modules | Composable steps that process signatures |
| Predict | Basic LLM call with signature |
| ChainOfThought | Step-by-step reasoning for complex tasks |
| Teleprompters | Optimizers that improve prompts automatically |
| GEPA | Genetic-Pareto evolution of program instructions |
| Metrics | Define success criteria for optimization |
Command Cheatsheet
# Install DSPy
pip install dspy-ai

# Basic setup
import dspy
lm = dspy.LM("openai/gpt-4o-mini", api_key="...")
dspy.configure(lm=lm)

# Define signature
class Task(dspy.Signature):
    input: str = dspy.InputField()
    output: str = dspy.OutputField()

# Create module
class Program(dspy.Module):
    def __init__(self):
        super().__init__()
        self.predict = dspy.Predict(Task)

    def forward(self, input):
        return self.predict(input=input).output

# Optimize
from dspy.teleprompt import BootstrapFewShot
optimizer = BootstrapFewShot(metric=my_metric)
optimized = optimizer.compile(Program(), trainset=data)

# Save/Load
optimized.save("my_program.json")
loaded = Program()
loaded.load("my_program.json")
Next Steps
- Try the examples in this guide with your own use case
- Experiment with different predictors (Predict vs ChainOfThought)
- Build a simple RAG system with optimization
- Explore GEPA for multi-objective optimization
- Join the DSPy community for support and updates
