Experiments
Foil’s experiment feature lets you run A/B tests on prompts, models, and configurations to find what works best.
What Can You Test?
- Prompts - Compare different system prompts or instructions
- Models - Test gpt-4o vs. gpt-4o-mini vs. Claude
- Parameters - Temperature, max tokens, etc.
- Tools - Different tool configurations
- Full workflows - Compare entire agent implementations
Creating an Experiment
Via Dashboard
- Go to Experiments in the dashboard
- Click Create Experiment
- Configure variants
- Set traffic allocation
- Start the experiment
Via API
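A minimal sketch of creating an experiment over HTTP. The base URL, auth header, and payload fields here are assumptions based on the configuration table below, not the confirmed API; check the API reference for the exact shape.

```ts
// Hypothetical endpoint and payload shape -- adjust to the real API reference.
const response = await fetch("https://api.foil.dev/v1/experiments", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${process.env.FOIL_API_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    name: "support-prompt-v2",
    description: "Compare the current support prompt against a more concise rewrite",
    variants: [
      { name: "control", weight: 50, config: { systemPrompt: "You are a helpful support agent." } },
      { name: "concise", weight: 50, config: { systemPrompt: "Answer in three sentences or fewer." } },
    ],
    metrics: ["resolution_rate", "user_satisfaction"],
    minimumSampleSize: 500,
    maximumDuration: 14,
  }),
});

const experiment = await response.json();
console.log(experiment.id);
```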
Experiment Lifecycle
Start an Experiment
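A sketch assuming lifecycle endpoints of the form `POST /v1/experiments/{id}/{action}`; the paths are illustrative, not confirmed.

```ts
// Hypothetical: start the experiment so it begins assigning traffic.
await fetch(`https://api.foil.dev/v1/experiments/${experimentId}/start`, {
  method: "POST",
  headers: { Authorization: `Bearer ${process.env.FOIL_API_KEY}` },
});
```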
Pause an Experiment
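Pausing typically halts new assignments while keeping the data collected so far; a sketch with the same assumed endpoint shape:

```ts
// Hypothetical: pause assignment without discarding collected data.
await fetch(`https://api.foil.dev/v1/experiments/${experimentId}/pause`, {
  method: "POST",
  headers: { Authorization: `Bearer ${process.env.FOIL_API_KEY}` },
});
```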
Stop an Experiment
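Stopping ends the experiment permanently; again the endpoint path is an assumption.

```ts
// Hypothetical: stop the experiment permanently; results remain viewable.
await fetch(`https://api.foil.dev/v1/experiments/${experimentId}/stop`, {
  method: "POST",
  headers: { Authorization: `Bearer ${process.env.FOIL_API_KEY}` },
});
```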
Getting Variant Assignment
In your application, request a variant assignment:
Via API
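A sketch of requesting an assignment; the endpoint and the `identifier` field are assumptions, but the key idea is to pass a stable identifier so the same user always receives the same variant.

```ts
// Hypothetical assignment endpoint: pass a stable identifier (user ID or
// session ID) so assignment is consistent across requests.
const res = await fetch(`https://api.foil.dev/v1/experiments/${experimentId}/assign`, {
  method: "POST",
  headers: {
    Authorization: `Bearer ${process.env.FOIL_API_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({ identifier: userId }),
});

const assignment = await res.json();
// e.g. { variant: "concise", config: { systemPrompt: "..." } }
```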
Using Variants in Your Code
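One way to apply the assigned variant’s config when calling your model. This assumes the assignment payload sketched above and the OpenAI SDK; the config field names are illustrative.

```ts
import OpenAI from "openai";

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

// `assignment` comes from the assignment call above; its fields are illustrative.
const { config } = assignment;

const completion = await openai.chat.completions.create({
  model: config.model ?? "gpt-4o-mini",       // fall back if the variant doesn't set a model
  temperature: config.temperature ?? 0.7,
  messages: [
    { role: "system", content: config.systemPrompt },
    { role: "user", content: userMessage },   // your application's user input
  ],
});
```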
Tracking Metrics
Record metrics for each variant:
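A sketch of reporting a metric value for the assigned variant; the endpoint and field names are assumptions, not the confirmed API.

```ts
// Hypothetical metrics endpoint: record a signal value for this assignment.
await fetch(`https://api.foil.dev/v1/experiments/${experimentId}/metrics`, {
  method: "POST",
  headers: {
    Authorization: `Bearer ${process.env.FOIL_API_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    identifier: userId,
    variant: assignment.variant,
    metric: "user_satisfaction",
    value: 4.5,
  }),
});
```

Viewing Results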
Dashboard
The experiment results page shows:
- Variant performance comparison
- Statistical significance
- Metric breakdowns
- Traffic distribution
API
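Results can also be fetched programmatically; this sketch assumes a results endpoint and response shape, so treat both as illustrative.

```ts
// Hypothetical: fetch per-variant results and significance data.
const res = await fetch(`https://api.foil.dev/v1/experiments/${experimentId}/results`, {
  headers: { Authorization: `Bearer ${process.env.FOIL_API_KEY}` },
});
const results = await res.json();
// e.g. results.variants -> [{ name, samples, metrics: { ... } }, ...]
```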
Experiment Configuration
| Field | Type | Description |
|---|---|---|
| name | string | Experiment identifier |
| description | string | What you’re testing |
| variants | array | List of variants to test |
| variants[].name | string | Variant name |
| variants[].weight | number | Traffic percentage (0-100) |
| variants[].config | object | Variant-specific configuration |
| metrics | array | Signal names to track |
| minimumSampleSize | number | Required samples per variant |
| maximumDuration | number | Auto-stop after N days |
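Putting the fields together, a configuration might look like the following; the values are illustrative, not recommendations.

```ts
// Illustrative experiment configuration matching the fields above.
const experimentConfig = {
  name: "model-comparison-q3",                // experiment identifier
  description: "gpt-4o vs. gpt-4o-mini for support replies",
  variants: [
    { name: "gpt-4o", weight: 50, config: { model: "gpt-4o" } },          // 50% of traffic
    { name: "gpt-4o-mini", weight: 50, config: { model: "gpt-4o-mini" } },
  ],
  metrics: ["resolution_rate", "cost_per_conversation"],  // signals to track
  minimumSampleSize: 1000,  // samples required per variant
  maximumDuration: 30,      // auto-stop after 30 days
};
```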
Best Practices
Test one variable at a time
Isolate what you’re testing. If you change both prompt AND model, you won’t know which caused the difference.
Use consistent identifiers
Use user IDs or session IDs for assignment to ensure users get the same variant consistently.
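For example, keying assignment on a stable user ID keeps the experience consistent, while a fresh per-request value would reshuffle users between variants. `requestAssignment` here is a hypothetical wrapper around the assignment call above, not part of any SDK.

```ts
// Stable: the same user always maps to the same variant.
await requestAssignment(experimentId, { identifier: user.id });

// Unstable: a new UUID per request would give users a different variant each time.
// await requestAssignment(experimentId, { identifier: crypto.randomUUID() });
```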
Wait for statistical significance
Don’t end experiments early. Wait until you have enough samples and the p-value is meaningful (typically < 0.05).
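As a rough illustration of the underlying check (the dashboard computes significance for you), a two-sided two-proportion z-test on a binary metric reaches p < 0.05 when |z| exceeds about 1.96:

```ts
// Illustrative two-proportion z-test for a binary metric (e.g. resolution rate).
function zScore(successesA: number, totalA: number, successesB: number, totalB: number): number {
  const pA = successesA / totalA;
  const pB = successesB / totalB;
  const pooled = (successesA + successesB) / (totalA + totalB);
  const se = Math.sqrt(pooled * (1 - pooled) * (1 / totalA + 1 / totalB));
  return (pA - pB) / se;
}

// |z| > 1.96 corresponds to p < 0.05 for a two-sided test.
const significant = Math.abs(zScore(420, 1000, 465, 1000)) > 1.96;
```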
Track multiple metrics
A variant might improve one metric while hurting another. Track quality, user satisfaction, and business metrics.
Document your experiments
Include clear descriptions of what you’re testing and why. Future you will thank present you.