Every prompt, every data point, every instruction — exactly as the AI receives it during benchmark testing.
All models receive identical prompts. No model-specific tuning. Same data, same rules, same scoring.
Sent as the system role message before the user prompt. All models receive this exact system prompt with no modifications.
Every memory_update the AI writes is validated before being stored. Memories containing dollar prices, confirmation bias, or duplicate content are rejected. Only relative, pattern-based insights survive.
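As a rough illustration of the validation described above, the following sketch rejects memories that contain a dollar price or duplicate an already stored memory. The function name `validate_memory`, the regular expression, and the whitespace/case normalization are all assumptions made for this example, not the benchmark's actual validator. Detecting confirmation bias needs a rule the text does not specify, so it is left out here.

```python
import re

# Assumed pattern: a dollar sign followed by a digit, e.g. "$67,500" or "$ 120.5".
DOLLAR_PRICE = re.compile(r"\$\s?\d")


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivially different copies compare equal."""
    return " ".join(text.lower().split())


def validate_memory(candidate: str, stored: list[str]) -> tuple[bool, str]:
    """Return (accepted, reason) for a proposed memory_update.

    Hypothetical checks only: empty text, absolute dollar prices, and duplicates.
    """
    if not candidate.strip():
        return False, "empty"
    if DOLLAR_PRICE.search(candidate):
        return False, "dollar price"
    if normalize(candidate) in {normalize(m) for m in stored}:
        return False, "duplicate"
    return True, "ok"


if __name__ == "__main__":
    memories = ["Breakouts on low volume tend to fade"]
    print(validate_memory("BTC rejected at $67,500 twice", memories))
    print(validate_memory("breakouts on  low volume tend to fade", memories))
    print(validate_memory("Shorts opened after three red candles often reverse", memories))
```

A relative insight such as "breakouts on low volume tend to fade" passes, while anything anchored to a specific dollar level is dropped before it can feed back into later prompts.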
Every significant change to the prompts is logged here. This lets us track what changed, why, and what impact it had on results.
| Version | Date | System Prompt Changes | User Prompt Changes | Impact |
|---|---|---|---|---|
| V6.1 | Apr 12, 2026 | Added Rules 2-4: direction neutrality, deliberate trading, conviction sizing | Added: DIRECTION NEUTRALITY, OVERTRADING WARNING, POSITION SIZING sections. Verified regime offsets. | Score: 52.7→64.6. Fees: $60→$16/trial. Direction: 43%→53%. |
| V6.0 | Apr 12, 2026 | Rule 1 only: relative memory enforcement. No directional guidance. | Full redesign: 6 sections (portfolio, history, outlook, TA, classifiers, memory). Memory validator. 4 baselines. Direction forecast field. | Score: 52.7/100 Grade C. Direction accuracy 43% (below coin flip). Lost to ALL 4 baselines. Short bias 1.47:1. |
| V5.0 | Apr 11, 2026 | Production-matching system prompt. No trade constraints. | Full production prompt builder with §1-§5 sections. Price-anchored memories allowed. No classifier signals. | SECRET portfolio returned -3.91% in a +2.96% market. Memory feedback loop created bearish bias for 150+ steps. |
| V4.0 | Apr 10, 2026 | Simple "trading AI" identity. No memory rules. | Minimal: price, portfolio, simple indicators. No classifiers, no memory validation. | Fleet avg -0.19%. Fixed critical leverage accounting bug (100x phantom profits from V3). |
| V3.0 | Apr 9, 2026 | Basic trading AI. No constraints. | Minimal prompt. No baselines. Broken leverage mechanics. | INVALID: the +133% to +355% returns were phantom profits from the leverage bug. |
| Parameter | Value | Why |
|---|---|---|
| Temperature | 0.3 | Low for reproducibility. Higher = more variation between trials. |
| Max Tokens | 2,048 | Enough for full JSON response with reasoning + multiple actions. |
| Steps per Trial | 30 | Each step = 1 hour of market data. 30 steps = 30 hours. |
| Trials per Regime | 3 | Each trial starts from a different verified offset in the candle data. |
| Regimes | Bull, Bear, Neutral, Volatile | 4 regimes × 3 trials = 12 trials per model. |
| Starting Balance | $10,000 per portfolio | Both PUBLIC and SECRET start with $10K cash. |
| Max Leverage | 5x | SECRET only. Liquidation at ~20% adverse move. |
| Taker Fee | 0.1% spot, 0.05% futures | Realistic exchange fees. Funding: 0.01%/step for leveraged positions. |
| Slippage | 0.05% | Simulated execution slippage. |
| Candle Data | 10,443 BTC 1h candles | Jan 28, 2025 to Apr 8, 2026. Real historical data. |
| Market Outlooks | 1,480 from production DB | ⚠ Potential data leakage — being removed in V7. |
| API Provider | OpenRouter | Rate-limited. All models accessed through same API. |
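The arithmetic behind several rows of the table can be checked with a short sketch. The class and function names below are hypothetical. The liquidation figure uses the simplified rule "adverse move = 1 / leverage", which ignores maintenance margin and fees. Charging fees, slippage, and funding against position notional is also an assumption of this example, not a statement of how the benchmark simulator applies them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BenchmarkConfig:
    """Values copied from the parameter table; field names are illustrative."""
    steps_per_trial: int = 30
    trials_per_regime: int = 3
    regimes: tuple[str, ...] = ("Bull", "Bear", "Neutral", "Volatile")
    max_leverage: float = 5.0
    spot_taker_fee: float = 0.001      # 0.1%
    futures_taker_fee: float = 0.0005  # 0.05%
    slippage: float = 0.0005           # 0.05%
    funding_per_step: float = 0.0001   # 0.01% per step


def trials_per_model(cfg: BenchmarkConfig) -> int:
    """4 regimes x 3 trials = 12 trials per model."""
    return len(cfg.regimes) * cfg.trials_per_regime


def liquidation_move(leverage: float) -> float:
    """Simplified adverse move that wipes out the margin: 1 / leverage."""
    return 1.0 / leverage


def round_trip_cost(notional: float, fee: float, slippage: float) -> float:
    """Assumed cost of opening and closing a position: fee + slippage on each side."""
    return 2 * notional * (fee + slippage)


def funding_cost(notional: float, rate_per_step: float, steps: int) -> float:
    """Assumed funding paid on a leveraged position held for a number of steps."""
    return notional * rate_per_step * steps


if __name__ == "__main__":
    cfg = BenchmarkConfig()
    print("trials per model:", trials_per_model(cfg))
    print("liquidation move at max leverage:", liquidation_move(cfg.max_leverage))
    print("spot round trip on $10,000:",
          round(round_trip_cost(10_000, cfg.spot_taker_fee, cfg.slippage), 2))
    print("funding on $10,000 held for a full trial:",
          round(funding_cost(10_000, cfg.funding_per_step, cfg.steps_per_trial), 2))
```

Under these assumptions, a single spot round trip on the full $10,000 balance costs about $30, which gives a sense of how quickly frequent trading erodes returns.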
Now that you've seen what the AI receives, see how it performs.