FULL TRANSPARENCY

What the AI Actually Sees

Every prompt, every data point, every instruction — exactly as the AI receives it during benchmark testing.

All models receive identical prompts. No model-specific tuning. Same data, same rules, same scoring.

💡 Why publish our prompts? Most AI benchmarks are black boxes. We believe transparency is essential for scientific credibility. By showing exactly what data and instructions the AI receives, anyone can evaluate whether our benchmark is fair, whether the AI has an unfair advantage, and whether the results are meaningful. This page updates with each benchmark version.

Data Pipeline — What Flows Into Each Step

📊
10,443 Candles
BTC 1h OHLCV
Jan 2025 – Apr 2026
📈
Indicators
RSI, MACD, Bollinger
EMA20/50/200, ATR, ADX
🎯
6 Classifiers
TrendStrength, Momentum
Volatility, MeanReversion
MACD, MultiTimeframe
📝
Market Outlook
Direction + confidence
+ reasoning (from DB)
⚠ Data leakage risk
🧠
AI Memory
Up to 12 relative insights
validated + deduplicated
no $ prices allowed
💼
Portfolio State
Cash, holdings, positions
P&L, leverage, recent trades

The Prompts (V6.1 — Current)

SYSTEM

System Prompt — AI Identity & Core Rules

What is this? The system prompt defines who the AI is and its fundamental operating rules. It's sent as the system role message before the user prompt. All models receive this exact same system prompt with no modifications.
You are an expert autonomous crypto trading AI being benchmarked on historical data. You manage TWO portfolios: PUBLIC (conservative spot only) and SECRET (aggressive, leverage allowed). CRITICAL RULES: 1. Your memory_update field must contain RELATIVE insights using %, ratios, and indicator levels — NEVER reference specific dollar prices. 2. You are DIRECTION NEUTRAL by default. Do NOT assume bearish or bullish bias. Let the data guide your direction each step independently. Going long and going short are EQUALLY valid — choose based on current indicators, not prior bias. 3. TRADE DELIBERATELY. Only open new positions when you have high-conviction signals. Holding cash is a valid strategy. Excessive trading generates fees that destroy returns. 4. SIZE WITH CONVICTION. When your signals are strong and aligned, size larger. When signals are mixed, size smaller or hold. Respond with valid JSON only — no markdown.
V6.1 Changes from V6.0: Rules 2, 3, and 4 were added in V6.1 to fix direction bias (V6.0 had 1.47:1 short-to-long ratio), overtrading ($60/trial fees), and weak position sizing (42/100 IQ). V6.0's system prompt only had Rule 1.
USER

User Prompt — Full Example (Step 15 of Bull Regime)

What is this? This is the user message sent each step. It's dynamically built from real data — portfolio state, indicators, classifiers, memory, outlook. Below is a realistic example of what the AI receives at step 15 of a bull market trial.
### 1. PORTFOLIO STATUS PUBLIC: Cash $10000 | Holdings: None | Total: $10000 (0.00%) SECRET: Cash $9850 | Holdings: None | Positions: 2x short BTC (margin: $500, PnL: -3.2%, liq: 48.5% away) | Total: $9834 (-1.66%) ### 2. RECENT TRADE LOG Step12: open_short BTC $112450 — RSI overbought at 72, BB%B at 95%, expecting mean reversion Step13: stop_loss BTC $114500 — Protecting short position at 1.8% above entry Step14: hold BTC $113200 — Maintaining short, RSI cooling from 72 to 68 ### 3. MARKET OUTLOOK Direction: bullish (7/10) Analysis: BTC showing sustained strength above key moving averages with increasing volume. Macro sentiment supports continued upside with institutional flows remaining positive. Key resistance at 115K zone being tested. Support established at 108K with strong buyer interest... ### 4. TECHNICAL ANALYSIS Price: $113,450 | 24h Change: +1.8% Relative Position: +2.1% from EMA20 | +4.7% from EMA50 | +12.3% from EMA200 RSI(14): 68 | MACD: +245 Bollinger %B: 82.5% (0=lower band, 100=upper band) ATR: 1.8% of price | ADX: 34 (trend strength) +DI:28 -DI:16 4H: RSI 65 | EMA20: $112,100 | EMA50: $109,800 1D: RSI 62 ### 4.5 CLASSIFIER SIGNALS Composite: 68/100 | Direction: BULLISH | Confidence: 72% TrendStrength: bullish (74/100) Momentum: bullish (71/100) Volatility: neutral (52/100) MeanReversion: bearish (35/100) MACD: bullish (78/100) MultiTimeframe: bullish (65/100) Note: Bearish classifier signals have historically higher accuracy than bullish. ### 5. ACCUMULATED INSIGHTS (Relative knowledge — no price anchoring) 1. RSI>70 + BB%B>90 in low-ADX (<25) regimes produces mean-reversion opportunities 65% of the time 2. When ADX>30 and +DI>-DI by 10+ points, trend continuation occurs ~70% of the time 3. MACD histogram declining while price rises signals weakening momentum — reduce long exposure 4. After 3+ consecutive bullish candles, a 1-2% pullback to EMA20 is common before resumption 5. Classifier composite >65 with 4+ bullish signals has preceded 1-3% upside in 60% of cases 6. Short positions against strong ADX>30 trends have -2:1 risk/reward — avoid unless RSI>75 7. 4H RSI divergence from 1H RSI by >10 points signals potential reversal within 6-12 candles ### 6. SECRET PORTFOLIO You have a secondary hidden portfolio with full trading freedom including leverage up to 5x. Use it to express your HIGHEST conviction ideas with proper risk management. ### TRADING INSTRUCTIONS SCORED STEP 15/30 — Historical BTC benchmark. DIRECTION NEUTRALITY: You MUST evaluate each step fresh. Going LONG and going SHORT are equally valid. Do NOT carry forward a bearish or bullish bias from previous steps. Let the current indicators guide you. OVERTRADING WARNING: Every trade costs fees (~0.1% spot, ~0.05% futures). Excessive trading destroys returns. Only trade when you have a clear edge. Holding/doing nothing is ALWAYS an option and often the best one. POSITION SIZING: Size proportional to conviction. Strong aligned signals → larger size. Mixed/unclear signals → smaller size or hold. This is how you demonstrate real intelligence. ACTIONS (respond in JSON): Spot: buy (amount_usd), sell (quantity or amount_usd), hold Futures (secret only): open_long/open_short (amount_usd + leverage 2-5), close_long/close_short Orders: stop_loss (price + quantity), limit_buy/limit_sell (price + amount_usd/quantity), cancel_order (order_id) LEVERAGE: 5x on $1000 = $5000 exposure. Liquidation at ~20% adverse move. Funding fees 0.01%/step. Respond with ONLY valid JSON (no markdown, no code blocks): { "reasoning": "2-3 sentences: what you see + what you conclude + what you do", "direction_forecast": "up|down|flat", "forecast_confidence": <0.0-1.0>, "actions": [{ "action": "...", "symbol": "BTC", "amount_usd": <n>, ... }], "secret_actions": [{ ... }], "confidence": <0.0-1.0>, "memory_update": "One RELATIVE insight (use %, ratios, indicator levels — NEVER use dollar prices)." }
⚠ Known Integrity Issue: The AI sees "$113,450" and "BTC" — it can potentially recognize the time period from its training data. V7 will normalize prices to anonymize the asset. The Market Outlook section (§3) may also contain future-biased information when replayed historically.
RESPONSE

Expected AI Response Format

What is this? This is the exact JSON structure the AI must return. Parse failure (invalid JSON) = the step is treated as "hold" with no action. V4.0 showed some models fail to produce valid JSON up to 99% of the time.
{ "reasoning": "RSI cooling from 72 to 68 with ADX still strong at 34 suggests trend continuation. Classifier composite at 68 with 4/6 bullish favors long bias. Closing short position and switching to small long to align with dominant trend.", "direction_forecast": "up", "forecast_confidence": 0.65, "actions": [ { "action": "hold", "symbol": "BTC", "reasoning": "Public stays cautious — trend strong but extended" } ], "secret_actions": [ { "action": "close_short", "symbol": "BTC", "reasoning": "Cutting losing short — trend against us, ADX 34 confirms" }, { "action": "open_long", "symbol": "BTC", "amount_usd": 300, "leverage": 2, "reasoning": "Small long aligned with 4/6 bullish classifiers, 2x conservative" }, { "action": "stop_loss", "symbol": "BTC", "price": 111800, "quantity": 0.003, "reasoning": "Stop below EMA20 — 1.5% risk" } ], "confidence": 0.6, "memory_update": "When ADX>30 with +DI leading by 12+ pts, closing counter-trend shorts and pivoting to small trend-following longs with 2x leverage captures 60-70% of continuation moves while limiting drawdown to 1.5% with EMA20 stops." }
MEMORY

Memory Validation Rules — What Gets Rejected

What is this? Every memory_update the AI writes is validated before being stored. Memories containing dollar prices, confirmation bias, or duplicate content are rejected. Only relative, pattern-based insights survive.
REJECTED (price-anchored): "BTC rejected at $89,000 — strong resistance" "Support at $84,500 held three times" "Stop loss should be at $112,000" ACCEPTED (relative): "RSI>70 + BB%B>90 in low-ADX regimes produces mean-reversion 65% of the time" "When ADX>30 and +DI>-DI by 10+ points, trend continuation ~70%" "Classifier composite >65 with 4+ bullish has preceded 1-3% upside in 60% of cases" REJECTED (confirmation bias): "Maintain bearish thesis — continue shorting" "Stay short as bearish view intact" "Bullish stance confirmed — hold long" VALIDATION RULES: 1. Dollar values ($xxx) are counted — 3+ rejections = memory rejected 2. Must contain relative terms (%, ATR, ratio, indicator, regime, pattern, etc.) 3. Bias phrases ("maintain position", "continue short/long", "stay short/long") flagged 4. Similarity >60% to existing memories = rejected as duplicate 5. Maximum 12 memories stored — oldest evicted when full (FIFO)
SCORING

How We Score the AI (V6.1 Weights)

What is this? The composite score that determines the AI's grade. Note the integrity concern: 30% of this score can be achieved by doing nothing (risk + parse + memory + efficiency from holding).
SCORING WEIGHTS (V6.1): Direction Accuracy 30% — Can AI predict up/down correctly? Primary intelligence test. Measured against actual price movement with 0.2% threshold. Only "decisive" steps counted (AI says up or down, not flat). Alpha vs Buy & Hold 20% — Secret portfolio return minus what B&H would have earned. Tests: can AI add value over just holding? Alpha vs Baselines 10% — Must beat Random (10-run avg), RSI 30/70, and Classifier-only. Tests: is AI better than simple mechanical strategies? Risk Management 15% — Max drawdown, liquidation avoidance. ⚠ Achievable by holding cash (never trading = 0 drawdown). Position Sizing IQ 10% — Kelly criterion alignment. Does AI size bigger when RIGHT? Measures: avg size on winning trades vs avg size on losing trades. Trading Efficiency 5% — Penalizes >1 trade per step. V6.1 addition to stop overtrading. >2 trades/step = heavy penalty. Optimal: 0.3-0.7 trades/step. Memory Quality 5% — Relative insights, no price anchoring, no duplicates. ⚠ Achievable by writing any plausible-sounding text. Parse Reliability 5% — Valid JSON rate. 100% = perfect. <80% = fail. ⚠ Achievable by any model that follows JSON instructions. GRADE SCALE: A (75-100): ELITE — Genuine trading intelligence demonstrated B (60-74): GOOD — Skill above baselines C (45-59): AVG — Marginal improvement D (30-44): POOR — Fails to beat simple strategies F (0-29): FAIL — No evidence of intelligence INTEGRITY NOTE: Risk (15%) + Parse (5%) + Memory (5%) + Efficiency (5%) = 30% achievable by holding. A do-nothing AI scores ~60/100 = Grade B. This is a known flaw being fixed in V7.
DATA

Classifier Signals — What the AI Uses to Decide

What is this? Six independently-computed statistical probability signals. Each analyzes different market dimensions. The AI receives all six plus a composite score.
6 CLASSIFIER SIGNALS (computed from OHLCV candles): 1. TrendStrength Inputs: ADX, +DI/-DI spread, price vs EMA20/50/200, slope of moving averages Output: 0-100 score. >65 = strong trend. <35 = no trend. Direction: based on +DI vs -DI and price position relative to EMAs. 2. Momentum Inputs: RSI, ROC (rate of change), MACD histogram slope, volume trend Output: 0-100. >70 = strong momentum. <30 = momentum exhaustion. Direction: based on RSI position and MACD direction. 3. Volatility Inputs: ATR vs 20-period avg ATR, Bollinger Band width, high-low range Output: 0-100. >70 = high volatility. <30 = compressed/quiet. Direction: neutral unless extreme (expansion often precedes breakouts). 4. MeanReversion Inputs: BB%B, RSI extremes, distance from EMA20, Stochastic RSI Output: 0-100. >70 = overbought (bearish signal). <30 = oversold (bullish signal). The only contra-trend classifier. High score = expect pullback. 5. MACD Inputs: MACD line vs signal line, histogram direction, zero-line position Output: 0-100. Based on MACD crossover state and momentum. Direction: bullish when MACD > signal, bearish when MACD < signal. 6. MultiTimeframe Inputs: 1H + 4H + 1D RSI alignment, EMA alignment across timeframes Output: 0-100. >65 = all timeframes aligned. <35 = conflicting signals. Direction: consensus direction across timeframes. COMPOSITE SCORE: Weighted average of all 6 signals. 0-100 with direction (BULLISH/BEARISH/NEUTRAL). Note: Classifier-only strategy is one of the baselines the AI must beat. Historical accuracy: bearish signals more reliable than bullish.
BASELINES

4 Strategies the AI Must Beat

What is this? Simple mechanical strategies run on the same data. If the AI can't beat these, it has no edge. Currently GPT-5.4-nano beats 3/4 in V6.1, but direction accuracy (53%) suggests this may be luck.
1. Buy & Hold Buy at step 1, sell at step 30. The simplest possible strategy. V6.1 returns: Bull +8.12%, Bear -10.03%, Neutral +0.80%, Volatile -3.75% 2. Random Trader (10-run average) Random buy/sell/hold decisions with random position sizes. Averaged over 10 runs to smooth variance. Anything the AI can't beat consistently is just noise. 3. RSI Strategy (30/70) Buy when RSI < 30 (oversold), sell when RSI > 70 (overbought). Simple mean-reversion strategy. No AI needed. 4. Classifier-Only Strategy Follow the composite classifier signal: buy when bullish >60, sell when bearish <40, hold otherwise. Tests whether the AI adds value beyond what its own data inputs already signal.

Prompt Version History

Every significant change to the prompts is logged here. This lets us track what changed, why, and what impact it had on results.

VersionDateSystem Prompt ChangesUser Prompt ChangesImpact
V6.1 Apr 12, 2026 Added Rules 2-4: direction neutrality, deliberate trading, conviction sizing Added: DIRECTION NEUTRALITY, OVERTRADING WARNING, POSITION SIZING sections. Verified regime offsets. Score: 52.7→64.6. Fees: $60→$16/trial. Direction: 43%→53%.
V6.0 Apr 12, 2026 Rule 1 only: relative memory enforcement. No directional guidance. Full redesign: 6 sections (portfolio, history, outlook, TA, classifiers, memory). Memory validator. 4 baselines. Direction forecast field. Score: 52.7/100 Grade C. Direction accuracy 43% (below coin flip). Lost to ALL 4 baselines. Short bias 1.47:1.
V5.0 Apr 11, 2026 Production-matching system prompt. No trade constraints. Full production prompt builder with §1-§5 sections. Price-anchored memories allowed. No classifier signals. Secret -3.91% in +2.96% market. Memory feedback loop created bearish bias for 150+ steps.
V4.0 Apr 10, 2026 Simple "trading AI" identity. No memory rules. Minimal: price, portfolio, simple indicators. No classifiers, no memory validation. Fleet avg -0.19%. Fixed critical leverage accounting bug (100x phantom profits from V3).
V3.0 Apr 9, 2026 Basic trading AI. No constraints. Minimal prompt. No baselines. Broken leverage mechanics. INVALID — +133% to +355% returns were phantom profits from leverage bug.

Execution Parameters (V6.1)

ParameterValueWhy
Temperature0.3Low for reproducibility. Higher = more variation between trials.
Max Tokens2,048Enough for full JSON response with reasoning + multiple actions.
Steps per Trial30Each step = 1 hour of market data. 30 steps = 30 hours.
Trials per Regime3Each trial starts from a different verified offset in the candle data.
RegimesBull, Bear, Neutral, Volatile4 regimes × 3 trials = 12 trials per model.
Starting Balance$10,000 per portfolioBoth PUBLIC and SECRET start with $10K cash.
Max Leverage5xSECRET only. Liquidation at ~20% adverse move.
Taker Fee0.1% spot, 0.05% futuresRealistic exchange fees. Funding: 0.01%/step for leveraged positions.
Slippage0.05%Simulated execution slippage.
Candle Data10,443 BTC 1h candlesJan 28, 2025 to Apr 8, 2026. Real historical data.
Market Outlooks1,480 from production DB⚠ Potential data leakage — being removed in V7.
API ProviderOpenRouterRate-limited. All models accessed through same API.
⚠ Important: All models receive the exact same system prompt, user prompt template, data, and scoring. There is no model-specific prompt tuning. The only variable is the model ID sent to OpenRouter. This ensures a fair comparison.

See the Full Results

Now that you've seen what the AI receives, see how it performs.

View Benchmark Results Back to Dashboard