Autonomous strategy search agent for the swym backtesting platform.
Runs a loop: asks Claude to generate trading strategies → submits backtests to swym → evaluates results → feeds learnings back → repeats. Promising strategies are automatically validated on out-of-sample data to filter overfitting.
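The loop can be pictured as a generate, backtest, evaluate, feed-back cycle. Below is a minimal std-only Rust sketch of that shape. It assumes two hypothetical traits, `StrategyGenerator` (standing in for the Claude client) and `Backtester` (standing in for the swym API), and it reduces each backtest to a single Sharpe score. These names are illustrative and are not the tool's actual types. Out-of-sample validation and conversation trimming are left out.

```rust
/// Hypothetical stand-in for the Claude client: turns the results history
/// into a new strategy (a JSON string in the real tool).
pub trait StrategyGenerator {
    fn generate(&mut self, history: &str) -> String;
}

/// Hypothetical stand-in for the swym API: backtests a strategy and returns
/// a single score (simplified here to one Sharpe ratio).
pub trait Backtester {
    fn backtest(&mut self, strategy: &str) -> f64;
}

/// One completed iteration, kept so later prompts can learn from it.
#[derive(Debug, Clone)]
pub struct Attempt {
    pub strategy: String,
    pub sharpe: f64,
}

/// Generate -> backtest -> record -> feed back, repeated `max_iterations` times.
pub fn search_loop<G: StrategyGenerator, B: Backtester>(
    generator: &mut G,
    backtester: &mut B,
    max_iterations: usize,
) -> Vec<Attempt> {
    let mut history: Vec<Attempt> = Vec::new();
    for _ in 0..max_iterations {
        // Every prior attempt, including failures, is rendered as one line of text.
        let summary: String = history
            .iter()
            .enumerate()
            .map(|(i, a)| format!("#{} sharpe={:.2} strategy={}\n", i + 1, a.sharpe, a.strategy))
            .collect();
        // The generator sees the full history before proposing the next strategy.
        let strategy = generator.generate(&summary);
        let sharpe = backtester.backtest(&strategy);
        history.push(Attempt { strategy, sharpe });
    }
    history
}
```

Passing the history as rendered text, instead of relying on the chat transcript alone, means the generator still sees every earlier result after old conversation turns are dropped.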
```bash
export ANTHROPIC_API_KEY="sk-ant-..."
cargo run -- \
  --swym-url https://dev.swym.hanzalova.internal/api/v1 \
  --max-iterations 50 \
  --instruments binance_spot:BTCUSDC,binance_spot:ETHUSDC,binance_spot:SOLUSDC \
  --backtest-from 2025-01-01T00:00:00Z \
  --backtest-to 2025-10-01T00:00:00Z \
  --oos-from 2025-10-01T00:00:00Z \
  --oos-to 2026-03-01T00:00:00Z
```
1. Coverage check: verifies that candle data exists for all instruments and finds the candle intervals they have in common.
2. Strategy generation: sends the DSL schema and prior results to Claude, which produces a new strategy JSON each iteration.
3. In-sample backtest: submits the strategy against all instruments for the training period and evaluates Sharpe ratio, profit factor, win rate and net PnL.
4. Out-of-sample validation: if any instrument shows a Sharpe ratio above `--min-sharpe` with at least `--min-trades` trades, the strategy is re-tested on held-out data. Only strategies that pass both phases are saved as "validated".
5. Learning loop: all results (including failures) are fed back to Claude so it can learn from what works and what doesn't. The conversation is trimmed to avoid exhausting the context window, while the full results history is still passed as structured text.
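The coverage check in step 1 comes down to a set intersection over each instrument's available candle intervals, with an early error when an instrument has no data at all. Here is a minimal std-only Rust sketch. It assumes coverage has already been fetched into a map from `exchange:SYMBOL` to interval names. The map shape, interval strings such as `"4h"`, and `common_intervals` are hypothetical and are not swym's actual API.

```rust
use std::collections::{BTreeSet, HashMap};

/// Hypothetical helper: returns the intervals available for every instrument,
/// or an error naming the first instrument with no candle data (fail fast).
pub fn common_intervals(
    coverage: &HashMap<String, Vec<String>>,
    instruments: &[&str],
) -> Result<BTreeSet<String>, String> {
    let mut common: Option<BTreeSet<String>> = None;
    for inst in instruments {
        // Missing or empty coverage aborts the whole run.
        let intervals = coverage
            .get(*inst)
            .filter(|v| !v.is_empty())
            .ok_or_else(|| format!("no candle data for {inst}"))?;
        let set: BTreeSet<String> = intervals.iter().cloned().collect();
        // Narrow the running intersection with this instrument's intervals.
        common = Some(match common {
            None => set,
            Some(acc) => acc.intersection(&set).cloned().collect(),
        });
    }
    Ok(common.unwrap_or_default())
}
```

A `BTreeSet` keeps the result in a stable order, which makes the intervals easy to list in a prompt or log line.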
All options are available as CLI flags and environment variables:
| Flag | Env | Default | Description |
|---|---|---|---|
| `--swym-url` | `SWYM_API_URL` | `https://dev.swym.hanzalova.internal/api/v1` | Swym API base URL |
| `--anthropic-key` | `ANTHROPIC_API_KEY` | required | Anthropic API key |
| `--model` | `CLAUDE_MODEL` | `claude-sonnet-4-20250514` | Claude model |
| `--max-iterations` | | `50` | Maximum search iterations |
| `--min-sharpe` | | `1.0` | Minimum Sharpe ratio for a "promising" strategy |
| `--min-trades` | | `10` | Minimum trade count for significance |
| `--instruments` | | BTC, ETH, SOL vs USDC | Comma-separated list of `exchange:SYMBOL` |
| `--backtest-from` | | `2025-01-01` | In-sample start |
| `--backtest-to` | | `2025-10-01` | In-sample end |
| `--oos-from` | | `2025-10-01` | Out-of-sample start |
| `--oos-to` | | `2026-03-01` | Out-of-sample end |
| `--initial-balance` | | `10000` | Starting USDC balance |
| `--fees-percent` | | `0.001` | Fee per trade (0.1%) |
| `--output-dir` | | `./scout-results` | Where to save strategies and reports |
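The `--instruments` value is a comma-separated list of `exchange:SYMBOL` pairs, as in the `binance_spot:BTCUSDC` example above. The sketch below shows one way such a value could be split and validated using only the Rust standard library. The `Instrument` struct and `parse_instruments` function are hypothetical illustrations and not the tool's actual parser.

```rust
/// Hypothetical parsed form of one `exchange:SYMBOL` entry.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Instrument {
    pub exchange: String,
    pub symbol: String,
}

/// Splits a comma-separated `exchange:SYMBOL` list, tolerating spaces around
/// commas and rejecting entries without a colon or with an empty half.
pub fn parse_instruments(arg: &str) -> Result<Vec<Instrument>, String> {
    arg.split(',')
        .map(str::trim)
        .filter(|s| !s.is_empty())
        .map(|item| -> Result<Instrument, String> {
            // Split on the first colon only: exchange before, symbol after.
            let (exchange, symbol) = item
                .split_once(':')
                .ok_or_else(|| format!("expected exchange:SYMBOL, got {item:?}"))?;
            if exchange.is_empty() || symbol.is_empty() {
                return Err(format!("empty exchange or symbol in {item:?}"));
            }
            Ok(Instrument { exchange: exchange.to_string(), symbol: symbol.to_string() })
        })
        // Collecting into Result stops at the first malformed entry.
        .collect()
}
```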
```
scout-results/
├── strategy_001.json      # Every strategy attempted
├── strategy_002.json
├── ...
├── validated_017.json     # Strategies that passed OOS validation
├── validated_031.json     # (includes in-sample + OOS metrics)
└── best_strategy.json     # Highest avg Sharpe across instruments
```
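The file names above use a zero-padded iteration number, and `best_strategy.json` holds the strategy with the highest Sharpe ratio averaged across instruments. A minimal Rust sketch of both rules follows. The `StrategyResult` type and the helper names are hypothetical, and the README does not say whether the average uses in-sample or out-of-sample figures, so the sketch simply averages whatever per-instrument Sharpe values it receives.

```rust
/// Hypothetical per-strategy record: one Sharpe value per instrument.
#[derive(Debug, Clone)]
pub struct StrategyResult {
    pub iteration: u32,
    pub sharpe_by_instrument: Vec<f64>,
}

/// Builds names such as `strategy_002.json` or `validated_017.json`.
pub fn strategy_file_name(iteration: u32, validated: bool) -> String {
    let prefix = if validated { "validated" } else { "strategy" };
    format!("{prefix}_{iteration:03}.json")
}

/// Mean Sharpe across instruments; `None` when there are no results.
pub fn average_sharpe(r: &StrategyResult) -> Option<f64> {
    if r.sharpe_by_instrument.is_empty() {
        return None;
    }
    Some(r.sharpe_by_instrument.iter().sum::<f64>() / r.sharpe_by_instrument.len() as f64)
}

/// Picks the strategy with the highest finite average Sharpe.
pub fn best_strategy(results: &[StrategyResult]) -> Option<&StrategyResult> {
    results
        .iter()
        .filter_map(|r| average_sharpe(r).filter(|s| s.is_finite()).map(|s| (r, s)))
        // total_cmp gives a total order on f64, so max_by never panics.
        .max_by(|a, b| a.1.total_cmp(&b.1))
        .map(|(r, _)| r)
}
```

Filtering out non-finite averages keeps a single NaN or infinite Sharpe (for example from a strategy with zero variance) from being selected as the best strategy.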
- Start with Sonnet (`claude-sonnet-4-20250514`) for cost efficiency during exploration. Switch to Opus to refine promising strategies.
- 50 iterations is a reasonable starting point. The agent typically finds interesting patterns within 20-30 iterations if they exist.
- Watch the logs: the per-iteration summaries show you what the agent is learning in real time.
- Adjust dates to match your actual candle coverage. The agent checks coverage at startup and will fail fast if data is missing.
- The OOS validation threshold is intentionally relaxed (70% of in-sample Sharpe, half the trade count) because out-of-sample degradation is expected. Strategies that keep their edge through this filter are genuinely interesting.
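The two-phase gate can be sketched as a pair of threshold checks. This Rust sketch makes two assumptions that the README does not confirm. First, the relaxation applies to the configured thresholds, so the OOS bar is 70% of `--min-sharpe` and half of `--min-trades`. Second, the same "any instrument passes" rule applies out of sample. The `Thresholds` and `InstrumentMetrics` types are hypothetical.

```rust
/// Hypothetical per-instrument backtest metrics.
#[derive(Debug, Clone, Copy)]
pub struct InstrumentMetrics {
    pub sharpe: f64,
    pub trades: u32,
}

/// Mirrors `--min-sharpe` and `--min-trades`.
#[derive(Debug, Clone, Copy)]
pub struct Thresholds {
    pub min_sharpe: f64,
    pub min_trades: u32,
}

impl Thresholds {
    /// Assumed OOS relaxation: 70% of the Sharpe bar, half the trade count.
    pub fn relaxed_for_oos(self) -> Thresholds {
        Thresholds { min_sharpe: self.min_sharpe * 0.7, min_trades: self.min_trades / 2 }
    }

    /// Sharpe strictly above the bar and at least the minimum number of trades.
    pub fn passes(self, m: InstrumentMetrics) -> bool {
        m.sharpe > self.min_sharpe && m.trades >= self.min_trades
    }
}

/// "Promising" if any instrument clears the thresholds.
pub fn any_passes(t: Thresholds, results: &[InstrumentMetrics]) -> bool {
    results.iter().any(|m| t.passes(*m))
}

/// Validated only if the in-sample check and the relaxed OOS check both pass.
pub fn is_validated(t: Thresholds, in_sample: &[InstrumentMetrics], oos: &[InstrumentMetrics]) -> bool {
    any_passes(t, in_sample) && any_passes(t.relaxed_for_oos(), oos)
}
```

With the defaults (`1.0` and `10`), this reading requires an out-of-sample Sharpe above 0.7 with at least 5 trades.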