Quantitative Trading with Deep Reinforcement Learning
The Opportunity
QuantaFin Capital, a proprietary trading desk in Milan managing a €120M portfolio focused on European equity derivatives, sought to augment its human traders with an AI-driven execution system. The goal was not to replace traders but to capture the high-frequency, low-margin statistical arbitrage opportunities that humans cannot react to fast enough: sub-second price dislocations across correlated instruments.
System Architecture: The Trading Brain
1. Market Data Infrastructure
- Ultra-low-latency market data feed processing via kernel-bypass networking (DPDK).
- Normalized order book data from Borsa Italiana, Eurex, and Euronext into a unified internal format.
- Tick-by-tick storage in QuestDB (a time-series database optimized for financial data) — 5+ billion rows per day.
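To make the "unified internal format" concrete, a minimal normalized top-of-book record might look like the sketch below. The field names and venue codes are illustrative assumptions, not the desk's actual schema:

```python
from dataclasses import dataclass
from enum import Enum

class Venue(Enum):
    BORSA_ITALIANA = "BIT"   # venue codes are illustrative, not official MICs
    EUREX = "EUREX"
    EURONEXT = "EURONEXT"

@dataclass(frozen=True)
class NormalizedTick:
    venue: Venue
    symbol: str
    ts_ns: int       # exchange timestamp, nanoseconds since epoch
    bid_px: float
    bid_qty: int
    ask_px: float
    ask_qty: int

    @property
    def mid(self) -> float:
        return 0.5 * (self.bid_px + self.ask_px)

    @property
    def imbalance(self) -> float:
        # top-of-book order book imbalance in [-1, 1]
        total = self.bid_qty + self.ask_qty
        return (self.bid_qty - self.ask_qty) / total if total else 0.0
```

Normalizing at ingest means every downstream component (feature builder, agent, logger) consumes one schema regardless of the originating exchange's wire format.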
2. Reinforcement Learning Agent
- Custom Proximal Policy Optimization (PPO) agent built with Stable-Baselines3.
- State space: 127-dimensional feature vector including order book imbalance, VWAP deviation, realized volatility (Parkinson estimator), cross-asset correlation matrix, and Greeks exposure.
- Action space: Continuous position sizing from -1 (max short) to +1 (max long), plus a discrete hold action (a hybrid continuous/discrete space).
- Reward function: Risk-adjusted PnL (Sharpe-penalized) with transaction cost awareness and maximum drawdown penalties.
- Trained on 7 years of tick data (~180 billion data points) using curriculum learning: simple single-asset → multi-asset → portfolio-level.
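Two of the ingredients above can be sketched concretely: the Parkinson estimator, which infers volatility from high/low ranges rather than close-to-close returns, and a Sharpe-penalized reward shaped as PnL net of costs with drawdown and volatility penalties. The cost model and penalty coefficients below are hypothetical placeholders, not the desk's actual parameters:

```python
from math import log, sqrt
from statistics import pstdev

def parkinson_vol(highs: list[float], lows: list[float]) -> float:
    """Parkinson range-based volatility: sqrt( sum(ln(H_i/L_i)^2) / (4 n ln 2) )."""
    n = len(highs)
    return sqrt(sum(log(h / l) ** 2 for h, l in zip(highs, lows)) / (4 * n * log(2)))

def step_reward(pnl: float, turnover: float, equity_curve: list[float],
                cost_bps: float = 2.0, dd_lambda: float = 0.5,
                vol_lambda: float = 0.1) -> float:
    """Per-step reward: PnL minus transaction costs, minus drawdown and
    equity-volatility penalties. Coefficients are illustrative placeholders."""
    cost = abs(turnover) * cost_bps / 10_000        # transaction cost awareness
    peak = max(equity_curve)                        # equity high-water mark
    drawdown = (peak - equity_curve[-1]) / peak if peak > 0 else 0.0
    deltas = [b - a for a, b in zip(equity_curve, equity_curve[1:])]
    vol = pstdev(deltas) if len(deltas) > 1 else 0.0  # Sharpe-style penalty
    return pnl - cost - dd_lambda * drawdown - vol_lambda * vol
```

Penalizing volatility and drawdown inside the reward, rather than only at evaluation time, pushes the policy toward the risk-adjusted behavior the Sharpe metric ultimately measures.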
3. Risk Management Layer
- Hard-coded risk limits that cannot be overridden by the AI agent:
  - Max position size: 2% of NAV per instrument.
  - Max portfolio VaR (99%, 1-day): €1.2M.
  - Kill switch: automatic liquidation if daily PnL drops below -€500k.
- Real-time Greeks monitoring (Delta, Gamma, Vega, Theta) with auto-hedging.
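A pre-trade gate enforcing the hard limits above could be sketched as follows. The limit values and €120M NAV come from the case study; the interface and check ordering are assumptions:

```python
from dataclasses import dataclass

NAV = 120_000_000  # €120M portfolio, per the case study

@dataclass(frozen=True)
class RiskLimits:
    max_position_frac: float = 0.02     # 2% of NAV per instrument
    max_var_eur: float = 1_200_000      # 99% 1-day portfolio VaR cap
    kill_switch_pnl: float = -500_000   # liquidate below this daily PnL

def check_order(limits: RiskLimits, position_eur: float, order_eur: float,
                portfolio_var: float, daily_pnl: float) -> tuple[bool, str]:
    """Runs before every order, outside the agent's control. Returns (allowed, reason)."""
    if daily_pnl <= limits.kill_switch_pnl:
        return False, "KILL_SWITCH"
    if abs(position_eur + order_eur) > limits.max_position_frac * NAV:
        return False, "POSITION_LIMIT"
    if portfolio_var > limits.max_var_eur:
        return False, "VAR_LIMIT"
    return True, "OK"
```

Keeping this layer outside the agent's action path is what makes the limits genuinely non-overridable: the model proposes, the risk gate disposes.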
4. Execution Engine
- Smart Order Router (SOR) that fragments large orders across venues to minimize market impact.
- FIX 4.4 protocol connectivity to multiple execution venues.
- Latency budget: < 500μs from signal to order submission.
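The fragmentation step of the SOR can be illustrated with a naive pro-rata split across venues' displayed liquidity. Real impact-minimizing routers are far more involved; the venue codes here are illustrative:

```python
def fragment_order(qty: int, venue_liquidity: dict[str, int]) -> dict[str, int]:
    """Split a parent order across venues pro-rata to displayed liquidity.
    A simplified stand-in for impact-minimizing SOR logic."""
    total = sum(venue_liquidity.values())
    if total == 0:
        return {}
    child = {v: (qty * liq) // total for v, liq in venue_liquidity.items()}
    # assign the integer-rounding remainder to the deepest venue
    remainder = qty - sum(child.values())
    deepest = max(venue_liquidity, key=venue_liquidity.get)
    child[deepest] += remainder
    return child
```

Sizing each child order below the visible depth at its venue is what keeps the parent order from walking the book and signaling intent.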
5. Monitoring & Explainability
- Custom Grafana dashboards showing real-time PnL attribution, agent confidence scores, and feature importance (SHAP values).
- Every trading decision is logged with the full state vector and the agent's action probability distribution for post-trade analysis.
- Daily automated reports comparing AI performance vs. human trader benchmarks.
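Logging every decision with its full state and action distribution can be as simple as appending JSON lines to a per-day file. The record schema below is a hypothetical sketch, not the desk's actual format:

```python
import json
import time

def log_decision(symbol: str, state_vector: list[float], action: float,
                 action_probs: list[float], logfile) -> None:
    """Append one fully reproducible trading decision as a JSON line."""
    record = {
        "ts_ns": time.time_ns(),            # decision timestamp
        "symbol": symbol,
        "state": list(state_vector),        # full feature vector (127-dim in production)
        "action": action,                   # chosen position size
        "action_probs": list(action_probs), # agent's probability distribution
    }
    logfile.write(json.dumps(record) + "\n")
```

One line per decision keeps post-trade analysis trivial: the full state can be replayed through any later model version to ask "what would today's agent have done?"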
Technology Stack
- ML/RL: Python, PyTorch, Stable-Baselines3, Ray (distributed training)
- Data: QuestDB, Apache Parquet, Redis (feature store)
- Execution: C++ (order gateway), FIX protocol, DPDK
- Infrastructure: Bare-metal servers (co-located at Equinix ML1), Docker
- Monitoring: Grafana, Prometheus, custom Python dashboards
Performance (12-month live track record)
- Annualized Sharpe Ratio: 2.8 (vs. 1.4 for the human-only desk).
- Maximum Drawdown: -3.2% (vs. -8.7% human desk).
- Win Rate: 62% of trades profitable.
- Incremental Alpha: +€4.8M attributable to the RL agent.
- Latency: p99 signal-to-order = 380μs.
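For reference, the two headline risk metrics follow standard formulas; the sketch below shows the computation, though the desk's exact return series and annualization conventions are not public:

```python
from math import sqrt
from statistics import mean, pstdev

def annualized_sharpe(daily_returns: list[float], trading_days: int = 252) -> float:
    """Annualized Sharpe ratio from daily returns (risk-free rate assumed zero)."""
    mu, sigma = mean(daily_returns), pstdev(daily_returns)
    return (mu / sigma) * sqrt(trading_days) if sigma else 0.0

def max_drawdown(equity_curve: list[float]) -> float:
    """Largest peak-to-trough decline as a (negative) fraction of the peak."""
    peak, worst = equity_curve[0], 0.0
    for x in equity_curve:
        peak = max(peak, x)
        worst = min(worst, (x - peak) / peak)
    return worst
```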