In this book, we demonstrate how ML fits into the overall process of designing, executing, and evaluating a trading strategy. To this end, we’ll assume that an ML-based strategy is driven by data sources that contain predictive signals for the target universe and strategy, which, after suitable preprocessing and feature engineering, permit an ML model to predict asset returns or other strategy inputs. The model predictions, in turn, translate into buy or sell orders based on human discretion or automated rules, which may themselves be manually encoded or learned by another ML algorithm in an end-to-end approach.
Alpha factors are designed to extract signals from data to predict returns for a given investment universe over the trading horizon. A typical factor takes on a single value for each asset when evaluated at a given point in time, but it may combine one or several input variables or time periods.
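To make the idea of "a single value for each asset at a given point in time" concrete, here is a minimal sketch of one possible factor, a trailing-return (momentum) signal. The function name `momentum_factor`, the tickers, and the prices are hypothetical and chosen only for illustration; real factors typically involve more careful data handling.

```python
from typing import Dict, List


def momentum_factor(
    prices: Dict[str, List[float]], t: int, lookback: int
) -> Dict[str, float]:
    """Return one factor value per asset at time index t.

    The value is the simple return over the trailing `lookback` periods,
    so it combines two points in time of a single input variable (price).
    """
    if t - lookback < 0:
        raise ValueError("not enough history for the requested lookback")
    return {
        asset: series[t] / series[t - lookback] - 1.0
        for asset, series in prices.items()
    }


# Hypothetical closing prices for two made-up assets (illustrative only).
prices = {
    "AAA": [100.0, 102.0, 101.0, 105.0, 110.0],
    "BBB": [50.0, 49.0, 48.0, 47.5, 47.0],
}

# Evaluate the factor at the last observation using a 3-period lookback.
factor_values = momentum_factor(prices, t=4, lookback=3)
```

Evaluated at `t=4`, the factor assigns each asset exactly one number, which a strategy could then rank or feed into a model as a feature.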
Just as SWIFT is the message protocol for back-office messaging (for example, in trade settlement), the Financial Information eXchange (FIX) protocol is the de facto messaging standard for communication before and during trade execution between exchanges, banks, brokers, clearing firms, and other market participants.
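In its classic encoding, a FIX message is a sequence of `tag=value` pairs separated by the SOH control character (ASCII 0x01). The following sketch parses such a string into a dictionary. The sample order is hypothetical, the helper `parse_fix` is not part of any library, and the message omits required fields such as BodyLength (tag 9) and CheckSum (tag 10) to stay short.

```python
SOH = "\x01"  # field delimiter used by the FIX tag=value encoding

# A handful of standard FIX tag numbers and their field names.
TAG_NAMES = {
    "8": "BeginString",
    "35": "MsgType",
    "49": "SenderCompID",
    "56": "TargetCompID",
    "55": "Symbol",
    "54": "Side",
    "38": "OrderQty",
    "44": "Price",
}


def parse_fix(raw: str) -> dict:
    """Split a raw FIX string into a {field name: value} dictionary.

    Unknown tags keep their numeric tag as the key. Repeated tags
    (repeating groups) would overwrite each other in this simple sketch.
    """
    fields = {}
    for pair in raw.strip(SOH).split(SOH):
        tag, _, value = pair.partition("=")
        fields[TAG_NAMES.get(tag, tag)] = value
    return fields


# Hypothetical New Order Single (MsgType D): buy (Side 1) 100 shares at 150.25.
raw_message = SOH.join(
    ["8=FIX.4.2", "35=D", "49=BUYSIDE", "56=BROKER",
     "55=AAPL", "54=1", "38=100", "44=150.25"]
) + SOH

order = parse_fix(raw_message)
```

All values remain strings here; a production parser would also validate the checksum and convert fields to their proper types.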
Exchanges provide access to FIX messages as a real-time data feed that is parsed by algorithmic traders to track market activity and, for example, identify the footprint of market participants and anticipate their next move.
The sequence of messages allows for the reconstruction of the order book. The scale of transactions across numerous exchanges creates a large amount (~10 TB) of unstructured data that is challenging to process and, hence, can be a source of competitive advantage.
While FIX has a dominant market share, exchanges also offer native protocols. Nasdaq offers a TotalView-ITCH direct data-feed protocol, which allows subscribers to track individual orders for equity instruments from placement to execution or cancellation.
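Tracking each order from placement to execution or cancellation is what makes it possible to rebuild the order book from a message stream. The sketch below replays a short list of messages and aggregates the remaining shares at each price level. The message layout (a tuple of type, order ID, side, price, shares) and the type names are simplifying assumptions for illustration, not the actual FIX or ITCH wire format.

```python
from collections import defaultdict


def rebuild_book(messages):
    """Replay order messages and return resting shares per side and price.

    Each message is a hypothetical tuple:
    (msg_type, order_id, side, price_in_cents, shares).
    """
    orders = {}  # order_id -> [side, price, remaining_shares]
    for msg_type, order_id, side, price, shares in messages:
        if msg_type == "add":
            orders[order_id] = [side, price, shares]
        elif msg_type in ("execute", "cancel"):
            # Both events reduce the open quantity of an existing order.
            order = orders[order_id]
            order[2] -= shares
            if order[2] <= 0:
                del orders[order_id]

    # Aggregate the surviving orders into price levels.
    book = {"buy": defaultdict(int), "sell": defaultdict(int)}
    for side, price, remaining in orders.values():
        book[side][price] += remaining
    return book


# Illustrative message sequence; prices are integer cents to avoid float keys.
messages = [
    ("add", 1, "buy", 9990, 100),
    ("add", 2, "sell", 10010, 200),
    ("add", 3, "buy", 9990, 50),
    ("execute", 2, None, None, 80),   # 80 shares of order 2 trade
    ("cancel", 1, None, None, 100),   # order 1 is fully canceled
]

book = rebuild_book(messages)
```

After the replay, only order 3 rests on the buy side and the unexecuted remainder of order 2 rests on the sell side. Replaying the messages up to any chosen point yields the state of the book at that moment.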