Machine Learning for Asset Managers
Successful investment strategies are specific implementations of general theories. An investment strategy that lacks a theoretical justification is likely to be false. Hence, an asset manager should concentrate her efforts on developing a theory rather than on backtesting potential trading rules. The purpose of this Element is to introduce machine learning (ML) tools that can help asset managers discover economic and financial theories. ML is not a black box, and it does not necessarily overfit. ML tools complement rather than replace the classical statistical methods. Some of ML's strengths include (1) a focus on out-of-sample predictability over variance adjudication; (2) the use of computational methods to avoid relying on (potentially unrealistic) assumptions; (3) the ability to "learn" complex specifications, including nonlinear, hierarchical, and noncontinuous interaction effects in a high-dimensional space; and (4) the ability to disentangle the variable search from the specification search, in a way that is robust to multicollinearity and other substitution effects.
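To make strength (1) concrete, the sketch below contrasts in-sample fit with out-of-sample (k-fold cross-validated) error. It is an illustration, not code from this Element: the synthetic data, the polynomial regression standing in for an "ML model", and the helper names `kfold_indices` and `in_and_out_of_sample_mse` are all assumptions made for this example. In-sample error can only fall as the model grows more flexible, so it cannot by itself tell a genuine pattern from an overfit one; out-of-sample error can.

```python
import numpy as np
from numpy.polynomial import Polynomial


def kfold_indices(n, k, rng):
    """Hypothetical helper: shuffle 0..n-1 and split into k roughly equal folds."""
    idx = rng.permutation(n)
    return np.array_split(idx, k)


def in_and_out_of_sample_mse(x, y, degree, k=5, seed=0):
    """Hypothetical helper: return (in-sample MSE, k-fold out-of-sample MSE)."""
    # In-sample: fit and score on the same observations.
    p = Polynomial.fit(x, y, degree)
    mse_in = float(np.mean((y - p(x)) ** 2))

    # Out-of-sample: each fold is scored by a model that never saw it.
    # The same seed gives the same folds for every degree, so scores are comparable.
    rng = np.random.default_rng(seed)
    errors = []
    for test_idx in kfold_indices(len(x), k, rng):
        train_mask = np.ones(len(x), dtype=bool)
        train_mask[test_idx] = False
        p_cv = Polynomial.fit(x[train_mask], y[train_mask], degree)
        errors.append(np.mean((y[test_idx] - p_cv(x[test_idx])) ** 2))
    mse_out = float(np.mean(errors))
    return mse_in, mse_out


# Synthetic (made-up) data: a nonlinear signal plus Gaussian noise.
rng = np.random.default_rng(42)
x = np.sort(rng.uniform(-3, 3, 60))
y = np.sin(x) + rng.normal(scale=0.3, size=x.size)

degrees = list(range(1, 13))
results = {d: in_and_out_of_sample_mse(x, y, d) for d in degrees}
for d, (mse_in, mse_out) in results.items():
    print(f"degree={d:2d}  in-sample MSE={mse_in:.3f}  out-of-sample MSE={mse_out:.3f}")

# Model complexity is chosen by out-of-sample error, not by in-sample fit.
best = min(degrees, key=lambda d: results[d][1])
print("degree chosen by out-of-sample error:", best)
```

Running it, one would expect the in-sample column to shrink steadily with the degree, while the out-of-sample column eventually stops improving. Selecting the degree by the second column is the out-of-sample discipline that point (1) refers to.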
Contents
Distance Metrics
Optimal Clustering
Financial Labels
Feature Importance Analysis
Portfolio Construction
Testing Set Overfitting
Testing on Synthetic Data
Proof of the False Strategy Theorem
Bibliography
References