Machine learning

Objectives and Learning Outcomes

The course provides a comprehensive introduction to the most important machine learning models and algorithms, and their applications in finance using Python, with an emphasis on model performance, validation, and interpretability.


Data preprocessing and common pitfalls in machine learning. Density estimation: parametric and non-parametric (histogram and kernel density estimation) methods. Big data and dimensionality reduction techniques: principal component analysis (PCA) and multidimensional scaling (MDS). Cluster analysis. Supervised learning. Regression methods. Classification approaches. Ensemble learning methods. Unsupervised learning. Artificial neural networks: feedforward, recurrent (e.g., long short-term memory), generative adversarial, and convolutional networks, as well as autoencoders. Model assessment and selection: bias, variance, training error and in-sample prediction error, information criteria, cross-validation, and bootstrapping.
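As an illustration of the non-parametric density estimation topics listed above, the following is a minimal sketch that compares a histogram with a Gaussian kernel density estimate. The return series is simulated, and the function name `gaussian_kde`, the normal-reference bandwidth rule, and the bin count are assumptions made for this example, not part of the course material.

```python
import numpy as np

def gaussian_kde(x_grid, sample, bandwidth):
    """Evaluate a Gaussian kernel density estimate on the points in x_grid."""
    # Standardized distance between every grid point and every observation.
    z = (x_grid[:, None] - sample[None, :]) / bandwidth
    kernel = np.exp(-0.5 * z ** 2) / np.sqrt(2.0 * np.pi)
    # Average the kernels over observations and rescale by the bandwidth.
    return kernel.mean(axis=1) / bandwidth

# Simulated fat-tailed "daily returns" (assumption: real data would be loaded instead).
rng = np.random.default_rng(0)
returns = 0.01 * rng.standard_t(df=4, size=1000)

# Normal-reference rule of thumb for the bandwidth (an illustrative choice).
bandwidth = 1.06 * returns.std(ddof=1) * returns.size ** (-0.2)

x_grid = np.linspace(returns.min() - 5 * bandwidth,
                     returns.max() + 5 * bandwidth, 2001)
kde_density = gaussian_kde(x_grid, returns, bandwidth)

# Histogram estimate of the same density; density=True normalizes the area to one.
hist_density, bin_edges = np.histogram(returns, bins=30, density=True)
```

Both estimates integrate to approximately one. The histogram depends on the bin count and bin placement, while the kernel estimate depends on the bandwidth, so in practice either tuning choice would itself be subject to validation.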


Working with large data sets comprising stock and bond prices as well as fundamental financial and macroeconomic factors. Accessing, storing, and retrieving data using Python. Developing Python code for a battery of machine learning methods, namely ordinary and robust linear regressions, penalized linear models, dimension-reduction techniques, generalized linear methods, regression trees and random forests, and neural networks. These machine learning algorithms will be applied to modelling equity and bond risk premia along the lines of state-of-the-art research in empirical asset pricing. Practical aspects and industry applications represent an essential part of the work. In addition to data processing and coding in Python, the exercises are designed to address questions about sample splitting (training vs. validation vs. test), hyperparameter tuning, in-sample and out-of-sample analysis, time-series vs. cross-sectional aspects (from a modern asset pricing perspective), clustering, the marginal relationship between factors and expected returns / risk premia, statistical and economic significance, portfolio forecasting, robustness, and interpretability of results.
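The sample-splitting and hyperparameter-tuning questions mentioned above can be sketched as follows. This is a minimal example under stated assumptions: the data are simulated rather than actual returns and factors, the helper names (`chronological_split`, `ridge_fit`, `mse`), the 60/20/20 split, and the penalty grid are all hypothetical choices, and ridge regression merely stands in for the penalized linear models listed above.

```python
import numpy as np

def chronological_split(n_obs, train_frac=0.6, val_frac=0.2):
    """Split time-ordered observations into train, validation, and test slices."""
    train_end = round(n_obs * train_frac)
    val_end = round(n_obs * (train_frac + val_frac))
    # No shuffling: later observations never inform the fit on earlier ones.
    return slice(0, train_end), slice(train_end, val_end), slice(val_end, n_obs)

def ridge_fit(X, y, alpha):
    """Closed-form ridge coefficients (no intercept; data assumed demeaned)."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)

def mse(X, y, beta):
    """Mean squared prediction error of the linear forecast X @ beta."""
    return float(np.mean((y - X @ beta) ** 2))

# Simulated predictors and "excess returns" (assumption: stand-ins for real data).
rng = np.random.default_rng(0)
n_obs, n_features = 600, 10
X = rng.standard_normal((n_obs, n_features))
true_beta = np.zeros(n_features)
true_beta[:3] = [0.5, -0.3, 0.2]
y = X @ true_beta + rng.standard_normal(n_obs)

train, val, test = chronological_split(n_obs)

# Tune the ridge penalty on the validation block only.
alphas = [0.01, 0.1, 1.0, 10.0, 100.0]
val_errors = {a: mse(X[val], y[val], ridge_fit(X[train], y[train], a))
              for a in alphas}
best_alpha = min(val_errors, key=val_errors.get)

# The test block is used once, after tuning, for the out-of-sample error.
beta_hat = ridge_fit(X[train], y[train], best_alpha)
in_sample_mse = mse(X[train], y[train], beta_hat)
out_of_sample_mse = mse(X[test], y[test], beta_hat)
```

The split is chronological rather than random because shuffling time-ordered data would let future information leak into the training sample. A common variation, not shown here, refits the model on the combined training and validation blocks once the penalty has been chosen.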