PROGRAMME
Wednesday, 29 November 2017

Time            Activity
09:00 – 09:30   Registration
09:30 – 09:40   Opening Address
09:40 – 10:25   Ying CHEN
10:25 – 10:55   Coffee Break
10:55 – 11:40   Stephan CLÉMENÇON
11:40 – 12:25   Stéphane GAIFFAS
12:25 – 14:00   Group Photo & Lunch
14:00 – 14:45   Jean-Yves AUDIBERT
14:45 – 15:30   Benjamin BRUDER
15:30 – 16:00   Tea Break
16:00 – 16:45   Gah-Yi BAN
16:45 – 17:30   Mathilde MOUGEOT
Thursday, 30 November 2017 

Time            Activity
09:30 – 10:15   Arnulf JENTZEN
10:15 – 10:45   Coffee Break
10:45 – 11:30   Johann LUSSANGE
11:30 – 12:15   Chao ZHOU
12:15 – 14:00   Lunch
14:00 – 14:45   Steven KOU
14:45 – 15:30   Steven KOU
15:30 – 16:00   Tea Break
16:00 – 16:45   Cyril GRUNSPAN
Aggregating Weak Predictions and Rupture Detections in Financial Time Series
Jean-Yves AUDIBERT, Capital Fund Management, France
Financial market data offers various challenging tasks for Machine Learning (ML) methods. We present here some results on semi-supervised and supervised learning tasks related to forecasting stock returns. We show that ML methods can predict stock returns, even when using standard indicators from the literature, and we discuss the limits of these approaches at low or medium frequency timescales.
ML methods strongly rely on i.i.d. or at least strong stationarity assumptions. In reality, there are clear breakpoints in the history of financial markets, and in particular in the performance of strategies.
Detecting these breakpoints may help in dynamically investing in strategies. We will therefore present rupture detection methods and emphasize the impact of the non-Gaussianity of P&L time series.
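To give a flavour of what rupture detection involves, here is a minimal CUSUM-style detector on a synthetic P&L series. The method, drift, and threshold are illustrative assumptions, not the techniques presented in the talk.

```python
# Minimal CUSUM rupture (change-point) detector for a time series.
# Illustrative sketch only; drift and threshold are hypothetical choices.
def cusum_breakpoints(series, drift=0.5, threshold=4.0):
    """Return indices where the cumulative deviation from the running
    mean exceeds `threshold`, signalling a possible rupture."""
    breakpoints, pos, neg, mean, n = [], 0.0, 0.0, 0.0, 0
    for i, x in enumerate(series):
        n += 1
        mean += (x - mean) / n          # running mean estimate
        pos = max(0.0, pos + x - mean - drift)
        neg = max(0.0, neg - x + mean - drift)
        if pos > threshold or neg > threshold:
            breakpoints.append(i)
            pos = neg = 0.0             # restart the statistics
            mean, n = 0.0, 0            # re-estimate the level after the break
    return breakpoints

# A series whose mean jumps from 0 to 3 halfway through:
data = [0.0] * 30 + [3.0] * 30
print(cusum_breakpoints(data))
```

On this toy series the detector flags a single rupture shortly after the jump at index 30.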
Machine Learning and Portfolio Optimization
Gah-Yi BAN, London Business School, England
The portfolio optimization model has limited impact in practice because of estimation issues when applied to real data. To address this, we adapt two machine learning methods, regularization and cross-validation, for portfolio optimization. First, we introduce performance-based regularization (PBR), where the idea is to constrain the sample variances of the estimated portfolio risk and return, which steers the solution toward one associated with less estimation error in the performance. We consider PBR for both mean-variance and mean-conditional value-at-risk (CVaR) problems. For the mean-variance problem, PBR introduces a quartic polynomial constraint, for which we make two convex approximations: one based on rank-1 approximation and another based on a convex quadratic approximation. The rank-1 approximation PBR adds a bias to the optimal allocation, and the convex quadratic approximation PBR shrinks the sample covariance matrix. For the mean-CVaR problem, the PBR model is a combinatorial optimization problem, but we prove its convex relaxation, a quadratically constrained quadratic program, is essentially tight. We show that the PBR models can be cast as robust optimization problems with novel uncertainty sets and establish asymptotic optimality of both sample average approximation (SAA) and PBR solutions and the corresponding efficient frontiers. To calibrate the right-hand sides of the PBR constraints, we develop new, performance-based k-fold cross-validation algorithms. Using these algorithms, we carry out an extensive empirical investigation of PBR against SAA, as well as L1 and L2 regularizations and the equally weighted portfolio. We find that PBR dominates all other benchmarks for two out of three Fama–French data sets.
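As a much simpler illustration of how regularization enters portfolio optimization, the sketch below shrinks the sample covariance toward a scaled identity before solving a minimum-variance problem. This is a generic shrinkage device on synthetic data, not the PBR construction of the abstract.

```python
import numpy as np

# Toy sketch: ridge-style shrinkage of the sample covariance inside a
# closed-form minimum-variance portfolio. Illustrative only.
def min_variance_weights(returns, shrink=0.1):
    """Minimum-variance weights after shrinking the sample covariance
    toward the identity (scaled to preserve average variance)."""
    cov = np.cov(returns, rowvar=False)
    d = cov.shape[0]
    cov = (1 - shrink) * cov + shrink * (np.trace(cov) / d) * np.eye(d)
    ones = np.ones(d)
    w = np.linalg.solve(cov, ones)     # unnormalized min-variance solution
    return w / w.sum()                 # weights sum to one

rng = np.random.default_rng(0)
rets = rng.normal(0.001, 0.02, size=(250, 4))   # 250 days, 4 synthetic assets
w = min_variance_weights(rets)
print(w)
```

The shrinkage parameter plays the role the cross-validated constraint calibration plays in the abstract: it trades a small bias against estimation error in the covariance.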
Data Science and Asset Management
Benjamin BRUDER, Lyxor Asset Management, France
This presentation gives an overview of potential applications of machine learning to asset management. After a brief reminder of the major steps of a portfolio construction process, we will consider which related problems can benefit from machine learning techniques, and which ones can suffer from data mining and strong overfitting bias. In particular, we will show that various problems involving covariances are in general good candidates for a machine-learning-oriented framework. On the contrary, we consider that long-term trend estimation should be based on very parsimonious techniques, and rely on long-term experience rather than off-the-shelf complex data mining solutions.
Sentiment Analysis for Online Reviews with Regularized Text Logistic Regression
Ying CHEN, National University of Singapore, Singapore
With the increasing volume of user-generated reviews and feedback posted on online review platforms, it becomes essential for executives and managers to build an efficient classifier that captures the general sentiment of reviews from unstructured text information. We propose a regularized text logistic regression method that, besides providing good classification accuracy, can identify a set of essential features so as to provide rapid and valuable suggestions for sentiment analysis and operational improvement. We demonstrate the performance of the proposed method on two real text data sets on restaurants and hotels and compare its classification performance with several alternatives.
This is based on joint work with Peng LIU (National University of Singapore, Singapore) and Chung Piaw TEO (National University of Singapore, Singapore).
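The feature-selection effect of L1 regularization in text classification can be sketched on a tiny invented corpus. The four "reviews" below, the proximal-gradient fitting, and all parameter values are assumptions for illustration, not the authors' estimator or data.

```python
import math

# Tiny bag-of-words L1-regularized logistic regression, fitted by
# proximal gradient descent on a hypothetical four-review corpus.
docs = ["good food great service", "bad food terrible service",
        "great place good value", "terrible place bad value"]
labels = [1, 0, 1, 0]
vocab = sorted({w for d in docs for w in d.split()})
X = [[d.split().count(w) for w in vocab] for d in docs]

def fit_l1_logreg(X, y, lam=0.05, lr=0.5, steps=500):
    w = [0.0] * len(X[0])
    for _ in range(steps):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            p = 1 / (1 + math.exp(-sum(a * b for a, b in zip(w, xi))))
            for j, xij in enumerate(xi):
                grad[j] += (p - yi) * xij / len(X)
        # gradient step followed by soft-thresholding (the L1 proximal map)
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
        w = [math.copysign(max(abs(wi) - lr * lam, 0.0), wi) for wi in w]
    return w

w = fit_l1_logreg(X, labels)
selected = [v for v, wi in zip(vocab, w) if abs(wi) > 1e-8]
print(selected)
```

The L1 penalty zeroes out the words that appear equally in both classes ("food", "service", "place", "value") and keeps only the sentiment-bearing ones, which is the feature-identification property the abstract emphasizes.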
Weak Signals: Machine Learning Meets Extreme Value Theory
Stephan CLÉMENÇON, Telecom Paris, France
From pattern recognition to stochastic bandits, most machine learning algorithms only involve the computation of basic sample mean statistics, and the performance of the empirical risk/regret minimizers produced by the latter can be investigated by means of concentration results for empirical processes. In many applications however (e.g. classification with unbalanced classes, novelty detection, dimensionality reduction), the useful information can be located in the ‘tails’ of the data distribution, far from the mean behaviour, and risk/regret cannot be appropriately described by such statistics any more. In the Big Data era, the observation of rare/extreme events is now possible, which paves the way for designing novel algorithms relying on extreme value statistics. It is the goal of this talk to illustrate this belief through the presentation of recent works, where machine learning interfaces with extreme value theory and leads to efficient methods supported by a sound validity framework.
References:
Anomaly Detection in Extreme Regions via Empirical MV-sets on the Sphere. A. Thomas, S. Clémençon, A. Sabourin & A. Gramfort. In the Proceedings of AISTATS 2017, Fort Lauderdale, USA.
Sparse Representation of Multivariate Extremes with Applications to Anomaly Detection. N. Goix, A. Sabourin & S. Clémençon. Journal of Multivariate Analysis, 2017.
Learning the dependence structure of rare events: a non-asymptotic study. N. Goix, A. Sabourin & S. Clémençon. In the Proceedings of COLT 2015, Paris, France.
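The contrast the abstract draws between sample-mean statistics and tail statistics can be glimpsed in the Hill estimator of a tail index, a basic extreme-value statistic computed from the k largest observations only. This generic sketch on synthetic Pareto data is an illustration, not a method from the cited works.

```python
import math
import random

# Hill estimator of the tail index 1/alpha for a heavy-tailed sample:
# it uses only the k largest order statistics, not the sample mean.
def hill_estimator(sample, k):
    xs = sorted(sample, reverse=True)
    logs = [math.log(x) for x in xs[: k + 1]]
    return sum(l - logs[k] for l in logs[:k]) / k   # estimates 1/alpha

random.seed(0)
# Pareto(alpha=2) via inverse transform: P(X > x) = x^(-2), so 1/alpha = 0.5
pareto = [random.random() ** (-1 / 2) for _ in range(20000)]
tail = hill_estimator(pareto, 500)
print(tail)
```

On this sample the estimate lands near the true value 0.5; the choice of k is the usual bias-variance trade-off of extreme-value statistics.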
Statistical Learning with Hawkes Processes
Stéphane GAIFFAS, University Paris Diderot, France
We consider the problem of unveiling the implicit network structure of interactions between nodes (users in a social network for instance, or moves of high-frequency financial signals), based on their action timestamps. We will describe several approaches to achieve this: using a parametric modeling of the Hawkes process with sparsity-inducing penalization (sparsity and low rank of the adjacency matrix), and using a more recent and direct approach based on cumulants matching. Our theoretical analysis required a new tool: matrix concentration inequalities for continuous-time martingales, which are of independent interest and will be briefly described during this talk. Our methods are illustrated on the MemeTracker dataset (network of blogs) and on financial data (order book modeling).
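For readers unfamiliar with the model class, the sketch below simulates a one-dimensional Hawkes process with exponential kernel by Ogata thinning. The parameter values are hypothetical and this illustrates the process itself, not the estimation methods of the talk.

```python
import math
import random

# Ogata-thinning simulation of a 1-D Hawkes process with exponential
# kernel: intensity(t) = mu + sum_i a * exp(-b * (t - t_i)).
def simulate_hawkes(mu=1.0, a=0.5, b=2.0, horizon=50.0, seed=0):
    rng = random.Random(seed)
    events, t = [], 0.0
    while t < horizon:
        # current intensity is an upper bound until the next event
        lam_bar = mu + sum(a * math.exp(-b * (t - ti)) for ti in events)
        t += rng.expovariate(lam_bar)           # candidate next event time
        if t >= horizon:
            break
        lam_t = mu + sum(a * math.exp(-b * (t - ti)) for ti in events)
        if rng.random() <= lam_t / lam_bar:     # accept with prob lam_t/lam_bar
            events.append(t)
    return events

events = simulate_hawkes()
# Branching ratio a/b = 0.25, so the mean event rate is mu / (1 - a/b)
print(len(events))
```

Each accepted event raises the intensity, producing the self-exciting clustering that makes Hawkes processes natural models for order-book activity.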
Security and Stability of Blockchains
Cyril GRUNSPAN, ESILV, France
The invention of bitcoin in 2008 marks a new stage in the history of money. For the first time, a currency relies solely on trust in simple cryptographic algorithms rather than on a state or a central bank. The technology used, known as blockchain, establishes an original bridge between distributed systems and probability theory. We propose to explain this fact as well as the convergence between private interests and the public interest.
On Deep Learning-based Approximation Algorithms for Partial Differential Equations
Arnulf JENTZEN, ETH Zurich, Switzerland
Partial differential equations (PDEs) are among the most universal tools used in modelling problems in nature and man-made complex systems. In particular, PDEs are a fundamental tool in portfolio optimization problems and in the state-of-the-art pricing and hedging of financial derivatives. The PDEs appearing in such financial engineering applications are often high dimensional, as the dimensionality of the PDE corresponds to the number of financial assets in the involved hedging portfolio. Such PDEs typically cannot be solved explicitly, and developing efficient numerical algorithms for high dimensional PDEs is one of the most challenging tasks in applied mathematics. As is well known, the difficulty lies in the so-called "curse of dimensionality": the computational effort of standard approximation algorithms grows exponentially in the dimension of the considered PDE, and there is only a very limited number of cases where a practical PDE approximation algorithm with a computational effort growing at most polynomially in the PDE dimension has been developed. In the case of linear parabolic PDEs, the curse of dimensionality can be overcome by means of stochastic approximation algorithms and the Feynman–Kac formula. We first review some results for stochastic approximation algorithms for linear PDEs and, thereafter, we present a stochastic approximation algorithm for high dimensional nonlinear PDEs whose key ingredients are deep artificial neural networks, which are widely used in data science applications. Numerical results illustrate the efficiency and the accuracy of the proposed stochastic approximation algorithm in the cases of several high dimensional nonlinear PDEs from finance and physics.
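To make the Feynman–Kac remark concrete: for a linear heat equation, plain Monte Carlo over Brownian paths costs linearly, not exponentially, in the dimension. The sketch below (with an invented terminal condition whose solution is known in closed form) illustrates only this linear case; the deep-learning algorithm of the talk targets the nonlinear one.

```python
import random

# Feynman-Kac Monte Carlo for the d-dimensional heat equation
# u_t + (1/2) * Laplacian(u) = 0 with terminal value g at time T:
# u(0, x) = E[g(x + W_T)]. Cost grows linearly in the dimension d.
def heat_mc(g, x, T, n_paths=5000, seed=0):
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_paths):
        w = [xi + rng.gauss(0.0, T ** 0.5) for xi in x]  # sample x + W_T
        total += g(w)
    return total / n_paths

d = 100
g = lambda y: sum(yi * yi for yi in y)        # g(y) = |y|^2
u = heat_mc(g, [0.0] * d, T=1.0)
# for this g the exact solution at (t, x) = (0, 0) is d * T = 100
print(u)
```

The Monte Carlo error decays like the inverse square root of the number of paths, independently of d, which is exactly the dimension-robustness the abstract attributes to stochastic approximation of linear PDEs.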
The Economics of Bitcoin
Steven KOU, National University of Singapore, Singapore
We attempt to build an equilibrium model of Bitcoin to address the following research questions simultaneously: Why has the Bitcoin price increased more than 60,000 times from 2009 to now? Why will the miners' proportion of Bitcoin holdings decline over time, despite the price increase (Athey et al., 2016, WP)? The model features (1) two control variables, the inventory level and the imposed transaction fee, and (2) an "S"-shaped demand level (Bass, 1967; Bass, 2004). The assumption of a given demand level or curve is popular in monopolistic pricing, as in Industrial Organization, Revenue Management, and Marketing. Our model yields an interesting price dynamic: in the short run the Bitcoin price is driven by the "S"-shaped demand level, while in the long run the price flattens out. The model predicts that the miners' proportion of Bitcoin holdings will decline over time, consistent with the empirical finding in Athey et al. (2016, WP).
This is a joint work with Min DAI (National University of Singapore, Singapore), Wei JIANG (National University of Singapore, Singapore) and Cong QIN (National University of Singapore, Singapore).
A Theory of Fintech
Steven KOU, National University of Singapore, Singapore
In this talk, I will give a brief overview of current academic research on Fintech. The topics to be discussed include: (1) P2P equity financing: how to design contracts suitable for a P2P equity financing platform with information asymmetry. (2) Robotic financial advising: how to obtain investors' risk-aversion parameters automatically by asking simple questions, and how to get consistent answers that meet investors' goals, such as retirement planning. (3) Economics of Bitcoin: how to build a general equilibrium model for bitcoin. (4) Data privacy preservation: how to do econometrics on encrypted data while still preserving privacy. All four topics are based on my recent working papers.
Latest Advances in Reinforcement Learning
Johann LUSSANGE, École Normale Supérieure, France
Derived from the early biological studies on Pavlovian conditioning, reinforcement learning is one of the most promising approaches to machine learning. Distinct from the supervised and unsupervised learning approaches, the reinforcement learning framework is subject to three specific challenges: (i) the exploration versus exploitation dilemma, (ii) the curse of dimensionality, and (iii) the reward estimation problem. These challenges have recently led to much research activity, such as deep Q-learning, hierarchical reinforcement learning, reward shaping, inverse and transfer learning, self-play and multi-agent learning, etc. Keeping in mind the potential fintech applications, we present here, for technology intelligence, an overview of the latest advances in the field.
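The exploration-exploitation dilemma can be shown in miniature with tabular Q-learning on a hypothetical 5-state chain (reward only at the right end), using epsilon-greedy action choice. This is a textbook sketch, far simpler than the deep and hierarchical variants surveyed in the talk.

```python
import random

# Tabular Q-learning on a 5-state chain: action 1 moves right (reward 1
# on reaching the last state), action 0 moves left. Epsilon-greedy
# exploration resolves the exploration-exploitation dilemma.
def q_learning(n_states=5, episodes=1000, alpha=0.5, gamma=0.9,
               eps=0.2, seed=0):
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(n_states)]
    for _ in range(episodes):
        s = 0
        while s < n_states - 1:                  # last state is terminal
            if rng.random() < eps:
                a = rng.randrange(2)             # explore
            else:
                a = max((0, 1), key=lambda act: Q[s][act])  # exploit
            s_next = max(0, s - 1) if a == 0 else s + 1
            r = 1.0 if s_next == n_states - 1 else 0.0
            # temporal-difference update toward r + gamma * max_a' Q(s', a')
            Q[s][a] += alpha * (r + gamma * max(Q[s_next]) - Q[s][a])
            s = s_next
    return Q

Q = q_learning()
policy = [max((0, 1), key=lambda act: Q[s][act]) for s in range(4)]
print(policy)
```

After training, the greedy policy moves right in every non-terminal state, and the learned values decay geometrically (by gamma) with the distance to the reward.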
Statistical and Machine Learning Methods to Model and Forecast Energy
Mathilde MOUGEOT, LPMA and ENSIIE, France
Since electricity can hardly be stored, forecasting tools are essential to appropriately balance consumption and generation of energy, including renewable energies. Forecasting is also essential for adjusting energy market prices.
Analyzing historical data shows that the time series of wind production and of national electrical consumption are radically different regarding, for example, volatility or periodicity. Dedicated models for modeling and forecasting should consequently be introduced and adapted for energy production or consumption.
In order to forecast energy consumption, we are currently developing a “prediction box” based on a sparse learning process for functional regression. This model allows us to forecast, in a high dimensional framework, the intra-day load curves of the French national consumption [2,3]. In a second study, machine learning techniques are challenged first to model wind energy and then to forecast wind production, using restricted wind measures as provided by the meteorological companies [3].
[1] Fischer A., Montuelle L., Mougeot M., Picard D. (2017) Statistical learning for wind power: a modeling and stability study towards forecasting. Wind Energy.
[2] Mougeot M., Picard D., Lefieux V., Maillard-Teyssier L. Forecasting intra day load curves using sparse functional regression. Springer Lecture Notes in Statistics, pp. 161–182.
[3] Mougeot M., Picard D., Tribouley K. (2013) Sparse approximation and fit of intraday load curves in a high dimensional framework. Advances in Adaptive Data Analysis, p123.
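Sparse functional regression of the kind described above selects a few active regressors out of many candidates. The generic greedy (orthogonal-matching-pursuit style) sketch below on synthetic data illustrates only this selection principle, not the authors' prediction box.

```python
import numpy as np

# Greedy sparse regression: repeatedly pick the feature most correlated
# with the residual, then refit least squares on the selected set.
def omp(X, y, n_keep):
    chosen, residual = [], y.copy()
    for _ in range(n_keep):
        scores = np.abs(X.T @ residual)     # correlation with residual
        scores[chosen] = -1.0               # never re-select a feature
        chosen.append(int(scores.argmax()))
        beta, *_ = np.linalg.lstsq(X[:, chosen], y, rcond=None)
        residual = y - X[:, chosen] @ beta
    coef = np.zeros(X.shape[1])
    coef[chosen] = beta
    return coef

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))              # 200 observations, 20 candidates
y = 3.0 * X[:, 4] - 2.0 * X[:, 11] + 0.1 * rng.normal(size=200)
coef = omp(X, y, n_keep=2)
print(np.nonzero(coef)[0])
```

With only two truly active regressors, the greedy procedure recovers exactly those columns, which is the sparsity property that makes such fits tractable in high dimension.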
Investment Decisions and Falling Cost of Data Analytics
Chao ZHOU, National University of Singapore, Singapore
We study how the cost of data analytics and the characteristics of investors and investment opportunities affect investment decisions and their data analytics. We show that the falling cost of data analytics raises investors’ leverage; that financially constrained or highly risk-averse investors use less data analytics; and that the value of data analytics is highest for average investment opportunities and low for opportunities with high or low expected returns. Due to the increased leverage, the falling cost of data analytics may lead to higher losses during crises.
This is a joint work with Jussi Keppo (National University of Singapore, Singapore) and Hong Ming Tan (National University of Singapore, Singapore).