Here are the topics that are offered to CentraleSupélec 3rd year students for the period 2021-2022.

As part of our work on automated trading systems, we use model-based reinforcement learning algorithms. These algorithms exploit a model of how prices evolve over time. However, classical regression approaches are limited, and the current state of the art, XGBoost, does not achieve reliable results.

We therefore seek to explore solutions from physics, such as [1]. These approaches exploit a time series decomposition to extract noise and non-Markovian components.

The project aims at conducting a comparative study of non-Markovian dynamics modeling solutions. Among these solutions we find the Mori-Zwanzig and Nakajima-Zwanzig formalisms and k-PCA, among others. Our main use of the models is the simulation of temporal sequences drawn from the same unknown distribution as the samples of the dataset. The metric to be optimized is therefore the Kolmogorov-Smirnov statistic [3], which measures the distance between two observed distributions. The final objective is to train an RL agent on a real, augmented, or synthetic environment and compare its performance in portfolio allocation.
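As a first sanity check for this metric, the two-sample Kolmogorov-Smirnov statistic can be computed with SciPy; the Gaussian samples below are purely illustrative stand-ins for real and simulated return series:

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)

# Illustrative stand-ins for observed and simulated return series.
real_returns = rng.normal(loc=0.0, scale=1.0, size=5000)
simulated_returns = rng.normal(loc=0.0, scale=1.1, size=5000)

# Two-sample KS test: the statistic is the maximum distance between
# the two empirical CDFs; small values mean similar distributions.
res = ks_2samp(real_returns, simulated_returns)
print(f"KS statistic: {res.statistic:.4f}, p-value: {res.pvalue:.4f}")
```

In a benchmark, the same statistic would be computed between held-out real sequences and sequences generated by each candidate model.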

During this project, you will:

- draw a state of the art of existing modeling techniques,
- implement algorithms, with libraries [4],
- benchmark those methods.

[1] Spindler et al., “Time Series Analysis of Real-world Complex Systems—Climate, Finance, Proteins, and Physiology” (2007), Chap. 9.

[2] Hassanibesheli et al, “Reconstructing Complex System Dynamics from Time Series: A Method Comparison.” New Journal of Physics 22, no. 7 (2020) https://iopscience.iop.org/article/10.1088/1367-2630/ab9ce5/pdf

[3] Dimitrova et al, “Computing the Kolmogorov–Smirnov Distribution when the Underlying cdf is Purely Discrete, Mixed or Continuous”. Journal of Statistical Software (2020)

[4] MZProjection, https://github.com/smaeyama/mzprojection

In the context of work on automated trading systems, we rely on reinforcement learning algorithms to create portfolio allocation strategies. These algorithms require large computational resources, and the resulting strategy is not guaranteed to converge. The current approach aims at learning an action policy from interactions between an agent and the trading environment; the transition and reward functions are thus learned either implicitly or by a critic trained in parallel with the action policy.

Inverse reinforcement learning (IRL), closely related to imitation learning, consists in learning the reward function and the action policy from an expert system. This approach is doubly interesting in finance, where estimating these quantities from interactions is complex, and where an adapted reward function can be difficult to design, as in the case of risk mitigation.

The project aims at comparing the different existing IRL solutions, as well as quantifying their advantages over classical RL solutions (DDPG, SAC, PPO). Works [1,2,3,4] gather a number of approaches, as well as the founding theories of the field. We wish to apply this method to online portfolio selection (OLPS); first publications on this application can already be found, such as [5].
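To fix ideas, the simplest imitation-learning baseline, behavioral cloning, just regresses expert actions on states; the linear expert below is a hypothetical stand-in for a real expert system:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical linear expert policy a = K s (stand-in for a real expert system).
state_dim, action_dim, n_demos = 4, 2, 500
K_expert = rng.normal(size=(action_dim, state_dim))

states = rng.normal(size=(n_demos, state_dim))
actions = states @ K_expert.T  # expert demonstrations: (state, action) pairs

# Behavioral cloning: fit a policy to the demonstrations by least squares.
K_cloned, *_ = np.linalg.lstsq(states, actions, rcond=None)
K_cloned = K_cloned.T

print("max parameter error:", np.abs(K_cloned - K_expert).max())
```

Full IRL methods such as [2,3] go further and recover a reward function that explains the demonstrations, rather than copying actions directly.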

During this project, you will:

- get to know the existing IRL techniques,
- implement algorithms, with libraries (SB3, torch),
- benchmark your methods.

[1] Schmidhuber, “Reinforcement Learning Upside Down: Don’t Predict Rewards – Just Map Them to Actions”. arXiv:1912.02875 [cs] (2020)

[2] Xu et al “Receding Horizon Inverse Reinforcement Learning.” arXiv, (2022)

[3] Gerogiannis “Inverse Reinforcement Learning,” (2022)

[4] Chen et al “Decision Transformer: Reinforcement Learning via Sequence Modeling.” arXiv, (2021)

[5] Halperin et al “Combining Reinforcement Learning and Inverse Reinforcement Learning for Asset Allocation Recommendations.” arXiv, (2022)

Automated fraud detection methods can take two distinct forms:

- a set of discrete rules (e.g. IF Amount > x AND Currency == $: Fraud) relying on expert knowledge that is expensive to acquire,
- machine learning (ML) and Deep Learning (DL) approaches: learning from a training dataset whether a payment is fraudulent. While ML methods can be more efficient than rule-based methods, it is not easy to implement them efficiently. Indeed, fraud detection has two major characteristics that differentiate it from standard applications: (i) unbalanced data: non-fraudulent payments are largely in the majority in the datasets, (ii) concept drift: the distribution from which fraud data are derived may vary over time.

Generally, for classification tasks on tabular data, the best performing approaches are not DL models but methods based on Gradient Boosting (e.g. LightGBM, XGBoost). This is also true in the case of unbalanced data, where Gradient Boosted Decision Trees (GBDT) seem to be the best performing methods. Nevertheless, a number of deep methods for tabular data have recently been proposed in the literature and seem to show encouraging results on balanced data, in particular Hopular, proposed in [1] and based on Modern Hopfield Networks. The objective of this project is to apply this model to real fraud data provided by LUSIS in order to compare its performance to baseline GBDT models. Since the complexity of Modern Hopfield Networks makes them impractical on too large a dataset, this approach will be used in two types of experiments: first, a comparison with GBDT on a well-chosen subset of the data, and second, using Hopular to classify, among the payments predicted as fraud by another model (e.g. a GBDT), those that are actually frauds, in order to improve precision.

During this project, you will:

- get acquainted with the issues related to the use of ML methods for fraud detection,
- do bibliographic research to determine the metrics to use to evaluate the models,
- read and understand the model proposed in [1],
- adapt the code made available by [1] (https://github.com/ml-jku/hopular) to the fraud dataset made available by LUSIS,
- do performance comparison with XGBoost and LightGBM on a subset of data,
- use Hopular to try to improve the performance of a Gradient Boosting model,
- if enough time, modify the Hopular model to try to improve its performance.

[1] Schäfl, Bernhard and Gruber, Lukas and Bitto-Nemling, Angela and Hochreiter, Sepp. “Hopular: Modern Hopfield Networks for Tabular Data”. https://arxiv.org/abs/2206.00664 (2022).

[2] Davis, J. and Goadrich, M. “The Relationship Between Precision-Recall and ROC Curves”. In: Proceedings of the 23rd International Conference on Machine Learning (ICML), June 2006. doi: 10.1145/1143844.1143874.

Automated fraud detection methods can take two distinct forms:

- a set of discrete rules (e.g. IF (Amount > x) AND (Currency == $) : Fraud) relying on expert knowledge that is expensive to acquire,
- Machine Learning (ML) and Deep Learning (DL) approaches, which learn from a training dataset whether a payment is fraudulent.

While ML methods can be more efficient than rule-based methods, it is not easy to implement them efficiently. Indeed, fraud detection has two major characteristics that differentiate it from standard applications: (i) unbalanced data: non-fraudulent payments are largely in the majority in the datasets, (ii) concept drift: the distribution from which fraud data are derived may vary over time.

Generally, for classification tasks on tabular data, the best performing approaches are not deep models but those based on Gradient Boosting (e.g. LightGBM, XGBoost). This is also true in the case of unbalanced data, where Gradient Boosted Decision Trees (GBDT) seem to be the best performing methods. Nevertheless, in parallel to these approaches, anomaly detection (AD) methods, at the border between supervised and unsupervised learning, have also shown interesting performance on unbalanced data. In [1], the authors propose to combine the two to try to improve the performance of GBDT methods; in particular, they use existing AD methods to create new features.

During this project, you will:

- get acquainted with the issues related to the use of ML methods for fraud detection,
- search the literature to determine the metrics to use to evaluate the models,
- implement AD methods to create new features:
  - standard one-class classification methods: One-Class SVM, Isolation Forest (sklearn),
  - deep methods: Deep SVDD [2], GOAD [3], NeuTraLAD [4] (code available on GitHub),
  - methods based on deep autoencoders [5], [6],
- use the methods implemented in the previous step to augment the existing features and try to improve GBDT performance as suggested in [1],
- if enough time, adapt/modify the method proposed in [1] to try to improve its performance.

[1] Zhao, Yue and Maciej K. Hryniewicki. “XGBOD: Improving Supervised Outlier Detection with Unsupervised Representation Learning.” 2018 International Joint Conference on Neural Networks (IJCNN) (2018): 1-8. (https://arxiv.org/ftp/arxiv/papers/1912/1912.00290.pdf).

[2] Ruff, L., Vandermeulen, R., Goernitz, N., Deecke, L., Siddiqui, S.A., Binder, A., Müller, E. and Kloft, M. (2018). Deep One-Class Classification. Proceedings of the 35th International Conference on Machine Learning, PMLR 80:4393-4402. Available from https://proceedings.mlr.press/v80/ruff18a.html.

[3] Bergman, L., Hoshen, Y.: Classification-based anomaly detection for general data. In: International Conference on Learning Representations (2020). (https://openreview.net/forum?id=H1lKlBtvS).

[4] Qiu, C., Pfrommer, T., Kloft, M., Mandt, S., Rudolph, M.: Neural Transformation Learning for Deep Anomaly Detection Beyond Images. https://arxiv.org/abs/2103.16440 (2021).

[5] Marco Schreyer, Timur Sattarov, Damian Borth, Andreas Dengel and Bernd Reimer (2017). Detection of Anomalies in Large Scale Accounting Data using Deep Autoencoder Networks. CoRR, abs/1709.05254. (https://arxiv.org/abs/1709.05254)

[6] Ki Hyun Kim, Sangwoo Shim, Yongsub Lim, Jongseob Jeon, Jeongwoo Choi, Byungchan Kim, & Andre S. Yoon (2020). RaPP: Novelty Detection with Reconstruction along Projection Pathway. In: International Conference on Learning Representations. (https://openreview.net/pdf?id=HkgeGeBYDB)

[7] Davis, J. and Goadrich, M. “The Relationship Between Precision-Recall and ROC Curves”. In: Proceedings of the 23rd International Conference on Machine Learning (ICML), June 2006. doi: 10.1145/1143844.1143874.