Application of Reinforcement Learning in Portfolio Management
The Bandit Problem
Determining the optimal portfolio is quite challenging because of the non-deterministic nature of financial markets. Multiarmed bandit serves as a sequential decision-making strategy under uncertainty. Researches reveal that if the bandit algorithm is combined with traditional methods, the portfolio's total performance will increase. Having sufficient market data, the bandit algorithm can improve the Sharpe ratio and the total return.