Our Reinforcement Learning mining and repository: Now live trading!

Machine learning has been a great passion of mine for the past several years. During last year and most of this year I was committed to creating and improving an ML system repository based on classic supervised learning techniques, and during the past several months I have focused on bringing another machine learning vision, based on reinforcement learning (RL), to life. After a lot of hard work implementing OpenCL-based mining software, which can mine RL strategies using GPU technology, and implementing the entire F4 framework trading and cloud mining server-side infrastructure, today I am happy to announce the start of RL live trading using the first 91 systems that have been added to our repository as the result of our first low data-mining bias experiment. In this article I will talk a bit about these advances and about some of the differences between RL and some of our other trading approaches.

Our reinforcement learning mining experiments proceed just like our price-action and machine-learning experiments have, with some small differences. The core of the process remains the same: we generate trading strategies using real data and then attempt the exact same search procedure using random data, in order to discard any process where the generation of a profitable system in random data is more than 1/100 as probable as the same generation in real data. This simply means that we only care about systems that have a less than 1% chance of being created by the sheer strength of the data-mining process. In the RL case the creation process is more complex, however, since it involves training the reinforcement learning algorithm with 60% of the data (which involves 10 back-tests for each system), then testing within the remaining 40% and ensuring that the results in this 40% remain coherent with those of the initial 60% (little deterioration in the pseudo out-of-sample, or p-OS). This exact same process is applied to real and random series. Note that we carry out this p-OS split in the case of RL because RL does not “lose information” due to having a p-OS period. This is because the algorithm also trains through this period, although with no hindsight: it trains only once as it passes over each bar, with no ability to see into the future, just as it does when live trading.
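The filter described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the actual mining code: the function names are hypothetical, the bias is estimated as the average number of systems found per random series divided by the number found in real data, and the 60/40 split is taken by bar count.

```python
from statistics import mean


def split_train_test(bars, train_percent=60):
    """Split a bar series into a training part and a pseudo out-of-sample part."""
    cut = len(bars) * train_percent // 100
    return bars[:cut], bars[cut:]


def data_mining_bias(real_count, random_counts):
    """Estimate how likely a system is to come from the mining process alone.

    real_count: systems found in the real series (assumed input).
    random_counts: systems found in each random series using the exact
    same search procedure (assumed input).
    """
    if real_count <= 0:
        raise ValueError("no systems were found in the real data")
    if not random_counts:
        raise ValueError("at least one random series is required")
    return mean(random_counts) / real_count


def passes_bias_filter(real_count, random_counts, threshold=0.01):
    """Keep a mining process only if its estimated bias is below 1%."""
    return data_mining_bias(real_count, random_counts) < threshold
```

With made-up counts, `passes_bias_filter(100, [5, 3, 4])` returns `False`, since the random series yield on average 4% as many systems as the real data, which is above the 1% limit.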

To many, the rather complicated process above might seem unnecessary. If you have a pseudo out-of-sample that is already 40% of the data, isn’t this enough of a “guarantee” that you are not falling into an excessive curve-fitting or data-mining bias trap? The answer is that the multiple testing process (the fact that you’re searching many times for a pseudo out-of-sample that works) makes it necessary to ensure that you’re not finding a pseudo out-of-sample that works purely by chance. As a matter of fact, RL mining has shown itself to be extremely good at finding systems, including systems where even the testing phase looks great, in cases where there is also a high probability of finding the exact same “great systems” in random series. This shows that the strength of the mining process is huge: the RL process is very good at fitting, and the chance of also performing well in the testing phase purely by chance can be very significant. The second image in this post shows an experiment where RL finds a lot more systems across random series (orange) than it did in the real data (yellow).
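A toy simulation makes the multiple-testing problem concrete. The sketch below is not RL mining: it assumes a zero-mean random return series and hypothetical "strategies" that are just random long/short positions, so none of them has any real edge. It then counts how many candidates are profitable in both the 60% training part and the 40% testing part.

```python
import random


def count_chance_survivors(n_candidates, n_bars=500, train_percent=60, seed=0):
    """Count edge-free random strategies that look profitable in both segments."""
    rng = random.Random(seed)
    # Random series with zero-mean returns, so no strategy has a true edge.
    returns = [rng.gauss(0.0, 1.0) for _ in range(n_bars)]
    cut = n_bars * train_percent // 100
    survivors = 0
    for _ in range(n_candidates):
        # Hypothetical candidate: a random long (+1) or short (-1) position per bar.
        positions = [rng.choice((-1, 1)) for _ in range(n_bars)]
        pnl = [p * r for p, r in zip(positions, returns)]
        train_profit = sum(pnl[:cut])
        test_profit = sum(pnl[cut:])
        if train_profit > 0 and test_profit > 0:
            survivors += 1
    return survivors
```

Since each segment is profitable about half of the time by luck alone, roughly a quarter of the candidates should survive, so the number of chance "winners" grows with the number of candidates searched. A testing period on its own therefore says little unless the same search is also run on random data.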

Up until now we have found only a single case where RL has been able to find great systems in real data while such systems have been very scarce (in fact non-existent) in random data series. This was a EURUSD experiment that was able to generate 91 uncorrelated strategies for this pair. The system shown in the first image belongs to this group, although for this back-test I used a testing period of only 2010-2016 (the system itself was generated using a 60/40 split as described above). As you can see, there is some deterioration of the Sharpe ratio within the testing period, and the maximum drawdown happens within the testing phase, but at least 40% of the profit happens within the 40% testing period and the overall system characteristics do remain similar. A very important point is that linearity does not deteriorate significantly, meaning that the system does not show significant signs of alpha decay within this period. This suggests that the system is indeed able to adjust as it trains online while trading.
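The profit-share check mentioned above is simple arithmetic: if the testing segment covers 40% of the data, a system that does not deteriorate should earn roughly 40% of its total profit there. A minimal sketch, assuming a list of per-period profits in chronological order (the function name and input format are hypothetical):

```python
def profit_share_in_testing(period_profits, train_percent=60):
    """Return the fraction of total profit earned in the testing segment."""
    total = sum(period_profits)
    if total <= 0:
        raise ValueError("total profit must be positive for the share to be meaningful")
    cut = len(period_profits) * train_percent // 100
    return sum(period_profits[cut:]) / total
```

A system earning the same profit in every period gives a share of 0.4, while `profit_share_in_testing([2, 2, 2, 2, 2, 2, 1, 1, 1, 1])` gives 0.25, which would point to deterioration in the testing segment.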

These systems are now being live traded within an Oanda live account using the Asirikuy Trader. Another advantage of RL systems is that they execute very fast, given that they use in-memory arrays that are very efficiently accessed and the operations carried out on each bar are extremely simple. The 91 strategies execute in a bit less than 0.4 seconds in the Asirikuy Trader, thanks in part to some modifications I made during the past two weeks to greatly increase the data usage efficiency of the program (preventing unnecessary data requests and taking advantage of the fact that multiple systems might use the same symbol data). We will probably be able to execute hundreds of RL systems in the Asirikuy Trader before we run into problems. Since these RL systems use no SL or TP values, they also have the advantage of being more resistant to execution issues: they are not searching for predetermined price-based exits but simply enter and exit trades at the start or end of daily bars (current systems trade on the daily timeframe).
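The idea of sharing symbol data between systems can be illustrated with a small cache. This is not the Asirikuy Trader's code, only a sketch under assumptions: each system is represented as a hypothetical `(symbol, decide)` pair, and `fetch_bars` stands in for whatever call requests data from the broker.

```python
def run_systems(systems, fetch_bars):
    """Run every system while requesting each symbol's data only once.

    systems: list of (symbol, decide) pairs, where decide(bars) returns a signal.
    fetch_bars: callable taking a symbol and returning its bar data.
    """
    cache = {}
    signals = []
    for symbol, decide in systems:
        if symbol not in cache:
            # First system using this symbol triggers the only data request.
            cache[symbol] = fetch_bars(symbol)
        signals.append(decide(cache[symbol]))
    return signals
```

With 91 systems that all trade EURUSD, a loop like this would issue one data request per run instead of 91.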

Our reinforcement learning mining, trading system repository and live trading account are the start of a new journey in our understanding of machine learning, curve-fitting and data-mining bias. In a few months we’ll know how well reinforcement learning systems can respond to changing market conditions, how well they learn when live trading and how easy or hard it is to find RL system generation processes with low data-mining bias. If you would like to learn more about RL and how you too can live trade systems of this type, please consider joining Asirikuy.com, a website filled with educational videos, trading systems, development and a sound, honest and transparent approach towards automated trading.