System Programming: Time Series Analysis and Forecasting. Programming Approach

Tuesday, September 4, 2012

Time Series Analysis and Forecasting. Programming Approach - thoughts

"Certain things are impossible...
Until an ignoramus appears, who is not aware of that".

Time Series - a sequence of data points, measured typically at successive time instants spaced at uniform time intervals.

There are quite a lot of things that fit this definition. For example, air temperature changes throughout the day (let's say, measured hourly), or the distance from the Earth to the Moon (which changes slightly throughout the lunar month). Even which political party holds the presidential chair after the elections (which depends on the "history" of the previous president, etc.). We could go on with the list of examples until the server's storage is full. As you may see, the examples above have a cyclic nature, but so does everything (or at least almost everything) related to time series (of course, within certain deviations).

It is in the nature of mankind to want to know the future (although sometimes it is better not to know). Attempts are being made to predict, or, to use a more politically correct term, forecast where certain series will go in the future. The best example may be shamans predicting rain or drought. These days there are complex (and not so complex) algorithms to forecast time series (e.g. noise reduction in digital signal processing). But the loudest and most scandalous argument is about stock market analysis and forecasting. Many of you may have heard about William Gann - some say genius, some say charlatan. I personally tend to take the first side, although there may be facts that I am not aware of.

Mr. Gann died almost 60 years ago. Quite a long period of time. Imagine how many time series forecasting (read: stock market forecasting) techniques have been born and how many have vanished. Since the advent of chaos theory, more and more people tend to say that "stock market forecasting is impossible due to its fractal nature". Which makes sense if you look at the problem from chaos theory's perspective. However, do not forget that chaos theory is accepted as the one that fits the situation best, not as the one that fully explains it. In my perception, this tiny difference leaves a tiny space for hope ;-)

Well, we've had enough of science so far. Let us get to practice. Let me try to simplify things as much as possible, to demonstrate a simple yet effective approach from a developer's point of view.

Software

From a software perspective, there's not too much needed for successful forecasts - just an expert system. Smart people use different software packages and programming languages targeted at expert systems development, but being an ignoramus (as I decided to be for this article), I decided to use what I have and what I know - the C language, GCC and the Geany text editor as an IDE.

Data

There are several (graphical) ways to represent stock/forex market data. The best known one is candlesticks: a sequence of simple graphic figures, each of which represents the variation of the price over a certain period of time (open, high, low and close values). We, however, are not going to consider any of them, simply because we do not need that. Instead, we are going to concentrate on the raw row of numbers for a given period (let's say one year) measured hourly, which gives us a sequence of more than 8000 items (we are only paying attention to one value - either open, high, low or close).

If you try to plot this sequence (e.g. in Excel) you will get a curvy line. Take another look at it and you will notice that there are similar segments (within certain deviations, of course), just like a set of similar images. That brings up one of the best approaches for image recognition - Artificial Neural Networks (especially perceptrons). There is nothing new in using ANNs for stock/forex market analysis: there are tons of commercial software products that provide the end user with different indicators telling him/her whether to buy, sell or hold the current position. However, I personally have not seen a lot of attempts to actually make long term (e.g. 24 hours for an hourly measured sequence) forecasts. There is also a lot of uncertainty as to what data should be used as the ANN's input and how much data should be fed in each time. Unfortunately, no one has the exact answer to this question. It is just trial and error. The same applies to the number of hidden neurons in the ANN.

Another big question is how the data should be preprocessed - prepared for the ANN. Some use complex algorithms (Fourier transform, for example), others tend to use simpler ones. The idea is that data should be in the range of 0.0 - 1.0 and it should be as varied as possible. But remember - if you feed an ANN garbage, you get garbage in response. Meaning that you have to carefully select your algorithm for data preprocessing (normalization). I tend to use a custom normalization algorithm, which is quite simple. Sorry to disappoint you, but I am not going to give it here for now as it is still not completely defined (although it already produces good results).

The bottom line for this paragraph - data preprocessing is not very important, it is the MOST important.
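To make the 0.0 - 1.0 requirement concrete, here is a minimal sketch of plain min-max scaling in C. It is only a common textbook baseline and is not the custom normalization algorithm mentioned above, which is not given in this article. The function names minmax_normalize and minmax_denormalize are made up for this illustration.

```c
#include <stddef.h>

/* Plain min-max scaling of a price series into [0.0, 1.0].
 * Illustrative baseline only; NOT the article's custom normalization.
 * Stores the observed minimum and maximum so that network outputs
 * can later be mapped back to prices. Returns 0 on success, -1 if n == 0. */
int minmax_normalize(const double *in, double *out, size_t n,
                     double *min_out, double *max_out)
{
    if (n == 0)
        return -1;

    double lo = in[0], hi = in[0];
    for (size_t i = 1; i < n; i++) {
        if (in[i] < lo) lo = in[i];
        if (in[i] > hi) hi = in[i];
    }

    double range = hi - lo;
    for (size_t i = 0; i < n; i++) {
        /* A flat series has no range; map it to the middle of the interval. */
        out[i] = (range > 0.0) ? (in[i] - lo) / range : 0.5;
    }

    *min_out = lo;
    *max_out = hi;
    return 0;
}

/* Map a normalized value (e.g. a network output) back to a price. */
double minmax_denormalize(double y, double lo, double hi)
{
    return lo + y * (hi - lo);
}
```

One known weakness of this baseline is that the minimum and maximum come from the data seen so far, so a future price outside that range falls outside 0.0 - 1.0. That is one reason why a more carefully chosen preprocessing step can matter so much.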

Instruments

My programming solution for this problem is quite simple - a console program that reads the input (the whole sequence of price values for the specified period), trains an artificial neural network (in my case the topology was 8x24x1 - 8 inputs, 24 hidden neurons and one output neuron), and then produces a long term forecast (at least 7 entries into the future), where each step of the forecast is made using the previously generated values.
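The idea of forecasting each step from previously generated values can be sketched roughly as follows. This is an assumption-laden illustration, not the actual program: the names make_sample, forecast, predict_fn and linear_step are hypothetical, and linear_step is a trivial stand-in for the trained network so that the sketch can be followed by hand.

```c
#include <stddef.h>
#include <string.h>

#define N_INPUTS 8  /* window size, matching the 8 network inputs */

/* Hypothetical one-step predictor: takes the last N_INPUTS values and
 * returns the next one. A trained ANN would sit behind this pointer. */
typedef double (*predict_fn)(const double window[N_INPUTS], void *ctx);

/* Build training sample number i: N_INPUTS consecutive values as inputs,
 * the value right after them as the target. Returns -1 if i is out of range. */
int make_sample(const double *series, size_t n, size_t i,
                double inputs[N_INPUTS], double *target)
{
    if (n <= N_INPUTS || i >= n - N_INPUTS)
        return -1;
    memcpy(inputs, series + i, N_INPUTS * sizeof(double));
    *target = series[i + N_INPUTS];
    return 0;
}

/* Iterated forecast: every predicted value is pushed into the window
 * and used as an input for the following steps. */
int forecast(const double *series, size_t n, predict_fn predict, void *ctx,
             double *out, size_t steps)
{
    double window[N_INPUTS];

    if (n < N_INPUTS)
        return -1;
    memcpy(window, series + (n - N_INPUTS), sizeof window);

    for (size_t s = 0; s < steps; s++) {
        double y = predict(window, ctx);
        out[s] = y;
        /* Drop the oldest value, append the freshly generated one. */
        memmove(window, window + 1, (N_INPUTS - 1) * sizeof(double));
        window[N_INPUTS - 1] = y;
    }
    return 0;
}

/* Stand-in predictor (NOT a neural network): continues the last slope. */
double linear_step(const double window[N_INPUTS], void *ctx)
{
    (void)ctx;
    return window[N_INPUTS - 1] + (window[N_INPUTS - 1] - window[N_INPUTS - 2]);
}
```

Because every step feeds on earlier predictions, errors accumulate with each step, which is why the number of steps into the future is the interesting figure for this kind of forecast.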

The ANN is a simple multilayer perceptron with 8 inputs, 24 hidden neurons and 1 output neuron. Basically, we do not perform many calculations ourselves, if any at all. An ANN is a perfect implementation of a learning paradigm, able to find hidden dependencies and rules. Therefore, if you ask me, there is no better solution than utilizing ANNs for time series forecasting.

Test

So, I implemented an ANN (in C this time, not in Assembly) and got the dataset (EUR/USD price values for every hour of the past year). The next move was to give it a try and test it at run time. I decided to do that during the weekend as I was not sure how much time would be required to train the network. Surprisingly, the error dropped to a good level after only about 30,000 epochs (several minutes). The following picture shows what I got:

EUR/USD forecast

Test set - data not included in the ANN training process. Used as a pattern for error calculation.

Test forecast - forecast on data from the past, which was not included in the training set.

Real forecast - forecast of the future values. This was done on Saturday at least 24 hours before the opening of the next trading session.

Real data - real values obtained Monday early morning after the new trading session began.

As you can see, such a simple system was even able to forecast the gap between the two sessions.

P.S. Although this article contains no source code from the actual program and no description of any interesting programming technique whatsoever, it goes to show that each problem has a (not necessarily complicated) solution. Most of the time, the most important thing is to take a look at a problem from another angle.

13 comments:

Anonymous September 5, 2012 at 11:27 AM
Hi Alexey,
predicting the gap between trading sessions is an easy task since it's periodic: the Sydney market opens first, the US markets close last. How far were you able to forecast the results and for which time slice? Consider also that most of the game is played with stop-losses and the volatility of a single candle can be huge, especially on EUR/USD. It basically means that the ANN is just predicting a value that will at some point lie between the low and the high, and in high volatility markets it's easy to guess a right value. I don't want to say that you're not doing a good job, only that it is REALLY difficult to assess the predictions of an ANN unless you try them in the field. The prediction as shown seems good, but when do you open your position and when do you close it? If you don't take volatility into account you risk losing a lot of money.

Anonymous September 5, 2012 at 12:52 PM
Hi again Alexey,
remember that an ANN with three layers is a universal approximator, which means that everything that has a periodic behavior is easily recognized by the network. If you can accept a piece of advice from me, you might be able to get better results (although I was unable to get anything that was even close to being really good) by using a second hidden layer that accounts for the incredible noise added by HFT transactions. Also, adding the time of the day, especially on hourly candles, will help the ANN spot the activity of the big traders (when they go to lunch, when they have breaks and so on). Still, you won't be able to predict those things called Noe, that is, when the banks push a lot of money or when new data is released. This strongly influences the market AND it also ruins the training of your ANN, because those events are totally unpredictable. Indeed, you will notice that the ANN performs better on VERY short time frames (minutes, seconds or even ticks), and you could couple it with a Markov chain. On very short time frames I was able to use my model successfully. The model also worked on very long time frames (1 candle per day), but I didn't use it with real money in that case; for very long time frames you usually need a lot of money to be used as a buffer since deviations are veeery wide. For shorter time frames I was unable to remove the noise or to get really reliable predictions. Check my picture (http://i49.tinypic.com/vakow.png), blue being the real series, red the predicted one. The prediction currently looks really accurate, BUT it won't yield any money if you trade on that model, because the volatility of the market makes that prediction way too inaccurate. Seems good to the eye but really it's not... Unfortunately ;ppp

Anonymous September 12, 2012 at 9:06 AM
How did you decide to use 24 hidden neurons? That should give you about 200 weights to be fitted or 1 for every 40 data points. I would worry about the potential for overfitting but if it works out of sample then that's the main thing.

Alexey Lyashko September 28, 2012 at 9:39 PM
Someone asked me for my Twitter in Spanish, so two things:

1. Here's my Twitter account: https://twitter.com/alexey_lyashko
2. Unfortunately, I do not speak Spanish...

bcdc1994 September 29, 2012 at 9:57 PM
Hello, do you know of any method so that when my (Backdoor.exe) is run, it is injected into another process automatically, without using (migrate) from the Metasploit exploitation framework?