Time Series Imaging — and how it can help in classification (And football betting)

SquareGraph
9 min read · Oct 19, 2022


This article is a non-obvious continuation of my previous publication (Beating Bookmakers — Proof of Concept model that is good enough to start betting), in which I discussed, among other topics, transforming tabular data into an image. Despite the unsatisfactory results of that experiment, I emphasized that I would not abandon the idea. On the contrary, I wanted to show in a separate article how much potential it has.

However, the characteristics of time series data differ significantly from tabular data. Although both usually take the form of vectors, a time series adds the time variable, so each value in such a vector is autoregressive, and often also stochastic. It is this dependence of t on t-1 (and even on t-n) that means, despite the apparent similarity in how the data is presented, we are dealing with a completely different animal.

And in this case, without getting too far ahead of the facts, we have many more tools at our disposal that can potentially help develop the previously presented concept: using two-dimensional convolutional neural networks for data classification.

Executive summary: what the purpose of the experiments was and what you will find in this material

In this article, I will try to summarize a few methods for classifying time series, using experiments I ran, inspired by an article written by Johann Faouzi in February this year.

For this workshop I took data from the FordA set (link), which contains readings from car components; the task is to classify whether a given reading indicates a problem with the vehicle.

The data itself is of secondary importance: the point was to establish benchmarks using more classical methods of time series classification, and then relate them to the results that can be obtained from the same data sample with imaging techniques followed by two-dimensional convolutional neural networks.

In addition to the comparison with tree-based and dictionary-based algorithms, such as Bag-of-SFA Symbols in Vector Space (BOSSVS), I decided to explore:

  • the possibilities offered by transfer learning, for which I reached for well-known architectures (ResNet101 and Inception V3) and fitted them on the data in question
  • feature extraction from the ResNet101 architecture
  • a comparison of the experiments and an overview of the results

The Data

The FordA data comes with a standard train/test split [train: ((3601, 500), (3601,)), test: ((1320, 500), (1320,))]. The series are also stationary, so nothing else needs to be done with them. The maximum amplitude is about six. Example data plotted as a line graph looks as follows:

plt.plot() output on a 1D slice of the feature data
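For reproducibility, a minimal loading-and-plotting sketch. The mirror URL is the one used in the well-known Keras timeseries classification tutorial and is an assumption here; you can substitute your own copy of the UCR archive:

```python
import numpy as np
import matplotlib.pyplot as plt

def readucr(filename):
    # Each row: label followed by 500 tab-separated time steps
    data = np.loadtxt(filename, delimiter="\t")
    return data[:, 1:], data[:, 0].astype(int)

root = "https://raw.githubusercontent.com/hfawaz/cd-diagram/master/FordA/"
X_train, y_train = readucr(root + "FordA_TRAIN.tsv")
X_test, y_test = readucr(root + "FordA_TEST.tsv")

plt.plot(X_train[0])   # one 500-step sensor reading
plt.show()
```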

“Classic” models that will be used for benchmarking

The basis for my research and choice of tools was the aforementioned article by Johann Faouzi, in which he quite precisely (and in some sense also chronologically) described the available techniques and tools.

Apart from the methods mentioned in the first half of that text, I took the following into consideration:

  1. Naive use of Random Forest from sklearn
  2. The BOSSVS algorithm, implemented in PYTS (https://pyts.readthedocs.io/), with roughly default parameters
  3. The Random Convolutional Kernel Transform (ROCKET; the RocketClassifier implementation in sktime).

The libraries from points 2 and 3 follow a structure very similar to sklearn's, so launching those classifiers, prediction included, is literally three lines of code. A reminder:

Three-line implementation of RocketClassifier from sktime, in the sklearn manner
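For completeness, roughly what those lines look like; the import paths below match recent pyts and sktime releases, so treat them as an assumption if your versions differ:

```python
from pyts.classification import BOSSVS
from sktime.classification.kernel_based import RocketClassifier

# BOSSVS on roughly default parameters
bossvs = BOSSVS().fit(X_train, y_train)
bossvs_pred = bossvs.predict(X_test)

# RocketClassifier: literally three lines, in the sklearn manner
rocket = RocketClassifier()
rocket.fit(X_train, y_train)      # X_train: (n_samples, n_timestamps)
rocket_pred = rocket.predict(X_test)
```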

* Unfortunately, I was unable to implement and test HIVE-COTE v1.0, which has repeatedly been indicated in the industry literature as one of the most interesting solutions to this problem. Despite using Google Colab Pro, each time I ran the code in accordance with the sktime documentation, the notebook instance ran out of memory and crashed.

Nevertheless, what were the results of the tested models on the validation data set?

Imaging

In order to use the capabilities of two-dimensional convolutional neural networks, we need to convert a one-dimensional vector into an image. To do this, we can use several techniques that have been described in detail, among others, in this article.

Once again, the PYTS (Python Time Series, written by Johann Faouzi) library comes in handy with ready implementations. In practice, according to the documentation, it is once again only about three lines of code, in the sklearn manner. Below, however, I am attaching a snippet with the function I created for the transformation.

Transformations function
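A reconstruction of such a transformation function; the function name and default parameters are mine, while the pyts transformer classes are as documented:

```python
from pyts.image import GramianAngularField, MarkovTransitionField

def to_fields(X, method="gasf", n_bins=8):
    """Turn series of shape (n_samples, n_timestamps) into (n_samples, T, T) fields."""
    if method == "mtf":
        transformer = MarkovTransitionField(n_bins=n_bins)
    else:
        # method="summation" yields GASF; "difference" would yield GADF
        transformer = GramianAngularField(method="summation")
    return transformer.fit_transform(X)

gasf_fields = to_fields(X_train, method="gasf")
mtf_fields = to_fields(X_train, method="mtf")
```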

And what do these line graphs look like after applying the aforementioned transformations?

Markov Transition Field (MTF):

MTF transformed 1D vectors of the same class
GA(S)F transformed 1D vectors of the same class

However, if we looked at the numpy arrays behind these fractal-like images, we would find that we are dealing with only two dimensions, not the three needed for a Conv2D network. Therefore, for exactly such images as the displayed ones to make it into the datasets, we need one more small transformation.

In this case, I used matplotlib to “color” and normalize the arrays resulting from conversion to MTF and GASF.

2D array to “image” (HWC format) converter
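A sketch of such a converter; the colormap choice (viridis) is my assumption, any matplotlib colormap callable works the same way:

```python
import numpy as np
from matplotlib import cm

def field_to_image(field):
    """Normalize a 2D field and map it through a colormap to an HWC RGB array."""
    normed = (field - field.min()) / (field.max() - field.min() + 1e-8)
    rgba = cm.viridis(normed)                   # (H, W, 4), floats in [0, 1]
    return rgba[..., :3].astype("float32")      # drop the alpha channel -> (H, W, 3)

images = np.stack([field_to_image(f) for f in gasf_fields])
```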

Then I turned the prepared arrays into a tf.data.Dataset.
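A minimal sketch of that step; the batch size and shuffle buffer are assumptions:

```python
import tensorflow as tf

# FordA labels are {-1, 1}; map them to {0, 1} for the sigmoid output
labels = ((y_train + 1) // 2).astype("float32")

train_ds = (
    tf.data.Dataset.from_tensor_slices((images, labels))
    .shuffle(buffer_size=1024)
    .batch(32)
    .prefetch(tf.data.AUTOTUNE)
)
```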

Architecture of the two-dimensional convolutional neural network

For both datasets, the architecture is identical. It consists of:

  • an input layer with the same shape as one image
  • three convolutional blocks, each made of the following layers:
      • BatchNormalization
      • Conv2D (32 filters, kernel size 5, ReLU)
      • Dense (64, 32, 16 units respectively, ReLU)
      • Dropout (0.2)
  • an output block consisting of:
      • GlobalMaxPooling2D
      • Flatten
      • a Dense layer handling binary classification with sigmoid activation

A model.summary() output
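For reference, a minimal Keras sketch matching the list above. The layer order and parameters are my reading of that description, and the optimizer, loss, and callback settings are assumptions rather than the exact configuration used:

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_model(input_shape):
    inputs = tf.keras.Input(shape=input_shape)          # same shape as one image
    x = inputs
    for units in (64, 32, 16):                          # three convolutional blocks
        x = layers.BatchNormalization()(x)
        x = layers.Conv2D(32, kernel_size=5, activation="relu")(x)
        x = layers.Dense(units, activation="relu")(x)   # dense applied per spatial position
        x = layers.Dropout(0.2)(x)
    x = layers.GlobalMaxPooling2D()(x)                  # output block
    x = layers.Flatten()(x)
    outputs = layers.Dense(1, activation="sigmoid")(x)  # binary classification
    return tf.keras.Model(inputs, outputs)

model = build_model(images.shape[1:])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# val_ds is assumed to be built like train_ds, from the test split
reduce_lr = tf.keras.callbacks.ReduceLROnPlateau(monitor="val_loss", factor=0.5, patience=3)
history = model.fit(train_ds, validation_data=val_ds, epochs=40, callbacks=[reduce_lr])
```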

The model was then trained for 40 epochs, with the ReduceLROnPlateau() callback (as wired in the sketch above). After making predictions, I compared the resulting models using the basic metrics from the sklearn package: accuracy, precision, recall and F1 score. The results were as follows:

Scores per model; Markov and Gramian are the Conv2D models with different data transformations

In terms of score alone, none of the models beat Rocket, but they were not far behind, losing only 2–2.5 points, while offering much faster training and prediction with much lower resource consumption.

Transfer learning

The above success (and this entry-level performance should be considered a success: a simple architecture, no hyperparameter tuning, quite fast and efficient training, and potential reserves in the data transformation itself, where various strategies and options remain) fully encouraged me to dig deeper in the next step.

I started looking for an architecture suited to solving similar problems and came across an article on recognizing human activity through the use of GASF and Conv2D. The architecture presented there goes far beyond what I was looking for (it would not be transfer learning but a recreation of a new, not so extensively tested network, which I will gladly attempt in the upcoming months), but a large part of it is based on ResNet.

After short testing, ResNet101 showed promising results, as did Inception V3. Promising, but each with its drawbacks. First things first.

Feature Extraction

Training the model with frozen layers, then unfreezing some of them and training again, did not give great results (roughly Random Forest level), so I did it for only one dataset: MTF.
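For context, a minimal sketch of that feature-extraction setup, assuming an ImageNet-pretrained ResNet101 backbone; the pooling and sigmoid head are my assumptions, not the exact code used:

```python
import tensorflow as tf

base = tf.keras.applications.ResNet101(
    include_top=False, weights="imagenet", input_shape=images.shape[1:]
)
base.trainable = False                          # frozen backbone: feature extraction only

inputs = tf.keras.Input(shape=images.shape[1:])
x = base(inputs, training=False)                # keep BatchNorm layers in inference mode
x = tf.keras.layers.GlobalAveragePooling2D()(x)
outputs = tf.keras.layers.Dense(1, activation="sigmoid")(x)
fe_model = tf.keras.Model(inputs, outputs)

# For fine-tuning: unfreeze some (or all) layers and re-fit with a lower learning rate
```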

All layers trainable

On the other hand, when I started training with all layers of each tested architecture unfrozen, the results were at least intriguing, but also slightly disturbing. The graph below illustrates the learning and performance curves for the training and validation datasets.

Learning curves of Resnet101 on GASF… jagged…

A couple of things stand out:

  1. The relative symmetry between the loss curves for the training and validation datasets should be considered expected, but…
  2. Up to around epoch 20, the algorithm has an obvious problem with generalization, as if generalizing were only possible below a certain loss level
  3. A more dynamic adjustment of the learning rate should probably smooth out the learning, and should also help with generalization.

Nevertheless, the exact results (the same metrics as before) are as follows:

Comparison of the Transfer Learning architectures; FE stands for Feature Extraction

Both models perform better by almost 1 point with the GASF transformation. Combined with the fact that our simple CNN also scored better on GASF than on MTF, this suggests a hypothesis slightly favoring the GASF transformation. Additionally, ResNet turned out better than Inception V3 on each dataset, coming within 1.8 points of the Rocket algorithm.

In the last step, we will check how the ResNet101 model performs on datasets based on the Gramian Angular Field transformation prepared with various strategies.

What’s next

For sure, we have some challenges ahead. Stabilizing the learning curve is worth considering; as mentioned, a dynamic learning rate adjusted on the fly, epoch by epoch, might help, for example along the lines of the sketch below. Checking different batch sizes in relation to such an adaptive learning rate also seems an interesting idea.
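One way to do that on-the-fly adjustment is Keras's LearningRateScheduler callback; the cosine shape and base rate here are assumptions, just to illustrate the idea:

```python
import math
import tensorflow as tf

def cosine_schedule(epoch, lr, total_epochs=40, base_lr=1e-3):
    # Decay the learning rate smoothly every epoch instead of waiting for plateaus
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * epoch / total_epochs))

lr_callback = tf.keras.callbacks.LearningRateScheduler(cosine_schedule)
# then: model.fit(..., callbacks=[lr_callback])
```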

In the field of data preprocessing and transformation into images, at the moment it looks like the Gramian Angular Field is leading the race, but the Markov Transition Field offers more customization options, like a nearly infinite matrix of n_bins ("alphabet size") x strategy type (uniform, normal, quantile). It looks like each dataset may need an individual approach and upfront tests.
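A sketch of how such an upfront sweep over MTF settings could look with pyts; the grid values are arbitrary examples:

```python
from pyts.image import MarkovTransitionField

# Sweep the "alphabet size" against the binning strategy
for strategy in ("uniform", "normal", "quantile"):
    for n_bins in (4, 8, 16):
        mtf = MarkovTransitionField(n_bins=n_bins, strategy=strategy)
        fields = mtf.fit_transform(X_train)
        # ...convert to images, train, and compare as above
```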

Also, there is a level of unpredictability. Just minutes before finishing this article I ran the notebook once again, and guess what: this time Inception V3 on the Gramian dataset performed far better, beating the previous winner (ResNet on Gramian) by nearly 0.8 points, while that previous winner dropped by around a point.

Same code, different result on a different day…

But nevertheless, this approach looks more than promising, especially when we take into account the nearly endless possibilities of convolutional neural networks. We have a great entry-level accuracy, reasonable time and cost of computing, and an infinite playground.

And you may ask how this relates to betting and my previous article. Well, the main idea behind my research in general was to "sensorize" football.

By that I mean that the in-game performance of each player can be treated as a kind of time series, can't it? For example, the influence of ball progression or decision-making over time.

Each fixture is also a kind of time series of time series in its own right. And if we have an effective method for time series classification, we may be able to predict a game in depth…

Some additional links:

  1. My repo with the research.
  2. A link to the Inception_v3 model trained on a Gramian Angular Summation Field, saved as a full model (not h5), if someone wants to look deeper. In the upcoming months I plan to publish it on tfhub.dev, but that needs a bit more work, so for now it is a GDrive link.
  3. A paper about a similar approach in regression problems: https://arxiv.org/abs/1506.00327
  4. A great article by Louis de Vitry about the math behind the GAF transformations: https://medium.com/analytics-vidhya/encoding-time-series-as-images-b043becbdbf3
