I installed GNU Octave, which is essentially a free knock-off version of MATLAB. It incorporates much of MATLAB's non-cryptic syntax, making it possible to run complete scripts from .m files.
Will I be able to predict the future?
I previously discussed how I was able to perform interpolation with my data. I would take every n-th point from the initial data set and use these points as the training set. I would then take every m-th point (where m < n) and place these into another set, so that the resulting set is larger than the training set and includes a few points 'in between' the training examples. This second set is the test set. I showed that interpolation could be done reasonably accurately.
The next obvious area to investigate is how GP regression performs for extrapolation. I will fix the number of data points I would like to predict and vary the size of the training set. This makes it possible to observe the behaviour as the program is given more data points to train on, and whether this increases the accuracy of predictions.
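The interpolation split from the earlier experiment can be sketched as follows. This is a minimal Python illustration rather than the Octave code actually used; the function name `interpolation_split` and the toy series are hypothetical.

```python
def interpolation_split(data, n, m):
    """Return (train, test): every n-th point for training, every m-th for testing."""
    if not 0 < m < n:
        raise ValueError("expected 0 < m < n so the test set is denser than the training set")
    train = data[::n]  # sparse subset used for training
    test = data[::m]   # denser subset, includes points between training examples
    return train, test


# Toy series standing in for the real observations (hypothetical data).
series = list(range(20))
train, test = interpolation_split(series, n=6, m=2)
print(train)  # [0, 6, 12, 18]
print(test)   # [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
```

When m divides n, as here, every training point also appears in the test set, alongside the points in between.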
I chose the training and test sets in a way that makes it easy to check the accuracy of predictions. I take the last P points, counting back from the last observed data point that I have, as the test set. This set is fixed. I then take the Q points immediately before those P unseen test points and use them as the training set. So, the larger Q is, the further back in history we look and the more comprehensive the training set is for the program. It also means that the oldest training points are further away from the test points, so predictions may actually become less accurate as we continue increasing Q.
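A minimal Python sketch of this split, assuming the observations are stored in time order. The function name `extrapolation_split` and the toy series are hypothetical; the real experiment was run in Octave.

```python
def extrapolation_split(data, p, q):
    """Return (train, test): the last p points are the test set,
    the q points immediately before them are the training set."""
    if p <= 0 or q <= 0 or p + q > len(data):
        raise ValueError("need p > 0, q > 0 and p + q <= len(data)")
    test = data[-p:]            # fixed block of unseen points at the end
    train = data[-(p + q):-p]   # q points directly preceding the test block
    return train, test


# Toy series standing in for the real observations (hypothetical data).
series = list(range(100))
for q in (10, 30, 60):
    train, test = extrapolation_split(series, p=5, q=q)
    # The test block never moves; only the start of the training block does.
    print(q, train[0], train[-1], test[0], test[-1])
```

Increasing `q` only extends the training block further back in time, so the test points stay the same across runs and the results are directly comparable.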
After some trial and error, I chose the length scale and signal variance parameters to be 14 and 10 respectively. It would be interesting to see if there is a way in which these hyperparameters can be learnt via GP, as trial and error is far from a scientific approach.
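To show where these two hyperparameters enter, here is a sketch of a squared exponential covariance function in Python. The post does not say which kernel was used, so the squared exponential form is an assumption, as is reading 10 as the variance (not the standard deviation) of the signal.

```python
import math

LENGTH_SCALE = 14.0     # value chosen by trial and error in the post
SIGNAL_VARIANCE = 10.0  # assumed to be a variance, not a standard deviation


def se_kernel(x1, x2, length_scale=LENGTH_SCALE, signal_variance=SIGNAL_VARIANCE):
    """Squared exponential covariance between two input vectors (assumed kernel)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x1, x2))
    return signal_variance * math.exp(-sq_dist / (2.0 * length_scale ** 2))


# Identical inputs give the full signal variance; the covariance decays
# as the inputs move apart relative to the length scale.
print(se_kernel([0.0, 0.0], [0.0, 0.0]))
print(se_kernel([0.0], [14.0]))
```

A larger length scale makes distant inputs more strongly correlated, giving smoother predictions; a larger signal variance lets the function range further from its mean.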
Below are the results. As always, in the diagrams the red curve shows the predictions made from the 18 input parameters that accompany each observed PM2.5 value, and the observations themselves are shown as blue crosses. The aim is to see how closely the red curve 'fits' the blue observations, and therefore how accurate the predictions are. The size of the training set varies from 10 to 1500.
We can see that within the 1000-1500 range for the training set size, we have learnt a reasonably accurate curve to fit the data, which is what we wanted.
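The fit above is judged by eye. One way to put a number on it is the root mean squared error between the predictions and the held-out observations. This is a suggested measure, not something computed in the original experiment, and the values below are made up for illustration.

```python
import math


def rmse(predicted, observed):
    """Root mean squared error between predictions and observations."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared_errors) / len(observed))


# Hypothetical predictions and observations for a handful of test points.
print(rmse([52.0, 61.0, 70.5], [50.0, 63.0, 71.0]))
```

Computing this for each training set size would show whether the error keeps falling up to 1500 points or levels off earlier.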
What else needs to be looked into?
As mentioned, I need to find a way to learn the signal variance and length scale parameters. In addition, I am currently learning the hyperparameters of the mean function, as I assume a priori that it is unknown. It may be a better idea to use the mean value of PM2.5, a positive real number, as a constant mean function, and see how this affects predictions. I currently limit the number of training examples to 1500, as beyond this point Octave begins to struggle with the computations. It would be helpful to increase this number further to see how this affects the accuracy of predictions.
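The constant mean idea can be sketched as follows: subtract the average of the training targets, run zero-mean GP regression on the residuals, and add the average back to the predictions. This is a pure Python illustration under several assumptions: a squared exponential kernel, a hypothetical `noise_variance` term that the post does not mention, and toy data. It is not the Octave code used for the results.

```python
import math


def se_kernel(x1, x2, length_scale, signal_variance):
    """Squared exponential covariance (assumed kernel)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x1, x2))
    return signal_variance * math.exp(-sq_dist / (2.0 * length_scale ** 2))


def cholesky(a):
    """Lower-triangular L with L * L^T = a, for a symmetric positive definite a."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def cholesky_solve(low, b):
    """Solve (L * L^T) x = b by forward then backward substitution."""
    n = len(b)
    z = [0.0] * n
    for i in range(n):
        z[i] = (b[i] - sum(low[i][k] * z[k] for k in range(i))) / low[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(low[k][i] * x[k] for k in range(i + 1, n))) / low[i][i]
    return x


def gp_predict_constant_mean(x_train, y_train, x_test, length_scale=14.0,
                             signal_variance=10.0, noise_variance=0.1):
    """Posterior mean of a GP whose prior mean is the average training target."""
    mean = sum(y_train) / len(y_train)      # constant mean function
    centred = [y - mean for y in y_train]   # residuals modelled by the GP
    n = len(x_train)
    # Training covariance with the (hypothetical) noise variance on the diagonal.
    k = [[se_kernel(x_train[i], x_train[j], length_scale, signal_variance)
          + (noise_variance if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    alpha = cholesky_solve(cholesky(k), centred)
    return [mean + sum(se_kernel(xs, x_train[i], length_scale, signal_variance) * alpha[i]
                       for i in range(n))
            for xs in x_test]


# Toy one-dimensional data (hypothetical); the real inputs have 18 parameters.
x_train = [[0.0], [1.0], [2.0]]
y_train = [50.0, 60.0, 70.0]
preds = gp_predict_constant_mean(x_train, y_train, [[1.0], [2.0], [1000.0]])
print(preds)
```

Far from the training data the prediction falls back to the constant mean, which is more sensible for a positive quantity such as PM2.5 than falling back to zero. The Cholesky factorisation costs on the order of n^3 operations for n training points, which is a likely reason the computation becomes slow beyond about 1500 examples.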