Interpolation
In terms of research, I have managed to produce some reasonable results for PM2.5 prediction. I ran a variety of experiments to see how accurately predictions can be made.
The first of these checks the interpolation ability of GP regression. I trained on every 100th observed data point and tested on the larger set of every 10th data point. We want to see whether the model gives reasonable predictions at the nine unseen test points between each pair of consecutive training examples. The hyperparameter varied between experiments was the 'length scale', with the covariance function kept as the squared exponential. In practice, it determines how closely the predicted curve follows the training data: a short length scale lets the curve bend sharply to pass near every point, while a long one gives a smoother curve. In theory, it is a measure of how 'close' two points need to be for them to be seen as 'significant' to each other, so that nearby training points are assigned a greater weight in a prediction. This behaviour is similar to what is seen in locally weighted linear regression.
Here are some results.
We can see from these results that the larger the length scale is, the less likely we are to run into overfitting, but the less closely the predictions match what is expected.
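For anyone who wants to try the interpolation setup described above, here is a minimal sketch in Python with NumPy. It is not the code I ran: the series is a synthetic stand-in for the PM2.5 data, and the names `sq_exp_kernel` and `gp_predict`, the noise variance and the three length scales are all assumptions chosen for illustration.

```python
import numpy as np

def sq_exp_kernel(a, b, length_scale, signal_var):
    """Squared exponential covariance between two 1-D input arrays."""
    d = a[:, None] - b[None, :]
    return signal_var * np.exp(-0.5 * (d / length_scale) ** 2)

def gp_predict(x_train, y_train, x_test, length_scale, signal_var, noise_var):
    """Posterior mean and variance of a zero-mean GP at x_test."""
    K = sq_exp_kernel(x_train, x_train, length_scale, signal_var)
    K += noise_var * np.eye(x_train.size)
    K_s = sq_exp_kernel(x_train, x_test, length_scale, signal_var)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = signal_var - np.sum(v ** 2, axis=0)
    return mean, var

# Synthetic stand-in for the PM2.5 series (assumption, not real data).
rng = np.random.default_rng(0)
t = np.arange(2000, dtype=float)
series = 50 + 20 * np.sin(2 * np.pi * t / 500) + rng.normal(0, 2, t.size)

# Train on every 100th point, test on every 10th point.
train_idx = np.arange(0, t.size, 100)
test_idx = np.arange(0, t.size, 10)
x_train, y_train = t[train_idx], series[train_idx]
x_test, y_test = t[test_idx], series[test_idx]

# Centre the targets so the zero-mean GP assumption is reasonable.
y_mean = y_train.mean()
signal_var = y_train.var()
noise_var = 4.0  # assumed; matches the synthetic noise above

rmse_by_scale = {}
for length_scale in (10.0, 100.0, 1000.0):
    mean, var = gp_predict(x_train, y_train - y_mean, x_test,
                           length_scale, signal_var, noise_var)
    rmse = np.sqrt(np.mean((mean + y_mean - y_test) ** 2))
    rmse_by_scale[length_scale] = rmse
    print(f"length scale {length_scale:7.1f}: test RMSE {rmse:.2f}")
```

On this toy series, a length scale much shorter than the spacing between training points makes the prediction fall back to the mean between them, which is one way a badly chosen length scale shows up in the test error.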
Extrapolation
An alternative experiment to the above is to extrapolate from the data: predict future values given a few sequential data points to train on. I find these results the most interesting so far.
Unfortunately I am not able to present the findings here because the SSH connection to the server at Tsinghua is horrendously slow. As soon as I get a better connection I will try to upload the images.
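In the meantime, here is a rough sketch of what the extrapolation setup looks like in code, again in Python with NumPy. It uses a synthetic series, not the real PM2.5 data, and the helper `sq_exp`, the length scale and the noise variance are assumptions for illustration only. It trains on the first 40 sequential points and predicts the following 60.

```python
import numpy as np

def sq_exp(a, b, length_scale, signal_var):
    """Squared exponential covariance between two 1-D input arrays."""
    d = a[:, None] - b[None, :]
    return signal_var * np.exp(-0.5 * (d / length_scale) ** 2)

# Synthetic stand-in for a PM2.5 series (assumption, not real data).
rng = np.random.default_rng(1)
t = np.arange(100, dtype=float)
y = 50 + 20 * np.sin(2 * np.pi * t / 50) + rng.normal(0, 2, t.size)

# Train on the first 40 sequential points, predict everything after.
x_train, y_train = t[:40], y[:40]
x_future = t[40:]

length_scale = 5.0           # assumed value
noise_var = 4.0              # assumed; matches the synthetic noise
signal_var = y_train.var()
y_mean = y_train.mean()      # constant mean added back after prediction

K = sq_exp(x_train, x_train, length_scale, signal_var)
K += noise_var * np.eye(x_train.size)
K_s = sq_exp(x_train, x_future, length_scale, signal_var)

# GP posterior mean and variance at the future inputs.
mean = y_mean + K_s.T @ np.linalg.solve(K, y_train - y_mean)
var = signal_var - np.sum(K_s * np.linalg.solve(K, K_s), axis=0)

for step in (0, 5, 20, 59):
    print(f"t={x_future[step]:5.1f}  mean={mean[step]:6.2f}  "
          f"std={np.sqrt(var[step]):5.2f}")
```

With a squared exponential covariance, the predicted mean drifts back to the training mean and the predictive variance grows back to the prior variance once the test inputs are several length scales past the last training point, so only the first few future steps carry real information from the data.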