# DATA MINING and NEURAL NETWORKS

MA4022/MA7022 DATA MINING and NEURAL NETWORKS
Due date 07.05.2021, 23:59
Every student should have their own unique set of time series.
Please collect the available data for the three years 2018-2020.
Note that for your analysis the time points must be sorted from oldest to newest.
Use the daily closing price.
1. Data evaluation and elementary preprocessing. Analyse the completeness of the data. Are there missing
data (besides weekends)? How many data points are missing in your time series? Are the dates of the
missing values the same for all your time series? What may be the reasons for the missing values? How can you
handle the missing values in your data (explain at least three approaches)? Use the simple rule: fill in a
missing value with the closest past existing value. Plot the results. Normalise to the z-score (zero
mean and unit standard deviation). Plot the results. (15 marks)
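
The preprocessing steps above can be sketched as follows. This is a minimal illustration, assuming pandas is available and that "missing" means a business day (Monday to Friday) with no recorded closing price; the toy prices and the variable names are hypothetical, and the population standard deviation (`ddof=0`) is an assumed convention.

```python
import pandas as pd

# Hypothetical toy closing prices; real data would come from your own source.
dates = pd.to_datetime(["2018-01-02", "2018-01-03", "2018-01-05", "2018-01-08"])
close = pd.Series([10.0, 11.0, 12.0, 14.0], index=dates).sort_index()  # oldest first

# Reindex onto all business days (Mon-Fri) so that gaps become explicit NaNs.
full_index = pd.bdate_range(close.index.min(), close.index.max())
aligned = close.reindex(full_index)

# Count and list the missing dates (weekends are excluded by construction).
n_missing = int(aligned.isna().sum())
missing_dates = aligned.index[aligned.isna()]

# Simple rule: fill each gap with the closest past existing value.
filled = aligned.ffill()

# Z-score: zero mean and unit standard deviation (population std, ddof=0).
zscore = (filled - filled.mean()) / filled.std(ddof=0)

print(n_missing, list(missing_dates.strftime("%Y-%m-%d")))
```

Comparing `missing_dates` across several series (for example with `Index.intersection`) is one way to check whether the gaps coincide.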
3. Segmentation. Prepare the bottom-up piecewise linear segmentation for the transformed and
normalised log-return time series. Use the following tolerance levels for the mean square error: 1%, 5%, 10%
(the thresholds of the mean square error). Plot the results. Are the segments similar across the different time
series you analysed? (25 marks)
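
One possible reading of the bottom-up procedure is sketched below. It assumes that the tolerance is an absolute threshold on the mean square error of a least-squares line within each segment (so 1%, 5%, 10% become 0.01, 0.05, 0.10 for a unit-variance series); that interpretation, and the function names `fit_mse` and `bottom_up`, are assumptions rather than part of the task statement.

```python
import numpy as np

def fit_mse(y, start, end):
    """MSE of the least-squares line fitted to y[start:end] (end exclusive)."""
    seg = y[start:end]
    if len(seg) < 3:
        return 0.0  # a line passes exactly through one or two points
    x = np.arange(start, end)
    slope, intercept = np.polyfit(x, seg, 1)
    return float(np.mean((seg - (slope * x + intercept)) ** 2))

def bottom_up(y, tol):
    """Return segment boundaries; segment k covers y[bounds[k]:bounds[k+1]]."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    # Start from the finest segmentation: consecutive pairs of points.
    bounds = list(range(0, n, 2)) + [n]
    while len(bounds) > 2:
        # Cost of merging each pair of adjacent segments.
        costs = [fit_mse(y, bounds[i], bounds[i + 2])
                 for i in range(len(bounds) - 2)]
        best = int(np.argmin(costs))
        if costs[best] > tol:
            break  # every possible merge would exceed the tolerance
        del bounds[best + 1]  # drop the shared boundary to merge the pair
    return bounds

# Toy series: a rising line followed by a falling line.
y = np.concatenate([np.arange(10.0), 8.0 - np.arange(10.0)])
for tol in (0.01, 0.05, 0.10):
    print(tol, bottom_up(y, tol))
```

Recomputing every merge cost on each pass keeps the sketch short but is slow for long series; caching the costs and updating only the two neighbours of a merged segment is the usual refinement.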
4. Prediction. Choose one of the transformed and normalised time series as a target $g(t)$ and the other 3 as
supporting data $f_1(t)$, $f_2(t)$, $f_3(t)$, where $t = 1, \dots, N$. Provide scatter diagrams of $(g(t), g(t+1))$.
Evaluate the error of the “next-day forecast”, $\hat{g}(t+1) = g(t)$.
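
As a small illustration, the error of the "next-day forecast" can be measured by the mean square error. The choice of MSE as the error measure, the function name `naive_forecast_mse`, and the toy values are assumptions made for this sketch.

```python
import numpy as np

def naive_forecast_mse(g):
    """MSE of the forecast that predicts tomorrow's value as today's value."""
    g = np.asarray(g, dtype=float)
    residual = g[1:] - g[:-1]  # actual g(t+1) minus forecast g(t)
    return float(np.mean(residual ** 2))

g = np.array([0.0, 1.0, 3.0, 2.0])  # hypothetical values
print(naive_forecast_mse(g))  # residuals 1, 2, -1 give MSE 2.0
```

The same slices give the scatter diagram: plot `g[:-1]` on the horizontal axis against `g[1:]` on the vertical axis.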
Use data for 2018 as the training set and find the predictor of $g(t+1)$ (the next day value) as a
linear function $\Psi$ of $g(t)$, $f_1(t)$, $f_2(t)$, $f_3(t)$:

$$\hat{g}(t+1) = \Psi(g(t), f_1(t), f_2(t), f_3(t)) \tag{1}$$

(linear regression). Evaluate the training set error. Use data for 2019 as a test set and evaluate the test
set error for this set. Also, use data for 2020 as a test set and evaluate the test set error for this set.
Compare these errors. Compare these errors to the errors of the “next-day forecast”. Comment.
Provide plots of $g(t)$, $\hat{g}(t)$, and the residual. Present the $(g(t), \hat{g}(t))$ scatter diagram. (30 marks)
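
A minimal sketch of the regression step, using ordinary least squares from NumPy. The supporting series are held in an array called `f` here (an assumed name), the linear function is assumed to include an intercept, and the data are synthetic and noise-free so that the fit can be checked; with real data the split would be by calendar year (2018 for training, 2019 and 2020 for testing).

```python
import numpy as np

def make_xy(g, f):
    """Inputs (1, g(t), f1(t), f2(t), f3(t)) and targets g(t+1)."""
    X = np.column_stack([np.ones(len(g) - 1), g[:-1], f[:-1]])
    return X, g[1:]

def fit_linear(X, y):
    """Least-squares coefficients, intercept first."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def mse(X, y, coef):
    return float(np.mean((y - X @ coef) ** 2))

# Synthetic example: g follows an exact linear rule in (g, f1, f2, f3).
rng = np.random.default_rng(0)
f = rng.normal(size=(300, 3))
g = np.zeros(300)
for t in range(299):
    g[t + 1] = 1.0 + 0.5 * g[t] + f[t] @ np.array([0.2, -0.1, 0.3])

X_train, y_train = make_xy(g[:100], f[:100])   # stands in for the 2018 data
X_test, y_test = make_xy(g[100:], f[100:])     # stands in for a later year

coef = fit_linear(X_train, y_train)
print("train MSE:", mse(X_train, y_train, coef))
print("test MSE:", mse(X_test, y_test, coef))
residual = y_test - X_test @ coef              # values to plot against time
```

The training and test errors can then be set side by side with the error of the "next-day forecast" on the same periods.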
5. Adaptive predictors. For each given value of the “frame width”, $\Delta = 5, 10, 30$, create and test the
following adaptive predictor. For every $T > \Delta$ create the training set with $\Delta$ input vectors
$(g(t), f_1(t), f_2(t), f_3(t))$ ($t = T - \Delta, \dots, T - 1$) and the corresponding outputs $g(t+1)$.
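In more detail, the input vectors $x_i$ and the output values $y_i$
for a given $T$ are

$$
\begin{aligned}
x_1 &= (g(T-\Delta),\, f_1(T-\Delta),\, f_2(T-\Delta),\, f_3(T-\Delta)), & y_1 &= g(T-\Delta+1), \\
&\;\;\vdots \\
x_i &= (g(T-\Delta+i-1),\, f_1(T-\Delta+i-1),\, f_2(T-\Delta+i-1),\, f_3(T-\Delta+i-1)), & y_i &= g(T-\Delta+i),
\end{aligned}
$$

where $i = 1, 2, \dots, \Delta$.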
Find the linear regression (1) for each $T > \Delta$. Test this linear regression for the next time value, $t = T+1$.
In more detail, for each $T$ there is one test example with the input vector $x$ and output value $y$:

$$x = (g(T),\, f_1(T),\, f_2(T),\, f_3(T)), \qquad y = g(T+1).$$

Note that this example does not belong to the training set for this value of $T$.
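
The sliding-window scheme described above can be sketched as follows. The function name `adaptive_forecast`, the array name `f` for the supporting series, the intercept term, and the synthetic noise-free data are all assumptions of this sketch; indices are 0-based, so `T` below is the array position of the test input day.

```python
import numpy as np

def adaptive_forecast(g, f, delta):
    """One-step forecasts from a regression refitted on the last `delta` days."""
    n = len(g)
    # Row t holds the input vector (1, g(t), f1(t), f2(t), f3(t)).
    X = np.column_stack([np.ones(n), g, f])
    times, preds = [], []
    for T in range(delta, n - 1):
        X_train = X[T - delta:T]            # inputs at t = T-delta, ..., T-1
        y_train = g[T - delta + 1:T + 1]    # outputs g(t+1) for those t
        coef, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)
        times.append(T + 1)                 # the forecast targets g(T+1)
        preds.append(X[T] @ coef)           # test input is not in the training set
    return np.array(times), np.array(preds)

# Synthetic example: g follows an exact linear rule in (g, f1, f2, f3).
rng = np.random.default_rng(0)
f = rng.normal(size=(300, 3))
g = np.zeros(300)
for t in range(299):
    g[t + 1] = 1.0 + 0.5 * g[t] + f[t] @ np.array([0.2, -0.1, 0.3])

for delta in (5, 10, 30):
    times, preds = adaptive_forecast(g, f, delta)
    residuals = g[times] - preds
    print(delta, float(np.mean(residuals ** 2)))
```

With five coefficients and a frame width of 5 the training system has as many equations as unknowns, so the fit interpolates the training points; `np.linalg.lstsq` returns the minimum-norm solution if the system happens to be rank deficient.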
Find the residuals at these test time points. Plot these residuals and the values $g(t)$, $\hat{g}(t)$. Present
the $(g(t), \hat{g}(t))$ scatter diagram ($t = T+1$). Calculate the mean square error. Compare to the previous