We can use the [UCI ML Air Quality Dataset](https://archive.ics.uci.edu/ml/datasets/Air+quality) to demonstrate time-series analysis of longitudinal data. The *Air Quality data* consists of 9,358 hourly averaged responses from an array of 5 sensors embedded in an Air Quality Chemical Multisensor Device. These measurements were obtained in a significantly polluted area over a one-year period (March 2004 to February 2005). The features include concentrations of CO, non-methane hydrocarbons (NMHC), benzene, total nitrogen oxides (NOx), and nitrogen dioxide (NO2).

The attributes in the CSV file include:

* Date (DD/MM/YYYY)
* Time (HH.MM.SS)
* True hourly averaged CO concentration in mg/m^3 (reference analyzer)
* PT08.S1 (tin oxide) hourly averaged sensor response (nominally CO targeted)
* True hourly averaged overall non-methane hydrocarbons (NMHC) concentration in µg/m^3 (reference analyzer)
* True hourly averaged benzene concentration in µg/m^3 (reference analyzer)
* PT08.S2 (titania) hourly averaged sensor response (nominally NMHC targeted)
* True hourly averaged NOx concentration in ppb (reference analyzer)
* PT08.S3 (tungsten oxide) hourly averaged sensor response (nominally NOx targeted)
* True hourly averaged NO2 concentration in µg/m^3 (reference analyzer)
* PT08.S4 (tungsten oxide) hourly averaged sensor response (nominally NO2 targeted)
* PT08.S5 (indium oxide) hourly averaged sensor response (nominally O3 targeted)
* T: temperature in °C
* RH: relative humidity (%)
* AH: absolute humidity

References

* S. De Vito, E. Massera, M. Piga, L. Martinotto, G. Di Francia, On field calibration of an electronic nose for benzene estimation in an urban pollution monitoring scenario, Sensors and Actuators B: Chemical, Volume 129, Issue 2, 22 February 2008, Pages 750-757, ISSN 0925-4005.
* S. De Vito, M. Piga, L. Martinotto, G. Di Francia, CO, NO2 and NOx urban pollution monitoring with on-field calibrated electronic nose by automatic Bayesian regularization, Sensors and Actuators B: Chemical, Volume 143, Issue 1, 4 December 2009, Pages 182-191, ISSN 0925-4005.
* S. De Vito, G. Fattoruso, M. Pardo, F. Tortorella, G. Di Francia, Semi-Supervised Learning Techniques in Artificial Olfaction: A Novel Approach to Classification Problems and Drift Counteraction, IEEE Sensors Journal, Volume 12, Issue 11, November 2012, Pages 3215-3224, doi: 10.1109/JSEN.2012.2192425.

R Code Snippet

```r
library(forecast)

# Load the hourly air quality data
aqi_data <- read.csv("https://umich.instructure.com/files/8208336/download?download_frd=1")
summary(aqi_data)

# Hourly sampling rate: with frequency = 24 the time index counts days, not calendar years
aqi_data.ts <- ts(aqi_data, start = c(2004, 3), frequency = 24)

# Set up training and testing time periods:
# hold out the final month (the stretch forecast below) for validation
pred_length <- 24 * 30   # 1 month of hourly observations
n_obs <- nrow(aqi_data)  # 9471 rows in the original snippet
alltrain.ts <- window(aqi_data.ts, end = time(aqi_data.ts)[n_obs - pred_length])
allvalid.ts <- window(aqi_data.ts, start = time(aqi_data.ts)[n_obs - pred_length + 1])

# Exogenous regressors (columns 4 to 15 of the CSV)
xreg_cols <- c("PT08.S1.CO.", "NMHC.GT.", "C6H6.GT.", "PT08.S2.NMHC.", "NOx.GT.",
               "PT08.S3.NOx.", "NO2.GT.", "PT08.S4.NO2.", "PT08.S5.O3.", "T", "RH", "AH")
xreg_all <- as.matrix(aqi_data[, xreg_cols])   # recent forecast versions require a numeric matrix

# Estimate the ARIMAX model for the CO concentration
fitArimaX <- auto.arima(aqi_data$CO.GT., xreg = xreg_all)
fitArimaX

# Predict prospective CO concentration, 1 month forward.
# The last pred_length rows of observed regressors stand in for future regressor values.
xreg_future <- xreg_all[(n_obs - pred_length + 1):n_obs, ]
predCO <- predict(fitArimaX, n.ahead = pred_length, newxreg = xreg_future)
# plot(predCO$pred, main = "Forward time-series predictions (fitArimaX)")
plot(forecast(fitArimaX, xreg = xreg_future))
```

More information: http://www.socr.umich.edu/people/dinov/courses/DSPA_notes/18_BigLongitudinalDataAnalysis.html
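The Date (DD/MM/YYYY) and Time (HH.MM.SS) attributes listed above can be combined into a single timestamp, which makes it easier to check that the observations really are hourly. The following is a minimal sketch on a few made-up rows, not the real data. It assumes the columns are named `Date` and `Time` and are read as character strings; the `toy` data frame, the `timestamp` column, and the `UTC` time zone are illustrative placeholders, since the source does not state a time zone.

```r
# Toy rows in the documented formats: Date as DD/MM/YYYY, Time as HH.MM.SS
# (hypothetical values, used only to illustrate the parsing step)
toy <- data.frame(
  Date = c("10/03/2004", "10/03/2004", "11/03/2004"),
  Time = c("18.00.00", "19.00.00", "00.00.00"),
  stringsAsFactors = FALSE
)

# Combine both fields and parse them; the time zone is an assumption
toy$timestamp <- as.POSIXct(paste(toy$Date, toy$Time),
                            format = "%d/%m/%Y %H.%M.%S", tz = "UTC")

# Gaps between consecutive observations, in hours; values other than 1 flag missing hours
hours_between <- as.numeric(diff(toy$timestamp), units = "hours")
print(toy)
print(hours_between)
```

Applied to the full data frame, the same call on `aqi_data$Date` and `aqi_data$Time` should work provided the hosted CSV keeps these formats; rows that fail to parse come back as `NA` and are worth inspecting before modeling.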