TOC
- Time series patterns
- Seasonal subseries plots
- Week 3: Time series decomposition
  - 3.1. Transformations and adjustments
  - 3.2. Mathematical transformations
    - Example of the different adjustments
    - 3.2.1. Box-Cox transformations
    - Alternative transformation: log(x+1) (use it when there are zeros)
  - 3.3. Time series components
    - 3.3.2. Time series decomposition (trend-cycle)
    - 3.3.3. Seasonal adjustment (Seasonal)
  - 3.4. History of time series decomposition
  - Change of window (Example)
- Week 5. The forecaster's toolbox
  - 5.1. A tidy forecasting workflow
  - 5.2. Some simple forecasting methods
  - 5.3. Residual diagnostics
    - 5.2.3. Forecasting residuals
    - 5.2.4. ACF of residuals
    - 5.2.5. Portmanteau tests
  - 5.4. Distributional forecasts and prediction intervals
    - 5.4.1. Forecast distributions
    - 5.4.2. Prediction intervals
  - 5.5. Forecasting with transformations
  - Measures of forecast accuracy
- FAQ
Model Learnt
Basic
- MEAN(y): Average method
- NAIVE(y): Naïve method
- SNAIVE(y ~ lag(m)): Seasonal naïve method
- RW(y ~ drift()): Drift method
Medium
- Automatic forecasting
Sophisticated
ARIMA → less interpretable compared to ETS
To do
What is heteroskedasticity?
ETS
optimisation
p.57 likelihood
Innovations state space models
ARIMA
Time series patterns
There are 3 types of time series patterns that need to be well defined:
Trend
- exists when there is a long-term increase or decrease in the data.
Seasonal
- exists when a series is influenced by seasonal factors
- (e.g., the quarter of the year, the month, or day of the week).
Cyclic
- exists when data exhibit rises and falls that are not of fixed period (duration usually of at least 2 years).
Economic data tends to follow cyclic patterns: recessions, booms, peaks.
Seasonal subseries plots
3.5. Lagged scatterplots
Why do we need lagged values?
- In finance, we often look at the percentage change over some number of days; for example, the lag t−1 gives the one-day change (see the sketch below).
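A minimal sketch of a lag plot using feasts::gg_lag() (the aus_production beer data here is an assumed example, not from the notes):

```r
library(feasts)       # gg_lag(); also re-exports the pipe
library(tsibbledata)  # aus_production example data

# Scatterplots of the series against its own lagged values
aus_production %>% gg_lag(Beer, geom = "point")
```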
3.6.1 Autocorrelation
- Covariance and correlation: measure extent of linear relationship between two variables (y and X).
- Auto means "self" in Greek. As implied, autocovariance and autocorrelation measure the linear relationship between lagged values of a time series y; the lagged values are derived from the series itself.
The autocorrelation function (ACF) tells us the correlation between observations and those that came before them separated by different lags (refer to the monster generations in slides!)
3.6.2 Correlogram (ACF)
new_production %>% ACF(Beer) %>% autoplot()
- r4 is higher than for the other lags.
- This is due to the seasonal pattern in the data: the peaks tend to be 4 quarters apart and the troughs tend to be 2 quarters apart.
- The dashed blue lines indicate whether the correlations are significantly different from zero.
- r2 is more negative than for the other lags because troughs tend to be 2 quarters behind peaks.
- When data have a trend, the autocorrelations for small lags tend to be large and positive.
- When data are seasonal, the autocorrelations will be larger at the seasonal lags (i.e., at multiples of the seasonal frequency)
- When data are trended and seasonal, you see a combination of these effects.
3.7. White noise
- Time series that show no autocorrelation are called "white noise".
- Expect each autocorrelation to be close to zero.
- The dashed blue lines show the 95% critical values, commonly plotted at ±1.96/√T.
- For white noise, 95% of all autocorrelations must lie within ±1.96/√T.
- If this is not the case (i.e., one or more spikes exceed the dashed blue lines), the series is probably not white noise.
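A minimal sketch of what white noise looks like through the ACF (simulated data; all spikes should stay inside the blue bounds):

```r
library(tsibble)
library(feasts)

set.seed(30)
# 50 draws of iid noise as a tsibble, then its correlogram
wn <- tsibble(t = 1:50, y = rnorm(50), index = t)
wn %>% ACF(y) %>% autoplot()
```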
Week 3: Time series decomposition
3.1. Transformations and adjustments
There are 4 kinds of adjustments:
- Population adjustments
- Inflation adjustments
- Calendar adjustments
- Mathematical transformations
The purpose of these adjustments is to simplify the pattern in the historical data:
- by removing known sources of variation, and
- by making the pattern more consistent across the entire dataset.
Why simplify the pattern?
- Simpler patterns usually lead to more accurate forecasts.
3.2. Mathematical transformations
- Power transformations include square roots, cube roots, etc.
- Box-Cox transformations: a family that includes logs as a special case.
Examples of the different adjustments
3.2.1. Box-Cox transformations.
myseries_train %>% features(Turnover, features = guerrero)
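For reference, the standard definition of the Box-Cox transformation (this formula is an addition, not spelled out in the notes):

$$w_t = \begin{cases} \log(y_t) & \text{if } \lambda = 0, \\ \left(y_t^{\lambda} - 1\right)/\lambda & \text{otherwise.} \end{cases}$$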
Features of power transformations:
- STABLISES the variance of the series.
- ADJUST the series to make it more comparable.
- Often no transformation needed.
- Simple transformations are easier to explain and work well enough.
- Transformations can have a very large effect on prediction intervals (PIs).
- If some data are zero or negative, then use λ > 0.
- log1p() can also be useful for data with zeros.
- Choosing logs is a simple way to force forecasts to be positive
- Transformations must be reversed to obtain forecasts on the original scale. (Handled automatically by fable.)
- ASSUME the variation is proportional to the level of the series.
Rule of thumb for the Box-Cox lambda
- Choose λ so that the size of the seasonal variation is about the same across the whole series.
- λ = 1 means there is no transformation.
- λ = 0 means the natural log, which is pretty strong.
- Moving λ from 1 towards 0 makes the transformation stronger.
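A minimal sketch putting the pieces together (assumes the myseries_train tsibble from the code above, with a Turnover column): estimate λ with the Guerrero method, then plot the transformed series.

```r
library(feasts)  # guerrero feature; loads fabletools (features(), box_cox())
library(dplyr)   # pull()

lambda <- myseries_train %>%
  features(Turnover, features = guerrero) %>%
  pull(lambda_guerrero)

myseries_train %>% autoplot(box_cox(Turnover, lambda))
```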
Alternative transformation
log(x+1) (use it when the data contain zeros)
```r
pedestrian %>%
  filter(Sensor == "Southern Cross Station") %>%
  autoplot(log1p(Count)) +
  labs(title = "Southern Cross Pedestrians")
```
If there is high skewness and some ZEROs (so we can't take logs), we can try the log(x+1) transformation.
3.3. Time series components.
3.3.1. Time series patterns (Refer back the previous week)
3.3.2. Time series decomposition. (trend-cycle)
When we decompose a time series into components, we usually combine the trend and cycle into a single trend-cycle component (sometimes called the trend for simplicity).
Thus we think of a time series as comprising three components: a trend-cycle component, a seasonal component, and a remainder component (containing anything else in the time series).
In this chapter, we will learn some common methods for extracting these components from a time series.
Additive decomposition: $y_t = S_t + T_t + R_t$.
Multiplicative decomposition: $y_t = S_t \times T_t \times R_t$.
where
- $y_t$ = data at period t
- $T_t$ = trend-cycle component at period t
- $S_t$ = seasonal component at period t
- $R_t$ = remainder component at period t
- Additive model appropriate if the magnitude of seasonal fluctuations does not vary with level.
- If seasonal is proportional to the level of series, then the multiplicative model is appropriate.
- Multiplicative decomposition is more common with economic time series.
- Alternative: use a Box-Cox transformation (make it more stable), and then use additive decomposition.
- Logs turn a multiplicative relationship into an additive relationship: $y_t = S_t \times T_t \times R_t \implies \log y_t = \log S_t + \log T_t + \log R_t$.
The grey bars to the right of each panel show the RELATIVE SCALES of the components.
- A longer bar means that component's variation is small relative to the original data (it has been scaled up more); a shorter bar means its scale is closer to that of the original data.
- Bars with the same length indicate the same scale.
The large grey bar in the bottom panel shows that the variation in the remainder component is small compared to the variation in the data, which has a bar about one quarter the size.
- If we shrunk the bottom three panels until their bars became the same size as that in the data panel, then all the panels would be on the same scale.
3.3.3. Seasonal adjustment (Seasonal)
- We use estimates of S based on past values to seasonally adjust a current value (in the additive case, the seasonally adjusted series is $y_t - S_t$).
- Seasonally adjusted series reflect the remainder as well as the trend-cycle; therefore they are not "smooth".
- Apparent "downturns" or "upturns" can be misleading.
What is an example of seasonal variation?
- An increase in unemployment due to school leavers seeking work is seasonal variation, while an increase in unemployment due to an economic recession is non-seasonal.
- Most economic analysts who study unemployment data are more interested in the non-seasonal variation. Consequently, employment data (and many other economic series) are usually seasonally adjusted, i.e. the seasonal component is removed.
3.4. History of time series decomposition.
3.4.1. X-11 decomposition
3.4.2. Extensions: X-12-ARIMA and X-13-ARIMA
3.4.3. X-13ARIMA-SEATS
3.4.4. STL decomposition
```r
seats_dcmp <- us_retail_employment %>%
  model(seats = X_13ARIMA_SEATS(Employed ~ seats())) %>%
  components()
autoplot(seats_dcmp) +
  labs(title = "Decomposition of total US retail employment using SEATS")
```
The grey bars beside each panel show each component's contribution to the original data.
- The seasonal pattern (3rd row) has the SMALLEST contribution, as its bar is the largest compared with the trend's and the remainder's.
3.5. When things go wrong
Change of window (Example)
(remainder): the remainder looks pretty random; it's going all over the place.
(seasonality): the seasonality looks smooth and seasonal. There are some fluctuations in its size (you can see it's growing a little), but that's fine; it's just variability that we couldn't handle with our transformation.
(trend): the trend goes through the data nicely.
(trend): the line is a bit smoother now that I've increased the window; it's averaging over more numbers, so we get a straighter, smoother line.
(trend): an infinite window essentially means we've just got regression; we don't have a local regression anymore. Because we've only got one window of infinite size, using the whole length of the data, we get one straight line: our regression line.
(trend): with a very small window, each slope is based on only one position in time, so that value is essentially just the intercept. That makes it really flexible; the trend pretty much follows the data.
(remainder): looking at the error, there's no error left; it's zero. Sometimes in models you do want no error, but not here: we want the randomness to be in the remainder term. Here we've got too much flexibility and our trend is no longer smooth.
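A minimal sketch of changing the STL trend window (the us_retail_employment tsibble is assumed from the SEATS example above; a larger window averages over more points and gives a smoother trend):

```r
library(feasts)  # STL(); loads fabletools for model()/components()

us_retail_employment %>%
  model(STL(Employed ~ trend(window = 99))) %>%  # try window = 7 vs 99
  components() %>%
  autoplot()
```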
Week 5. The forecaster's toolbox
For this week, we discuss some useful tools for many different forecasting situations. Each of the tools below will be used repeatedly as we develop and explore a range of forecasting methods.
- Some benchmark forecasting methods,
- Ways of making the forecasting task simpler using transformations and adjustments
- Methods for checking whether a forecasting method has adequately utilised the available information (Quality of the method)
- Techniques for computing prediction intervals.
5.1 A tidy forecasting workflow
The tidy forecasting workflow:
1. Tidy the data
2. Visualise the data
3. Specify a model
4. Estimate (train) the model
5. Evaluate accuracy & performance
6. Forecast
- A mable is a model table; each cell corresponds to a fitted model.
- The model() function trains models to data.
- A fable is a forecast table with point forecasts and distributions.
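A minimal sketch of the mable/fable objects (assuming the aus_production data from tsibbledata):

```r
library(fable)
library(tsibbledata)

fit <- aus_production %>%
  model(snaive = SNAIVE(Beer))  # mable: one column per model specification
fc <- fit %>%
  forecast(h = "2 years")       # fable: point forecasts plus distributions
```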
5.2. Some simple forecasting methods
The following 4 forecasting methods will be used as benchmarks for other forecasting methods. They are very simple and surprisingly effective (a sketch fitting all four follows the list below).
- MEAN(y): Average method
- NAIVE(y): Naïve method
- SNAIVE(y ~ lag(m)): Seasonal naïve method
- RW(y ~ drift()): Drift method
Above, the naïve method assumes that the most recent observation is the only important one, and that all previous observations provide no information about the future.
- MEAN(y): Average method. Forecasts of all future values = mean of historical data {y1, . . . , yT}.
- SNAIVE(y ~ lag(m)): Seasonal naïve method. Forecasts = last value from the same season.
- NAIVE(y): Naïve method. Forecasts = last observed value.
- RW(y ~ drift()): Drift method. Forecasts = last value plus average change.
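A minimal sketch fitting all four benchmark methods at once (aus_production is an assumed example series):

```r
library(fable)
library(tsibbledata)

aus_production %>%
  model(
    mean   = MEAN(Beer),
    naive  = NAIVE(Beer),
    snaive = SNAIVE(Beer ~ lag("year")),
    drift  = RW(Beer ~ drift())
  ) %>%
  forecast(h = "3 years") %>%
  autoplot(aus_production)  # overlay forecasts on the historical data
```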
5.3. Residual diagnostics
5.2.3. Forecasting residuals
- Residuals are defined as the difference between an observed value and its fitted value: $e_t = y_t - \hat{y}_t$.
- Useful in checking whether a model has adequately captured the information in the data.
- A good forecasting method yields residuals with the following assumptions and useful properties:
Assumptions (essential):
- the residuals are uncorrelated;
- the residuals have mean zero.
Useful properties (needed for prediction intervals):
- the residuals have constant variance;
- the residuals are normally distributed.
There are 2 ways to check residuals: one treats the residuals individually (the ACF of residuals), the other treats the residuals as a group (portmanteau tests).
5.2.4. ACF of residuals.
Interpretation
These graphs show that the naïve method produces forecasts that appear to account for all available information. The mean of the residuals is close to zero and there is no significant correlation in the residuals series. The time plot of the residuals shows that the variation of the residuals stays much the same across the historical data, apart from the one outlier, and therefore the residual variance can be treated as constant. This can also be seen on the histogram of the residuals. The histogram suggests that the residuals may not be normal — the right tail seems a little too long, even when we ignore the outlier. Consequently, forecasts from this method will probably be quite good, but prediction intervals that are computed assuming a normal distribution may be inaccurate.
- Assume residuals are white noise (uncorrelated, mean zero, constant variance).
- If they aren’t, then there is information left in the residuals that should be used in computing forecasts.
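A minimal sketch of these diagnostics in one figure (an assumed example; feasts::gg_tsresiduals() plots the residual time plot, ACF and histogram together):

```r
library(fable)
library(feasts)
library(tsibbledata)

aus_production %>%
  model(snaive = SNAIVE(Beer)) %>%
  gg_tsresiduals()
```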
5.2.5. Portmanteau tests
- A more formal test for autocorrelation considers a whole set of autocorrelation values as a group,
- testing whether the set is significantly different from a zero set (see the sketch below).
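A minimal sketch of a portmanteau (Ljung-Box) test, whose statistic is $Q^* = T(T+2)\sum_{k=1}^{\ell} r_k^2/(T-k)$; a small p-value suggests the residuals are not white noise. The data and lag choice are assumed examples:

```r
library(fable)
library(feasts)

fit <- aus_production %>% model(snaive = SNAIVE(Beer))
augment(fit) %>%
  features(.innov, ljung_box, lag = 8)  # rule of thumb: lag = 2m for seasonal data
```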
5.4. Distributional forecasts and prediction intervals
5.4.1. Forecast distributions
- A forecast $\hat{y}_{T+h|T}$ is (usually) the mean of the conditional distribution $y_{T+h} \mid y_1, \ldots, y_T$.
- Most time series models produce normally distributed forecasts.
- The forecast distribution describes the probability of observing any future value.
5.4.2. Prediction intervals
- A prediction interval gives a region within which we expect $y_{T+h}$ to lie with a specified probability.
- Assuming forecast errors are normally distributed, a 95% PI is $\hat{y}_{T+h|T} \pm 1.96\,\hat{\sigma}_h$, where $\hat{\sigma}_h$ is the standard deviation of the h-step forecast distribution.
- When h = 1, $\hat{\sigma}_h$ can be estimated from the residuals.
brick_fc %>% hilo(level = 95)
- Point forecasts often useless without a measure of uncertainty (such as prediction intervals).
- Prediction intervals require a stochastic model (with random errors, etc).
- For most models, prediction intervals get wider as the forecast horizon increases.
- Use the level argument to control coverage, and check the residual assumptions before believing the intervals.
- Usually too narrow due to unaccounted uncertainty.
5.5. Forecasting with transformations
5.5.1. Modelling with transformations
5.5.2. Forecasting with transformations
5.5.3. Bias adjustment
ETC3550 Lecture 5A (YouTube) mentions:
- If the probability of lying below some value is p, then the probability of lying below the transformed value must also be p,
- because the amount of probability mass (the density mass) on either side is preserved by the transformation.
- So probabilities are preserved (i.e., identical), at least in terms of the quantiles of the distribution.
- The median is preserved under back-transformation, but the mean is not.
Taylor series: this is where the lecture's bias-adjustment derivation starts.
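For reference, the standard bias-adjustment result for a log transformation (from fpp3, not written out in these notes; the derivation uses the Taylor series mentioned above):

$$\hat{y}_{T+h|T} = \exp(\hat{w}_{T+h|T})\left[1 + \frac{\hat{\sigma}_h^2}{2}\right],$$

where $\hat{w}_{T+h|T}$ is the point forecast on the log scale and $\hat{\sigma}_h^2$ is the forecast variance on that scale. The unadjusted back-transform $\exp(\hat{w}_{T+h|T})$ is the median of the forecast distribution; the adjusted value approximates the mean.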
5.6. Forecasting and decomposition
Since we have learnt how to decompose a time series into components ($y_t = \hat{S}_t + \hat{A}_t$, where $\hat{A}_t = \hat{T}_t + \hat{R}_t$ is the seasonally adjusted component), we can first forecast the components and then combine them into one forecast.
- Fit a decomposition model: an STL decomposition followed by separate models for the seasonally adjusted series and the seasonal component.
- When I produce forecasts of that, it's a forecast of the original series.
- Under decomposition_model(), the model understands that it is going to put these components back together at the end:
- it looks at the model for the seasonal component and the model for the seasonally adjusted component, and adds them together to get forecasts of the original series.
- What comes back is a forecast of the original series in the usual format: the distribution, and then the mean of the distribution.
```r
## 1. Use the decomposition_model() function
dcmp <- decomposition_model(
  STL(Employed),
  NAIVE(season_adjust),
  SNAIVE(season_year)
)
## 2. Fit, forecast and plot
us_retail_employment %>%
  model(stlf = dcmp) %>%
  forecast() %>%
  autoplot()
```
5.7. Evaluating forecast accuracy
5.7.1. Training and test sets
- Same as the ones taught in ETC3250.
5.7.2. Forecast errors
- Forecast errors are not the same as residuals: residuals are computed on the training set and are based on one-step forecasts, while forecast errors are computed on the test set and can involve multi-step forecasts.
Measures of forecast accuracy
Notation:
- $y_{T+h}$: the (T+h)-th observation
- $\hat{y}_{T+h|T}$: its forecast based on data up until the end of the training set (time T)
- $e_{T+h} = y_{T+h} - \hat{y}_{T+h|T}$: the forecast error, the difference between the two
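For reference, the standard definitions of the accuracy measures (an addition; they are not written out in the notes):

$$\text{MAE} = \text{mean}(|e_{T+h}|), \qquad \text{RMSE} = \sqrt{\text{mean}(e_{T+h}^2)},$$
$$\text{MAPE} = \text{mean}\big(100\,|e_{T+h}|/|y_{T+h}|\big), \qquad \text{MASE} = \text{mean}(|e_{T+h}|/Q),$$

where Q is a scaling factor: the MAE of a naïve (or seasonal naïve) method on the training data. Dividing by Q is what makes MASE scale-free.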
Example of scale dependence:
- If the unit of e is dollars, the units of MAE and RMSE are dollars too, but the unit of MSE is dollars².
MAPE is a common metric in industry, but it has drawbacks: 1) $y_t$ has to be positive, and 2) it assumes y has a meaningful absolute zero.
So Rob invented MASE (mean absolute scaled error), which works well because it can be used to compare forecast accuracy across series with different units (see the sketch below).
Mean Error (ME) and Mean Percentage Error (MPE) are measures of bias rather than accuracy, so Rob does not normally look at them.
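A minimal sketch computing these measures with fabletools::accuracy() (the series and split point are assumed examples):

```r
library(fable)
library(tsibble)
library(tsibbledata)

train <- aus_production %>% filter_index(~ "2006 Q4")  # training set up to 2006 Q4
train %>%
  model(snaive = SNAIVE(Beer)) %>%
  forecast(h = "3 years") %>%
  accuracy(aus_production)  # compares against the held-out observations
```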
5.8. Time series cross-validation
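A minimal sketch of time series cross-validation with tsibble::stretch_tsibble() (series, initial window and step are assumed examples):

```r
library(fable)
library(tsibble)
library(tsibbledata)

aus_production %>%
  stretch_tsibble(.init = 40, .step = 4) %>%  # growing training windows
  model(snaive = SNAIVE(Beer)) %>%
  forecast(h = 4) %>%
  accuracy(aus_production)  # averages accuracy over all forecast origins
```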
FAQ
- Author:Jason Siu
- URL:https://jason-siu.com/article/7d3431d9-8016-4409-8591-87d58880982a
- Copyright: All articles in this blog, except where specially stated, are published under a BY-NC-SA licence. Please indicate the source!