Exponential smoothing (ETS)

Type: Post

Created date: Jun 16, 2022 01:21 PM

Simple forecasting smoothing

Before introducing Simple forecasting smoothing, you need to know what the Average and Naive methods are:

AVG: All observations are equally weighted.

With the average method, the forecast for every future period is the average of all the values in the time series.

That is, we do not distinguish between the last observation and any observation before it; they all get equal weight.

NAIVE: The last observation contains all the information; previous observations provide no information, so all the weight is given to the last observation.

Our forecast for the future is whatever we observed in the most recent period; the observations before it get zero weight.

That is, only the last observation is used to forecast.

Simple forecasting smoothing lies between these two extremes, AVG and Naive: the most recent data should carry more weight, but older data still count.

It is suitable for forecasting a series with no trend or seasonality.
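The contrast between the three methods can be sketched in base R. This is only an illustration: the series `y`, the helper `ses_forecast()`, `alpha = 0.5` and the starting level `l0 = y[1]` are all assumptions made up for the example, not estimated values.

```r
# Toy series: made-up numbers, for illustration only.
y <- c(10, 12, 11, 13, 14)

# Average method: every observation gets the same weight, 1/n.
avg_forecast <- mean(y)

# Naive method: the last observation gets all the weight.
naive_forecast <- y[length(y)]

# SES: update a running level, so recent observations count more.
# alpha = 0.5 and l0 = y[1] are arbitrary assumptions, not estimates.
ses_forecast <- function(y, alpha, l0 = y[1]) {
  level <- l0
  for (obs in y) {
    level <- alpha * obs + (1 - alpha) * level
  }
  level
}

c(average = avg_forecast,
  naive = naive_forecast,
  ses = ses_forecast(y, alpha = 0.5))
```

On this toy series the three forecasts are 12, 14 and 13, so the SES forecast sits between the average and the naive forecast here.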

Simple Exponential Smoothing (SES)

Used when the data show no clear trend or seasonality.

What are smoothing params? (??)

Smoothing params control the rate of change of the components. They are α (level), β (trend), and γ (seasonality).

Rule of thumb for those parameters?

7.2: Simple exponential smoothing - YouTube

T+1 is the time one step ahead of the last observed time T.
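For reference, the one-step-ahead SES forecast is usually written as a weighted average of past observations. The notation below (ŷ for the forecast, y for the observations, α for the smoothing parameter with 0 ≤ α ≤ 1) is the standard textbook notation and is assumed here, since the notes do not show the formula:

```latex
\hat{y}_{T+1|T} = \alpha y_T + \alpha(1-\alpha)\, y_{T-1} + \alpha(1-\alpha)^2\, y_{T-2} + \cdots
```

The weights shrink geometrically as the observations get older, which is where the name "exponential" smoothing comes from.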

What is the intuition of Alpha ?

Alpha is a param to control how much weight we want to assign to each observation.

The value lies between 0 and 1.

If α is closer to 1 (large), we assign more weight to the most recent observations and the weights decay very rapidly.

Conversely, if α is closer to 0 (small), small weights are assigned to the most recent observations and the weights decay slowly over time.

When α = 1, it becomes the Naïve method, that is, the last observation contains all the information.
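A small base R sketch makes the decay visible. The helper `ses_weights()` is a hypothetical name introduced for this example; it returns the weight given to the most recent observation, then the one before it, and so on.

```r
# Weight on the observation j steps back is alpha * (1 - alpha)^j.
# ses_weights() is a made-up helper for illustration.
ses_weights <- function(alpha, n_lags) {
  alpha * (1 - alpha)^(0:(n_lags - 1))
}

ses_weights(0.2, 4)  # 0.2 0.16 0.128 0.1024  (slow decay)
ses_weights(0.8, 4)  # 0.8 0.16 0.032 0.0064  (fast decay)
ses_weights(1, 4)    # 1 0 0 0                (naive: only the last value)
```

With α = 1 the whole weight lands on the last observation, which matches the Naïve method.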

So here comes the question: how do we estimate the value of alpha? (See 7.3: Simple exponential smoothing in component form - YouTube.) Essentially, we pick the alpha whose fitted values give the smallest sum of squared errors (SSE).

It turns out Component Form answers this question.

Component form representations of exponential smoothing methods comprise a forecast equation and a smoothing equation.

ℓ_t is the level (or the smoothed value) of the series at time t.

Setting h = 1 gives the fitted values, while setting t = T gives the true forecasts beyond the training data.

The level depends on the previous level (whatever the level was at time t − 1) and on the value of alpha.
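Written out, the component form of SES consists of the two equations below. The symbols follow the usual textbook convention (ŷ for the forecast, ℓ for the level), which is assumed here because the formulas are missing from the notes:

```latex
\begin{align*}
\text{Forecast equation:}  \quad & \hat{y}_{t+h|t} = \ell_t \\
\text{Smoothing equation:} \quad & \ell_t = \alpha y_t + (1-\alpha)\,\ell_{t-1}
\end{align*}
```

The forecast is flat: every horizon h gets the same value, the last estimated level.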

We can compute all forecasts once we have ℓ_0 and alpha, so we need to estimate these two first.
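One way to sketch this estimation step in base R is to minimize the SSE of the one-step-ahead errors with `optim()`. The function `ses_sse()` and the toy series are assumptions made for this example; a forecasting package may use a different criterion or optimizer.

```r
# SSE of one-step-ahead errors for given alpha and initial level l0.
# par[1] is alpha, par[2] is l0. ses_sse() is a made-up helper.
ses_sse <- function(par, y) {
  alpha <- par[1]
  level <- par[2]
  sse <- 0
  for (obs in y) {
    err <- obs - level            # fitted value is the previous level
    sse <- sse + err^2
    level <- level + alpha * err  # same as alpha * obs + (1 - alpha) * level
  }
  sse
}

# Toy series: made-up numbers, for illustration only.
y <- c(10, 12, 11, 13, 14, 13, 15, 14)

# Search over alpha in [0, 1] and an unconstrained initial level.
start <- c(0.5, y[1])
fit <- optim(par = start, fn = ses_sse, y = y,
             method = "L-BFGS-B",
             lower = c(0, -Inf), upper = c(1, Inf))

fit$par    # estimated alpha and l0
fit$value  # SSE at the optimum
```

The starting values only seed the search; the optimizer should end at an SSE no larger than the one it started from.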

Optimization

What is α?

α determines how wiggly the fitted line is. The higher α, the wigglier.

When α = 0.99, the fitted line appears to overfit the data.
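The effect of α on wiggliness can be checked numerically. In this base R sketch, `ses_levels()` and `roughness()` are hypothetical helpers, the series is made up, and the initial level is simply set to the first observation.

```r
# Smoothed levels l_1, ..., l_n for a given alpha (l0 = y[1] is an assumption).
ses_levels <- function(y, alpha, l0 = y[1]) {
  levels <- numeric(length(y))
  level <- l0
  for (t in seq_along(y)) {
    level <- alpha * y[t] + (1 - alpha) * level
    levels[t] <- level
  }
  levels
}

# Total up-and-down movement of a line: a crude wiggliness measure.
roughness <- function(x) sum(abs(diff(x)))

# Toy zig-zag series: made-up numbers, for illustration only.
y <- c(10, 14, 9, 15, 8, 16, 9, 15)

roughness(ses_levels(y, alpha = 0.1))   # small: smooth line
roughness(ses_levels(y, alpha = 0.99))  # close to roughness(y): follows the data
```

With α near 1 the smoothed line nearly reproduces the data, noise included, which is the overfitting behaviour described above.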

Holt's linear trend methods

An extension of SES that allows local trends in the data (trends over a small part of the time span). It does not model seasonality.

7.5: Holt’s linear trend method - YouTube
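For reference, Holt's linear trend method is usually written with one forecast equation and two smoothing equations. The symbols (ℓ for level, b for trend or slope, β* for the trend smoothing parameter) follow the standard textbook notation and are assumed here:

```latex
\begin{align*}
\text{Forecast equation:} \quad & \hat{y}_{t+h|t} = \ell_t + h\, b_t \\
\text{Level equation:}    \quad & \ell_t = \alpha y_t + (1-\alpha)(\ell_{t-1} + b_{t-1}) \\
\text{Trend equation:}    \quad & b_t = \beta^{*}(\ell_t - \ell_{t-1}) + (1-\beta^{*})\, b_{t-1}
\end{align*}
```

The forecast is a straight line in h: it starts at the last level and moves by the last estimated slope at every step.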

Damped trend methods (conservative)

7.8: A comparison of the forecasting performance of SES, Holt's trend, and damped trend methods in R - YouTube

Holt's linear trend method can over-forecast, because it keeps the trend going at the same rate indefinitely. So it sometimes makes more sense to dampen the forecast: the trend keeps going in the same direction, but it is dampened more and more as we move forward in time.

That is, it will not be as aggressive as the trend seen recently: over a long horizon, the slope is a little smaller than the slope we observe for near-term forecasts.

To do that, we introduce another parameter, phi (φ). The three equations now involve five parameters.
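The damped version of the three equations is usually written as follows, with 0 < φ < 1. The notation is the standard textbook one and is assumed here. Presumably the five parameters are α, β*, φ and the two initial states ℓ_0 and b_0, though the notes do not list them:

```latex
\begin{align*}
\text{Forecast equation:} \quad & \hat{y}_{t+h|t} = \ell_t + (\phi + \phi^2 + \cdots + \phi^h)\, b_t \\
\text{Level equation:}    \quad & \ell_t = \alpha y_t + (1-\alpha)(\ell_{t-1} + \phi\, b_{t-1}) \\
\text{Trend equation:}    \quad & b_t = \beta^{*}(\ell_t - \ell_{t-1}) + (1-\beta^{*})\,\phi\, b_{t-1}
\end{align*}
```

Setting φ = 1 gives back Holt's linear trend method, so φ measures how strongly the trend is dampened.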

What will the model do in the long run?

In the short run, the forecasts are trending, but in the long run, the forecasts level off at a constant.

What do we mean by constant in a trend?

If the trend is increasing, the forecasts keep increasing, but by smaller and smaller steps, so they approach a constant value.
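The constant can be worked out from the damped forecast equation ŷ_{T+h|T} = ℓ_T + (φ + φ² + … + φ^h) b_T, using the geometric series. This is a sketch in standard notation (ℓ_T for the last level, b_T for the last slope, 0 < φ < 1), which is assumed here:

```latex
\begin{align*}
\phi + \phi^2 + \cdots + \phi^h &= \frac{\phi\,(1-\phi^h)}{1-\phi}
  \;\longrightarrow\; \frac{\phi}{1-\phi} \quad \text{as } h \to \infty, \\
\hat{y}_{T+h|T} &\longrightarrow \ell_T + \frac{\phi}{1-\phi}\, b_T .
\end{align*}
```

For example, with φ = 0.9 the long-run forecast is the last level plus 9 times the last slope.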

Holt-Winters’ seasonal method

This method extends Holt’s method to capture seasonality.

Holt-Winters’ seasonal additive method
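In the usual textbook notation (assumed here), the additive method adds a seasonal component s_t with its own smoothing parameter γ. m is the seasonal period (e.g. m = 4 for quarterly data) and k is the integer part of (h − 1)/m, so that the forecast uses the seasonal estimates from the last observed year:

```latex
\begin{align*}
\hat{y}_{t+h|t} &= \ell_t + h\, b_t + s_{t+h-m(k+1)} \\
\ell_t &= \alpha (y_t - s_{t-m}) + (1-\alpha)(\ell_{t-1} + b_{t-1}) \\
b_t &= \beta^{*}(\ell_t - \ell_{t-1}) + (1-\beta^{*})\, b_{t-1} \\
s_t &= \gamma (y_t - \ell_{t-1} - b_{t-1}) + (1-\gamma)\, s_{t-m}
\end{align*}
```

The seasonal term is added to the level and trend, which suits seasonal swings of roughly constant size.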

Holt-Winters’ seasonal multiplicative method
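In the usual textbook notation (assumed here), the multiplicative method scales the level and trend by the seasonal component instead of adding it. m is the seasonal period and k is the integer part of (h − 1)/m:

```latex
\begin{align*}
\hat{y}_{t+h|t} &= (\ell_t + h\, b_t)\, s_{t+h-m(k+1)} \\
\ell_t &= \alpha \frac{y_t}{s_{t-m}} + (1-\alpha)(\ell_{t-1} + b_{t-1}) \\
b_t &= \beta^{*}(\ell_t - \ell_{t-1}) + (1-\beta^{*})\, b_{t-1} \\
s_t &= \gamma \frac{y_t}{\ell_{t-1} + b_{t-1}} + (1-\gamma)\, s_{t-m}
\end{align*}
```

This form suits seasonal swings that grow or shrink in proportion to the level of the series.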

ETS

ETS has two meanings: 1) ExponenTial Smoothing; 2) Error, Trend, Seasonality (i.e., the states).

The three components are Error, Trend, and Seasonality.

Formula of Additive error models
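As a minimal example, taking the simplest case with no trend and no seasonality, ETS(A,N,N), the additive error model is usually written as below. The notation (ε_t for the error, assumed to be independent normal noise with mean 0) is the standard textbook one and is assumed here:

```latex
\begin{align*}
\text{Measurement equation:} \quad & y_t = \ell_{t-1} + \varepsilon_t \\
\text{State equation:}       \quad & \ell_t = \ell_{t-1} + \alpha\, \varepsilon_t
\end{align*}
```

Substituting ε_t = y_t − ℓ_{t−1} into the state equation gives back the SES smoothing equation ℓ_t = α y_t + (1 − α) ℓ_{t−1}.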

Formula of Multiplicative error models
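As a minimal example, taking the simplest case with no trend and no seasonality, ETS(M,N,N), the multiplicative error model is usually written as below. Here ε_t is a relative error, ε_t = (y_t − ℓ_{t−1}) / ℓ_{t−1}; the notation is the standard textbook one and is assumed here:

```latex
\begin{align*}
\text{Measurement equation:} \quad & y_t = \ell_{t-1}\,(1 + \varepsilon_t) \\
\text{State equation:}       \quad & \ell_t = \ell_{t-1}\,(1 + \alpha\, \varepsilon_t)
\end{align*}
```

The point forecasts match the additive error version for the same parameters; what differs is the prediction intervals.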

Innovations state space models

ETS - Coding in R

Simple Exponential Smoothing

ETS(y ~ error("A") + trend("N") + season("N"))

By default, optimal values for α and ℓ_0 are used. α can be chosen manually in trend().

trend("N", alpha = 0.5)
trend("N", alpha_range = c(0.2, 0.8))


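# fpp3 loads fable, tsibble, dplyr, ggplot2 and the example data sets
library(fpp3)

# Keep only Algeria, then fit SES, i.e. ETS(A,N,N)
algeria_economy <- global_economy %>%
  filter(Country == "Algeria")
fit <- algeria_economy %>%
  model(ANN = ETS(Exports ~ error("A") + trend("N") + season("N")))
report(fit)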

Plot forecast


fit %>%
  forecast(h = 5) %>%
  autoplot(algeria_economy) +
  labs(y = "% of GDP", title = "Exports: Algeria")

Modeling with trend

ETS(y ~ error("A") + trend("A") + season("N"))

By default, optimal values for α, β, ℓ_0 and b_0 are used. α and β can be chosen manually in trend().

trend("A", alpha = 0.5, beta = 0.1)
trend("A", alpha_range = c(0.2, 0.8))


# Australian population in millions, fitted with Holt's linear trend, ETS(A,A,N)
aus_economy <- global_economy %>%
  filter(Code == "AUS") %>%
  mutate(Pop = Population / 1e6)
fit <- aus_economy %>%
  model(AAN = ETS(Pop ~ error("A") + trend("A") + season("N")))
report(fit)


fit %>%
  forecast(h = 10) %>%
  autoplot(aus_economy) +
  labs(y = "Millions", title = "Population: Australia")


## Damped trend: trend("Ad")
aus_economy %>%
  model(holt = ETS(Pop ~ error("A") + trend("Ad") + season("N"))) %>%
  forecast(h = 20) %>%
  autoplot(aus_economy)

All in one


# Fit SES, Holt and damped Holt on data up to 2010, then compare
fit <- aus_economy %>%
  filter(Year <= 2010) %>%
  model(
    ses = ETS(Pop ~ error("A") + trend("N") + season("N")),
    holt = ETS(Pop ~ error("A") + trend("A") + season("N")),
    damped = ETS(Pop ~ error("A") + trend("Ad") + season("N"))
  )
tidy(fit)      # estimated parameters
accuracy(fit)  # training-set accuracy

Modeling with seasonality

Holt-Winters additive method with additive errors.

components(fit)


# Total holiday trips, fitted with additive and multiplicative seasonality
aus_holidays <- tourism %>%
  filter(Purpose == "Holiday") %>%
  summarise(Trips = sum(Trips))
fit <- aus_holidays %>%
  model(
    additive = ETS(Trips ~ error("A") + trend("A") + season("A")),
    multiplicative = ETS(Trips ~ error("M") + trend("A") + season("M"))
  )
fc <- fit %>% forecast()

fc %>%
  autoplot(aus_holidays, level = NULL) +
  labs(y = "Thousands", title = "Overnight trips")

Holt-Winters damped method


# Daily pedestrian counts (in thousands) at Southern Cross Station
sth_cross_ped <- pedestrian %>%
  filter(
    Date >= "2016-07-01",
    Sensor == "Southern Cross Station"
  ) %>%
  index_by(Date) %>%
  summarise(Count = sum(Count) / 1000)


# Train on July 2016, forecast two weeks, plot against data to mid-August
sth_cross_ped %>%
  filter(Date <= "2016-07-31") %>%
  model(
    hw = ETS(Count ~ error("M") + trend("Ad") + season("M"))
  ) %>%
  forecast(h = "2 weeks") %>%
  autoplot(sth_cross_ped %>% filter(Date <= "2016-08-14")) +
  labs(
    title = "Daily traffic: Southern Cross",
    y = "Pedestrians ('000)"
  )

Automatic forecasting


# With no specials given, ETS() selects the model form automatically
fit <- global_economy %>%
  mutate(Pop = Population / 1e6) %>%
  model(ets = ETS(Pop))


fit %>%
  forecast(h = 5)

R interpretation

alpha:

α here is the optimal value (you need to compare it with other values to know).

The smoothing parameter is 0.322, which is fairly large, so the level adjusts fairly quickly to changes in the data. That is appropriate given the amount of movement we saw in the data.

ℓ_0 (L-naught): the initial level we talked about before.

It is neither the first value nor the mean.

ℓ_0 was computed by optimizing for the minimal sum of squared errors.

It is the general location of the data at that point (the start of the series), which is 100647 here.

FAQ

Math

Damped trend method

Reference

A Gentle Introduction to Exponential Smoothing for Time Series Forecasting in Python (machinelearningmastery.com)