- Type: Post
- Created date: Jun 16, 2022 01:21 PM
- Category: Data Science
- Tags: Applied forecasting, Economics
- Status: Published

This week, we're going to talk about exponential smoothing, one of the most successful families of forecasting methods, which generates reliable forecasts for a wide range of time series. The core idea is that we give the most weight to the most recent observation, and the weights decay exponentially as observations get older. That is, more recent observations are assumed to contain more information.

For example, yesterday's stock price contains more information than the stock price from two days ago.

We will look at three types of forecasting models, depending on the series:

- a series with no trend or seasonality → Simple Exponential Smoothing

- a series with a trend but no seasonality → Holt's linear trend method, damped trend methods

- a series with both trend and seasonality → Holt-Winters’ seasonal method (additive or multiplicative)

#### Simple exponential smoothing

Before introducing simple exponential smoothing, you need to know what the average and naive methods are:

#### AVG: all observations are equally weighted

With the average method, the forecast for every future period is the average of all the values in the time series.

That is, we do not distinguish between the last observation, the one before it, or any earlier observation; we give them all equal weight.

#### NAIVE: the last observation contains all the information, so it gets all the weight

The forecast for the future is whatever we observed in the most recent period. Any observation before that is assumed to contribute no information.

That is, the last observation contains all the information we use to forecast.
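As a quick illustration of the two extremes, here is a minimal base-R sketch. The series `y` is made up for the example and is not from the post:

```r
# Made-up series, for illustration only.
y <- c(10, 12, 11, 13, 14)

# Average method: every observation gets the same weight, 1 / length(y).
avg_forecast <- mean(y)

# Naive method: all the weight goes to the last observation.
naive_forecast <- y[length(y)]

avg_forecast    # 12
naive_forecast  # 14
```

Both methods give a flat forecast: the same value is used for every future period.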

Simple exponential smoothing avoids both extremes and lies between the AVG and naive methods: the most recent data get more weight, but older data still count.

It is suitable for forecasting a series with no trend or seasonality.

#### Simple Exponential Smoothing (SES)

Used when the data show no clear trend or seasonality.

#### What are the smoothing parameters?

The smoothing parameters control the rate of change of the components: α for the level, β for the trend, and γ for the seasonality.

#### Rule of thumb for those parameters?

See the video "7.2: Simple exponential smoothing" (YouTube).

**What is the intuition of alpha?**

Alpha (α) is a parameter that controls how much weight we assign to each observation.

- The value lies between 0 and 1.

- If α is closer to 1 (large), then we **assign more weight to the most recent observations** and the weights decay very rapidly.

- Conversely, if α is closer to 0 (small), small weights are assigned to the most recent observations and the weights decay slowly over time.

When α = 1, SES becomes the naïve method: the last observation contains all the information and earlier observations contain none.
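To see the decay concretely: in SES the observation j periods back receives weight α(1 − α)^j. The sketch below tabulates those weights in base R; the helper name `ses_weights()` is made up for this example.

```r
# Weight SES puts on the observation j periods back: alpha * (1 - alpha)^j.
# j = 0 is the most recent observation.
ses_weights <- function(alpha, n_lags = 5) {
  j <- 0:(n_lags - 1)
  alpha * (1 - alpha)^j
}

round(ses_weights(0.2), 4)  # 0.2000 0.1600 0.1280 0.1024 0.0819 (slow decay)
round(ses_weights(0.8), 4)  # 0.8000 0.1600 0.0320 0.0064 0.0013 (fast decay)
ses_weights(1)              # 1 0 0 0 0 (the naive method)
```

With a small alpha the weights are spread over many past observations; with a large alpha almost all the weight sits on the latest one.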

So here comes the question: how do we estimate the value of alpha? (See the video "7.3: Simple exponential smoothing in component form" on YouTube.) Essentially, we pick the alpha, together with the initial level, that gives the smallest sum of squared errors (SSE).

It turns out that the component form helps answer this question.

Component form representations of exponential smoothing methods comprise a forecast equation and a smoothing equation.

- $\ell_t$ is the level (or the smoothed value) of the series at time t.

- Setting h = 1 gives the fitted values, while setting t = T gives the true forecasts beyond the training data.

- The level depends on the previous level (whatever the level was at time t − 1), the new observation, and the value of alpha.

- We can compute all forecasts once we have $\ell_0$ and alpha, so we need to estimate these two first.
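For reference, the two equations that these bullets describe are given below. This is the standard textbook statement of SES in component form. The notation is assumed here rather than taken from the post: $y_t$ is the observation, $\ell_t$ the level, and $h$ the forecast horizon.

```latex
\begin{aligned}
\text{Forecast equation:}\quad  & \hat{y}_{t+h|t} = \ell_t \\
\text{Smoothing equation:}\quad & \ell_t = \alpha\, y_t + (1 - \alpha)\,\ell_{t-1}
\end{aligned}
```

Because the forecast equation does not depend on $h$, SES forecasts are flat: every future period gets the last estimated level.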

#### Optimization
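A minimal base-R sketch of this step, assuming a small made-up series and a hypothetical helper `ses_sse()` (neither is from the post). fable's `ETS()` does its own estimation internally, so this only illustrates the idea of picking alpha and the initial level by minimising the SSE:

```r
# Made-up series, for illustration only.
y <- c(12, 15, 14, 16, 19, 18, 21, 20)

# Sum of squared one-step-ahead errors for par = c(alpha, initial level).
ses_sse <- function(par, y) {
  alpha <- par[1]
  level <- par[2]
  sse <- 0
  for (obs in y) {
    err <- obs - level                          # forecast is the previous level
    sse <- sse + err^2
    level <- alpha * obs + (1 - alpha) * level  # smoothing equation
  }
  sse
}

# Search over alpha in [0, 1] and an unconstrained initial level.
opt <- optim(
  par = c(0.5, y[1]), fn = ses_sse, y = y,
  method = "L-BFGS-B", lower = c(0, -Inf), upper = c(1, Inf)
)
alpha_hat <- opt$par[1]  # estimated smoothing parameter
l0_hat <- opt$par[2]     # estimated initial level
```

The starting values (`0.5` and the first observation) are arbitrary choices for the sketch; the optimiser only needs somewhere reasonable to begin.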

#### What is α?

α determines how wiggly the fitted line is. The higher α, the wigglier the line.

**Holt's linear trend methods**

An extension of SES that allows for a local trend in the data (a trend that holds over a small stretch of time). It does not model seasonality.
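For reference, the standard textbook form of Holt's linear trend method adds a trend (slope) component $b_t$ and a second smoothing parameter $\beta^*$ to the SES equations. The symbols are the conventional ones and are assumed here:

```latex
\begin{aligned}
\text{Forecast equation:}\quad & \hat{y}_{t+h|t} = \ell_t + h\, b_t \\
\text{Level equation:}\quad    & \ell_t = \alpha\, y_t + (1 - \alpha)(\ell_{t-1} + b_{t-1}) \\
\text{Trend equation:}\quad    & b_t = \beta^* (\ell_t - \ell_{t-1}) + (1 - \beta^*)\, b_{t-1}
\end{aligned}
```

The forecast is now a straight line in $h$: it starts at the last level and rises or falls by the last estimated slope each period.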

#### Damped trend methods (conservative)

**Holt's linear trend method** can over-forecast, because it extends the trend at the same slope indefinitely. So sometimes it makes more sense to dampen the forecast: the trend keeps moving in the same direction, but it **keeps dampening** as we move forward in time.

That is, it will not be as aggressive as the trend seen recently: for long horizons, the slope will be a little smaller than the slope used for the near-term forecasts.

To do that, we introduce another parameter, *phi* (φ), *and now our three equations involve five parameters* (α, β, φ, and the initial level and initial trend).

## What will the model be in the long run?

- In the short run, the forecasts are trending, but in the long run, the forecasts become constant.
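For reference, the standard textbook form of the damped trend method is shown below, with damping parameter $0 < \phi < 1$. The notation is the conventional one and is assumed here:

```latex
\begin{aligned}
\text{Forecast equation:}\quad & \hat{y}_{t+h|t} = \ell_t + (\phi + \phi^2 + \dots + \phi^h)\, b_t \\
\text{Level equation:}\quad    & \ell_t = \alpha\, y_t + (1 - \alpha)(\ell_{t-1} + \phi\, b_{t-1}) \\
\text{Trend equation:}\quad    & b_t = \beta^* (\ell_t - \ell_{t-1}) + (1 - \beta^*)\, \phi\, b_{t-1} \\
\text{Long-run limit:}\quad    & \hat{y}_{T+h|T} \to \ell_T + \frac{\phi}{1 - \phi}\, b_T \quad \text{as } h \to \infty
\end{aligned}
```

Setting $\phi = 1$ recovers Holt's linear trend method, where the forecasts never level off.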

#### What do you mean by constant in a trend?

If the trend is increasing, the forecasts keep increasing, but by a smaller amount at each step, so they eventually level off at a constant value.
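A small base-R sketch of that levelling-off, assuming an illustrative damping value of φ = 0.9 (not a value from the post). In the damped method the last slope is multiplied by φ + φ² + ... + φ^h at horizon h:

```r
phi <- 0.9   # illustrative damping parameter (an assumption)
h <- 1:50    # forecast horizons

# Multiplier applied to the last slope at each horizon.
damp <- cumsum(phi^h)

# Value the multiplier approaches as the horizon grows.
limit <- phi / (1 - phi)

round(damp[c(1, 2, 3)], 3)  # 0.900 1.710 2.439
```

The multiplier keeps growing, so the forecasts keep moving in the direction of the trend, but each step adds less than the one before and the total never exceeds `limit`.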

**Holt-Winters’ seasonal method**

This method extends Holt's method to capture seasonality.

#### Holt-Winters’ seasonal additive method
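For reference, the standard textbook form of the additive method is shown below. Here $s_t$ is the seasonal component, $\gamma$ its smoothing parameter, $m$ the seasonal period (for example 4 for quarterly data), and $k$ the integer part of $(h-1)/m$. The notation is assumed here:

```latex
\begin{aligned}
\text{Forecast equation:}\quad & \hat{y}_{t+h|t} = \ell_t + h\, b_t + s_{t+h-m(k+1)} \\
\text{Level equation:}\quad    & \ell_t = \alpha (y_t - s_{t-m}) + (1 - \alpha)(\ell_{t-1} + b_{t-1}) \\
\text{Trend equation:}\quad    & b_t = \beta^* (\ell_t - \ell_{t-1}) + (1 - \beta^*)\, b_{t-1} \\
\text{Seasonal equation:}\quad & s_t = \gamma (y_t - \ell_{t-1} - b_{t-1}) + (1 - \gamma)\, s_{t-m}
\end{aligned}
```

The additive form suits series whose seasonal swings stay roughly the same size regardless of the level.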

#### Holt-Winters’ seasonal multiplicative method
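For reference, the standard textbook form of the multiplicative method is shown below, where the seasonal component scales the level and trend instead of being added to them. The notation is assumed: $s_t$ is the seasonal component, $m$ the seasonal period, and $k$ the integer part of $(h-1)/m$.

```latex
\begin{aligned}
\text{Forecast equation:}\quad & \hat{y}_{t+h|t} = (\ell_t + h\, b_t)\, s_{t+h-m(k+1)} \\
\text{Level equation:}\quad    & \ell_t = \alpha \frac{y_t}{s_{t-m}} + (1 - \alpha)(\ell_{t-1} + b_{t-1}) \\
\text{Trend equation:}\quad    & b_t = \beta^* (\ell_t - \ell_{t-1}) + (1 - \beta^*)\, b_{t-1} \\
\text{Seasonal equation:}\quad & s_t = \gamma \frac{y_t}{\ell_{t-1} + b_{t-1}} + (1 - \gamma)\, s_{t-m}
\end{aligned}
```

The multiplicative form suits series whose seasonal swings grow in proportion to the level.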

#### ETS

ETS has two meanings: 1) ExponenTial Smoothing; 2) Error, Trend, Seasonality (i.e., the components of the state space model).

The three components are Error, Trend, and Seasonality.

## Formula of Additive error models
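As an example of an additive error model, the simplest case is ETS(A,N,N), which is SES with additive errors. The equations below are the standard textbook form, with $\varepsilon_t$ denoting the one-step error (notation assumed here):

```latex
\begin{aligned}
\text{Measurement equation:}\quad & y_t = \ell_{t-1} + \varepsilon_t \\
\text{State equation:}\quad       & \ell_t = \ell_{t-1} + \alpha\, \varepsilon_t
\end{aligned}
```

Substituting $\varepsilon_t = y_t - \ell_{t-1}$ into the state equation gives back the SES smoothing equation $\ell_t = \alpha y_t + (1 - \alpha)\ell_{t-1}$.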

## Formula of Multiplicative error models
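As an example of a multiplicative error model, the simplest case is ETS(M,N,N), which is SES with multiplicative errors. The equations below are the standard textbook form, where $\varepsilon_t$ is now a relative error, $(y_t - \ell_{t-1}) / \ell_{t-1}$ (notation assumed here):

```latex
\begin{aligned}
\text{Measurement equation:}\quad & y_t = \ell_{t-1} (1 + \varepsilon_t) \\
\text{State equation:}\quad       & \ell_t = \ell_{t-1} (1 + \alpha\, \varepsilon_t)
\end{aligned}
```

The point forecasts match the additive error version for the same parameters; the two models differ in their prediction intervals.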

#### Innovations state space models

#### ETS - Coding in R

`ETS(y ~ error("A") + trend("N") + season("N"))`

By default, optimal values for α and $\ell_0$ are used.
α can be chosen manually in trend().

```
trend("N", alpha = 0.5)
trend("N", alpha_range = c(0.2, 0.8))
```

`library(fpp3)`

`algeria_economy <- global_economy %>% filter(Country == "Algeria")`

`fit <- algeria_economy %>% model(ANN = ETS(Exports ~ error("A") + trend("N") + season("N")))`

`report(fit)`

## Plot forecast

`fit %>% forecast(h = 5) %>% autoplot(algeria_economy) + labs(y = "% of GDP", title = "Exports: Algeria")`

**Modeling with trend**

`ETS(y ~ error("A") + trend("A") + season("N"))`

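By default, optimal values for α, β and the initial states are used.
As before, the smoothing parameters can be chosen manually in trend(), which also accepts `beta` and `beta_range` for the slope.

```
trend("A", alpha = 0.5)
trend("A", alpha_range = c(0.2, 0.8))
```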

`aus_economy <- global_economy %>% filter(Code == "AUS") %>% mutate(Pop = Population / 1e6)`

`fit <- aus_economy %>% model(AAN = ETS(Pop ~ error("A") + trend("A") + season("N")))`

`report(fit)`

`fit %>% forecast(h = 10) %>% autoplot(aus_economy) + labs(y = "Millions", title = "Population: Australia")`

Damped trend

`aus_economy %>% model(holt = ETS(Pop ~ error("A") + trend("Ad") + season("N"))) %>% forecast(h = 20) %>% autoplot(aus_economy)`

All in one

`fit <- aus_economy %>% filter(Year <= 2010) %>% model(ses = ETS(Pop ~ error("A") + trend("N") + season("N")), holt = ETS(Pop ~ error("A") + trend("A") + season("N")), damped = ETS(Pop ~ error("A") + trend("Ad") + season("N")))`

`tidy(fit)`

`accuracy(fit)`

**Modeling with seasonality**

Holt-Winters additive method with additive errors.

`components(fit)`

`aus_holidays <- tourism %>% filter(Purpose == "Holiday") %>% summarise(Trips = sum(Trips))`

`fit <- aus_holidays %>% model(additive = ETS(Trips ~ error("A") + trend("A") + season("A")), multiplicative = ETS(Trips ~ error("M") + trend("A") + season("M")))`

`fc <- fit %>% forecast()`

`fc %>% autoplot(aus_holidays, level = NULL) + labs(y = "Thousands", title = "Overnight trips")`

Holt-Winters damped method

`sth_cross_ped <- pedestrian %>% filter( Date >= "2016-07-01", Sensor == "Southern Cross Station" ) %>% index_by(Date) %>% summarise(Count = sum(Count) / 1000)`

`sth_cross_ped %>% filter(Date <= "2016-07-31") %>% model( hw = ETS(Count ~ error("M") + trend("Ad") + season("M")) ) %>% forecast(h = "2 weeks") %>% autoplot(sth_cross_ped %>% filter(Date <= "2016-08-14")) + labs( title = "Daily traffic: Southern Cross", y = "Pedestrians ('000)" )`

#### Automatic forecasting

`fit <- global_economy %>% mutate(Pop = Population / 1e6) %>% model(ets = ETS(Pop))`

`fit %>% forecast(h = 5)`

#### R interpretation

alpha:

- The α reported here is the optimal (estimated) value; you would need to compare it against other values to confirm this.

- The smoothing parameter is 0.322, which is fairly large, so the level (the intercept) reacts fairly quickly to changes in the data. That is appropriate given the amount of movement we saw in the data.

$\ell_0$ (L-nought): the initial level we talked about before

- It is neither the first value nor the mean.

- It is computed by optimizing for the minimal sum of squared errors.

**It's wherever the general location of the data is at that point, which is 100647 here.**

### FAQ

### Math

### Reference

**Author:** Jason Siu

**URL:** https://jason-siu.com/article%2Fc4607cee-619e-4028-9148-5dc8c6c75ca6

**Copyright:** All articles in this blog, except for special statements, adopt BY-NC-SA agreement. Please indicate the source!
