A Proper Picture of Time Series Forecasting

Akshith Kumar
6 min read · May 23, 2021

A trouble-free concept in 5 minutes.

What is Time Series Forecasting?

A time series forecasting model predicts future values based on previously observed values. The data consists of observations recorded at successive time steps.

Time Series Applications:

Time series forecasting is widely used on non-stationary data such as:

Stock prices, weather, economics, retail sales, etc…

Aspects to consider when dealing with time series data:

  1. Stationarity
  2. Seasonality
  3. Auto Correlation

So, what do these aspects represent?

Stationarity:

Stationarity in time series is important: a time series is said to be stationary only if its statistical properties do not change over time; most importantly, it maintains a constant mean and variance.

Stationary: its behaviour does not change over time.

Non-stationary: its behaviour changes over time.

Seasonality:

Seasonality refers to periodic fluctuations in the data. For example, a clothing store sees high sales in particular seasons (festivals, weddings, etc.) and lower sales the rest of the time. This pattern is called seasonality.

Autocorrelation:

Autocorrelation is the similarity between observations as a function of the time lag between them.

But how can we tell whether our data is stationary or not?

Don't worry, there is a well-known test to check the data.

Test?? OMG

Statistical test: the Augmented Dickey-Fuller (ADF) test.

With the ADF test we can figure out whether the data is stationary or not.

  • Null hypothesis of the ADF test: the series is not stationary
  • Alternate hypothesis: the series is stationary
  • ADF statistic < critical value: reject the null hypothesis (stationary)
  • ADF statistic > critical value: fail to reject the null hypothesis (not stationary)

Components of Time Series

There are several components in time series data; the most commonly used are:

  1. Level: the baseline value of the series if it were a straight line.
  2. Trend: the increasing or decreasing behaviour of the series over time.
  3. Seasonality: periodic fluctuations in the data.
  4. Noise: variability in the observations (the series data) that cannot be explained by the model.

Modelling a Time Series:

Modelling a time series here means summarising and visualising the data, for example with a moving average or exponential smoothing.

Moving Average:

With a moving average we can visualise the data and gain insight into its seasonality and trends. The window size controls how strongly the data is smoothed.

  • A naive approach to time series modelling.
  • Estimates the next observation as the mean of the past observations within the window.
  • Used to identify interesting trends in the data.
  • It smooths the time series.
  • The longer the window, the smoother the trend will be.

Exponential Smoothing:

  • It is similar to the moving average.
  • It gives exponentially less weight to observations further from the present.

ARIMA | SARIMA

(Autoregressive Integrated Moving Average)

(Seasonal Autoregressive Integrated Moving Average)

Seasonality (S) (P, D, Q, S):

  • S is the season's length.
  • P and Q play the same roles as p and q, but for the seasonal part.
  • D is the order of seasonal integration: the number of seasonal differences required to remove seasonality from the series.

Autoregressive Model (AR) (p):

  • A regression of the time series onto itself.
  • The current value depends on previous values up to some lag.
  • The parameter (p) represents the maximum lag.
  • To choose (p), use the partial autocorrelation function (PACF).
  • To choose (q) for the MA part, use the autocorrelation function (ACF).

Integrated (I) (d):

  • Instead of taking the raw target values, we difference them. For example, our sales prediction model would forecast tomorrow's change in sales (tomorrow's sales minus today's sales) rather than tomorrow's sales directly.

Moving Average (MA) (q):

  • The parameter (q) represents the largest lag after which other lags are not significant on the autocorrelation plot.

OMG!! Enough with theory….

Let's see it in practice.

So, we are handling AXIS Bank stock data.

Import necessary packages

import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
import seaborn as sns

Load dataset

df = pd.read_csv('AXISBANK.csv', index_col=['Date'], parse_dates=['Date'])
df.head()

Data Cleaning

df.isnull().sum()
# Keep only rows under the current symbol (drop the old 'UTIBANK' ticker)
df = df[df.Symbol != 'UTIBANK']

Plot the data

plt.figure(figsize=(17,8))
plt.plot(df.Close)
plt.title('Closing price of Axis Bank')
plt.xlabel('Trading day')
plt.ylabel('Closing price')
plt.grid(False)
plt.show()

Modelling the Series data to Visualize

Moving Average

# Moving average to smooth the time series
def plot_moving_average(series, window, plot_intervals=False, scale=1.96):
    rolling_mean = series.rolling(window=window).mean()

    plt.figure(figsize=(17, 8))
    plt.title('Moving average\n window size={}'.format(window))
    plt.plot(rolling_mean, 'g', label='Rolling mean trend')
    plt.legend()
    plt.grid(True)
    plt.show()

plot_moving_average(df.Close, 30)
Moving average with window size of 30

Exponential Smoothing

def exponential_smoothing(series, alpha):
    result = [series[0]]  # first value is same as in series
    for n in range(1, len(series)):
        result.append(alpha * series[n] + (1 - alpha) * result[n - 1])
    return result

def plot_exponential_smoothing(series, alphas):
    plt.figure(figsize=(17, 8))
    for alpha in alphas:
        plt.plot(exponential_smoothing(series, alpha),
                 label='Alpha {}'.format(alpha))
    plt.plot(series.values, 'c', label='Actual')
    plt.legend()
    plt.title('Exponential Smoothing')
    plt.grid(True)
    plt.show()

plot_exponential_smoothing(df.Close, [0.5, 0.02])
Exponential Smoothing with two alpha values

Dickey-Fuller Test: checking stationarity in the data

# Use the Dickey-Fuller test to check whether the series is stationary
from statsmodels.tsa.stattools import adfuller

test_result = adfuller(df['Close'])
print('ADF statistic: {}'.format(test_result[0]))
print('p-value: {}'.format(test_result[1]))

ADF test results

Wow… I got stationary data.

But what if we get non-stationary data?

For that, just use differencing…

# As this time series is stationary, there is no need to difference it.
# The diff frames below are kept for future reference, in case the series were not stationary.
df_diff = df.Close - df.Close.shift(1)

# If the series had yearly seasonality on monthly data, difference with a seasonal lag of 12
df_seasonal_diff = df.Close - df.Close.shift(12)

One shift for the regular difference and a twelve-step shift for the seasonal difference

ARIMA Model

ARIMA Model with ACF & PACF

The time offsets between an observation and previous time steps are called lags.

ACF: the plot of the series' correlation with itself at each lag is called the Autocorrelation Function.

PACF: the Partial Autocorrelation Function summarises the relationship between an observation and observations at previous time steps, with the effects of the intermediate observations removed.

ACF & PACF plots of the ARIMA model, and of the seasonal ARIMA model.

Metrics:

errors = forecast - actual
mse = np.square(errors).mean()
rmse = np.sqrt(mse)
mae = np.abs(errors).mean()
mape = np.abs(errors / actual).mean()
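A quick worked example of those metrics on small arrays (the numbers are made up for illustration):

```python
import numpy as np

actual = np.array([100.0, 102.0, 105.0, 103.0])
forecast = np.array([101.0, 101.0, 104.0, 106.0])

errors = forecast - actual                # [1, -1, -1, 3]
mse = np.square(errors).mean()            # mean squared error -> 3.0
rmse = np.sqrt(mse)                       # root mean squared error
mae = np.abs(errors).mean()               # mean absolute error -> 1.5
mape = np.abs(errors / actual).mean()     # mean absolute percentage error

print(mse, rmse, mae, mape)
```

RMSE is in the same units as the series, which makes it easier to interpret than MSE; MAPE is unit-free but breaks down when actual values are near zero.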

Advanced models for time series forecasting

  • SARIMAX
  • VARMAX
  • CNN
  • RNN
  • LSTM
  • ResNet

Points to remember:

The ARIMA model is based on p, d, q:

p: order of the autoregressive (AR) part
d: order of differencing
q: order of the moving-average (MA) part

The AR order p is read from the PACF (partial autocorrelation); the MA order q is read from the ACF (autocorrelation).
For an AR(p) process, the PACF cuts off sharply after lag p while the ACF decays exponentially; for an MA(q) process, the ACF cuts off sharply after lag q while the PACF decays exponentially.

Thanks for reading and for your interest.

If you liked my article, click the clap icon…
