Out in the real world, data is being collected in ever-increasing amounts. By one widely quoted estimate, around 2.5 quintillion bytes of data are created every day. Much of this data comes in the form of time series, such as monthly sales figures, annual population numbers or daily rainfall. As we explained in our previous post, time series are great for forecasting. This is because they can be broken down to find the underlying trends and seasonal patterns, which can then be projected forward to create a forecast.
We use a six-step process in all of our time series forecasting projects:
- Define the problem and the outcome
- Prepare the data
- Analyse the time series
- Model the forecast using insight from the analysis
- Deploy the forecast model
- Monitor the forecast's performance
Step 1: Define
As in any successful project, the first step is to clearly define the problem. This includes the motivation behind creating a forecast and the intended goals and outcomes. After this, you can decide how to tackle the task at hand, including which software to use at each stage. For example, we might use Excel for data preparation, R for the analysis and modelling and Power BI for deployment.
Step 2: Prepare
It is essential to properly prepare and clean the data that will be used to create the forecast. Data cleansing might involve removing duplicated or inaccurate records, or dealing with missing data points or outliers.
In the case of a forecasting project, the data will take the form of a time series. Depending on what is being forecast, the observations might be daily, weekly, monthly, quarterly or yearly.
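As a rough illustration of the cleansing tasks in this step, the Python sketch below removes duplicated records, fills missing points by linear interpolation and flags possible outliers. The function names, the sample data and the 3.5 cut-off on the modified z-score are our assumptions for the example rather than a prescribed method, and the interpolation assumes the periods are evenly spaced.

```python
from statistics import median


def clean_series(records):
    """Deduplicate (period, value) records and fill missing values.

    Assumes evenly spaced periods; a value of None marks a missing point.
    """
    # Remove duplicates: keep the first record seen for each period
    deduped = {}
    for period, value in records:
        deduped.setdefault(period, value)
    periods = sorted(deduped)
    values = [deduped[p] for p in periods]

    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        raise ValueError("series has no observed values")

    filled = list(values)
    for i, v in enumerate(values):
        if v is not None:
            continue
        before = [k for k in known if k < i]
        after = [k for k in known if k > i]
        if before and after:
            # Linear interpolation between the nearest observed neighbours
            lo, hi = before[-1], after[0]
            weight = (i - lo) / (hi - lo)
            filled[i] = values[lo] + weight * (values[hi] - values[lo])
        else:
            # Gap at the very start or end: copy the nearest observed value
            nearest = before[-1] if before else after[0]
            filled[i] = values[nearest]
    return periods, filled


def flag_outliers(values, threshold=3.5):
    """Flag points whose modified z-score exceeds the threshold (assumed 3.5)."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return [False] * len(values)
    return [0.6745 * abs(v - med) / mad > threshold for v in values]


# Made-up monthly figures: period 2 is duplicated, period 3 is missing
records = [(1, 10.0), (2, 12.0), (2, 12.0), (3, None),
           (4, 16.0), (5, 90.0), (6, 15.0)]
periods, filled = clean_series(records)
print(filled)                 # [10.0, 12.0, 14.0, 16.0, 90.0, 15.0]
print(flag_outliers(filled))  # only the value 90.0 is flagged
```

Flagging outliers rather than deleting them leaves the decision about how to treat each one with the analyst, who may know that a spike was a genuine event.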
Step 3: Analyse
Once the data has been prepared, the next step is to analyse it. For a time series, this involves decomposing the series into its constituent parts: the trend, the seasonal effects and whatever irregular variation is left over. The trend is the long-term overall pattern of the data and is not necessarily linear. Seasonality is a pattern that repeats over a fixed, known period, usually tied to the calendar (for example, rainfall being higher during the winter months).
See Rob J. Hyndman and George Athanasopoulos's fantastic (and free!) online textbook, Forecasting: Principles and Practice, for more information about time series in the context of forecasting.
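To make the idea of decomposition concrete, here is a minimal Python sketch of a classical additive decomposition, in which the trend is estimated with a centred moving average and the seasonal effects are the average detrended values at each position in the cycle. The function names and the quarterly figures are invented for the example, and it assumes an additive series with at least two full seasonal cycles.

```python
def moving_average_trend(values, period):
    """Centred moving average; None where the window does not fit."""
    n = len(values)
    half = period // 2
    trend = [None] * n
    for t in range(half, n - half):
        window = values[t - half:t + half + 1]
        if period % 2 == 0:
            # Even period: give the two end points half weight each
            total = 0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]
        else:
            total = sum(window)
        trend[t] = total / period
    return trend


def decompose_additive(values, period):
    """Split a series into trend, seasonal effects and remainder."""
    if len(values) < 2 * period:
        raise ValueError("need at least two full seasonal cycles")
    trend = moving_average_trend(values, period)

    # Average the detrended values for each position in the seasonal cycle
    buckets = [[] for _ in range(period)]
    for t, (y, tr) in enumerate(zip(values, trend)):
        if tr is not None:
            buckets[t % period].append(y - tr)
    raw = [sum(b) / len(b) for b in buckets]

    # Centre the seasonal effects so that they sum to zero over one cycle
    offset = sum(raw) / period
    seasonal = [r - offset for r in raw]

    remainder = [
        None if tr is None else y - tr - seasonal[t % period]
        for t, (y, tr) in enumerate(zip(values, trend))
    ]
    return trend, seasonal, remainder


# Made-up quarterly data: a steady upward trend plus a repeating pattern
pattern = [5.0, -3.0, -4.0, 2.0]
series = [10.0 + 2.0 * t + pattern[t % 4] for t in range(12)]
trend, seasonal, remainder = decompose_additive(series, period=4)
print(seasonal)  # recovers the quarterly pattern used to build the series
```

The trend is undefined for the first and last few points because the moving-average window does not fit there, which is one reason more sophisticated decomposition methods exist.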
Step 4: Model
Forecasts are created by combining the trend and seasonality components. There are functions that can do this for you in Excel, or it can be done by hand in a statistical package like R.
If modelling manually, refine the weightings of each component to produce a more accurate model. The model can be edited to account for any special factors that need to be included, such as a sale or promotion. However, be careful to avoid introducing bias into the forecast and making it less accurate.
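The idea of combining a trend with a seasonal pattern can be sketched in a few lines of Python. This is a deliberately simple stand-in, not the method used by Excel or by any particular R package: it assumes a straight-line trend and a stable additive seasonal pattern, and the function names and data are made up for the example.

```python
def fit_trend_and_seasonality(values, period):
    """Estimate a level, a per-step trend and additive seasonal effects."""
    n = len(values)
    if n < 2 * period:
        raise ValueError("need at least two full seasonal cycles")

    # Trend: average per-step change between points one full cycle apart,
    # which cancels out a stable seasonal pattern
    diffs = [(values[t] - values[t - period]) / period for t in range(period, n)]
    slope = sum(diffs) / len(diffs)

    # Remove the trend, then average by position in the seasonal cycle
    detrended = [y - slope * t for t, y in enumerate(values)]
    position_means = []
    for j in range(period):
        group = detrended[j::period]
        position_means.append(sum(group) / len(group))

    # Split the position means into an overall level plus seasonal effects
    level = sum(position_means) / period
    seasonal = [m - level for m in position_means]
    return level, slope, seasonal


def forecast(level, slope, seasonal, n_observed, horizon):
    """Project the trend forward and add back the matching seasonal effect."""
    period = len(seasonal)
    return [
        level + slope * t + seasonal[t % period]
        for t in range(n_observed, n_observed + horizon)
    ]


# Made-up quarterly history: three years of data
pattern = [5.0, -3.0, -4.0, 2.0]
history = [10.0 + 2.0 * t + pattern[t % 4] for t in range(12)]
level, slope, seasonal = fit_trend_and_seasonality(history, period=4)
print(forecast(level, slope, seasonal, n_observed=len(history), horizon=4))
# [39.0, 33.0, 34.0, 42.0]
```

A known one-off effect such as a promotion could be added to the relevant forecast values afterwards, which keeps the adjustment visible and easy to remove.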
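Whether using Excel or R (or most other forecasting software), your model will include prediction intervals. These are often loosely called confidence intervals, although strictly speaking a confidence interval describes the uncertainty in an estimated quantity rather than in a future observation. Prediction intervals show the level of uncertainty in the forecast at each future point.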
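For a feel of where such intervals come from, the Python sketch below builds an approximate interval around each point forecast from the spread of the model's past errors. It assumes those errors are roughly normal and uncorrelated, and it keeps the interval the same width at every horizon, which is a simplification: in most forecasting methods the interval widens the further ahead you look. The function name and the numbers are ours.

```python
from statistics import NormalDist, stdev


def prediction_intervals(point_forecasts, residuals, coverage=0.95):
    """Approximate intervals: forecast plus or minus z times the residual spread."""
    sigma = stdev(residuals)  # sample standard deviation of past errors
    z = NormalDist().inv_cdf(0.5 + coverage / 2)  # about 1.96 for 95%
    return [(f - z * sigma, f + z * sigma) for f in point_forecasts]


# Made-up point forecasts and past forecast errors (actual minus fitted)
forecasts = [39.0, 33.0, 34.0, 42.0]
residuals = [1.0, -1.0, 2.0, -2.0, 0.0]
for f, (lower, upper) in zip(forecasts, prediction_intervals(forecasts, residuals)):
    print(f"{f:.1f} (95% interval {lower:.1f} to {upper:.1f})")
```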
Step 5: Deploy
Once you are happy with your model, it's time to deploy it and make the forecasts live. This means that decision makers within the business or organisation can utilise and benefit from your forecast. Deployment may take the form of a visualisation, a performance dashboard, a graphic or table in a report, or a web application.
You may wish to include with your forecast the prediction intervals calculated in the previous step. These show the user the limits within which each future value can be expected to fall, with a stated probability such as 95%, if your model is correct.
Step 6: Monitor
After the forecast goes live, it is important to monitor its performance. A common way of doing this is to compare the forecasts with the actual values as they arrive, using an error statistic. Popular measures include the mean absolute percentage error (MAPE) and the mean absolute deviation (MAD), which is also known as the mean absolute error (MAE).
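Both measures are straightforward to calculate once actual values are available. The Python sketch below shows one way to do so; the function names and figures are invented for the example. Note that MAPE divides by the actual values, so it cannot be used when any of them is zero.

```python
def mean_absolute_deviation(actual, predicted):
    """Average size of the forecast errors, in the units of the data."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mean_absolute_percentage_error(actual, predicted):
    """Average size of the forecast errors as a percentage of the actuals."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return 100 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


# Made-up monthly sales: what happened versus what was forecast
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 180.0, 400.0]
print(mean_absolute_deviation(actual, predicted))         # 10.0
print(mean_absolute_percentage_error(actual, predicted))  # about 6.67 (per cent)
```

MAD is easier to explain because it is in the same units as the data, while MAPE makes it possible to compare accuracy across series of very different sizes.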
Depending on what is being forecast, it may be possible to update your model as new data becomes available. Keeping the model in step with the most recent observations should, in turn, lead to more accurate forecasts of future values.
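One simple way to fold each new observation into a forecast without refitting from scratch is simple exponential smoothing. We use it here purely as an illustration of updating, not as part of the six-step process itself; the function name, the starting level and the smoothing weight of 0.3 are assumptions for the example.

```python
def update_level(level, new_observation, alpha=0.3):
    """Blend the latest observation into the current estimate of the level.

    alpha close to 1 reacts quickly to new data; close to 0 changes slowly.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in the interval (0, 1]")
    return alpha * new_observation + (1 - alpha) * level


# Made-up example: start from a level of 100 and update as values arrive
level = 100.0
for observation in [110.0, 120.0]:
    level = update_level(level, observation)
    print(round(level, 2))  # the updated level doubles as the next forecast
```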
by Dominic Nelson