The challenge of moving holidays in demand forecasting
Moving holidays are holidays that occur every year but whose exact date shifts in the Gregorian calendar. Examples include Easter and Chinese New Year (CNY). Easter generally falls in April but can also fall in late March. Chinese New Year mostly falls in February but can also occur in January. Because the date changes from year to year, the effect of such a holiday can land in two or more different months, depending on the year. For Chinese New Year, for example, production often accelerates some time before the holiday starts, almost completely stops during the holiday, and then returns to its regular level afterwards. In these cases, the seasonal component of the time series cannot fully absorb the holiday effect, because the rhythm of the holiday (based on the lunar calendar) is out of step with the rhythm of the demand forecast (based on the Gregorian calendar). This often leads to a significant decrease in the performance of the statistical forecast (i.e. lower accuracy and higher bias) for the months affected by the holiday. In this blog we will explore how EyeOn addresses this challenge for one of our customers.
Combining conventional statistical forecasting with machine learning
Conventional statistical models (e.g. moving average and exponential smoothing) are widely used in the industry to predict demand, often with good reason: these models usually perform reasonably well, and they are intuitive and easy for planners to interpret. We have seen, however, that these models (even with a seasonal component added) are not able to capture the complex effects of moving holidays. Based on these observations, we designed the following approach at EyeOn:
- Start with a conventional statistical model to obtain a baseline forecast.
- Predict “uplift” factors for months affected by the moving holiday(s). In other words, in this step we are estimating how much higher or lower the demand is in a given month, relative to the baseline demand. Note that the uplift factor can also be smaller than 1. In that case we are actually scaling down the baseline forecast.
- Multiply the baseline forecast by the uplift factors to obtain the final forecast.
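The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not EyeOn's implementation: the function names, the three-month moving average used as the baseline, and the uplift factors are all assumptions made up for the example. In the real approach the uplift factors come from a machine learning model rather than being typed in by hand.

```python
from statistics import mean


def moving_average_baseline(history, window=3, horizon=3):
    """Step 1 (illustrative): a flat baseline equal to the mean of the
    last `window` observations, repeated for `horizon` future months."""
    level = mean(history[-window:])
    return [level] * horizon


def apply_uplift(baseline, uplift_factors):
    """Step 3: scale each baseline month by its uplift factor.
    A factor above 1 scales up, a factor below 1 scales down,
    and a factor of exactly 1 leaves the month unchanged."""
    if len(baseline) != len(uplift_factors):
        raise ValueError("baseline and uplift_factors must have equal length")
    return [b * u for b, u in zip(baseline, uplift_factors)]


# Made-up monthly sales history (not customer data).
history = [100, 110, 90, 105, 95, 100]
baseline = moving_average_baseline(history, window=3, horizon=3)

# Step 2 placeholder: hypothetical uplift factors for the next three months
# (a pre-holiday rush, a holiday shutdown, and an unaffected month).
uplift_factors = [1.25, 0.5, 1.0]

final_forecast = apply_uplift(baseline, uplift_factors)
print(final_forecast)  # [125.0, 50.0, 100.0]
```

Using a factor of 1 for unaffected months means the final forecast equals the baseline wherever the holiday has no influence.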
This approach relies to a large extent on how well we are able to estimate the uplift factors, and for this step we harness the power of machine learning. For modelling moving holiday effects, a well-known type of regressor, called the Bell-Hillmer interval, has proven to be very useful. The regressor is defined over an interval of days around the holiday. Assuming that the holiday effect is the same for each day of that interval, the value of the regressor in a given month is the proportion of the interval that falls in that month. Using this logic, we can define multiple intervals to model the effects before and after a moving holiday. These Bell-Hillmer regressors are used as features in a machine learning algorithm that uses gradient boosting on decision trees.
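To make the regressor definition concrete, here is a minimal sketch that computes, for one holiday date and one interval, the proportion of the interval that falls in each calendar month. The function name and the interval lengths (7 days from the start of the holiday, 14 days before it) are assumptions chosen for illustration; the source does not state which intervals were used.

```python
from collections import Counter
from datetime import date, timedelta


def interval_proportions(holiday, start_offset, end_offset):
    """Return {(year, month): share} for the interval of days running from
    holiday + start_offset to holiday + end_offset (both inclusive).
    Every day in the interval carries equal weight, so each month's value
    is simply its share of the interval's days."""
    if end_offset < start_offset:
        raise ValueError("end_offset must be >= start_offset")
    n_days = end_offset - start_offset + 1
    counts = Counter()
    for k in range(start_offset, end_offset + 1):
        d = holiday + timedelta(days=k)
        counts[(d.year, d.month)] += 1
    return {ym: c / n_days for ym, c in counts.items()}


# Chinese New Year fell on 28 January 2017 and on 12 February 2021.
# Assumed 7-day holiday interval (offsets 0..6): 4 days land in January
# 2017 and 3 days in February 2017.
during_2017 = interval_proportions(date(2017, 1, 28), 0, 6)

# Assumed 14-day pre-holiday interval (offsets -14..-1): 3 days land in
# January 2021 and 11 days in February 2021.
before_2021 = interval_proportions(date(2021, 2, 12), -14, -1)

print(during_2017)
print(before_2021)
```

Each interval yields one feature per month, so a holiday described by a "before", "during", and "after" interval contributes three features to the model for every month.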
Although the above description might sound daunting at first, what it essentially boils down to is this: based on the characteristics of one or more moving holidays, we let a smart algorithm learn by which factor we need to adjust the statistical baseline forecast.
Case study: modelling the impact of Chinese New Year for a large multinational
Chinese New Year is China’s most important holiday and the largest annual mass migration on the planet. Since most elderly parents live in rural villages and their children work in the cities, the “chunyun” (spring migration) creates approximately two to four weeks of radio silence from the entire country, including your suppliers, contract manufacturers, and partners. During this time, almost everything shuts down.
All of this poses serious, complex supply chain planning challenges for companies operating in Asia. The graph below shows one of these challenges. The vertical bars represent the sales quantity per month from January 2017 up to March 2021. The orange-shaded bars mark the months December, January, and February, where the effect of CNY is clearly visible. Note that the effect varies considerably from year to year. For instance, in the years in which CNY fell in January (2017 and 2020), January sales were impacted significantly more than in the years in which CNY fell in February. The blue line in the graph depicts the forecast generated by conventional time series models (i.e. moving average / simple exponential smoothing). Although we supplemented these models with a seasonal component, they do not fully capture the effect of CNY. This results in reduced forecast accuracy and greatly increased bias for the months affected by CNY.
The green line shows the forecast generated by the method proposed in this blog. Even at a glance, the graph makes clear that this forecast outperforms the conventional time series forecast. In this case we found an increase of 3.5 percentage points in forecast accuracy and a reduction of 44.6 percentage points in bias. Note that for the non-impacted months, the two forecasts are identical.
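The blog does not spell out how forecast accuracy and bias are calculated, so the sketch below uses one common pair of definitions as an assumption: accuracy as 1 minus the total absolute error divided by total actual demand, and bias as the total signed error divided by total actual demand (positive meaning over-forecasting). The numbers are made up for illustration and are unrelated to the customer results quoted above.

```python
def forecast_accuracy(actuals, forecasts):
    """Assumed definition: 1 - sum(|forecast - actual|) / sum(actual)."""
    abs_error = sum(abs(f - a) for a, f in zip(actuals, forecasts))
    return 1 - abs_error / sum(actuals)


def forecast_bias(actuals, forecasts):
    """Assumed definition: sum(forecast - actual) / sum(actual).
    Positive values mean the forecast is too high on balance."""
    signed_error = sum(f - a for a, f in zip(actuals, forecasts))
    return signed_error / sum(actuals)


# Made-up sales for three holiday-affected months.
actuals = [120, 40, 100]
baseline = [100, 100, 100]   # conventional forecast
adjusted = [125, 50, 100]    # baseline multiplied by uplift factors

# Differences between two percentages are expressed in percentage points.
accuracy_gain_pp = 100 * (forecast_accuracy(actuals, adjusted)
                          - forecast_accuracy(actuals, baseline))
bias_reduction_pp = 100 * (abs(forecast_bias(actuals, baseline))
                           - abs(forecast_bias(actuals, adjusted)))

print(f"accuracy gain: {accuracy_gain_pp:.1f} pp")
print(f"bias reduction: {bias_reduction_pp:.1f} pp")
```

Restricting both metrics to the holiday-affected months keeps the comparison focused, since the two forecasts are identical everywhere else.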
While conventional time series models have a good track record and are the de facto standard in the industry, they are often not equipped to capture more complex effects, such as moving holidays. If these effects are large (as in our example with Chinese New Year), the performance of the statistical forecast can suffer. To address this issue, we implemented a new forecasting approach in which the conventional time series forecast is complemented by a machine learning algorithm that models the effect of moving holidays.