
Industrial AI blog

Relevance of historic data for modeling and estimation

24 March 2020

Vinoth Kumar
R&D Centre, Hitachi India Pvt. Ltd.              

Yasushi Harada
Research & Development Group, Hitachi, Ltd.

Electric power utilities estimate their load one day ahead [1] so that they can operate generation, transmission and distribution infrastructure efficiently, because generators such as thermal power plants take a long time to ramp up production. Load estimation has become even more necessary in recent times owing to the deregulation of electricity markets and the growth of distributed energy resources, including renewables, storage and electric vehicles, in the transmission and distribution network.

With improvements in technologies to acquire, store and retrieve load data over long periods of time, data-driven models have gained popularity for estimating the day-ahead load [2]. However, acquiring and storing historic data has a cost, and there is often an upper limit to the accuracy that can be achieved by using a larger amount of historic data. This motivated us to develop a theory to quantify the amount of historic data necessary to achieve a certain level of accuracy in load estimation.

Time series data, such as load data, may undergo structural changes over longer periods of time, driven by factors that affect power consumption such as changing demography, power tariffs and the popularity of energy-efficient products. Figure 1 below highlights some examples of structural change and outlines our two-step approach: identifying structural change, followed by quantifying the trade-off.

Figure 1: Examples of structural change in load

Although it is difficult to model these changes periodically, identifying the change points helps to suggest which historic data is relevant for modeling the day-ahead load. In this study, we present a novel method and mathematical criteria to examine historic data for structural change. We call the method HISIA, an acronym for Hitachi and Indian Statistical Institute Analysis. The HISIA method can be used in data-driven load forecasting to determine how much historic data should be used to achieve the maximum possible accuracy.

The two key steps in our approach are presented below using an illustrative example. Figure 2 presents the first step, where the objective is to identify structural change in the load data. We consider daily load data between 9 and 10 AM (each day's hour is referred to as a segment) at a time resolution of one minute (each reading is referred to as a sample), available to us from 1st to 6th September. To estimate the mean load on 7th September between 9 and 10 AM, we must first detect whether there was a structural change in the given time-series data. All samples are assumed to be drawn from normally distributed populations, and no relationship between the segment means is assumed. Further, a confidence level for comparing means is set (e.g. 95%) and the mean square error (MSE) is formulated to quantify the error in estimating the mean.

Figure 2: Identifying structural change
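The backward search for a change point described in this first step can be sketched in a few lines of Python. This is only an illustration under our own assumptions, not the HISIA criterion from the paper: it swaps in a plain two-sided large-sample z-test for comparing means, and the names `means_differ` and `accepted_segments` are hypothetical.

```python
from statistics import NormalDist, mean, variance


def means_differ(a, b, confidence=0.95):
    """Two-sided large-sample z-test for a difference in means.

    Assumption: each input has enough samples (e.g. 60 one-minute readings)
    for the normal approximation, and a non-zero sample variance.
    """
    standard_error = (variance(a) / len(a) + variance(b) / len(b)) ** 0.5
    z = abs(mean(a) - mean(b)) / standard_error
    critical = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z > critical


def accepted_segments(segments, confidence=0.95):
    """Walk backwards from the most recent segment and stop at the first
    segment whose mean differs from the data accepted so far.

    segments: list of lists of samples, oldest first.
    Returns the index of the oldest accepted segment.
    """
    pooled = list(segments[-1])
    oldest = len(segments) - 1
    for i in range(len(segments) - 2, -1, -1):
        if means_differ(segments[i], pooled, confidence):
            break  # structural change suspected before segment i + 1
        pooled.extend(segments[i])
        oldest = i
    return oldest


# Synthetic stand-in for 1st to 6th September: the first two days sit
# 5 units higher than the last four (illustrative numbers only).
base = [100 + (j % 5) - 2 for j in range(60)]
shifted = [v + 5 for v in base]
print(accepted_segments([shifted, shifted, base, base, base, base]))  # 2
```

With this synthetic data the search accepts the segments from 3rd to 6th September (index 2 onwards) and rejects 2nd September. Note that in this plain z-test, raising the confidence level makes a change harder to flag; the acceptance criterion actually used by HISIA is defined in the paper and is not reproduced here.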

In the second step, as shown in Figure 3 below, we investigate whether samples from the last rejected segment could be used to improve the accuracy of our estimate. This is achieved by formulating a mathematical expression that quantifies the trade-off in using samples gathered before the structural change. With x7 estimated using x, and assuming a structural change is suspected before x3, the trade-off in using more data, i.e. samples from 2nd September, is quantified using the mean square error, which has bias and variance components.

Figure 3: Trade-off in using more historic data


The trade-off in using observations from before the change is expressed as an error with two components: variance, which decreases as more data is used, and bias, which increases. Once the mean has changed, the scope for reducing the variance component is very limited compared with the growing error contribution from bias. It may be noted that a higher confidence level specified as an input will lead to fewer segments being accepted. Quantifying the trade-off in this way is preferable to a trial-and-error approach of adding historic data, which costs more time and money without giving a clear indication of the marginal benefit.
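For the simplest case, the two error components can be written out explicitly. The following is a sketch under assumptions that are ours and not taken from the paper: n observations x_i after the change with mean μ, m observations y_j before the change with mean μ + δ, a common variance σ², independent observations, and the plain pooled sample mean as the estimator of μ. The symbols n, m, δ and σ² are introduced here purely for illustration.

```latex
\begin{align}
\hat{\mu}_m &= \frac{1}{n+m}\Bigl(\sum_{i=1}^{n} x_i + \sum_{j=1}^{m} y_j\Bigr), \\
\mathrm{MSE}(\hat{\mu}_m)
  &= \underbrace{\frac{\sigma^2}{n+m}}_{\text{variance}}
   + \underbrace{\Bigl(\frac{m\,\delta}{n+m}\Bigr)^{2}}_{\text{bias}^2}, \\
\mathrm{MSE}(\hat{\mu}_m) < \mathrm{MSE}(\hat{\mu}_0) = \frac{\sigma^2}{n}
  &\iff \delta^2 < \sigma^2\Bigl(\frac{1}{n} + \frac{1}{m}\Bigr).
\end{align}
```

Under these assumptions, adding data from before the change pays off only while the squared shift in the mean is small relative to the noise. The more post-change data is already available, the tighter that bound becomes.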

As an experiment, we evaluated HISIA using the integrated hourly load from Zone-F of NYISO [3]. We used the average load between 8 and 9 AM of each day as a sample and the collection of all such samples for each year as a segment, as can be seen in Figure 4 below. Taking the year 2014 as the current segment, for which an estimate must be made, we ask how many segments from the past should be used. The first step is shown on the left-hand side of Figure 4, where we detect a structural change before the segment for the year 2012.

Figure 4: Results of Structural change on estimated mean load


The second step, quantifying the trade-off in using historic data from before the change, is presented on the right-hand side of Figure 4. The decreasing trend in MSE continues until 1100 observations from the past are used, which is the optimal size of historic data for load estimation (8 to 9 AM of year 2014).
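The idea of scanning the MSE against the number of past observations and stopping at its minimum can be sketched as follows. This is a toy illustration, not the computation behind Figure 4: it assumes the noise variance and the mean shift of each past segment are known (in practice they would have to be estimated), it uses the plain pooled mean as the estimator, and the function names and numbers are hypothetical.

```python
def pooled_mse(n_current, sigma2, past_blocks):
    """MSE of the pooled sample mean as an estimate of the current mean.

    past_blocks: list of (count, shift) pairs, where shift is the difference
    between that block's mean and the current mean (assumed known here).
    """
    total = n_current + sum(count for count, _ in past_blocks)
    bias = sum(count * shift for count, shift in past_blocks) / total
    variance = sigma2 / total
    return variance + bias ** 2


def best_history_size(n_current, sigma2, segments):
    """Add past observations one at a time, most recent first, and return
    (k, mse) for the number k of past observations with the smallest MSE.

    segments: list of (size, shift) pairs ordered from most recent to oldest.
    """
    best_k, best_mse = 0, pooled_mse(n_current, sigma2, [])
    used = []        # past segments already used in full
    k_before = 0     # number of observations in those segments
    for size, shift in segments:
        for taken in range(1, size + 1):
            mse = pooled_mse(n_current, sigma2, used + [(taken, shift)])
            if mse < best_mse:
                best_k, best_mse = k_before + taken, mse
        used.append((size, shift))
        k_before += size
    return best_k, best_mse


# 60 current samples with noise variance 100; two past segments share the
# current mean and an older one is shifted by 5 (illustrative numbers only).
k, mse = best_history_size(60, 100.0, [(60, 0.0), (60, 0.0), (60, 5.0)])
print(k, round(mse, 4))
```

In this toy setting the minimum falls at k = 122: all 120 observations from the unshifted segments are used, but only two from the shifted one, because past that point the growing bias outweighs the shrinking variance.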
The proposed method may be considered simple yet optimal because it does not assume the existence or number of change points, or any relationship between the parameters in each segment. For more details, we encourage you to read our paper, “Mathematical examination of structural changes in load forecasting models,” which is available on ScienceDirect [4].


[1] Srinivasan, D. and Lee, M. A.; Survey of hybrid fuzzy neural approaches to electric load forecasting; IEEE International Conference on Intelligent Systems for 21st Century, 1995
[2] Pengwei, Su et al.; Recent Trends in Load Forecasting Technology for the Operation Optimization of Distributed Energy System; Energies, 2017
[3] NYISO; Market Operation Data, 2014