Comparing FERC and EIA electricity demand data

The United States government coordinates the collection of hourly electricity demand data from regional entities for use in planning and decision-making processes.  Through Form 714, the Federal Energy Regulatory Commission (FERC) provides easily accessible data records spanning 2006-2018 for a mix of Balancing Authorities (BAs) and Planning Areas.

The Energy Information Administration (EIA), by contrast, began collecting hourly electricity demand data for all BAs in July 2015 with Form 930. The EIA data are updated in near real time and bring other benefits, such as hourly generation by resource type: coal, hydropower, natural gas, nuclear, wind, solar, petroleum, and other.

An interesting question for the energy modeling community is: do the 2017 data gathered by FERC align with the 2017 data gathered by EIA?  Can these records be used almost interchangeably?  If so, analysts could stitch together the longer historical FERC records with the EIA records, which contain more detail about the current system.

One of our collaborators, Zane Selvans (@ZaneSelvans) of the Catalyst Cooperative (@CatalystCoop), mapped the ~200 FERC respondents to the ~70 EIA BAs and arranged the FERC data into a more usable format.  With this, we compared the hourly demand values for the successfully mapped BAs for 2017.  Details of the comparison methods are at the end of this post.

Results

For each region, we compute the ratio of FERC hourly values to EIA hourly values, along with the ratios of the mean, minimum, and maximum values.
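As a rough illustration of these ratios (not the code used for this post), the sketch below assumes the two records are already aligned hour by hour and share units. The function name `compare_demand` and the sample values are made up for the example.

```python
from statistics import mean


def compare_demand(ferc, eia):
    """Compare two aligned lists of hourly demand (same hours, same units)."""
    if len(ferc) != len(eia):
        raise ValueError("records must cover the same hours")
    return {
        # One ratio per hour: FERC value divided by EIA value.
        "hourly_ratios": [f / e for f, e in zip(ferc, eia)],
        # Ratios of summary statistics, each FERC over EIA.
        "mean_ratio": mean(ferc) / mean(eia),
        "min_ratio": min(ferc) / min(eia),
        "max_ratio": max(ferc) / max(eia),
    }


# Hypothetical values in MW, for illustration only.
ferc_mw = [100.0, 120.0, 150.0, 130.0]
eia_mw = [100.0, 125.0, 150.0, 125.0]
summary = compare_demand(ferc_mw, eia_mw)
print(summary["mean_ratio"], summary["hourly_ratios"])
```

In this toy case the mean, minimum, and maximum ratios all equal 1 even though two individual hours differ by 4%, which is why the hourly distribution is worth examining alongside the summary ratios.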

California Independent System Operator (CISO)

Midcontinent Independent System Operator (MISO)

The two examples here show hourly comparisons for CISO and MISO.  For CISO, most values are nearly identical and nearly all agree within 10%.  For MISO, most values agree within 10%, and the ratio of mean values is 1.01.

ISO New England (ISNE)

PJM Interconnection (PJM)

Some regions show a mean value close to 1 yet have non-uniform features in their distributions, such as ISNE (ratio of mean values = 0.99) and PJM (ratio of mean values = 0.98).

Furthermore, other regions have substantial discrepancies in the ratio of their mean values.  A histogram of the ratios of the mean values for each compared BA shows agreement within a few percent for over 30 BAs (a csv file is attached at the bottom showing the ratio of their mean, minimum, and maximum values). Additionally, we compare the minimum and maximum values and see a distribution similar to the mean value comparison.
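A minimal sketch of how BAs could be sorted by agreement is shown here. The 3% tolerance is an assumption standing in for "a few percent", the function name `split_by_agreement` is hypothetical, and `BA_X` with its ratio is an invented placeholder; the other three ratios are the ones quoted above.

```python
def split_by_agreement(mean_ratios, tolerance=0.03):
    """Split BAs by whether their FERC/EIA mean ratio is within tolerance of 1."""
    agree = {ba: r for ba, r in mean_ratios.items() if abs(r - 1.0) <= tolerance}
    differ = {ba: r for ba, r in mean_ratios.items() if ba not in agree}
    return agree, differ


mean_ratios = {
    "MISO": 1.01,
    "ISNE": 0.99,
    "PJM": 0.98,
    "BA_X": 1.25,  # hypothetical BA with a large discrepancy
}
agree, differ = split_by_agreement(mean_ratios)
print(sorted(agree), sorted(differ))
```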

Ratio of the mean of demand values for each mapped BA (FERC mean value/EIA mean value)

Ratio of the minimum and maximum demand values for each mapped BA

Conclusion

A considerable number of Balancing Authorities have reasonably similar FERC and EIA hourly demand records: the ratios of their mean, minimum, and maximum values agree within a few percent.  This indicates that the FERC and EIA records may be approximately interchangeable for these BAs if the exact hourly profile is not a concern (see the summary csv file for the list).  The fact that many histograms show a spread about 1.0 is worth exploring for anyone considering using these profiles as replacements for each other in modeling. Are there biases in which hours are misaligned?
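One way to start on the question of hourly bias is to average the FERC/EIA ratio separately for each hour of the day. The sketch below is only a suggestion, not an analysis performed for this post; the function name `mean_ratio_by_hour_of_day` and the sample rows are hypothetical.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def mean_ratio_by_hour_of_day(rows):
    """rows: iterable of (timestamp, ferc_value, eia_value) for shared hours."""
    by_hour = defaultdict(list)
    for timestamp, ferc_value, eia_value in rows:
        by_hour[timestamp.hour].append(ferc_value / eia_value)
    # Average the hourly ratios within each hour of the day (0-23).
    return {hour: mean(ratios) for hour, ratios in sorted(by_hour.items())}


# Hypothetical rows, for illustration only.
rows = [
    (datetime(2017, 1, 1, 0), 100.0, 100.0),
    (datetime(2017, 1, 2, 0), 110.0, 100.0),
    (datetime(2017, 1, 1, 1), 90.0, 100.0),
]
hourly_bias = mean_ratio_by_hour_of_day(rows)
print(hourly_bias)
```

A profile that departs from 1 at particular hours of the day would suggest a systematic misalignment rather than random scatter.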

In the future, this could also allow analysts to stitch together the longer FERC records with the more current and detailed EIA records.  The Catalyst Cooperative and Zane are pursuing work along these lines.  We wish them the best of luck!

Details

The FERC data contain records from both Balancing Authorities and Planning Areas, while the EIA records are only for Balancing Authorities.  Therefore, many of the FERC records do not have EIA equivalents.  We only compare records that we think should align.

Both the FERC and EIA data records are imperfect, containing zero values, missing values, and the occasional outlier value.  For the EIA data, we use the EIA records after removing outlier values based on the details in this paper.  For the FERC data, we use the FERC records arranged by Zane with all zero values removed.  Hours are only included in the comparison if the corresponding hourly value in each record was present and was not removed by these two cleaning methods.
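The hour-matching rule can be sketched roughly as follows. This is an illustration under stated assumptions, not the code used here: each record is assumed to be a dict keyed by an ISO-style hour string, removed EIA outliers are assumed to be stored as `None` by an upstream step, and the function names are hypothetical.

```python
import math


def is_missing(value):
    """True for values that are absent (None) or NaN."""
    return value is None or math.isnan(value)


def shared_hours(ferc, eia):
    """Return (hour, ferc_value, eia_value) for hours usable in both records."""
    rows = []
    for hour in sorted(ferc.keys() & eia.keys()):
        f, e = ferc[hour], eia[hour]
        if is_missing(f) or f == 0:  # FERC: drop missing and zero values
            continue
        if is_missing(e):  # EIA: outliers assumed already set to None
            continue
        rows.append((hour, f, e))
    return rows


# Hypothetical records in MW, for illustration only.
ferc = {"2017-01-01T00": 100.0, "2017-01-01T01": 0.0,
        "2017-01-01T02": 110.0, "2017-01-01T03": 120.0}
eia = {"2017-01-01T00": 101.0, "2017-01-01T01": 99.0,
       "2017-01-01T02": None, "2017-01-01T03": 118.0,
       "2017-01-01T04": 115.0}
matched = shared_hours(ferc, eia)
print(matched)
```

Only two of the five hours survive in this example: one is dropped for a zero FERC value, one for a removed EIA value, and one because FERC has no record for it.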

  • Summary csv file: comparing the mean, minimum, and maximum values in the FERC 714 and EIA 930 hourly demand data for year 2017 for the matched BAs.
  • FERC to EIA mapping: the mapping of FERC respondents to their EIA codes and acronyms provided by Zane.

Electricity Demand

Creating electricity to power our industries, schools, hospitals, and modern lifestyles consumes 40% of all primary energy in the U.S. At Carnegie Science, we are studying what paths the electricity system could take to become net zero in carbon emissions in the future.

It would be incredible to have a clean 100% renewable wind and solar based electricity system. However, there are real challenges in meeting energy demand at all hours because the sun does not always shine and the wind does not always blow. These hurdles can be overcome with smart choices in energy storage and by wise planning based on studying the variability of wind and solar resources.

At Carnegie Science, we have built a computer model of a simplified energy system to study net zero emissions systems. Any energy system our model designs must be able to supply electricity to meet the desired consumption of the U.S. for every hour of every day in the future. To begin to understand what is required, we use historical hourly electricity demand as one of the model inputs.

One of my colleagues, David Farnham (@farnham_h2o), and I are working on preparing these historical electricity demand data for our model. The U.S. Energy Information Administration (EIA) graciously collects hourly information from the utilities across the U.S. and publishes that data for analysis and use by the public.

However, we are all at the mercy of the reporting practices of each utility. If utilities report outrageous numbers, the EIA publishes outrageous numbers. And, when these numbers are used in an energy model, they can lead to wild results.

David and I have been developing algorithms to identify these anomalous values. After identifying anomalies, we replace them with a best estimate of what the true value probably was. A great example of some strange values can be seen in the graphic below, which shows the hourly electricity demand for the PacifiCorp West service territory over 10 December days in 2016.

Electricity demand during 10 December days in 2016 in the PacifiCorp West service territory of the U.S. Data pulled from EIA database Sept. 3, 2019.

Even without any background knowledge of what electricity demand should look like, the problem region jumps out immediately. The demand increases by a factor of 7 for 24 hours compared to the surrounding data. There is also a sudden one-hour drop in demand, which we also flag as anomalous. Our brains are phenomenal at pattern recognition and at identifying regions that do not conform with their surroundings.
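To give a flavor of what automated screening can look like, here is a deliberately simple sketch. It is not the algorithm David and I are developing; it just flags values far from the median of their neighbors and replaces them with the median of nearby unflagged values. The function names, the 48-hour half window, the factor of 3, and the sample data are all assumptions chosen for illustration.

```python
from statistics import median


def flag_anomalies(values, half_window=48, max_factor=3.0):
    """Flag values more than max_factor above or below the local median.

    Assumes positive demand values. The window must be wide relative to
    the longest anomalous run, or the run itself will shift the median.
    """
    flags = []
    for i, value in enumerate(values):
        lo = max(0, i - half_window)
        hi = min(len(values), i + half_window + 1)
        neighbors = values[lo:i] + values[i + 1:hi]
        if not neighbors:  # a single value has nothing to compare against
            flags.append(False)
            continue
        local = median(neighbors)
        flags.append(value > local * max_factor or value < local / max_factor)
    return flags


def replace_flagged(values, flags, half_window=48):
    """Replace flagged values with the median of nearby unflagged values."""
    cleaned = list(values)
    for i, flagged in enumerate(flags):
        if not flagged:
            continue
        lo = max(0, i - half_window)
        hi = min(len(values), i + half_window + 1)
        good = [values[j] for j in range(lo, hi) if not flags[j]]
        if good:  # leave the value untouched if nothing nearby is usable
            cleaned[i] = median(good)
    return cleaned


# Hypothetical hourly demand with a spike and a drop, for illustration only.
demand = [100, 105, 98, 700, 710, 102, 10, 99, 101, 103]
flags = flag_anomalies(demand)
cleaned = replace_flagged(demand, flags)
print(flags)
print(cleaned)
```

A median is used rather than a mean because a handful of extreme values barely moves it, so the baseline stays close to normal demand even when the anomalies themselves sit inside the window.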

Imagine designing an energy system that had to provide electricity for those 24 anomalous hours. You would build a system 7 times larger than what is needed for the rest of the year. Utility ratepayers would be up in arms.

We could visually check all 56 reporting regions in the U.S. for all four years of hourly data: 56 regions * 4 years * 8760 hours per year = 1,962,240 data points! Instead, we devise algorithms to scan the data for us.

A good algorithm is reusable. We are putting in extra effort now to design the best algorithms possible for the task, with an aim of reusability. In 6 months, when there is a new 6-month chunk of data, we will simply run our code to clean it up and share the results with colleagues. David and I plan to publish our techniques and make the clean data available for everyone.

In two weeks, I am going to be sharing our techniques at an upcoming Open Energy Modeling workshop at the National Renewable Energy Laboratory. I hope that the intense effort we put into this work leads to a data product that other research teams can also use for their modeling.