I approached the intersection of Selby Lane and El Camino Real cautiously, as I had many times before. The intersection has a marked crosswalk, but no stop light. I dismounted my bike and waited for the continuous stream of cars to break. The cars in the first two lanes were slowing down for me. I made eye contact with the drivers and started to walk across the six-lane road. There were no cars in the third lane. I continued.
Another pedestrian was crossing El Camino from the other side where cars in all three lanes had already stopped. Screeching tires snapped my attention back to the third lane. A big white truck had swerved from the second lane into the third lane and was skidding straight at me.
My muscles tensed as I sprang towards the median while dragging my bike. The truck continued to skid as I made it to the median. Finally, halfway into the crosswalk, the truck came to rest. The driver looked down at me from his lifted white truck, shook his head, and drove off.
My heart was racing as I stood on the little safe median. One person shouted from their car, “you should call the police!” What would a phone call even do? I’m not sure. I wasn’t in the right state of mind to grab the license plate number and I wasn’t physically injured. Could the driver be charged with unsafe or reckless driving?
Did he swerve out of the second lane to avoid a collision with the stopped cars? Or maybe he switched lanes to avoid stopping, without knowing that pedestrians were crossing in a marked crosswalk.
Last week the National Renewable Energy Laboratory (NREL), the United States’ epicenter of solar and wind research, hosted the first openmod workshop in North America. The workshop was a congregation of academics, scientists, researchers, and open software and data enthusiasts gathering to discuss the state of open source energy modeling.
We discussed a broad range of models. On one end, there were extremely detailed models that are excellent for understanding how the electric grid will respond to changing weather patterns that alter renewable energy availability in the coming hours or days. On the other end were models like ours at Carnegie Science, which focus on energy transitions over 50-year to century time scales.
Beyond interest in models, some groups focused on model inputs and making data available. The Catalyst Cooperative is gathering data from disparate sources into an open communal database for everyone to use. This effort is part of their Public Utility Data Liberation (PUDL) project.
In line with the communal data theme, I presented a 7-minute lightning round talk on the electricity demand project Dave Farnham (@farnham_h2o) and I have been working on. This project focuses on making publicly available electricity demand data usable by everyone and was the subject of a previous post.
Altering data can be contentious. And it should be, if there is no well-defined method for identifying the data to be replaced and deciding how to replace it. Because of this, I initially thought there would be some opposition to our work. After all, a 7-minute talk is not enough time to allay everyone’s fears.
There was support for our approach once the workshop participants saw the magnitude of the anomalous deviations we target. The one exception was a participant who needed much more detail than was possible in my 7-minute talk. We invested the additional time and effort in discussing the details, and it paid off. In the end, this participant expressed his support for our method.
In the coming weeks I hope to post a link to a recording of the talk, which is not currently available. For now, please see the linked slides if you are interested.
Creating electricity to power our industries, schools, hospitals, and modern lifestyles consumes 40% of all primary energy in the U.S. At Carnegie Science, we are studying what paths the electricity system could take to become net zero in carbon emissions in the future.
It would be incredible to have a clean 100% renewable wind and solar based electricity system. However, there are real challenges in meeting energy demand at all hours because the sun does not always shine and the wind does not always blow. These hurdles can be overcome with smart choices in energy storage and by wise planning based on studying the variability of wind and solar resources.
At Carnegie Science, we have built a computer model of a simplified energy system to study net zero emissions systems. Any energy system our model designs must be able to supply electricity to meet the desired consumption of the U.S. for every hour of every day in the future. To begin to understand what is required, we use historical hourly electricity demand as one of the model inputs.
One of my colleagues, David Farnham (@farnham_h2o), and I are working on preparing these historical electricity demand data for our model. The U.S. Energy Information Administration (EIA) graciously collects hourly information from the utilities across the U.S. and publishes that data for analysis and use by the public.
However, we are all at the mercy of the reporting practices of each utility. If utilities report outrageous numbers, the EIA publishes outrageous numbers. And, when these numbers are used in an energy model, they can lead to wild results.
David and I have been developing algorithms to identify these anomalous values. After identifying anomalies, we replace them with a best estimate of what the true value probably was. A great example of some strange values can be seen in the below graphic, which shows the hourly electricity demand for the PacifiCorp West service territory over 10 December days in 2016.
Even without any background knowledge of what electricity demand should look like, the problem region jumps out immediately. The demand increases by a factor of 7 for 24 hours compared to the surrounding data. There is also a sudden one-hour drop in demand, which we flag as anomalous too. Our brains are phenomenal at pattern recognition and at identifying regions which do not conform with their surroundings.
Imagine designing an energy system which had to provide electricity for those 24 anomalous hours. You would build a system 7 times larger than what is needed for the rest of the year. Utility rate payers would be up in arms.
We could visually check all 56 reporting regions in the U.S. for all four years of hourly data: 56 regions * 4 years * 8760 hours per year = 1,962,240 data points! Instead, we devise algorithms to scan the data for us.
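To give a flavor of what such a scan can look like, here is a minimal sketch in Python. It is an illustration under stated assumptions, not necessarily the approach we use: it compares each hour to the median of the surrounding hours, flags values that are more than a chosen factor above or below that median, and replaces flagged hours with the median of nearby unflagged hours. The function names, the 48-hour window, the factor of 3, and the demand numbers are all made up for the example. The synthetic data only mimic the kind of 24-hour spike and one-hour drop described above.

```python
import math
from statistics import median


def flag_anomalies(demand, window=48, factor=3.0):
    """Flag hours whose demand is far from the median of nearby hours.

    `window` (hours on each side) and `factor` are illustrative choices,
    not tuned values.
    """
    flags = []
    for i, value in enumerate(demand):
        lo = max(0, i - window)
        hi = min(len(demand), i + window + 1)
        local = median(demand[lo:hi])
        flags.append(value > factor * local or value < local / factor)
    return flags


def replace_anomalies(demand, flags, window=48):
    """Replace each flagged hour with the median of nearby unflagged hours."""
    cleaned = list(demand)
    for i, flagged in enumerate(flags):
        if not flagged:
            continue
        lo = max(0, i - window)
        hi = min(len(demand), i + window + 1)
        neighbors = [demand[j] for j in range(lo, hi) if not flags[j]]
        if neighbors:  # leave the value alone if nothing trustworthy is nearby
            cleaned[i] = median(neighbors)
    return cleaned


# Synthetic data (made up for illustration): 10 days of hourly demand in MW
# with a daily cycle, a 24-hour spike at 7x normal, and a one-hour drop to zero.
demand = [2500 + 300 * math.sin(2 * math.pi * h / 24) for h in range(240)]
for h in range(100, 124):
    demand[h] *= 7
demand[180] = 0.0

flags = flag_anomalies(demand)
cleaned = replace_anomalies(demand, flags)
print(sum(flags), "hours flagged")
```

A median is used here rather than a mean because a day-long spike would drag a mean upward and partly hide itself. The window has to be comfortably longer than the longest anomaly you expect, otherwise the anomaly becomes the local "normal".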
A good algorithm is reusable. We are putting in extra effort now to design the best algorithms possible for the task, with reusability in mind. In 6 months, when there is a new 6-month chunk of data, we will simply run our code to clean it up and share the results with colleagues. David and I plan to publish our techniques and make the clean data available for everyone.
In two weeks, I am going to be sharing our techniques at an upcoming Open Energy Modeling workshop at the National Renewable Energy Laboratory. I hope that the intense effort we put into this work leads to a data product that other research teams can also use for their modeling.