From Raw Data to Field Decisions
- Apr 29
The conversation focused on how different organizations are collecting, scrubbing, validating, and reporting reliability data, especially around outage information and core reliability metrics. A clear theme was that many teams have access to daily or next-day data, but that data is often still considered raw, preliminary, unverified, or dynamic until someone reviews it and makes sure the outage details are correct.
Participants described several different approaches. Some are still using reports and spreadsheets, while others are moving data into centralized repositories, SQL environments, data lakes, or Power BI dashboards. A few groups said they can usually reach a solid number within a few days, while others described a month-end process that runs several days into the following month. One participant said their current process can take about six weeks to fully validate events from the start of a month, which is why they are working toward more daily validation, more automation, and better quality control closer to the source.
Key Takeaways
Daily data is useful, but it is not always final.
Several participants said they can look at what happened yesterday through a dashboard, scorecard, or daily report. However, that information may still be preliminary. The group used words like raw, unverified, unreconciled, and dynamic to describe data that is available quickly but still subject to change.
Scrubbing timelines are different across organizations.
Some teams said their data is usually stable within about three days. Others said their operators or analysts review the month’s outages and finalize metrics within the first five to ten days of the next month. Another example was a process where January 1 data may not be fully validated until roughly six weeks later.
Several teams are trying to move quality control closer to the source.
One organization described wanting “point of entry” or shift-based quality control, where the people closest to the outage review the information sooner. Others talked about operators, shift supervisors, field techs, engineers, and business analysts reviewing outage records before they become final.
There is a lot of manual review behind the numbers.
Even where dashboards exist, participants made it clear that people are still checking start times, end times, durations, customer counts, device operations, cause codes, and crew comments. Some teams are comparing multiple systems to make sure the outage record makes sense.
AMI, SCADA, OMS, ADMS, GIS, and SQL data are all part of the picture.
Participants described pulling data from outage management systems, advanced distribution management systems, AMI, SCADA, GIS, transmission or substation systems, SQL servers, and internal databases. In many cases, one system is the system of record, but other systems are used to validate or challenge what was entered.
Some organizations are building toward a centralized source of truth.
One example shared was a move toward a centralized data lake so historical reliability data can be stored in one place and pulled into reports or dashboards more easily. Others described querying databases directly or using SQL copies of OMS data for analysis.
Dashboards and scorecards are becoming a major part of the process.
Participants talked about Power BI dashboards, daily scorecards, outage trackers, CEMI trackers, and reports that go out to leadership, operations, engineering, and other groups. Some dashboards use red, yellow, and green status indicators to show whether a metric is above target, close to target, or below target.
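The color logic behind these indicators is easy to sketch. The version below assumes a metric where lower is better (such as SAIDI or SAIFI) and treats “close to target” as a band of plus or minus 5% around the target. The function name `rag_status` and the 5% band are hypothetical; each organization sets its own thresholds.

```python
def rag_status(value: float, target: float, tolerance: float = 0.05) -> str:
    """Classify a lower-is-better metric (such as SAIDI) against its target.

    "Close to target" is modeled as a band of +/- tolerance around the
    target (5% by default, a hypothetical choice).
    """
    band = abs(target) * tolerance
    if value > target + band:
        return "red"     # above target by more than the band
    if value >= target - band:
        return "yellow"  # within the band around the target
    return "green"       # below target by more than the band


# Example: a SAIDI target of 120 minutes gives a 6-minute band (114 to 126).
for saidi in (100.0, 118.0, 130.0):
    print(saidi, rag_status(saidi, target=120.0))
```

For a metric where higher is better, the comparisons would simply flip.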
Exception reporting is helping teams find issues before final reporting.
One participant described daily checks that flag outages where the customer count in OMS does not match what GIS shows for that device. Others described checking AMI against OMS for start and stop times, looking for duplicate events, reviewing false incidents, and asking operations to correct records when something looks off.
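A daily exception check of this kind is mostly a set of lookups and comparisons. The sketch below assumes simplified inputs: a list of OMS outage records, a lookup of GIS customer counts per device, and the earliest AMI power-off time per outage. All names, fields, and the 15-minute tolerance are hypothetical. It flags three of the issues mentioned above: customer counts that disagree with GIS, OMS start times that disagree with AMI, and likely duplicate events.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class OutageRecord:
    """A simplified OMS outage record (hypothetical fields)."""
    outage_id: str
    device_id: str
    customers_out: int
    start: datetime


def daily_exceptions(outages, gis_customers, ami_starts,
                     max_start_gap=timedelta(minutes=15)):
    """Return (outage_id, reason) pairs for records that need a second look.

    gis_customers: device_id -> customers downstream of that device in GIS
    ami_starts:    outage_id -> earliest AMI power-off time for the outage
    """
    exceptions = []
    seen = set()
    for rec in outages:
        # 1. The OMS customer count should match what GIS shows for the device.
        gis_count = gis_customers.get(rec.device_id)
        if gis_count is not None and gis_count != rec.customers_out:
            exceptions.append((rec.outage_id, "customer count differs from GIS"))
        # 2. The OMS start time should be close to the AMI-reported start.
        ami_start = ami_starts.get(rec.outage_id)
        if ami_start is not None and abs(ami_start - rec.start) > max_start_gap:
            exceptions.append((rec.outage_id, "start time differs from AMI"))
        # 3. The same device with the same start time suggests a duplicate event.
        key = (rec.device_id, rec.start)
        if key in seen:
            exceptions.append((rec.outage_id, "possible duplicate event"))
        seen.add(key)
    return exceptions


t0 = datetime(2026, 1, 1, 2, 0)
records = [
    OutageRecord("A1", "FUSE-1", 40, t0),
    OutageRecord("A2", "FUSE-1", 40, t0),
    OutageRecord("A3", "RECL-7", 300, t0),
]
for outage_id, reason in daily_exceptions(
        records, {"FUSE-1": 40, "RECL-7": 250}, {"A3": t0 + timedelta(minutes=30)}):
    print(outage_id, reason)
```

Stop times can be compared against AMI restoration times in the same way. The output is a review list for operations, not an automatic correction.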
Cause codes were a big part of the discussion.
Participants recognized that cause coding can be hard to get right, especially when crews are entering information during the event or in the middle of the night. Several examples focused on improving the accuracy of outage causes so the data can actually be used to drive follow-up work.
Vegetation cause codes need more detail to be useful.
One participant explained that simply coding something as a “tree” outage is not enough. Their team is trying to capture whether it involved a live tree, a dead tree, or a limb; whether the tree was inside or outside the right-of-way; and whether the event was weather-related. That detail helps the vegetation team determine whether something was missed, whether the trim cycle needs attention, or whether customer outreach may be needed.
Underground conductor failures are being separated from broad equipment failure categories.
One organization described creating a specific cause code for underground primary conductor failures instead of grouping those events under general equipment failure. They are using that information to track repeat spans, color-code problem areas on maps, and support cable rehab or replacement decisions.
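Once underground primary conductor failures have their own cause code, finding repeat spans becomes a simple count. The sketch below assumes each outage record carries a span identifier and a cause code. The code label `UG_PRIMARY_CONDUCTOR` and the two-failure threshold are hypothetical.

```python
from collections import Counter

# Hypothetical label for the dedicated underground primary conductor cause code.
UG_PRIMARY_CONDUCTOR = "UG_PRIMARY_CONDUCTOR"


def repeat_spans(events, min_failures=2):
    """Return (span_id, failure_count) for spans with repeat UG primary failures.

    events: iterable of (span_id, cause_code) tuples, one per outage.
    Results are sorted worst-first, which suits a map legend or a rehab list.
    """
    counts = Counter(span for span, cause in events if cause == UG_PRIMARY_CONDUCTOR)
    repeats = [(span, n) for span, n in counts.items() if n >= min_failures]
    return sorted(repeats, key=lambda item: (-item[1], item[0]))
```

The counts can then drive map color bands or a ranked list for cable rehab or replacement review.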
Momentary interruptions are getting more attention as visibility improves.
Participants discussed MAIFI and momentary events, especially where more smart devices, reclosers, and distribution automation are being added. One point raised was that MAIFI may rise simply because the organization now has better information from devices that used to be “dumb” or not visible.
More device data does not always mean performance is worse.
The group observed that more momentary events may appear because of fuse-saving schemes, recloser operations, or better device communication. The important question is whether each operation was expected under the protection scheme or points to a field condition that needs to be addressed.
CEMI is being tracked in several different ways.
Participants talked about CEMI thresholds such as four interruptions over two years, rolling twelve-month CEMI, year-to-date CEMI, and higher interruption bands like six, eight, ten, and twelve. The broader point was that teams are using CEMI not just as a reportable metric, but also as a way to find customers or areas that may need attention.
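The different CEMI views mostly differ in the time window and the threshold. The sketch below assumes a list of sustained interruptions as (customer, date) pairs and approximates rolling twelve months as 365 days and two years as 730 days. The function names are hypothetical. Note that the IEEE 1366 definition of CEMIn counts customers with more than n sustained interruptions; this sketch uses “at least” to match phrasing like “four interruptions over two years,” so switch `>=` to `>` if a program follows the strict definition.

```python
from collections import Counter
from datetime import date, timedelta


def interruption_counts(interruptions, as_of, window_days):
    """Count sustained interruptions per customer in (as_of - window_days, as_of].

    interruptions: iterable of (customer_id, interruption_date) tuples.
    window_days=365 approximates rolling twelve months; 730 approximates two years.
    """
    start = as_of - timedelta(days=window_days)
    return Counter(cust for cust, day in interruptions if start < day <= as_of)


def cemi(counts, threshold, customers_served):
    """Share of customers served with at least `threshold` interruptions."""
    return sum(1 for n in counts.values() if n >= threshold) / customers_served


def band_counts(counts, bands=(6, 8, 10, 12)):
    """Number of customers at or above each higher interruption band."""
    return {b: sum(1 for n in counts.values() if n >= b) for b in bands}


history = [("C1", date(2025, 6, 1)), ("C1", date(2026, 3, 1)), ("C2", date(2026, 2, 1))]
counts = interruption_counts(history, as_of=date(2026, 4, 29), window_days=365)
print(cemi(counts, threshold=2, customers_served=1000), band_counts(counts))
```

Keeping the per-customer counts, rather than only the ratio, is what lets teams turn the metric into a list of customers or areas to look at.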
Planned work can create tension with CEMI goals.
One participant said they had seen cases where reliability improvement work caused an additional planned outage, which could make the metric look worse even though the work was intended to fix the problem. Their organization created an exclusion for certain planned outages that are specifically done to improve reliability.
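The exclusion is narrow: it removes only planned outages taken specifically to improve reliability, not planned outages in general. A minimal sketch, assuming each interruption record carries a planned flag and a hypothetical `reliability_work` flag that someone sets when the work is scheduled:

```python
from dataclasses import dataclass


@dataclass
class Interruption:
    customer_id: str
    planned: bool
    # Hypothetical flag: set when a planned outage is taken specifically
    # to perform reliability improvement work.
    reliability_work: bool = False


def countable_for_cemi(events):
    """Drop only planned outages taken for reliability work; keep everything else."""
    return [e for e in events if not (e.planned and e.reliability_work)]
```

Filtering before the counts are built keeps the exclusion visible and auditable, rather than adjusting the final metric after the fact.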
Forecasting and predictive work are still developing.
Participants discussed several levels of forecasting. One approach was to take the current year-to-date metric, add the average of the previous three years for the remaining part of the year, and use that as a year-end projection. Others talked about early exploratory work around predictive modeling for device failures or using storm tools to estimate system response and crew needs.
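The first forecasting approach is simple arithmetic: year-to-date value plus the average of what the rest of the year added in prior years. A minimal sketch, assuming the caller has already totaled the metric over the same remaining period (the day after today’s date through December 31) for each of the previous three years. The function name is hypothetical.

```python
def year_end_projection(ytd_value, prior_remaining):
    """Project a year-end value for an additive metric such as SAIDI or SAIFI.

    ytd_value:       metric accumulated from Jan 1 through today this year.
    prior_remaining: the same metric accumulated from the day after today's
                     date through Dec 31 in each prior year (three years in
                     the approach described above).
    """
    if not prior_remaining:
        raise ValueError("need at least one prior year")
    return ytd_value + sum(prior_remaining) / len(prior_remaining)


# Example: 45 SAIDI minutes so far; the rest of the year added 70, 80, and
# 90 minutes in the previous three years, so the projection is 45 + 80 = 125.
print(year_end_projection(45.0, [70.0, 80.0, 90.0]))
```

This works for metrics that accumulate over the year, such as SAIDI and SAIFI. CAIDI is a ratio (SAIDI divided by SAIFI), so it is better to project SAIDI and SAIFI separately and divide the two projections.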
Several participants are trying to move from reporting to action.
The conversation was not just about producing metrics. Participants repeatedly connected better data to better follow-up, such as checking vegetation events, identifying repeat underground cable failures, reviewing circuits with high interruption counts, or deciding where field work is needed.
The group wants to see real examples in the next discussion.
Participants showed interest in seeing actual dashboards, scorecards, predictive methods, exception reports, and tools. There was also interest in understanding what others are using, whether internally built or commercial, and how far along those efforts are.
Next Steps
The follow-up forum is scheduled for May 14, 2026.