“The map is not the territory.” – Alfred Korzybski
On Sept. 9, 2010, a gas transmission pipeline rupture in San Bruno, CA killed eight people and caused extensive property damage.
The subsequent National Transportation Safety Board (NTSB) investigation revealed that some records in the operator’s Geographic Information System (GIS) were inaccurate, and that these inaccuracies may have contributed to the events leading to the rupture. This spurred the NTSB to call on operators to extensively verify and validate their pipeline records.
The subtext in all this is that regulatory agencies are deeply concerned about industry data management practices. Regulatory scrutiny will only increase as suburban populations expand and the pipeline infrastructure continues to age.
For a data governance professional, the key questions are: 1) How does incorrect data enter the system? 2) Why is critical data sometimes missing? 3) What can be done to rectify the situation?
Herein, we address the first two questions. The second article in the series will tackle the third, and the final article will address fundamental limitations of data governance and predictive modeling in general.
Humans are masters of abstraction. Unlike denizens of the animal kingdom, human interactions with the physical world are filtered through the lens of language. Data modelers extend the process of abstraction, reducing complex real-world entities to rows of data in a database. GIS data modelers take abstraction even further, reducing complex real-world features to attributed points, lines and polygons on a digital map. For GIS data modelers, the famous Korzybski quote (“The map is not the territory.”) rings especially true.
Pipeline data capture and storage in a GIS is a process of iterative abstraction. Data in the GIS is often distilled from many sources, with one source (often arbitrarily) designated the authoritative primary source. When a GIS database is first populated for an existing pipeline, common practice is to collect data from available manually drafted as-built alignment sheets. However, as-built alignment sheets are merely summaries of more detailed information.
At typical alignment sheet scales, much fine detail is absent. Only a limited volume of data can be displayed on an alignment sheet. Data that enters the GIS is only the tip of the pipeline data iceberg; the data iceberg itself is only a representation of the real iceberg that is the pipe in the ground.
Figure 2: Typical pipeline alignment sheet.
Older pipelines complicate matters. The sophisticated computerized inventory management systems we now enjoy did not exist in the 1950s when the San Bruno line was built; the importance of tracking every joint of pipe was unrecognized. For some older systems there simply may be no adequate records of detailed pipe data. This problem is amplified by asset churn. Many older systems have changed hands several times and data can sometimes be lost in these transactions.
Our data travails only begin with as-built data. In some instances, manually drafted as-built alignment sheets are not updated with ongoing repairs. Such alignment sheets do not reflect post-construction pipe changes. A GIS populated from such sheets lacks the prior pipe repair history. Furthermore, even though the GIS is a digital entity, many operators still rely on paper reports for field data capture, so capture of ongoing data updates in the GIS can be costly and time-consuming.
Finally, the data distillation process is error-prone. As data is transcribed, reduced, and summarized, opportunities for error multiply. GIS data conversions are typically performed by the lowest bidder, and the domain expertise of conversion technicians lies in GIS, not pipelines. Additionally, much forms-based pipeline data has no readily discernible spatial component, yet spatial context is a requirement of the GIS. Even if data attribution is captured flawlessly, the spatial location of the data may not be.
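Many such transcription errors can be caught by simple automated consistency checks on the converted data. The sketch below is illustrative only (the segment IDs, station values, and field names are hypothetical, not drawn from any particular operator’s data model): it flags gaps and overlaps in linear-referenced stationing between consecutive pipe segments, a common symptom of conversion or transcription defects.

```python
from dataclasses import dataclass

@dataclass
class PipeSegment:
    segment_id: str
    begin_station: float  # stationing (feet) along the pipeline centerline
    end_station: float

def find_stationing_defects(segments, tolerance=0.1):
    """Flag gaps and overlaps between consecutive pipe segments.

    A gap or overlap larger than `tolerance` (feet) between one
    segment's end station and the next segment's begin station
    suggests a transcription or conversion error in the data.
    """
    defects = []
    ordered = sorted(segments, key=lambda s: s.begin_station)
    for prev, curr in zip(ordered, ordered[1:]):
        delta = curr.begin_station - prev.end_station
        if delta > tolerance:
            defects.append(("gap", prev.segment_id, curr.segment_id, delta))
        elif delta < -tolerance:
            defects.append(("overlap", prev.segment_id, curr.segment_id, -delta))
    return defects

segments = [
    PipeSegment("A-001", 0.0, 520.0),
    PipeSegment("A-002", 520.0, 1040.0),
    PipeSegment("A-003", 1055.0, 1500.0),  # 15-ft gap: a likely data defect
]
print(find_stationing_defects(segments))  # reports one "gap" defect
```

Checks like this do not prove the data matches the pipe in the ground, but they do catch internal inconsistencies cheaply, before the data is accepted as authoritative.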
Our situation is challenging. By definition, our GIS paints an incomplete picture of the pipeline, and various factors conspire to exacerbate the problem. The process of data capture and distillation is often costly, error-prone and untimely. These challenges may seem unmanageable, but this is where modern data governance comes to our aid.
Data Governance is the marriage of traditional IT data management methods and technologies with modern business process management methodologies. The data capture and distillation process may be treated as a manufacturing process, and fortunately for pipeline operators, decades of manufacturing process management expertise may be brought to bear to improve pipeline data governance.
In the next installment, we’ll apply key lessons from three prominent schools of manufacturing process management to pipeline data governance: 1) Six Sigma, 2) Lean Manufacturing, and 3) Theory of Constraints. Correctly applied, these lessons can improve data governance, minimize data cycle time, and eliminate data defects.
Tracy Thorleifson is a co-founder of Eagle Information Mapping Inc. and has served as vice president since 1994. During his tenure with Eagle he has managed many enterprise Geographic Information System (GIS) implementations, and has overseen the development of much of Eagle’s proprietary GIS technology. He has played an important role in the development of several pipeline industry data models including the Pipeline Open Data Standard (PODS), the ArcGIS Pipeline Data Model (APDM) and PODS Esri Spatial. Thorleifson has been a lecturer at the University of Houston since 2003, teaching advanced GIS courses in the College of Natural Science and Mathematics. Prior to joining Eagle, he worked for 10 years at Shell Oil Co. as an exploration geologist and research manager. Thorleifson received his M.S. in Geology from Arizona State University in 1984.