New Concept Based On Big Data To Pipeline Risk Management

March 2015, Vol. 242, No. 3

Qingshan Feng, Petrochina Pipeline Company

In the last several decades, there has been a rapid rise in the use of pipeline risk assessment for managing pipeline safety. It has become an important part of pipeline integrity management, but risk assessment is not an absolute science.

Previous work has followed the traditional approach of using the pipeline experts’ score card: statistical and probabilistic analysis of historical failure data and associated risk parameters to model and predict the risk level of pipeline systems. The core issue with traditional risk-assessment models is that they rely only on limited sample data to analyze the risk of a whole system.

Although risk assessment requires a firm technical basis, the methods and criteria used are also strongly dependent on social, cultural and historical influences. Hence, different approaches are adopted in different parts of the world.

Methods used range from highly analytical and quantitative to more qualitative and almost subjective approaches. However, recent developments in data management technology – namely big data – have challenged the field of reliability or risk assessment because pipeline operators now can acquire comprehensive data on nearly all pipelines.

This has resulted in a transition of the risk-assessment approach from traditional limited analysis to a big data-based approach. Big data also raises the challenge of how to develop new risk-assessment models that use meter-to-meter aligned data to manage anomalies such as corrosion defects.

This article calls the traditional risk-assessment method into question and presents a new approach to pipeline risk management based on pipeline big data (PBD). Examples of girth weld big data and assessment models are used to illustrate the concept and demonstrate its value. Further work on quantifying the probability of failure and the confidence level of risk prediction is also identified.

Basics Of Pipeline Big Data

After “Big Data” by Viktor Mayer-Schonberger and Kenneth Cukier [2] was published in 2013, it became essential to ask the following fundamental questions about the pipeline industry: 1) What is pipeline big data? 2) How do we form PBD? 3) Under what conditions can PBD be considered to have formed, and is there existing PBD?

PBD is a unified and structured database or data sheet that aligns and integrates the external inspection, design and construction, operation, pipeline environment, and daily management data with inline inspection (ILI) data on a joint-to-joint basis. The basic condition to form the PBD is the availability of ILI information on the anomalies of the pipeline.

When ILI data is available, it is desirable to use it as the baseline to link, correlate and structure all other data into the PBD. Once the PBD is formed, the confidence level of the risk prediction improves greatly.
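The joint-to-joint alignment described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout, the `joint_id` key, the field names and the sample values are hypothetical and are not taken from any operator's database.

```python
def build_pbd(ili_records, *other_sources):
    """Align other data sets to the ILI baseline, keyed by joint ID.

    ili_records: list of dicts, one per joint (hypothetical layout).
    other_sources: (name, records) pairs, e.g. ("construction", [...]).
    """
    # The ILI run defines which joints exist in the table (the baseline).
    pbd = {rec["joint_id"]: {"ili": rec} for rec in ili_records}
    for name, records in other_sources:
        for rec in records:
            joint = pbd.get(rec["joint_id"])
            if joint is not None:  # skip records that match no ILI joint
                joint.setdefault(name, []).append(rec)
    return pbd


# Illustrative, made-up records
ili = [
    {"joint_id": "J-0001", "metal_loss_pct_wt": 0},
    {"joint_id": "J-0002", "metal_loss_pct_wt": 25},
]
construction = [{"joint_id": "J-0002", "weld_season": "winter"}]
repairs = [{"joint_id": "J-0002", "repair_year": 2010}]

pbd = build_pbd(ili, ("construction", construction), ("repairs", repairs))
print(pbd["J-0002"])
```

Keying every source on the ILI joint keeps one row per joint, so any later statistic can be computed joint by joint rather than per segment.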

Experience with our pipelines leads me to believe the era of PBD has arrived. Integrity management has been carried out in a systematic and structured way that enables the pipeline operator to use ILI data as a baseline for aligning, correlating and integrating large amounts of additional available data, including aboveground and in-ditch inspection data, design, manufacturing and construction data, and historical and daily management data, on a joint-to-joint (or meter-to-meter) scale to form PBD.

PBD, though possibly small in volume compared with overall pipeline systems, has already played a role in improving risk identification and assessment. This is in sharp contrast to the era when various kinds of pipeline data were stored or managed in an unstructured way.

Change Based On PBD

Viktor Mayer-Schonberger and Kenneth Cukier indicate that big data is a revolution that will transform how we live, work and think. Big data can bring major changes to pipeline risk assessment and to integrity management technology and practice, as follows:

If Nsample=All with 100% certainty, engineering risk assessment is not necessary.

Sampling is an outgrowth of an era of information-processing constraints in which people took measurements but lacked tools to extract all the data. In some cases, there is no other way but to sample.

However, limitations of today’s technology for data extraction in many industry areas no longer exist to this extent. In many areas, a shift is taking place from sampling (collecting) data to gathering as much data as possible, and if feasible, to getting everything, namely, Nsample=All.

Historically, risk assessment with various models has been used to predict the risk level of a pipeline or a segment with limited available information when the pipeline’s condition is not fully understood because of the lack of technology to get all the data.

In contrast, when the sample size equals the population (Nsample=All) and the certainty and confidence level of the collected data also equal one, the calculation of risk and reliability becomes simple. The traditional risk-assessment methods are no longer necessary.

In general, assessment of risk consists of two components: probability and consequence of failure. When uncertainties and assumptions are clearly present in the collected data, the assessment of probability of failure (reliability) is still necessary, even if Nsample=All.

However, when the assumptions and uncertainty factors (for example, ILI-sizing tolerance) are clearly understood and are covered by an appropriate safety factor (for example, the calibrated upper-bound sizing tolerance) in the assessment and repair, they can be ignored, making the reliability assessment unnecessary.
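One way to read the upper-bound idea is that the sizing tolerance is added to every reported depth before the repair decision is made, so the uncertainty is absorbed by the safety factor instead of being modeled probabilistically. The sketch below assumes that reading. The function name, the 10% wt tolerance and the 40% wt repair threshold are hypothetical placeholders, not values from this article.

```python
def needs_repair(reported_depth_pct_wt, sizing_tolerance_pct_wt,
                 repair_threshold_pct_wt):
    """Decide on repair using the upper-bound (reported + tolerance) depth.

    All depths are percentages of wall thickness (% wt).
    """
    # Conservative depth: assume the tool undersized by its full tolerance.
    upper_bound = min(reported_depth_pct_wt + sizing_tolerance_pct_wt, 100.0)
    return upper_bound >= repair_threshold_pct_wt


# Illustrative values only (assumed tolerance and threshold)
print(needs_repair(32, 10, 40))  # upper bound 42% wt, repair
print(needs_repair(25, 10, 40))  # upper bound 35% wt, no repair
```

Under this assumption every anomaly is judged deterministically, which is what makes a separate reliability calculation unnecessary.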

Since PBD has changed the basis for understanding the pipeline’s status, the effectiveness of the traditional risk-assessment methods and their resulting values needs to be reviewed and changed. This is clearly reflected in pipeline integrity evaluations made using big data.

The transformation in management can be seen when data dominates management’s decision-making on investments as well as the determination of the pipeline lifetime. Conclusions and decisions are made not on guesswork or engineering judgment, but on data analysis.

PBD allows a certain inexactitude, such as inaccuracy in ILI data (for example, tool-sizing tolerance). As long as the ILI tool performance (errors) is validated and quantified with probabilistic and statistical methods that show a high confidence level in the tool performance, and all the significant defects found in the PBD are repaired, the traditional risk assessment with limited aboveground information and failure incident data is no longer necessary and should not be encouraged.

That is because the resolution and confidence level provided by the traditional risk assessment are much lower than those of a PBD-based assessment. With the PBD-based assessment, the safety status of the pipeline is already clearly understood on a joint-to-joint level.
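A very simple form of such a validation compares ILI-reported depths with field-measured depths at excavated anomalies and reports the share that falls within the stated tolerance. The sketch below is a point estimate only, not a full confidence-level calculation. The paired depths and the ±10% wt tolerance are made-up illustrative values.

```python
def fraction_within_tolerance(ili_depths, field_depths, tolerance_pct_wt):
    """Share of paired measurements where |ILI - field| <= tolerance (% wt)."""
    if len(ili_depths) != len(field_depths) or not ili_depths:
        raise ValueError("need matched, non-empty depth lists")
    within = sum(
        abs(ili - field) <= tolerance_pct_wt
        for ili, field in zip(ili_depths, field_depths)
    )
    return within / len(ili_depths)


# Hypothetical verification digs: ILI-reported vs. field-measured depth (% wt)
ili_depths = [22, 35, 18, 41, 27]
field_depths = [25, 30, 20, 55, 24]

share = fraction_within_tolerance(ili_depths, field_depths, tolerance_pct_wt=10)
print(share)  # 4 of 5 pairs are within 10% wt
```

With only a handful of digs the estimate is coarse; a real validation would also state a confidence bound on this share.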

Shift Of Risk-Assessment Model

Big data makes it possible to determine the risk on a joint-to-joint scale based on intrinsic causes rather than predicting the likelihood using external information. Big data has changed the understanding of accidents and the corresponding risk-assessment methods.

Traditional risk-assessment methods in most cases either take the ratio of certain factors leading to accidents, derived from historical accident statistics, as the occurrence probability of accidents, or rely on engineering judgments based on the assessor’s personal knowledge and understanding of such accidents.

Therefore, it is common practice to predict the overall occurrence probability, as well as the possible presence of similar phenomena, using extrinsic information and historical data. PBD includes the historical data and covers a variety of information for each of the features in the pipeline.

The statistics on various high-occurrence factors causing pipe leaks have revealed the intrinsic causes of accidents, and these have been validated with the historical incident data. Therefore, the conditions for assessing the intrinsic causes of pipe leak or rupture are well defined.

Based on the tree life model of accident causation theory (TLMACT) [3], risk analysis can determine intrinsic causes. “Duty ratio” (DR), which compares potential risk factors to the number of failure factors, is the key to this model. DR statistics based on certain rules or criteria focus on statistical analysis of potential intrinsic causes of accidents.

The higher the percentage of accident-induced factors, the higher the probability of accidents at certain spots in the line. The leak/rupture probability can be judged through the analysis of leak/rupture conditions at specific spots or in a cluster of features bearing the same characteristics.
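The article does not give a formula for DR, so the sketch below rests on an assumption: DR for a feature is taken as the share of the accident-induced factors, identified from a past failure, that are also present at that feature. The function name, factor labels and weld IDs are hypothetical.

```python
def duty_ratio(feature_factors, failure_factors):
    """Assumed DR: share of known failure factors present at a feature."""
    failure_factors = set(failure_factors)
    if not failure_factors:
        raise ValueError("at least one failure factor is required")
    present = set(feature_factors) & failure_factors
    return len(present) / len(failure_factors)


# Factors identified from a past failure (illustrative labels)
failure_factors = {"metal_loss", "winter_weld", "repaired"}

# Factors recorded for each girth weld in the PBD (made-up data)
welds = {
    "GW-101": {"metal_loss", "winter_weld", "repaired"},
    "GW-102": {"winter_weld"},
    "GW-103": set(),
}

# Higher DR first: more shared factors, higher assumed likelihood of failure.
by_dr = sorted(
    welds, key=lambda w: duty_ratio(welds[w], failure_factors), reverse=True
)
print(by_dr)
```

An equal weight per factor is the simplest choice; a weighted version would need evidence on which factors contribute most to failure.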

Data Analysts’ Assessment

Traditionally, risk-assessment experts use their knowledge of risks applicable to the pipeline industry or rely on pipeline incident statistics to enable the risk assessment. With the availability of PBD, potential risks can be identified through statistics, according to certain rules or criteria.

In a specific case, anomalies detected in repaired welds by magnetic flux leakage (MFL) inspection, combined with PBD statistics on the key factors that have resulted in girth weld failure, allow us to identify the girth welds at higher risk of leakage.

The key factors in the PBD are manufacturing quality information for the weld, including the field welding team in the welding company and post-welding x-ray inspection data. With the available PBD, identification of those girth welds at a higher risk of leakage is not complicated.

These key factors could have been neglected by a risk analysis expert, but would not be missed by the statistics of PBD. Irrespective of the relationship between data and risk-assessment experts or the relationship between causes and consequences, statistics of PBD focus on the intrinsic causes of accidents, based on certain rules or criteria.

Case Study

A pipeline leaked at a defective girth weld in 2013 (Figure 1). A PBD database (taking ILI results as the baseline) for this pipeline was created. Analysis of this pipeline’s whole-life data revealed the key factors associated with the leakage, including welding process design, welding practice, quality of post-weld x-ray non-destructive testing (NDT), quality of on-site supervision, welding and repair temperature (season and time-related), pipeline operating pressure, soil environment and ILI results.

If a girth weld meets one or more of the criteria, such as 1) the presence of metal loss identified by MFL ILI run, 2) a winter manufacture date, or 3) previous repairs, there is a high probability of leakage. If whole-life data associated with the girth weld leaking are considered as intrinsic factors, the probability of failure for all other girth welds can be ranked accordingly.

After the girth weld failure, the operator ranked the probability of failure for all other girth welds in accordance with three criteria. Based on its condition, each girth weld is ranked from 1 to 7, with rank 1 being the highest and rank 7 the lowest probability of failure. The sequence follows:

1. Girth welds meeting criteria a, b and c, simultaneously
2. Girth welds meeting criteria a and b, simultaneously
3. Girth welds meeting criteria a and c, simultaneously
4. Girth welds meeting criteria b and c, simultaneously
5. Girth welds meeting only criterion a
6. Girth welds meeting only criterion b
7. Girth welds meeting only criterion c

Definition of criteria:

Criterion “a” refers to girth welds determined to be unsafe using BS7910 (at level II), based on the weld defect dimensions as reported by MFL ILI.

Criterion “b” refers to girth welds reported by MFL ILI to contain metal loss with depth over 20% wt.

Criterion “c” refers to girth welds made at tie-ins in winter, welds repaired in winter, or girth welds reported by MFL ILI to contain anomalies.
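The seven-level sequence above amounts to a lookup on which of the three criteria a weld meets. The Python sketch below shows one way to encode it. The function name, weld IDs and boolean flags are hypothetical, and deciding whether a weld meets criterion a (the BS7910 assessment) is assumed to have been done elsewhere.

```python
# Rank for each (a, b, c) combination, following the sequence in the text.
RANK_BY_CRITERIA = {
    (True, True, True): 1,
    (True, True, False): 2,
    (True, False, True): 3,
    (False, True, True): 4,
    (True, False, False): 5,
    (False, True, False): 6,
    (False, False, True): 7,
}


def rank_girth_weld(a, b, c):
    """Return rank 1 (highest probability of failure) to 7, or None."""
    # Welds meeting none of the criteria fall outside the ranking.
    return RANK_BY_CRITERIA.get((bool(a), bool(b), bool(c)))


# Made-up welds: flags say whether criteria a, b and c are met
welds = {
    "GW-201": (True, True, True),
    "GW-202": (False, True, False),
    "GW-203": (True, False, True),
    "GW-204": (False, False, False),
}

ranked = sorted(
    (wid for wid, flags in welds.items()
     if rank_girth_weld(*flags) is not None),
    key=lambda wid: rank_girth_weld(*welds[wid]),
)
print(ranked)
```

A table lookup keeps the ordering explicit and easy to audit against the published sequence, which matters more here than compactness.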

Under the traditional risk assessment, the common practice following a leakage accident is to increase the probability of leakage for all girth welds in the pipeline and to deem the manufacturing faults in the pipeline to be at a higher risk level.

It is not feasible for the traditional risk assessment to identify and rank risks against each girth weld. Therefore, the traditional approach may result in a higher risk score and increased cost to mitigate such risk. In contrast, PBD can rank and clear the condition for each of the girth welds in the pipeline without overspending on mitigation.

The PBD-based assessment determines whether the assessed pipeline is fit for purpose for continued safe operation. The PBD approach lays a solid foundation for decision-making and thus enables transformation of the management mode from traditional to PBD.

Issues To Explore

Knowledge about big data at the current stage is not totally comprehensive. The available data is not fully taken advantage of, and much data has not been included in the big data. There are many issues that need to be thought through and explored. Among them:

• How to determine whether the collected big data has reached the level of Nsample=All?
• How to consider the relationship between big data and accuracy?
• How to consider the relationship between big data and cause-consequence?
• How to prevent “survivorship bias” phenomena from occurring?

Big data will lead to a revolution in safe and cost-effective pipeline operation. However, there is a possibility that the management cost could increase and management efficiency could decrease if the big data-based risk-management mode is left unchanged. Consequently, the following key factors need to be considered:

• Should all data associated with the pipeline be included in the big data management system?

• Data analysis needs to be performed by the data analysts. However, selective use of data by the analysts is not allowed. Theoretically, risk assessment is required if the collected data cannot reach the level of Nsample=All.

• If there is any change in the risk-assessment methods, the risk-acceptance criteria need to be reconsidered in light of the available big data.

As the study of the effect of PBD on pipeline risk-assessment technology and management is at an early stage, priority needs to be given to the following aspects:

• Research into basic principles and management mode of PBD, as well as creation of PBD database

• Creation of a new theory of accident causation so as to set up a risk-assessment theory and model based on intrinsic causes


PBD, as well as the transformation brought forward by PBD in the field of pipeline risk assessment, is at a very early stage of exploration. With the availability of big data, fitness-for-purpose assessment using the intrinsic data will take the place of traditional risk assessment based on external superficial data. This will bring a thorough change in methods and operation mode.

It will not only change the concept of cognition, the way of thinking, and the system of relevant technology with respect to pipeline risks, but also pipeline construction, operation systems and methods in a more comprehensive way. It will also change administrative, regulatory and other decision-making on the part of the government and the public.

Forming of PBD will be the development trend in the future and will surely lead to a revolution in both risk-management technology and management. As a result, research and development of new risk-assessment technology will be the only option in the future.


1. Peter Tuft, Nader Yoosef-Ghodsi, John Bertram, “Benchmarking Pipeline Risk Assessment Processes,” IPC 2012-90045, Sept. 24-28, 2012, Calgary, Alberta, Canada.
2. Viktor Mayer-Schonberger and Kenneth Cukier, “Big Data: A Revolution That Will Transform How We Live, Work and Think,” Boston: Houghton Mifflin Harcourt Publishing Company, 2013.
3. Qingshan Feng, “Tree Life Model of Accident Cause Theory,” Oil & Gas Storage and Transportation, 2014, [2].
4. Chris Anderson, “The Long Tail: Why the Future of Business is Selling Less of More,” Oversea Publishing House.
