Key Considerations For Managing Unstructured Data

July 2013, Vol. 240 No. 7

Shahbaz Ali, Chief Executive Officer, Tarmin

Oil, gas and energy companies generate huge volumes of data through sensitive equipment – volumes that roughly double every two years. Growing amounts of data and an increased focus on data protection and reliability have resulted in a surge of interest in unstructured data management technologies.

In oil and gas companies, unstructured data is increasingly generated not by a human workforce but by machines: scientific instrumentation such as seismic exploration equipment, industrial sensors and meters, and calibration and monitoring devices. All of these produce exceptionally precise measurements and send an influx of unstructured data each day, on the scale of petabytes. The latest seismic sensor technology alone is driving significant growth in data assets. Visualization and modeling applications, many with unique data types, are quickly multiplying the amount of data being stored.

With 80% or more of all data estimated to be unstructured, there are many opportunities to improve what we refer to as the Three Pillars of Unstructured Data Management:

• Storing unstructured data
• Controlling risk
• Understanding information

As the volume and velocity of data in the industry increase, so do the challenges involved in data management. The costs and overheads involved in storage continue to increase, as do the difficulties in searching for and retrieving the right data at the right time. Critical metadata associated with industry-specific file types, such as .segy files, are difficult, if not impossible, for many data management platforms to interpret, understand and process.
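
To make the metadata point concrete, here is a minimal Python sketch of reading three fields from the binary file header of a SEG-Y file. It assumes the SEG-Y revision 1 layout (a 3,200-byte textual header followed by a 400-byte big-endian binary header); the function name, the dictionary keys and the in-memory demo buffer are illustrative only, and real files may deviate from the standard.

```python
import struct

TEXT_HEADER_BYTES = 3200    # textual file header (assumed SEG-Y rev 1 layout)
BINARY_HEADER_BYTES = 400   # binary file header that follows it


def read_binary_header(data: bytes) -> dict:
    """Extract a few descriptive fields from the SEG-Y binary file header."""
    if len(data) < TEXT_HEADER_BYTES + BINARY_HEADER_BYTES:
        raise ValueError("buffer too short to contain SEG-Y file headers")
    start = TEXT_HEADER_BYTES
    binary = data[start:start + BINARY_HEADER_BYTES]
    # Offsets are relative to the start of the binary header; they correspond
    # to file bytes 3217-3218, 3221-3222 and 3225-3226 (1-based) in rev 1.
    # Each field is a 16-bit big-endian integer.
    (sample_interval_us,) = struct.unpack_from(">H", binary, 16)
    (samples_per_trace,) = struct.unpack_from(">H", binary, 20)
    (format_code,) = struct.unpack_from(">H", binary, 24)
    return {
        "sample_interval_us": sample_interval_us,
        "samples_per_trace": samples_per_trace,
        "format_code": format_code,
    }


# Synthetic stand-in for the first 3,600 bytes of a file (not real survey data).
demo = bytearray(TEXT_HEADER_BYTES + BINARY_HEADER_BYTES)
struct.pack_into(">H", demo, TEXT_HEADER_BYTES + 16, 4000)  # 4 ms sampling
struct.pack_into(">H", demo, TEXT_HEADER_BYTES + 20, 1501)  # samples per trace
struct.pack_into(">H", demo, TEXT_HEADER_BYTES + 24, 1)     # sample format code
header = read_binary_header(bytes(demo))
print(header)
```

A platform that cannot parse even these few fields sees only an opaque blob, which is why search and retrieval by acquisition parameters becomes so difficult.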

A unified approach to unstructured data management can help almost any organization reduce lifetime total cost of ownership and gain a competitive advantage. Optimizing and strategically leveraging information assets will help accelerate discovery of hydrocarbons, streamline production operations, reduce operating costs, enhance compliance, minimize downtime, enable collaboration and mitigate risk.

Thinking Ahead
Exponential data growth is the primary IT challenge facing oil, gas and energy enterprises today. A relatively manageable 50 terabytes (TB) today, growing at a modest rate of 30% a year, becomes nearly 110 TB in three years’ time. Seven short years later, it’s almost 700 TB. That’s nearly 14 times the original volume (an increase of roughly 1,279%), and an enormous amount of data to manage.
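
The projection above is ordinary compound growth. The following minimal Python sketch reproduces it, assuming growth compounds once a year at a constant rate; the function name `project_storage` is illustrative. It reports the growth multiple and the percentage increase separately, because the two are easy to confuse.

```python
def project_storage(initial_tb: float, annual_growth: float, years: int) -> float:
    """Return the projected volume in TB after compounding for `years` years."""
    return initial_tb * (1.0 + annual_growth) ** years


start_tb = 50.0   # starting volume from the article
rate = 0.30       # 30% growth, assumed to compound annually

for years in (3, 10):
    projected = project_storage(start_tb, rate, years)
    multiple = projected / start_tb           # size relative to the start
    increase_pct = (multiple - 1.0) * 100.0   # growth over the start, in percent
    print(f"after {years} years: {projected:.1f} TB, "
          f"{multiple:.2f}x the original, +{increase_pct:.0f}%")
```

Changing `rate` or the list of horizons shows how sensitive a long-term storage budget is to even small differences in the assumed growth rate.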

With data retention periods growing, chances are IT will have to go through multiple technology refreshes before today’s data is retired. The organization will still need access to this information for innovation and discovery. Storage may be painful to manage today, and may be too expensive already, but in three or seven years, it will be oppressive to manage.

Focusing on the challenges of today, without setting aside sufficient time and money to address the challenges of tomorrow, results in lost opportunities and potentially dire circumstances. As the old proverb says, the best time to have planted a tree is 20 years ago; the next best time is now. Don’t wait until it’s too late to think ahead.

Massive growth requires massive scalability, and different platforms offer different trade-offs. We believe that x86/64 architectures harnessing the on-demand elastic power of the grid will be the basis for virtually all major large-scale data management projects going forward. What may follow is less easy to predict, but in almost every case, scalability and performance must be key considerations. If a particular solution wasn’t built to be both scale-up and scale-out from day one, it is unlikely to cope with the demands that growing data will place on it.

Today, cloud-enabled architectures are delivering scalability admirably. As cloud technologies further mature, numerous advances in managing unstructured data are anticipated. In the near future, look to advances in speed, accessibility and security to drive new use cases and adoption. In the longer term, the sky is the limit with the cloud and its potential to manage unstructured data. The cloud has improved, and will continue to improve, how we manage unstructured data, and “cloud-ready” scalable storage will doubtless reduce overall management costs in the long run.

Application Integration
Unstructured data within an organization is frequently at risk: It is dependent upon an application to be rendered, it often contains sensitive information, it is often unencrypted, and it is often the subject of electronic discovery requests – a veritable cornucopia of risk.

This data also happens to be among the most valuable of all information. In the oil and gas industry, discovery data such as seismic images and geographic surveys is not only potentially worth billions of dollars; it also costs millions to produce in the first place. Every day, the industry grows more dependent on the digital oil field, i-field, e-field and the specialized data that lies beneath it.

This data relies on an application to provide continuous access and value. Too often the application itself is not a primary consideration when thinking about managing unstructured data – but it really needs to be. Beyond a shadow of a doubt, unstructured data management solutions must be compatible with key applications and investments. Otherwise, data is put at additional risk for future access or protection.

Data protection is a significant concern in the energy industry. Since information is such a key asset to energy organizations, long-term data retention is mandatory. The integrity of the data must outlast hardware refreshes and physical failures. Additionally, data security and auditing are key concerns for such highly sensitive information. Finally, energy commodities organizations are subject to significant regulatory retention and data compliance requirements.

An information governance framework in today’s networked and hypercompetitive, information-centric economy needs a fully policy-driven toolkit for retention and tamper-proof protection of data resources. Unstructured data is ubiquitous. Even if it’s outside the corporate firewall, it’s still a potential problem. International Data Corporation (IDC) estimates that corporations have responsibility for as much as 80% of total global data at some point during that data’s lifecycle; data is everywhere and, therefore, protecting it must be all-encompassing and pervasive.

Big Data
“Big Data” has become the new favorite term of every vendor, analyst and media type. The excitement around Big Data is not without merit. Application-aware data mining and analytics reporting are possible today, based on data flows from unstructured data stores. Understanding underlying trends gives oil and gas enterprises actionable insights from their data, such as the ability to preempt future threats or capitalize on opportunities.

But recent research from IDC concludes that today only 3% of data is tagged and less than 0.5% is analyzed. IDC further estimates that by 2020, as much as 23% of data will contain value that modern Big Data techniques and technologies can extract and harness. This represents an enormous new reservoir of potential that will create as yet undreamt-of value. Promising sources of data containing this hidden value include modeling and simulations, GIS data, video, embedded device data, social media data, voice and Internet traffic data. As we live amid unprecedented data growth, we must begin laying the foundations and exploring the tools and technologies that will help position us to capture more value from data. Big Data and its potential make it a key consideration in managing unstructured data.

Unstructured data growth presents both great challenges and opportunities. Data management platforms must manage the growth of exploration and discovery data from upstream processes, and deliver data retention, protection and high availability access at lower cost.

Strategies for unstructured data management are not limited to the key considerations raised here. There are other circumstances within each industry and each operation. In the oil and gas sector, storing unstructured data, controlling risk and understanding information are critical for reducing overall costs and maintaining a competitive advantage.

Shahbaz Ali has more than 20 years’ experience creating world-class software-driven solutions for enterprise clients. Ali holds a bachelor of science degree in software engineering from London South Bank University and has completed a doctoral course of study in software requirements engineering at the Open University. Tarmin is a global organization with headquarters in Cambridge, MA, and offices in Greater London and Bulgaria, with additional locations scheduled to open shortly.

