
organized chaos: structuring images, PDFs, contracts, surveys, and more

organized chaos: structured data

The Times Are A-Changin’

When I started my career in Oil & Gas more than 25 years ago, things were different. We did a lot of work on paper. I remember my first job after graduating: I was a reservoir engineer assigned responsibility for a newly acquired asset. A mountain of well files was brought to my office in three-ring binders and boxes. In school, volumetric calculations were easy because all the data was neatly provided to you. In the real world, however, I spent countless hours searching first for the wells of interest and then for the reservoir details I needed.

Fast forward to the age of digitalization, Big Data, Artificial Intelligence and every other buzzword associated with Computer Science and Silicon Valley, and you quickly see how the dusty paper files of yesteryear are becoming more important. Yes… I said more important. (You expected me to say obsolete, didn’t you?)

The fact is, these documents contain valuable information, and not just for historical preservation and archiving. I’m talking about untapped resources holding a wealth of information that could literally change the way E&P Operators and Oilfield Service Companies operate. Buried deep within contracts, surveys, permits, legal agreements, purchase orders, BoMs (bills of materials) and more is unharnessed data that, if synthesized, analyzed and visualized properly, can reveal patterns and behaviors.


First Things First

A quick vocabulary lesson so we’re all on the same page. The data I referenced previously is considered unstructured. If you Google ‘unstructured data’, Wikipedia gives you this definition: “Unstructured data (or unstructured information) is information that either does not have a pre-defined data model or is not organized in a pre-defined manner. Unstructured information is typically text-heavy, but may contain data such as dates, numbers, and facts as well.”

Clear as mud isn’t it?

Basically, unstructured data is any information that has not been extracted, normalized and organized in a meaningful, consistent manner for search and retrieval. For example, the pictures we take every day would be considered unstructured. Even if you add some metadata such as place, date, or context, for most people an exhaustive search of their image library to find all the photos containing trees would be almost impossible without human involvement. Unless you have indexed every image and organized the information (a very time-consuming process), you would have a hard time counting the pictures in your collection that contain trees, let alone working out the ratio of pine trees to oak trees or of night to day images.

This is a rudimentary example, but you get the picture (no pun intended). Unstructured data is any information that isn't specifically structured (organized) to be easy for machines to understand. Historically, virtually all computer code required information to be highly structured according to a predefined data model in order to be processed. For example, relational databases organize data into tables, rows and fields with constrained data types.

Unfortunately, most real-world information isn't like this. It’s predominantly unstructured data that cannot be searched or analyzed easily, which represents a huge opportunity, particularly in the Oil and Gas Industry.
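To make the contrast concrete, here is a minimal sketch of what the photo library could look like once it has been structured: each photo becomes a row with typed, constrained fields, and the tree questions above become a short query. It uses Python's built-in sqlite3 module; the photos table, its columns and the sample rows are hypothetical, and the tags are assumed to have already been filled in by a person or a model.

```python
import sqlite3

# Hypothetical schema: one row per photo, holding tags already extracted from the image.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE photos (
        id          INTEGER PRIMARY KEY,
        taken_on    TEXT NOT NULL,                                   -- ISO-8601 date
        tree_type   TEXT CHECK (tree_type IN ('pine', 'oak', 'none')),
        is_daytime  INTEGER NOT NULL CHECK (is_daytime IN (0, 1))
    )
    """
)
conn.executemany(
    "INSERT INTO photos (taken_on, tree_type, is_daytime) VALUES (?, ?, ?)",
    [
        ("2019-06-01", "pine", 1),
        ("2019-06-02", "oak", 1),
        ("2019-06-03", "pine", 0),
        ("2019-06-04", "none", 1),
    ],
)

def tree_counts(conn):
    """Count photos per tree type, skipping photos tagged as having no trees."""
    rows = conn.execute(
        "SELECT tree_type, COUNT(*) FROM photos WHERE tree_type != 'none' "
        "GROUP BY tree_type ORDER BY tree_type"
    ).fetchall()
    return dict(rows)

print(tree_counts(conn))  # {'oak': 1, 'pine': 2}
```

The CHECK constraints are the "constrained data types" in action: the database rejects a tag like 'maple' that falls outside the model, which is exactly the rigidity real-world documents don't offer.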


structured data vs unstructured data

So Why Now?

Advances in Artificial Intelligence, in particular Machine Learning, now allow computers to learn to find relevant information in unstructured content. Natural language processing (NLP) models learn contextual syntax to identify relevant information in unstructured documents, and computer vision models learn in a similar way from images. Using our image library example above, a machine could learn to identify trees by training a model on labeled examples. Likewise, you could train it to distinguish night from day.
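Production NLP models are far more sophisticated, but the core idea of "training a model" on labeled examples can be sketched with a tiny multinomial naive Bayes text classifier that sorts lease clauses into categories. This is an illustrative stand-in, not a description of any particular platform; the training clauses and the "damages"/"royalty" labels are made up.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    """Lowercase and split into runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())

class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, doc):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # log P(label) + sum over words of log P(word | label), smoothed
            score = math.log(n_docs / total_docs)
            for word in tokenize(doc):
                score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Hypothetical labeled training clauses; a real model would need far more examples.
train_docs = [
    "Lessee shall pay Lessor damages per acre if Lessee fails to pool",
    "Lessee shall pay damages to surface crops and improvements",
    "Lessor shall receive a royalty of one fifth of gross proceeds",
    "Royalty shall be free of post production costs and deductions",
]
train_labels = ["damages", "damages", "royalty", "royalty"]
model = NaiveBayes().fit(train_docs, train_labels)
print(model.predict("pay damages per acre"))  # damages
```

The "human insight" part is the labeling: people decide which clauses belong in which category, and the model generalizes from those decisions to documents nobody has read yet.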

Technology and computing capacity have opened the door to building these models rapidly, calibrated by human insight, which significantly reduces the time and resources spent on manual data input and organization. This has a number of advantages, including:

  1. Speed - what would take humans an inordinate amount of time to accomplish, a machine can analyze and categorize efficiently.
  2. Accuracy - data integrity and consistency are a major pitfall of human-organized information. Ensuring each team member catalogs and tags information in a useful manner is a monumental task. Once a model is sufficiently trained, it delivers accurate results and applies the same criteria consistently to every record.
  3. Cost - efficiency and accuracy bring cost savings.

For example, at Oseberg we’ve used technology to deliver unprecedented quality in our Oklahoma, Texas and New Mexico E&P datasets (lease, regulatory, production, wells, etc.). We’ve coupled domain knowledge and expertise with advanced technology to assess millions of records rapidly and cost-effectively. Along the way, we evaluated the solutions available on the market but didn’t find any that satisfied our needs and specific E&P use cases. Ultimately, we invested in developing our own proprietary platform capable of processing large volumes of unstructured documents, so we could avoid hiring an army of people to type in the information manually, an expensive and time-consuming endeavor.

This not only allows us to bring data to market quickly (an operational advantage) but also provides the backbone of an incredibly valuable and robust dataset, with much greater detail, that our clients can use to build brand new workflows that otherwise would not be possible.


Consider This Scenario

When acquiring assets, you typically have up to 90 days to conduct due diligence on leases. This is normally a time-consuming, manual effort, so to keep costs under control it is usually focused solely on the high-value leases and the formations of most interest. The purchaser knows the basics needed to get the deal done but lacks a good understanding of all the assets in terms of depth restrictions, royalty deductions, etc., which could become liabilities down the road.

A recent example we found involved a “damages clause” that requires the lessee to pay the lessor “$1000/acre in damages” in the event it cannot successfully pool the surrounding interests required to drill a well. This is an unusual term. Using our Full Text Search capability across our entire TX county records dataset, we were able to instantly identify an additional 141 leases in the surrounding area that contained the same punitive liability.
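How our Full Text Search is built isn't covered here, but a common way to find an exact phrase across many documents is a positional inverted index: for every word, record which documents contain it and at which positions, then check that the phrase's words appear next to each other in order. The minimal sketch below assumes that approach; the lease IDs and snippets are hypothetical.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase and split into runs of letters/digits ("$1000/acre" -> ["1000", "acre"])."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(docs):
    """Positional inverted index: token -> {doc_id: [positions of that token]}."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, token in enumerate(tokenize(text)):
            index[token][doc_id].append(pos)
    return index

def phrase_search(index, phrase):
    """Return sorted ids of documents containing the tokens of `phrase` adjacent and in order."""
    terms = tokenize(phrase)
    if not terms:
        return []
    # Only documents that contain every term can possibly contain the phrase.
    candidates = set(index.get(terms[0], {}))
    for term in terms[1:]:
        candidates &= set(index.get(term, {}))
    hits = []
    for doc_id in candidates:
        # The phrase matches at position p if term i sits at position p + i for every i.
        if any(
            all(p + i in index[t][doc_id] for i, t in enumerate(terms[1:], start=1))
            for p in index[terms[0]][doc_id]
        ):
            hits.append(doc_id)
    return sorted(hits)

# Hypothetical lease snippets; lease-004 contains every word but not the exact phrase.
leases = {
    "lease-001": "Lessee shall pay Lessor $1000/acre in damages if Lessee cannot pool the lands.",
    "lease-002": "Lessor shall receive a royalty of one-fourth of production.",
    "lease-003": "If unable to pool, Lessee agrees to pay $1000/acre in damages to Lessor.",
    "lease-004": "Damages in excess of 1000 dollars per acre are excluded.",
}
index = build_index(leases)
print(phrase_search(index, "$1000/acre in damages"))  # ['lease-001', 'lease-003']
```

The position check is what separates a phrase search from a plain keyword search: lease-004 mentions damages, 1000 and acre, but not as the clause in question, so it is correctly left out.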

But without a tool like this, how would you find such a term beyond a single lease, or at a larger scale… say, county-wide?

What new insights might you unveil by integrating datasets and analyzing the patterns to look for trends? Could you gain a competitive advantage, optimize your cost structure, or conduct more thorough due diligence prior to buying or selling assets?

Considering that much of this valuable data is locked away in file cabinets' worth of unstructured legal documents, you would need an army to synthesize and analyze the information. Imagine if you could simply search for and extract the data points that reveal a substantial financial liability, one that could easily be missed in due diligence because the trigger is literally buried in the ‘fine print’.
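Finding a clause is half the job; the other half is pulling the data point out of it so it can be compared, summed and mapped like any other structured field. The sketch below uses a simple rule (a regular expression) rather than a trained model, and the pattern, the DamagesTerm record and the extract_damages helper are all hypothetical. It turns a per-acre dollar amount in free lease text into a typed value.

```python
import re
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional

# Hypothetical pattern: "$" + amount (optional thousands commas and cents) + "/acre" or "per acre".
PER_ACRE = re.compile(
    r"\$\s?(\d{1,3}(?:,\d{3})+|\d+)(?:\.(\d{2}))?\s*(?:/|per\s+)acre",
    re.IGNORECASE,
)

@dataclass
class DamagesTerm:
    """One structured data point pulled out of free lease text (hypothetical record type)."""
    lease_id: str
    dollars_per_acre: Decimal

def extract_damages(lease_id: str, text: str) -> Optional[DamagesTerm]:
    """Return the first per-acre dollar amount found in `text`, or None if there is none."""
    match = PER_ACRE.search(text)
    if match is None:
        return None
    whole = match.group(1).replace(",", "")
    cents = match.group(2) or "00"
    return DamagesTerm(lease_id, Decimal(f"{whole}.{cents}"))

term = extract_damages("lease-001", "Lessee shall pay Lessor $1000/acre in damages if Lessee cannot pool.")
print(term)  # DamagesTerm(lease_id='lease-001', dollars_per_acre=Decimal('1000.00'))
```

Decimal is used instead of float so that dollar amounts add up exactly when many leases are totaled. A hand-written pattern like this is brittle against the wording variations found across thousands of leases, which is where trained models and subject matter experts earn their keep.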

That’s what technology used in parallel with subject matter expertise can provide - organization to the chaos that delivers true business intelligence.




Dean Williams, Managing Director - Oseberg Consulting


