MATRIC is using ‘big data’ computing to help the Department of Energy’s (DOE) National Energy Technology Laboratory (NETL) perform and enhance data mining, analysis, simulation and modeling of spatial and non-spatial data characterized by high volume, velocity and variety. Big data computing combines hardware (computing clusters) with software technologies (Hadoop, Spark, MapReduce) to realize value from “Big Datasets” that are too large for a single computer to process. It resembles high performance computing in that both approaches use clusters of computers (cloud or locally hosted) to distribute and complete complex tasks. However, big data computing is specifically designed to process and analyze larger-scale datasets.

The initiative has multiple ongoing big data computing tasks that integrate Hadoop clusters with geoscience computing and geospatial analytical methods (including ESRI’s ArcGIS tools and libraries) to develop approaches capable of processing massive amounts of data. One such effort focuses on cataloging, discovery and data mining: it sifts through terabytes of unstructured data from multiple sources (web crawling, document parsing, geospatial files/services) and correlates relevant data using natural language processing and machine learning. The result is a recommendation engine that promotes discovery by analyzing each data item in depth and generating a relevance score for how well it correlates with other data catalogued by the system (i.e., similar to “” recommendations for data rather than products). Additionally, the team has ‘Big Data GIS’ efforts under development that harness Hadoop-based computing clusters to perform distributed, iterative analysis of geospatial data in support of subsurface and geospatial studies (e.g., induced seismicity risk).
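To make the idea of a relevance score concrete, here is a minimal sketch of one common way to score how closely catalogued text items relate to each other: TF-IDF weighting combined with cosine similarity. The article does not say how the team's engine computes its score, so this technique, the function names and the three sample catalog entries are all assumptions chosen for illustration. The sketch also runs on a single machine, whereas the work described above distributes the analysis across a Hadoop cluster.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Build one sparse TF-IDF vector (a dict of term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: how many documents contain each term at least once.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Smoothed IDF keeps a positive weight even for terms found everywhere.
        vectors.append({
            term: count * (math.log((1 + n_docs) / (1 + df[term])) + 1.0)
            for term, count in tf.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def relevance_scores(docs, query_index):
    """Rank every other document by similarity to docs[query_index]."""
    vectors = tfidf_vectors(docs)
    query = vectors[query_index]
    scores = [
        (i, cosine(query, vec))
        for i, vec in enumerate(vectors)
        if i != query_index
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical catalog entries, invented for this example only.
catalog = [
    "induced seismicity risk near wastewater disposal wells",
    "seismicity and fault risk from underground fluid disposal",
    "offshore wind turbine maintenance schedule",
]

ranked = relevance_scores(catalog, 0)
for index, score in ranked:
    print(f"{score:.3f}  {catalog[index]}")
```

In this toy catalog the second entry ranks above the third because it shares terms with the first entry, while the third shares none. A production system would likely add richer language processing and spread the pairwise comparisons across cluster nodes.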

Vic Baker, MATRIC senior systems engineer, has been leading the development of ‘big data’ clusters and applications in collaboration with researchers from DOE-NETL’s Research Innovation Center. Vic recently presented a paper highlighting aspects of this collaboration at the 50th US Rock Mechanics/Geomechanics Symposium, held June 26-29, 2016, in Houston. The paper, titled “Computational Advances and Data Analytics to Reduce Subsurface Uncertainty,” details the team’s big data research for offshore hydrocarbon, carbon storage, enhanced geothermal, unconventional resources and underground fluid disposal. Vic also co-presented with DOE-NETL researcher and contract lead Dr. Kelly Rose at the DOE-NETL-hosted “2016 Mastering the Subsurface Through Technology Innovation and Collaboration: Carbon Storage and Oil and Natural Gas Technologies Review Meeting” in Pittsburgh, Pa., on August 18, 2016.

The team continues to innovate by combining big data computing capabilities with geoprocessing tools and custom algorithms to address energy resource and environmental challenges.

— For more information on ‘big data,’ contact Mark Dehlin.