Berkeley Lab

A New Tool to Integrate Diverse Environmental Data

Conceptual figure showing how the BASIN-3D broker would connect to various data sources across organizations and present an integrated view of the data to the user.

The Science

Earth data include measurements and model results of physical, chemical, and biological processes in ecosystems. The data are diverse and often stored across many databases, with different formats and conventions. BASIN-3D is a tool that lowers the burden on scientists of integrating data for their research. It is designed as a “broker” that retrieves data on demand from different sources and transforms them into a unified view. This paper presents two applications of BASIN-3D to integrate time series (measurements recorded at successive points in time, often at different intervals). The first is for advanced search and exploration of data on a web portal. The second is to provide data to machine learning models for water quality predictions.

The Impact

The BASIN-3D software helps address some critical challenges faced by environmental researchers who use data from public and private sources. It helps automate the process of pulling together data from different sources. Users thus have access to the latest data available from providers of their choice, without having to manually download data and reconcile differences. This software can be used to support data integration for both web-based tools and data analytics. It is applicable to environmental field and modeling studies requiring data integration.

Summary

Earth scientists expend significant effort integrating data from multiple data sources for both modeling and data analyses. We introduce BASIN-3D (Broker for Assimilation, Synthesis and Integration of eNvironmental Diverse, Distributed Datasets) as a data brokering approach to reduce the data processing burden on scientists. BASIN-3D can synthesize diverse data from different sources on demand, without the need for additional storage. BASIN-3D is currently implemented to integrate time series Earth observations across a hierarchy of spatial locations commonly used in field measurements (such as river basins, watersheds, sites, plots, wells). It has a framework to enable its users to map data sources of interest to a common format. The utility of this tool is demonstrated in two applications. The first is a web portal that allows scientific users to explore and access data through features such as an interactive map, graphs, and download. The second is a Python package that can be embedded in scripts to input data to machine learning models for water quality predictions. Hence BASIN-3D can be used to support data integration for both web-based tools and data analytics.
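The brokering idea described above can be illustrated with a minimal Python sketch. This is not the BASIN-3D API: the `Observation` record, the `Broker` class, the two data sources, and all field names are hypothetical and invented for illustration only. The sketch assumes each source has its own naming and unit conventions, and that a small per-source mapping function translates its records into one common format when a query is made, so nothing is stored centrally.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Observation:
    """Hypothetical common format that every source is mapped into."""
    source: str
    location: str
    variable: str
    timestamp: str  # ISO 8601 date string
    value: float
    unit: str


# Two made-up providers with different field names and units.
def fetch_source_a() -> List[dict]:
    return [{"site": "A-01", "param": "NO3", "date": "2021-05-01",
             "val": 1.2, "units": "mg/L"}]


def fetch_source_b() -> List[dict]:
    return [{"station_id": "B7", "analyte": "nitrate", "time": "2021-05-01",
             "result_ug_l": 900.0}]


# Per-source mappings to the common vocabulary and unit (mg/L).
def map_source_a(rec: dict) -> Observation:
    names = {"NO3": "NITRATE"}
    return Observation("source_a", rec["site"], names[rec["param"]],
                       rec["date"], rec["val"], rec["units"])


def map_source_b(rec: dict) -> Observation:
    names = {"nitrate": "NITRATE"}
    # Source B reports micrograms per liter; convert to mg/L.
    return Observation("source_b", rec["station_id"], names[rec["analyte"]],
                       rec["time"], rec["result_ug_l"] / 1000.0, "mg/L")


Fetch = Callable[[], List[dict]]
Mapper = Callable[[dict], Observation]


class Broker:
    """Retrieves records on demand and returns them in the common format."""

    def __init__(self) -> None:
        self._plugins: Dict[str, Tuple[Fetch, Mapper]] = {}

    def register(self, name: str, fetch: Fetch, mapper: Mapper) -> None:
        self._plugins[name] = (fetch, mapper)

    def query(self, variable: str) -> List[Observation]:
        unified: List[Observation] = []
        for fetch, mapper in self._plugins.values():
            for rec in fetch():  # fetched at query time, not cached
                obs = mapper(rec)
                if obs.variable == variable:
                    unified.append(obs)
        return sorted(unified, key=lambda o: (o.timestamp, o.source))


broker = Broker()
broker.register("source_a", fetch_source_a, map_source_a)
broker.register("source_b", fetch_source_b, map_source_b)

results = broker.query("NITRATE")
for obs in results:
    print(obs)
```

In this sketch, adding a new provider only requires one fetch function and one mapping function. Code that consumes `results`, whether a web portal or a machine learning script, sees a single format regardless of where the data came from.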

Citation

Varadharajan, C. et al. “BASIN-3D: A brokering framework to integrate diverse environmental data”. Computers & Geosciences 105024 (2022) [DOI:10.1016/j.cageo.2021.105024].