Berkeley Lab

A New Tool to Integrate Diverse Environmental Data

Conceptual figure showing how the BASIN-3D broker would connect to various data sources across organizations and present an integrated view of the data to the user.
Figure from BASIN-3D Documentation

The Science

Earth data include measurements and model results of physical, chemical, and biological processes in ecosystems. The data are diverse and often stored across many databases, with different formats and conventions. BASIN-3D is a tool that helps lower the burden on scientists to integrate data for their research. It is designed as a “broker” that retrieves data on demand from different sources and transforms it into a unified view. This paper presents two applications of BASIN-3D to integrate time series (data collected at different time intervals). The first is for advanced search and exploration of data on a web portal. The second is to provide data to machine learning models for water quality predictions.

The Impact

The BASIN-3D software helps address some critical challenges faced by environmental researchers who use data from public and private sources. It helps automate the process of pulling together data from different sources, so users can access the latest data available from providers of their choice without manually downloading data and reconciling differences. The software can support data integration for both web-based tools and data analytics, and it is applicable to environmental field and modeling studies requiring data integration.

Summary

Earth scientists expend significant effort integrating data from multiple data sources for both modeling and data analyses. We introduce BASIN-3D (Broker for Assimilation, Synthesis and Integration of eNvironmental Diverse, Distributed Datasets) as a data brokering approach to reduce the data processing burden on scientists. BASIN-3D can synthesize diverse data from different sources on demand, without the need for additional storage. BASIN-3D is currently implemented to integrate time series Earth observations across a hierarchy of spatial locations commonly used in field measurements (such as river basins, watersheds, sites, plots, and wells). It has a framework that enables its users to map data sources of interest to a common format. The utility of this tool is demonstrated in two applications. The first is a web portal that allows scientific users to explore and access data through features such as an interactive map, graphs, and download. The second is a Python package that can be embedded in scripts to input data to machine learning models for water quality predictions. Hence BASIN-3D can be used to support data integration for both web-based tools and data analytics.
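The brokering idea described above can be illustrated with a minimal sketch. It does not use the actual BASIN-3D package or its API. The `Broker` class, the two data sources, their field names, and the `Observation` schema are all hypothetical, chosen only to show how per-source mapping functions can turn records with different conventions into one common format on demand, without storing a copy of the data.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Observation:
    """One time series value in a common format (hypothetical schema)."""
    source: str
    location: str
    variable: str
    timestamp: str
    value: float
    unit: str


# Two made-up data sources with different field names and units.
def fetch_source_a(location: str) -> List[dict]:
    # Source A reports water temperature in degrees Fahrenheit.
    return [{"site": location, "param": "WT", "time": "2021-06-01T00:00", "val": 50.0}]


def fetch_source_b(location: str) -> List[dict]:
    # Source B already reports degrees Celsius, under different keys.
    return [{"station": location, "name": "water_temp_c", "date": "2021-06-01T00:00", "reading": 11.5}]


def map_source_a(record: dict) -> Observation:
    celsius = (record["val"] - 32.0) * 5.0 / 9.0  # convert to the common unit
    return Observation("A", record["site"], "water_temperature", record["time"], celsius, "degC")


def map_source_b(record: dict) -> Observation:
    return Observation("B", record["station"], "water_temperature", record["date"], record["reading"], "degC")


Fetch = Callable[[str], List[dict]]
Mapper = Callable[[dict], Observation]


class Broker:
    """Retrieves records on demand and maps them to the common format."""

    def __init__(self) -> None:
        self._plugins: Dict[str, Tuple[Fetch, Mapper]] = {}

    def register(self, name: str, fetch: Fetch, mapper: Mapper) -> None:
        self._plugins[name] = (fetch, mapper)

    def query(self, location: str, variable: str) -> List[Observation]:
        results = []
        for fetch, mapper in self._plugins.values():
            for record in fetch(location):
                obs = mapper(record)
                if obs.variable == variable:
                    results.append(obs)
        return results


def build_demo_broker() -> Broker:
    broker = Broker()
    broker.register("A", fetch_source_a, map_source_a)
    broker.register("B", fetch_source_b, map_source_b)
    return broker
```

Calling `build_demo_broker().query("site-1", "water_temperature")` returns one observation per source, both in degrees Celsius. A script feeding a machine learning model would then see a single format regardless of provider.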

Citation

Varadharajan, C. et al. “BASIN-3D: A brokering framework to integrate diverse environmental data”. Computers & Geosciences 105024 (2022) [DOI:10.1016/j.cageo.2021.105024].

Advanced Methods to Better Predict Watershed Responses to Climate Change

The watershed zonation method applies unsupervised clustering to various spatial data layers to group hillslopes with similar above- and below-ground environmental features.

The Science

More than half of Earth’s freshwater comes from mountainous watersheds. Watersheds are a “system of systems,” meaning there are many interacting compartments (such as bedrock, soil, snow, and plants) that affect their functioning. Predicting watershed behavior is challenging because different environmental processes and characteristics, at different scales and levels from bedrock to the atmosphere, affect watershed function and water quality. To understand how watersheds may respond to droughts as climate changes, researchers used data from the Colorado East River Watershed to develop a watershed zonation approach: a method that uses machine learning to characterize entire watersheds by grouping zones of similar functioning and characteristics, like watershed “zip codes.” The team grouped hillslopes specifically because these features are a functional unit in hydrology, capturing water flow and a range of environmental characteristics like elevation, topography, and vegetation. This method not only combines multiple state-of-the-art airborne remote sensing data layers of different types and scales to identify zones with similar bedrock-to-canopy features, but also shows how these zones respond to disturbances in different ways, advancing holistic, large-scale predictions of watershed response to change.

The Impact

Watershed function can significantly impact energy production, agriculture, and water quality and availability. Now that environmental disturbances such as drought, wildfires, and floods mark what many have called a “new normal” state, scientists can no longer depend on historical trends to project future watershed behavior, and instead need to develop new approaches to studying watershed response to environmental changes. However, predicting watershed behavior is challenging because watersheds are extremely heterogeneous, with complex interactions taking place across different Earth compartments, from tree canopy to the deep subsurface, as well as from one hillslope in a watershed to another. Using machine learning, researchers organized the watershed research site into zones based on similar environmental features, and were able to show how different zones process and export nutrients and respond to droughts. By using multiscale spatial data layers to capture different characteristics throughout a watershed, this approach allows for more accurate large-scale predictions of watershed responses to climate change. Understanding these responses is essential for managing and protecting critical freshwater resources as water demand continues to increase.

Summary

In this study, we develop a watershed zonation approach for characterizing watershed organization and function in a tractable manner by integrating multiple spatial data layers. We hypothesize that (1) a hillslope is an appropriate unit for capturing the watershed-scale heterogeneity of key bedrock-through-canopy properties, and for quantifying the co-variability of these properties representing coupled ecohydrological and biogeochemical interactions; (2) remote sensing data layers and clustering methods can be used to identify watershed hillslope zones having unique distributions of these properties relative to neighboring parcels; and (3) property suites associated with the identified zones can be used to understand zone-based functions, such as response to early snowmelt or drought, and solute exports to the river. We demonstrate this concept using unsupervised clustering methods that synthesize airborne remote sensing data (LiDAR, hyperspectral, and electromagnetic surveys) along with satellite and streamflow data collected in the East River Watershed, Crested Butte, Colorado, USA. Results show that (1) we can define the scale of hillslopes at which the hillslope-averaged metrics can capture the majority of the overall variability in key properties (such as elevation, net potential annual radiation, and peak snow water equivalent, SWE), (2) elevation and aspect are independent controls on plant and snow signatures, (3) near-surface bedrock electrical resistivity (top 20 m) and geological structures are significantly correlated with surface topography and plant species distribution, and (4) K-means, hierarchical clustering, and Gaussian mixture clustering methods generate similar zonation patterns across the watershed. Using independently collected data, we show that the identified zones provide information about zone-based watershed functions, including foresummer drought sensitivity and river nitrogen exports.
The approach is expected to be applicable to other sites and generally useful for guiding the selection of hillslope-experiment locations and informing model parameterization.
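To make the clustering step concrete, here is a minimal sketch of K-means applied to hillslope-averaged features. It is not the authors' workflow. The feature values (elevation in meters, peak SWE in millimeters) are made up, the implementation is a small pure-Python K-means with a deterministic initialization, and the function names are illustrative assumptions.

```python
import math
from typing import List, Sequence


def standardize(rows: Sequence[Sequence[float]]) -> List[List[float]]:
    """Scale each feature column to zero mean and unit variance."""
    columns = list(zip(*rows))
    means = [sum(c) / len(c) for c in columns]
    # Fall back to 1.0 when a column is constant to avoid division by zero.
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
            for c, m in zip(columns, means)]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


def kmeans(points: Sequence[Sequence[float]], k: int, iterations: int = 50) -> List[int]:
    """Return a zone label for each point; centers start at the first k points."""
    centers = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest center by Euclidean distance.
        new_labels = [min(range(k), key=lambda j: math.dist(p, centers[j]))
                      for p in points]
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == j]
            if members:
                centers[j] = [sum(c) / len(members) for c in zip(*members)]
        if new_labels == labels:
            break
        labels = new_labels
    return labels


# Made-up hillslope averages: (elevation in m, peak SWE in mm).
HILLSLOPES = [
    (2800.0, 300.0), (3500.0, 700.0), (2850.0, 320.0),
    (3550.0, 720.0), (2900.0, 310.0), (3600.0, 690.0),
]


def zone_hillslopes(k: int = 2) -> List[int]:
    return kmeans(standardize(HILLSLOPES), k)
```

Standardizing first keeps elevation, which has the larger numeric range, from dominating the distance calculation. With real data layers, a library implementation and a comparison across clustering methods, as the study reports, would be the more robust choice.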

Citation

Wainwright, H. M., Uhlemann, S., Franklin, M., Falco, N., Bouskill, N. J., Newcomer, M., … & Hubbard, S. S. (2022). Watershed zonation approach for tractably quantifying above-and-belowground watershed heterogeneity and functions. Hydrol. Earth Syst. Sci., 26, 429–444. DOI: 10.5194/hess-26-429-2022.

Computer Science Fills Groundwater Data Gaps and Advances Water Level Predictions

We develop an approach for estimating missing groundwater data at a study site located in the East River watershed, a high-elevation catchment in southwestern Colorado (a). Seven monitoring wells (WLE1 to WLE7, marked as 1 to 7) are located in the East River watershed floodplain (b).

The Science

Sixty to 90 percent of the world’s water comes from alpine watersheds, but without continuous data about characteristics like groundwater levels, temperature, and precipitation, it can be nearly impossible for scientists to understand groundwater dynamics well enough to help predict the amount or quality of water coming from those mountains. Machine learning (ML), which uses computational methods to make predictions and inferences from data, can help estimate data that are missing because of power outages, equipment failures, or extreme weather events, using previously gathered datasets. Researchers recently evaluated several ML-based techniques to infer data missing from datasets previously obtained at multiple wells in the East River Watershed in southwestern Colorado. The team developed a new sequential approach that uses existing data from previous time periods to estimate the missing extremes of a hydrograph, which shows the rate of water flow over time. This approach allows missing groundwater data in the East River to be estimated with high accuracy.

The Impact

Environmental datasets such as groundwater data are often incomplete, with missing entries caused by adverse weather conditions, delays in collecting sensor data, and other factors. Scientists rely on data about previous groundwater levels to predict the availability, quality, and function of freshwater. However, without continuous datasets, it is challenging to use scientific models that require these data to properly predict groundwater functioning. Researchers showed that ML techniques could be used to fill in gaps in these data series using previous data from a single well or data from surrounding wells. This new approach could also be transferred to gap-fill other environmental datasets like precipitation and soil moisture. Complete groundwater and other environmental data are critical to monitor how freshwater and other natural resources may change as climate and environmental conditions change.

Summary

It is not uncommon for groundwater data series to have missing records due to factors like malfunctioning technology and physical disturbances. Researchers explored several techniques to gap-fill groundwater datasets, focusing on missing data patterns that are either random, such as data missing from one day in a series of several days, or contiguous, such as a lack of data for an entire month during an observed time period. The researchers considered both single-well and multiple-well settings: in the single-well case, missing entries are gap-filled using that same well’s time-series data, and in the multiple-well case, available data from neighboring wells are used to gap-fill a specific well’s missing data. They compared three machine-learning methods to understand which was better at estimating missing data for either the random or contiguous patterns. All three were shown to estimate up to 90% of random gaps in the groundwater time series over a two-year period. Multiple-well methods could effectively estimate up to 50% of contiguous gaps, but failed to capture extremes for the same period. The research team has developed an effective strategy to capture missing extremes in the groundwater time series and demonstrated its application across multiple wells in the Colorado East River floodplain.
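The distinction between random and contiguous gaps can be sketched in a few lines. This is only an illustration of locating and labeling gaps, not the study's imputation method, which used machine learning. The function names and the `contiguous_min` threshold, the run length at which a gap is called contiguous, are assumptions made for this example.

```python
from typing import List, Optional, Tuple


def find_gaps(series: List[Optional[float]]) -> List[Tuple[int, int]]:
    """Return (start_index, length) for each run of missing values (None)."""
    gaps = []
    start = None
    for i, value in enumerate(series):
        if value is None:
            if start is None:
                start = i  # a new gap begins here
        elif start is not None:
            gaps.append((start, i - start))
            start = None
    if start is not None:
        # The series ends inside a gap.
        gaps.append((start, len(series) - start))
    return gaps


def classify_gaps(series: List[Optional[float]],
                  contiguous_min: int = 3) -> List[Tuple[int, int, str]]:
    """Label each gap 'contiguous' if it spans at least contiguous_min steps
    (an assumed threshold), otherwise 'random'."""
    return [
        (start, length, "contiguous" if length >= contiguous_min else "random")
        for start, length in find_gaps(series)
    ]
```

For example, `classify_gaps([1.0, None, 1.2, None, None, None, 1.6, None])` labels the single missing readings at indices 1 and 7 as random and the three-step run starting at index 3 as contiguous. A gap-filling workflow could then route each kind of gap to a different estimation strategy.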

Citation

Dwivedi, D., Mital, U., Faybishenko, B., Dafflon, B., Varadharajan, C., Agarwal, D., Williams, K. H., Steefel, C. I., and Hubbard, S. S. Imputation of Contiguous Gaps and Extremes of Subhourly Groundwater Time Series Using Random Forests. Journal of Machine Learning for Modeling and Computing, Volume 3, Issue 2, 2022. DOI: 10.1615/JMachLearnModelComput.2021038774

Field-scale Estimation of Soil Properties from Spectral Induced Polarization Tomography

2D estimates of cation exchange capacity, water content, grain size, and permeability were obtained along a 45 m ecosystem transect through development and demonstration of a field geophysical method called spectral induced polarization tomography. Image courtesy of the authors (A. Revil et al.)

The Science

Properties that influence how fluids flow and react in soils are difficult to measure using conventional techniques. This challenge stems from the time and cost involved in collecting and analyzing soil samples, and from the fact that collecting soil samples can disturb the property of interest. This study describes how a surface geophysical approach, called spectral induced polarization tomography, can be used to estimate soil chemical and physical properties over field scales without disturbing the soil. The authors demonstrated the geophysical approach by collecting data along an ecosystem transect. Analysis of the data led to high-resolution estimates of important soil properties over the top 4 meters of soil along the transect, including cation exchange capacity, water content, grain size, and permeability. Comparison of the obtained estimates with lab and soil core measurements indicated good agreement.

The Impact

This is the first-ever field-scale estimation of soil hydrogeochemical properties using spectral induced polarization tomography. The ability to remotely quantify soil hydrological and geochemical properties in high resolution and over field-relevant scales, as demonstrated by this study, is expected to be useful for many applications, including watershed and ecosystem investigations, geotechnical engineering, and agriculture.

Summary

Estimates of soil properties such as cation exchange capacity (CEC), water content, grain size, and permeability are important in geotechnical engineering, water resources, and agriculture. We develop a non-intrusive approach to estimate these properties in the field using spectral induced polarization (SIP) tomography. This geophysical method provides information about the frequency dependence of the complex electrical conductivity of porous media. Using 18 soil samples collected from a managed ecosystem, we first conducted a laboratory study using SIP over the frequency range 10 mHz-45 kHz. The laboratory data were used to confirm the accuracy of a recently developed dynamic Stern layer petrophysical model. The field complex conductivity spectra were then compared with the experimental data at two locations where core samples were obtained. The model was then used in concert with field data to image the spatial distribution of CEC, water content, permeability, and mean grain size along a 2D transect. For the clay and sandy textures found in the field, good agreement was found between measured and estimated CEC values. Our approach provides an efficient way to estimate important soil properties in a non-invasive manner, in high resolution, and over field-relevant scales of the critical zone of the Earth.

Citation

Revil, A., Schmutz, M., Abdulsamad, F., Balde, A., Beck, C., Ghorbani, A., Hubbard, S.S. (2021). Field-scale estimation of soil properties from spectral induced polarization tomography. Accepted in Geoderma. DOI: 10.1016/j.geoderma.2021.115380

What a Low-to-No-Snow Future Could Mean for the Western U.S.

The Science

Mountain snowpack acts as a large natural reservoir, providing water resources to communities, ecosystems, energy, and industry upon spring snowmelt. Because up to 75% of the western region’s water resources originate in mountainous watersheds, decreasing snowpack threatens the resiliency of the systems that depend on snowmelt water. This research synthesizes historical observations of western U.S. snow loss over the 20th century and develops a range of projected snowpack conditions in the 21st century. The study highlights that western U.S. snowpack will likely decrease substantially over the next ~35-60 years, especially if high greenhouse gas emissions continue.

The Impact

Comparable to recent western snowpack declines, future snowpack losses are projected to reach 20-30% by the 2050s and 40-60% by the 2100s. But there is potential to build resilience to future low-to-no snow conditions using a portfolio of adaptation strategies. Models used to project future water cycle changes need to be improved to provide water resource managers with estimates that are better suited to decision making. The development of new atmosphere-through-bedrock modeling capabilities is needed, and could greatly benefit from non-traditional scientific-stakeholder partnerships.

Summary

This study synthesizes observational evidence of snow loss in the western U.S. over the 20th century and develops a range of projected snowpack conditions in the 21st century, elevating understanding of the importance of snow loss for water resources. Results show that there is less consensus on the time horizon of future snow disappearance, but model projections suggest that if carbon emissions continue unabated, low-to-no snow conditions will become persistent in ~35–60 years, depending on the mountain range. We propose a new low-to-no snow definition that uses a percentile approach, akin to the U.S. Drought Monitor, and considers sequencing of 1, 5, or 10 low-to-no snow years via a framework describing those losses as “extreme, episodic, or persistent.” Potential trickle-down impacts on mountain landscapes, hydrologic cycles, and subsequent water supply are also discussed. For example, diminished and more ephemeral snowpacks that melt earlier will alter groundwater and streamflow dynamics, but the direction of these changes is difficult to constrain given competing factors such as higher evapotranspiration, altered vegetation composition, and changes in wildfire behavior in a warmer world. A re-evaluation of long-standing hydroclimatic stationarity assumptions in western U.S. water management is urgently needed, given the impending impacts of snowpack loss. These hydroclimatic changes undermine conventional western U.S. water management practices, but through proactive implementation of soft and hard adaptation strategies, there is potential to build resilience to extreme, episodic and, eventually, persistent low-to-no snow conditions. Finally, suggestions are provided for the scientific breakthroughs, management strategies, and institutional partnerships that will be needed to overcome a future with less or no snow.
Co-production of knowledge between scientists and water managers can help to ensure that scientific advances provide actionable insight and support adaptation decision-making processes that unfold in the context of significant uncertainties about future conditions.
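The percentile-and-sequencing definition described in the summary can be sketched as follows. The summary does not state the percentile cutoff, so the `pct=30.0` default is an assumption for illustration only. The mapping of run lengths to labels follows the 1, 5, and 10 year sequencing mentioned above, and treating runs of 2 to 4 years as "extreme" is a simplification of this sketch. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def percentile_threshold(reference: Sequence[float], pct: float) -> float:
    """Linearly interpolated percentile (0-100) of the reference values."""
    ordered = sorted(reference)
    if len(ordered) == 1:
        return ordered[0]
    rank = (pct / 100.0) * (len(ordered) - 1)
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (rank - lower) * (ordered[upper] - ordered[lower])


def low_snow_flags(peak_swe: Sequence[float], reference: Sequence[float],
                   pct: float = 30.0) -> List[bool]:
    """Flag years whose peak SWE is at or below the assumed percentile cutoff."""
    threshold = percentile_threshold(reference, pct)
    return [value <= threshold for value in peak_swe]


def classify_runs(flags: Sequence[bool]) -> List[Tuple[int, str]]:
    """Label each run of consecutive flagged years by its length."""
    labels = []
    run = 0
    for flagged in list(flags) + [False]:  # sentinel closes a trailing run
        if flagged:
            run += 1
            continue
        if run >= 10:
            labels.append((run, "persistent"))
        elif run >= 5:
            labels.append((run, "episodic"))
        elif run >= 1:
            labels.append((run, "extreme"))
        run = 0
    return labels
```

Given a historical reference period and a projected series of annual peak SWE values, `classify_runs(low_snow_flags(projected, reference))` lists each low-to-no snow episode with its length and label.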

Citation

Siirila-Woodburn, E.R.*, A.M. Rhoades*, et al. “A low-to-no snow future and its impacts on water resources in the western United States.” Nature Reviews Earth and Environment (2021) [DOI: 10.1038/s43017-021-00219-y]. Open access: https://rdcu.be/cAivm. *Equally contributing first authors.

A hybrid data-model approach to map soil thickness in mountain hillslopes

Soil thickness map. (a) The map of soil thickness from modeling. (b1) and (b2) Comparison between model and field measurements for the south-facing and north-facing hillslopes, respectively. The error bars along the x-axis are the differences between auger and cone penetrometer (CPT) data. Gray and green dots indicate sampling sites whose bottom is bedrock and saprolite, respectively.

The Science

Soil thickness plays a central role in the interactions between vegetation, soils, and topography, where it controls the retention and release of water, carbon, nitrogen, and metals. However, mapping soil thickness (here defined as the mobile regolith layer) at high spatial resolution remains challenging. An accurate soil thickness map can improve estimates of the dynamics of water, carbon, nitrogen, and other elements for hydrologic and biogeochemical modeling, but soil thickness remains one of the key uncertainties because of the complexity of the factors that affect it.

The Impact

A new hybrid model combines a process-based model with empirical relationships to reveal the fundamental mechanisms controlling soil thickness and to explain its spatial variability. Because the hybrid model generalizes these mechanisms, it is applicable to various sites. The soil thickness map can be an essential input for Earth system models, particularly for land surface models.

Summary

Here, the authors develop a hybrid model that combines a process-based model and empirical relationships to estimate the spatial heterogeneity of soil thickness at fine spatial resolution (0.5 m). The authors apply this model to two hillslopes with different aspects (southwest- and northeast-facing) in the East River Watershed in Colorado. Two independent measurement methods (auger and cone penetrometer) are used to sample soil thickness at 78 locations to calibrate the local values of unconstrained parameters within the hybrid model. Sensitivity analysis using the hybrid model reveals that the diffusion coefficient used in hillslope diffusion modeling has the largest sensitivity among all input parameters. In addition, results from both sampling and modeling show that, in general, the northeast-facing hillslope has a deeper soil layer than the southwest-facing hillslope. Compared with a machine learning approach, the hybrid model provides higher accuracy and requires less sampling data. Modeling results further reveal that the southwest-facing hillslope has slightly faster surface soil erosion and soil production rates than the northeast-facing hillslope, which suggests that the relatively less dense vegetation cover and drier surface soils on the southwest-facing slopes influence soil properties. With seven parameters in total for calibration, this hybrid model can provide a realistic soil thickness map from a relatively small sampling dataset compared with a machine learning approach. Integrating process-based modeling and statistical analysis not only provides a thorough understanding of the fundamental mechanisms for soil thickness prediction, but also combines the strengths of both statistical and process-based modeling approaches.

Citation

Yan, Q., Wainwright, H., Dafflon, B., Uhlemann, S., Steefel, C. I., Falco, N., Kwang, J., and Hubbard, S. S.: A hybrid data–model approach to map soil thickness in mountain hillslopes, Earth Surf. Dynam., 9, 1347–1361, https://doi.org/10.5194/esurf-9-1347-2021, 2021

SAIL Comes to East River: A Climate Observatory to Understand the Future of Water

The Surface Atmosphere Integrated Field Laboratory (SAIL) will make its debut at the East River, CO Watershed in September 2021.

Science Magazine highlighted the upcoming deployment in a story included in their August 27 issue (first online Aug 24). 

Read more about SAIL in this latest storyboard.

Testing geological origins of fast groundwater pathways using machine learning

(a) Data from the East River valley, Colorado showing an anomaly – a geological feature that is different from its surroundings. A new machine learning method tests multiple interpretations of how this feature could have formed, demonstrating that one (b) is consistent with the measured data while the other (c) is not.
Image courtesy of Alex Miltenberger

The Science

Groundwater provides about a third of Earth’s freshwater, yet much is still unknown about where and how water moves underground. Geological features affect groundwater movement, but these structures often can’t be seen from Earth’s surface. Understanding how these features may have formed can help enhance knowledge about the broader behavior and structure of watersheds, allowing for better predictions of freshwater movement. A team of scientists developed a method to map underground flow pathways and understand how they formed. The researchers used Bayesian hypothesis testing to compare multiple interpretations, or scenarios, for what created the flow pathways, such as a crack in Earth’s surface or rock-mass movements. These interpretations were then ranked, using machine learning, by how consistent they are with the measured data. The method was applied at a fractured bedrock zone (an area of cracked and crushed subsurface rock) in the Elk Mountains of central Colorado, where water flows much faster through the fractures than in surrounding rock. The method demonstrated that the fractured bedrock was most likely created by a fault or sedimentary layer.

The Impact

Sustainable management of groundwater is becoming urgent as groundwater resources are increasingly withdrawn in response to population increase and climate change. Mapping groundwater flow pathways is crucial for understanding freshwater behavior and movement. This research shows that machine learning can help scientists understand how the geology of an area forms groundwater flow pathways, and can be applied to enhance freshwater resource management. In places affected by drought or contamination, knowing the path of groundwater flow can help conserve water or stop the spread of contaminants.

Summary

Certain structures in the Earth form groundwater “highways”, where water moves faster than normal. Finding these structures is crucial for understanding when and where groundwater moves. When flow pathways are hidden below the surface, they are found by sending electrical, magnetic, and other signals into the ground and measuring how the ground responds. Since different geological formations respond differently to the signals, we can use the signals to find places underground that are likely to contain groundwater flow pathways. However, multiple geological structures can have similar responses, which makes it hard to choose the best interpretation of how these structures could have formed. A team of scientists developed a method to test multiple interpretations of these types of signals.

The proposed method has three parts. First, for each proposed interpretation, the signals and measurements are simulated on a computer. Second, the researchers compare the simulated data to the field data for each interpretation. Finally, using machine learning, the team ranks each interpretation according to how closely it matches data gathered in the field. The research team applied this method to a zone of fractured rock in the Elk Mountains of central Colorado. Six interpretations were proposed and ranked according to how closely they match the measurements. The team concluded that the fractured rock is from either a fault or a sedimentary layer.
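The simulate, compare, and rank steps above can be sketched in simplified form. This is not the authors' machine learning procedure. As a stand-in, each interpretation is scored with a Gaussian-style likelihood of the misfit between its simulated data and the field data, and the scores are normalized into relative weights. The hypothesis names, the `noise_std` parameter, and the simulator functions are hypothetical.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Simulator = Callable[[], List[List[float]]]


def rms_misfit(simulated: Sequence[float], observed: Sequence[float]) -> float:
    """Root-mean-square difference between simulated and field data."""
    squared = sum((s - o) ** 2 for s, o in zip(simulated, observed))
    return math.sqrt(squared / len(observed))


def rank_hypotheses(simulators: Dict[str, Simulator],
                    observed: Sequence[float],
                    noise_std: float = 1.0) -> List[Tuple[str, float]]:
    """Return (name, relative weight) pairs, best-matching hypothesis first."""
    scores = {}
    for name, simulate in simulators.items():
        realizations = simulate()  # step 1: simulate data for this hypothesis
        # Step 2: compare each realization to the field data.
        likelihoods = [
            math.exp(-0.5 * (rms_misfit(r, observed) / noise_std) ** 2)
            for r in realizations
        ]
        scores[name] = sum(likelihoods) / len(likelihoods)
    total = sum(scores.values())
    if total == 0.0:
        # No hypothesis is distinguishable at this noise level: equal weights.
        return [(name, 1.0 / len(scores)) for name in scores]
    # Step 3: rank hypotheses by normalized score.
    weights = [(name, score / total) for name, score in scores.items()]
    return sorted(weights, key=lambda item: item[1], reverse=True)
```

With field data `[10.0, 12.0, 30.0, 12.0, 10.0]`, a hypothesis whose simulations reproduce the central anomaly receives nearly all of the weight, while one that predicts a flat response receives almost none. In practice, the simulators would be geophysical forward models and the weights would also need to account for prior probabilities.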

Citation

A. Miltenberger, et al. “Probabilistic Evaluation of Geoscientific Hypotheses with Geophysical Data: Application to Electrical Resistivity Imaging of a Fractured Bedrock Zone”. Journal of Geophysical Research: Solid Earth. 126, e2021JB021767 (2021). [DOI: 10.1029/2021JB021767]

Carroll Speaks on NPR Morning Edition about Monsoons and Snowpack

SFA researcher Dr. Rosemary Carroll (Desert Research Institute) was featured on NPR’s Morning Edition in a story titled “Rain During Monsoon Season Is Becoming Less Reliable, Less Effective.” Carroll references results from Watershed Function SFA research showing how low snowpack results in conditions that lessen the amount of streamflow generated by monsoon rains.