Overcoming Data Scarcity in Earth Science
Angela Gorgoglione, Alberto Castro Casales, Christian Chreties Ceriani, Lorena Etcheverry Venturini

Environmental mathematical models are one of the key aids that scientists use to forecast, create, and evaluate complex scenarios. These models rely heavily on data collected by direct field observations. However, assembling a functional and comprehensive dataset for any environmental variable is difficult, mainly because of i) the high cost of monitoring campaigns and ii) the low reliability of measurements (e.g., due to equipment malfunctions and/or issues related to equipment location). A lack of sufficient Earth science data may lead to an inadequate representation of the complexity of an environmental system's response to any type of input or change, whether natural or human-induced. In such a case, before undertaking expensive studies to gather and analyze additional data, it is reasonable to first understand how much estimates of system performance would improve if all the available data were fully exploited.

Missing data imputation is an important task when it is crucial to use all available data rather than discard records with missing values. Several approaches are available. Traditional statistical data completion methods are used in different domains for single and multiple imputation problems. More recently, machine learning techniques, such as clustering and classification, have been proposed to complete missing data. This book showcases the body of knowledge aimed at improving the capacity to exploit the available data to better represent, understand, predict, and manage the behavior of environmental systems at all practical scales.