
R is a well-developed, simple, and effective programming language and an integrated environment for statistical computing and data analysis. It provides a wide variety of statistical techniques (e.g., linear and nonlinear modeling, classical statistical tests, time-series analysis, classification, clustering, and raster data processing). The growth of R's usage in water science is reflected in the number of newly published packages that address water problems.
In this book, we explore the use of different R packages to solve water science problems related to rainfall, drought, evapotranspiration, and streamflow analysis. We cover advanced statistical and machine learning models, including modified trend analysis, Autoregressive Integrated Moving Average (ARIMA), Artificial Neural Network (ANN), hybrid models (ARIMA-ANN, ARIMA-SVR), multivariate modeling, spatiotemporal time series modeling, and count time series modeling for forecasting. We also cover Remote Sensing and GIS applications for mapping and understanding changing water resources.
Water is a valuable natural resource that is essential to all aspects of life on Earth. About 70% of our planet is covered by water, but only 3% of the total is freshwater, and two-thirds of that is locked in glaciers or otherwise unavailable for our use. Presently, 1.1 billion people globally lack access to good drinking water, and another 2.7 billion find water scarce for at least one month of the year. India has an annual average precipitation of 1,170 mm, and about 80% of its total area receives annual rainfall of 750 mm or more, but this rainfall shows large spatial and temporal variability. A growing urban population and land use changes will exacerbate water scarcity, while climate change and bio-energy demands amplify the complex relationship between world development and water demand. Climate change and its associated extreme floods and droughts have also become a major concern in the recent past. There is therefore a need to understand the changing dynamics of water resources across geographical space and time, so that natural resources can be managed effectively using advanced tools such as Remote Sensing and GIS technology together with R programming.
Introduction
An integrated approach to understanding the changing dynamics in water resources involves taking into account various factors that influence the management of water resources under changing climatic conditions (Pahl-Wostl, 2007). It is a holistic approach that combines different disciplines, including hydrology, climatology, ecology, geography, and social sciences, to develop a comprehensive understanding of water resources. The integrated approach is essential for several reasons, including addressing climate change impacts, sustainable management of water resources, enhancing water security, addressing water quality issues, and promoting collaboration among stakeholders. Climate change is causing alterations in precipitation patterns, melting glaciers, and changes in river flows, affecting water availability and quality. An integrated approach helps in understanding the climate change impacts on water resources and developing adaptation strategies (Asif et al., 2023).
Introduction
The R language is a popular and versatile tool for hydrological data analysis and visualization, offering several powerful packages for handling and analyzing hydrological parameters such as river flow, precipitation, groundwater levels, and water quality parameters. R provides a wide range of statistical functions for hydrological parameter analysis, including time-series analysis, frequency analysis, trend analysis, and regression analysis. Moreover, R's powerful graphics capabilities make it possible to create high-quality and customizable visualizations of hydrological data. In addition to its statistical and graphical capabilities, R offers a wide range of packages for data manipulation and cleaning, making it a powerful tool for preprocessing hydrological data. R's ability to handle large datasets also makes it well suited to processing and analyzing large hydrological datasets. Furthermore, R has a strong community of developers and users who regularly contribute new packages and tools, providing a constantly evolving platform for hydrological analysis. Overall, R offers numerous capabilities and resources for informed decision-making in water resources management and related fields.
Introduction
Hydrological data visualization plays a crucial role in analyzing and understanding the behavior and characteristics of water resources. By visualizing hydrological data, we can gain insight into the distribution, variability, and trends of different hydrological variables such as river flow, precipitation, groundwater levels, and water quality parameters. This information can support decision-making in areas such as water resources management, flood forecasting, and climate adaptation. R is a popular and versatile tool for hydrological data visualization, offering several powerful packages for data analysis and visualization. Each of these packages has its own strengths and limitations, which can be leveraged depending on the specific needs of a given analysis. For example, ggplot2 is an extensively used package for creating publication-quality static plots that are customizable and visually appealing. This package is based on the grammar of graphics, which provides a plotting system that combines the benefits of both base and lattice graphics. ggplot2 can be used to create a wide range of plot types, including time-series plots, spatial maps, and scatterplots. It provides several options for customizing plot elements such as axis labels, legends, and plot themes. On the other hand, plotly offers interactive and dynamic plots that can be easily shared and embedded in web applications. With plotly, we can create plots that allow for zooming, panning, and hovering over data points to display additional information. This package is mainly useful for exploring relationships between hydrological variables. In addition to these packages, hydroTSM provides functions for reading, manipulating, and plotting time-series data. hydroTSM can help us create time-series plots of variables such as river flow, precipitation, and temperature, with options for adding trend lines, error bars, and annotations.
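As a rough illustration of the ggplot2 workflow described above, the following sketch plots a monthly river flow series with a fitted linear trend line. The data are synthetic and the names `flow_df` and `flow_plot` are made up for this example; it assumes the ggplot2 package is installed.

```r
# Sketch only: the flow values are simulated, not real observations.
library(ggplot2)

set.seed(1)
dates <- seq(as.Date("2010-01-01"), by = "month", length.out = 120)
month_index <- seq_along(dates)

# Hypothetical monthly flow: seasonal cycle + slow upward drift + noise
flow <- 50 + 20 * sin(2 * pi * month_index / 12) +
  0.1 * month_index + rnorm(120, sd = 4)
flow_df <- data.frame(date = dates, flow = flow)

flow_plot <- ggplot(flow_df, aes(x = date, y = flow)) +
  geom_line(colour = "steelblue") +                      # the time series
  geom_smooth(method = "lm", formula = y ~ x,            # linear trend line
              se = TRUE, colour = "firebrick") +
  labs(title = "Synthetic monthly river flow",
       x = "Date", y = "Flow (m^3/s)") +
  theme_minimal()

print(flow_plot)
```

Axis labels, the legend, and the theme are all set through added layers, which is the customization style the grammar of graphics encourages.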
Introduction
Evapotranspiration is a vital process in the Earth's water cycle, and it involves two main processes: evaporation and transpiration. Evaporation refers to the conversion of water from its liquid state to its gaseous state, which occurs primarily from the surface of bodies of water, wet vegetation, and soils. This process requires energy, which is typically provided by direct solar radiation and ambient air temperature. Transpiration, on the other hand, is the process by which plants release water vapor through small openings in their leaves called stomata. This process is essential for plant growth and function as it helps regulate the plant's temperature and allows for nutrient uptake and water transport. Like evaporation, transpiration also requires energy, with factors like vapor pressure gradient and wind speed influencing the rate at which water is released. To accurately measure evapotranspiration, it is necessary to consider a range of factors such as temperature, radiation, wind speed, and humidity. Plant parameters, such as the presence of a cuticle or the size and number of stomata, can also greatly influence the transpiration rate. Other factors that may impact evapotranspiration include the season, time of day, and soil properties such as water holding capacity and root depth. Evapotranspiration is a complex concept that has been studied by various scientists and organizations over time. Three main evapotranspiration (ET) concepts have been developed: reference ET, ET under standard conditions, and ET under nonstandard conditions. The goal of determining potential or reference evapotranspiration is to exclude crop-specific variations from the evapotranspiration process.
The potential evapotranspiration concept, introduced by Penman in the late 1940s and 1950s, assumes constant crop conditions and is defined as “the amount of water transpired in a given time by a short green crop, completely shading the ground, of uniform height and with adequate water status in the soil profile.” Reference evapotranspiration (ETo) is a climatic parameter that represents the evaporating power of the atmosphere at a certain location and time of the year and is computed from weather data. It does not take into account crop characteristics or soil conditions, and the only factors affecting it are climatic parameters. ETo ranges from 1–2 mm/day in humid and subhumid temperate regions to 6–9 mm/day in warm arid and semi-arid temperate regions, and from 2–3 mm/day in cool humid and subhumid tropics to 6–8 mm/day in warm arid and semi-arid tropics. Actual ET is the amount of water actually used by a large area of actively growing grass that covers the entire soil surface. The amount of water that is actually removed from a surface through the processes of evaporation (E) and transpiration (T) is known as ETa. Crop water needs can be calculated by the following equation.
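The equation itself is not reproduced in this text. Assuming the intended relation is the widely used single crop coefficient approach, crop evapotranspiration is obtained by scaling the reference value with a crop coefficient. The symbols $ET_c$ and $K_c$ are not defined in the passage above and are introduced here as assumptions:

```latex
ET_c = K_c \times ET_o
```

Here $ET_c$ is the crop evapotranspiration (mm/day), $ET_o$ is the reference evapotranspiration (mm/day), and $K_c$ is a dimensionless crop coefficient that depends on crop type and growth stage. For a purely illustrative case with $ET_o = 5$ mm/day and $K_c = 1.1$, the crop water need would be $5.5$ mm/day.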
Introduction
Droughts are considered to be a significant environmental disaster and have captured the attention of various experts, including environmentalists, ecologists, hydrologists, meteorologists, geologists, and agricultural scientists. These natural calamities are known to occur in almost all climatic zones, regardless of whether they are high- or low-rainfall areas. Generally, a decrease in the amount of precipitation received over an extended period, such as a season or a year, is the primary cause of droughts. However, other factors such as high temperatures, strong winds, low relative humidity, and characteristics of rainfall (e.g., intensity, duration, onset, and termination) also play a crucial role in their occurrence. It is important to distinguish a drought from aridity, which is a permanent feature of a climate and is limited to low-rainfall areas (Wilhite, 1992). A drought is a temporary phenomenon and can last from a few months to several years. Droughts are often mistaken for heat waves, but Chang and Wallace (1987) highlighted the fundamental difference between these two events. A heat wave typically lasts for a week, while a drought can extend for several months or even years. When a heat wave and a drought coincide, the result can be dire socio-economic consequences.
Introduction
In recent decades, severe weather and climatic occurrences have become more frequent, affecting human populations across the globe. These changes are caused, in part, by anthropogenic climate change. Vulnerable communities are particularly affected by these rapidly changing patterns, which can have catastrophic consequences. To mitigate the impact of climate change, communities must have access to accurate and up-to-date climate information. Weather predictions, from short-term forecasts to long-term scenarios based on data from previous decades and centuries, can provide valuable information to socioeconomic sectors that are vulnerable to climate change. Climate services offer scientifically based data and forecasts that can assist decision-makers in managing the risks of climate variability and change. Globally accepted indices generated from daily temperature and precipitation data can indicate the more severe features of climate change. These indices are developed and analyzed by the CCI/WCRP/JCOMM Expert Team on Climate Change Detection and Indices (ETCCDI) and have made a significant contribution to the Intergovernmental Panel on Climate Change (IPCC) Assessment Reports.
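To make the idea of indices derived from daily data concrete, the sketch below computes three precipitation indices from the ETCCDI set using base R only: Rx1day (maximum 1-day precipitation), R10mm (number of days with at least 10 mm), and CDD (longest run of consecutive days with less than 1 mm). The eight-day vector is hypothetical; in practice these indices are evaluated per year or month on long daily records.

```r
# Hypothetical daily precipitation totals in mm (illustration only)
precip <- c(0, 12, 0.5, 0, 0, 25, 3, 0)

# Rx1day: maximum 1-day precipitation
rx1day <- max(precip)

# R10mm: count of days with precipitation >= 10 mm
r10mm <- sum(precip >= 10)

# CDD: maximum number of consecutive dry days (precipitation < 1 mm)
dry_runs <- rle(precip < 1)
cdd <- max(c(0, dry_runs$lengths[dry_runs$values]))

c(Rx1day = rx1day, R10mm = r10mm, CDD = cdd)
```

For this vector the three values are 25 mm, 2 days, and 3 days respectively. Missing values are not handled here; a real workflow would need an explicit rule for them.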
Introduction
The hydrological cycle transfers water and energy between the land and atmosphere, but its regional characteristics make accurate modeling challenging. Hydrological models aim to predict hydrological events and processes, but there are uncertainties in their output due to the complex nature of the system and the assumptions made. Empirical models use statistical or mathematical relationships, while process-based models use equations to represent physical processes. Hydrological modeling involves various steps, each of which introduces uncertainties that can affect the accuracy of model predictions. The four main types of uncertainty commonly encountered are data, model, parameter, and climate uncertainty. Data uncertainty arises from errors or gaps in the input data, model uncertainty stems from limitations in the model structure or the assumptions used, parameter uncertainty arises from uncertainty in the values allocated to various model parameters, and climate uncertainty refers to uncertainty in the future climate projections used as inputs.
Introduction
Time Series Analysis (TSA)
TSA is a crucial tool in the study of climate change, as it allows researchers to carefully examine patterns and changes in weather parameters over time. By analyzing time series data for temperature, precipitation, and other weather variables, scientists can identify trends and develop models that accurately describe the underlying structure of the series. These models can then be used to make predictions about future changes in weather patterns and the potential impacts of climate change on the environment and human society. The accurate forecasting of climate change using TSA can inform policy decisions and facilitate adaptation strategies to mitigate the effects of a changing climate.
Components of Time Series
When analyzing a time series, we often want to separate the different components that contribute to its overall behavior. The classical decomposition method is a widely used technique that breaks down a time series into four components: trend, seasonal, cyclical, and irregular.
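A minimal sketch of classical decomposition in base R is shown below, using the `decompose()` function from the stats package on a synthetic monthly rainfall series. Note one difference from the four-component description: `decompose()` returns three parts (trend, seasonal, random), so any cyclical movement is absorbed into the moving-average trend. The series and the name `rain_ts` are assumptions made for illustration.

```r
# Synthetic monthly rainfall: linear trend + 12-month cycle + noise
set.seed(42)
n <- 120
t_index <- seq_len(n)
rain <- 100 + 0.3 * t_index + 25 * sin(2 * pi * t_index / 12) + rnorm(n, sd = 5)
rain_ts <- ts(rain, start = c(2010, 1), frequency = 12)

# Additive classical decomposition: x = trend + seasonal + random
parts <- decompose(rain_ts, type = "additive")

# Seasonal indices, one per calendar month (centred to sum to zero)
round(parts$figure, 2)

# The trend is a centred moving average, so the first and last
# six months are NA for a monthly series
head(parts$trend, 8)

plot(parts)
```

Use `type = "multiplicative"` when the seasonal swing grows with the level of the series; for a more robust alternative, base R also provides `stl()`.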
Introduction
Time series analysis is a powerful tool for detecting and describing sequential processes, and it has become increasingly important in water resource management, especially in the context of global warming and climate change. Statistical methods have been widely used for analyzing climatological time series, such as precipitation, air temperature, and evapotranspiration, and hydrological time series, such as streamflow and surface water quality. However, most linear time series models depend on strict assumptions, such as homogeneity, trend-freeness, stationarity, non-periodicity, and lack of persistence, which are often violated in real-world datasets. Therefore, there is a need to develop new models that are more flexible and can accommodate different types of data (Machiwal & Jha, 2006). One of the most popular and widely used classical time series models is the Autoregressive Integrated Moving Average (ARIMA) model, which has linear statistical properties and can be built using the Box-Jenkins methodology (Box and Jenkins, 1970). Adopting this model is important for hydrological analysis because it can help in accurately characterizing and predicting the behavior of hydrological time series data. ARIMA models can capture patterns and trends in the data and can be used to forecast future values of the time series. This can help in better understanding the underlying processes that drive hydrological variability, such as changes in precipitation patterns or land use. Furthermore, hydrological data often exhibit seasonality and long-term trends, which can be accounted for in ARIMA modeling. ARIMA models can also be used for anomaly detection, identifying unusual events that may have significant impacts on water resources.
Overall, ARIMA modeling provides a flexible and powerful framework for analyzing and forecasting hydrological time series, which is essential for effective water resource management in the face of changing environmental conditions.
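The Box-Jenkins steps (identification, estimation, diagnostic checking, forecasting) can be sketched with base R alone. The example below simulates a stationary AR(1) series as a stand-in for a streamflow record, so the model order is known in advance; with real data the order would be chosen from the ACF and PACF plots and differencing would be applied if needed. All object names are illustrative.

```r
# Simulated stand-in for a stationary flow series (not real data)
set.seed(123)
flow <- arima.sim(model = list(ar = 0.7), n = 200) + 50

# 1. Identification: inspect autocorrelation structure
acf(flow)
pacf(flow)

# 2. Estimation: fit ARIMA(1, 0, 0), i.e. AR(1) with a mean term
fit <- arima(flow, order = c(1, 0, 0))
print(coef(fit))

# 3. Diagnostic checking: residuals should resemble white noise
lb <- Box.test(residuals(fit), lag = 10, type = "Ljung-Box", fitdf = 1)
print(lb$p.value)

# 4. Forecasting: point forecasts and standard errors, 12 steps ahead
fc <- predict(fit, n.ahead = 12)
upper <- fc$pred + 1.96 * fc$se
lower <- fc$pred - 1.96 * fc$se
```

The `order = c(p, d, q)` argument sets the autoregressive order, the degree of differencing, and the moving average order; a seasonal component can be added through the `seasonal` argument of `arima()`.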
Introduction
Climate variability can have significant impacts on agricultural production, affecting both the demand and supply sides of farming. As climate change continues to negatively impact our ecosystem, food production, and human livelihood, this topic has gained widespread attention from researchers and academics worldwide. Given that agriculture is highly dependent on factors such as precipitation and temperature, it is considered one of the most climate-sensitive sectors (Sagar et al., 2022). Climatic data is complex and nonlinear, with significant fluctuation across time components, resulting in greater volatility. This means that stochastic climatic variables, including rainfall, temperature, relative humidity, and other weather parameters, often exhibit conditional heteroscedasticity, leading to irregular residual mean and variance across time series.
Introduction
Drought is an evolutionary process often triggered by precipitation deficiencies and may result in enormous economic losses. Effective early drought warnings and predictions can benefit farmers. To determine drought and its severity level, meteorological drought indices such as the standardized precipitation index and the effective drought index are used. Drought events can be anticipated by forecasting these indices with prediction models. Many studies on prediction have primarily focused on statistical approaches such as Holt-Winters and the autoregressive integrated moving average (ARIMA). These statistical techniques are insufficient for providing long-term predictions. However, with improvements in machine learning techniques, it appears possible to forecast drought over the long run with high accuracy. Frequently used machine learning algorithms in drought forecasting include the artificial neural network (ANN) and the support vector machine. Machine learning techniques are powerful and versatile data-driven algorithms for capturing and representing complex input and output relationships.
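As a simplified illustration of how the standardized precipitation index is built, the sketch below fits a gamma distribution to precipitation totals and maps the fitted cumulative probabilities onto a standard normal scale. It is a deliberately reduced version: the gamma parameters are estimated by the method of moments, the data are simulated, and the usual refinements (fitting each calendar month separately, accumulating over 3, 6, or 12 months, and handling zero-rainfall months) are omitted.

```r
# Simulated monthly precipitation totals in mm (illustration only)
set.seed(7)
precip <- rgamma(360, shape = 2, rate = 0.02)

# Method-of-moments estimates of the gamma parameters
# (an assumption of this sketch; other fitting methods are common)
shape_hat <- mean(precip)^2 / var(precip)
rate_hat  <- mean(precip) / var(precip)

# Cumulative probability of each total under the fitted gamma
prob <- pgamma(precip, shape = shape_hat, rate = rate_hat)

# SPI: equivalent quantile of the standard normal distribution
spi <- qnorm(prob)

# Months with SPI <= -1 are commonly read as moderately dry or worse
dry_months <- sum(spi <= -1)
summary(spi)
```

Because the transformation is monotonic, the driest months always receive the lowest SPI values; the index simply expresses each total as a number of standard deviations from typical conditions.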
Introduction
Rainfall is a crucial component of the Earth's climatic system that affects various aspects of human life. It has a direct impact on ecosystems, water resources, agricultural production, and overall water management. Dependence on rainfall is particularly high in regions where it is the primary source of water, making it essential to have an accurate understanding of future rainfall patterns. Given its importance, rainfall forecasting is a critical concern for a wide range of stakeholders, including governments, businesses, risk management organizations, and the scientific community. However, predicting rainfall is a complex task as it is a random occurrence influenced by numerous factors (Samantaray et al., 2020). Even in similar weather conditions, the possibility of rainfall may vary. This makes historical data and predictive analytics crucial in forecasting future rainfall and taking necessary actions to manage its effects (Avinash et al., 2022). The impact of rainfall extends beyond water management to affect various human activities, including power generation, construction, tourism, and forestry, among others. Therefore, it is crucial to accurately forecast and analyze rainfall patterns to help mitigate the associated risks and consequences (Hernandez et al., 2016).
Introduction
Time series data is data that has been collected in a specific pattern with time stamps (Nayak and Ghosh, 2013). This type of time-oriented data is collected at regular intervals, such as hourly, daily, weekly, or monthly. Time series data can be used to make predictions in a variety of fields and domains, such as foreign currency rates, stock market trends, energy consumption estimates, and climate change. Machine learning and data mining techniques can be used to uncover hidden patterns in historical data and forecast future trends (Aftab et al., 2018a, 2018b; Mishra et al., 2017). Weather forecasting based on historical data is a complex but very beneficial task that comes with several issues that must be resolved to achieve optimal results (Wu and Chau, 2013). Weather data includes attributes or features such as temperature, pressure, humidity, and wind speed. Machine learning techniques are used to forecast future weather conditions by identifying hidden patterns and relationships in historical weather data (Aftab et al., 2018b). Prediction of precipitation is an important part of the weather forecasting process. The use of machine learning and artificial intelligence techniques is regarded as a critical component in farmer services and policy formulation. Farmers require weather forecasts to plan their activities according to the predicted weather. Accurate and timely rainfall prediction can be extremely beneficial in planning and security measures for farming operations, agricultural tasks, water reservoir systems, and construction and transportation activities (Chau and Wu, 2010). Forecasting rainfall will become increasingly important in the future.
Introduction
Drought is defined as “when a region receives below average precipitation, resulting in prolonged shortages in its water supply, whether surface or ground water or atmospheric. It can have a substantial impact on the ecosystem and agriculture of the affected region.” When drought arises in a particular area, it affects crop and livestock production. To reduce the effect of drought on farm productivity and to stabilize farm income, farmers have to take systematic measures, which are referred to as drought coping mechanisms. The objective of the study was to identify the determinants of the adoption of drought coping mechanisms to balance and stabilize the farm income of the stakeholders. The study also helps to show how the effect of drought on farmers' livelihoods can be mitigated. The existing literature on the adoption of drought coping mechanisms is quite meager. Akwango et al. (2016), Jha et al. (2017), Ellis (2017), and Twongyirwe et al. (2019) assessed adoption strategies using multinomial logistic regression and reported strategies including tree planting, different crop varieties, soil and water conservation, early and late planting, and furrow irrigation. Data on the adoption of any agricultural technology have qualitative response variables with two or more categories. Most studies have used qualitative response models such as the logit model and the probit model, and multivariate techniques like discriminant analysis, to measure farmers' perception toward the adoption of agricultural technology. In this study, we chose machine learning models such as the decision tree to assess classification ability and to determine the factors influencing adoption.
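The decision tree approach mentioned above can be sketched with the rpart package, which ships with standard R installations. Everything in the example is hypothetical: the predictors (`farm_size`, `off_farm_income`, `years_experience`), the rule used to generate the `adopted` label, and the data themselves are simulated and are not the variables or results of the study.

```r
library(rpart)

# Simulated farmer survey (hypothetical variables, illustration only)
set.seed(99)
n <- 300
farmers <- data.frame(
  farm_size        = runif(n, 0.5, 10),                        # hectares
  off_farm_income  = factor(sample(c("no", "yes"), n, replace = TRUE)),
  years_experience = sample(1:40, n, replace = TRUE)
)

# Made-up rule linking predictors to adoption of a coping mechanism
score <- 0.3 * farmers$farm_size +
  1.5 * (farmers$off_farm_income == "yes") + rnorm(n)
farmers$adopted <- factor(ifelse(score > 2.5, "yes", "no"))

# Fit a classification tree
tree <- rpart(adopted ~ farm_size + off_farm_income + years_experience,
              data = farmers, method = "class")

# Classification ability on the training data (optimistic by construction)
pred <- predict(tree, farmers, type = "class")
confusion <- table(observed = farmers$adopted, predicted = pred)
accuracy <- sum(diag(confusion)) / n

# Which factors drive the splits
print(tree$variable.importance)
print(confusion)
```

In a real analysis the accuracy would be assessed on held-out data or by cross-validation, since in-sample accuracy overstates how well the tree generalizes.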
Introduction
Drought occurs when a region experiences below average precipitation, resulting in prolonged shortages in its water supply, which can have a significant impact on agriculture and the ecosystem. To reduce the effect of drought on farm production and stabilize farm income, farmers must take systematic measures called drought coping mechanisms. These measures fall into two broad categories: ex-ante coping mechanisms, which are taken before the occurrence of drought, and ex-post coping mechanisms, which are taken after the occurrence of drought. Ex-ante coping mechanisms include activities such as crop, spatial, and varietal diversification, income diversification, and the use of savings. Ex-post coping mechanisms include changing consumption patterns, borrowing money, liquidating assets, seeking government drought relief funds, and migration. Drought can have a significant impact on crop and livestock production, causing farmers to adopt drought coping mechanisms to stabilize their farm income. This study aims to identify the factors that determine the adoption of such coping mechanisms, in order to mitigate the effects of drought on the livelihoods of farmers.
Introduction
Agriculture is the backbone of India's economy and is largely dependent on the monsoon. Rainfall plays a dominant role in agricultural production and productivity, despite advances on many technological fronts. Rainfall forecasts are essential for the planning and management of water resources, especially in an agriculture-based country like India. About 65 percent of the total cultivated land in India is under rainfed agriculture. Monthly and seasonal rainfall forecasts in particular provide useful information for water resource management, agricultural planning, and associated crop insurance applications (Garbrecht et al., 2010). Early prediction of rainfall is essential because it helps farmers make decisions on crop production and harvesting, and it helps prevent flooding and protect human lives and resources. Rainfall forecasting, the application of science and technology to predict precipitation in advance, is very challenging due to the uncertain nature of rainfall and a dynamically changing climate, and it has always been a difficult task for meteorologists. Prediction of precipitation is categorized into short-range prediction and long-range prediction. Forecasting is done through the collection and analysis of weather and climate data. Avinash et al. (2022) studied the temporal variation of land use patterns in Karnataka through indices and concluded that, due to fluctuations in rainfall, forest land is degrading year by year.
Introduction
Artificial intelligence (AI) is a field that mainly focuses on creating intelligent technologies that can perform tasks requiring human intelligence. One subfield is machine learning (ML), which enables machines to learn from data without being explicitly programmed. There are two main types of ML: supervised and unsupervised learning. Supervised learning trains machines on labeled data to predict output, while unsupervised learning trains machines on unlabeled data to find patterns. ML includes techniques such as regression, classification, clustering, and deep learning. ML has the potential to revolutionize many fields but also raises ethical, social, and legal issues. Developing trustworthy, responsible, and ethical ML systems is crucial. The McCulloch and Pitts model, proposed by Warren McCulloch and Walter Pitts in 1943, is the first formal model of an artificial neuron. It is also known as the linear threshold gate (Fig. 1).
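The linear threshold gate can be written in a few lines of base R. In this sketch the unit outputs 1 when the weighted sum of its binary inputs reaches a threshold and 0 otherwise; the function names and the particular weights and thresholds are choices made for illustration.

```r
# Linear threshold unit in the style of McCulloch and Pitts:
# fires (returns 1) when the weighted input sum reaches the threshold
mp_neuron <- function(inputs, weights, threshold) {
  as.integer(sum(inputs * weights) >= threshold)
}

# With unit weights, the threshold alone selects the logic function
and_gate <- function(x1, x2) mp_neuron(c(x1, x2), weights = c(1, 1), threshold = 2)
or_gate  <- function(x1, x2) mp_neuron(c(x1, x2), weights = c(1, 1), threshold = 1)

# Truth table for both gates
truth <- expand.grid(x1 = 0:1, x2 = 0:1)
truth$AND <- mapply(and_gate, truth$x1, truth$x2)
truth$OR  <- mapply(or_gate,  truth$x1, truth$x2)
print(truth)
```

The weights and threshold here are fixed by hand rather than learned, which is the main difference from the trainable units used in later neural network models.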
Introduction
The study of spatiotemporal time series forecasting has gained significant attention in recent years, as it involves analyzing records of events that occur over both space and time. Spatiotemporal models group time- and location-specific discrete variables and are used in various sectors, including agriculture, transportation, meteorology, economics, disease mapping, sociology, environmental sciences, and ecology. However, the popular Autoregressive Integrated Moving Average (ARIMA) model (Box and Jenkins, 1970) has two major flaws: it assumes linearity, which does not always hold in time series data that may exhibit nonlinear interactions, and it is a univariate model that cannot capture spatial linkages. Hence, the ARIMA model should not be used when investigating time series data that are not perfectly linear and involve spatial correlation. Multivariate time series models like VARMA have also been used to represent spatiotemporal data. However, VARMA models require more parameters, whereas the STARMA model introduced by Cliff and Ord has fewer parameters (Cliff and Ord, 1975). In the literature, the STARMA model has commonly been used for modeling different spatiotemporal time series data (Kamarianakis, 2003). It has been employed to forecast spatiotemporal data in fields such as economics, disease mapping, climate modeling, and agriculture (Pfeifer and Bodily, 1990; Lee, 2005; Rathod et al., 2018; Saha et al., 2020; Rathod et al., 2021).
