Jun 06, 2024

Enhancing precision flood mapping: Pahang's vulnerability unveiled

  • Dr. Tahmina Afrose Keya1,2,3,
  • Siventhiran S B4,5,3,6,
  • Maheswaran S7,
  • Sreeramanan S8,9,10,11,
  • Low J An12,2,3,11,
  • Leela A13,14,3,15,
  • Prahankumar R13,2,16,17,18,19,
  • Lokeshmaran A13,2,16,17,18,19,
  • Boratne AV20,21,22,23,
  • Abdullah, M. T24,25,11
  • 1Community Medicine Department;
  • 2Faculty of Medicine;
  • 3AIMST University;
  • 4Research Management Centre;
  • 5Faculty of Applied Sciences;
  • 6Malaysia .;
  • 7Faculty of Applied Sciences , AIMST University, Malaysia.;
  • 8Professor;
  • 9Centre for Chemical Biology;
  • 10USM University Sains;
  • 11Malaysia;
  • 12Department of Medical Microbiology;
  • 13Department of Community Medicine;
  • 14Public Health, Faculty of Medicine;
  • 15Malaysia.;
  • 16MGMCRI;
  • 17Sri Balaji Vidyapeeth Deemed to be University;
  • 18Puducherry;
  • 19India.;
  • 20Dept of Community & Family Medicine;
  • 21AIIMS Deoghar;
  • 22Jharkhand;
  • 23India;
  • 24Academy of Science Malaysia;
  • 25Kuala Lumpur
Open access
Protocol Citation: Dr. Tahmina Afrose Keya, Siventhiran S B, Maheswaran S, Sreeramanan S, Low J An, Leela A, Prahankumar R, Lokeshmaran A, Boratne AV, Abdullah, M. T 2024. Enhancing precision flood mapping: Pahang's vulnerability unveiled. protocols.io https://dx.doi.org/10.17504/protocols.io.kxygxyy6zl8j/v1
License: This is an open access protocol distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Protocol status: Working
We use this protocol and it's working
Created: June 06, 2024
Last Modified: June 06, 2024
Protocol Integer ID: 101308
Keywords: Flood susceptibility, vulnerability, Geographic Information System, Ensemble Machine Learning, Pahang
Funders Acknowledgement:
The Ministry of Higher Education (MoHE), Mal. Fundamental Research Grant Scheme (FRGS)
Grant ID: FRGS/1/2022/SKK04/AIMST/03/1
Abstract
Flooding in Malaysia is considered one of the most impactful natural disasters. Annually, Pahang experiences substantial destruction due to floods. The aim of this research is to address the urgent issue of flood susceptibility in Pahang, Malaysia. To achieve this, a combination of Geographic Information System (GIS) and Ensemble Machine Learning (EML) will be utilized. By considering nine factors from a geospatial database that contribute to flooding, the areas prone to floods will be mapped. The mapping process will be carried out using the ArcGIS environment, and a model called Random Forest (RF)-embedding will be developed using the Ensemble Machine Learning (EML) technique. To determine the most influential factors in flooding, Feature Selection (FS) will be employed. The accuracy of the flood susceptibility models will be assessed by analysing the Area Under the Curve (AUC). Flood susceptibility mapping is a complex procedure with uncertainties. Hence, our research can contribute to flood management in vulnerable regions by improving flood models and providing spatial outcomes to help decision-makers implement risk reduction strategies.
Materials
Study Area
Pahang in Peninsular Malaysia has been chosen as the research site due to its annual monsoon floods, which harm the local population.
Study design and Data Collection Tool

Flood influencing factors.
Based on the data available for Pahang and a comprehensive literature search, nine factors have been identified as potential indicators of flood susceptibility for modelling: elevation, slope, curvature, flow direction, flow accumulation, distance from river, rainfall, land use, and geology. Together, these parameters capture the topographical and hydrometeorological conditions that contribute to the region's overall vulnerability to flooding [15,16].

Digital Elevation Models (DEMs) play an essential role in the precision of hydrodynamic models [17]. The digital elevation data will be obtained from the 30 m resolution Shuttle Radar Topography Mission (SRTM) DEM Version 3, accessed through the Earth data platform [18]. Flooding is strongly influenced by the slope of the land, as steeper slopes accelerate the flow of water over the surface and hinder its ability to seep into the ground [19]. Curvature describes whether a surface is convex, concave, or flat, and so reflects changes in slope inclination. Concave surfaces tend to collect flood water, increasing the likelihood of flooding [20]. Flow direction determines the path that surface water will take and hence the potential for flooding [21].
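As an illustration of how slope is derived from a DEM, the sketch below computes slope in degrees with plain central differences on a toy grid. It is a simplified stand-in and not the ArcGIS Slope tool, which uses its own neighbourhood method. The function name `slope_degrees`, the toy elevations, and the 30 m cell size are assumptions chosen for the example.

```python
import math

def slope_degrees(dem, cell_size):
    """Slope in degrees for interior cells of a DEM using central differences.

    dem is a list of equal-length rows of elevations (metres); cell_size is
    the grid spacing in metres. Border cells are left as None.
    """
    rows, cols = len(dem), len(dem[0])
    slope = [[None] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            # Elevation change per metre in the east-west and north-south directions
            dz_dx = (dem[r][c + 1] - dem[r][c - 1]) / (2 * cell_size)
            dz_dy = (dem[r + 1][c] - dem[r - 1][c]) / (2 * cell_size)
            slope[r][c] = math.degrees(math.atan(math.hypot(dz_dx, dz_dy)))
    return slope

# Toy 3x3 DEM on a 30 m grid (made-up values): elevation rises 30 m per cell eastwards
toy_dem = [
    [100.0, 130.0, 160.0],
    [100.0, 130.0, 160.0],
    [100.0, 130.0, 160.0],
]
centre_slope = slope_degrees(toy_dem, cell_size=30.0)[1][1]
print(centre_slope)
```

A rise of 30 m over each 30 m cell corresponds to a gradient of 1, so the centre cell has a slope of 45 degrees.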

Vulnerability to flooding increases with flow accumulation [19]. Distance from rivers will be estimated using the Euclidean distance tool in ArcGIS, applied to a raster layer of the river network. The ArcGIS platform will be used to generate maps of elevation, slope, curvature, flow direction, flow accumulation, and distance from river, which will then be divided into sub-classes using the natural breaks classification method. Flooding occurs when intense rainfall causes a sudden rise in water levels in rivers, lakes, and reservoirs, often where drainage is inadequate [22]. Data from 10 precipitation stations in Pahang (Cameron Highlands, Bentong, Bera, Kuantan, Lipis, Maran, Pekan, Raub, Rompin, and Temerloh) will be used to create a rainfall distribution map for the research area. The map will be constructed with the Inverse Distance Weighted (IDW) approach, using a 10-year dataset from 2012 to 2021 [23]. This interpolation is intended to represent the rainfall patterns across the study area.
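The IDW idea can be sketched in a few lines: the estimate at an unsampled point is a weighted mean of station values, with weights that fall off with distance. This is a minimal illustration and not the ArcGIS IDW tool. The function name `idw`, the exponent `power=2.0`, and the two stations with their coordinates and rainfall values are all assumptions for the example, not data from the protocol.

```python
import math

def idw(stations, x, y, power=2.0):
    """Inverse Distance Weighted estimate at (x, y).

    stations is a list of (sx, sy, value) tuples. power=2 is a common default
    and an assumption here; the protocol does not state the exponent.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for sx, sy, value in stations:
        distance = math.hypot(x - sx, y - sy)
        if distance == 0.0:
            return value  # query point sits exactly on a station
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total

# Hypothetical stations: (x in km, y in km, mean annual rainfall in mm)
stations = [(0.0, 0.0, 2000.0), (10.0, 0.0, 3000.0)]
midpoint = idw(stations, 5.0, 0.0)    # equidistant from both stations
at_station = idw(stations, 0.0, 0.0)  # coincides with the first station
print(midpoint, at_station)
```

At the midpoint both stations receive equal weight, so the estimate is their plain average. A larger exponent would make the surface follow the nearest station more closely.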

The properties of drainage systems are significantly affected by changes in land use and land cover (LULC) in the upstream watersheds. These modifications directly affect surface overflow and the land surface's capacity to absorb water, and therefore the frequency and intensity of flooding events [24]. The global geological and LULC data will be obtained from the worldwide geological maps database provided by the USGS and the Global data [25]. The LULC map will be created using the ArcGIS platform, delineating seven distinct categories: water bodies, trees, flooded vegetation, crops, built area, bare terrain, and rangeland. The geology map of Pahang will be segmented into nine primary soil features, based on the USGS-USA soil taxonomy [25,27].

Random forest (RF) Embedding classifier.
The random forest technique demonstrates strong predictive accuracy and is adept at managing large datasets for regression and classification purposes. By training numerous decision trees on bootstrap samples and aggregating their predictions (bagging), the RF method often achieves higher accuracy than alternative techniques and is less prone to overfitting. Moreover, the RF-embedding model trains quickly and achieves high classification accuracy [28].

Feature Selection 
Feature selection is crucial for improving model efficiency, eliminating unnecessary data, preventing overfitting, and enhancing generalization on test data. In this study, an embedded feature selection method using a shuffling algorithm will be used to create random probes based on the original variables. These probes will be combined with the variables to train a Random Forest regression model, which determines the significance of each variable (Z-score). Variables with a Z-score higher than the maximum Z-score among the random probes will be considered important [29]. In this context, the DML algorithm uses the embedded Mean Decrease Accuracy (MDA) measure. Decision trees typically split on "gini" (Gini impurity) or "entropy" (information gain), defined in terms of p(xi), the probability of each possible value i of random variable x, and c, the number of classes in the dataset (Eq 1,2) [30–32].

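$$\mathrm{Gini} = 1 - \sum_{i=1}^{c} p(x_i)^2 \tag{1}$$

$$\mathrm{Entropy} = -\sum_{i=1}^{c} p(x_i)\,\log_2 p(x_i) \tag{2}$$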
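The two split criteria can be checked numerically on a small set of labels. The sketch below uses the standard textbook definitions of Gini impurity and Shannon entropy. The function names and the "flood" / "no_flood" labels are illustrative assumptions.

```python
import math
from collections import Counter

def class_probabilities(labels):
    """Relative frequency p(x_i) of each class present in labels."""
    total = len(labels)
    return [count / total for count in Counter(labels).values()]

def gini_impurity(labels):
    """Gini impurity: 1 minus the sum of squared class probabilities."""
    return 1.0 - sum(p * p for p in class_probabilities(labels))

def entropy(labels):
    """Shannon entropy in bits: minus the sum of p * log2(p) over classes."""
    return -sum(p * math.log2(p) for p in class_probabilities(labels))

# Hypothetical node contents: an evenly mixed node and a pure node
mixed = ["flood", "flood", "no_flood", "no_flood"]
pure = ["flood", "flood", "flood", "flood"]

print(gini_impurity(mixed), entropy(mixed))
print(gini_impurity(pure), entropy(pure))
```

For two classes, an evenly mixed node gives the maximum values (Gini 0.5, entropy 1 bit), while a pure node gives zero under both criteria. A split is preferred when it moves the child nodes towards the pure case.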


The RF learning model, which uses multiple decision trees, is generally more accurate than a single decision tree. It combines random feature selection and bagging for classification and regression. In this study, a popular machine learning FS method will rank the flood influencing factors. This algorithm is widely endorsed by researchers for its strong predictive performance, high accuracy, and ease of interpretation. It iteratively generates rankings by shuffling features and identifying those that are consistently important [29,33].
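The shuffled-probe idea described above can be sketched as follows, assuming NumPy and scikit-learn are installed. This is a simplified illustration and not the authors' implementation. It uses a classifier and scikit-learn's impurity-based `feature_importances_` instead of the Z-score or MDA measure named in the text, and it runs on synthetic data. The function name `shadow_probe_selection` and all parameter values are assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def shadow_probe_selection(X, y, n_rounds=10, seed=0):
    """Count, per feature, the rounds in which it beats the best shuffled probe."""
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    hits = np.zeros(n_features, dtype=int)
    for round_idx in range(n_rounds):
        # Probes: each column shuffled independently, which breaks any link to y
        probes = np.column_stack(
            [rng.permutation(X[:, j]) for j in range(n_features)]
        )
        model = RandomForestClassifier(
            n_estimators=100, random_state=seed + round_idx
        )
        model.fit(np.hstack([X, probes]), y)
        importances = model.feature_importances_
        # A real feature scores a hit when it outranks every probe
        hits += importances[:n_features] > importances[n_features:].max()
    return hits

# Synthetic data (assumption): only the first two of four features drive the label
data_rng = np.random.default_rng(42)
X = data_rng.normal(size=(300, 4))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

hits = shadow_probe_selection(X, y)
print(hits)  # hit counts out of 10 rounds, one per feature
```

Features that beat the best probe in most rounds are candidates to keep. Features that rarely do are indistinguishable from noise under this criterion.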


Before start
GIS is essential for spatial data analysis and decision-making, particularly in flood susceptibility mapping. It integrates geospatial data to examine spatial relationships and visualize vulnerable areas. Machine learning, specifically ensemble methods like Random Forest, provides advanced techniques for analysing complex datasets and improving the accuracy of flood susceptibility predictions.
Conceptual framework
Develop an Integrated GIS-Based Framework.
The objective is to establish a robust GIS-based framework for flood susceptibility mapping in the Pahang State. This involves compiling and integrating geospatial datasets related to topography, hydrology, land use, and climate variables to create a comprehensive database for analysis.
Apply Ensemble Machine Learning Algorithms.
The objective is to apply ensemble machine learning algorithms, such as Random Forest (RF) and Gradient Boosting Machines (GBM), to the integrated dataset to develop predictive models of flood susceptibility.
This objective includes feature selection, model training, validation, and evaluation to ensure the accuracy and reliability of the susceptibility maps.
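For the evaluation step, the Area Under the Curve (AUC) can be read as the probability that a randomly chosen flooded location receives a higher susceptibility score than a randomly chosen non-flooded one. The sketch below computes it directly from that definition. The function name `roc_auc`, the labels, and the scores are made-up values for illustration.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (flooded, non-flooded) pairs ranked correctly.

    labels are 1 (flooded) or 0 (non-flooded); ties in score count one half.
    """
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("both classes are required to compute AUC")
    wins = 0.0
    for pos_score in positives:
        for neg_score in negatives:
            if pos_score > neg_score:
                wins += 1.0
            elif pos_score == neg_score:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Hypothetical validation points: observed flood label and predicted susceptibility
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.5, 0.1]
auc = roc_auc(labels, scores)
print(auc)
```

Here three of the four flooded/non-flooded pairs are ranked correctly, giving an AUC of 0.75. A value of 0.5 corresponds to random ranking and 1.0 to perfect separation.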
Generate Actionable Insights for Decision-Making.
The objective is to generate actionable insights from the flood susceptibility maps to support informed decision-making and disaster management strategies. This involves identifying vulnerable areas, assessing the factors contributing to flood risk, and recommending targeted interventions and mitigation measures to reduce the impacts of floods on communities, infrastructure, and the environment.
Protocol references