Enhancing precision flood mapping: Pahang's vulnerability unveiled

Dr. Tahmina Afrose Keya; Siventhiran S B; Maheswaran S; Sreeramanan S; Low J An; Leela A; Prahankumar R; Lokeshmaran A; Boratne AV; Abdullah, M. T

Jun 06, 2024


DOI

dx.doi.org/10.17504/protocols.io.kxygxyy6zl8j/v1

Dr. Tahmina Afrose Keya^1,2,3,
Siventhiran S B^4,5,3,6,
Maheswaran S^7,
Sreeramanan S^8,9,10,11,
Low J An^12,2,3,11,
Leela A^13,14,3,15,
Prahankumar R^13,2,16,17,18,19,
Lokeshmaran A^13,2,16,17,18,19,
Boratne AV^20,21,22,23,
Abdullah, M. T^24,25,11

¹Community Medicine Department;
²Faculty of Medicine;
³AIMST University;
⁴Research Management Centre;
⁵Faculty of Applied Sciences;
⁶Malaysia;
⁷Faculty of Applied Sciences, AIMST University, Malaysia;
⁸Professor;
⁹Centre for Chemical Biology;
¹⁰USM University Sains;
¹¹Malaysia;
¹²Department of Medical Microbiology;
¹³Department of Community Medicine;
¹⁴Public Health, Faculty of Medicine;
¹⁵Malaysia;
¹⁶MGMCRI;
¹⁷Sri Balaji Vidyapeeth Deemed to be University;
¹⁸Puducherry;
¹⁹India;
²⁰Dept of Community & Family Medicine;
²¹AIIMS Deoghar;
²²Jharkhand;
²³India;
²⁴Academy of Science Malaysia;
²⁵Kuala Lumpur

Dr. Tahmina Afrose Keya

Community Medicine Department, Faculty of Medicine, AIMST U...


Protocol Citation: Dr. Tahmina Afrose Keya, Siventhiran S B, Maheswaran S, Sreeramanan S, Low J An, Leela A, Prahankumar R, Lokeshmaran A, Boratne AV, Abdullah, M. T 2024. Enhancing precision flood mapping: Pahang's vulnerability unveiled. protocols.io https://dx.doi.org/10.17504/protocols.io.kxygxyy6zl8j/v1

License: This is an open access protocol distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Protocol status: Working

We use this protocol and it's working.

Created: June 06, 2024

Last Modified: June 06, 2024

Protocol Integer ID: 101308

Keywords: Flood susceptibility, vulnerability, Geographic Information System, Ensemble Machine Learning, Pahang

Funders Acknowledgement:

The Ministry of Higher Education (MoHE), Malaysia, Fundamental Research Grant Scheme (FRGS)

Grant ID: FRGS/1/2022/SKK04/AIMST/03/1

Abstract

Flooding is considered one of the most impactful natural disasters in Malaysia, and Pahang experiences substantial flood damage every year. This research aims to address the urgent issue of flood susceptibility in Pahang, Malaysia, by combining a Geographic Information System (GIS) with Ensemble Machine Learning (EML). Flood-prone areas will be mapped using nine flood-contributing factors compiled in a geospatial database. Mapping will be carried out in the ArcGIS environment, and a Random Forest (RF)-embedding model will be developed using the EML technique. Feature Selection (FS) will be employed to identify the factors that most influence flooding, and the accuracy of the flood susceptibility models will be assessed by analysing the Area Under the Curve (AUC). Because flood susceptibility mapping is a complex procedure subject to uncertainty, this research can contribute to flood management in vulnerable regions by improving flood models and providing spatial outputs that help decision-makers implement risk reduction strategies.

Attachments

Protocol_Pahang.pdf


Materials

Study Area
Pahang, in Peninsular Malaysia, has been chosen as the study area because its annual monsoon floods harm the local population.
Study design and Data Collection Tool

Flood influencing factors.
Based on the data available for Pahang and a comprehensive literature search, nine factors have been identified as potential indicators of heightened flood susceptibility for modelling: elevation, slope, curvature, flow direction, flow accumulation, distance from river, rainfall, land use, and geology. Together, these parameters capture the topographical and hydrometeorological conditions that contribute to the region's vulnerability to flooding [15,16].

Digital Elevation Models (DEMs) have been shown to play an indispensable role in the precision of hydrodynamic models [17]. Elevation data will be obtained from the 30 m resolution Shuttle Radar Topography Mission (SRTM) DEM Version 3, accessed through NASA's Earthdata platform [18]. Flooding is strongly influenced by slope, because steeper slopes accelerate surface runoff and reduce the opportunity for water to infiltrate the ground [19]. Curvature describes whether a surface is convex, concave, or flat, and therefore how its slope inclination changes; concave surfaces tend to collect flood water, increasing the likelihood of flooding [20]. Flow direction determines the path that surface water will take and therefore where flooding may occur [21].
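The slope and flow-direction layers are derived from the DEM cell by cell. The sketch below illustrates both ideas without dependencies, assuming a small DEM stored as a list of lists with 30 m square cells. Slope uses simple central differences, a simplification of the weighted 3x3 neighbourhood that GIS slope tools usually apply. Flow direction follows the D8 rule, pointing each cell at its steepest downslope neighbour and using the power-of-two direction codes reported by ArcGIS's Flow Direction tool. The function names and the example grid are hypothetical, and edges, flats, and sinks are not treated as a production tool would treat them.

```python
import math

# D8 neighbour offsets (row, col) with ArcGIS-style direction codes:
# 1=E, 2=SE, 4=S, 8=SW, 16=W, 32=NW, 64=N, 128=NE (rows increase southward).
D8_OFFSETS = [
    ((0, 1), 1), ((1, 1), 2), ((1, 0), 4), ((1, -1), 8),
    ((0, -1), 16), ((-1, -1), 32), ((-1, 0), 64), ((-1, 1), 128),
]


def slope_degrees(dem, r, c, cell=30.0):
    """Slope (degrees) at an interior cell using simple central differences."""
    dz_dx = (dem[r][c + 1] - dem[r][c - 1]) / (2 * cell)
    dz_dy = (dem[r + 1][c] - dem[r - 1][c]) / (2 * cell)
    return math.degrees(math.atan(math.hypot(dz_dx, dz_dy)))


def d8_direction(dem, r, c, cell=30.0):
    """D8 code of the steepest downslope neighbour; 0 if no neighbour is lower."""
    best_code, best_drop = 0, 0.0
    for (dr, dc), code in D8_OFFSETS:
        rr, cc = r + dr, c + dc
        if 0 <= rr < len(dem) and 0 <= cc < len(dem[0]):
            # Diagonal neighbours are sqrt(2) cell widths away.
            dist = cell * (math.sqrt(2) if dr and dc else 1.0)
            drop = (dem[r][c] - dem[rr][cc]) / dist
            if drop > best_drop:
                best_code, best_drop = code, drop
    return best_code


# Hypothetical 3x3 DEM (metres) that falls towards the south-east corner.
dem = [
    [10.0, 9.0, 8.0],
    [9.0, 8.0, 7.0],
    [8.0, 7.0, 6.0],
]
print(round(slope_degrees(dem, 1, 1), 2))  # 2.7
print(d8_direction(dem, 1, 1))             # 2 (south-east)
```

The lowest corner cell returns 0 because no neighbour is lower; real flow-direction tools fill or otherwise resolve such sinks before routing flow.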

Vulnerability to flooding increases with flow accumulation [19]. Distance from rivers will be estimated with the Euclidean Distance tool in ArcGIS, applied to a raster layer of the river network. Maps of elevation, slope, curvature, flow direction, flow accumulation, and distance from river will be generated in ArcGIS and then categorized into sub-classes using the natural breaks classification method. Flooding occurs when intense rainfall causes a sudden rise in water levels in rivers, lakes, and reservoirs, often overwhelming drainage capacity [22]. Data from 10 precipitation stations in Pahang (Cameron Highlands, Bentong, Bera, Kuantan, Lipis, Maran, Pekan, Raub, Rompin, and Temerloh) will be used to create a rainfall distribution map for the study area. The map will be constructed with the Inverse Distance Weighted (IDW) approach using a 10-year dataset from 2012 to 2021 [23], so that it represents the spatial pattern of rainfall across the study area.
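The IDW step can be illustrated with a short sketch. Each unknown location receives a weighted average of the station values, with weights proportional to 1/d^p. The sketch assumes projected station coordinates in metres and a power of p = 2, which is a common default. The function name and the gauge values are hypothetical placeholders, not the Pahang station data. ArcGIS's IDW tool also offers options such as a search radius, which are omitted here.

```python
import math


def idw(x, y, stations, power=2.0):
    """Inverse Distance Weighted estimate at (x, y).

    stations: list of (sx, sy, value) tuples in a projected CRS (metres),
    where value could be, e.g., mean annual rainfall at a gauge.
    """
    num = den = 0.0
    for sx, sy, value in stations:
        d = math.hypot(x - sx, y - sy)
        if d == 0.0:
            return value  # target sits exactly on a station
        w = 1.0 / d ** power  # nearer stations get larger weights
        num += w * value
        den += w
    return num / den


# Hypothetical gauges (x, y, rainfall in mm); NOT the real Pahang stations.
gauges = [(0.0, 0.0, 2000.0), (10_000.0, 0.0, 3000.0), (0.0, 10_000.0, 2500.0)]
print(round(idw(2_000.0, 3_000.0, gauges), 1))
```

To build a rainfall raster, this function would be evaluated at the centre of every grid cell. Raising the power makes the surface more strongly dominated by the nearest station.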

Changes in land use and land cover (LULC) in upstream watersheds significantly affect the properties of drainage systems. These modifications directly alter surface runoff and the capacity of the land surface to absorb water, influencing the frequency and intensity of flooding events [24]. Global geological and LULC data will be obtained from the USGS worldwide geological maps database and the Global data [25]. The LULC map will be created in ArcGIS with seven distinct categories: water bodies, trees, flooded vegetation, crops, built area, bare terrain, and rangeland. The geology map of Pahang will be segmented into nine primary soil features based on the USGS-USA soil taxonomy [25,27].

Random forest (RF) Embedding classifier.
The random forest technique has strong predictive accuracy and handles large datasets well for both regression and classification. It trains many decision trees in parallel on bootstrap samples and aggregates their predictions (bagging). As a result, the RF method often achieves higher accuracy than single-model alternatives and is less prone to overfitting. The RF-embedding model also trains relatively quickly and has been reported to achieve superior classification accuracy [28].
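The following toy sketch makes bootstrapping, random feature selection, and majority-vote aggregation concrete by building a "forest" of one-split trees (decision stumps). It illustrates bagging under stated assumptions; it is not the RF-embedding model used in this protocol or any library's implementation. Because each tree makes only one split, choosing a random feature subset per tree is equivalent to choosing it per split. The training cells and factor values are hypothetical.

```python
import random
from collections import Counter


def fit_stump(rows, labels, features):
    """Best single split (feature, threshold, left_label, right_label) by error count."""
    best = None
    for f in features:
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0] if right else left_label
            errors = sum(y != left_label for y in left) + sum(y != right_label for y in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, left_label, right_label)
    return best[1:]


def fit_forest(rows, labels, n_trees=25, max_features=1, seed=0):
    """Bagging: each tree sees a bootstrap sample and a random feature subset."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]  # sample with replacement
        feats = rng.sample(range(len(rows[0])), max_features)
        forest.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx], feats))
    return forest


def predict(forest, row):
    """Aggregate the trees by majority vote."""
    votes = Counter(left if row[f] <= t else right for f, t, left, right in forest)
    return votes.most_common(1)[0][0]


# Hypothetical training cells: (elevation m, distance to river m) -> 1 = flooded.
X = [(5, 50), (8, 100), (10, 80), (12, 150), (40, 900), (55, 1200), (60, 800), (70, 1500)]
y = [1, 1, 1, 1, 0, 0, 0, 0]
forest = fit_forest(X, y)
print(predict(forest, (3, 30)), predict(forest, (80, 2000)))
```

A real random forest grows deep trees and would usually output class probabilities. For susceptibility mapping, those probabilities can serve as per-cell susceptibility scores.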

Feature Selection  
Feature selection is crucial for improving model efficiency, eliminating unnecessary data, preventing overfitting, and enhancing generalization to test data. In this study, an embedded feature selection method will use a shuffling algorithm to create random probes from the original variables. These probes will be combined with the original variables to train a Random Forest regression model, which expresses the importance of each variable as a Z-score. Variables with a Z-score higher than the maximum Z-score among the random probes are considered important [29]. In this context, the DML algorithm uses the embedded Mean Decrease Accuracy (MDA) measure. Its decision trees typically split on either "gini" (Gini impurity) or "entropy" (information gain), defined in Eq. 1 and 2. Here, p(x_i) is the probability of the i-th of n possible values of a random variable x, p_i is the proportion of samples in class i, and c is the number of classes in the dataset [30–32].

Entropy: H(x) = -\sum_{i=1}^{n} p(x_i) \log_2 p(x_i)        (1)

Gini: \mathrm{Gini}(E) = 1 - \sum_{i=1}^{c} p_i^2        (2)
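Both split criteria can be computed directly from the class labels at a tree node. The minimal sketch below uses hypothetical labels (1 = flood, 0 = non-flood) to show entropy, Gini impurity, and the information gain of a candidate split.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy in bits: -sum p_i * log2(p_i) over observed classes."""
    n = len(labels)
    return -sum((k / n) * math.log2(k / n) for k in Counter(labels).values())


def gini(labels):
    """Gini impurity: 1 - sum p_i**2 over the classes present."""
    n = len(labels)
    return 1.0 - sum((k / n) ** 2 for k in Counter(labels).values())


def information_gain(parent, left, right):
    """Entropy of the parent node minus the size-weighted entropy of its children."""
    n = len(parent)
    return (entropy(parent)
            - len(left) / n * entropy(left)
            - len(right) / n * entropy(right))


node = [1, 1, 0, 0]  # hypothetical flood (1) / non-flood (0) labels at a node
print(entropy(node), gini(node))               # 1.0 0.5
print(information_gain(node, [1, 1], [0, 0]))  # 1.0 (a perfect split)
```

A pure node scores 0 under both criteria. A tree builder picks the split that most reduces impurity, which for entropy means the split with the largest information gain.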


The RF learning model, which uses multiple decision trees, is generally more accurate than a single decision tree; it combines random feature selection with bagging for classification and regression. In this study, a popular machine learning FS method will be used to rank the flood-influencing factors. Researchers favour this algorithm for its strong predictive performance, high accuracy, and ease of interpretation. It iteratively generates rankings by shuffling features and identifying those that are consistently important [29,33].
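The random-probe idea can be sketched as follows. Each original variable gets a shuffled copy (a probe) that keeps its distribution but carries no information about the target. Mean Decrease Accuracy (MDA) is then measured by shuffling one column at a time, and only variables that beat the best probe are kept. To stay dependency-free, this sketch substitutes a nearest-centroid classifier for the Random Forest. It also compares raw MDA values from a single shuffle rather than the Z-scores accumulated over repeated iterations in Boruta-style methods. All names and data are hypothetical.

```python
import random
import statistics


def fit_centroids(X, y):
    """Stand-in classifier (NOT random forest): mean feature vector per class."""
    classes = sorted(set(y))
    return {
        c: [statistics.fmean(r[j] for r, t in zip(X, y) if t == c) for j in range(len(X[0]))]
        for c in classes
    }


def predict(model, row):
    """Assign the class whose centroid is nearest in squared Euclidean distance."""
    return min(model, key=lambda c: sum((a - b) ** 2 for a, b in zip(row, model[c])))


def accuracy(model, X, y):
    return sum(predict(model, r) == t for r, t in zip(X, y)) / len(y)


def mda(model, X, y, j, rng):
    """Mean Decrease in Accuracy for column j: baseline minus shuffled-column accuracy."""
    col = [r[j] for r in X]
    rng.shuffle(col)
    X_perm = [r[:j] + [v] + r[j + 1:] for r, v in zip(X, col)]
    return accuracy(model, X, y) - accuracy(model, X_perm, y)


def select_with_probes(X, y, seed=0):
    """Keep original features whose importance beats the best random probe."""
    rng = random.Random(seed)
    p = len(X[0])
    probes = [[r[j] for r in X] for j in range(p)]
    for col in probes:
        rng.shuffle(col)  # same distribution as the original, no link to y
    X_aug = [list(r) + [probes[j][i] for j in range(p)] for i, r in enumerate(X)]
    model = fit_centroids(X_aug, y)
    importance = [mda(model, X_aug, y, j, rng) for j in range(2 * p)]
    threshold = max(importance[p:])  # best score achieved by any probe
    return [j for j in range(p) if importance[j] > threshold], importance


# Hypothetical data: column 0 separates the classes, column 1 is weakly related noise.
X = [[10.0 * (i % 2) + 0.1 * (i % 3), (i * 7 % 11) / 10] for i in range(40)]
y = [i % 2 for i in range(40)]
selected, importance = select_with_probes(X, y)
print(selected, [round(v, 2) for v in importance])
```

Nearest-centroid distances depend on feature scale, so real factors would need standardising first. A Random Forest, as used in the protocol, does not have this sensitivity.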

Before start

GIS is essential for spatial data analysis and decision-making, particularly in flood susceptibility mapping. It integrates geospatial data to examine spatial relationships and visualize vulnerable areas. Machine learning, specifically ensemble methods such as Random Forest, provides advanced techniques for analysing complex datasets and improving the accuracy of flood susceptibility predictions.

Conceptual framework

Develop an Integrated GIS-Based Framework. 
The objective is to establish a robust GIS-based framework for flood susceptibility mapping in Pahang State. This involves compiling and integrating geospatial datasets related to topography, hydrology, land use, and climate variables to create a comprehensive database for analysis.
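Compiling the factors into one database typically means turning co-registered raster layers into a table, with one row per grid cell and one column per factor. The minimal sketch below assumes that all layers have already been resampled to the same grid (for example, the 30 m DEM grid) and that they share a single nodata value. The layer names, values, and nodata code are hypothetical.

```python
def stack_layers(layers, nodata=None):
    """Turn co-registered rasters into one feature row per grid cell.

    layers: dict mapping factor name -> 2D list; all grids must share one shape.
    Cells where any layer equals `nodata` are skipped.
    Returns (names, rows, cells); cells[k] is the (row, col) of rows[k].
    """
    names = list(layers)
    shapes = {(len(grid), len(grid[0])) for grid in layers.values()}
    if len(shapes) != 1:
        raise ValueError("all layers must share the same grid shape")
    n_rows, n_cols = shapes.pop()
    rows, cells = [], []
    for r in range(n_rows):
        for c in range(n_cols):
            values = [layers[name][r][c] for name in names]
            if nodata is not None and nodata in values:
                continue  # skip cells outside the study area or with missing data
            rows.append(values)
            cells.append((r, c))
    return names, rows, cells


# Hypothetical 2x2 layers; -9999 marks a cell outside the study area.
layers = {
    "elevation": [[12.0, 35.0], [8.0, -9999]],
    "slope": [[1.5, 6.0], [0.5, -9999]],
    "distance_to_river": [[120.0, 900.0], [40.0, -9999]],
}
names, rows, cells = stack_layers(layers, nodata=-9999)
print(names)
print(rows)
```

The `cells` list records where each row came from, so model predictions can later be written back into a susceptibility raster.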

Apply Ensemble Machine Learning Algorithms. 
The objective is to apply ensemble machine learning algorithms, such as Random Forest (RF) and Gradient Boosting Machines (GBM), to the integrated dataset to develop predictive models of flood susceptibility.     

This objective includes feature selection, model training, validation, and evaluation to ensure the accuracy and reliability of the susceptibility maps.
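The protocol evaluates models by AUC but does not state which curve is used. This sketch assumes the receiver operating characteristic (ROC) curve, as is common in flood susceptibility studies. It computes AUC as the probability that a randomly chosen flood cell receives a higher susceptibility score than a randomly chosen non-flood cell, with ties counting as half. The function name and data are hypothetical, and the pairwise loop is intended for illustration only; a rank-based computation would scale better to full rasters.

```python
def roc_auc(labels, scores):
    """AUC as P(score of a random positive > score of a random negative).

    labels: 1 for observed flood cells, 0 for non-flood cells.
    scores: predicted susceptibility (e.g. class-1 probability from the model).
    Ties count as half a win, matching the area under the empirical ROC curve.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one flood and one non-flood sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical validation cells and model scores.
print(roc_auc([1, 1, 0, 0], [0.9, 0.2, 0.5, 0.1]))  # 0.75
```

An AUC of 0.5 means the model ranks cells no better than chance, and 1.0 means every flood cell outranks every non-flood cell. It should be computed on validation cells held out from training.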

Generate Actionable Insights for Decision-Making.
The objective is to generate actionable insights from the flood susceptibility maps to support informed decision-making and disaster management strategies. This involves identifying vulnerable areas, assessing the factors contributing to flood risk, and recommending targeted interventions and mitigation measures to reduce the impacts of floods on communities, infrastructure, and the environment.

Protocol references

[1] UNDRR. Report of the open-ended intergovernmental expert working group on indicators and terminology relating to disaster risk reduction. United Nations Office for Disaster Risk Reduction, United Nations General Assembly 2017. (accessed April 5, 2024).

[2] Du W, Fitzgerald GJ, Clark M, Hou XY. Health impacts of floods. Prehosp Disaster Med 2010;25:265–72.

[3] Sun F, Lai X, Shen J, Nie L, Gao X. Initial allocation of flood drainage rights based on a PSR model and entropy-based matter-element theory in the Sunan Canal, China. PLoS One 2020;15:e0233570.

[4] Khan MMA, Shaari N, Nahar A, Baten MdA, Nazaruddin DA. Flood impact assessment in Kota Bharu, Malaysia: a statistical analysis. 2014.

[5] Nurul Ashikin A, Nor Diana MI, Siwar C, Alam MM, Yasar M. Community preparation and vulnerability indices for floods in Pahang State of Malaysia. Land (Basel) 2021;10:1–23.

[6] DARUL MAKMUR. Portal Rasmi Kerajaan Negeri Pahang 2024. (accessed April 5, 2024).

[7] ASM - Academy of Sciences Malaysia. Assessment on the Sustainability of the Tasik Chini Basin and Tasik Chini Biosphere Reserve - Official Portal Academy of Sciences Malaysia 2023. (accessed April 22, 2024).

[8] Saimi FM, Hamzah FM, Toriman ME, Jaafar O, Tajudin H. Trend and Linearity Analysis of Meteorological Parameters in Peninsular Malaysia. Sustainability 2020;12:9533.

[9] Wong C, Liew J, Yusop Z, Ismail T, Venneker R, Uhlenbrook S. Rainfall Characteristics and Regionalization in Peninsular Malaysia Based on a High Resolution Gridded Data Set. Water (Basel) 2016;8:500.

[10] Britannica. Pahang | History, Nature & Industry 2024. (accessed April 5, 2024).
 
[11] Muhammad NS, Abdullah J, Julien PY. Characteristics of Rainfall in Peninsular Malaysia. J Phys Conf Ser 2020;1529:052014.

[12] Kamarudin MKA, Toriman ME, Abd Wahab N, Abu Samah MA, Abdul Maulud KN, Mohamad Hamzah F, et al. Hydrological and climate impacts on river characteristics of pahang river basin, Malaysia. Heliyon 2023;9.

[13] Mahajan P, Uddin S, Hajati F, Moni MA. Ensemble Learning for Disease Prediction: A Review. Healthcare (Switzerland) 2023;11.
 
[14] Shirzadi A, Soliamani K, Habibnejhad M, Kavian A, Chapi K, Shahabi H, et al. Novel GIS based machine learning algorithms for shallow landslide susceptibility mapping. Sensors (Switzerland) 2018;18.

[15] Khoirunisa N, Ku C-Y, Liu C-Y, Esteban D, López-Gutiérrez J-S, Negro V, et al. A GIS-Based Artificial Neural Network Model for Flood Susceptibility Assessment. International Journal of Environmental Research and Public Health 2021;18:1072.

[16] Nguyen VN, Yariyan P, Amiri M, Tran AD, Pham TD, Do MP, et al. A New Modeling Approach for Spatial Prediction of Flash Flood with Biogeography Optimized CHAID Tree Ensemble and Remote Sensing Data. Remote Sensing 2020;12:1373.

[17] Xu K, Fang J, Fang Y, Sun Q, Wu C, Liu M. The Importance of Digital Elevation Model Selection in Flood Simulation and a Proposed Method to Reduce DEM Errors: A Case Study in Shanghai. International Journal of Disaster Risk Science 2021;12:890–902.

[18] Earthdata. Earthdata Search. NASA 2023. (accessed July 6, 2023).

[19] Chaulagain D, Ram Rimal P, Ngando SN, Nsafon BEK, Suh D, Huh JS. Flood susceptibility mapping of Kathmandu metropolitan city using GIS-based multi-criteria decision analysis. Ecol Indic 2023;154:110653.

[20] Ramesh V, Iqbal SS. Urban flood susceptibility zonation mapping using evidential belief function, frequency ratio and fuzzy gamma operator models in GIS: a case study of Greater Mumbai, Maharashtra, India. Geocarto Int 2022;37:581–606.

[21] Towfiqul Islam ARM, Talukdar S, Mahato S, Kundu S, Eibek KU, Pham QB, et al. Flood susceptibility modelling using advanced ensemble machine learning models. Geoscience Frontiers 2021;12:101075.
 
[22] Liu X, Zhou P, Lin Y, Sun S, Zhang H, Xu W, et al. Influencing Factors and Risk Assessment of Precipitation-Induced Flooding in Zhengzhou, China, Based on Random Forest and XGBoost Algorithms. International Journal of Environmental Research and Public Health 2022;19:16544.

[23] Stackhouse P. NASA POWER | DAVe. 2024. (accessed June 6, 2024).

[24] Sugianto S, Deli A, Miswar E, Rusdi M, Irham M. The Effect of Land Use and Land Cover Changes on Flood Occurrence in Teunom Watershed, Aceh Jaya. Land 2022;11:1271.

[25] USGS. EarthExplorer. Science for a Changing World 2024. (accessed April 20, 2024).

[26] USGS. U.S. Geological Survey. The Water Cycle 2022. (accessed April 13, 2024).

[27] Steinshouer DW, Qiang J, McCabe PJ, Ryder RT. Maps showing geology, oil and gas fields, and geologic provinces of the Asia Pacific region. 1999.

[28] Netzer M, Baumgartner C, Baumgarten D. Predicting prediction: A systematic workflow to analyze factors affecting the classification performance in genomic biomarker discovery. PLoS One 2022;17.

[29] Chen Y, Ma L, Yu D, Zhang H, Feng K, Wang X, et al. Comparison of feature selection methods for mapping soil organic matter in subtropical restored forests. Ecol Indic 2022;135:108545.

[30] Cai J, Luo J, Wang S, Yang S. Feature selection in machine learning: A new perspective. Neurocomputing 2018;300:70–9.

[31] Pudjihartono N, Fadason T, Kempa-Liehr AW, O’Sullivan JM. A Review of Feature Selection Methods for Machine Learning-Based Disease Risk Prediction. Frontiers in Bioinformatics 2022;2:927312.

[32] Sarker IH. Machine Learning: Algorithms, Real-World Applications and Research Directions. SN Comput Sci 2021;2:1–21.

[33] Masrur Ahmed AA, Deo RC, Feng Q, Ghahramani A, Raj N, Yin Z, et al. Deep learning hybrid model with Boruta-Random forest optimiser algorithm for streamflow forecasting with climate mode indices, rainfall, and periodicity. J Hydrol (Amst) 2021;599:126350.
