Landslide Susceptibility Mapping Using the MARS Algorithm in a Part of the Gorganrud River Basin
Watershed Management Research (پژوهش های آبخیزداری)
Articles in Press, Accepted Manuscript, Available Online from 19 March 2026
Document Type: Research
DOI: 10.22092/wmrj.2026.371697.1652
Authors
Narges Javidan* 1; Ataollah Kavian 2
1 Faculty of Natural Resources, Sari Agricultural Sciences and Natural Resources University, Iran
2 Professor, Department of Watershed Management, Sari Agricultural Sciences and Natural Resources University
Abstract
Landslides are a significant and widespread natural hazard that threatens human life, infrastructure, and the environment, particularly in mountainous and hilly regions. Recent advances in machine learning (ML) techniques, integrated with Geographic Information Systems (GIS), have shown great promise in improving the accuracy of landslide susceptibility maps (LSMs). The Multivariate Adaptive Regression Splines (MARS) algorithm is one such data mining model, known for its ability to handle complex, non-linear relationships between environmental predisposing factors and landslide occurrence. However, the performance and reliability of ML models like MARS can be significantly influenced by the model's configuration, particularly the sample size ratio used for training and validation and the number of model replications. This study systematically evaluates the impact of different sample sizes and replication numbers on the predictive performance of the MARS algorithm for landslide susceptibility mapping in a landslide-prone area within the Gorganrud River Basin, Golestan Province, Iran. The primary objective is to identify the optimal combination of these parameters to enhance the reliability and accuracy of the resulting susceptibility maps, thereby providing a more robust tool for land managers and planners.
Materials and Methods
The study was conducted in a 4.115 km² section of the Gorganrud River Basin in northeastern Iran, an area characterized by complex topography, diverse geological formations, and significant landslide activity. A total of 351 historical landslide locations were identified and mapped through field surveys, interpretation of Google Earth imagery, and existing landslide inventory maps. Eighteen conditioning factors influencing landslide occurrence were selected based on literature review and local characteristics.
These factors were land use, distance from fault, distance from river, lithology, slope percentage, slope aspect, Digital Elevation Model (DEM), annual rainfall, Topographic Wetness Index (TWI), longitudinal curvature, transverse curvature, LS factor, drainage density, soil texture, Relative Slope Position (RSP), Stream Power Index (SPI), Topographic Roughness Index (TRI), and distance from road. All factors were converted to 30 × 30 m raster layers in ArcGIS 10.5 and SAGA GIS. Multicollinearity among the factors was assessed using the Variance Inflation Factor (VIF) and Tolerance indices in SPSS, which led to the removal of the TRI layer to avoid redundancy. The MARS algorithm was run under different data-splitting scenarios to analyze sensitivity and model uncertainty. Two sets of scenarios were tested: (1) different sample sizes, with training/validation splits of 50/50%, 70/30%, and 80/20% and 10 replications; and (2) different replication numbers, with 5, 10, and 15 replications for a fixed 70/30% split. Model performance in each scenario was evaluated using the Receiver Operating Characteristic (ROC) curve and the Area Under the Curve (AUC). Further validation used threshold-dependent metrics, namely Sensitivity, Specificity, Efficiency, Accuracy, the Kappa coefficient, and the Youden index, to assess model fit, stability, and predictive capability.
Results and Discussion
The MARS model performed well across all tested scenarios, with AUC values ranging from 0.80 to 0.94, indicating "excellent" to "outstanding" predictive accuracy according to standard classification. Among the sample-size scenarios, the 80/20% training/validation split achieved the highest AUC, 0.92. Among the replication scenarios, the model with 15 replications yielded the highest AUC, 0.94.
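The data-splitting scenarios described in the Materials and Methods can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the data are synthetic, `fit_stub` is a trivial placeholder that stands in for MARS (it is not a MARS implementation), the split is stratified by class, and all function names are hypothetical. It only shows how repeated random training/validation splits and a rank-based AUC fit together.

```python
import random


def auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def make_synthetic(n_per_class=100, seed=42):
    """Synthetic one-feature samples: (features, label), label 1 = landslide."""
    rng = random.Random(seed)
    data = [((rng.gauss(1.0, 1.0),), 1) for _ in range(n_per_class)]
    data += [((rng.gauss(0.0, 1.0),), 0) for _ in range(n_per_class)]
    return data


def stratified_split(samples, train_frac, rng):
    """Shuffle each class separately and cut it at train_frac (an assumed design)."""
    train, valid = [], []
    for cls in (0, 1):
        group = [s for s in samples if s[1] == cls]
        rng.shuffle(group)
        cut = int(round(train_frac * len(group)))
        train += group[:cut]
        valid += group[cut:]
    return train, valid


def fit_stub(train):
    """Placeholder model, NOT MARS: score by the first feature, oriented so
    that the landslide class has the higher mean score in the training set."""
    pos = [x[0] for x, y in train if y == 1]
    neg = [x[0] for x, y in train if y == 0]
    sign = 1.0 if sum(pos) / len(pos) >= sum(neg) / len(neg) else -1.0
    return lambda x: sign * x[0]


def repeated_split_auc(samples, train_frac, n_reps, seed=0, fit=fit_stub):
    """Validation AUC for each of n_reps random training/validation splits."""
    rng = random.Random(seed)
    aucs = []
    for _ in range(n_reps):
        train, valid = stratified_split(samples, train_frac, rng)
        model = fit(train)
        aucs.append(auc([y for _, y in valid], [model(x) for x, _ in valid]))
    return aucs


if __name__ == "__main__":
    data = make_synthetic()
    for pct in (50, 70, 80):  # sample-size scenarios, 10 replications
        aucs = repeated_split_auc(data, pct / 100, n_reps=10)
        print(f"{pct}/{100 - pct} split: mean AUC = {sum(aucs) / len(aucs):.3f}")
    for reps in (5, 10, 15):  # replication scenarios at a fixed 70/30 split
        aucs = repeated_split_auc(data, 0.7, n_reps=reps)
        print(f"{reps} replications: min/max AUC = {min(aucs):.3f}/{max(aucs):.3f}")
```

Replacing `fit_stub` with a real MARS fit and the synthetic samples with the landslide inventory would keep the same loop structure; the spread of AUC values across replications is what a stability comparison of this kind would inspect.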
Further analysis using the full set of validation metrics identified the 80/20% sample split and the 15-replication scenarios as the most robust and stable. The 80/20% scenario showed Sensitivity of 0.87, Specificity of 0.62, Efficiency of 85.30%, Accuracy of 0.75, Kappa of 0.50, and a Youden index of 0.24. Similarly, the 15-replication scenario showed Sensitivity of 0.84, Specificity of 0.65, Efficiency of 86.51%, Accuracy of 0.74, Kappa of 0.49, and a Youden index of 0.35. The high sensitivity values indicate the model's strong ability to correctly identify areas prone to landslides (true positives). The relatively moderate specificity indicates some limitations in correctly identifying stable areas, which is expected given the complex, multifactorial nature of landslide phenomena. The acceptable Kappa and overall accuracy values denote good agreement between model predictions and ground observations. The stability analysis revealed minimal fluctuations in accuracy metrics when the input data were changed, confirming the model's robustness. The superior performance of the 80/20% split suggests that allocating a larger portion of the data to training benefits the MARS model in this context. Likewise, increasing the number of replications to 15 enhanced the model's consistency and predictive power, mitigating the effects of random sampling variability.
Conclusion and Suggestions
This study demonstrated the high efficacy of the MARS algorithm for landslide susceptibility mapping in the Gorganrud River Basin. The systematic investigation of sample size and replication parameters revealed that these factors significantly influence model performance and stability. The optimal configurations were identified as an 80/20 training/validation sample split and 15 model replications, which produced the most accurate, reliable, and stable susceptibility maps.
The final landslide susceptibility map, generated under the optimal scenario, effectively delineates the basin into zones of very low, low, moderate, high, and very high susceptibility. The model's ability to capture complex, non-linear relationships between various geo-environmental factors and landslide occurrence underscores its advantage over traditional statistical models. The high predictive accuracy and robustness of the MARS model, as validated by multiple statistical measures, make it a valuable and trustworthy tool for spatial prediction of landslide hazards. The resulting susceptibility map provides a scientifically sound basis for informed decision-making in land-use planning, infrastructure development, disaster risk management, and the implementation of targeted mitigation measures in the study area and similar landslide-prone regions.
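The classification metrics named in the abstract can be computed from a confusion matrix once a susceptibility cutoff has been chosen. The sketch below uses the standard textbook definitions of sensitivity, specificity, accuracy, Cohen's kappa, and the Youden index; the function name and the example counts are hypothetical, and the abstract does not state which formulas or cutoff the authors used. Efficiency is left out because the abstract does not define it.

```python
def confusion_metrics(tp, fp, tn, fn):
    """Standard metrics from confusion-matrix counts (illustrative only).

    tp: landslide cells predicted as landslide
    fn: landslide cells predicted as stable
    tn: stable cells predicted as stable
    fp: stable cells predicted as landslide
    """
    n = tp + fp + tn + fn
    sensitivity = tp / (tp + fn)   # true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    accuracy = (tp + tn) / n
    # Youden index: J = sensitivity + specificity - 1
    youden = sensitivity + specificity - 1.0
    # Cohen's kappa: agreement beyond what chance alone would produce
    p_chance = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    kappa = (accuracy - p_chance) / (1.0 - p_chance)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "kappa": kappa,
        "youden": youden,
    }


if __name__ == "__main__":
    # Hypothetical counts, not taken from the study.
    for name, value in confusion_metrics(tp=40, fp=20, tn=30, fn=10).items():
        print(f"{name}: {value:.2f}")
```

Because these metrics depend on the cutoff that turns a continuous susceptibility score into a landslide/stable label, they complement the AUC, which summarizes performance over all cutoffs.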
Keywords
Landslide Susceptibility; MARS Algorithm; ROC Curve; Sample Size; Replication; Gorganrud Basin