A Multi-Objective Gradient Boosting Artificial Intelligence Model for River Flow Prediction Enhanced by Harmony Search-based Feature Selection

Torabi, Hassan; Najafpour, Navideh

doi:10.22092/wmrj.2026.371142.1639

Watershed Management Research
Articles in press, accepted, available online from 01 Tir 1405
Article type: Research
Digital Object Identifier (DOI): 10.22092/wmrj.2026.371142.1639
Authors
Hassan Torabi*¹; Navideh Najafpour²
¹Faculty of Agriculture - Lorestan University
²PhD in Water Science and Engineering, Department of Water Engineering, Lorestan University
Abstract
Accurate river flow prediction is a key part of sustainable water resources management, the design and operation of hydraulic structures, and the mitigation of flood and drought hazards. Its importance is even greater in semi-arid mountainous regions such as the Zagros in southwestern Iran, which face serious limitations in observational data and complex hydrological behavior. This study develops and evaluates a multi-objective gradient boosting regression model for predicting river flow in the Kashkan basin. Meteorological station data with no lag and with lags of one to three months, discharge data from the Kakarza hydrometric station, and the month and year were used as features. In total, 39 time series served as features, and two hydrometric time series, Kashkan–Afrineh and Kashkan–Poldokhtar, served as targets. After rows with missing values in any target or feature were removed, 633 concurrent records remained in each time series; these were used to train and test the models. The main objective of this research is to improve the accuracy of discharge prediction at the Kashkan–Afrineh and Kashkan–Poldokhtar stations. To this end, the Gradient Boosting Regressor (GBR) was combined with feature selection optimized by the Harmony Search (HS) algorithm and with k-fold cross-validation. By aggregating weak learners and progressively learning from their errors, the gradient boosting model can identify complex patterns and nonlinear relationships among hydrological and meteorological variables and can help guard against overfitting. The results showed that inputs such as the Kakarza station discharge are very highly correlated with the target variable and play a key role in model accuracy, so that removing this feature caused a marked drop in performance. For example, the coefficient of determination (R²) on the test data at the Kashkan–Poldokhtar and Kashkan–Afrineh stations fell from 0.92 and 0.96 to 0.78 and 0.75, respectively. RMSE and MSE also increased considerably.
Overall, the results indicate that combining the GBR model with intelligent feature selection based on Harmony Search and with k-fold cross-validation can provide an efficient and accurate method for river flow prediction in data-limited basins. The combination can also give a better understanding of the role that key hydrometric stations play in the model's learning process. Subsequently, a Random Forest (RF) model was developed, using a k-fold cross-validation strategy and selection of effective features by genetic algorithm optimization, to predict discharge at the Kashkan–Afrineh and Kashkan–Poldokhtar hydrometric stations. This study clearly shows that feature selection plays a vital role in hydrological models and must be carried out with care. The first version of the model, which included data from the Kakarza hydrometric station, delivered higher accuracy, lower error, and greater stability. In contrast, removing a data source with very strong correlation (the Kakarza discharge) led to a considerable decrease in R² and an increase in error on the test data. This drop significantly affected the model's performance and generalizability and reduced its predictive power under test conditions. The results also emphasize that even with advanced machine learning methods such as Random Forest, together with k-fold cross-validation, the presence of key, highly correlated data remains very important. This research therefore highlights the importance of combining multi-source data and selecting features carefully to improve the accuracy and generalizability of river flow prediction models. These results can support the development of optimal water resources management strategies and the reduction of risks arising from climate change and weather-related crises in sensitive regions.
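The lagged-input construction described in the abstract (each meteorological series at lag 0 plus lags of one to three months, calendar columns added unlagged, incomplete rows dropped) can be sketched as follows. This is a minimal standard-library illustration, not the authors' code: the function names `build_feature_table` and `drop_incomplete_rows`, the column naming scheme, and the sample values are all hypothetical. With nine meteorological stations at lags 0 to 3, plus the Kakarza discharge, month, and year, this layout would give 9 × 4 + 3 = 39 columns, which matches the stated feature count, but that breakdown is an inference rather than something the abstract states.

```python
from typing import Dict, List, Optional

Column = List[Optional[float]]


def build_feature_table(
    lagged_inputs: Dict[str, Column],
    unlagged_inputs: Dict[str, Column],
    max_lag: int = 3,
) -> Dict[str, Column]:
    """Expand each lagged input into lag-0..max_lag columns and append the rest."""
    table: Dict[str, Column] = {}
    for name, values in lagged_inputs.items():
        for lag in range(max_lag + 1):
            # Row t of the lag-k column holds the value observed k months earlier.
            table[f"{name}_lag{lag}"] = [None] * lag + values[: len(values) - lag]
    table.update(unlagged_inputs)
    return table


def drop_incomplete_rows(table: Dict[str, Column]) -> Dict[str, Column]:
    """Keep only rows in which every column has a value (concurrent records)."""
    n_rows = len(next(iter(table.values())))
    keep = [
        i for i in range(n_rows)
        if all(col[i] is not None for col in table.values())
    ]
    return {name: [col[i] for i in keep] for name, col in table.items()}


# Hypothetical monthly values; None marks a missing observation.
rain = [10.0, 20.0, 30.0, None, 50.0, 60.0, 70.0]
month = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]

features = drop_incomplete_rows(
    build_feature_table({"rain": rain}, {"month": month}, max_lag=2)
)
```

A single missing observation removes every row whose lag window touches it, which is one reason a long monthly record can shrink to a few hundred concurrent rows.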
Keywords
Watershed; Hydrological model; Artificial intelligence; Multi-objective; Machine learning
Article title [English]
A Multi-Objective Gradient Boosting Artificial Intelligence Model for River Flow Prediction Enhanced by Harmony Search-based Feature Selection
Authors [English]
Hassan Torabi¹; Navideh Najafpour²
¹Faculty of Agriculture - Lorestan University
²PhD in Water Science and Engineering, Esfahan Water Organization, Esfahan, Iran
Abstract [English]
Accurate river flow prediction is one of the key challenges in sustainable water resources management and in assessing hydrological hazards such as floods and droughts. In the semi-arid mountainous regions of the Zagros range, pronounced spatial and temporal variability in precipitation, temperature, and evaporation results in complex and nonlinear relationships among runoff-generating factors. Consequently, the performance of classical conceptual and physically based models such as HBV and SWAT tends to decline under high uncertainty or data-scarce conditions. In recent years, artificial intelligence (AI) approaches based on machine learning, particularly gradient boosting models, have emerged as powerful data-driven tools for hydrological modeling. This study aims to develop and evaluate a multi-objective Gradient Boosting Regressor (GBR) model for river discharge prediction in the Kashkan watershed, located in southwestern Iran. By integrating the GBR model with the Harmony Search (HS) metaheuristic algorithm for optimal feature selection, and employing k-fold cross-validation for model validation, this research seeks to enhance model accuracy, stability, and generalizability under data-limited conditions. The main objective is to propose an efficient framework for identifying key hydrometric and meteorological predictors of river flow and improving the performance of data-driven models in regions with limited observational records.
Materials and Methods
The study area is the Kashkan River Basin in Lorestan Province, covering approximately 9,500 km². The dataset includes observations from three hydrometric stations (Kakarza, Kashkan–Afrineh, and Kashkan–Poldokhtar) and nine meteorological stations over a regular statistical period. After removing incomplete data, a total of 633 concurrent records were used for model training and testing. In the first step, the Gradient Boosting Regressor (GBR) model was developed using the Scikit-Learn library in Python.
The model iteratively combines multiple weak decision trees to progressively minimize prediction errors and identify nonlinear relationships between inputs and targets through ensemble learning. Key hyperparameters, including the learning rate, number of trees, and tree depth, were tuned using k-fold cross-validation. Next, optimal and multi-objective feature selection was performed using the Harmony Search (HS) algorithm. The objective function combined prediction accuracy (based on Mean Squared Error) and the number of selected features, while a weighting factor (β) controlled the balance between these two objectives. The input features included lagged time series of discharge from neighboring stations, meteorological variables with time delays, and temporal components such as month and year. Model performance was evaluated using several statistical metrics, including the Coefficient of Determination (R²), Root Mean Square Error (RMSE), Mean Absolute Error (MAE), and Nash–Sutcliffe Efficiency (NSE). Qualitative analyses were also performed using the correlation matrix, error distribution histograms, SHAP (Shapley Additive Explanations) plots, and error boxplots to assess the influence of key stations and the model’s predictive behavior under varying conditions.
Results and Discussion
The results demonstrated that the Gradient Boosting Regressor, coupled with the optimized feature subset from the HS algorithm, achieved high accuracy in flow prediction. When the discharge data from the Kakarza station were included as an input, the model achieved R² values of 0.92 and 0.96 for the test data at the Kashkan–Poldokhtar and Kashkan–Afrineh stations, respectively, with corresponding RMSE values of 15.51 and 8.54 L/s. In contrast, excluding Kakarza station data led to a marked decrease in model performance: R² dropped to 0.78 and 0.75, while RMSE increased to 26.98 and 19.25 L/s, respectively.
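For reference, the four evaluation metrics named in Materials and Methods can be computed from observed and predicted discharge as in the minimal sketch below. The abstract does not say which definition of R² was used, so the squared Pearson correlation here is an assumption; scikit-learn's `r2_score` instead uses 1 − SS_res/SS_tot, which is the same formula as NSE. The function names and the sample values are hypothetical.

```python
import math
from typing import Sequence


def rmse(obs: Sequence[float], sim: Sequence[float]) -> float:
    """Root mean square error."""
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / len(obs))


def mae(obs: Sequence[float], sim: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(o - s) for o, s in zip(obs, sim)) / len(obs)


def nse(obs: Sequence[float], sim: Sequence[float]) -> float:
    """Nash-Sutcliffe efficiency: 1 - residual sum of squares / observed spread."""
    mean_obs = sum(obs) / len(obs)
    residual = sum((o - s) ** 2 for o, s in zip(obs, sim))
    spread = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - residual / spread


def r_squared(obs: Sequence[float], sim: Sequence[float]) -> float:
    """Squared Pearson correlation (assumed definition of R² for this sketch)."""
    mean_obs = sum(obs) / len(obs)
    mean_sim = sum(sim) / len(sim)
    cov = sum((o - mean_obs) * (s - mean_sim) for o, s in zip(obs, sim))
    var_obs = sum((o - mean_obs) ** 2 for o in obs)
    var_sim = sum((s - mean_sim) ** 2 for s in sim)
    return cov ** 2 / (var_obs * var_sim)


# Hypothetical observed and predicted discharges, for illustration only.
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 37.0]

scores = {
    "R2": r_squared(observed, predicted),
    "RMSE": rmse(observed, predicted),
    "MAE": mae(observed, predicted),
    "NSE": nse(observed, predicted),
}
```

RMSE and MAE carry the units of the discharge series, while R² and NSE are dimensionless, so only the latter two are directly comparable between the two target stations.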
The correlation matrix revealed that the Kakarza discharge exhibited a very strong correlation (>0.95) with the target variable, highlighting its crucial role in improving model stability and identifying flow relationships. SHAP feature importance analysis confirmed that removing this station changed the model behavior, increasing its reliance on meteorological and temporal inputs. When hydrometric data were included, the model effectively learned seasonal flow patterns and long-term trends. The error histograms and boxplots further supported the robustness of the model configuration that included Kakarza data, showing reduced error dispersion and mean errors close to zero. Conversely, excluding the key station resulted in positive error skewness and more outliers, indicating a tendency toward underestimation. These findings underscore the importance of multi-source data integration and intelligent feature selection for achieving robust and generalizable performance in data-driven hydrological modeling.
Conclusions and Recommendations
This study demonstrates that integrating the Gradient Boosting Regressor (GBR) model with the Harmony Search (HS) metaheuristic algorithm and employing k-fold cross-validation constitutes an effective approach for predicting river flows in data-scarce basins. The results emphasize the critical importance of including highly correlated hydrometric stations to improve model accuracy and stability, as their removal substantially reduces R² and increases prediction error. The developed model successfully captured nonlinear relationships between meteorological and hydrometric variables and prevented overfitting through intelligent feature selection. Future studies are encouraged to extend this framework by incorporating advanced boosting algorithms such as XGBoost or LightGBM, and by exploring hybrid metaheuristic algorithms (e.g., HS–GA).
Expanding this approach to real-time discharge or rainfall–runoff prediction in other Zagros catchments could significantly contribute to the integrated management of the country’s water resources.
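The feature-selection step described in Materials and Methods can be illustrated with a binary Harmony Search over feature masks. This is a sketch under stated assumptions, not the authors' implementation: the abstract only says that the objective combines an MSE-based accuracy term with the number of selected features through a weight β, so the additive form `error + beta * share_selected`, the parameter values (`memory_size`, `hmcr`, `par`), and the bit-flip form of pitch adjustment are all assumptions. The `toy_error` function is a hypothetical stand-in for the cross-validated GBR error so that the block runs with the standard library alone.

```python
import random
from typing import Callable, List, Sequence, Tuple


def objective(
    mask: Sequence[int],
    error_fn: Callable[[Sequence[int]], float],
    beta: float,
) -> float:
    """Assumed scalarised cost: error + beta * (share of features kept)."""
    n_selected = sum(mask)
    if n_selected == 0:
        return float("inf")  # a model needs at least one input
    return error_fn(mask) + beta * n_selected / len(mask)


def harmony_search(
    error_fn: Callable[[Sequence[int]], float],
    n_features: int,
    beta: float = 0.1,
    memory_size: int = 10,
    hmcr: float = 0.9,   # harmony memory considering rate
    par: float = 0.1,    # pitch adjusting rate
    iterations: int = 500,
    seed: int = 0,
) -> Tuple[List[int], float]:
    rng = random.Random(seed)
    # Harmony memory: random 0/1 masks and their costs.
    memory = [
        [rng.randint(0, 1) for _ in range(n_features)]
        for _ in range(memory_size)
    ]
    costs = [objective(h, error_fn, beta) for h in memory]
    for _ in range(iterations):
        new = []
        for j in range(n_features):
            if rng.random() < hmcr:
                bit = rng.choice(memory)[j]   # memory consideration
                if rng.random() < par:
                    bit = 1 - bit             # pitch adjustment as a bit flip
            else:
                bit = rng.randint(0, 1)       # random selection
            new.append(bit)
        new_cost = objective(new, error_fn, beta)
        worst = max(range(memory_size), key=costs.__getitem__)
        if new_cost < costs[worst]:
            # The improvised harmony replaces the worst one in memory.
            memory[worst], costs[worst] = new, new_cost
    best = min(range(memory_size), key=costs.__getitem__)
    return memory[best], costs[best]


def toy_error(mask: Sequence[int]) -> float:
    # Hypothetical stand-in for cross-validated GBR MSE:
    # only features 0 and 3 reduce the error.
    return 1.0 - 0.45 * mask[0] - 0.45 * mask[3]


best_mask, best_cost = harmony_search(toy_error, n_features=8)
```

In a real run, `error_fn` would train and cross-validate the regressor on the columns where the mask is 1, which makes each evaluation far more expensive than in this toy. A larger β pushes the search toward smaller subsets at the price of a higher error term.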
Keywords [English]
Watershed, Hydrological Model, Artificial Intelligence, Multi-objective, Machine Learning