
Data Mining Projects
CSE Projects
Data Mining Projects: Data mining is the process of discovering patterns in large data sets, sitting at the intersection of machine learning, statistics, and database systems. We provide data mining and data analysis projects with source code to students, addressing many real-time issues across a variety of software-based systems.
Quality Factor
- 100% Assured Results
- Best Project Explanation
- Tons of References
- Cost Optimized
- Control Panel Access
1. Synthetic Data Generation and Evaluation Techniques for Classifiers in Data Starved Medical Applications
This study presents a robust framework for asthma prediction using machine learning and data augmentation techniques to address class imbalance in medical datasets. The pipeline integrates several modules, including data collection, preprocessing, class imbalance handling, model training, evaluation, and prediction. Three synthetic oversampling techniques—Standard SMOTE, Autoencoder-based generation, and Incremental SMOTE—were compared to enhance minority class representation. A Random Forest classifier was used for prediction, and performance was evaluated using F1-score, recall, precision, ROC-AUC, and precision-recall curves. Among the models, SMOTE-enhanced training yielded improved recall and F1-score. A prediction module was also developed for personalized asthma risk assessment, classifying patients as Healthy or Asthma with corresponding risk levels. This modular and scalable architecture allows easy adaptation to other healthcare prediction problems involving imbalanced data. The final results, including a comparative performance analysis of oversampling techniques, were saved for further evaluation and clinical decision support integration.
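A minimal Python sketch of the class-imbalance step described above, assuming a tabular asthma dataset with a binary label column; the file name, column names, and Random Forest settings are placeholders rather than the project's exact configuration:

import pandas as pd
from imblearn.over_sampling import SMOTE
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, roc_auc_score

df = pd.read_csv("asthma.csv")                        # hypothetical dataset
X, y = df.drop(columns=["label"]), df["label"]        # "label": 0 = Healthy, 1 = Asthma

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# Oversample only the training split so the test set stays untouched.
X_res, y_res = SMOTE(random_state=42).fit_resample(X_train, y_train)

clf = RandomForestClassifier(n_estimators=200, random_state=42)
clf.fit(X_res, y_res)

proba = clf.predict_proba(X_test)[:, 1]
print(classification_report(y_test, clf.predict(X_test)))
print("ROC-AUC:", roc_auc_score(y_test, proba))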
2. Customer Behavior Analysis and Predictive Modeling in Supermarket Retail: A Comprehensive Data Mining Approach
This project presents a comprehensive data mining and predictive modeling approach to analyze customer behavior in a supermarket retail setting using the Instacart dataset. The process begins with data selection and integration of transactional records and product metadata, followed by detailed preprocessing, including handling missing values and encoding categorical variables. Customer-level features such as total items purchased, distinct items, and average basket size are derived from historical order data. These features are used to label customers into spending categories—Low, Medium, or High—based on quantile thresholds. Using these labels, machine learning models like Multi-layer Perceptron (MLP) and Support Vector Classifier (SVC) are trained and evaluated, with SVC selected for deployment due to its performance. The model and its corresponding scaler are serialized for reuse in real-time predictions. A user-interactive prediction function is also implemented, allowing input of shopping behavior metrics to classify a customer’s spending category and provide tailored business insights. This system enables retailers to segment customers effectively, predict purchasing behavior, and personalize marketing strategies to maximize customer lifetime value.
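A hedged sketch of the labeling and classification step, assuming a pre-aggregated customer-level table; the feature names, quantile cut points, and the use of joblib for serialization are illustrative assumptions:

import pandas as pd
import joblib
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

customers = pd.read_csv("customer_features.csv")      # hypothetical aggregate table
spend = customers["total_items"]

# Quantile thresholds -> Low / Medium / High spending categories
customers["category"] = pd.qcut(spend, q=[0, 0.33, 0.66, 1.0],
                                labels=["Low", "Medium", "High"])

X = customers[["total_items", "distinct_items", "avg_basket_size"]]
y = customers["category"]

X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

scaler = StandardScaler().fit(X_train)
model = SVC(kernel="rbf", probability=True).fit(scaler.transform(X_train), y_train)
print("test accuracy:", model.score(scaler.transform(X_test), y_test))

# Serialize the model and its scaler for reuse in real-time predictions
joblib.dump(model, "svc_model.joblib")
joblib.dump(scaler, "scaler.joblib")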
3. AI-Driven Meat Food Drying Time Prediction for Resource Optimization and Production Planning in Smart Manufacturing
This project presents an AI-driven solution for predicting meat drying time in smart manufacturing environments to enhance resource optimization and production planning. Using real-world data from a meat drying process, key parameters such as moisture, protein, fat content, temperature, humidity, and energy consumption were analyzed. The data underwent preprocessing, including handling missing values and label encoding, followed by training and evaluation using two machine learning models: XGBoost Regressor and a Multi-Layer Perceptron (MLP) Neural Network. Both models demonstrated strong predictive capabilities, with performance assessed through MAE, RMSE, and R² metrics. A user interface was designed to allow real-time predictions, making the solution practical for deployment via a Flask-based web application, supporting data-driven decision-making in smart food manufacturing.
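The regression-and-metrics step might look roughly like the following sketch, assuming a CSV of process logs with the listed parameters; the column names and hyperparameters are assumptions:

import numpy as np
import pandas as pd
from xgboost import XGBRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

df = pd.read_csv("meat_drying.csv")                    # hypothetical process log
features = ["moisture", "protein", "fat", "temperature", "humidity", "energy_kwh"]
X, y = df[features], df["drying_time_hours"]

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=1)

reg = XGBRegressor(n_estimators=300, learning_rate=0.05, max_depth=6)
reg.fit(X_tr, y_tr)

pred = reg.predict(X_te)
print("MAE :", mean_absolute_error(y_te, pred))
print("RMSE:", np.sqrt(mean_squared_error(y_te, pred)))
print("R2  :", r2_score(y_te, pred))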
4. Deep Classification of Algarrobo Trees in Seasonally Dry Forests of Peru Using Aerial Imagery
Plus trees are superior individuals selected for their exceptional genetic and morphological traits—such as health, growth form, and seed productivity—and play a vital role in sustainable forest management and reforestation. This study proposes an automated approach to classify plus trees of the algarrobo species (Neltuma pallida) using RGB aerial imagery and deep learning algorithms. A dataset combining geographic, phenological, and morphometric data was compiled with support from the Peru National Forest Service, and aerial imagery was processed to distinguish between plus and non-plus specimens. Three state-of-the-art convolutional neural networks—ResNet50, EfficientNetB0, and MobileNetV2—were evaluated, alongside a custom lightweight architecture named AlgarroboNet, designed specifically for this task. The models were assessed using hold-out validation across multiple trials. The results indicate the feasibility of deep learning-based classification for large-scale identification and inventory of plus trees, providing a scalable tool to support conservation planning and reforestation programs.
5. A Hybrid Machine Learning Model for Efficient XML Parsing
This project presents an intelligent algorithm selection system designed to predict the most efficient parsing algorithm based on file characteristics, such as file size and number of CPU cores. Utilizing a dataset, the system performs comprehensive exploratory data analysis (EDA) and applies machine learning models, including Artificial Neural Networks (ANN), Support Vector Machines (SVM), and a Hybrid Voting Classifier combining both. Data preprocessing includes label encoding and feature scaling for model readiness. Hyperparameter tuning is implemented for SVM using GridSearchCV, while ANN leverages adaptive learning and early stopping to optimize performance. The models are evaluated using confusion matrices and accuracy scores. The hybrid model achieves superior accuracy by leveraging the strengths of both individual models. A user interface allows real-time predictions with class probabilities and graphical comparison. This system provides a robust decision-making tool for algorithm selection, enhancing parsing efficiency and resource utilization in software environments handling large or variable file sizes.
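As a rough illustration of the tuning and voting idea (not the project's exact grids or architecture), the following sketch tunes an SVM with GridSearchCV, trains an MLP with early stopping, and combines them in a soft-voting classifier on a synthetic stand-in dataset:

from sklearn.datasets import make_classification
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier
from sklearn.ensemble import VotingClassifier
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

# Synthetic stand-in for the file-characteristics dataset
# (e.g. file size, CPU cores -> best parsing algorithm).
X, y = make_classification(n_samples=500, n_features=4, n_informative=3,
                           n_classes=3, n_clusters_per_class=1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# SVM branch tuned with GridSearchCV
svm = make_pipeline(StandardScaler(),
                    GridSearchCV(SVC(probability=True),
                                 {"C": [0.1, 1, 10], "kernel": ["rbf", "linear"]},
                                 cv=5))
# ANN branch with early stopping
mlp = make_pipeline(StandardScaler(),
                    MLPClassifier(hidden_layer_sizes=(64, 32),
                                  early_stopping=True, max_iter=500, random_state=0))

# Hybrid soft-voting classifier combining both models
hybrid = VotingClassifier([("svm", svm), ("mlp", mlp)], voting="soft")
hybrid.fit(X_train, y_train)
print("hybrid test accuracy:", hybrid.score(X_test, y_test))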
6. Predicting the Compressive Strength of Recycled Concrete Using Ensemble Learning Model
This project focuses on predicting the compressive strength of concrete using machine learning techniques, based on its material composition and age. The dataset comprises several key input features including the amounts of cement, blast furnace slag, fly ash, water, superplasticizer, coarse aggregate, fine aggregate, and the age of the concrete in days. The goal is to develop an accurate predictive model that can estimate the compressive strength of concrete, which is essential for ensuring the structural integrity and durability of construction projects. The project pipeline includes multiple stages, beginning with data loading and preprocessing. The data is imported from a CSV file, where any missing values are filled with zeros, and the features are normalized using MinMax scaling to bring all variables into a similar range. Next, the dataset is divided into training and testing sets to allow for a fair and unbiased evaluation of the models. Three machine learning algorithms are employed: Random Forest, Polynomial Regression, and XGBoost. These models are trained and tested to determine how well they can predict compressive strength. Their performance is evaluated using standard regression metrics, including Mean Squared Error (MSE), Root Mean Squared Error (RMSE), R-squared (R²), and Mean Absolute Error (MAE). To visually compare model performance, a bar chart is created showing the MSE values for each approach. The results indicate that XGBoost outperforms the other models in terms of accuracy. As a final step, the XGBoost model is saved using the pickle library, enabling its reuse for future predictions or deployment in real-world applications.
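A minimal sketch of this pipeline, assuming a CSV named concrete.csv with the mix-design columns and a strength target; values and hyperparameters are illustrative:

import pickle
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score
from xgboost import XGBRegressor

df = pd.read_csv("concrete.csv").fillna(0)             # missing values filled with zeros, as above
X = MinMaxScaler().fit_transform(df.drop(columns=["strength"]))
y = df["strength"]

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)

model = XGBRegressor(n_estimators=400, learning_rate=0.05)
model.fit(X_tr, y_tr)

print("MSE:", mean_squared_error(y_te, model.predict(X_te)))
print("R2 :", r2_score(y_te, model.predict(X_te)))

# Persist the trained model with pickle for reuse, as in the project.
with open("xgb_concrete.pkl", "wb") as f:
    pickle.dump(model, f)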
7. Adaptive Optimizable Gaussian Process Regression Linear Least Squares Regression Filtering Method for SEM Images
Scanning Electron Microscopy (SEM) is a powerful imaging technique widely used in material science and nanotechnology for high-resolution surface characterization. However, SEM images often suffer from significant noise introduced during acquisition due to factors like low electron counts or environmental disturbances. Effective noise reduction is critical for accurate interpretation and analysis of SEM images. This work explores and evaluates a hybrid filtering framework combining traditional and machine learning-based techniques to enhance the quality of SEM images by improving their Signal-to-Noise Ratio (SNR). The proposed framework incorporates a two-stage process. In the first stage, statistical features such as SNR, mean intensity, and variance are extracted from image patches. These features serve as inputs to a Gaussian Process Regressor (GPR), which is trained on synthetic data to predict local noise variance. The experiment is conducted on a set of SEM images divided into quadrants to analyze spatially varying noise behavior.
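A hedged sketch of the first stage on synthetic data: patch-level statistics (mean, variance, a crude SNR) are extracted and a Gaussian Process Regressor is trained to map them to a known noise variance. The patch size, kernel choice, and synthetic images are assumptions, not the paper's setup:

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

def patch_features(img, size=32):
    """Per-patch [mean intensity, variance, crude SNR] for non-overlapping patches."""
    feats = []
    for i in range(0, img.shape[0] - size + 1, size):
        for j in range(0, img.shape[1] - size + 1, size):
            p = img[i:i + size, j:j + size].astype(float)
            mean, var = p.mean(), p.var()
            feats.append([mean, var, mean / (np.sqrt(var) + 1e-8)])
    return np.array(feats)

rng = np.random.default_rng(0)
clean = rng.uniform(80, 160, size=(256, 256))          # stand-in for a noise-free SEM frame

# Build synthetic training data: several known noise levels -> patch statistics.
X_parts, y_parts = [], []
for sigma in (5.0, 10.0, 20.0, 40.0):
    noisy = clean + rng.normal(0.0, sigma, clean.shape)
    f = patch_features(noisy)
    X_parts.append(f)
    y_parts.append(np.full(len(f), sigma ** 2))        # target: local noise variance
X_train, y_train = np.vstack(X_parts), np.concatenate(y_parts)

gpr = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)
gpr.fit(X_train, y_train)

# Predict the noise variance of an image with an unseen noise level.
test = clean + rng.normal(0.0, 15.0, clean.shape)
print("mean predicted variance:", gpr.predict(patch_features(test)).mean())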
8. Prediction of Myocardial Infarction Based on Non-ECG Sleep Data Combined with Domain Knowledge
The provided code presents a complete machine learning pipeline for classifying sleep disorders based on lifestyle and sleep-related data, using methods such as Random Forest and Convolutional Neural Networks (CNNs). This pipeline involves data pre-processing (including handling missing values and label encoding), feature selection, model training, evaluation with multiple metrics, and visualization of results. This approach can be adapted effectively to predict myocardial infarction (MI) using non-ECG sleep data combined with domain knowledge. While MI diagnosis traditionally depends on ECG and invasive tests, sleep and lifestyle features—such as sleep duration, quality, stress, physical activity, and cardiovascular indicators—carry important predictive information for MI risk. By applying similar pre-processing and modeling techniques from the sleep disorder classification code, these non-ECG features can be leveraged to build robust MI prediction models. Domain knowledge plays a critical role in this adaptation by guiding the selection and interpretation of relevant features associated with cardiovascular health, improving model accuracy and clinical relevance. The Random Forest model provides interpretable feature importance, while CNNs can capture complex temporal patterns in sequential sleep data. Additionally, generating synthetic data with noise, as done in the CNN section, can help address data scarcity and class imbalance common in medical datasets. Visualization tools such as correlation heatmaps and pair plots assist in understanding feature relationships, further enhancing model development and explanation. Saving trained models enables future deployment in clinical decision support systems, potentially allowing early and non-invasive MI risk assessment. In summary, the sleep disorder classification framework provides a solid foundation for predicting myocardial infarction using non-ECG sleep data. Combining machine learning with domain expertise offers a promising avenue for accessible, accurate, and interpretable cardiovascular risk prediction.
9. Recognizing Supporting Standpoints for Online Restaurant Reviews: Evidence From Tripadvisor
This article presents a practical framework that analyzes a real dataset from Tripadvisor with 2330 restaurant reviews. The reviews are written in English, and the structure of the reviews is free, not following any specific form. We conducted four experiments that exploited different artificial intelligence models and natural language processing systems. Our results indicate that an online restaurant review often shares several standpoints with another review of the same restaurant. Our results also illustrate that the number of agreed standpoints between reviews is not related to the rating of the respective reviews. Likewise, the results imply that the pattern of variation in the number of reviews (identified as supported with a specific evidence intensity) is unrelated to the reviews' rating, where "evidence intensity" refers to the number of shared standpoints between a review and its supporter review. Additionally, if a restaurant review is found unsupported, the cause is mainly related to how the review was structured and written. Our framework can be utilized by online review platforms, such as Tripadvisor, to accredit or rank reviews based on supporting evidence from other related reviews.
10. Shape Penalized Decision Forests for Imbalanced Data Classification
Class imbalance presents a significant challenge in binary classification, especially when rare yet critical events are underrepresented in training data. Traditional machine learning and deep learning methods often struggle with this issue, while decision trees and random forests combined with sampling techniques have shown promise but can lead to information loss and increased complexity. This paper proposes Shape Penalized Decision Forests, a novel classifier that incorporates a penalty on the surface-to-volume ratio of decision sets during tree construction to inherently handle class imbalance without relying on oversampling or undersampling. By integrating ensemble methods such as bagging and adaptive boosting, the approach improves predictive accuracy and generalization. Extensive evaluation on twenty benchmark tabular datasets with varying imbalance ratios demonstrates superior performance compared to state-of-the-art data-level and algorithmic-level methods. Additional tests on simulated datasets highlight the model’s strong generalization. Statistical significance analyses confirm the robustness of the method. A Python package, ‘imbalanced-spdf,’ is released to facilitate adoption.
11. SCM-DL: Split-Combine-Merge Deep Learning Model Integrated With Feature Selection in Sports for Talent Identification
Talent Identification (TID) is crucial for early detection of athletes’ potential in specific sports branches. Existing AI-based TID methods face challenges with complex, non-linear data, scalability, hierarchical structures, incomplete inputs, and lack of adaptability across datasets. To overcome these, we propose a two-stage TID framework. The first stage (TID1) uses a Shallow Deep Learning (SDL) model to classify admitted athletes, achieving 98.85% accuracy. The second stage (TID2) focuses on classifying athletes into sports branches—football, basketball, volleyball, or athletics—by applying nine feature selection techniques to reduce dimensionality. We introduce a novel SCM-DL deep learning classifier with parallel and combinatorial layers, outperforming traditional classifiers such as Random Forest and SVM. Integrated with RFE_DTC feature selection, SCM-DL achieved 97.40% accuracy and a Matthews Correlation Coefficient of 96.6% using only six features. This approach effectively guides coaches in focusing on key performance metrics, improving talent identification precision and efficiency.
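The RFE-with-decision-tree feature selection referenced above can be illustrated with scikit-learn on stand-in data (the SCM-DL network itself is not reproduced here; dataset shape and feature count are assumptions):

from sklearn.feature_selection import RFE
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import make_classification

# Synthetic stand-in for the athlete performance table (4 sports branches).
X, y = make_classification(n_samples=400, n_features=20, n_informative=8,
                           n_classes=4, n_clusters_per_class=1, random_state=0)

# Recursively eliminate features with a decision tree until six remain.
rfe = RFE(DecisionTreeClassifier(random_state=0), n_features_to_select=6).fit(X, y)
print("selected feature indices:", [i for i, kept in enumerate(rfe.support_) if kept])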
12. Enhancing stress detection in IT workplaces: Integrating machine learning, visual processing, and privacy-enhancing techniques
Identifying stress in IT employees requires advanced technologies that balance accurate detection with ethical and privacy concerns. This study addresses the challenges of precisely recognizing diverse stress responses while safeguarding sensitive mental health data through machine learning and visual processing techniques. The proposed system integrates customized stress detection algorithms designed to account for individual differences, alongside state-of-the-art encryption methods to ensure data confidentiality. By combining these approaches, the system enhances both the accuracy and reliability of stress identification and alleviates privacy concerns related to sensitive information handling. Experimental results demonstrate that the system outperforms existing models in terms of accuracy, precision, and scalability. This framework offers a promising solution for real-time, privacy-preserving stress monitoring in the workplace, supporting IT organizations in fostering healthier work environments.
13. RFMVDA: An Enhanced Deep Learning Approach for Customer Behavior Classification in E-Commerce Environments
Customer Relationship Management (CRM) systems have evolved into Software-as-a-Service platforms enhanced by Customer Data Platforms (CDP) that continuously collect customer behavior data. In the rapidly growing e-commerce sector, customer classification and analysis demand models that capture dynamic and complex behaviors. Traditional RFM (Recency, Frequency, Monetary) models face limitations in this environment due to difficulties in collecting and reflecting real-time customer interactions. To address these challenges, we propose the RFMVDA model, which extends RFM by incorporating Visits, Durations, and Actions to better capture customer sessions and behaviors in e-commerce contexts. Leveraging this enriched data, we developed a Deep Neural Network (DNN) for customer segmentation and behavior prediction. Experimental results demonstrate that the RFMVDA-based model achieves a high segmentation accuracy of 92.98%, outperforming traditional methods. This approach provides a more comprehensive and accurate tool for understanding and predicting customer behavior in the e-commerce environment, facilitating more effective marketing and customer management strategies.
14. A Cloud-Based Optimized Ensemble Model for Risk Prediction of Diabetic Progression
This study presents an optimized ensemble algorithm combining Light Gradient-Boosting Machine (LightGBM) and K-Nearest Neighbour (KNN) for predicting the progression risk of Type 2 Diabetes. Utilizing patient health parameters and serum measurements, the model classifies patients as high or low risk. Optimization techniques, including 10-fold cross-validation and grid search, enhance model performance. The ensemble employs a soft voting classifier to leverage the strengths of both LightGBM and KNN. Implemented on Microsoft Azure Machine Learning, the approach benefits from cloud scalability and integration potential with IoT-based healthcare systems, enabling remote patient monitoring. The ensemble achieved an AUC-ROC score of 83.2% and 75% accuracy, outperforming other classification and ensemble models. Validation on an additional risk prediction dataset confirmed its robustness. This predictive model offers valuable insights for patients and medical professionals, supporting timely interventions and improved management of diabetes progression.
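A sketch of the ensembling approach on synthetic stand-in data; the parameter grid and class balance are assumptions, not the study's values:

from lightgbm import LGBMClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import VotingClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.datasets import make_classification

# Stand-in for the patient health parameters and serum measurements.
X, y = make_classification(n_samples=600, n_features=10, weights=[0.7, 0.3],
                           random_state=7)

knn = make_pipeline(StandardScaler(), KNeighborsClassifier())
lgbm = LGBMClassifier(n_estimators=200)

# Soft voting leverages the probability outputs of both base models.
ensemble = VotingClassifier([("lgbm", lgbm), ("knn", knn)], voting="soft")

# Grid search with 10-fold cross-validation over a few key hyperparameters.
grid = GridSearchCV(ensemble,
                    {"lgbm__num_leaves": [15, 31],
                     "knn__kneighborsclassifier__n_neighbors": [5, 11]},
                    cv=10, scoring="roc_auc")
grid.fit(X, y)
print("best CV AUC-ROC:", grid.best_score_)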
15. Personalized Learning Through MBTI Prediction: A Deep Learning Approach Integrated With Learner Profile Ontology
This paper presents a novel personalized e-learning framework that integrates MBTI personality prediction with deep learning and a Learner Profile Ontology (LPO) to enhance learning recommendations. Leveraging the BERT Transformer model, the approach accurately predicts learners’ personality types, addressing data imbalance through oversampling techniques. The predicted MBTI profiles are incorporated into a Semantic Web Rule Language (SWRL)-based ontology, enriched with WordNet, to semantically align learning resources with individual learner traits. This integration enables more precise personalization by adapting content to match learners’ unique preferences and styles. Experimental results demonstrate that the proposed method significantly improves prediction accuracy, learner satisfaction, and educational outcomes. The framework addresses challenges in handling large data volumes while balancing accuracy and efficiency, offering a sustainable solution for adaptive e-learning environments. This work highlights the potential of combining advanced AI techniques with semantic technologies for more effective personalized education.
16. FastGEMF: Scalable High-Speed Simulation of Stochastic Spreading Processes Over Complex Multilayer Networks
Predicting stochastic spreading processes across large-scale multi-layered networks remains a significant computational challenge due to the intricate interplay between network structure and spread dynamics. This study introduces FastGEMF, a novel, scalable simulation framework for exact, high-speed modeling of Markov chain processes on complex multi-layer networks. Inspired by the Gillespie algorithm and optimized for efficiency, FastGEMF achieves logarithmic time complexity per event, enabling simulations on networks with millions of nodes and edges without sacrificing accuracy. It introduces an event-driven algorithm with cautious update strategies, supporting diverse multi-compartment spreading processes. FastGEMF is implemented in Python as an open-source package, providing accessibility to researchers and practitioners across domains such as epidemiology, cybersecurity, and information propagation, and establishing an exact baseline for model validation and comparative analysis.
17. Artificial Intelligence in Dyslexia Research and Education: A Scoping Review
This scoping review explores the evolving role of Artificial Intelligence (AI) in dyslexia research and education, mapping current applications, emerging trends, and existing gaps. AI technologies such as machine learning, natural language processing, and adaptive learning systems have shown potential in early detection, personalized interventions, and support tools for dyslexic learners. The review examines peer-reviewed studies and gray literature to assess how AI contributes to diagnosis accuracy, reading and writing support, and inclusive learning environments. Key findings reveal a growing interest in AI-driven screening tools and educational apps that adapt to individual learning patterns. However, ethical concerns, data privacy, and the need for more diverse training datasets remain challenges. This review highlights interdisciplinary collaboration as essential to integrating AI effectively into dyslexia support systems. It also underscores the need for longitudinal studies to evaluate long-term outcomes. Overall, AI offers promising avenues for enhancing dyslexia education but requires careful, evidence-based implementation.
18. Comparative Study of Machine Learning and Deep Learning Models for Early Prediction of Ovarian Cancer
This study presents a comparative analysis of machine learning and deep learning models for the early prediction of ovarian cancer. Various algorithms, including traditional classifiers like Support Vector Machines (SVM), Random Forest, and Logistic Regression, are compared with deep learning approaches such as Convolutional Neural Networks (CNNs) and Long Short-Term Memory (LSTM) networks. The models are evaluated on clinical and imaging datasets to assess their accuracy, sensitivity, and specificity in detecting ovarian cancer at early stages. Results demonstrate that deep learning models generally outperform traditional methods, offering improved predictive capabilities that can aid timely diagnosis and enhance patient outcomes.
19. An Analysis of Semi-Supervised Machine Learning in Electrical Machines
This research explores semi-supervised learning approaches to improve fault diagnosis and performance monitoring in electrical machines, where labeled data is limited. The study applies algorithms that leverage both labeled and unlabeled data to enhance model accuracy. Results demonstrate better detection rates compared to purely supervised or unsupervised methods. The approach supports predictive maintenance and operational efficiency. Challenges in industrial implementation and data quality are addressed. Findings suggest semi-supervised learning as a practical solution for smart manufacturing.
20. Ad Click Fraud Detection Using Machine Learning and Deep Learning Algorithms
The Python script presents a complete machine learning pipeline for detecting ad click fraud using both classical and deep learning methods. It starts by loading and preprocessing the dataset (ad_click_fraud_dataset_updated.csv), removing unnamed columns and handling missing values. Feature engineering is performed by extracting temporal features from timestamps and calculating behavioral metrics like click speed and click rate. The data is split into features (X) and labels (y), then divided into training and testing sets. Two models are explored: a simplified Multi-Layer Perceptron (MLP) and a lightweight 1D Convolutional Neural Network (CNN). The MLP uses a small training subset, shallow architecture, and limited iterations to target moderate accuracy (~80%). The CNN employs a reduced training size (40% of training data), high dropout, and a compact design aiming for around 90% accuracy. Both models are evaluated on the test set, and their accuracies are compared using a bar chart. Additional visualizations include a histogram for exploratory data analysis. Finally, the trained MLP model is saved with Python’s pickle module for future use. Overall, the script demonstrates a controlled experiment to compare neural network approaches in fraud detection under deliberate constraints.
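The MLP branch might be sketched as follows; the label column name ("is_fraud") and the assumption that the engineered features are already numeric are illustrative, not taken from the script:

import pickle
import pandas as pd
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

df = pd.read_csv("ad_click_fraud_dataset_updated.csv")
df = df.loc[:, ~df.columns.str.startswith("Unnamed")].dropna()   # drop unnamed columns and NaNs

# Assumes the engineered features (click speed, click rate, temporal fields)
# are already numeric and the label column is named "is_fraud" (hypothetical).
X = df.select_dtypes("number").drop(columns=["is_fraud"])
y = df["is_fraud"]
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

sub = X_tr.sample(frac=0.4, random_state=0).index      # reduced training subset, as described
mlp = MLPClassifier(hidden_layer_sizes=(32,), max_iter=50, random_state=0)  # shallow, few iterations
mlp.fit(X_tr.loc[sub], y_tr.loc[sub])
print("MLP test accuracy:", accuracy_score(y_te, mlp.predict(X_te)))

with open("mlp_fraud.pkl", "wb") as f:                  # persist the trained MLP for reuse
    pickle.dump(mlp, f)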
21. A Novel Partitioned Random Forest Method-Based Facial Emotion Recognition
22. Cardiac Clarity: Harnessing Machine Learning for Accurate Heart-Disease Prediction
Heart disease continues to be a leading cause of death globally, highlighting the urgent need for reliable and early diagnostic tools. Cardiac Clarity explores the application of machine learning algorithms to enhance the accuracy of heart disease prediction using patient medical data. By analyzing features such as age, blood pressure, cholesterol levels, and other clinical indicators, the system is trained to recognize patterns associated with cardiac risk. Machine learning models are developed and evaluated to identify the most effective approaches for distinguishing between healthy individuals and those at risk of heart disease. The study demonstrates that machine learning can significantly improve predictive performance compared to traditional diagnostic methods, offering a powerful decision-support tool for clinicians. This work underscores the potential of artificial intelligence in transforming preventive cardiology and supporting timely, data-driven healthcare decisions.
23. A Novel Framework for Saraiki Script Recognition Using Advanced Machine Learning Models (YOLOv8 and CNN)
Script recognition plays a crucial role in the digitization and preservation of regional languages. This study presents a novel framework for automatic recognition of the Saraiki script using advanced machine learning models, specifically YOLOv8 and Convolutional Neural Networks (CNN). The proposed system is designed to accurately detect and classify handwritten and printed Saraiki characters in complex visual environments. YOLOv8 is employed for real-time detection of script regions, while CNN is utilized for detailed classification of the extracted characters. A custom dataset of Saraiki script images was developed and preprocessed to train and validate the models. Experimental results demonstrate high accuracy and efficiency in script recognition, even under challenging conditions such as varying fonts, noise, and backgrounds. This framework marks a significant advancement in regional language processing and provides a scalable solution for future applications in optical character recognition (OCR), education, and digital archiving.
24. AgeML: Age Modeling With Machine Learning
Accurately estimating a person’s age based on visual or biometric data has valuable applications in healthcare, security, digital forensics, and personalized services. AgeML introduces a machine learning-based framework for robust age modeling using a variety of input features such as facial images, biometric markers, or demographic data. The framework leverages advanced machine learning algorithms, including regression models and deep learning architectures, to predict chronological age with high accuracy. A diverse dataset is used to train and evaluate the models under varying conditions, including lighting, expression, and ethnicity. Through extensive experimentation, AgeML demonstrates strong performance in both age estimation and classification tasks. The results highlight the effectiveness of machine learning in capturing complex, non-linear age-related patterns. This work contributes to the development of intelligent systems capable of real-time age prediction, with potential for deployment in real-world, multi-domain applications.
25. Fault Detection in Photovoltaic Systems Using a Machine Learning Approach
Ensuring the reliability and efficiency of photovoltaic (PV) systems is essential for the sustainable generation of solar energy. This study proposes a machine learning-based approach for the automatic detection of faults in photovoltaic systems, aiming to enhance system performance and reduce downtime. By analyzing operational data such as voltage, current, irradiance, and temperature, the proposed framework uses supervised learning algorithms to identify patterns indicative of various fault types, including shading, soiling, and component failures. A labeled dataset of real-world PV system measurements is used to train and validate the models. The results show that the machine learning approach can accurately detect and classify faults with high precision and recall, outperforming traditional threshold-based techniques. This work demonstrates the potential of intelligent fault detection systems to support predictive maintenance, improve energy output, and ensure the long-term viability of solar power installations.
26. Semi-Supervised Building Footprint Extraction Using Debiased Pseudo-Labels
Accurate extraction of building footprints from remote sensing imagery is critical for urban planning, disaster response, and geographic information systems. This study presents a semi-supervised learning framework that leverages debiased pseudo-labels to improve building footprint extraction performance, particularly in scenarios with limited labeled data. The proposed method combines a small set of manually annotated satellite images with a large pool of unlabeled data, generating pseudo-labels through a trained model and applying debiasing techniques to correct for systematic labeling errors. These refined pseudo-labels are iteratively used to retrain the model, enhancing generalization and reducing overfitting to noisy predictions. Experimental evaluations on benchmark datasets demonstrate that the framework significantly outperforms conventional supervised and naive pseudo-labeling methods in both accuracy and boundary delineation. This approach offers a scalable and cost-effective solution for high-quality building footprint mapping in data-scarce environments.
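A generic pseudo-labeling loop on tabular stand-in data illustrates the core idea (confident pseudo-labels folded back into training); the actual method works on imagery with a segmentation network and an additional debiasing step not shown here:

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=12, random_state=3)
# Only 5% of the data is treated as labeled; the rest plays the role of unlabeled imagery.
X_lab, X_unlab, y_lab, _ = train_test_split(X, y, train_size=0.05, random_state=3)

model = RandomForestClassifier(random_state=3).fit(X_lab, y_lab)

for _ in range(3):                                      # a few self-training rounds
    proba = model.predict_proba(X_unlab)
    conf = proba.max(axis=1)
    keep = conf > 0.9                                   # keep only confident pseudo-labels
    pseudo_y = proba.argmax(axis=1)[keep]
    X_aug = np.vstack([X_lab, X_unlab[keep]])
    y_aug = np.concatenate([y_lab, pseudo_y])
    model = RandomForestClassifier(random_state=3).fit(X_aug, y_aug)

print("labeled samples:", len(X_lab), "| pseudo-labeled samples used:", int(keep.sum()))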
27. A Survey of Ransomware Detection Methods
Ransomware has emerged as one of the most significant cybersecurity threats, causing substantial financial and operational damage across various sectors. This survey provides a comprehensive overview of current methods and approaches for ransomware detection, with a focus on both traditional and machine learning-based techniques. The study categorizes detection methods into signature-based, behavior-based, heuristic, and hybrid approaches, highlighting their respective strengths and limitations. Additionally, the survey explores the growing role of artificial intelligence and anomaly detection models in identifying ransomware attacks in real-time. Comparative analysis of existing techniques is presented based on detection accuracy, response time, evasion resistance, and resource efficiency. Key challenges such as zero-day attacks, obfuscation techniques, and dataset availability are discussed. The paper concludes by identifying open research directions and emphasizing the need for adaptive, robust, and scalable detection frameworks to combat the evolving threat landscape.
28. Weak–Strong Graph Contrastive Learning Neural Network for Hyperspectral Image Classification
Hyperspectral image (HSI) classification plays a vital role in remote sensing applications, yet it remains challenging due to the high dimensionality, limited labeled samples, and complex spectral–spatial dependencies. This study introduces a novel Weak–Strong Graph Contrastive Learning Neural Network (WS-GCLNet) to improve HSI classification performance under limited supervision. The proposed framework leverages graph-based contrastive learning by generating two augmented views of the data: a weakly augmented view preserving local features and a strongly augmented view emphasizing global context. These views are processed through a graph neural network that captures spectral–spatial correlations, and a contrastive loss is applied to align their embeddings, promoting robust feature representations. Extensive experiments on benchmark hyperspectral datasets demonstrate that WS-GCLNet significantly outperforms existing supervised and self-supervised methods, particularly in low-label scenarios. This approach highlights the potential of combining weak–strong augmentation strategies with graph-based contrastive learning for efficient and accurate HSI classification.
29. Ensuring Zero Trust Security in Consumer Internet of Things Using Federated Learning-Based Attack Detection Model
Advanced behavior-based intranet attacks represent a sophisticated and evolving form of cyber threat, where malicious actors exploit vulnerabilities within internal networks using highly dynamic and adaptive strategies. Unlike traditional signature-based methods of intrusion detection, behavior-based approaches analyze patterns, anomalies, and deviations from normal user and system activities. These attacks often target weak points in the internal security infrastructure, such as unpatched software, poor access controls, or misconfigurations. By mimicking legitimate activities or hiding within normal system operations, they can bypass conventional security measures, making detection more challenging. This paper explores the various types of advanced behavior-based attacks within intranet environments, discusses the underlying techniques used by attackers, and highlights the potential risks to organizational security. Additionally, we examine the effectiveness of behavior-based detection systems, such as machine learning and anomaly detection tools, in identifying and mitigating these advanced threats. The study aims to provide insights into enhancing intranet security strategies to defend against evolving cyber threats effectively.
30. Mitigating Cyber Risks in Smart Cyber-Physical Power Systems Through Deep Learning and Hybrid Security Models
The rise in cyber threats has posed significant challenges to organizations and individuals, making cybersecurity a critical concern in the digital age. Traditional systems for detecting and mitigating cyber attacks often fall short in real-time threat prediction and classification due to their reliance on predefined rules and manual interventions. To address these issues, this project proposes a novel approach using Deep Neural Networks (DNNs) to predict and mitigate cyber threats by analyzing user login data and other related patterns. The model is trained on a dataset containing various cyber attack scenarios, providing valuable insights into different types of cyber threats and attack strategies. The system follows a structured process starting with data collection, followed by data preprocessing where text data is cleaned and normalized to enhance model accuracy. The preprocessed data is then split into training and test sets, and a deep neural network classifier is trained to identify patterns indicative of potential threats. The model undergoes rigorous evaluation and is able to classify various types of cyber attacks such as login anomalies, brute force attacks, and unauthorized access attempts.
31. AI-Powered IoT: A Survey on Integrating Artificial Intelligence With IoT for Enhanced Security, Efficiency, and Smart Applications
The increasing integration of Internet of Things (IoT) devices in smart homes has brought numerous benefits, including automation, convenience, and energy efficiency. However, it has also introduced new vulnerabilities, making IoT networks prime targets for cyberattacks. Cyber threats targeting IoT devices are becoming more sophisticated, often exploiting the weaknesses in smart home networks, and can lead to severe consequences such as unauthorized access, data breaches, or even physical harm. In response, this project aims to develop an intelligent system to detect and mitigate cyberattacks on smart home IoT networks. The system utilizes a networking rate-based dataset containing user login information to predict potential threats. The data is stored in a dataframe, and various preprocessing steps, such as cleaning, normalization, and feature extraction, are applied to prepare the data for machine learning. Once the data is processed, it undergoes classification using a Convolutional Neural Network (CNN), a deep learning algorithm well-suited for recognizing complex patterns. The platform provides real-time price updates using machine learning algorithms to forecast price trends, ensuring transparency and fairness in pricing.
Farmers can manage their stock, while markets can update demand, facilitating efficient matching of supply and demand.
Using historical data, ML models such as Random Forest and Passive Classifier are employed to forecast price trends and help farmers make informed decisions.
A web interface enables users to register, upload data, view price trends, track inventory, and access information about government schemes.
32. Evaluation of Blockchain-Based Tracking and Tracing System With Uncertain Information: A Multi-Criteria Decision-Making Approach
The increasing volume and pseudonymity of blockchain-style transactions present significant challenges for identifying illicit activities such as fraud and money laundering. Traditional analytical methods often fail to capture the complex, interconnected nature of these financial networks. This project introduces a comprehensive, modular Python framework designed to address this challenge by leveraging graph-based machine learning for transaction analysis and classification.
33. Trip Based Modeling of Fuel Consumption in Modern Heavy-Duty Vehicles Using AI
This project presents an AI-based model for analyzing and predicting fuel consumption in commercial vehicles, aiming to optimize fuel efficiency and reduce operational costs. By utilizing machine learning algorithms such as Decision Tree and Linear Regression, the model predicts fuel consumption based on vehicle attributes and operational data. The dataset undergoes preprocessing, including missing data handling and label encoding, followed by dimensionality reduction using Principal Component Analysis (PCA). Model performance is evaluated using metrics like accuracy, precision, and F1-score, providing a robust framework for improving fuel efficiency in commercial fleets and supporting sustainability efforts in transportation.
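A minimal sketch of this modeling flow, assuming a trip-level CSV; the column names, PCA dimensionality, and tree depth are placeholders:

import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.decomposition import PCA
from sklearn.tree import DecisionTreeRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

df = pd.read_csv("fleet_trips.csv").dropna()            # hypothetical trip log
for col in df.select_dtypes("object"):
    df[col] = LabelEncoder().fit_transform(df[col])      # label-encode categorical attributes

X = PCA(n_components=5).fit_transform(df.drop(columns=["fuel_l_per_100km"]))
y = df["fuel_l_per_100km"]
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for name, model in [("DecisionTree", DecisionTreeRegressor(max_depth=6)),
                    ("LinearRegression", LinearRegression())]:
    model.fit(X_tr, y_tr)
    print(name, "R2:", r2_score(y_te, model.predict(X_te)))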
34. The Impact of Aging on an FPGA-based Physical Unclonable Function
The core methodology begins by transforming raw, tabular transaction data into a rich, structural representation using the NetworkX library. A directed graph is constructed where network entities (e.g., wallet addresses) are modeled as nodes and transactions are represented as weighted, directed edges, with the weight corresponding to the transaction amount. This graph-centric approach effectively captures the flow of funds and the relational dynamics between participants.
The application includes a registration and login system for user access, followed by a comprehensive dashboard divided into three key categories: Monitoring, Inventory, and Contact. The Monitoring section provides real-time crowd detection and manages student details, helping cafe owners keep track of the number of customers and student visitors. The Inventory section enables staff to monitor food-related data, track stock levels, and make suggestions for ordering based on popular items. The Contact section offers a streamlined way for customers to reach out to cafe management for support or inquiries.
35. Assessment of Climate Change in Angola and Potential Impacts on Agriculture
This study analyzes climate change trends in Angola and their impacts on the agricultural sector. By examining historical climate data alongside predictive models, it assesses significant shifts in temperature and precipitation patterns over recent decades. The research identifies vulnerable crops and regions most at risk from these changes, highlighting potential consequences such as reduced crop yields and increased food insecurity. It also explores adaptation strategies and sustainable farming practices that could mitigate negative effects. Additionally, the study emphasizes the importance of informed policy recommendations to support resilient agricultural systems, ensuring food security and the livelihoods of communities dependent on farming in the face of ongoing climate challenges.
Through detailed user profiles, it tracks learning goals, preferred pace, and styles, ensuring a customized journey. Immediate feedback and assessments empower learners to stay on track, while dynamic progress tracking and analytics provide insights into strengths and areas for improvement.
Gamification elements, such as badges and leaderboards, enhance motivation and engagement. With mobile accessibility and cross-platform compatibility, the platform offers flexibility, allowing students to learn anytime, anywhere.
Built with Python, MySQL, and Flask, the platform provides a seamless and interactive experience for learners.
36. A Fusion Deep Learning Model for Predicting Adverse Drug Reactions Based on Multiple Drug Characteristics
From this network structure, a comprehensive set of features is engineered for each node to capture various aspects of its behavior. The initial implementation primarily focuses on the transaction count, represented by the out-degree of each node. However, the framework is designed to be extensible, allowing integration of more advanced graph metrics such as in-degree, centrality measures including PageRank and betweenness centrality, as well as clustering coefficients. These extracted features serve as the input to a supervised machine learning model. Specifically, a RandomForestClassifier from scikit-learn is trained to predict predefined categories for each node, enabling automated and accurate detection of entities linked to specific behaviors within the network.
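A small NetworkX-based sketch of this step, using a made-up edge list and hypothetical node labels; the extra graph metrics shown (in-degree, PageRank, clustering) are the extensible features mentioned above:

import networkx as nx
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Tiny illustrative transaction graph: wallet addresses as nodes,
# weighted directed edges as transaction amounts.
edges = [("A", "B", 5.0), ("A", "C", 2.5), ("B", "C", 1.0),
         ("C", "D", 7.0), ("D", "A", 0.5), ("B", "D", 3.0)]
G = nx.DiGraph()
G.add_weighted_edges_from(edges)

pagerank = nx.pagerank(G, weight="weight")
clustering = nx.clustering(G.to_undirected())
nodes = list(G.nodes)

# Per-node feature vector: out-degree, in-degree, PageRank, clustering coefficient.
X = np.array([[G.out_degree(n), G.in_degree(n), pagerank[n], clustering[n]]
              for n in nodes])
y = np.array([0, 0, 1, 0])                               # hypothetical labels, e.g. 1 = suspicious

clf = RandomForestClassifier(random_state=0).fit(X, y)
print(dict(zip(nodes, clf.predict(X))))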
37. Machine learning in smart production logistics: a review of technological capabilities
This review explores the application of machine learning (ML) technologies in smart production logistics, emphasizing critical areas such as demand forecasting, inventory management, and supply chain optimization. It illustrates how ML improves operational efficiency, lowers costs, and enables real-time decision-making within manufacturing environments. The paper addresses challenges related to data quality, system integration, and scalability. Additionally, it examines emerging trends like reinforcement learning and the integration of the Internet of Things (IoT). The review identifies current gaps and suggests future research directions to further advance smart logistics. Overall, ML is demonstrated as a transformative force driving Industry 4.0 initiatives forward.
38. Online Recruitment Fraud (ORF) Detection Using Deep Learning Approaches
This project introduces a comprehensive end-to-end pipeline structured into distinct classes to promote clarity, modularity, and reusability. The pipeline encompasses essential stages such as data selection, preprocessing, graph construction, model training, and detailed performance evaluation. A JSON serialization component is integrated to realistically simulate transaction data handling as encountered in operational systems. The model's performance is rigorously assessed through a complete classification report, including precision, recall, and F1-score—key metrics especially important for imbalanced datasets. This project effectively demonstrates how combining graph theory with machine learning techniques can increase transparency and deliver actionable intelligence for analyzing complex transactional data systems.
39. Mining the Opinions of Software Developers for Improved Project Insights: Harnessing the Power of Transfer Learning
This research applies transfer learning to analyze software developers’ opinions from forums and code repositories. The approach enhances sentiment analysis and topic modeling in technical contexts. Insights help project managers identify bottlenecks and improve collaboration. Transfer learning enables effective use of limited labeled data. The study evaluates models on real-world software development datasets. It offers recommendations for integrating AI in software project management.
40. Multiparametric MRI-Based Radiomics Signature with Machine Learning for Preoperative Prediction of Prognosis Stratification in Pediatric Medulloblastoma
This paper develops a machine learning model using multiparametric MRI radiomics to predict prognosis in pediatric medulloblastoma. Quantitative imaging features are extracted to create a radiomics signature. The model stratifies patients into risk groups preoperatively with high accuracy. Early risk assessment supports personalized treatment planning. Validation on clinical data shows promising predictive performance. Future work includes multi-center studies for generalization.
41. Classification models for likelihood prediction of diabetes at early stage using feature selection
This study explores the application of classification algorithms integrated with feature selection techniques to predict the early onset of diabetes. It evaluates various machine learning models such as decision trees, logistic regression, and others, assessing their accuracy and efficiency. Feature selection plays a crucial role in enhancing model performance by identifying the most relevant attributes, thereby reducing computational costs and improving interpretability. The research employs clinical datasets for robust training and testing, ensuring practical relevance. Early diabetes prediction facilitates timely preventive healthcare interventions, which can significantly improve patient outcomes. The study underscores the need for accurate, transparent models to support medical decision-making and foster trust in AI-based healthcare systems.
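A hedged sketch of feature selection plus two of the mentioned classifiers, assuming a symptoms-style dataset with a Positive/Negative class column; the file and column names are assumptions:

import pandas as pd
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split

df = pd.read_csv("diabetes_early_stage.csv")             # hypothetical clinical dataset
X = pd.get_dummies(df.drop(columns=["class"]))           # one-hot encode symptom attributes
y = (df["class"] == "Positive").astype(int)

selector = SelectKBest(chi2, k=8).fit(X, y)              # keep the 8 most relevant attributes
X_sel = selector.transform(X)
print("selected:", list(X.columns[selector.get_support()]))

X_tr, X_te, y_tr, y_te = train_test_split(X_sel, y, stratify=y, random_state=0)
for name, clf in [("LogisticRegression", LogisticRegression(max_iter=1000)),
                  ("DecisionTree", DecisionTreeClassifier(max_depth=5))]:
    clf.fit(X_tr, y_tr)
    print(name, "accuracy:", clf.score(X_te, y_te))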
42. Artificial intelligence algorithms in flood prediction: a general overview
This paper provides an extensive review of artificial intelligence (AI) algorithms employed in flood prediction systems. It explores the use of machine learning, deep learning, and hybrid models that leverage meteorological and hydrological data to enhance forecasting accuracy and increase lead times compared to conventional methods. The study addresses challenges such as limited data availability, model generalization issues, and difficulties in integrating diverse data sources. Practical implementations include early warning systems and disaster management frameworks. Additionally, the review identifies current advancements and outlines future research directions to improve the robustness, reliability, and overall effectiveness of flood prediction systems for better disaster preparedness and risk reduction.
43. Enhancing Proactive Cyber Defense: A Theoretical Framework for AI-Driven Predictive Cyber Threat Intelligence
This project focuses on building a machine learning-based system for detecting malware in educational IoT systems using the IoEd-Net dataset. The process begins with data collection and preprocessing, including handling missing data, feature engineering, normalization, and addressing imbalances. The data is split into training and test sets for model evaluation. Various machine learning algorithms, such as Convolutional Neural Networks (CNN) and K-Nearest Neighbors (KNN), are used to train and predict system behaviors, identifying benign or malicious activities. Model performance is assessed through accuracy, precision, recall, and F1 score, with visualizations like confusion matrices and ROC curves. The system also generates real-time alerts and provides a dashboard for monitoring detected threats. The backend is implemented in Python with Flask for the front-end, MySQL for the database, and Anaconda Navigator for development.
44. Machine learning applications in flood forecasting and predictions, challenges, and way-out in the perspective of changing environment
Flood severity prediction is critical for disaster management, and this study leverages IoT-enabled sensor data to develop an ensemble machine learning model for accurate flood forecasting. The dataset includes real-time environmental parameters such as rainfall intensity, water level, soil moisture, and temperature, preprocessed to handle missing values, outliers, and normalization. The model combines Long Short-Term Memory (LSTM) networks for time-series analysis and Support Vector Machines (SVM) for classification, enabling robust flood severity predictions.
Visualization techniques, including time-series plots and correlation heatmaps, provide insights into feature relationships and model performance. Evaluation metrics such as MAE, RMSE, accuracy, and F1-score demonstrate the model's effectiveness in predicting flood severity levels. The results highlight the potential of IoT-driven ensemble models in enhancing flood risk assessment and mitigation strategies.
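The LSTM branch could be sketched as below on random stand-in sequences; the window length, layer sizes, and three severity classes are assumptions, and the SVM combination step is omitted:

import numpy as np
import tensorflow as tf

# Random stand-in windows of sensor readings: rainfall, water level, soil moisture, temperature.
n_samples, timesteps, n_features, n_classes = 500, 24, 4, 3
X = np.random.rand(n_samples, timesteps, n_features).astype("float32")
y = np.random.randint(0, n_classes, size=n_samples)      # flood severity class per window

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(timesteps, n_features)),
    tf.keras.layers.LSTM(32),
    tf.keras.layers.Dense(n_classes, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(X, y, epochs=3, batch_size=32, verbose=0)

# In the described ensemble, the LSTM outputs would be combined with an SVM;
# here we simply report the LSTM's own loss and accuracy on the stand-in data.
print(model.evaluate(X, y, verbose=0))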
45. Revolutionizing cardiovascular health: integrating deep learning techniques for predictive analysis of personal key indicators in heart disease
This study aims to predict cardiovascular diseases (CVD) using advanced machine learning models, with an integration of chronic kidney disease (CKD) and stock risk prediction to enhance prediction accuracy. Initially, the dataset is pre-processed to calculate relevant formulas, extracting features such as cardiovascular indicators, CKD metrics, and stock risk factors. A hybrid approach combining Long Short-Term Memory (LSTM) networks and XGBoost is employed to predict CVD outcomes. LSTM models, known for their proficiency in sequential data analysis, capture temporal dependencies in the data, while XGBoost provides robust performance in classification tasks. By incorporating CKD and stock risk data, the model not only aims for higher prediction accuracy in detecting cardiovascular risks but also explores the relationship between health indicators and financial stability. The combined use of LSTM and XGBoost offers a multi-faceted approach, significantly improving the predictive power of cardiovascular disease detection and potentially facilitating more informed healthcare and investment decisions.
46. Optimization of network device hardening in a multivendor environment
47. Detection of Fake Accounts in Social Media
This project focuses on developing a system for detecting fake profiles on social media platforms, specifically Instagram, Twitter, and Facebook. The system utilizes machine learning algorithms to classify profiles as either real or fake based on key features extracted from the data. Data preprocessing techniques such as handling missing data, feature scaling, text cleaning, and categorical encoding are applied to ensure the data is clean and suitable for analysis. The dataset is split into training, validation, and testing sets to properly train and evaluate the models. Random Forest, Artificial Neural Networks (ANN), and Convolutional Neural Networks (CNN) are employed for classification tasks. Performance is measured using accuracy, precision, recall, and F1 score, with the goal of achieving a balanced detection system. The Random Forest model shows strong accuracy and precision, while the ANN and CNN models help capture intricate patterns, contributing to more accurate fake profile identification.
48. EEG-based functional connectivity analysis of brain abnormalities: A systematic review study
This project aims to leverage EEG signals for the diagnosis and performance tracking of individuals with ADHD in online learning environments. By analyzing brainwave patterns during cognitive tasks, the system can assess engagement, attention, and task performance in real-time. Using a combination of preprocessing techniques (such as signal filtering, artifact removal, and feature extraction), machine learning algorithms like Random Forest, CNNs, and Online Gradient Descent will be employed to classify and predict ADHD-related behaviors. Real-time feedback mechanisms will be integrated to provide interventions, such as reminders or encouragement, when distractions or inattention are detected, enhancing personalized learning experiences. The model will continuously improve through adaptive learning, offering a dynamic solution for ADHD management in educational contexts.
49. From AI to digital transformation: The AI readiness framework
The "Focus AI" project aims to revolutionize visual content processing by integrating three powerful AI techniques: image enhancement, background removal, and image captioning. Using the DIV2K dataset, Generative Adversarial Networks (GANs) are employed to enhance low-resolution images into high-resolution versions, improving visual quality. For background removal, the system utilizes CNN and U-Net architectures to accurately segment and remove complex backgrounds, leaving only the relevant foreground. Additionally, image captioning is achieved by combining the RESNet50 model, which extracts features from images, with an LSTM network to generate descriptive captions. The system’s performance is evaluated using metrics such as PSNR, SSIM, IoU, and BLEU, among others, ensuring high-quality results in image processing. With tools like PyTorch and Torchvision, the project delivers a seamless pipeline for enhancing, segmenting, and captioning images, making it a transformative solution for various visual content applications.
50. Square lasso
The Additive Square Lasso-based Multiple Imputation (ASLMI) method addresses missing-data imputation in high-dimensional datasets by combining Lasso regularization for feature selection with multiple imputation. The approach is particularly effective for complex datasets containing both continuous and categorical features and missing values arising from different mechanisms (MCAR, MAR, NMAR). ASLMI improves prediction accuracy by generating robust imputations and selecting relevant features through Lasso regularization. Compared with conventional methods such as mean imputation and complete case analysis, ASLMI demonstrates higher imputation quality, higher prediction accuracy, and better component selection, as measured by Root Mean Square Error (RMSE) and Mean Absolute Error (MAE).
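A loose stand-in for this idea, not ASLMI itself, can be sketched with scikit-learn's IterativeImputer using a Lasso base estimator and several random seeds to produce multiple completed datasets; the file name and alpha value below are assumptions.

```python
# Rough multiple-imputation stand-in: IterativeImputer with a Lasso estimator,
# repeated over several seeds to obtain several completed datasets.
import numpy as np
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.linear_model import Lasso

df = pd.read_csv("data_with_missing.csv")      # hypothetical high-dimensional data
X = df.select_dtypes(include=[np.number])

imputations = []
for seed in range(5):                          # 5 imputed datasets, MI-style
    imp = IterativeImputer(estimator=Lasso(alpha=0.01, max_iter=5000),
                           max_iter=10, imputation_order="random",
                           random_state=seed)
    imputations.append(pd.DataFrame(imp.fit_transform(X), columns=X.columns))

# Downstream models would be fit on each completed dataset and results pooled.
```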
51. Intrusion detection system framework for cyber-physical systems
This study focuses on improving network intrusion detection by leveraging two publicly available datasets: NSL-KDD and CIC-IDS-2017. The NSL-KDD dataset, a refined version of the KDD Cup 99 dataset, is commonly used for network security research, while CIC-IDS-2017 represents a more contemporary and diverse range of cyber-attacks, particularly in vehicle networks. The research employs several preprocessing techniques, including data normalization, dimensionality reduction, and data cleaning, to prepare the datasets for analysis. The data is split into training and testing subsets, ensuring the model's generalization ability and reducing the risk of overfitting. For dimensionality reduction, the study utilizes GRIPCA (Gaussian Random Incremental Principal Component Analysis), which projects high-dimensional data into a lower-dimensional space while preserving key features. For intrusion detection, the Optimal Weighted Extreme Learning Machine (OWELM) algorithm, enhanced by Dynamic Inertia Weight Particle Swarm Optimization (DPSO), is applied to optimize model parameters for superior classification performance. The effectiveness of the model is evaluated using several performance metrics, including accuracy, precision, recall, F1-score, false positive rate (FPR), and false negative rate (FNR). The trained OWELM model is then tested for its ability to accurately classify network activities, distinguishing intrusions from normal traffic. The results highlight the potential of this approach for efficient intrusion detection in modern network environments.
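The sketch below is a simplified stand-in only: scikit-learn's IncrementalPCA approximates the dimensionality-reduction step, followed by a plain (unweighted, unoptimized) Extreme Learning Machine; the OWELM weighting and DPSO parameter search are not reproduced.

```python
# Simplified sketch: incremental PCA projection + a basic Extreme Learning Machine.
import numpy as np
from sklearn.decomposition import IncrementalPCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 40))                   # placeholder for preprocessed flow features
y = rng.integers(0, 2, size=5000)                 # 0 = normal, 1 = intrusion (dummy labels)

X = StandardScaler().fit_transform(X)
Z = IncrementalPCA(n_components=15, batch_size=500).fit_transform(X)

# Basic ELM: random hidden layer, ridge-regularized least-squares output weights
n_hidden, lam = 200, 1e-2
W = rng.normal(size=(Z.shape[1], n_hidden))
b = rng.normal(size=n_hidden)
H = np.tanh(Z @ W + b)
beta = np.linalg.solve(H.T @ H + lam * np.eye(n_hidden), H.T @ y)
pred = (H @ beta > 0.5).astype(int)
print("train accuracy:", (pred == y).mean())
```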
52. A systematic literature review on obesity: Understanding the causes & consequences of obesity and reviewing various machine learning approaches used to predict obesity
This project focuses on the early prediction of obesity levels using machine learning techniques, specifically the XGBoost algorithm. The dataset, which includes various attributes related to individuals' health, nutrition, and physical activity, is preprocessed through methods like handling missing values, label encoding, scaling, and feature engineering. The data is split into training and testing sets, with the XGBoost algorithm applied to train the model on the training set and predict obesity levels on the testing set. Key performance metrics, including accuracy, precision, recall, and F1-score, demonstrate that XGBoost provides high predictive accuracy and reliability in categorizing individuals into obesity levels (Normal weight, Overweight, Obese). The results highlight XGBoost's ability to handle large datasets, prevent overfitting, and effectively predict obesity levels, making it a valuable tool for early detection and intervention in obesity management.
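A minimal version of this pipeline might look like the following; the file name, column names, and hyperparameters are assumptions.

```python
# Minimal multiclass XGBoost sketch with label encoding and one-hot categorical features.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from xgboost import XGBClassifier
from sklearn.metrics import classification_report

df = pd.read_csv("obesity.csv")                        # hypothetical dataset
y = LabelEncoder().fit_transform(df["obesity_level"])  # e.g. Normal / Overweight / Obese
X = pd.get_dummies(df.drop(columns=["obesity_level"]))

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, stratify=y, random_state=42)
model = XGBClassifier(objective="multi:softprob", n_estimators=300,
                      max_depth=5, learning_rate=0.1)
model.fit(X_tr, y_tr)
print(classification_report(y_te, model.predict(X_te)))
```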
53. Exploring Predictive Modeling for Food Quality Enhancement: A case study on Wine
This project demonstrates the use of machine learning techniques to predict wine quality and prices using a large tabular dataset. The dataset includes features like chemical properties of wine (e.g., alcohol content, acidity, pH) along with target variables such as wine quality (rated on a scale of 0-10) and price. The process involves preprocessing steps like handling missing values, feature scaling, encoding categorical variables, and detecting outliers. Various machine learning algorithms, including Random Forests and Gradient Boosting Machines (GBM), are employed to build predictive models. Model performance is evaluated with classification metrics such as accuracy, confusion matrices, and ROC curves for quality, and with regression metrics such as Mean Absolute Error (MAE), Mean Squared Error (MSE), and R-squared (R²) for price. Once trained, the models can predict both wine quality and price, providing valuable insights for retailers, buyers, and wine enthusiasts.
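A compact sketch of the two models, under assumed column names, is shown below: a Random Forest classifier for the 0-10 quality rating and a gradient-boosted regressor for price.

```python
# Sketch with assumed columns: classifier for quality, regressor for price.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier, GradientBoostingRegressor
from sklearn.metrics import accuracy_score, mean_absolute_error, mean_squared_error, r2_score

df = pd.read_csv("wine.csv")                                   # hypothetical dataset
features = df.drop(columns=["quality", "price"])
X_tr, X_te, t_tr, t_te = train_test_split(features, df[["quality", "price"]],
                                          test_size=0.2, random_state=42)

quality_model = RandomForestClassifier(n_estimators=300, random_state=42)
quality_model.fit(X_tr, t_tr["quality"])
print("quality accuracy:", accuracy_score(t_te["quality"], quality_model.predict(X_te)))

price_model = GradientBoostingRegressor(random_state=42)
price_model.fit(X_tr, t_tr["price"])
pred = price_model.predict(X_te)
print("price MAE:", mean_absolute_error(t_te["price"], pred))
print("price MSE:", mean_squared_error(t_te["price"], pred))
print("price R2:", r2_score(t_te["price"], pred))
```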
54. AI-POWERED INNOVATIONS FOR MANAGING COMPLEX MENTAL HEALTH CONDITIONS AND ADDICTION TREATMENTS
The AI-powered mental health and wellness companion is a personalized, continuous support system designed to enhance mental resilience through data-driven interventions. It utilizes user input such as mood tracking, stress levels, and wellness assessments, alongside healthcare professional insights, to provide tailored mental health support. The system employs Natural Language Processing (NLP) algorithms like BERT and GPT, and machine learning techniques such as Random Forest Classifier, to predict emotional states and recommend personalized wellness strategies. With a user-friendly chatbot interface, it offers immediate support, tracks progress, and ensures privacy. Early testing has shown positive outcomes in reducing stress and improving emotional well-being, making it a valuable tool for both individuals and healthcare professionals.
55. SUSTAINING THE SELF: A PRACTICAL SELF-CARE MANUAL FOR MENTAL HEALTH PRACTITIONERS
The AI-powered mental health and wellness companion is a personalized, continuous support system designed to enhance mental resilience through data-driven interventions. It utilizes user input such as mood tracking, stress levels, and wellness assessments, alongside healthcare professional insights, to provide tailored mental health support. The system employs Natural Language Processing (NLP) algorithms like BERT and GPT, and machine learning techniques such as Random Forest Classifier, to predict emotional states and recommend personalized wellness strategies. With a user-friendly chatbot interface, it offers immediate support, tracks progress, and ensures privacy. Early testing has shown positive outcomes in reducing stress and improving emotional well-being, making it a valuable tool for both individuals and healthcare professionals.
56. FRAUD-X: An Integrated AI, Blockchain, and Cybersecurity Framework With Early Warning Systems for Mitigating Online Financial Fraud
Online financial fraud poses significant challenges, especially in mid-sized markets like North Macedonia, where rapid digital adoption in BFSI sectors outpaces security infrastructure development. This paper presents FRAUD-X, an integrated framework combining AI-based anomaly detection, blockchain-enabled transaction verification, cybersecurity intrusion detection, and real-time early warning systems. Evaluated on three datasets—including local anonymized BFSI transactions—FRAUD-X achieves a 2–4% F1-score improvement over single-layer AI models and maintains approximately 90% recall for zero-day threats. Key innovations include a permissioned blockchain for immutable transaction records, dynamic risk scoring through AI-cybersecurity integration, and real-time alerts that cut response times from hours to minutes. Operating efficiently at ~15–16 ms per transaction with moderate CPU usage, FRAUD-X supports near-real-time processing. Ablation studies validate the importance of each component, while security analyses demonstrate resilience against node compromises and advanced threats. FRAUD-X offers a practical, scalable solution for fraud detection in mid-scale BFSI markets, balancing high accuracy with operational feasibility.
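The snippet below is not the FRAUD-X pipeline; it only illustrates the spirit of its anomaly-scoring layer with an Isolation Forest over invented transaction features, flagging the highest-risk transactions for an early-warning step.

```python
# Anomaly-based risk-scoring sketch with made-up transaction features.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
normal = rng.normal(loc=[50, 1, 0], scale=[20, 0.5, 1], size=(5000, 3))   # amount, hour-gap, geo-shift
fraud = rng.normal(loc=[900, 0.05, 6], scale=[300, 0.05, 2], size=(50, 3))
X = np.vstack([normal, fraud])

iso = IsolationForest(n_estimators=200, contamination=0.01, random_state=0).fit(normal)
risk = -iso.score_samples(X)              # higher = more anomalous
alerts = risk > np.quantile(risk, 0.99)   # top 1% flagged for the early-warning layer
print("flagged transactions:", alerts.sum())
```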
57. End-to-End Deployment of the Educational AI Hub for Personalized Learning and Engagement: A Case Study on Environmental Science Education
This study presents a novel deep knowledge tracing model enhanced with an attention mechanism to improve the prediction of students’ knowledge acquisition. By incorporating both exercise knowledge features and students’ learning abilities into the input layer, the model effectively captures critical information across temporal learning sequences, addressing challenges in interpretability and long-term dependency found in existing frameworks. Extensive experiments on five real-world datasets demonstrate the model’s superior predictive accuracy. Ablation studies further highlight the significance of integrating practice difficulty and learning ability, which together enhance feature representation and contribute to improved performance. The proposed approach offers valuable insights into personalized learning by enabling more accurate monitoring of learners’ knowledge states and learning trajectories, supporting tailored educational interventions in online learning environments.
58. Predicting Student Performance Based on Knowledge Characteristics and Learning Ability
This study introduces a novel deep knowledge tracing model that incorporates an attention mechanism to enhance the prediction of students’ knowledge acquisition. By integrating exercise knowledge features and learners’ abilities into the input layer, the model effectively captures critical information at various temporal points, addressing challenges related to interpretability and long sequence dependencies found in existing frameworks. Extensive experiments on five real-world datasets demonstrate improved predictive accuracy compared to traditional approaches. Ablation studies reveal the crucial role of incorporating practice difficulty and learning ability, which enrich input features and synergistically boost model performance. The proposed approach offers precise monitoring of students’ knowledge states and learning trajectories, supporting personalized learning and adaptive educational interventions in online learning environments.
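A schematic (and deliberately simplified) version of such an attention-augmented model can be written in Keras as below; vocabulary size, sequence length, and the way the learner-ability feature is injected are all assumptions.

```python
# Schematic attention-augmented knowledge-tracing model (toy dimensions).
import tensorflow as tf

n_exercises, seq_len, emb = 100, 50, 32

ex_ids  = tf.keras.Input(shape=(seq_len,), dtype="int32")     # exercise IDs
correct = tf.keras.Input(shape=(seq_len, 1))                   # past correctness (0/1)
ability = tf.keras.Input(shape=(1,))                           # learner-ability scalar (assumed)

e = tf.keras.layers.Embedding(n_exercises, emb)(ex_ids)
a = tf.keras.layers.RepeatVector(seq_len)(tf.keras.layers.Dense(emb)(ability))
x = tf.keras.layers.Concatenate()([e, correct, a])
h = tf.keras.layers.LSTM(64, return_sequences=True)(x)
h = tf.keras.layers.MultiHeadAttention(num_heads=2, key_dim=32)(h, h)  # attend over the sequence
p = tf.keras.layers.Dense(1, activation="sigmoid")(h)          # P(correct) at each step

model = tf.keras.Model([ex_ids, correct, ability], p)
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=[tf.keras.metrics.AUC()])
```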
59. Agentic AI: Autonomous Intelligence for Complex Goals—A Comprehensive Survey
Agentic AI represents a new paradigm in artificial intelligence characterized by autonomous systems capable of pursuing complex goals with minimal human oversight. Unlike traditional AI, Agentic AI exhibits adaptability, advanced decision-making, and self-sufficiency, allowing dynamic operation in evolving environments. This survey comprehensively reviews the foundational principles, distinctive features, and key methodologies underlying Agentic AI development. It explores current and prospective applications in domains such as healthcare, finance, and adaptive software systems, highlighting the benefits of agentic autonomy in real-world contexts. The paper also addresses significant ethical challenges, including goal alignment, resource management, and environmental adaptability, proposing frameworks for their mitigation. Emphasizing safe and effective integration, this work aims to guide researchers, developers, and policymakers in responsibly harnessing Agentic AI’s transformative potential for positive societal impact.
60. A Novel Approach for Tweet Similarity in a Context-Aware Fake News Detection Model
The rapid proliferation of fake news, particularly via social media, threatens societal stability and institutional trust. Despite numerous detection techniques, effectiveness often varies across different contexts. This study introduces a formal, multilayered fake news detection framework incorporating topic, social, and context layers to deliver a comprehensive solution. Central to this framework is the topic layer, featuring a novel tweet similarity evaluation method that enhances accuracy by combining FastText embeddings, cosine similarity, and word positional information. Validation on the STSBenchmark dataset using a rigorous 50-fold evaluation yielded a superior median correlation of 0.6735, surpassing existing state-of-the-art models. This work lays the foundation for an advanced fake news detection system, with planned future enhancements including multimodal content analysis and multilingual similarity measures to better address the evolving challenges of misinformation.
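As an illustration of the similarity idea, not the paper's exact weighting scheme, the sketch below builds FastText vectors with gensim, averages them with simple position-based weights, and compares two toy tweets by cosine similarity.

```python
# Toy tweet-similarity sketch: FastText embeddings + position-weighted averaging + cosine.
import numpy as np
from gensim.models import FastText

corpus = [["vaccine", "causes", "outbreak", "claims", "post"],
          ["officials", "deny", "vaccine", "outbreak", "claim"]]   # toy tokenized tweets
ft = FastText(sentences=corpus, vector_size=50, window=3, min_count=1, epochs=20)

def tweet_vector(tokens):
    # earlier tokens weighted slightly higher, as a stand-in for positional information
    weights = np.linspace(1.0, 0.5, num=len(tokens))
    vecs = np.array([ft.wv[t] for t in tokens])
    return (weights[:, None] * vecs).sum(axis=0) / weights.sum()

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

print(cosine(tweet_vector(corpus[0]), tweet_vector(corpus[1])))
```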
61. Systematic Literature Review for Detecting Intrusions in Unmanned Aerial Vehicles Using Machine and Deep Learning
Unmanned aerial vehicles (UAVs), commonly known as drones, have significantly impacted the agricultural, policing, military, and commercial sectors and aim to enhance quality of life; however, they are exposed to significant risks from adversaries who exploit security vulnerabilities, including insecure communication channels, authorization weaknesses, and hardware, software, and network flaws, to perform various attacks. One such attack is intrusion malware, which uses malicious programs, signal spoofing, and denial of service to target the integrity, confidentiality, and availability of the system. Detecting these intrusions has recently gained attention in academia and industry, motivating detection frameworks that utilize machine and deep learning algorithms. Given its importance, this survey provides background for researchers interested in detecting malware in drones, discusses recent approaches, presents a taxonomy of detection approaches, identifies existing problems, and explores directions for future work.
62. Tractor Detection and Load Classification
This paper presents a system for detecting tractors and classifying their load status (loaded or unloaded) using advanced deep learning techniques. The system employs the YOLO (You Only Look Once) object detection algorithm to accurately identify tractors in images, leveraging its ability to detect objects in complex environments.
Once the tractor is detected, a Convolutional Neural Network (CNN) is used to classify the tractor's load status, determining whether it is loaded or unloaded. The dataset used for training the models consists of images and corresponding annotation files, which provide essential ground truth information for both object detection and load classification tasks.
The approach demonstrates the potential to automate and enhance agricultural machinery monitoring, facilitating decision-making for tasks like resource allocation, maintenance scheduling, and operational efficiency.
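A hedged sketch of such a two-stage pipeline is given below, using an off-the-shelf Ultralytics YOLO model as the detector and a small untrained CNN (architecture assumed) as the load classifier; image paths, weights, and class indices are placeholders.

```python
# Two-stage sketch: generic YOLO detection, crop, then a small CNN for load status.
import numpy as np
import torch
import torch.nn as nn
from PIL import Image
from ultralytics import YOLO

detector = YOLO("yolov8n.pt")                  # generic pretrained detector as a stand-in

class LoadClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1))
        self.head = nn.Linear(32, 2)           # 0 = unloaded, 1 = loaded
    def forward(self, x):
        return self.head(self.features(x).flatten(1))

classifier = LoadClassifier().eval()           # weights would come from training on annotated crops

img = Image.open("field.jpg").convert("RGB")   # hypothetical input image
for box in detector("field.jpg")[0].boxes.xyxy.tolist():
    x1, y1, x2, y2 = map(int, box)
    crop = img.crop((x1, y1, x2, y2)).resize((128, 128))
    t = torch.from_numpy(np.asarray(crop, dtype="float32").transpose(2, 0, 1) / 255.0)
    with torch.no_grad():
        label = classifier(t.unsqueeze(0)).argmax(1).item()
    print(box, "loaded" if label == 1 else "unloaded")
```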
63. Integrating Decision Tools for Environmental Impact Reduction in Sustainable Urban Planning
Environmental impact reduction is crucial for sustainable urban planning, as it mitigates the harmful effects of urbanization on ecosystems, climate, and public health. As cities expand, minimizing environmental degradation through well-planned strategies becomes essential to ensuring long-term sustainability. This paper integrates T-Spherical Fuzzy Sets (T-SFS) to address the inherent uncertainty in evaluating urban development alternatives. T-SFS enables more accurate assessments by capturing expert judgment variability, particularly for complex criteria such as energy efficiency, biodiversity, and land use optimization. In addition, the study employs the CRADIS method alongside the Logarithmic Percentage Change-Driven Objective Weighting (LOPCOW) technique to assign precise weights to evaluation criteria. This integrated approach ensures that each criterion’s significance is accurately represented, supporting robust decision-making. The results provide a comparative analysis of five urban development alternatives, helping urban planners prioritize sustainable options that balance environmental, social, and economic factors. By incorporating fuzzy logic and advanced decision-making techniques, this study offers a comprehensive tool for policymakers to make informed decisions that contribute to sustainable urban growth, ultimately fostering cities that are more resilient, eco-friendly, and beneficial to society.
64. Personalized Federated Learning for Cellular VR: Online Learning and Dynamic Caching
This paper presents a Field of View (FoV) aware caching strategy for Mobile Edge Computing (MEC)-enabled wireless Virtual Reality (VR) networks to ensure seamless real-time video delivery. The proposed approach caches and prefetches each VR user’s FoV at base stations (BSs) using decentralized and personalized federated learning (DP-FL) algorithms tailored to individual BSs. The DP-FL method personalizes content delivery while providing a probably approximately correct (PAC) guarantee on cache hit rates. To reduce communication overhead, a one-bit stochastic gradient descent quantization (OBSGD) is employed, achieving a convergence rate of O(1/√T). Additionally, VR users’ FoVs are grouped into multicast or unicast groups based on demand and wireless channel dynamics, optimizing network resource use. Evaluations on realistic VR head-tracking data demonstrate that the proposed DP-FL caching outperforms baseline methods in reducing average delay and improving cache hit rates, offering an efficient solution for wireless VR content delivery.
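The one-bit quantization idea can be illustrated numerically as below; this is a minimal sketch of sign-plus-scale compression and does not reproduce the paper's exact OBSGD scheme or its error-feedback details.

```python
# One-bit gradient quantization: transmit signs plus a single scale per update.
import numpy as np

def one_bit_quantize(grad):
    scale = np.abs(grad).mean()              # single scalar sent alongside the signs
    return np.sign(grad), scale

def dequantize(signs, scale):
    return signs * scale

rng = np.random.default_rng(0)
g = rng.normal(size=1000)
signs, scale = one_bit_quantize(g)
g_hat = dequantize(signs, scale)
print("compression: roughly 32x fewer bits per coordinate (plus one float)")
print("relative error:", np.linalg.norm(g - g_hat) / np.linalg.norm(g))
```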
65. Securing federated learning: a defense strategy against targeted data poisoning attack
This project presents a comprehensive implementation and evaluation of data poisoning techniques in centralized and federated learning environments using tabular and image datasets. The study explores three major attacks: label flipping on the Iris dataset, backdoor injection into Medical MNIST, and federated poisoning on CIFAR-10. For the tabular data, a Logistic Regression model is trained and evaluated to observe the impact of label manipulation. In the image domain, a Convolutional Neural Network (CNN) is used to detect and measure the success of backdoor attacks, where a white square is embedded as a trigger pattern. For federated learning, the Flower (FL) framework is employed, simulating client-based data poisoning with malicious updates. The models are evaluated using standard metrics—accuracy, precision, recall, and F1-score—and the Attack Success Rate (ASR) is computed for the backdoor scenario. This setup enables a robust simulation of adversarial threats in real-world AI systems and provides insights into the vulnerability of different learning paradigms under poisoning attacks.
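The first of these experiments is easy to reproduce in miniature: the sketch below flips a fraction of Iris training labels and measures the resulting accuracy drop of Logistic Regression; the flip rates and seed are arbitrary.

```python
# Label-flipping poisoning on Iris: flip a fraction of training labels, measure accuracy loss.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=42, stratify=y)

def flip_labels(y, rate, rng):
    y = y.copy()
    idx = rng.choice(len(y), size=int(rate * len(y)), replace=False)
    y[idx] = (y[idx] + 1) % 3              # rotate each poisoned sample to a wrong class
    return y

rng = np.random.default_rng(0)
for rate in [0.0, 0.1, 0.3, 0.5]:
    clf = LogisticRegression(max_iter=1000).fit(X_tr, flip_labels(y_tr, rate, rng))
    print(f"flip rate {rate:.0%}: test accuracy {accuracy_score(y_te, clf.predict(X_te)):.3f}")
```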
66. Artificial intelligence in clinical genetics
This project explores the application of artificial intelligence (AI) in clinical genetics, focusing on leveraging machine learning and deep learning techniques to aid in disease diagnosis, variant interpretation, and therapeutic predictions. By utilizing diverse clinical databases such as OMIM, ClinVar, and gnomAD, as well as genomic and phenotypic data, the system aims to improve the accuracy of genetic diagnoses, particularly for rare diseases. Through advanced methods like convolutional neural networks (CNNs) for facial dysmorphology detection and AlphaFold for protein structure prediction, the model provides valuable insights, such as ranked differential diagnoses and drug-binding predictions. Despite impressive results, such as Face2Gene's 90% accuracy in syndrome recognition and AlphaFold's near-experimental protein structure prediction, challenges like dataset biases, especially underrepresentation of non-European populations, remain significant. The system, built using Python and Flask, integrates these advanced AI techniques to enhance clinical decision-making and advance personalized medicine.
67. Energy-Efficient and Trust-Based Autonomous Underwater Vehicle Scheme for 6G-Enabled Internet of Underwater Things
This project presents a hybrid trust-based approach for detecting attacks and optimizing node deployment in Underwater Wireless Sensor Networks (UWSNs). Utilizing publicly available or synthetically generated datasets that capture critical node and environmental features—including energy levels, signal strength, and trust metrics—this system employs machine learning algorithms such as Random Forest and K-Nearest Neighbors to identify malicious behavior (e.g., blackhole, Sybil, and jamming attacks). The pipeline includes comprehensive data preprocessing, training-testing data splits, and performance evaluation using metrics like accuracy, precision, recall, and F1-score. The model predicts potential threats based on trust thresholds and behavioral anomalies, while a Streamlit-based front end enables real-time interaction and visualization. This solution aims to enhance security and reliability in UWSNs through intelligent, trust-aware detection and deployment strategies.
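A synthetic-data sketch matching the feature list above (residual energy, signal strength, trust score) is shown below, comparing Random Forest and K-Nearest Neighbors; the feature distributions and attack mix are invented for illustration.

```python
# Synthetic trust-feature sketch: compare Random Forest and KNN on benign vs. malicious nodes.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score

rng = np.random.default_rng(1)
benign    = np.column_stack([rng.uniform(0.5, 1.0, 900),   # residual energy
                             rng.normal(-60, 5, 900),      # signal strength (dBm)
                             rng.uniform(0.7, 1.0, 900)])  # trust score
malicious = np.column_stack([rng.uniform(0.1, 0.6, 100),
                             rng.normal(-75, 8, 100),
                             rng.uniform(0.0, 0.5, 100)])
X = np.vstack([benign, malicious])
y = np.array([0] * 900 + [1] * 100)        # 1 = blackhole/Sybil/jamming node (dummy)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, stratify=y, random_state=0)
for name, clf in [("RandomForest", RandomForestClassifier(n_estimators=200, random_state=0)),
                  ("KNN", KNeighborsClassifier(n_neighbors=5))]:
    clf.fit(X_tr, y_tr)
    print(name, "F1:", round(f1_score(y_te, clf.predict(X_te)), 3))
```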

