Data Mining Projects Topics and Titles | CSE | IT

1. Synthetic Data Generation and Evaluation Techniques for Classifiers in Data Starved Medical Applications

This study presents a robust framework for asthma prediction using machine learning and data augmentation techniques to address class imbalance in medical datasets. The pipeline integrates several modules, including data collection, preprocessing, class imbalance handling, model training, evaluation, and prediction. Three synthetic oversampling techniques—Standard SMOTE, Auto encoder-based generation, and Incremental SMOTE—were compared to enhance minority class representation. A Random Forest classifier was used for prediction, and performance was evaluated using F1-score, recall, precision, ROC-AUC, and precision-recall curves. Among the models, SMOTE-enhanced training yielded improved recall and F1-score. A prediction module was also developed for personalized asthma risk assessment, classifying patients as Healthy or Asthma with corresponding risk levels. This modular and scalable architecture allows easy adaptation to other healthcare prediction problems involving imbalanced data. The final results, including a comparative performance analysis of oversampling techniques, were saved for further evaluation and clinical decision support integration.

2. Customer Behavior Analysis and Predictive Modeling in Supermarket Retail A Comprehensive Data Mining Approach

3. AI-Driven Meat Food Drying Time Prediction for Resource Optimization and Production Planning in Smart Manufacturing

4. Deep Classification of Algarrobo Trees in Seasonally Dry Forests of Peru Using Aerial Imagery

5. A Hybrid Machine Learning Model for Efficient XML Parsing

6. Predicting the Compressive Strength of Recycled Concrete Using Ensemble Learning Model

7. Adaptive Optimizable Gaussian Process Regression Linear Least Squares Regression Filtering Method for SEM Images

8. Prediction of Myocardial Infraction based on Non-ECG sleep data Combined with domain knowledge

The provided code presents a complete machine learning pipeline for classifying sleep disorders based on lifestyle and sleep-related data, using methods such as Random Forest and Convolutional Neural Networks (CNNs). This pipeline involves data pre-processing (including handling missing values and label encoding), feature selection, model training, evaluation with multiple metrics, and visualization of results.This approach can be adapted effectively to predict myocardial infarction (MI) using non-ECG sleep data combined with domain knowledge. While MI diagnosis traditionally depends on ECG and invasive tests, sleep and lifestyle features—such as sleep duration, quality, stress, physical activity, and cardiovascular indicators—carry important predictive information for MI risk. By applying similar pre-processing and modeling techniques from the sleep disorder classification code, these non-ECG features can be leveraged to build robust MI prediction models.Domain knowledge plays a critical role in this adaptation by guiding the selection and interpretation of relevant features associated with cardiovascular health, improving model accuracy and clinical relevance. The Random Forest model provides interpretable feature importance, while CNNs can capture complex temporal patterns in sequential sleep data. Additionally, generating synthetic data with noise, as done in the CNN section, can help address data scarcity and class imbalance common in medical datasets.Visualization tools such as correlation heatmaps and pairplots assist in understanding feature relationships, further enhancing model development and explanation. Saving trained models enables future deployment in clinical decision support systems, potentially allowing early and non-invasive MI risk assessment.In summary, the sleep disorder classification framework provides a solid foundation for predicting myocardial infarction using non-ECG sleep data. Combining machine learning with domain expertise offers a promising avenue for accessible, accurate, and interpretable cardiovascular risk prediction.

9. Recognizing Supporting Standpoints for Online Restaurant Reviews: Evidence From Tripadvisor

10. Shape Penalized Decision Forests for Imbalanced Data Classification

11. SCM-DL: Split-Combine-Merge Deep Learning Model Integrated With Feature Selection in Sports for Talent Identification

12. Enhancing stress detection in IT workplaces: Integrating machine learning, visual processing, and privacy-enhancing techniques

13. RFMVDA: An Enhanced Deep Learning Approach for Customer Behavior Classification in E-Commerce Environments

14. A Cloud-Based Optimized Ensemble Model for Risk Prediction of Diabetic Progression

15. Personalized Learning Through MBTI Prediction: A Deep Learning Approach Integrated With Learner Profile Ontology

16. FastGEMF: Scalable High-Speed Simulation of Stochastic Spreading Processes Over Complex Multilayer Networks

17. Artificial Intelligence in Dyslexia Research and Education: A Scoping Review

18. Comparative Study of Machine Learning and Deep Learning Models for Early Prediction of Ovarian Cancer

19. An Analysis of Semi-Supervised Machine Learning in Electrical Machines

20. Ad Click Fraud Detection Using Machine Learning and Deep Learning Algorithms

21. A Novel Partitioned Random Forest Method-Based Facial Emotion Recognition

22. Cardiac Clarity: Harnessing Machine Learning for Accurate Heart-Disease Prediction

23. A Novel Framework for Saraiki Script Recognition Using Advanced Machine Learning Models (YOLOv8 and CNN)

24. AgeML: Age Modeling With Machine Learning

25. Fault Detection in Photovoltaic Systems Using a Machine Learning Approach

26. Semi-Supervised Building Footprint Extraction Using Debiased Pseudo-Labels

27. A Survey of Ransomware Detection Methods

28. Weak–Strong Graph Contrastive Learning Neural Network for Hyperspectral Image Classification

29. Ensuring Zero Trust Security in Consumer Internet of Things Using Federated Learning-Based Attack Detection Model

30. Mitigating Cyber Risks in Smart Cyber-Physical Power Systems Through Deep Learning and Hybrid Security Models

31. AI-Powered IoT: A Survey on Integrating Artificial Intelligence With IoT for Enhanced Security,Efficiency, and Smart Applications

32. Evaluation of Blockchain-Based Tracking and Tracing System With Uncertain Information: A Multi-Criteria Decision-Making Approach

33. Trip Based Modeling of Fuel Consumption in Modern Heavy-Duty Vehicles Using AI

34. The Impact of Aging on an FPGA-based Physical Unclonable Function

35. Assessment of Climate Change in Angola and Potential Impacts on Agriculture

36. A Fusion Deep Learning Model for Predicting Adverse Drug Reactions Based on Multiple Drug Characteristics

37.Machine learning in smart production logistics: a review of technological capabilities

38. Online Recruitment Fraud (ORF) Detection Using Deep Learning Approaches

39. Mining the Opinions of Software Developers for Improved Project Insights: Harnessing the Power of Transfer Learning

40. Multiparametric MRI-Based Radiomics Signature with MachineLearning for Preoperative Predictionof Prognosis Stratification inPediatric Medulloblastoma

41. Classification models for likelihood prediction of diabetes at early stage using feature selection

42. Artificial intelligence algorithms in flood prediction: a general overview

43. Enhancing Proactive Cyber Defense: A Theoretical Framework for AI-Driven Predictive Cyber Threat Intelligence

44. Machine learning applications in flood forecasting and predictions, challenges, and way-out in the perspective of changing environment

45. Revolutionizing cardiovascular health integrating deep learning techniques for predictive analysis of personal key indicators in heart disease

46. Optimization of network device hardening in a multivendor environment

47. DETECTION OF FAKE ACCOUNTS IN SOCIAL MEDIA

48. EEG-based functional connectivity analysis of brain abnormalities: A systematic review study

49. From AI to digital transformation-The AI readiness framework

50. Square lasso

51. Intrusion detection system framework for cyber-physical systems

52. A systematic literature review on obesity: Understanding the causes & consequences of obesity and reviewing various machine learning approaches used to predict obesity

53. Exploring Predictive Modeling for Food Quality Enhancement-A case study on Wine

54. AI-POWERED INNOVATIONS FOR MANAGING COMPLEX MENTAL HEALTH CONDITIONS AND ADDICTION TREATMENTS

55. SUSTAINING THE SELF: A PRACTICAL SELF-CARE MANUAL FOR MENTAL HEALTH PRACTITIONERS

56. FRAUD-X: An Integrated AI, Blockchain, and Cybersecurity Framework With Early Warning Systems for Mitigating Online Financial Fraud

57. End-to-End Deployment of the Educational AI Hub for Personalized Learning and Engagement: A Case Study on Environmental Science Education

58. Predicting Student Performance Based on Knowledge Characteristics and Learning Ability

59. Agentic AI: Autonomous Intelligence for Complex Goals—A Comprehensive Survey

60. A Novel Approach for Tweet Similarity in a Context-Aware Fake News Detection Model

61. Systematic Literature Review for Detecting Intrusions in Unmanned Aerial Vehicles Using Machine and Deep Learning

62. Tractor Detection and Load Classification

63. Integrating Decision Tools for Environmental Impact Reduction in Sustainable Urban Planning

64. Personalized Federated Learning for Cellular VR: Online Learning and Dynamic Caching

65. Securing federated learning a defense strategy against targeted data poisoning attack

66. Artificial intelligence in clinical genetics

67. Energy-Efficient and Trust-Based Autonomous Underwater Vehicle Scheme for 6G-Enabled Internet of Underwater Things

Data Mining Projects – ElysiumPro