Classification is a technique for assigning data to one of a given number of classes, and it can be performed on structured or unstructured data. The main goal of a classification problem is to identify the class to which a new data point belongs.
In machine learning, classification is among the most fundamental, exciting, and yet challenging problems. A competent classification model has wide-ranging uses: such models are applied to text classification in natural language processing, image recognition, data prediction, reinforcement learning, and countless other tasks. This module also offers discussions of state-of-the-art classification models, such as XGBoost for structured data and deep learning for unstructured data.
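To make the definition above concrete, here is a minimal sketch of a classifier that assigns a new data point to one of a fixed set of classes. It is not part of the module material. It assumes a hypothetical toy dataset of 2-D points labelled "A" or "B". It uses a nearest-centroid rule, chosen only because it is short. The helper names `fit_centroids` and `predict` are illustrative, and `math.dist` requires Python 3.8 or later.

```python
from collections import defaultdict
from math import dist


def fit_centroids(points, labels):
    """Return a dict mapping each class label to the mean of its training points."""
    grouped = defaultdict(list)
    for x, y in zip(points, labels):
        grouped[y].append(x)
    # zip(*xs) walks the points one feature dimension at a time.
    return {
        label: tuple(sum(coords) / len(xs) for coords in zip(*xs))
        for label, xs in grouped.items()
    }


def predict(centroids, x):
    """Assign x to the class whose centroid is nearest in Euclidean distance."""
    return min(centroids, key=lambda label: dist(centroids[label], x))


# Hypothetical toy data: two features per sample, two classes.
train_x = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (6.0, 6.5), (7.0, 6.0), (6.5, 7.0)]
train_y = ["A", "A", "A", "B", "B", "B"]

centroids = fit_centroids(train_x, train_y)
print(predict(centroids, (2.0, 2.0)))  # A
print(predict(centroids, (6.0, 6.0)))  # B
```

The methods covered in this module (Naive Bayes, k-nearest neighbours, decision trees, SVMs, neural networks) replace this simple distance-to-mean rule with richer decision functions. The overall pattern stays the same: learn from labelled data, then predict a class for unseen data.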
Prerequisites: SFDS, MFDS, GLM, EDA.
Objectives/Content:
Having successfully completed this module, trainees are expected to be able to:
- Understand and apply the concepts and methods underlying the analysis of classification problems, and the context for interpreting the results.
- Find the best model for a given problem and its optimal parameters.
- Understand the theoretical bases of the different classification methods.
References:
- Aggarwal, C. C. (2015). Data mining: The textbook. Springer.
- Cabena, P., Hadjinian, P., Stadler, R., Verhees, J., & Zanasi, A. (1997). Discovering data mining: From concept to implementation. IBM.
- Fayyad, U., Piatetsky-Shapiro, G., & Smyth, P. (1996). From data mining to knowledge discovery in databases. AI Magazine, 17(3), 37-54.
- Berry, M. J. A., & Linoff, G. S. (2004). Data mining techniques. Wiley Publishing, Inc., Indianapolis. xxiii + 615 pp.
- Hand, D. J., Mannila, H., & Smyth, P. (2001). Principles of data mining. MIT Press, Cambridge, Massachusetts. xxvii + 467 pp.
- Hornick, M. F., Marcadé, E., & Venkayala, S. (2007). Java data mining: Strategy, standard, and practice. Morgan Kaufmann, San Francisco. xxi + 519 pp.
- Tang, Z., & MacLennan, J. (2005). Data mining with SQL Server 2005. Wiley Publishing, Inc., Indianapolis. xvii + 435 pp.
- Bishop, C. M. (2006). Pattern recognition and machine learning. Springer.
- Yang, X. S. (2019). Introduction to algorithms for data mining and machine learning. Academic Press.
- Simovici, D. (2018). Mathematical analysis for machine learning and data mining. World Scientific Publishing Co., Inc.
- Zheng, A. (2015). Evaluating machine learning models: A beginner's guide to key concepts and pitfalls. O'Reilly Media.
- Mitchell, T. M. (1997). Machine learning. Burr Ridge, IL: McGraw Hill.
- Brownlee, J. (2016). A gentle introduction to XGBoost for applied machine learning. Machine Learning Mastery.
- Ketkar, N. (2017). Deep learning with Python. Apress. https://doi.org/10.1007/978-1-4842-2766-4
Topic ID | Topic Title | Lessons |
SLCM1 | Introduction to Classification Methods | - Introduction to classification problems - Inductive bias and consistent learning - Evaluation metrics (ROC-AUC, lift, precision, recall, error types, F-beta scores, NMI, Rand index, micro/macro metrics, etc.) - Best practice on data labelling - Dealing with new categories / changes in data distribution |
SLCM2 | Naive Bayes Classifier | - Model Introduction - Assumptions - Parameter Estimation - Visualizations - Interpretations - Modifications - Case studies |
SLCM3 | k-Nearest Neighbours | - Model Introduction - Assumptions - Parameter Estimation - Visualizations - Interpretations - Modifications - Case studies |
SLCM4 | Decision Tree | - Model Introduction - Assumptions - Parameter Estimation - Visualizations - Interpretations - Modifications - Case studies |
SLCM5 | Support Vector Machines | - Model Introduction - Assumptions - Parameter Estimation - Visualizations - Interpretations - Modifications - Case studies |
SLCM6 | Neural Network Models | - Model Introduction - Assumptions - Parameter Estimation - Visualizations - Interpretations - Modifications - Case studies |
SLCM7 | SLCM Grouped Discussion | Recap Discussion SLCM 1-6 |
SLCM8 | State-of-the-Art Classification Models | - XGBoost (structured data) - Deep Learning (unstructured data) - Capstone Project (report + presentation) |
SLCM9 | Advanced Classification Problem Case Studies | - High-dimensional data classification problems - Ensemble/hybrid classification problems (bagging, boosting, blending, etc.) - Imbalanced learning - Rare case classification - Fine-grained classification problems - Multilabel classification - Multimodal classification |
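The micro/macro distinction listed among the SLCM1 evaluation metrics can be illustrated with a small sketch. It assumes single-label multiclass predictions stored as plain Python lists. Macro averaging computes F-beta per class (one-vs-rest) and takes the unweighted mean. Micro averaging first pools true positives, false positives, and false negatives across all classes, then computes a single score. The labels, predictions, and helper names below are hypothetical and chosen for illustration only.

```python
def per_class_counts(y_true, y_pred, labels):
    """Return {label: (tp, fp, fn)} for each class, treating it one-vs-rest."""
    counts = {}
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        counts[c] = (tp, fp, fn)
    return counts


def f_beta(tp, fp, fn, beta=1.0):
    """F-beta from raw counts; beta > 1 favours recall, beta < 1 favours precision."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    b2 = beta ** 2
    denom = b2 * precision + recall
    # Define the score as 0 when it would otherwise divide by zero.
    return (1 + b2) * precision * recall / denom if denom else 0.0


def macro_f_beta(y_true, y_pred, labels, beta=1.0):
    """Unweighted mean of the per-class F-beta scores."""
    counts = per_class_counts(y_true, y_pred, labels)
    return sum(f_beta(*counts[c], beta=beta) for c in labels) / len(labels)


def micro_f_beta(y_true, y_pred, labels, beta=1.0):
    """F-beta computed once from TP/FP/FN pooled over all classes."""
    counts = per_class_counts(y_true, y_pred, labels).values()
    tp = sum(c[0] for c in counts)
    fp = sum(c[1] for c in counts)
    fn = sum(c[2] for c in counts)
    return f_beta(tp, fp, fn, beta=beta)


# Hypothetical ground truth and predictions for a three-class problem.
y_true = ["cat", "cat", "cat", "dog", "dog", "bird"]
y_pred = ["cat", "cat", "dog", "dog", "cat", "bird"]
labels = ["cat", "dog", "bird"]

print(f"macro-F1 = {macro_f_beta(y_true, y_pred, labels):.3f}")  # macro-F1 = 0.722
print(f"micro-F1 = {micro_f_beta(y_true, y_pred, labels):.3f}")  # micro-F1 = 0.667
```

For single-label multiclass data, micro-averaged F1 equals accuracy (here 4/6). The macro average weights every class equally, so a rare class such as "bird" counts as much as a common one. This is one reason macro metrics are often preferred in the imbalanced-learning settings listed under SLCM9.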