Applied Data Mining (ADM)
In the Industry 4.0 era, data has been growing rapidly. This explosive growth of stored and transient data has created an urgent need for efficient and effective techniques that can help transform this data into useful information, knowledge, or insights. Data mining has emerged as a multidisciplinary field that addresses this need.
This module discusses business understanding, hypothesis building, techniques for preprocessing data before mining, model building, model evaluation and interpretation, and data generalization. It presents methods for mining frequent patterns, associations, and correlations. It also presents methods for data classification and prediction, data-clustering approaches, and outlier analysis.
Prerequisites: EDA
Objectives/Content:
- Be able to approach data mining as a process, by demonstrating competency in the use of CRISP-DM, the Cross-Industry Standard Process for Data Mining, including the business understanding phase, the data understanding phase, the exploratory data analysis phase, the modeling phase, the evaluation phase, and the deployment phase.
- Be proficient with data mining software/tools such as Python.
- Understand and apply a wide range of clustering, estimation, prediction, and classification algorithms, including k-means clustering, classification and regression trees, logistic regression, k-nearest neighbor, multiple regression, and neural networks.
- Understand and apply the most current data mining techniques and applications, such as text mining and social media analytics.
- Understand the mathematical statistics foundations of the algorithms outlined above.
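As a small illustration of the kind of algorithm named in the objectives above, here is a minimal sketch of k-means clustering (Lloyd's algorithm) in plain Python. It is not part of the module material: the function name `kmeans`, the toy points, and the choice of starting centroids are all assumptions made for this example, and in practice a library implementation would normally be used instead.

```python
import math

def kmeans(points, centroids, max_iter=100):
    """Lloyd's algorithm: alternate assignment and centroid-update steps."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                new_centroids.append(old)  # empty cluster: keep its centroid
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return centroids, labels

# Hypothetical toy data: two well-separated groups in 2-D.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
# Assumed initialisation: one starting centroid taken from each group.
centroids, labels = kmeans(points, centroids=[points[0], points[3]])
print(labels)
print(centroids)
```

The result depends on the starting centroids, which is why the initialisation is passed in explicitly here rather than drawn at random.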
Evaluations/Assignments:
- At the end of the fundamental lessons in this module, trainees will be given a dataset and the metadata (story) behind it. The trainees then need to form a team and apply the data mining process to find as many important insights as possible from the data. The evaluation is based on the report and presentation of the findings. The case study can be taken from a real dataset from the trainee's division/department or from any other source such as Kaggle.
- Evaluation of the advanced data mining topics is based on the speedup, efficiency, depth of insight, and/or creativity demonstrated on a more challenging data problem, such as high-dimensional, multimodal, or fine-grained data.
- Online quizzes in the eLearning platform.
Lessons:
Reference:
- Data Mining: Concepts and Techniques by J Han, M Kamber & J Pei, 2012, 3rd edition, Morgan Kaufmann.
- Aggarwal, C. C. (2015). Data mining: the textbook. Springer.
- P. Cabena, P. Hadjinian, R. Stadler, J. Verhees, and A. Zanasi. Discovering Data Mining: From Concept to Implementation. IBM, 1997.
- U. Fayyad, G. Piatetsky-Shapiro, and P. Smyth. From data mining to knowledge discovery. AI Magazine, Volume 17, pages 37-54, 1996.
- Berry, Michael J. A. & Linoff, Gordon S. 2004. Data Mining Techniques. Wiley Publishing, Inc. Indianapolis: xxiii + 615 pp.
- Hand, David et al. 2001. Principles of Data Mining. MIT Press. Cambridge, Massachusetts: xxvii + 467 pp.
- Hornick, Mark F., Marcade, Erik & Vankayala, Sunil. 2007. Java Data Mining: Strategy, Standard, and Practice. Morgan Kaufmann. San Francisco: xxi + 519 pp.
- Tang, ZhaoHui & MacLennan, Jamie. 2005. Data Mining with SQL Server 2005. Wiley Publishing, Inc. Indianapolis: xvii + 435 pp.
- Bishop, C. M. (2006). Pattern recognition and machine learning. Springer.
- Yang, X. S. (2019). Introduction to Algorithms for Data Mining and Machine Learning. Academic Press.
- Simovici, D. (2018). Mathematical Analysis for Machine Learning and Data Mining. World Scientific Publishing Co., Inc.
- Zheng, A. (2015). Evaluating machine learning models: a beginner's guide to key concepts and pitfalls.
- Mitchell, T. M. (1997). Machine learning. Burr Ridge, IL: McGraw Hill.