The online course "Data Warehouse and Data Mining" emphasizes linking theory with practice, with theory as the warp and application as the weft. Grounded in data, it introduces data warehouse and data mining techniques within a unified framework. Topics include data concepts, data warehouse models, knowledge types, data preprocessing, classification, regression, association mining, clustering, anomaly detection, and data visualization, as well as the design and implementation of a big data mining platform. Through the course, students master the basic principles of storing and mining massive data in a data warehouse, and learn to apply algorithms for data preprocessing, association rule mining, cluster analysis, classification, and anomaly detection to develop software tools that address the efficient management and in-depth use of massive data in real engineering projects. The course provides the basic theory and skills students need for future scientific research or other data-oriented work.
1 Introduction
1. What Is Data Mining and Why Data Mining
2. Data Mining Process
3. Data to be Mined
4. Data Mining Tasks
5. Evaluation of Knowledge
Test 1
2 Data
Data Objects and Attribute Types
Basic Statistical Descriptions of Data
Measuring Data Similarity and Dissimilarity
Test 2
3 Data Preprocessing
Overview
Data Cleaning
Data Integration
Data Transformation
Data Reduction
Test 3
4 Association Rule Mining
Basic Concepts
Frequent Itemset Generation
Rule Generation
Factors Affecting Complexity of Apriori
Compact Representation of Frequent Itemsets
Pattern Evaluation
Test 4
5 Classification
Classification: Basic Concepts
Decision Tree Induction
Bayes Classification Methods
Techniques to Improve Classification Accuracy: Ensemble Methods
Classification of Class-Imbalanced Data Sets
Model Evaluation and Selection
Test 5
6 Cluster Analysis
An Introduction
Partitioning Methods
Hierarchical Methods
Density- and Grid-Based Methods
Evaluation of Clustering
Test 6
7 Outlier Analysis
Outlier and Outlier Analysis
Outlier Detection Methods
Statistical Approaches
Proximity-Based Approaches
Clustering-Based and Classification-Based Approaches
Test 7
8 Data Visualization
Introduction
Function of Data Visualization
Data Visualization Methods
Tools of Data Visualization
Test 8
9 Data Warehouse
An Introduction
Test 9
10 Perspective
10.1 Data Resources
10.2 Data Usage
10.3 Data Ecosystem
Test 10