
# Part 1: Data Modeling Theory and Logic

## 1. Starting from the Definition of Data Analysis

Analysis of data is a process of inspecting, cleaning, transforming, and modeling data with the goal of discovering useful information, suggesting conclusions, and supporting decision making. Data analysis has multiple facets and approaches, encompassing diverse techniques under a variety of names, in different business, science, and social science domains.
(Source: Data analysis)

Data modeling is a process used to define and analyze data requirements needed to support the business processes within the scope of corresponding information systems in organizations. Therefore, the process of data modeling involves professional data modelers working closely with business stakeholders, as well as potential users of the information system. (Source: Data modeling)

Mathematical formulas or models called algorithms may be applied to the data to identify relationships among the variables, such as correlation or causation. In general terms, models may be developed to evaluate a particular variable in the data based on other variable(s) in the data, with some residual error depending on model accuracy (i.e., Data = Model + Error). (Source: Data modeling)
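To make "Data = Model + Error" concrete, here is a minimal sketch that fits a straight line by ordinary least squares and splits each observation into a model part and a residual. The numbers and the helper name `fit_line` are made up for illustration; they are not from the original analysis.

```python
# Illustrative sketch: fit a straight line by ordinary least squares and
# split each observation into a model part and a residual error part.
# The data below is invented for illustration only.

def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = fit_line(xs, ys)
model = [slope * x + intercept for x in xs]   # what the model predicts
error = [y - m for y, m in zip(ys, model)]    # what the model leaves unexplained

for y, m, e in zip(ys, model, error):
    print(f"data={y:.2f}  model={m:.2f}  error={e:+.2f}")
```

Each printed row satisfies data = model + error by construction; a better model is one that leaves smaller residuals.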

• Select and reconstruct variables

• Choose an algorithm

• Set parameters

• Load the algorithm and test the results

# Part 2: Applying Data Modeling

## Step 2: Data Acquisition

(The redaction is pretty crude, so please bear with it...)

## Step 3: Data Cleaning

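1. Because of how the forum works, many people retitle their posts to something like "已出" (sold) once a deal is done. These records need to be removed.

2. Some people post their wanted-to-buy listings as images, so the scraper only captures an image link. These records need to be removed as well.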
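The two cleaning rules above can be sketched roughly as follows. The record layout (`title` and `body` fields), the sample posts, and the image-link pattern are all assumptions made for illustration; the actual scraped schema is not shown here.

```python
import re

# Hypothetical scraped records; the field names "title" and "body" and the
# sample posts are assumptions, not the author's actual data.
posts = [
    {"title": "求购 PS4 主机", "body": "预算 1500，同城面交"},
    {"title": "已出", "body": "谢谢大家"},
    {"title": "收 3DS 掌机", "body": "http://example.com/img/wanted.jpg"},
]

SOLD_MARKER = "已出"  # title marker people switch to after a deal is done
# Assumed shape of an image link: an http(s) URL ending in a common image suffix.
IMAGE_URL = re.compile(r"https?://\S+\.(?:jpg|jpeg|png|gif)", re.IGNORECASE)

def is_sold(post):
    """Rule 1: the title was changed to a 'sold' marker."""
    return SOLD_MARKER in post["title"]

def is_image_only(post):
    """Rule 2: nothing but image links (and whitespace) is left in the body.

    Note that an empty body also counts as image-only under this check.
    """
    return IMAGE_URL.sub("", post["body"]).strip() == ""

cleaned = [p for p in posts if not is_sold(p) and not is_image_only(p)]
print(len(posts), "->", len(cleaned), "records after cleaning")
```

Keeping each rule in its own small function makes it easy to count how many records each rule drops before committing to the filter.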

## Step 4: Data Organization

(In the end I did not use the home-console/handheld label when actually running the analysis.)

## Step 9 & 10: Set Parameters & Load the Algorithm

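Besides the input variables, the K-means algorithm also needs the number of clusters to be set. Let's just go with a gut-feel five clusters to start!

(Don't laugh. In practice, many initial parameters are picked by gut feeling and then refined step by step based on the results.)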
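As a rough illustration of "pick five, then refine", here is a small pure-Python sketch of K-means (plain Lloyd's algorithm) run on synthetic 2D points. The data, the function names, and the use of within-cluster squared distance (inertia) as the yardstick are my assumptions, not the setup used in this post; in a real project a library implementation would be the usual choice.

```python
import random

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iterations=100, seed=0):
    """Plain Lloyd's algorithm; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen data points
    labels = None
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:  # assignments stopped changing: converged
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster went empty
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, labels

def inertia(points, centroids, labels):
    """Total squared distance from each point to its assigned centroid."""
    return sum(squared_distance(p, centroids[label]) for p, label in zip(points, labels))

# Synthetic data (invented): 20 noisy points around each of five centers.
data_rng = random.Random(42)
centers = [(0, 0), (5, 5), (0, 5), (5, 0), (10, 2)]
points = [
    (cx + data_rng.gauss(0, 0.5), cy + data_rng.gauss(0, 0.5))
    for cx, cy in centers
    for _ in range(20)
]

# First guess: five clusters.
centroids, labels = kmeans(points, k=5)
print("k=5 inertia:", round(inertia(points, centroids, labels), 2))

# Refinement: compare a few values of k and look at how the inertia changes.
for k in range(2, 9):
    cs, ls = kmeans(points, k=k)
    print(k, round(inertia(points, cs, ls), 2))
```

Inertia tends to keep shrinking as k grows, so the useful signal is where the improvement flattens out rather than the smallest value. The result also depends on the random starting centroids, so trying several seeds is a sensible part of the refinement.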
