## sklearn Package-Caller Series: Learning Curves and Pipeline

### Learning Curves

##### Plotting Procedure
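• Split the dataset into a training set and a test (validation) set.
• Divide the training data into several equal parts (5, or some other number).
• Train the model on one part first, then add one more part each time, recording the accuracy on the training data used and on the validation set after each step.
• Plot the training accuracy and the validation accuracy on the y-axis against the number of training samples on the x-axis.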
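The procedure above can be sketched by hand. The helper `manual_learning_curve` below is hypothetical (it is not part of sklearn); it assumes shuffled K-fold cross-validation, uses the iris dataset as stand-in data, and grows a nested training subset one part at a time. The `learning_curve` call in the next subsection automates essentially the same loop.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold
from sklearn.neighbors import KNeighborsClassifier


def manual_learning_curve(model, X, y, n_parts=5, cv=5, seed=0):
    """Hypothetical helper: grow the training set one part at a time and
    record mean training / validation accuracy over `cv` folds."""
    kf = KFold(n_splits=cv, shuffle=True, random_state=seed)
    splits = list(kf.split(X))
    rng = np.random.RandomState(seed)
    # Shuffle each training fold once so the subsets below are nested
    perms = [rng.permutation(train_idx) for train_idx, _ in splits]
    fold_len = min(len(p) for p in perms)

    sizes, train_means, val_means = [], [], []
    for k in range(1, n_parts + 1):
        n = fold_len * k // n_parts  # use the first k of n_parts equal parts
        train_acc, val_acc = [], []
        for perm, (_, val_idx) in zip(perms, splits):
            subset = perm[:n]
            m = clone(model).fit(X[subset], y[subset])  # fresh, unfitted copy each time
            train_acc.append(m.score(X[subset], y[subset]))
            val_acc.append(m.score(X[val_idx], y[val_idx]))
        sizes.append(n)
        train_means.append(np.mean(train_acc))
        val_means.append(np.mean(val_acc))
    return np.array(sizes), np.array(train_means), np.array(val_means)


# Stand-in data (assumption): iris has 150 samples, so each 5-fold training fold has 120
X, y = load_iris(return_X_y=True)
sizes, train_acc, val_acc = manual_learning_curve(KNeighborsClassifier(n_neighbors=2), X, y)
print(sizes.tolist())  # [24, 48, 72, 96, 120]
```

The three returned arrays can be plotted with the same `plt.plot` calls used below, with `sizes` on the x-axis.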
##### Plotting Code

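```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split, learning_curve
from sklearn.neighbors import KNeighborsClassifier

# Assumption: the post does not show how X and Y are loaded;
# the iris dataset stands in for them here.
X, Y = load_iris(return_X_y=True)
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=0)

# Score a KNN classifier on a single train/test split
model1 = KNeighborsClassifier(n_neighbors=2)
model1.fit(X_train, Y_train)
score1 = model1.score(X_test, Y_test)

# 10-fold CV; in each fold, train on 10%..100% of the training fold (5 sizes).
# Without shuffle=True each subset is the first n indices of the training fold,
# so on label-sorted data such as iris the smallest subsets would hold one class.
train_size, train_score, test_score = learning_curve(
    model1, X, Y, cv=10, train_sizes=np.linspace(0.1, 1.0, 5),
    shuffle=True, random_state=0)

# Mean and standard deviation across the 10 folds, one value per training size
train_scores_mean = np.mean(train_score, axis=1)
train_scores_std = np.std(train_score, axis=1)
test_scores_mean = np.mean(test_score, axis=1)
test_scores_std = np.std(test_score, axis=1)

# Shaded bands show mean +/- one standard deviation
plt.fill_between(train_size, train_scores_mean - train_scores_std,
                 train_scores_mean + train_scores_std, alpha=0.1, color="r")
plt.fill_between(train_size, test_scores_mean - test_scores_std,
                 test_scores_mean + test_scores_std, alpha=0.1, color="g")
plt.plot(train_size, train_scores_mean, 'o--', color="r", label="Training score")
plt.plot(train_size, test_scores_mean, 'o-', color="g", label="Cross-validation score")

plt.xlabel("Number of training samples")
plt.ylabel("Accuracy")
plt.grid()
plt.title('Learning Curve for KNN')
plt.legend(loc="best")
plt.show()
```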

### Pipeline

##### How Pipeline Works

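The intermediate steps of a Pipeline are sklearn-compatible transformers, and the last step is an estimator (the model). Every intermediate step must implement both fit and transform, so all preprocessing can be packaged into the pipeline; the final step only needs to implement fit and is usually the model itself. Calling fit on the pipeline runs fit_transform on each transformer in turn, passes the result to the next step, and finally calls fit on the last estimator. The workflow is shown in the figure below.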
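To make the chaining concrete, the sketch below fits the steps by hand (fit and transform on each intermediate step, fit only on the last) and compares the predictions with the same steps wrapped in a `Pipeline`. The synthetic data from `make_regression` is an assumption for illustration; the steps mirror the ones used in the Pipeline code.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import Normalizer, PolynomialFeatures

# Stand-in data (assumption; not from the post)
X, y = make_regression(n_samples=100, n_features=3, noise=5.0, random_state=0)

# Manual chaining: intermediate steps are fit, then transform; the last step is only fit
norm = Normalizer().fit(X)
X_norm = norm.transform(X)
poly = PolynomialFeatures(2, include_bias=False).fit(X_norm)
X_poly = poly.transform(X_norm)
lr = LinearRegression().fit(X_poly, y)
manual_pred = lr.predict(poly.transform(norm.transform(X)))

# The same steps wrapped in a Pipeline
pipe = Pipeline([('norm', Normalizer()),
                 ('poly', PolynomialFeatures(2, include_bias=False)),
                 ('lr', LinearRegression())])
pipe.fit(X, y)
pipe_pred = pipe.predict(X)

print(np.allclose(manual_pred, pipe_pred))  # True
```

The Pipeline version needs no intermediate variables, and at prediction time it applies the already-fitted transformers in the same order automatically.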

##### Pipeline Code

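```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import Normalizer, PolynomialFeatures

# Assumption: stand-in data; the post does not show where X_train/y_train come from
X_train, y_train = make_regression(n_samples=200, n_features=3, noise=10.0, random_state=0)
norm = Normalizer()                               # rescale each sample (row) to unit L2 norm
poly = PolynomialFeatures(2, include_bias=False)  # add degree-2 and interaction terms
lr = LinearRegression()                           # final estimator (the model)
pipeline = Pipeline([('norm', norm), ('poly', poly), ('lr', lr)])
pipeline.fit(X_train, y_train)
```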
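Once fitted, a pipeline behaves like a single estimator. The sketch below rebuilds the same pipeline on assumed stand-in data (`make_regression`) so it runs on its own, then shows three common uses: `predict` pushes new data through every step, `named_steps` gives access to a fitted step by its name, and `set_params` with the `<step>__<param>` syntax changes one step's hyperparameter.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import Normalizer, PolynomialFeatures

# Stand-in data (assumption): 3 input features
X_train, y_train = make_regression(n_samples=200, n_features=3, noise=10.0, random_state=0)
pipeline = Pipeline([('norm', Normalizer()),
                     ('poly', PolynomialFeatures(2, include_bias=False)),
                     ('lr', LinearRegression())])
pipeline.fit(X_train, y_train)

# predict: data flows through norm -> poly -> lr automatically
preds = pipeline.predict(X_train[:5])

# Access a fitted step by the name given in the step list
n_out = pipeline.named_steps['poly'].n_output_features_
print(n_out)  # 9: 3 linear + 3 squared + 3 pairwise interaction terms

# Change a step's hyperparameter with <step>__<param>, then refit
pipeline.set_params(poly__degree=3)
pipeline.fit(X_train, y_train)
print(pipeline.named_steps['poly'].n_output_features_)  # 19
```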
