1. kNN and Hyperparameters
2025-09-20
The kNN Algorithm
- Classification (the basic case): find the k nearest neighbors of the query point, let them vote by majority, and take the winning class as the prediction; the votes can optionally be weighted.
- Regression (predicting the value at an unknown point): identical to the above, except that instead of voting, the (weighted) mean of the neighbors' values is computed. See the sketch after this list.

Note: if k is too small, the model tends to overfit, since noise gets learned as signal; if k is too large, it tends to underfit, and each prediction also becomes slower.
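To make the voting and averaging concrete, here is a minimal from-scratch sketch of both variants. This is not the scikit-learn implementation used below; the function names `knn_classify` and `knn_regress` are made up for illustration, and Euclidean distance is assumed.

```python
import numpy as np

def knn_classify(X_train, y_train, query, k=3):
    """Predict a class label by majority vote among the k nearest neighbors."""
    # Euclidean distance from the query point to every training point
    dists = np.linalg.norm(X_train - query, axis=1)
    nearest = np.argsort(dists)[:k]           # indices of the k closest points
    labels = y_train[nearest]
    values, counts = np.unique(labels, return_counts=True)
    return values[np.argmax(counts)]          # most frequent label wins

def knn_regress(X_train, y_train, query, k=3):
    """Predict a value as the distance-weighted mean of the k nearest neighbors."""
    dists = np.linalg.norm(X_train - query, axis=1)
    nearest = np.argsort(dists)[:k]
    weights = 1.0 / (dists[nearest] + 1e-8)   # closer neighbors weigh more
    return np.sum(weights * y_train[nearest]) / np.sum(weights)
```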
```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris = load_iris()
X = iris.data
y = iris.target
X.shape, y.shape
```

Output:

```
((150, 4), (150,))
```

```python
x_train, x_test, y_train, y_test = train_test_split(X, y, train_size=0.8, random_state=42)
x_train.shape, x_test.shape, y_train.shape, y_test.shape
```

Output:

```
((120, 4), (30, 4), (120,), (30,))
```
Hyperparameter Settings
```python
from sklearn.neighbors import KNeighborsClassifier

neigh = KNeighborsClassifier(
    n_neighbors=3,        # k: number of neighbors to consult
    weights='distance',   # closer neighbors get a larger vote
    p=2,                  # power of the Minkowski distance; p=2 is Euclidean
)
neigh.fit(x_train, y_train)
```

Output:

```
KNeighborsClassifier(n_neighbors=3, weights='distance')
```

```python
y_predict = neigh.predict(x_test)
np.sum(y_predict == y_test) / len(y_test)  # accuracy on the test set
```

Output:

```
1.0
```
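For reference, the `p` parameter selects the power of the Minkowski metric, scikit-learn's default distance for kNN:

$$
d_p(x, y) = \left( \sum_{i=1}^{n} |x_i - y_i|^p \right)^{1/p}
$$

where p=1 gives the Manhattan distance and p=2 the Euclidean distance.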
```python
# Brute-force hyperparameter search: try every combination of n_neighbors, weights, and p
for n in range(1, 20):
    for weight in ['uniform', 'distance']:
        for p in range(1, 7):
            knn = KNeighborsClassifier(n_neighbors=n, weights=weight, p=p)
            knn.fit(x_train, y_train)
            knn_score = knn.score(x_test, y_test)
            # print(f"n_neighbors:{n},weights:{weight},p:{p},score:{knn_score}")
```

Hyperparameter Search with sklearn

GridSearchCV automates the same sweep, scoring each candidate by cross-validation on the training data rather than on the held-out test set:
```python
from sklearn.model_selection import GridSearchCV

params = {
    'n_neighbors': [n for n in range(1, 20)],
    'weights': ['uniform', 'distance'],
    'p': [p for p in range(1, 7)]
}
# n_jobs=-1 runs the candidate fits on all available cores
# (the candidates are independent of one another)
grid = GridSearchCV(estimator=KNeighborsClassifier(), param_grid=params, n_jobs=-1)
grid.fit(x_train, y_train)
grid.best_params_
```

Output:

```
{'n_neighbors': 5, 'p': 4, 'weights': 'uniform'}
```
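Once fitted, the grid object also exposes the best cross-validated score and a refitted best model. A short usage sketch (the exact numbers depend on the run, so none are claimed here):

```python
# Mean cross-validated accuracy of the best parameter combination
print(grid.best_score_)

# GridSearchCV refits the best model on the full training set by default,
# so the grid object can score the held-out test set directly
print(grid.score(x_test, y_test))

# The refitted best estimator is also available as a standalone model
best_knn = grid.best_estimator_
print(best_knn.predict(x_test[:5]))
```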