Classifier Comparison

Adapted from http://scikit-learn.org/stable/auto_examples/classification/plot_classifier_comparison.html

A comparison of several classifiers in scikit-learn on synthetic datasets. The point of this example is to illustrate the nature of the decision boundaries of different classifiers. This should be taken with a grain of salt, as the intuition conveyed by these examples does not necessarily carry over to real datasets.

Particularly in high-dimensional spaces, data can more easily be separated linearly and the simplicity of classifiers such as naive Bayes and linear SVMs might lead to better generalization than is achieved by other classifiers.

The plots show training points in solid colors and testing points in semi-transparent colors. The lower right of each plot shows the classification accuracy on the test set.

# Python Code source: Gaël Varoquaux
# Andreas Müller
# Julia adaptation: Cédric St-Jean
# Modified for documentation by Jaques Grobler
# License: BSD 3 clause

using ScikitLearn
using PyCall
using PyPlot
using ScikitLearn.CrossValidation: train_test_split
@pyimport matplotlib.colors as mplc
@sk_import preprocessing: StandardScaler
@sk_import datasets: (make_moons, make_circles, make_classification)
@sk_import neighbors: KNeighborsClassifier
@sk_import svm: SVC
@sk_import tree: DecisionTreeClassifier
@sk_import ensemble: (RandomForestClassifier, AdaBoostClassifier)
@sk_import naive_bayes: GaussianNB
@sk_import discriminant_analysis: (LinearDiscriminantAnalysis, QuadraticDiscriminantAnalysis)
using ScikitLearn.Utils: meshgrid
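
The Python estimators pulled in with @sk_import are driven through ScikitLearn.jl's generic interface (fit!, predict, score, fit_transform!). As a minimal sketch of that workflow, assuming only the imports above (the demo variable names are illustrative and not part of the original script):

X_demo, y_demo = make_moons(noise=0.3, random_state=0)  # two-column feature matrix and 0/1 labels
clf_demo = KNeighborsClassifier(3)
fit!(clf_demo, X_demo, y_demo)
println(score(clf_demo, X_demo, y_demo))  # mean accuracy on the data the model was fit on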
h = .02  # step size in the mesh

names = ["Nearest Neighbors", "Linear SVM", "RBF SVM", "Decision Tree",
         "Random Forest", "AdaBoost", "Naive Bayes", "Linear Discriminant Analysis",
         "Quadratic Discriminant Analysis"]
classifiers = [
    KNeighborsClassifier(3),
    SVC(kernel="linear", C=0.025),
    SVC(gamma=2, C=1),
    DecisionTreeClassifier(max_depth=5),
    RandomForestClassifier(max_depth=5, n_estimators=10, max_features=1),
    AdaBoostClassifier(),
    GaussianNB(),
    LinearDiscriminantAnalysis(),
    QuadraticDiscriminantAnalysis()]

X, y = make_classification(n_features=2, n_redundant=0, n_informative=2,
                           random_state=1, n_clusters_per_class=1)
srand(42)
X += 2 * rand(size(X)...)
linearly_separable = (X, y)

datasets = [make_moons(noise=0.3, random_state=0),
            make_circles(noise=0.2, factor=0.5, random_state=1),
            linearly_separable
            ];
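
The decision surfaces below are drawn by evaluating each classifier on a dense grid of points covering the plane. meshgrid from ScikitLearn.Utils builds that grid much like NumPy's meshgrid does; here is a tiny illustration with made-up ranges (not part of the original script):

xx_demo, yy_demo = meshgrid(0.0:0.5:1.0, 0.0:0.5:2.0)
# xx_demo and yy_demo are matrices of the same size; hcat(xx_demo[:], yy_demo[:])
# lists every grid point as one row, ready for decision_function / predict_proba
println(size(hcat(xx_demo[:], yy_demo[:])))  # (number of grid points, 2)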
fig = figure(figsize=(27, 9))
i = 1
# iterate over datasets
for ds in datasets
    # preprocess dataset, split into training and test part
    X, y = ds
    X = fit_transform!(StandardScaler(), X)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.4)

    x_min, x_max = minimum(X[:, 1]) - .5, maximum(X[:, 1]) + .5
    y_min, y_max = minimum(X[:, 2]) - .5, maximum(X[:, 2]) + .5
    xx, yy = meshgrid(x_min:h:x_max, y_min:h:y_max)

    # just plot the dataset first
    cm = PyPlot.cm[:RdBu]
    cm_bright = mplc.ListedColormap(["#FF0000", "#0000FF"])
    ax = subplot(length(datasets), length(classifiers) + 1, i)
    # Plot the training points
    ax[:scatter](X_train[:, 1], X_train[:, 2], c=y_train, cmap=cm_bright)
    # and testing points
    ax[:scatter](X_test[:, 1], X_test[:, 2], c=y_test, cmap=cm_bright, alpha=0.6)

    ax[:set_xlim](minimum(xx), maximum(xx))
    ax[:set_ylim](minimum(yy), maximum(yy))
    ax[:set_xticks](())
    ax[:set_yticks](())
    i += 1

    # iterate over classifiers
    for (name, clf) in zip(names, classifiers)
        ax = subplot(length(datasets), length(classifiers) + 1, i)
        fit!(clf, X_train, y_train)
        scor = score(clf, X_test, y_test)

        # Plot the decision boundary. For that, we will assign a color to each
        # point in the mesh [x_min, x_max]x[y_min, y_max].
        Z = try
            # decision_function is not implemented for some classifiers
            # (e.g. GaussianNB); fall back to predict_proba in that case
            decision_function(clf, hcat(xx[:], yy[:]))
        catch
            predict_proba(clf, hcat(xx[:], yy[:]))[:, 2]
        end

        # Put the result into a color plot
        Z = reshape(Z, size(xx)...)
        ax[:contourf](xx, yy, Z, cmap=cm, alpha=.8)

        # Plot also the training points
        ax[:scatter](X_train[:, 1], X_train[:, 2], c=y_train, cmap=cm_bright)
        # and testing points
        ax[:scatter](X_test[:, 1], X_test[:, 2], c=y_test, cmap=cm_bright,
                     alpha=0.6)

        ax[:set_xlim](minimum(xx), maximum(xx))
        ax[:set_ylim](minimum(yy), maximum(yy))
        ax[:set_xticks](())
        ax[:set_yticks](())
        ax[:set_title](name)

        # print the test-set accuracy in the lower-right corner of the plot
        ax[:text](maximum(xx) - .3, minimum(yy) + .3, @sprintf("%.2f", scor),
                  size=15, horizontalalignment="right")
        i += 1
    end
end
fig[:subplots_adjust](left=.02, right=.98)
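
The original post embeds the resulting figure as a PNG. If you run the script yourself, the figure can be written to disk with PyPlot's savefig (the file name here is only an example):

savefig("classifier_comparison.png", bbox_inches="tight")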

[Figure: a 3×10 grid of plots, one row per dataset (moons, circles, linearly separable); the first column shows the input data, and each remaining column shows a classifier's decision regions with its test-set accuracy in the lower right.]

Author: Monad Kai
Original post: onlookerliu.github.io/2017/12/29/Classifier-Comparison/
License: Unless otherwise stated, all posts on this blog are licensed under CC BY-NC-SA 4.0. Please credit Code@浮生记 when reposting.