K-Nearest Neighbor is a supervised lazy learning technique.
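To make the idea of "lazy" learning concrete, here is a minimal pure-Python sketch of a k-nearest-neighbor prediction. The function name `knn_predict` and the toy 2D points are made up for illustration; this is not how scikit-learn implements it, only the basic idea of measuring distances at prediction time and taking a majority vote.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=5):
    """Predict the label of `query` by majority vote among its k nearest training points."""
    # "Lazy" learning: nothing is fitted in advance; all distance work
    # happens here, at prediction time.
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep the k closest training points and count their labels.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Toy 2D data: two well-separated clusters (invented for this example).
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
labels = [0, 0, 0, 1, 1, 1]

print(knn_predict(points, labels, (1.1, 1.0), k=3))  # 0
print(knn_predict(points, labels, (5.1, 5.0), k=3))  # 1
```

Note that `math.dist` needs Python 3.8 or later, and that this sketch does not handle ties between classes in any special way.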
The Iris dataset is used, with 150 instances, 4 features and 3 classes. The first 50 observations (rows) correspond to class 0, the next 50 rows to class 1 and the last 50 rows to class 2. The program prints the class names.
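10-fold cross validation is used. Thus 150/10 = 15 instances are used for testing and the remaining 135 for training. This is done 10 times, each time with a different set of test indices. The KFold function has the shuffle parameter set to True. Without shuffling, the folds would be contiguous blocks of rows, so each test set would contain at most 2 of the classes; with shuffling, each test set usually contains samples from all 3 classes, although this is not guaranteed.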
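The effect of shuffling can be checked with a small pure-Python sketch of the fold splitting. The helper `kfold_indices` is a hypothetical stand-in written for this example, not scikit-learn's KFold, and it assumes the number of samples divides evenly by the number of folds.

```python
import random

def kfold_indices(n_samples, n_folds, shuffle=False, seed=None):
    """Return a list of (train_indices, test_indices) pairs."""
    indices = list(range(n_samples))
    if shuffle:
        random.Random(seed).shuffle(indices)
    fold_size = n_samples // n_folds  # assumes an even split
    folds = []
    for i in range(n_folds):
        test = sorted(indices[i * fold_size:(i + 1) * fold_size])
        test_set = set(test)
        # Everything that is not in the test fold is used for training.
        train = [j for j in range(n_samples) if j not in test_set]
        folds.append((train, test))
    return folds

# Iris layout: rows 0-49 are class 0, 50-99 class 1, 100-149 class 2.
y = [0] * 50 + [1] * 50 + [2] * 50

plain = kfold_indices(150, 10)
shuffled = kfold_indices(150, 10, shuffle=True, seed=0)

# Without shuffling, no test fold sees more than two classes.
print(max(len({y[i] for i in test}) for _, test in plain))  # 2
# With shuffling, the number of classes per test fold is usually 3.
print([len({y[i] for i in test}) for _, test in shuffled])
```

If every test fold must contain all classes in fixed proportions, a stratified split is the usual choice; scikit-learn provides StratifiedKFold for that purpose.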
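The accuracy_score function is used to find the fraction of correctly labelled test values. Since there are 135 training points, a brute-force search finds 135 distances in 4D space for each test point, or 15 × 135 = 2025 distances during each train-test iteration. (By default scikit-learn may choose a tree-based search that skips some of these computations.)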
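As a rough illustration of what the accuracy figure measures, the sketch below computes the same fraction by hand. The function name `accuracy` and the label lists are invented for this example; a brute-force count of distance computations for one fold is included as well.

```python
def accuracy(y_true, y_pred):
    """Fraction of positions where the predicted label equals the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

# Made-up labels for one 15-sample test fold, with a single mistake.
y_true = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2]
y_pred = [0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2]
print(accuracy(y_true, y_pred))  # 14 of 15 correct

# Brute-force distance count for one fold: every test point is
# compared against every training point.
n_train, n_test = 135, 15
print(n_train * n_test)  # 2025
```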
# ML1.py
from __future__ import print_function, division
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import KFold
from sklearn.metrics import accuracy_score
from sklearn.datasets import load_iris
# Loading data (150,4)
data = load_iris()
x = data.data
y = data.target
print('The three classes are', data.target_names)
# Use 5 nearest neighbors
classifier = KNeighborsClassifier(n_neighbors=5)
# Running 10 tests using 10-fold cross validation
test = set()
acc = []
kf = KFold(n_splits=10, shuffle=True)
for trn, tst in kf.split(x):
    # Fit on the 135 training rows of this fold
    x_train = x[trn]
    y_train = y[trn]
    print('length of x_train:', len(x_train))
    classifier.fit(x_train, y_train)
    # Hold out the remaining 15 rows for testing
    x_test = x[tst]
    y_test = y[tst]
    # test starts as an empty set, so this intersection stays empty
    test = test.intersection(tst)
    print('length of x_test:', len(x_test))
    print('tst:', tst)
    # Predict the held-out rows and record this fold's accuracy
    pred = classifier.predict(x_test)
    acc.append(accuracy_score(y_test, pred))
# Accuracy
print('Result: {}'.format(sum(acc)/len(acc)))
print('length of test: {}'.format(len(test)))
#The three classes are ['setosa' 'versicolor' 'virginica']
#length of x_train: 135
#length of x_test: 15
#tst: [ 11 20 37 42 58 88 94 95 99 101 117 121 132 136 146]
#length of x_train: 135
#length of x_test: 15
#tst: [ 0 13 19 26 47 64 76 86 97 98 104 105 120 133 143]
#length of x_train: 135
#length of x_test: 15
#tst: [ 12 18 24 27 30 33 35 38 48 51 55 60 106 122 144]
#length of x_train: 135
#length of x_test: 15
#tst: [ 5 32 45 52 65 66 81 83 90 102 116 131 137 139 148]
#length of x_train: 135
#length of x_test: 15
#tst: [ 3 4 17 23 29 31 40 41 49 79 85 87 109 114 145]
#length of x_train: 135
#length of x_test: 15
#tst: [ 1 2 54 57 61 80 89 96 113 115 118 127 128 134 141]
#length of x_train: 135
#length of x_test: 15
#tst: [ 7 8 10 15 16 71 74 82 125 129 130 135 140 142 149]
#length of x_train: 135
#length of x_test: 15
#tst: [ 9 14 53 56 68 69 73 75 77 91 100 103 107 110 111]
#length of x_train: 135
#length of x_test: 15
#tst: [ 6 21 25 28 34 44 46 62 63 70 92 119 126 138 147]
#length of x_train: 135
#length of x_test: 15
#tst: [ 22 36 39 43 50 59 67 72 78 84 93 108 112 123 124]
#Result: 0.973333333333
#length of test: 0
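The final length of 0 follows from how `test` is updated: it starts as an empty set, and intersecting an empty set with anything gives an empty set again. If the aim is to confirm that the ten test folds are disjoint and together cover all 150 rows, one possible sketch accumulates with a union instead. The fold construction below is a pure-Python stand-in for the KFold split, and the names `seen` and `overlap` are made up for this example.

```python
import random

# Stand-in for the shuffled 10-fold split: 10 test folds of 15 indices each.
indices = list(range(150))
random.Random(0).shuffle(indices)
folds = [indices[i:i + 15] for i in range(0, 150, 15)]

seen = set()   # every index that has appeared in some test fold so far
overlap = 0    # number of indices that appeared in more than one test fold
for tst in folds:
    overlap += len(seen.intersection(tst))
    seen = seen.union(tst)

print(len(seen))  # 150
print(overlap)    # 0
```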