cs231n-2022-assignment1#Q1: kNN Image Classifier Experiment (Python)

1. Introduction

2. Loading the data

3. Implementing compute_distances_two_loops

4. Implementing predict_labels

5. Implementing compute_distances_one_loop()

6. Implementing compute_distances_no_loops()

7. Cross-Validation

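1. Introduction

This post covers the key points of the first question (Q1) of the first assignment of Fei-Fei Li's cs231n-2022 course.

For the course material related to this assignment, see: CS231n Convolutional Neural Networks for Visual Recognition

For the requirements of Assignment 1, see: Assignment 1 (cs231n.github.io)

I recommend that anyone interested read the original material; it is too good for me to dare reproduce it here. This post only adds explanations of a few key points that came up while doing the assignment. The original starter_code can also be downloaded from the course website. Only the key code that needs explanation is shown here; my own completed assignment is packaged and uploaded separately (see the download link at the end of this post) for anyone who wants to refer to it. The only files modified are the following two: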

(1) root\cs231n\classifiers\k_nearest_neighbor.py
(2) root\knn.ipynb

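2. Loading the data

The data loading in the original starter files seemed to have some problems.

I changed it to use the tensorflow.keras data loading function, as shown below: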

import tensorflow.keras as keras
(X_train, y_train), (X_test, y_test) = keras.datasets.cifar10.load_data()

# As a sanity check, we print out the size of the training and test data.
print('Training data shape: ', X_train.shape)
print('Training labels shape: ', y_train.shape)
print('Test data shape: ', X_test.shape)
print('Test labels shape: ', y_test.shape)
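
As a rough sketch of the shapes involved: keras.datasets.cifar10.load_data() returns the images as a 4-D uint8 array and the labels as a column of shape (N, 1), so the images need to be flattened into rows before distances are computed. The example below uses a small random stand-in array instead of the real dataset (an assumption, to avoid the download); the names fake_images and fake_labels are hypothetical.

```python
import numpy as np

# Random stand-in with the same layout as CIFAR-10 (assumption: 5 samples only).
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(5, 32, 32, 3), dtype=np.uint8)
fake_labels = rng.integers(0, 10, size=(5, 1))

# Flatten each 32x32x3 image into one row of 3072 values.
X_rows = fake_images.reshape(fake_images.shape[0], -1)

# Drop the trailing axis of the label column so labels form a 1-D vector.
y_flat = np.squeeze(fake_labels)

print(X_rows.shape, X_rows.dtype)
print(y_flat.shape)
```

The flattened rows are still uint8 at this point, which matters for the distance computation in the next section.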

After loading, the sample images are displayed as follows:

[Figure: sample images from the loaded dataset]

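3. Implementing compute_distances_two_loops

The first key step of the kNN algorithm is computing the "distance".

This assignment uses the Euclidean distance. Below, the Euclidean distance between two images (that is, the L2 norm of their difference) is implemented in the most intuitive way, with two nested loops. One thing to watch out for: image data defaults to uint8 (unsigned 8-bit integer), and doing the distance arithmetic directly in that data type gives strange results, because differences wrap around modulo 256 (my hard-won lesson is described in: "A small problem in numerical processing of image data").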

    def compute_distances_two_loops(self, X):
        num_test = X.shape[0]
        num_train = self.X_train.shape[0]
        dists = np.zeros((num_test, num_train))
        for i in range(num_test):
            for j in range(num_train):
                #####################################################################
                # TODO:                                                             #
                # Compute the l2 distance between the ith test point and the jth    #
                # training point, and store the result in dists[i, j]. You should   #
                # not use a loop over dimension, nor use np.linalg.norm().          #
                #####################################################################
                # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

                train = self.X_train[j][:].astype('float')
                test  = X[i][:].astype('float')
                diff  = (train-test)
                #print(i,j,(train-test).shape, test.shape)
                tmp1 = np.linalg.norm(diff,2)
                #tmp2 = np.sqrt(np.dot(diff,diff))
                #tmp3 = np.sqrt(np.sum(np.multiply(diff,diff)))
                #tmp5 = np.sqrt(np.sum(np.square(diff)))
                #
                #squ_sum = 0
                #for k in range(len(diff)):
                #    squ_sum += diff[k] * diff[k]
                #tmp4 = np.sqrt(squ_sum)
                #print(tmp1,tmp2,tmp3,tmp4,tmp5) # All five should be the same.
                dists[i][j] = tmp1
                
                # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
        return dists

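The code above contains five ways of computing the distance, all of which are equivalent. Note that the TODO comment asks not to use np.linalg.norm(), so for a submission one of the commented-out alternatives (for example tmp5) would be the one to keep.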
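
The uint8 pitfall mentioned above can be reproduced on two tiny arrays. This is only an illustrative sketch; the arrays a and b are made up for the example.

```python
import numpy as np

a = np.array([10, 200], dtype=np.uint8)
b = np.array([20, 100], dtype=np.uint8)

# Subtracting uint8 arrays wraps around modulo 256: 10 - 20 becomes 246.
wrapped = a - b

# Converting to float first gives the true signed difference.
correct = a.astype(float) - b.astype(float)

wrong_dist = np.sqrt(np.sum(np.square(wrapped.astype(float))))
right_dist = np.sqrt(np.sum(np.square(correct)))

print(wrapped, correct)
print(wrong_dist, right_dist)
```

The wrapped difference yields a much larger distance than the correct one, which is why the implementation converts both images with astype('float') before subtracting.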

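4. Implementing predict_labels

The most important aspect of this assignment is actually not the kNN algorithm itself but becoming fluent in numpy programming. numpy is extremely powerful, but a feature you don't know about might as well not exist ^-^. I previously didn't know that a function like np.argsort existed; without the hint, I would probably have searched with the crudest, simplest loops and ended up with code that is long, error-prone, and inefficient.

The code for finding the mode is also a solution I found by searching online, which was an eye-opener. For details like these, as long as you know what you want, or in other words can ask the right question, you can trust that in some corner of the all-encompassing, all-capable internet an answer is waiting for you ^-^.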

    def predict_labels(self, dists, k=1):
        """
        Given a matrix of distances between test points and training points,
        predict a label for each test point.

        Inputs:
        - dists: A numpy array of shape (num_test, num_train) where dists[i, j]
          gives the distance between the ith test point and the jth training point.

        Returns:
        - y: A numpy array of shape (num_test,) containing predicted labels for the
          test data, where y[i] is the predicted label for the test point X[i].
        """
        num_test = dists.shape[0]
        y_pred = np.zeros(num_test)
        for i in range(num_test):
            # A list of length k storing the labels of the k nearest neighbors to
            # the ith test point.
            closest_y = []
            #########################################################################
            # TODO:                                                                 #
            # Use the distance matrix to find the k nearest neighbors of the ith    #
            # testing point, and use self.y_train to find the labels of these       #
            # neighbors. Store these labels in closest_y.                           #
            # Hint: Look up the function numpy.argsort.                             #
            #########################################################################
            # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

            closest_y = self.y_train[np.argsort(dists[i,:])[:k]]

            # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
            #########################################################################
            # TODO:                                                                 #
            # Now that you have found the labels of the k nearest neighbors, you    #
            # need to find the most common label in the list closest_y of labels.   #
            # Store this label in y_pred[i]. Break ties by choosing the smaller     #
            # label.                                                                #
            #########################################################################
            # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
            
            #find unique values in array along with their counts
            vals, counts = np.unique(closest_y, return_counts=True)

            #find mode: vals is sorted and np.argmax returns the first maximum,
            #so ties are broken by choosing the smaller label
            y_pred[i] = vals[np.argmax(counts)]

            # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

        return y_pred
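
The two numpy steps used in predict_labels (picking the k nearest neighbours with np.argsort, then voting with np.unique) can be tried on toy data. This is a minimal sketch; dists_row and y_train_demo are made-up values, chosen so that the vote ends in a tie.

```python
import numpy as np

# Distances from one test point to five training points (made-up values).
dists_row = np.array([0.9, 0.1, 0.5, 0.3, 0.7])
y_train_demo = np.array([0, 2, 1, 2, 1])
k = 4

# Indices of the k smallest distances, nearest first.
nearest = np.argsort(dists_row)[:k]
closest_y = y_train_demo[nearest]

# np.unique returns the labels in ascending order with their counts;
# np.argmax picks the first maximum, so a tie goes to the smaller label.
vals, counts = np.unique(closest_y, return_counts=True)
pred = vals[np.argmax(counts)]

print(nearest, closest_y, pred)
```

Here labels 1 and 2 each receive two votes, and the smaller label 1 is chosen, which matches the tie-breaking rule stated in the TODO comment.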

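5. Implementing compute_distances_one_loop()

This collapses the two nested loops of compute_distances_two_loops() into a single loop over the test images only.

The key point in the code below is to use broadcasting to get the difference between each test image and all training images, then square it elementwise with np.square(), and then sum with np.sum; note that np.sum must sum along axis=1.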

    def compute_distances_one_loop(self, X):
        """
        Compute the distance between each test point in X and each training point
        in self.X_train using a single loop over the test data.

        Input / Output: Same as compute_distances_two_loops
        """
        num_test = X.shape[0]
        num_train = self.X_train.shape[0]
        dists = np.zeros((num_test, num_train))
        for i in range(num_test):
            #######################################################################
            # TODO:                                                               #
            # Compute the l2 distance between the ith test point and all training #
            # points, and store the result in dists[i, :].                        #
            # Do not use np.linalg.norm().                                        #
            #######################################################################
            # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
            test = X[i,:].astype('float') # shape (3072,)
            diff = self.X_train.astype('float') - test # shape (num_train, 3072), via broadcasting
            dists[i,:] = np.sqrt(np.sum(np.square(diff),axis=1))
            #dists[i,:] = np.sqrt(np.sum(np.multiply(diff,diff),axis=1)) # Should also be OK.
            
            #dists[i,:] = np.dot(diff,diff.T) # Doesn't work here.

            # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
        return dists

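However, as the timing results later show, this function is actually much slower than compute_distances_two_loops() (about twice as slow in my test). I don't know why it is this much slower; in principle, even if it didn't get faster, it should at least be on par with compute_distances_two_loops(). One possible contributor (not measured here) is that self.X_train.astype('float') converts the entire training set on every iteration, and the full-size difference and squared arrays are large temporaries.

Of course, computing via broadcasting as in this code is indeed not optimal. A better approach, as in compute_distances_no_loops() later, is to compute the inner products between test and training images separately from each image's own squared L2 norm. This is explained in the next section, so I won't optimize this function further.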
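
For reference, the broadcasting step can be sketched as a standalone function with the float conversion done once before the loop. Whether this closes the speed gap on real data is an assumption that was not measured; distances_one_loop and the demo arrays are hypothetical names used only for illustration.

```python
import numpy as np

def distances_one_loop(X_test, X_train):
    """L2 distances with one loop over test rows (illustrative sketch)."""
    # Convert once, outside the loop, to avoid repeated full-array copies.
    X_test = X_test.astype(float)
    X_train = X_train.astype(float)
    dists = np.zeros((X_test.shape[0], X_train.shape[0]))
    for i in range(X_test.shape[0]):
        # (num_train, D) - (D,) broadcasts the test row across all training rows.
        diff = X_train - X_test[i]
        dists[i, :] = np.sqrt(np.sum(np.square(diff), axis=1))
    return dists

# Small random uint8 data standing in for flattened images.
rng = np.random.default_rng(0)
X_train_demo = rng.integers(0, 256, size=(6, 12), dtype=np.uint8)
X_test_demo = rng.integers(0, 256, size=(3, 12), dtype=np.uint8)
d = distances_one_loop(X_test_demo, X_train_demo)
print(d.shape)
```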

6. Implementing compute_distances_no_loops()

This implementation took quite a while to figure out. The essential point is:

\begin{align}
d_2(I_1, I_2)^2 &= \sum\limits_{k}(I_1[k]-I_2[k])^2 \\
&= \sum\limits_{k}{I_1^2[k]} + \sum\limits_{k}{I_2^2[k]} - 2 \cdot \sum\limits_{k}{I_1[k] \cdot I_2[k]} \\
&= \left\| I_1 \right\|_2^2 + \left\| I_2 \right\|_2^2 - 2 \cdot \langle I_1, I_2 \rangle
\end{align}

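After this decomposition, the squared L2 norm of each test image and of each training image only needs to be computed once, and the last term is the inner product of the two. Taking the square root of the result gives the distance.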

    def compute_distances_no_loops(self, X):
        """
        Compute the distance between each test point in X and each training point
        in self.X_train using no explicit loops.

        Input / Output: Same as compute_distances_two_loops
        """
        num_test = X.shape[0]
        num_train = self.X_train.shape[0]
        dists = np.zeros((num_test, num_train))
        #########################################################################
        # TODO:                                                                 #
        # Compute the l2 distance between all test points and all training      #
        # points without using any explicit loops, and store the result in      #
        # dists.                                                                #
        #                                                                       #
        # You should implement this function using only basic array operations; #
        # in particular you should not use functions from scipy,                #
        # nor use np.linalg.norm().                                             #
        #                                                                       #
        # HINT: Try to formulate the l2 distance using matrix multiplication    #
        #       and two broadcast sums.                                         #
        #########################################################################
        # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
                
        X_train_norm_squ = (np.sum(np.multiply(self.X_train.astype('float'),self.X_train.astype('float')),axis=1))
        X_test_norm_squ  = (np.sum(np.multiply(X.astype('float'),X.astype('float')),axis=1))
        dotprod          = np.dot(X.astype('float'), self.X_train.astype('float').T) 
        dists            = np.sqrt(np.expand_dims(X_test_norm_squ, axis=1) + np.expand_dims(X_train_norm_squ, axis=0) - 2*dotprod)
        # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
        return dists

The timing comparison of the three implementations above is as follows:

Two loop version took 30.926073 seconds
One loop version took 61.558051 seconds
No loop version took 0.312998 seconds

The final vectorized implementation is roughly two orders of magnitude faster than the other two.
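
The identity behind the vectorized version can be checked numerically on small random data. The sketch below is a standalone illustration, not the assignment code: pairwise_l2 is a hypothetical name, and the np.maximum clamp is an extra precaution against tiny negative values from rounding that the implementation above does not include.

```python
import numpy as np

def pairwise_l2(X_test, X_train):
    """All-pairs L2 distances via ||a||^2 + ||b||^2 - 2<a, b> (sketch)."""
    A = X_test.astype(float)
    B = X_train.astype(float)
    a_sq = np.sum(A * A, axis=1)[:, np.newaxis]   # shape (num_test, 1)
    b_sq = np.sum(B * B, axis=1)[np.newaxis, :]   # shape (1, num_train)
    sq = a_sq + b_sq - 2.0 * (A @ B.T)            # broadcast to (num_test, num_train)
    # Clamp at zero before the square root (assumption: guards against rounding).
    return np.sqrt(np.maximum(sq, 0.0))

rng = np.random.default_rng(1)
X_test_demo = rng.integers(0, 256, size=(4, 20), dtype=np.uint8)
X_train_demo = rng.integers(0, 256, size=(7, 20), dtype=np.uint8)

fast = pairwise_l2(X_test_demo, X_train_demo)

# Direct computation from the definition, for comparison.
A = X_test_demo.astype(float)
B = X_train_demo.astype(float)
direct = np.sqrt(((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2))

print(np.allclose(fast, direct))
```

The speed-up comes from the single matrix product A @ B.T, which replaces the per-pair subtraction and summation.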

7. Cross-Validation

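This exercise accomplishes two tasks:

(1) Sweep over different values of k to see how k affects the classification accuracy of this kNN classifier
(2) Run 5-fold cross-validation for each k to get a more reliable performance estimate

The key is to split the original dataset with np.array_split, and then use np.concatenate() to reassemble the split data into the training/validation sets of each fold. The key code is as follows: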

        for l in range(num_folds):
            if l!=j:
                if Xtr is None:
                    Xtr = X_train_folds[l]
                    ytr = y_train_folds[l]
                else:
                    Xtr = np.concatenate((Xtr, X_train_folds[l]),axis=0)
                    ytr = np.concatenate((ytr, y_train_folds[l]),axis=0)
        # Training
        classifier = KNearestNeighbor()
        classifier.train(Xtr, ytr)
        # Distance calculation
        dists_cur  = classifier.compute_distances_no_loops(Xval)
        # Prediction
        y_val_pred = classifier.predict_labels(dists_cur, k)
        num_correct = np.sum(y_val_pred == np.squeeze(yval))
        accuracy = float(num_correct) / len(y_val_pred)
        #print('k = %d: Got %d / %d correct => accuracy: %f' % (k, num_correct, len(y_val_pred), accuracy))    
        if k in k_to_accuracies:
            k_to_accuracies[k].append(accuracy)
        else:
            k_to_accuracies[k] = [accuracy]

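Running this gives the expected result (basically matching the plot on the original course page), which suggests that the implementation above is basically fine.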
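
The fold handling with np.array_split and np.concatenate can be sketched on toy data as follows. This is a minimal illustration under assumed names (X_demo, y_demo, splits), using a list comprehension instead of the incremental concatenation shown above.

```python
import numpy as np

num_folds = 5
X_demo = np.arange(20).reshape(10, 2)   # 10 toy samples with 2 features each
y_demo = np.arange(10)                  # one distinct label per sample

# Split the data into num_folds roughly equal parts.
X_folds = np.array_split(X_demo, num_folds)
y_folds = np.array_split(y_demo, num_folds)

splits = []
for j in range(num_folds):
    # Fold j is held out for validation; the other folds form the training set.
    X_val, y_val = X_folds[j], y_folds[j]
    X_tr = np.concatenate([X_folds[l] for l in range(num_folds) if l != j], axis=0)
    y_tr = np.concatenate([y_folds[l] for l in range(num_folds) if l != j], axis=0)
    splits.append((X_tr, y_tr, X_val, y_val))

print(splits[0][0].shape, splits[0][2].shape)
```

Each of the five splits can then be passed to the classifier's train, distance, and predict steps, and the resulting accuracies collected per k.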

Download link for the complete files of the implementation above: https://download.csdn.net/download/chenxy_bwave/85311643
