【随学随想】Implementing Linear Regression with numpy (Batch Gradient Descent)


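Preface
I. What is linear regression?
II. References
III. Implementation steps
- 1. Import the library
- 2. Build the LR (linear regression) class and set the hyperparameters
- 3. Build the hypothesis function
- 4. Loss function
- 5. Gradient
- 6. Batch gradient descent
- 7. Validation
  - 7-1. Build the dataset
  - 7-2. Predict on the dataset
Summary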

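Preface

Linear regression is one of the simplest machine learning algorithms and the first one most newcomers to the field encounter, which makes it the "Hello World" of machine learning. To help readers learn it, this post implements linear regression by hand with numpy, using batch gradient descent.

I. What is linear regression?

There is plenty of material on linear regression online: the "linear regression" entries on Wikipedia, Baidu Baike, and MBA Lib (MBA智库百科), lectures 2 to 4 of Andrew Ng's machine learning course, and articles by other CSDN bloggers. Most of these explain the theory better than I could, so I will not repeat it here.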

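II. References

Lectures 2 through 4 of the course slides from Andrew Ng's "Machine Learning | Coursera".

III. Implementation steps

The code is as follows.

1. Import the library

First, import the numpy library.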

import numpy as np

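2. Build the LR (linear regression) class and set the hyperparameters

Because of limited floating-point precision, the optimizer would need a very large number of steps (epochs) to reach exactly zero error, so we set a threshold and stop iterating once the monitored quantity drops below it. In the code below the threshold is compared against the gradient of the loss. (This idea was also mentioned in the earlier post 【随学随想】自适应过滤法预测时间序列, on forecasting time series with adaptive filtering.)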

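class LR:
    def __init__(self, variable, label, alpha=0.0001, max_epoch=100):
        self.variable = np.array(variable, dtype=float)  # features, shape (num_sample, num_variable)
        self.label = np.array(label, dtype=float)  # label values

        self.num_sample = self.variable.shape[0]  # number of samples
        self.num_variable = self.variable.shape[1]  # number of features; sets the number of parameters (hypothesis weights)

        self.alpha = alpha  # learning rate
        self.threshold = 0.0001  # stopping threshold (compared against the gradient in BGD)

        self.theta = np.random.normal(size=self.num_variable)  # initial weights, drawn from a standard normal

        self.max_epoch = max_epoch  # maximum number of iterations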

3. Build the hypothesis function

The hypothesis function is: $h_\theta = \sum_{i=0}^{n} \theta_i x_i$
To make the computation efficient and the notation clear and concise, we write it in vector form: $h_\theta = \boldsymbol\theta^T \boldsymbol x$

    def Hypothesis_func(self, X, Theta):  # hypothesis function
        # One matrix-vector product gives the prediction for every row of X
        return np.dot(X, Theta)
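
To see that the vector form matches the element-wise sum, here is a small standalone check. The toy matrix `X` and the weights `theta` are made-up values for illustration, not data from this post.

```python
import numpy as np

# Hypothetical toy data: 3 samples with 2 features each.
X = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])
theta = np.array([0.5, -1.0])

# Element-wise form: h = sum_i theta_i * x_i, computed sample by sample.
h_loop = np.array([sum(t * x for t, x in zip(theta, row)) for row in X])

# Vectorized form: a single matrix-vector product covers all samples.
h_vec = np.dot(X, theta)

print(h_loop)
print(h_vec)
```

Both lines print the same three predictions; the vectorized version simply avoids the Python-level loop.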

4. Loss function

The loss function is: $J = \frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(\boldsymbol x^i)-y^i\right)^2$
where $m$ is the number of samples.

    def loss_func(self, Prediction, label):  # loss function
        # Use the number of labels passed in, so the method also works on a test set
        m = len(label)
        # J = 1/(2m) * sum of squared residuals (square first, then sum)
        return (1 / (2 * m)) * np.sum((Prediction.flatten() - label) ** 2)
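
A small worked example may help when checking the formula by hand. This is a sketch with made-up numbers; the helper name `half_mse` is hypothetical and not part of the LR class. It also shows an easy slip: squaring the sum of residuals instead of summing the squares gives a different number.

```python
import numpy as np

def half_mse(prediction, label):
    """J = 1/(2m) * sum((prediction - label)^2); hypothetical helper name."""
    prediction = np.asarray(prediction, dtype=float).flatten()
    label = np.asarray(label, dtype=float).flatten()
    m = label.shape[0]
    return np.sum((prediction - label) ** 2) / (2 * m)

pred = np.array([2.0, 4.0, 6.0])
y = np.array([1.0, 4.0, 8.0])

# Residuals are 1, 0, -2; their squares sum to 5; J = 5 / (2 * 3).
loss = half_mse(pred, y)

# The slip: sum first, then square, gives (1 + 0 - 2)^2 = 1 instead of 5.
wrong = np.sum(pred - y) ** 2

print(loss, wrong)
```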

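5. Gradient

We take the partial derivative of the loss function with respect to each parameter $\theta_i$ and collect the results into a vector; that vector is the gradient.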
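
For reference, the derivative can be written out as follows. This assumes the loss $J = \frac{1}{2m}\sum_{i=1}^{m}(h_\theta(\boldsymbol x^i)-y^i)^2$ with $h_\theta(\boldsymbol x) = \boldsymbol\theta^T \boldsymbol x$; the symbols $X$ (the $m \times n$ matrix whose rows are the samples) and $\boldsymbol y$ (the vector of labels) are notation introduced here for illustration.

$$\frac{\partial J}{\partial \theta_j} = \frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(\boldsymbol x^i)-y^i\right)x_j^i, \qquad \nabla_{\boldsymbol\theta} J = \frac{1}{m}X^T\left(X\boldsymbol\theta-\boldsymbol y\right)$$

The factor $\frac{1}{2}$ in the loss cancels the 2 that comes from differentiating the square.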

    def delta_loss_func(self, Prediction, label):  # gradient of the loss function
        Prediction = Prediction.flatten()
        # (1/m) * (h - y)^T X, one entry per parameter
        return (1 / self.num_sample) * np.dot((Prediction - label).T, self.variable)
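
One way to gain confidence in a hand-derived gradient is to compare it with a finite-difference approximation. The sketch below is standalone and assumes the loss $J = \frac{1}{2m}\sum (X\boldsymbol\theta - \boldsymbol y)^2$; the functions `loss` and `grad`, the random data, and the step size `eps` are all illustrative choices.

```python
import numpy as np

def loss(theta, X, y):
    # J = 1/(2m) * sum((X @ theta - y)^2)
    m = X.shape[0]
    return np.sum((X @ theta - y) ** 2) / (2 * m)

def grad(theta, X, y):
    # Analytic gradient: (1/m) * X^T (X @ theta - y)
    m = X.shape[0]
    return X.T @ (X @ theta - y) / m

rng = np.random.default_rng(0)  # fixed seed; the data is made up
X = rng.normal(size=(20, 3))
y = rng.normal(size=20)
theta = rng.normal(size=3)

eps = 1e-6
numeric = np.zeros_like(theta)
for j in range(theta.size):
    step = np.zeros_like(theta)
    step[j] = eps
    # Central difference approximation of dJ/dtheta_j
    numeric[j] = (loss(theta + step, X, y) - loss(theta - step, X, y)) / (2 * eps)

max_diff = np.max(np.abs(numeric - grad(theta, X, y)))
print(max_diff)
```

If the analytic gradient is right, `max_diff` should be tiny, limited only by floating-point rounding.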

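6. Batch gradient descent

Batch gradient descent uses every sample in each update; that is, the sample count $m$ in the loss function above is the total number of training samples.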

    def BGD(self):
        epoch = 0  # current step
        del_mse = self.delta_loss_func(Prediction=self.Hypothesis_func(self.variable, self.theta),
                                       label=self.label)

        # Iterate while any gradient component exceeds the threshold and max_epoch is not reached
        while np.any(np.abs(del_mse) > self.threshold) and (epoch < self.max_epoch):
            self.theta = self.theta - self.alpha * del_mse
            epoch = epoch + 1
            mse = self.loss_func(self.Hypothesis_func(self.variable, self.theta), label=self.label)
            print('Iteration {}: loss = {:.7f}'.format(epoch, mse))
            # Recompute the gradient at the updated weights
            del_mse = self.delta_loss_func(Prediction=self.Hypothesis_func(self.variable, self.theta),
                                           label=self.label)
        return self.theta
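
The update rule can also be sketched as a standalone function, outside the LR class. This is a minimal sketch under stated assumptions, not the author's implementation: the function name `batch_gradient_descent`, the zero initialization, and the max-absolute-gradient stopping test are choices made here for illustration.

```python
import numpy as np

def batch_gradient_descent(X, y, alpha=0.1, threshold=1e-4, max_epoch=1000):
    """Hypothetical standalone helper; returns the weights and the epochs used."""
    m, n = X.shape
    theta = np.zeros(n)  # assumption: start from zeros instead of random weights
    epochs = 0
    while epochs < max_epoch:
        # Gradient over the full batch: (1/m) * X^T (X @ theta - y)
        grad = X.T @ (X @ theta - y) / m
        # Stop once every gradient component is within the threshold
        if np.max(np.abs(grad)) <= threshold:
            break
        theta = theta - alpha * grad
        epochs += 1
    return theta, epochs

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable
X = rng.normal(size=(80, 5))
true_w = np.array([3.0, 1.0, 2.0, 6.0, 5.0])
y = X @ true_w  # noise-free labels, as in the validation section of this post

theta, epochs_used = batch_gradient_descent(X, y)
print(theta, epochs_used)
```

Because the gradient is recomputed at the top of every pass, the stopping test always looks at the current weights rather than the initial ones.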

7. Validation

7-1. Build the dataset

We randomly generate 100 samples and construct the labels as $y = 3x_0 + 1x_1 + 2x_2 + 6x_3 + 5x_4$

if __name__ == "__main__":
    # Build the dataset
    data_x = np.random.normal(size=(100, 5))
    weight_0 = np.array([3, 1, 2, 6, 5])

    data_y = np.dot(data_x, weight_0)

    # First 80 samples for training, remaining 20 for testing
    train_x = data_x[0:80]
    train_y = data_y[0:80]

    test_x = data_x[80:100]
    test_y = data_y[80:100]

7-2. Predict on the dataset

    # Predict on the data
    model = LR(train_x, train_y, alpha=0.1, max_epoch=1000)  # named model so the LR class is not shadowed
    theta = model.BGD()
    print(theta)
    y_predict = np.dot(test_x, theta)
    MSE_test = model.loss_func(Prediction=y_predict, label=test_y)
    print('Loss on the test set: ', MSE_test)
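Summary

After reading this post, you should have a good idea of how to build a linear regression model with the numpy library.

Thank you for reading. I wish you all the best in your studies and in life.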

Feel free to share; when reposting, please credit the source: 内存溢出

Original URL: http://www.outofmemory.cn/zaji/5443084.html
