神经网络思想建立LR模型(DL公开课第二周答案)
LR回顾
LR计算图求导
算法结构
设计一个简单的算法实现判别是否是猫。
用一个神经网络的思想建立一个LR模型,下面这个图解释了为什么LR事实上是一个简单的神经网。
[图片上传失败...(image-4b2c8b-1515499689320)]
Mathematical expression of the algorithm:
For one example $x^{(i)}$:
$$z^{(i)} = w^T x^{(i)} + b tag{1}$$
$$hat{y}^{(i)} = a^{(i)} = sigmoid(z^{(i)})tag{2}$$
$$ mathcal{L}(a^{(i)}, y^{(i)}) = - y^{(i)} log(a^{(i)}) - (1-y^{(i)} ) log(1-a^{(i)})tag{3}$$
The cost is then computed by summing over all training examples:
$$ J = frac{1}{m} sum_{i=1}^m mathcal{L}(a^{(i)}, y^{(i)})tag{6}$$
构建算法的各个部分
建立神经网络的主要步骤是:
- 定义模型结构(例如输入特性的数量)
- 初始化模型的参数
- 循环:
计算当前损失(正向传播)
计算当前梯度(向后传播)
更新参数(梯度下降)
您通常将1-3单独构建并将它们集成到一个我们称为model()的函数中。
01
工具函数
# GRADED FUNCTION: sigmoiddef sigmoid(z):
"""
Compute the sigmoid of z
Arguments: z -- A scalar or numpy array of any size.
Return:
s -- sigmoid(z)
"""
s = 1/(1+np.exp(-z))
return s
02
初始化参数
# GRADED FUNCTION: initialize_with_zeros
def initialize_with_zeros(dim):
"""
This function creates a vector of zeros of shape (dim, 1) for w and initializes b to 0.
Argument:
dim -- size of the w vector we want (or number of parameters in this case)
Returns:
w -- initialized vector of shape (dim, 1)
b -- initialized scalar (corresponds to the bias)
"""
w = np.zeros((dim,1))
b = 0
assert(w.shape == (dim, 1))
assert(isinstance(b, float) or isinstance(b, int))
return w, b
03
向前和向后传播
现在参数已经初始化,可以执行向前和向后传播步骤来学习参数。
Exercise: 实现方法 propagate()计算代价函数和梯度 Hints:
Forward Propagation:
- You get X
- You compute $A = sigma(w^T X + b) = (a^{(0)}, a^{(1)}, ..., a^{(m-1)}, a^{(m)})$
- You calculate the cost function: $J = -frac{1}{m}sum_{i=1}{m}y{(i)}log(a{(i)})+(1-y{(i)})log(1-a^{(i)})$
Here are the two formulas you will be using:
$$ frac{partial J}{partial w} = frac{1}{m}X(A-Y)^Ttag{7}$$
$$ frac{partial J}{partial b} = frac{1}{m} sum_{i=1}^m (a{(i)}-y{(i)})tag{8}$$
# GRADED FUNCTION: propagate
def propagate(w, b, X, Y):
"""
Implement the cost function and its gradient for the propagation explained above
Arguments:
w -- weights, a numpy array of size (num_px * num_px * 3, 1)
b -- bias, a scalar
X -- data of size (num_px * num_px * 3, number of examples)
Y -- true "label" vector (containing 0 if non-cat, 1 if cat) of size (1, number of examples)
Return:
cost -- negative log-likelihood cost for logistic regression
dw -- gradient of the loss with respect to w, thus same shape as w
db -- gradient of the loss with respect to b, thus same shape as b
Tips:
- Write your code step by step for the propagation. np.log(), np.dot()
"""
m = X.shape[1]
# FORWARD PROPAGATION (FROM X TO COST)
A = sigmoid(np.dot(w.T,X)+b) # compute activation
cost = -np.sum((Y*np.log(A)+(1-Y)*np.log(1-A)))/m # compute cost
# BACKWARD PROPAGATION (TO FIND GRAD)
dw = np.dot(X,(A-Y).T)/m
db = np.sum((A-Y))/m
assert(dw.shape == w.shape)
assert(db.dtype == float)
cost = np.squeeze(cost)
assert(cost.shape == ())
grads = {"dw": dw, "db": db}
return grads, cost
04
优化
- 已经初始化了参数。
- 也可以计算一个成本函数和它的梯度。
- 现在,需要使用梯度下降来更新参数。
目标是通过最小化代价函数$J$来学习$w$ 和 $b$。对于$theta$,更新规则是 $ theta = theta - alpha text{ } dtheta$,$alpha$是学习率。
# GRADED FUNCTION: optimize
def optimize(w, b, X, Y, num_iterations, learning_rate, print_cost = False):
"""
This function optimizes w and b by running a gradient descent algorithm
Arguments:
w -- weights, a numpy array of size (num_px * num_px * 3, 1)
b -- bias, a scalar
X -- data of shape (num_px * num_px * 3, number of examples)
Y -- true "label" vector (containing 0 if non-cat, 1 if cat), of shape (1, number of examples) num_iterations -- number of iterations of the optimization loop
learning_rate -- learning rate of the gradient descent update rule
print_cost -- True to print the loss every 100 steps
Returns:
params -- dictionary containing the weights w and bias b
grads -- dictionary containing the gradients of the weights and bias with respect to the cost function
costs -- list of all the costs computed during the optimization, this will be used to plot the learning curve.
Tips:
You basically need to write down two steps and iterate through them:
1) Calculate the cost and the gradient for the current parameters. Use propagate().
2) Update the parameters using gradient descent rule for w and b.
"""
costs = []
for i in range(num_iterations):
# Cost and gradient calculation (≈ 1-4 lines of code)
grads, cost = propagate(w, b, X, Y)
# Retrieve derivatives from grads
dw = grads["dw"]
db = grads["db"]
# update rule (≈ 2 lines of code)
w = w-learning_rate*dw
b = b-learning_rate*db
# Record the costs
if i % 100 == 0:
costs.append(cost)
# Print the cost every 100 training examples
if print_cost and i % 100 == 0:
print ("Cost after iteration %i: %f" %(i, cost))
params = {"w": w,
"b": b}
grads = {"dw": dw,
"db": db}
return params, grads, costs
05
预测
前面的函数将输出学习的w和b,我们可以使用w和b来预测数据集x的标签,实现预测()函数。计算预测有两个步骤:
1、Calculate $hat{Y} = A = sigma(w^T X + b)$
2、Convert the entries of a into 0 (if activation <= 0.5) or 1 (if activation > 0.5), stores the predictions in a vector Y_prediction. If you wish, you can use an if/else statement in a for loop (though there is also a way to vectorize this).
# GRADED FUNCTION: predictdef predict(w, b, X):
'''
Predict whether the label is 0 or 1 using learned logistic regression parameters (w, b)
Arguments:
w -- weights, a numpy array of size (num_px * num_px * 3, 1)
b -- bias, a scalar
X -- data of size (num_px * num_px * 3, number of examples)
Returns:
Y_prediction -- a numpy array (vector) containing all predictions (0/1) for the examples in X
'''
m = X.shape[1]
Y_prediction = np.zeros((1,m))
w = w.reshape(X.shape[0], 1)
# Compute vector "A" predicting the probabilities of a cat being present in the picture
A = sigmoid(np.dot(w.T,X)+b)
for i in range(A.shape[1]):
# Convert probabilities A[0,i] to actual predictions p[0,i]
if A[0][i]>0.5:
Y_prediction[0][i]=1
assert(Y_prediction.shape == (1, m))
return Y_prediction
06
合并各个部分组成模型
现在,将通过将所有构建块(在前面部分中实现的函数)组合在一起,以正确的顺序将整个模型构建起来。
# GRADED FUNCTION: model
def model(X_train, Y_train, X_test, Y_test, num_iterations = 2000, learning_rate = 0.5, print_cost = False):
"""
Builds the logistic regression model by calling the function you've implemented previously Arguments:
X_train -- training set represented by a numpy array of shape (num_px * num_px * 3, m_train) Y_train -- training labels represented by a numpy array (vector) of shape (1, m_train)
X_test -- test set represented by a numpy array of shape (num_px * num_px * 3, m_test)
Y_test -- test labels represented by a numpy array (vector) of shape (1, m_test)
num_iterations -- hyperparameter representing the number of iterations to optimize the parameters learning_rate -- hyperparameter representing the learning rate used in the update rule of optimize()
print_cost -- Set to true to print the cost every 100 iterations
Returns:
d -- dictionary containing information about the model.
"""
# initialize parameters with zeros (≈ 1 line of code)
w, b = initialize_with_zeros(X_train.shape[0])
# Gradient descent (≈ 1 line of code)
parameters, grads, costs = optimize(w, b, X_train, Y_train, num_iterations, learning_rate, print_cost = False)
# Retrieve parameters w and b from dictionary "parameters"
w = parameters["w"]
b = parameters["b"]
# Predict test/train set examples (≈ 2 lines of code)
Y_prediction_test = predict(w, b, X_test)
Y_prediction_train = predict(w, b, X_train)
# Print train/test Errors
print("train accuracy: {} %".format(100 - np.mean(np.abs(Y_prediction_train - Y_train)) * 100)) print("test accuracy: {} %".format(100 - np.mean(np.abs(Y_prediction_test - Y_test)) * 100))
d = {"costs": costs,
"Y_prediction_test": Y_prediction_test,
"Y_prediction_train" : Y_prediction_train,
"w" : w,
"b" : b,
"learning_rate" : learning_rate,
"num_iterations": num_iterations}
return d
- system表空间不足的问题分析(二) (r8笔记第5天)
- golang基于redis lua封装的优先级去重队列
- python基础知识——内置数据结构(元组)
- python基础知识——控制语句
- python基础知识——基本语法
- 11g主库归档自动删除的小问题分析 (r8笔记第1天)
- JavaWeb02-CSS,JS(Java真正的全栈开发)
- 数据处理——One-Hot Encoding
- JavaWeb20-文件上传;下载(Java真正的全栈开发)
- 转--每周一个GoLang设计模式之组合模式
- 简单易学的机器学习算法——Softmax Regression
- JavaWeb19-Listener ; Filter
- dataguard归档路径的问题(r7笔记第99天)
- 厚土Go学习笔记 | 34. 一个简单的 web 服务器实现
- JavaScript 教程
- JavaScript 编辑工具
- JavaScript 与HTML
- JavaScript 与Java
- JavaScript 数据结构
- JavaScript 基本数据类型
- JavaScript 特殊数据类型
- JavaScript 运算符
- JavaScript typeof 运算符
- JavaScript 表达式
- JavaScript 类型转换
- JavaScript 基本语法
- JavaScript 注释
- Javascript 基本处理流程
- Javascript 选择结构
- Javascript if 语句
- Javascript if 语句的嵌套
- Javascript switch 语句
- Javascript 循环结构
- Javascript 循环结构实例
- Javascript 跳转语句
- Javascript 控制语句总结
- Javascript 函数介绍
- Javascript 函数的定义
- Javascript 函数调用
- Javascript 几种特殊的函数
- JavaScript 内置函数简介
- Javascript eval() 函数
- Javascript isFinite() 函数
- Javascript isNaN() 函数
- parseInt() 与 parseFloat()
- escape() 与 unescape()
- Javascript 字符串介绍
- Javascript length属性
- javascript 字符串函数
- Javascript 日期对象简介
- Javascript 日期对象用途
- Date 对象属性和方法
- Javascript 数组是什么
- Javascript 创建数组
- Javascript 数组赋值与取值
- Javascript 数组属性和方法
- 【论文解读】无需额外数据、Tricks、架构调整,CMU开源首个将ResNet50精度提升至80%+新方法
- 机器学习模型调参指南(附代码)
- Flink SQL 自定义函数指南 - 以读取 GBK 编码的数据库为例
- 光照不均匀图像分割技巧1——分块阈值
- MySQL 8.0新特性 — 不可见索引
- 【小白学PyTorch】9.tensor数据结构与存储结构
- 【Python相关】jupyter平台最强插件没有之一
- 基于 OpenCV 的图像分割
- 再见,可视化!你好,Pandas!
- 40000字 Matplotlib 实操干货,真的全!
- Python自动化(二十) | 聊聊 Python 操作PDF的几种方法(合并、拆分、水印、加密)
- C语言发展史的点点滴滴
- 我写了一个R包,简化芯片的差异分析
- 【收藏】万字解析Scipy的使用技巧!
- Python 如何使用 HttpRunner 做接口自动化测试