DL-C4W1

Why convolutions?

For a large image, the input size becomes very large. E.g., for a 1000×1000 image, flattening its features gives a vector of shape (3×1000×1000, 1) = (3 million, 1).


If the hidden layer has only 1000 units, then W1 has shape (1000, 3M),

because z (1000, 1) = W1 (1000, 3M) · x (3M, 1) + b.
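
To make the blow-up concrete, here is a minimal sketch (shapes and names are illustrative) of what a fully connected first layer would cost on that input:

```python
import numpy as np

n_x = 1000 * 1000 * 3    # flattened input: 3 million features
n_h = 1000               # hidden units in the first layer
W1_params = n_h * n_x    # the weight matrix alone
print(W1_params)         # 3000000000 -- 3 billion parameters
```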

Edge detection


Convolving a 6×6 image with a 3×3 filter yields a 4×4 output.

Python: conv_forward

tf.nn.conv2d
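
As a quick illustration of the framework call (a minimal sketch with random data; shapes follow TensorFlow's NHWC convention):

```python
import tensorflow as tf

image = tf.random.normal((1, 6, 6, 1))   # (batch, height, width, channels)
filt = tf.random.normal((3, 3, 1, 1))    # (f, f, in_channels, out_channels)
out = tf.nn.conv2d(image, filt, strides=1, padding='VALID')
print(out.shape)                         # (1, 4, 4, 1) -- a 4x4 output, as above
```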

How edge detection works

A vertical-edge filter clearly separates edge regions from non-edge regions.
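
A worked example (the classic one from the course: a 6×6 image with a bright left half, and the standard vertical-edge filter):

```python
import numpy as np

image = np.array([[10, 10, 10, 0, 0, 0]] * 6, dtype=float)  # bright left, dark right
vert_filter = np.array([[1, 0, -1],
                        [1, 0, -1],
                        [1, 0, -1]], dtype=float)

out = np.zeros((4, 4))
for i in range(4):
    for j in range(4):
        out[i, j] = np.sum(image[i:i+3, j:j+3] * vert_filter)

print(out)   # 30s in the two middle columns mark the vertical edge
```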


More edge detectors


We can treat the numbers in the filter directly as parameters to be learned.

In a NN, backpropagation learns the filters that correspond to the target output; the filters are then applied over the whole image to output all the useful features they extract.

Padding

Note from the above that the image shrinks with every convolution.


So before convolving we add padding to the image, surrounding the corner and edge pixels, so that after the filter's convolution the image size is unchanged and the corners are not lost.


Valid / Same convolutions

Valid: no padding

n×n → (n-f+1)×(n-f+1)

Same: padding

The output image has the same size as the input.

p = (f-1)/2; in CV, f is usually odd, so p works out to an integer.

From n + 2p - f + 1 = n, we get p = (f-1)/2.

Stride

stride = 1 means the filter moves one step at a time during the convolution.
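
The general output size, covering both padding and stride, is floor((n + 2p - f) / s) + 1 per spatial dimension; a minimal sketch (function name is illustrative):

```python
def conv_output_size(n, f, p=0, s=1):
    """Output width/height for an n-wide input, f-wide filter, padding p, stride s."""
    return (n + 2 * p - f) // s + 1

print(conv_output_size(6, 3))            # valid conv: 4
print(conv_output_size(6, 3, p=1))       # same conv:  6
print(conv_output_size(7, 3, p=0, s=2))  # stride 2:   3
```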


Convolutions over volumes


The first row shows a filter that detects vertical edges in the red channel only.

The second row shows a filter that detects vertical edges in all channels.

The third dimension of the filter equals the number of channels of the image.
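
A minimal sketch of one 3×3×3 filter convolved over a 6×6×3 image (random data; the point is the shapes):

```python
import numpy as np

image = np.random.rand(6, 6, 3)   # RGB image: 3 channels
filt = np.random.rand(3, 3, 3)    # filter depth matches the 3 input channels

out = np.zeros((4, 4))            # one filter -> one output channel
for i in range(4):
    for j in range(4):
        out[i, j] = np.sum(image[i:i+3, j:j+3, :] * filt)

print(out.shape)                  # (4, 4)
```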

Multiple filters


The figure above stacks the two outputs, from the vertical-edge filter and the horizontal-edge filter, into two layers.
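
Continuing the sketch above, stacking the outputs of several filters gives the channel dimension of the next layer (filter values here are random placeholders for the vertical and horizontal detectors):

```python
import numpy as np

def convolve(image, filt):
    f = filt.shape[0]
    n = image.shape[0] - f + 1
    out = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            out[i, j] = np.sum(image[i:i+f, j:j+f, :] * filt)
    return out

image = np.random.rand(6, 6, 3)
filters = [np.random.rand(3, 3, 3), np.random.rand(3, 3, 3)]
stacked = np.stack([convolve(image, f) for f in filters], axis=-1)
print(stacked.shape)   # (4, 4, 2): two filters -> two output channels
```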


One layer of a convolutional network


Like one step of forward propagation in an ordinary neural network, a convolutional network first computes a linear function of the weights and bias, then feeds the result into an activation function.


In the figure above, a[0] is the input image (n×n×3),

W[1] corresponds to the filters (f×f×3),

a[1] corresponds to the next layer (4×4×2).
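
A minimal sketch of that single step, matching the shapes listed above (random data; two 3×3×3 filters, ReLU as the activation):

```python
import numpy as np

a0 = np.random.rand(6, 6, 3)       # input image
W1 = np.random.rand(3, 3, 3, 2)    # two 3x3x3 filters
b1 = np.random.rand(1, 1, 2)       # one bias per filter

z1 = np.zeros((4, 4, 2))
for c in range(2):                 # one output channel per filter
    for i in range(4):
        for j in range(4):
            z1[i, j, c] = np.sum(a0[i:i+3, j:j+3, :] * W1[..., c])
z1 = z1 + b1                       # linear step: convolution + bias
a1 = np.maximum(z1, 0)             # activation (ReLU)
print(a1.shape)                    # (4, 4, 2)
```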

Number of parameters in a single conv layer


The parameter count is not affected by the image size.
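
For example, with ten 3×3×3 filters (the numbers used in the course):

```python
f, n_c_prev, n_filters = 3, 3, 10
params = (f * f * n_c_prev + 1) * n_filters   # +1 is each filter's bias
print(params)                                 # 280, whatever the input image size
```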

Notation


f[l]: filter size

The third dimension of each filter equals the number of channels of the input image.

The number of weights is the filter size × the number of filters, and the number of filters equals the number of channels of the output layer.

The size of the activations is the size of the next (output) layer: nH × nW × nC.

A simple convolutional network


The final 7×7×40 volume contains 1960 values in total; these are all the inputs fed into the final activation function.
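
That is, the last volume is unrolled into a vector (a minimal sketch):

```python
import numpy as np

final_volume = np.random.rand(7, 7, 40)
x = final_volume.reshape(-1, 1)
print(x.shape)   # (1960, 1) -- fed to logistic regression / softmax
```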

Pooling layers

Max pooling

Shrink the feature map from the previous layer by pooling: each small region of the input is represented in the pooled output by its maximum value alone.


Average pooling


Pooling only needs its hyperparameters set; it has no parameters to learn.
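
A minimal sketch covering both variants (f and s are the only hyperparameters; nothing here is learned):

```python
import numpy as np

def pool(a, f=2, s=2, mode='max'):
    n_h = (a.shape[0] - f) // s + 1
    n_w = (a.shape[1] - f) // s + 1
    out = np.zeros((n_h, n_w))
    for i in range(n_h):
        for j in range(n_w):
            window = a[i*s:i*s+f, j*s:j*s+f]
            out[i, j] = window.max() if mode == 'max' else window.mean()
    return out

a = np.random.rand(4, 4)
print(pool(a).shape)                  # (2, 2) max pooling
print(pool(a, mode='average').shape)  # (2, 2) average pooling
```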

Summary

The defining feature of a CNN is the weight-sharing structure of its convolutions, which greatly reduces the parameter count, guarding against overfitting while also lowering the model's complexity.

A CNN implements local connectivity through convolution: the parameter count for an image depends only on the filter size. Each filter corresponds to one image feature, and the image obtained by filtering with one filter is a map of one kind of feature.

In other words, the number of trained weights depends only on the size and number of the filters. Note, however, that the number of hidden units does not drop: it depends only on the stride of the convolution. With stride 1, the number of hidden units matches the number of input pixels; with stride 5, one hidden unit is needed for every 5×5 block of pixels.
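
To put numbers on this (an illustrative back-of-the-envelope, reusing the 1000×1000×3 input and 1000 hidden units from the start of these notes; a conv layer with 1000 filters is a hypothetical comparison point):

```python
n_pixels = 1000 * 1000 * 3
fc_params = 1000 * n_pixels            # fully connected, 1000 hidden units: 3 billion
conv_params = (3 * 3 * 3 + 1) * 1000   # 1000 shared 3x3x3 filters: 28,000
print(fc_params, conv_params)
```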

To summarize again, the key points of a CNN are:

1. local connectivity

2. weight sharing

3. downsampling via pooling layers

Points 1 and 2 reduce the parameter count, which lowers training complexity and mitigates overfitting.

Weight sharing also gives the convolutional network tolerance to translation.


As the number of layers in the network increases, the extracted feature maps get smaller, while at the same time the number of channels increases.

Why use a CNN?

1. Fewer parameters

2. Parameter sharing & sparsity of connections

Parameter sharing: one convolutional layer can have several different filters; each filter maps to one new filtered image, and every pixel of that new image is produced by exactly the same filter.


Implementation

Zero padding


Benefits: it lets us use a conv layer without shrinking the height and width of the volumes, and it keeps the information at the border of the image.



```python
import numpy as np

def zero_pad(X, pad):
    """
    Pad with zeros all images of the dataset X. The padding is applied to the height and width of an image,
    as illustrated in Figure 1.

    Argument:
    X -- python numpy array of shape (m, n_H, n_W, n_C) representing a batch of m images
    pad -- integer, amount of padding around each image on vertical and horizontal dimensions

    Returns:
    X_pad -- padded image of shape (m, n_H + 2*pad, n_W + 2*pad, n_C)
    """

    ### START CODE HERE ### (≈ 1 line)
    X_pad = np.pad(X, ((0, 0), (pad, pad), (pad, pad), (0, 0)), 'constant', constant_values=0)
    ### END CODE HERE ###

    return X_pad
```
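
A quick shape check with random data (pad=2 grows each spatial side by 4):

```python
x = np.random.randn(4, 3, 3, 2)
x_pad = zero_pad(x, 2)
print(x.shape, x_pad.shape)   # (4, 3, 3, 2) (4, 7, 7, 2)
```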

Forward convolution

```python
def conv_single_step(a_slice_prev, W, b):
    """
    Apply one filter defined by parameters W on a single slice (a_slice_prev) of the output activation
    of the previous layer.

    Arguments:
    a_slice_prev -- slice of input data of shape (f, f, n_C_prev)
    W -- Weight parameters contained in a window - matrix of shape (f, f, n_C_prev)
    b -- Bias parameters contained in a window - matrix of shape (1, 1, 1)

    Returns:
    Z -- a scalar value, result of convolving the sliding window (W, b) on a slice x of the input data
    """

    ### START CODE HERE ### (≈ 2 lines of code)
    # Element-wise product between a_slice_prev and W
    s = np.multiply(a_slice_prev, W)
    # Sum over all entries of the volume s, then add the scalar bias once
    Z = np.sum(s) + float(b)
    ### END CODE HERE ###

    return Z
```
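
A quick check with random data (the result is a single scalar):

```python
a_slice_prev = np.random.randn(4, 4, 3)
W = np.random.randn(4, 4, 3)
b = np.random.randn(1, 1, 1)
print(conv_single_step(a_slice_prev, W, b))   # one scalar value
```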

Define a slice

Each slice of the padded input is located by its corners: vert_start:vert_end along the height axis and horiz_start:horiz_end along the width axis, as used in conv_forward below.

```python
def conv_forward(A_prev, W, b, hparameters):
    """
    Implements the forward propagation for a convolution function

    Arguments:
    A_prev -- output activations of the previous layer, numpy array of shape (m, n_H_prev, n_W_prev, n_C_prev)
    W -- Weights, numpy array of shape (f, f, n_C_prev, n_C)
    b -- Biases, numpy array of shape (1, 1, 1, n_C)
    hparameters -- python dictionary containing "stride" and "pad"

    Returns:
    Z -- conv output, numpy array of shape (m, n_H, n_W, n_C)
    cache -- cache of values needed for the conv_backward() function
    """

    ### START CODE HERE ###
    # Retrieve dimensions from A_prev's shape (≈1 line)
    (m, n_H_prev, n_W_prev, n_C_prev) = A_prev.shape

    # Retrieve dimensions from W's shape (≈1 line)
    (f, f, n_C_prev, n_C) = W.shape

    # Retrieve information from "hparameters" (≈2 lines)
    stride = hparameters['stride']
    pad = hparameters['pad']

    # Compute the dimensions of the CONV output volume using the formula given above. Hint: use int() to floor. (≈2 lines)
    n_H = int((n_H_prev - f + 2 * pad) / stride) + 1
    n_W = int((n_W_prev - f + 2 * pad) / stride) + 1

    # Initialize the output volume Z with zeros. (≈1 line)
    Z = np.zeros((m, n_H, n_W, n_C))

    # Create A_prev_pad by padding A_prev
    A_prev_pad = zero_pad(A_prev, pad)

    for i in range(m):                    # loop over the batch of training examples
        a_prev_pad = A_prev_pad[i]        # Select ith training example's padded activation
        for h in range(n_H):              # loop over vertical axis of the output volume
            for w in range(n_W):          # loop over horizontal axis of the output volume
                for c in range(n_C):      # loop over channels (= #filters) of the output volume
                    # Find the corners of the current "slice" (≈4 lines)
                    vert_start = h * stride
                    vert_end = vert_start + f
                    horiz_start = w * stride
                    horiz_end = horiz_start + f
                    # Use the corners to define the (3D) slice of a_prev_pad (See Hint above the cell). (≈1 line)
                    a_slice_prev = a_prev_pad[vert_start:vert_end, horiz_start:horiz_end, :]
                    # Convolve the (3D) slice with the correct filter W and bias b, to get back one output neuron. (≈1 line)
                    Z[i, h, w, c] = conv_single_step(a_slice_prev, W[..., c], b[..., c])

    ### END CODE HERE ###

    # Making sure your output shape is correct
    assert(Z.shape == (m, n_H, n_W, n_C))

    # Save information in "cache" for the backprop
    cache = (A_prev, W, b, hparameters)

    return Z, cache
```
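
A quick shape check with random data: with pad=2 and stride=2, each spatial dimension is floor((4 - 2 + 4) / 2) + 1 = 4:

```python
A_prev = np.random.randn(10, 4, 4, 3)
W = np.random.randn(2, 2, 3, 8)
b = np.random.randn(1, 1, 1, 8)
hparameters = {"pad": 2, "stride": 2}

Z, cache = conv_forward(A_prev, W, b, hparameters)
print(Z.shape)   # (10, 4, 4, 8)
```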

Corresponding notebook:

https://github.com/AlexanderChiuluvB/deep-learning-coursera/blob/master/Convolutional%20Neural%20Networks/Convolution%20model%20-%20Step%20by%20Step%20-%20v1.ipynb