Alex Chiu



CS230 - Image Classification with a Deep Neural Network

Posted on 2018-12-24

Data Preprocessing

Each image can be flattened into an m×1 vector.


# Reshape the training and test examples
train_x_flatten = train_x_orig.reshape(train_x_orig.shape[0], -1).T # The "-1" makes reshape flatten the remaining dimensions
test_x_flatten = test_x_orig.reshape(test_x_orig.shape[0], -1).T

# Standardize data to have feature values between 0 and 1.
train_x = train_x_flatten/255.
test_x = test_x_flatten/255.

print ("train_x's shape: " + str(train_x.shape))
print ("test_x's shape: " + str(test_x.shape))



def L_layer_model(X, Y, layers_dims, learning_rate=0.0075, num_iterations=3000, print_cost=False):  # lr was 0.009
    """
    Implements a L-layer neural network: [LINEAR->RELU]*(L-1)->LINEAR->SIGMOID.

    Arguments:
    X -- data, numpy array of shape (number of examples, num_px * num_px * 3)
    Y -- true "label" vector (containing 0 if cat, 1 if non-cat), of shape (1, number of examples)
    layers_dims -- list containing the input size and each layer size, of length (number of layers + 1).
    learning_rate -- learning rate of the gradient descent update rule
    num_iterations -- number of iterations of the optimization loop
    print_cost -- if True, it prints the cost every 100 steps

    Returns:
    parameters -- parameters learnt by the model. They can then be used to predict.
    """

    np.random.seed(1)
    costs = []  # keep track of cost

    # Parameters initialization.
    parameters = initialize_parameters_deep(layers_dims)

    # Loop (gradient descent)
    for i in range(0, num_iterations):

        # Forward propagation: [LINEAR -> RELU]*(L-1) -> LINEAR -> SIGMOID.
        AL, caches = L_model_forward(X, parameters)

        # Compute cost.
        cost = compute_cost(AL, Y)

        # Backward propagation.
        grads = L_model_backward(AL, Y, caches)

        # Update parameters.
        parameters = update_parameters(parameters, grads, learning_rate=learning_rate)

        # Print the cost every 100 iterations
        if print_cost and i % 100 == 0:
            print("Cost after iteration %i: %f" % (i, cost))
        if print_cost and i % 100 == 0:
            costs.append(cost)

    # plot the cost
    plt.plot(np.squeeze(costs))
    plt.ylabel('cost')
    plt.xlabel('iterations (per tens)')
    plt.title("Learning rate =" + str(learning_rate))
    plt.show()

    return parameters

CS230 - Building a Deep Neural Network

Posted on 2018-12-24

Parameter Initialization

When building a deep neural network, be very clear about the dimensions of W, X, and b at each layer.

The weight matrix W of layer i has shape (n^[i], n^[i-1]).

Here 12288 is the number of features, 209 is the number of examples, and n^[i] is the number of neurons in layer i.


The code is as follows:


def initialize_parameters_deep(layer_dims):
    """
    Arguments:
    layer_dims -- python array (list) containing the dimensions of each layer in our network

    Returns:
    parameters -- python dictionary containing your parameters "W1", "b1", ..., "WL", "bL":
                  Wl -- weight matrix of shape (layer_dims[l], layer_dims[l-1])
                  bl -- bias vector of shape (layer_dims[l], 1)
    """

    np.random.seed(3)
    parameters = {}
    L = len(layer_dims)  # number of layers in the network

    for l in range(1, L):
        parameters['W' + str(l)] = np.random.randn(layer_dims[l], layer_dims[l-1]) * 0.01
        parameters['b' + str(l)] = np.zeros((layer_dims[l], 1))

        assert(parameters['W' + str(l)].shape == (layer_dims[l], layer_dims[l-1]))
        assert(parameters['b' + str(l)].shape == (layer_dims[l], 1))

    return parameters

Forward Propagation

Implement the ReLU and sigmoid activation functions.

def sigmoid(Z):
    """
    Implements the sigmoid activation in numpy

    Arguments:
    Z -- numpy array of any shape

    Returns:
    A -- output of sigmoid(z), same shape as Z
    cache -- returns Z as well, useful during backpropagation
    """

    A = 1/(1+np.exp(-Z))
    cache = Z

    return A, cache

def relu(Z):
    """
    Implement the RELU function.

    Arguments:
    Z -- Output of the linear layer, of any shape

    Returns:
    A -- Post-activation parameter, of the same shape as Z
    cache -- returns Z as well; stored for computing the backward pass efficiently
    """

    A = np.maximum(0, Z)

    assert(A.shape == Z.shape)

    cache = Z
    return A, cache

Below is the linear_activation_forward function:


def linear_activation_forward(A_prev, W, b, activation):
    """
    Implement the forward propagation for the LINEAR->ACTIVATION layer

    Arguments:
    A_prev -- activations from previous layer (or input data): (size of previous layer, number of examples)
    W -- weights matrix: numpy array of shape (size of current layer, size of previous layer)
    b -- bias vector, numpy array of shape (size of the current layer, 1)
    activation -- the activation to be used in this layer, stored as a text string: "sigmoid" or "relu"

    Returns:
    A -- the output of the activation function, also called the post-activation value
    cache -- a python dictionary containing "linear_cache" and "activation_cache";
             stored for computing the backward pass efficiently
    """

    if activation == "sigmoid":
        # Inputs: "A_prev, W, b". Outputs: "A, activation_cache".
        Z = np.dot(W, A_prev) + b
        A, activation_cache = sigmoid(Z)

    elif activation == "relu":
        # Inputs: "A_prev, W, b". Outputs: "A, activation_cache".
        Z = np.dot(W, A_prev) + b
        A, activation_cache = relu(Z)

    # cache (A_prev, W, b) for the backward pass
    linear_cache = (A_prev, W, b)

    assert (A.shape == (W.shape[0], A_prev.shape[1]))
    cache = (linear_cache, activation_cache)

    return A, cache

Forward propagation for an L-layer network

The first L-1 layers apply ReLU, and the last layer applies sigmoid (since this is binary classification).


def L_model_forward(X, parameters):
    """
    Implement forward propagation for the [LINEAR->RELU]*(L-1)->LINEAR->SIGMOID computation

    Arguments:
    X -- data, numpy array of shape (input size, number of examples)
    parameters -- output of initialize_parameters_deep()

    Returns:
    AL -- last post-activation value
    caches -- list of caches containing:
              every cache of linear_relu_forward() (there are L-1 of them, indexed from 0 to L-2)
              the cache of linear_sigmoid_forward() (there is one, indexed L-1)
    """

    caches = []
    A = X
    L = len(parameters) // 2  # number of layers in the neural network

    # Implement [LINEAR -> RELU]*(L-1). Add "cache" to the "caches" list.
    for l in range(1, L):
        A_prev = A
        A, cache = linear_activation_forward(A_prev, parameters['W' + str(l)], parameters['b' + str(l)], activation='relu')
        caches.append(cache)

    # Implement LINEAR -> SIGMOID. Add "cache" to the "caches" list.
    AL, cache = linear_activation_forward(A, parameters['W' + str(L)], parameters['b' + str(L)], activation='sigmoid')
    caches.append(cache)

    assert(AL.shape == (1, X.shape[1]))

    return AL, caches

The AL obtained here is the network's output after passing through all L layers; it can be used to evaluate predictions, and all the intermediate results are stored in caches.

cost function

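The cross-entropy cost that compute_cost implements below is the standard one for binary labels; with m examples it is

$$J = -\frac{1}{m}\sum_{i=1}^{m}\Big[\,y^{(i)}\log a^{[L](i)} + (1-y^{(i)})\log\big(1-a^{[L](i)}\big)\Big]$$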


def compute_cost(AL, Y):
    """
    Implement the cost function defined by equation (7).

    Arguments:
    AL -- probability vector corresponding to your label predictions, shape (1, number of examples)
    Y -- true "label" vector (for example: containing 0 if non-cat, 1 if cat), shape (1, number of examples)

    Returns:
    cost -- cross-entropy cost
    """

    m = Y.shape[1]

    # Compute loss from aL and y.
    # Note: Y and np.log(AL) are arrays of the same shape, so use element-wise multiply rather than dot.
    cost = (-1./m) * np.sum(np.multiply(Y, np.log(AL)) + np.multiply(1-Y, np.log(1-AL)))

    cost = np.squeeze(cost)  # To make sure your cost's shape is what we expect (e.g. this turns [[17]] into 17).
    assert(cost.shape == ())

    return cost

backward propagation

We compute dL/dZ and use it to obtain dW, dA_prev, and db for gradient descent.

Assume dZ for layer l is known; during forward propagation we have already cached A_prev and W for that layer.
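Concretely, the activation part of the backward step that the two helpers below implement is

$$dZ^{[l]} = dA^{[l]} \odot g'\big(Z^{[l]}\big),\qquad g'(z)=\begin{cases}\mathbf{1}\{z>0\} & \text{for ReLU}\\ \sigma(z)\,\big(1-\sigma(z)\big) & \text{for sigmoid}\end{cases}$$

where $\odot$ denotes element-wise multiplication.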

Two helper functions:


def relu_backward(dA, cache):
    """
    Implement the backward propagation for a single RELU unit.

    Arguments:
    dA -- post-activation gradient, of any shape
    cache -- 'Z' where we store for computing backward propagation efficiently

    Returns:
    dZ -- Gradient of the cost with respect to Z
    """

    Z = cache
    dZ = np.array(dA, copy=True)  # just converting dz to a correct object.

    # When z <= 0, you should set dz to 0 as well.
    dZ[Z <= 0] = 0

    assert (dZ.shape == Z.shape)

    return dZ

def sigmoid_backward(dA, cache):
    """
    Implement the backward propagation for a single SIGMOID unit.

    Arguments:
    dA -- post-activation gradient, of any shape
    cache -- 'Z' where we store for computing backward propagation efficiently

    Returns:
    dZ -- Gradient of the cost with respect to Z
    """

    Z = cache

    s = 1/(1+np.exp(-Z))
    dZ = dA * s * (1-s)

    assert (dZ.shape == Z.shape)

    return dZ

linear backward

The gradient formulas for the linear part of layer l are given below.
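With m examples, these are the standard expressions, and they are exactly what the code below computes:

$$dW^{[l]} = \frac{1}{m}\, dZ^{[l]} A^{[l-1]T},\qquad db^{[l]} = \frac{1}{m}\sum_{i=1}^{m} dZ^{[l](i)},\qquad dA^{[l-1]} = W^{[l]T} dZ^{[l]}$$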

def linear_backward(dZ, cache):
    """
    Implement the linear portion of backward propagation for a single layer (layer l)

    Arguments:
    dZ -- Gradient of the cost with respect to the linear output (of current layer l)
    cache -- tuple of values (A_prev, W, b) coming from the forward propagation in the current layer

    Returns:
    dA_prev -- Gradient of the cost with respect to the activation (of the previous layer l-1), same shape as A_prev
    dW -- Gradient of the cost with respect to W (current layer l), same shape as W
    db -- Gradient of the cost with respect to b (current layer l), same shape as b
    """
    A_prev, W, b = cache
    m = A_prev.shape[1]

    dA_prev = np.dot(W.T, dZ)
    dW = (1/m) * np.dot(dZ, A_prev.T)
    db = (1/m) * np.sum(dZ, axis=1, keepdims=True)

    assert (dA_prev.shape == A_prev.shape)
    assert (dW.shape == W.shape)
    assert (db.shape == b.shape)

    return dA_prev, dW, db

linear_activation_backward

So how is the dZ used by the previous function computed?

We compute it with the two helper functions defined at the beginning.

def linear_activation_backward(dA, cache, activation):
    """
    Implement the backward propagation for the LINEAR->ACTIVATION layer.

    Arguments:
    dA -- post-activation gradient for current layer l
    cache -- tuple of values (linear_cache, activation_cache) we store for computing backward propagation efficiently
    activation -- the activation to be used in this layer, stored as a text string: "sigmoid" or "relu"

    Returns:
    dA_prev -- Gradient of the cost with respect to the activation (of the previous layer l-1), same shape as A_prev
    dW -- Gradient of the cost with respect to W (current layer l), same shape as W
    db -- Gradient of the cost with respect to b (current layer l), same shape as b
    """
    linear_cache, activation_cache = cache

    if activation == "relu":
        dZ = relu_backward(dA, activation_cache)
        dA_prev, dW, db = linear_backward(dZ, linear_cache)

    elif activation == "sigmoid":
        dZ = sigmoid_backward(dA, activation_cache)
        dA_prev, dW, db = linear_backward(dZ, linear_cache)

    return dA_prev, dW, db

Backward propagation for the L-layer network

The last layer goes through sigmoid_backward, and all remaining layers go through relu_backward; each is followed by linear_backward.

# GRADED FUNCTION: L_model_backward

def L_model_backward(AL, Y, caches):
    """
    Implement the backward propagation for the [LINEAR->RELU] * (L-1) -> LINEAR -> SIGMOID group

    Arguments:
    AL -- probability vector, output of the forward propagation (L_model_forward())
    Y -- true "label" vector (containing 0 if non-cat, 1 if cat)
    caches -- list of caches containing:
              every cache of linear_activation_forward() with "relu" (it's caches[l], for l in range(L-1) i.e l = 0...L-2)
              the cache of linear_activation_forward() with "sigmoid" (it's caches[L-1])

    Returns:
    grads -- A dictionary with the gradients
             grads["dA" + str(l)] = ...
             grads["dW" + str(l)] = ...
             grads["db" + str(l)] = ...
    """
    grads = {}
    L = len(caches)  # the number of layers
    m = AL.shape[1]
    Y = Y.reshape(AL.shape)  # after this line, Y is the same shape as AL

    # Initializing the backpropagation
    dAL = - (np.divide(Y, AL) - np.divide(1 - Y, 1 - AL))

    # Lth layer (SIGMOID -> LINEAR) gradients. Inputs: "AL, Y, caches". Outputs: grads["dAL"], grads["dWL"], grads["dbL"]
    current_cache = caches[-1]
    grads["dA" + str(L)], grads["dW" + str(L)], grads["db" + str(L)] = linear_activation_backward(dAL, current_cache, 'sigmoid')

    for l in reversed(range(L-1)):
        # lth layer: (RELU -> LINEAR) gradients.
        # Inputs: grads["dA" + str(l + 2)], caches. Outputs: grads["dA" + str(l + 1)], grads["dW" + str(l + 1)], grads["db" + str(l + 1)]
        current_cache = caches[l]
        current_A = grads["dA" + str(l + 2)]
        grads["dA" + str(l + 1)], grads["dW" + str(l + 1)], grads["db" + str(l + 1)] = linear_activation_backward(current_A, current_cache, 'relu')

    return grads

Update the parameters

def update_parameters(parameters, grads, learning_rate):
    """
    Update parameters using gradient descent

    Arguments:
    parameters -- python dictionary containing your parameters
    grads -- python dictionary containing your gradients, output of L_model_backward

    Returns:
    parameters -- python dictionary containing your updated parameters
                  parameters["W" + str(l)] = ...
                  parameters["b" + str(l)] = ...
    """

    L = len(parameters) // 2  # number of layers in the neural network

    # Update rule for each parameter. Use a for loop.
    for l in range(L):
        parameters["W" + str(l+1)] -= learning_rate * grads["dW" + str(l+1)]
        parameters["b" + str(l+1)] -= learning_rate * grads["db" + str(l+1)]

    return parameters

PRML - Linear Models

Posted on 2018-12-22

Linear Basis Function Models

Linear regression model: y(x, w) = w0 + w1*x1 + w2*x2 + … + wD*xD

To make the model more expressive, replace each xi with a nonlinear basis function.

This experiment uses polynomial, Gaussian, and sigmoid basis functions, plus a mixed form (polynomial + sine). There are two common choices of loss function, cross-entropy and mean squared error; mean squared error is used here because its derivative is easier to work with.

Derivation

Polynomial basis functions

y = b + w1*x + w2*x^2 + … + wn*x^n

λ2 is the L2 regularization coefficient, which helps with overfitting.

On the meaning of regularization: https://blog.csdn.net/jinping_shi/article/details/52433975

Derivation
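A quick sketch of the gradient of the mean-squared-error loss used in the code below, where X holds the basis-function values, $\hat Y = w^T X + b$, and m is the number of samples:

$$E = \frac{1}{2m}\sum_{i=1}^{m}(\hat y_i - y_i)^2,\qquad \frac{\partial E}{\partial w} = \frac{1}{m}\,X(\hat Y - Y)^T,\qquad \frac{\partial E}{\partial b} = \frac{1}{m}\sum_{i=1}^{m}(\hat y_i - y_i)$$

With L2 regularization, an extra $\lambda_2 w$ term enters the weight update.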

Code implementation


import numpy as np

def propogate(w, b, X, Y):
    """
    :param w: (n,1)
    :param b: scalar
    :param X: (n,m)
    :param Y: (1,m)
    :return: grads, a dict holding dw and db
             cost, the cost of the current iteration
    """
    # np.squeeze removes axes of length 1 from an array's shape
    m = X.shape[1]

    Yba = np.dot(w.T, X) + b

    cost = np.squeeze(0.5/m * np.dot((Yba-Y), (Yba-Y).T))

    dw = (1./m) * np.dot(X, (Yba-Y).T)

    db = (1./m) * np.squeeze(np.sum(Yba-Y))
    # save the gradients in the grads dict
    grads = {"dw": dw, "db": db}

    return grads, cost

def optimize(w, b, X, Y, epochs, learning_rate, l2):
    """
    :param w: (n,1)
    :param b: scalar
    :param X: (n,m)
    :param Y: (1,m)
    :param epochs: number of iterations
    :param learning_rate:
    :param l2: regularization coefficient
    :return: params holds the final parameters, used for testing
             grads holds the final gradients
             costs holds the cost history
    """

    costs = []

    for i in range(epochs):
        grads, cost = propogate(w, b, X, Y)
        dw = grads["dw"]
        db = grads["db"]

        w -= learning_rate*dw + l2*w
        b -= learning_rate*db

        costs.append(cost)

    params = {"w": w, "b": b}
    grads = {"dw": dw, "db": db}

    return params, grads, costs

def main(x_train, y_train, n, epoches, learning_rate, l2):
    """Train the model and return the mapping from x to y."""

    # train a linear regression model; compute the optimal parameters from the training set
    m = x_train.shape[0]
    w = np.random.randn(n, 1)
    X = np.zeros((n, m), dtype=np.float64)
    b = 0.0
    for i in range(n):
        X[i, :] = x_train**(i+1)
    Y = np.float64(np.reshape(y_train, newshape=(1, m)))
    params, grads, costs = optimize(w, b, X, Y, epoches, learning_rate, l2)
    w = params['w']
    b = params['b']

    def f(x):
        m = x.shape[0]

        X = np.zeros((n, m))
        for i in range(n):
            X[i, :] = x**(i+1)
        y = b + np.dot(w.T, X)
        return y.squeeze()

    return f

Result

Parameters: n = 2, epochs = 100000, learning rate = 1e-7

Gaussian basis functions

Derivation

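For reference, the Gaussian basis and the partial derivatives that the backward pass in the code chains together (basis j, center $\mu_j$, width $s_j$):

$$z_j = \frac{x-\mu_j}{s_j},\qquad \phi_j(x) = e^{-z_j^2/2},\qquad \hat y = w^T\phi(x)+b$$

$$\frac{\partial \phi_j}{\partial z_j} = -z_j\,\phi_j,\qquad \frac{\partial z_j}{\partial \mu_j} = -\frac{1}{s_j},\qquad \frac{\partial z_j}{\partial s_j} = -\frac{x-\mu_j}{s_j^2}$$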

Code

import numpy as np

def propogate(w, b, X, Y, mu, s):

    m = X.shape[1]
    Z = (X-mu)/s
    A = np.exp(-(Z*Z)/2)
    Yba = np.dot(w.T, A) + b
    cost = np.squeeze(0.5/m * np.dot((Yba-Y), (Yba-Y).T))

    dY = 1./m*(Yba-Y)
    dw = np.dot(A, dY.T)
    db = np.squeeze(np.sum(dY))
    dA = w*dY
    dZ = dA*A*(-Z)
    dmu = 1./m*np.sum(dZ*(-1/s), axis=1, keepdims=True)
    ds = 1./m*np.sum(dZ*(-(X-mu)/(s*s)), axis=1, keepdims=True)

    grads = {"dw": dw, "db": db, "dmu": dmu, "ds": ds}

    return grads, cost


def optimize(w, b, X, Y, mu, s, epochs, learning_rate, l2):

    costs = []

    for i in range(epochs):
        grads, cost = propogate(w, b, X, Y, mu, s)
        dw = grads["dw"]
        db = grads["db"]
        dmu = grads["dmu"]
        ds = grads["ds"]

        w -= learning_rate*dw + l2*w
        b -= learning_rate*db + l2*b
        mu -= learning_rate*dmu + l2*mu
        s -= learning_rate*ds + l2*s

        costs.append(cost)

    params = {"w": w, "b": b, "mu": mu, "s": s}

    return params, costs


def main(x_train, y_train, n, epoches, learning_rate, l2):
    """Train the model and return the mapping from x to y."""

    # compute the optimal parameters from the training set
    m = x_train.shape[0]
    w = np.random.randn(n, 1)
    # means of the Gaussians; spread the initial centers out (the spacing of 40 is an ad-hoc choice)
    mu = np.random.randn(n, 1)
    for i in range(n):
        mu[i, 0] += i*40
    X = np.zeros((n, m), dtype=np.float64)
    b = 0.0
    for i in range(n):
        X[i, :] = x_train
    Y = np.float64(np.reshape(y_train, newshape=(1, m)))

    # initialize s; if s is too close to 0 things go wrong, so resample until all |s| >= 1
    while True:
        flag = True
        s = np.random.randn(n, 1)
        jump = np.int32(np.abs(s) < 1)
        if np.sum(jump) >= 1:
            flag = False
        if flag is True:
            break

    params, costs = optimize(w, b, X, Y, mu, s, epoches, learning_rate, l2)
    w = params['w']
    b = params['b']
    mu = params['mu']
    s = params['s']

    def f(x):
        m = x.shape[0]

        X = np.zeros((n, m))
        for i in range(n):
            X[i, :] = x
        Z = (X-mu)/s
        A = np.exp(-(Z*Z)/2)
        y = b + np.dot(w.T, A)
        return y.squeeze()

    return f

Parameters: f = gaussian.main(x_train, y_train, 3, 500000, 1e-2, 0)

Result

Sigmoid basis functions

Derivation

The derivation is similar to the Gaussian case.

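For reference, the sigmoid basis and the derivative used in the backward pass of the code below:

$$z_j = \frac{x-\mu_j}{s_j},\qquad \phi_j(x)=\sigma(z_j)=\frac{1}{1+e^{-z_j}},\qquad \frac{\partial \phi_j}{\partial z_j}=\sigma(z_j)\big(1-\sigma(z_j)\big)$$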

Code

import numpy as np

def sigmoidFunction(z):
    return 1./(1+np.exp(-z))


def propogate(w, b, X, Y, mu, s):

    m = X.shape[1]
    Z = (X-mu)/s
    A = sigmoidFunction(Z)
    Yba = np.dot(w.T, A) + b
    cost = np.squeeze(0.5/m * np.dot((Yba-Y), (Yba-Y).T))

    dY = 1./m*(Yba-Y)
    dw = np.dot(A, dY.T)
    db = np.squeeze(np.sum(dY))
    dA = w*dY
    dZ = dA*A*(1-A)
    dmu = 1./m*np.sum(dZ*(-1./s), axis=1, keepdims=True)
    ds = 1./m*np.sum(dZ*(-(X-mu)/(s*s)), axis=1, keepdims=True)

    grads = {"dw": dw, "db": db, "dmu": dmu, "ds": ds}

    return grads, cost


def optimize(w, b, X, Y, mu, s, epochs, learning_rate, l2):

    costs = []

    for i in range(epochs):
        grads, cost = propogate(w, b, X, Y, mu, s)
        dw = grads["dw"]
        db = grads["db"]
        dmu = grads["dmu"]
        ds = grads["ds"]

        w -= learning_rate*dw + l2*w
        b -= learning_rate*db + l2*b
        mu -= learning_rate*dmu + l2*mu
        s -= learning_rate*ds + l2*s

        costs.append(cost)

    params = {"w": w, "b": b, "mu": mu, "s": s}

    return params, costs


def main(x_train, y_train, n, epoches, learning_rate, l2):
    """Train the model and return the mapping from x to y."""

    # compute the optimal parameters from the training set
    m = x_train.shape[0]
    w = np.random.randn(n, 1)
    # mu is the center of each sigmoid; spread the initial centers out
    mu = np.random.randn(n, 1)
    for i in range(n):
        mu[i, 0] += i*100/n
    X = np.zeros((n, m), dtype=np.float64)
    b = 0.0
    for i in range(n):
        X[i, :] = x_train
    Y = np.float64(np.reshape(y_train, newshape=(1, m)))

    # initialize s; if s is too close to 0 things go wrong, so resample until all |s| >= 1
    while True:
        flag = True
        s = np.random.randn(n, 1)
        jump = np.int32(np.abs(s) < 1)
        if np.sum(jump) >= 1:
            flag = False
        if flag is True:
            break

    params, costs = optimize(w, b, X, Y, mu, s, epoches, learning_rate, l2)
    w = params['w']
    b = params['b']
    mu = params['mu']
    s = params['s']

    def f(x):
        m = x.shape[0]

        X = np.zeros((n, m))
        for i in range(n):
            X[i, :] = x
        Z = (X-mu)/s
        A = sigmoidFunction(Z)
        y = b + np.dot(w.T, A)
        return y.squeeze()

    return f

Parameters: f = sigmoid.main(x_train, y_train, 10, 100000, 1e-2, 0)

Result

References

https://github.com/Haicang/PRML/blob/master/lab1/Report.ipynb

https://blog.csdn.net/pipisorry/article/details/73770637

Producer-Consumer Model & Synchronous vs. Asynchronous

Posted on 2018-12-18

Ref:

https://blog.csdn.net/fuzhongmin05/article/details/54616344

code:

The race cannot be avoided

Access to count is not restricted.

A race condition can occur here because access to count is unrestricted. The following can happen: the buffer is empty and the consumer has just read count and found it to be 0; at that moment the scheduler decides to pause the consumer and run the producer. The producer inserts an item into the buffer and increments count. Now count is 1, and because the producer saw count as 0 a moment ago, it assumes the consumer must be asleep and calls wakeup to wake it.

But the consumer is not logically asleep, so the wakeup signal is lost. The next time the consumer runs, it tests the count value it read earlier, finds it to be 0, and goes to sleep. Sooner or later the producer fills the entire buffer and also goes to sleep, and both processes sleep forever.

Introducing semaphore operations

Dijkstra proposed two operations: down and up (generalizations of sleep and wakeup). A down on a semaphore checks whether its value is greater than 0. If so, it decrements the value (using up one stored wakeup) and continues. If the value is 0, the process goes to sleep and the down operation is not yet complete. Checking the value, changing it, and possibly going to sleep are all done as a single, indivisible atomic action: once a semaphore operation starts, no other process may access that semaphore until the operation completes or blocks. This atomicity is absolutely essential for solving synchronization problems and avoiding race conditions. An atomic operation is a group of related operations that either all execute without interruption or do not execute at all.

The up operation increments the semaphore's value. If one or more processes are sleeping on the semaphore, unable to complete an earlier down, the system picks one of them (for example at random) and lets it finish its down. So after an up on a semaphore that has processes sleeping on it, the value is still 0, but there is one fewer process sleeping on it. Incrementing the value and waking one process is likewise indivisible. No process ever blocks performing an up, just as no process blocks performing a wakeup in the earlier model.

In Dijkstra's original paper he used the names P and V instead of down and up: in Dutch, Proberen means "to try" and Verhogen means "to increase" or "to raise".

Physically, the P and V operations mean the following. P(S) requests a resource: S.value > 0 means resources are available and the value is their count; S.value = 0 means none are available; if S.value < 0, then |S.value| is the number of processes waiting in S's queue. V(S) releases a resource; a semaphore's initial value should be at least 0. P is like "waiting for a signal" and V is like "sending a signal": when used for synchronization, V signals that a partner has finished some task; when used for mutual exclusion, V signals that the critical resource is available again. In effect, for mutual exclusion P and V correspond to acquiring and releasing the resource.
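As a concrete reference, the P and V operations described above map directly onto POSIX semaphores; a minimal sketch in the spirit of the csapp.c wrappers (error checking omitted):

#include <semaphore.h>

/* P ("down"): decrement the semaphore, blocking while its value is 0 */
void P(sem_t *s) { sem_wait(s); }

/* V ("up"): increment the semaphore and wake one thread blocked in P, if any */
void V(sem_t *s) { sem_post(s); }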

The solution uses three semaphores: one called full, counting the filled buffer slots; one called empty, counting the empty slots; and one called mutex, ensuring that the producer and the consumer never access the buffer at the same time. full starts at 0, empty starts at the number of slots in the buffer, and mutex starts at 1. A semaphore shared by two or more processes with an initial value of 1, guaranteeing that only one process can be inside the critical section at a time, is called a binary semaphore. If every process performs a down before entering its critical section and an up just after leaving it, mutual exclusion is achieved.

In the example below the semaphores are actually used in two different ways, and the difference matters. The semaphore mutex is used for mutual exclusion: it guarantees that at any moment only one process is reading or writing the buffer and its associated variables. Mutual exclusion is necessary to avoid chaos.

Example:

items is the full semaphore, counting filled slots; its initial value is 0.

slots is the empty semaphore, counting the slots in the buffer; its initial value is 10.

mutex has initial value 1.

Then the insert function:


P(slots);   // one fewer empty slot
P(mutex);   // enter the critical section
insert_item;
V(mutex);   // leave the critical section
V(items);   // one more full slot

remove function

P(items);   // one fewer full slot
P(mutex);
remove_item;
V(mutex);
V(slots);   // one more empty slot

In this example, the semaphores guarantee that the producer stops running when the buffer is full and the consumer stops running when the buffer is empty.

Semaphores for synchronization

Synchronous

A process issues a request and, if the reply takes some time, keeps waiting; it does not continue until it receives the reply.

Asynchronous

The process does not keep waiting; it carries on with the operations that follow, regardless of the state of other processes.

For the unbounded-buffer problem, the consumer only needs to care about whether the buffer is empty.


Zombie Processes and Orphan Processes

Posted on 2018-12-18

In Unix/Linux, under normal circumstances child processes are created by a parent process, and a child can in turn create new processes. A child's termination and the parent's execution are asynchronous: the parent can never predict when the child will exit. When a process finishes its work and terminates, its parent needs to call the wait() or waitpid() system call to collect the child's termination status.

Orphan process: if a parent exits while one or more of its children are still running, those children become orphan processes. Orphans are adopted by the init process (PID 1), which collects their exit status on their behalf.

Zombie process: if a process creates a child with fork and the child exits, but the parent does not call wait or waitpid to collect its status, the child's process descriptor remains in the system. Such a process is called a zombie.

Problems and harms

Unix provides a mechanism that guarantees a parent can obtain a child's termination status whenever it wants it. When a process exits, the kernel releases all of its resources, including open files and memory, but it still keeps a certain amount of information (the process ID, the termination status of the process, the amount of CPU time taken by the process, and so on) until the parent collects it via wait / waitpid. This creates a problem: if the parent never calls wait / waitpid, that information is never released and the process ID stays occupied. The number of process IDs a system can use is finite, so if zombies are created in large numbers the system may run out of process IDs and be unable to create new processes. That is the harm of zombie processes, and it should be avoided.

An orphan process has no parent, so the job falls to init, which acts like a civil affairs bureau dedicated to cleaning up after orphans. Whenever an orphan appears, the kernel sets its parent to init, and init loops calling wait() on its exited children. So when an orphan finally ends its life, init steps in to handle all the aftermath; orphan processes therefore do no real harm.

Any child process (except init) does not disappear immediately after exit(); it leaves behind a data structure called a zombie, waiting for the parent to handle it. Every child goes through this stage at the end of its life. If the child has exited and the parent has not yet dealt with it, ps will show the child's state as "Z". If the parent handles it promptly, ps may not catch the zombie state, but that does not mean the child skipped it. If the parent exits before the child finishes, init takes over the child and, acting as its parent, handles it when it reaches the zombie state.
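A minimal sketch of the reaping described above: the parent calls waitpid() so the child's exit status is collected and no zombie is left behind (the printed messages are only for illustration).

#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    pid_t pid = fork();
    if (pid == 0) {                 /* child: do some work, then exit */
        printf("child %d exiting\n", (int)getpid());
        exit(7);
    }
    int status;
    waitpid(pid, &status, 0);       /* parent collects the child's status here */
    if (WIFEXITED(status))
        printf("reaped child %d, exit status %d\n", (int)pid, WEXITSTATUS(status));
    return 0;
}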

Answer: b.

CSAPP - Exercises

Posted on 2018-12-17

C. A read-write lock can be held either by multiple readers or by a single writer at a time; reads and writes are mutually exclusive and cannot coexist.

The page offset takes about 10 bits, so a virtual address splits into a virtual page number and a virtual page offset: the low 10 bits are the offset and the remaining bits index the page table. The two addresses therefore have the same index and map to the same physical page number, while the physical offset of A is always smaller than that of B.


*(x+2) = A[2];

b

d

The size of a pointer is 8 bytes on any 64-bit C/C++ compiler, but the size of an int is not necessarily 8 bytes.

Static duration variables

int a = 3;         // static variable, external linkage
static int b = 4;  // static variable, internal linkage

void fun()
{
    static int c = 5;  // static variable, no linkage
}

External linkage: shared across multiple files.

Internal linkage: usable only in the file that defines it.

No linkage: usable only inside its own block.

Static variables with external linkage

int a = 2;

void fun()
{
    int a = 3;  // this automatic variable hides the global static variable a
}

Static variables with internal linkage

Static variables with no linkage

CSAPP 6.34 / 6.35

The cache has two blocks, one in each of its two sets; call them b1 and b2. Each block holds 4 int values, i.e., one row of the array. In this problem the source and destination arrays are adjacent in memory, so memory maps onto the cache like this:

b1 : src[0][] src[2][] dst[0][] dst[2][]

b2 : src[1][] src[3][] dst[1][] dst[3][]

A correction: the dst array should be all misses (m), and the src array is as shown on the left.

a

An object that is “8 bytes aligned” is stored at a memory address that is a multiple of 8.

So the answer is c.

Understanding blocks, sets, and lines in a cache

A line is the same thing as a block.

Direct-mapped: each set holds exactly one line.

2-way: each set can hold 2 lines.

4-way: each set can hold 4 lines.

Fully associative: there is only one set, so no middle index bits are needed to select a set.

Why use the middle bits as the index?

Spatial locality. If the high-order bits were used as the index, blocks that are contiguous in memory would map to the same set, so a sequential scan would keep evicting itself; using the middle bits spreads adjacent blocks across different sets.

What exactly does fork copy?

"The child is a copy of the parent. For example, the child gets copies of the parent's data space, heap, and stack. Note that these copies belong to the child; the parent and the child do not share these portions of memory. The parent and the child do share the text segment."

The child gets only a copy of the global variable, so the answer is b.

CSAPP-proxy lab

Posted on 2018-12-15

Implementing a sequential web proxy

First, be clear about the difference between a listening socket and a connected socket.

A socket pair identifies one connection between a client and a server.

The client is the active entity that initiates the connection request, and the kernel assumes a socket created by the socket function is an active socket. The server calls the listen function to tell the kernel that the socket will be used by a server rather than a client; in other words, listen converts an active socket into a listening socket.

The server waits in accept for a connection request from a client to arrive on the listening socket, and accept returns a connected socket; this connfd can then be used to communicate with the client.

The lab proceeds as follows:

Some definitions

/* You won't lose style points for including this long line in your code */
static const char *user_agent_hdr = "User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:10.0.3) Gecko/20120305 Firefox/10.0.3\r\n";
static const char *conn_hdr = "Connection: close\r\n";
static const char *prox_hdr = "Proxy-Connection: close\r\n";
static const char *hostFormat = "Host: %s\r\n";
static const char *requestHeaderFormat = "GET %s HTTP/1.0\r\n";
static const char *endof_hdr = "\r\n";
static const char *connection_key = "Connection";
static const char *user_agent_key= "User-Agent";
static const char *proxy_connection_key = "Proxy-Connection";
static const char *hostKey = "Host";

void doit(int fd);
void parse_uri(char *uri,char *hostname,char *path,int *port);
void buildHTTPHeader(char *http_header,char *hostname,char *path,int port,rio_t *client_rio);
int connectEndServer(char *hostname,int port,char *httpHeader);
void *thread(void *vargp);
void initCache();
int reader(int fd,char *uri);
void writer(char *uri,char *buf);
//reference: https://zhuanlan.zhihu.com/p/37902495

typedef struct {
    char *buf;
    char *uri;
} cacheLine;

typedef struct {
    cacheLine *objects;
    int count;
} Cache;

Cache cache;
int readCount;
sem_t mutex, wmutex;
// used in multithreaded code to keep two threads from reading or writing the same shared resource (e.g., a global variable) at the same time

The main function follows the textbook's Tiny server. Note that connfd is passed to pthread_create by value rather than by reference, to avoid a race (passing by value gives the thread its own independent copy).

int main(int argc, char **argv)
{
    int listenfd, connfd;
    socklen_t clientlen;
    char hostname[MAXLINE];
    char port[MAXLINE];
    struct sockaddr_storage clientaddr;
    pthread_t tid;

    if (argc != 2) {
        fprintf(stderr, "usage %s <port>\n", argv[0]);
        exit(1);
    }
    initCache();
    // ignore the SIGPIPE signal
    signal(SIGPIPE, SIG_IGN);
    // turn the port argument into a listening descriptor
    listenfd = Open_listenfd(argv[1]);
    while (1) {
        clientlen = sizeof(clientaddr);
        connfd = Accept(listenfd, (SA *)&clientaddr, &clientlen);
        // ip -> host name
        Getnameinfo((SA *)&clientaddr, clientlen, hostname, MAXLINE, port, MAXLINE, 0);
        printf("Accepted connection from (%s %s)\n", hostname, port);
        // pass the value of connfd to the thread routine to avoid a race
        Pthread_create(&tid, NULL, thread, (void *)connfd);
    }
    return 0;
}

Multithreaded concurrency

void *thread(void *vargp){
    int connfd = (int)vargp;
    // detach the thread so it reclaims its own resources when it finishes, avoiding a leak
    Pthread_detach(pthread_self());
    doit(connfd);
    Close(connfd);
    return NULL;
}

doit

Function logic:

1. Parse the request line and request headers.

2. Connect to the corresponding server and send the request.

3. After the connection is established, the response arrives on the descriptor endServerFd.

4. Read the response from endServerFd and write it directly to the client's descriptor fd.



// Process the client's HTTP request: first read the request line,
// e.g. GET http://www.zhihu.com HTTP/1.1
// then parse the request URL to get the hostname and port of the server to contact,
// rewrite the HTTP headers so the proxy can act as a client and forward the request
// to the right server, then receive the server's reply and forward it to the right client.
void doit(int connfd){

    char buf[MAXLINE], uri[MAXLINE], method[MAXLINE], version[MAXLINE];
    char endServerHTTP[MAXLINE];
    char hostname[MAXLINE], path[MAXLINE];
    char objectBUF[MAX_OBJECT_SIZE];
    int port, endServerFd;

    rio_t rio, serverRio;

    Rio_readinitb(&rio, connfd);
    Rio_readlineb(&rio, buf, MAXLINE);
    // read e.g. "GET http://www.zhihu.com HTTP/1.1"
    sscanf(buf, "%s %s %s", method, uri, version);

    if (strcasecmp(method, "GET")) {
        printf("Proxy does not implement the method");
        return;
    }

    // parse the uri and save the hostname, path and port number into the arguments
    parse_uri(uri, hostname, path, &port);

    // build the http header which will be sent to the end server
    buildHTTPHeader(endServerHTTP, hostname, path, port, &rio);

    strcpy(uri, hostname);
    strcpy(uri + strlen(uri), path);
    if (reader(connfd, uri)) {
        fprintf(stdout, "%s from cache\n", uri);
        fflush(stdout);
        return;
    }

    int totalSize = 0;
    // connect to the end server
    endServerFd = connectEndServer(hostname, port);
    if (endServerFd < 0) {
        printf("connection failed");
        return;
    }
    Rio_readinitb(&serverRio, endServerFd);
    Rio_writen(endServerFd, endServerHTTP, strlen(endServerHTTP));

    // receive the message from the end server and send it to the client
    size_t n;
    while ((n = Rio_readlineb(&serverRio, buf, MAXLINE))) {
        printf("proxy received %ld bytes, then send.\n", n);
        Rio_writen(connfd, buf, n);
        strcpy(objectBUF + totalSize, buf);
        totalSize += n;
    }
    // objectBUF now holds the whole response for this request
    if (totalSize < MAX_OBJECT_SIZE)
        writer(uri, objectBUF);
    Close(endServerFd);
}

Build the HTTP request that is sent to the end server

void buildHTTPHeader(char *http_header, char *hostname, char *path, int port, rio_t *client_rio){

    char buf[MAXLINE], requestHeader[MAXLINE];
    char otherHeader[MAXLINE] = "", hostHeader[MAXLINE] = "";

    // request line
    // static const char *requestHeaderFormat = "GET %s HTTP/1.0\r\n";
    // write the path into requestHeader using the format above
    sprintf(requestHeader, requestHeaderFormat, path);
    while (Rio_readlineb(client_rio, buf, MAXLINE) > 0) {

        if (!strcmp(buf, endof_hdr)) {
            break;  // end of headers
        }
        // Host header
        if (!strncasecmp(buf, hostKey, strlen(hostKey))) {
            strcpy(hostHeader, buf);
            continue;
        }
        // keep any header that is not Connection, Proxy-Connection or User-Agent
        if (strncasecmp(buf, connection_key, strlen(connection_key)) &&
            strncasecmp(buf, proxy_connection_key, strlen(proxy_connection_key)) &&
            strncasecmp(buf, user_agent_key, strlen(user_agent_key))) {
            // append it to otherHeader
            strcat(otherHeader, buf);
        }
    }
    if (strlen(hostHeader) == 0) {
        sprintf(hostHeader, hostFormat, hostname);  // fall back to the parsed hostname
    }
    // static const char *user_agent_hdr = "User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:10.0.3) Gecko/20120305 Firefox/10.0.3\r\n";
    // static const char *conn_hdr = "Connection: close\r\n";
    // static const char *prox_hdr = "Proxy-Connection: close\r\n";
    // static const char *endof_hdr = "\r\n";

    // put all headers into http_header
    sprintf(http_header, "%s%s%s%s%s%s%s", requestHeader, hostHeader,
            conn_hdr, prox_hdr, user_agent_hdr, otherHeader, endof_hdr);
    return;
}

parse_uri


// strstr(str1, str2) checks whether str2 is a substring of str1:
// if it is, it returns the address of the first occurrence of str2 in str1; otherwise it returns NULL.
void parse_uri(char *uri, char *hostname, char *path, int *port){

    *port = 80;
    char *pos1 = strstr(uri, "//");

    if (pos1) {
        pos1 = pos1 + 2;
    } else {
        pos1 = uri;
    }

    char *pos2 = strstr(pos1, ":");
    // case: the uri contains port info
    if (pos2) {
        // e.g. http://www.example.com:123/forum/questions/?tag=networking
        //      scheme://host:port/path?query
        *pos2 = '\0';               // cut the string at ':' so hostname ends there
        sscanf(pos1, "%s", hostname);
        sscanf(pos2 + 1, "%d%s", port, path);
    }
    else {
        // e.g. http://www.example.com/index.html (no port given)
        pos2 = strstr(pos1, "/");
        if (pos2) {
            *pos2 = '\0';           // terminate the hostname before reading it
            sscanf(pos1, "%s", hostname);
            *pos2 = '/';            // restore the '/' so the path can be read
            sscanf(pos2, "%s", path);
        }
        // only hostname info
        else {
            sscanf(pos1, "%s", hostname);
        }
    }
    return;
}

Connect to the end server

inline int connectEndServer(char *hostname, int port){

    char portStr[100];
    sprintf(portStr, "%d", port);
    return Open_clientfd(hostname, portStr);
}

cache

The readers-writers model is used, so multiple threads can read at the same time.

LRU is not implemented; the 1 MiB of cache memory is simply split into ten blocks. Each time a request is accepted and parsed, the proxy first checks the cache for a matching web object; if it is there it is returned to the client directly, otherwise the object is fetched from the server.



int reader(int fd, char *uri){

    // here uri = the server's hostname + path
    int Found = 0;
    P(&mutex);
    readCount++;
    if (readCount == 1) {
        P(&wmutex);
    }
    V(&mutex);

    for (int i = 0; i < 10; i++) {
        if (!strcmp(cache.objects[i].uri, uri)) {
            Rio_writen(fd, cache.objects[i].buf, MAX_OBJECT_SIZE);
            Found = 1;
            break;
        }
    }

    P(&mutex);
    readCount--;
    if (readCount == 0) {
        V(&wmutex);
    }
    V(&mutex);
    return Found;
}

void writer(char *uri, char *buf){

    P(&wmutex);
    strcpy(cache.objects[cache.count].uri, uri);
    strcpy(cache.objects[cache.count].buf, buf);
    ++cache.count;
    V(&wmutex);
}

// A simple cache with no LRU: the memory is just split into 10 blocks, and each lookup
// loops over the blocks to see whether the requested uri is already cached.
// One benefit of putting a proxy between server and client is exactly this caching:
// the same resource is often requested many times, and fetching it from the server every
// time makes the server do needless work. With a cache in the proxy, recently requested
// data is stored locally and does not have to be fetched from the server again, which
// improves efficiency.
void initCache(){

    sem_init(&mutex, 0, 1);
    sem_init(&wmutex, 0, 1);
    cache.objects = (cacheLine *)malloc(sizeof(cacheLine) * 10);
    cache.count = 0;
    readCount = 0;
    for (int i = 0; i < 10; i++) {
        cache.objects[i].uri = malloc(sizeof(char) * MAXLINE);          // uri = hostname + path
        cache.objects[i].buf = malloc(sizeof(char) * MAX_OBJECT_SIZE);  // buf holds the cached web object
    }
}

Test

  • use ./free-port.sh to get a free port, like 4501

  • open a terminal, nc -l 4501

    • this starts netcat as a server listening on the port you got
  • open a terminal

curl -v --proxy http://localhost:23885/ http://localhost:4501/
  • open a terminal

./proxy 23885

netcat listens on 4501 and the proxy listens on 23885; here netcat acts as the server. Type something in the 'nc -l' window and you will see exactly the same text printed in the 'curl' window.

Ref

https://blog.csdn.net/u012336567/article/details/52056089

https://zhuanlan.zhihu.com/p/37902495

CSAPP-Synchronization

Posted on 2018-12-15

Modern operating systems provide three ways to build concurrent programs:

1. Processes

2. I/O multiplexing

3. Threads

Process-based concurrent programming

1. The kernel automatically manages multiple logical flows.

2. Each process has a private address space (data must be saved and restored on a process switch).

void sigchld_handler(int sig){
    // Reap all zombie children
    while (waitpid(-1, 0, WNOHANG) > 0)
        ;
    return;
}

int main(int argc, char **argv) {
    int listenfd, connfd;
    socklen_t clientlen;
    struct sockaddr_storage clientaddr;

    Signal(SIGCHLD, sigchld_handler);
    listenfd = Open_listenfd(argv[1]);
    while (1) {
        clientlen = sizeof(struct sockaddr_storage);
        connfd = Accept(listenfd, (SA *) &clientaddr, &clientlen);
        if (Fork() == 0) {
            Close(listenfd);  // Child closes its listening socket
            echo(connfd);     // Child services client
            Close(connfd);    // Child closes connection with client
            exit(0);          // Child exits
        }
        Close(connfd);        // Parent closes connected socket (important!)
    }
}

The server waits for connection requests in accept; the client then issues a connection request by calling connect; finally accept returns connfd in the server, which forks a child process to handle the client. The connection is established between listenfd and connfd.

  • Each client is handled by its own child process, and zombie children must be reaped to avoid leaks.

  • Different processes share no data.

  • Both parent and child hold listenfd and connfd, so the parent must close connfd and the child must close listenfd.

    • The kernel keeps a reference count for each socket (refcnt(connfd) = 2), so the parent has to close connfd for the count to drop to 0 after the child finishes.

Advantages: only the open file table is shared; neither descriptors nor global variables are shared, so synchronization problems are unlikely.

Disadvantages: extra process-management overhead, and processes must use IPC to communicate.

Event-based: I/O multiplexing

1. The programmer manually controls multiple logical flows.

2. All logical flows share the same address space.

Thread-based

The kernel automatically manages multiple logical flows.

All threads share the same address space.

A hybrid of the process-based and event-based approaches.

The traditional view of a process

Threads

A process can have multiple threads. Each thread has its own thread id, its own logical control flow, and its own stack for local variables (which other threads can nevertheless modify). All threads share the code, the data, and the kernel context.

Multiple threads

Conceptual view

Actual view

In practice, data belonging to different threads can be accessed by one another.

Shared variable

a variable x is shared if and only if multiple threads reference some instance of x

global & local variable

Example

ptr is a global variable, shared by the main thread and threads 1 & 2.

i is only referenced by main, so it is not shared.

msgs is referenced by all 3 threads, so it is shared.

myid is referenced only by threads 1 & 2, each referencing its own instance.

static int cnt is referenced by both threads 1 & 2.

So if multiple threads reference the same instance of x, then x is shared.

Bad example

Look at the assembly code for the loop.

Correct result

In the figure below, thread 2 loads cnt too early, so its %rdx is 0; the correct behavior is for thread 2 to load only after thread 1 has stored.

volatile

volatile originally means "liable to change". Because registers are much faster to access than memory, compilers generally optimize code to reduce memory accesses, which can lead to reading stale data. When a variable is declared volatile, the system always re-reads it from memory, even if the previous instruction just read it from the same location. More precisely, the compiler no longer optimizes accesses to a variable declared with this keyword, which gives stable access to special addresses; without volatile, the compiler will optimize such accesses. In short, volatile affects how the compiler compiles the code: a volatile variable may change at any time, so computations involving it must not be optimized away, otherwise the result can be wrong.
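A minimal sketch of the typical use: a flag written by a signal handler and read in a loop; volatile forces the loop to re-read the flag from memory each time (the choice of SIGINT here is just for illustration).

#include <signal.h>
#include <stdio.h>

volatile sig_atomic_t done = 0;    /* written by the handler, read by main */

void handler(int sig) { done = 1; }

int main(void) {
    signal(SIGINT, handler);
    while (!done)                  /* re-reads 'done' from memory on every pass */
        ;                          /* spin until Ctrl-C delivers SIGINT */
    printf("done\n");
    return 0;
}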

Now back to semaphores:

Semaphores were proposed as a way to synchronize different threads of execution.

What is a semaphore?

A semaphore s is a global variable with a non-negative value.

It can only be manipulated by two special operations, P and V.

P and V work as shown above.

P and V are defined so that a running program can never enter a state in which s is negative; this property is called the semaphore invariant.

Using semaphores for mutual exclusion

Code:


volatile int cnt = 0;
sem_t mutex;             // declare the semaphore mutex

Sem_init(&mutex, 0, 1);  // initialize it in the main thread


// in the thread routine, surround the update of the shared variable cnt with P and V to protect it
for (i = 0; i < niters; i++) {
    P(&mutex);
    cnt++;
    V(&mutex);
}

Using semaphores to schedule shared resources

Producers and consumers

The readers-writers problem

Thread safety

A function is thread-safe if it always produces correct results when called repeatedly from multiple concurrent threads; otherwise it is thread-unsafe.

1. Functions that do not protect shared variables.

Fix: protect the shared variables with synchronization operations such as P and V.

2. Functions that keep state across multiple invocations.

3. Functions that return a pointer to a static variable.

4. Functions that call thread-unsafe functions.

1. If a function f calls a thread-unsafe function g, then f may be unsafe.
2. If g is a class-2 function, then f is definitely unsafe; there is no way to fix it short of changing g.
3. If g is a class-1 or class-3 function, the lock-and-copy technique can be used.

Races

A race occurs when the correctness of a program depends on one thread reaching point x in its control flow before another thread reaches point y.

Usually a race happens because the programmer assumes some particular trajectory through the execution state space.

Deadlock

Exercise:

Determine whether there is a race on myid.

// Version 1
void *foo(void *vargp)
{
    int myid;
    myid = *((int *)vargp);
    Free(vargp);
    printf("Thread %d\n", myid);
}
int main()
{
    pthread_t tid[2];
    int i, *ptr;

    for (i = 0; i < 2; i++)
    {
        ptr = Malloc(sizeof(int));
        *ptr = i;
        Pthread_create(&tid[i], 0, foo, ptr);
    }
    Pthread_join(tid[0], 0);
    Pthread_join(tid[1], 0);
}

Each iteration of the loop allocates a fresh pointer, so the two threads do not share the same variable and there is no race.


// Version 2
void *foo(void *vargp)
{
    int myid;
    myid = *((int *)vargp);
    printf("Thread %d\n", myid);
}
int main()
{
    pthread_t tid[2];
    int i;

    for (i = 0; i < 2; i++)
        Pthread_create(&tid[i], NULL, foo, &i);
    Pthread_join(tid[0], NULL);
    Pthread_join(tid[1], NULL);
}

Here the two threads share the variable i, so the i++ in main and the myid = *((int *)vargp) in foo race with each other. This is only because &i is passed by reference: foo dereferences i itself, which main may already have incremented. If i were passed by value, each thread would only see its own copy.

CSAPP-Network

Posted on 2018-12-14 | In CSAPP

socket

socket address structures

When the socket interface was invented, void* did not yet exist. For compatibility, the socket functions were defined to take a pointer to a generic sockaddr structure, and applications are required to cast their pointers to protocol-specific structures into pointers to this generic structure.

socket Interface

1. Start the server

  • (the open_listenfd function: get ready to accept requests)
  • socket
  • bind
  • listen
  • accept

2. Start the client

  • getaddrinfo
  • socket
  • connect

3. Exchange data

4. Close the client connection

5. Disconnect the client

getaddrinfo:

Sets up information about the server.

socket

Creates a socket descriptor, i.e., the file descriptor that will later be used for reading and writing.

bind

Asks the kernel to bind the socket address to the socket descriptor.

Bind is a kernel call that designates which service this program will be hosting.

It tells the kernel to associate the server socket address in addr with the socket descriptor sockfd. As with socket and connect, the best way to supply bind's arguments is getaddrinfo.

listen

Because the client is the active entity that initiates connection requests while the server passively waits for them, the kernel by default assumes that a descriptor created by the socket function corresponds to an active socket, which lives on the client end of a connection. The server calls listen to tell the kernel that the descriptor will be used by a server rather than a client.

In other words, listen converts sockfd from an active socket into a listening socket that can accept connection requests from clients.


accept

Waits for a connection request from a client to arrive on listenfd, then fills in the client's socket address in addr and returns a connected descriptor that can be used with the Unix I/O functions to communicate with the client.
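To tie the server-side steps together, here is a minimal open_listenfd-style sketch using the standard sockets API (the function name and the backlog of 1024 are illustrative choices; error handling is simplified):

#include <netdb.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int open_listenfd_sketch(const char *port) {
    struct addrinfo hints, *listp, *p;
    int listenfd = -1, optval = 1;

    memset(&hints, 0, sizeof(hints));
    hints.ai_socktype = SOCK_STREAM;              /* connection-based */
    hints.ai_flags = AI_PASSIVE | AI_ADDRCONFIG;  /* suitable for a server */

    if (getaddrinfo(NULL, port, &hints, &listp) != 0)
        return -1;

    for (p = listp; p; p = p->ai_next) {
        listenfd = socket(p->ai_family, p->ai_socktype, p->ai_protocol);
        if (listenfd < 0)
            continue;
        setsockopt(listenfd, SOL_SOCKET, SO_REUSEADDR,
                   &optval, sizeof(int));          /* avoid "address already in use" */
        if (bind(listenfd, p->ai_addr, p->ai_addrlen) == 0)
            break;                                 /* bound successfully */
        close(listenfd);
        listenfd = -1;
    }
    freeaddrinfo(listp);

    if (listenfd < 0)
        return -1;
    if (listen(listenfd, 1024) < 0) {
        close(listenfd);
        return -1;
    }
    return listenfd;                               /* ready for accept() */
}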


connect (client side)

The connect function tries to establish an Internet connection with the server whose socket address is addr, where addrlen is sizeof(sockaddr_in). connect blocks until the connection succeeds or an error occurs.

Difference between connected and listening descriptors

getaddrinfo

Host and service connection

result points to a linked list of addrinfo structures.

getnameinfo

Linked list returned by getaddrinfo

addrinfo struct (code)

The following program converts a domain name into its IP addresses.

int main(int argc, char **argv){

    struct addrinfo *p, *listp, hints;
    char buf[MAXLINE];
    int rc, flags;

    if (argc != 2) {
        fprintf(stderr, "usage: %s <domain name>\n", argv[0]);
        exit(0);
    }

    // get a list of addrinfo records
    memset(&hints, 0, sizeof(struct addrinfo));
    hints.ai_family = AF_INET;
    hints.ai_socktype = SOCK_STREAM;
    if ((rc = getaddrinfo(argv[1], NULL, &hints, &listp)) != 0) {
        fprintf(stderr, "getaddrinfo error: %s\n", gai_strerror(rc));
        exit(1);
    }

    flags = NI_NUMERICHOST;

    for (p = listp; p; p = p->ai_next) {
        Getnameinfo(p->ai_addr, p->ai_addrlen, buf, MAXLINE, NULL, 0, flags);
        printf("%s\n", buf);
    }

    Freeaddrinfo(listp);

    exit(0);
}

Web

A web server returns content to clients in two different ways:

1. Fetch a disk file and return its contents to the client. The disk file is called static content, and returning it is called serving static content.

2. Run an executable and return its output to the client. The output produced at run time is called dynamic content, and running the program and returning its output is called serving dynamic content.

All content returned by a web server is associated with some file it manages. Each of these files has a unique name called a URL (Universal Resource Locator).

HTTP

Hypertext Transfer Protocol

CGI

Common Gateway Interface, used to solve the problem of serving dynamic content from the server to the client.

CSAPP-UNIX/IO

Posted on 2018-12-14

Unix I/O Overview

1.A linux file is a sequence of m bytes

B0,B1..Bm-1

2.all I/O devices and even the kernel are represented as files

/dev/sda2 (/usr disk partition)
/dev/tty2 (terminal)


File Types

1.regular file

2.directory

3.socket for communicating with a process on another machine

regular files

The kernel doesn't know the difference between text files and binary files.

pathname

absolute pathname

starts with ‘/‘ and denotes path from root

/home/droh/a.c

relative pathname

../home/droh/a.c

Opening & Closing Files

Reading Files

Writing Files

Simple Unix I/O example

bad case

Each iteration of the while loop executes two operating-system calls, which means trapping into the kernel, context switching, doing the work, and switching back. In a word, it is too expensive.

#include "csapp.h"

int main(void)
{
char c;

while(Read(STDIN_FILENO, &c, 1) != 0)
Write(STDOUT_FILENO, &c, 1);
exit(0);
}

short counts

File Metadata

File Sharing

fork


The kernel uses three related data structures to represent open files:

1. Each process has its own descriptor table, indexed by the process's open file descriptors; each open descriptor entry points to an entry in the file table.

2. The file table, shared by all processes, records the current file position, a reference count (the number of descriptor entries currently pointing to the entry), and a pointer to the corresponding entry in the v-node table.

3. The v-node table, also shared by all processes.

I/O Redirection

Step 1

Step 2


dup2(5, 0);

This redirects standard input (descriptor 0) so that it refers to the file open on descriptor 5.
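A minimal sketch of this kind of redirection (the file name foo.txt is just an example):

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void) {
    int fd = open("foo.txt", O_RDONLY);
    if (fd < 0)
        return 1;
    dup2(fd, STDIN_FILENO);   /* descriptor 0 now refers to foo.txt's open file entry */
    close(fd);                /* the file stays open through descriptor 0 */

    char c;
    if (read(STDIN_FILENO, &c, 1) == 1)
        printf("first byte: %c\n", c);
    return 0;
}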

Some examples

fd3 is redirected to fd2, so both point to the same open file table entry; c2 reads the first character 'a', and c3 reads the next character after that, so it is 'b'.
