A First Look at Keras

Keras

A Simple Example

Keras layers are like neural network layers: there are fully connected layers, max pooling layers, and activation layers. You can add layers to a model with its add() function. For example, a simple model might look like this:

from keras.models import Sequential
from keras.layers.core import Dense, Activation, Flatten

# Create a Sequential model
model = Sequential()

# Layer 1 - a fully connected layer with 128 nodes, taking 32-dimensional input
model.add(Dense(128, input_dim=32))

# Layer 2 - a softmax activation layer
model.add(Activation('softmax'))

# Layer 3 - another fully connected layer
model.add(Dense(10))

# Layer 4 - a sigmoid activation layer
model.add(Activation('sigmoid'))

Keras automatically infers the shape of every subsequent layer from the first one, so you only need to set the input dimension for the first layer.

The first layer above, model.add(Dense(128, input_dim=32)), sets the input dimension to 32 (meaning the data comes from a 32-dimensional space) and its output to 128 nodes. The second layer takes the output of the first as its input. This chain of passing each layer's output to the next continues until the last layer, which is the model's output; here the output dimension is 10.

Once the model is built, we can compile it with the following command. We specify the loss function as categorical_crossentropy, which we have been working with. We also specify the optimizer; we'll cover this concept later, and for now we'll use adam. Finally, we specify the metrics used to evaluate the model; we'll use accuracy.

model.compile(loss="categorical_crossentropy", optimizer="adam", metrics = ['accuracy'])
# Inspect the network architecture
model.summary()
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense_3 (Dense)              (None, 128)               4224      
_________________________________________________________________
activation_3 (Activation)    (None, 128)               0         
_________________________________________________________________
dense_4 (Dense)              (None, 10)                1290      
_________________________________________________________________
activation_4 (Activation)    (None, 10)                0         
=================================================================
Total params: 5,514
Trainable params: 5,514
Non-trainable params: 0
_________________________________________________________________
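As a sanity check, the parameter counts in the summary can be reproduced by hand: a Dense layer with n_in inputs and n_out units has n_in * n_out weights plus n_out biases, and activation layers add no parameters. A quick sketch:

```python
# Each Dense layer has (inputs * outputs) weights plus one bias per output.
def dense_params(n_in, n_out):
    return n_in * n_out + n_out

print(dense_params(32, 128))                          # 4224, first Dense layer
print(dense_params(128, 10))                          # 1290, second Dense layer
print(dense_params(32, 128) + dense_params(128, 10))  # 5514 total
```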
# model.fit(X, y, epochs=1000, verbose=0)

A Short Exercise

Build a simple multilayer feedforward neural network to solve the XOR problem.

  1. Make the first layer a Dense() layer with 8 nodes and input_dim set to 2.
  2. Add a softmax activation function as the second layer.
  3. Set the output layer to 2 nodes, since there are 2 output classes.
  4. Use a softmax activation function after the output layer.
  5. Run the model for 10 epochs.
import numpy as np
from keras.utils import np_utils
import tensorflow as tf
# Using TensorFlow 1.0.0; use tf.python_io in later versions
# tf.python.control_flow_ops = tf

# Set random seed
np.random.seed(42)

# Our data
X = np.array([[0,0],[0,1],[1,0],[1,1]]).astype('float32')
y = np.array([[0],[1],[1],[0]]).astype('float32')

# Initial Setup for Keras
from keras.models import Sequential
from keras.layers.core import Dense, Activation, Flatten

y = np_utils.to_categorical(y)

# Building the model
xor = Sequential()
# Add required layers
xor.add(Dense(8, input_dim=2))

xor.add(Activation("softmax"))

xor.add(Dense(2))

xor.add(Activation("softmax"))

# Specify loss as "binary_crossentropy", optimizer as "adam",
# and add the accuracy metric
xor.compile(loss="binary_crossentropy", optimizer='adam', metrics=['accuracy'])

# Print the model architecture
xor.summary()

# Fitting the model
history = xor.fit(X, y, epochs=100, verbose=0)

# Scoring the model
score = xor.evaluate(X, y)
print("\nAccuracy: ", score[-1])

# Checking the predictions
print("\nPredictions:")
print(xor.predict_proba(X))
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense_74 (Dense)             (None, 8)                 24        
_________________________________________________________________
activation_70 (Activation)   (None, 8)                 0         
_________________________________________________________________
dense_75 (Dense)             (None, 2)                 18        
_________________________________________________________________
activation_71 (Activation)   (None, 2)                 0         
=================================================================
Total params: 42
Trainable params: 42
Non-trainable params: 0
_________________________________________________________________
4/4 [==============================] - 0s 44ms/step

Accuracy:  0.75

Predictions:
[[0.51535237 0.4846476 ]
 [0.49666327 0.5033367 ]
 [0.49206728 0.5079328 ]
 [0.49131778 0.5086822 ]]

Predicting Student Admissions with Neural Networks in Keras

In this notebook, we predict graduate-school admissions at UCLA based on three pieces of data:

  • GRE Scores (Test)

  • GPA Scores (Grades)

  • Class rank (1-4)

Dataset source: http://www.ats.ucla.edu/

Loading the Data

To load the data and format it nicely, we'll use two very useful packages: Pandas and NumPy. You can read their documentation to learn more:

import pandas as pd
data = pd.read_csv('student_data.csv')
data.head()

Observing the Data

# Importing matplotlib (and numpy, which is used below)
import matplotlib.pyplot as plt
import numpy as np

# Function to help us plot
def plot_points(data):
    X = np.array(data[["gre","gpa"]])
    y = np.array(data["admit"])
    admitted = X[np.argwhere(y==1)]
    rejected = X[np.argwhere(y==0)]
    plt.scatter([s[0][0] for s in rejected], [s[0][1] for s in rejected], s = 25, color = 'red', edgecolor = 'k')
    plt.scatter([s[0][0] for s in admitted], [s[0][1] for s in admitted], s = 25, color = 'cyan', edgecolor = 'k')
    plt.xlabel('Test (GRE)')
    plt.ylabel('Grades (GPA)')

# Plotting the points
plot_points(data)
plt.show()

# Separating the ranks
data_rank1 = data[data["rank"]==1]
data_rank2 = data[data["rank"]==2]
data_rank3 = data[data["rank"]==3]
data_rank4 = data[data["rank"]==4]

# Plotting the graphs
plot_points(data_rank1)
plt.title("Rank 1")
plt.show()
plot_points(data_rank2)
plt.title("Rank 2")
plt.show()
plot_points(data_rank3)
plt.title("Rank 3")
plt.show()
plot_points(data_rank4)
plt.title("Rank 4")
plt.show()

Data Preprocessing

One-hot encoding the rank

one_hot_rank = pd.get_dummies(data['rank'], prefix='rank')
data = pd.concat([data, one_hot_rank], axis=1)
del data['rank']
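To see what pd.get_dummies does here, a minimal self-contained illustration (the rank values below are made up):

```python
import pandas as pd

# Made-up rank values, just to illustrate get_dummies
df = pd.DataFrame({'rank': [1, 2, 3, 4, 2]})
one_hot = pd.get_dummies(df['rank'], prefix='rank')
print(one_hot.columns.tolist())   # ['rank_1', 'rank_2', 'rank_3', 'rank_4']
print(one_hot.shape)              # (5, 4): one indicator column per rank
```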

Scaling the data

data.gre = data.gre / np.max(data.gre)
data.gpa = data.gpa / np.max(data.gpa)
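Dividing each column by its maximum squeezes the values into the range (0, 1]. A quick NumPy sketch with made-up GRE scores:

```python
import numpy as np

gre = np.array([300.0, 680.0, 800.0])   # hypothetical GRE scores
scaled = gre / np.max(gre)
print(scaled)   # 0.375, 0.85, 1.0
```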

Generating the data

X = np.array(data)[:,1:]
# y = np_utils.to_categorical(np.array(data['admit']))
y = np.array(pd.get_dummies(data['admit']),dtype='float32')
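Slicing with [:, 1:] keeps every column except the first (admit), which is the label. A tiny illustration with made-up rows, assuming the same column order as the preprocessed data above:

```python
import numpy as np

# Hypothetical rows in the column order produced by the preprocessing:
# [admit, gre, gpa, rank_1, rank_2, rank_3, rank_4]
arr = np.array([[1, 0.8, 0.9, 1, 0, 0, 0],
                [0, 0.5, 0.7, 0, 1, 0, 0]])
X = arr[:, 1:]        # drop the first (admit) column, keep the 6 features
print(X.shape)        # (2, 6)
```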

Building the Model Architecture

import numpy as np
from keras.models import Sequential
from keras.layers.core import Dense, Dropout, Activation
from keras.optimizers import SGD
from keras.utils import np_utils

model = Sequential()
# Layer 1
model.add(Dense(128, input_dim=6, activation='relu'))
model.add(Dropout(.3))

# Layer 2
model.add(Dense(64, activation='relu'))
model.add(Dropout(.1))

# # Layer 3
# model.add(Dense(32, activation='relu'))
# model.add(Dropout(.1))

# Output layer
model.add(Dense(2))
model.add(Activation('softmax'))

# Compile the model
model.compile(loss='mean_squared_error',
              optimizer='adam', metrics=['accuracy'])

model.summary()
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense_153 (Dense)            (None, 128)               896       
_________________________________________________________________
dropout_26 (Dropout)         (None, 128)               0         
_________________________________________________________________
dense_154 (Dense)            (None, 64)                8256      
_________________________________________________________________
dropout_27 (Dropout)         (None, 64)                0         
_________________________________________________________________
dense_155 (Dense)            (None, 2)                 130       
_________________________________________________________________
activation_113 (Activation)  (None, 2)                 0         
=================================================================
Total params: 9,282
Trainable params: 9,282
Non-trainable params: 0
_________________________________________________________________

Training the Model

model.fit(X, y, epochs=1000, batch_size=100, verbose=0)
<keras.callbacks.History at 0x11f4905c0>

Evaluating the Model

score = model.evaluate(X, y)
print("\nAccuracy: ", score[-1])

# Checking the predictions
print("\nPredictions:")
# print(model.predict_proba(X)[:5])
400/400 [==============================] - 1s 2ms/step

Accuracy:  0.48

Predictions:

Splitting into Training and Test Sets

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=666)
model.fit(X_train, y_train, epochs=200, batch_size=100, verbose=0)
<keras.callbacks.History at 0x11b578f28>
test_score = model.evaluate(X_test, y_test)
train_score = model.evaluate(X_train, y_train)
print("\ntest_Accuracy: ", test_score[1])
print("\ntrain_Accuracy: ", train_score[1])

# Checking the predictions
print("\nPredictions:")
print(model.predict_proba(X_test)[:5])
print("\nLabel:\n", y_test[:5])
80/80 [==============================] - 0s 5ms/step
320/320 [==============================] - 0s 58us/step

test_Accuracy:  0.7

train_Accuracy:  0.709375

Predictions:
[[0.6729823  0.3270178 ]
 [0.7473139  0.252686  ]
 [0.76458406 0.23541589]
 [0.8641033  0.1358967 ]
 [0.6661182  0.3338818 ]]

Label:
 [[1. 0.]
 [1. 0.]
 [0. 1.]
 [1. 0.]
 [0. 1.]]

Playing with Parameters

You can see that we made several decisions during training: the number of layers, the size of each layer, the number of epochs, and so on.
Now it's your turn to play with the parameters! Can you improve the accuracy? Below are some suggestions for these parameters. We'll learn the definitions of the following concepts later in the course:

  • Activation function: relu and sigmoid
  • Loss function: categorical_crossentropy, mean_squared_error
  • Optimizer: rmsprop, adam, adagrad

A Brief Introduction to Several Optimizers

SGD
This is stochastic gradient descent. It uses the following parameters:

  • Learning rate.
  • Momentum (takes a weighted average of the previous steps so that the descent builds up momentum instead of getting stuck in local minima).
  • Nesterov momentum (slows the gradient down as it gets close to the solution).
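The momentum idea can be sketched in a few lines of NumPy (a toy illustration of the update rule, not Keras's actual implementation):

```python
import numpy as np

def sgd_momentum_step(w, v, grad, lr=0.1, momentum=0.9):
    """One SGD-with-momentum update: v is a decaying weighted
    average of past gradient steps, and w moves along v."""
    v = momentum * v - lr * grad
    return w + v, v

# Minimize f(w) = w**2, whose gradient is 2w
w, v = np.array([1.0]), np.array([0.0])
for _ in range(3):
    w, v = sgd_momentum_step(w, v, grad=2 * w)
print(w)   # approaches 0; after 3 steps w is about 0.062
```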

Adam
Adam (Adaptive Moment Estimation) uses a more sophisticated exponential decay: it considers not only the average of the previous steps (the first moment) but also their variance (the second moment).
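The two moments can be sketched in NumPy as well (a toy version of the update rule with the usual default hyperparameters; again, not Keras's implementation):

```python
import numpy as np

def adam_step(w, m, v, grad, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update: m estimates the mean of the gradients (first
    moment), v the mean of their squares (second moment); both are
    bias-corrected by the step count t before forming the step."""
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad ** 2
    m_hat = m / (1 - b1 ** t)
    v_hat = v / (1 - b2 ** t)
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), m, v

w, m, v = np.array([1.0]), np.array([0.0]), np.array([0.0])
w, m, v = adam_step(w, m, v, grad=np.array([2.0]), t=1)
```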

RMSProp
RMSProp (RMS stands for root mean square) decreases the learning rate by dividing it by an exponentially decaying average of squared gradients.
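The core of that rule looks like this in NumPy (a toy sketch, not the library code):

```python
import numpy as np

def rmsprop_step(w, s, grad, lr=0.01, rho=0.9, eps=1e-8):
    """One RMSProp update: s keeps an exponentially decaying average of
    squared gradients, and the step is divided by its square root, so
    steps shrink along consistently steep directions."""
    s = rho * s + (1 - rho) * grad ** 2
    w = w - lr * grad / (np.sqrt(s) + eps)
    return w, s

w, s = np.array([1.0]), np.array([0.0])
w, s = rmsprop_step(w, s, grad=np.array([2.0]))
```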