Building an Artificial Neural Network using TensorFlow 2.0
In this notebook, we will see how to implement an artificial neural network using TensorFlow 2.0. We will use the Fashion MNIST dataset, imported directly from the datasets bundled with Keras (tensorflow.keras.datasets).
Importing libraries
import numpy as np
import datetime
import tensorflow as tf
from tensorflow.keras.datasets import fashion_mnist
tf.__version__
'2.3.0'
Data preprocessing
Loading the dataset
(X_train, y_train),(X_test, y_test)=fashion_mnist.load_data()
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/train-labels-idx1-ubyte.gz
32768/29515 [=================================] - 0s 3us/step
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/train-images-idx3-ubyte.gz
26427392/26421880 [==============================] - 12s 0us/step
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/t10k-labels-idx1-ubyte.gz
8192/5148 [===============================================] - 0s 0us/step
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/t10k-images-idx3-ubyte.gz
4423680/4422102 [==============================] - 2s 0us/step
Normalizing the images
The goal of normalization is to change the values of numeric columns in the dataset to a common scale, without distorting differences in the ranges of values. Not every dataset in machine learning requires normalization; it mainly matters when features have different ranges.
Here we divide each pixel of every image in both the training and test sets by the maximum pixel value (255), so that each pixel lies in the range [0, 1]. Normalized inputs generally help the model train faster.
X_train=X_train/255
X_test=X_test/255
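As a quick check of what the division does, here is a minimal sketch on a synthetic array. The random `images` array is a made-up stand-in for the dataset, not the real Fashion MNIST data.

```python
import numpy as np

# Synthetic stand-in for a batch of 8-bit grayscale images (not the real dataset).
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(4, 28, 28), dtype=np.uint8)
images[0, 0, 0] = 0      # force both extremes so the range check is meaningful
images[0, 0, 1] = 255

# True division promotes the uint8 array to float64.
scaled = images / 255

print(scaled.dtype)                 # float64
print(scaled.min(), scaled.max())   # 0.0 1.0
```

Note that the result is a floating-point array, so it takes more memory than the original uint8 data.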
Reshaping the data
Because we are building a fully connected network, we reshape the training and test sets so that each image becomes a flat vector.
Since each image is 28x28 pixels, we reshape the full dataset to (-1, height x width), i.e. (-1, 784); the -1 tells NumPy to infer the number of images.
X_train.shape
(60000, 28, 28)
X_train=X_train.reshape(-1,28*28)
X_train.shape
(60000, 784)
# Also reshape the X_test
X_test=X_test.reshape(-1,28*28)
X_test.shape
(10000, 784)
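To see what the reshape does to individual pixels, the following sketch uses a small hypothetical batch of five images (the name `batch` and its contents are illustrative only). It shows that flattening is row-major and that it can be undone.

```python
import numpy as np

# Hypothetical mini-batch: 5 images of 28x28, filled with distinct values.
batch = np.arange(5 * 28 * 28).reshape(5, 28, 28)

# -1 lets NumPy infer the first dimension from the total element count.
flat = batch.reshape(-1, 28 * 28)
print(flat.shape)   # (5, 784)

# Row-major flattening: pixel (row r, column c) lands at index r * 28 + c.
r, c = 3, 7
same_pixel = flat[2, r * 28 + c] == batch[2, r, c]
print(same_pixel)   # True

# Reshaping back recovers the original images.
restored = flat.reshape(-1, 28, 28)
```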
Building an Artificial Neural Network
Defining the model
Let us start with a sequential model. A Sequential model is appropriate for a plain stack of layers where each layer has exactly one input tensor and one output tensor. The sequential API allows you to create models layer-by-layer for most problems. It is limited in that it does not allow you to create models that share layers or have multiple inputs or outputs.
model=tf.keras.models.Sequential()
Adding a first fully connected hidden layer
Layer hyper-params
1) number of units/neurons: 128
2) activation function: ReLU
3) input_shape: (784,)
model.add(tf.keras.layers.Dense(units=128, activation='relu',input_shape=(784,)))
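To make the computation in this layer concrete, here is a rough NumPy sketch of a fully connected layer with ReLU. The weights `W` and `b` are random placeholders, and `dense_relu` is a hypothetical helper, not part of Keras. Keras initializes and learns its own values.

```python
import numpy as np

rng = np.random.default_rng(0)

n_inputs, n_units = 784, 128
# Placeholder parameters, for illustration only.
W = rng.normal(scale=0.05, size=(n_inputs, n_units))   # kernel
b = np.zeros(n_units)                                   # bias

def dense_relu(x, W, b):
    """Affine transform followed by ReLU: max(0, xW + b)."""
    return np.maximum(0.0, x @ W + b)

x = rng.random((3, n_inputs))   # 3 flattened, normalized "images"
h = dense_relu(x, W, b)

print(h.shape)            # (3, 128)
print(W.size + b.size)    # 100480
```

The last line counts the layer's parameters: one weight per input-unit pair plus one bias per unit.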
Adding second layer with dropout
Dropout is a simple but helpful technique for training a deep network with a relatively small dataset. The idea is to randomly deactivate a fraction of the units (here 20%, set by the rate 0.2) on each training iteration. This helps prevent complex co-adaptations among units, i.e., undesirable dependence on the presence of particular other units, which in turn helps avoid overfitting and makes the trained model generalize better. The other noteworthy effect of dropout is that it provides an efficient way of combining exponentially many different network architectures: the random, temporary removal of units yields a different sub-network at each iteration, and these sub-networks share their connection weights. At test time all units are on, i.e., no dropout. In the original formulation the weights are then scaled by the keep probability (halved for a 50% rate) to maintain the same output range; the Keras Dropout layer instead scales the kept activations up by 1/(1 - rate) during training, so no adjustment is needed at inference.
model.add(tf.keras.layers.Dropout(0.2))
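The behaviour of a dropout layer with rate 0.2 can be sketched in NumPy as follows. This is a simplified illustration of "inverted" dropout, where kept activations are scaled up during training. The function `inverted_dropout` is a hypothetical helper, not the Keras implementation.

```python
import numpy as np

def inverted_dropout(x, rate, rng, training=True):
    """Zero a random fraction `rate` of x and rescale the rest by 1 / (1 - rate)."""
    if not training or rate == 0.0:
        return x                       # inference: identity, no rescaling needed
    keep_prob = 1.0 - rate
    mask = rng.random(x.shape) < keep_prob
    return x * mask / keep_prob

rng = np.random.default_rng(0)
activations = np.ones((1000, 128))     # dummy activations, all equal to 1

train_out = inverted_dropout(activations, 0.2, rng, training=True)
test_out = inverted_dropout(activations, 0.2, rng, training=False)

print(np.unique(train_out))            # only zeros and the rescaled value 1 / 0.8
print(round(train_out.mean(), 2))      # close to 1.0: the expected output is preserved
```

Because the rescaling happens during training, the average activation stays the same in both modes.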
Adding the output layer
1) units: number of classes (10 in Fashion MNIST dataset)
2) activation: softmax
model.add(tf.keras.layers.Dense(units=10, activation='softmax'))
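The softmax activation turns the 10 raw outputs (logits) into a probability distribution over the classes. Below is a small NumPy sketch with made-up logits. The `softmax` function is written out for illustration, including the usual max-subtraction for numerical stability.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax; subtracting the row maximum avoids overflow in exp."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

# Made-up logits for one sample and 10 classes.
logits = np.array([[2.0, 1.0, 0.1, -1.0, 0.0, 0.5, 3.0, -2.0, 1.5, 0.2]])
probs = softmax(logits)

print(probs.shape)             # (1, 10)
print(probs.argmax(axis=-1))   # [6], the class with the largest logit
```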
Compiling the model
1) Optimizer: Adam
2) Loss: Sparse categorical crossentropy
The Adam optimization algorithm is an extension to stochastic gradient descent that has seen broad adoption for deep learning applications in computer vision and natural language processing. Adam differs from classical stochastic gradient descent, which maintains a single learning rate (termed alpha) for all weight updates and does not change it during training. In Adam, a learning rate is maintained for each network weight (parameter) and separately adapted as learning unfolds. Adam combines the benefits of AdaGrad and RMSProp: like RMSProp, it scales each parameter's step using a running average of the squared gradients (the second moment, or uncentered variance), and it additionally uses a running average of the gradients themselves (the first moment, the mean).
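The update rule can be sketched in a few lines of NumPy. This is a simplified illustration of the standard Adam update with bias correction, applied to a toy quadratic. The helper `adam_step`, the toy objective, and the learning rate 0.05 are assumptions for the example, not what Keras runs internally.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update; t is the 1-based step counter."""
    m = beta1 * m + (1 - beta1) * grad          # running mean of gradients (first moment)
    v = beta2 * v + (1 - beta2) * grad ** 2     # running mean of squared gradients (second moment)
    m_hat = m / (1 - beta1 ** t)                # bias correction for the zero initialization
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Toy objective f(theta) = sum(theta ** 2), whose gradient is 2 * theta.
theta = np.array([5.0, -3.0])
m = np.zeros_like(theta)
v = np.zeros_like(theta)

# After the first step each parameter moves by about lr, whatever the gradient size.
first, m1, v1 = adam_step(theta, 2 * theta, m, v, t=1, lr=0.05)
print(first)    # approximately [ 4.95 -2.95]

for t in range(1, 2001):
    theta, m, v = adam_step(theta, 2 * theta, m, v, t, lr=0.05)
print(np.abs(theta).max() < 0.1)   # the iterate ends up near the minimum at 0
```

The first printed value shows the per-parameter scaling: both coordinates take a step of roughly the same size even though their gradients differ.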
The only difference between sparse categorical cross-entropy and categorical cross-entropy is the format of the true labels. In a single-label, multi-class classification problem the labels are mutually exclusive, meaning each data entry belongs to exactly one class. Categorical cross-entropy expects y_true as one-hot vectors, whereas sparse categorical cross-entropy expects the integer class index. The Fashion MNIST labels are integers from 0 to 9, so we use the sparse variant.
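The equivalence of the two losses can be checked with a small NumPy sketch. The probabilities and labels are made up, and the two functions are simplified re-implementations for illustration, not the Keras ones.

```python
import numpy as np

# Made-up predicted probabilities for 3 samples and 3 classes.
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1],
                  [0.2, 0.3, 0.5]])

sparse_labels = np.array([0, 1, 2])          # integer class indices
one_hot_labels = np.eye(3)[sparse_labels]    # the same labels, one-hot encoded

def sparse_categorical_crossentropy(labels, probs):
    """Mean negative log-probability of the true class, picked by integer index."""
    return -np.log(probs[np.arange(len(labels)), labels]).mean()

def categorical_crossentropy(one_hot, probs):
    """Same quantity, with the true class selected by a one-hot mask."""
    return -(one_hot * np.log(probs)).sum(axis=1).mean()

sparse_loss = sparse_categorical_crossentropy(sparse_labels, probs)
dense_loss = categorical_crossentropy(one_hot_labels, probs)
print(np.isclose(sparse_loss, dense_loss))   # True
```

The sparse form avoids materializing the one-hot matrix, which saves memory when there are many classes.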
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['sparse_categorical_accuracy'])
model.summary()
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
dense (Dense)                (None, 128)               100480
_________________________________________________________________
dropout (Dropout)            (None, 128)               0
_________________________________________________________________
dense_1 (Dense)              (None, 10)                1290
=================================================================
Total params: 101,770
Trainable params: 101,770
Non-trainable params: 0
_________________________________________________________________
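The parameter counts in the summary can be reproduced by hand: a dense layer has one weight per input-unit pair plus one bias per unit, and dropout has no parameters. The helper `dense_params` below is a hypothetical name used only for this check.

```python
def dense_params(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus biases (n_units) of a fully connected layer."""
    return n_inputs * n_units + n_units

hidden = dense_params(784, 128)   # first Dense layer
output = dense_params(128, 10)    # output Dense layer

print(hidden, output, hidden + output)   # 100480 1290 101770
```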
Training the model
model.fit(X_train, y_train, epochs=10)
Epoch 1/10
1875/1875 [==============================] - 4s 2ms/step - loss: 0.5286 - sparse_categorical_accuracy: 0.8122
Epoch 2/10
1875/1875 [==============================] - 5s 3ms/step - loss: 0.3987 - sparse_categorical_accuracy: 0.8558
Epoch 3/10
1875/1875 [==============================] - 4s 2ms/step - loss: 0.3625 - sparse_categorical_accuracy: 0.8688
Epoch 4/10
1875/1875 [==============================] - 4s 2ms/step - loss: 0.3433 - sparse_categorical_accuracy: 0.8730
Epoch 5/10
1875/1875 [==============================] - 4s 2ms/step - loss: 0.3284 - sparse_categorical_accuracy: 0.8804
Epoch 6/10
1875/1875 [==============================] - 5s 2ms/step - loss: 0.3183 - sparse_categorical_accuracy: 0.8833
Epoch 7/10
1875/1875 [==============================] - 4s 2ms/step - loss: 0.3081 - sparse_categorical_accuracy: 0.8849
Epoch 8/10
1875/1875 [==============================] - 5s 3ms/step - loss: 0.2960 - sparse_categorical_accuracy: 0.8903
Epoch 9/10
1875/1875 [==============================] - 4s 2ms/step - loss: 0.2888 - sparse_categorical_accuracy: 0.8917
Epoch 10/10
1875/1875 [==============================] - 4s 2ms/step - loss: 0.2831 - sparse_categorical_accuracy: 0.8944
<tensorflow.python.keras.callbacks.History at 0x7ffa570fd130>
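The 1875 steps per epoch in the log follow from the batch size. Since `model.fit` was called without a `batch_size` argument, this sketch assumes the Keras default of 32.

```python
import math

batch_size = 32   # Keras default when batch_size is not passed

train_steps = math.ceil(60000 / batch_size)   # batches per training epoch
test_steps = math.ceil(10000 / batch_size)    # batches for the 10,000 test images

print(train_steps, test_steps)   # 1875 313
```

The rounding up matters for the test set, where the last batch holds only 16 images.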
Model evaluation and prediction
test_loss, test_accuracy=model.evaluate(X_test, y_test)
313/313 [==============================] - 0s 777us/step - loss: 0.3337 - sparse_categorical_accuracy: 0.8796
print("Test accuracy: {}".format(test_accuracy))
Test accuracy: 0.8795999884605408
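For the prediction step, the model's softmax outputs are turned into class labels by taking the index of the largest probability, and accuracy is the fraction of matches with the true labels. The sketch below uses made-up probabilities in place of real model output. With the trained model, they would come from a call such as model.predict(X_test).

```python
import numpy as np

# Made-up class probabilities for 4 samples and 3 classes (stand-in for model output).
probs = np.array([[0.1, 0.7, 0.2],
                  [0.8, 0.1, 0.1],
                  [0.3, 0.3, 0.4],
                  [0.2, 0.5, 0.3]])
labels = np.array([1, 0, 1, 1])   # made-up integer ground-truth labels

predicted = probs.argmax(axis=1)           # most probable class per sample
accuracy = (predicted == labels).mean()    # fraction of correct predictions

print(predicted)   # [1 0 2 1]
print(accuracy)    # 0.75
```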
References
https://machinelearningmastery.com/adam-optimization-algorithm-for-deep-learning/
https://towardsai.net/p/data-science/how-when-and-why-should-you-normalize-standardize-rescale-your-data-3f083def38ff
https://machinelearningmastery.com/dropout-for-regularizing-deep-neural-networks/
https://keras.io/guides/sequential_model/
https://cwiki.apache.org/confluence/display/MXNET/Multi-hot+Sparse+Categorical+Cross-entropy