Building deep learning neural networks using TensorFlow layers


Samples from the MNIST test data set (source: Josef Steppan on Wikimedia Commons)

Deep learning has proven its effectiveness in many fields, such as computer vision, natural language processing (NLP), text translation, or speech to text. It takes its name from the high number of layers used to build the neural network performing machine learning tasks. There are several types of layers as well as overall network architectures, but the general rule holds that the deeper the network is, the more complexity it can grasp. This article will explain fundamental concepts of neural network layers and walk through the process of creating several types using TensorFlow.

TensorFlow is the platform that contributed to making artificial intelligence (AI) available to the broader public. It’s an open source library with a vast community and great support. TensorFlow provides a set of tools for building neural network architectures, and then training and serving the models. It offers different levels of abstraction, so you can use it for cut-and-dried machine learning processes at a high level or go more in-depth and write the low-level calculations yourself.

TensorFlow offers many kinds of layers in its tf.layers package. The module makes it easy to create a layer in the deep learning model without going into many details. At the moment, it supports types of layers used mostly in convolutional networks. For other types of networks, like RNNs, you may need to look at tf.contrib.rnn or tf.nn. The most basic type of layer is the fully connected one. To implement it, you only need to set up the input and the size in the Dense class. Other kinds of layers might require more parameters, but they are implemented in a way to cover the default behaviour and spare the developers’ time.

There is some disagreement on what a layer is and what it is not. One opinion states that a layer must store trained parameters (like weights and biases). This means, for instance, that applying the activation function is not another layer. Indeed, tf.layers exposes the activation function through the activation parameter rather than as a separate layer. Layers introduced in the module don’t always strictly follow this rule, though. You can find a large range of types there: fully connected, convolution, pooling, flatten, batch normalization, dropout, and convolution transpose. Flattening and max pooling, for example, don’t store any parameters trained in the learning process. Nonetheless, they perform more complex operations than an activation function, so the authors of the module decided to set them up as separate classes. Later in the article, we’ll discuss how to use some of them to build a deep convolutional network.

A typical convolutional network is a sequence of convolution and pooling pairs, followed by a few fully connected layers. A convolution is like a small neural network that is applied repeatedly, once at each location on its input. As a result, the network layers become much smaller but increase in depth. Pooling is the operation that usually decreases the size of the input image. Max pooling is the most common pooling algorithm, and has proven to be effective in many computer vision tasks.
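To make the two operations concrete, here is a minimal pure-Python sketch of a single-filter convolution followed by 2x2 max pooling. It is an illustration only: the function names convolve2d and max_pool2d, the 5x5 image, and the 2x2 kernel are made up for this example, and real layers also learn a bias and use many filters.

```python
def convolve2d(image, kernel):
    """Slide a square kernel over a square image (no padding, stride 1)."""
    k = len(kernel)
    out_size = len(image) - k + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(k) for j in range(k))
            for c in range(out_size)
        ]
        for r in range(out_size)
    ]


def max_pool2d(image, size=2):
    """Keep the largest value in each non-overlapping size x size window."""
    return [
        [
            max(image[r + i][c + j] for i in range(size) for j in range(size))
            for c in range(0, len(image[0]) - size + 1, size)
        ]
        for r in range(0, len(image) - size + 1, size)
    ]


# Hypothetical 5x5 "image" and a 2x2 filter that adds up the main diagonal.
image = [
    [1, 2, 0, 1, 3],
    [0, 1, 2, 3, 1],
    [1, 0, 1, 2, 2],
    [2, 1, 0, 1, 0],
    [0, 3, 1, 0, 1],
]
kernel = [[1, 0], [0, 1]]

feature_map = convolve2d(image, kernel)  # 4x4: same filter applied at every location
pooled = max_pool2d(feature_map)         # 2x2: pooling halves the width and height
```

The same small set of kernel weights is reused at every position, which is why a convolution needs far fewer parameters than a dense layer over the same input.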

In this article, I’ll show the use of TensorFlow in applying a convolutional network to image processing, using the MNIST data set for our example. The task is to recognize a digit ranging from 0 to 9 from its handwritten representation.

First, TensorFlow has the capabilities to load the data. All you need to do is to use the input_data module:

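import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
# folder_path is the directory where the MNIST files are stored or downloaded to
mnist = input_data.read_data_sets(folder_path, one_hot=True)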
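The one_hot=True argument asks for labels as vectors rather than plain digits. As a rough sketch of what that encoding looks like (the helper one_hot is hypothetical and written here only for illustration):

```python
def one_hot(label, num_classes=10):
    """Return a vector of zeros with a single 1.0 at the label's index."""
    vector = [0.0] * num_classes
    vector[label] = 1.0
    return vector


encoded = one_hot(3)  # the digit 3 becomes a 10-element vector
```

This format lines up with a network that produces one score per class, which is what the loss and accuracy calculations later in the article rely on.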

We are now going to build a multilayered architecture. After describing the learning process, I’ll walk you through the creation of different kinds of layers and apply them to the MNIST classification task.

The training process works by optimizing the loss function, which measures the difference between the network predictions and actual labels’ values. Deep learning often uses a technique called cross entropy to define the loss.

TensorFlow provides a function called tf.losses.softmax_cross_entropy that internally applies the softmax algorithm to the model’s unnormalized predictions and sums the results across all classes. In our example, we use the Adam optimizer provided by the tf.train API. The labels tensor will be provided during training and testing and represents the ground truth. The output tensor represents the network predictions and will be defined in the next section when building the network.

loss = tf.losses.softmax_cross_entropy(labels, output)
train_step = tf.train.AdamOptimizer(1e-4).minimize(loss)
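For intuition, the calculation behind the loss can be sketched in plain Python. This is not TensorFlow's implementation: the function names are invented for the example, and averaging over the batch is my assumption about the default reduction.

```python
import math


def softmax(logits):
    """Turn unnormalized scores into probabilities that sum to 1."""
    largest = max(logits)  # subtracting the max keeps exp() numerically stable
    exps = [math.exp(v - largest) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(one_hot_label, logits):
    """Negative log-probability assigned to the true class."""
    probs = softmax(logits)
    return -sum(t * math.log(p) for t, p in zip(one_hot_label, probs))


def batch_loss(batch_labels, batch_logits):
    """Average the per-example losses (assumed reduction, see lead-in)."""
    losses = [cross_entropy(l, o) for l, o in zip(batch_labels, batch_logits)]
    return sum(losses) / len(losses)


# Hypothetical three-class example: the true class is index 1.
label = [0.0, 1.0, 0.0]
good_loss = cross_entropy(label, [0.0, 5.0, 0.0])  # confident and right: small loss
bad_loss = cross_entropy(label, [5.0, 0.0, 0.0])   # confident and wrong: large loss
```

The loss shrinks as the network puts more probability on the correct class, which is exactly the quantity the optimizer drives down.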

To evaluate the performance of the training process, we want to compare the output with the real labels and calculate the accuracy:

correct_prediction = tf.equal(tf.argmax(output, 1), tf.argmax(labels, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
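The same accuracy logic can be written without TensorFlow to see what each step does. The sketch below uses made-up outputs and labels for four examples with three classes; argmax and batch_accuracy are illustrative helpers, not library functions.

```python
def argmax(values):
    """Index of the largest value, mirroring tf.argmax along one row."""
    return max(range(len(values)), key=lambda i: values[i])


def batch_accuracy(outputs, labels):
    """Fraction of rows where the predicted class matches the labeled class."""
    hits = [1.0 if argmax(o) == argmax(l) else 0.0
            for o, l in zip(outputs, labels)]
    return sum(hits) / len(hits)


# Hypothetical network outputs and one-hot labels.
outputs = [[0.1, 2.0, 0.3], [1.5, 0.2, 0.1], [0.0, 0.4, 0.9], [0.3, 0.2, 0.1]]
labels = [[0, 1, 0], [1, 0, 0], [0, 1, 0], [1, 0, 0]]

acc = batch_accuracy(outputs, labels)  # three of the four predictions match
```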

Now, we’ll introduce a simple training process using batches and a fixed number of steps and learning rate. For the MNIST data set, the next_batch function would just call mnist.train.next_batch. After the network is trained, we can check its performance on the test data.

# Open the session and initialize the variables
sess = tf.InteractiveSession()
sess.run(tf.global_variables_initializer())

for i in range(steps):
    # Get the next batch
    input_batch, labels_batch = next_batch(100)
    feed_dict = {input: input_batch, labels: labels_batch}

    # Print the current batch accuracy every 100 steps
    if i % 100 == 0:
        train_accuracy = accuracy.eval(feed_dict=feed_dict)
        print("Step %d, training batch accuracy %g" % (i, train_accuracy))

    # Run the optimization step
    train_step.run(feed_dict=feed_dict)

# Print the test accuracy once the training is over
print("Test accuracy: %g" % accuracy.eval(feed_dict={input: test_images, labels: test_labels}))
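If you work with a data set that doesn't come with its own batching helper, a next_batch function can be sketched roughly as follows. The BatchFeeder class is hypothetical and deliberately simple: it reshuffles once per pass over the data and drops any leftover examples that don't fill a batch.

```python
import random


class BatchFeeder:
    """Serve shuffled mini-batches from in-memory lists (illustrative only)."""

    def __init__(self, inputs, labels, seed=0):
        self.inputs = inputs
        self.labels = labels
        self._rng = random.Random(seed)
        self._order = []  # indices not yet served in the current pass

    def next_batch(self, size):
        # Start a new shuffled pass when too few examples remain.
        if len(self._order) < size:
            self._order = list(range(len(self.inputs)))
            self._rng.shuffle(self._order)
        picked, self._order = self._order[:size], self._order[size:]
        return ([self.inputs[i] for i in picked],
                [self.labels[i] for i in picked])


# Hypothetical toy data: ten examples whose label is the parity of the input.
feeder = BatchFeeder(list(range(10)), [n % 2 for n in range(10)])
first_inputs, first_labels = feeder.next_batch(5)
second_inputs, second_labels = feeder.next_batch(5)
```

Shuffling between passes keeps consecutive batches from always containing the same examples.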

For the actual training, let’s start simple and create the network with just one output layer. We begin by defining placeholders for the input data and labels. During the training phase, they will be filled with the data from the MNIST data set. Because the data was flattened, the input layer has only one dimension. The size of the output layer corresponds to the number of labels. Both input and labels have the additional dimension set to None, which will handle the variable number of examples.

input = tf.placeholder(tf.float32, [None, image_size*image_size])
labels = tf.placeholder(tf.float32, [None, labels_size])

Now is the time to build the exciting part: the output layer. The magic behind it is quite straightforward. Every neuron in it has the weight and bias parameters, gets the data from every input, and performs some calculations. This is what makes it a fully connected layer.

TensorFlow’s tf.layers package allows you to formulate all this in just one line of code. All you need to provide is the input and the size of the layer.

output = tf.layers.dense(inputs=input, units=labels_size)
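Behind that one line, each output neuron computes a weighted sum of all the inputs plus a bias. A minimal sketch for a single example, with made-up weights and a hypothetical dense helper:

```python
def dense(inputs, weights, biases):
    """Fully connected layer for one example.

    inputs:  n values
    weights: n rows, each holding one weight per output neuron
    biases:  one value per output neuron
    """
    return [
        sum(x * w_row[j] for x, w_row in zip(inputs, weights)) + biases[j]
        for j in range(len(biases))
    ]


# Hypothetical layer with three inputs and two output neurons.
x = [1.0, 2.0, 3.0]
weights = [[0.5, -1.0], [0.0, 1.0], [1.0, 0.5]]
biases = [0.5, -0.5]

out = dense(x, weights, biases)  # one value per output neuron
```

In the real layer the weights and biases start out random and are adjusted by the optimizer during training.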

Our first network isn’t that impressive in regard to accuracy. But it’s simple, so it runs very fast.

We’ll try to improve our network by adding more layers between the input and output. These are called hidden layers. First, we add another fully connected one.

Some minor changes are needed from the previous architecture. First of all, there is another parameter indicating the number of neurons of the hidden layer. The definition itself takes the input data and connects to the output layer:

hidden = tf.layers.dense(inputs=input, units=1024, activation=tf.nn.relu)
output = tf.layers.dense(inputs=hidden, units=labels_size)

Notice that this time, we used an activation parameter. It runs whatever comes out of the neuron through the activation function, which in this case is ReLU. This function has been shown to work quite well with deep architectures.
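ReLU itself is tiny: it keeps positive values and replaces negative ones with zero. A plain-Python sketch (the relu helper is for illustration, not the TensorFlow implementation):

```python
def relu(values):
    """Rectified linear unit: pass positives through, clamp negatives to zero."""
    return [max(0.0, v) for v in values]


activated = relu([-2.0, -0.5, 0.0, 1.5])  # only the positive value survives
```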

You should see a slight drop in training speed. Our network is becoming deeper, which means it has more parameters to tune, and this makes the training process longer. On the other hand, the accuracy improves significantly, to the 94% level.
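To get a feel for how much the network has grown, we can count the trainable parameters of both versions. The sketch assumes 28x28 MNIST images and 10 labels; dense_params is a helper written for this example.

```python
def dense_params(n_in, n_out):
    """Weights (one per input/output pair) plus one bias per output neuron."""
    return n_in * n_out + n_out


image_size = 28   # assumed: MNIST images are 28x28 pixels
labels_size = 10  # assumed: one label per digit, 0 to 9
n_inputs = image_size * image_size

# Network with only the output layer.
single_layer = dense_params(n_inputs, labels_size)

# Network with a 1024-neuron hidden layer in front of the output layer.
with_hidden = dense_params(n_inputs, 1024) + dense_params(1024, labels_size)
```

Under these assumptions, the hidden layer takes the model from 7,850 to 814,090 parameters, roughly a hundredfold increase.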

The next two layers we’re going to add are the integral parts of convolutional networks. They work differently from the dense ones and perform especially well with input that has two or more dimensions (such as images). The parameters of the convolutional layer are the size of the convolution window and the number of filters. Setting padding to same means that the output has the same width and height as the input. After this step, we apply max pooling.

Using convolution allows us to take advantage of the 2D representation of the input data. We lost it when we flattened the digit images and fed the resulting data into the dense layer. To go back to the original structure, we can use the tf.reshape function.

input2d = tf.reshape(input, [-1,image_size,image_size,1])
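The reshape doesn't change any values; it only regroups them row by row and adds a channel dimension of size 1. Here is a small pure-Python sketch of the same regrouping for a single flattened image, using a hypothetical 3x3 example instead of 28x28:

```python
def to_2d(flat, size):
    """Regroup a flat list into size rows of size pixels, each with one channel."""
    return [
        [[flat[r * size + c]] for c in range(size)]
        for r in range(size)
    ]


flat_image = list(range(9))        # hypothetical flattened 3x3 image
image_2d = to_2d(flat_image, 3)    # shape: 3 rows x 3 columns x 1 channel
```

The -1 in the TensorFlow call plays the role of the batch dimension, which TensorFlow infers from the number of examples fed in.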

The code for convolution and max pooling follows. Notice that for the next connection with the dense layer, the output must be flattened back.

conv1 = tf.layers.conv2d(inputs=input2d, filters=32, kernel_size=[5, 5], padding="same", activation=tf.nn.relu)
pool1 = tf.layers.max_pooling2d(inputs=conv1, pool_size=[2, 2], strides=2)
pool_flat = tf.reshape(pool1, [-1, 14 * 14 * 32])
hidden = tf.layers.dense(inputs=pool_flat, units=1024, activation=tf.nn.relu)
output = tf.layers.dense(inputs=hidden, units=labels_size)

Adding the convolution to the picture increases the accuracy even more (to 97%), but slows down the training process significantly. To take full advantage of the model, we should continue with another layer. We are again using the 2D input, but flattening only the output of the second layer. The first one doesn’t need flattening now because the convolution works with higher dimensions.

conv2 = tf.layers.conv2d(inputs=pool1, filters=64, kernel_size=[5, 5], padding="same", activation=tf.nn.relu)
pool2 = tf.layers.max_pooling2d(inputs=conv2, pool_size=[2, 2], strides=2)
pool_flat = tf.reshape(pool2, [-1, 7 * 7 * 64])
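The numbers in the two reshape calls (14 * 14 * 32 and 7 * 7 * 64) follow from the layer settings. The sketch below traces the image size through both convolution and pooling pairs, assuming 28x28 inputs and no padding in the pooling layers; the helper names are made up for this example.

```python
import math


def conv_same_size(size, stride=1):
    """With "same" padding, the output size depends only on the stride."""
    return math.ceil(size / stride)


def pool_size_after(size, window=2, stride=2):
    """Pooling without padding: count the windows that fit fully inside."""
    return (size - window) // stride + 1


image_size = 28  # assumed: MNIST images are 28x28 pixels

after_pool1 = pool_size_after(conv_same_size(image_size))   # first conv + pool
after_pool2 = pool_size_after(conv_same_size(after_pool1))  # second conv + pool

flat1 = after_pool1 * after_pool1 * 32  # values per image after the first pair
flat2 = after_pool2 * after_pool2 * 64  # values per image after the second pair
```

Each pair halves the width and height while the number of filters sets the depth, so the image goes from 28x28x1 to 14x14x32 and then to 7x7x64.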

At this point, you need to be quite patient when running the code. The complexity of the network adds a lot of overhead, but we are rewarded with better accuracy.

We’ll now introduce another technique that could improve the network performance and avoid overfitting. It’s called Dropout, and we’ll apply it to the hidden dense layer. With Dropout, each node is randomly either shut down or kept, with an explicit probability. It is used only in the training phase, so remember to turn it off when evaluating your network.

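To use Dropout, we need to change the code slightly. First of all, we need a placeholder, used in both the training and testing phases, that holds a Boolean flag indicating whether Dropout should be applied. Remember to add it to every feed_dict: True during training and False when evaluating.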

should_drop = tf.placeholder(tf.bool)

Second, we need to define the dropout and connect it to the output layer. The rest of the architecture stays the same.

hidden = tf.layers.dense(inputs=pool_flat, units=1024, activation=tf.nn.relu)
dropout = tf.layers.dropout(inputs=hidden, rate=0.5, training=should_drop)
output = tf.layers.dense(inputs=dropout, units=labels_size)
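To see what the dropout layer does to its input, here is a rough pure-Python sketch. It follows the common "inverted dropout" scheme, in which the kept values are scaled up by 1 / (1 - rate) so the expected sum stays the same; the function and variable names are invented for the example.

```python
import random


def dropout(values, rate, training, rng):
    """Zero each value with probability `rate` while training; otherwise pass through."""
    if not training:
        return list(values)  # evaluation: the layer does nothing
    keep = 1.0 - rate
    return [v / keep if rng.random() >= rate else 0.0 for v in values]


rng = random.Random(0)    # fixed seed so the sketch is repeatable
activations = [1.0] * 8   # hypothetical hidden-layer outputs

train_out = dropout(activations, rate=0.5, training=True, rng=rng)
eval_out = dropout(activations, rate=0.5, training=False, rng=rng)
```

With a rate of 0.5, every value in train_out is either 0.0 or 2.0, while eval_out is identical to the input, which is why the flag has to be switched off for evaluation.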

In this article, we started by introducing the concepts of deep learning and used TensorFlow to build a multi-layered convolutional network. The code can be reused for image recognition tasks and applied to other data sets. More complex images, however, would require greater depth as well as more sophisticated architectures, such as Inception or ResNets.

The key lesson from this exercise is that you don’t need to master statistical techniques or write complex matrix multiplication code to create an AI model. TensorFlow can handle those for you. However, you need to know which algorithms are appropriate for your data and application, and determine the best hyperparameters, such as network architecture, depth of layers, batch size, learning rate, etc. Be aware that the variety of choices that libraries like TensorFlow give you puts a lot of responsibility on your side.

This post is a collaboration between O’Reilly and TensorFlow. See our statement of editorial independence.
