Neural Net Notebook

This TensorFlow Neural Network tutorial has several aspects that are unique or not evident in other tutorials like the MNIST handwritten digits tutorial. The focus is on business, both in terms of the use case and data and in terms of extra steps needed to help take your data science results to production.

Reading data and reshaping it for TensorFlow neural net input
Epoch based training and training data randomization
Training in small batches for larger data sets
Tuning hyperparameters like the network structure and activation function
Tricks for properly saving and restoring models for use in a production environment
How to do transfer learning
How to derive confidence values for neural net outputs

You can follow along if you have taken the time to locally install Jupyter/Python/TensorFlow/etc., or you can take a few minutes to sign up for the free trial of IBM Data Science Experience on IBM Cloud.

The sample data we’ll be using for training and testing is in a file called bankLoanData.csv, which is a sample data file I obtained from my laptop IBM SPSS statistics package. I’ve used this data because I could easily use SPSS to double-check that all the TensorFlow code was behaving as I expected. The advantage to both you and me, then, is that we can now easily adapt the resulting TensorFlow code to build bigger neural nets that learn from much larger data sets.

The goal will be to train and perform inferences with a TensorFlow neural network for predicting whether a loan applicant is likely to ‘default’ on a bank loan, based on features of the applicant that may be predictive of their ability to repay a loan. The dependent variable being predicted is the column named ‘default’ in the CSV file. The dependent variable is also called the ‘label’, and the data in the column is called the ‘labeled data’. The predictor variables are ‘age’, ‘ed’ (level of education), ‘employ’ (years with current employer), ‘address’ (years at current address), ‘income’ (household income in thousands), ‘debtinc’ (debt to income ratio x 100), ‘creddebt’ (credit card debt in thousands), and ‘othdebt’ (other debt in thousands). The predictor variables are also called ‘features’. The remaining columns of data are not needed and will be discarded in the code below. When training, the feature values of the instances (rows) of data are fed as input to the neural net, and the weights and biases of the neural network are adjusted so as to minimize ‘loss’, which coarsely maps to maximizing accuracy of the neural network’s output layer predictions of the labeled data.

The first cell of the Jupyter Python notebook has to do some version of reading the CSV file. In my prior tutorial, I showed how to load a CSV file into a database and then load the data into a Pandas dataframe using a SQL query. For this tutorial, any version of pandas.read_csv() will suffice. For example, in IBM Data Science Experience on IBM Cloud, you can simply drag and drop the CSV file to add it as a dataset, and then select “Insert Code” to automatically generate the code to read the CSV file from cloud object storage. For larger datasets, you may prefer to use a SparkSession Dataframe instead, but in that case, you’ll need to slightly adjust the numpy extraction code in the next notebook cell.

The next cell of code below assumes a Pandas dataframe named ‘df_data_1’ and uses it to extract the data into numpy arrays needed as input to the TensorFlow API. The comprehension in the first np.array() removes instances (rows) that have a missing label.

import numpy as np

# Make a numpy array from the dataframe, except remove rows with no value for 'default'
i = list(df_data_1.columns.values).index('default')
data = np.array([x for x in df_data_1.values if x[i] in ['0', '1']])

# Remove the columns for preddef1, predef2 and preddef3
data = np.delete(data, slice(9,12), axis=1)

# Separate the 'predictors' (aka 'features') from the dependent variable (aka 'label') that we will learn how to predict
predictors = np.delete(data, 8, axis=1)
dependent = np.delete(data, slice(0, 8), axis=1)

The next cell reshapes the data just a bit. The predictors are separated from the dependent variable. The labeled data is converted from strings to integers, and the dependent array is flattened to one dimension to match the shape of the data that will come from the neural network output layer. The predictors are converted to all floats to facilitate matrix multiplication with weights and biases within the neural network.

# Convert the label type to numeric categorical representing the classes to predict (binary classfier)
dependent = dependent.astype(int)

# And flatten it to one dimensional for use as the expected output label vector in TensorFlow
dependent = dependent.flatten()

# Convert all the predictors to float to simplify this demo TensorFlow code
predictors = predictors.astype(float)

# Get the shape of the predictors
m, n = predictors.shape

The next cell simply takes the first 500 instances as training data, leaving the remaining 200 instances for a test set. It’s not unusual to randomly select the training and test sets from the given data, but this particular sample was already random. It’s also typical to choose about a 70/30 percent split for training and test, and this code does so, except for rounding to a size divisible by the training batch size we’ll define later. This cell also defines a method that returns batch-sized slices of the training data. If the training data were too large to fit in memory, then this method could instead load data one batch at a time, such as with a SQL query.

m_train = 500
m_test = m - m_train

predictors_train = predictors[:m_train]
dependent_train = dependent[:m_train]

predictors_test = predictors[m_train:]
dependent_test = dependent[m_train:]

# Gets a batch of the training data. 
# NOTE: Rather than loading a whole large data set as above and then taking array slices as done here, 
#       This method can connect to a data source and select just the batch needed.
def get_training_batch(batch_num, batch_size):
    lower = batch_num * (m_train // batch_size)
    upper = lower + batch_size
    return predictors_train[lower:upper], dependent_train[lower:upper]

Now we’re set to start with some actual TensorFlow code. This next cell imports TensorFlow, makes a few useful initializations, and then defines a method that will build a neural network layer of a given size, fully connect it to a preceding layer, and set its output activation function.

import tensorflow as tf

# Make this notebook's output stable across runs
tf.reset_default_graph()
tf.set_random_seed(42)
np.random.seed(42)

# A method to build a new neural net layer of a given size, fully connect  
# it to a given preceding layer X, and compute its output Z either with 
# or without (default) an activation function.
# Call with activation=tf.nn.relu or tf.nn.sigmoid or tf.nn.tanh, etc.

def make_nn_layer(layer_name, layer_size, X, activation=None):
    with tf.name_scope(layer_name):
        X_size = int(X.get_shape()[1])
        SD = 2 / np.sqrt(X_size)
        weights = tf.truncated_normal((X_size, layer_size), dtype=tf.float64, stddev=SD)
        W = tf.Variable(weights, name='weights')
        b = tf.Variable(tf.zeros([layer_size], dtype=tf.float64), name='biases')
        Z = tf.matmul(X, W) + b

        if activation is not None:
            return activation(Z)
        else:
            return Z

Now we can add the code cell that builds the neural network structure. In this case, we’re going to have one input layer (X), one hidden layer (hidden1), and one output layer (outputs). The ### comments show how to add more hidden layers, but with this sample data, we’re going to be able to learn everything we can with only one layer. The output layer has two nodes, one for outputting class 0 (the loan applicant won’t default) and the other for class 1 (the loan applicant will default). The ‘y’ variable will be used during training to store the labeled data we expect to match with the output layer.

One line of code that helps makes this tutorial unique is the one that creates a tf.identity() node that gives the name ‘nn_output’. This enables us to save a name for the output layer so that we can recover and use the output layer after a restore.

n_inputs = n
n_hidden1 = n 
### n_hidden2 = n // 2
n_outputs = 2   # Two output classes: defaulting or non-defaulting on loan

X = tf.placeholder(tf.float64, shape=(None, n_inputs), name='X')

with tf.name_scope('nn'):
    hidden1 = make_nn_layer('hidden1', n_hidden1, X, **activation=tf.nn.relu**)
    hidden2 = hidden1
    ### hidden2 = make_nn_layer('hidden2', n_hidden2, hidden1, activation=tf.nn.relu)

    outputs = make_nn_layer('outputs', n_outputs, hidden2) 
    outputs = tf.identity(outputs, "nn_output")

y = tf.placeholder(tf.int64, shape=(None), name='y')

The cell above and the next cell are where most of the hyperparameter tuning occurs. Neural network is just the algorithm. The input parameters passed to a neural network during inference are the feature values. During training, the input parameters are feature values and the expected output labeled data. But the neural network is adaptable beyond those input parameters, and so these configurable parts are called hyperparameters. The number and size of the hidden layers are among the hyperparameters, as is the activation function. For examples, you can try other numbers and sizes of hidden layers, and ‘tanh’ and ‘sigmoid’ are other activation functions to try. However, the given configuration seems to do very near the best on this data.

What we’ve done so far is to create the main part of a TensorFlow compute graph that happens to have the shape needed for a neural network. What we’re going to do in the next cell below is attach two different root nodes to the output layer, one that adds functionality for training and the other for testing. The ‘training_op’ uses the gradient descent method for minimizing loss (of perfect confidence in the correct answers and zero confidence in incorrect answers, where the correct answers are provided by the labeled data that will be in ‘y’).

with tf.name_scope('loss'):
    xentropy = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y, logits=outputs)
    loss = tf.reduce_mean(xentropy, name='l')

learning_rate = 0.01
with tf.name_scope("train"):
    optimizer = tf.train.GradientDescentOptimizer(learning_rate)
    training_op = optimizer.minimize(loss)

with tf.name_scope("test"):
    correct = tf.nn.in_top_k(tf.cast(outputs, tf.float32), y, 1)
    accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))

Now we’re going to do a quick little cell that sets up our ability to save the model once it is trained. You only need to do these mkdir commands the first time you run the notebook, so you may want to put them in a separate cell to make it easier to skip them. Also, in Data Science Experience Local, you only need the second mkdir.

init = tf.global_variables_initializer()
saver = tf.train.Saver()

!mkdir "../datasets"
!mkdir "../datasets/Neural Net"

Now we can have the magic notebook cell that trains and saves the trained model. Each epoch of training exposes the neural net to the entire set of training data. When you see this code run, you will see accuracy increase over the many epochs, just as biological neural networks learn through repetition. For each epoch, we run through the training data in batches, to simulate how we’d handle a larger training set. Each batch of features and corresponding labeled data is fed to the ‘training_op’ root node in the compute graph, which is run by training_session.run().

One aspect of this tutorial that is made evident (relative to other tutorials) is the randomization of the training data that takes place at the beginning of each epoch. This essentially drives different data into the batches in each epoch, which dramatically improves accuracy over a larger number of epochs (though it is much easier programmatically to do this randomization when all data fits into memory).

# This is how many times to use the full set of training data
n_epochs = 3000

# For larger training sets, we can break training into batches so only the
# memory needed to store one batch of training data is used
batch_size = 50

with tf.Session() as training_session:
    init.run()

    for epoch in range(n_epochs):
    
        # Shuffling (across batches) is easier to do for small data sets and
        # helps increase accuracy
        training_set = [[pt_elem, dependent_train[i]] for i, pt_elem in enumerate(predictors_train)]
        np.random.shuffle(training_set)
        predictors_train = [ts_elem[0] for ts_elem in training_set]
        dependent_train = [ts_elem[1] for ts_elem in training_set]
    
        # Loop through the whole training set in batches
        for batch_num in range(m_train // batch_size):
            X_batch, y_batch = get_training_batch(batch_num, batch_size)
            training_session.run(training_op, feed_dict={X: X_batch, y: y_batch})

        if epoch % 100 == 99:
            acc_train = accuracy.eval(feed_dict={X: predictors_train, y: dependent_train})
            acc_test = accuracy.eval(feed_dict={X: predictors_test, y: dependent_test})
            print(epoch+1, "Training accuracy:", acc_train, "Testing accuracy:", acc_test)

    save_path = saver.save(training_session, "../datasets/Neural Net/Neural Net.ckpt")

Yet another reason why this tutorial is unique is that we’ll actually take a little sidebar to understand why, when doing business with a real stakeholder customer, we need to have a second test set, often called a validation set or a blind set. Why do we need a second test set? When I ask this, the usual reply is something like, “I don’t know, to double-check accuracy?” Well, sort of. But if you look at the structure of training, the weights and biases are affected not just by by the training data. Indirectly, they are also affected by the test data because we choose n_epochs to keep running training epochs until we get the best accuracy on the test set. In other words, we’re teaching to the test. The validation set or blind set has no such indirect effect on the weights and biases computed for the neural network. It is simply another test set that, to ensure construct validity, should be randomly from the same pool of data that the training set and test set are randomly selected from. In this way, the validation set is not just the ‘final exam’, it’s the first experience of the real world. Sidebar complete.

Once all training has been done, we save the trained model into the previously created datasets subdirectory. In this sample code, we are only saving at the end, but this same command can be used to save the intermediate results of a very long training run.

The data files that TensorFlow created during the save operation can be transported to a production environment. The neural network can then be restored using the code in the next cell below, and the output layer can be obtained and used for inference (using get_tensor_by_name()). In fact, showing how to do that is part of what makes this tutorial unique, as even the current TensorFlow documentation for save/restore (incorrectly) reuses variables after restore that were defined before save (rather than running variables obtained from the restored graph). The code below also shows how to reference into the hierarchy of a namescope.

As another sidebar unique in this tutorial, note that you can also use this method of naming with tf.identity and then getting the tensor from a restored graph to do transfer learning between neural nets. Specifically, once you create a hidden layer with make_nn_layer(), you can name it with tf.identity. Then, you train and save as shown above. Then, to transfer to a second neural net, you restore the trained and saved one, get the hidden layer by name, attach alternate fully connected hidden layers as needed, and an alternate output layer, and then train the new second neural network using the methodology above. Sidebar complete.

The cell below mocks up having a REST API receiving and converting to numpy a batch of feature instances by simply taking a slice of the test data. With the inference TensorFlow session, the compute graph and the values it contained are restored. After that, we obtain the tensor corresponding to the neural network output layer by using the name we previously assigned. Then, we run the output layer, giving the batch of feature instances to the input layer ‘X’. The predictions of the dependent variable are then obtained by choosing whichever of the two output layer nodes has the higher value.

predictors_received = predictors_test[20:40]

import tensorflow as tf_inference

with tf_inference.Session() as inference_session:
    inf_saver = tf_inference.train.import_meta_graph('../datasets/Neural Net/Neural Net.ckpt.meta')
    inf_saver.restore(inference_session, tf_inference.train.latest_checkpoint('../datasets/Neural Net/'))

    graph = tf_inference.get_default_graph()    
    nn_output = graph.get_tensor_by_name("nn/nn_output:0")

    Z = inference_session.run(nn_output, feed_dict={X: predictors_received})
    dependent_pred = np.argmax(Z, axis=1)

Finally, to add one more feature that makes this tutorial unique, let’s look at how to get the actual confidence values for the predictions. Somehow, this may not have seemed as important when doing the MNIST hand-written digit tutorial, but in a business context it’s important to know how much confidence we have in an answer.

dependent_prob = inference_session.run(tf_inference.nn.softmax(nn_output), feed_dict={X: predictors_received})

confidences = [p[dependent_pred[i]] for i, p in enumerate(dependent_prob)]

To get the confidence values, we do something interesting with the compute graph. Remember, it’s just a compute graph and won’t bite. In this case, we pop a new root node onto the output layer to apply the softmax function, which gives the probability of occurrence of each output value. Then, we do one last comprehension to ferret out the confidences of the predicted labels for each feature instance.

And that’s a wrap. Now, it’s your turn. Go ahead, get started with that free IBM Data Science Experience account so you can amaze your friends and bosses with your newfound TensorFlow AI machine learning data science hyperparametric hyperwizardry. You know you waaaanna!

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Neural Net Notebook

Clone this wiki locally