Overview of Week 4

15 Sep 2019

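Welcome to week 4 of the course! This week, we will learn how to build basic ML models with TensorFlow 2.0. We will be building models for the following problems: image classification, regression, classification of structured data, and text classification.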

As we discussed in Week 1, once a model is built, it will turn out to be in one of three states: (i) overfitting, (ii) underfitting, or (iii) a good fit. We will build a concrete understanding of underfitting and overfitting through practical examples. Here is the video for overfitting and underfitting.
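To make the three outcomes concrete, here is a minimal sketch of how one might read them off the training and validation loss curves. The function name diagnose_fit and both thresholds are hypothetical and chosen only for illustration; they are not part of TensorFlow or of the course notebooks.

```python
def diagnose_fit(train_losses, val_losses, high_loss=0.5, gap=0.1):
    """Roughly classify a training run from its per-epoch loss curves.

    high_loss and gap are hypothetical thresholds chosen for illustration;
    sensible values depend on the loss function and the dataset.
    """
    final_train = train_losses[-1]
    final_val = val_losses[-1]
    if final_train > high_loss:
        # The model cannot even fit the training data well.
        return "underfitting"
    if final_val - final_train > gap:
        # Training loss is low, but validation loss lags far behind.
        return "overfitting"
    return "good fit"


print(diagnose_fit([0.9, 0.8, 0.75], [0.95, 0.85, 0.8]))   # underfitting
print(diagnose_fit([0.6, 0.2, 0.05], [0.6, 0.5, 0.55]))    # overfitting
print(diagnose_fit([0.6, 0.3, 0.2], [0.65, 0.35, 0.25]))   # good fit
```

In practice the same comparison is usually made by eye, by plotting both curves against the epoch number.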

Finally, we often need to save a trained ML model so that we can use it later for prediction or to initialize the weights of a model in the next run. We will cover the mechanism for storing a model in TensorFlow 2.0. We can store only the weights of the model, only its architecture, or both. A model can be stored in a variety of formats, such as JSON, YAML and HDF5. The model can be stored at the end of training or after every few epochs during training. The latter is very useful for models that take a long time to train: we can take an intermediate model and use it to get a sense of how the model is performing on the given task. We will demonstrate how to restore the model from the latest checkpoint or from any earlier checkpoint. Here is the video for saving and restoring models.
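The saving options described above can be sketched as follows. This is a minimal illustration, not the code from the course notebooks: the tiny model, the random data, the temporary directory and the file names are all assumptions. The .weights.h5 suffix is used because recent Keras releases expect it for weight files; older releases also accept it as a plain HDF5 file.

```python
import os
import tempfile

import numpy as np
import tensorflow as tf

# Tiny illustrative network; the layer sizes are arbitrary.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(8, activation="relu"),
    tf.keras.layers.Dense(1),
])

x = np.random.rand(16, 4).astype("float32")
y = np.random.rand(16, 1).astype("float32")
before = model.predict(x, verbose=0)

save_dir = tempfile.mkdtemp()
weights_path = os.path.join(save_dir, "model.weights.h5")

# Architecture only, as a JSON string.
architecture_json = model.to_json()

# Weights only, in HDF5 format.
model.save_weights(weights_path)

# Restore: rebuild the architecture, then load the saved weights into it.
restored = tf.keras.models.model_from_json(architecture_json)
restored.load_weights(weights_path)
after = restored.predict(x, verbose=0)
print(np.allclose(before, after))

# Save the weights at the end of every epoch during training.
checkpoint_cb = tf.keras.callbacks.ModelCheckpoint(
    filepath=os.path.join(save_dir, "epoch-{epoch:02d}.weights.h5"),
    save_weights_only=True,
)
model.compile(optimizer="adam", loss="mse")
model.fit(x, y, epochs=2, verbose=0, callbacks=[checkpoint_cb])
print(sorted(os.listdir(save_dir)))
```

Any of the per-epoch files can later be passed to load_weights to resume from that point in training.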

Note on teaching style/method

This week, we will be writing code to build ML models using the concepts learnt so far in the course. The lectures mainly focus on walking you through the colab notebooks and explaining what the model looks like. We will not re-explain concepts covered earlier in the course; if needed, we advise you to revisit the respective videos. This has been done to make more time for learning new concepts.

Broad steps in training ML model

As discussed in week 1, most of these models will have the following broad steps:

Training Data

| Problem | Dataset | Source | ML problem |
|---|---|---|---|
| Image Classification | Fashion MNIST | keras.datasets.fashion_mnist | Classification |
| Regression | AutoMPG | UCI ML repository | Regression |
| Structured Data | Heart Disease | Cleveland Clinic Foundation | Classification |
| Text Classification | IMDB | tensorflow_datasets | Classification |

As you can see, the datasets come from a mix of sources, which lets us demonstrate how to load data from each of them.
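Whatever the source, the data usually ends up as in-memory arrays that can be wrapped in a tf.data.Dataset. Here is a minimal sketch, using synthetic arrays as a stand-in for a real dataset; the shapes, buffer size and batch size are arbitrary assumptions.

```python
import numpy as np
import tensorflow as tf

# Synthetic stand-in for a real dataset: 100 examples, 5 features, binary labels.
features = np.random.rand(100, 5).astype("float32")
labels = np.random.randint(0, 2, size=(100,))

# Pair each feature row with its label, then shuffle and group into batches.
dataset = (
    tf.data.Dataset.from_tensor_slices((features, labels))
    .shuffle(buffer_size=100)
    .batch(32)
)

# Inspect the first batch.
for batch_features, batch_labels in dataset.take(1):
    print(batch_features.shape, batch_labels.shape)
```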

Data Visualization

We will be using matplotlib to visualize both structured data and images (imshow).

Data Preprocessing

Model Construction

Specification

We fix the architecture of the models, and for these problems we will be using feed-forward neural networks (FFNNs). We will use the tf.keras.models.Sequential model to build FFNNs. A typical model specification looks like this:

keras.Sequential([
    keras.layers.Dense(num_hidden_units, kernel_regularizer=regularization, activation=activation),  # hidden layer
    keras.layers.Dense(num_hidden_units, kernel_regularizer=regularization, activation=activation),  # hidden layer
    keras.layers.Dense(num_output_units, activation=output_activation)                               # output layer
])

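We usually use relu activation in the hidden layers. The activation for the output layer depends on the problem: linear for regression, sigmoid for binary classification, and softmax for multiclass classification.

The number of hidden layers and the number of units within them are hyperparameters, part of the model configuration. The number of units in the output layer depends on the problem: one unit for regression and binary classification, and one unit per class for multiclass classification.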
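As a sketch of how the output layer follows from the problem type, the helper below builds a small FFNN for regression, binary classification or multiclass classification. The helper build_ffnn, its argument names and the default of 16 hidden units are hypothetical; the activation pairings are the standard ones rather than something specific to this course.

```python
import tensorflow as tf

# Output activation per problem type (standard pairings).
OUTPUT_ACTIVATION = {
    "regression": "linear",
    "binary": "sigmoid",
    "multiclass": "softmax",
}


def build_ffnn(problem, num_features, num_classes=1, num_hidden_units=16):
    """Build a small FFNN; all argument names here are illustrative."""
    # Regression and binary classification need one output unit;
    # multiclass classification needs one unit per class.
    num_output_units = num_classes if problem == "multiclass" else 1
    return tf.keras.Sequential([
        tf.keras.Input(shape=(num_features,)),
        tf.keras.layers.Dense(num_hidden_units, activation="relu"),
        tf.keras.layers.Dense(num_hidden_units, activation="relu"),
        tf.keras.layers.Dense(num_output_units, activation=OUTPUT_ACTIVATION[problem]),
    ])


# Example: 784 input features (a flattened 28x28 image) and 10 classes.
model = build_ffnn("multiclass", num_features=784, num_classes=10)
model.summary()
```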

Compilation

After specifying the model, we need to compile it. In this step we specify the loss function, the optimization algorithm, and the metrics to track during model training.

model.compile(optimizer=optimizer,
              loss=loss,
              metrics=list_of_metrics)

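The loss function depends on the problem: mse for regression, binary_crossentropy for binary classification, and sparse_categorical_crossentropy for multiclass classification with integer labels.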
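Here is a minimal sketch of a concrete compile call for a binary classification model. The lookup table LOSS_BY_PROBLEM is a hypothetical convenience, and the input size of 13 features is an arbitrary placeholder.

```python
import tensorflow as tf

# Loss name per problem type, using the string identifiers Keras accepts.
LOSS_BY_PROBLEM = {
    "regression": "mse",
    "binary": "binary_crossentropy",
    "multiclass": "sparse_categorical_crossentropy",  # integer class labels
}

model = tf.keras.Sequential([
    tf.keras.Input(shape=(13,)),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

model.compile(
    optimizer="adam",
    loss=LOSS_BY_PROBLEM["binary"],
    metrics=["accuracy"],
)
```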

Model Training

We perform model training with the model.fit function, to which we pass the training data (features and labels), optional validation data (features and labels), and the number of epochs to train for.
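A minimal sketch of a training call with validation data follows. The synthetic data, the labelling rule and the layer sizes are assumptions made only so that the example is self-contained.

```python
import numpy as np
import tensorflow as tf

# Synthetic binary classification data, for illustration only.
rng = np.random.default_rng(0)
train_x = rng.random((200, 8)).astype("float32")
train_y = (train_x.sum(axis=1) > 4.0).astype("float32")
val_x = rng.random((50, 8)).astype("float32")
val_y = (val_x.sum(axis=1) > 4.0).astype("float32")

model = tf.keras.Sequential([
    tf.keras.Input(shape=(8,)),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

history = model.fit(
    train_x, train_y,
    validation_data=(val_x, val_y),
    epochs=5,
    verbose=0,
)

# history.history maps each tracked quantity to a list with one value per epoch.
print(sorted(history.history.keys()))
```

Comparing the loss and val_loss lists in history.history is one way to spot the overfitting and underfitting discussed earlier.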

Model Evaluation

We use the model.evaluate function to evaluate model performance on test data. We provide both the test features and the test labels to the evaluate function.

Prediction

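Finally, we use the model.predict function to make predictions for new data. Here we pass only the features, and the function returns the model's outputs (for example, class probabilities for a softmax output layer), from which the predicted labels can be derived.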
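The evaluation and prediction steps can be sketched together as below. The model here is untrained and the data is random, so the numbers themselves are meaningless; the point is the shape of the calls and of what they return. The class count and layer sizes are arbitrary assumptions.

```python
import numpy as np
import tensorflow as tf

num_classes = 3
test_x = np.random.rand(20, 4).astype("float32")
test_y = np.random.randint(0, num_classes, size=(20,))

model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(8, activation="relu"),
    tf.keras.layers.Dense(num_classes, activation="softmax"),
])
model.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)

# evaluate returns the loss followed by each compiled metric.
test_loss, test_accuracy = model.evaluate(test_x, test_y, verbose=0)

# predict returns one probability per class for each example;
# argmax turns those probabilities into a predicted class index.
probabilities = model.predict(test_x, verbose=0)
predicted_labels = np.argmax(probabilities, axis=1)
print(probabilities.shape, predicted_labels.shape)
```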

Review

The following is a list of the important functions and concepts to remember from this week:

| Task | API/Function |
|---|---|
| Model specification | tf.keras.Sequential |
| Adding layers to a model | layers, layers.Dense |
| Activation | relu, sigmoid, linear, softmax |
| Loss functions | mse, binary_crossentropy, sparse_categorical_crossentropy |
| Optimizers | adam |
| Metrics | accuracy |
| Data loading | tf.data.Dataset |