Welcome to Week 4 of the course! In this week, we will learn how to build basic ML models with TensorFlow 2.0. We will be building models for the problems listed in the Training Data table below.
As we discussed in Week 1, after building a model we will find that we have achieved one of the following: (i) an overfit model, (ii) an underfit model, or (iii) a just-right fit. We will concretely understand the concepts of underfitting and overfitting through practical examples. Here is the video for overfitting and underfitting.
Finally, we often need to save the trained ML model so that we can use it later for prediction, or use it to initialize the weights of a model in the next run. We will cover the mechanisms for storing a model in TensorFlow 2.0. We can store only the weights of the model, only the architecture of the model, or both. There is a variety of formats in which we can store the model, such as JSON, YAML and HDF5. The model can be stored at the end of training or after every few epochs during training. The latter is very useful for models that take a long time to train: we can take an intermediate model and use it to get a sense of how the model is performing on the given task. We will demonstrate how to restore the model from the latest checkpoint or from any checkpoint in the past. Here is the video for saving and restoring models.
Note on teaching style/method
In this week, we will be writing code for building ML models using the concepts learnt so far in the course. The lectures mainly focus on walking you through the Colab notebooks and explaining what each model looks like. We will not be re-explaining concepts covered earlier in the course, and we advise you to revisit the respective videos as needed. This is in order to make more time for learning new concepts.
Broad steps in training ML model
As discussed in week 1, most of these models will have the following broad steps:
- Load training data from files or from built-in datasets in Keras or TensorFlow Datasets.
- Data exploration, including visualization, dataset statistics, etc.
- Data preprocessing, which involves normalization, outlier removal, etc.
- Model construction: we choose an appropriate model depending on the problem class and the findings of data exploration. There are two steps in the `tf.keras` API: (i) model building, where we choose the model and write code to build it, and (ii) model compilation, where we specify the loss function, the optimization algorithm to use for training, and the metrics to track during the training process.
- Model training, by specifying training data and also validation data for diagnosing problems with training (like underfitting/overfitting, inappropriate learning rates, etc.).
- Model evaluation on test data.
- Prediction on new data.
- Error analysis (in some cases), where we analyze the errors made by the model. This learning is usually fed back into changes to the model to obtain better performance. A minimal end-to-end sketch of these steps follows this list.
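Here is a minimal end-to-end sketch of these steps for the Fashion MNIST problem covered this week. Hyperparameter choices such as the layer sizes and epoch count are illustrative, not prescriptive:

```python
import tensorflow as tf
from tensorflow import keras

# 1. Load training data from a built-in Keras dataset.
(x_train, y_train), (x_test, y_test) = keras.datasets.fashion_mnist.load_data()

# 2-3. Preprocess: scale pixel values to [0, 1].
x_train, x_test = x_train / 255.0, x_test / 255.0

# 4. Model construction: build, then compile.
model = keras.Sequential([
    keras.layers.Flatten(input_shape=(28, 28)),
    keras.layers.Dense(128, activation='relu'),
    keras.layers.Dense(10, activation='softmax')
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# 5. Model training, with a validation split for diagnostics.
model.fit(x_train, y_train, epochs=5, validation_split=0.1)

# 6. Model evaluation on test data.
test_loss, test_acc = model.evaluate(x_test, y_test)

# 7. Prediction on new data (here, the first five test images).
predictions = model.predict(x_test[:5])
```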
Training Data
Problem | Dataset | Source | ML problem |
---|---|---|---|
Image Classification | Fashion MNIST | keras.datasets.fashion_mnist | Classification |
Regression | Auto MPG | UCI ML Repository | Regression |
Structured Data | Heart Disease | Cleveland Clinic Foundation | Classification |
Text Classification | IMDB | tensorflow_datasets | Classification |
As you can see, we have datasets from mixed sources so that we can demonstrate how to load data from different sources.
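For instance, here is a minimal loading sketch for three of these sources. The Auto MPG URL and column names follow the standard UCI ML Repository location and are assumptions in this sketch:

```python
from tensorflow import keras
import tensorflow_datasets as tfds
import pandas as pd

# Built-in Keras dataset (returns NumPy arrays).
(x_train, y_train), (x_test, y_test) = keras.datasets.fashion_mnist.load_data()

# TensorFlow Datasets (returns tf.data.Dataset objects).
imdb_train, imdb_test = tfds.load('imdb_reviews',
                                  split=['train', 'test'],
                                  as_supervised=True)

# CSV-style file from a URL (the Auto MPG data from the UCI ML Repository).
url = 'http://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.data'
column_names = ['MPG', 'Cylinders', 'Displacement', 'Horsepower',
                'Weight', 'Acceleration', 'Model Year', 'Origin']
auto_mpg = pd.read_csv(url, names=column_names, na_values='?',
                       comment='\t', sep=' ', skipinitialspace=True)
```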
Data Visualization
We will be using `matplotlib` for visualization of structured data as well as image-related visualizations (`imshow`).
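For instance, a minimal sketch of displaying one Fashion MNIST training image with `imshow`:

```python
import matplotlib.pyplot as plt
from tensorflow import keras

# Load a built-in image dataset and display the first training image.
(x_train, y_train), _ = keras.datasets.fashion_mnist.load_data()
plt.imshow(x_train[0], cmap='gray')
plt.title(f'Label: {y_train[0]}')
plt.show()
```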
Data Preprocessing
- Structured data: We apply normalization to continuous attributes. We convert discrete attributes into appropriate representations such as one-hot encoding, integer encoding or embeddings (the latter handled as part of the model). A minimal normalization sketch follows this list.
- Image data: We augment image data through rotation, translation and by adding noise. This enables us to obtain more training examples from the existing ones.
- Text data: We construct a vocabulary and discard infrequent words. We also transform strings into an appropriate feature representation such as integer encoding or one-hot encoding.
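Here is a minimal sketch of the normalization step for continuous attributes; the feature values are illustrative placeholders:

```python
import numpy as np

# Hypothetical continuous features; in practice these come from the dataset.
train_features = np.array([[1.0, 200.0], [2.0, 300.0], [3.0, 400.0]])

# Normalize each column to zero mean and unit variance, using
# statistics computed on the training data only.
mean = train_features.mean(axis=0)
std = train_features.std(axis=0)
normalized = (train_features - mean) / std
```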
Model Construction
Specification
We fix the architecture of the models, and for these problems we will be using feed-forward neural networks (FFNN). We will be using the `tf.keras.models.Sequential` model for building FFNNs. A typical model specification looks like this:
```python
keras.Sequential([
    keras.layers.Dense(num_hidden_units, kernel_regularizer=regularization, activation=activation),  # hidden layer
    keras.layers.Dense(num_hidden_units, kernel_regularizer=regularization, activation=activation),  # hidden layer
    keras.layers.Dense(num_output_units, activation=output_activation)  # output layer
])
```
We usually use `relu` activation in the hidden layers. The activation for the output layer depends on the problem:
- For regression problems, we use `linear` activation, which is the default activation.
- For binary classification problems, we use `sigmoid` activation.
- For multi-class classification problems, we use `softmax` activation, which gives us a probability distribution over all labels.
The number of hidden layers and units within them are part of model configuration or hyperparameters. The number of units in the output layer depends on the problem:
- For a single output regression or binary classification problem, we have a single unit in the output layer.
- For a multiclass classification problem, like MNIST or Fashion MNIST, the number of units in the output layer equals the number of classes, as in the sketch following this list.
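For instance, a minimal sketch of such a model for Fashion MNIST (10 classes); the `Flatten` input layer and hidden-layer width of 128 are illustrative assumptions:

```python
from tensorflow import keras

model = keras.Sequential([
    keras.layers.Flatten(input_shape=(28, 28)),   # flatten 28x28 images into vectors
    keras.layers.Dense(128, activation='relu'),   # hidden layer
    keras.layers.Dense(10, activation='softmax')  # output layer: one unit per class
])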
Compilation
After specifying the model, we need to compile it, specifying the loss function, the optimization algorithm and the metrics to track during model training.
model.compile(optimizer=optimizer,
loss=loss,
metrics=list_of_metrics)
Depending on the problem, we provide the loss function:
- For regression, we use mean squared error (`mse`) or mean absolute error (`mae`).
- For binary classification, we use `binary_crossentropy`.
- For multi-class classification, we use either `categorical_crossentropy` or `sparse_categorical_crossentropy`. A compilation sketch follows this list.
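For instance, a minimal sketch compiling the multi-class Fashion MNIST model above; the optimizer and metric choices follow the Review table at the end of this page:

```python
# Integer labels, so we use the sparse variant of categorical cross-entropy.
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# A regression model might instead be compiled with:
# model.compile(optimizer='adam', loss='mse', metrics=['mae'])
```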
Model Training
We perform model training with the `model.fit` function, where we provide the training data (features and labels), optionally validation data (features and labels), and the number of epochs for training.
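A minimal sketch, assuming `x_train` and `y_train` are the preprocessed Fashion MNIST arrays from the earlier sketches; the epoch count and validation split are illustrative:

```python
# Train for 10 epochs, holding out 10% of the training data for validation.
history = model.fit(x_train, y_train,
                    epochs=10,
                    validation_split=0.1)
```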
Model Evaluation
We use the `model.evaluate` function for evaluating model performance on test data. We provide both the test data and the labels to the evaluate function.
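A minimal sketch, assuming the compiled model and test arrays from the earlier sketches:

```python
# Returns the loss followed by each metric specified at compile time.
test_loss, test_acc = model.evaluate(x_test, y_test)
print(f'Test accuracy: {test_acc:.3f}')
```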
Prediction
Finally, we use the `model.predict` function to predict labels for new data. Here we only provide the data, and the function returns its predictions; for a softmax classifier this is a probability distribution over the labels, from which we take the most probable class.
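A minimal sketch, assuming the multi-class model and preprocessed test arrays from the earlier sketches:

```python
import numpy as np

# Predict class probabilities for new images, then take the
# most probable class for each.
probabilities = model.predict(x_test[:5])
predicted_labels = np.argmax(probabilities, axis=1)
```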
Review
The following is the list of important functions and concepts to remember from this week:
Task | API/Function |
---|---|
Model Specification | tf.keras.Sequential |
Add layers to model | layers, layers.Dense |
Activation | relu , sigmoid , linear , softmax |
Loss functions | mse, binary_crossentropy, sparse_categorical_crossentropy |
Optimizers | adam |
Metrics | accuracy |
Data load | tf.data.Dataset |