Preliminary setup¶
Before we delve into coding, you need to prepare your own PC, or your account on an HPC system, so that you can run the code.
The instructions below are only some highlights. I assume you already know how to install the rest (for example, Jupyter Notebook), or how to run a Python script without a Jupyter notebook. I also assume you are familiar with the terminal and Linux.
Anaconda/Miniconda¶
This recipe is intended for Linux, specifically Ubuntu 16.04 or higher (64-bit). If you are using Windows, please consider installing WSL2. There is no official GPU support for macOS, so please avoid it for GPU work.
1. Install Miniconda
Miniconda is the recommended approach for installing TensorFlow with GPU support. It creates a separate environment to avoid changing any installed software in your system. This is also the easiest way to install the required software especially for the GPU setup.
You can use the following command to install Miniconda. During installation, you may need to press enter and type “yes”.
curl https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -o Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh
You may need to restart your terminal or source ~/.bashrc to enable the conda command. Use conda -V to test if it is installed successfully.
2. Create a conda environment
Create a new conda environment named qml with the following command.
conda create --name qml python=3.8
You can deactivate and activate it with the following commands.
conda deactivate
conda activate qml
Make sure it is activated for the rest of the installation.
3. GPU setup
You can skip this section if you only run TensorFlow on the CPU.
First install the NVIDIA GPU driver if you have not. You can use the following command to verify it is installed.
nvidia-smi
Then install CUDA and cuDNN with conda.
conda install -c conda-forge cudatoolkit=11.2 cudnn=8.1.0
Configure the system paths. You can do it with the following command every time you start a new terminal, after activating your conda environment.
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CONDA_PREFIX/lib/
For your convenience it is recommended that you automate it with the following commands. The system paths will be automatically configured when you activate this conda environment.
mkdir -p $CONDA_PREFIX/etc/conda/activate.d
echo 'export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CONDA_PREFIX/lib/' > $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh
4. Install TensorFlow
TensorFlow requires a recent version of pip, so upgrade your pip installation to be sure you’re running the latest version.
pip install --upgrade pip
Then, install TensorFlow with pip. Note: Do not install TensorFlow with conda. It may not have the latest stable version. pip is recommended since TensorFlow is only officially released to PyPI.
pip install tensorflow
5. Verify install
Verify the CPU setup:
python3 -c "import tensorflow as tf; print(tf.reduce_sum(tf.random.normal([1000, 1000])))"
If a tensor is returned, you’ve installed TensorFlow successfully.
Verify the GPU setup:
python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
If a list of GPU devices is returned, you’ve installed TensorFlow successfully.
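If you also want to confirm that operations are actually placed on the GPU rather than silently falling back to the CPU, the following short Python check is a minimal sketch (not part of the official instructions); the exact log messages vary between TensorFlow versions.
import tensorflow as tf
# Print the device chosen for each op (lines ending in .../device:GPU:0).
tf.debugging.set_log_device_placement(True)
a = tf.random.normal([1000, 1000])
b = tf.random.normal([1000, 1000])
c = tf.matmul(a, b)
print(c.device)  # should end with GPU:0 if the GPU setup above worked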
We will install other packages as we progress.
Apptainer (formerly Singularity)¶
Apptainer/Singularity is the most widely used container system for HPC. It is designed to execute applications at bare-metal performance while being secure, portable, and 100% reproducible. Apptainer is an open-source project with a friendly community of developers and users. The user base continues to expand, with Apptainer/Singularity now used across industry and academia in many areas.
Apptainer/Singularity is already installed on carbon.physics.metu.edu.tr, so you can start using it immediately. If you want to install it on your own PC, the instructions are at https://docs.sylabs.io/guides/3.0/user-guide/quick_start.html
NVIDIA kindly provides optimized containers for numerous academic software packages. Please check out the NGC Catalog.
On carbon, you can find the TensorFlow containers at /share/apps/singularity-containers/
If you want to “pull”, i.e. download and compile, a container from NGC, try something like singularity pull tensorflow-22.09-tf1-py3.sif docker://nvcr.io/nvidia/tensorflow:22.09-tf1-py3. This will download (a lot) and compile (a lot) to produce a .sif image for you. Then you can run this image with singularity run --nv '-B<your working directory>:/host_pwd' --pwd /host_pwd tensorflow-22.09-tf1-py3.sif
Running a Singularity container in an HPC environment is similar to running it on your own computer. An example script can be found here.
Basic regression: Predict fuel efficiency¶
In a regression problem, the aim is to predict the output of a continuous value, like a price or a probability. Contrast this with a classification problem, where the aim is to select a class from a list of classes (for example, given a picture that contains an apple or an orange, recognizing which fruit is in the picture).
This tutorial uses the classic Auto MPG dataset and demonstrates how to build models to predict the fuel efficiency of late-1970s and early-1980s automobiles. To do this, you will provide the models with a description of many automobiles from that time period. This description includes attributes like cylinders, displacement, horsepower, and weight.
This example uses the Keras API. (Visit the Keras tutorials and guides to learn more.)
# Use seaborn for pairplot.
!pip install -q seaborn
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
# Make NumPy printouts easier to read.
np.set_printoptions(precision=3, suppress=True)
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
print(tf.__version__)
2022-10-27 12:48:37.014789: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-10-27 12:48:37.141219: E tensorflow/stream_executor/cuda/cuda_blas.cc:2981] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2022-10-27 12:48:37.713385: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: :/home/obm/Prog/miniconda3/envs/qml/lib/
2022-10-27 12:48:37.713458: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: :/home/obm/Prog/miniconda3/envs/qml/lib/
2022-10-27 12:48:37.713463: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
2.10.0
The Auto MPG dataset¶
The dataset is available from the UCI Machine Learning Repository.
Get the data¶
First download and import the dataset using pandas:
url = 'http://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.data'
column_names = ['MPG', 'Cylinders', 'Displacement', 'Horsepower', 'Weight',
'Acceleration', 'Model Year', 'Origin']
raw_dataset = pd.read_csv(url, names=column_names,
na_values='?', comment='\t',
sep=' ', skipinitialspace=True)
dataset = raw_dataset.copy()
dataset.tail()
| | MPG | Cylinders | Displacement | Horsepower | Weight | Acceleration | Model Year | Origin |
---|---|---|---|---|---|---|---|---|
393 | 27.0 | 4 | 140.0 | 86.0 | 2790.0 | 15.6 | 82 | 1 |
394 | 44.0 | 4 | 97.0 | 52.0 | 2130.0 | 24.6 | 82 | 2 |
395 | 32.0 | 4 | 135.0 | 84.0 | 2295.0 | 11.6 | 82 | 1 |
396 | 28.0 | 4 | 120.0 | 79.0 | 2625.0 | 18.6 | 82 | 1 |
397 | 31.0 | 4 | 119.0 | 82.0 | 2720.0 | 19.4 | 82 | 1 |
Clean the data¶
The dataset contains a few unknown values:
dataset.isna().sum()
MPG 0
Cylinders 0
Displacement 0
Horsepower 6
Weight 0
Acceleration 0
Model Year 0
Origin 0
dtype: int64
Drop those rows to keep this initial tutorial simple:
dataset = dataset.dropna()
The "Origin"
column is categorical, not numeric. So the next step is to one-hot encode the values in the column with pd.get_dummies.
Note: You can set up the tf.keras.Model
to do this kind of transformation for you but that’s beyond the scope of this tutorial. Check out the Classify structured data using Keras preprocessing layers or Load CSV data tutorials for examples.
dataset['Origin'] = dataset['Origin'].map({1: 'USA', 2: 'Europe', 3: 'Japan'})
dataset = pd.get_dummies(dataset, columns=['Origin'], prefix='', prefix_sep='')
dataset.tail()
| | MPG | Cylinders | Displacement | Horsepower | Weight | Acceleration | Model Year | Europe | Japan | USA |
---|---|---|---|---|---|---|---|---|---|---|
393 | 27.0 | 4 | 140.0 | 86.0 | 2790.0 | 15.6 | 82 | 0 | 0 | 1 |
394 | 44.0 | 4 | 97.0 | 52.0 | 2130.0 | 24.6 | 82 | 1 | 0 | 0 |
395 | 32.0 | 4 | 135.0 | 84.0 | 2295.0 | 11.6 | 82 | 0 | 0 | 1 |
396 | 28.0 | 4 | 120.0 | 79.0 | 2625.0 | 18.6 | 82 | 0 | 0 | 1 |
397 | 31.0 | 4 | 119.0 | 82.0 | 2720.0 | 19.4 | 82 | 0 | 0 | 1 |
Split the data into training and test sets¶
Now, split the dataset into a training set and a test set. You will use the test set in the final evaluation of your models.
train_dataset = dataset.sample(frac=0.8, random_state=0)
test_dataset = dataset.drop(train_dataset.index)
Inspect the data¶
Review the joint distribution of a few pairs of columns from the training set.
The top row suggests that the fuel efficiency (MPG) is a function of all the other parameters. The other rows indicate they are functions of each other.
sns.pairplot(train_dataset[['MPG', 'Cylinders', 'Displacement', 'Weight']], diag_kind='kde')
<seaborn.axisgrid.PairGrid at 0x7f8980f1db20>
Let’s also check the overall statistics. Note how each feature covers a very different range:
train_dataset.describe().transpose()
| | count | mean | std | min | 25% | 50% | 75% | max |
---|---|---|---|---|---|---|---|---|
MPG | 314.0 | 23.310510 | 7.728652 | 10.0 | 17.00 | 22.0 | 28.95 | 46.6 |
Cylinders | 314.0 | 5.477707 | 1.699788 | 3.0 | 4.00 | 4.0 | 8.00 | 8.0 |
Displacement | 314.0 | 195.318471 | 104.331589 | 68.0 | 105.50 | 151.0 | 265.75 | 455.0 |
Horsepower | 314.0 | 104.869427 | 38.096214 | 46.0 | 76.25 | 94.5 | 128.00 | 225.0 |
Weight | 314.0 | 2990.251592 | 843.898596 | 1649.0 | 2256.50 | 2822.5 | 3608.00 | 5140.0 |
Acceleration | 314.0 | 15.559236 | 2.789230 | 8.0 | 13.80 | 15.5 | 17.20 | 24.8 |
Model Year | 314.0 | 75.898089 | 3.675642 | 70.0 | 73.00 | 76.0 | 79.00 | 82.0 |
Europe | 314.0 | 0.178344 | 0.383413 | 0.0 | 0.00 | 0.0 | 0.00 | 1.0 |
Japan | 314.0 | 0.197452 | 0.398712 | 0.0 | 0.00 | 0.0 | 0.00 | 1.0 |
USA | 314.0 | 0.624204 | 0.485101 | 0.0 | 0.00 | 1.0 | 1.00 | 1.0 |
Split features from labels¶
Separate the target value—the “label”—from the features. This label is the value that you will train the model to predict.
train_features = train_dataset.copy()
test_features = test_dataset.copy()
train_labels = train_features.pop('MPG')
test_labels = test_features.pop('MPG')
Normalization¶
In the table of statistics it’s easy to see how different the ranges of each feature are:
train_dataset.describe().transpose()[['mean', 'std']]
| | mean | std |
---|---|---|
MPG | 23.310510 | 7.728652 |
Cylinders | 5.477707 | 1.699788 |
Displacement | 195.318471 | 104.331589 |
Horsepower | 104.869427 | 38.096214 |
Weight | 2990.251592 | 843.898596 |
Acceleration | 15.559236 | 2.789230 |
Model Year | 75.898089 | 3.675642 |
Europe | 0.178344 | 0.383413 |
Japan | 0.197452 | 0.398712 |
USA | 0.624204 | 0.485101 |
It is good practice to normalize features that use different scales and ranges.
One reason this is important is because the features are multiplied by the model weights. So, the scale of the outputs and the scale of the gradients are affected by the scale of the inputs.
Although a model might converge without feature normalization, normalization makes training much more stable.
Note: There is no advantage to normalizing the one-hot features—it is done here for simplicity. For more details on how to use the preprocessing layers, refer to the Working with preprocessing layers guide and the Classify structured data using Keras preprocessing layers tutorial.
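As a minimal sketch (not part of the original tutorial) of what feature normalization does, each column is shifted by its training-set mean and divided by its training-set standard deviation:
import numpy as np
# Hypothetical single feature column, standardized by hand.
x = np.array([46.0, 94.5, 225.0])   # a few Horsepower-like values, for illustration only
x_norm = (x - x.mean()) / x.std()   # result has zero mean and unit variance
print(x_norm)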
The Normalization layer¶
The tf.keras.layers.Normalization
is a clean and simple way to add feature normalization into your model.
The first step is to create the layer:
normalizer = tf.keras.layers.Normalization(axis=-1)
Then, fit the state of the preprocessing layer to the data by calling Normalization.adapt
:
normalizer.adapt(np.array(train_features))
2022-10-27 12:48:41.928551: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1616] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 6911 MB memory: -> device: 0, name: NVIDIA GeForce RTX 2080 Ti, pci bus id: 0000:21:00.0, compute capability: 7.5
Calculate the mean and variance, and store them in the layer:
print(normalizer.mean.numpy())
[[ 5.478 195.318 104.869 2990.252 15.559 75.898 0.178 0.197
0.624]]
When the layer is called, it returns the input data, with each feature independently normalized:
first = np.array(train_features[:1])
with np.printoptions(precision=2, suppress=True):
  print('First example:', first)
  print()
  print('Normalized:', normalizer(first).numpy())
First example: [[ 4. 90. 75. 2125. 14.5 74. 0. 0. 1. ]]
Normalized: [[-0.87 -1.01 -0.79 -1.03 -0.38 -0.52 -0.47 -0.5 0.78]]
Linear regression¶
Before building a deep neural network model, start with linear regression using one and several variables.
Linear regression with one variable¶
Begin with a single-variable linear regression to predict 'MPG'
from 'Horsepower'
.
Training a model with tf.keras
typically starts by defining the model architecture. Use a tf.keras.Sequential
model, which represents a sequence of steps.
There are two steps in your single-variable linear regression model:
Normalize the 'Horsepower' input features using the tf.keras.layers.Normalization preprocessing layer.
Apply a linear transformation (\(y = mx+b\)) to produce 1 output using a linear layer (tf.keras.layers.Dense).
The number of inputs can either be set by the input_shape
argument, or automatically when the model is run for the first time.
First, create a NumPy array made of the 'Horsepower'
features. Then, instantiate the tf.keras.layers.Normalization
and fit its state to the horsepower
data:
horsepower = np.array(train_features['Horsepower'])
horsepower_normalizer = layers.Normalization(input_shape=[1,], axis=None)
horsepower_normalizer.adapt(horsepower)
Build the Keras Sequential model:
horsepower_model = tf.keras.Sequential([
horsepower_normalizer,
layers.Dense(units=1)
])
horsepower_model.summary()
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
normalization_1 (Normalizat (None, 1) 3
ion)
dense (Dense) (None, 1) 2
=================================================================
Total params: 5
Trainable params: 2
Non-trainable params: 3
_________________________________________________________________
This model will predict 'MPG'
from 'Horsepower'
.
Run the untrained model on the first 10 ‘Horsepower’ values. The output won’t be good, but notice that it has the expected shape of (10, 1)
:
horsepower_model.predict(horsepower[:10])
1/1 [==============================] - ETA: 0s
1/1 [==============================] - 1s 826ms/step
array([[-1.176],
[-0.664],
[ 2.171],
[-1.648],
[-1.491],
[-0.585],
[-1.767],
[-1.491],
[-0.389],
[-0.664]], dtype=float32)
Once the model is built, configure the training procedure using the Keras Model.compile
method. The most important arguments to compile are the loss
and the optimizer
, since these define what will be optimized (mean_absolute_error
) and how (using the tf.keras.optimizers.Adam
).
horsepower_model.compile(
optimizer=tf.keras.optimizers.Adam(learning_rate=0.1),
loss='mean_absolute_error')
Use Keras Model.fit
to execute the training for 100 epochs:
%%time
history = horsepower_model.fit(
train_features['Horsepower'],
train_labels,
epochs=100,
# Suppress logging.
verbose=0,
# Calculate validation results on 20% of the training data.
validation_split = 0.2)
/home/obm/Prog/miniconda3/envs/qml/lib/python3.8/site-packages/keras/engine/data_adapter.py:1699: FutureWarning: The behavior of `series[i:j]` with an integer-dtype index is deprecated. In a future version, this will be treated as *label-based* indexing, consistent with e.g. `series[i]` lookups. To retain the old behavior, use `series.iloc[i:j]`. To get the future behavior, use `series.loc[i:j]`.
return t[start:end]
CPU times: user 3.58 s, sys: 796 ms, total: 4.37 s
Wall time: 3.16 s
Visualize the model’s training progress using the stats stored in the history
object:
hist = pd.DataFrame(history.history)
hist['epoch'] = history.epoch
hist.tail()
| | loss | val_loss | epoch |
---|---|---|---|
95 | 3.803624 | 4.189087 | 95 |
96 | 3.804405 | 4.211332 | 96 |
97 | 3.806713 | 4.183324 | 97 |
98 | 3.805672 | 4.184466 | 98 |
99 | 3.802745 | 4.190432 | 99 |
def plot_loss(history):
  plt.plot(history.history['loss'], label='loss')
  plt.plot(history.history['val_loss'], label='val_loss')
  plt.ylim([0, 10])
  plt.xlabel('Epoch')
  plt.ylabel('Error [MPG]')
  plt.legend()
  plt.grid(True)
plot_loss(history)
Collect the results on the test set for later:
test_results = {}
test_results['horsepower_model'] = horsepower_model.evaluate(
test_features['Horsepower'],
test_labels, verbose=0)
Since this is a single variable regression, it’s easy to view the model’s predictions as a function of the input:
x = tf.linspace(0.0, 250, 251)
y = horsepower_model.predict(x)
1/8 [==>...........................] - ETA: 0s
8/8 [==============================] - 0s 971us/step
def plot_horsepower(x, y):
  plt.scatter(train_features['Horsepower'], train_labels, label='Data')
  plt.plot(x, y, color='k', label='Predictions')
  plt.xlabel('Horsepower')
  plt.ylabel('MPG')
  plt.legend()
plot_horsepower(x, y)
Linear regression with multiple inputs¶
You can use an almost identical setup to make predictions based on multiple inputs. This model still does the same \(y = mx+b\) except that \(m\) is a matrix and \(b\) is a vector.
Create a two-step Keras Sequential model again with the first layer being normalizer
(tf.keras.layers.Normalization(axis=-1)
) you defined earlier and adapted to the whole dataset:
linear_model = tf.keras.Sequential([
normalizer,
layers.Dense(units=1)
])
When you call Model.predict
on a batch of inputs, it produces units=1
outputs for each example:
linear_model.predict(train_features[:10])
1/1 [==============================] - ETA: 0s
1/1 [==============================] - 0s 30ms/step
array([[ 1.233],
[ 0.684],
[-1.213],
[ 2.07 ],
[ 1.148],
[ 0.034],
[ 1.103],
[-1.954],
[-0.219],
[ 0.038]], dtype=float32)
When you call the model, its weight matrices will be built—check that the kernel
weights (the \(m\) in \(y=mx+b\)) have a shape of (9, 1)
:
linear_model.layers[1].kernel
<tf.Variable 'dense_1/kernel:0' shape=(9, 1) dtype=float32, numpy=
array([[-0.417],
[ 0.115],
[-0.235],
[-0.45 ],
[-0.605],
[ 0.362],
[-0.222],
[ 0.462],
[ 0.544]], dtype=float32)>
Configure the model with Keras Model.compile
and train with Model.fit
for 100 epochs:
linear_model.compile(
optimizer=tf.keras.optimizers.Adam(learning_rate=0.1),
loss='mean_absolute_error')
%%time
history = linear_model.fit(
train_features,
train_labels,
epochs=100,
# Suppress logging.
verbose=0,
# Calculate validation results on 20% of the training data.
validation_split = 0.2)
/home/obm/Prog/miniconda3/envs/qml/lib/python3.8/site-packages/keras/engine/data_adapter.py:1699: FutureWarning: The behavior of `series[i:j]` with an integer-dtype index is deprecated. In a future version, this will be treated as *label-based* indexing, consistent with e.g. `series[i]` lookups. To retain the old behavior, use `series.iloc[i:j]`. To get the future behavior, use `series.loc[i:j]`.
return t[start:end]
CPU times: user 3.44 s, sys: 711 ms, total: 4.15 s
Wall time: 2.85 s
Using all the inputs in this regression model achieves a much lower training and validation error than the horsepower_model
, which had one input:
plot_loss(history)
Collect the results on the test set for later:
test_results['linear_model'] = linear_model.evaluate(
test_features, test_labels, verbose=0)
Regression with a deep neural network (DNN)¶
In the previous section, you implemented two linear models for single and multiple inputs.
Here, you will implement single-input and multiple-input DNN models.
The code is basically the same except the model is expanded to include some “hidden” non-linear layers. The name “hidden” here just means not directly connected to the inputs or outputs.
These models will contain a few more layers than the linear model:
The normalization layer, as before (with horsepower_normalizer for a single-input model and normalizer for a multiple-input model).
Two hidden, non-linear Dense layers with the ReLU (relu) activation function nonlinearity.
A linear Dense single-output layer.
Both models will use the same training procedure, so the compile
method is included in the build_and_compile_model
function below.
def build_and_compile_model(norm):
  model = keras.Sequential([
      norm,
      layers.Dense(64, activation='relu'),
      layers.Dense(64, activation='relu'),
      layers.Dense(1)
  ])
  model.compile(loss='mean_absolute_error',
                optimizer=tf.keras.optimizers.Adam(0.001))
  return model
Regression using a DNN and a single input¶
Create a DNN model with only 'Horsepower'
as input and horsepower_normalizer
(defined earlier) as the normalization layer:
dnn_horsepower_model = build_and_compile_model(horsepower_normalizer)
This model has quite a few more trainable parameters than the linear models:
dnn_horsepower_model.summary()
Model: "sequential_2"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
normalization_1 (Normalizat (None, 1) 3
ion)
dense_2 (Dense) (None, 64) 128
dense_3 (Dense) (None, 64) 4160
dense_4 (Dense) (None, 1) 65
=================================================================
Total params: 4,356
Trainable params: 4,353
Non-trainable params: 3
_________________________________________________________________
Train the model with Keras Model.fit
:
%%time
history = dnn_horsepower_model.fit(
train_features['Horsepower'],
train_labels,
validation_split=0.2,
verbose=0, epochs=100)
/home/obm/Prog/miniconda3/envs/qml/lib/python3.8/site-packages/keras/engine/data_adapter.py:1699: FutureWarning: The behavior of `series[i:j]` with an integer-dtype index is deprecated. In a future version, this will be treated as *label-based* indexing, consistent with e.g. `series[i]` lookups. To retain the old behavior, use `series.iloc[i:j]`. To get the future behavior, use `series.loc[i:j]`.
return t[start:end]
CPU times: user 3.65 s, sys: 725 ms, total: 4.38 s
Wall time: 3.04 s
This model does slightly better than the linear single-input horsepower_model
:
plot_loss(history)
If you plot the predictions as a function of 'Horsepower'
, you should notice how this model takes advantage of the nonlinearity provided by the hidden layers:
x = tf.linspace(0.0, 250, 251)
y = dnn_horsepower_model.predict(x)
1/8 [==>...........................] - ETA: 0s
8/8 [==============================] - 0s 757us/step
plot_horsepower(x, y)
Collect the results on the test set for later:
test_results['dnn_horsepower_model'] = dnn_horsepower_model.evaluate(
test_features['Horsepower'], test_labels,
verbose=0)
Regression using a DNN and multiple inputs¶
Repeat the previous process using all the inputs. The model’s performance slightly improves on the validation dataset.
dnn_model = build_and_compile_model(normalizer)
dnn_model.summary()
Model: "sequential_3"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
normalization (Normalizatio (None, 9) 19
n)
dense_5 (Dense) (None, 64) 640
dense_6 (Dense) (None, 64) 4160
dense_7 (Dense) (None, 1) 65
=================================================================
Total params: 4,884
Trainable params: 4,865
Non-trainable params: 19
_________________________________________________________________
%%time
history = dnn_model.fit(
train_features,
train_labels,
validation_split=0.2,
verbose=0, epochs=100)
CPU times: user 3.55 s, sys: 693 ms, total: 4.24 s
Wall time: 2.94 s
plot_loss(history)
Collect the results on the test set:
test_results['dnn_model'] = dnn_model.evaluate(test_features, test_labels, verbose=0)
Performance¶
Since all models have been trained, you can review their test set performance:
pd.DataFrame(test_results, index=['Mean absolute error [MPG]']).T
| | Mean absolute error [MPG] |
---|---|
horsepower_model | 3.652521 |
linear_model | 2.501625 |
dnn_horsepower_model | 2.879357 |
dnn_model | 1.650163 |
These results match the validation error observed during training.
Make predictions¶
You can now make predictions with the dnn_model
on the test set using Keras Model.predict
and review the loss:
test_predictions = dnn_model.predict(test_features).flatten()
a = plt.axes(aspect='equal')
plt.scatter(test_labels, test_predictions)
plt.xlabel('True Values [MPG]')
plt.ylabel('Predictions [MPG]')
lims = [0, 50]
plt.xlim(lims)
plt.ylim(lims)
_ = plt.plot(lims, lims)
1/3 [=========>....................] - ETA: 0s
3/3 [==============================] - 0s 1ms/step
It appears that the model predicts reasonably well.
Now, check the error distribution:
error = test_predictions - test_labels
plt.hist(error, bins=25)
plt.xlabel('Prediction Error [MPG]')
_ = plt.ylabel('Count')
If you’re happy with the model, save it for later use with Model.save
:
dnn_model.save('dnn_model')
INFO:tensorflow:Assets written to: dnn_model/assets
If you reload the model, it gives identical output:
reloaded = tf.keras.models.load_model('dnn_model')
test_results['reloaded'] = reloaded.evaluate(
test_features, test_labels, verbose=0)
pd.DataFrame(test_results, index=['Mean absolute error [MPG]']).T
| | Mean absolute error [MPG] |
---|---|
horsepower_model | 3.652521 |
linear_model | 2.501625 |
dnn_horsepower_model | 2.879357 |
dnn_model | 1.650163 |
reloaded | 1.650163 |
Takeaways:¶
This part introduced a few techniques to handle a regression problem. Here are a few more tips that may help:
Mean squared error (MSE) (tf.keras.losses.MeanSquaredError) and mean absolute error (MAE) (tf.keras.losses.MeanAbsoluteError) are common loss functions used for regression problems. MAE is less sensitive to outliers (a short compile sketch follows this list). Different loss functions are used for classification problems.
Similarly, evaluation metrics used for regression differ from classification.
When numeric input data features have values with different ranges, each feature should be scaled independently to the same range.
Overfitting is a common problem for DNN models, though it wasn't a problem for this tutorial.
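As a minimal sketch (not part of the original tutorial), here is how the multiple-input model defined earlier could be compiled with MSE as the loss and MAE as a metric; it assumes the normalizer, layers, and training data from the cells above.
# Hedged example: same architecture as build_and_compile_model, trained with MSE instead of MAE.
mse_model = tf.keras.Sequential([
    normalizer,
    layers.Dense(64, activation='relu'),
    layers.Dense(64, activation='relu'),
    layers.Dense(1)
])
mse_model.compile(loss=tf.keras.losses.MeanSquaredError(),
                  optimizer=tf.keras.optimizers.Adam(0.001),
                  metrics=[tf.keras.metrics.MeanAbsoluteError()])
# mse_model.fit(train_features, train_labels, validation_split=0.2, epochs=100, verbose=0)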
DO IT YOURSELF 1:¶
Please repeat the same steps as above, using the superconductivity dataset. Write it as a standalone Python script and run it on carbon.physics.metu.edu.tr using Singularity.
Overfit and underfit¶
As always, the code in this example will use the tf.keras
API, which you can learn more about in the TensorFlow Keras guide.
In both of the previous examples—classifying text and predicting fuel efficiency—the accuracy of models on the validation data would peak after training for a number of epochs and then stagnate or start decreasing.
In other words, your model would overfit to the training data. Learning how to deal with overfitting is important. Although it’s often possible to achieve high accuracy on the training set, what you really want is to develop models that generalize well to a testing set (or data they haven’t seen before).
The opposite of overfitting is underfitting. Underfitting occurs when there is still room for improvement on the training data. This can happen for a number of reasons: the model is not powerful enough, it is over-regularized, or it has simply not been trained long enough. It means the network has not learned the relevant patterns in the training data.
If you train for too long though, the model will start to overfit and learn patterns from the training data that don’t generalize to the test data. You need to strike a balance. Understanding how to train for an appropriate number of epochs as you’ll explore below is a useful skill.
To prevent overfitting, the best solution is to use more complete training data. The dataset should cover the full range of inputs that the model is expected to handle. Additional data may only be useful if it covers new and interesting cases.
A model trained on more complete data will naturally generalize better. When that is no longer possible, the next best solution is to use techniques like regularization. These place constraints on the quantity and type of information your model can store. If a network can only afford to memorize a small number of patterns, the optimization process will force it to focus on the most prominent patterns, which have a better chance of generalizing well.
In this notebook, you’ll explore several common regularization techniques, and use them to improve on a classification model.
Setup¶
Before getting started, import the necessary packages:
import tensorflow as tf
from tensorflow.keras import layers
from tensorflow.keras import regularizers
print(tf.__version__)
2.10.0
!pip install git+https://github.com/tensorflow/docs
import tensorflow_docs as tfdocs
import tensorflow_docs.modeling
import tensorflow_docs.plots
Collecting git+https://github.com/tensorflow/docs
Cloning https://github.com/tensorflow/docs to /tmp/pip-req-build-9_r6grn8
Running command git clone --filter=blob:none --quiet https://github.com/tensorflow/docs /tmp/pip-req-build-9_r6grn8
Resolved https://github.com/tensorflow/docs to commit 48bbec70106050fedf462e5f79993aceef114cfb
Preparing metadata (setup.py) ... done
Requirement already satisfied: astor in /home/obm/Prog/miniconda3/envs/qml/lib/python3.8/site-packages (from tensorflow-docs==0.0.0.dev0) (0.8.1)
Requirement already satisfied: absl-py in /home/obm/Prog/miniconda3/envs/qml/lib/python3.8/site-packages (from tensorflow-docs==0.0.0.dev0) (1.3.0)
Requirement already satisfied: jinja2 in /home/obm/Prog/miniconda3/envs/qml/lib/python3.8/site-packages (from tensorflow-docs==0.0.0.dev0) (3.0.3)
Requirement already satisfied: nbformat in /home/obm/Prog/miniconda3/envs/qml/lib/python3.8/site-packages (from tensorflow-docs==0.0.0.dev0) (5.5.0)
Requirement already satisfied: protobuf<3.20,>=3.12.0 in /home/obm/Prog/miniconda3/envs/qml/lib/python3.8/site-packages (from tensorflow-docs==0.0.0.dev0) (3.19.6)
Requirement already satisfied: pyyaml in /home/obm/Prog/miniconda3/envs/qml/lib/python3.8/site-packages (from tensorflow-docs==0.0.0.dev0) (6.0)
Requirement already satisfied: MarkupSafe>=2.0 in /home/obm/Prog/miniconda3/envs/qml/lib/python3.8/site-packages (from jinja2->tensorflow-docs==0.0.0.dev0) (2.1.1)
Requirement already satisfied: jsonschema>=2.6 in /home/obm/Prog/miniconda3/envs/qml/lib/python3.8/site-packages (from nbformat->tensorflow-docs==0.0.0.dev0) (4.16.0)
Requirement already satisfied: fastjsonschema in /home/obm/Prog/miniconda3/envs/qml/lib/python3.8/site-packages (from nbformat->tensorflow-docs==0.0.0.dev0) (2.16.2)
Requirement already satisfied: jupyter_core in /home/obm/Prog/miniconda3/envs/qml/lib/python3.8/site-packages (from nbformat->tensorflow-docs==0.0.0.dev0) (4.11.1)
Requirement already satisfied: traitlets>=5.1 in /home/obm/Prog/miniconda3/envs/qml/lib/python3.8/site-packages (from nbformat->tensorflow-docs==0.0.0.dev0) (5.1.1)
Requirement already satisfied: importlib-resources>=1.4.0 in /home/obm/Prog/miniconda3/envs/qml/lib/python3.8/site-packages (from jsonschema>=2.6->nbformat->tensorflow-docs==0.0.0.dev0) (5.2.0)
Requirement already satisfied: attrs>=17.4.0 in /home/obm/Prog/miniconda3/envs/qml/lib/python3.8/site-packages (from jsonschema>=2.6->nbformat->tensorflow-docs==0.0.0.dev0) (21.4.0)
Requirement already satisfied: pyrsistent!=0.17.0,!=0.17.1,!=0.17.2,>=0.14.0 in /home/obm/Prog/miniconda3/envs/qml/lib/python3.8/site-packages (from jsonschema>=2.6->nbformat->tensorflow-docs==0.0.0.dev0) (0.18.0)
Requirement already satisfied: pkgutil-resolve-name>=1.3.10 in /home/obm/Prog/miniconda3/envs/qml/lib/python3.8/site-packages (from jsonschema>=2.6->nbformat->tensorflow-docs==0.0.0.dev0) (1.3.10)
Requirement already satisfied: zipp>=3.1.0 in /home/obm/Prog/miniconda3/envs/qml/lib/python3.8/site-packages (from importlib-resources>=1.4.0->jsonschema>=2.6->nbformat->tensorflow-docs==0.0.0.dev0) (3.9.0)
from IPython import display
from matplotlib import pyplot as plt
import numpy as np
import pathlib
import shutil
import tempfile
logdir = pathlib.Path(tempfile.mkdtemp())/"tensorboard_logs"
shutil.rmtree(logdir, ignore_errors=True)
The Higgs dataset¶
The goal of this tutorial is not to do particle physics, so don’t dwell on the details of the dataset. It contains 11,000,000 examples, each with 28 features, and a binary class label.
gz = tf.keras.utils.get_file('HIGGS.csv.gz', 'http://mlphysics.ics.uci.edu/data/higgs/HIGGS.csv.gz')
FEATURES = 28
The tf.data.experimental.CsvDataset
class can be used to read csv records directly from a gzip file with no intermediate decompression step.
ds = tf.data.experimental.CsvDataset(gz,[float(),]*(FEATURES+1), compression_type="GZIP")
That csv reader class returns a list of scalars for each record. The following function repacks that list of scalars into a (feature_vector, label) pair.
def pack_row(*row):
  label = row[0]
  features = tf.stack(row[1:],1)
  return features, label
TensorFlow is most efficient when operating on large batches of data.
So, instead of repacking each row individually make a new tf.data.Dataset
that takes batches of 10,000 examples, applies the pack_row
function to each batch, and then splits the batches back up into individual records:
packed_ds = ds.batch(10000).map(pack_row).unbatch()
Inspect some of the records from this new packed_ds
.
The features are not perfectly normalized, but this is sufficient for this tutorial.
for features,label in packed_ds.batch(1000).take(1):
  print(features[0])
  plt.hist(features.numpy().flatten(), bins = 101)
tf.Tensor(
[ 0.869 -0.635 0.226 0.327 -0.69 0.754 -0.249 -1.092 0. 1.375
-0.654 0.93 1.107 1.139 -1.578 -1.047 0. 0.658 -0.01 -0.046
3.102 1.354 0.98 0.978 0.92 0.722 0.989 0.877], shape=(28,), dtype=float32)
To keep this tutorial relatively short, use just the first 1,000 samples for validation, and the next 10,000 for training:
N_VALIDATION = int(1e3)
N_TRAIN = int(1e4)
BUFFER_SIZE = int(1e4)
BATCH_SIZE = 500
STEPS_PER_EPOCH = N_TRAIN//BATCH_SIZE
The Dataset.skip
and Dataset.take
methods make this easy.
At the same time, use the Dataset.cache
method to ensure that the loader doesn’t need to re-read the data from the file on each epoch:
validate_ds = packed_ds.take(N_VALIDATION).cache()
train_ds = packed_ds.skip(N_VALIDATION).take(N_TRAIN).cache()
train_ds
<CacheDataset element_spec=(TensorSpec(shape=(28,), dtype=tf.float32, name=None), TensorSpec(shape=(), dtype=tf.float32, name=None))>
These datasets return individual examples. Use the Dataset.batch
method to create batches of an appropriate size for training. Before batching, also remember to use Dataset.shuffle
and Dataset.repeat
on the training set.
validate_ds = validate_ds.batch(BATCH_SIZE)
train_ds = train_ds.shuffle(BUFFER_SIZE).repeat().batch(BATCH_SIZE)
Demonstrate overfitting¶
The simplest way to prevent overfitting is to start with a small model: A model with a small number of learnable parameters (which is determined by the number of layers and the number of units per layer). In deep learning, the number of learnable parameters in a model is often referred to as the model’s “capacity”.
Intuitively, a model with more parameters will have more “memorization capacity” and therefore will be able to easily learn a perfect dictionary-like mapping between training samples and their targets, a mapping without any generalization power, but this would be useless when making predictions on previously unseen data.
Always keep this in mind: deep learning models tend to be good at fitting to the training data, but the real challenge is generalization, not fitting.
On the other hand, if the network has limited memorization resources, it will not be able to learn the mapping as easily. To minimize its loss, it will have to learn compressed representations that have more predictive power. At the same time, if you make your model too small, it will have difficulty fitting to the training data. There is a balance between “too much capacity” and “not enough capacity”.
Unfortunately, there is no magical formula to determine the right size or architecture of your model (in terms of the number of layers, or the right size for each layer). You will have to experiment using a series of different architectures.
To find an appropriate model size, it’s best to start with relatively few layers and parameters, then begin increasing the size of the layers or adding new layers until you see diminishing returns on the validation loss.
Start with a simple model using only densely-connected layers (tf.keras.layers.Dense
) as a baseline, then create larger models, and compare them.
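As a quick sanity check on the notion of capacity, the parameter count of a Dense layer can be computed by hand: inputs × units weights plus one bias per unit. This minimal sketch reproduces the counts that appear in the model summaries below.
def dense_params(n_inputs, n_units):
    # weight matrix entries plus one bias per unit
    return n_inputs * n_units + n_units

print(dense_params(28, 16))  # 464, the first layer of the "Tiny" model below
print(dense_params(16, 1))   # 17, its output layer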
Training procedure¶
Many models train better if you gradually reduce the learning rate during training. Use tf.keras.optimizers.schedules
to reduce the learning rate over time:
lr_schedule = tf.keras.optimizers.schedules.InverseTimeDecay(
0.001,
decay_steps=STEPS_PER_EPOCH*1000,
decay_rate=1,
staircase=False)
def get_optimizer():
  return tf.keras.optimizers.Adam(lr_schedule)
The code above sets a tf.keras.optimizers.schedules.InverseTimeDecay
to hyperbolically decrease the learning rate to 1/2 of the base rate at 1,000 epochs, 1/3 at 2,000 epochs, and so on.
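A minimal sketch to check that behaviour numerically, assuming the lr_schedule and STEPS_PER_EPOCH defined above (InverseTimeDecay computes initial_rate / (1 + decay_rate * step / decay_steps)):
for epoch in [0, 1000, 2000, 3000]:
    step = epoch * STEPS_PER_EPOCH
    print(epoch, float(lr_schedule(step)))
# expected roughly: 0.001, 0.0005, 0.000333..., 0.00025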
step = np.linspace(0,100000)
lr = lr_schedule(step)
plt.figure(figsize = (8,6))
plt.plot(step/STEPS_PER_EPOCH, lr)
plt.ylim([0,max(plt.ylim())])
plt.xlabel('Epoch')
_ = plt.ylabel('Learning Rate')
Each model in this tutorial will use the same training configuration. So set these up in a reusable way, starting with the list of callbacks.
The training for this tutorial runs for many short epochs. To reduce the logging noise use the tfdocs.EpochDots
which simply prints a .
for each epoch, and a full set of metrics every 100 epochs.
Next include tf.keras.callbacks.EarlyStopping to avoid long and unnecessary training times. Note that this callback is set to monitor the val_binary_crossentropy, not the val_loss. This difference will be important later: once weight regularization is added, val_loss also contains the penalty terms, while val_binary_crossentropy measures only the prediction error and so stays comparable across models.
Use callbacks.TensorBoard
to generate TensorBoard logs for the training.
def get_callbacks(name):
  return [
    tfdocs.modeling.EpochDots(),
    tf.keras.callbacks.EarlyStopping(monitor='val_binary_crossentropy', patience=200),
    tf.keras.callbacks.TensorBoard(logdir/name),
  ]
Similarly each model will use the same Model.compile
and Model.fit
settings:
def compile_and_fit(model, name, optimizer=None, max_epochs=10000):
  if optimizer is None:
    optimizer = get_optimizer()
  model.compile(optimizer=optimizer,
                loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
                metrics=[
                  tf.keras.losses.BinaryCrossentropy(
                      from_logits=True, name='binary_crossentropy'),
                  'accuracy'])

  model.summary()

  history = model.fit(
    train_ds,
    steps_per_epoch = STEPS_PER_EPOCH,
    epochs=max_epochs,
    validation_data=validate_ds,
    callbacks=get_callbacks(name),
    verbose=0)
  return history
Tiny model¶
Start by training a model:
tiny_model = tf.keras.Sequential([
layers.Dense(16, activation='elu', input_shape=(FEATURES,)),
layers.Dense(1)
])
size_histories = {}
size_histories['Tiny'] = compile_and_fit(tiny_model, 'sizes/Tiny')
Model: "sequential_4"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense_8 (Dense) (None, 16) 464
dense_9 (Dense) (None, 1) 17
=================================================================
Total params: 481
Trainable params: 481
Non-trainable params: 0
_________________________________________________________________
Epoch: 0, accuracy:0.5086, binary_crossentropy:0.7727, loss:0.7727, val_accuracy:0.5040, val_binary_crossentropy:0.7448, val_loss:0.7448,
....................................................................................................
Epoch: 100, accuracy:0.6019, binary_crossentropy:0.6234, loss:0.6234, val_accuracy:0.5760, val_binary_crossentropy:0.6287, val_loss:0.6287,
....................................................................................................
Epoch: 200, accuracy:0.6385, binary_crossentropy:0.6065, loss:0.6065, val_accuracy:0.6270, val_binary_crossentropy:0.6056, val_loss:0.6056,
....................................................................................................
Epoch: 300, accuracy:0.6511, binary_crossentropy:0.5959, loss:0.5959, val_accuracy:0.6450, val_binary_crossentropy:0.5913, val_loss:0.5913,
....................................................................................................
Epoch: 400, accuracy:0.6614, binary_crossentropy:0.5888, loss:0.5888, val_accuracy:0.6610, val_binary_crossentropy:0.5833, val_loss:0.5833,
....................................................................................................
Epoch: 500, accuracy:0.6703, binary_crossentropy:0.5849, loss:0.5849, val_accuracy:0.6760, val_binary_crossentropy:0.5810, val_loss:0.5810,
....................................................................................................
Epoch: 600, accuracy:0.6728, binary_crossentropy:0.5821, loss:0.5821, val_accuracy:0.6500, val_binary_crossentropy:0.5808, val_loss:0.5808,
....................................................................................................
Epoch: 700, accuracy:0.6802, binary_crossentropy:0.5795, loss:0.5795, val_accuracy:0.6710, val_binary_crossentropy:0.5779, val_loss:0.5779,
....................................................................................................
Epoch: 800, accuracy:0.6777, binary_crossentropy:0.5777, loss:0.5777, val_accuracy:0.6790, val_binary_crossentropy:0.5776, val_loss:0.5776,
....................................................................................................
Epoch: 900, accuracy:0.6845, binary_crossentropy:0.5758, loss:0.5758, val_accuracy:0.6790, val_binary_crossentropy:0.5782, val_loss:0.5782,
..................................................
Now check how the model did:
plotter = tfdocs.plots.HistoryPlotter(metric = 'binary_crossentropy', smoothing_std=10)
plotter.plot(size_histories)
plt.ylim([0.5, 0.7])
(0.5, 0.7)
Small model¶
To check if you can beat the performance of the "Tiny" model, progressively train some larger models.
Try two hidden layers with 16 units each:
small_model = tf.keras.Sequential([
# `input_shape` is only required here so that `.summary` works.
layers.Dense(16, activation='elu', input_shape=(FEATURES,)),
layers.Dense(16, activation='elu'),
layers.Dense(1)
])
size_histories['Small'] = compile_and_fit(small_model, 'sizes/Small')
Model: "sequential_5"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense_10 (Dense) (None, 16) 464
dense_11 (Dense) (None, 16) 272
dense_12 (Dense) (None, 1) 17
=================================================================
Total params: 753
Trainable params: 753
Non-trainable params: 0
_________________________________________________________________
Epoch: 0, accuracy:0.4784, binary_crossentropy:0.7893, loss:0.7893, val_accuracy:0.4910, val_binary_crossentropy:0.7389, val_loss:0.7389,
....................................................................................................
Epoch: 100, accuracy:0.6241, binary_crossentropy:0.6140, loss:0.6140, val_accuracy:0.6270, val_binary_crossentropy:0.6097, val_loss:0.6097,
....................................................................................................
Epoch: 200, accuracy:0.6642, binary_crossentropy:0.5867, loss:0.5867, val_accuracy:0.6460, val_binary_crossentropy:0.5886, val_loss:0.5886,
....................................................................................................
Epoch: 300, accuracy:0.6755, binary_crossentropy:0.5751, loss:0.5751, val_accuracy:0.6780, val_binary_crossentropy:0.5781, val_loss:0.5781,
....................................................................................................
Epoch: 400, accuracy:0.6827, binary_crossentropy:0.5686, loss:0.5686, val_accuracy:0.6580, val_binary_crossentropy:0.5762, val_loss:0.5762,
....................................................................................................
Epoch: 500, accuracy:0.6842, binary_crossentropy:0.5638, loss:0.5638, val_accuracy:0.6760, val_binary_crossentropy:0.5702, val_loss:0.5702,
....................................................................................................
Epoch: 600, accuracy:0.6924, binary_crossentropy:0.5591, loss:0.5591, val_accuracy:0.6810, val_binary_crossentropy:0.5713, val_loss:0.5713,
....................................................................................................
Epoch: 700, accuracy:0.6942, binary_crossentropy:0.5546, loss:0.5546, val_accuracy:0.6730, val_binary_crossentropy:0.5729, val_loss:0.5729,
....................................................................................................
Epoch: 800, accuracy:0.6973, binary_crossentropy:0.5518, loss:0.5518, val_accuracy:0.6650, val_binary_crossentropy:0.5776, val_loss:0.5776,
..................................
Medium model¶
Now try three hidden layers with 64 units each:
medium_model = tf.keras.Sequential([
layers.Dense(64, activation='elu', input_shape=(FEATURES,)),
layers.Dense(64, activation='elu'),
layers.Dense(64, activation='elu'),
layers.Dense(1)
])
And train the model using the same data:
size_histories['Medium'] = compile_and_fit(medium_model, "sizes/Medium")
Model: "sequential_6"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense_13 (Dense) (None, 64) 1856
dense_14 (Dense) (None, 64) 4160
dense_15 (Dense) (None, 64) 4160
dense_16 (Dense) (None, 1) 65
=================================================================
Total params: 10,241
Trainable params: 10,241
Non-trainable params: 0
_________________________________________________________________
Epoch: 0, accuracy:0.4971, binary_crossentropy:0.6986, loss:0.6986, val_accuracy:0.4890, val_binary_crossentropy:0.6791, val_loss:0.6791,
....................................................................................................
Epoch: 100, accuracy:0.7060, binary_crossentropy:0.5381, loss:0.5381, val_accuracy:0.6780, val_binary_crossentropy:0.6080, val_loss:0.6080,
....................................................................................................
Epoch: 200, accuracy:0.7829, binary_crossentropy:0.4352, loss:0.4352, val_accuracy:0.6390, val_binary_crossentropy:0.6822, val_loss:0.6822,
........................................................................................
Large model¶
As an exercise, you can create an even larger model and check how quickly it begins overfitting. Next, add to this benchmark a network that has much more capacity, far more than the problem would warrant:
large_model = tf.keras.Sequential([
layers.Dense(512, activation='elu', input_shape=(FEATURES,)),
layers.Dense(512, activation='elu'),
layers.Dense(512, activation='elu'),
layers.Dense(512, activation='elu'),
layers.Dense(1)
])
And, again, train the model using the same data:
size_histories['large'] = compile_and_fit(large_model, "sizes/large")
Model: "sequential_7"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense_17 (Dense) (None, 512) 14848
dense_18 (Dense) (None, 512) 262656
dense_19 (Dense) (None, 512) 262656
dense_20 (Dense) (None, 512) 262656
dense_21 (Dense) (None, 1) 513
=================================================================
Total params: 803,329
Trainable params: 803,329
Non-trainable params: 0
_________________________________________________________________
Epoch: 0, accuracy:0.5174, binary_crossentropy:0.8343, loss:0.8343, val_accuracy:0.4600, val_binary_crossentropy:0.6981, val_loss:0.6981,
....................................................................................................
Epoch: 100, accuracy:1.0000, binary_crossentropy:0.0024, loss:0.0024, val_accuracy:0.6720, val_binary_crossentropy:1.7037, val_loss:1.7037,
....................................................................................................
Epoch: 200, accuracy:1.0000, binary_crossentropy:0.0001, loss:0.0001, val_accuracy:0.6680, val_binary_crossentropy:2.3695, val_loss:2.3695,
.................
Plot the training and validation losses¶
The solid lines show the training loss, and the dashed lines show the validation loss (remember: a lower validation loss indicates a better model).
While building a larger model gives it more power, if this power is not constrained somehow it can easily overfit to the training set.
In this example, typically, only the "Tiny"
model manages to avoid overfitting altogether, and each of the larger models overfit the data more quickly. This becomes so severe for the "large"
model that you need to switch the plot to a log-scale to really figure out what’s happening.
This is apparent if you plot and compare the validation metrics to the training metrics.
It’s normal for there to be a small difference.
If both metrics are moving in the same direction, everything is fine.
If the validation metric begins to stagnate while the training metric continues to improve, you are probably close to overfitting.
If the validation metric is going in the wrong direction, the model is clearly overfitting.
plotter.plot(size_histories)
a = plt.xscale('log')
plt.xlim([5, max(plt.xlim())])
plt.ylim([0.5, 0.7])
plt.xlabel("Epochs [Log Scale]")
Text(0.5, 0, 'Epochs [Log Scale]')
Note: All the above training runs used the callbacks.EarlyStopping
to end the training once it was clear the model was not making progress.
View in TensorBoard¶
These models all wrote TensorBoard logs during training.
Open an embedded TensorBoard viewer inside a notebook:
#docs_infra: no_execute
# Load the TensorBoard notebook extension
%load_ext tensorboard
# Open an embedded TensorBoard viewer
%tensorboard --logdir {logdir}/sizes
You can view the results of a previous run of this notebook on TensorBoard.dev.
TensorBoard.dev is a managed experience for hosting, tracking, and sharing ML experiments with everyone.
It’s also included in an <iframe>
for convenience:
display.IFrame(
src="https://tensorboard.dev/experiment/vW7jmmF9TmKmy3rbheMQpw/#scalars&_smoothingWeight=0.97",
width="100%", height="800px")
If you want to share TensorBoard results you can upload the logs to TensorBoard.dev by copying the following into a code-cell.
Note: This step requires a Google account.
!tensorboard dev upload --logdir {logdir}/sizes
Caution: This command does not terminate. It’s designed to continuously upload the results of long-running experiments. Once your data is uploaded you need to stop it using the “interrupt execution” option in your notebook tool.
Strategies to prevent overfitting¶
Before getting into the content of this section, copy the training logs from the "Tiny" model above to use as a baseline for comparison.
shutil.rmtree(logdir/'regularizers/Tiny', ignore_errors=True)
shutil.copytree(logdir/'sizes/Tiny', logdir/'regularizers/Tiny')
PosixPath('/tmp/tmpt5wb_tsk/tensorboard_logs/regularizers/Tiny')
regularizer_histories = {}
regularizer_histories['Tiny'] = size_histories['Tiny']
Add weight regularization¶
You may be familiar with Occam’s Razor principle: given two explanations for something, the explanation most likely to be correct is the “simplest” one, the one that makes the least amount of assumptions. This also applies to the models learned by neural networks: given some training data and a network architecture, there are multiple sets of weights values (multiple models) that could explain the data, and simpler models are less likely to overfit than complex ones.
A “simple model” in this context is a model where the distribution of parameter values has less entropy (or a model with fewer parameters altogether, as demonstrated in the section above). Thus a common way to mitigate overfitting is to put constraints on the complexity of a network by forcing its weights only to take small values, which makes the distribution of weight values more “regular”. This is called “weight regularization”, and it is done by adding to the loss function of the network a cost associated with having large weights. This cost comes in two flavors:
L1 regularization, where the cost added is proportional to the absolute value of the weights coefficients (i.e. to what is called the “L1 norm” of the weights).
L2 regularization, where the cost added is proportional to the square of the value of the weights coefficients (i.e. to what is called the squared “L2 norm” of the weights). L2 regularization is also called weight decay in the context of neural networks. Don’t let the different name confuse you: weight decay is mathematically the exact same as L2 regularization.
L1 regularization pushes weights towards exactly zero, encouraging a sparse model. L2 regularization will penalize the weights parameters without making them sparse since the penalty goes to zero for small weights—one reason why L2 is more common.
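To make the two penalty terms concrete, here is a small numerical sketch (not part of the tutorial's model) that computes both penalties for an example weight vector at strength 0.001:
import numpy as np

w = np.array([0.5, -0.2, 0.0, 1.3])      # example weight values
l1_penalty = 0.001 * np.sum(np.abs(w))   # cost added by L1 regularization
l2_penalty = 0.001 * np.sum(w ** 2)      # cost added by L2 regularization ("weight decay")
print(l1_penalty, l2_penalty)
Note how the squared L2 term barely penalizes the small weights, while the L1 term charges them in proportion to their absolute size.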
In tf.keras, weight regularization is added by passing weight regularizer instances to layers as keyword arguments. Add L2 weight regularization:
l2_model = tf.keras.Sequential([
layers.Dense(512, activation='elu',
kernel_regularizer=regularizers.l2(0.001),
input_shape=(FEATURES,)),
layers.Dense(512, activation='elu',
kernel_regularizer=regularizers.l2(0.001)),
layers.Dense(512, activation='elu',
kernel_regularizer=regularizers.l2(0.001)),
layers.Dense(512, activation='elu',
kernel_regularizer=regularizers.l2(0.001)),
layers.Dense(1)
])
regularizer_histories['l2'] = compile_and_fit(l2_model, "regularizers/l2")
Model: "sequential_8"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense_22 (Dense) (None, 512) 14848
dense_23 (Dense) (None, 512) 262656
dense_24 (Dense) (None, 512) 262656
dense_25 (Dense) (None, 512) 262656
dense_26 (Dense) (None, 1) 513
=================================================================
Total params: 803,329
Trainable params: 803,329
Non-trainable params: 0
_________________________________________________________________
Epoch: 0, accuracy:0.5091, binary_crossentropy:0.8260, loss:2.3510, val_accuracy:0.4620, val_binary_crossentropy:0.7253, val_loss:2.1769,
....................................................................................................
Epoch: 100, accuracy:0.6579, binary_crossentropy:0.5974, loss:0.6198, val_accuracy:0.6440, val_binary_crossentropy:0.5902, val_loss:0.6126,
....................................................................................................
Epoch: 200, accuracy:0.6784, binary_crossentropy:0.5840, loss:0.6071, val_accuracy:0.6560, val_binary_crossentropy:0.5865, val_loss:0.6098,
....................................................................................................
Epoch: 300, accuracy:0.6804, binary_crossentropy:0.5726, loss:0.5946, val_accuracy:0.6620, val_binary_crossentropy:0.5803, val_loss:0.6022,
....................................................................................................
Epoch: 400, accuracy:0.6882, binary_crossentropy:0.5658, loss:0.5899, val_accuracy:0.6950, val_binary_crossentropy:0.5791, val_loss:0.6032,
....................................................................................................
Epoch: 500, accuracy:0.6920, binary_crossentropy:0.5629, loss:0.5881, val_accuracy:0.6840, val_binary_crossentropy:0.5796, val_loss:0.6048,
....................................................................................................
Epoch: 600, accuracy:0.6990, binary_crossentropy:0.5513, loss:0.5797, val_accuracy:0.6860, val_binary_crossentropy:0.5772, val_loss:0.6055,
....................................................................................................
Epoch: 700, accuracy:0.7068, binary_crossentropy:0.5456, loss:0.5754, val_accuracy:0.6920, val_binary_crossentropy:0.5734, val_loss:0.6031,
....................................................................................................
Epoch: 800, accuracy:0.7142, binary_crossentropy:0.5349, loss:0.5648, val_accuracy:0.6910, val_binary_crossentropy:0.5796, val_loss:0.6095,
....................................................................................................
Epoch: 900, accuracy:0.7179, binary_crossentropy:0.5303, loss:0.5610, val_accuracy:0.6710, val_binary_crossentropy:0.5826, val_loss:0.6132,
..................................................................................
l2(0.001) means that every coefficient in the weight matrix of the layer will add 0.001 * weight_coefficient_value**2 to the total loss of the network.
That is why we’re monitoring the binary_crossentropy directly: it doesn’t have this regularization component mixed in.
So, that same "Large" model with an L2 regularization penalty performs much better:
plotter.plot(regularizer_histories)
plt.ylim([0.5, 0.7])
(0.5, 0.7)
As demonstrated in the diagram above, the "L2" regularized model is now much more competitive with the "Tiny" model. This "L2" model is also much more resistant to overfitting than the "Large" model it was based on, despite having the same number of parameters.
More info¶
There are two important things to note about this sort of regularization:
If you are writing your own training loop, then you need to be sure to ask the model for its regularization losses:
result = l2_model(features)
regularization_loss = tf.add_n(l2_model.losses)
This implementation works by adding the weight penalties to the model’s loss and then applying a standard optimization procedure after that.
There is a second approach that instead runs the optimizer only on the raw loss; then, while applying the calculated step, the optimizer also applies some weight decay. This "decoupled weight decay" is used in optimizers such as tf.keras.optimizers.Ftrl and tfa.optimizers.AdamW.
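As a reference point only, here is a minimal sketch of the first approach inside a custom training loop; it assumes tensorflow is imported as tf, reuses the l2_model defined above, and would be called once per batch of (features, labels):
loss_fn = tf.keras.losses.BinaryCrossentropy(from_logits=True)
optimizer = tf.keras.optimizers.Adam()

@tf.function
def train_step(features, labels):
    with tf.GradientTape() as tape:
        logits = l2_model(features, training=True)
        data_loss = loss_fn(labels, logits)
        # Ask the model for its regularization losses and add them to the data loss.
        reg_loss = tf.add_n(l2_model.losses)
        total_loss = data_loss + reg_loss
    grads = tape.gradient(total_loss, l2_model.trainable_variables)
    optimizer.apply_gradients(zip(grads, l2_model.trainable_variables))
    return total_loss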
Add dropout¶
Dropout is one of the most effective and most commonly used regularization techniques for neural networks, developed by Hinton and his students at the University of Toronto.
The intuitive explanation for dropout is that because individual nodes in the network cannot rely on the output of the others, each node must output features that are useful on their own.
Dropout, applied to a layer, consists of randomly “dropping out” (i.e. set to zero) a number of output features of the layer during training. For example, a given layer would normally have returned a vector [0.2, 0.5, 1.3, 0.8, 1.1]
for a given input sample during training; after applying dropout, this vector will have a few zero entries distributed at random, e.g. [0, 0.5, 1.3, 0, 1.1]
.
The “dropout rate” is the fraction of the features that are being zeroed-out; it is usually set between 0.2 and 0.5. At test time, no units are dropped out, and instead the layer’s output values are scaled down by a factor equal to the dropout rate, so as to balance for the fact that more units are active than at training time.
In Keras, you can introduce dropout in a network via the tf.keras.layers.Dropout layer, which gets applied to the output of the layer right before it.
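To see the effect directly, here is a tiny sketch (not part of the tutorial) that applies a Dropout layer to a constant input. Note that the Keras implementation uses "inverted" dropout: the surviving values are scaled up by 1/(1 - rate) during training, so nothing needs to be rescaled at inference time.
layer = tf.keras.layers.Dropout(0.5)
data = tf.ones([1, 10])
print(layer(data, training=True))    # roughly half the entries are zeroed, the rest become 2.0
print(layer(data, training=False))   # at inference time the input passes through unchanged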
Add two dropout layers to your network to check how well they do at reducing overfitting:
dropout_model = tf.keras.Sequential([
layers.Dense(512, activation='elu', input_shape=(FEATURES,)),
layers.Dropout(0.5),
layers.Dense(512, activation='elu'),
layers.Dropout(0.5),
layers.Dense(512, activation='elu'),
layers.Dropout(0.5),
layers.Dense(512, activation='elu'),
layers.Dropout(0.5),
layers.Dense(1)
])
regularizer_histories['dropout'] = compile_and_fit(dropout_model, "regularizers/dropout")
Model: "sequential_9"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense_27 (Dense) (None, 512) 14848
dropout (Dropout) (None, 512) 0
dense_28 (Dense) (None, 512) 262656
dropout_1 (Dropout) (None, 512) 0
dense_29 (Dense) (None, 512) 262656
dropout_2 (Dropout) (None, 512) 0
dense_30 (Dense) (None, 512) 262656
dropout_3 (Dropout) (None, 512) 0
dense_31 (Dense) (None, 1) 513
=================================================================
Total params: 803,329
Trainable params: 803,329
Non-trainable params: 0
_________________________________________________________________
Epoch: 0, accuracy:0.5018, binary_crossentropy:0.8187, loss:0.8187, val_accuracy:0.5720, val_binary_crossentropy:0.6890, val_loss:0.6890,
....................................................................................................
Epoch: 100, accuracy:0.6568, binary_crossentropy:0.5966, loss:0.5966, val_accuracy:0.6820, val_binary_crossentropy:0.5713, val_loss:0.5713,
....................................................................................................
Epoch: 200, accuracy:0.6929, binary_crossentropy:0.5539, loss:0.5539, val_accuracy:0.6910, val_binary_crossentropy:0.5774, val_loss:0.5774,
....................................................................................................
Epoch: 300, accuracy:0.7214, binary_crossentropy:0.5114, loss:0.5114, val_accuracy:0.6860, val_binary_crossentropy:0.5982, val_loss:0.5982,
....................................................
plotter.plot(regularizer_histories)
plt.ylim([0.5, 0.7])
(0.5, 0.7)
It’s clear from this plot that both of these regularization approaches improve the behavior of the "Large"
model. But this still doesn’t beat even the "Tiny"
baseline.
Next try them both, together, and see if that does better.
Combined L2 + dropout¶
combined_model = tf.keras.Sequential([
layers.Dense(512, kernel_regularizer=regularizers.l2(0.0001),
activation='elu', input_shape=(FEATURES,)),
layers.Dropout(0.5),
layers.Dense(512, kernel_regularizer=regularizers.l2(0.0001),
activation='elu'),
layers.Dropout(0.5),
layers.Dense(512, kernel_regularizer=regularizers.l2(0.0001),
activation='elu'),
layers.Dropout(0.5),
layers.Dense(512, kernel_regularizer=regularizers.l2(0.0001),
activation='elu'),
layers.Dropout(0.5),
layers.Dense(1)
])
regularizer_histories['combined'] = compile_and_fit(combined_model, "regularizers/combined")
Model: "sequential_10"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense_32 (Dense) (None, 512) 14848
dropout_4 (Dropout) (None, 512) 0
dense_33 (Dense) (None, 512) 262656
dropout_5 (Dropout) (None, 512) 0
dense_34 (Dense) (None, 512) 262656
dropout_6 (Dropout) (None, 512) 0
dense_35 (Dense) (None, 512) 262656
dropout_7 (Dropout) (None, 512) 0
dense_36 (Dense) (None, 1) 513
=================================================================
Total params: 803,329
Trainable params: 803,329
Non-trainable params: 0
_________________________________________________________________
Epoch: 0, accuracy:0.4995, binary_crossentropy:0.8109, loss:0.9692, val_accuracy:0.5220, val_binary_crossentropy:0.6790, val_loss:0.8366,
....................................................................................................
Epoch: 100, accuracy:0.6497, binary_crossentropy:0.6062, loss:0.6349, val_accuracy:0.6440, val_binary_crossentropy:0.5934, val_loss:0.6219,
....................................................................................................
Epoch: 200, accuracy:0.6647, binary_crossentropy:0.5905, loss:0.6159, val_accuracy:0.6830, val_binary_crossentropy:0.5816, val_loss:0.6070,
....................................................................................................
Epoch: 300, accuracy:0.6718, binary_crossentropy:0.5834, loss:0.6108, val_accuracy:0.6800, val_binary_crossentropy:0.5613, val_loss:0.5888,
....................................................................................................
Epoch: 400, accuracy:0.6796, binary_crossentropy:0.5778, loss:0.6073, val_accuracy:0.6740, val_binary_crossentropy:0.5608, val_loss:0.5903,
....................................................................................................
Epoch: 500, accuracy:0.6832, binary_crossentropy:0.5706, loss:0.6025, val_accuracy:0.6830, val_binary_crossentropy:0.5558, val_loss:0.5877,
....................................................................................................
Epoch: 600, accuracy:0.6856, binary_crossentropy:0.5664, loss:0.6000, val_accuracy:0.6870, val_binary_crossentropy:0.5464, val_loss:0.5799,
....................................................................................................
Epoch: 700, accuracy:0.6886, binary_crossentropy:0.5611, loss:0.5965, val_accuracy:0.6890, val_binary_crossentropy:0.5451, val_loss:0.5805,
....................................................................................................
Epoch: 800, accuracy:0.6895, binary_crossentropy:0.5560, loss:0.5926, val_accuracy:0.6880, val_binary_crossentropy:0.5453, val_loss:0.5819,
.................................................................................
plotter.plot(regularizer_histories)
plt.ylim([0.5, 0.7])
(0.5, 0.7)
This model with the "Combined"
regularization is obviously the best one so far.
View in TensorBoard¶
These models also recorded TensorBoard logs.
To open an embedded TensorBoard viewer inside a notebook, copy the following into a code cell:
%tensorboard --logdir {logdir}/regularizers
Takeaway¶
To recap, here are the most common ways to prevent overfitting in neural networks:
Get more training data.
Reduce the capacity of the network.
Add weight regularization.
Add dropout.
Two important approaches not covered in this guide are:
Data augmentation
Batch normalization (tf.keras.layers.BatchNormalization)
DO IT YOURSELF 2:¶
Modify any example above to include a batch normalization layer. Prepare it as a separate Python script. Run it on Carbon.
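As a starting point, here is a rough sketch of where a batch normalization layer could go (it assumes the FEATURES constant, the layers alias, and the compile_and_fit helper defined earlier in this notebook; tune the architecture yourself):
bn_model = tf.keras.Sequential([
    layers.Dense(512, activation='elu', input_shape=(FEATURES,)),
    layers.BatchNormalization(),   # normalize the activations of the previous layer
    layers.Dense(512, activation='elu'),
    layers.BatchNormalization(),
    layers.Dense(1)
])
regularizer_histories['batchnorm'] = compile_and_fit(bn_model, "regularizers/batchnorm")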
Save and load models¶
Model progress can be saved during and after training. This means a model can resume where it left off and avoid long training times. Saving also means you can share your model and others can recreate your work. When publishing research models and techniques, most machine learning practitioners share:
code to create the model, and
the trained weights, or parameters, for the model
Sharing this data helps others understand how the model works and try it themselves with new data.
Caution: TensorFlow models are code and it is important to be careful with untrusted code. See Using TensorFlow Securely for details.
Options¶
There are different ways to save TensorFlow models depending on the API you’re using. This guide uses tf.keras—a high-level API to build and train models in TensorFlow. For other approaches, refer to the Using the SavedModel format guide and the Save and load Keras models guide.
Setup¶
Installs and imports¶
Install and import TensorFlow and dependencies:
!pip install pyyaml h5py # Required to save models in HDF5 format
Requirement already satisfied: pyyaml in /home/obm/Prog/miniconda3/envs/qml/lib/python3.8/site-packages (6.0)
Requirement already satisfied: h5py in /home/obm/Prog/miniconda3/envs/qml/lib/python3.8/site-packages (3.7.0)
Requirement already satisfied: numpy>=1.14.5 in /home/obm/Prog/miniconda3/envs/qml/lib/python3.8/site-packages (from h5py) (1.23.4)
import os
import tensorflow as tf
from tensorflow import keras
print(tf.version.VERSION)
2.10.0
Get an example dataset¶
To demonstrate how to save and load weights, you’ll use the MNIST dataset. To speed up these runs, use the first 1000 examples:
(train_images, train_labels), (test_images, test_labels) = tf.keras.datasets.mnist.load_data()
train_labels = train_labels[:1000]
test_labels = test_labels[:1000]
train_images = train_images[:1000].reshape(-1, 28 * 28) / 255.0
test_images = test_images[:1000].reshape(-1, 28 * 28) / 255.0
Define a model¶
Start by building a simple sequential model:
# Define a simple sequential model
def create_model():
model = tf.keras.Sequential([
keras.layers.Dense(512, activation='relu', input_shape=(784,)),
keras.layers.Dropout(0.2),
keras.layers.Dense(10)
])
model.compile(optimizer='adam',
loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
metrics=[tf.keras.metrics.SparseCategoricalAccuracy()])
return model
# Create a basic model instance
model = create_model()
# Display the model's architecture
model.summary()
Model: "sequential_11"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense_37 (Dense) (None, 512) 401920
dropout_8 (Dropout) (None, 512) 0
dense_38 (Dense) (None, 10) 5130
=================================================================
Total params: 407,050
Trainable params: 407,050
Non-trainable params: 0
_________________________________________________________________
Save checkpoints during training¶
You can use a trained model without having to retrain it, or pick up training where you left off in case the training process was interrupted. The tf.keras.callbacks.ModelCheckpoint callback allows you to continually save the model both during and at the end of training.
Checkpoint callback usage¶
Create a tf.keras.callbacks.ModelCheckpoint
callback that saves weights only during training:
checkpoint_path = "training_1/cp.ckpt"
checkpoint_dir = os.path.dirname(checkpoint_path)
# Create a callback that saves the model's weights
cp_callback = tf.keras.callbacks.ModelCheckpoint(filepath=checkpoint_path,
save_weights_only=True,
verbose=1)
# Train the model with the new callback
model.fit(train_images,
train_labels,
epochs=10,
validation_data=(test_images, test_labels),
callbacks=[cp_callback]) # Pass callback to training
# This may generate warnings related to saving the state of the optimizer.
# These warnings (and similar warnings throughout this notebook)
# are in place to discourage outdated usage, and can be ignored.
Epoch 1/10
1/32 [..............................] - ETA: 5s - loss: 2.3800 - sparse_categorical_accuracy: 0.0000e+00
32/32 [==============================] - ETA: 0s - loss: 1.1511 - sparse_categorical_accuracy: 0.6790
Epoch 1: saving model to training_1/cp.ckpt
32/32 [==============================] - 0s 7ms/step - loss: 1.1511 - sparse_categorical_accuracy: 0.6790 - val_loss: 0.7251 - val_sparse_categorical_accuracy: 0.7710
Epoch 2/10
1/32 [..............................] - ETA: 0s - loss: 0.6371 - sparse_categorical_accuracy: 0.8125
Epoch 2: saving model to training_1/cp.ckpt
32/32 [==============================] - 0s 4ms/step - loss: 0.4274 - sparse_categorical_accuracy: 0.8780 - val_loss: 0.5376 - val_sparse_categorical_accuracy: 0.8400
Epoch 3/10
1/32 [..............................] - ETA: 0s - loss: 0.1832 - sparse_categorical_accuracy: 0.9688
30/32 [===========================>..] - ETA: 0s - loss: 0.2894 - sparse_categorical_accuracy: 0.9271
Epoch 3: saving model to training_1/cp.ckpt
32/32 [==============================] - 0s 5ms/step - loss: 0.2879 - sparse_categorical_accuracy: 0.9250 - val_loss: 0.4587 - val_sparse_categorical_accuracy: 0.8570
Epoch 4/10
1/32 [..............................] - ETA: 0s - loss: 0.1983 - sparse_categorical_accuracy: 0.9375
30/32 [===========================>..] - ETA: 0s - loss: 0.2148 - sparse_categorical_accuracy: 0.9510
Epoch 4: saving model to training_1/cp.ckpt
32/32 [==============================] - 0s 4ms/step - loss: 0.2094 - sparse_categorical_accuracy: 0.9530 - val_loss: 0.4463 - val_sparse_categorical_accuracy: 0.8620
Epoch 5/10
1/32 [..............................] - ETA: 0s - loss: 0.2585 - sparse_categorical_accuracy: 0.9375
Epoch 5: saving model to training_1/cp.ckpt
32/32 [==============================] - 0s 4ms/step - loss: 0.1544 - sparse_categorical_accuracy: 0.9680 - val_loss: 0.4129 - val_sparse_categorical_accuracy: 0.8660
Epoch 6/10
1/32 [..............................] - ETA: 0s - loss: 0.0530 - sparse_categorical_accuracy: 1.0000
Epoch 6: saving model to training_1/cp.ckpt
32/32 [==============================] - 0s 4ms/step - loss: 0.1133 - sparse_categorical_accuracy: 0.9760 - val_loss: 0.4269 - val_sparse_categorical_accuracy: 0.8630
Epoch 7/10
1/32 [..............................] - ETA: 0s - loss: 0.1295 - sparse_categorical_accuracy: 1.0000
Epoch 7: saving model to training_1/cp.ckpt
32/32 [==============================] - 0s 5ms/step - loss: 0.0943 - sparse_categorical_accuracy: 0.9880 - val_loss: 0.4148 - val_sparse_categorical_accuracy: 0.8640
Epoch 8/10
1/32 [..............................] - ETA: 0s - loss: 0.0677 - sparse_categorical_accuracy: 1.0000
28/32 [=========================>....] - ETA: 0s - loss: 0.0707 - sparse_categorical_accuracy: 0.9922
Epoch 8: saving model to training_1/cp.ckpt
32/32 [==============================] - 0s 4ms/step - loss: 0.0700 - sparse_categorical_accuracy: 0.9920 - val_loss: 0.4009 - val_sparse_categorical_accuracy: 0.8710
Epoch 9/10
1/32 [..............................] - ETA: 0s - loss: 0.0380 - sparse_categorical_accuracy: 1.0000
Epoch 9: saving model to training_1/cp.ckpt
32/32 [==============================] - 0s 3ms/step - loss: 0.0508 - sparse_categorical_accuracy: 0.9950 - val_loss: 0.4181 - val_sparse_categorical_accuracy: 0.8670
Epoch 10/10
1/32 [..............................] - ETA: 0s - loss: 0.0248 - sparse_categorical_accuracy: 1.0000
Epoch 10: saving model to training_1/cp.ckpt
32/32 [==============================] - 0s 4ms/step - loss: 0.0389 - sparse_categorical_accuracy: 0.9990 - val_loss: 0.4115 - val_sparse_categorical_accuracy: 0.8670
<keras.callbacks.History at 0x7f8a47f8e2b0>
This creates a single collection of TensorFlow checkpoint files that are updated at the end of each epoch:
os.listdir(checkpoint_dir)
['cp.ckpt.index', 'checkpoint', 'cp.ckpt.data-00000-of-00001']
As long as two models share the same architecture you can share weights between them. So, when restoring a model from weights-only, create a model with the same architecture as the original model and then set its weights.
Now rebuild a fresh, untrained model and evaluate it on the test set. An untrained model will perform at chance levels (~10% accuracy):
# Create a basic model instance
model = create_model()
# Evaluate the model
loss, acc = model.evaluate(test_images, test_labels, verbose=2)
print("Untrained model, accuracy: {:5.2f}%".format(100 * acc))
32/32 - 0s - loss: 2.3659 - sparse_categorical_accuracy: 0.1080 - 106ms/epoch - 3ms/step
Untrained model, accuracy: 10.80%
Then load the weights from the checkpoint and re-evaluate:
# Loads the weights
model.load_weights(checkpoint_path)
# Re-evaluate the model
loss, acc = model.evaluate(test_images, test_labels, verbose=2)
print("Restored model, accuracy: {:5.2f}%".format(100 * acc))
32/32 - 0s - loss: 0.4115 - sparse_categorical_accuracy: 0.8670 - 43ms/epoch - 1ms/step
Restored model, accuracy: 86.70%
Checkpoint callback options¶
The callback provides several options to give checkpoints unique names and to adjust the checkpointing frequency.
Train a new model, and save uniquely named checkpoints once every five epochs:
# Include the epoch in the file name (uses `str.format`)
checkpoint_path = "training_2/cp-{epoch:04d}.ckpt"
checkpoint_dir = os.path.dirname(checkpoint_path)
batch_size = 32
# Create a callback that saves the model's weights every 5 epochs
cp_callback = tf.keras.callbacks.ModelCheckpoint(
filepath=checkpoint_path,
verbose=1,
save_weights_only=True,
save_freq=5*batch_size)
# Create a new model instance
model = create_model()
# Save the weights using the `checkpoint_path` format
model.save_weights(checkpoint_path.format(epoch=0))
# Train the model with the new callback
model.fit(train_images,
train_labels,
epochs=50,
batch_size=batch_size,
callbacks=[cp_callback],
validation_data=(test_images, test_labels),
verbose=0)
Epoch 5: saving model to training_2/cp-0005.ckpt
Epoch 10: saving model to training_2/cp-0010.ckpt
Epoch 15: saving model to training_2/cp-0015.ckpt
Epoch 20: saving model to training_2/cp-0020.ckpt
Epoch 25: saving model to training_2/cp-0025.ckpt
Epoch 30: saving model to training_2/cp-0030.ckpt
Epoch 35: saving model to training_2/cp-0035.ckpt
Epoch 40: saving model to training_2/cp-0040.ckpt
Epoch 45: saving model to training_2/cp-0045.ckpt
Epoch 50: saving model to training_2/cp-0050.ckpt
<keras.callbacks.History at 0x7f8acce3f3d0>
Now, review the resulting checkpoints and choose the latest one:
os.listdir(checkpoint_dir)
['cp-0035.ckpt.index',
'cp-0050.ckpt.index',
'cp-0000.ckpt.index',
'cp-0025.ckpt.data-00000-of-00001',
'cp-0030.ckpt.index',
'cp-0000.ckpt.data-00000-of-00001',
'cp-0025.ckpt.index',
'cp-0045.ckpt.data-00000-of-00001',
'checkpoint',
'cp-0015.ckpt.data-00000-of-00001',
'cp-0030.ckpt.data-00000-of-00001',
'cp-0015.ckpt.index',
'cp-0020.ckpt.data-00000-of-00001',
'cp-0045.ckpt.index',
'cp-0020.ckpt.index',
'cp-0050.ckpt.data-00000-of-00001',
'cp-0035.ckpt.data-00000-of-00001',
'cp-0005.ckpt.index',
'cp-0010.ckpt.index',
'cp-0040.ckpt.index',
'cp-0040.ckpt.data-00000-of-00001',
'cp-0010.ckpt.data-00000-of-00001',
'cp-0005.ckpt.data-00000-of-00001']
latest = tf.train.latest_checkpoint(checkpoint_dir)
latest
'training_2/cp-0050.ckpt'
Note: The default TensorFlow format only saves the 5 most recent checkpoints.
To test, reset the model, and load the latest checkpoint:
# Create a new model instance
model = create_model()
# Load the previously saved weights
model.load_weights(latest)
# Re-evaluate the model
loss, acc = model.evaluate(test_images, test_labels, verbose=2)
print("Restored model, accuracy: {:5.2f}%".format(100 * acc))
32/32 - 0s - loss: 0.4811 - sparse_categorical_accuracy: 0.8790 - 97ms/epoch - 3ms/step
Restored model, accuracy: 87.90%
What are these files?¶
The above code stores the weights to a collection of checkpoint-formatted files that contain only the trained weights in a binary format. Checkpoints contain:
One or more shards that contain your model’s weights.
An index file that indicates which weights are stored in which shard.
If you are training a model on a single machine, you’ll have one shard with the suffix: .data-00000-of-00001
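If you are curious about what is inside, the checkpoint reader can list the stored variables; this is a quick inspection sketch (using the latest checkpoint prefix from above), not a required step:
reader = tf.train.load_checkpoint(latest)
# Print every saved variable name together with its shape.
for name, shape in sorted(reader.get_variable_to_shape_map().items()):
    print(name, shape)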
Manually save weights¶
To save weights manually, use tf.keras.Model.save_weights. By default, tf.keras (and the Model.save_weights method in particular) uses the TensorFlow Checkpoint format with a .ckpt extension. To save in the HDF5 format with a .h5 extension, refer to the Save and load models guide.
# Save the weights
model.save_weights('./checkpoints/my_checkpoint')
# Create a new model instance
model = create_model()
# Restore the weights
model.load_weights('./checkpoints/my_checkpoint')
# Evaluate the model
loss, acc = model.evaluate(test_images, test_labels, verbose=2)
print("Restored model, accuracy: {:5.2f}%".format(100 * acc))
WARNING:tensorflow:Detecting that an object or model or tf.train.Checkpoint is being deleted with unrestored values. See the following logs for the specific values in question. To silence these warnings, use `status.expect_partial()`. See https://www.tensorflow.org/api_docs/python/tf/train/Checkpoint#restorefor details about the status object returned by the restore function.
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).optimizer.iter
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).optimizer.beta_1
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).optimizer.beta_2
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).optimizer.decay
WARNING:tensorflow:Value in checkpoint could not be found in the restored object: (root).optimizer.learning_rate
32/32 - 0s - loss: 0.4811 - sparse_categorical_accuracy: 0.8790 - 97ms/epoch - 3ms/step
Restored model, accuracy: 87.90%
Save the entire model¶
Call tf.keras.Model.save to save a model’s architecture, weights, and training configuration in a single file/folder. This allows you to export a model so it can be used without access to the original Python code*. Since the optimizer state is recovered, you can resume training from exactly where you left off.
An entire model can be saved in two different file formats (SavedModel and HDF5). The TensorFlow SavedModel format is the default file format in TF2.x, but models can also be saved in HDF5 format. More details on saving entire models in the two file formats are described below.
Saving a fully-functional model is very useful: you can load it in TensorFlow.js (Saved Model, HDF5) and then train and run it in web browsers, or convert it to run on mobile devices using TensorFlow Lite (Saved Model, HDF5).
*Custom objects (for example, subclassed models or layers) require special attention when saving and loading. Refer to the Saving custom objects section below.
SavedModel format¶
The SavedModel format is another way to serialize models. Models saved in this format can be restored using tf.keras.models.load_model and are compatible with TensorFlow Serving. The SavedModel guide goes into detail about how to serve/inspect the SavedModel. The section below illustrates the steps to save and restore the model.
# Create and train a new model instance.
model = create_model()
model.fit(train_images, train_labels, epochs=5)
# Save the entire model as a SavedModel.
!mkdir -p saved_model
model.save('saved_model/my_model')
Epoch 1/5
1/32 [..............................] - ETA: 5s - loss: 2.3582 - sparse_categorical_accuracy: 0.0938
32/32 [==============================] - 0s 1ms/step - loss: 1.1544 - sparse_categorical_accuracy: 0.6850
Epoch 2/5
1/32 [..............................] - ETA: 0s - loss: 0.5347 - sparse_categorical_accuracy: 0.9062
29/32 [==========================>...] - ETA: 0s - loss: 0.4139 - sparse_categorical_accuracy: 0.8869
32/32 [==============================] - 0s 2ms/step - loss: 0.4138 - sparse_categorical_accuracy: 0.8860
Epoch 3/5
1/32 [..............................] - ETA: 0s - loss: 0.3314 - sparse_categorical_accuracy: 0.9375
30/32 [===========================>..] - ETA: 0s - loss: 0.2936 - sparse_categorical_accuracy: 0.9146
32/32 [==============================] - 0s 2ms/step - loss: 0.2937 - sparse_categorical_accuracy: 0.9140
Epoch 4/5
1/32 [..............................] - ETA: 0s - loss: 0.1980 - sparse_categorical_accuracy: 0.9688
29/32 [==========================>...] - ETA: 0s - loss: 0.1998 - sparse_categorical_accuracy: 0.9547
32/32 [==============================] - 0s 2ms/step - loss: 0.2069 - sparse_categorical_accuracy: 0.9520
Epoch 5/5
1/32 [..............................] - ETA: 0s - loss: 0.3500 - sparse_categorical_accuracy: 0.9375
29/32 [==========================>...] - ETA: 0s - loss: 0.1576 - sparse_categorical_accuracy: 0.9709
32/32 [==============================] - 0s 2ms/step - loss: 0.1538 - sparse_categorical_accuracy: 0.9720
INFO:tensorflow:Assets written to: saved_model/my_model/assets
The SavedModel format is a directory containing a protobuf binary and a TensorFlow checkpoint. Inspect the saved model directory:
# my_model directory
!ls saved_model
# Contains an assets folder, saved_model.pb, and variables folder.
!ls saved_model/my_model
my_model
assets keras_metadata.pb saved_model.pb variables
Reload a fresh Keras model from the saved model:
new_model = tf.keras.models.load_model('saved_model/my_model')
# Check its architecture
new_model.summary()
Model: "sequential_16"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense_47 (Dense) (None, 512) 401920
dropout_13 (Dropout) (None, 512) 0
dense_48 (Dense) (None, 10) 5130
=================================================================
Total params: 407,050
Trainable params: 407,050
Non-trainable params: 0
_________________________________________________________________
The restored model is compiled with the same arguments as the original model. Try running evaluate and predict with the loaded model:
# Evaluate the restored model
loss, acc = new_model.evaluate(test_images, test_labels, verbose=2)
print('Restored model, accuracy: {:5.2f}%'.format(100 * acc))
print(new_model.predict(test_images).shape)
32/32 - 0s - loss: 0.4337 - sparse_categorical_accuracy: 0.8570 - 96ms/epoch - 3ms/step
Restored model, accuracy: 85.70%
1/32 [..............................] - ETA: 0s
32/32 [==============================] - 0s 668us/step
(1000, 10)
HDF5 format¶
Keras provides a basic save format using the HDF5 standard.
# Create and train a new model instance.
model = create_model()
model.fit(train_images, train_labels, epochs=5)
# Save the entire model to a HDF5 file.
# The '.h5' extension indicates that the model should be saved to HDF5.
model.save('my_model.h5')
Epoch 1/5
1/32 [..............................] - ETA: 5s - loss: 2.3057 - sparse_categorical_accuracy: 0.0938
32/32 [==============================] - 0s 1ms/step - loss: 1.1472 - sparse_categorical_accuracy: 0.6820
Epoch 2/5
1/32 [..............................] - ETA: 0s - loss: 0.4974 - sparse_categorical_accuracy: 0.8750
32/32 [==============================] - 0s 1ms/step - loss: 0.4142 - sparse_categorical_accuracy: 0.8860
Epoch 3/5
1/32 [..............................] - ETA: 0s - loss: 0.2761 - sparse_categorical_accuracy: 0.8438
31/32 [============================>.] - ETA: 0s - loss: 0.2756 - sparse_categorical_accuracy: 0.9234
32/32 [==============================] - 0s 2ms/step - loss: 0.2734 - sparse_categorical_accuracy: 0.9240
Epoch 4/5
1/32 [..............................] - ETA: 0s - loss: 0.3729 - sparse_categorical_accuracy: 0.9062
29/32 [==========================>...] - ETA: 0s - loss: 0.1999 - sparse_categorical_accuracy: 0.9515
32/32 [==============================] - 0s 2ms/step - loss: 0.1980 - sparse_categorical_accuracy: 0.9520
Epoch 5/5
1/32 [..............................] - ETA: 0s - loss: 0.1235 - sparse_categorical_accuracy: 1.0000
30/32 [===========================>..] - ETA: 0s - loss: 0.1456 - sparse_categorical_accuracy: 0.9719
32/32 [==============================] - 0s 2ms/step - loss: 0.1455 - sparse_categorical_accuracy: 0.9720
Now, recreate the model from that file:
# Recreate the exact same model, including its weights and the optimizer
new_model = tf.keras.models.load_model('my_model.h5')
# Show the model architecture
new_model.summary()
Model: "sequential_17"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense_49 (Dense) (None, 512) 401920
dropout_14 (Dropout) (None, 512) 0
dense_50 (Dense) (None, 10) 5130
=================================================================
Total params: 407,050
Trainable params: 407,050
Non-trainable params: 0
_________________________________________________________________
Check its accuracy:
loss, acc = new_model.evaluate(test_images, test_labels, verbose=2)
print('Restored model, accuracy: {:5.2f}%'.format(100 * acc))
32/32 - 0s - loss: 0.4245 - sparse_categorical_accuracy: 0.8620 - 96ms/epoch - 3ms/step
Restored model, accuracy: 86.20%
Keras saves models by inspecting their architectures. This technique saves everything:
The weight values
The model’s architecture
The model’s training configuration (what you pass to the .compile() method)
The optimizer and its state, if any (this enables you to restart training where you left off)
Keras is not able to save the v1.x optimizers (from tf.compat.v1.train) since they aren’t compatible with checkpoints. For v1.x optimizers, you need to re-compile the model after loading, losing the state of the optimizer.
Saving custom objects¶
If you are using the SavedModel format, you can skip this section. The key difference between HDF5 and SavedModel is that HDF5 uses object configs to save the model architecture, while SavedModel saves the execution graph. Thus, SavedModels are able to save custom objects like subclassed models and custom layers without requiring the original code.
To save custom objects to HDF5, you must do the following:
1. Define a get_config method in your object, and optionally a from_config classmethod.
get_config(self) returns a JSON-serializable dictionary of parameters needed to recreate the object.
from_config(cls, config) uses the returned config from get_config to create a new object. By default, this function will use the config as initialization kwargs (return cls(**config)).
2. Pass the object to the custom_objects argument when loading the model. The argument must be a dictionary mapping the string class name to the Python class, e.g. tf.keras.models.load_model(path, custom_objects={'CustomLayer': CustomLayer}).
Refer to the Writing layers and models from scratch tutorial for examples of custom objects and get_config.
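For illustration, a hypothetical custom layer with a get_config method might look like the sketch below; the class name and file name are made up for this example and are not part of the tutorial:
class CustomLayer(tf.keras.layers.Layer):
    def __init__(self, units=32, **kwargs):
        super().__init__(**kwargs)
        self.units = units
        self.dense = tf.keras.layers.Dense(units)

    def call(self, inputs):
        return self.dense(inputs)

    def get_config(self):
        # Return a JSON-serializable dictionary of the constructor arguments.
        config = super().get_config()
        config.update({'units': self.units})
        return config

# When loading an HDF5 model that uses this layer, map the class name to the class:
# model = tf.keras.models.load_model('custom_model.h5',
#                                    custom_objects={'CustomLayer': CustomLayer})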
TFP Probabilistic Layers: Regression¶
In this example we show how to fit regression models using TFP’s “probabilistic layers.”
Dependencies & Prerequisites¶
from pprint import pprint
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
import tensorflow.compat.v2 as tf
tf.enable_v2_behavior()
import tensorflow_probability as tfp
sns.reset_defaults()
#sns.set_style('whitegrid')
#sns.set_context('talk')
sns.set_context(context='talk',font_scale=0.7)
%matplotlib inline
tfd = tfp.distributions
Make things Fast!¶
Before we dive in, let’s make sure we’re using a GPU for this demo.
To do this, select “Runtime” -> “Change runtime type” -> “Hardware accelerator” -> “GPU”.
The following snippet will verify that we have access to a GPU.
if tf.test.gpu_device_name() != '/device:GPU:0':
print('WARNING: GPU device not found.')
else:
print('SUCCESS: Found GPU: {}'.format(tf.test.gpu_device_name()))
SUCCESS: Found GPU: /device:GPU:0
2022-10-27 12:55:33.923746: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1616] Created device /device:GPU:0 with 6911 MB memory: -> device: 0, name: NVIDIA GeForce RTX 2080 Ti, pci bus id: 0000:21:00.0, compute capability: 7.5
Note: if for some reason you cannot access a GPU, this colab will still work. (Training will just take longer.)
Motivation¶
Wouldn’t it be great if we could use TFP to specify a probabilistic model and then simply minimize the negative log-likelihood, i.e.,
negloglik = lambda y, rv_y: -rv_y.log_prob(y)
Well, not only is it possible, but this colab shows how! (In the context of linear regression problems.)
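To connect this to something familiar: for a Normal distribution with unit scale, the negative log-likelihood is just the squared error plus a constant, so Case 1 below is ordinary least squares in disguise. A quick numerical check (a sketch, not part of the example):
import numpy as np
import tensorflow_probability as tfp
tfd = tfp.distributions

y = np.float32(1.7)
dist = tfd.Normal(loc=0., scale=1.)
print(-dist.log_prob(y).numpy())              # negative log-likelihood of y
print(0.5 * y**2 + 0.5 * np.log(2. * np.pi))  # 0.5*(y - loc)**2 plus a constant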
#@title Synthesize dataset.
w0 = 0.125
b0 = 5.
x_range = [-20, 60]
def load_dataset(n=150, n_tst=150):
np.random.seed(43)
def s(x):
g = (x - x_range[0]) / (x_range[1] - x_range[0])
return 3 * (0.25 + g**2.)
x = (x_range[1] - x_range[0]) * np.random.rand(n) + x_range[0]
eps = np.random.randn(n) * s(x)
y = (w0 * x * (1. + np.sin(x)) + b0) + eps
x = x[..., np.newaxis]
x_tst = np.linspace(*x_range, num=n_tst).astype(np.float32)
x_tst = x_tst[..., np.newaxis]
return y, x, x_tst
y, x, x_tst = load_dataset()
Case 1: No Uncertainty¶
# Build model.
model = tf.keras.Sequential([
tf.keras.layers.Dense(1),
tfp.layers.DistributionLambda(lambda t: tfd.Normal(loc=t, scale=1)),
])
# Do inference.
model.compile(optimizer=tf.optimizers.Adam(learning_rate=0.01), loss=negloglik)
model.fit(x, y, epochs=1000, verbose=False);
# Profit.
[print(np.squeeze(w.numpy())) for w in model.weights];
yhat = model(x_tst)
assert isinstance(yhat, tfd.Distribution)
0.13465934
5.133879
#@title Figure 1: No uncertainty.
w = np.squeeze(model.layers[-2].kernel.numpy())
b = np.squeeze(model.layers[-2].bias.numpy())
plt.figure(figsize=[6, 1.5]) # inches
#plt.figure(figsize=[8, 5]) # inches
plt.plot(x, y, 'b.', label='observed');
plt.plot(x_tst, yhat.mean(),'r', label='mean', linewidth=4);
plt.ylim(-0.,17);
plt.yticks(np.linspace(0, 15, 4)[1:]);
plt.xticks(np.linspace(*x_range, num=9));
ax=plt.gca();
ax.xaxis.set_ticks_position('bottom')
ax.yaxis.set_ticks_position('left')
ax.spines['left'].set_position(('data', 0))
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
#ax.spines['left'].set_smart_bounds(True)
#ax.spines['bottom'].set_smart_bounds(True)
plt.legend(loc='center left', fancybox=True, framealpha=0., bbox_to_anchor=(1.05, 0.5))
plt.savefig('/tmp/fig1.png', bbox_inches='tight', dpi=300)
Case 2: Aleatoric Uncertainty¶
# Build model.
model = tf.keras.Sequential([
tf.keras.layers.Dense(1 + 1),
tfp.layers.DistributionLambda(
lambda t: tfd.Normal(loc=t[..., :1],
scale=1e-3 + tf.math.softplus(0.05 * t[...,1:]))),
])
# Do inference.
model.compile(optimizer=tf.optimizers.Adam(learning_rate=0.01), loss=negloglik)
model.fit(x, y, epochs=1000, verbose=False);
# Profit.
[print(np.squeeze(w.numpy())) for w in model.weights];
yhat = model(x_tst)
assert isinstance(yhat, tfd.Distribution)
[0.126 0.966]
[5.191 6.73 ]
#@title Figure 2: Aleatoric Uncertainty
plt.figure(figsize=[6, 1.5]) # inches
plt.plot(x, y, 'b.', label='observed');
m = yhat.mean()
s = yhat.stddev()
plt.plot(x_tst, m, 'r', linewidth=4, label='mean');
plt.plot(x_tst, m + 2 * s, 'g', linewidth=2, label=r'mean + 2 stddev');
plt.plot(x_tst, m - 2 * s, 'g', linewidth=2, label=r'mean - 2 stddev');
plt.ylim(-0.,17);
plt.yticks(np.linspace(0, 15, 4)[1:]);
plt.xticks(np.linspace(*x_range, num=9));
ax=plt.gca();
ax.xaxis.set_ticks_position('bottom')
ax.yaxis.set_ticks_position('left')
ax.spines['left'].set_position(('data', 0))
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
#ax.spines['left'].set_smart_bounds(True)
#ax.spines['bottom'].set_smart_bounds(True)
plt.legend(loc='center left', fancybox=True, framealpha=0., bbox_to_anchor=(1.05, 0.5))
plt.savefig('/tmp/fig2.png', bbox_inches='tight', dpi=300)
Case 3: Epistemic Uncertainty¶
# Specify the surrogate posterior over `keras.layers.Dense` `kernel` and `bias`.
def posterior_mean_field(kernel_size, bias_size=0, dtype=None):
n = kernel_size + bias_size
c = np.log(np.expm1(1.))
return tf.keras.Sequential([
tfp.layers.VariableLayer(2 * n, dtype=dtype),
tfp.layers.DistributionLambda(lambda t: tfd.Independent(
tfd.Normal(loc=t[..., :n],
scale=1e-5 + tf.nn.softplus(c + t[..., n:])),
reinterpreted_batch_ndims=1)),
])
# Specify the prior over `keras.layers.Dense` `kernel` and `bias`.
def prior_trainable(kernel_size, bias_size=0, dtype=None):
n = kernel_size + bias_size
return tf.keras.Sequential([
tfp.layers.VariableLayer(n, dtype=dtype),
tfp.layers.DistributionLambda(lambda t: tfd.Independent(
tfd.Normal(loc=t, scale=1),
reinterpreted_batch_ndims=1)),
])
# Build model.
model = tf.keras.Sequential([
tfp.layers.DenseVariational(1, posterior_mean_field, prior_trainable, kl_weight=1/x.shape[0]),
tfp.layers.DistributionLambda(lambda t: tfd.Normal(loc=t, scale=1)),
])
# Do inference.
model.compile(optimizer=tf.optimizers.Adam(learning_rate=0.01), loss=negloglik)
model.fit(x, y, epochs=1000, verbose=False);
# Profit.
[print(np.squeeze(w.numpy())) for w in model.weights];
yhat = model(x_tst)
assert isinstance(yhat, tfd.Distribution)
[ 0.139 5.136 -4.123 -2.283]
[0.137 5.122]
#@title Figure 3: Epistemic Uncertainty
plt.figure(figsize=[6, 1.5]) # inches
plt.clf();
plt.plot(x, y, 'b.', label='observed');
yhats = [model(x_tst) for _ in range(100)]
avgm = np.zeros_like(x_tst[..., 0])
for i, yhat in enumerate(yhats):
m = np.squeeze(yhat.mean())
s = np.squeeze(yhat.stddev())
if i < 25:
plt.plot(x_tst, m, 'r', label='ensemble means' if i == 0 else None, linewidth=0.5)
avgm += m
plt.plot(x_tst, avgm/len(yhats), 'r', label='overall mean', linewidth=4)
plt.ylim(-0.,17);
plt.yticks(np.linspace(0, 15, 4)[1:]);
plt.xticks(np.linspace(*x_range, num=9));
ax=plt.gca();
ax.xaxis.set_ticks_position('bottom')
ax.yaxis.set_ticks_position('left')
ax.spines['left'].set_position(('data', 0))
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
#ax.spines['left'].set_smart_bounds(True)
#ax.spines['bottom'].set_smart_bounds(True)
plt.legend(loc='center left', fancybox=True, framealpha=0., bbox_to_anchor=(1.05, 0.5))
plt.savefig('/tmp/fig3.png', bbox_inches='tight', dpi=300)
Case 4: Aleatoric & Epistemic Uncertainty¶
# Build model.
model = tf.keras.Sequential([
tfp.layers.DenseVariational(1 + 1, posterior_mean_field, prior_trainable, kl_weight=1/x.shape[0]),
tfp.layers.DistributionLambda(
lambda t: tfd.Normal(loc=t[..., :1],
scale=1e-3 + tf.math.softplus(0.01 * t[...,1:]))),
])
# Do inference.
model.compile(optimizer=tf.optimizers.Adam(learning_rate=0.01), loss=negloglik)
model.fit(x, y, epochs=1000, verbose=False);
# Profit.
[print(np.squeeze(w.numpy())) for w in model.weights];
yhat = model(x_tst)
assert isinstance(yhat, tfd.Distribution)
[ 0.133 2.172 5.165 3.044 -3.117 -0.886 -2.087 -0.053]
[0.133 2.192 5.172 3.03 ]
#@title Figure 4: Both Aleatoric & Epistemic Uncertainty
plt.figure(figsize=[6, 1.5]) # inches
plt.plot(x, y, 'b.', label='observed');
yhats = [model(x_tst) for _ in range(100)]
avgm = np.zeros_like(x_tst[..., 0])
for i, yhat in enumerate(yhats):
m = np.squeeze(yhat.mean())
s = np.squeeze(yhat.stddev())
if i < 15:
plt.plot(x_tst, m, 'r', label='ensemble means' if i == 0 else None, linewidth=1.)
plt.plot(x_tst, m + 2 * s, 'g', linewidth=0.5, label='ensemble means + 2 ensemble stdev' if i == 0 else None);
plt.plot(x_tst, m - 2 * s, 'g', linewidth=0.5, label='ensemble means - 2 ensemble stdev' if i == 0 else None);
avgm += m
plt.plot(x_tst, avgm/len(yhats), 'r', label='overall mean', linewidth=4)
plt.ylim(-0.,17);
plt.yticks(np.linspace(0, 15, 4)[1:]);
plt.xticks(np.linspace(*x_range, num=9));
ax=plt.gca();
ax.xaxis.set_ticks_position('bottom')
ax.yaxis.set_ticks_position('left')
ax.spines['left'].set_position(('data', 0))
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
#ax.spines['left'].set_smart_bounds(True)
#ax.spines['bottom'].set_smart_bounds(True)
plt.legend(loc='center left', fancybox=True, framealpha=0., bbox_to_anchor=(1.05, 0.5))
plt.savefig('/tmp/fig4.png', bbox_inches='tight', dpi=300)
Case 5: Functional Uncertainty¶
#@title Custom PSD Kernel
class RBFKernelFn(tf.keras.layers.Layer):
def __init__(self, **kwargs):
super(RBFKernelFn, self).__init__(**kwargs)
dtype = kwargs.get('dtype', None)
self._amplitude = self.add_variable(
initializer=tf.constant_initializer(0),
dtype=dtype,
name='amplitude')
self._length_scale = self.add_variable(
initializer=tf.constant_initializer(0),
dtype=dtype,
name='length_scale')
def call(self, x):
# Never called -- this is just a layer so it can hold variables
# in a way Keras understands.
return x
@property
def kernel(self):
return tfp.math.psd_kernels.ExponentiatedQuadratic(
amplitude=tf.nn.softplus(0.1 * self._amplitude),
length_scale=tf.nn.softplus(5. * self._length_scale)
)
# For numeric stability, set the default floating-point dtype to float64
tf.keras.backend.set_floatx('float64')
# Build model.
num_inducing_points = 40
model = tf.keras.Sequential([
tf.keras.layers.InputLayer(input_shape=[1]),
tf.keras.layers.Dense(1, kernel_initializer='ones', use_bias=False),
tfp.layers.VariationalGaussianProcess(
num_inducing_points=num_inducing_points,
kernel_provider=RBFKernelFn(),
event_shape=[1],
inducing_index_points_initializer=tf.constant_initializer(
np.linspace(*x_range, num=num_inducing_points,
dtype=x.dtype)[..., np.newaxis]),
unconstrained_observation_noise_variance_initializer=(
tf.constant_initializer(np.array(0.54).astype(x.dtype))),
),
])
# Do inference.
batch_size = 32
loss = lambda y, rv_y: rv_y.variational_loss(
y, kl_weight=np.array(batch_size, x.dtype) / x.shape[0])
model.compile(optimizer=tf.optimizers.Adam(learning_rate=0.01), loss=loss)
model.fit(x, y, batch_size=batch_size, epochs=1000, verbose=False)
# Profit.
yhat = model(x_tst)
assert isinstance(yhat, tfd.Distribution)
/tmp/ipykernel_81160/1709427333.py:7: UserWarning: `layer.add_variable` is deprecated and will be removed in a future version. Please use the `layer.add_weight()` method instead.
self._amplitude = self.add_variable(
/tmp/ipykernel_81160/1709427333.py:12: UserWarning: `layer.add_variable` is deprecated and will be removed in a future version. Please use the `layer.add_weight()` method instead.
self._length_scale = self.add_variable(
WARNING:tensorflow:From /home/obm/Prog/miniconda3/envs/qml/lib/python3.8/site-packages/tensorflow_probability/python/distributions/distribution.py:342: calling GaussianProcess.__init__ (from tensorflow_probability.python.distributions.gaussian_process) with jitter is deprecated and will be removed after 2021-05-10.
Instructions for updating:
`jitter` is deprecated; please use `marginal_fn` directly.
/home/obm/Prog/miniconda3/envs/qml/lib/python3.8/site-packages/tensorflow_probability/python/distributions/gaussian_process.py:402: UserWarning: Unable to detect statically whether the number of index_points is 1. As a result, defaulting to treating the marginal GP at `index_points` as a multivariate Gaussian. This makes some methods, like `cdf` unavailable.
warnings.warn(
WARNING:tensorflow:From /home/obm/Prog/miniconda3/envs/qml/lib/python3.8/site-packages/tensorflow_probability/python/internal/auto_composite_tensor.py:97: GaussianProcess.jitter (from tensorflow_probability.python.distributions.gaussian_process) is deprecated and will be removed after 2022-02-04.
Instructions for updating:
the `jitter` property of `tfd.GaussianProcess` is deprecated; use the `marginal_fn` property instead.
2022-10-27 12:56:17.168452: I tensorflow/core/util/cuda_solvers.cc:179] Creating GpuSolver handles for stream 0x55ce4315f4c0
#@title Figure 5: Functional Uncertainty
y, x, _ = load_dataset()
plt.figure(figsize=[6, 1.5]) # inches
plt.plot(x, y, 'b.', label='observed');
num_samples = 7
for i in range(num_samples):
sample_ = yhat.sample().numpy()
plt.plot(x_tst,
sample_[..., 0].T,
'r',
linewidth=0.9,
label='ensemble means' if i == 0 else None);
plt.ylim(-0.,17);
plt.yticks(np.linspace(0, 15, 4)[1:]);
plt.xticks(np.linspace(*x_range, num=9));
ax=plt.gca();
ax.xaxis.set_ticks_position('bottom')
ax.yaxis.set_ticks_position('left')
ax.spines['left'].set_position(('data', 0))
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
#ax.spines['left'].set_smart_bounds(True)
#ax.spines['bottom'].set_smart_bounds(True)
plt.legend(loc='center left', fancybox=True, framealpha=0., bbox_to_anchor=(1.05, 0.5))
plt.savefig('/tmp/fig5.png', bbox_inches='tight', dpi=300)