EfficientNet Tutorial

This article walks through training and inference with EfficientNet. We take a model pretrained on ImageNet, fine-tune it on the CIFAR-10 dataset, and then run inference with it. CIFAR-10 is downloaded through TensorFlow.

Environment

The code runs on Google Colaboratory. It works on either a TPU or a GPU, but with TensorFlow the TPU is considerably faster, so this article uses a TPU.

Installation

The following code installs the required packages and imports the libraries used below.

!pip install tensorflow_addons

import os
import tensorflow as tf
from tensorflow.keras.utils import to_categorical
from tensorflow.keras import backend as K

import tensorflow_addons as tfa
import matplotlib.pyplot as plt
import numpy as np
from PIL import Image

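Preparing the CIFAR-10 dataset

The following code downloads CIFAR-10 and prepares the training and validation data. The labels are one-hot encoded, and the CIFAR-10 test split is used as the validation data.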

(x_train,y_train), (x_test,y_test) = tf.keras.datasets.cifar10.load_data()

y_train = to_categorical(y_train)
y_test = to_categorical(y_test)

train_data, validation_data = (x_train,y_train), (x_test,y_test)
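
To make the label conversion concrete, here is a minimal NumPy sketch of what one-hot encoding does to CIFAR-10 style labels. The toy label array and the helper name one_hot are made up for illustration; the tutorial itself relies on to_categorical.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Hypothetical helper: turn integer labels of shape (N, 1) or (N,) into one-hot rows."""
    flat = np.asarray(labels).reshape(-1)      # CIFAR-10 labels arrive with shape (N, 1)
    return np.eye(num_classes, dtype=np.float32)[flat]

toy_labels = np.array([[3], [0], [9]])         # made-up labels, same layout as y_train
encoded = one_hot(toy_labels, num_classes=10)

print(encoded.shape)
print(encoded[0])
```

Each row has a single 1 at the index of its class, which is the label format that CategoricalCrossentropy expects.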

Creating the dataset

Next, the data is converted into tf.data.Dataset objects for use in training. The batch size is 128.

ds_train = tf.data.Dataset.from_tensor_slices(train_data)
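# Shuffle over all samples; len(train_data) would be 2 because train_data is an (images, labels) tuple
ds_train = ds_train.shuffle(len(x_train)).batch(128, drop_remainder=True)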
ds_validation = tf.data.Dataset.from_tensor_slices(validation_data)
ds_validation = ds_validation.batch(128)

def data_preprocesssing(image, label):
  image = tf.cast(image, tf.float32)
  image = tf.keras.applications.efficientnet.preprocess_input(image)
  return image, label

ds_train = ds_train.map(lambda image, label: data_preprocesssing(image, label))
ds_train = ds_train.prefetch(tf.data.experimental.AUTOTUNE)

ds_validation = ds_validation.map(lambda image, label: data_preprocesssing(image, label))
ds_validation = ds_validation.prefetch(tf.data.experimental.AUTOTUNE)
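
The effect of batch(128, drop_remainder=True) on the number of steps per epoch can be checked with plain arithmetic. The sample counts below are the standard CIFAR-10 split sizes (50,000 training and 10,000 test images), and the helper name batches_per_epoch is hypothetical.

```python
import math

def batches_per_epoch(num_samples, batch_size, drop_remainder):
    """Hypothetical helper that counts batches the way dataset batching does."""
    if drop_remainder:
        return num_samples // batch_size        # the last partial batch is discarded
    return math.ceil(num_samples / batch_size)  # the last partial batch is kept

train_steps = batches_per_epoch(50_000, 128, drop_remainder=True)
valid_steps = batches_per_epoch(10_000, 128, drop_remainder=False)
dropped = 50_000 - train_steps * 128            # training images skipped in each epoch

print(train_steps, valid_steps, dropped)
```

Keeping every training batch at exactly 128 samples gives fixed tensor shapes, which is generally what TPU execution expects. Because the data is reshuffled each epoch, a different small set of images is skipped each time.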

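Preparing the EfficientNet model

This article uses EfficientNetB5. Because CIFAR-10 images are 32x32, a resize layer first scales them up to the 456x456 input size, and a new classification head is added so that the output matches the 10 CIFAR-10 classes. The optimizer is SGD. The following code prepares the pretrained model.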

model = tf.keras.applications.EfficientNetB5(include_top=False, input_shape=(456,456,3), weights='imagenet')

inputs = tf.keras.layers.Input(shape=(32,32,3))
x = tf.keras.layers.Lambda(lambda image: tf.image.resize(image, (456,456)), output_shape=(456,456,3), name='stem_resize')(inputs)
x = model(x, training=False)
x = tf.keras.layers.GlobalAveragePooling2D()(x)
x = tf.keras.layers.Dropout(0.2)(x)
x = tf.keras.layers.Dense(10)(x)
outputs = tf.keras.layers.Activation('softmax')(x)
model = tf.keras.Model(inputs, outputs)

model.compile(
  optimizer=tf.keras.optimizers.SGD(learning_rate=0.01, momentum=0.9),
  loss=tf.keras.losses.CategoricalCrossentropy(),
  metrics=[tf.keras.metrics.CategoricalAccuracy(name='acc')]
)
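
What the layers added on top of the backbone compute can be sketched in NumPy. This is an illustration only: the feature map is random data, its shape (15, 15, 2048) is an assumption about what EfficientNetB5 produces for a 456x456 input with include_top=False (check the backbone's output_shape to confirm), and the weights are random rather than trained. Dropout is left out because it is inactive at inference time.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed backbone output: batch of 2, 15x15 spatial grid, 2048 channels.
features = rng.normal(size=(2, 15, 15, 2048)).astype(np.float32)

# GlobalAveragePooling2D: average over the two spatial axes.
pooled = features.mean(axis=(1, 2))                     # shape (2, 2048)

# Dense(10): affine map to one logit per CIFAR-10 class (random weights here).
weights = rng.normal(scale=0.01, size=(2048, 10)).astype(np.float32)
bias = np.zeros(10, dtype=np.float32)
logits = pooled @ weights + bias                        # shape (2, 10)

# Activation('softmax'): subtract the row max for numerical stability.
shifted = logits - logits.max(axis=1, keepdims=True)
probs = np.exp(shifted) / np.exp(shifted).sum(axis=1, keepdims=True)

print(probs.shape)
print(probs.sum(axis=1))
```

Each output row is a probability distribution over the 10 classes, which is why the model is compiled with CategoricalCrossentropy on one-hot labels.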

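Running training

We now train with the dataset and model prepared above. However, the steps from dataset creation onward are rewritten so that they run inside strategy.scope() of a tf.distribute strategy.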

(x_train,y_train), (x_test,y_test) = tf.keras.datasets.cifar10.load_data()

y_train = to_categorical(y_train)
y_test = to_categorical(y_test)

train_data, validation_data = (x_train,y_train), (x_test,y_test)

try:
    tpu = tf.distribute.cluster_resolver.TPUClusterResolver()  # TPU detection
    tf.config.experimental_connect_to_cluster(tpu)
    tf.tpu.experimental.initialize_tpu_system(tpu)
    strategy = tf.distribute.TPUStrategy(tpu)
except ValueError:
    tpu=None
    strategy = tf.distribute.get_strategy()

with strategy.scope():
  # Build the datasets
  ds_train = tf.data.Dataset.from_tensor_slices(train_data)
  # Shuffle over all samples; len(train_data) would be 2 because train_data is a tuple
  ds_train = ds_train.shuffle(len(x_train)).batch(128, drop_remainder=True)
  ds_validation = tf.data.Dataset.from_tensor_slices(validation_data)
  ds_validation = ds_validation.batch(128)

  def data_preprocesssing(image, label):
    image = tf.cast(image, tf.float32)
    image = tf.keras.applications.efficientnet.preprocess_input(image)
    return image, label

  ds_train = ds_train.map(lambda image, label: data_preprocesssing(image, label))
  ds_train = ds_train.prefetch(tf.data.experimental.AUTOTUNE)

  ds_validation = ds_validation.map(lambda image, label: data_preprocesssing(image, label))
  ds_validation = ds_validation.prefetch(tf.data.experimental.AUTOTUNE)

  # Build the model
  model = tf.keras.applications.EfficientNetB5(include_top=False, input_shape=(456,456,3), weights='imagenet')

  inputs = tf.keras.layers.Input(shape=(32,32,3))  # CIFAR-10 images are 32x32 RGB
  x = tf.keras.layers.Lambda(lambda image: tf.image.resize(image, (456,456)), output_shape=(456,456,3), name='stem_resize')(inputs)
  x = model(x, training=True)
  x = tf.keras.layers.GlobalAveragePooling2D()(x)
  x = tf.keras.layers.Dropout(0.2)(x)
  x = tf.keras.layers.Dense(10)(x)
  outputs = tf.keras.layers.Activation('softmax')(x)
  model = tf.keras.Model(inputs, outputs)

  model.compile(
    optimizer=tf.keras.optimizers.SGD(learning_rate=0.01, momentum=0.9),
    loss=tf.keras.losses.CategoricalCrossentropy(),
    metrics=[tf.keras.metrics.CategoricalAccuracy(name='acc')]
  )
  # Scheduler Config
  warmup_epochs = 10
  flat_epochs = 10
  cooldown_epochs = 10
  min_lr = 0.001
  max_lr = 0.025

  def scheduler(epoch, lr):
    if epoch < warmup_epochs:
      return min_lr + 0.5*(max_lr-min_lr)*(1.0-np.cos(epoch/warmup_epochs*np.pi))
    elif epoch < warmup_epochs+flat_epochs:
      return max_lr
    else:
      epoch = epoch - (warmup_epochs+flat_epochs) + 1
      return min_lr + 0.5*(max_lr-min_lr)*(1.0+np.cos(epoch/cooldown_epochs*np.pi))

  # Run training
  result = model.fit(ds_train, epochs=30, validation_data=ds_validation, callbacks=[tf.keras.callbacks.LearningRateScheduler(scheduler)])
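
Before launching a 30-epoch run it can help to print the schedule on its own. The sketch below re-implements the same warmup / flat / cooldown rule with the standard math module so that it runs without TensorFlow. The function name lr_at is hypothetical, and the constants are copied from the code above.

```python
import math

warmup_epochs, flat_epochs, cooldown_epochs = 10, 10, 10
min_lr, max_lr = 0.001, 0.025

def lr_at(epoch):
    """Hypothetical stand-alone copy of the scheduler rule."""
    if epoch < warmup_epochs:
        # Half-cosine ramp that starts at min_lr on epoch 0 and rises towards max_lr.
        return min_lr + 0.5 * (max_lr - min_lr) * (1.0 - math.cos(epoch / warmup_epochs * math.pi))
    if epoch < warmup_epochs + flat_epochs:
        return max_lr
    # Half-cosine decay; the +1 makes the final epoch land on min_lr.
    step = epoch - (warmup_epochs + flat_epochs) + 1
    return min_lr + 0.5 * (max_lr - min_lr) * (1.0 + math.cos(step / cooldown_epochs * math.pi))

schedule = [lr_at(e) for e in range(30)]
for epoch in (0, 5, 9, 10, 19, 20, 29):
    print(f"epoch {epoch:2d}: lr = {schedule[epoch]:.5f}")
```

Because LearningRateScheduler sets the rate at the start of every epoch, the learning_rate=0.01 passed to SGD is effectively replaced by this schedule from the first epoch on.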

Checking the training results

The following code plots the accuracy and loss recorded during training.

history = result.history
acc = history['acc']
val_acc = history['val_acc']
max_acc = max(val_acc)

loss = history['loss']
val_loss = history['val_loss']

epochs = range(len(acc))

plt.figure(figsize=(16,6))

# Accuracy
plt.subplot(1,2,1)
plt.plot(epochs, acc, 'r', label='Training')
plt.plot(epochs, val_acc, 'b', label='Validation')
plt.title('Accuracy')
plt.grid()
plt.legend()

# Loss
plt.subplot(1,2,2)
plt.plot(epochs, loss, 'r', label='Training')
plt.plot(epochs, val_loss, 'b', label='Validation')
plt.title('Loss')
if max(max(loss),max(val_loss))>10.0:
    plt.ylim(0.0,10.0)
plt.grid()
plt.legend()
plt.show()

The results are shown below.
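
Besides reading the curves, the best epoch can be taken directly from the history dictionary. The values below are placeholder numbers made up for the example, not results from this training run; in the notebook, result.history would be used instead.

```python
# Placeholder history with the same keys as result.history (values are made up).
history = {
    'acc':      [0.62, 0.81, 0.90, 0.94],
    'val_acc':  [0.70, 0.85, 0.88, 0.87],
    'loss':     [1.10, 0.55, 0.30, 0.18],
    'val_loss': [0.90, 0.45, 0.36, 0.40],
}

val_acc = history['val_acc']
best_epoch = max(range(len(val_acc)), key=lambda e: val_acc[e])  # 0-based index
gap = history['acc'][best_epoch] - val_acc[best_epoch]           # train minus validation

print(f"best val_acc = {val_acc[best_epoch]:.4f} at epoch {best_epoch + 1}")
print(f"train/val gap at that epoch = {gap:.4f}")
```

A gap that keeps growing after the best epoch is a common sign of overfitting, which is easier to spot as a number than on the plot.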

Inference

Finally, we run inference with the model trained above. Only the leopard and wolf images are extracted from the CIFAR-100 test set and passed to the model.

_, (x_test,y_test) = tf.keras.datasets.cifar100.load_data()
x_test_2 = []
for x, y in zip(x_test, y_test):
    if y == 42 or y == 97:  # 42: leopard, 97: wolf
        x_test_2.append(x)

x_test = np.array(x_test_2)
y_test = y_test[(y_test==42) | (y_test==97)]
y_test = to_categorical(y_test)
infer_data = (x_test,y_test)

with strategy.scope():
  ds_infer = tf.data.Dataset.from_tensor_slices(infer_data)
  ds_infer = ds_infer.batch(128)

  def data_preprocesssing(image, label):
    image = tf.cast(image, tf.float32)
    image = tf.keras.applications.efficientnet.preprocess_input(image)
    return image, label

  ds_infer = ds_infer.map(lambda image, label: data_preprocesssing(image, label))
  ds_infer = ds_infer.prefetch(tf.data.experimental.AUTOTUNE)
  result = model.predict(ds_infer, verbose=1)

# Keep exactly one (class index, probability) pair per image, even when two classes tie
preds = [(int(np.argmax(r)), float(r.max())) for r in result]

classes = {
  0:'airplane',
  1:'automobile',
  2:'bird',
  3:'cat',
  4:'deer',
  5:'dog',
  6:'frog',
  7:'horse',
  8:'ship',
  9:'truck'
}
for i, p in enumerate(preds[:3]):
  pil_img = Image.fromarray(x_test[i])
  plt.imshow(pil_img)  # draw the image; plt.show() alone displays nothing
  plt.show()
  print(classes[p[0]], p[1])
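
Looking only at the top-1 class hides near misses, for example an image scored almost equally as cat and dog. A minimal sketch of a top-3 readout with NumPy follows. The probability vector is invented for the example and the helper name top_k is hypothetical.

```python
import numpy as np

class_names = ['airplane', 'automobile', 'bird', 'cat', 'deer',
               'dog', 'frog', 'horse', 'ship', 'truck']

def top_k(probs, k=3):
    """Hypothetical helper: return the k most probable (class name, probability) pairs."""
    order = np.argsort(probs)[::-1][:k]         # indices sorted by descending probability
    return [(class_names[i], float(probs[i])) for i in order]

# Invented softmax output for a single image (the entries sum to 1).
example = np.array([0.01, 0.01, 0.03, 0.40, 0.10, 0.35, 0.02, 0.06, 0.01, 0.01])
for name, p in top_k(example):
    print(f"{name}: {p:.2f}")
```

In the notebook, each row of the array returned by model.predict could be passed to such a helper in place of the invented vector.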

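The inference results are shown below. Because the dataset is different and the images are low resolution, many images are assigned to completely unrelated classes, but the easier images tend to be predicted as leopard → cat and wolf → dog.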
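
The qualitative impression above can be turned into counts by tallying the predicted CIFAR-10 class for each CIFAR-100 source class. The arrays below are small made-up stand-ins, not real predictions; in the notebook the predicted indices would come from result.argmax(axis=1) and the true labels from the filtered CIFAR-100 labels before one-hot encoding.

```python
from collections import Counter
import numpy as np

cifar10_names = ['airplane', 'automobile', 'bird', 'cat', 'deer',
                 'dog', 'frog', 'horse', 'ship', 'truck']
source_names = {42: 'leopard', 97: 'wolf'}      # CIFAR-100 fine labels used above

# Made-up stand-ins for the real arrays.
true_fine = np.array([42, 42, 42, 97, 97, 97])
pred_idx = np.array([3, 3, 4, 5, 5, 3])

# One counter of predicted CIFAR-10 names per CIFAR-100 source class.
tally = {name: Counter() for name in source_names.values()}
for fine, pred in zip(true_fine, pred_idx):
    tally[source_names[int(fine)]][cifar10_names[int(pred)]] += 1

for name, counts in tally.items():
    print(name, counts.most_common())
```

A table like this shows at a glance how often leopards map to cat and wolves to dog, and which unrelated classes absorb the rest.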

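Summary

This article gave a simple tutorial on EfficientNet. EfficientNet is widely used as a de facto standard for image classification, so there should be many opportunities to use it. With small changes to the code above you can also use older models or the improved EfficientNetV2, and the same code runs as-is on a GPU instance. It can also be implemented easily in PyTorch, but that is omitted here because the article would have become too cluttered. Give it a try.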
