AdamW_Keras

Implementation of the AdamW optimizer (Ilya Loshchilov, Frank Hutter) for Keras.

Tested with

  • python 3.6
  • Keras 2.1.6
  • tensorflow(-gpu) 1.8.0

Usage

In addition to the usual Keras setup for building neural networks (see Keras for details):

from AdamW import AdamW

adamw = AdamW(lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=None, decay=0.,
              weight_decay=0.025, batch_size=1, samples_per_epoch=1, epochs=1)

Then nothing changes compared to the usual use of an optimizer in Keras once the model's architecture is defined:

from keras.models import Sequential
from keras import metrics

model = Sequential()
# <definition of the model architecture>
model.compile(loss="mse", optimizer=adamw, metrics=[metrics.mse], ...)
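
For completeness, here is a minimal end-to-end sketch; the toy data, layer sizes, and Dense layers are illustrative and not part of this repository:

import numpy as np
from keras.models import Sequential
from keras.layers import Dense
from keras import metrics
from AdamW import AdamW

# Toy regression data, for illustration only
x_train = np.random.rand(1000, 20)
y_train = np.random.rand(1000, 1)

batch_size, epochs = 32, 10
adamw = AdamW(lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=None, decay=0.,
              weight_decay=0.025, batch_size=batch_size,
              samples_per_epoch=x_train.shape[0], epochs=epochs)

model = Sequential()
model.add(Dense(64, activation="relu", input_dim=20))
model.add(Dense(1))
model.compile(loss="mse", optimizer=adamw, metrics=[metrics.mse])

model.fit(x_train, y_train, batch_size=batch_size, epochs=epochs)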

Note that the batch size (batch_size), the number of training samples per epoch (samples_per_epoch), and the number of epochs (epochs) are required to normalize the weight decay (paper, Section 4); the normalization is sketched below.
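
A minimal sketch of that normalization, assuming the weight_decay argument is the normalized value from the paper; the variable names below are illustrative and may not match those used inside AdamW.py:

import math

# Illustrative values; in practice these match the arguments passed to AdamW
weight_decay, batch_size, samples_per_epoch, epochs = 0.025, 32, 1000, 10

# Normalized weight decay (Loshchilov & Hutter, Section 4):
# lambda = lambda_norm * sqrt(b / (B * T))
applied_decay = weight_decay * math.sqrt(batch_size / (samples_per_epoch * epochs))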

Done

  • Weight decay added to the parameter updates
  • Normalized weight decay added

To be done (eventually - help is welcome)

  • Cosine annealing
  • Warm restarts (see the schedule sketch below)
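
Neither is implemented here yet. As a reference point, below is a minimal sketch of a cosine-annealing schedule with fixed-period warm restarts, wired up as a standard Keras callback; the function name and hyper-parameter values are illustrative, not part of this repository:

import math
from keras.callbacks import LearningRateScheduler

def cosine_annealing(lr_min=1e-5, lr_max=1e-3, period=10):
    """Learning rate decays from lr_max to lr_min over `period` epochs,
    then restarts at lr_max. The paper also lets the period grow after
    each restart; a fixed period is used here for simplicity."""
    def schedule(epoch):
        t_cur = epoch % period
        return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * t_cur / period))
    return schedule

# Illustrative usage:
# model.fit(x_train, y_train, batch_size=batch_size, epochs=epochs,
#           callbacks=[LearningRateScheduler(cosine_annealing())])

In the paper, the same schedule multiplier also scales the weight decay term, so a complete implementation would need to live inside the optimizer rather than in an external callback like this one.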

Sources

Adam: A Method for Stochastic Optimization, D. P. Kingma, J. Lei Ba

Fixing Weight Decay Regularization in Adam, I. Loshchilov, F. Hutter
