
[BUG Report + fix]: Earlystopping does not stop according to "_min_delta" and "_baseline" parameters #1178

Closed
GaijinOtohp opened this issue Sep 20, 2023 · 4 comments

Comments

@GaijinOtohp
Contributor

Description

  1. [BUG Report]: EarlyStopping should trigger stopping of training when the loss does not improve by at least _min_delta, but currently it never does.
  2. [Suggestion]: I also suggest that training should stop once the loss reaches _baseline, which should serve as the threshold for the desired loss value.

Reproduction Steps

            var layers = tf.keras.layers;

            Tensor input = tf.keras.Input((-1, 1), name: "input_place_holder");
            Tensor hiddenLayer1 = layers.Dense(7, activation: "linear").Apply(input);
            Tensor output = layers.Dense(2, activation: "linear").Apply(hiddenLayer1);
            var model = tf.keras.Model(input, output);

            Tensorflow.NumPy.NDArray inputs = new Tensorflow.NumPy.NDArray(new float[,] { { -5 }, { -4 }, { -3 }, { -2 }, { -1 }, { 0 }, { 1 }, { 2 }, { 3 }, { 4 }, { 5 } });
            Tensorflow.NumPy.NDArray outputs = new Tensorflow.NumPy.NDArray(new float[,] { { 0, 2 }, { 1, 4 }, { 2, 6 }, { 3, 8 }, { 4, 10 }, { 5, 12 }, { 6, 14 }, { 7, 16 }, { 8, 18 }, { 9, 20 }, { 10, 22 } });

            List<Tensorflow.NumPy.NDArray> weights = model.get_weights();

            var callbackPars = new Tensorflow.Keras.Callbacks.CallbackParams() { Model = model };
            var earlystoppingCB = new Tensorflow.Keras.Callbacks.EarlyStopping(parameters: callbackPars,
                                                                    monitor: "loss",
                                                                    patience: 3,
                                                                    mode: "min",
                                                                    restore_best_weights: true,
                                                                    //baseline: 0.001f,
                                                                    min_delta: 0.0001f);

            Tensorflow.Keras.Engine.ICallback historyCB;
            float learningRate = 0.1f;
            for (int i = 0; i < 100; i++)
            {
                model.compile(optimizer: tf.keras.optimizers.SGD(learningRate), loss: tf.keras.losses.MeanSquaredError());

                historyCB = model.fit(inputs, outputs, epochs: 1000, callbacks: new List<Tensorflow.Keras.Engine.ICallback>() { earlystoppingCB });

                if (historyCB.history["loss"][historyCB.history["loss"].Count - 1] < 0.5f)
                    break;

                model.set_weights(weights);
                learningRate /= 10f;
            }

            Tensor results0 = model.predict(tf.constant(new float[,] { { -8 } }));

Known Workarounds

Suggested fix to the bug

  1. The fix for the bug is at line 134 of the file Tensorflow.Keras.Callbacks.EarlyStopping.cs:
    bool less_op = (monitor_value - _min_delta) < reference_value;
    ---> bool less_op = (monitor_value + _min_delta) < reference_value;

Why:

Since _min_delta is always positive, and monitor_value is only a slight improvement over reference_value, (monitor_value - _min_delta) < reference_value is true even when the improvement is smaller than _min_delta. The condition therefore keeps resetting _wait to 0, so patience never runs out and training never stops early.
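The effect of the sign flip can be checked with a quick standalone sketch (Python for brevity; `improved_buggy` and `improved_fixed` are illustrative names, not the library's API):

```python
MIN_DELTA = 0.0001  # min_delta is always stored as a positive value

def improved_buggy(monitor_value, reference_value, min_delta=MIN_DELTA):
    # Original condition: subtracting min_delta only makes monitor_value
    # smaller, so this is true even when the loss barely moved.
    return (monitor_value - min_delta) < reference_value

def improved_fixed(monitor_value, reference_value, min_delta=MIN_DELTA):
    # Fixed condition: the loss must beat the reference by at least min_delta.
    return (monitor_value + min_delta) < reference_value

# Loss shrank by only 1e-5, far less than min_delta:
print(improved_buggy(0.99999, 1.0))  # True  -> _wait is wrongly reset
print(improved_fixed(0.99999, 1.0))  # False -> _wait keeps counting
print(improved_fixed(0.9, 1.0))      # True  -> a real improvement still counts
```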

The suggestion to earlystopping using _baseline

  1. The fix to the suggestion is at line 88 of the file Tensorflow.Keras.Callbacks.EarlyStopping.cs:
    if (_baseline == 0f || _is_improvement(current, _baseline))
    ---> if (_baseline == 0f || _is_improvement(_baseline, current))

Why:

We should check whether the current (improved) loss has dropped below _baseline. If it has, _wait should no longer be reset to 0, so training can stop once patience runs out.
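Under that reading, the reset check can be sketched as follows (Python; `should_reset_wait` and its helper are illustrative names for the logic described above, not the actual TensorFlow.NET code):

```python
def is_improvement(monitor_value, reference_value, min_delta=0.0001):
    # "min" mode with the corrected sign: better means lower by at least min_delta.
    return (monitor_value + min_delta) < reference_value

def should_reset_wait(current, baseline):
    # Suggested logic: keep resetting _wait only while the loss has NOT yet
    # reached _baseline; once current drops below it, let patience run out.
    return baseline == 0.0 or is_improvement(baseline, current)

print(should_reset_wait(0.5, 0.001))     # True  -> still above baseline, keep training
print(should_reset_wait(0.0005, 0.001))  # False -> baseline reached, stop resetting _wait
```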

Configuration and Other Information

No response

@Wanglongzhi2001
Collaborator

Wanglongzhi2001 commented Sep 20, 2023

Hello, thanks a lot for your suggestion. TensorFlow.NET uses a single bool expression to represent self.monitor_op, rather than the np.greater / np.less functions used in TensorFlow. The reason this bug is triggered seems to be that the port is missing the part corresponding to the following TensorFlow code:

if self.monitor_op == np.greater:
    self.min_delta *= 1
else:
    self.min_delta *= -1

I will fix it; meanwhile you can refer to the implementation in TensorFlow:
https://github.com/tensorflow/tensorflow/blob/879a5e37eb350f556debec3cd5c9b6dcb8809318/tensorflow/python/keras/callbacks.py#L1702
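The upstream trick can be sketched in isolation (a simplified illustration whose names mirror the Keras callback, not the full class): flipping the sign of min_delta once at setup lets a single monitor_op(current - min_delta, reference) comparison cover both "min" and "max" modes.

```python
import numpy as np

class EarlyStoppingSketch:
    def __init__(self, mode="min", min_delta=0.0001):
        # Pick the comparison once, as Keras does.
        self.monitor_op = np.less if mode == "min" else np.greater
        self.min_delta = abs(min_delta)
        # The sign flip from the snippet quoted above:
        if self.monitor_op == np.greater:
            self.min_delta *= 1
        else:
            self.min_delta *= -1

    def is_improvement(self, current, reference):
        # In "min" mode this becomes (current + |min_delta|) < reference,
        # i.e. exactly the corrected condition from the bug report.
        return bool(self.monitor_op(current - self.min_delta, reference))

es_min = EarlyStoppingSketch(mode="min")
print(es_min.is_improvement(0.99999, 1.0))  # False: change smaller than min_delta
print(es_min.is_improvement(0.9, 1.0))      # True: genuine improvement

es_max = EarlyStoppingSketch(mode="max")    # e.g. monitoring accuracy
print(es_max.is_improvement(0.91, 0.9))     # True
```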

@Wanglongzhi2001
Collaborator

Hello, I have committed a PR to solve this problem: #1180
If there are still any problems, please tell me. ^_^

@Wanglongzhi2001
Collaborator

Can this issue be closed?

@GaijinOtohp
Contributor Author

Yes. The bug is fixed.
It can be closed.
