systemd: Do not report "stopping" state to systemd · rabbitmq/rabbitmq-server@b5a8e61


systemd: Do not report "stopping" state to systemd

The problem is that we only know about the state of the `rabbit` Erlang
application: when it is started and when it is stopped. We can't know
the fate of the Erlang VM, except if `rabbit:stop_and_halt()` is called.
This function is not called if, for instance, `init:stop()` or a SIGTERM
is used.

systemd is interested in the state of the system process (the Erlang
VM), not what's happening inside. But inside, we have multiple
situations where the Erlang application is stopped, but not the Erlang
VM. For instance:

    * When clustering, the Erlang application is stopped before the
      cluster is created or expanded. The application is restarted once
      done. This is controlled either manually or using the peer
      discovery plugins.

    * The `pause_minority` or `pause_if_all_down` partition strategies
      both stop the Erlang application for an indefinite period of time,
      but RabbitMQ as a service is still up (even though it is managing
      its own degraded mode and no connections are accepted).

In both cases, the service is still running from the system's service
manager's point of view.

As said above, we can never tell "the VM is being terminated" with
confidence. We can only know about the Erlang application itself.
Therefore, it is best to report the latter as a systemd state
description, without reporting the "STOPPING=1" state at all. systemd
will figure out by itself that the Erlang VM exited anyway.

Before this change, we were reporting the "STOPPING=1" state to systemd
every time the Erlang application was stopped. The problem was that
systemd expected the system process (the Erlang VM) to exit within a
configured period of time (90 seconds by default) or report that it is
ready again ("READY=1"). This issue remained unnoticed when the cluster
was created/expanded because it probably happened within that time
frame. However, it was reported with the partition healing strategies
because the partition might last longer than 90 seconds. When this
happened, the Erlang VM was killed (SIGKILL) and the service restarted.
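
The behaviour described above can be sketched as a pure function. This is only an illustration, not the code from this commit: the module name `boot_state_notify_sketch`, the function `notifications/1`, and the fallback description clause are hypothetical. It assumes the notification terms are later passed to `systemd:notify/1`, as the diff does for `{status, Status}`, and that the library accepts a `ready` term for "READY=1".

```erlang
-module(boot_state_notify_sketch).
-export([notifications/1, boot_state_to_desc/1]).

%% Hypothetical helper: returns the terms that would be handed to
%% systemd:notify/1 for a given boot state. Nothing is sent here.
-spec notifications(atom()) -> [ready | {status, string()}].
notifications(ready = BootState) ->
    %% Only `ready` maps to a protocol-level state ("READY=1").
    [ready, {status, boot_state_to_desc(BootState)}];
notifications(BootState) when is_atom(BootState) ->
    %% Every other state, including `stopping`, only updates the
    %% free-form status text. "STOPPING=1" is never produced.
    [{status, boot_state_to_desc(BootState)}].

-spec boot_state_to_desc(atom()) -> string().
boot_state_to_desc(stopped) -> "Standing by";
boot_state_to_desc(booting) -> "Startup in progress";
%% Assumption: placeholder for the states not visible in the diff.
boot_state_to_desc(Other) -> atom_to_list(Other).
```

With this shape, a long pause caused by `pause_minority` would only change the status text shown by `systemctl status`, so systemd would have no stop timeout to enforce.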

References #3262.
Fixes #3289.

(cherry picked from commit 23c71b2)


dumbbell authored and mergify-bot committed Aug 10, 2021

1 parent bc91083 commit b5a8e61

deps/rabbit/apps/rabbitmq_prelaunch/src/rabbit_boot_state_systemd.erl

@@ -45,8 +45,7 @@ code_change(_OldVsn, State, _Extra) ->
     %% Private
-    notify_boot_state(BootState)
-      when BootState =:= ready orelse BootState =:= stopping ->
+    notify_boot_state(ready = BootState) ->
         Status = boot_state_to_desc(BootState),
         ?LOG_DEBUG(
            ?LOG_PREFIX "notifying of state `~s`",
@@ -62,7 +61,7 @@ notify_boot_state(BootState) ->
         systemd:notify({status, Status}).
     boot_state_to_desc(stopped) ->
-        "";
+        "Standing by";
     boot_state_to_desc(booting) ->
         "Startup in progress";
     boot_state_to_desc(core_started) ->
