Skip to content
This repository has been archived by the owner on Feb 15, 2018. It is now read-only.

Add startup lock support (general framework + tested etcd implementation) #98

Closed
wants to merge 5 commits into from

Conversation

binarin
Copy link
Contributor

@binarin binarin commented Oct 7, 2016

I'm sorry for such a big patch, but I think there was no other way. (Also this PR includes patch from #96, which makes diff even bigger).

This required completely reworking startup sequence in autocluster.erl - as adding locking
would have made the existing dozen step sequence into an incomprehensible
mess. Now steps are more independent and are executed in order with
function that provides something like State + Either monads.

Boot process is now split into 3 stages:

  • First stage roughly corresponds to original implementation
  • Second step is responsible for registering node in the backend. Also
    TTL timers are now started there, instead of spreading
    -rabbit_boot_steps through backend modules. This makes
    autocluster.erl the single source of truth about what is actually
    happening during startup.
  • Third stage is responsible for releasing startup lock. We need it
    because of ignore failure mode - we want the lock to be released
    even if some prior steps failed.

During this rework it became evident that explicit mnesia:reset/0 is not
needed (more thorough explanation is in comments for
autocluster:maybe_cluster/1).

Startup locking support should work the same way for every backend, but
for now the only working implementation is for etcd. It even has some
property-based tests =)
Other backend fall back to original random delay behaviour.

Also it looks like dialyzer ignores type specs in comments - because
some of types there were definitely wrong, and converting to them to
-spec caused a lot of barfing from dialyzer.

This required completely reworking startup sequence in `autocluster.erl`
without making the existing dozen step sequence into an incomprehensible
mess. Now steps are more independent and are executed in order with
function that provides something like State + Either monads.

Boot process is now split into 3 stages:
- First stage roughly corresponds to original implementation
- Second step is responsible for registering node in the backend. Also
  TTL timers are now started there, instead of spreading
  `-rabbit_boot_steps` through backend modules. This makes
  `autocluster.erl` the single source of truth about what is actually
  happening during startup.
- Third stage is responsible for releasing startup lock. We need it
  because of `ignore` failure mode - we want the lock to be released
  even if some prior steps failed.

During this rework it became evident that explicit mnesia:reset/0 is not
needed (more thorough explanation is in comments for
`autocluster:maybe_cluster/1`).

Startup locking support should work the same way for every backend, but
for now the only working implementation is for `etcd`. It even has some
property-based tests =)
Other backend fall back to original random delay behaviour.

Also it looks like `dialyzer` ignores type specs in comments - because
some of types there were definitely wrong, and converting to them to
`-spec` caused a lot of barfing from `dialyzer`.
@codecov-io
Copy link

codecov-io commented Oct 11, 2016

Current coverage is 53.84% (diff: 62.50%)

Merging #98 into master will increase coverage by 12.13%

@@             master        #98   diff @@
==========================================
  Files            13         14     +1   
  Lines           513        611    +98   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
+ Hits            214        329   +115   
+ Misses          299        282    -17   
  Partials          0          0          

Powered by Codecov. Last update 9ed8213...67506b8

@binarin
Copy link
Contributor Author

binarin commented Oct 28, 2016

@gmr Will you have some time to look into this PR in the nearest future?

@gmr
Copy link
Contributor

gmr commented Oct 28, 2016

Yes, will do...

@michaelklishin
Copy link
Collaborator

michaelklishin commented Mar 2, 2017

@gmr we'd like to get this feature into 3.7.0. May I merge it here first?

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants