Skip to content

Set env.MSB_BROKER_RECONNECT to false by default #200

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
shumsky opened this issue Jul 26, 2017 · 11 comments
Closed

Set env.MSB_BROKER_RECONNECT to false by default #200

shumsky opened this issue Jul 26, 2017 · 11 comments

Comments

@shumsky
Copy link
Collaborator

shumsky commented Jul 26, 2017

Hello,

amqp-coffee library which the AMQP adapter is based on provides a feature of connection retry. MSB enables that feature by default:

  config.amqp = {
    // ...
    reconnect: env.MSB_BROKER_RECONNECT != 'false',
    // ...
  };

The drawback of that feature is that the retry is done silently without logging an error or emitting any events. If a microservice which consumes a queue via MSB is misconfigured and fails to establish a connection with the borker, it will retry forever without giving any signs of the problem.

I propose to disable connection retry in MSB by default, namely set MSB_BROKER_RECONNECT to false. So that the microservice would terminate with an error printed to stderr once the connection fails to be established or is lost. In that case the problem would be immediately detected.

When the application crashes, PM2 will try to restart the it with limited number of attempts. If the connection problem is temporary and may be gone in a moment, the microservice could be healed after a restart. If the connection problem is permanent, PM2 will exhaust the number of attempts and set the microservice to the errored state.

What do you think about that?

@agritsik
Copy link
Collaborator

agritsik commented Jul 26, 2017

Consider this as a minor fix for the 1.x version. Still backward compatible but brings a new behavior in case of connectivity issue.

The same fix should be added to 2.x version as well.

@agritsik
Copy link
Collaborator

duplicates #56

@Kumjami
Copy link
Contributor

Kumjami commented Jul 27, 2017

Hi @shumsky,

I agree that an app/module doing things silently without notifying is a very annoying behaviour.
By deactivating this silent retry you are delegating retry responsibility to process level (PM2 in that case).
Not sure if retry is a required feature for MSB. Actually, wasn't it one of the main reasons to pick amqp-coffe on the first place?
If amqp-coffe retry functionality is not something pleasant to work with, I would rather go to use amqplib which is the official package recommended by RabbitMQ javascript examples and implement retry logic by ourselves if that is a requirement.

Maybe I'm going a little bit out of the scope. To stuck with the initial issue: I would disable retry if that is working silently by default and try to document really well that.
For a further step I would plan a way to fix this silently behaviour.

@shumsky
Copy link
Collaborator Author

shumsky commented Jul 27, 2017

Hi @Kumjami,

To me, MSB should have retry, whether it's provided by the underlying AMQP library or implemented by ourselves on the adapter level. But it looks like the current solution brings more harm than good.

So maybe it's worth to disable it by default (BTW, it is disabled by default in the amqp-coffee itself) and create an issue for implementing a better solution which could then become the default behavior. As you suggested, we could also consider replacement of the amqp-coffee with the amqplib within the scope of the new solution.

I like the way the retry is done in the RabbitMQ Java client. msb-java is aligned with it. We could consider implementing it in MSB for Node.

@Kumjami
Copy link
Contributor

Kumjami commented Jul 27, 2017

@shumsky I completely agree.

@shumsky
Copy link
Collaborator Author

shumsky commented Jul 31, 2017

I compared default behaviour of three AMQP libraries in a few different scenarios of connection loss:

Library / Scenrario Consumption on startup Consumption while running Publish
amqp-coffee with RECONNECT=true silent retry silent retry silent retry
amqp-coffee with RECONNECT=false stop stop stop
amqplib for Node stop stop stop
amqp-client for Java stop retry with logging; recovery of consumers and bindings throw exception

The worst behaviour is amqp-coffee with RECONNECT=true, where silent retry is applied. And a good behaviour, to me, is amqp-client for Java, which we could consider implementing in MSB and which is also an official RabbitMQ client for Java.

Summing it up. Due to amqp-coffee API being not fluent enough, it is hard to adjust its retry mechanism without workarounds in MSB or microservices. We can do next sequence of steps:

  1. Disable internal retry in amqp-coffee, i.e. set RECONNECT=false (align with amqplib for Node behaviour)
  2. Since we are now aligned with amqplib behaviour, we can consider switching to it in the MSB AMQP adapter. amqplib compared to amqp-coffee looks more mature and stable
  3. Implement a retry mechanism on MSB adapter level (align MSB with amqp-client for Java or another good strategy)

The first step is the solution I initially proposed in this issue. Once we encounter a connection loss, we would let the microservice die and delegate recovery to the process supervisor (e.g. PM2). The same approach is also advocated by the creator of the amqplib for Node: amqp-node/amqplib#25 (comment)

@Kumjami
Copy link
Contributor

Kumjami commented Aug 14, 2017

Hi @shumsky, great report.

So I understand that official amqp libraries has different behaviours in Java and Node. This is something a little bit confusing.

In my opinion we should try to have the logic a the same place for different implementations. So, if in java the retry is at amqp lib level, in Node it should be also at that level, and not in MSB logic itself.
If retry logic is at MSB level, then it should be implemented in Java and Node versions. Having different logic is a little bit strange for me.

So maybe it would be a possibility to open a PR to add retry and logging at amqplib or to add the logging while retrying at amqp-coffee. Do you think that could be possible?

@shumsky
Copy link
Collaborator Author

shumsky commented Aug 17, 2017

@Kumjami Agree that the retry approach should be aligned in Java and Node versions of MSB. I think opening a PR to either amqplib or amqp-coffee would be the right way. Though it might take time. The maintainer of the amqp-coffee hasn't been responsive for last few weeks. And the maintainer of amqplib seems to have concerns regarding retry functionality as a corresponding issue has been unresolved since 2013.

@miguel-cotta
Copy link

Hi all,
just wanted to add some other points regarding the amqplib vs amqp-coffee debate. Newer versions (>= 2.0.0) of the New Relic nodejs agent support amqplib:

https://docs.newrelic.com/docs/agents/nodejs-agent/supported-features/message-queues

@agritsik
Copy link
Collaborator

Thanks for the update. It is certainly one more good point for migrating to amqplib.

@agritsik
Copy link
Collaborator

All details regarding migration to amqplib will be described there #204

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants