-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize event subscribe exception handling #5719
Comments
why need |
To offer users a choice, synchronization services can remain stable while allowing occasional exceptions to occur during subscribe logic. |
@jwrct @317787106 @ss3344520 Do you have any ideas about this issue? |
@lurais From your description, there are more than two exception handling strategies. It is recommended to use enumeration types for configuration parameters instead of boolean types. |
I don't recommend adding more configurations for user purposes. |
@ss3344520 In my opinion, is there any situation in which it needs to use another strategy besides stopping or continuing to synchronize blocks? |
@lurais is System.exit(1) has any potential problems, such as resource release ? |
@317787106 When you use the command "System.exit", it will take effect as same as "kill -15" actually. |
@halibobo1205 Why do you not recommend adding more configurations for user purposes? Do you have any more detail reasons? |
Based on your plan and default configuration, if an exception is thrown causing the system to exit when publishing an event, how should the node user handle it to recover? Will there be any data loss issue after the node restarts? Will there be any impact on the event subscribers? |
Event subscribers are not allowed to drop data in most cases. |
@jwrct We expect the event subscription mechanism to work without any issues and not lose any data. If you encounter any problems with the event subscription component itself, you should fix the problem and restart it as before. If the problem is caused by sporadic issues like network problems, restarting the node can generally return it to normal. As the currently processed block is not solidified, there will be no data loss. However, duplicate data may be generated due to the impact on the event subscription component. |
@halibobo1205 As we can see, if someone builds a website, the website provides external services by calling the RPC interface of the node, and also uses event subscription for subsequent on-chain tracking. If the node is stopped directly, the overall service will lose availability. Is this reasonable? |
@jwrct @ss3344520 @317787106 do you have any ideas about this scenario? |
@lurais What scene? I don't understand what you are asking. |
@jwrct if someone builds a website, the website provides external services by calling the RPC interface of the node, and also uses event subscription for subsequent on-chain tracking. If the node is stopped directly, the overall service will lose availability. Is this reasonable? |
Isn't that right? I see that when the modified code throws an exception in the event subscription service, if the switch is not turned on, the program will exit directly? There's nothing like that. How do users perceive when a node stops? Since we are concerned about the generation of duplicate data, we should focus on data deduplication. |
@lxcmyf If the switch is not turned on, the program will terminate. This behavior is similar to the one before the modification. Previously, if there was an abnormal event subscription, synchronization would stop but the process would continue running. However, existing tools can easily detect when a node stops running. Since forks and other similar situations may arise, it is possible that the subscription service may receive duplicate data, this is another question. |
I think the focus is on why the event subscription service threw an error, and then resolving the error and restarting it. |
My point is to make sure that messages can't be lost if the event subscription is turned on . |
@lxcmyf Most people think that the switch is not needed. Therefore, it will be removed to avoid data loss in the event subscription service. However, users with specific requirements can modify the code to suit their needs. |
This switch doesn't do much for me, I don't really like using a service that clearly tells me there's a problem. |
@bladehan1 Do you mean you would like to be told that there is a problem clearly without an option to ignore it? |
I've read all of the above and think the ignoreSubscribeErrorSwitchOn switch is unnecessary. |
@CarlChaoCarl Ok, got it. |
As discussed below, the config ignoreSubscribeError is unnecessary and will not be added. |
|
@halibobo1205 From what I can understand, the event subscription logic seems to work fine without any exceptions. However, there may be cases where we end up throwing exceptions due to incorrect usage of third-party components. For instance, in this scenario issue, if two threads try to use the Zeromq client simultaneously, it might result in errors. Although it may work if we try pushBlock again, it is not advisable as some data might be lost, and it could lead to other issues in the future. |
@lurais This exception only happens when write to ZeroMQ fails. When event plugin enables, the data is write to a queue that exists in memory so exception may not happen in this scene generaly. ZeroMQ and event plugin cannot be used at the same time. ZeroMQ is a native queue, exception happens rarely. I prefer to stop the node as retrying to write data might not work if IO error occurs. |
@317787106 Yes , it may be wise to resolve a problem as soon as you find it, to avoid any further complications. |
@lxcmyf @ss3344520 @jwrct @CarlChaoCarl As discussed earlier, the node will stop if an exception is found in the event subscription. Do you have any suggestions? |
Based on the discussion above, I find that most of the views do not want to add a new switch to complicate the feature. Secondly, the scenario in which the exception occurs is so rare that a retry operation is not always guaranteed to succeed. And for most Dapps, event loss is unacceptable. So I suggest:
I would like to continue this topic AllDevsMeeting |
The optimize logic: tomatoishealthy@a9bc104 Will be submitted as a PR later. |
Rationale
Why should this feature exist?
I notice that the codes of the event subscribe function are quite close to the applyBlock process codes in the pushBlock function, and any exception thrown by the event subscribe function will lead to normal blocks being removed from KhaosDB, which has been committed successfully.
In this case, the next block will be processed failed when putting into the KhaosDB since no parent block could be found in KhaosDB.
The detail code is here:
The block synchronization should work as we expected when any exception is thrown in the event subscribe function, in detail, it should work well if the event subscribe function is allowed to miss some data.
Implementation
Do you have ideas regarding the implementation of this feature?
Solution: The event subscribe logic is the subsequent processing logic of block processing, and the event subscribe logic is independent from the existing block processing logic. Therefore, exceptions related to the event subscribe logic should be handled separately. In addition, due to some users' high dependence on the stability of event subscribe functions, while others require synchronization services to remain stable and allow occasional exceptions to occur in the event subscribe logic, a new switch is added to control whether synchronization functions should be suspended when event subscribe services are abnormal.
The new added switch is located in config file path: event.subscribe.ignoreSubscribeError, and the node will stop when any exception is thrown in event subscribe services if the config is not set or set to false; and the node will ignore the exceptions thrown by the event subscribe services and continue to synchronize blocks only if it has set to true .
The related code will be like this:
As discussed below, the config ignoreSubscribeError is unnecessary and will not be added. The next question is, if the event subscription function throws an exception, does it need to be retried? Or just stop the node if any exception is found in the event subscription:
As discussed in the comments, the node will stop if an exception is found in the event subscription.
Are you willing to implement this feature?
Yes.
The text was updated successfully, but these errors were encountered: