Fix application hang when network is lost during QoS0 publish loop #1006
Merged
mbedTLS uses the sockets API for network communication when running on a Linux platform. Application data passed to the socket send API is not sent over the network immediately; it is copied into an internal buffer of the TCP stack, and the API returns success to the application. The TCP stack transmits the data later and frees the internal buffer only when it receives the TCP ACK confirming receipt of the data from the other end.

When the network connection is lost, the TCP stack cannot send any data over the network and stops receiving ACKs from the other end. As a result, if the application continues to send data, the TCP stack's internal buffers keep filling up because no buffer is freed by received ACKs. Note that the socket send API continues to return success to the application even though the data is not actually being sent. When all the TCP internal buffers are full, the socket send API will:

- Either block indefinitely, if the socket is blocking.
- Or return an error, if the socket is non-blocking or a send timeout is set using SO_SNDTIMEO.

Look at the following diagram:

    --------------------------------------------------
    ^            ^              ^              ^
    |            |              |              |
    |            |              |              |
    +            +              +              +
    T0           T1             T2             T3
    Start        Start QoS0     Network Lost   TCP Queue
    Connection   Publish Loop                  Full

In the above diagram, the network connection is lost at time T2, but the application finds out only at a later time T3, when the TCP internal buffers are full.

By default, the underlying socket in mbedTLS is blocking. As a result, an application that publishes QoS0 messages in a loop may hit the condition above and appear to hang. mbedTLS provides an API, namely mbedtls_net_set_nonblock, to set the underlying socket as non-blocking, which ensures that the application is notified of the failed send instead of hanging.
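The buffer-full behaviour described above can be reproduced without a real network outage. The following is a minimal sketch, not code from this PR: it assumes a Linux/POSIX host and uses an AF_UNIX socketpair whose peer never reads as a stand-in for a TCP connection that has stopped receiving ACKs. The helper names (`set_nonblocking`, `send_until_full`, `demo_stalled_peer`) are hypothetical.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Put fd into non-blocking mode. Returns 0 on success, -1 on failure. */
static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0) {
        return -1;
    }
    return (fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0) ? -1 : 0;
}

/*
 * Keep calling send() until the kernel refuses to buffer more data.
 * Every call before that point reports success even though nothing
 * is being consumed on the other end.
 * Returns the number of bytes accepted, or -1 on an unexpected error.
 */
static long send_until_full(int fd)
{
    char chunk[4096];
    long accepted = 0;

    memset(chunk, 'x', sizeof(chunk));
    for (;;) {
        ssize_t n = send(fd, chunk, sizeof(chunk), 0);
        if (n > 0) {
            accepted += (long)n;
            continue;
        }
        if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
            /* Buffers are full: this is the point T3 in the diagram. */
            return accepted;
        }
        return -1;
    }
}

/*
 * Stand-in for a lost connection (assumption): the peer end of the
 * socketpair never reads, so no buffer space is ever freed.
 */
static long demo_stalled_peer(void)
{
    int fds[2];
    long accepted;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) != 0) {
        return -1;
    }
    if (set_nonblocking(fds[0]) != 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    accepted = send_until_full(fds[0]);
    close(fds[0]);
    close(fds[1]);
    return accepted;
}
```

Calling `demo_stalled_peer()` returns how many bytes were accepted before the first failed send. Without the `set_nonblocking` call, the same loop would block inside `send()` once the buffer fills, which is the hang this PR addresses.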
This change adds a config parameter AWS_IOT_MQTT_SOCKET_NON_BLOCKING, which can be defined in the aws_iot_config.h file to set the underlying socket as non-blocking.

The application should use QoS1 to detect broken connections quickly, rather than relying on a failed send from the TCP stack, which depends on the number of internal buffers in the TCP stack, the network load, and similar factors. If the user application must use QoS0 and only needs to detect a broken connection eventually, the newly added option AWS_IOT_MQTT_SOCKET_NON_BLOCKING can be used.

Signed-off-by: Gaurav Aggarwal <aggarg@amazon.com>
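As an illustration of how a compile-time option like this can gate the socket mode, here is a minimal sketch under stated assumptions. It is not the code from this PR: `apply_socket_mode` and `is_nonblocking` are hypothetical helpers, and plain `fcntl` is used as a self-contained stand-in for the place where the SDK's mbedTLS wrapper would presumably call `mbedtls_net_set_nonblock`.

```c
#include <fcntl.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * In the SDK this definition would live in aws_iot_config.h.
 * Remove it to keep the default blocking socket.
 */
#define AWS_IOT_MQTT_SOCKET_NON_BLOCKING

/*
 * Hypothetical helper: apply the configured socket mode to fd.
 * Returns 0 on success, -1 on failure.
 */
static int apply_socket_mode(int fd)
{
#ifdef AWS_IOT_MQTT_SOCKET_NON_BLOCKING
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0) {
        return -1;
    }
    if (fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0) {
        return -1;
    }
#else
    (void)fd; /* Option not defined: leave the socket blocking. */
#endif
    return 0;
}

/* Returns 1 if fd is non-blocking, 0 if blocking, -1 on error. */
static int is_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0) {
        return -1;
    }
    return (flags & O_NONBLOCK) ? 1 : 0;
}
```

With the option defined, a failed send surfaces as an error return that a QoS0 publish loop can check and act on, for example by breaking out of the loop and reconnecting.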
aggarw13 reviewed on Jun 21, 2020
abhidixi11 approved these changes on Jun 23, 2020
The quality error is "Can't find spellcheck script, exiting." Maybe the path is incorrect, but I don't see any problem with this PR.
dan4thewin approved these changes on Jun 23, 2020
Hi, please ignore the failing checks. This target branch was mistakenly allowed in CI intended for the development branch.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.