
Fix RemoteReporter test race condition #135

Merged
merged 2 commits into jaegertracing:master from remote-reporter-test-fix on Jan 2, 2019

Conversation

isaachier
Contributor

  • Avoid race conditions by relaxing test requirements.

Signed-off-by: Isaac Hier <isaachier@gmail.com>
@isaachier changed the title - #134 Fix RemoteReporter test race condition on Sep 18, 2018
@isaachier
Contributor Author

@tudor do you approve of this fix for Travis CI? It just relaxes the test case to avoid "off-by-one" errors in the reporting thread.

@tudor
Contributor

tudor commented Sep 18, 2018

I don't understand the fix. Shouldn't the reporter flush on close() (which should wait for the reporting thread to be done!), in which case no sleep()s should be necessary?

(I firmly believe that close to 100% of tests that have sleep() in them are broken.)

@isaachier
Contributor Author

Lol, in general I agree with you. At first I was confused as well, but then I remembered that the Jaeger reporter uses a buffer that flushes periodically or when the queue is full. Either way, if spans overflow the queue, they are dropped; the sleep here gives the reporting thread time to flush before that happens. The alternative would be to somehow force a blocking report in test cases, but that sounds a bit hacky.
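
Roughly the mechanism I'm describing, as an illustrative sketch (invented names, not the actual jaeger-client-cpp code):

```cpp
#include <cstddef>
#include <mutex>
#include <vector>

struct Span {};

// Bounded buffer: append() drops spans once the queue is full, and a reporting
// thread periodically calls drain() to flush whatever has accumulated.
class SpanBuffer {
 public:
  explicit SpanBuffer(std::size_t maxSize) : maxSize_(maxSize) {}

  // Returns false when the span is dropped because the buffer is full.
  bool append(const Span& span) {
    std::lock_guard<std::mutex> lock(mutex_);
    if (spans_.size() >= maxSize_) {
      return false;  // overflow: the span is silently dropped
    }
    spans_.push_back(span);
    return true;
  }

  // Called by the reporting thread on its flush interval (or when full).
  std::vector<Span> drain() {
    std::lock_guard<std::mutex> lock(mutex_);
    std::vector<Span> flushed;
    flushed.swap(spans_);
    return flushed;
  }

 private:
  std::mutex mutex_;
  const std::size_t maxSize_;
  std::vector<Span> spans_;
};
```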

@tudor
Contributor

tudor commented Sep 19, 2018

You know there are only 100 spans in the test. Make the reporter use a queue that's >= 100 entries so drops don't happen.

Close() should definitely flush the queue and wait for it to be flushed. (And should not be automatically called from the destructor! The OpenTracing pattern is that you have one global tracer, and global variables should definitely not do anything "interesting" from the destructor, as global destruction order is weird, and they might depend on other globals that have already been destroyed.)
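
A self-contained sketch of that suggestion (stand-in names, not the real reporter API): size the queue to fit every span the test emits and let close() do the flushing, so no sleep() is needed and the exact-count assertion is safe:

```cpp
#include <gtest/gtest.h>
#include <cstddef>
#include <vector>

// Stand-in for the real reporter, just to show the shape of the test. It
// buffers up to maxQueueSize spans and close() flushes everything buffered.
class FakeReporter {
 public:
  explicit FakeReporter(std::size_t maxQueueSize) : maxQueueSize_(maxQueueSize) {}

  void report(int span) {
    if (queue_.size() < maxQueueSize_) queue_.push_back(span);  // else: dropped
  }

  void close() {  // flush and wait: nothing is left buffered after this returns
    sent_.insert(sent_.end(), queue_.begin(), queue_.end());
    queue_.clear();
  }

  std::size_t spansSent() const { return sent_.size(); }

 private:
  std::size_t maxQueueSize_;
  std::vector<int> queue_;
  std::vector<int> sent_;
};

TEST(RemoteReporterTest, NoDropsWhenQueueFitsAllSpans) {
  constexpr int kNumSpans = 100;
  FakeReporter reporter(kNumSpans);  // queue >= number of spans: no drops
  for (int i = 0; i < kNumSpans; ++i) {
    reporter.report(i);
  }
  reporter.close();  // flushes everything; no sleep() needed

  EXPECT_EQ(static_cast<std::size_t>(kNumSpans), reporter.spansSent());
}
```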

@isaachier
Contributor Author

Interesting caveat about global destructors. I've actually never heard that before. Wouldn't this affect any RAII class? It sounds a bit extreme when you know what your dependencies are. Regardless, this pertains more to #134, where the close() call in the destructor is a real issue.

If I change the queue size to be equal to or greater than the total number of spans, that will probably reduce coverage. I'll try that and report back with an update soon.

@tudor
Contributor

tudor commented Sep 20, 2018

Wouldn't this affect any RAII class? It sounds a bit extreme when you know what your dependencies are.

To some degree, but you probably don't use RAII classes as global objects. Also, code often has dependencies on globals and singletons and they're not always obvious (are you sure that something in a third-party library that you use doesn't have a reference to a global?)

A similar problem occurs at initialization (initialization order is also counter-intuitive), but there you can at least get around it by initializing on first use; doing the same at destruction is more difficult. See https://isocpp.org/wiki/faq/ctors#static-init-order
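
Concretely, the construct-on-first-use idiom from that FAQ looks like this (a minimal sketch; Tracer is just a placeholder class here):

```cpp
#include <string>
#include <utility>

class Tracer {
 public:
  explicit Tracer(std::string serviceName) : serviceName_(std::move(serviceName)) {}
 private:
  std::string serviceName_;
};

// The object is built the first time the accessor is called, so
// cross-translation-unit initialization order no longer matters.
// (Destruction order remains the awkward part.)
Tracer& globalTracer() {
  // Function-local static: constructed on first call, thread-safe since C++11.
  static Tracer tracer("my-service");
  return tracer;
}
```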

(RAII is not without its caveats: what if you have a class wrapping a (buffered) file writer? You obviously want to flush the buffer in the destructor, but what if flushing the buffer fails? You can't really throw an exception. What do you do? Also, think of std::thread; destroying a thread without join()ing it is such a terrible error that the standard library actually calls std::terminate() if you try.)
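
A minimal sketch of the std::thread point, with invented names (not the actual reporter class):

```cpp
#include <atomic>
#include <chrono>
#include <thread>

// A joinable std::thread must be join()ed (or detach()ed) before its
// destructor runs, otherwise std::terminate() is called.
class ReportingThread {
 public:
  ReportingThread() : thread_([this] { run(); }) {}

  ~ReportingThread() {
    running_ = false;
    if (thread_.joinable()) {
      thread_.join();  // without this, ~std::thread() would call std::terminate()
    }
  }

 private:
  void run() {
    while (running_) {
      // flush buffered spans here, then sleep until the next interval
      std::this_thread::sleep_for(std::chrono::milliseconds(10));
    }
  }

  std::atomic<bool> running_{true};
  std::thread thread_;  // declared last so running_ is initialized before the thread starts
};
```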

FYI, the Google C++ style guide bans globals of non-trivial types because of this.

Signed-off-by: Isaac Hier <isaachier@gmail.com>
@isaachier
Contributor Author

Merging this until a better solution comes up.

@isaachier merged commit cdfaf5b into jaegertracing:master on Jan 2, 2019
@isaachier deleted the remote-reporter-test-fix branch on January 2, 2019