Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

migrate retryCallCluster for new ES client #71412

Merged

Conversation

pgayvallet
Copy link
Contributor

@pgayvallet pgayvallet commented Jul 13, 2020

Summary

Part of #35508
Migrate retryCallCluster and migrationRetryCallCluster to be usable with new ES client.

Checklist

@pgayvallet pgayvallet added release_note:skip Skip the PR/issue when compiling release notes Team:Core Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc v7.9.0 v8.0.0 labels Jul 13, 2020
Comment on lines 98 to 106
return iif(
() =>
error instanceof esErrors.NoLivingConnectionsError ||
error instanceof esErrors.ConnectionError ||
error instanceof esErrors.TimeoutError ||
retryMigrationStatusCodes.includes(error.statusCode) ||
error?.body?.error?.type === 'snapshot_in_progress_exception',
timer(delay),
throwError(error)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a direct adaptation of

() => {
return (
error instanceof esErrors.NoConnections ||
error instanceof esErrors.ConnectionFault ||
error instanceof esErrors.ServiceUnavailable ||
error instanceof esErrors.RequestTimeout ||
error instanceof esErrors.AuthenticationException ||
error instanceof esErrors.AuthorizationException ||
// @ts-expect-error
error instanceof esErrors.Gone ||
error?.body?.error?.type === 'snapshot_in_progress_exception'
);
},

@delvedor Could you take a look see if I did not do any mapping mistake?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The legacy client has a bug, where both the HTTP timeout error (408) and the socket timeout error will fall into the RequestTimeout error.
In the new client, the socket timeout is TimeoutError, while the HTTP timeout is a ResponseError with the statusCode property set to 408.
Not sure which one are you checking here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd say we probably want both. @rudolf can you confirm?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we want to retry all operations, regardless of where the timeout occurred.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From a performance point of view, instead of checking via instanceof, you could use error.name === {name}, which is an order of magnitude faster :)
Each error object of the client has a corresponding .name property: https://github.com/elastic/elasticsearch-js/blob/master/lib/errors.js

* @internal
*/
export const retryCallCluster = <TResponse, TContext>(
apiCaller: ApiCaller<TResponse, TContext>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we delegate type inference to the apiCaller?

export const retryCallCluster = <T extends Promise<unknown>>(apiCaller: () => T): T => {
  return defer(apiCaller)
    .pipe(
      retryWhen((errors) =>
        errors.pipe(
          concatMap((error, i) =>
            iif(
              () => error instanceof esErrors.NoLivingConnectionsError,
              timer(1000),
              throwError(error)
            )
          )
        )
      )
    )
    .toPromise() as T;
};

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, I thought I had it with my APICaller type, but nope. Fixed.

.pipe(
retryWhen((errors) =>
errors.pipe(
concatMap((error, i) =>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: i is not used

@pgayvallet pgayvallet marked this pull request as ready for review July 13, 2020 14:06
@pgayvallet pgayvallet requested a review from a team as a code owner July 13, 2020 14:06
@elasticmachine
Copy link
Contributor

Pinging @elastic/kibana-platform (Team:Platform)

import { Logger } from '../../logging';

const retryResponseStatuses = [
503, // ServiceUnavailable
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note, if the status code of a response is 502, 503 or 504, the client performs an automatic failover internally, so have it here could be a duplication.
Regarding 4xx status codes, the client does not perform any automatic failover/retry.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good to know. Having this feature for our 'end' consumers is great. For the migration needs, we need to perform an infinity of retries, so keeping the logic in a single place is probably better. But we don't need to disable the internal failover here though, the duplication is probably fine.

}
return iif(
() =>
error.name === 'NoLivingConnectionsError' ||
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor: it's not type safe anymore

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

true, however the unit test are asserting behavior against concrete error from the library, so I guess this is alright. But I can revert the change if we prefer instanceof based checks.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope, that optional. Feel free to merge

@pgayvallet pgayvallet added v7.10.0 and removed v7.9.0 labels Jul 17, 2020
const successReturn = elasticsearchClientMock.createClientResponse({ ...dummyBody });

let i = 0;
client.asyncSearch.get.mockImplementation(() => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Optional: I usually use chain of mockImplementationOnce

@kibanamachine
Copy link
Contributor

💚 Build Succeeded

Build metrics

✅ unchanged

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

@pgayvallet pgayvallet merged commit b29e8ee into elastic:master Jul 20, 2020
pgayvallet added a commit to pgayvallet/kibana that referenced this pull request Jul 20, 2020
* adapt retryCallCluster for new ES client

* review comments

* retry on 408 ResponseError

* use error name instead of instanceof base check

* use error name instead of instanceof base check bis

* use mockImplementationOnce chaining

Co-authored-by: restrry <restrry@gmail.com>
pgayvallet added a commit that referenced this pull request Jul 20, 2020
* adapt retryCallCluster for new ES client

* review comments

* retry on 408 ResponseError

* use error name instead of instanceof base check

* use error name instead of instanceof base check bis

* use mockImplementationOnce chaining

Co-authored-by: restrry <restrry@gmail.com>

Co-authored-by: restrry <restrry@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
release_note:skip Skip the PR/issue when compiling release notes Team:Core Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc v7.10.0 v8.0.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants