
CI Failure (connection failure in get_decommission_status) in RandomNodeOperationsTest.test_node_operations #8589

Closed
rystsov opened this issue Feb 3, 2023 · 4 comments

@rystsov
Contributor

rystsov commented Feb 3, 2023

https://buildkite.com/redpanda/redpanda/builds/22392#01861451-188a-4f7d-a07d-be20d45da4d6

Module: rptest.tests.random_node_operations_test
Class:  RandomNodeOperationsTest
Method: test_node_operations
Arguments:
{
  "enable_failures": true
}
test_id:    rptest.tests.random_node_operations_test.RandomNodeOperationsTest.test_node_operations.enable_failures=True
status:     FAIL
run time:   2 minutes 49.988 seconds

    ConnectionError(MaxRetryError("HTTPConnectionPool(host='docker-rp-9', port=9644): Max retries exceeded with url: /v1/brokers/1/decommission (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f887d32c670>: Failed to establish a new connection: [Errno 111] Connection refused'))"))
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/urllib3/connection.py", line 159, in _new_conn
    conn = connection.create_connection(
  File "/usr/local/lib/python3.10/dist-packages/urllib3/util/connection.py", line 84, in create_connection
    raise err
  File "/usr/local/lib/python3.10/dist-packages/urllib3/util/connection.py", line 74, in create_connection
    sock.connect(sa)
ConnectionRefusedError: [Errno 111] Connection refused

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/urllib3/connectionpool.py", line 670, in urlopen
    httplib_response = self._make_request(
  File "/usr/local/lib/python3.10/dist-packages/urllib3/connectionpool.py", line 392, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "/usr/lib/python3.10/http/client.py", line 1282, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/usr/lib/python3.10/http/client.py", line 1328, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/usr/lib/python3.10/http/client.py", line 1277, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/usr/lib/python3.10/http/client.py", line 1037, in _send_output
    self.send(msg)
  File "/usr/lib/python3.10/http/client.py", line 975, in send
    self.connect()
  File "/usr/local/lib/python3.10/dist-packages/urllib3/connection.py", line 187, in connect
    conn = self._new_conn()
  File "/usr/local/lib/python3.10/dist-packages/urllib3/connection.py", line 171, in _new_conn
    raise NewConnectionError(
urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPConnection object at 0x7f887d32c670>: Failed to establish a new connection: [Errno 111] Connection refused

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/requests/adapters.py", line 439, in send
    resp = conn.urlopen(
  File "/usr/local/lib/python3.10/dist-packages/urllib3/connectionpool.py", line 726, in urlopen
    retries = retries.increment(
  File "/usr/local/lib/python3.10/dist-packages/urllib3/util/retry.py", line 446, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='docker-rp-9', port=9644): Max retries exceeded with url: /v1/brokers/1/decommission (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f887d32c670>: Failed to establish a new connection: [Errno 111] Connection refused'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/ducktape/tests/runner_client.py", line 135, in run
    data = self.run_test()
  File "/usr/local/lib/python3.10/dist-packages/ducktape/tests/runner_client.py", line 227, in run_test
    return self.test_context.function(self.test)
  File "/usr/local/lib/python3.10/dist-packages/ducktape/mark/_mark.py", line 476, in wrapper
    return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
  File "/root/tests/rptest/utils/mode_checks.py", line 63, in f
    return func(*args, **kwargs)
  File "/root/tests/rptest/services/cluster.py", line 35, in wrapped
    r = f(self, *args, **kwargs)
  File "/root/tests/rptest/tests/random_node_operations_test.py", line 103, in test_node_operations
    executor.execute_operation(op)
  File "/root/tests/rptest/utils/node_operations.py", line 377, in execute_operation
    self.wait_for_removed(node_id)
  File "/root/tests/rptest/utils/node_operations.py", line 240, in wait_for_removed
    waiter.wait_for_removal()
  File "/root/tests/rptest/utils/node_operations.py", line 124, in wait_for_removal
    decommission_status = self.admin.get_decommission_status(
  File "/root/tests/rptest/services/admin.py", line 476, in get_decommission_status
    return self._request('get', path, node=node).json()
  File "/root/tests/rptest/services/admin.py", line 307, in _request
    r = self._session.request(verb, url, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/requests/sessions.py", line 530, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/local/lib/python3.10/dist-packages/requests/sessions.py", line 643, in send
    r = adapter.send(request, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/requests/adapters.py", line 516, in send
    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPConnectionPool(host='docker-rp-9', port=9644): Max retries exceeded with url: /v1/brokers/1/decommission (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f887d32c670>: Failed to establish a new connection: [Errno 111] Connection refused'))
@michael-redpanda
Contributor

@travisdowns
Member

Also happening in

NodeOperationFuzzyTest.test_node_operations.enable_failures=True.num_to_upgrade=0.compacted_topics=True

https://buildkite.com/redpanda/vtools/builds/5649#01861128-2136-4662-844f-deacd09e0995

Here's a test log snippet, analysis below:

[INFO  - 2023-02-02 10:08:05,106 - failure_injector - _start - lineno:226]: starting redpanda on ip-172-31-7-0
[DEBUG - 2023-02-02 10:08:05,106 - remoteaccount - _log - lineno:166]: root@ip-172-31-7-0: Running ssh command: ulimit -Sc unlimited;  ASAN_OPTIONS=abort_on_error=1:disable_coredump=0:unmap_shadow_on_exit=1 nohup /opt/redpanda/bin/redpanda --redpanda-cfg /etc/redpanda/redpanda.yaml --default-log-level info --logger-log-level=cluster=trace:kafka=trace:raft=trace:admin_api_server=trace:kvstore=trace  --abort-on-seastar-bad-alloc  --dump-memory-diagnostics-on-alloc-failure-kind=all  --unsafe-bypass-fsync=0  >> /var/lib/redpanda/redpanda.log 2>&1 &
[DEBUG - 2023-02-02 10:08:05,145 - admin - _request - lineno:305]: Dispatching get http://ip-172-31-7-0:9644/v1/brokers/5/decommission
[ERROR - 2023-02-02 10:08:05,146 - cluster - wrapped - lineno:41]: Test failed, doing failure checks...
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/urllib3/connection.py", line 159, in _new_conn
    conn = connection.create_connection(
  File "/usr/local/lib/python3.10/dist-packages/urllib3/util/connection.py", line 84, in create_connection
    raise err
  File "/usr/local/lib/python3.10/dist-packages/urllib3/util/connection.py", line 74, in create_connection
    sock.connect(sa)
ConnectionRefusedError: [Errno 111] Connection refused

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/urllib3/connectionpool.py", line 670, in urlopen
    httplib_response = self._make_request(
  File "/usr/local/lib/python3.10/dist-packages/urllib3/connectionpool.py", line 392, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "/usr/lib/python3.10/http/client.py", line 1282, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/usr/lib/python3.10/http/client.py", line 1328, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/usr/lib/python3.10/http/client.py", line 1277, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/usr/lib/python3.10/http/client.py", line 1037, in _send_output
    self.send(msg)
  File "/usr/lib/python3.10/http/client.py", line 975, in send
    self.connect()
  File "/usr/local/lib/python3.10/dist-packages/urllib3/connection.py", line 187, in connect
    conn = self._new_conn()
  File "/usr/local/lib/python3.10/dist-packages/urllib3/connection.py", line 171, in _new_conn
    raise NewConnectionError(
urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPConnection object at 0x7fb50618c280>: Failed to establish a new connection: [Errno 111] Connection refused

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/requests/adapters.py", line 439, in send
    resp = conn.urlopen(
  File "/usr/local/lib/python3.10/dist-packages/urllib3/connectionpool.py", line 726, in urlopen
    retries = retries.increment(
  File "/usr/local/lib/python3.10/dist-packages/urllib3/util/retry.py", line 446, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='ip-172-31-7-0', port=9644): Max retries exceeded with url: /v1/brokers/5/decommission (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fb50618c280>: Failed to establish a new connection: [Errno 111] Connection refused'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ubuntu/redpanda/tests/rptest/services/cluster.py", line 35, in wrapped
    r = f(self, *args, **kwargs)
  File "/home/ubuntu/redpanda/tests/rptest/scale_tests/node_operations_fuzzy_test.py", line 162, in test_node_operations
    executor.execute_operation(op)
  File "/home/ubuntu/redpanda/tests/rptest/utils/node_operations.py", line 381, in execute_operation
    self.wait_for_removed(node_id)
  File "/home/ubuntu/redpanda/tests/rptest/utils/node_operations.py", line 240, in wait_for_removed
    waiter.wait_for_removal()
  File "/home/ubuntu/redpanda/tests/rptest/utils/node_operations.py", line 124, in wait_for_removal
    decommission_status = self.admin.get_decommission_status(
  File "/home/ubuntu/redpanda/tests/rptest/services/admin.py", line 476, in get_decommission_status
    return self._request('get', path, node=node).json()
  File "/home/ubuntu/redpanda/tests/rptest/services/admin.py", line 307, in _request
    r = self._session.request(verb, url, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/requests/sessions.py", line 530, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/local/lib/python3.10/dist-packages/requests/sessions.py", line 643, in send
    r = adapter.send(request, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/requests/adapters.py", line 516, in send
    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPConnectionPool(host='ip-172-31-7-0', port=9644): Max retries exceeded with url: /v1/brokers/5/decommission (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fb50618c280>: Failed to establish a new connection: [Errno 111] Connection refused'))

The failure injector starts up the *.7-0 node at 10:08:05,106, and right after that, at 10:08:05,145, we dispatch the HTTP request to the decommission status API. Since RP doesn't start within those 39 milliseconds, the connection is refused. For whatever reason the retries are exhausted immediately and the test fails.

So that path in the Python code probably needs a longer timeout, or needs to handle connection-refused errors differently so that they don't break out of the wait_until.
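A minimal sketch of the second option, assuming a generic polling helper rather than ducktape's actual wait_until (whose parameters are not shown in this thread). Exceptions listed in a caller-supplied tuple are treated as "not ready yet" and polling continues until the deadline, while any other exception still fails the test immediately. The name `wait_until_tolerant` and its parameters are hypothetical. The default `retriable=(OSError,)` is deliberately broad so the sketch runs with only the standard library: it covers `ConnectionRefusedError`, and also `requests.exceptions.ConnectionError`, which derives from `OSError` via `RequestException` (an `IOError` subclass).

```python
import time
from typing import Callable, Optional, Tuple, Type


def wait_until_tolerant(
    condition: Callable[[], bool],
    timeout_sec: float,
    backoff_sec: float = 1.0,
    retriable: Tuple[Type[BaseException], ...] = (OSError,),
) -> None:
    """Hypothetical helper: poll `condition` until it returns True.

    Exceptions in `retriable` (e.g. connection refused while the node is
    still starting) count as "not ready yet" instead of aborting the wait.
    Any other exception propagates immediately. Raises TimeoutError if the
    condition is not met before `timeout_sec` elapses.
    """
    deadline = time.monotonic() + timeout_sec
    last_exc: Optional[BaseException] = None
    while True:
        try:
            if condition():
                return
            last_exc = None
        except retriable as exc:
            # The admin API is not listening yet; remember why and retry.
            last_exc = exc
        if time.monotonic() >= deadline:
            # Chain the last connection error (if any) for easier debugging.
            raise TimeoutError(
                f"condition not met within {timeout_sec}s"
            ) from last_exc
        time.sleep(backoff_sec)
```

In the failing path, `condition` would be a (hypothetical) closure around the get_decommission_status poll inside wait_for_removal. In real use, a narrower tuple such as `(requests.exceptions.ConnectionError,)` avoids masking unrelated I/O errors.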

@travisdowns
Member

Python stack for the above:

    ConnectionError(MaxRetryError("HTTPConnectionPool(host='ip-172-31-15-187', port=9644): Max retries exceeded with url: /v1/brokers/0/decommission (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fb5061d6230>: Failed to establish a new connection: [Errno 111] Connection refused'))"))
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/urllib3/connection.py", line 159, in _new_conn
    conn = connection.create_connection(
  File "/usr/local/lib/python3.10/dist-packages/urllib3/util/connection.py", line 84, in create_connection
    raise err
  File "/usr/local/lib/python3.10/dist-packages/urllib3/util/connection.py", line 74, in create_connection
    sock.connect(sa)
ConnectionRefusedError: [Errno 111] Connection refused

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/urllib3/connectionpool.py", line 670, in urlopen
    httplib_response = self._make_request(
  File "/usr/local/lib/python3.10/dist-packages/urllib3/connectionpool.py", line 392, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "/usr/lib/python3.10/http/client.py", line 1282, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/usr/lib/python3.10/http/client.py", line 1328, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/usr/lib/python3.10/http/client.py", line 1277, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/usr/lib/python3.10/http/client.py", line 1037, in _send_output
    self.send(msg)
  File "/usr/lib/python3.10/http/client.py", line 975, in send
    self.connect()
  File "/usr/local/lib/python3.10/dist-packages/urllib3/connection.py", line 187, in connect
    conn = self._new_conn()
  File "/usr/local/lib/python3.10/dist-packages/urllib3/connection.py", line 171, in _new_conn
    raise NewConnectionError(
urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPConnection object at 0x7fb5061d6230>: Failed to establish a new connection: [Errno 111] Connection refused

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/requests/adapters.py", line 439, in send
    resp = conn.urlopen(
  File "/usr/local/lib/python3.10/dist-packages/urllib3/connectionpool.py", line 726, in urlopen
    retries = retries.increment(
  File "/usr/local/lib/python3.10/dist-packages/urllib3/util/retry.py", line 446, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='ip-172-31-15-187', port=9644): Max retries exceeded with url: /v1/brokers/0/decommission (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fb5061d6230>: Failed to establish a new connection: [Errno 111] Connection refused'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ubuntu/.local/lib/python3.10/site-packages/ducktape/tests/runner_client.py", line 135, in run
    data = self.run_test()
  File "/home/ubuntu/.local/lib/python3.10/site-packages/ducktape/tests/runner_client.py", line 227, in run_test
    return self.test_context.function(self.test)
  File "/home/ubuntu/.local/lib/python3.10/site-packages/ducktape/mark/_mark.py", line 476, in wrapper
    return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
  File "/home/ubuntu/redpanda/tests/rptest/services/cluster.py", line 35, in wrapped
    r = f(self, *args, **kwargs)
  File "/home/ubuntu/redpanda/tests/rptest/scale_tests/node_operations_fuzzy_test.py", line 162, in test_node_operations
    executor.execute_operation(op)
  File "/home/ubuntu/redpanda/tests/rptest/utils/node_operations.py", line 377, in execute_operation
    self.wait_for_removed(node_id)
  File "/home/ubuntu/redpanda/tests/rptest/utils/node_operations.py", line 240, in wait_for_removed
    waiter.wait_for_removal()
  File "/home/ubuntu/redpanda/tests/rptest/utils/node_operations.py", line 124, in wait_for_removal
    decommission_status = self.admin.get_decommission_status(
  File "/home/ubuntu/redpanda/tests/rptest/services/admin.py", line 476, in get_decommission_status
    return self._request('get', path, node=node).json()
  File "/home/ubuntu/redpanda/tests/rptest/services/admin.py", line 307, in _request
    r = self._session.request(verb, url, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/requests/sessions.py", line 530, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/local/lib/python3.10/dist-packages/requests/sessions.py", line 643, in send
    r = adapter.send(request, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/requests/adapters.py", line 516, in send
    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPConnectionPool(host='ip-172-31-15-187', port=9644): Max retries exceeded with url: /v1/brokers/0/decommission (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fb5061d6230>: Failed to establish a new connection: [Errno 111] Connection refused'))

@mmaslankaprv
Member

This should already be fixed by #8568.

This issue was closed.

5 participants