CI Failure in partition_balancer_test.PartitionBalancerTest.test_full_nodes #5884

Closed
BenPope opened this issue Aug 8, 2022 · 11 comments · Fixed by #6540

Comments

@BenPope
Member

BenPope commented Aug 8, 2022

Version & Environment

Redpanda version: dev

What went wrong?

CI Failure

What should have happened instead?

CI success

How to reproduce the issue?

??

Additional information

CI Failure: https://buildkite.com/redpanda/redpanda/builds/13764#01827be6-58e1-4f56-8692-47e1ef55e0b5

[WARNING - 2022-08-08 06:18:22,909 - service_registry - free_all - lineno:83]: Error cleaning service <FranzGoVerifiableProducer-0-140169200079712: num_nodes: 1, nodes: ['docker-rp-24']>: 'super' object has no attribute 'free_all'
[INFO  - 2022-08-08 06:18:22,909 - runner_client - log - lineno:278]: RunnerClient: rptest.tests.partition_balancer_test.PartitionBalancerTest.test_full_nodes: Summary: AssertionError()
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/ducktape/tests/runner_client.py", line 135, in run
    data = self.run_test()
  File "/usr/local/lib/python3.10/dist-packages/ducktape/tests/runner_client.py", line 227, in run_test
    return self.test_context.function(self.test)
  File "/root/tests/rptest/services/cluster.py", line 35, in wrapped
    r = f(self, *args, **kwargs)
  File "/root/tests/rptest/tests/partition_balancer_test.py", line 448, in test_full_nodes
    assert used_ratio < 0.8
AssertionError
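The `WARNING` at the top of this log is a separate cleanup problem. Python produces exactly this message when a subclass calls `super().free_all()` and no class later in its MRO defines `free_all`. Below is a minimal reproduction. `BaseService`, `ProducerService` and `cleanup_error` are hypothetical names, not the actual rptest or ducktape classes.

```python
class BaseService:
    """Hypothetical parent class that defines no free_all() method."""

    def stop(self) -> None:
        pass


class ProducerService(BaseService):
    """Hypothetical subclass that delegates free_all() to its parent."""

    def free_all(self) -> None:
        # super() searches the classes after ProducerService in the MRO
        # (BaseService, then object). Neither defines free_all, so this raises.
        super().free_all()


def cleanup_error(service: ProducerService) -> str:
    """Call free_all() and return the error text, as a cleanup loop might log it."""
    try:
        service.free_all()
    except AttributeError as e:
        return str(e)
    return ""


print(cleanup_error(ProducerService()))
```

Running it prints `'super' object has no attribute 'free_all'`, the same text as the logged warning. That suggests, but does not confirm, a service class whose parent lacks a `free_all` method.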
@BenPope BenPope added kind/bug Something isn't working ci-failure labels Aug 8, 2022
@BenPope
Member Author

BenPope commented Aug 9, 2022

Also got a crash in this test: https://buildkite.com/redpanda/redpanda/builds/13842#0182811f-0072-4f5d-9220-359c79d86828

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/ducktape/tests/runner_client.py", line 135, in run
    data = self.run_test()
  File "/usr/local/lib/python3.10/dist-packages/ducktape/tests/runner_client.py", line 227, in run_test
    return self.test_context.function(self.test)
  File "/root/tests/rptest/services/cluster.py", line 38, in wrapped
    self.redpanda.raise_on_crash()
  File "/root/tests/rptest/services/redpanda.py", line 1048, in raise_on_crash
    raise NodeCrash(crashes)
rptest.services.utils.NodeCrash: <NodeCrash (docker-rp-8,docker-rp-20,docker-rp-21,docker-rp-22) docker-rp-8: ERROR 2022-08-09 07:01:31,572 [shard 0] assert - Assert failure: (../../../src/v/cluster/node/local_monitor.cc:109) 'used < *_disk_size_for_test' mock disk size 2147483648 must be > used size 2333523968
>
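The crash is an invariant check against the mock disk size used in tests. Plugging the two numbers from the message into a small calculation shows how far past the limit the node got. `within_mock_disk` is a hypothetical Python restatement of the `used < *_disk_size_for_test` condition, not Redpanda code.

```python
GiB = 1024 ** 3
MiB = 1024 ** 2


def within_mock_disk(used: int, disk_size_for_test: int) -> bool:
    """Hypothetical restatement of the C++ invariant 'used < *_disk_size_for_test'."""
    return used < disk_size_for_test


mock_size = 2_147_483_648  # "mock disk size" from the crash message
used = 2_333_523_968       # "used size" from the crash message

print(mock_size == 2 * GiB)               # True: the mock disk is exactly 2 GiB
print(within_mock_disk(used, mock_size))  # False: the invariant is violated
print((used - mock_size) / MiB)           # 177.421875 (MiB over the mock size)
```

In other words, the node had written roughly 177 MiB more than the 2 GiB mock disk it was told it had.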

@dotnwat
Member

dotnwat commented Aug 9, 2022

@ajfabbri what do you think?

@travisdowns
Member

travisdowns commented Aug 9, 2022

I got a failure in this test too, but a different one from Ben's: this one complained about BadLogLines:

https://buildkite.com/redpanda/redpanda/builds/13828#01827fde-bcb1-487b-b14f-9ad6dded442f

[INFO:2022-08-09 00:51:17,651]: RunnerClient: rptest.tests.partition_balancer_test.PartitionBalancerTest.test_full_nodes: FAIL: <BadLogLines nodes=docker-rp-20(1) example="ERROR 2022-08-09 00:50:53,785 [shard 0] cluster - storage space alert: free space at 4.050% on /var/lib/redpanda/data: 2.000GiB total, 82.941MiB free, min. free 0.000bytes. Please adjust retention policies as needed to avoid running out of space">

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/ducktape/tests/runner_client.py", line 135, in run
    data = self.run_test()
  File "/usr/local/lib/python3.10/dist-packages/ducktape/tests/runner_client.py", line 227, in run_test
    return self.test_context.function(self.test)
  File "/root/tests/rptest/services/cluster.py", line 48, in wrapped
    self.redpanda.raise_on_bad_logs(allow_list=log_allow_list)
  File "/root/tests/rptest/services/redpanda.py", line 1126, in raise_on_bad_logs
    raise BadLogLines(bad_lines)
rptest.services.utils.BadLogLines: <BadLogLines nodes=docker-rp-20(1) example="ERROR 2022-08-09 00:50:53,785 [shard 0] cluster - storage space alert: free space at 4.050% on /var/lib/redpanda/data: 2.000GiB total, 82.941MiB free, min. free 0.000bytes. Please adjust retention policies as needed to avoid running out of space">
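The figures in the storage space alert are self-consistent, and they show how full the node was. The sketch below recomputes them. It assumes that `used_ratio` in the test means used bytes divided by total bytes; that is an assumption, since the test body is not quoted in this thread.

```python
MIB_PER_GIB = 1024

total_mib = 2.000 * MIB_PER_GIB  # "2.000GiB total" from the alert
free_mib = 82.941                # "82.941MiB free" from the alert

free_pct = 100 * free_mib / total_mib
# ASSUMPTION: used_ratio = used / total, i.e. 1 - free / total.
used_ratio = 1 - free_mib / total_mib

print(f"{free_pct:.3f}%")   # 4.050%, matching the alert text
print(f"{used_ratio:.3f}")  # 0.960
print(used_ratio < 0.8)     # False
```

Under that assumption, this node sat at about 96% usage, well above the 0.8 threshold that `test_full_nodes` asserts in the other failures.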

@ztlpn
Contributor

ztlpn commented Aug 9, 2022

We investigated this one with @mmaslankaprv this morning. It is caused by stale health monitor data after the controller leader changed: the partition balancer thought partition sizes were zero when in reality they were quite large. Should be fixed by #5922.
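To illustrate the failure mode: if the balancer plans moves from health data that predates the current controller leadership, partition sizes can read as zero, so a nearly full node looks empty. The sketch below is hypothetical. `HealthReport`, `leader_term`, `projected_usage` and `usable_report` are invented for illustration and are neither Redpanda's API nor the change made in #5922. It shows the mistake and one possible guard: refusing to plan on a report collected under an older leader term.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class HealthReport:
    """Hypothetical snapshot of per-partition sizes seen by the controller leader."""
    leader_term: int
    partition_sizes: Dict[str, int]  # bytes per partition


def projected_usage(report: HealthReport, disk_size: int) -> float:
    """Fraction of the disk the balancer believes is used, based only on the report."""
    return sum(report.partition_sizes.values()) / disk_size


def usable_report(report: HealthReport, current_term: int) -> Optional[HealthReport]:
    """Hypothetical guard: ignore reports collected under an older leader term."""
    return report if report.leader_term >= current_term else None


DISK = 2 * 1024 ** 3
MiB = 1024 ** 2

# Stale report (hypothetical values): sizes not yet refreshed, so they read as zero.
stale = HealthReport(leader_term=1, partition_sizes={"topic/0": 0, "topic/1": 0})
# Fresh report (hypothetical values): the real sizes.
fresh = HealthReport(leader_term=2, partition_sizes={"topic/0": 900 * MiB, "topic/1": 900 * MiB})

print(projected_usage(stale, DISK))            # 0.0: node looks empty
print(round(projected_usage(fresh, DISK), 3))  # 0.879: node is nearly full
print(usable_report(stale, current_term=2))    # None: skip planning until fresh data arrives
```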

@ztlpn
Contributor

ztlpn commented Aug 11, 2022

Should be fixed by #5922.

@ztlpn ztlpn closed this as completed Aug 11, 2022
@andrwng
Contributor

andrwng commented Aug 17, 2022

Saw another instance of this:

FAIL test: PartitionBalancerTest.test_full_nodes (1/24 runs)
  failure at 2022-08-16T16:42:59.830Z: AssertionError()
      in job https://buildkite.com/redpanda/redpanda/builds/14220#0182a71a-48cc-4534-8c04-7d2bea40908c

Stack trace:

====================================================================================================
test_id:    rptest.tests.partition_balancer_test.PartitionBalancerTest.test_full_nodes
status:     FAIL
run time:   1 minute 20.539 seconds


    AssertionError()
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/ducktape/tests/runner_client.py", line 135, in run
    data = self.run_test()
  File "/usr/local/lib/python3.10/dist-packages/ducktape/tests/runner_client.py", line 227, in run_test
    return self.test_context.function(self.test)
  File "/root/tests/rptest/services/cluster.py", line 35, in wrapped
    r = f(self, *args, **kwargs)
  File "/root/tests/rptest/tests/partition_balancer_test.py", line 473, in test_full_nodes
    assert used_ratio < 0.8
AssertionError

@rystsov
Contributor

rystsov commented Aug 19, 2022

@VladLazar
Contributor

Another instance on 19.09.2022:

https://buildkite.com/redpanda/redpanda/builds/15468#01835650-c8ef-448b-a958-f521a6fd0569

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/ducktape/tests/runner_client.py", line 135, in run
    data = self.run_test()
  File "/usr/local/lib/python3.10/dist-packages/ducktape/tests/runner_client.py", line 227, in run_test
    return self.test_context.function(self.test)
  File "/root/tests/rptest/services/cluster.py", line 35, in wrapped
    r = f(self, *args, **kwargs)
  File "/root/tests/rptest/tests/partition_balancer_test.py", line 531, in test_full_nodes
    assert used_ratio < 0.8
AssertionError

@ztlpn
Contributor