IAM Roles: invalid config results in crash #5507

Closed
abhijat opened this issue Jul 19, 2022 · 2 comments · Fixed by #6864
Labels
area/cloud-storage Shadow indexing subsystem kind/bug Something isn't working

Comments

abhijat commented Jul 19, 2022

Aborting on shard 0.
Backtrace:
  0x2fa44e5a
  0x402b2de4
  0x402b2a03
  0x400982c9
  0x400c45a5
  0x40220ac9
  0x40220cee
  0x40220b3a
  0x296f3fd1641f
  /opt/redpanda/lib/libc.so.6+0x4300a
  /opt/redpanda/lib/libc.so.6+0x22858
  /opt/redpanda/lib/libc.so.6+0x22728
  /opt/redpanda/lib/libc.so.6+0x33fd5
  0x2fd0627e
  0x2fb21224
  0x2fb1b1ff
  0x2fb536ef
  0x2fb53150
  0x2fb530f0
  0x2fb5304d
  0x2fb52dfd
  0x2fb52afe
  0x2fb528b8
  0x3fe9f588
  0x406a8b07
AddressSanitizer:DEADLYSIGNAL
=================================================================
==1==ERROR: AddressSanitizer: SEGV on unknown address (pc 0x7fa8021d2941 bp 0x7fa802368588 sp 0x7fa7fd103cd0 T0)
==1==The signal is caused by a READ memory access.
==1==Hint: this fault was caused by a dereference of a high value address (see register values below).  Disassemble the provided pc to learn which register was used.
    #0 0x7fa8021d2941  (/opt/redpanda/lib/libc.so.6+0x22941) (BuildId: 1878e6b475720c7c51969e69ab2d276fae6d1dee)
    #1 0x7fa8021d2728  (/opt/redpanda/lib/libc.so.6+0x22728) (BuildId: 1878e6b475720c7c51969e69ab2d276fae6d1dee)
    #2 0x7fa8021e3fd5  (/opt/redpanda/lib/libc.so.6+0x33fd5) (BuildId: 1878e6b475720c7c51969e69ab2d276fae6d1dee)
    #3 0x5638f23cb27e  (/opt/redpanda/libexec/redpanda+0x2fd0627e) (BuildId: 65fa21b22205712066fdc00c0cd1cbbb42527757)
    #4 0x5638f21e6224  (/opt/redpanda/libexec/redpanda+0x2fb21224) (BuildId: 65fa21b22205712066fdc00c0cd1cbbb42527757)
    #5 0x5638f21e01ff  (/opt/redpanda/libexec/redpanda+0x2fb1b1ff) (BuildId: 65fa21b22205712066fdc00c0cd1cbbb42527757)
    #6 0x5638f22186ef  (/opt/redpanda/libexec/redpanda+0x2fb536ef) (BuildId: 65fa21b22205712066fdc00c0cd1cbbb42527757)
    #7 0x5638f2218150  (/opt/redpanda/libexec/redpanda+0x2fb53150) (BuildId: 65fa21b22205712066fdc00c0cd1cbbb42527757)
    #8 0x5638f22180f0  (/opt/redpanda/libexec/redpanda+0x2fb530f0) (BuildId: 65fa21b22205712066fdc00c0cd1cbbb42527757)
    #9 0x5638f221804d  (/opt/redpanda/libexec/redpanda+0x2fb5304d) (BuildId: 65fa21b22205712066fdc00c0cd1cbbb42527757)
    #10 0x5638f2217dfd  (/opt/redpanda/libexec/redpanda+0x2fb52dfd) (BuildId: 65fa21b22205712066fdc00c0cd1cbbb42527757)
    #11 0x5638f2217afe  (/opt/redpanda/libexec/redpanda+0x2fb52afe) (BuildId: 65fa21b22205712066fdc00c0cd1cbbb42527757)
    #12 0x5638f22178b8  (/opt/redpanda/libexec/redpanda+0x2fb528b8) (BuildId: 65fa21b22205712066fdc00c0cd1cbbb42527757)
    #13 0x563902564588  (/opt/redpanda/libexec/redpanda+0x3fe9f588) (BuildId: 65fa21b22205712066fdc00c0cd1cbbb42527757)
    #14 0x563902d6db07  (/opt/redpanda/libexec/redpanda+0x406a8b07) (BuildId: 65fa21b22205712066fdc00c0cd1cbbb42527757)

AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV (/opt/redpanda/lib/libc.so.6+0x22941) (BuildId: 1878e6b475720c7c51969e69ab2d276fae6d1dee) 
==1==ABORTING

When deploying Redpanda on GCP with the credentials source set to STS (an invalid combination), Redpanda should log an error before aborting, for better usability.

Instead, the process aborts with only the stack trace shown above.

@abhijat abhijat added kind/bug Something isn't working area/cloud-storage Shadow indexing subsystem labels Jul 19, 2022
@abhijat abhijat self-assigned this Jul 19, 2022

abhijat commented Jul 22, 2022

It looks like the crash is caused by an unhandled exception during app startup. The STS client expects two environment variables to be set; they are missing, and the resulting exception causes the crash. Adding a try block around remote::start shows this:

ERROR 2022-07-22 17:27:27,961 [shard 0] cloud_storage - remote.cc:181 - caught std::runtime_error (environment variable AWS_ROLE_ARN is not set, the STS client cannot function without this.)


abhijat commented Jul 22, 2022

Stack trace with matching binary:

seastar::sharded<cloud_storage::configuration>::~sharded() at ??:?
application::wire_up_redpanda_services() at ??:?
application::wire_up_services() at ??:?
application::run(int, char**)::$_1::operator()() const::{lambda()#1}::operator()() const at application.cc:?
decltype ((static_cast<application::run(int, char**)::$_1::operator()() const::{lambda()#1}>({parm#1}))()) std::__1::__invoke_constexpr<application::run(int, char**)::$_1::operator()() const::{lambda()#1}>(application::run(int, char**)::$_1::operator()() const::{lambda()#1}&&, (application::run(int, char**)::$_1::operator()() const::{lambda()#1}&&)...) at application.cc:?
decltype(auto) std::__1::__apply_tuple_impl<application::run(int, char**)::$_1::operator()() const::{lambda()#1}, std::__1::tuple<>>(application::run(int, char**)::$_1::operator()() const::{lambda()#1}&&, std::__1::tuple<>&&, std::__1::__tuple_indices<>) at application.cc:?
decltype(auto) std::__1::apply<application::run(int, char**)::$_1::operator()() const::{lambda()#1}, std::__1::tuple<> >(application::run(int, char**)::$_1::operator()() const::{lambda()#1}&&, std::__1::tuple<>&&) at application.cc:?
seastar::future<int> seastar::futurize<int>::apply<application::run(int, char**)::$_1::operator()() const::{lambda()#1}>(application::run(int, char**)::$_1::operator()() const::{lambda()#1}&&, std::__1::tuple<>&&) at application.cc:?
seastar::async<application::run(int, char**)::$_1::operator()() const::{lambda()#1}>(seastar::thread_attributes, std::__1::invoke_result&&, (application::run(int, char**)::$_1::operator()() const::{lambda()#1}&&)...)::{lambda()#1}::operator()() const at application.cc:?
seastar::noncopyable_function<void ()>::direct_vtable_for<seastar::async<application::run(int, char**)::$_1::operator()() const::{lambda()#1}>(seastar::thread_attributes, std::__1::invoke_result&&, (application::run(int, char**)::$_1::operator()() const::{lambda()#1}&&)...)::{lambda()#1}>::call(seastar::noncopyable_function<void ()> const*) at application.cc:?
seastar::noncopyable_function<void ()>::operator()() const at future.cc:?
seastar::thread_context::main() at thread.cc:?

To reproduce:

docker run --entrypoint /entrypoint.sh localhost/redpanda:dev start \
  --logger-log-level=cloud_roles=trace \
  --smp 1 --memory 512M --reserve-memory 0M --overprovisioned \
  --node-id 0 \
  --set redpanda.auto_create_topics_enabled=false \
  --kafka-addr inside://0.0.0.0:9094,outside://0.0.0.0:9092 \
  --advertise-kafka-addr inside://localhost:9094,outside://localhost:9092 \
  --set redpanda.cloud_storage_enabled=true \
  --set redpanda.cloud_storage_region=ap-southeast-1 \
  --set redpanda.cloud_storage_api_endpoint=s3.ap-southeast-1.amazonaws.com \
  --set redpanda.cloud_storage_bucket=proj-iam-roles \
  --set redpanda.cloud_storage_segment_max_upload_interval_sec=30 \
  --set redpanda.cloud_storage_credentials_source=sts

We need to handle exceptions thrown from the IAM roles client gracefully when the invariants it needs to function are broken.
