Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

opensearch: aoss support - 1.9 #6448

Merged
merged 3 commits into from
Feb 21, 2023
Merged

opensearch: aoss support - 1.9 #6448

merged 3 commits into from
Feb 21, 2023

Conversation

matthewfala
Copy link
Contributor

@matthewfala matthewfala commented Nov 23, 2022

master pr: #6612

OpenSearch and ElasticSearch improvements:

Several minor changes:

  • Add payload hash to authenticated request headers as:
x-amz-content-sha256
  • Allow for configurable service name where "es" is used by default.
  • Add Excluded_Headers parameter to aws sigv4 function.
  • Exclude content_length from OpenSearch & ES headers

Testing

Configuration Files

Tested by FireLens team on Amazon AWS Opensearch with the following plugin configurations

[SERVICE]
    Flush 1
    Grace 30

[INPUT]
    Name cpu
    Tag cpu

[OUTPUT]
    Name  es
    Match *
    Index cpu
    Host my.es.amazonaws.com
    port 443
    aws_auth on
    aws_region us-west-2
    tls on
[SERVICE]
    Flush 1
    Grace 30

[INPUT]
    Name cpu
    Tag cpu

[OUTPUT]
    Name  opensearch
    Match *
    Index cpu
    Host my.es.amazonaws.com
    port 443
    aws_auth on
    aws_region us-west-2
    tls on

Test Results

No error messages were detected. Logs are successfully received at the Opensearch cluster.

Fluent Bit v1.9.10
* Copyright (C) 2015-2022 The Fluent Bit Authors
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io

[2022/12/20 04:14:12] [ info] [fluent bit] version=1.9.10, commit=062d4c1144, pid=6382
[2022/12/20 04:14:12] [ info] [storage] version=1.2.0, type=memory-only, sync=normal, checksum=disabled, max_chunks_up=128
[2022/12/20 04:14:12] [ info] [cmetrics] version=0.3.7
[2022/12/20 04:14:12] [ info] [sp] stream processor started

Test Valgrind Results

valgrind --leak-check=full ./fluent-bit -c ../../.vscode/fluent-bit-config/fluent-bit.conf

Opensearch

==13015== Memcheck, a memory error detector
==13015== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
==13015== Using Valgrind-3.19.0 and LibVEX; rerun with -h for copyright info
==13015== Command: ./fluent-bit -c ../../.vscode/fluent-bit-config/fluent-bit.conf
==13015== 
Fluent Bit v1.9.10
* Copyright (C) 2015-2022 The Fluent Bit Authors
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io

[2022/12/20 04:24:05] [ info] [fluent bit] version=1.9.10, commit=c0e20d18dd, pid=13015
[2022/12/20 04:24:05] [ info] [storage] version=1.2.0, type=memory-only, sync=normal, checksum=disabled, max_chunks_up=128
[2022/12/20 04:24:05] [ info] [cmetrics] version=0.3.7
[2022/12/20 04:24:06] [ info] [sp] stream processor started
==13015== Warning: client switching stacks?  SP change: 0x76fe8a8 --> 0x84fef80
==13015==          to suppress, use: --max-stackframe=14681816 or greater
==13015== Warning: client switching stacks?  SP change: 0x84feef8 --> 0x76fe8a8
==13015==          to suppress, use: --max-stackframe=14681680 or greater
==13015== Warning: client switching stacks?  SP change: 0x76fead8 --> 0x84feef8
==13015==          to suppress, use: --max-stackframe=14681120 or greater
==13015==          further instances of this message will not be shown.
^C[2022/12/20 04:24:10] [engine] caught signal (SIGINT)
[2022/12/20 04:24:10] [ info] [input] pausing cpu.0
[2022/12/20 04:24:10] [ warn] [engine] service will shutdown in max 30 seconds
[2022/12/20 04:24:10] [ info] [engine] service has stopped (0 pending tasks)
==13015== 
==13015== HEAP SUMMARY:
==13015==     in use at exit: 110,402 bytes in 3,763 blocks
==13015==   total heap usage: 60,332 allocs, 56,569 frees, 6,568,634 bytes allocated
==13015== 
==13015== LEAK SUMMARY:
==13015==    definitely lost: 0 bytes in 0 blocks
==13015==    indirectly lost: 0 bytes in 0 blocks
==13015==      possibly lost: 0 bytes in 0 blocks
==13015==    still reachable: 110,402 bytes in 3,763 blocks
==13015==         suppressed: 0 bytes in 0 blocks
==13015== Reachable blocks (those to which a pointer was found) are not shown.
==13015== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==13015== 
==13015== For lists of detected and suppressed errors, rerun with: -s
==13015== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

ElasticSearch

Unfortunately, es on both 1.9 and aoss-1.9 (this pr's branch) have numerous networking invalid read errors. This may need to be further investigated. There are no memory leaks introduced though:

es 1.9

==17772== Memcheck, a memory error detector
==17772== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
==17772== Using Valgrind-3.19.0 and LibVEX; rerun with -h for copyright info
==17772== Command: ./fluent-bit -c ../../.vscode/fluent-bit-config/fluent-bit.conf
==17772== 
Fluent Bit v1.9.10
* Copyright (C) 2015-2022 The Fluent Bit Authors
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io

[2022/12/20 04:29:02] [ info] [fluent bit] version=1.9.10, commit=760956f50c, pid=17772
[2022/12/20 04:29:02] [ info] [storage] version=1.2.0, type=memory-only, sync=normal, checksum=disabled, max_chunks_up=128
[2022/12/20 04:29:02] [ info] [cmetrics] version=0.3.7
[2022/12/20 04:29:03] [ info] [sp] stream processor started
[2022/12/20 04:29:03] [ info] [output:es:es.0] worker #0 started
[2022/12/20 04:29:03] [ info] [output:es:es.0] worker #1 started
==17772== Warning: client switching stacks?  SP change: 0x8f009c8 --> 0x9949270
==17772==          to suppress, use: --max-stackframe=10782888 or greater
==17772== Warning: client switching stacks?  SP change: 0x99491e8 --> 0x8f009c8
==17772==          to suppress, use: --max-stackframe=10782752 or greater
==17772== Warning: client switching stacks?  SP change: 0x8f009c8 --> 0x99491e8
==17772==          to suppress, use: --max-stackframe=10782752 or greater
==17772==          further instances of this message will not be shown.
==17772== Thread 5 flb-out-es.0-w1:
==17772== Invalid write of size 8
==17772==    at 0x7CC1C7: co_swap_function (in /home/ec2-user/fala-forks/fluent-bit/build/bin/fluent-bit)
==17772==  Address 0x9902500 is in a rw- anonymous segment
==17772== 
==17772== Invalid write of size 8
==17772==    at 0x7CC1CB: co_swap_function (in /home/ec2-user/fala-forks/fluent-bit/build/bin/fluent-bit)
==17772==  Address 0x9902508 is in a rw- anonymous segment
==17772== 
==17772== Invalid write of size 8
==17772==    at 0x7CC1CF: co_swap_function (in /home/ec2-user/fala-forks/fluent-bit/build/bin/fluent-bit)
==17772==  Address 0x9902510 is in a rw- anonymous segment
==17772== 
==17772== Invalid write of size 8
==17772==    at 0x7CC1D3: co_swap_function (in /home/ec2-user/fala-forks/fluent-bit/build/bin/fluent-bit)
==17772==  Address 0x9902518 is in a rw- anonymous segment
==17772== 
==17772== Invalid write of size 8
==17772==    at 0x7CC1D7: co_swap_function (in /home/ec2-user/fala-forks/fluent-bit/build/bin/fluent-bit)
==17772==  Address 0x9902520 is in a rw- anonymous segment
==17772== 
==17772== Invalid write of size 8
==17772==    at 0x7CC1DB: co_swap_function (in /home/ec2-user/fala-forks/fluent-bit/build/bin/fluent-bit)
==17772==  Address 0x9902528 is in a rw- anonymous segment
==17772== 
==17772== Invalid read of size 8
==17772==    at 0x7CC1DF: co_swap_function (in /home/ec2-user/fala-forks/fluent-bit/build/bin/fluent-bit)
==17772==  Address 0x9a322a8 is 8 bytes inside a block of size 25,088 alloc'd
==17772==    at 0x4C2D065: malloc (vg_replace_malloc.c:381)
==17772==    by 0x7CD2E0: co_create (amd64.c:142)
==17772==    by 0x460323: flb_output_flush_create (flb_output.h:569)
==17772==    by 0x460323: output_thread (flb_output_thread.c:288)
==17772==    by 0x477A70: step_callback (flb_worker.c:43)
==17772==    by 0x4E4444A: start_thread (in /usr/lib64/libpthread-2.26.so)
==17772==    by 0x642C52E: clone (in /usr/lib64/libc-2.26.so)
==17772== 
==17772== Invalid read of size 8
==17772==    at 0x7CC1E3: co_swap_function (in /home/ec2-user/fala-forks/fluent-bit/build/bin/fluent-bit)
==17772==  Address 0x9a322b0 is 16 bytes inside a block of size 25,088 alloc'd
==17772==    at 0x4C2D065: malloc (vg_replace_malloc.c:381)
==17772==    by 0x7CD2E0: co_create (amd64.c:142)
==17772==    by 0x460323: flb_output_flush_create (flb_output.h:569)
==17772==    by 0x460323: output_thread (flb_output_thread.c:288)
==17772==    by 0x477A70: step_callback (flb_worker.c:43)
==17772==    by 0x4E4444A: start_thread (in /usr/lib64/libpthread-2.26.so)
==17772==    by 0x642C52E: clone (in /usr/lib64/libc-2.26.so)
==17772== 
==17772== Invalid read of size 8
==17772==    at 0x7CC1E7: co_swap_function (in /home/ec2-user/fala-forks/fluent-bit/build/bin/fluent-bit)
==17772==  Address 0x9a322b8 is 24 bytes inside a block of size 25,088 alloc'd
==17772==    at 0x4C2D065: malloc (vg_replace_malloc.c:381)
==17772==    by 0x7CD2E0: co_create (amd64.c:142)
==17772==    by 0x460323: flb_output_flush_create (flb_output.h:569)
==17772==    by 0x460323: output_thread (flb_output_thread.c:288)
==17772==    by 0x477A70: step_callback (flb_worker.c:43)
==17772==    by 0x4E4444A: start_thread (in /usr/lib64/libpthread-2.26.so)
==17772==    by 0x642C52E: clone (in /usr/lib64/libc-2.26.so)
==17772== 
==17772== Invalid read of size 8
==17772==    at 0x7CC1EB: co_swap_function (in /home/ec2-user/fala-forks/fluent-bit/build/bin/fluent-bit)
==17772==  Address 0x9a322c0 is 32 bytes inside a block of size 25,088 alloc'd
==17772==    at 0x4C2D065: malloc (vg_replace_malloc.c:381)
==17772==    by 0x7CD2E0: co_create (amd64.c:142)
==17772==    by 0x460323: flb_output_flush_create (flb_output.h:569)
==17772==    by 0x460323: output_thread (flb_output_thread.c:288)
==17772==    by 0x477A70: step_callback (flb_worker.c:43)
==17772==    by 0x4E4444A: start_thread (in /usr/lib64/libpthread-2.26.so)
==17772==    by 0x642C52E: clone (in /usr/lib64/libc-2.26.so)
==17772== 
==17772== Invalid read of size 8
==17772==    at 0x7CC1EF: co_swap_function (in /home/ec2-user/fala-forks/fluent-bit/build/bin/fluent-bit)
==17772==  Address 0x9a322c8 is 40 bytes inside a block of size 25,088 alloc'd
==17772==    at 0x4C2D065: malloc (vg_replace_malloc.c:381)
==17772==    by 0x7CD2E0: co_create (amd64.c:142)
==17772==    by 0x460323: flb_output_flush_create (flb_output.h:569)
==17772==    by 0x460323: output_thread (flb_output_thread.c:288)
==17772==    by 0x477A70: step_callback (flb_worker.c:43)
==17772==    by 0x4E4444A: start_thread (in /usr/lib64/libpthread-2.26.so)
==17772==    by 0x642C52E: clone (in /usr/lib64/libc-2.26.so)
==17772== 
==17772== Invalid read of size 8
==17772==    at 0x7CC1F3: co_swap_function (in /home/ec2-user/fala-forks/fluent-bit/build/bin/fluent-bit)
==17772==  Address 0x9a322d0 is 48 bytes inside a block of size 25,088 alloc'd
==17772==    at 0x4C2D065: malloc (vg_replace_malloc.c:381)
==17772==    by 0x7CD2E0: co_create (amd64.c:142)
==17772==    by 0x460323: flb_output_flush_create (flb_output.h:569)
==17772==    by 0x460323: output_thread (flb_output_thread.c:288)
==17772==    by 0x477A70: step_callback (flb_worker.c:43)
==17772==    by 0x4E4444A: start_thread (in /usr/lib64/libpthread-2.26.so)
==17772==    by 0x642C52E: clone (in /usr/lib64/libc-2.26.so)
==17772== 
==17772== Invalid read of size 8
==17772==    at 0x45F84C: output_pre_cb_flush (flb_output.h:497)
==17772==  Address 0x9902700 is in a rw- anonymous segment
==17772== 

...

==17772== 
==17772== HEAP SUMMARY:
==17772==     in use at exit: 110,402 bytes in 3,763 blocks
==17772==   total heap usage: 73,309 allocs, 69,546 frees, 48,019,335 bytes allocated
==17772== 
==17772== LEAK SUMMARY:
==17772==    definitely lost: 0 bytes in 0 blocks
==17772==    indirectly lost: 0 bytes in 0 blocks
==17772==      possibly lost: 0 bytes in 0 blocks
==17772==    still reachable: 110,402 bytes in 3,763 blocks
==17772==         suppressed: 0 bytes in 0 blocks
==17772== Reachable blocks (those to which a pointer was found) are not shown.
==17772== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==17772== 
==17772== Use --track-origins=yes to see where uninitialised values come from
==17772== For lists of detected and suppressed errors, rerun with: -s

es aoss-1.9


Fluent Bit v1.9.10
* Copyright (C) 2015-2022 The Fluent Bit Authors
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io

[2022/12/20 04:32:42] [ info] [fluent bit] version=1.9.10, commit=f22e8b90c1, pid=21318
[2022/12/20 04:32:42] [ info] [storage] version=1.2.0, type=memory-only, sync=normal, checksum=disabled, max_chunks_up=128
[2022/12/20 04:32:42] [ info] [cmetrics] version=0.3.7
[2022/12/20 04:32:43] [ info] [sp] stream processor started
[2022/12/20 04:32:43] [ info] [output:es:es.0] worker #1 started
[2022/12/20 04:32:43] [ info] [output:es:es.0] worker #0 started
==21318== Warning: client switching stacks?  SP change: 0x8f009c8 --> 0x9949270
==21318==          to suppress, use: --max-stackframe=10782888 or greater
==21318== Warning: client switching stacks?  SP change: 0x99491e8 --> 0x8f009c8
==21318==          to suppress, use: --max-stackframe=10782752 or greater
==21318== Warning: client switching stacks?  SP change: 0x8f009c8 --> 0x99491e8
==21318==          to suppress, use: --max-stackframe=10782752 or greater
==21318==          further instances of this message will not be shown.
==21318== Thread 5 flb-out-es.0-w1:
==21318== Invalid write of size 8
==21318==    at 0x7CC547: co_swap_function (in /home/ec2-user/fala-forks/fluent-bit/build/bin/fluent-bit)
==21318==  Address 0x9902500 is in a rw- anonymous segment
==21318== 
==21318== Invalid write of size 8
==21318==    at 0x7CC54B: co_swap_function (in /home/ec2-user/fala-forks/fluent-bit/build/bin/fluent-bit)
==21318==  Address 0x9902508 is in a rw- anonymous segment
==21318== 
==21318== Invalid write of size 8
==21318==    at 0x7CC54F: co_swap_function (in /home/ec2-user/fala-forks/fluent-bit/build/bin/fluent-bit)
==21318==  Address 0x9902510 is in a rw- anonymous segment
==21318== 
==21318== Invalid write of size 8
==21318==    at 0x7CC553: co_swap_function (in /home/ec2-user/fala-forks/fluent-bit/build/bin/fluent-bit)
==21318==  Address 0x9902518 is in a rw- anonymous segment
==21318== 
==21318== Invalid write of size 8
==21318==    at 0x7CC557: co_swap_function (in /home/ec2-user/fala-forks/fluent-bit/build/bin/fluent-bit)
==21318==  Address 0x9902520 is in a rw- anonymous segment
==21318== 
==21318== Invalid write of size 8
==21318==    at 0x7CC55B: co_swap_function (in /home/ec2-user/fala-forks/fluent-bit/build/bin/fluent-bit)
==21318==  Address 0x9902528 is in a rw- anonymous segment
==21318== 
==21318== Invalid read of size 8
==21318==    at 0x7CC55F: co_swap_function (in /home/ec2-user/fala-forks/fluent-bit/build/bin/fluent-bit)
==21318==  Address 0x99efbf8 is 8 bytes inside a block of size 25,088 alloc'd
==21318==    at 0x4C2D065: malloc (vg_replace_malloc.c:381)
==21318==    by 0x7CD660: co_create (amd64.c:142)
==21318==    by 0x460323: flb_output_flush_create (flb_output.h:569)
==21318==    by 0x460323: output_thread (flb_output_thread.c:288)
==21318==    by 0x477A70: step_callback (flb_worker.c:43)
==21318==    by 0x4E4444A: start_thread (in /usr/lib64/libpthread-2.26.so)
==21318==    by 0x642C52E: clone (in /usr/lib64/libc-2.26.so)
==21318== 
==21318== Invalid read of size 8
==21318==    at 0x7CC563: co_swap_function (in /home/ec2-user/fala-forks/fluent-bit/build/bin/fluent-bit)
==21318==  Address 0x99efc00 is 16 bytes inside a block of size 25,088 alloc'd
==21318==    at 0x4C2D065: malloc (vg_replace_malloc.c:381)
==21318==    by 0x7CD660: co_create (amd64.c:142)
==21318==    by 0x460323: flb_output_flush_create (flb_output.h:569)
==21318==    by 0x460323: output_thread (flb_output_thread.c:288)
==21318==    by 0x477A70: step_callback (flb_worker.c:43)
==21318==    by 0x4E4444A: start_thread (in /usr/lib64/libpthread-2.26.so)
==21318==    by 0x642C52E: clone (in /usr/lib64/libc-2.26.so)

...


[2022/12/20 04:33:02] [ info] [output:es:es.0] thread worker #0 stopped
[2022/12/20 04:33:02] [ info] [output:es:es.0] thread worker #1 stopping...
[2022/12/20 04:33:02] [ info] [output:es:es.0] thread worker #1 stopped
==21318== 
==21318== HEAP SUMMARY:
==21318==     in use at exit: 110,402 bytes in 3,763 blocks
==21318==   total heap usage: 61,959 allocs, 58,196 frees, 11,855,407 bytes allocated
==21318== 
==21318== LEAK SUMMARY:
==21318==    definitely lost: 0 bytes in 0 blocks
==21318==    indirectly lost: 0 bytes in 0 blocks
==21318==      possibly lost: 0 bytes in 0 blocks
==21318==    still reachable: 110,402 bytes in 3,763 blocks
==21318==         suppressed: 0 bytes in 0 blocks
==21318== Reachable blocks (those to which a pointer was found) are not shown.
==21318== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==21318== 
==21318== Use --track-origins=yes to see where uninitialised values come from
==21318== For lists of detected and suppressed errors, rerun with: -s
==21318== ERROR SUMMARY: 11634 errors from 1000 contexts (suppressed: 0 from 0)

Interpretation of Valgrind Results

It's not clear why the invalid reads only appear in es and not the opensearch plugin which have nearly identical code.

CloudWatch_logs Plugin Regression Testing

CloudWatch_logs plugin is also tested with this PR to ensure that aws auth still works for other plugins.
CloudWatch receives logs successfully.

Fluent Bit v1.9.10
* Copyright (C) 2015-2022 The Fluent Bit Authors
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io

[2022/12/20 04:37:32] [ info] [fluent bit] version=1.9.10, commit=f22e8b90c1, pid=23324
[2022/12/20 04:37:32] [ info] [storage] version=1.2.0, type=memory-only, sync=normal, checksum=disabled, max_chunks_up=128
[2022/12/20 04:37:32] [ info] [cmetrics] version=0.3.7
[2022/12/20 04:37:33] [ info] [sp] stream processor started
[2022/12/20 04:37:33] [ info] [output:cloudwatch_logs:cloudwatch_logs.0] worker #0 started
==23324== Warning: client switching stacks?  SP change: 0x93009c8 --> 0x87472f0
==23324==          to suppress, use: --max-stackframe=12293848 or greater
==23324== Warning: client switching stacks?  SP change: 0x8747268 --> 0x93009c8
==23324==          to suppress, use: --max-stackframe=12293984 or greater
==23324== Warning: client switching stacks?  SP change: 0x93009c8 --> 0x8747268
==23324==          to suppress, use: --max-stackframe=12293984 or greater
==23324==          further instances of this message will not be shown.
[2022/12/20 04:37:34] [ info] [output:cloudwatch_logs:cloudwatch_logs.0] Creating log stream test2_test in log group fluent_test
[2022/12/20 04:37:36] [ info] [output:cloudwatch_logs:cloudwatch_logs.0] Log Group fluent_test not found. Will attempt to create it.
[2022/12/20 04:37:36] [ info] [output:cloudwatch_logs:cloudwatch_logs.0] Creating log group fluent_test
[2022/12/20 04:37:44] [ warn] [net] getaddrinfo(host='logs.us-west-2.amazonaws.com', err=12): Timeout while contacting DNS servers
[2022/12/20 04:37:44] [error] [aws_client] connection initialization error
[2022/12/20 04:37:45] [ info] [output:cloudwatch_logs:cloudwatch_logs.0] Created log group fluent_test
[2022/12/20 04:37:45] [ info] [output:cloudwatch_logs:cloudwatch_logs.0] Creating log stream test2_test in log group fluent_test
[2022/12/20 04:37:45] [ info] [output:cloudwatch_logs:cloudwatch_logs.0] Created log stream test2_test
^C[2022/12/20 04:38:46] [engine] caught signal (SIGINT)
[2022/12/20 04:38:46] [ warn] [engine] service will shutdown in max 30 seconds
[2022/12/20 04:38:46] [ info] [engine] service has stopped (0 pending tasks)
[2022/12/20 04:38:46] [ info] [output:cloudwatch_logs:cloudwatch_logs.0] thread worker #0 stopping...
[2022/12/20 04:38:46] [ info] [output:cloudwatch_logs:cloudwatch_logs.0] thread worker #0 stopped
==23324== 
==23324== HEAP SUMMARY:
==23324==     in use at exit: 110,402 bytes in 3,763 blocks
==23324==   total heap usage: 67,457 allocs, 63,694 frees, 33,207,221 bytes allocated
==23324== 
==23324== LEAK SUMMARY:
==23324==    definitely lost: 0 bytes in 0 blocks
==23324==    indirectly lost: 0 bytes in 0 blocks
==23324==      possibly lost: 0 bytes in 0 blocks
==23324==    still reachable: 110,402 bytes in 3,763 blocks
==23324==         suppressed: 0 bytes in 0 blocks
==23324== Reachable blocks (those to which a pointer was found) are not shown.
==23324== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==23324== 
==23324== For lists of detected and suppressed errors, rerun with: -s
==23324== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.

@matthewfala matthewfala changed the title Aoss 1.9 es opensearch: Aoss 1.9 Nov 23, 2022
@matthewfala matthewfala temporarily deployed to pr November 23, 2022 22:30 Inactive
@matthewfala matthewfala temporarily deployed to pr November 23, 2022 22:30 Inactive
@matthewfala matthewfala changed the title es opensearch: Aoss 1.9 opensearch: aoss support - 1.9 Nov 23, 2022
@matthewfala matthewfala temporarily deployed to pr November 23, 2022 22:44 Inactive
Comment on lines 65 to 66
ctx->aws_region, ctx->aws_service_name,
S3_MODE_SIGNED_PAYLOAD,
Copy link
Contributor

@PettitWesley PettitWesley Nov 23, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@matthewfala We need to test this with opensearch without using the new option to ensure there is no regression.

Since this applies to all requests, even if the new option is not set. Need to check if opensearch always allows S3 signing mode.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I'm pretty sure you're always allowed to send this header, since it increases the security of the request

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -862,6 +862,7 @@ static void cb_es_flush(struct flb_event_chunk *event_chunk,
}
else {
/* The request was issued successfully, validate the 'error' field */
flb_plg_error(ctx->ins, "Headers: \n%s", c->resp.data);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this was added for debugging, let's remove it for prod release?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great. Good catch. Updated.

@@ -879,6 +879,7 @@ static void cb_opensearch_flush(struct flb_event_chunk *event_chunk,
}
else {
/* The request was issued successfully, validate the 'error' field */
flb_plg_error(ctx->ins, "Headers: \n%s", c->resp.data);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see above comment

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated

@matthewfala matthewfala temporarily deployed to pr November 24, 2022 00:55 Inactive
@matthewfala matthewfala temporarily deployed to pr November 24, 2022 00:56 Inactive
@matthewfala matthewfala temporarily deployed to pr November 24, 2022 01:11 Inactive
@@ -62,8 +62,8 @@ static flb_sds_t add_aws_auth(struct flb_http_client *c,
flb_http_add_header(c, "User-Agent", 10, "aws-fluent-bit-plugin", 21);

signature = flb_signv4_do(c, FLB_TRUE, FLB_TRUE, time(NULL),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I've heard that:

You can't include Content-Length as a signed header, otherwise you'll get an invalid signature error.

Currently, content-length is added in the http client add_host_and_content_length function which is called when you create a client- do we need to add some code here to ignore that header? I think I remember that when we worked on this before we did...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess if we test it and we don't get that error response then it works

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

resolved.

I'm wondering if aoss needs content-type to also be excluded from the signed headers. This PR does not exclude content-type from the signed headers

Comment on lines 853 to 854
flb_http_add_header(c, "Content-Type", 12, "application/x-ndjson", 20);

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You moved this line down so that its not part of the signed headers?

I think this is not a long term maintainable solution if that's the case... what if another maintainer goes and switches it back?

May be we need something like an option to pass a list of headers to flb_signv4_do that should not be signed?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also where is the fix for Content-Length not being signed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Noted. This was actually a part of Zhonghui's commits. No change on my part. The new "unsigned_headers" parameter will omit what we want. Following up on whether Content-Type should be omitted.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to move this back up

@matthewfala matthewfala temporarily deployed to pr December 14, 2022 03:11 — with GitHub Actions Inactive
@matthewfala matthewfala temporarily deployed to pr December 14, 2022 03:11 — with GitHub Actions Inactive
@matthewfala
Copy link
Contributor Author

This pr needs more testing.

@matthewfala matthewfala temporarily deployed to pr December 14, 2022 03:27 — with GitHub Actions Inactive
0, ctx->aws_provider);
0, NULL, ctx->aws_provider);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@matthewfala Do you know why we have AWS Auth in the bigquery output here?? Isn't that a google only product??

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Supposedly, AWS clients can use google cloud products with AWS credentials via Workload Identity Federation

I see someone mention this usecase for fluent bit in the following comment: #5084

Comment on lines 65 to 67
ctx->aws_region, ctx->aws_service_name,
S3_MODE_SIGNED_PAYLOAD, NULL,
ctx->aws_provider);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't we want to add the AWS unsigned headers in es as well? Otherwise there is no point in adding the AWS_Service_Name field to this plugin I think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Goood point

@@ -40,6 +40,7 @@ flb_sds_t flb_signv4_do(struct flb_http_client *c, int normalize_uri,
time_t t_now,
char *region, char *service,
int s3_mode,
struct mk_list *unsigned_headers, /* flb_slist */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like this broke the signv4 unit tests...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's odd

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

resolved

@matthewfala matthewfala temporarily deployed to pr December 20, 2022 04:58 — with GitHub Actions Inactive
@matthewfala matthewfala temporarily deployed to pr December 20, 2022 04:58 — with GitHub Actions Inactive
@matthewfala matthewfala temporarily deployed to pr December 20, 2022 05:06 — with GitHub Actions Inactive
@matthewfala matthewfala temporarily deployed to pr December 20, 2022 05:06 — with GitHub Actions Inactive
@matthewfala matthewfala marked this pull request as ready for review December 20, 2022 05:08
@matthewfala
Copy link
Contributor Author

matthewfala commented Dec 20, 2022

AWS-for-fluent-bit test image

Test Image

Repo 1: 826489191740.dkr.ecr.us-west-2.amazonaws.com/amazon/aws-for-fluent-bit:2.29.0-add-opensearch-aoss

Documentation

Cherry-picked AWS-for-fluent-bit test image.
ft.json

{
    "campaign": "opensearch-aoss",
    "links": [
        "https://github.com/fluent/fluent-bit/pull/6448"
    ],
    "imageRepository": "826489191740.dkr.ecr.us-west-2.amazonaws.com/amazon/aws-for-fluent-bit",
    "firelensBase": {
        "repository": "https://github.com/fala-aws/aws-for-fluent-bit.git",
        "branch": "flamethrower"
    },

    "images": [
        {
            "tag": "2.29.0-add-opensearch-aoss",
            "fluentbitBase": {
                "repository": "https://github.com/matthewfala/fluent-bit.git",
                "tag": "v1.9.10"
            },
            "fluentbitCherryPicks": [
                {
                    "name": "aws: add excluded_headers option to signv4 function",
                    "repository": "https://github.com/matthewfala/fluent-bit.git",
                    "branch": "aoss-1.9",
                    "commit": "1633c49aadad55bac483c5e55772de0e6c29704a"
                },
                {
                    "name": "es: aoss authentication support",
                    "repository": "https://github.com/matthewfala/fluent-bit.git",
                    "branch": "aoss-1.9",
                    "commit": "4dc5368502d22d5ed54f06e5700242a0138e3ff7"
                },
                {
                    "name": "opensearch: aoss authentication support",
                    "repository": "https://github.com/matthewfala/fluent-bit.git",
                    "branch": "aoss-1.9",
                    "commit": "f7be9ba54fca28a4f4d154f8ba55e53a81f31130"
                }
            ]
        }
        
    ]
}

Output AWS-for-fluent-bit cherrypick file

# 2.29.0-add-opensearch-aoss
## FLUENTBIT_BASE: v1.9.10
## IMAGE_TAG: 826489191740.dkr.ecr.us-west-2.amazonaws.com/amazon/aws-for-fluent-bit:2.29.0-add-opensearch-aoss
## DOCKER_CMD_0: aws ecr get-login-password --region us-west-2 | docker login --username AWS --password-stdin 826489191740.dkr.ecr.us-west-2.amazonaws.com
## DOCKER_CMD_1: docker tag  amazon/aws-for-fluent-bit:quick-debug 826489191740.dkr.ecr.us-west-2.amazonaws.com/amazon/aws-for-fluent-bit:2.29.0-add-opensearch-aoss
## DOCKER_CMD_2: docker push 826489191740.dkr.ecr.us-west-2.amazonaws.com/amazon/aws-for-fluent-bit:2.29.0-add-opensearch-aoss
# aws: add excluded_headers option to signv4 function
https://github.com/matthewfala/fluent-bit.git aoss-1.9 1633c49aadad55bac483c5e55772de0e6c29704a
# es: aoss authentication support
https://github.com/matthewfala/fluent-bit.git aoss-1.9 4dc5368502d22d5ed54f06e5700242a0138e3ff7
# opensearch: aoss authentication support
https://github.com/matthewfala/fluent-bit.git aoss-1.9 f7be9ba54fca28a4f4d154f8ba55e53a81f31130

Out

 Date: Tue Dec 20 04:57:47 2022 +0000
 3 files changed, 27 insertions(+), 2 deletions(-)
Cherry Pick Patch Summary:
Commit 0 -- 760956f50 out_s3: add store_dir_limit_size to limit S3 disk usage
Commit 1 -- 78cbff8b9 aws: add excluded_headers option to signv4 function
Commit 2 -- da85e147d es: aoss authentication support
Commit 3 -- acb379e19 opensearch: aoss authentication support
Removing intermediate container 5c3516b317d2
 ---> 75ab94c0f0c9
Step 23/48 : RUN cmake -DFLB_DEBUG=On           

@matthewfala matthewfala temporarily deployed to pr December 20, 2022 05:22 — with GitHub Actions Inactive
@@ -40,6 +40,7 @@ flb_sds_t flb_signv4_do(struct flb_http_client *c, int normalize_uri,
time_t t_now,
char *region, char *service,
int s3_mode,
struct mk_list *unsigned_headers, /* flb_slist */
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

resolved

@@ -62,8 +62,8 @@ static flb_sds_t add_aws_auth(struct flb_http_client *c,
flb_http_add_header(c, "User-Agent", 10, "aws-fluent-bit-plugin", 21);

signature = flb_signv4_do(c, FLB_TRUE, FLB_TRUE, time(NULL),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

resolved.

I'm wondering if aoss needs content-type to also be excluded from the signed headers. This PR does not exclude content-type from the signed headers

ctx->aws_region, "es",
0,
ctx->aws_region, ctx->aws_service_name,
S3_MODE_SIGNED_PAYLOAD, ctx->aws_unsigned_headers,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the magic line of code.

@@ -61,6 +61,8 @@ struct flb_elasticsearch {
/* one for the standard chain provider, one for sts assume role */
struct flb_tls *aws_sts_tls;
char *aws_session_name;
char *aws_service_name;
struct mk_list *aws_unsigned_headers;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typically in code in FLB that I've seen we have just struct mk_list no pointer on context that way we don't have to alloc it and its simpler I think...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had it like that at first, however there is an issue with just having a struct directly nested:

When you run the clean up logic, you need to know whether the slist has been initialized so that you can call:
flb_slist_destroy(ctx->aws_unsigned_headers);

accordingly.

aws_unsigned_headers might not be initialized depending on when the initialization script fails.

Thus using a pointer serves as a double purpose:

  1. Let us know if the slist is created
  2. store the slist

I could also have made a boolean to track this, but I think it is more straight forward and conforms better with the other code to just allocate memory.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. I think that's fine but its not the only way to do it.

There is:

static inline int mk_list_is_set(struct mk_list *head)
{
    if (head->next && head->prev) {
        return 0;
    }

    return -1;
}

If you just embed the mk_list struct on the context and alloc it with calloc then you can use this function because the next and prev pointers will be initialized to NULL.

@@ -73,6 +73,8 @@ struct flb_opensearch {
/* one for the standard chain provider, one for sts assume role */
struct flb_tls *aws_sts_tls;
char *aws_session_name;
char *aws_service_name;
struct mk_list *aws_unsigned_headers;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as on es

PettitWesley
PettitWesley previously approved these changes Dec 23, 2022
Signed-off-by: Matthew Fala <falamatt@amazon.com>
Signed-off-by: Matthew Fala <falamatt@amazon.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants