Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Kafka scaler doesn't handle Sarama errors properly #3056

Closed
lsolovey opened this issue May 20, 2022 · 7 comments · Fixed by #3452
Closed

Kafka scaler doesn't handle Sarama errors properly #3056

lsolovey opened this issue May 20, 2022 · 7 comments · Fixed by #3452
Assignees
Labels
bug Something isn't working

Comments

@lsolovey
Copy link
Contributor

lsolovey commented May 20, 2022

Report

Sarama can report Kafka errors not just as Go standard error but rather use custom field on the returned data object.
(I'm not a Go expert so please excuse my terminology).

For example, DescribeTopics() in Sarama Admin client is declared as:

DescribeTopics(topics []string) (metadata []*TopicMetadata, err error)

And TopicMetadata is a struct with "custom" KError field.

type TopicMetadata struct {
	Err        KError
	Name       string
	IsInternal bool // Only valid for Version >= 1
	Partitions []*PartitionMetadata
}

Kafka scaler only checks for error returned from DescribeTopics, but it doesn't check for any KErrors on the returned payloads.
So effectively - some Kafka errors are ignored by KEDA.

The same applies for all other Sarama API calls used by Kafka Scaler.

I debugged KEDA Kafka Scaler - please see the screenshot with a sample response from DescribeTopics() call:
image

Expected Behavior

KEDA Kafka Scaler correctly reports errors from Sarama client in logs.

Actual Behavior

KEDA Kafka Scaler doesn't report errors from Sarama client in logs.
No way to troubleshoot Kafka Scaler other than debugging.

Steps to Reproduce the Problem

  1. We use Confluent Platform Kafka with RBAC enabled.
  2. Create ScaledObject with Kafka Trigger. Use TLS certs issued for Kafka principal which has just one RBAC permission: DeveloperRead permission on the Consumer Group used by Kafka trigger.
  3. Publish some records in the Kafka topic specified in Kafka Trigger (to increase the offset).
  4. KEDA doesn't activate target Deployment. No errors reported in KEDA logs.
  5. Upon debugging of Kafka Scaler, it's clear that Kafka error about unauthorized topic access is reported by Sarama but ignored by KEDA.

Logs from KEDA operator

No response

KEDA Version

2.7.1

Kubernetes Version

1.21

Platform

Other

Scaler Details

Kafka

Anything else?

No response

@lsolovey lsolovey added the bug Something isn't working label May 20, 2022
@JorTurFer
Copy link
Member

JorTurFer commented May 23, 2022

Hey @lsolovey
Thanks for reporting this. I'm not an expert in kafka so I'm not sure if explain better the errors is possible or not (maintaining the current compatibility), but if it is, are you willing to contribute? It makes sense IMO
@zroubalik

@lsolovey
Copy link
Contributor Author

As far as I understand - these Sarama KErrors correspond to Kafka Broker error codes as described here:
https://cwiki.apache.org/confluence/display/KAFKA/A+Guide+To+The+Kafka+Protocol#AGuideToTheKafkaProtocol-ErrorCodes

So KEDA Kafka Scaler just needs to fail if KError is anything but zero.

Unfortunately I have no experience with Go - I don't think I can write quality Go code as of now.

PS. Great job explaining how to use VSCode in BUILD.md ! I was able to debug Kafka scaler without any Go knowledge at all :)

@zroubalik
Copy link
Member

zroubalik commented May 24, 2022

Great investigation! This is indeed a great feature that would help with investigation once there are problems with Kafka Scaler.

Unfortunately I have no experience with Go - I don't think I can write quality Go code as of now.

So maybe this is the time where you start with Go, WDYT? It is not that hard :)

@zroubalik
Copy link
Member

Would be nice to print these extended error messages as a first step. Currently the standard Go error returned from Go client is very generic and it is not obvious where is the problem. This might help especially with debugging auth problems.

@liortct
Copy link

liortct commented May 25, 2022

I am willing to contribute this :)

@zroubalik
Copy link
Member

@liortct any update on this?

@JorTurFer
Copy link
Member

I'm going to do it before the release

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

4 participants