RedPanda's Schema Registry handles redundant namespace tags in Avro differently compared to Confluent's Schema Registry #11912

Closed
danielfagerstrom opened this issue Jul 6, 2023 · 1 comment · Fixed by #12334
Assignees
Labels
area/schema-registry Schema Registry service within Redpanda kind/bug Something isn't working

Comments

@danielfagerstrom

Version & Environment

Redpanda version: v23.1.12
Confluent Schema Registry version: 7.4.0

What went wrong?

I run into problems when using Redpanda together with Kafka Connect on topics with Avro schemas.

The topics I am trying to consume in a Kafka Connect sink are produced by C# clients that use the AvroGen code generator from Apache Avro. AvroGen adds redundant namespace tags to nested schemas. The Confluent Schema Registry (SR) and Confluent's AvroConverter remove redundant namespace tags, while Redpanda's SR does not.

An example: if I post this schema without redundant nested namespace tags:

A:

{
    "type": "record",
    "namespace": "io.example",
    "name": "main",
    "fields": [
        {"name": "title", "type": "string"},
        {
            "name": "sub",
            "type": {
                "type": "record",
                "name": "sub",
                "fields": [
                    {"name": "content", "type": "string"}
                ]
            }
        }
    ]
}

and this schema with a redundant nested namespace tag ("namespace": "io.example" within the sub record):

B:

{
    "type": "record",
    "namespace": "io.example",
    "name": "main",
    "fields": [
        {"name": "title", "type": "string"},
        {
            "name": "sub",
            "type": {
                "type": "record",
                "name": "sub",
                "namespace": "io.example",
                "fields": [
                    {"name": "content", "type": "string"}
                ]
            }
        }
    ]
}

to Confluent's SR, schemas A and B are registered under the same id, and the SR stores the schema without the redundant namespace tag (A). If I post them to Redpanda's SR instead, schemas A and B are stored under different ids. I post to /subjects/subject-value/versions without any normalization.
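The comparison can be scripted. What follows is a minimal Python sketch, not part of the original report. The registry address `http://localhost:8081` is a placeholder, and `build_register_request`, `register`, and `compare_ids` are hypothetical helper names. The only registry behavior it relies on is that POST /subjects/{subject}/versions takes a JSON body with the schema as a string and returns the assigned id.

```python
import copy
import json
import urllib.request

# Schema A: the nested record inherits its namespace from "main".
SCHEMA_A = {
    "type": "record",
    "namespace": "io.example",
    "name": "main",
    "fields": [
        {"name": "title", "type": "string"},
        {
            "name": "sub",
            "type": {
                "type": "record",
                "name": "sub",
                "fields": [{"name": "content", "type": "string"}],
            },
        },
    ],
}

# Schema B: identical to A, plus the redundant namespace on the nested record.
SCHEMA_B = copy.deepcopy(SCHEMA_A)
SCHEMA_B["fields"][1]["type"]["namespace"] = "io.example"


def build_register_request(base_url, subject, schema):
    """Build the POST /subjects/{subject}/versions request for a schema dict."""
    body = json.dumps({"schema": json.dumps(schema)}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/subjects/{subject}/versions",
        data=body,
        method="POST",
        headers={"Content-Type": "application/vnd.schemaregistry.v1+json"},
    )


def register(base_url, subject, schema):
    """Send the registration request and return the id assigned by the registry."""
    request = build_register_request(base_url, subject, schema)
    with urllib.request.urlopen(request) as response:
        return json.load(response)["id"]


def compare_ids(base_url, subject):
    """Register A and B and report whether the registry gave them the same id."""
    id_a = register(base_url, subject, SCHEMA_A)
    id_b = register(base_url, subject, SCHEMA_B)
    return id_a == id_b, id_a, id_b
```

Calling `compare_ids("http://localhost:8081", "subject-value")` against each registry should show the difference described above: the same id for A and B on Confluent's SR, different ids on the affected Redpanda version.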

Inside Kafka Connect, the method io.confluent.connect.avro.AvroConverter.toConnectData first looks up the schema by the id in the message. It then applies the same "normalization" as the Confluent SR, removing redundant namespace tags, and posts the normalized schema to /subjects/subject-value?normalize=false&deleted=true, probably to get the latest version of the schema. This step fails with Redpanda's SR (error code 40403, "Schema not found", in the stack trace below) but succeeds with Confluent's SR.
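The failing lookup can be reproduced outside Kafka Connect. This is a rough Python sketch under assumptions: `build_lookup_request` and `lookup_version` are hypothetical names, the base URL is whatever your registry listens on, and the error body is assumed to be the usual JSON object with an `error_code` field. It posts a schema to the subject endpoint with the same query string the converter uses and treats error code 40403 as "not found".

```python
import json
import urllib.error
import urllib.request


def build_lookup_request(base_url, subject, schema):
    """Build the POST /subjects/{subject} lookup request for a schema dict."""
    body = json.dumps({"schema": json.dumps(schema)}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/subjects/{subject}?normalize=false&deleted=true",
        data=body,
        method="POST",
        headers={"Content-Type": "application/vnd.schemaregistry.v1+json"},
    )


def lookup_version(base_url, subject, schema):
    """Return the version registered for the schema, or None on error 40403."""
    request = build_lookup_request(base_url, subject, schema)
    try:
        with urllib.request.urlopen(request) as response:
            return json.load(response)["version"]
    except urllib.error.HTTPError as err:
        # Assumption: the registry answers with {"error_code": ..., "message": ...}.
        error = json.loads(err.read())
        if error.get("error_code") == 40403:
            return None
        raise
```

If schema B was registered and schema A (the normalized form) is then passed to `lookup_version`, a registry that stores B verbatim would be expected to return None, which matches the failure in the stack trace.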

What should have happened instead?

Redpanda's Schema Registry should behave like Confluent's and treat the two schemas above as identical. Whether the Confluent SR's behavior is "correct" is debatable; I have not found any documentation that describes it. But since Confluent maintains widely used open source libraries such as the AvroConverter, Redpanda's SR will be easier to use if it behaves the same way.

How to reproduce the issue?

See above.

Additional information

Stack trace for the Kafka Connect problem:

Caused by: org.apache.kafka.connect.errors.DataException: Failed to deserialize data for topic example-topic to Avro:
at io.confluent.connect.avro.AvroConverter.toConnectData(AvroConverter.java:148)
at org.apache.kafka.connect.runtime.WorkerSinkTask.lambda$convertAndTransformRecord$5(WorkerSinkTask.java:525)
at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndRetry(RetryWithToleranceOperator.java:190)
at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndHandleError(RetryWithToleranceOperator.java:224)
... 14 more
Caused by: org.apache.kafka.common.errors.SerializationException: Error retrieving Avro value schema version for id 65
at io.confluent.kafka.serializers.AbstractKafkaSchemaSerDe.toKafkaException(AbstractKafkaSchemaSerDe.java:703)
at io.confluent.kafka.serializers.AbstractKafkaAvroDeserializer.schemaVersion(AbstractKafkaAvroDeserializer.java:219)
at io.confluent.kafka.serializers.AbstractKafkaAvroDeserializer.deserializeWithSchemaAndVersion(AbstractKafkaAvroDeserializer.java:266)
at io.confluent.connect.avro.AvroConverter$Deserializer.deserialize(AvroConverter.java:199)
at io.confluent.connect.avro.AvroConverter.toConnectData(AvroConverter.java:126)
... 17 more
Caused by: io.confluent.kafka.schemaregistry.client.rest.exceptions.RestClientException: Schema not found; error code: 40403
at io.confluent.kafka.schemaregistry.client.rest.RestService.sendHttpRequest(RestService.java:314)
at io.confluent.kafka.schemaregistry.client.rest.RestService.httpRequest(RestService.java:384)
at io.confluent.kafka.schemaregistry.client.rest.RestService.lookUpSubjectVersion(RestService.java:475)
at io.confluent.kafka.schemaregistry.client.rest.RestService.lookUpSubjectVersion(RestService.java:460)
at io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient.getVersionFromRegistry(CachedSchemaRegistryClient.java:338)
at io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient.getVersion(CachedSchemaRegistryClient.java:538)
at io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient.getVersion(CachedSchemaRegistryClient.java:518)
at io.confluent.kafka.serializers.AbstractKafkaAvroDeserializer.schemaVersion(AbstractKafkaAvroDeserializer.java:201)
... 20 more
@danielfagerstrom danielfagerstrom added the kind/bug Something isn't working label Jul 6, 2023
@BenPope BenPope added the area/schema-registry Schema Registry service within Redpanda label Jul 6, 2023
BenPope added a commit to BenPope/redpanda that referenced this issue Jul 20, 2023
This PR keeps track of the current namespace in a stack, starting with
the implicitly null "empty" namespace.

If the namespace changes, push it to the stack.

If a namespace is redundant (the same as outer scope), remove it.

Fixes redpanda-data#11912

Signed-off-by: Ben Pope
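The approach in the commit message can be illustrated in a few lines of Python. This is not the code from the fix, only a sketch of the idea under simplifying assumptions: the function name `strip_redundant_namespaces` is hypothetical, and a name that already contains a dot is treated as carrying its own namespace and is left as-is.

```python
def strip_redundant_namespaces(schema, stack=None):
    """Return a copy of an Avro schema without redundant "namespace" attributes."""
    if stack is None:
        stack = [""]  # start with the implicit "empty" namespace
    if isinstance(schema, list):  # union: process each branch
        return [strip_redundant_namespaces(s, stack) for s in schema]
    if not isinstance(schema, dict):  # primitive type or reference by name
        return schema

    out = dict(schema)  # shallow copy so the input is not modified
    pushed = False
    if out.get("type") in ("record", "error", "enum", "fixed"):
        name = out.get("name", "")
        if "." in name:
            # A dotted name defines the namespace for everything nested in it.
            current = name.rsplit(".", 1)[0]
        else:
            current = out.get("namespace", stack[-1])
            if out.get("namespace") == stack[-1]:
                del out["namespace"]  # same as the outer scope: redundant
        if current != stack[-1]:
            stack.append(current)  # the namespace changed: push it
            pushed = True

    if "fields" in out:
        out["fields"] = [
            {**field, "type": strip_redundant_namespaces(field["type"], stack)}
            for field in out["fields"]
        ]
    for key in ("items", "values"):  # array items and map values
        if key in out:
            out[key] = strip_redundant_namespaces(out[key], stack)
    if isinstance(out.get("type"), (dict, list)):  # {"type": {...}} wrapper
        out["type"] = strip_redundant_namespaces(out["type"], stack)

    if pushed:
        stack.pop()  # leave the scope of this named type
    return out
```

Applied to schema B from the report, this sketch yields schema A: the nested "namespace": "io.example" matches the top of the stack and is dropped, while the outer one differs from the empty namespace and is kept.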
BenPope added a commit to BenPope/redpanda that referenced this issue Jul 21, 2023
vbotbuildovich pushed a commit to vbotbuildovich/redpanda that referenced this issue Jul 24, 2023
(cherry picked from commit 9cfd7bd)
vbotbuildovich pushed a commit to vbotbuildovich/redpanda that referenced this issue Jul 24, 2023
(cherry picked from commit 9cfd7bd)
vbotbuildovich pushed a commit to vbotbuildovich/redpanda that referenced this issue Jul 24, 2023
(cherry picked from commit 9cfd7bd)
BenPope added a commit to BenPope/redpanda that referenced this issue Jul 24, 2023
(cherry picked from commit 9cfd7bd)
BenPope added a commit to BenPope/redpanda that referenced this issue Jul 24, 2023
(cherry picked from commit 9cfd7bd)
graphcareful pushed a commit to graphcareful/redpanda that referenced this issue Aug 2, 2023
rockwotj pushed a commit to rockwotj/redpanda that referenced this issue Aug 15, 2023
(cherry picked from commit 9cfd7bd)
@dynamike2010

If you still have an issue with "Schema not found", please take a look here:
tabular-io/iceberg-kafka-connect#238 (comment)
