Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

schema_registry: Improve sanitization of Avro namespaces #12334

Merged
merged 5 commits into from
Jul 24, 2023

Conversation

BenPope
Copy link
Member

@BenPope BenPope commented Jul 20, 2023

Improve sanitization of namespaces

  • Ensure that empty namespaces are explicitly set if it overrides the outer scope
  • Remove redundant namespaces

Fixes #11912

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v23.2.x
  • v23.1.x
  • v22.3.x
  • v22.2.x

Release Notes

Bug Fixes

  • schema_registry: Strip redundant namespaces in Avro to improve schema lookup

Pass a context instead of just a memoryt allocator.

Signed-off-by: Ben Pope <[email protected]>
@BenPope BenPope added the area/schema-registry Schema Registry service within Redpanda label Jul 20, 2023
@BenPope BenPope self-assigned this Jul 20, 2023
src/v/pandaproxy/schema_registry/avro.cc Outdated Show resolved Hide resolved
src/v/pandaproxy/schema_registry/test/sanitize_avro.cc Outdated Show resolved Hide resolved
@@ -296,7 +302,21 @@ result<void> sanitize(json::Value::Object& o, sanitize_context& ctx) {
new_namespace = std::move(fullname);
}

if (!new_namespace.empty()) {
if (!new_namespace.has_value()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: From what I can tell, the variable new_namespace represents the namespace taken from the name of a field. For example: name: existing.Simple implies new_namespace="existing"

It may be easier to read this code if we change the variable name for new_namespace to better reflect the above. For example: new_namespace -> namespace_from_field_name

This is a nit though and it could be apart of one of our tickets to refactor the Schema Registry.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

new_namespace is the value of the namespace attribute we expect to put onto the object, but that's not true until the fullname has been split, and if it's not a fullname, the existing namespace, if any, has been set. Maybe I could make it more functional.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm adding a link to this comment in one of our epics for SR startup for tracking.
Regardless, this comment does not block the PR

src/v/pandaproxy/schema_registry/avro.cc Show resolved Hide resolved
src/v/pandaproxy/schema_registry/avro.cc Show resolved Hide resolved
src/v/pandaproxy/schema_registry/avro.cc Show resolved Hide resolved
src/v/pandaproxy/schema_registry/avro.cc Show resolved Hide resolved
@BenPope
Copy link
Member Author

BenPope commented Jul 20, 2023

If the name field is a fullname, then an existing namespace is ignored.

Sanitize splits the fullname into name and namespace, so rewrite the
namespace rather than returning an error.

Reference: https://avro.apache.org/docs/1.11.1/specification/#names

Signed-off-by: Ben Pope <[email protected]>
Check that simple namespace cases are left alone.

Signed-off-by: Ben Pope <[email protected]>
This PR keeps track of the current namespace in a stack, starting with
the implicitly null "empty" namespace.

If the namespace changes, push it to the stack.

If a namespace is redundant (the same as outer scope), remove it.

Fixes redpanda-data#11912

Signed-off-by: Ben Pope <[email protected]>
@NyaliaLui NyaliaLui self-requested a review July 21, 2023 15:20
Copy link
Contributor

@NyaliaLui NyaliaLui left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm. CI Failures look unrelated from what I can tell.

Copy link
Contributor

@michael-redpanda michael-redpanda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@BenPope BenPope changed the title schema_registry: Improve sanitization of namespaces schema_registry: Improve sanitization of Avro namespaces Jul 24, 2023
@piyushredpanda piyushredpanda merged commit dd9d851 into redpanda-data:dev Jul 24, 2023
33 of 35 checks passed
@vbotbuildovich
Copy link
Collaborator

/backport v23.2.x

@vbotbuildovich
Copy link
Collaborator

/backport v23.1.x

@vbotbuildovich
Copy link
Collaborator

/backport v22.3.x

@vbotbuildovich
Copy link
Collaborator

/backport v22.2.x

@vbotbuildovich
Copy link
Collaborator

Failed to run cherry-pick command. I executed the commands below:

git checkout -b backport-pr-12334-v22.2.x-963 remotes/upstream/v22.2.x
git cherry-pick -x 312817c246464d0ef2412bdb890c3d84206bdff1 29af7c8dd3ab3dceff95fe62d74de60fce5f2b0e 6e7c061b487416ae664e0a68d9882d9cdc8f04cc 9cfd7bd73731c2f865e59ac91460a9608f9054c8 d73b2c8945f404f9b7574d1bac19bf648df033f2

Workflow run logs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/redpanda area/schema-registry Schema Registry service within Redpanda
Projects
None yet
5 participants