Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(tsm): Allow for deletion of series outside default rp #25312

Merged
merged 2 commits into from
Sep 17, 2024

Conversation

devanbenz
Copy link

@devanbenz devanbenz commented Sep 11, 2024

Modifies logic of TSDB deletion to include filtering by retention policy

Part one of this FR is located here: influxdata/influxql#71

This is part two, InfluxQL is updated here and this adds filtering by shard's retention policy within the delete series method.

Closes https://github.com/influxdata/feature-requests/issues/175

@devanbenz devanbenz self-assigned this Sep 11, 2024
@devanbenz devanbenz marked this pull request as draft September 11, 2024 20:29
@devanbenz devanbenz marked this pull request as ready for review September 12, 2024 19:58
@davidby-influx davidby-influx changed the title feat(edge): Allow for deletion of series outside default rp feat(tsm): Allow for deletion of series outside default rp Sep 12, 2024
tsdb/store.go Outdated Show resolved Hide resolved
Copy link
Contributor

@davidby-influx davidby-influx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think your shard copying algorithm is wrong, or I am misunderstanding it.

tsdb/store.go Outdated Show resolved Hide resolved
tsdb/store.go Outdated Show resolved Hide resolved
tsdb/store.go Outdated Show resolved Hide resolved
tsdb/store.go Outdated Show resolved Hide resolved
tsdb/store.go Outdated Show resolved Hide resolved
Copy link
Member

@gwossum gwossum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few comments, and a large concern related to filtering by retention policy.

@devanbenz
Copy link
Author

devanbenz commented Sep 13, 2024

This seems wrong. Is the goal to only return shards that every function returns true for, and to only return each shard once?

You are allocating the shards slice once per function. That means that you are only getting the shards which pass the last function in the function slice, correct?

Do you want to instead, walk the shards in the outer loop, and the functions in the inner loop, and append a shard only if all functions return true for that shard?

I'm thinking of something like this. Am I misunderstanding your algorithm?

for _, sh := range s.shards {
        votes := 0
	for _, fn := range fns {
                if fn == nil || fn(sh) {
                        votes++
                }                      
        }
        if votes == len(fns) {
                   shards = append(shards, sh)
        }
}

You are correct, revisiting this there is a bug in the shard filtering logic. It will effectively throw away the "database" filtered shards using the byDatabase filter then create a new slice of shards and only filter by the next function (in this case byRetentionPolicy). The logic was operating correctly on existing tests just due to the fact that all other calls to the method are only using a single filter function... introduce more and it would not filter correctly.

Furthermore, If I were to write another test where I have two or more databases with the same retention policy name this would effectively delete series across multiple databases which is not the desired functionality or current way this works. I suspect that what ever database is the currently selected database should be processing the deletes. I'll go ahead and adjust this today.

Copy link
Contributor

@davidby-influx davidby-influx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A suggestion

tsdb/store.go Outdated Show resolved Hide resolved
tsdb/store.go Outdated
if len(sources) != 0 {
if measurement, ok := sources[0].(*influxql.Measurement); ok {
if measurement.RetentionPolicy != "" {
shardFilter = func(sh *Shard) bool {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be cleaner here to actually make a composed function that calls the first shardFilter function. That way we can change byDatabase to do more or less or different things.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something like this:

func ComposeFilter(original func(i int) bool, addition func(i int) bool) func(i int) bool {

	return func(i int) bool { return original(i) && addition(i) }
}

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do like that suggestion, I think it looks really clean.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ComposeFilter seems needlessly complicated, especially since it would get used one place.

You could just make use byDatabase in the new shardFilter definition:

shardFilter := func(sh *Shard) bool { return byDatabase(sh) && sh.retentionPolicy == measurement.RetentionPolicy }

You could then pull sh.retentionPolicy == measurement.RetentionPolicy into a byRetentionPolicy function if you wanted.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would probably change the name of shardFilter to something else because it is too similar to filterShards. My brain crossed them together the first time I read the code. Maybe change to shardFilter to sf or filter.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do support wildcards for measurements in DELETE. Please verify the syntax to use, and add tests including them.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do support wildcards for measurements in DELETE. Please verify the syntax to use, and add tests including them.

Sounds good 🫡

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@devanbenz You could do byDatabase(database)(sh) && byRetentionPolicy(retention)(sh), but it's probably better to call byDatabase() and byRetentionPolicy() outside the lambda and assign them to variables, then use them inside the lambda. Or use @davidby-influx's ComposeFilter idea. Another way to do the compose filter would be something like:

func ComposeFilters(fns ...func(sh *Shard) bool) func(sh *Shard) bool {
   return func(sh *Shard) bool {
        for _, f := range fns {
            if !f(sh) {
                return false
            }
        }
        return true
    }
}

....

    shards := s.filterShards(ComposeFilters(byDatabase(db), byRetentionPolicy(rp)))

I find that way more readable, but that's personal preference.

I'd also be tempted to add a type ShardPredicate func(sh *Shard) bool 😁

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes! @gwossum this is similar to what I was attempting to go for with my earlier design of passing in multiple functions but much much more elegant 🙏

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay modifications have been made -- testing of wildcards as well as adding the composable function for shard filtering + I refactored other areas that is calling the function to include the composition fn

davidby-influx
davidby-influx previously approved these changes Sep 17, 2024
Copy link
Contributor

@davidby-influx davidby-influx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Member

@gwossum gwossum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good! I still think we should handle more than one source, lest we regret it in the future.

tsdb/store.go Outdated
Comment on lines 1527 to 1544
if len(sources) != 0 {
if measurement, ok := sources[0].(*influxql.Measurement); ok {
if measurement.RetentionPolicy != "" {
shardFilterFn = ComposeShardFilter(shardFilterFn, byRetentionPolicy(measurement.RetentionPolicy))
}
} else {
return fmt.Errorf("unsupported source type in delete %v", sources[0])
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we decided the parser wouldn't allow more than a single source, but I think it would still be good to check for that here and return an error. There might be another way to get more than measurement (Flux? Probably not, but are you sure?). There also might be a change made to the parser that allows more than one source, and if we don't check for that possibility here that could lead to strange issues in the future that we spend weeks tracking down. Returning an error instead what make it clear what happened.

9d116f6
This PR adds the ability for deletion of series that are outside
of the default retention policy. This updates InfluxQL to include changes
from: influxdata/influxql#71

closes: influxdata/feature-requests#175
9d116f6
This PR adds the ability for deletion of series that are outside
of the default retention policy. This updates InfluxQL to include changes
from: influxdata/influxql#71

closes: influxdata/feature-requests#175
Copy link
Member

@gwossum gwossum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@devanbenz devanbenz merged commit 8eaa24d into master-1.x Sep 17, 2024
9 checks passed
@devanbenz devanbenz deleted the feat/175-delete-records-rp branch September 17, 2024 21:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants