feat: versioned deployments for UDFs #25121
base: master
Looks good to me. I left a few suggestions, but nothing that stops it from merging.
# 3. Copy the `user_defined_function.xml` file in the newly created version folder (e.g. `user_scripts/v4/user_defined_function.xml`) to the `posthog-cloud-infra` repo and deploy it
# 4. After that deploy goes out, it is safe to land and deploy the changes to the `posthog` repo
# If deploys aren't seamless, look into moving the action that copies the `user_scripts` folder to the clickhouse cluster earlier in the deploy process
UDF_VERSION = 0 # Last modified by: @aspicer, 2024-09-20
Should we do versioning here by analogy with Django migrations? Django has `latest_migration.manifest`, which contains the latest versions. Additionally, the version name is not unique, so if someone doesn't update the comment after the version, git won't catch it unless there are merge conflicts in the files. A hash suffix or an automated comment makes more sense here. It is probably not a big deal right now, since not many folks work with UDFs.
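The hash-suffix idea in this comment could be sketched roughly as follows. This is a hypothetical helper, not part of this PR: it assumes a manifest file (called `udf_manifest.json` here) that records the current `UDF_VERSION` together with a SHA-256 digest of the top-level `user_defined_function.xml`, so that a signature change made without a version bump gets flagged.

```python
import hashlib
import json
from pathlib import Path


def xml_digest(xml_path: Path) -> str:
    """Return a short SHA-256 digest of the UDF signature file."""
    return hashlib.sha256(xml_path.read_bytes()).hexdigest()[:12]


def write_manifest(manifest_path: Path, version: int, xml_path: Path) -> None:
    """Record the version and the digest of the signatures it was cut from."""
    manifest = {"version": version, "xml_sha256": xml_digest(xml_path)}
    manifest_path.write_text(json.dumps(manifest, indent=2) + "\n")


def check_manifest(manifest_path: Path, version: int, xml_path: Path) -> list[str]:
    """Return a list of problems; an empty list means the manifest is consistent."""
    manifest = json.loads(manifest_path.read_text())
    problems = []
    if manifest["version"] != version:
        # The version was bumped (or reverted) without regenerating the manifest.
        problems.append(f"manifest is at v{manifest['version']}, code is at v{version}")
    elif manifest["xml_sha256"] != xml_digest(xml_path):
        # Same version number, but the signatures changed underneath it.
        problems.append("UDF signatures changed without bumping UDF_VERSION")
    return problems
```

Because the digest is derived from the file contents rather than a hand-edited comment, two people changing signatures under the same version number would both produce a manifest diff, which git surfaces as a conflict.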
Actually, `latest_user_defined_function.xml` should catch such cases. I suppose the most significant breaking change here is a difference in function definitions, correct?
Yes, absolutely. This should get caught in `user_defined_function.xml` inside of the `docker` folder. It definitely won't merge without conflicts if two people have changed the signatures differently. I think this is okay; if it becomes an issue, we can add something like Django!
@@ -0,0 +1,70 @@
import argparse
Should it be a Django management script?
I don't think there's any good reason to add that dependency, since these files have to run without Django and this is really a completely separate subsystem.
I added a test that fails if
Requires: https://github.com/PostHog/posthog-cloud-infra/pull/3059
Problem
We want to start rolling out UDFs to customers.
Currently, if the UDF schema changes and PostHog is then rolled back, the UDFs stop working.
Changes
Create a versioning system for UDFs. Every time we make a schema change or a breaking functionality change, we need to create a new version.
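Why versioning makes rollbacks safe can be shown with a toy model. This is a minimal sketch under assumptions this PR does not state explicitly: each version's functions are installed on ClickHouse side by side under a `_v{N}` suffix, and the app resolves names for the version it was built against. `aggregate_funnel`, `INSTALLED_ON_CLUSTER`, `versioned_name`, and `udf_available` are all hypothetical.

```python
# Hypothetical: every UDF version is installed on the cluster side by side.
INSTALLED_ON_CLUSTER = {
    "aggregate_funnel_v0",
    "aggregate_funnel_v1",
}


def versioned_name(base_name: str, version: int) -> str:
    """Map a logical UDF name to the name registered for a given version."""
    return f"{base_name}_v{version}"


def udf_available(base_name: str, app_udf_version: int) -> bool:
    """Can an app release pinned to `app_udf_version` call this UDF?"""
    return versioned_name(base_name, app_udf_version) in INSTALLED_ON_CLUSTER


# The new release (v1) and a rolled-back release (v0) both resolve to an
# installed definition, because installing v1 did not replace v0.
print(udf_available("aggregate_funnel", 1), udf_available("aggregate_funnel", 0))
```

If v1 had instead overwritten an unversioned `aggregate_funnel`, a rollback to code expecting the old signature would call a function whose types no longer match, which is the failure mode described under Problem.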
The process is documented at the top of `udf_versioner.py`, but I will copy it here for posterity and ease:

For revertible cloud deploys:
1. … `user_scripts`, along with `user_defined_function.xml` inside of `docker/clickhouse`
2. … `udf_versioner.py` and run that file every time you make breaking changes to UDFs (likely involving type definitions).
3. Copy the `user_defined_function.xml` file in the newly created version folder (e.g. `user_scripts/v4/user_defined_function.xml`) to the `posthog-cloud-infra` repo and deploy it
4. After that deploy goes out, it is safe to land and deploy the changes to the `posthog` repo

If deploys aren't seamless, look into moving the action that copies the `user_scripts` folder to the clickhouse cluster earlier in the deploy process.

This ends up creating a `user_scripts` directory that looks like the following:

The `user_defined_function.xml` file of each version contains all of the signatures of the versions before it.

Finally, the mapping code is updated to look like this:
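The snapshot step described above (a `v{N}` folder per version whose XML also carries every earlier version's signatures) could be produced roughly as follows. This is a hypothetical sketch, not the PR's `udf_versioner.py` or its mapping code. It assumes the XML uses ClickHouse's `<functions><function>` layout, that versioned functions are renamed with a `_v{N}` suffix, and that each versioned `<command>` is prefixed with its `v{N}/` folder; `create_version` and `versioned_xml` are illustrative names.

```python
import shutil
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import Optional


def versioned_xml(dev_xml: Path, previous_xml: Optional[Path], version: int) -> ET.Element:
    """Build the <functions> tree for `version`.

    Every <function> from the previous version's XML is carried over as-is,
    then the development functions are appended with a `_v{version}` suffix,
    so the resulting file also covers all earlier versions.
    """
    root = ET.Element("functions")
    if previous_xml is not None:
        for fn in ET.parse(previous_xml).getroot().findall("function"):
            root.append(fn)
    for fn in ET.parse(dev_xml).getroot().findall("function"):
        name = fn.find("name")  # assumed present on every <function>
        name.text = f"{name.text}_v{version}"
        command = fn.find("command")
        if command is not None:
            # Assumed: scripts are snapshotted into v{N}/ under user_scripts.
            command.text = f"v{version}/{command.text}"
        root.append(fn)
    return root


def create_version(user_scripts: Path, dev_xml: Path, version: int) -> Path:
    """Snapshot top-level scripts into user_scripts/v{version}/ and write its XML."""
    target = user_scripts / f"v{version}"
    target.mkdir(parents=True, exist_ok=False)  # refuse to overwrite a version
    for item in user_scripts.iterdir():
        if item.is_file():  # version folders are directories, so they are skipped
            shutil.copy2(item, target / item.name)
    prev = user_scripts / f"v{version - 1}" / "user_defined_function.xml"
    root = versioned_xml(dev_xml, prev if prev.exists() else None, version)
    ET.ElementTree(root).write(target / "user_defined_function.xml")
    return target
```

Under these assumptions, the newest version's XML lists every earlier `_v{N}` function, so deploying it to the cluster keeps the definitions an older `posthog` release depends on.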
Does this work well for both Cloud and self-hosted?
No impact for self-hosted or dev, only cloud.
How did you test this code?
Ran the script, checked the output.
Updated CI to use the cloud versioned config and functions for testing.
Deploy Plan
- … `user_defined_function.xml` file with the v0 definitions to cloud-infra