Beam Enrich has been moved to the snowplow/enrich repo, where it is still actively maintained.
Beam Enrich processes raw Snowplow events from an input GCP PubSub subscription, enriches them, and stores them in an output PubSub topic. Events are enriched using the scala-common-enrich library.
This project uses sbt-native-packager.
To build the zip archive, run:
sbt universal:packageBin
To build a Docker image, run:
sbt docker:publishLocal
Once unzipped, the artifact can be run as follows (the --pii and --labels options are optional):
./bin/beam-enrich \
--runner=DataFlowRunner \
--project=project-id \
--streaming=true \
--zone=europe-west2-a \
--gcpTempLocation=gs://location/ \
--raw=projects/project/subscriptions/raw-topic-subscription \
--enriched=projects/project/topics/enriched-topic \
--bad=projects/project/topics/bad-topic \
--pii=projects/project/topics/pii-topic \
--resolver=iglu_resolver.json \
--labels='{"label": "value"}' \
--enrichments=enrichments/
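Because a trailing comment after a line-continuation backslash breaks a shell command, optional flags are easier to handle in a small wrapper. The following is a minimal sketch, not part of Beam Enrich: the function name run_beam_enrich and the environment variables PII_TOPIC, LABELS and DRY_RUN are hypothetical names introduced for this example, and the argument values are the placeholders from the command above.

```shell
#!/bin/sh
# Sketch of a launcher that adds the optional flags only when requested.
# PII_TOPIC, LABELS and DRY_RUN are hypothetical variables for this example;
# beam-enrich itself does not read them.

run_beam_enrich() {
  # Start from the required arguments shown in the command above.
  set -- \
    --runner=DataFlowRunner \
    --project=project-id \
    --streaming=true \
    --zone=europe-west2-a \
    --gcpTempLocation=gs://location/ \
    --raw=projects/project/subscriptions/raw-topic-subscription \
    --enriched=projects/project/topics/enriched-topic \
    --bad=projects/project/topics/bad-topic \
    --resolver=iglu_resolver.json \
    --enrichments=enrichments/

  # Append each optional flag as a single quoted argument, so that a JSON
  # value containing spaces is not split by the shell.
  if [ -n "${PII_TOPIC:-}" ]; then
    set -- "$@" "--pii=${PII_TOPIC}"
  fi
  if [ -n "${LABELS:-}" ]; then
    set -- "$@" "--labels=${LABELS}"
  fi

  if [ -n "${DRY_RUN:-}" ]; then
    # Print one argument per line instead of launching the job.
    printf '%s\n' "$@"
  else
    ./bin/beam-enrich "$@"
  fi
}

# Example (dry run with labels):
#   DRY_RUN=1 LABELS='{"label": "value"}' run_beam_enrich
```

With DRY_RUN set, the function only prints the arguments it would pass, which is a cheap way to check quoting before submitting a job.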
To display the help message:
./bin/beam-enrich \
--runner=DataFlowRunner \
--help
A container can be run as follows (GOOGLE_APPLICATION_CREDENTIALS is only needed when running outside GCP; the --pii and --labels options are optional):
docker run \
-v "$PWD/config":/snowplow/config \
-e GOOGLE_APPLICATION_CREDENTIALS=/snowplow/config/credentials.json \
snowplow/beam-enrich:1.2.0 \
--runner=DataFlowRunner \
--job-name=snowplow-enrich \
--project=project-id \
--streaming=true \
--zone=europe-west2-a \
--gcpTempLocation=gs://location/ \
--raw=projects/project/subscriptions/raw-topic-subscription \
--enriched=projects/project/topics/enriched-topic \
--bad=projects/project/topics/bad-topic \
--pii=projects/project/topics/pii-topic \
--resolver=/snowplow/config/iglu_resolver.json \
--labels='{"label": "value"}' \
--enrichments=/snowplow/config/enrichments/
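The container command assumes that the mounted config directory holds the resolver, the enrichments directory and, outside GCP, the credentials file. A minimal sketch of a pre-flight check is shown below; check_config_dir is a hypothetical helper written for this example and is not shipped with Beam Enrich. The file names are taken from the paths used in the command above.

```shell
#!/bin/sh
# Sketch: check a local config directory before mounting it at /snowplow/config.
# check_config_dir is a hypothetical helper, not part of Beam Enrich.

check_config_dir() {
  dir="${1:-$PWD/config}"
  missing=0

  # The resolver is passed to the job as /snowplow/config/iglu_resolver.json.
  if [ ! -f "$dir/iglu_resolver.json" ]; then
    echo "missing file: $dir/iglu_resolver.json" >&2
    missing=1
  fi

  # Enrichment configurations are read from /snowplow/config/enrichments/.
  if [ ! -d "$dir/enrichments" ]; then
    echo "missing directory: $dir/enrichments" >&2
    missing=1
  fi

  # Credentials are only needed outside GCP, so this is a note, not an error.
  if [ ! -f "$dir/credentials.json" ]; then
    echo "note: $dir/credentials.json not found (fine when running on GCP)" >&2
  fi

  return "$missing"
}

# Example:
#   check_config_dir "$PWD/config" && echo "config directory looks complete"
```

The function returns a non-zero status when the resolver or the enrichments directory is missing, so it can guard the docker run call in a script.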
To display the help message:
docker run snowplow/beam-enrich:1.2.0 \
--runner=DataFlowRunner \
--help
Note that, for the enrichments relying on local files, the files need to be accessible from Google Cloud Storage, e.g. for the IP lookups enrichment:
{
  "schema": "iglu:com.snowplowanalytics.snowplow/ip_lookups/jsonschema/2-0-0",
  "data": {
    "name": "ip_lookups",
    "vendor": "com.snowplowanalytics.snowplow",
    "enabled": true,
    "parameters": {
      "geo": {
        "database": "GeoLite2-City.mmdb",
        "uri": "gs://beam-enrich-test/maxmind"
      }
    }
  }
}
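To make the relationship between the two fields concrete, the sketch below joins them into a single object path. It assumes that the database file is looked up at the "uri" value followed by a slash and the "database" value; geo_db_object is a hypothetical helper introduced here, and the bucket name is the placeholder from the configuration above.

```shell
#!/bin/sh
# Sketch: build the Cloud Storage object path from the "uri" and "database"
# fields, assuming the file is looked up at <uri>/<database>.
# geo_db_object is a hypothetical helper, not part of Beam Enrich.

geo_db_object() {
  # $1 = "uri" value, $2 = "database" value.
  # "${1%/}" drops one trailing slash so the result has exactly one separator.
  printf '%s/%s\n' "${1%/}" "$2"
}

# Example: upload a local copy of the database to that location with gsutil.
#   gsutil cp GeoLite2-City.mmdb \
#     "$(geo_db_object gs://beam-enrich-test/maxmind GeoLite2-City.mmdb)"
```

Under that assumption, the configuration above would point at gs://beam-enrich-test/maxmind/GeoLite2-City.mmdb.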
A full list of all the Beam CLI options can be found at: https://cloud.google.com/dataflow/pipelines/specifying-exec-params#setting-other-cloud-pipeline-options.
To run the tests:
sbt test
To experiment with the current codebase in the Scio REPL, run:
sbt repl/run
Technical Docs | Setup Guide
Copyright 2018-2020 Snowplow Analytics Ltd.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this software except in compliance with the License.
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.