Stream Enrich: New Hope #328

chuwy · 2020-09-01T11:47:13Z

modules/fs2/src/main/scala/com/snowplowanalytics/snowplow/enrich/fs2/Main.scala

modules/fs2/src/main/scala/com/snowplowanalytics/snowplow/enrich/fs2/Payload.scala

benjben · 2020-09-16T08:34:42Z

modules/fs2/src/main/scala/com/snowplowanalytics/snowplow/enrich/fs2/Enrich.scala

+        }
+        .map(enriched => Payload(enriched, row.ack))
+
+    result.handleErrorWith(sendToSentry[F](row, sentry))


Shouldn't we send the payload rather than the row? That way we wouldn't need to Thrift-deserialize it to troubleshoot.

I think sendToSentry is probably slightly misleading. It does three things:

Sends an exception to Sentry (we cannot send anything from an event because it can contain PII data)

Creates a generic_error bad row - that's why we need a row

Logs an error

My point is, why not use payload instead of row in the generic bad row created? An array of Thrift bytes is not very useful to troubleshoot, compared with a BadRow (CPFormatViolation) or a CollectorPayload

Ah, ok makes sense now!

The reason was that our payload is something like ValidatedNel[CPFormatViolation, Option[CollectorPayload]], so we don't really have a parsed payload yet. We technically can pattern-match on it, like:

    payload match {
      case Validated.Invalid(errors) =>
        // what to do here? why were we trying to process an error in the first place?
        errors // thrift bytes anyway
      case Validated.Valid(payload) =>
        // but what if it's not an adapter failure? an enrichment failure?
        // then we need a raw event to construct the bad row
        turnIntoAdapterFailure(payload)
    }

It also feels weird to produce different kinds of bad rows from the same place, so I decided to stick with the most generic one.

I was thinking of putting something like show"$payload" in the generic_error, whatever it contains.

But in that case we won't have a clear way to recover it. I think it's an important promise that whenever you have a generic_error coming from enrich, you can base64-decode the payload in order to recover it.
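
To illustrate the recoverability promise, here is a minimal sketch in plain Scala. `GenericError`, `fromRawBytes` and `recover` are hypothetical names invented for this example and are not the real bad row types; the only assumption carried over from the discussion is that the payload field holds the base64-encoded raw bytes.

```scala
import java.util.Base64

object GenericErrorRecovery {
  // Hypothetical stand-in for a generic_error bad row: a message plus
  // the original bytes, base64-encoded so they survive as a string
  final case class GenericError(message: String, payloadBase64: String)

  // Wrap the raw (e.g. Thrift) bytes without trying to interpret them
  def fromRawBytes(message: String, raw: Array[Byte]): GenericError =
    GenericError(message, Base64.getEncoder.encodeToString(raw))

  // Recovery is just the inverse: decode back to the exact original bytes
  def recover(error: GenericError): Array[Byte] =
    Base64.getDecoder.decode(error.payloadBase64)
}
```

Putting something like show"$payload" into the same field would break `recover`, because the rendered string is not a base64 encoding of the original bytes.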

It's actually a bug that ThriftLoader.toCollectorPayload's signature says it can return multiple bad rows when it can only return one.

So in the generic_error bad row we could put either the single CPFormatViolation (recoverable) or the raw extracted CollectorPayload (recoverable).
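
A rough sketch of that idea, assuming the loader result is narrowed to a single violation. It uses the standard library's `Either` in place of cats' `ValidatedNel`, and the case classes, their fields and `describe` are simplified placeholders made up for this example, not the real Snowplow types.

```scala
object PayloadOutcome {
  // Placeholder types; the real CPFormatViolation and CollectorPayload carry far more data
  final case class CPFormatViolation(message: String)
  final case class CollectorPayload(querystring: String)

  // With at most one violation, the result has exactly three shapes,
  // so choosing what to embed in the generic_error is a total match
  def describe(parsed: Either[CPFormatViolation, Option[CollectorPayload]]): String =
    parsed match {
      case Left(violation)       => s"format violation: ${violation.message}"
      case Right(Some(payload))  => s"collector payload: ${payload.querystring}"
      case Right(None)           => "empty payload"
    }
}
```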

chuwy · 2020-10-11T15:37:20Z

Hey @benjben! I addressed all your feedback, added a few more tests and a couple of tickets (#370 - depends on NH, #371 was also discovered while I was writing tests). If anyone else from @snowplow/com-snowplowanalytics-engineering-datacapability wants to have a look - you're welcome. Otherwise this should be ready.

oguzhanunlu · 2020-10-19T11:40:49Z

config/config.fs2.hocon.sample

@@ -0,0 +1,23 @@
+auth = {
+  type = "Gcp"


Acronyms are generally all-caps. Is there a specific reason to use Gcp?

From a user's perspective, it'd be useful if we could see valid values of configuration fields, e.g. is gcp or GCP valid here? Or do we want to rely on user-friendly error messages explaining what's wrong and how to fix it?

It'd be nice to see which fields are optional and which values are used by default when applicable, e.g. assetsUpdatePeriod if it is not configured.

The spelling was dictated by codec derivation (it uses the exact case class name). I decided not to change it for now as Gcp is the only valid value here, but I agree this is something that should be fixed.

I'll add comments to the config file.
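
One possible way to decouple the accepted spelling from the case class name, sketched in plain Scala without any codec library. `AuthType`, `Auth` and `parse` are hypothetical names for this example; the real config is decoded through derived codecs, so this only shows the lookup and the error message a hand-written decoder could produce.

```scala
import java.util.Locale

object AuthType {
  sealed trait Auth
  case object Gcp extends Auth

  // Keys are lowercase; "Gcp", "gcp" and "GCP" all resolve to the same value
  private val known: Map[String, Auth] = Map("gcp" -> Gcp)

  // Returns the parsed value or an error message listing the valid values
  def parse(raw: String): Either[String, Auth] =
    known.get(raw.trim.toLowerCase(Locale.ROOT)) match {
      case Some(auth) => Right(auth)
      case None =>
        val valid = known.keys.toList.sorted.mkString(", ")
        Left(s"Unknown auth type '$raw'. Valid values (case-insensitive): $valid")
    }
}
```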

oguzhanunlu · 2020-10-19T11:55:39Z

modules/fs2/src/main/scala/com/snowplowanalytics/snowplow/enrich/fs2/Assets.scala

+
+  object State {
+
+    /** Test pair is used in tests to initialize HTTP client */


Why do we have it here if it is used in tests?

Because we need to skip it here https://github.com/snowplow/enrich/pull/328/files/72e8d30543915d21af2150223cb1598d043e754a#diff-638a4515052fa93eecdd441517d16652eb2cd047af5d77847997c83602d1eee9R92

Looking at L92, it seems it isn't used in tests only, could you update this scaladoc?

I hadn't noticed your comment when I posted the last one. Still, I think the scaladoc needs an update.

Why does production code need to know anything about test code?

lukeindykiewicz

Looks good in general! The PR is a bit too big to read very carefully.

lukeindykiewicz · 2020-10-19T17:31:38Z

.github/workflows/test.yml

-      if: ${{ always() }}
-      run: sbt coveralls
-      env:
-        COVERALLS_REPO_TOKEN: ${{ secrets.COVERALLS_REPO_TOKEN }}
    - name: Check Scala formatting
      if: ${{ always() }}
      run: sbt scalafmtCheck


Could you change to scalafmtCheckAll and add scalafmtSbtCheck, please?

lukeindykiewicz · 2020-10-19T19:05:15Z

modules/fs2/src/main/scala/com/snowplowanalytics/snowplow/enrich/fs2/Assets.scala

+
+  object State {
+
+    /** Test pair is used in tests to initialize HTTP client */


Why does production code need to know anything about test code?

lukeindykiewicz · 2020-10-19T20:14:49Z

modules/fs2/src/main/scala/com/snowplowanalytics/snowplow/enrich/fs2/Assets.scala

+  final case class Hash private (s: String) extends AnyVal
+
+  object Hash {
+    private[this] def fromBytes(bytes: Array[Byte]): Hash = {


This small function should have a test
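
A sketch of what such a test could pin down. The diff above does not show the body of fromBytes, so the MD5 hex digest below is purely an assumption for illustration, and `HashSketch` is a hypothetical wrapper so the example compiles on its own.

```scala
import java.security.MessageDigest

object HashSketch {
  final case class Hash(s: String)

  object Hash {
    // Assumption: the hash is an MD5 digest rendered as lowercase hex;
    // the real implementation may use a different algorithm or encoding
    def fromBytes(bytes: Array[Byte]): Hash = {
      val digest = MessageDigest.getInstance("MD5").digest(bytes)
      // Mask to 0..255 so negative bytes still render as two hex digits
      Hash(digest.map(b => "%02x".format(b & 0xff)).mkString)
    }
  }
}
```

Known digests make convenient fixtures: under the MD5 assumption, the empty input hashes to d41d8cd98f00b204e9800998ecf8427e and "abc" to 900150983cd24fb0d6963f7d28e17f72.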

lukeindykiewicz · 2020-10-19T20:19:09Z

modules/fs2/src/main/scala/com/snowplowanalytics/snowplow/enrich/fs2/Assets.scala

+          // side-effecting get-set is inherently not thread-safe
+          // we need to be sure the state.stop is set to true
+          // before re-initializing enrichments
+          _ <- Logger[F].info(s"Unpausing enrich stream")


Unpausing -> Resuming

Would be good to stick to one, either show or s.

chuwy assigned benjben Sep 1, 2020

chuwy changed the base branch from master to feature/cats2 September 1, 2020 11:47

benjben reviewed Sep 14, 2020

modules/fs2/src/main/scala/com/snowplowanalytics/snowplow/enrich/fs2/Main.scala (outdated)

chuwy changed the title from Stream Enrich NG to Stream Enrich: New Hope Sep 14, 2020

benjben reviewed Sep 15, 2020

modules/fs2/src/main/scala/com/snowplowanalytics/snowplow/enrich/fs2/Main.scala (outdated)

chuwy force-pushed the feature/cats2 branch 2 times, most recently from 8ea132b to ada7cd5 Compare September 15, 2020 22:28

chuwy force-pushed the feature/fs2-enrich branch 3 times, most recently from 352044f to 15da7b7 Compare September 15, 2020 22:56

benjben reviewed Sep 16, 2020

modules/fs2/src/main/scala/com/snowplowanalytics/snowplow/enrich/fs2/Payload.scala

benjben reviewed Sep 16, 2020

chuwy force-pushed the feature/cats2 branch from ada7cd5 to dd8ed1b Compare September 16, 2020 17:52

chuwy force-pushed the feature/fs2-enrich branch from cde51dc to 399ad89 Compare September 16, 2020 18:04

chuwy force-pushed the feature/cats2 branch 3 times, most recently from 2cbe8b4 to e3dfc3e Compare September 17, 2020 12:49

chuwy force-pushed the feature/fs2-enrich branch from 399ad89 to 4e27039 Compare September 17, 2020 14:03

chuwy changed the base branch from feature/cats2 to develop September 17, 2020 14:04

chuwy force-pushed the feature/fs2-enrich branch 4 times, most recently from 2b8c98f to 3f425e3 Compare September 20, 2020 14:31

chuwy force-pushed the feature/fs2-enrich branch 4 times, most recently from 8df2f4b to e80962a Compare October 6, 2020 10:42

chuwy requested a review from a team October 6, 2020 10:43

chuwy force-pushed the feature/fs2-enrich branch 2 times, most recently from 8b0d3a3 to cc5d6b1 Compare October 6, 2020 16:12

chuwy mentioned this pull request Oct 9, 2020

Common: Loader.toCollectorPayload should emit only one bad row and not a list #56

Open

chuwy force-pushed the feature/fs2-enrich branch 3 times, most recently from 437a69c to e367370 Compare October 11, 2020 15:21

chuwy force-pushed the feature/fs2-enrich branch 6 times, most recently from 9851e1e to 6e25f3b Compare October 16, 2020 16:27

chuwy force-pushed the develop branch from b7cb02f to 9683e92 Compare October 16, 2020 16:31

chuwy force-pushed the feature/fs2-enrich branch 6 times, most recently from f0acad3 to a874162 Compare October 17, 2020 08:40

oguzhanunlu reviewed Oct 19, 2020

lukeindykiewicz approved these changes Oct 19, 2020

chuwy added 4 commits October 21, 2020 08:47

Stream FS2: add (close #346)

b5daeee

Common: add benchmarking module (close #370)

0f250f9

Common: fix NullPointerException on serializing invalid state (close #371)

af2278b

Common: make assets publishing independent of each other (close #373)

c61fa01

chuwy force-pushed the feature/fs2-enrich branch 2 times, most recently from bfc2b5f to c61fa01 Compare October 21, 2020 11:27

chuwy merged commit c61fa01 into develop Oct 21, 2020

chuwy deleted the feature/fs2-enrich branch October 21, 2020 14:54

chuwy mentioned this pull request Oct 21, 2020

Release/1.4.0 #363

Merged

1 task
