⁉️ When pipeline is re-run for this user, it doesn't generate cleaned objects #1071

Open
JGreenlee opened this issue May 6, 2024 · 5 comments

Comments

@JGreenlee

JGreenlee commented May 6, 2024

  • Using e-mission-server master

  • Load the Apr 24 dump for dfc-fermata

  • Inspect the cleaned objects for my opcode:

    import emission.storage.timeseries.abstract_timeseries as esta
    import emission.core.wrapper.user as ecwu
    
    opcode = "nrelop_dfc-fermata_test_████"
    keys = [
      "segmentation/raw_trip",
      "analysis/cleaned_trip",
      "analysis/cleaned_place",
      "analysis/cleaned_untracked"
    ]
    
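    # Resolve the opcode to a user UUID, then count the entries stored under each key
    user_id = ecwu.User.fromEmail(opcode).uuid
    ts = esta.TimeSeries.get_time_series(user_id)
    for key in keys:
      print(key)
      # Reuse the time series fetched above instead of looking it up again
      entries_df = ts.get_data_df(key)
      print(len(entries_df))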

    Output:

    segmentation/raw_trip
    39
    analysis/cleaned_trip
    37
    analysis/cleaned_place
    49
    analysis/cleaned_untracked
    11
    
  • Reset pipeline for user
    ./e-mission-py.bash bin/reset_pipeline.py -e nrelop_dfc-fermata_test_████

  • Run pipeline for user
    ./e-mission-py.bash bin/debug/intake_single_user.py -e nrelop_dfc-fermata_test_████

    "Cleaning and resampling failed"

    2024-05-06 01:43:14,441:DEBUG:140704544475072:Found 3 sections, need to remove the uncommon ones...
    2024-05-06 01:43:14,507:DEBUG:140704544475072:section counts = [(ObjectId('66386db4b3b578d6ae775005'), 47272), (ObjectId('66386dd9b3b578d6ae7808ae'), 144), (ObjectId('66386ddab3b578d6ae780941'), 120)]
    2024-05-06 01:43:14,507:ERROR:140704544475072:Section counts = [47272, 144, 120], expecting 1
    2024-05-06 01:43:14,509:ERROR:140704544475072:Cleaning and resampling failed for user 4acedfd2-de6a-4623-a968-9d5046a8573b
    Traceback (most recent call last):
      File "/Users/jgreenle/openpath/e-mission-server/emission/analysis/intake/cleaning/clean_and_resample.py", line 88, in clean_and_resample
        last_raw_place = save_cleaned_segments_for_ts(user_id, time_query.startTs, time_query.endTs)
      File "/Users/jgreenle/openpath/e-mission-server/emission/analysis/intake/cleaning/clean_and_resample.py", line 114, in save_cleaned_segments_for_ts
        return save_cleaned_segments_for_timeline(user_id, tl)
      File "/Users/jgreenle/openpath/e-mission-server/emission/analysis/intake/cleaning/clean_and_resample.py", line 140, in save_cleaned_segments_for_timeline
        (last_cleaned_place, filtered_tl) = create_and_link_timeline(tl, user_id, trip_map)
      File "/Users/jgreenle/openpath/e-mission-server/emission/analysis/intake/cleaning/clean_and_resample.py", line 971, in create_and_link_timeline
        link_trip_start(ts, curr_cleaned_trip, curr_cleaned_start_place, raw_start_place)
      File "/Users/jgreenle/openpath/e-mission-server/emission/analysis/intake/cleaning/clean_and_resample.py", line 1024, in link_trip_start
        _fix_squished_place_mismatch(cleaned_trip.user_id, cleaned_trip.get_id(),
      File "/Users/jgreenle/openpath/e-mission-server/emission/analysis/intake/cleaning/clean_and_resample.py", line 1169, in _fix_squished_place_mismatch
        assert False
    AssertionError
    
  • Now we have no cleaned objects (and thus no confirmed or composite objects)

    segmentation/raw_trip
    31
    analysis/cleaned_trip
    0
    analysis/cleaned_place
    0
    analysis/cleaned_untracked
    0
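
The two sets of counts in this report can be compared programmatically. A minimal sketch, assuming the counts have already been collected into plain dicts; `count_diff` is a hypothetical helper for illustration and is not part of e-mission-server:

```python
# Hypothetical helper, not part of e-mission-server: compare per-key entry
# counts taken before and after the pipeline reset and re-run.

def count_diff(before, after):
    """Return {key: (before, after, delta)} for every key seen in either dict."""
    result = {}
    for key in sorted(set(before) | set(after)):
        b = before.get(key, 0)
        a = after.get(key, 0)
        result[key] = (b, a, a - b)
    return result

# Counts copied from the two outputs in this report.
before = {
    "segmentation/raw_trip": 39,
    "analysis/cleaned_trip": 37,
    "analysis/cleaned_place": 49,
    "analysis/cleaned_untracked": 11,
}
after = {
    "segmentation/raw_trip": 31,
    "analysis/cleaned_trip": 0,
    "analysis/cleaned_place": 0,
    "analysis/cleaned_untracked": 0,
}

for key, (b, a, delta) in count_diff(before, after).items():
    # Flag keys that had entries before the re-run and have none afterwards
    flag = "  <-- nothing regenerated" if b > 0 and a == 0 else ""
    print(f"{key}: {b} -> {a} ({delta:+d}){flag}")
```

Keeping the comparison separate from the database lookup means it can be run against any two snapshots of the counts.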
    
@shankari
Contributor

shankari commented May 6, 2024

The assertion errors are validation or quality control checks on the pipeline. They are turned on by default in dev versions so we can see what is going wrong. In this case, the cause is almost certainly:

2024-05-06 01:43:14,507:ERROR:140704544475072:Section counts = [47272, 144, 120], expecting 1

But if you would like to ignore that for now, since it is not a section you worked on, you can turn off the various assertion errors in conf/analysis/debug.conf.json.
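
As a rough illustration of that suggestion, the sketch below flips assertion flags in a config dict. It assumes the file is a flat JSON object of boolean flags and that the relevant keys contain the word "assert"; both are assumptions, the key names shown are hypothetical placeholders, and `disable_assertions` is not an e-mission-server function. Check the actual file for the real keys before changing anything.

```python
import json

def disable_assertions(config):
    """Return a copy of config with every boolean key containing 'assert' set to False.

    Assumption: the config is a flat dict of flags. Other keys are left untouched.
    """
    return {
        key: (False if isinstance(value, bool) and "assert" in key.lower() else value)
        for key, value in config.items()
    }

# Hypothetical example contents; the real key names in
# conf/analysis/debug.conf.json may differ.
example = {
    "example.module.someValidityAssertions": True,
    "example.module.otherAssertions": True,
    "example.module.unrelatedFlag": True,
}

print(json.dumps(disable_assertions(example), indent=2))
```

To apply this to a real file, load it with `json.load`, pass the result through the helper, and write it back with `json.dump`.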

@JGreenlee
Author

In that case, it should not affect production, right?
This opcode does not have any trips on production either.

@shankari
Contributor

shankari commented May 6, 2024

It doesn't have any trips on production because I upgraded DFC fermata to the version with the BLE matching, and reset the pipeline for all users so that the trips and sections would be re-created and we could generate a snapshot with the correct data format for public dashboard testing.
e-mission/em-public-dashboard#124 (comment)

The trips should be showing up soon, or any errors will be in the intake pipeline logs.

@JGreenlee
Author

Ah, I see. I thought it would have finished processing by now.

@JGreenlee
Author

Circling back to this and confirming that I was able to get my trips to process locally by turning off the assertion errors.

I'd still like to know why my sections were considered "invalid". (I've also seen logs saying "This is messed up segment. Investigate further while processing section, skipping...")
But that might be a hairy topic that we don't have time to dive into now.
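
This does not explain why the sections were flagged, but one possible starting point is to pull the per-section counts out of the DEBUG line in the log above and see how lopsided they are. A minimal sketch, assuming the log format matches the line quoted in this report; `parse_section_counts` is a hypothetical helper, not part of e-mission-server:

```python
import re

# Log line copied from the traceback output in this report.
LOG_LINE = (
    "2024-05-06 01:43:14,507:DEBUG:140704544475072:section counts = "
    "[(ObjectId('66386db4b3b578d6ae775005'), 47272), "
    "(ObjectId('66386dd9b3b578d6ae7808ae'), 144), "
    "(ObjectId('66386ddab3b578d6ae780941'), 120)]"
)

# Matches one "(ObjectId('<24 hex chars>'), <count>)" pair.
PAIR_RE = re.compile(r"\(ObjectId\('([0-9a-f]{24})'\), (\d+)\)")

def parse_section_counts(line):
    """Return a list of (section_id, count) tuples found in a log line."""
    return [(oid, int(n)) for oid, n in PAIR_RE.findall(line)]

counts = parse_section_counts(LOG_LINE)
total = sum(n for _, n in counts)

for oid, n in counts:
    # Show each section's share of the total count
    print(f"{oid}: {n} ({n / total:.2%})")
```

The section ids printed here can then be looked up in the database to see which trips they belong to.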
