Confidence discounting for small clusters #4
+ Implements the algorithm described in e-mission/e-mission-docs#663 (comment)
+ Uses constant values `A=0.01`, `B=0.75`, `C=0.25`
+ Changes `eacilp.primary_algorithms` to use the new algorithm

No unit tests yet (working on a tight timeline), but tested as follows:

1. Run `[eacili.n_to_confidence_coeff(n) for n in [1,2,3,4,5,7,10,15,20,30,1000]]`, check the results for reasonableness, and compare them to the results of plugging the formula into a calculator
2. Run the modeling and intake pipeline with `eacilp.primary_algorithms` set to the old algorithm
3. Run the first few cells of the "Explore label inference confidence" notebook for a user with many inferrable trips to get a list of unique probabilities and counts for that user
4. Set `eacilp.primary_algorithms` to the new algorithm and rerun the intake pipeline (the modeling pipeline hasn't changed)
5. Rerun the notebook as above and examine how the list of probabilities and counts has changed
+ See comments in e-mission/e-mission-docs#663
+ Useful for playing around with other constants in notebooks
if max_confidence is None: max_confidence = 0.99 # Confidence coefficient for n approaching infinity -- in the GitHub issue, this is 1-A
if first_confidence is None: first_confidence = 0.80 # Confidence coefficient for n = 1 -- in the issue, this is B
if confidence_multiplier is None: confidence_multiplier = 0.30 # How much of the remaining removable confidence to remove between n = k and n = k+1 -- in the issue, this is C
return max_confidence-(max_confidence-first_confidence)*(1-confidence_multiplier)**(n-1) # This is the u = ... formula in the issue
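The excerpt above shows only the body of the function. A minimal, self-contained sketch of how it might be wrapped and sanity-checked follows. The signature is an assumption inferred from the `is None` checks and the `n_to_confidence_coeff(n)` call in the description, and the defaults are the ones shown in the diff excerpt.

```python
def n_to_confidence_coeff(n, max_confidence=None, first_confidence=None,
                          confidence_multiplier=None):
    """Sketch of the discount coefficient for a cluster of size n (n >= 1).

    The signature is assumed; only the body appears in the diff excerpt.
    """
    if max_confidence is None:
        max_confidence = 0.99  # limit as n approaches infinity (1-A in the issue)
    if first_confidence is None:
        first_confidence = 0.80  # coefficient at n = 1 (B in the issue)
    if confidence_multiplier is None:
        confidence_multiplier = 0.30  # fraction of the remaining gap closed per step (C)
    return max_confidence - (max_confidence - first_confidence) * (
        1 - confidence_multiplier) ** (n - 1)


# Same cluster sizes as in the manual test described in the PR description.
sizes = [1, 2, 3, 4, 5, 7, 10, 15, 20, 30, 1000]
coeffs = [n_to_confidence_coeff(n) for n in sizes]
for n, c in zip(sizes, coeffs):
    print(f"n={n:>4}  coeff={c:.6f}")
```

With these defaults the coefficient starts at `first_confidence` for `n = 1` and rises toward `max_confidence` as `n` grows, so small clusters are discounted the most.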
Later, I would like to put these into a config file, but that is definitely not needed now.
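One way the config-file idea could look is sketched below. Nothing here is taken from the PR: the JSON layout, the helper name `load_confidence_constants`, and the choice to fall back to the hard-coded defaults when the file is missing or a value is `null` are all assumptions made for illustration.

```python
import json

# Defaults copied from the diff excerpt; the key names are hypothetical.
DEFAULTS = {
    "max_confidence": 0.99,
    "first_confidence": 0.80,
    "confidence_multiplier": 0.30,
}


def load_confidence_constants(path):
    """Return the constants from a JSON file, falling back to DEFAULTS."""
    try:
        with open(path) as f:
            loaded = json.load(f)
    except FileNotFoundError:
        # No config file: use the hard-coded defaults unchanged.
        return dict(DEFAULTS)
    constants = dict(DEFAULTS)
    # Ignore unknown keys and null values so a partial file still works.
    constants.update(
        {k: v for k, v in loaded.items() if k in DEFAULTS and v is not None}
    )
    return constants
```

The returned dictionary could then be passed as keyword arguments to the coefficient function, which keeps experimentation with other constants possible without editing code.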
Ran into error
I think this will fix it.
Will test and fix as part of my pending changes.
Actually, that was because of some uncommitted changes to the build stage that were saving "null" values. If the file is not found, we will go directly to the
See e-mission/e-mission-docs#663 for an explanation of the problem and the solution. This should be ready to merge; I'll run some final tests to make sure everything is working smoothly in the next few minutes.