ENH: Set random seeds #101
Changes from 12 commits
```diff
@@ -43,14 +43,16 @@ def multinomial(table: biom.Table,
     # split up training and testing
     trainX, testX, trainY, testY = split_training(
         dense_table, metadata, design,
-        training_column, num_random_test_examples
+        training_column, num_random_test_examples,
+        seed=0,
     )

     model = MultRegression(learning_rate=learning_rate, clipnorm=clipnorm,
                            beta_mean=differential_prior,
                            batch_size=batch_size,
                            save_path=None)
     with tf.Graph().as_default(), tf.Session() as session:
+        tf.set_random_seed(0)

         model(session, trainX, trainY, testX, testY)

         loss, cv, its = model.fit(
```

Inline review thread on `tf.set_random_seed(0)`:

Reviewer (with a suggested change): this won't work - the seed will always be set to zero otherwise.

Reviewer: Probably will want to have the

Author: For the QIIME plugin, I thought it might be better to make the seed 0 always, so as to avoid exposing a seed argument to the QIIME user. Given this assumption, the code is fine as it stands. If you want me to expose the seed to the user via QIIME, that is fine too, and I can make the corresponding change.

Reviewer: We don't want the seed to be always set to zero on the qiime2 side. Thanks.

Author: Okay. I will rewrite so that the default seed for QIIME is 0, but it can be set via a parameter. I'm not sure I understand where

Reviewer: We'll still want to have an option to not specify a seed, namely
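As a hedged illustration of the pattern under discussion (an optional `seed` parameter whose `None` default preserves fully random runs), the names below are illustrative stand-ins, not songbird's actual API:

```python
import random

def split_training_sketch(n_samples, num_test, seed=None):
    """Illustrative stand-in for a seedable train/test split.

    With seed=None the split is nondeterministic; with an integer
    seed the same test indices are chosen on every call.
    """
    rng = random.Random(seed)  # None -> seeded from OS entropy
    test = sorted(rng.sample(range(n_samples), num_test))
    train = [i for i in range(n_samples) if i not in set(test)]
    return train, test

# identical seeds reproduce the split exactly
assert split_training_sketch(100, 5, seed=0) == split_training_sketch(100, 5, seed=0)
```

A dedicated `random.Random(seed)` instance (rather than the module-level functions) keeps the split's randomness isolated from any other random state in the program.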
Reviewer: I'm still leaning towards having the default be None; otherwise the option to not specify a random seed will no longer exist. For example, with multiple random runs you might take averages, or take the best fit amongst the runs. If this option is not available, the user will have no way to leave the seed unspecified.
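The multiple-runs workflow described above can be sketched as follows; `fit_model` is a hypothetical stand-in for the real fitting call, and the loss it returns is fake:

```python
import random

def fit_model(seed):
    # Placeholder: a real call would train the model under this seed
    # and return its final loss; here the "loss" is just a seeded draw.
    return random.Random(seed).uniform(0.0, 1.0)

def best_of_runs(seeds):
    """Fit once per seed and keep the run with the lowest loss."""
    losses = {s: fit_model(s) for s in seeds}
    best = min(losses, key=losses.get)
    return best, losses[best]

best_seed, best_loss = best_of_runs(range(5))
```

Recording which seed produced the best fit keeps even a "pick the best of several random runs" workflow reproducible after the fact.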
Author: Thanks! And I appreciate the feedback on the PR.

I think it is important to mandate a seed for Q2. For software with a mission of reproducibility, you should not be able to create artifacts that you cannot reproduce. If someone were to give me an artifact created by songbird with the seed set to None, I'd never be able to reproduce it exactly.

I'm also not sure why you can't achieve your use case by setting the random seed differently, multiple times. This use case is actually the reason I originally had two different seeds: you could keep the same train/test split but change the model fitting. The same behavior can be achieved with this PR by fixing a training column and then varying the seed.

Also, as you mentioned, setting a None seed is still available via the CLI.
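The two-seed behavior mentioned above (a fixed split seed alongside a varying fit seed) can be sketched like this; the function names are illustrative, not songbird's actual API:

```python
import random

def make_split(n_samples, n_test, split_seed):
    """Deterministic train/test split controlled only by split_seed."""
    rng = random.Random(split_seed)
    test = set(rng.sample(range(n_samples), n_test))
    train = [i for i in range(n_samples) if i not in test]
    return train, sorted(test)

def fit_with_seed(train, fit_seed):
    # Placeholder for model fitting; returns a fake seeded "loss".
    return random.Random(fit_seed).uniform(0.0, 1.0)

# the split is identical across runs; only the fitting seed varies
train, test = make_split(20, 4, split_seed=0)
losses = [fit_with_seed(train, fit_seed=s) for s in range(3)]
```

Separating the two seeds lets you compare several fits on exactly the same held-out samples, which is what collapsing everything into a single seed would prevent.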