Incorrect window count in training set #2 #23

Open
popgengent opened this issue Mar 25, 2022 · 2 comments

popgengent commented Mar 25, 2022

We ran G-Nomix on array data from all autosomes using the default config file for arrays. All chromosomes completed successfully except chromosome 7, which failed while training the smoother. The error message was:

Traceback (most recent call last):
  File "gnomix.py", line 396, in <module>
    model = train_model(config, data_path, verbose=verbose)
  File "gnomix.py", line 195, in train_model
    model.train(data=data, retrain_base=retrain_base, evaluate=True, verbose=verbose)
  File "/tmp/tmp.jstlaojJjT/src/model.py", line 117, in train
    self.smooth.train(B_t2,y_t2)
  File "/tmp/tmp.jstlaojJjT/src/Smooth/smooth.py", line 35, in train
    B_s, y_s = self.process_base_proba(B, y)
  File "/tmp/tmp.jstlaojJjT/src/Smooth/models.py", line 23, in process_base_proba
    B_slide, y_slide = slide_window(B, self.S, y)
  File "/tmp/tmp.jstlaojJjT/src/Smooth/utils.py", line 27, in slide_window
    y_slide = None if y is None else y.reshape(N*W)
ValueError: cannot reshape array of size 1609674 into shape (1607992,)

In the final line above, y enters as a 2D array of size 1682*957 = 1609674, but the reshape expects N*W = 1682*956 = 1607992 elements, so y has one more window than expected. We are unsure where the discrepancy in window counts arises. To test whether a different window count would work, we changed model: window_size_cM from 0.2 to 0.4. With that change the program passes the previous point of failure, and we confirmed that the new dimensions of y (1682*478) are compatible with the new values of N (1682) and W (478).
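
For anyone who wants to check the numbers, here is a minimal sketch of the arithmetic behind the error message. It uses only the values reported in this comment and does not call any G-Nomix code; the variable names are made up for illustration.

```python
# Values reported in this issue; no G-Nomix code is involved.
N = 1682          # rows of y, as observed
W_labels = 957    # windows in y, as observed
W_expected = 956  # windows implied by the reshape target N*W

actual_size = N * W_labels      # number of elements y actually holds
expected_size = N * W_expected  # number of elements the reshape asks for

print(actual_size)                  # 1609674, the "array of size" in the error
print(expected_size)                # 1607992, the target shape in the error
print(actual_size - expected_size)  # 1682, i.e. one extra window per row
```

The difference between the two sizes is exactly N, which is what one extra window per row looks like.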

weekend37 self-assigned this May 29, 2022

nirav572 commented Aug 8, 2022

Hi @weekend37 - I had a similar issue.

The error message was:

Launching in training mode....
...
Training base models...
100%|███████████████████████████████████████▉| 546/547 [01:46<00:00,  5.15it/s]
Training smoother...

Traceback (most recent call last):
  File "/shared_resources/software/gnomix/gnomix.py", line 397, in <module>
    model = train_model(config, data_path, verbose=verbose)
  File "/shared_resources/software/gnomix/gnomix.py", line 195, in train_model
    model.train(data=data, retrain_base=retrain_base, evaluate=True, verbose=verbose)
  File "/shared_resources/software/gnomix/src/model.py", line 117, in train
    self.smooth.train(B_t2,y_t2)
  File "/shared_resources/software/gnomix/src/Smooth/smooth.py", line 35, in train
    B_s, y_s = self.process_base_proba(B, y)
  File "/shared_resources/software/gnomix/src/Smooth/models.py", line 23, in process_base_proba
    B_slide, y_slide = slide_window(B, self.S, y)
  File "/shared_resources/software/gnomix/src/Smooth/utils.py", line 27, in slide_window
    y_slide = None if y is None else y.reshape(N*W)
ValueError: cannot reshape array of size 691408 into shape (690144,)

As @popgengent did, I changed window_size_cM from 0.2 to 0.4, and training then completed successfully. However, I encountered this issue for only one chromosome (chr20).
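
The two sizes in the ValueError are enough to estimate the row and window counts, if one assumes that y has exactly one window more than the reshape expects (that was the case in the original report, but it is only an assumption here). A rough sketch, where `infer_window_counts` is a hypothetical helper and not part of G-Nomix:

```python
from typing import Tuple


def infer_window_counts(actual_size: int, expected_size: int) -> Tuple[int, int, int]:
    """Return (N, W_labels, W_expected) from the two sizes in the error.

    Assumption: y has exactly one window more than expected, so the
    difference between the two sizes equals the number of rows N.
    """
    n_rows = actual_size - expected_size
    if n_rows <= 0:
        raise ValueError("actual_size must be larger than expected_size")
    if expected_size % n_rows:
        raise ValueError("sizes are not consistent with a one-window difference")
    return n_rows, actual_size // n_rows, expected_size // n_rows


# Sizes taken from the chr20 error message above
print(infer_window_counts(691408, 690144))  # (1264, 547, 546)
```

Under that assumption the chr20 run has 1264 rows, with 547 windows in y and 546 expected, which lines up with the 546/547 shown in the progress bar. A passing divisibility check is consistent with a one-window difference but does not prove it.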

Some additional details if that might help:

  • I decided to train my own models using G-Nomix. This was performed on SHAPEIT2-phased datasets, phased independently for each chromosome. The query and reference datasets were phased together as one file using SHAPEIT2 and then split for LAI. The query and reference files were each merged across all chromosomes to give one *.vcf.gz file apiece (i.e. query.vcf.gz and reference.vcf.gz).
  • It failed only for chr20 (error above), but model creation was successful for all the remaining autosomes. (Note: chr20 has 26,803 SNPs.)
  • RFMix2 was successful on all autosomes (with the same query/reference/genetic map/sample map files).

Do you have a recommended fix for this?

Thank you for the very useful tool!


bchak10 commented Sep 9, 2022

Hi all, I've encountered the same problem with a 0.2 window size while training the smoother for certain chromosomes (not all) and would appreciate help with troubleshooting (if possible, without having to increase the window size). Where exactly is this expectation of array dimensions coming from?

Below is the error message:

Traceback (most recent call last):
  File "gnomix.py", line 397, in <module>
    model = train_model(config, data_path, verbose=verbose)
  File "gnomix.py", line 195, in train_model
    model.train(data=data, retrain_base=retrain_base, evaluate=True, verbose=verbose)
  File "/project/lbarreiro/USERS/bridget/programs/gnomix/src/model.py", line 117, in train
    self.smooth.train(B_t2,y_t2)
  File "/project/lbarreiro/USERS/bridget/programs/gnomix/src/Smooth/smooth.py", line 35, in train
    B_s, y_s = self.process_base_proba(B, y)
  File "/project/lbarreiro/USERS/bridget/programs/gnomix/src/Smooth/models.py", line 23, in process_base_proba
    B_slide, y_slide = slide_window(B, self.S, y)
  File "/project/lbarreiro/USERS/bridget/programs/gnomix/src/Smooth/utils.py", line 27, in slide_window
    y_slide = None if y is None else y.reshape(N*W)
ValueError: cannot reshape array of size 378148 into shape (377880,)
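
As for where the expected dimensions come from: the traceback shows `slide_window(B, self.S, y)` reshaping y to `N*W`, so N and W are presumably read from the shape of B (the base-model output) and y is expected to match. The sketch below is a guess at a pre-check one could run before that reshape to get a clearer message. `check_window_alignment` is a hypothetical helper, the assumed layout of B as (N, W, ...) is not confirmed by the source, and the example dimensions (268, 1410, 1411) are inferred from the error sizes under a one-window-difference assumption.

```python
from types import SimpleNamespace


def check_window_alignment(B, y):
    """Raise a descriptive error if B and y disagree on (rows, windows).

    Assumption: B has shape (N, W, ...) and y has shape (N, W).
    Both only need a `.shape` attribute, so NumPy arrays work too.
    """
    n_rows, n_windows = B.shape[0], B.shape[1]
    if tuple(y.shape) != (n_rows, n_windows):
        raise ValueError(
            f"window mismatch: B has {n_windows} windows for {n_rows} rows, "
            f"but y has shape {tuple(y.shape)}"
        )
    return n_rows, n_windows


# Stand-ins that only carry a shape; the trailing 3 in B is an arbitrary placeholder.
B = SimpleNamespace(shape=(268, 1410, 3))
y = SimpleNamespace(shape=(268, 1411))
try:
    check_window_alignment(B, y)
except ValueError as err:
    print(err)
```

A check like this does not fix the mismatch; it only reports which of the two inputs carries the extra window, which may help narrow down where the window counts diverge.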
