Odd errors in larger data sets? #12

Open
vortexing opened this issue Feb 23, 2021 · 1 comment

vortexing commented Feb 23, 2021

We've been trying out pyclone-vi on our data and are seeing odd behavior: it works fine when we put in 10-20 variants per sample, but once we put in the full list of 300-400 mutations, it fails. We're still troubleshooting to see whether it's our HPC or software install environment, but on the off chance this looks familiar to you, I thought I'd post the error.

The input is data from 1 sample at a time, in the right format, but our datasets have no tumour content column or error rate column. When the script is run, stdout only has: Tumour content column not found. Setting values to 1.0., so we know the file is being found and read in up to that point. Then we see this (again, only when we do not truncate the input data set to a small number of variants):

Traceback (most recent call last):
  File "/opt/conda/bin/pyclone-vi", line 8, in <module>
    sys.exit(main())
  File "/opt/conda/lib/python3.8/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/opt/conda/lib/python3.8/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/opt/conda/lib/python3.8/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/opt/conda/lib/python3.8/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/pyclone_vi/cli.py", line 113, in fit
    pyclone_vi.run.fit(**kwargs)
  File "/opt/conda/lib/python3.8/site-packages/pyclone_vi/run.py", line 29, in fit
    log_p_data, mutations, samples = load_data(in_file, density, num_grid_points, precision=precision)
  File "/opt/conda/lib/python3.8/site-packages/pyclone_vi/data.py", line 11, in load_data
    data, mutations, samples = load_pyclone_data(file_name)
  File "/opt/conda/lib/python3.8/site-packages/pyclone_vi/data.py", line 78, in load_pyclone_data
    cn, mu, log_pi = cn_priors[(
  File "/opt/conda/lib/python3.8/site-packages/pandas/core/generic.py", line 1668, in __hash__
    raise TypeError(
TypeError: 'Series' objects are mutable, thus they cannot be hashed

Any gems? Could we have some sort of file parsing issue for a particular variant name (are there certain characters we can't use in a variant ID)? I feel like this is something silly but can't put my finger on it.
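
For reference, this TypeError generally means a pandas Series ended up where a hashable scalar was expected, for example as a dict key. One way that can happen is a label-based lookup that matches more than one row. The sketch below reproduces that in plain pandas; it is not pyclone-vi's actual code, and the `priors` dict and the toy column values are made up for illustration.

```python
import pandas as pd

# Toy table indexed by mutation_id; "m2" appears twice (illustrative data only).
df = pd.DataFrame(
    {
        "mutation_id": ["m1", "m2", "m2"],
        "major_cn": [2, 1, 1],
    }
).set_index("mutation_id")

# Hypothetical lookup table keyed by a scalar copy number.
priors = {1: "prior_a", 2: "prior_b"}

unique_value = df.loc["m1", "major_cn"]  # one matching row: a scalar
dup_value = df.loc["m2", "major_cn"]     # two matching rows: a Series

unique_result = priors[unique_value]     # a scalar key hashes fine

raised = False
try:
    priors[dup_value]                    # a Series cannot be hashed
except TypeError as exc:
    raised = True
    print(f"TypeError: {exc}")
```

The exact wording of the message depends on the pandas version, but the exception type is the same.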

vortexing (Author) commented

Oh. My. Gosh. Just FYI, your code breaks if there is a duplicate mutation_id in a sample's dataset. It doesn't handle the duplicate; it just fails with the traceback above. SUPER minor, but hey, just FYI for ease of use: perhaps a quick filter for uniqueness OR a mention in the docs. ;) I KNEW it felt like something silly... and it was...
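
In case it helps anyone else who lands here, a pre-flight check along these lines would have caught it before running the fit. This is a rough sketch, not part of pyclone-vi: the helper name `find_duplicate_mutations` is made up, the inline TSV is toy data, and it assumes the input has `mutation_id` and `sample_id` columns.

```python
import io

import pandas as pd

# Assumed key columns: one row per mutation per sample.
KEY_COLUMNS = ["sample_id", "mutation_id"]


def find_duplicate_mutations(df: pd.DataFrame) -> pd.DataFrame:
    """Return every row whose (sample_id, mutation_id) pair occurs more than once."""
    mask = df.duplicated(subset=KEY_COLUMNS, keep=False)
    return df[mask]


# Toy input standing in for a real TSV file; "chr1:100" is listed twice.
toy_tsv = (
    "mutation_id\tsample_id\tref_counts\talt_counts\n"
    "chr1:100\tS1\t50\t10\n"
    "chr1:100\tS1\t48\t12\n"
    "chr2:200\tS1\t30\t30\n"
)
df = pd.read_csv(io.StringIO(toy_tsv), sep="\t")

dups = find_duplicate_mutations(df)
if not dups.empty:
    print("Duplicate mutation_id rows found:")
    print(dups.to_string(index=False))

# One possible fix: keep the first occurrence of each pair.
# Whether that is the right choice depends on why the duplicates exist.
deduped = df.drop_duplicates(subset=KEY_COLUMNS, keep="first")
```

For a real file, swap the `io.StringIO(...)` argument for the path to the input TSV. Dropping rows silently can hide an upstream problem, so it is probably worth looking at `dups` before deciding how to resolve them.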
