
Issue when installing from conda #106

Open
findalexli opened this issue Mar 23, 2020 · 7 comments

@findalexli

Hello!

I am having the following issue when creating the conda environment from the YML file.

(base) MacBook-Pro-4:snorkeling alexanderli$ conda env create --file environment.yml
Collecting package metadata: done
Solving environment: failed

UnsatisfiableError: The following specifications were found to be in conflict:
  - gensim=3.8.1 -> python_abi=[build=*_cp37m] -> pypy[version='<0a0']
  - sqlalchemy=1.1.13
Use "conda search <package> --info" to see the dependencies for each package.
@ajlee21
Contributor

ajlee21 commented Mar 23, 2020

Hi Alex,
Excited to see that you're trying to use some of the work in this repository. I'm actually not the owner so I'm going to tag @danich1 who is!

@danich1
Contributor

danich1 commented Mar 23, 2020

Greetings Alex,
I believe your problem is an OS issue. Things work fine on Linux, but I'm not surprised macOS is having trouble. I think the quick fix for this situation is to move gensim and sqlalchemy into the pip section and let pip handle the versioning.

Correct file (note the environment name change):

name: snorkeling
channels:
  - conda-forge
  - pytorch
dependencies:
- beautifulsoup4=4.6.0
- ipykernel=5.1.2
- ipywidgets=7.5.1
- jupyter=1.0.0
- jupyter_client=5.3.4
- jupyter_console=6.0.0
- jupyter_core=4.6.0
- llvmlite=0.21.0
- lxml=4.1.1
- matplotlib=3.1.1
- neo4j-python-driver=1.3.1
- networkx=2.1
- nltk=3.2.4
- numpy=1.17.2
- pandas=0.24.0
- pip=19.2.3
- plotnine=0.5.1
- psycopg2=2.7.3.2
- python=3.6.7
- pytorch=1.1.0
- py4j=0.10.6
- requests=2.18.4
- seaborn=0.9.0
- scikit-image=0.13.1
- scikit-learn=0.21.3
- scipy=1.3.1 
- six=1.12.0
- sqlite=3.30.0
- tensorflow=2.0.0
- tensorboard=2.0.0
- tqdm=4.28.1
- tika=1.15
- xlrd=1.1.0
- xlsxwriter=1.0.4
- pip:
    - gensim==3.8.1
    - hetio==0.2.6
    - matplotlib-venn==0.11.5
    - snorkel==0.9.1
    - spacy==1.10.0
    - sqlalchemy==1.1.13

~danich1
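The edit danich1 describes (dropping packages from the conda list and re-adding them under `pip:` with `==` pins) can also be sketched in code. This is a minimal, illustrative sketch, not part of the repo: `move_to_pip` is a hypothetical helper, and it assumes a flat environment.yml like the one above where the `pip:` section is the last conda entry.

```python
def move_to_pip(env_text, packages):
    """Move the named conda dependencies into the trailing pip: section,
    converting conda-style '=version' pins to pip's '==version' form."""
    kept, moved = [], []
    for line in env_text.splitlines():
        entry = line.strip()
        if entry.startswith("- ") and "=" in entry:
            name, version = entry[2:].split("=", 1)
            if name in packages:
                moved.append(f"    - {name}=={version.lstrip('=')}")
                continue  # drop it from the conda section
        kept.append(line)
    # Assumes '- pip:' is the last conda entry, as in the file above,
    # so appending the moved pins places them inside the pip sub-list.
    return "\n".join(kept + moved)


env = """name: snorkeling
dependencies:
- gensim=3.8.1
- numpy=1.17.2
- pip:
    - snorkel==0.9.1"""
print(move_to_pip(env, {"gensim", "sqlalchemy"}))
```

This is just string surgery for illustration; for anything beyond a flat file like this one, a real YAML parser would be the safer route.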

@findalexli
Author

Thank you, Alexandra and David. I appreciate the prompt reply.

A follow-up question: snorkel 0.9.1 does not seem to contain the module these notebooks import. I am trying to run some notebooks, like compound_disease/compound_treats_disease/dataset_statistics/dataset_statistics.ipynb, but they will not run.

ModuleNotFoundError Traceback (most recent call last)
in
8 os.environ['SNORKELDB'] = database_str
9
---> 10 from snorkel.model import SnorkelSession
11 session = SnorkelSession()

ModuleNotFoundError: No module named 'snorkel.model'

@danich1
Contributor

danich1 commented Mar 25, 2020

Right, I figured that would happen. The reason for the error is that some of my notebooks were written against snorkel's old version, before the authors upgraded their code. The old code used a database to access sentences and other information; the authors have since adapted their code to move away from using a database.

If you want to run the notebooks that use snorkel's old code, you will have to install this library as a separate conda environment.
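Since the old and new snorkel APIs are incompatible, it can help to check which one a given environment actually provides before running a notebook. A hedged sketch (`snorkel_api_flavor` is a hypothetical helper; the `snorkel.models` probe reflects the old database-backed releases, which shipped that module, while the 0.9+ rewrite dropped it):

```python
import importlib.util


def snorkel_api_flavor():
    """Best-effort guess at which snorkel API the current environment has."""
    try:
        import snorkel  # noqa: F401
    except ModuleNotFoundError:
        return "not installed"
    # The old database-backed releases exposed snorkel.models
    # (candidate_subclass etc.); the 0.9+ rewrite removed it.
    if importlib.util.find_spec("snorkel.models") is not None:
        return "old database-backed API"
    return "new 0.9+ API"


print(snorkel_api_flavor())
```

Running this inside each conda environment makes it obvious which set of notebooks that environment can serve.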

@Jatin6004

Jatin6004 commented Jan 5, 2021

Hi, I am having an issue while running the pubtator-to-postgres.ipynb file from the create_database folder.


PicklingError Traceback (most recent call last)
in
35 for edges in [dge, gge, cge, cde]:
36 print(edges)
---> 37 insert_cand_to_db(edges, [train_sens, dev_sens, test_sens])
38
39 offset = offset + chunk_size

~\Desktop\snorkeling\create_database\database_insertion.py in insert_cand_to_db(extractor, sentences)
131 def insert_cand_to_db(extractor, sentences):
132 for split, sens in enumerate(sentences):
--> 133 extractor.apply(sens, split=split, parallelism=5, clear=False)
134
135 def print_candidates(session, context_class, edge):

~\Anaconda3\envs\snorkel-extraction\lib\site-packages\snorkel\candidates.py in apply(self, xs, split, **kwargs)
216
217 def apply(self, xs, split=0, **kwargs):
--> 218 super(PretaggedCandidateExtractor, self).apply(xs, split=split, **kwargs)
219
220 def clear(self, session, split, **kwargs):

~\Anaconda3\envs\snorkel-extraction\lib\site-packages\snorkel\udf.py in apply(self, xs, clear, parallelism, progress_bar, count, **kwargs)
51 self.apply_st(xs, clear=clear, count=count, **kwargs)
52 else:
---> 53 self.apply_mt(xs, parallelism, clear=clear, **kwargs)
54
55 if self.pb is not None:

~\Anaconda3\envs\snorkel-extraction\lib\site-packages\snorkel\udf.py in apply_mt(self, xs, parallelism, **kwargs)
108 # Start the UDF processes, and then join on their completion
109 for udf in self.udfs:
--> 110 udf.start()
111
112 while any([udf.is_alive() for udf in self.udfs]) and count < total_count:

~\Anaconda3\envs\snorkel-extraction\lib\multiprocessing\process.py in start(self)
103 'daemonic processes are not allowed to have children'
104 _cleanup()
--> 105 self._popen = self._Popen(self)
106 self._sentinel = self._popen.sentinel
107 # Avoid a refcycle if the target function holds an indirect

~\Anaconda3\envs\snorkel-extraction\lib\multiprocessing\context.py in _Popen(process_obj)
221 @staticmethod
222 def _Popen(process_obj):
--> 223 return _default_context.get_context().Process._Popen(process_obj)
224
225 class DefaultContext(BaseContext):

~\Anaconda3\envs\snorkel-extraction\lib\multiprocessing\context.py in _Popen(process_obj)
320 def _Popen(process_obj):
321 from .popen_spawn_win32 import Popen
--> 322 return Popen(process_obj)
323
324 class SpawnContext(BaseContext):

~\Anaconda3\envs\snorkel-extraction\lib\multiprocessing\popen_spawn_win32.py in init(self, process_obj)
63 try:
64 reduction.dump(prep_data, to_child)
---> 65 reduction.dump(process_obj, to_child)
66 finally:
67 set_spawning_popen(None)

~\Anaconda3\envs\snorkel-extraction\lib\multiprocessing\reduction.py in dump(obj, file, protocol)
58 def dump(obj, file, protocol=None):
59 '''Replacement for pickle.dump() using ForkingPickler.'''
---> 60 ForkingPickler(file, protocol).dump(obj)
61
62 #

PicklingError: Can't pickle <class 'snorkel.models.candidate.DiseaseGene'>: attribute lookup DiseaseGene on snorkel.models.candidate failed

@danich1
Contributor

danich1 commented Jan 5, 2021

PicklingError: Can't pickle <class 'snorkel.models.candidate.DiseaseGene'>: attribute lookup DiseaseGene on snorkel.models.candidate failed

This error arises because the pickle library cannot pickle sqlalchemy's candidate_subclass class. This is actually not an easy problem to fix (see this post). If you want to get this functionality working, you might have to edit snorkel's code directly. It turns out their old version is deprecated, which adds to the complexity.
FYI: I'm coming out with my own code to do the above extraction. Should be uploaded in a few weeks.
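The underlying failure is reproducible without snorkel or Windows: pickle serializes a class by reference (module name plus attribute name), so a class built dynamically at runtime, the way candidate_subclass builds DiseaseGene, fails the attribute lookup unless it is bound under its own name in an importable module. A minimal sketch of that failure mode (the names here are illustrative, not snorkel's actual internals):

```python
import pickle

# Build a class at runtime, the way snorkel's candidate_subclass does via
# type(). The class calls itself 'DiseaseGene' but is bound to the name
# 'Candidate', so pickle's lookup of 'DiseaseGene' in the defining module
# fails -- the same "attribute lookup ... failed" seen in the traceback.
Candidate = type("DiseaseGene", (object,), {})

try:
    pickle.dumps(Candidate())
    print("pickled OK")
except pickle.PicklingError as err:
    print("PicklingError:", err)
```

Windows matters here because multiprocessing uses the spawn start method there, which must pickle the worker state; avoiding process-based parallelism (e.g. passing parallelism=1 so the single-threaded apply_st branch in the traceback runs instead) would sidestep the pickling step entirely, though I haven't verified that workaround against this repo.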

@Jatin6004

Thanks a lot David that would be really helpful.
