Still "writing" MTHSPL triples after 24 hrs, even with 244 GB RAM #29

Open
turbomam opened this issue Apr 4, 2019 · 2 comments

turbomam commented Apr 4, 2019

I'm running the umls2rdf script on an Ubuntu 16 AWS EC2 server. I bump the RAM up to 128 GB when I'm doing this. I have extracted several other, larger sources with zero or minimal difficulty. I'm using UMLS 2018AA. I'm extracting on CUIs.

I haven't done any MySQL tuning, but the SQL portion of the extraction goes quickly... less than 5 minutes, I think. I have tried to do this with the MTHSPL content combined with other sources in a single mmsys extract/MySQL database, and I have also tried doing MTHSPL in a database all by itself, which has been helpful with some of the other sources.

The triples writing has been going for over 1 day, but I don't think the Turtle file's size has grown beyond roughly 400 MB in the last 10 hours. top shows the python process at 100% CPU but a pretty small RAM usage... ~ 10 GB, I think.

In an MTHSPL-only database, select count(distinct CUI) from MRCONSO; reports 58,041 CUIs used by MTHSPL. I loaded the Turtle content produced after one day into a triplestore, and the query below shows only 3,633 CUIs from MTHSPL.

PREFIX umls: <http://bioportal.bioontology.org/ontologies/umls/>
select (count(distinct ?o) as ?count)
where
{
    graph <https://www.nlm.nih.gov/research/umls/sourcereleasedocs/current/MTHSPL/> {
        ?s umls:cui ?o
    }
}
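
The same count can be cross-checked without a triplestore by scanning the partial Turtle file directly. This is a minimal sketch, assuming the output keeps the one-predicate-per-line layout seen in the head/tail excerpts later in this thread; the function name `count_distinct_cuis` is made up for illustration and is not part of umls2rdf.

```python
import re

# Matches lines such as:  umls:cui """C3486878"""^^xsd:string ;
# Assumption: each umls:cui statement sits on its own line, as in the excerpts.
CUI_LINE = re.compile(r'umls:cui\s+"""(C\d+)"""')


def count_distinct_cuis(lines):
    """Return the number of distinct CUI literals in an iterable of Turtle lines."""
    cuis = set()
    for line in lines:
        match = CUI_LINE.search(line)
        if match:
            cuis.add(match.group(1))
    return len(cuis)


# Small in-memory demonstration; a file handle can be passed the same way,
# e.g. count_distinct_cuis(open("MTHSPL_only.ttl")).
sample = [
    '    umls:cui """C3486878"""^^xsd:string ;',
    '    umls:tui """T200"""^^xsd:string ;',
    '\tumls:cui """C3818362"""^^xsd:string ;',
]
print(count_distinct_cuis(sample))
```

Because the file is streamed line by line, this should stay cheap even on a multi-hundred-MB partial output.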
@turbomam turbomam changed the title MTHSPL triples writing still going after 24 hrs, even with 128 GB Still "writing" MTHSPL triples after 24 hrs, even with 128 GB Apr 4, 2019
@turbomam turbomam changed the title Still "writing" MTHSPL triples after 24 hrs, even with 128 GB Still "writing" MTHSPL triples after 24 hrs, even with 128 GB RAM Apr 4, 2019

turbomam commented Jun 2, 2019

I'm trying again now with UMLS 2019AA and a fresh pull of umls2rdf.

Python 2.7 and Ubuntu 18 on an AWS EC2 x1e.2xlarge instance with 8 virtual CPUs, 244 GB RAM, and solid state storage provisioned at 4500 IOPS.

I set it up with MTHSPL as the only source:

MTHSPL,MTHSPL_only.ttl,load_on_cuis

It's been running for about 45 minutes now, most of that time completely idle. 0% CPU activity and 0 bytes/second disk activity.

$ grep -c 'owl:Class' MTHSPL_only.ttl
189
$ ls -lh MTHSPL_only.ttl
-rw-rw-r-- 1 ubuntu ubuntu 7.1M Jun  2 00:38 MTHSPL_only.ttl

head:

@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdfs:  <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix umls: <http://bioportal.bioontology.org/ontologies/umls/> .


<http://purl.bioontology.org/ontology/MTHSPL/>
    a owl:Ontology ;
    rdfs:comment "RDF Version of the UMLS ontology MTHSPL; converted with the UMLS2RDF tool (https://github.com/ncbo/umls2rdf), developed by the NCBO project." ;
    rdfs:label "MTHSPL" ;
    owl:imports <http://www.w3.org/2004/02/skos/core> ;
    owl:versionInfo "2019aa" .

<http://purl.bioontology.org/ontology/MTHSPL/C3486878> a owl:Class ;
        skos:prefLabel """CALCIUM FLUORIDE 30 [hp_C] in 1 mL ORAL PELLET [Sore Throat]"""@en ;
        skos:notation """C3486878"""^^xsd:string ;
        <http://purl.bioontology.org/ontology/MTHSPL/has_active_moiety> <http://purl.bioontology.org/ontology/MTHSPL/C0596235> ;
        <http://purl.bioontology.org/ontology/MTHSPL/has_inactive_ingredient> <http://purl.bioontology.org/ontology/MTHSPL/C0038636> ;
        <http://purl.bioontology.org/ontology/MTHSPL/has_active_ingredient> <http://purl.bioontology.org/ontology/MTHSPL/C0006695> ;
        <http://purl.bioontology.org/ontology/MTHSPL/has_inactive_ingredient> <http://purl.bioontology.org/ontology/MTHSPL/C0022949> ;
        <http://purl.bioontology.org/ontology/MTHSPL/DM_SPL_ID> """36500"""^^xsd:string ;
        <http://purl.bioontology.org/ontology/MTHSPL/LABELER> """Natural Health Supply"""^^xsd:string ;
        <http://purl.bioontology.org/ontology/MTHSPL/LABEL_TYPE> """HUMAN OTC DRUG LABEL"""^^xsd:string ;
        <http://purl.bioontology.org/ontology/MTHSPL/MARKETING_CATEGORY> """Unapproved homeopathic"""^^xsd:string ;
        <http://purl.bioontology.org/ontology/MTHSPL/MARKETING_EFFECTIVE_TIME_LOW> """19980604"""^^xsd:string ;
        <http://purl.bioontology.org/ontology/MTHSPL/MARKETING_STATUS> """active"""^^xsd:string ;
        <http://purl.bioontology.org/ontology/MTHSPL/NDC> """64117-748-02"""^^xsd:string ;
        <http://purl.bioontology.org/ontology/MTHSPL/NDC> """64117-748-01"""^^xsd:string ;
        <http://purl.bioontology.org/ontology/MTHSPL/SPL_SET_ID> """e8ec7791-c5b4-4f16-8a64-2cadb203800e"""^^xsd:string ;
        <http://purl.bioontology.org/ontology/MTHSPL/UNAPPROVED_HOMEOPATHIC> """N/A"""^^xsd:string ;
        umls:cui """C3486878"""^^xsd:string ;
        umls:tui """T200"""^^xsd:string ;
        umls:hasSTY <http://purl.bioontology.org/ontology/STY/T200> ;
 .

tail:

<http://purl.bioontology.org/ontology/MTHSPL/C3818362> a owl:Class ;
	skos:prefLabel """ACETALDEHYDE 12 [hp_X] in 59 mL / ARSENIC TRIOXIDE 12 [hp_X] in 59 mL / BALSAM PERU 12 [hp_X] in 59 mL / OYSTER SHELL CALCIUM CARBONATE, CRUDE 12 [hp_X] in 59 mL / PHENOL 12 [hp_X] in 59 mL / CONIUM MACULATUM FLOWERING TOP 12 [hp_X] in 59 mL / COUMARIN 12 [hp_X] in 59 mL / SAFFRON 12 [hp_X] in 59 mL / HISTAMINE DIHYDROCHLORIDE 12 [hp_X] in 59 mL / LACHESIS MUTA VENOM 12 [hp_X] in 59 mL / LYCOPODIUM CLAVATUM SPORE 12 [hp_X] in 59 mL / PHOSPHORUS 12 [hp_X] in 59 mL / SEPIA OFFICINALIS JUICE 12 [hp_X] in 59 mL ORAL LIQUID [Allergies Fragrances and Phenolics]"""@en ;
	skos:notation """C3818362"""^^xsd:string ;
	<http://purl.bioontology.org/ontology/MTHSPL/has_active_moiety> <http://purl.bioontology.org/ontology/MTHSPL/C0019588> ;
	<http://purl.bioontology.org/ontology/MTHSPL/has_active_moiety> <http://purl.bioontology.org/ontology/MTHSPL/C3696061> ;
	<http://purl.bioontology.org/ontology/MTHSPL/has_active_ingredient> <http://purl.bioontology.org/ontology/MTHSPL/C0031705> ;
	<http://purl.bioontology.org/ontology/MTHSPL/has_active_moiety> <http://purl.bioontology.org/ontology/MTHSPL/C0031705> ;
	<http://purl.bioontology.org/ontology/MTHSPL/has_active_ingredient> <http://purl.bioontology.org/ontology/MTHSPL/C3489013> ;
	<http://purl.bioontology.org/ontology/MTHSPL/has_active_moiety> <http://purl.bioontology.org/ontology/MTHSPL/C2346854> ;
	<http://purl.bioontology.org/ontology/MTHSPL/has_active_moiety> <http://purl.bioontology.org/ontology/MTHSPL/C0010206> ;
	<http://purl.bioontology.org/ontology/MTHSPL/has_inactive_ingredient> <http://purl.bioontology.org/ontology/MTHSPL/C0043047> ;
	<http://purl.bioontology.org/ontology/MTHSPL/has_active_moiety> <http://purl.bioontology.org/ontology/MTHSPL/C0070570> ;
	<http://purl.bioontology.org/ontology/MTHSPL/has_active_ingredient> <http://purl.bioontology.org/ontology/MTHSPL/C3487991> ;
	<http://purl.bioontology.org/ontology/MTHSPL/has_active_moiety> <http://purl.bioontology.org/ontology/MTHSPL/C3486868> ;
	<http://purl.bioontology.org/ontology/MTHSPL/has_active_ingredient> <http://purl.bioontology.org/ontology/MTHSPL/C0010206> ;
	<http://purl.bioontology.org/ontology/MTHSPL/has_active_ingredient> <http://purl.bioontology.org/ontology/MTHSPL/C0543456> ;
	<http://purl.bioontology.org/ontology/MTHSPL/has_active_moiety> <http://purl.bioontology.org/ontology/MTHSPL/C3487991> ;
	<http://purl.bioontology.org/ontology/MTHSPL/has_active_ingredient> <http://purl.bioontology.org/ontology/MTHSPL/C0052416> ;
	<http://purl.bioontology.org/ontology/MTHSPL/has_inactive_ingredient> <http://purl.bioontology.org/ontology/MTHSPL/C0032841> ;
	<http://purl.bioontology.org/ontology/MTHSPL/has_active_ingredient> <http://purl.bioontology.org/ontology/MTHSPL/C0000966> ;
	<http://purl.bioontology.org/ontology/MTHSPL/has_active_moiety> <http://purl.bioontology.org/ontology/MTHSPL/C0070477> ;
	<http://purl.bioontology.org/ontology/MTHSPL/has_active_ingredient> <http://purl.bioontology.org/ontology/MTHSPL/C0070477> ;
	<http://purl.bioontology.org/ontology/MTHSPL/has_active_ingredient> <http://purl.bioontology.org/ontology/MTHSPL/C3486868> ;
	<http://purl.bioontology.org/ontology/MTHSPL/has_active_moiety> <http://purl.bioontology.org/ontology/MTHSPL/C0000966> ;
	<http://purl.bioontology.org/ontology/MTHSPL/has_active_moiety> <http://purl.bioontology.org/ontology/MTHSPL/C3484409> ;
	<http://purl.bioontology.org/ontology/MTHSPL/has_active_ingredient> <http://purl.bioontology.org/ontology/MTHSPL/C3484411> ;
	<http://purl.bioontology.org/ontology/MTHSPL/has_active_ingredient> <http://purl.bioontology.org/ontology/MTHSPL/C0070570> ;
	<http://purl.bioontology.org/ontology/MTHSPL/has_active_ingredient> <http://purl.bioontology.org/ontology/MTHSPL/C3484409> ;
	<http://purl.bioontology.org/ontology/MTHSPL/has_active_ingredient> <http://purl.bioontology.org/ontology/MTHSPL/C3696061> ;
	<http://purl.bioontology.org/ontology/MTHSPL/has_inactive_ingredient> <http://purl.bioontology.org/ontology/MTHSPL/C0724556> ;
	<http://purl.bioontology.org/ontology/MTHSPL/has_active_moiety> <http://purl.bioontology.org/ontology/MTHSPL/C3484411> ;
	<http://purl.bioontology.org/ontology/MTHSPL/has_active_moiety> <http://purl.bioontology.org/ontology/MTHSPL/C3489013> ;
	<http://purl.bioontology.org/ontology/MTHSPL/has_inactive_ingredient> <http://purl.bioontology.org/ontology/MTHSPL/C0725616> ;
	<http://purl.bioontology.org/ontology/MTHSPL/MARKETING_EFFECTIVE_TIME_LOW> """20140324"""^^xsd:string ;
	<http://purl.bioontology.org/ontology/MTHSPL/LABELER> """King Bio Inc."""^^xsd:string ;
	<http://purl.bioontology.org/ontology/MTHSPL/SPL_SET_ID> """9f673dc6-70e3-48f2-90a3-f38fc3a142d8"""^^xsd:string ;
	<http://purl.bioontology.org/ontology/MTHSPL/NDC> """57955-2205-2"""^^xsd:string ;
	<http://purl.bioontology.org/ontology/MTHSPL/LABEL_TYPE> """HUMAN OTC DRUG LABEL"""^^xsd:string ;
	<http://purl.bioontology.org/ontology/MTHSPL/MARKETING_CATEGORY> """Unapproved homeopathic"""^^xsd:string ;
	<http://purl.bioontology.org/ontology/MTHSPL/MARKETING_STATUS> """active"""^^xsd:string ;
	<http://purl.bioontology.org/ontology/MTHSPL/DM_SPL_ID> """237619"""^^xsd:string ;
	umls:cui """C3818362"""^^xsd:string ;
	umls:tui """T200"""^^xsd:string ;
	umls:hasSTY <http://purl.bioontology.org/ontology/STY/T200> ;
 .

After 9 hours

ubuntu@ip-172-31-94-83:/terabytes/umls2rdf/output$ grep -c 'owl:Class' MTHSPL_only.ttl
3412

That's ~ 350 classes/hour

https://www.nlm.nih.gov/research/umls/sourcereleasedocs/current/MTHSPL/ says

MTHSPL contains approximately 148,045 drug products and 20,739 substances.

Is this really going to take 170,000/350 = 485 hours!?
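
A quick back-of-envelope version of this estimate, using only the counts quoted above. It is a rough sketch that assumes a constant write rate and one class per drug product or substance. Neither assumption is confirmed: when loading on CUIs the class count may be closer to the number of distinct CUIs (58,041 in the earlier 2018AA check). The helper names are hypothetical.

```python
def classes_per_hour(classes_written, hours_elapsed):
    """Average write rate so far."""
    return classes_written / float(hours_elapsed)


def hours_remaining(total_classes, classes_written, hours_elapsed):
    """Remaining time, assuming the average rate so far stays constant."""
    rate = classes_per_hour(classes_written, hours_elapsed)
    return (total_classes - classes_written) / rate


written = 3412            # grep -c 'owl:Class' after 9 hours
elapsed = 9.0             # hours
total = 148045 + 20739    # drug products + substances, per the NLM page

print("rate: %.0f classes/hour" % classes_per_hour(written, elapsed))
print("remaining: %.0f hours" % hours_remaining(total, written, elapsed))
```

Either way the projection lands in the hundreds of hours, so the order of magnitude in the question holds.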

turbomam commented Jun 2, 2019

If I set debug mode to True, the script fails right after reporting the atom count:

length atoms: 169417
Traceback (most recent call last):
  File "./umls2rdf.py", line 744, in <module>
    ont.load_tables()
  File "./umls2rdf.py", line 497, in load_tables
    sys.stderr.write("length atoms_by_aui: %d\n" % len(self.atoms_by_aui))
AttributeError: 'UmlsOntology' object has no attribute 'atoms_by_aui'
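
The traceback shows the debug line reading an attribute that was never set on the object. One defensive pattern is to look the attribute up with `getattr` before measuring it. This is only a sketch of that pattern: `OntologyStub` and `debug_length` are hypothetical stand-ins, not the actual umls2rdf code, and whether `atoms_by_aui` should exist in this code path is not something the traceback settles.

```python
import sys


class OntologyStub(object):
    """Hypothetical stand-in for the real class; only `atoms` is populated."""

    def __init__(self):
        self.atoms = ["A0000001", "A0000002"]


def debug_length(obj, attr_name, stream=sys.stderr):
    """Write the length of obj.<attr_name> if it exists; otherwise say so."""
    value = getattr(obj, attr_name, None)
    if value is None:
        stream.write("%s: not loaded\n" % attr_name)
        return None
    stream.write("length %s: %d\n" % (attr_name, len(value)))
    return len(value)


ont = OntologyStub()
debug_length(ont, "atoms")          # attribute exists, length is reported
debug_length(ont, "atoms_by_aui")   # attribute missing, no AttributeError
```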

@turbomam turbomam changed the title Still "writing" MTHSPL triples after 24 hrs, even with 128 GB RAM Still "writing" MTHSPL triples after 24 hrs, even with 244 GB RAM Jun 2, 2019