
Inferencing does not work #3

Closed
plandes opened this issue Aug 5, 2022 · 8 comments
Labels
documentation Improvements or additions to documentation

Comments


plandes commented Aug 5, 2022

I followed the instructions to run the inference script `` but got the following error (below). It appears the module-level global variable `gfeaturizer` is never set in `amr_coref.amr_coref.coref.coref_featurizer`.

Before running, I added the downloaded model (from the releases section of this GitHub repo) and installed the LDC2020T02 corpus in the data directory.

```
Loading model from data/model_coref-v0.1.0/
Loading test data
ignoring epigraph data for duplicate triple: ('w', ':polarity', '-')
Clustering
multiprocessing.pool.RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/pool.py", line 48, in mapstar
    return list(map(*args))
  File "/work/third/amr_coref/amr_coref/coref/coref_featurizer.py", line 259, in worker
    mlist       = gfeaturizer.mdata.mentions[doc_name]
AttributeError: 'NoneType' object has no attribute 'mdata'
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/work/third/amr_coref/40_Run_Inference.py", line 67, in <module>
    cluster_dict = inference.coreference(ordered_pgraphs)
  File "/work/third/amr_coref/amr_coref/coref/inference.py", line 47, in coreference
    self.test_dloader = get_data_loader_from_data(tdata_dict, self.model, show_prog=self.show_prog, shuffle=False)
  File "/work/third/amr_coref/amr_coref/coref/coref_data_loader.py", line 41, in get_data_loader_from_data
    fdata   = build_coref_features(mdata, model, **kwargs)
  File "/work/third/amr_coref/amr_coref/coref/coref_featurizer.py", line 243, in build_coref_features
    for fdata in pool.imap_unordered(worker, idx_keys, chunksize=chunksize):
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/pool.py", line 448, in <genexpr>
    return (item for chunk in result for item in chunk)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/pool.py", line 870, in next
    raise value
AttributeError: 'NoneType' object has no attribute 'mdata'

Compilation exited abnormally with code 1 at Fri Aug  5 18:19:19
```

bjascob commented Aug 6, 2022

If you're running on Windows, that might be the problem. Windows sometimes has trouble running multiprocessing code with globals due to the fact that it doesn't fork the way Unix does.

Regardless, the easiest thing to do is to remove the multiprocessing. Simply remove line 242 (`with Pool`) and change line 243 so it calls `map` instead of `pool.imap_unordered` (plus change the indentation and remove the `chunksize` parameter).

That'll be a bit slower, but if you're only doing a few graphs it won't be noticeable.
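A minimal sketch of that serial fallback, assuming names like `worker` and `idx_keys` as stand-ins rather than the repo's exact code:

```python
# Hedged sketch: replace pool.imap_unordered(...) with a plain map() call
# so everything runs in the parent process and module globals stay visible.

def worker(idx):
    # toy stand-in for the real featurizer worker
    return idx * idx

idx_keys = [1, 2, 3]

# before (multiprocessing):
#     with Pool(processes=processes, maxtasksperchild=maxtpc) as pool:
#         for fdata in pool.imap_unordered(worker, idx_keys, chunksize=chunksize):
#             ...
# after (single process, no chunksize parameter):
results = [fdata for fdata in map(worker, idx_keys)]
print(results)  # [1, 4, 9]
```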


plandes commented Aug 6, 2022

@bjascob Thanks again for the speedy response. No, this is on macOS, so I'd think it would work like Linux, since it's BSD-based with a Mach kernel.

I actually did something similar to remove the multiprocessing (see below). However, it's very slow. Does it split up the work when just using the model? In other words, the multiprocessing isn't used only for training, is it?

```python
from   multiprocessing.pool import ThreadPool
...
    if use_multithreading:
        pool = Pool(processes=processes, maxtasksperchild=maxtpc)
    else:
        pool = ThreadPool(processes=processes)
    for fdata in pool.imap_unordered(worker, idx_keys, chunksize=chunksize):
        dn, midx, sspans, dspans, words, sfeats, pfeats, slabels, plabels = fdata
        feat_data[dn][midx] = {'sspans':sspans,   'dspans':dspans, 'words':words,
                               'sfeats':sfeats,   'pfeats':pfeats,
                               'slabels':slabels, 'plabels':plabels}
        pbar.update(1)
```
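For context, a thread pool sidesteps the start-method issue entirely because threads share the parent interpreter's globals. A toy illustration (not the repo's code; `gfeaturizer` here is a stand-in string):

```python
from multiprocessing.pool import ThreadPool

gfeaturizer = "featurizer"  # toy stand-in for the module-level global

def worker(key):
    # threads run in the same process, so the global is always visible here,
    # unlike "spawn"-based worker processes where it would be None
    return (key, gfeaturizer)

with ThreadPool(processes=2) as pool:
    results = sorted(pool.imap_unordered(worker, [0, 1, 2]))
print(results)  # [(0, 'featurizer'), (1, 'featurizer'), (2, 'featurizer')]
```

The trade-off is that a thread pool gives no CPU parallelism for pure-Python work because of the GIL, which matches the slowness reported above.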


bjascob commented Aug 6, 2022

I believe this is just for inference, but I haven't looked at this code in a long time.

If you're using a Mac, I'm not sure what the issue is. That global should be defined in all "forked" processes unless macOS uses a different multiprocessing start method than "fork" as its default. You might want to look at the docs and see if the behavior is the same as on Ubuntu.
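(Worth noting: since Python 3.8 the default start method on macOS is in fact "spawn", not "fork", which would leave `gfeaturizer` unset in the workers. The platform default can be checked directly:)

```python
import multiprocessing as mp

# Linux defaults to "fork"; macOS (since Python 3.8) and Windows default to
# "spawn", where each worker re-imports the module, so module globals set at
# runtime in the parent process are not inherited by the workers.
print(mp.get_start_method())
```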

Your solution looks OK to me, though simply removing the multiprocessing altogether would be worth trying.


plandes commented Aug 6, 2022

@bjascob I've tested it and it works.

I've also added a simple makefile and `setup.py` for the project. You mentioned you'd like to merge this into amrlib. I can do that as well if you like.

Do you want me to create a pull request for any of this? If so, what do you want and in which repos?

Thanks again for all this great work.


bjascob commented Aug 8, 2022

It's unlikely I will add this to amrlib. I think there are better co-referencing approaches available, and I may take a look at those at some point in the future.

Todo: add a note to the README about compatibility with non-Ubuntu systems due to multiprocessing.

@bjascob bjascob added the documentation Improvements or additions to documentation label Aug 8, 2022
@bjascob bjascob closed this as completed in 7b73096 Aug 8, 2022

plandes commented Aug 8, 2022

@bjascob This makes sense.

Question about the implementation overall: why did you decide to create a new model using AMR graphs rather than using the alignments along with the output of, say, neuralcoref?


bjascob commented Aug 8, 2022

Co-referencing the sentences and then aligning to the graph was the only method available previously. I simply tried it this way to see if it would work. While it did work, the scores seemed somewhat mediocre and the code was programmatically complex. Recently, two different papers/projects were posted that I believe do this directly as well. I'm not sure which of all these approaches gives the best results. At some point I'd like to dig into this a bit deeper, but it's not a priority for me right now.


plandes commented Nov 12, 2023

I have made these changes, along with another fix for an error raised while using the library from a child process. I've forked this repo and made these changes there, including my own setup to build wheels and deploy them to PyPI. I can create a pull request if you like.

Regarding the other AMR co-reference methods: I'll comment in the other issue, but despite the issues you mentioned, this is still the best working co-reference library out there (thanks again).
