
Bad file descriptor error when saving dgl graph to HDFS #2106

Closed
ChenAris opened this issue Aug 26, 2020 · 11 comments · Fixed by #2186

@ChenAris

❓ Questions and Help

To save a DGLGraph to HDFS, I built DGL from source with dmlc's HDFS support. The function I use is dgl::serialize::SaveDGLGraphs, and the "filename" input of this function is given an "hdfs://" prefix. The following is the exception information:

terminate called after throwing an instance of 'dmlc::Error'
what(): [11:09:40] /dgl/third_party/dmlc-core/src/io/hdfs_filesys.cc:66: HDFSStream.hdfsSeek Error:Bad file descriptor
Stack trace:
[bt] (0) /dgl/build/libdgl.so(dmlc::io::HDFSStream::Seek(unsigned long)+0x30f) [0x7fd81b3b456f]
[bt] (1) /dgl/build/libdgl.so(dgl::serialize::SaveDGLGraphs(std::string, dgl::runtime::List<dgl::serialize::GraphData, void>, std::vector<std::pair<std::string, dgl::runtime::NDArray>, std::allocator<std::pair<std::string, dgl::runtime::NDArray> > >)+0xdd) [0x7fd81a6490bd]

I also checked the output folder in HDFS, and the saved "graph.dgl" has only 24 bytes, which means it did connect to HDFS and attempt to write. It seems that the dmlc::SeekStream cannot be used directly with an HDFS file:

auto fs = std::unique_ptr<SeekStream>(dynamic_cast<SeekStream *>(

I wonder how to correctly use the dmlc HDFS filesystem to serialize and save a DGL graph to HDFS. Please help. Thanks.

@VoVAllen
Collaborator

Hi,

Which dgl branch are you using? SaveDGLGraphs should not be called in the latest branch.

@ChenAris
Author

Hi,

Thank you for your reply.

I use the master branch, and yes, the function SaveDGLGraphs is not called from the Python interface; this is for our own needs. We need to save DGL graphs directly to HDFS, instead of saving to the local filesystem and then copying to HDFS. Hence, we built dmlc with HDFS support and call SaveDGLGraphs from our own C++ program.

@VoVAllen
Collaborator

VoVAllen commented Aug 26, 2020

Hi,

Currently we have all moved to DGLHeteroGraph instead of DGLGraph. The original SaveDGLGraphs doesn't support HDFS: we only considered local file saving when implementing this function, so we assumed a seekable stream, which is not the case for HDFS.

Please try SaveHeteroGraphs instead, which supports HDFS.

@VoVAllen
Collaborator

And there's an API to convert to DGLHeteroGraph, FYI:

*rv = HeteroGraphRef(ig->AsHeteroGraph());

@ChenAris
Author

ChenAris commented Sep 1, 2020

We finally managed to save heterographs to hdfs. Thank you for your help.

BTW, the HDFS write function used in dmlc-core cannot handle sizes larger than int32 allows, due to the tSize type defined in hdfs:
https://github.com/dmlc/dmlc-core/blob/16c6f68c09af7ed2762cedcd2017307baaf875ed/src/io/hdfs_filesys.cc#L54

This will cause errors for DGL's HDFS saving. You may need to modify the write function to slice large inputs into chunks, following its read function:
https://github.com/dmlc/dmlc-core/blob/16c6f68c09af7ed2762cedcd2017307baaf875ed/src/io/hdfs_filesys.cc#L35

@VoVAllen
Collaborator

VoVAllen commented Sep 1, 2020

Thanks for your report. We will work with dmlc-core to fix this.

@VoVAllen
Collaborator

VoVAllen commented Sep 1, 2020

Do you mean both the Read and Write functions need to be modified?

@VoVAllen
Collaborator

VoVAllen commented Sep 1, 2020

Does this block your current workflow? Or did you make a patch locally?

@ChenAris
Author

ChenAris commented Sep 1, 2020

For reading from HDFS, it does not have to be modified, since dmlc-core already does the slicing there. But I don't know why they do not apply the same slicing method when writing...

Yes, it blocks DGL's HDFS saving function when we attempt to save a large graph or tensor to HDFS. We modified the source code of dmlc-core and built DGL from source to make it work.

@VoVAllen
Collaborator

VoVAllen commented Sep 1, 2020

I see where the problem is. If the size > INT_MAX, there would be an error. Will you encounter a scenario of writing more than the 32-bit maximum into HDFS or other streams (i.e., an NDArray with more than 4,294,967,295 bytes)?

@VoVAllen VoVAllen reopened this Sep 1, 2020
@ChenAris
Author

ChenAris commented Sep 1, 2020

Yes, e.g., the dataset "ogbn-papers100M" contains around 100 GB of data.
