Should we include the title in the reference section? #55

Open
bryanwweber opened this issue Jun 20, 2017 · 11 comments

Comments

@bryanwweber
Member

Title question raised by Mike Burke's group at Columbia.

@bryanwweber
Member Author

My thought is that we shouldn't, for two reasons:

  1. It doesn't add anything that we don't already have from the DOI.
  2. Validating that the title is correct is likely to be error-prone: encoding differences and the conversion of non-ASCII characters in the response from the DOI server create a bunch of edge cases.
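
To make the second point concrete, here is a minimal sketch of the kind of normalization a title comparison would need before the two strings can be compared at all. The function names `normalize_title` and `titles_match` are hypothetical and not part of any existing code; the sketch only uses Python's standard `unicodedata` module and does not try to cover every edge case.

```python
import unicodedata


def normalize_title(title: str) -> str:
    """Return a canonical form of a title for comparison (hypothetical helper)."""
    # NFKC composes combining sequences into single code points and folds
    # compatibility characters such as ligatures and non-breaking spaces.
    text = unicodedata.normalize("NFKC", title)
    # Collapse runs of whitespace and ignore case differences.
    return " ".join(text.split()).casefold()


def titles_match(file_title: str, doi_title: str) -> bool:
    """Compare two titles after normalization (hypothetical helper)."""
    return normalize_title(file_title) == normalize_title(doi_title)


# The same visible title, once with a precomposed accented letter and once
# with a plain letter followed by a combining accent and a doubled space.
composed = "\u00c9tude of n-butanol ignition"
decomposed = "E\u0301tude of  n-butanol ignition"

print(composed == decomposed)              # naive comparison
print(titles_match(composed, decomposed))  # comparison after normalization
```

Even this simple case needs an explicit normalization step, and it still would not handle markup or transliterated characters in the title.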

@kyleniemeyer any thoughts?

@kyleniemeyer
Member

kyleniemeyer commented Jun 20, 2017 via email

@bryanwweber
Member Author

> Though, this might lead to the question of whether we need anything beyond DOI...

I think we need a full set of reference information, like what would be published in a journal. Some (most?) journals don't include the title of the article in the references section. Also, including the authors field for the reference feels like giving credit where it's due.

If the reference doesn't have a DOI... like for a report or something, maybe we should have a URL field? If the data isn't publicly available somehow, I don't think we should include it in the database at all. In either case, it still feels like the title field is redundant.

@kyleniemeyer
Member

I agree that we should probably only accept files when the reference is publicly available somewhere. I don't want to exclude conference papers that don't get turned into journal papers, though.

Thinking about this is leading to a chicken-and-egg problem in my head: ideally we want people (including us, or you at least) to create ChemKED files when they put a paper together, and perhaps include that as supplementary material with the submission. In that case, what do they put in the reference block? Just authors and a note about being under review? Perhaps the file-version should be 1.0alpha or something?

@bryanwweber
Member Author

bryanwweber commented Jun 20, 2017

> I agree that we should probably only accept files when the reference is publicly available somewhere. I don't want to exclude conference papers that don't get turned into journal papers, though.

Does this include papers presented at, e.g., the US National Combustion Meetings, where the proceedings aren't published online? I'm inclined not to allow submissions of data from such meetings, because there's no way for someone who didn't attend to verify the data, and the data hasn't been peer-reviewed, which, for all its faults, is still the minimum standard of acceptability.

I'm working on some files now for a paper; I'm putting the journal, year, and authors. Once it's in press, I'll add the DOI and submit it to the database. I'm not sure if I'll put the files in the supplementary material... If I do, I'll leave out the DOI (because I won't know it, I don't think), and I'll bump the file-version to 1 when I add the DOI and submit to the database. Then I'll bump it to 2 when I get a volume/issue/page.

@kyleniemeyer
Member

I think that if it came from a conference paper, at minimum the conference paper would need to be available on (e.g.) Figshare or something. I agree that we should prefer peer-reviewed data, but I also don't want to 100% exclude something potentially useful that didn't get published for some reason... not sure.

> I'm working on some files now for a paper; I'm putting the journal, year, and authors. Once it's in press, I'll add the DOI and submit it to the database. I'm not sure if I'll put the files in the supplementary material... If I do, I'll leave out the DOI (because I won't know it, I don't think), and I'll bump the file-version to 1 when I add the DOI and submit to the database. Then I'll bump it to 2 when I get a volume/issue/page.

I definitely think we should encourage people to include the files as supplementary material, so that they are attached to the source paper. Not sure if you will have the DOI when it comes time to upload final materials for the paper, though.

@bryanwweber
Member Author

OK, perhaps the criterion is that it has to have a permanent identifier of some sort. But this discussion has gotten way off track (sorry, I got us off track 😃), and we should probably move the bits about the acceptability of data (or not) over to the ChemKED-database repo (and also write a wiki entry there on how to submit new data).

I think we agreed that title is not worth adding to the schema. If that's correct, feel free to close the issue (I just wanted to document the discussion for future reference).

@kyleniemeyer
Member

Yes, I agree we don't need to add it.

@bryanwweber
Member Author

From Mike Burke via email to Bryan:

> With regard to the title, the value I see for having a title is that I can recognize what dataset it is by simply looking at the title rather than having to look up the paper based on the DOI. Could it simply be an optional item to specify? In my view, if one already specifies file authors, journal, etc., there seems little reason why a title would not be included.

@bryanwweber
Member Author

That's a reasonable use case. My concern is that validating that the title is correct (by comparing with the value from a DOI lookup) is bound to have many edge cases - for instance, some journals use HTML in their titles in the DOI service, while others don't. Having to code for all of these cases seems like it will lead to many false warnings.
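
As a rough illustration of the HTML case, the sketch below strips tags and decodes character entities before any comparison happens. The helper name `strip_markup` and the sample title are made up for illustration, and the regular expression is an assumption about what "tag-like" text looks like; it is not a full HTML parser.

```python
import html
import re

# Matches things that look like tags (<sub>, </i>, <span class="x">) but not
# a bare "<" followed by a space or a digit, as in "T < 1000 K".
_TAG = re.compile(r"</?[A-Za-z][^>]*>")


def strip_markup(title: str) -> str:
    """Remove tag-like markup and decode entities in a title (hypothetical helper)."""
    # Remove tags first, so that an encoded "&lt;" is never mistaken for a tag.
    without_tags = _TAG.sub("", title)
    return html.unescape(without_tags)


marked_up = "Ignition of H<sub>2</sub>/O<sub>2</sub> mixtures &amp; diluents"
print(strip_markup(marked_up))  # Ignition of H2/O2 mixtures & diluents
```

Note that stripping `<sub>` turns a subscripted formula into plain `H2`, which may or may not match what an author typed in the YAML file, so this removes one source of false warnings without removing all of them.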

The reason I'm insisting that we validate the title is that we are trying, to the best of our ability, to check that everything specified in the data file is correct according to some external standard. For instance, we also check the ORCID values for authors, if provided, to ensure the spelling of their names is correct, and we check the volume, issue, year, journal, and authors from a DOI lookup.

I'll look into testing this, picking say 100 random DOIs and seeing how accurate a relatively simple comparison will be. Reopening so I don't forget to do this.

@bryanwweber
Member Author

OK, as I suspected, there are a number of differences in title formatting and such. However, it's not that difficult to print out a useful diff between the returned title and the title from the YAML, so I think this is workable. We might need to wait until #78 is resolved so that the diff can be shown to the user in a useful way.
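
For reference, a word-level diff of this kind can be sketched with the standard-library `difflib` module. This is only an assumption about what such a diff could look like, not the code that was actually tested; `title_diff` and the two sample titles are hypothetical.

```python
import difflib


def title_diff(file_title: str, doi_title: str) -> str:
    """Return a word-level unified diff between two titles (hypothetical helper).

    An empty string means the titles contain the same sequence of words.
    """
    lines = difflib.unified_diff(
        file_title.split(),   # compare word by word rather than line by line
        doi_title.split(),
        fromfile="YAML file",
        tofile="DOI lookup",
        lineterm="",          # the tokens carry no trailing newlines
        n=1,                  # one word of context around each change
    )
    return "\n".join(lines)


yaml_title = "Autoignition of n-butanol at elevated pressure"
returned_title = "Autoignition of n-Butanol at elevated pressure"
print(title_diff(yaml_title, returned_title))
```

Words prefixed with `-` come from the YAML file and words prefixed with `+` from the DOI lookup, so a reader can judge at a glance whether the difference is a real error or just formatting.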
