Formatting code for English nouns and verbs #39

Closed
4 tasks done
SaurabhJamadagni opened this issue Aug 17, 2023 · 20 comments
Labels
  • -next release-: Included in the next release
  • data: Relates to data or Wikidata
  • GSoC: Available for Google Summer of Code participants

Comments

@SaurabhJamadagni
Collaborator

SaurabhJamadagni commented Aug 17, 2023

Terms

Languages

English

Description

This issue covers adding the formatting scripts for the queried English noun and verb data:

  • format_nouns.py
  • format_verbs.py

Will tackle the issue in parts. Part of the GSoC project.

SaurabhJamadagni added the “data” label (Relates to data or Wikidata) on Aug 17, 2023
SaurabhJamadagni self-assigned this on Aug 17, 2023
@SaurabhJamadagni
Collaborator Author

Hey @andrewtavis, could you add the appropriate priority to this issue?

Also, what formatter does Scribe-Data use? autopep8?

@andrewtavis
Member

Priority applied, @SaurabhJamadagni! And Scribe-Data uses black for formatting :)

andrewtavis added the “-next release-” label (Included in the next release) on Aug 18, 2023
@andrewtavis
Member

For the verbs formatting, @SaurabhJamadagni, the output I got from Wikidata covers simple present, simple past and past perfect. Unless you really want to keep the model as lean as it is on Wikidata, let’s make the full six conjugations per case in the formatting so the references would be easier. We also don’t need six versions of past perfect, as these would just be conjugations of “have” with the past participle, as we do for other languages :) :)

@SaurabhJamadagni
Collaborator Author

let’s make the full six conjugations per case in the formatting so the references would be easier

Hey @andrewtavis, could you please list the six conjugation cases to include? I am a bit confused 😅

@SaurabhJamadagni
Collaborator Author

So would the list be:

presFPS
presSPS
presTPS
presFPP
presSPP
presTPP
pastFPS
pastSPS
pastTPS
pastFPP
pastSPP
pastTPP
pastParticiple

@andrewtavis
Member

@SaurabhJamadagni, sure thing!

  • First person singular
  • Second person singular
  • Third person singular
  • First person plural
  • Second person plural
  • Third person plural

We’d need the above for present and past, and then for present perfect and past perfect we can just construct it with “have” conjugations: “have run” or “had run”, etc.
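A minimal sketch of that construction, assuming the past participle is already known. The helper name `build_perfect_forms` and the `presPerf…`/`pastPerf…` key names are hypothetical and only illustrate the idea; they are not taken from Scribe-Data.

```python
# Hypothetical sketch: names and key prefixes are illustrative assumptions.
PERSONS = ["FPS", "SPS", "TPS", "FPP", "SPP", "TPP"]

# Present-tense forms of "have" per person; the past form is "had" throughout.
HAVE_PRESENT = {
    "FPS": "have",
    "SPS": "have",
    "TPS": "has",
    "FPP": "have",
    "SPP": "have",
    "TPP": "have",
}


def build_perfect_forms(past_participle: str) -> dict:
    """Build present perfect and past perfect forms from a past participle."""
    forms = {}
    for person in PERSONS:
        forms[f"presPerf{person}"] = f"{HAVE_PRESENT[person]} {past_participle}"
        forms[f"pastPerf{person}"] = f"had {past_participle}"
    return forms


perfect = build_perfect_forms("run")
print(perfect["presPerfFPS"])  # have run
print(perfect["pastPerfFPS"])  # had run
```

Because these forms are derived, only the past participle would need to be stored per verb.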

@andrewtavis
Member

So would the list be:

presFPS
presSPS
presTPS
presFPP
presSPP
presTPP
pastFPS
pastSPS
pastTPS
pastFPP
pastSPP
pastTPP
pastParticiple

Exactly, @SaurabhJamadagni!
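For reference, the thirteen keys follow a regular pattern (tense prefix plus person abbreviation, then the participle), so they could be generated rather than typed out. This is only a sketch; `conjugation_keys` is a hypothetical helper, not something from the Scribe-Data codebase.

```python
from itertools import product

# Tense prefixes and person abbreviations as used in the list above.
TENSES = ["pres", "past"]
PERSONS = ["FPS", "SPS", "TPS", "FPP", "SPP", "TPP"]


def conjugation_keys() -> list:
    """Return the 12 tense/person keys followed by the participle key."""
    keys = [tense + person for tense, person in product(TENSES, PERSONS)]
    keys.append("pastParticiple")
    return keys


print(conjugation_keys())
```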

@SaurabhJamadagni
Collaborator Author

Awesome! Got it thanks! On it right now :)

@SaurabhJamadagni
Collaborator Author

Hey @andrewtavis, I don't think the verbs_queried.json file reflects the new query_verbs.sparql update that was pushed in PR #40. How do I run the SPARQL query to update the file? Since the file is currently in its old format, it is difficult to figure out the correct if-else cases.

@andrewtavis
Member

Ya you’re right, @SaurabhJamadagni 🤦‍♂️ Should have run it and sent the file. I’ll be home in 20 min and will update it for you!

andrewtavis added a commit that referenced this issue Aug 26, 2023
@andrewtavis
Member

Got held up, @SaurabhJamadagni. 2836dbb adds the new verbs with all their forms :)

@SaurabhJamadagni
Collaborator Author

Awesome @andrewtavis! Thanks :)

@SaurabhJamadagni
Collaborator Author

So for the value of the key simpPast in the queried verbs, which key in the conjugation does it get assigned to? I mean among these keys:

presFPS
presSPS
presTPS
presFPP
presSPP
presTPP
pastFPS
pastSPS
pastTPS
pastFPP
pastSPP
pastTPP
pastParticiple

@andrewtavis
Member

I guess all of the past ones, so:

pastFPS
pastSPS
pastTPS
pastFPP
pastSPP
pastTPP

Feels a bit weird to do it this way, but otherwise we'd have to change the view. We do that for Russian, but that's explicitly because past tense works differently there, whereas in English the form just doesn't change across persons. I'd say we're fine to do the 3x2 view for English for now, and we can reevaluate later :)

@SaurabhJamadagni
Collaborator Author

So if the following is one of the entries in queried_verbs.json:

{
    "infinitive": "dialogue",
    "presFPS": "dialogue",
    "presTPS": "dialogues",
    "simpPast": "dialoged",
    "pastPart": "dialoged"
}

then the formatted file would have:

"dialogue": {
    "presFPS": "dialogue",
    "presSPS": "",
    "presTPS": "dialogues",
    "presFPP": "",
    "presSPP": "",
    "presTPP": "",
    "pastFPS": "dialoged",
    "pastSPS": "dialoged",
    "pastTPS": "dialoged",
    "pastFPP": "dialoged",
    "pastSPP": "dialoged",
    "pastTPP": "dialoged",
    "pastPart": "dialoged"
}

Is this correct, @andrewtavis?

@andrewtavis
Member

We’d also want to apply presFPS to the other present tenses that are currently blank, @SaurabhJamadagni :) Maybe we should switch that to simplePresent in the SPARQL too?

@SaurabhJamadagni
Collaborator Author

SaurabhJamadagni commented Aug 27, 2023

We’d also want to apply presFPS to the other present tenses that are currently blank

Ohh, so something like this:

"presFPS": "dialogue",
"presSPS": "dialogue",
"presTPS": "dialogues",
"presFPP": "dialogue",
"presSPP": "dialogue",
"presTPP": "dialogue"

But how is "dialogues" third person singular?

@andrewtavis
Member

I think that dialogue is maybe not the best example @SaurabhJamadagni, but that is the structure we’ll need :) :) “He dialogues the steps in the code”? Seems ok-ish, and other verbs will be fine 😊
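Putting the thread together, the structure could be produced along these lines. This is a rough sketch under assumptions, not the actual format_verbs.py: the function name `format_verb` is hypothetical, the input keys follow the queried entry quoted earlier in this thread, the final key is named `pastPart` as in that example, and missing values are assumed to fall back to an empty string.

```python
import json

PERSONS = ["FPS", "SPS", "TPS", "FPP", "SPP", "TPP"]


def format_verb(entry: dict) -> dict:
    """Expand one queried verb entry into the full 3x2 present/past structure."""
    base = entry.get("presFPS", "")
    third_singular = entry.get("presTPS", "")
    simple_past = entry.get("simpPast", "")

    forms = {}
    # Present: only third person singular differs; the rest reuse presFPS.
    for person in PERSONS:
        forms[f"pres{person}"] = third_singular if person == "TPS" else base
    # Past: the single simple past form is applied to every person.
    for person in PERSONS:
        forms[f"past{person}"] = simple_past
    forms["pastPart"] = entry.get("pastPart", "")

    return {entry["infinitive"]: forms}


queried = {
    "infinitive": "dialogue",
    "presFPS": "dialogue",
    "presTPS": "dialogues",
    "simpPast": "dialoged",
    "pastPart": "dialoged",
}
print(json.dumps(format_verb(queried), indent=4))
```

Keying the result by infinitive mirrors the formatted example above; an irregular verb such as "be" would not fit this sketch, since its present and past forms vary by person.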

@SaurabhJamadagni
Collaborator Author

but that is the structure we’ll need :

Ahhh got it!

andrewtavis added the “GSoC” label (Available for Google Summer of Code participants) on Aug 27, 2023
@andrewtavis
Member

da6b5c4 finalizes this, @SaurabhJamadagni 😊 Maybe some minor changes will be needed during the next update process, but we can figure it out then :)


2 participants