
Section 1 What is SimpleNLG

Ruud de Jong edited this page Oct 24, 2018 · 1 revision

This page is based on a page of the wiki for the original SimpleNLG.

SimpleNLG-NL can help you write a program that generates grammatically correct Dutch, French and English sentences. It is a library (not an application), written in Java, that performs simple but useful tasks necessary for natural language generation (NLG).

Because it is a library, you will need to write your own Java program that uses SimpleNLG-NL classes. These classes allow you to specify the subject of a sentence (‘my dog’), the exact verb you want to appear in the sentence (‘chase’), the object (‘George’), and additional complements (‘because George looked funny’). You can also use SimpleNLG-NL methods to indicate, for example, that the verb should be in the past tense and expressed in the progressive form (‘was chasing’). If this already sounds confusing, don't worry: this tutorial will walk you through all of it. The example on this page is in English, but similar input can be written for French or Dutch.

Once you have specified the content of your sentence and expressed it in SimpleNLG-NL terms, SimpleNLG-NL assembles the parts of your sentence into a grammatical form and outputs the result. In our example, the output would be "My dog was chasing George because George looked funny." Here, SimpleNLG-NL has:

  1. Organized all the different parts into the correct order for the chosen language.
  2. Capitalized the first letter of the sentence.
  3. Added the English auxiliary ‘was’ and made it agree with the subject.
  4. Added ‘-ing’ to the end of the verb (because the progressive aspect of the verb is desired).
  5. Put all the words together in a grammatical form.
  6. Inserted the appropriate whitespace between the words of the sentence.
  7. Put a period at the end of the sentence.
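To make the seven steps above concrete, here is a toy sketch that performs them by hand for this one English clause. It is a hypothetical illustration only: the class `ToyRealiser` and its methods are invented for this page and are not part of the SimpleNLG-NL API, which represents clauses with its own phrase classes and features. The morphology is deliberately naive (one participle rule, singular/plural "was"/"were" only).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy realiser: NOT the SimpleNLG-NL API. It mimics, for a single
// English clause in the past progressive, the steps listed above.
class ToyRealiser {

    // Naive present participle: drop a final silent 'e' ("chase" -> "chasing"),
    // otherwise just append "ing" ("walk" -> "walking").
    static String presentParticiple(String verb) {
        if (verb.endsWith("e") && !verb.endsWith("ee")) {
            return verb.substring(0, verb.length() - 1) + "ing";
        }
        return verb + "ing";
    }

    // Past tense of the auxiliary "be", agreeing in number with the subject.
    // (Simplification: ignores "you", which takes "were" even when singular.)
    static String pastBe(boolean pluralSubject) {
        return pluralSubject ? "were" : "was";
    }

    // Subject + past progressive verb group + object + optional complement.
    static String realisePastProgressive(String subject, boolean pluralSubject,
                                         String verb, String object, String complement) {
        List<String> words = new ArrayList<>();
        words.add(subject);                    // step 1: subject comes first in English
        words.add(pastBe(pluralSubject));      // step 3: auxiliary agreeing with the subject
        words.add(presentParticiple(verb));    // step 4: "-ing" form for progressive aspect
        words.add(object);                     // step 1: then the object
        if (complement != null && !complement.isEmpty()) {
            words.add(complement);             // step 1: then the complement
        }
        String sentence = String.join(" ", words);  // steps 5-6: single spaces between words
        sentence = Character.toUpperCase(sentence.charAt(0)) + sentence.substring(1); // step 2
        return sentence + ".";                 // step 7: final period
    }

    public static void main(String[] args) {
        System.out.println(realisePastProgressive(
            "my dog", false, "chase", "George", "because George looked funny"));
        // prints: My dog was chasing George because George looked funny.
    }
}
```

The real library does not hard-code one clause pattern like this; the point of using it is that ordering, agreement and inflection rules are handled for you across many sentence types and three languages.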

As you can see, SimpleNLG-NL will not choose words for you: you need to specify the words you want in the output and their parts of speech. What SimpleNLG-NL’s classes do for you is assemble a grammatically correct sentence from the words you have provided. In doing so, SimpleNLG-NL automates some of the more mundane tasks that all NLG systems need to perform (for more information on NLG, see Appendix A), such as:

Orthography:

  • Inserting appropriate whitespace in sentences and paragraphs.
  • Absorbing punctuation – for example, generating the sentence "He lives in Washington D.C." instead of "He lives in Washington D.C.." (i.e., the sentence ends with a single period rather than two).
  • Pouring – that is, inserting line breaks between words (rather than in the middle of a word) in order to fit text into rows of, for example, 80 characters (or whatever length you choose).
  • Formatting lists such as: "apples, pears and oranges."
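The orthography tasks above are simple to state but easy to get subtly wrong. The following sketch shows one plausible way to implement three of them. It is a hypothetical illustration: `ToyOrthography` and its methods are invented here and are not SimpleNLG-NL code. Pouring is done greedily (fill each line with as many whole words as fit), which is an assumption about the strategy, not a description of the library.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helpers illustrating three orthography tasks; NOT SimpleNLG-NL code.
class ToyOrthography {

    // Absorbing punctuation: add a final period only if the text does not
    // already end with one ("Washington D.C." stays as it is).
    static String endSentence(String text) {
        return text.endsWith(".") ? text : text + ".";
    }

    // List formatting: "a, b and c" (no comma before "and", as in the example).
    static String formatList(List<String> items) {
        if (items.isEmpty()) return "";
        if (items.size() == 1) return items.get(0);
        String head = String.join(", ", items.subList(0, items.size() - 1));
        return head + " and " + items.get(items.size() - 1);
    }

    // Pouring: greedy line breaking between words so that no line exceeds
    // `width` characters; a single word longer than `width` gets its own line.
    static List<String> pour(String text, int width) {
        List<String> lines = new ArrayList<>();
        StringBuilder line = new StringBuilder();
        for (String word : text.trim().split("\\s+")) {
            if (line.length() > 0 && line.length() + 1 + word.length() > width) {
                lines.add(line.toString());   // current line is full: start a new one
                line.setLength(0);
            }
            if (line.length() > 0) line.append(' ');
            line.append(word);
        }
        if (line.length() > 0) lines.add(line.toString());
        return lines;
    }

    public static void main(String[] args) {
        System.out.println(endSentence("He lives in Washington D.C."));
        System.out.println(endSentence(formatList(List.of("apples", "pears", "oranges"))));
        pour("SimpleNLG-NL inserts line breaks between words rather than inside a word", 30)
            .forEach(System.out::println);
    }
}
```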

Morphology:

  • Handling inflected forms – that is, changing the form of a word (lexeme) to reflect grammatical information such as gender, tense, number or person.
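A common way to handle inflection is to look up irregular forms in a lexicon and fall back on regular rules for everything else. The sketch below shows that idea for English plural nouns and past-tense verbs. It is a hypothetical illustration: `ToyMorphology` and its tiny lookup tables are invented here, stand in for a real lexicon, and are not SimpleNLG-NL’s morphology code. The rules are intentionally incomplete (for example, no consonant doubling as in "stop" → "stopped"), and Dutch and French need considerably richer rules.

```java
import java.util.Map;

// Hypothetical sketch: irregular forms come from a small lookup table (standing
// in for a lexicon); all other words get regular suffix rules. NOT SimpleNLG-NL code.
class ToyMorphology {

    private static final Map<String, String> IRREGULAR_PLURALS =
        Map.of("child", "children", "mouse", "mice", "man", "men");

    private static final Map<String, String> IRREGULAR_PAST =
        Map.of("go", "went", "see", "saw", "have", "had", "make", "made");

    // Number: singular noun -> plural noun.
    static String plural(String noun) {
        String irregular = IRREGULAR_PLURALS.get(noun);
        if (irregular != null) return irregular;
        if (noun.matches(".*(s|x|z|ch|sh)")) return noun + "es";            // box -> boxes
        if (noun.matches(".*[^aeiou]y"))                                     // city -> cities
            return noun.substring(0, noun.length() - 1) + "ies";
        return noun + "s";                                                   // dog -> dogs
    }

    // Tense: base verb -> simple past (ignores consonant doubling).
    static String past(String verb) {
        String irregular = IRREGULAR_PAST.get(verb);
        if (irregular != null) return irregular;
        if (verb.endsWith("e")) return verb + "d";                           // chase -> chased
        if (verb.matches(".*[^aeiou]y"))                                     // carry -> carried
            return verb.substring(0, verb.length() - 1) + "ied";
        return verb + "ed";                                                  // look -> looked
    }

    public static void main(String[] args) {
        System.out.println(plural("dog") + " " + plural("box") + " " + plural("city") + " " + plural("child"));
        System.out.println(past("chase") + " " + past("look") + " " + past("carry") + " " + past("go"));
    }
}
```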

Simple Grammar:

  • Ensuring grammatical correctness by, among other things, enforcing subject-verb agreement [1].
  • Creating well-formed verb groups (i.e., verb plus auxiliaries) such as "does not like".
  • Allowing the user to define parts of a sentence or phrase and having SimpleNLG-NL gather those parts into an appropriate syntactic structure.
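Agreement and verb-group construction interact: in the English negated present tense, the auxiliary "do" carries the agreement ("she does not like" but "they do not like"), while the main verb stays in its base form. The sketch below illustrates that, together with the "I am" / "I is" case from footnote [1]. It is a hypothetical illustration: `ToyVerbGroup`, its `Person` enum and its methods are invented here and are not the SimpleNLG-NL API.

```java
// Hypothetical sketch of agreement inside English verb groups; NOT the SimpleNLG-NL API.
class ToyVerbGroup {

    enum Person { FIRST, SECOND, THIRD }

    // Present tense of the auxiliary "do": only third person singular takes "does".
    static String presentDo(Person person, boolean plural) {
        return (person == Person.THIRD && !plural) ? "does" : "do";
    }

    // Negated simple present: agreeing auxiliary + "not" + bare main verb.
    static String negatedPresent(Person person, boolean plural, String verb) {
        return presentDo(person, plural) + " not " + verb;
    }

    // Present tense of "be" agreeing with the subject ("I am", not "I is").
    static String presentBe(Person person, boolean plural) {
        if (plural) return "are";
        switch (person) {
            case FIRST:  return "am";
            case SECOND: return "are";
            default:     return "is";
        }
    }

    public static void main(String[] args) {
        System.out.println("she " + negatedPresent(Person.THIRD, false, "like") + " apples");
        System.out.println("they " + negatedPresent(Person.THIRD, true, "like") + " apples");
        System.out.println("I " + presentBe(Person.FIRST, false) + " hungry");
    }
}
```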

For those familiar with NLG terminology, SimpleNLG is a realiser for a simple grammar. We hope that SimpleNLG will eventually provide simple algorithms not only for realization but for all of microplanning as well. As its functionality expands, components such as microplanning will be added as self-contained modules, so that students and researchers can use the parts of the library they want and remain free to extend or replace other modules with their own implementations.


[1] Agreement describes how a word’s form sometimes depends on other words that appear with it in a sentence. For example, you don’t say "I is" in English, because "is" cannot be used when the subject is "I". The word "is" is said not to agree with the word "I". The correct form is "I am", even though the verb still has the same function and basic meaning.