
Better note detection #18

Open
wvengen opened this issue Nov 14, 2019 · 2 comments
Labels
enhancement New feature or request

Comments

Member

wvengen commented Nov 14, 2019

Detection of ingredient notes could be improved (especially with the loose parser, I'd think). Some ingredient notes occur frequently and could be recognized, maybe even before parsing (as a pre-processing step).

A list of Dutch ingredient notes: ingredient_notes.xlsx

wvengen added the enhancement (New feature or request) label Nov 14, 2019
Member Author

wvengen commented Dec 28, 2020

A start.

rule note_template
  'Vrij van'i !char /
  ( 'E' / '"E"' ) ws* '-' ws* 'nummers zijn'i !char /
  'E' ws* '=' ws* 'door'i ws+ ( 'de'i ws+ )? ( 'EU' / 'E.U' '.'? / 'E.G' '.'? ) !char /
  'HOOFDLETTERS staan'i !char /
  'Deze verpakking kan'i !char /
  'Kan sporen van'i !char /
  'Kan sporen bevatten'i !char /
  'Bevat een bron'i !char /
  'Dit product is'i !char /
  'Zie www.'i /
  'Overmatig gebruik kan'i !char /
  'GGN'i ws* digit+ /
  char* 'chocolade:'i ws* 'ten minste'i !char /
  'Producten uit'i !char /
  'Kan'i ( ws+ word ','? )+  ws+ 'bevatten'i !char /
  'Voor allergenen:'i /
  'Bevat mogelijk'i !char /
  'cacao'i ( 'bestanddelen'i )? ':'? ws* 'ten minste'i !char /
  'Allergenen staan'i !char /
  'bereid met'i !char /
  'Verpakt onder beschermende atmosfeer'i !char /
  'Geproduceerd in'i !char /
  'Wordt gemaakt in'i !char /
  'Bereid in een bedrijf'i !char /
  'Allergie-informatie staat'i !char /
  'In een fabriek geproduceerd'i !char /
  'Op natuurlijke wijze'i !char /
  'Voor allergenen'i !char /
  'Van nature lactosevrij'i !char /
  'Ondanks onze kwaliteitszorg'i !char /
  'Kijk voor meer informatie'i !char
end

Member Author

wvengen commented Jan 19, 2024

This could alternatively be implemented as a pre- or post-processor.
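
A rough sketch of what the pre-processor variant might look like, to make the idea concrete. Everything here is an assumption for illustration: `NOTE_OPENERS`, `NOTE_START` and `split_notes` are hypothetical names that do not exist in the project, only a handful of the templates above are carried over, and `\b` is used as a stand-in for the grammar's `!char` (which may not be equivalent, depending on what `char` matches). It splits the text into sentences and sets aside those that start with a known note opener.

```ruby
# Hypothetical pre-processor sketch; not part of the parser's actual API.
# A few note openers from the note_template rule, rewritten as Ruby regexps.
# Assumption: \b approximates the grammar's `!char` lookahead.
NOTE_OPENERS = [
  /vrij van\b/i,
  /kan sporen (?:van|bevatten)\b/i,
  /kan(?:\s+\p{L}+,?)+\s+bevatten\b/i,
  /bevat mogelijk\b/i,
  /geproduceerd in\b/i,
  /verpakt onder beschermende atmosfeer\b/i
].freeze

# Matches a sentence that begins (after optional whitespace) with a note opener.
NOTE_START = /\A\s*#{Regexp.union(NOTE_OPENERS)}/

# Splits text into sentences and separates notes from the remaining text.
# Returns a hash with the remaining text (:ingredients) and the notes (:notes).
def split_notes(text)
  # Naive sentence split: break after '.', '!' or '?' followed by whitespace.
  sentences = text.split(/(?<=[.!?])\s+/)
  notes, rest = sentences.partition { |s| s.match?(NOTE_START) }
  { ingredients: rest.join(" "), notes: notes }
end

result = split_notes(
  "Suiker, cacaoboter, melkpoeder. Kan sporen van noten bevatten. Geproduceerd in Nederland."
)
result[:ingredients] # => "Suiker, cacaoboter, melkpoeder."
result[:notes]       # => ["Kan sporen van noten bevatten.", "Geproduceerd in Nederland."]
```

The naive sentence split is the weak point: abbreviations such as "E.U." followed by a space would be cut in two, so a real implementation would need something smarter there. A post-processor would do the same matching on the parsed result instead of on the raw text.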
