For FOOd COmposition Parser
FOODCOP BETA 1.11.12 [beta number.beta month.beta day] - 2021
main credit: Tristan Salord, AGIR ODYCEE INRAE
collaborators: Guillaume Cabanac, Marie-Benoît Magrini
INRAE, UMR 1248 AGIR
Code hase been submitted to INRAE registration system. Code is available under the GNU Reneral Public License
This file includes a detailed description of the github and instructions to execute code made available.
General Purpose FOODCOP( for FOOd COmposition Parser) is a set of python scripts for parsing ingredient lists obtained by scanning product packaging.
It is, actually, fully operative to analyse ingredient lists extracted from the MINTEL GNPD food innovation database, and will be benchmarked/enhanced to work on other food Database soon (report to section "Improvements Planification").
Why a new(?) parser Data from the MINTEL GNPD Database use different sorts of grammar to reproduce ingredients lists given in products packaging. Most of classical parsers failed on parsing such heterogenuous data.
Utility/Use Parsing food ingredient list, i.e transofrming raw ingredient list text into a structured data type allow to perform numerous scientific operation: identify certain types of ingredient, of species used in food composition, study food product evolution, assess food product complexity, etc.
Performance FOODCOP was tested on a large dataset of about 300,000 food ingredient lists extracted from MINTEL GNPD DB. It works with a margin of errors located under 2% depending on the size of the input data.
Evolution FOODCOP will be updated until it reaches the point in its evolution where it can become a complete python library.
FOODCOP runs with python3. It can works on any OS as long as python3 is installed.
All required libraries are listed on the next section "Required Libraries" and the data structure of the parser is described in section "Data Organisation".
Script functions are described in the "Script Description" section.
Once installed all required librairies simply:
- open a terminal,
- navigate to folder FOODCOP you have downloaded
- run the foodcoprun.py file with the correct arguments (for now -d for dataframe usage or -f for single raw ingredient list): "python3 foodcoprun.py [-d] [-f]"
- Results and Log file are automatically saved in a folder named "FOODCOP"+date of operation located into the user folder (i.e /Users/myusername on MACOS).
Decoding simple raw ingredient list:
-
python3 foodcoprun.py -f "water (50%),acidifier(salt, citric acid), wheat flour, soybean oil, coconut"
-
follow your terminal information
N.B: Will be soon updated for complex ingredient list
Decoding full dataframe:
- python3 foodcoprun.py -d "path/to/ur/excel/or/csv/file"
- follow your terminal information
- fully operational
Libraries to be installed:
- thefuzz
- numpy
- pandas
- openpyxl (to export as xlsx from pandas)
- python-Levenshtein
To simplify installation in your testing environment: "pip install -r requirements.txt"
[TO UPDATE]
[ to be populated ]
- User choice for output directory
- More verbose Log File
- MultiThreading
- Benchmark on other FoodDB