📄 MongoDB database to support management and validation of EU Digital COVID-like certificates

pablogiaccaglia/mongodb-covid-certificates

Systems and Methods for Big and Unstructured Data - Delivery #2 - AA 2021/2022 - Prof. Marco Brambilla

📄 Covid certificates-oriented MongoDB Database

MongoDB • Report

This project tracks COVID-19 pandemic data about people, authorized bodies, vaccines, tests and, above all, COVID certificates of vaccination or testing, by designing and implementing a document-based MongoDB database. The primary objective is to support a fast tool that checks the validity of a certificate. The stored data also allows insights to be extracted for statistical purposes, involving information such as health services, vaccination and testing hubs, and vaccine lots. The database is not optimized for these tasks, however, since its main goal is the certificate validity check.

System requirements

Required software

  • Python 3.8 or higher (only if you want to perform manual load from CSVs)
  • MongoDB database
  • Python modules in requirements.txt (only if you want to perform manual load from CSVs)

🚀 Setup instructions

Clone the repo

git clone https://github.com/pablogiaccaglia/mongodb-covid-certificates
cd mongodb-covid-certificates/

Install required packages

From the project's directory run the following command:

pip install -r requirements.txt

πŸ‘¨β€πŸ’» Usage

Load DB Dump

Download the database dump and navigate to the folder where it is located, then from the command line run something like this:

mongorestore -h host.com:port -d covid_certificates -u username -p password downloads/dumps/

This assumes that you want to restore the contents of the dump into a new database called covid_certificates.
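If the restore needs to be scripted, the same command line can be assembled in Python. The following is only a minimal sketch: the helper name `build_mongorestore_args` and its parameters are hypothetical and not part of the repository, and it uses only the `-h`, `-d`, `-u` and `-p` flags already shown above.

```python
from typing import List


def build_mongorestore_args(host: str, port: int, database: str,
                            username: str, password: str,
                            dump_dir: str) -> List[str]:
    """Return the argument list for the mongorestore call shown above.

    Hypothetical helper: passing a list (instead of one shell string)
    avoids quoting problems when the password contains special characters.
    """
    return [
        "mongorestore",
        "-h", f"{host}:{port}",  # target server as host:port
        "-d", database,          # database to restore into
        "-u", username,
        "-p", password,
        dump_dir,                # folder containing the dump
    ]


# Example usage (requires the MongoDB Database Tools to be installed):
# import subprocess
# subprocess.run(build_mongorestore_args("host.com", 27017, "covid_certificates",
#                                        "username", "password", "downloads/dumps/"),
#                check=True)
```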

Load from CSV

To populate the database from the provided CSVs and Python scripts (which can also be used to further customize the generated data), the first step is to establish a connection to a MongoDB server. The provided code relies on a MongoDB Atlas based connection, but it can easily be customized to connect to a MongoDB server on your local machine, as shown here.

As you can see in the main method of the main.py file, a MongoDB object is created in the following way:

    uri = "MONGODB_URI"
    mongoDB = MongoDB(connectionURI = uri)

The URI passed to the class constructor is used in the __init__ method to establish a connection through the driver:

    from pymongo import MongoClient

    class MongoDB(MongoClient):

        def __init__(self, connectionURI) -> None:
            # connect=False defers the connection until the first operation
            super(MongoDB, self).__init__(connectionURI, connect=False)

After this step, all you need to do is execute the main method and wait for the routine to complete.
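As a hedged illustration of switching between Atlas and a local server, the "MONGODB_URI" placeholder could be produced by a small helper like the one below. The function name `build_mongodb_uri` and its parameters are hypothetical and not part of the repository; the sketch relies only on `urllib.parse.quote_plus` from the standard library to percent-encode reserved characters in the credentials.

```python
from typing import Optional
from urllib.parse import quote_plus


def build_mongodb_uri(host: str, port: int = 27017,
                      username: Optional[str] = None,
                      password: Optional[str] = None,
                      srv: bool = False) -> str:
    """Build a MongoDB connection string (hypothetical helper).

    srv=True produces a "mongodb+srv://" URI, the form used by Atlas
    clusters; such URIs carry no port. srv=False targets a plain server,
    for example one running on localhost.
    """
    scheme = "mongodb+srv" if srv else "mongodb"
    credentials = ""
    if username is not None and password is not None:
        # Characters such as '@' or '/' must be percent-encoded.
        credentials = f"{quote_plus(username)}:{quote_plus(password)}@"
    location = host if srv else f"{host}:{port}"
    return f"{scheme}://{credentials}{location}/"


# Local server on the default port:
local_uri = build_mongodb_uri("localhost")
# Atlas-style cluster (the host name below is a made-up placeholder):
atlas_uri = build_mongodb_uri("cluster0.example.mongodb.net",
                              username="app", password="p@ss/word", srv=True)
```

The resulting string can then be passed as `connectionURI` to the `MongoDB` class shown above.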

The Python code manipulates several CSV files, which can be found in different versions inside the datasets folders. Detailed information on the manipulation process which led to the final state of the database can be found in the Report.

📊 Diagrams

ER Diagram


Document Diagram


📇 About mimed certificates

Certificate of vaccination

This document contains information which reflects a real green certificate. Its fields (except the QRCode field) comply with the official European eHealth network COVID certificate JSON Schema Specification, although not all the specified fields are included here. In particular, these fields are:

  • QRCode: This string value represents the encoded JSON 'certificateOfVaccination' document, excluding this field. The process for producing it loosely mimics what the European eHealth Trust Framework for Certificates expects. The process we applied is the following:

    • JSON Dump of the Python dictionary representing the certificate.
    • UTF-8 encoding of the JSON string.
    • Base 45 encoding of the bytes generated by the previous encoding.
    • Compression with zlib of the previously generated bytes.
    • QR Code Image generation with qrcode Python library.
    • Base 64 encoding of image bytes.
    • Bytes conversion to string.

    Through appropriate functions the process can be performed backwards to obtain both the QR Code and the original JSON string.

  • diseaseOrAgentTargeted: This value set has a single entry, 840539006, which is the SNOMED CT code for COVID-19.

  • vaccineOrProphylaxis: SNOMED CT code indicating the vaccine or prophylaxis used. The mapping is the following:

    • SARS-CoV2 antigen vaccine: 1119305005
    • SARS-CoV2 mRNA vaccine: 1119349007
  • vaccineProduct: Code from the Union Register of medicinal products code system, representing the medicinal product used for the specific dose of vaccination. The mapping is the following:

    • Pfizer Vaccine: EU/1/20/1528
    • Moderna Vaccine: EU/1/20/1507
    • AstraZeneca Vaccine: EU/1/21/1529
    • Janssen Vaccine: EU/1/20/1525
  • uniqueCertificateIdentifier: Unique certificate identifier (UVCI), whose structure mimics the one specified in this document only in terms of sequence of digits and characters.

  • doseNumber: Sequence number (positive integer) of the dose given during a vaccination event: 1 for the first dose, 2 for the second dose, etc.

  • totalSeriesOfDoses: Total number of doses (positive integer) in a complete vaccination series according to the vaccination protocol used. In the database this value is set to 1 for "Janssen" vaccine certificates and to 2 in all other cases.

  • countryOfVaccination: Country expressed as a 2-letter ISO 3166 code. In the current database this value is 'IT', since the data concerns Italy.

  • marketingAuthorizationHolder: Marketing authorisation holder code from EMA SPOR Organisations Management System. The mapping is the following:

    • AstraZeneca AB: ORG-100001699
    • Biontech Manufacturing GmbH: ORG-100030215
    • Janssen-Cilag International: ORG-100001417
    • Moderna Biotech Spain: ORG-100031184
  • certificateIssuer: Name of the organisation that issued the certificate. In the current database this value is always Italian Ministry of Health, since the data concerns Italy.

  • certificateValidFrom: The first date on which the certificate is considered to be valid, provided in the format YYYY-mm-ddTHH:MM:ss. Following what is specified here, in the current database this date is 15 days after the first dose, or 3 days after it in the case of the "Janssen" vaccine.

  • certificateValidUnti: The last date on which the certificate is considered to be valid, assigned by the certificate issuer, provided in the format YYYY-mm-ddTHH:MM:ss. Following what is specified here, in the current database this date is 28 days after the vaccination in the case of a first dose, and 270 days after it in the case of a second dose or a single dose.

  • schemaVersion: Value matching the identifier of the schema version used for producing the EUDCC. In the current database this value is set to 1.0.0.
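The text stages of the QRCode process listed above (JSON dump, UTF-8 encoding, Base45 encoding, zlib compression) and their inverse can be sketched with the standard library alone. This is an illustrative sketch, not the repository's code: the function names are hypothetical, Base45 is implemented inline following RFC 9285 rather than through whatever library the project uses, and the QR image generation and Base64 steps are left out because they need the third-party qrcode package.

```python
import json
import zlib

# Base45 alphabet as defined in RFC 9285.
BASE45_ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ $%*+-./:"


def base45_encode(data: bytes) -> str:
    """Encode bytes as Base45: 2 bytes -> 3 chars, a trailing byte -> 2 chars."""
    out = []
    for i in range(0, len(data), 2):
        chunk = data[i:i + 2]
        if len(chunk) == 2:
            n = chunk[0] * 256 + chunk[1]
            rest, c = divmod(n, 45)
            e, d = divmod(rest, 45)
            out += [BASE45_ALPHABET[c], BASE45_ALPHABET[d], BASE45_ALPHABET[e]]
        else:
            d, c = divmod(chunk[0], 45)
            out += [BASE45_ALPHABET[c], BASE45_ALPHABET[d]]
    return "".join(out)


def base45_decode(text: str) -> bytes:
    """Inverse of base45_encode (assumes well-formed input)."""
    values = [BASE45_ALPHABET.index(ch) for ch in text]
    out = bytearray()
    for i in range(0, len(values), 3):
        group = values[i:i + 3]
        if len(group) == 3:
            n = group[0] + group[1] * 45 + group[2] * 45 * 45
            out += n.to_bytes(2, "big")
        else:
            out.append(group[0] + group[1] * 45)
    return bytes(out)


def encode_certificate_payload(certificate: dict) -> bytes:
    """JSON dump -> UTF-8 -> Base45 -> zlib, in the order given in the list."""
    json_bytes = json.dumps(certificate).encode("utf-8")
    base45_bytes = base45_encode(json_bytes).encode("ascii")
    return zlib.compress(base45_bytes)


def decode_certificate_payload(payload: bytes) -> dict:
    """Run the same stages backwards to recover the original dictionary."""
    base45_text = zlib.decompress(payload).decode("ascii")
    return json.loads(base45_decode(base45_text).decode("utf-8"))


# Made-up sample using two of the fields described above.
sample = {"doseNumber": 1, "countryOfVaccination": "IT"}
restored = decode_certificate_payload(encode_certificate_payload(sample))
```

In the full process the compressed bytes would then be rendered as a QR image, whose bytes are Base64-encoded and stored as a string in the QRCode field.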

Certificate of testing

This document contains information which reflects a real green certificate. Its fields (except the QRCode field) comply with the official European eHealth network COVID certificate JSON Schema Specification, although not all the specified fields are included here. In particular, these fields are:

  • QRCode: This string value represents the encoded JSON certificate document, excluding this field. The process for producing it loosely mimics what the European eHealth Trust Framework for Certificates expects. The process we applied is the following:

    • JSON Dump of the Python dictionary representing the certificate.
    • UTF-8 encoding of the JSON string.
    • Base 45 encoding of the bytes generated by the previous encoding.
    • Compression with zlib of the previously generated bytes.
    • QR Code Image generation with qrcode Python library.
    • Base 64 encoding of image bytes.
    • Bytes conversion to string.
  • diseaseOrAgentTargeted: This value set has a single entry, 840539006, which is the SNOMED CT code for COVID-19.

  • testType: LOINC code of the type of test used, based on the material targeted by the test. According to this report, the mapping is the following:

    • Molecular: 94309-2
    • Antigen: 94558-4
    • Antibody: 94762-2
  • resultOfTheTest: coded value based on SNOMED CT. The mapping is the following:

    • Detected: 260373001
    • Not Detected: 260415000
  • uniqueCertificateIdentifier: Unique certificate identifier (UVCI), whose structure mimics the one specified in this document only in terms of sequence of digits and characters.

  • countryOfTesting: Country expressed as a 2-letter ISO 3166 code. In the current database this value is IT, since the data concerns Italy.

  • certificateIssuer: Name of the organisation that issued the certificate. In the current database this value is always Italian Ministry of Health, since the data concerns Italy.

  • certificateValidFrom: The first date on which the certificate is considered to be valid, provided in the format YYYY-mm-ddTHH:MM:ss. Following what is specified here, in the current database this date is 3 days after the test for the Antibody test, 2 days for the Molecular test and 0 days for the Antigen test, which are the days spent waiting for the result.

  • certificateValidUnti: The last date on which the certificate is considered to be valid, assigned by the certificate issuer, provided in the format YYYY-mm-ddTHH:MM:ss. Following what is specified here, in the current database this date is 3 days after the result date for the Molecular test and the Antibody test, and 2 days after it for the Antigen test.

  • schemaVersion: Value matching the identifier of the schema version used for producing the EUDCC. In the current database this value is set to 1.0.0.

πŸ“ License

This file is part of "Covid certificates-oriented MongoDB Database".

"Covid certificates-oriented MongoDB Database" is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

"Covid certificates-oriented MongoDB Database" is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program (LICENSE.txt). If not, see http://www.gnu.org/licenses/
