jakubvalenta/covid-berlin-scraper

Scraper for Covid-19 Data in Berlin

Download Covid-19 data from the official sources of the city of Berlin.

See covid-berlin-data for the data itself (updated daily).

Installation

Mac

$ brew install python
$ pip install poetry
$ make setup

Arch Linux

# pacman -S poetry
$ make setup

Other systems

Install these dependencies manually:

  • Python >= 3.8.1
  • poetry

Then run:

$ make setup

Usage

This program works in several steps:

  1. Download press releases from the current RSS feed and save their metadata to a database in the passed cache directory:

    $ ./covid-berlin-scraper --cache my_cache_dir --verbose download-feed
  2. Download the current district table (Verteilung in den Bezirken) and save the data to a database in the passed cache directory:

    $ ./covid-berlin-scraper --cache my_cache_dir --verbose download-district-table
  3. Download the current dashboard and save the data to a database in the passed cache directory:

    $ ./covid-berlin-scraper --cache my_cache_dir --verbose download-dashboard
  4. (Optional) Download press releases from the press release archive and save their metadata to the same database:

    $ ./covid-berlin-scraper --cache my_cache_dir --verbose download-archives
  5. Parse the content of all press releases, district tables, and dashboards stored in the database and generate CSV output:

    $ ./covid-berlin-scraper --cache my_cache_dir --verbose parse-press-releases \
        -o my_output.csv \
        --output-hosp my_output_incl_hospitalized.csv
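The steps above amount to a download-cache-parse pipeline: fetch press-release metadata, store it in a database in the cache directory, then export CSV. The following is a minimal sketch of that idea using only the Python standard library (sqlite3 and xml.etree); the `press_releases` schema and function names are illustrative assumptions, not the scraper's actual internals.

```python
import sqlite3
import xml.etree.ElementTree as ET


def init_cache(conn):
    # One row per press release; the URL is the natural key, so
    # re-running the download step does not create duplicates.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS press_releases ("
        " url TEXT PRIMARY KEY, title TEXT, published TEXT)"
    )


def store_feed_entries(conn, rss_xml):
    """Parse RSS XML and upsert each item's metadata into the cache."""
    root = ET.fromstring(rss_xml)
    for item in root.iter("item"):
        conn.execute(
            "INSERT OR REPLACE INTO press_releases VALUES (?, ?, ?)",
            (
                item.findtext("link"),
                item.findtext("title"),
                item.findtext("pubDate"),
            ),
        )
    conn.commit()


def export_csv(conn):
    """Render the cached metadata as CSV lines, oldest first."""
    rows = conn.execute(
        "SELECT published, title, url FROM press_releases ORDER BY published"
    )
    lines = ["published,title,url"]
    for published, title, url in rows:
        lines.append(f"{published},{title},{url}")
    return "\n".join(lines)


# Example with an inline feed instead of a network download:
feed = """<rss><channel>
<item><title>Corona update</title><link>https://example.org/1</link>
<pubDate>2020-03-01</pubDate></item>
</channel></rss>"""
conn = sqlite3.connect(":memory:")
init_cache(conn)
store_feed_entries(conn, feed)
```

Keying the cache on the URL is what makes it safe to run the download steps repeatedly (e.g. from a daily cron job), which matches how the data repository is updated daily.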

Help

See all command line options:

$ ./covid-berlin-scraper --help

Development

Installation

$ make setup

Testing and linting

$ make test
$ make lint

Help

$ make help

Contributing

Feel free to remix this project under the terms of the Apache License, Version 2.0.