Download COVID-19 data from the official sources of the city of Berlin:
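- Pressemitteilungen der Senatsverwaltung für Gesundheit, Pflege und Gleichstellung (press releases of the Senate Department for Health, Care and Equality)
- COVID-19 in Berlin, Verteilung in den Bezirken (distribution across the districts)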
- COVID-19 in Berlin (dashboard).
See covid-berlin-data for the data itself (updated daily).
On macOS, using Homebrew:
$ brew install python
$ pip install poetry
$ make setup
On Arch Linux (as root):
# pacman -S poetry
$ make setup
On other platforms, install these dependencies manually:
- Python >= 3.8.1
- poetry
Then run:
$ make setup
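Before running `make setup`, it can help to confirm that both dependencies are actually present. The following is a minimal sketch, not part of the project: `version_ge` and `check_deps` are hypothetical helper names, and it assumes the interpreter is available as `python3` on your PATH.

```shell
#!/bin/sh
# Hypothetical helpers, not shipped with covid-berlin-scraper.

# version_ge A B: succeed when dotted version A is >= B.
# Compares the first three numeric fields; missing fields count as 0.
version_ge() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    split(a, x, "."); split(b, y, ".")
    for (i = 1; i <= 3; i++) {
      if (x[i] + 0 > y[i] + 0) exit 0
      if (x[i] + 0 < y[i] + 0) exit 1
    }
    exit 0
  }'
}

# check_deps: verify that poetry is installed and Python is >= 3.8.1.
# Assumes the interpreter is called python3.
check_deps() {
  if ! command -v poetry >/dev/null 2>&1; then
    echo "poetry not found on PATH" >&2
    return 1
  fi
  py_version="$(python3 -c 'import platform; print(platform.python_version())')" || return 1
  if ! version_ge "$py_version" 3.8.1; then
    echo "Python $py_version is older than 3.8.1" >&2
    return 1
  fi
  echo "Dependencies look fine (Python $py_version)"
}
```

After sourcing these functions, `check_deps && make setup` runs the setup only when both checks pass.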
This program works in several steps:
- Download press releases from the current RSS feed and save their metadata to a database in the passed cache directory:
$ ./covid-berlin-scraper --cache my_cache_dir --verbose download-feed
- Download the current district table (Verteilung in den Bezirken) and save the data to a database in the passed cache directory:
$ ./covid-berlin-scraper --cache my_cache_dir --verbose download-district-table
- Download the current dashboard and save the data to a database in the passed cache directory:
$ ./covid-berlin-scraper --cache my_cache_dir --verbose download-dashboard
- (Optional) Download press releases from the press release archive and save their metadata to the same database:
$ ./covid-berlin-scraper --cache my_cache_dir --verbose download-archives
- Parse the content of all press releases, district tables and dashboards stored in the database and generate CSV output:
$ ./covid-berlin-scraper --cache my_cache_dir --verbose parse-press-releases \
    -o my_output.csv \
    --output-hosp my_output_incl_hospitalized.csv
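The steps above can be chained in a small wrapper script. This is only a sketch under stated assumptions: the function names `run_step` and `run_pipeline` and the environment variables `SCRAPER`, `CACHE_DIR`, `DRY_RUN` and `WITH_ARCHIVES` are hypothetical and not part of the program. Only the subcommands and options shown in the steps above are used.

```shell
#!/bin/sh
# Hypothetical wrapper, not shipped with covid-berlin-scraper.
set -eu

SCRAPER="${SCRAPER:-./covid-berlin-scraper}"   # path to the scraper executable
CACHE_DIR="${CACHE_DIR:-my_cache_dir}"         # cache directory holding the database

# run_step SUBCOMMAND [ARGS...]: run one scraper subcommand against the cache.
# With DRY_RUN=1 the command line is printed instead of executed.
run_step() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$SCRAPER --cache $CACHE_DIR --verbose $*"
  else
    "$SCRAPER" --cache "$CACHE_DIR" --verbose "$@"
  fi
}

# run_pipeline: the download steps first, then the parse step.
run_pipeline() {
  run_step download-feed
  run_step download-district-table
  run_step download-dashboard
  # The archive download is optional, so it is opt-in here.
  if [ "${WITH_ARCHIVES:-0}" = "1" ]; then
    run_step download-archives
  fi
  run_step parse-press-releases \
    -o my_output.csv \
    --output-hosp my_output_incl_hospitalized.csv
}
```

Calling `DRY_RUN=1 run_pipeline` prints the commands without touching the network, which is a cheap way to check the order before a real run. Because of `set -eu`, a failing step stops the pipeline before the parse step runs.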
See all command line options:
$ ./covid-berlin-scraper --help
Common development tasks are available as make targets:
$ make setup
$ make test
$ make lint
$ make help
Feel free to remix this project under the terms of the Apache License, Version 2.0.