
somnisomni/twitter-account-data-crawler


Twitter Account Data Crawler

License: MIT

A 'smol' program that crawls following/followers/statuses counts from a Twitter account's profile page using Selenium, and puts the crawled data into a MySQL database using PyMySQL.
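Counts on a profile page are display strings, not raw numbers, and larger values may be shown abbreviated (for example "12.3K"). The sketch below shows one way to turn such text into an integer before storing it. It uses a hypothetical parse_count helper that is not taken from this project's source, and it assumes English-locale formatting. Abbreviated values can only be recovered approximately.

```python
import re
from decimal import Decimal

# Multipliers for the abbreviation suffixes assumed to appear on the page.
_SUFFIXES = {"": 1, "K": 1_000, "M": 1_000_000, "B": 1_000_000_000}


def parse_count(text: str) -> int:
    """Convert displayed count text such as '1,234' or '12.3K' to an int.

    Hypothetical helper; abbreviated values are approximate ('12.3K' -> 12300).
    """
    cleaned = text.strip().replace(",", "").upper()
    match = re.fullmatch(r"(\d+(?:\.\d+)?)([KMB]?)", cleaned)
    if match is None:
        raise ValueError(f"unrecognized count text: {text!r}")
    number, suffix = match.groups()
    # Decimal avoids float artifacts such as 12299.999... for '12.3K'.
    return int(Decimal(number) * _SUFFIXES[suffix])
```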

The purpose of this program is to record the follower count daily and see how it changes every day. MAYBE THIS IS NOT PRODUCTION-READY, so use this with caution!
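Because one row is stored per day, the day-over-day change can be computed afterwards from the stored rows. Here is a minimal sketch using a hypothetical daily_changes function. It takes (date, follower_count) pairs, for example rows read back from the table described in Database Table Structure.

```python
import datetime


def daily_changes(rows):
    """Return (date, change) for every row after the earliest one.

    `rows` is an iterable of (datetime.date, follower_count) pairs in any
    order; each change is the difference from the previous recorded day.
    """
    ordered = sorted(rows)
    return [
        (curr_date, curr_count - prev_count)
        for (_, prev_count), (curr_date, curr_count) in zip(ordered, ordered[1:])
    ]
```

If a day is missing, the change reported for the next recorded day covers the whole gap.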

Why? You Can Simply Use the Twitter API, Can't You?

(Screenshot: Twitter API application suspended)

YES, I DID. But one day Twitter suspended my API application, even though I didn't overuse or abuse it! Probably this is an Elon thing.

The source code of the original implementation, which accesses the Twitter API through python-twitter, is stored in the old branch.

Deal With Docker

A Dockerfile is provided in both the current and the old (original) source trees.

To build:

$ cd <root-directory-of-source>
$ docker build -t twitter-account-data-crawler:latest .

After building, run:

$ docker run -d \
             --name twitter-account-data-crawler \
             -v <path-of-config.yaml>:/app/config/config.yaml \
             twitter-account-data-crawler

You have to prepare a configuration file (config.yaml). Please refer to the example config file and create your own.

If you're using Podman, just replace docker with podman in the commands above.

Deal Without Docker

You may still run the program without Docker or any other OCI-compliant runtime.

To get it working:

$ cd <root-directory-of-source>
# Install requirements
$ pip install -r requirements.txt
# and run!
$ python index.py

The configuration file (config.yaml) should exist in the config folder.
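A missing config file is an easy mistake when running outside Docker. The hypothetical find_config check below fails early with a clear message instead. It assumes the file lives at config/config.yaml under the source root, which matches the /app/config/config.yaml mount in the Docker example. It is not part of this project.

```python
from pathlib import Path


def find_config(base_dir) -> Path:
    """Return <base_dir>/config/config.yaml, or raise if it does not exist."""
    path = Path(base_dir) / "config" / "config.yaml"
    if not path.is_file():
        raise FileNotFoundError(
            f"Configuration file not found: {path}. "
            "Create it from the example config file."
        )
    return path
```

For example, find_config(Path(__file__).resolve().parent) resolves the path relative to the script, not to the current working directory.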

Database Table Structure

Currently only MySQL (and probably MySQL-compatible DBMSs like MariaDB) is supported.

Creating one table per target account is recommended.
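With one table per account, the table name has to be written into the SQL text itself. PyMySQL's %s placeholders escape values as quoted string literals, not identifiers. The sketch below quotes a per-account table name using MySQL's backtick rules. The track_<screen_name> naming scheme and both helper functions are hypothetical.

```python
MAX_IDENTIFIER_LENGTH = 64  # MySQL's maximum length for a table name


def quote_identifier(name: str) -> str:
    """Wrap a MySQL identifier in backticks, doubling embedded backticks."""
    if not name or len(name) > MAX_IDENTIFIER_LENGTH:
        raise ValueError(f"invalid identifier length: {name!r}")
    return "`" + name.replace("`", "``") + "`"


def table_for_account(screen_name: str, prefix: str = "track_") -> str:
    """Hypothetical naming scheme: one quoted table name per tracked account."""
    return quote_identifier(prefix + screen_name.lower())
```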

The table should have at least these columns:

  • date: type date
  • following_count: type int, unsigned
  • follower_count: type int, unsigned
  • tweet_count: type int, unsigned
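The column list above maps naturally onto a small record type. The sketch below defines one as a hypothetical AccountSnapshot class, which is not part of this project. It rejects values that would not fit a MySQL INT UNSIGNED column, whose range is 0 to 4294967295.

```python
import datetime
from dataclasses import dataclass

INT_UNSIGNED_MAX = 4_294_967_295  # largest value a MySQL INT UNSIGNED can hold


@dataclass(frozen=True)
class AccountSnapshot:
    """One day's counts for a tracked account (hypothetical record type)."""

    date: datetime.date
    following_count: int
    follower_count: int
    tweet_count: int

    def __post_init__(self) -> None:
        # Reject values the unsigned INT columns could not store.
        for name in ("following_count", "follower_count", "tweet_count"):
            value = getattr(self, name)
            if not 0 <= value <= INT_UNSIGNED_MAX:
                raise ValueError(f"{name} out of INT UNSIGNED range: {value}")

    def as_params(self) -> tuple:
        """Return the values in the same order as the listed columns."""
        return (self.date, self.following_count, self.follower_count, self.tweet_count)
```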

An example SQL statement that creates a table with these columns:

CREATE TABLE `account_track_table` (
  `date` date NOT NULL,
  `following_count` int UNSIGNED NOT NULL,
  `follower_count` int UNSIGNED NOT NULL,
  `tweet_count` int UNSIGNED NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
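To store one day's numbers in a table like the one above, a PyMySQL connection can run a parameterized INSERT. The sketch below uses a hypothetical save_counts helper that receives an already-open connection. PyMySQL cursors support the with statement, and the counts are passed through %s placeholders. This is an illustration, not the project's actual code.

```python
INSERT_SQL = (
    "INSERT INTO `{table}` "
    "(`date`, `following_count`, `follower_count`, `tweet_count`) "
    "VALUES (%s, %s, %s, %s)"
)


def save_counts(conn, table, day, following, followers, tweets):
    """Insert one day's counts through a PyMySQL-style DB-API connection.

    The table name is formatted into the SQL because identifiers cannot be
    passed as parameters, so only trusted names should reach this function.
    The counts go through %s placeholders and are escaped by the driver.
    """
    if "`" in table:
        raise ValueError("table name must not contain backticks")
    sql = INSERT_SQL.format(table=table)
    with conn.cursor() as cursor:
        cursor.execute(sql, (day, following, followers, tweets))
    conn.commit()


# Usage (connection parameters are placeholders):
# import datetime
# import pymysql
# conn = pymysql.connect(host="localhost", user="crawler",
#                        password="secret", database="twitter_stats")
# save_counts(conn, "account_track_table", datetime.date.today(), 120, 3400, 5600)
```

The example table has no primary key, so running the program twice on the same day inserts two rows for that date. A PRIMARY KEY on date would turn the second insert into a duplicate-key error instead.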