
data_reeler

A WebCrawler microservice for fishing stuff!

Prerequisites for runtime

Elixir

The specific versions of Elixir and Erlang are set in the .tool-versions file. They can be installed with asdf. After following the asdf installation instructions, add the required plugins for the runtimes:
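A `.tool-versions` file pins both runtimes for asdf. Based on the version output shown further below, it likely looks something like this (exact patch versions are illustrative):

```
elixir 1.16.0-otp-26
erlang 26.2.1
```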

asdf plugin add elixir
asdf plugin add erlang
asdf install

After a few minutes of compiling Erlang you should have Elixir working:

$ elixir -v 
Erlang/OTP 26 [erts-14.2.1] [source] [64-bit] [smp:8:8] [ds:8:8:10] [async-threads:1] [jit:ns]

Elixir 1.16.0 (compiled with Erlang/OTP 24)

CONCURRENT is the number of active connections. Generally 4 connections per service is a good default, so multiply the number of services by 4 for this value.

PREBOOT_QUANTITY should probably be the number of Chrome browsers active at any time, although this is uncertain due to sparse upstream documentation.
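Putting the two variables together, a hypothetical setup for three services might look like this (the service count of 3 and the preboot value of 2 are illustrative, not recommendations):

```shell
# Illustrative sizing: 3 services, 4 connections each
export CONCURRENT=$((3 * 4))   # number of active connections
export PREBOOT_QUANTITY=2      # prebooted Chrome browsers
echo "$CONCURRENT"
```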

Elasticsearch

As this application uses Elasticsearch for its main product-filtering logic, it needs to run as a service. Due to an implementation bug in the ES library, Elasticsearch 8 can't be used. To run the correct version in a container, run this command:

podman run --restart always -d --name elasticsearch --memory 2048m -p 0.0.0.0:9200:9200 -p 9300:9300 -e "discovery.type=single-node" -e "xpack.security.enabled=false" docker.elastic.co/elasticsearch/elasticsearch:7.17.18

To index/sync products onto the ES server, you need to run this command:

mix elasticsearch.build products --cluster DataReeler.Elasticsearch.Cluster

Starting the crawlers

To start all crawlers, run this command:

mix data_reeler.crawlers

Alternatively, if DECOUPLED_CRAWLERS is set to false, the crawlers will run alongside the server.

To start an individual crawler, head to the Server folder and find the last part of the module's CamelCase name (for example, DataReeler.Servers.MyStore), then pass its snake_case form with the following command:

mix data_reeler.crawlers --crawler my_store
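The snake_case crawler name can be derived mechanically from the module's last segment. A small shell sketch, using the MyStore example above:

```shell
# Convert a module's last segment (e.g. "MyStore") to the
# snake_case name expected by --crawler
crawler_name=$(echo "MyStore" | sed -E 's/([a-z0-9])([A-Z])/\1_\2/g' | tr '[:upper:]' '[:lower:]')
echo "$crawler_name"   # my_store
```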

You can also run multiple crawlers like this:

mix data_reeler.crawlers \
  --crawler my_store \
  --crawler another_store \
  --crawler yet_another_store

Starting the server

  • Run mix setup to install and set up dependencies
  • Start Phoenix endpoint with mix phx.server or inside IEx with iex -S mix phx.server

Now you can visit localhost:4000 from your browser.

Ready to run in production? Please check our deployment guides.
