Web Crawler

Sp1d3R | 2024

A fast, asynchronous web crawler, indexer, and search engine.

Usage

The snippet below sets the crawler options:

[crawl_options]
log_file = './crawl.log'          # path to the log file
database_location = './databases' # directory for the databases
debug = true                      # enable debug logging
profile = true                    # start the profiler
cache_dir = './data'              # directory for the page cache
graph_dir = './graphs'            # directory for generated graphs
index = "./indexes.pkl"           # filename for the pickled index
workers = 8                       # number of indexing workers
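
For reference, here is a minimal sketch of how these options could be read in Python, assuming the standard-library tomllib parser (Python 3.11+); the crawler's actual loading code may differ:

import tomllib  # standard-library TOML parser (Python 3.11+)

def load_crawl_options(path: str = "config.toml") -> dict:
    """Read the [crawl_options] table from a config file."""
    with open(path, "rb") as f:  # tomllib requires binary mode
        config = tomllib.load(f)
    return config["crawl_options"]

options = load_crawl_options()
print(options["workers"])  # 8 with the snippet above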

The snippet below defines a crawl profile:

[profiles]
[profiles.PROFILE_NAME]
    locations = [ 'https://sp1d3r.vercel.app' ] # seed URLs
    depth = 3                                   # maximum crawl depth
    match = [ '<regex matches>' ]               # regular expressions to match URLs
    filter = [ '<regex filters>' ]              # regular expressions to filter URLs
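
As an illustration of how the match and filter lists might be applied, the hypothetical helper below keeps a URL only if it matches at least one match pattern and no filter pattern; this semantics is an assumption, and the crawler's real logic may differ:

import re

def should_crawl(url: str, match: list[str], filters: list[str]) -> bool:
    """Hypothetical helper: keep URLs that hit a match pattern
    and miss every filter pattern (assumed semantics)."""
    if match and not any(re.search(p, url) for p in match):
        return False
    return not any(re.search(p, url) for p in filters)

# With the example profile above:
should_crawl('https://sp1d3r.vercel.app/about', [r'vercel\.app'], [r'\.pdf$'])  # True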

Refer to the config.toml file for more examples.

Run the crawler:

$ python crawler -config config.toml
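
A minimal sketch of how the -config argument could be parsed (hypothetical; the project's actual entry point may differ):

import argparse
import tomllib

parser = argparse.ArgumentParser(prog='crawler')
parser.add_argument('-config', dest='config', default='config.toml',
                    help='path to the TOML configuration file')
args = parser.parse_args()

with open(args.config, 'rb') as f:
    config = tomllib.load(f)  # holds the crawl_options and profiles tables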

TODO

  • Add graph frontend

Finished

  • Add indexing
  • Add search engine