funspectre/web-crawler

Simple Web Crawler

This is a simple web crawler that takes a starting URL, visits the pages it links to on the same domain, and records every same-domain URL it finds in order to generate a sitemap.
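The "same domain" check is the core of a crawler like this. The following is a minimal sketch of one way to do it with the standard `net/url` package, not the repository's actual code. The function name `resolveSameHost` and the choice to drop fragments and non-HTTP schemes are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"net/url"
)

// resolveSameHost (hypothetical helper) resolves href against base and
// reports whether the result is an http(s) URL on the same host as base.
// The returned string is the absolute URL with any fragment removed.
func resolveSameHost(base, href string) (string, bool) {
	baseURL, err := url.Parse(base)
	if err != nil {
		return "", false
	}
	ref, err := url.Parse(href)
	if err != nil {
		return "", false
	}
	abs := baseURL.ResolveReference(ref)
	// Assumption: links such as mailto: or javascript: are not crawled.
	if abs.Scheme != "http" && abs.Scheme != "https" {
		return "", false
	}
	// Assumption: "#section" links point at the same page, so drop the fragment.
	abs.Fragment = ""
	abs.RawFragment = ""
	return abs.String(), abs.Host == baseURL.Host
}

func main() {
	base := "https://gobyexample.com/"
	for _, href := range []string{"/hello-world", "https://github.com/", "mailto:someone@example.com"} {
		abs, ok := resolveSameHost(base, href)
		fmt.Printf("%q -> %q same host: %v\n", href, abs, ok)
	}
}
```

Comparing `Host` exactly treats `www.example.com` and `example.com` as different hosts. Whether the crawler should do the same is a design choice that the README does not specify.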

Build

To build the program, you need to have the following installed:

  • Go 1.19 or later

Then run the following commands in the project folder:

# Download dependencies
go mod vendor
# Build the program
go build -o web-crawler
# Run the program
./web-crawler https://gobyexample.com/

Sitemap Format

The sitemap format is as follows:

  • For every page visited, the parent URL is written to sitemap.txt in the current working directory, followed by every anchor URL found on the page that belongs to the same host
  • Each such group of URLs is separated from the next by an empty line
[parent-url]
[child-url]
[empty-line]
[parent-url]
[child-url]
[child-url]
[empty-line]
...
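
As an illustration of this layout, here is a minimal sketch that formats one group and writes it to an `io.Writer`. The names `formatGroup` and `writeGroup` are hypothetical and are not taken from the repository. The sketch writes to standard output rather than sitemap.txt, and the example child URLs are placeholders.

```go
package main

import (
	"io"
	"os"
	"strings"
)

// formatGroup (hypothetical helper) renders one sitemap group:
// the parent URL, one line per child URL, then an empty line.
func formatGroup(parent string, children []string) string {
	var b strings.Builder
	b.WriteString(parent)
	b.WriteByte('\n')
	for _, child := range children {
		b.WriteString(child)
		b.WriteByte('\n')
	}
	b.WriteByte('\n') // the empty line that closes the group
	return b.String()
}

// writeGroup writes a formatted group to w, for example an open sitemap.txt file.
func writeGroup(w io.Writer, parent string, children []string) error {
	_, err := io.WriteString(w, formatGroup(parent, children))
	return err
}

func main() {
	// Placeholder URLs, used only to show the layout.
	children := []string{
		"https://gobyexample.com/hello-world",
		"https://gobyexample.com/values",
	}
	if err := writeGroup(os.Stdout, "https://gobyexample.com/", children); err != nil {
		panic(err)
	}
}
```

To produce a file instead, the same `writeGroup` call could be given a handle returned by `os.Create("sitemap.txt")`.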
