gaorong/GRCrawler: a multi-threaded and high-performance crawler

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

7 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

GRCrawler

  • GRCrawler is an open-source web crawler written in C++ that is customizable and high-performance.
  • Based on the Reactor pattern (epoll); supports HTTP/1.1 (persistent connections, pipelining, chunked transfer encoding).
  • Uses a Bloom filter, a thread pool, asynchronous DNS resolution, and other techniques to improve performance.
  • Customize basic behavior through the configuration file, and add advanced features with a DSO (dynamic shared object, a ".so" file).

Requirements

  • libevent
  • Works on Linux

Installation

 $ make && make install

Quickstart

Configure the crawler via spider.conf and run the spider; that's all.
The "-d" option runs the program as a daemon process.
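For orientation, a hypothetical spider.conf fragment is shown below. The key names (seed_url, thread_num, depth_limit) are assumptions for illustration only and are not taken from the project; consult the sample configuration file shipped with the source for the real keys.

```
# spider.conf: hypothetical keys for illustration; check the shipped sample
seed_url    = http://example.com/   # where the crawl starts
thread_num  = 8                     # worker threads in the pool
depth_limit = 3                     # maximum crawl depth
```

Then start the spider, adding "-d" to run it as a daemon:

 $ ./spider -d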
