
niteshkumar2000/ScrapeIt


ScrapeIt

Scrapy bot that scrapes data from popular websites. The bots are built with Scrapy, a popular Python web-scraping package.

How to start

First, create a Scrapy project.

scrapy startproject projname

Then change into the project directory and generate a spider for the website to be scraped.

cd projname
scrapy genspider spiderclassname domain

Note: The domain should be given without the scheme, i.e. without https://
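If you only have a full URL at hand, the bare domain can be derived before calling genspider. The helper below is a minimal sketch using only the standard library; the function name `domain_for_genspider` and the example.com URLs are hypothetical and not part of this repo.

```python
from urllib.parse import urlparse


def domain_for_genspider(url: str) -> str:
    """Return only the host part of a URL, suitable for `scrapy genspider`."""
    # urlparse only fills in the host when a scheme or a leading `//` is
    # present, so prepend `//` to bare inputs such as "example.com/path".
    parsed = urlparse(url if "//" in url else "//" + url)
    # `hostname` drops the scheme, port, and path.
    return parsed.hostname or ""


if __name__ == "__main__":
    print(domain_for_genspider("https://example.com/page/1/"))
```

The result can then be passed as the domain argument of `scrapy genspider spiderclassname domain`.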

Locate the generated spider file in the project's spiders folder and start designing the bot.

How to test

The Scrapy shell lets us test commands and see the results instantly.

scrapy shell "URL"

Note: The URL should be inside quotes.

You can then inspect the fetched page by running this command in the shell:

view(response)

If it returns True, the page is displayed in your default browser.

Data can be extracted using two techniques:

  1. XPath (recommended).
  2. CSS selectors.

How to run

Clone the repo, navigate to any of the project folders, and run:

cd projname
scrapy crawl spiderclassname

The scraped items are printed in the console log. You can also save the output to a .json or .csv file by running:

scrapy crawl spiderclassname -o result.json
scrapy crawl spiderclassname -o result.csv
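Once a feed file exists, it can be read back with the standard library. The helper below is a minimal sketch; the name `load_items` is hypothetical, and it assumes the .json feed is a single JSON array and the .csv feed starts with a header row, which is how Scrapy normally writes them.

```python
import csv
import json
from pathlib import Path


def load_items(path):
    """Load scraped items from a .json or .csv feed file as a list of dicts."""
    path = Path(path)
    if path.suffix == ".json":
        # The JSON feed is one array holding an object per item.
        with path.open(encoding="utf-8") as f:
            return json.load(f)
    if path.suffix == ".csv":
        # The first CSV row holds the field names; all values load as strings.
        with path.open(newline="", encoding="utf-8") as f:
            return list(csv.DictReader(f))
    raise ValueError(f"unsupported feed format: {path.suffix}")
```

For example, `load_items("result.csv")` returns one dict per scraped row. Note that `-o` appends to an existing file, which can leave a .json feed invalid after a second run; deleting the old file first, or using `-O` in Scrapy versions that support it, avoids this.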

If you are unable to run the Scrapy project, you can still view the spider's output in the result.csv file located in the project folder.

Happy Scraping :)
