ScrapeIt

A Scrapy bot that scrapes data from popular websites. The bot is built with Scrapy, a popular Python package for web scraping.

How to start

First, create a Scrapy project.

scrapy startproject projname

Then change into the project directory and generate a spider for the website to be scraped.

cd projname
scrapy genspider spiderclassname domain

Note: The domain name should not include the https:// prefix.
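If you are starting from a full URL, the bare domain can be extracted with the standard library. This is only a minimal sketch: `to_domain` is a hypothetical helper and `example.com` is a placeholder, neither is part of this repo.

```python
from urllib.parse import urlparse


def to_domain(url_or_domain: str) -> str:
    """Return the bare host name, without scheme or path."""
    # urlparse only fills in netloc when "//" is present,
    # so prepend it for inputs that are already bare domains.
    candidate = url_or_domain if "//" in url_or_domain else "//" + url_or_domain
    return urlparse(candidate).netloc


# Placeholder URL, used for illustration only.
print(to_domain("https://example.com/some/page"))
```

The printed value is what you would pass as `domain` to `scrapy genspider`.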

Locate the generated spider file (inside the project's spiders folder) and start designing the bot.

How to test

The Scrapy shell lets you test commands and see the results instantly.

scrapy shell "URL"

Note: The URL should be inside quotes.

Then you can check the scraped site by typing the command

view(response)

If it returns True, the page is displayed in your default browser.

Data can be scraped using two techniques:

  1. XPath (recommended).
  2. CSS selectors.

How to run

You can clone the repo, navigate to any of the project folders, and type,

cd projname
scrapy crawl spiderclassname

The scraped items are printed to the console. You can also save the output to a .json or .csv file by typing,

scrapy crawl spiderclassname -o result.json
scrapy crawl spiderclassname -o result.csv

If you are unable to run the Scrapy project, you can still view the output of the spider in the result.csv file located in the folder.
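The saved output can also be loaded back into Python for inspection. The helper below is a minimal sketch using only the standard library; `load_items` is a hypothetical name, and it assumes the file was written by `scrapy crawl ... -o` in a single run (a JSON array of objects, or a CSV with a header row).

```python
import csv
import json
from pathlib import Path


def load_items(path):
    """Load scraped items from a .json or .csv output file as a list of dicts."""
    path = Path(path)
    if path.suffix == ".json":
        # The JSON output is a single array of item objects.
        with path.open(encoding="utf-8") as f:
            return json.load(f)
    if path.suffix == ".csv":
        # The first CSV row holds the field names.
        with path.open(newline="", encoding="utf-8") as f:
            return list(csv.DictReader(f))
    raise ValueError(f"unsupported file type: {path.suffix}")
```

For example, `load_items("result.csv")` run from the project folder returns one dict per scraped row, with every value read as a string.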

Happy Scraping :)