Spider Monkey

PHP Web Scraping Engine

I started this project in January 2011. It was going to be an easy-to-use web scraper that anyone could configure. It has an attractive GUI built with jQuery UI elements.

Selectors can be entered using three different methods. The first is the tried-and-true regular expression method. The second, which is the easiest and most powerful to use, is CSS selectors: someone could visit the pages they want to scrape, use their developer console to debug some selectors, and then toss them into the engine. The third method is a simplified regex syntax I call asterisk, where the user enters the start string and the end string with an asterisk in between.
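As a rough illustration of the asterisk syntax, here is a minimal PHP sketch of how such a selector could be translated into a regular expression. The function names `asteriskToRegex` and `asteriskMatchAll` are hypothetical and are not taken from this repo; the sketch assumes a selector contains exactly one meaningful asterisk and that the text between the start and end strings should be captured non-greedily.

```php
<?php
// Hypothetical helper (not from this repo): turn an asterisk selector
// such as '<li>*</li>' into a PCRE pattern with one capture group.
function asteriskToRegex(string $selector): string
{
    // Split on the first asterisk only: start string, then end string.
    $parts = explode('*', $selector, 2);
    if (count($parts) !== 2) {
        throw new InvalidArgumentException('Selector must contain an asterisk.');
    }
    [$start, $end] = $parts;

    // preg_quote() escapes regex metacharacters in the literal strings;
    // '~' is passed as the delimiter so it gets escaped as well.
    // The "s" modifier lets the captured text span multiple lines.
    return '~' . preg_quote($start, '~') . '(.*?)' . preg_quote($end, '~') . '~s';
}

// Hypothetical helper: return every piece of text found between the
// start and end strings of the selector.
function asteriskMatchAll(string $selector, string $html): array
{
    preg_match_all(asteriskToRegex($selector), $html, $matches);
    return $matches[1];
}

$html = '<ul><li>Apple</li><li>Banana</li></ul>';
$fruits = asteriskMatchAll('<li>*</li>', $html);
// $fruits is ['Apple', 'Banana']
```

The non-greedy `(.*?)` is a design assumption: it stops at the first occurrence of the end string, which is usually what someone scraping repeated elements wants.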

There is also an easy-to-use configuration screen for data storage. The engine was going to be smart enough to build different MySQL tables and even build the relationships between the data. Alternatively, data could be stored in XML or JSON documents, and a hierarchy would be maintained.

But I never finished the project.

This repo is a rather unorganized hodgepodge of code. I haven't been through it in a while, and there might actually be two distinct code bases in here.

It is now released under the BSD license.

