Since HTTP requests execute synchronously, a lot of idle time is spent waiting for IO to complete. Gevent could be used for a massive concurrency improvement.
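To illustrate the idea: overlapping the IO wait time makes n requests take roughly as long as one. This sketch uses the standard library's `ThreadPoolExecutor` instead of gevent (same principle, no extra dependency); `fetch` and the sleep standing in for network latency are hypothetical placeholders, not part of the crawler.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fetch(url):
    # Placeholder for an HTTP request; sleep simulates network latency.
    time.sleep(0.2)
    return url

urls = ["http://example.com/%d" % i for i in range(10)]

start = time.time()
with ThreadPoolExecutor(max_workers=10) as pool:
    # All ten "requests" wait concurrently instead of one after another.
    results = list(pool.map(fetch, urls))
elapsed = time.time() - start

# Sequentially this would take ~2 s; concurrently it finishes in ~0.2 s.
```

Gevent achieves the same overlap with lightweight greenlets and monkey-patched sockets, so even blocking-style code like `urllib` benefits without a thread pool.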
Yeah, sure, it would improve performance, but I wrote this in a few hours one night and have never used it since... I mean, I'm not sure whether it's actually useful for anything or deserves improvements :-P
Apart from this, there is a much bigger issue. The crawler works recursively: it takes all links from the current page, and for each page it finds n more links (probably more than 10). So the queue of "remaining links to visit" grows exponentially, and the crawl never stops. Where does it end? When your memory is full...
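The usual fix for that runaway growth is to track visited URLs and bound the crawl by depth and page count. A minimal sketch, where `get_links` is a hypothetical callback that returns the links found on a page:

```python
from collections import deque

def crawl(start_url, get_links, max_depth=2, max_pages=100):
    """Breadth-first crawl bounded by depth and total page count."""
    visited = set()
    queue = deque([(start_url, 0)])
    while queue and len(visited) < max_pages:
        url, depth = queue.popleft()
        # Skipping seen URLs breaks link cycles; the depth cap bounds the frontier.
        if url in visited or depth > max_depth:
            continue
        visited.add(url)
        for link in get_links(url):
            if link not in visited:
                queue.append((link, depth + 1))
    return visited

# Toy link graph standing in for real pages (hypothetical data).
graph = {"a": ["b", "c"], "b": ["a", "d"], "c": ["d"], "d": []}
pages = crawl("a", lambda u: graph.get(u, []), max_depth=1)
```

With `max_depth=1` the crawl stops at "a", "b", and "c", never reaching "d", so memory use stays bounded no matter how densely the pages link to each other.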
Thanks for this information! I will use it in my other crawler work.
http://www.youtube.com/watch?v=2gcrTsQ7yi4&feature=plcp