Oh spider, don’t come visit me, please!

I do love web spiders. You know, those search engine spiders that crawl your website and visit each of your links so that it gets indexed by the search engine... well, for the last two days one of those spiders gave me hell... IT BROUGHT MY SITE DOWN!!!

Two days ago I received an email alert from my own scripts that monitor my sites' availability. My server had been very slow for the past two days, and when I checked the server processes there were far too many of them. I thought it was the same incident as before, but it was not! This time it was what is called "rapid fire".

After checking each of my websites' Apache logs, I found out that this was because a search engine spider (Sosospider+) was crawling my websites. The spider was requesting pages too frequently within a short time to get info from my websites. Banning the spider using robots.txt would not immediately stop the crawler.
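For anyone who wants to do the same kind of log check, here is a minimal sketch in Python, not the exact script I used. It assumes the logs are in Apache's "combined" format, where the user agent is the last quoted field. The function name count_user_agents and the sample lines are made up for illustration.

```python
import re
from collections import Counter

# Assumes Apache "combined" log format, which ends with:
#   "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(r'"[^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"\s*$')


def count_user_agents(lines):
    """Count requests per user agent string; lines that do not match are skipped."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if match:
            counts[match.group("agent")] += 1
    return counts


# Invented sample data, only to show the shape of the input.
SAMPLE_LOG = [
    '203.0.113.5 - - [01/Jan/2000:10:00:01 +0000] "GET /a HTTP/1.1" 200 512 "-" "Sosospider+"',
    '203.0.113.5 - - [01/Jan/2000:10:00:01 +0000] "GET /b HTTP/1.1" 200 734 "-" "Sosospider+"',
    '203.0.113.5 - - [01/Jan/2000:10:00:02 +0000] "GET /c HTTP/1.1" 200 298 "-" "Sosospider+"',
    '198.51.100.7 - - [01/Jan/2000:10:00:09 +0000] "GET / HTTP/1.1" 200 1024 "-" "Mozilla/5.0"',
]

if __name__ == "__main__":
    # Print the busiest user agents first.
    for agent, hits in count_user_agents(SAMPLE_LOG).most_common():
        print(hits, agent)
```

On a real server you would pass an open log file instead of SAMPLE_LOG. A single agent far ahead of everything else is a good hint that you are looking at a "rapid fire" crawler.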

Thankfully for me, the company responsible for the spider responded immediately after I sent them an email about the problem, and immediately stopped and then reduced their spider's visits to my websites....

Now, I am just thinking out loud... this only helps me with that one spider at this time. What about the future, when there will be new search engines and new spiders (I'm sure!)? Will they crawl again and cause the same problem, and will I need to do the same thing again? Why can't we have some syntax (in robots.txt maybe) that says only active bots are allowed to crawl... or better yet, syntax that says only certain bots can crawl (yeah, then I will list only the major search engines' bots. Oh no, this will alienate the rest of the search engines! Well, what the heck, go and crawl the major search engines!).
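Thinking about it some more, robots.txt can already express something close to "only certain bots can crawl", at least for crawlers that choose to honor it: give the bots you trust their own User-agent group with an empty Disallow, and block everyone else under User-agent: *. Below is a rough sketch that checks such a file with Python's standard urllib.robotparser. The bot names Googlebot and Bingbot are just example choices, is_allowed is a helper I made up, and Crawl-delay is a non-standard extension that not every crawler respects.

```python
from urllib.robotparser import RobotFileParser

# Example allowlist: named bots may crawl everything, all others nothing.
# Crawl-delay is a non-standard extension; some crawlers ignore it.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow:

User-agent: Bingbot
Crawl-delay: 10
Disallow:

User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def is_allowed(agent, url="/index.html"):
    """Return True if the rules above let this user agent fetch the URL."""
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for agent in ("Googlebot", "Bingbot", "Sosospider+"):
        print(agent, is_allowed(agent), parser.crawl_delay(agent))
```

The catch is the same as before: robots.txt is only a polite request. A spider that ignores it, or that has cached an old copy, will keep crawling, so this does not replace contacting the operator or blocking at the server.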

Anyway, whatever it is, we'll need a better robots.txt or whatever....

Search Engine & Optimisation, Security, System Administration, Uncategorized
