
There are protocols for bots. Not all bots follow them... so block requests from the ones that don't.

Problem solved... like a million internet years ago.
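
For context, the protocol in question is presumably robots.txt (the Robots Exclusion Protocol). A minimal sketch that lets a couple of well-known crawlers in and disallows everyone else could look like this; the crawler names are just examples, and only well-behaved bots will honor it:

    User-agent: Googlebot
    Disallow:

    User-agent: Bingbot
    Disallow:

    # Everyone else: stay out entirely
    User-agent: *
    Disallow: /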



How do you get indexed, then? Because I can't see how this solves anything.

Isn't the proposal clear enough?

1. Optimize the indexing process so that each search engine doesn't independently crawl every site.

2. Devise a method to refresh the index only when the content changes (hash, date...); a rough sketch follows below.

Seems reasonable enough to me.
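
A rough sketch of point 2, assuming plain HTTP conditional requests plus a content hash; the helper name and the stored ETag/hash state are just illustrative:

    import hashlib
    import urllib.error
    import urllib.request

    def fetch_if_changed(url, last_etag=None, last_hash=None):
        # Ask the server to skip the body if the ETag still matches.
        req = urllib.request.Request(url)
        if last_etag:
            req.add_header("If-None-Match", last_etag)
        try:
            resp = urllib.request.urlopen(req)
        except urllib.error.HTTPError as err:
            if err.code == 304:  # not modified, keep the old index entry
                return None, last_etag, last_hash
            raise
        body = resp.read()
        etag = resp.headers.get("ETag")
        new_hash = hashlib.sha256(body).hexdigest()
        if new_hash == last_hash:  # headers changed but content did not
            return None, etag, new_hash
        return body, etag, new_hash  # content changed, reindex it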


I assume he means to block all search engines except the big guys: Google, Bing, Yahoo. Anyone else has too little impact. Not sure how one could do that, though; it's not like a request tells me "hey there, I'm a robot! Let me in?"


Most actually do identify themselves via the user agent string in the request. You can't stop a malicious bot this way, but you could kill most bot traffic with a rewrite rule, presuming robots.txt isn't good enough.
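
For example, something like this Apache .htaccess (mod_rewrite) sketch; the bot names are placeholders, and a malicious bot can spoof its user agent anyway:

    RewriteEngine On
    # Return 403 Forbidden to requests whose user agent matches any listed bot
    RewriteCond %{HTTP_USER_AGENT} (SomeBot|AnotherBot|YetAnotherBot) [NC]
    RewriteRule .* - [F,L]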



