
There are protocols for bots. Not all bots follow them... so block requests from the ones that don't.

Problem solved... like a million internet years ago.
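
For context, the protocol in question is presumably robots.txt (the Robots Exclusion Protocol). A minimal sketch that lets a couple of well-known crawlers in and disallows everyone else could look like this; the crawler names are just examples, and only well-behaved bots will honor it:

    User-agent: Googlebot
    Disallow:

    User-agent: Bingbot
    Disallow:

    # Everyone else: stay out entirely
    User-agent: *
    Disallow: /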



How do you get indexed, then? Because I can't see how this solves anything.

Isn't the proposal clear enough?

1. Optimize the indexing process so that each search engine doesn't independently crawl every site.

2. Devise a method to refresh the index only when the content changes (hash, date...); a rough sketch follows below.

Seems reasonable enough to me.
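
A rough sketch of point 2, assuming plain HTTP conditional requests plus a content hash; the helper name and the stored ETag/hash state are just illustrative:

    import hashlib
    import urllib.error
    import urllib.request

    def fetch_if_changed(url, last_etag=None, last_hash=None):
        # Ask the server to skip the body if the ETag still matches.
        req = urllib.request.Request(url)
        if last_etag:
            req.add_header("If-None-Match", last_etag)
        try:
            resp = urllib.request.urlopen(req)
        except urllib.error.HTTPError as err:
            if err.code == 304:  # not modified, keep the old index entry
                return None, last_etag, last_hash
            raise
        body = resp.read()
        etag = resp.headers.get("ETag")
        new_hash = hashlib.sha256(body).hexdigest()
        if new_hash == last_hash:  # headers changed but content did not
            return None, etag, new_hash
        return body, etag, new_hash  # content changed, reindex it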


I assume he means to block all search engines except the big guys: Google, Bing, Yahoo. Anyone else has too little impact. Not sure how one could do that, though; it's not like a request tells me "hey there, I'm a robot! Let me in?"


Most actually do identify themselves via the user agent string in the request. You can't stop a malicious bot this way, but you could kill most bot traffic with a rewrite rule, presuming robots.txt isn't good enough.
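
For example, something like this Apache .htaccess (mod_rewrite) sketch; the bot names are placeholders, and a malicious bot can spoof its user agent anyway:

    RewriteEngine On
    # Return 403 Forbidden to requests whose user agent matches any listed bot
    RewriteCond %{HTTP_USER_AGENT} (SomeBot|AnotherBot|YetAnotherBot) [NC]
    RewriteRule .* - [F,L]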



