
I don't think this has any legal bearing for anyone, but it's worth noting that Google respects publishers who do not wish to be indexed.
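For anyone wondering what "respecting" a publisher looks like on the crawler side, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt contents, the `ExampleBot` user agent, and the `allowed` helper are all made up for illustration; a real crawler would fetch the file from the site rather than parse a hardcoded string.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents, parsed offline for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print(allowed(ROBOTS_TXT, "ExampleBot", "http://example.com/index.html"))
    print(allowed(ROBOTS_TXT, "ExampleBot", "http://example.com/private/ad"))
```

Note that robots.txt is purely advisory: nothing stops a crawler from skipping this check.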


Here's the abstract for a nice review of search engine law: http://works.bepress.com/james_grimmelmann/13/ If I remember correctly, indexing a site that asks not to be indexed might be unlawful as a trespass, but it is not settled law. The argument is that you are stealing resources (computer time) from the site owner.


>The argument is that you are stealing resources (computer time) from the site owner.

That's why 3taps is getting the data from google's cache without touching craigslist servers.


How the heck are they scraping Google without being banned or rate limited?


There are a lot of companies that are scraping Google quite successfully. Many of these are for 'rank checking' services that provide ranking data for certain keywords over time; these are heavily used by SEO and marketing agencies.

The two that jump to mind are Authority Labs and SEOmoz.

I guess: a shed load of proxies. :)


Amazon or other clouds out there. Just auto-provision your instances (lots of them), scrape, sleep, wake, scrape, sleep...


It's not impossible. You just tell google not to cache.
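The usual way to tell Google not to cache is a robots meta tag with the `noarchive` directive. As a rough sketch (the `RobotsMetaParser` class and `forbids_caching` helper are names I made up, and this only looks at meta tags, not the equivalent `X-Robots-Tag` HTTP header), you could check whether a page opts out like this:

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots"> / <meta name="googlebot"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        name = (attr.get("name") or "").lower()
        if name in ("robots", "googlebot"):
            content = attr.get("content") or ""
            # Directives are comma-separated and case-insensitive.
            self.directives.update(
                d.strip().lower() for d in content.split(",") if d.strip()
            )


def forbids_caching(html: str) -> bool:
    """Return True if the page asks search engines not to keep a cached copy."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noarchive" in parser.directives


if __name__ == "__main__":
    page = '<html><head><meta name="robots" content="noarchive"></head></html>'
    print(forbids_caching(page))
```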


And makes it impossible to block them as a side effect.


IANAL but AFAIK it is only a civil matter (i.e. not criminal) since it is usually pursued as a tort of trespass to chattels. For such a case to succeed the claimant needs to show that the actions of the defendant deprived them of use of the good that was trespassed on, i.e. the defendant needs to cause enough of a burden on the servers that the claimant or their customers could not use the service.


What I'd care more about is whether the people posting the ads wish to be indexed; I'd be surprised if many don't. (I don't know about the legal bearing either.)


Perhaps craigslist could put up a checkbox (like they do with the never-checked "It's okay to contact me about products ...") that says something like:

"I'm okay with people finding this listing through another service."

Perhaps because they don't want people to get to their listings that way even if they want to, and thus don't want your opinion? (Edit: and perhaps they really aren't as interested as they claim in making it easier for buyers and sellers to find each other?)


So eventually this comes down to CL trying to protect its monopoly as the listing site of the internet.



