Reminds me of a (personal) website I saw a long time ago that replicated an early OS X desktop, with a functional iTunes player. Does anyone remember what it was called?
Hmm, maybe I will if I have time. We've been using this technique for user-initiated scraping. The only issue we've run into is that we sometimes get rate-limited by IP. Changing the IP has solved the problem each time.
If I'm correct in assuming the parent is talking about Puppeteer, there is a plugin[1] that claims to evade most of the methods used to detect headless browsers. I used it recently for exactly that purpose, and it worked with minimal setup and configuration for my use case, but depending on the detection mechanisms you're evading, YMMV.
The creator of that plugin does mention that it is very much a cat-and-mouse game, just like most of the “scraping industry”.
That's a different story. AWS was competing for revenue with the developers of MongoDB and with Redis' for-profit company. Postgres' publisher does not rely on a managed service as a source of income.
Or even better, have contracts with the companies. That may be unlikely for them, but I think “scraping” is too often assumed to be “bad” in some way. The company I work for does a lot of web scraping, but we have contracts with our partners to scrape their websites. Their robots.txt files may still ask crawlers not to scrape some areas, but we are allowed to bypass those rules.
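For illustration, here is a minimal Python sketch of how a contract-based exception to robots.txt might look, using the standard library's urllib.robotparser. The CONTRACTED_HOSTS set, the may_fetch helper, the "example-bot" user agent and the example hosts are all hypothetical. In practice the allowlist would come from whatever the contracts actually cover, and every other site would still be held to its robots.txt.

```python
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser

# Hypothetical: hosts whose owners have agreed (by contract) to be scraped.
CONTRACTED_HOSTS = {"partner.example.com"}


def may_fetch(url: str, robots_txt: str, user_agent: str = "example-bot") -> bool:
    """Return True if `url` may be fetched.

    Contracted partner hosts bypass robots.txt; every other host is
    checked against the rules in `robots_txt`.
    """
    host = urlparse(url).hostname
    if host in CONTRACTED_HOSTS:
        return True
    parser = RobotFileParser()
    # parse() accepts the robots.txt content as a list of lines.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


ROBOTS = """User-agent: *
Disallow: /private/
"""

print(may_fetch("https://other.example.org/private/page", ROBOTS))    # False
print(may_fetch("https://partner.example.com/private/page", ROBOTS))  # True
print(may_fetch("https://other.example.org/public/page", ROBOTS))     # True
```

Keeping the override as an explicit allowlist makes it easy to audit which sites are scraped under an agreement rather than by default.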