One obvious issue with both these downloaders is the lack of proper modularization, which I believe greatly hinders adoption. I would expect some kind of plugin system. Naturally, there are attempts to fix that: https://github.com/un-def/dl-plus is one example. As a bonus, that would help greatly with situations like the recent youtube-dl takedown, and the RIAA would have barely made a splash.
Plenty of other features immediately come to mind as well: universal media support, proper parallelism, a GUI, desktop integrations, proxy support, anti-captcha... The punchline is: you could find all of this and much more in jdownloader 10 years ago. But somehow youtube-dl won that race. How did that happen?
Jdownloader has the aesthetic of sketchy Windows freeware from a bygone era. I tried it once, then uninstalled it a minute later. youtube-dl also integrates well with other tools and autonomous workflows; for example, mpv invokes it to stream media from the web without downloading it to disk.
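For what it's worth, that integration is literally just pointing mpv at the page URL; mpv shells out to youtube-dl behind the scenes. Something like this (VIDEO_ID is a placeholder, and the 720p cap is just an example of youtube-dl's format syntax):

    # stream without saving to disk; mpv hands the URL to youtube-dl
    mpv "https://www.youtube.com/watch?v=VIDEO_ID"
    # optionally cap the resolution via youtube-dl's format selection
    mpv --ytdl-format="best[height<=720]" "https://www.youtube.com/watch?v=VIDEO_ID"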
I think it is OSS [1], although all the code is in Subversion and I don't see a web interface. I haven't dug through it much, but poking around inside the "trunk" module, I see files with headers that say they're GPLv3.
EDIT: Here's a Git mirror of trunk [2]. See, e.g., this file [3].
The emphasis was more on the "no web interface" part. In 2020, I feel like Subversion is significantly less common than git, so having to install SVN + possibly learn SVN commands is a moderate barrier to browsing the code.
I always run it under a separate user account because it looks so sketchy.
I thought it was sketchy by association. It used to always be recommended by sketchy pirate sites to bypass the download limits on sketchy one-click hosters.
I still use jdownloader; it does its job. It works on a wide range of sources, and works on YouTube very well.
The install bit is true: you literally need to google "jdownloader clean installers no adware", which is pretty bad.
This comment comes up every time weboob is mentioned, but it'd probably see much better adoption if it was named anything other than "we boob". It's a bit of an off-putting name.
> youtube-dl sort of situations and RIAA would have barely made a splash
First, that wouldn't be good; such situations should make big splashes. Second, removing a YouTube plug-in from youtube-dl would make a significant splash anyway.
I mean, proxy support is listed in the config docs[1]. Those docs also break down the functionality of gallery-dl by module, leading me to believe you either didn't read or didn't understand the docs.
[1] describes very basic proxy support indeed. However, I can't imagine a use case for that. What would be useful is support for rotating through a proxy list with every download request, together with auto-updating that list (a rough wrapper sketch follows below).
As for modules: splitting each site's support into its own file is an absolute minimum for sanity. Keeping up all that support in one project quickly becomes quite a chore for a single maintainer. The whole process of fetching the entire project and initiating a PR on GitHub is rather awkward for developers too. Hence the existence of the https://github.com/un-def/dl-plus project.
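To illustrate the proxy-rotation idea above (just a rough sketch, not a built-in feature of either tool): assuming your downloader accepts a --proxy flag the way youtube-dl does, a thin wrapper can at least pick a fresh proxy per invocation from a list that some external job keeps refreshed. True per-request rotation would still need support inside the tool itself.

    #!/usr/bin/env bash
    # Rough sketch: pick a random proxy per run. Assumes youtube-dl's
    # --proxy flag and a proxies.txt (one proxy URL per line) that is
    # refreshed by some external job -- both are assumptions here,
    # not built-in behaviour.
    PROXY=$(shuf -n 1 proxies.txt)
    exec youtube-dl --proxy "$PROXY" "$@"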
Not to take anything away from how great it is to have something like this, but some of the supported sites have heavy rate limiting and bot detection, and using this with your account can easily get you banned.
For example, I had immense difficulty parsing my own saved posts from Instagram (I used a one-off script that runs in the browser).
I had an account rate limited with this, though maybe I was using it wrong. It ended up forcing me to give it a mobile number to confirm, which I didn't want to do.
Not sure why they'd do this when a spammer (not me) would just create a new account. I was just archiving twice a day with a random sleep time between accounts.
I have been scraping regularly from Instagram (and a bunch of different sites) using gallery-dl for a while now, and I have yet to face any issues. Granted, I scrape at most only once a day, in a batch. So I don't know what problems you might face if you scrape more often.
The only rate-limit issues I've faced are with Twitter, because the way I do it is that I feed gallery-dl a text file containing the profile URLs that I want to scrape. But gallery-dl doesn't add a delay in between each input URL, so Twitter might force a temporary cooldown on you if your list is a bit long.
But you can easily avoid this by writing your own shell script that adds a custom delay between URLs.
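Something along these lines works for me (a minimal sketch; urls.txt and the 30-90 second range are arbitrary choices, not anything gallery-dl prescribes):

    #!/usr/bin/env bash
    # Run gallery-dl on each profile URL from a list, sleeping a random
    # 30-90 seconds between invocations so Twitter doesn't force a cooldown.
    while IFS= read -r url; do
        [ -z "$url" ] && continue           # skip blank lines
        gallery-dl "$url"
        sleep $(( RANDOM % 61 + 30 ))       # random pause, 30-90 s
    done < urls.txt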
Does Instagram not have an export function (similar to Facebook) for personal data to comply with CCPA and GDPR? If not, they should be reported to the appropriate regulatory bodies to encourage such an export function.
Disclaimer: I don’t have or use Insta, so apologies if this is a naive question.
I'm perfectly serious; I want to download a bunch of stupid memes that my account has liked and saved into various groups in my account over the years. They're public photos that shitpost accounts have posted.
If you find a way, please comment here again. I have saved thousands of posts and even sorted them. I desperately want to download them before nuking my account.
Your likes absolutely are your own interactions, and it's no violation of privacy to allow you access to data to which you already have access. You can't like it unless you can view it first. In fact (it's hidden but it's there) you can go back and view all of your past likes on Instagram.
This is not privacy, this is walled garden nonsense. Instagram thinks they own all your curation activity on the site, and they do not.
Does anyone know of a downloader that works for private Facebook Groups? I'm a member of some local history groups and the material posted is amazing. I'd like to refer to it in the future, likely long after the posters are gone. I can't rely on Facebook being around then.
Copying and pasting into a personal archive is slow when you want to capture everything (posts and comments) since you don't know which history you'll want to refer to in the future.
There are browser extensions that simply let you save all (or some selection of) images that load on a website. Maybe that's good enough if you can't find one that has proper support for Facebook groups.
I imagine you could quite quickly just click through the material and save it. Won't work for text or metadata like descriptions/titles I guess...
Facebook actively wants to make this hard. I hope you figure out a way. If so, please post to HN. I think many Facebook group users and admins would be interested.
I wrote something similar, but for the purpose of saving material to a personal image board while using a mobile device or browser, using a web service architecture. This means you don't have to use a terminal every time you want to save an image, and several different clients can be used without needing to rewrite the extraction code for each one. It also saves the original tags if the source supports them, which makes everything way more searchable. But this new program has a lot more site support than mine.
We need good wrappers for these kinds of programs for use with mobile devices.
The call to curl is missing its closing " character. Also, it seems some people submit a low-res image in the post, then post another link as the first reply, which is the case today:
Interesting that the list of image upload sites is a who's-who of image hosts. Always looking for alternatives to Imgur, which is laden with ads and doesn't work with an ad-blocker turned on (it explicitly asks you to turn it off).
Hm, I couldn't find the list of supported sites. Is Pinterest covered? Or does anyone know of similar tools that would work for Pinterest? As a means of "backup" and/or sync with Pinry.
FB (and Instagram, WhatsApp) do something weird with all media files. They assign a unique ID to each image/video/file, and they don't let you download it if you try to remove or change that ID. It changes every time you reload the page. Maybe it has something to do with that.
Plus there is also massive rate limiting, especially on Instagram, since it is only images.