You don't need an A100; you can get a used 32GB V100 for $2K-$3K. It's probably the best bang-for-buck inference GPU at the moment. Not for speed, but because there are models you can actually fit on it that you can't fit on a gaming card, and as long as the model fits, it's still light-years better than CPU inference.
The CPU has much slower memory bandwidth and far less parallelism: a GPU has roughly 8k or more CUDA cores vs ~16 cores on a regular CPU, and there's less memory swapping between operations. The GPU is much, much faster.
There is a PR to that repo for a diffusers implementation, which may run on a cheap L4 GPU w/ enable_model_cpu_offload(): https://huggingface.co/black-forest-labs/FLUX.1-schnell/comm...
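For reference, a minimal sketch of what that could look like once the diffusers support lands, following the usual diffusers pattern (the FluxPipeline class name, model ID, and parameter choices here are assumptions, not taken from that PR):

    import torch
    from diffusers import FluxPipeline

    # Load FLUX.1-schnell in bf16 (assumes a diffusers version with Flux support installed)
    pipe = FluxPipeline.from_pretrained(
        "black-forest-labs/FLUX.1-schnell",
        torch_dtype=torch.bfloat16,
    )

    # Keep submodules on the CPU and move each to the GPU only while it runs,
    # so peak VRAM stays within reach of a 24GB L4
    pipe.enable_model_cpu_offload()

    image = pipe(
        "a photo of a red fox in the snow",
        num_inference_steps=4,   # schnell is distilled for very few steps
        guidance_scale=0.0,      # schnell doesn't use classifier-free guidance
    ).images[0]
    image.save("fox.png")

The key part is enable_model_cpu_offload(): you trade generation speed for the ability to run a model whose weights wouldn't otherwise fit in VRAM.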