
Note that actually running the model without an A100 GPU or better will be trickier than usual given its size (12B parameters, 24GB on disk).

There is a PR to that repo for a diffusers implementation, which may run on a cheap L4 GPU w/ enable_model_cpu_offload(): https://huggingface.co/black-forest-labs/FLUX.1-schnell/comm...
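A minimal sketch of what that could look like once the diffusers support from that PR is available; the model ID, dtype, and sampling parameters here are assumptions for illustration, not a confirmed recipe:

    import torch
    from diffusers import FluxPipeline  # assumes the diffusers PR has landed

    # Load FLUX.1-schnell in bf16 (~24GB of weights).
    pipe = FluxPipeline.from_pretrained(
        "black-forest-labs/FLUX.1-schnell", torch_dtype=torch.bfloat16
    )
    # Keeps submodules on CPU and moves each to the GPU only while it runs,
    # trading speed for much lower peak VRAM -- the trick that makes an L4 plausible.
    pipe.enable_model_cpu_offload()

    image = pipe(
        "a misty forest at dawn",
        num_inference_steps=4,  # schnell is distilled for few-step sampling
        guidance_scale=0.0,
    ).images[0]
    image.save("flux_schnell.png")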



You don't need an A100; you can get a used 32GB V100 for $2K-$3K. It's probably the absolute best bang-for-buck inference GPU at the moment. Not for speed, but because there are models you can actually fit on it that you can't fit on a gaming card, and as long as the model fits, it is still light-years better than CPU inference.


Why this versus two 3090s (with NVLink for marginal gains), giving 48GB for ~$2K?


3090 Tis should be able to handle it without much in the way of tricks, for a "reasonable" (for the HN crowd) price.


Higher-RAM Apple Silicon should be able to run it too, as long as it doesn't depend on some ancient PyTorch version or something.


Why not on a CPU with 32 or 64 GB of RAM?


Much slower memory and limited parallelism. A GPU has roughly 8k or more CUDA cores vs. ~16 cores on a regular CPU, and there's less memory swapping between operations. The GPU is much, much faster.


Performance, mostly. It'll work but image generation is shitty to do slowly compared to text inference.


Got it running, but it's a special setup:

* NVIDIA Jetson AGX Orin Dev. Kit with 64 GB shared RAM.

* Default configuration for flux-dev. (FP16, 50 steps)

* 33GB GPU RAM usage.

* 4 minutes 20 seconds per image at around 50 Watt power usage.
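For reference, a minimal sketch of what a default flux-dev run like this might look like via diffusers; the bfloat16 dtype, prompt, and guidance value are assumptions for illustration, not taken from the comment:

    import torch
    from diffusers import FluxPipeline  # assumes diffusers support for FLUX is available

    # Default flux-dev style run: 16-bit weights, 50 denoising steps.
    pipe = FluxPipeline.from_pretrained(
        "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
    ).to("cuda")  # on a Jetson AGX Orin the 64 GB of RAM is shared between CPU and GPU

    image = pipe(
        "a lighthouse on a cliff at sunset",
        num_inference_steps=50,  # matches the 50-step default noted above
        guidance_scale=3.5,      # assumed guidance value, not from the comment
    ).images[0]
    image.save("flux_dev.png")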



