With the recent release of Qwen3-Omni I've decided to put together my first local machine. As much as I'd like to just pick up a Beelink and flash it with Omarchy, I think I want a bit more horsepower.

However, the internet seems littered with "clever" local AI monstrosities that gang together 4-6 ancient Nvidia GPUs (priced today like overpriced e-waste) and get lackluster performance out of piles of M60s and P100s. In 2025, using hardware this old seems like a waste, or at least bad advice.

Curious whether this find seems like a good source of info on staying away from Intel and AMD GPUs for local inference? I might do some training, but right now I'm more interested in light RAG and maybe some local coding.

Hoping to build something before the holiday season to keep my office warm with GPUs :).

Thanks!


What is llama-swap?

Been looking for more details about software configs on https://llamabuilds.ai


https://github.com/mostlygeek/llama-swap

It's a transparent proxy that automatically launches your selected model with your preferred inference server, so you don't need to manually start/stop the server when you want to switch models.

So, let's say I've configured Roo Code to use qwen3 30ba3b as the orchestrator and glm4.5 air as the coder. Roo Code calls the proxy with model "qwen3" in orchestrator mode; when it switches to coder mode and requests "glm4.5air", the proxy kills the llama.cpp instance running qwen3 and restarts it with glm4.5air.
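
A minimal sketch of what that looks like from the client side, assuming llama-swap is listening on localhost:8080 (the port and model aliases here are illustrative, borrowed from the example above) - the proxy picks the backend purely from the "model" field:

    # Query llama-swap's OpenAI-compatible endpoint. The proxy reads the
    # "model" field, stops the currently running llama.cpp instance if it
    # serves a different alias, and starts the right one before forwarding.
    import requests

    PROXY_URL = "http://localhost:8080/v1/chat/completions"  # illustrative port

    def ask(model: str, prompt: str) -> str:
        resp = requests.post(PROXY_URL, json={
            "model": model,  # alias from llama-swap's config, e.g. "qwen3"
            "messages": [{"role": "user", "content": prompt}],
        })
        resp.raise_for_status()
        return resp.json()["choices"][0]["message"]["content"]

    # First call spins up qwen3; the second triggers a swap to glm4.5air.
    print(ask("qwen3", "Break this feature into subtasks."))
    print(ask("glm4.5air", "Implement the first subtask."))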


I've purchased 16 of these - cpayne is great! Hope he finds a US distributor to help with tariffs a bit!


What blew me away is the quality and price point of what obviously can't be a very high volume product. This guy makes amazing stuff.


Any reason you wouldn't opt for the 4090 or 5090?


A second-hand 3090 can be found for something like $600.


Hi, I'm looking to transition from renting GPUs from RunPod to hosting some models locally - specifically Qwen2.5 and some lightweight VLMs like Moondream. The RTX 3060 12GB looks like a relatively good option, but I don't have much experience with PC hardware, let alone used hardware.

Curious if anyone here runs a similar config of 1-4 RTX 3060s? Trying to decide if picking up a few of these is good value or if I should just continue renting cloud GPUs.
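
For a rough sense of whether 12 GB fits, here's a back-of-envelope weight-size estimate (a sketch only - the bytes-per-parameter figures are approximations for common quantizations, and KV cache plus runtime overhead come on top):

    # Rough VRAM needed just for model weights at common quantizations.
    # KV cache, activations, and runtime overhead add a few GiB on top.
    BYTES_PER_PARAM = {"fp16": 2.0, "q8": 1.0, "q4": 0.5}

    def weight_vram_gib(params_billion: float, quant: str) -> float:
        return params_billion * 1e9 * BYTES_PER_PARAM[quant] / 1024**3

    for name, size_b in [("qwen2.5-7b", 7.0), ("moondream (~2B)", 2.0)]:
        for quant in ("fp16", "q4"):
            print(f"{name} @ {quant}: ~{weight_vram_gib(size_b, quant):.1f} GiB")

    # qwen2.5-7b @ q4 is ~3.3 GiB of weights, comfortable on a 12 GB 3060;
    # at fp16 it's ~13 GiB and no longer fits.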


Have you set up hardware/software to run models locally before? Not that it's incredibly hard, but the cloud GPU providers usually simplify a lot of the steps.

The jump from one RTX running a local model to multiple is big, and I'd wait to tackle it until I was very comfortable with one. Which might mean you want something with more VRAM instead of the 3060.

Starting with one card doesn't work for everyone, but you're biting off a lot versus renting a cloud GPU, and it can take a lot of time to get a local setup going for the first time.
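
If it helps, a good first smoke test before buying more cards is just confirming the one GPU is visible and sized as expected (a sketch assuming PyTorch with CUDA support is installed):

    # Sanity-check the single-GPU setup before thinking about multi-GPU.
    import torch

    if not torch.cuda.is_available():
        raise SystemExit("No CUDA device visible - check your driver install.")

    for i in range(torch.cuda.device_count()):
        props = torch.cuda.get_device_properties(i)
        print(f"GPU {i}: {props.name}, {props.total_memory / 1024**3:.1f} GiB VRAM")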


Why is zog so popular these days? Seems really cool, but I haven't caught the buzz or learned it yet.

Is there a big reason why Triton is considered a "failure"?


I've built a few small machines in my homelab - but I've been a huge fan of the Beelink mini Ryzen PCs that DHH has been featuring with Omarchy.

That said, I don't think I want to opt for one of the Ryzen "AI mini PCs" - they seem solidly worse than a 3090?


I've been toying with the idea of building a dedicated machine to continue working on my FPV flight control model.

So far I've just been renting GPU time, but it's getting expensive. I also occasionally use qwen3-coder in my homelab, but my headless "inference" machine is quite limited with an RTX 3080.

Curious if this build from llamabuilds.ai looks good, in terms of the GPU etc.? I'm considering buying more GPUs in the future - hard to tell if stacking RTX 6000 Ada cards is worthwhile!

Thanks HN!


I think this is going to change really fast - and not in favor of AI companies or investors.

Initially, ChatGPT was kind of a cool toy that felt like something new but not really a threat. Now even people who aren't engineers see how it threatens the dignity and financial stability of millions of people.

I like AI - but at the same time I hope our country picks people over corporate profits... unfortunately I don't see that as likely.


It's wildly sad - the number of people turning to this for their emotional needs is troubling.

Companies like BetterHelp haven't really helped people improve; they've just trained them to expect a constantly affirming voice, at arm's length, at all hours.

