
Sounds like you're using LLMs to replace human connection.

For example, instead of:

Duolingo - I practice with my friends

Calorie tracking - I have planned meals from my dietitian

Workout tracking - I have WhatsApp with my PT, who adjusts my next workouts from our conversations

Reminders - A combo of Siri + Fantastical + My Wife

I'm sure my way is more expensive, but there is also an intangible cost to not having friends/personal connections.


I may be missing your intent, but this feels like a misread of what I was describing.

I wasn’t swapping human connection for LLMs. These workflows already existed; I’ve simply used newer tools to make them better aligned to my needs and more cost-effective for me.


> There is no economic rule that says that riveting should pay more than taking care of the elderly or food delivery.

There kind of is - it's the same reason B2B SaaS tends to make more money than B2C: it's easier to sell someone something if they can make money from it.

If I can pay you Y to rivet some sheet metal together and sell the finished product for Y * 10, that's a much better outcome for me (economically) than paying someone to take care of my elderly parents. In fact, maybe it's not that I'm mean - maybe _I_ don't make enough money to afford to pay someone to take care of my elderly parents.


Economic rules are all subject to externalities like the effects of taxes, regulations, etc. I mean there is no rule in the sense that some jobs that pay poverty wages in the US are not poverty-wage jobs in other countries, due to the impact of regulation.

It's a policy choice to allow Walmart to pay full-time employees so little that taxpayers have to subsidize their food. We are free to make different choices.


I've been thinking about building this with friends. In the short term, though, you could do this today with Garage: http://garagehq.deuxfleurs.fr
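
Garage speaks the S3 API, so any standard S3 client can point at it. A minimal sketch with boto3, assuming a single local node on Garage's default S3 port (3900) and placeholder credentials, with the bucket already created (e.g. via `garage bucket create demo`):

    # Sketch: self-hosted Garage via its S3-compatible API; all values are placeholders.
    import boto3

    s3 = boto3.client(
        "s3",
        endpoint_url="http://localhost:3900",  # Garage's default S3 API port
        region_name="garage",                  # default region name in Garage's config
        aws_access_key_id="GK...",             # placeholder access key
        aws_secret_access_key="...",
    )

    s3.put_object(Bucket="demo", Key="hello.txt", Body=b"hello from garage")
    print(s3.get_object(Bucket="demo", Key="hello.txt")["Body"].read())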


Kind of - AFAIK "micro" was never actually thoroughly defined. In my mind I think of it as mapping to one table (i.e., users = user service, balances = balances service), but that might still be a "full service" worth of code if you need anything more than basic CRUD.


The original sense was one business domain or business function (which would often include more than one table in a normalized relational DB). The broader context: given the observation that software architecture tends to reflect the team structure of the software development organization, development organizations should parallel business organizations, and software serving different business functions should be loosely coupled. That way, business needs in any area can be addressed with software change at only the unavoidable level of friction from software serving other business functions - friction directly tied to the business impact of the change on those connected functions - rather than having unrelated constraints, from coupling between software components unrelated in business function, inhibit change driven by business needs in a particular area.


It blows my mind that with all the technology we have, we can't find a plane whose rough location we have a pretty decent idea of.

It's a testament to how big and deep the ocean is.


> I maintain an S3 client that has a test matrix for the commonly used S3 implementations.

Is it open to the public? I'd like to check it out.


Why not? What better things does the CTO have to do?


$50k would be the cost to run it unquantized; $10k could get you, for example, a 4x 5090 system, which would run the 671B Q4 model that's 90% as good, which was the OP's target.


Which 671B quants can fit into 96GB of VRAM? Everything I'm aware of needs hundreds of GB at least (e.g. https://apxml.com/models/deepseek-r1-671b).
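
Back-of-envelope, weights only (KV cache and runtime overhead push the real number even higher):

    # Weights-only VRAM estimate for a 671B-parameter model at common quantizations.
    # Ignores KV cache, activations, and runtime overhead, so real needs are higher.
    PARAMS = 671e9

    for name, bits_per_weight in [("FP16", 16), ("Q8", 8), ("Q4", 4)]:
        gigabytes = PARAMS * bits_per_weight / 8 / 1e9
        print(f"{name}: ~{gigabytes:,.0f} GB")

    # FP16: ~1,342 GB
    # Q8:   ~671 GB
    # Q4:   ~336 GB -> still far beyond 96GB (or 128GB)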


A 5090 is 32GB, so 4 of them is 128GB, not 96.


128GB is still not 300+. Something like 4x RTX 6000 Blackwell is the minimum to run any model that is going to feel anything like Claude locally.

To my deep disappointment, the economics are simply not there at the moment. OpenRouter, using only providers with zero-data-retention policies, is probably the best option right now if you care about openness, privacy, and avoiding vendor lock-in.


For local use and experimentation you don't need to match a top-of-the-line model. In fact, something that you train, or rather fine-tune, locally might be better for certain use cases (rough sketch below).

If I were working with sensitive data, I sure would only use on-prem models.
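
As a hedged sketch of what "fine-tune locally" can look like, here's a LoRA pass with the Hugging Face trl/peft stack; the model, dataset, and hyperparameters are illustrative placeholders, not a recipe:

    # Minimal local LoRA fine-tune sketch; every name and number here is illustrative.
    from datasets import load_dataset
    from peft import LoraConfig
    from trl import SFTConfig, SFTTrainer

    train_data = load_dataset("imdb", split="train[:1%]")  # placeholder text dataset

    trainer = SFTTrainer(
        model="Qwen/Qwen2.5-0.5B",  # a small model that fits on one consumer GPU
        train_dataset=train_data,
        args=SFTConfig(output_dir="./local-ft", max_steps=100),
        peft_config=LoraConfig(r=8, lora_alpha=16),  # train low-rank adapters only
    )
    trainer.train()  # adapters land in ./local-ft; base weights stay frozen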


> why he regretted introducing Neo4J

Even for use cases that graph DBs knock out of the park, Neo4j (historically - I haven't used it in like 10 years) didn't work very reliably compared to modern competitors.

But as always it's about picking the right tool for the job - I tried to build a "social network" in both MySQL and Neo4j, and (reliability aside) Neo4j worked way better.
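
To illustrate the fit: the friends-of-friends query that needs gnarly self-joins in MySQL is a short pattern match in Cypher. A hedged sketch with the official neo4j Python driver; the (:User)-[:FRIEND]- schema and the connection details are made up:

    # Sketch: friends-of-friends suggestions in Neo4j; schema/credentials hypothetical.
    from neo4j import GraphDatabase

    driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

    with driver.session() as session:
        records = session.run(
            """
            MATCH (me:User {name: $name})-[:FRIEND]-()-[:FRIEND]-(fof:User)
            WHERE fof <> me AND NOT (me)-[:FRIEND]-(fof)
            RETURN DISTINCT fof.name AS suggestion
            """,
            name="alice",
        )
        for record in records:
            print(record["suggestion"])

    driver.close()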


> I wonder if it will spur nvidia to work on an inference only accelerator.

Arguably that's a GPU? Other than (currently) exotic ways to run LLMs, like photonics or giant SRAM tiles, there isn't a device that's better at inference than GPUs, and they have the benefit that they can be used for training as well. You need the same amount of memory and the same ability to do math as fast as possible whether it's inference or training.


> Arguably that's a GPU?

Yes, and to @quadrature's point, NVIDIA is creating GPUs explicitly focused on inference, like the Rubin CPX: https://www.tomshardware.com/pc-components/gpus/nvidias-new-...

"…the company announced its approach to solving that problem with its Rubin CPX— Content Phase aXcelerator — that will sit next to Rubin GPUs and Vera CPUs to accelerate specific workloads."


Yeah, I'm probably splitting hairs here, but as far as I understand (and honestly, maybe I don't understand), Rubin CPX is "just" a normal GPU with GDDR instead of HBM.

In fact, I'd say we're looking at this backwards: GPUs used to be the thing that did math fast and put the result into a buffer where something else could draw it to a screen. Now a "GPU" is still a thing that does math fast, but sometimes you don't include the hardware to put the pixels on a screen.

So maybe CPX is "just" a GPU, but with more generic naming that aligns with its use cases.


There are some inference chips that are fundamentally different from GPUs. For example, one of the guys who designed Google's original TPU left and started a company (with some other engineers) called Groq (not to be confused with Grok). They make a chip that is quite different from a GPU and provides several advantages for inference over traditional GPUs:

https://www.cdotrends.com/story/3823/groq-ai-chip-delivers-b...


The AMD NPU has more than 2x the performance per watt versus basically any Nvidia GPU. Nvidia isn't leading because they are power efficient.

And no, the NPU isn't a GPU.


Maybe a better way to make my point: the GPU is Nvidia's golden goose, and it's good enough that they may go down with the ship. For example (illustrative numbers): if it costs Nvidia $100 to make a GPU they can sell to gamers for $2,000, researchers for $5,000, and enterprises for $15,000, would it make sense for them to start from scratch and invest billions to make something that's today an unknown amount better and would only be interesting to the $15,000 market they've already cornered? (Yes, I'm assuming there are more gamers than people who want to run a local LLM.)
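
Tabulating those (made-up) numbers:

    # The parent comment's illustrative numbers, nothing more.
    cost = 100  # hypothetical unit cost
    for segment, price in [("gamers", 2_000), ("researchers", 5_000), ("enterprise", 15_000)]:
        print(f"{segment:>11}: {price // cost}x markup, ${price - cost:,} margin per unit")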


I would submit Google's TPUs are not GPUs.

Similarly, Tenstorrent seems to be building something that you could consider "better", at least insofar as the goal is to be open.


Isn't Etched's Sohu ASIC claimed to be much better than a GPU?

https://www.etched.com/announcing-etched


I'm not very well versed, but I believe that training requires more memory to store intermediate computations, so that you can calculate gradients for each layer.
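
A hedged PyTorch sketch of that (assumes a CUDA GPU; sizes are arbitrary): with gradients enabled, each layer's activations are retained for the backward pass, so peak memory grows with depth, while in inference mode they're freed as soon as the next layer consumes them:

    # Toy measurement: peak memory of a forward pass, inference vs. training mode.
    import torch
    import torch.nn as nn

    layers = []
    for _ in range(8):
        layers += [nn.Linear(4096, 4096), nn.ReLU()]
    model = nn.Sequential(*layers).cuda()
    x = torch.randn(1024, 4096, device="cuda")

    def peak_mb(grad_enabled: bool) -> float:
        torch.cuda.reset_peak_memory_stats()
        with torch.set_grad_enabled(grad_enabled):
            model(x).sum()  # forward only; we just want the peak during it
        return torch.cuda.max_memory_allocated() / 1e6

    print(f"inference: {peak_mb(False):.0f} MB")  # activations freed layer by layer
    print(f"training:  {peak_mb(True):.0f} MB")   # activations retained for backward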


They're already optimizing GPU die area for LLM inference over other pursuits: the FP64 units in the latest Blackwell GPUs were greatly reduced, and FP4 support was added.

