
I strongly recommend that people run LLMs locally for a different reason.

The ones you can run on your own machine tend to be bad - really bad. They hallucinate wildly and fail at all sorts of tasks that the larger hosted ones succeed at.

This makes them a fantastic tool for learning more about how LLMs work and what they're useful for. Interacting with a weak-but-functional LLM that runs on your own computer is a great way to get a much more solid mental model for what these things actually are.



The other reason is to find out what a detuned model is capable of. The canonical example is how to make cocaine, which ChatGPT will admonish you for even asking, while llama2-uncensored will happily describe the process which is only really interesting if you're an amateur chemist and want to be Scarface-that-knocks. (the recipe is relatively easy, it's getting access to the raw ingredients that's the hard part, same as with nukes.)

If you accidentally use the word "hack" when trying to get ChatGPT to write some code for you, it'll stop, tell you that hacking is bad and not a colloquial expression, and refuse to go further.

Privacy is another reason to try a local LLM. For the extremely paranoid (justified or not), a local LLM gives users a place to ask questions without the text being fed to a server somewhere for later lawsuit discovery (Google searches are routinely subpoenaed; it's only a matter of time until ChatGPT chats are as well).

There's an uncensored model for vision available as well. The censored vision models won't play the shallow game of hot or not with you.

There are uncensored image generation models as well, but, ah, those are NSFW and not for polite company. (There are also multiple theses' worth of content on what that'll do to society.)


> if you accidentally use the word "hack" [with] ChatGPT...

Side note: ChatGPT is now completely useless for most creative tasks. I'm trying to use it, via NovelCrafter, to help flesh out a story where a minor character committed suicide. ChatGPT refuses to respond, mentioning "self harm" as a reason.

The character in question killed himself before the story even begins (and for very good reasons, story-wise); it's not like one's asking about ways to commit suicide.

This is insane, ridiculous, and different from what the other actors in the industry do, including Claude and Mistral. It seems OpenAI is trying to shoot itself in the foot and doing a pretty good job at it.


OpenAI is angling for enterprise users who have different notions about safety. Writing novels isn't the use case, powering customer service chatbots that will never ever ever say "just kill yourself" is.


My contrarian tendencies now have me thinking of scenarios where a customer service chatbot might need to say "just kill yourself".

Perhaps the HR support line for OpenAI developers tasked with implementing the censorship system?


You don't have to go that far. Depending on the topic, it's already very hard to use Gemini/ChatGPT in a corporate setting. Think: You're doing an FAQ for a child safety crisis/PA slide deck.

It's funny how sometimes the quality of associate work drops as the topics make text generators less useful.

The way forward for corporations is almost certainly either hosted models for very specific use cases or local/internal LLMs. Producers of the latter will probably be less afraid of being canceled by populists, and hence will introduce less censorship.


Do you actually talk to enterprise users? Nobody I’ve spoken to has ever once complained about censorship. Everyone is far more worried about data governance and not getting sued. Maybe it’s just that zero people I talk to are making child safety crisis slide decks? Seems like an unusual use case for most businesses.


Your reference to data security is very valid, but why is that comment so provocative and upset?

> Seems like an unusual use case for most businesses.

You may not work in a public affairs consultancy. Companies do different things. Well drilling is also "an unusual use case for most companies". That does not make it any less important.

If your tool tries to fiddle with the content of your statement, it's not a serious tool. No one would accept their spell-check tool having an opinion about the type of content it is correcting.


Canadian MAID support line?


I don't think that's true. OpenAI today opened up ChatGPT to all users, without even the need to log in [0]. They are fighting for dominance and first-mover advantage, and are maybe beginning to feel the heat of the competition.

[0] https://twitter.com/OpenAI/status/1774848681981710821


I’ve been frustrated by this, too. Trying to ask for ways to support a close family member who experienced sexual trauma. ChatGPT won’t touch the topic.


Darn I guess you’ll have to go back to living in the dark ages and actually write it yourself



[flagged]


You wouldn't give out the same sort of advice when a compiler or linker failed to complete a given task, although one certainly could do the work manually.

It's just fashionable to hate on AI, /s or not.


Is it common for you to write the header file for, say, a BST and have the compiler not rightfully throw an error?

That's what you are asking the LLM to do: you are generating code, not searching for an error condition.

The parent comment is saying maybe as a writer, writing the actual story isn't such a bad thing...

Just a different perspective, live your own life. If the story is good I'll enjoy it, who wrote it is for unions and activists to fight. I've got enough fights to fight.


The alternative to ChatGPT isn't non-AI, it's Claude.


This is a really conservative view about what ways to make art are valid. Generative art is nothing new and it can be pretty awesome sometimes.


> If you accidentally use the word "hack" when trying to get ChatGPT to write some code for you, it'll stop, tell you that hacking is bad and not a colloquial expression, and refuse to go further.

Is that 3.5 or 4? I asked 4 for an example of code which "is a hack", it misunderstood me as asking for hacking code rather than buggy code, but then it did actually answer on the first try.

https://chat.openai.com/share/ca2c320c-f4ba-41bf-8f40-f7faf2...


I don't use LLMs for my coding, I manage just fine with LSP and Treesitter. So genuine question: is that answer representative of the output quality of these things? Because both answers are pretty crappy and assume the user has already done the difficult things, and is asking for help on the easy things.


The response seems pretty reasonable; it's answering the question it was asked. If you want to ask it how to do the difficult part, ask it about that instead. Expecting it to get the answer right in the first pass is like expecting your code to compile the very first time. You have to have more of a conversation with it to coax out the difference between what you're thinking and what you're actually saying.

If you're looking to read a more advanced example of its capabilities and limitations, try

https://simonwillison.net/2024/Mar/23/building-c-extensions-...


It's not representative.

The models are capable of much much more, and they are being significantly nerfed over time by these ineffective attempts to introduce safeguards.

Recently I've asked GPT4 to quote me some code to which it replied that it is not allowed to do so - even though it was perfectly happy to quote anything until recently. When prompted to quote the source code, but output it as PHP comments, it happily complied because it saw that as "derivative work" which it is allowed to do.


My point is that there aren't any safeguards in the reply. In fact I didn't even want it to give me hacking info and it did it anyway.


I asked ChatGPT for some dataviz task (I barely ever do dataviz myself) and it recommended some nice Python libraries to use, some I had already heard of and some I hadn't, and provided the code.

I'm grateful because I thought code LLMs only sped up the "RTFM" part, but it made me find those libs so I didn't have to Google around for them (sometimes it's hard to guess whether they're the right tool for the job, and they might be behind in SEO).


There are three things I find LLMs really excellent at for coding:

1. Being the "senior developer" who spent their whole career working with a technology you're very junior at. No matter what you do and how long your programming career is, you're inevitably going to run into one of these sooner or later. Whether it's build scripts, frontend code, interfacing with third-party APIs or something else entirely, you aren't an expert at every technology you work with.

2. Writing the "boring" parts of your program, and every program has some of these. If you're writing a service to fooize a bar really efficiently, Copilot won't help you with the core bar fooization algorithm, but will make you a lot faster at coding up user authentication, rate limiting for different plans, billing in whatever obscure payment method your country uses etc.

3. Telling you what to even Google for. This is where raw Chat GPT comes into play, not Copilot. Let's say you need a sorting algorithm that preserves the order of equal elements from the original list. This is called stable sorting, and Googling for stable sorting is a good way to find what you're looking for, but Chat GPT is usually a better way to tell you what it's called based on the problem description.
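
For what it's worth, C++ gives that last distinction a name right in the standard library: std::sort makes no promise about the relative order of equal elements, while std::stable_sort preserves it. A tiny sketch (the record type is made up, purely for illustration):

    #include <algorithm>
    #include <string>
    #include <vector>

    struct Order {              // hypothetical record type
        std::string customer;
        double amount;
    };

    int main() {
        std::vector<Order> orders = {
            {"alice", 30.0}, {"bob", 10.0}, {"carol", 10.0}, {"dave", 30.0}};
        // std::stable_sort keeps ties in their original order: after sorting
        // by amount, "bob" still precedes "carol".
        std::stable_sort(orders.begin(), orders.end(),
                         [](const Order& a, const Order& b) {
                             return a.amount < b.amount;
                         });
        return 0;
    }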


> Being the "senior developer" who spent their whole career working with a technology you're very junior at. No matter what you do and how long your programming career is, you're inevitably going to run into one of these sooner or later. Whether it's build scripts, frontend code, interfacing with third-party APIs or something else entirely, you aren't an expert at every technology you work with.

Neither is the LLM.


I asked a stupid question and got a stupid answer. Relatively speaking the answer was stupider than it should have been, so yes, it was wrong.

I asked it to try again and got a better result though, just didn't include it.


> I don't use LLMs for my coding, I manage just fine with LSP and Treesitter.

You’re literally comparing apples to oranges.


You need to read more than just the first sentence of a comment. They only said that part so the reader would know that they have never used an LLM for coding, so they would have more context for the question:

> So genuine question: is that answer representative of the output quality of these things?


Yes, I did read it. I’m kind of tired of HNers loudly proclaiming they are ignoring LLMs more than a year into this paradigm shift.

Is it that hard to input a prompt into the free version of ChatGPT and see how it helps with programming?


I did exactly that and found it lackluster for the domain I asked it about.

And most of the uses I've seen for it are realistically covered by a good LSP.

Or to put it another way: it's no good at writing algorithms or data structures (or at least no better than I would be with a first draft, but writing the first draft puts me ahead of the LLM in understanding the actual problem at hand, so handing it off to an LLM doesn't help me get to the final solution faster).

So that leaves writing boilerplate, but considering my experience with it writing more complex stuff, I would need to read over the boilerplate code to ensure it's correct, in which case I may as well have written it myself.


> found it lackluster for the domain I asked it for

Fair, that is possible depending on your domain.

> It's no good at writing algorithms or data structures

In my experience, this is untrue. I’ve gotten it to write algorithms with various constraints I had. You can even tell it to use specific function signatures instead of any stdlib, and make changes to tweak behavior.

> And most use I've seen on it realistically a good LSP covers.

Again, I really don’t understand this comparison. LSPs and LLMs go hand in hand.

I think it’s more of a workflow clash. One really needs to change how they operate to effectively use LLMs for programming. If you’re just typing nonstop, maybe it would feel like Copilot is just an LSP. But, if you try harder, LLMs are game changers when:

- maybe you like rubber ducking

- need to learn a new concept and implement it

- or need to glue things together

- or for new projects or features

- or filling in boilerplate based on existing context.


https://chat.openai.com/share/c8c19f42-240f-44e7-baf4-50ee5e...

https://godbolt.org/z/s9Yvnjz7K

I mean I could write the algorithm by hand pretty quickly in C++ and would follow the exact same thought pattern but also deal with the edge cases. And factoring in the loss of productivity from the context switch that is a net negative. This algorithm is also not generic over enough cases but that is just up to the prompt.

If I can't trust it to write `strip_whitespace` correctly, which is like 5 lines of code, can I trust it to do more without a thorough review of the code and writing a ton of unit tests? Well, I was going to do that anyway.
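
For reference, the five-ish lines I have in mind look roughly like this (a sketch only, assuming plain ASCII whitespace; the name and signature are just what I'd pick, not anything GPT produced):

    #include <algorithm>
    #include <cctype>
    #include <string>
    #include <utility>

    // Returns [first, last) into s with leading/trailing whitespace stripped.
    // For an empty or all-whitespace string both iterators are s.end(),
    // i.e. an empty range.
    std::pair<std::string::const_iterator, std::string::const_iterator>
    strip_whitespace(const std::string& s) {
        auto not_space = [](unsigned char c) { return !std::isspace(c); };
        auto first = std::find_if(s.begin(), s.end(), not_space);
        if (first == s.end()) return {s.end(), s.end()};  // all whitespace
        auto last = std::find_if(s.rbegin(), s.rend(), not_space).base();
        return {first, last};
    }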

The argument that I just need to learn better prompt engineering to make the LLM do what I want just doesn't sit well with me when instead I could just spend the time writing the code. As I said, your last point is absolutely the place I can see LLMs being actually useful, but then I need to spend a significant amount of time in code review for generated code from an "employee" who is known to make up interfaces or entire libraries that don't exist.


I'm a Python-slinging data scientist so C++ isn't my jam (to say the least), but I changed the prompt to the following and asked GPT-4:

> Write me an algorithm in C++ which finds the begin and end iterator of a sequence where leading and trailing whitespace is stripped. Please write secure code that handles any possible edge cases.

It gave me this:

https://chat.openai.com/share/55a4afe2-5db2-4dd1-b516-a3cacd...

I'm not sure what other edge cases there might be, however. This only covers one of them.

In general, I've found LLMs to be marginally helpful. Like, I can't ever remember how to get matplotlib to give me the plot I want, and 9 times out of 10 GPT-4 easily gives me the code I want. Anything even slightly off the beaten path, though, and it quickly becomes absolutely useless.


My guess is that this was generated using GPT4?

With free GPT I get https://chat.openai.com/share/f533429d-63ca-4505-8dc8-b8d2e7... which has exactly the same problem as my previous example and doesn't consider a string of all whitespace.

Sure, GPT-4 is better at that, but that wasn't the argument being made.

The example you gave absolutely was the code I would write on a first draft, since it does cover the edge cases (assuming we aren't dealing with the full UTF charset and everything that could be considered a space there).

However, this is code that is trivial to write in any language, so the "Is it that hard to input a prompt into the free version of ChatGPT and see how it helps with programming?" argument doesn't hold up. Am I to believe it will implement something more complex correctly? This is also code that would absolutely be in hundreds of codebases, so GPT has tons of context for it.


I think you have the mistaken impression that I was arguing with you (certainly my comment makes it clear that I don't feel that LLMs are a panacea). I merely thought that you might be curious how GPT-4 would respond.

> My guess is that this was generated using GPT4?

This is a good guess, since I stated outright that I used GPT-4, and then mentioned GPT-4 later on in the comment.


I was curious and yes I was mistaken.


Yeah honestly, I think you have a completely different expectation and style of usage than what is optimal with LLMs. I don’t have the energy to convince you further, but maybe one day it’ll click for you? No worries either way.


Could you maybe give me an example of what is considered an optimal use of LLMs?

Maybe a prompt to GPT?


Like sibling commenter mentioned, simonw’s blog is a great resource.

Regarding your point around being able to whip up the code yourself - the point is to have a decent starting point to save time and energy. Like you said, you know the edge cases so you could skip the boring parts using GPT and focus purely on fixing those. Though, with more prompting (especially providing examples), GPT can also handle that for you.

I have nearly 2 decades of experience as a developer and it took me a while to reorient my flow around LLMs. But now that I have, it’s truly gamechanging.

And since you asked, here’s my system prompt:

You are an experienced developer who follows industry standards and best practices. Write lean code and explain briefly using bullet points or numbered lists. Elaborate only when explaining concepts or making choices. Always mention which file and where to store provided code.

Tech Stack: < insert all the languages, frameworks, etc you’d like to use >

If I provide code, highlight and explain problematic code. Also show and explain the corrected code.

Take a deep breath and think step by step.

Also, always use GPT4 and customize the above to your style and liking.


I will definitely try this out when I have time later in the day.

There is some code I would really prefer not to write that is a decent test case for this and won't expose company code to GPT. Will give feedback when I am done. Maybe you are correct.


If you really want to experiment, give Cursor a try. It’s free up to a limit, so maybe it’ll be enough for your example use case.

It handles even more complex use cases and will automatically include/patch code for you via the inbuilt LLM framework. This helps with iteration and modifications as you massage it to what you need. Plus, it’ll scan your code and find the languages/frameworks automatically.

Finally, keep in mind that the goal should not be perfect production code - that’s just Twitter AI hype. It’s about saving time and energy for you (the human) to achieve more than possible before.

https://cursor.sh/


To give some feedback.

I tried your prompt and the above approach, and it took me about 45 minutes of futzing around to get a result I am happy to begin iteration on.

Effectively: I have an 80-bit byte array representing a timestamp struct consisting of a 48-bit unsigned integer for seconds and a 32-bit unsigned integer for nanoseconds. The byte array is big endian and the host system is little endian.

I gave it full signatures for all functions and relevant structs, and instructions on how I would want the parsing done regarding algorithmic complexity, and yet it still took multiple iterations to get anything useful.

At this point it is converting to little endian during the decode, then checking whether the host system is big endian and converting back to big endian if it is.

There are likely some easy optimisations to be done there, and I would definitely have gotten to this point quicker had I just written the 10 lines of code this needed, plus done the optimisations; I'm pretty sure that entire operation can happen in a few instructions.
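
For comparison, the hand-written version I had in mind assembles the fields with shifts, which is endian-agnostic, so there's no host-endianness check or byte-swap round trip (the struct and function names below are made up for illustration, not the actual code I fed it):

    #include <cstdint>

    struct Timestamp {        // hypothetical names, not the real structs
        uint64_t seconds;     // 48-bit value carried in a 64-bit field
        uint32_t nanoseconds;
    };

    // Decode a 10-byte big-endian buffer: 48-bit seconds, then 32-bit nanos.
    Timestamp decode_timestamp(const uint8_t buf[10]) {
        Timestamp ts{};
        ts.seconds = (uint64_t)buf[0] << 40 | (uint64_t)buf[1] << 32 |
                     (uint64_t)buf[2] << 24 | (uint64_t)buf[3] << 16 |
                     (uint64_t)buf[4] << 8  | (uint64_t)buf[5];
        ts.nanoseconds = (uint32_t)buf[6] << 24 | (uint32_t)buf[7] << 16 |
                         (uint32_t)buf[8] << 8  | (uint32_t)buf[9];
        return ts;
    }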


Simonw's blog has some examples that I'd say show off its usefulness and limitations, e.g.

https://simonwillison.net/2024/Mar/23/building-c-extensions-...

(linked previously above)


Are you happy with the C code generated there?

I'm not sure there isn't a buffer overflow in the vector_decode code he showed there. Likewise, I don't see any error checks in the code, and I'm not familiar enough with the SQLite API to know whether errors can be propagated upwards and what error conditions would mean in that code.

This code is probably fine for a quick side project but doesn't pass my smell test for anything close to production-ready code.

I definitely would want to see a lot of unit tests around the decode and encode functions, with fuzzing, and to be honest that would be the bulk of the work here. That and documentation for this code. Even though the encode function looks correct at first glance.

I also don't see an easy way to actually unit test this code as it is without actually running it through SQLite, which puts a lot of dependencies on the unit test.

I would either need to spend a lot more time massaging gpt to get this to a point where I would be fine shipping the code or you know just write it myself.


I think the point was like "when it comes to programming assistance, auto-completion/linting/and whatever else LSP does and syntax assist from Treesitter, are enough for me".

Though it does come across as a bit of an odd comparison. How about programming assistance via asking a colleague for help, Stack Overflow, online references, code examples, and other such things, which are closer to what the LLM would provide than LSP and Treesitter?


Interesting. It was 4. I can't share the chat I had where ChatGPT refused to help because I used the wrong words, because I can't find it (ChatGPT conversation history search when?), but I just remember it refusing to do something because it thought I was trying to break some sort of moral and ethical boundary writing a chrome extension when all I wanted to do is move some divs around or some such.


One time I wanted to learn about transmitter antenna design, just because I’m curious. ChatGPT 4 refused to give me basic information because you could use that to break some FCC regulations (I’m not even living in the US currently)


I usually get around that with "I'm writing a research paper" or "I'm writing a novel and need to depict this as accurate as possible"


If you want to be an amateur chemist I recommend not getting your instructions from an LLM that might be hallucinating. Chemistry can be very dangerous if you're following incorrect instructions.


From experience as a failed organic chemist (who happily switched to computational chemistry for reasons of self preservation) I can tell you it's plenty dangerous when you're following correct instructions :^)


Yes, just as the best professional cooks recommend not boiling cow eggs, as they can explode.


They don't explode, the shell simply cracks and then you get egg soup.

Now microwaving eggs... that's a different matter.


I was talking about cow eggs specifically! When ChatGPT et al got out, one of the funniest things to do was ask it about the best recipes for cow egg omelette or camel egg salad, and the LLM would provide. Sadly, most of it got patched somehow.


Oops... Yep, I missed that too. (On the internet, no one knows you're a dog.)

That's funny. It makes me wonder how these statistical mad libs machines will handle the gradual boundaries nature gives us. Almost all mammals give birth live, but not all. Nearly all mammals had mammalian parents, but not all.

Daniel Dennett was making this argument for why we haven't developed reasonable models for the nature of consciousness. It's because we're so sure there will be an absolute classification, and not a gradual accumulation of interacting systems that together yield the phenomenon.


Which uncensored model is willing to play hot or not? I just knew about llava. Are there other such models now?


Llava just integrates Clip with a llama model. Koboldcpp can now do this with many models out of the box:

* https://github.com/LostRuins/koboldcpp/releases/tag/v1.61.2


> There's an uncensored model for vision available as well.

You mean the LLaVA-based variants?



Links to all these models you speak of?


https://huggingface.co/georgesung/llama2_7b_chat_uncensored

https://huggingface.co/SkunkworksAI/BakLLaVA-1

you'll have to brave 4chan yourself to find links to the NSFW ones, I don't actually have them.


I just can’t brave the venture to 4chan, I may get mugged or worse.


For someone interested in learning about LLMs, running them locally is a good way to understand the internals.

For everyone else, I wish they would experience these weak LLMs (locally or elsewhere) at least once before using the commercial ones, just to understand the various failure modes and to develop a healthy dose of skepticism towards the results instead of blindly trusting them as fact/truth.


Completely agree. Playing around with a weak LLM is a great way to give yourself a little bit of extra healthy skepticism for when you work with the strong ones.


This skepticism is completely justified since ChatGPT 3.5 is also happily hallucinating things that don't exist. For example how to integrate a different system Python interpreter into pyenv. Though maybe ChatGPT 4 doesn't :)


How do you learn about the internals by running LLMs locally? Are you playing with The code, runtime params, or just interacting via chat?


The abstractions are relatively brittle. If you don't have a powerful GPU, you will be forced to consider how to split the model between CPU and GPU, how much context size you need, whether to quantize the model, and the tradeoffs implied by these things. To understand these, you have to develop a basic model how an LLM works.


By interacting with it. You see the contours of its capabilities much more clearly, learn to recognize failure modes, understand how prior conversation can set the course of future conversation in a way that's almost impossible to correct without starting over or editing the conversation history.


I don't really think this is true, you can't really extrapolate the strengths and weaknesses of bigger models from the behavior of smaller/quantized models and in fact a lot of small models are actually great at lots of things and better at creative writing. If you want to know how they work, just learn how they work, it takes like 5 hours of watching Youtube videos if you're a programmer.


Sure, you can't extrapolate the strengths and weaknesses of the larger ones from the smaller ones - but you still get a much firmer idea of what "they're fancy autocomplete" actually means.

If nothing else it does a great job of demystifying them. They feel a lot less intimidating once you've seen a small one running on your computer write a terrible haiku and hallucinate some non-existent API methods.


It's funny that you say this, because the first thing I tried after ChatGPT came out (3.5-turbo was it?) was writing a haiku. It couldn't do it at all. Also, after 4 came out, it hallucinated an api that wasted a day for me. It's an api that absolutely should have existed, but didn't. Now, I frequently apply llm to things that are easily verifiable, and just double check everything.


> but you still get a much firmer idea of what "they're fancy autocomplete" actually means.

Interesting how you can have the same experience and come to opposite conclusions.

Seeing so many failure modes of the smaller models fall by the wayside as compute goes brrr just made me realize how utterly meaningless that phrase is.


If you have an >=M1-class machine with sufficient RAM, the medium-sized models that are on the order of 30GB in size perform decently on many tasks to be quite useful without leaking your data.


I'm using Mixtral 8x7b as a llamafile on an M1 regularly for coding help and general Q&A. It's really something wonderful to just run a single command and have this incredible offline resource.


I concur; in my experience Mixtral is one of the best ~30G models (likely the best pro laptop-size model currently) and Gemma is quite good compared to other below 8GB models.


By any chance, do you have a good link to some help with the installation?


Use llamafile [1], it can be as simple as downloading a file (for mixtral, [2]), making it executable and running it. The repo README has all the info, it's simple and downloading the model is what takes the most time.

In my case I got the runtime detection issue (explained in the README "gotcha" section). Solved by running "assimilate" [3] on the downloaded llamafile.

    [1] https://github.com/Mozilla-Ocho/llamafile/
    [2] https://huggingface.co/jartine/Mixtral-8x7B-Instruct-v0.1-llamafile/resolve/main/mixtral-8x7b-instruct-v0.1.Q5_K_M.llamafile?download=true
    [3] https://cosmo.zip/pub/cosmos/bin/assimilate


Thank you !


Either https://lmstudio.ai (desktop app with a nice GUI) or https://ollama.com (command-line, more like a docker container, which you can also hook up to a web UI via https://openwebui.com) should be super straightforward to get running.


Thank you for letting me know it was possible on an M1. I'll try all this now.


I am the author of Msty [1]. My goal is to make it as straightforward as possible with just one click (once you download the app). If you try it, let me know what you think.

1: https://msty.app


Looks great. Can you recommend what GPU to get to just play with the models for a bit? (I want it to perform fast, otherwise I lose interest too quickly.) Are consumer GPUs like the RTX 4080 Super sufficient, or do I need anything else?


Why is this both free and closed source? Ideally, when you advertise privacy-first, I’d like to see a GitHub link with real source code. Or I’d rather pay for it to ensure you have a financial incentive to not sell my data.


It will be paid down the road, but we are not there yet. It’s all offline, data is locally saved. You own it, we don’t have it even if you ask for it.


There’s incredible competition in this space already - I’d highly recommend outright stating your future pricing plans, instead of a bait-and-switch later.


I'll try in a week+ when I'm back to a fast connection. Thank you.


Check out PrivateGPT on GitHub. Pretty much just works out of the box. I got Mistral7B running on a GTX 970 in about 30 minutes flat first try. Yep, that's the triple-digit GTX 970.


What is sufficient RAM in that case? 30gb+? Or can you get by streaming it?


30gb+, yeah. You can't get by streaming the model's parameters: NVMe isn't fast enough. Consumer GPUs and Apple Silicon processors boast memory bandwidths in the hundreds of gigabytes per second.

To a first order approximation, LLMs are bandwidth constrained. We can estimate single batch throughput as Memory Bandwidth / (Active Parameters * Parameter Size).

An 8-bit quantized Llama 2 70B conveniently uses 70GiB of VRAM (and then some, let's ignore that.) The M3 Max with 96GiB of VRAM and 300GiB/s bandwidth would have a peak throughput around 4.2 tokens per second.
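
As a quick sanity check on that arithmetic, plugging the same rough numbers into the formula (a back-of-the-envelope sketch, not a benchmark):

    #include <cstdio>

    int main() {
        // Each generated token streams all active weights from memory once,
        // so tokens/s is roughly bandwidth / bytes of active parameters.
        const double bandwidth_gib_per_s = 300.0;    // M3 Max, roughly
        const double active_params_billions = 70.0;  // Llama 2 70B, dense
        const double bytes_per_param = 1.0;           // 8-bit quantization
        const double model_gib = active_params_billions * bytes_per_param;
        std::printf("~%.1f tokens/s\n", bandwidth_gib_per_s / model_gib);
        return 0;  // prints ~4.3, i.e. the "around 4.2" ballpark above
    }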

Quantized models trade reduced quality for lower VRAM requirements and may also offer higher throughput with optimized kernels, largely as a consequence of transfering less data from VRAM into the GPU die for each parameter.

Mixture of Expert models reduce active parameters for higher throughput, but disk is still far too slow to page in layers.


It’s an awful thing for many to accept, but just downloading and setting up an LLM which doesn’t connect to the web doesn’t mean that your conversations with said LLM won’t be a severely interesting piece of telemetry that Microsoft (and likely Apple) would swipe to help deliver a ‘better service’ to you.


Local LLMs are also a fantastic tool for creative endeavors. Without prompt injection, and with the ability to modify the amount of noise and "creativity" in the output, absolutely bonkers things pop out.


They are not so bad as you are making out, tbh.

And privacy is a good enough reason to use local LLMs over commercial ones.


> The ones you can run on your own machine tend to be bad - really bad. They hallucinate wildly and fail at all sorts of tasks that the larger hosted ones succeed at.

Totally. I recently asked a locally-run "speed" LLM for the best restaurants in my (major) city, but it spit out restaurants opened by chefs from said city in other cities. It's not a thing you'd want to rely on for important work, but is still quite something.


You can just chat with ChatGPT for a while about something you know about and you'll learn that.


You can have really bad and fast, or really slow and decent right now. Choose one :)


Why not just interact with a virtual one that’s equally weak? You get all the same benefits


Who cares, a local LLM still knows way way more practical knowledge than you, and without internet would provide a ton of useful information. Not surprised by this typical techy attitude - something has to be 'perfect' to be useful.


I mean kinda. But there's a good chance this is also misleading. Lots of people have been fooled into thinking LLMs are inherently stupid because they have had bad experiences with GPT-3.5. The whole point is that the mistakes they make and even more fundamentally what they're doing changes as you scale them up.



