I think he should mainly be throwing it at the VST vendor rather than at the protection software, since the main issue in the article is that the VST vendor protected the installer but not the actual software (that said, they also show that the protection software is fairly trivial to hook and bypass).
> No matter how much some try to gaslight, homosexuality is abnormal
This is an abjectly silly thing to say, and people who push back on it are not gaslighting. Homosexuality occurs naturally and it's not even rare - it's far more common than red hair, for example.
Calling something like that "abnormal" isn't a statement of fact; it's purely a side-effect of what you choose to label "normal".
> Although expressed allegorically, each poem preserves an unambiguous evaluative intent. This compact dataset is used to test whether poetic reframing alone can induce aligned models to bypass refusal heuristics under a single–turn threat model. To maintain safety, no operational details are included in this manuscript; instead we provide the following sanitized structural proxy:
I don't follow the field closely, but is this a thing? Bypassing model refusals is something so dangerous that academic papers about it only vaguely hint at what their methodology was?
No, this paper is just exceptionally bad. It seems none of the authors are familiar with the scientific method.
Unless I missed it, there's also no mention of prompt formatting, model parameters, hardware and runtime environment, temperature, etc. It's just a waste of the reviewers' time.
Eh. Overnight, an entire field concerned with what LLMs could do emerged. The consensus appears to be that the unwashed masses should not have access to unfiltered (and thus unsafe) information. Some of it is based on reality, as there are always people who are easily suggestible.
Unfortunately, the ridiculousness spirals to the point where the real information cannot be trusted even in an academic paper. shrug In a sense, we are going backwards in terms of real information availability.
Personal note: I think the powers that be do not want to repeat the mistake they made with the interwebz.
Ideally (from a scientific/engineering standpoint), zero BS is acceptable.
Realistically, it is impossible to completely remove all BS.
Recognizing where BS is, and who is doing it, requires not just effort, but risk, because people who are BS’ing are usually doing it for a reason, and will fight back.
And maybe it turns out that you’re wrong, and what they are saying isn’t actually BS, and you’re the BS’er (due to some mistake, accident, mental defect, whatever).
And maybe it turns out the problem isn’t BS, but - and this is the real gold here - there is actually a hidden variable no one knew about, and this fight uncovers a deeper truth.
There is no free lunch here.
The problem IMO is a bunch of people are overwhelmed and trying to get their free lunch, mixed in with people who cheat all the time, mixed in with people who are maybe too honest or naive.
It’s a classic problem, and not one that just magically solves itself with no effort or cost.
LLMs have shifted the balance of power a bit in one direction, and it’s not in the direction of “truth, justice and the American way”.
But fake papers and data have been an issue since before the scientific method existed - it’s why the scientific method was developed!
And a paper which is made in a way in which it intentionally can’t be reproduced or falsified isn’t a scientific paper IMO.
<< I’m not sure what you’re trying to say by this.
I read the paper and I was interested in the concepts it presented. I am turning those around in my head as I try to incorporate some of them into my existing personal project.
What I am trying to say is that I am currently processing. In a sense, this forum serves to preserve some of that processing.
<< And a paper which is made in a way in which it intentionally can’t be reproduced or falsified isn’t a scientific paper IMO.
Obligatory: then we can dismiss most of the papers these days, I suppose.
FWIW, I am not really arguing against you. In some ways I agree with you, because we are clearly not living in 'no BS' land. But I am hesitant over what the paper implies.
I don't see the big issue with jailbreaks, except maybe for LLM providers needing to cover their asses, but the paper authors are presumably independent.
That LLMs don't give harmful information unsolicited, sure, but if you are jailbreaking, you are already dead set on getting that information and you will get it; there are so many ways: open uncensored models, search engines, Wikipedia, etc. LLM refusals are just a small bump.
For me they are just a fun hack more than anything else; I don't need an LLM to find out how to hide a body. In fact I wouldn't trust an LLM's answer, as I might get a completely wrong answer based on crime fiction, which I expect makes up most of its sources on these subjects. Might be good for writing poetry about it though.
I think the risks are overstated by AI companies, the subtext being "our products are so powerful and effective that we need to protect them from misuse". Guess what, Wikipedia is full of "harmful" information and we don't see articles every day saying how terrible it is.
I see an enormous threat here; I think you're just scratching the surface.
You have a customer facing LLM that has access to sensitive information.
You have an AI agent that can write and execute code.
Just imagine what you could do if you can bypass their safety mechanisms! Protecting LLMs from "social engineering" is going to be an important part of cybersecurity.
Having sensitive information is kind of inherent to the way the training slurps up all the data these companies can find. The people who run chatgpt don't want to dox people but also don't want to filter its inputs. They don't want it to tell you how to kill yourself painlessly but they want it to know what the symptoms of various overdoses are.
Yes, agents. But for that, I think the usual approaches to censoring LLMs are not going to cut it. It is like making a text box smaller on a web page as a way to protect against buffer overflows: it will be enough for honest users, but no one who knows anything about cybersecurity will consider it appropriate. It has to be validated on the back end.
In the same way, an LLM shouldn't have access to resources that aren't directly accessible to the user. If the agent works on the user's data on the user's behalf (ex: vibe coding), then I don't consider jailbreaking to be a big problem. It could help write malware or things like that, but then again, it is not as if script kiddies couldn't work without AI.
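To make that concrete, here is a minimal sketch (all names hypothetical, nothing vendor-specific) of what "validate on the back end" could look like for an agent: the tool call is authorized against the user's own permissions, so even a fully jailbroken model can't reach anything the user couldn't already reach directly.

type User = { id: string; allowedDocs: Set<string> }

const docStore = new Map<string, string>([
  ['doc-1', 'public roadmap'],
  ['doc-2', 'salary spreadsheet'],
])

// The check lives server-side and is keyed on the user's permissions, so
// whatever the model was talked into requesting, it can only touch what the
// user could already access directly.
function readDocForUser(user: User, docId: string): string {
  if (!user.allowedDocs.has(docId)) {
    return `refused: ${user.id} has no access to ${docId}`
  }
  return docStore.get(docId) ?? 'not found'
}

const alice: User = { id: 'alice', allowedDocs: new Set(['doc-1']) }
console.log(readDocForUser(alice, 'doc-2')) // refused, however the prompt was phrased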
> If the agent works on the user's data on the user's behalf (ex: vibe coding), then I don't consider jailbreaking to be a big problem. It could help write malware or things like that, but then again, it is not as if script kiddies couldn't work without AI.
Tricking it into writing malware isn't the big problem that I see.
It's things like prompt injection from fetching external URLs; that's going to be a major route for RCE attacks.
There are plenty of things we should be doing to help mitigate these threats, but not all companies follow best practices when it comes to technology and security...
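One such mitigation, sketched here with hypothetical names (this isn't any particular framework's API): content fetched from an external URL gets tagged as untrusted data, and any side-effecting action the model proposes while that content is in context has to be confirmed by a human instead of auto-executing.

type ProposedAction =
  | { kind: 'answer'; text: string }
  | { kind: 'runShell'; command: string }

// Tag fetched pages so downstream code treats the text as data, not as an
// instruction channel.
type FetchedPage = { url: string; body: string; trusted: false }

async function fetchPage(url: string): Promise<FetchedPage> {
  const res = await fetch(url)
  return { url, body: await res.text(), trusted: false }
}

// Side-effecting actions proposed while untrusted content is in context get
// routed to a human instead of executing automatically.
function gateAction(action: ProposedAction, sawUntrustedContent: boolean): 'execute' | 'ask-user' {
  if (action.kind === 'runShell' && sawUntrustedContent) {
    return 'ask-user'
  }
  return 'execute'
}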
If you create a chatbot, you don't want screenshots of it on X showing it helping users commit suicide or giving itself weird nicknames based on dubious historical figures. I think that's probably the use-case for this kind of research.
Yes, that's what I meant by companies doing this to cover their asses, but then again, why should presumably independent researchers be so scared of that, to the point of not even releasing a mild working example?
Furthermore, using poetry as a jailbreak technique is very obvious, and if you blame an LLM for responding to such an obvious jailbreak, you may as well blame Photoshop for letting people make porn fakes. It is very clear that the intent comes from the user, not from the tool. I understand why companies want to avoid that, I just don't think it is that big a deal. Public opinion may differ though.
Maybe their methodology worked at the start but has since stopped working. I assume model outputs are passed through another model that flags whether the prompt was a successful jailbreak, so that guardrails can be enhanced.
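If that assumption is right, the loop could be as simple as the following sketch (everything here is a made-up stand-in for whatever judge model a provider might actually run):

type JailbreakReport = { prompt: string; output: string; flagged: boolean }

// Trivial stand-in for a second "judge" model scoring the output.
function classifyHarm(text: string): { harmful: boolean } {
  return { harmful: /synthesis route|step-by-step exploit/i.test(text) }
}

function auditExchange(prompt: string, output: string): JailbreakReport {
  // Flagged exchanges would feed back into guardrail tuning, which is why a
  // published jailbreak can quietly stop working later.
  const verdict = classifyHarm(output)
  return { prompt, output, flagged: verdict.harmful }
}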
Too dangerous to handle or too dangerous for openai's reputation when "journalists" write articles about how they managed to force it to say things that are offensive to the twitter mob? When AI companies talk about ai safety, it's mostly safety for their reputation, not safety for the users.
Do you have a link that explains in more detail what was kept away from whom and why? What you wrote is wide open to all kinds of sensational interpretations which are not necessarily true, or even what you meant to say.
I found it hard to extract any signal from this extremely chatty site. But AFAICT the tagline "basically modernized Gopher" wouldn't be too far off the mark?
Services like Mullvad and Signal are in the business of passing along messages between other parties; messages the service isn't a party to. With chatgpt chat histories, the user is talking directly to the service - you're suggesting the service should E2EE messages to and from itself, to prevent itself from spying on data generated by its own service?
> Large number of upvotes on the quoted comment however.
Sure, and also downvotes - that measures factionalism, not correctness.
But tech-wise, you're confused. Functionally speaking, chatgpt is a shared document editor - the server needs to store chat histories for the same reason Google Docs stores the content of documents. Users can submit text to chatgpt.com from one browser, and later edit that text from the app or a different browser. Ergo the text is stored on the server, simple as that.
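To put it concretely, a stripped-down sketch (hypothetical names, an in-memory Map instead of a real database) is enough to show why the text has to live on the server rather than in any one browser:

type Chat = { id: string; messages: string[] }

const chats = new Map<string, Chat>()

// Browser A submits text -> the server keeps it.
function saveMessage(chatId: string, text: string): void {
  const chat = chats.get(chatId) ?? { id: chatId, messages: [] }
  chat.messages.push(text)
  chats.set(chatId, chat)
}

// Browser B (or the phone app) asks for the same chat later; the server can
// only answer because it stored the text in the first place.
function loadChat(chatId: string): Chat | undefined {
  return chats.get(chatId)
}

saveMessage('chat-1', 'hello from browser A')
console.log(loadChat('chat-1')) // visible from any other client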
SyncThing syncs only when both clients are running at the same time. Nobody who edits a document on a website expects that they'll need to leave that browser window open in order to see the document in a different browser.
Am I missing something? Is this seriously a heated HN debate over "why does this website need to store the text it sends to people who view the website?"?
We're not talking about collaborative tooling, just a record of what you've asked an AI assistant. If it doesn't sync right away, it's not the end of the world. I find that's true with most things.
And the clients don't need to be running at the same time if you have a third device that's always on and receiving the changes from either (like a backup system). Eventually everything arrives. It's not as robust as what Google or iCloud gives you, but it's good enough for me.
Chatgpt.com is essentially a CRUD app. What you're saying here amounts to saying that it could conceivably have been designed to work dramatically differently from all other CRUD apps. And obviously that's true, but why would it be?
It's a website! You submit text that you'll view or edit later, so the server stores it. How is that controversial to an HN audience?
Also:
> the clients don't need to be running at the same time if you have a third device that's always on
An always-on device that stores data in order to sync it to clients is a server.
TBH it sounds like you're just imagining a very different service than the one OpenAI operates. You're imagining something where you send an input and the server returns an output - and after that they're out of the equation, and storing the output somewhere is a separate concern that could be left up to the user.
But the service they actually operate is functionally a collaborative document editor - the chat histories are basically rich text docs that you can view, edit, archive, share with others, and which are integrated with various server-side tools. And the document very obviously needs to be stored on the server to do all those things.
It's great that you'd enjoy a significantly worse product that requires you to also be familiar with a completely unrelated product.
For some reason, consumers have decided that they prefer a significantly better product that doesn't require any additional applications or technical expertise ¯\_(ツ)_/¯
Great point! After playing that game I and a few friends were trading real-world photos of spots where we'd found examples of the in-game thing you're talking about.
Funny, just yesterday I found myself casting in a way I'd never seen before:
const arr = ['foo'] as ['foo']
This wound up being useful in a situation that boiled down to:
type SomeObj = { foo: string, bar: string }
export const someFn = (props: (keyof SomeObj)[]) => {}
// elsewhere
const props = ['foo'] as ['foo']
someFn(props)
In a case like that `as const` doesn't work, since the function doesn't accept a readonly array. Of course there are several other ways to do it, but in my case the call site didn't already import the SomeObj type, so casting "X as X" seemed like the simplest fix.
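For comparison, here's a self-contained sketch of the situation (SomeObj and someFn inlined rather than imported), showing why `as const` fails and two variants that do type-check:

type SomeObj = { foo: string, bar: string }
const someFn = (props: (keyof SomeObj)[]) => {}

// `as const` produces readonly ["foo"], which can't be passed where a
// mutable (keyof SomeObj)[] is expected:
// someFn(['foo'] as const)            // type error

// Annotating with the named type works, but needs SomeObj in scope:
const viaAnnotation: (keyof SomeObj)[] = ['foo']
someFn(viaAnnotation)

// Casting "X as X" pins the literal tuple type ['foo'], which is assignable
// because 'foo' is one of SomeObj's keys, and needs no import:
const viaTupleCast = ['foo'] as ['foo']
someFn(viaTupleCast)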
Er, my justification was that the code in question was meant to be minimally demonstrating someFn, and adding an import or a verbose type seemed to distract from that a little.
But mostly it just gave me a chuckle. I tried it because it seemed logical, but I didn't really think it was going to work until it did.
It sounds like you didn't find any issues with either of them, except that the VST vendor chose not to protect the thing you were hoping to crack?