Metricon's comments | Hacker News

It seems like some things always remain the same: https://www.youtube.com/watch?v=G0ZZJXw4MTA


There is a Yes, {Prime Minister,Minister} for every occasion in tech.


You may want to have a look at Mistral OCR: https://mistral.ai/news/mistral-ocr


A GGUF version created by "isaiahbjork", compatible with LM Studio and the llama.cpp server, is available at: https://github.com/isaiahbjork/orpheus-tts-local/

To run the llama.cpp server:

  llama-server -m C:\orpheus-3b-0.1-ft-q4_k_m.gguf -c 8192 -ngl 28 --host 0.0.0.0 --port 1234 --cache-type-k q8_0 --cache-type-v q8_0 -fa --mlock


I've been testing this out; it's quite good and especially fast. Crazy that this works so well at Q4.


Can somebody please create a Gradio client for this as well? I really want to try it out, but the complexity trips me up.


Wait, how do you get audio out of llama-server?


Orpheus is a Llama model trained to understand/emit audio tokens (from SNAC). Those tokens are just added to its tokenizer as extra tokens.

Like most other tokens, they have text representations: '<custom_token_28631>' etc. You sample 7 of them (one frame), parse out the IDs, pass them through the SNAC decoder, and you now have a frame of audio from a 'text' pipeline.

The neat thing about this design is you can throw the model into any existing text-text pipeline and it just works.
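
To make that concrete, here is a rough sketch of the token-to-audio step in Python. It assumes the `snac` package's `SNAC.from_pretrained`/`decode` API and the 1+2+4 codebook interleaving per 7-token frame; the per-position ID offsets that the real decoder subtracts are left out, so treat it as an illustration rather than a drop-in replacement for `decoder.py`:

  # Rough sketch: 7 '<custom_token_N>' strings -> one frame of audio via SNAC.
  # (Per-position ID offsets omitted; see decoder.py in orpheus-tts-local for the real mapping.)
  import re
  import torch
  from snac import SNAC

  snac_model = SNAC.from_pretrained("hubertsiuzdak/snac_24khz").eval()

  def frame_to_audio(frame_tokens):
      # frame_tokens: 7 strings like '<custom_token_28631>'
      ids = [int(re.search(r"\d+", t).group()) for t in frame_tokens]
      codes = [
          torch.tensor([[ids[0]]]),                          # codebook 1: 1 code per frame
          torch.tensor([[ids[1], ids[4]]]),                  # codebook 2: 2 codes per frame
          torch.tensor([[ids[2], ids[3], ids[5], ids[6]]]),  # codebook 3: 4 codes per frame
      ]
      with torch.no_grad():
          return snac_model.decode(codes)  # 24 kHz mono waveform tensor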


Got it, so inference in the llama.cpp server won't actually get me any audio directly.


If you run the `gguf_orpheus.py` file in that repository, it will capture the audio tokens and convert them to a .wav file. With a little more work, you can play the streaming audio directly using `sounddevice` and an `OutputStream` (see the sketch after the benchmark numbers below).

On an NVIDIA 4090, it's producing:

  prompt eval time =      17.93 ms /    24 tokens (    0.75 ms per token,  1338.39 tokens per second)

         eval time =    2382.95 ms /   421 tokens (    5.66 ms per token,   176.67 tokens per second)

        total time =    2400.89 ms /   445 tokens
*A correction to the llama.cpp server command above: there are 29 layers, so it should read "-ngl 29" to load all the layers onto the GPU.
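
For the streaming playback mentioned above, a minimal sketch with `sounddevice` might look like the following. It assumes the decoder yields 24 kHz mono int16 chunks; `decoded_frames()` is a placeholder generator, not a function from the repository:

  # Minimal sketch: play decoded frames as they arrive instead of writing a .wav.
  import numpy as np
  import sounddevice as sd

  SAMPLE_RATE = 24000  # SNAC's 24 kHz output

  with sd.OutputStream(samplerate=SAMPLE_RATE, channels=1, dtype="int16") as stream:
      for chunk in decoded_frames():  # placeholder: yields int16 numpy arrays, one per frame
          stream.write(np.asarray(chunk, dtype=np.int16).reshape(-1, 1))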


Is there any reason not to just use `-ngl 999` to avoid that issue? Thanks for the help, though; I didn't realize LM Studio was just llama.cpp under the hood. I have it running now, though decoding is happening on CPU torch because of venv issues; it's still running at about realtime. I'm interested in making a full-fat GGUF to see what sort of degradation the quant introduces.

Sounds great though, can't wait to try finetuning and messing with the pretrained model. Have you tried it? I guess you just tokenize the voice with SNAC, transcribe it with Whisper, and then feed that in as a prompt? What a fascinating architecture.


You need to decode the tokens into audio. See the `convert_to_audio` method in `decoder.py`.

You can run `python gguf_orpheus.py --text "Hello, this is a test" --voice tara` and it will connect to the llama-server.

See https://github.com/isaiahbjork/orpheus-tts-local

See my GitHub issue for example output: https://github.com/isaiahbjork/orpheus-tts-local/issues/15


This amuses me tremendously. I began programming in the early 1980s and quickly developed an interest in Artificial Intelligence. At the time, there was great excitement that the introduction of "Expert Systems" would advance AI (they would later play a part in the 'Second AI Winter').

What Amazon appears to have done here is use a transformer-based neural network (i.e., an LLM) to translate natural language into symbolic logic rules, which are then used collectively in what could be identified as an Expert System.

Full Circle. Hilarious.

For reference, for those on the younger side: The Computer Chronicles (1984): https://www.youtube.com/watch?v=_S3m0V_ZF_Q


I don't see why this is hilarious at all.

The problem with expert systems (and most KG-type applications) has always been that translating unconstrained natural language into the system requires human-level intelligence.

It's been completely obvious for years that LLMs are a technology that lets us bridge that gap, and many of the best applications of LLMs do exactly that (e.g., code generation).


To be clear, my amusement isn't that I find this technique not useful for the purpose it was created for, but that 40 years later, in pursuit of advancing AI, we find ourselves somewhat back where we already were, albeit in a more semi-automated fashion, since someone still has to create the underlying rule set.

I do feel that the introduction of generative neural network models for both natural language and multimedia creation has been a tremendous boon for the advancement of AI; it just amuses me to see that what was old is new again.


Same with symbolic systems!


Seems likely that we were on the right track; it just took 40 years for computers to get good enough.


Right. The trouble with that approach is that it's great on the easy cases and degrades rapidly with scale.

This sounds like a fix for a very specific problem. An airline chatbot told a customer that some ticket was exchangeable. The airline claimed it wasn't. The case went to court. The court ruled that the chatbot was acting as an agent of the airline, and so ordinary rules of principal-agent law applied. The airline was stuck with the consequences of its chatbot's decision.[1]

Now, if you could reduce the Internal Revenue Code to rules in this way, you'd have something.

[1] https://www.bbc.com/travel/article/20240222-air-canada-chatb...


Yes, as I said in another comment: "By constraining the field it is trying to solve it makes grounding the natural language question in a knowledge graph tractable."

IRS rules should be tractable!


There are a number of ways this might get solved, but I would speculate that it will generally involve adding image metadata signed by a certificate authority, similar to the way SSL certificates are issued to domains.

I think eventually all digital cameras and image scanners will securely hash and sign images, just as forensic cameras do, to certify that an image was "captured" rather than generated.

Of course, this leaves a grey area for image-editing applications such as Photoshop, so some other level of certificate-based signing may need to be introduced there as well.
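
To make the idea concrete, here is a toy sketch of the hash-and-sign step using Python's `cryptography` library. The key handling, function names, and metadata flow are invented for illustration; a real scheme would keep the key in the camera's secure element and verify against a CA-issued device certificate, as described above.

  # Toy sketch of "sign at capture, verify later" (illustrative only).
  from cryptography.exceptions import InvalidSignature
  from cryptography.hazmat.primitives import hashes
  from cryptography.hazmat.primitives.asymmetric import ec

  device_key = ec.generate_private_key(ec.SECP256R1())  # would live in the camera's secure element

  def sign_capture(image_bytes: bytes) -> bytes:
      # ECDSA signature over SHA-256 of the raw image data, stored alongside the image as metadata.
      return device_key.sign(image_bytes, ec.ECDSA(hashes.SHA256()))

  def verify_capture(image_bytes: bytes, signature: bytes) -> bool:
      # A verifier would use the device's CA-issued certificate rather than the raw public key.
      try:
          device_key.public_key().verify(signature, image_bytes, ec.ECDSA(hashes.SHA256()))
          return True
      except InvalidSignature:
          return False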


For those who might not be aware of it, there is also an open source project on GitHub called "Twinny", which is an offline Visual Studio Code plugin equivalent to Copilot: https://github.com/rjmacarthy/twinny

It can be used with a number of local model services. Currently, on an NVIDIA 4090, I'm running both the base and instruct deepseek-coder 6.7b models as Q5_K_M-quantized GGUF files (for performance) through the llama.cpp "server", with the base model handling completions and the instruct model handling chat interactions (rough example commands after the links below).

llama.cpp: https://github.com/ggerganov/llama.cpp/

deepseek-coder 6.7b base GGUF files: https://huggingface.co/TheBloke/deepseek-coder-6.7B-base-GGU...

deepseek-coder 6.7b instruct GGUF files: https://huggingface.co/TheBloke/deepseek-coder-6.7B-instruct...
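
For anyone wanting to reproduce a setup like this, the two llama.cpp server instances look roughly like the following; the ports, context size, and file names here are my own choices, not anything Twinny requires, and Twinny is then pointed at the two endpoints in its provider settings:

  llama-server -m deepseek-coder-6.7b-base.Q5_K_M.gguf -c 4096 -ngl 99 --host 127.0.0.1 --port 8080

  llama-server -m deepseek-coder-6.7b-instruct.Q5_K_M.gguf -c 4096 -ngl 99 --host 127.0.0.1 --port 8081

The first (base model) serves completions; the second (instruct model) serves chat.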


Started in 1982 on a Tandy Color Computer. Still at it.

#1 advice: focus on getting things done (as many will not), and Lego-build simplicity out of interconnected yet isolated pieces as much as possible.


Currently, if you disable chat history, you'll see this message:

Chat History is off for this browser. When history is turned off, new chats on this browser won't appear in your history on any of your devices, be used to train our models, or stored for longer than 30 days. This setting does not sync across browsers or devices.


It's absolutely insane to trust that they won't do this.


No, it's not. If they explicitly say they won't train on your data and then they do, it's going to come out in discovery in one of the lawsuits they're fighting, and the consequences would be significant. Plus, there's little incentive for them to lie about it, given that most people leave history on.


Yeah, because no large tech company has ever lied to its customers about how their data is being handled. Oh wait, there are lawsuits surrounding this sort of thing all the time.


I wouldn't trust them with nuclear secrets, but to say it's "insane" to trust that they're going to do what they explicitly say they're going to do just isn't logical.


https://privacy.openai.com/policies

They hide this link a bit. They completed my opt-out request in about ten minutes, and they at least claim not to be using any of my data for training going forward.

I didn't lose any features like Chat History.


BTW, for anyone who might not be aware of it, this model trained by Intel on the Mistral architecture is probably the single best general 7B model currently available:

https://huggingface.co/Intel/neural-chat-7b-v3-2 (also see https://huggingface.co/Intel/neural-chat-7b-v3-1 from the previous version for more details)

It's licensed Apache 2.0 and unaligned (uncensored).


How is it better than the model from the team that made the dataset? https://huggingface.co/Open-Orca/Mistral-7B-SlimOrca


The Intel one had supervised fine-tuning with the SlimOrca dataset, and then DPO alignment on top of that using a preference dataset.

The technique for generating the preference data is what's so interesting about that one. Instead of having human labelers choose a preferred response, they generated a response from a small model and a large model, and then always selected the large model's response as the preferred one.
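
In code, that pair construction amounts to something like this (a toy sketch; the generation calls are placeholders, not Intel's actual pipeline):

  # Toy sketch: small model's answer = rejected, large model's answer = chosen.
  def build_preference_pairs(prompts, small_generate, large_generate):
      pairs = []
      for prompt in prompts:
          rejected = small_generate(prompt)  # weaker model's response
          chosen = large_generate(prompt)    # stronger model's response, always preferred
          pairs.append({"prompt": prompt, "chosen": chosen, "rejected": rejected})
      return pairs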


I haven't personally tried that one, but on the HuggingFace LLM Leaderboard:

Open-Orca/Mistral-7B-SlimOrca - AVG: 60.37, ARC: 62.54, HellaSwag: 83.86, MMLU: 62.77, TruthfulQA: 54.23, Winogrande: 77.43, GSM8k: 21.38

Intel/neural-chat-7b-v3-2 - AVG: 68.29, ARC: 67.49, HellaSwag: 83.92, MMLU: 63.55, TruthfulQA: 59.68, Winogrande: 79.95, GSM8k: 55.12


The RSS feed for it here appears to work: https://www.latent.space/feed


Latent Space author here - it's a Substack, so it should work.

Let me know if there are any other issues!

