Metricon's comments | Hacker News

It seems like some things always remain the same: https://www.youtube.com/watch?v=G0ZZJXw4MTA


There is a Yes, {Prime Minister,Minister} for every occasion in tech.


You may want to have a look at Mistral OCR: https://mistral.ai/news/mistral-ocr


A GGUF version created by "isaiahbjork", compatible with LM Studio and the llama.cpp server, is available at: https://github.com/isaiahbjork/orpheus-tts-local/

To run the llama.cpp server:

  llama-server -m C:\orpheus-3b-0.1-ft-q4_k_m.gguf -c 8192 -ngl 28 --host 0.0.0.0 --port 1234 --cache-type-k q8_0 --cache-type-v q8_0 -fa --mlock


I've been testing this out; it's quite good and especially fast. Crazy that this works so well at Q4.


Can somebody please create a Gradio client for this as well? I really want to try it out, but the complexity trips me up.


Wait, how do you get audio out of llama-server?


Orpheus is a Llama model trained to understand/emit audio tokens (from SNAC). Those tokens are just added to its tokenizer as extra tokens.

Like most other tokens, they have text representations: '<custom_token_28631>' etc. You sample 7 of them (one frame), parse out the IDs, pass them through the SNAC decoder, and you now have a frame of audio from a 'text' pipeline.

The neat thing about this design is you can throw the model into any existing text-text pipeline and it just works.
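
To make that concrete, here is a rough sketch of the token-to-audio step in Python. It assumes the `snac` package's `SNAC.from_pretrained`/`decode` API and the 1+2+4 codebook interleaving per 7-token frame; the per-position ID offsets that the real decoder subtracts are left out, so treat it as an illustration rather than a drop-in replacement for `decoder.py`:

  # Rough sketch: 7 '<custom_token_N>' strings -> one frame of audio via SNAC.
  # (Per-position ID offsets omitted; see decoder.py in orpheus-tts-local for the real mapping.)
  import re
  import torch
  from snac import SNAC

  snac_model = SNAC.from_pretrained("hubertsiuzdak/snac_24khz").eval()

  def frame_to_audio(frame_tokens):
      # frame_tokens: 7 strings like '<custom_token_28631>'
      ids = [int(re.search(r"\d+", t).group()) for t in frame_tokens]
      codes = [
          torch.tensor([[ids[0]]]),                          # codebook 1: 1 code per frame
          torch.tensor([[ids[1], ids[4]]]),                  # codebook 2: 2 codes per frame
          torch.tensor([[ids[2], ids[3], ids[5], ids[6]]]),  # codebook 3: 4 codes per frame
      ]
      with torch.no_grad():
          return snac_model.decode(codes)  # 24 kHz mono waveform tensor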


Got it, so inference in the llama.cpp server won't actually get me any audio directly.


If you run the `gguf_orpheus.py` file in that repository, it will capture the audio tokens and convert them to a .wav file. With a little more work, you can play the streaming audio directly using `sounddevice` and an `OutputStream` (see the sketch after the benchmark numbers below).

On an NVIDIA 4090, it's producing:

  prompt eval time =      17.93 ms /    24 tokens (    0.75 ms per token,  1338.39 tokens per second)

         eval time =    2382.95 ms /   421 tokens (    5.66 ms per token,   176.67 tokens per second)

        total time =    2400.89 ms /   445 tokens
*A correction to the llama.cpp server command above: there are 29 layers, so it should read "-ngl 29" to load all the layers onto the GPU.
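
For the streaming playback mentioned above, a minimal sketch with `sounddevice` might look like the following. It assumes the decoder yields 24 kHz mono int16 chunks; `decoded_frames()` is a placeholder generator, not a function from the repository:

  # Minimal sketch: play decoded frames as they arrive instead of writing a .wav.
  import numpy as np
  import sounddevice as sd

  SAMPLE_RATE = 24000  # SNAC's 24 kHz output

  with sd.OutputStream(samplerate=SAMPLE_RATE, channels=1, dtype="int16") as stream:
      for chunk in decoded_frames():  # placeholder: yields int16 numpy arrays, one per frame
          stream.write(np.asarray(chunk, dtype=np.int16).reshape(-1, 1))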


Is there any reason not to just use `-ngl 999` to avoid that issue? Thanks for the help, though; I didn't realize LM Studio was just llama.cpp under the hood. I have it running now, though decoding is happening on CPU torch because of venv issues; it's still running at about realtime. I'm interested in making a full-fat GGUF to see what sort of degradation the quant introduces.

Sounds great though, can't wait to try finetuning and messing with the pretrained model. Have you tried it? I guess you just tokenize the voice with SNAC, transcribe it with Whisper, and then feed that in as a prompt? What a fascinating architecture.


You need to decode the tokens into audio. See the `convert_to_audio` method in `decoder.py`.

You can run `python gguf_orpheus.py --text "Hello, this is a test" --voice tara` and it will connect to the llama-server.

See https://github.com/isaiahbjork/orpheus-tts-local

See my GitHub issue for example output: https://github.com/isaiahbjork/orpheus-tts-local/issues/15


This amuses me tremendously. I began programming in the early 1980s and quickly developed an interest in Artificial Intelligence. At the time, there was great excitement that the introduction of "Expert Systems" would advance AI (they would later play a part in the 'Second AI Winter').

What Amazon appears to have done here is use a transformer-based neural network (i.e., an LLM) to translate natural language into symbolic logic rules, which are then used collectively in what could be identified as an Expert System.

Full Circle. Hilarious.

For reference, for those on the younger side: The Computer Chronicles (1984): https://www.youtube.com/watch?v=_S3m0V_ZF_Q


I don't see why this is hilarious at all.

The problem with expert systems (and most KG-type applications) has always been that translating unconstrained natural language into the system requires human-level intelligence.

It's been completely obvious for years that LLMs are a technology that lets us bridge that gap, and many of the best applications of LLMs do exactly that (e.g., code generation).


To be clear, my amusement isn't that I find this technique not useful for the purpose it was created for, but that 40 years later, in pursuit of advancing AI, we find ourselves somewhat back where we already were, albeit in a more semi-automated fashion, since someone still has to create the underlying rule set.

I do feel that the introduction of generative neural network models for both natural language and multimedia creation has been a tremendous boon for the advancement of AI; it just amuses me to see that what was old is new again.


Same with symbolic systems!


Seems likely that we were on the right track; it just took 40 years for computers to get good enough.


Right. The trouble with that approach is that it's great on the easy cases and degrades rapidly with scale.

This sounds like a fix for a very specific problem. An airline chatbot told a customer that some ticket was exchangeable. The airline claimed it wasn't. The case went to court. The court ruled that the chatbot was acting as an agent of the airline, and so ordinary rules of principal-agent law applied. The airline was stuck with the consequences of its chatbot's decision.[1]

Now, if you could reduce the Internal Revenue Code to rules in this way, you'd have something.

[1] https://www.bbc.com/travel/article/20240222-air-canada-chatb...


Yes, as I said in another comment: "By constraining the field it is trying to solve it makes grounding the natural language question in a knowledge graph tractable."

IRS rules should be tractable!


There are a number of ways this might get solved, but I would speculate that it will generally involve adding image metadata signed by a certificate authority, similar to the way SSL certificates are issued to domains.

I think eventually all digital cameras and image scanners will securely hash and sign images, just as forensic cameras do, to certify that an image was "captured" rather than generated.

Of course, this leaves a grey area for image-editing applications such as Photoshop, so some other level of certificate-based signing may need to be introduced there as well.
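
To make the idea concrete, here is a toy sketch of the hash-and-sign step using Python's `cryptography` library. The key handling, function names, and metadata flow are invented for illustration; a real scheme would keep the key in the camera's secure element and verify against a CA-issued device certificate, as described above.

  # Toy sketch of "sign at capture, verify later" (illustrative only).
  from cryptography.exceptions import InvalidSignature
  from cryptography.hazmat.primitives import hashes
  from cryptography.hazmat.primitives.asymmetric import ec

  device_key = ec.generate_private_key(ec.SECP256R1())  # would live in the camera's secure element

  def sign_capture(image_bytes: bytes) -> bytes:
      # ECDSA signature over SHA-256 of the raw image data, stored alongside the image as metadata.
      return device_key.sign(image_bytes, ec.ECDSA(hashes.SHA256()))

  def verify_capture(image_bytes: bytes, signature: bytes) -> bool:
      # A verifier would use the device's CA-issued certificate rather than the raw public key.
      try:
          device_key.public_key().verify(signature, image_bytes, ec.ECDSA(hashes.SHA256()))
          return True
      except InvalidSignature:
          return False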


For those who might not be aware of it, there is also an open source project on GitHub called "Twinny", which is an offline Visual Studio Code plugin equivalent to Copilot: https://github.com/rjmacarthy/twinny

It can be used with a number of local model services. Currently, on an NVIDIA 4090, I'm running both the base and instruct deepseek-coder 6.7b models as Q5_K_M-quantized GGUF files (for performance) through the llama.cpp "server", with the base model handling completions and the instruct model handling chat interactions (rough example commands after the links below).

llama.cpp: https://github.com/ggerganov/llama.cpp/

deepseek-coder 6.7b base GGUF files: https://huggingface.co/TheBloke/deepseek-coder-6.7B-base-GGU...

deepseek-coder 6.7b instruct GGUF files: https://huggingface.co/TheBloke/deepseek-coder-6.7B-instruct...
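
For anyone wanting to reproduce a setup like this, the two llama.cpp server instances look roughly like the following; the ports, context size, and file names here are my own choices, not anything Twinny requires, and Twinny is then pointed at the two endpoints in its provider settings:

  llama-server -m deepseek-coder-6.7b-base.Q5_K_M.gguf -c 4096 -ngl 99 --host 127.0.0.1 --port 8080

  llama-server -m deepseek-coder-6.7b-instruct.Q5_K_M.gguf -c 4096 -ngl 99 --host 127.0.0.1 --port 8081

The first (base model) serves completions; the second (instruct model) serves chat.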


Started in 1982 on a Tandy Color Computer. Still at it.

#1 advice: focus on getting things done (as many will not), and Lego-build simplicity out of interconnected yet isolated pieces as much as possible.


Currently, if you disable chat history, you'll see this message:

Chat History is off for this browser. When history is turned off, new chats on this browser won't appear in your history on any of your devices, be used to train our models, or stored for longer than 30 days. This setting does not sync across browsers or devices.


It's absolutely insane to trust that they won't do this.


No, it's not. If they explicitly say they won't train on your data and then they do, it's going to come out in discovery in one of the lawsuits they're fighting, and the consequences would be significant. Plus, there's little incentive for them to lie about it, given that most people leave history on.


Yeah, because no large tech company has ever lied to its customers about how their data is being handled. Oh wait, there are lawsuits surrounding this sort of thing all the time.


I wouldn't trust them with nuclear secrets, but to say it's "insane" to trust that they're going to do what they explicitly say they're going to do just isn't logical.


https://privacy.openai.com/policies

They hide this link a bit. They completed my opt-out request in about ten minutes, and they at least claim not to be using any of my data for training going forward.

I didn't lose any features like Chat History.


BTW, for anyone who might not be aware of it, this model trained by Intel on the Mistral architecture is probably the single best general 7B model currently available:

https://huggingface.co/Intel/neural-chat-7b-v3-2 (also see https://huggingface.co/Intel/neural-chat-7b-v3-1 from the previous version for more details)

It's licensed Apache 2.0 and unaligned (uncensored).


How is it better than the model from the team that made the dataset? https://huggingface.co/Open-Orca/Mistral-7B-SlimOrca


The Intel one had supervised fine-tuning with the SlimOrca dataset, and then DPO alignment on top of that using a preference dataset.

The technique for generating the preference data is what's so interesting about that one. Instead of having human labelers choose a preferred response, they generated a response from a small model and a large model, and then always selected the large model's response as the preferred one.
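
In code, that pair construction amounts to something like this (a toy sketch; the generation calls are placeholders, not Intel's actual pipeline):

  # Toy sketch: small model's answer = rejected, large model's answer = chosen.
  def build_preference_pairs(prompts, small_generate, large_generate):
      pairs = []
      for prompt in prompts:
          rejected = small_generate(prompt)  # weaker model's response
          chosen = large_generate(prompt)    # stronger model's response, always preferred
          pairs.append({"prompt": prompt, "chosen": chosen, "rejected": rejected})
      return pairs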


I haven't personally tried that one, but on the HuggingFace LLM Leaderboard:

Open-Orca/Mistral-7B-SlimOrca - AVG: 60.37, ARC: 62.54, HellaSwag: 83.86, MMLU: 62.77, TruthfulQA: 54.23, Winogrande: 77.43, GSM8k: 21.38

Intel/neural-chat-7b-v3-2 - AVG: 68.29, ARC: 67.49, HellaSwag: 83.92, MMLU: 63.55, TruthfulQA: 59.68, Winogrande: 79.95, GSM8k: 55.12


The RSS feed for it here appears to work: https://www.latent.space/feed


Latent Space author here - it's a Substack, so it should work.

Let me know if there are any other issues!

