@irfn, that's an interesting idea. I will definitely try to create a benchmark on my local M2 machine with llama3-7b, just for comparison.
Yes, Ollama and Bodhi App both use llama.cpp, but our approaches are different. Ollama embeds the llama.cpp server binary within its own binary, copies it to a tmp folder at runtime, and runs it as a web server. Any request that comes to Ollama is then forwarded to this server, and the reply is sent back to the client.
Bodhi embeds the llama.cpp server code directly, so no binary is copied to a tmp folder. When a request comes to Bodhi App, it invokes the llama.cpp code in-process and sends the response back to the client, so there is no request hopping.
Hopefully that approach gives us some benefits.
Also, Bodhi uses Rust as its programming language. IMHO Rust has an excellent interface with C/C++ libraries, so the C code is invoked through the C FFI bridge. Given Rust's memory safety, fearless concurrency, and zero-cost abstractions, this should give Bodhi's approach some performance benefit.
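To illustrate what "invoking C code through the FFI bridge" means, here is a minimal sketch that calls `strlen` from the C standard library directly from Rust, in the same process. This is only an assumption-laden stand-in: Bodhi's actual llama.cpp bindings are not shown in this thread, and the wrapper name `c_strlen` is hypothetical.

```rust
use std::ffi::CString;
use std::os::raw::c_char;

// Declare a function from the C standard library. Rust's std already links
// against libc on common platforms, so no extra build setup is needed.
// (Hypothetical stand-in for llama.cpp functions; not Bodhi's real bindings.)
extern "C" {
    fn strlen(s: *const c_char) -> usize;
}

/// Hypothetical safe wrapper: converts a Rust &str into a NUL-terminated
/// C string and calls the C function in-process (no subprocess, no HTTP hop).
fn c_strlen(text: &str) -> usize {
    // CString::new fails if the input contains an interior NUL byte.
    let c_text = CString::new(text).expect("input must not contain NUL bytes");
    // SAFETY: c_text is a valid, NUL-terminated buffer that outlives the call.
    unsafe { strlen(c_text.as_ptr()) }
}

fn main() {
    let prompt = "Why is the sky blue?";
    println!("C strlen reports {} bytes", c_strlen(prompt));
}
```

The `unsafe` block stays confined to one small wrapper, and the rest of the Rust code calls a safe function; that is the usual pattern for wrapping a C library.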
I will get back to you once I have results from these benchmarks. Thanks for the idea.
I hope you try Bodhi and have some equally valuable feedback on the app.
The website is beautiful. Thank you for sharing the source code.
I took a look at the source code and could not find where you batch-processed the Midjourney API calls. I am interested to learn whether you generated the Midjourney images by hand or with scripts.
Sadly, Midjourney doesn't have a public API. The images were prompted manually via Discord. Thanks to permutations [0], I could do 20 images at a time, but it's still a painful process. (The code that generates the permutation strings is in the shared source.)
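For readers unfamiliar with permutation prompts: Midjourney expands a prompt such as `a {red, blue} bird` into one job per option. The sketch below shows how such a prompt could be expanded locally into its full cartesian product. It is not the code shared by the author; the function name `expand_permutations` is hypothetical, and it ignores escaping and nested braces.

```rust
/// Hypothetical helper: expands every `{a, b, c}` group in a prompt into the
/// cartesian product of its options, e.g. "a {red, blue} {cat, dog}" -> 4 prompts.
/// Escaped commas and nested braces are not handled.
fn expand_permutations(prompt: &str) -> Vec<String> {
    // Find the first brace group; if none, the prompt is already concrete.
    let open = match prompt.find('{') {
        Some(i) => i,
        None => return vec![prompt.to_string()],
    };
    // Find the matching close brace; if missing, leave the prompt unchanged.
    let close = match prompt[open..].find('}') {
        Some(rel) => open + rel,
        None => return vec![prompt.to_string()],
    };
    let head = &prompt[..open];
    // Expand the remainder once; it does not depend on the current option.
    let tails = expand_permutations(&prompt[close + 1..]);

    let mut out = Vec::new();
    for option in prompt[open + 1..close].split(',') {
        let option = option.trim();
        for rest in &tails {
            out.push(format!("{}{}{}", head, option, rest));
        }
    }
    out
}

fn main() {
    for p in expand_permutations("a {red, blue} {cat, dog} --ar 16:9") {
        println!("{}", p);
    }
}
```

Recursing on the tail keeps the expansion order stable: options from the leftmost group vary slowest.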
You cannot run a financial-transaction business like Gumroad without a licence and an agreement with Stripe. It is bound to be closed down on short notice, and all the earnings will be clawed back.
I'm also glad the 2019 IPO didn't materialize, but if it had, anyone who bought in would have deserved no pity, and I wouldn't call them "innocent retail investors". As badly managed as WeWork was, this was not a case of fraud. Everything was out in the open for anyone with eyes and a brain to see. Heck, the whole reason the 2019 IPO didn't go through is that WeWork's original S-1 was such a shit show of epic proportions, with nothing but red ink as far as the eye could see and completely made-up vanity metrics, that Wall Street's collective reaction was "Are you fucking kidding me?"