Hacker News | fangpenlin's comments

I heard that Nvidia's graphics cards are best in class in terms of power consumption vs. TFLOPS ratio. I wonder what the numbers are for AMD vs. Nvidia? I would like to see them, because power consumption is going to be a big portion of the cost of AI training. In comparison, the hardware itself might not be that expensive in the long run.


For a use case like this, a locally running model would be ideal. I wouldn't like to share my personal accounting books with an LLM either.


Very much agreed. I couldn’t even bring myself to use GitHub (much less BeanHub) for my Beancount files.

I use git locally because the ledger is extremely sensitive


Some people, myself included, prefer text-based files as a user interface. Some Vim users, for example, never want to leave their Vim session and would like to do everything in it. While SQLite is immortal software and will probably be around forever, using it means changing the UI/UX from text files to SQL queries or other CLI/GUI operations. I think it's a preference for a UI/UX style rather than a technical decision, and for that preference we can push on the technical end to solve some of the challenges.


If there's anything like immortal software, SQLite is definitely on the list


Hi, the author here.

If you are okay with Plaid[1], many of their bank connections now use OAuth-style authentication instead of password sharing. A while back I actually added a new feature called Direct Connect[2] that lets any plaintext accounting book user pull CSVs directly via the Plaid API through BeanHub. We don't train AI models on our customers' transactions, and if we ever want to, we will ask for explicit consent (not just a ToS clause) and anonymize the data.

If you're okay with the above, the key to achieving a high level of automation is the ability to pull CSV transaction files directly from the bank in a standard format. Maybe you can give it a try; we have a 30-day free trial period.
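To illustrate the idea of normalizing bank data into a standard CSV format (this is just a sketch, not BeanHub's actual implementation; in practice the transaction dicts would come from Plaid's /transactions/get endpoint):

```python
import csv
import io

def plaid_transactions_to_csv(transactions):
    """Convert a list of Plaid-style transaction dicts into CSV text.

    Each dict is expected to carry the basic fields Plaid returns:
    date, name, and amount.
    """
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["date", "name", "amount"])
    for txn in transactions:
        writer.writerow([txn["date"], txn["name"], txn["amount"]])
    return buf.getvalue()

sample = [
    {"date": "2025-03-05", "name": "TSMC purchase", "amount": 2000.00},
    {"date": "2025-03-06", "name": "Coffee", "amount": 4.50},
]
print(plaid_transactions_to_csv(sample))
```

Once everything is in one standard shape like this, the downstream import rules only need to understand a single format instead of one per bank.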

I am not so familiar with the CMMC requirements you mentioned, but for us to access transactions from some banks, such as Chase, Plaid requires us to pass an audit of our security measures. Is CMMC compliance something your company needs a third-party software vendor to meet before taking them into consideration?

[1]: https://plaid.com

[2]: https://beanhub.io/blog/2025/01/16/direct-connect-repository...


Hi, the author here.

I get where you're coming from. My books are also growing big right now, and indeed, they have become slower to process. Some projects in the community, such as Beanpost[1], are actually trying to solve the problem, as you said, by using an RDBMS instead of plaintext.

But I still like the text file format more, for many reasons. The first is the hot topic of LLM friendliness: while I am still thinking about using AI to make the process even easier, with text-based accounting books it's much easier to let an LLM process them and generate data for you.

Another reason is accessibility. Text-based accounting only requires an editor plus the command line. Surely you can build a friendly UI for SQLite-based books, but the same goes for text-based accounting books.

Yet another reason is, as you said, Git or VCS (version control system) friendliness. With text-based books, you can easily track all the changes from commit to commit for free and see what changed. So if I make a mistake in the book and want to know when I made it and how many years of reports I need to go back and revise, I can easily do that with Git.
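As a concrete sketch of the "when did I introduce this mistake?" workflow, Git's pickaxe search (`git log -S`) finds the commit that added or removed a given string. The repo, file name, and the "Fodo" typo below are all made up for the demo:

```python
import subprocess
import tempfile
import os

def run(args, cwd):
    return subprocess.run(args, cwd=cwd, check=True,
                          capture_output=True, text=True).stdout

with tempfile.TemporaryDirectory() as repo:
    run(["git", "init", "-q"], repo)
    ledger = os.path.join(repo, "main.beancount")

    # First version of the ledger.
    with open(ledger, "w") as f:
        f.write('2025-01-01 * "Lunch"\n  Expenses:Food  10 USD\n  Assets:Cash\n')
    run(["git", "add", "."], repo)
    run(["git", "-c", "user.name=me", "-c", "user.email=me@example.com",
         "commit", "-q", "-m", "initial ledger"], repo)

    # Second commit introduces a (hypothetical) typo'd account name.
    with open(ledger, "a") as f:
        f.write('\n2025-02-01 * "Typo"\n  Expenses:Fodo  99 USD\n  Assets:Cash\n')
    run(["git", "add", "."], repo)
    run(["git", "-c", "user.name=me", "-c", "user.email=me@example.com",
         "commit", "-q", "-m", "add february entries"], repo)

    # Pickaxe search: which commit introduced the bad string?
    log = run(["git", "log", "-S", "Expenses:Fodo", "--oneline"], repo)
    print(log)
```

Only the second commit shows up in the output, pinpointing exactly when the mistake entered the books.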

Performance is a solvable technical challenge. We can break the text-based books down into smaller files and have a smart caching system to avoid parsing the same file repeatedly. Currently I don't have the bandwidth to go down this rabbit hole, but I already have many ideas about how to improve performance when the books grow really big.
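One minimal version of that caching idea is to key cached parse results on the file's mtime and size, so unchanged files are never re-parsed. This is a toy sketch (the "parser" here just counts lines; a real one would be the Beancount parser):

```python
import os

_cache = {}  # path -> ((mtime_ns, size), parsed_result)
calls = {"n": 0}  # counts actual parses, for demonstration

def parse_ledger_file(path):
    """Stand-in for a real Beancount parse; here we just count lines."""
    calls["n"] += 1
    with open(path) as f:
        return sum(1 for _ in f)

def parse_cached(path):
    """Re-parse a ledger file only when it has changed on disk."""
    st = os.stat(path)
    key = (st.st_mtime_ns, st.st_size)
    hit = _cache.get(path)
    if hit is not None and hit[0] == key:
        return hit[1]
    result = parse_ledger_file(path)
    _cache[path] = (key, result)
    return result
```

With the books split into many small files, a full reload only pays the parsing cost for the files that actually changed.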

[1]: https://github.com/gerdemb/beanpost


Thanks for responding and sharing your thoughts! I generally agree with all you said.

However, I feel like a different approach could be to store all the app state in the DB, and then export to this text-only format when needed, like when interacting with LLMs or when someone wants an export of their data.
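That DB-as-source-of-truth approach could look roughly like this (a toy sketch with a made-up schema, not any existing tool's design): keep postings in SQLite and render Beancount-style text only on export.

```python
import sqlite3

# In-memory DB as the source of truth; the text format becomes an export.
db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE postings (
    txn_date TEXT, narration TEXT, account TEXT,
    amount REAL, currency TEXT)""")
db.executemany(
    "INSERT INTO postings VALUES (?, ?, ?, ?, ?)",
    [
        ("2025-03-05", "Purchase TSMC", "Assets:US:Bank:Robinhood",
         20, "TSM"),
        ("2025-03-05", "Purchase TSMC",
         "Assets:US:Bank:WellsFargo:Checking", -2000, "USD"),
    ],
)

def export_beancount(db):
    """Render the DB contents as Beancount-style text on demand."""
    lines = []
    last = None
    rows = db.execute(
        "SELECT txn_date, narration, account, amount, currency "
        "FROM postings ORDER BY txn_date, narration, account").fetchall()
    for date, narration, account, amount, currency in rows:
        if (date, narration) != last:
            lines.append(f'{date} * "{narration}"')
            last = (date, narration)
        lines.append(f"  {account}  {amount:g} {currency}")
    return "\n".join(lines)

print(export_beancount(db))
```

Queries over old records then come for free from SQL, and the text file becomes just one of several views of the data.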

Breaking the file into smaller blocks would necessarily need a cache system, I guess, and then maybe you're implementing your own DB engine in the cache, because you still want all the same functions for querying older records.

There's no easy answer I guess, just different solutions with different tradeoffs.

But what you've built is very cool! If I were still doing text-based accounting I would have loved this.


Hi, the author here.

Many customers have asked me about AI offerings, and I am considering them. While this is doable with modern LLM technologies, there are many issues I need to consider.

The first is that nobody, myself included, likes their data being part of someone else's machine-learning training pipeline. That's why I promised my users that I wouldn't use their data for machine-learning training without asking for explicit consent (and, of course, anonymization would be needed).

While I know everything involving AI sounds cool, do we really need an LLM for a task like this? Maybe a rule-based import engine could kill 95% of the repeating transactions? That's why I built beanhub-import[1] in the first place. Then comes another question: should the LLM generate the rules for you, or generate the final transactions directly?
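The core of a rule-based import engine can be as simple as pattern-to-account mappings. This is a toy stand-in (not beanhub-import's actual rule schema; the patterns and account names are made up):

```python
import re

# (pattern, expense account) pairs applied in order.
RULES = [
    (re.compile(r"STARBUCKS|COFFEE", re.I), "Expenses:Food:Coffee"),
    (re.compile(r"UBER|LYFT", re.I), "Expenses:Transport:Rideshare"),
    (re.compile(r"AMAZON", re.I), "Expenses:Shopping"),
]

def classify(description, default="Expenses:Uncategorized"):
    """Return the first matching account for a raw bank description."""
    for pattern, account in RULES:
        if pattern.search(description):
            return account
    return default

print(classify("STARBUCKS STORE #123"))     # -> Expenses:Food:Coffee
print(classify("UBER TRIP HELP.UBER.COM"))  # -> Expenses:Transport:Rideshare
```

For the recurring transactions that make up most of a ledger, deterministic rules like these are cheap, auditable, and never hallucinate, which is why an LLM might be better used to write the rules than to write the transactions.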

Yet another question is that everybody's (and every company's) books are different. Even if you can train a big model to deal with the most common approaches, the outcome may not be what you really need. So I am thinking about possibly using your own Git history as a source of training data to teach machine-learning models to generate transactions the way you would. That would make yet another interesting blog post, I guess, if I actually built a prototype or really made it a BeanHub feature. But for now, it's still an idea.

[1]: https://beanhub-import-docs.beanhub.io/


Hey! Thanks for pointing that out. I have already corrected it in my article :)


Hi, the author here.

So BeanHub is built on top of Beancount and uses double-entry accounting, and this is one of the benefits of double-entry accounting. Much accounting software is not good at dealing with multiple currencies or custom currencies. With Beancount, you can define any commodity you want, create transactions, and convert between currencies easily. For example, you can define a commodity TSM and create transactions[1] like this:

  2025-01-01 commodity TSM

  2025-03-05 * "Purchase TSMC"
    Assets:US:Bank:WellsFargo:Checking  -2,000 USD
    Assets:US:Bank:Robinhood                20 TSM @ 100 USD
I think many people trade crypto, and traditional accounting software may not be that friendly to them. That's why I emphasized the crypto target audience a bit. But you're right; I should make it clearer that it's not just for crypto.

[1]: https://beancount.github.io/docs/beancount_language_syntax.h...


Your homepage has an animation where a list of credit and debit transactions is labelled assets and liabilities. That does not bode well for a provider of accounting software.


There's a bug in k8s-device-plugin that stops the plugin from even launching, as I mentioned in the article:

https://github.com/NVIDIA/k8s-device-plugin/issues/1182

And I opened a PR for fixing that here:

https://github.com/NVIDIA/k8s-device-plugin/pull/1183

I am unsure if this bug only occurs in the NixOS environment, because its library paths and other quirks differ from those of major Linux distros.

Another major problem was that the "default_runtime_name" setting in the containerd config didn't work as expected. I had to create a RuntimeClass and assign it to the pod to make it pick up the Nvidia runtime.
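For reference, a minimal sketch of that workaround (the handler name `nvidia` is an assumption and must match the runtime name in your containerd config; the CUDA image tag is just an example):

```yaml
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: nvidia
handler: nvidia   # must match the runtime name in the containerd config
---
apiVersion: v1
kind: Pod
metadata:
  name: gpu-test
spec:
  runtimeClassName: nvidia
  containers:
    - name: cuda
      image: nvidia/cuda:12.4.1-base-ubuntu22.04
      command: ["nvidia-smi"]
      resources:
        limits:
          nvidia.com/gpu: 1
```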

Other than that, I haven't tried K3s; the one I am running is a full-blown K8s cluster. I guess they should be similar.

While there's no guarantee, if you post any hints here showing why your Nvidia plugin won't work, I might be able to help, as I skipped some minor issues I encountered in the article. If they happen to be ones I faced, I can share how I solved them.


By the way, one of the problems I encountered but didn't mention in the article was that libnvidia-container has problems reading the Nvidia drivers and libraries under NixOS because of its non-standard paths. I had to create a patch to modify those paths. I just created a Gist here with the patch content:

https://gist.github.com/fangpenlin/1cc6e80b4a03f07b79412366b...

But later on, since I am taking the CDI route, it appears that libnvidia-container (nvidia-container-cli) is not really used. If you are going with just the container-runtime approach instead of CDI, you may need a patch like this for the libnvidia-container package.


Oooo, thanks for the pointers! Will be revisiting this tomorrow!

