Hacker News | mfalcon's comments

Nice, I'll take a look. I was thinking about building a benchmark similar to the one you described, but first focusing on the negotiation between the store and the product suppliers.

Does your software also handle this type of task?


Yes, the Shopify alternative is called Openfront[0]. Before that, I built Openship[1], an e-commerce OMS that connects Openfront (and other e-commerce platforms) to fulfillment channels like print on demand. Negotiation isn't built in, but you can connect to something like Gelato[2]: when you get orders on Openfront, they're sent to Gelato to fulfill, and once Gelato ships them, tracking is relayed back to Openfront through Openship.

0. https://github.com/openshiporg/openfront

1. https://github.com/openshiporg/openship

2. https://www.gelato.com


I was eagerly waiting for a chapter on semantic similarity, as I was using the Universal Sentence Encoder for paraphrase detection, but LLMs showed up before that chapter :).

I had been working on NLP, mostly NLU, for some years before LLMs. I tried the Universal Sentence Encoder alongside many other ML techniques to understand user intentions and extract entities from text.
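To illustrate that pre-LLM approach, here's a minimal sketch of intent classification by nearest example in embedding space. The embedding below is a toy bag-of-words stand-in (in practice it would be the Universal Sentence Encoder or a similar model), and the intents and utterances are made up:

```python
import numpy as np

# Hypothetical example utterances per intent.
INTENT_EXAMPLES = {
    "track_order": ["where is my package", "track my order status"],
    "cancel_order": ["cancel my order", "i want a refund"],
}

# Toy deterministic embedding: normalized bag-of-words over a fixed vocabulary.
VOCAB = {tok: i for i, tok in enumerate(sorted(
    {t for exs in INTENT_EXAMPLES.values() for ex in exs for t in ex.split()}
))}

def embed(text: str) -> np.ndarray:
    v = np.zeros(len(VOCAB))
    for tok in text.lower().split():
        if tok in VOCAB:
            v[VOCAB[tok]] += 1.0
    n = np.linalg.norm(v)
    return v / n if n else v

def classify_intent(query: str) -> str:
    q = embed(query)
    # Pick the intent whose closest example has the highest cosine similarity
    # (vectors are unit-normalized, so the dot product is cosine similarity).
    return max(
        INTENT_EXAMPLES,
        key=lambda intent: max(float(q @ embed(ex)) for ex in INTENT_EXAMPLES[intent]),
    )

print(classify_intent("can you track my order"))
```

With a real sentence encoder you'd precompute the example embeddings once and compare against them at query time; the nearest-example logic stays the same.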

The first time I tried ChatGPT, what surprised me most was how well it understood my queries.

I think the spotlight is on the "generative" side of this technology, and we're not giving query understanding the credit it deserves. I'm also not sure we're fully taking advantage of this functionality.


Yes, I was (and still am) similarly impressed with LLMs' ability to understand the intent of my queries and requests.

I've tried several times to understand the "multi-head attention" mechanism that powers this understanding, but I have yet to build a deep intuition.
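For intuition, the core computation can be sketched in a few lines of numpy. This is a bare-bones illustration (random weights, no masking, layer norm, or residuals): each head scores every query position against every key position, mixes the value vectors by those weights, and the heads' outputs are concatenated and projected back.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    # scores[i, j]: how much query position i attends to key position j.
    d_k = Q.shape[-1]
    scores = Q @ K.swapaxes(-1, -2) / np.sqrt(d_k)
    weights = softmax(scores, axis=-1)  # each row sums to 1
    return weights @ V, weights

def multi_head_attention(X, Wq, Wk, Wv, Wo, n_heads):
    # Split d_model into n_heads subspaces, attend in each independently,
    # then concatenate and project back with Wo.
    seq, d_model = X.shape
    d_head = d_model // n_heads
    def project(W):
        return (X @ W).reshape(seq, n_heads, d_head).transpose(1, 0, 2)
    Q, K, V = project(Wq), project(Wk), project(Wv)
    out, _ = scaled_dot_product_attention(Q, K, V)  # (heads, seq, d_head)
    out = out.transpose(1, 0, 2).reshape(seq, d_model)
    return out @ Wo

rng = np.random.default_rng(0)
seq, d_model, heads = 5, 8, 2
X = rng.normal(size=(seq, d_model))
W = [rng.normal(size=(d_model, d_model)) for _ in range(4)]
Y = multi_head_attention(X, *W, n_heads=heads)
print(Y.shape)  # (5, 8)
```

The "understanding" question is really about what those learned attention patterns end up encoding, which the arithmetic alone doesn't reveal.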

Are there any research or expository papers that discuss this "understanding" aspect specifically? How could we measure understanding without generation? Are there benchmarks out there specifically designed to test deep/nuanced understanding skills?

Any pointers or recommended reading would be much appreciated.


You can evaluate with your programming language of choice.


Good idea for a follow up post :)


Yes, and these problems are most present in the first iterations, when you're still trying to get good enough agent behaviour.

I'm still thinking about good ways to mitigate this issue and will share what I find.


Hey fellow HNers, OP here. I've been working on agents for a while, so I started sharing some things.

The idea is to keep updating this post with a few more approaches I've been using.


I think that the natural language understanding capability of current LLMs is undervalued.

To understand what the user meant before LLMs, we had to train several NLP+ML models just to get something going, and in my experience those never came close to what LLMs do now.

I remember the first time I tried ChatGPT: I was surprised by how well it understood every input.


It's parsing. It's tokenizing. But it's a stretch to call it understanding. It creates a pattern that it can use to compose a response. Ensuring the response is factual is not fundamental to LLM algorithms.

In other words, it's not thinking. The fact that it can simulate a conversation between thinking humans without thinking is remarkable. It should tell us something about the facility for language. But it's not understanding or thinking.


I know that "understanding" is a stretch, but I'm referring to the "U" in NLU, which wasn't really understanding either.


I guess it depends on how you use the LLMs. We implemented some workflows where the LLMs were used only for dialogue understanding, then the system response was generated by classic backend code.
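A minimal sketch of that split, where the LLM only emits structured intent + entities and the user-facing reply comes from deterministic backend code. `call_llm` is a stub standing in for whatever model client you use, and the intent names are made up:

```python
import json

def call_llm(prompt: str) -> str:
    # Stub for illustration. A real call would hit a model API with an
    # instruction like: "Extract intent and entities from the user message;
    # reply ONLY with JSON of the form {intent, entities}."
    return json.dumps({"intent": "order_status", "entities": {"order_id": "A123"}})

def handle_message(message: str) -> str:
    # LLM handles dialogue understanding only.
    parsed = json.loads(call_llm(message))
    # Classic backend code generates the actual response deterministically.
    if parsed["intent"] == "order_status":
        order_id = parsed["entities"].get("order_id")
        return f"Order {order_id} is in transit."  # e.g. looked up in a DB
    return "Sorry, I didn't understand that."

print(handle_message("where's my order A123?"))
```

The nice property is that the response side stays testable and auditable: only the parsing step is probabilistic.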


That's how I used it: I built a document with all the requirements and then gave it to CC. But it wasn't a final document; I had to go back and make changes after experimenting with the code CC built.

