peterjliu's comments | Hacker News

emacs and vim are not niche, lol


In 15 years of using nothing but emacs, I have never met another emacs user at any of the companies I've worked for. Plenty of vim users, but literally zero emacs.


I have a similarly anecdotal but opposite experience. Since around 2015 I've mostly been working with people who primarily use Emacs. In 2014 I was the only weird one; on the next team there were about 3-5, then a dozen, then a team of a few dozen where only two were using Vim. On my current team, most of the devs are also Emacs users. However, a lot of them use Emacs with Evil-mode, so I guess they can be considered vimmers.

Also, I don't remember the last time when I worked with anyone who writes code and uses Windows.

Anecdotal experiences can lead to a warped understanding of reality; in mine, Windows and non-emacs users are niche.


My experience aligns with this. I work for a bigco. Yet to meet a fellow Emacs user.


Don't y'all have a #emacs Slack channel or equivalent at your company? I work for a medium-sized tech company and I think we have a single-digit number of emacs users. The channel is mostly dead except for a few tips and tricks and the occasional question about how we each install it on our MacBooks.

Anecdotally a lot of managers use Emacs, though that may be an age thing.

(I use emacs for Real Work, unless that Real Work involves a JVM. Still do all the git stuff in emacs/magit, though)


Yep. I do as much real work as possible in Emacs. Magit/Org-Mode/Org-roam/Org-gtd/Babel are all pretty essential to my workflow.


Seems like misinformation for AWS. CloudFlare probably depends on GCP.


Interesting. Are LLMs a lot better at Go than Rust?


Another advantage is that people want the Google bot to crawl their pages, unlike most AI companies' crawlers.


Reddit was an interesting case here. They knew that they had particularly good AI training data, and they were able to hold it hostage from the Google crawler, which was an awfully high-risk play given how important Google search results are to Reddit ads, but they likely knew that Reddit search results were also really important to Google. I would love to have been able to watch those negotiations from each side; what a crazy high-stakes negotiation that must've been.


Particularly good training data?

You can't mean the bottom-of-the-barrel dross that people post on Reddit, so not sure what data you are referring to? Click-stream?


Say what you will, but there are a lot of good answers to real questions on Reddit. There's a whole thing where people say "oh, Google search results are bad, but if you append the word 'REDDIT' to your search, you'll get the right answer." You can see that most of these agents rely pretty heavily on stuff they find on Reddit.

Of course, that's also a big reason why Google search results suggest putting glue on pizza.


This is an underrated comment. Yes it's a big advantage and probably a measurable pain point for Anthropic and OpenAI. In fact you could just do a 1% survey of robots.txt out there and get a reasonable picture. Maybe a fun project for an HN'er.
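
For anyone tempted: a rough sketch of what that survey could look like, assuming you already have a sampled list of domains in a domains.txt file. The crawler user-agent tokens below (Googlebot, GPTBot, ClaudeBot, CCBot) are the commonly cited ones, but treat the exact strings as assumptions to verify.

  # Rough sketch of the robots.txt survey idea. Assumes domains.txt holds one
  # domain per line (e.g. a 1% sample of some top-sites list); the user-agent
  # tokens are assumptions about what each crawler calls itself.
  import urllib.robotparser

  BOTS = ["Googlebot", "GPTBot", "ClaudeBot", "CCBot"]

  def check(domain):
      rp = urllib.robotparser.RobotFileParser()
      rp.set_url(f"https://{domain}/robots.txt")
      try:
          rp.read()
      except Exception:
          return None  # unreachable or malformed robots.txt; skip this domain
      # True means the bot is allowed to fetch the homepage
      return {bot: rp.can_fetch(bot, f"https://{domain}/") for bot in BOTS}

  if __name__ == "__main__":
      domains = [d.strip() for d in open("domains.txt") if d.strip()]
      results = [r for r in map(check, domains) if r is not None]
      for bot in BOTS:
          blocked = sum(not r[bot] for r in results)
          print(f"{bot}: blocked on {blocked}/{len(results)} sites")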


This is right on. I work for a company with somewhat of a data moat and AI aspirations. We spend a lot of time blocking everyone's bots except Google's. We have people whose entire job is to make it faster for Google to access our data. We exist because Google accesses our data. We can't not let them have it.


Excellent point. If they can figure out how to either remunerate or drive traffic to third parties in conjunction with this, it would be huge.


From documentation: "TLDR; Agentic applications needs both A2A and MCP. We recommend MCP for tools and A2A for agents."

Agents can just be viewed as tools, and vice versa. Is this an attempt to save the launch after getting scooped by MCP?
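
To make the "agents can be viewed as tools" point concrete, here is a toy adapter. The Tool/Agent interfaces below are made up for illustration; they are not the actual A2A or MCP APIs.

  # Purely illustrative: a made-up Tool/Agent interface, not the real A2A or MCP APIs.
  from dataclasses import dataclass
  from typing import Callable

  @dataclass
  class Tool:
      name: str
      description: str
      call: Callable[[str], str]  # a tool takes a request string, returns a result string

  class Agent:
      """A toy 'agent': something that takes a task and returns an answer."""
      def __init__(self, name: str):
          self.name = name

      def run(self, task: str) -> str:
          return f"[{self.name}] handled: {task}"

  def agent_as_tool(agent: Agent) -> Tool:
      # Wrapping an agent's run() in a tool's call() is all it takes to present
      # an agent to a tool-calling framework, which is the point above.
      return Tool(name=agent.name,
                  description=f"Delegate a task to the {agent.name} agent",
                  call=agent.run)

  research = agent_as_tool(Agent("research"))
  print(research.call("summarize the A2A announcement"))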


We (ex-Google DeepMind researchers) have been doing research on increasing the reliability of agents and realized it is pretty non-trivial, but there are a lot of techniques to improve it. The most important thing is doing rigorous evals that are representative of what your users do in your product. Often this is not the same as academic benchmarks. We made our own benchmarks to measure progress.

Plug: we just posted a demo of our agent doing sophisticated reasoning over a huge dataset (JFK assassination files -- 80,000 PDF pages): https://x.com/peterjliu/status/1906711224261464320

Even on small amounts of files, I think there's quite a palpable difference in reliability/accuracy vs the big AI players.


> The most important thing is doing rigorous evals that are representative of what your users do in your product. Often this is not the same as academic benchmarks.

OMFG thank you for saying this. As a core contributor to RA.Aid, optimizing it for SWE-bench seems like it would actively go against perf on real-world tasks. RA.Aid came about in the first place as a pragmatic programming tool (I created it while making another software startup, Fictie.) It works well because it was literally made and tested by making other software, and these days it mostly creates its own code.

Do you have any tips or suggestions on how to do more formalized evals, but on tasks that resemble real world tasks?


I would start by making the examples yourself initially, assuming you have a good sense of what that real-world task is. If you can't articulate what a good task is and what a good output is, it's not ready for outsourcing to crowd workers.

And before going to crowd workers (maybe you can skip them entirely), try LLMs.
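
A minimal sketch of that workflow, assuming hand-written examples with a rubric and an LLM as the grader. run_agent() and call_llm() are hypothetical stand-ins for whatever agent and model API you actually use, and the example tasks and rubrics are invented.

  # Minimal eval-harness sketch: hand-written tasks + rubrics, graded by an LLM.
  # run_agent() and call_llm() are hypothetical stubs; plug in your own agent/API.
  EXAMPLES = [
      {"task": "Extract the invoice total from invoice_017.pdf",
       "rubric": "Correct answer is $1,240.50; score 1 if only the currency is wrong."},
      {"task": "List all dates mentioned in meeting_notes.md",
       "rubric": "Must include 2024-03-02 and 2024-03-09, and nothing else."},
  ]

  def run_agent(task: str) -> str:
      return "TODO: call your agent here"

  def call_llm(prompt: str) -> str:
      return "0"  # TODO: call your LLM API here

  def grade(task: str, rubric: str, output: str) -> int:
      prompt = (f"Task: {task}\nRubric: {rubric}\nAgent output: {output}\n"
                "Score 0 (wrong), 1 (partially right), or 2 (right). "
                "Reply with the number only.")
      return int(call_llm(prompt).strip())

  def run_eval() -> float:
      scores = [grade(ex["task"], ex["rubric"], run_agent(ex["task"])) for ex in EXAMPLES]
      return sum(scores) / (2 * len(EXAMPLES))  # normalize to [0, 1]

  print(f"eval score: {run_eval():.2f}")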


> I would start by making the examples yourself initially

What I'm doing right now is this:

  1) I have X problem to solve using the coding agent.
  2) I ask the agent to do X
  3) I use my own brain: did the agent do it correctly?
If the agent did not do it correctly, I then ask: should the agent have been able to solve this? If so, I try to improve the agent so it's able to do that.

The hardest part about automating this is #3 above -- each evaluation is one-off, and it would be hard to even formalize the evaluation.

SWE-bench, for example, uses unit tests for this, and the agent is blind to them -- so the agent has to make a red test (which it has never seen) go green.
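
For coding tasks specifically, one way to formalize step 3 is exactly that SWE-bench-style check: keep a test file the agent never sees, and grade on whether it goes from red to green. A sketch, where run_agent_on_repo() is a hypothetical hook into RA.Aid or any other agent:

  # Sketch of a hidden-test eval for a coding agent. The held-out tests live
  # outside the checkout the agent edits, so the agent never gets to read them.
  import os, shutil, subprocess, tempfile
  from pathlib import Path

  def run_agent_on_repo(task: str, repo: Path) -> None:
      raise NotImplementedError  # hypothetical hook: let your agent edit `repo`

  def tests_pass(repo: Path, hidden_tests: Path) -> bool:
      env = {**os.environ, "PYTHONPATH": str(repo)}
      result = subprocess.run(["python", "-m", "pytest", "-q", str(hidden_tests)],
                              cwd=repo, env=env, capture_output=True)
      return result.returncode == 0

  def evaluate(task: str, repo_template: Path, hidden_tests: Path) -> bool:
      with tempfile.TemporaryDirectory() as tmp:
          repo = Path(tmp) / "repo"
          shutil.copytree(repo_template, repo)   # fresh checkout per task
          if tests_pass(repo, hidden_tests):
              raise ValueError("hidden test already passes; not a useful task")
          run_agent_on_repo(task, repo)          # agent edits the repo, blind to the tests
          return tests_pass(repo, hidden_tests)  # did the red test go green?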


The code can be used to train on other data. All you really need is a collection of news articles. I think there are some free ones available.

This dataset was only used to benchmark against other published results. It was first proposed in https://arxiv.org/abs/1509.00685.


Author of the post here. I'd say most of the examples generated by the best model were good. However, we chose examples that were not too gruesome, as news can be :)

We encourage you to try the code and see for yourself.


How does the model deal with dangling anaphora[1]? I wrote a summarizer for Spanish following a recent paper as a side project, and it looks as if I'll need a month of work to solve the issue.

[1] That is, the problem of selecting a sentence such as "He approved the motion" and then realising that "he" is now undefined.


We're not "selecting" sentences as an extractive summarizer might. The sentences are generated.

As for how the model deals with coreference: there's no special logic for that.


Wouldn't it suffice to do a coreference pass before extracting sentences? Obviously you'll compound coref errors with the errors in your main logic, but that seems somewhat unavoidable.


I am working on this in my kbsportal.com NLP demo. With accurate coreference substitutions (e.g., substituting a previously mentioned NP like 'San Francisco' for 'there' in a later sentence, or substituting full previously mentioned names for pronouns), extractive summarization should provide better results, and my intuition is that this preprocessing should help abstractive summarization as well.
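
As a toy illustration of that preprocessing idea (not the kbsportal.com implementation, and far cruder than a real coref resolver), a naive last-named-entity substitution pass might look like this:

  # Naive illustration of a coreference substitution pass: replace third-person
  # pronouns with the most recently seen capitalized name. A real resolver
  # (neural coref, rule-based systems, etc.) is far more careful than this.
  import re

  PRONOUNS = {"he", "she", "they", "it"}

  def substitute(text: str) -> str:
      last_entity = None
      out = []
      for token in re.findall(r"\w+|[^\w\s]", text):
          if token.lower() in PRONOUNS and last_entity:
              out.append(last_entity)   # crude antecedent choice: most recent name
          elif token[0].isupper():
              last_entity = token
              out.append(token)
          else:
              out.append(token)
      return re.sub(r"\s+([^\w\s])", r"\1", " ".join(out))

  print(substitute("Smith approved the motion. He also tabled the budget."))
  # -> "Smith approved the motion. Smith also tabled the budget."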


That's inter-sentence logic? Even humans have trouble with that kind of ambiguity in certain cases.


In the post you mentioned that

>>"In those tasks training from scratch with this model architecture does not do as well as some other techniques we're researching, but it serves as a baseline."

Can you elaborate a little on that? Is the training the problem or is the model just not good at longer texts?


Any chance some trained model will be released?


Any hints on how to integrate the whole document for summarization? ;)

I've seen copynet, where you do seq2seq but also have a copy mechanism to copy rare words from the source sentence to the target sentence.
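
For what it's worth, the core of that copy mechanism is just a mixture of two distributions, here written in the style of pointer-generator networks (a close cousin of CopyNet); the shapes and the gate name p_gen are illustrative:

  # Mixture of a generation distribution and a copy distribution, in the style
  # of pointer-generator / CopyNet-like models. Shapes and names are illustrative.
  import numpy as np

  def copy_mixture(p_gen, vocab_dist, attn, src_ids):
      """p_gen:      scalar gate in [0, 1], predicted from the decoder state
         vocab_dist: softmax over the output vocabulary, shape (V,)
         attn:       attention weights over source tokens, shape (T,), sums to 1
         src_ids:    vocabulary id of each source token, shape (T,)"""
      final = p_gen * vocab_dist
      # Scatter the remaining (1 - p_gen) probability mass onto the vocabulary
      # ids of the source tokens, so rare source words can be copied verbatim.
      np.add.at(final, src_ids, (1.0 - p_gen) * attn)
      return final

  # Tiny example: vocab of 6 words, a 3-token source sentence.
  vocab_dist = np.full(6, 1 / 6)
  attn = np.array([0.7, 0.2, 0.1])
  src_ids = np.array([4, 2, 5])
  print(copy_mixture(0.5, vocab_dist, attn, src_ids).sum())  # sums to ~1.0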


Is it hard to get the code up and running on Google Cloud? Does TensorFlow come as a service?


For those looking for more context, here's the Google research blog post:

http://googleresearch.blogspot.com/2015/11/tensorflow-google...


Seems like they're running out of an apartment unit:

TrueVault 801 Church St. #1328 Mountain View, CA 94041


Dell "Beginnings" http://youtu.be/Ja61fxmY77Q

