
True, we have been building conversational interfaces with traditional NLP. In my experience, they’ve been fairly fragile.

Extending the example you gave, nicely packaged, fully deterministic workflows work great in demos. Then customers start going off the paved path. They ask about returning 3 items all at once, or a whole order. They get confused and provide a shipping number instead of an order number. They switch languages partway through the conversation because they get frustrated by all the follow-up questions.

All of these absolutely can be handled through traditional NLP, but they require system designers to account for them, model the conversation, and design the system to react accordingly. And suddenly the 5-6 step deterministic workflow with a couple of if-branches… isn’t.
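To make that concrete, here’s a deliberately toy Python sketch - every check in it is a crude hypothetical stand-in, not a real NLP component - showing how each off-path behavior becomes another branch in front of the happy path:

    # Toy return-flow handler. Each unplanned customer behavior becomes
    # another branch that has to run before the "simple" flow does.
    def handle_turn(state: dict, message: str) -> str:
        if message.upper().startswith("1Z"):
            # They pasted a tracking number instead of an order number.
            return "That looks like a tracking number - what's the order number?"
        if "whole order" in message or "all of them" in message:
            # Bulk returns weren't in the original design.
            return "Let me switch you to the bulk-return flow..."
        if not message.isascii():
            # Crude stand-in for detecting a mid-conversation language switch.
            return "Re-detect the language and re-prompt..."
        state["step"] = state.get("step", 0) + 1
        return f"Happy path, step {state['step']} of 5"

    print(handle_turn({}, "I want to return the whole order"))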


I don’t think that’s what the parent was saying.

There are cases when refactoring Rust code where it’s possible to hit limits in the compiler related to, e.g., lifetime inference. When these limits are hit, simple, straightforward refactorings that are perfectly safe become more complicated - suddenly you’re forced to manually annotate lifetimes, and to thread those lifetimes through function calls, and…

And your small, incremental refactor suddenly isn’t. It doesn’t happen all that often, and they’re working to reduce how often users run into these challenges, but a number of cases like this still exist. And when you run into them it can be a frustrating experience.


Personally I'm convinced that the solution to that is not for the language to be more implicit but rather to make refactoring tooling more front and center. A task like "add a lifetime to this struct everywhere it's mentioned" is already catered to by modifying the original type and then applying rustfix, but more advanced yet relatively common changes should also be mechanized away. The annotations are there not only for the benefit of the compiler but also the developers.


One potential downside I see with this approach is that it forces you to store and name intermediate results which may or may not be meaningful on their own.

Consider a slightly more complicated example that filters on multiple conditions, say file type, size, and last modified date. These filters could be applied sequentially, leading to names like jsonFiles, then bigJsonFiles, then recentlyModifiedBigJsonFiles. Or alternatively, names which drop that cumulative context.

Of course, that can be extracted into a standalone function that filters on all criteria at once, or we can use a combinator to combine them, or apply any number of other changes - but in general, naming intermediate states can be challenging.
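A quick Python sketch of the tension (the metadata fields and thresholds here are made up):

    from dataclasses import dataclass

    @dataclass
    class FileInfo:
        name: str
        size: int     # bytes
        mtime: float  # Unix timestamp

    files = [FileInfo("a.json", 2_000_000, 1_700_000_000.0),
             FileInfo("b.txt", 10, 1_600_000_000.0)]
    cutoff = 1_650_000_000.0

    # Sequential filtering forces a cumulative name at every step:
    json_files = [f for f in files if f.name.endswith(".json")]
    big_json_files = [f for f in json_files if f.size > 1_000_000]
    recently_modified_big_json_files = [
        f for f in big_json_files if f.mtime > cutoff]

    # ...versus one predicate that applies every criterion at once:
    def wanted(f: FileInfo) -> bool:
        return (f.name.endswith(".json")
                and f.size > 1_000_000
                and f.mtime > cutoff)

    assert [f for f in files if wanted(f)] == recently_modified_big_json_files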


I think the labels could be much, much worse. They could contain straight noise, just completely random text - not even words. They could also contain plausible, factual text which otherwise has no relationship with the image.

I think image datasets like this most commonly consist of images and their captions, with the presumption that the content author had _some_ reason for associating the two. The goal of the model is to learn that association. And, with a _lot_ of examples, to learn nuanced representations.

In the third image, for example, we see some kind of text on a material. The caption mentions "Every year he rides for someone we know, touched by cancer". Perhaps the model is fed another example of a bicycle race, with similar imagery of racing bibs. Perhaps it's fed another of a race that specifically mentions it's a charity ride to raise money for cancer. Perhaps....

You get the idea. Alone, each example provides only vague connections between the image and the caption. But when you have a ton of data it becomes easier to separate noise from a weak signal.


I don’t think this is accurate. While most popular concatenative languages are stack-based, that is not a requirement for the paradigm. The Wikipedia article calls out a few alternatives, such as Om.

Source: https://en.m.wikipedia.org/wiki/Concatenative_programming_la...


APL-family languages are somewhat concatenative but not stack-based. They use grouping rules to form chains of composed functions.
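A toy illustration in Python (emphatically not APL, just a sketch of the shape): functions written next to each other simply compose into a pipeline, with no operand stack anywhere:

    from functools import reduce

    def chain(*fns):
        # Juxtaposed functions become one composed pipeline,
        # applied left to right - no stack involved.
        return lambda x: reduce(lambda acc, f: f(acc), fns, x)

    double_then_increment = chain(lambda x: x * 2, lambda x: x + 1)
    assert double_then_increment(3) == 7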


In my experience SageMaker was relatively straightforward for fine-tuning models that could fit on a single instance, but distributed training still requires a good bit of detailed understanding of how things work under the covers. SageMaker Jumpstart includes some pretty easy out-of-the-box configurations for fine-tuning models that are a good starting point. They will incorporate some basic quantization and other cost-savings techniques to help reduce the total compute time.

To help control costs, you can choose pretty conservative settings in terms of how long you want to let the model train for. Once that iteration is done and you have a model artifact saved, you can always pick back up and perform more rounds of training using the previous checkpoint as a starting point.
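As a rough sketch of that loop - this assumes the SageMaker Python SDK's Estimator interface, and every URI, role, and instance type below is a placeholder:

    from sagemaker.estimator import Estimator

    estimator = Estimator(
        image_uri="<training-image>",            # placeholder
        role="<sagemaker-execution-role>",       # placeholder
        instance_count=1,
        instance_type="ml.g5.2xlarge",           # pick per budget
        max_run=2 * 60 * 60,                     # conservative cap: 2 hours
        checkpoint_s3_uri="s3://my-bucket/checkpoints/",
        output_path="s3://my-bucket/artifacts/",
    )
    estimator.fit({"training": "s3://my-bucket/data/"})
    # A later job pointed at the same checkpoint_s3_uri can resume
    # from the saved checkpoint instead of training from scratch.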


I think the author’s choice of function to demonstrate purity made this harder to grok as a reader. Asking the reader to “…exclude the I/O interactions…” when considering functional purity makes the analogy much harder to follow.

My interpretation of the thesis of that paragraph is that localized mutation does not violate referential transparency, but getting there required some charitable reading. By the end of the section, we’re given an _example_ of an effectfully pure function, but no standalone definition.
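Here's the kind of standalone example I'd have wanted up front - a minimal Python sketch (my own, not the author's) of localized mutation that stays referentially transparent:

    def running_totals(xs: list[int]) -> list[int]:
        totals: list[int] = []  # mutated locally...
        acc = 0
        for x in xs:
            acc += x
            totals.append(acc)
        return totals           # ...but the mutation never escapes

    # Same input, same output, no observable effects:
    assert running_totals([1, 2, 3]) == [1, 3, 6]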

Based on that, I agree that this is a weak point in the overall piece.


The author specifically mentions, and benchmarks against, ripgrep in the linked content.


> Making the red go away is important because the red indicates a problem! This is a lot easier than other ways of discovering the error. Why would you want to discover the error later?

I don't think that's what the parent to your comment is arguing. They are arguing that "making the red go away" isn't the goal, rather that correctness is, and that it's easy to conflate the two when you focus too much on the "red" part and don't pay attention to the "correct" part.

Worded another way, the mantra of "if it compiles, it works" can lead to a dangerous false sense of security if you don't understand the limitations of your type system and which parts of your program it may or may not cover completely.


> the mantra of "if it compiles it works"

Very few people have this mantra, much less without qualification. That's more of an ML or Haskell point of view, and even then it's known not to be a guarantee. A type Int -> Int -> Int isn't going to enforce that you implement multiplication correctly instead of addition, or just 0.
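Spelled out in Python, for the sake of it:

    def multiply(a: int, b: int) -> int:
        return a + b  # type-checks perfectly, and is perfectly wrong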

"I refactored, fixed type errors, and it just worked!" is a thing I see a lot, but from people who just experienced it, because it happens a lot. It's a good thing.


I have this all the time at work.

We have a large Python codebase, with all sorts of linters and checks.

Still have developers do something like:

    # we have a list of things
    # but we only need one
    try:
        a_thing = things[0]
    except IndexError:
        pass
    assert isinstance(a_thing, Thing)
Sure, maybe it's a junior thing. But it still happens, and the typing doesn't save us. I don't know how to tell juniors to stop doing this.


While discussing code review, how about shooting whoever wrote that comment.

Why is there an assert? Python has a cast function if you really wanted to force the typechecker to see a_thing as a Thing, but (and I'm sure this is the point you are making) you are likely hiding a problem.
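i.e., reusing Thing and things from the snippet above:

    from typing import cast

    # Satisfies the typechecker with zero runtime behavior - which at
    # least makes it obvious that nothing handles an empty list.
    a_thing = cast(Thing, things[0])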


In code review, explain what can go wrong and offer suggestions for how to write it such that it doesn't have those problems?


> If all you have are TS developers, then, IMHO, you don’t have very good developers.

Does it change anything to reframe this as having a common denominator across all developers? As in, going from “All of my developers know only TS” to “TS is the common language all of my developers know”.

Particularly in small companies I think it makes more sense to focus on a restricted set of tools and technologies. It makes interviewing easier, ensures mobility of hires to different areas of the code, and produces an easier onboarding experience for new team members.

> It is also boring using only one programming language every day

Interest and passion come from more aspects of a project than the language it was written in. Some projects are interesting because they incorporate cutting-edge research, some because they have highly visible impact on users, and others because the solution involves a careful balance of design constraints. The choice of (or diversity between) languages doesn’t have to be the distinguishing factor that makes a project interesting.

