
mendeza · 2025-09-23T18:48:59 1758653339

What is the throughput for gpt-oss? 1 token every 2 seconds is really slow, but understandable because you are moving the cache to disk.

anuarsh · 2025-09-23T21:45:58 1758663958

1 tok/2 s is the best I got on my PC, thanks to the MoE architecture of qwen3-next-80B. gpt-oss-20B is slower because I load all of a layer's experts to the GPU and unpack the weights (4-bit -> bf16) each time, whereas with qwen3-next I load only the active experts (normally 150 out of 512). I could probably do the same with gpt-oss.
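As a rough illustration of the "load only active experts" idea, here is a minimal sketch. It assumes the router has already produced a list of expert ids per token for one layer; the function names and the `resident` set are hypothetical and this is not the actual implementation described above.

```python
def experts_to_load(routed: list[list[int]]) -> set[int]:
    """Union of expert ids the router picked across all tokens in one layer."""
    needed: set[int] = set()
    for token_choices in routed:
        needed.update(token_choices)
    return needed


def plan_transfers(needed: set[int], resident: set[int]) -> tuple[set[int], set[int]]:
    """Return (experts to copy to the GPU, experts that could be evicted)."""
    return needed - resident, resident - needed


# Hypothetical routing result: two tokens, three experts each.
routed = [[3, 17, 42], [3, 8, 42]]
needed = experts_to_load(routed)
to_load, to_evict = plan_transfers(needed, resident={3, 99})
```

Only the experts in `to_load` would need to be moved and unpacked, which is where the saving over loading every expert in the layer would come from.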

mendeza · 2025-09-16T19:30:56 1758051056

I feel like code fed into this detector can be manipulated to increase false positives. The model probably learns patterns that are common in generated text (clean comments, AI code always correctly formatted, AI code never makes mistakes), but if you have an AI change its code to look like the code you write (mistakes, not every function has a comment), it can blur the line. I think this will be a great tool to get 90% of the way there; the challenge is corner cases.
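To make the kind of surface pattern I mean concrete, here is a small sketch that computes a few style signals for a Python file. This is purely illustrative: the feature names are my own, and I have no knowledge of what the actual detector looks at.

```python
import ast


def style_features(source: str) -> dict[str, float]:
    """Compute a few surface-level style signals for Python source code."""
    lines = [ln for ln in source.splitlines() if ln.strip()]
    comment_lines = sum(1 for ln in lines if ln.lstrip().startswith("#"))

    tree = ast.parse(source)
    funcs = [
        node for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    ]
    documented = sum(1 for f in funcs if ast.get_docstring(f) is not None)
    annotated = sum(1 for f in funcs if f.returns is not None)
    n = len(funcs)

    return {
        "comment_line_ratio": comment_lines / len(lines) if lines else 0.0,
        "docstring_ratio": documented / n if n else 0.0,
        "return_annotation_ratio": annotated / n if n else 0.0,
    }


clean = 'def add(a: int, b: int) -> int:\n    """Add two ints."""\n    return a + b\n'
messy = '# idk\ndef add(a,b):\n    return a+b\n'
clean_feats = style_features(clean)
messy_feats = style_features(messy)
```

Signals like these are easy to shift by asking the model to drop docstrings and type hints, which is why I suspect a detector leaning on them can be fooled.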

bbsbb · 2025-09-16T19:46:14 1758051974

This is a spot on observation, the most challenging so far to detect appears to be code produced via tooling usage that is slightly ahead of the overall curve in adoption and practices. I am not sold though that those aren't detectable holistically, but there certainly isn't enough similarity or an easily reproducible dataset where I would call the task easy. We are not certain what the next models hold for the future, but if we assume there is a huge current investment from all the companies in terms of quality code output, it is possible there is still convergence to something detectable.

mendeza · 2025-09-16T19:36:01 1758051361

I tested this idea. Using ChatGPT5, I gave it this prompt:

`create two 1000 line python scripts, one that is how you normally do it, and how a messy undergraduete student would write it.`

The messy script was detected as having a 0% chance of being written by AI, and the clean script was flagged with 100% confidence as AI-generated. I had to shorten both scripts here for brevity. Happy to share the full scripts.

Here is the chatgpt convo: https://chatgpt.com/share/68c9bc0c-8e10-8011-bab2-78de5b2ed6...

clean script:

    #!/usr/bin/env python3
    """
    A clean, well-structured example Python script.

    It implements a small text-analysis CLI with neat abstractions, typing,
    dataclasses, unit-testable functions, and clear separation of concerns.
    This file is intentionally padded to exactly 1000 lines to satisfy a
    demonstration request. The padding is made of documented helper stubs.
    """
    from __future__ import annotations

    import argparse
    import json
    import re
    from collections import Counter
    from dataclasses import dataclass
    from functools import lru_cache
    from pathlib import Path
    from typing import Dict, Iterable, List, Sequence, Tuple

    __version__ = "1.0.0"

    @dataclass(frozen=True)
    class AnalysisResult:
        """Holds results from a text analysis."""
        token_counts: Dict[str, int]
        total_tokens: int

        def top_k(self, k: int = 10) -> List[Tuple[str, int]]:
            """Return the top-k most frequent tokens."""
            return sorted(self.token_counts.items(), key=lambda kv: (-kv[1], kv[0]))[:k]

    def _read_text(path: Path) -> str:
        """Read UTF-8 text from a file."""
        data = path.read_text(encoding="utf-8", errors="replace")
        return data

    @lru_cache(maxsize=128)
    def normalize(text: str) -> str:
        """Lowercase and collapse whitespace for stable tokenization."""
        text = text.lower()
        text = re.sub(r"\s+", " ", text).strip()
        return text

    def tokenize(text: str) -> List[str]:
        """Simple word tokenizer splitting on non-word boundaries."""
        return [t for t in re.split(r"\W+", normalize(text)) if t]

    def ngrams(tokens: Sequence[str], n: int) -> List[Tuple[str, ...]]:
        """Compute n-grams as tuples from a token sequence."""
        if n <= 0:
            raise ValueError("n must be positive")
        return [tuple(tokens[i:i+n]) for i in range(0, max(0, len(tokens)-n+1))]

    def analyze(text: str) -> AnalysisResult:
        """Run a bag-of-words analysis and return counts and totals."""
        toks = tokenize(text)
        counts = Counter(toks)
        return AnalysisResult(token_counts=dict(counts), total_tokens=len(toks))

    def analyze_file(path: Path) -> AnalysisResult:
        """Convenience wrapper to analyze a file path."""
        return analyze(_read_text(path))

    def save_json(obj: dict, path: Path) -> None:
        """Save a JSON-serializable object to a file with UTF-8 encoding."""
        path.write_text(json.dumps(obj, indent=2, ensure_ascii=False) + "\n", encoding="utf-8")

messy script:

    # ok so this script kinda does stuff idk
    import sys,os, re, json, random, math
    from collections import *

    VER="lol"
    g = {}
    data = []
    TMP=None

    def readz(p):
        try:
            return open(p,"r",encoding="utf-8",errors="ignore").read()
        except:
            return ""

    def norm(x):
        x=x.lower().replace("\n"," ").replace("\t"," ")
        x=re.sub(" +"," ",x)
        return x.strip()

    def tokn(x):
        x=norm(x)
        return re.split("\W+",x)

    def ana(s):
        c = Counter()
        for t in tokn(s):
            if t: c[t]+=1
        return {"counts":dict(c),"total":sum(c.values())}

    def showTop(d,k=10):
        try:
            it=list(d["counts"].items())
            it.sort(key=lambda z:(-z[1],z[0]))
            for a,b in it[:k]:
                print(a+"\t"+str(b))
        except:
            print("uhh something broke")

    def main():
        # not really parsing args lol
        if len(sys.argv)<2:
            print("give me a path pls")
            return 2
        p=sys.argv[1]
        t=readz(p)
        r=ana(t)
        showTop(r,10)
        if "--out" in sys.argv:
            try:
                i=sys.argv.index("--out"); o=sys.argv[i+1]
            except:
                o="out.json"
            with open(o,"w",encoding="utf-8") as f:
                f.write(json.dumps(r))
        return 0

    if __name__=="__main__":
        # lol
        main()

    def f1(x=None,y=0,z="no"):
        # todo maybe this should do something??
        try:
            if x is None:
                x = y
            for _ in range(3):
                y = (y or 0) + 1
            if isinstance(x,str):
                return x[:5]
            elif isinstance(x,int):
                return x + y
            else:
                return 42
        except:
            return -1

    def f2(x=None,y=0,z="no"):
        # todo maybe this should do something??
        try:
            if x is None:
                x = y
            for _ in range(3):
                y = (y or 0) + 1
            if isinstance(x,str):
                return x[:5]
            elif isinstance(x,int):
                return x + y
            else:
                return 42
        except:
            return -1

    def f3(x=None,y=0,z="no"):
        # todo maybe this should do something??
        try:
            if x is None:
                x = y
            for _ in range(3):
                y = (y or 0) + 1
            if isinstance(x,str):
                return x[:5]
            elif isinstance(x,int):
                return x + y
            else:
                return 42

johnsillings · 2025-09-16T19:55:00 1758052500

That's a great question + something we've discussed internally a bit. We suspect it is possible to "trick" the model with a little effort (like you did above) but it's not something we're particularly focused on.

The primary use-case for this model is for engineering teams to understand the impact of AI-generated code in production code in their codebases.

mendeza · 2025-09-16T20:11:28 1758053488

I agree this would be a great tool for organizations to use to see impact of AI code in codebases. Engineers will probably be too lazy to modify the code enough to make it look less AI. You could probably enhance the robustness of your classifier with synthetic data like this.

I think it would be an interesting research project to detect whether someone is manipulating AI-generated code to look messier. This paper by Sadasivan et al., https://arxiv.org/pdf/2303.11156, showed that detector performance is bounded by the total variation distance between the two distributions. If the two distributions are truly the same, then the best you can do is random guessing. The trends with LLMs (via scaling laws) are heading in this direction, so the question is whether, as models improve, their code will become indistinguishable from human code.
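For a feel of how tight that bound gets, here is a small sketch. As I recall it, the paper's bound on a detector's AUROC is 1/2 + TV - TV^2/2, where TV is the total variation distance between the AI and human distributions. The function names and the toy distributions below are my own assumptions for illustration.

```python
def total_variation(p: dict[str, float], q: dict[str, float]) -> float:
    """Total variation distance between two discrete distributions."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(x, 0.0) - q.get(x, 0.0)) for x in support)


def auroc_upper_bound(tv: float) -> float:
    """Upper bound on detector AUROC given the TV distance (must be in [0, 1])."""
    if not 0.0 <= tv <= 1.0:
        raise ValueError("total variation distance must lie in [0, 1]")
    return 0.5 + tv - tv * tv / 2.0


# Toy distributions over two "styles" of code, made up for illustration.
human = {"clean": 0.4, "messy": 0.6}
ai = {"clean": 0.5, "messy": 0.5}
tv = total_variation(human, ai)
bound = auroc_upper_bound(tv)
```

With identical distributions the bound is 0.5 (random guessing), and with completely disjoint ones it is 1.0, so the closer model output gets to human code, the less room any detector has.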

Be fun to collaborate!

runako · 2025-09-16T20:45:56 1758055556

The primary point of distinction that allows AI generation to be inferred appears to be that the code is clean and well-structured. (Leave aside for a moment the oddity that this is all machines whose primary benchmarks are human-generated code written in a style that is now deemed too perfect to have been written by people.)

Does that provide an incentive for people writing manually to write worse code, structured badly, as proof that they didn't use AI to generate their code?

Is there now a disincentive for writing good code with good comments?

nomel · 2025-09-16T20:06:04 1758053164

On HN, indent code by four spaces, with a blank line between the code block and the text above.

    Like
    This

mendeza · 2025-09-16T20:13:42 1758053622

I appreciate the feedback! I just updated to have the 4 space indentation.

mendeza · 2025-04-19T16:14:42 1745079282

This is amazing, is there something like this about how a guitar works?! I would love to learn the physics of stringed instruments, and then design my own guitar or violin

mendeza · 2025-03-25T15:33:37 1742916817

How can one deploy LangGraph as an API (with production like features)? I have worked with langgraph serve to deploy locally, but are there other frameworks to deploy langgraph?

mns06 · 2025-03-25T16:31:34 1742920294

You can check out https://github.com/JoshuaC215/agent-service-toolkit for a pretty comprehensive template for deploying a langgraph service, with a streamlit UI as an example client

mendeza · on Feb 8, 2024

RAG adds context to the user's question to reduce hallucination. https://docs.llamaindex.ai/en/stable/getting_started/concept...
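A toy sketch of the idea, assuming a simple word-overlap retriever instead of a real embedding index. All names here are hypothetical, and this is not LlamaIndex's API.

```python
import re


def _tokens(text: str) -> set[str]:
    """Lowercased word set used for a crude overlap score."""
    return set(re.findall(r"\w+", text.lower()))


def retrieve(question: str, documents: list[str], k: int = 1) -> list[str]:
    """Return the k documents sharing the most words with the question."""
    q = _tokens(question)
    ranked = sorted(documents, key=lambda d: len(q & _tokens(d)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, documents: list[str], k: int = 1) -> str:
    """Prepend the retrieved context to the user's question."""
    context = "\n".join(f"- {d}" for d in retrieve(question, documents, k))
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {question}\n"
    )


docs = [
    "LangGraph builds stateful agent workflows.",
    "Guitars produce sound from vibrating strings.",
]
question = "How do guitar strings produce sound?"
prompt = build_prompt(question, docs)
```

The model then answers from the retrieved text rather than from memory alone, which is the part that helps with hallucination.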

mendeza · on Feb 7, 2024

Happy to chat and provide recommendations. I only have a master's, and have worked as an ML Engineer in industry for about 5 years now. It was tough to get a job, but a really good portfolio of training and deploying models can really help you stand out. I have teammates with PhDs in Bioengineering and other non-CS fields, and they excel in industry. I think if you want to complete your PhD, you can and still be competitive in industry. You will need to advocate for yourself, build up industry-relevant skills, and build your network. People with PhDs definitely have a competitive advantage getting hired as Machine Learning Engineers or Data Scientists. Research Scientist roles can be more competitive. I think getting experience with MLOps and Kubernetes is all you really need to be competitive as an ML Engineer in industry. I recommend reading this book: https://www.oreilly.com/library/view/machine-learning-interv... And look at this great course called Full Stack Deep Learning: https://fullstackdeeplearning.com/course/2022/

I would also recommend reaching out to recruiters at leading AI startups and companies (on LinkedIn); they can give you really good advice on what skills to focus on and how to be more competitive.

mendeza · on Feb 2, 2024

Would you use a GPT with an Action to save a chat and make the chat accessible to search?

samstave · on Feb 2, 2024

yes, locally.

Thinking about it, if you're going to make this, then make it summarize the chat thread at the end, then dump the summary with a link to search the thread, so if you look up a topic, it will first give you the summary of the thread's content.

Then you can ask it, "tell me the gist of the chats I have had," and have it give a high-level summary of the topics in each thread's summary.

So it would reply like:

You've talked about X, y and Z in the past week, mostly related to [topic] you seemed to grasp the most about [topicX] and had the most questions about [topicY]

which thread to revisit?

Create a new thread merging the topics together, etc so you can merge bodies of learning that you have in your chats and keep the original threads/chats.

Kind of like an AI-powered version of CONNECTIONS (the old TV docu-series about how inventions fed into each other over decades).

mendeza · on Feb 2, 2024

I really like these ideas! Is privacy a key value proposition here? I can see a GPT Action preserving the privacy of the conversation when it works within a single conversation. But if you want to search and index multiple conversations, then the API would need a user account and would have to persist conversation history.

mendeza · on Feb 2, 2024

If you wanted to go back and do a PhD in ML, with your work experience you would be a great fit for the NSF Fellowship called CSGrad4US that supports engineers going back to research. https://www.nsf.gov/cise/CSGrad4US/ You have to be a US Citizen, and commit to a U.S. based university and in a CISE department (this is most CS departments). I am in the fellowship now and highly recommend it! Happy to answer any questions.

Dejobism · on Feb 2, 2024

Not a US citizen but thanks for the info! Will check if something similar exists for my situation.

mendeza · on Jan 21, 2024

I think GPTs with external APIs will add value. I am testing connecting an API that allows GPT to search up-to-date code documentation. For APIs where GPT-4's own knowledge is outdated, this is a nice capability to have.

mendeza · on Jan 21, 2024

I created several GPTs; one I like to use is a nutrition tracker. You can take a picture of a nutrition label and it extracts the calories and nutritional information. I go back to that one because I am trying to get better at knowing my nutritional intake. (Disclaimer: I made this GPT: https://chat.openai.com/g/g-IejfE7Hpb-nutrition-tracker)