Somewhat tangentially, I really dislike that Ultralytics (and others) started slapping higher version numbers on their YOLO variants. Redmon used the numbering scheme v2 and v3 for his improvements on his original model. But Ultralytics' 11 is its own thing with no connection to Redmon. I just think it gives a misleading impression of what the history is.
Ultralytics also had, for at least ~a year, a language model replying to GitHub issues using their CEO's account (without any kind of disclosure). It was frequently confidently incorrect and probably wasted thousands of developer hours (because when the CEO replies to your issue with advice why wouldn't you take it at face value?!)
Looks like they've since given the bot its own account but that experience definitely soured me on the company.
(Also, there's an MIT licensed implementation of "yolov9" here: https://github.com/WongKinYiu/YOLO . Affiliated with neither Redmon nor Ultralytics as far as I know.)
Yep, the bot gave the completely wrong answer to something for me. Problem is that it wasn’t me that asked the question, and the person who read the answer took it at face value.
I knew from the formulaic response it was an LLM but had to fight with the other person to get them to see it. As soon as you see the question being repeated back at you in summary form as part of the answer it’s probably an LLM.
I've made several contributions to their main repo and the LLM-generated mush replies from various core team accounts have been a horror, derailing issue threads and such. An excellent case study in how not to use LLMs.
The only appropriate response is to turn your own bot on them that submits pointless pull requests so their bot can reply to them with nonsense criticism.
The ultralytics/ultralytics repo is pretty beginner friendly (kudos to them for that) but I surmise that it therefore draws a lot of beginner level coders who can't immediately tell that the AI generated "solutions" are bs.
I guess my question was geared more towards the CEO and company. Either they didn't notice the BS, which isn't great, or they did and chose not to do anything about it, which might be worse.
I think they consult some arcane equation when they need to make a decision. Two of the most important variables are virality of the incident and whether or not anyone affected has an in at Google. Mercury in retrograde messes this equation up.
Serendipitously comes one day after this story[1] was on the front page: at least one Debian maintainer failing to realize the risks of non-alphanumeric usernames. "What could go wrong?" Well, here's Git allowing branch names to contain dollar signs, backticks, etc., because "what could go wrong?"... and... well, this could.
Names are identifiers. Allowing identifiers to contain anything besides identifier characters merely opens new and weird attack vectors.
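To make the "identifier characters only" idea concrete, here is a minimal sketch of an allowlist check. The exact character set and the function name is_safe_identifier are my own assumptions for illustration; this is not Git's or GitHub's actual rule.

```python
import re

# Hypothetical allowlist: must start with a letter or digit, then only
# letters, digits, dot, underscore, slash, or hyphen.
_SAFE_NAME = re.compile(r"[A-Za-z0-9][A-Za-z0-9._/-]*")


def is_safe_identifier(name: str) -> bool:
    """Return True only if every character of `name` is on the allowlist."""
    return _SAFE_NAME.fullmatch(name) is not None


ordinary = is_safe_identifier("feature/login-form")  # plain branch-style name
hostile = is_safe_identifier("fix-$(id)")            # contains shell metacharacters
```

The point of an allowlist over a blocklist is that you never have to enumerate every dangerous character; anything you did not explicitly permit is rejected.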
In GitHub Actions this happens because the runners use echo commands to print out environment variables and variables that have been declared via inputs, and that output is in turn parsed in the UI.
So technically, all environment variables are unsanitized and this was only the first problem in a list of bugs. This bug specifically used the "pull_request" event/action because it is automatically executed without any chance of stopping it, and was using details exposed via the pull request's head.ref.
Next up: git usernames and emails that use shellcode injection names, because github probably won't introduce sanitization to all variables/inputs now.
This is a prime example why you should never ever use a shell to log arbitrary data.
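A rough sketch of what "don't use a shell" looks like in practice, assuming Python and a made-up hostile branch name (not the payload from the actual incident). The value is handed to the child process as a plain argv entry, so no shell ever gets to interpret it.

```python
import subprocess
import sys


def log_value_without_shell(value: str) -> str:
    """Print `value` from a child process, passing it as a single argv entry.

    Because the command is a list and shell=True is not used, no shell parses
    `value`, so $(...) and backticks stay literal text.
    """
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.argv[1])", value],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.rstrip("\n")


# Hypothetical hostile branch name, for illustration only.
hostile_branch = "feature-$(echo INJECTED)-`echo ALSO`"
logged = log_value_without_shell(hostile_branch)
```

If the same string were pasted into a shell command line instead (for example via string formatting plus shell=True), the command substitutions would be evaluated before anything got printed.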
Do you have the same reasoning for SQL/XSS injection? Should developers not write code that is resistant to SQL/XSS injection and instead rely on something like a WAF?
These protections (WAF for SQL/XSS, branch names for this) will never be enough. The code/logic must be secure, any additional layer is not enough since the actual target must be secured.
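For the SQL half of that analogy, a small sketch of securing the actual target, using the standard library sqlite3 module. The table, column names, and the find_user helper are made up for the example.

```python
import sqlite3

# Throwaway in-memory database with a hypothetical schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, is_admin INTEGER)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [("alice", 1), ("bob", 0)])


def find_user(connection: sqlite3.Connection, name: str) -> list:
    """Look up a user with a bound parameter; `name` is never parsed as SQL."""
    cursor = connection.execute(
        "SELECT name, is_admin FROM users WHERE name = ?", (name,)
    )
    return cursor.fetchall()


hostile_input = "nobody' OR '1'='1"

# Safe: the whole string is compared against the name column as data.
safe_rows = find_user(conn, hostile_input)

# Unsafe, shown only for contrast: the input becomes part of the SQL text.
unsafe_rows = conn.execute(
    f"SELECT name, is_admin FROM users WHERE name = '{hostile_input}'"
).fetchall()
```

The parameterized query matches nothing, while the string-built one returns every row, with or without a WAF in front of it.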
Developers will do it if it's necessary, and it is. These situations are just proving it is necessary.
Why not both? Git itself checks and sanitizes branch names, and GitHub should arguably match that behavior. I don't think anyone would object to safer workflows and related tooling instead of being told "here, have some bash inside YAML, now be careful!"
Of course I agree that github should follow the git spec for branch names. I meant that I don't think github should impose any additional restrictions on branch names.
in parlance this is fine because the weights of a model are useless without the wrapper code around them. ignoring the monorepo bs a lot of ai companies are pushing (like hugging face also), this is not as misleading as calling open-weight models open-source.
You are right, my choice of words was poor. The vuln is exactly as you describe, and it's the malicious payload that was not in the codebase (cache poisoning which is not detectable by reviewing the code of the repo).
This is exactly why I'm building Packj audit [1]. It detects malicious PyPI/NPM/Ruby/PHP/etc. dependencies using behavioral analysis. It uses static+dynamic code analysis to scan for indicators of compromise (e.g., spawning of shell, use of SSH keys, network communication, use of decode+eval, etc). It also checks for several metadata attributes to detect bad actors (e.g., typo squatting).
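To give a feel for the static side of that kind of scan, here is a toy sketch using Python's ast module. It is not Packj's implementation or API; the indicator lists and the find_indicators name are assumptions made up for illustration, and a real tool would cover far more cases.

```python
import ast

# Hypothetical indicator lists, for illustration only.
SUSPICIOUS_CALLS = {"eval", "exec"}
SUSPICIOUS_MODULES = {"subprocess", "socket"}


def find_indicators(source: str) -> list[str]:
    """Parse `source` without running it and report simple red flags."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            # Direct calls such as eval(...) or exec(...).
            if node.func.id in SUSPICIOUS_CALLS:
                findings.append(f"call:{node.func.id}")
        elif isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name.split(".")[0] in SUSPICIOUS_MODULES:
                    findings.append(f"import:{alias.name}")
        elif isinstance(node, ast.ImportFrom):
            if node.module and node.module.split(".")[0] in SUSPICIOUS_MODULES:
                findings.append(f"import:{node.module}")
    return sorted(findings)


# The sample is only parsed, never executed.
sample = "import base64, subprocess\nexec(base64.b64decode(payload))\n"
report = find_indicators(sample)
```

A check like this is easy to evade (aliasing, getattr tricks, obfuscated strings), which is presumably why the comment above pairs static analysis with dynamic analysis and metadata checks.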