
Exactly half of these HN usernames actually exist. So either there are enough people on HN that follow common conventions for Gemini to guess from a more general distribution, or Gemini has memorized some of the more popular posters. The ones that are missing:

- aphyr_bot - bio_hacker - concerned_grandson - cyborg_sec - dang_fan - edge_compute - founder_jane - glasshole2 - monad_lover - muskwatch - net_hacker - oldtimer99 - persistence_is_key - physics_lover - policy_wonk - pure_coder - qemu_fan - retro_fix - skeptic_ai - stock_watcher

Huge opportunity for someone to become the actual dang fan.


Before the AI stuff, Google had those pop-up quick answers when googling. So I googled something, like three years ago, saw the answer, and realized it was sourced from HN. Clicked the link, and lo and behold, I had answered my own question. Look mah! I'm on Google! So I am not surprised at all that Google crawls HN enough to have it in their LLM.

I did chuckle at the 100% Rust Linux kernel. I like Rust, but that felt like a clever joke by the AI.


I laughed at the SQLite 4.0 release notes. They're on 3.51.x now. Another major release a decade from now sounds just about right.

That one got me as well - some pretty wild stuff about prompting the compiler, starship on the moon, and then there's SQLite 4.0

You can criticize it for many things but it seems to have comedic timing nailed.

The promise is backwards compatibility in the file format and C API until 2050.

https://sqlite.org/lts.html


I wouldn't be surprised if it went towards the LaTeX model instead, where there's essentially never another major version release. There's only so much functionality you need in a local-only database engine; I bet they're getting close to complete.

I'd love to see more ALTER TABLE functionality, and maybe MERGE, and definitely better JSON validation. None of that warrants a version bump, though.

You know what I'd really like, that would justify a version bump? CRDT. Automatically syncing local changes to a remote service, so e.g. an Android app could store data locally in SQLite, but the user could also log into a web site on their desktop and all the data is right there. The remote service need not be SQLite - in fact I'd prefer Postgres. The service would also have to merge databases from all users into a single database... Or should I actually use Postgres for authorisation but open each user's data in a replicated SQLite file? This is such a common issue, I'm surprised there isn't a canonical solution yet.
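For context on what such a merge has to do, here's a minimal sketch of a last-writer-wins map, one of the simplest CRDTs (a toy illustration only, not how any actual SQLite sync product works): each key stores a (timestamp, value) pair, and merging two replicas keeps whichever entry is newer, so replicas converge regardless of sync order.

```python
# Toy last-writer-wins (LWW) map CRDT: merge keeps the entry with the
# newer timestamp for each key, so any two replicas converge to the
# same state no matter which direction the sync happens in.
def lww_merge(a, b):
    merged = dict(a)
    for key, (ts, val) in b.items():
        if key not in merged or ts > merged[key][0]:
            merged[key] = (ts, val)
    return merged

# Phone edited "theme" at t=1; desktop edited it at t=2 and added "lang".
phone = {"theme": (1, "dark")}
desktop = {"theme": (2, "light"), "lang": (1, "en")}
print(lww_merge(phone, desktop))  # -> {'theme': (2, 'light'), 'lang': (1, 'en')}
```

Merging in the other order gives the same result, which is exactly the property that makes conflict resolution automatic.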


I think the unified syncing, while neat, is way beyond what SQLite is really meant for. You'd get into so many niche situations dealing with out-of-sync master and slave 'databases' that it's hard to make an automated solution that covers them effectively, unless you force the schema into a transactional design for everything just to sort out update conflicts. E.g.: your user has the app on two devices, uses one while it doesn't have an internet connection, altering the state, and then uses the app on the other device before the original has a chance to sync.

Yes, it's a difficult problem. That's why I'd like it to be wrapped in a nice package away from my application logic.

Even a product that does this behind the scenes, by wrapping SQLite and exposing SQLite's wrapped interface, would be great. I'd pay for that.


If it had been about GIMP I would have laughed harder.

Be reasonable. It's only looking forward a single decade.

Every few years I stumble across the same Java or MongoDB issue. I google for it, find it on Stack Overflow, and figure out that it was me who wrote that very answer. Always have a good laugh when it happens.

Usually my memory for such things is quite good, but this one I keep forgetting - so much so that I don't remember what the issue is actually about xD


I've run into my own comments or blog posts more often than I care to admit...

Several decades into this, I assume all documentation I write is for my future self.

Beautifully self-serving while being a benefit to others.

Same thing with picking nails up in the road to prevent my/everyone’s flat tire.


ziggy42 is both a submitter of a story on the actual front page at the moment, and also in the AI generated future one.

See other comment where OP shared the prompt. They included a current copy of the front page for context. So it’s not so surprising that ziggy42 for example is in the generated page.

And for other usernames that are real but not currently on the home page: the LLM definitely has plenty of occurrences of HN comments and stories in its training data, so it's not really surprising that it is able to include real usernames of people who post a lot. Their names will occur over and over in the training data.


one more reason to doubt that it's AI-generated

HN has been used to train LLMs for a while now, I think it was in the Pile even

It has also fetched the current page in the background, because the Jepsen post was recently on the front page.

I may die but my quips shall live forever

You can straight up ask Google to look up Reddit or Hacker News users' post history. Some of it is probably just via search because it's very recent, as in the last few days. Some of the older corpus includes deleted comments, so they must be scraping from Reddit archive APIs too, or using that deprecated Google history cache.

So many underscores in the generated usernames, and yet, other than a newly created account, there was only one other real username with an underscore.

In 2032 new HN usernames must use underscores. It was part of the grandfathering process to help with moderating accounts generated after the AI singularity spammed too many new accounts.

my hypothesis is they trained it to snake_case for lowercase, and that obsession carried over from programming to other spheres. It can't bring itself to make a lowercaseunseparatedname

Most LLMs, including Gemini (AFAIK), operate on tokens. lowercaseunseparatedname would be literally impossible for them to generate, unless they went out of their way to enhance the tokenizer. E.g. the LLM would need a special invisible separator token that it could output, and when preprocessing the training data the input would then be tokenized as "lowercase unseparated name" but with those invisible separators.

edit: It looks like it probably is a thing, given it does sometimes output names like that. So the pattern is probably just rare enough in the training data that the LLM almost always prefers actual separators like underscores.


The tokenization can represent uncommon words with multiple tokens. Inputting your example on https://platform.openai.com/tokenizer (GPT-4o) gives me (tokens separated by "|"):

    lower|case|un|se|parated|name
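That multi-token fallback is easy to illustrate with a toy greedy longest-match subword tokenizer (a simplification - real BPE tokenizers merge learned byte pairs, but the effect on rare words is the same): a word that isn't in the vocabulary still gets represented, just as several pieces.

```python
# Toy greedy longest-match subword tokenizer. Rare words absent from the
# vocabulary fall back to multiple subword pieces, so a model can still
# emit "lowercaseunseparatedname" -- just as several tokens, not one.
VOCAB = {"lower", "case", "un", "se", "parated", "name", "_", "hacker", "news"}

def tokenize(text: str) -> list[str]:
    tokens = []
    i = 0
    while i < len(text):
        # Take the longest vocabulary entry matching at position i.
        for j in range(len(text), i, -1):
            if text[i:j] in VOCAB:
                tokens.append(text[i:j])
                i = j
                break
        else:
            # Unknown character: emit it as its own token.
            tokens.append(text[i])
            i += 1
    return tokens

print(tokenize("lowercaseunseparatedname"))
# -> ['lower', 'case', 'un', 'se', 'parated', 'name']
```

The vocabulary here is made up for the example; production tokenizers have 100k+ entries, so almost any string is representable without special separator machinery.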

This is definitely based on a search or page fetch, because these are all today's topics:

- IBM to acquire OpenAI (Rumor) (bloomberg.com)

- Jepsen: NATS 4.2 (Still losing messages?) (jepsen.io)

- AI progress is stalling. Human equivalence was a mirage (garymarcus.com)


The OP mentioned pasting the current frontpage into the prompt.

What % of today’s front page submissions are from users that have existed 5-10 years+?

(Especially in datasets before this year?)

I’d bet half or more - but I’m not checking.


It does memorize. But that's not actually news... I remember ChatGPT 3.5 or the old 4.0 remembering some users of some Reddit subreddits and all, even naming the top users for each subreddit.

The thing is, most of the models were heavily post-trained to limit this...


That’s a lot more underscores than the actual distribution (I counted three users with underscores in their usernames among the first five pages of links atm).

either you only notice the xxx_yyy frequent posters or it's quite interesting that so many have this username format

I was here first

Aw, I was actually a bit disappointed at how on the nose the usernames were, relative to their postings. Like the "Rust Linux Kernel" by rust_evangelist, "Fixing Lactose Intolerance" by bio_hacker, fixing a 2024 Framework by retro_fix, etc...

We talked about this model in some depth on the last Pretrained episode: https://youtu.be/5weFerGhO84?si=Eh_92_9PPKyiTU_h&t=1743

Some interesting takeaways imo:

- Uses existing model backbones for text encoding & semantic tokens (why reinvent the wheel if you don't need to?)

- Trains on a whole lot of synthetic captions of different lengths, ostensibly generated using some existing vision LLM

- Solid text generation support is facilitated by training on all OCR'd text from the ground truth image. This seems to match how Nano Banana Pro got so good as well; I've seen its thinking tokens sketch out exactly what text to say in the image before it renders.


I used Serp via API many moons ago. The most interesting part of the company, imo, is the legal defense attached to their different plans:

  Production - $150
  15,000 searches / month
  U.S. Legal Shield
ie. "Our U.S. Legal Shield protects your right to crawl and parse public search engine data under the First Amendment. We assume scraping and parsing liability for customers on most recurring plans unless your usage is illegal."

I imagine at least some portion of companies use them just for this liability shield.


Sounds a lot like the old guarantee paid SSL certificate providers used to offer; pretty words, but meaningless in practice. (IIRC, no one ever got a payout from any of them.)

"We assume scraping and parsing liabilities for both domestic and foreign companies unless your usage is otherwise illegal" seems like a big loophole in it.


Couldn't this be read as: we assume scraping and parsing liability unless it is ruled illegal, in which case your usage would be illegal and our liability shield wouldn't help you?

> unless your usage is illegal

Like copyright infringement of Google's search results?


I'm always a bit surprised at how long it can take to triage and fix these pretty glaring security vulnerabilities. An October 27, 2025 disclosure and a November 4, 2025 email confirmation seems like a long time to have their entire client file system exposed. Sure, the actual bug ended up being (what I imagine to be) a <1hr fix, plus the time for QA testing to make sure it didn't break anything.

Is the issue that people aren't checking their security@ email addresses? People are on holiday? These emails get so much spam it's really hard to separate the noise from the legit signal? I'm genuinely curious.


In my experience, it comes down to project management and organizational structure problems.

Companies hire a "security team" and put them behind the security@ email, then decide they'll figure out how to handle issues later.

When an issue comes in, the security team tries to forward the security issue to the team that owns the project so it can be fixed. This is where complicated org charts and difficult incentive structures can get in the way.

Determining which team actually owns the code containing the bug can be very hard, depending on the company. Many security team people I've worked with were smart, but not software developers by trade. So they start trying to navigate the org chart to figure out who can even fix the issue. This can take weeks of dead-ends and "I'm busy until Tuesday next week at 3:30PM, let's schedule a meeting then" delays.

Even when you find the right team, it can be difficult to get them to schedule the fix. In companies where roadmaps are planned 3 quarters in advance, everyone is focused on their KPIs and other acronyms, and bonuses are paid out according to your ticket velocity and on-time delivery stats (despite PMs telling you they're not), getting a team to pick up the bug and work on it is hard. Again, it can become a wall of "Our next 3 sprints are already full with urgent work from VP so-and-so, but we'll see if we can fit it in after that"

Then legal wants to be involved, too. So before you even respond to reports you have to flag the corporate counsel, who is already busy and doesn't want to hear it right now.

So half or more of the job of the security team becomes navigating corporate bureaucracy and slicing through all of the incentive structures to inject this urgent priority somewhere.

Smart companies recognize this problem and will empower security teams to prioritize urgent things. This can cause another problem where less-than-great security teams start wielding their power to force everyone to work on not-urgent issues that get spammed to the security@ email all day long demanding bug bounties, which burns everyone out. Good security teams will use good judgment, though.


Oh man this is so true. In this sort of org, getting something fixed out-of-band takes a huge political effort (even a critical issue like having your client database exposed to the world).

While there were numerous problems with the big corporate structures I worked in decades ago where everything was done by silos of specialists, there were huge advantages. No matter where there was a security, performance, network, hardware, etc. issue, the internal support infrastructure had the specialist’s pagers and for a problem like this, the people fixing it would have been on a conference call until it was fixed. There was always a team of specialists to diagnose and test fixes, always available developers with the expertise to write fixes if necessary, always ops to monitor and execute things, always a person in charge to make sure it all got done, and everybody knew which department it was and how to reach them 24/7.

Now if you needed to develop something not-urgent that involved, say, the performance department, database department, and your own, hope you’ve got a few months to blow on conference calls and procedure documents.

For that industry it made sense though.


Interesting. Wouldn't the performance department have their fingers in all the pies anyway, too, or how was that handled?

Their job was specifically managing server resource allocation - as an IT role and not a dev role - in a completely standardized environment. Most applications were given a standard allotment of resources, and they only got involved if something was running out of RAM, disk access was too slow, or something just seemed to be taking a lot longer than usual. If it seemed to be a network problem, or just a program crash, for example, they were never involved unless troubleshooting indicated it involved them. More often than not, I'd get a phone call telling me the system I was working on seemed to be heavy on the disk access or something, and they had already allotted it more to keep it stable, but I should check to make sure we weren't doing something stupid.

Now that I think of it, I’ll bet a lot of companies have a system similar to this for their infrastructure… they just outsource it to AWS, Azure, Google, etc. and comparatively fly by the seat of their pants on the dev side. You could only scale that system down so much, I imagine.


> Many security team people I've worked with were smart, but not software developers by trade.

A lot are people who cannot code at all and cannot administer - they just fill tables and check boxes, maybe from some automated suite. They don't know what HTTP and HTTPS are, because they are just paper pushers, which is far from real security - more like security in name only.

And they joined the field since it pays well


Great comment. Very true.

A lot of the time it’s less “nobody checked the security inbox” and more “the one person who understands that part of the system is juggling twelve other fires.” Security fixes are often a one-hour patch wrapped in two weeks of internal routing, approvals, and “who even owns this code?” archaeology. Holiday schedules and spam filters don’t help, but organizational entropy is usually the real culprit.

> A lot of the time it’s less “nobody checked the security inbox” and more “the one person who understands that part of the system is juggling twelve other fires.”

At my past employers it was "The VP of such-and-such said we need to ship this feature as our top priority, no exceptions"


I once had a whole sector of a fintech go down because one DevOps person ignored daily warning emails for three months that an API key was about to expire and needed a reset.

And of course nobody remembered the setup, and logging was only accessible by the same person, so figuring it out also took weeks.


I'm currently on the other side of this trying to convince management that the maintenance that should have been done 3 years ago needs to get done. They need "justification".

Write a short memo saying you are very concerned, and describe a range of things that may happen (from "not much" through medium to maximum scare - lawsuits, brand/customer trust destroyed, etc.).

Email the memo to a decision maker with the important flag on and CC: another person as a witness.

If you have been saying it for a long time and nobody has taken any action, you may use the word "escalation" as part of the subject line.

If things hit the fan, it will also make sure that what drops from the fan falls on the right people, and not on you.


It could also be someone "practicing good time management."

They have a specific time of day, when they check their email, and they only give 30 minutes to that time, and they check emails from most recent, down.

The email comes in, two hours earlier, and, by the time they check their email, it's been buried under 50 spams, and near-spams; each of which needs to be checked, so they run out of 30 minutes, before they get to it. The next day, by email check time, another 400 spams have been thrown on top.

Think I'm kidding?

Many folks that have worked for large companies (or bureaucracies) have seen exactly this.


The system would be mostly sane, if you could sort by some measure of importance, not just recency.

It's not about fixing it, it's about acknowledging it exists

security@ emails do get a lot of spam. It doesn't get talked about very much unless you're monitoring one yourself, but there's a fairly constant stream of people begging for bug bounty money for things like the Secure flag not being set on a cookie.

That said, in my experience this spam is still a few emails a day at the most, I don't think there's any excuse for not immediately patching something like that. I guess maybe someone's on holiday like you said.


This.

There is so much spam from random people about meaningless issues in our docs. AI has made the problem worse. Separating the meaningful from the meaningless is a full-time job.


This is where “managed” bug bounty programs like BugCrowd or HackerOne deliver value: only telling you when there is something real. It can be a full time job to separate the wheat from the chaff. It’s made worse by the incentive of the reporters to make everything sound like a P1 hair-on-fire issue.

Half of the emails I used to get in a previous company were pointless issues, some coming from a honey pot.

The other half was people demanding payment.


Training a tech support team of interns to solve all of them would be an enviable hacker or software dev training program.

Use AI for that :)

Not kidding, I bet LLMs are excellent at triaging these reports. Humans, in a corporate setting, apparently are not.

My favorite one is the "We've identified a security hole in your website"... and I always respond quickly that my website is statically generated - nothing dynamic, and immutable on Cloudflare Pages. For some odd reason, I never hear back from them.

Well, we have 600 people in the global response center I work at, and the priority issue count is currently 26,000. That means each of those is serious enough that it's been assigned to someone. There are tens of thousands of unassigned issues because the triage teams are swamped. People don't realize that as systems get more complex, issues increase. They never decrease. And the chimp troupe's response has always been a Story: we can handle it.

The security@ inbox has so much junk these days with someone reporting that if you paste alert('hacked') into devtools then it makes the website hacked!

I reckon only 1% of reports are valid.

LLMs can now make a plausible-looking exploit report ("there is a use-after-free bug in your server-side implementation of X library which allows shell access to your server if you time these two API calls correctly"), but the LLM has made the whole thing up. That can easily waste hours of an expert's time on a total falsehood.

I can completely see why some companies decide it'll be an office-hours-only task to go through all the reports every day.


My favorite was "we can trigger your website to initiate a connection to the server we control". They were running their own mail servers and creating new accounts on our website. Of course someone needs to initiate a TCP connection to deliver an email message!

Of course this could be a real vulnerability if it disclosed the real server IP behind Cloudflare. This was not the case; we were sending via the AWS email gateway.


Not every organization prioritizes being able to ship a code change at the drop of a hat. This often requires organizational dedication to heavy automated testing and CI, which small companies often aren't set up for.

I can't believe that any company takes a month to ship something. Even if they don't have CI, surely they'd prefer to break the app (maybe even completely) rather than risk having all their legal documents exfiltrated.

> I can't believe that any company takes a month to ship something.

Outside of startups and big tech, it's not uncommon to have release cycles that are months long. Especially common if there is any legal or regulatory involvement.


I can only say you haven't worked anywhere I have.

I remember heartbleed dropping shortly after a deployment and not being allowed to patch for like ten months because the fix wasn't "validated". This was despite insurers stating this issue could cost coverage and legal getting involved.


What? That's crazy, wow!

It’d be pretty reasonable to take the whole API down in this scenario, and put it back up once it’s patched. They’d lose tons of cash but avoid being liable for extreme amounts of damages.

> October 27, 2025 disclosure and November 4, 2025 email confirmation seems like a long time to have their entire client file system exposed

I have unfortunately seen way worse. If it will take more than an hour and the wrong people are in charge of the money, you can go a pretty long time with glaring vulnerabilities.


I call that one of the worrisome outcomes from "Marketing Driven Development" where the business people don't let you do technical debt "Stories" because you REALLY need to do work that justifies their existence in the project.

Another aspect to consider: when you reduce the amount of permission anything has (like here the returned token), you risk breaking something.

In a complex system it can be very hard to understand what will break, if anything. In a less complex system, it can still be hard to understand if the person who knows the security model very well isn't available.


> October 27, 2025 disclosure and November 4, 2025 email confirmation seems like a long time to have their entire client file system exposed

There is always the simple answer: these are lawyers, so they are probably scrambling internally to write a response that covers themselves legally, while also trying to figure out how fucked they are.

1 week is surprisingly not that slow.


I'm a bit conflicted about what responsible disclosure should be, but in many cases it seems like these conditions hold:

1) the hack is straightforward to do;

2) it can do a lot of damage (get PII or other confidential info in most cases);

3) downtime of the service wouldn't hurt anyone, especially if we compare it to the risk of the damage.

But, instead of insisting on the immediate shutting down of the affected service, we give companies weeks or months to fix the issue while notifying no one in the process and continuing with business as usual.

I've submitted 3 very easy exploits to 3 different companies in the past year and, thankfully, they fixed them in about a week every time. Yet the exploits were trivial (as I'm not good enough to find the hard ones, I admit). Mostly IDORs, like changing id=123456 to id=1 all the way up to id=123455 and seeing a lot of medical data that doesn't belong to me. All 3 cases were medical labs, because I had to have some tests done and wanted to see how secure my data was.
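For readers unfamiliar with the bug class: the fix for an IDOR is an ownership check on every record lookup, never trusting the client-supplied id. A minimal sketch (all names here are hypothetical, not from any real lab's codebase):

```python
# Minimal ownership check that blocks the IDOR described above: a
# client-supplied record id is only honored if the record belongs to
# the authenticated user.
RESULTS = {
    1: {"owner": "alice", "data": "blood panel"},
    2: {"owner": "bob", "data": "lipid panel"},
}

def get_result(record_id, current_user):
    record = RESULTS.get(record_id)
    if record is None or record["owner"] != current_user:
        # Same error for "missing" and "not yours", so enumeration
        # doesn't even reveal which ids exist.
        raise PermissionError("not found")
    return record["data"]

print(get_result(1, "alice"))  # -> blood panel
```

With this in place, walking id=1 through id=123455 just yields a wall of identical "not found" errors.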

Sadly, in all 3 cases I had to send a follow-up e-mail after ~1 week, saying that I'll make the exploit public if they don't fix it ASAP. What happened was, again, in all 3 cases, the exploit was fixed within 1-2 days.

If I'd given them a month, I feel they would've fixed the issue after a month. If I'd given them a year - after a year.

And it's not like there aren't 10 different labs in my city. It's not like online access to results is critical, either. You can get a printed result, or call them and write the results down. Yes, it would be tedious, but more secure.

So I should've said from the beginning something like:

> I found this trivial exploit that gives me access to medical data of thousands of people. If you don't want it public, shut down your online service until you fix it, because it's highly likely someone else figured it out before me. If you don't, I'll make it public and ruin your reputation.

Now, would I make it public if they don't fix it within a few days? Probably not, but I'm not sure. But shutting down their service until the fix is in seems important. If it was some hard-to-do hack chaining several exploits, including a 0-day, it would be likely that I'd be the first one to find it and it wouldn't be found for a while by someone else afterwards. But ID enumerations? Come on.

So does the standard "responsible disclosure", at least in the scenario I've given (easy to do; not critical if the service is shut down), help the affected parties (the customers) or the businesses? Why should I care about a company worth $X losing $Y if it's their fault?

I think in the future I'll anonymously contact companies with way more strict deadlines if their customers (or others) are in serious risk. I'll lose the ability to brag with my real name, but I can live with it.

As to the other comments talking about how spammed their security@ mail is - that's the cost of doing business. It doesn't seem like a valid excuse to me. Security isn't one of hundreds random things a business should care about. It's one of the most important ones. So just assign more people to review your mail. If you can't, why are you handling people's PII?


Don't do this.

I understand you think you are doing the right thing, but be aware that by shutting down a medical communication service there's a non-trivial chance someone will die because of slower test results.

Your responsibility is responsible disclosure.

Their responsibility is how to handle it. Don't try to decide that for them.


> I think in the future I'll anonymously contact companies with way more strict deadlines if their customers (or others) are in serious risk. I'll lose the ability to brag with my real name, but I can live with it.

What you're describing is likely a crime. The sad reality is most businesses don't view protection of customers' data as a sacred duty, but simply another of the innumerable risks to be managed in the course of doing business. If they can say "we were working on fixing it!" their asses are likely covered even if someone does leverage the exploit first—and worst-case, they'll just pay a fine and move on.


Precisely - they view security as just one part of many of their business, instead of viewing it as one of the most important parts. They've insured themselves against a breach, so it's not a big deal for them. But it should be.

The more casualties, the more media attention -> the more likely they, and others in their field, will take security more seriously in the future.

If we let them do nothing for a month, they'll eventually fix it, but in the meantime malicious hackers may gain access to the PII. They might not make it public, but sell that PII on black markets instead. The company may not get the negative publicity it deserves, and likely won't learn to fix their systems in time and adopt adequate security measures. The sale of the PII and the breach itself might become public knowledge months after the fact, while the company has had a chance to grow in the meantime and make more security mistakes that may be exploited later on.

And yes, I know it may be a crime - that's why I said I'd report it anonymously from now on. But if the company sits on their asses for a month, shouldn't that count as a crime, as well? The current definition of responsible disclosure gives companies too much leeway, in my opinion.

If I knew I operated a service that was trivial to exploit and was hosting people's PII, I'd shut it down until I fixed it. People won't die if I do everything in my power to provide the test results (in my example of medical labs) to doctors and patients via other means, such as paper or phone. And if people do die, it would be devastating, of course, but it would mean society has put too much trust in a single system without making sure it's not vulnerable to the most basic of attacks. So it would happen sooner or later anyway. Although I can't imagine someone dying because their doctor had to make a phone call to the lab instead of typing in a URL.

The same argument about people dying due to the disruption of the medical communications system could be made about too-big-to-fail companies that are entrenched in society because a lot of pension funds have invested in them. If such a company goes under, the innocent people dependent on the pension fund's finances would suffer, which would be awful. But would the alternative be to never let such companies go bankrupt? Or would it be better for such funds not to rely so much on one specific company in the first place? That is to say, in both cases (security or stocks in general) the reality is that currently people are too dependent on a few singular entities, while they shouldn't be. That has to change, and the change has to begin somewhere.


Seems like the live demo is getting the hug of death - I've been waiting for ~5 minutes now. A bit ironic given their landing page: "Don't make your prospects wait - ever again"

In its current iteration this demo might net discourage your future clients rather than encourage them.

I like the idea in general as an alternative to needing to book with a BDE. I'd always prefer to just self serve for a new product; anything that gates my time (sales calls, popover walkthroughs, etc) is something I'd prefer to skip. But I know non-engineering customers really love these calls to see the power of a new platform. I wonder if they'll be as engaged during an AI walkthrough versus when there's a person on the other end of the phone.


hi, thanks! the load was massive - try now!


Pretty happy the under 200k token pricing is staying in the same ballpark as Gemini 2.5 Pro:

Input: $1.25 -> $2.00 (per 1M tokens)

Output: $10.00 -> $12.00

Squeezes a bit more margin out of app layer companies, certainly, but there's a good chance that for tasks that really require a sota model it can be more than justified.
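To put those per-1M-token rates in concrete terms, here's a quick back-of-the-envelope comparison (the 100k-input / 10k-output request size is just an assumed example):

```python
# Per-request cost at per-1M-token rates: a 100k-input / 10k-output call
# goes from $0.225 (at $1.25 in / $10 out) to $0.32 (at $2 in / $12 out).
def cost(input_tokens, output_tokens, in_rate, out_rate):
    return input_tokens / 1e6 * in_rate + output_tokens / 1e6 * out_rate

old = cost(100_000, 10_000, 1.25, 10.00)
new = cost(100_000, 10_000, 2.00, 12.00)
print(round(old, 3), round(new, 3))  # -> 0.225 0.32
```

So for input-heavy workloads the effective increase is closer to the ~60% shown here than the headline 2x on input alone.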


Every recent release has bumped the pricing significantly. If I was building a product and my margins weren’t incredible I’d be concerned. The input price almost doubled with this one.


I'm not sure how concerned people should be at the trend lines. If you're building a product that already works well, you shouldn't feel the need to upgrade to a larger parameter model. If your product doesn't work and the new architectures unlock performance that would let you have a feasible business, even a 2x on input tokens shouldn't be the dealbreaker.

If we're paying more for a more petaflop heavy model, it makes sense that costs would go up. What really would concern me is if companies start ratcheting prices up for models with the same level of performance. My hope is raw hardware costs and OSS releases keep a lid on the margin pressure.


Nice to see you on here! I used the ContentDetector with a threshold of 27.0 and otherwise default parameters. I realize I could have done a grid sweep to really home in on a good param range, but because I had only one input video labeled, I wanted something that would work well enough out of the box. I imagine this dataset is rather... heterogeneous.

If you happen to know a better a priori threshold, I would be happy to re-run the analysis and update the chart.
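For anyone unfamiliar with what the detector is doing under the hood: a toy version of content-based cut detection (a sketch of the idea, not PySceneDetect's actual implementation) just thresholds the mean frame-to-frame pixel difference:

```python
# Toy content-based shot detection: compare consecutive "frames" (flat
# lists of pixel intensities here) and report a cut wherever the mean
# absolute difference exceeds a threshold -- the same basic idea
# ContentDetector applies to per-frame HSV deltas.
def find_cuts(frames, threshold=27.0):
    cuts = []
    for i in range(1, len(frames)):
        prev, cur = frames[i - 1], frames[i]
        score = sum(abs(a - b) for a, b in zip(prev, cur)) / len(cur)
        if score > threshold:
            cuts.append(i)
    return cuts

# Three steady dark frames, then a hard cut to bright frames.
frames = [[10, 10, 10]] * 3 + [[200, 200, 200]] * 3
print(find_cuts(frames))  # -> [3]
```

The fixed threshold is exactly what makes fast camera motion a problem: a pan can push the per-frame difference over it with no actual cut, which is why an adaptive scheme that compares against a rolling average does better.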


If you're willing, could you try using AdaptiveDetector? It has better defaults and should handle fast camera movements a bit better.

The threshold values themselves can be tuned if you generate a statsfile and plot the result, but that can sometimes be tedious if you have a lot of files (thus the huge interest in methods like TransNetV2). Glad to see the real world applications of those in action. You can always just increase/decrease the threshold by 5-10% depending on if you find it's too sensitive or not sensitive enough as well.

Thanks for the response!


AdaptiveDetector definitely did a better job, will append these new stats to the post:

precision 0.397, recall 0.727, F1 0.513, mean temporal error 0.307 s


I bought a Sony TRV120 10+ years ago, back when I was doing this conversion project for the first time. It's built like a tank and still works today.

At the risk of being smited by professional archivists, I'm willing to wager that 99.9% of people can't tell the difference between a $5k archival rig and one of these higher-quality camcorders. At this point it really feels like the biggest inhibitor to good-quality digitization is the decay of the tape, not the archival setup.

That said - for anyone with the time, patience, and soldering abilities, I would love to see a more proper A/B test with RF signal capture and software decoding. Something like this:

https://rastrillo.ca/digitizing-video8-tapes-with-vhs-decode...


There be dragons - crazy things like taking several captures of the same tape and averaging the frames and such. :-)


Since the webapp is pretty opinionated to my setup (ie. linking against AVFoundation, using MPS for inference, always capturing an image after import) I didn't originally think it would be that useful to open source. Happy to do so - are you looking to get something specific out of it?


Most of my tapes did have pretty detailed narration and date overlays written directly to tape. But even without narration I still had luck doing basic event summarization and facial recognition of family members to build the tags.

