Hacker News | lukeramsden's comments

Why are AI researchers constantly handicapping everything they do under the guise of "safety"? It's a bag of data and some math algorithms that generate text....


> It's a bag of data and some math algorithms that generate text....

I agree with the general premise of too much "safety", but this argument is invalid. Humans are bags of meat and they can do some pretty terrible things.


But what we're doing to these models is literally censoring what they're saying - not doing.

I don't think anyone has a problem with stopping random AIs when they're committing crimes (or, more realistically, the humans making them do that) - but if you're going to make the comparison to humans in good faith, it'd be like a person standing behind you, punishing you when you say something offensive.


What I'm saying is that the argument "they're math and data, therefore what they say is safe" is not a valid one.


> Why are AI researchers constantly handicapping everything

Career and business self-preservation in a social media neurotic world. It doesn't take much to trigger the outrage machine and cancel every future prospect you might have, especially in a very competitive field flush with other "clean" applicants.

Just look at the whole "AI racism" fustercluck for a small taste.


Let's reverse this - why wouldn't they do that? I agree with you, but LLMs tend to be massively expensive and thus innately tied to ROI. A lot of companies fret about advertising even near some types of content. The idea of spending millions to put a racist bot on your home page is, no surprise, not very appetizing.

So of course if this is where the money and interest flows then the research follows.

Besides, it's a generally useful area anyway. The ability to tweak behavior even if not done for "safety" still seems pretty useful.


What if an AI model could tell you exactly how to modify a common virus to kill 50% of everyone it infects?


Yeah. It will start its instructions with a recommendation to buy a high-tech biolab for $100,000,000.

Seriously. The reason we don't have mass killings everywhere is not that information on how to make explosive drones or poisons is impossible to find or access. It's also not hard to buy a car or a knife.

Hell, you can even find YouTube videos on how exactly uranium enrichment works, step by step - some content creators even got raided by police for that. Yet we don't see tons of random kids making dirty bombs.

PS: Cody's Lab: Uranium Refining:

https://archive.org/details/cl-uranium


You cannot compare making nuclear weapons to modifying viruses to be more lethal. It is vastly cheaper to modify viruses, and there the knowledge is the bottleneck, whereas with nukes the knowledge of how to make them is widespread but getting the materials is very hard.

Another example: what if an LLM could tell you exactly how to build a tabletop laser device that could enrich uranium for a few hundred thousand dollars?


LLMs are not AGIs. An LLM can only ever tell you how to build a device to enrich uranium for a few hundred thousand dollars if that information was already public knowledge and the LLM was trained on it. The situation is the same with building biolab tech for a few hundred thousand dollars. Also, an actor who already has a few million wouldn't have any problem getting their hands on an LLM, or a scientist able to build one for them.

The only "danger" LLM "safety" can prevent is generation of racist porn stories.


With the vast amounts of data LLMs are trained on they make it much easier for people to find harmful and dangerous information if they aren't filtered. See

https://en.wikipedia.org/wiki/Separation_of_isotopes_by_lase...


Society's basic entry barrier: hard enough to make sure the dumb person who hasn't achieved anything in life can't do it, but no real obstacle to someone smart enough to make it in society, who can circumvent it if they want.

It's the same with plenty of other things.


> It's a bag of data and some math algorithms that generate text....

That describes almost every web server.

To the extent that this particular maths produces text that causes political, financial, or legal harm to their interests, this kind of testing is just like any other acceptance testing.

To the extent that the maths is "like a human", even in the vaguest and most general sense of "like", then it is also good to make sure that the human it's like isn't a sadistic psychopath — we don't know how far we are from "like" by any standard, because we don't know what we're doing, so this is playing it safe even if we're as far from this issue as cargo-cults were from functioning radios.


Can't read the full article due to the paywall, but ostensibly it's due to bias rules on race and not visa rules? Visas being abused and then backstopped by unrelated rules doesn't mean the visa rules shouldn't be fixed.


You joke but I have seen this far too many times... in very highly valued startups...

100% line coverage though!


Not really. Do you think this is trivial at AWS scale? What do you do when people hit their hard spend limits - start shutting down their EC2 instances and deleting their data? I can see the argument that just because it's "hard" doesn't mean they shouldn't do it, but it's disingenuous to say they're shady because they don't.


At AWS engineering scale they can absolutely figure it out if they have the slightest interest in doing so. I've heard all the excuses — they all suck.

Businesses with lawyers and stuff can afford to negotiate with AWS etc. when things go wrong. Individuals who want to upskill on AWS to improve their job prospects have to roll the dice on AWS maybe bankrupting them. AWS actively encourages developers to put themselves in this position.

I don't know if AWS should be regulated into providing spending controls. But if they don't choose to provide spending controls of their own accord, I'll continue to call them out for being grossly irresponsible, because they are.


People have kept bringing up this argument since the very beginning, when people first asked for this feature. This used to be the most upvoted request on AWS forums, with AWS officially acknowledging (back in 2007, IIRC), "We know it's important for you and are working on it". But they made a decision not to implement it.

The details don't matter, really. For those who decide to set up a hard cap and agree to its terms, there could be a grace period or not. In the end, all instances would be shut down and all data lost, just as with traditional services: if you haven't paid your bill, you're no longer entitled to them, pure and simple.

They haven't implemented it and never will, because Amazon is a company obsessed with optimization. There is negative motivation to implement anything related to that.


That, and AWS just doesn’t really care for people with a spending limit as their customers, which is entirely reasonable.

Just forgiving all the ridiculous bills is a much better (and cheaper) strategy.


They've had two decades to figure it out. For EC2, they could shut down the instance but keep storage and public IPs. It shouldn't be too hard to estimate when the instance has to be stopped to end up with charges below the hard limit.
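The estimate described above is simple arithmetic. A minimal sketch, with entirely made-up numbers and function names (nothing here reflects any real AWS API or billing logic):

```python
# Hypothetical sketch: given a hard spending cap, the spend so far, and
# an instance's hourly rate, estimate how many whole hours the instance
# can keep running before it must be stopped to stay under the cap.

def hours_until_stop(hard_cap: float, spent_so_far: float, hourly_rate: float) -> int:
    """Whole hours of runtime left before charges would exceed the cap."""
    if hourly_rate <= 0:
        raise ValueError("hourly_rate must be positive")
    remaining_budget = hard_cap - spent_so_far
    if remaining_budget <= 0:
        return 0  # already at or over the cap: stop now
    return int(remaining_budget // hourly_rate)

# e.g. a $100 cap with $94 already spent, at $0.50/hour -> 12 more hours
print(hours_until_stop(100.0, 94.0, 0.5))
```

Real billing would have to aggregate many cost factors, but the per-instance cutoff itself is this kind of calculation.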


Now multiply the logic required by multiple cost factors across hundreds of services.


Imagine how impossible it is to actually build those services, with all their logic, if just this added logic is impossible :)


I didn't say it was impossible or even intractable. It's simply not as easy as all the "why don't they just" comments make it sound.


When it's their money Azure manage to have a spending limit...

> The spending limit in Azure prevents spending over your credit amount. [1]

Once it's your money miraculously the spending limit is no longer available...

> The spending limit isn’t available for subscriptions with commitment plans or with pay-as-you-go pricing. [1]

[1]: https://learn.microsoft.com/en-us/azure/cost-management-bill...


> Do you think that this is trivial at AWS scale?

What a ridiculous point. AWS achieves non-trivial things at scale all the time, and brags about it too.

So many smart engineers with high salaries and they can't figure out a solution like "shut down instances so costs don't continue to grow, but keep the data so nothing critical is lost, at least for a limited time"?

Disingenuous is what you are writing - oh no, it's a hard problem, they can't be expected to even try to solve it.


> What a ridiculous point. AWS achieves non-trivial things at scale all the time, and brags about it too.

Many companies achieve non-trivial things at scale. Pretty much every good engineer I speak to will list out all the incredibly challenging things they did. And follow it up with "however, this component in Billing is 100x more difficult than that!"

I've worked in Billing and I'd say a huge number of issues come from the business logic. When you add a feature after-the-fact, you'll find a lot of technical and business blockers that prevent you doing the most obvious path. I strongly suspect AWS realised they passed this point of no return some time ago and now the effort to implement it vastly outweighs any return they'd ever hope to see.

And, let's be honest, there will be no possible implementation of this that will satisfy even a significant minority of the people demanding this feature. Everyone thinks they're saying the same thing, but the second you dig into the details and the use-case, everyone will expect something slightly (but critically) different.


> "however, this component in Billing is 100x more difficult than that!"

Simply claiming this does not make it true. Anyway, the original claim was simply that it is not trivial. This is what is known as moving the goalposts, look it up.

> let's be honest, there will be no possible implementation of this

Prefixing some assertion with "let's be honest" does not prove it or even support it in any way. If you don't have any actual supporting arguments, there's nothing to discuss, to be honest.


> Simply claiming this does not make it true.

The people "claiming" this actually worked on it. I read a post from HN just yesterday talking about the complexities of billing. Look it up.

> If you don't have any actual supporting arguments

You can read other responses in this post. Look it up.


> Disingenuous is what you are writing - oh no, it's a hard problem, they can't be expected to even try to solve it.

I find it funny that people bring up this pseudo-argument whenever this issue is discussed. Customers: "We want A, it's crucial for us". People on the Internet: "Do you have any idea how difficult it is to implement A? How would it work?" And the discussion diverges into technical details, obscuring the main point: AWS is bent on never implementing this feature even though in the past (more than a decade ago) they promised they would.


You shut down their instances and keep the data for a week and delete it if they don't pay promptly. It's not very profitable though.


That’s what Hetzner does.


> What do you do when people hit their hard spend limits, start shutting down their EC2 instances and deleting their data?

Yes, why not? I don't see the problem here? If you didn't want that, you could set a higher spending limit.

If they want a little more user-friendly approach they could give you X hours grace.

> You've been above your spending limit for 4 hrs (200%), in 4 hrs your services will go into suspended state. Increase your spending limit to resume.
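The grace-window policy proposed above can be sketched in a few lines. This is purely illustrative (the function name, thresholds, and message wording are made up):

```python
# Hypothetical sketch of a grace-window policy: once spend crosses the
# user's limit, start a countdown; warn during the grace window, then
# suspend services when it runs out.

def grace_status(spend: float, limit: float, hours_over: int, grace_hours: int = 8) -> str:
    """Return the account state under a simple grace-window policy."""
    if spend <= limit:
        return "ok"
    if hours_over < grace_hours:
        # still inside the grace window: warn, with time remaining
        return f"warning: suspend in {grace_hours - hours_over} hrs"
    return "suspended"

# At 200% of a $100 limit, 4 hours into an 8-hour grace window:
print(grace_status(200.0, 100.0, 4))
```

Raising the spending limit would flip the state back to "ok" on the next check, matching the "increase your spending limit to resume" flow described above.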


yes it's trivial for them, they are crazy rich and it's their core competence


set a policy for what happens when the spend limit is reached? not rocket science


You can start with the number of unicorns in the USA vs Europe, especially when you take population into account: https://www.failory.com/unicorns


That isn't a concrete example of a regulation that hinders innovation.


What do you think the cause is? Unwashed eggs?


Any number of reasons: language barriers, existing American firms anti-competing, smaller domestic markets, less centralisation, and, yes, in some cases, regulation. But when it comes down to it, it's better to have smaller firms that don't damage society (or do so less frequently) than larger firms that do, even just from the perspective of wealth distribution.


> it's better to have smaller firms that don't (or less frequently) damage society

I'm not sure about that - I really like my lifestyle which would be nearly impossible to attain in Europe, but is very attainable for Americans.

I don't see how you're materially better off because you're forced to use foreign companies (Google, Facebook, etc.) instead of having your own.


What are you talking about? I am unable to follow your reasoning, maybe you can walk us through?


I think he's saying that, yes, this regulation means that your own companies are more ethical, but European consumers end up using these less-regulated American companies anyway. This is true, but the EU has started to solve this problem too, for example with the Digital Markets and Services Acts.


Why are you asking me? And what does 'Unicorns' have to do with innovation anyway?


How many of those unicorns are financial black holes never expecting to turn a profit?

And the inclusion of so many cryptocurrency "unicorns" in that list is also quite telling.


So Estonia is better than the US?


If you can’t afford to pay the current employees that raise, you certainly can’t afford new people - the market rate is the market rate.


The best way to do this is message passing. My current way of doing it is using Aeron[0] + SBE[1] to pass messages very efficiently between "services" - you can then configure it to either be using local shared memory (/dev/shm) or to replicate the log buffer over the network to another machine.

[0]: https://aeroncookbook.com/aeron/overview/
[1]: https://aeroncookbook.com/simple-binary-encoding/overview/
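To make the idea concrete, here is a minimal single-process sketch of the pattern: a fixed-layout binary message (in the spirit of SBE) written to and read from a shared-memory segment (in the spirit of Aeron's /dev/shm log buffers). This is an illustration of the mechanics only - the field layout is invented, and real Aeron adds lock-free ring buffers, flow control, and network replication on top:

```python
import struct
from multiprocessing import shared_memory

# Fixed binary layout, SBE-style: u32 message id, u64 timestamp, f64 price.
MSG_FMT = "<IQd"
MSG_SIZE = struct.calcsize(MSG_FMT)

def write_msg(buf, offset, msg_id, ts, price):
    # Encode directly into the shared buffer; no intermediate objects.
    struct.pack_into(MSG_FMT, buf, offset, msg_id, ts, price)

def read_msg(buf, offset):
    # Decode in place from the shared buffer.
    return struct.unpack_from(MSG_FMT, buf, offset)

shm = shared_memory.SharedMemory(create=True, size=MSG_SIZE)
try:
    write_msg(shm.buf, 0, 42, 1_700_000_000, 99.5)
    msg_id, ts, price = read_msg(shm.buf, 0)
    print(msg_id, ts, price)  # 42 1700000000 99.5
finally:
    shm.close()
    shm.unlink()
```

A second process opening the same segment by name could read the bytes without any copying or serialisation beyond the fixed layout, which is where the efficiency of this style of IPC comes from.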


Part of test-driven design is using the tests to drive out a sensible and easy to use interface for the system under test, and to make it testable from the get-go (not too much non-determinism, threading issues, whatever it is). It's well known that you should likely _delete these tests_ once you've written higher level ones that are more testing behaviour than implementation! But the best and quickest way to get to having high quality _behaviour_ tests is to start by using "implementation tests" to make sure you have an easily testable system, and then go from there.


>It's well known that you should likely _delete these tests_ once you've written higher level ones that are more testing behaviour than implementation!

Building tests only to throw them away is the design equivalent of burning stacks of $10 notes to stay warm.

As a process it works. It's just 2x easier to write behavioral tests first and thrash out a good design later under its harness.

It mystifies me that doubling the SLOC of your code by adding low level tests only to trash them later became seen as a best practice. It's so incredibly wasteful.


> As a process it works. It's just 2x easier to write behavioral tests first and thrash out a good design later under its harness.

I think this “2x easier” only applies to developers who deeply understand how to design software. A very poorly designed implementation can still pass the high level tests, while also being hard to reason about (typically poor data structures) and debug, having excessive requirements for test setup and tear down due to lots of assumed state, and be hard to change, and might have no modularity at all, meaning that the tests cover tens of thousands of lines (but only the happy path, really).

Code like this can still be valuable of course, since it satisfies the requirements and produces business value, however I’d say that it runs a high risk of being marked for a complete rewrite, likely by someone who also doesn’t really know how to design software. (Organizations that don’t know what well designed software looks like tend not to hire people who are good at it.)


"Test driven design" in the wrong hands will also lead to a poorly designed non modular implementation in less skilled hands.

I've seen plenty of horrible unit test driven developed code with a mess of unnecessary mocks.

So no, this isn't about skill.

"Test driven design" doesnt provide effective safety rails to prevent bad design from happening. It just causes more pain to those who use it as such. Experience is what is supposed to tell you how to react to that pain.

In the hands of junior developers test driven design is more like test driven self flagellation in that respect: an exercise in unnecessary shame and humiliation.

Moreover, since it prevents those tests with a clusterfuck of mocks from operating as a reliable safety harness (because they fail when implementation code changes, not in the presence of bugs), it actively inhibits iterative exploration towards good design.

These tests have the effect of locking in bad design because keeping tightly coupled low level tests green and refactoring is twice as much work as just refactoring without this type of test.


> I've seen plenty of horrible unit test driven developed code with a mess of unnecessary mocks.

Mocks are an anti-pattern. They are a tool that either by design or unfortunate happenstance allows and encourages poor separation of concerns, thereby eliminating the single largest benefit of TDD: clean designs.


You asserted:

> … TDD is a "design practice" but I find it to be completely wrongheaded.

> The principle that tests that couple to low level code give you feedback about tightly coupled code is true but it does that because low level/unit tests couple too tightly to your code - I.e. because they too are bad code!

But now you’re asserting:

> "Test driven design" in the wrong hands will also lead to a poorly designed non modular implementation in less skilled hands.

Which feels like it contradicts your earlier assertion that TDD produces low-level unit tests. In other words, for there to be a “unit test” there must be a boundary around the “unit”, and if the code created by following TDD doesn’t even have module-sized units, then is that really TDD anymore?

Edit: Or are you asserting that TDD doesn’t provide any direction at all about what kind of testing to do? If so, then what does it direct us to do?


>"Test driven design" in the wrong hands will also lead to a poorly designed non modular implementation in less skilled hands.

>Which feels like it contradicts your earlier assertion that TDD produces low-level unit tests.

No, it doesn't contradict that at all. Test driven design, whether done optimally or suboptimally, produces low-level unit tests.

Whether the "feedback" from those tests is taken into account determines whether you get bad design or not.

Either way, I do not consider it a good practice. The person I was replying to was suggesting that it was a practice more suited to people with a lack of experience. I don't think that is true.

>Or are you asserting that TDD doesn’t provide any direction at all about what kind of testing to do?

I'm saying that test driven design provides weak direction about design and it is not uncommon for test driven design to still produce bad designs because that weak direction is not followed by people with less experience.

Thus I don't think it's a practice whose effectiveness is moderated by experience level. It's just a bad idea either way.


Thanks for clarifying.

I think this nails it:

> Whether the "feedback" from those tests is taken into account determines whether you get bad design or not.

Which to me was kind of the whole point of TDD in the first place; to let the ease and/or difficulty of testing become feedback that informs the design overall, leading to code that requires less set up to test, fewer dependencies to mock, etc.

I also agree that a lot of devs ignore that feedback. Just telling someone to "do TDD" is pointless unless you first make sure they know they need to strive for little to no test setup, few or no mocks, etc.

Overall I get the sense that a sizable number of programmers accept a mentality of “I’m told programming is hard, this feels hard so I must be doing it right”. It’s a mentality of helplessness, of lack of agency, as if there is nothing more they can do to make things easier. Thus they churn out overly complex, difficult code.


>Which to me was kind of the whole point of TDD in the first place; to let the ease and/or difficulty of testing become feedback that informs the design overall

Yes and that is precisely what I was arguing against throughout this thread.

For me, (integration) test driven development is about creating:

* A signal to let me know if my feature is working and easy access to debugging information if it is not.

* A body of high quality tests.

It is 0% about design, except insofar as the tests give me a safety harness for refactoring or experimenting with design changes.


Don't agree, though I think it's more subtle than "throw away the tests" - more "evolve them to a larger scope".

I find this particularly with web services, especially when the services are some form of stateless calculators. I'll usually start with tests that focus on the function at the native programming language level. Those help me get the function(s) working correctly. The code and tests co-evolve.

Once I get the logic working, I'll add on the HTTP handling. There's no domain logic in there, but there is still logic (e.g. mapping from json to native types, authentication, ...). Things can go wrong there too. At this point I'll migrate the original tests to use the web service. Doing so means I get more reassurance for each test run: not only that the domain logic works, but that the translation in & out works correctly too.

At that point there's no point leaving the original tests in place. They're just covering a subset of the E2E tests so provide no extra assurance.

I'm therefore with TFA in leaning towards E2E testing because I get more bang for the buck. There are still places where I'll keep native language tests, for example if there's particularly gnarly logic that I want extra reassurance on, or E2E testing is too slow. But they tend to be the exception, not the rule.


> At that point there's no point leaving the original tests in place. They're just covering a subset of the E2E tests so provide no extra assurance.

They give you feedback when something fails, by better localising where it failed. I agree that E2E tests provide better assurance, but tests are not only there to provide assurance, they are also there to assist you in development.


Starting low level and evolving to a larger scope is still unnecessary work.

It's still cheaper starting off building a playwright/calls-a-rest-api test against your web app than building a low level unit test and "evolving" it into a playwright test.

I agree that low level unit tests are faster and more appropriate if you are surrounding complex logic with a simple and stable API (e.g. testing a parser), but it's better to work your way down to that level when it makes sense, not start there and work your way up.


That’s not my experience. In the early stages, it’s often not clear what the interface or logic should be - even at the external behaviour level. Hence the reason tests and code evolve together. Doing that at native code level means I can focus on one thing: the domain logic. I use FastAPI plus pytest for most of these projects. The net cost of migrating a domain-only test to use the web API is small. Doing that once the underlying api has stabilised is less effort than starting with a web test.


I don't think I've ever worked on any project where they hadn't yet decided whether they wanted a command line app or a website or an android app before I started. That part is usually fixed in stone.

Sometimes lower level requirements are decided before higher level requirements.

I find that this often causes pretty bad requirements churn - when you actually get the customer to think about the UI or get them to look at one then inevitably the domain model gets adjusted in response. This is the essence of why BDD/example driven specification works.


What exactly is it wasting? Is your screen going to run out of ink? Even in the physical construction world, people often build as much or more scaffolding as the thing they're actually building, and that takes time and effort to put up and take down, but it's worthwhile.

Sure, maybe you can do everything you would do via TDD in your head instead. But it's likely to be slower and more error-prone. You've got a computer there, you might as well use it; "thinking aloud" by writing out your possible API designs and playing with them in code tends to be quicker and more effective.


>What exactly is it wasting?

Time. Writing and maintaining low level unit tests takes time. That time is an investment. That investment does not pay off.

Doing test driven development with high level integration tests also takes time. That investment pays dividends though. Those tests provide safety.

>Sure, maybe you can do everything you would do via TDD in your head instead. But it's likely to be slower and more error-prone.

It's actually much quicker and safer if you can change designs under the hood and you don't have to change any of the tests, because they validate all the behavior.

Quicker and safer = you can do more iterations on the design in the available time = a better design in the end.

The refactoring step of red, green, refactor is where the design magic happens. If the refactoring turns tests red again that inhibits refactoring.


> It's well known that you should likely _delete these tests_ once you've written higher level ones that are more testing behaviour than implementation!

Is it? I don't think I've ever seen that mentioned.


Put simply, doing TDD properly leads to sensible separation of concerns.


I did some research into this earlier this year, in the context of maybe trying to start a business to solve it - and this was the conclusion I came to. There are lots of threads here on HN about it too. It's a structural, market-wide issue where the primary service Ticketmaster provides is reputation laundering, and in return, large agents and promoters agree to continue to use Ticketmaster despite their reputation.

I don’t know what the solution is.


Like, don't go to overpriced concerts? How much more obvious a solution could there be? No one in that chain creates value, so there's no need for you to feed their greed.


> There are many instances I've encountered where two pieces of code coincided to look similar at a certain point in time. As the codebase evolved, so did the two pieces of code, their usage and their dependencies, until the similarity was almost gone

https://connascence.io/

