Hacker News | nanis's comments

> too much punctuation

I thought you were joking. ... After a while, I started expecting a comma after each and every word.


First time I heard about the Moomins. I thought this was about Mumins[1].

[1]: https://en.wikipedia.org/wiki/Mumin


The crossover waiting to happen


That is your twist, an unjustifiable generalization of words the author wrote about himself:

> "aging is a synonym of cognitive decline"

compared to:

> As I near 60, I’ve come to realize I simply don’t have the same mental sharpness or stamina I used to.

The author did not say anything about anyone else.

Synonym: https://www.bennetyee.org/http_webster.cgi?isindex=synonym&m...


This is why Firefox's changes are so frustrating[1].

[1]: https://news.ycombinator.com/item?id=43203096


The perfect opportunity to attract more market share, but instead they are shooting themselves in the foot at exactly the wrong time.


This is pure speculation, but what are the chances this change is simply an attempt to provide legal cover for what they might have started doing 50 versions ago?[1]

[1]: https://news.ycombinator.com/item?id=29082856


If that's the case, they should stop doing that, not give themselves the legal right to do it.


According to the tweet, Mozilla claimed

> “Does Firefox sell your personal data?”

> “Nope. Never have, never will.”

I do believe that "never" is a very, very clear statement (covering every possible future) that needs no legal cover.


Ah, but what you are reading as plain layman's English is actually a term of art in marketing that means "this will change as soon as it becomes more profitable to do that".


1 is 2 to the power 0 ... 0b0001

shifted left once, it becomes 2 to the power 1 ... 0b0010

shifted left twice, it becomes 2 to the power 2 ... 0b0100

shifted left three times, it becomes 2 to the power 3 ... 0b1000

etc until

shifted left 136_279_841 times, it becomes 2 to the power 136_279_841 ... 0b1000...many zeros...0000

subtract 1, it becomes

0b0111...many ones...1111
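
A minimal Python sketch of the same shift-and-subtract construction, for anyone who wants to poke at it (using a small exponent, since 2^136_279_841 would be tens of millions of digits):

    # p stands in for 136_279_841; everything else is the same idea
    p = 7
    print(bin(1 << p))        # 0b10000000  -> a one followed by p zeros
    print(bin((1 << p) - 1))  # 0b1111111   -> p ones
    print((1 << p) - 1 == 2**p - 1)  # True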


One funny thing about Mersenne primes is that, as a result of what you describe, they are exactly those primes whose binary representation consists of nothing but ones (and, necessarily, a prime number of them)!

The smallest Mersenne prime, three, is binary 11, while the next largest is seven (111), then 31 (11111), then 127 (1111111). The next candidate, 2047 (11111111111), is not prime.
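
A trivial Python check of those examples (naive trial division, which is fine at this size; the exponents are just the first few primes):

    def is_prime(n):
        if n < 2:
            return False
        d = 2
        while d * d <= n:
            if n % d == 0:
                return False
            d += 1
        return True

    for p in [2, 3, 5, 7, 11]:
        m = 2**p - 1
        print(p, m, bin(m)[2:], is_prime(m))
    # 3 (11), 7 (111), 31 (11111), 127 (1111111) are prime;
    # 2047 (11111111111) is not: 2047 = 23 * 89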


> the SSH certificates issued by the Cloudflare CA include a field called ValidPrinciples

Having implemented similar systems before, I was interested to read this post. Then I see this. Now I have to find out if that really is the field, if this was ChatGPT spellcheck, or something else entirely.


For the others: The correct naming is "principals".


Sigh. I'll get that fixed and figure out how that happened.


This was corrected to:

> ... SSH certificates issued by the Cloudflare CA include a field called valid_principals

which indicates it wasn't just the spelling of `principals`.


It depends... ssh-keygen -L displays the fields as Principals (which are set using the -n parameter) and internally a lot of the OpenSSH code talks about AuthorizedPrincipals...


> if (argc above 1)

I give up.


You're welcome!


> I am a simple sole, ... go back to the halcyon early days of the web before Netscape dropped the JS-bomb. You know HTML for the layout and CSS for the style.

I am not sure if this is intended as humor, but JavaScript came before CSS.


And "HTML for structure, CSS for style" is a philosophy that developed later. As is evidenced by early HTML tags like <small>, <center>, <b>, <i>, etc


I remember when CSS Zen Garden was showcasing what you could do with CSS, and browsers (well, "browser", singular, as there was basically only IE 6 back then) supported JavaScript and VBScript.


Back in 2008 we had a team building exercise to create a Zen Garden sample. Here was mine: https://prettydiff.com/zen/

In those days the three content columns vertically aligned to the same height cross-browser.


And it's soul, not sole. Unless the author is also a fish.


sorry - maybe we need AI smell checkers


Haha


I'm noted for my dry wit


I read that as "sole [proprietor]"


It seems JavaScript was first released, just internally, in May 1995 in a pre-alpha version of Netscape 2.0. It would not be publicly announced until December 1995. Netscape 2.0 didn't even come out until March 1996 and even then it was language version 1.0 which was extremely defective. The first version of the language that actually worked was JavaScript 1.1 that came out in August 1996. CSS on the other hand first premiered with IE3 that came out in August 1996.

* https://www.w3.org/Style/CSS/msie/

* https://webdevelopmenthistory.com/1995-the-birth-of-javascri...

The distinction either way is trivial, because at that time nobody was using either CSS or JavaScript as they required proprietary APIs. There was no DOM specification at that time.


yikes, I stand corrected...

JavaScript was created by Brendan Eich in just 10 days in May 1995 while he was working at Netscape Communications Corporation.

CSS (Cascading Style Sheets) was introduced later than JavaScript. The first CSS specification was published in December 1996 by Håkon Wium Lie and Bert Bos.

apologies


Hmm apparently it also came before DSSSL. Surprising.


Early in the A-B craze (optimal shade of blue nonsense), I was talking to someone high up with an online hotel reservation company who was telling me how great A-B testing had been for them. I asked him how they chose the stopping point/sample size. He told me experiments continued until they observed a statistically significant difference between the two conditions.

The arithmetic is simple and cheap. Understanding basic intro stats principles, priceless.


> He told me experiments continued until they observed a statistically significant difference between the two conditions.

Apparently, if you do the observing the right way, that can be a sound approach. From https://en.wikipedia.org/wiki/E-values:

“We say that testing based on e-values remains safe (Type-I valid) under optional continuation.”
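
A toy sketch of the simplest e-value construction, a likelihood-ratio test martingale, where peeking at every step does not break the Type-I guarantee (Python; the null rate p0, alternative q, and data-generating rate here are all made up for illustration):

    import random

    p0, q, alpha = 0.5, 0.6, 0.05   # null rate, alternative rate, error budget
    e = 1.0
    random.seed(0)
    for t in range(1, 10_001):
        x = 1 if random.random() < 0.6 else 0      # data actually drawn at rate 0.6
        e *= (q if x else 1 - q) / (p0 if x else 1 - p0)
        if e >= 1 / alpha:                          # Ville's inequality: checking every step is fine
            print(f"rejected H0 after {t} observations, e-value {e:.1f}")
            break

Under the null, the product has expectation 1 at every step, so the probability it ever crosses 1/alpha is at most alpha, no matter when you stop.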


This is correct. There's been a lot of interest in e-values and non-parametric confidence sequences in recent literature. It's usually referred to as anytime-valid inference [1]. Evan Miller explored a similar idea in [2]. For some practical examples, see my Python library [3] implementing multinomial and time-inhomogeneous Bernoulli / Poisson process tests based on [4]. See [5] for linear models / t-tests.

[1] https://arxiv.org/abs/2210.0194

[2] https://www.evanmiller.org/sequential-ab-testing.html

[3] https://github.com/assuncaolfi/savvi/

[4] https://openreview.net/forum?id=a4zg0jiuVi

[5] https://arxiv.org/abs/2210.08589


Did you link the thing that you intended to for [1]? I can't find anything about "anytime-valid inference" there.


Thanks for noting! This is the right link for [1]: https://arxiv.org/abs/2210.01948


Sounds like you already know this, but that's not great and will give a lot of false positives. In science this is called p-hacking. The rigorous way to use hypothesis testing is to calculate the sample size for the expected effect size and run only one test, once that sample size is reached. But this requires knowing the effect size.

If you are doing a lot of significance tests you need to adjust the p-level by dividing by the number of implicit comparisons (a Bonferroni correction), so e.g. only accept p < 0.001 if running one test per day.

Alternatively, just do Thompson sampling until one variant dominates.
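
A minimal sketch of Thompson sampling for two variants with Bernoulli rewards and Beta(1,1) priors (Python; the conversion rates are invented for the demo):

    import random

    true_rate = {"A": 0.10, "B": 0.12}
    wins = {"A": 0, "B": 0}
    losses = {"A": 0, "B": 0}

    random.seed(1)
    for _ in range(50_000):
        # Sample a plausible rate for each arm from its posterior, play the best one.
        arm = max(wins, key=lambda a: random.betavariate(wins[a] + 1, losses[a] + 1))
        if random.random() < true_rate[arm]:
            wins[arm] += 1
        else:
            losses[arm] += 1

    for a in wins:
        n = wins[a] + losses[a]
        print(a, n, round(wins[a] / max(n, 1), 4))
    # Over time nearly all traffic drifts to the better arm (B here).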


To expand, the p-value tells you significance (more precisely, the likelihood of the effect if there were no underlying difference). But if you observe it over and over again and pay attention to one value, you've subverted the measure.

Thompson/multi-armed bandit optimizes for outcome over the duration of the test, by progressively altering the treatment %. The test runs longer, but yields better outcomes while doing it.

It's objectively a better way to optimize, unless there is time-based overhead to the existence of the A/B test itself. (E.g. maintaining two code paths.)


I just wanted to affirm what you are doing here.

A key point here is that p-values optimize for detection of effects if you do everything right, which, as you point out, is not common.

> Thompson/multi-armed bandit optimizes for outcome over the duration of the test.

Exactly.


The p value is the risk of getting an effect specifically due to sampling error, under the assumption of perfectly random sampling with no real effect. It says very little.

In particular, if you aren't doing perfectly random sampling it is meaningless. If you are concerned about other types of error than sampling error it is meaningless.

A significant p-value is nowhere near proof of effect. All it does is suggestively wiggle its eyebrows in the direction of further research.


> likelihood of the effect if there were no underlying difference

By "effect" I mean "observed effect"; i.e. how likely are those results, assuming the null hypothesis.


Many years ago I was working for a large gaming company, and I was the one who developed a very efficient and cheap way to split any cluster of users into A/B groups. The company was extremely happy with how well that worked. However, I did some investigation on my own a year later to see how the business development people were using it and... yeah, pretty much what you said. They were literally brute-forcing different configurations until they (more or less) got the desired results.


Microsoft has a seed finder specifically aimed at avoiding a priori bias in experiment groups, but IMO the main effect is pushing whales (which are possibly bots) into different groups until the bias evens out.

I find it hard to imagine obtaining much bias from a random hash seed in a large group of small-scale users, but I haven't looked at the problem closely.
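
For context, deterministic bucketing of this sort is usually just a salted hash; a minimal sketch (Python; this is not Microsoft's actual system, and the seed-finding/bias-checking step described above is not shown):

    import hashlib

    def bucket(user_id: str, seed: str, n_groups: int = 2) -> int:
        # Same user + same seed always lands in the same group;
        # changing the seed reshuffles the whole assignment.
        h = hashlib.sha256(f"{seed}:{user_id}".encode()).hexdigest()
        return int(h, 16) % n_groups

    print(bucket("user-42", seed="exp-1"))   # deterministic for this seed
    print(bucket("user-42", seed="exp-2"))   # may differ under another seed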


We definitely saw bias, and it made experiments hard to launch until the system started pre-identifying unbiased population samples ahead of time, so the experiment could just pull pre-vetted users.


This is a form of "interim analysis" [1].

[1] https://en.wikipedia.org/wiki/Interim_analysis


And yet this is the default. As commonly implemented, A/B testing is an excellent way to look busy, and people will actively resist changing processes to make them more reliable.

I think this is not unrelated to the fact that if you wait long enough you can get a positive signal from a neutral intervention, so you can literally shuffle chairs on the Titanic and claim success. The incentives are against accuracy because nobody wants to be told that the feature they've just had the team building for 3 months had no effect whatsoever.


This is surely more efficient if you do the statistics right? I mean, I'm sure they didn't, but the intuition that you can stop once there's sufficient evidence is correct.


Bear in mind many people aren’t doing the statistics right.

I’m not an expert but my understanding is that it’s doable if you’re calculating the correct MDE based on the observed sample size, though not ideal (because sometimes the observed sample is too small and there’s no way round that).

I suspect the problem comes when people don’t adjust the MDE properly for the smaller sample. Tools help but you’ve gotta know about them and use them ;)

Personally I’d prefer to avoid this and be a bit more strict due to something a PM once said: “If you torture the data long enough, it’ll show you what you want to see.”
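
For what it's worth, the back-of-envelope version of "what MDE does my observed sample actually support" is small (Python; the baseline rate and counts below are made up, and this is the usual two-proportion normal approximation, not any particular tool's method):

    from statistics import NormalDist
    from math import sqrt

    def mde(n_per_group: int, baseline: float, alpha: float = 0.05, power: float = 0.8) -> float:
        # Smallest absolute lift detectable with the given power at a two-sided alpha.
        z = NormalDist().inv_cdf
        return (z(1 - alpha / 2) + z(power)) * sqrt(2 * baseline * (1 - baseline) / n_per_group)

    print(round(mde(10_000, baseline=0.05), 4))
    # ~0.0086, i.e. with 10k users per arm at a 5% baseline you can't
    # reliably see lifts smaller than about 0.9 percentage points.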


Perhaps he was using a sequential test.


Which company was this? Was it by chance SnapTravel?

