Hacker News | eoerl's comments

Optimization is post hoc here: you have to train first to be able to Huffman-encode, so it's not a pure format question.
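To make it concrete, a toy sketch (the int8 quantization and the per-weight symbols are just made up for illustration): the code table only exists once you have trained weights to count symbols over.

    import heapq
    from collections import Counter
    import numpy as np

    def huffman_code_lengths(symbols):
        # Build a Huffman tree bottom-up from symbol frequencies, return bits per symbol
        freqs = Counter(symbols)
        heap = [(f, i, {s: 0}) for i, (s, f) in enumerate(freqs.items())]
        heapq.heapify(heap)
        next_id = len(heap)
        while len(heap) > 1:
            fa, _, a = heapq.heappop(heap)
            fb, _, b = heapq.heappop(heap)
            merged = {s: depth + 1 for s, depth in {**a, **b}.items()}
            heapq.heappush(heap, (fa + fb, next_id, merged))
            next_id += 1
        return heap[0][2]

    weights = np.random.randn(10_000)                                   # stand-in for *trained* parameters
    q = np.clip(np.round(weights / 0.05), -127, 127).astype(np.int8)    # naive int8 quantization
    lengths = huffman_code_lengths(q.tolist())
    bits = sum(lengths[s] * n for s, n in Counter(q.tolist()).items())
    print(f"{bits / q.size:.2f} bits/weight after Huffman vs 8 for raw int8")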


In a lot of countries there are rules, for instance limits on spending or equal air time for all candidates. I don't know whether that's the case in Romania, but it is entirely possible to annul an election even if people voted "freely". I know that typically doesn't apply to the US, but there's a world outside of it.


These can also be used for machine learning, actually (see NVIDIA DALI for data loading, for instance).


We (The Eye Tribe folks) sold one at $99 years ago. $1k-3k is mostly lack of competition, I believe.


There are flags[1] for that indeed. It feels like half of the people commenting here don't know all that much about the topic they're commenting upon

1: https://pytorch.org/docs/stable/generated/torch.use_determin...
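For reference, a minimal sketch of what turning that on looks like (exact op coverage, and whether the cuBLAS env var is needed, depends on the PyTorch/CUDA version):

    import os
    import torch

    # some cuBLAS ops need this set before the CUDA context is created to be deterministic
    os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"

    torch.manual_seed(0)                       # fix the RNG state
    torch.use_deterministic_algorithms(True)   # raise on ops that have no deterministic impl
    torch.backends.cudnn.benchmark = False     # don't let cuDNN autotune to different kernels

    layer = torch.nn.Linear(16, 4)
    x = torch.randn(8, 16)
    y = layer(x)   # reproducible across runs, given the same seed and flags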


> It's not perfect but a group of aligned people in the same physical working space will just dominate a similar group spread apart that has to use chats & zoom to communicate. Management has got to be seeing this, in various forms, across multiple business segments.

There's no data on this; at the very least you could mention that it's only your personal impression?

IMO (and this is clearly a personal take) there are two competing effects:
- higher bandwidth and easier to align face to face
- more distractions, interruptions, more complicated to get things done

If you're in a business or position where you have no IP or nothing hard to do per se, you'll see the first one dominate. If you're somewhere with IP and competitive advantages through smarts, then I'd say (personal again) the second effect can come to dominate.

Google pulling a "no remote" move means to me that their competitive advantage in terms of engineering and smarts is not a priority + using the fact that the market swung back towards employers vs. employees. But not general comment about "this take is obviously so much better", this is just intellectual lazyness I believe


It is not typically possible to blend models like that, since the training process is (lateral) order insensitive, as far as the model goes.
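What I mean by order insensitive: hidden units within a layer can be permuted without changing the function at all, so two independently trained nets have no reason to line up neuron by neuron, and averaging their weights mixes unrelated units. Toy PyTorch sketch (shapes are arbitrary):

    import torch

    torch.manual_seed(0)
    net = torch.nn.Sequential(torch.nn.Linear(8, 16), torch.nn.ReLU(), torch.nn.Linear(16, 4))
    clone = torch.nn.Sequential(torch.nn.Linear(8, 16), torch.nn.ReLU(), torch.nn.Linear(16, 4))

    perm = torch.randperm(16)                 # shuffle the hidden units
    with torch.no_grad():
        clone[0].weight.copy_(net[0].weight[perm])
        clone[0].bias.copy_(net[0].bias[perm])
        clone[2].weight.copy_(net[2].weight[:, perm])
        clone[2].bias.copy_(net[2].bias)

    x = torch.randn(5, 8)
    print(torch.allclose(net(x), clone(x)))                # True: same function
    print(torch.allclose(net[0].weight, clone[0].weight))  # False: completely different weights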


I thought so too until I found that there is quite a bit of literature nowadays about "merging" weights, for example this one: https://arxiv.org/pdf/1811.10515.pdf and also the OpenCLIP paper.


Is that still the case when all models have a common ancestor (i.e. finetuned) and haven’t yet overfit on new data?


identical outputs, up to float computation shenanigans (not computed in the same order, strictly speaking)
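e.g. float addition isn't associative, so the same numbers summed in a different order can give different bits:

    print(0.1 + 0.2 + 0.3 == 0.3 + 0.2 + 0.1)   # False
    print(0.1 + 0.2 + 0.3, 0.3 + 0.2 + 0.1)     # 0.6000000000000001 0.6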


yep, same approach but it arrived 3 days later and there's no mention of the original PR (https://github.com/huggingface/diffusers/pull/532#issuecomme...), nice. Also the kernels used in that case (upstream FlashAttention) are not compatible with all Nvidia GPU generations, FYI (xformers' kernels cover a wider range and are generally faster, or just pull in FlashAttention's).
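If anyone wants to try the xformers path with a recent diffusers install, it's basically a one-liner (the model id below is just an example):

    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to("cuda")

    # route attention through xformers' memory-efficient kernels instead of the defaults
    pipe.enable_xformers_memory_efficient_attention()

    image = pipe("an astronaut riding a horse").images[0]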


did you even peek at the link? There's a PR on diffusers, and it's mentioned on the front page https://github.com/huggingface/diffusers/pull/532#issuecomme...

