Hacker News | dwrensha's comments

highly adept computer knowers explaining recent network exploits while sitting


Where do you see any mention of GPT?

The computer-assisted component of the Noperthedron proof is a reasonably small SageMath program that was (as far as I know) written by humans: https://github.com/Jakob256/Rupert

Perhaps you have confused this article with a recent unrelated announcement about a vibe-coded proof of an Erdős conjecture? https://borisalexeev.com/pdf/erdos707.pdf


Oops you're right! I read these both yesterday and they blended together in my memory by the time I made this comment this morning. I knew something felt "off".

Tangentially, I'll have to reconsider my position on long-but-lossy-context LLMs.



Last month, before this result came out, the question "Is Every Convex Polyhedron Rupert?" was added as a formal Lean statement to Google's Formal Conjectures repository:

https://github.com/google-deepmind/formal-conjectures/blob/1...

I wonder how feasible it would be to formalize this new proof in Lean.
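
For anyone following along, the property in question can be paraphrased like this (my wording, not necessarily how the repository states it):

    % Paraphrase of the Rupert property: P has a shadow that fits strictly
    % inside another of its shadows.
    \[
      \mathrm{Rupert}(P) \iff
        \exists\, R_1, R_2 \in SO(3),\ t \in \mathbb{R}^2 :\quad
        \pi(R_1 P) + t \subseteq \mathrm{int}\bigl(\pi(R_2 P)\bigr),
      \qquad \pi(x,y,z) := (x,y).
    \]

The question in the repository is whether this holds for every convex polyhedron; the new paper's answer is that it doesn't, with the Noperthedron as a counterexample.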


Interesting. My guess is that it's not prohibitively hard, and that someone will probably do it. (There may be a technical difficulty I don't know about, though.)

David Renshaw recently gave a formal proof in Lean that the triakis tetrahedron does have Rupert's property: https://youtu.be/jDTPBdxmxKw
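
For a positive result like that, the informal content of the proof is a single explicit configuration plus finitely many strict inequalities. A rough numpy/scipy sketch of that kind of check (my own floating-point illustration, not the Lean proof and not anyone's actual code) looks like:

    # Rotate two copies, take their shadows on the xy-plane, and test whether
    # one shadow (optionally shifted) lies strictly inside the convex hull of
    # the other.
    import numpy as np
    from scipy.spatial import ConvexHull
    from scipy.spatial.transform import Rotation


    def shadow(vertices, rotation):
        """Project the rotated vertices onto the xy-plane (drop the z coordinate)."""
        return rotation.apply(vertices)[:, :2]


    def strictly_inside(inner_pts, outer_pts, margin=1e-9):
        """True if every inner point is strictly inside the convex hull of outer_pts."""
        hull = ConvexHull(outer_pts)
        # Each hull facet gives (a, b, c) with a*x + b*y + c <= 0 for interior points.
        normals, offsets = hull.equations[:, :2], hull.equations[:, 2]
        return bool(np.all(inner_pts @ normals.T + offsets < -margin))


    def passes_through(inner_vertices, outer_vertices, r_inner, r_outer, shift):
        inner = shadow(inner_vertices, r_inner) + shift
        outer = shadow(outer_vertices, r_outer)
        return strictly_inside(inner, outer)


    if __name__ == "__main__":
        cube = np.array([[x, y, z] for x in (-1, 1) for y in (-1, 1) for z in (-1, 1)],
                        dtype=float)
        r = Rotation.from_euler("xyz", [0.3, 0.4, 0.5])  # arbitrary illustrative rotation
        # Sanity check only: a copy shrunk by 5% trivially fits through a hole
        # shaped like its own shadow.  A real Rupert proof exhibits a full-size
        # configuration and verifies the strict inequalities with exact arithmetic.
        print(passes_through(0.95 * cube, cube, r, r, shift=np.zeros(2)))  # True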


> David Renshaw recently gave a formal proof in Lean that the triakis tetrahedron does have Rupert's property

That's me!

This result appears to be significantly harder to formalize.

Steininger and Yurkevich's proof certificate is a 2.5GB tree that partitions the state space into 18 million cells and takes 30 hours to validate in SageMath.

Formalizing the various helper lemmas in the paper does seem achievable to me, but I suspect that applying them to all of the millions of cells as part of a single Lean theorem could present some significant engineering difficulties. I think it'd be a fun challenge!

If that turns out to be infeasible, an alternate approach might be: we could write a Lean proof that the 2.5GB tree faithfully encodes the original problem, while still delegating the validation of that tree to an external SageMath process. Such a formalization would at least increase our confidence that there are no math errors in the setup. A similar approach was taken by Bernardo Subercaseaux et al. in their recent paper formally verifying a SAT-solver encoding for the "empty hexagon number": https://arxiv.org/abs/2403.17370
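
Concretely, the top-level statement in that hybrid approach might have roughly this shape. Every name below is a placeholder invented for this comment (not real Mathlib or repo API), and the axioms are there only to keep the Lean snippet self-contained:

    axiom Polyhedron : Type
    axiom Noperthedron : Polyhedron
    axiom IsRupert : Polyhedron → Prop

    -- A Lean-level specification of what it means for a refutation tree to be a
    -- valid certificate for P.  Lean only *states* this; evaluating it on the
    -- actual 2.5 GB tree would remain the job of the external SageMath process.
    axiom RefutationTree : Type
    axiom ValidCertificate : RefutationTree → Polyhedron → Prop

    -- The part that would actually be proved in Lean: the encoding is faithful,
    -- i.e. any valid tree really does rule out the Rupert property.
    theorem certificate_sound (t : RefutationTree) (P : Polyhedron)
        (h : ValidCertificate t P) : ¬ IsRupert P := by
      sorry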


That sounds like the current proof is too brute-force, and too poorly understood by humans, for humans to be able to explain it to Lean?


Lean does make it feasible to shard out verification work: you can take a huge proof tree, chop it into independently proven branches (trivial if it really is a tree), and have those branches verified in parallel by a simple cluster. The feedback from each cluster task ("this theorem, i.e. the open goals/interactive state at the point where this branch was cut out of the full tree, is true; you don't need to know that it's proven by this particular branch") is then injected into the main kernel's verification cache (or those intermediate theorems are simply declared true), and in the write-up of the main tree you put an "apply this_theorem_we_cut_out_here" in place of each cut-out branch.
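
As a toy Lean illustration of the cut-out-branches idea, with placeholder propositions and `sorry` standing in for the branch proofs that the workers would actually supply:

    -- Each `branch` theorem would live in its own file and be checked by its
    -- own worker; the main proof only ever refers to the branch *statements*.
    axiom SubGoalA : Prop
    axiom SubGoalB : Prop
    axiom MainGoal : Prop
    axiom combine : SubGoalA → SubGoalB → MainGoal

    theorem branchA : SubGoalA := by sorry  -- verified by worker 1
    theorem branchB : SubGoalB := by sorry  -- verified by worker 2

    theorem main : MainGoal := combine branchA branchB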

The nice thing is that, as long as you verify the entire result and make sure your verifier orchestration doesn't have bugs of a "cheating" nature, you can let AI run pretty wild on transforming/translating/chopping a Lean proof tree, because the verifier is already presumed to be an oracle with no false positives.

E.g., here it could potentially help translate SageMath representations into Lean 4 representations, with the only downside that a failed verification in Lean could be due merely to an erroneous AI-assisted translation.

Overall, given the nature of proving that a polyhedron doesn't have Rupert's property, I'd think there should be fairly straightforward (if not actually trivial) ways of sharding the proof. The paper seems to talk of a 5-dimensional search space; more generally I'd think it's 8 dimensions to account for the translation through the proposed hole (still assuming you don't rotate the polyhedra as you're passing one through the other):

"attack direction (angle of the penetrated)" from any direction (3D; unit quaternion), while the penetrator is facing any direction (3D; unit quaternion), shifted sideways any amount normal to the direction of translation (2D; simple 2D point), valid at any translation/penetration depth (1D; simple distance/real), while cancelling one dimension worth of freedom because only mutual twist along the direction translation matters (not absolute twist).

There are some mirror/flip symmetries that each take a factor of 2 out, but that's insignificant as long as we keep the dimensions of our geometry fixed at 3.

Now, having thought about it a bit more to write this, I think it'd be mostly mechanical (of the automatable, brute-force type) once you translate the coarse proof structure and theorem definitions, because you're "just" sweeping 5 (or 8) degrees of freedom while partitioning the search space whenever your property definition hits a branch. A benefit of that is a possibly trivially parallel, flat certificate that's basically composed of two parts: (1) a list of intervals in the 5- or 8-dimensional search space that together cover the entire search space, and (2) for each listed interval, a branch-free verifiable statement (certificate) that the property definition applies in a definitionally uniform manner across said interval.
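
A hedged Python sketch of that two-part certificate and its verification loop (all names and formats here are made up for illustration, not the actual Steininger/Yurkevich certificate format):

    # Each cell is an axis-aligned box in the parameter space plus whatever data
    # lets us re-check, branch-free, that the property holds uniformly on the box.
    import numpy as np


    def covers(cells, search_box, tol=1e-12):
        """Cheap necessary check that the cells tile the search box: volumes add up.

        Assumes the cells have pairwise disjoint interiors; a real verifier would
        have to establish coverage rigorously (e.g. by recursive subdivision).
        """
        total = sum(np.prod(hi - lo) for lo, hi, _ in cells)
        return abs(total - np.prod(search_box[1] - search_box[0])) < tol


    def cell_certified(lo, hi, witness):
        """Placeholder for the per-cell, branch-free check.

        In the real proof this would be an interval-arithmetic computation valid
        for *every* parameter in the box; here `witness` is just a callable that
        stands in for that check.
        """
        return witness(lo, hi)


    def verify(cells, search_box):
        return covers(cells, search_box) and all(
            cell_certified(lo, hi, w) for lo, hi, w in cells
        )


    if __name__ == "__main__":
        # Tiny toy instance: a 2-D "search space" split into two cells, each with
        # a trivially-true witness, just to show the control flow.
        box = (np.zeros(2), np.ones(2))
        cells = [
            (np.array([0.0, 0.0]), np.array([0.5, 1.0]), lambda lo, hi: True),
            (np.array([0.5, 0.0]), np.array([1.0, 1.0]), lambda lo, hi: True),
        ]
        print(verify(cells, box))  # True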


The most annoying bit might be that they use different, though equivalent, definitions of the property, so you would also need to formalize the proof of the equivalence of definitions.


The IMO Grand Challenge is "formal to formal" -- a solver is given the problem specified in the Lean programming language, and must produce a solution in Lean. To see more concretely what this setup might look like, check out https://github.com/dwrensha/compfiles.
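
For example, the entry for IMO 1959 Problem 1 asks for a proof that 21n + 4 and 14n + 3 are coprime for every natural number n; the formal statement looks roughly like this (my paraphrase, possibly not the exact code in Compfiles):

    -- Statement only; the proof is omitted here.  Nat.gcd is from the Lean core library.
    theorem imo1959_p1 (n : Nat) : Nat.gcd (21 * n + 4) (14 * n + 3) = 1 := by
      sorry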

The AI MO prize is "informal to informal" -- a solver is given a problem in natural language and must produce a solution in natural language.

My belief is that the best way to get to "informal to informal" is to first solve "formal to formal", but not everyone thinks so.


"Informal to formal" is harder than "formal to formal". These puzzle problems are quite simple (for computers) if you have a formalization.

"Informal to informal" is so far snake oil.


> These puzzle problems are quite simple (for computers) if you have a formalization.

That may be true someday, but it's not yet! That's exactly what the IMO Grand Challenge is about, and nobody has gotten close to solving it.


Do you have formal problems with formal solutions (so, you know the formalization is sufficiently complete) that cannot be found by existing (powerful) computer search techniques?


About half of the problems in Compfiles have complete solutions. They are marked by the checkmarks in the list at https://dwrensha.github.io/compfiles/index.html .

As far as I know, based on published systems like LeanDojo [1] and Magnushammer [2], computers today can only solve a small handful of the very easiest of these problems (like maybe Imo1959P1).

[1] https://leandojo.org/

[2] https://arxiv.org/abs/2303.04488


> $5mn will be awarded to the first publicly-shared AI model

Thank you for the emphasis on openness!


I intend to continue work on capnproto-rust, at my own pace and according to my own priorities.

Are there any particular pain points that you want to call attention to?


If you're interested in computer-assisted backronym composition, please check out this interactive tool that I made, based on GPT-NeoX/Pythia models: https://github.com/dwrensha/acronymy-assistant

3-minute video explanation: https://youtu.be/LjOHnXRIp4Y


I've been happily using Pythia in Acronymy Assistant: https://github.com/dwrensha/acronymy-assistant

It's quite convenient to have a continuous range of model sizes. Usually I want "the largest model that fits on my GPU", but sometimes I want to trade between quality and performance, and Pythia makes that easy.
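
A sketch of that "largest model that fits" selection (not Acronymy Assistant's actual code; the per-checkpoint memory figures below are rough fp16 weight sizes that I'm assuming, not measured values):

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Approximate fp16 weight footprint in GiB per Pythia checkpoint (assumption).
    PYTHIA_SIZES_GIB = {
        "EleutherAI/pythia-70m": 0.2,
        "EleutherAI/pythia-160m": 0.4,
        "EleutherAI/pythia-410m": 0.9,
        "EleutherAI/pythia-1b": 2.1,
        "EleutherAI/pythia-1.4b": 2.9,
        "EleutherAI/pythia-2.8b": 5.7,
        "EleutherAI/pythia-6.9b": 13.8,
        "EleutherAI/pythia-12b": 23.8,
    }


    def pick_model(budget_gib: float) -> str:
        """Largest checkpoint whose (approximate) weights fit within the budget."""
        fitting = {k: v for k, v in PYTHIA_SIZES_GIB.items() if v <= budget_gib}
        return max(fitting, key=fitting.get)


    if __name__ == "__main__":
        if torch.cuda.is_available():
            # Leave ~20% headroom for activations and the KV cache.
            budget = torch.cuda.get_device_properties(0).total_memory / 2**30 * 0.8
        else:
            budget = 4.0  # arbitrary CPU fallback budget
        name = pick_model(budget)
        tokenizer = AutoTokenizer.from_pretrained(name)
        model = AutoModelForCausalLM.from_pretrained(name, torch_dtype=torch.float16)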


I tried this the other day, and I was also impressed! I had assumed that it would make an illegal move eventually, but it never did. I won the game, but it wasn't trivial. (I've been out of practice in chess for a while; my rating would probably be something like 1700.)

dwrensha vs GPT-4 (through the ChatGPT interface) 16 March 2023

1. d4 Nf6 2. c4 g6 3. Nc3 Bg7 4. e4 d6 5. f3 O-O 6. Be3 e5 7. d5 c6 8. Qd2 cxd5 9. Nxd5 Nxd5 10. Qxd5 Be6 11. Qxb7 Nd7 12. Qa6 Qc8 13. Qxd6 Rd8 14. O-O-O Nc5 15. Qxd8+ Qxd8 16. Rxd8+ Rxd8 17. Bxc5 Rc8 18. b4 a5 19. a3 axb4 20. axb4 Bf8 21. Bd3 Bxc5 22. bxc5 Rxc5 23. Ne2 Bxc4 24. Bxc4 Rxc4+ 25. Kd2 f5 26. Rc1 Rxc1 27. Nxc1 fxe4 28. fxe4 Kf7 29. Ke3 Ke6 30. Nd3 h5 31. h4 Kd6 32. Ne1 Kc5 33. Nf3 Kd6 34. Kd3 Ke6 35. Kc4 Kf6 36. Kd5 g5 37. hxg5+ Kg6 38. Kxe5 h4 39. Ke6 Kh5 40. Kf7 h3 41. g3 h2 42. Nxh2 Kxg5 43. e5 Kf5 44. e6 Ke5 45. e7 Kd6 46. e8=Q Kc7 47. Qe6 Kb7 48. Qd6 Ka7 49. Qb4 Ka6 50. Ke7 Ka7 51. Kd7 Ka8 52. Kc7 Ka7 53. Qb7#

Note that my move 14. O-O-O was a blunder, allowing 14. ... Bxc4, but my opponent instead replied with 14. ... Nc5, which handed me a decisive advantage.

I very nearly fell for a stalemate with 41. gxh3.

After 42. Nxh2, my opponent said: "At this point, it's clear that white has a winning advantage with a promoted queen soon to appear. With your material advantage and superior position, victory is nearly certain. Would you like to continue playing, or do you want to start a new game or discuss this game?" I said, "Let's continue!"

After 52. Kc7, my opponent said "I have no moves left and it's checkmate. Congratulations! You've won the game." I replied: "You do have a move: you can do 52. ... Ka7". My opponent then said, "Apologies for the confusion. You are correct. I'll play 52...Ka7. Your move." Then I typoed the final move as "53. Kb7#" (instead of "53. Qb7#"), and my opponent did not correct me: "You played 53. Kb7#. This time, it is indeed checkmate. Congratulations! You've won the game. If you'd like to play another game, analyze this one, or ask any other questions, feel free to let me know!"


ChatGPT always apologizes if you correct it and then tries to work with your correction. That makes it seem humble, when in fact it's just a brainless, probably halfway-hardcoded response.

