alexirpan's comments

alexirpan · on March 25, 2024

You can actually do better than O((N!)^2), you can get it in O(n^24^n) because you can reduce the state to finding the next best single player to send in given the unordered set of unused players.

Source: the part of the post literally a few paragraphs down from the quoted section, where I describe how to do this, acknowledge this is fast enough to brute force for real crew battles that run with small N, and then write code to do so ;)

jvanderbot · on March 25, 2024

> O(n^24^n)

is this a typo? That's enormous. Way bigger than factorial.

Let's assume it's a typo.

OK if I understand that correctly, that reduces the computational complexity at the expense of not looking forward far enough to preserve any optimal guarantees. Don't you want to keep in mind the future possibilities?

for example:

Suppose A beats C soundly (80%), and loses to D half the time.

Then suppose B has a 60% chance to beat anyone.

You just lost, and other team has C up, with D, D, D on deck. You have A, B, remaining.

Your maximize your one-step chance against C by playing A, but then have .5x.5x.5 chance of winning against remainig. Total odds: .8x.5x.5x.5 = .1.

Had you played B, you'd have .6x.6x.6x.6 = .12 at best (A does worse if B loses)

That's just something I dreamed up right now. The worst case could be much worse.

It's fair to say a nice POMDP solution is in order here, as that would probably solve the "realest" case.

alexirpan · on March 25, 2024

It's not a typo - specifically the time complexity is O(n^22^n2^n) = O(n^24^n). This is faster than your O(N!N!) proposal.
https://www.wolframalpha.com/input?i=plot+%28n%21%29%5E2+vs+...

To be more specific - you get to O(n^22^n*2^n) by doing dynamic programming, which lets you avoid recomputing extra work for considering future ordering possibilities. This approach still implicitly considers all possible future orderings and acts optimally with respect to that.

You can read the original post for more details - at a high level you are building a cache of win probabilities assuming optimal play, starting from the base case of 1-player crew battles, and extending it by 1 match per step of the recursion until you get back to the top-level problem. It is sufficient to only do computation 1 step ahead if you have a cache for how to act optimally for all n future steps.

jvanderbot · on March 25, 2024

Oh shit I read that as n^{24^n} I was like what are these kids smoking nowadays.

Yes dynamic programming is indeed faster than brute force and I think I can convince myself this is the same approach with a little thought.

I think everything we said here was correct (yes both of us) but I was responding to an incorrect understanding of your strategy (you aren't doing one step ahead you are indeed calculating total expectations).

The brute force was just to show that the overall problem size was small when n is 6, in spirit of the original comment, and that brute force was just fine.

Someone · on March 25, 2024

> Oh shit I read that as n^{24^n} I was like what are these kids smoking nowadays

That is the standard interpretation of x^y^z. Interpreting it as

  (x^y)^z

doesn’t make sense as that is equal to the simpler and shorter

  x^(yz)

(https://math.stackexchange.com/a/1633800)

jvanderbot · on March 25, 2024

Well, he was saying n^24^n = (n^2)(4^n) which makes sense when rendered in latex, but not here. Clearly, I saw n^(24^n)

reaperman · on March 25, 2024

The formatting along this entire thread is so bad due to accidental asterisk-induced italics that I cant follow much of it at all. It would help posters to learn that you can force a * without any risk of italics if you put a backslash in front of the asterisk to escape it.

You can also put four spaces in front of the line to automatically escape all formatting characters.

    *this has four spaces in front of it and no escape characters*

jvanderbot · on March 26, 2024

The authors would like to thank the reviewer for their insightful comments, while the edit window has passed for this particular thread, future work will make use of this trick.

alexirpan · on Feb 17, 2018

Author here. My intent was to say, "When someone I know asks me if RL is a good idea, the answer is no >= 70% of the time." The important missing piece is that it's conditioned on that person asking me in particular. I don't have the largest profile, so the people who ask me about RL are generally well-informed about RL and have some practical ML experience too. Within that set, 70% felt like the right lower bound.