This is a neat project, but with all due respect: I don't think the output can convincingly be called a "face". It would be interesting to see this attempted at a higher resolution.
Your comment makes me wonder whether we should think beyond techniques designed to recognize faces and instead ask how we would get a machine to learn not to recognize but to reproduce. I reckon going about it this way would also be instructive for improving recognition, since recognition and reproduction are closely related activities.
Parts of machine learning used to focus on density estimation / manifold learning for exactly this reason. It seems that supervised tasks (for sufficiently complicated and diverse problems, e.g. ILSVRC, MSCOCO) implicitly learn some portion of what you would want from a density estimator. That is enough to generalize to new but related tasks - see the huge body of work using pretrained VGG for {semantic segmentation, style, texture generation, depth/surface normal prediction}.
Learning a general density that is actually useful is quite difficult - you could be missing entire modes of the data distribution and never know it. You don't know what you don't know, after all. And there is still no clear evidence (in my opinion) that learning a density really helps generalization to a new task, especially when the end task is itself a supervised one.
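To make the missing-mode point concrete, here is a toy sketch (all numbers made up; a single maximum-likelihood Gaussian stands in for whatever density model you fit). The model happily converges, and nothing in the fit itself tells you a whole mode of the data got swallowed:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy bimodal data: a dominant mode near 0 and a rare mode near 8.
major = rng.normal(0.0, 1.0, size=990)
minor = rng.normal(8.0, 0.5, size=10)
data = np.concatenate([major, minor])

# Maximum-likelihood fit of a *single* Gaussian to everything.
mu, sigma = data.mean(), data.std()

def log_density(x):
    # Log-density of the fitted Gaussian at x.
    return -0.5 * ((x - mu) / sigma) ** 2 - np.log(sigma * np.sqrt(2 * np.pi))

# Points in the rare mode get drastically lower log-density than the
# dominant mode, yet the fit "succeeded" - nothing flags the miss.
print(log_density(0.0), log_density(8.0))
```

The fit looks fine by every internal measure; you only discover the missing mode if you already know to look near 8.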
Explicit density modeling seems to help most in semi-supervised / limited-data settings. But any time you can bring an enormous, related labeled dataset to bear on your supervised problem, that seems to win over any kind of density-estimation trick - compare nets trained from scratch on the tasks I mentioned to ones exploiting pretrained VGG models.
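The pretrained-features recipe itself is simple: freeze the backbone, train only a small head on your task. A minimal runnable sketch of those mechanics, with a fixed random projection standing in for the frozen VGG features so it runs without any downloads (the stand-in, the toy task, and all the sizes here are made up):

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in for a pretrained backbone: in practice this would be frozen
# VGG conv features; here a fixed random ReLU projection plays that role.
W_backbone = rng.normal(size=(2, 32))

def features(x):
    # Frozen feature extractor - never updated during training.
    return np.maximum(x @ W_backbone, 0.0)

# Tiny binary task: two Gaussian blobs in 2-D.
x0 = rng.normal(-2.0, 1.0, size=(100, 2))
x1 = rng.normal(+2.0, 1.0, size=(100, 2))
x = np.vstack([x0, x1])
y = np.concatenate([np.zeros(100), np.ones(100)])

# Only a linear head is trained on top of the frozen features
# (plain logistic-regression gradient descent).
w = np.zeros(32)
f = features(x)
for _ in range(300):
    p = 1.0 / (1.0 + np.exp(-f @ w))
    w -= 0.1 * f.T @ (p - y) / len(y)

acc = ((f @ w > 0) == y).mean()
print(acc)
```

Swap `features` for a real frozen backbone and the rest of the recipe is unchanged, which is exactly why the pretrained-VGG approach is so easy to bring to new tasks.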
If we really want to get into "what is better", the state of the art has recently moved significantly past this (paper [1] and blog [2]), not to mention the recent ICLR submissions in this realm [3][4]. GAN training is very tricky to get working in practice, especially on new problems, though conditional GANs seem a bit more stable to train than their unconditional relatives.
This blog deserves credit for thinking differently about a problem and getting something useful out of it.
I think the goal of this inversion was to inspect what the cascade is "looking at" in some sense - and for that it does a great job! Inverting a lossy classification feature is nearly always going to look worse than something designed from the ground up to be a generative model.
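The lossiness is the whole story: many different inputs map to the same feature vector, so even a perfect inversion can only pick one of them. A minimal sketch of feature inversion by gradient descent on the input (a deliberately simplified linear map stands in for the cascade's features; sizes and learning rate are made up):

```python
import numpy as np

rng = np.random.default_rng(2)

# A lossy "feature": 16-dim input squashed down to 4 dims.
W = rng.normal(size=(4, 16))

def feat(x):
    return W @ x

x_true = rng.normal(size=16)
target = feat(x_true)

# Invert by gradient descent on the input: find *an* x whose
# features match the target.
x = np.zeros(16)
for _ in range(5000):
    grad = W.T @ (feat(x) - target)   # gradient of 0.5 * ||feat(x) - target||^2
    x -= 0.01 * grad

feat_err = np.abs(feat(x) - target).max()    # features match almost exactly...
input_err = np.abs(x - x_true).max()         # ...but the input does not
print(feat_err, input_err)
```

The recovered `x` reproduces the features essentially perfectly yet differs from `x_true`, because a 12-dimensional null space of inputs is invisible to the feature. A real classification cascade throws away far more than a linear map does, so it's no surprise the reconstructed "face" looks odd - the surprise is that it's recognizable at all.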