My guess at a catch would be that it focuses ONLY the wavelengths that you select for; the others might be diffused or given wildly different focal points. This might mean that for certain types of photography you're limited in what you pick up (e.g., if you've lit a scene with monochromatic light at a wavelength other than one of the selected ones).
They did say in the article that they hope to produce devices that work for a range of wavelengths. That would be great, because 3 wavelengths would not be sufficient for capturing natural images. We can approximate the cone responses to natural images with 3 components, which is why RGB encoding and display work for showing images to humans. But in capture, you want to receive broad-spectrum light at the detector, and then encode it to RGB.
Neither the sensor elements in a camera nor the pixels in a monitor are selective for a single wavelength; each responds across a spectrum. So we don't actually approximate cone response that way. The fact that we store the channel values as three integers does not change this.
You're right that monitor pixels aren't individual wavelengths, but the spectrum of a reproduced image can be very different from the spectrum of the natural image. My point was just that you don't want your flat lens harshly filtering that natural image by only operating at 3 highly tuned wavelengths -- that's not going to be enough to capture natural images, despite the fact that we can reproduce them with 3 components.
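The capture-vs-reproduction distinction here is essentially metamerism: very different spectra can produce identical channel values. A minimal sketch of that, using made-up Gaussian channel sensitivities (illustrative only, not real CIE or sensor data) -- a broad spectrum and a three-line spectrum yield the same "RGB" readings, which is why 3 primaries suffice for display even though a lens/sensor should pass broad-spectrum light:

```python
import numpy as np

# Wavelength grid in nm and hypothetical Gaussian channel sensitivities
# (these curves are made up for illustration, not real cone/sensor data).
wl = np.arange(400.0, 701.0, 1.0)

def gaussian(center, width):
    return np.exp(-0.5 * ((wl - center) / width) ** 2)

sens = np.stack([gaussian(600, 40),   # "R"-like channel
                 gaussian(550, 40),   # "G"-like channel
                 gaussian(450, 40)])  # "B"-like channel

# A broad, flat spectrum standing in for natural light.
broad = np.ones_like(wl)
rgb_broad = sens @ broad  # each channel integrates sensitivity * spectrum

# Build a spiky spectrum: three narrow lines whose intensities are solved
# so the channel readings match the broad spectrum exactly.
lines = [450.0, 550.0, 600.0]
A = np.array([[sens[c, wl == l][0] for l in lines] for c in range(3)])
weights = np.linalg.solve(A, rgb_broad)

spiky = np.zeros_like(wl)
for l, w in zip(lines, weights):
    spiky[wl == l] = w
rgb_spiky = sens @ spiky

# Different spectra, identical channel values: a metameric pair.
assert np.allclose(rgb_broad, rgb_spiky)
assert not np.allclose(broad, spiky)
```

The flip side is the point above: a lens that only transmits those three narrow lines would collapse every spectrum onto them up front, throwing away the broad-spectrum information the detector needs before it ever gets encoded.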
I'm guessing there's a catch, but I would love to be wrong.