Since the parallelism model and abstraction already have to support model, data, or hybrid partitioning, as well as synchronous, asynchronous, or hybrid training, it should be straightforward to extend the system to a GPU cluster. However, training is only required periodically, and if it can be done just as efficiently on existing clusters, why not use those?
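
To make the kind of abstraction being described a bit more concrete, here is a minimal, hypothetical sketch of how a parallelism configuration covering those axes might be expressed; the names `Partitioning`, `SyncMode`, and `ParallelismConfig` are illustrative assumptions, not part of the system itself:

```python
from dataclasses import dataclass
from enum import Enum


class Partitioning(Enum):
    MODEL = "model"    # split the model's parameters across workers
    DATA = "data"      # replicate the model, split the training data
    HYBRID = "hybrid"  # combine both along different dimensions


class SyncMode(Enum):
    SYNCHRONOUS = "sync"    # workers step in lockstep each iteration
    ASYNCHRONOUS = "async"  # workers push/pull updates independently
    HYBRID = "hybrid"       # e.g. sync within a group, async across groups


@dataclass
class ParallelismConfig:
    """Describes how one training job is partitioned and synchronized."""
    partitioning: Partitioning
    sync_mode: SyncMode
    num_workers: int
    device: str = "cpu"  # could be "gpu" if the backend were extended to it


# Example: data-parallel, asynchronous training on a 32-worker CPU cluster.
config = ParallelismConfig(
    partitioning=Partitioning.DATA,
    sync_mode=SyncMode.ASYNCHRONOUS,
    num_workers=32,
)
```

Under such an abstraction, targeting GPUs would largely amount to adding another device backend, which is why the extension seems easy; the question is whether the periodic training workload justifies it over the existing cluster.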