I think rsync has an option to copy everything except the file contents. Combine that with 'find' running 'touch' and it should be fine: first create the blank files, then copy the attributes onto them.
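If the exact rsync option proves hard to pin down, the same two steps (create blank files, then copy attributes onto them) can be sketched in Python. This is only a rough sketch under some assumptions: the function name `mirror_without_contents` is made up for illustration, symlinks are skipped, and `shutil.copystat` copies permission bits and timestamps but not ownership.

```python
import os
import shutil


def mirror_without_contents(src_root, dst_root):
    """Recreate the tree under src_root at dst_root, with every file left empty.

    Hypothetical helper: copies permissions and timestamps only, not ownership.
    """
    # Walk bottom-up so a directory's timestamps are copied only after
    # everything inside it has been created.
    for dirpath, dirnames, filenames in os.walk(src_root, topdown=False):
        rel = os.path.relpath(dirpath, src_root)
        dst_dir = os.path.normpath(os.path.join(dst_root, rel))
        os.makedirs(dst_dir, exist_ok=True)
        for name in filenames:
            src_file = os.path.join(dirpath, name)
            if os.path.islink(src_file):
                continue  # symlinks are skipped in this sketch
            dst_file = os.path.join(dst_dir, name)
            open(dst_file, "w").close()  # blank file, like 'touch'
            shutil.copystat(src_file, dst_file)  # mode bits and timestamps
        shutil.copystat(dirpath, dst_dir)
```

Calling `mirror_without_contents("/path/to/source", "/path/to/copy")` (both paths are placeholders) should leave a tree of zero-byte files whose modification times and permissions match the originals.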
You don't need libraries to train word vectors for every language; you can just load precomputed vectors. To measure the similarity between two vectors you only need 3 lines of code (it's a simple sum of products, i.e. a dot product, over a couple hundred real values).
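To make the "sum of products" concrete, here is a minimal library-free sketch. It assumes the precomputed vectors come as a plain-text file with one word per line followed by its space-separated values; that layout and the names `load_vectors`, `dot` and `cosine_similarity` are assumptions for illustration. The division by the vector lengths turns the raw dot product into cosine similarity, which is unnecessary if the vectors are already normalized.

```python
import math


def load_vectors(path):
    """Read 'word v1 v2 ... vn' lines into a dict (assumed file layout)."""
    vectors = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            if len(parts) < 3:
                continue  # skip blank lines or a two-field header line
            vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def dot(u, v):
    # The sum of products of matching components.
    return sum(a * b for a, b in zip(u, v))


def cosine_similarity(u, v):
    # Dot product divided by the product of the two vector lengths.
    return dot(u, v) / math.sqrt(dot(u, u) * dot(v, v))
```

With a dict returned by `load_vectors`, comparing two words is then `cosine_similarity(vectors["king"], vectors["queen"])`, where the two words are just placeholders that would have to be present in the file.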