Actually, very few workflows at film studios rely on batch renders these days. Far more important to artist productivity are interactive workflows: being able to work in context, at scale, in and between DCCs. Load speed was therefore a huge design consideration for USD, and integrating USD's GL imaging system, Hydra Stream, has been a game changer for many interactive workflows. Check out the paragraph "Maximize artistic iteration by minimizing latency" on the USD website's front page... or better yet, actually try out USD (usdview) with a modern allocator like jemalloc (since both ingestion and imaging in USD extensively leverage multithreading).
Although USD's native binary format compresses integer topology data, mesh (and curve) point, normal, and UV data is laid out in arrays that can be uploaded directly to a GPU from the file system with zero copies or processing.
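To make the zero-copy idea concrete, here is a minimal sketch in Python. It does not read USD's actual binary format; the file layout (a native uint32 point count followed by packed float32 xyz triples) and the names `write_points`, `bounding_box`, and `demo` are all hypothetical, made up for illustration. It only shows the general technique: memory-map the file and view the mapped bytes as a float array in place, without deserializing into a second buffer.

```python
import mmap
import os
import struct
import tempfile
from array import array

HEADER = struct.calcsize("I")  # toy header: one native uint32 point count
STRIDE = 3 * 4                 # three float32 components per point


def write_points(path, points):
    """Write the hypothetical toy layout: count, then packed float32 xyz."""
    flat = array("f", [c for p in points for c in p])
    with open(path, "wb") as f:
        f.write(struct.pack("I", len(points)))
        f.write(flat.tobytes())


def bounding_box(path):
    """Compute min/max per axis by reading floats straight from the mapping."""
    with open(path, "rb") as f:
        mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        try:
            (count,) = struct.unpack_from("I", mm, 0)
            # Slicing and casting a memoryview reinterprets the mapped bytes
            # as float32 values; no array data is copied at this point.
            pts = memoryview(mm)[HEADER:HEADER + count * STRIDE].cast("f")
            try:
                n = len(pts)
                lo = [min(pts[i] for i in range(a, n, 3)) for a in range(3)]
                hi = [max(pts[i] for i in range(a, n, 3)) for a in range(3)]
            finally:
                pts.release()  # the view must be released before unmapping
        finally:
            mm.close()
    return lo, hi


def demo():
    """Round-trip three points through a temporary file."""
    points = [(0.0, 1.0, 2.0), (-1.5, 4.0, 0.5), (3.0, -2.0, 8.0)]
    fd, path = tempfile.mkstemp(suffix=".bin")
    os.close(fd)
    try:
        write_points(path, points)
        return bounding_box(path)
    finally:
        os.remove(path)
```

In a real renderer the mapped region would be handed to the graphics API's buffer-upload call rather than iterated in Python; the point of the sketch is just that uncompressed, contiguous arrays make that hand-off possible without an intermediate decode step.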
It is true that there are several more software layers in USD between file-open and mesh-data-extraction, but, to the earlier "forward looking" comment, what you get for that are built-in features that let clients select variations with smart/sparse updating, plus serious scalability features for larger scenes. These are all features that have been added since USD's release, and there is still quite a lot of development underway.
No knocks on glTF, which does a great job at what it's designed for. But maybe Apple is looking at a different future?
I found the layer and streaming features interesting. It seems to me like that's the big win over glTF. Correct me if I'm wrong, but it seems like USD might support things like streaming LODs or even tiling in the long term.