I'd encourage you to give setfit a try, along with aggressively deduplicating your training set, finding top ~2500 clusters per label, and using setfit to train multilabel classifier on that.
Either way- would love to know what worked for you! :)