Nice! Will this work for Triton instances, i.e. can I swap the model loaded to the ...

AllenHW · on Aug 23, 2024

From what I gather, Triton assumes models are stored either in a remote repository or a local folder, and the model loading logic is all kept internal to the server.

Since we load models through pinned memory and manage the cache hierarchy ourselves, the server needs to at least make a call to our daemon. So we'd need to fork Triton Server. But hopefully it'd only take a few lines of change!

I've actually never used Triton Server myself - curious how you've found it so far, if you've used it. How does it compare to the alternatives, in your opinion?