I’ve been thinking about the following: couldn’t we also allow a dynamic sequence length and just-in-time compile the first time we get an input with a given sequence length? Further requests with that length would then reuse the already compiled program. Since the set of possible sequence lengths is finite, one could either pre-compile every sequence length or “warm up” the serving.
We can do that for sure, but it means you may compile the program several times, once per distinct sequence length. Still, it is something I will consider while exploring these ideas.