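RWKV removes the need to expand the input length. It is an RNN, so it carries a fixed-size state from one token to the next rather than attending over a fixed-size window. I believe it can be trained to operate on input and output lengths limited only by available data: https://github.com/BlinkDL/RWKV-LM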
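
To make the length argument concrete, here is a toy sketch of why a recurrent model such as the linked RWKV is not tied to a fixed input length. This is not RWKV's actual formulation: `step_chunk`, the `decay` parameter, and the decayed-average update are all hypothetical stand-ins I made up for illustration. The only point is that the state stays the same size however many tokens have been seen, and that it can be handed from one chunk to the next.

```python
def step_chunk(values, state=(0.0, 0.0), decay=0.9):
    """Toy recurrent mixer (hypothetical, not RWKV's real update rule).

    The state is two floats no matter how long the sequence is:
    a decayed sum of inputs and a decayed sum of weights.
    """
    num, den = state
    outputs = []
    for x in values:
        num = decay * num + x    # older inputs fade geometrically
        den = decay * den + 1.0  # matching normaliser
        outputs.append(num / den)
    return outputs, (num, den)


# A sequence far longer than any chunk we process at once.
seq = [float(i % 7) for i in range(10_000)]

# One pass over the whole sequence.
full_out, full_state = step_chunk(seq)

# The same sequence in chunks of 256, carrying the state across chunks.
state = (0.0, 0.0)
chunk_out = []
for start in range(0, len(seq), 256):
    out, state = step_chunk(seq[start:start + 256], state)
    chunk_out.extend(out)

# Chunking changes nothing, because the state holds all the history the model keeps.
assert chunk_out == full_out
assert state == full_state
```

Memory use here depends on the chunk size, not on the total sequence length. Whether a real model makes good use of very distant context is a separate question that depends on training.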