25.10.29
Issue resolved: the vllm instance has been reverted to 0.10.2, and the problem has not recurred since.
This is the same underlying issue as the earlier MFocus quality degradation, and I still have no complete explanation; my best guess is that it arises from the combination of sm120 GPUs and vllm's attention backend.
I'm not able to fix an issue at this layer of vllm myself, so it has been reported at https://github.com/vllm-project/vllm/issues/26930 . Multiple instances of this issue have been observed.
The service is back online, and you should no longer encounter this problem. Please note that we have force-reset roughly ten sessions that were corrupted or at risk of corruption.
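For operators hitting the same issue, the rollback above can be reproduced with a standard pip version pin. This is a minimal sketch, not the exact deployment commands used here: the model name is a placeholder, and whether overriding the attention backend actually avoids the bug on sm120 GPUs is an untested assumption.

```shell
# Pin vllm back to the known-good release mentioned above.
pip install "vllm==0.10.2"

# Assumption: while the upstream issue is open, forcing a different
# attention backend via VLLM_ATTENTION_BACKEND may sidestep the
# suspected sm120/backend interaction. Which value (if any) helps
# has not been verified.
VLLM_ATTENTION_BACKEND=XFORMERS vllm serve <your-model-name>
```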