25.3.1-Attention required!
Sustained heavy load has exposed an earlier design flaw in the LLM server node XP00. Its health has deteriorated and it urgently needs maintenance.
This round of maintenance is expected to start between March 1 and March 7 and to last at most 14 days. The exact start time will be announced separately.
From the time of this notice until maintenance begins, the service may be unstable.
The MAICA official service will be unavailable during maintenance; other services should be largely unaffected. Service will be restored by March 21 at the latest.
This maintenance will carry out the plan drawn up on 25.2.27, which reworks the PCIe communication topology of XP00. The change is expected to fix the problem for good and to bring a minor improvement in bandwidth performance.
Please understand how hard this is for us. I am doing roughly what ten people would handle in a normal organization, and the load is really too much for me.