25.4.4
Maintenance will finish in around 8 hours. But I have to admit it: this is the first project maintenance that has failed.
We originally aimed to train Qwen2Moe on the ToolBench dataset and then quantize it, for both quality and performance. But after several days of attempts:
- swift training -> automatic quantization failed. The swift developers admit their algorithm is at fault, but there's no word on when they'll patch it.
- swift training -> manual quantization failed. optimum.gptq / gptqmodel misidentifies the stacked expert layers of qwen2moe, and the exported model cannot run inference (see the first sketch below this list).
- GPTQ model -> QLoRA failed, apparently for the same reason: QLoRA does not recognize the quantized stacked layers (see the second sketch below this list).
- awq simply does not support qwen2moe.
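For reference, the manual GPTQ path looked roughly like this: a minimal sketch, assuming a hypothetical SFT output directory, with placeholder calibration settings (dataset "c4", group_size 128) rather than our actual ones.

```python
# Minimal sketch of the manual GPTQ quantization attempt.
# Paths are hypothetical; calibration dataset and group_size are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from optimum.gptq import GPTQQuantizer

sft_dir = "output/qwen2moe-toolbench-sft"  # hypothetical SFT output path
tokenizer = AutoTokenizer.from_pretrained(sft_dir)
model = AutoModelForCausalLM.from_pretrained(sft_dir, torch_dtype=torch.float16)

# The quantizer walks the model and swaps linear layers for quantized ones;
# this layer-discovery step is reportedly where the qwen2moe stacked expert
# layers get misidentified.
quantizer = GPTQQuantizer(bits=4, dataset="c4", group_size=128)
quantized = quantizer.quantize_model(model, tokenizer)
quantizer.save(quantized, "output/qwen2moe-toolbench-gptq-int4")
```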
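And the QLoRA attempt on the resulting checkpoint looked roughly like this: again a minimal sketch with hypothetical paths, and the target_modules list is an assumed qwen2-style default, not our actual config.

```python
# Minimal sketch of QLoRA on the GPTQ checkpoint.
# Paths are hypothetical; target_modules is an assumed qwen2-style default.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

quant_dir = "output/qwen2moe-toolbench-gptq-int4"  # hypothetical path
model = AutoModelForCausalLM.from_pretrained(quant_dir, device_map="auto")
model = prepare_model_for_kbit_training(model)

lora_cfg = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
# peft injects adapters by matching target_modules against the model's
# (quantized) linear layers; this wrapping step is reportedly where the
# quantized qwen2moe layers go unrecognized.
model = get_peft_model(model, lora_cfg)
model.print_trainable_parameters()
```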
I have tried almost every method available, and none of them worked end to end. Some issues claim the old AutoGPTQ library could handle this, but it's already deprecated and I'd rather not dig it up.
So I've no choice but to call this maintenance a failure. The cleanup:
- Reverted the MFocus model to the basic quantized Qwen2Moe, which may still be weak in reasoning.
- I'll keep watching swift's commits on this issue.
- Qwen3 and Qwen3Moe are expected to be released this month. We'll likely switch the core and agent models to Qwen3 along with DAA3, which should solve this once and for all.
During the maintenance we also made minor improvements to the overall project structure, omitted here.