Huawei and Hubei Mobile Validate AI Inference Acceleration: External KV Cache Boosts Throughput by Up to 372%
Huawei and Hubei Mobile have completed the first AI inference acceleration trial by a telecom operator. The setup paired OceanStor A800 storage and an Ascend A3 supernode with UCM to offload the KV Cache to PB-scale external storage, raising tokens per second (TPS) by up to 372% for long-context inference on the GLM-5.1 and MiniMax M2.5 models.
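
To illustrate why an external KV Cache tier can help long-context workloads, here is a minimal Python sketch of a two-tier prefix cache. It is not UCM's API or the trial's implementation. The names `TieredKVCache`, `block_keys`, and `reusable_prefix`, the block size, and the in-memory dictionary standing in for external storage are all assumptions made for illustration. The idea shown is that KV blocks evicted from a small fast tier are kept in a large external tier instead of being discarded, so a later request that shares the same prompt prefix can load them rather than recompute them.

```python
import hashlib
from collections import OrderedDict

BLOCK_SIZE = 4  # tokens per KV block (hypothetical; chosen small for the demo)


def block_keys(tokens, block_size=BLOCK_SIZE):
    """Return one chained hash per full block of tokens.

    Each key depends on every token up to the end of its block, so two
    prompts share a key only if they share that entire prefix.
    """
    keys = []
    h = hashlib.sha256()
    full = len(tokens) - len(tokens) % block_size
    for start in range(0, full, block_size):
        h.update(str(tokens[start:start + block_size]).encode())
        keys.append(h.hexdigest())
    return keys


class TieredKVCache:
    """Toy two-tier cache: a small LRU fast tier plus an unbounded external tier."""

    def __init__(self, fast_capacity_blocks):
        self.capacity = fast_capacity_blocks
        self.fast = OrderedDict()  # stands in for accelerator memory
        self.external = {}         # stands in for external storage

    def put(self, key, kv):
        self.fast[key] = kv
        self.fast.move_to_end(key)
        # Spill least recently used blocks to the external tier instead of dropping them.
        while len(self.fast) > self.capacity:
            old_key, old_kv = self.fast.popitem(last=False)
            self.external[old_key] = old_kv

    def get(self, key):
        if key in self.fast:
            self.fast.move_to_end(key)
            return self.fast[key]
        if key in self.external:
            kv = self.external[key]
            self.put(key, kv)  # promote back into the fast tier
            return kv
        return None


def reusable_prefix(cache, tokens, block_size=BLOCK_SIZE):
    """Count leading tokens whose KV blocks can be loaded instead of recomputed."""
    hit = 0
    for key in block_keys(tokens, block_size):
        if cache.get(key) is None:
            break
        hit += block_size
    return hit


if __name__ == "__main__":
    cache = TieredKVCache(fast_capacity_blocks=2)
    prompt = list(range(16))
    for key in block_keys(prompt):
        cache.put(key, b"kv-bytes")  # placeholder payload
    print(reusable_prefix(cache, prompt + [99, 100]))
```

In this sketch the fast tier holds only two blocks, yet a follow-up request that extends the same 16-token prompt can still reuse all four cached blocks because the older ones were spilled to the external tier. A real system would store actual key and value tensors and would have to weigh storage read latency against recomputation cost.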