Cisco | 2026-03-25
Tags: Technology Integration | Important | Medium | 90% Confidence

Cisco Validates Rapid Fine-tuning on Private AI Infrastructure with NVIDIA

Summary

Cisco IT partnered with NVIDIA to achieve 2-5 hour end-to-end embedding-model fine-tuning using the Nemotron RAG recipe on a single H200 GPU. The solution uses a 120B-parameter local LLM for synthetic data generation, eliminating manual labeling, and improves NDCG@1 by 7.3 absolute points. The result validates rapid, domain-specific retrieval optimization on private AI infrastructure.
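The contrastive fine-tuning stage referenced above typically optimizes an in-batch-negatives (InfoNCE) objective: each synthetic query is pulled toward its source passage and pushed away from the other passages in the batch. The following NumPy sketch is illustrative only; the function name and temperature value are assumptions, not Cisco's or NVIDIA's actual implementation:

```python
import numpy as np

def info_nce_loss(query_emb, doc_emb, temperature=0.05):
    """In-batch-negatives contrastive (InfoNCE) loss.

    Row i of query_emb pairs with row i of doc_emb; every other
    row in the batch serves as a negative for that query.
    """
    # L2-normalize so dot products become cosine similarities.
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    d = doc_emb / np.linalg.norm(doc_emb, axis=1, keepdims=True)
    logits = q @ d.T / temperature  # (batch, batch) similarity matrix

    # Cross-entropy against the diagonal (the true query-passage pairs).
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))

rng = np.random.default_rng(0)
batch = rng.normal(size=(8, 16))
# Perfectly aligned pairs yield a near-zero loss; unrelated pairs a larger one.
aligned = info_nce_loss(batch, batch)
random_pairs = info_nce_loss(batch, rng.normal(size=(8, 16)))
```

Minimizing this loss is what shapes the embedding space so that domain-specific queries land near their supporting documents at retrieval time.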

Key Takeaways

Cisco IT evaluated embedding model fine-tuning using NVIDIA's Nemotron RAG recipe, comprising five stages: synthetic data generation, data preparation, contrastive fine-tuning, BEIR evaluation, and ONNX model export.
Experiments completed in 2-5 hours on a single NVIDIA H200 143GB GPU within a Cisco AI Pod (Cisco UCS 885A system). The workflow used a 120B-parameter local LLM for synthetic data generation, avoiding manual labeling and external API costs.
Fine-tuning NVIDIA's 1B-parameter NV-EmbedQA model on 925 documents improved NDCG@1 by 7.1-7.3 absolute points (9.9%-11.1% relative), Recall@10 by up to 6.8 points (8.5%), and MAP@10 by up to 6.5 points (9.7%).
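The retrieval metrics cited above (NDCG@k, Recall@k) are standard BEIR-style measures and can be computed directly. A minimal sketch with hypothetical helper names, shown with binary relevance for simplicity:

```python
import math

def ndcg_at_k(ranked_relevance, k):
    """NDCG@k for one query. ranked_relevance lists the graded
    relevance of each result in the order the system returned them."""
    dcg = sum(rel / math.log2(i + 2) for i, rel in enumerate(ranked_relevance[:k]))
    # Ideal DCG: the same relevance grades sorted best-first.
    ideal = sorted(ranked_relevance, reverse=True)
    idcg = sum(rel / math.log2(i + 2) for i, rel in enumerate(ideal[:k]))
    return dcg / idcg if idcg > 0 else 0.0

def recall_at_k(retrieved_ids, relevant_ids, k):
    """Fraction of the relevant documents found in the top-k results."""
    hits = len(set(retrieved_ids[:k]) & set(relevant_ids))
    return hits / len(relevant_ids) if relevant_ids else 0.0
```

For intuition: with binary relevance, a 7.3-absolute-point NDCG@1 gain means, for example, moving from 0.660 to 0.733, i.e. the top-ranked result is correct noticeably more often across the evaluation queries.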

Why It Matters

Validates the technical feasibility of rapid fine-tuning on private AI infrastructure, strengthens the Cisco AI Pod's value proposition for enterprise AI deployment, and reinforces the industry shift toward localized AI optimization....

Source: Cisco Blog