Cisco
2026-03-25
Technology Integration | Impact: Important | Strength: Medium | Conf: 90%

Cisco Validates Rapid Fine-tuning on Private AI Infrastructure with NVIDIA

Summary

Cisco IT partnered with NVIDIA to achieve 2-5 hour end-to-end embedding model fine-tuning using the Nemotron RAG recipe on a single H200 GPU. The solution uses a 120B-parameter local LLM for synthetic data generation, removing the need for manual labeling, and improves NDCG@1 by up to 7.3 absolute points. The result validates rapid domain-specific retrieval optimization on private AI infrastructure.
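The core training stage of the recipe is contrastive fine-tuning: the embedding model learns to score a query's true (synthetically generated) passage above other passages. As a minimal, self-contained illustration of that objective, the sketch below implements an InfoNCE-style loss in pure Python. This is an assumption-laden sketch of the general technique, not NVIDIA's recipe code; the function names and toy vectors are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def info_nce_loss(query, positive, negatives, temperature=0.05):
    """InfoNCE-style contrastive loss: pull the query embedding toward its
    positive passage and push it away from negative passages."""
    sims = [cosine(query, positive)] + [cosine(query, n) for n in negatives]
    logits = [s / temperature for s in sims]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - m) for l in logits]
    return -math.log(exps[0] / sum(exps))

# Toy 2-D embeddings: the positive passage is close to the query,
# the negative is orthogonal to it.
q = [1.0, 0.0]
pos = [0.9, 0.1]
neg = [0.0, 1.0]

loss_good = info_nce_loss(q, pos, [neg])   # correct pairing: low loss
loss_bad = info_nce_loss(q, neg, [pos])    # swapped pairing: high loss
```

During fine-tuning, minimizing this loss over many (query, positive, negatives) triples reshapes the embedding space so that domain-specific queries retrieve the right documents, which is what the NDCG and recall gains reported below measure.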

Key Takeaways

Cisco IT evaluated embedding model fine-tuning using NVIDIA's Nemotron RAG recipe, comprising five stages: synthetic data generation, data preparation, contrastive fine-tuning, BEIR evaluation, and ONNX model export.
Experiments completed in 2-5 hours on a single NVIDIA H200 141GB GPU within a Cisco AI Pod (Cisco UCS 885A system), using a 120B-parameter local LLM for synthetic data generation with no manual labeling or external API costs.
Fine-tuning NVIDIA's 1B-parameter NV-EmbedQA model on 925 documents improved NDCG@1 by 7.1-7.3 absolute points (9.9%-11.1% relative), Recall@10 by up to 6.8 points (8.5% relative), and MAP@10 by up to 6.5 points (9.7% relative).
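To make the reported metrics concrete, here are minimal binary-relevance implementations of NDCG@k, Recall@k, and MAP@k, the metric family used in BEIR-style retrieval evaluation. This is an illustrative sketch, not the evaluation harness Cisco or NVIDIA used; note that with binary relevance, NDCG@1 reduces to top-1 accuracy averaged over queries.

```python
import math

def ndcg_at_k(ranked, relevant, k):
    """Binary-relevance NDCG@k: DCG of the ranking divided by the ideal DCG."""
    dcg = sum(1 / math.log2(i + 2)
              for i, d in enumerate(ranked[:k]) if d in relevant)
    ideal = sum(1 / math.log2(i + 2) for i in range(min(k, len(relevant))))
    return dcg / ideal if ideal else 0.0

def recall_at_k(ranked, relevant, k):
    """Fraction of the relevant documents retrieved in the top k."""
    return len(set(ranked[:k]) & set(relevant)) / len(relevant)

def map_at_k(ranked, relevant, k):
    """Average precision at k: mean of precision at each relevant rank."""
    hits, precisions = 0, []
    for i, d in enumerate(ranked[:k]):
        if d in relevant:
            hits += 1
            precisions.append(hits / (i + 1))
    return sum(precisions) / min(len(relevant), k) if precisions else 0.0

# Hypothetical ranking for one query: relevant docs appear at ranks 2 and 3.
ranked = ["d3", "d1", "d7"]
relevant = {"d1", "d7"}
```

For this toy query, `ndcg_at_k(ranked, relevant, 1)` is 0.0 (the top hit is irrelevant), while `recall_at_k(ranked, relevant, 3)` is 1.0. An absolute improvement of 7.3 NDCG@1 points means the averaged per-query score rose by 0.073 on the 0-1 scale.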

Why It Matters

Validates the technical feasibility of rapid fine-tuning on private AI infrastructure, strengthens the Cisco AI Pod's value proposition for enterprise AI deployment, and reinforces the industry shift toward localized AI optimization.
Source: Cisco Blog
