NVIDIA lays out a Nemotron RAG document pipeline from PDFs to citations
NVIDIA published a Nemotron RAG workflow that turns complex PDFs into structured, multimodal retrieval data and citation-backed answers. The guide details the models, infrastructure, and prerequisites for a production-ready document pipeline.
NVIDIA outlined a Nemotron-based document processing pipeline for multimodal RAG that converts complex PDFs into grounded, cited answers.
- The workflow starts with the NeMo Retriever (nv-ingest) stack, using GPU-accelerated extraction that outputs JSON with text chunks, table markdown, and chart images.
- Multimodal retrieval uses `nvidia/llama-nemotron-embed-vl-1b-v2` for embeddings and `nvidia/llama-nemotron-rerank-vl-1b-v2` for cross-encoder reranking; embeddings are 2,048-dimensional vectors per item.
- Answer generation is wired to the `nvidia/llama-3.3-nemotron-super-49b-v1.5` endpoint on NVIDIA NIM, with a separate Nemotron OCR endpoint available for document extraction.
- The reference setup calls for Python 3.10–3.12 (tested on 3.12), an NVIDIA GPU with at least 24 GB VRAM, about 250 GB of disk space, and an NVIDIA API key.
- Example dependencies include `nv-ingest` 26.1.1, `nv-ingest-api` 26.1.1, `nv-ingest-client` 26.1.1, `milvus-lite` 2.4.12, and `openai` 1.51.0 or newer.
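The retrieval step described above can be sketched in isolation. This is a minimal illustration, not NVIDIA's implementation: the random arrays stand in for the 2,048-dimensional vectors that the `nvidia/llama-nemotron-embed-vl-1b-v2` endpoint would return for real chunks, and cosine similarity over those vectors picks the candidates that a cross-encoder reranker would then reorder.

```python
import numpy as np

EMBED_DIM = 2048  # per the guide, each embedded item is a 2,048-dimensional vector

def top_k_chunks(query_emb: np.ndarray, chunk_embs: np.ndarray, k: int = 3) -> list[int]:
    """Return indices of the k chunks most similar to the query by cosine similarity.

    In the real pipeline these vectors come from the embedding NIM endpoint and
    the shortlist would be passed to the reranker; here the embeddings are
    placeholders so only the ranking step is shown.
    """
    q = query_emb / np.linalg.norm(query_emb)
    c = chunk_embs / np.linalg.norm(chunk_embs, axis=1, keepdims=True)
    scores = c @ q                       # cosine similarity of each chunk vs. the query
    return np.argsort(scores)[::-1][:k].tolist()

# Toy demonstration with random stand-in embeddings.
rng = np.random.default_rng(0)
chunks = rng.normal(size=(10, EMBED_DIM))
query = chunks[4] + 0.01 * rng.normal(size=EMBED_DIM)  # near-duplicate of chunk 4
print(top_k_chunks(query, chunks))  # chunk 4 ranks first
```

In production the shortlist from this similarity search would be re-scored by `nvidia/llama-nemotron-rerank-vl-1b-v2`, which reads query and chunk together and so corrects for what the bi-encoder embeddings miss.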
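For the generation step, a citation-backed answer can be sketched with the `openai` client the dependency list names, since NIM exposes an OpenAI-compatible API. The prompt wording, the `[n]` citation markers, and the `integrate.api.nvidia.com` base URL are assumptions for illustration, not the exact prompt or configuration from NVIDIA's guide.

```python
import os

def build_cited_prompt(question: str, chunks: list[str]) -> str:
    """Pack retrieved chunks into a numbered context block so the model can
    cite sources as [1], [2], ... (illustrative format, not NVIDIA's prompt)."""
    context = "\n".join(f"[{i + 1}] {c}" for i, c in enumerate(chunks))
    return (
        "Answer using only the context below and cite sources as [n].\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )

if __name__ == "__main__" and os.getenv("NVIDIA_API_KEY"):
    # Assumed base URL for NVIDIA's hosted NIM endpoints; requires an API key.
    from openai import OpenAI

    client = OpenAI(
        base_url="https://integrate.api.nvidia.com/v1",
        api_key=os.environ["NVIDIA_API_KEY"],
    )
    resp = client.chat.completions.create(
        model="nvidia/llama-3.3-nemotron-super-49b-v1.5",
        messages=[{
            "role": "user",
            "content": build_cited_prompt(
                "What does the table list?",
                ["The table lists GPU memory requirements."],
            ),
        }],
    )
    print(resp.choices[0].message.content)
```

Because each chunk carries a number, the model's `[n]` references can be mapped back to the source PDF regions that `nv-ingest` extracted, which is what makes the answers citation-backed.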