argbe.tech - news

NVIDIA lays out a Nemotron RAG document pipeline from PDFs to citations

NVIDIA published a Nemotron RAG workflow that turns complex PDFs into structured, multimodal retrieval data and citation-backed answers. The guide details the models, infrastructure, and prerequisites for a production-ready document pipeline.

NVIDIA outlined a Nemotron-based document processing pipeline for multimodal RAG that converts complex PDFs into grounded, cited answers.

  • The workflow starts with the NeMo Retriever (nv-ingest) stack, which performs GPU-accelerated extraction and emits JSON containing text chunks, tables rendered as markdown, and chart images.
  • Multimodal retrieval uses nvidia/llama-nemotron-embed-vl-1b-v2 for embeddings and nvidia/llama-nemotron-rerank-vl-1b-v2 for cross-encoder reranking; embeddings are 2,048-dimensional vectors per item.
  • Answer generation is wired to the nvidia/llama-3.3-nemotron-super-49b-v1.5 endpoint on NVIDIA NIM, with a separate Nemotron OCR endpoint available for document extraction.
  • The reference setup calls for Python 3.10–3.12 (tested on 3.12), an NVIDIA GPU with at least 24 GB VRAM, about 250 GB of disk space, and an NVIDIA API key.
  • Example dependencies include nv-ingest 26.1.1, nv-ingest-api 26.1.1, nv-ingest-client 26.1.1, milvus-lite 2.4.12, and openai 1.51.0 or newer.
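The embed-then-rerank split in the second bullet can be sketched in plain Python. This is a minimal illustration, not NVIDIA's implementation: the embedding and reranking models are stood in for by toy vectors and a keyword-overlap stub, and the 2,048-dimensional vectors are truncated to three dimensions for readability.

```python
from math import sqrt

def cosine(a, b):
    # Standard cosine similarity between two dense vectors.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))

def retrieve(query_vec, corpus, k=2):
    """Bi-encoder stage: rank stored chunk embeddings by cosine similarity.

    In the real pipeline these vectors would come from
    nvidia/llama-nemotron-embed-vl-1b-v2 (2,048 dimensions per item).
    """
    return sorted(corpus, key=lambda c: cosine(query_vec, c["vec"]), reverse=True)[:k]

def rerank(query_text, candidates):
    """Cross-encoder stage (stubbed): re-score the shortlist against the query.

    A real reranker (llama-nemotron-rerank-vl-1b-v2) scores each
    (query, chunk) pair jointly; keyword overlap here is purely illustrative.
    """
    def score(c):
        return len(set(query_text.lower().split()) & set(c["text"].lower().split()))
    return sorted(candidates, key=score, reverse=True)

# Toy corpus mimicking nv-ingest output types: text, table markdown, chart image.
corpus = [
    {"text": "Quarterly revenue table extracted as markdown", "vec": [0.9, 0.1, 0.0]},
    {"text": "Chart image showing GPU utilization", "vec": [0.2, 0.8, 0.1]},
    {"text": "Plain text chunk about revenue growth", "vec": [0.8, 0.2, 0.1]},
]
shortlist = retrieve([1.0, 0.1, 0.0], corpus, k=2)   # cheap, wide recall
final = rerank("revenue table", shortlist)           # expensive, precise ordering
```

The two-stage shape is the point: the bi-encoder narrows a large vector store (Milvus in the reference setup) to a short candidate list cheaply, and the cross-encoder spends more compute ordering only that shortlist.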
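For the final step, grounded answers come from stitching retrieved chunks into the generation prompt with source markers the model can cite. The prompt wording and the [doc N] citation convention below are illustrative assumptions, not NVIDIA's exact template; the `source` fields are made-up examples.

```python
def build_cited_prompt(question, chunks):
    """Assemble a prompt that asks the generator to cite its sources.

    Each retrieved chunk gets a [doc N] label plus provenance, so the
    model's answer can point back to specific document locations.
    """
    sources = "\n".join(
        f"[doc {i}] ({c['source']}): {c['text']}" for i, c in enumerate(chunks, 1)
    )
    return (
        "Answer using only the sources below and cite them as [doc N].\n\n"
        f"Sources:\n{sources}\n\nQuestion: {question}"
    )

# Hypothetical chunks as they might come back from the reranker.
chunks = [
    {"source": "report.pdf p.3", "text": "Revenue grew 12% year over year."},
    {"source": "report.pdf p.7", "text": "Data center sales drove the increase."},
]
prompt = build_cited_prompt("Why did revenue grow?", chunks)
```

The `openai` 1.51.0+ dependency listed above would then carry this prompt to the nvidia/llama-3.3-nemotron-super-49b-v1.5 chat endpoint on NVIDIA NIM as an ordinary chat-completion request.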