Tag: rag

Accessible

Ceci n'est pas une pipe: AI systems as semantic abstractions

Jade Alglave

This paper argues that we lack a precise vocabulary for reasoning about when AI system outputs are justified — and that this gap leads to sloppy evaluation. The authors propose a semantic framework distinguishing between what domain knowledge supports, what sources actually say, and what the system can access at inference time, giving precise definitions to failure modes like unsupported assertion, stale sources, and added hypotheses. Useful conceptual grounding for anyone designing RAG systems, agent tool-calling policies, or evaluation rubrics.

Takeaways

Apparent fluency in AI outputs systematically obscures whether claims are actually grounded in reliable authority.
Distinguishing 'what sources say' from 'what the system can use' clarifies why RAG and fine-tuning have fundamentally different failure modes.
The framework provides a vocabulary for writing precise specifications for agent actions that must be justified by explicit evidence.

from Jul 13, 2026 · via api-arxiv · arXiv:2607.09489

Intermediate

Deceptive Grounding: Entity Attribution Failure in Clinical Retrieval-Augmented Generation

Cedric Caruzzo

rag evaluations security llms

This paper exposes a dangerous blind spot in standard RAG evaluation: a system can score near-perfect on hallucination and faithfulness metrics while confidently presenting evidence about the wrong entity. The authors call this 'deceptive grounding' — every claim is sourced from a real document, just the wrong one — and find failure rates up to 87% under adversarial conditions. Critically, domain-specialized medical models are *worse* at this than general models, which should concern anyone building high-stakes RAG applications.

Takeaways

Standard faithfulness and hallucination metrics cannot detect entity attribution failures, creating a false sense of RAG safety.
Domain-specialized fine-tuning amplifies deceptive grounding rather than mitigating it, making medical RAG systems particularly vulnerable.
Removing entity-specific conflicting evidence from retrieved documents eliminates the failure, pointing toward retrieval filtering as a mitigation.
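
The retrieval-filtering idea in the last takeaway can be sketched as follows. This is a minimal illustration, not the paper's method: it assumes each retrieved passage already carries an entity tag (the hypothetical `entity` field), and `Passage`, `normalize`, and `filter_by_entity` are names invented for this example. The passages are placeholder data.

```python
from dataclasses import dataclass


@dataclass
class Passage:
    doc_id: str
    entity: str  # hypothetical metadata: the entity the passage is about
    text: str


def normalize(name: str) -> str:
    """Build a case- and whitespace-insensitive entity key."""
    return " ".join(name.lower().split())


def filter_by_entity(passages: list[Passage], query_entity: str) -> list[Passage]:
    """Keep only passages whose tagged entity matches the query entity."""
    target = normalize(query_entity)
    return [p for p in passages if normalize(p.entity) == target]


# Placeholder passages: two about the queried entity, one about another.
retrieved = [
    Passage("d1", "Drug A", "Drug A placeholder statement one."),
    Passage("d2", "drug  b", "Drug B placeholder statement."),
    Passage("d3", "drug a", "Drug A placeholder statement two."),
]
kept = filter_by_entity(retrieved, "Drug A")
print([p.doc_id for p in kept])  # prints ['d1', 'd3']
```

In practice the entity tag would have to come from an upstream linking step, and exact string matching would likely need to be replaced by something that handles synonyms and brand names.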

from Jul 13, 2026 · via api-arxiv · arXiv:2607.09349

Intermediate

Same Question, Different Source, Different Answer: Auditing Source-Dependence in Medical Multi-Source RAG

Yubo Li, Rema Padman, Ramayya Krishnan

rag evaluations security

This research exposes a critical blind spot in multi-source RAG systems: the same question can yield different answers depending on which institutional source the system retrieves, even when both sources are authoritative. The work shifts evaluation focus from answer correctness to inter-source relationship analysis, revealing that better retrieval actually uncovers more disagreement than expected. Essential for anyone building RAG over institutional knowledge bases.

Takeaways

Multi-source RAG systems can give different answers to identical questions based on source selection.
Traditional single-gold-answer evaluation paradigms miss source-dependence failure modes.
Better retrieval reveals more inter-source disagreement than conventional metrics suggest.
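
One way to picture a source-dependence audit is to ask the same question once per source and group the answers. The sketch below is an assumption-laden illustration, not the authors' protocol: `audit_source_dependence` is a hypothetical helper, the source names are made up, and answers are compared by normalized exact match, which is a crude stand-in for a real semantic comparison.

```python
from collections import defaultdict


def audit_source_dependence(answers):
    """Flag questions whose answer changes with the source.

    answers: iterable of (question, source, answer) triples.
    Returns {question: {normalized_answer: [sources]}} for questions
    where at least two distinct answers were observed.
    """
    by_question = defaultdict(lambda: defaultdict(list))
    for question, source, answer in answers:
        by_question[question][answer.strip().lower()].append(source)
    return {
        q: {a: sorted(srcs) for a, srcs in groups.items()}
        for q, groups in by_question.items()
        if len(groups) > 1
    }


# Illustrative runs: q1 disagrees across sources, q2 does not.
runs = [
    ("q1", "source_a", "Yes"),
    ("q1", "source_b", "no"),
    ("q2", "source_a", "Yes"),
    ("q2", "source_b", "yes "),
]
flagged = audit_source_dependence(runs)
print(flagged)  # prints {'q1': {'yes': ['source_a'], 'no': ['source_b']}}
```

The point of the structure is that the unit of evaluation becomes the relationship between sources rather than agreement with a single gold answer.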

from Jun 8, 2026 · via api-hf · arXiv:2605.29084

Intermediate

Code2LoRA: Hypernetwork-Generated Adapters for Code Language Models under Software Evolution

Liliana Hotsko, Yinxi Li, Yuntian Deng, Pengyu Nie

software-engineering llms rag

Code2LoRA addresses the repository context problem for code models without the inference overhead of RAG or the cost of per-repo fine-tuning. It uses a hypernetwork to generate lightweight adapters that inject repository-specific knowledge directly into the model weights, with an evolution-aware variant that updates the adapter as the codebase changes. If you're building AI coding assistants that need deep repository understanding, this offers a practical way to supply repository context without spending prompt tokens on it.

Takeaways

Eliminates inference-time token overhead for repository context while maintaining repository-specific knowledge.
Supports both static snapshots and evolving codebases through GRU-backed adapter updates.
Outperforms traditional parameter-efficient fine-tuning approaches on repository-level tasks.

from Jun 8, 2026 · via api-hf · arXiv:2606.06492

Intermediate

Contexts are Never Long Enough: Structured Reasoning for Scalable Question Answering over Long Document Sets

Harshit Joshi, Priyank Shethia, Jadelynn Dao, Monica S. Lam

rag reasoning llms software-engineering

SLIDERS challenges the conventional chunk-and-aggregate approach to document QA by extracting information into a relational database and reasoning with SQL instead of concatenated text. This architectural approach sidesteps the fundamental limitation that any fixed context window will eventually be exceeded, making it essential reading for engineers building document analysis systems that need to scale beyond typical RAG limitations.

Takeaways

Traditional chunk-and-aggregate approaches hit an aggregation bottleneck as document collections grow, even with infinite context windows.
Extracting information into structured databases and reasoning with SQL scales better than reasoning over concatenated text.
Data reconciliation using provenance and extraction rationales is crucial for maintaining coherence in locally extracted information.
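
The extract-then-query idea can be illustrated with Python's built-in `sqlite3` module. This is a rough sketch under stated assumptions, not the SLIDERS implementation: the `facts` schema, the column names, and every row below are invented for illustration, and the extraction step that would produce such rows from document chunks is not shown.

```python
import sqlite3

# Hypothetical rows an extraction step might emit per chunk:
# (document, chunk, entity, attribute, value, rationale)
extracted = [
    ("report_2023.pdf", 4, "Acme", "revenue_musd", 120.0, "stated in table 2"),
    ("report_2024.pdf", 7, "Acme", "revenue_musd", 150.0, "stated in summary"),
    ("report_2024.pdf", 9, "Globex", "revenue_musd", 90.0, "stated in appendix"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE facts (
           document TEXT, chunk INTEGER, entity TEXT,
           attribute TEXT, value REAL, rationale TEXT)"""
)
conn.executemany("INSERT INTO facts VALUES (?, ?, ?, ?, ?, ?)", extracted)

# Aggregation happens in SQL rather than over concatenated text.
rows = conn.execute(
    """SELECT entity, SUM(value) AS total, COUNT(DISTINCT document) AS n_docs
       FROM facts
       WHERE attribute = 'revenue_musd'
       GROUP BY entity
       ORDER BY total DESC"""
).fetchall()
print(rows)  # prints [('Acme', 270.0, 2), ('Globex', 90.0, 1)]
```

Keeping the document, chunk, and rationale columns alongside each value is what makes later reconciliation possible: conflicting or duplicate rows can be traced back to where they were extracted.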

from Apr 27, 2026 · via api-hf · arXiv:2604.22294

Intermediate

Don't Retrieve, Navigate: Distilling Enterprise Knowledge into Navigable Agent Skills for QA and RAG

Yiqun Sun, Pengfei Wei, Lawrence B. Hsieh

rag agents software-engineering

Corpus2Skill reframes RAG by giving AI agents a navigable map of the knowledge base instead of treating them as passive consumers of search results. Rather than relying on retrieval to surface the right documents, agents can see the corpus structure, drill down through hierarchical summaries, and combine evidence across different branches. This targets a core limitation of standard RAG: the system cannot reason about what it has not seen.

Takeaways

Traditional RAG limits AI agents to passive consumption of search results without visibility into corpus structure or unexplored areas.
Hierarchical skill directories enable agents to navigate knowledge strategically and combine evidence across different topic branches.
Offline corpus compilation into navigable structures provides better performance than runtime retrieval-only approaches.
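
A toy version of navigating a hierarchy of summaries might look like the sketch below. It is only an illustration of the general idea, not the Corpus2Skill design: `Node` and `navigate` are hypothetical names, the tree is made up, and keyword overlap stands in for the agent's decision about which branch to open.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    summary: str
    children: list["Node"] = field(default_factory=list)
    documents: list[str] = field(default_factory=list)


def navigate(root: Node, keywords: set[str]) -> list[str]:
    """Open every branch whose summary mentions a keyword; collect documents."""
    found = []
    stack = [root]
    while stack:
        node = stack.pop()
        found.extend(node.documents)
        for child in node.children:
            # Stand-in for the agent reading a summary and choosing to drill down.
            if keywords & set(child.summary.lower().split()):
                stack.append(child)
    return sorted(found)


root = Node("corpus", "all company knowledge", children=[
    Node("hr", "leave policy and payroll", documents=["leave.md"]),
    Node("eng", "deployment and oncall runbooks", children=[
        Node("deploy", "deployment checklist", documents=["deploy.md"]),
        Node("oncall", "oncall escalation", documents=["oncall.md"]),
    ]),
])
print(navigate(root, {"deployment", "payroll"}))  # prints ['deploy.md', 'leave.md']
```

Because the tree is built offline, the agent can see that an `oncall` branch exists even when it chooses not to open it, which is the visibility that plain top-k retrieval does not provide.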

from Apr 20, 2026 · via api-hf · arXiv:2604.14572

Intermediate

The Design of AI Memory Systems

agents rag foundational

No detailed summary is available for this item. The design of AI memory systems matters for production agents and RAG applications that need to maintain context and learn from interactions.

from Apr 6, 2026 · 7 points on Lobsters · via api-lobsters

Intermediate

Show HN: Robust LLM extractor for websites in TypeScript

andrew_zhong

software-engineering how-we-work rag

A practical TypeScript library that solves the common problem of extracting structured data from websites using LLMs, addressing real pain points like HTML noise, token budget management, and brittleness of traditional CSS selectors. This represents the kind of focused tooling that makes AI-powered data extraction reliable enough for production use.

Takeaways

LLM-based extraction needs preprocessing to remove HTML noise and stay within token budgets for reliable results.
Focused tools that solve specific AI integration problems are more valuable than general-purpose solutions for production teams.
AI extraction can replace brittle CSS selectors but requires thoughtful engineering to handle edge cases and failures.
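
The preprocessing step in the first takeaway can be sketched as below. Note that this is written in Python with the standard-library `html.parser` module and is not the TypeScript library's API: `TextExtractor` and `clean_html` are hypothetical names, the list of tags to skip is an assumption, and whitespace-separated words are used as a rough proxy for tokens.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping tags that usually hold page noise."""

    SKIP = {"script", "style", "noscript", "nav", "footer"}  # assumed noise tags

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def clean_html(html: str, max_tokens: int) -> str:
    """Strip markup and noise, then truncate to a word-count budget."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    words = " ".join(parser.parts).split()
    return " ".join(words[:max_tokens])


page = (
    "<html><head><style>p{color:red}</style></head><body>"
    "<nav>Home</nav><p>Price: <b>42</b> USD</p>"
    "<script>track()</script></body></html>"
)
print(clean_html(page, 10))  # prints Price: 42 USD
```

A real pipeline would count tokens with the target model's tokenizer and would probably keep some structure (headings, tables) rather than flattening everything to text.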

from Mar 29, 2026 · 72 points on HN · via api-hn