Local Deep Research (LDR) has emerged as one of the most capable open-source research assistants available, recently achieving approximately 95% accuracy on the SimpleQA benchmark when paired with GPT-4.1-mini. What distinguishes LDR from conventional retrieval-augmented generation systems is its agentic architecture: rather than following a fixed pipeline — search, retrieve, summarize — LDR's new LangGraph agent strategy allows the underlying language model to autonomously decide which specialized search engines to query, when to switch between them, and when enough evidence has been gathered to synthesize a response.
This matters for a specific reason. Most AI-assisted research tools treat search as a single step: the user asks a question, the system queries one source, and returns a summary. LDR treats search as an iterative reasoning process. It might begin with a broad web search, recognize from the results that the question is biomedical in nature, pivot to PubMed, discover a relevant preprint reference, query arXiv for it, and only then synthesize — all without human intervention.
Equally significant is the personal knowledge base. Every research session produces sources — papers, articles, web pages. LDR allows you to download these directly into an encrypted, indexed library. Future queries then search your accumulated library alongside the live web. The knowledge compounds: each session makes the next one richer.
In this workshop, we will deploy LDR as a fully local stack (Ollama for the LLM, SearXNG for web search), explore its agent strategy architecture, and build a personal knowledge base that grows with use. Everything runs on your hardware, encrypted with AES-256. No data leaves your machine unless you choose a cloud LLM provider.
The LDR stack comprises three services: an LLM inference server (Ollama), a meta-search engine (SearXNG), and the LDR application itself. These services communicate over a shared Docker network, forming a self-contained research pipeline.
The architecture follows a pattern common in modern AI applications: separation of inference from orchestration. Ollama handles token generation, SearXNG handles search federation across dozens of upstream engines, and LDR orchestrates the research workflow — deciding what to search, interpreting results, and producing structured reports.
Rather than manually writing Docker Compose configurations, we will have our AI agent generate a deployment script tailored to our hardware. This is a useful exercise in specifying infrastructure requirements through natural language.
The agent should produce a Docker Compose file defining three services — ollama (port 11434), searxng (port 8080), and local-deep-research (port 5000) — connected via a shared bridge network. Ollama should have a named volume for model storage. LDR should have a named volume mounted at /data with the LDR_DATA_DIR environment variable set accordingly. The agent should also provide the one-line command to pull a suitable model into Ollama after startup. For GPU users, the agent should include the NVIDIA runtime configuration as a separate override file or a noted modification.
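A compose file along these lines might look like the following sketch. The image names, the Ollama model path, and the service wiring are assumptions to verify against each project's documentation; the port and volume layout follows the specification above.

```yaml
services:
  ollama:
    image: ollama/ollama          # assumed image name; verify on Docker Hub
    ports:
      - "11434:11434"
    volumes:
      - ollama-models:/root/.ollama
    networks: [ldr-net]

  searxng:
    image: searxng/searxng        # assumed image name; verify on Docker Hub
    ports:
      - "8080:8080"
    networks: [ldr-net]

  local-deep-research:
    image: localdeepresearch/local-deep-research  # assumed image name
    ports:
      - "5000:5000"
    environment:
      - LDR_DATA_DIR=/data
    volumes:
      - ldr-data:/data
    depends_on: [ollama, searxng]
    networks: [ldr-net]

volumes:
  ollama-models:
  ldr-data:

networks:
  ldr-net:
    driver: bridge
```

After startup, a model can be pulled into the running Ollama container with `docker compose exec ollama ollama pull gpt-oss:20b`. GPU users would layer an override file that adds the NVIDIA device reservation to the ollama service.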
gpt-oss:20b needs approximately 12–16 GB of RAM (or VRAM for GPU inference). If your machine has limited resources, substitute a smaller model such as llama3.2:3b or phi3:mini. Research quality will decrease, but the workflow remains functional.
Ollama provides a standardized OpenAI-compatible API layer over local model inference. This means LDR can use the same client code regardless of whether the backend is a local 7B model or a cloud-hosted GPT-4. The abstraction also allows hot-swapping models without reconfiguring LDR — you simply change the model name in settings. For privacy-sensitive research, local inference ensures that your queries never leave your network. The trade-off is inference speed: a local 20B model on consumer hardware will be substantially slower than a cloud API call, but the latency is acceptable for research workflows where you are willing to wait 30–60 seconds for a thorough answer.
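The abstraction is visible in the request shape itself: an OpenAI-style chat-completions body sent to Ollama at `http://localhost:11434/v1/chat/completions` is identical to one sent to a cloud provider, apart from the base URL and model name. The helper below is illustrative, not LDR's own code.

```python
# Sketch: the OpenAI-compatible chat-completions payload that a client can
# send to Ollama's /v1/chat/completions endpoint. Swapping backends changes
# only the base URL and model name; the payload shape stays the same.

def build_chat_request(model: str, question: str) -> dict:
    """Build an OpenAI-style chat-completions request body."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": "You are a research assistant."},
            {"role": "user", "content": question},
        ],
        "temperature": 0.2,  # low temperature for factual research
    }

# The same function serves a local model or a cloud one:
local_req = build_chat_request("gpt-oss:20b", "Summarize CRISPR delivery methods.")
cloud_req = build_chat_request("gpt-4.1-mini", "Summarize CRISPR delivery methods.")
```

This is exactly what makes hot-swapping models a one-line settings change rather than a code change.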
LDR offers over 20 research strategies, ranging from simple factual lookup to deep multi-source analysis. The most sophisticated is the LangGraph agent strategy, which represents a fundamental shift from pipeline-based to agent-based research.
In a pipeline strategy, the execution path is predetermined: receive query → search web → collect results → summarize. The system follows the same steps regardless of the query's nature. In the agent strategy, the LLM itself becomes the orchestrator. It receives the query, reasons about which information sources are most relevant, issues tool calls to specialized search engines, evaluates the returned results, and decides whether to search again, switch engines, or synthesize.
This is implemented using LangGraph, a framework for building stateful, multi-step agent workflows as directed graphs. Each node in the graph represents a capability — web search, arXiv query, PubMed lookup, document retrieval — and the LLM navigates between nodes based on its assessment of what information is still needed.
The practical consequence is that the agent strategy tends to collect significantly more sources and produce more comprehensive reports than pipeline strategies, because it adapts its search behavior to the specific question rather than following a fixed recipe.
The agent should describe the LangGraph agent strategy as a state machine where the LLM operates as the router. The state object accumulates search results, source metadata, and a running assessment of coverage. At each step, the LLM examines the current state and selects a tool: search_web, search_arxiv, search_pubmed, search_semantic_scholar, search_local_docs, or synthesize. The key architectural insight is that the graph has cycles — the agent can return to search nodes multiple times — unlike a pipeline which moves strictly forward. The termination condition is the LLM's own judgment that sufficient evidence exists, optionally bounded by a maximum iteration count to prevent runaway loops.
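The cyclic state machine described above can be sketched in a few lines. The router function below is a stand-in for the LLM's judgment, and the tool names mirror the node names in the text; the implementation is illustrative, not LDR's actual graph.

```python
# Minimal sketch of an agent loop with cycles: the router (standing in for
# the LLM) examines accumulated state and either picks another search node
# or terminates by choosing "synthesize". A hard iteration cap bounds the loop.

MAX_ITERATIONS = 8  # prevents runaway loops

def run_agent(query: str, router, tools: dict) -> dict:
    state = {"query": query, "results": [], "coverage": 0.0}
    for _ in range(MAX_ITERATIONS):
        action = router(state)          # LLM-as-router examines current state
        if action == "synthesize":      # termination: evidence judged sufficient
            break
        state["results"].extend(tools[action](state["query"]))
        state["coverage"] = min(1.0, len(state["results"]) / 10)
    return state

# Stub router: keep searching the web until coverage passes a threshold.
def stub_router(state):
    return "search_web" if state["coverage"] < 0.5 else "synthesize"

stub_tools = {"search_web": lambda q: [f"result for {q}"]}
final = run_agent("What is LDR?", stub_router, stub_tools)
```

The cycle back to search nodes, gated by the router's assessment rather than a fixed step count, is the structural difference from a pipeline.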
To use it, select langgraph-agent from the strategy dropdown. This is not the default — LDR ships with a simpler pipeline strategy enabled to ensure broad compatibility.
A common agentic pattern is ReAct (Reason + Act): the LLM alternates between reasoning about what to do and executing a tool call, in a flat loop. LangGraph extends this by introducing graph structure — explicit nodes with defined transitions, conditional edges, and shared state. This allows more complex workflows: parallel tool calls, sub-graphs for specialized tasks, and checkpointing for resumption. In LDR's case, the graph includes nodes for each search engine, a synthesis node, and conditional edges that route based on the LLM's assessment of result quality and topical coverage. The graph structure also makes the workflow inspectable and debuggable in ways that a flat ReAct loop is not.
SimpleQA is an evaluation benchmark developed by OpenAI consisting of short, factual questions with verifiable answers. A 95% score indicates that LDR, when paired with GPT-4.1-mini, can answer straightforward factual queries with high reliability. This is notable because the system is performing multi-step retrieval and synthesis, not simply recalling training data. The benchmark validates that LDR's search orchestration and synthesis pipeline faithfully preserves factual accuracy from source material. However, SimpleQA tests relatively simple factual recall — it does not measure the system's ability to handle nuanced, multi-faceted research questions where the agent strategy's adaptive search becomes most valuable.
LDR's power comes from its ability to federate searches across fundamentally different information ecosystems. A web search engine, an arXiv query, and a PubMed search are not interchangeable — they have different query syntaxes, different result structures, and different strengths. Web search excels at recency and breadth; arXiv provides preprints before peer review; PubMed offers structured biomedical literature with MeSH term indexing; Semantic Scholar provides citation graph analysis.
Configuring these sources correctly is the difference between a research assistant that returns superficial web summaries and one that produces literature-review-quality synthesis. Each source has parameters that affect result quality: the number of results to retrieve, whether to fetch full text or abstracts only, and how to handle rate limits.
We will use our AI agent to generate a configuration that balances thoroughness with performance, tailored to a specific research domain.
The agent should describe the configuration approach: for the LangGraph agent strategy, source selection is delegated to the LLM, so the configuration specifies available sources and their parameters rather than a fixed query order. A reasonable configuration retrieves 10–15 results per source, enables full-text fetching for arXiv (PDFs are freely available), uses abstract-only mode for PubMed (full text requires institutional access for many journals), and sets Semantic Scholar to return citation counts for relevance ranking. The agent should note that SearXNG configuration (at http://localhost:8080/preferences) controls which upstream search engines are active for web queries, and recommend enabling at least Google Scholar alongside general web engines.
An API key for Semantic Scholar, which raises its rate limits, can be requested at semanticscholar.org/product/api. SearXNG requires no API keys — it scrapes search engines directly.
LDR can be configured both through the web UI at http://localhost:5000 and through a TOML configuration file in the data volume. For reproducible setups, prefer the configuration file; for experimentation, use the web UI.
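A configuration file implementing the parameters described above might look like this sketch. The table and key names here are hypothetical, so check LDR's documentation for the actual schema before using it.

```toml
[search.arxiv]
max_results = 15
fetch_full_text = true     # PDFs are freely available

[search.pubmed]
max_results = 15
fetch_full_text = false    # abstract-only; full text is often paywalled

[search.semantic_scholar]
max_results = 10
include_citation_counts = true   # used for relevance ranking

[search.web]
searxng_url = "http://searxng:8080"   # service name on the Docker network
max_results = 10
```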
SearXNG is a meta-search engine that operates by issuing HTTP requests to search engine frontends (Google, Bing, DuckDuckGo, etc.) and parsing the HTML responses — essentially automating what a human would do in a browser. This means it requires no API keys and incurs no per-query costs, but it is subject to rate limiting and CAPTCHAs if query volume is high. Running SearXNG locally as part of the Docker stack gives LDR a privacy-preserving search capability: your queries go to SearXNG on localhost, which fans out to search engines from your server's IP, and results return without any search engine knowing that an AI system is the ultimate consumer of the results.
The most distinctive feature of LDR is its personal knowledge base — an encrypted, locally-stored library of documents that grows with each research session. This transforms LDR from a stateless question-answering tool into a compounding research assistant: each session deposits sources into the library, and future sessions search the library alongside live sources.
The knowledge base operates on a straightforward pipeline: documents (PDFs, web pages, articles) are downloaded, their text is extracted, the text is chunked into passages, each passage is embedded into a vector representation, and the vectors are stored in an indexed database. When you query the knowledge base, your question is similarly embedded and matched against stored passages by vector similarity.
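The pipeline can be made concrete with a toy version: chunk text, embed each chunk, store the vectors, and retrieve by similarity. A real deployment uses a trained embedding model; here a bag-of-words vector stands in so the flow is visible end to end.

```python
# Toy knowledge-base pipeline: chunk -> embed -> store -> query by similarity.
# The bag-of-words "embedding" is a deliberate simplification; real systems
# use a neural embedding model.

from collections import Counter
import math

def embed(text: str) -> Counter:
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def chunk(text: str, size: int = 8) -> list:
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

# Ingest: chunk a document and store (chunk, vector) pairs.
document = ("GLP-1 receptor agonists lower blood glucose. "
            "CRISPR enables targeted gene editing in mammalian cells. "
            "SearXNG federates queries across many search engines.")
index = [(c, embed(c)) for c in chunk(document)]

# Query: embed the question and rank stored chunks by similarity.
def search(question: str, k: int = 1) -> list:
    qv = embed(question)
    ranked = sorted(index, key=lambda item: cosine(qv, item[1]), reverse=True)
    return [c for c, _ in ranked[:k]]

best = search("gene editing with CRISPR")[0]
```

Even this crude version retrieves the gene-editing passage rather than the diabetes or search-engine passages, because ranking operates on vector overlap rather than exact phrase matching.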
Critically, the entire database is encrypted with AES-256 via SQLCipher. Each user gets an isolated database. There is no password recovery mechanism — this is a deliberate design choice that ensures true zero-knowledge security. Even someone with physical access to the server cannot read the data without the user's passphrase.
We will now use our agent to design a workflow for systematically populating this knowledge base from a research session.
The agent should describe the workflow in three phases. First, curation: after a research session, review the cited sources in the report and select those with lasting reference value — peer-reviewed papers, authoritative reports, primary data sources — while skipping ephemeral news articles or redundant sources. Second, ingestion: use LDR's download-and-index feature (available in the web UI for each cited source) to fetch the document, extract text, chunk it into passages of approximately 500–1000 tokens, embed each chunk using the configured embedding model, and store the vectors in the SQLCipher database. Third, verification: run a targeted query that should match the newly added document — for example, if you added a paper on GLP-1 agonists, query 'GLP-1 receptor binding affinity' and confirm the paper appears in the local document results. The agent should note that retrieval stays fast as the library grows when the vector store uses an approximate nearest-neighbor index (query time is then sublinear in collection size), though very large libraries (thousands of documents) benefit from periodic re-indexing.
If you set LDR_BOOTSTRAP_ALLOW_UNENCRYPTED=true during initial setup (common for troubleshooting), your knowledge base is stored in plain SQLite without encryption. For any research involving sensitive or proprietary information, ensure SQLCipher is properly configured before adding documents.
When a document is added to the knowledge base, each text chunk is converted into a high-dimensional vector (typically 384 or 768 dimensions) by an embedding model. These vectors capture semantic meaning: passages about similar topics produce vectors that are close together in the embedding space, even if they use different words. At query time, your question is embedded using the same model, and the system finds stored vectors with the highest cosine similarity to the query vector. This is why the knowledge base can find relevant passages even when the exact keywords differ — the search operates on meaning, not string matching. The trade-off is that embedding quality depends entirely on the model used; a small, general-purpose embedding model may not capture domain-specific nuances as well as a specialized one.
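Cosine similarity itself is simple arithmetic: the dot product of two vectors divided by the product of their lengths. A tiny numeric example with hand-made 3-dimensional vectors (real embeddings have hundreds of dimensions, and the values below are invented for illustration):

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a))
                  * math.sqrt(sum(x * x for x in b)))

# Hand-made vectors standing in for real embeddings. Passages about related
# topics are imagined to land near each other in the space.
v_gene_editing = [0.9, 0.1, 0.2]
v_crispr       = [0.8, 0.2, 0.3]
v_stock_market = [0.1, 0.9, 0.1]

sim_related   = cosine(v_gene_editing, v_crispr)        # high: same topic
sim_unrelated = cosine(v_gene_editing, v_stock_market)  # low: different topic
```

The ranking depends only on relative similarity, which is why wording differences between the query and the stored passage do not matter as long as the embedding model maps them to nearby vectors.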
Consider two scenarios. In the first, a researcher asks LDR about CRISPR gene editing today, gets a report, and discards it. Six months later, they ask about CRISPR delivery mechanisms — LDR starts from scratch, re-searching the same foundational sources. In the second scenario, the researcher downloads the key papers from the first session into the knowledge base. Six months later, the delivery mechanism query automatically retrieves relevant passages from those stored papers alongside new web results. The synthesis is richer because it connects current findings to the researcher's prior reading. Over dozens of sessions, the knowledge base becomes a personalized research corpus — a curated, searchable subset of the literature that reflects the researcher's specific interests and prior investigations.
I need a system prompt for Local Deep Research that transforms its output from a general synthesis into a structured literature review. The review should have five sections: (1) Research Question — a precise restatement of the query, (2) Consensus Findings — claims supported by multiple independent sources with citation counts, (3) Contested or Preliminary Findings — claims supported by only one source or where sources disagree, with explicit notation of the disagreement, (4) Methodological Notes — any limitations in the cited studies that affect confidence (sample size, study design, conflict of interest), and (5) Open Questions — specific gaps in the literature that the search revealed. The prompt should instruct the LLM to use at least 8 sources before synthesizing, to prefer peer-reviewed sources over preprints and preprints over news articles, and to flag any claim that rests on a single source. Format the output in markdown with each section clearly headed.
Deploying a research assistant is straightforward; knowing whether to trust its output is the harder problem. LDR's ~95% SimpleQA score is encouraging for factual queries, but real research questions are rarely simple factual lookups. They involve synthesis, judgment about source quality, and recognition of uncertainty — none of which SimpleQA measures.
A disciplined evaluation approach requires three dimensions. Factual accuracy: are the specific claims in the report verifiable against the cited sources? Source coverage: did the agent find the most relevant and authoritative sources, or did it settle for whatever appeared first? Synthesis quality: does the report merely concatenate source summaries, or does it identify patterns, contradictions, and implications across sources?
We will use our agent to design a lightweight evaluation rubric that you can apply to any LDR output, allowing you to calibrate your trust in the system over time and identify systematic weaknesses.
The agent should produce a rubric with three dimensions, each scored on a 1–5 scale. Factual Accuracy (1–5): select three specific claims from the report, locate them in the cited sources, and verify. Score 5 if all three are accurately represented with appropriate nuance; score 1 if any claim is fabricated or materially misrepresented. Source Coverage (1–5): assess whether the sources span multiple databases (web, arXiv, PubMed), include both recent and foundational works, and represent diverse perspectives. Score 5 for comprehensive, multi-source coverage; score 1 for reliance on a single source type. Synthesis Quality (1–5): check whether the report identifies cross-source patterns, notes contradictions, and draws conclusions that no single source states explicitly. Score 5 for genuine analytical synthesis; score 1 for sequential source summaries with no integration. The rubric should include a note that scores below 3 on any dimension warrant re-running the query with a different strategy or more explicit instructions.
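The rubric above is easy to operationalize as a small record type, so scores can be logged per report and systematic weaknesses tracked over time. The dimension names follow the rubric; the class itself is a sketch, not part of LDR.

```python
# Sketch: the three-dimension rubric as a data structure, with the
# "score below 3 warrants a re-run" rule encoded as a flagging method.

from dataclasses import dataclass

@dataclass
class RubricScore:
    factual_accuracy: int   # 1-5: spot-checked claims vs. cited sources
    source_coverage: int    # 1-5: breadth across databases and perspectives
    synthesis_quality: int  # 1-5: integration vs. sequential summaries

    def flags(self) -> list:
        """Dimensions scoring below 3 warrant re-running the query."""
        return [name for name, score in [
            ("factual_accuracy", self.factual_accuracy),
            ("source_coverage", self.source_coverage),
            ("synthesis_quality", self.synthesis_quality),
        ] if score < 3]

score = RubricScore(factual_accuracy=4, source_coverage=2, synthesis_quality=3)
needs_rerun = score.flags()  # → ["source_coverage"]
```

Logging a few of these per week is enough to reveal patterns, for example a consistently low source-coverage score suggesting the SearXNG engine list needs broadening.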
The SimpleQA benchmark measures a system's ability to answer short, factual questions with verifiable answers — questions like 'What year was the transistor invented?' or 'What is the capital of Bhutan?' A 95% score on this benchmark tells you that LDR's retrieval and synthesis pipeline faithfully preserves factual information from sources. It does not tell you whether the system can identify the most authoritative sources, handle ambiguity, recognize when a question has no clear consensus answer, or produce analysis that goes beyond what any single source contains. These are the capabilities that matter for genuine research, and they require human evaluation — at least until we have benchmarks sophisticated enough to measure them.
In this workshop, we deployed a fully local, privacy-preserving AI research assistant using Local Deep Research, Ollama, and SearXNG. We examined the architectural distinction between pipeline-based research strategies — which follow a fixed sequence of search and synthesis steps — and the LangGraph agent strategy, where the language model autonomously navigates a graph of specialized search tools based on its evolving understanding of the question.
We configured multi-source search orchestration across web, arXiv, PubMed, and Semantic Scholar, understanding that source diversity matters more than source quantity for research reliability. We built a personal knowledge base using LDR's encrypted document indexing, establishing a workflow where each research session deposits curated sources into a vector-indexed library that enriches all future queries.
Finally, we developed an evaluation rubric for assessing research output quality — recognizing that benchmark scores like SimpleQA's 95% validate factual accuracy on simple queries but do not measure the synthesis depth and source coverage that distinguish useful research from superficial summarization.
The system we built is not merely a question-answering tool. It is a research infrastructure that compounds in value over time: each session adds to the knowledge base, each evaluation calibrates your trust in the output, and each prompt refinement improves the quality of future reports.