<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="4.4.1">Jekyll</generator><link href="https://solr.cool/feed.xml" rel="self" type="application/atom+xml" /><link href="https://solr.cool/" rel="alternate" type="text/html" /><updated>2026-05-19T18:08:06+00:00</updated><id>https://solr.cool/feed.xml</id><title type="html">solr.cool</title><subtitle>Field notes on open-source search — Solr, hybrid retrieval, RAG and agentic stacks.</subtitle><entry><title type="html">Hybrid Search Architecture for Apache Solr 10 and beyond</title><link href="https://solr.cool/hybrid-search-on-apache-solr/" rel="alternate" type="text/html" title="Hybrid Search Architecture for Apache Solr 10 and beyond" /><published>2026-05-18T00:00:00+00:00</published><updated>2026-05-18T00:00:00+00:00</updated><id>https://solr.cool/hybrid-search-on-apache-solr</id><content type="html" xml:base="https://solr.cool/hybrid-search-on-apache-solr/"><![CDATA[<h2 id="starting-point">Starting Point</h2>

<p>“Hybrid search” today almost always means: lexical search (BM25) <strong>plus</strong> vector search (dense embeddings, optionally sparse like SPLADE), merged via Reciprocal Rank Fusion (RRF) or weighted linear combination, often with a downstream reranker (cross-encoder, ColBERT). The question is no longer <em>whether</em> an engine can do this — it’s <em>how mature</em>, <em>how operationally expensive</em>, and <em>how scalable</em>.</p>

<p>Equally important is the forward-looking perspective: the most transformative innovations of the coming years are happening not in the index, but in the layers <em>above</em> it — LLM query rewriting, agentic retrieval planning, cross-encoder and LLM reranking, generative answer synthesis, multimodal search, late-interaction models like ColPali for visually rich documents. This matters for engine choice because some of these trends are engine-neutral and some are not.</p>

<p>The serious open-source candidates fall into three categories:</p>

<ol>
  <li><strong>Lucene-based search platforms:</strong> Solr, OpenSearch, Elasticsearch</li>
  <li><strong>AI-native search engine:</strong> Vespa</li>
  <li><strong>Vector-first databases with BM25:</strong> Qdrant, Weaviate, Milvus</li>
</ol>

<p>The evaluation runs through six functional areas. In a modern search system, these together make the difference between “works” and “competitive”:</p>

<ol>
  <li><strong>Full-text and hybrid search</strong> — BM25, dense vectors, sparse models, fusion (RRF), cross-encoder and LLM reranking</li>
  <li><strong>Filters and selections</strong> — structured, deterministic constraints on the result set; in the hybrid path implemented as pre-filter to the KNN search</li>
  <li><strong>Facets and aggregations</strong> — counted refinements on the current result set, hierarchical or pivot, distinct from OLAP-style aggregations</li>
  <li><strong>Autosuggest / search-as-you-type</strong> — its own path with its own index, latency class (P99 &lt; 50 ms), and ranking logic</li>
  <li><strong>Ranking and personalization</strong> — from static boosts through learning-to-rank to real-time multi-phase ranking with user features</li>
  <li><strong>Generative SERP layer</strong> — RAG answers, agentic query plans, multimodal and late-interaction retrieval, dynamic result composition</li>
</ol>

<blockquote class="epigraph">
  <p>The most honest answer to <strong>“which engine ages best”</strong>: the one you hard-wire the <strong>least</strong>.</p>
</blockquote>

<h2 id="candidates">The Candidates in Detail</h2>

<h3 id="apache-solr-10-available-since-early2026">Apache Solr 10 <small style="font-weight:400;font-size:16px;color:var(--ink-soft)">(available since early 2026)</small></h3>

<p>Solr 10 is no longer the “old classic that can barely do vectors.” The release brings substantial improvements: scalar and binary quantization of dense vectors, optional GPU acceleration via cuVS-Lucene as a pluggable codec, new <code class="language-plaintext highlighter-rouge">efSearch</code> parameters for HNSW tuning, feature-vector caching for learning-to-rank, and with <code class="language-plaintext highlighter-rouge">SeededKnnVectorQuery</code> and <code class="language-plaintext highlighter-rouge">PatienceKnnVectorQuery</code> (early termination) two specific hybrid accelerators. Hybrid retrieval constructions (<code class="language-plaintext highlighter-rouge">{!bool should=$lex should=$knn}</code>) and hybrid ranking work — though they are still underdocumented in the reference guide (SOLR-17103). The <code class="language-plaintext highlighter-rouge">TextToVectorQParser</code> allows the query to be encoded directly within Solr.</p>
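
<p>To make the hybrid construction concrete, here is a minimal sketch of the request parameters an application might send. The <code>{!bool ...}</code> form is the one quoted above and <code>{!knn}</code> is Solr’s dense-vector query parser; the field names <code>title</code>, <code>body</code> and <code>embedding</code>, the helper name, and the <code>topK</code> value are assumptions for illustration, not something the release prescribes.</p>

```python
def build_hybrid_params(user_query, query_vector, top_k=100, rows=10):
    """Build Solr request parameters for a lexical + KNN boolean query.

    Hypothetical schema: text fields `title` and `body`, dense vector field `embedding`.
    """
    vector_literal = "[" + ", ".join(f"{x:.4f}" for x in query_vector) + "]"
    return {
        # Both clauses are optional (should), so a document can match either side.
        "q": "{!bool should=$lex should=$knn}",
        # Lexical side: edismax over the assumed text fields.
        "lex": "{!edismax qf='title body' v=$user_q}",
        "user_q": user_query,
        # Vector side: approximate nearest neighbours on the assumed vector field.
        "knn": f"{{!knn f=embedding topK={top_k}}}{vector_literal}",
        "rows": rows,
        "fl": "id,score",
    }


params = build_hybrid_params("new tricks for beginners", [0.12, -0.5, 0.33])
print(params["knn"])  # {!knn f=embedding topK=100}[0.1200, -0.5000, 0.3300]
```

<p>A boolean query adds up the scores of its matching clauses, so BM25 and vector similarity meet on different scales. That is one reason fusion is often moved out of the engine.</p>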

<div class="table-wrap">
  <table>
    <thead><tr><th>Pros</th><th>Cons</th></tr></thead>
    <tbody><tr>
      <td>No migration needed &mdash; schema and ops knowledge stay. Genuine ASF governance, no corporate overlord, Apache&nbsp;2.0. Very strong faceted search, geo, parallel SQL &mdash; if relevant. Learning-to-rank is mature and has been able to use vector similarity as a feature since 9.3.</td>
      <td>Hybrid DX is raw &mdash; a lot of manual XML/JSON, pagination with BoolQParser&nbsp;+&nbsp;KNN is tricky, RRF not out of the box (actively being worked on). External ZooKeeper dependency for SolrCloud. Java&nbsp;21 as minimum (operational implication). Community smaller and shrinking relative to Elastic/OpenSearch. Multi-vector / late-interaction fields (ColBERT/ColPali) are not yet first-class in Lucene &mdash; the most relevant future weakness.</td>
    </tr></tbody>
  </table>
</div>

<h3 id="opensearch-3x">OpenSearch (3.x)</h3>

<p>Apache 2.0, Linux Foundation governance since 2024, Lucene-based. Version 3.2 explicitly expanded “agentic AI” and native hybrid search, supports FAISS and nmslib engines alongside Lucene HNSW, vector dimensions up to 16k. RRF and score normalization are built in as pipeline processors.</p>

<div class="table-wrap">
  <table>
    <thead><tr><th>Pros</th><th>Cons</th></tr></thead>
    <tbody><tr>
      <td>Truly open source. Security (RBAC, FLS/DLS, audit) is in the free distribution &mdash; with Elastic this costs Platinum/Enterprise. Very active push toward AI features. Large ecosystem, Kibana-equivalent dashboards. AWS integration if desired.</td>
      <td>Performance benchmarks show it lags 40&ndash;140% behind Elasticsearch (vendor benchmarks, read with caution). Operational complexity similar to Elasticsearch. A migration from Solr is a real migration: schema, query language, tooling, configuration. Multi-vector late-interaction shares the Lucene weakness with Solr.</td>
    </tr></tbody>
  </table>
</div>

<h3 id="elasticsearch-8x--9x">Elasticsearch (8.x / 9.x)</h3>

<p>License is OSI-compliant again since 2024 via AGPLv3 option (alongside SSPL and the Elastic License). Mature hybrid search with RRF, ELSER (Elastic’s own sparse model for out-of-domain semantics), built-in reranking API.</p>

<div class="table-wrap">
  <table>
    <thead><tr><th>Pros</th><th>Cons</th></tr></thead>
    <tbody><tr>
      <td>Probably the most polished hybrid search experience in the Lucene family, excellent documentation, mature ML pipelines, Kibana. Strongest DX for hybrid out of the box.</td>
      <td>The licensing nightmare isn&rsquo;t fully over &mdash; AGPLv3 is OSI-compliant but tricky for many enterprise contexts. Many premium features (ML, security tier, RAG API) remain gated. TCO at larger cluster sizes is relevant. If the customer is trying to move <em>away</em> from commercial pressure, this is the wrong signal.</td>
    </tr></tbody>
  </table>
</div>

<h3 id="vespa">Vespa</h3>

<p>Formerly Yahoo, open source under Apache 2.0 since 2017. Unlike the Lucene family, a vector-native architecture: mutable in-memory data structures (no refresh interval), multi-phase ranking on content nodes (not scatter-gather), ONNX/LightGBM executable locally.</p>

<div class="table-wrap">
  <table>
    <thead><tr><th>Pros</th><th>Cons</th></tr></thead>
    <tbody><tr>
      <td>Clear performance king for hybrid at scale &mdash; vendor benchmarks claim 8.5&times; higher hybrid throughput per core compared to Elasticsearch, 12.9&times; for pure vector. True real-time visibility. First-class tensor and ranking expressiveness, ColBERT/late-interaction native. The only engine that does <em>retrieval and complex ranking in a single query round trip</em>. Exactly the architecture that supports real-time personalization and multi-phase ranking.</td>
      <td>The steepest learning curve in this list &mdash; its own configuration language, its own query language (YQL), its own mental model. Smaller community, fewer Stack Overflow answers. Operationally demanding for self-hosting; Vespa Cloud is the pragmatic alternative. Overkill if data volume is "medium" and latency isn&rsquo;t critical in single-digit ms.</td>
    </tr></tbody>
  </table>
</div>

<h3 id="qdrant--weaviate--milvus-vector-first-with-bm25">Qdrant / Weaviate / Milvus <small style="font-weight:400;font-size:16px;color:var(--ink-soft)">(Vector-first with BM25)</small></h3>

<p>Qdrant (Rust, Apache 2.0), Weaviate (Go, BSD-3), Milvus (Go/C++, Apache 2.0) are primarily vector DBs but by now all ship usable BM25 + hybrid fusion (RRF, DBSF, alpha-blending). Qdrant integrates IDF calculation into the engine, Weaviate has the <code class="language-plaintext highlighter-rouge">with_hybrid(alpha=...)</code> API. ColBERT / ColPali support is first-class here — so they’re strongest precisely where the Lucene family is weakest.</p>

<div class="table-wrap">
  <table>
    <thead><tr><th>Pros</th><th>Cons</th></tr></thead>
    <tbody><tr>
      <td>Best DX for vector + hybrid when starting greenfield. Quick to set up, clear APIs, small footprints. Reranking hooks (ColBERT, cross-encoder) are first-class. Multimodal workflows (CLIP, SigLIP, ColPali for PDFs/images) come with less friction than the Lucene engines.</td>
      <td>Weaker on the "classical" lexical side &mdash; tokenizers, analyzers, synonyms, fuzzy matching, phrase slop, highlighting, faceted search, spell-check are not at Lucene level. If you&rsquo;re running Solr in production today, you almost certainly use features that are missing here or would have to be built. Better suited as a RAG backend than as a universal site/product search.</td>
    </tr></tbody>
  </table>
</div>

<h3 id="matrix">Candidate Evaluation Matrix</h3>

<div class="table-wrap">
  <table>
    <thead><tr>
      <th>Criterion</th><th>Solr&nbsp;10</th><th>OpenSearch</th><th>Elasticsearch</th><th>Vespa</th><th>Qdrant / Weaviate</th>
    </tr></thead>
    <tbody>
      <tr><td>License (clean OSS)</td><td><span class="ok">✔</span> Apache 2.0</td><td><span class="ok">✔</span> Apache 2.0</td><td><span class="warn">⚠</span> AGPLv3 / SSPL</td><td><span class="ok">✔</span> Apache 2.0</td><td><span class="ok">✔</span> Apache 2.0 / BSD</td></tr>
      <tr><td>Hybrid search DX</td><td><span class="warn">⚠</span> raw</td><td><span class="ok">✔</span> good</td><td><span class="ok">✔</span> very good</td><td><span class="ok">✔</span> excellent</td><td><span class="ok">✔</span> excellent</td></tr>
      <tr><td>Lexical depth</td><td><span class="ok">✔</span> excellent</td><td><span class="ok">✔</span> excellent</td><td><span class="ok">✔</span> excellent</td><td><span class="ok">✔</span> very good</td><td><span class="warn">⚠</span> basic</td></tr>
      <tr><td>Faceting / aggregations</td><td><span class="ok">✔</span> excellent</td><td><span class="ok">✔</span> excellent</td><td><span class="ok">✔</span> excellent</td><td><span class="ok">✔</span> very good</td><td><span class="warn">⚠</span> weak</td></tr>
      <tr><td>Autosuggest (e-comm level)</td><td><span class="warn">⚠</span> building blocks</td><td><span class="warn">⚠</span> building blocks</td><td><span class="ok">✔</span> search_as_you_type + LTR</td><td><span class="ok">✔</span> reference</td><td><span class="warn">⚠</span> basic</td></tr>
      <tr><td>Vector performance</td><td><span class="ok">✔</span> good (with 10)</td><td><span class="ok">✔</span> good</td><td><span class="ok">✔</span> good</td><td><span class="ok">✔</span> top tier</td><td><span class="ok">✔</span> very good</td></tr>
      <tr><td>Late interaction (ColBERT/ColPali)</td><td><span class="warn">⚠</span> weak</td><td><span class="warn">⚠</span> weak</td><td><span class="warn">⚠</span> in progress</td><td><span class="ok">✔</span> native</td><td><span class="ok">✔</span> first-class</td></tr>
      <tr><td>Ranking flexibility</td><td><span class="ok">✔</span> LTR mature</td><td><span class="ok">✔</span> good</td><td><span class="ok">✔</span> ML stack</td><td><span class="ok">✔</span> multi-phase</td><td><span class="warn">⚠</span> rerank hook</td></tr>
      <tr><td>Operational maturity</td><td><span class="ok">✔</span> high</td><td><span class="ok">✔</span> high</td><td><span class="ok">✔</span> high</td><td><span class="warn">⚠</span> steep</td><td><span class="ok">✔</span> simple</td></tr>
      <tr><td>Migration cost (from current)</td><td><span class="ok">✔</span> none</td><td><span class="bad">✘</span> large</td><td><span class="bad">✘</span> large</td><td><span class="bad">✘</span> very large</td><td><span class="bad">✘</span> large</td></tr>
      <tr><td>Community momentum</td><td><span class="warn">⚠</span> stable</td><td><span class="ok">✔</span> growing</td><td><span class="ok">✔</span> large</td><td><span class="warn">⚠</span> niche</td><td><span class="ok">✔</span> growing</td></tr>
    </tbody>
  </table>
</div>

<h2 id="serp">Where Is the SERP Heading — and What Does That Mean for Engine Choice?</h2>

<p>Before deciding, it’s worth looking at the direction of innovation. Six trends I see as defining for the next 2–4 years:</p>

<ul>
  <li><strong>Late-interaction models migrate from reranker to retrieval layer.</strong> ColBERT was the start; ColPali/ColQwen are the natural continuation — multi-vector representations per document, MaxSim matching, no OCR pipeline drama with PDFs or images. Vespa, Qdrant and Weaviate support this in production today; Lucene-based engines have a harder time structurally because the index has historically been single-vector-centric.</li>
  <li><strong>LLM and cross-encoder rerankers become standard stage 2.</strong> The math is uncontested: hybrid retrieval on top-100, then a cross-encoder or LLM reranker on top-10. Voyage, Cohere, Jina, FlashRank, ColBERT-v2 are the building blocks. Engine-neutral, runs externally.</li>
  <li><strong>Generative answers and “generative UI” on the SERP.</strong> The display becomes dynamic: comparison table when the query looks like one; map when geo; carousel when products; pure answer when FAQ-like. The engine doesn’t decide this; the layer above does.</li>
  <li><strong>Agentic search and multi-step query plans.</strong> An LLM decomposes the user question into sub-queries, calls retrieval as a tool, checks the results, refines, asks back. MCP is becoming the standard interface here.</li>
  <li><strong>Real-time personalization in the ranking stage.</strong> Multi-phase ranking where user context, session, embedding similarity to past behavior, and business logic come together. Native in Vespa, via LTR in Lucene-based engines.</li>
  <li><strong>Multimodality as default.</strong> Image-to-text, text-to-image, mixed queries. CLIP, SigLIP, ColPali are the tools — for visually rich sites a realistic use case in 2–3 years.</li>
</ul>
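
<p>The MaxSim matching mentioned in the first trend is small enough to show in full. This is a sketch in plain Python with made-up two-dimensional token vectors; real late-interaction models use one higher-dimensional vector per token and usually normalize them, which is assumed away here.</p>

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def maxsim_score(query_vectors, doc_vectors):
    """Late-interaction score: for each query token vector, take the best-matching
    document token vector (max dot product), then sum over the query tokens."""
    return sum(max(dot(q, d) for d in doc_vectors) for q in query_vectors)


# Toy token embeddings, invented for illustration.
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]   # covers both query tokens
doc_b = [[1.0, 0.0], [0.9, 0.1]]               # covers only the first one

print(maxsim_score(query, doc_a))   # 2.0
print(maxsim_score(query, doc_b))   # lower: the second query token finds no good match
```

<p>Every document carries a list of vectors rather than one. That is the structural difference a single-vector-centric index has to absorb.</p>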

<p><strong>What’s engine-relevant, what isn’t?</strong> Cross-encoder reranking, generative answers, and agentic orchestration live almost entirely <em>above</em> the engine. Late interaction, real-time multi-phase ranking, and (with caveats) multimodality are the trends where index architecture genuinely makes a difference. That’s where the Lucene family has structural work to do, while Vespa and the vector-first DBs are already ahead.</p>

<h3 id="recommendation">Recommendation</h3>

<p><strong>If the customer were starting greenfield</strong> — without the existing Solr investment — and the profile is “classical search engine with hybrid extension, medium-to-large data volume, no megascale RAG,” I would recommend <strong>OpenSearch</strong>. The hybrid DX is mature, RRF and score normalization are built in, the license is clean, security features are included at no extra cost, and the ecosystem is large enough that for most problems someone has already posted a solution.</p>

<p><strong>However:</strong> The customer is <em>not</em> starting greenfield — they’re facing the Solr 9-to-10 upgrade. And here the recommendation flips: <strong>Solr 10 is sufficient in the overwhelming majority of cases for hybrid search</strong>, and migration cost to OpenSearch would be substantial (schema modeling, query language, indexing pipeline, ops, monitoring, team skills). Solr 10 closes the exact gaps that 9.x still had — with quantization, GPU codec, SeededKnn and PatienceKnn termination.</p>

<p><strong>Looking ahead reinforces this recommendation — with one important caveat.</strong> The truly innovative layers of the SERP for the next several years (generative answers, agentic orchestration, query rewriting, LLM reranking) are engine-neutral and live in the application layer. Anyone thinking “I’ll buy engine X and that gets me AI search” is wrong — regardless of which engine. Only three trends are genuinely engine-architecture-relevant: late interaction (ColBERT/ColPali), real-time multi-phase ranking, native multimodality.</p>

<p><strong>Concretely as a two-stage approach:</strong></p>

<ol>
  <li><strong>Now:</strong> Run the upgrade to Solr 10. As part of it, build a hybrid retrieval setup as a PoC — DenseVectorField, an embedding model (e.g., multilingual-e5 or bge-m3), <code class="language-plaintext highlighter-rouge">{!bool should=$lex should=$knn}</code> with RRF in the application layer, optionally a cross-encoder reranker as a second stage. Within a few weeks you’ll know whether the relevance gain justifies the complexity.</li>
  <li><strong>If the PoC hits limits</strong> — missing multi-phase ranking on large data sets, need for native late interaction for visual documents, real-time personalization requirements — then the question is <em>where to migrate</em>, and the answer depends on the specific bottleneck (Vespa for ranking power and real-time, Qdrant/Weaviate for RAG/multimodal use cases, OpenSearch for broader platform).</li>
</ol>
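
<p>For the PoC, RRF in the application layer is a few lines. The sketch below assumes the application has two ranked id lists, one from the lexical query and one from the KNN query. The function name and the document ids are hypothetical, and <code>k=60</code> is a commonly used default rather than a tuned value.</p>

```python
def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse ranked lists of document ids: score(d) = sum over lists of 1 / (k + rank)."""
    scores = {}
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc id for stable output.
    return sorted(scores, key=lambda d: (-scores[d], d))


lexical = ["doc3", "doc1", "doc7"]   # e.g. ids from the BM25 query
vector = ["doc1", "doc9", "doc3"]    # e.g. ids from the KNN query
print(reciprocal_rank_fusion([lexical, vector]))
# ['doc1', 'doc3', 'doc9', 'doc7']
```

<p>RRF only looks at ranks, so it needs no score normalization between BM25 and vector similarity. Keeping it outside the engine also means the same fusion code works against any engine.</p>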

<blockquote class="is-red">
  <p><strong>Upgrade to Solr 10.</strong> Add hybrid. Build it so the engine stays replaceable.</p>
</blockquote>

<p>The worst move would be to migrate off working Solr without a concrete pain point to justify the bill. Every serious engine in 2026 does hybrid search — the real differentiator isn’t “can it?” but ranking quality, embedding choice, reranking strategy, and the evaluation loop. That work is engine-independent. And it’s what will actually shape the SERP of the next few years.</p>

<h2 id="readers-map">What Follows — A Reader’s Map</h2>

<p>The recommendation above is the <em>what</em>. The remainder covers the <em>how</em> — the architecture and the disciplines that turn the recommendation into a working system, built so that today’s deterministic stack can host tomorrow’s agent without a rewrite.</p>

<div class="reader-map">
  <h4>Reader&rsquo;s Map</h4>
  <ul>
    <li><strong>A Resilient API Stack</strong> &mdash; the layered architecture from client to engine, the contract that keeps the engine replaceable, contrasted with the agentic alternative.</li>
    <li><strong>Filters and Facets</strong> &mdash; the second, non-relevance path through the same stack; why pure filter queries should skip half the pipeline.</li>
    <li><strong>Autosuggest</strong> &mdash; its own subsystem, different latency class, different signals, different pool.</li>
    <li><strong>The Indexing Path</strong> &mdash; embeddings, evaluation, the cross-cutting disciplines that decide whether the system improves over time.</li>
    <li><strong>Putting It Together</strong> &mdash; the synthesis, and the one rule that decides whether the engine stays replaceable.</li>
  </ul>
</div>

<h2 id="api-stack">A Resilient API Stack — Architecture Hygiene in Concrete Terms</h2>

<p>There are two ways to design a search API today, and a system built well can support both. The first is the <em>deterministic pipeline</em> — query understanding, retrieval, fusion, reranking, composition — laid out as imperative code that runs the same plan every time. The second is the <em>agentic orchestration</em> — a model that owns the plan, calls the engine as a tool, evaluates results, and iterates. The deterministic version is what every search team has been building for two decades; the agentic version is what teams are starting to ship in production.</p>

<pre>Query path (synchronous)                   Indexing path (async)
────────────────────────                   ─────────────────────

┌──────────────────────────────┐           ┌──────────────────────────┐
│ Client (Web, App, Agent,MCP) │           │   Source systems / CMS   │
│   speaks stable Domain API   │           │      CDC or Webhook      │
└──────────────┬───────────────┘           └────────────┬─────────────┘
               │                                        │
               ▼                                        ▼
┌──────────────────────────────┐           ┌──────────────────────────┐
│   Search BFF / Domain API    │           │    Indexing pipeline     │
│ /search /suggest /facets …   │           │  Normalize Enrich Chunk  │
│       engine-agnostic        │           └────────────┬─────────────┘
└──────────────┬───────────────┘                        │
               │                                        ▼
               ▼                              ┌──────────────────────────┐
┌──────────────────────────────┐              │    Embedding service     │
│      Query understanding     │◀─────────────│   versioned models,      │
│ Rewrite Expansion Intent     │              │       dual-write         │
│         Sub-queries          │              └────────────┬─────────────┘
└──────────────┬───────────────┘                           │
               │                                           │
               ▼                                           │ Bulk index
┌──────────────────────────────┐                           │
│    Retrieval orchestrator    │                           │
│ Plan Fan-out Fusion(RRF)     │                           │
│            Top-K             │                           │
└──────────────┬───────────────┘                           │
               │                                           │
               ▼                                           │
┌──────────────────────────────┐                           │
│        Engine adapter        │                           │
│ translates Query-IR to engine│                           │
└──────────────┬───────────────┘                           │
               │                                           │
               ▼                                           ▼
┌──────────────────────────────────────────────────────────────────────┐
│                            Search engine                             │
│              Solr / OpenSearch / Vespa / Qdrant                      │
└──────────────┬───────────────────────────────────────────────────────┘
               │
               ▼
┌──────────────────────────────┐           ┌──────────────────────────┐
│       Reranking stage        │           │  Eval &amp; experimentation  │
│  Cross-encoder ColBERT LLM   │◀──────────│ Goldset A/B online metr. │
└──────────────┬───────────────┘           └────────────┬─────────────┘
               │                                        │
               ▼                                        │
┌──────────────────────────────┐                        │
│      Result composition      │                        │
│ Hits Facets Highlights       │                        │
│         RAG answer           │                        │
└──────────────┬───────────────┘                        │
               │                                        │
               ▼                                        │
┌──────────────────────────────┐                        │
│    Telemetry &amp; click logs    │◀───────────────────────┘
│structured, linked to Query-IR│
└──────────────────────────────┘</pre>

<h3 id="agentic">The Agentic Alternative — Thin Primitives, an Orchestrating Agent</h3>

<p>The deterministic-pipeline assumption is exactly what the next few years will challenge most directly. Doug Turnbull made the case sharply in <a href="https://softwaredoug.com/blog/2026/05/11/the-new-agentic-search-models" target="_blank" rel="noopener">a recent post</a>: the “thick search monolith” is being unbundled. In its place: a small set of <em>thin retrieval primitives</em> (basic keyword search, basic embedding search, a few filters), orchestrated by an agent that sees the whole problem rather than executing reductive steps.</p>

<p>Frontier models like GPT-5 and Sonnet already do the 80% case well — they understand most queries with general knowledge, and they can drive a retrieval tool reasonably. But Doug’s central point is about the last 20%: the domain knowledge that <em>isn’t</em> in a frontier model’s training. A furniture store knows that “bistro tables” means small outdoor tables, not restaurant equipment; GPT-5 doesn’t. Specialized agentic search models — SID-1, Glean’s Waldo, startups like Charcoal — get trained on the domain and on search-as-task specifically.</p>

<h4 id="what-the-agentic-shift-changes-in-the-layers">What the agentic shift changes in the layers</h4>

<ul>
  <li><strong>The Retrieval Orchestrator becomes the agent’s seat.</strong> An LLM occupies this layer and runs a loop: call the engine, evaluate the result, decide whether to refine, filter, expand, or retry. No longer imperative code; a model with tools.</li>
  <li><strong>The engine adapter becomes hot.</strong> A deterministic pipeline calls the engine once or twice per user query. An agentic orchestrator may call it five or ten times in a loop. The adapter must be idempotent, fast, safe to call repeatedly, with clear failure modes the agent can interpret.</li>
  <li><strong>The reranker may shrink.</strong> When the agent itself selects across iterations (keeping promising candidates, dropping bad ones), it <em>is</em> reranking, spread across the loop. A dedicated cross-encoder stage may still earn its place for raw quality, but it stops being mandatory.</li>
</ul>

<h4 id="what-the-agentic-shift-doesnt-change">What the agentic shift doesn’t change</h4>

<p>What survives unchanged is the discipline: a stable domain API at the boundary, engine-agnostic hit schemas, an evaluation loop, versioned embeddings, a dedicated suggest path. The agent has to talk to <em>something</em>, and that something is the layered stack.</p>

<blockquote>
  <p><strong>Design the engine adapter as a tool, not a remote procedure.</strong> The orchestrator calling it today is your code. The orchestrator calling it in three years is a model.</p>
</blockquote>
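
<p>One way to read “adapter as a tool” is sketched below: every call returns a small, predictable result object, and failures are values the caller can inspect rather than exceptions. The names <code>ToolResult</code>, <code>EngineAdapter</code> and <code>InMemoryAdapter</code> and the error code are hypothetical, and the in-memory engine is only a stand-in so the example runs.</p>

```python
from dataclasses import dataclass
from typing import Optional, Protocol


@dataclass(frozen=True)
class ToolResult:
    """What the caller (code today, a model later) gets back from every call."""
    ok: bool
    hits: tuple = ()
    error_code: Optional[str] = None   # short and enumerable, e.g. "empty_query"
    retryable: bool = False            # tells an agent whether calling again makes sense


class EngineAdapter(Protocol):
    def search(self, query_ir: dict, top_k: int) -> ToolResult:
        ...


class InMemoryAdapter:
    """Stand-in engine for illustration: substring match over a dict of documents."""

    def __init__(self, docs):
        self.docs = docs

    def search(self, query_ir, top_k):
        text = query_ir.get("normalized", "").strip().lower()
        if not text:
            # A failure the orchestrator can interpret instead of an exception.
            return ToolResult(ok=False, error_code="empty_query")
        hits = tuple(doc_id for doc_id, body in sorted(self.docs.items())
                     if text in body.lower())
        return ToolResult(ok=True, hits=hits[:top_k])


adapter = InMemoryAdapter({"a": "Kickflip tricks", "b": "Helmet care"})
first = adapter.search({"normalized": "tricks"}, top_k=5)
second = adapter.search({"normalized": "tricks"}, top_k=5)
print(first == second, first.hits)                               # True ('a',)
print(adapter.search({"normalized": " "}, top_k=5).error_code)   # empty_query
```

<p>The same call with the same input returns the same result, which is what makes it safe to call five or ten times in a loop.</p>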

<h3 id="latency">The Latency Problem — and How to Pay for the Loop</h3>

<p>The honest cost of going agentic is latency. A deterministic pipeline runs one query plan: query understanding (~5 ms) → engine call (~50 ms) → rerank (~50 ms) → compose. Total: ~100 ms for the fast path. A plan-act-analyze loop costs <em>(LLM inference for planning + engine call + LLM inference for analysis)</em> per iteration. With a frontier model at 200–400 ms per call, a ~50 ms engine call, and three iterations, you’re at roughly 750 ms to 1.35 seconds if planning and analysis share a single LLM call per iteration, and at 1.35 to 2.55 seconds if they are separate calls, all before the user sees anything. That’s the difference between “feels instant” and “feels broken.”</p>
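
<p>The per-iteration cost formula is easy to put into numbers. The sketch below uses the figures from the paragraph above (200–400 ms per LLM call, ~50 ms per engine call, three iterations). The total depends heavily on whether planning and analysis are one LLM call or two per iteration, so it prints both; the function name and the decision to ignore network and composition time are assumptions.</p>

```python
def loop_latency_ms(llm_ms, engine_ms=50, iterations=3, llm_calls_per_iteration=2):
    """Serial latency of a plan-act-analyze loop, ignoring network and composition."""
    return iterations * (llm_calls_per_iteration * llm_ms + engine_ms)


deterministic_ms = 5 + 50 + 50   # query understanding + engine call + rerank

for calls in (1, 2):
    low = loop_latency_ms(200, llm_calls_per_iteration=calls)
    high = loop_latency_ms(400, llm_calls_per_iteration=calls)
    print(f"{calls} LLM call(s) per iteration: {low}-{high} ms")
# 1 LLM call(s) per iteration: 750-1350 ms
# 2 LLM call(s) per iteration: 1350-2550 ms

print(f"deterministic fast path: about {deterministic_ms} ms")
```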

<p>The bottleneck also <em>moves</em>. In the deterministic pipeline the engine dominates and you tune Solr. In the agentic loop the LLM dominates by a factor of 4–10×, and tuning the engine harder buys you almost nothing. Seven mitigations, ordered roughly by impact:</p>

<ul>
  <li><strong>Specialized, smaller orchestrator models.</strong> A 50 ms domain-tuned model vs a 300 ms frontier model changes the equation entirely. SID-1, Waldo and similar models are designed to be cheap enough to call multiple times per query. For online search, the single biggest lever.</li>
  <li><strong>Speculative parallelism.</strong> Fire multiple candidate retrievals in parallel from the first plan and let the analysis step pick. Two iterations of serial latency collapse to one.</li>
  <li><strong>Hot-path bypass for simple queries.</strong> A small fast classifier decides: simple queries → deterministic pipeline (~100 ms), complex or ambiguous queries → agent (500–1500 ms).</li>
  <li><strong>Caching at multiple layers.</strong> Query-IR caching, retrieval caching, reranker caching by <code class="language-plaintext highlighter-rouge">(query_hash, doc_id)</code>. Cache hit rates of 30–60% on hot queries are realistic.</li>
  <li><strong>Streaming results during iteration.</strong> Start streaming partial output from the first iteration while the agent decides whether to refine. The user perceives latency as “time to first useful content,” not “time to final response.”</li>
  <li><strong>Iteration budgets and timeouts.</strong> Hard cap on agent iterations: typically 2–3 for online queries.</li>
  <li><strong>Deterministic plan and analyze, with LLM escalation.</strong> Implement the <em>plan</em> and <em>analyze</em> steps as rules, heuristics, lookups, small fast classifiers — and reach for an LLM only when the deterministic version reports low confidence. You keep the agentic <em>architecture</em> while running it at deterministic-pipeline cost for 90% of queries.</li>
</ul>
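
<p>Of the mitigations above, reranker caching by <code>(query_hash, doc_id)</code> is the easiest to sketch. The class below is an assumption-laden illustration: the normalization (lowercase, collapsed whitespace), the SHA-256 hash, and the stand-in scorer are all choices made for the example. A production cache would also need eviction and invalidation when the reranker model changes.</p>

```python
import hashlib


class RerankCache:
    """Memoizes reranker scores per (query_hash, doc_id) pair."""

    def __init__(self, score_fn):
        self.score_fn = score_fn   # the expensive call, e.g. a cross-encoder
        self.scores = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def query_hash(query):
        normalized = " ".join(query.lower().split())
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

    def score(self, query, doc_id):
        key = (self.query_hash(query), doc_id)
        if key in self.scores:
            self.hits += 1
        else:
            self.misses += 1
            self.scores[key] = self.score_fn(query, doc_id)
        return self.scores[key]


# Stand-in scorer for illustration; a real one would run a model.
cache = RerankCache(lambda query, doc_id: float(len(doc_id)))
cache.score("New  Tricks", "doc1")
cache.score("new tricks", "doc1")   # same normalized query: served from the cache
print(cache.hits, cache.misses)     # 1 1
```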

<blockquote>
  <p>The agentic architecture and the LLM tax are <strong>separable</strong>. Build the plan-act-analyze loop deterministically. Open up to LLM-driven plan and analyze selectively, where measurement shows rules can’t carry the load.</p>
</blockquote>
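
<p>A minimal sketch of that separation: the loop itself is plain code with a hard iteration cap, plan and analyze are rules, and the expensive step is only called when the rules report low confidence. All function names, the toy catalog, and the 0.6 threshold are hypothetical; <code>escalate</code> stands in for an LLM call.</p>

```python
def run_loop(query, search, plan, analyze, escalate, max_iterations=3, min_confidence=0.6):
    """Plan-act-analyze with a hard iteration cap and selective escalation."""
    history = []
    for _ in range(max_iterations):
        request = plan(query, history)
        results = search(request)
        confidence, done = analyze(results)
        if min_confidence > confidence:
            # Rules are unsure: hand this one decision to the slower, smarter step.
            confidence, done = escalate(query, results)
        history.append((request, results, confidence))
        if done:
            break
    return history


# Toy stand-ins so the sketch runs end to end.
CATALOG = {"bistro table": ["t1", "t2"], "table": ["t1", "t2", "t3", "t4"]}
escalations = []

def plan(query, history):
    # Rule: first try the full query, then fall back to its last word.
    return query if not history else query.split()[-1]

def search(request):
    return CATALOG.get(request, [])

def analyze(results):
    if not results:
        return 0.9, False      # confident it failed: retry with a broader plan
    if len(results) > 3:
        return 0.3, True       # many hits: rules cannot tell whether they are relevant
    return 1.0, True

def escalate(query, results):
    escalations.append(query)  # stand-in for an LLM call
    return 1.0, True


easy = run_loop("bistro table", search, plan, analyze, escalate)
hard = run_loop("round bistro table", search, plan, analyze, escalate)
print(len(easy), len(hard), escalations)   # 1 2 ['round bistro table']
```

<p>The easy query finishes in one rule-only iteration. Only the second query, after a broadened retry, pays for the escalation.</p>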

<p>The combination matters more than any individual mitigation. A realistic production setup: a deterministic plan-act-analyze loop for every query (~120 ms baseline); hot-path bypass skipping the loop entirely for trivial queries (~80 ms, 40% of traffic); LLM escalation in plan or analyze for genuinely ambiguous queries (+200–300 ms, 8% of traffic); full multi-iteration LLM-driven loop reserved for the hardest cases (~600 ms, 2% of traffic). Weighted average: roughly 130–140 ms, assuming the remaining 50% of traffic runs the plain baseline loop and the escalation cost is added on top of the baseline.</p>
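
<p>The weighted average can be checked with a few lines of arithmetic. Two assumptions go into the sketch that the figures above do not state outright: the traffic not covered by the other three tiers (50%) runs the plain ~120 ms baseline loop, and the escalation cost is added on top of that baseline.</p>

```python
def weighted_latency_ms(escalation_extra_ms):
    # (share of traffic, latency in ms) per tier
    tiers = [
        (0.40, 80),                          # hot-path bypass
        (0.50, 120),                         # deterministic loop, no LLM (assumed remainder)
        (0.08, 120 + escalation_extra_ms),   # baseline plus LLM escalation
        (0.02, 600),                         # full LLM-driven loop
    ]
    assert round(sum(share for share, _ in tiers), 9) == 1.0
    return sum(share * ms for share, ms in tiers)


low = weighted_latency_ms(200)
high = weighted_latency_ms(300)
print(f"{low:.1f} ms to {high:.1f} ms")   # 129.6 ms to 137.6 ms
```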

<blockquote class="is-red">
  <p>Agentic search is a <strong>latency tax</strong> — and the bill comes due on every iteration. Don’t ship a 1.5-second loop and hope users forgive you.</p>
</blockquote>

<h3 id="layers">The Guiding Idea</h3>

<p>A resilient stack accepts three truths. First, <em>the engine is the longest-lived component, but not the most valuable one</em> — you swap it maybe once every five years; the layers above grow every year. Second, <em>ranking is its own subsystem</em>, not an engine feature. Third, <em>the SERP is composed in the application layer</em>, not in the index.</p>

<h4 id="layer1--stable-domain-api-the-search-bff">Layer 1 — Stable Domain API (the Search BFF)</h4>

<p>The most important decision in the whole stack. The client (web, app, later agents via MCP) speaks <em>not</em> with the engine but with its own domain API, formulated in the language of your search — <code class="language-plaintext highlighter-rouge">/search?q=...&amp;filter=type:trick&amp;page=2</code>, not <code class="language-plaintext highlighter-rouge">/solr/select?q=...&amp;fq=...&amp;rows=10</code>. The response format is also independent: <code class="language-plaintext highlighter-rouge">{ hits: [...], facets: [...], suggestions: [...], answer?: {...} }</code> — and contains <em>no</em> Solr-specific fields.</p>

<p>Take this seriously, and you can switch engines later without touching the client. Don’t, and in two years you have <code class="language-plaintext highlighter-rouge">solrFacetCount</code> in your React code and never get out again.</p>
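<p>A minimal sketch of the translation the BFF performs on the way back, assuming the standard Solr JSON response layout (<code class="language-plaintext highlighter-rouge">response.docs</code>, and <code class="language-plaintext highlighter-rouge">facet_counts.facet_fields</code> as a flat value/count list). The function name, the <code class="language-plaintext highlighter-rouge">title</code> field, and the bucket keys are illustrative assumptions.</p>

```python
from typing import Any


def to_domain_response(solr_json: dict[str, Any]) -> dict[str, Any]:
    """Map a Solr select response to an engine-neutral domain response."""
    docs = solr_json.get("response", {}).get("docs", [])
    hits = [{"id": d["id"], "title": d.get("title")} for d in docs]

    facets = []
    facet_fields = solr_json.get("facet_counts", {}).get("facet_fields", {})
    for name, flat in facet_fields.items():
        # Solr's default layout is ["value", count, "value", count, ...].
        buckets = [
            {"value": value, "count": count}
            for value, count in zip(flat[0::2], flat[1::2])
        ]
        facets.append({"name": name, "buckets": buckets})

    # No Solr-specific keys (numFound, facet_counts, ...) leak to the client.
    return {"hits": hits, "facets": facets, "suggestions": []}
```

<p>The client only ever sees <code class="language-plaintext highlighter-rouge">hits</code>, <code class="language-plaintext highlighter-rouge">facets</code> and <code class="language-plaintext highlighter-rouge">suggestions</code>; swapping the engine means rewriting this one function.</p>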

<h4 id="layer2--query-understanding">Layer 2 — Query Understanding</h4>

<p>Incoming query → outgoing structured representation (the internal <em>Query IR</em>). Spellcheck/did-you-mean, synonym expansion, language detection, intent classification, and increasingly LLM-based: sub-query decomposition, HyDE-style query hypotheses, entity linking to your own vocabulary.</p>

<p>Important: this stage returns an object, not a rewritten string. A good query IR looks like:</p>

<pre class="is-json">{
  "raw": "new tricks for beginners",
  "normalized": "new tricks for beginners",
  "language": "en",
  "intent": "browse",
  "entities": [{"type": "skill_level", "value": "beginner"}],
  "expanded_terms": ["tricks", "stunts", "moves"],
  "embeddings": { "dense": [], "sparse": {} },
  "subqueries": []
}</pre>
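<p>In code, the same structure can be carried as a typed object rather than a loose dictionary. A minimal sketch: the class names and the type of the sparse embedding are assumptions, the fields mirror the JSON above.</p>

```python
from dataclasses import asdict, dataclass, field


@dataclass
class Entity:
    type: str
    value: str


@dataclass
class Embeddings:
    dense: list[float] = field(default_factory=list)
    sparse: dict[str, float] = field(default_factory=dict)  # token -> weight (assumed)


@dataclass
class QueryIR:
    raw: str
    normalized: str
    language: str
    intent: str
    entities: list[Entity] = field(default_factory=list)
    expanded_terms: list[str] = field(default_factory=list)
    embeddings: Embeddings = field(default_factory=Embeddings)
    subqueries: list["QueryIR"] = field(default_factory=list)


ir = QueryIR(
    raw="new tricks for beginners",
    normalized="new tricks for beginners",
    language="en",
    intent="browse",
    entities=[Entity("skill_level", "beginner")],
    expanded_terms=["tricks", "stunts", "moves"],
)
```

<p>Every later stage receives this object; <code class="language-plaintext highlighter-rouge">asdict(ir)</code> reproduces the JSON form for logging and telemetry.</p>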

<p><strong>Vector search is not semantic search.</strong> This distinction lives here, in Query Understanding. <em>Vector search</em> embeds the query string and finds documents whose embeddings are close: a similarity operation, not a meaning operation. <em>True semantic search</em> takes the actual meaning of the query and carries it into retrieval, often by rewriting or augmenting the query before it touches the engine.</p>

<p>The classic example is “wireless bras.” A general-purpose embedding model puts “wireless bras” near documents about bras in general — the word “wireless” is a weaker signal in the embedding than the word “bras,” and the model has no domain knowledge that, in this product category, “wireless” means “no underwire.” Pure vector search will happily return underwire bras as top results. True semantic search recognizes the intent — <em>no underwire</em> — and acts on it.</p>

<blockquote>
  <p>Vector search asks <strong>“what’s nearby?”</strong> Semantic search asks <strong>“what did you mean?”</strong> The first is math. The second is domain knowledge — and the document index alone won’t give it to you.</p>
</blockquote>

<h4 id="layer3--retrieval-orchestrator">Layer 3 — Retrieval Orchestrator</h4>

<p>The ranking brain. The orchestrator decides: <em>which</em> retrieval strategies run (BM25, dense, sparse/SPLADE, late interaction), <em>in parallel or sequentially</em>, <em>how to fuse</em> (RRF, linear combination, learned), and <em>how much</em> (top-K). This is where you can later <em>emulate</em> Vespa-style multi-phase ranking even if the engine only delivers phase 1.</p>
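<p>Of the fusion options, RRF is the simplest to run inside the orchestrator because it needs only ranks, not comparable scores. A minimal sketch, assuming each retriever returns an ordered list of document ids; <code class="language-plaintext highlighter-rouge">k = 60</code> is the commonly used constant, and the function name is an assumption.</p>

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60, top_k=10):
    """Fuse ranked doc-id lists with Reciprocal Rank Fusion.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by score descending; break ties by doc id for stable output.
    fused = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return fused[:top_k]


bm25_hits = ["d1", "d2", "d3"]
dense_hits = ["d3", "d1", "d4"]
fused = rrf_fuse([bm25_hits, dense_hits])
```

<p>Documents that appear in both lists (<code class="language-plaintext highlighter-rouge">d1</code>, <code class="language-plaintext highlighter-rouge">d3</code>) rise above those found by only one retriever, which is the behaviour hybrid search is after.</p>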

<h4 id="layer4--engine-adapter">Layer 4 — Engine Adapter</h4>

<p>The adapter translates the internal query IR into what the specific engine understands. Solr gets <code class="language-plaintext highlighter-rouge">{!bool should=$lex should=$knn}</code>, OpenSearch gets a <code class="language-plaintext highlighter-rouge">hybrid</code> pipeline, Qdrant gets a <code class="language-plaintext highlighter-rouge">query_points</code> call with prefetch. The adapter must contain <em>no</em> business logic — that belongs in the orchestrator or reranking. The adapter is dumb and mechanical; that’s its virtue.</p>
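<p>For Solr, the translation can be as small as building a parameter dictionary. A sketch under assumptions: the field names <code class="language-plaintext highlighter-rouge">title</code>, <code class="language-plaintext highlighter-rouge">body</code> and <code class="language-plaintext highlighter-rouge">embedding</code> and the parameter names <code class="language-plaintext highlighter-rouge">lex</code>, <code class="language-plaintext highlighter-rouge">knn</code> and <code class="language-plaintext highlighter-rouge">user_text</code> are hypothetical; check the exact parser options against the reference guide for your Solr version.</p>

```python
import json


def to_solr_params(text, vector, filters, top_k=50, vector_field="embedding"):
    """Translate a simplified query IR into Solr request parameters."""
    knn = f"{{!knn f={vector_field} topK={top_k}}}{json.dumps(vector)}"
    return {
        # Lexical and vector clauses are combined as optional SHOULD clauses.
        "q": "{!bool should=$lex should=$knn}",
        # v=$user_text dereferences the raw user input from its own parameter.
        "lex": "{!edismax qf='title body' v=$user_text}",
        "user_text": text,
        "knn": knn,
        "fq": list(filters),
        "rows": 10,
    }
```

<p>Note what is absent: no intent handling, no boosts, no thresholds. Everything that looks like a decision arrives already made in the arguments.</p>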

<h4 id="layer5--reranking-as-its-own-stage">Layer 5 — Reranking as Its Own Stage</h4>

<p>The most important <em>additional</em> service in this stack — and the one most teams get the biggest relevance gain from. In practice you run two tracks here: a fast cross-encoder or ColBERT for the default path (~50–100 ms on top-50), and optionally an LLM reranker for high-value queries. Reranker outputs are cacheable by <code class="language-plaintext highlighter-rouge">(query_hash, doc_id)</code> pair.</p>
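<p>The caching idea can be sketched as follows. The scoring function is a stand-in for the real cross-encoder call, and the class and method names are assumptions; a production key would likely also need a model version and a document version so stale scores expire.</p>

```python
import hashlib


class RerankCache:
    """Cache reranker scores keyed by (query_hash, doc_id)."""

    def __init__(self, score_fn):
        self._score_fn = score_fn  # stand-in for the cross-encoder call
        self._scores = {}
        self.misses = 0

    @staticmethod
    def query_hash(query):
        # Normalize lightly so trivially different spellings share entries.
        normalized = query.strip().lower()
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

    def score(self, query, doc_id, doc_text):
        key = (self.query_hash(query), doc_id)
        if key not in self._scores:
            self.misses += 1
            self._scores[key] = self._score_fn(query, doc_text)
        return self._scores[key]

    def rerank(self, query, docs):
        """docs: list of (doc_id, text) pairs. Returns doc ids, best first."""
        scored = [(self.score(query, doc_id, text), doc_id) for doc_id, text in docs]
        return [doc_id for _, doc_id in sorted(scored, key=lambda s: -s[0])]
```

<p>Repeated head queries then cost one hash lookup per candidate instead of one model forward pass.</p>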

<h4 id="layer6--result-composition">Layer 6 — Result Composition</h4>

<p>The SERP is assembled here. Hits from the reranker, facets from the engine, highlights, optionally a generative answer (RAG with the top-3 hits as context), possibly a dynamic UI hint. This layer will grow the most in the next several years, because generative UI and AI-Overview-style features plug in here. That’s exactly why it must not touch the engine directly.</p>

<h2 id="filters">Filters and Facets — The Second Path</h2>

<p>The layers above optimize the <em>relevance path</em>: full-text query in, ranked hits out. Filters and facets are different in nature — deterministic, set-based, and need neither embeddings nor reranking. A future-proof architecture treats them as a second, leaner path through the same stack.</p>

<pre>Browse path (filters, no full-text query)
─────────────────────────────────────────

┌──────────────────────────────┐
│ Client (Web, App, Agent,MCP) │
│ Filters+facets, no q         │
└──────────────┬───────────────┘
               │
               ▼
┌──────────────────────────────┐
│   Search BFF / Domain API    │
│ /search?filter=…&amp;facet=…     │
└──────────────┬───────────────┘
               │ (Query Understanding,
               │  Orchestrator, Reranking
               │  are skipped)
               ▼
┌──────────────────────────────┐
│        Engine adapter        │     ┌──────────────────────────┐
│ Filters as pre-filter        │ ◀── │ Facet-only / Suggest     │
│ Facet aggregations           │     │ (cacheable, separate)    │
└──────────────┬───────────────┘     └──────────────────────────┘
               │
               ▼
┌──────────────────────────────┐
│        Search engine         │
│ Filters + Facet in 1 round   │
└──────────────┬───────────────┘
               │
               ▼
┌──────────────────────────────┐
│      Result composition      │
│ Hits + Facets + Selections   │
└──────────────────────────────┘</pre>

<h3 id="where-filters-and-facets-land-in-the-layers">Where Filters and Facets Land in the Layers</h3>

<ul>
  <li><strong>Domain API:</strong> Filters and facets must be <em>first-class</em>. <code class="language-plaintext highlighter-rouge">/search?q=...&amp;filter[category]=trick&amp;facet[]=brand</code>. The response format needs its own <code class="language-plaintext highlighter-rouge">facets</code> section with buckets, counts, and the currently active selection.</li>
  <li><strong>Query Understanding</strong> does noticeably less here. The exception is <em>natural-language filter extraction</em>: “cheap BMX bikes under 500 euros” should become <code class="language-plaintext highlighter-rouge">{q: "BMX bikes", filter: {price: "&lt;500"}}</code>.</li>
  <li><strong>Retrieval Orchestrator:</strong> the browse path forks from the relevance path. Pure filter queries need no hybrid fusion, no RRF, no embeddings.</li>
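  <li><strong>Engine Adapter:</strong> the most important technical pitfall is here. Filters must reach the KNN search as a <em>pre-filter</em>, not as a <em>post-filter</em>; otherwise the engine selects the top-K nearest neighbours first and the filter can then discard most of them. Solr’s <code class="language-plaintext highlighter-rouge">knn</code> query parser applies the request’s <code class="language-plaintext highlighter-rouge">fq</code> filters as pre-filters by default and accepts an explicit <code class="language-plaintext highlighter-rouge">preFilter</code> local parameter, OpenSearch supports efficient filtering via a <code class="language-plaintext highlighter-rouge">filter</code> clause inside the <code class="language-plaintext highlighter-rouge">knn</code> query, Qdrant via native filter conditions.</li>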
  <li><strong>Reranking</strong> is skipped in the browse path. Reranking a purely filtered list with no query is pointless.</li>
  <li><strong>Result Composition</strong> builds the facet UI from the engine’s buckets — and decides <em>which</em> facets are displayed (sticky, conditional, hierarchical).</li>
</ul>
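<p>The natural-language filter extraction from the list above can start as a plain rule before any LLM is involved. A minimal sketch that handles only the price-cap pattern of the example; the regular expressions and the function name are illustrative assumptions, not a complete grammar.</p>

```python
import re

# Matches "under 500 euros", "below 500 EUR", "under 500 €" (illustrative only).
_PRICE_CAP = re.compile(r"\b(?:under|below)\s+(\d+)\s*(?:euros?|eur|€)", re.IGNORECASE)
# Words that signal price sensitivity but carry no retrieval value themselves.
_PRICE_WORDS = re.compile(r"\b(?:cheap|affordable)\b", re.IGNORECASE)


def extract_price_filter(query):
    """Split a free-text query into the remaining text and a price filter."""
    filters = {}
    match = _PRICE_CAP.search(query)
    if match:
        filters["price"] = f"<{match.group(1)}"
        query = query[: match.start()] + query[match.end():]
    query = _PRICE_WORDS.sub("", query)
    # Collapse the whitespace left behind by the removed spans.
    return {"q": " ".join(query.split()), "filter": filters}
```

<p>Explicit numeric constraints like this one are the low-risk case: the user stated the bound, so nothing is being guessed.</p>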

<h3 id="do-facets-have-to-come-from-the-search-engine">Do Facets Have to Come from the Search Engine?</h3>

<p>The honest answer: usually yes, but not necessarily. The dividing line runs along the question of whether the facet refers to the <em>current result set</em> or to the <em>full corpus or analytics data</em>.</p>

<p>Facets that aggregate over the current search/filter result set, the classic refinement facets, <strong>should come from the engine</strong>. Global counts and analytics-style aggregates <strong>don’t have to</strong>; those belong in an OLAP store like ClickHouse or Druid.</p>

<blockquote>
  <p>The decisive question: does the facet count the <strong>current result set</strong>, or the <strong>whole corpus</strong>? Result set → engine. Corpus → can live elsewhere.</p>
</blockquote>

<h3 id="when-filters-cost-you-recall--and-how-llms-change-the-calculus">When Filters Cost You Recall — and How LLMs Change the Calculus</h3>

<p>There’s a rule every senior search engineer has evangelized at some point: <strong>do not pre-select filters from a free-text query.</strong> The reasoning is sound. Pre-selecting filters destroys recall in two ways at once. <em>Misclassification:</em> the system infers a filter the user didn’t intend, and correct documents disappear. <em>Missing attribute data:</em> documents that <em>would</em> match are tagged inconsistently or not at all in the filter field. The user sees a thin, wrong result set and walks.</p>

<p>The question is whether LLMs change the calculus, and the honest answer is: yes, but only with deliberate safeguards. Three patterns make filter inference safer:</p>

<ul>
  <li><strong>Filters as boosts, not gates.</strong> <code class="language-plaintext highlighter-rouge">boost:underwire=false^3.0</code> instead of <code class="language-plaintext highlighter-rouge">filter:underwire=false</code>. Trades a small amount of precision for a meaningful amount of safety.</li>
  <li><strong>Confidence-aware filter application.</strong> The LLM returns a confidence score with each candidate. High-confidence numeric constraints → filters. Lower-confidence semantic inferences → boosts or omitted.</li>
  <li><strong>Agentic iteration with recall checks.</strong> Apply the inferred filter, look at the result count. If the result set collapsed below threshold, drop the filter and re-run. The orchestrator detects the failure mode and self-corrects.</li>
</ul>
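<p>The three patterns combine naturally in the orchestrator. A sketch under assumptions: the thresholds (0.9, 0.6), the boost factor, the minimum hit count and the <code class="language-plaintext highlighter-rouge">search</code> callable are all hypothetical placeholders to be tuned against real data.</p>

```python
from dataclasses import dataclass


@dataclass
class InferredFilter:
    field: str
    value: str
    confidence: float  # as reported by the query-understanding step


def plan_constraints(candidates, gate_at=0.9, boost_at=0.6, boost=3.0):
    """Split inferred filters into hard filters and soft boosts."""
    filters, boosts = [], []
    for c in candidates:
        clause = f"{c.field}:{c.value}"
        if c.confidence >= gate_at:
            filters.append(clause)
        elif c.confidence >= boost_at:
            boosts.append(f"{clause}^{boost}")
        # Below boost_at the inference is dropped entirely.
    return filters, boosts


def search_with_recall_check(search, candidates, min_hits=5):
    """Apply inferred filters, then drop them if the result set collapses."""
    filters, boosts = plan_constraints(candidates)
    hits = search(filters, boosts)
    if filters and len(hits) < min_hits:
        # Recall check failed: re-run without the hard filters.
        hits = search([], boosts)
    return hits
```

<p>The second engine call only happens on the failure path, so the extra latency is paid by the queries that would otherwise have returned a thin result set.</p>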

<blockquote>
  <p>The old rule “never pre-select filters from a free-text query” wasn’t wrong — it was right for a system that <strong>couldn’t recover.</strong> With LLM-driven query understanding and an orchestrator that can iterate, the rule becomes “pre-select as <strong>boosts</strong>, with <strong>confidence</strong>, with a <strong>fallback path</strong>.” Same caution, more tools.</p>
</blockquote>

<h3 id="consequences-for-engine-choice">Consequences for Engine Choice</h3>

<p>Solr and OpenSearch have the most mature faceting engines in the Lucene family. If a use case is heavily browse- and filter-driven (product catalog, classical site search), a Lucene-based engine remains the natural choice. For RAG-centric use cases where facets play a minor role, the weaker faceting of the vector-first databases is acceptable.</p>

<blockquote class="is-red">
  <p>The split between <strong>relevance path</strong> and <strong>browse path</strong> belongs in the orchestrator — not the adapter, not the composer. Miss it, and pure filter queries push embeddings through the stack for nothing.</p>
</blockquote>
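<p>The fork itself is a few lines in the orchestrator. A minimal sketch; the stage names are illustrative labels for the layers described earlier, not an existing API.</p>

```python
from enum import Enum


class Path(Enum):
    RELEVANCE = "relevance"
    BROWSE = "browse"


def choose_path(q):
    """Pick the browse path when there is no full-text query to rank against."""
    has_text = bool(q and q.strip())
    return Path.RELEVANCE if has_text else Path.BROWSE


def plan_stages(q):
    """Return the stages the orchestrator runs for this request."""
    if choose_path(q) is Path.BROWSE:
        # No embeddings, no fusion, no reranking: filters and facets only.
        return ["adapter", "engine", "composition"]
    return [
        "query_understanding",
        "embedding",
        "retrieval",
        "fusion",
        "reranking",
        "composition",
    ]
```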

<h2 id="autosuggest">Autosuggest — The Underestimated Lever</h2>

<p>Autosuggest is not an afterthought in e-commerce. Vinted reports that over 20% of all search sessions now <em>start</em> with a click on a suggest result — a few years ago it was below 8%. The system handles 4,700 queries per second with P99 of 31 ms against a pool of 125 million suggestions. That’s not UX polish, that’s a direct conversion lever.</p>

<h3 id="autosuggest-is-not-ordinary-search">Autosuggest Is Not Ordinary Search</h3>

<ul>
  <li><strong>Latency class:</strong> P99 below ~30–50 ms against a large suggestion pool, on every keystroke. Full-text search may take 200 ms; suggest may not.</li>
  <li><strong>Load profile:</strong> 5–8 suggest calls per submitted search — suggest QPS is typically 5–10× higher than search QPS.</li>
  <li><strong>Its own index, its own ranking logic:</strong> we rank <em>queries</em>, not documents. At Vinted, query-log candidates make up only 2% of the pool but generate about half of all clicks.</li>
  <li><strong>Ranking signals differ:</strong> not BM25 + vector but STR (sell-through rate), suggestion CTR, prefix-level click frequency, and crucially: <em>input length</em>.</li>
  <li><strong>Its own fallback logic:</strong> progressive relaxation — exact prefix → fuzzy(1) → fuzzy(2) — with stop-as-soon-as-10-results.</li>
</ul>
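<p>Progressive relaxation is easy to state in code. The sketch below scans an in-memory pool that is assumed to be pre-sorted by popularity; a real system would use an FST or an index instead of a linear scan, and the function names are assumptions.</p>

```python
def prefix_edit_distance(typed, candidate):
    """Smallest Levenshtein distance between `typed` and any prefix of `candidate`."""
    prev = list(range(len(candidate) + 1))  # distance from "" to candidate[:j]
    for i, ch in enumerate(typed, start=1):
        curr = [i]
        for j, cj in enumerate(candidate, start=1):
            cost = 0 if ch == cj else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return min(prev)


def suggest(typed, pool, limit=10):
    """Exact prefix first, then fuzzy(1), then fuzzy(2); stop once `limit` is reached."""
    typed = typed.lower()
    results = []
    for max_edits in (0, 1, 2):
        for candidate in pool:
            if candidate in results:
                continue
            if prefix_edit_distance(typed, candidate.lower()) <= max_edits:
                results.append(candidate)
                if len(results) >= limit:
                    return results  # enough suggestions, skip looser tiers
    return results
```

<p>Because exact matches are collected before any fuzzy tier runs, a correctly typed prefix never gets diluted by typo-tolerant candidates.</p>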

<blockquote>
  <p>Suggest is its own <strong>subsystem</strong> — its own index, its own latency class, its own ranking model. Treat it as a setting on full-text search and you build a feature. Architect it as its own path and you build a <strong>conversion lever</strong>.</p>
</blockquote>

<h3 id="what-solr10-brings-to-the-table">What Solr 10 Brings to the Table</h3>

<p>Solr has traditionally had a rich suggester infrastructure. The building blocks are solid, but the gap to the Vinted/Vespa reference architecture is real.</p>

<p><strong>Existing building blocks in Solr 10:</strong> <code class="language-plaintext highlighter-rouge">AnalyzingInfixSuggester</code> and <code class="language-plaintext highlighter-rouge">BlendedInfixSuggester</code> (Lucene-based, with a real analyzer chain); <code class="language-plaintext highlighter-rouge">FuzzySuggester</code> for Levenshtein-based typo tolerance; <code class="language-plaintext highlighter-rouge">WFSTCompletionLookup</code> / <code class="language-plaintext highlighter-rouge">FSTCompletionLookup</code> for very fast FST-based lookups (FSTLookupFactory is the new default in 10); EdgeNGram field type as a manual path; context filtering; chained suggesters mapping the tier architecture; mature LTR with vector features since 9.3.</p>

<p><strong>Where Solr 10 falls structurally behind Vespa:</strong> LTR in the hot path on every keystroke is possible but uncomfortable. Real-time feature store for user features is missing. Accent tolerance with intent preservation isn’t out-of-the-box. Streaming-mode indexing for suggest-pool updates is doable but not the standard path.</p>

<h3 id="concrete-recommendation-for-building-autosuggest">Concrete Recommendation for Building Autosuggest</h3>

<p>If the customer today has Solr 9 with rudimentary suggest and wants to raise the level with Solr 10, I would <em>not</em> start with LTR. Vinted’s data are very clear: the biggest lever wasn’t ML reranking, but adding query-log candidates to the pool.</p>

<ol>
  <li><strong>Raise the baseline:</strong> BlendedInfixSuggester on a dedicated suggest core, pool from product metadata + search logs, simple heuristic, progressive relaxation in two tiers. A 2–3 week project, probably captures 80% of the Vinted effect.</li>
  <li><strong>Build out tier matching and measure:</strong> add the third fuzzy tier, set up A/B tests. Tune Solr suggest performance to P99 &lt; 30 ms.</li>
  <li><strong>Personalization via reranker service:</strong> only then add LightGBM reranking as its own stage. Start with few, high-impact features.</li>
  <li><strong>Session awareness and personal history</strong> as API features, no engine changes needed.</li>
</ol>

<blockquote>
  <p>The biggest suggest lever isn’t the <strong>model</strong> — it’s the <strong>pool</strong>. Real user queries from your search logs beat any personalization you can bolt on top.</p>
</blockquote>

<h2 id="indexing">The Indexing Path</h2>

<p>The indexing pipeline is <em>the</em> place where embedding discipline is decided. Three rules:</p>

<ul>
  <li><strong>Embeddings are versioned.</strong> Every embedding carries a model tag (<code class="language-plaintext highlighter-rouge">bge-m3-v1</code>, <code class="language-plaintext highlighter-rouge">e5-large-v2</code>). When you change the model, dual-write runs: all new documents get both embeddings, the backfill runs in the background, and only when 100% coverage is reached does the query side switch over.</li>
  <li><strong>Embedding generation as its own service</strong>, not as an engine plugin. Solr 10’s <code class="language-plaintext highlighter-rouge">TextToVectorQParser</code> is tempting, but it binds the embedding logic to the engine. Better: a small dedicated service called both from the indexing pipeline and from query understanding. Same model on both sides — that’s the point that often goes wrong.</li>
  <li><strong>The pipeline is declarative</strong>, ideally CDC-driven. A document update in the CMS → an event → the pipeline normalizes, chunks, embeds, indexes. No cron job, no “reindex button.”</li>
</ul>
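<p>The versioning rule boils down to two small checks: write every active model at index time, and switch the query side only at full coverage. A sketch with in-memory documents; the class and function names are assumptions, and the embedders are placeholders for calls to the embedding service.</p>

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    doc_id: str
    text: str
    # model tag -> vector, e.g. {"e5-large-v2": [...], "bge-m3-v1": [...]}
    embeddings: dict = field(default_factory=dict)


def index_document(doc, embedders):
    """Dual-write: embed with every model that is currently being written."""
    for tag, embed in embedders.items():
        doc.embeddings[tag] = embed(doc.text)
    return doc


def coverage(docs, model_tag):
    """Fraction of documents that already carry an embedding for model_tag."""
    if not docs:
        return 0.0
    return sum(model_tag in d.embeddings for d in docs) / len(docs)


def active_model(docs, current, candidate):
    """The query side switches to the candidate only at 100% coverage."""
    return candidate if coverage(docs, candidate) == 1.0 else current
```

<p>Query understanding asks <code class="language-plaintext highlighter-rouge">active_model</code> which tag to embed with, so query and index vectors always come from the same model.</p>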

<h3 id="the-central-cross-layer--evaluation">The Central Cross-Layer — Evaluation</h3>

<p>This is the service 80% of teams forget, and it offers the most leverage. Three components:</p>

<ul>
  <li>A <strong>goldset</strong> with query → expected top-K, maintained by the business side. A nightly job computes NDCG, MRR, recall@K with explicit metric targets.</li>
  <li>An <strong>A/B infrastructure</strong> running two configurations in parallel and measuring online metrics (CTR, position of first click, reformulation rate, zero-result rate).</li>
  <li><strong>Structured telemetry</strong> linking every click to the query IR active at the time and the displayed hit list. This is simultaneously the training-data pipeline for later learning-to-rank.</li>
</ul>
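<p>The offline metrics for the nightly goldset job fit in a few functions. A minimal sketch using the standard definitions (log2 discount for NDCG); the function names and the shape of the goldset entries are assumptions.</p>

```python
import math


def recall_at_k(ranked, relevant, k):
    """Share of relevant documents that appear in the top k."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]).intersection(relevant)) / len(relevant)


def reciprocal_rank(ranked, relevant):
    """1 / rank of the first relevant hit, 0.0 if none is found."""
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(ranked, gains, k):
    """NDCG@k with graded gains (doc_id -> gain) and a log2 discount."""

    def dcg(values):
        return sum(g / math.log2(pos + 1) for pos, g in enumerate(values, start=1))

    actual = dcg([gains.get(doc_id, 0.0) for doc_id in ranked[:k]])
    ideal = dcg(sorted(gains.values(), reverse=True)[:k])
    return actual / ideal if ideal > 0 else 0.0
```

<p>MRR is the mean of <code class="language-plaintext highlighter-rouge">reciprocal_rank</code> over all goldset queries; the nightly job compares each aggregate against its target and fails loudly on regression.</p>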

<p>Without this layer you can’t measure improvements, and without measurement every change becomes an act of faith.</p>

<h2 id="together">Putting It Together — What This Means for Solr 10</h2>

<p>In the customer’s context: Solr is the engine in the “Search engine” box. Around it sit an embedding service (standalone), query understanding (standalone, initially simple), a retrieval orchestrator (initially thin, just hybrid + RRF), an adapter (Solr-specific), a reranker (standalone, with a cross-encoder), and composition (standalone). The Search BFF sits at the top.</p>

<p>If you build it this way, a later switch to OpenSearch costs only the adapter swap and a reindex, no replatforming. A switch to Vespa costs more — parts of the orchestrator and reranker migrate into Vespa, because Vespa does this natively — but the domain API and the client stay untouched.</p>

<blockquote class="is-red">
  <p>Keep the <strong>domain API</strong>, the <strong>reranker</strong>, and the <strong>composer</strong> Solr-free. Those three layers clean, everything else is fixable. Those three layers dirty, nothing is.</p>
</blockquote>

<p>The honest answer to “which engine ages best” is therefore: the one you hard-wire the least.</p>

<h2 id="sources">Sources for Deeper Research</h2>

<h4 id="engine-documentation-and-releases">Engine documentation and releases</h4>

<ul>
  <li><a href="https://solr.apache.org/guide/solr/latest/query-guide/suggester.html" target="_blank" rel="noopener">Apache Solr 10 Reference Guide — Suggester</a></li>
  <li><a href="https://solr.apache.org/guide/solr/latest/upgrade-notes/major-changes-in-solr-10.html" target="_blank" rel="noopener">Major Changes in Solr 10</a></li>
  <li><a href="https://sease.io/category/apache-solr" target="_blank" rel="noopener">Sease.io blog on Solr vector search and KNN optimization</a></li>
  <li><a href="https://docs.vespa.ai/" target="_blank" rel="noopener">Vespa.ai documentation</a> and <a href="https://blog.vespa.ai/" target="_blank" rel="noopener">Vespa blog</a></li>
  <li><a href="https://qdrant.tech/articles/" target="_blank" rel="noopener">Qdrant articles</a> — BM42, RRF, DBSF, hybrid reranking patterns</li>
  <li><a href="https://www.elastic.co/guide/en/elasticsearch/reference/current/search-as-you-type.html" target="_blank" rel="noopener">Elasticsearch search-as-you-type field type</a></li>
</ul>

<h4 id="e-commerce-search-field-reports">E-commerce search field reports</h4>

<ul>
  <li><a href="https://vinted.engineering/2026/04/22/personalized-search-autocomplete/" target="_blank" rel="noopener">Vinted Engineering: How Vinted Serves Personalised Search Autocomplete</a></li>
  <li><a href="https://vinted.engineering/2024/09/05/goodbye-elasticsearch-hello-vespa/" target="_blank" rel="noopener">Vinted Engineering: Goodbye Elasticsearch, Hello Vespa</a></li>
  <li><a href="https://spinscale.de/posts/2023-01-18-mirror-mirror-what-am-i-typing-next.html" target="_blank" rel="noopener">Alexander Reelsen: Mirror, mirror, what am I typing next?</a></li>
  <li><a href="https://spinscale.de/posts/2020-06-22-implementing-a-modern-ecommerce-search.html" target="_blank" rel="noopener">Alexander Reelsen: Implementing a Modern E-Commerce Search</a></li>
  <li><a href="https://pureinsights.com/blog/2025/elasticsearch-vs-opensearch-2025/" target="_blank" rel="noopener">Pureinsights: Elasticsearch vs OpenSearch in 2025</a></li>
</ul>

<h4 id="research-and-curated-collections">Research and curated collections</h4>

<ul>
  <li><a href="https://github.com/frutik/awesome-search" target="_blank" rel="noopener">frutik/awesome-search</a></li>
  <li><a href="https://softwaredoug.com/blog/2026/05/11/the-new-agentic-search-models" target="_blank" rel="noopener">Doug Turnbull: Agentic Search Models</a></li>
  <li><a href="https://www.sid.ai/research/sid-1" target="_blank" rel="noopener">SID-1 research note</a></li>
  <li><a href="https://www.glean.com/blog/waldo-launch" target="_blank" rel="noopener">Glean: Waldo launch</a></li>
  <li><a href="https://arxiv.org/abs/2407.01449" target="_blank" rel="noopener">ColPali: Efficient Document Retrieval with Vision Language Models</a></li>
  <li><a href="https://arxiv.org/abs/2204.10936" target="_blank" rel="noopener">Counterfactual Learning to Rank for Utility-Maximizing Query Autocompletion</a></li>
  <li><a href="https://seekstorm.com/blog/pruning-radix-trie/" target="_blank" rel="noopener">Pruning Radix Trie (Wolf Garbe, SeekStorm Blog)</a></li>
  <li><a href="https://lucidworks.com/post/auto-suggest-from-popular-queries-using-edgengrams/" target="_blank" rel="noopener">Lucidworks: Auto-Suggest from Popular Queries Using EdgeNGrams</a></li>
  <li><a href="https://sigir-ecom.github.io/" target="_blank" rel="noopener">SIGIR Workshop on eCommerce</a></li>
</ul>

<aside class="process-note">
  <span class="process-note__label">Colophon</span>
  <h4 class="process-note__head">How this came together</h4>
  <p>
    This document is the residue of <strong>a couple of weeks of musing
    with Claude</strong> and discussing architecture ideas with
    <strong>Tobias Kässmann</strong>.
  </p>
  <p>
    What you&rsquo;re reading is a <em>living document</em>. As Solr&nbsp;10
    matures, as the agentic layer shifts, as the field reports keep
    landing, the recommendation gets re-tested and the prose gets re-cut.
    The thinking is mine; the iteration speed and the willingness to argue
    against yesterday&rsquo;s position are not entirely.
  </p>
  <p class="process-note__sig">
    <span>— Torsten Bøgh Köster, May&nbsp;2026</span>
    <span class="process-note__stamp">Published&nbsp;May&nbsp;18,&nbsp;2026</span>
  </p>
</aside>]]></content><author><name>Torsten Bøgh Köster</name></author><summary type="html"><![CDATA[An honest 2026 assessment of open-source hybrid-search options on top of Apache Solr — Solr 10, OpenSearch, Elasticsearch, Vespa, Qdrant/Weaviate — and the layered architecture that keeps the engine replaceable.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://solr.cool/assets/social-card.png" /><media:content medium="image" url="https://solr.cool/assets/social-card.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry></feed>