Rerank with RRF (Reciprocal Rank Fusion)

RerankRRF combines multiple ranked lists using only each item’s rank position, so it requires no score calibration. It’s particularly effective for hybrid search, where you want to merge results from different search methods such as vector search and BM25 keyword search.
::RerankRRF()           // Uses default k=60
::RerankRRF(k: 60)      // Custom k parameter
When using the SDKs or curling the endpoint, the query name must match what is defined in the queries.hx file exactly.

How It Works

RRF uses a simple but effective formula to combine rankings:
RRF_score(d) = Σ 1/(k + rank_i(d))
Where:
  • d is a document/result
  • rank_i(d) is the rank of document d in the i-th result list
  • k is a constant that controls the impact of ranking differences
The algorithm works by:
  1. Taking multiple ranked lists of results
  2. For each item, calculating its RRF score based on its positions across all lists
  3. Re-ranking items by their combined RRF scores
As a result, items that rank highly in multiple lists are boosted.
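The steps above can be sketched in a few lines of Python. This is an illustrative sketch, not HelixDB's implementation: the function name `rrf_fuse`, the 1-based ranks, the tie-breaking rule, and the sample document IDs are all assumptions made for the example.

```python
from collections import defaultdict


def rrf_fuse(ranked_lists, k=60.0):
    """Fuse ranked lists of item IDs with Reciprocal Rank Fusion.

    Assumes 1-based ranks; an item absent from a list contributes
    nothing for that list.
    """
    scores = defaultdict(float)
    for ranked in ranked_lists:
        for rank, item in enumerate(ranked, start=1):
            scores[item] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by item ID for determinism.
    return sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))


# Hypothetical result lists, best match first.
vector_hits = ["doc_a", "doc_b", "doc_c"]   # e.g. from a vector search
keyword_hits = ["doc_b", "doc_d", "doc_a"]  # e.g. from a BM25 search

fused = rrf_fuse([vector_hits, keyword_hits])
for item, score in fused:
    print(f"{item}: {score:.5f}")
```

In this sample, `doc_b` (ranks 2 and 1) ends up ahead of `doc_a` (ranks 1 and 3), and both outrank `doc_c` and `doc_d`, which each appear in only one list.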

When to Use RerankRRF

Use RerankRRF when you want to:
  • Merge multiple search methods: Combine vector search with BM25, or multiple vector searches with different embeddings
  • Avoid score normalization: RRF doesn’t require calibrating or normalizing scores from different search systems
  • Boost consensus results: Items that rank highly across multiple search strategies will be prioritized
  • Implement hybrid search: Create a unified ranking from complementary search approaches

Parameters

k (optional, default: 60)

Controls how much weight is given to ranking position:
  • Higher values (e.g., 80-100): Reduce the impact of ranking differences, treat all highly-ranked items more equally
  • Lower values (e.g., 20-40): Emphasize top-ranked items more strongly, create sharper distinctions
  • Default (60): Provides a balanced approach suitable for most use cases
Start with the default value of 60 and adjust based on your specific data distribution and requirements.
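To get a feel for what k does, it can help to compare how much a rank-1 position contributes relative to a rank-10 position. The sketch below only evaluates the formula from "How It Works"; the helper names and the 1-based ranks are assumptions for illustration.

```python
def rrf_term(rank, k):
    """Contribution of a single list to an item's RRF score (1-based rank)."""
    return 1.0 / (k + rank)


def top_vs_tenth(k):
    """Ratio of the rank-1 contribution to the rank-10 contribution."""
    return rrf_term(1, k) / rrf_term(10, k)


for k in (20.0, 60.0, 100.0):
    print(f"k={k:>5}: rank 1 counts {top_vs_tenth(k):.3f}x as much as rank 10")
```

The ratio falls from roughly 1.43 at k=20 to 1.15 at k=60 and 1.09 at k=100, which is the flattening effect described above.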

Example 1: Basic reranking with default parameters

QUERY SearchDocuments(query_vec: [F64]) =>
    results <- SearchV<Document>(query_vec, 100)
        ::RerankRRF()
        ::RANGE(0, 10)
    RETURN results

QUERY InsertDocument(vector: [F64], title: String, content: String) =>
    document <- AddV<Document>(vector, {
        title: title,
        content: content
    })
    RETURN document
Here’s how to run the query using the Python SDK:
from helix.client import Client

client = Client(local=True, port=6969)

documents = [
    {"vector": [0.1, 0.2, 0.3, 0.4], "title": "Machine Learning Basics", "content": "Introduction to ML"},
    {"vector": [0.2, 0.3, 0.4, 0.5], "title": "Deep Learning", "content": "Neural networks explained"},
    {"vector": [0.15, 0.25, 0.35, 0.45], "title": "AI Applications", "content": "Real-world AI uses"},
]

for doc in documents:
    client.query("InsertDocument", doc)

query_vector = [0.12, 0.22, 0.32, 0.42]
results = client.query("SearchDocuments", {"query_vec": query_vector})
print("Search results:", results)

Example 2: Custom k parameter for more aggressive reranking

QUERY SearchWithCustomK(query_vec: [F64]) =>
    results <- SearchV<Document>(query_vec, 100)
        ::RerankRRF(k: 30.0)
        ::RANGE(0, 20)
    RETURN results

QUERY InsertDocument(vector: [F64], title: String, content: String) =>
    document <- AddV<Document>(vector, {
        title: title,
        content: content
    })
    RETURN document
Here’s how to run the query using the Python SDK:
from helix.client import Client

client = Client(local=True, port=6969)

documents = [
    {"vector": [0.1, 0.2, 0.3, 0.4], "title": "Python Tutorial", "content": "Learn Python basics"},
    {"vector": [0.2, 0.3, 0.4, 0.5], "title": "Advanced Python", "content": "Python design patterns"},
    {"vector": [0.15, 0.25, 0.35, 0.45], "title": "Python for Data Science", "content": "Using pandas and numpy"},
]

for doc in documents:
    client.query("InsertDocument", doc)

query_vector = [0.12, 0.22, 0.32, 0.42]
results = client.query("SearchWithCustomK", {"query_vec": query_vector})
print("Search results with custom k:", results)

Example 3: Dynamic k parameter from query input

QUERY FlexibleRerank(query_vec: [F64], k_value: F64) =>
    results <- SearchV<Document>(query_vec, 100)
        ::RerankRRF(k: k_value)
        ::RANGE(0, 10)
    RETURN results

QUERY InsertDocument(vector: [F64], title: String, content: String) =>
    document <- AddV<Document>(vector, {
        title: title,
        content: content
    })
    RETURN document
Here’s how to run the query using the Python SDK:
from helix.client import Client

client = Client(local=True, port=6969)

documents = [
    {"vector": [0.1, 0.2, 0.3, 0.4], "title": "Web Development", "content": "HTML, CSS, JavaScript"},
    {"vector": [0.2, 0.3, 0.4, 0.5], "title": "Backend Engineering", "content": "APIs and databases"},
    {"vector": [0.15, 0.25, 0.35, 0.45], "title": "DevOps", "content": "CI/CD and infrastructure"},
]

for doc in documents:
    client.query("InsertDocument", doc)

query_vector = [0.12, 0.22, 0.32, 0.42]

# Try with different k values
for k in [30.0, 60.0, 90.0]:
    results = client.query("FlexibleRerank", {
        "query_vec": query_vector,
        "k_value": k
    })
    print(f"Results with k={k}:", results)

Best Practices

Retrieve Sufficient Candidates

Always fetch more results than you ultimately need to give RRF sufficient candidates to work with:
// Good: Fetch 100, return 10
SearchV<Document>(vec, 100)::RerankRRF()::RANGE(0, 10)

// Not ideal: Fetch 10, return 10
SearchV<Document>(vec, 10)::RerankRRF()::RANGE(0, 10)

Parameter Tuning

Start with the default k=60 and adjust based on your observations:
  • If results seem too similar, try a lower k (30-40) to emphasize top rankings
  • If you want more variety, try a higher k (80-100) to flatten differences
  • Test with real queries and evaluate result quality

Combining with Other Operations

RRF works well with filtering and other result operations:
QUERY AdvancedSearch(query_vec: [F64], min_score: F64) =>
    results <- SearchV<Document>(query_vec, 200)
        ::WHERE(_.distance::GT(min_score))  // Filter first
        ::RerankRRF()                        // Then rerank
        ::RANGE(0, 20)                       // Finally limit
    RETURN results

Performance Considerations

RRF is computationally efficient: the scoring pass is O(n) in the number of candidates (ordering the fused results adds a sort on top), which makes it suitable for real-time applications. However:
  • Larger candidate sets require more processing
  • Balance result quality with query latency
  • Consider caching frequently-accessed results
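One possible way to cache frequently repeated searches on the client side is sketched below. Everything here is hypothetical: `run_search` is a stand-in for whatever call actually executes the query (it is not a HelixDB API), and the cache size is arbitrary.

```python
from functools import lru_cache

calls = {"count": 0}  # counts how often the backend is actually hit


def run_search(query_vec, k):
    """Hypothetical stand-in for a real query call (e.g. an SDK request)."""
    calls["count"] += 1
    return [f"doc_{i}" for i in range(3)]


@lru_cache(maxsize=1024)
def _cached(query_key, k):
    # Store a tuple so callers cannot mutate the cached value.
    return tuple(run_search(list(query_key), k))


def cached_search(query_vec, k=60.0):
    # Lists are unhashable, so the vector is converted to a tuple for the cache key.
    return list(_cached(tuple(query_vec), k))


first = cached_search([0.12, 0.22, 0.32, 0.42])
second = cached_search([0.12, 0.22, 0.32, 0.42])  # served from the cache
print(first == second, calls["count"])
```

A cache like this returns stale results once new documents are inserted, so it would need to be cleared (for example with `_cached.cache_clear()`) or given an expiry policy whenever the underlying data changes.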

Use Cases

Product Discovery

Combine text search with visual similarity for product discovery:
SearchV<Product>(query_embedding, 100)::RerankRRF(k: 50)::RANGE(0, 20)

Document Retrieval

Merge semantic search with keyword matching for comprehensive document retrieval:
SearchV<Document>(semantic_vec, 150)::RerankRRF()::RANGE(0, 10)

Content Recommendation

Blend multiple recommendation signals (user preferences, popularity, recency):
SearchV<Content>(user_profile_vec, 100)::RerankRRF(k: 70)::RANGE(0, 15)