Keyword Search using BM25 with `SearchBM25`

Search for keywords in nodes using the BM25 ranking algorithm.

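SearchBM25<Type>(text, limit)

Here `Type` is the node type to search, `text` is the keyword query string, and `limit` is the maximum number of results to return.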

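BM25 is a ranking function used for full-text search. `SearchBM25` searches the text properties of nodes of the given type and ranks the matches by relevance. A node scores higher when it contains the query terms more often, when those terms are rare across all nodes, and, all else being equal, when its text is shorter.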
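To make the ranking concrete, here is a minimal, self-contained Python sketch of the standard Okapi BM25 formula. It is not HelixDB's implementation: the lowercase whitespace tokenizer and the common default parameters k1 = 1.2 and b = 0.75 are assumptions chosen for illustration.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Assumption: lowercase + whitespace split; HelixDB's tokenizer may differ.
    return text.lower().split()


def bm25_scores(query: str, docs: list[str], k1: float = 1.2, b: float = 0.75) -> list[float]:
    """Score every document against the query with Okapi BM25."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n  # average document length

    # Document frequency: how many documents contain each term.
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))

    scores = []
    for toks in tokenized:
        tf = Counter(toks)  # term frequency within this document
        dl = len(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            # Rare terms get a larger IDF weight (the +1 keeps it positive).
            idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1)
            freq = tf[term]
            # Term-frequency saturation (k1) and length normalization (b).
            score += idf * freq * (k1 + 1) / (freq + k1 * (1 - b + b * dl / avgdl))
        scores.append(score)
    return scores


docs = [
    "Machine learning algorithms for data analysis",
    "Introduction to artificial intelligence and neural networks",
    "Database optimization techniques and performance tuning",
    "Web development with modern JavaScript frameworks",
]
scores = bm25_scores("machine learning algorithms", docs)
ranked = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
print(docs[ranked[0]])  # Machine learning algorithms for data analysis
```

Only the first document contains any of the query terms, so it is the only one with a non-zero score here. Actual scores returned by HelixDB will depend on its own tokenizer and parameters.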

When calling a query through an SDK or with curl, the query name must exactly match the name defined in the queries.hx file.
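As a rough illustration of what a raw HTTP call looks like, the sketch below builds (but does not send) a request with Python's standard library. It assumes each query is exposed as `POST http://localhost:6969/<QueryName>` with the parameters sent as a JSON body, using the same port as the SDK examples; `build_query_request` is a hypothetical helper, so check your deployment before relying on that URL layout.

```python
import json
import urllib.request


def build_query_request(query_name: str, params: dict,
                        host: str = "localhost", port: int = 6969) -> urllib.request.Request:
    # Assumption: each query is served at POST /<QueryName>. The path must match
    # the QUERY name in queries.hx exactly, including case.
    url = f"http://{host}:{port}/{query_name}"
    body = json.dumps(params).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_query_request("SearchKeyword", {"keywords": "machine learning algorithms", "limit": 5})
print(req.get_method(), req.full_url)  # POST http://localhost:6969/SearchKeyword

# To send it against a running instance:
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read()))
```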

Example 1: Basic keyword search

QUERY SearchKeyword (keywords: String, limit: I64) =>
    documents <- SearchBM25<Document>(keywords, limit)
    RETURN documents

QUERY InsertDocument (content: String, created_at: Date) =>
    document <- AddN<Document>({ content: content, created_at: created_at })
    RETURN document

Here’s how to insert some sample documents and run the search with the Python SDK:

from datetime import datetime, timezone
from helix.client import Client

client = Client(local=True, port=6969)

sample_docs = [
    "Machine learning algorithms for data analysis",
    "Introduction to artificial intelligence and neural networks",
    "Database optimization techniques and performance tuning",
    "Web development with modern JavaScript frameworks"
]

for content in sample_docs:
    client.query("InsertDocument", {
        "content": content,
        "created_at": datetime.now(timezone.utc).isoformat(),
    })

result = client.query("SearchKeyword", {
    "keywords": "machine learning algorithms",
    "limit": 5
})

print(result)

Example 2: Keyword search with postfiltering

QUERY SearchRecentKeywords (keywords: String, limit: I64, cutoff_date: Date) =>
    searched_docs <- SearchBM25<Document>(keywords, limit)
    documents <- searched_docs::WHERE(_::{created_at}::GTE(cutoff_date))
    RETURN documents

QUERY InsertDocument (content: String, created_at: Date) =>
    document <- AddN<Document>({ content: content, created_at: created_at })
    RETURN document
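Because the WHERE clause in SearchRecentKeywords runs on the results that SearchBM25 has already selected, the query can return fewer than `limit` documents, or none at all, when the highest-ranked matches are older than the cutoff. The following Python sketch mimics that order of operations on in-memory data; it does not call HelixDB, and the document IDs and scores are made-up values for illustration.

```python
from datetime import datetime, timedelta, timezone


def search_then_filter(scored_docs, limit, cutoff):
    """Mimic SearchBM25 followed by WHERE: rank, truncate to `limit`, then filter."""
    # Step 1: keep only the top `limit` hits by score (what SearchBM25 returns).
    top_hits = sorted(scored_docs, key=lambda d: d["score"], reverse=True)[:limit]
    # Step 2: postfilter those hits on created_at >= cutoff (the GTE check).
    return [d for d in top_hits if d["created_at"] >= cutoff]


now = datetime.now(timezone.utc)
# Hypothetical documents: the two best keyword matches are also the oldest.
docs = [
    {"id": "old-1", "score": 9.0, "created_at": now - timedelta(days=15)},
    {"id": "old-2", "score": 8.0, "created_at": now - timedelta(days=15)},
    {"id": "new-1", "score": 7.0, "created_at": now},
    {"id": "new-2", "score": 6.0, "created_at": now},
]
cutoff = now - timedelta(days=10)

result = search_then_filter(docs, limit=2, cutoff=cutoff)
print([d["id"] for d in result])  # []
```

With `limit=2`, both recent documents are dropped before the filter ever sees them. Passing a larger `limit` to the search widens the candidate pool at the cost of scoring more nodes.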

Here’s how to insert recent and older documents and run the filtered search with the Python SDK:

from datetime import datetime, timezone, timedelta
from helix.client import Client

client = Client(local=True, port=6969)

recent_date = datetime.now(timezone.utc).isoformat()
old_date = (datetime.now(timezone.utc) - timedelta(days=15)).isoformat()

recent_docs = [
    "Modern machine learning techniques in 2024",
    "Latest artificial intelligence research papers"
]

for content in recent_docs:
    client.query("InsertDocument", {
        "content": content,
        "created_at": recent_date,
    })

old_docs = [
    "Traditional machine learning approaches from last year",
    "Historical AI development milestones"
]

for content in old_docs:
    client.query("InsertDocument", {
        "content": content,
        "created_at": old_date,
    })

cutoff_date = (datetime.now(timezone.utc) - timedelta(days=10)).isoformat()

result = client.query("SearchRecentKeywords", {
    "keywords": "machine learning artificial intelligence",
    "limit": 5,
    "cutoff_date": cutoff_date,
})

print(result)
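The dates in this script are chosen so the filter splits the documents cleanly: the cutoff is 10 days ago, the older documents are stamped 15 days ago, and the recent ones use the current time. A small Python check of that arithmetic, using the same offsets (the `passes_filter` helper is hypothetical and only mirrors the GTE comparison):

```python
from datetime import datetime, timedelta, timezone


def passes_filter(created_at: datetime, cutoff: datetime) -> bool:
    # Mirrors _::{created_at}::GTE(cutoff_date): keep if created on or after the cutoff.
    return created_at >= cutoff


now = datetime.now(timezone.utc)
recent_date = now
old_date = now - timedelta(days=15)
cutoff_date = now - timedelta(days=10)

print(passes_filter(recent_date, cutoff_date))  # True: created now
print(passes_filter(old_date, cutoff_date))     # False: 15 days old is before the 10-day cutoff
```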
