Keyword Search using BM25 with SearchBM25
Search for keywords in nodes using the BM25 ranking algorithm.
SearchBM25<Type>(text, limit)
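BM25 is a ranking function used for full-text search. SearchBM25 searches the text properties of nodes of the given Type and ranks matches by keyword relevance. The score combines how often each query term appears in a node (term frequency), how rare that term is across all nodes (inverse document frequency), and the length of the node's text, so short texts that repeat rare query terms rank highest. The limit argument caps how many results are returned.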
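To make the ranking concrete, here is a minimal, self-contained sketch of the classic Okapi BM25 formula in plain Python. It is not HelixDB's implementation: the tokenizer, the parameter values k1 = 1.2 and b = 0.75, and the helper names tokenize and bm25_scores are illustrative assumptions. It scores the sample documents from Example 1 against the same query.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Simplifying assumption: lowercase and split on non-alphanumeric characters.
    # HelixDB's real tokenizer may normalize text differently.
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_scores(query: str, docs: list[str], k1: float = 1.2, b: float = 0.75) -> list[float]:
    """Score every document against the query with the Okapi BM25 formula."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avgdl = sum(len(toks) for toks in tokenized) / n_docs  # average document length
    # Document frequency: how many documents contain each term at least once.
    df = Counter(term for toks in tokenized for term in set(toks))

    scores = []
    for toks in tokenized:
        tf = Counter(toks)  # term frequency within this document
        score = 0.0
        for term in tokenize(query):
            if tf[term] == 0:
                continue  # terms absent from the document contribute nothing
            # Rarer terms get a larger weight; the +1 keeps the IDF non-negative.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Repeated terms saturate (k1) and long documents are penalized (b).
            length_norm = 1 - b + b * len(toks) / avgdl
            score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * length_norm)
        scores.append(score)
    return scores


docs = [
    "Machine learning algorithms for data analysis",
    "Introduction to artificial intelligence and neural networks",
    "Database optimization techniques and performance tuning",
    "Web development with modern JavaScript frameworks",
]
scores = bm25_scores("machine learning algorithms", docs)
for score, doc in sorted(zip(scores, docs), reverse=True):
    print(f"{score:.3f}  {doc}")
```

Only the first document contains any of the query terms, so it is the only one with a non-zero score; the other three tie at zero.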
When using the SDKs or calling the endpoint with curl, the query name must exactly match the name defined in the queries.hx file.
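The sketch below shows why the name has to match: it builds the raw HTTP request that a curl call would send, using only Python's standard library. It assumes that each query in queries.hx is exposed as a POST route named after the query on the local port used in these examples (6969) and that parameters travel as a JSON body; build_query_request is a hypothetical helper, not part of any SDK.

```python
import json
import urllib.request


def build_query_request(query_name: str, params: dict,
                        host: str = "localhost", port: int = 6969) -> urllib.request.Request:
    # Assumption: the query name becomes the URL path, so it must match queries.hx exactly.
    url = f"http://{host}:{port}/{query_name}"
    body = json.dumps(params).encode("utf-8")  # query parameters as a JSON object
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_query_request("SearchKeyword", {"keywords": "machine learning algorithms", "limit": 5})
print(req.get_method(), req.full_url)  # POST http://localhost:6969/SearchKeyword

# Sending it requires a running instance:
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read()))
```

A typo or a change in capitalization (for example "searchKeyword") would produce a different path, which the server would not recognize as the SearchKeyword query.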
Example 1: Basic keyword search
QUERY SearchKeyword(keywords: String, limit: I64) =>
    documents <- SearchBM25<Document>(keywords, limit)
    RETURN documents

QUERY InsertDocument(content: String, created_at: Date) =>
    document <- AddN<Document>({content: content, created_at: created_at})
    RETURN document
Here's how to run the query with the Python SDK (Rust, Go, TypeScript, and curl versions are also available):
from datetime import datetime, timezone
from helix.client import Client

# Connect to a local HelixDB instance
client = Client(local=True, port=6969)

sample_docs = [
    "Machine learning algorithms for data analysis",
    "Introduction to artificial intelligence and neural networks",
    "Database optimization techniques and performance tuning",
    "Web development with modern JavaScript frameworks",
]

# Insert each sample document, stamped with the current UTC time
for content in sample_docs:
    client.query("InsertDocument", {
        "content": content,
        "created_at": datetime.now(timezone.utc).isoformat(),
    })

# Run the BM25 keyword search, returning at most 5 documents
result = client.query("SearchKeyword", {
    "keywords": "machine learning algorithms",
    "limit": 5,
})
print(result)
Example 2: Keyword search with post-filtering
QUERY SearchRecentKeywords(keywords: String, limit: I64, cutoff_date: Date) =>
    searched_docs <- SearchBM25<Document>(keywords, limit)
    documents <- searched_docs::WHERE(_::{created_at}::GTE(cutoff_date))
    RETURN documents

QUERY InsertDocument(content: String, created_at: Date) =>
    document <- AddN<Document>({content: content, created_at: created_at})
    RETURN document
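In SearchRecentKeywords the WHERE step runs on the output of SearchBM25, so the limit is applied before the date filter and the query can return fewer than limit documents. The plain-Python sketch below mimics that order of operations; the ranking is hypothetical (it is not what HelixDB would actually return), the fixed "now" date is chosen only to keep the example deterministic, and search_then_filter is an illustrative helper.

```python
from datetime import datetime, timedelta, timezone


def search_then_filter(ranked_docs: list[dict], limit: int, cutoff: datetime) -> list[dict]:
    """Mimic SearchBM25 followed by WHERE: truncate to `limit` first, then filter."""
    top_k = ranked_docs[:limit]  # BM25 stage: at most `limit` best matches
    return [d for d in top_k if d["created_at"] >= cutoff]  # post-filter stage (GTE)


now = datetime(2024, 6, 1, tzinfo=timezone.utc)
cutoff = now - timedelta(days=10)

# Hypothetical BM25 ranking, best match first.
ranked = [
    {"content": "Traditional machine learning approaches from last year", "created_at": now - timedelta(days=15)},
    {"content": "Modern machine learning techniques in 2024", "created_at": now},
    {"content": "Historical AI development milestones", "created_at": now - timedelta(days=15)},
    {"content": "Latest artificial intelligence research papers", "created_at": now},
]

print(len(search_then_filter(ranked, limit=2, cutoff=cutoff)))  # 1
print(len(search_then_filter(ranked, limit=4, cutoff=cutoff)))  # 2
```

With limit set to 2, the old top-ranked document uses up one of the two slots and is then filtered out, so only one result survives. If you need a fixed number of recent results, passing a larger limit and trimming afterwards is one way to compensate.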
Here's how to run the query with the Python SDK (Rust, Go, TypeScript, and curl versions are also available):
from datetime import datetime, timezone, timedelta
from helix.client import Client

# Connect to a local HelixDB instance
client = Client(local=True, port=6969)

# Timestamps: now for recent documents, 15 days ago for older ones
recent_date = datetime.now(timezone.utc).isoformat()
old_date = (datetime.now(timezone.utc) - timedelta(days=15)).isoformat()

recent_docs = [
    "Modern machine learning techniques in 2024",
    "Latest artificial intelligence research papers",
]
for content in recent_docs:
    client.query("InsertDocument", {
        "content": content,
        "created_at": recent_date,
    })

old_docs = [
    "Traditional machine learning approaches from last year",
    "Historical AI development milestones",
]
for content in old_docs:
    client.query("InsertDocument", {
        "content": content,
        "created_at": old_date,
    })

# Keep only matches created within the last 10 days
cutoff_date = (datetime.now(timezone.utc) - timedelta(days=10)).isoformat()
result = client.query("SearchRecentKeywords", {
    "keywords": "machine learning artificial intelligence",
    "limit": 5,
    "cutoff_date": cutoff_date,
})
print(result)