helix-py is a Python library for interacting with helix-db,
a powerful graph-vector database written in Rust. It provides both a simple query interface and a PyTorch-like
front-end for defining and executing custom graph queries and vector-based operations. This makes it
well-suited for use cases such as similarity search, knowledge graph construction, and machine learning pipelines.
Installation
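If you haven't installed the library yet, it is typically available from PyPI under the project name (assuming a standard pip-installable package):
pip install helix-py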
To set up a simple Client to interface with a running helix instance:
import helix
# Connect to a local helix instance
db = helix.Client(local=True, verbose=True)
# Note that the query name is case sensitive
db.query('add_user', {"name": "John", "age": 20})
The default port is 6969, but you can change it by passing in the port parameter.
For cloud instances, you can pass in the api_endpoint parameter.
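For example, a minimal sketch using the port and api_endpoint parameters described above (the port value and endpoint URL are placeholders):
# Local instance on an explicit port (placeholder value)
db = helix.Client(local=True, port=6969)
# Cloud instance (placeholder endpoint)
db = helix.Client(api_endpoint="https://your-helix-endpoint.example.com")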
Queries
helix-py allows users to define queries in a PyTorch-like manner, similar to how
you would define a neural network's forward pass. You can use the built-in queries in helix/client.py
to get started with inserting and searching vectors, or you can define your own queries for more complex workflows.
PyTorch-like Query
Given a HelixQL query like this:
QUERY add_user(name: String, age: I64) =>
usr <- AddV<User>({name: name, age: age})
RETURN usr
You can define a matching Python class:
from helix.client import Query
from helix.types import Payload
class add_user(Query):
    def __init__(self, name: str, age: int):
        super().__init__()
        self.name = name
        self.age = age

    def query(self) -> Payload:
        return [{"name": self.name, "age": self.age}]

    def response(self, response):
        return response
db.query(add_user("John", 20))
Make sure that the Query.query method returns a list of objects.
Instance
To set up a simple Instance that manages a helix instance, automatically starting and stopping it for the lifetime of the script:
from helix.instance import Instance
helix_instance = Instance("helixdb-cfg", 6969, verbose=True)
# Deploy & redeploy instance
helix_instance.deploy()
# Start instance
helix_instance.start()
# Stop instance
helix_instance.stop()
# Delete instance
helix_instance.delete()
# Instance status
print(helix_instance.status())
helixdb-cfg is the directory where the configuration files are stored. Once the instance is deployed, you can interact with it using Client, as shown below. The instance will be automatically stopped when the script exits.
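For example, a minimal sketch combining Instance and Client (assuming the instance listens on the port given above, and that the add_user query from the earlier example is deployed):
from helix.instance import Instance
import helix

# Manage a local helix instance for the lifetime of the script
helix_instance = Instance("helixdb-cfg", 6969, verbose=True)
helix_instance.deploy()

# Connect a Client to the managed instance (port as configured above)
db = helix.Client(local=True, port=6969)
db.query('add_user', {"name": "John", "age": 20})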
Providers
Helix has LLM interfaces for popular LLM providers.
Available providers:
- OpenAIProvider
- GeminiProvider
- AnthropicProvider
Don't forget to set the OPENAI_API_KEY, GEMINI_API_KEY, and ANTHROPIC_API_KEY environment variables depending on the provider you are using.
All providers expose two methods:
- enable_mcps(name: str, url: str=...) -> bool to enable Helix MCP tools
- generate(messages, response_model: BaseModel | None=None) -> str | BaseModel
The generate method supports messages in two formats:
- Free-form text: pass a string
- Message lists: pass a list of dict or provider-specific Message models
It also supports structured outputs by passing a Pydantic model to get validated results.
Example:
from pydantic import BaseModel
# OpenAI
from helix.providers.openai_client import OpenAIProvider
openai_llm = OpenAIProvider(
    name="openai-llm",
    instructions="You are a helpful assistant.",
    model="gpt-5-nano",
    history=True
)
print(openai_llm.generate("Hello!"))

class Person(BaseModel):
    name: str
    age: int
    occupation: str
print(openai_llm.generate([{"role": "user", "content": "Who am I?"}], Person))
To enable MCP tools with a running Helix MCP server (see MCP Feature):
openai_llm.enable_mcps("helix-mcp") # uses default http://localhost:8000/mcp/
gemini_llm.enable_mcps("helix-mcp") # uses default http://localhost:8000/mcp/
anthropic_llm.enable_mcps("helix-mcp", url="https://your-remote-mcp/...")
- OpenAI GPT-5 family models support reasoning while other models use temperature.
- Anthropic local streamable MCP is not supported; use a URL-based MCP.
Embedders
Helix has embedder interfaces for popular embedding providers.
Available embedders:
- OpenAIEmbedder
- GeminiEmbedder
- VoyageAIEmbedder
Each embedder implements:
- embed(text: str, **kwargs) returns a vector [F64]
- embed_batch(texts: List[str], **kwargs) returns a list of vectors [F64]
Examples (see examples/llm_providers/providers.ipynb for more):
from helix.embedding.openai_client import OpenAIEmbedder
openai_embedder = OpenAIEmbedder()  # requires OPENAI_API_KEY
vec = openai_embedder.embed("Hello world")
batch = openai_embedder.embed_batch(["a", "b", "c"])

from helix.embedding.gemini_client import GeminiEmbedder
gemini_embedder = GeminiEmbedder()  # requires GEMINI_API_KEY
vec = gemini_embedder.embed("doc text", task_type="RETRIEVAL_DOCUMENT")

from helix.embedding.voyageai_client import VoyageAIEmbedder
voyage_embedder = VoyageAIEmbedder()
vec = voyage_embedder.embed("query text", input_type="query")
Chunking
Helix uses Chonkie chunking methods to split text into manageable pieces for processing and embedding:
from helix import Chunk

text = "Your long document text here..."
# Token-based chunking
chunks = Chunk.token_chunk(text)
# Semantic chunking
semantic_chunks = Chunk.semantic_chunk(text)

# Code-aware chunking
code_text = "def hello(): print('world')"
code_chunks = Chunk.code_chunk(code_text, language="python")

# Sentence chunking over a batch of documents
texts = ["Document 1...", "Document 2...", "Document 3..."]
batch_chunks = Chunk.sentence_chunk(texts)
You can find all the different chunking examples in Chunking Feature.
Loader
The loader (helix/loader.py) currently supports .parquet, .fvecs, and .csv data. Simply pass in the path to your file or files and the columns you want to process; the loader handles the rest and is easy to integrate with your queries.
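A minimal, hypothetical sketch of loader usage (the Loader class name, its constructor arguments, and the file/column names below are assumptions for illustration):
from helix.loader import Loader

# Hypothetical usage: path and column names are placeholders
loader = Loader("path/to/data.parquet", cols=["embedding"])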
For more information, check out our examples!