Getting Started With HelixDB Chunking
HelixDB uses Chonkie to chunk text data into manageable pieces. The Chunk
class provides various methods to split text based on different strategies.
Basic Usage
Default Values and Customization
All chunking methods use Chonkie’s default values, so they work with minimal configuration, as shown in the examples above. To change the chunking behavior, pass any of the parameters a method accepts. The following sections describe each available chunking method and its customizable parameters.
Available Chunking Methods
Token Chunking
Splits text by tokens with specified chunk size and overlap.
Sentence Chunking
Chunks text by sentences while respecting token limits.
Recursive Chunking
Chunks text using recursive rules that split on headings, paragraphs, and other structures.
Code Chunking
Chunks source code with language-specific syntax awareness.
Semantic Chunking
Chunks text based on semantic similarity between sentences.
Late Chunking
Combines recursive chunking with embeddings.
Neural Chunking
Uses a neural network model trained specifically for text chunking.
Slumber Chunking (LLM-guided)
Uses LLM guidance for optimal semantic boundaries.
Example Output
When you run a chunking method, you’ll get a list of text chunks. Here’s an example of what the output might look like when using token_chunk:
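The following is a minimal Python sketch, not HelixDB or Chonkie code, of how splitting by tokens with a chunk size and an overlap yields a list of text chunks. It assumes whitespace-separated words stand in for tokens (a real tokenizer would split differently), and the function name sketch_token_chunk and its parameters chunk_size and overlap are hypothetical names chosen for this illustration.

```python
from typing import List


def sketch_token_chunk(text: str, chunk_size: int = 4, overlap: int = 1) -> List[str]:
    """Illustrative only: split text into overlapping windows of tokens.

    Assumption: a "token" is a whitespace-separated word. This is a
    simplification and not how HelixDB or Chonkie tokenizes text.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in the range [0, chunk_size)")

    tokens = text.split()
    step = chunk_size - overlap  # how far each new window advances
    chunks: List[str] = []

    for start in range(0, len(tokens), step):
        window = tokens[start:start + chunk_size]
        chunks.append(" ".join(window))
        # Stop once a window has reached the end of the token list,
        # so no trailing chunk made up only of overlap is emitted.
        if start + chunk_size >= len(tokens):
            break

    return chunks


if __name__ == "__main__":
    sample = "one two three four five six seven eight nine ten"
    for chunk in sketch_token_chunk(sample, chunk_size=4, overlap=1):
        print(repr(chunk))
    # 'one two three four'
    # 'four five six seven'
    # 'seven eight nine ten'
```

Each chunk repeats the last token of the previous one, which is the effect of the overlap: context at a chunk boundary appears in both neighbouring chunks.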