Understanding Grouping Operations
HelixDB provides two powerful operations for organizing and summarizing data: GROUP_BY and AGGREGATE_BY. While they may seem similar, they serve different purposes and return different results.
Key Differences
| Feature | GROUP_BY | AGGREGATE_BY |
|---|---|---|
| Returns | Count summaries only | Full data objects + counts |
| Memory Usage | Low - only stores counts | Higher - stores all objects |
| Use Case | Analytics, distributions | Detailed reports, processing |
| Output Size | Small, compact | Large, comprehensive |
| Best For | Dashboards, statistics | Data analysis, transformations |
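To make the table concrete, here is a minimal Python sketch that mimics the two result shapes on an in-memory list. It is an illustration only, not HelixDB code: the `group_by` and `aggregate_by` helpers, the sample `users` records, and the exact field names in the output (`count`, `data`) are assumptions made for this example, and the real response format may differ.

```python
from collections import defaultdict

# Hypothetical sample records; field names are illustrative only.
users = [
    {"name": "Alice", "country": "USA"},
    {"name": "Bob", "country": "USA"},
    {"name": "Chen", "country": "Canada"},
]

def group_by(items, *props):
    """Count items per distinct combination of property values (no objects kept)."""
    counts = defaultdict(int)
    for item in items:
        key = tuple(item[p] for p in props)
        counts[key] += 1
    return [{**dict(zip(props, key)), "count": n} for key, n in counts.items()]

def aggregate_by(items, *props):
    """Collect the full items per distinct combination, plus a count."""
    groups = defaultdict(list)
    for item in items:
        key = tuple(item[p] for p in props)
        groups[key].append(item)
    return [
        {**dict(zip(props, key)), "count": len(members), "data": members}
        for key, members in groups.items()
    ]

summary = group_by(users, "country")       # counts only
detailed = aggregate_by(users, "country")  # counts plus the grouped objects
```

The counts-only result holds one small entry per group, while the aggregate-style result carries every original record inside its group. Both helpers accept more than one property name, mirroring the single-or-multiple-property support described in this guide.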
Syntax Comparison
Both operations support single or multiple properties.
Output Format Comparison
GROUP_BY Output
AGGREGATE_BY Output
Performance Characteristics
GROUP_BY Performance
- Memory: O(n) where n = number of unique groups
- Speed: Fast - only counts are stored
- Bandwidth: Minimal - small response size
- Scalability: Excellent for large datasets
AGGREGATE_BY Performance
- Memory: O(m) where m = total number of items
- Speed: Moderate - full objects stored
- Bandwidth: Higher - complete data returned
- Scalability: Good for moderate datasets
For large datasets where you only need counts, GROUP_BY can be orders of magnitude more efficient in terms of memory and bandwidth usage.
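The difference in response size can be approximated outside the database. The following Python sketch builds a counts-only result and a full-object result from the same synthetic records and compares their serialized sizes. The dataset, the field names, and the JSON layout are all assumptions chosen for illustration; actual HelixDB payloads and memory use will depend on your schema and data.

```python
import json
from collections import Counter, defaultdict

# Hypothetical dataset: 10,000 users spread evenly over 5 countries.
countries = ["USA", "Canada", "UK", "Germany", "Japan"]
users = [
    {"id": i, "name": f"user{i}", "country": countries[i % len(countries)]}
    for i in range(10_000)
]

# Counts-only result: one small entry per group.
counts = Counter(u["country"] for u in users)
group_payload = json.dumps([{"country": c, "count": n} for c, n in counts.items()])

# Full-object result: every record is carried along with its group.
groups = defaultdict(list)
for u in users:
    groups[u["country"]].append(u)
aggregate_payload = json.dumps(
    [{"country": c, "count": len(m), "data": m} for c, m in groups.items()]
)

group_bytes = len(group_payload.encode("utf-8"))
aggregate_bytes = len(aggregate_payload.encode("utf-8"))
print(f"counts-only payload: {group_bytes} bytes")
print(f"full-object payload: {aggregate_bytes} bytes")
```

The counts-only payload grows with the number of distinct groups, whereas the full-object payload grows with the total number of records, which is the O(n) versus O(m) distinction listed above.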
Use Case Decision Tree
Best Practices
Use GROUP_BY When:
- Building analytics dashboards
- Showing data distributions
- Generating summary reports
- Optimizing for memory/bandwidth
- Working with large datasets (millions of records)
- Creating charts or graphs
Use AGGREGATE_BY When:
- Processing grouped data further
- Building detailed reports with examples
- Displaying sample records per group
- Performing transformations on grouped items
- Working with moderate datasets (thousands of records)
- Building data exploration interfaces
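The two checklists above can be condensed into a rough decision helper. This is a sketch, not part of HelixDB: the function name `choose_operation`, its parameters, and the one-million-record threshold are hypothetical, with the threshold loosely taken from the "millions" versus "thousands" guidance in the lists.

```python
def choose_operation(needs_full_objects, record_count, large_threshold=1_000_000):
    """Return (operation, note) following the guidelines in the lists above.

    needs_full_objects: True if the grouped records themselves are required.
    record_count: approximate number of records being grouped.
    large_threshold: assumed cut-off for a "large" dataset (illustrative).
    """
    if not needs_full_objects:
        # Counts alone are enough: dashboards, distributions, charts.
        return "GROUP_BY", "counts only"
    if record_count >= large_threshold:
        # Full objects on a large dataset: works, but expect a heavy response.
        return "AGGREGATE_BY", "large dataset: memory and bandwidth use may be high"
    return "AGGREGATE_BY", "full objects returned per group"

# Example: a dashboard over five million records only needs counts.
operation, note = choose_operation(needs_full_objects=False, record_count=5_000_000)
```

The primary question is whether you need the grouped objects at all; dataset size only matters once the answer is yes.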
When using the SDKs or curling the endpoint, the query name must match what is defined in the queries.hx file exactly.
Example 1: Side-by-Side Comparison - User Distribution
Example 2: Using COUNT with Both Operations
Common Pitfalls
Memory Issues with AGGREGATE_BY
Using GROUP_BY When You Need Data
Summary
Choose the right operation for your use case:
- GROUP_BY: Lightweight, fast, perfect for counts and distributions
- AGGREGATE_BY: Comprehensive, detailed, ideal for data processing