
Introduction

We want to create a Python script that uses HelixDB to help students find professors based on their research areas and biographies. For example, a user could ask:
  • “Which professors do research in High-Energy Physics?”
  • “Which professors work at X University?”
  • “Which professors work in the Computer Science department?”
  • “Which professors at X University work in the Computer Science department?”
  • “I like doing research in Large Language Models. Can you recommend some professors working on this at X University?”

The dataset

In this example, we have a dataset of professors with the following fields:
  • Name
  • Department
  • University
  • Short biography
  • Key Research Areas
We will ingest this data from a JSON file. An example record is shown below:
{
  "name": "James",
  "department": "Computer Science",
  "university": "Uni X",
  "bio": "James is an Assistant Professor focusing on basketball analytics, computer vision, and machine learning. Projects include ShotNet and DunkGPT.",
  "key_research_areas": [
    {"area": "Computer Vision for Basketball"},
    {"area": "Predictive Modelling & Simulation"},
    {"area": "Sports Analytics with Large Language Models"},
    {"area": "Wearable Sensor Data Mining"},
    {"area": "Fairness & Ethics in Sports AI"}
  ]
}
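A minimal loading sketch using only the standard library. It assumes the file holds either a single object like the one above or a JSON array of them; the `Professor` dataclass, `parse_professor`, and `load_professors` are hypothetical helpers, not part of HelixDB.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Professor:
    name: str
    department: str
    university: str
    bio: str
    # Flattened from [{"area": ...}, ...] to a plain list of strings
    research_areas: list[str] = field(default_factory=list)


def parse_professor(record: dict) -> Professor:
    """Convert one JSON record into a Professor, flattening the area objects."""
    areas = [item["area"] for item in record.get("key_research_areas", [])]
    return Professor(
        name=record["name"],
        department=record["department"],
        university=record["university"],
        bio=record["bio"],
        research_areas=areas,
    )


def load_professors(path: str) -> list[Professor]:
    """Load a file containing one professor object or a list of them (assumption)."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    records = data if isinstance(data, list) else [data]
    return [parse_professor(r) for r in records]
```

Flattening `key_research_areas` up front means the graph-building step only has to deal with plain strings.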

Building a Graph

Based on this data, we can design a Vector Graph RAG schema with the following nodes and edges:
Nodes:
  • Professor Node with properties name, bio
  • Research Area Node with properties research_area
  • Department Node with the property name
  • University Node with the property name
Vector Nodes for Embeddings: we will store an embedding of each of the professor’s research areas.
  • Vector Node with the property research_area
Edges:
  • Professor to Research Area Edge
  • Professor to Department Edge
  • Professor to University Edge
  • Professor to Vector Node Edge
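HelixDB has its own schema and query language, which this sketch does not use. Instead, it models the same nodes and edges as a plain in-memory Python graph so the layout and the two kinds of lookup (structural traversal and vector similarity) are concrete. The edge labels (`WORKS_AT`, `IN_DEPARTMENT`, and so on), the node-id format, the helper functions, and the `embed` callback are all hypothetical; in practice the embedding would come from an embedding model and the search from HelixDB's vector index rather than the brute-force loop shown here.

```python
import math
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Graph:
    # node id -> (node label, property dict)
    nodes: dict[str, tuple[str, dict]] = field(default_factory=dict)
    # edges stored as (source id, edge label, target id)
    edges: set[tuple[str, str, str]] = field(default_factory=set)

    def add_node(self, node_id: str, label: str, **props) -> str:
        # setdefault keeps shared nodes (one "Uni X", one "Computer Vision") unique
        self.nodes.setdefault(node_id, (label, props))
        return node_id

    def add_edge(self, src: str, label: str, dst: str) -> None:
        self.edges.add((src, label, dst))

    def targets(self, src: str, label: str) -> set[str]:
        return {t for s, lab, t in self.edges if s == src and lab == label}


def add_professor(graph: Graph, record: dict, embed: Callable[[str], list[float]]) -> None:
    """Add one JSON professor record; `embed` is a hypothetical text-embedding function."""
    # Assumes professor names are unique; a real dataset would need a stable id.
    prof = graph.add_node(f"professor:{record['name']}", "Professor",
                          name=record["name"], bio=record["bio"])
    dept = graph.add_node(f"department:{record['department']}", "Department",
                          name=record["department"])
    uni = graph.add_node(f"university:{record['university']}", "University",
                         name=record["university"])
    graph.add_edge(prof, "IN_DEPARTMENT", dept)
    graph.add_edge(prof, "WORKS_AT", uni)
    for item in record["key_research_areas"]:
        area = item["area"]
        area_node = graph.add_node(f"area:{area}", "ResearchArea", research_area=area)
        graph.add_edge(prof, "HAS_RESEARCH_AREA", area_node)
        # One vector node per (professor, research area) pair
        vec = graph.add_node(f"vector:{record['name']}:{area}", "ResearchAreaVector",
                             research_area=area, embedding=embed(area))
        graph.add_edge(prof, "HAS_EMBEDDING", vec)


def professors_in(graph: Graph, university: str, department: str) -> list[str]:
    """Structural query: professors linked to both the university and the department."""
    uni, dept = f"university:{university}", f"department:{department}"
    return sorted(
        props["name"]
        for node_id, (label, props) in graph.nodes.items()
        if label == "Professor"
        and uni in graph.targets(node_id, "WORKS_AT")
        and dept in graph.targets(node_id, "IN_DEPARTMENT")
    )


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search_by_area(graph: Graph, query_vec: list[float], top_k: int = 3):
    """Brute-force cosine search over vector nodes -> [(score, professor, area)]."""
    scored = []
    for src, label, dst in graph.edges:
        if label == "HAS_EMBEDDING":
            _, vec_props = graph.nodes[dst]
            _, prof_props = graph.nodes[src]
            scored.append((cosine(query_vec, vec_props["embedding"]),
                           prof_props["name"], vec_props["research_area"]))
    scored.sort(reverse=True)
    return scored[:top_k]
```

Because the Department node only carries a name, "Computer Science" is a single node shared across universities, so a question like “Which professors at X University work in the Computer Science department?” becomes the intersection in `professors_in(graph, "Uni X", "Computer Science")`. The Large Language Models question would instead embed the query text with the same `embed` function, call `search_by_area`, and then filter the hits by university.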
I