Python setup for the professors example using Qwen Embedding Model
This section will cover the Python code to connect to HelixDB and create nodes and edges, create embeddings for the professors, search by embeddings, and filtering.
Step 3: Creating the nodes that we will link professors to
Here we start creating all the nodes that we will link a professors to.NOTE: You can also make a query in helixdb to get the ID of the research area if you have the name, but we will store it in Python for simplicity.We will be using db.query("query_name", {"param": value}) to create the nodes and edges
Copy
Ask AI
research_areas = { "Computer Vision for Basketball": "Designing CNN and Transformer architectures that track player pose, ball trajectory, and court zones to quantify defensive pressure and shooting mechanics.", "Predictive Modelling & Simulation": "Building Monte-Carlo and sequence models that forecast possession outcomes and season performance using play-by-play and spatial data.", "Sports Analytics with Large Language Models": "Leveraging LLMs to explain model outputs, auto-generate commentary, and mine historical game archives for strategic patterns.", "Wearable Sensor Data Mining": "Applying time-series and graph learning techniques to inertial-measurement signals for fatigue monitoring and injury prevention.", "Fairness & Ethics in Sports AI": "Studying algorithmic bias and ensuring equitable analytics across different leagues, genders, and play styles."}research_area_ids = {}for research_area in research_areas: research_area_node = db.query("create_research_area", {"area": research_area}) research_area_ids[research_area] = research_area_node[0]['research_area']['id']departments = ["Computer Science", "Mathematics", "Physics", "Chemistry", "Biology"]department_ids = {}for department in departments: department_node = db.query("create_department", {"name": department}) department_ids[department] = department_node[0]['department']['id']universities = ["Uni X", "Uni Y", "Uni Z"]university_ids = {}for university in universities: university_node = db.query("create_university", {"name": university}) university_ids[university] = university_node[0]['university']['id']labs = {"Basketball Data Science Lab": "An interdisciplinary group combining data science, biomechanics, and sport psychology to create next-generation analytics tools for basketball."}lab_ids = {}for lab in labs: lab_node = db.query("create_lab", {"name": lab, "research_focus": labs[lab]}) lab_ids[lab] = lab_node[0]['lab']['id']
We will use Qwen3 Embedding 0.6B as an example to create embeddings for the professors.
Copy
Ask AI
for professor in professors: # Create Professor Node professor_node =db.query("create_professor", {"name": professor["name"], "title": professor["title"], "page": professor["page"], "bio": professor["bio"]}) professor_id = professor_node[0]['professor']['id'] # Link Professor to Research Area for research_area in professor["key_research_areas"]: if research_area['area'] in research_areas: research_area_id = research_area_ids[research_area['area']] db.query("link_professor_to_research_area", {"professor_id": professor_id, "research_area_id": research_area_id}) # Link Professor to Department for department in professor["department"]: if department in department_ids: department_id = department_ids[department] db.query("link_professor_to_department", {"professor_id": professor_id, "department_id": department_id}) # Link Professor to University for university in professor["university"]: if university in university_ids: university_id = university_ids[university] db.query("link_professor_to_university", {"professor_id": professor_id, "university_id": university_id}) # Link Professor to Lab for lab in professor["labs"]: if lab['name'] in lab_ids: lab_id = lab_ids[lab['name']] db.query("link_professor_to_lab", {"professor_id": professor_id, "lab_id": lab_id}) # Create Research Area Embedding research_area_and_description = "\n".join([research_area['area'] + ": " + research_area['description'] for research_area in professor['key_research_areas']]) research_area_and_description_embedding = model.encode(research_area_and_description).astype(float).tolist() db.query("create_research_area_embedding", {"professor_id": professor_id, "areas_and_descriptions": research_area_and_description, "vector": research_area_and_description_embedding})
We’ve now added all the nodes and edges to the graph, and created the embeddings for the professors. We can now search for similar professors based on their research area and description embeddings.
query = "Find me a professor who does computer vision for basketball"embedded_query_vector = model.encode(query).astype(float).tolist()results = db.query("search_similar_professors_by_research_area_and_description", {"query_vector": embedded_query_vector, "k": 5})print(results)
Python Print Result
Copy
Ask AI
[{'professors': [{'page': 'https://www.example.com', 'label': 'Professor', 'bio': 'James is an Assistant Professor whose work sits at the intersection of basketball analytics, computer vision, and large-scale machine learning. His research focuses on turning raw player-tracking video, wearable-sensor streams, and play-by-play logs into actionable insights for teams, coaches, and broadcasters. Signature projects include ShotNet— a deep learning model that predicts shot success probability in real time— and DunkGPT, a language model fine-tuned on millions of play descriptions to generate advanced scouting reports.', 'name': 'James', 'id': '...', 'title': 'Assistant Professor'}]}]
[{'research_areas': [{'areas_and_descriptions': 'Computer Vision for Basketball: Designing CNN and Transformer architectures that track player pose, ball trajectory, and court zones to quantify defensive pressure and shooting mechanics. Predictive Modelling & Simulation: Building Monte-Carlo and sequence models that forecast possession outcomes and season performance using play-by-play and spatial data. Sports Analytics with Large Language Models: Leveraging LLMs to explain model outputs, auto-generate commentary, and mine historical game archives for strategic patterns. Wearable Sensor Data Mining: Applying time-series and graph learning techniques to inertial-measurement signals for fatigue monitoring and injury prevention. Fairness & Ethics in Sports AI: Studying algorithmic bias and ensuring equitable analytics across different leagues, genders, and play styles.'}]}]