# GraphZero API Reference 📘

This document details the Python API exposed by the `graphzero` C++ engine.

## 📦 Core Class: `Graph`

The main entry point for interacting with the graph.

```python
import graphzero as gz

g = gz.Graph("path/to/graph.gl")
```
### Properties

| Property | Type | Description |
| --- | --- | --- |
| `g.num_nodes` | `int` | Total number of nodes in the graph. |
| `g.num_edges` | `int` | Total number of edges (directed). |

### Methods

#### `get_degree(node_id: int) -> int`

Returns the out-degree (number of neighbours) of a specific node.

* **Usage:** checking whether a node is a dead end before walking.
#### `get_neighbours(node_id: int) -> numpy.ndarray`

Returns a **1-D NumPy ndarray** of neighbour node IDs (dtype `np.int64`). The array is returned from the C++ layer as a zero-copy buffer and can be used directly with NumPy or PyTorch.

* **Notes:**
  - The binding uses the British spelling `get_neighbours`; this is the name exposed in the Python API.
  - For very high-degree nodes, prefer `sample_neighbours` or `batch_random_fanout` to avoid materializing large arrays.

---

### 🎲 Sampling Methods (The Engine)

These functions use OpenMP multithreading on the C++ side and release the GIL, so they can fully saturate CPU and disk bandwidth. All batch functions return a **NumPy ndarray** of dtype `np.int64`.
#### `batch_random_walk_uniform(start_nodes: List[int], walk_length: int) -> numpy.ndarray`

**The Speed King.** Performs unbiased uniform random walks.

* **Return shape & dtype:** `ndarray` of shape `(len(start_nodes), walk_length)`, dtype `np.int64`.
* **Algorithm:** at every step, pick a neighbour uniformly at random.
* **Use case:** DeepWalk, uniform-walk baselines, and fast data generation for training.
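The per-step semantics can be sketched in pure Python. This is a reference of the behaviour, not the C++ engine; the toy adjacency map and the function name are hypothetical, and we assume the walk matrix stores the start node in column 0:

```python
import numpy as np

# Hypothetical toy adjacency: node -> array of neighbour IDs.
adj = {
    0: np.array([1, 2], dtype=np.int64),
    1: np.array([0, 2], dtype=np.int64),
    2: np.array([0, 1], dtype=np.int64),
}

def batch_random_walk_uniform_ref(start_nodes, walk_length, rng=None):
    """Pure-Python reference: at each step pick a neighbour uniformly."""
    rng = rng or np.random.default_rng(0)
    walks = np.empty((len(start_nodes), walk_length), dtype=np.int64)
    for i, node in enumerate(start_nodes):
        walks[i, 0] = node
        for step in range(1, walk_length):
            node = rng.choice(adj[int(node)])  # uniform neighbour pick
            walks[i, step] = node
    return walks

walks = batch_random_walk_uniform_ref([0, 1, 2], walk_length=5)
print(walks.shape)  # (3, 5)
```

The C++ version does the same thing per row, but across OpenMP threads with the GIL released.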
#### `batch_random_walk(start_nodes: List[int], walk_length: int, p: float = 1.0, q: float = 1.0) -> numpy.ndarray`

**The Biased Walker.** Performs Node2Vec-style second-order random walks.

* **Arguments:**
  - `p` (return parameter): low values make backtracking to the previous node likely, keeping the walk local (BFS-like).
  - `q` (in-out parameter): low values push the walk outward to distant nodes (DFS-like).
* **Return shape & dtype:** `ndarray` of shape `(len(start_nodes), walk_length)`, dtype `np.int64`.
* **Performance:** slower than uniform walks because each step computes second-order transition probabilities.
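For intuition, here is a minimal sketch of the second-order weighting scheme (the standard Node2Vec rule: weight `1/p` for backtracking, `1` for neighbours of the previous node, `1/q` for moving outward). The graph and function name are illustrative, not the engine's internals:

```python
import numpy as np

def node2vec_weights(prev, curr_neighbours, prev_neighbours, p, q):
    """Normalized 2nd-order transition probabilities out of the current
    node, given the previous node `prev` of the walk."""
    prev_set = set(prev_neighbours)
    w = np.empty(len(curr_neighbours))
    for i, x in enumerate(curr_neighbours):
        if x == prev:           # backtrack to where we came from: 1/p
            w[i] = 1.0 / p
        elif x in prev_set:     # stays at distance 1 from prev: 1
            w[i] = 1.0
        else:                   # moves outward to distance 2: 1/q
            w[i] = 1.0 / q
    return w / w.sum()

# prev = 0, current node's neighbours = [0, 2, 3], neighbours of 0 = [1, 2]
probs = node2vec_weights(0, [0, 2, 3], [1, 2], p=4.0, q=0.25)
print(probs)  # high p and low q both push the walk outward, toward node 3
```

The extra bookkeeping (each step needs the previous node's neighbour set) is exactly why this call is slower than `batch_random_walk_uniform`.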
#### `batch_random_fanout(start_nodes: List[int], K: int) -> numpy.ndarray`

Performs uniform neighbour *fanout* sampling for a batch of start nodes (useful for GNN neighbour sampling).

* **Behavior:** for each start node, returns `K` sampled neighbour IDs (using reservoir sampling / uniform sampling without replacement where possible).
* **Return shape & dtype:** `ndarray` of shape `(len(start_nodes), K)`, dtype `np.int64`.
#### `sample_neighbours(start_node: int, K: int) -> numpy.ndarray`

Performs uniform neighbour sampling for a single node using **reservoir sampling**.

* **Behavior:** returns up to `K` neighbour IDs sampled uniformly at random. If the node's degree is `<= K`, all neighbours are returned.
* **Return shape & dtype:** 1-D `ndarray` of length `<= K`, dtype `np.int64`.
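Reservoir sampling (Algorithm R) fits in a few lines. This is a pure-Python illustration of the idea the docs reference, not the engine's code:

```python
import random

def reservoir_sample(items, K, rng=None):
    """Uniform sample of up to K items in a single streaming pass."""
    rng = rng or random.Random(0)
    sample = []
    for i, x in enumerate(items):
        if i < K:
            sample.append(x)        # fill the reservoir first
        else:
            j = rng.randrange(i + 1)
            if j < K:
                sample[j] = x       # replace with probability K/(i+1)
    return sample

print(len(reservoir_sample(range(1000), 5)))  # 5
```

The one-pass property is what makes this cheap for high-degree nodes: the neighbour list is scanned once and never fully copied.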
## 🛠️ Utilities

#### `gz.convert_csv_to_gl(input_csv: str, output_bin: str, directed: bool)`

Converts a raw edge-list CSV into the optimized GraphLite binary format (`.gl`).

* **Input CSV format:** two columns (source, destination). A header row, if present, is ignored.
* **Process:**
  1. **Pass 1:** scan the file to count node degrees (low memory footprint).
  2. **Allocation:** create the `.gl` file and `mmap` it.
  3. **Pass 2:** re-read the CSV and place each edge into its source node's bucket.
* **Note:** because the output is memory-mapped, this process handles graphs larger than RAM.
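The two-pass bucket layout above is essentially a CSR (compressed sparse row) build. Here is an in-memory sketch of that logic; the real converter streams the CSV and writes through `mmap`, and the function name is illustrative:

```python
import numpy as np

def build_csr(edges, num_nodes):
    """Two-pass build mirroring the converter: count degrees,
    prefix-sum into offsets, then place each destination into
    its source node's bucket."""
    degrees = np.zeros(num_nodes, dtype=np.int64)
    for src, _ in edges:                  # Pass 1: degree count
        degrees[src] += 1
    offsets = np.zeros(num_nodes + 1, dtype=np.int64)
    np.cumsum(degrees, out=offsets[1:])   # bucket boundaries
    cursor = offsets[:-1].copy()          # next free slot per node
    dests = np.empty(len(edges), dtype=np.int64)
    for src, dst in edges:                # Pass 2: fill buckets
        dests[cursor[src]] = dst
        cursor[src] += 1
    return offsets, dests

offsets, dests = build_csr([(0, 1), (0, 2), (1, 2)], num_nodes=3)
print(dests[offsets[0]:offsets[1]])  # neighbours of node 0: [1 2]
```

With this layout, `get_neighbours(n)` is just the slice `dests[offsets[n]:offsets[n+1]]`, which is why it can be served zero-copy.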

# 🧠 Example: Training Node2Vec with PyTorch

This script demonstrates how to use `GraphZero` to train a real Node2Vec model.
Since `GraphZero` handles the **data loading** (the bottleneck), the GPU can focus entirely on **training** (the math).

**File:** `train_node2vec.py`

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import graphzero as gz
import numpy as np
from torch.utils.data import DataLoader, Dataset

# --- CONFIGURATION ---
GRAPH_PATH = "papers100M.gl"  # The beast
EMBEDDING_DIM = 128
WALK_LENGTH = 20
WALKS_PER_EPOCH = 100_000  # Number of walk starts per epoch
BATCH_SIZE = 1024
EPOCHS = 5

print(f"Initializing GraphZero Engine on {GRAPH_PATH}...")
g = gz.Graph(GRAPH_PATH)
print(f"   Nodes: {g.num_nodes:,} | Edges: {g.num_edges:,}")

# --- 1. THE DATASET (Powered by GraphZero) ---
class GraphZeroWalkDataset(Dataset):
    """
    Generates random walks on-the-fly using the C++ engine.
    """
    def __init__(self, graph_engine, num_walks, walk_len):
        self.g = graph_engine
        self.num_walks = num_walks
        self.walk_len = walk_len

    def __len__(self):
        # In a real scenario this might be num_nodes;
        # for this demo we define an arbitrary epoch size.
        return self.num_walks

    def __getitem__(self, idx):
        # We don't generate single walks here (too slow).
        # The DataLoader batches indices and the collate_fn calls C++,
        # so we just return a random start node.
        return np.random.randint(0, self.g.num_nodes)

# --- 2. CUSTOM COLLATE FUNCTION (The Secret Sauce) ---
def collate_walks(batch_start_nodes):
    """
    This is where the magic happens.
    Instead of looping in Python, we hand the whole batch of start nodes
    to C++ and get back the full walk matrix in one call.
    """
    # 1. Convert the batch to a plain list of ints for C++
    start_nodes = [int(x) for x in batch_start_nodes]

    # 2. Call the C++ engine (releases the GIL, runs OpenMP).
    #    Returns an ndarray of shape (len(start_nodes), WALK_LENGTH), dtype int64.
    walks = g.batch_random_walk_uniform(start_nodes, WALK_LENGTH)

    # 3. Wrap for PyTorch without copying
    return torch.from_numpy(walks)

# --- CONFIGURATION ADJUSTMENT ---
# We hash the graph's huge node-ID space down to 1M embedding rows to save RAM.
HASH_SIZE = 1_000_000
# RAM usage: 1M * 128 * 4 bytes ≈ 512 MB per table (~1 GB for in + out embeddings)

# --- 3. THE MODEL (Hashed Skip-Gram) ---
class Node2Vec(nn.Module):
    def __init__(self, num_nodes, embed_dim):
        super().__init__()
        # INSTEAD OF: self.in_embed = nn.Embedding(num_nodes, embed_dim)
        # WE USE:
        self.in_embed = nn.Embedding(HASH_SIZE, embed_dim)
        self.out_embed = nn.Embedding(HASH_SIZE, embed_dim)

    def forward(self, target, context):
        # Hashing trick: map a massive ID -> a small ID.
        # A real app would use a better hash; modulo is fine for a demo.
        t_hashed = target % HASH_SIZE
        c_hashed = context % HASH_SIZE

        v_in = self.in_embed(t_hashed)
        v_out = self.out_embed(c_hashed)

        return torch.sum(v_in * v_out, dim=1)

# --- 4. TRAINING LOOP ---
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = Node2Vec(g.num_nodes, EMBEDDING_DIM).to(device)
optimizer = optim.Adam(model.parameters(), lr=0.01)

# The PyTorch DataLoader wraps our C++ engine
loader = DataLoader(
    GraphZeroWalkDataset(g, WALKS_PER_EPOCH, WALK_LENGTH),
    batch_size=BATCH_SIZE,
    collate_fn=collate_walks,  # <--- Connects PyTorch to GraphZero
    num_workers=0  # Windows needs 0; Linux can use more
)

print("\nStarting Training...")

for epoch in range(EPOCHS):
    total_loss = 0

    for batch_walks in loader:
        # batch_walks shape: [1024, 20]
        batch_walks = batch_walks.to(device)

        # Simple positive-pair generation: (current, next).
        # Real implementations use sliding windows and negative sampling;
        # this is simplified for brevity.
        target = batch_walks[:, :-1].flatten()
        context = batch_walks[:, 1:].flatten()

        optimizer.zero_grad()
        # Positive-pair skip-gram loss (no negative sampling, demo only)
        loss = -F.logsigmoid(model(target, context)).mean()
        loss.backward()
        optimizer.step()

        total_loss += loss.item()

    print(f"Epoch {epoch+1}/{EPOCHS} | Avg Loss: {total_loss/len(loader):.4f}")

print("✅ Training Complete.")
```

This example shows how `GraphZero` integrates into a standard PyTorch training loop: the C++ engine handles the heavy lifting of random-walk generation, freeing Python to focus on model training.

Below is a screenshot of the output from running the script:

![output](images/train_output.png)