✨ feat(sdk): sampleNodes endpoint for Explore wander-in starters#76

Merged
marcelsamyn merged 2 commits into main from feat/sample-nodes on Jun 18, 2026

@marcelsamyn

Owner

What & why

Adds a focused sampleNodes SDK endpoint that samples a handful of interesting (well-connected) nodes — the backing data for Petals' Memory → Explore empty-state "wander in" starters. The existing no-query queryGraph returns the entire labeled graph, far too heavy to fetch just to render ~6 chips, so this is a dedicated, cheap query.

Selection logic

  • Counts active claims per node where the node is subject or object (its "connections").
  • Keeps only labeled, userId-owned nodes with connectionCount >= 3.
  • Excludes scaffolding node types: Temporal, Atlas, AssistantDream.
  • Optional nodeTypes filter.
  • Ranks the top K=60 most-connected into a pool, then draws limit (default 6, max 24) at random — substantive results that vary per call (the UI's "shuffle").

Single Drizzle query (two CTEs → ORDER BY random()); no embeddings, no LLM.
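
For readers who want the selection steps spelled out, here is a minimal in-memory sketch of the same logic in TypeScript. It is not the SQL from this PR: the row types, the function name `sampleInMemory`, and the injectable `random` option are hypothetical, and clamping `limit` to the maximum stands in for whatever the request schema actually does with out-of-range values.

```typescript
// Illustrative only: the real endpoint does this in one SQL query.
// Type and function names below are hypothetical, not taken from the PR.
type NodeRow = { id: string; userId: string; nodeType: string; label: string | null };
type ClaimRow = {
  userId: string;
  status: string;
  subjectNodeId: string;
  objectNodeId: string | null;
};
type SampledNode = NodeRow & { connectionCount: number };
type SampleOptions = { limit?: number; nodeTypes?: string[]; random?: () => number };

const NOISE_NODE_TYPES = ["Temporal", "Atlas", "AssistantDream"];
const MIN_CONNECTIONS = 3;
const POOL_SIZE = 60;
const DEFAULT_LIMIT = 6;
const MAX_LIMIT = 24;

function sampleInMemory(
  nodes: NodeRow[],
  claims: ClaimRow[],
  userId: string,
  opts: SampleOptions = {},
): SampledNode[] {
  // Assumption: out-of-range limits are clamped here for simplicity.
  const limit = Math.min(opts.limit ?? DEFAULT_LIMIT, MAX_LIMIT);
  const random = opts.random ?? Math.random;
  const typeFilter = opts.nodeTypes && opts.nodeTypes.length > 0 ? opts.nodeTypes : null;

  // Count each active claim once per node it touches (as subject or object).
  const counts = new Map<string, number>();
  const bump = (id: string) => counts.set(id, (counts.get(id) ?? 0) + 1);
  for (const c of claims) {
    if (c.userId !== userId || c.status !== "active") continue;
    bump(c.subjectNodeId);
    if (c.objectNodeId !== null && c.objectNodeId !== c.subjectNodeId) bump(c.objectNodeId);
  }

  // Keep labeled, user-owned, non-scaffolding nodes with enough connections,
  // then rank by connection count and keep the top POOL_SIZE.
  const pool: SampledNode[] = nodes
    .filter(
      (n) =>
        n.userId === userId &&
        n.label !== null &&
        !NOISE_NODE_TYPES.includes(n.nodeType) &&
        (typeFilter === null || typeFilter.includes(n.nodeType)),
    )
    .map((n) => ({ ...n, connectionCount: counts.get(n.id) ?? 0 }))
    .filter((n) => n.connectionCount >= MIN_CONNECTIONS)
    .sort((a, b) => b.connectionCount - a.connectionCount)
    .slice(0, POOL_SIZE);

  // Draw up to `limit` nodes from the pool without replacement.
  const picked: SampledNode[] = [];
  while (picked.length < limit && pool.length > 0) {
    const j = Math.floor(random() * pool.length);
    picked.push(...pool.splice(j, 1));
  }
  return picked;
}
```

Ranking before the random draw is what keeps results substantive: a shuffle can only ever surface one of the 60 best-connected nodes, never a barely connected one.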

Changes

  • src/lib/schemas/sample-nodes.ts — Zod request/response schemas.
  • src/lib/query/sample-nodes.ts — sampleInterestingNodes().
  • src/routes/query/sample-nodes.ts — POST /query/sample-nodes.
  • src/sdk/memory-client.ts — MemoryClient.sampleNodes().
  • src/sdk/index.ts — barrel re-export of the schemas/types.
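
To make the request side concrete, the sketch below builds the HTTP call that a client method like MemoryClient.sampleNodes() might issue. Only the route (POST /query/sample-nodes), the default limit of 6, and the maximum of 24 come from this PR. The body fields (`userId`, `limit`, `nodeTypes`), the lower bound of 1, and the helper name `buildSampleNodesRequest` are assumptions for illustration; the real shapes are defined by the Zod schemas in src/lib/schemas/sample-nodes.ts.

```typescript
// Hypothetical request shape; the real one is defined by the Zod schemas in the PR.
type SampleNodesRequest = { userId: string; limit?: number; nodeTypes?: string[] };
type BuiltRequest = {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
};

const DEFAULT_LIMIT = 6;
const MAX_LIMIT = 24;

function buildSampleNodesRequest(baseUrl: string, req: SampleNodesRequest): BuiltRequest {
  const limit = req.limit ?? DEFAULT_LIMIT;
  // Assumption: limit must be an integer between 1 and MAX_LIMIT.
  if (!Number.isInteger(limit) || limit < 1 || limit > MAX_LIMIT) {
    throw new RangeError(`limit must be an integer between 1 and ${MAX_LIMIT}`);
  }
  return {
    // Strip trailing slashes so the path is joined exactly once.
    url: `${baseUrl.replace(/\/+$/, "")}/query/sample-nodes`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ ...req, limit }),
    },
  };
}
```

A caller would pass the result straight to `fetch(built.url, built.init)` and validate the JSON response against the response schema.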

Note: no version bump in this PR — versioning/publish handled separately after merge to keep version history clean alongside parallel work.

How to test

  • pnpm test --run src/lib/query/sample-nodes.test.ts (needs the test Postgres on localhost:5431; docker compose up -d db). Covers: >=3 + noise-exclusion + labeled-only filtering, nodeTypes + limit, and the empty-result case. 3/3 pass.
  • pnpm run build — clean.

Checklist

  • Build green (pnpm run build)
  • Tests green (3/3, not skipped)
  • Version bump + publish (intentionally deferred)

@gemini-code-assist gemini-code-assist Bot left a comment

Contributor

Code Review

This pull request introduces a new endpoint and SDK method to sample 'interesting' nodes for the Explore empty state, selecting labeled, non-noise nodes with at least three active connections. The implementation includes the query logic using Drizzle ORM, Zod schemas for validation, and integration tests. The review feedback highlights a potential performance bottleneck in PostgreSQL due to the use of an OR condition in the database join. It is recommended to refactor the query using a unionAll subquery to unnest the connections and perform a clean equality join instead.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment on lines +1 to +10
import {
  and,
  desc,
  eq,
  inArray,
  isNotNull,
  notInArray,
  or,
  sql,
} from "drizzle-orm";

Contributor

high

Import unionAll from drizzle-orm to allow rewriting the query without an OR condition in the join, which degrades PostgreSQL query performance.

Suggested change
- import {
-   and,
-   desc,
-   eq,
-   inArray,
-   isNotNull,
-   notInArray,
-   or,
-   sql,
- } from "drizzle-orm";
+ import {
+   and,
+   desc,
+   eq,
+   inArray,
+   isNotNull,
+   notInArray,
+   sql,
+   unionAll,
+ } from "drizzle-orm";
References
  1. Avoid using OR conditions in database joins (e.g., in PostgreSQL) as it can severely degrade query performance by preventing efficient index usage.

Comment on lines +35 to +77
  // CTE 1: connection counts for qualifying nodes.
  const conn = db.$with("conn").as(
    db
      .select({
        id: nodes.id,
        nodeType: nodes.nodeType,
        label: nodeMetadata.label,
        description: nodeMetadata.description,
        connectionCount: sql<string>`count(${claims.id})`.as(
          "connection_count",
        ),
      })
      .from(nodes)
      .innerJoin(nodeMetadata, eq(nodeMetadata.nodeId, nodes.id))
      .innerJoin(
        claims,
        and(
          eq(claims.userId, userId),
          eq(claims.status, "active"),
          or(
            eq(claims.subjectNodeId, nodes.id),
            eq(claims.objectNodeId, nodes.id),
          ),
        ),
      )
      .where(
        and(
          eq(nodes.userId, userId),
          isNotNull(nodeMetadata.label),
          notInArray(nodes.nodeType, NOISE_NODE_TYPES),
          ...(nodeTypes && nodeTypes.length > 0
            ? [inArray(nodes.nodeType, nodeTypes)]
            : []),
        ),
      )
      .groupBy(
        nodes.id,
        nodes.nodeType,
        nodeMetadata.label,
        nodeMetadata.description,
      )
      .having(sql`count(${claims.id}) >= ${MIN_CONNECTIONS}`),
  );

Contributor

high

Avoid using OR conditions in database joins (e.g., or(eq(claims.subjectNodeId, nodes.id), eq(claims.objectNodeId, nodes.id))) as it can severely degrade query performance by preventing efficient index usage in PostgreSQL. Instead, use a unionAll subquery to unnest the subject and object connections, and then perform a clean equality join.

  const connections = unionAll(
    db
      .select({ nodeId: claims.subjectNodeId, claimId: claims.id })
      .from(claims)
      .where(and(eq(claims.userId, userId), eq(claims.status, "active"))),
    db
      .select({ nodeId: claims.objectNodeId, claimId: claims.id })
      .from(claims)
      .where(
        and(
          eq(claims.userId, userId),
          eq(claims.status, "active"),
          isNotNull(claims.objectNodeId),
        ),
      ),
  ).as("connections");

  // CTE 1: connection counts for qualifying nodes.
  const conn = db.$with("conn").as(
    db
      .select({
        id: nodes.id,
        nodeType: nodes.nodeType,
        label: nodeMetadata.label,
        description: nodeMetadata.description,
        connectionCount: sql<string>`count(${connections.claimId})`.as(
          "connection_count",
        ),
      })
      .from(nodes)
      .innerJoin(nodeMetadata, eq(nodeMetadata.nodeId, nodes.id))
      .innerJoin(connections, eq(connections.nodeId, nodes.id))
      .where(
        and(
          eq(nodes.userId, userId),
          isNotNull(nodeMetadata.label),
          notInArray(nodes.nodeType, NOISE_NODE_TYPES),
          ...(nodeTypes && nodeTypes.length > 0
            ? [inArray(nodes.nodeType, nodeTypes)]
            : []),
        ),
      )
      .groupBy(
        nodes.id,
        nodes.nodeType,
        nodeMetadata.label,
        nodeMetadata.description,
      )
      .having(sql`count(${connections.claimId}) >= ${MIN_CONNECTIONS}`),
  );
References
  1. Avoid using OR conditions in database joins (e.g., in PostgreSQL) as it can severely degrade query performance by preventing efficient index usage.

@marcelsamyn

Owner Author

Applied the unionAll rewrite — both subject and object connections are now unnested via UNION ALL and joined on a clean equality predicate, so each branch can use its per-column index instead of the index-defeating OR join. One note for this repo's drizzle-orm (0.41.0): unionAll isn't on the root export here; it's imported from drizzle-orm/pg-core. Counts are unchanged for the seeded data (distinct subject/object pairs) — the 3 query tests still pass and the build is green. Also fixed the prettier formatting that failed CI.
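
The caveat about distinct subject/object pairs can be illustrated with a small in-memory model of the two counting strategies. This is a sketch, not the Drizzle query: the `Claim` type and both function names are hypothetical, and it assumes the schema permits a claim whose subject and object are the same node. In that case the UNION ALL form emits two rows for the one claim, so it counts the claim twice where the OR join counted it once.

```typescript
// Hypothetical claim shape, reduced to the columns that matter for counting.
type Claim = { id: string; subjectNodeId: string; objectNodeId: string | null };

// OR-join semantics: a claim row matches a node once if the node is
// its subject or its object.
function countWithOrJoin(claims: Claim[], nodeId: string): number {
  return claims.filter((c) => c.subjectNodeId === nodeId || c.objectNodeId === nodeId).length;
}

// UNION ALL semantics: one row per subject plus one row per non-null object,
// then a plain equality match on nodeId.
function countWithUnionAll(claims: Claim[], nodeId: string): number {
  const rows = [
    ...claims.map((c) => ({ nodeId: c.subjectNodeId, claimId: c.id })),
    ...claims
      .filter((c) => c.objectNodeId !== null)
      .map((c) => ({ nodeId: c.objectNodeId, claimId: c.id })),
  ];
  return rows.filter((r) => r.nodeId === nodeId).length;
}

const exampleClaims: Claim[] = [
  { id: "c1", subjectNodeId: "a", objectNodeId: "b" },
  { id: "c2", subjectNodeId: "a", objectNodeId: "c" },
  { id: "c3", subjectNodeId: "a", objectNodeId: "a" }, // self-referencing claim
  { id: "c4", subjectNodeId: "b", objectNodeId: null },
];

console.log(countWithOrJoin(exampleClaims, "a")); // 3
console.log(countWithUnionAll(exampleClaims, "a")); // 4
console.log(countWithOrJoin(exampleClaims, "b")); // 2
console.log(countWithUnionAll(exampleClaims, "b")); // 2
```

If self-referencing claims can occur and the original counts must be preserved, counting distinct claim ids in the aggregate is one way to reconcile the two forms.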

@marcelsamyn marcelsamyn merged commit 6e61748 into main Jun 18, 2026
1 check passed
@marcelsamyn marcelsamyn deleted the feat/sample-nodes branch June 18, 2026 07:36