✨ feat(sdk): sampleNodes endpoint for Explore wander-in starters#76
Conversation
There was a problem hiding this comment.
Code Review
This pull request introduces a new endpoint and SDK method to sample 'interesting' nodes for the Explore empty state, selecting labeled, non-noise nodes with at least three active connections. The implementation includes the query logic using Drizzle ORM, Zod schemas for validation, and integration tests. The review feedback highlights a potential performance bottleneck in PostgreSQL due to the use of an OR condition in the database join. It is recommended to refactor the query using a unionAll subquery to unnest the connections and perform a clean equality join instead.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
| import { | ||
| and, | ||
| desc, | ||
| eq, | ||
| inArray, | ||
| isNotNull, | ||
| notInArray, | ||
| or, | ||
| sql, | ||
| } from "drizzle-orm"; |
There was a problem hiding this comment.
Import unionAll from drizzle-orm to allow rewriting the query without an OR condition in the join, which degrades PostgreSQL query performance.
| import { | |
| and, | |
| desc, | |
| eq, | |
| inArray, | |
| isNotNull, | |
| notInArray, | |
| or, | |
| sql, | |
| } from "drizzle-orm"; | |
| import { | |
| and, | |
| desc, | |
| eq, | |
| inArray, | |
| isNotNull, | |
| notInArray, | |
| sql, | |
| unionAll, | |
| } from "drizzle-orm"; |
References
- Avoid using OR conditions in database joins (e.g., in PostgreSQL) as it can severely degrade query performance by preventing efficient index usage.
| // CTE 1: connection counts for qualifying nodes. | ||
| const conn = db.$with("conn").as( | ||
| db | ||
| .select({ | ||
| id: nodes.id, | ||
| nodeType: nodes.nodeType, | ||
| label: nodeMetadata.label, | ||
| description: nodeMetadata.description, | ||
| connectionCount: sql<string>`count(${claims.id})`.as( | ||
| "connection_count", | ||
| ), | ||
| }) | ||
| .from(nodes) | ||
| .innerJoin(nodeMetadata, eq(nodeMetadata.nodeId, nodes.id)) | ||
| .innerJoin( | ||
| claims, | ||
| and( | ||
| eq(claims.userId, userId), | ||
| eq(claims.status, "active"), | ||
| or( | ||
| eq(claims.subjectNodeId, nodes.id), | ||
| eq(claims.objectNodeId, nodes.id), | ||
| ), | ||
| ), | ||
| ) | ||
| .where( | ||
| and( | ||
| eq(nodes.userId, userId), | ||
| isNotNull(nodeMetadata.label), | ||
| notInArray(nodes.nodeType, NOISE_NODE_TYPES), | ||
| ...(nodeTypes && nodeTypes.length > 0 | ||
| ? [inArray(nodes.nodeType, nodeTypes)] | ||
| : []), | ||
| ), | ||
| ) | ||
| .groupBy( | ||
| nodes.id, | ||
| nodes.nodeType, | ||
| nodeMetadata.label, | ||
| nodeMetadata.description, | ||
| ) | ||
| .having(sql`count(${claims.id}) >= ${MIN_CONNECTIONS}`), | ||
| ); |
There was a problem hiding this comment.
Avoid using OR conditions in database joins (e.g., or(eq(claims.subjectNodeId, nodes.id), eq(claims.objectNodeId, nodes.id))) as it can severely degrade query performance by preventing efficient index usage in PostgreSQL. Instead, use a unionAll subquery to unnest the subject and object connections, and then perform a clean equality join.
const connections = unionAll(
db
.select({ nodeId: claims.subjectNodeId, claimId: claims.id })
.from(claims)
.where(and(eq(claims.userId, userId), eq(claims.status, "active"))),
db
.select({ nodeId: claims.objectNodeId, claimId: claims.id })
.from(claims)
.where(
and(
eq(claims.userId, userId),
eq(claims.status, "active"),
isNotNull(claims.objectNodeId),
),
),
).as("connections");
// CTE 1: connection counts for qualifying nodes.
const conn = db.$with("conn").as(
db
.select({
id: nodes.id,
nodeType: nodes.nodeType,
label: nodeMetadata.label,
description: nodeMetadata.description,
connectionCount: sql<string>`count(${connections.claimId})`.as(
"connection_count",
),
})
.from(nodes)
.innerJoin(nodeMetadata, eq(nodeMetadata.nodeId, nodes.id))
.innerJoin(connections, eq(connections.nodeId, nodes.id))
.where(
and(
eq(nodes.userId, userId),
isNotNull(nodeMetadata.label),
notInArray(nodes.nodeType, NOISE_NODE_TYPES),
...(nodeTypes && nodeTypes.length > 0
? [inArray(nodes.nodeType, nodeTypes)]
: []),
),
)
.groupBy(
nodes.id,
nodes.nodeType,
nodeMetadata.label,
nodeMetadata.description,
)
.having(sql`count(${connections.claimId}) >= ${MIN_CONNECTIONS}`),
);References
- Avoid using OR conditions in database joins (e.g., in PostgreSQL) as it can severely degrade query performance by preventing efficient index usage.
c12aba8 to
51bbedb
Compare
|
Applied the |
What & why
Adds a focused
sampleNodesSDK endpoint that samples a handful of interesting (well-connected) nodes — the backing data for Petals' Memory → Explore empty-state "wander in" starters. The existing no-queryqueryGraphreturns the entire labeled graph, far too heavy to fetch just to render ~6 chips, so this is a dedicated, cheap query.Selection logic
userId-owned nodes withconnectionCount >= 3.Temporal,Atlas,AssistantDream.nodeTypesfilter.limit(default 6, max 24) at random — substantive results that vary per call (the UI's "shuffle").Single Drizzle query (two CTEs →
ORDER BY random()); no embeddings, no LLM.Changes
src/lib/schemas/sample-nodes.ts— Zod request/response schemas.src/lib/query/sample-nodes.ts—sampleInterestingNodes().src/routes/query/sample-nodes.ts—POST /query/sample-nodes.src/sdk/memory-client.ts—MemoryClient.sampleNodes().src/sdk/index.ts— barrel re-export of the schemas/types.How to test
pnpm test --run src/lib/query/sample-nodes.test.ts(needs the test Postgres onlocalhost:5431;docker compose up -d db). Covers:>=3+ noise-exclusion + labeled-only filtering,nodeTypes+limit, and the empty-result case. 3/3 pass.pnpm run build— clean.Checklist
pnpm run build)