
feat(appkit): send internal telemetry via AppkitLog schema#332

Open
calvarjorge wants to merge 20 commits into main from jorge.calvar/send_telemetry


calvarjorge (Contributor) commented on Apr 30, 2026

Summary

  • Introduce the AppkitLog events (APP_STARTUP, HEARTBEAT, REQUEST_METRICS) and a TelemetryReporter singleton that owns shared dispatch state, the periodic heartbeat, and per-endpoint request metrics aggregation.
  • The server plugin records each matched route via res.on('finish') middleware; the reporter flushes one event per (method, route) on a periodic timer. The heartbeat and flush intervals are configurable via APPKIT_TELEMETRY_HEARTBEAT_INTERVAL_MS and APPKIT_TELEMETRY_METRICS_FLUSH_INTERVAL_MS.
  • Events are POSTed to /telemetry-ext?o=<workspaceId> with the workspace SP bearer token. Errors propagate from the inner senders so consumers can see exactly what was sent and how the endpoint responded; errors are swallowed only at the SDK's outermost boundaries (the fire-and-forget startup call and the interval timers).
  • Adds an Internal Telemetry tab in dev-playground that lets you trigger each event on demand and renders the request, response, and an equivalent curl command. Disable globally with disableInternalTelemetry: true on createApp or APPKIT_TELEMETRY_DISABLED=true.
  • Fixes: app.yaml was missing the DATABRICKS_JOB_ID binding that the jobs plugin requires (deploying to a fresh app failed startup validation); a knip false positive on the @cyclonedx/cdxgen dependency, which is invoked via pnpm exec cdxgen, blocked all pre-commit hooks.
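The per-endpoint aggregation described above can be pictured with a small TypeScript sketch. It is not the PR's implementation: `RequestMetricsAggregator`, `EndpointMetrics`, the per-status counts, and `readIntervalMs` are hypothetical, and the actual REQUEST_METRICS fields and default intervals are not given in this description.

```typescript
// Hypothetical shape of one aggregated REQUEST_METRICS entry.
interface EndpointMetrics {
  method: string;
  route: string; // route template, e.g. "/api/items/:id"
  count: number;
  statusCounts: Record<number, number>;
}

// Accumulates counts per (method, route) between flushes.
class RequestMetricsAggregator {
  #buckets = new Map<string, EndpointMetrics>();

  record(method: string, route: string, statusCode: number): void {
    const key = `${method} ${route}`;
    let bucket = this.#buckets.get(key);
    if (bucket === undefined) {
      bucket = { method, route, count: 0, statusCounts: {} };
      this.#buckets.set(key, bucket);
    }
    bucket.count += 1;
    bucket.statusCounts[statusCode] = (bucket.statusCounts[statusCode] ?? 0) + 1;
  }

  // Returns one entry per (method, route) seen since the last drain, then resets.
  drain(): EndpointMetrics[] {
    const entries = Array.from(this.#buckets.values());
    this.#buckets.clear();
    return entries;
  }
}

// Reads a positive millisecond interval from an env-like map
// (e.g. APPKIT_TELEMETRY_METRICS_FLUSH_INTERVAL_MS); falls back when unset or invalid.
function readIntervalMs(
  env: Record<string, string | undefined>,
  name: string,
  fallbackMs: number,
): number {
  const parsed = Number(env[name]);
  return Number.isFinite(parsed) && parsed > 0 ? parsed : fallbackMs;
}
```

A flush timer would call `drain()` on each tick and emit one event per returned entry. Keying buckets on the route template rather than the raw URL keeps their number bounded by the app's route table.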

Introduce the AppkitLog event family (APP_STARTUP, HEARTBEAT,
REQUEST_METRICS) and a TelemetryReporter singleton that owns the
shared dispatch state, periodic heartbeat, and request metrics
aggregation. The server plugin records each matched route via
res.on('finish') middleware; the reporter flushes one event per
endpoint on a periodic timer. The legacy observability_log
APP_STARTUP is kept as a fallback until the AppkitLog schema is
deployed end-to-end on the telemetry backend.

Errors propagate from the inner senders so consumers can see
exactly what was POSTed and how the endpoint responded; the only
catches live at the SDK's outermost boundaries (fire-and-forget
startup + interval timers).
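The split between throwing inner senders and a single catch at the outermost boundary can be sketched as follows. This is an illustration under assumptions, not AppKit's code: `TelemetrySendError`, `sendEvent`, `fireAndForget`, and the injected `post` function are hypothetical names.

```typescript
// Carries what the endpoint actually returned, so callers can inspect it.
class TelemetrySendError extends Error {
  readonly status: number;
  readonly responseBody: string;

  constructor(status: number, responseBody: string) {
    super(`telemetry endpoint responded ${status}: ${responseBody}`);
    this.name = "TelemetrySendError";
    this.status = status;
    this.responseBody = responseBody;
  }
}

// Minimal response shape the sender needs (a subset of fetch's Response).
interface PostResult {
  status: number;
  text(): Promise<string>;
}

// Inner sender: no try/catch, so any failure propagates to the caller.
async function sendEvent(
  post: (body: string) => Promise<PostResult>,
  event: object,
): Promise<void> {
  const res = await post(JSON.stringify(event));
  if (res.status < 200 || res.status >= 300) {
    throw new TelemetrySendError(res.status, await res.text());
  }
}

// Outermost boundary (startup call, interval callbacks): never awaited, errors dropped.
// Promise.resolve().then(...) also turns a synchronous throw into a rejection.
function fireAndForget(task: () => Promise<unknown>): void {
  Promise.resolve()
    .then(task)
    .catch(() => {
      // Internal telemetry must not delay or break the host app.
    });
}
```

A debugging harness can await `sendEvent` directly to see the status and body, while app code wraps the same call, e.g. `fireAndForget(() => sendEvent(post, startupEvent))`.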

Adds an Internal Telemetry tab in dev-playground that lets you
trigger each event on demand and renders the request, response,
and an equivalent curl command. Disable with
disableInternalTelemetry: true or APPKIT_TELEMETRY_DISABLED=true.

Also unblocks the pre-commit knip hook by ignoring the @cyclonedx/cdxgen
dependency, which is invoked dynamically via pnpm exec from the
release:sbom script.

Co-authored-by: Isaac
Signed-off-by: Jorge Calvar <[email protected]>

The dev-playground registers the jobs() plugin, whose manifest
requires DATABRICKS_JOB_ID, but app.yaml never declared a binding
for it. As a result, deploying the playground to a fresh
Databricks App fails AppKit's startup resource validation with
"Missing required resources: job:Job [jobs]".

Add the missing entry alongside the other resource bindings.

Co-authored-by: Isaac
Signed-off-by: Jorge Calvar <[email protected]>

The bare /telemetry endpoint rejects SP bearer tokens and 302s to
/login.html?next_url=..., which the previous code tried to follow
verbatim — but a relative location is not a valid fetch URL and
threw a "Failed to parse URL" error that the legacy try/catch
silently swallowed. Switch the dispatch URL to the SP-friendly
/telemetry-ext endpoint, and harden redirect handling by resolving
the location against the original request URL.

Co-authored-by: Isaac
Signed-off-by: Jorge Calvar <[email protected]>
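Both URL fixes in this commit rest on standard WHATWG `URL` behavior. A small sketch, assuming the workspace host and ID are already known; `buildTelemetryUrl` and `resolveRedirectLocation` are hypothetical helper names, and later commits in this PR remove the redirect handling entirely.

```typescript
// Builds the SP-friendly dispatch URL: <host>/telemetry-ext?o=<workspaceId>.
function buildTelemetryUrl(host: string, workspaceId: string): URL {
  const url = new URL("/telemetry-ext", host);
  url.searchParams.set("o", workspaceId);
  return url;
}

// A Location header such as "/login.html?next_url=..." is relative and cannot be
// parsed on its own; resolving it against the request URL yields an absolute URL.
function resolveRedirectLocation(requestUrl: URL | string, location: string): URL {
  return new URL(location, requestUrl);
}
```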
Now that the AppkitLog dispatch path is verified end-to-end against
/telemetry-ext, the observability_log fallback is no longer needed.
Remove sendStartupTelemetry, the dead StartupTelemetryParams /
buildEntityId / buildLegacyStartupPayload helpers, and the second
fire-and-forget block in createApp's bootstrap. Migrate the sender
test suite to cover sendAppkitLogs (which retains all the URL,
auth, redirect, and error-propagation guarantees) and rewrite the
core internal-telemetry tests against the reporter mock.

Co-authored-by: Isaac
Signed-off-by: Jorge Calvar <[email protected]>

Databricks Apps injects DATABRICKS_CLIENT_ID (the app's OAuth
client UUID) into the runtime env, not DATABRICKS_APP_ID, so the
old lookup always resolved to "" and the AppkitLog.app_id field
went out empty. Switch the bootstrap to read DATABRICKS_CLIENT_ID
so logs carry the actual per-app identifier.

Co-authored-by: Isaac
Signed-off-by: Jorge Calvar <[email protected]>

Rename APPKIT_TELEMETRY_DISABLED to DISABLE_APPKIT_INTERNAL_TELEMETRY.
The new name makes it explicit that this controls AppKit's
internal/anonymized telemetry, not the user-facing OpenTelemetry
config exposed via createApp({ telemetry }).

Co-authored-by: Isaac
Signed-off-by: Jorge Calvar <[email protected]>

sender.ts only wrapped postTelemetry with buildAppkitPayload: one
caller, no shared logic worth a dedicated file. Have the reporter's
#send call postTelemetry directly and rename the test file from
sender.test.ts to client.test.ts so the wire-format coverage
(URL, auth, redirects, error propagation) clearly targets
postTelemetry.

Co-authored-by: Isaac
Signed-off-by: Jorge Calvar <[email protected]>

Document exactly what AppKit collects (event_name, app_id,
appkit_version, plus per-event bodies for APP_STARTUP, HEARTBEAT,
and REQUEST_METRICS), how it's sent, and the two ways to disable —
disableInternalTelemetry on createApp and the
DISABLE_APPKIT_INTERNAL_TELEMETRY env var. Bump faq.md's sidebar
position to 8 to make room. Add a one-paragraph header in the
package's index.ts pointing at the public doc.

Co-authored-by: Isaac
Signed-off-by: Jorge Calvar <[email protected]>

- Refuse cross-origin redirects from /telemetry-ext so the live SP
  Authorization header cannot be replayed against a third party.
- Fall back to serviceCtx.client.config.host when DATABRICKS_HOST
  is unset so the dispatch URL still resolves correctly when the
  SDK was given a pre-configured WorkspaceClient.
- Redact Authorization / Cookie / Set-Cookie when surfacing the
  request in the dev-playground debug UI and in the printed curl,
  so the sensitive headers don't leak via the response or get
  copy-pasted into shared logs.
- Revert the knip.json @cyclonedx/cdxgen exception. Earlier
  diagnosis was wrong — the warnings are notices, not errors, and
  the original pre-commit failures came from this branch's own
  unused exports (now trimmed). With the branch rebased on origin/
  main, knip exits 0 against the unmodified config.

Co-authored-by: Isaac
Signed-off-by: Jorge Calvar <[email protected]>
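The header redaction for the debug UI and the printed curl command could look roughly like this sketch. Only the three header names come from the commit message; `redactHeaders`, `shellQuote`, `toCurl`, and the `<redacted>` placeholder are illustrative assumptions, and the quoting assumes a POSIX shell.

```typescript
// Header names that must never be echoed back (compared case-insensitively).
const SENSITIVE_HEADERS = new Set(["authorization", "cookie", "set-cookie"]);

function redactHeaders(headers: Record<string, string>): Record<string, string> {
  const out: Record<string, string> = {};
  for (const [name, value] of Object.entries(headers)) {
    out[name] = SENSITIVE_HEADERS.has(name.toLowerCase()) ? "<redacted>" : value;
  }
  return out;
}

// POSIX single-quoting: close the quote, emit an escaped quote, reopen.
function shellQuote(s: string): string {
  return "'" + s.replace(/'/g, "'\\''") + "'";
}

// Renders an equivalent curl command with sensitive headers already redacted.
function toCurl(
  method: string,
  url: string,
  headers: Record<string, string>,
  body: string,
): string {
  const parts = ["curl", "-X", method, shellQuote(url)];
  for (const [name, value] of Object.entries(redactHeaders(headers))) {
    parts.push("-H", shellQuote(`${name}: ${value}`));
  }
  parts.push("--data", shellQuote(body));
  return parts.join(" ");
}
```

Redacting inside `toCurl` itself, rather than trusting every caller to redact first, means a copy-pasted command cannot carry the live token.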
Both branches in fetchWithRedirect (cross-origin throw + same-origin
follow) want to release the redirect's response body before moving
on. Run the cancel once after we've parsed the target URL, before
deciding what to do, instead of repeating it on each side.

Co-authored-by: Isaac
Signed-off-by: Jorge Calvar <[email protected]>
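A sketch of a redirect-following helper with the same-origin check and a single body cancel, written against an injected fetch-like function so it runs without a network. `fetchWithRedirect` matches the name in the commit message, but this signature, the `maxRedirects` limit, and re-sending the original method on every hop are assumptions; later commits in this PR delete the helper.

```typescript
// Request options forwarded on every hop (hypothetical shape).
interface TelemetryRequestInit {
  method: string;
  headers: Record<string, string>;
  body?: string;
}

// Fetch-compatible function, injected so the helper can be tested offline.
type FetchLike = (
  url: string,
  init: TelemetryRequestInit & { redirect: "manual" },
) => Promise<Response>;

async function fetchWithRedirect(
  fetchImpl: FetchLike,
  url: string,
  init: TelemetryRequestInit,
  maxRedirects = 3,
): Promise<Response> {
  let current = new URL(url);
  for (let hop = 0; ; hop++) {
    // redirect: "manual" hands the 3xx back to us instead of following it.
    const res = await fetchImpl(current.href, { ...init, redirect: "manual" });
    const location = res.headers.get("location");
    if (res.status < 300 || res.status >= 400 || location === null) {
      return res;
    }
    const target = new URL(location, current);
    // Release the redirect body once, before deciding whether to follow or refuse.
    await res.body?.cancel();
    if (target.origin !== current.origin) {
      // Never replay the Authorization header against another origin.
      throw new Error(`refusing cross-origin redirect to ${target.origin}`);
    }
    if (hop >= maxRedirects) {
      throw new Error(`too many redirects (>${maxRedirects})`);
    }
    current = target;
  }
}
```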
The /telemetry-ext endpoint may not actually issue redirects in
practice; the follow logic was added defensively. Inline a single
fetch with redirect: "manual" so any 3xx surfaces directly to the
caller (and the dev-playground UI), making it easy to verify
whether the redirect path is exercised on real traffic.

If the deployed app never produces a 3xx response, the helper
function and its tests stay deleted; if it does, restore them.

Co-authored-by: Isaac
Signed-off-by: Jorge Calvar <[email protected]>

calvarjorge force-pushed the jorge.calvar/send_telemetry branch from 04a9847 to 08378b9 on April 30, 2026 10:58

The /telemetry-ext endpoint does not actually redirect under normal
SP-authenticated traffic, so the redirect-follow logic, the
redirect: "manual" hint on fetch, and the corresponding tests were
all dead weight. Inline a plain fetch and remove the related test.

Also drop the "Inspecting events locally" section from the public
internal-telemetry doc — the dev-playground harness is internal
tooling and isn't relevant to library consumers.

Co-authored-by: Isaac
Signed-off-by: Jorge Calvar <[email protected]>

calvarjorge marked this pull request as draft on May 4, 2026 12:43
calvarjorge marked this pull request as ready for review on May 4, 2026 12:43
calvarjorge requested a review from a team as a code owner on May 4, 2026 12:43

pkosiec (Member) left a comment

Nice work! Could you share a short internal demo of how it works end to end? I'm not sure how to test it properly.

Thanks!

Comment thread packages/appkit/src/index.ts Outdated
Comment thread docs/docs/internal-telemetry.mdx Outdated
cache?: CacheConfig;
client?: WorkspaceClient;
onPluginsReady?: (appkit: PluginMap<T>) => void | Promise<void>;
disableInternalTelemetry?: boolean;

pkosiec (Member) commented:

I'd vote for removing that. IMO the environment variable is enough 👍

calvarjorge (Contributor, PR author) replied:

This was the proposal in the design doc. Since we're obtaining some data from customer apps, we wanted to be as transparent as possible about it.

Comment thread docs/docs/internal-telemetry.mdx Outdated

Either one fully disables the reporter — no events are emitted and no
network calls are made.

pkosiec (Member) commented:

We could also support DO_NOT_TRACK=1 - https://donottrack.sh/

calvarjorge (Contributor, PR author) replied:

The page is not loading for me

calvarjorge (Contributor, PR author) replied:

Added DO_NOT_TRACK=1 as a way to disable it nevertheless

Comment thread docs/docs/internal-telemetry.mdx Outdated
Comment thread .gitignore
Comment thread packages/appkit/src/internal-telemetry/reporter.ts Outdated
Comment thread packages/appkit/src/internal-telemetry/index.ts Outdated
Comment thread packages/appkit/src/internal-telemetry/client.ts Outdated
Comment thread packages/appkit/src/internal-telemetry/appkit-log.ts
appkitVersion: productVersion,
});
reporter.start();
reporter.sendStartup().catch(() => {});

A collaborator commented:

We're calling this (it's asynchronous) and basically ignoring the promise. Is this on purpose?

calvarjorge (Contributor, PR author) replied:

Yes, we don't want to cause delays or raise errors because of internal telemetry.

… debug

The dev-playground debug plugin was the only consumer of
TelemetryReporter, TelemetrySendRequest, TelemetrySendResponse, and
TelemetrySendResult outside the package. Since dev-playground is
internal tooling for AppKit maintainers, exporting infrastructure
types via the public surface only to feed it isn't justified.

Drop:
- TelemetryReporter and the TelemetrySend* types from the public
  @databricks/appkit exports
- The TelemetrySend* re-exports from internal-telemetry/index.ts
  (no remaining consumer)
- The internal-telemetry-debug plugin and its routes
- The /internal-telemetry route, home-page card, and nav link in
  dev-playground

The reporter and types remain available intra-package for the
core/server wiring; nothing about the dispatch behavior changes.

Co-authored-by: Isaac
Signed-off-by: Jorge Calvar <[email protected]>

- Rename docs/docs/internal-telemetry.mdx to privacy.mdx so library
  consumers don't confuse it with the user-facing OpenTelemetry
  config exposed via createApp({ telemetry }).
- Move the page to the bottom of the sidebar (sidebar_position: 99).
- Honor the cross-tool DO_NOT_TRACK=1 convention
  (https://consoledonottrack.com) alongside the AppKit-specific
  DISABLE_APPKIT_INTERNAL_TELEMETRY env var.
- Document the new opt-out path on the Privacy page and update the
  in-source comment that pointed at the old doc location.

Co-authored-by: Isaac
Signed-off-by: Jorge Calvar <[email protected]>
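Taken together, the three opt-outs reduce to a single check. The sketch below is an assumption about how they might combine, not AppKit's code: `isInternalTelemetryDisabled` is a hypothetical name, the environment is passed in as a plain map so it is easy to test, and accepting both `1` and `true` for either variable is broader than what the PR documents (`DISABLE_APPKIT_INTERNAL_TELEMETRY=true`, `DO_NOT_TRACK=1`).

```typescript
interface InternalTelemetryOptions {
  // Mirrors the createApp({ disableInternalTelemetry }) flag.
  disableInternalTelemetry?: boolean;
}

// Treats "1" and "true" (any case) as set; anything else, including unset, as not set.
function isTruthyEnv(value: string | undefined): boolean {
  return value === "1" || value?.toLowerCase() === "true";
}

function isInternalTelemetryDisabled(
  options: InternalTelemetryOptions,
  env: Record<string, string | undefined>,
): boolean {
  return (
    options.disableInternalTelemetry === true ||
    isTruthyEnv(env.DISABLE_APPKIT_INTERNAL_TELEMETRY) ||
    // Cross-tool convention, see https://consoledonottrack.com
    isTruthyEnv(env.DO_NOT_TRACK)
  );
}
```

In a Node process this would be called as `isInternalTelemetryDisabled(config, process.env)`; when it returns true, the reporter is simply never started, so no events or network calls happen.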
The project uses moduleResolution: "bundler", which doesn't require
explicit .js extensions on relative imports. The rest of appkit
imports without them; only internal-telemetry/ added them, by
mistake on my part. Strip them to match the surrounding codebase.

Co-authored-by: Isaac
Signed-off-by: Jorge Calvar <[email protected]>

Drop the "we do not send X, Y, Z" paragraph: committing to a
non-collection list ties our hands if the schema later changes.
Also drop the "How it's sent" section; the dispatch endpoint and
fire-and-forget semantics are implementation details users don't
need to track.

Co-authored-by: Isaac
Signed-off-by: Jorge Calvar <[email protected]>

The dev-playground no longer needs the aggregated plugin manifest;
the file is generated by `appkit plugin sync` and isn't read by
any current dev-playground flow.

Co-authored-by: Isaac
Signed-off-by: Jorge Calvar <[email protected]>

Drop the placeholder: true field from app_startup_event and
heartbeat_event payloads; an empty object is enough to identify
the oneof variant on the wire.

Co-authored-by: Isaac
Signed-off-by: Jorge Calvar <[email protected]>
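The resulting wire shape can be described with a discriminated union. This is a sketch: `event_name`, `app_id`, `appkit_version`, `app_startup_event`, and `heartbeat_event` come from this PR's commit messages, while the `AppkitLog` type as written, `request_metrics_event` and its fields, and the builder functions are hypothetical.

```typescript
// An empty object selects a oneof variant without carrying any fields.
type EmptyBody = Record<string, never>;

interface AppkitLogCommon {
  app_id: string; // per-app identifier, read from DATABRICKS_CLIENT_ID
  appkit_version: string;
}

type AppkitLog =
  | (AppkitLogCommon & { event_name: "APP_STARTUP"; app_startup_event: EmptyBody })
  | (AppkitLogCommon & { event_name: "HEARTBEAT"; heartbeat_event: EmptyBody })
  | (AppkitLogCommon & {
      event_name: "REQUEST_METRICS";
      // Hypothetical field name and contents.
      request_metrics_event: { method: string; route: string; count: number };
    });

function buildStartupLog(appId: string, appkitVersion: string): AppkitLog {
  return { event_name: "APP_STARTUP", app_id: appId, appkit_version: appkitVersion, app_startup_event: {} };
}

function buildHeartbeatLog(appId: string, appkitVersion: string): AppkitLog {
  return { event_name: "HEARTBEAT", app_id: appId, appkit_version: appkitVersion, heartbeat_event: {} };
}
```

In practice `appId` would come from `process.env.DATABRICKS_CLIENT_ID ?? ""`, matching the DATABRICKS_CLIENT_ID fix earlier in this PR.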
The SDK's apiClient.request already does host resolution,
authentication, query string handling, and User-Agent setting that
client.ts was reimplementing by hand. Drop client.ts (and its
tests), inline a single apiClient.request call inside reporter's
#send, and remove the now-unused workspaceHost from the reporter
constructor.

Tests for the reporter switch to mocking apiClient.request on the
WorkspaceClient mock instead of stubbing global fetch.

Co-authored-by: Isaac
Signed-off-by: Jorge Calvar <[email protected]>
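Delegating to the SDK means the reporter only names the path, method, and payload, and tests can swap in a recording mock. A sketch under assumptions: `ApiRequester` is a hypothetical structural stand-in whose `request(path, method, payload)` signature may not match the real SDK's `apiClient.request`, and `ReporterSketch` is not AppKit's `TelemetryReporter`.

```typescript
// Hypothetical stand-in for the SDK client's request method; check the real signature.
interface ApiRequester {
  request(path: string, method: string, payload?: unknown): Promise<unknown>;
}

class ReporterSketch {
  #api: ApiRequester;
  #workspaceId: string;

  constructor(api: ApiRequester, workspaceId: string) {
    this.#api = api;
    this.#workspaceId = workspaceId;
  }

  sendHeartbeat(appId: string, appkitVersion: string): Promise<unknown> {
    return this.#send({ event_name: "HEARTBEAT", app_id: appId, appkit_version: appkitVersion, heartbeat_event: {} });
  }

  // Host resolution, auth headers, and User-Agent are left to the injected client.
  #send(log: object): Promise<unknown> {
    const path = `/telemetry-ext?o=${encodeURIComponent(this.#workspaceId)}`;
    return this.#api.request(path, "POST", log);
  }
}
```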
…dleware

- Call TelemetryReporter.getInstance()?.stop() in the server
  plugin's _gracefulShutdown so heartbeat/metric timers don't keep
  firing during the 15s shutdown grace window.
- Export requestMetricsMiddleware as @internal and add unit tests
  for the integration point: matched routes, baseUrl + route
  template assembly, no-op when req.route is unset, no-op when the
  reporter is uninitialized, and 4xx/5xx status pass-through.

Co-authored-by: Isaac
Signed-off-by: Jorge Calvar <[email protected]>
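The middleware contract listed in this commit (matched routes only, baseUrl plus route template, no-op without a reporter, status passed through) fits in a few lines. A sketch under assumptions: `ReqLike` and `ResLike` model only the Express fields it reads, and the module-level `currentReporter` is a hypothetical stand-in for `TelemetryReporter.getInstance()`.

```typescript
interface ReqLike {
  method: string;
  baseUrl: string;
  route?: { path: string };
}

interface ResLike {
  statusCode: number;
  on(event: "finish", listener: () => void): unknown;
}

interface MetricsRecorder {
  recordRequest(method: string, route: string, statusCode: number): void;
}

// Hypothetical stand-in for TelemetryReporter.getInstance(); null when uninitialized.
let currentReporter: MetricsRecorder | null = null;

function requestMetricsMiddleware(req: ReqLike, res: ResLike, next: () => void): void {
  res.on("finish", () => {
    // Only requests that matched a route carry req.route; read it once the response is done.
    if (!req.route) return;
    const reporter = currentReporter;
    if (!reporter) return;
    // Record the template ("/api" + "/items/:id"), not the concrete URL; 4xx/5xx pass through as-is.
    reporter.recordRequest(req.method, req.baseUrl + req.route.path, res.statusCode);
  });
  next();
}
```

On shutdown, the reporter's `stop()` is what keeps timers from firing during the grace window; a `setInterval`-based reporter would call `clearInterval` there.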
calvarjorge requested a review from pkosiec on May 6, 2026 13:57

Labels: None yet
Projects: None yet
3 participants