Skip to content

feat: [WIP] configure klog to write to stderr to prevent JSON corruption#60

Open
ropatil010 wants to merge 1 commit into
openshift-eng:mainfrom
ropatil010:klog-stderr
Open

feat: [WIP] configure klog to write to stderr to prevent JSON corruption#60
ropatil010 wants to merge 1 commit into
openshift-eng:mainfrom
ropatil010:klog-stderr

Conversation

@ropatil010

Copy link
Copy Markdown

Problem:
The 'list tests' command outputs JSON to stdout for consumption by CI systems. However, klog (used by imported k8s packages) defaults to stdout, causing deprecation warnings and other log messages to corrupt the JSON stream.

This was discovered in openshift/cluster-authentication-operator#857 where klog warnings like "W0327 12:34:56.123456 12345 warnings.go:70] deprecation warning" appeared in stdout, breaking JSON parsers with errors like: "invalid character 'W' looking for beginning of value"

Solution:
Configure klog to write to stderr in pkg/cmd/cmd.go init() function, executed before any commands run. This keeps stdout clean for structured output while directing all logs/diagnostics to stderr.

This approach is consistent with how pkg/ginkgo/util.go already configures GinkgoWriter to use stderr, following the Unix convention:

  • stdout = structured data (JSON)
  • stderr = logs and diagnostics

Impact:
CI systems can now reliably parse test lists without klog corruption.

Reference: openshift/cluster-authentication-operator#857

Problem:
The 'list tests' command outputs JSON to stdout for consumption by CI systems.
However, klog (used by imported k8s packages) defaults to stdout, causing
deprecation warnings and other log messages to corrupt the JSON stream.

This was discovered in openshift/cluster-authentication-operator#857 where
klog warnings like "W0327 12:34:56.123456 12345 warnings.go:70] deprecation
warning" appeared in stdout, breaking JSON parsers with errors like:
"invalid character 'W' looking for beginning of value"

Solution:
Configure klog to write to stderr in pkg/cmd/cmd.go init() function, executed
before any commands run. This keeps stdout clean for structured output while
directing all logs/diagnostics to stderr.

This approach is consistent with how pkg/ginkgo/util.go already configures
GinkgoWriter to use stderr, following the Unix convention:
- stdout = structured data (JSON)
- stderr = logs and diagnostics

Impact:
CI systems can now reliably parse test lists without klog corruption.

Reference: openshift/cluster-authentication-operator#857

Co-Authored-By: Rohit Patil <[email protected]>
@openshift-ci openshift-ci Bot requested review from jupierce and stbenjam April 6, 2026 10:20
@openshift-ci

openshift-ci Bot commented Apr 6, 2026

Copy link
Copy Markdown

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: ropatil010
Once this PR has been reviewed and has the lgtm label, please assign jupierce for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci Bot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Apr 6, 2026
@openshift-ci

openshift-ci Bot commented Apr 6, 2026

Copy link
Copy Markdown

Hi @ropatil010. Thanks for your PR.

I'm waiting for a openshift-eng member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work.

Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@ropatil010

Copy link
Copy Markdown
Author

/ok-to-test

@openshift-ci

openshift-ci Bot commented Apr 6, 2026

Copy link
Copy Markdown

@ropatil010: Cannot trigger testing until a trusted user reviews the PR and leaves an /ok-to-test message.

Details

In response to this:

/ok-to-test

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

ropatil010 added a commit to ropatil010/cluster-authentication-operator that referenced this pull request Apr 6, 2026
Remove local pkg/test/init workaround in favor of framework-level
fix from openshift-eng/openshift-tests-extension#60.

Temporary replace directive points to ropatil010:klog-stderr branch
until the PR merges upstream.

Co-Authored-By: Rohit Patil <[email protected]>
@jupierce

jupierce commented Apr 6, 2026

Copy link
Copy Markdown
Contributor

/ok-to-test

@openshift-ci openshift-ci Bot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Apr 6, 2026
@ropatil010 ropatil010 changed the title feat: [WIP] configure klog to write to stderr to prevent JSON corruption feat: configure klog to write to stderr to prevent JSON corruption Apr 6, 2026
@ropatil010

Copy link
Copy Markdown
Author

Hi @stbenjam @jupierce Can you PTAL on this PR.

@stbenjam

stbenjam commented Apr 6, 2026

Copy link
Copy Markdown
Member

Hm, this PR increases the vendored footprint of OTE which we were hoping to keep minimal.

init() can also have surprising behavior: what if an OTE extension has its own init that resets klog to Stdout? We'd get non-deterministic failures. Is there a reason to prefer this approach vs. putting the stderr setting in your extension early on?

Also, do we know where klog is getting set to stdout? Is that something we could fix upstream?
I beleive klog default is stderr, so it gets changed somewhere

@ropatil010

Copy link
Copy Markdown
Author

Hi @stbenjam
Here is the revert PR from TRT team: openshift/cluster-authentication-operator#857.
On my latest PR: openshift/cluster-authentication-operator@9541074 to solve the CI failures i have used init functionality.
As per the suggestion from @liouk here: openshift/cluster-authentication-operator#859 (comment) i tried to implement in the OTE framework directly to avoid using init in each repos.
cc: @liouk

@stbenjam

stbenjam commented Apr 6, 2026

Copy link
Copy Markdown
Member

As per the suggestion from @liouk here: openshift/cluster-authentication-operator#859 (comment) i tried to implement in the OTE framework directly to avoid using init in each repos.

Hm, that comment says this issue seems that it's caused by OTE itself but nothing here causes klog to go to stdout. I would try to find what it is doing it and address it at the root

@ropatil010 ropatil010 changed the title feat: configure klog to write to stderr to prevent JSON corruption feat: [WIP] configure klog to write to stderr to prevent JSON corruption Apr 7, 2026
ropatil010 added a commit to ropatil010/cluster-authentication-operator that referenced this pull request Apr 7, 2026
Remove local pkg/test/init workaround in favor of framework-level
fix from openshift-eng/openshift-tests-extension#60.

Temporary replace directive points to ropatil010:klog-stderr branch
until the PR merges upstream.

Co-Authored-By: Rohit Patil <[email protected]>
@liouk

liouk commented Apr 14, 2026

Copy link
Copy Markdown

I would try to find what it is doing it and address it at the root

@ropatil010 @stbenjam it looks like that openshift-tests combines the output when it invokes extension binaries: https://github.com/openshift/origin/blob/9df27cdfd2146a887c88ea090cba75f2b0657389/pkg/test/extensions/binary.go#L943

Overall the fix introduced in openshift/cluster-authentication-operator#859 is quite fragile as it depends on import order, however if we really need to combine the output when openshift-tests invokes extension binaries, this might be unavoidable.

Any insights on this? Overall I'd also rather solve the root issue rather than fixing it at each repo that might have similar issues, but I'm not sure if this is realistic.

@stbenjam

Copy link
Copy Markdown
Member

The code you linked from origin shows we combine stdout and stderr, I think that is probably not the right thing for us to do but the existing JSON boundary detection should prevent scooping up logs.

If you go back to the original PR that was reverted, it was only failing in metal jobs, so that's your clue it something specific to those.

Metal IPv6 jobs are disconnected and query extensions for image mirroring information, and that seems to be where the problem is.

I think openshift/origin#31009 fixes the problem

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ok-to-test Indicates a non-member PR verified by an org member that is safe to test.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants