Enhance error handling in DelegatingMultiSinkOutputCommitter #1606

Open
psainics wants to merge 1 commit into develop from fix/multisink-commit-error-logging

@psainics commented Apr 9, 2026

This pull request improves error handling and logging in the DelegatingMultiSinkOutputCommitter class for BigQuery multi-sink operations. The main focus is to provide more informative log messages when task or job commits fail, making it easier to diagnose issues.

Enhanced error handling and logging:

  • Added a Logger instance (LOG) using SLF4J to the DelegatingMultiSinkOutputCommitter class for logging warnings.
  • Wrapped commitTask and commitJob calls in try-catch blocks to log warnings with detailed failure reasons when an IOException occurs, then re-throw the exception.
  • Introduced a private getFailureReason helper method to extract the root cause message from an IOException for clearer log output.

…g logging for task and job commit failures
@psainics self-assigned this Apr 9, 2026
@psainics added the "build" (Trigger unit test build) label Apr 9, 2026
@psainics requested a review from itsankit-google April 9, 2026 05:53
    } catch (IOException e) {
      LOG.warn("BigQuery multi-sink table '{}' failed during job commit. Reason: {}",
        tableName, getFailureReason(e), e);
      throw e;

Logging and re-throwing an error is considered an anti-pattern.

Isn't this error propagated upwards and logged somewhere in the pipeline run?
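
One common alternative to log-and-rethrow is to attach the table name to the exception itself, so that whichever layer eventually logs the failure reports it once, with full context. The following is a minimal sketch of that idea, not the PR's code: `SinkCommitter` and `commitAll` are hypothetical stand-ins for the per-table committers and the commit loop.

```java
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;

class WrapWithContextSketch {

  /** Hypothetical stand-in for a per-table committer; not a Hadoop type. */
  interface SinkCommitter {
    void commit() throws IOException;
  }

  /** Commits each sink; on failure, rethrows with the table name attached instead of logging. */
  static void commitAll(Map<String, SinkCommitter> committers) throws IOException {
    for (Map.Entry<String, SinkCommitter> entry : committers.entrySet()) {
      try {
        entry.getValue().commit();
      } catch (IOException e) {
        // No log call here: the original exception is kept as the cause,
        // so the caller that logs it sees both the table name and the root failure.
        throw new IOException(
            "BigQuery multi-sink table '" + entry.getKey() + "' failed during commit", e);
      }
    }
  }

  public static void main(String[] args) {
    Map<String, SinkCommitter> committers = new LinkedHashMap<>();
    committers.put("orders", () -> { });
    committers.put("users", () -> { throw new IOException("quota exceeded"); });
    try {
      commitAll(committers);
    } catch (IOException e) {
      System.out.println(e.getMessage() + " <- " + e.getCause().getMessage());
    }
  }
}
```

Whether this fits depends on the answer to the question above: it only helps if the propagated exception is in fact logged further up in the pipeline run.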

@psainics closed this Apr 25, 2026
@vikasrathee-cs reopened this May 4, 2026
@itsankit-google

/gemini review


@gemini-code-assist (bot) left a comment


Code Review

This pull request introduces logging and enhanced error handling to the DelegatingMultiSinkOutputCommitter by adding a logger and a helper method to extract root cause failure messages. The feedback suggests significant improvements to the error handling strategy in commitTask and commitJob; specifically, instead of failing fast on the first error, the code should attempt to commit all sinks and collect exceptions using addSuppressed to avoid inconsistent states. Additionally, the reviewer recommended hardening the getFailureReason method to handle circular references and ensure a non-null return value by falling back to the exception's string representation.

Comment on lines 104 to 113
      for (String tableName : committerMap.keySet()) {
    -   configureContext(taskAttemptContext, tableName);
    -
    -   committerMap.get(tableName).commitTask(taskAttemptContext);
    +   try {
    +     configureContext(taskAttemptContext, tableName);
    +     committerMap.get(tableName).commitTask(taskAttemptContext);
    +   } catch (IOException e) {
    +     LOG.warn("BigQuery multi-sink table '{}' failed during task commit. Reason: {}",
    +       tableName, getFailureReason(e), e);
    +     throw e;
    +   }
      }

high

In a multi-sink scenario, failing fast on the first error prevents other sinks from attempting to commit their data, which can lead to partial and inconsistent job states.

It is better to attempt the commit for all sinks and collect any exceptions using addSuppressed, similar to the implementation in abortTask and abortJob. Additionally, since a commit failure is a fatal event for the task, LOG.error is more appropriate than LOG.warn.

    IOException ioe = null;
    for (String tableName : committerMap.keySet()) {
      try {
        configureContext(taskAttemptContext, tableName);
        committerMap.get(tableName).commitTask(taskAttemptContext);
      } catch (IOException e) {
        LOG.error("BigQuery multi-sink table '{}' failed during task commit. Reason: {}",
                  tableName, getFailureReason(e), e);
        if (ioe == null) {
          ioe = e;
        } else {
          ioe.addSuppressed(e);
        }
      }
    }
    if (ioe != null) {
      throw ioe;
    }
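
The collect-then-throw pattern suggested above can be exercised in isolation. The following is a minimal sketch under stated assumptions: `SinkCommitter` is a hypothetical interface standing in for the per-table committers, and logging is left out so the example has no dependencies. It shows that every sink is attempted and that later failures are attached to the first one through `Throwable.addSuppressed`.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class CollectFailuresSketch {

  /** Hypothetical stand-in for a per-table committer; not a Hadoop type. */
  interface SinkCommitter {
    void commit() throws IOException;
  }

  /** Attempts every sink, then throws the first failure with the rest attached as suppressed. */
  static void commitAll(Map<String, SinkCommitter> committers) throws IOException {
    IOException first = null;
    for (Map.Entry<String, SinkCommitter> entry : committers.entrySet()) {
      try {
        entry.getValue().commit();
      } catch (IOException e) {
        if (first == null) {
          first = e;                // the first failure becomes the primary exception
        } else {
          first.addSuppressed(e);   // later failures ride along on the primary one
        }
      }
    }
    if (first != null) {
      throw first;
    }
  }

  public static void main(String[] args) {
    List<String> attempted = new ArrayList<>();
    Map<String, SinkCommitter> committers = new LinkedHashMap<>();
    committers.put("a", () -> { attempted.add("a"); throw new IOException("a failed"); });
    committers.put("b", () -> { attempted.add("b"); });
    committers.put("c", () -> { attempted.add("c"); throw new IOException("c failed"); });
    try {
      commitAll(committers);
    } catch (IOException e) {
      System.out.println("attempted: " + attempted);
      System.out.println("primary: " + e.getMessage());
      for (Throwable suppressed : e.getSuppressed()) {
        System.out.println("suppressed: " + suppressed.getMessage());
      }
    }
  }
}
```

In a printed stack trace the suppressed failures appear under the primary exception, each prefixed with `Suppressed:`, so one log entry can cover every failed table.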

Comment on lines 118 to 127
      for (String tableName : committerMap.keySet()) {
    -   configureContext(jobContext, tableName);
    -
    -   committerMap.get(tableName).commitJob(jobContext);
    +   try {
    +     configureContext(jobContext, tableName);
    +     committerMap.get(tableName).commitJob(jobContext);
    +   } catch (IOException e) {
    +     LOG.warn("BigQuery multi-sink table '{}' failed during job commit. Reason: {}",
    +       tableName, getFailureReason(e), e);
    +     throw e;
    +   }
      }

high

Similar to commitTask, commitJob should attempt to commit all sinks rather than failing fast. This ensures that as much data as possible is finalized and provides a complete picture of which sinks failed. Using LOG.error is also recommended here as job commit failures are critical.

    IOException ioe = null;
    for (String tableName : committerMap.keySet()) {
      try {
        configureContext(jobContext, tableName);
        committerMap.get(tableName).commitJob(jobContext);
      } catch (IOException e) {
        LOG.error("BigQuery multi-sink table '{}' failed during job commit. Reason: {}",
                  tableName, getFailureReason(e), e);
        if (ioe == null) {
          ioe = e;
        } else {
          ioe.addSuppressed(e);
        }
      }
    }
    if (ioe != null) {
      throw ioe;
    }

Comment on lines +185 to +191
    private String getFailureReason(IOException exception) {
      Throwable rootCause = exception;
      while (rootCause.getCause() != null) {
        rootCause = rootCause.getCause();
      }
      return rootCause.getMessage() == null ? exception.getMessage() : rootCause.getMessage();
    }

medium

The current implementation of getFailureReason might return null if both the root cause and the wrapper exception have no message. It also lacks a check for circular references in the exception chain.

A more robust approach is to fall back to rootCause.toString() when the message is null, which will at least provide the exception class name.

Suggested change
    - private String getFailureReason(IOException exception) {
    -   Throwable rootCause = exception;
    -   while (rootCause.getCause() != null) {
    -     rootCause = rootCause.getCause();
    -   }
    -   return rootCause.getMessage() == null ? exception.getMessage() : rootCause.getMessage();
    - }
    + private String getFailureReason(IOException exception) {
    +   Throwable rootCause = exception;
    +   while (rootCause.getCause() != null && rootCause.getCause() != rootCause) {
    +     rootCause = rootCause.getCause();
    +   }
    +   return rootCause.getMessage() != null ? rootCause.getMessage() : rootCause.toString();
    + }
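
The `rootCause.getCause() != rootCause` check in the suggestion guards only against an exception that reports itself as its own cause. A longer cycle (A caused by B, B caused by A) would still loop forever. The following is a minimal sketch of a stricter variant that tracks visited exceptions by identity. The class and method names are hypothetical and this is not part of the PR.

```java
import java.io.IOException;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

class RootCauseSketch {

  /** Returns the deepest cause's message, or its toString() when the message is null. */
  static String failureReason(Throwable error) {
    // Identity-based set: two distinct exceptions are never treated as the same one.
    Set<Throwable> seen = Collections.newSetFromMap(new IdentityHashMap<>());
    Throwable root = error;
    seen.add(root);
    // Stop at the end of the chain or as soon as a cause has already been visited.
    while (root.getCause() != null && seen.add(root.getCause())) {
      root = root.getCause();
    }
    return root.getMessage() != null ? root.getMessage() : root.toString();
  }

  public static void main(String[] args) {
    // Ordinary chain: the innermost message is reported.
    System.out.println(failureReason(
        new IOException("outer", new IllegalStateException("inner"))));

    // No message anywhere: falls back to the exception class name.
    System.out.println(failureReason(new IOException()));

    // Two-element cycle: a -> b -> a. The walk stops instead of looping.
    RuntimeException a = new RuntimeException("a");
    RuntimeException b = new RuntimeException("b", a);
    a.initCause(b);
    System.out.println(failureReason(a));
  }
}
```

When a cycle is present there is no true root, so the sketch returns the last exception reached before the chain repeats.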
