OpenSearchGenericManager: Unsupported /_cluster/stats call causes spurious ERROR logs on AWS OpenSearch Serverless (AOSS) #27599

@bwright86

Problem

When OpenMetadata is configured against AWS OpenSearch Serverless (AOSS), several cluster-level API calls fail with 404. AOSS is a managed serverless service with no concept of cluster nodes, shards, or JVM heap, so it does not expose these APIs. Three methods in OpenSearchGenericManager call endpoints that AOSS does not implement:

| Method | Endpoint | Used for |
| --- | --- | --- |
| clusterStats() | /_cluster/stats | Reindexing auto-tune |
| nodesStats() | /_nodes/stats | JVM/CPU metrics for auto-tune |
| getSearchHealthStatus() | /_cluster/health | Service health status panel |

The following ERROR is logged repeatedly at runtime:

ERROR [o.o.s.s.o.OpenSearchGenericManager] - Failed to fetch cluster stats
os.org.opensearch.client.transport.TransportException: Request failed with status code '404'

The error is misleading: the cluster is healthy and authentication is working. The 404 is structural — AOSS will never implement these endpoints — not a transient failure.

Additional Impact: Search Service Reported as Unhealthy

getSearchHealthStatus() calls /_cluster/health and returns HEALTHY_STATUS or UNHEALTHY_STATUS via ServicesStatusJobHandler. This result is surfaced in the OpenMetadata UI under Settings → Health Check, and is also available via:

GET /api/v1/system/status

Because /_cluster/health returns 404 on AOSS, the UI reports the search backend as unhealthy even when AOSS is fully functional and serving requests. This is a false negative that will cause operators to incorrectly believe their search backend has a problem.

Detection

AOSS endpoints always follow the pattern <collection-id>.<region>.aoss.amazonaws.com. This makes detection unambiguous without any new configuration:

private boolean isAwsOpenSearchServerless(String host) {
    return host != null && host.endsWith(".aoss.amazonaws.com");
}

Alternatively, the existing SEARCH_AWS_SERVICE_NAME=aoss environment variable (already set in the Helm chart when using AOSS) can serve as a secondary signal.
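
The two signals could be combined in a single helper. The following is a minimal sketch, not existing OpenMetadata code: the `AossDetector` class and its method names are hypothetical, and it assumes the configured host might carry a scheme, port, or trailing path, so it normalizes the value before matching the suffix.

```java
import java.util.Locale;

/** Hypothetical helper (not part of OpenMetadata) combining the two AOSS signals. */
final class AossDetector {
    private static final String AOSS_SUFFIX = ".aoss.amazonaws.com";
    private static final String AOSS_SERVICE_NAME = "aoss";

    private AossDetector() {}

    /** Primary signal: the hostname ends with .aoss.amazonaws.com. */
    static boolean hostLooksLikeAoss(String host) {
        if (host == null) {
            return false;
        }
        String h = host.trim().toLowerCase(Locale.ROOT);
        // Assumption: the value may include a scheme, path, or port; strip them.
        int scheme = h.indexOf("://");
        if (scheme >= 0) {
            h = h.substring(scheme + 3);
        }
        int slash = h.indexOf('/');
        if (slash >= 0) {
            h = h.substring(0, slash);
        }
        int colon = h.lastIndexOf(':');
        if (colon >= 0) {
            h = h.substring(0, colon);
        }
        return h.endsWith(AOSS_SUFFIX);
    }

    /** Secondary signal: the configured AWS service name equals "aoss". */
    static boolean isAoss(String host, String awsServiceName) {
        boolean serviceNameMatches =
            awsServiceName != null && AOSS_SERVICE_NAME.equalsIgnoreCase(awsServiceName.trim());
        return hostLooksLikeAoss(host) || serviceNameMatches;
    }
}
```

A caller might pass `System.getenv("SEARCH_AWS_SERVICE_NAME")` as the second argument; how OpenMetadata actually surfaces that setting internally is not shown in this issue.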

Proposed Fix

1. Gate cluster/node stats calls behind an AOSS check

Skip calls to unsupported endpoints when running against Serverless, falling back to configured defaults which the auto-tune path already supports:

if (!isAwsOpenSearchServerless(searchConfiguration.getHost())) {
    fetchClusterStats();
} else {
    LOG.debug("Skipping cluster stats fetch — AWS OpenSearch Serverless does not support /_cluster/stats");
}
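
The fallback to configured defaults can be made explicit at the call site. This is a sketch under stated assumptions: `StatsGate` and `fetchOrDefault` are hypothetical names, and the stats fetch is modeled as a plain `Supplier` rather than the real OpenSearch client call.

```java
import java.util.function.Supplier;

/** Hypothetical gate (not OpenMetadata code): skips unsupported stats calls on AOSS. */
final class StatsGate {
    private StatsGate() {}

    /**
     * Returns the configured default without invoking the fetcher when running
     * against Serverless; otherwise returns whatever the fetcher produces.
     */
    static <T> T fetchOrDefault(boolean serverless, Supplier<T> fetcher, T configuredDefault) {
        if (serverless) {
            // The endpoint does not exist on AOSS, so no request is sent at all.
            return configuredDefault;
        }
        return fetcher.get();
    }
}
```

Because the fetcher is never invoked on AOSS, no request is sent and no 404 is produced, so there is nothing to log at ERROR level.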

2. Use GET / as the AOSS health check

/_cluster/health is not available on AOSS. client.info() (GET /) is supported, correctly reflects both connectivity and auth status, and is the appropriate substitute:

private SearchHealthStatus getAossHealthStatus() {
    try {
        client.info(); // GET / — supported by AOSS
        return new SearchHealthStatus(HEALTHY_STATUS);
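    } catch (Exception e) {
        // Any failure here (network, auth, non-2xx response) means the backend is unusable
        LOG.debug("AOSS health probe (GET /) failed", e);
        return new SearchHealthStatus(UNHEALTHY_STATUS);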
    }
}

getSearchHealthStatus() would dispatch to this method when AOSS is detected.
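
The dispatch itself might look like the sketch below. It is illustrative only: `HealthDispatch` is a hypothetical class, the two probes stand in for the real `/_cluster/health` and `GET /` client calls, and the status strings are placeholder values because the actual constants are not shown in this issue. It also simplifies the non-AOSS path to "the call succeeded", whereas a real implementation would presumably inspect the returned cluster status.

```java
import java.util.concurrent.Callable;

/** Hypothetical dispatcher (not OpenMetadata code) choosing the health probe by backend type. */
final class HealthDispatch {
    // Placeholder values; the real HEALTHY_STATUS / UNHEALTHY_STATUS constants may differ.
    static final String HEALTHY_STATUS = "healthy";
    static final String UNHEALTHY_STATUS = "unhealthy";

    private HealthDispatch() {}

    static String check(boolean serverless, Callable<?> clusterHealthProbe, Callable<?> infoProbe) {
        // On AOSS only GET / is available, so use the info probe there.
        Callable<?> probe = serverless ? infoProbe : clusterHealthProbe;
        try {
            probe.call();
            return HEALTHY_STATUS;
        } catch (Exception e) {
            return UNHEALTHY_STATUS;
        }
    }
}
```

With this shape, a 404 from the cluster health endpoint can no longer mark an AOSS backend as unhealthy, because that probe is never selected when Serverless is detected.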

Impact Summary

  • Every OpenMetadata deployment using AOSS sees continuous spurious ERROR log entries, making it harder to identify real errors
  • The Settings → Health Check panel incorrectly shows the search backend as unhealthy
  • Monitoring/alerting systems that treat any ERROR log or the health status API as a signal produce false positives
  • The reindexing auto-tune feature is silently non-functional on AOSS deployments

Environment

  • OpenMetadata: 1.12.4 / 1.12.5
  • Search backend: AWS OpenSearch Serverless (AOSS)
  • Deployment: EKS with IRSA authentication (SEARCH_AWS_IAM_AUTH_ENABLED=true, SEARCH_AWS_SERVICE_NAME=aoss)

Labels

  • enhancement: New feature or request
