Commit ce5ad3e

Merge pull request #261383 from kushagraThapar/update_azure_cosmos_java_capture_diagnostics
Update azure cosmos java capture diagnostics
2 parents 36f3358 + 085a50a commit ce5ad3e

1 file changed

Lines changed: 111 additions & 12 deletions

articles/cosmos-db/nosql/troubleshoot-java-sdk-v4.md

@@ -21,7 +21,7 @@ ms.custom: devx-track-java, ignite-2022, devx-track-extended-java
2121
>
2222
2323
> [!IMPORTANT]
24-
> This article covers troubleshooting for Azure Cosmos DB Java SDK v4 only. Please see the Azure Cosmos DB Java SDK v4 [Release notes](sdk-java-v4.md), [Maven repository](https://mvnrepository.com/artifact/com.azure/azure-cosmos), and [performance tips](performance-tips-java-sdk-v4.md) for more information. If you are currently using an older version than v4, see the [Migrate to Azure Cosmos DB Java SDK v4](migrate-java-v4-sdk.md) guide for help upgrading to v4.
24+
> This article covers troubleshooting for Azure Cosmos DB Java SDK v4 only. Please see the Azure Cosmos DB Java SDK v4 [Release notes](sdk-java-v4.md), [Maven repository](https://mvnrepository.com/artifact/com.azure/azure-cosmos), and [performance tips](performance-tips-java-sdk-v4.md) for more information. If you're currently using an older version than v4, see the [Migrate to Azure Cosmos DB Java SDK v4](migrate-java-v4-sdk.md) guide for help upgrading to v4.
2525
>
2626
2727
This article covers common issues, workarounds, diagnostic steps, and tools when you use Azure Cosmos DB Java SDK v4 with Azure Cosmos DB for NoSQL accounts.
@@ -30,20 +30,20 @@ Azure Cosmos DB Java SDK v4 provides client-side logical representation to acces
3030
Start with this list:
3131

3232
* Take a look at the [Common issues and workarounds] section in this article.
33-
* Look at the Java SDK in the Azure Cosmos DB central repo, which is available [open source on GitHub](https://github.com/Azure/azure-sdk-for-java/tree/master/sdk/cosmos/azure-cosmos). It has an [issues section](https://github.com/Azure/azure-sdk-for-java/issues) that's actively monitored. Check to see if any similar issue with a workaround is already filed. One helpful tip is to filter issues by the *cosmos:v4-item* tag.
33+
* Look at the Java SDK in the Azure Cosmos DB central repo, which is available [open source on GitHub](https://github.com/Azure/azure-sdk-for-java/tree/master/sdk/cosmos/azure-cosmos). It has an [issues section](https://github.com/Azure/azure-sdk-for-java/issues) that's actively monitored. Check to see if any similar issue with a workaround is already filed. One helpful tip is to filter issues by the `cosmos:v4-item` tag.
3434
* Review the [performance tips](performance-tips-java-sdk-v4.md) for Azure Cosmos DB Java SDK v4, and follow the suggested practices.
35-
* Read the rest of this article, if you didn't find a solution. Then file a [GitHub issue](https://github.com/Azure/azure-sdk-for-java/issues). If there is an option to add tags to your GitHub issue, add a *cosmos:v4-item* tag.
35+
* If you didn't find a solution, read the rest of this article. Then file a [GitHub issue](https://github.com/Azure/azure-sdk-for-java/issues). If there's an option to add tags to your GitHub issue, add a `cosmos:v4-item` tag.
3636

3737
## Capture the diagnostics
3838

3939
Database, container, item, and query responses in the Java V4 SDK have a Diagnostics property. This property records all the information related to the single request, including if there were retries or any transient failures.
4040

41-
The Diagnostics are returned as a string. The string changes with each version as it is improved to better troubleshooting different scenarios. With each version of the SDK, the string will have breaking changes to the formatting. Do not parse the string to avoid breaking changes.
41+
The Diagnostics are returned as a string. The string changes with each version as it's improved to better troubleshoot different scenarios. With each version of the SDK, the format of the string might change in breaking ways. To avoid breakage, don't parse the string.
4242

4343
The following code sample shows how to read diagnostic logs using the Java V4 SDK:
4444

4545
> [!IMPORTANT]
46-
> We recommend validating the minimum recommended version of the Java V4 SDK and ensure you are using this version or higher. You can check recommended version [here](./sdk-java-v4.md#recommended-version).
46+
> We recommend validating the minimum recommended version of the Java V4 SDK and ensure you're using this version or higher. You can check recommended version [here](./sdk-java-v4.md#recommended-version).
4747
4848
# [Sync](#tab/sync)
4949

@@ -197,6 +197,105 @@ itemResponseMono.onErrorResume(throwable -> {
197197
```
198198
---
199199

200+
## Logging the diagnostics
201+
Java V4 SDK versions v4.43.0 and above support automatic logging of Cosmos Diagnostics for all requests or errors that meet certain criteria. Application developers can define thresholds for latency (separately for point operations such as create, read, replace, upsert, and patch, and for non-point operations such as query, change feed, bulk, and batch), request charge, and payload size. If a request exceeds these defined thresholds, the Cosmos diagnostics for that request are emitted automatically.
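To make the threshold idea concrete, here's a minimal JDK-only sketch of the decision the paragraph describes. It's a conceptual model, not the SDK's implementation: the class, field, and method names are hypothetical, the limit values are arbitrary, and it assumes that exceeding any single threshold is enough to emit diagnostics, which the text doesn't spell out.

```java
import java.time.Duration;

// Conceptual model only: this is NOT the SDK's implementation, and every
// name in this class is hypothetical.
final class ThresholdSketch {

    /** Hypothetical holder for the four limits described in the text. */
    static final class Thresholds {
        final Duration pointLatency;
        final Duration nonPointLatency;
        final float requestCharge;
        final int payloadSize;

        Thresholds(Duration pointLatency, Duration nonPointLatency,
                   float requestCharge, int payloadSize) {
            this.pointLatency = pointLatency;
            this.nonPointLatency = nonPointLatency;
            this.requestCharge = requestCharge;
            this.payloadSize = payloadSize;
        }
    }

    /** Assumption: exceeding any single limit is enough to emit diagnostics. */
    static boolean exceedsAny(boolean pointOperation, Duration latency,
                              float requestCharge, int payloadSize, Thresholds t) {
        // Point and non-point operations are compared against different latency limits.
        Duration latencyLimit = pointOperation ? t.pointLatency : t.nonPointLatency;
        return latency.compareTo(latencyLimit) > 0
            || requestCharge > t.requestCharge
            || payloadSize > t.payloadSize;
    }

    public static void main(String[] args) {
        // Arbitrary example limits, chosen only for illustration.
        Thresholds t = new Thresholds(Duration.ofMillis(100), Duration.ofMillis(500), 100f, 1024);
        System.out.println("fast point read: " + exceedsAny(true, Duration.ofMillis(5), 1f, 200, t));
        System.out.println("slow query:      " + exceedsAny(false, Duration.ofMillis(900), 12f, 200, t));
    }
}
```

In this model, a slow query is flagged even though its request charge and payload size are small, because its latency alone crosses the non-point limit.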
202+
203+
By default, the Java v4 SDK logs these diagnostics automatically in a specific format. However, you can change this by implementing the `CosmosDiagnosticsHandler` interface and providing your own custom diagnostics handler.
204+
205+
You can then set these `CosmosDiagnosticsThresholds` and `CosmosDiagnosticsHandler` objects on a `CosmosClientTelemetryConfig` object, which should be passed into `CosmosClientBuilder` when you create a sync or async client.
206+
207+
NOTE: These diagnostics thresholds are applied across different types of diagnostics including logging, tracing and client telemetry.
208+
209+
The following code samples show how to define diagnostics thresholds and a custom diagnostics handler, and how to use them through the client telemetry config:
210+
211+
# [Sync](#tab/sync)
212+
213+
#### Defining custom Diagnostics Thresholds
214+
```Java
// Create diagnostics threshold
CosmosDiagnosticsThresholds cosmosDiagnosticsThresholds = new CosmosDiagnosticsThresholds();
// For demo purposes, we reduce the thresholds so that all diagnostics are logged
// NOTE: Do not use the same thresholds for production
cosmosDiagnosticsThresholds.setPayloadSizeThreshold(10);
cosmosDiagnosticsThresholds.setPointOperationLatencyThreshold(Duration.ofMillis(10));
cosmosDiagnosticsThresholds.setNonPointOperationLatencyThreshold(Duration.ofMillis(10));
cosmosDiagnosticsThresholds.setRequestChargeThreshold(5f);
```
224+
225+
#### Defining custom Diagnostics Handler
226+
```Java
// By default, DEFAULT_LOGGING_HANDLER can be used
CosmosDiagnosticsHandler cosmosDiagnosticsHandler = CosmosDiagnosticsHandler.DEFAULT_LOGGING_HANDLER;

// App developers can also define their own diagnostics handler
// (`logger` is assumed to be an SLF4J-style logger defined elsewhere in the app)
cosmosDiagnosticsHandler = new CosmosDiagnosticsHandler() {
    @Override
    public void handleDiagnostics(CosmosDiagnosticsContext diagnosticsContext, Context traceContext) {
        logger.info("This is custom diagnostics handler: {}", diagnosticsContext.toJson());
    }
};
```
238+
239+
#### Defining CosmosClientTelemetryConfig
240+
```Java
// Create Client Telemetry Config
CosmosClientTelemetryConfig cosmosClientTelemetryConfig =
    new CosmosClientTelemetryConfig();
cosmosClientTelemetryConfig.diagnosticsHandler(cosmosDiagnosticsHandler);
cosmosClientTelemetryConfig.diagnosticsThresholds(cosmosDiagnosticsThresholds);

// Create sync client
CosmosClient client = new CosmosClientBuilder()
    .endpoint(AccountSettings.HOST)
    .key(AccountSettings.MASTER_KEY)
    .clientTelemetryConfig(cosmosClientTelemetryConfig)
    .buildClient();
```
254+
255+
# [Async](#tab/async)
256+
257+
#### Defining custom Diagnostics Thresholds
258+
```Java
// Create diagnostics threshold
CosmosDiagnosticsThresholds cosmosDiagnosticsThresholds = new CosmosDiagnosticsThresholds();
// For demo purposes, we reduce the thresholds so that all diagnostics are logged
// NOTE: Do not use the same thresholds for production
cosmosDiagnosticsThresholds.setPayloadSizeThreshold(10);
cosmosDiagnosticsThresholds.setPointOperationLatencyThreshold(Duration.ofMillis(10));
cosmosDiagnosticsThresholds.setNonPointOperationLatencyThreshold(Duration.ofMillis(10));
cosmosDiagnosticsThresholds.setRequestChargeThreshold(5f);
```
268+
269+
#### Defining custom Diagnostics Handler
270+
```Java
// By default, DEFAULT_LOGGING_HANDLER can be used
CosmosDiagnosticsHandler cosmosDiagnosticsHandler = CosmosDiagnosticsHandler.DEFAULT_LOGGING_HANDLER;

// App developers can also define their own diagnostics handler
// (`logger` is assumed to be an SLF4J-style logger defined elsewhere in the app)
cosmosDiagnosticsHandler = new CosmosDiagnosticsHandler() {
    @Override
    public void handleDiagnostics(CosmosDiagnosticsContext diagnosticsContext, Context traceContext) {
        logger.info("This is custom diagnostics handler: {}", diagnosticsContext.toJson());
    }
};
```
282+
283+
#### Defining CosmosClientTelemetryConfig
284+
```Java
// Create Client Telemetry Config
CosmosClientTelemetryConfig cosmosClientTelemetryConfig =
    new CosmosClientTelemetryConfig();
cosmosClientTelemetryConfig.diagnosticsHandler(cosmosDiagnosticsHandler);
cosmosClientTelemetryConfig.diagnosticsThresholds(cosmosDiagnosticsThresholds);

// Create async client
CosmosAsyncClient client = new CosmosClientBuilder()
    .endpoint(AccountSettings.HOST)
    .key(AccountSettings.MASTER_KEY)
    .clientTelemetryConfig(cosmosClientTelemetryConfig)
    .buildAsyncClient();
```
298+
200299
## Retry design <a id="retry-logics"></a><a id="retry-design"></a><a id="error-codes"></a>
201300
See our guide to [designing resilient applications with Azure Cosmos DB SDKs](conceptual-resilient-sdk-applications.md) for guidance on how to design resilient applications and learn which are the retry semantics of the SDK.
202301

@@ -208,7 +307,7 @@ See our guide to [designing resilient applications with Azure Cosmos DB SDKs](co
208307
For best performance:
209308
* Make sure the app is running on the same region as your Azure Cosmos DB account.
210309
* Check the CPU usage on the host where the app is running. If CPU usage is 50 percent or more, run your app on a host with a higher configuration. Or you can distribute the load on more machines.
211-
* If you are running your application on Azure Kubernetes Service, you can [use Azure Monitor to monitor CPU utilization](../../azure-monitor/containers/container-insights-analyze.md).
310+
* If you're running your application on Azure Kubernetes Service, you can [use Azure Monitor to monitor CPU utilization](../../azure-monitor/containers/container-insights-analyze.md).
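The CPU check in the list above can be roughly approximated from inside the JVM. The following sketch uses only the standard `java.lang.management` API. It relies on the system load average per core as a stand-in for CPU percentage, which is an assumption of this example and only a coarse proxy; the `CpuCheck` class and its `isBusy` helper are hypothetical names, and the load average isn't available on every platform.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

// Rough host-load probe. Load average per core is used here as a coarse
// proxy for CPU usage; it is not the same metric as CPU percentage.
final class CpuCheck {

    /** True when load per core is at or above the given fraction (for example 0.5). */
    static boolean isBusy(double loadAverage, int cores, double threshold) {
        if (loadAverage < 0 || cores <= 0) {
            return false; // load average not available on this platform
        }
        return loadAverage / cores >= threshold;
    }

    public static void main(String[] args) {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        double load = os.getSystemLoadAverage(); // negative when unavailable
        int cores = os.getAvailableProcessors();
        System.out.printf("load average=%.2f cores=%d busy=%b%n",
            load, cores, isBusy(load, cores, 0.5));
    }
}
```

For a sustained view, a monitoring tool such as Azure Monitor remains the better source; this probe only gives a point-in-time hint.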
212311

213312
#### Connection throttling
214313
Connection throttling can happen because of either a [connection limit on a host machine] or [Azure SNAT (PAT) port exhaustion].
@@ -234,18 +333,18 @@ If your app is deployed on Azure Virtual Machines without a public IP address, b
234333
* Assign a public IP to your Azure VM.
235334

236335
##### <a name="cant-connect"></a>Can't reach the Service - firewall
237-
``ConnectTimeoutException`` indicates that the SDK cannot reach the service.
336+
``ConnectTimeoutException`` indicates that the SDK can't reach the service.
238337
You may get a failure similar to the following when using the direct mode:
239338
```
240339
GoneException{error=null, resourceAddress='https://cdb-ms-prod-westus-fd4.documents.azure.com:14940/apps/e41242a5-2d71-5acb-2e00-5e5f744b12de/services/d8aa21a5-340b-21d4-b1a2-4a5333e7ed8a/partitions/ed028254-b613-4c2a-bf3c-14bd5eb64500/replicas/131298754052060051p//', statusCode=410, message=Message: The requested resource is no longer available at the server., getCauseInfo=[class: class io.netty.channel.ConnectTimeoutException, message: connection timed out: cdb-ms-prod-westus-fd4.documents.azure.com/101.13.12.5:14940]
241340
```
242341

243-
If you have a firewall running on your app machine, open port range 10,000 to 20,000 which are used by the direct mode.
342+
If you have a firewall running on your app machine, open port range 10,000 to 20,000, which are used by the direct mode.
244343
Also follow the [Connection limit on a host machine](#connection-limit-on-host).
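One way to check whether a firewall blocks a direct-mode port is to attempt a plain TCP connection with a timeout. The sketch below uses only the JDK; `PortProbe` and `canConnect` are hypothetical names, and the host and port are expected as command-line arguments (for example, the host and port reported in the `ConnectTimeoutException` message). A successful TCP connection only shows that the port is reachable, not that the service is healthy.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Minimal TCP reachability probe (hypothetical helper, JDK only).
final class PortProbe {

    /** Returns true if a TCP connection to host:port succeeds within the timeout. */
    static boolean canConnect(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Covers timeouts, refused connections, and unresolved hosts.
            return false;
        }
    }

    public static void main(String[] args) {
        if (args.length < 2) {
            System.out.println("usage: PortProbe <host> <port>");
            return;
        }
        String host = args[0];
        int port = Integer.parseInt(args[1]);
        System.out.println(host + ":" + port + " reachable=" + canConnect(host, port, 5000));
    }
}
```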
245344

246345
#### UnknownHostException
247346

248-
UnknownHostException means that the Java framework cannot resolve the DNS entry for the Azure Cosmos DB endpoint in the affected machine. You should verify that the machine can resolve the DNS entry or if you have any custom DNS resolution software (such as VPN or Proxy, or a custom solution), make sure it contains the right configuration for the DNS endpoint that the error is claiming cannot be resolved. If the error is constant, you can verify the machine's DNS resolution through a `curl` command to the endpoint described in the error.
347+
UnknownHostException means that the Java framework can't resolve the DNS entry for the Azure Cosmos DB endpoint in the affected machine. You should verify that the machine can resolve the DNS entry or if you have any custom DNS resolution software (such as VPN or Proxy, or a custom solution), make sure it contains the right configuration for the DNS endpoint that the error is claiming can't be resolved. If the error is constant, you can verify the machine's DNS resolution through a `curl` command to the endpoint described in the error.
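As an alternative to `curl`, the same DNS check can be made from Java, which exercises the resolver path that the JVM itself uses. This is a minimal sketch with hypothetical names (`DnsCheck`, `resolves`); pass the endpoint host name from the error message as the argument.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Minimal DNS resolution check using the JVM's own resolver (hypothetical helper).
final class DnsCheck {

    /** Returns true if the JVM can resolve the host name to at least one address. */
    static boolean resolves(String host) {
        try {
            InetAddress.getAllByName(host);
            return true;
        } catch (UnknownHostException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        if (args.length < 1) {
            System.out.println("usage: DnsCheck <host>");
            return;
        }
        System.out.println(args[0] + " resolves=" + resolves(args[0]));
    }
}
```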
249348

250349
#### HTTP proxy
251350

@@ -258,7 +357,7 @@ The SDK uses the [Netty](https://netty.io/) IO library to communicate with Azure
258357

259358
The Netty IO threads are meant to be used only for non-blocking Netty IO work. The SDK returns the API invocation result on one of the Netty IO threads to the app's code. If the app performs a long-lasting operation after it receives results on the Netty thread, the SDK might not have enough IO threads to perform its internal IO work. Such app coding might result in low throughput, high latency, and `io.netty.handler.timeout.ReadTimeoutException` failures. The workaround is to switch the thread when you know the operation takes time.
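The idea of switching threads can be illustrated without the SDK. The following JDK-only sketch is an analogy, not the SDK's or Reactor's API: it uses `CompletableFuture.thenApplyAsync` with a dedicated executor so that the follow-up work runs on a worker thread instead of the thread that delivered the result. In a Reactor chain, the corresponding operator is `publishOn` with a scheduler of your choice. The class, method, and thread names are hypothetical.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// JDK-only analogy for moving slow work off the thread that delivered a result.
final class OffloadSketch {

    /** Runs the follow-up step on workerPool rather than on the completing thread. */
    static CompletableFuture<String> processOffIoThread(
            CompletableFuture<String> ioResult, ExecutorService workerPool) {
        return ioResult.thenApplyAsync(
            value -> value + " processed on " + Thread.currentThread().getName(),
            workerPool);
    }

    public static void main(String[] args) {
        // "io-thread" stands in for a Netty IO thread in this analogy.
        ExecutorService ioPool = Executors.newSingleThreadExecutor(r -> new Thread(r, "io-thread"));
        ExecutorService workerPool = Executors.newSingleThreadExecutor(r -> new Thread(r, "worker-thread"));
        try {
            CompletableFuture<String> ioResult = CompletableFuture.supplyAsync(() -> "item", ioPool);
            System.out.println(processOffIoThread(ioResult, workerPool).join());
        } finally {
            ioPool.shutdown();
            workerPool.shutdown();
        }
    }
}
```

Because the follow-up step is bound to `workerPool`, the IO thread is free again as soon as it hands over the result.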
260359

261-
For example, take a look at the following code snippet which adds items to a container (look [here](quickstart-java.md) for guidance on setting up the database and container.) You might perform long-lasting work that takes more than a few milliseconds on the Netty thread. If so, you eventually can get into a state where no Netty IO thread is present to process IO work. As a result, you get a ReadTimeoutException failure.
360+
For example, take a look at the following code snippet, which adds items to a container (look [here](quickstart-java.md) for guidance on setting up the database and container.) You might perform long-lasting work that takes more than a few milliseconds on the Netty thread. If so, you eventually can get into a state where no Netty IO thread is present to process IO work. As a result, you get a ReadTimeoutException failure.
262361

263362
### <a id="java4-readtimeout"></a>Java SDK V4 (Maven com.azure::azure-cosmos) Async API
264363

@@ -287,7 +386,7 @@ This failure is a server-side failure. It indicates that you consumed your provi
287386

288387
### Error handling from Java SDK Reactive Chain
289388

290-
Error handling from Azure Cosmos DB Java SDK is important when it comes to client's application logic. There are different error handling mechanism provided by [reactor-core framework](https://projectreactor.io/docs/core/release/reference/#error.handling) which can be used in different scenarios. We recommend customers to understand these error handling operators in detail and use the ones which fit their retry logic scenarios the best.
389+
Error handling from Azure Cosmos DB Java SDK is important when it comes to the client's application logic. The [reactor-core framework](https://projectreactor.io/docs/core/release/reference/#error.handling) provides different error handling mechanisms that can be used in different scenarios. We recommend that customers understand these error handling operators in detail and use the ones that best fit their retry logic scenarios.
291390

292391
> [!IMPORTANT]
293392
> We do not recommend using [`onErrorContinue()`](https://projectreactor.io/docs/core/release/api/reactor/core/publisher/Flux.html#onErrorContinue-java.util.function.BiConsumer-) operator, as it is not supported in all scenarios.
@@ -299,7 +398,7 @@ The Azure Cosmos DB Emulator HTTPS certificate is self-signed. For the SDK to wo
299398

300399
### Dependency Conflict Issues
301400

302-
The Azure Cosmos DB Java SDK pulls in a number of dependencies; generally speaking, if your project dependency tree includes an older version of an artifact that Azure Cosmos DB Java SDK depends on, this may result in unexpected errors being generated when you run your application. If you are debugging why your application unexpectedly throws an exception, it is a good idea to double-check that your dependency tree is not accidentally pulling in an older version of one or more of the Azure Cosmos DB Java SDK dependencies.
401+
The Azure Cosmos DB Java SDK pulls in many dependencies; generally speaking, if your project dependency tree includes an older version of an artifact that Azure Cosmos DB Java SDK depends on, this may result in unexpected errors being generated when you run your application. If you're debugging why your application unexpectedly throws an exception, it's a good idea to double-check that your dependency tree is not accidentally pulling in an older version of one or more of the Azure Cosmos DB Java SDK dependencies.
303402

304403
The workaround for such an issue is to identify which of your project dependencies brings in the old version and exclude the transitive dependency on that older version, and allow Azure Cosmos DB Java SDK to bring in the newer version.
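When it's unclear which version of a dependency actually ends up on the classpath, it can help to ask the JVM where a class was loaded from. The sketch below is a generic JDK technique rather than anything specific to the SDK; `JarLocator` and `locate` are hypothetical names, and the class names to inspect are passed as arguments (for example, a class from a library that the SDK depends on, such as Netty).

```java
import java.security.CodeSource;

// Prints the JAR or directory a class was loaded from (hypothetical helper).
final class JarLocator {

    /** Returns the code source location for the class, or a placeholder for JDK classes. */
    static String locate(Class<?> type) {
        CodeSource source = type.getProtectionDomain().getCodeSource();
        if (source == null || source.getLocation() == null) {
            return "(JDK or unknown source)";
        }
        return source.getLocation().toString();
    }

    public static void main(String[] args) throws ClassNotFoundException {
        for (String className : args) {
            System.out.println(className + " -> " + locate(Class.forName(className)));
        }
    }
}
```

The printed location usually includes the artifact file name, so an unexpectedly old version number there points at the transitive dependency to exclude.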
305404
