# API Resilience Review — Azure-AI-RAG-CSharp-Semantic-Kernel-Functions

| Field | Value |
|---|---|
| Project | Azure-AI-RAG-CSharp-Semantic-Kernel-Functions |
| Review Date | 2026-03-21 |
| Steward | API Resilience Steward |
| Scope | C# ChatAPI project only |
| Critical | 3 |
| Notable | 4 |
| Minor | 2 |
| Info | 2 |
| Total Findings | 11 |

---

## 1. Resilience Architecture Overview

The ChatAPI project is a .NET 8 ASP.NET Core Web API that integrates with three critical Azure external dependencies:

- **Azure OpenAI** — chat completion and text embedding generation, accessed via the Semantic Kernel SDK.
- **Azure AI Search** — vector and semantic search, accessed via the `Azure.Search.Documents` SDK.
- **Azure Cosmos DB** — chat history storage and product catalog storage, accessed via the `Microsoft.Azure.Cosmos` SDK.

All three dependencies are network-bound, latency-sensitive, and subject to transient failures (throttling, timeouts, network blips). The project has **no application-level resilience layer**. There are no Polly policies, no `Microsoft.Extensions.Http.Resilience` registrations, no explicit timeouts, no circuit breakers, no fallback handlers, and no `CancellationToken` propagation through any service layer. Whatever protection exists comes from undeclared SDK defaults.

The csproj does not reference `Polly`, `Microsoft.Extensions.Http.Resilience`, or any other resilience library. Every external call is a bare SDK call with no explicitly configured retry, no timeout guard, and no cancellation support. When an Azure dependency is degraded, requests either stall until an SDK-internal default elapses or surface raw exceptions to callers.

Overall resilience posture: **critically deficient**.

---

## 2. External Dependency Map

| Dependency | Client Type | Retry | Circuit Breaker | Timeout | Fallback | Notes |
|---|---|---|---|---|---|---|
| Azure OpenAI (chat completion) | Semantic Kernel `IChatCompletionService` | None | None | None | None | Called in `ChatService.GetResponseAsync` with no guard |
| Azure OpenAI (text embedding) | Semantic Kernel `ITextEmbeddingGenerationService` | None | None | None | None | Called in `AISearchDataPlugin.ResourceLookup` |
| Azure AI Search | `SearchClient` (SDK, singleton) | SDK default | None | SDK default | None | Called in `AISearchData.RetrieveDocumentationAsync`; no `SearchClientOptions` passed, so only the Azure.Core pipeline defaults apply |
| Azure Cosmos DB (chat history) | `CosmosClient` (SDK, singleton) | SDK default | None | SDK default | None | Chat history read/write in `ChatHistoryData` |
| Azure Cosmos DB (product catalog) | `CosmosClient` (SDK, singleton) | SDK default | None | SDK default | None | Product reads in `ProductData`; also used at startup by `GenerateProductInfo` |

The Cosmos DB SDK includes built-in retry for rate-limited requests (429), governed by `CosmosClientOptions.MaxRetryAttemptsOnRateLimitedRequests` and `CosmosClientOptions.MaxRetryWaitTimeOnRateLimitedRequests`. No custom `CosmosClientOptions` are passed in `Program.cs`, so the SDK falls back to its built-in defaults (9 retry attempts, up to 30 seconds of cumulative wait). This provides baseline protection, but it is neither explicitly declared nor tuned. This aspect is deferred to the CosmosDB Steward.

---

## 3. HttpClient Factory Assessment

The project does **not use `HttpClient` directly** — all external service calls are made via Azure SDK clients (`AzureOpenAIClient`, `SearchClient`, `CosmosClient`) and the Semantic Kernel abstraction layer. There is no `new HttpClient()` instantiation in the codebase.

The Azure SDK clients manage their own HTTP connection pooling and lifecycle internally. The absence of `IHttpClientFactory` is therefore not a finding: the project correctly relies on first-party Azure SDK clients rather than raw `HttpClient` instances.

**Finding:** No `new HttpClient()` usage detected — socket exhaustion risk from raw `HttpClient` is not present.

---

## 4. Retry Policy Assessment

No retry policies are configured anywhere in the codebase. The `ChatAPI.csproj` does not reference `Polly` or `Microsoft.Extensions.Http.Resilience`. All Azure SDK client calls (`chatCompletion.GetChatMessageContentAsync`, `_embedding.GenerateEmbeddingAsync`, `_searchClient.SearchAsync`, Cosmos DB reads/writes) are invoked directly with no retry wrapper.

Specific gaps:

- `ChatService.GetResponseAsync` (line 49): `GetChatMessageContentAsync` — Azure OpenAI call with no retry.
- `AISearchDataPlugin.ResourceLookup` (lines 35–39): embedding generation + search retrieval with no retry.
- `AISearchData.RetrieveDocumentationAsync` (line 56): `SearchAsync` — Azure AI Search call with no retry.
- `ProductData.GetProductAsync` / `GetProductByNameAsync`: Cosmos DB reads with only SDK-default retry (not tuned).
- `ChatHistoryData`: All Cosmos DB reads/writes with only SDK-default retry.

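Because nothing in the project handles them, Azure OpenAI and Azure AI Search transient failures (network errors, 429 throttling, 503) that outlast whatever retries the SDK clients perform by default surface directly as unhandled exceptions to the caller.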
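
To make the missing behavior concrete, the following is a minimal sketch of exponential backoff with jitter using only the base class library. `RetryHelper`, `RetryAsync`, and the `isTransient` predicate are hypothetical names introduced for illustration; they are not part of the project or of any Azure SDK. In practice the SDK client options or a Polly pipeline would likely be preferable to a hand-rolled helper.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: retries an async operation with exponential backoff and jitter.
public static class RetryHelper
{
    public static async Task<T> RetryAsync<T>(
        Func<CancellationToken, Task<T>> operation,
        Func<Exception, bool> isTransient,
        int maxAttempts = 4,
        TimeSpan? baseDelay = null,
        CancellationToken cancellationToken = default)
    {
        TimeSpan delay = baseDelay ?? TimeSpan.FromMilliseconds(500);

        for (int attempt = 1; ; attempt++)
        {
            try
            {
                return await operation(cancellationToken);
            }
            // Retry only while attempts remain and the caller classifies the error as transient;
            // anything else propagates to the caller unchanged.
            catch (Exception ex) when (attempt < maxAttempts && isTransient(ex))
            {
                // Backoff doubles per attempt (baseDelay * 2^(attempt-1)) plus up to 25% random jitter.
                double backoffMs = delay.TotalMilliseconds * Math.Pow(2, attempt - 1);
                double jitterMs = Random.Shared.NextDouble() * backoffMs * 0.25;
                await Task.Delay(TimeSpan.FromMilliseconds(backoffMs + jitterMs), cancellationToken);
            }
        }
    }
}
```

The delay honors the caller's `CancellationToken`, so a client disconnect ends the retry loop instead of waiting out the backoff.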

---

## 5. Circuit Breaker Assessment

No circuit breaker is configured for any external dependency. If Azure OpenAI or Azure AI Search becomes degraded or returns errors continuously, every incoming request will attempt to call the failing dependency, producing cascading failures under load.

There is no isolation boundary preventing a failure in Azure OpenAI from tying up the whole API. Without a circuit breaker:
- Every user request during an Azure OpenAI outage remains pending until the SDK's default timeout elapses (the project configures no timeout of its own).
- Because the calls are awaited, pending requests hold connections, memory, and outbound sockets rather than blocked threads, but sustained degradation can still exhaust those resources and saturate the request queue.

The project has no `AddStandardResilienceHandler()`, no Polly `CircuitBreakerPolicy`, and no custom pipeline defined.
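
For orientation, a circuit breaker around the chat completion call could look roughly like the sketch below. It assumes Polly v8 (`Polly.Core`) is referenced, which the project does not currently do. `OpenAiCircuitBreaker` is a hypothetical wrapper name, and the thresholds are illustrative values rather than tuned recommendations.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Polly;
using Polly.CircuitBreaker;

// Hypothetical wrapper: one shared breaker for all Azure OpenAI calls.
public static class OpenAiCircuitBreaker
{
    private static readonly ResiliencePipeline Pipeline = new ResiliencePipelineBuilder()
        .AddCircuitBreaker(new CircuitBreakerStrategyOptions
        {
            FailureRatio = 0.5,                         // open when at least half of sampled calls fail
            SamplingDuration = TimeSpan.FromSeconds(30),
            MinimumThroughput = 10,                     // do not judge on fewer than 10 calls
            BreakDuration = TimeSpan.FromSeconds(15)    // stay open this long before probing again
        })
        .Build();

    // While the circuit is open this throws BrokenCircuitException immediately
    // instead of invoking the operation.
    public static async Task<T> ExecuteAsync<T>(
        Func<CancellationToken, Task<T>> operation,
        CancellationToken cancellationToken = default)
        => await Pipeline.ExecuteAsync(async token => await operation(token), cancellationToken);
}
```

A caller would catch `BrokenCircuitException` and return a fast failure (for example HTTP 503) rather than waiting on a dependency that is known to be down.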

---

## 6. Timeout Configuration Assessment

No explicit timeouts are configured at any layer:

- `AzureOpenAIClient` is registered as a singleton without specifying `AzureOpenAIClientOptions` or a custom `HttpClient` with `Timeout`.
- `SearchClient` is registered as a singleton without `SearchClientOptions` specifying a timeout.
- Semantic Kernel's `GetChatMessageContentAsync` call has no outer `Task.WhenAny` or `CancellationToken` guard. `OpenAIPromptExecutionSettings` does not expose a timeout setting, so the deadline has to come from the caller or from the client options.
- The `GenerateEmbeddingAsync` call in `AISearchDataPlugin` has no timeout guard.

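A slow Azure OpenAI response (e.g., 60+ seconds) keeps the ASP.NET Core request open for the full duration. Because the call is awaited, the cost is a held connection and in-flight request state rather than a blocked thread. The Azure SDK pipelines typically apply a default network timeout of 100 seconds per attempt, but the project neither declares nor tunes it. Under load, this leads to request queue saturation.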
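
An outer deadline can be expressed with a linked `CancellationTokenSource`, as in the sketch below. It uses only the base class library. `TimeoutGuard` and `RunWithTimeoutAsync` are hypothetical names, and the choice to surface the deadline as a `TimeoutException` is an assumption about how the API would want to report it.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: bounds an async operation with a deadline that also honors the caller's token.
public static class TimeoutGuard
{
    public static async Task<T> RunWithTimeoutAsync<T>(
        Func<CancellationToken, Task<T>> operation,
        TimeSpan timeout,
        CancellationToken callerToken = default)
    {
        // The linked source cancels when the caller cancels or when the timeout elapses.
        using var cts = CancellationTokenSource.CreateLinkedTokenSource(callerToken);
        cts.CancelAfter(timeout);

        try
        {
            return await operation(cts.Token);
        }
        catch (OperationCanceledException)
            when (cts.IsCancellationRequested && !callerToken.IsCancellationRequested)
        {
            // The deadline fired rather than the caller, so report it as a timeout.
            throw new TimeoutException($"Operation exceeded {timeout.TotalSeconds:F0} s.");
        }
    }
}
```

The guard only works if the wrapped call accepts the token it is handed, for example `ct => chatCompletion.GetChatMessageContentAsync(_chatHistory, openAIPromptExecutionSettings, kernel, ct)`.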

---

## 7. CancellationToken Propagation Assessment

`CancellationToken` is not propagated anywhere in the service layer:

- `ChatController.Post` is an `async Task<string>` action method that does not accept or use a `CancellationToken`. ASP.NET Core exposes the request-abort token as `HttpContext.RequestAborted` (reachable from `ControllerBase`) and binds it to any `CancellationToken` action parameter, but neither route is used.
- `ChatService.GetResponseAsync(string question, string sessionId)` accepts no `CancellationToken` parameter. All awaited operations (`_chatHistoryData.InitializeChatHistoryFromCosmosDBAsync`, `GetChatMessageContentAsync`, etc.) receive no cancellation token.
- `AISearchData.RetrieveDocumentationAsync` accepts no `CancellationToken`. The `SearchAsync` overload supports one but it is not used.
- `AISearchDataPlugin.ResourceLookup` accepts no `CancellationToken`. The embedding and search calls accept cancellation tokens but do not receive one.
- `ChatHistoryData`: all async methods (`AddUserMessageAsync`, `AddAssistantMessageAsync`, `GetMessagesBySessionIdAsync`) accept no `CancellationToken`. The Cosmos DB SDK `CreateItemAsync` and `ReadNextAsync` support cancellation but are called without tokens.
- `ProductData.GetProductAsync`, `GetProductByNameAsync`: no `CancellationToken` parameter. Cosmos DB `ReadItemAsync` and `ReadNextAsync` support cancellation but are called without tokens.

The consequence is that client disconnections (browser close, client timeout) do not abort in-flight Azure calls, wasting Azure OpenAI tokens and Cosmos DB RUs on abandoned requests.

---

## 8. Findings

| Severity | ID | Title | File |
|---|---|---|---|
| 🔴 Critical | RESL-TIMEOUT-001 | No timeout configured for Azure OpenAI calls | `Services/ChatService.cs` |
| 🔴 Critical | RESL-TIMEOUT-002 | No timeout configured for Azure AI Search calls | `Data/AISearchData.cs` |
| 🔴 Critical | RESL-CANCEL-001 | CancellationToken not propagated anywhere in the service layer | `Controllers/ChatController.cs`, `Services/ChatService.cs`, `Data/*` |
| 🟡 Notable | RESL-RETRY-001 | No retry policy on Azure OpenAI chat completion calls | `Services/ChatService.cs` |
| 🟡 Notable | RESL-RETRY-002 | No retry policy on Azure OpenAI embedding generation calls | `Plugins/AISearchDataPlugin.cs` |
| 🟡 Notable | RESL-RETRY-003 | No retry policy on Azure AI Search calls | `Data/AISearchData.cs` |
| 🟡 Notable | RESL-CB-001 | No circuit breaker on any critical external dependency | `Program.cs` |
| 🟢 Minor | RESL-CANCEL-002 | Azure AI Search `SearchAsync` called without CancellationToken | `Data/AISearchData.cs` |
| 🟢 Minor | RESL-CANCEL-003 | Cosmos DB operations called without CancellationToken | `Data/ChatHistoryData.cs`, `Data/ProductData.cs` |
| ℹ️ Info | RESL-HTTPCLIENT-001 | No raw `new HttpClient()` usage detected | N/A |
| ℹ️ Info | RESL-COSMOS-001 | Cosmos DB SDK provides built-in retry; explicit tuning not present | `Program.cs` |

### Finding Details

#### RESL-TIMEOUT-001 — 🔴 Critical: No timeout configured for Azure OpenAI calls

The `AzureOpenAIClient` singleton is registered with `new AzureOpenAIClient(...)` using no `AzureOpenAIClientOptions`. The Semantic Kernel chat completion call in `ChatService.GetResponseAsync` has no enclosing timeout guard. A slow or hung Azure OpenAI response keeps the ASP.NET Core request pending until the SDK's internal default timeout elapses, and the project neither declares nor enforces that default. Under load, pending requests accumulate and saturate the service.

**Recommendation:** Configure an `HttpClient` with an explicit `Timeout` when constructing `AzureOpenAIClient`, or wrap `GetChatMessageContentAsync` in a linked `CancellationTokenSource` that cancels after a configured timeout (e.g., 30 seconds). Prefer propagating `HttpContext.RequestAborted` through the stack so that the timeout and client disconnects share one token.
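
As one possible shape for the client-level option, the sketch below sets a per-attempt network timeout and an explicit retry count on the client options. It assumes the `Azure.AI.OpenAI` 2.x package (built on `System.ClientModel`). The configuration key `AZURE_OPENAI_ENDPOINT` and the use of `DefaultAzureCredential` are assumptions, since the review does not show how the project constructs the client.

```csharp
using System;
using System.ClientModel.Primitives;
using Azure.AI.OpenAI;
using Azure.Identity;
using Microsoft.Extensions.DependencyInjection;

// Program.cs fragment; `builder` is the WebApplicationBuilder.
builder.Services.AddSingleton(_ => new AzureOpenAIClient(
    new Uri(builder.Configuration["AZURE_OPENAI_ENDPOINT"]!), // hypothetical configuration key
    new DefaultAzureCredential(),                              // assumed credential type
    new AzureOpenAIClientOptions
    {
        // Upper bound for a single HTTP attempt; illustrative value, not a tuned one.
        NetworkTimeout = TimeSpan.FromSeconds(30),
        // Declares the retry count explicitly instead of relying on the pipeline default.
        RetryPolicy = new ClientRetryPolicy(maxRetries: 3)
    }));
```

Because chat completion and embedding generation share this client, the same settings would apply to both call paths.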

---

#### RESL-TIMEOUT-002 — 🔴 Critical: No timeout configured for Azure AI Search calls

`SearchClient` is registered as a singleton with no `SearchClientOptions` specifying a request timeout. `RetrieveDocumentationAsync` calls `SearchAsync` with no timeout guard. A degraded Azure AI Search service keeps every request that reaches this method pending until the SDK's default network timeout elapses on each attempt.

**Recommendation:** Pass `SearchClientOptions` with a `Retry.NetworkTimeout` when constructing `SearchClient`, or wrap the call in a policy with a timeout.

---

#### RESL-CANCEL-001 — 🔴 Critical: CancellationToken not propagated anywhere in the service layer

`ChatController.Post` does not accept or use `CancellationToken ct` (available via `ControllerBase.HttpContext.RequestAborted`). `ChatService.GetResponseAsync` has no `CancellationToken` parameter. No downstream async operation receives a cancellation token. Client disconnections do not abort in-flight Azure OpenAI, Cosmos DB, or AI Search operations, causing unnecessary resource consumption and Azure cost.

**Recommendation:** Add a `CancellationToken cancellationToken` parameter to `ChatController.Post` (ASP.NET Core binds it to `HttpContext.RequestAborted` automatically), and add `CancellationToken cancellationToken = default` to `ChatService.GetResponseAsync` and all `Data` layer async methods. Pass the token from the controller to the service and on to every SDK async call.

---

#### RESL-RETRY-001 — 🟡 Notable: No retry policy on Azure OpenAI chat completion calls

`GetChatMessageContentAsync` in `ChatService` is called with no retry. Azure OpenAI is subject to 429 (rate limiting) and transient 5xx errors. A single transient failure fails the entire chat request.

**Recommendation:** Apply a Polly retry policy (or `Microsoft.Extensions.Http.Resilience` `StandardResilienceHandler`) with exponential backoff, jitter, and max 3–5 attempts scoped to 429 and 5xx responses. This can be applied at the `AzureOpenAIClient` HTTP pipeline level via `AzureOpenAIClientOptions.Transport`.

---

#### RESL-RETRY-002 — 🟡 Notable: No retry policy on Azure OpenAI embedding generation calls

`GenerateEmbeddingAsync` in `AISearchDataPlugin.ResourceLookup` is called with no retry. A transient embedding failure propagates out of the kernel function unhandled, failing the whole RAG lookup.

**Recommendation:** Apply the same retry policy as RESL-RETRY-001 — both calls share the same `AzureOpenAIClient` instance, so a single retry configuration at the client level covers both.

---

#### RESL-RETRY-003 — 🟡 Notable: No retry policy on Azure AI Search calls

`_searchClient.SearchAsync` in `AISearchData.RetrieveDocumentationAsync` has no retry. Transient AI Search failures surface directly to the kernel function and then to the caller.

**Recommendation:** Pass `SearchClientOptions` with `Retry` configured (`MaxRetries = 3`, `Mode = RetryMode.Exponential`) when constructing `SearchClient` in `Program.cs`.

---

#### RESL-CB-001 — 🟡 Notable: No circuit breaker on any critical external dependency

No circuit breaker is applied to Azure OpenAI, Azure AI Search, or Cosmos DB. During a sustained Azure OpenAI outage, every incoming request still attempts the failing call, so requests pile up waiting on a dependency that cannot answer, and clients wait for a failure that could have been returned immediately.

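**Recommendation:** Register a named `HttpClient` with `AddStandardResilienceHandler()` (from `Microsoft.Extensions.Http.Resilience`) and supply it as the transport of the `AzureOpenAIClient` HTTP pipeline. The handler is an `IHttpClientBuilder` extension, so it attaches to an `HttpClient` rather than to the SDK client directly. This provides retry, timeout, and circuit breaker in one configured pipeline. Alternatively, add a Polly `CircuitBreakerPolicy` wrapping the Semantic Kernel chat completion calls.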
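
A rough sketch of that wiring follows, assuming `Azure.AI.OpenAI` 2.x and `Microsoft.Extensions.Http.Resilience`. The client name `"azure-openai"`, the configuration key, and the credential type are assumptions made for illustration.

```csharp
using System;
using System.ClientModel.Primitives;
using System.Net.Http;
using Azure.AI.OpenAI;
using Azure.Identity;
using Microsoft.Extensions.DependencyInjection;

// Program.cs fragment; `builder` is the WebApplicationBuilder.
// 1. A named HttpClient whose handler chain includes retry, timeouts, and a circuit breaker.
builder.Services
    .AddHttpClient("azure-openai")          // hypothetical client name
    .AddStandardResilienceHandler();

// 2. The SDK client sends its requests through that HttpClient.
builder.Services.AddSingleton(sp =>
{
    HttpClient httpClient = sp.GetRequiredService<IHttpClientFactory>().CreateClient("azure-openai");

    return new AzureOpenAIClient(
        new Uri(builder.Configuration["AZURE_OPENAI_ENDPOINT"]!), // hypothetical configuration key
        new DefaultAzureCredential(),                              // assumed credential type
        new AzureOpenAIClientOptions
        {
            Transport = new HttpClientPipelineTransport(httpClient),
            // Turn off the SDK's own retries so they do not stack on top of the handler's retries.
            RetryPolicy = new ClientRetryPolicy(maxRetries: 0)
        });
});
```

Two caveats apply to this sketch. The handler's default timeouts are short relative to typical chat completion latency, so they would likely need tuning through the options overload of `AddStandardResilienceHandler`. Holding one `HttpClient` inside a singleton also bypasses the factory's handler rotation for that client.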

---

#### RESL-CANCEL-002 — 🟢 Minor: Azure AI Search `SearchAsync` called without CancellationToken

`_searchClient.SearchAsync<SearchDocument>(question, searchOptions)` is called without the optional `cancellationToken` parameter. The `SearchAsync` overload accepts a `CancellationToken`.

**Recommendation:** Add `CancellationToken cancellationToken = default` to `RetrieveDocumentationAsync`, pass it to `SearchAsync`, and apply it when enumerating `GetResultsAsync()` (which takes no token itself) via `WithCancellation(cancellationToken)`.
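
A sketch of how the token might flow through the search call is shown below. The class shape, the method signature, and the `List<SearchDocument>` return type are assumptions; the real `AISearchData` may differ.

```csharp
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;
using Azure;
using Azure.Search.Documents;
using Azure.Search.Documents.Models;

// Hypothetical reduction of AISearchData, showing only the cancellation flow.
public sealed class AISearchDataSketch
{
    private readonly SearchClient _searchClient;

    public AISearchDataSketch(SearchClient searchClient) => _searchClient = searchClient;

    public async Task<List<SearchDocument>> RetrieveDocumentationAsync(
        string question,
        SearchOptions searchOptions,
        CancellationToken cancellationToken = default)
    {
        // The token cancels the initial search request.
        Response<SearchResults<SearchDocument>> response =
            await _searchClient.SearchAsync<SearchDocument>(question, searchOptions, cancellationToken);

        var documents = new List<SearchDocument>();

        // GetResultsAsync() has no token parameter; WithCancellation applies the token
        // to the follow-up page requests made during enumeration.
        await foreach (SearchResult<SearchDocument> result in
            response.Value.GetResultsAsync().WithCancellation(cancellationToken))
        {
            documents.Add(result.Document);
        }

        return documents;
    }
}
```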

---

#### RESL-CANCEL-003 — 🟢 Minor: Cosmos DB operations called without CancellationToken

`container.CreateItemAsync`, `container.ReadItemAsync`, `query.ReadNextAsync`, and related calls in `ChatHistoryData` and `ProductData` all accept `CancellationToken` but are called without one.

**Recommendation:** Thread `CancellationToken` through all Cosmos DB data layer methods.
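
The sketch below illustrates the pattern for a query-style read. `ChatHistoryReader`, the query text, and the `sessionId` property name are hypothetical; only the `Microsoft.Azure.Cosmos` calls themselves are taken from the SDK.

```csharp
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;

// Hypothetical reduction of a data-layer class, showing only the cancellation flow.
public sealed class ChatHistoryReader
{
    private readonly Container _container;

    public ChatHistoryReader(Container container) => _container = container;

    public async Task<List<T>> GetBySessionIdAsync<T>(
        string sessionId,
        CancellationToken cancellationToken = default)
    {
        // Assumed document shape: each item carries a `sessionId` property.
        QueryDefinition query = new QueryDefinition("SELECT * FROM c WHERE c.sessionId = @sessionId")
            .WithParameter("@sessionId", sessionId);

        var results = new List<T>();
        using FeedIterator<T> iterator = _container.GetItemQueryIterator<T>(query);

        while (iterator.HasMoreResults)
        {
            // Each page fetch observes the token, so an aborted request stops consuming RUs.
            FeedResponse<T> page = await iterator.ReadNextAsync(cancellationToken);
            results.AddRange(page);
        }

        return results;
    }
}
```

Write paths follow the same idea: `CreateItemAsync` and `ReadItemAsync` both take a `cancellationToken` argument after the optional request options.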

---

#### RESL-HTTPCLIENT-001 — ℹ️ Info: No raw `new HttpClient()` usage detected

The project uses Azure SDK clients and Semantic Kernel abstractions throughout. No `new HttpClient()` instantiation was found, so there is no socket exhaustion risk from direct `HttpClient` misuse.

---

#### RESL-COSMOS-001 — ℹ️ Info: Cosmos DB SDK provides built-in retry; explicit tuning not present

`CosmosClient` is registered with only a connection string (`new CosmosClient(connectionString)`) and no `CosmosClientOptions`. For rate-limited requests, the SDK defaults to 9 retry attempts with a cumulative wait of up to 30 seconds. This provides a baseline, but the retry behavior is not explicitly declared or tuned. Cosmos DB retry configuration values are deferred to the CosmosDB Steward and the API Config Steward.
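
If the team decides to declare this behavior explicitly, a registration along the following lines would do so. The values shown simply restate the SDK defaults described above and are not tuning advice; `connectionString` stands for whatever value `Program.cs` already reads.

```csharp
using System;
using Microsoft.Azure.Cosmos;
using Microsoft.Extensions.DependencyInjection;

// Program.cs fragment; `builder` is the WebApplicationBuilder and
// `connectionString` is the value the project already passes to CosmosClient.
builder.Services.AddSingleton(_ => new CosmosClient(
    connectionString,
    new CosmosClientOptions
    {
        // Restates the SDK defaults so the retry behavior is visible in code review.
        MaxRetryAttemptsOnRateLimitedRequests = 9,
        MaxRetryWaitTimeOnRateLimitedRequests = TimeSpan.FromSeconds(30)
    }));
```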

---

## 9. Recommended Improvements

| Finding | Recommended Action | Priority |
|---|---|---|
| RESL-CANCEL-001 | Add `CancellationToken` parameter to controller, service, and all data layer async methods; propagate `HttpContext.RequestAborted` | High |
| RESL-TIMEOUT-001 | Configure explicit timeout on `AzureOpenAIClient` via `AzureOpenAIClientOptions` or outer `CancellationTokenSource` | High |
| RESL-TIMEOUT-002 | Configure `SearchClientOptions.Retry.NetworkTimeout` when registering `SearchClient` in DI | High |
| RESL-CB-001 | Add `Microsoft.Extensions.Http.Resilience` package; register named `HttpClient` instances with `AddStandardResilienceHandler()` and use them as the transport for the Azure OpenAI and AI Search clients | Medium |
| RESL-RETRY-001 | Apply exponential-backoff retry (3–5 attempts, jitter) at the `AzureOpenAIClient` transport layer or via Polly pipeline wrapping `GetChatMessageContentAsync` | Medium |
| RESL-RETRY-002 | Same retry configuration as RESL-RETRY-001 — covered by the same `AzureOpenAIClient` pipeline change | Medium |
| RESL-RETRY-003 | Configure `SearchClientOptions` with `Retry.MaxRetries = 3`, `RetryMode.Exponential` when constructing `SearchClient` | Medium |
| RESL-CANCEL-002 | Pass `CancellationToken` to `SearchAsync` and to the enumeration of `GetResultsAsync()` in `AISearchData.RetrieveDocumentationAsync` | Low |
| RESL-CANCEL-003 | Pass `CancellationToken` to all Cosmos SDK calls in `ChatHistoryData` and `ProductData` | Low |

### Suggested Dependency Addition

```xml
<PackageReference Include="Microsoft.Extensions.Http.Resilience" Version="8.*" />
```

### Suggested Pattern: CancellationToken Propagation

```csharp
// ChatController.cs
// ASP.NET Core binds the CancellationToken parameter to HttpContext.RequestAborted.
[HttpPost(Name = "PostChatRequest")]
public async Task<string> Post([FromBody] ChatRequest request, CancellationToken cancellationToken)
{
    return await chatService.GetResponseAsync(request.Input, request.SessionId, cancellationToken);
}

// ChatService.cs
public async Task<string> GetResponseAsync(string question, string sessionId, CancellationToken cancellationToken = default)
{
    // ... pass cancellationToken to all awaited calls
    ChatMessageContent response = await chatCompletion.GetChatMessageContentAsync(
        _chatHistory,
        executionSettings: openAIPromptExecutionSettings,
        kernel: kernel,
        cancellationToken: cancellationToken);

    // Assumption: the method returns the assistant's text; adapt to the existing return logic.
    return response.Content ?? string.Empty;
}
```

### Suggested Pattern: Azure AI Search Retry Configuration

```csharp
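// Requires: using Azure.Core; using Azure.Identity; using Azure.Search.Documents;
builder.Services.AddSingleton(_ => new SearchClient(
    new Uri(builder.Configuration["AZURE_AI_SEARCH_ENDPOINT"]!),
    builder.Configuration["AZURE_AI_SEARCH_INDEX"]!,
    new DefaultAzureCredential(),
    new SearchClientOptions
    {
        Retry =
        {
            MaxRetries = 3,                             // retries after the first attempt
            Mode = RetryMode.Exponential,               // exponential backoff between attempts
            NetworkTimeout = TimeSpan.FromSeconds(15)   // per-attempt network timeout
        }
    }));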
```

---

## Summary

The ChatAPI has no resilience infrastructure of its own. All three critical Azure dependencies (Azure OpenAI, Azure AI Search, Azure Cosmos DB) are called with bare SDK invocations: no configured retry, no circuit breaker, no explicit timeout, and no cancellation support. The three critical findings (RESL-TIMEOUT-001, RESL-TIMEOUT-002, RESL-CANCEL-001) represent the highest-priority work because they let requests pile up under degraded conditions and waste Azure resources on abandoned requests. Adding `Microsoft.Extensions.Http.Resilience` and threading `CancellationToken` through the stack would address the majority of findings in a single focused effort.

---

*This review is based on static analysis of source files as of 2026-03-21. It does not reflect runtime behavior, infrastructure configuration, or transient errors observed in production. Generated by the API Resilience Steward (api-resilience-steward).*