# Python Configuration Review — Azure-AI-RAG-CSharp-Semantic-Kernel-Functions

**Steward:** Python Configuration Steward
**Project:** Azure-AI-RAG-CSharp-Semantic-Kernel-Functions
**Run Date:** 2026-03-22
**Target:** `src/DocumentLoaderFunction/` — Python Azure Functions app (blob trigger, LangChain, Azure AI Search embeddings)

---

## 1. Configuration Architecture Overview

The DocumentLoaderFunction is an Azure Functions app written against the Python v2 programming model (decorator-based, single `function_app.py`) that:

- Triggers on blob uploads to an Azure Storage `load` container.
- Uses `DefaultAzureCredential` (Managed Identity in production) to authenticate with Azure Cognitive Services and Azure AI Search.
- Reads embeddings configuration from environment variables (`AZURE_OPENAI_EMBEDDING`, `AZURE_OPENAI_API_VERSION`, `AZURE_OPENAI_ENDPOINT`, `AZURE_OPENAI_API_KEY`).
- Reads search configuration from environment variables (`AZURE_AI_SEARCH_ENDPOINT`, `AZURE_AI_SEARCH_INDEX`).
- Reads blob storage URI from `BlobTriggerConnection__blobServiceUri`.

Production configuration is provisioned via Bicep (`infra/app/loader-function.bicep`). No `local.settings.json` or `.env.example` is committed — `local.settings.json` is correctly excluded via `.gitignore`.

**Overall posture:** The function uses Managed Identity correctly for production authentication. However, there are five notable gaps: no startup validation of required environment variables, inconsistent `os.environ` access patterns, a provisioned-but-empty `KeyVaultUri` setting, a mismatched environment variable name between Bicep and Python code, and no `local.settings.json.example` to guide local developers.

---

## 2. Environment Variables Assessment

### Variables consumed by the function

| Variable | Access Pattern | Required | Provisioned via Bicep |
|---|---|---|---|
| `AZURE_AI_SEARCH_ENDPOINT` | `environ["..."]` (direct, no handler) | Yes | Yes |
| `AZURE_AI_SEARCH_INDEX` | `environ["..."]` (direct, no handler) | Yes | Yes (hardcoded `azure-support`) |
| `AZURE_OPENAI_EMBEDDING` | `environ.get("...")` | Yes | Yes (hardcoded `text-embedding`) |
| `AZURE_OPENAI_API_VERSION` | `environ.get("...")` | Yes | Yes (hardcoded `2023-05-15`) |
| `AZURE_OPENAI_ENDPOINT` | `environ.get("...")` | Yes | Yes (passed from OpenAI module) |
| `AZURE_OPENAI_API_KEY` | `environ.get("...")` | No (Managed Identity path sets it at runtime) | No |
| `BlobTriggerConnection__blobServiceUri` | `environ.get("...")` | Yes | Yes |
| `AZURE_STORAGE_URL` | Set in Bicep but **not read by Python code** | N/A | Yes |
| `KeyVaultUri` | Set in Bicep as empty string, **not read by Python code** | N/A | Yes (empty) |

### Key issues

**Mixed access patterns:** Lines 57–58 use `environ["AZURE_AI_SEARCH_ENDPOINT"]` and `environ["AZURE_AI_SEARCH_INDEX"]` (bare subscript — raises `KeyError` with no context), while lines 73–76 use `environ.get(...)` (returns `None` silently). Neither pattern validates at startup.

**Mismatched variable name:** Bicep provisions `AZURE_STORAGE_URL` (line 55 of `loader-function.bicep`), but the Python code reads `BlobTriggerConnection__blobServiceUri` (line 82 of `function_app.py`). `AZURE_STORAGE_URL` is set in infra but never consumed. This is a dead configuration entry.

**Silent `None` propagation:** `environ.get("AZURE_OPENAI_EMBEDDING")`, `environ.get("AZURE_OPENAI_API_VERSION")`, and `environ.get("AZURE_OPENAI_ENDPOINT")` return `None` if the variable is absent. The `None` is handed straight to `AzureOpenAIEmbeddings`, so the failure surfaces inside the SDK (at construction or on the first embedding call, depending on the library version) with an error that does not identify which application setting is missing.

**Runtime mutation of `environ`:** Lines 49, 52, and 54 write to `environ` at request time (`OPENAI_API_TYPE`, `OPENAI_API_KEY`, `AZURE_OPENAI_AD_TOKEN`). In a multi-threaded or multi-invocation environment, this is a race condition. Azure Functions Python workers may reuse the same process across concurrent invocations.

---

## 3. local.settings.json Assessment

| Check | Result |
|---|---|
| `local.settings.json` committed to source control | Not found (correct — excluded by `.gitignore`) |
| `local.settings.json` excluded by function-level `.gitignore` | Yes — line 129 of `src/DocumentLoaderFunction/.gitignore` |
| `local.settings.json` excluded by root-level `.gitignore` | No — root `.gitignore` does not mention `local.settings.json` |
| `local.settings.json.example` or equivalent template committed | **Not found** |
| Required variables documented in README | Partially — README documents the Cosmos DB connection string for the API app only; no local development instructions for the Function app |

The absence of a `local.settings.json.example` means there is no machine-readable record of which environment variables are required to run the function locally. A developer cloning the repo has no template to fill in.

---

## 4. Secrets Management Assessment

| Check | Result |
|---|---|
| API keys or tokens hardcoded in Python source | None found |
| Connection strings hardcoded in Python source | None found |
| `DefaultAzureCredential` used for Azure service authentication | Yes — correctly used for AI Search and Storage |
| Managed Identity assigned in Bicep | Yes — User-Assigned Managed Identity provisioned and assigned |
| Azure Key Vault integrated | Partially — `azure-keyvault-secrets` listed in `requirements.txt`, `KeyVaultUri` provisioned in Bicep, but **no Key Vault usage exists in `function_app.py`** |
| `AZURE_OPENAI_API_KEY` handled securely | Mixed — in production the token is fetched via Managed Identity (correct); `environ.get("AZURE_OPENAI_API_KEY")` is also passed to `AzureOpenAIEmbeddings`, creating a fallback path that could accept a plaintext key if present |

The Key Vault scaffolding (SDK dependency + empty `KeyVaultUri` app setting) exists but is unused. This suggests Key Vault integration was planned but not implemented.

---

## 5. Configuration Validation Assessment

There is **no startup validation** of required environment variables. All configuration access occurs inside the `Loader` function handler, which is triggered per blob event. Failures due to missing configuration manifest as:

- `KeyError` for `environ["AZURE_AI_SEARCH_ENDPOINT"]` and `environ["AZURE_AI_SEARCH_INDEX"]` — the exception is caught by the broad `except Exception` on line 95, logged, and swallowed. The blob is not requeued; processing silently fails.
- `None` passed to `AzureOpenAIEmbeddings` — SDK raises an error mid-embedding, also caught and logged, blob silently fails.

Because errors are caught at the top level without re-raising, a misconfigured deployment will process zero documents without any obvious alert, only silent `logging.error` output. There is no fail-fast path.

**Recommended startup pattern:** In the Python v2 programming model, `function_app.py` is imported when the worker loads the app, so module-level code runs before any invocation. Required variables should be validated at import time (outside the function handler) so the worker fails to start rather than silently dropping documents.

---

## 6. Environment Awareness Assessment

| Check | Result |
|---|---|
| `FUNCTIONS_WORKER_RUNTIME` set to `python` | Yes — in Bicep |
| `AzureWebJobsStorage` configured | Yes — via `AzureWebJobsStorage__credential` / `__accountName` (Managed Identity, correct) |
| `FUNCTIONS_EXTENSION_VERSION` pinned | Yes — `~4` |
| Log level differentiated by environment | No — `logging.info` / `logging.error` used uniformly; no log level configuration based on environment |
| Development vs. production behavior distinguished | No — no `ENVIRONMENT` or equivalent variable read; no conditional logic |
| Application Insights configured | Yes — `APPINSIGHTS_INSTRUMENTATIONKEY` set in Bicep |

There is no mechanism to raise or lower log verbosity between local development and production. In production, `logging.info` calls that dump full blob content (`f"Blob content as JSON: {json_data}"`, line 68) may log large payloads to Application Insights, increasing cost and potentially exposing content in telemetry.

---

## 7. Findings

### PYCFG-ENVVAR-001 — No startup validation of required environment variables

**Severity:** Notable
**File:** `src/DocumentLoaderFunction/function_app.py`
**Lines:** 57–76

All required environment variables are read inside the per-invocation handler. Missing configuration is silently swallowed by the top-level `except Exception` block, causing blobs to be dropped with no indication of a configuration problem. There is no fail-fast path.

**Recommendation:** Add a module-level validation block that checks all required variables at import time, before the function is registered. Raise a `ValueError` or `EnvironmentError` with a clear message listing every missing variable. Azure Functions will surface this as a worker startup failure, making misconfiguration immediately visible.

---

### PYCFG-ENVVAR-002 — Bare `environ["KEY"]` subscript without error handling

**Severity:** Notable
**File:** `src/DocumentLoaderFunction/function_app.py`
**Lines:** 57, 58

`environ["AZURE_AI_SEARCH_ENDPOINT"]` and `environ["AZURE_AI_SEARCH_INDEX"]` use bare subscript access. If either variable is absent the runtime raises a `KeyError` that is caught by the generic `except Exception` on line 95 and logged without context. The error message (`KeyError: 'AZURE_AI_SEARCH_ENDPOINT'`) provides no guidance on how to fix the problem.

**Recommendation:** Replace bare subscript access with validated reads, either via a startup check (see PYCFG-ENVVAR-001) or with explicit guard clauses that raise descriptive errors identifying the missing variable and its purpose.
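
One way to express such a guard clause is a small helper that names both the variable and its purpose in the error. This is a minimal sketch, not code from the repository: `require_env` and its `purpose` argument are hypothetical names introduced for illustration.

```python
import os
from typing import Mapping


def require_env(name: str, purpose: str, env: Mapping[str, str] = os.environ) -> str:
    """Return a required environment variable or raise a descriptive error.

    `require_env` is a hypothetical helper, not part of function_app.py.
    """
    value = env.get(name)
    if not value:  # an empty string is treated the same as an unset variable
        raise ValueError(
            f"Required environment variable {name!r} is not set ({purpose}). "
            "Add it to the Function App settings or to local.settings.json."
        )
    return value


# Possible call site (commented out so the sketch has no side effects):
# search_endpoint = require_env("AZURE_AI_SEARCH_ENDPOINT", "Azure AI Search endpoint URL")
```

Compared with a bare subscript, the resulting log line states what the variable is for and where to set it, even if the broad `except Exception` still catches it.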

---

### PYCFG-ENVVAR-003 — `environ.get()` returns `None` silently for required values

**Severity:** Notable
**File:** `src/DocumentLoaderFunction/function_app.py`
**Lines:** 73–76

`environ.get("AZURE_OPENAI_EMBEDDING")`, `environ.get("AZURE_OPENAI_API_VERSION")`, and `environ.get("AZURE_OPENAI_ENDPOINT")` return `None` when absent. These values are passed directly to `AzureOpenAIEmbeddings`, so the failure surfaces inside the SDK (at construction or on the first embedding call, depending on the library version) as an exception that does not identify the missing application setting.

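**Recommendation:** Either validate at startup (PYCFG-ENVVAR-001) or follow each `environ.get("KEY")` with an explicit guard that raises, for example, `ValueError("AZURE_OPENAI_ENDPOINT is required")` when the value is missing. Note that `raise` is a statement, so it cannot be chained inline as `environ.get("KEY") or raise ...`; use an `if` check or a helper function. Never pass `None` silently to SDK constructors for required parameters.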

---

### PYCFG-ENVVAR-004 — Runtime mutation of `os.environ` is a concurrency hazard

**Severity:** Notable
**File:** `src/DocumentLoaderFunction/function_app.py`
**Lines:** 49, 52, 54

`environ["OPENAI_API_TYPE"]`, `environ["OPENAI_API_KEY"]`, and `environ["AZURE_OPENAI_AD_TOKEN"]` are written inside the per-invocation handler on every blob event. Azure Functions Python workers may handle concurrent invocations within the same process, and `os.environ` is a process-global shared dict. Concurrent writes create a race condition where one invocation may use the token fetched by another.

**Recommendation:** Pass the credential token directly to the LangChain client via its constructor parameters rather than routing it through `os.environ`. `AzureOpenAIEmbeddings` accepts `azure_ad_token` and `api_key` parameters directly, avoiding process-global state.

---

### PYCFG-LOCAL-001 — No `local.settings.json.example` provided

**Severity:** Notable
**File:** `src/DocumentLoaderFunction/` (missing file)

There is no `local.settings.json.example`, `.env.example`, or equivalent template that documents which environment variables are required to run the function locally. A developer cloning the repository has no machine-readable guide to set up their local environment.

**Recommendation:** Commit a `local.settings.json.example` to `src/DocumentLoaderFunction/` listing all required `Values` keys with placeholder values. Because standard JSON has no comment syntax, explain each variable's purpose in the README or through descriptive placeholder strings rather than inline comments. Ensure the template itself is not excluded by `.gitignore`.

---

### PYCFG-SECRET-001 — Key Vault scaffolding present but unused

**Severity:** Minor
**File:** `src/DocumentLoaderFunction/function_app.py`, `src/DocumentLoaderFunction/requirements.txt`, `infra/app/loader-function.bicep`
**Lines:** `loader-function.bicep` lines 103–105; `requirements.txt` line 3

`azure-keyvault-secrets` is listed in `requirements.txt` and `KeyVaultUri` is provisioned as an empty string in Bicep, but no Key Vault client is instantiated or used in `function_app.py`. This scaffolding implies planned integration that was never completed.

**Recommendation:** Either complete the Key Vault integration (reading the `KeyVaultUri` setting and using `SecretClient` to retrieve sensitive values) or remove the dead dependency and Bicep setting to reduce deployment surface and confusion.

---

### PYCFG-ENVVAR-005 — Dead `AZURE_STORAGE_URL` environment variable (Bicep/code mismatch)

**Severity:** Minor
**File:** `infra/app/loader-function.bicep` line 55; `src/DocumentLoaderFunction/function_app.py` line 82

Bicep provisions `AZURE_STORAGE_URL` with the blob service URI, but the Python code reads `BlobTriggerConnection__blobServiceUri` instead. `AZURE_STORAGE_URL` is set in the function app environment but never consumed by any Python code. This is a dead configuration entry that could confuse future maintainers or operators.

**Recommendation:** Remove the `AZURE_STORAGE_URL` app setting from `loader-function.bicep`. The function already correctly reads `BlobTriggerConnection__blobServiceUri` at line 82.

---

### PYCFG-LOGGING-001 — Full document content logged at `INFO` level in production

**Severity:** Minor
**File:** `src/DocumentLoaderFunction/function_app.py`
**Line:** 68

`logging.info(f"Blob content as JSON: {json_data}")` logs the full parsed document payload at `INFO` level on every invocation. In production, with Application Insights ingestion, this may log large structured payloads, increasing ingestion cost and potentially exposing document content in telemetry systems.

**Recommendation:** Downgrade to `logging.debug` or log only a document identifier (e.g., blob name and reference code) at `INFO`. Reserve full content logging for `DEBUG` level and gate it on an environment variable such as `LOG_LEVEL`.

---

### PYCFG-ENVVAR-006 — No environment-aware log level configuration

**Severity:** Minor
**File:** `src/DocumentLoaderFunction/function_app.py`, `src/DocumentLoaderFunction/host.json`

The function uses `logging.info` and `logging.error` uniformly with no mechanism to adjust verbosity between local development and production. `host.json` configures Application Insights sampling but does not set a log level for the Python worker.

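**Recommendation:** Read a `LOG_LEVEL` environment variable at startup and apply it to the root logger, for example via `logging.getLogger().setLevel(...)`. `logging.basicConfig(level=...)` has no effect when the root logger already has handlers attached, which may be the case under the Functions Python worker. Document `LOG_LEVEL` in the `local.settings.json.example`. Set it to `DEBUG` locally and `WARNING` or `INFO` in production Bicep.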

---

### PYCFG-SECRET-002 — `AZURE_OPENAI_API_KEY` passed to SDK creates plaintext key fallback path

**Severity:** Info
**File:** `src/DocumentLoaderFunction/function_app.py`
**Line:** 76

`api_key=environ.get("AZURE_OPENAI_API_KEY")` is passed to `AzureOpenAIEmbeddings`. In production, the Managed Identity token path (lines 49–54) sets `OPENAI_API_KEY` in `environ`, but the explicit `api_key` parameter on the SDK constructor means a plaintext API key would be accepted if placed in this variable. The Bicep does not provision `AZURE_OPENAI_API_KEY`, so this path is inactive in the standard deployment, but it represents a latent surface.

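**Recommendation:** Remove the `api_key` parameter from the `AzureOpenAIEmbeddings` constructor when Managed Identity is the intended authentication path. Rely solely on the Managed Identity token, ideally passed through the constructor's `azure_ad_token` parameter as recommended in PYCFG-ENVVAR-004 rather than through `OPENAI_API_KEY` / `AZURE_OPENAI_AD_TOKEN` in `environ`, to eliminate any path that would accept a plaintext key.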

---

### PYCFG-INFRA-001 — Managed Identity authentication correctly implemented

**Severity:** Info
**File:** `infra/app/loader-function.bicep`, `src/DocumentLoaderFunction/function_app.py`

`DefaultAzureCredential` is used for all Azure service authentication (AI Search, Blob Storage). The Bicep provisions a User-Assigned Managed Identity and configures all `BlobTriggerConnection__*` and `AzureWebJobsStorage__*` settings with `managedidentity` credential type. No connection strings or API keys are stored in the function app settings for Azure service authentication.

---

## 8. Recommended Improvements

### Priority 1 — Startup validation (addresses PYCFG-ENVVAR-001, -002, -003)

Add a module-level guard before the `FunctionApp` registration:

```python
from os import environ

# Every variable the handler cannot run without.
_REQUIRED_ENV_VARS = [
    "AZURE_AI_SEARCH_ENDPOINT",
    "AZURE_AI_SEARCH_INDEX",
    "AZURE_OPENAI_EMBEDDING",
    "AZURE_OPENAI_API_VERSION",
    "AZURE_OPENAI_ENDPOINT",
    "BlobTriggerConnection__blobServiceUri",
]

# Unset and empty values are both treated as missing.
_missing = [v for v in _REQUIRED_ENV_VARS if not environ.get(v)]
if _missing:
    # EnvironmentError is an alias of OSError in Python 3.
    raise EnvironmentError(
        f"DocumentLoaderFunction: missing required environment variables: {', '.join(_missing)}"
    )
```

This causes the worker to refuse to start rather than silently dropping documents.

### Priority 2 — Remove runtime `os.environ` mutation (addresses PYCFG-ENVVAR-004)

Replace the `environ["OPENAI_API_TYPE"]` / `environ["OPENAI_API_KEY"]` writes with direct constructor parameters on `AzureOpenAIEmbeddings`, passing `azure_ad_token` from `credential.get_token(...)` directly.
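
A minimal sketch of that change follows, under stated assumptions. `build_embeddings_kwargs` is a hypothetical helper. The keyword names (`azure_deployment`, `openai_api_version`, `azure_endpoint`, `azure_ad_token`) are assumed to match the installed LangChain version and should be checked against it. The scope string is the usual Azure Cognitive Services scope, not a value taken from the repository.

```python
from typing import Any, Mapping

# Assumed token scope for Azure OpenAI / Cognitive Services.
COGNITIVE_SERVICES_SCOPE = "https://cognitiveservices.azure.com/.default"


def build_embeddings_kwargs(credential: Any, env: Mapping[str, str]) -> dict:
    """Build constructor kwargs without writing anything to os.environ.

    `credential` is any object exposing get_token(scope), such as
    DefaultAzureCredential. The token stays local to this invocation.
    """
    token = credential.get_token(COGNITIVE_SERVICES_SCOPE).token
    return {
        "azure_deployment": env["AZURE_OPENAI_EMBEDDING"],
        "openai_api_version": env["AZURE_OPENAI_API_VERSION"],
        "azure_endpoint": env["AZURE_OPENAI_ENDPOINT"],
        "azure_ad_token": token,
    }


# Hypothetical call site inside the handler (not executed here):
#   from os import environ
#   from azure.identity import DefaultAzureCredential
#   from langchain_openai import AzureOpenAIEmbeddings
#   embeddings = AzureOpenAIEmbeddings(
#       **build_embeddings_kwargs(DefaultAzureCredential(), environ)
#   )
```

Because the token is a local value, concurrent invocations cannot overwrite each other's credentials. Tokens expire, so if the embeddings client is cached across invocations, a token-provider callback may be a better fit where the installed library supports one.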

### Priority 3 — Commit `local.settings.json.example` (addresses PYCFG-LOCAL-001)

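Create `src/DocumentLoaderFunction/local.settings.json.example` listing all required variables with placeholder values, and document each variable's purpose alongside it (for example in the README, since JSON does not allow comments).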
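
To keep the template from drifting away from the variables the code actually reads, it could be generated from a single definition. The sketch below assumes that approach: the placeholder strings, the `AzureWebJobsStorage` development-storage value, and the name `render_settings_template` are all illustrative and do not come from the repository.

```python
import json

# Placeholder values only; real endpoints or keys must never be committed.
SETTINGS_TEMPLATE = {
    "IsEncrypted": False,
    "Values": {
        "FUNCTIONS_WORKER_RUNTIME": "python",
        # Assumes a local storage emulator; adjust for the team's setup.
        "AzureWebJobsStorage": "UseDevelopmentStorage=true",
        "AZURE_AI_SEARCH_ENDPOINT": "https://<search-service>.search.windows.net",
        "AZURE_AI_SEARCH_INDEX": "<index-name>",
        "AZURE_OPENAI_EMBEDDING": "<embedding-deployment-name>",
        "AZURE_OPENAI_API_VERSION": "<api-version>",
        "AZURE_OPENAI_ENDPOINT": "https://<openai-resource>.openai.azure.com",
        "BlobTriggerConnection__blobServiceUri": "https://<storage-account>.blob.core.windows.net",
    },
}


def render_settings_template() -> str:
    """Serialize the template in the layout used by local.settings.json."""
    return json.dumps(SETTINGS_TEMPLATE, indent=2) + "\n"


if __name__ == "__main__":
    # Redirect the output to local.settings.json.example.
    print(render_settings_template(), end="")
```

A developer would then copy the generated file to `local.settings.json` and replace the angle-bracket placeholders.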

### Priority 4 — Remove dead configuration entries (addresses PYCFG-ENVVAR-005, PYCFG-SECRET-001)

- Remove `AZURE_STORAGE_URL` from `loader-function.bicep`.
- Either implement Key Vault integration or remove `azure-keyvault-secrets` from `requirements.txt` and remove the `KeyVaultUri` Bicep setting.

### Priority 5 — Log level configuration (addresses PYCFG-LOGGING-001, PYCFG-ENVVAR-006)

- Add `LOG_LEVEL` env var support at startup.
- Downgrade full document content logging to `DEBUG`.
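
Both bullets can be sketched together. The helper names `resolve_log_level`, `configure_logging`, and `log_blob_received` are hypothetical, and the `INFO` fallback for unset or unrecognized values is an assumption rather than a project decision.

```python
import logging
import os
from typing import Mapping

_VALID_LEVELS = {"DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"}


def resolve_log_level(env: Mapping[str, str] = os.environ, default: str = "INFO") -> int:
    """Map LOG_LEVEL to a logging constant, falling back to `default`."""
    name = env.get("LOG_LEVEL", default).strip().upper()
    if name not in _VALID_LEVELS:
        name = default
    return getattr(logging, name)


def configure_logging() -> None:
    """Apply the level to the root logger; intended to run once at import time."""
    logging.getLogger().setLevel(resolve_log_level())


def log_blob_received(logger: logging.Logger, blob_name: str, payload: object) -> None:
    """Log an identifier at INFO and the full payload only at DEBUG."""
    logger.info("Processing blob %s", blob_name)
    if logger.isEnabledFor(logging.DEBUG):
        logger.debug("Blob content as JSON: %s", payload)
```

With `LOG_LEVEL` left at `INFO` in production, only the blob name reaches Application Insights. Setting `LOG_LEVEL=DEBUG` locally restores the full payload output.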

---

## Summary

| Severity | Count |
|---|---|
| Critical | 0 |
| Notable | 5 |
| Minor | 4 |
| Info | 2 |
| **Total** | **11** |

The function's security posture for production is sound — Managed Identity is used correctly and no secrets are committed. The primary risks are operational: missing startup validation means misconfigured deployments silently drop documents, and the absence of a `local.settings.json.example` makes local development setup undocumented. The runtime `os.environ` mutation pattern is a concurrency hazard worth addressing before the function sees high-throughput use.
