Adding semantic caching with Azure Managed Redis #3024

robertopc1 · 2025-12-17T16:16:11Z

Why make this change?

Closes #3023 — Adds semantic caching support so repeated (or semantically equivalent) SQL queries can be served from cache instead of re-executing against the database, reducing latency and database load. This also enables “near-duplicate” query reuse by caching against vector similarity rather than exact string matching.
Additional discussion/setup notes: semantic-cache-real-azure-openai-setup.md

What is this change?

- Introduces a new semantic caching pipeline for SQL query execution (MSSQL/MySQL/PostgreSQL) backed by:
  - Embeddings generated via an IEmbeddingService implementation (Azure OpenAI).
  - Vector storage + similarity search via ISemanticCache implemented on top of Azure Managed Redis vector capabilities.
- Adds runtime config support for semantic caching:
  - New config object models (SemanticCacheOptions, EmbeddingProviderOptions, AzureManagedRedisOptions) and JSON converter factories.
  - Runtime config loading + validation updates to enforce required semantic cache configuration.
- Wires semantic cache through the execution stack:
  - Updates QueryManagerFactory / QueryEngineFactory and SQL executors to use the semantic cache-aware QueryExecutor flow.
  - Updates service startup to register semantic cache services.
- Adds CLI support to generate semantic cache configuration via config generation paths (CLI options + config generator updates).

References:
Real Azure setup + testing guide: semantic-cache-real-azure-openai-setup.md
Redis vector similarity search concepts (for reviewers): https://redis.io/docs/latest/develop/interact/search-and-query/query/vector-search/
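
For reviewers less familiar with vector lookups, here is a minimal in-memory sketch of the idea behind a semantic cache hit: compare the query embedding with each stored embedding and accept the best match only if it reaches the similarity threshold. `SemanticLookupSketch` and its members are hypothetical names for illustration. The PR itself delegates the search to Redis, so this is not its implementation.

```csharp
using System;
using System.Collections.Generic;

public static class SemanticLookupSketch
{
    // Cosine similarity between two equal-length vectors; 1.0 means same direction.
    public static double CosineSimilarity(float[] a, float[] b)
    {
        if (a.Length != b.Length)
        {
            throw new ArgumentException("Vectors must have the same dimension.");
        }

        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.Length; i++)
        {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }

        // A zero vector has no direction, so treat it as "not similar".
        if (normA == 0 || normB == 0)
        {
            return 0;
        }

        return dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
    }

    // Returns the cached response whose embedding is most similar to the query,
    // or null when no entry reaches the threshold (a cache miss).
    public static string? FindBest(
        float[] query,
        IReadOnlyList<(float[] Embedding, string Response)> entries,
        double similarityThreshold)
    {
        string? best = null;
        double bestScore = similarityThreshold;
        foreach ((float[] embedding, string response) in entries)
        {
            double score = CosineSimilarity(query, embedding);
            if (score >= bestScore)
            {
                bestScore = score;
                best = response;
            }
        }

        return best;
    }
}
```

A higher threshold trades hit rate for safety: fewer near-duplicate queries are served from cache, but the risk of returning a response for a query that only looks similar goes down.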

How was this tested?

Integration Tests — SemanticCacheIntegrationTests.cs
Unit Tests — SemanticCacheOptionsTests.cs, SemanticCacheServiceTests.cs, AzureOpenAIEmbeddingServiceTests.cs
E2E Tests — SemanticCacheE2ETests.cs and CLI e2e updates in EndToEndTests.cs

Sample Request(s)

# First request (cache miss -> DB execution + cache write)
curl -s "http://localhost:5000/api/Books?\$filter=title%20eq%20'Dune'"

# Second request (expected semantic cache hit -> served from cache)
curl -s "http://localhost:5000/api/Books?\$filter=title%20eq%20'Dune'"

# First request (cache miss)
query { books(filter: { title: { eq: "Dune" } }) { id title } }

# Re-run the same (or semantically equivalent) query (expected cache hit)
query { books(filter: { title: { eq: "Dune" } }) { id title } }

- Add SemanticCacheOptions with similarity threshold, max results, TTL
- Add AzureManagedRedisOptions for Redis connection configuration
- Add EmbeddingProviderOptions for Azure OpenAI configuration
- Wire semantic cache options into RuntimeOptions and RuntimeConfig
- Add UserProvided flags following DAB repository patterns

- Add SemanticCacheOptionsConverterFactory with validation
- Add AzureManagedRedisOptionsConverterFactory
- Add EmbeddingProviderOptionsConverterFactory
- Register converters in RuntimeConfigLoader.GetSerializationOptions()
- Validate similarity threshold (0.0-1.0) and numeric fields

… Redis
- Implement AzureOpenAIEmbeddingService with exponential backoff retry
- Implement RedisVectorStore with RediSearch vector similarity (KNN)
- Implement SemanticCacheService orchestration layer
- Add SemanticCacheResult DTO
- Register services in DI with conditional configuration validation
- Use COSINE distance metric for text embeddings
- Support automatic Redis vector index creation

- Architecture overview and component descriptions
- Configuration examples and parameter reference
- Usage patterns and integration examples
- Performance characteristics and scalability guidance
- Troubleshooting guide and monitoring recommendations

Add ValidateSemanticCacheConfiguration() method to RuntimeConfigValidator to ensure semantic cache is properly configured when enabled.

**Validations:**
- Validates Azure Managed Redis connection string is not null/empty
- Validates embedding provider endpoint, API key, and model are configured
- Validates similarity-threshold is between 0.0 and 1.0
- Validates max-results and expire-seconds are positive integers
- Integrated into ValidateConfigProperties() for startup validation

Completes semantic caching infrastructure implementation.
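
The validation rules listed in this commit message can be sketched as a standalone function. `SemanticCacheSettings` below is a hypothetical, flattened stand-in for the PR's option models, and the error strings are illustrative. The actual ValidateSemanticCacheConfiguration() may be structured differently.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, simplified stand-in for the PR's option models.
public sealed record SemanticCacheSettings(
    string? RedisConnectionString,
    string? EmbeddingEndpoint,
    string? EmbeddingApiKey,
    string? EmbeddingModel,
    double SimilarityThreshold,
    int MaxResults,
    int ExpireSeconds);

public static class SemanticCacheValidationSketch
{
    // Collects every problem instead of stopping at the first one,
    // so a single startup failure can report all misconfigured fields.
    public static IReadOnlyList<string> Validate(SemanticCacheSettings s)
    {
        List<string> errors = new();

        if (string.IsNullOrWhiteSpace(s.RedisConnectionString))
        {
            errors.Add("Azure Managed Redis connection string is required.");
        }

        if (string.IsNullOrWhiteSpace(s.EmbeddingEndpoint))
        {
            errors.Add("Embedding provider endpoint is required.");
        }

        if (string.IsNullOrWhiteSpace(s.EmbeddingApiKey))
        {
            errors.Add("Embedding provider API key is required.");
        }

        if (string.IsNullOrWhiteSpace(s.EmbeddingModel))
        {
            errors.Add("Embedding provider model is required.");
        }

        // NaN fails every comparison, so reject it explicitly.
        if (double.IsNaN(s.SimilarityThreshold) ||
            s.SimilarityThreshold < 0.0 ||
            s.SimilarityThreshold > 1.0)
        {
            errors.Add("similarity-threshold must be between 0.0 and 1.0.");
        }

        if (s.MaxResults <= 0)
        {
            errors.Add("max-results must be a positive integer.");
        }

        if (s.ExpireSeconds <= 0)
        {
            errors.Add("expire-seconds must be a positive integer.");
        }

        return errors;
    }
}
```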

…ings

…r semantic caching

…rFactory, updating integration tests, e2e tests and readme file

RubenCerna2079 · 2025-12-22T15:26:18Z

Hi @robertopc1, once you think the PR is ready for review please change it from a draft to an open PR.

robertopc1 · 2026-01-05T18:49:43Z

> Hi @robertopc1, once you think the PR is ready for review please change it from a draft to an open PR.

Thank you @RubenCerna2079 - I just did :)

Copilot

Pull request overview

This PR adds semantic caching support for Data API Builder, enabling repeated or semantically similar SQL queries to be served from cache using vector similarity search rather than exact string matching. The implementation uses Azure OpenAI for generating embeddings and Azure Managed Redis (with RediSearch) for vector storage and similarity search.

Key changes:

- New semantic cache infrastructure with ISemanticCache and IEmbeddingService interfaces
- Integration with SQL query execution pipeline (MSSQL, MySQL, PostgreSQL)
- Runtime configuration support with CLI commands for semantic cache settings

Reviewed changes

Copilot reviewed 33 out of 33 changed files in this pull request and generated 14 comments.

| File | Description |
| --- | --- |
| src/Service/Startup.cs | Registers semantic cache services with DI container when enabled |
| src/Service/SemanticCache/*.cs | Core semantic cache implementation: service, Redis vector store, Azure OpenAI embedding service |
| src/Core/Services/*.cs | Service interfaces for semantic cache and embeddings |
| src/Core/Resolvers/SqlQueryEngine.cs | Integrates semantic cache into SQL query execution pipeline |
| src/Core/Resolvers/QueryExecutor.cs | Adds semantic cache check/store logic at executor level |
| src/Core/Resolvers/*QueryExecutor.cs | Updates MSSQL/MySQL/PostgreSQL executors to accept semantic cache services |
| src/Core/Resolvers/Factories/*.cs | Updates factories to pass semantic cache services to executors |
| src/Core/Configurations/RuntimeConfigValidator.cs | Adds validation for semantic cache configuration |
| src/Config/ObjectModel/*.cs | New configuration models for semantic cache options |
| src/Config/Converters/*.cs | JSON converters for semantic cache configuration |
| src/Config/RuntimeConfigLoader.cs | Registers semantic cache converters |
| src/Cli/*.cs | CLI support for configuring semantic cache via command line |
| src/Service.Tests/*.cs | Unit, integration, and E2E tests for semantic cache |
| src/Service/SemanticCache/README.md | Comprehensive documentation for semantic cache feature |
| docs/Testing/*.md | Setup guide for testing with real Azure OpenAI |


Copilot · 2026-01-05T18:56:29Z

src/Core/Resolvers/SqlQueryEngine.cs

+            // Check semantic cache first if enabled
+            if (runtimeConfig.IsSemanticCachingEnabled && 
+                _semanticCache is not null && 
+                _embeddingService is not null &&
+                structure.DbPolicyPredicatesForOperations[EntityActionOperation.Read] == string.Empty)
+            {
+                _logger.LogInformation(
+                    "Semantic cache IS ENABLED - will attempt to use it for query: {Query}",
+                    queryString.Substring(0, Math.Min(100, queryString.Length)));
+
+                try
+                {
+                    // Generate embedding for the query
+                    float[] embedding = await _embeddingService.GenerateEmbeddingAsync(queryString);
+
+                    _logger.LogDebug(
+                        "Generated embedding with {Dimensions} dimensions",
+                        embedding.Length);
+
+                    // Get semantic cache config
+                    var semanticCacheConfig = runtimeConfig.Runtime?.SemanticCache;
+                    int maxResults = semanticCacheConfig?.MaxResults ?? SemanticCacheOptions.DEFAULT_MAX_RESULTS;
+                    double similarityThreshold = semanticCacheConfig?.SimilarityThreshold ?? SemanticCacheOptions.DEFAULT_SIMILARITY_THRESHOLD;
+
+                    // Query semantic cache
+                    SemanticCacheResult? cacheResult = await _semanticCache.QueryAsync(
+                        embedding,
+                        maxResults,
+                        similarityThreshold);
+
+                    if (cacheResult is not null)
+                    {
+                        _logger.LogInformation(
+                            "Semantic cache hit! Similarity: {Similarity:F4} for query: {Query}",
+                            cacheResult.Similarity,
+                            queryString.Substring(0, Math.Min(100, queryString.Length)));
+
+                        // Parse cached JSON response back to JsonDocument
+                        return JsonDocument.Parse(cacheResult.Response);
+                    }
+
+                    _logger.LogDebug("Semantic cache miss for query: {Query}", 
+                        queryString.Substring(0, Math.Min(100, queryString.Length)));
+
+                    // Execute query against database
+                    JsonDocument? queryResponse = await ExecuteQueryAndCacheAsync(
+                        queryExecutor,
+                        queryString,
+                        structure,
+                        dataSourceName,
+                        embedding,
+                        runtimeConfig);
+
+                    return queryResponse;
+                }
+                catch (Exception ex)
+                {
+                    _logger.LogWarning(ex, "Semantic cache operation failed, falling back to normal execution");
+                    // Fall through to normal execution
+                }
+            }


The embedding generation logic is duplicated in both SqlQueryEngine (lines 334-394) and QueryExecutor (lines 958-1081). This creates a maintenance burden and potential for inconsistency. Consider consolidating the semantic cache check logic into a single location or creating a shared helper method.
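
One possible shape for such a shared helper, as a sketch only: the delegates stand in for the PR's IEmbeddingService, ISemanticCache, and the database call, and `SemanticCacheHelperSketch` is a hypothetical name. It computes the embedding once, uses it for both the lookup and the later store, and falls back to the database when any cache step fails.

```csharp
using System;
using System.Threading.Tasks;

public static class SemanticCacheHelperSketch
{
    // embed:   query text -> embedding
    // lookup:  embedding -> cached response, or null on a miss
    // execute: runs the query against the database
    // store:   saves (embedding, response) for later lookups
    public static async Task<string> GetOrExecuteAsync(
        string queryText,
        Func<string, Task<float[]>> embed,
        Func<float[], Task<string?>> lookup,
        Func<Task<string>> execute,
        Func<float[], string, Task> store)
    {
        float[]? embedding = null;
        try
        {
            embedding = await embed(queryText);
            string? cached = await lookup(embedding);
            if (cached is not null)
            {
                return cached;
            }
        }
        catch (Exception)
        {
            // Cache problems must not fail the request: skip the cache
            // entirely for this call and go to the database.
            embedding = null;
        }

        string response = await execute();

        if (embedding is not null)
        {
            try
            {
                // Reuse the embedding from the lookup; no second embedding call.
                await store(embedding, response);
            }
            catch (Exception)
            {
                // Best-effort write; the caller still gets the fresh response.
            }
        }

        return response;
    }
}
```

With a single entry point like this, SqlQueryEngine and QueryExecutor would not each need their own check/store logic, and the embedding would be generated once per request.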

Copilot · 2026-01-05T18:56:29Z

src/Service/SemanticCache/RedisVectorStore.cs

+            // Note: We'll use a default dimension (1536 for text-embedding-3-small)
+            // The actual dimension should match your embedding model
+            int defaultDimensions = 1536; // Adjust based on your embedding model


The hardcoded default dimension (1536) is specific to text-embedding-ada-002 and text-embedding-3-small models. If a user configures a different model (like text-embedding-3-large with 3072 dimensions), the index will be created with the wrong dimension size, causing vector search failures. Consider making the dimension configurable or dynamically determining it from the first stored embedding.
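
A sketch of one way to resolve the index dimension instead of hardcoding it. `EmbeddingDimensionSketch` and the `configured` parameter are hypothetical, since the PR's options do not necessarily expose a dimension setting. The model defaults in the table are the ones named in this comment.

```csharp
using System;
using System.Collections.Generic;

public static class EmbeddingDimensionSketch
{
    // Default output sizes for the models mentioned in this review comment.
    private static readonly Dictionary<string, int> KnownDimensions =
        new(StringComparer.OrdinalIgnoreCase)
        {
            ["text-embedding-ada-002"] = 1536,
            ["text-embedding-3-small"] = 1536,
            ["text-embedding-3-large"] = 3072,
        };

    // Order of preference: explicit configuration, then the length of a real
    // embedding, then the known default for the configured model.
    public static int Resolve(int? configured, string? model, float[]? sampleEmbedding)
    {
        if (configured.HasValue && configured.Value > 0)
        {
            return configured.Value;
        }

        if (sampleEmbedding is not null && sampleEmbedding.Length > 0)
        {
            return sampleEmbedding.Length;
        }

        if (model is not null && KnownDimensions.TryGetValue(model, out int dimensions))
        {
            return dimensions;
        }

        throw new InvalidOperationException(
            "Cannot determine the embedding dimension; configure it explicitly.");
    }
}
```

Failing loudly when the dimension is unknown is deliberate here: an index created with the wrong DIM would only surface later as failed vector searches.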

Copilot · 2026-01-05T18:56:29Z

src/Service/SemanticCache/AzureOpenAIEmbeddingService.cs

+        // Configure HTTP client
+        _httpClient.DefaultRequestHeaders.Add("api-key", _options.ApiKey);
+        _httpClient.Timeout = TimeSpan.FromSeconds(30);


The API key is added to the HttpClient's DefaultRequestHeaders, which means all instances created by this factory will share the same headers. If the factory creates clients for different purposes, this could leak the API key to unintended endpoints. Consider creating a named HttpClient specifically for Azure OpenAI or setting headers per request instead of on the client.
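
A minimal sketch of the per-request option, assuming the embedding call is a JSON POST. `EmbeddingRequestSketch` and `Build` are hypothetical names. The key is attached to one HttpRequestMessage, so the client's DefaultRequestHeaders never carry it.

```csharp
using System;
using System.Net.Http;
using System.Text;

public static class EmbeddingRequestSketch
{
    public static HttpRequestMessage Build(Uri endpoint, string apiKey, string jsonBody)
    {
        HttpRequestMessage request = new(HttpMethod.Post, endpoint)
        {
            Content = new StringContent(jsonBody, Encoding.UTF8, "application/json")
        };

        // The key travels with this one request only;
        // HttpClient.DefaultRequestHeaders stays untouched.
        request.Headers.Add("api-key", apiKey);
        return request;
    }
}
```

The caller would then pass the message to `HttpClient.SendAsync`, which lets the same client instance be reused for other endpoints without exposing the key to them.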

Copilot · 2026-01-05T18:56:30Z

src/Service/Startup.cs

+                if (semanticCacheOptions.AzureManagedRedis is null ||
+                    string.IsNullOrWhiteSpace(semanticCacheOptions.AzureManagedRedis.ConnectionString))
+                {
+                    throw new Exception("Semantic Cache: Azure Managed Redis connection string is required when semantic caching is enabled.");
+                }
+
+                if (semanticCacheOptions.EmbeddingProvider is null ||
+                    string.IsNullOrWhiteSpace(semanticCacheOptions.EmbeddingProvider.Endpoint))
+                {
+                    throw new Exception("Semantic Cache: Embedding provider endpoint is required when semantic caching is enabled.");
+                }


Using generic Exception for validation errors is too broad and doesn't provide a standardized error response. Consider using DataApiBuilderException with appropriate status codes and subStatusCodes to align with the existing error handling pattern used elsewhere in the codebase (see RuntimeConfigValidator for examples).

Copilot · 2026-01-05T18:56:30Z

src/Core/Resolvers/QueryExecutor.cs

+                // Generate embedding for SQL query
+                float[] embedding = await EmbeddingService.GenerateEmbeddingAsync(sqlText);


The query embedding is generated twice: once in SqlQueryEngine for cache lookup and again in QueryExecutor when storing the result. This doubles the cost and latency of embedding generation. The embedding should be passed between these methods to avoid redundant API calls to Azure OpenAI.

Suggested change

-                // Generate embedding for SQL query
-                float[] embedding = await EmbeddingService.GenerateEmbeddingAsync(sqlText);
+                // Reuse precomputed embedding for SQL query when available, otherwise generate it.
+                float[] embedding;
+                if (httpContext?.Items != null &&
+                    httpContext.Items.TryGetValue("SemanticCache.SqlQueryEmbedding", out object? existingEmbeddingObj) &&
+                    existingEmbeddingObj is float[] existingEmbedding)
+                {
+                    embedding = existingEmbedding;
+                }
+                else
+                {
+                    embedding = await EmbeddingService.GenerateEmbeddingAsync(sqlText);
+                }

Copilot · 2026-01-05T18:56:32Z

src/Service/SemanticCache/RedisVectorStore.cs

+            // Build FT.SEARCH query for vector similarity
+            // KNN query format: *=>[KNN K @field_name $vector AS score]
+            string indexName = GetIndexName();
+            string keyPrefix = _options.KeyPrefix ?? "resp:";


This assignment to keyPrefix is useless, since its value is never read.

Copilot · 2026-01-05T18:56:32Z

src/Service/SemanticCache/RedisVectorStore.cs

+            // Check if index exists using FT.INFO
+            try
+            {
+                var infoResult = await _database.ExecuteAsync("FT.INFO", indexName);


This assignment to infoResult is useless, since its value is never read.

Copilot · 2026-01-05T18:56:32Z

src/Service/SemanticCache/RedisVectorStore.cs

+            var createResult = await _database.ExecuteAsync(
+                "FT.CREATE",
+                indexName,
+                "ON", "HASH",
+                "PREFIX", "1", keyPrefix,
+                "SCHEMA",
+                FIELD_QUERY, "TEXT",
+                FIELD_EMBEDDING, "VECTOR", "FLAT", "6",
+                    "TYPE", "FLOAT32",
+                    "DIM", defaultDimensions.ToString(),
+                    "DISTANCE_METRIC", "COSINE",
+                FIELD_RESPONSE, "TEXT",
+                FIELD_TIMESTAMP, "NUMERIC",
+                FIELD_DIMENSIONS, "NUMERIC");


This assignment to createResult is useless, since its value is never read.

Copilot · 2026-01-05T18:56:32Z

src/Core/Resolvers/QueryExecutor.cs

+            if (sql.StartsWith("SELECT", StringComparison.OrdinalIgnoreCase))
+            {
+                if (sql.Contains("INFORMATION_SCHEMA", StringComparison.OrdinalIgnoreCase) ||
+                    sql.Contains("sys.", StringComparison.OrdinalIgnoreCase) ||
+                    sql.Contains("sys ", StringComparison.OrdinalIgnoreCase) ||
+                    sql.Contains("FROM sys", StringComparison.OrdinalIgnoreCase) ||
+                    sql.Contains("object_id(", StringComparison.OrdinalIgnoreCase))
+                {
+                    return false;
+                }


These 'if' statements can be combined.

Suggested change

-            if (sql.StartsWith("SELECT", StringComparison.OrdinalIgnoreCase))
-            {
-                if (sql.Contains("INFORMATION_SCHEMA", StringComparison.OrdinalIgnoreCase) ||
-                    sql.Contains("sys.", StringComparison.OrdinalIgnoreCase) ||
-                    sql.Contains("sys ", StringComparison.OrdinalIgnoreCase) ||
-                    sql.Contains("FROM sys", StringComparison.OrdinalIgnoreCase) ||
-                    sql.Contains("object_id(", StringComparison.OrdinalIgnoreCase))
-                {
-                    return false;
-                }
+            if (sql.StartsWith("SELECT", StringComparison.OrdinalIgnoreCase) &&
+                (sql.Contains("INFORMATION_SCHEMA", StringComparison.OrdinalIgnoreCase) ||
+                 sql.Contains("sys.", StringComparison.OrdinalIgnoreCase) ||
+                 sql.Contains("sys ", StringComparison.OrdinalIgnoreCase) ||
+                 sql.Contains("FROM sys", StringComparison.OrdinalIgnoreCase) ||
+                 sql.Contains("object_id(", StringComparison.OrdinalIgnoreCase)))
+            {
+                return false;

Copilot · 2026-01-05T18:56:33Z

src/Service/SemanticCache/AzureOpenAIEmbeddingService.cs

+                if (response.StatusCode == HttpStatusCode.TooManyRequests)
+                {
+                    if (attempt < MAX_RETRIES)
+                    {
+                        int delayMs = INITIAL_RETRY_DELAY_MS * (int)Math.Pow(2, attempt - 1);
+                        _logger.LogWarning(
+                            "Rate limited by Azure OpenAI. Retrying after {DelayMs}ms (attempt {Attempt}/{MaxRetries})",
+                            delayMs,
+                            attempt,
+                            MAX_RETRIES);
+                        await Task.Delay(delayMs, cancellationToken);
+                        continue;
+                    }


These 'if' statements can be combined.

Suggested change

-                if (response.StatusCode == HttpStatusCode.TooManyRequests)
-                {
-                    if (attempt < MAX_RETRIES)
-                    {
-                        int delayMs = INITIAL_RETRY_DELAY_MS * (int)Math.Pow(2, attempt - 1);
-                        _logger.LogWarning(
-                            "Rate limited by Azure OpenAI. Retrying after {DelayMs}ms (attempt {Attempt}/{MaxRetries})",
-                            delayMs,
-                            attempt,
-                            MAX_RETRIES);
-                        await Task.Delay(delayMs, cancellationToken);
-                        continue;
-                    }
+                if (response.StatusCode == HttpStatusCode.TooManyRequests && attempt < MAX_RETRIES)
+                {
+                    int delayMs = INITIAL_RETRY_DELAY_MS * (int)Math.Pow(2, attempt - 1);
+                    _logger.LogWarning(
+                        "Rate limited by Azure OpenAI. Retrying after {DelayMs}ms (attempt {Attempt}/{MaxRetries})",
+                        delayMs,
+                        attempt,
+                        MAX_RETRIES);
+                    await Task.Delay(delayMs, cancellationToken);
+                    continue;
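
For reference, the delay schedule used in this retry loop can be isolated into a small function. This is a sketch under assumptions: `BackoffSketch` is a hypothetical name, the upper cap is an addition that the hunk above does not have, and the concrete values of INITIAL_RETRY_DELAY_MS and MAX_RETRIES are not shown in this thread.

```csharp
using System;

public static class BackoffSketch
{
    // Delay before the retry that follows failed attempt N (1-based):
    // initialDelayMs, then 2x, 4x, ... capped at maxDelayMs.
    public static int DelayMs(int attempt, int initialDelayMs, int maxDelayMs)
    {
        if (attempt < 1)
        {
            throw new ArgumentOutOfRangeException(nameof(attempt));
        }

        // Shift in 64-bit arithmetic and clamp the exponent so large
        // attempt numbers cannot overflow before the cap is applied.
        long delay = (long)initialDelayMs << Math.Min(attempt - 1, 30);
        return (int)Math.Min(delay, maxDelayMs);
    }
}
```

With an assumed initial delay of 1000 ms and a 30000 ms cap, attempts 1, 2, and 3 wait 1000, 2000, and 4000 ms. The integer shift gives the same doubling as `Math.Pow(2, attempt - 1)` without the round trip through double.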

JerryNixon · 2026-01-05T20:22:01Z

I am a little concerned that such a large PR was submitted for a new feature by the same author without any planning. Coupling to Azure OpenAI and Azure Redis so tightly feels like we are moving away from our core principles. Then again, I am open to advanced features like this, especially when they bring such high value to our customers. But we need to discuss this before moving forward on this plan.

Roberto Perez and others added 13 commits December 8, 2025 15:20

feat: Add semantic caching with Azure Managed Redis and OpenAI embedd…

44c835a

…ings

docs: Clarify SQL-only scope and add comprehensive config examples fo…

6d27c75

…r semantic caching

Adding Unit Tests

788e08f

Adding CLI Support, Unit Tests and Integration Tests

2a7fad3

feat: Add E2E tests for semantic caching with real Azure OpenAI

92d02ae

Adding fixes

a578a1b

removing some unnesessary files

00a844a

Adding ISemantCache and IEmbeddingService to the executors and Manage…

b461722

…rFactory, updating integration tests, e2e tests and readme file

robertopc1 marked this pull request as ready for review January 5, 2026 18:45

Copilot AI review requested due to automatic review settings January 5, 2026 18:45

robertopc1 requested review from Alekhya-Polavarapu, Aniruddh25, RubenCerna2079, aaronburtle, anushakolan, neeraj-sharma2592, rusamant, sourabh1007, souvikghosh04 and vadeveka as code owners January 5, 2026 18:45

Copilot started reviewing on behalf of robertopc1 January 5, 2026 18:46

Copilot AI reviewed Jan 5, 2026



3 participants