@robertopc1

Why make this change?

Closes #3023 — Adds semantic caching support so repeated (or semantically equivalent) SQL queries can be served from cache instead of re-executing against the database, reducing latency and database load. This also enables “near-duplicate” query reuse by caching against vector similarity rather than exact string matching.
Additional discussion/setup notes: semantic-cache-real-azure-openai-setup.md

What is this change?

  • Introduces a new semantic caching pipeline for SQL query execution (MSSQL/MySQL/PostgreSQL) backed by:
      ◦ Embeddings generated via an IEmbeddingService implementation (Azure OpenAI).
      ◦ Vector storage + similarity search via ISemanticCache implemented on top of Azure Managed Redis vector capabilities.

  • Adds runtime config support for semantic caching:
      ◦ New config object models (SemanticCacheOptions, EmbeddingProviderOptions, AzureManagedRedisOptions) and JSON converter factories.
      ◦ Runtime config loading + validation updates to enforce required semantic cache configuration.

  • Wires semantic cache through the execution stack:
      ◦ Updates QueryManagerFactory / QueryEngineFactory and SQL executors to use the semantic cache-aware QueryExecutor flow.
      ◦ Updates service startup to register semantic cache services.

  • Adds CLI support to generate semantic cache configuration via config generation paths (CLI options + config generator updates).
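
For reviewers unfamiliar with similarity-based lookup, the following is a minimal in-memory sketch of the matching step. It is not the Redis-backed implementation from this PR: `InMemorySemanticCache`, `Store`, and `Query` are hypothetical names, and a linear scan stands in for the vector index.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical in-memory stand-in for a vector-backed semantic cache.
public sealed class InMemorySemanticCache
{
    private readonly List<(float[] Embedding, string Response)> _entries = new();

    public void Store(float[] embedding, string response) =>
        _entries.Add((embedding, response));

    // Returns the cached response whose embedding is most similar to the
    // query embedding, or null when no entry reaches the threshold.
    public string? Query(float[] embedding, double similarityThreshold)
    {
        string? best = null;
        double bestScore = similarityThreshold;
        foreach ((float[] candidate, string response) in _entries)
        {
            double score = CosineSimilarity(embedding, candidate);
            if (score >= bestScore)
            {
                bestScore = score;
                best = response;
            }
        }
        return best;
    }

    public static double CosineSimilarity(float[] a, float[] b)
    {
        if (a.Length != b.Length)
        {
            throw new ArgumentException("Embeddings must have the same dimension.");
        }

        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.Length; i++)
        {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }

        // A zero vector has no direction, so treat it as dissimilar to everything.
        if (normA == 0 || normB == 0)
        {
            return 0;
        }

        return dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
    }
}
```

A higher threshold trades hit rate for safety: near 1.0 only almost identical queries match, while lower values risk returning a response for a query that merely looks similar.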

References:
Real Azure setup + testing guide: semantic-cache-real-azure-openai-setup.md
Redis vector similarity search concepts (for reviewers): https://redis.io/docs/latest/develop/interact/search-and-query/query/vector-search/

How was this tested?

Integration Tests — SemanticCacheIntegrationTests.cs
Unit Tests — SemanticCacheOptionsTests.cs, SemanticCacheServiceTests.cs, AzureOpenAIEmbeddingServiceTests.cs
E2E Tests — SemanticCacheE2ETests.cs and CLI e2e updates in EndToEndTests.cs

Sample Request(s)

# First request (cache miss -> DB execution + cache write)
curl -s "http://localhost:5000/api/Books?\$filter=title%20eq%20'Dune'"

# Second request (expected semantic cache hit -> served from cache)
curl -s "http://localhost:5000/api/Books?\$filter=title%20eq%20'Dune'"

# First request (cache miss)
query { books(filter: { title: { eq: "Dune" } }) { id title } }

# Re-run the same (or semantically equivalent) query (expected cache hit)
query { books(filter: { title: { eq: "Dune" } }) { id title } }

Roberto Perez and others added 13 commits December 8, 2025 15:20
- Add SemanticCacheOptions with similarity threshold, max results, TTL
- Add AzureManagedRedisOptions for Redis connection configuration
- Add EmbeddingProviderOptions for Azure OpenAI configuration
- Wire semantic cache options into RuntimeOptions and RuntimeConfig
- Add UserProvided flags following DAB repository patterns
- Add SemanticCacheOptionsConverterFactory with validation
- Add AzureManagedRedisOptionsConverterFactory
- Add EmbeddingProviderOptionsConverterFactory
- Register converters in RuntimeConfigLoader.GetSerializationOptions()
- Validate similarity threshold (0.0-1.0) and numeric fields
… Redis

- Implement AzureOpenAIEmbeddingService with exponential backoff retry
- Implement RedisVectorStore with RediSearch vector similarity (KNN)
- Implement SemanticCacheService orchestration layer
- Add SemanticCacheResult DTO
- Register services in DI with conditional configuration validation
- Use COSINE distance metric for text embeddings
- Support automatic Redis vector index creation
- Architecture overview and component descriptions
- Configuration examples and parameter reference
- Usage patterns and integration examples
- Performance characteristics and scalability guidance
- Troubleshooting guide and monitoring recommendations
Add ValidateSemanticCacheConfiguration() method to RuntimeConfigValidator
to ensure semantic cache is properly configured when enabled.

**Validations:**
- Validates Azure Managed Redis connection string is not null/empty
- Validates embedding provider endpoint, API key, and model are configured
- Validates similarity-threshold is between 0.0 and 1.0
- Validates max-results and expire-seconds are positive integers
- Integrated into ValidateConfigProperties() for startup validation

Completes semantic caching infrastructure implementation.
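
The validation rules listed above can be sketched as a small standalone check. This is an illustration only, not the code added to RuntimeConfigValidator: `SemanticCacheSettings` and `SemanticCacheSettingsValidator` are hypothetical names, and the flat record is a simplification of the PR's SemanticCacheOptions, AzureManagedRedisOptions, and EmbeddingProviderOptions models.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, flattened settings shape used only for this sketch.
public sealed record SemanticCacheSettings(
    string? RedisConnectionString,
    string? EmbeddingEndpoint,
    string? EmbeddingApiKey,
    string? EmbeddingModel,
    double SimilarityThreshold,
    int MaxResults,
    int ExpireSeconds);

public static class SemanticCacheSettingsValidator
{
    // Collects every violation instead of stopping at the first one,
    // so a user can fix the whole config in a single pass.
    public static IReadOnlyList<string> Validate(SemanticCacheSettings s)
    {
        var errors = new List<string>();

        if (string.IsNullOrWhiteSpace(s.RedisConnectionString))
            errors.Add("Azure Managed Redis connection string is required.");
        if (string.IsNullOrWhiteSpace(s.EmbeddingEndpoint))
            errors.Add("Embedding provider endpoint is required.");
        if (string.IsNullOrWhiteSpace(s.EmbeddingApiKey))
            errors.Add("Embedding provider API key is required.");
        if (string.IsNullOrWhiteSpace(s.EmbeddingModel))
            errors.Add("Embedding provider model is required.");

        // NaN fails both range comparisons silently, so check it explicitly.
        if (double.IsNaN(s.SimilarityThreshold) ||
            s.SimilarityThreshold < 0.0 ||
            s.SimilarityThreshold > 1.0)
            errors.Add("similarity-threshold must be between 0.0 and 1.0.");

        if (s.MaxResults <= 0)
            errors.Add("max-results must be a positive integer.");
        if (s.ExpireSeconds <= 0)
            errors.Add("expire-seconds must be a positive integer.");

        return errors;
    }
}
```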
…rFactory, updating integration tests, e2e tests and readme file
@RubenCerna2079
Contributor

Hi @robertopc1, once you think the PR is ready for review please change it from a draft to an open PR.

@robertopc1
Author

> Hi @robertopc1, once you think the PR is ready for review please change it from a draft to an open PR.

Thank you @RubenCerna2079 - I just did :)

Contributor

Copilot AI left a comment

Pull request overview

This PR adds semantic caching support for Data API Builder, enabling repeated or semantically similar SQL queries to be served from cache using vector similarity search rather than exact string matching. The implementation uses Azure OpenAI for generating embeddings and Azure Managed Redis (with RediSearch) for vector storage and similarity search.

Key changes:

  • New semantic cache infrastructure with ISemanticCache and IEmbeddingService interfaces
  • Integration with SQL query execution pipeline (MSSQL, MySQL, PostgreSQL)
  • Runtime configuration support with CLI commands for semantic cache settings

Reviewed changes

Copilot reviewed 33 out of 33 changed files in this pull request and generated 14 comments.

| File | Description |
| --- | --- |
| `src/Service/Startup.cs` | Registers semantic cache services with DI container when enabled |
| `src/Service/SemanticCache/*.cs` | Core semantic cache implementation: service, Redis vector store, Azure OpenAI embedding service |
| `src/Core/Services/*.cs` | Service interfaces for semantic cache and embeddings |
| `src/Core/Resolvers/SqlQueryEngine.cs` | Integrates semantic cache into SQL query execution pipeline |
| `src/Core/Resolvers/QueryExecutor.cs` | Adds semantic cache check/store logic at executor level |
| `src/Core/Resolvers/*QueryExecutor.cs` | Updates MSSQL/MySQL/PostgreSQL executors to accept semantic cache services |
| `src/Core/Resolvers/Factories/*.cs` | Updates factories to pass semantic cache services to executors |
| `src/Core/Configurations/RuntimeConfigValidator.cs` | Adds validation for semantic cache configuration |
| `src/Config/ObjectModel/*.cs` | New configuration models for semantic cache options |
| `src/Config/Converters/*.cs` | JSON converters for semantic cache configuration |
| `src/Config/RuntimeConfigLoader.cs` | Registers semantic cache converters |
| `src/Cli/*.cs` | CLI support for configuring semantic cache via command line |
| `src/Service.Tests/*.cs` | Unit, integration, and E2E tests for semantic cache |
| `src/Service/SemanticCache/README.md` | Comprehensive documentation for semantic cache feature |
| `docs/Testing/*.md` | Setup guide for testing with real Azure OpenAI |

Comment on lines +334 to +394
// Check semantic cache first if enabled
if (runtimeConfig.IsSemanticCachingEnabled &&
    _semanticCache is not null &&
    _embeddingService is not null &&
    structure.DbPolicyPredicatesForOperations[EntityActionOperation.Read] == string.Empty)
{
    _logger.LogInformation(
        "Semantic cache IS ENABLED - will attempt to use it for query: {Query}",
        queryString.Substring(0, Math.Min(100, queryString.Length)));

    try
    {
        // Generate embedding for the query
        float[] embedding = await _embeddingService.GenerateEmbeddingAsync(queryString);

        _logger.LogDebug(
            "Generated embedding with {Dimensions} dimensions",
            embedding.Length);

        // Get semantic cache config
        var semanticCacheConfig = runtimeConfig.Runtime?.SemanticCache;
        int maxResults = semanticCacheConfig?.MaxResults ?? SemanticCacheOptions.DEFAULT_MAX_RESULTS;
        double similarityThreshold = semanticCacheConfig?.SimilarityThreshold ?? SemanticCacheOptions.DEFAULT_SIMILARITY_THRESHOLD;

        // Query semantic cache
        SemanticCacheResult? cacheResult = await _semanticCache.QueryAsync(
            embedding,
            maxResults,
            similarityThreshold);

        if (cacheResult is not null)
        {
            _logger.LogInformation(
                "Semantic cache hit! Similarity: {Similarity:F4} for query: {Query}",
                cacheResult.Similarity,
                queryString.Substring(0, Math.Min(100, queryString.Length)));

            // Parse cached JSON response back to JsonDocument
            return JsonDocument.Parse(cacheResult.Response);
        }

        _logger.LogDebug("Semantic cache miss for query: {Query}",
            queryString.Substring(0, Math.Min(100, queryString.Length)));

        // Execute query against database
        JsonDocument? queryResponse = await ExecuteQueryAndCacheAsync(
            queryExecutor,
            queryString,
            structure,
            dataSourceName,
            embedding,
            runtimeConfig);

        return queryResponse;
    }
    catch (Exception ex)
    {
        _logger.LogWarning(ex, "Semantic cache operation failed, falling back to normal execution");
        // Fall through to normal execution
    }
}
Copilot AI Jan 5, 2026

The embedding generation logic is duplicated in both SqlQueryEngine (lines 334-394) and QueryExecutor (lines 958-1081). This creates a maintenance burden and potential for inconsistency. Consider consolidating the semantic cache check logic into a single location or creating a shared helper method.
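
One possible shape for the consolidation suggested above is a single helper that owns the check, execute, and store sequence, so the embedding is computed once and reused. This is a sketch under assumptions, not code from the PR: `SemanticCacheFlow` and `GetOrExecuteAsync` are hypothetical names, and plain delegates stand in for IEmbeddingService, ISemanticCache, and the query executor.

```csharp
using System;
using System.Threading.Tasks;

public static class SemanticCacheFlow
{
    // Hypothetical shared helper: one embedding per query, used for both
    // the cache lookup and the later cache write.
    public static async Task<string> GetOrExecuteAsync(
        string queryText,
        Func<string, Task<float[]>> embed,
        Func<float[], Task<string?>> lookup,
        Func<Task<string>> executeQuery,
        Func<float[], string, Task> store)
    {
        float[] embedding = await embed(queryText);

        string? cached = await lookup(embedding);
        if (cached is not null)
        {
            return cached; // cache hit: the database is never touched
        }

        string fresh = await executeQuery();
        await store(embedding, fresh); // reuse the embedding computed above
        return fresh;
    }
}
```

Both SqlQueryEngine and QueryExecutor could then call one method instead of each carrying its own copy of the logic.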

Copilot uses AI. Check for mistakes.
Comment on lines +291 to +293
// Note: We'll use a default dimension (1536 for text-embedding-3-small)
// The actual dimension should match your embedding model
int defaultDimensions = 1536; // Adjust based on your embedding model
Copilot AI Jan 5, 2026

The hardcoded default dimension (1536) is specific to text-embedding-ada-002 and text-embedding-3-small models. If a user configures a different model (like text-embedding-3-large with 3072 dimensions), the index will be created with the wrong dimension size, causing vector search failures. Consider making the dimension configurable or dynamically determining it from the first stored embedding.
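
One way to act on this suggestion is to derive `DIM` from the first embedding that is about to be stored. The sketch below builds the FT.CREATE argument list from that embedding; `VectorIndexSchema` and `BuildCreateIndexArgs` are hypothetical names, and the field names `query`, `embedding`, and `response` are placeholders for the PR's FIELD_* constants, whose values are not shown in this conversation.

```csharp
using System;
using System.Globalization;

public static class VectorIndexSchema
{
    // Builds FT.CREATE arguments with DIM taken from an actual embedding,
    // so the index always matches the configured model's output size.
    public static object[] BuildCreateIndexArgs(
        string indexName,
        string keyPrefix,
        float[] firstEmbedding)
    {
        if (firstEmbedding is null || firstEmbedding.Length == 0)
        {
            throw new ArgumentException(
                "An embedding is required to infer the index dimension.",
                nameof(firstEmbedding));
        }

        string dim = firstEmbedding.Length.ToString(CultureInfo.InvariantCulture);

        return new object[]
        {
            indexName,
            "ON", "HASH",
            "PREFIX", "1", keyPrefix,
            "SCHEMA",
            "query", "TEXT",                 // placeholder field name
            // "6" is the count of attribute tokens that follow (3 name/value pairs).
            "embedding", "VECTOR", "FLAT", "6",
            "TYPE", "FLOAT32",
            "DIM", dim,
            "DISTANCE_METRIC", "COSINE",
            "response", "TEXT"               // placeholder field name
        };
    }
}
```

The result can be passed as the argument array to the same `ExecuteAsync("FT.CREATE", ...)` call the PR already uses. An existing index created with a different dimension would still need to be dropped or versioned, which this sketch does not cover.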

Comment on lines +55 to +57
// Configure HTTP client
_httpClient.DefaultRequestHeaders.Add("api-key", _options.ApiKey);
_httpClient.Timeout = TimeSpan.FromSeconds(30);
Copilot AI Jan 5, 2026

The API key is added to the HttpClient's DefaultRequestHeaders, which means all instances created by this factory will share the same headers. If the factory creates clients for different purposes, this could leak the API key to unintended endpoints. Consider creating a named HttpClient specifically for Azure OpenAI or setting headers per request instead of on the client.
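
A minimal sketch of the per-request alternative, assuming the key travels in an `api-key` header as in the snippet above. `EmbeddingRequestFactory` is a hypothetical name, and the JSON body is supplied by the caller.

```csharp
using System;
using System.Net.Http;
using System.Text;

public static class EmbeddingRequestFactory
{
    // Attaches the key to a single outgoing request, leaving the
    // HttpClient's DefaultRequestHeaders untouched.
    public static HttpRequestMessage Create(Uri endpoint, string apiKey, string jsonBody)
    {
        var request = new HttpRequestMessage(HttpMethod.Post, endpoint)
        {
            Content = new StringContent(jsonBody, Encoding.UTF8, "application/json")
        };
        request.Headers.Add("api-key", apiKey);
        return request;
    }
}
```

The request would then be sent with `HttpClient.SendAsync(request, cancellationToken)`, so the key only ever accompanies calls addressed to the embedding endpoint.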

Comment on lines +473 to +483
if (semanticCacheOptions.AzureManagedRedis is null ||
    string.IsNullOrWhiteSpace(semanticCacheOptions.AzureManagedRedis.ConnectionString))
{
    throw new Exception("Semantic Cache: Azure Managed Redis connection string is required when semantic caching is enabled.");
}

if (semanticCacheOptions.EmbeddingProvider is null ||
    string.IsNullOrWhiteSpace(semanticCacheOptions.EmbeddingProvider.Endpoint))
{
    throw new Exception("Semantic Cache: Embedding provider endpoint is required when semantic caching is enabled.");
}
Copilot AI Jan 5, 2026

Using generic Exception for validation errors is too broad and doesn't provide a standardized error response. Consider using DataApiBuilderException with appropriate status codes and subStatusCodes to align with the existing error handling pattern used elsewhere in the codebase (see RuntimeConfigValidator for examples).

Comment on lines +1051 to +1052
// Generate embedding for SQL query
float[] embedding = await EmbeddingService.GenerateEmbeddingAsync(sqlText);
Copilot AI Jan 5, 2026

The query embedding is generated twice: once in SqlQueryEngine for cache lookup and again in QueryExecutor when storing the result. This doubles the cost and latency of embedding generation. The embedding should be passed between these methods to avoid redundant API calls to Azure OpenAI.

Suggested change
- // Generate embedding for SQL query
- float[] embedding = await EmbeddingService.GenerateEmbeddingAsync(sqlText);
+ // Reuse precomputed embedding for SQL query when available, otherwise generate it.
+ float[] embedding;
+ if (httpContext?.Items != null &&
+     httpContext.Items.TryGetValue("SemanticCache.SqlQueryEmbedding", out object? existingEmbeddingObj) &&
+     existingEmbeddingObj is float[] existingEmbedding)
+ {
+     embedding = existingEmbedding;
+ }
+ else
+ {
+     embedding = await EmbeddingService.GenerateEmbeddingAsync(sqlText);
+ }

// Build FT.SEARCH query for vector similarity
// KNN query format: *=>[KNN K @field_name $vector AS score]
string indexName = GetIndexName();
string keyPrefix = _options.KeyPrefix ?? "resp:";
Copilot AI Jan 5, 2026

This assignment to keyPrefix is useless, since its value is never read.
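
For context on the KNN query format mentioned in the snippet's comment, here is a small sketch of the three pieces a vector lookup typically needs: the query string, the binary vector parameter, and the conversion from distance to similarity. `KnnQuery` and its members are hypothetical names and `embedding` is a placeholder field name. The conversion assumes the index uses the COSINE metric, where the returned score is a distance equal to 1 minus the cosine similarity.

```csharp
using System;
using System.Runtime.InteropServices;

public static class KnnQuery
{
    // Produces e.g. "*=>[KNN 5 @embedding $vector AS score]".
    // "$vector" is a query parameter name, not C# interpolation.
    public static string Build(int k, string vectorField) =>
        $"*=>[KNN {k} @{vectorField} $vector AS score]";

    // FLOAT32 vectors are passed as a raw byte blob (4 bytes per element,
    // in the machine's native byte order).
    public static byte[] ToBlob(float[] embedding) =>
        MemoryMarshal.AsBytes(embedding.AsSpan()).ToArray();

    // Converts a cosine distance returned by the search into a similarity
    // that can be compared against a similarity threshold.
    public static double ToSimilarity(double cosineDistance) =>
        1.0 - cosineDistance;
}
```

With FT.SEARCH, the blob is bound through `PARAMS 2 vector <blob>` and the query needs `DIALECT 2` for the parameter syntax to be accepted.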

// Check if index exists using FT.INFO
try
{
    var infoResult = await _database.ExecuteAsync("FT.INFO", indexName);
Copilot AI Jan 5, 2026

This assignment to infoResult is useless, since its value is never read.

Comment on lines +295 to +308
var createResult = await _database.ExecuteAsync(
    "FT.CREATE",
    indexName,
    "ON", "HASH",
    "PREFIX", "1", keyPrefix,
    "SCHEMA",
    FIELD_QUERY, "TEXT",
    FIELD_EMBEDDING, "VECTOR", "FLAT", "6",
    "TYPE", "FLOAT32",
    "DIM", defaultDimensions.ToString(),
    "DISTANCE_METRIC", "COSINE",
    FIELD_RESPONSE, "TEXT",
    FIELD_TIMESTAMP, "NUMERIC",
    FIELD_DIMENSIONS, "NUMERIC");
Copilot AI Jan 5, 2026

This assignment to createResult is useless, since its value is never read.

Comment on lines +936 to +945
if (sql.StartsWith("SELECT", StringComparison.OrdinalIgnoreCase))
{
    if (sql.Contains("INFORMATION_SCHEMA", StringComparison.OrdinalIgnoreCase) ||
        sql.Contains("sys.", StringComparison.OrdinalIgnoreCase) ||
        sql.Contains("sys ", StringComparison.OrdinalIgnoreCase) ||
        sql.Contains("FROM sys", StringComparison.OrdinalIgnoreCase) ||
        sql.Contains("object_id(", StringComparison.OrdinalIgnoreCase))
    {
        return false;
    }
Copilot AI Jan 5, 2026

These 'if' statements can be combined.

Suggested change
- if (sql.StartsWith("SELECT", StringComparison.OrdinalIgnoreCase))
- {
-     if (sql.Contains("INFORMATION_SCHEMA", StringComparison.OrdinalIgnoreCase) ||
-         sql.Contains("sys.", StringComparison.OrdinalIgnoreCase) ||
-         sql.Contains("sys ", StringComparison.OrdinalIgnoreCase) ||
-         sql.Contains("FROM sys", StringComparison.OrdinalIgnoreCase) ||
-         sql.Contains("object_id(", StringComparison.OrdinalIgnoreCase))
-     {
-         return false;
-     }
+ if (sql.StartsWith("SELECT", StringComparison.OrdinalIgnoreCase) &&
+     (sql.Contains("INFORMATION_SCHEMA", StringComparison.OrdinalIgnoreCase) ||
+      sql.Contains("sys.", StringComparison.OrdinalIgnoreCase) ||
+      sql.Contains("sys ", StringComparison.OrdinalIgnoreCase) ||
+      sql.Contains("FROM sys", StringComparison.OrdinalIgnoreCase) ||
+      sql.Contains("object_id(", StringComparison.OrdinalIgnoreCase)))
+ {
+     return false;

Comment on lines +96 to +108
if (response.StatusCode == HttpStatusCode.TooManyRequests)
{
    if (attempt < MAX_RETRIES)
    {
        int delayMs = INITIAL_RETRY_DELAY_MS * (int)Math.Pow(2, attempt - 1);
        _logger.LogWarning(
            "Rate limited by Azure OpenAI. Retrying after {DelayMs}ms (attempt {Attempt}/{MaxRetries})",
            delayMs,
            attempt,
            MAX_RETRIES);
        await Task.Delay(delayMs, cancellationToken);
        continue;
    }
Copilot AI Jan 5, 2026

These 'if' statements can be combined.

Suggested change
- if (response.StatusCode == HttpStatusCode.TooManyRequests)
- {
-     if (attempt < MAX_RETRIES)
-     {
-         int delayMs = INITIAL_RETRY_DELAY_MS * (int)Math.Pow(2, attempt - 1);
-         _logger.LogWarning(
-             "Rate limited by Azure OpenAI. Retrying after {DelayMs}ms (attempt {Attempt}/{MaxRetries})",
-             delayMs,
-             attempt,
-             MAX_RETRIES);
-         await Task.Delay(delayMs, cancellationToken);
-         continue;
-     }
+ if (response.StatusCode == HttpStatusCode.TooManyRequests && attempt < MAX_RETRIES)
+ {
+     int delayMs = INITIAL_RETRY_DELAY_MS * (int)Math.Pow(2, attempt - 1);
+     _logger.LogWarning(
+         "Rate limited by Azure OpenAI. Retrying after {DelayMs}ms (attempt {Attempt}/{MaxRetries})",
+         delayMs,
+         attempt,
+         MAX_RETRIES);
+     await Task.Delay(delayMs, cancellationToken);
+     continue;

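
The delay schedule used in the retry snippet above (initial delay doubled on each attempt) can be isolated into a small pure function, which makes it easy to unit test. This is a sketch, not the PR's code: `RetryBackoff` and `DelayFor` are hypothetical names, and the upper cap `maxDelay` is an addition that the original snippet does not have.

```csharp
using System;

public static class RetryBackoff
{
    // attempt is 1-based: attempt 1 waits initialDelay, attempt 2 waits
    // twice that, and so on, never exceeding maxDelay.
    public static TimeSpan DelayFor(int attempt, TimeSpan initialDelay, TimeSpan maxDelay)
    {
        if (attempt < 1)
        {
            throw new ArgumentOutOfRangeException(nameof(attempt), "Attempt numbers start at 1.");
        }

        // Clamp the exponent so the shift below cannot overflow.
        int exponent = Math.Min(attempt - 1, 30);
        double delayMs = initialDelay.TotalMilliseconds * (1L << exponent);

        return TimeSpan.FromMilliseconds(Math.Min(delayMs, maxDelay.TotalMilliseconds));
    }
}
```

With an initial delay of one second and a 30 second cap, attempts 1, 2, and 3 wait 1, 2, and 4 seconds respectively.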
@JerryNixon
Contributor

I am a little concerned that such a large PR was submitted for a new feature by the same author without any planning. Coupling to Azure OpenAI and Azure Redis so tightly feels like we are moving away from our core principles. Then again, I am open to advanced features like this, especially when they bring such high value to our customers. But we need to discuss this before moving forward on this plan.

Development

Successfully merging this pull request may close these issues.

Ability to do semantic caching with Azure Managed Redis

3 participants