docs: add ADR for SSE bulk evaluation change notifications #63
jonathannorris wants to merge 1 commit into main
Signed-off-by: Norris <jonathan.norris@dynatrace.com>
Summary of Changes
Hello @jonathannorris, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request introduces a new Architectural Decision Record (ADR) to integrate Server-Sent Events (SSE) into the OFREP specification. The primary goal is to enable real-time flag updates and reduce server load by moving away from exclusive polling for change detection. The ADR defines how SSE endpoints will be communicated, the format of notification events, and the expected behavior for providers to manage these connections and re-fetch evaluations efficiently.
Code Review
This PR introduces ADR-0008, proposing Server-Sent Events (SSE) for bulk evaluation change notifications in OFREP. The document outlines the motivation, decision, response schema, event format, provider behavior, OpenAPI schema additions, consequences, open questions, and implementation notes. The changes are well-documented and address a significant limitation of the current polling-only approach. The ADR is comprehensive and considers various aspects of SSE integration, including potential complexities and risks. The open questions section is particularly valuable for guiding future discussions and refinements.
| ``` | ||
|
|
||
| Event data fields: | ||
| - `type` (string, required): The event type. Providers must handle `refetchEvaluation` and must ignore unknown types for forward compatibility. |
| oneOf: | ||
| - type: integer | ||
| minimum: 0 | ||
| - type: string | ||
| required: false |
The oneOf schema for sseLastModified correctly specifies integer and string types. However, the examples section only provides epochSeconds for the integer type. It would be clearer to explicitly show examples for both isoDate and httpDate under the string type to fully illustrate the supported formats.
| oneOf: | |
| - type: integer | |
| minimum: 0 | |
| - type: string | |
| required: false | |
| schema: | |
| oneOf: | |
| - type: integer | |
| minimum: 0 | |
| - type: string | |
| required: false | |
| examples: | |
| epochSeconds: | |
| value: 1771622898 | |
| isoDate: | |
| value: "2026-02-20T21:28:18Z" | |
| httpDate: | |
| value: "Fri, 20 Feb 2026 21:28:18 GMT" |
Pull request overview
This PR adds ADR-0008 to propose Server-Sent Events (SSE) as a standardized mechanism for real-time flag change notifications in OFREP, addressing the polling limitations explicitly acknowledged in ADR-0005. The ADR follows the established pattern of building on vendor survey feedback and maintaining backward compatibility.
Changes:
- Introduces optional SSE connection endpoints in bulk evaluation responses for real-time change notifications
- Defines a notification-only pattern where SSE events trigger re-fetches rather than streaming full payloads
- Specifies SSE-specific metadata transport via query parameters (`sseEtag`, `sseLastModified`) for SSE-triggered requests
| 1. After the initial bulk evaluation response, if `sse` is present, the provider should connect to the provided URL(s). | ||
| 2. On receiving a `refetchEvaluation` event, the provider must re-fetch flag evaluations from the bulk evaluation endpoint. If `etag` is present, it should be sent as `sseEtag` query parameter. If `lastModified` is present, it should be sent as `sseLastModified` query parameter. These query parameters should only be included for requests directly triggered by processing that SSE event. | ||
| `lastModified` parsing should support Unix timestamp seconds and date string formats. | ||
| 3. If `inactivityDelaySec` is specified, the provider should close the SSE connection after the specified inactivity period. On resumption, it must reconnect and immediately re-fetch without SSE query metadata. |
The provider behavior guideline states providers "must" reconnect and immediately re-fetch after inactivity (line 130), but there's no specification of what should happen if this re-fetch fails. Should the provider retry, fall back to polling if configured, or emit an error event? The existing polling behavior in static-context-provider.md handles various error codes (401, 403, 429, etc.), but this ADR doesn't specify how SSE-triggered re-fetches should handle those same errors.
| 3. If `inactivityDelaySec` is specified, the provider should close the SSE connection after the specified inactivity period. On resumption, it must reconnect and immediately re-fetch without SSE query metadata. | |
| 3. If `inactivityDelaySec` is specified, the provider should close the SSE connection after the specified inactivity period. On resumption, it must reconnect and immediately re-fetch without SSE query metadata. This re-fetch is a normal bulk evaluation request and **must** follow the same error-handling, retry, and fallback semantics as other bulk evaluation or polling-triggered refresh requests (for example, handling 401/403/429/5xx according to the static-context provider guidelines) and must not enter an unbounded tight retry loop. |
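One way to read "must not enter an unbounded tight retry loop" is a capped exponential backoff with a fixed attempt budget. The sketch below is illustrative only: `retryDelayMs`, `isRetryableStatus`, and `BackoffOptions` are hypothetical names, and treating 429/5xx as retryable while 401/403 are not is an assumption here, not something the ADR or the static-context provider guideline states in this form.

```typescript
// Hypothetical helpers: none of these names come from the ADR.
interface BackoffOptions {
  baseMs: number;      // delay before the first retry
  maxMs: number;       // upper bound on any single delay
  maxAttempts: number; // stop retrying (or fall back to polling) after this many retries
}

// Returns the delay before retry number `attempt` (1-based), or undefined
// once the attempt budget is exhausted.
function retryDelayMs(attempt: number, opts: BackoffOptions): number | undefined {
  if (attempt < 1 || attempt > opts.maxAttempts) return undefined;
  const exponential = opts.baseMs * Math.pow(2, attempt - 1);
  return Math.min(exponential, opts.maxMs);
}

// Assumption: only 429 and 5xx responses are worth retrying automatically.
function isRetryableStatus(status: number): boolean {
  return status === 429 || (status >= 500 && status <= 599);
}
```

A provider would call `retryDelayMs` after each failed re-fetch and stop (or hand over to polling, if configured) when it returns `undefined`.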
| Reconnection and replay behavior: | ||
| - Providers should rely on standard SSE reconnect behavior and pass `Last-Event-ID` when supported by the client/runtime. | ||
| - Servers that support replay should emit stable event `id` values for `refetchEvaluation` events and replay missed events when `Last-Event-ID` is provided. | ||
| - Providers must perform an immediate bulk re-fetch after reconnect, even when replay is supported, to guarantee cache correctness across implementations with different replay retention policies. |
Line 86 states "Providers must perform an immediate bulk re-fetch after reconnect, even when replay is supported", but this contradicts the standard SSE reconnection pattern where Last-Event-ID is meant to resume from where the client left off. If replay is supported and working correctly, an immediate re-fetch would be redundant. Consider revising to "Providers should perform a bulk re-fetch after reconnect if replay is not supported or fails" or explaining why the redundant fetch is necessary despite replay.
| - Providers must perform an immediate bulk re-fetch after reconnect, even when replay is supported, to guarantee cache correctness across implementations with different replay retention policies. | |
| - Providers should perform an immediate bulk re-fetch after reconnect if replay is not supported, unavailable for the disconnect window, or otherwise fails, to guarantee cache correctness across implementations with different replay retention policies. |
| minimum: 0 | ||
| description: | | ||
| Number of seconds of client inactivity after which the SSE connection | ||
| should be closed to conserve resources. The client must reconnect | ||
| when activity resumes. If omitted or 0, the connection should be |
The OpenAPI schema addition specifies inactivityDelaySec with minimum: 0 (line 203) and states "If omitted or 0, the connection should be maintained indefinitely" (line 207-208). However, having a minimum of 0 means 0 is a valid value, which creates ambiguity - does 0 mean "close immediately" or "maintain indefinitely"? Consider either removing 0 from the valid range (use minimum: 1) or clarifying the semantics more explicitly.
| minimum: 0 | |
| description: | | |
| Number of seconds of client inactivity after which the SSE connection | |
| should be closed to conserve resources. The client must reconnect | |
| when activity resumes. If omitted or 0, the connection should be | |
| minimum: 1 | |
| description: | | |
| Number of seconds of client inactivity after which the SSE connection | |
| should be closed to conserve resources. The client must reconnect | |
| when activity resumes. If omitted, the connection should be |
| data: {"type": "refetchEvaluation", "etag": "\"abc123\"", "lastModified": 1771622898} | ||
| ``` | ||
|
|
||
| Event data fields: | ||
| - `type` (string, required): The event type. Providers must handle `refetchEvaluation` and must ignore unknown types for forward compatibility. | ||
| - `etag` (string, optional): Latest flag configuration validator sent over SSE metadata. If present, providers should include it as the `sseEtag` query parameter on the re-fetch request. | ||
| - `lastModified` (string | integer, optional): Latest flag configuration timestamp sent over SSE metadata. Supports either Unix timestamp in seconds (recommended) or a date string (ISO 8601 or HTTP-date). If present, providers should include it as the `sseLastModified` query parameter on the re-fetch request. | ||
|
|
||
| SSE envelope fields: | ||
| - `id` (string, recommended): Event identifier used by SSE clients for resume semantics via `Last-Event-ID`. | ||
|
|
||
| Reconnection and replay behavior: | ||
| - Providers should rely on standard SSE reconnect behavior and pass `Last-Event-ID` when supported by the client/runtime. | ||
| - Servers that support replay should emit stable event `id` values for `refetchEvaluation` events and replay missed events when `Last-Event-ID` is provided. | ||
| - Providers must perform an immediate bulk re-fetch after reconnect, even when replay is supported, to guarantee cache correctness across implementations with different replay retention policies. | ||
|
|
||
| Transporting SSE metadata to the bulk endpoint: | ||
| - `sseEtag` and `sseLastModified` are SSE-trigger metadata, not standard HTTP conditional request validators for endpoint-level response caching semantics. | ||
| - `sseEtag` and `sseLastModified` should only be sent when the re-fetch request is directly triggered by a received SSE message. | ||
| - For browser-based SDKs, query parameters avoid CORS preflight costs that would be introduced by custom headers. | ||
| - The metadata originates from the SSE channel, so query parameters make the source and intent explicit. | ||
| - This is particularly useful for implementations where the OFREP server validates internal cache state and storage freshness directly (for example, cache + object storage bindings) rather than forwarding conditional headers upstream. | ||
| - To reduce cross-language date parsing ambiguity, providers and servers should prefer Unix timestamp seconds for `lastModified` / `sseLastModified` when possible. | ||
|
|
||
| ### Provider Behavior | ||
|
|
||
| ```mermaid | ||
| sequenceDiagram | ||
| participant Client as OFREP Provider | ||
| participant Server as Flag Management System | ||
| participant SSE as SSE Endpoint | ||
|
|
||
| Client->>Server: POST /ofrep/v1/evaluate/flags | ||
| Server-->>Client: 200 OK (flags + sse URLs + ETag) | ||
| Client->>Client: Cache flags, store ETag | ||
| Client->>SSE: Connect to SSE URL(s) | ||
|
|
||
| Note over SSE,Client: Real-time change notification | ||
| SSE-->>Client: event: refetchEvaluation (etag, lastModified) | ||
| Client->>Server: POST /ofrep/v1/evaluate/flags?sseEtag=etag&sseLastModified=lastModified | ||
| alt Flags changed | ||
| Server-->>Client: 200 OK (new flags + ETag) | ||
| Client->>Client: Update cache, emit ConfigurationChanged | ||
| else Flags unchanged | ||
| Server-->>Client: 304 Not Modified | ||
| end | ||
|
|
||
| Note over Client: Browser tab backgrounded | ||
| Client->>SSE: Close connection (after inactivityDelaySec) | ||
| Note over Client: Browser tab foregrounded | ||
| Client->>SSE: Reconnect to SSE URL(s) | ||
| Client->>Server: POST /ofrep/v1/evaluate/flags | ||
| ``` | ||
|
|
||
| Provider implementation guidelines: | ||
| 1. After the initial bulk evaluation response, if `sse` is present, the provider should connect to the provided URL(s). | ||
| 2. On receiving a `refetchEvaluation` event, the provider must re-fetch flag evaluations from the bulk evaluation endpoint. If `etag` is present, it should be sent as `sseEtag` query parameter. If `lastModified` is present, it should be sent as `sseLastModified` query parameter. These query parameters should only be included for requests directly triggered by processing that SSE event. | ||
| `lastModified` parsing should support Unix timestamp seconds and date string formats. | ||
| 3. If `inactivityDelaySec` is specified, the provider should close the SSE connection after the specified inactivity period. On resumption, it must reconnect and immediately re-fetch without SSE query metadata. | ||
| 4. If the SSE connection fails or is unavailable, the provider must fall back to its configured change detection behavior: if polling is enabled, continue with polling; if polling is disabled, continue SSE reconnection attempts and rely on explicit refresh triggers such as `onContextChange`. | ||
| 5. Providers should implement reconnection with exponential backoff. The native `EventSource` API in browsers handles this automatically. | ||
| 6. When `onContextChange` is triggered, the provider re-fetches the bulk evaluation without SSE query metadata. The SSE URL(s) in the new response may differ, and the provider must update its connections accordingly. | ||
|
|
||
| ### OpenAPI Schema Additions | ||
|
|
||
| ```yaml | ||
| # Add to /ofrep/v1/evaluate/flags POST parameters: | ||
| - in: query | ||
| name: sseEtag | ||
| description: | | ||
| Optional SSE-provided ETag metadata for SSE-triggered re-fetches. This is | ||
| not a standard HTTP conditional request header; it is metadata for server-side | ||
| cache validation and freshness checks initiated by SSE events. It should only | ||
| be included when the request is directly triggered by a received SSE message. | ||
| schema: | ||
| type: string | ||
| required: false | ||
| example: "\"550e8400-e29b-41d4-a716-446655440000\"" |
The SSE event data example shows "etag": "\"abc123\"" with escaped quotes (line 72), but the query parameter example shows "\"550e8400-e29b-41d4-a716-446655440000\"" also with escaped quotes (line 149). This double-quoting convention (ETags typically include literal quote characters per RFC 7232) should be explicitly documented to avoid implementation confusion, as providers will need to handle the quote escaping when constructing the query parameter.
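To make the quoting concern concrete, here is a minimal sketch of how a provider might build the re-fetch URL. `buildRefetchUrl` and `RefetchEvent` are hypothetical names; the assumption is that the JSON-decoded `etag` still carries its literal double quotes and is percent-encoded as a whole.

```typescript
// Shape of the decoded SSE event data, limited to the fields used here.
interface RefetchEvent {
  etag?: string;
  lastModified?: string | number;
}

// Appends sseEtag / sseLastModified to the bulk evaluation URL.
// encodeURIComponent turns the literal quotes of an ETag into %22.
function buildRefetchUrl(bulkUrl: string, event: RefetchEvent): string {
  const params: string[] = [];
  if (event.etag !== undefined) {
    params.push("sseEtag=" + encodeURIComponent(event.etag));
  }
  if (event.lastModified !== undefined) {
    params.push("sseLastModified=" + encodeURIComponent(String(event.lastModified)));
  }
  if (params.length === 0) return bulkUrl;
  const separator = bulkUrl.indexOf("?") === -1 ? "?" : "&";
  return bulkUrl + separator + params.join("&");
}
```

For the event data shown in the ADR (`"etag": "\"abc123\""`, `"lastModified": 1771622898`), this yields `/ofrep/v1/evaluate/flags?sseEtag=%22abc123%22&sseLastModified=1771622898`.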
| Event data fields: | ||
| - `type` (string, required): The event type. Providers must handle `refetchEvaluation` and must ignore unknown types for forward compatibility. | ||
| - `etag` (string, optional): Latest flag configuration validator sent over SSE metadata. If present, providers should include it as the `sseEtag` query parameter on the re-fetch request. | ||
| - `lastModified` (string | integer, optional): Latest flag configuration timestamp sent over SSE metadata. Supports either Unix timestamp in seconds (recommended) or a date string (ISO 8601 or HTTP-date). If present, providers should include it as the `sseLastModified` query parameter on the re-fetch request. |
The specification states that lastModified in SSE events supports "either Unix timestamp in seconds (recommended) or a date string (ISO 8601 or HTTP-date)" but doesn't provide guidance on how providers should handle parsing failures or ambiguous formats. Since line 94 mentions "cross-language date parsing ambiguity", consider adding explicit error handling guidance for providers when they encounter unparseable lastModified values.
| - `lastModified` (string | integer, optional): Latest flag configuration timestamp sent over SSE metadata. Supports either Unix timestamp in seconds (recommended) or a date string (ISO 8601 or HTTP-date). If present, providers should include it as the `sseLastModified` query parameter on the re-fetch request. | |
| - `lastModified` (string | integer, optional): Latest flag configuration timestamp sent over SSE metadata. Supports either Unix timestamp in seconds (recommended) or a date string (ISO 8601 or HTTP-date). Servers **SHOULD** prefer Unix timestamps in seconds or unambiguous ISO 8601 / HTTP-date (IMF-fixdate) strings to avoid cross-language parsing ambiguity. Providers **MUST NOT** apply locale-specific or heuristic parsing, and if a `lastModified` value cannot be parsed or is otherwise ambiguous, they **MUST** treat it as absent (i.e., omit `sseLastModified` on the re-fetch) rather than guessing a value. If present and successfully parsed, providers should include it as the `sseLastModified` query parameter on the re-fetch request. |
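A strict parser along the lines of this suggestion could look like the sketch below. It is only an illustration: `lastModifiedToEpochSeconds` is a hypothetical name, and accepting only the `Z`-suffixed ISO 8601 form and IMF-fixdate is an assumption made to avoid locale-dependent parsing. Anything else is treated as absent.

```typescript
const MONTH_NAMES = ["Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"];
const ISO_UTC = /^(\d{4})-(\d{2})-(\d{2})T(\d{2}):(\d{2}):(\d{2})Z$/;
const IMF_FIXDATE = /^(?:Mon|Tue|Wed|Thu|Fri|Sat|Sun), (\d{2}) ([A-Z][a-z]{2}) (\d{4}) (\d{2}):(\d{2}):(\d{2}) GMT$/;

// Converts validated UTC calendar fields to epoch seconds, or undefined if out of range.
function utcSeconds(y: number, mo: number, d: number, h: number, mi: number, s: number): number | undefined {
  if (mo < 1 || mo > 12 || d < 1 || h > 23 || mi > 59 || s > 59) return undefined;
  const ms = Date.UTC(y, mo - 1, d, h, mi, s);
  // Reject dates that Date.UTC silently rolled over, such as 31 February.
  if (new Date(ms).getUTCDate() !== d) return undefined;
  return ms / 1000;
}

// Returns epoch seconds, or undefined when the value should be treated as absent.
function lastModifiedToEpochSeconds(value: unknown): number | undefined {
  if (typeof value === "number") {
    return isFinite(value) && Math.floor(value) === value && value >= 0 ? value : undefined;
  }
  if (typeof value !== "string") return undefined;
  const iso = ISO_UTC.exec(value);
  if (iso) {
    return utcSeconds(Number(iso[1]), Number(iso[2]), Number(iso[3]), Number(iso[4]), Number(iso[5]), Number(iso[6]));
  }
  const imf = IMF_FIXDATE.exec(value);
  if (imf) {
    const month = MONTH_NAMES.indexOf(String(imf[2])) + 1; // 0 when the name is unknown
    return utcSeconds(Number(imf[3]), month, Number(imf[1]), Number(imf[4]), Number(imf[5]), Number(imf[6]));
  }
  return undefined; // no heuristic fallback such as Date.parse
}
```

The sketch does not check that the weekday name in an HTTP-date matches the date itself; a stricter implementation could.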
| - Providers must perform an immediate bulk re-fetch after reconnect, even when replay is supported, to guarantee cache correctness across implementations with different replay retention policies. | ||
|
|
||
| Transporting SSE metadata to the bulk endpoint: | ||
| - `sseEtag` and `sseLastModified` are SSE-trigger metadata, not standard HTTP conditional request validators for endpoint-level response caching semantics. |
The term "SSE-trigger metadata" is introduced on line 89 but never formally defined. Consider adding a clear definition earlier in the document explaining that this refers to metadata originating from SSE events that should be passed to the bulk evaluation endpoint when refetching, to distinguish it from standard HTTP conditional request headers.
| Server-->>Client: 200 OK (new flags + ETag) | ||
| Client->>Client: Update cache, emit ConfigurationChanged | ||
| else Flags unchanged | ||
| Server-->>Client: 304 Not Modified |
The sequence diagram on line 116 shows a server returning "304 Not Modified" response, but there's no specification of which headers or query parameters the server should use to determine whether the flags have changed. The document explains that sseEtag and sseLastModified are "not standard HTTP conditional request validators" (line 89), but doesn't clarify how servers should use them to decide between returning 200 vs 304. Consider adding explicit guidance on server-side validation logic.
| 1. After the initial bulk evaluation response, if `sse` is present, the provider should connect to the provided URL(s). | ||
| 2. On receiving a `refetchEvaluation` event, the provider must re-fetch flag evaluations from the bulk evaluation endpoint. If `etag` is present, it should be sent as `sseEtag` query parameter. If `lastModified` is present, it should be sent as `sseLastModified` query parameter. These query parameters should only be included for requests directly triggered by processing that SSE event. | ||
| `lastModified` parsing should support Unix timestamp seconds and date string formats. | ||
| 3. If `inactivityDelaySec` is specified, the provider should close the SSE connection after the specified inactivity period. On resumption, it must reconnect and immediately re-fetch without SSE query metadata. |
Line 130 requires providers to "immediately re-fetch without SSE query metadata" after inactivity reconnection, but the sequence diagram on line 123 shows "POST /ofrep/v1/evaluate/flags" without any conditional headers. This is inconsistent with the existing polling behavior documented in ADR-0005 and the static-context-provider guideline, which specify that providers should send If-None-Match with the stored ETag. Consider clarifying whether the standard ETag conditional request header should still be included in this scenario.
| ``` | ||
|
|
||
| Each SSE connection object has: | ||
| - `url` (string, required): The SSE endpoint URL. The URL is opaque to the provider and may include authentication tokens, channel identifiers, or other vendor-specific query parameters. |
Line 60 states "The URL is opaque to the provider and may include authentication tokens", and line 229 notes "tokenized URL handling risk" where accidental logging can expose credentials. However, the main specification doesn't include any guidance on secure URL handling. Since this is a security-sensitive aspect mentioned in both the spec and open questions, consider adding at minimum a note in the provider behavior section warning implementations to avoid logging SSE URLs and to treat them as sensitive credentials.
| - `url` (string, required): The SSE endpoint URL. The URL is opaque to the provider and may include authentication tokens, channel identifiers, or other vendor-specific query parameters. | |
| - `url` (string, required): The SSE endpoint URL. The URL is opaque to the provider and may include authentication tokens, channel identifiers, or other vendor-specific query parameters. Implementations MUST treat this URL as sensitive credential material and MUST NOT log or otherwise persist the full value (including query string) in application logs, analytics, error reports, or other telemetry. |
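As a rough illustration of the logging guidance, a provider could redact the URL before it reaches any log or telemetry call. `redactSseUrl` is a hypothetical helper, and the assumption that credentials live in the query string or fragment may not hold for every vendor (tokens can also appear in path segments, in which case logging only the origin would be safer).

```typescript
// Strips the query string and fragment from an SSE URL so that tokens carried
// there never reach logs. The placeholder signals that something was removed.
function redactSseUrl(url: string): string {
  const cut = url.search(/[?#]/);
  if (cut === -1) return url;
  return url.slice(0, cut) + "?<redacted>";
}
```

For example, `redactSseUrl("https://sse.example.com/events?token=abc&channel=1")` returns `https://sse.example.com/events?<redacted>`.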
|
|
||
| Each SSE connection object has: | ||
| - `url` (string, required): The SSE endpoint URL. The URL is opaque to the provider and may include authentication tokens, channel identifiers, or other vendor-specific query parameters. | ||
| - `inactivityDelaySec` (integer, optional): Seconds of client inactivity (e.g., browser tab or mobile app backgrounded) after which the SSE connection should be closed. The client must reconnect and re-fetch when activity resumes. |
The specification states inactivityDelaySec should be used to close connections "after the specified inactivity period" (line 61), but "inactivity" is not clearly defined. Does this mean no SSE events received, no user interaction with the application, browser tab backgrounded, or device screen off? Different interpretations could lead to inconsistent behavior across implementations. Consider providing a clear definition or examples of what constitutes "inactivity" in this context.
| - `inactivityDelaySec` (integer, optional): Seconds of client inactivity (e.g., browser tab or mobile app backgrounded) after which the SSE connection should be closed. The client must reconnect and re-fetch when activity resumes. | |
| - `inactivityDelaySec` (integer, optional): Seconds since the client application last considered itself "active" for the current user/session, after which the SSE connection should be closed. Inactivity is determined by the host application (for example, a browser tab becoming hidden or suspended, a mobile app moving to the background, or a configurable period with no user interaction), and **must not** be based solely on the absence or frequency of SSE events. The client must reconnect and re-fetch when activity resumes according to its activity detection rules. |
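Under this definition, inactivity is driven by host-application signals rather than by SSE traffic. A minimal sketch of that wiring follows; `createInactivityGate` and its callbacks are hypothetical, and the scheduler is injected so the logic does not depend on a particular runtime. Treating an omitted or zero delay as "never close" follows the current ADR wording and would change if `minimum: 1` is adopted.

```typescript
type Cancel = () => void;
type Schedule = (fn: () => void, delayMs: number) => Cancel;

// Closes the SSE connection after `inactivityDelaySec` of host-reported
// inactivity, and reconnects plus re-fetches when activity resumes.
function createInactivityGate(
  inactivityDelaySec: number | undefined,
  schedule: Schedule,
  closeConnection: () => void,
  reconnectAndRefetch: () => void,
) {
  let cancel: Cancel | undefined;
  let closed = false;
  return {
    // Call when the host reports inactivity (tab hidden, app backgrounded).
    onInactive(): void {
      if (!inactivityDelaySec || cancel || closed) return; // omitted or 0: keep open
      cancel = schedule(() => {
        cancel = undefined;
        closed = true;
        closeConnection();
      }, inactivityDelaySec * 1000);
    },
    // Call when the host reports activity again.
    onActive(): void {
      if (cancel) {
        cancel();
        cancel = undefined;
      }
      if (closed) {
        closed = false;
        reconnectAndRefetch();
      }
    },
  };
}
```

In a browser, `schedule` could wrap `setTimeout`/`clearTimeout`, with `onInactive` and `onActive` driven by `visibilitychange` events.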
askpt left a comment
Added a couple of comments for discussion.
| Server-Sent Events (SSE) is a W3C standard that fits this use case well: | ||
| - Unidirectional (server-to-client), matching the notification pattern | ||
| - Runs over standard HTTP without protocol upgrades | ||
| - Natively supported in browsers via the `EventSource` API |
I would consider adding that mobile is also supported. After a quick search I confirmed that.
Yeah, mobile should be supported with SSE; I'll make that clearer. I recommend that we stick to the LaunchDarkly EventSource libraries where it's not built in:
| "variant": "enabled" | ||
| } | ||
| ], | ||
| "sse": [ |
Not a big fan of this object. I prefer to not use the name of the technology here, but its purpose.
For example:
"refresh": [ // I am terrible with naming so I am up for suggestions
  {
    "type": "sse",
    "url": "http://",
    "timeout": 123
  }
]
This would allow us to add extra notification/refresh types without breaking the contract.
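On the provider side, a purpose-named array like this could be modelled as a discriminated union in which unknown `type` values are skipped. This is only a sketch of the idea: `RefreshEntry`, `SseRefresh`, and `supportedSseEntries` are hypothetical names, and the field set simply mirrors the example above.

```typescript
// An entry the provider understands today.
interface SseRefresh {
  type: "sse";
  url: string;
  timeout?: number;
}

// Any entry with a type this provider does not recognise (future mechanisms).
interface UnknownRefresh {
  type: string;
  [key: string]: unknown;
}

type RefreshEntry = SseRefresh | UnknownRefresh;

// Keeps only well-formed SSE entries and ignores everything else, so new
// refresh types can be added server-side without breaking older providers.
function supportedSseEntries(refresh: RefreshEntry[]): SseRefresh[] {
  return refresh.filter(
    (entry): entry is SseRefresh => entry.type === "sse" && typeof entry["url"] === "string",
  );
}
```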
| ## Open Questions | ||
|
|
||
| 1. **Should `refetchEvaluation` be required, or should providers refetch on any SSE message?** Requiring a specific `type` field enables future event types without triggering unnecessary refetches. Refetching on any message is simpler. This ADR recommends requiring `type=refetchEvaluation` for forward compatibility. | ||
| 2. **Should providers support streaming full evaluation payloads over SSE?** This ADR focuses on the notification pattern. Full payload streaming could be specified as a separate event type in a future revision. |
I believe that, as a potential v2 for SSE, I would like to experiment with behaviour similar to JSON Patch. What I mean is that, instead of forcing a full refresh, the event would send only the flags that need to be updated.
For example:
[
  { "op": "add", "flagKey": "test", "value": { } },
  { "op": "replace", "flagKey": "test", "path": "/defaultValue", "value": "false" },
  { "op": "remove", "flagKey": "test" }
]
This adds extra complexity but it would be a good improvement.
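To give a feel for the extra complexity, here is one possible reading of those operations applied to a provider-side cache. Everything in it is hypothetical: the `FlagPatchOp` shape, the `applyFlagPatch` name, and the assumption that `path` addresses a single top-level field are illustrative choices, not part of the ADR.

```typescript
type FlagPatchOp =
  | { op: "add"; flagKey: string; value: Record<string, unknown> }
  | { op: "replace"; flagKey: string; path: string; value: unknown }
  | { op: "remove"; flagKey: string };

type FlagCache = Record<string, Record<string, unknown>>;

// Returns a new cache with the operations applied in order; the input is not mutated.
function applyFlagPatch(cache: FlagCache, ops: FlagPatchOp[]): FlagCache {
  const next: FlagCache = { ...cache };
  for (const op of ops) {
    if (op.op === "add") {
      next[op.flagKey] = { ...op.value };
    } else if (op.op === "remove") {
      delete next[op.flagKey];
    } else {
      const existing = next[op.flagKey];
      if (existing === undefined) continue; // nothing to replace
      // Assumption: "/defaultValue" addresses one top-level field of the flag.
      const field = op.path.replace(/^\//, "");
      next[op.flagKey] = { ...existing, [field]: op.value };
    }
  }
  return next;
}
```

Even this small version has to decide what a `replace` on a missing flag means, which hints at the edge cases a real specification would need to cover.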
| - **Transport consistency trade-off**: Using query parameters for SSE metadata differs from common HTTP conditional request patterns and may need careful documentation for implementers | ||
| - **Tokenized URL handling risk**: If SSE URLs include scoped credentials or channel tokens, accidental logging/persistence can expose sensitive connection material | ||
|
|
||
| ## Open Questions |
Should the developer be able to opt out of SSE via a config option?
Well, SSE is totally optional; it basically comes down to whether the server responds with an `sse` object or not. But yes, providers should also have an option to disable SSE.
I suggest allowing the developer to choose as a setting:
- none
- polling
- sse
This would give more control to the developer to pick their preferred way of refreshing the flags.
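A setting like that could be expressed as a small union type on the provider options. The names below (`ChangeDetection`, `shouldConnectSse`) are hypothetical, and the rule that SSE is used only when the developer selects it and the server actually advertises at least one connection is an assumption for illustration.

```typescript
// The three choices proposed above.
type ChangeDetection = "none" | "polling" | "sse";

// SSE is attempted only if the developer opted in AND the bulk evaluation
// response included at least one SSE connection object.
function shouldConnectSse(
  mode: ChangeDetection,
  sse: Array<{ url: string }> | undefined,
): boolean {
  return mode === "sse" && sse !== undefined && sse.length > 0;
}
```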
This PR
This PR adds ADR-0008 (#62) to propose Server-Sent Events (SSE) for bulk evaluation change notifications in OFREP.
- Adds an optional `sse` connection array to bulk evaluation responses.
- Defines the SSE event format (`refetchEvaluation`) and provider lifecycle guidance (reconnect, fallback, coalescing).
- Sends SSE metadata as query parameters (`sseEtag`, `sseLastModified`) only for requests triggered directly by SSE messages.
Notes
Includes open questions on query params vs custom headers, replay guarantees (`Last-Event-ID`), and security expectations for tokenized SSE URLs.