NOTE: This document discusses the more advanced use cases that we could support in the future, the design principles we want to follow, and the main concrete changes in the semantic conventions observed since version 1.26.
This document describes the different types of schema changes that can be made to a semantic convention registry and outlines how to handle them both within the semantic convention registry and in the schema changes format generated by Weaver.
The schema changes format generated by Weaver must be sufficiently comprehensive to support the following use cases:
- Generation of meaningful migration guides: These guides should preserve the intent and semantics of the changes made by the semantic convention authors.
- Implementation of SchemaProcessors: These processors should enable automatic bidirectional (when possible) conversion of telemetry data streams between different versions of a semantic convention registry.
- Generation of Database Migration Scripts: These scripts should accurately represent the changes made to a registry in a database system (e.g. SQL DDL scripts).
More use cases may be added as needed.
First of all, the stable entities (registry attributes and signals) defined in the semantic convention registry are already subject to very strict evolution rules. Breaking changes are only used as a last resort. The analysis of semantic conventions since version 1.26 has shown no critical changes. The complex changes studied in this document are mainly dedicated to experimental entities, for which we officially have no firm commitment to implement reversible changes.
We are aware that semantic conventions are still largely in an experimental state at this time. Consequently, we aim to explore best-effort approaches to facilitate backward and forward migrations within reasonable limits. Defining what is considered reasonable to support is one of the objectives of this document.
The following design principles have been established to guide the creation of schema changes:
- Unique and Persistent Naming
  - Each attribute and signal must have a unique and persistent name across registry versions.
  - (Note: Spans do not yet have a formally defined unique name; this is currently under discussion.)
- Structured Change Descriptions
  - Changes should be documented in a format that is both machine-readable and unambiguous to enable automated interpretation.
  - Wherever feasible, the need for external context or supplementary knowledge should be minimized.
  - In rare and complex scenarios, it should still be possible to apply an informal deprecation if formal approaches are prohibitively costly or if backward/forward migration is impossible.
- Renaming Requires Deprecation
  - When an attribute or signal must be renamed, a formal deprecation procedure is mandatory.
  - Such renaming is inherently a breaking change and will be treated accordingly by Weaver.
- Precise Deprecation Definition
  - The process and criteria for deprecating an attribute or signal must be specified in detail.
  - Definitions should clarify the possibility of backward and forward transformations.
  - In extreme cases, it should be explicit when backward or forward compatibility cannot be maintained.
- Unambiguous Change Semantics
  - The purpose and impact of each change must be clearly stated so that migration guidance can be easily derived.
  - A complex change script should not be required to interpret the intention behind a change.
  - Backward and forward scripts should be reserved for detailing the technical steps needed for migration.
The schema changes format generated by Weaver is a YAML file that describes the differences between two versions of a semantic convention registry. The format is structured as follows:
head:
semconv_version: <version n>
baseline:
semconv_version: <version n+1>
changes:
registry_attributes:
- name: <attribute_name>
type: <change_type> # change types are described after this YAML snippet
# other fields that depend on the change type
- ...
events:
- name: <event_name>
type: <change_type> # change types are described after this YAML snippet
# other fields that depend on the change type
- ...
metrics:
- name: <metric_name>
type: <change_type> # change types are described after this YAML snippet
# other fields that depend on the change type
- ...
Where <change_type> can be one of the following:
- added: A new item was added in the head registry.
- renamed: An item was renamed in the head registry.
- conditionally_renamed: An item was renamed based on specific conditions.
- split: An item was split into multiple items.
- merged: Multiple items were merged into a single item.
- obsoleted: An item was obsoleted in the head registry.
- conditionally_obsoleted: An item was obsoleted based on specific conditions.
- generalized: An item was replaced by a more general item.
- updated: An item was updated in the head registry.
Each <change_type> conveys specific change semantics. This list is not exhaustive. Each <change_type> is associated with a specific set of fields that characterize this type of change (see the next section and the examples for a list of those fields).
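As a rough illustration of the migration-guide use case, the sketch below walks a diff report that has already been loaded into a Python dict and emits one readable line per change. The function name `migration_notes` and the wording of the generated lines are assumptions made for this example; the field names (`new_name`, `into`, `merged_to`, `generalized_to`, `note`) are the ones used in the examples later in this document.

```python
def migration_notes(diff: dict) -> list:
    """Return one migration-guide line per change found in a diff report."""
    lines = []
    for section, changes in diff.get("changes", {}).items():
        for change in changes:
            name, kind = change["name"], change["type"]
            if kind == "renamed":
                text = f"{name} was renamed to {change['new_name']}"
            elif kind == "split":
                text = f"{name} was split into {', '.join(change['into'])}"
            elif kind == "merged":
                text = f"{name} was merged into {change['merged_to']}"
            elif kind == "generalized":
                text = f"{name} was replaced by the more general {change['generalized_to']}"
            else:
                # added, obsoleted, updated, conditional variants, or a future type
                text = f"{name} was {kind.replace('_', ' ')}"
            note = change.get("note")
            lines.append(f"[{section}] {text}" + (f" ({note})" if note else ""))
    return lines


# Hypothetical diff report, shaped like the YAML format described above.
diff = {
    "changes": {
        "registry_attributes": [
            {"name": "process.cpu.state", "type": "renamed", "new_name": "cpu.mode"},
            {"name": "db.jdbc.driver_classname", "type": "obsoleted",
             "note": "Removed, no replacement at this time."},
        ]
    }
}
for line in migration_notes(diff):
    print(line)
```

Because every change type carries its own fields, the guide can be derived without interpreting any forward or backward script, which is the intent of the "Unambiguous Change Semantics" principle.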
The name of an item in the semantic conventions acts as a persistent ID that should not change across versions. However, there are cases where renaming an item becomes necessary. In such instances, the procedure is to deprecate the old item and introduce a new item with the updated name. The deprecated field of the old item should reference the new item as its replacement, along with additional information about the type of renaming.
Several forms of renaming have been observed within the semantic conventions:
- Basic Renaming: A single item is renamed to a new name. Backward and forward transformations are straightforward in this case and can be automatically inferred.
- Conditional Renaming: An item is renamed to a new name based on specific conditions. These conditions could depend on the value of another item or a combination of items. Usually, backward and forward transformations can be explicitly defined.
- Split: An item is split into multiple items. Sometimes, backward and forward transformations can be explicitly defined.
- Merge: Multiple items are merged into a single item. Sometimes, backward and forward transformations can also be explicitly defined.
The next sections provide more details on each form of renaming.
Note: The name field for spans is not yet fully defined within the semantic conventions. Schema changes for spans will be addressed in this document once this field is more clearly defined in the semantic conventions.
Consider the following two consecutive versions of a semantic convention registry:
# Version n
groups:
- id: registry.process
type: attribute_group
brief: "Process attributes."
attributes:
- id: process.cpu.state
brief: "The state of the CPU"
type: ...
stability: experimental
To rename the process.cpu.state attribute to cpu.mode, the following changes are introduced:
# Version n+1
groups:
- id: registry.process.deprecated
type: attribute_group
brief: Deprecated process attributes.
attributes:
# The existing attribute is deprecated
- id: process.cpu.state
brief: "Deprecated, use `cpu.mode` instead."
deprecated:
reason: renamed
new_name: cpu.mode
type: ...
stability: experimental
- id: registry.cpu
type: attribute_group
brief: Attributes specific to a cpu instance.
display_name: CPU Attributes
attributes:
# A new attribute is introduced
- id: cpu.mode
brief: "The mode of the CPU"
type: ...
stability: experimental
examples: [ "user", "system" ]
From these changes, Weaver can generate a diff report highlighting the renaming of the process.cpu.state attribute to cpu.mode:
head:
semconv_version: n
baseline:
semconv_version: n+1
changes:
registry_attributes:
- name: process.cpu.state
type: renamed
new_name: cpu.mode
note: Deprecated, use `cpu.mode` instead.
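The following sketch shows why a basic renaming is straightforward to apply in both directions: the backward transformation is the forward one with the two names swapped. The helper `rename_attribute` and the sample attribute values are hypothetical and only serve this illustration; they are not part of Weaver.

```python
def rename_attribute(attributes: dict, old: str, new: str) -> dict:
    """Return a copy of `attributes` with key `old` renamed to `new`, if present."""
    result = dict(attributes)
    if old in result:
        result[new] = result.pop(old)
    return result


# The `renamed` entry from the diff report above, as a plain dict.
change = {"name": "process.cpu.state", "type": "renamed", "new_name": "cpu.mode"}

v_n = {"process.cpu.state": "user", "host.name": "web-1"}

# Forward: version n -> version n+1.
v_n1 = rename_attribute(v_n, change["name"], change["new_name"])

# Backward: the same operation with the names swapped.
restored = rename_attribute(v_n1, change["new_name"], change["name"])
```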
Sometimes, renaming an item depends on specific context. Consider the following two consecutive versions of a semantic convention registry:
# Version n
groups:
- id: registry.network
type: attribute_group
attributes:
- id: net.peer.name
type: string
stability: experimental
examples: ['example.com']
#...
To rename the net.peer.name attribute to server.address for client spans and client.address for server spans, the following changes are made:
# Version n+1
groups:
- id: registry.network.deprecated
type: attribute_group
attributes:
- id: net.peer.name
type: string
brief: Deprecated, use `server.address` on client spans and `client.address` on server spans.
deprecated:
reason: conditionally_renamed
forward: >
switch span_kind {
case 'client' => attributes['server.address'] = attributes['net.peer.name'],
case 'server' => attributes['client.address'] = attributes['net.peer.name']
}
backward: >
switch span_kind {
case 'client' => attributes['net.peer.name'] = attributes['server.address'],
case 'server' => attributes['net.peer.name'] = attributes['client.address']
}
stability: experimental
examples: ['example.com']
When these changes are applied at the attribute_group level, they apply globally. It should also be possible to apply this type of change at the level of a specific signal. [ToDo].
Weaver can then generate a diff report reflecting the conditional renaming of the net.peer.name attribute:
head:
semconv_version: n
baseline:
semconv_version: n+1
changes:
registry_attributes:
- name: net.peer.name
type: conditionally_renamed
forward: >
switch span_kind {
case 'client' => attributes['server.address'] = attributes['net.peer.name'],
case 'server' => attributes['client.address'] = attributes['net.peer.name']
}
backward: >
switch span_kind {
case 'client' => attributes['net.peer.name'] = attributes['server.address'],
case 'server' => attributes['net.peer.name'] = attributes['client.address']
}
note: Deprecated, use `server.address` on client spans and `client.address` on server spans.
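A minimal Python rendering of the forward and backward scripts above may help clarify their intent. It assumes that the span kind is available as a plain string and that the source attribute is removed once it has been copied; the scripts only show the assignment, so the removal is an assumption of this sketch, as is leaving spans of any other kind untouched.

```python
OLD = "net.peer.name"
# Replacement attribute for each span kind, taken from the scripts above.
TARGET_BY_SPAN_KIND = {"client": "server.address", "server": "client.address"}


def forward(span_kind: str, attributes: dict) -> dict:
    """Version n -> n+1: move net.peer.name to the attribute chosen by span kind."""
    result = dict(attributes)
    target = TARGET_BY_SPAN_KIND.get(span_kind)
    if target is not None and OLD in result:
        result[target] = result.pop(OLD)
    return result


def backward(span_kind: str, attributes: dict) -> dict:
    """Version n+1 -> n: restore net.peer.name from the span-kind-specific attribute."""
    result = dict(attributes)
    source = TARGET_BY_SPAN_KIND.get(span_kind)
    if source is not None and source in result:
        result[OLD] = result.pop(source)
    return result


client_span = forward("client", {"net.peer.name": "example.com"})
server_span = forward("server", {"net.peer.name": "example.com"})
```

Both directions depend on the span kind, which is context outside the attribute itself. This is what distinguishes a conditional renaming from a basic one.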
In some situations, a single item is split into multiple items. Consider the following two consecutive versions of a semantic convention registry:
# Version n
groups:
- id: registry.db
type: attribute_group
stability: experimental
attributes:
- id: db.connection_string
type: string
stability: experimental
To split the db.connection_string attribute into server.address and server.port, the following changes are made in the registry:
# Version n+1
groups:
- id: registry.db.deprecated
type: attribute_group
stability: experimental
attributes:
- id: db.connection_string
type: string
brief: 'Deprecated, use `server.address`, `server.port` attributes instead.'
deprecated:
reason: split
into: ["server.address", "server.port"]
forward: >
attributes['server.address'] = attributes['db.connection_string'].split(':')[0]
attributes['server.port'] = attributes['db.connection_string'].split(':')[1]
backward: attributes['db.connection_string'] = attributes['server.address'] + ':' + attributes['server.port']
stability: experimental
# server.address and server.port already exist in the registry
# so they stay unchanged.
Weaver can generate a diff report to describe this split operation:
head:
semconv_version: n
baseline:
semconv_version: n+1
changes:
registry_attributes:
- name: db.connection_string
type: split
into: ["server.address", "server.port"]
forward: >
attributes['server.address'] = attributes['db.connection_string'].split(':')[0]
attributes['server.port'] = attributes['db.connection_string'].split(':')[1]
backward: attributes['db.connection_string'] = attributes['server.address'] + ':' + attributes['server.port']
note: Deprecated, use `server.address`, `server.port` attributes instead.
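The split scripts above assume a connection string of the form host:port. The sketch below follows the same idea but leaves the attributes untouched when the value does not match that form, since real connection strings vary widely. Converting the port to an integer, dropping the deprecated attribute on the forward path, and keeping server.address and server.port on the backward path are all assumptions of this example rather than rules stated by the registry.

```python
def forward(attributes: dict) -> dict:
    """Version n -> n+1: split db.connection_string into server.address and server.port."""
    result = dict(attributes)
    value = result.get("db.connection_string")
    if value is None:
        return result
    host, sep, port = value.rpartition(":")
    if not sep or not host or not port.isdigit():
        return result  # not in host:port form, so no safe split is possible
    del result["db.connection_string"]
    result["server.address"] = host
    result["server.port"] = int(port)
    return result


def backward(attributes: dict) -> dict:
    """Version n+1 -> n: rebuild db.connection_string; server.* exist in both versions."""
    result = dict(attributes)
    if "server.address" in result and "server.port" in result:
        result["db.connection_string"] = f"{result['server.address']}:{result['server.port']}"
    return result


split = forward({"db.connection_string": "localhost:5432"})
rebuilt = backward(split)
```

The unparseable case illustrates why the text says that transformations for a split can only sometimes be defined explicitly.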
In other cases, multiple items are merged into a single item. Consider the following example of two consecutive versions of a semantic convention registry:
# Version n
groups:
- id: registry.db
type: attribute_group
brief: Database Attributes
stability: experimental
attributes:
- id: "db.cassandra.table"
type: string
- id: "db.cosmosdb.container"
type: string
- id: "db.mongodb.collection"
type: string
- id: "db.sql.table"
type: string
To merge the db.cassandra.table, db.cosmosdb.container, db.mongodb.collection, and db.sql.table attributes into a single attribute db.collection.name, and introduce a new attribute db.system to store the system name, the following changes are made:
# Version n+1
groups:
# The existing db attributes are deprecated
- id: registry.db.deprecated
type: attribute_group
stability: experimental
attributes:
- id: "db.cassandra.table"
deprecated:
reason: merged
merged_to: "db.collection.name"
backward: >
if attributes['db.system'] == 'cassandra' then attributes['db.cassandra.table'] = attributes['db.collection.name']
- id: "db.cosmosdb.container"
deprecated:
reason: merged
merged_to: "db.collection.name"
backward: >
if attributes['db.system'] == 'cosmosdb' then attributes['db.cosmosdb.container'] = attributes['db.collection.name']
- id: "db.mongodb.collection"
deprecated:
reason: merged
merged_to: "db.collection.name"
backward: >
if attributes['db.system'] == 'mongodb' then attributes['db.mongodb.collection'] = attributes['db.collection.name']
- id: "db.sql.table"
deprecated:
reason: merged
merged_to: "db.collection.name"
backward: >
if attributes['db.system'] == 'sql' then attributes['db.sql.table'] = attributes['db.collection.name']
# The new attributes are introduced
- id: registry.db
type: attribute_group
brief: Database Attributes
stability: experimental
attributes:
- id: "db.collection.name"
type: string
brief: "The name of the collection."
- id: "db.system"
type: string
brief: "The system name."
Weaver can generate a diff report to describe this merge operation:
head:
semconv_version: n
baseline:
semconv_version: n+1
changes:
registry_attributes:
- name: "db.cassandra.table"
type: merged
merged_to: "db.collection.name"
backward: >
if attributes['db.system'] == 'cassandra' then attributes['db.cassandra.table'] = attributes['db.collection.name']
- name: "db.cosmosdb.container"
type: merged
merged_to: "db.collection.name"
backward: >
if attributes['db.system'] == 'cosmosdb' then attributes['db.cosmosdb.container'] = attributes['db.collection.name']
- name: "db.mongodb.collection"
type: merged
merged_to: "db.collection.name"
backward: >
if attributes['db.system'] == 'mongodb' then attributes['db.mongodb.collection'] = attributes['db.collection.name']
- name: "db.sql.table"
type: merged
merged_to: "db.collection.name"
backward: >
if attributes['db.system'] == 'sql' then attributes['db.sql.table'] = attributes['db.collection.name']
The same concept of merging can be applied to signals like metrics, events, and spans.
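The backward scripts of the attribute merge above all follow the same pattern, which can be sketched as a lookup keyed by db.system. The forward direction is not spelled out in the registry example; the version below, which moves whichever deprecated attribute is present into db.collection.name, is an assumption of this sketch.

```python
# db.system value -> deprecated system-specific attribute, from the scripts above.
MERGED = {
    "cassandra": "db.cassandra.table",
    "cosmosdb": "db.cosmosdb.container",
    "mongodb": "db.mongodb.collection",
    "sql": "db.sql.table",
}
TARGET = "db.collection.name"


def forward(attributes: dict) -> dict:
    """Version n -> n+1: move any system-specific attribute into db.collection.name."""
    result = dict(attributes)
    for old in MERGED.values():
        if old in result:
            # If several deprecated attributes are present, the last one wins.
            result[TARGET] = result.pop(old)
    return result


def backward(attributes: dict) -> dict:
    """Version n+1 -> n: use db.system to pick the system-specific attribute."""
    result = dict(attributes)
    old = MERGED.get(result.get("db.system"))
    if old is not None and TARGET in result:
        result[old] = result.pop(TARGET)
    return result


original = {"db.system": "mongodb", "db.mongodb.collection": "users"}
merged = forward(original)
restored = backward(merged)
```

The backward path only works when db.system is present and known, which is why the merge introduces db.system alongside db.collection.name.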
Deprecation is the process of marking an item as obsolete or outdated. It can occur for several reasons:
- Renaming: When an item is renamed, the old item is deprecated and a new item is introduced under the new name.
- Soft Removal: When an item is deprecated without a direct replacement.
- Conditional Deprecation: When an item is deprecated based on specific conditions.
- Generalization: When a more general attribute replaces a specific one.
Each of these cases requires documenting the deprecation and, where applicable, defining forward and backward transformations to maintain compatibility.
Soft removal refers to the process of marking an item as deprecated without introducing a replacement. This approach is often used when the item is no longer needed but still needs to remain in the registry for backward compatibility.
# Version n
groups:
- id: registry.db.metrics
type: attribute_group
stability: experimental
attributes:
- id: db.jdbc.driver_classname
type: string
stability: experimental
In the next version, the db.jdbc.driver_classname attribute is obsoleted without a replacement:
# Version n+1
groups:
- id: registry.db.metrics.deprecated
type: attribute_group
stability: experimental
attributes:
- id: db.jdbc.driver_classname
type: string
brief: 'Removed, no replacement at this time.'
deprecated:
reason: obsoleted
stability: experimental
The diff report generated by Weaver could look like this:
head:
semconv_version: n
baseline:
semconv_version: n+1
changes:
registry_attributes:
- name: db.jdbc.driver_classname
type: obsoleted
note: Removed, no replacement at this time.
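For the database migration use case, simple change types such as renamed and obsoleted map naturally onto DDL statements. The sketch below assumes a hypothetical storage layout in which each attribute is a column of a table named span_attributes; that layout, the table name, and the helper names are illustrative assumptions, and the exact ALTER TABLE syntax supported depends on the database system.

```python
def quote(identifier: str) -> str:
    """Quote an SQL identifier, doubling any embedded double quotes."""
    return '"' + identifier.replace('"', '""') + '"'


def ddl_for_change(table: str, change: dict):
    """Return a DDL statement for a simple change, or None if no direct mapping exists."""
    column = quote(change["name"])
    if change["type"] == "renamed":
        new_column = quote(change["new_name"])
        return f"ALTER TABLE {quote(table)} RENAME COLUMN {column} TO {new_column};"
    if change["type"] == "obsoleted":
        return f"ALTER TABLE {quote(table)} DROP COLUMN {column};"
    # Conditional, split, and merged changes need data migration, not just DDL.
    return None


drop_stmt = ddl_for_change(
    "span_attributes",
    {"name": "db.jdbc.driver_classname", "type": "obsoleted"},
)
rename_stmt = ddl_for_change(
    "span_attributes",
    {"name": "process.cpu.state", "type": "renamed", "new_name": "cpu.mode"},
)
print(drop_stmt)
print(rename_stmt)
```

Whether an obsoleted attribute should actually be dropped from storage, rather than kept for historical data, is a policy decision that the diff report does not make.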
Conditional deprecation occurs when an item is marked as deprecated only under certain conditions, for example depending on the value of another attribute.
# Version n
groups:
- id: registry.db
type: attribute_group
stability: experimental
attributes:
- id: db.instance.id
type: string
stability: experimental
In the next version, the db.instance.id attribute is deprecated based on the value of the db.system attribute. For example, if the system is elasticsearch, the replacement is db.elasticsearch.node.name:
# Version n+1
groups:
- id: registry.db.deprecated
type: attribute_group
stability: experimental
attributes:
- id: db.instance.id
type: string
brief: 'Deprecated, no general replacement at this time. For Elasticsearch, use `db.elasticsearch.node.name` instead.'
deprecated:
reason: conditionally_obsoleted
forward: >
if attributes['db.system'] == 'elasticsearch' then attributes['db.elasticsearch.node.name'] = attributes['db.instance.id']
else drop attributes['db.instance.id']
backward: >
if attributes['db.system'] == 'elasticsearch' then attributes['db.instance.id'] = attributes['db.elasticsearch.node.name']
stability: experimental
When these changes are applied at the attribute_group level, they apply globally. It should also be possible to apply this type of change at the level of a specific signal. [ToDo].
Weaver can generate a diff report as follows:
head:
semconv_version: n
baseline:
semconv_version: n+1
changes:
registry_attributes:
- name: db.instance.id
type: conditionally_obsoleted
forward: >
if attributes['db.system'] == 'elasticsearch' then attributes['db.elasticsearch.node.name'] = attributes['db.instance.id']
else drop attributes['db.instance.id']
backward: >
if attributes['db.system'] == 'elasticsearch' then attributes['db.instance.id'] = attributes['db.elasticsearch.node.name']
note: Deprecated, no general replacement at this time. For Elasticsearch, use `db.elasticsearch.node.name` instead.
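The conditional deprecation scripts above can be read as the following Python sketch. It mirrors the forward script (move the value for Elasticsearch, drop it otherwise) and the backward script (restore it for Elasticsearch only). Removing db.elasticsearch.node.name on the backward path is an assumption of this example.

```python
def forward(attributes: dict) -> dict:
    """Version n -> n+1: move db.instance.id for Elasticsearch, drop it otherwise."""
    result = dict(attributes)
    if "db.instance.id" not in result:
        return result
    value = result.pop("db.instance.id")  # removed in every case
    if result.get("db.system") == "elasticsearch":
        result["db.elasticsearch.node.name"] = value
    return result


def backward(attributes: dict) -> dict:
    """Version n+1 -> n: restore db.instance.id, which is only possible for Elasticsearch."""
    result = dict(attributes)
    if result.get("db.system") == "elasticsearch" and "db.elasticsearch.node.name" in result:
        result["db.instance.id"] = result.pop("db.elasticsearch.node.name")
    return result


es = {"db.system": "elasticsearch", "db.instance.id": "node-1"}
pg = {"db.system": "postgresql", "db.instance.id": "replica-2"}

es_forward = forward(es)
pg_forward = forward(pg)
```

The non-Elasticsearch branch is lossy: once db.instance.id has been dropped, the backward transformation has nothing to restore. This is a case where the limits of backward compatibility should be made explicit.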
Generalization occurs when a specific attribute is replaced by a more general one. This is often done to accommodate additional use cases.
# Version n
groups:
- id: "db.table.name"
# ...
In the next version, the db.table.name attribute is deprecated in favor of db.collection.name, which is more general and can represent tables, views, indexes, ...:
# Version n+1
groups:
- id: "db.table.name"
note: Deprecated, db.collection.name now represents other entities too (indexes, views, table, etc).
deprecated:
reason: generalized
generalized_to: "db.collection.name"
# ...
- id: "db.collection.name"
note: Represents the name of a table, a view, an index, ...
The diff report could look like this:
head:
semconv_version: n
baseline:
semconv_version: n+1
changes:
registry_attributes:
- name: db.table.name
type: generalized
generalized_to: db.collection.name
note: Deprecated, db.collection.name now represents other entities too (indexes, views, table, etc).
The previous sections focused on renaming and deprecation of top-level items (attributes and signals) in the semantic conventions. This section describes how to handle changes in the fields of attributes and signals and changes in the attributes of signals.
Field and attribute changes can be classified into two categories: metadata-only changes and data-impacting changes:
- Metadata-Only Changes: Changes made to the notes, descriptions, or examples of an attribute or signal typically have no impact on the corresponding telemetry data stream.
- Data-Impacting Changes: Changes such as modifying the type of an attribute or signal, or altering the unit of a metric, can impact the corresponding telemetry data stream. For this category of field changes, it is often necessary to support both backward and forward transformations to ensure compatibility.
The following example illustrates a change in the unit of a metric between two versions of a semantic convention registry:
# Version n
groups:
- id: metric.db.client.connection.create_time
type: metric
metric_name: db.client.connection.create_time
stability: experimental
instrument: counter
unit: "s"
The new version of the registry updates the unit of the db.client.connection.create_time metric, changing it from seconds to milliseconds as follows:
# Version n+1
groups:
- id: metric.db.client.connection.create_time
type: metric
metric_name: db.client.connection.create_time
stability: experimental
instrument: counter
unit: "ms" # <- This is where the change is made
This change is classified as a data-impacting change because it affects the telemetry data stream. To maintain compatibility between the two versions of the semantic convention registry, both backward and forward transformations must be defined.
Based on these changes, Weaver can generate a diff report that highlights the update to the unit of the db.client.connection.create_time metric.
head:
semconv_version: n
baseline:
semconv_version: n+1
changes:
metrics:
- name: metric.db.client.connection.create_time
type: updated
fields:
- type: updated
name: unit
old_value: s
new_value: ms
forward: >
metrics['db.client.connection.create_time'].value = metrics['db.client.connection.create_time'].value * 1000
backward: >
metrics['db.client.connection.create_time'].value = metrics['db.client.connection.create_time'].value / 1000
In this diff report, Weaver automatically infers the backward and forward transformations to ensure compatibility between the two versions of the semantic convention registry.
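One way such an inference could work for time units is to express each unit as a fraction of a common base and derive the scale factor from the old and new units. The sketch below illustrates this under the assumption that only the listed units are involved; the table `SECONDS_PER_UNIT` and the function names are hypothetical and do not describe Weaver's actual implementation.

```python
from fractions import Fraction

# Size of each supported unit, expressed in seconds (exact rational values).
SECONDS_PER_UNIT = {
    "s": Fraction(1),
    "ms": Fraction(1, 1_000),
    "us": Fraction(1, 1_000_000),
    "ns": Fraction(1, 1_000_000_000),
}


def scale_factor(old_unit: str, new_unit: str) -> Fraction:
    """Factor by which a value in `old_unit` is multiplied to express it in `new_unit`."""
    return SECONDS_PER_UNIT[old_unit] / SECONDS_PER_UNIT[new_unit]


def convert(value, old_unit: str, new_unit: str) -> float:
    """Convert a measurement between two supported time units."""
    return float(value * scale_factor(old_unit, new_unit))


# Forward (s -> ms) multiplies by 1000; backward (ms -> s) divides by 1000.
forward_factor = scale_factor("s", "ms")
backward_factor = scale_factor("ms", "s")
```

Units without a fixed ratio between them cannot be handled this way, which is the kind of case where manually defined transformations would be needed.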
TBD: Address data-impacting changes where backward and forward transformations need to be defined manually.
TBD (attribute changes follow the same principles as field changes)
It is not uncommon to see a combination of renaming and field changes in a single schema change. For example, consider the following two consecutive versions of a semantic convention registry:
# Version n
groups:
- id: metric.db.client.connections.create_time
type: metric
metric_name: db.client.connections.create_time
stability: experimental
instrument: histogram
unit: "ms"
To rename the metric.db.client.connections.create_time metric to metric.db.client.connection.create_time and update the unit from milliseconds to seconds, the following changes are made:
# Version n+1
groups:
- id: metric.db.client.connections.create_time.deprecated
type: metric
metric_name: db.client.connections.create_time
brief: "Deprecated, use `db.client.connection.create_time` instead. Note: the unit also changed from `ms` to `s`."
deprecated:
reason: renamed
renamed_to: db.client.connection.create_time
fields:
- name: unit
old_value: ms
new_value: s
forward: >
metrics['db.client.connection.create_time'].value = metrics['metric.db.client.connections.create_time'].value / 1000
backward: >
metrics['metric.db.client.connections.create_time'].value = metrics['db.client.connection.create_time'].value * 1000
stability: experimental
instrument: histogram
unit: "ms"
- id: metric.db.client.connection.create_time
type: metric
metric_name: db.client.connection.create_time
stability: experimental
instrument: histogram
unit: "s"
The diff report generated by Weaver could look like this:
head:
semconv_version: n
baseline:
semconv_version: n+1
changes:
metrics:
- name: metric.db.client.connections.create_time
type: renamed
renamed_to: db.client.connection.create_time
fields:
- name: unit
old_value: ms
new_value: s
forward: >
metrics['db.client.connection.create_time'].value = metrics['metric.db.client.connections.create_time'].value / 1000
backward: >
metrics['metric.db.client.connections.create_time'].value = metrics['db.client.connection.create_time'].value * 1000