Commit cf0708c

fix: new "run context" management to fix numerous span hierarchy issues (#2181)
This makes a number of changes to the `Instrumentation` API that is used by the
instrumentations under "lib/instrumentation/modules/". Tracking the "current
transaction" and "current span" is now entirely encapsulated in tracking the
current "run context" (see `RunContext.js`) via a "run context manager"
singleton (`agent._instrumentation._runCtxMgr`). This manager either uses
`async_hooks` (`AsyncHooksRunContextManager`) or does not
(`BasicRunContextManager`) to track the run context for the currently running
JS. Both managers use an interface (and some implementation) very similar to
OTel's ContextManager. The primary reason our context management can't be
closer to OTel's is that our `apm.startTransaction(...)`,
`apm.startSpan(...)`, `span.end()`, et al. APIs **change the current run
context**.

# Instrumentation API used for run context tracking

- Part of run context tracking is handled by an async hook that tracks new
  async tasks. The rest of run context tracking is in explicit calls to "bind"
  a function call to a particular run context.

      ins.bindEmitter(ee)   // (unchanged) bind added event handlers to the curr run context
      ins.bindFunction(fn)  // (unchanged) bind fn to the curr run context
      ins.bindFunctionToRunContext(rc, fn)  // bind fn to a specific run context
      ins.bindFunctionToEmptyRunContext(fn) // an odd ball used to explicitly break run context
      ins.withRunContext(rc, fn, thisArg, ...args)
          // Equivalent to binding `fn` then calling it, but with less overhead.

- Creating and ending transactions and spans:

      ins.startTransaction(...) -> trans  // (unchanged)
      ins.startSpan(...) -> span          // (unchanged)
      ins.createSpan(...) -> span         // Create span, but don't change the current run context.
      ins.addEndedTransaction(trans)      // (unchanged)
      ins.addEndedSpan(span)              // (unchanged)

- Getting and working with run contexts:

      ins.currRunContext() -> rc          // Get the current run context.
      // The following are mostly used internally by above methods.
      ins.supersedeWithTransRunContext(trans)
      ins.supersedeWithSpanRunContext(span)
      ins.supersedeWithEmptyRunContext()

# Behavior changes

This makes *no* changes to the public API. There are, however, the following
changes in behavior.

1. If user code creates span A, then creates span B *in the same async task*:
   B will be a child of A.

                                    // BEFORE             AFTER
        apm.startTransaction('t0')  // transaction 't0'   transaction 't0'
        apm.startSpan('s1')         // |- span 's1'       `- span 's1'
        apm.startSpan('s2')         // `- span 's2'          `- span 's2'

2. Before this PR, an ended transaction would linger as
   `apm.currentTransaction`. Not any more.

Fixes: #1889
Fixes: #1239
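To make the "run context" idea in the commit message more concrete, here is a minimal sketch of an immutable context object plus a manager that has no `async_hooks` support. It is not the code in `RunContext.js` or `BasicRunContextManager`; the names `RunContext`, `BasicManager`, `active`, `with`, `bindFn`, and `supersede` are assumptions chosen for illustration, and transactions and spans are represented by plain strings.

```javascript
'use strict'

// Hypothetical, simplified sketch; NOT the agent's actual RunContext.js.
// An immutable holder for the current transaction and a stack of spans.
class RunContext {
  constructor (trans = null, spans = []) {
    this._trans = trans
    this._spans = spans
  }

  currTransaction () { return this._trans }

  currSpan () {
    return this._spans.length > 0 ? this._spans[this._spans.length - 1] : null
  }

  // "enter"/"leave" return a *new* RunContext; the original is untouched.
  enterSpan (span) {
    return new RunContext(this._trans, this._spans.concat(span))
  }

  leaveSpan (span) {
    return new RunContext(this._trans, this._spans.filter(s => s !== span))
  }
}

// Hypothetical manager without async_hooks: it only tracks a stack of
// contexts for synchronously running code.
class BasicManager {
  constructor () {
    this._root = new RunContext()
    this._stack = []
  }

  active () {
    return this._stack.length > 0 ? this._stack[this._stack.length - 1] : this._root
  }

  // Run fn with rc as the active context, then restore the previous one.
  with (rc, fn, thisArg, ...args) {
    this._stack.push(rc)
    try {
      return fn.apply(thisArg, args)
    } finally {
      this._stack.pop()
    }
  }

  // Return a wrapper that always runs fn in rc.
  bindFn (rc, fn) {
    const self = this
    return function bound (...args) { return self.with(rc, fn, this, ...args) }
  }

  // Replace the active context in place. This is the step that APIs which
  // "change the current run context" need, as described in the commit message.
  supersede (rc) {
    if (this._stack.length > 0) this._stack[this._stack.length - 1] = rc
    else this._root = rc
  }
}

// Usage: two spans started in the same async task nest (behavior change 1).
const mgr = new BasicManager()
mgr.supersede(new RunContext('t0'))
mgr.supersede(mgr.active().enterSpan('s1'))
const parentOfS2 = mgr.active().currSpan() // the would-be parent of 's2'
mgr.supersede(mgr.active().enterSpan('s2'))
const currWhileS2 = mgr.active().currSpan()
mgr.supersede(mgr.active().leaveSpan('s2'))
const currAfterS2Ends = mgr.active().currSpan()

// A function bound to another context sees that context only while it runs.
const bound = mgr.bindFn(new RunContext('t1'), () => mgr.active().currTransaction())
const transInsideBound = bound()
const transAfterBound = mgr.active().currTransaction()
```

Because each context is immutable, a function bound to an earlier context keeps seeing that context even after a later `supersede` call. That is one plausible reason to favor replacing the context over mutating it.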
1 parent a36ad83 commit cf0708c

96 files changed, +1942 -869 lines changed

.tav.yml

Lines changed: 3 additions & 1 deletion

@@ -3,7 +3,9 @@ generic-pool:
   commands: node test/instrumentation/modules/generic-pool.test.js
 mimic-response:
   versions: ^1.0.0
-  commands: node test/instrumentation/modules/mimic-response.test.js
+  commands:
+    - node test/instrumentation/modules/mimic-response.test.js
+    - node test/instrumentation/modules/http/github-179.test.js
 got-very-old:
   name: got
   versions: '>=4.0.0 <9.0.0'

CHANGELOG.asciidoc

Lines changed: 27 additions & 0 deletions

@@ -28,6 +28,33 @@ Notes:
 [[release-notes-3.x]]
 === Node.js Agent version 3.x
 
+==== Unreleased
+
+[float]
+===== Breaking changes
+
+[float]
+===== Features
+
+[float]
+===== Bug fixes
+
+* A significant change was made to internal run context tracking (a.k.a. async
+context tracking). There are no configuration changes or API changes for
+custom instrumentation. ({pull}2181[#2181])
++
+One behavior change is that multiple spans created synchronously (in the same
+async task) will form parent/child relationships; before this change they would
+all be siblings. This fixes HTTP child spans of Elasticsearch and aws-sdk
+automatic spans to properly be children. ({issues}1889[#1889])
++
+Another behavior change is that a span B started after having ended span A in
+the same async task will *no longer* be a child of span A. ({pull}1964[#1964])
++
+This fixes an issue with context binding of EventEmitters, where
+`removeListener` would fail to actually remove if the same handler function was
+added to multiple events.
+
 
 [[release-notes-3.23.0]]
 ==== 3.23.0 2021/10/25
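The last changelog entry concerns `removeListener` when one handler function is registered for several events. The sketch below shows one way wrapper bookkeeping could avoid that pitfall, by keeping the original-to-wrapper mapping per event name. It is an illustration only, not the agent's `bindEmitter` implementation; `bindEmitterSketch` and its `wrap` parameter are hypothetical, and only `on` and `removeListener` are patched (not `addListener`, `off`, or repeated registration of the same handler on one event).

```javascript
'use strict'
const { EventEmitter } = require('events')

// Hypothetical helper: wrap handlers added via `on`, and remember the wrapper
// per (event, handler) pair so `removeListener` can find the right one.
function bindEmitterSketch (ee, wrap) {
  // Map<eventName, WeakMap<originalHandler, wrappedHandler>>
  const wrappersByEvent = new Map()
  const origOn = ee.on
  const origRemoveListener = ee.removeListener

  ee.on = function (event, handler) {
    let wrappers = wrappersByEvent.get(event)
    if (!wrappers) {
      wrappers = new WeakMap()
      wrappersByEvent.set(event, wrappers)
    }
    const wrapped = wrap(handler)
    wrappers.set(handler, wrapped)
    return origOn.call(this, event, wrapped)
  }

  ee.removeListener = function (event, handler) {
    const wrappers = wrappersByEvent.get(event)
    const wrapped = wrappers && wrappers.get(handler)
    if (wrapped) wrappers.delete(handler)
    // Fall back to the given handler if it was never wrapped by us.
    return origRemoveListener.call(this, event, wrapped || handler)
  }

  return ee
}

// Usage: the same handler on two events; removing it from one leaves the other.
const passThrough = fn => function (...args) { return fn.apply(this, args) }
const ee = bindEmitterSketch(new EventEmitter(), passThrough)
const calls = []
const handler = name => calls.push(name)

ee.on('a', handler)
ee.on('b', handler)
ee.removeListener('a', handler)
ee.emit('a', 'a')
ee.emit('b', 'b')
```

If the mapping were keyed by handler alone, the second `on` call would overwrite the first wrapper, and removing the handler from `'a'` would look up the wrapper registered for `'b'`.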

DEVELOPMENT.md

Lines changed: 15 additions & 15 deletions

@@ -38,24 +38,24 @@ environment variables:
 
 ## debug logging of `async_hooks` usage
 
-The following patch to the agent's async-hooks.js can be helpful to learn
-how its async hook tracks relationships between async operations:
+When using the `AsyncHooksRunContextManager` the following debug printf in
+the `init` async hook can be helpful to learn how its async hook tracks
+relationships between async operations:
 
 ```diff
-diff --git a/lib/instrumentation/async-hooks.js b/lib/instrumentation/async-hooks.js
-index 1dd168f..f35877d 100644
---- a/lib/instrumentation/async-hooks.js
-+++ b/lib/instrumentation/async-hooks.js
-@@ -71,6 +71,9 @@ module.exports = function (ins) {
-     // type, which will init for each scheduled timer.
-     if (type === 'TIMERWRAP') return
-
-+    const indent = ' '.repeat(triggerAsyncId % 80)
-+    process._rawDebug(`${indent}${type}(${asyncId}): triggerAsyncId=${triggerAsyncId} executionAsyncId=${asyncHooks.executionAsyncId()}`);
+diff --git a/lib/instrumentation/run-context/AsyncHooksRunContextManager.js b/lib/instrumentation/run-context/AsyncHooksRunContextManager.js
+index 94376188..571539aa 100644
+--- a/lib/instrumentation/run-context/AsyncHooksRunContextManager.js
++++ b/lib/instrumentation/run-context/AsyncHooksRunContextManager.js
+@@ -60,6 +60,8 @@ class AsyncHooksRunContextManager extends BasicRunContextManager {
+       return
+     }
+
++    process._rawDebug(`${' '.repeat(triggerAsyncId % 80)}${type}(${asyncId}): triggerAsyncId=${triggerAsyncId} executionAsyncId=${asyncHooks.executionAsyncId()}`);
 +
-     const transaction = ins.currentTransaction
-     if (!transaction) return
-
+     const context = this._stack[this._stack.length - 1]
+     if (context !== undefined) {
+       this._runContextFromAsyncId.set(asyncId, context)
 ```
 
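The debug line in the DEVELOPMENT.md patch can also be tried outside the agent. The following standalone sketch installs an `init` async hook that formats the same message; the `seen` array and the timer used to trigger the hook are assumptions added for illustration. Note that `process._rawDebug` is an undocumented Node.js internal (used by the patch above because it writes synchronously), so treat its availability as an assumption.

```javascript
'use strict'
const asyncHooks = require('async_hooks')

// Collected lines, so they can be inspected after the fact (illustrative only).
const seen = []

const hook = asyncHooks.createHook({
  init (asyncId, type, triggerAsyncId) {
    // Indent by the triggering async ID so related operations line up visually.
    const indent = ' '.repeat(triggerAsyncId % 80)
    const line = `${indent}${type}(${asyncId}): triggerAsyncId=${triggerAsyncId} ` +
      `executionAsyncId=${asyncHooks.executionAsyncId()}`
    seen.push(line)
    // Synchronous write to stderr. console.log is avoided here because it can
    // itself create async resources and re-enter the hook.
    process._rawDebug(line)
  }
})

hook.enable()
// Scheduling a timer creates an async resource, which fires `init` synchronously.
const timer = setTimeout(() => {}, 10)
clearTimeout(timer)
hook.disable()
```

Running this prints one line per async resource created while the hook is enabled; the exact IDs vary between runs and Node.js versions.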

NOTICE.md

Lines changed: 14 additions & 0 deletions

@@ -1,3 +1,6 @@
+apm-agent-nodejs
+Copyright 2011-2021 Elasticsearch B.V.
+
 # Notice
 
 This project contains several dependencies which have been vendored in
@@ -95,3 +98,14 @@ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
 OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
 THE SOFTWARE.
+
+## opentelemetry-js
+
+- **path:** [lib/instrumentation/run-context/](lib/instrumentation/run-context/)
+- **author:** OpenTelemetry Authors
+- **project url:** https://github.com/open-telemetry/opentelemetry-js
+- **original file:** https://github.com/open-telemetry/opentelemetry-js/tree/main/packages/opentelemetry-context-async-hooks/src
+- **license:** Apache License 2.0, https://github.com/open-telemetry/opentelemetry-js/blob/main/packages/opentelemetry-context-async-hooks/LICENSE
+
+Parts of "lib/instrumentation/run-context" have been adapted from or influenced
+by TypeScript code in `@opentelemetry/context-async-hooks`.

lib/agent.js

Lines changed: 13 additions & 10 deletions

@@ -50,26 +50,26 @@ function Agent () {
 
 Object.defineProperty(Agent.prototype, 'currentTransaction', {
   get () {
-    return this._instrumentation.currentTransaction
+    return this._instrumentation.currTransaction()
   }
 })
 
 Object.defineProperty(Agent.prototype, 'currentSpan', {
   get () {
-    return this._instrumentation.currentSpan
+    return this._instrumentation.currSpan()
   }
 })
 
 Object.defineProperty(Agent.prototype, 'currentTraceparent', {
   get () {
-    const current = this.currentSpan || this.currentTransaction
+    const current = this._instrumentation.currSpan() || this._instrumentation.currTransaction()
     return current ? current.traceparent : null
   }
 })
 
 Object.defineProperty(Agent.prototype, 'currentTraceIds', {
   get () {
-    return this._instrumentation.ids
+    return this._instrumentation.ids()
   }
 })
 
@@ -86,6 +86,9 @@ Object.defineProperty(Agent.prototype, 'currentTraceIds', {
 // - There may be in-flight tasks (in ins.addEndedSpan() and
 // agent.captureError() for example) that will complete after this destroy
 // completes. They should have no impact other than CPU/resource use.
+// - The patching of core node functions when `asyncHooks=false` is *not*
+// undone. This means run context tracking for `asyncHooks=false` is broken
+// with in-process multiple-Agent use.
 Agent.prototype.destroy = function () {
   if (this._transport && this._transport.destroy) {
     this._transport.destroy()
@@ -275,27 +278,27 @@ Agent.prototype.setFramework = function ({ name, version, overwrite = true }) {
 }
 
 Agent.prototype.setUserContext = function (context) {
-  var trans = this.currentTransaction
+  var trans = this._instrumentation.currTransaction()
   if (!trans) return false
   trans.setUserContext(context)
   return true
 }
 
 Agent.prototype.setCustomContext = function (context) {
-  var trans = this.currentTransaction
+  var trans = this._instrumentation.currTransaction()
   if (!trans) return false
   trans.setCustomContext(context)
   return true
 }
 
 Agent.prototype.setLabel = function (key, value, stringify) {
-  var trans = this.currentTransaction
+  var trans = this._instrumentation.currTransaction()
   if (!trans) return false
   return trans.setLabel(key, value, stringify)
 }
 
 Agent.prototype.addLabels = function (labels, stringify) {
-  var trans = this.currentTransaction
+  var trans = this._instrumentation.currTransaction()
   if (!trans) return false
   return trans.addLabels(labels, stringify)
 }
@@ -419,11 +422,11 @@ Agent.prototype.captureError = function (err, opts, cb) {
   const handled = opts.handled !== false // default true
   const shouldCaptureAttributes = opts.captureAttributes !== false // default true
   const skipOutcome = Boolean(opts.skipOutcome)
-  const span = this.currentSpan
+  const span = this._instrumentation.currSpan()
   const timestampUs = (opts.timestamp
     ? Math.floor(opts.timestamp * 1000)
     : Date.now() * 1000)
-  const trans = this.currentTransaction
+  const trans = this._instrumentation.currTransaction()
   const traceContext = (span || trans || {})._context
 
   // As an added feature, for *some* cases, we capture a stacktrace at the point
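The `lib/agent.js` hunks replace property reads on the instrumentation object with method calls, while the public `agent.current*` getters stay the same. A minimal sketch of that getter pattern follows, using the `currentTraceparent` logic from the diff. `InstrumentationSketch` and `AgentSketch` are hypothetical stand-ins; the real `Instrumentation` derives the span and transaction from the current run context rather than from the plain fields used here.

```javascript
'use strict'

// Hypothetical stand-in for the agent's Instrumentation object. The fields
// `_span` and `_trans` exist only to keep this sketch self-contained.
function InstrumentationSketch () {
  this._span = null
  this._trans = null
}
InstrumentationSketch.prototype.currSpan = function () { return this._span }
InstrumentationSketch.prototype.currTransaction = function () { return this._trans }

function AgentSketch () {
  this._instrumentation = new InstrumentationSketch()
}

// Same shape as the getter in the diff: prefer the current span, fall back to
// the current transaction, otherwise null.
Object.defineProperty(AgentSketch.prototype, 'currentTraceparent', {
  get () {
    const current = this._instrumentation.currSpan() || this._instrumentation.currTransaction()
    return current ? current.traceparent : null
  }
})

// Usage: the public property keeps working while the internals are methods.
const agent = new AgentSketch()
const withNothing = agent.currentTraceparent
agent._instrumentation._trans = { traceparent: 'trans-traceparent' }
const withTransOnly = agent.currentTraceparent
agent._instrumentation._span = { traceparent: 'span-traceparent' }
const withSpan = agent.currentTraceparent
```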

lib/instrumentation/async-hooks.js

Lines changed: 0 additions & 113 deletions
This file was deleted.

lib/instrumentation/http-shared.js

Lines changed: 4 additions & 10 deletions

@@ -18,8 +18,8 @@ exports.instrumentRequest = function (agent, moduleName) {
 
         if (isRequestBlacklisted(agent, req)) {
           agent.logger.debug('ignoring blacklisted request to %s', req.url)
-          // don't leak previous transaction
-          agent._instrumentation.currentTransaction = null
+          // Don't leak previous transaction.
+          agent._instrumentation.supersedeWithEmptyRunContext()
         } else {
           var traceparent = req.headers.traceparent || req.headers['elastic-apm-traceparent']
           var tracestate = req.headers.tracestate
@@ -152,7 +152,7 @@ exports.traceOutgoingRequest = function (agent, moduleName, method) {
       // however a traceparent header must still be propagated
       // to indicate requested services should not be sampled.
       // Use the transaction context as the parent, in this case.
-      var parent = span || agent.currentTransaction
+      var parent = span || ins.currTransaction()
       if (parent && parent._context) {
         const headerValue = parent._context.toTraceParentString()
         const traceStateValue = parent._context.toTraceStateString()
@@ -181,7 +181,7 @@ exports.traceOutgoingRequest = function (agent, moduleName, method) {
       // Or if it's somehow preferable to listen for when a `response` listener
       // is added instead of when `response` is emitted.
       const emit = req.emit
-      req.emit = function (type, res) {
+      req.emit = function wrappedEmit (type, res) {
         if (type === 'response') onresponse(res)
         if (type === 'abort') onAbort(type)
         return emit.apply(req, arguments)
@@ -228,17 +228,11 @@ exports.traceOutgoingRequest = function (agent, moduleName, method) {
       }
 
       function onresponse (res) {
-        // Work around async_hooks bug in Node.js 12.0 - 12.2 (https://github.com/nodejs/node/pull/27477)
-        ins._recoverTransaction(span.transaction)
-
         agent.logger.debug('intercepted http.ClientRequest response event %o', { id: id })
         ins.bindEmitter(res)
-
         statusCode = res.statusCode
-
         res.prependListener('end', function () {
           agent.logger.debug('intercepted http.IncomingMessage end event %o', { id: id })
-
           onEnd()
         })
       }
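The `wrappedEmit` hunk relies on replacing an emitter's `emit` method so that an event can be observed before any listener runs. Below is a small, self-contained sketch of that technique on a plain `EventEmitter`; `interceptEmit` and the `order` array are hypothetical names for illustration, and the real code wraps an `http.ClientRequest` with more logic around it.

```javascript
'use strict'
const { EventEmitter } = require('events')

// Hypothetical helper: call onEvent for every emitted event, then delegate to
// the original emit so normal listeners still run.
function interceptEmit (emitter, onEvent) {
  const emit = emitter.emit
  emitter.emit = function wrappedEmit (type, ...args) {
    onEvent(type, ...args)
    return emit.apply(emitter, [type, ...args])
  }
  return emitter
}

// Usage: the interceptor sees 'response' before the registered listener does.
const order = []
const req = interceptEmit(new EventEmitter(), type => order.push(`intercepted:${type}`))
req.on('response', () => order.push('listener:response'))
const hadListeners = req.emit('response', { statusCode: 200 })
```

Naming the wrapper function, as the commit does with `wrappedEmit`, has no effect on behavior but makes it easier to recognize in stack traces.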
