-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Go: template/text.Template
execution methods: support reading arbitrary content
#17701
Go: template/text.Template
execution methods: support reading arbitrary content
#17701
Conversation
or | ||
defaultImplicitTaintReadGlobal(node, c) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This also needs to be done in the corresponding file in tainttracking2
.
any(ImplicitFieldReadNode ifrn).shouldImplicitlyReadAllFields(node) and | ||
getElementTypeIncludingFields*(node.getType()) = containerType |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be slightly easier to understand if you replace this with or defaultImplicitTaintReadGlobal(node, c)
. (In theory you could also remove these lines and the overall functionality would be the same, but I think best to not do that.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think this new test folder is best in go/ql/test/library-tests/semmle/go/frameworks
? All the other folders in there are for tests of particular libraries. What about in go/ql/test/library-tests/semmle/go/dataflow
?
go/ql/test/library-tests/semmle/go/frameworks/serialization/texttemplate.go
Show resolved
Hide resolved
c0cbb4a
to
63ff0cb
Compare
Click to show differences in coveragegoGenerated file changes for go
- `Standard library <https://pkg.go.dev/std>`_,"````, ``archive/*``, ``bufio``, ``bytes``, ``cmp``, ``compress/*``, ``container/*``, ``context``, ``crypto``, ``crypto/*``, ``database/*``, ``debug/*``, ``embed``, ``encoding``, ``encoding/*``, ``errors``, ``expvar``, ``flag``, ``fmt``, ``go/*``, ``hash``, ``hash/*``, ``html``, ``html/*``, ``image``, ``image/*``, ``index/*``, ``io``, ``io/*``, ``log``, ``log/*``, ``maps``, ``math``, ``math/*``, ``mime``, ``mime/*``, ``net``, ``net/*``, ``os``, ``os/*``, ``path``, ``path/*``, ``plugin``, ``reflect``, ``reflect/*``, ``regexp``, ``regexp/*``, ``slices``, ``sort``, ``strconv``, ``strings``, ``sync``, ``sync/*``, ``syscall``, ``syscall/*``, ``testing``, ``testing/*``, ``text/*``, ``time``, ``time/*``, ``unicode``, ``unicode/*``, ``unsafe``",52,605,104
+ `Standard library <https://pkg.go.dev/std>`_,"````, ``archive/*``, ``bufio``, ``bytes``, ``cmp``, ``compress/*``, ``container/*``, ``context``, ``crypto``, ``crypto/*``, ``database/*``, ``debug/*``, ``embed``, ``encoding``, ``encoding/*``, ``errors``, ``expvar``, ``flag``, ``fmt``, ``go/*``, ``hash``, ``hash/*``, ``html``, ``html/*``, ``image``, ``image/*``, ``index/*``, ``io``, ``io/*``, ``log``, ``log/*``, ``maps``, ``math``, ``math/*``, ``mime``, ``mime/*``, ``net``, ``net/*``, ``os``, ``os/*``, ``path``, ``path/*``, ``plugin``, ``reflect``, ``reflect/*``, ``regexp``, ``regexp/*``, ``slices``, ``sort``, ``strconv``, ``strings``, ``sync``, ``sync/*``, ``syscall``, ``syscall/*``, ``testing``, ``testing/*``, ``text/*``, ``time``, ``time/*``, ``unicode``, ``unicode/*``, ``unsafe``",52,603,104
- Totals,,459,943,1532
+ Totals,,459,941,1532
- text/template,,,6,,,,,,,,,,,,,,,,,,,,,,,6,
+ text/template,,,4,,,,,,,,,,,,,,,,,,,,,,,4, |
fe808f7
to
d822d14
Compare
@smowton What is the status of this? Is it waiting for review, or are you still working on it? |
@owen-mc thanks to my currently blurry work situation I found time to fix it up. Currently running QA; please do review. |
QA results: new TPs from around 20 repos. Found and fixed an FP based on defining an interface that could be Significant slowdowns in two, which so far as I can see are simply caused by significantly more flow being possible when text-template instantiation is able to pull taint down from fields to top level. I propose to go ahead on grounds that the vast majority of repositories suffer no consequences, and that is real flow (though we could iterate to try to find heuristics to prevent excessively-expensive exploration if necessary). |
Observations: * revFlowThrough can be much larger than the other reverse-flow predicates, presumably when there are many different innerReturnAps. * It is only ever used in conjunction with flowThroughIntoCall, which can therefore be pushed in, and several of its parameters can thereby be dropped in exchange for exposing `arg`. * `revFlowThroughArg` can then be trivially inlined. Result: on repository `go-gitea/gitea` with PR github#17701 producing a wider selection of access paths than are seen on `main`, `revFlowThrough` drops in size from ~120m tuples to ~4m, and the runtime of the reverse-flow computation for dataflow stage 4 goes from dominating the forward-flow cost to relatively insignificant. Overall runtime falls from 3 minutes to 2 with substantial ram available, and presumably falls much more under GHA-style memory pressure.
Observations: * revFlowThrough can be much larger than the other reverse-flow predicates, presumably when there are many different innerReturnAps. * It is only ever used in conjunction with flowThroughIntoCall, which can therefore be pushed in, and several of its parameters can thereby be dropped in exchange for exposing `arg`. * `revFlowThroughArg` can then be trivially inlined. Result: on repository `go-gitea/gitea` with PR github#17701 producing a wider selection of access paths than are seen on `main`, `revFlowThrough` drops in size from ~120m tuples to ~4m, and the runtime of the reverse-flow computation for dataflow stage 4 goes from dominating the forward-flow cost to relatively insignificant. Overall runtime falls from 3 minutes to 2 with substantial ram available, and presumably falls much more under GHA-style memory pressure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is looking in good shape. I assume that we should wait for #18355 to be merged so that we can avoid the two significant slowdowns you found?
0988340
to
2580ea2
Compare
Observations: * revFlowThrough can be much larger than the other reverse-flow predicates, presumably when there are many different innerReturnAps. * It is only ever used in conjunction with flowThroughIntoCall, which can therefore be pushed in, and several of its parameters can thereby be dropped in exchange for exposing `arg`. * `revFlowThroughArg` can then be trivially inlined. Result: on repository `go-gitea/gitea` with PR github#17701 producing a wider selection of access paths than are seen on `main`, `revFlowThrough` drops in size from ~120m tuples to ~4m, and the runtime of the reverse-flow computation for dataflow stage 4 goes from dominating the forward-flow cost to relatively insignificant. Overall runtime falls from 3 minutes to 2 with substantial ram available, and presumably falls much more under GHA-style memory pressure.
Otherwise it's too easy to define a common interface to both text/template, which doesn't sanitize, and html/template, which does.
2580ea2
to
9504f36
Compare
Observations: * revFlowThrough can be much larger than the other reverse-flow predicates, presumably when there are many different innerReturnAps. * It is only ever used in conjunction with flowThroughIntoCall, which can therefore be pushed in, and several of its parameters can thereby be dropped in exchange for exposing `arg`. * `revFlowThroughArg` can then be trivially inlined. Result: on repository `go-gitea/gitea` with PR github#17701 producing a wider selection of access paths than are seen on `main`, `revFlowThrough` drops in size from ~120m tuples to ~4m, and the runtime of the reverse-flow computation for dataflow stage 4 goes from dominating the forward-flow cost to relatively insignificant. Overall runtime falls from 3 minutes to 2 with substantial ram available, and presumably falls much more under GHA-style memory pressure.
I rebased this on latest main and reran QA: found one large performance outlier, yaklang/yaklang. Tested whether the dataflow performance improvement I discovered before Christmas works there too; started a DCA experiment to check and reported more fully at #18355 |
Performance issue with yaklang resolved by #18515 |
Add a mechanism for specifying dataflow nodes that ought read taint through any access path, and use it to make
template/text
functions read taint out of fields.Note this PR currently has #18355 applied because it appears to significantly improve performance for some outlier repositories. This shouldn't be merged until that PR has been dealt with.