@jotak
Member

@jotak commented Oct 14, 2025

  • Start implementing TLS, by reading the TLS header when present
  • Extract SSL version
  • Report the TLS version in output records
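For reference, every TLS record starts with a 5-byte header: a 1-byte content type, a 2-byte protocol version (major, minor) and a 2-byte big-endian length. Reading "the TLS header when present" comes down to parsing those bytes. The following is a minimal userspace C sketch, not the PR's eBPF code. The struct and function names are hypothetical. Before any read, it checks that enough bytes remain, in the same spirit as the `data_end` checks the eBPF verifier requires.

```c
#include <stddef.h>
#include <stdint.h>

#define TLS_RECORD_HDR_LEN 5

/* Parsed TLS record header (RFC 5246 section 6.2.1, RFC 8446 section 5.1). */
struct tls_record_hdr {
    uint8_t content_type; /* e.g. 0x16 handshake, 0x17 application data */
    uint8_t major;        /* 3 for SSL 3.0 and every TLS version */
    uint8_t minor;        /* 1 = TLS 1.0, 2 = TLS 1.1, 3 = TLS 1.2 (also used by TLS 1.3 records) */
    uint16_t length;      /* record payload length, converted to host order */
};

/* Parse the record header at the start of a TCP payload.
 * Returns 0 on success, -1 if fewer than 5 bytes are available. */
int parse_tls_record_hdr(const uint8_t *data, const uint8_t *data_end,
                         struct tls_record_hdr *out) {
    /* Bounds check before any read. */
    if (data_end < data || (size_t)(data_end - data) < TLS_RECORD_HDR_LEN) {
        return -1;
    }
    out->content_type = data[0];
    out->major = data[1];
    out->minor = data[2];
    out->length = (uint16_t)((data[3] << 8) | data[4]); /* big-endian on the wire */
    return 0;
}

/* Same encoding as the PR uses: major in the high byte, minor in the low
 * byte, so TLS 1.2 is 0x0303. */
uint16_t tls_record_version(const struct tls_record_hdr *rec) {
    return (uint16_t)(((uint16_t)rec->major << 8) | rec->minor);
}
```

One caveat: TLS 1.3 (RFC 8446) sets the record-layer version to 0x0303, the same value as TLS 1.2. The record header alone therefore cannot tell those two versions apart.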

Dependencies

@openshift-ci

openshift-ci bot commented Oct 14, 2025

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@openshift-ci

openshift-ci bot commented Oct 14, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign oliviercazade for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@jotak changed the title from "WIP TLS" to "NETOBSERV-2471: TLS usage tracking" Oct 30, 2025
@openshift-ci-robot
Collaborator

openshift-ci-robot commented Oct 30, 2025

@jotak: This pull request references NETOBSERV-2471 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the spike to target the "4.21.0" version, but no target version was set.

In response to this:

(WIP status: pretty much done, except for the handshake version (see code comment); it may need an experimental feature gate. Correctness still needs to be verified, as I'm getting fewer TLS flows than expected. Am I missing something?)

use tls.VersionName

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

- Start implementing TLS, by reading the TLS header when present
- Extract SSL version (not done yet for the handshake message)
- Report the TLS version in output records
@openshift-ci-robot
Collaborator

openshift-ci-robot commented Oct 30, 2025

@jotak: This pull request references NETOBSERV-2471 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the spike to target the "4.21.0" version, but no target version was set.

In response to this:

  • Start implementing TLS, by reading the TLS header when present
  • Extract SSL version (not done yet for the handshake message)
  • Report the TLS version in output records

TODO: handshake message: the version is stored a couple of bytes further
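Concretely, the extra bytes are the handshake message header. The 5-byte record header is followed by a 1-byte handshake type and a 3-byte length, and then by the 2-byte `legacy_version` field of a ClientHello or ServerHello. A minimal sketch of reading that field follows, in userspace C. The function and constant names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define TLS_RECORD_HDR_LEN 5
#define TLS_CONTENT_TYPE_HANDSHAKE 0x16
#define TLS_HS_CLIENT_HELLO 0x01
#define TLS_HS_SERVER_HELLO 0x02

/* Returns the legacy_version of a ClientHello/ServerHello whose TLS record
 * starts at `data`, or 0 if the record is not such a message. */
uint16_t tls_hello_version(const uint8_t *data, size_t len) {
    /* 5 (record header) + 1 (handshake type) + 3 (handshake length) + 2 (version) */
    if (data == NULL || len < TLS_RECORD_HDR_LEN + 6) {
        return 0;
    }
    if (data[0] != TLS_CONTENT_TYPE_HANDSHAKE) {
        return 0;
    }
    const uint8_t *hs = data + TLS_RECORD_HDR_LEN;
    if (hs[0] != TLS_HS_CLIENT_HELLO && hs[0] != TLS_HS_SERVER_HELLO) {
        return 0;
    }
    /* hs[1..3] is the 24-bit handshake length; legacy_version follows it. */
    return (uint16_t)((hs[4] << 8) | hs[5]);
}
```

In TLS 1.3, `legacy_version` is also frozen at 0x0303. The negotiated version is carried in the `supported_versions` extension, which this sketch does not parse.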

@openshift-ci-robot
Collaborator

openshift-ci-robot commented Oct 31, 2025

@jotak: This pull request references NETOBSERV-2471 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the spike to target the "4.21.0" version, but no target version was set.

In response to this:

  • Start implementing TLS, by reading the TLS header when present
  • Extract SSL version (not done yet for the handshake message)
  • Report the TLS version in output records

TODO: handshake message: the version is stored a couple of bytes further

Dependencies

@jotak marked this pull request as ready for review October 31, 2025 15:27
@openshift-ci-robot
Collaborator

openshift-ci-robot commented Oct 31, 2025

@jotak: This pull request references NETOBSERV-2471 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the spike to target the "4.21.0" version, but no target version was set.

In response to this:

  • Start implementing TLS, by reading the TLS header when present
  • Extract SSL version
  • Report the TLS version in output records

Dependencies


if (enable_dns_tracking) {
    dns_errno = track_dns_packet(skb, &pkt);
}
track_tls_version(skb, &pkt);
Contributor

Don't you think it would be better to have a feature config for this tracker, like we do for all other features, instead of enabling it by default?

Member Author

Yes, I'll do that (although I would like to have it enabled by default if it doesn't have a visible impact on performance).

if (pkt->ssl_version < aggregate_flow->ssl_version) {
    aggregate_flow->ssl_version = pkt->ssl_version;
}
aggregate_flow->misc_flags |= MISC_FLAGS_SSL_MISMATCH;
Contributor

Would it be better to have a mismatch counter instead of a flag?

Member Author

The flag allows us to correlate precisely with a particular flow, and to report that up to the UI.
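The diff hunk above omits the condition that guards the mismatch flag. The following is therefore only a guess at the intended merge semantics: keep the lowest version seen on the flow, and set a flag when packets disagree. The struct, field and flag names below are hypothetical stand-ins for `aggregate_flow->ssl_version` and `MISC_FLAGS_SSL_MISMATCH`, and the flag's bit value is an assumption.

```c
#include <stdint.h>

/* Hypothetical flag bit; the PR's real MISC_FLAGS_SSL_MISMATCH value may differ. */
#define SSL_MISMATCH_FLAG 0x01

struct flow_metrics_sketch {
    uint16_t ssl_version; /* 0 = no TLS record seen yet on this flow */
    uint8_t misc_flags;
};

/* Merge one packet's TLS version into the aggregated flow. */
void merge_ssl_version(struct flow_metrics_sketch *flow, uint16_t pkt_version) {
    if (pkt_version == 0) {
        return; /* packet carried no readable TLS record */
    }
    if (flow->ssl_version == 0) {
        flow->ssl_version = pkt_version; /* first TLS packet of the flow */
        return;
    }
    if (pkt_version != flow->ssl_version) {
        /* Keep the lowest (least secure) version and remember the mismatch
         * so it can be reported on this specific flow. */
        if (pkt_version < flow->ssl_version) {
            flow->ssl_version = pkt_version;
        }
        flow->misc_flags |= SSL_MISMATCH_FLAG;
    }
}
```

Compared with a counter, the flag costs a single bit and stays attached to the exact flow, which matches the goal of surfacing it in the UI. A counter would additionally say how often the versions disagreed.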


// Extract TLS info
static inline void track_tls_version(struct __sk_buff *skb, pkt_info *pkt) {
    if (pkt->id->transport_protocol == IPPROTO_TCP) {
Contributor

Do you think this works for SCTP too? Also, I'm not sure what the story is for DTLS.

Member Author

I haven't dug into other protocols; that might be something to do if someone asks for it, but I believe TCP covers most expectations.

struct tcphdr *tcp = (struct tcphdr *)pkt->l4_hdr;
if (!tcp || ((void *)tcp + sizeof(*tcp) > data_end)) {
    return;
}
Contributor

Does the verifier need the above check? Given how late we call this tracker, all headers should already have been validated.

Member Author

Yes, I got a verifier error without those checks.

};

// Extract TLS info
static inline void track_tls_version(struct __sk_buff *skb, pkt_info *pkt) {
Contributor

It might be a good idea to add a return code and track failures at the caller: there are many condition checks in this function, and if things don't work it will be hard to debug.
It might also be good to add BPF_PRINTK() on failing checks?
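One way to implement this suggestion is to have the tracker return a reason code and let the caller count each outcome. The sketch below is hypothetical, since none of these names come from the PR, and it runs in userspace. In eBPF the counters could plausibly live in a per-CPU array map indexed by the result code. That is cheap enough to keep enabled, while debug prints are more useful during development.

```c
#include <netinet/in.h> /* IPPROTO_TCP, IPPROTO_UDP */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical outcome codes for a TLS tracker. */
enum tls_track_result {
    TLS_TRACK_OK = 0,
    TLS_TRACK_NOT_TCP,
    TLS_TRACK_SHORT_PAYLOAD,
    TLS_TRACK_UNKNOWN_CONTENT_TYPE,
    TLS_TRACK_RESULT_MAX, /* number of codes, used to size the counters */
};

/* Caller-side counters, one slot per outcome. */
struct tls_track_stats {
    uint64_t count[TLS_TRACK_RESULT_MAX];
};

/* Simplified tracker: each early exit returns a distinct reason. */
enum tls_track_result track_tls_sketch(uint8_t l4_proto, const uint8_t *payload,
                                       size_t len, uint16_t *version_out) {
    if (l4_proto != IPPROTO_TCP) {
        return TLS_TRACK_NOT_TCP;
    }
    if (payload == NULL || len < 5) {
        return TLS_TRACK_SHORT_PAYLOAD;
    }
    if (payload[0] < 0x14 || payload[0] > 0x17) {
        return TLS_TRACK_UNKNOWN_CONTENT_TYPE;
    }
    *version_out = (uint16_t)((payload[1] << 8) | payload[2]);
    return TLS_TRACK_OK;
}

/* Record the outcome so failures are visible without printing per packet. */
void record_result(struct tls_track_stats *stats, enum tls_track_result res) {
    if ((unsigned)res < (unsigned)TLS_TRACK_RESULT_MAX) {
        stats->count[res]++;
    }
}
```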


switch (rec.content_type) {
case CONTENT_TYPE_HANDSHAKE: {
    pkt->ssl_version = ((u16)rec.major) << 8 | rec.minor;
Contributor

Why set it here if you really want the handshake SSL version?

Member Author

That's a fallback for the case where the handshake header couldn't be read... But I'm going to change that. Actually, perhaps we don't want the ClientHello version at all: it doesn't tell us what's actually going to be used.

case CONTENT_TYPE_ALERT:
case CONTENT_TYPE_APP_DATA:
    pkt->ssl_version = ((u16)rec.major) << 8 | rec.minor;
    break;
Contributor

Do you think adding the SSL record type to the flow would add value for the end user?

Member Author

You mean adding it to the flow, as a bitfield? Hmm, why not.
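Suppose the record type were added to the flow as a bitfield. One simple encoding maps each content type to one bit, counted from `CONTENT_TYPE_CHANGE_CIPHER`, and ORs the bits together as packets arrive. This is a hypothetical sketch, because the PR does not define such a field. The helper names are illustrative only.

```c
#include <stdint.h>

#define CONTENT_TYPE_CHANGE_CIPHER 0x14
#define CONTENT_TYPE_ALERT 0x15
#define CONTENT_TYPE_HANDSHAKE 0x16
#define CONTENT_TYPE_APP_DATA 0x17

/* Hypothetical: map a record content type to a single bit
 * (change_cipher_spec = 0x1, alert = 0x2, handshake = 0x4, app data = 0x8).
 * Unknown types map to 0 so they never set a bit. */
uint8_t tls_content_type_bit(uint8_t content_type) {
    if (content_type < CONTENT_TYPE_CHANGE_CIPHER ||
        content_type > CONTENT_TYPE_APP_DATA) {
        return 0;
    }
    return (uint8_t)(1u << (content_type - CONTENT_TYPE_CHANGE_CIPHER));
}

/* Accumulate the record types seen on a flow. */
void add_content_type(uint8_t *flow_types, uint8_t content_type) {
    *flow_types |= tls_content_type_bit(content_type);
}
```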

#define CONTENT_TYPE_CHANGE_CIPHER 0x14
#define CONTENT_TYPE_ALERT 0x15
#define CONTENT_TYPE_HANDSHAKE 0x16
#define CONTENT_TYPE_APP_DATA 0x17
Contributor

Based on this gist, it seems there is also a heartbeat content type:
https://gist.github.com/coin8086/1cd0411447066a5a02be6a3e493479e2
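RFC 6520 defines the heartbeat content type as 24 (0x18). The sketch below shows one way to extend the defines above with it, together with a name lookup that could back a human-readable report. `CONTENT_TYPE_HEARTBEAT` and `tls_content_type_name` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

#define CONTENT_TYPE_CHANGE_CIPHER 0x14
#define CONTENT_TYPE_ALERT 0x15
#define CONTENT_TYPE_HANDSHAKE 0x16
#define CONTENT_TYPE_APP_DATA 0x17
#define CONTENT_TYPE_HEARTBEAT 0x18 /* RFC 6520 */

/* Returns a printable name for a TLS record content type, or NULL if unknown. */
const char *tls_content_type_name(uint8_t content_type) {
    switch (content_type) {
    case CONTENT_TYPE_CHANGE_CIPHER:
        return "change_cipher_spec";
    case CONTENT_TYPE_ALERT:
        return "alert";
    case CONTENT_TYPE_HANDSHAKE:
        return "handshake";
    case CONTENT_TYPE_APP_DATA:
        return "application_data";
    case CONTENT_TYPE_HEARTBEAT:
        return "heartbeat";
    default:
        return NULL;
    }
}
```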

Contributor

Maybe something for this feature could be measuring the handshake latency?

Member Author

But that would require a new map, like we have for DNS, which probably has more impact on performance. If so, it should probably be a separate feature.

Contributor

@msherif1234 left a comment

Nice work! I left some comments/suggestions; nothing major, just small enhancements. Also, please share some screenshots of this new feature.

return;
}

switch (rec.content_type) {
Contributor

This logic assumes there is always a TLS record after the TCP header. That is not a safe assumption: we could have a DNS header here, or any other header.

Member Author

Right, I should harden that a bit more; I've seen unexpected TLSVersion values during my tests, probably because of that.
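Here is one possible hardening, sketched in userspace C. It assumes `payload` points at the start of the TCP payload, and it accepts a record header only if all of the following hold:

- the content type is a known TLS type (0x14 to 0x18);
- the major version is 3, which SSL 3.0 and every TLS version share;
- the minor version is at most 4;
- the length does not exceed 2^14 + 2048 bytes, the TLSCiphertext limit from RFC 5246.

This cuts down false positives from DNS, HTTP or other payloads but cannot eliminate them. Also, a TCP segment that starts in the middle of a TLS record will usually be rejected, or occasionally misread. The function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Upper bound on a record's length field: 2^14 + 2048 (TLSCiphertext limit
 * in RFC 5246); the TLS 1.3 limit (2^14 + 256) is smaller. */
#define TLS_MAX_RECORD_LEN ((1u << 14) + 2048u)

/* Heuristic check that a TCP payload begins with a TLS record header.
 * Returns 1 if plausible, 0 otherwise. */
int looks_like_tls_record(const uint8_t *payload, size_t len) {
    if (payload == NULL || len < 5) {
        return 0;
    }
    uint8_t type = payload[0];
    if (type < 0x14 || type > 0x18) { /* change_cipher_spec .. heartbeat */
        return 0;
    }
    if (payload[1] != 0x03) { /* major version is 3 for SSL 3.0 and all TLS */
        return 0;
    }
    if (payload[2] > 0x04) { /* minor 0 (SSL 3.0) .. 4 (TLS 1.3) */
        return 0;
    }
    unsigned rec_len = ((unsigned)payload[3] << 8) | payload[4];
    if (rec_len > TLS_MAX_RECORD_LEN) {
        return 0;
    }
    return 1;
}
```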

@jotak added the ok-to-test label (To set manually when a PR is safe to test. Triggers image build on PR.) Nov 3, 2025
@github-actions

github-actions bot commented Nov 3, 2025

New images:
quay.io/netobserv/ebpf-bytecode:16db482
quay.io/netobserv/netobserv-ebpf-agent:16db482

These will expire after two weeks.

To deploy this build, run from the operator repo, assuming the operator is running:

USER=netobserv VERSION=16db482 make set-agent-image

@openshift-ci

openshift-ci bot commented Nov 3, 2025

@jotak: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name                     Commit   Details  Required  Rerun command
ci/prow/netobserv-cli-tests   9d32f26  link     false     /test netobserv-cli-tests
ci/prow/qe-e2e-tests          9d32f26  link     false     /test qe-e2e-tests

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Labels

jira/valid-reference ok-to-test To set manually when a PR is safe to test. Triggers image build on PR.

3 participants