Blog on "What's new in Network Observability 1.9" #26

Open
wants to merge 8 commits into
base: main

stleerh (Contributor) commented Jul 17, 2025

  • Let me know if I should leave in Enable IPsec on OVN-Kubernetes. I think it's useful since the documentation is not that clear, but I admit it's not directly related to Network Observability.

  • We should spell IPsec like this (lowercase 's') and not IPSec (capital 'S').

  • Is there a doc that provides all the field names for creating a flowlogs-pipeline filter query or the query for Network Observability CLI?

github-actions bot commented Jul 17, 2025

🚀 PR Preview for netobserv.io has been successfully deployed!
It's available at https://netobserv-io-blog-26-preview.surge.sh and will be removed when the PR closes.

```
ovnkube-node-dtq28 7/8 Running 8 4m11s
```

However, Network Observability will report an "AxiosError" if using Loki. Wait at least another 10 minutes or more until the service CA (Certificate Authority) is updated.

Contributor

That sentence makes me feel like there is a bug on our side 😆

I would prefer to simply remove it, since you mention above that connectivity to the cluster will be lost.
Otherwise, we should force the service CA to update faster, if possible.

Contributor Author

I took Mehul's suggestion to run the `oc adm wait-for-stable-cluster` command, which waits for the cluster to become stable.


### Network Observability - IPsec feature

In **Observe > Network Traffic, Traffic flows tab**, it adds a new column **IPSec Status** that has the possible values of "success", "error", or "n/a" (Figure 2).

Member

btw, on "IPsec" versus "IPSec": I know the former is correct for the acronym. I guess it's more debatable when it's part of a JSON/camel-case name, such as IPSecStatus.
I've opened this PR to fix just the column/filter display name, without changing the underlying JSON name: netobserv/network-observability-operator#1778

Member

(I thought I changed that already before releasing ... looks like I missed some places 😥)

jotak (Member) commented Jul 22, 2025

Thanks @stleerh! Nice blog, as always :-)
I think it's nice to have this little aside on setting up IPsec.

memodi (Member) left a comment

nice work @stleerh

```
to: 9001
```

Make the two changes with the comment "change" to enable IPsec. Reduce the network MTU by 46 bytes, which are needed by the ESP header to do IPsec encryption. My current network MTU value was 8901 (using jumbo frames) and was reduced to 8855. You also need to provide the machine or physical MTU on the interface even though it won't be changed.

Member

I am confused: if additional ESP header bytes are added to existing packets, why does the MTU need to be reduced rather than increased to avoid fragmentation?

Contributor Author

It does seem counter-intuitive, but here's the explanation. There are two MTUs, the machine MTU for the NIC and the overlay network MTU for OVN-Kubernetes. You don't want to exceed the machine MTU as that causes fragmentation, and when using ESP, fragmented packets can even get dropped.

In my case, the machine MTU was 9001 and the network MTU was 8901. The network MTU is 100 bytes lower to leave room for the Geneve overhead. We need to reduce it by another 46 bytes, to 8855, for the IPsec overhead. Packets generated by the pods will therefore not exceed 8855 bytes, and once the overheads are added, they shouldn't exceed the machine MTU of 9001.
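
To make the arithmetic concrete, here is a minimal Python sketch of the calculation. The function and constant names are hypothetical, and the overhead values (100 bytes for Geneve, 46 bytes for ESP) are simply the numbers quoted in this thread, so check them against your own cluster before relying on them.

```python
# Overheads taken from the numbers discussed in this thread (assumptions,
# not values read from a cluster).
GENEVE_OVERHEAD = 100    # bytes reserved for the Geneve overlay
IPSEC_ESP_OVERHEAD = 46  # extra bytes reserved for the ESP header


def overlay_mtu(machine_mtu: int, ipsec: bool = True) -> int:
    """Return the largest overlay network MTU that still fits in machine_mtu."""
    overhead = GENEVE_OVERHEAD + (IPSEC_ESP_OVERHEAD if ipsec else 0)
    if machine_mtu <= overhead:
        raise ValueError("machine MTU is too small for the encapsulation overhead")
    return machine_mtu - overhead


print(overlay_mtu(9001, ipsec=False))  # 8901, the network MTU before IPsec
print(overlay_mtu(9001))               # 8855, the network MTU with IPsec
```

The machine MTU stays fixed at 9001. Only the overlay value shrinks, so that a pod packet plus both headers still fits on the wire.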

Member

got it, thanks for the explanation.

Thanks! had the same question :)

```
ovnkube-node-dtq28 7/8 Running 8 4m11s
```

However, Network Observability will report an "AxiosError" if using Loki. Wait at least another 10 minutes or more until the service CA (Certificate Authority) is updated.

Member

I don't think I ran into that error; perhaps I waited long enough for the cluster and workloads to stabilize. I usually use this command:

oc adm wait-for-stable-cluster

Contributor Author

Thanks. This works better, although I am seeing other problems.

jotak (Member) left a comment

LGTM!

msherif1234 (Contributor) left a comment

Nice blog! Just a few comments.


To enable IPsec, follow the instructions on [Configuring IPsec encryption](https://docs.redhat.com/en/documentation/openshift_container_platform/4.19/html/networking/network-security#configuring-ipsec-ovn). If you didn't enable IPsec during cluster installation, it's a bit tricky to set up, so I've provided a quick guide below on setting this up in a test environment.

Network Observability can identify encrypted IPsec traffic between pods. In the OpenShift web console, when you create the FlowCollector instance, scroll down to the **Agent configuration** and open up the section named **Features**. In the dropdown for **Value**, select **IPSec** as shown in Figure 1.

Contributor

You mean between nodes, not pods, right?

Contributor Author

This is referring to the actual IPsec traffic, which is either pod-to-pod or host-to-pod. It mentions (in line 43):

IPsec flows always appear as node-to-node traffic, but they are actually encapsulated pod-to-pod or host-to-pod traffic.

Figure 2: Flows table with IPSec Status column

IPsec flows always appear as node-to-node traffic, but they are actually encapsulated pod-to-pod or host-to-pod traffic. There are two types of encapsulation used for IPsec-encrypted flows. The first is ESP encapsulation, which is the traditional IPsec mode. ESP packets don't have ports, hence the ports are `n/a`. The second is UDP encapsulation. In the table, the destination port is 6081, so they are OVN Geneve tunnel traffic. If you only see UDP encapsulated traffic (no ESP), then you must have configured `encapsulation: Always` when configuring IPsec.
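
As a rough illustration of the two cases described above, the following Python sketch labels a flow by its outer header only. The function name and the returned labels are hypothetical and are not part of the Network Observability flow schema. The protocol numbers (50 for ESP, 17 for UDP) and port 6081 for Geneve are the IANA-assigned values.

```python
from typing import Optional

ESP_PROTOCOL = 50   # IANA protocol number for ESP
UDP_PROTOCOL = 17   # IANA protocol number for UDP
GENEVE_PORT = 6081  # IANA-assigned UDP port for Geneve


def outer_encapsulation(proto: int, dst_port: Optional[int]) -> str:
    """Label a flow by its outer header (hypothetical labels, for illustration)."""
    if proto == ESP_PROTOCOL:
        # ESP has no ports, which is why the table shows them as n/a.
        return "esp"
    if proto == UDP_PROTOCOL and dst_port == GENEVE_PORT:
        # UDP to port 6081 is OVN Geneve tunnel traffic.
        return "udp-geneve"
    return "other"


print(outer_encapsulation(50, None))  # esp
print(outer_encapsulation(17, 6081))  # udp-geneve
print(outer_encapsulation(6, 443))    # other
```

This only describes the outer packet. Whether a flow was actually encrypted is what the IPSec Status column reports.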

Contributor

Should we also refer to the IPsec-related filters?

stleerh (Contributor Author), Jul 25, 2025

Are there more filters than the "IPSec Status" column?

Contributor

I don't think so. @jpinsonneau, please confirm.

Member

No, there's only that one.


### Enable IPsec on OVN-Kubernetes

Here are the steps to enable IPsec on OVN-Kubernetes. You can skip this section if you already have IPsec enabled.

Contributor

This should be replaced with a pointer to the OCP IPsec doc, IMO.

Contributor Author

I do have a link to the OCP IPsec doc earlier (line 19). Because there are some non-trivial issues that the doc doesn't point out, I feel leaving this section in has value.

Contributor

I just don't think it's a good idea for the netobserv doc to teach OCP users about IPsec; it seems out of scope to me. We should probably open an OCP doc bug if the IPsec doc is incomplete or missing critical info. WDYT?

Member

IMO, as long as it's the upstream / community blog, that's OK; we have a broader "freedom of expression" to mention this kind of thing. But I'm not sure it will fly unchanged if you submit it to the RH blog ...

stleerh (Contributor Author), Jul 31, 2025

@msherif1234 Actually, the OCP doc on configuring IPsec is complete in terms of explaining and covering every possible combination (ten sections from 3.6.1 to 3.6.10), whereas this blog focuses on the scenario supported by Network Observability. These ten sections do not cover prerequisites, such as reducing the MTU, which is covered in separate documentation. They also don't highlight what to expect when you issue some of these commands (e.g. needing to wait a few minutes), which is typically not their documentation style.

They are documenting it holistically from an IPsec point of view, but this blog just wants you to be able to set up IPsec properly, and well enough that you can view the traffic using Network Observability. If you're unable to do that without having to read pages and pages of IPsec and MTU documentation, then the blog is not going to be very useful. In the end, if you already have IPsec enabled (unlikely), then it's harmless and you can skip this section, which is what it tells you to do.

Contributor Author

Technically, I did this in previous blogs, such as how to set up UDN, but it was a lot shorter to explain that.

leandroberetta left a comment

Nice blog! Just made a few minor comments to align with the correct spelling of "IPsec".


To enable IPsec, follow the instructions on [Configuring IPsec encryption](https://docs.redhat.com/en/documentation/openshift_container_platform/4.19/html/networking/network-security#configuring-ipsec-ovn). If you didn't enable IPsec during cluster installation, it's a bit tricky to set up, so I've provided a quick guide below on setting this up in a test environment.

Network Observability can identify encrypted IPsec traffic between pods. In the OpenShift web console, when you create the FlowCollector instance, scroll down to the **Agent configuration** and open up the section named **Features**. In the dropdown for **Value**, select **IPSec** as shown in Figure 1.

Last IPsec mention needs to have the 's' in lowercase.

Member

That's unfortunate, but @stleerh was right to spell it that way, because it's how it was written in the API: https://github.com/netobserv/network-observability-operator/blob/main/api/flowcollector/v1beta2/flowcollector_types.go#L194
The problem is that we can't change the API that easily once it has been released; that would be a breaking change for users who have already set up IPsec.
Or maybe we could make a change to accept both spellings? That wouldn't be a breaking change ...

jotak (Member), Jul 25, 2025

(that's actually the same problem with the eBPF manager, if we wanted to adopt the correct acronym in spelling, that should be "eBPFManager")

Yes, in this case, I didn't realize this was a value used by the API. We can dismiss this suggestion.

In **Observe > Network Traffic, Traffic flows tab**, it adds a new column **IPSec Status** that has the possible values of "success", "error", or "n/a" (Figure 2).

![Flows table with IPsec status](flows-ipsec_status.png)<br>
Figure 2: Flows table with IPSec Status column

IPsec with s lowercase

| `--enable_ipsec` | Enable eBPF IPsec tracking feature |
| `--enable_network_events` | Enable eBPF Network Events feature |
| `--enable_udn_mapping` | Enable eBPF UDN Mapping feature |
| `--sampling` | Set sampling interval, defaults to 1 |

Member

I see you specifically changed sampling ratio to interval, can you elaborate why?
Previously I wanted to change "rate" to ratio because IMO "rate" isn't very accurate (we tend to understand rate as "something per-second", right?)
But interval doesn't seem more accurate either?

stleerh (Contributor Author), Jul 29, 2025

A ratio, such as 1:10 or 3:2, is the wrong term as it requires two values, the antecedent (1st value) and the consequent (2nd value). It typically reads like "1 in 10" or "1 for 10". In what we call "sampling", the antecedent is always 1 and the sampling value is the consequent. I don't think anyone will understand this if we call it the "sampling consequent".

It turns out the common term for this is "sampling interval". In fact, you can Google search what a "sampling interval" is. Here's one definition.

> Sampling interval is the distance or time between which measurements are taken, or data is recorded. In research terms, also referred to as ‘nth selection’, this is when we select every nth participant ([sampling unit](http://www.djsresearch.co.uk/glossary/item/Sampling-Unit)) in the list; this sampling interval produces [a random selection](http://www.djsresearch.co.uk/glossary/item/Random-Sampling) from throughout the total population.

"Sampling rate" is, in fact, wrong. Rate is the inverse of interval. That is:

rate = 1 / interval
interval = 1 / rate

Therefore, if you say the sampling rate is 10, that means you are doing something 10 times per second (assuming the unit is in seconds), rather than 1 out of 10.
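
A small Python sketch of the "every nth" idea may help. It is only an illustration of the definition quoted above: the function name is hypothetical, and it is not a description of how the eBPF agent actually selects packets.

```python
def sample_every_nth(items, interval):
    """Keep one item out of every `interval` items; interval=1 keeps everything."""
    if interval < 1:
        raise ValueError("interval must be >= 1")
    return [item for index, item in enumerate(items) if index % interval == 0]


packets = list(range(1, 21))           # 20 packets, numbered 1..20
kept = sample_every_nth(packets, 10)   # sampling interval of 10

print(kept)                      # [1, 11]
print(len(kept) / len(packets))  # 0.1, i.e. 1 / interval
```

With an interval of 10, 2 of the 20 packets are kept, so the fraction kept is the inverse of the interval. With the default of 1, every packet is kept.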

Member

Awesome, I didn't know this meaning of "interval", I've been struggling to find an accurate term all this time
So now we need to update the texts & doc everywhere 😄

jotak (Member) commented Jul 30, 2025

@stleerh is it good to merge?

stleerh (Contributor Author) commented Aug 6, 2025

> @stleerh is it good to merge?

Let's merge.

6 participants