Add K8s-specific files to elastic-agent diagnostics bundle #9103
base: main
This pull request does not have a backport label. Could you fix it @pchila? 🙏
2a07a04 to b88d866
…ner logs collection for diagnostics
…iagnostics from the helm release
9f05aa5 to 0f17d47
💚 Build Succeeded
Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane)
@@ -1507,6 +1507,9 @@
"clusterRole": {
  "$ref": "#/definitions/AgentPresetClusterRole"
},
"role": {
Are all of these Helm Chart changes necessary in this PR?
oh @swiatekm you are probably onto something here, tell me which changes you think can go away? 🙂
Well, I'm mostly asking for the Helm Chart changes to be their own PR, because this one is already quite big, and giving agent permissions by default is something we should review more carefully, especially if it involves permissions to read Secrets in the namespace.
We can merge the diagnostics independently of the permissions, as they need to work gracefully even in their absence.
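To make "work gracefully even in their absence" concrete, here is a minimal sketch of one way a collector loop can tolerate a denied request: the failure is written into the archive as a note instead of aborting the whole bundle. This is not the code from this PR. `errForbidden`, `collector`, `writeBundle`, `listNames` and the file names are all hypothetical and only stand in for whatever the Kubernetes client would return.

```go
package main

import (
	"archive/zip"
	"bytes"
	"errors"
	"fmt"
)

// errForbidden is a hypothetical stand-in for an RBAC "forbidden" error
// returned by the Kubernetes API when a permission is missing.
var errForbidden = errors.New("forbidden: missing RBAC permission")

// collector is a hypothetical hook: a file name plus a function producing its content.
type collector struct {
	name  string
	fetch func() ([]byte, error)
}

// writeBundle runs every collector and does not abort when one of them fails.
// A failed collector is recorded as "<name>.error.txt" so the gap stays visible.
func writeBundle(collectors []collector) ([]byte, error) {
	var buf bytes.Buffer
	zw := zip.NewWriter(&buf)
	for _, c := range collectors {
		name := c.name
		data, err := c.fetch()
		if err != nil {
			name, data = c.name+".error.txt", []byte(err.Error())
		}
		w, err := zw.Create(name)
		if err != nil {
			return nil, fmt.Errorf("creating %s: %w", name, err)
		}
		if _, err := w.Write(data); err != nil {
			return nil, fmt.Errorf("writing %s: %w", name, err)
		}
	}
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// listNames returns the entry names of a zip archive held in memory.
func listNames(archive []byte) ([]string, error) {
	zr, err := zip.NewReader(bytes.NewReader(archive), int64(len(archive)))
	if err != nil {
		return nil, err
	}
	names := make([]string, 0, len(zr.File))
	for _, f := range zr.File {
		names = append(names, f.Name)
	}
	return names, nil
}

func main() {
	bundle, err := writeBundle([]collector{
		{"pod.yaml", func() ([]byte, error) { return []byte("kind: Pod\n"), nil }},
		{"secrets.yaml", func() ([]byte, error) { return nil, errForbidden }},
	})
	if err != nil {
		panic(err)
	}
	names, err := listNames(bundle)
	if err != nil {
		panic(err)
	}
	fmt.Println(names)
}
```

With this shape, missing permissions only change which files end up in the archive, so the permission changes could be reviewed separately.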
so if git isn't lying to me we have the following
file | changes |
---|---|
NOTICE-fips.txt | 873 +++++++++++++++++++++------------ |
NOTICE.txt | 873 +++++++++++++++++++++------------ |
deploy/helm/elastic-agent/examples/eck/rendered/manifest.yaml | 16 + |
deploy/helm/elastic-agent/examples/fleet-managed-certificates/rendered/manifest.yaml | 5 + |
deploy/helm/elastic-agent/examples/fleet-managed/rendered/manifest.yaml | 5 + |
deploy/helm/elastic-agent/examples/kubernetes-custom-output/rendered/manifest.yaml | 10 + |
deploy/helm/elastic-agent/examples/kubernetes-default/rendered/manifest.yaml | 10 + |
.../helm/elastic-agent/examples/kubernetes-hints-autodiscover/rendered/manifest.yaml | 10 + |
deploy/helm/elastic-agent/examples/kubernetes-ksm-sharding/rendered/manifest.yaml | 10 + |
deploy/helm/elastic-agent/examples/kubernetes-onboarding/rendered/manifest.yaml | 10 + |
deploy/helm/elastic-agent/examples/kubernetes-only-logs/rendered/manifest.yaml | 5 + |
deploy/helm/elastic-agent/examples/multiple-integrations/rendered/manifest.yaml | 10 + |
deploy/helm/elastic-agent/examples/netflow-service/rendered/manifest.yaml | 4 + |
deploy/helm/elastic-agent/examples/nginx-custom-integration/rendered/manifest.yaml | 4 + |
deploy/helm/elastic-agent/examples/priority-class/rendered/manifest.yaml | 10 + |
deploy/helm/elastic-agent/examples/statefulset-preset/rendered/manifest.yaml | 4 + |
deploy/helm/elastic-agent/examples/system-custom-auth-paths/rendered/manifest.yaml | 5 + |
deploy/helm/elastic-agent/examples/user-cluster-role/rendered/manifest.yaml | 4 + |
deploy/helm/elastic-agent/examples/user-service-account/rendered/manifest.yaml | 10 + |
deploy/helm/elastic-agent/templates/agent/_helpers.tpl | 2 + |
deploy/helm/elastic-agent/templates/agent/cluster-role.yaml | 1 + |
deploy/helm/elastic-agent/templates/agent/eck/daemonset.yaml | 7 + |
deploy/helm/elastic-agent/templates/agent/eck/deployment.yaml | 7 + |
deploy/helm/elastic-agent/templates/agent/eck/statefulset.yaml | 7 + |
deploy/helm/elastic-agent/templates/agent/k8s/daemonset.yaml | 4 + |
deploy/helm/elastic-agent/templates/agent/k8s/deployment.yaml | 4 + |
deploy/helm/elastic-agent/templates/agent/k8s/statefulset.yaml | 4 + |
deploy/helm/elastic-agent/templates/agent/role-binding.yaml | 38 ++ |
deploy/helm/elastic-agent/templates/agent/role.yaml | 37 ++ |
deploy/helm/elastic-agent/values.schema.json | 73 +++ |
go.mod | 2 +- |
internal/pkg/agent/application/actions/handlers/handler_action_diagnostics.go | 19 +- |
internal/pkg/agent/application/actions/handlers/handler_action_diagnostics_test.go | 4 +- |
internal/pkg/agent/cmd/run.go | 2 +- |
internal/pkg/diagnostics/diagnostics.go | 10 +- |
internal/pkg/diagnostics/diagnostics_k8s.go | 579 ++++++++++++++++++++++ |
internal/pkg/diagnostics/diagnostics_k8s_test.go | 1060 ++++++++++++++++++++++++++++++++++++++++ |
internal/pkg/diagnostics/diagnostics_test.go | 34 +- |
internal/pkg/diagnostics/testdata/helm.release.v1.secret.data | 1 + |
internal/pkg/diagnostics/testdata/helm.release.v2.secret.data | 1 + |
pkg/control/v2/server/server.go | 18 +- |
testing/integration/k8s/common.go | 94 ++++ |
testing/integration/k8s/kubernetes_agent_standalone_test.go | 88 ++++ |
So, with some calculations, 3032 changes are due to generated files and testing code.
Now, if we do split the diagnostics and Helm chart changes apart, we will reduce this PR by ~500 changes, which isn't that dramatic a difference. That said, no problem: just say the word and this PR gets split in two 🙂
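For anyone who wants to redo this kind of tally, a rough sketch of summing diffstat rows by group follows. The grouping rule in `isGeneratedOrTest` is an assumption made for illustration; it is not necessarily the rule behind the 3032 figure and may give a different total on the full table. `sumDiffstat` is likewise a hypothetical helper.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// isGeneratedOrTest is a hypothetical grouping rule: NOTICE files, rendered
// Helm examples, testdata, Go test files and the testing/ tree.
func isGeneratedOrTest(path string) bool {
	return strings.HasPrefix(path, "NOTICE") ||
		strings.Contains(path, "/rendered/") ||
		strings.Contains(path, "/testdata/") ||
		strings.HasSuffix(path, "_test.go") ||
		strings.HasPrefix(path, "testing/")
}

// sumDiffstat parses rows shaped like "path | 12 ++--" and returns the
// changed-line totals for the two groups. Rows without a number after the
// first "|" (header, separator) are skipped.
func sumDiffstat(stat string) (generatedOrTest, other int) {
	for _, line := range strings.Split(stat, "\n") {
		path, rest, ok := strings.Cut(line, "|")
		if !ok {
			continue
		}
		fields := strings.Fields(rest)
		if len(fields) == 0 {
			continue
		}
		n, err := strconv.Atoi(fields[0])
		if err != nil {
			continue // header or separator row
		}
		if isGeneratedOrTest(strings.TrimSpace(path)) {
			generatedOrTest += n
		} else {
			other += n
		}
	}
	return generatedOrTest, other
}

func main() {
	// A few rows copied from the table above.
	sample := `file | changes |
---|---|
NOTICE.txt | 873 +++++++++++++++++++++------------ |
go.mod | 2 +- |
internal/pkg/diagnostics/diagnostics_k8s.go | 579 ++++++++++++++++++++++ |
internal/pkg/diagnostics/diagnostics_k8s_test.go | 1060 ++++++++++++++++++++++++++++++++++++++++ |`
	gen, other := sumDiffstat(sample)
	fmt.Println("generated/test:", gen, "other:", other)
}
```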
What does this PR do?
This PR adds Kubernetes data to the elastic-agent diagnostics bundle.
The new diagnostics files are compressed into a .zip archive, elastic-agent-k8s.zip, within the elastic-agent diagnostics bundle.
Why is it important?
This should shorten the initial investigation time for issues with elastic-agent running on k8s by collecting as much information about the affected agents as possible, along with the usual diagnostics files.
Checklist
./changelog/fragments using the changelog tool
Disruptive User Impact
How to test this PR locally
SNAPSHOT=true EXTERNAL=true PACKAGES=docker DOCKER_VARIANTS=basic PLATFORMS="linux/amd64" mage -v clean package
elastic-agent diagnostics command
kubectl -n kube-system exec agent-clusterwide-elastic-agent-66cb9c54b7-v6k2s -c agent -- elastic-agent diagnostics -f /tmp/diag.zip
➜ scratch unzip -d diag ./diag.zip
Archive: ./diag.zip
  inflating: diag/version.txt
  inflating: diag/package.version
  inflating: diag/goroutine.pprof.gz
... more files here ...
  inflating: diag/elastic-agent-k8s.zip # <--- new file added by this PR
... more files here ...
   creating: diag/logs/
   creating: diag/logs/data/
  inflating: diag/logs/data/elastic-agent-20250725.ndjson
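Since the Kubernetes files sit in a zip inside the diagnostics zip, checking them means unpacking twice. The sketch below lists the entries of the nested archive without writing anything to disk. It assumes the nested archive is stored under the name elastic-agent-k8s.zip at the root of the bundle; `listNested`, `makeZip` and the `pod.yaml` entry are made up for the demo and are not taken from this PR.

```go
package main

import (
	"archive/zip"
	"bytes"
	"fmt"
	"io"
)

// nestedName is assumed to be the entry name of the nested archive in the bundle.
const nestedName = "elastic-agent-k8s.zip"

// entry is a hypothetical name/content pair used to build demo archives.
type entry struct {
	name string
	data []byte
}

// makeZip builds an in-memory zip from the given entries, in order.
func makeZip(entries ...entry) ([]byte, error) {
	var buf bytes.Buffer
	zw := zip.NewWriter(&buf)
	for _, e := range entries {
		w, err := zw.Create(e.name)
		if err != nil {
			return nil, err
		}
		if _, err := w.Write(e.data); err != nil {
			return nil, err
		}
	}
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// listNested opens the outer archive, finds the nested archive by name and
// returns the entry names found inside it.
func listNested(outer []byte, nested string) ([]string, error) {
	zr, err := zip.NewReader(bytes.NewReader(outer), int64(len(outer)))
	if err != nil {
		return nil, fmt.Errorf("opening outer archive: %w", err)
	}
	for _, f := range zr.File {
		if f.Name != nested {
			continue
		}
		rc, err := f.Open()
		if err != nil {
			return nil, fmt.Errorf("opening %s: %w", nested, err)
		}
		inner, err := io.ReadAll(rc)
		rc.Close()
		if err != nil {
			return nil, fmt.Errorf("reading %s: %w", nested, err)
		}
		izr, err := zip.NewReader(bytes.NewReader(inner), int64(len(inner)))
		if err != nil {
			return nil, fmt.Errorf("opening nested archive: %w", err)
		}
		names := make([]string, 0, len(izr.File))
		for _, inf := range izr.File {
			names = append(names, inf.Name)
		}
		return names, nil
	}
	return nil, fmt.Errorf("%s not found in archive", nested)
}

func main() {
	// Demo data only; a real check would load diag.zip with os.ReadFile.
	inner, err := makeZip(entry{"pod.yaml", []byte("kind: Pod\n")})
	if err != nil {
		panic(err)
	}
	outer, err := makeZip(entry{"version.txt", []byte("demo\n")}, entry{nestedName, inner})
	if err != nil {
		panic(err)
	}
	names, err := listNested(outer, nestedName)
	if err != nil {
		panic(err)
	}
	fmt.Println(names)
}
```

A helper like this could back an assertion in a test that the bundle contains the nested archive and that it is a readable zip.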
Related issues
Questions to ask yourself