Add a step to exhaustive tests for observabilitySRE accetpance testing #17623

donoghuc · 2025-05-01T22:08:07Z

Release notes

[rn:skip]

What does this PR do?

This commit shows the proposed pattern for adding acceptance testing for the observability SRE image. The tests run whenever the exhaustive tests run. A new gradle task hooks into rspec, similar to how it is done for the smoke tests. The main difference is that instead of building a container, the latest image is pulled from the container registry and run on a FIPS-configured host VM. Tests have been added showing data flowing under FIPS mode from filebeat through logstash to elasticsearch. Test coverage has also been added for what happens when logstash is configured to send or receive data over non-FIPS TLS: an error is logged and no data is sent or received.

Why is it important/What is the impact to the user?

NA

Checklist

My code follows the style guidelines of this project
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
I have made corresponding change to the default configuration files (and/or docker env variables)
I have added tests that prove my fix is effective or that my feature works

Related issues

.buildkite/scripts/exhaustive-tests/generate-steps.py

donoghuc · 2025-05-01T22:11:03Z

x-pack/build.gradle

+    description = "Run ObservabilitySRE acceptance tests"
+    // Need to have set up the ruby environment for rspec even though we are running in container
+    dependsOn(":bootstrap", ":logstash-core:assemble", ":installDevelopmentGems")
+    // TODO: hook in to rspec


I've proved out that on a FIPS host we can run these tasks. This gives us a consistent java/jruby env in which we can do the standard gradle/rake/rspec control flow.

For the next step in this PR I will add some rspec that shows a pattern for doing container orchestration. At this point I'm thinking that will look like shelling out to docker-compose in rspec.

x-pack/distributions/internal/observabilitySRE/qa/acceptance/docker/docker-compose.yml

...s/internal/observabilitySRE/qa/acceptance/docker/elasticsearch/config/elasticsearch-fips.yml

donoghuc · 2025-05-02T20:41:34Z

Still very much WIP. I have some cleanup to do and bugs to track down. I just wanted to get the structure out there, breaking down responsibility between gradle, rspec, etc., and float the idea of changing the configuration of components via interpolation in the docker compose file.

donoghuc · 2025-05-05T23:10:54Z

I've got this all working locally. Specifically, the rspec tests now verify that data goes from LS to ES, with gradle generating certs and rspec managing container startup/teardown.

We are still waiting on unblocking generation of a FIPS-enabled test runner, but I'm happy with the patterns established here, breaking down the responsibility between gradle/rspec and docker-compose.

yaauie

The structure looks sensible, and leaves enough hook-points for future variations.

I've left a note about encapsulating the complexity of controlling docker from the hooks. Feel free to resolve that as you see fit (even if that is just acknowledging my nitpick without addressing it).

yaauie · 2025-05-22T15:45:52Z

x-pack/distributions/internal/observabilitySRE/qa/acceptance/spec/acceptance_tests_spec.rb

+
+  context "when running with non-FIPS compliant configuration" do
+    before(:all) do
+      system("cd #{__dir__}/../docker && LOGSTASH_PIPELINE=logstash-to-elasticsearch-weak.conf docker-compose up -d") or fail "Failed to start Docker Compose with weak SSL"


nit: since each test context grouping needs to do some variation of this, we can define helper methods to encapsulate the complexity. I also prefer to use the long version of flags in checked-in source (e.g., --detach and --volumes instead of -d and -v) since it makes it easier to understand the intention.

    require 'pathname'
    require 'shellwords'

    def docker_compose_up(env={}) = docker_compose_invoke("up --detach", env)
    def docker_compose_down(env={}) = docker_compose_invoke("down --volumes", env)

    # Run `docker-compose <subcommand>` from the sibling docker/ directory,
    # passing `env` through as upper-cased environment variables.
    def docker_compose_invoke(subcommand, env={})
      env_str = env.map{ |k,v| "#{k.to_s.upcase}=#{Shellwords.escape(v)} "}.join
      command = "#{env_str}docker-compose #{subcommand}"
      work_dir = Pathname.new("#{__dir__}/../docker").cleanpath
      system("cd #{Shellwords.escape(work_dir)} && #{command}") or fail "Failed to invoke Docker Compose with command `#{command}` in directory `#{work_dir}`"
    end

And I think we can use docker-compose's --project-directory to set the working directory and avoid the &&-chaining:

    def docker_compose_invoke(subcommand, env={})
      env_str = env.map{ |k,v| "#{k.to_s.upcase}=#{Shellwords.escape(v)} "}.join
      work_dir = Pathname.new("#{__dir__}/../docker").cleanpath
      command = "#{env_str}docker-compose --project-directory=#{Shellwords.escape(work_dir)} #{subcommand}"
      system(command) or fail "Failed to invoke Docker Compose with command `#{command}`"
    end

But either would make this line look like:

Suggested change

-      system("cd #{__dir__}/../docker && LOGSTASH_PIPELINE=logstash-to-elasticsearch-weak.conf docker-compose up -d") or fail "Failed to start Docker Compose with weak SSL"
+      docker_compose_up(logstash_pipeline: 'logstash-to-elasticsearch-weak.conf')

Great suggestion. Incorporated.
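As a further variation on the helpers suggested above, here is a minimal sketch that sidesteps shell quoting altogether by handing `Kernel#system` an environment hash, an argv array, and the `chdir:` option. This is not what was merged in the PR: `DOCKER_DIR`, `docker_compose_args`, and the injectable `runner` parameter are assumptions added for illustration, and the helper names simply mirror the review comment.

```ruby
# Hypothetical variant of the suggested helpers; names mirror the review comment.
# Assumes the compose file lives in a sibling "docker" directory.
DOCKER_DIR = File.expand_path("../docker", __dir__)

# Build the environment hash and argv array without going through a shell.
def docker_compose_args(subcommand_args, env = {})
  env_hash = env.to_h { |k, v| [k.to_s.upcase, v.to_s] }
  [env_hash, "docker-compose", *subcommand_args]
end

# `runner` defaults to Kernel#system; it is injectable so the helper can be
# exercised without Docker being installed.
def docker_compose_invoke(subcommand_args, env = {}, runner = method(:system))
  env_hash, *argv = docker_compose_args(subcommand_args, env)
  runner.call(env_hash, *argv, chdir: DOCKER_DIR) or
    fail "Failed to invoke Docker Compose with `#{argv.join(' ')}` in `#{DOCKER_DIR}`"
end

def docker_compose_up(env = {}) = docker_compose_invoke(%w[up --detach], env)
def docker_compose_down(env = {}) = docker_compose_invoke(%w[down --volumes], env)
```

The call site stays the same as in the suggested change, e.g. `docker_compose_up(logstash_pipeline: 'logstash-to-elasticsearch-weak.conf')`. Because no shell is involved, neither `Shellwords` nor `cd ... &&` chaining is needed.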

yaauie · 2025-05-22T16:32:55Z

...ernal/observabilitySRE/qa/acceptance/docker/logstash/pipeline/logstash-to-elasticsearch.conf

+    # Generate this message indefinitely to give ES container time to come online
+    count => -1


In theory we shouldn't need to generate more than one, since the ES output is designed to retry its batch of events until all of them have been explicitly rejected by Elasticsearch (e.g., with a successful HTTP 2XX response from the bulk API containing individual rejections).

Ah, yeah, I had a wrong assumption earlier. You are right here, and this will actually save quite a bit of headache. Thanks!
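To illustrate why a single generated event is enough, here is a deliberately simplified model of the retry behavior described in this thread. It is a sketch under assumptions, not the Elasticsearch output plugin's actual code: `deliver_with_retry`, the status symbols, and the block standing in for a bulk request are all hypothetical.

```ruby
# Simplified model (NOT the plugin's implementation): keep retrying a batch
# until every event has been either accepted or explicitly rejected.
def deliver_with_retry(batch, backoff: 0.0)
  pending = batch.dup
  rejected = []
  attempts = 0
  until pending.empty?
    attempts += 1
    begin
      # The block stands in for one bulk request and returns one status
      # (:ok, :rejected, or anything else meaning "retry") per pending event.
      statuses = yield(pending)
    rescue SystemCallError, IOError
      # e.g. connection refused while the ES container is still starting
      sleep(backoff)
      next
    end
    retry_next = []
    pending.zip(statuses).each do |event, status|
      case status
      when :ok then next
      when :rejected then rejected << event
      else retry_next << event
      end
    end
    pending = retry_next
  end
  { attempts: attempts, rejected: rejected }
end
```

In this model, an event sent before Elasticsearch is reachable is simply held and retried, so the pipeline does not need `count => -1` to bridge the startup gap.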

…tests

This commit shows the rough structure for how I am planning on handling docker compose networks for acceptance tests. The main idea is to use interpolation in the docker compose file to point to different configuration files for filebeat/logstash/elasticsearch. This is mainly due to the nature of these tests, which show behavior when the system is and is not configured properly for FIPS. The breakdown in responsibility is:

1. Gradle handles cert generation (similar to the smoke tests, this avoids checking in PKI).
2. Rspec handles stopping/starting docker compose and managing environment vars for interpolation in docker compose manifests (different from the smoke tests, where a single static docker compose is started in gradle).
3. Rspec handles deciding when containers are ready and querying state about data flowing through the system.
4. Gradle cleans up certs.

This is just a rough sketch. There are still bugs to be worked out, but before I get too far into it I want to get the idea out there.
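The readiness responsibility in the breakdown above (rspec deciding when containers are ready) could be sketched with a small polling helper. This is an illustration only: `wait_until` and its `timeout:`/`interval:` parameters are hypothetical names, and the PR does not show how its specs actually wait.

```ruby
# Hypothetical polling helper: calls the block until it returns a truthy
# value or the timeout elapses. Errors raised by the block (for example a
# refused connection while a container is still starting) count as "not ready".
def wait_until(timeout: 60, interval: 1)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  loop do
    result = begin
      yield
    rescue StandardError
      nil
    end
    return result if result
    if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
      raise "Timed out after #{timeout}s waiting for condition"
    end
    sleep interval
  end
end
```

A spec could wrap whatever probe it uses (an HTTP health check, a log scan) in the block, which keeps the waiting policy in one place while the probe varies per container.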

donoghuc · 2025-05-28T17:24:35Z

This is green! https://buildkite.com/elastic/logstash-exhaustive-tests-pipeline/builds/1924#019713af-a4d2-4cfa-8af1-1d705b88b30e

donoghuc · 2025-05-30T21:32:18Z

Kicked off https://buildkite.com/elastic/logstash-exhaustive-tests-pipeline/builds/1935

The ci-agent-images PR has been approved and will be merged Monday: https://github.com/elastic/ci-agent-images/pull/1426#issuecomment-2923339566

donoghuc · 2025-05-30T22:05:51Z

...observabilitySRE/plugin/logstash-integration-fips_validation/lib/logstash/fips_validation.rb

@@ -42,7 +42,7 @@ def before_bootstrap_checks(runner)
      # ensure Bouncycastle is configured and ready
      begin
        if Java::org.bouncycastle.crypto.CryptoServicesRegistrar.isInApprovedOnlyMode
-          accumulator.success "Bouncycastle Crytpo is in `approved-only` mode"
+          accumulator.success "Bouncycastle Crypto is in `approved-only` mode"


Credit @robbavey for eagle eye 🦅

donoghuc · 2025-06-03T20:16:08Z

Somehow I lost d8b1980. I just re-introduced it and kicked off a fresh test: https://buildkite.com/elastic/logstash-exhaustive-tests-pipeline/builds/1938

donoghuc · 2025-06-03T21:16:24Z

Green!

2025-06-03 13:54:32 PDT | > Task :logstash-xpack:observabilitySREacceptanceTests
2025-06-03 13:54:32 PDT |
2025-06-03 13:54:32 PDT | Finished in 5 minutes 11 seconds (files took 0.34338 seconds to load)
2025-06-03 13:54:32 PDT | 4 examples, 0 failures

https://buildkite.com/elastic/logstash-exhaustive-tests-pipeline/builds/1938

x-pack/distributions/internal/observabilitySRE/qa/acceptance/docker/docker-compose.yml

donoghuc · 2025-06-06T17:57:52Z

Kicked off a build: https://buildkite.com/elastic/logstash-exhaustive-tests-pipeline/builds/1947

Locally this is good:

➜  logstash git:(fedramp-high-acceptance-tests) ./gradlew observabilitySREacceptanceTests
> Task :logstash-xpack:observabilitySREacceptanceTests

    Finished in 3 minutes 20 seconds (files took 0.27663 seconds to load)
    4 examples, 0 failures


org.logstash.xpack.test.RSpecObservabilitySREAcceptanceTests > rspecTests PASSED

--------------------------------------------------------------------
|  Results: SUCCESS (1 tests, 1 successes, 0 failures, 0 skipped)  |
--------------------------------------------------------------------

donoghuc · 2025-06-06T19:32:51Z

Ugh, I had lost a1504c4 again. Re-kicked the buildkite validation: https://buildkite.com/elastic/logstash-exhaustive-tests-pipeline/builds/1948

elastic-sonarqube · 2025-06-06T19:47:05Z

Quality Gate passed

Issues
0 New issues
0 Fixed issues
0 Accepted issues

Measures
0 Security Hotspots
No data about Coverage
0.0% Duplication on New Code

See analysis details on SonarQube

elasticmachine · 2025-06-06T20:02:09Z

💚 Build Succeeded

Buildkite Build
Commit: a1504c4

History

💛 Build #3000 was flaky 721e13b
💛 Build #2996 was flaky fb096e0
💚 Build #2992 succeeded de9af76
💚 Build #2981 succeeded d8b1980
💛 Build #2972 was flaky 38f2b8b
💚 Build #2971 succeeded a0b1f8e

robbavey

One question about the elasticsearch.yml config; feel free to merge if that setting is not required.

robbavey · 2025-06-06T21:16:00Z

...s/internal/observabilitySRE/qa/acceptance/docker/elasticsearch/config/elasticsearch-fips.yml

+discovery.type: single-node
+http.port: 9200
+network.host: 0.0.0.0
+# Security settings


Does xpack.security.fips_mode.enabled need to be set to true, or is this not required with the elasticsearch-cloud-ess-fips docker image?

(https://www.elastic.co/docs/deploy-manage/security/fips-140-2)

I will add this as investigation in https://github.com/elastic/ingest-dev/issues/5320 (added a point in google doc to track this down).

donoghuc · 2025-06-06T23:44:08Z

Buildkite was green, and I added the last question to another ticket (#17623 (comment)). I will see how this does over the weekend; I expect it to stay green.

…in (#17785)

* forward-port observabilitySRE image creation into `main`
  This is the CLEAN subset of a cherry-pick of the merge-commit from the observabilitySRE feature branch into 8.x in PR #17541 (0b1d299), OMITTING changes to `docker/*` and `rakelib/artifacts.rake` that would conflict due to substantial refactorings on `main`.
* forward-port observabilitySRE image creation into `main` (re-implament)
  This is a forward-port of _functionality_ from the observabilitySRE feature branch into 8.x in PR #17541 (0b1d299), wholly re-implementing the changes in `docker/*` and `rakelib/artifacts.rake` from the 8.x-style docker structure to the refactored structure present on `main`.
* Fix pull request pipeline definition for buildkite (#17552)
  When the fedramp high feature branch was merged into 8.x, the PR pipeline accidentally duplicated the top level `steps` key. This was a mistake and is causing issues generating the exhaustive test pipeline definition. This commit fixes the bug by ensuring there is a single `steps` key that defines all the steps in the pipeline.
* Ensure observabilitySRE image is pushed on DRA staging (#17569)
  The `artifactDockerObservabilitySRE` gradle task *always* produces a tag with a `SNAPSHOT` postfix. In the staging pipeline we use the shared `qualified-version` script for determining the LS version. That script correctly handles conditionally adding a `SNAPSHOT` postfix, which is important for the tagging scheme for pushing to our container registry. Given the intermediate tag produced by the gradle task is never pushed anywhere, we can update the build script to ensure the "local" artifact is always referenced with the `SNAPSHOT` postfix.
* Use dedicated elasticsearch image for observabilitySRE smoke testing (#17627)
* Use dedicated elasticsearch image for observabilitySRE smoke testing
  The ES team has started publishing a purpose built image for the fedramp high project. Update our smoke test stack to use this container.
* Override default entrypoint into elasticsearch container
  The new image does not provide the stub `/app/elasticsearch.sh` file https://github.com/elastic/elasticsearch/blob/1a1763c591c4c32bf66f0df3bce2040e8f19a1a2/distribution/docker/README.md?plain=1#L16-L19 previously available. This commit overrides the entrypoint to avoid needing that file. See: https://github.com/elastic/elasticsearch/blob/1a1763c591c4c32bf66f0df3bce2040e8f19a1a2/distribution/docker/src/docker/Dockerfile.ess#L38C5-L40C37
* Remove entrypoint workaround due to fix landing upstream
* Restore code review changes (#17539)
* Comment to clarify why FIPS flag is not needed for smoke tests
* Use full versions of docker commands for readability
* Simplify grock pattern match
  The grok pattern is unanchored-by-default, we don't need the leading and trailing wildcards.
* Add a step to exhaustive tests for observabilitySRE accetpance testing (#17623)
* Add a step to exhaustive tests for observabilitySRE accetpance testing
  This commit shows the proposed pattern for adding acceptance testing for the observability SRE image. This will run when exhaustive tests run. A new gradle task will hook in to rspec similar to how it is done for the smoke tests. The main difference is that instead of building a container, the latest is pulled from the container registry and run on a fips configured host VM.
* WIP: Idea for how to handle multipe container configs for acceptance tests
  This commit shows the rough structure for how I am planning on handling docker compose networks for acceptance tests. The main idea is to use interpolation in the docker compose file to point to different configuration files for filebeat/logstash/elasticsearch. This is mainly due to the nature of these tests showing behavior when the system is and is not configured properly for FIPS. The breakdown in responsibility is:
  1. Gradle handles cert generation (similar to smoke test, this avoids checking in PKI)
  2. Rspec handles stopping/starting docker compose and managing environment vars for interpolation in docker compose manifests (different from smoke tests where a single static docker compose is started in gradle)
  3. Rspec handles deciding when containers are ready and querying state about data flowing through the system
  4. Gradle cleans up certs
  This is just a rough sketch, there are still bugs to be worked out but before I get too far in to it I want to get the idea out there.
* Add tests describing behavior of LS -> ES with non-fips config
  This commit adds a test to show that data will not flow from LS to ES when weak non fips config is used.
* Use latest ES image
  This will be handled separately in a separate PR, but taking this commit for now on this branch.
* Remove custom entrypoint from new container
  The latest ES images do not require this workaround.
* Take up code review suggestions
  1. Remove rogue character from test file causing interpreter failure
  2. Split out helpers for docker compose orchestration
  3. Only send a single message instead of infinite through to ES
* Add full prefix name for new image
* Test filebeat -> LS -> ES using fips config
  As described in elastic/ingest-dev#5471 this commit adds a test for filebeat sending data through logstash to elasticsearch using fips config.
* Test LS wont accept input from non fips configured filebeat
  This test ensures logstash will not accept data from filebeat when using weak tls configuration. See elastic/ingest-dev#5472
* Fix a funny typo.
  Crytpo is actually kind of a funny.
* Ensure we are using the purpose build ES image in testing
  Similar to #17627
* Ensure JAVA_HOME is set etc
  Use the same buildkite agent script for setting up a vm based runner as other pipes

--------- Co-authored-by: Cas Donoghue <[email protected]>

donoghuc force-pushed the fedramp-high-acceptance-tests branch 3 times, most recently from 94869d9 to 6c53570 Compare May 5, 2025 22:44

donoghuc marked this pull request as ready for review May 5, 2025 23:08

yaauie self-requested a review May 22, 2025 14:51

yaauie approved these changes May 22, 2025

donoghuc added 6 commits May 22, 2025 14:38

Add tests describing behavior of LS -> ES with non-fips config

3873b44

This commit adds a test to show that data will not flow from LS to ES when weak non fips config is used.
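A negative assertion such as "data will not flow" can only be checked over a sampling window. The sketch below shows one way such a check might be structured, assuming the spec reads a document count from the body of an Elasticsearch `_count` response (which carries a top-level `count` field). The helper names, the sampling parameters, and the use of `_count` are assumptions for illustration; the PR does not show how its specs query Elasticsearch.

```ruby
require "json"

# Extract the document count from the body of an Elasticsearch `_count` response.
def count_from_response(body)
  JSON.parse(body).fetch("count")
end

# Hypothetical helper: sample the probe several times and report true only
# if every sample sees zero documents. The block returns the current count.
def no_documents_arrived?(samples: 5, interval: 1)
  samples.times do
    return false unless yield.zero?
    sleep interval
  end
  true
end
```

A passing result only shows that nothing arrived during the sampled window, so the window should be at least as long as the positive (FIPS-configured) test needs to see its first document.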

Use latest ES image

0b776dc

This will be handled separately in a separate PR, but taking this commit for now on this branch.

Remove custom entrypoint from new container

1630ca2

The latest ES images do not require this workaround.

Take up code review suggestions

a0b1f8e

1. Remove rogue character from test file causing interpreter failure
2. Split out helpers for docker compose orchestration
3. Only send a single message instead of infinite through to ES

donoghuc force-pushed the fedramp-high-acceptance-tests branch from a257cad to a0b1f8e Compare May 23, 2025 18:51

Add full prefix name for new image

38f2b8b

donoghuc added 2 commits May 30, 2025 14:17

Test filebeat -> LS -> ES using fips config

018b6a1

As described in elastic/ingest-dev#5471 this commit adds a test for filebeat sending data through logstash to elasticsearch using fips config.

Test LS wont accept input from non fips configured filebeat

d39a080

This test ensures logstash will not accept data from filebeat when using weak tls configuration. See elastic/ingest-dev#5472

donoghuc force-pushed the fedramp-high-acceptance-tests branch from d8b1980 to d39a080 Compare May 30, 2025 21:22

Fix a funny typo.

de9af76

Crytpo is actually kind of a funny.

Ensure we are using the purpose build ES image in testing

721e13b

Similar to elastic#17627

donoghuc force-pushed the fedramp-high-acceptance-tests branch from fb096e0 to 721e13b Compare June 6, 2025 17:48

Ensure JAVA_HOME is set etc

a1504c4

Use the same buildkite agent script for setting up a vm based runner as other pipes

robbavey approved these changes Jun 6, 2025

donoghuc merged commit 10e41a8 into elastic:8.19 Jun 6, 2025
7 checks passed

yaauie mentioned this pull request Jul 9, 2025

Forwardport observability-sre internal distro support from 8.19 to main #17785

Merged
