Add a step to exhaustive tests for observabilitySRE acceptance testing #17623

Merged
merged 12 commits into elastic:8.19 from fedramp-high-acceptance-tests
Jun 6, 2025

Conversation

donoghuc
Member

@donoghuc donoghuc commented May 1, 2025

Release notes

[rn:skip]

What does this PR do?

This commit shows the proposed pattern for adding acceptance testing for the observability SRE image. These tests run whenever the exhaustive tests run. A new gradle task hooks in to rspec, similar to how it is done for the smoke tests. The main difference is that instead of building a container, the latest image is pulled from the container registry and run on a FIPS-configured host VM. Tests have been added showing data flowing under FIPS mode from filebeat through logstash to elasticsearch. Test coverage has also been added for what happens when logstash is configured to send or receive data over non-FIPS TLS: we show that an error is logged and no data is sent or received.

Why is it important/What is the impact to the user?

NA

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding change to the default configuration files (and/or docker env variables)
  • I have added tests that prove my fix is effective or that my feature works

Related issues

description = "Run ObservabilitySRE acceptance tests"
// Need to have set up the ruby environment for rspec even though we are running in a container
dependsOn(":bootstrap", ":logstash-core:assemble", ":installDevelopmentGems")
// TODO: hook in to rspec
Member Author

I've proved out that we can run these tasks on a FIPS host. This gives us a consistent java/jruby env in which we can do the standard gradle/rake/rspec control flow.

For the next step in this PR I will add some rspec that shows a pattern for doing container orchestration. At this point I'm thinking that will look like shelling out to docker-compose in rspec.

@donoghuc
Member Author

donoghuc commented May 2, 2025

Still very much WIP. I have some cleanup and bugs to track down. Just wanted to get the structure out there breaking down responsibility between gradle/rspec etc and float the idea of changing configuration of components via interpolation in the docker compose file.

@donoghuc donoghuc force-pushed the fedramp-high-acceptance-tests branch 3 times, most recently from 94869d9 to 6c53570 Compare May 5, 2025 22:44
@donoghuc donoghuc marked this pull request as ready for review May 5, 2025 23:08
@donoghuc
Member Author

donoghuc commented May 5, 2025

I've got this all working locally. Specifically the rspec tests will now test that data goes from LS to ES with gradle generating certs and rspec managing container startup/teardown.

We are still waiting on unblocking generation of a fips enabled test runner, but I'm happy with the patterns established here breaking down the responsibility between gradle/rspec and docker-compose.

@yaauie yaauie self-requested a review May 22, 2025 14:51
Member

@yaauie yaauie left a comment

The structure looks sensible, and leaves enough hook-points for future variations.

I've left a note about encapsulating the complexity of controlling docker from the hooks. Feel free to resolve that as you see fit (even if that is just acknowledging my nitpick without addressing it).


context "when running with non-FIPS compliant configuration" do
before(:all) do
system("cd #{__dir__}/../docker && LOGSTASH_PIPELINE=logstash-to-elasticsearch-weak.conf docker-compose up -d") or fail "Failed to start Docker Compose with weak SSL"
Member

nit: since each test context grouping needs to do some variation of this, we can define helper methods to encapsulate the complexity. I also prefer to use the long version of flags in checked-in source (e.g., --detach and --volumes instead of -d and -v) since it makes it easier to understand the intention.

  require "pathname"
  require "shellwords"

  def docker_compose_up(env={}) = docker_compose_invoke("up --detach", env)

  def docker_compose_down(env={}) = docker_compose_invoke("down --volumes", env)

  def docker_compose_invoke(subcommand, env={})
    # Render env as shell-escaped `KEY=value ` prefixes for the command
    env_str = env.map{ |k,v| "#{k.to_s.upcase}=#{Shellwords.escape(v)} "}.join
    command = "#{env_str}docker-compose #{subcommand}"
    work_dir = Pathname.new("#{__dir__}/../docker").cleanpath

    system("cd #{Shellwords.escape(work_dir)} && #{command}") or fail "Failed to invoke Docker Compose with command `#{command}` in directory `#{work_dir}`"
  end

And I think we can use docker-compose's --project-directory to set the working directory and avoid the &&-chaining:

  def docker_compose_invoke(subcommand, env={})
    env_str = env.map{ |k,v| "#{k.to_s.upcase}=#{Shellwords.escape(v)} "}.join
    # Interpolate __dir__ so the path is relative to this file, not a literal "__dir__"
    work_dir = Pathname.new("#{__dir__}/../docker").cleanpath

    command = "#{env_str}docker-compose --project-directory=#{Shellwords.escape(work_dir)} #{subcommand}"

    system(command) or fail "Failed to invoke Docker Compose with command `#{command}`"
  end

But either would make this line look like:

Suggested change
- system("cd #{__dir__}/../docker && LOGSTASH_PIPELINE=logstash-to-elasticsearch-weak.conf docker-compose up -d") or fail "Failed to start Docker Compose with weak SSL"
+ docker_compose_up(logstash_pipeline: 'logstash-to-elasticsearch-weak.conf')
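
One possible way to package these helpers, sketched under assumptions: the module name `DockerComposeHelpers`, the `docker_compose_command` method, and the `project_dir:` keyword are hypothetical and not part of this PR. The idea is to separate building the command string from executing it, so the string can be checked on a machine without Docker.

```ruby
require "pathname"
require "shellwords"

# Hypothetical module name; the review only names the docker_compose_* methods.
module DockerComposeHelpers
  # Assumed compose directory, relative to this file. Falls back to the
  # current directory when __dir__ is nil (for example under `ruby -e`).
  DEFAULT_PROJECT_DIR = Pathname.new(__dir__ || Dir.pwd).join("..", "docker").cleanpath

  # Builds the shell command without running it.
  def docker_compose_command(subcommand, env = {}, project_dir: DEFAULT_PROJECT_DIR)
    # Each pair becomes an upper-cased, shell-escaped `KEY=value ` prefix.
    env_str = env.map { |k, v| "#{k.to_s.upcase}=#{Shellwords.escape(v)} " }.join
    "#{env_str}docker-compose --project-directory=#{Shellwords.escape(project_dir)} #{subcommand}"
  end

  # Runs the command and fails loudly when docker-compose exits non-zero.
  def docker_compose_invoke(subcommand, env = {})
    command = docker_compose_command(subcommand, env)
    system(command) or fail "Failed to invoke Docker Compose with command `#{command}`"
  end

  def docker_compose_up(env = {})
    docker_compose_invoke("up --detach", env)
  end

  def docker_compose_down(env = {})
    docker_compose_invoke("down --volumes", env)
  end
end
```

In a spec file this could be mixed in with `include DockerComposeHelpers` and called from `before(:all)` and `after(:all)` hooks, which keeps each context down to a single line per hook.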

Member Author

Great suggestion. Incorporated.

Comment on lines 3 to 4
# Generate this message indefinitely to give ES container time to come online
count => -1
Member

In theory we shouldn't need to generate more than one, since the ES output is designed to retry its batch of events until all of them have been explicitly rejected by elasticsearch (e.g., with a successful HTTP 2XX response from the bulk API containing individual rejections).
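
A toy illustration of why a single event is enough when the sender retries. This is not the Elasticsearch output plugin's code; `FlakyEndpoint` and `deliver_with_retry` are hypothetical names, and the fixed delay is a simplification.

```ruby
# Hypothetical stand-in for an endpoint that refuses connections for the
# first few attempts, for example while its container is still starting.
class FlakyEndpoint
  attr_reader :received

  def initialize(failures_before_ready)
    @remaining_failures = failures_before_ready
    @received = []
  end

  def bulk(events)
    if @remaining_failures > 0
      @remaining_failures -= 1
      raise IOError, "connection refused"
    end
    @received.concat(events)
  end
end

# Resends the same batch until the endpoint accepts it.
# Returns the number of attempts that were needed.
def deliver_with_retry(endpoint, events, delay: 0)
  attempts = 0
  begin
    attempts += 1
    endpoint.bulk(events)
  rescue IOError
    sleep(delay)
    retry # re-runs the begin block with the same batch
  end
  attempts
end

endpoint = FlakyEndpoint.new(3)
attempts = deliver_with_retry(endpoint, [{ "message" => "hello" }])
puts "delivered #{endpoint.received.size} event(s) after #{attempts} attempts"
```

Because the one batch is held and resent until it goes through, the generator does not need to keep producing events while the downstream service comes online.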

Member Author

Ah, yeah, I had a wrong assumption earlier. You are right here, and this will actually save quite a bit of headache. Thanks!

donoghuc added 6 commits May 22, 2025 14:38
This commit shows the proposed pattern for adding acceptance testing for the
observability SRE image. This will run when exhaustive tests run. A new gradle
task will hook in to rspec similar to how it is done for the smoke tests. The
main difference is that instead of building a container, the latest is pulled
from the container registry and run on a fips configured host VM.
…tests

This commit shows the rough structure for how I am planning on handling docker
compose networks for acceptance tests. The main idea is to use interpolation in
the docker compose file to point to different configuration files for
filebeat/logstash/elasticsearch. This is mainly due to the nature of these tests
showing behavior when the system is and is not configured properly for FIPS. The
breakdown in responsibility is:

1. Gradle handles cert generation (similar to smoke test, this avoids checking
in PKI)
2. Rspec handles stopping/starting docker compose and managing environment vars
for interpolation in docker compose manifests (different from smoke tests where
a single static docker compose is started in gradle)
3. Rspec handles deciding when containers are ready and querying state about
data flowing through the system
4. Gradle cleans up certs
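
For item 3 in the list above, readiness checks are commonly written as a polling loop with a deadline. A minimal sketch, assuming a hypothetical `wait_until` helper (the name, defaults, and error message are illustrative and not taken from this PR):

```ruby
# Hypothetical helper: calls the block repeatedly until it returns a truthy
# value, or raises once `timeout` seconds have elapsed.
def wait_until(timeout: 60, interval: 1)
  # A monotonic clock is unaffected by wall-clock adjustments on the host.
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  loop do
    result = yield
    return result if result
    if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
      raise "condition not met within #{timeout} seconds"
    end
    sleep(interval)
  end
end

# Usage with a stand-in condition that becomes true on the third check.
checks = 0
wait_until(timeout: 5, interval: 0) { (checks += 1) >= 3 }
puts "ready after #{checks} checks"
```

In a spec, the block would wrap whatever probe the suite uses, such as an HTTP request against the Elasticsearch container or a query for the expected document count.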

This is just a rough sketch. There are still bugs to be worked out, but before I
get too far into it I want to get the idea out there.

This commit adds a test to show that data will not flow from LS to ES
when a weak, non-FIPS config is used.

This will be handled in a separate PR, but taking this
commit for now on this branch.

The latest ES images do not require this workaround.

1. Remove rogue character from test file causing interpreter failure
2. Split out helpers for docker compose orchestration
3. Only send a single message instead of infinite through to ES
@donoghuc donoghuc force-pushed the fedramp-high-acceptance-tests branch from a257cad to a0b1f8e Compare May 23, 2025 18:51
@donoghuc
Member Author

donoghuc added 2 commits May 30, 2025 14:17
As described in elastic/ingest-dev#5471 this commit
adds a test for filebeat sending data through logstash to elasticsearch using
fips config.
This test ensures logstash will not accept data from filebeat when using weak
tls configuration.

See elastic/ingest-dev#5472
@donoghuc donoghuc force-pushed the fedramp-high-acceptance-tests branch from d8b1980 to d39a080 Compare May 30, 2025 21:22
@donoghuc
Member Author

Crytpo is actually kind of a funny.
@@ -42,7 +42,7 @@ def before_bootstrap_checks(runner)
   # ensure Bouncycastle is configured and ready
   begin
     if Java::org.bouncycastle.crypto.CryptoServicesRegistrar.isInApprovedOnlyMode
-      accumulator.success "Bouncycastle Crytpo is in `approved-only` mode"
+      accumulator.success "Bouncycastle Crypto is in `approved-only` mode"
Member Author

Credit @robbavey for eagle eye 🦅

@donoghuc
Member Author

donoghuc commented Jun 3, 2025

Somehow I lost d8b1980. I just re-introduced it and kicked off a fresh test: https://buildkite.com/elastic/logstash-exhaustive-tests-pipeline/builds/1938

@donoghuc
Member Author

donoghuc commented Jun 3, 2025

Green!

2025-06-03 13:54:32 PDT | > Task :logstash-xpack:observabilitySREacceptanceTests
2025-06-03 13:54:32 PDT |
2025-06-03 13:54:32 PDT | Finished in 5 minutes 11 seconds (files took 0.34338 seconds to load)
2025-06-03 13:54:32 PDT | 4 examples, 0 failures
2025-06-03 13:54:32 PDT |

https://buildkite.com/elastic/logstash-exhaustive-tests-pipeline/builds/1938

@donoghuc donoghuc force-pushed the fedramp-high-acceptance-tests branch from fb096e0 to 721e13b Compare June 6, 2025 17:48
@donoghuc
Member Author

donoghuc commented Jun 6, 2025

Kicked off a build: https://buildkite.com/elastic/logstash-exhaustive-tests-pipeline/builds/1947

Locally this is good:

➜  logstash git:(fedramp-high-acceptance-tests) ./gradlew observabilitySREacceptanceTests
> Task :logstash-xpack:observabilitySREacceptanceTests

    Finished in 3 minutes 20 seconds (files took 0.27663 seconds to load)
    4 examples, 0 failures


org.logstash.xpack.test.RSpecObservabilitySREAcceptanceTests > rspecTests PASSED

--------------------------------------------------------------------
|  Results: SUCCESS (1 tests, 1 successes, 0 failures, 0 skipped)  |
--------------------------------------------------------------------

Use the same buildkite agent script for setting up a vm based runner as other pipes
@donoghuc
Member Author

donoghuc commented Jun 6, 2025

Ugh, I had lost a1504c4 again. Re-kicked the buildkite validation: https://buildkite.com/elastic/logstash-exhaustive-tests-pipeline/builds/1948

@elasticmachine
Collaborator

💚 Build Succeeded

Member

@robbavey robbavey left a comment

One question about the elasticsearch.yml config; feel free to merge if that setting is not required.

discovery.type: single-node
http.port: 9200
network.host: 0.0.0.0
# Security settings
Member

Does xpack.security.fips_mode.enabled need to be set to true, or is this not required with the elasticsearch-cloud-ess-fips docker image?

Member Author

I will add this as an investigation item in https://github.com/elastic/ingest-dev/issues/5320 (added a point in the google doc to track this down).

@donoghuc donoghuc merged commit 10e41a8 into elastic:8.19 Jun 6, 2025
7 checks passed
@donoghuc
Member Author

donoghuc commented Jun 6, 2025

Buildkite was green, and I added the last question to another ticket: #17623 (comment). We will see how this does over the weekend. I expect it to stay green.

yaauie pushed a commit to yaauie/logstash that referenced this pull request Jul 9, 2025
elastic#17623)

* Add a step to exhaustive tests for observabilitySRE acceptance testing

This commit shows the proposed pattern for adding acceptance testing for the
observability SRE image. This will run when exhaustive tests run. A new gradle
task will hook in to rspec similar to how it is done for the smoke tests. The
main difference is that instead of building a container, the latest is pulled
from the container registry and run on a fips configured host VM.

* WIP: Idea for how to handle multiple container configs for acceptance tests

This commit shows the rough structure for how I am planning on handling docker
compose networks for acceptance tests. The main idea is to use interpolation in
the docker compose file to point to different configuration files for
filebeat/logstash/elasticsearch. This is mainly due to the nature of these tests
showing behavior when the system is and is not configured properly for FIPS. The
breakdown in responsibility is:

1. Gradle handles cert generation (similar to smoke test, this avoids checking
in PKI)
2. Rspec handles stopping/starting docker compose and managing environment vars
for interpolation in docker compose manifests (different from smoke tests where
a single static docker compose is started in gradle)
3. Rspec handles deciding when containers are ready and querying state about
data flowing through the system
4. Gradle cleans up certs

This is just a rough sketch. There are still bugs to be worked out, but before I
get too far into it I want to get the idea out there.

* Add tests describing behavior of LS -> ES with non-fips config

This commit adds a test to show that data will not flow from LS to ES
when weak non fips config is used.

* Use latest ES image

This will be handled separately in a separate PR, but taking this
commit for now on this branch.

* Remove custom entrypoint from new container

The latest ES images do not require this workaround.

* Take up code review suggestions

1. Remove rogue character from test file causing interpreter failure
2. Split out helpers for docker compose orchestration
3. Only send a single message instead of infinite through to ES

* Add full prefix name for new image

* Test filebeat -> LS -> ES using fips config

As described in elastic/ingest-dev#5471 this commit
adds a test for filebeat sending data through logstash to elasticsearch using
fips config.

* Test LS won't accept input from non-FIPS configured filebeat

This test ensures logstash will not accept data from filebeat when using weak
tls configuration.

See elastic/ingest-dev#5472

* Fix a funny typo.

Crytpo is actually kind of a funny.

* Ensure we are using the purpose-built ES image in testing

Similar to elastic#17627

* Ensure JAVA_HOME is set etc

Use the same buildkite agent script for setting up a vm based runner as other pipes
yaauie pushed a commit to yaauie/logstash that referenced this pull request Jul 14, 2025
elastic#17623)

yaauie pushed a commit to yaauie/logstash that referenced this pull request Jul 15, 2025
elastic#17623)

yaauie added a commit that referenced this pull request Jul 16, 2025
…in (#17785)

* forward-port observabilitySRE image creation into `main`

This is the CLEAN subset of a cherry-pick of the merge-commit from the
observabilitySRE feature branch into 8.x in PR #17541 (0b1d299),
OMITTING changes to `docker/*` and `rakelib/artifacts.rake` that would
conflict due to substantial refactorings on `main`.

* forward-port observabilitySRE image creation into `main` (re-implement)

This is a forward-port of _functionality_ from the observabilitySRE feature
branch into 8.x in PR #17541 (0b1d299),
wholly re-implementing the changes in `docker/*` and `rakelib/artifacts.rake`
from the 8.x-style docker structure to the refactored structure present
on `main`.

* Fix pull request pipeline definition for buildkite (#17552)

When the fedramp high feature branch was merged into 8.x the PR pipeline
accidentally duplicated the top level `steps` key. This was a mistake and is
causing issues generating exhaustive test pipeline definition. This commit fixes
the bug by ensuring there is a single `steps` key that defines all the steps in
the pipeline.

* Ensure observabilitySRE image is pushed on DRA staging (#17569)

The `artifactDockerObservabilitySRE` gradle task *always* produces a tag with a
`SNAPSHOT` postfix. In the staging pipeline we use the shared
`qualified-version` script for determining the LS version. That script correctly
handles conditionally adding a `SNAPSHOT` postfix which is important for the
tagging scheme for pushing to our container registry. Given the intermediate tag
produced by the gradle task is never pushed anywhere we can update the build
script to ensure the "local" artifact is always referenced with the `SNAPSHOT`
postfix.

* Use dedicated elasticsearch image for observabilitySRE smoke testing (#17627)

* Use dedicated elasticsearch image for observabilitySRE smoke testing

The ES team has started publishing a purpose built image for the fedramp high
project. Update our smoke test stack to use this container.

* Override default entrypoint into elasticsearch container

The new image does not provide the stub `/app/elasticsearch.sh` file
https://github.com/elastic/elasticsearch/blob/1a1763c591c4c32bf66f0df3bce2040e8f19a1a2/distribution/docker/README.md?plain=1#L16-L19
previously available. This commit overrides the entrypoint to avoid needing that
file. See: https://github.com/elastic/elasticsearch/blob/1a1763c591c4c32bf66f0df3bce2040e8f19a1a2/distribution/docker/src/docker/Dockerfile.ess#L38C5-L40C37

* Remove entrypoint workaround due to fix landing upstream

* Restore code review changes (#17539)

* Comment to clarify why FIPS flag is not needed for smoke tests

* Use full versions of docker commands for readability

* Simplify grok pattern match

The grok pattern is unanchored-by-default, we don't need the leading and trailing
wildcards.

* Add a step to exhaustive tests for observabilitySRE acceptance testing (#17623)

* Add a step to exhaustive tests for observabilitySRE acceptance testing

This commit shows the proposed pattern for adding acceptance testing for the
observability SRE image. This will run when exhaustive tests run. A new gradle
task will hook in to rspec similar to how it is done for the smoke tests. The
main difference is that instead of building a container, the latest is pulled
from the container registry and run on a fips configured host VM.

* WIP: Idea for how to handle multiple container configs for acceptance tests

This commit shows the rough structure for how I am planning on handling docker
compose networks for acceptance tests. The main idea is to use interpolation in
the docker compose file to point to different configuration files for
filebeat/logstash/elasticsearch. This is mainly due to the nature of these tests
showing behavior when the system is and is not configured properly for FIPS. The
breakdown in responsibility is:

1. Gradle handles cert generation (similar to smoke test, this avoids checking
in PKI)
2. Rspec handles stopping/starting docker compose and managing environment vars
for interpolation in docker compose manifests (different from smoke tests where
a single static docker compose is started in gradle)
3. Rspec handles deciding when containers are ready and querying state about
data flowing through the system
4. Gradle cleans up certs

This is just a rough sketch. There are still bugs to be worked out, but before I
get too far into it I want to get the idea out there.

* Add tests describing behavior of LS -> ES with non-fips config

This commit adds a test to show that data will not flow from LS to ES
when weak non fips config is used.

* Use latest ES image

This will be handled separately in a separate PR, but taking this
commit for now on this branch.

* Remove custom entrypoint from new container

The latest ES images do not require this workaround.

* Take up code review suggestions

1. Remove rogue character from test file causing interpreter failure
2. Split out helpers for docker compose orchestration
3. Only send a single message instead of infinite through to ES

* Add full prefix name for new image

* Test filebeat -> LS -> ES using fips config

As described in elastic/ingest-dev#5471 this commit
adds a test for filebeat sending data through logstash to elasticsearch using
fips config.

* Test LS won't accept input from non-FIPS configured filebeat

This test ensures logstash will not accept data from filebeat when using weak
tls configuration.

See elastic/ingest-dev#5472

* Fix a funny typo.

Crytpo is actually kind of a funny.

* Ensure we are using the purpose-built ES image in testing

Similar to #17627

* Ensure JAVA_HOME is set etc

Use the same buildkite agent script for setting up a vm based runner as other pipes

---------

Co-authored-by: Cas Donoghue <[email protected]>