Pull request (Closed)

## Commits (68)
- `ea6a2e6` Fix PaddingDecimalFormatTest (qboileau, May 23, 2024)
- `4d4540c` Add aws profile on kafka hdd deployment (qboileau, May 23, 2024)
- `8c9d44c` Update kafka ssh kafka hdd deployment terraform (qboileau, May 23, 2024)
- `dcd8f73` Fix ansible on ssd deployment (qboileau, May 23, 2024)
- `a965f93` feat: Add gateway deployment in terraform and ansible (fteychene, May 23, 2024)
- `4267459` feat: Add gateway instances in all group for ansible inventory genera… (fteychene, May 23, 2024)
- `9a968fb` Create new driver-gateway for Conduktor Gateway benchmarking separate… (qboileau, May 23, 2024)
- `39c8025` Create new workload with 100 topics and 16 partitions (qboileau, May 23, 2024)
- `3e5a258` Update gateway with driver config and deployment fixes (qboileau, May 24, 2024)
- `83848d1` Update workload (qboileau, May 24, 2024)
- `21f3452` Update gateway instance type (qboileau, May 27, 2024)
- `85d4d3a` Fix JAVA_OPTS for java 17 and add ansible wait_for_connection on all … (qboileau, May 27, 2024)
- `58a5d86` Fix README linting (qboileau, May 27, 2024)
- `480c51d` Disable platform dependent test (qboileau, May 27, 2024)
- `167fff0` Make Gateway image configurable (qboileau, May 28, 2024)
- `65b1963` Add workload 50-topic-3-partitions-1kb-2p-2c-100k.yaml (qboileau, May 29, 2024)
- `7ba6c84` Add workload config for 50 topics, 10k rate, 100b messages (Jun 5, 2024)
- `464107c` Add workload config for 50 topics, 1 producer 1 consumer per topic (Jun 5, 2024)
- `85e25de` Add new workload for lower volume (Jun 24, 2024)
- `ce3fc12` Add 100 topic workload, 1000m/s, 1k msgs (Jun 25, 2024)
- `c665577` Update terraform create KMS key for gateway encryption and use cloudi… (qboileau, Jul 3, 2024)
- `cad9801` Refactor ansible and setup gateway interceptors (qboileau, Jul 3, 2024)
- `6691a86` Update benchmark driver to support topic prefix configuration (qboileau, Jul 3, 2024)
- `c0d7edb` Update readme (qboileau, Jul 3, 2024)
- `de669a5` Fix gateway driver package and configuration (qboileau, Jul 4, 2024)
- `df5ede8` Fix ansible deploy and configuration (qboileau, Jul 4, 2024)
- `cc4c51d` Fix Config and export kms key arn to ansible variables (qboileau, Jul 5, 2024)
- `6dc1905` terraform fix warning (qboileau, Jul 5, 2024)
- `5a922cb` terraform remove profile (qboileau, Jul 5, 2024)
- `980810a` Add missing gateway interceptors template file (qboileau, Jul 5, 2024)
- `a8d2042` More permissive interceptor debug (qboileau, Jul 5, 2024)
- `3096071` Force packaing install in ansible (qboileau, Jul 5, 2024)
- `82dd174` Remove kms key alias (qboileau, Jul 5, 2024)
- `73eec49` Reduce 1t-3p-100b tests duration from 15 to 10 mins (qboileau, Jul 8, 2024)
- `758ae63` Try to fix yum package check (qboileau, Jul 8, 2024)
- `f09c0dc` Try to fix yum package check 2 (qboileau, Jul 8, 2024)
- `f5e5447` Add decrypt plugin for consume benchmark (qboileau, Jul 8, 2024)
- `cb0e59f` Switch gateway log level to INFO (qboileau, Jul 9, 2024)
- `3731be9` Switch gateway log level to DEBUG (qboileau, Jul 9, 2024)
- `5af8988` Tune gateway logs levels (qboileau, Jul 9, 2024)
- `43b526c` Merge pull request #2 from conduktor/conduktor-gateway-encrypt (qboileau, Jul 9, 2024)
- `49cbce2` Add private docker registry login for gateway image (qboileau, Oct 15, 2024)
- `7d892ab` Fix typo (qboileau, Oct 15, 2024)
- `0ebb229` Merge pull request #3 from conduktor/private-registry (qboileau, Oct 16, 2024)
- `37f47db` Fix keySecretId in gateway-interceptors.json (trobert, Oct 30, 2024)
- `36b3041` Merge pull request #4 from conduktor/fix-key-id (trobert, Oct 30, 2024)
- `c833e4f` Added some new variables for AWS profiles. Updated the readme to incl… (BStarmerSmith, Nov 1, 2024)
- `a9f7719` Updated readme to include section on running aws sso login (BStarmerSmith, Nov 1, 2024)
- `ec27d57` Use better template names for export variables. (BStarmerSmith, Nov 4, 2024)
- `349a126` Merge pull request #5 from conduktor/ops-494-fix-issues-with-setting-… (BStarmerSmith, Nov 4, 2024)
- `01fc915` mvn spotless:apply (trobert, Nov 15, 2024)
- `93738d6` Merge pull request #7 from conduktor/fix-readme (trobert, Nov 15, 2024)
- `3bdba19` Changed default profile to empty string. If this string is empty when… (BStarmerSmith, Nov 15, 2024)
- `6781ccc` Merge pull request #8 from conduktor/fix_aws_profile_change (BStarmerSmith, Nov 15, 2024)
- `96055bf` Added command to Ansible playbook to install pygal. Which is used by … (BStarmerSmith, Nov 18, 2024)
- `650509d` Merge pull request #9 from conduktor/ops-547-fix-missing-python-lib-f… (BStarmerSmith, Nov 18, 2024)
- `9ce523a` Added pip to Ansible setup step. (BStarmerSmith, Nov 18, 2024)
- `5b2f1ad` Merge pull request #10 from conduktor/b_add_pip_to_ansible_setup (BStarmerSmith, Nov 18, 2024)
- `d32f490` Bumped all download/upload-artifact actions to be > v3. (BStarmerSmith, Nov 21, 2024)
- `6f6d260` Merge pull request #11 from conduktor/ops-498-fix-gh-downloadupload-a… (BStarmerSmith, Nov 21, 2024)
- `4fb18f0` Switched the GW logging to INFO from DEBUG (BStarmerSmith, Feb 3, 2025)
- `cbc7268` Merge pull request #12 from conduktor/switch_logging_to_info (BStarmerSmith, Feb 3, 2025)
- `ecaa1f5` Added more tags to the benchmark, as well as a cost_center tag to mon… (BStarmerSmith, Feb 19, 2025)
- `06b5985` Fixed wrong variable being used. (BStarmerSmith, Feb 21, 2025)
- `f054149` Merge pull request #13 from conduktor/ops-550-increase-monitoring-ben… (BStarmerSmith, Feb 21, 2025)
- `e99153d` Make it easy to run performance tests locally (caoilte-conduktor, Apr 8, 2025)
- `c1448b7` Merge pull request #14 from conduktor/caoilte/run-locally (caoilte-conduktor, Apr 8, 2025)
- `ee3acfd` manually override acl enabled (jpalmerr, Jun 19, 2025)
## Changed files
### .github/workflows/pr-build-and-test.yml (2 changes: 1 addition & 1 deletion)

```diff
@@ -37,7 +37,7 @@ jobs:
           mkdir test-results
           find . -type d -name "*surefire*" -exec cp --parents -R {} test-results/ \;
           zip -r test-results.zip test-results
-      - uses: actions/upload-artifact@v3
+      - uses: actions/upload-artifact@v4
         name: upload test-results
         if: failure()
         with:
```
### .gitignore (3 changes: 3 additions & 0 deletions)

```diff
@@ -8,9 +8,12 @@ target
 *.svg
 core/*.json
 *.json
+!**/templates/*.json
 *.retry
 *.pem
 *.hcl
+**/inventory.txt
+**/tf_ansible_vars_file.yml
 **/.terraform
 **/terraform.tfstate
 **/terraform.tfstate.backup
```
### .sdkmanrc (new file, 3 additions)

```diff
@@ -0,0 +1,3 @@
+# Enable auto-env through the sdkman_auto_env config
+# Add key=value pairs of SDKs to use below
+java=17.0.14-tem
```
### benchmark-framework/pom.xml (5 changes: 5 additions & 0 deletions)

```diff
@@ -51,6 +51,11 @@
             <artifactId>driver-bookkeeper</artifactId>
             <version>${project.version}</version>
         </dependency>
+        <dependency>
+            <groupId>${project.groupId}</groupId>
+            <artifactId>driver-gateway</artifactId>
+            <version>${project.version}</version>
+        </dependency>
         <dependency>
             <groupId>${project.groupId}</groupId>
             <artifactId>driver-jms</artifactId>
```
### benchmark-framework/src/main/resources/log4j2.yaml (7 changes: 6 additions & 1 deletion)

```diff
@@ -33,7 +33,12 @@ Configuration:
         size: 100MB
       DefaultRollOverStrategy:
         max: 10
-  Loggers:
+  Loggers:
+    logger:
+      - name: io.openmessaging.benchmark
+        level: info
+      - name: io.openmessaging.benchmark.driver.gateway
+        level: debug
     Root:
       level: info
       additivity: false
```
### PaddingDecimalFormatTest (2 additions)

```diff
@@ -15,11 +15,13 @@

 import static org.assertj.core.api.Assertions.assertThat;

+import org.junit.jupiter.api.Disabled;
 import org.junit.jupiter.api.Test;

 class PaddingDecimalFormatTest {

     @Test
+    @Disabled("This test seem to be platform dependent")
     void format() {
         PaddingDecimalFormat format = new PaddingDecimalFormat("0.0", 7);
         assertThat(format.format(1L)).isEqualTo("    1.0");
```
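The test above is disabled because its expected string depends on the platform's default locale: Java's `DecimalFormat("0.0")` formats with the default locale's decimal separator, so the same number renders as `1.0` on an en_US JVM but `1,0` on, say, fr_FR. A minimal sketch of the failure mode (the `padded_format` helper is a hypothetical stand-in, not the project's `PaddingDecimalFormat`):

```python
# Sketch: why a hard-coded expected string makes this test platform dependent.
# DecimalFormat("0.0") uses the JVM's default locale, so the decimal separator
# varies between machines; we simulate that with an explicit separator argument.

def padded_format(value, width, decimal_sep="."):
    """Format value with one decimal place, left-padded with spaces to `width` chars."""
    text = f"{value:.1f}".replace(".", decimal_sep)
    return text.rjust(width)

# Same number, two "locales": only one matches a hard-coded expectation.
print(padded_format(1, 7, "."))  # '    1.0' -> matches the test's assertion
print(padded_format(1, 7, ","))  # '    1,0' -> would fail the assertion
```

Pinning the locale in the test (e.g. `new DecimalFormatSymbols(Locale.US)`) would be the usual way to make it deterministic instead of disabling it.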
### docker-compose.yml (new file, 108 additions)

```yaml
services:
  zookeeper:
    image: "confluentinc/cp-zookeeper:latest"
    container_name: openmessaging-benchmark-zookeeper
    restart: always
    ports:
      - "22181:2181"
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181
    healthcheck:
      test: echo srvr | nc zookeeper 2181 || exit 1
      retries: 20
      interval: 10s
  kafka-1:
    image: "confluentinc/cp-kafka:latest"
    container_name: openmessaging-benchmark-kafka-1
    restart: always
    ports:
      - "19092:19092"
    environment:
      KAFKA_ZOOKEEPER_CONNECT: "zookeeper:2181"
      KAFKA_LISTENERS: "INTERNAL://kafka-1:9092,EXTERNAL://:19092"
      KAFKA_ADVERTISED_LISTENERS: "INTERNAL://kafka-1:9092,EXTERNAL://localhost:19092"
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: "INTERNAL:PLAINTEXT,EXTERNAL:PLAINTEXT"
      KAFKA_INTER_BROKER_LISTENER_NAME: INTERNAL
      KAFKA_BROKER_ID: 1
    healthcheck:
      test: nc -zv kafka-1 9092 || exit 1
      interval: 10s
      retries: 25
      start_period: 20s
    depends_on:
      zookeeper:
        condition: service_healthy
  kafka-2:
    image: confluentinc/cp-kafka
    container_name: openmessaging-benchmark-kafka-2
    hostname: kafka-2
    ports:
      - 19093:19093
    environment:
      KAFKA_ZOOKEEPER_CONNECT: "zookeeper:2181"
      KAFKA_LISTENERS: "INTERNAL://kafka-2:9093,EXTERNAL://:19093"
      KAFKA_ADVERTISED_LISTENERS: "INTERNAL://kafka-2:9093,EXTERNAL://localhost:19093"
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: "INTERNAL:PLAINTEXT,EXTERNAL:PLAINTEXT"
      KAFKA_INTER_BROKER_LISTENER_NAME: INTERNAL
      KAFKA_BROKER_ID: 2
    healthcheck:
      test: nc -zv kafka-2 9093 || exit 1
      interval: 10s
      retries: 25
      start_period: 20s
    depends_on:
      zookeeper:
        condition: service_healthy
  kafka-3:
    image: confluentinc/cp-kafka
    container_name: openmessaging-benchmark-kafka-3
    hostname: kafka-3
    ports:
      - 19094:19094
    environment:
      KAFKA_ZOOKEEPER_CONNECT: "zookeeper:2181"
      KAFKA_LISTENERS: "INTERNAL://kafka-3:9094,EXTERNAL://:19094"
      KAFKA_ADVERTISED_LISTENERS: "INTERNAL://kafka-3:9094,EXTERNAL://localhost:19094"
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: "INTERNAL:PLAINTEXT,EXTERNAL:PLAINTEXT"
      KAFKA_INTER_BROKER_LISTENER_NAME: INTERNAL
      KAFKA_BROKER_ID: 3
    healthcheck:
      test: nc -zv kafka-3 9094 || exit 1
      interval: 10s
      retries: 25
      start_period: 20s
    depends_on:
      zookeeper:
        condition: service_healthy

  conduktor-gateway:
    image: harbor.cdkt.dev/conduktor/conduktor-gateway:3.8.0-rc1
    hostname: conduktor-gateway
    container_name: openmessaging-benchmark-conduktor-gateway
    environment:
      KAFKA_BOOTSTRAP_SERVERS: kafka-1:9092,kafka-2:9093,kafka-3:9094
      JAVA_OPTS: "-XX:UseSVE=0"
      VAULT_TOKEN: vault-plaintext-root-token
    ports:
      - "8888:8888"
      - "6969:6969"
    healthcheck:
      test: curl localhost:8888/health
      interval: 5s
      retries: 25
    depends_on:
      kafka-1:
        condition: service_healthy
      kafka-2:
        condition: service_healthy
      kafka-3:
        condition: service_healthy
  kafka-client:
    image: confluentinc/cp-kafka:latest
    hostname: kafka-client
    container_name: openmessaging-benchmark-kafka-client
    command: sleep infinity
    volumes:
      - ${PWD}:${PWD}
      - ~/.m2:$HOME/.m2
    working_dir: ${PWD}
```
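With the stack above running, a Kafka client can be pointed at the gateway instead of the brokers. A hypothetical `client.properties` sketch, assuming the gateway accepts plaintext Kafka-protocol connections on the mapped port 6969 (8888 is its HTTP/health port in the compose file):

```properties
# Hypothetical client.properties for talking to Kafka through the local gateway.
# Port 6969 is the gateway port published by the compose file above; the exact
# listener configuration is an assumption, not taken from this PR.
bootstrap.servers=localhost:6969
security.protocol=PLAINTEXT
```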
### driver-gateway/README.md (new file, 29 additions)

```markdown
# Conduktor Gateway benchmarks

NOTE: This is a slightly modified version of the Apache Kafka benchmarks, adapted for Conduktor Gateway.

## How to run the benchmarks

Prerequisites:

- terraform 1.8+ (older versions may also work)
- ansible 2.16 (older versions may also work)
- AWS CLI configured with an account that has permissions on EC2, VPC and API Gateway

1. Create an SSH key named `kafka_aws` in `~/.ssh/` with `ssh-keygen -f ~/.ssh/kafka_aws` **without a passphrase**
2. Build the `openmessaging-benchmark` project with `mvn clean install`
3. Go to the `driver-gateway/deploy/ssd-deployment` module
4. Initialize terraform with `terraform init`
5. Create the infrastructure with `terraform apply`
   1. You can change the value of `num_instances` using `-var='num_instances={"client"=5, "gateway"=4, "kafka"=2}'`
   2. You might also need to change the AWS profile with `-var='aws_profile=your_profile'`
   3. With SSO the session does expire, so you might need to run `aws sso login --profile your_profile` before running terraform
6. Export your Harbor credentials from https://harbor.cdkt.dev/
   1. Click your name in the top right corner, select "User Profile", then copy your username and CLI secret
   2. `export REGISTRY_USERNAME=<registry login>`
   3. `export REGISTRY_PASSWORD=<registry api token>`
7. Set up the nodes with `ansible-playbook --user ec2-user --inventory-file inventory.ini deploy.yaml`
8. Connect to one benchmark worker node with `ssh -i ~/.ssh/kafka_aws ec2-user@$(terraform output client_ssh_host | tr -d '"')`
9. Go to the benchmark directory with `cd /opt/benchmark`
10. Run the benchmark with `sudo bin/benchmark --drivers driver-gateway/gateway-latency.yaml workloads/100-topic-4-partitions-1kb-4p-4c-500k.yaml`
11. Download reports from the nodes with `scp -i ~/.ssh/kafka_aws ec2-user@$(terraform output client_ssh_host | tr -d '"'):/opt/benchmark/*.json ./reports`
```
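The workload filenames referenced above (e.g. `100-topic-4-partitions-1kb-4p-4c-500k.yaml`) appear to encode the test parameters: topic count, partitions per topic, message size, producers, consumers, and target rate. A small sketch of decoding that convention, inferred purely from the filenames in this PR (the authoritative parameters live inside each workload YAML):

```python
import re

# Decode an OpenMessaging-benchmark workload filename into its parameters.
# The naming convention is inferred from this PR's workload files; it is a
# reader aid, not part of the benchmark framework itself.
PATTERN = re.compile(
    r"(?P<topics>\d+)-topic-(?P<partitions>\d+)-partitions?-"
    r"(?P<size>\d+k?b)-(?P<producers>\d+)p-(?P<consumers>\d+)c-(?P<rate>\d+k?)"
)

def parse_workload_name(filename):
    m = PATTERN.match(filename)
    if not m:
        raise ValueError(f"unrecognized workload name: {filename}")
    return m.groupdict()

params = parse_workload_name("100-topic-4-partitions-1kb-4p-4c-500k.yaml")
print(params)  # {'topics': '100', 'partitions': '4', 'size': '1kb', ...}
```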
### driver-gateway/deploy/README.md (new file, 37 additions)

```markdown
# Gateway Deployments

There are two types of deployment, found under the ssd-deployment and hdd-deployment folders:

- SSD, using the i3 instance type, which has fast NVMe drives as local instance stores
- HDD, using the d2 instance type, which has HDDs with fast sequential IO as local instance stores

Customize the instance types in terraform.tfvars.

Larger instances come with more drives. To include those in the benchmarks you must:

- update the Ansible script to include them in the mount and filesystem tasks
- update server.properties to include them in the log.dirs config

NOTE: When using d2 instances, the instance stores are not automatically generated. You must add them to the provision-kafka-aws.tf file.

For instructions on how to run a benchmark see the [Kafka instructions](http://openmessaging.cloud/docs/benchmarks/kafka/). Run the Terraform and Ansible commands from either the ssd-deployment or hdd-deployment folder.

## AWS Instance Types

| Instance | vCPU | RAM (GB) | Instance Store Drives | Network Baseline (Gbps) | Network Burst (Gbps) |
| -- | -- | -- | -- | -- | -- |
| i3.large | 2 | 15.25 | 1 x 475 GB NVMe SSD | 0.74 | 10 |
| i3.xlarge | 4 | 30.5 | 1 x 950 GB NVMe SSD | 1.24 | 10 |
| i3.2xlarge | 8 | 61 | 1 x 1,900 GB NVMe SSD | 2.48 | 10 |
| i3.4xlarge | 16 | 122 | 2 x 1,900 GB NVMe SSD | 4.96 | 10 |
| i3.8xlarge | 32 | 244 | 4 x 1,900 GB NVMe SSD | 10 | 10 |
| d2.xlarge | 4 | 30.5 | 3 x 2 TB HDD | 1.24 | 1.24 |
| d2.2xlarge | 8 | 61 | 6 x 2 TB HDD | 2.48 | 2.48 |
| d2.4xlarge | 16 | 122 | 12 x 2 TB HDD | 4.96 | 4.96 |
| d2.8xlarge | 36 | 244 | 24 x 2 TB HDD | 9.85 | 9.85 |

## Compression

When using 4 or fewer client VMs you may see lower throughput with compression enabled. Compression is performed by the producers and consumers only (with default settings), so clients need to be spread across more VMs to see any throughput gains.

Of course, throughput may not be your primary goal when using compression.
```
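Since the README says to customize instance types in terraform.tfvars, a hypothetical sketch of that file follows. The `num_instances` map is grounded in the `-var='num_instances={...}'` example from the run instructions; the instance-type variable names are assumptions and must be checked against each deployment's variables.tf:

```hcl
# Hypothetical terraform.tfvars for the ssd-deployment module.
# Variable names other than num_instances are assumed, not taken from this PR.
kafka_instance_type  = "i3.4xlarge" # SSD family; e.g. "d2.4xlarge" for the HDD deployment
client_instance_type = "m5.xlarge"

num_instances = {
  client  = 4
  gateway = 2
  kafka   = 3
}
```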
### driver-gateway/deploy/ssd-deployment/.tool-versions (new file, 2 additions)

```
terraform 1.8.3
ansible 2.16.6
```
### driver-gateway/deploy/ssd-deployment/ansible.cfg (new file, 18 additions)

```ini
[defaults]
interpreter_python = /usr/bin/python3
private_key_file=~/.ssh/kafka_aws
remote_user=ec2-user
# ansible optimization https://www.redhat.com/sysadmin/faster-ansible-playbook-execution
forks=15
callbacks_enabled = timer, profile_tasks, profile_roles
pipelining=true
host_key_checking=false

[privilege_escalation]
become=true
become_method=sudo
become_user=root

[ssh_connection]
ssh_args=-o ServerAliveInterval=60 -o ControlMaster=auto -o ControlPersist=60s
retries=10
```