Fix: handle pagination in ssm describeInstanceInformation & API Rate Limit #738

Open: wants to merge 5 commits into base: master

@ThaSami ThaSami commented Feb 20, 2025

What this PR does / why we need it:

This PR fixes an issue where experiments running SSM by tag sometimes report misleading IAM permission errors. The root cause is twofold:

  1. The implementation of DescribeInstanceInformation did not handle pagination. The API returns a limited number of results per page, so target instances that fall on later pages are missing from the first response, and the experiment then reports a misleading permission error.

  2. The code was not properly handling AWS API rate limits, causing failures during high-volume operations when throttling occurred.

The fix:

  • Updates the code to use DescribeInstanceInformationPages to properly iterate through all pages and correctly aggregate instance information
  • Implements exponential backoff with jitter when encountering AWS API rate limits
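
The paginated iteration in the first bullet can be sketched roughly as below. This is not the code from this PR: to keep it runnable without the AWS SDK, `instanceInfo`, `page`, `pager`, `fakeSSM` and `collectInstanceIDs` are hypothetical stand-ins, and only the per-page callback shape (return `true` to continue, `false` to stop) is modeled on how `DescribeInstanceInformationPages` is used.

```go
package main

import "fmt"

// instanceInfo and page are hypothetical stand-ins for the SDK's
// per-instance record and per-page output types.
type instanceInfo struct {
	InstanceID string
}

type page struct {
	Instances []instanceInfo
}

// pager mirrors the callback shape of a paginated "*Pages" helper:
// fn is invoked once per page and returns false to stop early.
type pager interface {
	DescribePages(fn func(p *page, lastPage bool) bool) error
}

// fakeSSM is an in-memory client (an assumption for illustration) that
// serves a fixed list of instances in pages of pageSize.
type fakeSSM struct {
	instances []instanceInfo
	pageSize  int
}

func (f *fakeSSM) DescribePages(fn func(p *page, lastPage bool) bool) error {
	if f.pageSize <= 0 {
		return fmt.Errorf("pageSize must be positive, got %d", f.pageSize)
	}
	for start := 0; start < len(f.instances); start += f.pageSize {
		end := start + f.pageSize
		if end > len(f.instances) {
			end = len(f.instances)
		}
		last := end == len(f.instances)
		if !fn(&page{Instances: f.instances[start:end]}, last) {
			return nil // caller asked to stop early
		}
	}
	return nil
}

// collectInstanceIDs walks every page and aggregates the instance IDs,
// instead of looking only at the first response.
func collectInstanceIDs(c pager) ([]string, error) {
	var ids []string
	err := c.DescribePages(func(p *page, lastPage bool) bool {
		for _, inst := range p.Instances {
			ids = append(ids, inst.InstanceID)
		}
		return true // keep going until the last page has been seen
	})
	return ids, err
}

func main() {
	client := &fakeSSM{
		instances: []instanceInfo{
			{InstanceID: "i-a"}, {InstanceID: "i-b"}, {InstanceID: "i-c"},
		},
		pageSize: 2,
	}
	ids, err := collectInstanceIDs(client)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(ids)
}
```

With a page size of 2, the third instance only appears on the second page, which is the case a single non-paginated call would miss.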

These changes ensure reliable operation even with large numbers of instances and during periods of high API load.
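
For readers unfamiliar with the retry strategy named above, here is a minimal sketch of exponential backoff with jitter. It is an illustration under assumptions, not the PR's implementation: `errThrottled`, `backoffDelay` and `retryOnThrottle` are hypothetical names, the "full jitter" variant (a random delay between zero and the capped exponential ceiling) is one common choice among several, and real code would detect throttling from the SDK's error rather than a sentinel.

```go
package main

import (
	"errors"
	"fmt"
	"math/rand"
	"time"
)

// errThrottled is a hypothetical sentinel standing in for an API
// rate-limit (throttling) error.
var errThrottled = errors.New("throttled")

// backoffDelay returns a random delay in [0, min(maxDelay, base*2^attempt)].
func backoffDelay(attempt int, base, maxDelay time.Duration) time.Duration {
	if base <= 0 || maxDelay <= 0 {
		return 0
	}
	ceiling := base
	// Double once per attempt, stopping as soon as the cap is reached.
	for i := 0; i < attempt && ceiling < maxDelay; i++ {
		ceiling *= 2
	}
	if ceiling > maxDelay {
		ceiling = maxDelay
	}
	return time.Duration(rand.Int63n(int64(ceiling) + 1))
}

// retryOnThrottle calls op up to maxAttempts times, sleeping with
// jittered exponential backoff between throttled attempts. Errors other
// than throttling are returned immediately.
func retryOnThrottle(maxAttempts int, base, maxDelay time.Duration, op func() error) error {
	if maxAttempts < 1 {
		maxAttempts = 1
	}
	var err error
	for attempt := 0; attempt < maxAttempts; attempt++ {
		err = op()
		if err == nil || !errors.Is(err, errThrottled) {
			return err
		}
		if attempt < maxAttempts-1 {
			time.Sleep(backoffDelay(attempt, base, maxDelay))
		}
	}
	return fmt.Errorf("gave up after %d attempts: %w", maxAttempts, err)
}

func main() {
	calls := 0
	// Simulated operation: throttled twice, then succeeds.
	err := retryOnThrottle(5, time.Millisecond, 20*time.Millisecond, func() error {
		calls++
		if calls < 3 {
			return errThrottled
		}
		return nil
	})
	fmt.Println("calls:", calls, "err:", err)
}
```

The jitter spreads retries from concurrent callers over time, so they are less likely to hit the rate limit again in lockstep.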

Which issue this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close that issue when PR gets merged):
fixes #737

Special notes for your reviewer:

Checklist:

  • Fixes #
  • PR messages has document related information
  • Labelled this PR & related issue with breaking-changes tag
  • PR messages has breaking changes related information
  • Labelled this PR & related issue with requires-upgrade tag
  • PR messages has upgrade related information
  • Commit has unit tests
  • Commit has integration tests
  • E2E run Required for the changes

@ThaSami ThaSami changed the title Fix: handle pagination in ssm describe Fix: handle pagination in ssm describeInstanceInformation Feb 20, 2025
@ThaSami ThaSami changed the title Fix: handle pagination in ssm describeInstanceInformation Fix: handle pagination in ssm describeInstanceInformation & API Rate Limit Mar 17, 2025
@neelanjan00
Member

Tagging @uditgaurav for a codeowner review.

Co-authored-by: Neelanjan Manna <[email protected]>
Signed-off-by: Sami Shabaneh <[email protected]>
Signed-off-by: Sami Shabaneh <[email protected]>

oba11 commented Mar 19, 2025

Thank you @ThaSami for this PR. A lot of other folks will benefit from this, and I'm looking forward to it getting merged by the litmus maintainers soon 🚀

Successfully merging this pull request may close these issues.

SSM by Tag: Misleading IAM Permission Errors and Rate Limiting Issues
3 participants