Conversation

@SylvainSenechal
Contributor

@SylvainSenechal SylvainSenechal commented Oct 6, 2025

Issue: BB-706
sdk v3 migration + backbeat client migration

@bert-e
Contributor

bert-e commented Oct 6, 2025

Hello sylvainsenechal,

My role is to assist you with the merge of this
pull request. Please type @bert-e help to get information
on this process, or consult the user documentation.

Available options
name description privileged authored
/after_pull_request Wait for the given pull request id to be merged before continuing with the current one.
/bypass_author_approval Bypass the pull request author's approval
/bypass_build_status Bypass the build and test status
/bypass_commit_size Bypass the check on the size of the changeset TBA
/bypass_incompatible_branch Bypass the check on the source branch prefix
/bypass_jira_check Bypass the Jira issue check
/bypass_peer_approval Bypass the pull request peers' approval
/bypass_leader_approval Bypass the pull request leaders' approval
/approve Instruct Bert-E that the author has approved the pull request. ✍️
/create_pull_requests Allow the creation of integration pull requests.
/create_integration_branches Allow the creation of integration branches.
/no_octopus Prevent Wall-E from doing any octopus merge and use multiple consecutive merge instead
/unanimity Change review acceptance criteria from one reviewer at least to all reviewers
/wait Instruct Bert-E not to run until further notice.
Available commands
name description privileged
/help Print Bert-E's manual in the pull request.
/status Print Bert-E's current status in the pull request TBA
/clear Remove all comments from Bert-E from the history TBA
/retry Re-start a fresh build TBA
/build Re-start a fresh build TBA
/force_reset Delete integration branches & pull requests, and restart merge process from the beginning.
/reset Try to remove integration branches unless there are commits on them which do not appear on the source branch.

Status report is not available.

@codecov

codecov bot commented Oct 6, 2025

Codecov Report

❌ Patch coverage is 62.58597% with 272 lines in your changes missing coverage. Please review.
✅ Project coverage is 74.24%. Comparing base (66c19b0) to head (19e0184).
⚠️ Report is 38 commits behind head on development/9.1.

Files with missing lines Patch % Lines
...xtensions/replication/tasks/MultipleBackendTask.js 59.18% 60 Missing ⚠️
extensions/replication/utils/SetupReplication.js 4.76% 60 Missing ⚠️
lib/BackbeatMetadataProxy.js 30.23% 30 Missing ⚠️
extensions/replication/tasks/CopyLocationTask.js 72.16% 27 Missing ⚠️
extensions/replication/management.js 4.16% 23 Missing ⚠️
lib/queuePopulator/IngestionProducer.js 69.86% 22 Missing ⚠️
extensions/lifecycle/management.js 5.88% 16 Missing ⚠️
lib/clients/utils.js 65.38% 9 Missing ⚠️
lib/credentials/CredentialsManager.js 75.75% 8 Missing ⚠️
extensions/utils/VaultClientWrapper.js 16.66% 5 Missing ⚠️
... and 5 more
Additional details and impacted files

Files with missing lines Coverage Δ
extensions/gc/tasks/GarbageCollectorTask.js 87.28% <100.00%> (+0.10%) ⬆️
...ecycle/bucketProcessor/LifecycleBucketProcessor.js 79.87% <100.00%> (+0.38%) ⬆️
.../lifecycle/tasks/LifecycleColdStatusArchiveTask.js 89.79% <100.00%> (ø)
extensions/lifecycle/tasks/LifecycleTaskV2.js 88.88% <ø> (ø)
lib/clients/ClientManager.js 66.66% <100.00%> (+1.96%) ⬆️
lib/credentials/AccountCredentials.js 64.10% <100.00%> (+0.94%) ⬆️
lib/credentials/RoleCredentials.js 98.00% <100.00%> (+0.32%) ⬆️
lib/tasks/BackbeatTask.js 97.36% <ø> (ø)
...sions/lifecycle/tasks/LifecycleDeleteObjectTask.js 92.85% <93.33%> (+0.34%) ⬆️
extensions/lifecycle/tasks/LifecycleTask.js 91.54% <95.65%> (+0.02%) ⬆️
... and 13 more

... and 10 files with indirect coverage changes

Components Coverage Δ
Bucket Notification 80.36% <ø> (ø)
Core Library 80.33% <62.00%> (-0.70%) ⬇️
Ingestion 70.61% <69.86%> (+0.30%) ⬆️
Lifecycle 78.28% <81.81%> (-0.07%) ⬇️
Oplog Populator 85.06% <ø> (ø)
Replication 59.94% <58.21%> (-1.25%) ⬇️
Bucket Scanner 85.76% <ø> (+0.15%) ⬆️
@@                 Coverage Diff                 @@
##           development/9.1    #2679      +/-   ##
===================================================
- Coverage            74.81%   74.24%   -0.58%     
===================================================
  Files                  201      200       -1     
  Lines                13579    13564      -15     
===================================================
- Hits                 10159    10070      -89     
- Misses                3410     3484      +74     
  Partials                10       10              
Flag Coverage Δ
api:retry 9.36% <2.75%> (-0.14%) ⬇️
api:routes 9.18% <0.96%> (-0.13%) ⬇️
bucket-scanner 85.76% <ø> (+0.15%) ⬆️
ft_test:queuepopulator 9.00% <0.27%> (-1.33%) ⬇️
ingestion 12.50% <8.52%> (-0.06%) ⬇️
lib 7.55% <0.41%> (-0.24%) ⬇️
lifecycle 18.86% <10.59%> (-0.08%) ⬇️
notification 1.03% <0.00%> (-0.01%) ⬇️
replication 18.46% <38.23%> (-0.31%) ⬇️
unit 50.76% <19.11%> (+0.03%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

@SylvainSenechal SylvainSenechal force-pushed the improvement/BB-706 branch 5 times, most recently from d06be27 to 11d7667 Compare October 13, 2025 14:47
@scality scality deleted a comment from bert-e Oct 13, 2025
@scality scality deleted a comment from bert-e Oct 13, 2025
@bert-e
Contributor

bert-e commented Oct 13, 2025

Waiting for approval

The following approvals are needed before I can proceed with the merge:

  • the author

  • 2 peers

@SylvainSenechal SylvainSenechal force-pushed the improvement/BB-706 branch 3 times, most recently from 5c4933f to 616c43b Compare October 14, 2025 09:58
return done(err);
}

const backbeatClient = this.getBackbeatClient(accountId);
Contributor Author

Side note: a lot of variables, function names and possibly file names would benefit from a rename, e.g. from backbeatClient to cloudserverClient.
I don't want to do it now because it would add a lot of noise to this PR, and it's not even that straightforward to do with this silly programming language

Contributor

@francoisferrand francoisferrand Nov 28, 2025

to make it more palatable, think of it as Backbeat(Routes)Client for now...

Key: destEntry.getObjectKey(),
CanonicalID: destEntry.getOwnerId(),
// TODO : missing content length
ContentLength: partSize,
Contributor Author

I removed content-length. Smithy doesn't let us set the Content-Length header from a value we define ourselves, because it computes the header itself.
The only thing I'm worried about is that the new ContentLength from Smithy might not be equal to the previous one: I think the new one uses the whole request, whereas our old content length was partObj.getObjSize()

@SylvainSenechal SylvainSenechal force-pushed the improvement/BB-706 branch 3 times, most recently from 404c946 to 897ee3e Compare October 14, 2025 14:06
// eslint-disable-next-line no-param-reassign
err.origin = 'target';
if (err.ObjNotFound || err.code === 'ObjNotFound') {
if (err.ObjNotFound || err.code === 'ObjNotFound' ||
Contributor Author

This isn't super nice error checking 🫤

we cannot guarantee the type of error there? is it always an arsenal error, or an sdk one? if sdk you should be able to use instanceof, no?
Side note: can't it also be a NoSuchVersion in this case, if the versionID is not empty?

Contributor Author

@SylvainSenechal SylvainSenechal Oct 27, 2025

I double checked, we can use err.name === 'ObjNotFound' (or err.code too)

Contributor Author

For the side note, I wasn't able to trigger any NoSuchVersion, but I was able to trigger NoSuchBucket.
In any case, not very important imo: the error will always be thrown; this if statement is only there to avoid logging the error in this specific situation

.then(data => {
sourceEntry.setReplicationSiteDataStoreVersionId(this.site,
data.versionId);
// TODO : review metadata metrics
Contributor Author

@francoisferrand I think using JSON.stringify will work, I see that the function .getSerialized() in Arsenal also returns a string from JSON.stringify that is then used in this same _publishMetadataWriteMetrics, but I'm not sure that "command.input" is equivalent to the old "destReq.httpRequest.body" data we used to call it with

Contributor Author

There are a few occurrences where this _publishMetadataWriteMetrics is an issue for me.
Actually, we can add a middleware to extract the request body like this, and pass it to the metrics just as before.
However, I have found that here it would likely be useless: as you can see in the command above, there is no Body in the command, so the length would always be 0.
I have only found one occurrence where the request has a body

command.middlewareStack.add(
        next => async args => {
            const request = args.request as any;
            console.log("content-length:", request?.headers?.['content-length'])
            console.log("len", Buffer.byteLength(request.body as any))
            return next(args);
        },
        { step: 'finalizeRequest' }
);

Contributor Author

One more thought : With the old SDK, we were using destReq.httpRequest.body, but if you look at the command, there is no proper body parameter, and with the new v3 client, I tried to get the body with the middleware and it is undefined...
Anyways, probably not worth it to overthink this problem, we just need to clearly state what is the metric we want to compute first (I believe that here it's the Tags, since the api is putTagging), so we probably just want to do something like
this._publishMetadataWriteMetrics(JSON.stringify(command.input.Tags), writeStartTime)
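A minimal sketch of that idea, assuming the metric should reflect the serialized tags. The helper name `serializeTagsForMetrics` and the empty-object default are hypothetical; only `command.input.Tags` and `_publishMetadataWriteMetrics` come from the discussion above. The default matters because `JSON.stringify(undefined)` returns `undefined`, which `Buffer.byteLength` rejects.

```javascript
// Sketch only: build the payload passed to the metrics call from the
// command input. Defaulting to an empty object when Tags is missing is an
// assumption made for illustration.
function serializeTagsForMetrics(input) {
    const tags = input && input.Tags !== undefined ? input.Tags : {};
    const payload = JSON.stringify(tags);
    // Buffer.byteLength counts UTF-8 bytes, which can differ from
    // payload.length when tag values contain non-ASCII characters.
    return { payload, bytes: Buffer.byteLength(payload) };
}
```

It could then be called as `this._publishMetadataWriteMetrics(serializeTagsForMetrics(command.input).payload, writeStartTime)`, if the string form is what the metric expects.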

@SylvainSenechal SylvainSenechal force-pushed the improvement/BB-706 branch 2 times, most recently from 3f6b0a3 to fb1beba Compare October 15, 2025 13:50
}
if (!aborted) {

return this.backbeatClient.send(getObjectCommand)
Contributor Author

This is the kind of change that's a little bit tricky:
With this new GetObject, we cannot have the same "incomingMsg" readable stream as we used to have before.
The body returned is of type "StreamingBlobPayloadOutputTypes"; there are a few methods that we can call on it (transformToString, transformToByteArray, transformToWebStream).

Here, I chose to pass the body returned by getObject directly to _sendMultipleBackendPutObject, because the MultiBackendPutObject function should be able to use the response.Body of that getObject directly, which simplifies the code

(err.origin === 'source' &&
(err.NoSuchEntity || err.code === 'NoSuchEntity' ||
err.AccessDenied || err.code === 'AccessDenied'))) {
(err.NoSuchEntity || err.code === 'NoSuchEntity' || err.name === 'NoSuchEntity' ||
Contributor Author

I know this is horrible but I'll call it defensive code

@SylvainSenechal SylvainSenechal force-pushed the improvement/BB-706 branch 3 times, most recently from e6f1071 to 2a4168a Compare November 26, 2025 18:06
@SylvainSenechal SylvainSenechal force-pushed the improvement/BB-706 branch 3 times, most recently from 143378d to 8cacb15 Compare November 26, 2025 19:46
@SylvainSenechal SylvainSenechal changed the title from "Replace backbeat client with cloudserver client" to "Replace backbeat client with cloudserver client and migration aws sdk to v3" Nov 27, 2025
@SylvainSenechal SylvainSenechal marked this pull request as ready for review November 27, 2025 09:06
accountId = res.Account;
} catch (err) {
// Workaround a Vault issue on 8.3 branch
// https://scality.atlassian.net/browse/VAULT-238
Contributor

is this still relevant? Please create a ticket to look at it, and possibly clean it up eventually...

😂 I was checking the same. We already have the ticket I guess...

Contributor Author

Added to the checklist

Contributor

@francoisferrand francoisferrand left a comment

(still reviewing, but already some findings)

if (err.code === 'ObjNotFound' || err.code === 'NoSuchBucket') {
return done(err, { committable: true });
if (err.code === 'ObjNotFound' || err.code === 'NoSuchBucket' ||
err.name === 'ObjNotFound' || err.name === 'NoSuchBucket') {
Contributor

no actual object type, as the errors are not "parsed"?
(should be added in CloudserverClient eventually, I guess)

We should create a ticket, and already "flag" the places in backbeat which would benefit from this with something like TODO: use error types once supported in CLDSRVCLT-5

Contributor Author

@SylvainSenechal SylvainSenechal Dec 1, 2025

Yes I didn't pay much attention to proper smithy error generation, I'm adding this to my checklist

// because it means the object has been deleted by other means and we don't need to retry
if (err.code === 'ObjNotFound' || err.code === 'NoSuchBucket') {
return done(err, { committable: true });
if (err.code === 'ObjNotFound' || err.code === 'NoSuchBucket' ||
Contributor

why do we need to check both code and name?
esp. since errors later check only name, this does not seem consistent (either we need to check everywhere, or we can check name only)?

→ please review and align behavior where/if relevant
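One way to align the checks until typed errors are available would be a single helper, sketched below. The name `errorMatches` is hypothetical; the three shapes it accepts (`err.name`, `err.code`, and a boolean flag such as `err.ObjNotFound`) are the ones that appear in the conditions of this diff, and whether all three are still needed is exactly the open question.

```javascript
// Sketch only: hypothetical helper that centralizes the checks currently
// spread over the diff. It returns true when the error exposes one of the
// given names as `name`, as `code`, or as a boolean flag on the error.
function errorMatches(err, names) {
    if (!err) {
        return false;
    }
    return names.some(n => err.name === n || err.code === n || err[n] === true);
}
```

A call site would then read `if (errorMatches(err, ['ObjNotFound', 'NoSuchBucket'])) { ... }`, and dropping one of the shapes later becomes a one-line change.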


const s3 = this.clientManager.getS3Client(accountId);
if (!s3) {
const s3Client = this.clientManager.getS3Client(accountId);
Contributor

nit: renaming from s3 to s3Client was not needed?

Contributor Author

I prefer to have a proper name for clients; I probably did this because ctrl-f "s3" returns too many irrelevant results

logFields: { params },
actionFunc: done => s3.getBucketLifecycleConfiguration(params, done),
actionFunc: done => {
const command = new GetBucketLifecycleConfigurationCommand(params);
Contributor

should we not attach request id here?
(maybe cleanup for later, but best flag it already with a TODO and ticket)

Contributor Author

We talked about requestID earlier.
I think what we can do is leave it as it is with this pr (I kept request id where it was used, but didn't add it where it wasn't used).
Then follow up ticket to check if some apis should use it

Contributor

Yes, but we can already add comments in the code pointing to this follow-up, since it is relatively easy to identify those spots in the diff (it is relatively small vs the whole codebase). It does not change the current PR behavior at all, but will give a starting point to look at when working on that follow-up ticket.
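For the follow-up, attaching request UIDs to an SDK v3 command could look roughly like the sketch below. Everything specific here is an assumption for illustration: the helper name `attachRequestUids`, the header name, and the `log.getSerializedUids()` method are not confirmed by this PR. The only SDK piece relied on is `command.middlewareStack.add()` with a `step` option, as already used earlier in this conversation.

```javascript
// Sketch only: helper name, header name and logger method are assumptions.
const REQUEST_UIDS_HEADER = 'x-scal-request-uids';

function attachRequestUids(command, log) {
    command.middlewareStack.add(
        next => async args => {
            // At the 'build' step the HTTP request object exists and has
            // not been signed yet, so headers can still be added.
            if (args.request && args.request.headers) {
                // eslint-disable-next-line no-param-reassign
                args.request.headers[REQUEST_UIDS_HEADER] = log.getSerializedUids();
            }
            return next(args);
        },
        { step: 'build', name: 'attachRequestUids' },
    );
    return command;
}
```

Spots flagged with a TODO in this PR would then only need `attachRequestUids(command, log)` before `send(command)`.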


(async () => {
try {
const command = new DeleteBucketLifecycleCommand(params);
Contributor

missing request id


(async () => {
try {
const command = new PutBucketVersioningCommand(params);
Contributor

missing requestId

try {
const command = new PutBucketVersioningCommand(params);
await getS3Client(endpoint).send(command);
cb();
Contributor

you must NEVER invoke the continuation callback from within the try block !
this creates hard to debug issues when the "followup" code (in the callback) throws an exception....and thus invokes the callback as well!

either we "fully" migrate to async/await (probably too large a change to do along with SDK), or we need to stick with promise.then()... or use utils.callbackify(), which can make this migration even simpler as it does the combining of success and failures callbacks.
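To make the failure mode concrete, here is a minimal sketch. `fakeClient`, `sendUnsafe` and `sendWithCallbackify` are hypothetical names, and I am assuming callbackify() refers to Node's built-in `util.callbackify`. In the first variant, an exception thrown by the continuation is caught by the same try/catch, so the callback runs a second time with that exception as its error.

```javascript
const util = require('util');

// Hypothetical stand-in for an SDK v3 client: send() returns a promise.
const fakeClient = { send: async command => ({ ok: true, command }) };

// Anti-pattern: cb() is invoked inside the try block. If the continuation
// throws, the catch block below calls cb a second time.
function sendUnsafe(client, command, cb) {
    (async () => {
        try {
            await client.send(command);
            cb(null);
        } catch (err) {
            cb(err);
        }
    })();
}

// Safer: util.callbackify() calls cb exactly once, outside any try/catch
// owned by this function. An exception thrown by the continuation surfaces
// as an uncaught exception instead of a second callback invocation.
function sendWithCallbackify(client, command, cb) {
    util.callbackify(() => client.send(command))(cb);
}
```

Note that the callbackified variant passes the resolved value as the second callback argument, so call sites that previously called `cb()` with no arguments would receive `cb(null, data)`.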

Contributor Author

yeah I fixed it, and found the pattern in a few other files that I'll also fix

Contributor

@francoisferrand francoisferrand left a comment

See comments, but to sum up the "major" issues:

  • In quite a few places, code now calls the continuation callback from within the try block. This is very dangerous (leading to duplicate callback code & hard-to-debug issues) and must not be done: use callbackify() or Promise.then() instead
  • Some code can be deduplicated/simplified using callbackify() (e.g. common log or metrics in resolve & reject)
  • "Duplicated" CloudserverClient / S3Client / IACClient setup, with lots of options & extra middleware: should be factorized with helper functions?
  • Errors checked with both err.code and err.name: seems weird, is this really needed?
  • RequestUID is not consistently passed; not sure it can/should always be added, or how getSerializedLog will behave when not called from an API, but it should be flagged to ease follow-up work
  • I still need to review the AbortController and associated uses (CopyLocationTask/ReplicateObject). Now that we have fixed the "underlying" issues with streaming, are these "deep" reworks/refactors needed? Maybe worth checking, to minimize the risk of breaking stuff by rewriting if not needed...

}
return cb(null, roles[0], roles[1]);
})
.catch(err => {
Contributor

nit: using utils.callbackify() instead of promise.then() lets us keep the exact same structure as before... or you can at least put the .catch block before .then

return this.backbeatSource.getMetadata(params, (err, blob) => {
if (err) {
return this.backbeatSource.send(new GetMetadataCommand(params))
.then(data => {
Contributor

same, .catch().then() makes it easier to review

};
return this.backbeatSource.getMetadata(params, (err, blob) => {
if (err) {
return this.backbeatSource.send(new GetMetadataCommand(params))
Contributor

missing request id

return doneOnce(err);
});
const incomingMsg = sourceReq.createReadStream();
attachReqUids(command, log);
Contributor

replace with an attribute in the command params?

@SylvainSenechal
Contributor Author

SylvainSenechal commented Dec 1, 2025

Should probably create a follow up ticket (edit : created : https://scality.atlassian.net/jira/software/c/projects/OS/boards/268?selectedIssue=BB-730)

  • cleanup err.name / err.code
  • bin/ensureServiceUser line 167 https://scality.atlassian.net/browse/VAULT-238
  • smithy proper errors (to be used with instanceof)
  • Request ID for s3 sdk calls, and maybe for cloudserver client calls too ?
  • Have centralized function to create s3/cloudserver clients
  • Refactor s3/cloudserver client mocks ?

Other things to check now for this ticket :

  • tests/functional/queuePopulator/queuePopulator.spec.js recheck tests with Maha
  • Put back the old test.yml sequential test instead of matrix

@bert-e
Contributor

bert-e commented Dec 2, 2025

Incorrect fix version

The Fix Version/s in issue BB-706 contains:

  • 9.2.0

Considering where you are trying to merge, I ignored possible hotfix versions and I expected to find:

  • 9.1.4

Please check the Fix Version/s of BB-706, or the target
branch of this pull request.

@SylvainSenechal SylvainSenechal force-pushed the improvement/BB-706 branch 2 times, most recently from 77722a8 to 561e37e Compare December 3, 2025 10:08