Description
We have begun load testing the lambda-redshift-loader in production with a data source identical to the one used by the old process we must sunset. When we compared our load-test Redshift table with the expected data, records were missing. After investigating some specific examples, I found the following:
- The file was formatted correctly in S3 and triggered an event that invoked the Lambda function.
- The file was assigned to a batch with 22 other files (23 in total). Running describeBatch.js returns all 23 files along with the associated S3 manifest file. However, the manifest file itself lists only 22 files; the missing one is the 23rd (the last in the describeBatch.js output). In Redshift, the manifest was loaded successfully, but without the file in question, and there is no reference to the file in any of the stl_load tables.
- The associated batchId is in status "complete".
- The LambdaRedshiftProcessedFiles table records the association between the missing file and that batchId.
- The Lambda function doesn't fail, but the following error occasionally appears in the CloudWatch logs:
```
error: The conditional request failed
{
  "code": "ConditionalCheckFailedException",
  "requestId": "",
  "retryDelay": 25.41047740462168,
  "retryable": false,
  "statusCode": 400,
  "time": "2022-12-22T00:22:32.534Z"
}
error:
{
  "TableName": "LambdaRedshiftBatches",
  "AttributeUpdates": {
    "status": {
      "Action": "PUT",
      "Value": {
        "S": "locked"
      }
    },
    "lastUpdate": {
      "Action": "PUT",
      "Value": {
        "N": "1671669428.94"
      }
    }
  },
  "Expected": {
    "status": {
      "AttributeValueList": [
        {
          "S": "open"
        }
      ],
      "ComparisonOperator": "EQ"
    }
  },
  "ReturnValues": "ALL_NEW"
}
```
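My reading of the logged request (a guess, not something I have confirmed in the loader's source) is that an invocation tried to lock a batch whose status was no longer "open", i.e. another invocation had already locked it. The sketch below is a hypothetical in-memory model, not the loader's code: `FakeBatch`, `addFile` and `lock` are names I made up, and it assumes that appending a file to a batch is not itself conditioned on the batch still being open. Under that assumption it reproduces the symptom: the batch record lists 23 files while the manifest snapshot holds only 22.

```javascript
// Hypothetical in-memory model of one LambdaRedshiftBatches item; NOT the loader's code.
class ConditionalCheckFailedException extends Error {
  constructor() {
    super("The conditional request failed");
    this.code = "ConditionalCheckFailedException";
  }
}

class FakeBatch {
  constructor() {
    this.status = "open";
    this.entries = [];
  }

  // ASSUMPTION: the append does not check that the batch is still "open".
  addFile(name) {
    this.entries.push(name);
  }

  // Mirrors the logged request: set status to "locked" only if status EQ "open",
  // then return the entries (like ReturnValues: ALL_NEW) to build the manifest.
  lock() {
    if (this.status !== "open") {
      throw new ConditionalCheckFailedException();
    }
    this.status = "locked";
    return [...this.entries]; // snapshot used for the manifest
  }
}

const batch = new FakeBatch();
for (let i = 1; i <= 22; i++) batch.addFile(`file-${i}`);

// Invocation A locks the batch and snapshots 22 entries for its manifest.
const manifestEntries = batch.lock();

// Invocation B appends the 23rd file after A's snapshot, then fails to lock.
batch.addFile("file-23");
let lockError = null;
try {
  batch.lock();
} catch (err) {
  lockError = err;
}

console.log(batch.entries.length); // 23 (what describeBatch.js would report)
console.log(manifestEntries.length); // 22 (what the manifest contains)
console.log(lockError.code); // ConditionalCheckFailedException
```

If the real add-to-batch write is conditioned on status "open" in the same request, this particular race cannot happen, so the hypothesis would need checking against the loader's actual append logic.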
As a test, I regenerated the same file under a new name, and the missing records then appeared in the load-test table.
Do you have any ideas where the problem could be?
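For anyone trying to reproduce this, the comparison between the describeBatch.js file list and the manifest can be scripted. A rough Node.js sketch: `findMissingFromManifest` is a hypothetical helper, and it assumes the batch entries are "bucket/key" strings (I have not checked the exact format describeBatch.js prints). It relies only on the standard Redshift COPY manifest layout, an `entries` array of objects whose `url` starts with `s3://`.

```javascript
// Hypothetical helper: returns batch entries that the COPY manifest does not reference.
// ASSUMPTION: batch entries are "bucket/key" strings.
function findMissingFromManifest(batchEntries, manifestJson) {
  const manifest = JSON.parse(manifestJson);
  // Redshift manifests list files as { "url": "s3://bucket/key", "mandatory": ... }.
  const inManifest = new Set(
    manifest.entries.map((e) => e.url.replace(/^s3:\/\//, ""))
  );
  return batchEntries.filter((entry) => !inManifest.has(entry));
}

// Usage with made-up names: three files in the batch, two in the manifest.
const batchEntries = [
  "my-bucket/input/a.csv",
  "my-bucket/input/b.csv",
  "my-bucket/input/c.csv",
];
const manifestJson = JSON.stringify({
  entries: [
    { url: "s3://my-bucket/input/a.csv", mandatory: true },
    { url: "s3://my-bucket/input/b.csv", mandatory: true },
  ],
});

console.log(findMissingFromManifest(batchEntries, manifestJson)); // [ 'my-bucket/input/c.csv' ]
```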