1. Configure [Amazon S3 Event Notifications](https://docs.aws.amazon.com/AmazonS3/latest/userguide/NotificationHowTo.html) to send `s3:ObjectCreated:*` events with the specified prefix to SQS.
2. The S3 connector discovers new files via `ObjectCreated` S3 events in AWS SQS.
- 3. The files' metadata are persisted in RocksDB in the checkpoint location together with Spark Structure streaming engine maintained offset. This ensures that data is ingested exactly once. (End to end exactly once requires the data sink to be idempotent.)
+ 3. The files' metadata are persisted in RocksDB in the checkpoint location together with Spark Structured Streaming engine maintained offset. This ensures that data is ingested exactly once. (End to end exactly once requires the data sink to be idempotent.)
4. Driver distributes the S3 file list to executors
5. Executors read the S3 files
- 6. After successful data sink processing, Spark Structure streaming engine commit the batch
+ 6. After successful data sink processing, Spark Structured Streaming engine commit the batch
+
+ The RocksDB used by this connector is self-contained. The Spark structured streaming application using this connector is free to use any state store backend.
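
Step 1 above can be sketched with boto3. This is a minimal illustration and not part of the connector: the helper names `build_queue_notification` and `apply_notification`, the queue ARN, and the prefix are all hypothetical. It assumes the SQS queue's access policy already allows Amazon S3 to send messages to it.

```python
def build_queue_notification(queue_arn: str, prefix: str) -> dict:
    """Build an S3 NotificationConfiguration that routes ObjectCreated events to SQS.

    Hypothetical helper for illustration; only objects whose key starts
    with `prefix` trigger a notification.
    """
    return {
        "QueueConfigurations": [
            {
                "QueueArn": queue_arn,
                "Events": ["s3:ObjectCreated:*"],
                "Filter": {
                    "Key": {"FilterRules": [{"Name": "prefix", "Value": prefix}]}
                },
            }
        ]
    }


def apply_notification(bucket: str, config: dict) -> None:
    """Apply the configuration to a bucket (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so the builder above works without AWS access

    boto3.client("s3").put_bucket_notification_configuration(
        Bucket=bucket, NotificationConfiguration=config
    )
```

Note that `put_bucket_notification_configuration` replaces the bucket's existing notification configuration, so any notifications already configured on the bucket would need to be merged into the dict before calling `apply_notification`.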
This will create a *target/spark-streaming-sql-s3-connector-<version>.jar* file which contains the connector code and its dependencies. The jar file will also be installed to the local Maven repository.
- Current version is compatible with spark 3.2 and above.
+ Current version is compatible with Spark 3.2 and above.
## How to use S3 event notifications for multiple applications
- If one S3 path's event notifications need to be consumed by multiple Spark Structure Streaming applications, SNS can be used to fanout to Amazon SQS queues. The message flow is S3 event notifications -> SNS -> SQS. When an S3 event notification is published to the SNS topic, Amazon SNS sends the notification to each of the subscribed SQS queues.
+ If one S3 path's event notifications need to be consumed by multiple Spark Structured Streaming applications, SNS can be used to fanout to Amazon SQS queues. The message flow is S3 event notifications -> SNS -> SQS. When an S3 event notification is published to the SNS topic, Amazon SNS sends the notification to each of the subscribed SQS queues.
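
With the S3 -> SNS -> SQS flow, the SQS message body is normally an SNS envelope whose `Message` field holds the S3 event as a JSON string (unless raw message delivery is enabled on the subscription). The sketch below shows one way a consumer could handle both shapes. It is illustrative only and does not show how the connector itself parses messages; the function name `created_objects` is hypothetical.

```python
import json
from urllib.parse import unquote_plus


def created_objects(sqs_body: str) -> list:
    """Return (bucket, key) pairs for ObjectCreated records in an SQS message body.

    Accepts either a direct S3 event or one wrapped in an SNS envelope.
    """
    payload = json.loads(sqs_body)

    # An SNS-delivered message carries the original S3 event as a JSON string.
    if payload.get("Type") == "Notification" and "Message" in payload:
        payload = json.loads(payload["Message"])

    objects = []
    # Messages without "Records" (for example the S3 test event) yield nothing.
    for record in payload.get("Records", []):
        if record.get("eventName", "").startswith("ObjectCreated:"):
            bucket = record["s3"]["bucket"]["name"]
            # Object keys in S3 event records are URL-encoded.
            key = unquote_plus(record["s3"]["object"]["key"])
            objects.append((bucket, key))
    return objects
```

Handling both shapes in one function lets the same consumer code read from a queue subscribed directly to S3 or from one fed through the SNS fanout.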