
Commit 091bea8: Update README.md
1 parent 67408d2

1 file changed: README.md (+11 -8)
@@ -1,29 +1,32 @@
-# Apache Spark Structure Streaming S3 Connector
+# Apache Spark Structured Streaming S3 Connector

-An Apache Spark Structure Streaming S3 connector for reading S3 files using Amazon S3 event notifications to AWS SQS.
+An Apache Spark Structured Streaming S3 connector for reading S3 files using Amazon S3 event notifications to AWS SQS.
## Architecture Overview

![s3-connector](./docs/images/s3-connector-overview.png)

1. Configure [Amazon S3 Event Notifications](https://docs.aws.amazon.com/AmazonS3/latest/userguide/NotificationHowTo.html) to send `s3:ObjectCreated:*` events with a specified prefix to SQS (a sketch follows this overview).
2. The S3 connector discovers new files via `ObjectCreated` S3 events in AWS SQS.
-3. The files' metadata are persisted in RocksDB in the checkpoint location together with Spark Structure streaming engine maintained offset. This ensures that data is ingested exactly once. (End to end exactly once requires the data sink to be idempotent.)
+3. The files' metadata are persisted in RocksDB in the checkpoint location, together with the offsets maintained by the Spark Structured Streaming engine. This ensures that data is ingested exactly once. (End-to-end exactly-once delivery requires the data sink to be idempotent.)
4. The driver distributes the S3 file list to executors.
5. Executors read the S3 files.
-6. After successful data sink processing, Spark Structure streaming engine commit the batch
+6. After successful data sink processing, the Spark Structured Streaming engine commits the batch.
+
+The RocksDB instance used by this connector is self-contained. A Spark Structured Streaming application using this connector is free to use any state store backend.
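
For step 1 of the overview, a minimal boto3 sketch of sending a bucket's `s3:ObjectCreated:*` events to SQS; the bucket name, queue ARN, and key prefix below are illustrative:

```python
# Illustrative names: replace the bucket, queue ARN, and prefix with your own.
import boto3

s3 = boto3.client("s3")
s3.put_bucket_notification_configuration(
    Bucket="my-input-bucket",
    NotificationConfiguration={
        "QueueConfigurations": [
            {
                "QueueArn": "arn:aws:sqs:us-east-1:123456789012:s3-connector-queue",
                "Events": ["s3:ObjectCreated:*"],
                # Only notify for objects under this key prefix.
                "Filter": {
                    "Key": {"FilterRules": [{"Name": "prefix", "Value": "data/"}]}
                },
            }
        ]
    },
)
```

Note that the SQS queue's access policy must also allow the S3 service principal to send messages to the queue.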

## How to build

**Prerequisite**: [install RocksDB](https://github.com/facebook/rocksdb/blob/main/INSTALL.md)

-Clone spark-sql-kinesis from the source repository on GitHub.
+Clone `spark-streaming-sql-s3-connector` from the source repository on GitHub.

```
git clone https://github.com/aws-samples/spark-streaming-sql-s3-connector.git
mvn clean install -DskipTests
```
This creates the *target/spark-streaming-sql-s3-connector-<version>.jar* file, which contains the connector code and its dependencies. The jar file is also installed to the local Maven repository.

-Current version is compatible with spark 3.2 and above.
+Current version is compatible with Spark 3.2 and above.

## How to test

@@ -182,7 +185,7 @@ spark-submit --class pt.spark.sql.streaming.connector.DataGenerator --jars ~/spa
```

## How to configure
-Spark Structure Streaming S3 connector supports the following settings.
+Spark Structured Streaming S3 connector supports the following settings.

Name | Default | Description
--- | --- | ---
@@ -206,7 +209,7 @@ spark.s3conn.sqs.keepMessageForConsumerError| false
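
As a rough usage sketch of these settings: the source format name and most option keys below are assumptions for illustration (only `spark.s3conn.sqs.keepMessageForConsumerError` above appears in the table); consult the full settings table for the real keys.

```python
# A hedged sketch of consuming files discovered via SQS with this connector.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("s3-connector-demo").getOrCreate()

df = (
    spark.readStream
    .format("s3-connector")  # hypothetical source name
    .option("spark.s3conn.queueUrl",  # hypothetical option key
            "https://sqs.us-east-1.amazonaws.com/123456789012/s3-connector-queue")
    .option("spark.s3conn.fileFormat", "csv")  # hypothetical option key
    .schema("id INT, value STRING")  # the file schema must be provided
    .load()
)

# An idempotent file sink, so the pipeline is end-to-end exactly-once.
query = (
    df.writeStream
    .format("parquet")
    .option("path", "s3://my-output-bucket/out/")
    .option("checkpointLocation", "s3://my-output-bucket/checkpoint/")
    .start()
)
```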
## How to use S3 event notifications for multiple applications

-If one S3 path's event notifications need to be consumed by multiple Spark Structure Streaming applications, SNS can be used to fanout to Amazon SQS queues. The message flow is S3 event notifications -> SNS -> SQS. When an S3 event notification is published to the SNS topic, Amazon SNS sends the notification to each of the subscribed SQS queues.
+If one S3 path's event notifications need to be consumed by multiple Spark Structured Streaming applications, SNS can be used to fan out to Amazon SQS queues. The message flow is S3 event notifications -> SNS -> SQS. When an S3 event notification is published to the SNS topic, Amazon SNS sends the notification to each of the subscribed SQS queues.
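
A minimal boto3 sketch of this fan-out wiring; all ARNs are placeholders:

```python
# Placeholder ARNs: one SNS topic receiving the S3 event notifications,
# plus one SQS queue per consuming application.
import boto3

sns = boto3.client("sns")
for queue_arn in [
    "arn:aws:sqs:us-east-1:123456789012:app-a-queue",
    "arn:aws:sqs:us-east-1:123456789012:app-b-queue",
]:
    sns.subscribe(
        TopicArn="arn:aws:sns:us-east-1:123456789012:s3-events-topic",
        Protocol="sqs",
        Endpoint=queue_arn,
        # Deliver the S3 notification JSON without the SNS envelope, so each
        # queue sees the same payload as a direct S3 -> SQS notification.
        Attributes={"RawMessageDelivery": "true"},
    )
```

Each queue's access policy must additionally allow the SNS topic to send messages to it.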
## Security
