Kafka
DRAFT STATUS, WILL BE FILLED IN WHEN THIS COMPONENT BECOMES SUPPORTED
Website | https://kafka.apache.org/090/documentation.html/ |
Supported versions | KafkaClient: 0.9.0.1; Scala: 2.11 |
Current responsible(s) | Jürgen Jakobitsch @ SWC -- [email protected] |
Docker image(s) | bde2020/kafka:latest; bde2020/docker-kafkasail:latest (Example App) |
More info | http://kafka.apache.org/documentation.html#introduction |
Apache Kafka is a distributed publish-subscribe messaging system. It was originally developed at LinkedIn and later donated to the Apache Software Foundation. It can be scaled without downtime. Messages are persisted on disk in a distributed commit log to prevent data loss. Messages are organized into topics, to which consumers can subscribe. Depending on the use case, a subscription works in one of two ways. In the first, consumers form a consumer group and each message is delivered to only one member of the group, so the members share the work and no message is handled by several of them. In the second, the consumers are distinct and every message is delivered to every consumer. The second option can be used, for example, to deliver the same messages (e.g. data entries) to several databases.
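The two subscription modes can be illustrated with a small in-memory model. This is only a sketch of the delivery semantics, not the Kafka client API: `Topic`, `Consumer`, and `demo` are hypothetical names, and the per-message round-robin inside a group is a simplifying assumption (Kafka itself assigns partitions, not individual messages, to group members). Distinct consumers are modelled by giving each one its own group id, which is how this broadcast behaviour is normally obtained in Kafka.

```python
from collections import defaultdict


class Consumer:
    """Hypothetical consumer that simply records what it receives."""

    def __init__(self, name):
        self.name = name
        self.received = []


class Topic:
    """Toy in-memory stand-in for a topic (not the Kafka API)."""

    def __init__(self, name):
        self.name = name
        self.groups = defaultdict(list)  # group id -> members of that group
        self._cursor = defaultdict(int)  # group id -> round-robin position

    def subscribe(self, consumer, group_id):
        self.groups[group_id].append(consumer)

    def publish(self, message):
        # Every group sees every message, but within a group
        # exactly one member receives it (assumed round-robin).
        for group_id, members in self.groups.items():
            member = members[self._cursor[group_id] % len(members)]
            member.received.append(message)
            self._cursor[group_id] += 1


def demo():
    topic = Topic("entries")

    # Mode 1: two workers share one consumer group.
    a, b = Consumer("a"), Consumer("b")
    topic.subscribe(a, "workers")
    topic.subscribe(b, "workers")

    # Mode 2: distinct consumers, each in a group of its own.
    db1, db2 = Consumer("db1"), Consumer("db2")
    topic.subscribe(db1, "db1-group")
    topic.subscribe(db2, "db2-group")

    for i in range(4):
        topic.publish(f"m{i}")
    return a.received, b.received, db1.received, db2.received
```

Calling `demo()` shows the difference: the two workers split the four messages between them, while `db1` and `db2` each receive all four.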
An implementation of OpenRDF's SAIL API is provided as an example in bde2020/docker-kafkasail. This implementation extends OpenRDF's Memory Store (a triple store implementation). Instead of writing RDF statements directly into the underlying triple store, it creates Apache Kafka messages (lists of statements) that are then consumed by every running instance of the implementation. In this way, inserts and deletes are propagated to any number of running OpenRDF Memory Stores via Apache Kafka, which essentially makes it possible to cluster OpenRDF Memory Stores.
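The propagation pattern described above can be sketched without Kafka or OpenRDF. In the following minimal model, `ChangeLog` stands in for the Kafka topic and `ReplicatedStore` for a Memory Store instance; both names, the `("add" | "remove", [triples])` message shape, and the example triples are assumptions made for illustration and are not taken from the docker-kafkasail code. The sketch also assumes that an instance joining later replays the log from the beginning, which the description above does not state.

```python
class ChangeLog:
    """Toy append-only log standing in for a Kafka topic (hypothetical)."""

    def __init__(self):
        self.messages = []

    def append(self, message):
        self.messages.append(message)


class ReplicatedStore:
    """Toy in-memory triple store that writes via the log, never directly."""

    def __init__(self, log):
        self.log = log
        self.offset = 0          # index of the next unread log message
        self.statements = set()  # local copy of the triples

    def add(self, *triples):
        # Publish the change instead of touching the local set.
        self.log.append(("add", list(triples)))

    def remove(self, *triples):
        self.log.append(("remove", list(triples)))

    def poll(self):
        # Apply every message not yet seen by this instance, in log order.
        while self.offset < len(self.log.messages):
            op, triples = self.log.messages[self.offset]
            if op == "add":
                self.statements.update(triples)
            else:
                self.statements.difference_update(triples)
            self.offset += 1


def demo_replication():
    log = ChangeLog()
    s1, s2 = ReplicatedStore(log), ReplicatedStore(log)
    t1 = ("ex:alice", "foaf:knows", "ex:bob")
    t2 = ("ex:bob", "foaf:name", "Bob")

    s1.add(t1, t2)   # insert issued on the first instance
    s2.remove(t2)    # delete issued on the second instance
    s1.poll()
    s2.poll()

    late = ReplicatedStore(log)  # assumed: a new instance replays the log
    late.poll()
    return s1.statements, s2.statements, late.statements
```

Because every instance applies the same messages in the same order, all three stores end up holding the same set of statements regardless of which instance issued the insert or the delete.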
Scaling is achieved by adding new instances of the image to the running orchestration framework. As noted above, this can be done without downtime.