Kafka

DRAFT STATUS, WILL BE FILLED IN WHEN THIS COMPONENT BECOMES SUPPORTED


Website	https://kafka.apache.org/090/documentation.html/
Supported versions	KafkaClient: 0.9.0.1
	Scala: 2.11
Current responsible(s)	Jürgen Jakobitsch @ SWC -- [email protected]
Docker image(s)	bde2020/kafka:latest
	bde2020/docker-kafkasail:latest (Example App)
More info	http://kafka.apache.org/documentation.html#introduction

Short description

Apache Kafka is a distributed publish-subscribe messaging system. It was originally developed at LinkedIn and later donated to the Apache Software Foundation. It can be scaled without downtime. Messages are persisted on disk in a distributed commit log to prevent data loss.

Apache Kafka's messages are organized into topics, to which consumers can subscribe. Depending on the use case, a subscription can take one of two forms. In the first, the consumers form a consumer group: every message is delivered to exactly one member of the group, so that each message is handled by only one consumer in the group. In the second, the consumers are distinct (each in its own group), and every message is consumed by every single consumer. The second option can be used, for example, to distribute the same messages (e.g. data entries) to several databases.
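The two subscription modes can be illustrated with a small in-memory model. This is only a sketch under simplifying assumptions and does not use the Kafka client API: the `Topic` class, the `drain` helper and all consumer and group names are hypothetical. The topic has a single log and tracks one read offset per group, whereas real Kafka assigns whole partitions to the members of a group rather than handing out individual messages in turn.

```python
from collections import defaultdict


class Topic:
    """Toy in-memory stand-in for a topic (hypothetical, not the Kafka API)."""

    def __init__(self):
        self.log = []                    # append-only list of messages
        self.offsets = defaultdict(int)  # next offset to read, per group id

    def publish(self, message):
        self.log.append(message)

    def poll(self, group_id):
        """Return the next unread message for this group, or None if caught up."""
        offset = self.offsets[group_id]
        if offset >= len(self.log):
            return None
        self.offsets[group_id] = offset + 1
        return self.log[offset]


def drain(topic, consumers):
    """Poll each (name, group_id) consumer in turn until no messages are left."""
    received = defaultdict(list)
    progressed = True
    while progressed:
        progressed = False
        for name, group_id in consumers:
            message = topic.poll(group_id)
            if message is not None:
                received[name].append(message)
                progressed = True
    return dict(received)


topic = Topic()
for i in range(4):
    topic.publish(f"m{i}")

# Mode 1: both consumers share one group, so the messages are split between them.
shared = drain(topic, [("worker-1", "workers"), ("worker-2", "workers")])

# Mode 2: each consumer has its own group, so each one sees every message.
broadcast = drain(topic, [("db-1", "group-db-1"), ("db-2", "group-db-2")])

print(shared)
print(broadcast)
```

In this model the only difference between the two modes is whether the consumers share a group id, which mirrors how the choice is typically expressed in a Kafka consumer configuration.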

Example usage

An implementation of OpenRDF's SAIL API is provided as an example in bde2020/docker-kafkasail. This implementation extends OpenRDF's Memory Store (a triple store implementation). Instead of writing RDF statements directly into the underlying triple store, it creates Apache Kafka messages (lists of statements) that are then consumed by every running instance of the implementation. In this way, inserts and deletes are propagated to any number of running OpenRDF Memory Stores via Apache Kafka, which essentially makes it possible to cluster OpenRDF Memory Stores.
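The replication idea can be sketched as follows. This is not the code in bde2020/docker-kafkasail and it uses neither the OpenRDF nor the Kafka API: `StatementLog` stands in for a Kafka topic, `ReplicatedMemoryStore` stands in for the extended Memory Store, and statements are modelled as plain tuples. All of these names are hypothetical and chosen for illustration only.

```python
class StatementLog:
    """Hypothetical stand-in for a Kafka topic carrying batches of statements."""

    def __init__(self):
        self.batches = []

    def append(self, operation, statements):
        self.batches.append((operation, list(statements)))


class ReplicatedMemoryStore:
    """Toy store: writes go to the shared log, state changes only on replay."""

    def __init__(self, log):
        self.log = log
        self.offset = 0          # position of the next batch to apply
        self.statements = set()  # local in-memory "triple store"

    def add(self, *statements):
        # Do not touch local state; publish the insert as a message instead.
        self.log.append("add", statements)

    def remove(self, *statements):
        # Deletes are propagated the same way as inserts.
        self.log.append("remove", statements)

    def sync(self):
        """Apply every batch this instance has not consumed yet, in log order."""
        while self.offset < len(self.log.batches):
            operation, statements = self.log.batches[self.offset]
            if operation == "add":
                self.statements.update(statements)
            else:
                self.statements.difference_update(statements)
            self.offset += 1


log = StatementLog()
store_a = ReplicatedMemoryStore(log)
store_b = ReplicatedMemoryStore(log)

t1 = ("ex:alice", "ex:knows", "ex:bob")
t2 = ("ex:bob", "ex:knows", "ex:carol")

store_a.add(t1, t2)   # insert issued on instance A
store_b.remove(t1)    # delete issued on instance B

store_a.sync()
store_b.sync()
print(store_a.statements == store_b.statements)
```

Because every instance replays the same log in the same order, all instances converge on the same set of statements regardless of which instance received the original insert or delete.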

Scaling

Kafka can be scaled by adding new instances of the image to the running orchestration framework; as noted above, this can be done without downtime.

