Elasticsearch
Website | https://www.elastic.co/ |
Supported versions | 2.3 |
Current responsible(s) | Mohamed Nadjib Mami @ Uni.Bonn -- [email protected] |
Docker image(s) | bde2020/elasticsearch:latest |
More info | https://www.elastic.co/products/elasticsearch |
Elasticsearch is a distributed, RESTful search and analytics engine capable of solving a growing number of use cases. Features include:
- Distributed and Highly Available Search Engine.
  - Each index is fully sharded with a configurable number of shards.
  - Each shard can have one or more replicas.
  - Read / Search operations performed on any of the replica shards.
- Multi Tenant with Multi Types.
  - Support for more than one index.
  - Support for more than one type per index.
  - Index level configuration (number of shards, index storage, ...).
- Various set of APIs
  - HTTP RESTful API
  - Native Java API.
  - All APIs perform automatic node operation rerouting.
- Document oriented
  - No need for upfront schema definition.
  - Schema can be defined per type for customization of the indexing process.
- Reliable, Asynchronous Write Behind for long term persistency.
- (Near) Real Time Search.
- Built on top of Lucene
  - Each shard is a fully functional Lucene index
  - All the power of Lucene easily exposed through simple configuration / plugins.
- Per operation consistency
  - Single document level operations are atomic, consistent, isolated and durable.
- Open Source under the Apache License, version 2 ("ALv2")
(From: https://github.com/elastic/elasticsearch)
Add the following services to your docker-compose.yml to integrate an Elasticsearch instance in your BDE pipeline:
elasticsearch:
  image: elasticsearch:2.3
  command: elasticsearch -Des.network.host=0.0.0.0
  ports:
    - "9200:9200"
    - "9300:9300"
elasticsearch-mapping-init:
  environment:
    - file_url=https://raw.githubusercontent.com/big-data-europe/pilot-sc4-flink-kafka-consumer/master/elasticsearch_fcd_mapping.json
    - index_name=thessaloniki
    - mappings_name=floating-cars
  build:
    context: .
  links:
    - elasticsearch
In addition to the Elasticsearch version (for example, 2.3 as used above), set the values of the following variables (see environment above):
- file_url: the link to the JSON file containing the mappings definition. Currently the file must exist online, so the expected value should look like http(s)://example.com/path/to/file.json.
- index_name: the name to give your index.
- mappings_name: the name to give your mappings.
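Before starting the containers, it can help to sanity-check the three values. The following is a minimal sketch, not part of this image: the function name check_mapping_init_env is hypothetical, and the lowercase rule reflects Elasticsearch's restriction on index names rather than anything the init container is known to enforce.

```python
from urllib.parse import urlparse


def check_mapping_init_env(file_url, index_name, mappings_name):
    """Return a list of problems found in the three values (empty if none).

    Hypothetical helper for illustration; the init container itself may
    perform different checks, or none at all.
    """
    problems = []

    # The mappings file must exist online, so expect an http(s) URL with a host.
    parsed = urlparse(file_url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        problems.append("file_url must be an http(s) URL")

    # Elasticsearch rejects index names that contain uppercase characters.
    if not index_name or index_name != index_name.lower():
        problems.append("index_name must be non-empty and lowercase")

    if not mappings_name:
        problems.append("mappings_name must be non-empty")

    return problems


# Values in the shape of the docker-compose.yml example pass the checks.
print(check_mapping_init_env(
    "https://example.com/path/to/file.json", "thessaloniki", "floating-cars"))
```

An empty list means none of the three checks failed; it does not guarantee that the file is reachable or that its JSON is a valid mapping.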
Run the following command (Docker and Docker Compose are assumed to be installed already):
sudo docker-compose up -d
To verify that your installation is working properly, submit the following HTTP request:
http://localhost:9200
If it returns a JSON object that starts with something like:
{
"name" : "Allison Blaire",
...
... then your Elasticsearch instance is up and running.
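The same check can be scripted. The sketch below uses only the Python standard library; fetch_json and looks_like_elasticsearch_root are hypothetical helper names, and the only field inspected is "name", the one shown in the response above.

```python
import json
import urllib.request


def fetch_json(url, timeout=5):
    """GET a URL and decode the response body as JSON."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def looks_like_elasticsearch_root(body):
    """True if the decoded body is a JSON object with a non-empty "name" field."""
    return isinstance(body, dict) and bool(body.get("name"))


# Example call, to be run only once the container is up:
#   looks_like_elasticsearch_root(fetch_json("http://localhost:9200"))
```

If the instance is not reachable yet, urlopen raises an exception instead of returning, so a caller would typically retry for a few seconds after docker-compose up.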
Next, to check if your mappings have been successfully received and validated (syntactically), submit the following HTTP request:
http://localhost:9200/{index_name}/_mapping/{mappings_name}
... replacing {index_name} and {mappings_name} with the values set previously in the docker-compose.yml file.
If it returns a JSON object that starts with something like:
{"{index_name}":{"mappings":{"{mappings_name}":{"...
... then you are all set.
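As a rough illustration of the check above, the following sketch builds the request URL and inspects an already decoded JSON response. The helper names mapping_url and mapping_is_registered are hypothetical, and the base URL assumes the port mapping from the docker-compose.yml example.

```python
from urllib.parse import quote


def mapping_url(index_name, mappings_name, base="http://localhost:9200"):
    """Build the URL http://localhost:9200/{index_name}/_mapping/{mappings_name}."""
    return "{}/{}/_mapping/{}".format(
        base, quote(index_name, safe=""), quote(mappings_name, safe=""))


def mapping_is_registered(body, index_name, mappings_name):
    """True if the decoded response nests mappings_name under index_name -> "mappings"."""
    if not isinstance(body, dict):
        return False
    mappings = body.get(index_name, {}).get("mappings", {})
    return mappings_name in mappings


# With the values from the docker-compose.yml example:
print(mapping_url("thessaloniki", "floating-cars"))
# prints: http://localhost:9200/thessaloniki/_mapping/floating-cars
```

The second function only confirms that the mapping is present under the expected keys; it does not compare the field definitions against the source JSON file.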
Elasticsearch is built to scale. Each index is broken down into shards, and each shard can have one or more replicas. By default, an index is created with 5 shards and 1 replica per shard (5/1). Many topologies are possible, including 1/10 (improves search performance) or 20/1 (improves indexing performance, with searches executed in a map-reduce fashion across shards).
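To make the shards/replicas notation concrete, here is a small sketch that builds an index settings body and counts the resulting shard copies. number_of_shards and number_of_replicas are the Elasticsearch index setting names; the helper functions are hypothetical, and the sketch assumes the body would be sent when the index is created, which this setup may already do through the mapping-init service.

```python
import json


def index_settings(shards=5, replicas=1):
    """Build an index settings body; the defaults mirror the 5/1 topology."""
    return {"settings": {"number_of_shards": shards,
                         "number_of_replicas": replicas}}


def total_shard_copies(shards, replicas):
    """Each primary shard exists once, plus once per replica."""
    return shards * (1 + replicas)


# Compare the topologies mentioned in the text: 5/1, 1/10 and 20/1.
for shards, replicas in [(5, 1), (1, 10), (20, 1)]:
    print(json.dumps(index_settings(shards, replicas)),
          total_shard_copies(shards, replicas))
```

The counts (10, 11 and 40 shard copies respectively) show the trade-off: 1/10 spreads many copies of a single shard for parallel reads, while 20/1 spreads many primaries for parallel writes.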