Elasticsearch
Settings References:
- A Heap of Trouble
- Cluster in Elasticsearch
- Designing the perfect elasticsearch cluster
- Heap Sizing
- Disabled Swapping
- Zen Discovery
- Important configuration change
- G1GC and Elaticsearch
Monitoring References:
Tool:
- A web frontend for an elastic search cluster (chrome plugin is available)
For large number of response: https://www.elastic.co/guide/en/elasticsearch/reference/6.4/search-request-scroll.html
Update the document without reindexing all: https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-update-by-query.html
Split brain:
A “split-brain” situation is when communication between nodes in the cluster fails due to either a network failure or an internal failure with one of the nodes. In this kind of scenario, more than one node might believe it is the master node, leading to a state of data inconsistency.
Garbage Collector:
-XX:+UseConcMarkSweepGC
if below 4G Heap-XX:+UseG1GC
if over 4G Heap (but there is noted that it may caused some bug for 32bit machine)
Sharding Size:
- From 0 to 4 million documents per index: 1 shard.
- From 4 to 5 million documents per index: 2 shards, so the index can still grow without causing too much problems in the future.
- With more than 5 millions documents, (number of documents / 5 million) + 1 shard.
memory_lock
is not working in k8s
Terminology
MySQL | Elasticsearch |
---|---|
Database | Index |
Table | Type |
Row | Document |
Column | Field |
Schema | Mapping |
Index | Everything is indexed |
SQL | Query DSL |
SELECT * FROM table | GET http://… |
UPDATE table SET | PUT http://… |
Nodes
- Master-eligible node
node.master: true
- responsible to cluster wide actions
- creating or deleting an index
- tracking which nodes are part of the cluster
- deciding which shards to allocate to which nodes
- don’t forget update
discovery.zen.minimum_master_nodes
- Master dedicated configuration:
node.master: true node.data: false node.ingest: false search.remote.connect: false
- responsible to cluster wide actions
- Data node
node.data: true
- contain the documents
- handle data related operations (CRUD, search and aggregation)
- Data dedicated configuration:
node.master: false node.data: true node.ingest: false search.remote.connect: false
- Ingest node
node.ingest: true
- apply an ingest pipeline to a document in order to transform and enrich the document before indexing
- Ingest dedicated configuration:
node.master: false node.data: false node.ingest: true search.remote.connect: false
- Coordinating node
node.master: false
node.data: false
node.ingest: false
- forwards the request to data node
- only route requests, handle the search reduce phase, and distribute bulk indexing
- behave as smart load balancers.
- Coordinating dedicated configuration:
node.master: false node.data: false node.ingest: false search.remote.connect: false
Install
Install Homebrew
brew install elasticsearch
# config location
cat /usr/local/etc/elasticsearch/elasticsearch.yml
brew services start elasticsearch
Install with manual download
# download
curl https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-6.2.2.tar.gz
tar xvzf elasticsearch-6.2.2.tar.gz
# config
cat elasticsearch-6.2.2/config/elasticsearch.yml
# run
elasticsearch-6.2.2/bin/elasticsearch
API
DELETE /_all
GET /[index]/_settings
POST /[index]/[type]/_search
GET /_nodes/stats?human&pretty
GET /_cluster/stats
Plugins
- Elastic LTR (Learning to Rank): train and use ranking models