Optimising Elasticsearch Indexes for Better Query Efficiency

Hello! If you’re looking to get the most out of your Elasticsearch cluster, you’ve come to the right place. In this guide, we’ll explore how to optimise your indexes for improved query efficiency. We’ll walk through essential best practices with examples to illustrate how to put these tips into action.

1. Plan Your Index Structure Carefully

1.1 Use Appropriate Mappings

Mappings in Elasticsearch determine how documents and fields are stored and indexed. Well-defined mappings keep your data stored efficiently, which reduces the work Elasticsearch has to do when you run queries. Getting them right is the first step in designing an efficient Elasticsearch cluster.

Things to consider:

Only store the fields you need for searching and analysis.
Choose the correct field type (e.g., text vs. keyword, numeric vs. date).
Set up an analyser suited to your text if necessary (e.g., the built-in english analyser, or a custom one).

Here’s a simple example of creating an index with a custom mapping:

PUT /my_blog_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "english_text": {
          "type": "standard",
          "stopwords": "_english_"
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "english_text"
      },
      "author": {
        "type": "keyword"
      },
      "published_date": {
        "type": "date"
      },
      "content": {
        "type": "text",
        "analyzer": "english_text"
      }
    }
  }
}
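One way to check that an analyser behaves as intended is Elasticsearch’s _analyze API, which returns the tokens an analyser produces for a piece of text. The Python sketch below only builds that request for the english_text analyser defined above and prints it in the same console style as the other examples. The helper names (analyze_request, to_console) and the sample sentence are illustrative assumptions, and actually sending the request is left to whichever client you use.

```python
import json


def analyze_request(index, analyzer, text):
    """Build an _analyze request as a (method, path, body) tuple."""
    body = {"analyzer": analyzer, "text": text}
    return "GET", f"/{index}/_analyze", body


def to_console(method, path, body):
    """Render a request in the console style used throughout this guide."""
    return f"{method} {path}\n{json.dumps(body, indent=2)}"


if __name__ == "__main__":
    # Hypothetical sample text; swap in a sentence from your own documents.
    request = analyze_request(
        "my_blog_index", "english_text", "The quick brown foxes"
    )
    print(to_console(*request))
```

Comparing the tokens returned for a text field with the single unmodified value kept by a keyword field is a quick way to confirm you picked the right type for each field.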

2. Keep Your Shard Count Under Control

Before version 7.0, Elasticsearch created five primary shards for each new index by default; since 7.0 the default is a single primary shard. Either way, having more shards than necessary can hinder query performance. Each shard requires memory and CPU resources, so balancing your shard count is crucial.

Recommendations:

Start with a smaller number of shards, especially for indices with relatively little data (e.g., 1 or 2 primary shards).
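Consider increasing the number of shards only if your data grows significantly and you need more capacity. The primary shard count is fixed when an index is created, so changing it later means using the split API or reindexing into a new index.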
Use the _cat/indices endpoint to monitor the size and usage of your indices.

If you would like to create an index with custom shard and replica settings, you can use the following request:

PUT /my_blog_index_optimised
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 1
  }
}
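To make the “start small” advice a little more concrete, here is a rough sizing sketch in Python. It assumes a target of about 30 GB per primary shard, which sits inside the commonly quoted 10 to 50 GB range but is a rule of thumb rather than an Elasticsearch setting. The function names are hypothetical, and the right target depends on your hardware and query patterns.

```python
import math


def estimate_primary_shards(expected_size_gb, target_shard_size_gb=30.0):
    """Estimate a primary shard count from the expected index size.

    target_shard_size_gb is an assumed rule of thumb, not an
    Elasticsearch setting; tune it for your own workload.
    """
    if expected_size_gb < 0 or target_shard_size_gb <= 0:
        raise ValueError("sizes must be non-negative and the target positive")
    # Always keep at least one primary shard, even for tiny indices.
    return max(1, math.ceil(expected_size_gb / target_shard_size_gb))


def index_settings(expected_size_gb, replicas=1):
    """Build the settings body for a create-index request."""
    return {
        "settings": {
            "number_of_shards": estimate_primary_shards(expected_size_gb),
            "number_of_replicas": replicas,
        }
    }


if __name__ == "__main__":
    for size_gb in (2, 45, 120):
        shards = estimate_primary_shards(size_gb)
        print(f"{size_gb} GB -> {shards} primary shard(s)")
```

The dictionary returned by index_settings has the same shape as the request body shown above, so it can be passed to a client as the body of the create-index request.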

3. Use the Right Refresh Interval

By default, Elasticsearch refreshes an index (making new data searchable) every second; since version 7.0 this applies only to indices that have received a search request in the last 30 seconds. This is great if you need near real-time searching, but can be inefficient if you’re heavily indexing data and don’t require up-to-the-second results.

What to do:

Identify whether you require real-time searching, and increase the refresh interval during heavy index operations to reduce overhead.
You can switch it back to a lower interval (or the default) once indexing is finished.

You can use the following request to change the refresh interval:

PUT /my_blog_index_optimised/_settings
{
  "index": {
    "refresh_interval": "30s"
  }
}
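The “raise it, index, then switch it back” routine is easy to forget halfway through, so it can help to wrap it in a small helper. The sketch below is one possible shape in Python. It assumes a caller-supplied send(method, path, body) function (a hypothetical thin wrapper around your HTTP client), and it restores the default by setting refresh_interval to null.

```python
from contextlib import contextmanager


@contextmanager
def relaxed_refresh(send, index, interval="30s"):
    """Temporarily raise refresh_interval while a bulk load runs.

    `send` is a hypothetical callable taking (method, path, body),
    for example a thin wrapper around your HTTP client.
    """
    path = f"/{index}/_settings"
    send("PUT", path, {"index": {"refresh_interval": interval}})
    try:
        yield
    finally:
        # A null value resets the setting to the Elasticsearch default,
        # and the finally block runs even if the bulk load fails.
        send("PUT", path, {"index": {"refresh_interval": None}})


if __name__ == "__main__":
    sent = []
    with relaxed_refresh(lambda *req: sent.append(req), "my_blog_index_optimised"):
        pass  # run the bulk indexing job here
    for request in sent:
        print(request)
```

Because the reset happens in a finally block, the index goes back to its default refresh behaviour even when the indexing job raises an exception.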

4. Minimise the Number of Fields

Adding too many fields to your mapping can slow down queries and make indexing inefficient. This is especially true when Elasticsearch automatically detects and maps new “dynamic” fields as documents arrive.

Tips:

Keep your mapping as concise as possible.
Avoid unbounded field growth by controlling or disabling dynamic fields when you can.

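The following example sets dynamic to strict, so documents containing unmapped fields are rejected and the mapping cannot grow unplanned (use false instead to accept such documents while leaving the extra fields unindexed). Add other fields under properties as needed:

PUT /my_minimal_index
{
  "mappings": {
    "dynamic": "strict",
    "properties": {
      "title": { "type": "text" },
      "author": { "type": "keyword" }
    }
  }
}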
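With a strict mapping, an unexpected field makes the whole document fail at index time, so it can be handy to spot such fields before sending anything. The Python sketch below is a client-side approximation of that check, not Elasticsearch’s own logic. It only looks at top-level field names, and the function name and sample document are hypothetical.

```python
# Top-level properties copied from the my_minimal_index mapping above.
PROPERTIES = {
    "title": {"type": "text"},
    "author": {"type": "keyword"},
}


def unmapped_fields(document, properties):
    """Return the top-level field names that the mapping does not declare.

    A rough pre-check only: nested objects and multi-fields are ignored.
    """
    return sorted(name for name in document if name not in properties)


if __name__ == "__main__":
    # Hypothetical document with one field the mapping does not know about.
    doc = {"title": "Hello", "author": "john_doe", "views": 3}
    extra = unmapped_fields(doc, PROPERTIES)
    if extra:
        print("Would be rejected by a strict mapping, unmapped fields:", extra)
```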

5. Tailor Your Queries Wisely

Elasticsearch offers a wide range of query types, but some can be more resource-intensive than others. For instance, wildcard and regex queries can be expensive if they’re not used judiciously.

Suggestions for query performance:

Use exact-match queries (like term or terms) on keyword fields when searching on fields with a finite set of values.
Rely on appropriate analysers for full-text searches on large text fields.
Combine multiple conditions in one request instead of making several separate queries.
Put repeated yes/no conditions in filter context: filter clauses skip relevance scoring, and Elasticsearch can cache frequently used ones.

We can combine a scored match with a filter, as in the following request, to get a more efficient query:

GET /my_blog_index_optimised/_search
{
  "query": {
    "bool": {
      "must": {
        "match": {
          "content": "Elasticsearch optimisation"
        }
      },
      "filter": {
        "term": {
          "author": "john_doe"
        }
      }
    }
  }
}
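If you build requests like this from application code, a small helper keeps the split between scored and unscored clauses consistent. The following Python sketch is one way to do it. The function name and its parameters are hypothetical, and it only covers the simple case of one match clause plus exact-value term filters.

```python
import json


def build_search_body(text, field="content", filters=None):
    """Build a bool query: one scored match clause plus unscored term filters.

    `filters` maps keyword field names to the exact values they must hold.
    """
    bool_query = {"must": [{"match": {field: text}}]}
    if filters:
        # Filter clauses do not affect the relevance score.
        bool_query["filter"] = [
            {"term": {name: value}} for name, value in filters.items()
        ]
    return {"query": {"bool": bool_query}}


if __name__ == "__main__":
    body = build_search_body(
        "Elasticsearch optimisation", filters={"author": "john_doe"}
    )
    print(json.dumps(body, indent=2))
```

Adding another exact-match condition is then a matter of adding a key to filters, which keeps all conditions in a single request instead of several separate queries.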

6. Monitor and Tune Your Cluster

Finally, continual monitoring helps you spot performance bottlenecks and fix them as soon as they arise. Optimising a cluster is not a one-time task, but a continual process, and you should watch closely how it reacts to sudden traffic changes. Use Elasticsearch’s built-in tools and APIs to check:

Heap usage: Keep the JVM heap at no more than about 50% of the node’s total RAM, and below the compressed object pointer threshold, which sits a little under 32 GB.
CPU usage: High CPU usage might indicate inefficient queries or insufficient hardware resources.
Disk I/O: Elasticsearch is disk-intensive, so fast SSDs are recommended.
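The heap guideline above is simple arithmetic, sketched here in Python. The 31 GB cap is an assumption chosen to stay under the compressed object pointer threshold, which varies between systems, and the function names are hypothetical. The -Xms and -Xmx values are the standard JVM flags for minimum and maximum heap size.

```python
def suggested_heap_gb(total_ram_gb, cap_gb=31.0):
    """Half of the node's RAM, capped below the ~32 GB threshold.

    cap_gb is an assumed safe value; check the threshold on your own JVM.
    """
    if total_ram_gb <= 0:
        raise ValueError("total_ram_gb must be positive")
    return min(total_ram_gb / 2, cap_gb)


def jvm_heap_flags(total_ram_gb):
    """Render matching minimum and maximum heap flags in megabytes."""
    heap_mb = int(suggested_heap_gb(total_ram_gb) * 1024)
    return [f"-Xms{heap_mb}m", f"-Xmx{heap_mb}m"]


if __name__ == "__main__":
    for ram_gb in (16, 64, 128):
        print(f"{ram_gb} GB RAM ->", " ".join(jvm_heap_flags(ram_gb)))
```

The remaining RAM is not wasted: the operating system uses it for the filesystem cache, which Elasticsearch relies on heavily for fast searches.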

Here are a few endpoints and tools to help you monitor:

GET /_cat/nodes?v – general overview of cluster nodes
GET /_cat/indices?v – overview of indices
Elasticsearch Metrics in Kibana – graphical dashboards for performance insights
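The cat endpoints can also return JSON, which makes simple automated checks possible. The Python sketch below assumes you have already fetched the body of GET /_cat/nodes?format=json&h=name,heap.percent,cpu and flags nodes that look busy. The thresholds, the function name and the sample data are illustrative assumptions, not Elasticsearch defaults.

```python
import json


def hot_nodes(cat_nodes_json, heap_limit=75, cpu_limit=85):
    """Return names of nodes whose heap or CPU percentage exceeds a limit.

    Expects the JSON body of
    GET /_cat/nodes?format=json&h=name,heap.percent,cpu
    """
    flagged = []
    for node in json.loads(cat_nodes_json):
        # Values may arrive as strings, so convert before comparing.
        heap = int(node["heap.percent"])
        cpu = int(node["cpu"])
        if heap > heap_limit or cpu > cpu_limit:
            flagged.append(node["name"])
    return flagged


if __name__ == "__main__":
    # Hypothetical response body, made up for illustration.
    sample = json.dumps([
        {"name": "node-a", "heap.percent": "42", "cpu": "10"},
        {"name": "node-b", "heap.percent": "91", "cpu": "12"},
    ])
    print("Nodes to look at:", hot_nodes(sample))
```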

I hope these tips and examples help you keep your Elasticsearch queries zippy and your users smiling. Thank you for reading, and may your clusters always respond in the blink of an eye! ⚡️