Optimizing Elasticsearch – Part 2: Index Lifecycle Management

In the previous blog post, “Optimize Elasticsearch for log collection – Part 1: reduce the number of shards”, we saw one way to recover a cluster suffering from the “too many shards syndrome”: merging indices that were too small. In this article, we’ll see how we can rely on the latest Elasticsearch features to keep control of our index and shard sizes. We will also use the available features to manage our indices throughout their lifecycle, from creation to deletion.

Index Lifecycle Management

Index Lifecycle Management (ILM) is a new feature introduced in Elasticsearch 6.7.0. It is part of X-Pack and free to use as part of the community edition[1].

This feature mainly aims at managing indices for time series data. It allows us to define the stages an index goes through, and the actions applied to it, from ingestion to deletion. It automatically handles the rollover of indices based on defined conditions (age and/or size).

This feature replaces Curator, the component that previously took care of these tasks.

Quick overview

Index Lifecycle Management defines four stages in which an index can reside:

  • Hot: indices in the hot stage are sensitive indices used for data ingestion and are usually bound to a strong SLA. Both ingest and search performance matter for hot indices. To meet these performance requirements, hot indices are often stored on SSD storage.
  • Warm: indices in warm stage are usually not used for ingestion anymore but they’re still likely to be queried, albeit less often than hot indices. Search performance is still important whereas ingestion is usually performed on hot indices.
  • Cold: indices in the cold stage are unlikely to be queried, but it’s too early to consider deleting the data; for example, in the case of security monitoring, we might still need old data to support a security incident investigation. Cold indices need to remain on disk, but their presence must not degrade overall cluster performance. Cold indices are usually frozen; we’ll see why and how later.
  • Delete: indices in the delete stage are about to be deleted. As simple as that.

An index starts its life in the hot stage, then moves to the warm, cold and finally the delete stage; this is the index lifecycle. Next, we need to define the conditions under which an index moves from one stage to the next. The set of conditions defining the lifecycle of an index is called an index policy.

To make the roll-over work efficiently, we’ll need to define:

  • An index policy stating which conditions are required to go from one stage to another;
  • A template mapping defining the initial settings of each index newly created by the roll-over process;
  • An ingestion alias pointing to the hottest index, the one most recently created for ingestion.

Because two indices cannot have the same name, each roll-over creates an index with a new name, so we need a stable way to send real-time data to whichever index is the newest. We’ll use aliases for that purpose.

Index policy

Before going into the index policy definition, we need to define our SLA, use cases and requirements. Every index in every environment can have its own requirements in terms of performance and retention time. Here, we will make very simple assumptions and rules that aim to keep cluster health and performance at their best, for example by avoiding the “too many shards” syndrome.

We consider this default policy applicable to a single-node or three-node cluster. A bigger cluster calls for a more complete policy that specifies which nodes host the shards of each stage. Think of a cluster where SSD servers host the hot shards and spinning disks host the others: the SSD servers are then used mainly for ingestion and fast searches, which makes sense in larger environments. For our environment, we’ll assume a small cluster of 3 nodes with homogeneous hardware. In this case, there is no need to specify a shard allocation strategy [2]. All we need in our scenario is a simple and efficient index policy:

  • We want to keep a hot index for log ingestion until it reaches 90GB or until it is older than 14 days.[3] Hot indices will have 3 shards, so every shard will have a maximum size of about 30GB, which is a good trade-off between performance and overhead.[4]
  • An index older than 14 days will then enter the warm phase. As warm indices won’t be used to ingest logs, we can safely merge their segments together to speed up searches a bit. This also marks the warm indices as read-only [5]. We also have the possibility here to reduce the number of shards [6]; the result is the same as re-indexing with fewer shards. Because the number of shards in the destination index must be a factor of the number of shards in the source index (with 3 shards, the only possible target is a single shard), we decided not to shrink our indices. However, it can be a very interesting operation in big environments whose requirements call for shrinking. Yet another interesting feature is the ability to increase or decrease the number of replicas; we’ll make sure our warm indices have no replicas in order to free up some disk space.
  • An index older than 30 days will be considered a cold index. In our strategy, we freeze all indices older than 30 days, which frees up memory for the shards of hot and warm indices [7]. Searching frozen indices has a big impact on Elasticsearch performance; as this rarely happens in most use cases, we consider it acceptable.
  • An index older than 210 days will be marked for deletion.
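The thresholds above can be summarized in a few lines of Python. This is only a sketch of the arithmetic, not a description of how Elasticsearch evaluates a policy internally; the helper names `max_shard_size_gb` and `phase_for_age` are made up for illustration, and "age" stands for whatever age ILM computes for the index.

```python
# Illustrative only: thresholds copied from the strategy described above.
HOT_MAX_SIZE_GB = 90
HOT_PRIMARY_SHARDS = 3

# (phase, minimum age in days), ordered from the youngest to the oldest phase.
PHASES = [("hot", 0), ("warm", 14), ("cold", 30), ("delete", 210)]


def max_shard_size_gb(index_size_gb: float, primary_shards: int) -> float:
    """Approximate size of one primary shard, assuming evenly spread data."""
    return index_size_gb / primary_shards


def phase_for_age(age_days: float) -> str:
    """Return the last phase whose minimum age has been reached."""
    current = PHASES[0][0]
    for name, min_age_days in PHASES:
        if age_days >= min_age_days:
            current = name
    return current


print(max_shard_size_gb(HOT_MAX_SIZE_GB, HOT_PRIMARY_SHARDS))  # 30.0
for age in (3, 14, 45, 210):
    print(age, phase_for_age(age))
```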

Every lifecycle policy is registered under a specific name; we will use this to assign different policies to indices that ingest at different speeds, or that don’t share the same requirements and SLA.

In Elasticsearch, this policy is registered with the following request (here under the name “brofilter”):

PUT _ilm/policy/brofilter
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_age": "14d",
            "max_size": "90G"
          },
          "set_priority": {
            "priority": 100
          }
        }
      },
      "warm": {
        "min_age": "14d",
        "actions": {
          "forcemerge": {
            "max_num_segments": 1
          },
          "allocate": {
            "number_of_replicas": 0
          },
          "set_priority": {
            "priority": 50
          }
        }
      },
      "cold": {
        "min_age": "30d",
        "actions": {
          "freeze": {}
        }
      },
      "delete": {
        "min_age": "210d",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}
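Since several named policies may be needed for indices with different ingestion rates, the request body can also be generated rather than written by hand. The following Python sketch only builds the JSON body shown above; `build_policy` is a hypothetical helper, and the second policy (`slow_logs`) and its values are invented purely to illustrate the idea.

```python
import json


def build_policy(rollover_age, rollover_size, warm_after, cold_after, delete_after):
    """Return an ILM policy body shaped like the request above."""
    return {
        "policy": {
            "phases": {
                "hot": {
                    "actions": {
                        "rollover": {"max_age": rollover_age, "max_size": rollover_size},
                        "set_priority": {"priority": 100},
                    }
                },
                "warm": {
                    "min_age": warm_after,
                    "actions": {
                        "forcemerge": {"max_num_segments": 1},
                        "allocate": {"number_of_replicas": 0},
                        "set_priority": {"priority": 50},
                    },
                },
                "cold": {"min_age": cold_after, "actions": {"freeze": {}}},
                "delete": {"min_age": delete_after, "actions": {"delete": {}}},
            }
        }
    }


# Same values as the "brofilter" policy above.
brofilter = build_policy("14d", "90G", "14d", "30d", "210d")
# Hypothetical second policy for a slower index with a shorter retention.
slow_logs = build_policy("30d", "30G", "30d", "60d", "90d")

print(json.dumps(brofilter, indent=2))
```

The printed body can then be sent to `PUT _ilm/policy/<name>` with the HTTP client of your choice.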

With such a strategy, you will note that:

  • An index that reaches 90GB but is younger than 14 days will remain in the hot phase. It is simply rolled over to a new index to keep an acceptable shard size.
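  • An index is deleted once its age reaches 210 days. If that age is counted from the creation of the index, data ingested just before a roll-over at 14 days can be deleted when it is only 196 days old (210 days – 14 days). Note, however, that once an index has been rolled over, ILM counts min_age from the roll-over time rather than from the creation time.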
  • The priority determines the order in which indices are recovered after a node restart. The higher the priority, the sooner the shards of the index are recovered.
  • The initial number of shards and replicas for the hot phase is set in the template mappings.

Template mapping

Template mapping is used to give a default configuration to newly created indices matching a specific pattern. We will also use that template mapping feature to give the initial settings to hot indices. We will specify in the template mapping:

  • The initial number of shards;
  • The initial number of replicas;
  • The applied lifecycle policy;
  • The alias used for ingestion.

You need to take into account that the number of shards and replicas can be changed during the lifecycle of the index; these settings actually target indices in their hot stage.

The template is then very simple, and can be set as follows:

PUT _template/brofilter
{
  "index_patterns": [
    "logstash-eagleeye-brofilter-*"
  ],
  "settings": {
    "index": {
      "number_of_shards": 3,
      "number_of_replicas": 1,
      "lifecycle.name": "brofilter",
      "lifecycle.rollover_alias": "logstash-eagleeye-brofilter"
    }
  }
}

This states that all newly created indices matching logstash-eagleeye-brofilter-* will:

  • Have 3 shards, as we want in our hot stage;
  • Have 1 replica, which is particularly useful in a 3-node cluster;
  • Follow the brofilter policy previously defined;
  • Roll over through the ingestion alias named logstash-eagleeye-brofilter.

We didn’t provide any field mapping, but this would definitely be the place and the time to do that in your own environment!
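One way to keep the index pattern and the rollover alias in sync is to derive both from a single value. The sketch below is in Python and only builds the template body shown above; `build_template` is a hypothetical helper, not part of any Elasticsearch client.

```python
def build_template(rollover_alias, policy_name, shards=3, replicas=1):
    """Return a template body whose pattern and alias share one prefix."""
    return {
        # Rolled-over indices are named "<alias>-000001", "<alias>-000002", ...
        "index_patterns": [f"{rollover_alias}-*"],
        "settings": {
            "index": {
                "number_of_shards": shards,
                "number_of_replicas": replicas,
                "lifecycle.name": policy_name,
                "lifecycle.rollover_alias": rollover_alias,
            }
        },
    }


template = build_template("logstash-eagleeye-brofilter", "brofilter")
print(template["index_patterns"])
print(template["settings"]["index"]["lifecycle.rollover_alias"])
```

Deriving both fields from the same string makes it harder to end up with a template whose alias does not match its indices.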

Creating the first index

As the settings are now committed into Elasticsearch, the very first index can now be created manually together with its ingestion alias, in order to launch the machinery.

PUT logstash-eagleeye-brofilter-000001
{
  "aliases": {
    "logstash-eagleeye-brofilter": {
      "is_write_index": true
    }
  }
}

This will trigger the creation of the very first index: logstash-eagleeye-brofilter-000001. This new index is linked with the ingestion alias logstash-eagleeye-brofilter. This alias name must be used in the Logstash configuration in order to make sure we always ingest in the correct hot index. Not using the ingestion alias to ingest logs would undo the added value of using a rollover strategy.

From now on, the first index is created and it will follow the index policy previously defined. The previously defined mapping will automatically be applied on every new index created by the policy. The ingestion alias will be defined on the newly rolled-over index automatically. We don’t have to do this manually; we just need to keep ingesting in the ingestion alias, and all the rest is managed by Elasticsearch.

By default, every new index will be created with a new appended number. We have decided that the index prefix will be the same as the ingestion alias. All new indices will have an incremented suffix going from logstash-eagleeye-brofilter-000001 to logstash-eagleeye-brofilter-000002 and so on… Note that it is possible to modify this behavior and use a timestamp instead of a number [8]; for our example, we decided not to use this.
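The default naming scheme can be mimicked in a few lines of Python. This is a sketch of the behaviour as we understand it (the index name must end with a dash followed by a number, and the new counter is zero-padded to six digits); `next_rollover_name` is a hypothetical helper, not an Elasticsearch API.

```python
import re


def next_rollover_name(index_name: str) -> str:
    """Return the name the next rolled-over index is expected to get."""
    match = re.fullmatch(r"(.*-)(\d+)", index_name)
    if match is None:
        raise ValueError(f"{index_name!r} does not end with '-<number>'")
    prefix, counter = match.groups()
    # Increment the counter and pad it to six digits.
    return f"{prefix}{int(counter) + 1:06d}"


print(next_rollover_name("logstash-eagleeye-brofilter-000001"))
```

This is also why the very first index has to be created with a numeric suffix such as -000001.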

Because every new ingestion alias needs to be explicitly created before ingestion, and because index roll-over is automatically managed by Elasticsearch, we can safely disable automatic index creation.
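As a sketch, automatic index creation is controlled by the `action.auto_create_index` cluster setting. The Python snippet below only builds the HTTP request without sending it; the URL `http://localhost:9200` is an assumption (a local test node without authentication), and `auto_create_request` is a hypothetical helper.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: local, unsecured test node


def auto_create_request(value: str) -> urllib.request.Request:
    """Build (but do not send) a cluster settings update request."""
    body = json.dumps({"persistent": {"action.auto_create_index": value}})
    return urllib.request.Request(
        url=f"{ES_URL}/_cluster/settings",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


request = auto_create_request("false")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
# To actually apply the setting: urllib.request.urlopen(request)
```

Other components of the stack may rely on automatic index creation for their own indices, so it may be safer to test this on a non-production cluster first, or to pass a pattern list instead of a plain "false".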

Troubleshooting issues with Index Lifecycle Management

Each piece of software comes with a selection of specific caveats and potential headaches. Elasticsearch index lifecycle management is not an exception to the rule and sometimes, we can still run into issues. To troubleshoot these issues, the “explain” command can be used in order to get insights on the lifecycle state of an index:

GET logstash-eagleeye*/_ilm/explain

That command returns the lifecycle status of every index matching logstash-eagleeye*.
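With many indices, the explain output gets long. The Python sketch below filters a response down to the managed indices that are stuck in the ERROR step. The sample response is hand-written for illustration (it is not real cluster output, and the reason text is a placeholder), and `failed_indices` is a hypothetical helper.

```python
def failed_indices(explain_response: dict) -> dict:
    """Map each managed index stuck in the ERROR step to its failed step."""
    failures = {}
    for name, info in explain_response.get("indices", {}).items():
        if info.get("managed") and info.get("step") == "ERROR":
            failures[name] = info.get("failed_step", "unknown")
    return failures


# Hand-written sample shaped like an explain response; values are illustrative.
sample_response = {
    "indices": {
        "logstash-eagleeye-brofilter-000001": {
            "index": "logstash-eagleeye-brofilter-000001",
            "managed": True,
            "policy": "brofilter",
            "phase": "hot",
            "action": "rollover",
            "step": "ERROR",
            "failed_step": "check-rollover-ready",
            "step_info": {"reason": "placeholder reason text"},
        },
        "logstash-eagleeye-other-000001": {
            "index": "logstash-eagleeye-other-000001",
            "managed": False,
        },
    }
}

print(failed_indices(sample_response))
```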

In our own environment, we faced an issue where the lifecycle.rollover_alias defined in the template mapping was not the same as the alias defined when creating the first index. The index lifecycle policy was then unable to roll over and bind the ingestion alias to the new index. In order to fix this, we had to do the following:

  • Modify the settings of the index in error;
  • Modify the template mapping to avoid the issue in the future;
  • Trigger the index policy again.
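This kind of mismatch can be caught before anything is sent to the cluster. The Python sketch below compares the alias expected by the template with the write alias declared for the first index; `alias_mismatch` is a hypothetical helper and the request bodies are trimmed-down copies of the ones used in this post, one of them with a deliberate typo.

```python
def alias_mismatch(template_body: dict, first_index_body: dict):
    """Return a message if the write alias differs from the rollover alias."""
    expected = template_body["settings"]["index"]["lifecycle.rollover_alias"]
    write_aliases = [
        name
        for name, options in first_index_body.get("aliases", {}).items()
        if options.get("is_write_index")
    ]
    if expected in write_aliases:
        return None
    return f"template expects {expected!r}, first index declares {write_aliases!r}"


template_body = {
    "settings": {"index": {"lifecycle.rollover_alias": "logstash-eagleeye-brofilter"}}
}
good_index = {"aliases": {"logstash-eagleeye-brofilter": {"is_write_index": True}}}
# Deliberate typo ("brofiler") to show what the check reports.
bad_index = {"aliases": {"logstash-eagleeye-brofiler": {"is_write_index": True}}}

print(alias_mismatch(template_body, good_index))
print(alias_mismatch(template_body, bad_index))
```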

Once the issue is solved, you have to tell the index policy management to process the index again by running the following command:

POST logstash-eagleeye-brofilter-000001/_ilm/retry

We hope this blog post helped shed some light on how indices can be managed within Elasticsearch, and on the principles that govern index lifecycle management.

In the next blog post, we will explain how you can deploy an Index Lifecycle Policy through Ansible to manage all of this automatically. Stay tuned!


[1] https://www.elastic.co/guide/en/elasticsearch/reference/6.7/index-lifecycle-management.html

[2] https://www.elastic.co/guide/en/elasticsearch/reference/current/index-modules-allocation.html

[3] Specifying an age is very important if we want to remove an index after a certain time. If we don’t specify a maximum age, we risk deleting recent data if the same hot index has been used to ingest data for a long time.

[4] Mainly to avoid the “too many shards syndrome” as much as possible.

[5] “Force merge should only be called against read-only indices. Running force merge against a read-write index can cause very large segments to be produced (>5Gb per segment), and the merge policy will never consider it for merging again until it mostly consists of deleted docs. This can cause very large segments to remain in the shards.” https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-forcemerge.html

[6] In the warm stage, you can specify the shrink action to reduce the number of shards of your index. https://www.elastic.co/guide/en/elasticsearch/reference/current/_actions.html#ilm-shrink-action

[7] https://www.elastic.co/guide/en/elasticsearch/reference/master/frozen-indices.html

[8] https://www.elastic.co/guide/en/elasticsearch/reference/current/date-math-index-names.html
