2  Searching Data

2.1 Task: Write and execute a search query for terms and/or phrases in one or more fields of an index

The following section has only one full example but shows several variations of term and phrase queries. Also, bear in mind that when the exam objectives say term, they may not mean the Elasticsearch term query specifically, but rather the generic search meaning of the word. There are many ways to execute a search in Elasticsearch; don't get bogged down, and focus on term and phrase searches for this section.
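A quick point of reference on phrase searches: a match query matches documents containing any of the analyzed terms, while a match_phrase query requires the terms to appear adjacent and in the same order. Using the products index built in the example below, a minimal sketch of a phrase query (the field and phrase are just illustrations):

    GET /products/_search
    {
      "query": {
        "match_phrase": {
          "description": "chocolate snack bar"
        }
      }
    }

With the sample documents below, only the Choco-Lite Bar description contains that exact phrase.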

Example 2: Boosting Document Score When an Additional Field Matches

Requirements

  • Perform a search for beverage OR bar
  • Boost the score of documents if the value snack exists in the tags field.

Steps

  1. Index Sample Documents Using _bulk Endpoint:
    • Index documents with fields such as name, description, and tags.
    POST /products/_bulk
    { "index": { "_id": "1" } }
    { "name": "Yoo-hoo Beverage", "description": "A delicious, chocolate-flavored drink.", "tags": ["beverage", "chocolate"] }
    { "index": { "_id": "2" } }
    { "name": "Apple iPhone 12", "description": "The latest iPhone model with advanced features.", "tags": ["electronics", "smartphone"] }
    { "index": { "_id": "3" } }
    { "name": "Choco-Lite Bar", "description": "A light and crispy chocolate snack bar.", "tags": ["snack", "chocolate"] }
    { "index": { "_id": "4" } }
    { "name": "Samsung Galaxy S21", "description": "A powerful smartphone with an impressive camera.", "tags": ["electronics", "smartphone"] }
    { "index": { "_id": "5" } }
    { "name": "Nike Air Max 270", "description": "Comfortable and stylish sneakers.", "tags": ["footwear", "sportswear"] }
  2. Perform the query_string Query with Boosting:
    • Use a query_string query to create an OR condition within the query.
    • Use a function_score query to boost the score of documents where the tags field contains a specific value (in this case, "snack").
    GET /products/_search
    {
      "query": {
        "function_score": {
          "query": {
            "query_string": {
              "query": "beverage OR bar"
            }
          },
          "functions": [
            {
              "filter": {
                "term": { "tags": "snack" }
              },
              "weight": 2
            }
          ],
          "boost_mode": "multiply"
        }
      }
    }
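As an aside, a similar effect can be sketched with a plain bool query: the should clause is optional for matching but adds to the score of documents that satisfy it. Scores will differ from the function_score version (the boost is additive rather than multiplicative), but documents tagged snack should still rank higher:

    GET /products/_search
    {
      "query": {
        "bool": {
          "must": [
            {
              "query_string": {
                "query": "beverage OR bar"
              }
            }
          ],
          "should": [
            {
              "term": {
                "tags": {
                  "value": "snack",
                  "boost": 2
                }
              }
            }
          ]
        }
      }
    }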

Test

  • Run the above search query.
  • Run the following query (which omits the filter function) to compare scores without boosting:
GET /products/_search
{
  "query": {
    "query_string": {
      "query": "beverage OR bar"
    }
  }
}
  • Check the boosted output to ensure that documents containing "snack" in the tags field have a higher score, and that documents are matched based on the OR condition in the query_string.

Considerations

  • The query_string query allows you to use a query syntax that includes operators such as OR, AND, and NOT to combine different search criteria.
  • The function_score query is used to boost the score of documents based on specific conditions—in this case, whether the tags field contains the value "snack".
  • The weight parameter in the function_score query determines the amount by which the score is boosted, and the boost_mode of "multiply" multiplies the original score by the boost value.

Clean-up (optional)

  • Delete the example index

    DELETE products

Documentation

2.2 Task: Write and execute a search query that is a Boolean combination of multiple queries and filters

Example 1: Creating a Boolean search for documents in a book index

Requirements

  • Search for documents with a term in the “title”, “description”, and “category” field

Steps

  1. Open the Kibana Console or use a REST client.

  2. Index some documents, which will create the index at the same time. The _bulk endpoint requires newline-delimited JSON (NDJSON), so each action line and its document must each sit on a single line.

    POST /books/_bulk
    { "index": { "_id": "1" } }
    { "title": "To Kill a Mockingbird", "description": "A novel about the serious issues of rape and racial inequality.", "category": "Fiction" }
    { "index": { "_id": "2" } }
    { "title": "1984", "description": "A novel that delves into the dangers of totalitarianism.", "category": "Dystopian" }
    { "index": { "_id": "3" } }
    { "title": "The Great Gatsby", "description": "A critique of the American Dream.", "category": "Fiction" }
    { "index": { "_id": "4" } }
    { "title": "Moby Dick", "description": "The quest of Ahab to exact revenge on the whale Moby Dick.", "category": "Adventure" }
    { "index": { "_id": "5" } }
    { "title": "Pride and Prejudice", "description": "A romantic novel that also critiques the British landed gentry at the end of the 18th century.", "category": "Romance" }
  3. Create a boolean search query. Start with an empty bool query, which matches all documents; the order in which the various clauses are added doesn't affect the final result.

    GET books/_search
    {
      "query": {
        "bool": {}
      }
    }
  4. Add a must query for the description field. This will return 4 documents.

    GET books/_search
    {
      "query": {
        "bool": {
          "must": [
            {
              "terms": {
                "description": [
                  "novel",
                  "dream",
                  "critique"
                ]
              }
            }
          ]
        }
      }
    }
  5. Add a filter query for the category field. This will return 2 documents.

    GET books/_search
    {
      "query": {
        "bool": {
          "must": [
            {
              "terms": {
                "description": [
                  "novel",
                  "dream",
                  "critique"
                ]
              }
            }
          ],
          "filter": [
            {
              "term": {
                "category": "fiction"
              }
            }
          ]
        }
      }
    }
  6. Add a must_not filter for the title field. This will return 1 document.

    GET books/_search
    {
      "query": {
        "bool": {
          "must": [
            {
              "terms": {
                "description": [
                  "novel",
                  "dream",
                  "critique"
                ]
              }
            }
          ],
          "filter": [
            {
              "term": {
                "category": "fiction"
              }
            }
          ],
          "must_not": [
            {
              "term": {
                "title": {
                  "value": "gatsby"
                }
              }
            }
          ]
        }
      }
    }
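When debugging a bool query like this, it can help to see which clauses matched each returned document. Most query clauses accept an optional _name parameter, and the names of matching clauses are reported per hit in a matched_queries array. A sketch (the clause names are arbitrary):

    GET books/_search
    {
      "query": {
        "bool": {
          "must": [
            {
              "terms": {
                "description": [ "novel", "dream", "critique" ],
                "_name": "description_terms"
              }
            }
          ],
          "filter": [
            {
              "term": {
                "category": {
                  "value": "fiction",
                  "_name": "fiction_filter"
                }
              }
            }
          ]
        }
      }
    }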

Considerations

  • The bool query allows for combining multiple queries and filters with Boolean logic.
  • The must, must_not, and filter clauses ensure that all searches and filters must match for a document to be returned.

Test

  1. Verify that the final search query returns documents with the terms “novel”, “dream”, and “critique” in the description field. Why does the result contain no document with the term “critique”?

Clean-up (optional)

  • Delete the index

    DELETE books

Documentation

Example 2: Creating a Boolean search for finding products within a specific price range and excluding discontinued items

Requirements

  • Find all documents where the name field exists (name:*) and the price field falls within a specified range.
  • Additionally, filter out any documents where the discontinued field is set to true.

Steps

  1. Open the Kibana Console or use a REST client.

  2. Index some documents, which will create the index at the same time. The _bulk endpoint requires newline-delimited JSON (NDJSON), so each action line and its document must each sit on a single line.

    POST /products/_bulk
    {"index":{"_id":1}}
    {"name":"Coffee Maker","price":49.99,"discontinued":false}
    {"index":{"_id":2}}
    {"name":"Gaming Laptop","price":1299.99,"discontinued":false}
    {"index":{"_id":3}}
    {"name":"Wireless Headphones","price":79.99,"discontinued":true}
    {"index":{"_id":4}}
    {"name":"Smartwatch","price":249.99,"discontinued":false}
  3. Construct the first search query (the name field exists and the price field falls within a specified range)

    GET products/_search
    {
      "query": {
        "bool": {
          "must": [
            {
              "exists": {
                "field": "name"
              }
            },
            {
              "range": {
                "price": {
                  "gte": 70,
                  "lte": 500
                }
              }
            }
          ]
        }
      }
    }
  4. Construct the second search query (same as above, but exclude documents where discontinued is set to true)

    GET products/_search
    {
      "query": {
        "bool": {
          "must": [
            {
              "exists": {
                "field": "name"
              }
            },
            {
              "range": {
                "price": {
                  "gte": 70,
                  "lte": 500
                }
              }
            }
          ],
          "must_not": [
            {
              "term": {
                "discontinued": {
                  "value": "true"
                }
              }
            }
          ]
        }
      }
    }
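Since none of these clauses need to influence relevance scoring, the same search can be sketched entirely in filter context. Clauses in filter (and must_not) skip score calculation and are eligible for caching, so this form is often preferred when only matching matters:

    GET products/_search
    {
      "query": {
        "bool": {
          "filter": [
            {
              "exists": {
                "field": "name"
              }
            },
            {
              "range": {
                "price": {
                  "gte": 70,
                  "lte": 500
                }
              }
            }
          ],
          "must_not": [
            {
              "term": {
                "discontinued": true
              }
            }
          ]
        }
      }
    }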

Explanation

  • Similar to the previous example, the bool query combines multiple conditions.
  • The must clause specifies documents that must match all conditions within it.
  • The range query ensures the price field is between $70 (inclusive) and $500 (inclusive).
  • The must_not clause excludes documents that match the specified criteria.
  • The term query filters out documents where discontinued is set to true.

Test

  1. Run the search query and verify the results only include documents for products with:
    • A price between $70 and $500 (inclusive).
    • discontinued set to false (not discontinued).

This should return a single document with an ID of 4 (Smartwatch) based on the sample data.

Considerations

  • The chosen price range (gte: 70, lte: 500) can be adjusted based on your specific needs.
  • You can replace the exists query on name with a match query if you need more specific matching criteria.

Clean-up (optional)

  • Delete the index

    DELETE products

Documentation

Example 3: Creating a Boolean search for e-commerce products

Requirements

  • Search for products that belong to the “Electronics” category.
  • The product name should contain the term “phone”.
  • Exclude products with a price greater than 500.

Steps

  1. Open the Kibana Console or use a REST client.

  2. Create an index.

    PUT products
    {
      "mappings": {
        "properties": {
          "name" : {
            "type": "text"
          },
          "category" : {
            "type": "text"
          },
          "price" : {
            "type": "float"
          }
        }
      }
    }
  3. Index some documents using the _bulk endpoint. The _bulk API requires newline-delimited JSON (NDJSON), so each action line and its document must each sit on a single line.

    POST /products/_bulk
    {"index": { "_id": 1 } }
    { "name": "Smartphone X", "category": "Electronics", "price": 399.99 }
    {"index": { "_id": 2 } }
    { "name": "Laptop Y", "category": "Electronics", "price": 799.99 }
    {"index": { "_id": 3 } }
    { "name": "Headphones Z", "category": "Electronics", "price": 99.99 }
    {"index": { "_id": 4 } }
    { "name": "Gaming Console", "category": "Electronics", "price": 299.99 }
  4. Create a term query that only matches the category “electronics”. This returns all 4 documents.

    GET products/_search
    {
      "query": {
        "term": {
          "category": {
            "value": "electronics"
          }
        }
      }
    }
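Note the lowercase value. The term query itself is not analyzed, but category is a text field, so its tokens were lowercased at index time; a term query using the original capitalization should therefore return no hits:

    GET products/_search
    {
      "query": {
        "term": {
          "category": {
            "value": "Electronics"
          }
        }
      }
    }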
  5. Create another query using wildcard to return docs that include “phone”. This returns only 2 documents.

    GET products/_search
    {
      "query": {
        "wildcard": {
          "name": {
            "value": "*phone*"
          }
        }
      }
    }
  6. Create another query using range that returns docs with any price less than $500. This returns 3 documents.

    GET products/_search
    {
      "query": {
        "range": {
          "price": {
            "lt": 500
          }
        }
      }
    }
  7. Combine the above into one bool query with a single must that contains the three queries. This will return the 2 matching documents.

    GET products/_search
    {
      "query": {
        "bool": {
          "must": [
            {
              "term": {
                "category": {
                  "value": "electronics"
                }
              }
            },
            {
              "wildcard": {
                "name": {
                  "value": "*phone*"
                }
              }
            },
            {
              "range": {
                "price": {
                  "lt": 500
                }
              }
            }
          ]
        }
      }
    }
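For contrast, a match query for phone would return nothing here: the standard analyzer indexes whole words (smartphone, headphones), and match compares whole tokens. The wildcard query matches substrings instead, at a higher query-time cost. A sketch that should return no hits:

    GET products/_search
    {
      "query": {
        "match": {
          "name": "phone"
        }
      }
    }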

Test

  1. The search results should include the following documents:
    • Smartphone X
    • Headphones Z

Considerations

  • The term query is used for exact token matches on the category field; because category is a text field here, its tokens are lowercased at index time, which is why the lowercase value matches.
  • The wildcard query is used for matches on the name field.
  • The range query is used to filter out documents based on price.
  • The bool.must query combines these conditions using the specified occurrence types.

Clean-up (optional)

  • Delete the index

    DELETE products

Documentation

Example 4: Creating a Boolean search with must, must_not, and filter for e-commerce products

Requirements

  • Create an index named “products”.
  • Create at least 4 documents with varying categories, prices, ratings, and brands.
  • Create a boolean query
    • Use the must:
      • return just electronics
      • products more than $500
    • Use must_not:
      • rating less than 4
    • Use filter:
      • only Apple products

Steps

  1. Open the Kibana Console or use a REST client.

  2. Create the “products” index

    PUT products
    {
      "mappings": {
        "properties": {
          "brand": {
            "type": "text"
          },
          "category": {
            "type": "keyword"
          },
          "name": {
            "type": "text"
          },
          "price": {
            "type": "long"
          },
          "rating": {
            "type": "float"
          }
        }
      }
    }
  3. Add some sample documents using the _bulk endpoint.

    POST /products/_bulk
    {"index":{"_id":1}}
    {"name":"Laptop","category":"Electronics","price":1200,"rating":4.5,"brand":"Apple"}
    {"index":{"_id":2}}
    {"name":"Smartphone","category":"Electronics","price":800,"rating":4.2,"brand":"Samsung"}
    {"index":{"_id":3}}
    {"name":"Sofa","category":"Furniture","price":1000,"rating":3.8,"brand":"IKEA"}
    {"index":{"_id":4}}
    {"name":"Headphones","category":"Electronics","price":150,"rating":2.5,"brand":"Sony"}
    {"index":{"_id":5}}
    {"name":"Dining Table","category":"Furniture","price":600,"rating":4.1,"brand":"Ashley"}
  4. Create a term query that only matches the category “Electronics”. Because category is mapped as keyword, the value is not analyzed and matching is case-sensitive, so the capitalization must match the indexed value exactly. This returns 3 documents.

    GET products/_search
    {
      "query": {
        "term": {
          "category": {
            "value": "Electronics"
          }
        }
      }
    }
  5. Create a range query to return products whose price is greater than $500. This should return 4 documents (why?).

    GET products/_search
    {
      "query": {
        "range": {
          "price": {
            "gte": 500
          }
        }
      }
    }
  6. Create another range query to return products with a rating less than 4. This will return 2 documents.

    GET products/_search
    {
      "query": {
        "range": {
          "rating": {
            "lt": 4
          }
        }
      }
    }
  7. Create another term query to return only Apple branded products. Because brand is a text field, the lowercase value matches the analyzed token. This will return 1 document.

    GET products/_search
    {
      "query": {
        "term": {
          "brand": {
            "value": "apple"
          }
        }
      }
    }
  8. Assemble the bool query by placing each query in its appropriate must, must_not, or filter clause.

    GET products/_search
    {
      "query": {
        "bool": {
          "must": [
            {
              "term": {
                "category": {
                  "value": "electronics"
                }
              }
            },
            {
              "range": {
                "price": {
                  "gte": 500
                }
              }
            }
          ],
          "must_not": [
            {
              "range": {
                "rating": {
                  "lt": 4
                }
              }
            }
          ],
          "filter": [
            {
              "term": {
                "brand": {
                  "value": "apple"
                }
              }
            }
          ]
        }
      }
    }
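As with the earlier examples, the range clause here does not need to contribute to scoring, so it could equally be sketched in the filter array alongside the brand term; the matched set stays the same while the scores change:

    GET products/_search
    {
      "query": {
        "bool": {
          "must": [
            {
              "term": {
                "category": {
                  "value": "Electronics"
                }
              }
            }
          ],
          "must_not": [
            {
              "range": {
                "rating": {
                  "lt": 4
                }
              }
            }
          ],
          "filter": [
            {
              "term": {
                "brand": {
                  "value": "apple"
                }
              }
            },
            {
              "range": {
                "price": {
                  "gte": 500
                }
              }
            }
          ]
        }
      }
    }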

Test

  • Check the response from the search query to ensure that it returns the expected document (only the Laptop matches all of the following):
    • products in the “Electronics” category
    • a price greater than $500
    • excluding products with a rating less than 4
    • from the brand “Apple”

Considerations

  • The filter clause is used to include only documents with the brand “Apple”.

Clean-up (optional)

  • Delete the index

    DELETE products

Documentation

2.4 Task: Write and execute metric and bucket aggregations

Example 1: Creating Metric and Bucket Aggregations for Product Prices

Requirements

  • Create an index called product_prices.
  • Index at least four documents using the _bulk endpoint.
  • Execute metric and bucket aggregations in a single request:
    • bucket the category field
    • calculate the average price per bucket
    • find the maximum price per bucket
    • find the minimum price per bucket

Steps

  1. Open the Kibana Console or use a REST client.

  2. Create an index with the following schema (needed for the aggregations to work properly).

    PUT product_prices
    {
      "mappings": {
        "properties": {
          "product": {
            "type": "text"
          },
          "category": {
            "type": "keyword"
          },
          "price": {
            "type": "double"
          }
        }
      }
    }
  3. Index documents.

    POST /product_prices/_bulk
    { "index": { "_id": "1" } }
    { "product": "Elasticsearch Guide", "category": "Books", "price": 29.99 }
    { "index": { "_id": "2" } }
    { "product": "Advanced Elasticsearch", "category": "Books", "price": 39.99 }
    { "index": { "_id": "3" } }
    { "product": "Elasticsearch T-shirt", "category": "Apparel", "price": 19.99 }
    { "index": { "_id": "4" } }
    { "product": "Elasticsearch Mug", "category": "Apparel", "price": 12.99 }
  4. Execute a simple aggregation (should return 2 buckets).

    GET product_prices/_search
    {
      "size": 0,
      "aggs": {
        "category_buckets": {
          "terms": {
            "field": "category"
          }
        }
      }
    }
  5. Add and execute a single sub-aggregation to determine the average price per category (bucket).

    GET product_prices/_search
    {
      "size": 0,
      "aggs": {
        "category_buckets": {
          "terms": {
            "field": "category"
          },
          "aggs": {
            "avg_price": {
              "avg": {
                "field": "price"
              }
            }
          }
        }
      }
    }
  6. Add min and max sub-aggregations and execute the query.

    GET product_prices/_search
    {
      "size": 0,
      "aggs": {
        "category_buckets": {
          "terms": {
            "field": "category"
          },
          "aggs": {
            "avg_price": {
              "avg": {
                "field": "price"
              }
            },
            "min_price" : {
              "min": {
                "field": "price"
              }
            },
            "max_price": {
              "max": {
                "field": "price"
              }
            }
          }
        }
      }
    }
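The three price sub-aggregations can also be collapsed into a single stats aggregation, which returns count, min, max, avg, and sum together for each bucket:

    GET product_prices/_search
    {
      "size": 0,
      "aggs": {
        "category_buckets": {
          "terms": {
            "field": "category"
          },
          "aggs": {
            "price_stats": {
              "stats": {
                "field": "price"
              }
            }
          }
        }
      }
    }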

Test

  1. Verify the index creation.

    GET /product_prices
  2. Verify the documents have been indexed.

    GET /product_prices/_search
  3. Execute the aggregation query and verify the results.

    {
      ...
      "aggregations": {
        "category_buckets": {
          "doc_count_error_upper_bound": 0,
          "sum_other_doc_count": 0,
          "buckets": [
            {
              "key": "Apparel",
              "doc_count": 2,
              "avg_price": {
                "value": 16.49
              },
              "min_price": {
                "value": 12.99
              },
              "max_price": {
                "value": 19.99
              }
            },
            {
              "key": "Books",
              "doc_count": 2,
              "avg_price": {
                "value": 34.99
              },
              "min_price": {
                "value": 29.99
              },
              "max_price": {
                "value": 39.99
              }
            }
          ]
        }
      }
    }

Considerations

  • The category field must be of type keyword.
  • The terms aggregation creates buckets for each unique category.
  • The avg, min, and max sub-aggregations calculate the average, minimum, and maximum prices within each category bucket.
  • Setting size to 0 ensures that only aggregation results are returned, not individual documents.

Clean-up (optional)

  • Delete the index.

    DELETE product_prices

Documentation

Example 2: Creating Metric and Bucket Aggregations for Website Traffic

Requirements

  • Create a new index with four documents representing website traffic data.
  • Aggregate the following:
    • Group traffic by country.
    • Calculate the total page views.
    • Calculate the average page views per country.

Steps

  1. Open the Kibana Console or use a REST client.

  2. Create a new index.

    PUT traffic
    {
      "mappings": {
        "properties": {
          "country": {
            "type": "keyword"
          },
          "page_views": {
            "type": "long"
          }
        }
      }
    }
  3. Add four documents representing website traffic data.

    POST /traffic/_bulk
    {"index":{}}
    {"country":"USA","page_views":100}
    {"index":{}}
    {"country":"USA","page_views":200}
    {"index":{}}
    {"country":"Canada","page_views":50}
    {"index":{}}
    {"country":"Canada","page_views":75}
  4. Execute the bucket aggregation for country (should return 2 buckets).

    GET traffic/_search
    {
      "size": 0,
      "aggs": {
        "country_bucket": {
          "terms": {
            "field": "country"
          }
        }
      }
    }
  5. Add the sum aggregation for total page_views (a single top-level aggregation across all documents).

    GET traffic/_search
    {
      "size": 0,
      "aggs": {
        "country_bucket": {
          "terms": {
            "field": "country"
          }
        },
        "total_page_views": {
          "sum": {
            "field": "page_views"
          }
        }
      }
    }
  6. Add a sub-aggregation for average page_views per country (should appear in 2 buckets).

    GET traffic/_search
    {
      "size": 0,
      "aggs": {
        "country_bucket": {
          "terms": {
            "field": "country"
          },
          "aggs": {
            "avg_page_views": {
              "avg": {
                "field": "page_views"
              }
            }
          }
        },
        "total_page_views": {
          "sum": {
            "field": "page_views"
          }
        }
      }
    }
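Buckets from a terms aggregation are sorted by document count by default, but they can be ordered by the value of a sub-aggregation instead. A sketch that sorts countries by their average page views, highest first:

    GET traffic/_search
    {
      "size": 0,
      "aggs": {
        "country_bucket": {
          "terms": {
            "field": "country",
            "order": {
              "avg_page_views": "desc"
            }
          },
          "aggs": {
            "avg_page_views": {
              "avg": {
                "field": "page_views"
              }
            }
          }
        }
      }
    }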

Test

  1. Verify the index creation.

    GET /traffic
  2. Verify the documents have been indexed.

    GET /traffic/_search
  3. Verify that the total page views are calculated correctly (should be 425).

    GET /traffic/_search
    {
      "aggs": {
        "total_page_views": {
          "sum": {
            "field": "page_views"
          }
        }
      }
    }
  4. Verify that the traffic is grouped correctly by country and average page views are calculated.

    GET /traffic/_search
    {
      "size": 0,
      "aggs": {
        "country_bucket": {
          "terms": {
            "field": "country"
          },
          "aggs": {
            "avg_page_views": {
              "avg": {
                "field": "page_views"
              }
            }
          }
        },
        "total_page_views": {
          "sum": {
            "field": "page_views"
          }
        }
      }
    }

    Response:

    {
      ...
      "aggregations": {
        "country_bucket": {
          "doc_count_error_upper_bound": 0,
          "sum_other_doc_count": 0,
          "buckets": [
            {
              "key": "Canada",
              "doc_count": 2,
              "avg_page_views": {
                "value": 62.5
              }
            },
            {
              "key": "USA",
              "doc_count": 2,
              "avg_page_views": {
                "value": 150
              }
            }
          ]
        },
        "total_page_views": {
          "value": 425
        }
      }
    }

Considerations

  • The country field must be of type keyword.
  • The terms bucket aggregation is used to group traffic by country.
  • The sum metric aggregation is used to calculate the total page views.
  • The avg metric aggregation is used to calculate the average page views per country.

Clean-up (optional)

  • Delete the index.

    DELETE traffic

Documentation

Example 3: Creating Metric and Bucket Aggregations for Analyzing Employee Salaries

Requirements

  • An Elasticsearch index named employees with documents containing fields name, department, position, salary, hire_date.
  • Calculate the average salary across all employees.
  • Group the employees by department
  • Calculate the maximum salary for each department.

Steps

  1. Open the Kibana Console or use a REST client.

  2. Create an index with the proper mapping for the department as we want to bucket by it.

    PUT employees
    {
      "mappings": {
        "properties": {
          "name": {
            "type": "text"
          },
          "department": {
            "type": "keyword"
          },
          "position": {
            "type": "text"
          },
          "salary": {
            "type": "integer"
          },
          "hire_date": {
            "type": "date"
          }
        }
      }
    }
  3. Index sample employee documents using the /_bulk endpoint.

    POST /employees/_bulk
    {"index":{"_id":1}}
    {"name":"John Doe", "department":"Engineering", "position":"Software Engineer", "salary":80000, "hire_date":"2018-01-15"}
    {"index":{"_id":2}}
    {"name":"Jane Smith", "department":"Engineering", "position":"DevOps Engineer", "salary":75000, "hire_date":"2020-03-01"}
    {"index":{"_id":3}}
    {"name":"Bob Johnson", "department":"Sales", "position":"Sales Manager", "salary":90000, "hire_date":"2016-06-01"}
    {"index":{"_id":4}}
    {"name":"Alice Williams", "department":"Sales", "position":"Sales Representative", "salary":65000, "hire_date":"2019-09-15"}
  4. Calculate the average salary of all employees.

    GET employees/_search
    {
      "size": 0,
      "aggs": {
        "avg_salary_all_emps": {
          "avg": {
            "field": "salary"
          }
        }
      }
    }
  5. Add a terms aggregation to group the employees by department.

    GET employees/_search
    {
      "size": 0,
      "aggs": {
        "avg_salary_all_emps": {
          "avg": {
            "field": "salary"
          }
        },
        "employees_by_department" : {
          "terms": {
            "field": "department"
          }
        }
      }
    }
  6. Add a sub-aggregation to calculate the highest salary for each department.

    GET employees/_search
    {
      "size": 0,
      "aggs": {
        "avg_salary_all_emps": {
          "avg": {
            "field": "salary"
          }
        },
        "employees_by_department": {
          "terms": {
            "field": "department"
          },
          "aggs": {
            "max_salary_by_department": {
              "max": {
                "field": "salary"
              }
            }
          }
        }
      }
    }
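To see which employee actually holds the top salary in each department, a top_hits sub-aggregation can be sketched next to the max; it returns the highest-sorted document per bucket (the _source filter just keeps the response small):

    GET employees/_search
    {
      "size": 0,
      "aggs": {
        "employees_by_department": {
          "terms": {
            "field": "department"
          },
          "aggs": {
            "max_salary_by_department": {
              "max": {
                "field": "salary"
              }
            },
            "top_earner": {
              "top_hits": {
                "size": 1,
                "sort": [ { "salary": "desc" } ],
                "_source": [ "name", "salary" ]
              }
            }
          }
        }
      }
    }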

Test

  1. Verify the index creation.

    GET /employees
  2. Verify the documents have been indexed.

    GET /employees/_search
  3. Execute the aggregation query, and it should return the following:

    {
      ...
      "aggregations": {
        "avg_salary_all_emps": {
          "value": 77500
        },
        "employees_by_department": {
          "doc_count_error_upper_bound": 0,
          "sum_other_doc_count": 0,
          "buckets": [
            {
              "key": "Engineering",
              "doc_count": 2,
              "max_salary_by_department": {
                "value": 80000
              }
            },
            {
              "key": "Sales",
              "doc_count": 2,
              "max_salary_by_department": {
                "value": 90000
              }
            }
          ]
        }
      }
    }

Considerations

  • The department field must be of type keyword.
  • The size parameter is set to 0 to exclude hit documents from the response.
  • The avg_salary_all_emps metric aggregation calculates the average of the salary field across all documents.
  • The employees_by_department bucket aggregation groups the documents by the department field.
  • The max_salary_by_department sub-aggregation calculates the maximum value of the salary field for each department.

Clean-up (optional)

  • Delete the index.

    DELETE employees

Documentation

2.5 Task: Write and execute aggregations that contain subaggregations

Example 1: Creating aggregations and sub-aggregations for Product Categories and Prices

Requirements

  • Create aggregations
    • by category
    • sub-aggregation of average price by category
      • price ranges: $0 to $20, $20-$40, $40 and up

Steps

  1. Open the Kibana Console or use a REST client.

  2. Create an index.

    PUT /product_index
    {
      "mappings": {
        "properties": {
          "product": {
            "type": "text"
          },
          "category": {
            "type": "keyword"
          },
          "price": {
            "type": "double"
          }
        }
      }
    }
  3. Index some sample documents.

    POST /product_index/_bulk
    { "index": { "_id": "1" } }
    { "product": "Elasticsearch Guide", "category": "Books", "price": 29.99 }
    { "index": { "_id": "2" } }
    { "product": "Advanced Elasticsearch", "category": "Books", "price": 39.99 }
    { "index": { "_id": "3" } }
    { "product": "Elasticsearch T-shirt", "category": "Apparel", "price": 19.99 }
    { "index": { "_id": "4" } }
    { "product": "Elasticsearch Mug", "category": "Apparel", "price": 12.99 }
  4. Create an aggregation by category.

    GET product_index/_search
    {
      "size": 0,
      "aggs": {
        "category_buckets": {
          "terms": {
            "field": "category"
          }
        }
      }
    }
  5. Create a sub-aggregation of average price.

    GET product_index/_search
    {
      "size": 0,
      "aggs": {
        "category_buckets": {
          "terms": {
            "field": "category"
          },
          "aggs": {
            "average_price": {
              "avg": {
                "field": "price"
              }
            }
          }
        }
      }
    }
  6. Create a sub-aggregation of price ranges ($0-$20, $20-$40, $40 and up).

    GET product_index/_search
    {
      "size": 0,
      "aggs": {
        "category_buckets": {
          "terms": {
            "field": "category"
          },
          "aggs": {
            "average_price": {
              "avg": {
                "field": "price"
              }
            },
            "price_ranges" : {
              "range": {
                "field": "price",
                "ranges": [
                  {
                    "to": 20
                  },
                  {
                    "from": 20,
                    "to": 40
                  },
                  {
                    "from": 40
                  }
                ]
              }
            }
          }
        }
      }
    }
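The bucket math behind this request can be modeled in plain Python against the four sample documents. This is a rough sketch of the semantics (not how Elasticsearch computes aggregations internally), and it reproduces the averages and range counts you should see in the response:

```python
from collections import defaultdict

docs = [
    {"category": "Books", "price": 29.99},
    {"category": "Books", "price": 39.99},
    {"category": "Apparel", "price": 19.99},
    {"category": "Apparel", "price": 12.99},
]

# terms aggregation: one bucket per unique category value
prices_by_category = defaultdict(list)
for doc in docs:
    prices_by_category[doc["category"]].append(doc["price"])

# range boundaries: "from" is inclusive, "to" is exclusive, as in Elasticsearch
ranges = [(None, 20), (20, 40), (40, None)]

summary = {}
for category, prices in prices_by_category.items():
    summary[category] = {
        "avg": round(sum(prices) / len(prices), 2),    # avg sub-aggregation
        "ranges": [sum(1 for p in prices
                       if (lo is None or p >= lo) and (hi is None or p < hi))
                   for lo, hi in ranges],               # range sub-aggregation
    }
```

Running this gives Apparel an average of 16.49 with both documents in the first range, and Books an average of 34.99 with both documents in the middle range, matching the expected response in the Test section.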

Test

  1. Verify the index creation and mappings.

    GET /product_index
  2. Verify the test documents are in the index.

    GET /product_index/_search
  3. Execute the aggregation query and confirm the results.

    {
      ...
      "aggregations": {
        "category_buckets": {
          "doc_count_error_upper_bound": 0,
          "sum_other_doc_count": 0,
          "buckets": [
            {
              "key": "Apparel",
              "doc_count": 2,
              "average_price": {
                "value": 16.49
              },
              "price_ranges": {
                "buckets": [
                  {
                    "key": "*-20.0",
                    "to": 20,
                    "doc_count": 2
                  },
                  {
                    "key": "20.0-40.0",
                    "from": 20,
                    "to": 40,
                    "doc_count": 0
                  },
                  {
                    "key": "40.0-*",
                    "from": 40,
                    "doc_count": 0
                  }
                ]
              }
            },
            {
              "key": "Books",
              "doc_count": 2,
              "average_price": {
                "value": 34.99
              },
              "price_ranges": {
                "buckets": [
                  {
                    "key": "*-20.0",
                    "to": 20,
                    "doc_count": 0
                  },
                  {
                    "key": "20.0-40.0",
                    "from": 20,
                    "to": 40,
                    "doc_count": 2
                  },
                  {
                    "key": "40.0-*",
                    "from": 40,
                    "doc_count": 0
                  }
                ]
              }
            }
          ]
        }
      }
    }

Considerations

  • Setting size: 0 ensures the search doesn’t return any documents, focusing solely on the aggregations.
  • The category field must be of type keyword.
  • The terms aggregation creates buckets for each unique category.
  • The avg sub-aggregation calculates the average price within each category bucket.
  • The range sub-aggregation divides the prices into specified ranges within each category bucket.

Clean-up (optional)

  • Delete the index.

    DELETE product_index

Documentation

Example 2: Creating aggregations and sub-aggregations for Employee Data Analysis

Requirements

  • Use the terms aggregation to group employees by department.
  • Use the avg sub-aggregation to calculate the average salary per department.
  • Use the filters sub-aggregation to group employees by job_title.

Steps

  1. Open the Kibana Console or use a REST client.

  2. Create a new index called employees.

    PUT employees
    {
      "mappings": {
        "properties": {
          "department": {
            "type": "keyword"
          },
          "salary": {
            "type": "integer"
          },
          "job_title": {
            "type": "keyword"
          }
        }
      }
    }
  3. Insert four documents representing employee data.

    POST /employees/_bulk
    {"index":{}}
    {"department":"Sales","salary":100000,"job_title":"Manager"}
    {"index":{}}
    {"department":"Sales","salary":80000,"job_title":"Representative"}
    {"index":{}}
    {"department":"Marketing","salary":120000,"job_title":"Manager"}
    {"index":{}}
    {"department":"Marketing","salary":90000,"job_title":"Coordinator"}
  4. Execute an aggregation by department.

    GET employees/_search
    {
      "size": 0,
      "aggs": {
        "employees_by_department": {
          "terms": {
            "field": "department"
          }
        }
      }
    }
  5. Add the sub-aggregations for average salary by department.

    GET employees/_search
    {
      "size": 0,
      "aggs": {
        "employees_by_department": {
          "terms": {
            "field": "department"
          },
          "aggs": {
            "avg_salary_by_department": {
              "avg": {
                "field": "salary"
              }
            }
          }
        }
      }
    }
  6. Add a filters sub-aggregation for each job_title.

    GET employees/_search
    {
      "size": 0,
      "aggs": {
        "employees_by_department": {
          "terms": {
            "field": "department"
          },
          "aggs": {
            "avg_salary_by_department": {
              "avg": {
                "field": "salary"
              }
            },
            "employees_by_title": {
              "filters": {
                "filters": {
                  "Managers": {
                    "term": {
                      "job_title": "Manager"
                    }
                  },
                  "Representative" : {
                    "term": {
                      "job_title": "Representative"
                    }
                  },
                  "Coordinator" : {
                    "term": {
                      "job_title": "Coordinator"
                    }
                  }
                }
              }
            }
          }
        }
      }
    }
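The department grouping, salary average, and named job-title filters can be sketched in Python over the four sample employees. This models the aggregation semantics only (Elasticsearch computes them very differently):

```python
employees = [
    {"department": "Sales", "salary": 100000, "job_title": "Manager"},
    {"department": "Sales", "salary": 80000, "job_title": "Representative"},
    {"department": "Marketing", "salary": 120000, "job_title": "Manager"},
    {"department": "Marketing", "salary": 90000, "job_title": "Coordinator"},
]

# the three named buckets from the filters sub-aggregation
titles = ["Manager", "Representative", "Coordinator"]

result = {}
for dept in {e["department"] for e in employees}:     # terms aggregation
    group = [e for e in employees if e["department"] == dept]
    result[dept] = {
        "avg_salary": sum(e["salary"] for e in group) / len(group),  # avg
        "by_title": {t: sum(1 for e in group if e["job_title"] == t)
                     for t in titles},                               # filters
    }
```

As in the expected response, Marketing averages 105000 and Sales averages 90000, with each named filter bucket reporting its own document count, including zero-count buckets.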

Test

  1. Verify the index creation and mappings.

    GET /employees
  2. Verify the test documents are in the index.

    GET /employees/_search
  3. Verify that the employees are grouped correctly by department and job title and that the average salary is calculated correctly for each department.

    {
      ...
      "aggregations": {
        "employees_by_department": {
          "doc_count_error_upper_bound": 0,
          "sum_other_doc_count": 0,
          "buckets": [
            {
              "key": "Marketing",
              "doc_count": 2,
              "avg_salary_by_department": {
                "value": 105000
              },
              "employees_by_title": {
                "buckets": {
                  "Coordinator": {
                    "doc_count": 1
                  },
                  "Managers": {
                    "doc_count": 1
                  },
                  "Representative": {
                    "doc_count": 0
                  }
                }
              }
            },
            {
              "key": "Sales",
              "doc_count": 2,
              "avg_salary_by_department": {
                "value": 90000
              },
              "employees_by_title": {
                "buckets": {
                  "Coordinator": {
                    "doc_count": 0
                  },
                  "Managers": {
                    "doc_count": 1
                  },
                  "Representative": {
                    "doc_count": 1
                  }
                }
              }
            }
          ]
        }
      }
    }

Considerations

  • The department field must be of type keyword.
  • Setting size to 0 ensures the search doesn’t return any documents, focusing solely on the aggregations.
  • The terms aggregation is used to group employees by department.
  • The avg sub-aggregation is used to calculate the average salary per department.
  • The filters sub-aggregation is used to group employees by job_title.

Clean-up (optional)

  • Delete the index.

    DELETE employees

Documentation

Example 3: Creating aggregations and sub-aggregations for application logs by Hour and Log Level

Requirements

  • Analyze application logs stored in an Elasticsearch index named app-logs.
  • Use a date_histogram aggregation to group logs by the hour.
  • Within each hour bucket, create a sub-aggregation to group logs by their severity level (log_level).

Steps

  1. Open the Kibana Console or use a REST client.

  2. Create a new index called app-logs.

    PUT app-logs
    {
      "mappings": {
        "properties": {
          "@timestamp": {
            "type": "date"
          },
          "log_level": {
            "type": "keyword"
          },
          "message": {
            "type": "text"
          }
        }
      }
    }
  3. Insert sample data.

    POST /app-logs/_bulk
    {"index":{"_id":"1"}}
    {"@timestamp":"2024-05-24T10:30:00","log_level":"INFO","message":"Application started successfully."}
    {"index":{"_id":"2"}}
    {"@timestamp":"2024-05-24T11:15:00","log_level":"WARNING","message":"Potential memory leak detected."}
    {"index":{"_id":"3"}}
    {"@timestamp":"2024-05-24T12:00:00","log_level":"ERROR","message":"Database connection failed."}
    {"index":{"_id":"4"}}
    {"@timestamp":"2024-05-24T10:45:00","log_level":"DEBUG","message":"Processing user request."}
  4. Use a date_histogram aggregation to group logs by the hour.

    GET app-logs/_search
    {
      "size": 0,
      "aggs": {
        "logs_by_the_hour": {
          "date_histogram": {
            "field": "@timestamp",
            "fixed_interval": "1h"
          }
        }
      }
    }
  5. Within each hour bucket, create a sub-aggregation to group logs by their severity level (log_level).

    GET app-logs/_search
    {
      "size": 0,
      "aggs": {
        "logs_by_the_hour": {
          "date_histogram": {
            "field": "@timestamp",
            "fixed_interval": "1h"
          },
          "aggs": {
            "log_severity": {
              "terms": {
                "field": "log_level"
              }
            }
          }
        }
      }
    }
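Conceptually, the date_histogram truncates each timestamp down to its hour to pick a bucket, and the terms sub-aggregation counts log levels inside each bucket. A small Python model of that behavior, using the four sample log lines:

```python
from collections import Counter, defaultdict
from datetime import datetime

logs = [
    ("2024-05-24T10:30:00", "INFO"),
    ("2024-05-24T11:15:00", "WARNING"),
    ("2024-05-24T12:00:00", "ERROR"),
    ("2024-05-24T10:45:00", "DEBUG"),
]

# date_histogram with fixed_interval 1h: truncate each timestamp to its hour,
# then the terms sub-aggregation counts log_level values inside that bucket
hourly = defaultdict(Counter)
for ts, level in logs:
    bucket = datetime.fromisoformat(ts).replace(minute=0, second=0)
    hourly[bucket.isoformat()][level] += 1
```

The 10:00 bucket ends up with two documents (INFO and DEBUG) while 11:00 and 12:00 each hold one, mirroring the expected response in the Test section.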

Test

  1. Verify the index creation and mappings.

    GET /app-logs
  2. Verify the test documents are in the index.

    GET /app-logs/_search
  3. Run the search query and examine the response.

    {
      ...
      "aggregations": {
        "logs_by_the_hour": {
          "buckets": [
            {
              "key_as_string": "2024-05-24T10:00:00.000Z",
              "key": 1716544800000,
              "doc_count": 2,
              "log_severity": {
                "doc_count_error_upper_bound": 0,
                "sum_other_doc_count": 0,
                "buckets": [
                  {
                    "key": "DEBUG",
                    "doc_count": 1
                  },
                  {
                    "key": "INFO",
                    "doc_count": 1
                  }
                ]
              }
            },
            {
              "key_as_string": "2024-05-24T11:00:00.000Z",
              "key": 1716548400000,
              "doc_count": 1,
              "log_severity": {
                "doc_count_error_upper_bound": 0,
                "sum_other_doc_count": 0,
                "buckets": [
                  {
                    "key": "WARNING",
                    "doc_count": 1
                  }
                ]
              }
            },
            {
              "key_as_string": "2024-05-24T12:00:00.000Z",
              "key": 1716552000000,
              "doc_count": 1,
              "log_severity": {
                "doc_count_error_upper_bound": 0,
                "sum_other_doc_count": 0,
                "buckets": [
                  {
                    "key": "ERROR",
                    "doc_count": 1
                  }
                ]
              }
            }
          ]
        }
      }
    }

Considerations

  • Setting size to 0 ensures the search doesn’t return any documents, focusing solely on the aggregations.
  • The date_histogram aggregation groups documents based on the @timestamp field with an interval of one hour.
  • The nested terms aggregation within the logs_by_the_hour aggregation counts the occurrences of each unique log_level within each hour bucket.

Clean-up (optional)

  • Delete the index.

    DELETE app-logs

Documentation

Example 4: Finding the Stock with the Highest Daily Volume of the Month

This example is taken from an Elastic webinar that walked through a sample Certified Engineer exam question and answer. The answer they presented was wrong, and the task doesn't actually require aggregations.

Requirements

  • Create a query to find the stock with the highest daily volume for the current month.

Steps

  1. Open the Kibana Console or use a REST client.

  2. Index sample data:

    • Use the _bulk endpoint to index sample stock data.

    • Ensure the data includes fields for stock_name, date, and volume.

      POST _bulk
      { "index": { "_index": "stocks", "_id": "1" } }
      { "stock_name": "AAPL", "date": "2024-07-01", "volume": 1000000 }
      { "index": { "_index": "stocks", "_id": "2" } }
      { "stock_name": "AAPL", "date": "2024-07-02", "volume": 1500000 }
      { "index": { "_index": "stocks", "_id": "3" } }
      { "stock_name": "GOOGL", "date": "2024-07-01", "volume": 2000000 }
      { "index": { "_index": "stocks", "_id": "4" } }
      { "stock_name": "GOOGL", "date": "2024-07-02", "volume": 2500000 }
      { "index": { "_index": "stocks", "_id": "5" } }
      { "stock_name": "MSFT", "date": "2024-07-01", "volume": 3000000 }
      { "index": { "_index": "stocks", "_id": "6" } }
      { "stock_name": "MSFT", "date": "2024-07-02", "volume": 3500000 }
      { "index": { "_index": "stocks", "_id": "7" } }
      { "stock_name": "TSLA", "date": "2024-07-01", "volume": 4000000 }
      { "index": { "_index": "stocks", "_id": "8" } }
      { "stock_name": "TSLA", "date": "2024-07-02", "volume": 4500000 }
      { "index": { "_index": "stocks", "_id": "9" } }
      { "stock_name": "AMZN", "date": "2024-07-01", "volume": 5000000 }
      { "index": { "_index": "stocks", "_id": "10" } }
      { "stock_name": "AMZN", "date": "2024-07-02", "volume": 5500000 }
  3. Create the query. The sample stocks are all dated in July 2024, but the query uses now/M date math to select the current month. Update the dates above so they fall within the current month, or the query will return no results.

      GET stocks/_search
      {
        "query": {
          "range": {
            "date": {
              "gte": "now/M",
              "lte": "now"
            }
          }
        }
      }
  4. The previous query returns all the stocks for the current month. Now sort those stocks by volume in descending order and set size to 1 to display the top pick.

    GET stocks/_search
    {
      "size": 1, 
      "query": {
        "range": {
          "date": {
            "gte": "now/M",
            "lte": "now"
          }
        }
      },
      "sort": [
        {
          "volume": {
            "order": "desc"
          }
        }
      ]
    }
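The filter-then-sort logic is simple enough to model in a few lines of Python, with a hard-coded month standing in for the now/M date math:

```python
stocks = [
    {"stock_name": "AAPL", "date": "2024-07-02", "volume": 1500000},
    {"stock_name": "GOOGL", "date": "2024-07-02", "volume": 2500000},
    {"stock_name": "MSFT", "date": "2024-07-02", "volume": 3500000},
    {"stock_name": "TSLA", "date": "2024-07-02", "volume": 4500000},
    {"stock_name": "AMZN", "date": "2024-07-01", "volume": 5000000},
    {"stock_name": "AMZN", "date": "2024-07-02", "volume": 5500000},
]

# the range query's job: keep only the target month's records
month = "2024-07"
in_month = [s for s in stocks if s["date"].startswith(month)]

# sort by volume descending and take the first hit ("size": 1)
top = sorted(in_month, key=lambda s: s["volume"], reverse=True)[0]
```

With the sample data, the top pick is AMZN on 2024-07-02 with a volume of 5,500,000, matching the expected hit in the Test section.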

Test

  1. Verify the index creation and mappings.

    GET /stocks
  2. Verify the test documents are in the index.

    GET /stocks/_search
  3. Run the query and confirm that the stock with the highest daily volume of the month is displayed.

    {
      ...
        "hits": [
          {
            "_index": "stocks",
            "_id": "10",
            "_score": null,
            "_source": {
              "stock_name": "AMZN",
              "date": "2024-07-02",
              "volume": 5500000
            },
            "sort": [
              5500000
            ]
          }
        ]
      }
    }

Considerations

  • The range clause returns only the stocks for the current month.
  • The sort clause brings the highest-volume record to the top, and a size of 1 displays that one record.

Clean-up (Optional)

  • Delete the stocks index to clean up the data:

    DELETE /stocks

Documentation

Example 5: Aggregating Sales Data by Month with Sub-Aggregation of Total Sales Value

Requirements

  • Aggregate e-commerce sales data by month, creating at least 12 date buckets.
  • Perform a sub-aggregation to calculate the total sales value within each month.

Steps

  1. Index Sample Sales Documents Using _bulk Endpoint:

    POST /sales_data/_bulk
    { "index": { "_id": "1" } }
    { "order_date": "2023-01-15", "product": "Yoo-hoo Beverage", "quantity": 10, "price": 1.99 }
    { "index": { "_id": "2" } }
    { "order_date": "2023-02-20", "product": "Apple iPhone 12", "quantity": 1, "price": 799.99 }
    { "index": { "_id": "3" } }
    { "order_date": "2023-03-05", "product": "Choco-Lite Bar", "quantity": 25, "price": 0.99 }
    { "index": { "_id": "4" } }
    { "order_date": "2023-04-10", "product": "Nike Air Max 270", "quantity": 3, "price": 150.00 }
    { "index": { "_id": "5" } }
    { "order_date": "2023-05-18", "product": "Samsung Galaxy S21", "quantity": 2, "price": 699.99 }
    { "index": { "_id": "6" } }
    { "order_date": "2023-06-22", "product": "Yoo-hoo Beverage", "quantity": 15, "price": 1.99 }
    { "index": { "_id": "7" } }
    { "order_date": "2023-07-03", "product": "Choco-Lite Bar", "quantity": 30, "price": 0.99 }
    { "index": { "_id": "8" } }
    { "order_date": "2023-08-25", "product": "Apple iPhone 12", "quantity": 1, "price": 799.99 }
    { "index": { "_id": "9" } }
    { "order_date": "2023-09-10", "product": "Nike Air Max 270", "quantity": 4, "price": 150.00 }
    { "index": { "_id": "10" } }
    { "order_date": "2023-10-15", "product": "Samsung Galaxy S21", "quantity": 1, "price": 699.99 }
    { "index": { "_id": "11" } }
    { "order_date": "2023-11-20", "product": "Yoo-hoo Beverage", "quantity": 20, "price": 1.99 }
    { "index": { "_id": "12" } }
    { "order_date": "2023-12-30", "product": "Choco-Lite Bar", "quantity": 50, "price": 0.99 }
  2. Bucket the order_date using a Date Histogram Aggregation with Sub-Aggregation:

    • Use a date_histogram to create monthly buckets and a sum sub-aggregation to calculate total sales within each month.

      GET /sales_data/_search
      {
        "size": 0,
        "aggs": {
          "sales_over_time": {
            "date_histogram": {
              "field": "order_date",
              "calendar_interval": "month",
              "format": "yyyy-MM"
            },
            "aggs": {
              "total_sales": {
                "sum": {
                  "field": "total_value"
                }
              }
            }
          }
        }
      }
  3. Calculate the Total Value:

    • Before running the above aggregation, ensure that each document includes a total_value field. You could either compute it on the client side or dynamically compute it using an ingest pipeline or a script during the aggregation process.

      For simplicity, let’s assume the total_value is calculated as quantity * price:

      POST /sales_data/_update_by_query
      {
        "script": {
          "source": "ctx._source.total_value = ctx._source.quantity * ctx._source.price"
        },
        "query": {
          "match_all": {}
        }
      }
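The two pieces (the total_value script and the monthly sum) can be checked by hand in Python, using the first few sample orders:

```python
from collections import defaultdict

orders = [
    ("2023-01-15", 10, 1.99),     # (order_date, quantity, price)
    ("2023-02-20", 1, 799.99),
    ("2023-03-05", 25, 0.99),
    ("2023-04-10", 3, 150.00),
]

monthly_totals = defaultdict(float)
for order_date, quantity, price in orders:
    total_value = quantity * price            # the _update_by_query script
    monthly_totals[order_date[:7]] += total_value  # sum inside the month bucket
```

January's bucket ends up at 19.90 (10 x 1.99), February's at 799.99, March's at 24.75, and April's at 450.00, one total_sales value per month bucket.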

Test

  • Run the above GET /sales_data/_search query.
  • Check the output to see 12 date buckets, one for each month, with the total_sales value for each bucket.

Considerations

  • The date_histogram aggregation is ideal for grouping records by time intervals such as months, weeks, or days.
  • The sum sub-aggregation allows you to calculate the total value of sales within each date bucket.
  • Ensure that the total_value field is correctly calculated, as this impacts the accuracy of the sub-aggregation.

Clean-up (Optional)

  • Delete the sales_data index to clean up the data:

    DELETE /sales_data

Documentation

2.6 Task: Write and execute a query that searches across multiple clusters

If you are running your instance of Elasticsearch locally, and need to create an additional cluster so that you can run these examples, go to the Appendix: Adding a Cluster to your Elasticsearch Instance for information on how to set up an additional single-node cluster.

Example 1: Creating search queries for Products in Multiple Clusters

Requirements

  • Set up two single-node clusters on localhost or Elastic Cloud.
  • Create an index in each cluster.
  • Index at least four documents in each cluster using the _bulk endpoint.
  • Configure cross-cluster search.
  • Execute a cross-cluster search query.

Steps

  1. Open the Kibana Console or use a REST client.

  2. Set up multiple clusters on localhost.

  • Assume you have two clusters, es01 and es02, set up as directed in the Appendix.

  • In the local cluster, configure communication between the clusters by updating the local cluster settings.

    PUT /_cluster/settings
    {
      "persistent": {
        "cluster": {
          "remote": {
            "es01": {
              "seeds": [
                "es01:9300"
              ],
              "skip_unavailable": true
            },
            "es02": {
              "seeds": [
                "es02:9300"
              ],
              "skip_unavailable": false
            }
          }
        }
      }
    }
  3. Create a product index in each cluster.
  • From the Kibana Console (es01)

    PUT /products
    {
      "mappings": {
        "properties": {
          "product": {
            "type": "text"
          },
          "category": {
            "type": "keyword"
          },
          "price": {
            "type": "double"
          }
        }
      }
    }
  • From the command line (es02).

    curl -u elastic:[your password here] -X PUT "http://localhost:9201/products?pretty" -H 'Content-Type: application/json' -d'
    {
      "mappings": {
        "properties": {
          "product": {
            "type": "text"
          },
          "category": {
            "type": "keyword"
          },
          "price": {
            "type": "double"
          }
        }
      }
    }'
  4. Index product documents into each cluster.
  • For es01:

    POST /products/_bulk
    { "index": { "_id": "1" } }
    { "product": "Elasticsearch Guide", "category": "Books", "price": 29.99 }
    { "index": { "_id": "2" } }
    { "product": "Advanced Elasticsearch", "category": "Books", "price": 39.99 }
    { "index": { "_id": "3" } }
    { "product": "Elasticsearch T-shirt", "category": "Apparel", "price": 19.99 }
    { "index": { "_id": "4" } }
    { "product": "Elasticsearch Mug", "category": "Apparel", "price": 12.99 }
  • For es02 through the command line (note that the final single quote is on a line by itself):

    curl -u elastic:[your password here] -X POST "http://localhost:9201/products/_bulk?pretty" -H 'Content-Type: application/json' -d'
    { "index": { "_id": "5" } }
    { "product": "Elasticsearch Stickers", "category": "Accessories", "price": 4.99 }
    { "index": { "_id": "6" } }
    { "product": "Elasticsearch Notebook", "category": "Stationery", "price": 7.99 }
    { "index": { "_id": "7" } }
    { "product": "Elasticsearch Pen", "category": "Stationery", "price": 3.49 }
    { "index": { "_id": "8" } }
    { "product": "Elasticsearch Hoodie", "category": "Apparel", "price": 45.99 }
    '
  5. Configure Cross-Cluster Search (CCS).
  • In the local cluster, ensure the remote cluster is configured by checking the settings:

    GET /_cluster/settings?filter_path=persistent.cluster.remote
  6. Execute a Cross-Cluster Search query.

    GET /products,es02:products/_search
    {
      "query": {
        "match": {
          "product": "Elasticsearch"
        }
      }
    }
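Conceptually, this request fans out to each cluster and merges the hits, with remote hits reported under a cluster-prefixed index name (es02:products). A toy Python sketch of that merge, using hypothetical hit lists (real cross-cluster search also coordinates scoring and paging across clusters):

```python
# hypothetical hits for illustration only
local_hits = [{"_index": "products", "_id": "1"},
              {"_index": "products", "_id": "3"}]
remote_hits = [{"_index": "products", "_id": "5"},
               {"_index": "products", "_id": "8"}]

# remote hits come back with the cluster alias prefixed to the index name
merged = local_hits + [
    {**hit, "_index": "es02:" + hit["_index"]} for hit in remote_hits
]
```

This is why, in the CCS response, you can tell at a glance which cluster each document came from.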

Test

  1. Verify the index creation.

    GET /products

    From the command line execute:

    curl -u elastic:[your password here] -X GET "http://localhost:9201/products?pretty"
  2. Verify that the documents have been indexed.

    GET /products/_search
    GET /es02:products/_search
  3. Ensure the remote cluster is correctly configured and visible from the local cluster.

    GET /_remote/info
  4. Execute a Cross-Cluster Search query.

    GET /products,es02:products/_search
    {
      "query": {
        "match": {
          "product": "Elasticsearch"
        }
      }
    }

Considerations

  • Cross-cluster search is useful for querying data across multiple Elasticsearch clusters, providing a unified search experience.
  • Ensure the remote cluster settings are correctly configured in the cluster settings.
  • Properly handle the index names to avoid conflicts and ensure clear distinction between clusters.

Clean-up (optional)

  • Delete the es01 index.

    DELETE products
  • Delete the es02 index from the command line.

    curl -u elastic:[your password here] -X DELETE "http://localhost:9201/products?pretty"

Documentation

2.7 Task: Write and execute a search that utilizes a runtime field

Example 1: Creating search queries for products with a runtime field for discounted prices

Requirements

  • Create an index.
  • Index four documents.
  • Define a runtime field.
  • Execute a search query that creates a query-time runtime field with a 10% discount

Steps

  1. Open the Kibana Console or use a REST client.

  2. Create an index.

    PUT /product_index
    {
      "mappings": {
        "properties": {
          "product": {
            "type": "text"
          },
          "price": {
            "type": "double"
          },
          "category": {
            "type": "keyword"
          }
        }
      }
    }
  3. Index some documents.

    POST /product_index/_bulk
    { "index": { "_id": "1" } }
    { "product": "Elasticsearch Guide", "price": 29.99, "category": "Books" }
    { "index": { "_id": "2" } }
    { "product": "Advanced Elasticsearch", "price": 39.99, "category": "Books" }
    { "index": { "_id": "3" } }
    { "product": "Elasticsearch T-shirt", "price": 19.99, "category": "Apparel" }
    { "index": { "_id": "4" } }
    { "product": "Elasticsearch Mug", "price": 12.99, "category": "Apparel" }
  4. Define a query-time runtime field to return a discounted price.

    GET product_index/_search
    {
      "query": {
        "match_all": {}
      },
      "fields": [
        "product", "price", "discounted_price"
      ], 
      "runtime_mappings": {
        "discounted_price": {
          "type": "double",
          "script": {
            "source": "emit(doc['price'].value * 0.9)"
          }
        }
      }
    }
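The runtime script runs once per matching document, emitting a computed value without storing anything in the index. Its arithmetic, applied to the sample products:

```python
products = {
    "Elasticsearch Guide": 29.99,
    "Advanced Elasticsearch": 39.99,
    "Elasticsearch T-shirt": 19.99,
    "Elasticsearch Mug": 12.99,
}

# the runtime script per document: emit(doc['price'].value * 0.9)
discounted = {name: price * 0.9 for name, price in products.items()}
```

This reproduces the discounted_price values in the expected response (26.991, 35.991, 17.991, 11.691); note that Elasticsearch returns the raw product of the doubles, not a rounded currency value.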

Test

  1. Verify the creation of the index and its mappings.

    GET /product_index
  2. Verify the indexed documents.

    GET /product_index/_search
  3. Execute the query and confirm the discounted_price.

    {
      ...
        "hits": [
          {
            ...
            "fields": {
              "product": [
                "Elasticsearch Guide"
              ],
              "price": [
                29.99
              ],
              "discounted_price": [
                26.991
              ]
            }
          },
          {
            ...
            "fields": {
              "product": [
                "Advanced Elasticsearch"
              ],
              "price": [
                39.99
              ],
              "discounted_price": [
                35.991
              ]
            }
          },
          {
            ...
            "fields": {
              "product": [
                "Elasticsearch T-shirt"
              ],
              "price": [
                19.99
              ],
              "discounted_price": [
                17.991
              ]
            }
          },
          {
            ...
            "fields": {
              "product": [
                "Elasticsearch Mug"
              ],
              "price": [
                12.99
              ],
              "discounted_price": [
                11.691
              ]
            }
          }
        ]
      }
    }

Considerations

  • Runtime fields allow for dynamic calculation of field values at search time, useful for complex calculations or when the field values are not stored.
  • The script in the runtime field calculates the discounted price by applying a 10% discount to the price field.

Clean-up (optional)

  • Delete the index.

    DELETE product_index

Documentation

Example 2: Creating search queries for employees with a calculated total salary

In this example, the calculated field is defined in the index mapping with a script, so its value is computed when each document is indexed: the salary field is read at index time to produce the value of the total_salary field.

Requirements

  • An index (employees) with documents containing employee information (name, department, salary) and a runtime field (total_salary) to calculate the total annual salary of each employee.
  • A search query to retrieve employees with a total salary above $65,000.

Steps

  1. Open the Kibana Console or use a REST client.

  2. Create the employees index with a mapping for the runtime field.

    PUT employees
    {
      "mappings": {
        "properties": {
          "name": {
            "type": "text"
          },
          "department": {
            "type": "text"
          },
          "salary": {
            "type": "integer"
          },
          "total_salary": {
            "type": "long",
            "script": {
              "source": "emit(doc['salary'].value * 12)"
            }
          }
        }
      }
    }
  3. Index some documents that contain a monthly salary.

    POST /employees/_bulk
    { "index": { "_id": "1" } }
    { "name": "John Doe", "department": "Sales", "salary": 4000 }
    { "index": { "_id": "2" } }
    { "name": "Jane Smith", "department": "Marketing", "salary": 6000 }
    { "index": { "_id": "3" } }
    { "name": "Bob Johnson", "department": "IT", "salary": 7000 }
    { "index": { "_id": "4" } }
    { "name": "Alice Brown", "department": "HR", "salary": 5000 }
  4. Execute a search query with a runtime field.

    GET employees/_search
    {
      "query": {
        "range": {
          "total_salary": {
            "gte": 65000
          }
        }
      },
      "fields": [
        "total_salary"
      ]
    }
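The script and range filter together amount to a simple computation, sketched here in Python over the four sample employees:

```python
employees = [
    ("John Doe", 4000),
    ("Jane Smith", 6000),
    ("Bob Johnson", 7000),
    ("Alice Brown", 5000),
]

# the mapping script computes total_salary = salary * 12 at index time;
# the range query then keeps documents with total_salary >= 65000
matches = [(name, salary * 12) for name, salary in employees
           if salary * 12 >= 65000]
```

Only Jane Smith (72000) and Bob Johnson (84000) clear the 65,000 threshold, which is exactly what the expected response in the Test section shows.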

Test

  1. Verify the creation of the index and its mappings.

    GET /employees
  2. Verify the indexed documents.

    GET /employees/_search
  3. Execute the query and verify the search results contain only employees with a total salary above 65000.

    {
      ...
        "hits": [
          {
            "_index": "employees",
            "_id": "2",
            "_score": 1,
            "_source": {
              "name": "Jane Smith",
              "department": "Marketing",
              "salary": 6000
            },
            "fields": {
              "total_salary": [
                72000
              ]
            }
          },
          {
            "_index": "employees",
            "_id": "3",
            "_score": 1,
            "_source": {
              "name": "Bob Johnson",
              "department": "IT",
              "salary": 7000
            },
            "fields": {
              "total_salary": [
                84000
              ]
            }
          }
        ]
      }
    }

Considerations

  • Runtime fields are calculated on the fly and can be used in search queries, aggregations, and sorting.
  • The script used in the runtime field calculates the total salary by multiplying the monthly salary by 12 months.

Clean-up (optional)

  • Delete the index.

    DELETE employees

Documentation

Example 3: Creating search queries with a runtime field for restaurant data

Requirements

  • Create a search query for restaurants in New York City.
  • Include the restaurant’s name, cuisine, and a calculated rating_score in the search results.
    • the rating_score is calculated by taking the square root of the product of the review_score and number_of_reviews.

Steps

  1. Open the Kibana Console or use a REST client.

  2. Create a restaurant index.

    PUT restaurants
    {
      "mappings": {
        "properties": {
          "city": {
            "type": "keyword"
          },
          "cuisine": {
            "type": "text"
          },
          "name": {
            "type": "text"
          },
          "number_of_reviews": {
            "type": "long"
          },
          "review_score": {
            "type": "float"
          },
          "state": {
            "type": "keyword"
          }
        }
      }
    }
  3. Index some sample restaurant documents.

    POST /restaurants/_bulk
    { "index": { "_id": 1 } }
    { "name": "Tasty Bites", "city": "New York", "state": "NY", "cuisine": "Italian", "review_score": 4.5, "number_of_reviews": 200 }
    { "index": { "_id": 2 } }
    { "name": "Spicy Palace", "city": "Los Angeles", "state": "CA", "cuisine": "Indian", "review_score": 4.2, "number_of_reviews": 150 }
    { "index": { "_id": 3 } }
    { "name": "Sushi Spot", "city": "San Francisco", "state": "CA", "cuisine": "Japanese", "review_score": 4.7, "number_of_reviews": 300 }
    { "index": { "_id": 4 } }
    { "name": "Burger Joint", "city": "Chicago", "state": "IL", "cuisine": "American", "review_score": 3.8, "number_of_reviews": 100 }
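    • Optionally, force a refresh after the bulk request so the documents are immediately searchable (bulk-indexed documents otherwise become visible only after the next automatic refresh). This step is not shown in the original example:

    POST /restaurants/_refresh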
  4. Create a query to return restaurants located in New York City.

    GET restaurants/_search
    {
      "query": {
        "bool": {
          "must": [
            {
              "term": {
                "city": {
                  "value": "New York"
                }
              }
            },
            {
              "term": {
                "state": {
                  "value": "NY"
                }
              }
            }
          ]
        }
      }
    }
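    • Because city and state are keyword fields and relevance scoring is not needed for these exact matches, the same conditions could also go in a bool filter clause. This is a variant, not required by the exercise; it returns the same documents, but filter clauses skip scoring and can be cached:

    GET restaurants/_search
    {
      "query": {
        "bool": {
          "filter": [
            { "term": { "city": "New York" } },
            { "term": { "state": "NY" } }
          ]
        }
      }
    }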
  5. Define a runtime field named rating_score to calculate a rating score for New York restaurants.

    GET restaurants/_search
    {
      "query": {
        "bool": {
          "must": [
            {
              "term": {
                "city": {
                  "value": "New York"
                }
              }
            },
            {
              "term": {
                "state": {
                  "value": "NY"
                }
              }
            }
          ]
        }
      }, 
      "runtime_mappings": {
        "rating_score": {
          "type": "double",
          "script": {
            "source": "emit(Math.sqrt(doc['review_score'].value * doc['number_of_reviews'].value))"
          }
        }
      },
      "fields": [
        "rating_score"
      ]
    }
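    • The requirements ask for only the restaurant's name, cuisine, and rating_score in the results. One variant (a sketch, not the only correct answer) lists all three in the fields option and disables _source in the response:

    GET restaurants/_search
    {
      "query": {
        "bool": {
          "must": [
            { "term": { "city": { "value": "New York" } } },
            { "term": { "state": { "value": "NY" } } }
          ]
        }
      },
      "runtime_mappings": {
        "rating_score": {
          "type": "double",
          "script": {
            "source": "emit(Math.sqrt(doc['review_score'].value * doc['number_of_reviews'].value))"
          }
        }
      },
      "fields": ["name", "cuisine", "rating_score"],
      "_source": false
    }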

Test

  • Verify the creation of the index and its mappings.

    GET /restaurants
  • Verify the indexed documents.

    GET /restaurants/_search
  • Execute the query and verify the results contain the restaurant name, cuisine type, and the calculated rating_score for restaurants located in New York, NY.

    {
      ...
        "hits": [
          {
            "_index": "restaurants",
            "_id": "1",
            "_score": 2.4079456,
            "_source": {
              "name": "Tasty Bites",
              "city": "New York",
              "state": "NY",
              "cuisine": "Italian",
              "review_score": 4.5,
              "number_of_reviews": 200
            },
            "fields": {
              "rating_score": [
                30
              ]
            }
          }
        ]
      }
    }

Considerations

  • The runtime_mappings section defines a new field rating_score, calculated as the square root of the product of the review_score and number_of_reviews fields. For Tasty Bites, that is sqrt(4.5 × 200) = sqrt(900) = 30, matching the fields value in the response.
  • The query section uses term queries to search for restaurants in New York, NY.
  • The fields section specifies the fields to include in the search results (in this case, the runtime field rating_score).
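  • Runtime fields can also be used for sorting. A sketch (beyond the original requirements) that orders the New York results by the calculated score, highest first:

    GET restaurants/_search
    {
      "query": {
        "bool": {
          "must": [
            { "term": { "city": { "value": "New York" } } },
            { "term": { "state": { "value": "NY" } } }
          ]
        }
      },
      "runtime_mappings": {
        "rating_score": {
          "type": "double",
          "script": {
            "source": "emit(Math.sqrt(doc['review_score'].value * doc['number_of_reviews'].value))"
          }
        }
      },
      "fields": ["rating_score"],
      "sort": [
        { "rating_score": "desc" }
      ]
    }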

Clean-up (optional)

  • Delete the index.

    DELETE restaurants

Documentation