Introduction

These are my notes from the online course Complete Guide to Elasticsearch.

This post compares term-level queries (term) and full-text queries (match) in Elasticsearch.

Main text

Term

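A term query does not analyze the search text before looking it up.
It is typically used for exact matching on values such as numbers, dates, and keywords.

Let's go straight to an example.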

E.g.

GET /analyzer_test/_search
{
  "query": {
    "term": {
      "description": {
        "value": "dog"
      }
    }
  }  
}

The value: ... form above can be shortened to the following:

GET /analyzer_test/_search
{
  "query": {
    "term": {
      "description": "dog"
    }
  }  
}

Both forms return the same results.
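To see why it matters that term skips analysis, here is a minimal Python sketch. It is a toy model and not Elasticsearch itself: the analyze function is only a rough stand-in for the default analyzer, and the sample documents are made up for illustration.

```python
import re

def analyze(text):
    """Rough stand-in for the default analyzer: split into words, lowercase them."""
    return [token.lower() for token in re.findall(r"[A-Za-z0-9]+", text)]

# Hypothetical description values, keyed by document id.
docs = {1: "Dog and Cat", 2: "Horse"}

# Inverted index: token -> ids of the documents that contain it.
index = {}
for doc_id, text in docs.items():
    for token in analyze(text):
        index.setdefault(token, set()).add(doc_id)

def term_query(value):
    """Look the value up exactly as given, without analyzing it."""
    return sorted(index.get(value, set()))

def match_query(text):
    """Analyze the search text first, then look up each token."""
    hits = set()
    for token in analyze(text):
        hits |= index.get(token, set())
    return sorted(hits)
```

In this model term_query("Dog") finds nothing, because the index only holds the lowercased token dog, while match_query("Dog") analyzes the input first and finds document 1.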

terms

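Term-level queries work best against a keyword field. A keyword field stores the original value as a single term, whereas a text field is analyzed at index time (with the default analyzer, split into lowercased tokens), so an exact value such as Brunch would not match it. The terms query below returns documents whose test.keyword equals any of the listed values.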

GET /analyzer_test/_search
{
  "query": {
    "terms": {
      "test.keyword": [
          "Brunch",
          "Dinner"
        ]
    }
  }  
}
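The difference between the keyword sub-field and the analyzed text field can be sketched in a few lines of Python. This is a simplified model with made-up values for the test field; the lowercase-and-split step only approximates the default analyzer.

```python
# Hypothetical values of the "test" field, keyed by document id.
docs = {"a": "Brunch", "b": "Dinner", "c": "Late Lunch"}

# keyword sub-field: the whole original string is stored as one term.
keyword_index = {}
for doc_id, value in docs.items():
    keyword_index.setdefault(value, set()).add(doc_id)

# text field: lowercased word tokens (approximation of the default analyzer).
text_index = {}
for doc_id, value in docs.items():
    for token in value.lower().split():
        text_index.setdefault(token, set()).add(doc_id)

def terms_query(index, values):
    """Return the ids of documents that contain any of the exact terms."""
    hits = set()
    for value in values:
        hits |= index.get(value, set())
    return sorted(hits)
```

Against keyword_index the values Brunch and Dinner match documents a and b; against text_index the same values match nothing, because only brunch and dinner were indexed.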

ids

The ids query fetches specific documents by their _id. It is very simple, and several ids can be passed in one search.

E.g.

GET /analyzer_test/_search
{
  "query": {
    "ids": {
      "values": ["EQlV-30BeiHXdjTuKITq", "eBhV-30BH9MdJOJuGEIo"]
    }
  }
}

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "analyzer_test",
        "_type" : "_doc",
        "_id" : "eBhV-30BH9MdJOJuGEIo",
        "_score" : 1.0,
        "_source" : {
          "test" : "Brunch"
        }
      },
      {
        "_index" : "analyzer_test",
        "_type" : "_doc",
        "_id" : "EQlV-30BeiHXdjTuKITq",
        "_score" : 1.0,
        "_source" : {
          "test" : "Dinner"
        }
      }
    ]
  }
}

gte, gt, lte, lt

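The range query returns documents whose field value lies within the given bounds: gte (greater than or equal to), gt (greater than), lte (less than or equal to), and lt (less than).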

numeric

GET /analyzer_test/_search 
{
  "query": {
    "range": {
      "in_stock": {
        "gte": 10,
        "lte": 20
      }
    }
  }
}

{
  "took" : 858,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "analyzer_test",
        "_type" : "_doc",
        "_id" : "wBgL_30BH9MdJOJuw0Km",
        "_score" : 1.0,
        "_source" : {
          "in_stock" : 20
        }
      },
      {
        "_index" : "analyzer_test",
        "_type" : "_doc",
        "_id" : "wRgL_30BH9MdJOJu0kJr",
        "_score" : 1.0,
        "_source" : {
          "in_stock" : 13
        }
      }
    ]
  }
}
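The example only uses gte and lte. As a small illustration of how the four bounds differ, here is a Python sketch; the in_range helper and the stock values are hypothetical and only model the comparison, not how Elasticsearch evaluates a range query internally.

```python
import operator

# Each range bound corresponds to one comparison operator.
BOUNDS = {
    "gte": operator.ge,
    "gt": operator.gt,
    "lte": operator.le,
    "lt": operator.lt,
}

def in_range(value, **bounds):
    """Return True if the value satisfies every given bound."""
    return all(BOUNDS[name](value, limit) for name, limit in bounds.items())

# Made-up in_stock values.
stock = [5, 10, 13, 20, 25]
```

With gte=10 and lte=20 the values 10, 13, and 20 pass; with gt=10 and lt=20 only 13 is left.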

Date

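Range queries also work on dates. By default, the date strings in the query are parsed with the format of the field's mapping; in this index the created field accepts yyyy/MM/dd (uppercase MM is the month, lowercase mm would mean minutes).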

GET /analyzer_test/_search 
{
  "query": {
    "range": {
      "created": {
        "gte": "2021/01/01",
        "lte": "2021/12/31"
      }
    }
  }
}

{
  "took" : 538,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "analyzer_test",
        "_type" : "_doc",
        "_id" : "EgkQ_30BeiHXdjTudIQr",
        "_score" : 1.0,
        "_source" : {
          "created" : "2021/01/02"
        }
      },
      {
        "_index" : "analyzer_test",
        "_type" : "_doc",
        "_id" : "EwkQ_30BeiHXdjTuhISx",
        "_score" : 1.0,
        "_source" : {
          "created" : "2021/02/05"
        }
      },
      {
        "_index" : "analyzer_test",
        "_type" : "_doc",
        "_id" : "wxgQ_30BH9MdJOJuk0Kw",
        "_score" : 1.0,
        "_source" : {
          "created" : "2021/10/05"
        }
      }
    ]
  }
}

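To write the dates in the query in a different format, add the format parameter. Several patterns can be listed, separated by ||.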

E.g.

GET /analyzer_test/_search 
{
  "query": {
    "range": {
      "created": {
        "gte": "01/01/2021",
        "lte": "31/12/2021",
        "format": "dd/MM/yyyy||yyyy"
      }
    }
  }
}
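A format such as dd/MM/yyyy||yyyy lists several patterns separated by ||. As I understand it, they are tried in order until one of them parses the value. The Python sketch below mimics that idea with strptime; the pattern strings are my Python equivalents of the two patterns, and parse_date is a hypothetical helper.

```python
from datetime import datetime

# Python strptime equivalents of the two patterns in "dd/MM/yyyy||yyyy".
PATTERNS = ["%d/%m/%Y", "%Y"]

def parse_date(text, patterns=PATTERNS):
    """Try each pattern in order and return the first successful parse."""
    for pattern in patterns:
        try:
            return datetime.strptime(text, pattern)
        except ValueError:
            continue
    raise ValueError(f"{text!r} matches none of {patterns}")
```

Here "31/12/2021" is parsed by the first pattern and "2021" falls through to the second, while "2021/12/31" is rejected because it fits neither.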

Date math is supported as well: an anchor date followed by || and a math expression describes a time relative to the anchor.

<operator>: <date>||<relative_method>

E.g.

GET /analyzer_test/_search
{
  "query": {
    "range": {
      "created": {
        "gte": "2021/01/01||-1y"
      }
    }
  }
}

-1y: subtract one year from the anchor date
-1d: subtract one day from the anchor date
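To make the arithmetic concrete, here is a minimal Python sketch that resolves such an expression. It is not Elasticsearch's parser: the resolve helper is hypothetical, it only understands the units y and d, and its handling of February 29 is an assumption of the sketch.

```python
import re
from datetime import datetime, timedelta

def resolve(expression, fmt="%Y/%m/%d"):
    """Resolve '<anchor>||<+/-><n><unit>' for the units y (years) and d (days)."""
    anchor_text, _, math_text = expression.partition("||")
    result = datetime.strptime(anchor_text, fmt)
    for sign, amount, unit in re.findall(r"([+-])(\d+)([yd])", math_text):
        n = int(amount) if sign == "+" else -int(amount)
        if unit == "d":
            result += timedelta(days=n)
        else:
            # Assumption of this sketch: Feb 29 is clamped to Feb 28
            # when the target year is not a leap year.
            try:
                result = result.replace(year=result.year + n)
            except ValueError:
                result = result.replace(year=result.year + n, day=28)
    return result
```

With this model, 2021/01/01||-1y resolves to 2020/01/01 and 2021/01/01||-1d resolves to 2020/12/31.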

Rounding is supported too. Whether the value is rounded up or down depends on the operator.

E.g.

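This example subtracts one year from the anchor date and then rounds the result to the month. Because the operator is gte, the value is rounded down to the start of that month.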

GET /analyzer_test/_search
{
  "query": {
    "range": {
      "created": {
        "gte": "2021/01/01||-1y/M"
      }
    }
  }
}
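The effect of the operator on month rounding can be sketched in Python. The sketch only covers gte and lte, and follows my reading of the documented behavior: gte rounds down to the first millisecond of the month, lte rounds up to the last millisecond of the month. round_to_month is a hypothetical helper, not an Elasticsearch API.

```python
from datetime import datetime, timedelta

def round_to_month(moment, operator):
    """Round a datetime to the month, in the direction implied by the operator."""
    start = moment.replace(day=1, hour=0, minute=0, second=0, microsecond=0)
    if operator == "gte":
        # Lower bound: first millisecond of the month.
        return start
    if operator == "lte":
        # Upper bound: last millisecond of the month.
        if start.month == 12:
            next_start = start.replace(year=start.year + 1, month=1)
        else:
            next_start = start.replace(month=start.month + 1)
        return next_start - timedelta(milliseconds=1)
    raise ValueError("this sketch only covers gte and lte")
```

For the request above, 2021/01/01 minus one year is 2020/01/01, and rounding with gte keeps it at 2020/01/01 00:00:00.000.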

Non null value

We can search for the documents in which a given field has a non-null value.

E.g.

Suppose part of the data looks like this, with created set to null in some documents:

...
{
        "_index" : "analyzer_test",
        "_type" : "_doc",
        "_id" : "EwkQ_30BeiHXdjTuhISx",
        "_score" : 1.0,
        "_source" : {
          "created" : "2021/02/05"
        }
      },
      {
        "_index" : "analyzer_test",
        "_type" : "_doc",
        "_id" : "wxgQ_30BH9MdJOJuk0Kw",
        "_score" : 1.0,
        "_source" : {
          "created" : "2021/10/05"
        }
      },
      {
        "_index" : "analyzer_test",
        "_type" : "_doc",
        "_id" : "FAm5_30BeiHXdjTu64S8",
        "_score" : 1.0,
        "_source" : {
          "created" : null
...

To find the documents whose created field is not null, use the exists query:

GET /analyzer_test/_search
{
  "query": {
    "exists": {
      "field": "created"
    }
  }
}
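Which values count as "existing" can be sketched as follows. This models my understanding that a field has no indexed value when it is missing, null, or an empty array, while an empty string still counts; the exists helper and the sample documents are hypothetical.

```python
def exists(doc, field):
    """Return True if the field holds at least one non-null value."""
    value = doc.get(field)
    values = value if isinstance(value, list) else [value]
    return any(v is not None for v in values)

# Made-up documents.
docs = [
    {"created": "2021/02/05"},
    {"created": None},
    {"created": []},
    {"title": "no created field"},
    {"created": ""},
]
```

Of these five documents only the first and the last one would be returned by the sketch.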

Prefix

Suppose we have these two documents:

...
"_index" : "analyzer_test",
        "_type" : "_doc",
        "_id" : "dxhU-30BH9MdJOJuJEKr",
        "_score" : 1.0,
        "_source" : {
          "tags" : [
            "Red"
          ]
        }
      },
      {
        "_index" : "analyzer_test",
        "_type" : "_doc",
        "_id" : "0hjK_30BH9MdJOJu7ULf",
        "_score" : 1.0,
        "_source" : {
          "tags" : [
            "Redd"
          ]
        }
...

Both values start with Red, so a prefix query can find them:

GET /analyzer_test/_search
{
  "query": {
    "prefix": {
      "tags.keyword": "Red"
    }
  }
}

wildcard

ES also supports wildcard patterns: * matches any sequence of characters (including none) and ? matches any single character.

E.g.

GET /analyzer_test/_search
{
  "query": {
    "wildcard": {
      "tags.keyword": "Re*"
    }
  }
}

A wildcard query has to iterate over the terms of the field, so putting * or ? at the beginning of the pattern makes the search slow.
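The cost of a leading wildcard can be illustrated with a sorted term list. The sketch below is a simplified model, not how Lucene is implemented: when the pattern starts with literal characters, a binary search can jump to the matching region, and when it starts with * or ? every term has to be examined. All names and terms here are made up.

```python
import bisect
import re

def wildcard_to_regex(pattern):
    """Translate * and ? into a regular expression, escaping everything else."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts))

def wildcard_search(sorted_terms, pattern):
    """Return (matching terms, number of terms examined)."""
    first_wildcard = re.search(r"[*?]", pattern)
    prefix = pattern if first_wildcard is None else pattern[:first_wildcard.start()]
    regex = wildcard_to_regex(pattern)
    # Jump to the first term that could start with the literal prefix.
    start = bisect.bisect_left(sorted_terms, prefix)
    matches, examined = [], 0
    for term in sorted_terms[start:]:
        if not term.startswith(prefix):
            break
        examined += 1
        if regex.fullmatch(term):
            matches.append(term)
    return matches, examined

# Hypothetical terms of tags.keyword, kept in sorted order.
terms = sorted(["Blue", "Green", "Red", "Redd", "Yellow"])
```

In this model Re* examines only the two terms that start with Re, while *ed has no literal prefix and examines all five.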

regexp

ES also supports searching with regular expressions (regex).

E.g.

GET /analyzer_test/_search
{
  "query": {
    "regexp": {
      "tags.keyword": "R.*"
    }
  }
}

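Elasticsearch uses Lucene's regular expression engine, so some regex features are not supported. For example, the anchors ^ and $ are not available, and a pattern always has to match the entire term.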
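The "whole term" behavior is easy to miss. The Python sketch below imitates it with re.fullmatch; Python's re module is not Lucene's engine and its syntax differs, so this only illustrates the anchoring, and the tag values are made up.

```python
import re

# Hypothetical values of tags.keyword.
tags = ["Red", "Redd", "Dark Red"]

def regexp_query(pattern, terms):
    """Keep the terms that the pattern matches from start to end."""
    compiled = re.compile(pattern)
    return [term for term in terms if compiled.fullmatch(term)]
```

In this model the pattern ed matches nothing, because no term is exactly "ed"; to find terms that end in ed the pattern has to be written as .*ed.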

Match

In ES, the match query type is used for full-text search.

E.g.

Search for documents whose title matches the search text:

GET /analyzer_test/_search
{
  "query": {
    "match": {
      "title": "Cat"
    }
  }
}

With match, the search text is analyzed before the lookup, and every hit gets a relevance score (_score).

{
  "took" : 791,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 0.25069216,
    "hits" : [
      {
        "_index" : "analyzer_test",
        "_type" : "_doc",
        "_id" : "KBhPBH4BH9MdJOJu9UMH",
        "_score" : 0.25069216,
        "_source" : {
          "title" : "Dog and Cat and Cat"
        }
      },
      {
        "_index" : "analyzer_test",
        "_type" : "_doc",
        "_id" : "FglQBH4BeiHXdjTu_YRA",
        "_score" : 0.18232156,
        "_source" : {
          "title" : "Dog and Horse and Cat"
        }
      }
    ]
  }
}
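The two scores in this response can be reproduced with the BM25 formula, which is Elasticsearch's default similarity. The sketch below rests on my own assumptions, which fit the numbers shown but are not stated in the course: the default parameters k1 = 1.2 and b = 0.75, a title field that exists in exactly these two documents, and five tokens per title.

```python
import math

def bm25(tf, doc_len, avg_len, n_docs, doc_freq, k1=1.2, b=0.75):
    """BM25 score of one query term for one document."""
    # Rare terms get a higher inverse document frequency.
    idf = math.log(1 + (n_docs - doc_freq + 0.5) / (doc_freq + 0.5))
    # Term frequency, dampened and normalized by the field length.
    norm_tf = tf / (tf + k1 * (1 - b + b * doc_len / avg_len))
    return (k1 + 1) * idf * norm_tf

# "Dog and Cat and Cat": cat occurs twice among 5 tokens.
score_two_cats = bm25(tf=2, doc_len=5, avg_len=5, n_docs=2, doc_freq=2)
# "Dog and Horse and Cat": cat occurs once among 5 tokens.
score_one_cat = bm25(tf=1, doc_len=5, avg_len=5, n_docs=2, doc_freq=2)
```

Under these assumptions score_two_cats comes out at about 0.2507 and score_one_cat at about 0.1823, which agrees with the _score values above: the title that mentions Cat twice ranks first because of its higher term frequency.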

match uses the or operator by default; set operator to change how the terms are combined.

GET /analyzer_test/_search
{
  "query": {
    "match": {
      "title": {
        "query": "dog is cat",
        "operator": "and"
      }
    }
  }
}

With and, every term of the query (dog, is, and cat) has to appear in the field. No title contains is, so a document is not returned even if it contains both dog and cat.
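The difference between the two operators can be sketched in Python. This is a toy model: analyze only approximates the default analyzer, and the match helper is hypothetical.

```python
import re

def analyze(text):
    """Rough stand-in for the default analyzer: split into words, lowercase them."""
    return [token.lower() for token in re.findall(r"[A-Za-z0-9]+", text)]

def match(field_text, query, operator="or"):
    """or: any query token has to occur in the field; and: all of them."""
    field_tokens = set(analyze(field_text))
    query_tokens = analyze(query)
    combine = all if operator == "and" else any
    return combine(token in field_tokens for token in query_tokens)
```

For the title "Dog and Cat and Cat", the query "dog is cat" matches with or, but not with and, because is never occurs in the title.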

If the terms have to appear next to each other in the given order, use match_phrase.

E.g.

GET /analyzer_test/_search
{
  "query": {
    "match_phrase": {
      "title": "dog and cat"
    }
  }
}
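What "in order" means can be sketched as a search for consecutive tokens. This is a simplified Python model with hypothetical helpers; it ignores options such as slop.

```python
import re

def analyze(text):
    """Rough stand-in for the default analyzer: split into words, lowercase them."""
    return [token.lower() for token in re.findall(r"[A-Za-z0-9]+", text)]

def match_phrase(field_text, phrase):
    """Return True if the phrase tokens occur consecutively and in order."""
    field_tokens = analyze(field_text)
    phrase_tokens = analyze(phrase)
    n = len(phrase_tokens)
    return any(
        field_tokens[i:i + n] == phrase_tokens
        for i in range(len(field_tokens) - n + 1)
    )
```

In this model "dog and cat" matches the title "Dog and Cat and Cat" but not "Dog and Horse and Cat", where Horse sits between the words.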

To run the same search text against several fields at once, use multi_match.

E.g.

Search for cat and dog in the fields title and description.

GET /analyzer_test/_search
{
  "query": {
    "multi_match": {
      "query": "cat dog",
      "fields": ["title", "description"]
    }
  }
}

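By default (type best_fields), multi_match scores each field separately and uses the highest field score as the score of the document. With a two-word search, a document that has both words in the same field therefore scores higher than a document that has only one of the words in any single field.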
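The "best field wins" idea can be sketched in Python. The scoring here is a deliberately crude stand-in (it counts matched query words instead of computing BM25), and the documents and helper names are made up.

```python
def field_score(text, query_tokens):
    """Toy score: how many of the query tokens occur in the field (not BM25)."""
    tokens = set(text.lower().split())
    return sum(1 for token in query_tokens if token in tokens)

def multi_match(doc, query, fields):
    """Score every field on its own and keep the best one."""
    query_tokens = query.lower().split()
    return max(field_score(doc.get(field, ""), query_tokens) for field in fields)

# Made-up documents.
doc_a = {"title": "Dog and Cat", "description": "Pets"}  # both words in one field
doc_b = {"title": "Dog", "description": "Cat"}           # one word per field
```

With the fields title and description, doc_a gets 2 from its title, while doc_b gets only 1, because neither of its fields contains both words.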

Reference

Complete Guide to Elasticsearch

