
ElasticSearch Learning Notes Part 6 - Searching

Preface

These are notes I took while following the online course Complete Guide to Elasticsearch.

This post covers how Elasticsearch searches for data.

Main Content

  • Search the documents in an index
    GET /<index>/_search

  • Search for a specific value in a field
    GET /<index>/_search?q=<field>:<value>

E.g.

GET /analyzer_test/_search?q=description:dog

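Each matching document is given a relevance score (_score), and the hits are returned ordered by that score, highest first: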

{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 0.6931471,
    "hits" : [
      {
        "_index" : "analyzer_test",
        "_type" : "_doc",
        "_id" : "DAlf5n0BeiHXdjTuK4R7",
        "_score" : 0.6931471,
        "_source" : {
          "description" : "Hey that dog!"
        }
      }
    ]
  }
}
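To read the score out of a response like this in a script, a minimal sketch in Python is enough. It parses a trimmed copy of the response above with the standard json module; the helper name ranked_hits is hypothetical.

```python
import json

# A trimmed copy of the sample response above (only the fields used below).
RESPONSE = """
{
  "hits": {
    "total": {"value": 1, "relation": "eq"},
    "max_score": 0.6931471,
    "hits": [
      {
        "_index": "analyzer_test",
        "_id": "DAlf5n0BeiHXdjTuK4R7",
        "_score": 0.6931471,
        "_source": {"description": "Hey that dog!"}
      }
    ]
  }
}
"""


def ranked_hits(response_text):
    """Return (score, id, source) tuples, highest score first."""
    body = json.loads(response_text)
    hits = body["hits"]["hits"]
    # Without an explicit sort, Elasticsearch already orders hits by _score;
    # sorting again here only makes that ordering explicit.
    return sorted(
        ((hit["_score"], hit["_id"], hit["_source"]) for hit in hits),
        key=lambda item: item[0],
        reverse=True,
    )


for score, doc_id, source in ranked_hits(RESPONSE):
    print(f"{score:.4f}  {doc_id}  {source['description']}")
```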
  • Multiple conditions (the boolean operators AND, OR and NOT must be uppercase)
    GET /analyzer_test/_search?q=description:dog AND type:dachshund
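When a URI search like this is sent from code rather than typed into a console, the colon and space characters in the q value need URL encoding. A minimal sketch with Python's standard library; uri_search_path is a hypothetical helper name.

```python
from urllib.parse import urlencode


def uri_search_path(index, query):
    """Build a URI-search path of the form /<index>/_search?q=<query>.

    urlencode() percent-encodes ':' as %3A and turns spaces into '+',
    so the query string survives the trip over HTTP intact.
    """
    return f"/{index}/_search?" + urlencode({"q": query})


print(uri_search_path("analyzer_test", "description:dog"))
print(uri_search_path("analyzer_test", "description:dog AND type:dachshund"))
```

The second call yields /analyzer_test/_search?q=description%3Adog+AND+type%3Adachshund.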

Query DSL

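Queries fall into two groups: leaf queries, which look for a value in a particular field, and compound queries, which wrap other queries (leaf or compound) and combine them.

E.g.
A basic query that matches every document: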

GET /analyzer_test/_search
{
  "query": {
    "match_all": {}
  }
}
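match_all is a leaf query. To illustrate the compound side, here is a sketch that builds a bool query body (one of Elasticsearch's compound query types) as a Python dict wrapping two leaf match queries. The type field comes from the earlier URI example; whether analyzer_test actually has that field is an assumption.

```python
import json

# Two leaf queries: each one looks for a value in a single field.
leaf_description = {"match": {"description": "dog"}}
leaf_type = {"match": {"type": "dachshund"}}  # assumed field, see lead-in

# A compound query: bool wraps the leaf queries; every clause under "must"
# has to match, roughly like "description:dog AND type:dachshund".
compound = {"query": {"bool": {"must": [leaf_description, leaf_type]}}}

# This JSON would be the body of GET /analyzer_test/_search.
print(json.dumps(compound, indent=2))
```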

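How a search is executed

Any node can act as the coordinating node. The node that receives a search request forwards it to the shards of the target index, which may be spread across different nodes, then merges their results and returns the combined response to the client.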
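The scatter/gather step can be simulated in a few lines. This is a simplified sketch, not Elasticsearch's code: the shard names and scores are made up, and Elasticsearch's default query-then-fetch adds a second phase that fetches the _source of the winning documents, which is left out here.

```python
import heapq

# Hypothetical per-shard matches as (score, doc_id) pairs.
# In a real cluster these shards could live on different nodes.
SHARD_RESULTS = {
    "shard_0": [(0.69, "a"), (0.41, "b")],
    "shard_1": [(0.88, "c"), (0.12, "d")],
    "shard_2": [(0.55, "e")],
}


def query_shard(shard, size):
    """Simulate one shard returning only its own top `size` hits."""
    return heapq.nlargest(size, SHARD_RESULTS[shard])


def coordinate(size):
    """Simulate the coordinating node: ask every shard, merge, keep the top `size`."""
    per_shard = [query_shard(shard, size) for shard in SHARD_RESULTS]
    merged = [hit for hits in per_shard for hit in hits]
    return heapq.nlargest(size, merged)


print(coordinate(3))
```

Each shard only needs to return its own top `size` hits: any document in the global top `size` must also be in the top `size` of the shard that holds it.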

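Relevance score

ES first finds the documents that match the query, then calculates a relevance score for each of them.

Common concepts behind the relevance score:

  1. Term Frequency (TF): the more often a term appears in a document's field, the higher the score.
  2. Inverse Document Frequency (IDF): the more documents in the index contain a term, the lower that term's weight, so rare terms count more than common ones.
  3. Okapi BM25: the default similarity algorithm since Elasticsearch 5.0. It combines TF and IDF but caps the effect of TF (term frequency saturation), so a term that repeats very often, such as a stop word, cannot inflate the score without limit.
  4. Field-length Norm: the longer the field, the lower the score; a match in a short field counts more than one in a long field.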
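The idf and tf formulas printed in the explain output further down can be turned into a small worked example. This sketch assumes the BM25 defaults shown there (k1 = 1.2, b = 0.75) and that, with the default query boost of 1, the "boost" value in that output equals k1 + 1 = 2.2. Plugging in the same numbers (N = 2, n = 1, freq = 1, dl = avgdl = 3) reproduces the score 0.6931.

```python
import math

K1 = 1.2   # term-frequency saturation parameter (value from the explain output)
B = 0.75   # field-length normalization parameter (value from the explain output)


def idf(n, N):
    """Rarer terms (small n relative to N) get a higher weight."""
    return math.log(1 + (N - n + 0.5) / (n + 0.5))


def tf(freq, dl, avgdl, k1=K1, b=B):
    """Saturating term frequency, lowered for longer-than-average fields."""
    return freq / (freq + k1 * (1 - b + b * dl / avgdl))


def bm25(freq, n, N, dl, avgdl, k1=K1, b=B):
    # Assumption: the explain output's "boost" (2.2) is (k1 + 1) * query boost 1.
    return (k1 + 1) * idf(n, N) * tf(freq, dl, avgdl, k1, b)


print(round(bm25(freq=1, n=1, N=2, dl=3, avgdl=3), 4))   # 0.6931

# Saturation: each doubling of freq raises tf by less than the previous one.
print([round(tf(f, 3, 3), 3) for f in (1, 2, 4, 8)])
# Field-length norm: same freq, field twice as long as average -> lower tf.
print(round(tf(1, 3, 3), 3), round(tf(1, 6, 3), 3))
```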

Add "explain": true to the request body to see a detailed breakdown of how the relevance score was calculated:

GET /analyzer_test/_search
{
  "explain": true,
  "query": {
    "term": {
      "description": "dog"
    }
  }
}
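The same explain request can be prepared from a script with only the standard library. This sketch assumes a local, unsecured cluster at http://localhost:9200 (an assumption, not something stated in these notes); build_search_request is a hypothetical helper that builds the request without sending it. Elasticsearch also accepts POST for _search, which some HTTP clients handle more readily than a GET with a body.

```python
import json
import urllib.request

# Assumption: a local, unsecured cluster on the default port 9200.
ES_URL = "http://localhost:9200"


def build_search_request(index, body):
    """Prepare (without sending) a GET /<index>/_search request with a JSON body."""
    return urllib.request.Request(
        f"{ES_URL}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="GET",
    )


explain_body = {"explain": True, "query": {"term": {"description": "dog"}}}
req = build_search_request("analyzer_test", explain_body)
print(req.get_method(), req.full_url)

# Sending it requires a running cluster:
# with urllib.request.urlopen(req) as resp:
#     hit = json.load(resp)["hits"]["hits"][0]
#     print(hit["_score"], hit["_explanation"]["description"])
```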
{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 0.6931471,
    "hits" : [
      {
        "_shard" : "[analyzer_test][0]",
        "_node" : "0lmKyQR9SQ-uYLPan-8BZw",
        "_index" : "analyzer_test",
        "_type" : "_doc",
        "_id" : "DAlf5n0BeiHXdjTuK4R7",
        "_score" : 0.6931471,
        "_source" : {
          "description" : "Hey that dog!"
        },
        "_explanation" : {
          "value" : 0.6931471,
          "description" : "weight(description:dog in 0) [PerFieldSimilarity], result of:",
          "details" : [
            {
              "value" : 0.6931471,
              "description" : "score(freq=1.0), computed as boost * idf * tf from:",
              "details" : [
                {
                  "value" : 2.2,
                  "description" : "boost",
                  "details" : [ ]
                },
                {
                  "value" : 0.6931472,
                  "description" : "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:",
                  "details" : [
                    {
                      "value" : 1,
                      "description" : "n, number of documents containing term",
                      "details" : [ ]
                    },
                    {
                      "value" : 2,
                      "description" : "N, total number of documents with field",
                      "details" : [ ]
                    }
                  ]
                },
                {
                  "value" : 0.45454544,
                  "description" : "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:",
                  "details" : [
                    {
                      "value" : 1.0,
                      "description" : "freq, occurrences of term within document",
                      "details" : [ ]
                    },
                    {
                      "value" : 1.2,
                      "description" : "k1, term saturation parameter",
                      "details" : [ ]
                    },
                    {
                      "value" : 0.75,
                      "description" : "b, length normalization parameter",
                      "details" : [ ]
                    },
                    {
                      "value" : 3.0,
                      "description" : "dl, length of field",
                      "details" : [ ]
                    },
                    {
                      "value" : 3.0,
                      "description" : "avgdl, average length of field",
                      "details" : [ ]
                    }
                  ]
                }
              ]
            }
          ]
        }
      }
    ]
  }
}
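The _explanation object is a tree: every node has a value, a description, and a list of child details. A short recursive walk prints it as an indented outline, which is easier to scan than the raw JSON. The sketch below uses a trimmed copy of the explanation above (the idf and tf leaves are cut off) and checks that the top-level score equals boost * idf * tf; walk is a hypothetical helper name.

```python
import json

# A trimmed copy of the "_explanation" object from the response above.
EXPLANATION = json.loads("""
{
  "value": 0.6931471,
  "description": "weight(description:dog in 0) [PerFieldSimilarity], result of:",
  "details": [
    {
      "value": 0.6931471,
      "description": "score(freq=1.0), computed as boost * idf * tf from:",
      "details": [
        {"value": 2.2, "description": "boost", "details": []},
        {"value": 0.6931472, "description": "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:", "details": []},
        {"value": 0.45454544, "description": "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:", "details": []}
      ]
    }
  ]
}
""")


def walk(node, depth=0):
    """Yield (depth, value, description) for every node, parent before children."""
    yield depth, node["value"], node["description"]
    for child in node.get("details", []):
        yield from walk(child, depth + 1)


for depth, value, description in walk(EXPLANATION):
    print("  " * depth + f"{value:<11} {description}")

# Sanity check: the score node's value is the product of its three factors.
boost, idf_value, tf_value = (c["value"] for c in EXPLANATION["details"][0]["details"])
print(round(boost * idf_value * tf_value, 6))
```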

Term level query vs. Full text query

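  • Term-level query: the query value is not analyzed, so it must exactly match a term stored in the index.
  • Full text query: the query value is analyzed first, by default with the field's analyzer (the standard analyzer unless another one is configured).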
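A toy simulation shows why this difference matters, using the "Hey that dog!" document from above. simple_analyze is a hypothetical and much simplified stand-in for the standard analyzer: it only splits on non-word characters and lowercases, while the real one follows Unicode word-boundary rules. The sketch shows why the term query for "dog" in the explain example matches, while a term query for "Dog" would not.

```python
import re


def simple_analyze(text):
    """Rough stand-in for the standard analyzer (assumption, see lead-in):
    split on non-word characters, then lowercase each token."""
    return [token.lower() for token in re.findall(r"\w+", text)]


# Index time: the stored document is analyzed into terms.
indexed_terms = set(simple_analyze("Hey that dog!"))   # {'hey', 'that', 'dog'}


def term_query(value):
    """Term-level: the value is looked up as-is, without analysis."""
    return value in indexed_terms


def match_query(value):
    """Full text: the value is analyzed first; any resulting term may match."""
    return any(token in indexed_terms for token in simple_analyze(value))


print(term_query("dog"), term_query("Dog"))    # True False
print(match_query("dog"), match_query("Dog"))  # True True
```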

Reference

  1. Complete Guide to Elasticsearch