# Aggregations
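An aggregation summarizes your data as metrics, statistics, or other analytics. Aggregations help you answer questions like:
What is the average load time for my website?
Who are my most valuable customers based on transaction volume?
What would be considered a large file on my network?
How many products are in each product category?
Elasticsearch organizes aggregations into three categories:
Metric aggregations: calculate metrics, such as a sum or average, from field values.
Bucket aggregations: group documents into buckets based on field values, ranges, or other criteria.
Pipeline aggregations: take input from other aggregations instead of documents or fields.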
# Bucket aggregations
Adjacency matrix
Auto-interval date histogram
Children
Composite
Date histogram
Date range
Diversified sampler
Filter
Filters
Geo-distance
Geohash grid
Geotile grid
Global
Histogram
IP range
Missing
Multi Terms
Nested
Parent
Range
Rare terms
Reverse nested
Sampler
Significant terms
Significant text
Terms
Variable width histogram
Subtleties of bucketing range fields
# Terms
GET kibana_sample_data_ecommerce/_search?size=0
{
  "aggs": {
    "product_name": { // aggregation name
      "terms": { "field": "products.product_name.keyword" }
    }
  }
}
{
"took" : 3,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 4675,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"product_name" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 9143,
"buckets" : [
{
"key" : "Ankle boots - black",
"doc_count" : 117
},
{
"key" : "Print T-shirt - black",
"doc_count" : 114
},
{
"key" : "Boots - black",
"doc_count" : 111
},
{
"key" : "Lace-up boots - black",
"doc_count" : 107
},
{
"key" : "Lace-up boots - resin coffee",
"doc_count" : 85
},
{
"key" : "Sweatshirt - black",
"doc_count" : 83
},
{
"key" : "Vest - black",
"doc_count" : 82
},
{
"key" : "Print T-shirt - white",
"doc_count" : 79
},
{
"key" : "Jersey dress - black",
"doc_count" : 78
},
{
"key" : "Across body bag - black",
"doc_count" : 73
}
]
}
}
}
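# Collect mode (breadth-first vs. depth-first)
An example problem scenario is querying a movie database for the 10 most popular actors and their 5 most common co-stars:
GET /_search
{
  "aggs": {
    "actors": { // most popular actors
      "terms": {
        "field": "actors",
        "size": 10
      },
      "aggs": {
        "costars": { // most common co-stars
          "terms": {
            "field": "actors",
            "size": 5
          }
        }
      }
    }
  }
}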
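Note: in depth-first mode the actors buckets and their nested costars buckets are built in a single pass, so a bucket is created for every actor and co-star combination before anything is pruned. This is a combinatorial explosion and should be avoided where possible.
The sane option is to first determine the 10 most popular actors and only then examine the top co-stars for those 10 actors. This alternative strategy is what we call the breadth_first collection mode, as opposed to the depth_first mode.
GET /_search
{
  "aggs": {
    "actors": {
      "terms": {
        "field": "actors",
        "size": 10,
        "collect_mode": "breadth_first" // breadth-first
      },
      "aggs": {
        "costars": {
          "terms": {
            "field": "actors",
            "size": 5
          }
        }
      }
    }
  }
}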
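The difference between the two modes can be sketched in plain Python. This is only an illustration under stated assumptions, not how Elasticsearch is implemented: `MOVIES`, `depth_first`, `breadth_first` and `top_costars` are hypothetical names, the number of (actor, co-star) counters stands in for the memory cost, and, unlike the real terms aggregation, an actor is not counted as their own co-star.

```python
from collections import Counter
from itertools import permutations

# Hypothetical data: each movie is the list of actors appearing in it.
MOVIES = [
    ["A", "B", "C"],
    ["A", "B"],
    ["A", "D"],
    ["B", "E"],
    ["E", "F"],
]

def top_costars(pair_counts, top_actors, sub_size):
    """Keep the sub_size most frequent co-stars of each top actor."""
    result = {}
    for actor in top_actors:
        costars = Counter({b: n for (a, b), n in pair_counts.items() if a == actor})
        result[actor] = [name for name, _ in costars.most_common(sub_size)]
    return result

def depth_first(movies, size, sub_size):
    """One pass: count every (actor, co-star) pair, then prune to the top actors."""
    actor_counts = Counter(a for cast in movies for a in cast)
    pair_counts = Counter(p for cast in movies for p in permutations(cast, 2))
    top = [a for a, _ in actor_counts.most_common(size)]
    return top_costars(pair_counts, top, sub_size), len(pair_counts)

def breadth_first(movies, size, sub_size):
    """Two passes: pick the top actors first, then count pairs only for them."""
    actor_counts = Counter(a for cast in movies for a in cast)
    top = [a for a, _ in actor_counts.most_common(size)]
    keep = set(top)
    pair_counts = Counter(
        p for cast in movies for p in permutations(cast, 2) if p[0] in keep
    )
    return top_costars(pair_counts, top, sub_size), len(pair_counts)

df_result, df_buckets = depth_first(MOVIES, size=2, sub_size=1)
bf_result, bf_buckets = breadth_first(MOVIES, size=2, sub_size=1)
print(df_result, df_buckets)
print(bf_result, bf_buckets)
```

Both functions return the same top co-stars, but the breadth-first variant keeps counters only for the actors that survive the first pass.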
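# Histogram aggregation
A histogram aggregation places each value into a fixed-size bucket whose key is computed as:
bucket_key = Math.floor((value - offset) / interval) * interval + offset
Math.floor: rounds down to the nearest integer.
interval (required): must be a positive decimal, while offset must be a decimal in [0, interval).
offset (optional): by default, bucket keys start at 0 and continue in even steps of interval. For example, with an interval of 10 the first three buckets are [0, 10), [10, 20), [20, 30). The bucket boundaries can be shifted with the offset option.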
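The formula above translates directly into a few lines of Python. The sketch below is only an illustration: `bucket_key` is a hypothetical helper, not an Elasticsearch API, and it enforces the interval and offset constraints described above.

```python
import math

def bucket_key(value, interval, offset=0.0):
    """Return the key of the histogram bucket that `value` falls into."""
    if interval <= 0:
        raise ValueError("interval must be positive")
    if not 0 <= offset < interval:
        raise ValueError("offset must be in [0, interval)")
    return math.floor((value - offset) / interval) * interval + offset

print(bucket_key(27, 10))      # bucket [20, 30)
print(bucket_key(27, 10, 5))   # bucket [25, 35) once the boundaries are shifted by 5
print(bucket_key(3, 10, 5))    # bucket [-5, 5)
```

With an offset of 5 and an interval of 10, the boundaries move from 0, 10, 20 to -5, 5, 15, 25, which is why 27 lands in the bucket keyed 25.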
The following snippet buckets the products based on their price with an interval of 50:
POST /sales/_search?size=0
{
  "aggs": {
    "prices": {
      "histogram": {
        "field": "price",
        "interval": 50
      }
    }
  }
}
The response might look like this:
{
...
"aggregations": {
"prices": {
"buckets": [
{
"key": 0.0,
"doc_count": 1
},
{
"key": 50.0,
"doc_count": 1
},
{
"key": 100.0,
"doc_count": 0
},
{
"key": 150.0,
"doc_count": 2
},
{
"key": 200.0,
"doc_count": 3
}
]
}
}
}
# Metrics aggregations
Avg
Boxplot
Cardinality
Extended stats
Geo-bounds
Geo-centroid
Geo-Line
Matrix stats
Max
Median absolute deviation
Min
Percentile ranks
Percentiles
Rate
Scripted metric
Stats
String stats
Sum
T-test
Top hits
Top metrics
Value count
Weighted avg
# Avg
POST kibana_sample_data_ecommerce/_search?size=0
{
  "aggs": {
    "price": { // aggregation name
      "avg": { "field": "products.price" }
    }
  }
}
{
"took" : 9,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 4675,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"price" : {
"value" : 34.78560120699911
}
}
}
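# Percentiles aggregation
The percentiles aggregation is a multi-value metrics aggregation that calculates one or more percentiles over numeric values extracted from the aggregated documents. These values can be extracted from specific numeric or histogram fields in the documents.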
GET kibana_sample_data_ecommerce/_search
{
  "size": 0,
  "aggs": {
    "price": {
      "percentiles": {
        "field": "products.price", // the values may also come from a histogram field
        "percents": [ 0, 25, 50, 75, 99.9 ]
      }
    }
  }
}
{
"took" : 3,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 4675,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"price" : {
"values" : {
"0.0" : 5.98828125,
"25.0" : 16.984375,
"50.0" : 25.64178240740741,
"75.0" : 43.783569500674766,
"99.9" : 200.0
}
}
}
}
# Histogram field type
The following create index API request creates a new index with two field mappings:
my_histogram, a histogram field used to store percentile data
my_text, a keyword field used to store a title for the histogram (optional, since each document already has an id)
PUT my-index-000001
{
  "mappings" : {
    "properties" : {
      "my_histogram" : {
        "type" : "histogram"
      },
      "my_text" : {
        "type" : "keyword"
      }
    }
  }
}
The following index API requests store pre-aggregated data for two histograms: histogram_1 and histogram_2.
PUT my-index-000001/_doc/1
{
  "my_text" : "histogram_1",
  "my_histogram" : {
    "values" : [0.1, 0.2, 0.3, 0.4, 0.5],
    "counts" : [3, 7, 23, 12, 6]
  }
}
PUT my-index-000001/_doc/2
{
  "my_text" : "histogram_2",
  "my_histogram" : {
    "values" : [0.1, 0.25, 0.35, 0.4, 0.45, 0.5],
    "counts" : [8, 17, 8, 7, 6, 2]
  }
}
# Run a histogram aggregation on the histogram field
POST my-index-000001/_search?size=0
{
  "aggs": {
    "my_histogram": {
      "histogram": {
        "field": "my_histogram",
        "interval": 0.1
      }
    }
  }
}
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"my_histogram" : {
"buckets" : [
{
"key" : 0.1,
"doc_count" : 11
},
{
"key" : 0.2,
"doc_count" : 47
},
{
"key" : 0.30000000000000004, // floating-point rounding: the key is not exactly 0.3
"doc_count" : 8
},
{
"key" : 0.4,
"doc_count" : 25
},
{
"key" : 0.5,
"doc_count" : 8
}
]
}
}
}
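The bucket counts in this response can be reproduced with a short Python sketch that applies the bucket_key formula from the histogram aggregation section to every pre-aggregated value and adds up the matching counts. It assumes plain IEEE 754 double arithmetic, which is an assumption about the internals rather than something stated here, and `DOCS` and `histogram_over_histograms` are hypothetical names.

```python
import math
from collections import Counter

# Pre-aggregated histograms copied from the two indexed documents.
DOCS = [
    {"values": [0.1, 0.2, 0.3, 0.4, 0.5], "counts": [3, 7, 23, 12, 6]},
    {"values": [0.1, 0.25, 0.35, 0.4, 0.45, 0.5], "counts": [8, 17, 8, 7, 6, 2]},
]

def histogram_over_histograms(docs, interval, offset=0.0):
    """Sum the stored counts per bucket key instead of counting documents."""
    buckets = Counter()
    for doc in docs:
        for value, count in zip(doc["values"], doc["counts"]):
            key = math.floor((value - offset) / interval) * interval + offset
            buckets[key] += count
    return dict(sorted(buckets.items()))

buckets = histogram_over_histograms(DOCS, interval=0.1)
for key, doc_count in buckets.items():
    print(key, doc_count)
```

Under this assumption, the stored value 0.3 lands in the 0.2 bucket because 0.3 / 0.1 evaluates to slightly less than 3 in double arithmetic. That accounts for the 47 in that bucket (7 + 17 + 23). The odd-looking key 0.30000000000000004 is simply 3 * 0.1.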
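# HDR Histogram
HDR Histogram (High Dynamic Range Histogram) maintains a fixed worst-case percentage error, specified as a number of significant digits.
For example, if values from 1 microsecond up to 1 hour are recorded in a histogram set to 3 significant digits,
it keeps a value resolution of 1 microsecond for values up to 1 millisecond,
and a value resolution of 3.6 seconds (or better) for the maximum tracked value (1 hour).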
GET kibana_sample_data_ecommerce/_search
{
  "size": 0,
  "aggs": {
    "price": {
      "percentile_ranks": {
        "field": "products.price",
        "values": [24.9, 25.0],
        "hdr": { "number_of_significant_value_digits": 2 } // with 2 significant digits the values are effectively rounded to [25, 25]; with 3 they stay [24.9, 25.0]
      }
    }
  }
}
GET kibana_sample_data_ecommerce/_search
{
  "size": 0,
  "aggs": {
    "price": {
      "percentile_ranks": {
        "field": "products.price",
        "values": [25, 25]
      }
    }
  }
}
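# Sub-aggregations
# Metric aggregation nested inside a bucket aggregation
First group by product name (bucket aggregation), then compute the average price within each group (metric aggregation).
GET kibana_sample_data_ecommerce/_search?size=0
{
  "aggs": {
    "product_name": { // aggregation name
      "terms": { "field": "products.product_name.keyword" },
      "aggs": {
        "price": { // aggregation name
          "avg": { "field": "products.price" }
        }
      }
    }
  }
}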
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 4675,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"product_name" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 9143,
"buckets" : [
{
"key" : "Ankle boots - black",
"doc_count" : 117,
"price" : {
"value" : 41.123451967592594
}
},
{
"key" : "Print T-shirt - black",
"doc_count" : 114,
"price" : {
"value" : 25.40187733208955
}
},
{
"key" : "Boots - black",
"doc_count" : 111,
"price" : {
"value" : 48.22790434160305
}
},
{
"key" : "Lace-up boots - black",
"doc_count" : 107,
"price" : {
"value" : 48.40599385245902
}
},
{
"key" : "Lace-up boots - resin coffee",
"doc_count" : 85,
"price" : {
"value" : 40.43890947164948
}
},
{
"key" : "Sweatshirt - black",
"doc_count" : 83,
"price" : {
"value" : 28.75931640625
}
},
{
"key" : "Vest - black",
"doc_count" : 82,
"price" : {
"value" : 25.300986842105264
}
},
{
"key" : "Print T-shirt - white",
"doc_count" : 79,
"price" : {
"value" : 23.698069852941178
}
},
{
"key" : "Jersey dress - black",
"doc_count" : 78,
"price" : {
"value" : 30.814624828296704
}
},
{
"key" : "Across body bag - black",
"doc_count" : 73,
"price" : {
"value" : 30.675489403735632
}
}
]
}
}
}
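One subtlety of this response deserves a sketch. Assuming products is a plain object array rather than a nested field (the terms aggregation on products.product_name.keyword works without a nested aggregation, which suggests so), each order joins the bucket of every product it contains, and the avg sub-aggregation averages all products.price values of the orders in that bucket, not only the price of the bucket's own product. The Python below mimics that behaviour on two hypothetical orders; `ORDERS` and `terms_avg` are illustrative names, not Elasticsearch APIs.

```python
from collections import defaultdict

# Hypothetical orders shaped like the sample data: one document per order,
# each holding an array of product objects.
ORDERS = [
    {"products": [
        {"product_name": "Sweatshirt - black", "price": 24.99},
        {"product_name": "Briefcase - cognac", "price": 64.99},
    ]},
    {"products": [
        {"product_name": "Sweatshirt - black", "price": 16.99},
        {"product_name": "Rucksack - taupe", "price": 59.99},
    ]},
]

def terms_avg(orders):
    """Bucket orders by product name, then average every price in each bucket."""
    prices_per_bucket = defaultdict(list)
    doc_counts = defaultdict(int)
    for order in orders:
        prices = [p["price"] for p in order["products"]]
        # An order joins the bucket of each distinct product name it contains.
        for name in {p["product_name"] for p in order["products"]}:
            doc_counts[name] += 1
            prices_per_bucket[name].extend(prices)
    return {
        name: {"doc_count": doc_counts[name], "avg": sum(v) / len(v)}
        for name, v in prices_per_bucket.items()
    }

result = terms_avg(ORDERS)
print(result["Sweatshirt - black"])
```

In this sketch the "Sweatshirt - black" bucket averages about 41.74 even though the two sweatshirts cost 24.99 and 16.99, because the briefcase and rucksack prices are averaged in as well.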
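# Query with aggregations
When a request contains both a query and aggs, the query runs first: the aggregations are computed only over the documents that match the query, and hits[] holds those matching documents. That is why the buckets in the response below also list other products: they appear in the same orders as "Sweatshirt - black". To filter hits without affecting the aggregations, use post_filter instead.
GET kibana_sample_data_ecommerce/_search?size=2
{
  "_source": ["products.product_name", "products.price"],
  "aggs": {
    "product_name": { // aggregation name
      "terms": { "field": "products.product_name.keyword" },
      "aggs": {
        "price": { // aggregation name
          "avg": { "field": "products.price" }
        }
      }
    }
  },
  "query": {
    "term": {
      "products.product_name.keyword": "Sweatshirt - black"
    }
  }
}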
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 83,
"relation" : "eq"
},
"max_score" : 5.1554728,
"hits" : [
{
"_index" : "kibana_sample_data_ecommerce",
"_type" : "_doc",
"_id" : "cYPLx4sBZlLmIBOuZBAr",
"_score" : 5.1554728,
"_source" : {
"products" : [
{
"price" : 24.99,
"product_name" : "Sweatshirt - black"
},
{
"price" : 64.99,
"product_name" : "Briefcase - cognac"
}
]
}
},
{
"_index" : "kibana_sample_data_ecommerce",
"_type" : "_doc",
"_id" : "eoPLx4sBZlLmIBOuZBAr",
"_score" : 5.1554728,
"_source" : {
"products" : [
{
"price" : 16.99,
"product_name" : "Sweatshirt - black"
},
{
"price" : 59.99,
"product_name" : "Rucksack - taupe"
},
{
"price" : 54.99,
"product_name" : "Summer dress - navy blazer"
},
{
"price" : 41.99,
"product_name" : "Summer dress - black"
}
]
}
}
]
},
"aggregations" : {
"product_name" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 98,
"buckets" : [
{
"key" : "Sweatshirt - black",
"doc_count" : 83,
"price" : {
"value" : 28.75931640625
}
},
{
"key" : "Basic T-shirt - black",
"doc_count" : 2,
"price" : {
"value" : 18.2353515625
}
},
{
"key" : "Bomber Jacket - black",
"doc_count" : 2,
"price" : {
"value" : 49.494140625
}
},
{
"key" : "Boots - black",
"doc_count" : 2,
"price" : {
"value" : 35.9921875
}
},
{
"key" : "Handbag - black",
"doc_count" : 2,
"price" : {
"value" : 20.485026041666668
}
},
{
"key" : "Jumper - black",
"doc_count" : 2,
"price" : {
"value" : 20.65234375
}
},
{
"key" : "Jumper - bordeaux",
"doc_count" : 2,
"price" : {
"value" : 24.484375
}
},
{
"key" : "Rucksack - black ",
"doc_count" : 2,
"price" : {
"value" : 29.158854166666668
}
},
{
"key" : "Summer dress - navy blazer",
"doc_count" : 2,
"price" : {
"value" : 39.244140625
}
},
{
"key" : "Watch - tan",
"doc_count" : 2,
"price" : {
"value" : 33.830729166666664
}
}
]
}
}
}
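# Sort with aggregations
The sort clause orders the documents returned in hits[]; it does not change the order of the aggregation buckets. To order the buckets themselves, use the order parameter of the terms aggregation.
GET kibana_sample_data_ecommerce/_search?size=2
{
  "_source": ["products.product_name", "products.price"],
  "aggs": {
    "product_name": { // aggregation name
      "terms": { "field": "products.product_name.keyword" },
      "aggs": {
        "price": { // aggregation name
          "avg": { "field": "products.price" }
        }
      }
    }
  },
  "query": {
    "term": {
      "products.product_name.keyword": "Sweatshirt - black"
    }
  },
  "sort": [
    {
      "products.price": {
        "order": "desc"
      }
    }
  ]
}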
{
"took" : 3,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 83,
"relation" : "eq"
},
"max_score" : null,
"hits" : [
{
"_index" : "kibana_sample_data_ecommerce",
"_type" : "_doc",
"_id" : "LIPLx4sBZlLmIBOuaBdv",
"_score" : null,
"_source" : {
"products" : [
{
"price" : 41.99,
"product_name" : "Bomber Jacket - black"
},
{
"price" : 28.99,
"product_name" : "Sweatshirt - black"
},
{
"price" : 74.99,
"product_name" : "Boots - taupe"
},
{
"price" : 109.99,
"product_name" : "Winter jacket - black"
}
]
},
"sort" : [
110.0
]
},
{
"_index" : "kibana_sample_data_ecommerce",
"_type" : "_doc",
"_id" : "G4PLx4sBZlLmIBOuZxQ1",
"_score" : null,
"_source" : {
"products" : [
{
"price" : 24.99,
"product_name" : "Sweatshirt - black"
},
{
"price" : 32.99,
"product_name" : "High-top trainers - black"
},
{
"price" : 84.99,
"product_name" : "Lace-up boots - Midnight Blue"
},
{
"price" : 7.99,
"product_name" : "3 PACK - Socks - black/grey/white "
}
]
},
"sort" : [
85.0
]
}
]
},
"aggregations" : {
"product_name" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 98,
"buckets" : [
{
"key" : "Sweatshirt - black",
"doc_count" : 83,
"price" : {
"value" : 28.75931640625
}
},
{
"key" : "Basic T-shirt - black",
"doc_count" : 2,
"price" : {
"value" : 18.2353515625
}
},
{
"key" : "Bomber Jacket - black",
"doc_count" : 2,
"price" : {
"value" : 49.494140625
}
},
{
"key" : "Boots - black",
"doc_count" : 2,
"price" : {
"value" : 35.9921875
}
},
{
"key" : "Handbag - black",
"doc_count" : 2,
"price" : {
"value" : 20.485026041666668
}
},
{
"key" : "Jumper - black",
"doc_count" : 2,
"price" : {
"value" : 20.65234375
}
},
{
"key" : "Jumper - bordeaux",
"doc_count" : 2,
"price" : {
"value" : 24.484375
}
},
{
"key" : "Rucksack - black ",
"doc_count" : 2,
"price" : {
"value" : 29.158854166666668
}
},
{
"key" : "Summer dress - navy blazer",
"doc_count" : 2,
"price" : {
"value" : 39.244140625
}
},
{
"key" : "Watch - tan",
"doc_count" : 2,
"price" : {
"value" : 33.830729166666664
}
}
]
}
}
}
# Pipeline aggregations
Average bucket
Bucket script
Bucket selector
Bucket sort
Cumulative cardinality
Cumulative sum
Derivative
Extended stats bucket
Inference bucket
Max bucket
Min bucket
Moving average
Moving function
Moving percentiles
Normalize
Percentiles bucket
Serial differencing
Stats bucket
Sum bucket
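# Cardinality aggregation
The cardinality aggregation counts the number of distinct values, so duplicates are counted once. The examples in this section use the cardinality metrics aggregation, not the cumulative cardinality pipeline aggregation listed above.
An exact distinct count does not parallelize well: per-shard distinct counts cannot simply be added up, because the same value may occur on several shards, unless duplicates are guaranteed to land on the same shard.
# Counts are approximate
The cardinality aggregation is based on the HyperLogLog++ algorithm and returns an approximate count.
Note: HyperLogLog++ sketches computed on different shards can be merged, so this algorithm does parallelize well.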
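The point about shards can be illustrated with exact sets in Python. This is a minimal sketch with hypothetical data and function names. It does not implement HyperLogLog++, which replaces the exact sets with small fixed-size sketches that are merged in the same spirit.

```python
# Hypothetical values of the "type" field as stored on three shards.
SHARDS = [
    ["hat", "bag", "hat"],
    ["bag", "shoes"],
    ["hat", "belt", "shoes"],
]

def sum_of_shard_counts(shards):
    """Naive: add per-shard distinct counts, double-counting shared values."""
    return sum(len(set(values)) for values in shards)

def merged_distinct_count(shards):
    """Correct: merge the per-shard sets first, then count once."""
    merged = set()
    for values in shards:
        merged |= set(values)
    return len(merged)

print(sum_of_shard_counts(SHARDS))
print(merged_distinct_count(SHARDS))
```

Adding the per-shard counts overstates the result whenever a value lives on more than one shard, so the shards have to exchange a mergeable summary rather than a bare number.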
Assume you are indexing store sales and would like to count the number of unique sold products that match a query:
POST /sales/_search?size=0
{
  "aggs": {
    "type_count": {
      "cardinality": {
        "field": "type"
      }
    }
  }
}
# Precision control
This aggregation also supports the precision_threshold option:
POST /sales/_search?size=0
{
  "aggs": {
    "type_count": {
      "cardinality": {
        "field": "type",
        "precision_threshold": 100 // defaults to 3000
      }
    }
  }
}
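The precision_threshold option lets you trade memory for accuracy. It defines a unique count below which counts are expected to be close to accurate; above this value, counts may become fuzzier. The maximum supported value is 40000, and thresholds above this number have the same effect as a threshold of 40000. The default value is 3000.
# Pre-computed hashes
On string fields with a high cardinality, it may be faster to store the hash of the field values in the index and then run the cardinality aggregation on that field. This can be done either by providing hash values from the client side or by letting Elasticsearch compute them for you with the mapper-murmur3 plugin.
# Script
If you need the cardinality of the combination of two fields, create a runtime field that combines them and aggregate on it.
POST /sales/_search?size=0
{
  "runtime_mappings": {
    "type_and_promoted": {
      "type": "keyword",
      "script": "emit(doc['type'].value + ' ' + doc['promoted'].value)"
    }
  },
  "aggs": {
    "type_promoted_count": {
      "cardinality": {
        "field": "type_and_promoted"
      }
    }
  }
}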
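What the runtime field computes can be sketched in Python: each document gets one combined value, and the cardinality is the number of distinct combined values. `SALES` and both helper functions are hypothetical, and the count here is exact rather than approximate.

```python
# Hypothetical sales documents with the two fields used by the runtime script.
SALES = [
    {"type": "hat", "promoted": True},
    {"type": "hat", "promoted": False},
    {"type": "hat", "promoted": True},
    {"type": "bag", "promoted": False},
]

def type_and_promoted(doc):
    """Combine both fields into one keyword-like value, as the script does."""
    # Lower-case the boolean so the text resembles "hat true" / "hat false".
    return f"{doc['type']} {str(doc['promoted']).lower()}"

def combined_cardinality(docs):
    """Exact number of distinct (type, promoted) combinations."""
    return len({type_and_promoted(doc) for doc in docs})

print(combined_cardinality(SALES))
```

The four documents contain two distinct types but three distinct combinations, which is the number the combined field is meant to capture.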
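# Missing value
The missing parameter defines how documents that are missing a value should be treated. By default they are ignored, but they can also be treated as if they had a value.
In the request below, documents without a value in the tag field fall into the same bucket as documents that have the value N/A.
POST /sales/_search?size=0
{
  "aggs": {
    "tag_cardinality": {
      "cardinality": {
        "field": "tag",
        "missing": "N/A"
      }
    }
  }
}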
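The effect of the missing parameter can be sketched in Python with an exact count. `tag_cardinality` and `DOCS` are hypothetical names used only for this illustration.

```python
def tag_cardinality(docs, missing=None):
    """Count distinct tags; documents without a tag use `missing` if it is set."""
    tags = set()
    for doc in docs:
        tag = doc.get("tag", missing)
        if tag is not None:  # with no substitute, tagless documents are ignored
            tags.add(tag)
    return len(tags)

# Hypothetical documents: two carry a tag, two do not.
DOCS = [{"tag": "red"}, {"tag": "blue"}, {}, {"price": 10}]

print(tag_cardinality(DOCS))                 # tagless documents ignored
print(tag_cardinality(DOCS, missing="N/A"))  # tagless documents counted as "N/A"
```

All tagless documents share the single substitute value, so they add at most one to the count, and nothing at all if some document already carries the tag "N/A".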