一、电视案例
1.1 数据准备
创建索引及映射
建立价格、颜色、品牌、售卖日期 字段
PUT /tvs
PUT /tvs/_mapping { "properties": { "price": { "type": "long" }, "color": { "type": "keyword" }, "brand": { "type": "keyword" }, "sold_date": { "type": "date" } } }
插入数据
POST /tvs/_bulk {"index":{}} {"price":1000,"color":"红色","brand":"长虹","sold_date":"2019-10-28"} {"index":{}} {"price":2000,"color":"红色","brand":"长虹","sold_date":"2019-11-05"} {"index":{}} {"price":3000,"color":"绿色","brand":"小米","sold_date":"2019-05-18"} {"index":{}} {"price":1500,"color":"蓝色","brand":"TCL","sold_date":"2019-07-02"} {"index":{}} {"price":1200,"color":"绿色","brand":"TCL","sold_date":"2019-08-19"} {"index":{}} {"price":2000,"color":"红色","brand":"长虹","sold_date":"2019-11-05"} {"index":{}} {"price":8000,"color":"红色","brand":"三星","sold_date":"2020-01-01"} {"index":{}} {"price":2500,"color":"蓝色","brand":"小米","sold_date":"2020-02-12"}
1.2 统计哪种颜色的电视销量最高
不加query 默认查询全部
GET /tvs/_search { "size": 0, "aggs": { "popular_colors": { "terms": { "field": "color" } } } }
查询条件解析
- size:只获取聚合结果,而不要执行聚合的原始数据
- aggs:固定语法,要对一份数据执行分组聚合操作
- popular_colors:就是对每个aggs,都要起一个名字,
- terms:根据字段的值进行分组
- field:根据指定的字段的值进行分组
返回
{ "took" : 121, "timed_out" : false, "_shards" : { "total" : 1, "successful" : 1, "skipped" : 0, "failed" : 0 }, "hits" : { "total" : { "value" : 8, "relation" : "eq" }, "max_score" : null, "hits" : [ ] }, "aggregations" : { "popular_colors" : { "doc_count_error_upper_bound" : 0, "sum_other_doc_count" : 0, "buckets" : [ { "key" : "红色", "doc_count" : 4 }, { "key" : "绿色", "doc_count" : 2 }, { "key" : "蓝色", "doc_count" : 2 } ] } } }
返回结果解析
- hits.hits:我们指定了size是0,所以hits.hits就是空的
- aggregations:聚合结果
- popular_color:我们指定的某个聚合的名称
- buckets:根据我们指定的field划分出的buckets
- key:每个bucket对应的那个值
- doc_count:这个bucket分组内,有多少个数量,其实就是这种颜色的销量
- bucket中的数据的默认的排序规则:按照doc_count降序排序
1.3 统计每种颜色电视平均价格
GET /tvs/_search { "size": 0, "aggs": { "colors": { "terms": { "field": "color" }, "aggs": { "avg_price": { "avg": { "field": "price" } } } } } }
在一个aggs执行的bucket操作(terms),平级的json结构下,再加一个aggs,
这个第二个aggs内部,同样取个名字,执行一个metric操作,avg,对之前的每个bucket中的数据的指定的field,求一个平均值
返回:
{ "took" : 2, "timed_out" : false, "_shards" : { "total" : 1, "successful" : 1, "skipped" : 0, "failed" : 0 }, "hits" : { "total" : { "value" : 8, "relation" : "eq" }, "max_score" : null, "hits" : [ ] }, "aggregations" : { "colors" : { "doc_count_error_upper_bound" : 0, "sum_other_doc_count" : 0, "buckets" : [ { "key" : "红色", "doc_count" : 4, "avg_price" : { "value" : 3250.0 } }, { "key" : "绿色", "doc_count" : 2, "avg_price" : { "value" : 2100.0 } }, { "key" : "蓝色", "doc_count" : 2, "avg_price" : { "value" : 2000.0 } } ] } } }
返回结果解析:
- avg_price:我们自己取的metric aggs的名字
- value:我们的metric计算的结果,每个bucket中的数据的price字段求平均值后的结果
相当于sql: select avg(price) from tvs group by color
1.4 每个颜色下,平均价格及每个颜色下,每个品牌的平均价格
多个子聚合
GET /tvs/_search { "size": 0, "aggs": { "group_by_color": { "terms": { "field": "color" }, "aggs": { "color_avg_price": { "avg": { "field": "price" } }, "group_by_brand": { "terms": { "field": "brand" }, "aggs": { "brand_avg_price": { "avg": { "field": "price" } } } } } } } }
返回
查看代码
{ "took" : 2, "timed_out" : false, "_shards" : { "total" : 1, "successful" : 1, "skipped" : 0, "failed" : 0 }, "hits" : { "total" : { "value" : 8, "relation" : "eq" }, "max_score" : null, "hits" : [ ] }, "aggregations" : { "group_by_color" : { "doc_count_error_upper_bound" : 0, "sum_other_doc_count" : 0, "buckets" : [ { "key" : "红色", "doc_count" : 4, "color_avg_price" : { "value" : 3250.0 }, "group_by_brand" : { "doc_count_error_upper_bound" : 0, "sum_other_doc_count" : 0, "buckets" : [ { "key" : "长虹", "doc_count" : 3, "brand_avg_price" : { "value" : 1666.6666666666667 } }, { "key" : "三星", "doc_count" : 1, "brand_avg_price" : { "value" : 8000.0 } } ] } }, { "key" : "绿色", "doc_count" : 2, "color_avg_price" : { "value" : 2100.0 }, "group_by_brand" : { "doc_count_error_upper_bound" : 0, "sum_other_doc_count" : 0, "buckets" : [ { "key" : "TCL", "doc_count" : 1, "brand_avg_price" : { "value" : 1200.0 } }, { "key" : "小米", "doc_count" : 1, "brand_avg_price" : { "value" : 3000.0 } } ] } }, { "key" : "蓝色", "doc_count" : 2, "color_avg_price" : { "value" : 2000.0 }, "group_by_brand" : { "doc_count_error_upper_bound" : 0, "sum_other_doc_count" : 0, "buckets" : [ { "key" : "TCL", "doc_count" : 1, "brand_avg_price" : { "value" : 1500.0 } }, { "key" : "小米", "doc_count" : 1, "brand_avg_price" : { "value" : 2500.0 } } ] } } ] } } }
1.5 求出每个颜色的销售数量,平均价格、最小价格、最大价格、价格总和
GET /tvs/_search { "size": 0, "aggs": { "colors": { "terms": { "field": "color" }, "aggs": { "color_avg_price": { "avg": { "field": "price" } }, "color_min_price": { "min": { "field": "price" } }, "color_max_price": { "max": { "field": "price" } }, "color_sum_price": { "sum": { "field": "price" } } } } } }
返回:
查看代码
{ "took" : 4, "timed_out" : false, "_shards" : { "total" : 1, "successful" : 1, "skipped" : 0, "failed" : 0 }, "hits" : { "total" : { "value" : 8, "relation" : "eq" }, "max_score" : null, "hits" : [ ] }, "aggregations" : { "colors" : { "doc_count_error_upper_bound" : 0, "sum_other_doc_count" : 0, "buckets" : [ { "key" : "红色", "doc_count" : 4, "color_avg_price" : { "value" : 3250.0 }, "color_min_price" : { "value" : 1000.0 }, "color_max_price" : { "value" : 8000.0 }, "color_sum_price" : { "value" : 13000.0 } }, { "key" : "绿色", "doc_count" : 2, "color_avg_price" : { "value" : 2100.0 }, "color_min_price" : { "value" : 1200.0 }, "color_max_price" : { "value" : 3000.0 }, "color_sum_price" : { "value" : 4200.0 } }, { "key" : "蓝色", "doc_count" : 2, "color_avg_price" : { "value" : 2000.0 }, "color_min_price" : { "value" : 1500.0 }, "color_max_price" : { "value" : 2500.0 }, "color_sum_price" : { "value" : 4000.0 } } ] } } }
返回结果解析
- count:bucket,terms,自动就会有一个doc_count,就相当于是count
- avg:avg aggs,求平均值
- max:求一个bucket内,指定field值最大的那个数据
- min:求一个bucket内,指定field值最小的那个数据
- sum:求一个bucket内,指定field值的总和
1.6 划分范围 histogram(直方图),求出价格每2000为一个区间,每个区间的销售总额
GET /tvs/_search { "size": 0, "aggs": { "price": { "histogram": { "field": "price", "interval": 2000 }, "aggs": { "income": { "sum": { "field": "price" } } } } } }
histogram:类似于terms,也是进行bucket分组操作,接收一个field,按照这个field的值的各个范围区间,进行bucket分组操作
"histogram": { "field": "price", "interval": 2000 }
interval:2000,划分范围,左闭右开区间 ,[0~2000),2000~4000,4000~6000,6000~8000,8000~10000
bucket有了之后,一样的,去对每个bucket执行avg,count,sum,max,min,等各种metric操作,聚合分析
1.7 按照日期分组聚合,求出每个月销售个数
参数解析:
- date_histogram,按照我们指定的某个date类型的日期field,以及日期interval,按照一定的日期间隔,去划分bucket
- min_doc_count:即使某个日期interval,2017-01-01~2017-01-31中,一条数据都没有,那么这个区间也是要返回的,不然默认是会过滤掉这个区间的 extended_bounds,
- min,max:划分bucket的时候,会限定在这个起始日期,和截止日期内
GET /tvs/_search { "size" : 0, "aggs": { "date_sales": { "date_histogram": { "field": "sold_date", "interval": "month", "format": "yyyy-MM-dd", "min_doc_count" : 0, "extended_bounds" : { "min" : "2019-01-01", "max" : "2020-12-31" } } } } }
返回
查看代码
#! Deprecation: [interval] on [date_histogram] is deprecated, use [fixed_interval] or [calendar_interval] in the future. { "took" : 11, "timed_out" : false, "_shards" : { "total" : 1, "successful" : 1, "skipped" : 0, "failed" : 0 }, "hits" : { "total" : { "value" : 8, "relation" : "eq" }, "max_score" : null, "hits" : [ ] }, "aggregations" : { "date_sales" : { "buckets" : [ { "key_as_string" : "2019-01-01", "key" : 1546300800000, "doc_count" : 0 }, { "key_as_string" : "2019-02-01", "key" : 1548979200000, "doc_count" : 0 }, { "key_as_string" : "2019-03-01", "key" : 1551398400000, "doc_count" : 0 }, { "key_as_string" : "2019-04-01", "key" : 1554076800000, "doc_count" : 0 }, { "key_as_string" : "2019-05-01", "key" : 1556668800000, "doc_count" : 1 }, { "key_as_string" : "2019-06-01", "key" : 1559347200000, "doc_count" : 0 }, { "key_as_string" : "2019-07-01", "key" : 1561939200000, "doc_count" : 1 }, { "key_as_string" : "2019-08-01", "key" : 1564617600000, "doc_count" : 1 }, { "key_as_string" : "2019-09-01", "key" : 1567296000000, "doc_count" : 0 }, { "key_as_string" : "2019-10-01", "key" : 1569888000000, "doc_count" : 1 }, { "key_as_string" : "2019-11-01", "key" : 1572566400000, "doc_count" : 2 }, { "key_as_string" : "2019-12-01", "key" : 1575158400000, "doc_count" : 0 }, { "key_as_string" : "2020-01-01", "key" : 1577836800000, "doc_count" : 1 }, { "key_as_string" : "2020-02-01", "key" : 1580515200000, "doc_count" : 1 }, { "key_as_string" : "2020-03-01", "key" : 1583020800000, "doc_count" : 0 }, { "key_as_string" : "2020-04-01", "key" : 1585699200000, "doc_count" : 0 }, { "key_as_string" : "2020-05-01", "key" : 1588291200000, "doc_count" : 0 }, { "key_as_string" : "2020-06-01", "key" : 1590969600000, "doc_count" : 0 }, { "key_as_string" : "2020-07-01", "key" : 1593561600000, "doc_count" : 0 }, { "key_as_string" : "2020-08-01", "key" : 1596240000000, "doc_count" : 0 }, { "key_as_string" : "2020-09-01", "key" : 1598918400000, "doc_count" : 0 }, { "key_as_string" : "2020-10-01", "key" : 1601510400000, "doc_count" : 0 }, { "key_as_string" : "2020-11-01", "key" : 1604188800000, "doc_count" : 0 }, { "key_as_string" : "2020-12-01", "key" : 1606780800000, "doc_count" : 0 } ] } } }
注意:
#! Deprecation: [interval] on [date_histogram] is deprecated, use [fixed_interval] or [calendar_interval] in the future.
1.8 统计每季度每个品牌的销售额,及每季度的销售总额
GET /tvs/_search { "size": 0, "aggs": { "group_by_sold_date": { "date_histogram": { "field": "sold_date", "interval": "quarter", "format": "yyyy-MM-dd", "min_doc_count": 0, "extended_bounds": { "min": "2019-01-01", "max": "2020-12-31" } }, "aggs": { "group_by_brand": { "terms": { "field": "brand" }, "aggs": { "sum_price": { "sum": { "field": "price" } } } }, "total_sum_price": { "sum": { "field": "price" } } } } } }
返回
查看代码
#! Deprecation: [interval] on [date_histogram] is deprecated, use [fixed_interval] or [calendar_interval] in the future. { "took" : 3, "timed_out" : false, "_shards" : { "total" : 1, "successful" : 1, "skipped" : 0, "failed" : 0 }, "hits" : { "total" : { "value" : 8, "relation" : "eq" }, "max_score" : null, "hits" : [ ] }, "aggregations" : { "group_by_sold_date" : { "buckets" : [ { "key_as_string" : "2019-01-01", "key" : 1546300800000, "doc_count" : 0, "total_sum_price" : { "value" : 0.0 }, "group_by_brand" : { "doc_count_error_upper_bound" : 0, "sum_other_doc_count" : 0, "buckets" : [ ] } }, { "key_as_string" : "2019-04-01", "key" : 1554076800000, "doc_count" : 1, "total_sum_price" : { "value" : 3000.0 }, "group_by_brand" : { "doc_count_error_upper_bound" : 0, "sum_other_doc_count" : 0, "buckets" : [ { "key" : "小米", "doc_count" : 1, "sum_price" : { "value" : 3000.0 } } ] } }, { "key_as_string" : "2019-07-01", "key" : 1561939200000, "doc_count" : 2, "total_sum_price" : { "value" : 2700.0 }, "group_by_brand" : { "doc_count_error_upper_bound" : 0, "sum_other_doc_count" : 0, "buckets" : [ { "key" : "TCL", "doc_count" : 2, "sum_price" : { "value" : 2700.0 } } ] } }, { "key_as_string" : "2019-10-01", "key" : 1569888000000, "doc_count" : 3, "total_sum_price" : { "value" : 5000.0 }, "group_by_brand" : { "doc_count_error_upper_bound" : 0, "sum_other_doc_count" : 0, "buckets" : [ { "key" : "长虹", "doc_count" : 3, "sum_price" : { "value" : 5000.0 } } ] } }, { "key_as_string" : "2020-01-01", "key" : 1577836800000, "doc_count" : 2, "total_sum_price" : { "value" : 10500.0 }, "group_by_brand" : { "doc_count_error_upper_bound" : 0, "sum_other_doc_count" : 0, "buckets" : [ { "key" : "三星", "doc_count" : 1, "sum_price" : { "value" : 8000.0 } }, { "key" : "小米", "doc_count" : 1, "sum_price" : { "value" : 2500.0 } } ] } }, { "key_as_string" : "2020-04-01", "key" : 1585699200000, "doc_count" : 0, "total_sum_price" : { "value" : 0.0 }, "group_by_brand" : { "doc_count_error_upper_bound" : 0, "sum_other_doc_count" : 0, "buckets" : [ ] } }, { "key_as_string" : "2020-07-01", "key" : 1593561600000, "doc_count" : 0, "total_sum_price" : { "value" : 0.0 }, "group_by_brand" : { "doc_count_error_upper_bound" : 0, "sum_other_doc_count" : 0, "buckets" : [ ] } }, { "key_as_string" : "2020-10-01", "key" : 1601510400000, "doc_count" : 0, "total_sum_price" : { "value" : 0.0 }, "group_by_brand" : { "doc_count_error_upper_bound" : 0, "sum_other_doc_count" : 0, "buckets" : [ ] } } ] } } }
1.9 搜索与聚合结合,查询某个品牌按颜色销量
搜索与聚合可以结合起来。sql语句如下
select count(*) from tvs where brand like "%小米%" group by color
注意:任何的聚合,都必须在搜索出来的结果数据中之行。
GET /tvs/_search { "size": 0, "query": { "term": { "brand": { "value": "小米" } } }, "aggs": { "group_by_color": { "terms": { "field": "color" } } } }
返回
{ "took" : 0, "timed_out" : false, "_shards" : { "total" : 1, "successful" : 1, "skipped" : 0, "failed" : 0 }, "hits" : { "total" : { "value" : 2, "relation" : "eq" }, "max_score" : null, "hits" : [ ] }, "aggregations" : { "group_by_color" : { "doc_count_error_upper_bound" : 0, "sum_other_doc_count" : 0, "buckets" : [ { "key" : "绿色", "doc_count" : 1 }, { "key" : "蓝色", "doc_count" : 1 } ] } } }
1.10 global bucket(全局桶):单个品牌与所有品牌销量对比
GET /tvs/_search { "size": 0, "query": { "term": { "brand": { "value": "小米" } } }, "aggs": { "single_brand_avg_price": { "avg": { "field": "price" } }, "all": { "global": {}, "aggs": { "all_brand_avg_price": { "avg": { "field": "price" } } } } } }
返回
{ "took" : 61, "timed_out" : false, "_shards" : { "total" : 1, "successful" : 1, "skipped" : 0, "failed" : 0 }, "hits" : { "total" : { "value" : 2, "relation" : "eq" }, "max_score" : null, "hits" : [ ] }, "aggregations" : { "all" : { "doc_count" : 8, "all_brand_avg_price" : { "value" : 2650.0 } }, "single_brand_avg_price" : { "value" : 2750.0 } } }
返回结果解析:
- 一个结果,是基于query搜索结果来聚合的;
- 一个结果,是对所有数据执行聚合的
1.11 统计价格大于1200的电视平均价格
注意:单独使用filter 需加上constant_score
GET /tvs/_search { "size": 0, "query": { "constant_score": { "filter": { "range": { "price": { "gte": 1200 } } } } }, "aggs": { "avg_price": { "avg": { "field": "price" } } } }
返回:
{ "took" : 1, "timed_out" : false, "_shards" : { "total" : 1, "successful" : 1, "skipped" : 0, "failed" : 0 }, "hits" : { "total" : { "value" : 7, "relation" : "eq" }, "max_score" : null, "hits" : [ ] }, "aggregations" : { "avg_price" : { "value" : 2885.714285714286 } } }
1.12 bucket filter:统计品牌最近4年,3年的平均价格
注意:因为是最近的时间,所以读者实验的时候,需根据当前时间来自行设置查询范围
注意下面的区别
- aggs.filter,针对的是聚合去做的
- query里面的filter,是全局的,会对所有的数据都有影响
GET /tvs/_search { "size": 0, "query": { "term": { "brand": { "value": "小米" } } }, "aggs": { "recent_fouryear": { "filter": { "range": { "sold_date": { "gte": "now-4y" } } }, "aggs": { "recent_fouryear_avg_price": { "avg": { "field": "price" } } } }, "recent_threeyear": { "filter": { "range": { "sold_date": { "gte": "now-3y" } } }, "aggs": { "recent_threeyear_avg_price": { "avg": { "field": "price" } } } } } }
返回
{ "took" : 0, "timed_out" : false, "_shards" : { "total" : 1, "successful" : 1, "skipped" : 0, "failed" : 0 }, "hits" : { "total" : { "value" : 2, "relation" : "eq" }, "max_score" : null, "hits" : [ ] }, "aggregations" : { "recent_threeyear" : { "meta" : { }, "doc_count" : 2, "recent_threeyear_avg_price" : { "value" : 2750.0 } }, "recent_fouryear" : { "meta" : { }, "doc_count" : 2, "recent_fouryear_avg_price" : { "value" : 2750.0 } } } }
1.13 按每种颜色的平均销售额降序排序
GET /tvs/_search { "size": 0, "aggs": { "group_by_color": { "terms": { "field": "color", "order": { "avg_price": "desc" } }, "aggs": { "avg_price": { "avg": { "field": "price" } } } } } }
返回:
{ "took" : 0, "timed_out" : false, "_shards" : { "total" : 1, "successful" : 1, "skipped" : 0, "failed" : 0 }, "hits" : { "total" : { "value" : 8, "relation" : "eq" }, "max_score" : null, "hits" : [ ] }, "aggregations" : { "group_by_color" : { "doc_count_error_upper_bound" : 0, "sum_other_doc_count" : 0, "buckets" : [ { "key" : "红色", "doc_count" : 4, "avg_price" : { "value" : 3250.0 } }, { "key" : "绿色", "doc_count" : 2, "avg_price" : { "value" : 2100.0 } }, { "key" : "蓝色", "doc_count" : 2, "avg_price" : { "value" : 2000.0 } } ] } } }
1.14 按每种颜色的每种品牌平均销售额降序排序
GET /tvs/_search { "size": 0, "aggs": { "group_by_color": { "terms": { "field": "color" }, "aggs": { "group_by_brand": { "terms": { "field": "brand", "order": { "avg_price": "desc" } }, "aggs": { "avg_price": { "avg": { "field": "price" } } } } } } } }
返回
查看代码
{ "took" : 1, "timed_out" : false, "_shards" : { "total" : 1, "successful" : 1, "skipped" : 0, "failed" : 0 }, "hits" : { "total" : { "value" : 8, "relation" : "eq" }, "max_score" : null, "hits" : [ ] }, "aggregations" : { "group_by_color" : { "doc_count_error_upper_bound" : 0, "sum_other_doc_count" : 0, "buckets" : [ { "key" : "红色", "doc_count" : 4, "group_by_brand" : { "doc_count_error_upper_bound" : 0, "sum_other_doc_count" : 0, "buckets" : [ { "key" : "三星", "doc_count" : 1, "avg_price" : { "value" : 8000.0 } }, { "key" : "长虹", "doc_count" : 3, "avg_price" : { "value" : 1666.6666666666667 } } ] } }, { "key" : "绿色", "doc_count" : 2, "group_by_brand" : { "doc_count_error_upper_bound" : 0, "sum_other_doc_count" : 0, "buckets" : [ { "key" : "小米", "doc_count" : 1, "avg_price" : { "value" : 3000.0 } }, { "key" : "TCL", "doc_count" : 1, "avg_price" : { "value" : 1200.0 } } ] } }, { "key" : "蓝色", "doc_count" : 2, "group_by_brand" : { "doc_count_error_upper_bound" : 0, "sum_other_doc_count" : 0, "buckets" : [ { "key" : "小米", "doc_count" : 1, "avg_price" : { "value" : 2500.0 } }, { "key" : "TCL", "doc_count" : 1, "avg_price" : { "value" : 1500.0 } } ] } } ] } } }