Queries (Tindex-Query-Json)

A Query is a request for data. Druid supports several query types.

Queries are sent as HTTP POST requests to: http://{broker_ip:broker_port}/druid/v2/?pretty
e.g. http://192.168.0.223:8082/druid/v2/?pretty
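
A query body can be posted to this endpoint with any HTTP client. A minimal sketch in Python, assuming the requests library and a reachable broker at the example address above, using the Timeseries query from section 1 below as the body:

import json
import requests

# Broker endpoint from the example above; replace with your own broker address.
BROKER_URL = "http://192.168.0.223:8082/druid/v2/?pretty"

# The Timeseries query body from section 1 below.
query = {
    "queryType": "lucene_timeseries",
    "dataSource": "userinfo",
    "intervals": "1000/3000",
    "granularity": "all",
    "aggregations": [{"name": "__VALUE__", "type": "lucene_count"}]
}

response = requests.post(BROKER_URL, json=query, timeout=30)
print(json.dumps(response.json(), indent=2, ensure_ascii=False))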

1. Timeseries

When you need aggregate statistics over a time range, or aggregates at a specified time granularity, Druid provides the Timeseries query.

An example query:

{
    "queryType": "lucene_timeseries",
    "dataSource": "userinfo",
    "intervals": "1000/3000",
    "granularity": " all",
    "context": {
        "timeout": 180000,
        "useOffheap": true,
        "groupByStrategy": "v2"
    },
    "aggregations": [
        {
            "name": "__VALUE__",
            "type": "lucene_count"
        }
    ]
}

This is equivalent to the SQL statement: select count(*) from userinfo

The output might look like this:

[
    {
        "timestamp": "2017-01-01T00:00:00.000Z",
        "result": {
            "__VALUE__": 100000
        }
    }
]

A Timeseries query consists of the following fields:

queryType: For a Timeseries query, this must be lucene_timeseries.
dataSource: Name of the dataSource to query. See dataSource.
intervals: Time range of the query, in ISO-8601 format. See interval.
granularity: Time granularity at which the results are aggregated.
filter: Filter conditions. See filter.
aggregations: Aggregations. See aggregation.
postAggregations: Post-aggregations. See post-aggregation.
descending: Whether results are returned in descending order.
context: Query parameters, e.g. whether results are cached.

Timeseries outputs, for each time bucket, statistics for the specified conditions: the filter field specifies the filtering conditions, and aggregations and postAggregations specify how values are aggregated (a combined sketch follows the list below).

Timeseries cannot output dimension values. granularity supports all, none, second, minute, fifteen_minute, thirty_minute, hour, day, week, month, quarter, year:

  • all: everything is rolled up into a single output row.
  • none: not recommended.
  • any other value: statistics are output at that granularity.
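
As a combined sketch (expressed as a Python dict and posted the same way as shown at the top of this page), the query below asks for per-day counts restricted by a filter. The selector filter type is borrowed from the UserGroup example later on this page; the interval and province value are only illustrative.

# Hypothetical variant of the Timeseries example above: per-day counts for one province.
daily_count_query = {
    "queryType": "lucene_timeseries",
    "dataSource": "userinfo",
    "intervals": "2017-01-01/2017-06-01",    # illustrative ISO-8601 interval
    "granularity": "day",                    # one result row per day
    "filter": {                              # selector filter, as in the UserGroup example
        "type": "selector",
        "dimension": "province",
        "value": "广东省"
    },
    "aggregations": [{"name": "__VALUE__", "type": "lucene_count"}]
}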

2. TopN

TopN returns an ordered top-n sequence for a given dimension, sorted by a given field. It returns the first N records and lets you specify a metric as the sort key.

An example query:

{
    "queryType": "lucene_topN",
    "dataSource": "userinfo",
    "intervals": "1000/3000",
    "granularity": "all",
    "dimension":"userId",
    "threshold":3,
    "metric":{
        "type":"numeric",
        "metric":"sum(age)"
    },
    "aggregations":[
        {
            "name": "sum(age)",
            "type": "lucene_doubleSum",
            "fieldName": "age"
        }
    ]
}

A TopN query consists of the following fields:

queryType: For a TopN query, this must be lucene_topN.
dataSource: Name of the dataSource to query. See dataSource.
intervals: Time range of the query, in ISO-8601 format. See interval.
granularity: Time granularity at which the results are aggregated.
filter: Filter conditions. See filter.
aggregations: Aggregations. See aggregation.
postAggregations: Post-aggregators. See post-aggregation.
dimension: The dimension to run TopN over; a TopN query specifies exactly one dimension. See dimension.
threshold: The N in TopN.
metric: The metric to compute and sort by.
context: Query parameters, e.g. whether results are cached.
  • metric: specific to TopN; it specifies the sort key and can take the following forms (a sketch follows at the end of this section):
"metric":"<metric_name>" //默认方式,升序排序
"metric":{
    "type":"numeric",   //指定按照numeric 降序排列
    "metric":"<metric_name>"
}
"metric":{
    "type":"inverted",     //指定按照numeric 升序排列
    "metric":"<delegate_top_n_metric_spec>"
}
"metric":{
    "type":"lexicographic", //指定按照字典序排序
    "previousStop":"<previousStop_value>", //如b,按照字典序,排到b开头的为止
}
"metric":{
    "type":"alphaNumeric",  //指定数字排序
    "previousStop":"<previousStop_value>"
}

Note that TopN is an approximate algorithm: each segment returns its top 1000 entries, which are then merged into the final result. If the dimension's cardinality is at most 1000 the result is exact; above 1000 it is an approximation.
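
For instance, to flip the TopN example above from descending to ascending order, the numeric metric spec can be wrapped in an inverted spec, as listed among the metric forms above. A minimal sketch of just the metric field, written as a Python dict:

# Ascending variant of the metric spec used in the TopN example above.
ascending_metric = {
    "type": "inverted",
    "metric": {"type": "numeric", "metric": "sum(age)"}
}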

3. GroupBy

3.1 GroupBy

GroupBy is similar to SQL's group by: it groups on multiple specified dimensions, supports sorting by specified dimensions, limits the number of output rows, and also supports having.

An example query:

{
    "queryType": "lucene_groupBy",
    "dataSource": "userinfo",
    "intervals": "1000/3000",
    "granularity": "all",
    "context": {
        "timeout": 180000,
        "useOffheap": true,
        "groupByStrategy": "v2"
    },
    "dimensions": [
        {
            "type": "default",
            "dimension": "province",
            "outputName": "province"
        }
    ],
    "aggregations": [
        {
            "name": "sum(age)",
            "type": "lucene_doubleSum",
            "fieldName": "age"
        }
    ],
    "limitSpec": {
        "type": "default",
        "columns": [
            {
                "dimension": "province"
            }
        ],
        "limit": 3
    }
}

This is equivalent to the SQL statement: select province, sum(age) from userinfo group by province limit 3;

The query result:

[
    {
        "v": "v1",
        "timestamp": "1000-01-01T00:00:00.000Z",
        "event": {
            "province": "上海市",
            "sum(age)": 56642
        }
    },
    {
        "v": "v1",
        "timestamp": "1000-01-01T00:00:00.000Z",
        "event": {
            "province": "云南省",
            "sum(age)": 57850
        }
    },
    {
        "v": "v1",
        "timestamp": "1000-01-01T00:00:00.000Z",
        "event": {
            "province": "内蒙古",
            "sum(age)": 56473
        }
    }
]

A GroupBy query consists of the following fields:

queryType: For a GroupBy query, this must be lucene_groupBy.
dataSource: Name of the dataSource to query. See dataSource.
dimensions: The set of dimensions to group by. See dimension.
limitSpec: Sorts the results and limits the number of returned rows.
having: Filters the aggregated results. See having.
granularity: Time granularity at which the results are aggregated.
filter: Filter conditions. See filter.
aggregations: Aggregations. See aggregation.
postAggregations: Post-aggregators. See post-aggregation.
intervals: Time range of the query, in ISO-8601 format. See interval.
context: Query parameters, e.g. whether results are cached.

The fields specific to GroupBy are limitSpec and having.

  • limitSpec

Specifies the sort order and the row limit. A JSON example:

{
    "type":"default",
    "limit":<integer_value>,
    "columns":[list of OrderByColumnSpec]
}

Here columns is an array and may list several sort fields; a sort field can be a dimension or a metric. Each entry is written as:

{
    "dimension":"<Any dimension or metric name>",
    "direction":<"ascending"|"descending">
}

For example:

"limitSpec":{
    "type":"default",
    "limit":1000,
    "columns":[
        {
            "dimension":"visitor_count",
            "direction":"descending"
        },
        {
            "dimension":"click_visitor_count",
            "direction":"ascending"
        }
    ]
}
  • having

    Similar to SQL's having: it filters the GroupBy results. See having. A minimal sketch follows below.
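
The sketch below assumes the greaterThan having type that also appears in the MultiHaving section later on; the threshold value is made up for illustration.

# Keep only provinces whose sum(age) exceeds an illustrative threshold.
having = {
    "type": "greaterThan",
    "aggregation": "sum(age)",   # must name an aggregator defined in the same query
    "value": 50000
}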

The GroupBy described above only groups on String dimensions. The NumericGroup, CustomGroup, and NumericGroupBy variants below support grouping on numeric dimensions (int, float, and double types).

3.2 NumericGroup

NumericGroup groups a numeric dimension into buckets of a specified interval (the interval must be greater than 1).

An example query:

{
    "queryType": "lucene_groupBy",
    "dataSource": "userinfo",
    "intervals": "1000/3000",
    "granularity": "all",
    "context": {
        "timeout": 180000,
        "useOffheap": true,
        "groupByStrategy": "v2"
    },
    "dimensions": [
        {
            "type": "numericGroup",
            "dimension": "age",
            "outputName": "groupAge",
            "interval":5,
            "min":10,
            "max":29

        }
    ],
    "aggregations": [
        {
            "name": "sum(average)",
            "type": "lucene_doubleSum",
            "fieldName": "average"
        }
    ],
    "limitSpec": {
        "type": "default",
        "columns": [
            {
                "dimension": "sum(average)"
            }
        ],
        "limit": 3
    }
}

The query result:

[
    {
        "v": "v1",
        "timestamp": "1000-01-01T00:00:00.000Z",
        "event": {
            "groupAge": "25.000000~29.000000",
            "sum(average)": 1248982.8500000006
        }
    },
    {
        "v": "v1",
        "timestamp": "1000-01-01T00:00:00.000Z",
        "event": {
            "groupAge": "20.000000~25.000000",
            "sum(average)": 1252797.83
        }
    },
    {
        "v": "v1",
        "timestamp": "1000-01-01T00:00:00.000Z",
        "event": {
            "groupAge": "15.000000~20.000000",
            "sum(average)": 1258379.9299999997
        }
    }
]

Compared with GroupBy, NumericGroup differs only in the dimension spec inside dimensions: it uses a dimension of type numericGroup. See dimension.

3.3 CustomGroup

CustomGroup groups a numeric dimension into custom, user-defined buckets.

An example query:

{
    "queryType": "lucene_groupBy",
    "dataSource": "userinfo",
    "intervals": "1000/3000",
    "granularity": "all",
    "context": {
        "timeout": 180000,
        "useOffheap": true,
        "groupByStrategy": "v2"
    },
    "dimensions": [
        {
            "type": "customGroup",
            "dimension": "age",
            "outputName": "groupAge",
            "groups":[
                {
                    "name": "children",
                    "lower": 10,
                    "upper": 18
                },
                {
                    "name": "young people",
                    "lower": 18,
                    "upper": 29
                }
            ],
            "outOfBound":true

        }
    ],
    "aggregations": [
        {
            "name": "sum(average)",
            "type": "lucene_doubleSum",
            "fieldName": "average"
        }
    ],
    "limitSpec": {
        "type": "default",
        "columns": [
            {
                "dimension": "sum(average)"
            }
        ],
        "limit": 3
    }
}

The query result:

[
    {
        "v": "v1",
        "timestamp": "1000-01-01T00:00:00.000Z",
        "event": {
            "groupAge": "OUT_OF_BOUND",
            "sum(average)": 249973.47
        }
    },
    {
        "v": "v1",
        "timestamp": "1000-01-01T00:00:00.000Z",
        "event": {
            "groupAge": "children",
            "sum(average)": 2026039.5100000002
        }
    },
    {
        "v": "v1",
        "timestamp": "1000-01-01T00:00:00.000Z",
        "event": {
            "groupAge": "young people",
            "sum(average)": 2750655.9500000016
        }
    }
]

Compared with GroupBy, CustomGroup differs only in the dimension spec inside dimensions: it uses a dimension of type customGroup. See dimension.

3.4 NumericGroupBy

NumericGroupBy groups a numeric dimension with an interval of 1.

An example query:

{
    "queryType": "lucene_groupBy",
    "dataSource": "userinfo",
    "intervals": "1000/3000",
    "granularity": "all",
    "context": {
        "timeout": 180000,
        "useOffheap": true,
        "groupByStrategy": "v2"
    },
    "dimensions": [
        {
            "type": "numericGroupBy",
            "dimension": "age",
            "outputName": "groupAge"

        }
    ],
    "aggregations": [
        {
            "name": "sum(average)",
            "type": "lucene_doubleSum",
            "fieldName": "average"
        }
    ],
    "limitSpec": {
        "type": "default",
        "columns": [
            {
                "dimension": "sum(average)"
            }
        ],
        "limit": 3
    }
}

The query result:

[
    {
        "v": "v1",
        "timestamp": "1000-01-01T00:00:00.000Z",
        "event": {
            "groupAge": "14",
            "sum(average)": 240744.52999999997
        }
    },
    {
        "v": "v1",
        "timestamp": "1000-01-01T00:00:00.000Z",
        "event": {
            "groupAge": "28",
            "sum(average)": 242116.0299999999
        }
    },
    {
        "v": "v1",
        "timestamp": "1000-01-01T00:00:00.000Z",
        "event": {
            "groupAge": "19",
            "sum(average)": 243802.67
        }
    }
]

Compared with GroupBy, NumericGroupBy differs only in the dimension spec inside dimensions: it uses a dimension of type numericGroupBy. See dimension.

3.5 NestGroupBy

NestGroupBy performs an aggregation (groupBy) on top of the result set of an inner query, using the inner result as the table for the outer query, as in:

SELECT sum(countResult) as sumResult
FROM (SELECT count(*) as countResult
      FROM table_name
      GROUP BY user_id)
group by countResult;

An example query:

{
    "queryType": "nest_groupBy",
    "dataSource": "shopping0123",
    "intervals": "1000/3000",
    "granularity": "all",
    "context": {
        "timeout": 180000,
        "groupByStrategy": "v2"
    },
    "dimensions": [
        {
            "dimension": "user_id"
        }
    ],
    "aggregations": [
        {
            "name": "total",
            "type": "lucene_count"
        }
    ],
    "nestSpec": {
        "nestGranularity": "all",
        "nestDimensions": [
            {
                "dimension": "total"
            }
        ],
        "nestAggregations": [
            {
                "name": "total",
                "type": "longSum",
                "fieldName": "total"
            }
        ],
        "nestLimitSpec": {
            "type": "default",
            "columns": [
                {
                    "dimension": "total",
                    "direction": "DESCENDING"
                }
            ],
            "limit": 2
        }
    }
}

The query result:

[
  {
    "v": "v1",
    "timestamp": "1000-01-01T00:00:00.000Z",
    "event": {
      "total": 678691
    }
  },
  {
    "v": "v1",
    "timestamp": "1000-01-01T00:00:00.000Z",
    "event": {
      "total": 11609
    }
  }
]

The nestSpec fields:

nestGranularity: Time granularity for the outer (nested) aggregation.
nestDimensions: The set of dimensions for the outer GroupBy. See dimension.
nestAggregations: Aggregations for the outer query. Used the same way as aggregation, except the type cannot be a lucene_* type; only the native types listed in the table below may be used.
nestPostAggregations: Post-aggregators. See post-aggregation.
nestLimitSpec: Sorts the outer results and limits the number of returned rows.
nestHaving: Filters the outer results. See having.

The native types:

count: Equivalent to SQL count(*). See Count Aggregation.
longSum: Equivalent to SQL sum; input values are int or long. See LongSum Aggregation.
doubleSum: Equivalent to SQL sum; input values may be int, long, or float. See DoubleSum Aggregation.
doubleMax: Maximum of the results, returned as a double; input types int, long, and float are supported. See DoubleMax Aggregation.
doubleMin: Minimum of the results, returned as a double; input types int, long, and float are supported. See DoubleMin Aggregation.
longMax: Maximum of the results; input types are int and long. See LongMax Aggregation.
longMin: Minimum of the results; input types are int and long. See LongMin Aggregation.
javascript: If the aggregators above do not meet your needs, Druid also provides JavaScript aggregation: you write your own JavaScript function, and the specified columns are passed to it as arguments. See JavaScript Aggregation.
hyperUnique: At query time, uses the HyperLogLog algorithm to estimate the cardinality of the given set of dimensions. See HyperUnique Aggregation.
cardinality: At query time, uses the HyperLogLog algorithm to estimate the cardinality of the given set of dimensions, roughly a distinct(). See Cardinality Aggregation.
filtered: Attaches a filter to an aggregation so that only rows matching the filter are aggregated, which can improve efficiency. See Filtered Aggregation.

4. Select

Select is similar to SQL's select: it is used to inspect the data stored in Druid, viewing the chosen dimensions and metrics for a given filter and time range. The sort order can be set with the descending field and results can be pulled page by page, but aggregations and postAggregations are not supported.

A JSON example:

{
    "queryType": "lucene_select",
    "dataSource": "userinfo",
    "intervals": "1000/3000",
    "granularity": "all",
    "context": {
        "timeout": 180000,
        "useOffheap": true,
        "groupByStrategy": "v2"
    },
    "dimensions": [
        "province"
    ],
    "pagingSpec": {
        "pagingIdentifiers": {},
        "threshold": 3
    }
}

This is equivalent to the SQL statement: select province from userinfo limit 3;

The query result:

[
  {
    "timestamp": "2017-01-01T00:00:00.000Z",
    "result": {
      "pagingIdentifiers": {
        "userinfo_2017-01-01T00:00:00.000Z_2018-01-01T00:00:00.000Z_2017-05-17T08:08:34.224Z": 2
      },
      "events": [
        {
          "segmentId": "userinfo_2017-01-01T00:00:00.000Z_2018-01-01T00:00:00.000Z_2017-05-17T08:08:34.224Z",
          "offset": 0,
          "event": {
            "timestamp": "2017-05-15T07:54:40.918Z",
            "province": "宁夏"
          }
        },
        {
          "segmentId": "userinfo_2017-01-01T00:00:00.000Z_2018-01-01T00:00:00.000Z_2017-05-17T08:08:34.224Z",
          "offset": 1,
          "event": {
            "timestamp": "2017-05-15T07:54:42.481Z",
            "province": "贵州省"
          }
        },
        {
          "segmentId": "userinfo_2017-01-01T00:00:00.000Z_2018-01-01T00:00:00.000Z_2017-05-17T08:08:34.224Z",
          "offset": 2,
          "event": {
            "timestamp": "2017-05-15T07:54:43.780Z",
            "province": "内蒙古"
          }
        }
      ]
    }
  }
]

pagingSpec specifies the offsets and the number of entries to pull per page; the result returns the offsets to use for the next pull. A JSON example, followed by a paging sketch:

{
    "pagingSpec":{
        "pagingIdentifiers":{},
        "thershold":5,
        "fromNext":true
    }
}
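
A sketch of pulling several pages in a row, assuming the requests library and the broker endpoint from the top of this page: the pagingIdentifiers returned by each page are fed back into the next request.

import requests

BROKER_URL = "http://192.168.0.223:8082/druid/v2/?pretty"

select_query = {
    "queryType": "lucene_select",
    "dataSource": "userinfo",
    "intervals": "1000/3000",
    "granularity": "all",
    "dimensions": ["province"],
    "pagingSpec": {"pagingIdentifiers": {}, "threshold": 3, "fromNext": True}
}

for _ in range(3):  # pull three pages
    rows = requests.post(BROKER_URL, json=select_query, timeout=30).json()
    if not rows or not rows[0]["result"]["events"]:
        break
    for item in rows[0]["result"]["events"]:
        print(item["event"])
    # Reuse the offsets returned by this page as the starting point of the next one.
    select_query["pagingSpec"]["pagingIdentifiers"] = rows[0]["result"]["pagingIdentifiers"]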

5. Search

The Search query returns the dimension values that were matched; it is similar to a topN operation in SQL, but supports more kinds of matching. A JSON example:

{
    "queryType":"lucene_search",
    "dataSource":"userinfo",
    "granularity":"day",
    "intervals": "1000/3000",
    "limit":1,
    "searchDimensions":[
        "province",
        "time"
    ],
    "sort":{
        "type":"lexicographic"
    }
}
  • searchDimensions: the dimensions to search

Note that Search only returns the matched dimension values and does not support other aggregation operations. To use a search condition in TopN, GroupBy, Timeseries, and similar queries, specify the matching rules in the filter field; the filter field also supports regular-expression matching. The query result:

[
    {
        "timestamp": "2017-01-01T00:00:00.000Z",
        "result": [
            {
                "dimension": "province",
                "value": "新疆",
                "count": 1
            },
            {
                "dimension": "province",
                "value": "青海省",
                "count": 1
            },
            {
                "dimension": "province",
                "value": "黑龙江",
                "count": 1
            }
        ]
    }
]

6. Metadata queries

Druid supports querying the basic metadata of a DataSource.

6.1 TimeBoundary

TimeBoundary returns the earliest and latest time points of a DataSource. A JSON example:

{
    "queryType": "lucene_timeBoundary",
    "dataSource": "userinfo",
    "bound":"maxtime"
}
  • bound: which boundary to return, maxTime or minTime

The result:

[
    {
        "timestamp": "2017-05-17T07:54:36.337Z",
        "result": {
            "maxTime": "2017-05-17T07:54:36.337Z"
        }
    }
]

6.2 SegmentMetadata

SegmentMetadata returns segment metadata, such as which columns, metrics, and aggregators a segment contains. A JSON example:

{
    "queryType": "lucene_segmentMetadata",
    "dataSource": "userinfo",
    "intervals": "1000/3000",
    "merge": true,
    "analysisTypes": [
        "aggregators"
    ],
    "lenientAggregatorMerge": true,
    "usingDefaultInterval": false,
    "context": {
        "timeout": 180000,
        "useOffheap": true,
        "groupByStrategy": "v2"
    }
}

This is roughly equivalent to the SQL statement: desc userinfo;

The result:

[
    {
        "id": "userinfo_2017-01-01T00:00:00.000Z_2018-01-01T00:00:00.000Z_2017-05-17T08:08:34.224Z",
        "intervals": null,
        "columns": {
            "UserID": {
                "type": "STRING",
                "hasMultipleValues": false,
                "size": 0,
                "cardinality": 0,
                "minValue": null,
                "maxValue": null,
                "errorMessage": null
            },
            "__time": {
                "type": "LONG",
                "hasMultipleValues": false,
                "size": 0,
                "cardinality": null,
                "minValue": null,
                "maxValue": null,
                "errorMessage": null
            },
          ...
        },
        "size": 0,
        "numRows": 100000,
        "aggregators": null,
        "queryGranularity": null
    }
]

SegmentMetadata supports additional query fields, none of which are required:

toInclude: Which columns appear in the result; may be all, none, or list.
merge: Merges the metadata of multiple segments into a single result.
analysisTypes: Which column properties to return, e.g. size, intervals.
lenientAggregatorMerge: true or false; when true, differing aggregators are merged in the output.
context: Query context, e.g. whether query results are cached.
  • toInclude is used as follows:
    "toInclude":{"type":"all"}
    "toInclude":{"type":"none"}
    "toInclude":{"type":"list","columns":[<string list of column names>]}

  • analysisTypes supports the following properties: cardinality, minmax, size, intervals, queryGranularity, aggregators
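
A sketch combining these optional fields (as a Python dict), restricting the result to a couple of columns and to the size and cardinality properties; the column list here is only illustrative.

# Hypothetical SegmentMetadata query using toInclude and analysisTypes.
segment_metadata_query = {
    "queryType": "lucene_segmentMetadata",
    "dataSource": "userinfo",
    "intervals": "1000/3000",
    "merge": True,
    "toInclude": {"type": "list", "columns": ["UserID", "province"]},
    "analysisTypes": ["size", "cardinality"]
}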

7. UserGroup

UserGroup is a user-segmentation query: it takes multiple dimensions and metrics as analysis conditions, letting you build user segments tailored to your needs. A JSON example:

{
    "queryType":"user_group",
    "dataSource":"userinfo",
    "granularity":"all",
    "intervals": "1000/3000",
    "filter": {
        "type": "selector",
        "dimension": "province",
        "value": "广东省"
    },
    "dimension":"age",
    "dataConfig": {
        "hostAndPorts":"153.214.0.1:8046",  
        "clusterMode":true,  
        "groupId":"1"  
    },
    "aggregations":[
        {
            "name": "sum(age)",
            "type": "lucene_doubleSum",
            "fieldName": "age"
        }
    ],
    "context":{
        "timeout": 180000,
        "useOffheap": true,
        "groupByStrategy": "v2"
    }
}
  • dataConfig.hostAndPorts: Redis cluster IPs and ports, separated by commas or semicolons
  • dataConfig.clusterMode: whether Redis runs in cluster mode
  • dataConfig.groupId: the id of the user segment

8. Scan

Scan queries the raw rows. A JSON example:

{
    "queryType": "lucene_scan",
    "dataSource": "wuxianjiRT",
    "resultFormat": "compactedList",
    "batchSize": 1,
    "limit": 2,
    "columns": [
        "Province",
        "UserID"
    ],
    "filter": {
        "type": "and",
        "fields": [
            {
                "type": "in",
                "dimension": "ClientDeviceBrand",
                "values": [
                    "HUWEI"
                ]
            }
        ]
    },
    "intervals": [
        "2011-01-01/2017-06-30"
    ]
}
  • limit: total number of rows to return
  • batchSize: number of rows returned per batch
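
Since the compactedList format returns each event as a bare array, a small helper is handy for turning the response into dicts. A sketch that zips each event with the returned column names:

# Convert a compactedList Scan result (as shown below) into per-row dicts.
def scan_rows(result):
    for batch in result:
        columns = batch["columns"]
        for event in batch["events"]:
            yield dict(zip(columns, event))

# Usage: for row in scan_rows(response_json): print(row['Province'], row['UserID'])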

The query result:

[
    {
        "segmentId": "wuxianjiRT_2017-02-23T00:00:00.000Z_2017-02-24T00:00:00.000Z_2017-02-23T00:00:00.905Z_14",
        "columns": [
            "timestamp",
            "Province",
            "UserID"
        ],
        "events": [
            [
                "2017-02-23T16:00:06.616Z",
                "辽宁省",
                "b5b41fac0361d157d9673ecb926af5ae"
            ]
        ]
    },
    {
        "segmentId": "wuxianjiRT_2017-02-23T00:00:00.000Z_2017-02-24T00:00:00.000Z_2017-02-23T00:00:00.905Z_14",
        "columns": [
            "timestamp",
            "Province",
            "UserID"
        ],
        "events": [
            [
                "2017-02-23T16:00:07.244Z",
                "安微省",
                "7f100b7b36092fb9b06dfb4fac360931"
            ]
        ]
    }
]

9. FirstN

FirstN returns the first N values of a dimension (unsorted, with no duplicates). A JSON example:

{
    "queryType":"lucene_firstN",
    "dataSource":"userinfo",
    "dimension":"province",
    "threshold":5,
    "intervals": "1000/3000",
    "granularity":"all",
    "context":{
        "timeout": 180000,
        "useOffheap": true,
        "groupByStrategy": "v2"
    }
}

The query result:

[
    {
        "timestamp": "2017-01-01T00:00:00.000Z",
        "result": [
            "黑龙江",
            "重庆市",
            "青海省",
            "新疆",
            "四川省"
        ]
    }
]

10. MultiHaving

MultiHaving applies different having conditions to data grouped by GroupBy. The grouping can cover multiple specified dimensions, and each having-filtered result can then be aggregated by multiple aggregatorSpecs. A JSON example:

{
    "queryType": "multi_having",
    "dataSource": "userinfo",
    "intervals": "1000/3000",
    "granularity": "all",
    "dimensions":["province"],
    "aggregations":  [
        {
            "type":"lucene_count",
            "name":"_count_"
        }
    ],
    "havingAggregators": [
        {
            "name": "__havingAggregators",
            "havingSpec": {
                "type": "greaterThan",
                "aggregation": "_count_",
                "value": 3000
            },
            "aggregatorSpecs": [
                {
                    "type": "lucene_count",
                    "name": "_count1_"
                }
            ]
        }
    ],
    "context":{
        "timeout": 180000,
        "useOffheap": true,
        "groupByStrategy": "v2"
    }
}

This is equivalent to the SQL statement: select count(_count_) _count1_ from (select count(*) _count_ from userinfo group by province having _count_ > 3000)

The query result:

[
    {
        "v": "data_row",
        "event": {
            "data": {
                "__havingAggregators": {
                    "_count1_": 4
                }
            },
            "multiHavingTime": 1,
            "groupByTime": 36
        }
    }
]

The field specific to MultiHaving is havingAggregators.

  • havingAggregators

Specifies a set of different having filters, together with how each filtered result is aggregated. The JSON shape is:

[<HavingAggregator>,<HavingAggregator>,<HavingAggregator>...]

Each HavingAggregator specifies a single having filter and how its filtered result is aggregated. It contains the following fields:

name: Name of the property in the returned result.
havingSpec: The having filter applied to the grouped data. See having.
aggregatorSpecs: How the having-filtered result is aggregated.
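
A sketch of a havingAggregators array with two buckets, following the structure of the example above; the names and thresholds are illustrative only.

# Two independent having filters over the same grouped data, each with its own count.
having_aggregators = [
    {
        "name": "over3000",
        "havingSpec": {"type": "greaterThan", "aggregation": "_count_", "value": 3000},
        "aggregatorSpecs": [{"type": "lucene_count", "name": "_count1_"}]
    },
    {
        "name": "over5000",
        "havingSpec": {"type": "greaterThan", "aggregation": "_count_", "value": 5000},
        "aggregatorSpecs": [{"type": "lucene_count", "name": "_count1_"}]
    }
]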

11. Funnel

A Funnel query defines multiple steps; each step searches for matching data within the results of the previous step, and the results of every step are returned. A JSON example:

{
  "queryType":"funnel",
  "dataSource":"rollup-normal-test5",
  "context":
  {
    "groupByStrategy":"v2",
    "useOffheap":true,
    "timeout":180000
  },
  "intervals":"2017-08-20T00:00:00.000Z/2017-08-22T23:59:59.999Z",
  "granularity":  {
    "type":"period",
    "period":"P2D"
  },
  "dimension":null,
  "field":"uid",
  "steps":[
    {
      "name":"第 1 步",
      "filter":"event:访问"
    },

    {
      "name":"第 2 步",
      "filter":"event:登录"
    }
  ],
  "slidingWindow":
  {
    "type":"period",
    "period":"P1D"
  }
}

The query result:

[
    {
        "v": "FunnelResultRow",
        "timestamp": "2017-08-20T00:00:00.000Z",
        "event": {
            "第 1 步": 527595.2736384192,
            "第 2 步": 459295.1286562972
        },
        "type": "window"
    },
    {
        "v": "FunnelResultRow",
        "timestamp": "2017-08-21T00:00:00.000Z",
        "event": {
            "第 1 步": 388632.1614757022,
            "第 2 步": 245955.00990854224
        },
        "type": "window"
    },
    {
        "v": "FunnelResultRow",
        "timestamp": "2017-08-22T00:00:00.000Z",
        "event": {
            "第 1 步": 390788.5262813083,
            "第 2 步": 247582.08635253785
        },
        "type": "window"
    },
    {
        "v": "FunnelResultRow",
        "timestamp": "2017-08-20T00:00:00.000Z",
        "event": {
            "第 1 步": 574921.653267394,
            "第 2 步": 516034.8059116008
        },
        "type": "total"
    }
]
  • granularity.period: the funnel's query granularity; P2D means a granularity of two days
  • dimension: the dimension to group the results by
  • field: the dimension used for cardinality counting
  • steps: the funnel's step conditions
  • steps.name: the name of a step
  • steps.filter: the step's filter, written as a Lucene expression
  • slidingWindow: the sliding window of the query, i.e. the time interval at which results are reported; currently only one day (P1D) is supported
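
As a post-processing sketch, the per-window conversion rate of the example result above can be computed by dividing the step-2 value by the step-1 value; the step names are the ones used in the example query.

# Per-window conversion rate (step 2 / step 1) from a Funnel result.
def conversion_rates(rows):
    for row in rows:
        if row.get("type") != "window":
            continue
        step1 = row["event"]["第 1 步"]
        step2 = row["event"]["第 2 步"]
        yield row["timestamp"], (step2 / step1 if step1 else 0.0)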

12. Retention

A Retention query first defines a start condition and then a return condition; the return condition is applied within the results of the start condition, and the results for each condition are returned. A JSON example:

{
  "queryType":"retention",
  "dataSource":"rollup-normal-test5",
  "context":
  {
    "groupByStrategy":"v2",
    "useOffheap":true,
    "timeout":180000
  },
  "intervals":"2017-08-20T00:00:00.000Z/2017-08-27T23:59:59.999Z",
  "granularity":
  {
    "type":"period",
    "period":"P3D"

  },
  "field":"uid",
  "startStep":
  {
    "name":"total",
    "filter":"event:访问 "
  },
  "returnStep":
  {
    "name":"total_return",
    "filter":"event:购买"
  }
}

The query result:

[
    {
        "timestamp": "2017-08-25T00:00:00.000Z",
        "result": {
            "total": 358710.1861640895,
            "2017-08-25T00:00:00.000Z": 246955.86689676947
        }
    },
    {
        "timestamp": "2017-08-22T00:00:00.000Z",
        "result": {
            "total": 571790.5760884841,
            "2017-08-22T00:00:00.000Z": 561162.0445494938,
            "2017-08-25T00:00:00.000Z": 391989.24259191006
        }
    },
    {
        "timestamp": "2017-08-19T00:00:00.000Z",
        "result": {
            "2017-08-19T00:00:00.000Z": 490870.6724267715,
            "total": 527595.2736384192,
            "2017-08-22T00:00:00.000Z": 519098.24003230926,
            "2017-08-25T00:00:00.000Z": 362174.31900703744
        }
    }
]
  • granularity.period: the query granularity
  • field: the dimension used for cardinality counting
  • startStep.filter: the start condition, written as a Lucene expression
  • returnStep.filter: the return condition, written as a Lucene expression
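
As a post-processing sketch, assuming each dated key in a result row holds the count of users from that cohort who returned in that window, the retention ratio per window can be computed by dividing it by the row's total.

# Hypothetical reading of the Retention result: ratio of returning users per window.
def retention_ratios(rows):
    for row in rows:
        total = row["result"]["total"]
        for key, value in row["result"].items():
            if key == "total":
                continue
            yield row["timestamp"], key, (value / total if total else 0.0)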