Rollup (Pre-aggregated) Ingestion

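This example shows how to ingest data with rollup (pre-aggregation), where rows that share the same dimension values within one time bucket are merged at ingestion time.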
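To make the idea concrete, here is a minimal Python sketch of what rollup does conceptually. It is not the engine's implementation: it assumes day buckets are computed in UTC and uses an exact set of uids in place of the approximate `thetaSketch` aggregator. The function name `rollup` and the sample rows are hypothetical; the column names follow the spec used in this example.

```python
from collections import defaultdict
from datetime import datetime, timezone


def rollup(rows):
    """Group rows by (UTC day, province, event) and count distinct uids.

    Illustration only: a real thetaSketch gives an approximate count,
    while this sketch keeps an exact set per group.
    """
    groups = defaultdict(set)
    for row in rows:
        # d|sugo_time is in epoch milliseconds (timestampSpec format "millis")
        ts = datetime.fromtimestamp(row["d|sugo_time"] / 1000, tz=timezone.utc)
        day = ts.date().isoformat()  # truncate to the P1D bucket (UTC assumed)
        groups[(day, row["s|province"], row["s|event"])].add(row["s|uid"])
    return {key: len(uids) for key, uids in groups.items()}


T0 = 1605139200000  # 2020-11-12T00:00:00Z in milliseconds
rows = [
    {"d|sugo_time": T0, "s|province": "Guangdong", "s|event": "click", "s|uid": "u1"},
    {"d|sugo_time": T0 + 3_600_000, "s|province": "Guangdong", "s|event": "click", "s|uid": "u1"},
    {"d|sugo_time": T0 + 7_200_000, "s|province": "Guangdong", "s|event": "click", "s|uid": "u2"},
    {"d|sugo_time": T0 + 86_400_000, "s|province": "Guangdong", "s|event": "click", "s|uid": "u1"},
]

result = rollup(rows)
for key, count in sorted(result.items()):
    print(key, count)
```

Four input rows collapse into two stored rows, one per day bucket, each carrying a distinct-uid count instead of the raw `s|uid` values.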

  • The JSON used in this example is as follows:
    {
      "type": "lucene_supervisor",
      "dataSchema": {
          "dataSource": "rollup-test",
          "parser":{
              "type":"string",
              "parseSpec":{
                  "format":"json",
                  "dimensionsSpec":{
                      "dynamicDimension":false,  
                      "dimensions":[
                          {"name":"s|province","type":"string"},
                          {"name":"s|event","type":"string"}
                      ]
                  },
                  "timestampSpec":{
                      "column":"d|sugo_time",
                      "excludeTimeColumn": false,
                      "format":"millis"
                  }
              }
          },
          "metricsSpec": [{
              "type": "thetaSketch",
              "name": "uid_estimated_count",
              "fieldName": "s|uid"      
          }],
          "granularitySpec": {
              "type": "uniform",
              "segmentGranularity": "DAY",
              "queryGranularity":  {    
                  "type":"period",
                  "period":"P1D"     
              },
              "rollup": true,      
              "intervals": null
          }
      },
      "tuningConfig": {
          "type":"kafka",
          "maxRowsInMemory":10000000,
          "maxRowsPerSegment":20000000,
          "intermediatePersistPeriod":"PT10M",
          "buildV9Directly":true,
          "reportParseExceptions":true
      },
      "ioConfig": {
          "topic": "rollup_test",
          "replicas": 1,
          "taskCount": 1,
          "taskDuration": "PT300S",
          "consumerProperties": {
              "bootstrap.servers": "192.168.0.220:9092,192.168.0.221:9092,192.168.0.222:9092"
          },
          "startDelay": "PT5S",
          "period": "PT30S",
          "useEarliestOffset": true,
          "completionTimeout": "PT1800S",
          "lateMessageRejectionPeriod": null
      },
     "writerConfig" : {
       "type" : "lucene",
       "maxBufferedDocs" : -1,
       "ramBufferSizeMB" : 16.0,
       "indexRefreshIntervalSeconds" : 6
     }
    }
    
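    | Property | Value | Type | Required | Default | Description |
    | ---- | ---- | --- | --- | --- | --- |
    | type | lucene_supervisor | string | Yes | - | Specifies the ingestion type. Note: lucene_index also supports rollup ingestion |
    | dataSchema | See DataSchema | json | Yes | - | Defines the table structure and data granularity |
    | ioConfig | See kafkaSupervisorIOConfig | json | Yes | - | Defines the data source |
    | tuningConfig | See kafkaSupervisorTuningConfig | json | Yes | - | Configures the task tuning parameters |
    | writerConfig | See WriterConfig | json | Yes | - | Configures the segment write parameters |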

Notes on special parameters:

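  • dataSchema.parser.parseSpec.dimensionsSpec.dynamicDimension: rollup does not support dynamic-dimension ingestion, so this is set to false.
  • dataSchema.metricsSpec: specifies the columns to pre-aggregate and the aggregator applied to each.
  • dataSchema.metricsSpec.fieldName: specifies the column being pre-aggregated (here, the one used for the count estimate). This column must not also appear in dimensionsSpec, which is another reason dynamic dimensions cannot be used.
  • dataSchema.granularitySpec.rollup: set to true to enable rollup.
  • dataSchema.granularitySpec.queryGranularity.period: specifies the rollup granularity. It must be set and cannot be left empty.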
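The constraints listed above can be checked before a spec is submitted. The following is a minimal sketch, not part of the product: `check_rollup_spec` is a hypothetical helper, and requiring `dynamicDimension` to be explicitly `false` is a choice made here rather than documented behavior. It reads only the fields shown in the example JSON.

```python
def check_rollup_spec(spec):
    """Return a list of problems; an empty list means all checks passed."""
    problems = []
    schema = spec.get("dataSchema", {})
    dims_spec = schema.get("parser", {}).get("parseSpec", {}).get("dimensionsSpec", {})
    gran = schema.get("granularitySpec", {})

    # Rollup does not support dynamic dimensions.
    if dims_spec.get("dynamicDimension") is not False:
        problems.append("dimensionsSpec.dynamicDimension must be false")

    # Rollup must be switched on explicitly.
    if gran.get("rollup") is not True:
        problems.append("granularitySpec.rollup must be true")

    # The rollup granularity must be set.
    query_gran = gran.get("queryGranularity")
    if not isinstance(query_gran, dict) or not query_gran.get("period"):
        problems.append("granularitySpec.queryGranularity.period must be set")

    # A metric's source column must not also be declared as a dimension.
    dim_names = {d["name"] if isinstance(d, dict) else d
                 for d in dims_spec.get("dimensions", [])}
    for metric in schema.get("metricsSpec", []):
        field = metric.get("fieldName")
        if field in dim_names:
            problems.append(f"metric field '{field}' is also listed as a dimension")

    return problems


# Trimmed copy of the example spec, keeping only the fields that are checked.
spec = {
    "type": "lucene_supervisor",
    "dataSchema": {
        "dataSource": "rollup-test",
        "parser": {"parseSpec": {"dimensionsSpec": {
            "dynamicDimension": False,
            "dimensions": [
                {"name": "s|province", "type": "string"},
                {"name": "s|event", "type": "string"},
            ],
        }}},
        "metricsSpec": [
            {"type": "thetaSketch", "name": "uid_estimated_count", "fieldName": "s|uid"}
        ],
        "granularitySpec": {
            "rollup": True,
            "queryGranularity": {"type": "period", "period": "P1D"},
        },
    },
}

print(check_rollup_spec(spec))
```

For the example spec the helper returns an empty list. Adding `s|uid` to `dimensions`, or dropping `queryGranularity`, would each produce one reported problem.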
© 广东数果 all rights reserved, powered by Gitbook. Feedback email: developer@sugo.io 2020-11-12 17:52:00