Pre-aggregated Ingestion

This example shows how to ingest data with pre-aggregation (rollup).

The JSON used in this example is as follows:

{
  "type": "lucene_supervisor",
  "dataSchema": {
      "dataSource": "rollup-test",
      "parser":{
          "type":"string",
          "parseSpec":{
              "format":"json",
              "dimensionsSpec":{
                  "dynamicDimension":false,  
                  "dimensions":[
                      {"name":"s|province","type":"string"},
                      {"name":"s|event","type":"string"}
                  ]
              },
              "timestampSpec":{
                  "column":"d|sugo_time",
                  "excludeTimeColumn": false,
                  "format":"millis"
              }
          }
      },
      "metricsSpec": [{
          "type": "thetaSketch",
          "name": "uid_estimated_count",
          "fieldName": "s|uid"      
      }],
      "granularitySpec": {
          "type": "uniform",
          "segmentGranularity": "DAY",
          "queryGranularity":  {    
              "type":"period",
              "period":"P1D"     
          },
          "rollup": true,      
          "intervals": null
      }
  },
  "tuningConfig": {
      "type":"kafka",
      "maxRowsInMemory":10000000,
      "maxRowsPerSegment":20000000,
      "intermediatePersistPeriod":"PT10M",
      "buildV9Directly":true,
      "reportParseExceptions":true
  },
  "ioConfig": {
      "topic": "rollup_test",
      "replicas": 1,
      "taskCount": 1,
      "taskDuration": "PT300S",
      "consumerProperties": {
          "bootstrap.servers": "192.168.0.220:9092,192.168.0.221:9092,192.168.0.222:9092"
      },
      "startDelay": "PT5S",
      "period": "PT30S",
      "useEarliestOffset": true,
      "completionTimeout": "PT1800S",
      "lateMessageRejectionPeriod": null
  },
 "writerConfig" : {
   "type" : "lucene",
   "maxBufferedDocs" : -1,
   "ramBufferSizeMB" : 16.0,
   "indexRefreshIntervalSeconds" : 6
 }
}
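
To make the effect of this spec concrete, the following Python sketch mimics what pre-aggregation does to incoming rows: it truncates the timestamp to a one-day bucket (matching the `P1D` period), groups rows by that bucket plus the two declared dimensions, and counts distinct `s|uid` values per group. It is an illustration only, not the engine's implementation. It assumes UTC day boundaries, uses an exact Python set as a stand-in for the approximate `thetaSketch` aggregator, and the sample rows are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timezone

DIMENSIONS = ("s|province", "s|event")  # dimensionsSpec.dimensions in the spec
TIME_COLUMN = "d|sugo_time"             # timestampSpec.column, epoch milliseconds
METRIC_FIELD = "s|uid"                  # metricsSpec[0].fieldName


def rollup_by_day(rows):
    """Group rows by (UTC day, dimensions) and count distinct uids per group.

    Assumption: an exact set replaces the thetaSketch estimate, and day
    buckets are computed in UTC.
    """
    groups = defaultdict(set)
    for row in rows:
        day = datetime.fromtimestamp(row[TIME_COLUMN] / 1000, tz=timezone.utc).date()
        key = (day.isoformat(),) + tuple(row[d] for d in DIMENSIONS)
        groups[key].add(row[METRIC_FIELD])
    return {key: len(uids) for key, uids in groups.items()}


# Hypothetical sample rows: three events on 2023-11-14 and one on 2023-11-15 (UTC).
SAMPLE_ROWS = [
    {"d|sugo_time": 1700000000000, "s|province": "Guangdong", "s|event": "click", "s|uid": "u1"},
    {"d|sugo_time": 1700003600000, "s|province": "Guangdong", "s|event": "click", "s|uid": "u2"},
    {"d|sugo_time": 1700003600000, "s|province": "Guangdong", "s|event": "click", "s|uid": "u1"},
    {"d|sugo_time": 1700086400000, "s|province": "Guangdong", "s|event": "click", "s|uid": "u1"},
]

# The four input rows collapse into two stored rows, one per day:
# ("2023-11-14", "Guangdong", "click") -> 2 and ("2023-11-15", "Guangdong", "click") -> 1
print(rollup_by_day(SAMPLE_ROWS))
```

Because the stored row keeps only the day bucket, the dimensions, and the aggregated metric, individual events and their exact timestamps are no longer available after ingestion.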

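| Property | Value | Type | Required | Default | Description |
| ---- | ---- | --- | --- | --- | --- |
| type | lucene_supervisor | string | Yes | - | Specifies the ingestion type. Note: lucene_index also supports pre-aggregated ingestion. |
| dataSchema | See DataSchema | json | Yes | - | Defines the table structure and data granularity. |
| ioConfig | See kafkaSupervisorIOConfig | json | Yes | - | Defines the source of the data. |
| tuningConfig | See kafkaSupervisorTuningConfig | json | Yes | - | Configures the tuning parameters of the task. |
| writerConfig | See WriterConfig | json | Yes | - | Configures the write parameters for data segments. |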

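Notes on special parameters:

dataSchema.parser.parseSpec.dimensionsSpec.dynamicDimension: Pre-aggregation does not support dynamic-dimension ingestion, so set this to false.
dataSchema.metricsSpec: Specifies the fields to pre-aggregate and the aggregators to apply to them.
dataSchema.metricsSpec.fieldName: Specifies the field to pre-aggregate (used for the count statistic). This field must not also appear in dimensionsSpec, which is why dynamic dimensions cannot be used.
dataSchema.granularitySpec.rollup: Set to true to enable pre-aggregation.
dataSchema.granularitySpec.queryGranularity.period: Specifies the pre-aggregation granularity. It must always be set and cannot be left empty.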
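
The rules above can be checked mechanically before a spec is submitted. The sketch below is a hypothetical helper, not part of the product: the function name `check_rollup_rules` and its messages are made up for illustration, and the key paths follow the layout of the example JSON (with `metricsSpec` and `granularitySpec` directly under `dataSchema`). Since the default of `dynamicDimension` is not stated here, the check assumes it must be explicitly `false`.

```python
def check_rollup_rules(spec: dict) -> list:
    """Return a list of violations of the pre-aggregation rules (empty if none)."""
    problems = []
    schema = spec.get("dataSchema", {})
    dims_spec = schema.get("parser", {}).get("parseSpec", {}).get("dimensionsSpec", {})
    granularity = schema.get("granularitySpec", {})

    # Dynamic dimensions are not supported with pre-aggregation.
    # Assumption: the flag has to be present and explicitly false.
    if dims_spec.get("dynamicDimension") is not False:
        problems.append("dimensionsSpec.dynamicDimension must be false")

    # Pre-aggregation is only performed when rollup is true.
    if granularity.get("rollup") is not True:
        problems.append("granularitySpec.rollup must be true")

    # The pre-aggregation granularity must always be set.
    query_granularity = granularity.get("queryGranularity")
    if not (isinstance(query_granularity, dict) and query_granularity.get("period")):
        problems.append("granularitySpec.queryGranularity.period must be set")

    # A field used by an aggregator must not also be declared as a dimension.
    # Assumption: dimensions are objects with a "name" key, as in the example.
    dimension_names = {d["name"] for d in dims_spec.get("dimensions", [])}
    for metric in schema.get("metricsSpec", []):
        if metric.get("fieldName") in dimension_names:
            problems.append(f"metric field {metric['fieldName']!r} is also a dimension")
    return problems


# Reduced copy of the example spec, keeping only the parts the rules refer to.
good_spec = {
    "dataSchema": {
        "parser": {"parseSpec": {"dimensionsSpec": {
            "dynamicDimension": False,
            "dimensions": [
                {"name": "s|province", "type": "string"},
                {"name": "s|event", "type": "string"},
            ],
        }}},
        "metricsSpec": [
            {"type": "thetaSketch", "name": "uid_estimated_count", "fieldName": "s|uid"}
        ],
        "granularitySpec": {
            "rollup": True,
            "queryGranularity": {"type": "period", "period": "P1D"},
        },
    }
}
print(check_rollup_rules(good_spec))  # prints []
```

Returning a list of messages rather than raising on the first problem lets all violations in a spec be reported in one pass.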
