
CloudWatch metrics collected from Prometheus contain undesired dimensions #1196

Open
ecerulm opened this issue Jun 4, 2024 · 2 comments
ecerulm (Contributor) commented Jun 4, 2024

Describe the bug

My configuration says

logs": {
    "metrics_collected": {
      "prometheus": {
        "cluster_name": "tableau-dp2",
        "log_group_name": "tableau-dp2",
        "prometheus_config_path": "/opt/aws/amazon-cloudwatch-agent/etc/prometheus.yaml",
        "emf_processor": {
          "metric_declaration_dedup": true,
          "metric_namespace": "CWAgent/Prometheus",
          "metric_unit": {
            "java_lang_memory_heapmemoryusage_used": "Bytes"
          },
          "metric_declaration": [
            {
              "source_labels": ["node"],
              "label_matcher": "*",
              "dimensions": [
                [
                  "ClusterName",
                  "node",
                  "application",
                  "service",
                  "service_instance"
                ]
              ],
              "metric_selectors": [
                "^java_lang_memory_heapmemoryusage_used"
              ]
            }
          ]
        }
      }
    }
}

which specifies that only the following labels should become dimensions (a sketch of the expected EMF output follows the list):

  • ClusterName
  • node
  • application
  • service
  • service_instance

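For reference, the EMF metadata I would expect for this metric declaration looks roughly like the following (an illustrative sketch trimmed to the relevant fields, not actual agent output):

"CloudWatchMetrics": [
    {
        "Namespace": "CWAgent/Prometheus",
        "Dimensions": [
            [
                "ClusterName",
                "node",
                "application",
                "service",
                "service_instance"
            ]
        ],
        "Metrics": [
            {
                "Name": "java_lang_memory_heapmemoryusage_used",
                "Unit": "Bytes"
            }
        ]
    }
]
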
but the final CloudWatch log event is:

{
    "CloudWatchMetrics": [
        {
            "Namespace": "CWAgent/Prometheus",
            "Dimensions": [
                [
                    "service",
                    "service_instance",
                    "ClusterName",
                    "host",
                    "job",
                    "prom_metric_type",
                    "instance",
                    "node",
                    "application"
                ]
            ],
            "Metrics": [
                {
                    "Name": "java_lang_memory_heapmemoryusage_used",
                    "Unit": "Bytes"
                },
                {
                    "Name": "jmx_scrape_cached_beans"
                },
                {
                    "Name": "jmx_scrape_duration_seconds"
                },
                {
                    "Name": "jmx_scrape_error"
                }
            ]
        }
    ],
    "ClusterName": "tableau-dp2",
    "Timestamp": "1717502587825",
    "Version": "0",
    "application": "Tableau",
    "host": "xxxx",
    "instance": "127.0.0.1:12302",
    "job": "jmx",
    "node": "node1",
    "prom_metric_type": "gauge",
    "service": "vizqlservice",
    "service_instance": "2",
    "java_lang_memory_heapmemoryusage_used": 506484968,
    "jmx_scrape_cached_beans": 0,
    "jmx_scrape_duration_seconds": 0.057368237,
    "jmx_scrape_error": 0
}

As you can see, .CloudWatchMetrics.Dimensions contains additional dimensions beyond the ones I specified (a quick verification script is sketched after the list):

  • host
  • job
  • prom_metric_type
  • instance

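A simple way to confirm exactly which labels the agent injected is to diff the configured dimension set against the emitted one. A minimal sketch in Python, assuming the log event above has been saved locally as event.json (a hypothetical filename):

# Compare the dimensions configured in metric_declaration with the dimensions
# present in the emitted EMF event. "event.json" is an assumed local copy of
# the log event shown above.
import json

configured = {"ClusterName", "node", "application", "service", "service_instance"}

with open("event.json") as f:
    event = json.load(f)

emitted = set(event["CloudWatchMetrics"][0]["Dimensions"][0])
print("extra dimensions:", sorted(emitted - configured))
# prints: extra dimensions: ['host', 'instance', 'job', 'prom_metric_type']
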
Steps to reproduce
Run the CloudWatch agent with the config.json below (debug enabled) and the prometheus.yaml / prometheus_sd_jmx.yaml scrape configuration shown further down, then inspect the EMF events written to the tableau-dp2 log group.

What did you expect to see?

I expected to see only the dimensions that I specified, or at least to find it documented somewhere which dimensions will be "forced" or added automatically.

What did you see instead?

I saw the dimensions that I specified, **plus 4 other dimensions that I didn't ask for** (host, job, prom_metric_type, instance).

What version did you use?
Version: CWAgent/1.300039.0b612 (go1.22.2; linux; amd64)

What config did you use?
config.json


{
  "agent": {
    "metrics_collection_interval": 60,
    "run_as_user": "root",
    "debug": true
  },
  "metrics": {
    "aggregation_dimensions": [
      [
        "InstanceId"
      ]
    ],
    "append_dimensions": {
      "AutoScalingGroupName": "${aws:AutoScalingGroupName}",
      "ImageId": "${aws:ImageId}",
      "InstanceId": "${aws:InstanceId}",
      "InstanceType": "${aws:InstanceType}"
    },
    "metrics_collected": {
      "collectd": {
        "metrics_aggregation_interval": 60
      },
      "cpu": {
        "measurement": [
          "cpu_usage_idle",
          "cpu_usage_iowait",
          "cpu_usage_user",
          "cpu_usage_system"
        ],
        "metrics_collection_interval": 60,
        "totalcpu": true
      },
      "disk": {
        "measurement": [
          "used_percent",
          "inodes_free"
        ],
        "metrics_collection_interval": 60,
        "resources": [
          "/"
        ]
      },
      "diskio": {
        "measurement": [
          "io_time",
          "write_bytes",
          "read_bytes",
          "writes",
          "reads"
        ],
        "metrics_collection_interval": 60,
        "resources": [
          "*"
        ]
      },
      "mem": {
        "measurement": [
          "mem_used_percent"
        ],
        "metrics_collection_interval": 60
      },
      "netstat": {
        "measurement": [
          "tcp_established",
          "tcp_time_wait"
        ],
        "metrics_collection_interval": 60
      },
      "statsd": {
        "metrics_aggregation_interval": 60,
        "metrics_collection_interval": 10,
        "service_address": ":8125"
      },
      "swap": {
        "measurement": [
          "swap_used_percent"
        ],
        "metrics_collection_interval": 60
      }
    }
  },
  "logs": {
    "metrics_collected": {
      "prometheus": {
        "cluster_name": "tableau-dp2",
        "log_group_name": "tableau-dp2",
        "prometheus_config_path": "/opt/aws/amazon-cloudwatch-agent/etc/prometheus.yaml",
        "emf_processor": {
          "metric_declaration_dedup": true,
          "metric_namespace": "CWAgent/Prometheus",
          "metric_unit": {
            "java_lang_memory_heapmemoryusage_used": "Bytes"
          },
          "metric_declaration": [
            {
              "source_labels": ["node"],
              "label_matcher": "*",
              "dimensions": [
                [
                  "ClusterName",
                  "node",
                  "application",
                  "service",
                  "service_instance"
                ]
              ],
              "metric_selectors": [
                "^java_lang_memory_heapmemoryusage_used"
              ]
            }
          ]
        }
      }
    },
    "force_flush_interval": 5
  }
} 

prometheus.yaml

global:
  scrape_interval: 1m
  scrape_timeout: 10s
scrape_configs:
  - job_name: jmx
    sample_limit: 10000
    file_sd_configs:
      - files: ["/opt/aws/amazon-cloudwatch-agent/etc/prometheus_sd_jmx.yaml"]

prometheus_sd_jmx.yaml

- targets:
  - 127.0.0.1:12300
  labels:
    application: Tableau
    service: vizqlservice
    service_instance: "0"
    node: node1
- targets:
  - 127.0.0.1:12301
  labels:
    application: Tableau
    service: vizqlservice
    service_instance: "1"
    node: node1
- targets:
  - 127.0.0.1:12302
  labels:
    application: Tableau
    service: vizqlservice
    service_instance: "2"
    node: node1
- targets:
  - 127.0.0.1:12303
  labels:
    application: Tableau
    service: vizqlservice
    service_instance: "3"
    node: node1

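For context (my understanding of standard Prometheus behaviour, not verified agent internals): after the scrape, each sample carries both the static labels from this service discovery file and the labels Prometheus itself attaches (job from the scrape config, instance from the target address), which matches the label values seen in the log event above, e.g.:

java_lang_memory_heapmemoryusage_used{application="Tableau", service="vizqlservice", service_instance="2", node="node1", job="jmx", instance="127.0.0.1:12302"} 506484968
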
Environment
OS: Ubuntu 18.04.6 LTS


sky333999 (Contributor) commented

Hi @ecerulm, thank you for providing all the details.
One more thing that would help is if you could curl the Prometheus endpoint and provide us with a static snapshot of the raw Prometheus metrics from the target.
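For example, something along these lines would capture a snapshot from each target (a sketch only: the /metrics path is an assumption based on the common exporter default, and the target addresses come from prometheus_sd_jmx.yaml):

# Fetch the raw Prometheus exposition text from each JMX target listed in
# prometheus_sd_jmx.yaml. The /metrics path is an assumption; adjust it to
# whatever path the JMX exporter actually serves.
import urllib.request

targets = ["127.0.0.1:12300", "127.0.0.1:12301", "127.0.0.1:12302", "127.0.0.1:12303"]

for target in targets:
    url = f"http://{target}/metrics"
    with urllib.request.urlopen(url, timeout=10) as resp:
        body = resp.read().decode("utf-8")
    with open(f"snapshot_{target.replace(':', '_')}.txt", "w", encoding="utf-8") as out:
        out.write(body)
    print(f"saved {url}")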

github-actions (bot) commented
This issue was marked stale due to lack of activity.

github-actions bot added the Stale label on Sep 15, 2024