
CloudWatch metrics collected from Prometheus contain undesired dimensions #1196

Open
ecerulm opened this issue Jun 4, 2024 · 2 comments
ecerulm (Contributor) commented Jun 4, 2024

Describe the bug

My configuration says

logs": {
    "metrics_collected": {
      "prometheus": {
        "cluster_name": "tableau-dp2",
        "log_group_name": "tableau-dp2",
        "prometheus_config_path": "/opt/aws/amazon-cloudwatch-agent/etc/prometheus.yaml",
        "emf_processor": {
          "metric_declaration_dedup": true,
          "metric_namespace": "CWAgent/Prometheus",
          "metric_unit": {
            "java_lang_memory_heapmemoryusage_used": "Bytes"
          },
          "metric_declaration": [
            {
              "source_labels": ["node"],
              "label_matcher": "*",
              "dimensions": [
                [
                  "ClusterName",
                  "node",
                  "application",
                  "service",
                  "service_instance"
                ]
              ],
              "metric_selectors": [
                "^java_lang_memory_heapmemoryusage_used"
              ]
            }
          ]
        }
      }
    }
}

which specifies that only the following labels should become dimensions (a sketch of the expected EMF output follows the list):

  • ClusterName
  • node
  • application
  • service
  • service_instance

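For reference, the EMF metadata I would expect for this metric declaration looks roughly like the following (an illustrative sketch trimmed to the relevant fields, not actual agent output):

"CloudWatchMetrics": [
    {
        "Namespace": "CWAgent/Prometheus",
        "Dimensions": [
            [
                "ClusterName",
                "node",
                "application",
                "service",
                "service_instance"
            ]
        ],
        "Metrics": [
            {
                "Name": "java_lang_memory_heapmemoryusage_used",
                "Unit": "Bytes"
            }
        ]
    }
]
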
but the final CloudWatch log event is:

{
    "CloudWatchMetrics": [
        {
            "Namespace": "CWAgent/Prometheus",
            "Dimensions": [
                [
                    "service",
                    "service_instance",
                    "ClusterName",
                    "host",
                    "job",
                    "prom_metric_type",
                    "instance",
                    "node",
                    "application"
                ]
            ],
            "Metrics": [
                {
                    "Name": "java_lang_memory_heapmemoryusage_used",
                    "Unit": "Bytes"
                },
                {
                    "Name": "jmx_scrape_cached_beans"
                },
                {
                    "Name": "jmx_scrape_duration_seconds"
                },
                {
                    "Name": "jmx_scrape_error"
                }
            ]
        }
    ],
    "ClusterName": "tableau-dp2",
    "Timestamp": "1717502587825",
    "Version": "0",
    "application": "Tableau",
    "host": "xxxx",
    "instance": "127.0.0.1:12302",
    "job": "jmx",
    "node": "node1",
    "prom_metric_type": "gauge",
    "service": "vizqlservice",
    "service_instance": "2",
    "java_lang_memory_heapmemoryusage_used": 506484968,
    "jmx_scrape_cached_beans": 0,
    "jmx_scrape_duration_seconds": 0.057368237,
    "jmx_scrape_error": 0
}

As you can see, .CloudWatchMetrics.Dimensions contains additional dimensions beyond the ones I specified (a quick verification script is sketched after the list):

  • host
  • job
  • prom_metric_type
  • instance

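A simple way to confirm exactly which labels the agent injected is to diff the configured dimension set against the emitted one. A minimal sketch in Python, assuming the log event above has been saved locally as event.json (a hypothetical filename):

# Compare the dimensions configured in metric_declaration with the dimensions
# present in the emitted EMF event. "event.json" is an assumed local copy of
# the log event shown above.
import json

configured = {"ClusterName", "node", "application", "service", "service_instance"}

with open("event.json") as f:
    event = json.load(f)

emitted = set(event["CloudWatchMetrics"][0]["Dimensions"][0])
print("extra dimensions:", sorted(emitted - configured))
# prints: extra dimensions: ['host', 'instance', 'job', 'prom_metric_type']
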
Steps to reproduce
Run the CloudWatch agent with the config.json below (debug enabled) and the prometheus.yaml / prometheus_sd_jmx.yaml scrape configuration shown further down, then inspect the EMF events written to the tableau-dp2 log group.

What did you expect to see?

I expected to see only the dimensions that I specified, or at least to find it documented somewhere which dimensions will be "forced" or added automatically.

What did you see instead?

I saw the dimensions that I specified, **plus 4 other dimensions that I didn't ask for** (host, job, prom_metric_type, instance).

What version did you use?
Version: CWAgent/1.300039.0b612 (go1.22.2; linux; amd64)

What config did you use?
config.json


{
  "agent": {
    "metrics_collection_interval": 60,
    "run_as_user": "root",
    "debug": true
  },
  "metrics": {
    "aggregation_dimensions": [
      [
        "InstanceId"
      ]
    ],
    "append_dimensions": {
      "AutoScalingGroupName": "${aws:AutoScalingGroupName}",
      "ImageId": "${aws:ImageId}",
      "InstanceId": "${aws:InstanceId}",
      "InstanceType": "${aws:InstanceType}"
    },
    "metrics_collected": {
      "collectd": {
        "metrics_aggregation_interval": 60
      },
      "cpu": {
        "measurement": [
          "cpu_usage_idle",
          "cpu_usage_iowait",
          "cpu_usage_user",
          "cpu_usage_system"
        ],
        "metrics_collection_interval": 60,
        "totalcpu": true
      },
      "disk": {
        "measurement": [
          "used_percent",
          "inodes_free"
        ],
        "metrics_collection_interval": 60,
        "resources": [
          "/"
        ]
      },
      "diskio": {
        "measurement": [
          "io_time",
          "write_bytes",
          "read_bytes",
          "writes",
          "reads"
        ],
        "metrics_collection_interval": 60,
        "resources": [
          "*"
        ]
      },
      "mem": {
        "measurement": [
          "mem_used_percent"
        ],
        "metrics_collection_interval": 60
      },
      "netstat": {
        "measurement": [
          "tcp_established",
          "tcp_time_wait"
        ],
        "metrics_collection_interval": 60
      },
      "statsd": {
        "metrics_aggregation_interval": 60,
        "metrics_collection_interval": 10,
        "service_address": ":8125"
      },
      "swap": {
        "measurement": [
          "swap_used_percent"
        ],
        "metrics_collection_interval": 60
      }
    }
  },
  "logs": {
    "metrics_collected": {
      "prometheus": {
        "cluster_name": "tableau-dp2",
        "log_group_name": "tableau-dp2",
        "prometheus_config_path": "/opt/aws/amazon-cloudwatch-agent/etc/prometheus.yaml",
        "emf_processor": {
          "metric_declaration_dedup": true,
          "metric_namespace": "CWAgent/Prometheus",
          "metric_unit": {
            "java_lang_memory_heapmemoryusage_used": "Bytes"
          },
          "metric_declaration": [
            {
              "source_labels": ["node"],
              "label_matcher": "*",
              "dimensions": [
                [
                  "ClusterName",
                  "node",
                  "application",
                  "service",
                  "service_instance"
                ]
              ],
              "metric_selectors": [
                "^java_lang_memory_heapmemoryusage_used"
              ]
            }
          ]
        }
      }
    },
    "force_flush_interval": 5
  }
} 

prometheus.yaml

global:
  scrape_interval: 1m
  scrape_timeout: 10s
scrape_configs:
  - job_name: jmx
    sample_limit: 10000
    file_sd_configs:
      - files: ["/opt/aws/amazon-cloudwatch-agent/etc/prometheus_sd_jmx.yaml"]

prometheus_sd_jmx.yaml

- targets:
  - 127.0.0.1:12300
  labels:
    application: Tableau
    service: vizqlservice
    service_instance: "0"
    node: node1
- targets:
  - 127.0.0.1:12301
  labels:
    application: Tableau
    service: vizqlservice
    service_instance: "1"
    node: node1
- targets:
  - 127.0.0.1:12302
  labels:
    application: Tableau
    service: vizqlservice
    service_instance: "2"
    node: node1
- targets:
  - 127.0.0.1:12303
  labels:
    application: Tableau
    service: vizqlservice
    service_instance: "3"
    node: node1

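For context (my understanding of standard Prometheus behaviour, not verified agent internals): after the scrape, each sample carries both the static labels from this service discovery file and the labels Prometheus itself attaches (job from the scrape config, instance from the target address), which matches the label values seen in the log event above, e.g.:

java_lang_memory_heapmemoryusage_used{application="Tableau", service="vizqlservice", service_instance="2", node="node1", job="jmx", instance="127.0.0.1:12302"} 506484968
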
Environment
OS: Ubuntu 18.04.6 LTS


sky333999 (Contributor) commented

Hi @ecerulm, thank you for providing all the details.
One more thing that would help is if you could curl the Prometheus endpoint and provide us with a static snapshot of the raw Prometheus metrics from the target.
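For example, something along these lines would capture a snapshot from each target (a sketch only: the /metrics path is an assumption based on the common exporter default, and the target addresses come from prometheus_sd_jmx.yaml):

# Fetch the raw Prometheus exposition text from each JMX target listed in
# prometheus_sd_jmx.yaml. The /metrics path is an assumption; adjust it to
# whatever path the JMX exporter actually serves.
import urllib.request

targets = ["127.0.0.1:12300", "127.0.0.1:12301", "127.0.0.1:12302", "127.0.0.1:12303"]

for target in targets:
    url = f"http://{target}/metrics"
    with urllib.request.urlopen(url, timeout=10) as resp:
        body = resp.read().decode("utf-8")
    with open(f"snapshot_{target.replace(':', '_')}.txt", "w", encoding="utf-8") as out:
        out.write(body)
    print(f"saved {url}")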

github-actions (bot) commented
This issue was marked stale due to lack of activity.

github-actions bot added the Stale label on Sep 15, 2024