Prometheus Trouble Shooting

记录一些 Prometheus 使用问题排查方法。

Prometheus 磁盘使用率高

要想知道哪些指标占用了最多资源,最好是计算每个指标有多少时间序列,然后显示前 10 个指标:

1
topk(10, count by (__name__)({__name__=~".+"}))

显示前 10 个指标属于哪个 Job:

1
topk(10, count by (__name__, job)({__name__=~".+"}))

查看时间序列最多的前 10 个 Job:

1
topk(10, count by (job)({__name__=~".+"}))

需要注意的是,这类 PromQL 会查询 Prometheus 中所有的指标,如果数据量大的话,可能会造成 Prometheus 有较高的负载。

Author

Warner Chen

Posted on

2025-05-29

Updated on

2025-05-29

Licensed under

You need to set install_url to use ShareThis. Please set it in _config.yml.
You forgot to set the business or currency_code for Paypal. Please set it in _config.yml.

Comments

You forgot to set the shortname for Disqus. Please set it in _config.yml.