Prometheus deployment walkthrough

Download the binary tarballs

prometheus-2.47.0.linux-amd64.tar.gz # Prometheus server

mysqld_exporter-0.15.0.linux-amd64.tar.gz # mysqld_exporter (MySQL metrics)

node_exporter-1.6.1.linux-amd64.tar.gz # node_exporter (host metrics)

alertmanager-0.26.0.linux-amd64.tar.gz # Alertmanager

prometheus-webhook-dingtalk.tar.gz # DingTalk webhook bridge

redis_exporter-v1.54.0.linux-amd64.tar.gz # redis_exporter (Redis metrics)

Deploying the Prometheus server

Download a release from the official site: https://prometheus.io/download/

The steps below use version 2.47.0.

wget https://github.com/prometheus/prometheus/releases/download/v2.47.0/prometheus-2.47.0.linux-amd64.tar.gz # download the release tarball
tar zxf prometheus-2.47.0.linux-amd64.tar.gz -C /usr/local/ # extract
mv /usr/local/prometheus-2.47.0.linux-amd64/ /usr/local/prometheus # rename for a shorter path
vim /usr/lib/systemd/system/prometheus.service # create a systemd unit for the server
[Unit]
Description=https://prometheus.io
[Service]
Restart=on-failure
ExecStart=/usr/local/prometheus/prometheus --storage.tsdb.path=/usr/local/prometheus/data --config.file=/usr/local/prometheus/prometheus.yml
[Install]
WantedBy=multi-user.target

cd /usr/local/prometheus/ && cp prometheus.yml prometheus.yml.bak # back up the config file

Adding exporters

vim prometheus.yml # append scrape jobs for node_exporter and mysqld_exporter at the end of the file

  - job_name: "node1"
    static_configs:
     - targets: ["192.168.48.139:9100"]
  - job_name: "mysql1"
    static_configs:
     - targets: ["192.168.48.139:9104"]
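Because every exporter reuses the same job block, appending one can be scripted. The following is a minimal sketch and not part of the original notes: `add_scrape_job` is a hypothetical helper, and it assumes `scrape_configs:` is the last top-level section of prometheus.yml (as in the stock file), so that appended entries land inside it.

```shell
# add_scrape_job CONFIG JOB TARGET
# Appends one static scrape job to CONFIG.
# Assumption: scrape_configs: is the last top-level key in CONFIG.
add_scrape_job() {
    local config="$1" job="$2" target="$3"
    cat >> "$config" <<EOF
  - job_name: "$job"
    static_configs:
      - targets: ["$target"]
EOF
}
```

For example, `add_scrape_job /usr/local/prometheus/prometheus.yml redis1 192.168.48.139:9121` would add a job for the redis_exporter port; the job name `redis1` is only an illustration.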

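./promtool check config prometheus.yml # run after every edit to validate the syntax

Restart Prometheus after each config change for it to take effect. Every exporter added later follows the same job template: start the exporter, note the port it listens on, and let Prometheus scrape the metrics it exposes.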
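The check-then-restart routine can be wrapped so that a broken config never reaches the running server. This is a sketch under assumptions: `safe_restart` is a hypothetical helper, and `PROM_DIR`, `PROMTOOL` and `RESTART_CMD` are variables introduced here, defaulting to the paths used in these notes.

```shell
# Assumed defaults; override them in the environment if your layout differs.
PROM_DIR="${PROM_DIR:-/usr/local/prometheus}"
PROMTOOL="${PROMTOOL:-$PROM_DIR/promtool}"
RESTART_CMD="${RESTART_CMD:-systemctl restart prometheus.service}"

# safe_restart [CONFIG]
# Restarts Prometheus only when promtool accepts the config file.
safe_restart() {
    local config="${1:-$PROM_DIR/prometheus.yml}"
    if "$PROMTOOL" check config "$config"; then
        $RESTART_CMD
    else
        echo "config check failed; not restarting" >&2
        return 1
    fi
}
```

Calling `safe_restart` with no argument checks /usr/local/prometheus/prometheus.yml and restarts the service only if the check passes.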

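systemctl daemon-reload && systemctl start prometheus.service # load the new unit and start the service; default port 9090

Open http://<ip>:9090 to reach the Prometheus web UI. If it shows "Warning: Error fetching server time:", the system time is off and needs to be corrected.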

On the monitored hosts

Adding a node_exporter target

wget https://github.com/prometheus/node_exporter/releases/download/v1.6.1/node_exporter-1.6.1.linux-amd64.tar.gz # download node_exporter
tar xfz node_exporter-1.6.1.linux-amd64.tar.gz -C /usr/local/ && cd /usr/local/ # extract
mv node_exporter-1.6.1.linux-amd64/ node_exporter && cd node_exporter # rename and enter the directory
nohup ./node_exporter & # start in the background; check that port 9100 is listening, and read nohup.out if startup fails
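A process started with nohup does not come back after a reboot. As an alternative, an exporter can run under systemd, in the same way as the Prometheus server unit earlier in these notes. The following is a minimal sketch: `write_exporter_unit` is a hypothetical helper, and its optional third argument (the unit directory) exists only so the function can be tried against a scratch directory.

```shell
# write_exporter_unit NAME EXEC_START [UNIT_DIR]
# Writes a basic systemd unit NAME.service that runs EXEC_START.
write_exporter_unit() {
    local name="$1" exec_start="$2" unit_dir="${3:-/usr/lib/systemd/system}"
    cat > "$unit_dir/$name.service" <<EOF
[Unit]
Description=$name
After=network.target

[Service]
Restart=on-failure
ExecStart=$exec_start

[Install]
WantedBy=multi-user.target
EOF
}
```

A possible use is `write_exporter_unit node_exporter /usr/local/node_exporter/node_exporter`, followed by `systemctl daemon-reload && systemctl enable --now node_exporter`.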

Log in to the Prometheus web UI to check the new target.

Setting up mysqld_exporter

wget https://github.com/prometheus/mysqld_exporter/releases/download/v0.15.0/mysqld_exporter-0.15.0.linux-amd64.tar.gz # download
tar xfz mysqld_exporter-0.15.0.linux-amd64.tar.gz -C /usr/local/prometheus && cd /usr/local/prometheus && mv mysqld_exporter-0.15.0.linux-amd64 mysqld_exporter && cd mysqld_exporter # extract and rename

nohup ./mysqld_exporter --web.listen-address=0.0.0.0:9104 --config.my-cnf=/etc/my.cnf --collect.auto_increment.columns --collect.binlog_size --collect.info_schema.innodb_metrics --collect.info_schema.processlist --collect.info_schema.tables --collect.info_schema.tablestats --collect.slave_status --collect.global_status --collect.global_variables & 

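Note: the file passed to --config.my-cnf must contain a user name and password in its [client] section; the exporter uses them to read data from MySQL.

The exporter listens on port 9104 by default.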

 cat /etc/my.cnf 
[client]
user=root
password='1'
port = 3306
socket = /tmp/mysql.sock
default-character-set = utf8
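The sample above connects as root. The mysqld_exporter README suggests a dedicated account with limited privileges instead. A minimal sketch of that setup follows: `exporter_grant_sql` is a hypothetical helper that only prints the SQL, and the user name `exporter` and the password shown in the usage line are placeholders, not values from these notes.

```shell
# exporter_grant_sql USER PASSWORD
# Prints the SQL that creates a low-privilege monitoring account.
exporter_grant_sql() {
    local user="$1" password="$2"
    cat <<EOF
CREATE USER '$user'@'localhost' IDENTIFIED BY '$password' WITH MAX_USER_CONNECTIONS 3;
GRANT PROCESS, REPLICATION CLIENT, SELECT ON *.* TO '$user'@'localhost';
EOF
}
```

The output can be piped into the client, for example `exporter_grant_sql exporter 'change-me' | mysql -uroot -p`, after which the [client] section would carry that user and password instead of root.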

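Collector flags used in the start command:

  • --collect.auto_increment.columns: auto-increment column information

  • --collect.binlog_size: binary log size

  • --collect.info_schema.innodb_metrics: InnoDB storage engine metrics

  • --collect.info_schema.processlist: currently running threads (process list)

  • --collect.info_schema.tables: table information from information_schema

  • --collect.info_schema.tablestats: table statistics from information_schema

  • --collect.slave_status: replication status

  • --collect.global_status: global MySQL status

  • --collect.global_variables: global MySQL variables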

Grafana dashboards: ID 18382 for MySQL, ID 7362 for MySQL replication.

Setting up redis_exporter

https://github.com/oliver006/redis_exporter/releases # redis_exporter releases page

wget https://github.com/oliver006/redis_exporter/releases/download/v1.54.0/redis_exporter-v1.54.0.linux-amd64.tar.gz # download
tar xfz redis_exporter-v1.54.0.linux-amd64.tar.gz -C /usr/local/prometheus && cd /usr/local/prometheus && mv redis_exporter-v1.54.0.linux-amd64 redis_exporter && cd redis_exporter # extract and rename
nohup ./redis_exporter --redis.addr 192.168.48.139:6379 & # start in the background

The exporter collects Redis metrics and listens on port 9121 by default.

Grafana dashboard for Redis: ID 11835.

On the monitoring server

Setting up Alertmanager

Download Alertmanager on the Prometheus server.

wget https://github.com/prometheus/alertmanager/releases/download/v0.26.0/alertmanager-0.26.0.linux-amd64.tar.gz # download the release
tar xfz alertmanager-0.26.0.linux-amd64.tar.gz -C /usr/local/ && cd /usr/local/ && mv alertmanager-0.26.0.linux-amd64 alertmanager # extract and rename
cp alertmanager/alertmanager.yml alertmanager/alertmanager.yml.bak # back up the config file
cd alertmanager && ./amtool check-config alertmanager.yml # amtool ships with Alertmanager and validates the config syntax

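Pointing Prometheus at Alertmanager

vim prometheus.yml # edit the alerting section of the main config file

Set the IP and port of the host running Alertmanager; rule_files selects the directory from which alert rules are loaded.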

alerting:
  alertmanagers:
    - static_configs:
        - targets:
          - 192.168.48.147:9093
rule_files:
  - "/usr/local/prometheus/rules/*.yml"

mkdir -p /usr/local/prometheus/rules/ # create the directory for alert rules
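The rules directory starts out empty, and without at least one rule file no alert ever fires. The following is a minimal sketch of such a file: the helper name `write_instance_down_rule`, the file name `instance_down.yml`, the alert name, the 1m hold time and the `critical` severity are all illustrative assumptions.

```shell
# write_instance_down_rule [RULES_DIR]
# Writes a rule that fires when any scrape target has been down for 1 minute.
write_instance_down_rule() {
    local dir="${1:-/usr/local/prometheus/rules}"
    # Quoted delimiter: keeps {{ $labels.instance }} literal for Prometheus.
    cat > "$dir/instance_down.yml" <<'EOF'
groups:
  - name: instance
    rules:
      - alert: InstanceDown
        expr: up == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Instance {{ $labels.instance }} is down"
EOF
}
```

After writing the file, `/usr/local/prometheus/promtool check rules /usr/local/prometheus/rules/instance_down.yml` can validate it before Prometheus is restarted.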

Configuring email alerts

vim alertmanager.yml

global:
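  resolve_timeout: 5m # time after which an alert that is no longer updated is treated as resolved; default 5m
  smtp_from: '793653518@qq.com'  # sender address
  smtp_smarthost: 'smtp.qq.com:25' # SMTP server and port
  smtp_auth_username: '793653518@qq.com' # SMTP login name
  smtp_auth_password: 'fpeqhpuyhjhobcbb' # mailbox authorization code
  smtp_require_tls: false
route:
  group_by: ['alertname'] # labels used to group alerts
  group_wait: 10s # how long to wait before sending the first notification for a new group
  group_interval: 10s # how long to wait before notifying about new alerts added to an existing group
  repeat_interval: 1m # how often a still-firing alert is resent; do not set this too low for email, or the SMTP server may reject the volume of messages
  receiver: 'email' # receiver to notify; must match a name under receivers
receivers:
- name: 'email' # receiver name referenced by route.receiver
  email_configs:  # email settings
  - to: 'lanchi0831@foxmail.com' # recipient address
    send_resolved: true # also notify when the alert is resolved
# An inhibition rule mutes alerts matching the target matchers while an alert matching the source matchers is firing. Both alerts must carry the same values for the labels listed under equal.
inhibit_rules: # inhibition rules
  - source_match: # matchers for the source (inhibiting) alert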
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'dev', 'instance']

Then:

./amtool check-config alertmanager.yml # validate the config file

nohup ./alertmanager --config.file=./alertmanager.yml & # start in the background; listens on 9093 (web/API) and 9094 (cluster)
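To confirm that notifications go out without waiting for a real failure, a hand-made alert can be posted to Alertmanager's v2 API. The following is a sketch under assumptions: `test_alert_payload` is a hypothetical helper, and the alert name and the `manual-test` instance label are invented for the test.

```shell
# test_alert_payload ALERTNAME SEVERITY
# Prints a one-alert JSON body for POST /api/v2/alerts.
test_alert_payload() {
    local name="$1" severity="$2"
    printf '[{"labels":{"alertname":"%s","severity":"%s","instance":"manual-test"}}]\n' \
        "$name" "$severity"
}
```

Piping the output to curl sends it, for example `test_alert_payload ManualTest warning | curl -s -XPOST -H 'Content-Type: application/json' -d @- http://192.168.48.147:9093/api/v2/alerts`; the alert should then appear in the Alertmanager web UI and reach the configured receiver after group_wait.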

Setting up DingTalk alerts

Download the webhook service from its project page:

https://github.com/timonwong/prometheus-webhook-dingtalk

wget https://github.com/timonwong/prometheus-webhook-dingtalk/releases/download/v2.1.0/prometheus-webhook-dingtalk-2.1.0.linux-amd64.tar.gz # download
tar xfz prometheus-webhook-dingtalk-2.1.0.linux-amd64.tar.gz -C /usr/local/alertmanager && cd /usr/local/alertmanager/prometheus-webhook-dingtalk-2.1.0.linux-amd64 # extract into the alertmanager directory and enter it
cp config.example.yml config.yml # create the config file from the bundled example

In the DingTalk group, open the webhook robot settings and copy its URL and signing secret.

vim config.yml # edit the config file

templates:
  - /data/ding.tmpl
targets:
  webhook1:
    url: https://oapi.dingtalk.com/robot/send?access_token=863a37411584a310ba7ef80bb0af2a3e766df34f52f84a306103ec586f63b2da
    secret: SEC566fc700806d1248b9da08877fb1a51fa4a24648ed59ec54595030497916fc40

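url is the robot's webhook address and secret is its signing secret. templates names the message template in use (the default one here) and can be changed.

nohup ./prometheus-webhook-dingtalk --config.file=config.yml & # start the webhook service

cd /usr/local/alertmanager && vim ./alertmanager.yml # configure Alertmanager to send to DingTalk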

global:
  resolve_timeout: 5m
  smtp_from: '793653518@qq.com'
  smtp_smarthost: 'smtp.qq.com:25'
  smtp_auth_username: '793653518@qq.com'
  smtp_auth_password: 'fpeqhpuyhjhobcbb'
  smtp_require_tls: false
route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 1m
  repeat_interval: 1m
  receiver: 'dingding'
receivers:
- name: 'dingding'
  webhook_configs:
  - send_resolved: true
    url: http://localhost:8060/dingtalk/webhook1/send # the address printed by the webhook service at startup
inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'dev', 'instance']

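./amtool check-config alertmanager.yml # validate the config file

nohup ./alertmanager --config.file=./alertmanager.yml & # start in the background; listens on 9093 (web/API) and 9094 (cluster)

To combine email and DingTalk, or any other set of notifiers, adjust the receivers: section: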

global:
  resolve_timeout: 5m
  smtp_from: '793653518@qq.com'
  smtp_smarthost: 'smtp.qq.com:25'
  smtp_auth_username: '793653518@qq.com'
  smtp_auth_password: 'fpeqhpuyhjhobcbb'
  smtp_require_tls: false
templates:
  - '/usr/local/alertmanager/*.tmpl'
route:
  receiver: default
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1m
receivers:
- name: 'default'
  email_configs:
  - to: '{{ template "email.to" . }}'
    html: '{{ template "email.to.html" . }}'
    send_resolved: true
  webhook_configs:
  - send_resolved: true
    url: "http://localhost:8060/dingtalk/webhook1/send"
- name: 'email'
  email_configs:
  - to: '{{ template "email.to" . }}'
    html: '{{ template "email.to.html" . }}'
    send_resolved: true
- name: 'dingding'
  webhook_configs:
  - send_resolved: true
    url: "http://localhost:8060/dingtalk/webhook1/send"
inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: 
      - alertname
      - dev
      - instance

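Because the default receiver carries both email_configs and webhook_configs, a single alert notifies through email and DingTalk at the same time.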
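The combined config references the templates "email.to" and "email.to.html", which must be defined in a .tmpl file under /usr/local/alertmanager/ for the config to load. The notes do not show that file, so the following is only a sketch of what it could contain: `write_email_template` is a hypothetical helper, `email.tmpl` is an assumed file name, and the HTML body is a bare-bones illustration.

```shell
# write_email_template DIR RECIPIENT
# Writes DIR/email.tmpl defining the two templates the config refers to.
write_email_template() {
    local dir="${1:-/usr/local/alertmanager}" recipient="$2"
    {
        printf '{{ define "email.to" }}%s{{ end }}\n' "$recipient"
        # Quoted delimiter: the Go template syntax is written out verbatim.
        cat <<'EOF'
{{ define "email.to.html" }}
{{ range .Alerts }}
<p><b>{{ .Labels.alertname }}</b> on {{ .Labels.instance }}: {{ .Status }}</p>
{{ end }}
{{ end }}
EOF
    } > "$dir/email.tmpl"
}
```

For instance, `write_email_template /usr/local/alertmanager ops@example.com` (a placeholder address) creates the file; running `./amtool check-config alertmanager.yml` afterwards shows whether the templates are found.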