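Here a Telegram bot is used to deliver alerts. First, search for BotFather in Telegram, create a bot with it, then look up the bot's API token and note it down. Next, get the chat ID of the group that should receive the alerts: search for the Get My ID bot and add it to the group, and it automatically reports the group's chat ID. Note this ID as well (your own bot must also be a member of the group to post there). Then prepare a Python script and fill in the token and chat ID: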

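import time

from flask import Flask, request
import requests

app = Flask(__name__)

TELEGRAM_BOT_TOKEN = 'your bot API token'
TELEGRAM_CHAT_ID = 'your chat ID, starts with -'
MAX_RETRIES = 3  # give up after this many failed attempts


@app.route('/alert', methods=['POST'])
def post_alert():
    payload = request.get_json()
    if send_telegram_message(payload):
        return '', 200
    # A non-2xx status tells Alertmanager the delivery failed, so it retries later
    return 'failed to deliver to Telegram', 502


def send_telegram_message(payload):
    receiver = payload['receiver']
    status = payload['status']
    # Build one text block per alert in the group
    messages = []
    for alert in payload['alerts']:
        labels = alert.get('labels', {})
        annotations = alert.get('annotations', {})
        # Not every alert defines these fields, so use defaults instead of raising KeyError
        text = (f"Receiver: {receiver}"
                f"\nStatus: {status}"
                f"\nAlert name: {labels.get('alertname', '')}"
                f"\nAlert status: {alert.get('status', '')}"
                f"\nSeverity: {labels.get('severity', '')}"
                f"\nDescription: {annotations.get('description', '')}"
                f"\nSummary: {annotations.get('summary', '')}"
                f"\nRunbook: {annotations.get('runbook_url', '')}")
        messages.append(text)
    # Join all alerts into a single message
    alerts_info = "\n---\n".join(messages)
    url = f'https://api.telegram.org/bot{TELEGRAM_BOT_TOKEN}/sendMessage'
    # No parse_mode: alert text often contains _ or * that Telegram's Markdown parser rejects
    body = {
        'chat_id': TELEGRAM_CHAT_ID,
        'text': alerts_info,
    }
    for attempt in range(1, MAX_RETRIES + 1):
        try:
            resp = requests.post(url, json=body, timeout=10)
        except requests.RequestException as e:
            # Print only the exception type: its message contains the URL, which embeds the token
            print(f"attempt {attempt}: request error {type(e).__name__}")
        else:
            # requests does not raise on HTTP errors, so check the status explicitly
            if resp.ok:
                return True
            # Telegram explains failures in the JSON body ("description" field)
            print(f"attempt {attempt}: Telegram returned {resp.status_code} {resp.text}")
        if attempt < MAX_RETRIES:
            time.sleep(3)
    return False


if __name__ == '__main__':
    app.run(port=5001, host="0.0.0.0")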

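The port is up to you; 5001 is used here. host sets the address the server listens on, and 0.0.0.0 means all network interfaces, so other machines can reach the script. Then run the script and check that it can post a message to the group. If it fails with a missing-module error, install the libraries first: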

pip3 install flask
pip3 install requests

Next, prepare a JSON payload and POST it to the script's address and port:

curl -X POST -H "Content-Type: application/json" -d '{"receiver":"telegram-webhook","status":"firing","alerts":[{"status":"firing","labels":{"alertname":"KubeControllerManagerDown","prometheus":"monitoring/k8s","severity":"critical"},"annotations":{"description":"KubeControllerManager has disappeared from Prometheus target discovery.","runbook_url":"https://runbooks.prometheus-operator.dev/runbooks/kubernetes/kubecontrollermanagerdown","summary":"Target disappeared from Prometheus target discovery."},"startsAt":"2024-02-17T02:43:32.85Z","endsAt":"0001-01-01T00:00:00Z","generatorURL":"http://prometheus-k8s-0:9090/graph?g0.expr=absent%28up%7Bjob%3D%22kube-controller-manager%22%7D+%3D%3D+1%29&g0.tab=1","fingerprint":"838121cc4ca56ab5"},{"status":"firing","labels":{"alertname":"KubeSchedulerDown","prometheus":"monitoring/k8s","severity":"critical"},"annotations":{"description":"KubeScheduler has disappeared from Prometheus target discovery.","runbook_url":"https://runbooks.prometheus-operator.dev/runbooks/kubernetes/kubeschedulerdown","summary":"Target disappeared from Prometheus target discovery."},"startsAt":"2024-02-17T02:43:21.848Z","endsAt":"0001-01-01T00:00:00Z","generatorURL":"http://prometheus-k8s-0:9090/graph?g0.expr=absent%28up%7Bjob%3D%22kube-scheduler%22%7D+%3D%3D+1%29&g0.tab=1","fingerprint":"92b2f1b7ee31decf"}],"groupLabels":{},"commonLabels":{"prometheus":"monitoring/k8s","severity":"critical"},"commonAnnotations":{"summary":"Target disappeared from Prometheus target discovery."},"externalURL":"http://alertmanager-main-1:9093","version":"4","truncatedAlerts":0}' http://<script-host>:5001/alert
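If curl is not available, the same kind of request can be sent with Python's standard library. This is a minimal sketch that assumes the script listens on 127.0.0.1:5001, so adjust WEBHOOK_URL for your setup. The names WEBHOOK_URL, build_test_request and send_test_alert are hypothetical. The payload is a trimmed single-alert version of the JSON above and keeps only the fields the script reads.

```python
import json
import urllib.request

# Hypothetical address: replace with the host and port where the Flask script runs
WEBHOOK_URL = "http://127.0.0.1:5001/alert"


def build_test_request(url: str) -> urllib.request.Request:
    """Build a POST request carrying a trimmed Alertmanager-style payload."""
    payload = {
        "receiver": "telegram-webhook",
        "status": "firing",
        "alerts": [{
            "status": "firing",
            "labels": {"alertname": "TestAlert", "severity": "warning"},
            "annotations": {
                "description": "Manual test sent from Python.",
                "summary": "Webhook smoke test.",
            },
        }],
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_test_alert(url: str = WEBHOOK_URL) -> int:
    """Send the test request and return the HTTP status code (urlopen raises on non-2xx)."""
    with urllib.request.urlopen(build_test_request(url), timeout=10) as resp:
        return resp.status
```

If the script is running, calling send_test_alert() should return 200 and the test alert should appear in the group.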

The request path must include /alert. Once the request is sent, the bot's alert message appears in the Telegram group.
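The test payload above bundles two alerts into one message, and a large alert group can bundle many more. Telegram documents a 4096-character limit for sendMessage text, so a long group notification could be rejected. Below is a minimal sketch that packs the per-alert texts into chunks under that limit. split_message and TELEGRAM_MAX_LEN are hypothetical names that are not part of the script. Python's len() counts code points; I assume that is close enough to Telegram's count for plain-text alerts, but pass a smaller limit if you are unsure.

```python
TELEGRAM_MAX_LEN = 4096  # documented limit for sendMessage text


def split_message(parts, sep="\n---\n", limit=TELEGRAM_MAX_LEN):
    """Greedily pack alert texts into chunks, none longer than `limit`."""
    chunks, current = [], ""
    for part in parts:
        # A single oversized alert is hard-cut so that no chunk exceeds the limit
        while len(part) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(part[:limit])
            part = part[limit:]
        candidate = part if not current else current + sep + part
        if len(candidate) <= limit:
            current = candidate
        else:
            # The next alert does not fit: close the current chunk and start a new one
            chunks.append(current)
            current = part
    if current:
        chunks.append(current)
    return chunks
```

In send_telegram_message you would then post each element of split_message(messages) instead of the single joined alerts_info string.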

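If the message arrives, the script works; it only needs to keep running on one server. Next, configure the Alertmanager receiver. I use the kube-prometheus stack, so I edit Alertmanager's configuration YAML directly (cat manifests/alertmanager-secret.yaml):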

apiVersion: v1
kind: Secret
metadata:
  labels:
    app.kubernetes.io/component: alert-router
    app.kubernetes.io/instance: main
    app.kubernetes.io/name: alertmanager
    app.kubernetes.io/part-of: kube-prometheus
    app.kubernetes.io/version: 0.25.0
  name: alertmanager-main
  namespace: monitoring
stringData:
  alertmanager.yaml: |-
    global:
      resolve_timeout: 5m
    
    route:
      group_by: ['alertname', 'job']
      group_wait: 10s
      group_interval: 10m
      repeat_interval: 1h
      receiver: 'telegram-bot-webhook'
    
    receivers:
    - name: 'telegram-bot-webhook'
      webhook_configs:
      - url: 'http://<script-server-ip>:5001/alert'
        send_resolved: true

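group_wait: after the first alert of a new group fires, this is how long Alertmanager waits to collect other alerts belonging to the same group. For example, 10s means Alertmanager waits 10 seconds after the first alert to see whether related alerts also fire, so that they can be sent together in one notification instead of many separate ones.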

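group_interval: once a notification for a group has been sent, this is how long Alertmanager waits before sending a notification about changes to that group (new alerts joining it, or alerts resolving). Here, 10m means that after a group has been notified, further changes to it are batched and sent at most once every 10 minutes.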

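repeat_interval: this is how long Alertmanager waits before re-sending a notification for a group that has already been notified and has not changed since. Here, 1h means an alert that keeps firing unchanged is re-notified about once an hour.

Adjust these three parameters to your needs. After editing, apply the file and restart the Alertmanager pods. When an alert fires, Alertmanager sends it to the script, which reformats it and forwards it to the Telegram group.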
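To see how the three timing parameters interact, here is a simplified, idealized model of when a single alert group gets notified. The group is first flushed group_wait after its first alert and then every group_interval. A flush sends only if the group changed since the last notification or repeat_interval has elapsed. This is my approximation of Alertmanager's behaviour, not its actual code. It ignores send latency, so at an exact boundary (such as exactly 3600s after the last send) a real Alertmanager may slip to the next group_interval tick. notification_times is a hypothetical name, and all times are in seconds.

```python
def notification_times(change_times, group_wait, group_interval, repeat_interval, horizon):
    """Return the seconds at which one alert group is notified (idealized model).

    change_times: seconds at which the group's alert set changes; the first entry
    is when the first alert arrives. Flushes happen at group_wait, then every
    group_interval; a flush notifies only if something changed since the last
    notification or repeat_interval has passed since it.
    """
    sent = []
    last_sent = None
    seen = 0  # number of changes already covered by a notification
    t = change_times[0] + group_wait
    while t <= horizon:
        pending = sum(1 for c in change_times if c <= t)
        changed = pending > seen
        due = last_sent is not None and t - last_sent >= repeat_interval
        if changed or due:
            sent.append(t)
            last_sent = t
            seen = pending
        t += group_interval
    return sent
```

With the values used above (10s, 10m, 1h) and one alert that keeps firing for two hours, notification_times([0], 10, 600, 3600, 7200) gives [10, 3610]: the first notification after group_wait, then a repeat on the first flush at least one hour later.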

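You can also run the script as a container inside Kubernetes. Create a directory, put the script in it as app.py, then write a Dockerfile to build an image: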

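FROM python:3.8-slim

# Set the working directory
WORKDIR /src

# Install the dependencies first so this layer is cached between rebuilds
RUN pip install --no-cache-dir flask requests

# Copy the current directory into the container's working directory
COPY . /src

# Flush Python output immediately so print() shows up in the container logs
ENV PYTHONUNBUFFERED=1

# Start the script (exec form, so python receives SIGTERM directly)
CMD ["python", "app.py"]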

Then build the image:

docker build -t <image-name>:<tag> .

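This command builds the image. Push it to any registry your cluster can pull from: a local/private registry, Docker Hub, or a cloud registry all work. Once it is pushed, create a Deployment and a Service in the cluster that reference the image: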

apiVersion: apps/v1
kind: Deployment
metadata:
  name: telegarm-hook
  namespace: monitoring
spec:
  replicas: 1 
  selector:
    matchLabels:
      app: telegarm-hook  
  template:
    metadata:
      labels:
        app: telegarm-hook
    spec:
      containers:
      - name: telegarm-hook
        image: <script-image>
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 5001
          name: http
        resources:
          requests:
            cpu: 50m
            memory: 100Mi
          limits:
            cpu: 50m
            memory: 100Mi
---
apiVersion: v1
kind: Service
metadata:
  name: telegarm-hook
  namespace: monitoring
spec:
  type: NodePort
  selector:
    app: telegarm-hook
  ports:
  - name: hook
    port: 5001
    targetPort: http

The YAML above is an example; adjust it to your cluster. When the script runs inside Kubernetes, the Alertmanager configuration has to change as well:

apiVersion: v1
kind: Secret
metadata:
  labels:
    app.kubernetes.io/component: alert-router
    app.kubernetes.io/instance: main
    app.kubernetes.io/name: alertmanager
    app.kubernetes.io/part-of: kube-prometheus
    app.kubernetes.io/version: 0.25.0
  name: alertmanager-main
  namespace: monitoring
stringData:
  alertmanager.yaml: |-
    global:
      resolve_timeout: 5m
    
    route:
      group_by: ['alertname', 'job']
      group_wait: 10s
      group_interval: 10m
      repeat_interval: 1h
      receiver: 'telegram-bot-webhook'
    
    receivers:
    - name: 'telegram-bot-webhook'
      webhook_configs:
      - url: 'http://telegarm-hook.monitoring.svc.cluster.local:5001/alert'
        send_resolved: true

type: Opaque

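It is the same file, except that url now uses the Service's cluster-internal address. After editing the configuration, apply it and delete the three Alertmanager pods; they restart with the new configuration.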
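The in-cluster URL follows the Kubernetes Service DNS pattern <service>.<namespace>.svc.<cluster-domain>. Below is a tiny sketch that assembles it. in_cluster_webhook_url is a hypothetical helper. cluster.local is the common default cluster domain, but your cluster may be configured differently. Because Alertmanager and the Service are both in the monitoring namespace, the short form http://telegarm-hook:5001/alert should also resolve through the pod's DNS search path.

```python
def in_cluster_webhook_url(service: str, namespace: str, port: int,
                           path: str = "/alert",
                           cluster_domain: str = "cluster.local") -> str:
    """Assemble the cluster-internal URL of a Service endpoint."""
    return f"http://{service}.{namespace}.svc.{cluster_domain}:{port}{path}"


if __name__ == "__main__":
    # Same value as the url in the alertmanager.yaml above
    print(in_cluster_webhook_url("telegarm-hook", "monitoring", 5001))
```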
