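Prometheus and Alertmanager run as containers in the same pod, which is managed by a Deployment controller. Alertmanager listens on port 9093 by default. Because the two containers share a pod (and therefore a network namespace), Prometheus can reach Alertmanager at localhost:9093 to send alert notifications. The alerting rules file, rules.yml, is mounted into the Prometheus container as a ConfigMap, and the notification receiver configuration is likewise mounted into the Alertmanager container as a ConfigMap. This walkthrough uses email to receive alert notifications; the details are in alertmanager.yml.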
Environment: Linux 3.10.0-693.el7.x86_64 x86_64 GNU/Linux
Platform: Kubernetes v1.10.5
Tip: the complete Prometheus and Alertmanager configuration is at the end of this document.
Tell Prometheus where to find the alerting rules. The rules are defined in rules.yml, which we mount into the /etc/prometheus directory as a ConfigMap:
rule_files:
  - /etc/prometheus/rules.yml
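The mount described above might look roughly like the following fragment of the Deployment's pod template. This is a sketch under stated assumptions, not the author's manifest: the ConfigMap name `prometheus-config-v0.1.0` is taken from prometheus-cm.yaml at the end of this document, while the container name, the image tag, and the volume name `prometheus-config` are hypothetical.

```yaml
# Fragment of spec.template.spec in the Prometheus Deployment (sketch).
containers:
- name: prometheus                      # assumed container name
  image: prom/prometheus:v2.3.1         # assumed image and tag
  args:
  - --config.file=/etc/prometheus/prometheus.yml
  ports:
  - containerPort: 9090                 # Prometheus default web port
  volumeMounts:
  - name: prometheus-config             # assumed volume name
    mountPath: /etc/prometheus          # directory referenced by rule_files
volumes:
- name: prometheus-config
  configMap:
    name: prometheus-config-v0.1.0      # ConfigMap holding prometheus.yml and rules.yml
```

Each key in a ConfigMap becomes a file in the mounted directory, so both prometheus.yml and rules.yml would appear under /etc/prometheus.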
Here we define an InstanceDown alert: once a host has been down (up == 0) for 1 minute, Prometheus fires the alert:
rules.yml: |
  groups:
  - name: example
    rules:
    - alert: InstanceDown
      expr: up == 0
      for: 1m
      labels:
        severity: page
      annotations:
        summary: "Instance {{ $labels.instance }} down"
        description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 1 minute."
Alertmanager listens on port 9093 by default. Since Prometheus and Alertmanager run in the same pod, Prometheus can reach it directly at localhost:9093:
alerting:
  alertmanagers:
  - static_configs:
    - targets: ["localhost:9093"]
This example uses email alerting: when Alertmanager receives an alert from Prometheus, it sends an alert email to the configured address. This configuration is also mounted as a ConfigMap, into the container that runs Alertmanager:
alertmanager.yml: |-
  global:
    smtp_smarthost: 'smtp.exmail.qq.com:465'
    smtp_from: 'xin.liu@woqutech.com'
    smtp_auth_username: 'xin.liu@woqutech.com'
    smtp_auth_password: 'xxxxxxxxxxxx'
    smtp_require_tls: false
  route:
    group_by: [alertname]
    group_wait: 30s
    group_interval: 5m
    repeat_interval: 10m
    receiver: default-receiver
  receivers:
  - name: 'default-receiver'
    email_configs:
    - to: '1148576125@qq.com'
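For illustration, the Alertmanager side of the pod could be wired up along these lines. This is only a sketch: the ConfigMap name `alertmanager` comes from alertmanager-cm.yaml at the end of this document, but the container name, the image tag, the volume name `alertmanager-config`, and the mount path /etc/alertmanager are assumptions. In a real manifest this container would sit in the same `containers` list as the Prometheus container.

```yaml
# Alertmanager container in the same pod template as Prometheus (sketch).
containers:
- name: alertmanager                    # assumed container name
  image: prom/alertmanager:v0.15.0      # assumed image and tag
  args:
  - --config.file=/etc/alertmanager/alertmanager.yml
  ports:
  - containerPort: 9093                 # port Prometheus targets via localhost
  volumeMounts:
  - name: alertmanager-config           # assumed volume name
    mountPath: /etc/alertmanager        # assumed mount path
volumes:
- name: alertmanager-config
  configMap:
    name: alertmanager                  # ConfigMap defined in alertmanager-cm.yaml
```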
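The configured alerting rules are visible in the Prometheus web UI.
To test the alert, shut down one of the host nodes.
The Prometheus web UI then shows that an InstanceDown alert has fired.
The Alertmanager web UI shows that Alertmanager has received the alert from Prometheus.
The configured mailbox receives the alert email sent by Alertmanager.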
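The checks above assume that both web UIs are reachable from a browser. The source does not show how they are exposed; one possible approach is a NodePort Service such as the sketch below. The Service name and the nodePort values are hypothetical, and the selector relies on the `app: prometheus` pod label from prometheus.yaml.

```yaml
# Hypothetical Service exposing the Prometheus and Alertmanager web UIs.
apiVersion: v1
kind: Service
metadata:
  name: prometheus            # assumed name
  namespace: kube-system
spec:
  type: NodePort
  selector:
    app: prometheus           # pod label set in prometheus.yaml
  ports:
  - name: prometheus
    port: 9090
    targetPort: 9090          # Prometheus default web port
    nodePort: 30090           # assumed; must fall within the cluster's NodePort range
  - name: alertmanager
    port: 9093
    targetPort: 9093          # Alertmanager default web port
    nodePort: 30093           # assumed
```

With a Service like this, the Prometheus UI would be served on port 30090 of any node and the Alertmanager UI on port 30093.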
node_exporter_daemonset.yaml
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: node-exporter
  namespace: kube-system
  labels:
    app: node_exporter
spec:
  selector:
    matchLabels:
      name: node_exporter
  template:
    metadata:
      labels:
        name: node_exporter
    spec:
      tolerations:
      - key: node-role.kubernetes.io/master
        effect: NoSchedule
      containers:
      - name: node-exporter
        image: alery/node-exporter:1.0
        ports:
        - name: node-exporter
          containerPort: 9100
          hostPort: 9100
        volumeMounts:
        - name: localtime
          mountPath: /etc/localtime
        - name: host
          mountPath: /host
          readOnly: true
      volumes:
      - name: localtime
        hostPath:
          path: /usr/share/zoneinfo/Asia/Shanghai
      - name: host
        hostPath:
          path: /
alertmanager-cm.yaml
kind: ConfigMap
apiVersion: v1
metadata:
  name: alertmanager
  namespace: kube-system
data:
  alertmanager.yml: |-
    global:
      smtp_smarthost: 'smtp.exmail.qq.com:465'
      smtp_from: 'xin.liu@woqutech.com'
      smtp_auth_username: 'xin.liu@woqutech.com'
      smtp_auth_password: 'xxxxxxxxxxxx'
      smtp_require_tls: false
    route:
      group_by: [alertname]
      group_wait: 30s
      group_interval: 5m
      repeat_interval: 10m
      receiver: default-receiver
    receivers:
    - name: 'default-receiver'
      email_configs:
      - to: '1148576125@qq.com'
prometheus-rbac.yaml
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
  name: prometheus
  namespace: kube-system
rules:
- apiGroups: [""]
  resources:
  - nodes
  - nodes/proxy
  - services
  - endpoints
  - pods
  verbs: ["get", "list", "watch"]
- nonResourceURLs: ["/metrics"]
  verbs: ["get"]
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: prometheus
  namespace: kube-system
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
- kind: ServiceAccount
  name: prometheus
  namespace: kube-system
prometheus-cm.yaml
kind: ConfigMap
apiVersion: v1
data:
  prometheus.yml: |
    rule_files:
    - /etc/prometheus/rules.yml
    alerting:
      alertmanagers:
      - static_configs:
        - targets: ["localhost:9093"]
    scrape_configs:
    - job_name: 'node'
      kubernetes_sd_configs:
      - role: pod
      relabel_configs:
      - source_labels: [__meta_kubernetes_pod_ip]
        action: replace
        target_label: __address__
        replacement: $1:9100
      - source_labels: [__meta_kubernetes_pod_host_ip]
        action: replace
        target_label: instance
      - source_labels: [__meta_kubernetes_pod_node_name]
        action: replace
        target_label: node_name
      - action: labelmap
        regex: __meta_kubernetes_pod_label_(name)
      - source_labels: [__meta_kubernetes_pod_label_name]
        regex: node_exporter
        action: keep
  rules.yml: |
    groups:
    - name: example
      rules:
      - alert: InstanceDown
        expr: up == 0
        for: 5m
        labels:
          severity: page
        annotations:
          summary: "Instance {{ $labels.instance }} down"
          description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 5 minutes."
      - alert: APIHighRequestLatency
        expr: api_http_request_latencies_second{quantile="0.5"} > 1
        for: 10m
        annotations:
          summary: "High request latency on {{ $labels.instance }}"
          description: "{{ $labels.instance }} has a median request latency above 1s (current value: {{ $value }}s)"
metadata:
  name: prometheus-config-v0.1.0
  namespace: kube-system
prometheus.yaml
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  namespace: kube-system
  name: prometheus
  labels:
    name: prometheus
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prometheus
  template:
    metadata:
      name: prometheus
      labels:
        app: prometheus
    spec:
      serviceAccountName: prometheus
      nodeSelector:
        node-role.kubernetes.io/master: ""
      tolerations:
      - effect: NoSchedule
        key: node-role.kubernetes.io/master
Reposted from: http://gvyto.baihongyu.com/