Preface: the "☆ Hands-on example" configurations in this article can be split apart and installed individually as needed.
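I. Environment Preparation
- Kubernetes cluster
  Make sure a Kubernetes cluster (version ≥ 1.20) is deployed and that kubectl is configured.
- Image registry
  Confirm that harbor.fq.com/prometheus/node-exporter:v1.8.2 and the Prometheus images are available in the private registry.
- Namespace
  The basic examples use the default namespace (the basic Node Exporter example uses kube-system). You can move everything to monitoring as needed, but then the namespace field must be updated in every YAML file. The hands-on examples already use monitoring.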
II. Create RBAC Permissions
Goal: grant Prometheus permission to access the Kubernetes API.
1. Create the ServiceAccount
# prometheus-serviceaccount.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
secrets:
  - name: prometheus-token
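Explanation:
- The prometheus ServiceAccount is the identity Prometheus uses to authenticate to the API server.
- The secrets field references a Secret (prometheus-token) that stores the access credentials. Since Kubernetes 1.24 such token Secrets are no longer generated automatically, which is why step 4 creates one explicitly. The Prometheus Pod itself uses the token that Kubernetes mounts at /var/run/secrets/kubernetes.io/serviceaccount/token.
2. Create the ClusterRole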
# prometheus-clusterrole.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus
rules:
  - apiGroups: [""]
    resources: ["nodes", "nodes/proxy", "nodes/metrics", "services", "endpoints", "pods"]
    verbs: ["get", "list", "watch"]
  - apiGroups: [""]
    resources: ["configmaps"]
    verbs: ["get"]
  - nonResourceURLs: ["/metrics"]
    verbs: ["get"]
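Explanation:
- Grants Prometheus read access (get, list, watch) to nodes, Services, Endpoints, Pods and related resources, which kubernetes_sd_configs needs for service discovery.
- Allows GET on the /metrics endpoint (a non-resource URL).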
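To make the rules above concrete, here is a minimal Python sketch of how RBAC rules are matched, assuming the ClusterRole has been loaded into plain dicts. It models only apiGroups, resources, verbs, nonResourceURLs and the `*` wildcard; `allows` and `allows_url` are hypothetical helpers, and the real API server authorizer handles more (resourceNames, aggregated roles, and so on).

```python
# Hypothetical helpers: a simplified model of Kubernetes RBAC rule matching.
RULES = [
    {"apiGroups": [""],
     "resources": ["nodes", "nodes/proxy", "nodes/metrics", "services", "endpoints", "pods"],
     "verbs": ["get", "list", "watch"]},
    {"apiGroups": [""], "resources": ["configmaps"], "verbs": ["get"]},
    {"nonResourceURLs": ["/metrics"], "verbs": ["get"]},
]


def _has(values, item):
    """True if item is listed or the list contains the '*' wildcard."""
    return "*" in values or item in values


def allows(rules, api_group, resource, verb):
    """Return True if any resource rule grants verb on api_group/resource."""
    for rule in rules:
        if "resources" not in rule:
            continue  # non-resource rules never match resource requests
        if (_has(rule.get("apiGroups", []), api_group)
                and _has(rule["resources"], resource)
                and _has(rule["verbs"], verb)):
            return True
    return False


def allows_url(rules, path, verb):
    """Return True if any non-resource rule grants verb on the URL path."""
    for rule in rules:
        urls = rule.get("nonResourceURLs", [])
        # RBAC allows exact paths or a trailing '*' as a prefix match
        matched = any(u == path or (u.endswith("*") and path.startswith(u[:-1]))
                      for u in urls)
        if matched and _has(rule["verbs"], verb):
            return True
    return False


# Pod discovery needs list/watch on core "pods"
print(allows(RULES, "", "pods", "watch"))        # True
print(allows(RULES, "", "configmaps", "list"))   # False: only "get" is granted
print(allows_url(RULES, "/metrics", "get"))      # True
```

A "forbidden" error in the Prometheus log usually means one of these lookups would return False for the verb and resource named in the message.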
3. Create the ClusterRoleBinding
# prometheus-clusterrolebinding.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus
subjects:
  - kind: ServiceAccount
    name: prometheus
    namespace: default
roleRef:
  kind: ClusterRole
  name: prometheus
  apiGroup: rbac.authorization.k8s.io
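Explanation:
- Binds the prometheus ClusterRole to the prometheus ServiceAccount so the permissions take effect. subjects[].namespace must be the namespace the ServiceAccount was created in (default here).
4. Generate the ServiceAccount Token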
# prometheus-token.yaml
apiVersion: v1
kind: Secret
metadata:
  name: prometheus-token
  annotations:
    kubernetes.io/service-account.name: prometheus
type: kubernetes.io/service-account-token
Apply the RBAC configuration:
kubectl apply -f prometheus-serviceaccount.yaml
kubectl apply -f prometheus-clusterrole.yaml
kubectl apply -f prometheus-clusterrolebinding.yaml
kubectl apply -f prometheus-token.yaml
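The token that Kubernetes writes into the prometheus-token Secret is a JWT whose `sub` claim names the ServiceAccount as `system:serviceaccount:<namespace>:<name>`, and that is the identity the ClusterRoleBinding matches. The sketch below decodes the payload for inspection only (it does not verify the signature), assuming you have extracted the token, for example with `kubectl get secret prometheus-token -o jsonpath='{.data.token}' | base64 -d`. `token_subject` is a hypothetical helper, and the demo token is built locally rather than taken from a cluster.

```python
import base64
import json


def _b64url_decode(segment: str) -> bytes:
    """Decode one base64url JWT segment, restoring the stripped '=' padding."""
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def token_subject(token: str) -> str:
    """Return the 'sub' claim of a JWT. The signature is NOT verified."""
    _header, payload, _signature = token.split(".")
    return json.loads(_b64url_decode(payload))["sub"]


def _b64url_json(obj: dict) -> str:
    """Encode a dict as an unpadded base64url JSON segment."""
    return base64.urlsafe_b64encode(json.dumps(obj).encode()).rstrip(b"=").decode()


# Illustrative, unsigned token built locally; a real one comes from the Secret
demo = ".".join([_b64url_json({"alg": "none"}),
                 _b64url_json({"sub": "system:serviceaccount:default:prometheus"}),
                 "signature"])
print(token_subject(demo))  # system:serviceaccount:default:prometheus
```

If the namespace in `sub` differs from `subjects[].namespace` in the ClusterRoleBinding, requests made with that token will be denied.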
☆ Hands-on example
cat prometheus-rabc0227.yaml
---
# 1. Create the monitoring namespace
apiVersion: v1
kind: Namespace
metadata:
  name: monitoring
---
# 2. Create the ServiceAccount used by Prometheus
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
  namespace: monitoring
---
# 3. Create the ClusterRole that defines Prometheus's permissions
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus
rules:
  - apiGroups: [""]
    resources:
      - nodes
      - nodes/metrics
      - services
      - endpoints
      - pods
    verbs: ["get", "list", "watch"]
  - apiGroups: [""]
    resources:
      - configmaps
    verbs: ["get"]
  - apiGroups: [""]
    resources:
      - nodes/proxy
    verbs: ["get", "list", "watch"]
  - apiGroups: ["networking.k8s.io"]
    resources:
      - ingresses
    verbs: ["get", "list", "watch"]
  - nonResourceURLs: ["/metrics"]
    verbs: ["get"]
---
# 4. Bind the ClusterRole to the ServiceAccount
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
  - kind: ServiceAccount
    name: prometheus
    namespace: monitoring
---
III. Deploy Node Exporter
Goal: run Node Exporter on every node to collect node resource metrics.
# node-exporter-daemonset.yml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-exporter
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: node-exporter
  template:
    metadata:
      labels:
        app: node-exporter
    spec:
      hostNetwork: true
      containers:
        - name: node-exporter
          image: harbor.fq.com/prometheus/node-exporter:v1.8.2
          args:
            - --path.rootfs=/host
          volumeMounts:
            - name: rootfs
              mountPath: /host
      volumes:
        - name: rootfs
          hostPath:
            path: /
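Explanation:
- The DaemonSet runs one Node Exporter Pod on each node (tainted nodes, such as control-plane nodes, additionally need matching tolerations, as in the hands-on example).
- hostNetwork: true puts the Pod on the node's network, so the metrics are exposed directly at <NodeIP>:9100.
- The hostPath volume mounts the host's root filesystem at /host (matching --path.rootfs=/host) so node-level data can be collected.
Deploy: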
kubectl apply -f node-exporter-daemonset.yml
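Once the DaemonSet is running, each node serves http://<NodeIP>:9100/metrics in the Prometheus text exposition format: one sample per line as `name{label="value",...} value`, plus `# HELP` and `# TYPE` comment lines. A minimal parser sketch, assuming simple label values (no escaped quotes or commas inside values); `parse_metrics` is a hypothetical helper and the sample text is illustrative, not captured output.

```python
import re

# metric name, optional {labels}, then the value (an optional timestamp may follow)
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse_metrics(text):
    """Yield (name, labels_dict, float_value) for each sample line."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and # HELP / # TYPE comments
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # not a sample line this simple parser understands
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        yield name, labels, float(value)


SAMPLE = """\
# HELP node_load1 1m load average.
# TYPE node_load1 gauge
node_load1 0.42
node_filesystem_avail_bytes{device="/dev/sda1",fstype="ext4",mountpoint="/"} 1.2e+10
"""

for name, labels, value in parse_metrics(SAMPLE):
    print(name, labels, value)
```

For anything beyond a quick check, an official client library parser is a safer choice than hand-rolled regexes.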
☆ Hands-on example
cat node-exporter-daemonset.yml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-exporter
  namespace: monitoring   # use the "monitoring" namespace
  labels:
    k8s-app: node-exporter
spec:
  selector:
    matchLabels:
      k8s-app: node-exporter
  template:
    metadata:
      labels:
        k8s-app: node-exporter
      annotations:
        prometheus.io/scrape: "true"   # allow Prometheus to scrape this Pod
        prometheus.io/port: "9100"     # Node Exporter port
    spec:
      hostNetwork: true   # use the host's network
      hostPID: true       # access the host's PID namespace
      tolerations:
        - effect: NoSchedule   # allow scheduling onto tainted nodes
          operator: Exists
        - effect: NoExecute
          operator: Exists
      securityContext:
        runAsNonRoot: true   # avoid running as root
        runAsUser: 65534     # run as the nobody user
      containers:
        - name: node-exporter
          image: harbor.fq.com/prometheus/node-exporter:v1.8.2   # replace with a trusted image address
          args:
            - --path.rootfs=/host/root   # rootfs path
            - --path.procfs=/host/proc   # procfs path
            - --path.sysfs=/host/sys     # sysfs path
            - --no-collector.wifi        # disable the WiFi collector
            - --no-collector.hwmon       # disable the hardware-monitoring collector
          ports:
            - containerPort: 9100
              protocol: TCP
          resources:   # resource requests and limits
            requests:
              memory: "30Mi"
              cpu: "100m"
            limits:
              memory: "50Mi"
              cpu: "200m"
          volumeMounts:   # mount host directories
            - name: proc
              mountPath: /host/proc
              readOnly: true
            - name: sys
              mountPath: /host/sys
              readOnly: true
            - name: rootfs
              mountPath: /host/root
              readOnly: true
      volumes:
        - name: proc
          hostPath:
            path: /proc
        - name: sys
          hostPath:
            path: /sys
        - name: rootfs
          hostPath:
            path: /
---
apiVersion: v1
kind: Service
metadata:
  name: node-exporter
  namespace: monitoring
  labels:
    k8s-app: node-exporter
  annotations:
    prometheus.io/scrape: 'true'   # allow Prometheus to scrape
    prometheus.io/port: '9100'     # scrape port
spec:
  selector:
    k8s-app: node-exporter
  ports:
    - name: metrics
      port: 9100
      protocol: TCP
      targetPort: 9100
  type: ClusterIP   # reachable only inside the cluster
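The prometheus.io/scrape and prometheus.io/port annotations have no built-in meaning to Kubernetes. Prometheus's kubernetes_sd_configs exposes every annotation as a meta label, replacing characters that are not valid in label names with underscores, which is why the relabel rules in the ConfigMaps refer to names like `__meta_kubernetes_service_annotation_prometheus_io_scrape`. A sketch of that name mapping, assuming the sanitization replaces every character outside `[a-zA-Z0-9_]` with `_`; `meta_label_for` is a hypothetical helper.

```python
import re

INVALID_LABEL_CHARS = re.compile(r"[^a-zA-Z0-9_]")


def meta_label_for(role: str, annotation: str) -> str:
    """Map a Kubernetes annotation key to the meta label Prometheus exposes.

    role is the object kind used in the label prefix, e.g. "pod" or "service".
    """
    sanitized = INVALID_LABEL_CHARS.sub("_", annotation)
    return f"__meta_kubernetes_{role}_annotation_{sanitized}"


print(meta_label_for("service", "prometheus.io/scrape"))
# __meta_kubernetes_service_annotation_prometheus_io_scrape
print(meta_label_for("pod", "prometheus.io/port"))
# __meta_kubernetes_pod_annotation_prometheus_io_port
```

This also explains why the DaemonSet (Pod annotations) and the Service (Service annotations) are picked up by different jobs: kubernetes-pods reads the `pod` form and kubernetes-service-endpoints reads the `service` form.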
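IV. Deploy Prometheus
Goal: deploy the Prometheus server and configure its scrape rules and persistent storage.
1. Create Persistent Storage (PV/PVC)
Depending on the cluster's storage type (for example NFS, Local PV, or cloud storage), create a PVC and mount it into Prometheus. If the cluster has no default StorageClass, set storageClassName or create a matching PV first.
Example (adjust to your environment):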
# prometheus-pvc.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: prometheus-data
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 50Gi
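The 50Gi request is only a starting point. The Prometheus storage documentation gives a rough capacity formula: needed_disk_space = retention_time_seconds × ingested_samples_per_second × bytes_per_sample, with roughly 1 to 2 bytes per sample after compression. A small calculator sketch; the ingestion rate is an assumption you should read from your own server, for example with `rate(prometheus_tsdb_head_samples_appended_total[1h])`.

```python
def estimate_disk_gib(retention_days: float, samples_per_second: float,
                      bytes_per_sample: float = 2.0) -> float:
    """Rough TSDB size estimate in GiB (defaults to the upper 2 bytes/sample)."""
    retention_seconds = retention_days * 24 * 60 * 60
    total_bytes = retention_seconds * samples_per_second * bytes_per_sample
    return total_bytes / 2**30


# Illustrative numbers: default 15-day retention, 10,000 samples/s ingested
print(round(estimate_disk_gib(15, 10_000), 2))  # 24.14
```

The formula covers the compacted blocks only, so leave extra headroom on the PVC for the WAL and for compaction.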
2. Create the Prometheus Deployment
# prometheus-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prometheus
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      serviceAccountName: prometheus
      containers:
        - name: prometheus
          image: prom/prometheus:v2.42.0
          args:
            - "--config.file=/etc/prometheus/prometheus.yml"
          ports:
            - containerPort: 9090
          volumeMounts:
            - name: config-volume
              mountPath: /etc/prometheus
            - name: data-volume
              mountPath: /prometheus
      volumes:
        - name: config-volume
          configMap:
            name: prometheus-config
        - name: data-volume
          persistentVolumeClaim:
            claimName: prometheus-data
☆ Hands-on example
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus
  namespace: monitoring   # target namespace
  labels:
    app: prometheus
spec:
  replicas: 1   # a single instance is common; use remote storage to improve availability
  selector:
    matchLabels:
      app: prometheus
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      serviceAccountName: prometheus   # ServiceAccount used for RBAC access
      containers:
        - name: prometheus
          image: harbor.fq.com/prometheus/prometheus:v3.1.0   # image from the private registry
          args:
            - --config.file=/etc/prometheus/prometheus.yml   # Prometheus config file
            - --storage.tsdb.path=/prometheus                # where TSDB data is stored
            - --web.console.templates=/etc/prometheus/consoles
            - --web.console.libraries=/etc/prometheus/console_libraries
          ports:
            - containerPort: 9090   # Prometheus web UI port
          resources:   # cap CPU and memory to avoid resource exhaustion
            requests:
              cpu: "500m"
              memory: "1Gi"
            limits:
              cpu: "1"
              memory: "2Gi"
          volumeMounts:
            - name: prometheus-config
              mountPath: /etc/prometheus   # config file mount point
            - name: prometheus-storage
              mountPath: /prometheus       # TSDB data path
            - name: file-sd
              mountPath: /apps/prometheus/file-sd.yaml   # file_sd target file
              # The file-sd volume is a single host file (type: File), so it is
              # mounted directly; a subPath would be resolved inside it and fail
      volumes:
        - name: prometheus-config
          configMap:
            name: prometheus-config   # Prometheus config from the ConfigMap
        - name: prometheus-storage
          # persistentVolumeClaim:   # use a PVC for persistence in production
          #   claimName: prometheus-pvc
          emptyDir: {}   # an empty directory is fine for testing (data is lost when the Pod is deleted)
        - name: file-sd
          hostPath:
            path: /root/file-sd.yaml   # file_sd target file on the host
            type: File                 # must already exist on the node running the Pod
---
apiVersion: v1
kind: Service
metadata:
  name: prometheus
  namespace: monitoring
  labels:
    app: prometheus
spec:
  type: NodePort   # in production, prefer a LoadBalancer or an Ingress
  ports:
    - port: 9090
      targetPort: 9090
      nodePort: 30090   # access the web UI through this NodePort
  selector:
    app: prometheus
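The hands-on Deployment mounts /root/file-sd.yaml from the host at /apps/prometheus/file-sd.yaml, and the file_sd job in the hands-on ConfigMap reads it (refresh_interval: 1m). The file is a list of target groups, each with `targets` and optional `labels`. A sketch that renders such a file from Python data, assuming plain string labels without single quotes; `render_file_sd`, the hosts and the `env` label are hypothetical. Because the volume is a hostPath, the file has to exist on whichever node the Prometheus Pod is scheduled to.

```python
def render_file_sd(groups):
    """Render file_sd target groups as YAML text (plain string labels only)."""
    lines = []
    for group in groups:
        targets = ", ".join(f"'{t}'" for t in group["targets"])
        lines.append(f"- targets: [{targets}]")
        labels = group.get("labels", {})
        if labels:
            lines.append("  labels:")
            for key, value in sorted(labels.items()):
                lines.append(f"    {key}: '{value}'")  # assumes no ' in values
    return "\n".join(lines) + "\n"


# Hypothetical hosts and label values, for illustration only
groups = [
    {"targets": ["10.0.0.11:9100", "10.0.0.12:9100"], "labels": {"env": "test"}},
    {"targets": ["10.0.0.21:9100"]},
]
print(render_file_sd(groups), end="")
```

Writing the output to /root/file-sd.yaml on that node should be enough for Prometheus to pick up the new targets without a restart, since file_sd re-reads the file.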
3. Create the Prometheus ConfigMap
# prometheus-configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
data:
  prometheus.yml: |
    global:
      scrape_interval: 15s
      evaluation_interval: 15s
    alerting:
      alertmanagers:
        - static_configs:
            - targets: ['alertmanager:9093']
    rule_files:
      - '/etc/prometheus/alert_rules.yml'
    scrape_configs:
      - job_name: 'prometheus'
        static_configs:
          - targets: ['localhost:9090']
      - job_name: 'node-exporter'
        static_configs:
          - targets: ['node-exporter:9100']
      - job_name: 'cadvisor'
        static_configs:
          - targets: ['cadvisor:8080']
      - job_name: 'pushgateway'
        static_configs:
          - targets: ['pushgateway:9091']
      - job_name: 'node-linux'
        static_configs:
          - targets: ['10.255.209.40:9100']
      - job_name: 'kubernetes-apiservers'
        kubernetes_sd_configs:
          # In-cluster credentials are used automatically; the ServiceAccount
          # directory contains no kubeconfig file, so kubeconfig_file is omitted
          - role: endpoints
        scheme: https
        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        relabel_configs:
          - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
            action: keep
            regex: default;kubernetes;https
      - job_name: 'kubernetes-nodes'
        kubernetes_sd_configs:
          - role: node
        # Node Exporter serves plain HTTP on 9100, so no scheme/TLS settings here
        relabel_configs:
          - source_labels: [__address__]
            regex: '(.*):10250'        # default kubelet port on each node
            replacement: '${1}:9100'   # Node Exporter listen port
            target_label: __address__
            action: replace
          - action: labelmap
            regex: __meta_kubernetes_node_label_(.+)
      - job_name: 'kubernetes-pods'
        kubernetes_sd_configs:
          - role: pod
            namespaces:
              names:
                - kube-system
                - default
        relabel_configs:
          - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
            action: keep
            regex: true
          - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
            action: replace
            target_label: __metrics_path__
            regex: (.+)
          - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scheme]
            action: replace
            target_label: __scheme__
            regex: (.+)
          # Scrape the port given in the prometheus.io/port annotation
          - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
            action: replace
            target_label: __address__
            regex: ([^:]+)(?::\d+)?;(\d+)
            replacement: $1:$2
      - job_name: 'kubernetes-service-endpoints'
        kubernetes_sd_configs:
          - role: endpoints
        # No job-level scheme/TLS: the prometheus.io/scheme annotation selects https
        # per Service, so plain-HTTP exporters such as Node Exporter keep working
        relabel_configs:
          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
            action: keep
            regex: true
          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
            action: replace
            target_label: __scheme__
            regex: (https?)
          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
            action: replace
            target_label: __metrics_path__
            regex: (.+)
          - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
            action: replace
            target_label: __address__
            regex: ([^:]+)(?::\d+)?;(\d+)
            replacement: $1:$2
          - action: labelmap
            regex: __meta_kubernetes_service_label_(.+)
          - source_labels: [__meta_kubernetes_namespace]
            action: replace
            target_label: kubernetes_namespace
          - source_labels: [__meta_kubernetes_service_name]
            action: replace
            target_label: kubernetes_service_name
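The address rewrite in the kubernetes-pods and kubernetes-service-endpoints jobs is easy to misread. For a `replace` action, Prometheus joins the `source_labels` values with `;` (the default separator), matches the fully anchored `regex` against that string and, on a match, writes the expanded `replacement` into `target_label`; without a match the target is left unchanged. A Python approximation of that single action, using `re.fullmatch` to mimic the anchoring and converting `$1`/`${1}` references to Python syntax; Prometheus itself uses Go's RE2, and `relabel_replace` is a hypothetical helper.

```python
import re


def relabel_replace(labels, source_labels, regex, replacement, target_label,
                    separator=";"):
    """Approximate Prometheus's `replace` relabel action on a label dict."""
    # Missing source labels count as empty strings
    value = separator.join(labels.get(name, "") for name in source_labels)
    match = re.fullmatch(regex, value)  # Prometheus anchors regexes at both ends
    if match is None:
        return dict(labels)  # no match: labels are left unchanged
    # Convert Prometheus-style $1 / ${1} references to Python's \g<1>
    py_replacement = re.sub(r"\$\{?(\d+)\}?", r"\\g<\1>", replacement)
    result = dict(labels)
    result[target_label] = match.expand(py_replacement)
    return result


target = {
    "__address__": "10.244.1.7:8080",  # illustrative discovered address
    "__meta_kubernetes_service_annotation_prometheus_io_port": "9100",
}
out = relabel_replace(
    target,
    ["__address__", "__meta_kubernetes_service_annotation_prometheus_io_port"],
    r"([^:]+)(?::\d+)?;(\d+)",
    "$1:$2",
    "__address__",
)
print(out["__address__"])  # 10.244.1.7:9100
```

When a Service has no prometheus.io/port annotation the joined string ends in `;`, the regex does not match, and the discovered address is scraped as-is.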
Apply the configuration:
kubectl apply -f prometheus-pvc.yaml
kubectl apply -f prometheus-configmap.yaml
kubectl apply -f prometheus-deployment.yaml
☆ Hands-on example
cat prometheus-configmap0227.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
  namespace: monitoring
data:
  prometheus.yml: |
    global:
      scrape_interval: 15s
      evaluation_interval: 15s
      scrape_timeout: 10s   # timeout so a hung target cannot stall the scrape
    alerting:
      alertmanagers:
        - static_configs:
            - targets: ['alertmanager:9093']
    rule_files:
      - '/etc/prometheus/alert_rules.yml'
    scrape_configs:
      # Prometheus's own metrics
      - job_name: 'prometheus'
        static_configs:
          - targets: ['localhost:9090']
      # Node Exporter metrics
      - job_name: 'node-exporter'
        static_configs:
          - targets: ['node-exporter:9100']
      # cAdvisor, Pushgateway and Alertmanager are not deployed in this article;
      # their targets stay DOWN until those services exist
      # cAdvisor metrics
      - job_name: 'cadvisor'
        static_configs:
          - targets: ['cadvisor:8080']
      # Pushgateway metrics
      - job_name: 'pushgateway'
        static_configs:
          - targets: ['pushgateway:9091']
      # Node Exporter on a specific host
      - job_name: 'node-linux'
        static_configs:
          - targets: ['10.255.209.40:9100']
      # Kubernetes API server metrics
      - job_name: 'kubernetes-apiservers'
        kubernetes_sd_configs:
          - role: endpoints
        scheme: https
        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
          insecure_skip_verify: true   # in production, turn this off and rely on the correct CA certificate
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        relabel_configs:
          - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
            action: keep
            regex: default;kubernetes;https
      # Kubernetes node metrics (via Node Exporter)
      - job_name: 'kubernetes-nodes'
        kubernetes_sd_configs:
          - role: node
        relabel_configs:
          - source_labels: [__address__]
            regex: '(.*):10250'
            replacement: '${1}:9100'   # swap the kubelet port for the Node Exporter port
            target_label: __address__
          - action: labelmap
            regex: __meta_kubernetes_node_label_(.+)
      # Kubernetes Pod metrics
      - job_name: 'kubernetes-pods'
        kubernetes_sd_configs:
          - role: pod
        relabel_configs:
          - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
            action: keep
            regex: true
          - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
            action: replace
            target_label: __metrics_path__
            regex: (.+)
          # The regex expects "<address>;<port>", so __address__ must be included
          - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
            action: replace
            target_label: __address__
            regex: ([^:]+)(?::\d+)?;(\d+)
            replacement: $1:$2
          - action: labelmap
            regex: __meta_kubernetes_pod_label_(.+)
          - source_labels: [__meta_kubernetes_namespace]
            action: replace
            target_label: kubernetes_namespace
          - source_labels: [__meta_kubernetes_pod_name]
            action: replace
            target_label: kubernetes_pod_name
      # Kubernetes Service endpoint metrics
      - job_name: 'kubernetes-service-endpoints'
        kubernetes_sd_configs:
          - role: endpoints
        # scheme: https
        # tls_config:
        #   ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        #   insecure_skip_verify: true   # in production, turn this off and configure the correct CA certificate
        # bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        relabel_configs:
          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
            action: keep
            regex: true
          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
            action: replace
            target_label: __scheme__
            regex: (https?)
          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
            action: replace
            target_label: __metrics_path__
            regex: (.+)
          - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
            action: replace
            target_label: __address__
            regex: ([^:]+)(?::\d+)?;(\d+)
            replacement: $1:$2
          - action: labelmap
            regex: __meta_kubernetes_service_label_(.+)
          - source_labels: [__meta_kubernetes_namespace]
            action: replace
            target_label: kubernetes_namespace
          - source_labels: [__meta_kubernetes_service_name]
            action: replace
            target_label: kubernetes_service_name
      # Note: this job discovers the same annotated Services as
      # kubernetes-service-endpoints, so those targets are scraped twice;
      # keep only one of the two jobs, or add a keep rule to narrow this one
      - job_name: 'kubernetes-nginx-endpoints'   # job name
        kubernetes_sd_configs:
          - role: endpoints   # discover Kubernetes Endpoints automatically
        relabel_configs:
          # Only scrape Services annotated with prometheus.io/scrape: "true"
          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
            action: keep
            regex: true
          # Set the scrape scheme (http or https)
          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
            action: replace
            target_label: __scheme__
            regex: (https?)
          # Set the metrics path (defaults to /metrics)
          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
            action: replace
            target_label: __metrics_path__
            regex: (.+)
          # Rewrite the scrape address and port
          - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
            action: replace
            target_label: __address__
            regex: ([^:]+)(?::\d+)?;(\d+)
            replacement: $1:$2
          # Map Kubernetes Service labels to Prometheus labels
          - action: labelmap
            regex: __meta_kubernetes_service_label_(.+)
          # Add the Kubernetes namespace label
          - source_labels: [__meta_kubernetes_namespace]
            action: replace
            target_label: kubernetes_namespace
          # Add the Kubernetes Service name label
          - source_labels: [__meta_kubernetes_service_name]
            action: replace
            target_label: kubernetes_service_name
          # Add the Kubernetes Pod name label
          - source_labels: [__meta_kubernetes_pod_name]
            action: replace
            target_label: kubernetes_pod_name
          # Add the Kubernetes node name label
          - source_labels: [__meta_kubernetes_pod_node_name]
            action: replace
            target_label: kubernetes_node_name
        # To scrape HTTPS endpoints, uncomment the following
        # scheme: https
        # tls_config:
        #   ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        #   insecure_skip_verify: true   # in production, turn this off and configure the correct CA certificate
        # bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      - job_name: 'kube-state-metrics'
        kubernetes_sd_configs:
          - role: endpoints
            namespaces:
              names:
                - kube-system
                - monitoring
                - default
        relabel_configs:
          - source_labels: [__meta_kubernetes_service_label_app_kubernetes_io_name]
            action: keep
            regex: kube-state-metrics
          - source_labels: [__meta_kubernetes_endpoint_port_name]
            action: keep
            regex: http-metrics
        metrics_path: /metrics
        scheme: http
      - job_name: "file_sd"
        file_sd_configs:
          - files:
              - /apps/prometheus/file-sd.yaml
            refresh_interval: 1m
      - job_name: 'redis'
        kubernetes_sd_configs:
          - role: endpoints   # discover services from Kubernetes Endpoints
        relabel_configs:
          # Only scrape Services annotated with prometheus.io/scrape: "true"
          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
            action: keep
            regex: true
          # Keep only endpoints backed by Pods whose name contains "redis"
          - source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
            action: keep
            regex: Pod;(.*redis.*)
          # Rewrite the target address to the Pod IP on port 9121 (Redis Exporter)
          - source_labels: [__meta_kubernetes_pod_ip]
            action: replace
            target_label: __address__
            replacement: $1:9121
          # Add the Service's app label
          - source_labels: [__meta_kubernetes_service_label_app]
            action: replace
            target_label: app
          # Add the Kubernetes namespace label
          - source_labels: [__meta_kubernetes_namespace]
            action: replace
            target_label: namespace
          # Add the Kubernetes Service name label
          - source_labels: [__meta_kubernetes_service_name]
            action: replace
            target_label: service
          # Add the Kubernetes Pod name label
          - source_labels: [__meta_kubernetes_pod_name]
            action: replace
            target_label: pod
          # Add the Kubernetes node name label
          - source_labels: [__meta_kubernetes_pod_node_name]
            action: replace
            target_label: node
          # Set the instance label (distinguishes Redis instances)
          - source_labels: [__meta_kubernetes_pod_ip]
            action: replace
            target_label: instance
      - job_name: 'mysql'
        kubernetes_sd_configs:
          - role: endpoints   # discover services from Kubernetes Endpoints
        relabel_configs:
          # Only scrape Services annotated with prometheus.io/scrape: "true"
          - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
            action: keep
            regex: true
          # Keep only endpoints backed by Pods whose name contains "mysql-exporter"
          - source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
            action: keep
            regex: Pod;(.*mysql-exporter.*)
          # Rewrite the target address to the Pod IP on port 9104 (MySQL Exporter)
          - source_labels: [__meta_kubernetes_pod_ip]
            action: replace
            target_label: __address__
            replacement: $1:9104
          # Add the Service's app label
          - source_labels: [__meta_kubernetes_service_label_app]
            action: replace
            target_label: app
          # Add the Kubernetes namespace label
          - source_labels: [__meta_kubernetes_namespace]
            action: replace
            target_label: namespace
          # Add the Kubernetes Service name label
          - source_labels: [__meta_kubernetes_service_name]
            action: replace
            target_label: service
          # Add the Kubernetes Pod name label
          - source_labels: [__meta_kubernetes_pod_name]
            action: replace
            target_label: pod
          # Add the Kubernetes node name label
          - source_labels: [__meta_kubernetes_pod_node_name]
            action: replace
            target_label: node
          # Set the instance label (distinguishes MySQL instances)
          - source_labels: [__meta_kubernetes_pod_ip]
            action: replace
            target_label: instance
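The hands-on jobs also rely on two other relabel actions: `keep`, which drops a target unless the joined source values fully match `regex`, and `labelmap`, which copies every label whose name matches `regex` to the name captured by group 1. A Python sketch of both, applied to an illustrative target of the redis job; Python's `re` stands in for Go's RE2 here, and `relabel_keep` and `relabel_labelmap` are hypothetical helpers.

```python
import re


def relabel_keep(labels, source_labels, regex, separator=";"):
    """Return True if the target survives a `keep` action."""
    value = separator.join(labels.get(name, "") for name in source_labels)
    return re.fullmatch(regex, value) is not None


def relabel_labelmap(labels, regex):
    """Copy labels whose names fully match regex to the name in group 1."""
    result = dict(labels)
    for name, value in labels.items():
        match = re.fullmatch(regex, name)
        if match:
            result[match.group(1)] = value
    return result


target = {  # illustrative discovery labels for one endpoint
    "__meta_kubernetes_endpoint_address_target_kind": "Pod",
    "__meta_kubernetes_endpoint_address_target_name": "redis-0",
    "__meta_kubernetes_service_label_app": "redis",
}
kind_and_name = ["__meta_kubernetes_endpoint_address_target_kind",
                 "__meta_kubernetes_endpoint_address_target_name"]
print(relabel_keep(target, kind_and_name, r"Pod;(.*redis.*)"))  # True
mapped = relabel_labelmap(target, r"__meta_kubernetes_service_label_(.+)")
print(mapped["app"])  # redis
```

After relabeling, Prometheus drops all labels starting with `__`, so only copies such as `app` end up on the stored series.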
4. Expose the Prometheus Service
# prometheus-service.yaml
apiVersion: v1
kind: Service
metadata:
  name: prometheus
spec:
  type: NodePort
  ports:
    - port: 9090
      targetPort: 9090
      nodePort: 30090
  selector:
    app: prometheus
Apply the Service:
kubectl apply -f prometheus-service.yaml
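Before opening the UI, you can check that the server answers on the NodePort. Prometheus exposes a readiness endpoint at `/-/ready` that returns HTTP 200 once it can serve traffic. A stdlib sketch, assuming the NodePort 30090 from the Service above; `ready_url` and `prometheus_ready` are hypothetical helpers and `<NodeIP>` is a placeholder.

```python
import urllib.error
import urllib.request


def ready_url(base_url: str) -> str:
    """Build the readiness URL from a base such as http://<NodeIP>:30090."""
    return base_url.rstrip("/") + "/-/ready"


def prometheus_ready(base_url: str, timeout: float = 5.0) -> bool:
    """Return True if GET <base_url>/-/ready answers HTTP 200."""
    try:
        with urllib.request.urlopen(ready_url(base_url), timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False  # refused, timed out, or an HTTP error status such as 503


# Replace <NodeIP> with one of your node addresses before running:
# print(prometheus_ready("http://<NodeIP>:30090"))
```

A False result right after deployment can simply mean the server is still replaying its WAL; retry after a short wait before troubleshooting further.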
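V. Verify the Deployment
- Check Pod status:
  kubectl get pods -l app=prometheus -n default
  kubectl get pods -n kube-system -l app=node-exporter
  (For the hands-on manifests, use -n monitoring and the label k8s-app=node-exporter.)
  - Expected result: every Pod is Running.
- Open the Prometheus UI:
  Browse to http://<NodeIP>:30090 to reach the Prometheus console.
  - On the Status > Targets page, confirm that the kubernetes-nodes and kubernetes-pods jobs are UP.
  - Run the query up{job="kubernetes-nodes"} to confirm that metrics are being scraped.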
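The same query can be scripted against the HTTP API: `GET /api/v1/query?query=...` returns JSON shaped like `{"status": "success", "data": {"resultType": "vector", "result": [{"metric": {...}, "value": [<timestamp>, "<value>"]}]}}`. A sketch that builds the query URL and lists the instances whose `up` value is 0; `query_url` and `down_instances` are hypothetical helpers, and the response body below is illustrative rather than captured from a cluster.

```python
import json
import urllib.parse


def query_url(base_url: str, promql: str) -> str:
    """Build an instant-query URL for the Prometheus HTTP API."""
    return base_url.rstrip("/") + "/api/v1/query?" + urllib.parse.urlencode({"query": promql})


def down_instances(response: dict) -> list:
    """Return the sorted instance labels of series whose value is "0"."""
    if response.get("status") != "success":
        raise ValueError(f"query failed: {response.get('error')}")
    return sorted(
        series["metric"].get("instance", "?")
        for series in response["data"]["result"]
        if series["value"][1] == "0"
    )


print(query_url("http://<NodeIP>:30090", 'up{job="kubernetes-nodes"}'))

# Illustrative response body (not real output)
body = json.loads("""{"status": "success", "data": {"resultType": "vector", "result": [
  {"metric": {"job": "kubernetes-nodes", "instance": "node-1"}, "value": [1700000000, "1"]},
  {"metric": {"job": "kubernetes-nodes", "instance": "node-2"}, "value": [1700000000, "0"]}]}}""")
print(down_instances(body))  # ['node-2']
```

Values come back as strings, which is why the comparison is against `"0"` rather than the number 0.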
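VI. Troubleshooting
- Permission problems
  - Example error: Failed to list *v1.Pod: forbidden
  - Fix: check that the ClusterRoleBinding binds the ClusterRole to the correct ServiceAccount and namespace.
- Node Exporter not running
  - Check that the DaemonSet has a Pod on every node and that the image pulls without errors.
- Prometheus cannot scrape metrics
  - Check that scrape_configs point at the correct ports (Node Exporter listens on 9100 by default).
  - Verify network connectivity: kubectl exec -n monitoring deploy/prometheus -- wget -qO- http://<NodeIP>:9100/metrics (the official Prometheus image is busybox-based, so wget is available while curl usually is not).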
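When running a command inside the Prometheus Pod is inconvenient, a plain TCP connect test from another host on the cluster network shows whether <NodeIP>:9100 is reachable at all. It does not prove the HTTP endpoint is healthy, only that something accepts connections there. A stdlib sketch; `tcp_reachable` is a hypothetical helper and `<NodeIP>` is a placeholder.

```python
import socket


def tcp_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False  # refused, unreachable, DNS failure, or timeout


# Replace <NodeIP> with a real node address; 9100 is Node Exporter's default port
# print(tcp_reachable("<NodeIP>", 9100))
```

A refused connection usually points at the exporter not running on that node, while a timeout more often points at a firewall or network policy.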
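VII. Next Steps
- Configure Alertmanager: add alerting rules and integrate Alertmanager to send notifications.
- Improve persistent storage: use a highly available storage backend (such as Ceph or Longhorn) to keep the data reliable.
- Monitoring dashboards: deploy Grafana, add Prometheus as a data source, and build dashboards.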