我们在k8s集群中使用云原生的promethues通常需要用到coreos的prometheus-operater,它可以方便的帮助我们在k8s中部署和配置使用prometheus。但prometheus并不是开箱即用的,如果要做到开箱即用的监控全家桶,官方提供了两个选择,分别是prometheus-operater helm chart和kube-prometheus。这两者都可以为我们提供开箱即用的方式部署promethues+alertmanager+promethues-push-gateway(kube-promethueus不包含,需要单独部署)+grafana全家桶,同时包含kubernetes-mixin的一整套报警规则和node-exporter,kube-state-metrics等一系列metrics exporter。区别在于helm chart由社区维护,而kube-promethues由coreos维护。这里我们将以kube-prometheus为例,简要说明配置和使用方式。
首先是部署,还是非常简单的,我们先将kube-prometheus的仓库clone下来
git clone https://github.com/coreos/kube-prometheus.git
然后根据官方文档操作即可
$ kubectl create -f manifests/ # It can take a few seconds for the above 'create manifests' command to fully create the following resources, so verify the resources are ready before proceeding. $ until kubectl get customresourcedefinitions servicemonitors.monitoring.coreos.com ; do date; sleep 1; echo ""; done $ until kubectl get servicemonitors --all-namespaces ; do date; sleep 1; echo ""; done $ kubectl apply -f manifests/ # This command sometimes may need to be done twice (to workaround a race condition).
这里将自动为我们部署prometheus,alertmanager和grafana。我们接下来可以通过port-forward也可以通过ingress将服务暴露出来
Prometheus $ kubectl --namespace monitoring port-forward svc/prometheus-k8s 9090 Then access via http://localhost:9090 Grafana $ kubectl --namespace monitoring port-forward svc/grafana 3000 Then access via http://localhost:3000 and use the default grafana user:password of admin:admin. Alert Manager $ kubectl --namespace monitoring port-forward svc/alertmanager-main 9093 Then access via http://localhost:9093
或者编写ingress
# ingress-monitor.yaml apiVersion: extensions/v1beta1 kind: Ingress metadata: name: monitoring-ingress annotations: nginx.ingress.kubernetes.io/ssl-redirect: "false" namespace: monitoring spec: rules: - host: k8s-prometheus.calmkart.com http: paths: - path: / backend: serviceName: prometheus-k8s servicePort: 9090 - host: k8s-grafana.calmkart.com http: paths: - path: / backend: serviceName: grafana servicePort: 3000 - host: k8s-alertmanager.calmkart.com http: paths: - path: / backend: serviceName: alertmanager-main servicePort: 9093 # kubectl apply -f ingress-monitor.yaml
然后我们就可以访问到prometheus,alertmanager和grafana的服务页面了
这里prometheus已经集成了一些k8s相关的exporter和kubernetes-mixin的报警规则,我们可以从prometheus的status->rules和status->target中查看到。
接下来,我们部署push-gateway
#可以参考我这里NodePort的values参数,也可以自行设置 # values.yaml # Default values for prometheus-pushgateway. # This is a YAML-formatted file. # Declare variables to be passed into your templates. image: repository: prom/pushgateway tag: v0.9.0 pullPolicy: IfNotPresent service: type: NodePort port: 9091 targetPort: 9091 # Optional pod annotations podAnnotations: {} # Optional pod labels podLabels: {} # Optional service labels serviceLabels: {} # Optional serviceAccount labels serviceAccountLabels: {} # Optional additional arguments extraArgs: [] # Optional additional environment variables extraVars: [] resources: {} # We usually recommend not to specify default resources and to leave this as a conscious # choice for the user. This also increases chances charts run on environments with little # resources, such as Minikube. If you do want to specify resources, uncomment the following # lines, adjust them as necessary, and remove the curly braces after 'resources:'. # limits: # cpu: 200m # memory: 50Mi # requests: # cpu: 100m # memory: 30Mi serviceAccount: # Specifies whether a ServiceAccount should be created create: true # The name of the ServiceAccount to use. # If not set and create is true, a name is generated using the fullname template name: ## Configure ingress resource that allow you to access the ## pushgateway installation. Set up the URL ## ref: http://kubernetes.io/docs/user-guide/ingress/ ## ingress: ## Enable Ingress. ## enabled: false ## Annotations. ## # annotations: # kubernetes.io/ingress.class: nginx # kubernetes.io/tls-acme: 'true' ## Hostnames. ## Must be provided if Ingress is enabled. ## # hosts: # - pushgateway.domain.com ## TLS configuration. ## Secrets must be manually created in the namespace. ## # tls: # - secretName: pushgateway-tls # hosts: # - pushgateway.domain.com tolerations: {} # - effect: NoSchedule # operator: Exists ## Node labels for pushgateway pod assignment ## Ref: https://kubernetes.io/docs/user-guide/node-selection/ ## nodeSelector: {} replicaCount: 1 ## Affinity for pod assignment ## Ref: https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity affinity: {} # Enable this if you're using https://github.com/coreos/prometheus-operator serviceMonitor: enabled: true namespace: monitoring # fallback to the prometheus default unless specified # interval: 10s ## Defaults to what's used if you follow CoreOS [Prometheus Install Instructions](https://github.com/helm/charts/tree/master/stable/prometheus-operator#tldr) ## [Prometheus Selector Label](https://github.com/helm/charts/tree/master/stable/prometheus-operator#prometheus-operator-1) ## [Kube Prometheus Selector Label](https://github.com/helm/charts/tree/master/stable/prometheus-operator#exporters) selector: prometheus: kube-prometheus # Retain the job and instance labels of the metrics pushed to the Pushgateway # [Scraping Pushgateway](https://github.com/prometheus/pushgateway#configure-the-pushgateway-as-a-target-to-scrape) honorLabels: true # The values to set in the PodDisruptionBudget spec (minAvailable/maxUnavailable) # If not set then a PodDisruptionBudget will not be created podDisruptionBudget: # helm install --name push-gateway -f values.yaml stable/prometheus-pushgateway
然后我们来试着接入内部和外部的prometheus监控target.
1.实现接入内部的target
无论是外部或内部的target都需要一个metrics-server目标,对于内部target而言,一般是一个服务,比如服务calm-server
在prometheus-operater的使用方式中,有一个crd叫serviceMonitor,我们创建一个新的serviceMonitor就创建了一个prometheus的target
我们首先查看monitoring命名空间中已有的serviceMonitor(既prometheus target)
[xxx@xxxxx]# kubectl get servicemonitors.monitoring.coreos.com -n monitoring NAME AGE alertmanager 23d coredns 23d grafana 23d ingress-nginx 17d kube-apiserver 23d kube-controller-manager 23d kube-scheduler 23d kube-state-metrics 23d kubelet 23d node-exporter 23d prometheus 23d prometheus-operator 23d prometheus-pushgateway 19d
我们创建一个新的serviceMonitor,将calm-server的/metrics作为target
# calm-server-serviceMonitor.yaml
apiVersion: monitoring.coreos.com/v1 kind: ServiceMonitor metadata: name: calm-server-metrics labels: k8s-app: calm-server-metrics namespace: monitoring spec: namespaceSelector: any: true selector: matchLabels: app: calm-server endpoints: - port: web interval: 10s honorLabels: true # kubectl apply -f calm-server-serviceMonitor.yaml
这里有几个需要注意的tips:
1.如果service和prometheus不再同一个命名空间,需要设置namespaceSelector,可以单独设置需要搜索的namespace,也可以像上面那样设置为全局搜索。
2.其次,endpoints中设置的port会按照interval设置的时间定时去:/metrics拉取数据,所以对应的服务需要提供相应的metrics(官方有非常多的exporter提供使用,可以参考https://prometheus.io/docs/instrumenting/exporters/)
我们接下来再查看所有serviceMonitor crd api对象,就会发现创建成功,同时prometheus的state->target中也会出现对应的target,我们也可以在prometheus中查询到对应的数据了。
2.实现接入外部target
实现了接入内部的target,那么,k8s集群外部的服务想要接入该怎么办呢?当然还是通过监控k8s集群内service的方式,不是service对应的是一个外部的endpoint对象。下面我们将以calm-server服务为例,说明如何通过k8s的endpoint外部对象接入外部target监控。
首先我们创建一个endpoint对象
# calm-server-endpoint.yaml apiVersion: v1 kind: Endpoints metadata: name: calm-server-metrics subsets: - addresses: - ip: x.x.x.x ports: - name: metrics port: xxxx protocol: TCP # kubectl apply -f calm-server-endpoint.yaml
我们将ip和port替换成我们的外部服务,apply后就创建了一个endpoint api对象,我们可以通过kubectl get endpoints查看
[xxx@xxxxxxx]# kubectl get endpoints NAME ENDPOINTS AGE calm-server-metrics 10.41.13.17:6789 19d kubernetes 10.1.33.159:6443 92d push-gateway-prometheus-pushgateway 10.240.224.15:9091 19d
这里需要注意的是,endpoint对象是不区分namespaces的
接着,我们创建一个service,service的选择器选择calm-server-metrics这个外部endpoint
# calm-server-metrics-service.yaml apiVersion: v1 kind: Service metadata: name: calm-server-metrics labels: app: calm-server-metrics spec: type: ExternalName externalName: x.x.x.x clusterIP: "" ports: - name: metrics port: xxxx protocol: TCP targetPort: 6789 # kubectl apply -f calm-server-metrics-service.yaml
我们这个时候访问这个服务,就等于访问了外部的endpoint,既外部服务
我们就可以像上面创建内部服务的serviceMonitor一样,创建serviceMonitor api对象从而更新prometheus target列表了。
接下来我们会有一个问题,如何修改这一系列全家桶的配置呢?比如prometheus的配置,grafana的配置。比如我们需要修改smtp报警配置该怎么办呢?如果是非云原生环境,我们可以直接修改配置文件即可,但在云原生环境中不一样。
kube-prometheus官方文档推荐的方式是使用jsonnet对官方库做编译修改。那么如何直接通过修改yaml的方式修改配置文件呢?下面将分别介绍如何修改全家桶中的各个配置文件。
1.首先是prometheus的alert rules,我们可以通过修改prometheus-rules.yaml文件修改。
2.其次是alertmanager的config,我们可以修改alertmanager-secret.yaml文件,注意,这是一个secret对象,内容经过了base64加密,我们应该先将内容解密再做修改,修改后再加密替换即可。
3.最后是grafana的配置修改,参考了grafana官方的docker image之后,我们可以先修改grafana-deployment.yaml文件,为其增加一个volume,配置如下
apiVersion: apps/v1beta2 kind: Deployment metadata: labels: app: grafana name: grafana namespace: monitoring spec: replicas: 1 selector: matchLabels: app: grafana template: metadata: labels: app: grafana spec: containers: - image: grafana/grafana:6.2.2 name: grafana ports: - containerPort: 3000 name: http readinessProbe: httpGet: path: /api/health port: http resources: limits: cpu: 200m memory: 200Mi requests: cpu: 100m memory: 100Mi volumeMounts: - mountPath: /var/lib/grafana name: grafana-storage readOnly: false - mountPath: /etc/grafana/provisioning/datasources name: grafana-datasources readOnly: false - mountPath: /etc/grafana/provisioning/dashboards name: grafana-dashboards readOnly: false - mountPath: /grafana-dashboard-definitions/0/apiserver name: grafana-dashboard-apiserver readOnly: false - mountPath: /grafana-dashboard-definitions/0/controller-manager name: grafana-dashboard-controller-manager readOnly: false - mountPath: /grafana-dashboard-definitions/0/k8s-cluster-rsrc-use name: grafana-dashboard-k8s-cluster-rsrc-use readOnly: false - mountPath: /grafana-dashboard-definitions/0/k8s-node-rsrc-use name: grafana-dashboard-k8s-node-rsrc-use readOnly: false - mountPath: /grafana-dashboard-definitions/0/k8s-resources-cluster name: grafana-dashboard-k8s-resources-cluster readOnly: false - mountPath: /grafana-dashboard-definitions/0/k8s-resources-namespace name: grafana-dashboard-k8s-resources-namespace readOnly: false - mountPath: /grafana-dashboard-definitions/0/k8s-resources-pod name: grafana-dashboard-k8s-resources-pod readOnly: false - mountPath: /grafana-dashboard-definitions/0/k8s-resources-workload name: grafana-dashboard-k8s-resources-workload readOnly: false - mountPath: /grafana-dashboard-definitions/0/k8s-resources-workloads-namespace name: grafana-dashboard-k8s-resources-workloads-namespace readOnly: false - mountPath: /grafana-dashboard-definitions/0/kubelet name: grafana-dashboard-kubelet readOnly: false - mountPath: /grafana-dashboard-definitions/0/nodes name: grafana-dashboard-nodes readOnly: false - mountPath: /grafana-dashboard-definitions/0/persistentvolumesusage name: grafana-dashboard-persistentvolumesusage readOnly: false - mountPath: /grafana-dashboard-definitions/0/pods name: grafana-dashboard-pods readOnly: false - mountPath: /grafana-dashboard-definitions/0/prometheus-remote-write name: grafana-dashboard-prometheus-remote-write readOnly: false - mountPath: /grafana-dashboard-definitions/0/prometheus name: grafana-dashboard-prometheus readOnly: false - mountPath: /grafana-dashboard-definitions/0/proxy name: grafana-dashboard-proxy readOnly: false - mountPath: /grafana-dashboard-definitions/0/scheduler name: grafana-dashboard-scheduler readOnly: false - mountPath: /grafana-dashboard-definitions/0/statefulset name: grafana-dashboard-statefulset readOnly: false - mountPath: /etc/grafana name: config-custom nodeSelector: beta.kubernetes.io/os: linux securityContext: runAsNonRoot: true runAsUser: 65534 serviceAccountName: grafana volumes: - emptyDir: {} name: grafana-storage - name: grafana-datasources secret: secretName: grafana-datasources - configMap: name: grafana-dashboards name: grafana-dashboards - configMap: name: grafana-dashboard-apiserver name: grafana-dashboard-apiserver - configMap: name: grafana-dashboard-controller-manager name: grafana-dashboard-controller-manager - configMap: name: grafana-dashboard-k8s-cluster-rsrc-use name: grafana-dashboard-k8s-cluster-rsrc-use - configMap: name: grafana-dashboard-k8s-node-rsrc-use name: grafana-dashboard-k8s-node-rsrc-use - configMap: name: grafana-dashboard-k8s-resources-cluster name: grafana-dashboard-k8s-resources-cluster - configMap: name: grafana-dashboard-k8s-resources-namespace name: grafana-dashboard-k8s-resources-namespace - configMap: name: grafana-dashboard-k8s-resources-pod name: grafana-dashboard-k8s-resources-pod - configMap: name: grafana-dashboard-k8s-resources-workload name: grafana-dashboard-k8s-resources-workload - configMap: name: grafana-dashboard-k8s-resources-workloads-namespace name: grafana-dashboard-k8s-resources-workloads-namespace - configMap: name: grafana-dashboard-kubelet name: grafana-dashboard-kubelet - configMap: name: grafana-dashboard-nodes name: grafana-dashboard-nodes - configMap: name: grafana-dashboard-persistentvolumesusage name: grafana-dashboard-persistentvolumesusage - configMap: name: grafana-dashboard-pods name: grafana-dashboard-pods - configMap: name: grafana-dashboard-prometheus-remote-write name: grafana-dashboard-prometheus-remote-write - configMap: name: grafana-dashboard-prometheus name: grafana-dashboard-prometheus - configMap: name: grafana-dashboard-proxy name: grafana-dashboard-proxy - configMap: name: grafana-dashboard-scheduler name: grafana-dashboard-scheduler - configMap: name: grafana-dashboard-statefulset name: grafana-dashboard-statefulset - configMap: name: grafana-config name: config-custom
其实就是将一个叫grafana-config的configMap作为volumeMount到/etc/grafana下。
然后我们创建这个configMap
# grafana-configmap.yaml apiVersion: v1 kind: ConfigMap metadata: name: grafana-config namespace: monitoring data: grafana.ini: | [smtp] enabled = true host = smtp.calmkart.com:465 user = calmkart@calmkart.com password = xxxxxxxxxx skip_verify = false from_address = calmkart@calmkart.com from_name = grafana # kubectl apply -f grafana-configmap.yaml
这样就可以替换掉默认的配置文件了。
另外还有关于如何为grafana增加plugin等等话题,可以参考官方的相关资料。
就简单介绍到这里吧。
牛逼的彭董,碾压我等底层劳苦大众,我是服的