After a Pod was created, it stayed stuck in the ContainerCreating state. Running the describe command to inspect the Pod showed the following Events:
Events:
  Type     Reason                  Age               From               Message
  ----     ------                  ----              ----               -------
  Normal   Scheduled               2m                default-scheduler  Successfully assigned 61f983b5-19ca-4b33-8647-6b279ae93812 to k8node3
  Normal   SuccessfulMountVolume   2m                kubelet, k8node3   MountVolume.SetUp succeeded for volume "default-token-7r9jt"
  Warning  FailedCreatePodSandBox  2m (x12 over 2m)  kubelet, k8node3   Failed create pod sandbox: rpc error: code = Unknown desc = failed to start sandbox container for pod "61f983b5-19ca-4b33-8647-6b279ae93812": Error response from daemon: oci runtime error: container_linux.go:247: starting container process caused "process_linux.go:286: decoding sync type from init pipe caused \"read parent: connection reset by peer\""
  Normal   SandboxChanged          2m (x12 over 2m)  kubelet, k8node3   Pod sandbox changed, it will be killed and re-created.
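The Events above reveal very little:
Failed create pod sandbox: the pause container provided by Google failed to start
oci runtime error: the problem is at the container runtime layer; in my environment the runtime is docker
connection reset by peer: a connection was reset; according to the message, this happened while reading from the container's init pipe
Pod sandbox changed, it will be killed and re-created: the Pod environment bootstrapped by the pause container changed, so the pause sandbox for the Pod is re-created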
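When the Events section is long, it can help to look only at the Warning rows. The following is a minimal sketch, not part of the original troubleshooting: warning_events is a hypothetical helper name, and it assumes each event row starts with its Type column, as in the output above.

```shell
#!/bin/bash
# Hypothetical helper: print only the Warning rows of an Events table read from stdin.
# Assumes each event row starts (after optional spaces) with its Type column.
warning_events() {
    awk '$1 == "Warning"'
}

# Example usage (the pod name is a placeholder):
#   kubectl describe pod <pod-name> | warning_events
```

For the Events shown above, this would leave only the FailedCreatePodSandBox row.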
Oct 31 16:33:57 k8node3 kubelet[1865]: E1031 16:33:57.551282 1865 remote_runtime.go:92] RunPodSandbox from runtime service failed: rpc error: code = Unknown desc = failed to start sandbox container for pod "61f983b5-19ca-4b33-8647-6b279ae93812": Error response from daemon: oci runtime error: container_linux.go:247: starting container process caused "process_linux.go:286: decoding sync type from init pipe caused \"read parent: connection reset by peer\""
Oct 31 16:33:57 k8node3 kubelet[1865]: E1031 16:33:57.551415 1865 kuberuntime_sandbox.go:54] CreatePodSandbox for pod "61f983b5-19ca-4b33-8647-6b279ae93812_default(77b2b948-dce4-11e8-afec-b82a72cf3061)" failed: rpc error: code = Unknown desc = failed to start sandbox container for pod "61f983b5-19ca-4b33-8647-6b279ae93812": Error response from daemon: oci runtime error: container_linux.go:247: starting container process caused "process_linux.go:286: decoding sync type from init pipe caused \"read parent: connection reset by peer\""
Oct 31 16:33:57 k8node3 kubelet[1865]: E1031 16:33:57.551459 1865 kuberuntime_manager.go:646] createPodSandbox for pod "61f983b5-19ca-4b33-8647-6b279ae93812_default(77b2b948-dce4-11e8-afec-b82a72cf3061)" failed: rpc error: code = Unknown desc = failed to start sandbox container for pod "61f983b5-19ca-4b33-8647-6b279ae93812": Error response from daemon: oci runtime error: container_linux.go:247: starting container process caused "process_linux.go:286: decoding sync type from init pipe caused \"read parent: connection reset by peer\""
Oct 31 16:33:57 k8node3 kubelet[1865]: E1031 16:33:57.551581 1865 pod_workers.go:186] Error syncing pod 77b2b948-dce4-11e8-afec-b82a72cf3061 ("61f983b5-19ca-4b33-8647-6b279ae93812_default(77b2b948-dce4-11e8-afec-b82a72cf3061)"), skipping: failed to "CreatePodSandbox" for "61f983b5-19ca-4b33-8647-6b279ae93812_default(77b2b948-dce4-11e8-afec-b82a72cf3061)" with CreatePodSandboxError: "CreatePodSandbox for pod \"61f983b5-19ca-4b33-8647-6b279ae93812_default(77b2b948-dce4-11e8-afec-b82a72cf3061)\" failed: rpc error: code = Unknown desc = failed to start sandbox container for pod \"61f983b5-19ca-4b33-8647-6b279ae93812\": Error response from daemon: oci runtime error: container_linux.go:247: starting container process caused \"process_linux.go:286: decoding sync type from init pipe caused \\\"read parent: connection reset by peer\\\"\""
Oct 31 16:33:58 k8node3 kubelet[1865]: E1031 16:33:58.718255 1865 remote_runtime.go:92] RunPodSandbox from runtime service failed: rpc error: code = Unknown desc = failed to start sandbox container for pod "61f983b5-19ca-4b33-8647-6b279ae93812": Error response from daemon: oci runtime error: container_linux.go:247: starting container process caused "process_linux.go:286: decoding sync type from init pipe caused \"read parent: connection reset by peer\""
Oct 31 16:33:58 k8node3 kubelet[1865]: E1031 16:33:58.718406 1865 kuberuntime_sandbox.go:54] CreatePodSandbox for pod "61f983b5-19ca-4b33-8647-6b279ae93812_default(77b2b948-dce4-11e8-afec-b82a72cf3061)" failed: rpc error: code = Unknown desc = failed to start sandbox container for pod "61f983b5-19ca-4b33-8647-6b279ae93812": Error response from daemon: oci runtime error: container_linux.go:247: starting container process caused "process_linux.go:286: decoding sync type from init pipe caused \"read parent: connection reset by peer\""
Oct 31 16:33:58 k8node3 kubelet[1865]: E1031 16:33:58.718443 1865 kuberuntime_manager.go:646] createPodSandbox for pod "61f983b5-19ca-4b33-8647-6b279ae93812_default(77b2b948-dce4-11e8-afec-b82a72cf3061)" failed: rpc error: code = Unknown desc = failed to start sandbox container for pod "61f983b5-19ca-4b33-8647-6b279ae93812": Error response from daemon: oci runtime error: container_linux.go:247: starting container process caused "process_linux.go:286: decoding sync type from init pipe caused \"read parent: connection reset by peer\""
Oct 31 16:33:58 k8node3 kubelet[1865]: E1031 16:33:58.718597 1865 pod_workers.go:186] Error syncing pod 77b2b948-dce4-11e8-afec-b82a72cf3061 ("61f983b5-19ca-4b33-8647-6b279ae93812_default(77b2b948-dce4-11e8-afec-b82a72cf3061)"), skipping: failed to "CreatePodSandbox" for "61f983b5-19ca-4b33-8647-6b279ae93812_default(77b2b948-dce4-11e8-afec-b82a72cf3061)" with CreatePodSandboxError: "CreatePodSandbox for pod \"61f983b5-19ca-4b33-8647-6b279ae93812_default(77b2b948-dce4-11e8-afec-b82a72cf3061)\" failed: rpc error: code = Unknown desc = failed to start sandbox container for pod \"61f983b5-19ca-4b33-8647-6b279ae93812\": Error response from daemon: oci runtime error: container_linux.go:247: starting container process caused \"process_linux.go:286: decoding sync type from init pipe caused \\\"read parent: connection reset by peer\\\"\""
Oct 31 16:36:02 k8node3 kubelet[1865]: E1031 16:36:02.114171 1865 kubelet.go:1644] Unable to mount volumes for pod "61f983b5-19ca-4b33-8647-6b279ae93812_default(77b2b948-dce4-11e8-afec-b82a72cf3061)": timeout expired waiting for volumes to attach or mount for pod "default"/"61f983b5-19ca-4b33-8647-6b279ae93812". list of unmounted volumes=[default-token-7r9jt]. list of unattached volumes=[default-token-7r9jt]; skipping pod
Oct 31 16:36:02 k8node3 kubelet[1865]: E1031 16:36:02.114262 1865 pod_workers.go:186] Error syncing pod 77b2b948-dce4-11e8-afec-b82a72cf3061 ("61f983b5-19ca-4b33-8647-6b279ae93812_default(77b2b948-dce4-11e8-afec-b82a72cf3061)"), skipping: timeout expired waiting for volumes to attach or mount for pod "default"/"61f983b5-19ca-4b33-8647-6b279ae93812". list of unmounted volumes=[default-token-7r9jt]. list of unattached volumes=[default-token-7r9jt]
Oct 31 16:33:58 k8node3 dockerd[1715]: time="2018-10-31T16:33:58.671146675+08:00" level=error msg="containerd: start container" error="oci runtime error: container_linux.go:247: starting container process caused \"process_linux.go:286: decoding sync type from init pipe caused \\\"read parent: connection reset by peer\\\"\"\n" id=029d9e843eedb822370c285b5abf1f37556461083d3bda2c7af38b3b00695b0f
Oct 31 16:33:58 k8node3 dockerd[1715]: time="2018-10-31T16:33:58.671871096+08:00" level=error msg="Create container failed with error: oci runtime error: container_linux.go:247: starting container process caused \"process_linux.go:286: decoding sync type from init pipe caused \\\"read parent: connection reset by peer\\\"\"\n"
Oct 31 16:33:58 k8node3 dockerd[1715]: time="2018-10-31T16:33:58.717553371+08:00" level=error msg="Handler for POST /v1.27/containers/029d9e843eedb822370c285b5abf1f37556461083d3bda2c7af38b3b00695b0f/start returned error: oci runtime error: container_linux.go:247: starting container process caused \"process_linux.go:286: decoding sync type from init pipe caused \\\"read parent: connection reset by peer\\\"\"\n"
Oct 31 16:34:22 k8node3 dockerd[1715]: time="2018-10-31T16:34:22.759631102+08:00" level=error msg="Handler for POST /v1.27/containers/207f0ffb4b5ecc5f8261af40cd7a2c4c2800a2c30b027c4fb95648f8c1b00274/stop returned error: Container 207f0ffb4b5ecc5f8261af40cd7a2c4c2800a2c30b027c4fb95648f8c1b00274 is already stopped"
Oct 31 16:34:22 k8node3 dockerd[1715]: time="2018-10-31T16:34:22.768603351+08:00" level=error msg="Handler for POST /v1.27/containers/03bf9bfcf4e3f66655b0124d6779ff649b2b00219b83645ca18b4bb08d1cc573/stop returned error: Container 03bf9bfcf4e3f66655b0124d6779ff649b2b00219b83645ca18b4bb08d1cc573 is already stopped"
Oct 31 16:34:22 k8node3 dockerd[1715]: time="2018-10-31T16:34:22.777073508+08:00" level=error msg="Handler for POST /v1.27/containers/7b37f5aee7afe01f209bcdc6b3568b522fb0bbda5cb4b322e10b05ec603f5728/stop returned error: Container 7b37f5aee7afe01f209bcdc6b3568b522fb0bbda5cb4b322e10b05ec603f5728 is already stopped"
Oct 31 16:34:22 k8node3 dockerd[1715]: time="2018-10-31T16:34:22.785774443+08:00" level=error msg="Handler for POST /v1.27/containers/1a01419973e4701b231556d74c619c30e0966889948e810b46567f08475ec431/stop returned error: Container 1a01419973e4701b231556d74c619c30e0966889948e810b46567f08475ec431 is already stopped"
Oct 31 16:34:22 k8node3 dockerd[1715]: time="2018-10-31T16:34:22.794198279+08:00" level=error msg="Handler for POST /v1.27/containers/c3c4049e7b1942395b3cc3a45cf0cc69b34bab6271cb940a70c7d9aed3ba6176/stop returned error: Container c3c4049e7b1942395b3cc3a45cf0cc69b34bab6271cb940a70c7d9aed3ba6176 is already stopped"
Oct 31 16:34:22 k8node3 dockerd[1715]: time="2018-10-31T16:34:22.802698120+08:00" level=error msg="Handler for POST /v1.27/containers/8d2c8a4cd5b43b071a9976251932955937d5b1f0f34dca1482cde4195df4747d/stop returned error: Container 8d2c8a4cd5b43b071a9976251932955937d5b1f0f34dca1482cde4195df4747d is already stopped"
Oct 31 16:34:22 k8node3 dockerd[1715]: time="2018-10-31T16:34:22.811103238+08:00" level=error msg="Handler for POST /v1.27/containers/7fdb697e251cec249c0a17f1fdcc6d76fbec13a60929eb0217c744c181702c1f/stop returned error: Container 7fdb697e251cec249c0a17f1fdcc6d76fbec13a60929eb0217c744c181702c1f is already stopped"
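Besides the connection reset by peer error that has already shown up many times, the Docker log contains something new:
xxx is already stopped: judging from the log, a POST request was sent to the container API to stop a container, but that container had already been stopped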
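To see which containers these stop requests targeted, the IDs can be pulled out of the dockerd log. This is a minimal sketch under stated assumptions: already_stopped_ids is a hypothetical helper name, it expects full 64-character hex container IDs as in the log above, and the systemd unit name docker in the usage comment is an assumption about the host.

```shell
#!/bin/bash
# Hypothetical helper: list the unique container IDs that dockerd reported as
# "already stopped". Reads dockerd log text on stdin.
# Assumes full 64-character hex container IDs, as in the log above.
already_stopped_ids() {
    grep -oE 'Container [0-9a-f]{64} is already stopped' | awk '{print $2}' | sort -u
}

# Example usage (assumes dockerd logs to the journal under the unit name "docker"):
#   journalctl -u docker | already_stopped_ids
```

Each ID printed this way can then be checked against the containers that still exist on the node.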
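[root@k8s-node ~]# cat leak.sh
#!/bin/bash
# Usage: leak.sh <absolute path of the busy mount point>
declare -A map

# Find every process whose mount table still references the given path,
# and record the mount namespace each of those processes lives in.
for i in `find /proc/*/mounts -exec grep $1 {} + 2>/dev/null | awk '{print $1"#"$2}'`
do
    pid=`echo $i | awk -F "[/]" '{print $3}'`
    point=`echo $i | awk -F "[#]" '{print $2}'`
    mnt=`ls -l /proc/$pid/ns/mnt | awk '{print $11}'`
    map["$mnt"]="exist"
    cmd=`cat /proc/$pid/cmdline`
    echo -e "$pid\t$mnt\t$cmd\t$point"
done

# Print the mount namespace of every docker-containerd-shim process
# that shares a namespace recorded above.
for i in `ps aux | grep docker-containerd-shim | grep -v "grep" | awk '{print $2}'`
do
    mnt=`ls -l /proc/$i/ns/mnt 2>/dev/null | awk '{print $11}'`
    if [[ "${map[$mnt]}" == "exist" ]]; then
        echo $mnt
    fi
done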
Run the script, passing the absolute path of the directory that reports device or resource busy:
bash leak.sh /var/lib/kubelet/pods/81791176-a505-11e7-accf-5254fe5a9007/volumes/kubernetes.io~secret/default-token-pzyxh
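The first loop in leak.sh depends on the shape of grep's output when it searches several files: each match is prefixed with the file name, so a line looks like /proc/<pid>/mounts:<device> <mount point> and so on. The sketch below shows that parsing on a made-up sample line; parse_mount_match is a hypothetical helper, and the PID 4321 and the path /mnt/example are illustrative values, not taken from the incident.

```shell
#!/bin/bash
# Hypothetical helper: turn one "grep over /proc/*/mounts" match into "<pid> <mount point>".
# Input format assumed: /proc/<pid>/mounts:<device> <mount point> <fstype> <options> ...
parse_mount_match() {
    local line=$1
    local pid point
    pid=$(echo "$line" | awk -F '/' '{print $3}')   # third "/"-separated field is the PID
    point=$(echo "$line" | awk '{print $2}')        # second whitespace-separated field is the mount point
    echo "$pid $point"
}

# Illustrative sample line (not real data from the node).
parse_mount_match '/proc/4321/mounts:tmpfs /mnt/example tmpfs rw,relatime 0 0'
```

With the PID in hand, the script reads /proc/<pid>/ns/mnt to learn which mount namespace still holds the mount.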
API Aggregation allows the Kubernetes API to be extended without modifying the core Kubernetes code. To enable API Aggregation, add the following flags to kube-apiserver:
--requestheader-client-ca-file=<path to aggregator CA cert>
--requestheader-allowed-names=front-proxy-client
--requestheader-extra-headers-prefix=X-Remote-Extra-
--requestheader-group-headers=X-Remote-Group
--requestheader-username-headers=X-Remote-User
--proxy-client-cert-file=<path to aggregator proxy cert>
--proxy-client-key-file=<path to aggregator proxy key>
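Official warning: do not reuse a CA that has already been used in a different context unless you understand the risks and the mechanisms that protect the CA's usage.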
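One way to confirm that the flags actually reached the running process is to compare its command line against the list above. This is a minimal sketch, not an official tool: missing_aggregation_flags is a hypothetical helper, and it assumes every flag is passed in the --flag=value form.

```shell
#!/bin/bash
# Hypothetical helper: print each aggregation-related flag that is absent from a
# kube-apiserver command line passed as the first argument.
# Assumes flags are written as --flag=value (not "--flag value").
missing_aggregation_flags() {
    local cmdline=$1
    local flag
    for flag in \
        --requestheader-client-ca-file \
        --requestheader-allowed-names \
        --requestheader-extra-headers-prefix \
        --requestheader-group-headers \
        --requestheader-username-headers \
        --proxy-client-cert-file \
        --proxy-client-key-file
    do
        # Match "flag=" so a longer flag name sharing the same prefix does not count.
        case "$cmdline" in
            *"$flag="*) ;;
            *) echo "$flag" ;;
        esac
    done
}

# Example usage (assumes kube-apiserver is a process visible on this host):
#   missing_aggregation_flags "$(ps -o args= -C kube-apiserver)"
```

An empty result means all seven flags are present; it says nothing about whether the certificate paths they point to are valid.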
If kube-proxy is not running on the same host as the API server, make sure the following apiserver flag is enabled: