一个 Pod 被创建后, 一直卡在ContainerCreating的状态, 执行describe命令查看该 Pod 详细信息后发现如下 Event
1 2 3 4 5 6 7
Events: Type Reason Age From Message ---- ------ ---- ---- ------- Normal Scheduled 2m default-scheduler Successfully assigned 61f983b5-19ca-4b33-8647-6b279ae93812 to k8node3 Normal SuccessfulMountVolume 2m kubelet, k8node3 MountVolume.SetUp succeeded for volume "default-token-7r9jt" Warning FailedCreatePodSandBox 2m (x12 over 2m) kubelet, k8node3 Failed create pod sandbox: rpc error: code = Unknown desc = failed to start sandbox container for pod "61f983b5-19ca-4b33-8647-6b279ae93812": Error response from daemon: oci runtime error: container_linux.go:247: starting container process caused "process_linux.go:286: decoding sync type from init pipe caused \"read parent: connection reset by peer\"" Normal SandboxChanged 2m (x12 over 2m) kubelet, k8node3 Pod sandbox changed, it will be killed and re-created.
以上 Event 信息中, 能解读到的信息极其有限
Failed create pod sandbox: Google 提供的 pause 容器启动失败
oci runtime error: 运行时接口出的问题, 我的环境中运行时环境为 docker
connection reset by peer: 连接被重置
Pod sandbox changed, it will be killed and re-created: pause 容器引导的 Pod 环境被改变, 重新创建 Pod 中的 pause 引导
Oct 31 16:33:57 k8node3 kubelet[1865]: E1031 16:33:57.551282 1865 remote_runtime.go:92] RunPodSandbox from runtime service failed: rpc error: code = Unknown desc = failed to start sandbox container for pod "61f983b5-19ca-4b33-8647-6b279ae93812": Error response from daemon: oci runtime error: container_linux.go:247: starting container process caused "process_linux.go:286: decoding sync type from init pipe caused \"read parent: connection reset by peer\"" Oct 31 16:33:57 k8node3 kubelet[1865]: E1031 16:33:57.551415 1865 kuberuntime_sandbox.go:54] CreatePodSandbox for pod "61f983b5-19ca-4b33-8647-6b279ae93812_default(77b2b948-dce4-11e8-afec-b82a72cf3061)" failed: rpc error: code = Unknown desc = failed to start sandbox container for pod "61f983b5-19ca-4b33-8647-6b279ae93812": Error response from daemon: oci runtime error: container_linux.go:247: starting container process caused "process_linux.go:286: decoding sync type from init pipe caused \"read parent: connection reset by peer\"" Oct 31 16:33:57 k8node3 kubelet[1865]: E1031 16:33:57.551459 1865 kuberuntime_manager.go:646] createPodSandbox for pod "61f983b5-19ca-4b33-8647-6b279ae93812_default(77b2b948-dce4-11e8-afec-b82a72cf3061)" failed: rpc error: code = Unknown desc = failed to start sandbox container for pod "61f983b5-19ca-4b33-8647-6b279ae93812": Error response from daemon: oci runtime error: container_linux.go:247: starting container process caused "process_linux.go:286: decoding sync type from init pipe caused \"read parent: connection reset by peer\"" Oct 31 16:33:57 k8node3 kubelet[1865]: E1031 16:33:57.551581 1865 pod_workers.go:186] Error syncing pod 77b2b948-dce4-11e8-afec-b82a72cf3061 ("61f983b5-19ca-4b33-8647-6b279ae93812_default(77b2b948-dce4-11e8-afec-b82a72cf3061)"), skipping: failed to "CreatePodSandbox" for "61f983b5-19ca-4b33-8647-6b279ae93812_default(77b2b948-dce4-11e8-afec-b82a72cf3061)" with CreatePodSandboxError: "CreatePodSandbox for pod \"61f983b5-19ca-4b33-8647-6b279ae93812_default(77b2b948-dce4-11e8-afec-b82a72cf3061)\" failed: rpc error: code = Unknown desc = failed to start sandbox container for pod \"61f983b5-19ca-4b33-8647-6b279ae93812\": Error response from daemon: oci runtime error: container_linux.go:247: starting container process caused \"process_linux.go:286: decoding sync type from init pipe caused \\\"read parent: connection reset by peer\\\"\"" Oct 31 16:33:58 k8node3 kubelet[1865]: E1031 16:33:58.718255 1865 remote_runtime.go:92] RunPodSandbox from runtime service failed: rpc error: code = Unknown desc = failed to start sandbox container for pod "61f983b5-19ca-4b33-8647-6b279ae93812": Error response from daemon: oci runtime error: container_linux.go:247: starting container process caused "process_linux.go:286: decoding sync type from init pipe caused \"read parent: connection reset by peer\"" Oct 31 16:33:58 k8node3 kubelet[1865]: E1031 16:33:58.718406 1865 kuberuntime_sandbox.go:54] CreatePodSandbox for pod "61f983b5-19ca-4b33-8647-6b279ae93812_default(77b2b948-dce4-11e8-afec-b82a72cf3061)" failed: rpc error: code = Unknown desc = failed to start sandbox container for pod "61f983b5-19ca-4b33-8647-6b279ae93812": Error response from daemon: oci runtime error: container_linux.go:247: starting container process caused "process_linux.go:286: decoding sync type from init pipe caused \"read parent: connection reset by peer\"" Oct 31 16:33:58 k8node3 kubelet[1865]: E1031 16:33:58.718443 1865 kuberuntime_manager.go:646] createPodSandbox for pod "61f983b5-19ca-4b33-8647-6b279ae93812_default(77b2b948-dce4-11e8-afec-b82a72cf3061)" failed: rpc error: code = Unknown desc = failed to start sandbox container for pod "61f983b5-19ca-4b33-8647-6b279ae93812": Error response from daemon: oci runtime error: container_linux.go:247: starting container process caused "process_linux.go:286: decoding sync type from init pipe caused \"read parent: connection reset by peer\"" Oct 31 16:33:58 k8node3 kubelet[1865]: E1031 16:33:58.718597 1865 pod_workers.go:186] Error syncing pod 77b2b948-dce4-11e8-afec-b82a72cf3061 ("61f983b5-19ca-4b33-8647-6b279ae93812_default(77b2b948-dce4-11e8-afec-b82a72cf3061)"), skipping: failed to "CreatePodSandbox" for "61f983b5-19ca-4b33-8647-6b279ae93812_default(77b2b948-dce4-11e8-afec-b82a72cf3061)" with CreatePodSandboxError: "CreatePodSandbox for pod \"61f983b5-19ca-4b33-8647-6b279ae93812_default(77b2b948-dce4-11e8-afec-b82a72cf3061)\" failed: rpc error: code = Unknown desc = failed to start sandbox container for pod \"61f983b5-19ca-4b33-8647-6b279ae93812\": Error response from daemon: oci runtime error: container_linux.go:247: starting container process caused \"process_linux.go:286: decoding sync type from init pipe caused \\\"read parent: connection reset by peer\\\"\"" Oct 31 16:36:02 k8node3 kubelet[1865]: E1031 16:36:02.114171 1865 kubelet.go:1644] Unable to mount volumes for pod "61f983b5-19ca-4b33-8647-6b279ae93812_default(77b2b948-dce4-11e8-afec-b82a72cf3061)": timeout expired waiting for volumes to attach or mount for pod "default"/"61f983b5-19ca-4b33-8647-6b279ae93812". list of unmounted volumes=[default-token-7r9jt]. list of unattached volumes=[default-token-7r9jt]; skipping pod Oct 31 16:36:02 k8node3 kubelet[1865]: E1031 16:36:02.114262 1865 pod_workers.go:186] Error syncing pod 77b2b948-dce4-11e8-afec-b82a72cf3061 ("61f983b5-19ca-4b33-8647-6b279ae93812_default(77b2b948-dce4-11e8-afec-b82a72cf3061)"), skipping: timeout expired waiting for volumes to attach or mount for pod "default"/"61f983b5-19ca-4b33-8647-6b279ae93812". list of unmounted volumes=[default-token-7r9jt]. list of unattached volumes=[default-token-7r9jt]
Oct 31 16:33:58 k8node3 dockerd[1715]: time="2018-10-31T16:33:58.671146675+08:00" level=error msg="containerd: start container" error="oci runtime error: container_linux.go:247: starting container process caused \"process_linux.go:286: decoding sync type from init pipe caused \\\"read parent: connection reset by peer\\\"\"\n" id=029d9e843eedb822370c285b5abf1f37556461083d3bda2c7af38b3b00695b0f Oct 31 16:33:58 k8node3 dockerd[1715]: time="2018-10-31T16:33:58.671871096+08:00" level=error msg="Create container failed with error: oci runtime error: container_linux.go:247: starting container process caused \"process_linux.go:286: decoding sync type from init pipe caused \\\"read parent: connection reset by peer\\\"\"\n" Oct 31 16:33:58 k8node3 dockerd[1715]: time="2018-10-31T16:33:58.717553371+08:00" level=error msg="Handler for POST /v1.27/containers/029d9e843eedb822370c285b5abf1f37556461083d3bda2c7af38b3b00695b0f/start returned error: oci runtime error: container_linux.go:247: starting container process caused \"process_linux.go:286: decoding sync type from init pipe caused \\\"read parent: connection reset by peer\\\"\"\n" Oct 31 16:34:22 k8node3 dockerd[1715]: time="2018-10-31T16:34:22.759631102+08:00" level=error msg="Handler for POST /v1.27/containers/207f0ffb4b5ecc5f8261af40cd7a2c4c2800a2c30b027c4fb95648f8c1b00274/stop returned error: Container 207f0ffb4b5ecc5f8261af40cd7a2c4c2800a2c30b027c4fb95648f8c1b00274 is already stopped" Oct 31 16:34:22 k8node3 dockerd[1715]: time="2018-10-31T16:34:22.768603351+08:00" level=error msg="Handler for POST /v1.27/containers/03bf9bfcf4e3f66655b0124d6779ff649b2b00219b83645ca18b4bb08d1cc573/stop returned error: Container 03bf9bfcf4e3f66655b0124d6779ff649b2b00219b83645ca18b4bb08d1cc573 is already stopped" Oct 31 16:34:22 k8node3 dockerd[1715]: time="2018-10-31T16:34:22.777073508+08:00" level=error msg="Handler for POST /v1.27/containers/7b37f5aee7afe01f209bcdc6b3568b522fb0bbda5cb4b322e10b05ec603f5728/stop returned error: Container 7b37f5aee7afe01f209bcdc6b3568b522fb0bbda5cb4b322e10b05ec603f5728 is already stopped" Oct 31 16:34:22 k8node3 dockerd[1715]: time="2018-10-31T16:34:22.785774443+08:00" level=error msg="Handler for POST /v1.27/containers/1a01419973e4701b231556d74c619c30e0966889948e810b46567f08475ec431/stop returned error: Container 1a01419973e4701b231556d74c619c30e0966889948e810b46567f08475ec431 is already stopped" Oct 31 16:34:22 k8node3 dockerd[1715]: time="2018-10-31T16:34:22.794198279+08:00" level=error msg="Handler for POST /v1.27/containers/c3c4049e7b1942395b3cc3a45cf0cc69b34bab6271cb940a70c7d9aed3ba6176/stop returned error: Container c3c4049e7b1942395b3cc3a45cf0cc69b34bab6271cb940a70c7d9aed3ba6176 is already stopped" Oct 31 16:34:22 k8node3 dockerd[1715]: time="2018-10-31T16:34:22.802698120+08:00" level=error msg="Handler for POST /v1.27/containers/8d2c8a4cd5b43b071a9976251932955937d5b1f0f34dca1482cde4195df4747d/stop returned error: Container 8d2c8a4cd5b43b071a9976251932955937d5b1f0f34dca1482cde4195df4747d is already stopped" Oct 31 16:34:22 k8node3 dockerd[1715]: time="2018-10-31T16:34:22.811103238+08:00" level=error msg="Handler for POST /v1.27/containers/7fdb697e251cec249c0a17f1fdcc6d76fbec13a60929eb0217c744c181702c1f/stop returned error: Container 7fdb697e251cec249c0a17f1fdcc6d76fbec13a60929eb0217c744c181702c1f is already stopped"
Docker 的日志中, 除了已经看了很多遍的connection reset by peer之外, 还有一些新的发现
xxx is already stopped: 看日志, 感觉是向容器接口发送了 POST 请求以 stop 容器, 但是该容器已经被 stop 掉了