When Kubernetes Meets Alibaba Cloud

初扬 · 2017-01-19 10:59:44 · 36280 views · 27 comments

Alibaba Cloud offers a rich set of cloud products, including ECS, VPC networks, the classic network, Server Load Balancer (SLB), and more, making it easy to run Docker applications in the cloud. Beyond Container Service, a one-stop container application management solution, Alibaba Cloud keeps driving integrations between other open-source container technologies and Alibaba Cloud to better serve diverse user needs.

This article shows how to easily bring up a secure, highly available Kubernetes cluster on Alibaba Cloud. To help Kubernetes users make better use of Alibaba Cloud services, the Container Service team has built an Alibaba Cloud CloudProvider for Kubernetes that creates Alibaba Cloud SLB instances for Kubernetes Services, and has written a network driver for Flannel so that Flannel works well with Alibaba Cloud VPC networks. We have also produced a one-click installation script based on the latest Kubernetes 1.6.0-alpha that works on Alibaba Cloud out of the box.

Prerequisites

  • Supported images: Alibaba Cloud CentOS 7.2-x64 and Ubuntu 16.04 x64
  • Both Alibaba Cloud VPC and classic networks are supported. When creating a VPC, use the 192.168.0.0 or 10.0.0.0 address ranges to avoid a conflict with the default 172.16.0.0 range used by this Kubernetes installation.
  • Have your Alibaba Cloud account KeyID and KeySecret ready
  • If you need to pull images hosted outside the firewall, use the Alibaba Cloud registry mirror accelerator
  • Prepare at least two ECS instances: node1 will be the master node and node2 a worker node. Do not change the names of the ECS instances (including the hostname); a quick check follows this list.
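
Because the installer and the kubelet look up each ECS instance by its name through the Alibaba Cloud API (several commenters below hit "instance not found" errors after renaming instances), it is worth verifying up front that each node's hostname still matches the ECS instance name shown in the console. A minimal check; the hostname shown is illustrative:

[root@node1 ~]# hostnamectl status | grep 'Static hostname'
   Static hostname: iZbp12l8fznm0yt7bas5p2Z
# Compare this with the instance name in the ECS console; if they differ, the
# kubelet fails with "failed to get external ID from cloud provider: instance not found".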

Installing Kubernetes

Prepare three configuration parameters

  • Obtain your Alibaba Cloud KeyID and KeySecret (available in the Alibaba Cloud console). Assume ACCESS_KEY_ID=xxxxxxxx and ACCESS_KEY_SECRET=xxxxxxxxxxxxxxxx.
  • Create ECS instances running CentOS 7.2-x64 or Ubuntu 16.04 x64. Record the region the ECS instances are in; supported regions are listed below. For example, the Hangzhou region is named cn-hangzhou. Assume REGION=cn-hangzhou.

    Region         | Value       | Region     | Value
    ---------------|-------------|------------|----------------
    Hangzhou       | cn-hangzhou | Singapore  | ap-southeast-1
    Qingdao        | cn-qingdao  | Shanghai   | cn-shanghai
    Beijing        | cn-beijing  | Dubai      | me-east-1
    Hong Kong      | cn-hongkong | Tokyo      | ap-northeast-1
    Shenzhen       | cn-shenzhen | Sydney     | ap-southeast-2
    Silicon Valley | us-west-1   | Frankfurt  | eu-central-1
    Virginia       | us-east-1   |            |

Note: on a VPC network the REGION parameter can be omitted and will be inferred automatically, but on the classic network REGION must be set.
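
The commands below reference these three values as shell variables. A minimal setup sketch (the values are placeholders):

[root@node1 ~]# export ACCESS_KEY_ID=xxxxxxxx
[root@node1 ~]# export ACCESS_KEY_SECRET=xxxxxxxxxxxxxxxx
[root@node1 ~]# export REGION=cn-hangzhou   # optional on VPC networks, required on the classic network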

Start the installation

  • Install the master node: log in to the master with ssh root@node1 and install the master. Be sure to replace $ACCESS_KEY_ID, $ACCESS_KEY_SECRET, and $REGION below with the values obtained in the previous step.

    [root@node1 ~]# curl -L 'http://aliacs-k8s.oss-cn-hangzhou.aliyuncs.com/installer/kubemgr.sh' | \
                bash -s nice --node-type master --key-id $ACCESS_KEY_ID --key-secret $ACCESS_KEY_SECRET \
                    --region $REGION --discovery token://
    

    The output looks like the following. Make a note of the token in the output: TOKEN=token://xxxxxx:xxxxxxxxxxxxxxxx@12x.2x.24x.21x:989x

    docker has been installed
    3.0: Pulling from google-containers/pause-amd64
    Digest: sha256:3b3a29e3c90ae7762bdf587d19302e62485b6bef46e114b741f7d75dba023bd3
    
    ...
    
    [tokens] Generated token: "xxxxxx:xxxxxxxxxxxxxxxx"
    [certificates] Generated Certificate Authority key and certificate.
    
    ...
    
    [apiclient] All control plane components are healthy after 17.286402 seconds
    [apiclient] Waiting for at least one node to register and become ready
    [apiclient] First node is ready after 4.003314 seconds
    
    ...
    
    Your Kubernetes master has initialized successfully!
    
    You should now deploy a pod network to the cluster.
    
    ## Make a note of this token
    kubeadm join --discovery token://xxxxxx:xxxxxxxxxxxxxxxx@12x.2x.24x.21x:989x
    
    ...
    
    NAME                                              READY     STATUS              RESTARTS   AGE
    dummy-3158885821-vkv5q                            1/1       Running             0          5s
    etcd-izbp12l8fznm0yt7bas5p2z                      1/1       Running             0          19s
    kube-apiserver-izbp12l8fznm0yt7bas5p2z            1/1       Running             1          18s
    
    ...
    
    kubectl --namespace=kube-system get po
  • Install a worker node: log in to node2 with ssh root@node2. Using the token you recorded earlier, run the following command, again replacing the variables below:

    [root@node2 ~]# curl -L 'http://aliacs-k8s.oss-cn-hangzhou.aliyuncs.com/installer/kubemgr.sh' \
                    | bash -s nice --node-type node --key-id $ACCESS_KEY_ID --key-secret \
                    $ACCESS_KEY_SECRET --region $REGION --discovery $TOKEN
    

    The output looks like:

    docker has been installed
    3.0: Pulling from google-containers/pause-amd64
    
    ...
    
    Digest: sha256:3b3a29e3c90ae7762bdf587d19302e62485b6bef46e114b741f7d75dba023bd3
    Status: Image is up to date for registry.cn-hangzhou.aliyuncs.com/google-containers/pause-amd64:3.0
    [preflight] Running pre-flight checks
    [discovery] Created cluster info discovery client, requesting info from "http://12x.2x.24x.21x:989x/cluster-info/v1/?token-id=56974f"
    [discovery] Cluster info object received, verifying signature using given token
    [discovery] Cluster info signature and contents are valid, will use API endpoints [https://12x.2x.24x.21x:6443]
    [bootstrap] Trying to connect to endpoint https://12x.2x.24x.21x:6443
    [bootstrap] Detected server version: v1.6.0-alpha.0.2229+88fbc68ad99479-dirty
    [bootstrap] Successfully established connection with endpoint "https://12x.2x.24x.21x:6443"
    [csr] Created API client to obtain unique certificate for this node, generating keys and certificate signing request
    [csr] Received signed certificate from the API server:
    Issuer: CN=kubernetes | Subject: CN=system:node:iZbp12l8fznm0yt7bas5p1Z | CA: false
    Not before: 2017-01-18 07:46:00 +0000 UTC Not After: 2018-01-18 07:46:00 +0000 UTC
    [csr] Generating kubelet configuration
    [kubeconfig] Wrote KubeConfig file to disk: "/etc/kubernetes/kubelet.conf"
    
    Node join complete:
    * Certificate signing request sent to master and response
      received.
    * Kubelet informed of new secure connection details.
    
    Run 'kubectl get nodes' on the master to see this machine join.

    Congratulations! You have successfully installed one master and one worker node. You can repeat the node installation on other machines to add more nodes. But before Kubernetes can run properly you still need to add network support to the cluster; a quick verification sketch follows.
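
Assuming you saved the token printed during the master install, a minimal sketch of checking from the master that the new node has registered:

[root@node1 ~]# TOKEN='token://xxxxxx:xxxxxxxxxxxxxxxx@12x.2x.24x.21x:989x'   # from the master install output
[root@node1 ~]# kubectl get nodes
# Both nodes should be listed; pod networking will not work until a
# network add-on is installed (next section).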

Adding network support to the cluster

Alibaba Cloud currently offers two network types: VPC and classic. Choose the network component for Kubernetes that matches your cluster's network type.

Note: install only the one of the following two options that matches your actual network.

Option 1: VPC network support (for VPC networks). We wrote a dedicated VPC plugin for flannel, so installing flannel network support for Kubernetes is straightforward. Be sure to replace the 'replace with your id' placeholder in flannel-vpc.yml with your own KEY_ID and KEY_SECRET. On the master node, run:

[root@node1 ~]# curl -sSL http://aliacs-k8s.oss-cn-hangzhou.aliyuncs.com/conf/flannel-vpc.yml -o flannel-vpc.yml
[root@node1 ~]# vi flannel-vpc.yml
[root@node1 ~]# kubectl apply -f flannel-vpc.yml
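
If you prefer a non-interactive edit over vi, a sed substitution along these lines can inject the credentials. The placeholder strings below are assumptions; check your downloaded flannel-vpc.yml for the exact text before running:

[root@node1 ~]# # Assumed placeholders -- confirm them in flannel-vpc.yml first.
[root@node1 ~]# sed -i -e "s|replace with your id|$ACCESS_KEY_ID|" \
                       -e "s|replace with your secret|$ACCESS_KEY_SECRET|" flannel-vpc.yml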

Wait a moment, then run kubectl --namespace=kube-system get ds to list the DaemonSets in the kube-system namespace. When a DaemonSet named kube-flannel shows all pods ready, the network has been deployed successfully.

[root@node1 ~]# kubectl get ds --namespace=kube-system
NAME              DESIRED   CURRENT   READY     NODE-SELECTOR   AGE
kube-flannel-ds   2         2         2         <none>          2h
kube-proxy        2         2         2         <none>          2h

Option 2: classic network support (for both classic and VPC networks). flannel's VXLAN backend gives pods connectivity within the classic network. Installing it is just as simple; run:

[root@node1 ~]# kubectl apply -f http://aliacs-k8s.oss-cn-hangzhou.aliyuncs.com/conf/flannel-vxlan.yml

Wait a moment; kubectl --namespace=kube-system get ds shows the status of the network plugin. Installation complete.

Creating an application in your Kubernetes cluster

Run an nginx application

Now run an nginx application. The following command creates a deployment with two nginx replicas.

[root@node1 ~]# kubectl run nginx --image=registry.cn-hangzhou.aliyuncs.com/spacexnice/nginx:latest --replicas=2 --labels run=nginx

deployment "nginx" created
[root@node1 ~]# kubectl get po
NAME                     READY     STATUS    RESTARTS   AGE
nginx-3579028506-9qxxl   1/1       Running   0          50s
nginx-3579028506-p032g   1/1       Running   0          50s

Then create a service for the nginx application. Specifying type=LoadBalancer enables Alibaba Cloud SLB: the Alibaba Cloud CloudProvider automatically creates an SLB instance for the service.

[root@node1 ~]# kubectl expose deployment nginx --port=80 --target-port=80 --type=LoadBalancer

service "nginx" exposed
[root@node1 ~]# kubectl get svc
NAME         CLUSTER-IP     EXTERNAL-IP      PORT(S)        AGE
kubernetes   172.19.0.1     <none>           443/TCP        3h
nginx        172.19.6.158   118.178.111.31   80:30146/TCP   6s

Now open http://118.178.111.31 in your browser (replace this with your own EXTERNAL-IP); the familiar "Welcome to nginx!" page should appear. You can also confirm the SLB details in the Alibaba Cloud SLB console.
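
You can also verify from the command line; a minimal check (substitute your own EXTERNAL-IP):

[root@node1 ~]# curl -sI http://118.178.111.31 | head -n 1   # should show HTTP/1.1 200 OK from nginx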

Resetting a node

If something went wrong during installation, or you want to uninstall Kubernetes, run the following command at any time to tear down the installation.

[root@node1 ~]# curl -L 'http://aliacs-k8s.oss-cn-hangzhou.aliyuncs.com/installer/kubemgr.sh' | bash -s nice --node-type down

Alibaba Cloud SLB support

The Kubernetes Alibaba Cloud CloudProvider offers a rich set of annotations for customizing SLB behavior: creating HTTPS or HTTP SLBs, setting the SLB bandwidth, customizing SLB health checks, choosing the SLB address type, and more.

Create an HTTPS SLB for nginx

To switch the SLB of the nginx service created above to HTTPS, for example:

  • Go to the Alibaba Cloud SLB console, upload your HTTPS certificate to Alibaba Cloud, and record the generated certid. Suppose certid=124395s8ifs8ffftte.
  • Add the corresponding annotations to the service (see the appendix for what each annotation means) and change the port in the spec section to 443. Edit the service with kubectl edit svc nginx and save with :wq when done:

    [root@node1 ~]# kubectl edit svc nginx
    apiVersion: v1
    kind: Service
    metadata:
      annotations:
        service.beta.kubernetes.io/alicloud-loadbalancer-ProtocolPort: "https:443"
        service.beta.kubernetes.io/alicloud-loadbalancer-Bandwidth: "60"
        service.beta.kubernetes.io/alicloud-loadbalancer-CertID: "replace with your certid"
        service.beta.kubernetes.io/alicloud-loadbalancer-HealthCheckFlag: "off"
      creationTimestamp: 2017-01-18T10:45:32Z
      labels:
        run: nginx
      name: nginx
      namespace: default
      resourceVersion: "14365"
      selfLink: /api/v1/namespaces/default/services/nginx
      uid: 3c0e72e1-dd6b-11e6-b1ec-00163e0c1de5
    spec:
      clusterIP: 172.19.6.158
      ports:
      - nodePort: 30146
        port: 443
        protocol: TCP
        targetPort: 80
      selector:
        run: nginx
      sessionAffinity: None
      type: LoadBalancer
    status:
      loadBalancer:
        ingress:
        - ip: 118.178.111.31

Wait a moment, then visit https://118.178.111.31 to see a secure HTTPS nginx service. Go rock and roll!
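
If you would rather not open an editor for the annotations, kubectl annotate can apply the same keys non-interactively (the port change in spec still requires kubectl edit as shown above). A sketch, with the certid left as a placeholder:

[root@node1 ~]# kubectl annotate svc nginx --overwrite \
    service.beta.kubernetes.io/alicloud-loadbalancer-ProtocolPort="https:443" \
    service.beta.kubernetes.io/alicloud-loadbalancer-Bandwidth="60" \
    service.beta.kubernetes.io/alicloud-loadbalancer-CertID="replace with your certid" \
    service.beta.kubernetes.io/alicloud-loadbalancer-HealthCheckFlag="off"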

Summary

Alibaba Cloud provides an open public cloud with a rich runtime environment for open-source software. You can quite easily set up a Kubernetes cluster on Alibaba Cloud to run your services. Alternatively, Alibaba Cloud Container Service offers a one-stop solution that takes cluster operations off your hands.

The Alibaba Cloud Container Service team is committed to promoting container technology on Alibaba Cloud. To learn more about Container Service, visit https://www.aliyun.com/product/containerservice

Appendix: available annotations

Annotation | Description | Default
-----------|-------------|--------
service.beta.kubernetes.io/alicloud-loadbalancer-ProtocolPort | Comma-separated protocol:port pairs, e.g. "https:443,http:80" | none
service.beta.kubernetes.io/alicloud-loadbalancer-AddressType | "internet" or "intranet" | "internet"
service.beta.kubernetes.io/alicloud-loadbalancer-SLBNetworkType | SLB network type: "classic" or "vpc" |
service.beta.kubernetes.io/alicloud-loadbalancer-ChargeType | "paybytraffic" or "paybybandwidth" | "paybybandwidth"
service.beta.kubernetes.io/alicloud-loadbalancer-Region | The region the SLB is in |
service.beta.kubernetes.io/alicloud-loadbalancer-Bandwidth | SLB bandwidth | 50
service.beta.kubernetes.io/alicloud-loadbalancer-CertID | Certificate ID on Alibaba Cloud; upload the certificate first | ""
service.beta.kubernetes.io/alicloud-loadbalancer-HealthCheckFlag | "on" or "off"; TCP listeners do not need this flag, as it defaults to "on" for TCP | "off"
service.beta.kubernetes.io/alicloud-loadbalancer-HealthCheckType | See HealthCheck |
service.beta.kubernetes.io/alicloud-loadbalancer-HealthCheckURI | See HealthCheck |
service.beta.kubernetes.io/alicloud-loadbalancer-HealthCheckConnectPort | See HealthCheck |
service.beta.kubernetes.io/alicloud-loadbalancer-HealthyThreshold | See HealthCheck |
service.beta.kubernetes.io/alicloud-loadbalancer-UnhealthyThreshold | See HealthCheck |
service.beta.kubernetes.io/alicloud-loadbalancer-HealthCheckInterval | See HealthCheck |
service.beta.kubernetes.io/alicloud-loadbalancer-HealthCheckConnectTimeout | See HealthCheck |
service.beta.kubernetes.io/alicloud-loadbalancer-HealthCheckTimeout | See HealthCheck |
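
As a worked example combining several of these annotations, here is a sketch of a Service manifest for an intranet SLB billed by traffic. The annotation keys come from the table above; the service name is illustrative, and the selector matches the nginx deployment created earlier:

[root@node1 ~]# cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: Service
metadata:
  name: nginx-internal    # illustrative name
  annotations:
    service.beta.kubernetes.io/alicloud-loadbalancer-AddressType: "intranet"
    service.beta.kubernetes.io/alicloud-loadbalancer-ChargeType: "paybytraffic"
spec:
  type: LoadBalancer
  selector:
    run: nginx            # matches the nginx deployment created earlier
  ports:
  - port: 80
    targetPort: 80
    protocol: TCP
EOF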


Comments

1F
kissy

How do I install this on a local VM?

初扬

If you install on a local VM you won't be able to use Alibaba Cloud SLB.

2F
loading)3

Is type=LoadBalancer the driver interface SLB developed for k8s? Is it open source?

ledzep2

Unofficial: https://www.github.com/kubeup/kube-aliyun

初扬

It's going through the open-sourcing process, but upstream Kubernetes is currently refactoring its cloudprovider; once that refactor is done we will submit this driver.

3F
vitohermes

All 6 of my Alibaba Cloud machines use the CentOS 7 image; can I upgrade them directly to CentOS 7.2?

初扬

yum update

4F
大咕咕

How do I fix this error: Error adding network: open /run/flannel/subnet.env: no such file or directory

初扬

Did you create the flannel network as described above?

大咕咕

Yes

大咕咕

11m 11m 1 kube-flannel-ds-llf3l Pod spec.containers{install-cni} Normal Started {kubelet pro-2} Started container with docker id e7644835f221
11m 11m 1 kube-flannel-ds-llf3l Pod spec.containers{kube-flannel} Normal Pulled {kubelet pro-2} Container image "registry.cn-hangzhou.aliyuncs.com/google-containers/flannel-git:v0.7.0-2-g3a5b085-dirty-amd64" already present on machine
11m 11m 1 kube-flannel-ds-llf3l Pod spec.containers{kube-flannel} Normal Created {kubelet pro-2} Created container with docker id 3ff7d3184385; Security:[seccomp=unconfined]
11m 11m 1 kube-flannel-ds-llf3l Pod spec.containers{kube-flannel} Normal Started {kubelet pro-2} Started container with docker id 3ff7d3184385
11m 11m 1 kube-flannel-ds-xr9f5 Pod spec.containers{kube-flannel} Normal Pulling {kubelet pro-1} pulling image "registry.cn-hangzhou.aliyuncs.com/google-containers/flannel-git:v0.7.0-2-g3a5b085-dirty-amd64"
11m 11m 1 kube-flannel-ds-xr9f5 Pod spec.containers{kube-flannel} Normal Pulled {kubelet pro-1} Successfully pulled image "registry.cn-hangzhou.aliyuncs.com/google-containers/flannel-git:v0.7.0-2-g3a5b085-dirty-amd64"
11m 11m 1 kube-flannel-ds-xr9f5 Pod spec.containers{kube-flannel} Normal Created {kubelet pro-1} Created container with docker id 249219658c22; Security:[seccomp=unconfined]
11m 11m 1 kube-flannel-ds-xr9f5 Pod spec.containers{kube-flannel} Normal Started {kubelet pro-1} Started container with docker id 249219658c22
11m 11m 1 kube-flannel-ds-xr9f5 Pod spec.containers{install-cni} Normal Pulled {kubelet pro-1} Container image "registry.cn-hangzhou.aliyuncs.com/google-containers/flannel-git:v0.7.0-2-g3a5b085-dirty-amd64" already present on machine
11m 11m 1 kube-flannel-ds-xr9f5 Pod spec.containers{install-cni} Normal Created {kubelet pro-1} Created container with docker id 0d339a508175; Security:[seccomp=unconfined]
11m 11m 1 kube-flannel-ds-xr9f5 Pod spec.containers{install-cni} Normal Started {kubelet pro-1} Started container with docker id 0d339a508175
11m 11m 1 kube-flannel-ds DaemonSet Normal SuccessfulCreate {daemon-set } Created pod: kube-flannel-ds-llf3l
11m 11m 1 kube-flannel-ds DaemonSet Normal SuccessfulCreate {daemon-set } Created pod: kube-flannel-ds-xr9f5
13m 13m 1 kube-proxy-7zxsf Pod spec.containers{kube-proxy} Normal Pulling {kubelet pro-2} pulling image "registry.cn-hangzhou.aliyuncs.com/google-containers/hyperkube-amd64:v1.6.0-alpha.0-alicloud"
13m 13m 1 kube-proxy-7zxsf Pod spec.containers{kube-proxy} Normal Pulled {kubelet pro-2} Successfully pulled image "registry.cn-hangzhou.aliyuncs.com/google-containers/hyperkube-amd64:v1.6.0-alpha.0-alicloud"
13m 13m 1 kube-proxy-7zxsf Pod spec.containers{kube-proxy} Normal Created {kubelet pro-2} Created container with docker id 14e9ace45320; Security:[seccomp=unconfined]
13m 13m 1 kube-proxy-7zxsf Pod spec.containers{kube-proxy} Normal Started {kubelet pro-2} Started container with docker id 14e9ace45320
13m 13m 1 kube-proxy DaemonSet Normal SuccessfulCreate {daemon-set } Created pod: kube-proxy-7zxsf
11m 1h 3242 kubernetes-dashboard-973706668-gf3pr Pod Warning FailedSync {kubelet pro-1} Error syncing pod, skipping: failed to "SetupNetwork" for "kubernetes-dashboard-973706668-gf3pr_kube-system" with SetupNetworkError: "Failed to setup network for pod "kubernetes-dashboard-973706668-gf3pr_kube-system(d432dd0e-ee6e-11e6-b383-00163e0c7ebf)" using network plugins "cni": cni config unintialized; Skipping pod"

8m 11m 141 kubernetes-dashboard-973706668-gf3pr Pod Warning FailedSync {kubelet pro-1} Error syncing pod, skipping: failed to "SetupNetwork" for "kubernetes-dashboard-973706668-gf3pr_kube-system" with SetupNetworkError: "Failed to setup network for pod "kubernetes-dashboard-973706668-gf3pr_kube-system(d432dd0e-ee6e-11e6-b383-00163e0c7ebf)" using network plugins "cni": open /run/flannel/subnet.env: no such file or directory; Skipping pod"

8m 8m 1 kubernetes-dashboard-973706668-khvgl Pod Normal Scheduled {default-scheduler } Successfully assigned kubernetes-dashboard-973706668-khvgl to pro-2
1s 8m 513 kubernetes-dashboard-973706668-khvgl Pod Warning FailedSync {kubelet pro-2} Error syncing pod, skipping: failed to "SetupNetwork" for "kubernetes-dashboard-973706668-khvgl_kube-system" with SetupNetworkError: "Failed to setup network for pod "kubernetes-dashboard-973706668-khvgl_kube-system(bb859c4c-ee78-11e6-977c-00163e0c7ebf)" using network plugins "cni": open /run/flannel/subnet.env: no such file or directory; Skipping pod"

8m 8m 1 kubernetes-dashboard-973706668 ReplicaSet Normal SuccessfulCreate {replicaset-controller } Created pod: kubernetes-dashboard-973706668-khvgl

初扬

Check the pod startup status and the kubelet logs:
kubectl --namespace=kube-system get po
journalctl -u kubelet -f

大咕咕

I tried it: deployment works fine on machines in a fresh, standalone VPC, but fails in a VPC that already has machines in it. The kube-flannel-ds logs can't be viewed, so I don't know whether it's a network conflict.

5F
yuanpingi

Hi, I'm on Ubuntu 16.04, and kubeadm init hangs at this step:
[apiclient] Created API client, waiting for the control plane to become ready

初扬

The first run takes a while because it has to download images; if it fails, uninstall and retry.

靖康

I'm stuck at the same step; how do I uninstall?

初扬

The uninstall procedure is in the article; search the page for the reset section.

cloudexp

It just hangs at the init step with no error; how do I get past it?

6F
vitohermes

2月 06 12:07:17 iZuf6fu4aqltf1evxoeqdxZ kubelet[6199]: I0206 12:07:17.423998 6199 kubelet_node_status.go:230] Setting node annotation to enable volume controller attach/detach
2月 06 12:07:17 iZuf6fu4aqltf1evxoeqdxZ kubelet[6199]: I0206 12:07:17.424495 6199 alicloud.go:190] Alicloud.ExternalID("izuf6fu4aqltf1evxoeqdxz")
2月 06 12:07:17 iZuf6fu4aqltf1evxoeqdxZ kubelet[6199]: I0206 12:07:17.424524 6199 alicloud_instances.go:101] Alicloud.findInstanceByNodeName("izuf6fu4aqltf1evxoeqdxz")
2月 06 12:07:17 iZuf6fu4aqltf1evxoeqdxZ kubelet[6199]: E0206 12:07:17.456786 6199 kubelet_node_status.go:73] Unable to construct v1.Node object for kubelet: failed to get external ID from cloud provider: instance not found
2月 06 12:07:17 iZuf6fu4aqltf1evxoeqdxZ kubelet[6199]: E0206 12:07:17.818413 6199 reflector.go:188] pkg/kubelet/config/apiserver.go:45: Failed to list *v1.Pod: Get https://106.14.31.4:6443/api/v1/pods?fieldSelector=spec.nodeName%3Dizuf6fu4aqltf1evxoeqdxz&resourceVersion=0: dial tcp 106.14.31.4:6443: getsockopt: connection refused
2月 06 12:07:17 iZuf6fu4aqltf1evxoeqdxZ kubelet[6199]: E0206 12:07:17.818434 6199 reflector.go:188] pkg/kubelet/kubelet.go:379: Failed to list *v1.Service: Get https://106.14.31.4:6443/api/v1/services?resourceVersion=0: dial tcp 106.14.31.4:6443: getsockopt: connection refused
2月 06 12:07:17 iZuf6fu4aqltf1evxoeqdxZ kubelet[6199]: E0206 12:07:17.818431 6199 reflector.go:188] pkg/kubelet/kubelet.go:387: Failed to list *v1.Node: Get https://106.14.31.4:6443/api/v1/nodes?fieldSelector=metadata.name%3Dizuf6fu4aqltf1evxoeqdxz&resourceVersion=0: dial tcp 106.14.31.4:6443: getsockopt: connection refused
2月 06 12:07:18 iZuf6fu4aqltf1evxoeqdxZ kubelet[6199]: E0206 12:07:18.819196 6199 reflector.go:188] pkg/kubelet/config/apiserver.go:45: Failed to list *v1.Pod: Get https://106.14.31.4:6443/api/v1/pods?fieldSelector=spec.nodeName%3Dizuf6fu4aqltf1evxoeqdxz&resourceVersion=0: dial tcp 106.14.31.4:6443: getsockopt: connection refused
2月 06 12:07:18 iZuf6fu4aqltf1evxoeqdxZ kubelet[6199]: E0206 12:07:18.819233 6199 reflector.go:188] pkg/kubelet/kubelet.go:379: Failed to list *v1.Service: Get https://106.14.31.4:6443/api/v1/services?resourceVersion=0: dial tcp 106.14.31.4:6443: getsockopt: connection refused
2月 06 12:07:18 iZuf6fu4aqltf1evxoeqdxZ kubelet[6199]: E0206 12:07:18.819255 6199 reflector.go:188] pkg/kubelet/kubelet.go:387: Failed to list *v1.Node: Get https://106.14.31.4:6443/api/v1/nodes?fieldSelector=metadata.name%3Dizuf6fu4aqltf1evxoeqdxz&resourceVersion=0: dial tcp 106.14.31.4:6443: getsockopt: connection refused
2月 06 12:07:19 iZuf6fu4aqltf1evxoeqdxZ kubelet[6199]: I0206 12:07:19.735119 6199 kubelet.go:1725] skipping pod synchronization - [Kubelet failed to get node info: failed to get external ID from cloud provider: instance not found]
2月 06 12:07:19 iZuf6fu4aqltf1evxoeqdxZ kubelet[6199]: E0206 12:07:19.819733 6199 reflector.go:188] pkg/kubelet/kubelet.go:379: Failed to list *v1.Service: Get https://106.14.31.4:6443/api/v1/services?resourceVersion=0: dial tcp 106.14.31.4:6443: getsockopt: connection refused
2月 06 12:07:19 iZuf6fu4aqltf1evxoeqdxZ kubelet[6199]: E0206 12:07:19.819743 6199 reflector.go:188] pkg/kubelet/config/apiserver.go:45: Failed to list *v1.Pod: Get https://106.14.31.4:6443/api/v1/pods?fieldSelector=spec.nodeName%3Dizuf6fu4aqltf1evxoeqdxz&resourceVersion=0: dial tcp 106.14.31.4:6443: getsockopt: connection refused
2月 06 12:07:19 iZuf6fu4aqltf1evxoeqdxZ kubelet[6199]: E0206 12:07:19.819750 6199 reflector.go:188] pkg/kubelet/kubelet.go:387: Failed to list *v1.Node: Get https://106.14.31.4:6443/api/v1/nodes?fieldSelector=metadata.name%3Dizuf6fu4aqltf1evxoeqdxz&resourceVersion=0: dial tcp 106.14.31.4:6443: getsockopt: connection refused

初扬

Did you change the instance name in the console? Don't change it. Also make sure your key and id are entered correctly.

靖康

So the instance name must not be set? Should it be left blank?

初扬

@靖康 It must not be changed; just keep whatever instance name the instance was created with.

1498151011639535

If I already changed it, can I change it back?

7F
maxrocky

I tried both CentOS 7.2 and Ubuntu 16, and this one-click installer fails on both.
It stops at the last line: [apiclient] Created API client, waiting for the control plane to become ready
and nothing happens.
The kubelet log shows errors:
2月 07 12:48:39 iZ2ze18tqzf74akvgmt0n0Z kubelet[5324]: I0207 12:48:39.337326 5324 kubelet_node_status.go:230] Setting node annotation to enable volume controller attach/detach
2月 07 12:48:39 iZ2ze18tqzf74akvgmt0n0Z kubelet[5324]: I0207 12:48:39.337349 5324 alicloud.go:190] Alicloud.ExternalID("iz2ze18tqzf74akvgmt0n0z")
2月 07 12:48:39 iZ2ze18tqzf74akvgmt0n0Z kubelet[5324]: I0207 12:48:39.337355 5324 alicloud_instances.go:101] Alicloud.findInstanceByNodeName("iz2ze18tqzf74akvgmt0n0z")
2月 07 12:48:39 iZ2ze18tqzf74akvgmt0n0Z kubelet[5324]: E0207 12:48:39.459129 5324 kubelet_node_status.go:73] Unable to construct v1.Node object for kubelet: failed to get external ID from cloud provider: instance not found
2月 07 12:48:39 iZ2ze18tqzf74akvgmt0n0Z kubelet[5324]: E0207 12:48:39.846678 5324 cni.go:163] error updating cni config: No networks found in /etc/cni/net.d
2月 07 12:48:40 iZ2ze18tqzf74akvgmt0n0Z kubelet[5324]: I0207 12:48:40.029335 5324 kubelet.go:1725] skipping pod synchronization - [Kubelet failed to get node info: failed to get external ID from cloud provider: instance not found]
2月 07 12:48:40 iZ2ze18tqzf74akvgmt0n0Z kubelet[5324]: E0207 12:48:40.152146 5324 reflector.go:188] pkg/kubelet/kubelet.go:379: Failed to list *v1.Service: Get https://47.93.78.17:6443/api/v1/services?resourceVersion=0: dial tcp 47.93.78.17:6443: getsockopt: connection refused
2月 07 12:48:40 iZ2ze18tqzf74akvgmt0n0Z kubelet[5324]: E0207 12:48:40.152570 5324 reflector.go:188] pkg/kubelet/config/apiserver.go:45: Failed to list *v1.Pod: Get https://47.93.78.17:6443/api/v1/pods?fieldSelector=spec.nodeName%3Diz2ze18tqzf74akvgmt0n0z&resourceVersion=0: dial tcp 47.93.78.17:6443: getsockopt: connection refused
2月 07 12:48:40 iZ2ze18tqzf74akvgmt0n0Z kubelet[5324]: E0207 12:48:40.152833 5324 reflector.go:188] pkg/kubelet/kubelet.go:387: Failed to list *v1.Node: Get https://47.93.78.17:6443/api/v1/nodes?fieldSelector=metadata.name%3Diz2ze18tqzf74akvgmt0n0z&resourceVersion=0: dial tcp 47.93.78.17:6443: getsockopt: connection refused

初扬

Check that your access_key and access_secret are correct, and make sure you haven't changed the instance name in the console.

nickzibow

@maxrocky, which file do I look in for the kubelet logs?

初扬

journalctl -u kubelet -f

maxrocky

@初扬
The key and secret are correct. I'm using an account-wide key; that should be fine, right?
As for the instance name, I never changed it, and I don't know what operations would change it.
I reinstalled in a completely fresh environment and got the same error.
@nickzibow
The command I use:
systemctl status kubelet -l

初扬

@maxrocky Check in the console whether your instance name matches the hostname.

maxrocky

@初扬
On the machine itself:
[root@iZ2ze18tqzf74akvgmt0n0Z ~]# hostnamectl status
Static hostname: iZ2ze18tqzf74akvgmt0n0Z
In the console:
ID: i-2ze18tqzf74akvgmt0n0
That counts as the same, right?

maxrocky

@初扬
Help, please...

初扬

Search for @初扬 on DingTalk; I'll DM you to collect the logs for troubleshooting.

maxrocky

@初扬 Searching in DingTalk turns up nothing. My phone number is 18910906850, could you add me?
I've been stuck on this for days and can't solve it.
Installing the official build directly means no SLB.

cheyang52

@maxrocky Search DingTalk for spacexnice

8F
nickzibow

@初扬
Hi, I'm also stuck at init; could you help me figure out what's wrong? ps: the key-id and key-secret are confirmed correct, and the instance name was never modified. The OS is CentOS 7.
[apiclient] Created API client, waiting for the control plane to become ready

The logs from journalctl -u kubelet -f:
2月 08 12:55:33 iZ940dv4nthZ systemd[1]: Started kubelet: The Kubernetes Node Agent.
2月 08 12:55:33 iZ940dv4nthZ systemd[1]: Starting kubelet: The Kubernetes Node Agent...
2月 08 12:55:33 iZ940dv4nthZ kubelet[2232]: I0208 12:55:33.558990 2232 feature_gate.go:181] feature gates: map[]
2月 08 12:55:33 iZ940dv4nthZ kubelet[2232]: error: failed to run Kubelet: invalid kubeconfig: stat /etc/kubernetes/kubelet.conf: no such file or directory
2月 08 12:55:33 iZ940dv4nthZ systemd[1]: kubelet.service: main process exited, code=exited, status=1/FAILURE
2月 08 12:55:33 iZ940dv4nthZ systemd[1]: Unit kubelet.service entered failed state.
2月 08 12:55:33 iZ940dv4nthZ systemd[1]: kubelet.service failed.
2月 08 12:55:43 iZ940dv4nthZ systemd[1]: kubelet.service holdoff time over, scheduling restart.
2月 08 12:55:43 iZ940dv4nthZ systemd[1]: Started kubelet: The Kubernetes Node Agent.
2月 08 12:55:43 iZ940dv4nthZ systemd[1]: Starting kubelet: The Kubernetes Node Agent...
2月 08 12:55:43 iZ940dv4nthZ kubelet[2268]: I0208 12:55:43.759107 2268 feature_gate.go:181] feature gates: map[]
2月 08 12:55:43 iZ940dv4nthZ kubelet[2268]: I0208 12:55:43.762847 2268 docker.go:356] Connecting to docker on unix:///var/run/docker.sock
2月 08 12:55:43 iZ940dv4nthZ kubelet[2268]: I0208 12:55:43.762874 2268 docker.go:376] Start docker client with request timeout=2m0s
2月 08 12:55:43 iZ940dv4nthZ kubelet[2268]: E0208 12:55:43.764181 2268 cni.go:163] error updating cni config: No networks found in /etc/cni/net.d
2月 08 12:55:43 iZ940dv4nthZ kubelet[2268]: I0208 12:55:43.770163 2268 alicloud.go:226] Alicloud.CurrentNodeName("iz940dv4nthz")
2月 08 12:55:43 iZ940dv4nthZ kubelet[2268]: I0208 12:55:43.770288 2268 manager.go:143] cAdvisor running in container: "/system.slice/kubelet.service"
2月 08 12:55:43 iZ940dv4nthZ kubelet[2268]: W0208 12:55:43.784078 2268 manager.go:151] unable to connect to Rkt api service: rkt: cannot tcp Dial rkt api service: dial tcp [::1]:15441: getsockopt: connection refused
2月 08 12:55:43 iZ940dv4nthZ kubelet[2268]: I0208 12:55:43.795274 2268 fs.go:117] Filesystem partitions: map[/dev/xvda1:{mountpoint:/var/lib/docker/overlay major:202 minor:1 fsType:ext4 blockSize:0} /dev/xvdb1:{mountpoint:/mnt major:202 minor:17 fsType:ext3 blockSize:0}]
2月 08 12:55:43 iZ940dv4nthZ kubelet[2268]: I0208 12:55:43.796677 2268 manager.go:198] Machine: {NumCores:4 CpuFrequency:2600062 MemoryCapacity:7933718528 MachineID:fe50755eb5ae8841447ba7240000002d SystemUUID:CBAABC2F-9567-4FAF-B54A-36DA6AD7082A BootID:d127f3c0-aee5-445e-b835-1b8dd0341240 Filesystems:[{Device:/dev/xvda1 Capacity:42139451392 Type:vfs Inodes:2621440 HasInodes:true} {Device:/dev/xvdb1 Capacity:528305963008 Type:vfs Inodes:32768000 HasInodes:true}] DiskMap:map[202:0:{Name:xvda Major:202 Minor:0 Size:42949672960 Scheduler:deadline} 202:16:{Name:xvdb Major:202 Minor:16 Size:536870912000 Scheduler:deadline}] NetworkDevices:[{Name:eth0 MacAddress:00:16:3e:00:01:8f Speed:0 Mtu:1500} {Name:eth1 MacAddress:00:16:3e:00:07:11 Speed:0 Mtu:1500}] Topology:[{Id:0 Memory:8589533184 Cores:[{Id:0 Threads:[0] Caches:[{Size:32768 Type:Data Level:1} {Size:32768 Type:Instruction Level:1} {Size:262144 Type:Unified Level:2} {Size:20971520 Type:Unified Level:3}]} {Id:1 Threads:[1] Caches:[{Size:32768 Type:Data Level:1} {Size:32768 Type:Instruction Level:1} {Size:262144 Type:Unified Level:2} {Size:20971520 Type:Unified Level:3}]} {Id:2 Threads:[2] Caches:[{Size:32768 Type:Data Level:1} {Size:32768 Type:Instruction Level:1} {Size:262144 Type:Unified Level:2} {Size:20971520 Type:Unified Level:3}]} {Id:3 Threads:[3] Caches:[{Size:32768 Type:Data Level:1} {Size:32768 Type:Instruction Level:1} {Size:262144 Type:Unified Level:2} {Size:20971520 Type:Unified Level:3}]}] Caches:[]}] CloudProvider:Unknown InstanceType:Unknown InstanceID:None}
2月 08 12:55:43 iZ940dv4nthZ kubelet[2268]: I0208 12:55:43.797240 2268 manager.go:204] Version: {KernelVersion:3.10.0-327.el7.x86_64 ContainerOsVersion:CentOS Linux 7 (Core) DockerVersion:1.13.0 CadvisorVersion: CadvisorRevision:}
2月 08 12:55:43 iZ940dv4nthZ kubelet[2268]: I0208 12:55:43.799585 2268 alicloud.go:226] Alicloud.CurrentNodeName("iz940dv4nthz")
2月 08 12:55:43 iZ940dv4nthZ kubelet[2268]: I0208 12:55:43.799787 2268 alicloud.go:226] Alicloud.CurrentNodeName("iz940dv4nthz")
2月 08 12:55:43 iZ940dv4nthZ kubelet[2268]: I0208 12:55:43.799816 2268 kubelet.go:243] Adding manifest file: /etc/kubernetes/manifests
2月 08 12:55:43 iZ940dv4nthZ kubelet[2268]: I0208 12:55:43.799853 2268 kubelet.go:253] Watching apiserver
2月 08 12:55:43 iZ940dv4nthZ kubelet[2268]: E0208 12:55:43.802379 2268 reflector.go:188] pkg/kubelet/config/apiserver.go:45: Failed to list *v1.Pod: Get https://x.x.x.x:6443/api/v1/pods?fieldSelector=spec.nodeName%3Diz940dv4nthz&resourceVersion=0: dial tcp x.x.x.x:6443: getsockopt: connection refused
2月 08 12:55:43 iZ940dv4nthZ kubelet[2268]: E0208 12:55:43.802477 2268 reflector.go:188] pkg/kubelet/kubelet.go:387: Failed to list *v1.Node: Get https://x.x.x.x:6443/api/v1/nodes?fieldSelector=metadata.name%3Diz940dv4nthz&resourceVersion=0: dial tcp x.x.x.x:6443: getsockopt: connection refused
2月 08 12:55:43 iZ940dv4nthZ kubelet[2268]: W0208 12:55:43.802954 2268 kubelet_network.go:69] Hairpin mode set to "promiscuous-bridge" but kubenet is not enabled, falling back to "hairpin-veth"
2月 08 12:55:43 iZ940dv4nthZ kubelet[2268]: I0208 12:55:43.802978 2268 kubelet.go:478] Hairpin mode set to "hairpin-veth"
2月 08 12:55:43 iZ940dv4nthZ kubelet[2268]: E0208 12:55:43.803796 2268 cni.go:163] error updating cni config: No networks found in /etc/cni/net.d
2月 08 12:55:43 iZ940dv4nthZ kubelet[2268]: E0208 12:55:43.805302 2268 reflector.go:188] pkg/kubelet/kubelet.go:379: Failed to list *v1.Service: Get https://x.x.x.x:6443/api/v1/services?resourceVersion=0: dial tcp x.x.x.x:6443: getsockopt: connection refused
2月 08 12:55:43 iZ940dv4nthZ kubelet[2268]: I0208 12:55:43.813919 2268 docker_manager.go:259] Setting dockerRoot to /var/lib/docker
2月 08 12:55:43 iZ940dv4nthZ kubelet[2268]: I0208 12:55:43.813938 2268 docker_manager.go:262] Setting cgroupDriver to cgroupfs
2月 08 12:55:43 iZ940dv4nthZ kubelet[2268]: I0208 12:55:43.815155 2268 server.go:777] Started kubelet v1.6.0-alpha.0.2229+88fbc68ad99479-dirty
2月 08 12:55:43 iZ940dv4nthZ kubelet[2268]: E0208 12:55:43.815815 2268 kubelet.go:1146] Image garbage collection failed: unable to find data for container /
2月 08 12:55:43 iZ940dv4nthZ kubelet[2268]: I0208 12:55:43.816102 2268 kubelet_node_status.go:230] Setting node annotation to enable volume controller attach/detach
2月 08 12:55:43 iZ940dv4nthZ kubelet[2268]: I0208 12:55:43.816113 2268 alicloud.go:190] Alicloud.ExternalID("iz940dv4nthz")
2月 08 12:55:43 iZ940dv4nthZ kubelet[2268]: I0208 12:55:43.816120 2268 alicloud_instances.go:101] Alicloud.findInstanceByNodeName("iz940dv4nthz")
2月 08 12:55:43 iZ940dv4nthZ kubelet[2268]: I0208 12:55:43.816912 2268 server.go:125] Starting to listen on 0.0.0.0:10250
2月 08 12:55:43 iZ940dv4nthZ kubelet[2268]: E0208 12:55:43.819878 2268 event.go:208] Unable to write event: 'Post https://x.x.x.x:6443/api/v1/namespaces/default/events: dial tcp x.x.x.x:6443: getsockopt: connection refused' (may retry after sleeping)
2月 08 12:55:44 iZ940dv4nthZ kubelet[2268]: E0208 12:55:44.025700 2268 kubelet.go:1228] Kubelet failed to get node info: failed to get external ID from cloud provider: instance not found
2月 08 12:55:44 iZ940dv4nthZ kubelet[2268]: I0208 12:55:44.025747 2268 status_manager.go:131] Starting to sync pod status with apiserver
2月 08 12:55:44 iZ940dv4nthZ kubelet[2268]: I0208 12:55:44.025774 2268 kubelet.go:1714] Starting kubelet main sync loop.
2月 08 12:55:44 iZ940dv4nthZ kubelet[2268]: I0208 12:55:44.025784 2268 kubelet.go:1725] skipping pod synchronization - [Kubelet failed to get node info: failed to get external ID from cloud provider: instance not found container runtime is down]
2月 08 12:55:44 iZ940dv4nthZ kubelet[2268]: I0208 12:55:44.026005 2268 volume_manager.go:242] Starting Kubelet Volume Manager
2月 08 12:55:44 iZ940dv4nthZ kubelet[2268]: I0208 12:55:44.041277 2268 factory.go:295] Registering Docker factory
2月 08 12:55:44 iZ940dv4nthZ kubelet[2268]: W0208 12:55:44.041318 2268 manager.go:247] Registration of the rkt container factory failed: unable to communicate with Rkt api service: rkt: cannot tcp Dial rkt api service: dial tcp [::1]:15441: getsockopt: connection refused
2月 08 12:55:44 iZ940dv4nthZ kubelet[2268]: I0208 12:55:44.041328 2268 factory.go:54] Registering systemd factory
2月 08 12:55:44 iZ940dv4nthZ kubelet[2268]: I0208 12:55:44.041456 2268 factory.go:86] Registering Raw factory
2月 08 12:55:44 iZ940dv4nthZ kubelet[2268]: I0208 12:55:44.041591 2268 manager.go:1106] Started watching for new ooms in manager
2月 08 12:55:44 iZ940dv4nthZ kubelet[2268]: I0208 12:55:44.042841 2268 oomparser.go:185] oomparser using systemd
2月 08 12:55:44 iZ940dv4nthZ kubelet[2268]: I0208 12:55:44.044116 2268 manager.go:288] Starting recovery of all containers
2月 08 12:55:44 iZ940dv4nthZ kubelet[2268]: I0208 12:55:44.082920 2268 manager.go:293] Recovery completed
2月 08 12:55:44 iZ940dv4nthZ kubelet[2268]: I0208 12:55:44.126264 2268 kubelet_node_status.go:230] Setting node annotation to enable volume controller attach/detach
2月 08 12:55:44 iZ940dv4nthZ kubelet[2268]: I0208 12:55:44.126317 2268 alicloud.go:190] Alicloud.ExternalID("iz940dv4nthz")
2月 08 12:55:44 iZ940dv4nthZ kubelet[2268]: I0208 12:55:44.126338 2268 alicloud_instances.go:101] Alicloud.findInstanceByNodeName("iz940dv4nthz")
2月 08 12:55:44 iZ940dv4nthZ kubelet[2268]: E0208 12:55:44.146112 2268 eviction_manager.go:202] eviction manager: unexpected err: failed GetNode: node 'iz940dv4nthz' not found
2月 08 12:55:44 iZ940dv4nthZ kubelet[2268]: E0208 12:55:44.250498 2268 kubelet_node_status.go:73] Unable to construct v1.Node object for kubelet: failed to get external ID from cloud provider: instance not found
2月 08 12:55:44 iZ940dv4nthZ kubelet[2268]: I0208 12:55:44.450690 2268 kubelet_node_status.go:230] Setting node annotation to enable volume controller attach/detach
2月 08 12:55:44 iZ940dv4nthZ kubelet[2268]: I0208 12:55:44.450745 2268 alicloud.go:190] Alicloud.ExternalID("iz940dv4nthz")
2月 08 12:55:44 iZ940dv4nthZ kubelet[2268]: I0208 12:55:44.450757 2268 alicloud_instances.go:101] Alicloud.findInstanceByNodeName("iz940dv4nthz")
2月 08 12:55:44 iZ940dv4nthZ kubelet[2268]: E0208 12:55:44.603183 2268 kubelet_node_status.go:73] Unable to construct v1.Node object for kubelet: failed to get external ID from cloud provider: instance not found
2月 08 12:55:44 iZ940dv4nthZ kubelet[2268]: E0208 12:55:44.803310 2268 reflector.go:188] pkg/kubelet/kubelet.go:387: Failed to list *v1.Node: Get https://x.x.x.x:6443/api/v1/nodes?fieldSelector=metadata.name%3Diz940dv4nthz&resourceVersion=0: dial tcp x.x.x.x:6443: getsockopt: connection refused
2月 08 12:55:44 iZ940dv4nthZ kubelet[2268]: E0208 12:55:44.803310 2268 reflector.go:188] pkg/kubelet/config/apiserver.go:45: Failed to list *v1.Pod: Get https://x.x.x.x:6443/api/v1/pods?fieldSelector=spec.nodeName%3Diz940dv4nthz&resourceVersion=0: dial tcp x.x.x.x:6443: getsockopt: connection refused

Note:
x.x.x.x is the instance's public IP address

nickzibow

Thanks, solved. It was indeed caused by the instance having been renamed.

maxrocky

How did you solve it?

nickzibow

@maxrocky I had renamed my ECS instance. When an ECS instance is first created, the instance ID and name should be identical; check whether yours has been renamed.

maxrocky

Never changed it. And it fails even on a brand-new VM installed from scratch.

lzwujun

@nickzibow Many thanks! Same problem here, thanks.

hupo

@maxrocky Could the region be wrong? That also causes "instance not found".

黑水之神

@nickzibow Hi, I'm also seeing error updating cni config: No networks found in /etc/cni/net.d. Do I need to change the name back to the original?

cloudexp

My instance name has been changed; will changing it back work?

9F
网君

The article doesn't seem to say that the instance name must not be modified (is renaming supported at all?). Before the New Year I also ran into the same problems as the folks above several times, and gave up and went back to the deployment method at https://yq.aliyun.com/articles/66474. Also, a question for the author: the access key and secret in "Quickly deploy Kubernetes on Alibaba Cloud - VPC environment" don't seem to have any effect, do they? I don't know how to get the services in my containers to use SLB. Thanks.

初扬

Deploying with the method in this article gives you Alibaba Cloud SLB support.

10F
nickzibow

@初扬 A question: after startup the following problem keeps occurring; what's the cause?

[screenshot]

初扬

Run kubectl describe pod kube-dns-xxxx-xxxx to see why it keeps restarting.

nickzibow

@初扬 The log output is as follows:

Events:
FirstSeen LastSeen Count From SubObjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
1h 7m 9 {kubelet iz940dv4nthz} Warning FailedSync Error syncing pod, skipping: [failed to "StartContainer" for "dnsmasq" with CrashLoopBackOff: "Back-off 5m0s restarting failed container=dnsmasq pod=kube-dns-963068468-zmznr_kube-system(25d03f8a-ee7c-11e6-bf2b-00163e00018f)"
, failed to "StartContainer" for "kube-dns" with CrashLoopBackOff: "Back-off 5m0s restarting failed container=kube-dns pod=kube-dns-963068468-zmznr_kube-system(25d03f8a-ee7c-11e6-bf2b-00163e00018f)"
]
1h 5m 24 {kubelet iz940dv4nthz} spec.containers{dnsmasq} Normal Pulled Container image "registry.cn-hangzhou.aliyuncs.com/google-containers/kube-dnsmasq-amd64:1.4" already present on machine
1h 4m 111 {kubelet iz940dv4nthz} Warning FailedSync Error syncing pod, skipping: failed to "StartContainer" for "kube-dns" with CrashLoopBackOff: "Back-off 5m0s restarting failed container=kube-dns pod=kube-dns-963068468-zmznr_kube-system(25d03f8a-ee7c-11e6-bf2b-00163e00018f)"

1h 2m 24 {kubelet iz940dv4nthz} spec.containers{kube-dns} Normal Pulled Container image "registry.cn-hangzhou.aliyuncs.com/google-containers/kubedns-amd64:1.9" already present on machine
1h 2m 41 {kubelet iz940dv4nthz} spec.containers{dnsmasq} Normal Created (events with common reason combined)
1h 2m 41 {kubelet iz940dv4nthz} spec.containers{dnsmasq} Normal Started (events with common reason combined)
1h 1m 130 {kubelet iz940dv4nthz} Warning FailedSync Error syncing pod, skipping: failed to "StartContainer" for "dnsmasq" with CrashLoopBackOff: "Back-off 5m0s restarting failed container=dnsmasq pod=kube-dns-963068468-zmznr_kube-system(25d03f8a-ee7c-11e6-bf2b-00163e00018f)"

1h 1m 56 {kubelet iz940dv4nthz} spec.containers{kube-dns} Warning Unhealthy Liveness probe failed: HTTP probe failed with statuscode: 503
1h 57s 263 {kubelet iz940dv4nthz} spec.containers{kube-dns} Warning Unhealthy Readiness probe failed: Get http://172.16.0.3:8081/readiness: dial tcp 172.16.0.3:8081: getsockopt: connection refused
1h 56s 39 {kubelet iz940dv4nthz} spec.containers{kube-dns} Normal Killing (events with common reason combined)
1h 14s 443 {kubelet iz940dv4nthz} spec.containers{kube-dns} Warning BackOff Back-off restarting failed docker container
1h 14s 81 {kubelet iz940dv4nthz} Warning FailedSync Error syncing pod, skipping: [failed to "StartContainer" for "kube-dns" with CrashLoopBackOff: "Back-off 5m0s restarting failed container=kube-dns pod=kube-dns-963068468-zmznr_kube-system(25d03f8a-ee7c-11e6-bf2b-00163e00018f)"
, failed to "StartContainer" for "dnsmasq" with CrashLoopBackOff: "Back-off 5m0s restarting failed container=dnsmasq pod=kube-dns-963068468-zmznr_kube-system(25d03f8a-ee7c-11e6-bf2b-00163e00018f)"
]

nickzibow

@初扬 The dashboard container's docker startup logs:

[root@iZ940dv4nthZ ~]# docker logs -f 50ec2428f61b
Using HTTP port: 9090
Creating API server client for https://172.19.0.1:443
Error while initializing connection to Kubernetes apiserver. This most likely means that the cluster is misconfigured (e.g., it has invalid apiserver certificates or service accounts configuration) or the --apiserver-host param points to a server that does not exist. Reason: Get https://172.19.0.1:443/version: dial tcp 172.19.0.1:443: getsockopt: no route to host
Refer to the troubleshooting guide for more information: https://github.com/kubernetes/dashboard/blob/master/docs/user-guide/troubleshooting.md

初扬

@nickzibow Check the DNS pod's logs

nickzibow

@初扬
The kube-dns container logs on the master show the following errors; how do I fix this?

I0212 02:53:48.014491       1 dns.go:42] version: v1.6.0-alpha.0.680+3872cb93abf948-dirty
I0212 02:53:48.015241       1 server.go:107] Using https://172.19.0.1:443 for kubernetes master, kubernetes API: <nil>
I0212 02:53:48.016078       1 server.go:68] Using configuration read from ConfigMap: kube-system:kube-dns
I0212 02:53:48.016193       1 server.go:113] FLAG: --alsologtostderr="false"
I0212 02:53:48.016259       1 server.go:113] FLAG: --config-map="kube-dns"
I0212 02:53:48.016302       1 server.go:113] FLAG: --config-map-namespace="kube-system"
I0212 02:53:48.016334       1 server.go:113] FLAG: --dns-bind-address="0.0.0.0"
I0212 02:53:48.016360       1 server.go:113] FLAG: --dns-port="10053"
I0212 02:53:48.016397       1 server.go:113] FLAG: --domain="cluster.local."
I0212 02:53:48.016453       1 server.go:113] FLAG: --federations=""
I0212 02:53:48.016505       1 server.go:113] FLAG: --healthz-port="8081"
I0212 02:53:48.016544       1 server.go:113] FLAG: --kube-master-url=""
I0212 02:53:48.016588       1 server.go:113] FLAG: --kubecfg-file=""
I0212 02:53:48.016641       1 server.go:113] FLAG: --log-backtrace-at=":0"
I0212 02:53:48.016672       1 server.go:113] FLAG: --log-dir=""
I0212 02:53:48.016699       1 server.go:113] FLAG: --log-flush-frequency="5s"
I0212 02:53:48.016762       1 server.go:113] FLAG: --logtostderr="true"
I0212 02:53:48.016790       1 server.go:113] FLAG: --stderrthreshold="2"
I0212 02:53:48.016828       1 server.go:113] FLAG: --v="2"
I0212 02:53:48.016868       1 server.go:113] FLAG: --version="false"
I0212 02:53:48.016916       1 server.go:113] FLAG: --vmodule=""
I0212 02:53:48.017033       1 server.go:155] Starting SkyDNS server (0.0.0.0:10053)
I0212 02:53:48.017445       1 server.go:165] Skydns metrics enabled (/metrics:10055)
I0212 02:53:48.017520       1 dns.go:144] Starting endpointsController
I0212 02:53:48.017566       1 dns.go:147] Starting serviceController
I0212 02:53:48.018557       1 logs.go:41] skydns: ready for queries on cluster.local. for tcp://0.0.0.0:10053 [rcache 0]
I0212 02:53:48.018586       1 logs.go:41] skydns: ready for queries on cluster.local. for udp://0.0.0.0:10053 [rcache 0]
E0212 02:53:49.020118       1 sync.go:105] Error getting ConfigMap kube-system:kube-dns err: Get https://172.19.0.1:443/api/v1/namespaces/kube-system/configmaps/kube-dns: dial tcp 172.19.0.1:443: getsockopt: no route to host
E0212 02:53:49.020156       1 dns.go:190] Error getting initial ConfigMap: Get https://172.19.0.1:443/api/v1/namespaces/kube-system/configmaps/kube-dns: dial tcp 172.19.0.1:443: getsockopt: no route to host, starting with default values
I0212 02:53:49.020206       1 dns.go:163] Waiting for Kubernetes service
I0212 02:53:49.020217       1 dns.go:169] Waiting for service: default/kubernetes
E0212 02:53:55.033962       1 reflector.go:199] pkg/dns/dns.go:148: Failed to list *api.Service: Get https://172.19.0.1:443/api/v1/services?resourceVersion=0: dial tcp 172.19.0.1:443: getsockopt: connection timed out
E0212 02:53:55.034004       1 reflector.go:199] pkg/dns/dns.go:145: Failed to list *api.Endpoints: Get https://172.19.0.1:443/api/v1/endpoints?resourceVersion=0: dial tcp 172.19.0.1:443: getsockopt: connection timed out
E0212 02:53:56.033833       1 reflector.go:199] pkg/dns/config/sync.go:114: Failed to list *api.ConfigMap: Get https://172.19.0.1:443/api/v1/namespaces/kube-system/configmaps?fieldSelector=metadata.name%3Dkube-dns&resourceVersion=0: dial tcp 172.19.0.1:443: getsockopt: connection timed out
E0212 02:54:03.049902       1 reflector.go:199] pkg/dns/dns.go:148: Failed to list *api.Service: Get https://172.19.0.1:443/api/v1/services?resourceVersion=0: dial tcp 172.19.0.1:443: getsockopt: connection timed out
E0212 02:54:03.049916       1 reflector.go:199] pkg/dns/dns.go:145: Failed to list *api.Endpoints: Get https://172.19.0.1:443/api/v1/endpoints?resourceVersion=0: dial tcp 172.19.0.1:443: getsockopt: no route to host
E0212 02:54:04.049877       1 reflector.go:199] pkg/dns/config/sync.go:114: Failed to list *api.ConfigMap: Get https://172.19.0.1:443/api/v1/namespaces/kube-system/configmaps?fieldSelector=metadata.name%3Dkube-dns&resourceVersion=0: dial tcp 172.19.0.1:443: getsockopt: no route to host
E0212 02:54:11.065856       1 reflector.go:199] pkg/dns/dns.go:148: Failed to list *api.Service: Get https://172.19.0.1:443/api/v1/services?resourceVersion=0: dial tcp 172.19.0.1:443: getsockopt: connection timed out
E0212 02:54:11.065980       1 reflector.go:199] pkg/dns/dns.go:145: Failed to list *api.Endpoints: Get https://172.19.0.1:443/api/v1/endpoints?resourceVersion=0: dial tcp 172.19.0.1:443: getsockopt: connection timed out

nickzibow

@初扬
Some feedback:
There's a small typo in the command under "Option 2: classic network support":

kubect apply -f http://aliacs-k8s.oss-cn-hangzhou.aliyuncs.com/conf/flannel-vxlan.yml 
is missing an "l"; it should be
kubectl apply -f http://aliacs-k8s.oss-cn-hangzhou.aliyuncs.com/conf/flannel-vxlan.yml

Please fix it, thanks

初扬

@nickzibow Fixed, thanks!

初扬

Check the flannel logs; the network creation step probably didn't succeed. Or search @初扬 on DingTalk and I'll DM you.

nickzibow

@初扬
Could you share your DingTalk account? Searching for "初扬" here turns up nothing.

初扬

@nickzibow spacexnice

noahzao

@初扬 @nickzibow I'm hitting the same problem and can't use kube-dns

1149051982032979

@初扬 @nickzibow I ran into this too; how was it solved in the end?

11F
hupo

The step that creates the SLB service errored:

[screenshot]

2017-02-09 20:21:45 +0800 CST 2017-02-09 20:21:45 +0800 CST 1 nginx Service Warning CreatingLoadBalancerFailed {service-controller } Error creating load balancer (will retry): Failed to create load balancer for service default/nginx: Aliyun API Error: RequestId: E86A67C5-2EC2-4773-AA45-8628FA8D7756 Status Code: 400 Code: ObtainIpFail Message: Obtain Ip Fail, please make sure the ECS exists and the status is running.

hupo

@初扬 Is this a cloudprovider problem?

hupo

Found the problem: by default the SLB is created in cn-hangzhou while the ECS is in cn-beijing. You have to add the region with kubectl edit svc nginx.

Why doesn't SLB creation read the settings in cloud-config?

The contents of /etc/kubernetes/cloud-config:
{
  "global": {
    "accessKeyID": "xxx",
    "accessKeySecret": "xxx",
    "kubernetesClusterTag": "hangzhou-kube",
    "region": "cn-beijing"
  }
}

初扬

OK, thanks for the feedback; I'll change the default configuration. @hupo

hupo

@初扬 Thanks a lot! After you update the code, will we need to redeploy our cluster?

lzwujun

@初扬 After I run kubectl apply -f flannel-vpc.yml, Alibaba Cloud keeps creating load balancers for me over and over; I'm already up to more than a dozen. My region is North China 2, yet it keeps creating load balancers in East China 2. I asked support via a ticket and they said they were created through the API. Where do I add a restriction?

初扬

@lzwujun That happens because the region you entered at install time doesn't match the region your ECS instances are actually in. What region did you enter during installation?

初扬

@hupo Search DingTalk for spacexnice

12F
maxrocky

Finally installed successfully. I had always assumed the "instance name" meant the instance ID; it's actually the name.
Changing the name to match the hostname fixed it.
But we use instance names to help manage our ECS fleet, so making them match the hostname is rather inconvenient.
For the next version I'd suggest not going through the instance name.

nickzibow

Did your DNS configuration work? @maxrocky

13F
靖康

Awesome! Tried it over the weekend and it worked great. Got it up and running!

14F
lzwujun

@初扬 After I run kubectl apply -f flannel-vpc.yml, Alibaba Cloud keeps creating load balancers over and over; I already have more than a dozen. My region is North China 2, yet it keeps creating load balancers in East China 2. I asked support via a ticket and they said they were created through the API. I had no choice but to delete the AccessKey, but then how am I supposed to run my application?

初扬

Search DingTalk for spacexnice

15F
易加油

On classic-network servers, kubeadm init must be given the private IP, and the other server must join via the master's private IP for the flannel plugin container to run properly. But I've found that the docker0 interfaces on the master and the node still can't ping each other. Do I also need to configure routes, or change the docker0 settings?

cheyang52

You can delete the docker0 interface; we don't use docker0.

16F
lzwujun

@初扬, first of all, thank you very much for this article; it saved those of us who want to play with Kubernetes on Alibaba Cloud a lot of wrong turns. Most importantly, thank you for your very patient guidance, which got us out of one pit after another. After a few days of hands-on struggle I have a much better understanding of Kubernetes and container technology. Thanks! All the best.

17F
qingtingaliyun

Thanks, looking forward to the CloudProvider. Not being able to rename instances with this script isn't very convenient; I'd suggest going by instance ID instead. Also, this flannel setup goes over eth1, not the private network.

18F
oneyonyou

@初扬 A question: can a self-built k8s cluster on Alibaba Cloud integrate SLB? If so, how? (Is it via https://www.github.com/kubeup/kube-aliyun?)

ledzep2

That's the third-party, out-of-tree cloudprovider I develop. It already supports SLB, Routes, Volumes/PV, and Dynamic Provisioning. Feel free to try it. You can use this example for one-click deployment: https://github.com/kubeup/archon/tree/master/example/k8s-aliyun

1331664247802516

What you built is garbage, riddled with errors. The docs, the instructions, the config files are full of mistakes and contradict each other. It broke two of my servers outright. Something this shoddy shouldn't be out here embarrassing itself.

1331664247802516

Do not use https://github.com/kubeup/archon/tree/master/example/k8s-aliyun; this thing is garbage.

裴行检

@ledzep2 Could you provide a complete walkthrough for https://www.github.com/kubeup/kube-aliyun?
I want to use Alibaba Cloud's aliyun-flexv; my k8s cluster was installed following https://github.com/kubeup/okdc

19F
马二

@初扬 The first time, I set up one master and one node and finished with classic-network support; no problems. But adding another node fails and keeps erroring:

Mar 23 17:08:35 hisensesaas8 kubelet[4941]: I0323 17:08:35.120723    4941 alicloud_instances.go:101] Alicloud.findInstanceByNodeName("hisensesaas8")
Mar 23 17:08:39 hisensesaas8 kubelet[4941]: E0323 17:08:39.530494    4941 cni.go:163] error updating cni config: No networks found in /etc/cni/net.d

There's no /etc/cni/net.d; I checked, and the 10-flannel.conf file was indeed never generated. How do I fix this? I ran the following again:

[root@node1 ~]# kubectl apply -f http://aliacs-k8s.oss-cn-hangzhou.aliyuncs.com/conf/flannel-vxlan.yml

Still no luck, and on the node I can't run kubectl get nodes and similar commands; it says:

$kubectl get pods
The connection to the server localhost:8080 was refused - did you specify the right host or port?

So I can't run kubectl apply on the node either; how do I solve this?

20F
1331664247802516

If I want to deploy 3 masters, how do I install and configure SLB between the nodes and the masters? Also, does your master install already bundle etcd? And what if I want to deploy a separate etcd cluster? Thanks.
