我想要我的pod的回滚部署。我正在set ImageCI环境中更新我的pod 。当我将Deployment / web文件中的maxUnavailable设置为1时,我会停机。但是当我将maxUnavailable设置为0时,pod不会被替换,容器/ app也不会重新启动。
此外,我在Kubernetes集群中有一个节点,这是它的信息
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
CPU Requests CPU Limits Memory Requests Memory Limits
------------ ---------- --------------- -------------
881m (93%) 396m (42%) 909712Ki (33%) 1524112Ki (56%)
Events: <none>
这是完整的YAML文件。我确实准备好了探针套装。
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
annotations:
deployment.kubernetes.io/revision: "10"
kompose.cmd: C:\ProgramData\chocolatey\lib\kubernetes-kompose\tools\kompose.exe
convert
kompose.version: 1.14.0 (fa706f2)
kubectl.kubernetes.io/last-applied-configuration: |
{"apiVersion":"extensions/v1beta1","kind":"Deployment","metadata":{"annotations":{"kompose.cmd":"C:\\ProgramData\\chocolatey\\lib\\kubernetes-kompose\\tools\\kompose.exe convert","kompose.version":"1.14.0 (fa706f2)"},"creationTimestamp":null,"labels":{"io.kompose.service":"dev-web"},"name":"dev-web","namespace":"default"},"spec":{"replicas":1,"strategy":{},"template":{"metadata":{"labels":{"io.kompose.service":"dev-web"}},"spec":{"containers":[{"env":[{"name":"JWT_KEY","value":"ABCD"},{"name":"PORT","value":"2000"},{"name":"GOOGLE_APPLICATION_CREDENTIALS","value":"serviceaccount/quick-pay.json"},{"name":"mongoCon","value":"mongodb://quickpayadmin:quickpay1234@ds121343.mlab.com:21343/quick-pay-db"},{"name":"PGHost","value":"173.255.206.177"},{"name":"PGUser","value":"postgres"},{"name":"PGDatabase","value":"quickpay"},{"name":"PGPassword","value":"z33shan"},{"name":"PGPort","value":"5432"}],"image":"gcr.io/quick-pay-208307/quickpay-dev-node:latest","imagePullPolicy":"Always","name":"dev-web-container","ports":[{"containerPort":2000}],"readinessProbe":{"failureThreshold":3,"httpGet":{"path":"/","port":2000,"scheme":"HTTP"},"initialDelaySeconds":5,"periodSeconds":5,"successThreshold":1,"timeoutSeconds":1},"resources":{"requests":{"cpu":"20m"}}}]}}}}
creationTimestamp: 2018-12-24T12:13:48Z
generation: 12
labels:
io.kompose.service: dev-web
name: dev-web
namespace: default
resourceVersion: "9631122"
selfLink: /apis/extensions/v1beta1/namespaces/default/deployments/web
uid: 5e66f7b3-0775-11e9-9653-42010a80019d
spec:
progressDeadlineSeconds: 600
replicas: 2
revisionHistoryLimit: 10
selector:
matchLabels:
io.kompose.service: web
strategy:
rollingUpdate:
maxSurge: 1
maxUnavailable: 0
type: RollingUpdate
template:
metadata:
creationTimestamp: null
labels:
io.kompose.service: web
spec:
containers:
- env:
- name: PORT
value: "2000"
image: gcr.io/myimagepath/web-node
imagePullPolicy: Always
name: web-container
ports:
- containerPort: 2000
protocol: TCP
readinessProbe:
failureThreshold: 10
httpGet:
path: /
port: 2000
scheme: HTTP
initialDelaySeconds: 10
periodSeconds: 10
successThreshold: 1
timeoutSeconds: 10
resources:
requests:
cpu: 10m
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
dnsPolicy: ClusterFirst
restartPolicy: Always
schedulerName: default-scheduler
securityContext: {}
terminationGracePeriodSeconds: 30
status:
availableReplicas: 2
conditions:
- lastTransitionTime: 2019-01-03T05:49:46Z
lastUpdateTime: 2019-01-03T05:49:46Z
message: Deployment has minimum availability.
reason: MinimumReplicasAvailable
status: "True"
type: Available
- lastTransitionTime: 2018-12-24T12:13:48Z
lastUpdateTime: 2019-01-03T06:04:24Z
message: ReplicaSet "dev-web-7bd498fc74" has successfully progressed.
reason: NewReplicaSetAvailable
status: "True"
type: Progressing
observedGeneration: 12
readyReplicas: 2
replicas: 2
updatedReplicas: 2
我试过1副本,它仍然无法正常工作。
第一种情况下,已经设置了maxUnavailable: 1,在滚动升级期间,应该会至少保留一个ready的pod。
当新pod满足探针需求,得到了200到400的状态码,pod的状态变更为:
Conditions:
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
就会继续滚动更新,删除第二个旧POD,启动第二个新POD。
如果readinessProbe在重试失败后,就会把pod标记成noready,前端service流量也不会进来。
如果发现新POD已经开始接收流量,但是无法提供有效服务。应该是探针和服务提供方式没有匹配,探针返回200,后端服务还没准备好。
在第一种情况下,kubernetes删除一个pod(maxUnavailable: 1)并使用新映像启动pod并等待~110秒(基于准备探测)以检查新pod是否能够提供请求。新pod无法为请求提供服务但pod处于运行状态,因此它删除了第二个旧pod并使用新映像启动它,并且第二个pod再次等待准备探测完成。这就是两个容器都没有准备好服务请求并因此停机的时间间隔的原因。
在你所拥有的第二个场景中,maxUnavailable:0kubernetes首先使用新图像调出pod,并且它无法在~110秒内(基于你的准备情况探测器)提供请求,因此超时并删除新的pod新图片。与第二个pod相同。因此,您的两个pod都没有更新
所以原因是你没有给你的应用程序足够的时间来启动并开始提供请求。因此这个问题。请增加failureThreshold准备情况探测中的值maxUnavailable: 0,它会起作用。
版权声明:本文内容由阿里云实名注册用户自发贡献,版权归原作者所有,阿里云开发者社区不拥有其著作权,亦不承担相应法律责任。具体规则请查看《阿里云开发者社区用户服务协议》和《阿里云开发者社区知识产权保护指引》。如果您发现本社区中有涉嫌抄袭的内容,填写侵权投诉表单进行举报,一经查实,本社区将立刻删除涉嫌侵权内容。