Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix: Recreate DaemonSet pods completed with Succeeded phase #117073

Merged
merged 1 commit into from Apr 4, 2023

Conversation

mimowo
Copy link
Contributor

@mimowo mimowo commented Apr 3, 2023

What type of PR is this?

/kind bug
/kind regression

What this PR does / why we need it:

Which issue(s) this PR fixes:

Fixes #117018

Special notes for your reviewer:

It is a fix for the issue introduced in #115331

Does this PR introduce a user-facing change?

Fix 1.27 regression recreating DaemonSet pods completed with Succeeded phase

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. kind/bug Categorizes issue or PR as related to a bug. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. kind/regression Categorizes issue or PR as related to a regression from a prior release. labels Apr 3, 2023
@k8s-ci-robot
Copy link
Contributor

Please note that we're already in Test Freeze for the release-1.27 branch. This means every merged PR will be automatically fast-forwarded via the periodic ci-fast-forward job to the release branch of the upcoming v1.27.0 release.

Fast forwards are scheduled to happen every 6 hours, whereas the most recent run was: Mon Apr 3 16:34:40 UTC 2023.

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Apr 3, 2023
@k8s-ci-robot k8s-ci-robot added area/test sig/apps Categorizes an issue or PR as relevant to SIG Apps. sig/testing Categorizes an issue or PR as relevant to SIG Testing. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Apr 3, 2023
@mimowo
Copy link
Contributor Author

mimowo commented Apr 3, 2023

/sig apps
FYI @alculquicondor

Adding an integration test for DaemonSet to confirm that Succeeded pods are not restarted. Notably, the test for the Failed pod passes, so failed pods are restared. Not sure why the discrapancy exists, seems like a bug. Will look deeper tomorrow.

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Apr 4, 2023
@mimowo mimowo force-pushed the fix-daemonset-pod-restarts branch from 4be55b6 to 96e56e7 Compare April 4, 2023 06:01
@mimowo mimowo changed the title Investigate and fix the handling of Succeeded pods in DaemonSet Restart DaemonSet pods completed with Succeeded phase Apr 4, 2023
@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Apr 4, 2023
@mimowo
Copy link
Contributor Author

mimowo commented Apr 4, 2023

/assign @alculquicondor @soltysh

@mimowo
Copy link
Contributor Author

mimowo commented Apr 4, 2023

/retest

Copy link
Contributor

@soltysh soltysh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me, but I'd like @smarterclayton to double check me, I looked in the history and couldn't find any documents about inner-workings of daemonset controller, so I'd fallback to wiser man to double check me 😅

test/integration/daemonset/daemonset_test.go Outdated Show resolved Hide resolved
@soltysh
Copy link
Contributor

soltysh commented Apr 4, 2023

/priority important-longterm
/triage accepted

@k8s-ci-robot k8s-ci-robot added the priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. label Apr 4, 2023
@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Apr 4, 2023
@k8s-ci-robot
Copy link
Contributor

LGTM label has been added.

Git tree hash: 25221ea423a393d389236df41649fa168db4fe34

@xmudrii
Copy link
Member

xmudrii commented Apr 4, 2023

ACK-ing the PR, I pinged the Release Team Leads on Slack as they're responsible for the code freeze exceptions: https://kubernetes.slack.com/archives/C2C40FMNF/p1680636984022679

@xmudrii
Copy link
Member

xmudrii commented Apr 4, 2023

cc @kubernetes/release-team-leads

@cici37
Copy link
Contributor

cici37 commented Apr 4, 2023

Looks like the issue was introduced in 1.27 and the fix is critical. Since rc.1 was delayed to Thursday, I am +1 to merge this one before v1,27,0-rc.1.

@bobbypage
Copy link
Member

bobbypage commented Apr 4, 2023

/lgtm

nit: I would retitle PR title and the release note to "Recreate DaemonSet pods completed with Succeeded phase" rather than restart (since it is a slightly more accurate).

@alculquicondor
Copy link
Member

/retitle Fix: Recreate DaemonSet pods completed with Succeeded phase

/release-note-edit

Recreate DaemonSet pods completed with Succeeded phase

@k8s-ci-robot k8s-ci-robot changed the title Restart DaemonSet pods completed with Succeeded phase Fix: Recreate DaemonSet pods completed with Succeeded phase Apr 4, 2023
@alculquicondor
Copy link
Member

/cc @kow3ns
for approval.
Release leads are advising that we merge this by EoD

@k8s-ci-robot k8s-ci-robot requested a review from kow3ns April 4, 2023 20:17
@cici37
Copy link
Contributor

cici37 commented Apr 4, 2023

Since this is considered a release blocker and introduced in 1.27, adding label to merge it before 1.27.0-rc1 cut.
/milestone v1.27

@k8s-ci-robot k8s-ci-robot added this to the v1.27 milestone Apr 4, 2023
@alculquicondor
Copy link
Member

/hold cancel

@soltysh already approved and my LGTM is under my SIG Apps reviewer hat

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 4, 2023
@k8s-ci-robot k8s-ci-robot merged commit e7e7532 into kubernetes:master Apr 4, 2023
13 checks passed
@k8s-ci-robot k8s-ci-robot modified the milestones: v1.27, v1.28 Apr 4, 2023
@xmudrii
Copy link
Member

xmudrii commented Apr 4, 2023

/remove-milestone v1.28
/milestone v1.27

@k8s-ci-robot k8s-ci-robot modified the milestones: v1.28, v1.27 Apr 4, 2023
@liggitt
Copy link
Member

liggitt commented Apr 19, 2023

since 1.26 daemonset controllers can run against 1.27 API servers with 1.27 kubelets, shouldn't this be backported to 1.26?

@mimowo
Copy link
Contributor Author

mimowo commented Apr 20, 2023

I think we should indeed, based on https://kubernetes.io/releases/version-skew-policy/. I will open the cherry-pick PR

@mimowo
Copy link
Contributor Author

mimowo commented Apr 20, 2023

Ready: #117496

k8s-ci-robot added a commit that referenced this pull request Apr 27, 2023
…73-upstream-release-1.26

Automated cherry pick of #117073: Fix: Recreate DaemonSet pods completed with Succeeded phase
@mimowo mimowo deleted the fix-daemonset-pod-restarts branch November 29, 2023 15:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/test cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. kind/regression Categorizes issue or PR as related to a regression from a prior release. lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. release-blocker release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/apps Categorizes an issue or PR as relevant to SIG Apps. sig/testing Categorizes an issue or PR as relevant to SIG Testing. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

daemonset stuck in Completed state after a reboot (with graceful kubelet shutdown)
10 participants