Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Give terminal phase correctly to all pods that will not be restarted #115331

Merged

Conversation

mimowo
Copy link
Contributor

@mimowo mimowo commented Jan 26, 2023

What type of PR is this?

/kind feature

What this PR does / why we need it:

This is needed to make sure all pods are eventually transitioned to a terminal phase (Failed or Succeeded). Currently, running pods with RestartPolicy equals OnFailure or Always are not transitioned to a terminal phase.
The case for OnFailure is problematic for jobs:

Which issue(s) this PR fixes:

Special notes for your reviewer:

This PR extends the scope from the PR to transition not only Pending pods, but also Running.

Does this PR introduce a user-facing change?

Give terminal phase correctly to all pods that will not be restarted. 

In particular, assign Failed phase to pods which are deleted while pending. Also, assign a terminal 
phase (Succeeded or Failed, depending on the exit statuses of the pod containers) to pods which
are deleted while running.

This fixes the issue for jobs using pod failure policy (with JobPodFailurePolicy and PodDisruptionConditions 
feature gates enabled) that their pods could get stuck in the pending phase when deleted.

ACTION REQUIRED: Users who maintain controllers which relied on the fact that pods with RestartPolicy=Always
never enter the Succeeded phase may need to adapt their controllers. This is because as a consequence of 
the change pods which use RestartPolicy=Always may end up in the Succeeded phase in two scenarios: pod 
deletion and graceful node shutdown.

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

- [KEP]: https://github.com/kubernetes/enhancements/tree/master/keps/sig-apps/3329-retriable-and-non-retriable-failures 

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. kind/feature Categorizes issue or PR as related to a new feature. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. area/kubelet sig/node Categorizes an issue or PR as relevant to SIG Node. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Jan 26, 2023
@mimowo
Copy link
Contributor Author

mimowo commented Jan 26, 2023

/retest

@mimowo mimowo force-pushed the kubelet-fail-pending-deleted-pods branch 5 times, most recently from a92f690 to 7b6e430 Compare January 27, 2023 12:07
@bart0sh bart0sh added this to WIP in SIG Node PR Triage Jan 27, 2023
@mimowo mimowo force-pushed the kubelet-fail-pending-deleted-pods branch 4 times, most recently from 649b111 to c4593d8 Compare February 8, 2023 07:48
@mimowo
Copy link
Contributor Author

mimowo commented Feb 8, 2023

/retest

@mimowo
Copy link
Contributor Author

mimowo commented Feb 13, 2023

/test pull-kubernetes-node-crio-cgrpv2-e2e-kubetest2

@mimowo mimowo force-pushed the kubelet-fail-pending-deleted-pods branch 2 times, most recently from c6c606f to 5f9af1f Compare February 24, 2023 10:32
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Feb 24, 2023
@k8s-ci-robot k8s-ci-robot added release-note-action-required Denotes a PR that introduces potentially breaking changes that require user action. and removed release-note Denotes a PR that will be considered when it comes time to generate release notes. labels Apr 4, 2023
@Tusenka
Copy link

Tusenka commented Jun 10, 2023

Approved.
Follow you

1 similar comment
@Tusenka
Copy link

Tusenka commented Jun 10, 2023

Approved.
Follow you

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/e2e-test-framework Issues or PRs related to refactoring the kubernetes e2e test framework area/kubelet area/test cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. release-note-action-required Denotes a PR that introduces potentially breaking changes that require user action. sig/cloud-provider Categorizes an issue or PR as relevant to SIG Cloud Provider. sig/node Categorizes an issue or PR as relevant to SIG Node. sig/testing Categorizes an issue or PR as relevant to SIG Testing. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on. wg/batch Categorizes an issue or PR as relevant to WG Batch.
Development

Successfully merging this pull request may close these issues.

Transition running deleted pods to a terminal phase