endpoints and endpointslices should not publish IPs for terminal pods #110115

aojea · 2022-05-18T15:10:52Z

Since 1.22, the pod phase lifecycle guarantees that terminal pods, those whose phase is Failed or Succeeded, cannot regress and will have all containers stopped. Hence, terminal pod IPs will never be reachable and should not be published in Endpoints or EndpointSlices, regardless of the TolerateUnready option.

/kind bug

The pod phase lifecycle guarantees that terminal Pods, those whose phase is Failed or Succeeded, cannot regress and will have all containers stopped. Hence, terminal Pods will never be reachable and should not publish their IP addresses in the Endpoints or EndpointSlices, regardless of the Service TolerateUnready option.

Fixes: #109414, #109718
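
The rule in this description can be sketched as a small predicate. This is a minimal illustration, not the code from this PR: the `PodPhase` type and its constants are redeclared locally (mirroring the core v1 phase names) so the snippet compiles without Kubernetes dependencies, and the function name `isPhaseTerminal` is hypothetical.

```go
package main

// PodPhase is a local stand-in for the string-typed pod phase in the core v1
// API. It is redeclared here (an assumption of this sketch) so that the
// example has no external dependencies.
type PodPhase string

const (
	PodPending   PodPhase = "Pending"
	PodRunning   PodPhase = "Running"
	PodSucceeded PodPhase = "Succeeded"
	PodFailed    PodPhase = "Failed"
	PodUnknown   PodPhase = "Unknown"
)

// isPhaseTerminal (hypothetical name) reports whether a phase is terminal,
// that is, one the pod can never leave. Only Failed and Succeeded qualify;
// readiness plays no part in the decision.
func isPhaseTerminal(phase PodPhase) bool {
	return phase == PodFailed || phase == PodSucceeded
}
```

Under this sketch, `isPhaseTerminal(PodFailed)` and `isPhaseTerminal(PodSucceeded)` are true, while Pending, Running, and Unknown are all non-terminal.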

k8s-ci-robot · 2022-05-18T15:11:00Z

@aojea: This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

aojea · 2022-05-18T15:11:55Z

/assign @thockin @robscott @smarterclayton

WIP: adding an e2e test that covers regressions, since this is a considerable change in behavior.

pkg/controller/endpoint/endpoints_controller_test.go

aojea · 2022-05-26T18:02:59Z

addressed comments, please review

pkg/controller/endpoint/endpoints_controller.go

pkg/api/v1/pod/util.go

robscott

This looks good to me other than a few nits, thanks @aojea!

pkg/api/v1/pod/util.go

robscott · 2022-05-26T22:54:59Z

pkg/controller/endpoint/endpoints_controller.go

+		// tolerateUnreadyEndpoints is equal to service.Spec.PublishNotReadyAddresses only; the
+		// difference from the EndpointSlice controller is that the latter may consider terminating
+		// endpoints too. Ref: features.EndpointSliceTerminatingCondition


Instead of this comment, we could just use service.Spec.PublishNotReadyAddresses directly.

pkg/controller/util/endpoint/controller_utils.go

pkg/controller/util/endpoint/controller_utils_test.go

pkg/api/v1/pod/util.go

Pods in phase Succeeded or Failed are guaranteed to have all containers stopped and to never regress.

Terminal pods, whose phase is Failed or Succeeded, are guaranteed to never regress and to be stopped, so their IPs should never be published in the Endpoints.

k8s-ci-robot · 2022-05-27T04:44:19Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: aojea
To complete the pull request process, please ask for approval from smarterclayton after the PR has been reviewed.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

pkg/api/OWNERS
pkg/controller/endpoint/OWNERS
~~pkg/controller/util/endpoint/OWNERS~~ [aojea]
pkg/kubelet/OWNERS
~~test/e2e/network/OWNERS~~ [aojea]
~~test/e2e/node/OWNERS~~ [aojea]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

robscott

I think this is the final round of review for me; other than these last two nits, this is good to go. I want @thockin to take a look, though, to be sure.

/assign @thockin

pkg/api/v1/pod/util.go

robscott · 2022-05-27T05:43:41Z

pkg/controller/util/endpoint/controller_utils.go

@@ -135,9 +135,15 @@ func DeepHashObjectToString(objectToWrite interface{}) string {
 	return hex.EncodeToString(hasher.Sum(nil)[0:])
 }

-// ShouldPodBeInEndpointSlice returns true if a specified pod should be in an EndpointSlice object.
+// ShouldPodBeInEndpointSlice returns true if a specified pod should be in an Endpoint or EndpointSlice object.
 // Terminating pods are only included if includeTerminating is true
 func ShouldPodBeInEndpointSlice(pod *v1.Pod, includeTerminating bool) bool {


Maybe Endpoints is a better term here due to being shorter and more clearly applying to both Endpoints and EndpointSlices (EndpointSlices contain an "Endpoints" list)

Suggested change

func ShouldPodBeInEndpointSlice(pod *v1.Pod, includeTerminating bool) bool {

func ShouldPodBeInEndpoints(pod *v1.Pod, includeTerminating bool) bool {

Agree, but it's minor
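
For context, the shape of the predicate being renamed in this thread might look roughly like the sketch below. It is an illustration under stated assumptions, not the code in this PR: the `pod` struct is a hypothetical, trimmed-down stand-in for `v1.Pod`, the `Terminating` field stands in for a non-nil deletion timestamp, and the "no IP assigned" check is assumed rather than shown in the quoted hunk.

```go
package main

type podPhase string

const (
	phaseRunning   podPhase = "Running"
	phaseSucceeded podPhase = "Succeeded"
	phaseFailed    podPhase = "Failed"
)

// pod is a hypothetical stand-in for v1.Pod carrying only the fields this
// sketch needs.
type pod struct {
	Phase       podPhase
	IPs         []string
	Terminating bool // assumption: stands in for DeletionTimestamp != nil
}

// shouldPodBeInEndpoints sketches the decision discussed above. The same
// answer applies to both Endpoints and EndpointSlices.
func shouldPodBeInEndpoints(p pod, includeTerminating bool) bool {
	// Terminal pods are never published, whatever the restart policy.
	if p.Phase == phaseFailed || p.Phase == phaseSucceeded {
		return false
	}
	// Assumed check: a pod without an IP has nothing to publish.
	if len(p.IPs) == 0 {
		return false
	}
	// Terminating pods are included only when the caller asks for them.
	if p.Terminating && !includeTerminating {
		return false
	}
	return true
}
```

Note that `includeTerminating` only affects pods that are being deleted but are still running; it never brings a Failed or Succeeded pod back into the result.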

robscott · 2022-05-27T15:56:34Z

I can't approve anything more than @aojea already has here, but the bits in the Endpoint(Slice) controllers look right to me. Will defer to @smarterclayton and/or @thockin for the rest.

/lgtm

thockin

Overall I am good with this, just small things

pkg/api/v1/pod/util.go

thockin · 2022-05-27T15:54:09Z

pkg/controller/util/endpoint/controller_utils.go

@@ -135,9 +135,15 @@ func DeepHashObjectToString(objectToWrite interface{}) string {
 	return hex.EncodeToString(hasher.Sum(nil)[0:])
 }

-// ShouldPodBeInEndpointSlice returns true if a specified pod should be in an EndpointSlice object.
+// ShouldPodBeInEndpointSlice returns true if a specified pod should be in an Endpoint or EndpointSlice object.
 // Terminating pods are only included if includeTerminating is true
 func ShouldPodBeInEndpointSlice(pod *v1.Pod, includeTerminating bool) bool {


Agree, but it's minor

thockin · 2022-05-27T15:56:32Z

pkg/controller/util/endpoint/controller_utils.go

@@ -146,14 +149,6 @@ func ShouldPodBeInEndpointSlice(pod *v1.Pod, includeTerminating bool) bool {
 		return false
 	}

-	if pod.Spec.RestartPolicy == v1.RestartPolicyNever {


We've always tried to avoid describing overly-rigid state-machines because they make terrible APIs, but it seems reasonable that we clearly document SOMETHING here. There are non-terminal and terminal phases (states) and we should be clear about it.

thockin · 2022-05-27T16:29:17Z

test/e2e/network/service.go

+		cmd := fmt.Sprintf("/agnhost connect --timeout=3s %s", serviceAddress)
+
+		ginkgo.By(fmt.Sprintf("hitting service %v from pod %v on node %v expected to be refused", serviceAddress, podName, nodeName))
+		expectedErr := "REFUSED"


Is it better to probe the service and get refused, or just to look at the Endpoints & EPSlice resources?

Maybe both? What if we start by looking at Endpoints/EPS resources and if those both look right, continue on to the probe?

For e2e I always try to test the whole system and all the elements involved; we may have a bug in kube-proxy, for example. We can add an integration test for testing endpoints only.

Since we already have an e2e test, can we just use that for both? I think it would be really helpful to know why the e2e test failed if it did. Just seeing the lack of a "REFUSED" response would require further debugging to understand why that happened.

Ahh, got it. You meant "in addition" and I understood "instead of". You are right.
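
The two-stage check agreed on in this thread could be structured along these lines. This is only a sketch under stated assumptions: `checkTerminalPodUnreachable`, `listAddresses`, and `probe` are hypothetical names, the two callbacks stand in for reading the Endpoints/EndpointSlice resources and for running the connect command from a client pod, and `probe` is assumed to return the probe's textual result. None of this is the e2e framework's actual API.

```go
package main

import (
	"fmt"
	"strings"
)

// checkTerminalPodUnreachable (hypothetical helper) first inspects what the
// control plane publishes and only then probes the data plane, so a failure
// says which layer misbehaved.
func checkTerminalPodUnreachable(
	listAddresses func() ([]string, error), // addresses published for the Service
	probe func() (string, error), // textual result of a connection attempt
) error {
	// Stage 1: the controllers must not publish the terminal pod's IP.
	addrs, err := listAddresses()
	if err != nil {
		return fmt.Errorf("listing published addresses: %w", err)
	}
	if len(addrs) > 0 {
		return fmt.Errorf("terminal pod addresses still published: %v", addrs)
	}

	// Stage 2: with nothing published, a connection through the Service
	// should be refused. A different result points at the proxy.
	out, err := probe()
	if err != nil {
		return fmt.Errorf("running probe: %w", err)
	}
	if !strings.Contains(out, "REFUSED") {
		return fmt.Errorf("expected REFUSED from service, got %q", out)
	}
	return nil
}
```

With this ordering, an error mentioning published addresses points at the Endpoints or EndpointSlice controllers, while an unexpected probe result with no published addresses points further down the stack.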

aojea · 2022-05-27T22:05:16Z

superseded by #110255

Thanks Rob for bringing it to the finish line

/close

k8s-ci-robot · 2022-05-27T22:05:31Z

@aojea: Closed this PR.

In response to this:

superseded by #110255

Thanks Rob for bringing it to the finish line

/close


k8s-ci-robot requested review from caseydavenport and MrHohn May 18, 2022 15:11

k8s-ci-robot assigned robscott, smarterclayton and thockin May 18, 2022

aojea commented May 18, 2022

pkg/controller/endpoint/endpoints_controller_test.go (outdated)

aojea changed the title ~~[WIP] endpoints and endpointslices doesn't publish IPs for terminal pods~~ [WIP] endpoints and endpointslices don't publish IPs for terminal pods May 21, 2022

aojea force-pushed the pods_ips_eviction branch 2 times, most recently from 559d5e2 to d4f3cdd Compare May 22, 2022 10:40

aojea marked this pull request as draft May 22, 2022 11:58

aojea force-pushed the pods_ips_eviction branch from d4f3cdd to b20b903 Compare May 22, 2022 12:00

k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels May 22, 2022

aojea marked this pull request as ready for review May 22, 2022 14:07

aojea changed the title ~~[WIP] endpoints and endpointslices don't publish IPs for terminal pods~~ endpoints and endpointslices don't publish IPs for terminal pods May 22, 2022

k8s-ci-robot added area/test and removed do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. labels May 22, 2022

aojea mentioned this pull request May 26, 2022

"Terminated" pod on shutdown node listed in service edpoints. #109718

Closed

aojea force-pushed the pods_ips_eviction branch from b561d15 to 8f48520 Compare May 26, 2022 17:57

robscott reviewed May 26, 2022

pkg/controller/endpoint/endpoints_controller.go (outdated)

aojea force-pushed the pods_ips_eviction branch from 1e8ee5d to dac1f48 Compare May 26, 2022 20:25

swetharepakula reviewed May 26, 2022

pkg/api/v1/pod/util.go

robscott reviewed May 26, 2022

bowei reviewed May 26, 2022

pkg/api/v1/pod/util.go (outdated)

aojea added 5 commits May 27, 2022 06:42

add pod util to verify pod is terminal

d16d23e

Pods in phase Succeeded or Failed are guaranteed to have all containers stopped and to never regress.

endpointslices: terminal pods don't receive endpoints

b905c28

endpoints controller: don't consider terminal endpoints

aa35f6f

Terminal pods, whose phase is Failed or Succeeded, are guaranteed to never regress and to be stopped, so their IPs should never be published in the Endpoints.

e2e test for evicted pods

ffdbce6

e2e: services with evicted pods don't have endpoints

3a8edca

aojea force-pushed the pods_ips_eviction branch from dac1f48 to 3a8edca Compare May 27, 2022 04:43

k8s-ci-robot added the area/kubelet label May 27, 2022

robscott reviewed May 27, 2022

k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 27, 2022

thockin reviewed May 27, 2022

SergeyKanzhelev added this to Triage in SIG Node CI/Test Board May 27, 2022

SergeyKanzhelev added this to Triage in SIG Node PR Triage May 27, 2022

SergeyKanzhelev moved this from Triage to PRs - Needs Approver in SIG Node CI/Test Board May 27, 2022

SergeyKanzhelev moved this from Triage to Needs Approver in SIG Node PR Triage May 27, 2022

robscott mentioned this pull request May 27, 2022

Endpoints and EndpointSlices should not publish IPs for terminal pods #110255

Merged

k8s-ci-robot closed this May 27, 2022

SIG Node CI/Test Board automation moved this from PRs - Needs Approver to Done May 27, 2022

SIG Node PR Triage automation moved this from Needs Approver to Done May 27, 2022
