endpoints and endpointslices should not publish IPs for terminal pods #110115
Conversation
@aojea: This issue is currently awaiting triage. If a SIG or subproject determines this is a relevant issue, they will accept it by applying the `triage/accepted` label. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/assign @thockin @robscott @smarterclayton
WIP to add an e2e test that covers regressions, since this is a considerable change in behavior.
Force-pushed from 559d5e2 to d4f3cdd
Addressed comments, please review.
This looks good to me other than a few nits, thanks @aojea!
// tolerateUnreadyEndpoints is equal to service.Spec.PublishNotReadyAddresses only, the
// the difference with the endpointSlices controller, is that the later may consider terminating
// endpoints too. Ref: features.EndpointSliceTerminatingCondition
Instead of this comment, we could just use service.Spec.PublishNotReadyAddresses directly.
Pods in phase Succeeded or Failed are guaranteed to have all containers stopped and to never regress.
Terminal pods, whose phase is Failed or Succeeded, are guaranteed to never regress and to have all containers stopped, so their IPs should never be published in the Endpoints.
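As a minimal, self-contained sketch of that rule (using a stand-in `PodPhase` type rather than the real `k8s.io/api/core/v1` types, and a hypothetical helper name):

```go
package main

import "fmt"

// PodPhase is a stand-in for the k8s.io/api/core/v1 type so this
// sketch compiles on its own.
type PodPhase string

const (
	PodPending   PodPhase = "Pending"
	PodRunning   PodPhase = "Running"
	PodSucceeded PodPhase = "Succeeded"
	PodFailed    PodPhase = "Failed"
)

// podIsTerminal reports whether a phase is terminal. Terminal pods never
// regress and have all containers stopped, so their IPs must not be
// published in Endpoints or EndpointSlices.
func podIsTerminal(phase PodPhase) bool {
	return phase == PodSucceeded || phase == PodFailed
}

func main() {
	for _, p := range []PodPhase{PodRunning, PodSucceeded, PodFailed} {
		fmt.Printf("%s terminal=%v\n", p, podIsTerminal(p))
	}
}
```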
[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: aojea
The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing `/approve` in a comment.
@@ -135,9 +135,15 @@ func DeepHashObjectToString(objectToWrite interface{}) string {
 	return hex.EncodeToString(hasher.Sum(nil)[0:])
 }

-// ShouldPodBeInEndpointSlice returns true if a specified pod should be in an EndpointSlice object.
+// ShouldPodBeInEndpointSlice returns true if a specified pod should be in an Endpoint or EndpointSlice object.
+// Terminating pods are only included if includeTerminating is true
 func ShouldPodBeInEndpointSlice(pod *v1.Pod, includeTerminating bool) bool {
Maybe `Endpoints` is a better term here due to being shorter and more clearly applying to both Endpoints and EndpointSlices (EndpointSlices contain an "Endpoints" list).
-func ShouldPodBeInEndpointSlice(pod *v1.Pod, includeTerminating bool) bool {
+func ShouldPodBeInEndpoints(pod *v1.Pod, includeTerminating bool) bool {
Agree, but it's minor
I can't approve anything more than @aojea already has here, but the bits in the Endpoint(Slice) controllers look right to me. Will defer to @smarterclayton and/or @thockin for the rest. /lgtm
Overall I am good with this, just small things
Agree, but it's minor
@@ -146,14 +149,6 @@ func ShouldPodBeInEndpointSlice(pod *v1.Pod, includeTerminating bool) bool {
 	return false
 }

-	if pod.Spec.RestartPolicy == v1.RestartPolicyNever {
We've always tried to avoid describing overly-rigid state-machines because they make terrible APIs, but it seems reasonable that we clearly document SOMETHING here. There are non-terminal and terminal phases (states) and we should be clear about it.
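The resulting shape of the check, once the `RestartPolicy` special-casing is replaced by an explicit terminal-phase test, might look roughly like this. This is a self-contained sketch with stand-in types, not the actual helper from the PR:

```go
package main

import "fmt"

// Stand-in types so the sketch is self-contained (the real code uses
// k8s.io/api/core/v1).
type PodPhase string

const (
	PodSucceeded PodPhase = "Succeeded"
	PodFailed    PodPhase = "Failed"
)

type Pod struct {
	Phase             PodPhase
	PodIP             string
	DeletionTimestamp *struct{} // non-nil means the pod is terminating
}

// shouldPodBeInEndpoints sketches the post-change logic: a pod with no IP,
// a pod in a terminal phase, or (unless includeTerminating is set) a pod
// with a deletion timestamp is excluded. The terminal-phase check covers
// the cases the old RestartPolicy branches handled, uniformly.
func shouldPodBeInEndpoints(pod *Pod, includeTerminating bool) bool {
	if pod.PodIP == "" {
		return false
	}
	if pod.Phase == PodSucceeded || pod.Phase == PodFailed {
		return false
	}
	if pod.DeletionTimestamp != nil && !includeTerminating {
		return false
	}
	return true
}

func main() {
	fmt.Println(shouldPodBeInEndpoints(&Pod{Phase: PodFailed, PodIP: "10.0.0.1"}, true))  // false: terminal
	fmt.Println(shouldPodBeInEndpoints(&Pod{Phase: "Running", PodIP: "10.0.0.2"}, false)) // true
}
```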
cmd := fmt.Sprintf("/agnhost connect --timeout=3s %s", serviceAddress)

ginkgo.By(fmt.Sprintf("hitting service %v from pod %v on node %v expected to be refused", serviceAddress, podName, nodeName))
expectedErr := "REFUSED"
Is it better to probe the service and get refused, or just to look at the Endpoints & EPSlice resources?
Maybe both? What if we start by looking at Endpoints/EPS resources and if those both look right, continue on to the probe?
For e2e I always try to test the whole system and all the elements involved, since we may have a bug in kube-proxy. We can add an integration test for testing endpoints only.
Since we already have an e2e test, can we just use that for both? I think it would be really helpful to know why the e2e test failed if it did. Just seeing the lack of a "REFUSED" response would require further debugging to understand why that happened.
Ah, got it. You mean in addition, and I understood "instead of". You are right.
Superseded by #110255. Thanks Rob for bringing it to the finish line. /close
@aojea: Closed this PR. In response to this:
Since 1.22 the pod phase lifecycle guarantees that terminal pods, those whose phase is Failed or Succeeded, cannot regress and will have all containers stopped. Hence, terminal pod IPs will never be reachable and should not be published in the Endpoints or EndpointSlices, independently of the TolerateUnready option.
/kind bug
Fixes: #109414, #109718