Allow interception of port-forwarded traffic by making the proxy target the pod IP #231

Merged: 9 commits merged into main from the upstream-ip branch on Jul 13, 2023

Conversation

@roobre roobre commented Jun 26, 2023

Description

Currently, the disruptor does not support disrupting local traffic, such as that generated through kubectl port-forward.

The main reason for this is that the agent running in the target pods only redirects traffic coming from the eth0 interface, which is the expected path for traffic coming from other pods or from outside the cluster.

If we try to address this limitation by redirecting traffic from the lo interface, we would also be re-redirecting the egress traffic from the agent back to itself.

This PR tackles the problem of differentiating the proxy's traffic from port-forwarded traffic without using the network interface to identify the source of the traffic. Instead, the proxy differentiates traffic using the source and destination IPs, and uses the pod's IP as the target for the redirected traffic.

This is an alternative fix for #214.
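
A minimal sketch of the resulting redirection, with hypothetical values; these are not the actual rules built in pkg/iptables/iptables.go, and the pod IP, ports, and exact matches are illustrative assumptions:

package main

import "fmt"

func main() {
	// All values below are illustrative assumptions, not taken from the PR.
	podIP := "10.244.0.12" // target pod IP, discovered at injection time
	appPort := 80          // port the disrupted application listens on
	proxyPort := 8080      // port the disruptor proxy listens on

	rules := []string{
		// Traffic from other pods or from outside the cluster traverses PREROUTING.
		fmt.Sprintf("-t nat -A PREROUTING -p tcp --dport %d -j REDIRECT --to-port %d",
			appPort, proxyPort),
		// Locally delivered traffic (e.g. kubectl port-forward) only traverses OUTPUT.
		// Excluding the pod IP as destination keeps the proxy's own upstream
		// connections (which now target the pod IP) from being redirected back to it.
		fmt.Sprintf("-t nat -A OUTPUT -p tcp ! -d %s --dport %d -j REDIRECT --to-port %d",
			podIP, appPort, proxyPort),
	}

	for _, rule := range rules {
		fmt.Println("iptables " + rule)
	}
}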

Checklist:

  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have added tests that prove my fix is effective or that my feature works.
  • I have run linter locally (make lint) and all checks pass.
  • I have run tests locally (make test) and all tests pass.
  • I have run relevant e2e test locally (make e2e-xxx for agent, disruptors, kubernetes or cluster related changes)
  • Any dependent changes have been merged and published in downstream modules

@roobre roobre changed the title from "Upstream ip" to "Allow interception of port-forwarded traffic by making the proxy target the pod IP" on Jun 26, 2023
@roobre roobre force-pushed the upstream-ip branch 3 times, most recently from 9e9b52a to 5586ff2, on June 26, 2023 16:59
@roobre roobre force-pushed the upstream-ip branch 4 times, most recently from 2dd1d3d to a9b0805, on July 6, 2023 10:25

roobre commented Jul 6, 2023

Still pending some more manual testing, but e2e tests pass. It should be ready for a first review.
The changeset is on the medium-large side, so I'd suggest reviewing commit-by-commit, as the commits should be pretty much self-contained.

@roobre roobre marked this pull request as ready for review July 6, 2023 10:26
@roobre roobre requested a review from pablochacin July 6, 2023 10:26

pablochacin commented Jul 6, 2023

@roobre I would update the description of the PR. I don't think that explaining the differences with the previous PR adds value and I found it distracting.

@pablochacin (Collaborator) commented:

@roobre Another general comment. I've seen you change the type of ports from uint to string, in both parameters and structs. Ports are expected to be uints. Changing the type removes domain information and I don't see how it benefits the code, beyond maybe saving a type conversion. Could you please elaborate on the rationale for this change?


roobre commented Jul 6, 2023

Changing the type removes domain information and I don't see how it benefits the code, beyond maybe saving a type conversion. Could you please elaborate on the rationale for this change?

I didn't see the point of storing something as an integer when it is used as a string everywhere. Moreover, I think ports are better modeled as strings rather than numbers, as applications should tolerate using IANA-defined port names (although this is rarely done in practice).

That being said, I agree the change is not very relevant for this PR, so I have reverted it.

@pablochacin (Collaborator) replied:

Moreover, I think ports are better modeled as strings rather than numbers, as applications should tolerate using IANA-defined port names (although this is rarely done in practice).

Even if this were valid in a general sense (I won't get into discussing this), I don't see how it is relevant to the use cases of the disruptor and, in particular, of the agent.

I think keeping code aligned to the use cases and not over-generalizing it decreases the cognitive load needed for understanding it.
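
For illustration only (hypothetical helper names, not code from this PR), the trade-off being discussed is roughly where the string conversion lives:

package main

import (
	"fmt"
	"strconv"
)

// redirectArgsUint keeps the port as a uint, preserving the 0-65535 domain,
// and converts to a string only where one is needed.
func redirectArgsUint(port uint) []string {
	return []string{"--to-port", strconv.FormatUint(uint64(port), 10)}
}

// redirectArgsString models the port as a string, saving the conversion
// but accepting arbitrary values.
func redirectArgsString(port string) []string {
	return []string{"--to-port", port}
}

func main() {
	fmt.Println(redirectArgsUint(8080))
	fmt.Println(redirectArgsString("8080"))
}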

@pablochacin pablochacin marked this pull request as draft July 7, 2023 09:26

roobre commented Jul 7, 2023

Manual testing done, everything seems to be working pretty well. The only thing I noticed is #231 (comment); I'll give it a go, as it should be fairly easy to restore that behavior.

Script used for manual testing:

import http from 'k6/http';
import { PodDisruptor } from 'k6/x/disruptor';
import { check } from 'k6';

export const options = {
    scenarios: {
        load: {
            executor: 'constant-arrival-rate',
            rate: 100,
            preAllocatedVUs: 10,
            maxVUs: 100,
            exec: "default",
            startTime: "10s", // Delay load for a few seconds to give disruptor time to start.
            duration: "20s",
        },
        disrupt: {
            executor: 'shared-iterations',
            iterations: 1,
            vus: 1,
            exec: "disrupt",
            startTime: "0s",
        }
    }
}

export default function(data) {
    const resp = http.get(`http://${__ENV.SVC_IP}/`);
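    // With the 0.5 errorRate injected in disrupt(), roughly half of the
    // requests should satisfy each of these two checks while the fault is active.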
    check(resp.status, {
        'Injection succeeds': (status) => status === 418,
        'Injection passes through': (status) => status === 200
    })
}

export function disrupt(data) {
    if (__ENV.SKIP_FAULTS == "1") {
        return
    }

    const selector = {
        namespace: "default",
        select: {
            labels: {
                app: "nginx"
            }
        }
    }
    const podDisruptor = new PodDisruptor(selector, {
        injectTimeout: "5s"
    })

    if (podDisruptor.targets().length === 0) {
        throw new Error("Disruptor has no targets")
    }

    // delay traffic from one random replica of the deployment
    const fault = {
        averageDelay: "500ms",
        errorCode: 418,
        errorRate: 0.5
    }

    podDisruptor.injectHTTPFaults(fault, "30s")
}
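
(The script reads the target address from the SVC_IP environment variable and skips fault injection when SKIP_FAULTS is set to "1"; it assumes a k6 binary built with the xk6-disruptor extension, e.g. run with k6 run -e SVC_IP=<service-ip> script.js.)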

@roobre roobre marked this pull request as ready for review July 7, 2023 14:24

@pablochacin pablochacin left a comment

Looking good. Some minor comments and a couple of questions. I still have to run tests locally.

Review comments were left on e2e/disruptors/service_e2e_test.go, e2e/disruptors/pod_e2e_test.go, e2e/agent/agent_e2e_test.go, pkg/disruptors/pod_test.go, and pkg/iptables/iptables.go.

roobre commented Jul 11, 2023

I think we're past the point where fixup! commits are useful, as the original commits are out of my terminal's viewport. I'll manually squash/fixup commits once the PR gets approved.

}
})

t.Run("via port-forward", func(t *testing.T) {
@pablochacin (Collaborator) commented:

I think we are repeating here the same problem as when the tests were in the disruptor e2e test: putting them as a sub-test of the fault injection instead of as an independent test. However, this is easy to change in a follow-up PR, so I will not ask to change it here.

@roobre (Collaborator, Author) replied:

My main goal was to reuse the setup, as it is largely identical, but I don't have a strong opinion on it. We can split and copy-paste if that's preferred.
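
For reference, a rough sketch of the two layouts being discussed; the test names are hypothetical and not taken from the e2e suite:

package e2e

import "testing"

// Subtest style: the fault-injection setup is shared, and the port-forward
// case runs as a nested t.Run inside it.
func TestHTTPFaultInjection(t *testing.T) {
	// ... shared setup: cluster, namespace, target pods, disruptor ...

	t.Run("via service", func(t *testing.T) { /* ... */ })
	t.Run("via port-forward", func(t *testing.T) { /* ... */ })
}

// Independent-test style: the setup is duplicated or factored into a helper,
// but each case can be run, skipped, and reported on its own.
func TestPortForwardFaultInjection(t *testing.T) {
	// ... duplicated or helper-based setup ...
}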

pablochacin previously approved these changes Jul 12, 2023

@pablochacin pablochacin left a comment

In general, LGTM. Two minor comments.

@pablochacin pablochacin dismissed their stale review July 12, 2023 15:43

Code needs to be squashed before accepting PR


@pablochacin pablochacin left a comment

Please squash commits to clean history.

roobre and others added 9 commits July 13, 2023 11:46
iptables: redirect using two different chains and REDIRECT
Additionally, expect transparent address without cidr suffix, which is hardcoded to 32.
It is not possible to use a single rule as local traffic will only flow through OUTPUT, while external traffic flows through PREROUTING.

iptables: aggregate errors with homemade logic instead of errors.Join
cmd/grpc: fix wrong "http://" prefix

cmds: remove now-redundant nolint: dupl
Signed-off-by: Pablo Chacin <pablochacin@gmail.com>
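
As background for the "aggregate errors with homemade logic instead of errors.Join" commit: errors.Join only exists since Go 1.20, which is a common reason to hand-roll the aggregation (the commit's actual motivation and implementation are not shown here). A purely illustrative sketch:

package main

import (
	"errors"
	"fmt"
	"strings"
)

// joinErrors is an illustrative stand-in for errors.Join: it drops nil
// errors and concatenates the remaining messages.
func joinErrors(errs ...error) error {
	msgs := make([]string, 0, len(errs))
	for _, err := range errs {
		if err != nil {
			msgs = append(msgs, err.Error())
		}
	}
	if len(msgs) == 0 {
		return nil
	}
	return errors.New(strings.Join(msgs, "; "))
}

func main() {
	fmt.Println(joinErrors(nil, errors.New("first failure"), errors.New("second failure")))
}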

roobre commented Jul 13, 2023

I've squashed the commits into the original set, which I think makes sense, as those commits are pretty disjoint.

@roobre roobre requested a review from pablochacin July 13, 2023 09:51

@pablochacin pablochacin left a comment

LGTM.

@pablochacin pablochacin merged commit 43917d2 into main Jul 13, 2023
6 checks passed
@pablochacin pablochacin deleted the upstream-ip branch July 13, 2023 13:37