About the order of the parameters to be reported to the ccp agent on Report function #27

Open
yangxiaomaomao opened this issue Mar 23, 2022 · 4 comments


@yangxiaomaomao

yangxiaomaomao commented Mar 23, 2022

I am just getting started with ccp. To verify that my setup works, I copied the example code from your guide page, like this:

class AIMD(portus.AlgBase):

    def datapath_programs(self):
        return {
                "default" : """\
                (def (Report
                    (volatile acked 0) 
                    (volatile sacked 0) 
                    (volatile loss 0) 
                    (volatile timeout false)
                    (volatile rtt 0)
                    (volatile inflight 0)
                    (volatile accum_lost 0)   #LOOK HERE
                ))
                (when true
                    (:= Report.inflight Flow.packets_in_flight)
                    (:= Report.rtt Flow.rtt_sample_us)
                    (:= Report.acked  Ack.bytes_acked)
                    (:= Report.sacked (+ Report.sacked Ack.packets_misordered))
                    (:= Report.loss Ack.lost_pkts_sample)
                    (:= Report.timeout Flow.was_timeout)
                    (:= Report.accum_lost (+ Report.accum_lost Ack.lost_pkts_sample))   #LOOK HERE
                    (fallthrough)
                )
                (when (|| Report.timeout (> Report.loss 0))
                    (report)
                    (:= Micros 0)
                )
                (when (> Micros Flow.rtt_sample_us)
                    (report)
                    (:= Micros 0)  
                )
            """
        }

    def new_flow(self, datapath, datapath_info):
        return AIMDFlow(datapath, datapath_info)

As shown above, I added a field named accum_lost to the report. Then, in the class AIMDFlow, I use print(r.accum_lost) to see its value, as follows:

def on_report(self, r):
        
        if r.loss > 0 or r.sacked > 0:
            self.cwnd /= 2
        else:
            self.cwnd += (self.datapath_info.mss * (r.acked / self.cwnd))
        
        self.cwnd = max(self.cwnd, self.init_cwnd)
        print(r.accum_lost)
        
        self.datapath.update_field("Cwnd", int(self.cwnd))

Then I ran alg.py and tested with iperf3. Here is the error I encountered:
[image]
Then I swapped the order of two fields (inflight and accum_lost), leaving everything else unchanged, as shown below. This avoided the error and the code ran, but I don't know why. Could this be a bug in the project?

(def (Report
    (volatile acked 0) 
    (volatile sacked 0) 
    (volatile loss 0) 
    (volatile timeout false)
    (volatile rtt 0)
    (volatile accum_lost 0)   #LOOK HERE and compare the parameter order with the above code
    (volatile inflight 0)
))

Waiting for your reply!

@akshayknarayan
Member

Hi, thanks for the report.
Assuming you are using ccp-kernel, is there any output from ccp in /var/log/syslog? Does this occur if you re-load the kernel module before re-running alg.py? Are the values of the other measurement variables correct once you re-order their initialization?

@yangxiaomaomao
Author

The following picture shows what was recorded in /var/log/syslog when the code above failed to run. Maybe it looks OK?
[image]

I reloaded the ccp kernel module (ran the unload and load scripts) as you suggested, and it works; the same code could not run before reloading. Then I swapped the order again to test, and it failed again just as I first described. Finally, I reloaded the module once more and it worked again. (By the way, the values of the other measurement variables seem correct.) Environment: Ubuntu 20.04, kernel version 5.13.0-35-generic.

@akshayknarayan
Member

cc @fcangialosi

Thanks for the info. I believe that this is a bug - I can reproduce your issue, explain it, and give a workaround. I can also suggest, but do not currently have time to implement, a more permanent fix. I would be happy to merge a PR addressing this.

Explanation

The design of ccp changed ~1 year ago from datapath-centric to algorithm-runtime-centric. This change was made to help support the integration of ccp into the mvfst datapath. Before the change, the startup flow was like this:

  1. datapath is running, with ccp support
  2. ccp algorithm runtime (e.g., your alg.py) starts and sends a message to the datapath, which contains the datapath programs to install.
  3. when a new flow starts, it uses the installed datapath program to operate, communicate with the algorithm runtime, etc.

The new way is:

  1. ccp algorithm runtime starts.
  2. the datapath starts and sends a "ready" message to the algorithm runtime. In response, the algorithm runtime sends the "install" message back with any datapath programs.
  3. when a new flow starts, everything is as before.

So now, this is what happens when a datapath has just started:

➜ sudo RUST_LOG=info ./bin/python aimd.py
Mar 24 15:38:52.052  INFO pyportus: starting CCP ipc="netlink"
Mar 24 15:38:52.052  INFO portus::run: starting CCP ipc="netlink"
Mar 24 15:38:58.072  INFO portus::run: found new datapath, installing programs addr=()

Then, when we change the datapath program as you tried to do and restart the algorithm runtime while the same datapath keeps running:

➜ sudo RUST_LOG=info ./bin/python aimd.py
Mar 24 15:33:16.502  INFO pyportus: starting CCP ipc="netlink"
Mar 24 15:33:16.502  INFO portus::run: starting CCP ipc="netlink"
Traceback (most recent call last):
  File "aimd.py", line 20, in on_report
    print(f"acked {r.acked} rtt {r.rtt} inflight {r.inflight} foo {r.foo}")
Exception: Failed to get inflight: portus err: the requested field is in scope but was not found in the report

The datapath does not send a "ready" message, because it is the same datapath that was running before. The ccp algorithm runtime does not install new datapath programs, because it does not receive a "ready" message (installing programs logic is here). So when a new flow arrives, it uses the datapath program from the previous ccp algorithm runtime, which in this case has many of the same fields but of course not the new one, and this is why the new field is not present in the report.
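
To make the sequence above concrete, here is a toy model of the handshake in plain Python. It is only a sketch: `ToyDatapath`, `ToyRuntime`, and their methods are hypothetical names invented for illustration and are not part of portus or ccp-kernel, and the real report is not a Python dict. It shows only the ordering problem: a datapath that has already announced itself never triggers a second install, so a new flow keeps using the old program.

```python
# Toy model of the "ready" / "install" handshake. All names are hypothetical;
# this is not the portus API, only an illustration of the ordering issue.

class ToyDatapath:
    def __init__(self):
        self.installed_fields = []   # Report fields of the installed program
        self.announced = False       # whether "ready" was already sent

    def poll_ready(self):
        """Return True exactly once, the first time after the datapath starts."""
        if self.announced:
            return False
        self.announced = True
        return True

    def install(self, report_fields):
        self.installed_fields = list(report_fields)

    def new_flow_report(self):
        # A new flow reports only the fields of the installed program.
        return {name: 0 for name in self.installed_fields}


class ToyRuntime:
    def __init__(self, report_fields):
        self.report_fields = report_fields

    def start(self, datapath):
        # Programs are installed only in response to a "ready" message.
        if datapath.poll_ready():
            datapath.install(self.report_fields)
            return "installed"
        return "no ready message, nothing installed"


datapath = ToyDatapath()  # stands in for a freshly loaded kernel module

first_result = ToyRuntime(["acked", "rtt", "inflight"]).start(datapath)
print(first_result)   # installed

# Restart the runtime with a changed program; the datapath is still running.
changed = ToyRuntime(["acked", "rtt", "inflight", "accum_lost"])
second_result = changed.start(datapath)
print(second_result)  # no ready message, nothing installed

stale_report = datapath.new_flow_report()
print("accum_lost" in stale_report)   # False

# The equivalent of reloading the kernel module: a fresh datapath says "ready".
datapath = ToyDatapath()
changed.start(datapath)
fresh_report = datapath.new_flow_report()
print("accum_lost" in fresh_report)   # True
```

In this model the second runtime silently runs against the first runtime's program, which mirrors the "field ... was not found in the report" error in the traceback above.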

Workaround + Suggested Fixes

The easiest workaround for now is to reload the kernel module when the datapath program changes.
A better fix to this problem might be to do initialization when either the alg. runtime or the datapath starts. Another way might be hashing the datapath programs to determine whether the one the ccp algorithm wants to use is available in the datapath, and installing it if not.
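
As a rough illustration of the hashing idea, the sketch below computes a stable digest over a set of datapath programs and compares it with the digest of whatever is currently installed. The helper names `program_digest` and `needs_install` are hypothetical, and it assumes the datapath could somehow report the digest of its installed programs, which ccp does not do today as far as this thread describes.

```python
import hashlib


def program_digest(programs):
    """Stable SHA-256 digest of a {name: source} mapping of datapath programs."""
    h = hashlib.sha256()
    for name in sorted(programs):          # sort so dict order does not matter
        h.update(name.encode("utf-8"))
        h.update(b"\0")                    # separator between name and source
        h.update(programs[name].encode("utf-8"))
        h.update(b"\0")
    return h.hexdigest()


def needs_install(installed_digest, programs):
    """True if the programs the runtime wants differ from the installed ones."""
    return installed_digest != program_digest(programs)


old = {"default": "(def (Report (volatile acked 0)))"}
new = {"default": "(def (Report (volatile acked 0) (volatile accum_lost 0)))"}

installed = program_digest(old)       # assumed to be reported by the datapath
print(needs_install(installed, old))  # False
print(needs_install(installed, new))  # True
```

With a check like this, the runtime could send the "install" message whenever the digests differ, even if no "ready" message arrived.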

Hope this is helpful.

@yangxiaomaomao
Author

That helps a lot. Thank you very much for the suggestions and the detailed explanation. I really appreciate your ccp project. Thanks again!
