This repository was archived by the owner on May 6, 2020. It is now read-only.

CI: Call the ksm throttler installation #923

Open: wants to merge 2 commits into base: master

chavafg (Contributor) commented Feb 22, 2018:

We need to call install_ksm_throttler.sh from
the main CC setup script.

Fixes: #922.

Signed-off-by: Salvador Fuentes <[email protected]>

@@ -45,6 +45,9 @@ bash -f ${cidir}/install_shim.sh
echo "Install proxy"
bash -f ${cidir}/install_proxy.sh

echo "Install ksm throttler"
bash -f ${cidir}/install_ksm_throttler.sh
Review comment (Contributor):

With the chmod on install_ksm_throttler.sh in this commit, do you still need the bash -f?
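For context on that question: bash -f is not only a way to pick the interpreter. The -f option turns on noglob (the same as set -f), so pathname expansion is disabled inside the script. Once the script is executable and has a bash shebang, calling it directly runs it with default options, so globbing is back on. A minimal sketch of the difference, using a throwaway script in a temporary directory (show_glob.sh is made up for illustration and is not part of the CI scripts):

```bash
#!/usr/bin/env bash
# Compare "bash -f script" with running the same script via its shebang.
# show_glob.sh is a hypothetical demo script, not part of the CI repo.
set -euo pipefail

demo_dir=$(mktemp -d)
trap 'rm -rf "$demo_dir"' EXIT

# The demo script prints whatever the pattern *.txt expands to.
cat > "$demo_dir/show_glob.sh" <<'EOF'
#!/usr/bin/env bash
cd "$(dirname "$0")"
echo *.txt
EOF
touch "$demo_dir/a.txt"

# -f (noglob): the pattern is left unexpanded, so this prints "*.txt".
bash -f "$demo_dir/show_glob.sh"

# Executable bit + shebang: default options, so this prints "a.txt".
chmod +x "$demo_dir/show_glob.sh"
"$demo_dir/show_glob.sh"
```

So switching from bash -f to direct execution is only a no-op if install_ksm_throttler.sh does not rely on globbing being disabled.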

grahamwhaley (Contributor) left a review:

Fundamentally lgtm.
I'll keep an eye on the metrics CI to see whether the change takes effect...

clearcontainersbot: kubernetes qa-failed 👎

grahamwhaley (Contributor) commented:

CIs failing:

14:48:26      GOBUILD  virtcontainers
14:48:29 make: Circular ksm-throttler.service.in <- ksm-throttler.service.in dependency dropped.
14:48:29      GEN      ksm-throttler.service
14:48:29      GEN      vc-throttler.service
14:48:29      INSTALL  install
14:48:30      INSTALL  install
14:48:30      INSTALL  install
14:48:30      INSTALL  install
14:48:30 ~/jenkins_slave/workspace/clear-containers-tests-fedora-26-PR/go/src/github.com/clearcontainers/tests
14:48:30 Failed to enable unit: The name org.freedesktop.PolicyKit1 was not provided by any .service files
14:48:30 Build step 'Execute shell' marked build as failure
14:48:30 Performing Post build task...
14:48:30 Could not match :Build was aborted  : False
14:48:30 Match found for :Build step 'Execute shell' marked build as failure : True

chavafg force-pushed the topic/call-ksm-throttler-install branch from 6f3401e to 5ea0bae on February 22, 2018 15:12
chavafg (Contributor, Author) commented Feb 22, 2018:

I think sudo was missing :(
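The PolicyKit error in the log above is what systemctl typically reports when a non-root user asks it to change unit state and no polkit service is available to authorize the request; running the call through sudo avoids that authorization step. A minimal sketch of the privileged calls, assuming the install script enables and starts the ksm-throttler.service and vc-throttler.service units generated in that log (the function name and the SYSTEMCTL dry-run override are hypothetical):

```bash
#!/usr/bin/env bash
# Enable and start the throttler units with root privileges.
# SYSTEMCTL is a hypothetical override (e.g. SYSTEMCTL=echo for a dry run);
# by default the commands go through "sudo systemctl".
set -euo pipefail

SYSTEMCTL=${SYSTEMCTL:-sudo systemctl}

enable_throttler_units() {
    local unit
    for unit in ksm-throttler.service vc-throttler.service; do
        # Unprivileged systemctl would ask polkit to authorize this, which
        # fails on hosts where PolicyKit is not installed.
        $SYSTEMCTL enable "$unit"
        $SYSTEMCTL start "$unit"
    done
}
```

With SYSTEMCTL=echo, enable_throttler_units prints the four commands instead of running them, which is handy for checking the sequence on a developer machine.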

clearcontainersbot: kubernetes qa-passed 👍

chavafg (Contributor, Author) commented Feb 22, 2018:

I see a network issue in the metrics CI job... relaunching.

clearcontainersbot: kubernetes qa-passed 👍

chavafg (Contributor, Author) commented Feb 22, 2018:

I see that metrics now passed on job https://clearlinux.org/cc-ci/job/clear-containers-tests-16.04-PR/68/console, but the status was not updated.

grahamwhaley (Contributor) commented:

Yeah, something looks like it did not update. I see you might have kicked another build as well?

Anyway, looking at that log, the KSM handling may not have done quite what we expected. Look at the summary part of the log, where we get something like:

+------+---------------------------------+------------+------------+------------+---------+-------+-----------+---------+
| P/F  |              NAME               |   FLOOR    |    MEAN    |  CEILING   |   GAP   | ITERS |    SD     |   COV   |
+------+---------------------------------+------------+------------+------------+---------+-------+-----------+---------+
| Pass | memory-footprint-ksm            |  40000.000 |  99468.240 | 200000.000 | 400.0 % |     2 | 34068.400 | 34.25 % |
+------+---------------------------------+------------+------------+------------+---------+-------+-----------+---------+

If the KSM enabling/disabling is working for the metrics CI, then we should only get a '1' in the ITERS column for memory-footprint-ksm. The fact that we have a '2' means that when it did the run to measure without KSM enabled, it found KSM was enabled (and hence added that result to this row as well).
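One way to confirm the KSM state before the run that is supposed to measure without KSM is to read the kernel's KSM control file, /sys/kernel/mm/ksm/run, where 0 means ksmd is stopped, 1 means it is running, and 2 means stop and unmerge all merged pages. A minimal sketch; the function name and the KSM_RUN_FILE override are hypothetical and not part of the metrics scripts:

```bash
#!/usr/bin/env bash
# Succeed only if KSM is currently merging pages, based on sysfs.
# KSM_RUN_FILE is a hypothetical override so the check can be tested.
set -euo pipefail

KSM_RUN_FILE=${KSM_RUN_FILE:-/sys/kernel/mm/ksm/run}

ksm_is_running() {
    local state
    state=$(cat "$KSM_RUN_FILE")
    # 1 = ksmd running; 0 = stopped; 2 = stopped and all pages unmerged.
    [ "$state" = "1" ]
}
```

A guard such as "if ksm_is_running; then echo 'KSM still on' >&2; fi" before the non-KSM footprint test would make this situation visible in the log instead of silently adding a second iteration to the KSM row.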

Let's see what happens in the next run, and if that still does not work, I'll go and look at the logs and on the machine itself.

grahamwhaley (Contributor) commented:

@chavafg hmm, yeah, maybe there is something not quite right with the Jenkins rebuild integration... If we look at the first build (Jenkins#67), we see the status updates pushed to GitHub like this:

07:12:19 Setting status of 5ea0bae5a6a4cf0d93e8f6c9688b3f28841dbd40 to PENDING with url https://clearlinux.org/cc-ci/job/clear-containers-tests-16.04-PR/67/ and message: 'Build running'
...
07:17:09 Setting status of 5ea0bae5a6a4cf0d93e8f6c9688b3f28841dbd40 to FAILURE with url https://clearlinux.org/cc-ci/job/clear-containers-tests-16.04-PR/67/ and message: 'Build finished. '

but in the log from the rebuild (Jenkins#68), those lines just don't appear, hence no update on this page... odd.

So, I'd like to kick off a rebuild, but before I do that, I'll have a look at why we still get 2 iterations for the memory-footprint-ksm test, which indicates the fix is not working...

grahamwhaley (Contributor) commented:

Ah, @chavafg, maybe this is the clue in the logs to why KSM is still active:

07:13:59 Warning: cc-proxy.service changed on disk. Run 'systemctl daemon-reload' to reload units.

This is probably because we have not landed clearcontainers/proxy#177 yet (as we are sort of racing PRs...). Maybe we need an addition similar to:

sudo systemctl daemon-reload
sudo systemctl stop cc-proxy || true
sudo systemctl disable cc-proxy || true

I guess in reality we should do a daemon-reload after every install of systemd units, iyswim?
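Folding the reload into the install step could look roughly like this. install_unit is a hypothetical helper, and the /lib/systemd/system destination plus the INSTALL, UNIT_DIR and SYSTEMCTL overrides are assumptions for illustration, not what the CI scripts actually do:

```bash
#!/usr/bin/env bash
# Install one systemd unit file, then make systemd re-read its unit files.
# INSTALL, UNIT_DIR and SYSTEMCTL are hypothetical overrides for dry runs.
set -euo pipefail

INSTALL=${INSTALL:-sudo install}
UNIT_DIR=${UNIT_DIR:-/lib/systemd/system}
SYSTEMCTL=${SYSTEMCTL:-sudo systemctl}

install_unit() {
    local src=$1
    $INSTALL -m 0644 "$src" "$UNIT_DIR/$(basename "$src")"
    # Without this, a unit that already existed keeps its old definition
    # loaded and systemd warns that the file "changed on disk".
    $SYSTEMCTL daemon-reload
}
```

Reloading once per unit is slightly wasteful when several units are installed together; calling daemon-reload once after the last copy works just as well.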

chavafg force-pushed the topic/call-ksm-throttler-install branch from 5ea0bae to be29bd2 on February 23, 2018 16:17
chavafg (Contributor, Author) commented Feb 23, 2018:

Ohhh right, as the systemd files already existed on the system, we need to add the daemon-reload. Updated the PR; let's see how it goes.

clearcontainersbot: kubernetes qa-passed 👍

grahamwhaley (Contributor) commented:

Ho hum, we failed the metrics again. Well, the failure is actually due to an unrelated test (storage linear read came out lower than we expected). The real issue is that we still get two KSM tests; you can find two of:

===== starting test [memory footprint ksm] =====

in the log, when there should be one of those and one:

===== starting test [memory footprint] =====
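Counting those markers in a saved copy of the console log takes one grep per marker. A sketch, assuming the log has been downloaded to a local file (count_footprint_runs is a hypothetical name); a healthy run would print ksm=1 plain=1:

```bash
#!/usr/bin/env bash
# Count how often each memory footprint test started in a console log.
# grep -F matches the brackets literally; -c prints the matching line count.
set -euo pipefail

count_footprint_runs() {
    local log=$1 ksm plain
    # grep -c still prints 0 when nothing matches, but exits 1, hence "|| true".
    ksm=$(grep -cF '===== starting test [memory footprint ksm] =====' "$log" || true)
    plain=$(grep -cF '===== starting test [memory footprint] =====' "$log" || true)
    echo "ksm=$ksm plain=$plain"
}
```

The second pattern includes the closing "] =====", so it does not also match the KSM marker.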

I'll have to look at that again next week. mumble...

grahamwhaley (Contributor) commented:

Oh, I think we also got an odd CRI-O-related failure on F26:

16:42:38 # rm: cannot remove '/tmp/tmp.DoOJFfh8Ne/crio/overlay': Device or resource busy
16:43:09 ok 16 ctr execsync std{out,err}
16:48:09 Build timed out (after 5 minutes). Marking the build as aborted.

chavafg (Contributor, Author) commented Feb 23, 2018:

Yep, odd CRI-O failure. I restarted the job, but I'll keep an eye on the CRI-O tests to see if the failure is reproducible.

chavafg (Contributor, Author) commented Feb 23, 2018:

Adding the dnm (do not merge) label until we know why KSM is still active.

grahamwhaley (Contributor) commented:

Sorry for the delay here @chavafg

I had a look on the CI system, and I see:

$ ps -ef | fgrep proxy
root     50267     1  0 Feb27 ?        00:00:00 /usr/libexec/clear-containers/cc-proxy
# systemctl status cc-proxy
● cc-proxy.service - Clear Containers Proxy
   Loaded: loaded (/lib/systemd/system/cc-proxy.service; disabled; vendor preset: enabled)
   Active: active (running) since Tue 2018-02-27 15:00:35 PST; 16h ago
     Docs: https://github.com/clearcontainers/proxy
 Main PID: 50267 (cc-proxy)
    Tasks: 6
   Memory: 7.2M
      CPU: 41ms
   CGroup: /system.slice/cc-proxy.service
           └─50267 /usr/libexec/clear-containers/cc-proxy

Feb 27 15:00:35 myserver systemd[1]: Started Clear Containers Proxy.
Feb 27 15:00:35 myserver cc-proxy[50267]: starting in system mode

Reading the systemctl manual, it suggests that a disable does not also stop the unit unless you add --now, or you do the stop afterwards. We have:

sudo systemctl stop cc-proxy || true
sudo systemctl disable cc-proxy || true

So, maybe we want to add a --now to the disable, and/or swap those lines over?

Having said that, the manual talks about units, and (I'm no systemd expert) it is not clear to me whether this is a unit or a service, and/or whether they are really that different.
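On the unit vs service question: a .service file defines one kind of systemd unit (sockets and timers are other kinds), so what the manual says about units applies to cc-proxy. Both steps can then be done in one call, since disable --now also stops the unit. A sketch of the teardown written that way, with an is-active check afterwards; retire_unit and the SYSTEMCTL dry-run override are hypothetical:

```bash
#!/usr/bin/env bash
# Reload unit files, then disable and stop a stale unit in one step.
# SYSTEMCTL is a hypothetical override; by default it is "sudo systemctl".
set -euo pipefail

SYSTEMCTL=${SYSTEMCTL:-sudo systemctl}

retire_unit() {
    local unit=$1
    $SYSTEMCTL daemon-reload
    # "disable --now" disables the unit and also stops it.
    $SYSTEMCTL disable --now "$unit" || true
    # is-active exits non-zero when the unit is not running.
    if $SYSTEMCTL is-active --quiet "$unit"; then
        echo "warning: $unit is still active" >&2
        return 1
    fi
}
```

Usage would be retire_unit cc-proxy.service; the final check turns a still-running proxy, like the one in the ps output above, into a visible failure rather than a skewed metrics run.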

If we want to expedite, I can do a trivial quick fix here by just removing the cc-proxy systemd files that are dangling on the metrics CI servers. What do you think? (/cc @jodh-intel for any thoughts.)

We need to call install_ksm_throttler.sh from
the main CC setup script.

Fixes: clearcontainers#922.

Signed-off-by: Salvador Fuentes <[email protected]>
chavafg force-pushed the topic/call-ksm-throttler-install branch from be29bd2 to c132ec9 on March 14, 2018 13:42
chavafg (Contributor, Author) commented Mar 14, 2018:

Hi @grahamwhaley

I swapped the systemctl stop and systemctl disable lines. Let's see how it goes.

Run systemctl daemon-reload, in case the service
files already existed and were installed again.
Also, we need to start systemd services with sudo;
otherwise, it will fail.

Signed-off-by: Salvador Fuentes <[email protected]>
chavafg force-pushed the topic/call-ksm-throttler-install branch from c132ec9 to a302fab on March 14, 2018 14:33
chavafg (Contributor, Author) commented Mar 14, 2018:

Well, we are still getting two iterations on the KSM test:

07:57:38 | Pass | memory-footprint-ksm            |  40000.000 | 155059.355 | 200000.000 | 400.0 % |     2 | 78059.265 | 50.34 % |
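Pulling the ITERS value out of such a row, rather than eyeballing the table, is a small awk filter. This assumes the pipe-delimited layout of the summary table in this thread: splitting on "|", the test name is field 3 and ITERS is field 8, with any timestamp prefix ending up in field 1 (test_iters is a hypothetical name):

```bash
#!/usr/bin/env bash
# Print the ITERS column for a named test from a metrics summary table.
# Assumes name is field 3 and ITERS is field 8 when splitting on "|".
set -euo pipefail

test_iters() {
    local name=$1 log=$2
    awk -F'|' -v name="$name" '
        { n = $3; gsub(/ /, "", n) }
        n == name { it = $8; gsub(/ /, "", it); print it }
    ' "$log"
}
```

Header and separator rows never match because their third field is "NAME" or empty, so only the result row for the requested test is printed.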

mcastelino pushed a commit to mcastelino/tests that referenced this pull request Jan 23, 2019
We enable full debug in the CI install scripts, but that
costs us about 0.25s in boot time. Do not enable the debug
if we are doing a METRICS_CI build/run.

Fixes: clearcontainers#923

Signed-off-by: Graham Whaley <[email protected]>

3 participants