Pod Customizations

This page documents functionality beyond standard Kubernetes (K8s) that can be used with virtual node pods.

Table of Contents

| Pod Annotation | Short Summary | Doc Link |
| --- | --- | --- |
| microsoft.containerinstance.virtualnode.ccepolicy | Run in Confidential ACI with provided policy | Confidential Containers |
| microsoft.containerinstance.virtualnode.subnets.primary | Run within a specific Subnet | Subnet Override |
| microsoft.containerinstance.virtualnode.identity | Run using a provided Azure Identity | Managed Identity |
| microsoft.containerinstance.virtualnode.injectkubeproxy | Controlling Kube-Proxy Usage | Kube-Proxy |
| microsoft.containerinstance.virtualnode.injectdns | Controlling K8s DNS Usage | K8s DNS |
| microsoft.containerinstance.virtualnode.zones | Requesting Azure Zone Deployment | Zones |
| microsoft.containerinstance.virtualnode.imagecachepod | Image caching request for Standby Pools | Image Caching |

| virtual node Downlevel API | Short Summary | Doc Link |
| --- | --- | --- |
| ===VIRTUALNODE2.CC.THIM.ENDPOINT=== | Replaced with THIM Endpoint | THIM Downlevel APIs |
| ===VIRTUALNODE2.CC.THIM.ADDRESS=== | Replaced with THIM Address | THIM Downlevel APIs |

Controlling Behaviors through Pod Annotations

The general method for controlling non-K8s behavior of virtual nodes at the pod level is via pod annotations.

GENERAL NOTE: All of the annotations below need to be applied to the part of the K8s resource that ends up on the pods themselves. For a Pod YAML file, this is the top-level metadata of the file. For a Deployment, ReplicaSet, or similar workload YAML, the annotation goes in the pod template's metadata (spec.template.metadata), not in the metadata of the workload itself.

Example of annotations for Pod YAML (it's in the main metadata!)

apiVersion: v1
kind: Pod
metadata:
  annotations:   
    microsoft.containerinstance.virtualnode.injectdns: "false"
  name: demo-pod
spec:
  containers:
  - command:
    - /bin/bash
    - -c
    - 'counter=1; while true; do echo "Hello, World! Counter: $counter"; counter=$((counter+1)); sleep 1; done'
    image: mcr.microsoft.com/azure-cli
    name: hello-world-counter
    resources:
      limits:
        cpu: 2250m
        memory: 2256Mi
      requests:
        cpu: 100m
        memory: 128Mi
  nodeSelector:
    virtualization: virtualnode2
  tolerations:
  - effect: NoSchedule
    key: virtual-kubelet.io/provider
    operator: Exists

Example of annotations for Deployment YAML (it's in the template metadata!)

apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
  labels:
    type: scaletest
  name: deploy-alpine
spec:
  replicas: 3
  selector:
    matchLabels:
      type: scaletest
  template:
    metadata:
      annotations:
        microsoft.containerinstance.virtualnode.injectkubeproxy: 'false'
      labels:
        type: scaletest
    spec:
      containers:
      - image: mcr.microsoft.com/oss/nginx/nginx:1.17.3-alpine
        name: mypod
        resources:
          limits:
            cpu: 2250m
            memory: 2256Mi
          requests:
            cpu: 100m
            memory: 128Mi
      nodeSelector:
        virtualization: virtualnode2
      tolerations:
      - effect: NoSchedule
        key: virtual-kubelet.io/provider
        operator: Exists
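
To check that an annotation actually landed on a pod rather than only on its parent resource, you can read it back with kubectl. The helper below is a minimal sketch: `pod_annotation` is a hypothetical function name, and it assumes kubectl is already configured for your cluster. Dots in the annotation key are escaped because kubectl's JSONPath would otherwise treat them as path separators.

```shell
# Print the value of one annotation on a pod.
# $1 = pod name, $2 = annotation key
pod_annotation() {
  # Escape every "." in the key so JSONPath reads it as a single field name.
  escaped_key=$(printf '%s' "$2" | sed 's/\./\\./g')
  kubectl get pod "$1" -o "jsonpath={.metadata.annotations.${escaped_key}}"
}

# Example (requires a running cluster):
#   pod_annotation demo-pod microsoft.containerinstance.virtualnode.injectdns
```

For pods created by a Deployment, pass one of the generated pod names; an empty result suggests the annotation was placed on the Deployment's own metadata instead of the template.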

Confidential Containers

Confidential containers are a high-security offering from ACI that gives customers a high degree of confidence in what they are running and what that image is allowed to do.

Overview of Confidential Containers on ACI

To have virtual node create your containers as Confidential, you must add a pod annotation containing the CCE policy that the pod will run under:

microsoft.containerinstance.virtualnode.ccepolicy

In order to generate that policy, utilize the ConfCom extension which can be added into Az CLI. To add it, run:

az extension add -n confcom

Using that tool for virtual nodes is simple: provide your YAML file with the --virtual-node-yaml parameter like so:

az confcom acipolicygen --virtual-node-yaml <yourYamlFile>.yaml

This not only generates the CCE policy, it also injects the policy annotation into the right section of the file.

Example Confidential YAML

apiVersion: v1
kind: Pod
metadata:
  annotations:
    microsoft.containerinstance.virtualnode.ccepolicy: package policy

import future.keywords.every
import future.keywords.in

api_version := "0.10.0"
framework_version := "0.2.3"

fragments := [
  {
    "feed": "mcr.microsoft.com/aci/aci-cc-infra-fragment",
    "includes": [
      "containers",
      "fragments"
    ],
    "issuer": "did:x509:0:sha256:I__iuL25oXEVFdTP_aBLx_eT1RPHbCQ_ECBQfYZpt9s::eku:1.3.6.1.4.1.311.76.59.1.3",
    "minimum_svn": "1"
  }
]

containers := [{"allow_elevated":false,"allow_stdio_access":true,"capabilities":{"ambient":[],"bounding":["CAP_AUDIT_WRITE","CAP_CHOWN","CAP_DAC_OVERRIDE","CAP_FOWNER","CAP_FSETID","CAP_KILL","CAP_MKNOD","CAP_NET_BIND_SERVICE","CAP_NET_RAW","CAP_SETFCAP","CAP_SETGID","CAP_SETPCAP","CAP_SETUID","CAP_SYS_CHROOT"],"effective":["CAP_AUDIT_WRITE","CAP_CHOWN","CAP_DAC_OVERRIDE","CAP_FOWNER","CAP_FSETID","CAP_KILL","CAP_MKNOD","CAP_NET_BIND_SERVICE","CAP_NET_RAW","CAP_SETFCAP","CAP_SETGID","CAP_SETPCAP","CAP_SETUID","CAP_SYS_CHROOT"],"inheritable":[],"permitted":["CAP_AUDIT_WRITE","CAP_CHOWN","CAP_DAC_OVERRIDE","CAP_FOWNER","CAP_FSETID","CAP_KILL","CAP_MKNOD","CAP_NET_BIND_SERVICE","CAP_NET_RAW","CAP_SETFCAP","CAP_SETGID","CAP_SETPCAP","CAP_SETUID","CAP_SYS_CHROOT"]},"command":["nginx","-g","daemon off;"],"env_rules":[{"pattern":"PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin","required":false,"strategy":"string"},{"pattern":"NGINX_VERSION=1.17.3","required":false,"strategy":"string"},{"pattern":"NJS_VERSION=0.3.5","required":false,"strategy":"string"},{"pattern":"PKG_RELEASE=1","required":false,"strategy":"string"},{"pattern":"TERM=xterm","required":false,"strategy":"string"},{"pattern":"(?i)(FABRIC)_.+=.+","required":false,"strategy":"re2"},{"pattern":"HOSTNAME=.+","required":false,"strategy":"re2"},{"pattern":"T(E)?MP=.+","required":false,"strategy":"re2"},{"pattern":"FabricPackageFileName=.+","required":false,"strategy":"re2"},{"pattern":"HostedServiceName=.+","required":false,"strategy":"re2"},{"pattern":"IDENTITY_API_VERSION=.+","required":false,"strategy":"re2"},{"pattern":"IDENTITY_HEADER=.+","required":false,"strategy":"re2"},{"pattern":"IDENTITY_SERVER_THUMBPRINT=.+","required":false,"strategy":"re2"},{"pattern":"azurecontainerinstance_restarted_by=.+","required":false,"strategy":"re2"},{"pattern":"[A-Z0-9_]+_SERVICE_HOST=.+","required":false,"strategy":"re2"},{"pattern":"[A-Z0-9_]+_SERVICE_PORT=.+","required":false,"strategy":"re2"},{"pattern":"[A-Z0-9_]+_SERVICE_PORT_[A-Z0-9_]+=.+","required":false,"strategy":"re2"},{"pattern":"[A-Z0-9_]+_PORT=.+","required":false,"strategy":"re2"},{"pattern":"[A-Z0-9_]+_PORT_[0-9]+_TCP=.+","required":false,"strategy":"re2"},{"pattern":"[A-Z0-9_]+_PORT_[0-9]+_TCP_PROTO=.+","required":false,"strategy":"re2"},{"pattern":"[A-Z0-9_]+_PORT_[0-9]+_TCP_PORT=.+","required":false,"strategy":"re2"},{"pattern":"[A-Z0-9_]+_PORT_[0-9]+_TCP_ADDR=.+","required":false,"strategy":"re2"}],"exec_processes":[{"command":["/bin/sh"],"signals":[]},{"command":["/bin/bash"],"signals":[]}],"id":"mcr.microsoft.com/oss/nginx/nginx:1.17.3-alpine","layers":["7f062c5ebb3dc6d3df7f25fa687bbec0f61530536267ad6d6afa32501f5340a6","297dd26b51191f85928508fb368e6b064502c128be6f51fc5cb302d3b253d730"],"mounts":[{"destination":"/var/run/secrets/kubernetes.io/serviceaccount","options":["rbind","rshared","ro"],"source":"sandbox:///tmp/atlas/emptydir/.+","type":"bind"},{"destination":"/etc/hosts","options":["rbind","rshared","rw"],"source":"sandbox:///tmp/atlas/emptydir/.+","type":"bind"},{"destination":"/dev/termination-log","options":["rbind","rshared","rw"],"source":"sandbox:///tmp/atlas/emptydir/.+","type":"bind"},{"destination":"/etc/hostname","options":["rbind","rshared","rw"],"source":"sandbox:///tmp/atlas/emptydir/.+","type":"bind"},{"destination":"/etc/resolv.conf","options":["rbind","rshared","rw"],"source":"sandbox:///tmp/atlas/emptydir/.+","type":"bind"}],"name":"mypod","no_new_privileges":false,"seccomp_profile_sha256":"","signals":[15],"user":{"group_idnames":[{"pattern":"","strategy":"any"}],"umask":"0022","user_idname":{"pattern":"","strategy":"any"}},"working_dir":"/"},{"allow_elevated":false,"allow_stdio_access":true,"capabilities":{"ambient":[],"bounding":["CAP_CHOWN","CAP_DAC_OVERRIDE","CAP_FSETID","CAP_FOWNER","CAP_MKNOD","CAP_NET_RAW","CAP_SETGID","CAP_SETUID","CAP_SETFCAP","CAP_SETPCAP","CAP_NET_BIND_SERVICE","CAP_SYS_CHROOT","CAP_KILL","CAP_AUDIT_WRITE"],"effective":["CAP_CHOWN","CAP_DAC_OVERRIDE","CAP_FSETID","CAP_FOWNER","CAP_MKNOD","CAP_NET_RAW","CAP_SETGID","CAP_SETUID","CAP_SETFCAP","CAP_SETPCAP","CAP_NET_BIND_SERVICE","CAP_SYS_CHROOT","CAP_KILL","CAP_AUDIT_WRITE"],"inheritable":[],"permitted":["CAP_CHOWN","CAP_DAC_OVERRIDE","CAP_FSETID","CAP_FOWNER","CAP_MKNOD","CAP_NET_RAW","CAP_SETGID","CAP_SETUID","CAP_SETFCAP","CAP_SETPCAP","CAP_NET_BIND_SERVICE","CAP_SYS_CHROOT","CAP_KILL","CAP_AUDIT_WRITE"]},"command":["/pause"],"env_rules":[{"pattern":"PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin","required":true,"strategy":"string"},{"pattern":"TERM=xterm","required":false,"strategy":"string"}],"exec_processes":[],"layers":["16b514057a06ad665f92c02863aca074fd5976c755d26bff16365299169e8415"],"mounts":[],"no_new_privileges":false,"seccomp_profile_sha256":"","signals":[],"user":{"group_idnames":[{"pattern":"","strategy":"any"}],"umask":"0022","user_idname":{"pattern":"","strategy":"any"}},"working_dir":"/"}]

allow_properties_access := true
allow_dump_stacks := true
allow_runtime_logging := true
allow_environment_variable_dropping := true
allow_unencrypted_scratch := false
allow_capability_dropping := true

mount_device := data.framework.mount_device
unmount_device := data.framework.unmount_device
mount_overlay := data.framework.mount_overlay
unmount_overlay := data.framework.unmount_overlay
create_container := data.framework.create_container
exec_in_container := data.framework.exec_in_container
exec_external := data.framework.exec_external
shutdown_container := data.framework.shutdown_container
signal_container_process := data.framework.signal_container_process
plan9_mount := data.framework.plan9_mount
plan9_unmount := data.framework.plan9_unmount
get_properties := data.framework.get_properties
dump_stacks := data.framework.dump_stacks
runtime_logging := data.framework.runtime_logging
load_fragment := data.framework.load_fragment
scratch_mount := data.framework.scratch_mount
scratch_unmount := data.framework.scratch_unmount

reason := {"errors": data.framework.errors}
  name: confidential-alpine
spec:
  containers:
  - image: mcr.microsoft.com/oss/nginx/nginx:1.17.3-alpine
    name: mypod
    resources:
      limits:
        cpu: 2250m
        memory: 2256Mi
      requests:
        cpu: 100m
        memory: 128Mi
  nodeSelector:
    virtualization: virtualnode2
  tolerations:
  - effect: NoSchedule
    key: virtual-kubelet.io/provider
    operator: Exists

Confidential Policy with Helm dynamic values

When managing K8s deployments where multiple containers need to be aligned with dynamic setups, some customers use variables in the YAML and have Helm replace them with real values at deployment time. This works well for more complex deployments, but Confidential Policies are designed to reject configurations that they do not recognize.

However, a process very similar to the policy generation above lets you keep the benefits of both Helm's dynamic chart capabilities and Confidential's security mechanisms.

Instead of deploying directly from Helm with something like helm install, use the helm template command, which outputs a static YAML file with all dynamic values resolved. Running the policy generation command from the section above on that YAML updates it with the appropriate Confidential Policy, and you can then deploy it via kubectl. The static YAML generation and Confidential Policy generation steps must be re-run whenever the Helm charts are updated so that the changes are picked up.
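
The three steps described above (render, generate policy, deploy) could be scripted roughly as follows. This is a sketch under stated assumptions: `deploy_confidential_chart` is a hypothetical function name, and the release name, chart path, and output file are placeholders you supply. It only uses `helm template`, the `az confcom acipolicygen --virtual-node-yaml` command shown earlier, and `kubectl apply -f`.

```shell
# Render a Helm chart to static YAML, inject a CCE policy, then deploy it.
# $1 = release name, $2 = chart path, $3 = output YAML file
deploy_confidential_chart() {
  # 1. Resolve all template values into a static manifest.
  helm template "$1" "$2" > "$3" || return 1
  # 2. Generate the policy and inject the annotation into the manifest.
  az confcom acipolicygen --virtual-node-yaml "$3" || return 1
  # 3. Deploy the annotated manifest with kubectl instead of helm install.
  kubectl apply -f "$3"
}

# Example (requires helm, the az confcom extension, and cluster access):
#   deploy_confidential_chart my-release ./my-chart rendered.yaml
```

Re-running the function after every chart change keeps the policy in step with the rendered manifest, which matches the re-run requirement noted above.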

Allow All Confidential Policy

When testing or developing containers before their functionality is locked in, it is often useful to run with a very permissive policy. The most permissive policy is shown below. It provides effectively NO security guarantees: a container can be run with any payload and debug execution is allowed, although it still runs inside the specialized confidential hardware with the attestation services running.

This should NOT be used for any production workloads, just as a tool for initial experimentation.

"microsoft.containerinstance.virtualnode.ccepolicy":"cGFja2FnZSBwb2xpY3kKCmFwaV9zdm4gOj0gIjAuMTAuMCIKCm1vdW50X2RldmljZSA6PSB7ImFsbG93ZWQiOiB0cnVlfQptb3VudF9vdmVybGF5IDo9IHsiYWxsb3dlZCI6IHRydWV9CmNyZWF0ZV9jb250YWluZXIgOj0geyJhbGxvd2VkIjogdHJ1ZSwgImVudl9saXN0IjogbnVsbCwgImFsbG93X3N0ZGlvX2FjY2VzcyI6IHRydWV9CnVubW91bnRfZGV2aWNlIDo9IHsiYWxsb3dlZCI6IHRydWV9IAp1bm1vdW50X292ZXJsYXkgOj0geyJhbGxvd2VkIjogdHJ1ZX0KZXhlY19pbl9jb250YWluZXIgOj0geyJhbGxvd2VkIjogdHJ1ZSwgImVudl9saXN0IjogbnVsbH0KZXhlY19leHRlcm5hbCA6PSB7ImFsbG93ZWQiOiB0cnVlLCAiZW52X2xpc3QiOiBudWxsLCAiYWxsb3dfc3RkaW9fYWNjZXNzIjogdHJ1ZX0Kc2h1dGRvd25fY29udGFpbmVyIDo9IHsiYWxsb3dlZCI6IHRydWV9CnNpZ25hbF9jb250YWluZXJfcHJvY2VzcyA6PSB7ImFsbG93ZWQiOiB0cnVlfQpwbGFuOV9tb3VudCA6PSB7ImFsbG93ZWQiOiB0cnVlfQpwbGFuOV91bm1vdW50IDo9IHsiYWxsb3dlZCI6IHRydWV9CmdldF9wcm9wZXJ0aWVzIDo9IHsiYWxsb3dlZCI6IHRydWV9CmR1bXBfc3RhY2tzIDo9IHsiYWxsb3dlZCI6IHRydWV9CnJ1bnRpbWVfbG9nZ2luZyA6PSB7ImFsbG93ZWQiOiB0cnVlfQpsb2FkX2ZyYWdtZW50IDo9IHsiYWxsb3dlZCI6IHRydWV9CnNjcmF0Y2hfbW91bnQgOj0geyJhbGxvd2VkIjogdHJ1ZX0Kc2NyYXRjaF91bm1vdW50IDo9IHsiYWxsb3dlZCI6IHRydWV9Cg=="

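The allow-all annotation value in this section is base64 text; its first 20 characters, `cGFja2FnZSBwb2xpY3kK`, decode to the Rego header line `package policy`. If you want to read a policy before trusting it, a minimal sketch like the one below can help. It assumes a `base64` utility that accepts `-d` (GNU coreutils and BusyBox do), and both function names are hypothetical.

```shell
# Decode a base64 policy string ($1) into readable Rego text.
decode_policy() {
  printf '%s' "$1" | base64 -d
}

# Encode Rego text read from stdin as single-line base64.
encode_policy() {
  base64 | tr -d '\n'
}

# Decoding only the first 20 characters of the allow-all value:
decode_policy 'cGFja2FnZSBwb2xpY3kK'   # prints: package policy
```

Passing the full annotation value to `decode_policy` prints the whole policy. `encode_policy` is included only to show the reverse direction; generating real policies remains the job of az confcom.
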
Debug Mode

In order to slightly loosen the policy for a Pod to allow certain types of debugging activities like allowing an exec session to shell into the pod with sh or bash, you can generate a policy using the --debug-mode arg:

az confcom acipolicygen -k <yourYamlFile>.yaml --debug-mode
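
With a debug-mode policy in place, an exec session of the kind described above could be opened as sketched below. `debug_shell` is a hypothetical helper name, and the pod name in the example comes from the Confidential YAML earlier on this page.

```shell
# Open an interactive shell in a pod whose policy allows exec.
# $1 = pod name, $2 = shell path (defaults to /bin/sh)
debug_shell() {
  kubectl exec -it "$1" -- "${2:-/bin/sh}"
}

# Example (requires a running Confidential pod with a debug-mode policy):
#   debug_shell confidential-alpine /bin/bash
```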

Using virtual nodes with Multiple Subnets

By default, virtual node pods run in the subnet configured in the Helm chart as the default ACI subnet. However, some customers may want to run pods in their own isolated subnets (or in a subnet shared with only a specific set of other pods). This can be achieved using the subnet override annotation:

microsoft.containerinstance.virtualnode.subnets.primary

Example: microsoft.containerinstance.virtualnode.subnets.primary: /subscriptions/000000-0000-0000-053ca49ab4b5/resourceGroups/definitely_a_fake_RG/providers/Microsoft.Network/virtualNetworks/the_VNET_For_This_Subnet/subnets/your_subnet_name
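
Rather than typing the subnet resource ID by hand, you could look it up with the Azure CLI and print a ready-to-paste annotation line. This is a sketch: `subnet_override_annotation` is a hypothetical function name, the three arguments are placeholders for your own resource names, and it assumes you are logged in with `az login`.

```shell
# Print the subnet override annotation line for an existing subnet.
# $1 = resource group, $2 = virtual network name, $3 = subnet name
subnet_override_annotation() {
  # Ask Azure for the full resource ID of the subnet.
  subnet_id=$(az network vnet subnet show \
    --resource-group "$1" --vnet-name "$2" --name "$3" \
    --query id --output tsv) || return 1
  printf 'microsoft.containerinstance.virtualnode.subnets.primary: %s\n' "$subnet_id"
}

# Example (requires Azure CLI access to the subscription):
#   subnet_override_annotation my-rg my-vnet my-subnet
```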

Running pods with an Azure Managed Identity

For some Azure interactions it can be very convenient (and a good security practice) to utilize Azure Managed Identities to make the requests, rather than having your code deal with the unpleasantness of rotating credentials. virtual node can hook up to Azure Container Instances functionality for running containers with a Managed Identity via a pod annotation:

microsoft.containerinstance.virtualnode.identity

Example: microsoft.containerinstance.virtualnode.identity: /subscriptions/000000-0000-0000-053ca49ab4b5/resourceGroups/definitely_a_fake_RG/providers/Microsoft.ManagedIdentity/userAssignedIdentities/my_MI_name
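
A typo in the identity resource ID is easy to make, so a quick shape check before deploying can save a round trip. The sketch below assumes the identity is user-assigned and follows the path layout of the example above; `looks_like_identity_id` is a hypothetical helper, and its match is case-sensitive, which is an assumption of this sketch rather than a documented requirement.

```shell
# Return success if $1 has the shape of a user-assigned identity resource ID.
looks_like_identity_id() {
  case "$1" in
    /subscriptions/*/resourceGroups/*/providers/Microsoft.ManagedIdentity/userAssignedIdentities/?*)
      return 0 ;;
    *)
      return 1 ;;
  esac
}

# Example:
#   looks_like_identity_id "$MY_IDENTITY_ID" || echo "identity ID looks malformed" >&2
```

This only checks the layout of the string; it does not confirm that the identity exists or that the pod is permitted to use it.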

Disable Kube-Proxy

The Kube-Proxy is a standard K8s component that provides benefits like modifying local IP route tables for K8s internal network usage. However, if you do not require this functionality (or explicitly don't want it), the kube-proxy can be disabled for the virtual node pods via this annotation:

microsoft.containerinstance.virtualnode.injectkubeproxy: "false"

K8s includes the Kube-Proxy by default, so that is the behavior when the annotation is not provided.

Confidential containers do not support Kube-Proxy usage because it breaks some security guarantees. Regardless of the value provided for this annotation, a Confidential pod will ignore it and load without a Kube-Proxy.

Disable K8s DNS Injection

By default, K8s Pods are expected to utilize the K8s cluster's DNS. If you want to avoid that interaction, you can add this annotation:

microsoft.containerinstance.virtualnode.injectdns: "false"

If provided as false, ACI's default DNS will be used by this pod instead of K8s.

Zones

Azure has a concept of Availability Zones, which are separated groups of datacenters that exist within the same region. If your scenario calls for it, you can specify a zone for your pod to be hosted on within your given region.

microsoft.containerinstance.virtualnode.zones: "<semi-colon delimited string of zones>"

NOTE: Today, ACI only supports providing a single zone as part of the request to allocate a sandbox for your pod. If you provide multiple, you should get an informative error effectively saying you can only provide one.

If a default zone is specified in the node level configuration and this pod annotation is also set, the pod annotation takes precedence. To make a particular pod use no zone when a node level zone is set, set the pod level annotation to an empty string.
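
Because the value is semicolon-delimited but only one zone is accepted today, a pre-flight check can catch a multi-zone value before the request reaches ACI. This is a local sketch only: `check_zones_value` is a hypothetical helper and does not reproduce the service's own validation or error text.

```shell
# Check a zones annotation value before deploying.
# $1 = annotation value; an empty string means "no zone".
check_zones_value() {
  case "$1" in
    *';'*)
      # A semicolon means more than one zone was listed.
      echo "zones: only a single zone is supported today, got '$1'" >&2
      return 1 ;;
  esac
  return 0
}

# Example:
#   check_zones_value "1"     # accepted
#   check_zones_value ""      # accepted: pod opts out of the node level zone
#   check_zones_value "1;2"   # rejected with a message on stderr
```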

virtual node Downlevel APIs

virtual node has a couple of downlevel APIs which don't behave quite like K8s downlevel APIs. They work as follows: if the VALUE of an ENV var on a pod is exactly equal to one of the virtual node Downlevel APIs, it is replaced server side with the appropriate "real" value.

THIM Downlevel APIs

THIM (Trusted Hardware Identity Management) is part of the attestation service used for Confidential ACI. In order to avoid hardcoding the address to interact with the attestation service, customers can instead set an environment variable to either of the below and then use the value of that in their container to access THIM:

===VIRTUALNODE2.CC.THIM.ENDPOINT===, which will be replaced by something like http://169.254.128.1:2377/metadata/THIM/amd/certification

===VIRTUALNODE2.CC.THIM.ADDRESS===, which will be replaced by something like 169.254.128.1:2377
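
The exact-match rule can be illustrated with a small local simulation. This is not the service's code: `resolve_downlevel_value` is a hypothetical function, and the replacement values are the "something like" examples from this page, which may differ in a real deployment.

```shell
# Local simulation of the substitution rule; not the actual service code.
# $1 = the env var value as written in the pod spec
resolve_downlevel_value() {
  case "$1" in
    '===VIRTUALNODE2.CC.THIM.ENDPOINT===')
      printf '%s\n' 'http://169.254.128.1:2377/metadata/THIM/amd/certification' ;;
    '===VIRTUALNODE2.CC.THIM.ADDRESS===')
      printf '%s\n' '169.254.128.1:2377' ;;
    *)
      # Anything that is not an exact match is left untouched.
      printf '%s\n' "$1" ;;
  esac
}

resolve_downlevel_value '===VIRTUALNODE2.CC.THIM.ADDRESS==='
# prints: 169.254.128.1:2377
resolve_downlevel_value 'http://===VIRTUALNODE2.CC.THIM.ADDRESS===/x'
# prints the input unchanged, because the token is only part of the value
```

The practical consequence is that the token must be the entire env var value; build any longer URL inside the container from the substituted value.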

Example Pod YAML using the THIM Downlevel APIs:

apiVersion: v1
kind: Pod
metadata:
  annotations:
    microsoft.containerinstance.virtualnode.injectkubeproxy: 'false'
  name: thim-downlevel
spec:
  containers:
  - command:
    - /bin/bash
    - -c
    - 'counter=1; while true; do echo "Hello, World! Counter: $counter"; counter=$((counter+1)); sleep 1; done'
    image: mcr.microsoft.com/azure-cli
    name: managed-identity-container
    env: 
    - name: THIM_ENDPOINT
      value: ===VIRTUALNODE2.CC.THIM.ENDPOINT===
    - name: whateverNameYouWant
      value: ===VIRTUALNODE2.CC.THIM.ADDRESS===
    resources:
      limits:
        cpu: 2250m
        memory: 2256Mi
      requests:
        cpu: 100m
        memory: 128Mi
  nodeSelector:
    type: virtual-kubelet
    virtualization: virtualnode2
  tolerations:
  - effect: NoSchedule
    key: virtual-kubelet.io/provider
    operator: Exists

Assuming you are running a Confidential pod with an image that includes curl, you can then run something like this inside the container to get the THIM attestation:

curl -H "Metadata: true" "$THIM_ENDPOINT"
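
If the env var value was not an exact match for the token, it would reach the container unreplaced. A guarded wrapper like the sketch below fails early in that case instead of sending a request to a placeholder string. `fetch_thim` is a hypothetical helper, and the `THIM_ENDPOINT` variable name is taken from the example pod above.

```shell
# Fetch THIM data, refusing to run if the placeholder was never replaced.
fetch_thim() {
  case "${THIM_ENDPOINT:-}" in
    ''|'==='*)
      echo "THIM_ENDPOINT is unset or still a placeholder" >&2
      return 1 ;;
  esac
  curl --silent --show-error --fail -H "Metadata: true" "$THIM_ENDPOINT"
}

# Example (run inside the Confidential pod):
#   fetch_thim > thim-response.json
```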