Overview
In this part 2, let’s inspect our existing k8s setup in more detail, especially from the IP address allocation point of view.
Target topology
The following diagram is similar to the one in part 1, but with the actual subnets and container IPs added.
Internet
+
| +-------------+
+---------------+ | vmx gateway |
| Internet GW | | |
+---------------+ +-------------+
192.168.1.1 | | 192.168.1.22
| |
+---+-----------------+--------------+------------------+-------------------------------------------------------------------+--------+
| | |
192.168.1.19 192.168.1.18 192.168.1.142
| | |
| | |
+----+----+ +------------------------------------------------------------------------------------------------+ +----+-------+
|Contrail | | | | | Test PC |
|Control | | vrouter | | |
+---------+ | | | +------------+
| | openstack net1 100.64.1.0/24 |
| +-------+----------+------------------------------------------------------------------+ |
| | | |
| | | |
| 100.64.1.23 ubuntu-4 k8s node 100.64.1.24 ubuntu-3 k8s node |
| +-----+------------------------------+ +-----+------------------------------+ |
| | | | | | | |
| | | 10.201.0.192/26 | | | 10.201.0.128/26 | |
| | +-+----------------+--------+ | | +-+----------------+--------+ | |
| | | | | | | | | |
| | | .196 | | | | .130 | | |
| | | +-----------------+ | | | +-----------------+ | |
| | | | | | | | | | | |
| | | | container 11 | | | | | container 21 | | |
| | | +-----------------+ | | | +-----------------+ | |
| | | | | | | |
| | | | | | | |
| | | 10.91.2.0/26 | | | 10.91.1.128/26 | |
| | +-+----------------+--------+ | | +-+----------------+--------+ | |
| | | | | | | | | |
| | | .0 | | | | .128 | | |
| | | +-----------------+ | | | +-----------------+ | |
| | | | | | | | | | | |
| | | | container 12 | | | | | container 22 | | |
| | | +-----------------+ | | | +-----------------+ | |
| | | | | |
| | | | | |
| +------------------------------------+ +------------------------------------+ |
| |
| Compute node |
+------------------------------------------------------------------------------------------------+
Components
- k8s node 1:
- IP: 100.64.1.23
- hostname: ubuntu-4
- role: k8s master and worker node
- k8s node 2:
- IP: 100.64.1.24
- hostname: ubuntu-3
- role: worker node
- Notes:
Although my test setup has the k8s nodes running as VMs on top of OpenStack, that is not a mandatory requirement. You can run k8s on bare metal, connected directly to physical L2/L3 switches.
Add more containers to the existing setup
Let’s bring up 2 more containers
- container number 2
```
ubuntu@ubuntu-4:~$ kubectl run sshd-2 --image=rastasheep/ubuntu-sshd:16.04
deployment "sshd-2" created
```
- container number 3
```
ubuntu@ubuntu-4:~$ kubectl run sshd-3 --image=rastasheep/ubuntu-sshd:16.04
deployment "sshd-3" created
```
- Verify the container status
```
ubuntu@ubuntu-4:~$ kubectl get pods -o wide
NAME                      READY     STATUS    RESTARTS   AGE       IP             NODE
sshd-1-84c4bf4558-284dj   1/1       Running   0          1d        10.201.0.197   ubuntu-4
sshd-2-78f7789cc8-95srr   1/1       Running   0          1d        10.201.0.130   ubuntu-3
sshd-3-6bb86d6bf8-bq4q8   1/1       Running   0          1d        10.201.0.131   ubuntu-3
```
- Looks like we have one pod on the ubuntu-4 host and two pods on the ubuntu-3 host.
- All of the pods above got their IPs from the 10.201.0.0/24 IP pool that was specified during kubeadm init (see the sketch below).
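For reference, that pool is the pod network CIDR passed in at cluster bootstrap. A minimal sketch of the relevant kubeadm command, assuming the same CIDR used in this setup:
```
# Run on the master node at cluster creation time.
# --pod-network-cidr tells kubeadm which range the CNI plugin (Calico here) may allocate pod IPs from.
sudo kubeadm init --pod-network-cidr=10.201.0.0/24
```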
Verify k8s node networking setup
Find out the k8s node networking setup
- In theory, we should see the following (see the check sketched after this list):
- Calico by default creates a full mesh of IPIP tunnels between the nodes.
- Calico also creates full-mesh BGP peering between the nodes.
- Unlike in OpenStack, where each VM is advertised as a single /32 route, in k8s or Docker Calico performs route aggregation on each node.
- Based on examples done by other people, and my own test in https://rendoaw.github.io/2017/06/Contrail-BGPaaS-with-Docker-and-Calico, it looks like Calico aggregates the routes into /26 prefixes.
- Other reference: https://docs.projectcalico.org/v2.6/reference/private-cloud/l3-interconnect-fabric
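One way to sanity-check the BGP mesh is calicoctl’s node status command. This is only a sketch and assumes a calicoctl binary is available directly on the node (in my setup calicoctl actually runs as a pod), so the exact invocation may differ:
```
# Run on a k8s node; prints the local BIRD BGP session status,
# which should list every other node as an established peer in a full mesh.
sudo calicoctl node status
```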
- Let’s go back to our setup.
- Verify the interfaces on the host.
- node 1: ubuntu-4
```
ubuntu@ubuntu-4:~$ ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: ens3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1400 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 02:f5:83:2b:70:b6 brd ff:ff:ff:ff:ff:ff
    inet 100.64.1.23/24 brd 100.64.1.255 scope global ens3
       valid_lft forever preferred_lft forever
    inet6 fe80::f5:83ff:fe2b:70b6/64 scope link
       valid_lft forever preferred_lft forever
3: ens4: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 02:92:95:10:a4:d7 brd ff:ff:ff:ff:ff:ff
4: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default
    link/ether 02:42:f0:fe:30:76 brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.1/16 scope global docker0
       valid_lft forever preferred_lft forever
5: tunl0@NONE: <NOARP,UP,LOWER_UP> mtu 1440 qdisc noqueue state UNKNOWN group default qlen 1
    link/ipip 0.0.0.0 brd 0.0.0.0
    inet 10.201.0.192/32 scope global tunl0
       valid_lft forever preferred_lft forever
6: cali5bed1c02fcb@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
    link/ether 0e:75:a4:ac:a0:9b brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet6 fe80::c75:a4ff:feac:a09b/64 scope link
       valid_lft forever preferred_lft forever
7: calidd49badf8e9@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
    link/ether f6:02:04:52:7a:e4 brd ff:ff:ff:ff:ff:ff link-netnsid 1
    inet6 fe80::f402:4ff:fe52:7ae4/64 scope link
       valid_lft forever preferred_lft forever
```
- notes:
- ens3 is the host’s 1st NIC, used as the uplink interface
- ens4 is the host’s 2nd NIC, not used
- docker0 is the default bridge for Docker. It is not used in Kubernetes+Calico.
- tunl0 is the IPIP tunnel interface between nodes.
- cali….@.. are the veth interfaces between the host and the containers.
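If you want to confirm that tunl0 really is an IPIP device and check its encapsulation parameters, iproute2 can show the tunnel details. A quick check, assuming the interface names above:
```
# -d prints driver details; for tunl0 this includes the "ipip" line
# with the tunnel's local/remote settings and TTL handling.
ip -d link show tunl0
```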
- node 2: ubuntu-3 interface list
```
root@ubuntu-3:/home/ubuntu# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: ens3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1400 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 02:a2:f0:89:15:28 brd ff:ff:ff:ff:ff:ff
    inet 100.64.1.24/24 brd 100.64.1.255 scope global ens3
       valid_lft forever preferred_lft forever
    inet6 fe80::a2:f0ff:fe89:1528/64 scope link
       valid_lft forever preferred_lft forever
3: ens4: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 02:e8:df:d6:11:b7 brd ff:ff:ff:ff:ff:ff
4: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default
    link/ether 02:42:8f:0e:1c:cf brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.1/16 scope global docker0
       valid_lft forever preferred_lft forever
5: tunl0@NONE: <NOARP,UP,LOWER_UP> mtu 1440 qdisc noqueue state UNKNOWN group default qlen 1
    link/ipip 0.0.0.0 brd 0.0.0.0
    inet 10.201.0.128/32 scope global tunl0
       valid_lft forever preferred_lft forever
6: cali06827901978@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
    link/ether f2:12:2c:d6:b4:99 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet6 fe80::f012:2cff:fed6:b499/64 scope link
       valid_lft forever preferred_lft forever
7: caliaf4e899510c@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
    link/ether 3a:4b:96:92:d8:63 brd ff:ff:ff:ff:ff:ff link-netnsid 1
    inet6 fe80::384b:96ff:fe92:d863/64 scope link
       valid_lft forever preferred_lft forever
```
- Verify the routing table on each node.
- node 1: ubuntu-4
```
ubuntu@ubuntu-4:~$ ip r
default via 100.64.1.1 dev ens3
10.201.0.128/26 via 100.64.1.24 dev tunl0 proto bird onlink
blackhole 10.201.0.192/26 proto bird
10.201.0.196 dev cali5bed1c02fcb scope link
10.201.0.197 dev calidd49badf8e9 scope link
100.64.1.0/24 dev ens3 proto kernel scope link src 100.64.1.23
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 linkdown
```
- notes
- 10.201.0.192/26 is the block allocated to this node (ubuntu-4).
- This is the route aggregation mentioned earlier.
- 10.201.0.128/26 is the aggregated route allocated to the 2nd worker.
- Note that this route points to the tunl0 interface, with ubuntu-3’s IP address as the next hop.
- 10.201.0.197 is a container IP address. This IP is reachable via the calidd49badf8e9 interface.
- 172.17.0.0/16 is the default docker0 bridge network. I believe it is not used at all in Kubernetes with Calico.
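To see only the routes that Calico/BIRD programmed (the aggregated /26 blocks from the other nodes plus the local blackhole), the routing table can be filtered by protocol. A small sketch, assuming the host resolves the protocol name "bird" as in the output above:
```
# Show only routes installed by BIRD (Calico's BGP daemon);
# on ubuntu-4 this should list the remote 10.201.0.128/26 via tunl0
# and the local blackhole for 10.201.0.192/26.
ip route show proto bird
```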
- node 2: ubuntu-3
```
root@ubuntu-3:/home/ubuntu# ip r
default via 100.64.1.1 dev ens3
blackhole 10.201.0.128/26 proto bird
10.201.0.130 dev cali06827901978 scope link
10.201.0.131 dev caliaf4e899510c scope link
10.201.0.192/26 via 100.64.1.23 dev tunl0 proto bird onlink
100.64.1.0/24 dev ens3 proto kernel scope link src 100.64.1.24
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 linkdown
```
- notes
- 10.201.0.128/26 is the block allocated to this node (ubuntu-3).
- This is the route aggregation mentioned earlier.
- 10.201.0.192/26 is the aggregated route allocated to the other node (ubuntu-4).
- Note that this route points to the tunl0 interface, with ubuntu-4’s IP address as the next hop.
- 10.201.0.130 is a container IP address. This IP is reachable via the cali06827901978 interface.
- 172.17.0.0/16 is the default docker0 bridge network. I believe it is not used at all in Kubernetes with Calico.
Find out how the network is set up inside the container
- Let’s go inside the container.
- I chose an sshd image as the test container on purpose, because it makes it easier to get inside the container and run ping, traceroute, or other networking tests (see the note below for an alternative).
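As a side note, SSH is not strictly required: kubectl can drop you into a running pod directly. A quick alternative, using the sshd-1 pod name from the listing that follows:
```
# Open an interactive shell in the sshd-1 pod without using SSH at all.
kubectl exec -it sshd-1-84c4bf4558-284dj -- bash
```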
- The default credential for the sshd container image is root/root. The IP address of each container can be found with the following command:
```
ubuntu@ubuntu-4:~$ kubectl get pods -o wide
NAME                      READY     STATUS    RESTARTS   AGE       IP             NODE
sshd-1-84c4bf4558-284dj   1/1       Running   0          1d        10.201.0.197   ubuntu-4
sshd-2-78f7789cc8-95srr   1/1       Running   0          1d        10.201.0.130   ubuntu-3
sshd-3-6bb86d6bf8-bq4q8   1/1       Running   0          1d        10.201.0.131   ubuntu-3
```
- Let’s SSH into the sshd-1 container
```
ubuntu@ubuntu-4:~$ ssh root@10.201.0.197
The authenticity of host '10.201.0.197 (10.201.0.197)' can't be established.
ECDSA key fingerprint is SHA256:oeXGFyX/mdzuCxeulD1bOzDe1lCbIktPxb7vKNPiDOs.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '10.201.0.197' (ECDSA) to the list of known hosts.
```
- Since this container image is very minimal, let’s install some basic packages.
```
root@sshd-1-84c4bf4558-284dj:/# apt-get update
root@sshd-1-84c4bf4558-284dj:/# apt-get install iproute2
root@sshd-1-84c4bf4558-284dj:~# apt-get install inetutils-ping traceroute
```
- Now, we verify the interface and routing table
```
root@sshd-1-84c4bf4558-284dj:/sbin# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: tunl0@NONE: <NOARP> mtu 1480 qdisc noop state DOWN group default qlen 1
    link/ipip 0.0.0.0 brd 0.0.0.0
4: eth0@if7: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
    link/ether ea:ff:92:a4:32:63 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 10.201.0.197/32 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::e8ff:92ff:fea4:3263/64 scope link
       valid_lft forever preferred_lft forever
root@sshd-1-84c4bf4558-284dj:~# ip r
default via 169.254.1.1 dev eth0
169.254.1.1 dev eth0 scope link
```
- Notes:
- The container’s eth0 is the other end of the veth pair that terminates on the host (the cali… interface).
- The container’s eth0 IP comes from the Calico IP pool.
- The container IP is a /32.
- The default route points to the link-local address 169.254.1.1 via eth0.
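That 169.254.1.1 next hop does not exist on any real interface; Calico answers ARP for it from the host side of the veth pair using proxy ARP. A small check you can run on the host, assuming the cali interface names shown earlier (this reflects how Calico is documented to work, so treat it as a sanity check rather than a guarantee):
```
# On ubuntu-4: proxy_arp should be enabled (1) on the workload-facing cali interface,
# which lets the host answer ARP requests for the 169.254.1.1 gateway address.
cat /proc/sys/net/ipv4/conf/calidd49badf8e9/proxy_arp
```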
- Let’s traceroute to a container that resides on the other node
```
root@sshd-1-84c4bf4558-284dj:~# traceroute -n 10.201.0.130
traceroute to 10.201.0.130 (10.201.0.130), 30 hops max, 60 byte packets
 1  100.64.1.23  0.158 ms  0.060 ms  0.055 ms
 2  10.201.0.128  0.960 ms  0.769 ms  0.662 ms
 3  10.201.0.130  0.648 ms  0.625 ms  0.507 ms
```
- From the output above, the 1st hop is ubuntu-4 (100.64.1.23), which is the host of the sshd-1 container.
- The 2nd hop is the tunl0 IP on ubuntu-3 (10.201.0.128), which is the host of the sshd-2 container (a capture sketch to confirm the IPIP encapsulation follows below).
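If you want to see the IPIP encapsulation on the wire while that traceroute (or a ping) is running, you can capture IP protocol 4 on the node’s uplink. A sketch, assuming ens3 is the uplink as above:
```
# On ubuntu-4: capture IP-in-IP packets (IP protocol number 4) leaving/entering the node.
# Each packet shows the outer node IPs wrapping the inner pod-to-pod traffic.
sudo tcpdump -ni ens3 ip proto 4
```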
- OK, just out of curiosity, let’s traceroute to the internet
```
root@sshd-1-84c4bf4558-284dj:~# traceroute -n 8.8.8.8
traceroute to 8.8.8.8 (8.8.8.8), 30 hops max, 60 byte packets
 1  100.64.1.23  0.112 ms  0.039 ms  0.075 ms
 2  100.64.1.1  0.765 ms  0.645 ms  0.485 ms
 3  100.64.0.2  2.057 ms  2.880 ms  3.342 ms
 4  100.64.0.1  8.384 ms  8.313 ms  8.239 ms
 5  192.168.1.1  9.304 ms  9.188 ms  9.199 ms
 6  * * *
 7  67.59.226.53  16.035 ms  17.296 ms  17.308 ms
 8  67.83.251.141  22.326 ms 67.83.251.133  21.483 ms  24.335 ms
 9  65.19.114.67  25.486 ms 67.59.239.119  20.946 ms 65.19.114.67  21.563 ms
10  64.15.0.8  21.668 ms 64.15.1.64  20.923 ms 64.15.1.90  22.351 ms
11  72.14.215.203  22.273 ms  16.582 ms *
12  * * *
13  209.85.243.19  12.440 ms 108.170.238.203  16.062 ms 108.170.228.135  19.197 ms
14  8.8.8.8  19.882 ms  20.018 ms  20.675 ms
```
- Whoa, the container can access the internet!
WAIT, something looks suspicious
- We have not configured any route from the outside network to the container, so how can the container reach the internet and run apt-get?
- The answer is that the default IP pool has NAT enabled. We can verify that as follows:
```
ubuntu@ubuntu-4:~$ kubectl exec -ti -n kube-system calicoctl -- /calicoctl get -o yaml ippool
- apiVersion: v1
  kind: ipPool
  metadata:
    cidr: 10.201.0.0/24
  spec:
    ipip:
      enabled: true
      mode: always
    nat-outgoing: true
```
- That means the host will NAT all outgoing traffic from the containers.
- We can also see the NAT rule in iptables:
```
ubuntu@ubuntu-4:~$ sudo iptables -L -t nat -n
...
Chain cali-nat-outgoing (1 references)
target     prot opt source               destination
MASQUERADE  all  --  0.0.0.0/0            0.0.0.0/0            /* cali:Wd76s91357Uv7N3v */ match-set cali4-masq-ipam-pools src ! match-set cali4-all-ipam-pools dst
...

root@ubuntu-4:/home/ubuntu# ipset list cali4-masq-ipam-pools
Name: cali4-masq-ipam-pools
Type: hash:net
Revision: 6
Header: family inet hashsize 1024 maxelem 1048576
Size in memory: 448
References: 1
Members:
10.201.0.0/24
```
- OK, one mystery is solved.
- BUT… I thought the point of using Calico instead of flannel was to have full end-to-end routing to each container, without any NAT?
- Yes, but for that we either need to disable NAT on the default IP pool, or create a new pool with NAT disabled (a sketch of the first option follows below).
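For completeness, disabling nat-outgoing on the existing default pool would look roughly like the following, reusing the v1-style ipPool resource shown earlier. This is only a sketch I have not applied in this setup, and the exact syntax depends on the Calico/calicoctl version:
```
# Hypothetical: re-apply the default pool with nat-outgoing set to false.
# (Run from inside the calicoctl pod, like the other calicoctl commands in this post.)
cat > default-pool.yaml <<'EOF'
- apiVersion: v1
  kind: ipPool
  metadata:
    cidr: 10.201.0.0/24
  spec:
    ipip:
      enabled: true
      mode: always
    nat-outgoing: false
EOF
/calicoctl apply -f default-pool.yaml
```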
- Let’s create a new pool with NAT disabled instead.
- To make it easier to run the calicoctl utility, let’s go inside the calicoctl container and run the commands from there
```
ubuntu@ubuntu-4:~$ kubectl exec -ti -n kube-system calicoctl -- /bin/busybox sh
~ # pwd
/root
```
- Prepare a yaml file to define the new pool. In this case, my new pool is 10.91.1.0/24
```
~ # cat ippool.yaml
- apiVersion: v1
  kind: ipPool
  metadata:
    cidr: 10.91.1.0/24
  spec:
    ipip:
      enabled: true
      mode: always
    nat-outgoing: false
```
- Create the new IP pool
```
/ # /calicoctl create -f ippool.yaml
Successfully created 1 'ipPool' resource(s)
```
- Verify all IP pools
```
/ # /calicoctl get -o yaml ippool
- apiVersion: v1
  kind: ipPool
  metadata:
    cidr: 10.201.0.0/24
  spec:
    ipip:
      enabled: true
      mode: always
    nat-outgoing: true
- apiVersion: v1
  kind: ipPool
  metadata:
    cidr: 10.91.1.0/24
  spec:
    ipip:
      enabled: true
      mode: always
/ #
```
- Before we do any traffic tests: since we know that Calico does /26 aggregation, I am curious what happens if I create an IP pool smaller than /26 (here, a 10.91.2.0/27 defined in ippool-small.yaml)
```
~ # calicoctl create -f ippool-small.yaml
Failed to execute command: error with field CIDR = '10.91.2.0/27' (IP pool size is too small (min /26) for use with Calico IPAM)
```
- Nice… smaller than /26 is not allowed. How about a /26 itself?
```
~ # cat ippool2.yaml
- apiVersion: v1
  kind: ipPool
  metadata:
    cidr: 10.91.2.0/26
  spec:
    ipip:
      enabled: true
      mode: always
    nat-outgoing: false
/ # /calicoctl create -f ippool2.yaml
Successfully created 1 'ipPool' resource(s)
~ # calicoctl get ippool -o yaml
- apiVersion: v1
  kind: ipPool
  metadata:
    cidr: 10.201.0.0/24
  spec:
    ipip:
      enabled: true
      mode: always
    nat-outgoing: true
- apiVersion: v1
  kind: ipPool
  metadata:
    cidr: 10.91.1.0/24
  spec:
    ipip:
      enabled: true
      mode: always
- apiVersion: v1
  kind: ipPool
  metadata:
    cidr: 10.91.2.0/26
  spec:
    ipip:
      enabled: true
      mode: always
```
- OK, now it is time to create a new container/pod, and this time we need to make sure the pod gets its IP from the new pool. For this, we need to define the pod in YAML before creating it.
```
ubuntu@ubuntu-4:~$ more pod1.yaml
---
apiVersion: v1
kind: Pod
metadata:
  annotations:
    "cni.projectcalico.org/ipv4pools": "[\"10.91.1.0/24\"]"
  name: ssh-server
  labels:
    app: sshd
spec:
  containers:
  - name: ssh-server-container
    image: "rastasheep/ubuntu-sshd:16.04"
```
- Now, we create the pod
```
ubuntu@ubuntu-4:~$ kubectl create -f pod1.yaml
pod "ssh-server" created
```
- Verify the new pod’s IP
```
ubuntu@ubuntu-4:~$ kubectl get pods -o wide
NAME                      READY     STATUS    RESTARTS   AGE       IP             NODE
ssh-server                1/1       Running   0          1d        10.91.1.128    ubuntu-3
...
```
- Verify the routing table on the host ubuntu-3
```
root@ubuntu-3:/home/ubuntu# ip r
default via 100.64.1.1 dev ens3
10.91.1.128 dev cali90490b35e30 scope link
blackhole 10.91.1.128/26 proto bird
...
```
- Now we have a new aggregated route on ubuntu-3: 10.91.1.128/26 (see the check sketched below).
- This is also proof that Calico always splits an IP pool into multiple /26 blocks.
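The same "proto bird" filter used earlier is a quick way to watch the new block appear:
```
# On ubuntu-3: the new 10.91.1.128/26 blackhole (and the pod's /32) were added by Calico/BIRD,
# alongside the existing 10.201.0.x routes.
ip route show proto bird
```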
- Great, the pod is now using the new pool. If we go inside the new pod and ping the outside network, the pings will fail, because this pool is not NATed.
Just curious: what will happen with the other new IP pool that is only a /26?
- Remember that we also created a second new IP pool with a /26 subnet size. In theory, if Calico splits each IP pool into /26 blocks, then only one node can host containers from this second pool’s range (10.91.2.0/26).
- For this test case, we need a way to pin a container to a specific node. Why?
- Because if we create two new pods, say pod-A and pod-B, with pod-A forced onto node 1 and pod-B onto node 2, we can verify how Calico splits the /26 IP pool across the nodes.
- To pin a pod to a specific node, we use the node selector feature. For that, we need to assign a unique label to each node
```
ubuntu@ubuntu-4:~$ kubectl label nodes ubuntu-4 node_id=n4
node "ubuntu-4" labeled
ubuntu@ubuntu-4:~$ kubectl label nodes ubuntu-3 node_id=n3
node "ubuntu-3" labeled
```
- We assign the “n4” label to the ubuntu-4 node and “n3” to the ubuntu-3 node (verification sketched below).
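To double-check that the labels landed on the right nodes, kubectl can print them:
```
# Lists all nodes with their labels; ubuntu-4 should show node_id=n4 and ubuntu-3 node_id=n3.
kubectl get nodes --show-labels
```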
- Create YAML files to define the containers and their node pinning
```
ubuntu@ubuntu-4:~$ cat pod_on_node3.yaml
---
apiVersion: v1
kind: Pod
metadata:
  annotations:
    "cni.projectcalico.org/ipv4pools": "[\"10.91.2.0/26\"]"
  name: ssh-server3
  labels:
    app: sshd
spec:
  containers:
  - name: ssh-server-container
    image: "rastasheep/ubuntu-sshd:16.04"
  nodeSelector:
    node_id: n3

ubuntu@ubuntu-4:~$ cat pod_on_node4.yaml
---
apiVersion: v1
kind: Pod
metadata:
  annotations:
    "cni.projectcalico.org/ipv4pools": "[\"10.91.2.0/26\"]"
  name: ssh-server4
  labels:
    app: sshd
spec:
  containers:
  - name: ssh-server-container
    image: "rastasheep/ubuntu-sshd:16.04"
  nodeSelector:
    node_id: n4
```
- Create the pods
```
ubuntu@ubuntu-4:~$ kubectl create -f pod_on_node4.yaml
ubuntu@ubuntu-4:~$ kubectl create -f pod_on_node3.yaml
```
Notes: the new pods are named ssh-server3 and ssh-server4.
- Verify the status
```
ubuntu@ubuntu-4:~$ kubectl get pods -o wide
NAME                      READY     STATUS    RESTARTS   AGE       IP             NODE
ssh-server                1/1       Running   0          20h       10.91.1.128    ubuntu-3
ssh-server3               0/1       Pending   0          25s       <none>         <none>
ssh-server4               1/1       Running   0          1m        10.91.2.0      ubuntu-4
sshd-1-84c4bf4558-284dj   1/1       Running   0          22h       10.201.0.197   ubuntu-4
sshd-2-78f7789cc8-95srr   1/1       Running   0          21h       10.201.0.130   ubuntu-3
sshd-3-6bb86d6bf8-bq4q8   1/1       Running   0          20h       10.201.0.131   ubuntu-3
```
- Wait for a while.
- Interesting: ssh-server3 stays in Pending status. Let’s check in more detail
```
ubuntu@ubuntu-4:~$ kubectl describe pod ssh-server3
Name:         ssh-server3
Namespace:    default
Node:         <none>
Labels:       app=sshd
Annotations:  cni.projectcalico.org/ipv4pools=["10.91.2.0/26"]
Status:       Pending
IP:
Containers:
  ssh-server-container:
    Image:        rastasheep/ubuntu-sshd:16.04
    Port:         <none>
    Environment:  <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-h52zb (ro)
Conditions:
  Type           Status
  PodScheduled   False
Volumes:
  default-token-h52zb:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-h52zb
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  node_id=n3
Tolerations:     node.alpha.kubernetes.io/notReady:NoExecute for 300s
                 node.alpha.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason            Age                From               Message
  ----     ------            ----               ----               -------
  Warning  FailedScheduling  26s (x7 over 57s)  default-scheduler  No nodes are available that match all of the predicates: MatchNodeSelector (2).
```
- So, ssh-server3 is pending because the k8s scheduler can’t find any node that satisfies all the requirements (a node with the “n3” label). Why?
- It is because the 10.91.2.0/26 IP pool can only be split into a single aggregated /26 block.
- We created ssh-server4 first, which is pinned to the ubuntu-4 node.
- That way, 10.91.2.0/26 got allocated to ubuntu-4.
- When we then create ssh-server3, which is pinned to the ubuntu-3 node, no sub-prefix is left in 10.91.2.0/26.
- Calico can’t assign any aggregated route to the ubuntu-3 node.
- This makes ubuntu-3 ineligible to host any container in the 10.91.2.0/26 IP range.
- Great! This matches my expectation: if the number of sub-prefix blocks is smaller than the number of nodes, then containers from that IP pool can only be placed on some of the available nodes.
- OK, we now know for sure that Calico splits the pool into multiple /26 prefixes. But which module/script is doing this?
- After searching around, I found that the following module is responsible for the splitting, and the /26 size is hardcoded there:
- https://github.com/projectcalico/libcalico/blob/master/calico_containers/pycalico/block.py
OK, now we have more insight into how the IP allocation works, and we know that different behavior can be configured per IP pool.
I’ll continue with connectivity between the containers and the external network in the next post: Kubernetes and Calico Part 3.