Overview
- CoreOS is quite different from other typical Linux distros.
- CoreOS does not have its own package manager and expects applications to run inside Docker containers.
- CoreOS has built-in clustering support, mainly managed by etcd and fleet.
Creating first CoreOS machine
Installing CoreOS
For this purpose, I wanted to try CoreOS on a VM, in this case a KVM-based VM.
Lesson #1:
The CoreOS ISO does not provide a complete installation mechanism like CentOS's anaconda or the Ubuntu installer.
The CoreOS ISO is actually a live CD, which allows you to download the CoreOS image from the internet and re-image your hard drive.
note: I don't know how to do an offline CoreOS installation yet.
For the installation procedure, I was following https://coreos.com/os/docs/latest/installing-to-disk.html
Booting CoreOS
Lesson #2:
Although we installed CoreOS using the "ISO", the result is similar to what you get with another distro's ready-made cloud image.
By default, a fresh CoreOS system has no username/password set.
How do you set the default credentials?
Apparently, there are 2 common ways to do this:
- Option 1:
  - Create a cloud-config file, and use it when you run coreos-install, e.g.:
# coreos-install -d /dev/sda -c cloud-config.yaml
  - Example of the cloud-config:
    #cloud-config
    hostname: coreos
    users:
      - name: "rendo"
        passwd: "$1$9DHHJV.z$vY1BrX8dK4tgALvg6DWHz0"
        groups:
          - "sudo"
          - "docker"
  - This method actually worked once, but I had a hard time reproducing it. The problem is not the cloud-config itself, but the coreos-install script.
    - For some reason, I got "BLKRRPART: Device or resource busy" more frequently when I used the "-c" parameter.
    - Seems I am hitting the same issue as https://github.com/coreos/bugs/issues/152
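As an aside, the passwd value in the cloud-config above is a standard crypt(3)-style hash. One way to generate such a hash is with openssl; this is a sketch, assuming openssl is installed, and the salt and password shown are placeholder examples, not the ones used in this post:

```shell
# Generate an MD5-crypt ("$1$...") password hash suitable for the
# cloud-config "passwd" field. Salt and password here are examples only.
openssl passwd -1 -salt 9DHHJV.z mysecretpassword
```

The resulting string (starting with `$1$<salt>$`) goes straight into the `passwd:` field.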
- Option 2:
  - The second approach is basically to use the standard installation without any username/password set.
  - Then, similar to what we do with any cloud-init based image, we use a cloud-init config to "initialize" the VM.
  - Since I am running CoreOS on KVM, I don't have a metadata service, so I have to use a config drive to inject my cloud-init configuration.
- Procedure:
  - Create an ISO file containing the cloud-init config, e.g.:
    # cat coreos/openstack/latest/user_data
    #cloud-config
    hostname: coreos
    users:
      - name: "rendo"
        passwd: "$1$9DHHJV.z$vY1BrX8dK4tgALvg6DWHz0"
        groups:
          - "sudo"
          - "docker"
  - Create the ISO:
# mkisofs -R -V config-2 -o coreos-configdrive.iso coreos
  - Attach the ISO as a 2nd disk on the KVM VM:
    # virsh dumpxml coreos
    ...deleted..
    <disk type='file' device='cdrom'>
      <driver name='qemu' type='raw'/>
      <source file='/data/kvm/coreos/coreos-configdrive.iso'/>
      <target dev='hda' bus='ide'/>
      <readonly/>
      <boot order='2'/>
      <address type='drive' controller='0' bus='0' target='0' unit='0'/>
    </disk>
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source file='/data/kvm/coreos/coreos.qcow2'/>
      <target dev='vda' bus='virtio'/>
      <boot order='1'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
    </disk>
    ..deleted..
  - Start the CoreOS VM.
  - We should be able to log in with the user/password specified in the cloud-init file.
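The config-drive steps above can be sketched end to end. This is a minimal sketch: the `openstack/latest/user_data` path and the `config-2` volume label are what the OpenStack-style config-drive datasource looks for, and the hostname and content here are placeholders:

```shell
# Build the config-drive directory tree that cloud-init expects.
mkdir -p coreos/openstack/latest

# Minimal placeholder user_data; replace with your real cloud-config.
cat > coreos/openstack/latest/user_data <<'EOF'
#cloud-config
hostname: coreos
EOF

# Wrap it into an ISO labeled "config-2" (requires mkisofs or genisoimage):
# mkisofs -R -V config-2 -o coreos-configdrive.iso coreos
ls coreos/openstack/latest
```

The ISO is then attached to the VM as a second disk, as shown in the virsh XML above.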
Static IP instead of DHCP
- By default CoreOS will use DHCP to get an IP. To use a static IP, I modified the cloud-init as below:
  #cloud-config
  hostname: coreos
  users:
    - name: "rendo"
      passwd: "$1$9DHHJV.z$vY1BrX8dK4tgALvg6DWHz0"
      groups:
        - "sudo"
        - "docker"
  write_files:
  coreos:
    units:
      - name: 00-eth0.network
        permissions: 0644
        runtime: true
        content: |
          [Match]
          Name=eth0

          [Network]
          Address=192.168.1.23/24
          Gateway=192.168.1.1
          DNS=8.8.8.8
          DNS=8.8.4.4
      - name: systemd-networkd.service
        command: start
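For reference, the unit content above is just a plain systemd-networkd configuration file; once coreos-cloudinit writes it out (as a runtime unit, it should land under /run/systemd/network/ on the booted node, path assumed here), it is the usual `.network` file format:

```
[Match]
Name=eth0

[Network]
Address=192.168.1.23/24
Gateway=192.168.1.1
DNS=8.8.8.8
DNS=8.8.4.4
```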
Trying CoreOS clustering
I was following the great tutorial provided by Digital Ocean: https://www.digitalocean.com/community/tutorials/an-introduction-to-coreos-system-components
Almost everything from the tutorial was working as expected, but I did have some issues, mainly because of my specific setup.
Setup Overview
- I created 3 CoreOS VMs to test clustering
List of issues
- etcd2 discovery service is not working
  - I am using discovery.etcd.io as the discovery service:
    # curl --silent -H "Accept: text/plain" https://discovery.etcd.io/new?size=3
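For completeness, the token URL returned by that curl is normally consumed in the cloud-config like this (a hedged sketch based on the CoreOS/Digital Ocean docs; `<token>` is a placeholder, and `$private_ipv4` is the substitution variable CoreOS cloud-config supports on platforms that provide it):

```
#cloud-config
coreos:
  etcd2:
    # URL returned by https://discovery.etcd.io/new?size=3 (placeholder token)
    discovery: https://discovery.etcd.io/<token>
    advertise-client-urls: http://$private_ipv4:2379
    initial-advertise-peer-urls: http://$private_ipv4:2380
    listen-client-urls: http://0.0.0.0:2379,http://127.0.0.1:2379
    listen-peer-urls: http://$private_ipv4:2380
```

This is the setup that failed for me, as described below.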
  - Error messages:
    Jun 05 23:42:01 coreos-1 etcd2[914]: member "2987153e1354f8c11c8360f6fd80a02d" has previously registered with discovery service token (https://discovery.etcd.io/cb525330fb8159eed9ff784287608e5b).
    Jun 05 23:42:01 coreos-1 etcd2[914]: But etcd could not find valid cluster configuration in the given data dir (/var/lib/etcd2).
  - I am not 100% sure about the root cause, but most likely it is because all my CoreOS instances are behind N-to-1 NAT; all 3 nodes are using the same public IP.
  - To solve this issue, I am using a static etcd2 configuration. I changed my cloud-init to be something like this:
    #cloud-config
    hostname: coreos-1
    users:
      - name: "rendo"
        passwd: "$1$9DHHJV.z$vY1BrX8dK4tgALvg6DWHz0"
        groups:
          - "sudo"
          - "docker"
    write_files:
    coreos:
      etcd2:
        name: coreos-1
        initial-advertise-peer-urls: http://192.168.1.24:2380 ---> adjust this with each node IP
        listen-peer-urls: http://192.168.1.24:2380 ---> adjust this with each node IP
        listen-client-urls: http://192.168.1.24:2379,http://127.0.0.1:2379 ---> adjust this with each node IP
        advertise-client-urls: http://192.168.1.24:2379 ---> adjust this with each node IP
        initial-cluster-token: etcd-cluster-1
        initial-cluster: coreos-1=http://192.168.1.24:2380,coreos-2=http://192.168.1.25:2380,coreos-3=http://192.168.1.26:2380
        initial-cluster-state: new
      fleet:
        public-ip: 192.168.1.24 # used for fleetctl ssh command ---> adjust this with each node IP
      units:
        - name: etcd2.service
          command: start
        - name: fleet.service
          command: start
        - name: 00-eth0.network
          permissions: 0755
          runtime: true
          content: |
            [Match]
            Name=eth0

            [Network]
            Address=192.168.1.24/24 ---> adjust this with each node IP
            Gateway=192.168.1.1
            DNS=8.8.8.8
            DNS=8.8.4.4
        - name: systemd-networkd.service
          command: start
  - Verify:
    coreos-1 rendo # etcdctl member list
    4c5bf3f20ea95537: name=coreos-2 peerURLs=http://192.168.1.25:2380 clientURLs=http://192.168.1.25:2379 isLeader=false
    567623728ce92b28: name=coreos-1 peerURLs=http://192.168.1.24:2380 clientURLs=http://192.168.1.24:2379 isLeader=true
    e54d504de5fc9f4d: name=coreos-3 peerURLs=http://192.168.1.26:2380 clientURLs=http://192.168.1.26:2379 isLeader=false
- With the change above, etcd2 now works, but "fleetctl list-machines" always gives me one machine instead of all 3 nodes.
  - list-machines output:
    coreos-1 rendo # fleetctl list-machines
    MACHINE         IP              METADATA
    2987153e...     192.168.1.24    -
  - Error message:
    coreos-1 journal # journalctl -b -u fleet
    Jun 06 00:25:45 coreos-1 fleetd[671]: ERROR engine.go:217: Engine leadership lost, renewal failed: 101: Compare failed ([61 != 62])
    Jun 06 00:25:49 coreos-1 fleetd[671]: ERROR engine.go:217: Engine leadership lost, renewal failed: 101: Compare failed ([66 != 67])
- Problem:
  - It turned out that the problem was that all my CoreOS instances had the same machine-id.
  - https://groups.google.com/forum/#!topic/coreos-user/_wmOxOfMsEY
  - To fix the machine-id, as mentioned in the post above, I ran:
    # rm /etc/machine-id
    # reboot
  - Verification: now fleetctl list-machines gives me the complete member list:
    coreos-1 rendo # fleetctl list-machines
    MACHINE         IP              METADATA
    19081653...     192.168.1.26    -
    2987153e...     192.168.1.24    -
    46c715ec...     192.168.1.25    -
Todo
- find out how to inject all members' SSH keys into each node's cloud-init config
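A plausible starting point for the todo above: CoreOS cloud-config supports a per-user list of authorized SSH keys, so each node's config could simply carry every member's public key. This is a sketch only; the exact key name (`ssh-authorized-keys` in CoreOS's dialect, `ssh_authorized_keys` in upstream cloud-init) should be checked against the docs, and the keys shown are truncated placeholders:

```
#cloud-config
users:
  - name: "rendo"
    ssh-authorized-keys:
      - "ssh-rsa AAAA... rendo@coreos-1"
      - "ssh-rsa AAAA... rendo@coreos-2"
      - "ssh-rsa AAAA... rendo@coreos-3"
```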