copyright 1998-2018 by Mark Verboom

Wednesday, 30 December, 2015

Proxmox 4 migration issues

With the introduction of Proxmox 4 came a rather large change: the combination of KVM and OpenVZ was replaced by KVM and LXC. The big upside of this is a much more recent kernel, jumping from 2.6.36 (released in 2010) to 4.2 (current at the time of writing).

But the change from OpenVZ to LXC brings some differences that make a migration less trivial, depending on the number of OpenVZ features in use.

The Proxmox Wiki has some good information on migrating from Proxmox 3 to Proxmox 4. Here are a couple of issues I ran into during several migrations; maybe they are of use to someone else. Keep in mind that I only run Debian containers, so other distributions may have other issues.

Inotify

When running a larger number of containers, it is quite possible you will run out of inotify instances. By default this limit is set to 128 (at least it was on my system). After the first migration I started running into problems after restoring a couple of containers, with errors pointing to this setting. Increasing the value made the problems go away:

sysctl -w fs.inotify.max_user_instances=8192

If this helps, don't forget to make it permanent in /etc/sysctl.conf.
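A minimal way to persist the setting, assuming the standard /etc/sysctl.conf location:

```shell
# Persist the higher inotify limit across reboots
echo "fs.inotify.max_user_instances = 8192" >> /etc/sysctl.conf
# Reload the settings immediately, without a reboot
sysctl -p
```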

Ping doesn't work

When running ping in an LXC container as a non-root user, I got the error:

mark:~$ ping www.google.com
ping: icmp open socket: Operation not permitted

As it turns out, no special capabilities were set on /bin/ping:

root:~# getcap /bin/ping

Normally ping has cap_net_raw+ep (the net raw capability, effective and permitted). After restoring this, everything works again.

root:~# setcap cap_net_raw+ep /bin/ping
root:~# exit
logout
mark:~$ ping www.google.com
PING www.google.com (74.125.136.104) 56(84) bytes of data.
64 bytes from ea-in-f104.1e100.net (74.125.136.104): icmp_seq=1 ttl=49 time=11.2 ms
^C
--- www.google.com ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 11.242/11.242/11.242/0.000 ms
mark:~$

The problem is that when using vzdump and vzrestore to migrate the containers from OpenVZ to LXC, the file capabilities get lost. So before migrating it can be very useful to generate a list of files that have extra capabilities, so they can be restored later.

What I did on the in-place migrations was to create a dump of all capabilities for all containers before starting the migration. You'll need libcap2-bin installed on the Proxmox host.

apt-get install libcap2-bin
# Dump the capabilities of every file in each container to /var/lib/vz/<ctid>.cap
for ctid in $(vzlist -o ctid -H -S)
do
    cd /var/lib/vz/private/$ctid
    find . -type f -print0 | xargs -0 getcap >> /var/lib/vz/$ctid.cap
done

This generates one file per container, listing the files that need extra capabilities. The files are placed in /var/lib/vz.
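For reference, a line in these .cap files looks like the example below. Note that older libcap versions of getcap print "file = caps" with spaces around the equals sign, which is the format the restore loop relies on; newer versions print "file caps" without it.

```shell
# Sample getcap output line (older libcap " = " format, assumed here)
line='./bin/ping = cap_net_raw+ep'
file=${line/ = */}   # everything before " = "  -> ./bin/ping
cap=${line/* = /}    # everything after  " = "  -> cap_net_raw+ep
echo "$file $cap"
```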

After the migration to Proxmox 4 is finished and all containers are running, I run the following to restore the capabilities.

cd /var/lib/vz
for name in *.cap
do
    ctid=${name/.cap/}
    # Find the PID of the container's init so we can reach its root filesystem
    PID=$(lxc-info -n $ctid -p -H)
    cd /proc/$PID/root
    while read line
    do
        file=${line/ = */}
        cap=${line/* = /}
        echo "Setting capabilities $cap on $file"
        setcap $cap $file
    done < /var/lib/vz/$name
done

Processes not starting

In some containers running Debian 8 (systemd) I ran into a problem where some processes didn't start. After some debugging it turned out to be an issue with the systemd settings for that application. The problem seems to be the following setting:

PrivateTmp=True

I found this to be in use with PowerDNS. Changing it to false solves the problem.

vi /lib/systemd/system/pdns.service
Change:
PrivateTmp=True
To:
PrivateTmp=False

This has security implications, so make sure you're comfortable changing this.
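As an alternative to editing the unit file under /lib directly (which a package upgrade will overwrite), a systemd drop-in can carry the same override. This is a sketch assuming the pdns.service unit name used above:

```shell
# Drop-in override for pdns.service; survives package upgrades,
# unlike edits to /lib/systemd/system/pdns.service
mkdir -p /etc/systemd/system/pdns.service.d
cat > /etc/systemd/system/pdns.service.d/lxc.conf <<'EOF'
[Service]
PrivateTmp=false
EOF
systemctl daemon-reload
systemctl restart pdns
```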

Another systemd setting that can be an issue is:

NoNewPrivileges=false

Changing this to true seems to help.

Device access

This one came up when migrating an OpenVPN container, which needs access to /dev/net/tun to create its tunnels. Setting this up is easy, but different from OpenVZ.

You need to create a shell script that creates the required devices when the container starts, and the container's configuration needs to refer to this script. I use the following code, replacing CTID with the correct container id.

CTID=100
# Quote the heredoc delimiter so ${LXC_ROOTFS_MOUNT} is expanded when the
# hook runs, not when this script is written
cat - > /var/lib/lxc/$CTID/devices.sh <<'EOF'
#!/bin/bash
cd ${LXC_ROOTFS_MOUNT}/dev
mkdir net
mknod net/tun c 10 200
chmod 0666 net/tun
EOF
chmod +x /var/lib/lxc/$CTID/devices.sh
# Append (>>) both lines; a single > would overwrite the container config
echo "lxc.autodev: 1" >> /etc/pve/lxc/$CTID.conf
echo "lxc.hook.autodev: /var/lib/lxc/$CTID/devices.sh" >> /etc/pve/lxc/$CTID.conf

After this change the container needs to be (re)started.