- LXC and configuration check
- LXC networking
- Configuring defaults
- Unprivileged containers
- Creating containers
- Checking a container
- Container networking: DHCP, Firewall, NAT
- Create another container
For development purposes I'm creating a Docker Swarm/Kubernetes infrastructure. It will consist of two nodes running as full operating system LXC containers (i.e. running init). Once they are set up, I'll install Docker and Kubernetes in these nodes. The LXC containers will run Debian images.
There are plenty of tutorials on how to get LXC configured and running. However, I'm following the official Debian guide on LXC as it covers details specific to Debian and its kernel version/configuration.
The containers will run in unprivileged mode. This means that processes run as root in a container are mapped to a regular user on the main host. This is achieved with subordinate user ids (man 5 subuid).
The containers' network interfaces will be "Virtual Ethernet Devices" (man 4 veth). LXC will use a dedicated lxcbr0 bridge and all container interfaces will be added to this bridge. I opt for static IP addressing, so this will be configured in dnsmasq (keyed on fixed hardware (MAC) addresses).
LXC and configuration check
# apt install lxc libvirt0 libpam-cgfs bridge-utils uidmap
Depending on the system configuration, the above command may report missing cgroups.
In my case I had to update the GRUB configuration (kernel boot parameters), then regenerate the GRUB config and reboot:
# update-grub
# reboot
Once the system boots, check the configuration again (lxc-checkconfig reports the state of the relevant kernel options).
I'm configuring LXC to use a separate bridge interface. This is controlled via the lxc-net configuration.
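On Debian the lxc-net bridge is typically configured through /etc/default/lxc-net. The snippet below shows the stock variable names with the bridge enabled; treat the exact values as assumptions about this particular setup:

```
# /etc/default/lxc-net (main host)
USE_LXC_BRIDGE="true"
LXC_BRIDGE="lxcbr0"
LXC_ADDR="10.0.3.1"
LXC_NETMASK="255.255.255.0"
LXC_NETWORK="10.0.3.0/24"
```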
The bridge comes up after restarting lxc-net:
# /etc/init.d/lxc-net restart
Restarting lxc-net (via systemctl): lxc-net.service.
The bridge interface should now be present.
# ifconfig lxcbr0
lxcbr0: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
        inet 10.0.3.1  netmask 255.255.255.0  broadcast 10.0.3.255
        ether 00:16:3e:00:00:00  txqueuelen 1000  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0
The system-wide LXC configuration resides in LXC's default configuration file. In my case it is as follows:
lxc.net.0.type = veth
lxc.net.0.link = lxcbr0
lxc.net.0.flags = up
lxc.apparmor.profile = generated
lxc.apparmor.allow_nesting = 1
However, unprivileged containers are started as a regular user, which will have its own custom (but similar) LXC configuration. In the case of unprivileged containers the AppArmor profile has to be changed.
Adding a dedicated user
I'm adding a new user (lxcuser). At the moment I'm not interested in logging in as this user, so I'm setting the shell to /usr/sbin/nologin. This user account will only be available to root via su with a shell argument.
# useradd -s /usr/sbin/nologin -d /home/lxcuser --create-home lxcuser
Upon the creation of the new user account, a new uid and gid range is added to the system. Check:
# grep lxcuser /etc/subuid
lxcuser:231072:65536
# grep lxcuser /etc/subgid
lxcuser:231072:65536
The specific numbers depend on the other user accounts in the system. When there are no other user accounts, the first number will usually be 100000. I have other accounts in the system, so a different id range was allocated.
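The subordinate-id mapping is plain arithmetic: uid u inside the container shows up as uid base + u on the host. A minimal sketch of that arithmetic (host_uid is an illustrative helper, not an LXC command), assuming the lxcuser:231072:65536 range above:

```shell
# First host uid and size of the delegated range, mirroring the
# lxcuser:231072:65536 line from /etc/subuid (assumed values).
base=231072
count=65536

host_uid() {
    # Print the host uid corresponding to a container uid.
    u="$1"
    if [ "$u" -lt 0 ] || [ "$u" -ge "$count" ]; then
        echo "uid outside the delegated range" >&2
        return 1
    fi
    echo $((base + u))
}

host_uid 0      # container root -> 231072 on the host
host_uid 1000   # first regular container user -> 232072
```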
Allow the regular user to create virtual interfaces
# cat /etc/lxc/lxc-usernet
lxcuser veth lxcbr0 10
This allows lxcuser to create veth interfaces that will be added to lxcbr0 (at most 10 of them).
Unprivileged userns clone (sysctl)
If unprivileged user namespaces are not enabled, update sysctl.conf.
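On Debian kernels the switch is the kernel.unprivileged_userns_clone sysctl (a Debian-specific knob, enabled by default on recent releases; whether it needs changing on a given host is an assumption):

```
# /etc/sysctl.conf (main host)
kernel.unprivileged_userns_clone = 1
```

Apply the change with sysctl -p.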
LXC configuration (regular user)
Once the dedicated account has been created, switch to it using
# su - lxcuser -s /bin/bash
The custom per-user LXC configuration resides in ~/.config/lxc/default.conf. It is very similar to the system-wide defaults, but the AppArmor profile is changed and idmaps are added.
lxc.net.0.type = veth
lxc.net.0.link = lxcbr0
lxc.net.0.flags = up
lxc.apparmor.profile = unconfined
lxc.apparmor.allow_nesting = 1
lxc.idmap = u 0 231072 65535
lxc.idmap = g 0 231072 65535
For the uid/gid maps use the ids from /etc/subuid and /etc/subgid.
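The lxc.idmap lines follow the pattern documented in lxc.container.conf(5): the id type, the first id inside the container, the first id on the host, and the number of ids to map:

```
# lxc.idmap = <u|g> <first container id> <first host id> <count>
# u 0 231072 65535  =>  container uids 0-65534 map to host uids 231072-296606
```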
Creating containers
Switch to the dedicated user account:
# su - lxcuser -s /bin/bash
Create a container. If the lxc-create command fails with an "Unable to fetch gpg key from keyserver" message, the GPG keyserver has to be configured.
This can be set as an environment variable (DOWNLOAD_KEYSERVER), or the server address may be passed as an option.
$ lxc-create --template download \
    --name node-a \
    -- \
    --dist debian --release bookworm -a amd64 \
    --keyserver hkps://keyserver.ubuntu.com:443
Start the container:
$ lxc-start node-a
$ lxc-attach node-a
root@node-a:/#
root@node-a:/# grep NAME /etc/os-release
PRETTY_NAME="Debian GNU/Linux bookworm/sid"
NAME="Debian GNU/Linux"
Checking a container
This should be a full operating system container, so attach to it and confirm that init is running.
root@node-a:/# ps 1
  PID TTY      STAT   TIME COMMAND
    1 ?        Ss     0:00 /sbin/init
Processes run by root in a container should be mapped to lxcuser on the main host. Execute a command in the container and check the ps output on the main host.
root@node-a:/# sleep infinity
# ps waux | grep infinity
231072   19926  0.0  0.0   5416   676 pts/4    S+   15:03   0:00 sleep infinity
Indeed, although the process in a container runs as root, the process on the main host belongs to a regular user.
Container networking: DHCP, Firewall, NAT
If IPv6 support is not needed in a container, it can be turned off:
(root@container) # echo 'net.ipv6.conf.all.disable_ipv6=1' >> /etc/sysctl.conf
(root@container) # echo 'net.ipv6.conf.default.disable_ipv6=1' >> /etc/sysctl.conf
By default, container IP addresses are assigned by DHCP.
DHCP is served by dnsmasq on the main host. Static container IPs can be configured as well.
I choose to set MAC addresses for individual containers and add corresponding entries in dnsmasq.
Example: setting the MAC address for node-a in the container's configuration file:
lxc.net.0.hwaddr = 00:00:00:00:00:0a
Entries in dnsmasq (file /etc/dnsmasq.conf, main host):
domain-needed
bogus-priv
except-interface=wlan0
expand-hosts
dhcp-range=lxc,10.0.3.100,10.0.3.200,12h
dhcp-option=lxc,option:router,10.0.3.1
dhcp-host=00:00:00:00:00:0a,node-a,10.0.3.100
dhcp-host=00:00:00:00:00:0b,node-b,10.0.3.101
log-queries
log-dhcp
conf-dir=/etc/dnsmasq.d/,*.conf
Restart the container:
$ lxc-stop node-a
$ lxc-start node-a
$ lxc-ls -f
NAME   STATE   AUTOSTART GROUPS IPV4       IPV6 UNPRIVILEGED
node-a RUNNING 0         -      10.0.3.100 -    true
A permissive iptables setup (main host):
iptables -t filter -A INPUT -i lxcbr0 -j ACCEPT
iptables -t filter -A FORWARD -i lxcbr0 -j ACCEPT
iptables -t filter -A FORWARD -o lxcbr0 -j ACCEPT
iptables -t filter -A OUTPUT -o lxcbr0 -j ACCEPT
iptables -t nat -A POSTROUTING -s 10.0.3.0/24 ! -d 10.0.3.0/24 -j MASQUERADE
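The MASQUERADE rule only takes effect if IPv4 forwarding is enabled on the main host. This step is not shown above, so whether it is still needed here is an assumption:

```
# /etc/sysctl.conf (main host)
net.ipv4.ip_forward = 1
```

Apply with sysctl -p.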
Create another container
$ lxc-create --template download --name node-b \
    -- \
    --dist debian --release bookworm -a amd64 \
    --keyserver hkps://keyserver.ubuntu.com:443
Set the hardware address for the container's ethernet device in the container's configuration file:
[...]
# Network configuration
lxc.net.0.type = veth
lxc.net.0.link = lxcbr0
lxc.net.0.hwaddr = 00:00:00:00:00:0b
lxc.net.0.flags = up
[...]
Start and inspect the container.
$ lxc-start node-b
$ lxc-ls -f
NAME   STATE   AUTOSTART GROUPS IPV4       IPV6 UNPRIVILEGED
node-a RUNNING 0         -      10.0.3.100 -    true
node-b RUNNING 0         -      10.0.3.101 -    true
If the MAC address were different from the one configured in dnsmasq, the container would be assigned a different IP.
DNS for the containers is handled by dnsmasq, so the nodes should be able to resolve the hostnames.
Ping test: node-a -> node-b.
root@node-a:/# ping node-b -c 1
PING node-b (10.0.3.101) 56(84) bytes of data.
64 bytes from node-b (10.0.3.101): icmp_seq=1 ttl=64 time=0.128 ms

--- node-b ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.128/0.128/0.128/0.000 ms