.. SPDX-License-Identifier: GPL-2.0

#########
UML HowTo
#########

.. contents:: :local:

************
Introduction
************

Welcome to User Mode Linux.

User Mode Linux is the first Open Source virtualization platform (first
release date 1999) and the second virtualization platform for an x86 PC.

How is UML Different from a VM using Virtualization package X?
==============================================================

We have come to assume that virtualization also means some level of
hardware emulation. In fact, it does not. As long as a virtualization
package provides the OS with devices which the OS can recognize and
has a driver for, the devices do not need to emulate real hardware.
Most OSes today have built-in support for a number of "fake"
devices used only under virtualization.
User Mode Linux takes this concept to the ultimate extreme - there
is not a single real device in sight. It is 100% artificial or, to
use the correct term, 100% paravirtual. All UML devices are abstract
concepts which map onto something provided by the host - files, sockets,
pipes, etc.

The other major difference between UML and various virtualization
packages is that there is a distinct difference between the way the UML
kernel and the UML programs operate.
The UML kernel is just a process running on Linux - same as any other
program. It can be run by an unprivileged user and it does not require
any special CPU features.
The UML userspace, however, is a bit different. The Linux kernel on the
host machine assists UML in intercepting everything the program running
on a UML instance is trying to do and making the UML kernel handle all
of its requests.
This is different from other virtualization packages which do not make any
difference between the guest kernel and guest programs.
This difference results in a number of advantages and disadvantages of UML
compared to, say, QEMU, which we will cover later in this document.


Why Would I Want User Mode Linux?
=================================


* If the User Mode Linux kernel crashes, your host kernel is still fine. It
  is not accelerated in any way (vhost, kvm, etc.) and it is not trying to
  access any devices directly. It is, in fact, a process like any other.

* You can run a usermode kernel as a non-root user (you may need to
  arrange appropriate permissions for some devices).

* You can run a very small VM with a minimal footprint for a specific
  task (for example 32M or less).

* You can get extremely high performance for anything which is a "kernel
  specific task" such as forwarding, firewalling, etc. while still being
  isolated from the host kernel.

* You can play with kernel concepts without breaking things.

* You are not bound by "emulating" hardware, so you can try weird and
  wonderful concepts which are very difficult to support when emulating
  real hardware, such as time travel and making your system clock
  dependent on what UML does (very useful for things like tests).

* It's fun.

Why not to run UML
==================

* The syscall interception technique used by UML makes it inherently
  slower for any userspace applications. While it can do kernel tasks
  on par with most other virtualization packages, its userspace is
  **slow**. The root cause is that UML has a very high cost of creating
  new processes and threads (something most Unix/Linux applications
  take for granted).

* UML is strictly uniprocessor at present. If you want to run an
  application which needs many CPUs to function, it is clearly the
  wrong choice.

***********************
Building a UML instance
***********************

There is no UML installer in any distribution.
While you can use off-the-shelf install media to install into a blank VM
using a virtualization package, there is no UML equivalent. You have to use
appropriate tools on your host to build a viable filesystem image.

This is extremely easy on Debian - you can do it using debootstrap. It is
also easy on OpenWrt - the build process can build UML images. All other
distros - YMMV.

Creating an image
=================

Create a sparse raw disk image::

   # dd if=/dev/zero of=disk_image_name bs=1 count=1 seek=16G

This will create a 16G disk image. The host OS will initially allocate only
one block and will allocate more as UML writes to the image. As of kernel
version 4.19 UML fully supports TRIM (as usually used by flash drives).
Using TRIM inside the UML image, by specifying discard as a mount option
or by running ``tune2fs -o discard /dev/ubdXX``, will request UML to
return any unused blocks to the OS.

Create a filesystem on the disk image and mount it::

   # mkfs.ext4 ./disk_image_name && mount ./disk_image_name /mnt

This example uses ext4; any other filesystem such as ext3, btrfs, xfs,
jfs, etc. will work too.

Create a minimal OS installation on the mounted filesystem::

   # debootstrap buster /mnt http://deb.debian.org/debian

debootstrap does not set up the root password, fstab, hostname or
anything related to networking. It is up to the user to do that.

Set the root password - the easiest way to do that is to chroot into the
mounted image::

   # chroot /mnt
   # passwd
   # exit

Edit key system files
=====================

UML block devices are called ubds. The fstab created by debootstrap
will be empty and it needs an entry for the root file system::

   /dev/ubda   /   ext4    discard,errors=remount-ro  0       1

The image hostname will be set to the same name as the host on which you
are creating the image.
It is a good idea to change that (in ``/mnt/etc/hostname``) to avoid
"Oh, bummer, I rebooted the wrong machine".

UML supports vector I/O high performance network devices which have
support for some standard virtual network encapsulations like
Ethernet over GRE and Ethernet over L2TPv3. These are called vecX.

When vector network devices are in use, ``/etc/network/interfaces``
will need entries like::

   # vector UML network devices
   auto vec0
   iface vec0 inet dhcp

We now have a UML image which is nearly ready to run; all we need is a
UML kernel and modules for it.

Most distributions have a UML package. Even if you intend to use your own
kernel, testing the image with a stock one is always a good start. These
packages come with a set of modules which should be copied to the target
filesystem. The location is distribution dependent. For Debian these
reside under /usr/lib/uml/modules. Recursively copy the contents of this
directory to the mounted UML filesystem::

   # mkdir -p /mnt/lib/modules
   # cp -rax /usr/lib/uml/modules/* /mnt/lib/modules/

If you have compiled your own kernel, you need to use the usual "install
modules to a location" procedure by running::

   # make ARCH=um INSTALL_MOD_PATH=/mnt modules_install

This will install modules into /mnt/lib/modules/$(KERNELRELEASE).
To specify the full module installation path, use::

   # make ARCH=um MODLIB=/mnt/lib/modules modules_install

At this point the image is ready to be brought up.

*************************
Setting Up UML Networking
*************************

UML networking is designed to emulate an Ethernet connection. This
connection may be either point-to-point (similar to a connection
between machines using a back-to-back cable) or a connection to a
switch. UML supports a wide variety of means to build these
connections to all of: local machine, remote machine(s), local and
remote UML and other VM instances.

+-----------+--------+------------------------------------+------------+
| Transport | Type   | Capabilities                       | Throughput |
+===========+========+====================================+============+
| tap       | vector | checksum, tso                      | > 8Gbit    |
+-----------+--------+------------------------------------+------------+
| hybrid    | vector | checksum, tso, multipacket rx      | > 6Gbit    |
+-----------+--------+------------------------------------+------------+
| raw       | vector | checksum, tso, multipacket rx, tx  | > 6Gbit    |
+-----------+--------+------------------------------------+------------+
| EoGRE     | vector | multipacket rx, tx                 | > 3Gbit    |
+-----------+--------+------------------------------------+------------+
| EoL2TPv3  | vector | multipacket rx, tx                 | > 3Gbit    |
+-----------+--------+------------------------------------+------------+
| bess      | vector | multipacket rx, tx                 | > 3Gbit    |
+-----------+--------+------------------------------------+------------+
| fd        | vector | dependent on fd type               | varies     |
+-----------+--------+------------------------------------+------------+
| vde       | vector | dep. on VDE VNL: Virt.Net Locator  | varies     |
+-----------+--------+------------------------------------+------------+

* All transports which have tso and checksum offloads can deliver speeds
  approaching 10G on TCP streams.

* All transports which have multi-packet rx and/or tx can deliver pps
  rates of up to 1Mpps or more.

* GRE and L2TPv3 allow connections to all of: local machine, remote
  machines, remote network devices and remote UML instances.


Network configuration privileges
================================

The majority of the supported networking modes need ``root`` privileges.
For example, for vector transports, ``root`` privilege is required to fire
an ioctl to set up the tun interface and/or use raw sockets where needed.

This can be achieved by granting the user a particular capability instead
of running UML as root. In the case of vector transports, a user can add
the capability ``CAP_NET_ADMIN`` or ``CAP_NET_RAW`` to the UML binary.
After that, UML can be run with normal user privileges, along with
full networking.

For example::

   # setcap cap_net_raw,cap_net_admin+ep linux

Configuring vector transports
=============================

All vector transports support a similar syntax:

If X is the interface number as in vec0, vec1, vec2, etc., the general
syntax for options is::

   vecX:transport="Transport Name",option=value,option=value,...,option=value

Common options
--------------

These options are common for all transports:

* ``depth=int`` - sets the queue depth for vector IO. This is the
  number of packets UML will attempt to read or write in a single
  system call. The default number is 64 and is generally sufficient
  for most applications that need throughput in the 2-4 Gbit range.
  Higher speeds may require larger values.

* ``mac=XX:XX:XX:XX:XX:XX`` - sets the interface MAC address value.

* ``gro=[0,1]`` - sets GRO off or on. Enables receive/transmit offloads.
  The effect of this option depends on the host side support in the transport
  which is being configured. In most cases it will enable TCP segmentation and
  RX/TX checksumming offloads. The setting must be identical on the host side
  and the UML side. The UML kernel will produce warnings if it is not.
  For example, GRO is enabled by default on local machine interfaces
  (e.g. veth pairs, bridge, etc.), so it should be enabled in UML in the
  corresponding UML transports (raw, tap, hybrid) in order for networking to
  operate correctly.

* ``mtu=int`` - sets the interface MTU.

* ``headroom=int`` - adjusts the default headroom (32 bytes) reserved
  in case a packet needs to be re-encapsulated, for instance into VXLAN.

* ``vec=0`` - disables multipacket IO and falls back to packet-at-a-time
  mode.

Shared Options
--------------

* ``ifname=str`` - transports which bind to a local network interface
  have a shared option - the name of the interface to bind to.

* ``src, dst, src_port, dst_port`` - all transports which use sockets
  which have the notion of source and destination and/or source port
  and destination port use these to specify them.

* ``v6=[0,1]`` - specifies whether a v6 connection is desired for all
  transports which operate over IP. Additionally, for transports that
  have some differences in the way they operate over v4 and v6 (for example
  EoL2TPv3), it sets the correct mode of operation. In the absence of this
  option, the socket type is determined based on what the src and dst
  arguments resolve/parse to.

tap transport
-------------

Example::

   vecX:transport=tap,ifname=tap0,depth=128,gro=1

This will connect vecX to tap0 on the host. tap0 must already exist (for
example, created using tunctl) and be up.

tap0 can be configured as a point-to-point interface and given an IP
address so that UML can talk to the host. Alternatively, it is possible
to connect UML to a tap interface which is connected to a bridge.

While tap relies on the vector infrastructure, it is not a true vector
transport at this point, because Linux does not support multi-packet
IO on tap file descriptors for normal userspace apps like UML. This
is a privilege which is offered only to something which can hook up
to it at kernel level via specialized interfaces like vhost-net. A
vhost-net-like helper for UML is planned at some point in the future.
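
The host-side preparation for the tap transport can be sketched as a small
shell function. This is a minimal sketch, not part of the UML tools: the
function names ``setup_uml_tap`` and ``uml_tap_arg``, the ``DRY_RUN``
switch and the example address are hypothetical, and it uses iproute2's
``ip tuntap`` in place of tunctl to create a persistent tap interface
owned by the UML user::

   #!/bin/sh
   # Hypothetical helper: prepare a persistent tap interface for the UML
   # vector tap transport. With DRY_RUN set, the commands are printed
   # instead of executed (executing them needs root or CAP_NET_ADMIN).

   run() {
       if [ -n "$DRY_RUN" ]; then
           echo "$*"
       else
           "$@"
       fi
   }

   # setup_uml_tap IFNAME UML_USER HOST_ADDR/PREFIX
   setup_uml_tap() {
       ifname=$1
       uml_user=$2
       host_addr=$3

       # Persistent tap owned by the UML user (like "tunctl -u USER -t IFNAME").
       run ip tuntap add dev "$ifname" mode tap user "$uml_user" || return 1
       # Give the host end an address so that UML can talk to the host.
       run ip addr add "$host_addr" dev "$ifname" || return 1
       # The tap interface must be up before UML attaches to it.
       run ip link set dev "$ifname" up
   }

   # uml_tap_arg VEC_INDEX IFNAME: print the matching UML command line option.
   uml_tap_arg() {
       echo "vec$1:transport=tap,ifname=$2,depth=128,gro=1"
   }

For example, ``DRY_RUN=1 setup_uml_tap tap0 uml-user 192.168.5.1/30``
prints the three ``ip`` commands, and ``uml_tap_arg 0 tap0`` prints the
``vec0:transport=tap,...`` option to pass to UML.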

Privileges required: the tap transport requires either:

* the tap interface to exist and be created persistent and owned by the
  UML user using tunctl. Example: ``tunctl -u uml-user -t tap0``

* the binary to have the ``CAP_NET_ADMIN`` capability

hybrid transport
----------------

Example::

   vecX:transport=hybrid,ifname=tap0,depth=128,gro=1

This is an experimental/demo transport which couples tap for transmit
and a raw socket for receive. The raw socket allows multi-packet
receive, resulting in significantly higher packet rates than normal tap.

Privileges required: hybrid requires the ``CAP_NET_RAW`` capability for
the UML user as well as the requirements for the tap transport.

raw socket transport
--------------------

Example::

   vecX:transport=raw,ifname=p-veth0,depth=128,gro=1


This transport uses vector IO on raw sockets. While you can bind to any
interface including a physical one, the most common use is to bind to
the "peer" side of a veth pair with the other side configured on the
host.

Example host configuration for Debian:

**/etc/network/interfaces**::

   auto veth0
   iface veth0 inet static
      address 192.168.4.1
      netmask 255.255.255.252
      broadcast 192.168.4.3
      pre-up ip link add veth0 type veth peer name p-veth0 && \
         ip link set p-veth0 up

UML can now bind to p-veth0 like this::

   vec0:transport=raw,ifname=p-veth0,depth=128,gro=1


If the UML guest is configured with 192.168.4.2 and netmask 255.255.255.252,
it can talk to the host on 192.168.4.1.

The raw transport also provides some support for offloading some of the
filtering to the host. The two options to control it are:

* ``bpffile=str`` - filename of raw bpf code to be loaded as a socket filter

* ``bpfflash=int`` - 0/1: allow loading of bpf from inside User Mode Linux.
  This option allows the use of the ethtool load firmware command to
  load bpf code.

In either case the bpf code is loaded into the host kernel. While this is
presently limited to legacy bpf syntax (not ebpf), it is still a security
risk. It is not recommended to allow this unless the User Mode Linux
instance is considered trusted.

Privileges required: the raw socket transport requires the ``CAP_NET_RAW``
capability.

GRE socket transport
--------------------

Example::

   vecX:transport=gre,src=$src_host,dst=$dst_host


This will configure an Ethernet over ``GRE`` (aka ``GRETAP`` or
``GREIRB``) tunnel which will connect the UML instance to a ``GRE``
endpoint at host ``$dst_host``. ``GRE`` supports the following additional
options:

* ``rx_key=int`` - GRE 32-bit integer key for rx packets; if set,
  ``tx_key`` must be set too

* ``tx_key=int`` - GRE 32-bit integer key for tx packets; if set,
  ``rx_key`` must be set too

* ``sequence=[0,1]`` - enable GRE sequence

* ``pin_sequence=[0,1]`` - pretend that the sequence is always reset
  on each packet (needed to interoperate with some really broken
  implementations)

* ``v6=[0,1]`` - force IPv4 or IPv6 sockets respectively

* GRE checksum is not presently supported

GRE has a number of caveats:

* You can use only one GRE connection per IP address. There is no way to
  multiplex connections as each GRE tunnel is terminated directly on
  the UML instance.

* The key is not really a security feature. While it was intended as such,
  its "security" is laughable. It is, however, a useful feature to
  ensure that the tunnel is not misconfigured.
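
Because the GRE key is directional, the two ends must mirror each other:
the key the UML instance transmits (``tx_key``) is the key the host
expects to receive, and vice versa. The following minimal sketch prints a
matching pair of configurations; the function name ``gre_pair``, the
interface name gt0 and the key values are hypothetical, and it assumes
iproute2's ``ikey``/``okey`` parameters for a gretap device on the host::

   #!/bin/sh
   # gre_pair UML_SIDE_ADDR HOST_ADDR UML_RX_KEY UML_TX_KEY
   # UML_SIDE_ADDR is the address the UML GRE socket binds to (src),
   # HOST_ADDR is the remote GRE endpoint (dst).
   gre_pair() {
       uml_addr=$1
       host_addr=$2
       rx_key=$3
       tx_key=$4

       # rx_key and tx_key must be set together.
       if [ -z "$rx_key" ] || [ -z "$tx_key" ]; then
           echo "gre_pair: rx_key and tx_key must both be set" >&2
           return 1
       fi

       # UML side: command line option for the GRE vector transport.
       echo "vec0:transport=gre,src=$uml_addr,dst=$host_addr,rx_key=$rx_key,tx_key=$tx_key"
       # Host side: the input key is UML's tx_key, the output key is UML's rx_key.
       echo "ip link add gt0 type gretap local $host_addr remote $uml_addr ikey $tx_key okey $rx_key"
   }

With the addresses used in the host example below,
``gre_pair 192.168.129.1 192.168.128.1 100 200`` prints
``vec0:transport=gre,src=192.168.129.1,dst=192.168.128.1,rx_key=100,tx_key=200``
for UML and an ``ip link add gt0 type gretap ... ikey 200 okey 100`` command
for the host.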

An example configuration for a Linux host with a local address of
192.168.128.1 to connect to a UML instance at 192.168.129.1:

**/etc/network/interfaces**::

   auto gt0
   iface gt0 inet static
      address 10.0.0.1
      netmask 255.255.255.0
      broadcast 10.0.0.255
      mtu 1500
      pre-up ip link add gt0 type gretap local 192.168.128.1 \
         remote 192.168.129.1 || true
      down ip link del gt0 || true

GRE has also been tested against a variety of network equipment.

Privileges required: GRE requires ``CAP_NET_RAW``.

l2tpv3 socket transport
-----------------------

**Warning**: L2TPv3 has a "bug". It is the "bug" known as "has more
options than GNU ls". While it has some advantages, there are usually
easier (and less verbose) ways to connect a UML instance to something.
For example, most devices which support L2TPv3 also support GRE.

Example::

   vec0:transport=l2tpv3,udp=1,src=$src_host,dst=$dst_host,srcport=$src_port,dstport=$dst_port,depth=128,rx_session=0xffffffff,tx_session=0xffff

This will configure an Ethernet over L2TPv3 fixed tunnel which will
connect the UML instance to an L2TPv3 endpoint at host $dst_host using
the L2TPv3 UDP flavour and UDP destination port $dst_port.

L2TPv3 always requires the following additional options:

* ``rx_session=int`` - l2tpv3 32-bit integer session for rx packets

* ``tx_session=int`` - l2tpv3 32-bit integer session for tx packets

As the tunnel is fixed, these are not negotiated and they are
preconfigured on both ends.

Additionally, L2TPv3 supports the following optional parameters:

* ``rx_cookie=int`` - l2tpv3 32-bit integer cookie for rx packets - same
  functionality as GRE key, more to prevent misconfiguration than provide
  actual security

* ``tx_cookie=int`` - l2tpv3 32-bit integer cookie for tx packets

* ``cookie64=[0,1]`` - use 64-bit cookies instead of 32-bit.

* ``counter=[0,1]`` - enable l2tpv3 counter

* ``pin_counter=[0,1]`` - pretend that the counter is always reset on
  each packet (needed to interoperate with some really broken
  implementations)

* ``v6=[0,1]`` - force v6 sockets

* ``udp=[0,1]`` - use raw sockets (0) or UDP (1) version of the protocol

L2TPv3 has the following caveat:

* You can use only one connection per IP address in raw mode. There is
  no way to multiplex connections as each L2TPv3 tunnel is terminated
  directly on the UML instance. UDP mode can use different ports for
  this purpose.

Here is an example of how to configure a Linux host to connect to UML
via L2TPv3:

**/etc/network/interfaces**::

   auto l2tp1
   iface l2tp1 inet static
      address 192.168.126.1
      netmask 255.255.255.0
      broadcast 192.168.126.255
      mtu 1500
      pre-up ip l2tp add tunnel remote 127.0.0.1 \
         local 127.0.0.1 encap udp tunnel_id 2 \
         peer_tunnel_id 2 udp_sport 1706 udp_dport 1707 && \
         ip l2tp add session name l2tp1 tunnel_id 2 \
         session_id 0xffffffff peer_session_id 0xffffffff
      down ip l2tp del session tunnel_id 2 session_id 0xffffffff && \
         ip l2tp del tunnel tunnel_id 2


Privileges required: L2TPv3 requires ``CAP_NET_RAW`` for raw IP mode and
no special privileges for the UDP mode.

BESS socket transport
---------------------

BESS is a high performance modular network switch.

https://github.com/NetSys/bess

It has support for a simple sequential packet socket mode which in the
more recent versions is using vector IO for high performance.

Example::

   vecX:transport=bess,src=$unix_src,dst=$unix_dst

This will configure a BESS transport using the unix_src Unix domain
socket address as source and unix_dst socket address as destination.

For BESS configuration and how to allocate a BESS Unix domain socket port,
please see the BESS documentation.

https://github.com/NetSys/bess/wiki/Built-In-Modules-and-Ports

BESS transport does not require any special privileges.

VDE vector transport
--------------------

Virtual Distributed Ethernet (VDE) is a project whose main goal is to provide
highly flexible support for virtual networking.

http://wiki.virtualsquare.org/#/tutorials/vdebasics

Common usages of VDE include fast prototyping and teaching.

Examples:

* ``vecX:transport=vde,vnl=tap://tap0`` - use tap0

* ``vecX:transport=vde,vnl=slirp://`` - use slirp

* ``vec0:transport=vde,vnl=vde:///tmp/switch`` - connect to a vde switch

* ``vecX:transport=\"vde,vnl=cmd://ssh remote.host //tmp/sshlirp\"`` -
  connect to a remote slirp (instant VPN: convert ssh to VPN, it uses
  sshlirp): https://github.com/virtualsquare/sshlirp

* ``vec0:transport=vde,vnl=vxvde://234.0.0.1`` - connect to a local area
  cloud: all the UML nodes using the same multicast address, running on
  hosts in the same multicast domain (LAN), will be automagically
  connected together to a virtual LAN.

***********
Running UML
***********

This section assumes that either the user-mode-linux package from the
distribution or a custom built kernel has been installed on the host.

These add an executable called linux to the system. This is the UML
kernel. It can be run just like any other executable.
It will take most normal Linux kernel arguments as command line
arguments. Additionally, it will need some UML-specific arguments
in order to do something useful.

Arguments
=========

Mandatory Arguments
-------------------

* ``mem=int[K,M,G]`` - amount of memory. By default in bytes. It will
  also accept K, M or G qualifiers.

* ``ubdX[s,d,c,t]=`` - virtual disk specification.
  This is not really mandatory, but it is likely to be needed in nearly
  all cases so we can specify a root file system.
  The simplest possible image specification is the name of the image
  file for the filesystem (created using one of the methods described
  in `Creating an image`_).

  * UBD devices support copy on write (COW). The changes are kept in
    a separate file which can be discarded, allowing a rollback to the
    original pristine image. If COW is desired, the UBD image is
    specified as: ``cow_file,master_image``.
    Example: ``ubd0=Filesystem.cow,Filesystem.img``

  * UBD devices can be set to use synchronous IO. Any writes are
    immediately flushed to disk. This is done by adding ``s`` after
    the ``ubdX`` specification.

  * UBD performs some heuristics on devices specified as a single
    filename to make sure that a COW file has not been specified as
    the image. To turn them off, use the ``d`` flag after ``ubdX``.

  * The ``c`` flag after ``ubdX`` marks the device as shared between
    multiple UML instances and turns off file locking. This is
    appropriate for a cluster filesystem and inappropriate at almost
    all other times.

  * UBD supports TRIM - asking the Host OS to reclaim any unused
    blocks in the image. To turn it off, specify the ``t`` flag after
    ``ubdX``.

* ``root=`` - root device - most likely ``/dev/ubda`` (this is a Linux
  filesystem image)

Important Optional Arguments
----------------------------

If UML is run as "linux" with no extra arguments, it will try to start an
xterm for every console configured inside the image (up to 6 in most
Linux distributions). Each console is started inside an
xterm. This makes it nice and easy to use UML on a host with a GUI. It is,
however, the wrong approach if UML is to be used as a testing harness or run
in a text-only environment.

In order to change this behaviour we need to specify an alternative console
and wire it to one of the supported "line" channels. For this we need to map a
console to use something different from the default xterm.
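
Console arguments use the ``conX=channel_type:options[,channel_type:options]``
syntax described below, where a single channel serves both directions and a
pair means input first, output second. A minimal sketch of composing them
in a launcher script follows; the function names ``uml_con`` and
``headless_consoles`` are hypothetical::

   #!/bin/sh
   # uml_con INDEX IN_CHANNEL [OUT_CHANNEL]
   # An empty INDEX produces "con=...", the default for all consoles.
   uml_con() {
       if [ -n "$3" ]; then
           echo "con$1=$2,$3"
       else
           echo "con$1=$2"
       fi
   }

   # Text-only layout: all consoles off by default, console 0 output sent
   # to stderr, console 1 on the terminal's stdin/stdout.
   headless_consoles() {
       echo "$(uml_con '' null) $(uml_con 0 null fd:2) $(uml_con 1 fd:0 fd:1)"
   }

``headless_consoles`` produces ``con=null con0=null,fd:2 con1=fd:0,fd:1``,
the same console layout used in the example command line under
"Starting UML".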

Example which will divert console number 1 to stdin/stdout::

   con1=fd:0,fd:1

UML supports a wide variety of serial line channels which are specified using
the following syntax::

   conX=channel_type:options[,channel_type:options]


If the channel specification contains two parts separated by a comma, the
first one is input, the second one output.

* The null channel - discard all input or output. Example: ``con=null`` will
  set all consoles to null by default.

* The fd channel - use file descriptor numbers for input/output. Example:
  ``con1=fd:0,fd:1``.

* The port channel - start a telnet server on a TCP port number. Example:
  ``con1=port:4321``. The host must have /usr/sbin/in.telnetd (usually part of
  a telnetd package) and the port-helper from the UML utilities (see the
  information for the xterm channel below). UML will not boot until a client
  connects.

* The pty and pts channels - use system pty/pts.

* The tty channel - bind to an existing system tty. Example:
  ``con1=tty:/dev/tty8`` will make UML use the host's 8th console (usually
  unused).

* The xterm channel - this is the default - bring up an xterm on this channel
  and direct IO to it. Note that in order for xterm to work, the host must
  have the UML distribution package installed. This usually contains the
  port-helper and other utilities needed for UML to communicate with the xterm.
  Alternatively, these need to be compiled and installed from source. All
  options applicable to consoles also apply to UML serial lines which are
  presented as ttyS inside UML.

Starting UML
============

We can now run UML.
::

   # linux mem=2048M umid=TEST \
      ubd0=Filesystem.img \
      vec0:transport=tap,ifname=tap0,depth=128,gro=1 \
      root=/dev/ubda con=null con0=null,fd:2 con1=fd:0,fd:1

This will run an instance with 2048M of RAM and try to use the image file
called ``Filesystem.img`` as root. It will connect to the host using tap0.
All consoles except ``con0`` and ``con1`` will be disabled. ``con0`` will
write its output to standard error, and console 1 will use standard
input/output, making it appear in the same terminal in which UML was
started.

Logging in
==========

If you have not set up a password when generating the image, you will have to
shut down the UML instance, mount the image, chroot into it and set it - as
described in the `Creating an image`_ section. If the password is already set,
you can just log in.

The UML Management Console
==========================

In addition to managing the image from "the inside" using normal sysadmin tools,
it is possible to perform a number of low-level operations using the UML
management console. The UML management console is a low-level interface to the
kernel on a running UML instance, somewhat like the i386 SysRq interface. Since
there is a full-blown operating system under UML, there is much greater
flexibility possible than with the SysRq mechanism.

There are a number of things you can do with the mconsole interface:

* get the kernel version
* add and remove devices
* halt or reboot the machine
* send SysRq commands
* pause and resume the UML
* inspect processes running inside UML
* inspect UML internal /proc state

You need the mconsole client (``uml_mconsole``) which is a part of the UML
tools package available in most Linux distributions.

You also need ``CONFIG_MCONSOLE`` (under 'General Setup') enabled in the UML
kernel.
When you boot UML, you'll see a line like::

   mconsole initialized on /home/jdike/.uml/umlNJ32yL/mconsole

If you specify a unique machine id on the UML command line, e.g.
``umid=debian``, you'll see this::

   mconsole initialized on /home/jdike/.uml/debian/mconsole


That file is the socket that uml_mconsole will use to communicate with
UML. Run it with either the umid or the full path as its argument::

   # uml_mconsole debian

or::

   # uml_mconsole /home/jdike/.uml/debian/mconsole


You'll get a prompt, at which you can run one of these commands:

* version
* help
* halt
* reboot
* config
* remove
* sysrq
* cad
* stop
* go
* proc
* stack

version
-------

This command takes no arguments. It prints the UML version::

   (mconsole) version
   OK Linux OpenWrt 4.14.106 #0 Tue Mar 19 08:19:41 2019 x86_64


There are a couple of actual uses for this. It's a simple no-op which
can be used to check that a UML is running. It's also a way of
sending a device interrupt to the UML. UML mconsole is treated internally as
a UML device.

help
----

This command takes no arguments. It prints a short help screen with the
supported mconsole commands.


halt and reboot
---------------

These commands take no arguments. They shut the machine down immediately, with
no syncing of disks and no clean shutdown of userspace. So, they are
pretty close to crashing the machine::

   (mconsole) halt
   OK

config
------

"config" adds a new device to the virtual machine. This is supported
by most UML device drivers. It takes one argument, which is the
device to add, with the same syntax as the kernel command line::

   (mconsole) config ubd3=/home/jdike/incoming/roots/root_fs_debian22

remove
------

"remove" deletes a device from the system.
Its argument is just the name of the device to be removed. The device must
be idle in whatever sense the driver considers necessary. In the case of the
ubd driver, the removed block device must not be mounted, swapped on, or
otherwise open, and in the case of the network driver, the device must be
down::

   (mconsole) remove ubd3

sysrq
-----

This command takes one argument, which is a single letter. It calls the
generic kernel's SysRq driver, which does whatever is called for by
that argument. See the SysRq documentation in
Documentation/admin-guide/sysrq.rst in your favorite kernel tree to
see what letters are valid and what they do.

cad
---

This invokes the ``Ctrl-Alt-Del`` action in the running image. What exactly
this ends up doing is up to init, systemd, etc. Normally, it reboots the
machine.

stop
----

This puts the UML in a loop reading mconsole requests until a 'go'
mconsole command is received. This is very useful as a
debugging/snapshotting tool.

go
--

This resumes a UML after being paused by a 'stop' command. Note that
when the UML has resumed, TCP connections may have timed out and if
the UML is paused for a long period of time, crond might go a little
crazy, running all the jobs it didn't do earlier.

proc
----

This takes one argument - the name of a file in /proc which is printed
to the mconsole standard output.

stack
-----

This takes one argument - the pid number of a process. Its stack is
printed to standard output.

*******************
Advanced UML Topics
*******************

Sharing Filesystems between Virtual Machines
============================================

Don't attempt to share filesystems simply by booting two UMLs from the
same file. That's the same thing as booting two physical machines
from a shared disk. It will result in filesystem corruption.
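
The safe alternative, described in the next subsection, is to give every
instance a private copy-on-write layer on top of the shared image. As a
minimal sketch, the following prints one command line per instance; the
function name ``cow_instances``, the ``vmN`` naming scheme and the memory
size are hypothetical::

   #!/bin/sh
   # cow_instances MASTER_IMAGE COUNT
   # Each instance gets its own COW file (vmN.cow) over the shared master
   # image, and its own umid, so each also gets its own mconsole socket.
   cow_instances() {
       master=$1
       count=$2
       i=1
       while [ "$i" -le "$count" ]; do
           echo "linux mem=256M umid=vm$i ubd0=vm$i.cow,$master root=/dev/ubda"
           i=$((i + 1))
       done
   }

``cow_instances root_fs_debian_22 2`` prints two command lines that differ
only in their umid and COW file name; with this layering the master image
itself is only ever read.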

Using layered block devices
---------------------------

The way to share a filesystem between two virtual machines is to use
the copy-on-write (COW) layering capability of the ubd block driver.
Any changed blocks are stored in the private COW file, while reads come
from either device - the private one if the requested block is valid in
it, the shared one if not. Using this scheme, the majority of data
which is unchanged is shared between an arbitrary number of virtual
machines, each of which has a much smaller file containing the changes
that it has made. With a large number of UMLs booting from a large root
filesystem, this leads to huge savings in disk space.

Sharing file system data will also help performance, since the host will
be able to cache the shared data using a much smaller amount of memory,
so UML disk requests will be served from the host's memory rather than
its disks. There is a major caveat in doing this on multisocket NUMA
machines. On such hardware, running many UML instances with a shared
master image and COW changes may cause issues like NMIs from excessive
inter-socket traffic.

If you are running UML on high-end hardware like this, make sure to
bind UML to a set of logical CPUs residing on the same socket using the
``taskset`` command, or see the "Tuning UML" section below.

To add a copy-on-write layer to an existing block device file, simply
add the name of the COW file to the appropriate ubd switch::

   ubd0=root_fs_cow,root_fs_debian_22

where ``root_fs_cow`` is the private COW file and ``root_fs_debian_22`` is
the existing shared filesystem. The COW file need not exist. If it
doesn't, the driver will create and initialize it.

Disk Usage
----------

UML has TRIM support which will release any unused space in its disk
image files to the underlying OS.
Because this leaves the image files sparse, their apparent size does not
change; use either ``ls -ls`` or ``du`` to see how much space they
actually occupy.

COW validity
------------

Any changes to the master image will invalidate all COW files. If this
happens, UML will *NOT* automatically delete any of the COW files and
will refuse to boot. In this case the only solution is to either
restore the old image (including its last modified timestamp) or remove
all COW files, which will result in their recreation. Any changes in
the COW files will be lost.

Cows can moo - uml_moo : Merging a COW file with its backing file
-----------------------------------------------------------------

Depending on how you use UML and COW devices, it may be advisable to
merge the changes in the COW file into the backing file every once in
a while.

The utility that does this is ``uml_moo``. Its usage is::

   uml_moo COW_file new_backing_file

There's no need to specify the backing file since that information is
already in the COW file header. If you're paranoid, boot the new
merged file, and if you're happy with it, move it over the old backing
file.

``uml_moo`` creates a new backing file by default as a safety measure.
It also has a destructive merge option (``-d``) which will merge the COW
file directly into its current backing file. This is really only usable
when the backing file has only one COW file associated with it. If
there are multiple COWs associated with a backing file, a ``-d`` merge of
one of them will invalidate all of the others. However, it is
convenient if you're short of disk space, and it should also be
noticeably faster than a non-destructive merge.

``uml_moo`` is installed with the UML distribution packages and is
available as part of the UML utilities.

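
The COW read path and the merge described above can be illustrated with a toy, in-memory model. This is a minimal sketch under stated assumptions: ``struct cow_layer``, ``cow_read()``, ``cow_write()`` and ``cow_merge()`` are hypothetical names, blocks are tiny fixed-size arrays, and the per-block validity flags stand in for whatever the real COW file stores on disk. It is not the ubd driver's or ``uml_moo``'s actual code or file format.

```c
#include <stdbool.h>
#include <string.h>

#define BLOCK_SIZE 8	/* toy block size; real images use far larger blocks */
#define NR_BLOCKS  4

/* Hypothetical in-memory model of a private COW layer over a shared image. */
struct cow_layer {
	const char (*backing)[BLOCK_SIZE];	/* shared image, never written */
	char data[NR_BLOCKS][BLOCK_SIZE];	/* private copies of changed blocks */
	bool valid[NR_BLOCKS];			/* is the private copy valid? */
};

/* Read from the private copy if it is valid, otherwise from the shared image. */
const char *cow_read(const struct cow_layer *cow, int block)
{
	return cow->valid[block] ? cow->data[block] : cow->backing[block];
}

/* Writes only ever touch the private layer. */
void cow_write(struct cow_layer *cow, int block, const char *buf)
{
	memcpy(cow->data[block], buf, BLOCK_SIZE);
	cow->valid[block] = true;
}

/*
 * Non-destructive merge in the spirit of uml_moo's default mode: build a
 * new image from the old backing data plus every valid COW block.
 */
void cow_merge(const struct cow_layer *cow, char out[][BLOCK_SIZE])
{
	for (int i = 0; i < NR_BLOCKS; i++)
		memcpy(out[i], cow_read(cow, i), BLOCK_SIZE);
}
```

``cow_merge()`` only reads the shared backing data, like the default ``uml_moo`` mode. A destructive ``-d`` merge would instead copy the valid blocks into the backing image itself, which is why it invalidates every other COW file layered on that image.
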

Host file access
================

If you want to access files on the host machine from inside UML, you
can treat it as a separate machine and either NFS mount directories
from the host or copy files into the virtual machine with scp.
However, since UML is running on the host, it can access those
files just like any other process and make them available inside the
virtual machine without the need to use the network.
This is possible with the hostfs virtual filesystem. With it, you
can mount a host directory into the UML filesystem and access the
files contained in it just as you would on the host.

*SECURITY WARNING*

If hostfs is used without any parameters on the UML command line, the
image can mount any part of the host filesystem and write to it, with
the permissions of the user running UML. Always confine hostfs to a
specific "harmless" directory (for example ``/var/tmp``) when running
UML. This is especially important if UML is being run as root.

Using hostfs
------------

To begin with, make sure that hostfs is available inside the virtual
machine with::

   # cat /proc/filesystems

``hostfs`` should be listed. If it's not, either rebuild the kernel
with hostfs configured into it or make sure that hostfs is built as a
module and available inside the virtual machine, and insmod it.

Now all you need to do is run ``mount``. The following command mounts
the host's ``/`` on the virtual machine's ``/mnt/host``::

   # mount none /mnt/host -t hostfs

If you don't want to mount the host root directory, you can specify a
subdirectory to mount with the ``-o`` switch to mount. The following
command mounts the host's ``/home`` on the virtual machine's
``/mnt/home``::

   # mount none /mnt/home -t hostfs -o /home

hostfs as the root filesystem
-----------------------------

It's possible to boot from a directory hierarchy on the host using
hostfs rather than using the standard filesystem in a file.
To start, you need that hierarchy. The easiest way is to loop mount
an existing root_fs file::

   # mount root_fs uml_root_dir -o loop

You need to change the filesystem type of ``/`` in ``etc/fstab`` to be
'hostfs', so that line looks like this::

   /dev/ubd/0 / hostfs defaults 1 1

Then you need to change the ownership of all the files in that directory
that are owned by root to your own user. Run from inside
``uml_root_dir``, this worked for me::

   # find . -uid 0 -exec chown jdike {} \;

Next, make sure that your UML kernel has hostfs compiled in, not as a
module. Then run UML with the boot device pointing at that directory::

   ubd0=/path/to/uml/root/directory

UML should then boot as it does normally.

Hostfs Caveats
--------------

Hostfs does not keep track of changes made to the host filesystem
outside UML. As a result, if a file is changed without UML's
knowledge, UML will not know about it and its own in-memory cache of
the file may be corrupt. While it is possible to fix this, it is not
something which is being worked on at present.

Tuning UML
==========

UML at present is strictly uniprocessor. It will, however, spin up a
number of threads to handle various functions.

These threads are used by the UBD driver, SIGIO handling and the MMU
emulation. If the system is idle, these threads will be migrated to
other processors on an SMP host. This, unfortunately, will usually
result in LOWER performance because of all of the cache/memory
synchronization traffic between cores. As a result, UML will usually
benefit from being pinned to a single CPU, especially on a large
system. Pinning can make a difference of 5 times or more on some
benchmarks.

Similarly, on large multi-node NUMA systems UML will benefit if all of
its memory is allocated from the same NUMA node it will run on. The
OS will *NOT* do that by default.
To achieve this, the sysadmin needs to create a suitable tmpfs ramdisk
bound to a particular node and use it as the source for UML RAM
allocation. UML looks at the values of the ``TMPDIR``, ``TMP`` and
``TEMP`` environment variables to find that directory. If that fails,
it looks for shmfs mounted under ``/dev/shm``. If everything else fails,
it uses ``/tmp/`` regardless of the filesystem type used for it. In the
example below, ``X`` is the NUMA node and ``Y`` is a CPU on that node::

   mount -t tmpfs -o mpol=bind:X none /mnt/tmpfs-nodeX
   TEMP=/mnt/tmpfs-nodeX taskset -c Y linux options options options..

*******************************************
Contributing to UML and Developing with UML
*******************************************

UML is an excellent platform to develop new Linux kernel concepts -
filesystems, devices, virtualization, etc. It provides unrivalled
opportunities to create and test them without being constrained to
emulating specific hardware.

For example, do you want to see how Linux behaves with 4096 "proper"
network devices?

Not an issue with UML. At the same time, this is something which
is difficult with other virtualization packages - they are
constrained by the number of devices allowed on the hardware bus
they are trying to emulate (for example 16 on a PCI bus in qemu).

If you have something to contribute, such as a patch, a bugfix or a
new feature, please send it to ``linux-um@lists.infradead.org``.

Please follow all standard Linux patch guidelines, such as cc-ing the
relevant maintainers and running ``./scripts/checkpatch.pl`` on your
patch. For more details see ``Documentation/process/submitting-patches.rst``.

Note: the list does not accept HTML or attachments; all emails must
be formatted as plain text.

Developing always goes hand in hand with debugging.
First of all, you can always run UML under gdb, and the "Kernel
debugging" section below describes how to do that. That, however, is
not the only way to debug a Linux kernel. Quite often, adding tracing
statements and/or using UML specific approaches such as ptracing the
UML kernel process is significantly more informative.

Tracing UML
===========

When running, UML consists of a main kernel thread and a number of
helper threads. The ones of interest for tracing are NOT the ones
that are already ptraced by UML as a part of its MMU emulation.

The threads of interest are usually the first three visible in a ps
display. The one with the lowest PID number and using the most CPU is
usually the kernel thread. The other threads are the disk (ubd) device
helper thread and the SIGIO helper thread. Running strace on the kernel
thread usually results in the following picture::

   host$ strace -p 16566
   --- SIGIO {si_signo=SIGIO, si_code=POLL_IN, si_band=65} ---
   epoll_wait(4, [{EPOLLIN, {u32=3721159424, u64=3721159424}}], 64, 0) = 1
   epoll_wait(4, [], 64, 0) = 0
   rt_sigreturn({mask=[PIPE]}) = 16967
   ptrace(PTRACE_GETREGS, 16967, NULL, 0xd5f34f38) = 0
   ptrace(PTRACE_GETREGSET, 16967, NT_X86_XSTATE, [{iov_base=0xd5f35010, iov_len=832}]) = 0
   ptrace(PTRACE_GETSIGINFO, 16967, NULL, {si_signo=SIGTRAP, si_code=0x85, si_pid=16967, si_uid=0}) = 0
   ptrace(PTRACE_SETREGS, 16967, NULL, 0xd5f34f38) = 0
   ptrace(PTRACE_SETREGSET, 16967, NT_X86_XSTATE, [{iov_base=0xd5f35010, iov_len=2696}]) = 0
   ptrace(PTRACE_SYSEMU, 16967, NULL, 0) = 0
   --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_TRAPPED, si_pid=16967, si_uid=0, si_status=SIGTRAP, si_utime=65, si_stime=89} ---
   wait4(16967, [{WIFSTOPPED(s) && WSTOPSIG(s) == SIGTRAP | 0x80}], WSTOPPED|__WALL, NULL) = 16967
   ptrace(PTRACE_GETREGS, 16967, NULL, 0xd5f34f38) = 0
   ptrace(PTRACE_GETREGSET, 16967, NT_X86_XSTATE, [{iov_base=0xd5f35010, iov_len=832}]) = 0
   ptrace(PTRACE_GETSIGINFO, 16967, NULL, {si_signo=SIGTRAP, si_code=0x85, si_pid=16967, si_uid=0}) = 0
   timer_settime(0, 0, {it_interval={tv_sec=0, tv_nsec=0}, it_value={tv_sec=0, tv_nsec=2830912}}, NULL) = 0
   getpid() = 16566
   clock_nanosleep(CLOCK_MONOTONIC, 0, {tv_sec=1, tv_nsec=0}, NULL) = ? ERESTART_RESTARTBLOCK (Interrupted by signal)
   --- SIGALRM {si_signo=SIGALRM, si_code=SI_TIMER, si_timerid=0, si_overrun=0, si_value={int=1631716592, ptr=0x614204f0}} ---
   rt_sigreturn({mask=[PIPE]}) = -1 EINTR (Interrupted system call)

This is a typical picture from a mostly idle UML instance.

* The UML interrupt controller uses epoll; this is UML waiting for IO
  interrupts::

     epoll_wait(4, [{EPOLLIN, {u32=3721159424, u64=3721159424}}], 64, 0) = 1

* The sequence of ptrace calls is part of MMU emulation and running the
  UML userspace.
* ``timer_settime`` is part of the UML high res timer subsystem mapping
  timer requests from inside UML onto the host high resolution timers.
* ``clock_nanosleep`` is UML going into idle (similar to the way a PC
  will execute an ACPI idle).

As you can see, UML generates quite a bit of output even when idle. The
output can be very informative when observing IO. It shows the actual
IO calls, their arguments and return values.

Kernel debugging
================

You can run UML under gdb now, though it will not necessarily agree to
be started under it. If you are trying to track a runtime bug, it is
much better to attach gdb to a running UML instance and let UML run.

Assuming the same PID number as in the previous example, this would be::

   # gdb -p 16566

This will STOP the UML instance, so you must enter ``cont`` at the GDB
command line to let it continue. It may be a good idea to put these
commands into a gdb script and pass it to gdb as an argument (with
``gdb -x``).

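
Both ``strace -p`` and ``gdb -p`` need the PID of the UML kernel thread, which, as described under "Tracing UML", is usually the one using the most CPU. The sketch below assumes you already have the candidate PIDs from ``ps`` and compares them by reading the ``utime`` and ``stime`` fields (fields 14 and 15, in clock ticks) of ``/proc/<pid>/stat``, as documented in proc(5). The helper names ``stat_cpu_ticks()`` and ``pid_cpu_ticks()`` are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/*
 * Extract utime + stime (in clock ticks) from one line of /proc/<pid>/stat.
 * Field 2 is the command name in parentheses and may itself contain
 * spaces or ')', so parsing starts after the LAST ')'.
 * Returns 0 on success, -1 if the line cannot be parsed.
 */
int stat_cpu_ticks(const char *stat_line, unsigned long long *ticks)
{
	unsigned long long utime, stime;
	const char *p = strrchr(stat_line, ')');

	if (!p)
		return -1;
	/* skip fields 3-13 (state ... cmajflt), then read utime and stime */
	if (sscanf(p + 1,
		   " %*c %*d %*d %*d %*d %*d %*u %*llu %*llu %*llu %*llu %llu %llu",
		   &utime, &stime) != 2)
		return -1;
	*ticks = utime + stime;
	return 0;
}

/* Read /proc/<pid>/stat for a host PID; returns 0 and fills *ticks, or -1. */
int pid_cpu_ticks(int pid, unsigned long long *ticks)
{
	char path[64], line[1024];
	FILE *f;
	int ret = -1;

	snprintf(path, sizeof(path), "/proc/%d/stat", pid);
	f = fopen(path, "r");
	if (!f)
		return -1;
	if (fgets(line, sizeof(line), f))
		ret = stat_cpu_ticks(line, ticks);
	fclose(f);
	return ret;
}
```

Call ``pid_cpu_ticks()`` for each candidate PID and pick the one with the highest total. If the helpers show up as threads rather than separate processes on your host, the same format is available under ``/proc/<pid>/task/<tid>/stat``.
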

Developing Device Drivers
=========================

Nearly all UML drivers are monolithic. While it is possible to build a
UML driver as a kernel module, that limits it to functionality which is
purely in-kernel and not UML specific. The reason for this is that
in order to really leverage UML, one needs to write a piece of
userspace code which maps driver concepts onto actual userspace host
calls.

This forms the so-called "user" portion of the driver. While it can
reuse a lot of kernel concepts, it is generally just another piece of
userspace code. This portion needs some matching "kernel" code which
resides inside the UML image and which implements the Linux kernel part.

*Note: There are very few limitations in the way "kernel" and "user" interact.*

UML does not have a strictly defined kernel-to-host API. It does not
try to emulate a specific architecture or bus. UML's "kernel" and
"user" can share memory and code, and interact as needed to implement
whatever design the software developer has in mind. The only
limitations are purely technical. Because many functions and variables
have the same names in the kernel and in the host C library, the
developer should be careful about which headers and libraries they
refer to.

As a result, a lot of the userspace code consists of simple wrappers.
For example, ``os_close_file()`` is just a wrapper around ``close()``
which ensures that the userspace ``close()`` does not clash with
similarly named functions in the kernel part.

Using UML as a Test Platform
============================

UML is an excellent test platform for device driver development. As
with most things UML, "some user assembly may be required". It is
up to the user to build their emulation environment. UML at present
provides only the kernel infrastructure.

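
To make the "user"/"kernel" split described under "Developing Device Drivers" concrete, here is a minimal sketch of a driver's "user" portion. It assumes, as an illustration rather than a description of UML's real file layout, that host calls live in a file compiled against the host C library and report errors as negative errno values, so the "kernel" portion only ever sees plain integers. The ``*_sketch`` names are hypothetical; in the UML tree, ``os_close_file()`` is one of the real helpers of this kind.

```c
/*
 * "User" side of a hypothetical UML driver: compiled against host libc
 * headers only, never against kernel headers. The kernel side would see
 * just the prototypes, for example:
 *   int os_open_file_sketch(const char *path, int flags, int mode);
 */
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Open a host file; return the fd, or -errno so the caller never reads errno. */
int os_open_file_sketch(const char *path, int flags, int mode)
{
	int fd = open(path, flags, mode);

	return fd < 0 ? -errno : fd;
}

/* Write to a host fd; return the number of bytes written or -errno. */
int os_write_file_sketch(int fd, const void *buf, int len)
{
	ssize_t n = write(fd, buf, len);

	return n < 0 ? -errno : (int)n;
}

/* Thin wrapper so kernel-side code never calls libc close() directly. */
void os_close_file_sketch(int fd)
{
	close(fd);
}
```

The "kernel" portion would include only a header with these prototypes and no libc headers, so names such as ``close()`` or ``errno`` never collide with their kernel counterparts.
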

Part of the kernel infrastructure UML provides for testing is the
ability to load and parse flattened device tree (FDT) blobs as used on
Arm or Open Firmware platforms. These are supplied as an optional extra
argument on the kernel command line::

   dtb=filename

The device tree is loaded and parsed at boot time and is accessible to
drivers which query it. At present, this facility is intended solely
for development purposes. UML's own devices do not query the device
tree.

Security Considerations
-----------------------

Drivers or any new functionality should default to not
accepting arbitrary filenames, BPF code or other parameters
which can affect the host from inside the UML instance.
For example, specifying the socket used for IPC communication
between a driver and the host on the UML command line is OK
security-wise. Allowing it as a loadable module parameter
isn't.

If such functionality is desirable for a particular application
(e.g. loading BPF "firmware" for raw socket network transports),
it should be off by default and should be explicitly enabled
by a command line parameter at startup.

Even with this in mind, the level of isolation between UML
and the host is relatively weak. If the UML userspace is
allowed to load arbitrary kernel drivers, an attacker can
use this to break out of UML. Thus, if UML is used in
a production application, it is recommended that all modules
are loaded at boot and kernel module loading is disabled
afterwards.
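
Following the recommendation above, an init script or program inside the UML instance can disable module loading once all required modules are loaded. The sketch below writes ``1`` to the ``kernel.modules_disabled`` sysctl through ``/proc/sys/kernel/modules_disabled`` (the shell equivalent is ``sysctl -w kernel.modules_disabled=1``); once set, it cannot be cleared until the next boot. The helper name ``disable_module_loading()`` is hypothetical, and the path is a parameter only so that it can be tested against an ordinary file.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: disable further module loading by writing "1"
 * to the kernel.modules_disabled sysctl file given by path
 * (normally /proc/sys/kernel/modules_disabled).
 * Returns 0 on success or a negative errno value.
 */
int disable_module_loading(const char *path)
{
	int fd = open(path, O_WRONLY);
	ssize_t n;

	if (fd < 0)
		return -errno;
	n = write(fd, "1", 1);
	if (n != 1) {
		int err = n < 0 ? errno : EIO;	/* short write: report EIO */

		close(fd);
		return -err;
	}
	if (close(fd) < 0)
		return -errno;
	return 0;
}
```

Inside the instance, call it as root with ``disable_module_loading("/proc/sys/kernel/modules_disabled")``. If the UML kernel was built without module support, the file does not exist and the helper returns ``-ENOENT``; in that case there is nothing left to disable.
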