.. SPDX-License-Identifier: GPL-2.0

#########
UML HowTo
#########

.. contents:: :local:

************
Introduction
************

Welcome to User Mode Linux

User Mode Linux is the first Open Source virtualization platform (first
release date 1991) and second virtualization platform for an x86 PC.

How is UML Different from a VM using Virtualization package X?
==============================================================

We have come to assume that virtualization also means some level of
hardware emulation. In fact, it does not. As long as a virtualization
package provides the OS with devices which the OS can recognize and
has a driver for, the devices do not need to emulate real hardware.
Most OSes today have built-in support for a number of "fake"
devices used only under virtualization.
User Mode Linux takes this concept to the ultimate extreme - there
is not a single real device in sight. It is 100% artificial or, to
use the correct term, 100% paravirtual. All UML devices are abstract
concepts which map onto something provided by the host - files, sockets,
pipes, etc.

The other major difference between UML and various virtualization
packages is that there is a distinct difference between the way the UML
kernel and the UML programs operate.
The UML kernel is just a process running on Linux - same as any other
program. It can be run by an unprivileged user and it does not require
anything in terms of special CPU features.
The UML userspace, however, is a bit different. The Linux kernel on the
host machine assists UML in intercepting everything the program running
on a UML instance is trying to do and making the UML kernel handle all
of its requests.
This is different from other virtualization packages which do not make any
distinction between the guest kernel and guest programs.
This difference
results in a number of advantages and disadvantages of UML over, let's say,
QEMU, which we will cover later in this document.


Why Would I Want User Mode Linux?
=================================


* If the User Mode Linux kernel crashes, your host kernel is still fine. It
  is not accelerated in any way (vhost, kvm, etc) and it is not trying to
  access any devices directly. It is, in fact, a process like any other.

* You can run a usermode kernel as a non-root user (you may need to
  arrange appropriate permissions for some devices).

* You can run a very small VM with a minimal footprint for a specific
  task (for example 32M or less).

* You can get extremely high performance for anything which is a "kernel
  specific task" such as forwarding, firewalling, etc while still being
  isolated from the host kernel.

* You can play with kernel concepts without breaking things.

* You are not bound by "emulating" hardware, so you can try weird and
  wonderful concepts which are very difficult to support when emulating
  real hardware such as time travel and making your system clock
  dependent on what UML does (very useful for things like tests).

* It's fun.

Why not to run UML
==================

* The syscall interception technique used by UML makes it inherently
  slower for any userspace applications. While it can do kernel tasks
  on par with most other virtualization packages, its userspace is
  **slow**. The root cause is that UML has a very high cost of creating
  new processes and threads (something most Unix/Linux applications
  take for granted).

* UML is strictly uniprocessor at present. If you want to run an
  application which needs many CPUs to function, it is clearly the
  wrong choice.

***********************
Building a UML instance
***********************

There is no UML installer in any distribution.
While you can use off
the shelf install media to install into a blank VM using a virtualization
package, there is no UML equivalent. You have to use appropriate tools on
your host to build a viable filesystem image.

This is extremely easy on Debian - you can do it using debootstrap. It is
also easy on OpenWrt - the build process can build UML images. All other
distros - YMMV.

Creating an image
=================

Create a sparse raw disk image::

    # dd if=/dev/zero of=disk_image_name bs=1 count=1 seek=16G

This will create a 16G disk image. The OS will initially allocate only one
block and will allocate more as they are written by UML. As of kernel
version 4.19 UML fully supports TRIM (as usually used by flash drives).
Using TRIM inside the UML image by specifying discard as a mount option
or by running ``tune2fs -o discard /dev/ubdXX`` will request UML to
return any unused blocks to the OS.

Create a filesystem on the disk image and mount it::

    # mkfs.ext4 ./disk_image_name && mount ./disk_image_name /mnt

This example uses ext4; any other filesystem such as ext3, btrfs, xfs,
jfs, etc will work too.

Create a minimal OS installation on the mounted filesystem::

    # debootstrap buster /mnt http://deb.debian.org/debian

debootstrap does not set up the root password, fstab, hostname or
anything related to networking. It is up to the user to do that.

Set the root password - the easiest way to do that is to chroot into the
mounted image::

    # chroot /mnt
    # passwd
    # exit

Edit key system files
=====================

UML block devices are called ubds. The fstab created by debootstrap
will be empty and it needs an entry for the root file system::

    /dev/ubd0   /   ext4    discard,errors=remount-ro  0       1

The image hostname will be set to the same as the host on which you
are creating its image.
It is a good idea to change that to avoid
"Oh, bummer, I rebooted the wrong machine".

UML supports two classes of network devices. The older uml_net ones,
which are scheduled for obsoletion, are called ethX. The newer vector
IO devices are significantly faster and have support for some standard
virtual network encapsulations like Ethernet over GRE and Ethernet over
L2TPv3. These are called vecX.

Depending on which one is in use, ``/etc/network/interfaces`` will
need entries like::

    # legacy UML network devices
    auto eth0
    iface eth0 inet dhcp

    # vector UML network devices
    auto vec0
    iface vec0 inet dhcp

We now have a UML image which is nearly ready to run; all we need is a
UML kernel and modules for it.

Most distributions have a UML package. Even if you intend to use your own
kernel, testing the image with a stock one is always a good start. These
packages come with a set of modules which should be copied to the target
filesystem. The location is distribution dependent. For Debian these
reside under /usr/lib/uml/modules. Copy recursively the content of this
directory to the mounted UML filesystem::

    # cp -rax /usr/lib/uml/modules /mnt/lib/modules

If you have compiled your own kernel, you need to use the usual "install
modules to a location" procedure by running::

    # make INSTALL_MOD_PATH=/mnt modules_install

This will install modules into /mnt/lib/modules/$(KERNELRELEASE).
To specify the full module installation path, use::

    # make MODLIB=/mnt/lib/modules modules_install

At this point the image is ready to be brought up.

*************************
Setting Up UML Networking
*************************

UML networking is designed to emulate an Ethernet connection.
This
connection may be either point-to-point (similar to a connection
between machines using a back-to-back cable) or a connection to a
switch. UML supports a wide variety of means to build these
connections to all of: local machine, remote machine(s), local and
remote UML and other VM instances.


+-----------+--------+------------------------------------+------------+
| Transport | Type   | Capabilities                       | Throughput |
+===========+========+====================================+============+
| tap       | vector | checksum, tso                      | > 8Gbit    |
+-----------+--------+------------------------------------+------------+
| hybrid    | vector | checksum, tso, multipacket rx      | > 6Gbit    |
+-----------+--------+------------------------------------+------------+
| raw       | vector | checksum, tso, multipacket rx, tx  | > 6Gbit    |
+-----------+--------+------------------------------------+------------+
| EoGRE     | vector | multipacket rx, tx                 | > 3Gbit    |
+-----------+--------+------------------------------------+------------+
| Eol2tpv3  | vector | multipacket rx, tx                 | > 3Gbit    |
+-----------+--------+------------------------------------+------------+
| bess      | vector | multipacket rx, tx                 | > 3Gbit    |
+-----------+--------+------------------------------------+------------+
| fd        | vector | dependent on fd type               | varies     |
+-----------+--------+------------------------------------+------------+
| tuntap    | legacy | none                               | ~ 500Mbit  |
+-----------+--------+------------------------------------+------------+
| daemon    | legacy | none                               | ~ 450Mbit  |
+-----------+--------+------------------------------------+------------+
| socket    | legacy | none                               | ~ 450Mbit  |
+-----------+--------+------------------------------------+------------+
| ethertap  | legacy | obsolete                           | ~ 500Mbit  |
+-----------+--------+------------------------------------+------------+
| vde       | legacy | obsolete                           | ~ 500Mbit  |
+-----------+--------+------------------------------------+------------+

* All transports which have tso and checksum offloads can deliver speeds
  approaching 10G on TCP streams.

* All transports which have multi-packet rx and/or tx can deliver pps
  rates of up to 1Mpps or more.

* All legacy transports are generally limited to ~600-700Mbit and 0.05Mpps.

* GRE and L2TPv3 allow connections to all of: local machine, remote
  machines, remote network devices and remote UML instances.

* Socket allows connections only between UML instances.

* Daemon and bess require running a local switch. This switch may be
  connected to the host as well.


Network configuration privileges
================================

The majority of the supported networking modes need ``root`` privileges.
For example, in the legacy tuntap networking mode, users were required
to be part of the group associated with the tunnel device.

For newer network drivers like the vector transports, ``root`` privilege
is required to issue an ioctl to set up the tun interface and/or use
raw sockets where needed.

This can be achieved by granting the user a particular capability instead
of running UML as root. In case of vector transport, a user can add the
capability ``CAP_NET_ADMIN`` or ``CAP_NET_RAW`` to the uml binary.
Thenceforth, UML can be run with normal user privileges, along with
full networking.

For example::

    # setcap cap_net_raw,cap_net_admin+ep linux

Configuring vector transports
=============================

All vector transports support a similar syntax.

If X is the interface number as in vec0, vec1, vec2, etc, the general
syntax for options is::

    vecX:transport="Transport Name",option=value,option=value,...,option=value

Common options
--------------

These options are common for all transports:

* ``depth=int`` - sets the queue depth for vector IO. This is the
  number of packets UML will attempt to read or write in a single
  system call. The default number is 64 and is generally sufficient
  for most applications that need throughput in the 2-4 Gbit range.
  Higher speeds may require larger values.

* ``mac=XX:XX:XX:XX:XX:XX`` - sets the interface MAC address value.

* ``gro=[0,1]`` - sets GRO off or on. Enables receive/transmit offloads.
  The effect of this option depends on the host side support in the transport
  which is being configured. In most cases it will enable TCP segmentation and
  RX/TX checksumming offloads. The setting must be identical on the host side
  and the UML side. The UML kernel will produce warnings if it is not.
  For example, GRO is enabled by default on local machine interfaces
  (e.g. veth pairs, bridge, etc), so it should be enabled in UML in the
  corresponding UML transports (raw, tap, hybrid) in order for networking to
  operate correctly.

* ``mtu=int`` - sets the interface MTU

* ``headroom=int`` - adjusts the default headroom (32 bytes) reserved
  in case a packet needs to be re-encapsulated into, for instance, VXLAN.

* ``vec=0`` - disable multipacket IO and fall back to packet at a
  time mode

Shared Options
--------------

* ``ifname=str`` Transports which bind to a local network interface
  have a shared option - the name of the interface to bind to.

* ``src, dst, src_port, dst_port`` - all transports which use sockets
  which have the notion of source and destination and/or source port
  and destination port use these to specify them.

* ``v6=[0,1]`` to specify if a v6 connection is desired for all
  transports which operate over IP. Additionally, for transports that
  have some differences in the way they operate over v4 and v6 (for example
  EoL2TPv3), sets the correct mode of operation. In the absence of this
  option, the socket type is determined based on what the src and dst
  arguments resolve/parse to.

tap transport
-------------

Example::

    vecX:transport=tap,ifname=tap0,depth=128,gro=1

This will connect vecX to tap0 on the host. tap0 must already exist (for
example created using tunctl) and be UP.

tap0 can be configured as a point-to-point interface and given an IP
address so that UML can talk to the host. Alternatively, it is possible
to connect UML to a tap interface which is connected to a bridge.

While tap relies on the vector infrastructure, it is not a true vector
transport at this point, because Linux does not support multi-packet
IO on tap file descriptors for normal userspace apps like UML. This
is a privilege which is offered only to something which can hook up
to it at kernel level via specialized interfaces like vhost-net. A
vhost-net like helper for UML is planned at some point in the future.

Privileges required: tap transport requires either:

* tap interface to exist and be created persistent and owned by the
  UML user using tunctl. Example ``tunctl -u uml-user -t tap0``

* binary to have ``CAP_NET_ADMIN`` privilege

hybrid transport
----------------

Example::

    vecX:transport=hybrid,ifname=tap0,depth=128,gro=1

This is an experimental/demo transport which couples tap for transmit
and a raw socket for receive.
The raw socket allows multi-packet
receive resulting in significantly higher packet rates than normal tap.

Privileges required: hybrid requires ``CAP_NET_RAW`` capability by
the UML user as well as the requirements for the tap transport.

raw socket transport
--------------------

Example::

    vecX:transport=raw,ifname=p-veth0,depth=128,gro=1


This transport uses vector IO on raw sockets. While you can bind to any
interface including a physical one, the most common use is to bind to
the "peer" side of a veth pair with the other side configured on the
host.

Example host configuration for Debian:

**/etc/network/interfaces**::

    auto veth0
    iface veth0 inet static
        address 192.168.4.1
        netmask 255.255.255.252
        broadcast 192.168.4.3
        pre-up ip link add veth0 type veth peer name p-veth0 && \
          ifconfig p-veth0 up

UML can now bind to p-veth0 like this::

    vec0:transport=raw,ifname=p-veth0,depth=128,gro=1


If the UML guest is configured with 192.168.4.2 and netmask 255.255.255.0
it can talk to the host on 192.168.4.1.

The raw transport also provides some support for offloading some of the
filtering to the host. The two options to control it are:

* ``bpffile=str`` filename of raw bpf code to be loaded as a socket filter

* ``bpfflash=int`` 0/1 allow loading of bpf from inside User Mode Linux.
  This option allows the use of the ethtool load firmware command to
  load bpf code.

In either case the bpf code is loaded into the host kernel. While this is
presently limited to legacy bpf syntax (not ebpf), it is still a security
risk. It is not recommended to allow this unless the User Mode Linux
instance is considered trusted.

Privileges required: raw socket transport requires ``CAP_NET_RAW``
capability.
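
The Debian configuration above can also be reproduced by hand. The
following is a minimal sketch, assuming iproute2 is installed and the
function is called as root. The function name ``setup_veth`` is
hypothetical; the interface names and address are the ones used in the
example above::

    # Hypothetical helper: create a veth pair for the raw transport.
    # $1 = host-side interface, $2 = peer interface UML binds to,
    # $3 = host address in CIDR notation.
    setup_veth() {
        ip link add "$1" type veth peer name "$2" &&
        ip addr add "$3" dev "$1" &&
        ip link set "$1" up &&
        ip link set "$2" up
    }

    # Usage (as root): setup_veth veth0 p-veth0 192.168.4.1/30

Once the pair exists and both ends are up, UML can bind to the peer
side with ``ifname=p-veth0`` as shown earlier.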

GRE socket transport
--------------------

Example::

    vecX:transport=gre,src=$src_host,dst=$dst_host


This will configure an Ethernet over ``GRE`` (aka ``GRETAP`` or
``GREIRB``) tunnel which will connect the UML instance to a ``GRE``
endpoint at host dst_host. ``GRE`` supports the following additional
options:

* ``rx_key=int`` - GRE 32-bit integer key for rx packets, if set,
  ``tx_key`` must be set too

* ``tx_key=int`` - GRE 32-bit integer key for tx packets, if set
  ``rx_key`` must be set too

* ``sequence=[0,1]`` - enable GRE sequence

* ``pin_sequence=[0,1]`` - pretend that the sequence is always reset
  on each packet (needed to interoperate with some really broken
  implementations)

* ``v6=[0,1]`` - force IPv4 or IPv6 sockets respectively

* GRE checksum is not presently supported

GRE has a number of caveats:

* You can use only one GRE connection per IP address. There is no way to
  multiplex connections as each GRE tunnel is terminated directly on
  the UML instance.

* The key is not really a security feature. While it was intended as such
  its "security" is laughable. It is, however, a useful feature to
  ensure that the tunnel is not misconfigured.

An example configuration for a Linux host with a local address of
192.168.128.1 to connect to a UML instance at 192.168.129.1:

**/etc/network/interfaces**::

    auto gt0
    iface gt0 inet static
        address 10.0.0.1
        netmask 255.255.255.0
        broadcast 10.0.0.255
        mtu 1500
        pre-up ip link add gt0 type gretap local 192.168.128.1 \
               remote 192.168.129.1 || true
        down ip link del gt0 || true

Additionally, GRE has been tested against a variety of network equipment.

Privileges required: GRE requires ``CAP_NET_RAW``

l2tpv3 socket transport
-----------------------

*Warning*. L2TPv3 has a "bug".
It is the "bug" known as "has more
options than GNU ls". While it has some advantages, there are usually
easier (and less verbose) ways to connect a UML instance to something.
For example, most devices which support L2TPv3 also support GRE.

Example::

    vec0:transport=l2tpv3,udp=1,src=$src_host,dst=$dst_host,srcport=$src_port,dstport=$dst_port,depth=128,rx_session=0xffffffff,tx_session=0xffff

This will configure an Ethernet over L2TPv3 fixed tunnel which will
connect the UML instance to an L2TPv3 endpoint at host $dst_host using
the L2TPv3 UDP flavour and UDP destination port $dst_port.

L2TPv3 always requires the following additional options:

* ``rx_session=int`` - l2tpv3 32-bit integer session for rx packets

* ``tx_session=int`` - l2tpv3 32-bit integer session for tx packets

As the tunnel is fixed these are not negotiated and they are
preconfigured on both ends.

Additionally, L2TPv3 supports the following optional parameters:

* ``rx_cookie=int`` - l2tpv3 32-bit integer cookie for rx packets - same
  functionality as GRE key, more to prevent misconfiguration than provide
  actual security

* ``tx_cookie=int`` - l2tpv3 32-bit integer cookie for tx packets

* ``cookie64=[0,1]`` - use 64-bit cookies instead of 32-bit.

* ``counter=[0,1]`` - enable l2tpv3 counter

* ``pin_counter=[0,1]`` - pretend that the counter is always reset on
  each packet (needed to interoperate with some really broken
  implementations)

* ``v6=[0,1]`` - force v6 sockets

* ``udp=[0,1]`` - use raw sockets (0) or UDP (1) version of the protocol

L2TPv3 has a number of caveats:

* You can use only one connection per IP address in raw mode. There is
  no way to multiplex connections as each L2TPv3 tunnel is terminated
  directly on the UML instance. UDP mode can use different ports for
  this purpose.
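
Because the option string gets long, it can help to assemble it from
shell variables. The following is a minimal sketch that uses only the
options shown in the example above. The variable names, addresses,
ports and session value are illustrative assumptions, not defaults::

    # Illustrative values only; both tunnel ends must be configured to match.
    src_host=192.0.2.2 dst_host=192.0.2.1
    src_port=1707 dst_port=1706
    session=0xffffffff

    # UDP flavour of L2TPv3 with fixed, preconfigured sessions.
    vec0="vec0:transport=l2tpv3,udp=1"
    vec0="$vec0,src=$src_host,dst=$dst_host"
    vec0="$vec0,srcport=$src_port,dstport=$dst_port"
    vec0="$vec0,rx_session=$session,tx_session=$session,depth=128"

    echo "$vec0"

The resulting string is passed on the UML command line alongside the
other arguments.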

Here is an example of how to configure a Linux host to connect to UML
via L2TPv3:

**/etc/network/interfaces**::

    auto l2tp1
    iface l2tp1 inet static
        address 192.168.126.1
        netmask 255.255.255.0
        broadcast 192.168.126.255
        mtu 1500
        pre-up ip l2tp add tunnel remote 127.0.0.1 \
               local 127.0.0.1 encap udp tunnel_id 2 \
               peer_tunnel_id 2 udp_sport 1706 udp_dport 1707 && \
               ip l2tp add session name l2tp1 tunnel_id 2 \
               session_id 0xffffffff peer_session_id 0xffffffff
        down ip l2tp del session tunnel_id 2 session_id 0xffffffff && \
               ip l2tp del tunnel tunnel_id 2


Privileges required: L2TPv3 requires ``CAP_NET_RAW`` for raw IP mode and
no special privileges for the UDP mode.

BESS socket transport
---------------------

BESS is a high performance modular network switch.

https://github.com/NetSys/bess

It has support for a simple sequential packet socket mode which in the
more recent versions is using vector IO for high performance.

Example::

    vecX:transport=bess,src=$unix_src,dst=$unix_dst

This will configure a BESS transport using the unix_src Unix domain
socket address as source and unix_dst socket address as destination.

For BESS configuration and how to allocate a BESS Unix domain socket port
please see the BESS documentation.

https://github.com/NetSys/bess/wiki/Built-In-Modules-and-Ports

BESS transport does not require any special privileges.

Configuring Legacy transports
=============================

Legacy transports are now considered obsolete. Please use the vector
versions.

***********
Running UML
***********

This section assumes that either the user-mode-linux package from the
distribution or a custom built kernel has been installed on the host.

These add an executable called linux to the system. This is the UML
kernel.
It can be run just like any other executable.
It will take most normal Linux kernel arguments as command line
arguments. Additionally, it will need some UML-specific arguments
in order to do something useful.

Arguments
=========

Mandatory Arguments:
--------------------

* ``mem=int[K,M,G]`` - amount of memory. By default in bytes. It will
  also accept K, M or G qualifiers.

* ``ubdX[s,d,c,t]=`` virtual disk specification. This is not really
  mandatory, but it is likely to be needed in nearly all cases so we can
  specify a root file system.
  The simplest possible image specification is the name of the image
  file for the filesystem (created using one of the methods described
  in `Creating an image`_).

  * UBD devices support copy on write (COW). The changes are kept in
    a separate file which can be discarded allowing a rollback to the
    original pristine image. If COW is desired, the UBD image is
    specified as: ``cow_file,master_image``.
    Example: ``ubd0=Filesystem.cow,Filesystem.img``

  * UBD devices can be set to use synchronous IO. Any writes are
    immediately flushed to disk. This is done by adding ``s`` after
    the ``ubdX`` specification.

  * UBD performs some heuristics on devices specified as a single
    filename to make sure that a COW file has not been specified as
    the image. To turn them off, use the ``d`` flag after ``ubdX``.

  * UBD supports TRIM - asking the Host OS to reclaim any unused
    blocks in the image. To turn it off, specify the ``t`` flag after
    ``ubdX``.

* ``root=`` root device - most likely ``/dev/ubd0`` (this is a Linux
  filesystem image)

Important Optional Arguments
----------------------------

If UML is run as "linux" with no extra arguments, it will try to start an
xterm for every console configured inside the image (up to 6 in most
Linux distributions). Each console is started inside an
xterm.
This makes it nice and easy to use UML on a host with a GUI. It is,
however, the wrong approach if UML is to be used as a testing harness or run
in a text-only environment.

In order to change this behaviour we need to specify an alternative console
and wire it to one of the supported "line" channels. For this we need to map a
console to use something different from the default xterm.

Example which will divert console number 1 to stdin/stdout::

    con1=fd:0,fd:1

UML supports a wide variety of serial line channels which are specified using
the following syntax::

    conX=channel_type:options[,channel_type:options]


If the channel specification contains two parts separated by comma, the first
one is input, the second one output.

* The null channel - Discard all input or output. Example ``con=null`` will set
  all consoles to null by default.

* The fd channel - use file descriptor numbers for input/output. Example:
  ``con1=fd:0,fd:1``.

* The port channel - start a telnet server on TCP port number. Example:
  ``con1=port:4321``. The host must have /usr/sbin/in.telnetd (usually part of
  a telnetd package) and the port-helper from the UML utilities (see the
  information for the xterm channel below). UML will not boot until a client
  connects.

* The pty and pts channels - use system pty/pts.

* The tty channel - bind to an existing system tty. Example:
  ``con1=tty:/dev/tty8`` will make UML use the host 8th console (usually
  unused).

* The xterm channel - this is the default - bring up an xterm on this channel
  and direct IO to it. Note that in order for xterm to work, the host must
  have the UML distribution package installed. This usually contains the
  port-helper and other utilities needed for UML to communicate with the xterm.
  Alternatively, these need to be compiled and installed from source.
  All
  options applicable to consoles also apply to UML serial lines which are
  presented as ttyS inside UML.

Starting UML
============

We can now run UML::

    # linux mem=2048M umid=TEST \
     ubd0=Filesystem.img \
     vec0:transport=tap,ifname=tap0,depth=128,gro=1 \
     root=/dev/ubda con=null con0=null,fd:2 con1=fd:0,fd:1

This will run an instance with ``2048M RAM`` and try to use the image file
called ``Filesystem.img`` as root. It will connect to the host using tap0.
All consoles except ``con1`` will be disabled and console 1 will
use standard input/output making it appear in the same terminal it was
started in.

Logging in
============

If you have not set up a password when generating the image, you will have to
shut down the UML instance, mount the image, chroot into it and set it - as
described in the `Creating an image`_ section. If the password is already set,
you can just log in.

The UML Management Console
============================

In addition to managing the image from "the inside" using normal sysadmin tools,
it is possible to perform a number of low-level operations using the UML
management console. The UML management console is a low-level interface to the
kernel on a running UML instance, somewhat like the i386 SysRq interface. Since
there is a full-blown operating system under UML, there is much greater
flexibility possible than with the SysRq mechanism.

There are a number of things you can do with the mconsole interface:

* Get the kernel version
* Add and remove devices
* Halt or reboot the machine
* Send SysRq commands
* Pause and resume the UML
* Inspect processes running inside UML
* Inspect UML internal /proc state

You need the mconsole client (uml\_mconsole) which is a part of the UML
tools package available in most Linux distributions.

You also need ``CONFIG_MCONSOLE`` (under 'General Setup') enabled in the UML
kernel. When you boot UML, you'll see a line like::

    mconsole initialized on /home/jdike/.uml/umlNJ32yL/mconsole

If you specify a unique machine id on the UML command line, e.g.
``umid=debian``, you'll see this::

    mconsole initialized on /home/jdike/.uml/debian/mconsole


That file is the socket that uml_mconsole will use to communicate with
UML. Run it with either the umid or the full path as its argument::

    # uml_mconsole debian

or::

    # uml_mconsole /home/jdike/.uml/debian/mconsole


You'll get a prompt, at which you can run one of these commands:

* version
* help
* halt
* reboot
* config
* remove
* sysrq
* cad
* stop
* go
* proc
* stack

version
-------

This command takes no arguments. It prints the UML version::

    (mconsole) version
    OK Linux OpenWrt 4.14.106 #0 Tue Mar 19 08:19:41 2019 x86_64


There are a couple of actual uses for this. It's a simple no-op which
can be used to check that a UML is running. It's also a way of
sending a device interrupt to the UML. UML mconsole is treated internally as
a UML device.

help
----

This command takes no arguments. It prints a short help screen with the
supported mconsole commands.


halt and reboot
---------------

These commands take no arguments. They shut the machine down immediately, with
no syncing of disks and no clean shutdown of userspace. So, they are
pretty close to crashing the machine::

    (mconsole) halt
    OK

config
------

"config" adds a new device to the virtual machine. This is supported
by most UML device drivers.
It takes one argument, which is the
device to add, with the same syntax as the kernel command line::

    (mconsole) config ubd3=/home/jdike/incoming/roots/root_fs_debian22

remove
------

"remove" deletes a device from the system. Its argument is just the
name of the device to be removed. The device must be idle in whatever
sense the driver considers necessary. In the case of the ubd driver,
the removed block device must not be mounted, swapped on, or otherwise
open, and in the case of the network driver, the device must be down::

    (mconsole) remove ubd3

sysrq
-----

This command takes one argument, which is a single letter. It calls the
generic kernel's SysRq driver, which does whatever is called for by
that argument. See the SysRq documentation in
Documentation/admin-guide/sysrq.rst in your favorite kernel tree to
see what letters are valid and what they do.

cad
---

This invokes the ``Ctrl-Alt-Del`` action in the running image. What exactly
this ends up doing is up to init, systemd, etc. Normally, it reboots the
machine.

stop
----

This puts the UML in a loop reading mconsole requests until a 'go'
mconsole command is received. This is very useful as a
debugging/snapshotting tool.

go
--

This resumes a UML after being paused by a 'stop' command. Note that
when the UML has resumed, TCP connections may have timed out and if
the UML is paused for a long period of time, crond might go a little
crazy, running all the jobs it didn't do earlier.

proc
----

This takes one argument - the name of a file in /proc which is printed
to the mconsole standard output.

stack
-----

This takes one argument - the pid number of a process. Its stack is
printed to the mconsole standard output.
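
Scripting mconsole commands
---------------------------

The ``stop`` and ``go`` commands can be combined to copy a disk image
while the instance is paused. The following is a rough sketch, not a
tested procedure. It assumes that ``uml_mconsole`` accepts a command
after the umid and runs it non-interactively (check this against your
version of the UML utilities) and that GNU ``cp`` is available. The
function name and file names are hypothetical::

    # Hypothetical helper: pause a UML instance, copy its image, resume.
    # $1 = umid, $2 = image file, $3 = snapshot file.
    snapshot_uml() {
        uml_mconsole "$1" sysrq s || return 1   # ask the guest to sync disks
        uml_mconsole "$1" stop || return 1      # pause the instance
        cp --sparse=always "$2" "$3"            # copy while the guest is paused
        status=$?
        uml_mconsole "$1" go                    # resume even if the copy failed
        return "$status"
    }

    # Usage: snapshot_uml debian Filesystem.img Filesystem.snap

The SysRq sync is only a request, so the copy should be treated as
crash-consistent at best; for a clean copy, shut the instance down first.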

*******************
Advanced UML Topics
*******************

Sharing Filesystems between Virtual Machines
============================================

Don't attempt to share filesystems simply by booting two UMLs from the
same file. That's the same thing as booting two physical machines
from a shared disk. It will result in filesystem corruption.

Using layered block devices
---------------------------

The way to share a filesystem between two virtual machines is to use
the copy-on-write (COW) layering capability of the ubd block driver.
Any changed blocks are stored in the private COW file, while reads come
from either device - the private one if the requested block is valid in
it, the shared one if not. Using this scheme, the majority of data
which is unchanged is shared between an arbitrary number of virtual
machines, each of which has a much smaller file containing the changes
that it has made. With a large number of UMLs booting from a large root
filesystem, this leads to a huge disk space saving.

Sharing file system data will also help performance, since the host will
be able to cache the shared data using a much smaller amount of memory,
so UML disk requests will be served from the host's memory rather than
its disks. There is a major caveat in doing this on multisocket NUMA
machines. On such hardware, running many UML instances with a shared
master image and COW changes may cause issues like NMIs from excessive
inter-socket traffic.

If you are running UML on high-end hardware like this, make sure to
bind UML to a set of logical CPUs residing on the same socket using the
``taskset`` command or have a look at the "tuning" section.
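
One way to pick the CPU list for ``taskset`` is to filter the parsable
output of ``lscpu``. This is a minimal sketch: the helper name
``cpus_on_socket``, the socket number and the ``linux`` command line in the
usage comment are illustrative assumptions, not a recommended
configuration::

   #!/bin/sh
   # Hypothetical helper: read "CPU,SOCKET" pairs (the format printed by
   # "lscpu -p=CPU,SOCKET") on stdin and print the logical CPUs that
   # belong to the socket given as $1, as a comma-separated list.
   cpus_on_socket() {
       awk -F, -v s="$1" '!/^#/ && $2 == s { print $1 }' | paste -sd, -
   }

   # Example use (paths and options are placeholders):
   #   cpus=$(lscpu -p=CPU,SOCKET | cpus_on_socket 0)
   #   taskset -c "$cpus" ./linux ubd0=root_fs_cow,root_fs_debian_22

Keeping the filter separate from the ``lscpu`` call makes it easy to check
the selection on its own before starting any UML instance.
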

To add a copy-on-write layer to an existing block device file, simply
add the name of the COW file to the appropriate ubd switch::

   ubd0=root_fs_cow,root_fs_debian_22

where ``root_fs_cow`` is the private COW file and ``root_fs_debian_22`` is
the existing shared filesystem. The COW file need not exist. If it
doesn't, the driver will create and initialize it.

Disk Usage
----------

UML has TRIM support which will release any unused space in its disk
image files to the underlying OS. It is important to use either
``ls -ls`` or ``du`` to verify the actual file size.

COW validity
------------

Any changes to the master image will invalidate all COW files. If this
happens, UML will *NOT* automatically delete any of the COW files and
will refuse to boot. In this case the only solution is to either
restore the old image (including its last modified timestamp) or remove
all COW files which will result in their recreation. Any changes in
the COW files will be lost.

Cows can moo - uml_moo : Merging a COW file with its backing file
-----------------------------------------------------------------

Depending on how you use UML and COW devices, it may be advisable to
merge the changes in the COW file into the backing file every once in
a while.

The utility that does this is uml_moo. Its usage is::

   uml_moo COW_file new_backing_file


There's no need to specify the backing file since that information is
already in the COW file header. If you're paranoid, boot the new
merged file, and if you're happy with it, move it over the old backing
file.

``uml_moo`` creates a new backing file by default as a safety measure.
It also has a destructive merge option (``-d``) which will merge the COW
file directly into its current backing file. This is really only usable
when the backing file only has one COW file associated with it.
If there are multiple COWs associated with a backing file, a ``-d`` merge
of one of them will invalidate all of the others. However, it is
convenient if you're short of disk space, and it should also be
noticeably faster than a non-destructive merge.

``uml_moo`` is installed with the UML distribution packages and is
available as a part of UML utilities.

Host file access
================

If you want to access files on the host machine from inside UML, you
can treat it as a separate machine and either NFS-mount directories
from the host or copy files into the virtual machine with scp.
However, since UML is running on the host, it can access those
files just like any other process and make them available inside the
virtual machine without the need to use the network.
This is possible with the hostfs virtual filesystem. With it, you
can mount a host directory into the UML filesystem and access the
files contained in it just as you would on the host.

*SECURITY WARNING*

Hostfs without any parameters to the UML image will allow the image
to mount any part of the host filesystem and write to it. Always
confine hostfs to a specific "harmless" directory (for example ``/var/tmp``)
when running UML. This is especially important if UML is being run as root.

Using hostfs
------------

To begin with, make sure that hostfs is available inside the virtual
machine with::

   # cat /proc/filesystems

``hostfs`` should be listed. If it's not, either rebuild the kernel
with hostfs configured into it or make sure that hostfs is built as a
module and available inside the virtual machine, and insmod it.


Now all you need to do is run mount::

   # mount none /mnt/host -t hostfs

will mount the host's ``/`` on the virtual machine's ``/mnt/host``.
If you don't want to mount the host root directory, then you can
specify a subdirectory to mount with the ``-o`` switch to mount::

   # mount none /mnt/home -t hostfs -o /home

will mount the host's ``/home`` on the virtual machine's ``/mnt/home``.

hostfs as the root filesystem
-----------------------------

It's possible to boot from a directory hierarchy on the host using
hostfs rather than using the standard filesystem in a file.
To start, you need that hierarchy. The easiest way is to loop mount
an existing root_fs file::

   # mount root_fs uml_root_dir -o loop


You need to change the filesystem type of ``/`` in ``etc/fstab`` to be
'hostfs', so that line looks like this::

   /dev/ubd/0 / hostfs defaults 1 1

Then you need to chown to yourself all the files in that directory
that are owned by root. This worked for me::

   # find . -uid 0 -exec chown jdike {} \;

Next, make sure that your UML kernel has hostfs compiled in, not as a
module. Then run UML with the boot device pointing at that directory::

   ubd0=/path/to/uml/root/directory

UML should then boot as it does normally.

Hostfs Caveats
--------------

Hostfs does not support keeping track of host filesystem changes on the
host (outside UML). As a result, if a file is changed without UML's
knowledge, UML will not know about it and its own in-memory cache of
the file may be corrupt. While it is possible to fix this, it is not
something which is being worked on at present.

Tuning UML
==========

UML at present is strictly uniprocessor. It will, however, spin up a
number of threads to handle various functions.

The UBD driver, SIGIO and the MMU emulation do that. If the system is
idle, these threads will be migrated to other processors on an SMP host.
This, unfortunately, will usually result in LOWER performance because of
all of the cache/memory synchronization traffic between cores. As a
result, UML will usually benefit from being pinned on a single CPU,
especially on a large system. This can result in performance differences
of 5 times or higher on some benchmarks.

Similarly, on large multi-node NUMA systems UML will benefit if all of
its memory is allocated from the same NUMA node it will run on. The
OS will *NOT* do that by default. In order to do that, the sysadmin
needs to create a suitable tmpfs ramdisk bound to a particular node
and use that as the source for UML RAM allocation by specifying it
in one of the ``TMPDIR``, ``TMP`` or ``TEMP`` environment variables,
which UML consults for that purpose. If that fails, it will
look for shmfs mounted under ``/dev/shm``. If everything else fails, it
will use ``/tmp/`` regardless of the filesystem type used for it::

   mount -t tmpfs -ompol=bind:X none /mnt/tmpfs-nodeX
   TEMP=/mnt/tmpfs-nodeX taskset -cX linux options options options..

*******************************************
Contributing to UML and Developing with UML
*******************************************

UML is an excellent platform to develop new Linux kernel concepts -
filesystems, devices, virtualization, etc. It provides unrivalled
opportunities to create and test them without being constrained to
emulating specific hardware.

Example - want to try how Linux will work with 4096 "proper" network
devices?

Not an issue with UML. At the same time, this is something which
is difficult with other virtualization packages - they are
constrained by the number of devices allowed on the hardware bus
they are trying to emulate (for example 16 on a PCI bus in qemu).

If you have something to contribute such as a patch, a bugfix, a
new feature, please send it to ``linux-um@lists.infradead.org``.

Please follow all standard Linux patch guidelines such as cc-ing
relevant maintainers and running ``./scripts/checkpatch.pl`` on your patch.
For more details see ``Documentation/process/submitting-patches.rst``.

Note - the list does not accept HTML or attachments; all emails must
be formatted as plain text.

Developing always goes hand in hand with debugging. First of all,
you can always run UML under gdb and there will be a whole section
later on how to do that. That, however, is not the only way to
debug a Linux kernel. Quite often adding tracing statements and/or
using UML specific approaches such as ptracing the UML kernel process
are significantly more informative.

Tracing UML
===========

When running, UML consists of a main kernel thread and a number of
helper threads. The ones of interest for tracing are NOT the ones
that are already ptraced by UML as a part of its MMU emulation.

These are usually the first three threads visible in a ps display.
The one with the lowest PID number and using most CPU is usually the
kernel thread. The other threads are the disk
(ubd) device helper thread and the SIGIO helper thread.
Running strace on the kernel thread usually results in the following picture::

   host$ strace -p 16566
   --- SIGIO {si_signo=SIGIO, si_code=POLL_IN, si_band=65} ---
   epoll_wait(4, [{EPOLLIN, {u32=3721159424, u64=3721159424}}], 64, 0) = 1
   epoll_wait(4, [], 64, 0) = 0
   rt_sigreturn({mask=[PIPE]}) = 16967
   ptrace(PTRACE_GETREGS, 16967, NULL, 0xd5f34f38) = 0
   ptrace(PTRACE_GETREGSET, 16967, NT_X86_XSTATE, [{iov_base=0xd5f35010, iov_len=832}]) = 0
   ptrace(PTRACE_GETSIGINFO, 16967, NULL, {si_signo=SIGTRAP, si_code=0x85, si_pid=16967, si_uid=0}) = 0
   ptrace(PTRACE_SETREGS, 16967, NULL, 0xd5f34f38) = 0
   ptrace(PTRACE_SETREGSET, 16967, NT_X86_XSTATE, [{iov_base=0xd5f35010, iov_len=2696}]) = 0
   ptrace(PTRACE_SYSEMU, 16967, NULL, 0) = 0
   --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_TRAPPED, si_pid=16967, si_uid=0, si_status=SIGTRAP, si_utime=65, si_stime=89} ---
   wait4(16967, [{WIFSTOPPED(s) && WSTOPSIG(s) == SIGTRAP | 0x80}], WSTOPPED|__WALL, NULL) = 16967
   ptrace(PTRACE_GETREGS, 16967, NULL, 0xd5f34f38) = 0
   ptrace(PTRACE_GETREGSET, 16967, NT_X86_XSTATE, [{iov_base=0xd5f35010, iov_len=832}]) = 0
   ptrace(PTRACE_GETSIGINFO, 16967, NULL, {si_signo=SIGTRAP, si_code=0x85, si_pid=16967, si_uid=0}) = 0
   timer_settime(0, 0, {it_interval={tv_sec=0, tv_nsec=0}, it_value={tv_sec=0, tv_nsec=2830912}}, NULL) = 0
   getpid() = 16566
   clock_nanosleep(CLOCK_MONOTONIC, 0, {tv_sec=1, tv_nsec=0}, NULL) = ? ERESTART_RESTARTBLOCK (Interrupted by signal)
   --- SIGALRM {si_signo=SIGALRM, si_code=SI_TIMER, si_timerid=0, si_overrun=0, si_value={int=1631716592, ptr=0x614204f0}} ---
   rt_sigreturn({mask=[PIPE]}) = -1 EINTR (Interrupted system call)

This is a typical picture from a mostly idle UML instance.

* UML interrupt controller uses epoll - this is UML waiting for IO
  interrupts::

     epoll_wait(4, [{EPOLLIN, {u32=3721159424, u64=3721159424}}], 64, 0) = 1

* The sequence of ptrace calls is part of MMU emulation and running the
  UML userspace.
* ``timer_settime`` is part of the UML high res timer subsystem mapping
  timer requests from inside UML onto the host high resolution timers.
* ``clock_nanosleep`` is UML going into idle (similar to the way a PC
  will execute an ACPI idle).

As you can see UML will generate quite a bit of output even in idle. The output
can be very informative when observing IO. It shows the actual IO calls, their
arguments and return values.

Kernel debugging
================

You can run UML under gdb now, though it will not necessarily agree to
be started under it. If you are trying to track a runtime bug, it is
much better to attach gdb to a running UML instance and let UML run.

Assuming the same PID number as in the previous example, this would be::

   # gdb -p 16566

This will STOP the UML instance, so you must enter ``cont`` at the GDB
command line to request it to continue. It may be a good idea to make
this into a gdb script and pass it to gdb as an argument.

Developing Device Drivers
=========================

Nearly all UML drivers are monolithic. While it is possible to build a
UML driver as a kernel module, that limits the possible functionality
to in-kernel only and non-UML specific. The reason for this is that
in order to really leverage UML, one needs to write a piece of
userspace code which maps driver concepts onto actual userspace host
calls.

This forms the so-called "user" portion of the driver. While it can
reuse a lot of kernel concepts, it is generally just another piece of
userspace code.
This portion needs some matching "kernel" code which
resides inside the UML image and which implements the Linux kernel part.

*Note: There are very few limitations in the way "kernel" and "user" interact*.

UML does not have a strictly defined kernel-to-host API. It does not
try to emulate a specific architecture or bus. UML's "kernel" and
"user" can share memory, code and interact as needed to implement
whatever design the software developer has in mind. The only
limitations are purely technical. Due to a lot of functions and
variables having the same names, the developer should be careful
which includes and libraries they are trying to refer to.

As a result a lot of userspace code consists of simple wrappers.
E.g. ``os_close_file()`` is just a wrapper around ``close()``
which ensures that the userspace function close does not clash
with similarly named function(s) in the kernel part.

Using UML as a Test Platform
============================

UML is an excellent test platform for device driver development. As
with most things UML, "some user assembly may be required". It is
up to the user to build their emulation environment. UML at present
provides only the kernel infrastructure.

Part of this infrastructure is the ability to load and parse flattened
device tree (FDT) blobs as used in Arm or Open Firmware platforms. These
are supplied as an optional extra argument to the kernel command
line::

   dtb=filename

The device tree is loaded and parsed at boot time and is accessible by
drivers which query it. At this moment in time this facility is
intended solely for development purposes. UML's own devices do not
query the device tree.
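
As an illustration of the ``dtb=`` argument, a minimal device tree could be
written and compiled with the ``dtc`` device tree compiler before being
passed to UML. The node name, the ``compatible`` string, the helper name
``write_example_dts`` and the file names below are made up for the example;
a real test setup would use whatever the driver under test expects::

   #!/bin/sh
   # Write a minimal, made-up device tree source to the file named in $1.
   write_example_dts() {
       cat > "$1" <<'EOF'
   /dts-v1/;

   / {
       test-device {
           compatible = "example,test-device";
       };
   };
   EOF
   }

   # Example use (requires dtc; file names are placeholders):
   #   write_example_dts test.dts
   #   dtc -I dts -O dtb -o test.dtb test.dts
   #   ./linux ubd0=root_fs dtb=test.dtb
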

Security Considerations
-----------------------

Drivers or any new functionality should default to not
accepting arbitrary filenames, BPF code or other parameters
which can affect the host from inside the UML instance.
For example, specifying the socket used for IPC communication
between a driver and the host at the UML command line is OK
security-wise. Allowing it as a loadable module parameter
isn't.

If such functionality is desirable for a particular application
(e.g. loading BPF "firmware" for raw socket network transports),
it should be off by default and should be explicitly turned on
as a command line parameter at startup.

Even with this in mind, the level of isolation between UML
and the host is relatively weak. If the UML userspace is
allowed to load arbitrary kernel drivers, an attacker can
use this to break out of UML. Thus, if UML is used in
a production application, it is recommended that all modules
are loaded at boot and kernel module loading is disabled
afterwards.

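
On the guest side, one way to follow this recommendation is the
``kernel.modules_disabled`` sysctl, which blocks further module loading
until the next reboot once it is set to 1. The sketch below is meant for a
late boot script inside UML. The function name and the module names in the
usage comment are placeholders, and the optional path argument exists only
so that the helper can be exercised without root::

   #!/bin/sh
   # Hypothetical helper: disable further kernel module loading by writing
   # 1 to the modules_disabled sysctl file (or to the file given as $1).
   disable_module_loading() {
       echo 1 > "${1:-/proc/sys/kernel/modules_disabled}"
   }

   # Example use inside the guest, as root (module names are placeholders):
   #   modprobe some_module
   #   modprobe another_module
   #   disable_module_loading
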