.. SPDX-License-Identifier: GPL-2.0

#########
UML HowTo
#########

.. contents:: :local:

************
Introduction
************

Welcome to User Mode Linux

User Mode Linux is the first Open Source virtualization platform (first
release date 1991) and second virtualization platform for an x86 PC.

How is UML Different from a VM using Virtualization package X?
==============================================================

We have come to assume that virtualization also means some level of
hardware emulation. In fact, it does not. As long as a virtualization
package provides the OS with devices which the OS can recognize and
has a driver for, the devices do not need to emulate real hardware.
Most OSes today have built-in support for a number of "fake"
devices used only under virtualization.
User Mode Linux takes this concept to the ultimate extreme - there
is not a single real device in sight. It is 100% artificial or, if
we use the correct term, 100% paravirtual. All UML devices are abstract
concepts which map onto something provided by the host - files, sockets,
pipes, etc.

The other major difference between UML and various virtualization
packages is that there is a distinct difference between the way the UML
kernel and the UML programs operate.
The UML kernel is just a process running on Linux - same as any other
program. It can be run by an unprivileged user and it does not require
anything in terms of special CPU features.
The UML userspace, however, is a bit different. The Linux kernel on the
host machine assists UML in intercepting everything the program running
on a UML instance is trying to do and making the UML kernel handle all
of its requests.
This is different from other virtualization packages, which do not
distinguish between the guest kernel and guest programs. This difference
results in a number of advantages and disadvantages of UML compared to,
say, QEMU, which we will cover later in this document.


Why Would I Want User Mode Linux?
=================================


* If the User Mode Linux kernel crashes, your host kernel is still fine.
  It is not accelerated in any way (vhost, kvm, etc) and it is not trying
  to access any devices directly.  It is, in fact, a process like any other.

* You can run a usermode kernel as a non-root user (you may need to
  arrange appropriate permissions for some devices).

* You can run a very small VM with a minimal footprint for a specific
  task (for example 32M or less).

* You can get extremely high performance for anything which is a "kernel
  specific task" such as forwarding, firewalling, etc while still being
  isolated from the host kernel.

* You can play with kernel concepts without breaking things.

* You are not bound by "emulating" hardware, so you can try weird and
  wonderful concepts which are very difficult to support when emulating
  real hardware such as time travel and making your system clock
  dependent on what UML does (very useful for things like tests).

* It's fun.

Why not to run UML
==================

* The syscall interception technique used by UML makes it inherently
  slower for any userspace applications. While it can do kernel tasks
  on par with most other virtualization packages, its userspace is
  **slow**. The root cause is that UML has a very high cost of creating
  new processes and threads (something most Unix/Linux applications
  take for granted).

* UML is strictly uniprocessor at present. If you want to run an
  application which needs many CPUs to function, it is clearly the
  wrong choice.

***********************
Building a UML instance
***********************

There is no UML installer in any distribution. While you can use off
the shelf install media to install into a blank VM using a virtualization
package, there is no UML equivalent. You have to use appropriate tools on
your host to build a viable filesystem image.

This is extremely easy on Debian - you can do it using debootstrap. It is
also easy on OpenWRT - the build process can build UML images. All other
distros - YMMV.

Creating an image
=================

Create a sparse raw disk image::

   # dd if=/dev/zero of=disk_image_name bs=1 count=1 seek=16G

This will create a 16G disk image. The OS will initially allocate only one
block and will allocate more as they are written by UML. As of kernel
version 4.19 UML fully supports TRIM (as usually used by flash drives).
Using TRIM inside the UML image by specifying discard as a mount option
or by running ``tune2fs -o discard /dev/ubdXX`` will request UML to
return any unused blocks to the OS.

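
The sparse allocation described above can be observed from the host by
comparing the apparent size of the image with the space it actually
occupies. This is a minimal sketch; it assumes the image was created as
``disk_image_name`` by the ``dd`` command above and that GNU ``ls`` and
``du`` are available::

   # Apparent size of the file (the disk size UML will see)
   ls -lh disk_image_name
   # Space actually allocated on the host; grows as UML writes blocks
   du -h disk_image_name
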
Create a filesystem on the disk image and mount it::

   # mkfs.ext4 ./disk_image_name && mount ./disk_image_name /mnt

This example uses ext4; any other filesystem such as ext3, btrfs, xfs,
jfs, etc. will work too.

Create a minimal OS installation on the mounted filesystem::

   # debootstrap buster /mnt http://deb.debian.org/debian

debootstrap does not set up the root password, fstab, hostname or
anything related to networking. It is up to the user to do that.

Set the root password - the easiest way to do that is to chroot into the
mounted image::

   # chroot /mnt
   # passwd
   # exit

Edit key system files
=====================

UML block devices are called ubds. The fstab created by debootstrap
will be empty and it needs an entry for the root file system::

   /dev/ubd0   /       ext4    discard,errors=remount-ro  0       1

The image hostname will be set to the same as the host on which you
are creating the image. It is a good idea to change that to avoid
"Oh, bummer, I rebooted the wrong machine".

UML supports vector I/O high performance network devices which have
support for some standard virtual network encapsulations like
Ethernet over GRE and Ethernet over L2TPv3. These are called vecX.

When vector network devices are in use, ``/etc/network/interfaces``
will need entries like::

   # vector UML network devices
   auto vec0
   iface vec0 inet dhcp

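
Where the guest should not depend on DHCP, a static stanza can be used
instead. The following is a sketch only; the addresses are assumptions
and need to match whatever is configured on the host side of the
transport::

   # vector UML network devices, static addressing
   # (all addresses below are illustrative assumptions)
   auto vec0
   iface vec0 inet static
       address 192.168.4.2
       netmask 255.255.255.252
       gateway 192.168.4.1
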
We now have a UML image which is nearly ready to run; all we need is a
UML kernel and modules for it.

Most distributions have a UML package. Even if you intend to use your own
kernel, testing the image with a stock one is always a good start. These
packages come with a set of modules which should be copied to the target
filesystem. The location is distribution dependent. For Debian these
reside under /usr/lib/uml/modules. Recursively copy the contents of this
directory to the mounted UML filesystem::

   # cp -rax /usr/lib/uml/modules /mnt/lib/modules

If you have compiled your own kernel, you need to use the usual "install
modules to a location" procedure by running::

  # make INSTALL_MOD_PATH=/mnt modules_install

This will install modules into /mnt/lib/modules/$(KERNELRELEASE).
To specify the full module installation path, use::

  # make MODLIB=/mnt/lib/modules modules_install

At this point the image is ready to be brought up.

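
Before the image is booted, it should no longer be mounted on the host,
because a filesystem written by both the host and the UML instance at the
same time is likely to be corrupted. A minimal sketch, assuming the image
is still mounted on ``/mnt`` as in the steps above::

   # Flush pending writes, then detach the image from the host
   sync
   umount /mnt
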
*************************
Setting Up UML Networking
*************************

UML networking is designed to emulate an Ethernet connection. This
connection may be either point-to-point (similar to a connection
between machines using a back-to-back cable) or a connection to a
switch. UML supports a wide variety of means to build these
connections to all of: local machine, remote machine(s), local and
remote UML and other VM instances.


+-----------+--------+------------------------------------+------------+
| Transport |  Type  |        Capabilities                | Throughput |
+===========+========+====================================+============+
| tap       | vector | checksum, tso                      | > 8Gbit    |
+-----------+--------+------------------------------------+------------+
| hybrid    | vector | checksum, tso, multipacket rx      | > 6Gbit    |
+-----------+--------+------------------------------------+------------+
| raw       | vector | checksum, tso, multipacket rx, tx  | > 6Gbit    |
+-----------+--------+------------------------------------+------------+
| EoGRE     | vector | multipacket rx, tx                 | > 3Gbit    |
+-----------+--------+------------------------------------+------------+
| Eol2tpv3  | vector | multipacket rx, tx                 | > 3Gbit    |
+-----------+--------+------------------------------------+------------+
| bess      | vector | multipacket rx, tx                 | > 3Gbit    |
+-----------+--------+------------------------------------+------------+
| fd        | vector | dependent on fd type               | varies     |
+-----------+--------+------------------------------------+------------+
| vde       | vector | dep. on VDE VPN: Virt.Net Locator  | varies     |
+-----------+--------+------------------------------------+------------+

* All transports which have tso and checksum offloads can deliver speeds
  approaching 10G on TCP streams.

* All transports which have multi-packet rx and/or tx can deliver packet
  rates of 1 Mpps or more.

* GRE and L2TPv3 allow connections to all of: local machine, remote
  machines, remote network devices and remote UML instances.


Network configuration privileges
================================

The majority of the supported networking modes need ``root`` privileges.
For example, for vector transports, ``root`` privilege is required to issue
an ioctl to set up the tun interface and/or use raw sockets where needed.

This can be achieved by granting the user a particular capability instead
of running UML as root.  In the case of vector transports, a user can add
the capability ``CAP_NET_ADMIN`` or ``CAP_NET_RAW`` to the UML binary.
From then on, UML can be run with normal user privileges, along with
full networking.

For example::

   # sudo setcap cap_net_raw,cap_net_admin+ep linux

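
Whether the capabilities were actually applied can be checked by reading
them back with ``getcap``, which ships with the same libcap tools as
``setcap``. This is a sketch assuming the UML binary is the file ``linux``
in the current directory; the output format differs between libcap
versions, so none is shown here::

   # Print the file capabilities attached to the UML binary
   getcap linux

File capabilities are attached to the file itself, so they need to be set
again whenever the binary is replaced, for example after a rebuild.
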
Configuring vector transports
===============================

All vector transports support a similar syntax:

If X is the interface number as in vec0, vec1, vec2, etc, the general
syntax for options is::

   vecX:transport="Transport Name",option=value,option=value,...,option=value

Common options
--------------

These options are common for all transports:

* ``depth=int`` - sets the queue depth for vector IO. This is the
  number of packets UML will attempt to read or write in a single
  system call. The default number is 64 and is generally sufficient
  for most applications that need throughput in the 2-4 Gbit range.
  Higher speeds may require larger values.

* ``mac=XX:XX:XX:XX:XX:XX`` - sets the interface MAC address value.

* ``gro=[0,1]`` - sets GRO off or on. Enables receive/transmit offloads.
  The effect of this option depends on the host side support in the transport
  which is being configured. In most cases it will enable TCP segmentation and
  RX/TX checksumming offloads. The setting must be identical on the host side
  and the UML side. The UML kernel will produce warnings if it is not.
  For example, GRO is enabled by default on local machine interfaces
  (e.g. veth pairs, bridge, etc), so it should be enabled in UML in the
  corresponding UML transports (raw, tap, hybrid) in order for networking to
  operate correctly.

* ``mtu=int`` - sets the interface MTU

* ``headroom=int`` - adjusts the default headroom (32 bytes) reserved
  in case a packet needs to be re-encapsulated into, for instance, VXLAN.

* ``vec=0`` - disable multipacket IO and fall back to packet at a
  time mode

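
As an illustration of how the common options combine, the following
sketch configures one interface with a deeper queue, a fixed MAC address
and an explicit MTU. The transport, the interface name ``p-veth0`` and all
values are assumptions chosen for the example; ``02:00:00:00:00:01`` is
simply a locally administered address::

   vec0:transport=raw,ifname=p-veth0,depth=256,mac=02:00:00:00:00:01,mtu=1500,gro=1
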
Shared Options
--------------

* ``ifname=str`` - transports which bind to a local network interface
  share this option: the name of the interface to bind to.

* ``src, dst, src_port, dst_port`` - all transports which use sockets
  which have the notion of source and destination and/or source port
  and destination port use these to specify them.

* ``v6=[0,1]`` - specifies whether a v6 connection is desired, for all
  transports which operate over IP. Additionally, for transports that
  have some differences in the way they operate over v4 and v6 (for example
  EoL2TPv3), sets the correct mode of operation. In the absence of this
  option, the socket type is determined based on what the src and dst
  arguments resolve/parse to.

tap transport
-------------

Example::

   vecX:transport=tap,ifname=tap0,depth=128,gro=1

This will connect vecX to tap0 on the host. tap0 must already exist (for
example created using tunctl) and be UP.

tap0 can be configured as a point-to-point interface and given an IP
address so that UML can talk to the host. Alternatively, it is possible
to connect UML to a tap interface which is connected to a bridge.

While tap relies on the vector infrastructure, it is not a true vector
transport at this point, because Linux does not support multi-packet
IO on tap file descriptors for normal userspace apps like UML. This
is a privilege which is offered only to something which can hook up
to it at kernel level via specialized interfaces like vhost-net. A
vhost-net like helper for UML is planned at some point in the future.

Privileges required: tap transport requires either:

* the tap interface to exist, created as persistent and owned by the
  UML user using tunctl, for example ``tunctl -u uml-user -t tap0``

* the UML binary to have the ``CAP_NET_ADMIN`` capability

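
On hosts where ``tunctl`` is not installed, a persistent tap device can
also be created with iproute2. The following is a sketch of the
point-to-point case described above; the user name ``uml-user`` is the one
used in the ``tunctl`` example and the address is an assumption::

   # Create a persistent tap device owned by the unprivileged UML user
   ip tuntap add dev tap0 mode tap user uml-user
   # Give the host side an address (assumed subnet) and bring it up
   ip addr add 192.168.4.1/30 dev tap0
   ip link set dev tap0 up
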
hybrid transport
----------------

Example::

   vecX:transport=hybrid,ifname=tap0,depth=128,gro=1

This is an experimental/demo transport which couples tap for transmit
and a raw socket for receive. The raw socket allows multi-packet
receive resulting in significantly higher packet rates than normal tap.

Privileges required: hybrid requires the ``CAP_NET_RAW`` capability for
the UML user as well as the requirements for the tap transport.

raw socket transport
--------------------

Example::

   vecX:transport=raw,ifname=p-veth0,depth=128,gro=1


This transport uses vector IO on raw sockets. While you can bind to any
interface including a physical one, the most common use is to bind to
the "peer" side of a veth pair with the other side configured on the
host.

Example host configuration for Debian:

**/etc/network/interfaces**::

   auto veth0
   iface veth0 inet static
        address 192.168.4.1
        netmask 255.255.255.252
        broadcast 192.168.4.3
        pre-up ip link add veth0 type veth peer name p-veth0 && \
          ifconfig p-veth0 up

UML can now bind to p-veth0 like this::

   vec0:transport=raw,ifname=p-veth0,depth=128,gro=1


If the UML guest is configured with 192.168.4.2 and netmask 255.255.255.252
it can talk to the host on 192.168.4.1.

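
For a quick test without editing ``/etc/network/interfaces``, the same
veth pair can be set up by hand with iproute2. This is a sketch of an
equivalent of the stanza above, run as root on the host; it is not
persistent across reboots::

   # Create the veth pair; p-veth0 is the end UML binds to
   ip link add veth0 type veth peer name p-veth0
   # Host side address, using the same /30 as the stanza above
   ip addr add 192.168.4.1/30 dev veth0
   ip link set dev veth0 up
   ip link set dev p-veth0 up
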
The raw transport also provides some support for offloading some of the
filtering to the host. The two options to control it are:

* ``bpffile=str`` - filename of raw bpf code to be loaded as a socket filter

* ``bpfflash=int`` - 0/1 allow loading of bpf from inside User Mode Linux.
  This option allows the use of the ethtool load firmware command to
  load bpf code.

In either case the bpf code is loaded into the host kernel. While this is
presently limited to legacy bpf syntax (not ebpf), it is still a security
risk. It is not recommended to allow this unless the User Mode Linux
instance is considered trusted.

Privileges required: raw socket transport requires the ``CAP_NET_RAW``
capability.

GRE socket transport
--------------------

Example::

   vecX:transport=gre,src=$src_host,dst=$dst_host


This will configure an Ethernet over ``GRE`` (aka ``GRETAP`` or
``GREIRB``) tunnel which will connect the UML instance to a ``GRE``
endpoint at host dst_host. ``GRE`` supports the following additional
options:

* ``rx_key=int`` - GRE 32-bit integer key for rx packets, if set,
  ``tx_key`` must be set too

* ``tx_key=int`` - GRE 32-bit integer key for tx packets, if set,
  ``rx_key`` must be set too

* ``sequence=[0,1]`` - enable GRE sequence

* ``pin_sequence=[0,1]`` - pretend that the sequence is always reset
  on each packet (needed to interoperate with some really broken
  implementations)

* ``v6=[0,1]`` - force IPv4 or IPv6 sockets respectively

* GRE checksum is not presently supported

GRE has a number of caveats:

* You can use only one GRE connection per IP address. There is no way to
  multiplex connections as each GRE tunnel is terminated directly on
  the UML instance.

* The key is not really a security feature. While it was intended as such
  its "security" is laughable. It is, however, a useful feature to
  ensure that the tunnel is not misconfigured.

An example configuration for a Linux host with a local address of
192.168.128.1 to connect to a UML instance at 192.168.129.1:

**/etc/network/interfaces**::

   auto gt0
   iface gt0 inet static
    address 10.0.0.1
    netmask 255.255.255.0
    broadcast 10.0.0.255
    mtu 1500
    pre-up ip link add gt0 type gretap local 192.168.128.1 \
           remote 192.168.129.1 || true
    down ip link del gt0 || true

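
The UML side of this tunnel mirrors the host configuration, with the two
addresses swapped. The following sketch assumes the addresses from the
example above::

   vec0:transport=gre,src=192.168.129.1,dst=192.168.128.1,depth=128

If keys are wanted as a guard against misconfiguration, each side's
transmit key has to match the other side's receive key. As an
illustration with arbitrarily chosen key values, a host tunnel created
with ``ikey 1 okey 2`` (iproute2 ``gretap`` options) would pair with::

   vec0:transport=gre,src=192.168.129.1,dst=192.168.128.1,rx_key=2,tx_key=1
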
Additionally, GRE has been tested against a variety of network equipment.

Privileges required: GRE requires ``CAP_NET_RAW``

l2tpv3 socket transport
-----------------------

*Warning*. L2TPv3 has a "bug". It is the "bug" known as "has more
options than GNU ls". While it has some advantages, there are usually
easier (and less verbose) ways to connect a UML instance to something.
For example, most devices which support L2TPv3 also support GRE.

Example::

    vec0:transport=l2tpv3,udp=1,src=$src_host,dst=$dst_host,srcport=$src_port,dstport=$dst_port,depth=128,rx_session=0xffffffff,tx_session=0xffff

This will configure an Ethernet over L2TPv3 fixed tunnel which will
connect the UML instance to an L2TPv3 endpoint at host $dst_host using
the L2TPv3 UDP flavour and UDP destination port $dst_port.

L2TPv3 always requires the following additional options:

* ``rx_session=int`` - l2tpv3 32-bit integer session for rx packets

* ``tx_session=int`` - l2tpv3 32-bit integer session for tx packets

As the tunnel is fixed these are not negotiated and they are
preconfigured on both ends.

Additionally, L2TPv3 supports the following optional parameters:

* ``rx_cookie=int`` - l2tpv3 32-bit integer cookie for rx packets - same
  functionality as GRE key, more to prevent misconfiguration than provide
  actual security

* ``tx_cookie=int`` - l2tpv3 32-bit integer cookie for tx packets

* ``cookie64=[0,1]`` - use 64-bit cookies instead of 32-bit.

* ``counter=[0,1]`` - enable l2tpv3 counter

* ``pin_counter=[0,1]`` - pretend that the counter is always reset on
  each packet (needed to interoperate with some really broken
  implementations)

* ``v6=[0,1]`` - force v6 sockets

* ``udp=[0,1]`` - use raw sockets (0) or UDP (1) version of the protocol

L2TPv3 has a number of caveats:

* You can use only one connection per IP address in raw mode. There is
  no way to multiplex connections as each L2TPv3 tunnel is terminated
  directly on the UML instance. UDP mode can use different ports for
  this purpose.

Here is an example of how to configure a Linux host to connect to UML
via L2TPv3:

**/etc/network/interfaces**::

   auto l2tp1
   iface l2tp1 inet static
    address 192.168.126.1
    netmask 255.255.255.0
    broadcast 192.168.126.255
    mtu 1500
    pre-up ip l2tp add tunnel remote 127.0.0.1 \
           local 127.0.0.1 encap udp tunnel_id 2 \
           peer_tunnel_id 2 udp_sport 1706 udp_dport 1707 && \
           ip l2tp add session name l2tp1 tunnel_id 2 \
           session_id 0xffffffff peer_session_id 0xffffffff
    down ip l2tp del session tunnel_id 2 session_id 0xffffffff && \
           ip l2tp del tunnel tunnel_id 2


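
A UML command line fragment that would pair with this host configuration
might look as follows. It is a sketch derived from the host stanza above,
using the option spelling of the earlier example: the UDP ports are the
host's ports swapped, and both session ids match the host's
``0xffffffff``::

   vec0:transport=l2tpv3,udp=1,src=127.0.0.1,dst=127.0.0.1,srcport=1707,dstport=1706,depth=128,rx_session=0xffffffff,tx_session=0xffffffff
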
Privileges required: L2TPv3 requires ``CAP_NET_RAW`` for raw IP mode and
no special privileges for the UDP mode.

BESS socket transport
---------------------

BESS is a high performance modular network switch.

https://github.com/NetSys/bess

It has support for a simple sequential packet socket mode which, in
more recent versions, uses vector IO for high performance.

Example::

   vecX:transport=bess,src=$unix_src,dst=$unix_dst

This will configure a BESS transport using the unix_src Unix domain
socket address as source and unix_dst socket address as destination.

For BESS configuration and how to allocate a BESS Unix domain socket port
please see the BESS documentation.

https://github.com/NetSys/bess/wiki/Built-In-Modules-and-Ports

BESS transport does not require any special privileges.

VDE vector transport
--------------------

Virtual Distributed Ethernet (VDE) is a project whose main goal is to provide
highly flexible support for virtual networking.

http://wiki.virtualsquare.org/#/tutorials/vdebasics

Common usages of VDE include fast prototyping and teaching.

Examples:

   ``vecX:transport=vde,vnl=tap://tap0``

use tap0

   ``vecX:transport=vde,vnl=slirp://``

use slirp

   ``vec0:transport=vde,vnl=vde:///tmp/switch``

connect to a vde switch

   ``vecX:transport=\"vde,vnl=cmd://ssh remote.host //tmp/sshlirp\"``

connect to a remote slirp (instant VPN: convert ssh to VPN, it uses sshlirp)
https://github.com/virtualsquare/sshlirp

   ``vec0:transport=vde,vnl=vxvde://234.0.0.1``

connect to a local area cloud (all the UML nodes using the same
multicast address running on hosts in the same multicast domain (LAN)
will be automagically connected together to a virtual LAN).

***********
Running UML
***********

This section assumes that either the user-mode-linux package from the
distribution or a custom built kernel has been installed on the host.

These add an executable called linux to the system. This is the UML
kernel. It can be run just like any other executable.
It will take most normal linux kernel arguments as command line
arguments.  Additionally, it will need some UML-specific arguments
in order to do something useful.

Arguments
=========

Mandatory Arguments:
--------------------

* ``mem=int[K,M,G]`` - amount of memory. By default in bytes. It will
  also accept K, M or G qualifiers.

* ``ubdX[s,d,c,t]=`` - virtual disk specification. This is not really
  mandatory, but it is likely to be needed in nearly all cases so we can
  specify a root file system.
  The simplest possible image specification is the name of the image
  file for the filesystem (created using one of the methods described
  in `Creating an image`_).

  * UBD devices support copy on write (COW). The changes are kept in
    a separate file which can be discarded allowing a rollback to the
    original pristine image.  If COW is desired, the UBD image is
    specified as: ``cow_file,master_image``.
    Example: ``ubd0=Filesystem.cow,Filesystem.img``

  * UBD devices can be set to use synchronous IO. Any writes are
    immediately flushed to disk. This is done by adding ``s`` after
    the ``ubdX`` specification.

  * UBD performs some heuristics on devices specified as a single
    filename to make sure that a COW file has not been specified as
    the image. To turn them off, use the ``d`` flag after ``ubdX``.

  * UBD supports TRIM - asking the Host OS to reclaim any unused
    blocks in the image. To turn it off, specify the ``t`` flag after
    ``ubdX``.

* ``root=`` root device - most likely ``/dev/ubd0`` (this is a Linux
  filesystem image)

Important Optional Arguments
----------------------------

If UML is run as "linux" with no extra arguments, it will try to start an
xterm for every console configured inside the image (up to 6 in most
Linux distributions). This makes it nice and easy to use UML on a host
with a GUI. It is, however, the wrong approach if UML is to be used as a
testing harness or run in a text-only environment.

In order to change this behaviour we need to specify an alternative console
and wire it to one of the supported "line" channels. For this we need to map a
console to use something different from the default xterm.

Example which will divert console number 1 to stdin/stdout::

   con1=fd:0,fd:1

UML supports a wide variety of serial line channels which are specified using
the following syntax::

   conX=channel_type:options[,channel_type:options]


If the channel specification contains two parts separated by comma, the first
one is input, the second one output.

* The null channel - Discard all input or output. Example ``con=null`` will set
  all consoles to null by default.

* The fd channel - use file descriptor numbers for input/output. Example:
  ``con1=fd:0,fd:1``.

* The port channel - start a telnet server on TCP port number. Example:
  ``con1=port:4321``.  The host must have /usr/sbin/in.telnetd (usually part of
  a telnetd package) and the port-helper from the UML utilities (see the
  information for the xterm channel below).  UML will not boot until a client
  connects.

* The pty and pts channels - use system pty/pts.

* The tty channel - bind to an existing system tty. Example:
  ``con1=tty:/dev/tty8`` will make UML use the host 8th console (usually
  unused).

* The xterm channel - this is the default - bring up an xterm on this channel
  and direct IO to it. Note that in order for xterm to work, the host must
  have the UML distribution package installed. This usually contains the
  port-helper and other utilities needed for UML to communicate with the xterm.
  Alternatively, these need to be compiled and installed from source.

All options applicable to consoles also apply to UML serial lines which are
presented as ttyS inside UML.

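
As a sketch of how these channels combine on a text-only host, the
following fragment sets a default channel for all consoles and then
overrides console 0, in the same default-then-override style as
``con=null`` above. Here every console is attached to a host
pseudo-terminal except console 0, which uses the terminal UML was started
from::

   con=pts con0=fd:0,fd:1
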
Starting UML
============

We can now run UML::

   # linux mem=2048M umid=TEST \
    ubd0=Filesystem.img \
    vec0:transport=tap,ifname=tap0,depth=128,gro=1 \
    root=/dev/ubda con=null con0=null,fd:2 con1=fd:0,fd:1

This will run an instance with ``2048M RAM`` and try to use the image file
called ``Filesystem.img`` as root. It will connect to the host using tap0.
All consoles except ``con1`` will be disabled and console 1 will
use standard input/output, making it appear in the same terminal in which
UML was started.

Logging in
============

If you have not set up a password when generating the image, you will have to
shut down the UML instance, mount the image, chroot into it and set it - as
described in `Creating an image`_.  If the password is already set,
you can just log in.

The UML Management Console
============================

In addition to managing the image from "the inside" using normal sysadmin tools,
it is possible to perform a number of low-level operations using the UML
management console. The UML management console is a low-level interface to the
kernel on a running UML instance, somewhat like the i386 SysRq interface. Since
there is a full-blown operating system under UML, there is much greater
flexibility possible than with the SysRq mechanism.

There are a number of things you can do with the mconsole interface:

* Get the kernel version
* Add and remove devices
* Halt or reboot the machine
* Send SysRq commands
* Pause and resume the UML
* Inspect processes running inside UML
* Inspect UML internal /proc state

You need the mconsole client (``uml_mconsole``) which is a part of the UML
tools package available in most Linux distributions.

You also need ``CONFIG_MCONSOLE`` (under 'General Setup') enabled in the UML
kernel.  When you boot UML, you'll see a line like::

   mconsole initialized on /home/jdike/.uml/umlNJ32yL/mconsole

If you specify a unique machine id on the UML command line, i.e.
``umid=debian``, you'll see this::

   mconsole initialized on /home/jdike/.uml/debian/mconsole


That file is the socket that uml_mconsole will use to communicate with
UML.  Run it with either the umid or the full path as its argument::

   # uml_mconsole debian

or::

   # uml_mconsole /home/jdike/.uml/debian/mconsole


You'll get a prompt, at which you can run one of these commands:

* version
* help
* halt
* reboot
* config
* remove
* sysrq
* cad
* stop
* go
* proc
* stack

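
For scripting, ``uml_mconsole`` can also be given a command after the umid
or socket path, in which case it runs that one command and exits instead
of showing a prompt. A minimal sketch, assuming an instance started with
``umid=debian`` as above::

   # Run a single mconsole command non-interactively
   uml_mconsole debian version
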
version
-------

This command takes no arguments.  It prints the UML version::

   (mconsole)  version
   OK Linux OpenWrt 4.14.106 #0 Tue Mar 19 08:19:41 2019 x86_64


There are a couple of actual uses for this.  It's a simple no-op which
can be used to check that a UML is running.  It's also a way of
sending a device interrupt to the UML. UML mconsole is treated internally as
a UML device.

help
----

This command takes no arguments. It prints a short help screen with the
supported mconsole commands.


halt and reboot
---------------

These commands take no arguments.  They shut the machine down immediately, with
no syncing of disks and no clean shutdown of userspace.  So, they are
pretty close to crashing the machine::

   (mconsole)  halt
   OK

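
Because ``halt`` does not sync disks, one way to reduce the risk of
filesystem damage when a clean shutdown from inside the guest is not
possible is to issue the SysRq sync and remount-read-only requests first,
using the ``sysrq`` command described below. This is a sketch, assuming an
instance with ``umid=debian``; it is still not a substitute for a proper
shutdown of userspace::

   # SysRq 's' syncs all mounted filesystems, 'u' remounts them read-only
   uml_mconsole debian sysrq s
   uml_mconsole debian sysrq u
   uml_mconsole debian halt
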
config
------

"config" adds a new device to the virtual machine. This is supported
by most UML device drivers. It takes one argument, which is the
device to add, with the same syntax as the kernel command line::

   (mconsole) config ubd3=/home/jdike/incoming/roots/root_fs_debian22

remove
------

"remove" deletes a device from the system.  Its argument is just the
name of the device to be removed. The device must be idle in whatever
sense the driver considers necessary.  In the case of the ubd driver,
the removed block device must not be mounted, swapped on, or otherwise
open, and in the case of the network driver, the device must be down::

   (mconsole)  remove ubd3

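
Taken together, ``config`` and ``remove`` allow a disk to be attached to a
running instance and detached again. The following sketch assumes an
instance with ``umid=debian``; the image path is a placeholder, and
``ubd1`` is expected to appear as ``/dev/ubdb`` inside the guest, in the
same way that ``ubd0`` corresponds to ``/dev/ubda``::

   # On the host: attach an additional image as ubd1 (path is hypothetical)
   uml_mconsole debian config ubd1=/path/to/extra.img
   # ... mount, use and unmount /dev/ubdb inside the guest ...
   # On the host: detach the device once it is idle
   uml_mconsole debian remove ubd1
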
823sysrq
824-----
825
826This command takes one argument, which is a single letter.  It calls the
827generic kernel's SysRq driver, which does whatever is called for by
828that argument.  See the SysRq documentation in
829Documentation/admin-guide/sysrq.rst in your favorite kernel tree to
830see what letters are valid and what they do.
831
832cad
833---
834
This invokes the ``Ctrl-Alt-Del`` action in the running image.  What exactly
this ends up doing is up to init, systemd, etc.  Normally, it reboots the
machine.
838
839stop
840----
841
842This puts the UML in a loop reading mconsole requests until a 'go'
843mconsole command is received. This is very useful as a
844debugging/snapshotting tool.
845
846go
847--
848
849This resumes a UML after being paused by a 'stop' command. Note that
850when the UML has resumed, TCP connections may have timed out and if
851the UML is paused for a long period of time, crond might go a little
852crazy, running all the jobs it didn't do earlier.
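
The ``stop`` and ``go`` commands can be combined into a simple snapshot
helper. The following is a minimal sketch, not a supported tool. It assumes
that ``uml_mconsole`` accepts a umid followed by a single command on its
command line, and the function name ``uml_snapshot`` and its arguments are
hypothetical. Pausing UML does not flush data still held in guest memory, so
the copy is at best as consistent as the disk of a machine which has just
crashed.

.. code-block:: shell

   # The mconsole client to use; override MCONSOLE to test without a UML.
   MCONSOLE=${MCONSOLE:-uml_mconsole}

   # usage: uml_snapshot UMID COW_FILE DEST
   uml_snapshot() {
       # Pause the instance so the file does not change during the copy.
       "$MCONSOLE" "$1" stop || return 1
       # Copy the COW file, keeping holes in the copy (GNU cp).
       cp --sparse=always "$2" "$3"
       rc=$?
       # Always resume the instance, even if the copy failed.
       "$MCONSOLE" "$1" go
       return "$rc"
   }

It would be called as ``uml_snapshot debian root_fs_cow root_fs_cow.snap``,
where ``debian`` stands for the umid of the running instance.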
853
854proc
855----
856
This takes one argument - the name of a file in /proc, which is printed
to the mconsole standard output.
859
860stack
861-----
862
This takes one argument - the PID of a process. Its stack is
printed to standard output.
865
866*******************
867Advanced UML Topics
868*******************
869
870Sharing Filesystems between Virtual Machines
871============================================
872
873Don't attempt to share filesystems simply by booting two UMLs from the
874same file.  That's the same thing as booting two physical machines
875from a shared disk.  It will result in filesystem corruption.
876
877Using layered block devices
878---------------------------
879
880The way to share a filesystem between two virtual machines is to use
881the copy-on-write (COW) layering capability of the ubd block driver.
882Any changed blocks are stored in the private COW file, while reads come
883from either device - the private one if the requested block is valid in
884it, the shared one if not.  Using this scheme, the majority of data
885which is unchanged is shared between an arbitrary number of virtual
886machines, each of which has a much smaller file containing the changes
887that it has made.  With a large number of UMLs booting from a large root
888filesystem, this leads to a huge disk space saving.
889
890Sharing file system data will also help performance, since the host will
891be able to cache the shared data using a much smaller amount of memory,
892so UML disk requests will be served from the host's memory rather than
893its disks.  There is a major caveat in doing this on multisocket NUMA
894machines.  On such hardware, running many UML instances with a shared
895master image and COW changes may cause issues like NMIs from excess of
896inter-socket traffic.
897
898If you are running UML on high-end hardware like this, make sure to
899bind UML to a set of logical CPUs residing on the same socket using the
900``taskset`` command or have a look at the "tuning" section.
901
902To add a copy-on-write layer to an existing block device file, simply
903add the name of the COW file to the appropriate ubd switch::
904
905   ubd0=root_fs_cow,root_fs_debian_22
906
907where ``root_fs_cow`` is the private COW file and ``root_fs_debian_22`` is
908the existing shared filesystem.  The COW file need not exist.  If it
909doesn't, the driver will create and initialize it.
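
As an illustration of the scheme above, the sketch below prints the command
lines for three instances which share one backing file while each keeps a
private COW file. The kernel binary path ``./linux``, the ``cow_`` file
prefix, the instance names and the helper ``uml_cmdline`` are assumptions
made for this example. The commands are only printed, not executed.

.. code-block:: shell

   # usage: uml_cmdline NAME BACKING_FILE
   # Prints a UML command line with a private COW file per instance.
   uml_cmdline() {
       printf './linux ubd0=cow_%s,%s umid=%s\n' "$1" "$2" "$1"
   }

   for name in uml1 uml2 uml3; do
       uml_cmdline "$name" root_fs_debian_22
   done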
910
911Disk Usage
912----------
913
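UML has TRIM support which will release any unused space in its disk
image files to the underlying OS. Because the image files become sparse
as a result, their apparent size can be much larger than the space they
actually occupy. Use either ``ls -ls`` or ``du`` to verify the actual
file size.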
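
The difference between apparent and allocated size can be seen on any
sparse file. The sketch below uses a scratch file created with ``truncate``
as a stand-in for a disk image and assumes GNU coreutils (for
``du --apparent-size``).

.. code-block:: shell

   img=$(mktemp)
   # Create a 64M file which consists entirely of a hole.
   truncate -s 64M "$img"

   # Size as reported by the file length, in KiB.
   apparent=$(du -k --apparent-size "$img" | cut -f1)
   # Space actually allocated on the host filesystem, in KiB.
   actual=$(du -k "$img" | cut -f1)
   echo "apparent=${apparent}K allocated=${actual}K"

   # The first column of ls -ls is the allocated size in blocks.
   ls -ls "$img"
   rm -f "$img"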
917
COW validity
------------
920
Any changes to the master image will invalidate all COW files. If this
happens, UML will *NOT* automatically delete any of the COW files and
will refuse to boot. In this case the only options are to either
restore the old image (including its last modified timestamp) or remove
all COW files, which will result in their recreation. If the COW files
are removed, any changes stored in them will be lost.
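
Restoring an image together with its timestamp can be done with ``cp -p``,
which preserves the modification time. The sketch below demonstrates this on
small stand-in files in a scratch directory. The file names are hypothetical,
and ``stat -c`` assumes GNU coreutils.

.. code-block:: shell

   work=$(mktemp -d)
   echo "master image" > "$work/root_fs"

   # Keep a backup which carries the same mtime as the original.
   cp -p "$work/root_fs" "$work/root_fs.orig"

   # An accidental change alters both the content and the mtime.
   echo "accidental change" >> "$work/root_fs"

   # Restore the content and the original mtime in one step.
   cp -p "$work/root_fs.orig" "$work/root_fs"

   # Both files now report the same modification time.
   stat -c '%Y %n' "$work/root_fs" "$work/root_fs.orig"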
927
928Cows can moo - uml_moo : Merging a COW file with its backing file
929-----------------------------------------------------------------
930
931Depending on how you use UML and COW devices, it may be advisable to
932merge the changes in the COW file into the backing file every once in
933a while.
934
935The utility that does this is uml_moo.  Its usage is::
936
937   uml_moo COW_file new_backing_file
938
939
940There's no need to specify the backing file since that information is
941already in the COW file header.  If you're paranoid, boot the new
942merged file, and if you're happy with it, move it over the old backing
943file.
944
945``uml_moo`` creates a new backing file by default as a safety measure.
946It also has a destructive merge option which will merge the COW file
947directly into its current backing file.  This is really only usable
948when the backing file only has one COW file associated with it.  If
949there are multiple COWs associated with a backing file, a -d merge of
950one of them will invalidate all of the others.  However, it is
951convenient if you're short of disk space, and it should also be
952noticeably faster than a non-destructive merge.
953
954``uml_moo`` is installed with the UML distribution packages and is
955available as a part of UML utilities.
956
957Host file access
958==================
959
960If you want to access files on the host machine from inside UML, you
961can treat it as a separate machine and either nfs mount directories
962from the host or copy files into the virtual machine with scp.
963However, since UML is running on the host, it can access those
964files just like any other process and make them available inside the
965virtual machine without the need to use the network.
966This is possible with the hostfs virtual filesystem.  With it, you
967can mount a host directory into the UML filesystem and access the
968files contained in it just as you would on the host.
969
970*SECURITY WARNING*
971
972Hostfs without any parameters to the UML Image will allow the image
973to mount any part of the host filesystem and write to it. Always
974confine hostfs to a specific "harmless" directory (for example ``/var/tmp``)
975if running UML. This is especially important if UML is being run as root.
976
977Using hostfs
978------------
979
980To begin with, make sure that hostfs is available inside the virtual
981machine with::
982
983   # cat /proc/filesystems
984
985``hostfs`` should be listed.  If it's not, either rebuild the kernel
986with hostfs configured into it or make sure that hostfs is built as a
987module and available inside the virtual machine, and insmod it.
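
The check above can also be scripted. This is a minimal sketch. The helper
name ``has_fs`` is hypothetical, and the optional second argument exists
only so that the function can be tried against a file other than
``/proc/filesystems``.

.. code-block:: shell

   # usage: has_fs NAME [LIST_FILE]
   # Succeeds if NAME appears as a whole word in the filesystem list.
   has_fs() {
       grep -qw "$1" "${2:-/proc/filesystems}"
   }

   if has_fs hostfs; then
       echo "hostfs is available"
   else
       echo "hostfs is missing: load the module or rebuild the kernel"
   fi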
988
989
990Now all you need to do is run mount::
991
992   # mount none /mnt/host -t hostfs
993
994will mount the host's ``/`` on the virtual machine's ``/mnt/host``.
995If you don't want to mount the host root directory, then you can
996specify a subdirectory to mount with the -o switch to mount::
997
998   # mount none /mnt/home -t hostfs -o /home
999
1000will mount the host's /home on the virtual machine's /mnt/home.
1001
1002hostfs as the root filesystem
1003-----------------------------
1004
1005It's possible to boot from a directory hierarchy on the host using
1006hostfs rather than using the standard filesystem in a file.
1007To start, you need that hierarchy.  The easiest way is to loop mount
1008an existing root_fs file::
1009
1010   #  mount root_fs uml_root_dir -o loop
1011
1012
1013You need to change the filesystem type of ``/`` in ``etc/fstab`` to be
1014'hostfs', so that line looks like this::
1015
1016   /dev/ubd/0       /        hostfs      defaults          1   1
1017
1018Then you need to chown to yourself all the files in that directory
1019that are owned by root.  This worked for me::
1020
1021   #  find . -uid 0 -exec chown jdike {} \;
1022
1023Next, make sure that your UML kernel has hostfs compiled in, not as a
1024module.  Then run UML with the boot device pointing at that directory::
1025
1026   ubd0=/path/to/uml/root/directory
1027
1028UML should then boot as it does normally.
1029
1030Hostfs Caveats
1031--------------
1032
1033Hostfs does not support keeping track of host filesystem changes on the
1034host (outside UML). As a result, if a file is changed without UML's
1035knowledge, UML will not know about it and its own in-memory cache of
1036the file may be corrupt. While it is possible to fix this, it is not
1037something which is being worked on at present.
1038
1039Tuning UML
1040============
1041
UML at present is strictly uniprocessor. It will, however, spin up a
number of threads to handle various functions.

The UBD driver, SIGIO and the MMU emulation do that. If the system is
idle, these threads will be migrated to other processors on an SMP host.
This, unfortunately, will usually result in LOWER performance because of
all of the cache/memory synchronization traffic between cores. As a
result, UML will usually benefit from being pinned on a single CPU,
especially on a large system. This can result in performance differences
of 5 times or higher on some benchmarks.
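
Pinning is easiest at start time, because the CPU affinity set by
``taskset`` is inherited by everything the command spawns, including the UML
helper threads. The sketch below demonstrates the inheritance with a plain
shell as the child. The UML command line in the final comment is a
placeholder, not a recommended configuration.

.. code-block:: shell

   # Pick the first CPU the current shell is allowed to run on.
   cpu=$(taskset -cp $$ | sed 's/.*: //; s/[,-].*//')

   # Start a child pinned to that CPU and let it report its own affinity.
   inherited=$(taskset -c "$cpu" sh -c 'taskset -cp $$ | sed "s/.*: //"')
   echo "child affinity list: $inherited"

   # The same prefix applies when starting UML, for example:
   #   taskset -c "$cpu" ./linux ubd0=root_fs_cow,root_fs_debian_22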
1052
Similarly, on large multi-node NUMA systems UML will benefit if all of
its memory is allocated from the same NUMA node it will run on. The
OS will *NOT* do that by default. In order to do that, the sysadmin
needs to create a suitable tmpfs ramdisk bound to a particular node
and use that as the source for UML RAM allocation by specifying it
in the ``TMPDIR``, ``TMP`` or ``TEMP`` environment variable. UML will
look at the values of these variables to find the directory. If that
fails, it will look for shmfs mounted under ``/dev/shm``. If everything
else fails, it will use ``/tmp/`` regardless of the filesystem type used
for it::
1062
1063   mount -t tmpfs -ompol=bind:X none /mnt/tmpfs-nodeX
1064   TEMP=/mnt/tmpfs-nodeX taskset -cX linux options options options..
1065
1066*******************************************
1067Contributing to UML and Developing with UML
1068*******************************************
1069
1070UML is an excellent platform to develop new Linux kernel concepts -
1071filesystems, devices, virtualization, etc. It provides unrivalled
1072opportunities to create and test them without being constrained to
1073emulating specific hardware.
1074
1075Example - want to try how Linux will work with 4096 "proper" network
1076devices?
1077
1078Not an issue with UML. At the same time, this is something which
1079is difficult with other virtualization packages - they are
1080constrained by the number of devices allowed on the hardware bus
1081they are trying to emulate (for example 16 on a PCI bus in qemu).
1082
1083If you have something to contribute such as a patch, a bugfix, a
1084new feature, please send it to ``linux-um@lists.infradead.org``.
1085
Please follow all standard Linux patch guidelines such as cc-ing
relevant maintainers and running ``./scripts/checkpatch.pl`` on your patch.
For more details see ``Documentation/process/submitting-patches.rst``.
1089
Note - the list does not accept HTML or attachments; all emails must
be formatted as plain text.
1092
Developing always goes hand in hand with debugging. First of all,
you can always run UML under gdb, and a later section covers how to
do that. That, however, is not the only way to
debug a Linux kernel. Quite often adding tracing statements and/or
using UML specific approaches such as ptracing the UML kernel process
are significantly more informative.
1099
1100Tracing UML
1101=============
1102
1103When running, UML consists of a main kernel thread and a number of
1104helper threads. The ones of interest for tracing are NOT the ones
1105that are already ptraced by UML as a part of its MMU emulation.
1106
The threads of interest are usually the first three visible in a ps
display. The one with the lowest PID number and using the most CPU is
usually the kernel thread. The other threads are the disk (ubd) device
helper thread and the SIGIO helper thread.
Running strace on the kernel thread usually results in the following
picture::
1112
1113   host$ strace -p 16566
1114   --- SIGIO {si_signo=SIGIO, si_code=POLL_IN, si_band=65} ---
1115   epoll_wait(4, [{EPOLLIN, {u32=3721159424, u64=3721159424}}], 64, 0) = 1
1116   epoll_wait(4, [], 64, 0)                = 0
1117   rt_sigreturn({mask=[PIPE]})             = 16967
1118   ptrace(PTRACE_GETREGS, 16967, NULL, 0xd5f34f38) = 0
1119   ptrace(PTRACE_GETREGSET, 16967, NT_X86_XSTATE, [{iov_base=0xd5f35010, iov_len=832}]) = 0
1120   ptrace(PTRACE_GETSIGINFO, 16967, NULL, {si_signo=SIGTRAP, si_code=0x85, si_pid=16967, si_uid=0}) = 0
1121   ptrace(PTRACE_SETREGS, 16967, NULL, 0xd5f34f38) = 0
1122   ptrace(PTRACE_SETREGSET, 16967, NT_X86_XSTATE, [{iov_base=0xd5f35010, iov_len=2696}]) = 0
1123   ptrace(PTRACE_SYSEMU, 16967, NULL, 0)   = 0
1124   --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_TRAPPED, si_pid=16967, si_uid=0, si_status=SIGTRAP, si_utime=65, si_stime=89} ---
1125   wait4(16967, [{WIFSTOPPED(s) && WSTOPSIG(s) == SIGTRAP | 0x80}], WSTOPPED|__WALL, NULL) = 16967
1126   ptrace(PTRACE_GETREGS, 16967, NULL, 0xd5f34f38) = 0
1127   ptrace(PTRACE_GETREGSET, 16967, NT_X86_XSTATE, [{iov_base=0xd5f35010, iov_len=832}]) = 0
1128   ptrace(PTRACE_GETSIGINFO, 16967, NULL, {si_signo=SIGTRAP, si_code=0x85, si_pid=16967, si_uid=0}) = 0
1129   timer_settime(0, 0, {it_interval={tv_sec=0, tv_nsec=0}, it_value={tv_sec=0, tv_nsec=2830912}}, NULL) = 0
1130   getpid()                                = 16566
1131   clock_nanosleep(CLOCK_MONOTONIC, 0, {tv_sec=1, tv_nsec=0}, NULL) = ? ERESTART_RESTARTBLOCK (Interrupted by signal)
1132   --- SIGALRM {si_signo=SIGALRM, si_code=SI_TIMER, si_timerid=0, si_overrun=0, si_value={int=1631716592, ptr=0x614204f0}} ---
1133   rt_sigreturn({mask=[PIPE]})             = -1 EINTR (Interrupted system call)
1134
1135This is a typical picture from a mostly idle UML instance.
1136
* UML interrupt controller uses epoll - this is UML waiting for IO
  interrupts::

     epoll_wait(4, [{EPOLLIN, {u32=3721159424, u64=3721159424}}], 64, 0) = 1

1141
1142* The sequence of ptrace calls is part of MMU emulation and running the
1143  UML userspace.
1144* ``timer_settime`` is part of the UML high res timer subsystem mapping
1145  timer requests from inside UML onto the host high resolution timers.
1146* ``clock_nanosleep`` is UML going into idle (similar to the way a PC
1147  will execute an ACPI idle).
1148
As you can see, UML will generate quite a bit of output even when idle. The
output can be very informative when observing IO. It shows the actual IO
calls, their arguments and return values.
1152
1153Kernel debugging
1154================
1155
1156You can run UML under gdb now, though it will not necessarily agree to
1157be started under it. If you are trying to track a runtime bug, it is
1158much better to attach gdb to a running UML instance and let UML run.
1159
1160Assuming the same PID number as in the previous example, this would be::
1161
1162   # gdb -p 16566
1163
This will STOP the UML instance, so you must enter ``cont`` at the GDB
command line to request it to continue. It may be a good idea to make
this into a gdb script and pass it to gdb as an argument.
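
A minimal sketch of such a script follows. It only resumes the instance after
attaching. The PID is the one from the example above, and the script is
written to a temporary file whose name is arbitrary. The final gdb command is
printed rather than executed.

.. code-block:: shell

   script=$(mktemp)

   # gdb reads one command per line; lines starting with '#' are comments.
   cat > "$script" <<'EOF'
   # Attaching stops UML, so resume it straight away.
   continue
   EOF

   # -p attaches to the process, -x runs the commands from the script.
   echo "gdb -p 16566 -x $script"

Breakpoints or other gdb commands can be added to the script before the
``continue`` line.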
1167
1168Developing Device Drivers
1169=========================
1170
1171Nearly all UML drivers are monolithic. While it is possible to build a
1172UML driver as a kernel module, that limits the possible functionality
1173to in-kernel only and non-UML specific.  The reason for this is that
1174in order to really leverage UML, one needs to write a piece of
1175userspace code which maps driver concepts onto actual userspace host
1176calls.
1177
1178This forms the so-called "user" portion of the driver. While it can
1179reuse a lot of kernel concepts, it is generally just another piece of
1180userspace code. This portion needs some matching "kernel" code which
1181resides inside the UML image and which implements the Linux kernel part.
1182
1183*Note: There are very few limitations in the way "kernel" and "user" interact*.
1184
1185UML does not have a strictly defined kernel-to-host API. It does not
1186try to emulate a specific architecture or bus. UML's "kernel" and
1187"user" can share memory, code and interact as needed to implement
1188whatever design the software developer has in mind. The only
1189limitations are purely technical. Due to a lot of functions and
1190variables having the same names, the developer should be careful
1191which includes and libraries they are trying to refer to.
1192
1193As a result a lot of userspace code consists of simple wrappers.
1194E.g. ``os_close_file()`` is just a wrapper around ``close()``
1195which ensures that the userspace function close does not clash
1196with similarly named function(s) in the kernel part.
1197
1198Using UML as a Test Platform
1199============================
1200
1201UML is an excellent test platform for device driver development. As
1202with most things UML, "some user assembly may be required". It is
1203up to the user to build their emulation environment. UML at present
1204provides only the kernel infrastructure.
1205
1206Part of this infrastructure is the ability to load and parse fdt
1207device tree blobs as used in Arm or Open Firmware platforms. These
1208are supplied as an optional extra argument to the kernel command
1209line::
1210
1211    dtb=filename
1212
The device tree is loaded and parsed at boot time and is accessible by
drivers which query it. At this moment in time this facility is
intended solely for development purposes. UML's own devices do not
query the device tree.
1217
1218Security Considerations
1219-----------------------
1220
Drivers or any new functionality should default to not
accepting arbitrary filenames, BPF code or other parameters
which can affect the host from inside the UML instance.
For example, specifying the socket used for IPC communication
between a driver and the host at the UML command line is OK
security-wise. Allowing it as a loadable module parameter
isn't.
1228
1229If such functionality is desirable for a particular application
1230(e.g. loading BPF "firmware" for raw socket network transports),
1231it should be off by default and should be explicitly turned on
1232as a command line parameter at startup.
1233
1234Even with this in mind, the level of isolation between UML
1235and the host is relatively weak. If the UML userspace is
1236allowed to load arbitrary kernel drivers, an attacker can
1237use this to break out of UML. Thus, if UML is used in
1238a production application, it is recommended that all modules
1239are loaded at boot and kernel module loading is disabled
1240afterwards.
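
One way to disable module loading after boot is the standard
``kernel.modules_disabled`` sysctl, which cannot be switched back off
until the next reboot. The sketch below is meant to be run as root inside
the UML instance once all required modules are loaded. The function name
``lock_modules`` is hypothetical, and its optional argument exists only so
that the function can be tried against a scratch file.

.. code-block:: shell

   # usage: lock_modules [SYSCTL_FILE]
   # Writes 1 to the modules_disabled sysctl; this is a one-way switch.
   lock_modules() {
       echo 1 > "${1:-/proc/sys/kernel/modules_disabled}"
   }

Calling ``lock_modules`` from the last boot script of the guest closes the
window in which userspace could load further modules.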
1241