.. SPDX-License-Identifier: GPL-2.0

=====================
Multipath TCP (MPTCP)
=====================

Introduction
============

Multipath TCP, or MPTCP, is an extension to standard TCP and is described in
`RFC 8684 (MPTCPv1) <https://www.rfc-editor.org/rfc/rfc8684.html>`_. It allows a
device to make use of multiple interfaces at once to send and receive TCP
packets over a single MPTCP connection. MPTCP can aggregate the bandwidth of
multiple interfaces or prefer the one with the lowest latency. It also allows
fail-over: if one path goes down, the traffic is seamlessly reinjected on the
other paths.

For more details about Multipath TCP in the Linux kernel, please see the
official website: `mptcp.dev <https://www.mptcp.dev>`_.


Use cases
=========

Being able to use multiple paths in parallel brings new use cases compared to
TCP:

- Seamless handovers: switching from one path to another while preserving
  established connections, e.g. in mobility use cases such as smartphones.
- Best network selection: using the "best" available path depending on some
  conditions, e.g. latency, losses, cost, bandwidth, etc.
- Network aggregation: using multiple paths at the same time to have a higher
  throughput, e.g. to combine fixed and mobile networks to send files faster.


Concepts
========

Technically, when a new socket is created with the ``IPPROTO_MPTCP`` protocol
(Linux-specific), a *subflow* (or *path*) is created. This *subflow* consists of
a regular TCP connection that is used to transmit data through one interface.
Additional *subflows* can be negotiated later between the hosts. For the remote
host to be able to detect the use of MPTCP, a new field is added to the TCP
*option* field of the underlying TCP *subflow*. This field contains, amongst
other things, an ``MP_CAPABLE`` option that tells the other host to use MPTCP if
it is supported. If the remote host or any middlebox in between does not support
it, the returned ``SYN+ACK`` packet will not contain MPTCP options in the TCP
*option* field. In that case, the connection will be "downgraded" to plain TCP,
and it will continue with a single path.

This behavior is made possible by two internal components: the path manager and
the packet scheduler.

Path Manager
------------

The Path Manager is in charge of *subflows*, from creation to deletion, and also
of address announcements. Typically, it is the client side that initiates
subflows, and the server side that announces additional addresses via the
``ADD_ADDR`` and ``REMOVE_ADDR`` options.

Path managers are controlled by the ``net.mptcp.pm_type`` sysctl knob -- see
mptcp-sysctl.rst. There are two types: the in-kernel one (type ``0``), where the
same rules are applied to all connections (see ``ip mptcp``); and the
userspace one (type ``1``), controlled by a userspace daemon (e.g. `mptcpd
<https://mptcpd.mptcp.dev/>`_), where different rules can be applied to each
connection. The path managers can be controlled via a Netlink API; see
netlink_spec/mptcp_pm.rst.

To be able to use multiple IP addresses on a host to create multiple *subflows*
(paths), the default in-kernel MPTCP path manager needs to know which IP
addresses can be used. This can be configured with ``ip mptcp endpoint``, for
example.

Packet Scheduler
----------------

The Packet Scheduler is in charge of selecting which available *subflow(s)* to
use to send the next data packet. It can decide to maximize the use of the
available bandwidth, to pick only the path with the lowest latency, or to apply
any other policy, depending on the configuration.

Packet schedulers are controlled by the ``net.mptcp.scheduler`` sysctl knob --
see mptcp-sysctl.rst.
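
Like other ``net.*`` sysctls, the knobs mentioned above are exposed as files
under ``/proc/sys/net/mptcp/``. As a minimal sketch, a program could inspect the
current path manager type or packet scheduler as shown below. Note that
``read_mptcp_sysctl()`` is a hypothetical helper name used only for this
illustration, not an existing API:

.. code-block:: C

    #include <stdio.h>
    #include <string.h>

    /*
     * Hypothetical helper: copy the value of the net.mptcp.<name> sysctl into
     * buf, without the trailing newline. Returns 0 on success, or -1 on error
     * (e.g. the knob does not exist because MPTCP is not available).
     */
    int read_mptcp_sysctl(const char *name, char *buf, size_t len)
    {
        char path[128];
        FILE *f;
        int n;

        if (len == 0)
            return -1;

        /* net.mptcp.<name> maps to /proc/sys/net/mptcp/<name>. */
        n = snprintf(path, sizeof(path), "/proc/sys/net/mptcp/%s", name);
        if (n < 0 || (size_t)n >= sizeof(path))
            return -1;

        f = fopen(path, "r");
        if (!f)
            return -1;

        if (!fgets(buf, (int)len, f)) {
            fclose(f);
            return -1;
        }
        fclose(f);

        /* Drop the newline that procfs appends to the value. */
        buf[strcspn(buf, "\n")] = '\0';
        return 0;
    }

For example, calling it with ``"scheduler"`` would fill ``buf`` with the name of
the active packet scheduler, and calling it with ``"pm_type"`` would give ``0``
or ``1``, as described in the Path Manager section.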


Sockets API
===========

Creating MPTCP sockets
----------------------

On Linux, MPTCP can be used by selecting MPTCP instead of TCP when creating the
``socket``:

.. code-block:: C

    /* Use AF_INET6 instead of AF_INET for IPv6. */
    int sd = socket(AF_INET, SOCK_STREAM, IPPROTO_MPTCP);

Note that ``IPPROTO_MPTCP`` is defined as ``262``.

If MPTCP is not supported, ``socket()`` fails and ``errno`` is set to one of:

- ``EINVAL`` (*Invalid argument*): MPTCP is not available, on kernels < v5.6.
- ``EPROTONOSUPPORT`` (*Protocol not supported*): MPTCP has not been compiled
  in, on kernels >= v5.6.
- ``ENOPROTOOPT`` (*Protocol not available*): MPTCP has been disabled using the
  ``net.mptcp.enabled`` sysctl knob; see mptcp-sysctl.rst.
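
These error codes can be used to fall back to plain TCP when MPTCP is
unavailable. The following is a minimal sketch rather than a reference
implementation: ``mptcp_socket_or_tcp()`` is a hypothetical helper name, and the
``#ifndef`` guard is only there for C libraries whose headers do not define
``IPPROTO_MPTCP`` yet:

.. code-block:: C

    #include <errno.h>
    #include <stddef.h>
    #include <netinet/in.h>
    #include <sys/socket.h>

    #ifndef IPPROTO_MPTCP
    #define IPPROTO_MPTCP 262
    #endif

    /*
     * Hypothetical helper: try to create an MPTCP socket, and fall back to
     * plain TCP only for the errno values that mean "MPTCP is not supported".
     * If used_mptcp is not NULL, it is set to 1 (MPTCP) or 0 (TCP fallback).
     * Returns the socket descriptor, or -1 on any other error.
     */
    int mptcp_socket_or_tcp(int family, int *used_mptcp)
    {
        int sd = socket(family, SOCK_STREAM, IPPROTO_MPTCP);

        if (sd >= 0) {
            if (used_mptcp != NULL)
                *used_mptcp = 1;
            return sd;
        }

        /* Unrelated failure (e.g. bad address family): report it as is. */
        if (errno != EINVAL && errno != EPROTONOSUPPORT &&
            errno != ENOPROTOOPT)
            return -1;

        sd = socket(family, SOCK_STREAM, IPPROTO_TCP);
        if (sd >= 0 && used_mptcp != NULL)
            *used_mptcp = 0;
        return sd;
    }

A caller would pass ``AF_INET`` or ``AF_INET6`` as ``family`` and then use the
returned descriptor exactly as it would use a TCP socket.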

MPTCP is therefore opt-in: applications need to explicitly request it. Note that
applications can be forced to use MPTCP with different techniques, e.g.
``LD_PRELOAD`` (see ``mptcpize``), eBPF (see ``mptcpify``), SystemTap,
``GODEBUG`` (``GODEBUG=multipathtcp=1``), etc.

Switching to ``IPPROTO_MPTCP`` instead of ``IPPROTO_TCP`` should be as
transparent as possible for the userspace applications.

Socket options
--------------

MPTCP supports most socket options handled by TCP. Some less common options
may not be supported, but contributions are welcome.

Generally, the same value is propagated to all subflows, including the ones
created after the calls to ``setsockopt()``. eBPF can be used to set different
values per subflow.

There are some MPTCP-specific socket options at the ``SOL_MPTCP`` (284) level to
retrieve info. They fill the ``optval`` buffer of the ``getsockopt()`` system
call:

- ``MPTCP_INFO``: Uses ``struct mptcp_info``.
- ``MPTCP_TCPINFO``: Uses ``struct mptcp_subflow_data``, followed by an array of
  ``struct tcp_info``.
- ``MPTCP_SUBFLOW_ADDRS``: Uses ``struct mptcp_subflow_data``, followed by an
  array of ``struct mptcp_subflow_addrs``.
- ``MPTCP_FULL_INFO``: Uses ``struct mptcp_full_info``, with one pointer to an
  array of ``struct mptcp_subflow_info`` (including the
  ``struct mptcp_subflow_addrs``), and one pointer to an array of
  ``struct tcp_info``, followed by the content of ``struct mptcp_info``.

Note that at the TCP level, the ``TCP_IS_MPTCP`` socket option can be used to
know if MPTCP is currently being used: the value will be set to 1 if it is.
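
As an illustration of the ``TCP_IS_MPTCP`` check, the sketch below queries a
connected socket to see whether it is still using MPTCP or has been downgraded
to plain TCP. ``socket_is_mptcp()`` is a hypothetical helper name. The fallback
value ``43`` is an assumption taken from the kernel's ``linux/tcp.h`` UAPI
header, for C libraries that do not expose the constant:

.. code-block:: C

    #include <netinet/in.h>
    #include <netinet/tcp.h>
    #include <sys/socket.h>

    #ifndef TCP_IS_MPTCP
    #define TCP_IS_MPTCP 43 /* assumed to match the kernel UAPI header */
    #endif

    /*
     * Hypothetical helper: returns 1 if fd is currently using MPTCP, 0 if it
     * is using plain TCP, and -1 if the query failed (errno is set by
     * getsockopt(), e.g. when the kernel does not know this option).
     */
    int socket_is_mptcp(int fd)
    {
        int val = 0;
        socklen_t len = sizeof(val);

        if (getsockopt(fd, IPPROTO_TCP, TCP_IS_MPTCP, &val, &len) < 0)
            return -1;

        return val != 0;
    }

A return value of ``-1`` should be treated as "unknown" rather than as "plain
TCP", since kernels that do not implement the option reject the call.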


Design choices
==============

A new socket type has been added for MPTCP for the userspace-facing socket. The
kernel is in charge of creating subflow sockets: they are TCP sockets where the
behavior is modified using TCP-ULP.

MPTCP listen sockets will create "plain" *accepted* TCP sockets if the
connection request from the client did not ask for MPTCP, making the performance
impact minimal when MPTCP is enabled by default.
157