.. SPDX-License-Identifier: GPL-2.0

===========
IPvs-sysctl
===========

/proc/sys/net/ipv4/vs/* Variables:
==================================

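Each variable below is exposed as a file under /proc/sys/net/ipv4/vs/.
The following is a minimal sketch of reading and writing one of them;
the helper names and the ``base`` parameter are hypothetical and exist
only so the code can be tried against a scratch directory.

```python
from pathlib import Path

# Directory named in the heading above. It can be overridden so the
# helpers can be exercised without touching a live system.
IPVS_SYSCTL_DIR = Path("/proc/sys/net/ipv4/vs")


def read_ipvs_sysctl(name, base=IPVS_SYSCTL_DIR):
    """Return the current value of one variable as a stripped string."""
    return (Path(base) / name).read_text().strip()


def write_ipvs_sysctl(name, value, base=IPVS_SYSCTL_DIR):
    """Write a new value; on a real system this needs sufficient privileges."""
    (Path(base) / name).write_text(f"{value}\n")
```

For example, ``read_ipvs_sysctl("am_droprate")`` would return the
current always-mode drop rate as a string on a host where these files
exist.
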
am_droprate - INTEGER
	default 10

	Sets the always-mode drop rate, which is used in mode 3 of
	the drop_packet defense.

amemthresh - INTEGER
	default 1024

	Sets the available memory threshold (in pages), which is
	used in the automatic modes of defense. When there is not
	enough available memory, the respective strategy is enabled
	and its variable is automatically set to 2; otherwise the
	strategy is disabled and the variable is set to 1.

backup_only - BOOLEAN
	- 0 - disabled (default)
	- not 0 - enabled

	If set, disable the director function while the server is
	in backup mode to avoid packet loops for DR/TUN methods.

conn_lfactor - INTEGER
	Possible values: -8 (larger table) .. 8 (smaller table)

	Default: -4

	Controls the sizing of the connection hash table based on the
	load factor (number of connections per table bucket):

		2^conn_lfactor = nodes / buckets

	As a result, the table grows if load increases and shrinks when
	load decreases, in the range of 2^8 to 2^conn_tab_bits (module
	parameter).
	The value is a shift count where negative values select
	buckets = (connection hash nodes << -value) while positive
	values select buckets = (connection hash nodes >> value).
	Negative values reduce collisions and lookup time but
	increase the table size. Positive values tolerate load above
	100% when a smaller table is preferred, at the cost of more
	collisions. If using NAT connections, consider decreasing the
	value by one because they add two nodes to the hash table.

	Example:
	-4: grow if load goes above 6.25% (buckets = nodes * 16)
	2: grow if load goes above 400% (buckets = nodes / 4)

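The shift rule for conn_lfactor can be sketched as follows. This is an
illustration of the arithmetic only: the function names are
hypothetical, and the clamping of the table size to the range
2^8 to 2^conn_tab_bits is deliberately left out.

```python
def buckets_for(nodes, conn_lfactor):
    """Bucket count selected by the shift rule described above."""
    if not -8 <= conn_lfactor <= 8:
        raise ValueError("conn_lfactor must be in -8 .. 8")
    if conn_lfactor < 0:
        # Negative values: more buckets than nodes (larger table).
        return nodes << -conn_lfactor
    # Zero or positive values: fewer buckets than nodes (smaller table).
    return nodes >> conn_lfactor


def grow_threshold_percent(conn_lfactor):
    """Load (nodes per bucket, in percent) above which the table grows."""
    return 100.0 * 2.0 ** conn_lfactor
```

With 1000 nodes, a value of -4 selects 16000 buckets and a value of 2
selects 250 buckets, matching the two examples above.
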
conn_max - INTEGER
	Limit for number of connections, per netns.

	Controls the soft and hard limit for number of connections.
	Initially, the platform specific limit is assigned for init_net.
	The value can be changed and later the soft limit propagated
	to other networking namespaces.

	Privileged admin can change both limits up to the value of the
	platform limit while the unprivileged admin can change only the
	soft limit up to the value of the hard limit.

	For setups using conntrack=1 (CONFIG_IP_VS_NFCT for
	Netfilter connection tracking) the connections can be
	limited also by nf_conntrack_max.

	Limits for init_net:

	======================= =============== =============
	\			soft limit	hard limit
	======================= =============== =============
	create netns		platform	platform
	priv admin		0 .. platform	0 .. platform
	======================= =============== =============

	Limits for new netns:

	======================= =============== =============
	\			soft limit	hard limit
	======================= =============== =============
	create netns		init_net:soft	init_net:soft
	priv admin		0 .. platform	0 .. platform
	unpriv admin		0 .. hard	N/A
	======================= =============== =============

	Limits per platform:

	- 1,073,741,824 (2^30 for 64-bit)
	- 16,777,216 (2^24 for 32-bit)

	Possible values: 0 .. platform limit

	Default: platform limit

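The soft-limit ranges in the tables above can be expressed as a small
range check. This is only a sketch of the documented ranges: the
function name is hypothetical, and how the kernel responds to an
out-of-range write (rejecting or clamping it) is not stated here, so
the code reports validity and nothing more.

```python
PLATFORM_LIMIT_64 = 1 << 30  # 1,073,741,824 on 64-bit platforms
PLATFORM_LIMIT_32 = 1 << 24  # 16,777,216 on 32-bit platforms


def soft_limit_in_range(requested, hard_limit, platform_limit, privileged):
    """Check a requested soft limit against the documented ranges.

    A privileged admin may go up to the platform limit; an unprivileged
    admin may go up to the current hard limit.
    """
    upper = platform_limit if privileged else hard_limit
    return 0 <= requested <= upper
```
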
conn_reuse_mode - INTEGER
	1 - default

	Controls how ipvs will deal with connections that are detected
	as port reuse. It is a bitmap, with the values being:

	0: disable any special handling on port reuse. The new
	connection will be delivered to the same real server that was
	servicing the previous connection.

	bit 1: enable rescheduling of new connections when it is safe.
	That is, whenever expire_nodest_conn is enabled and, for TCP
	sockets, when the connection is in TIME_WAIT state (which is
	only possible if you use NAT mode).

	bit 2: it is bit 1 plus, for TCP connections, when connections
	are in FIN_WAIT state, as this is the last state seen by the
	load balancer in Direct Routing mode. This bit helps when adding
	new real servers to a very busy cluster.

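The bitmap can be decoded as in the sketch below. It assumes that
"bit 1" and "bit 2" in the text correspond to the integer values 1 and
2; that mapping, the constant names, and the function name are
assumptions made for illustration.

```python
RESCHED_WHEN_SAFE = 1  # "bit 1" above, assumed to mean the value 1
RESCHED_FIN_WAIT = 2   # "bit 2" above, assumed to mean the value 2


def describe_conn_reuse_mode(mode):
    """Return the behaviours implied by a conn_reuse_mode value."""
    if mode == 0:
        return ["no special handling on port reuse"]
    notes = []
    # Bit 2 is described as "bit 1 plus ...", so either bit turns on
    # the reschedule-when-safe behaviour.
    if mode & (RESCHED_WHEN_SAFE | RESCHED_FIN_WAIT):
        notes.append("reschedule new connections when it is safe")
    if mode & RESCHED_FIN_WAIT:
        notes.append("also reschedule TCP connections in FIN_WAIT")
    return notes
```
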
conntrack - BOOLEAN
	- 0 - disabled (default)
	- not 0 - enabled

	If set, maintain connection tracking entries for
	connections handled by IPVS.

	This should be enabled if connections handled by IPVS are to be
	also handled by stateful firewall rules. That is, iptables rules
	that make use of connection tracking.  It is a performance
	optimisation to disable this setting otherwise.

	Connections handled by the IPVS FTP application module
	will have connection tracking entries regardless of this setting.

	Only available when IPVS is compiled with CONFIG_IP_VS_NFCT enabled.

cache_bypass - BOOLEAN
	- 0 - disabled (default)
	- not 0 - enabled

	If it is enabled, forward packets to the original destination
	directly when no cache server is available and the destination
	address is not local (iph->daddr is RTN_UNICAST). It is mostly
	used in transparent web cache clusters.

debug_level - INTEGER
	- 0          - transmission error messages (default)
	- 1          - non-fatal error messages
	- 2          - configuration
	- 3          - destination trash
	- 4          - drop entry
	- 5          - service lookup
	- 6          - scheduling
	- 7          - connection new/expire, lookup and synchronization
	- 8          - state transition
	- 9          - binding destination, template checks and applications
	- 10         - IPVS packet transmission
	- 11         - IPVS packet handling (ip_vs_in/ip_vs_out)
	- 12 or more - packet traversal

	Only available when IPVS is compiled with CONFIG_IP_VS_DEBUG enabled.

	Higher debugging levels include the messages for lower debugging
	levels, so setting debug level 2 includes level 0, 1 and 2
	messages. Thus, logging becomes more and more verbose the higher
	the level.

drop_entry - INTEGER
	- 0  - disabled (default)

	The drop_entry defense randomly drops entries in the
	connection hash table in order to reclaim some memory for
	new connections. In the current code, the drop_entry
	procedure can be activated every second; it then randomly
	scans 1/32 of the whole table and drops entries that are in
	the SYN-RECV/SYNACK state, which should be effective against
	SYN-flooding attacks.

	The valid values of drop_entry are from 0 to 3, where 0 means
	that this strategy is always disabled, 1 and 2 mean automatic
	modes (when there is not enough available memory, the strategy
	is enabled and the variable is automatically set to 2,
	otherwise the strategy is disabled and the variable is set to
	1), and 3 means that the strategy is always enabled.

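The four mode values shared by drop_entry, drop_packet and secure_tcp
can be modelled as a small state function. This sketch assumes that
"not enough available memory" means fewer available pages than
amemthresh; the function names are hypothetical.

```python
def next_defense_mode(mode, available_pages, amemthresh=1024):
    """Return the mode value after one check of available memory."""
    if mode not in (0, 1, 2, 3):
        raise ValueError("mode must be 0, 1, 2 or 3")
    if mode in (0, 3):
        # Always disabled or always enabled: memory does not matter.
        return mode
    # Automatic modes: 2 while memory is short, 1 otherwise.
    return 2 if available_pages < amemthresh else 1


def defense_active(mode):
    """The strategy is in effect in mode 2 (automatic) and mode 3."""
    return mode in (2, 3)
```
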
drop_packet - INTEGER
	- 0  - disabled (default)

	The drop_packet defense is designed to drop 1/rate packets
	before forwarding them to real servers. If the rate is 1, then
	all incoming packets are dropped.

	The value definition is the same as that of drop_entry. In
	the automatic mode, the rate is determined by the following
	formula: rate = amemthresh / (amemthresh - available_memory)
	when available memory is less than the available memory
	threshold. When mode 3 is set, the always-mode drop rate
	is controlled by /proc/sys/net/ipv4/vs/am_droprate.

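The automatic-mode formula for drop_packet can be worked through as
below. The sketch assumes integer division and non-negative page
counts, neither of which is stated in the text; the function name is
hypothetical.

```python
def auto_drop_rate(available_pages, amemthresh=1024):
    """Return the drop rate in automatic mode, or None if not dropping.

    One packet in every `rate` packets is dropped, so a rate of 1
    drops every packet.
    """
    if available_pages >= amemthresh:
        # Enough memory: the formula does not apply.
        return None
    # Integer division is an assumption made for this sketch.
    return amemthresh // (amemthresh - available_pages)
```

With the default threshold of 1024 pages, 768 available pages give a
rate of 4 (one packet in four dropped), 512 give a rate of 2, and 0
give a rate of 1 (all packets dropped).
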
est_cpulist - CPULIST
	Allowed CPUs for estimation kthreads

	Syntax: standard cpulist format
	empty list - stop kthread tasks and estimation
	default - the system's housekeeping CPUs for kthreads

	Example:
	"all": all possible CPUs
	"0-N": all possible CPUs, N denotes last CPU number
	"0,1-N:1/2": first and all CPUs with odd number
	"": empty list

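To make the est_cpulist examples concrete, here is a simplified parser
for the forms shown above: single CPUs, ranges, the ``:used/group``
suffix, ``N`` for the last CPU, ``all``, and the empty list. It is a
sketch, not the kernel's parser; the function name and the ``nr_cpus``
parameter are hypothetical, and no range validation is done.

```python
def parse_cpulist(text, nr_cpus):
    """Expand a cpulist string into a set of CPU numbers."""
    last = nr_cpus - 1
    text = text.strip()
    if text == "all":
        return set(range(nr_cpus))
    cpus = set()
    if not text:
        return cpus  # empty list

    def cpu_number(token):
        # "N" denotes the last CPU number.
        return last if token == "N" else int(token)

    for region in text.split(","):
        span, _, pattern = region.partition(":")
        first, _, end = span.partition("-")
        start = cpu_number(first)
        stop = cpu_number(end) if end else start
        if pattern:
            used, group = (int(part) for part in pattern.split("/"))
        else:
            used, group = 1, 1
        # Within each group of `group` CPUs, keep the first `used`.
        for cpu in range(start, stop + 1):
            if (cpu - start) % group < used:
                cpus.add(cpu)
    return cpus
```

On an 8-CPU system, ``parse_cpulist("0,1-N:1/2", 8)`` yields CPU 0
plus the odd-numbered CPUs 1, 3, 5 and 7.
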
est_nice - INTEGER
	default 0
	Valid range: -20 (more favorable) .. 19 (less favorable)

	Niceness value to use for the estimation kthreads (scheduling
	priority)

expire_nodest_conn - BOOLEAN
	- 0 - disabled (default)
	- not 0 - enabled

	The default value is 0: the load balancer silently drops
	packets when their destination server is not available. This
	may be useful when a user-space monitoring program deletes the
	destination server (because of server overload or wrong
	detection) and adds the server back later, so that connections
	to the server can continue.

	If this feature is enabled, the load balancer expires the
	connection immediately when a packet arrives and its
	destination server is not available; the client program
	is then notified that the connection is closed. This is
	equivalent to the feature some people require to flush
	connections when their destination is not available.

expire_quiescent_template - BOOLEAN
	- 0 - disabled (default)
	- not 0 - enabled

	When set to a non-zero value, the load balancer will expire
	persistent templates when the destination server is quiescent.
	This may be useful, when a user makes a destination server
	quiescent by setting its weight to 0 and it is desired that
	subsequent otherwise persistent connections are sent to a
	different destination server.  By default new persistent
	connections are allowed to quiescent destination servers.

	If this feature is enabled, the load balancer will expire the
	persistence template if it is to be used to schedule a new
	connection and the destination server is quiescent.

ignore_tunneled - BOOLEAN
	- 0 - disabled (default)
	- not 0 - enabled

	If set, ipvs will set the ipvs_property on all packets which are of
	unrecognized protocols.  This prevents us from routing tunneled
	protocols like ipip, which is useful to prevent rescheduling
	packets that have been tunneled to the ipvs host (i.e. to prevent
	ipvs routing loops when ipvs is also acting as a real server).

nat_icmp_send - BOOLEAN
	- 0 - disabled (default)
	- not 0 - enabled

	It controls sending ICMP error messages (ICMP_DEST_UNREACH)
	for VS/NAT when the load balancer receives packets from real
	servers but the connection entries don't exist.

pmtu_disc - BOOLEAN
	- 0 - disabled
	- not 0 - enabled (default)

	By default, reject with FRAG_NEEDED all DF packets that exceed
	the PMTU, irrespective of the forwarding method. For TUN method
	the flag can be disabled to fragment such packets.

secure_tcp - INTEGER
	- 0  - disabled (default)

	The secure_tcp defense is to use a more complicated TCP state
	transition table. For VS/NAT, it also delays entering the
	TCP ESTABLISHED state until the three-way handshake is completed.

	The value definition is the same as that of drop_entry and
	drop_packet.

svc_lfactor - INTEGER
	Possible values: -8 (larger table) .. 8 (smaller table)

	Default: -3

	Controls the sizing of the service hash table based on the
	load factor (number of services per table bucket). The table
	will grow and shrink in the range of 2^4 to 2^20.
	See conn_lfactor for explanation.

sync_threshold - vector of 2 INTEGERs: sync_threshold, sync_period
	default 3 50

	Sets the synchronization threshold, which is the minimum number
	of incoming packets that a connection needs to receive before
	the connection will be synchronized. A connection will be
	synchronized every time the number of its incoming packets
	modulo sync_period equals the threshold. The range of the
	threshold is from 0 to sync_period.

	When sync_period and sync_refresh_period are 0, send sync only
	for state changes or only once when pkts matches sync_threshold.

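The modulo rule for sync_threshold can be sketched as a predicate on
the packet count. The sketch covers only the packet-count condition:
state changes and sync_refresh_period are ignored, and the function
name is hypothetical.

```python
def should_sync(pkts, sync_threshold=3, sync_period=50):
    """Decide from the incoming packet count alone whether to sync."""
    if sync_period <= 0:
        # With a period of 0, sync once when pkts matches the threshold.
        return pkts == sync_threshold
    return pkts % sync_period == sync_threshold
```

With the defaults of 3 and 50, a connection would be synchronized at
packet counts 3, 53, 103 and so on.
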
sync_refresh_period - UNSIGNED INTEGER
	default 0

	In seconds, difference in reported connection timer that triggers
	new sync message. It can be used to avoid sync messages for the
	specified period (or half of the connection timeout if it is lower)
	if the connection state has not changed since the last sync.

	This is useful for normal connections with high traffic to reduce
	sync rate. Additionally, retry sync_retries times with period of
	sync_refresh_period/8.

sync_retries - INTEGER
	default 0

	Defines sync retries with period of sync_refresh_period/8. Useful
	to protect against loss of sync messages. The range of the
	sync_retries is from 0 to 3.

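The retry spacing implied by sync_refresh_period and sync_retries can
be computed as below. The sketch assumes the retries follow the
initial sync message at multiples of sync_refresh_period/8 and that
fractional seconds are acceptable; both points, and the function name,
are assumptions.

```python
def retry_offsets(sync_refresh_period, sync_retries):
    """Seconds after the initial sync at which each retry would occur."""
    if not 0 <= sync_retries <= 3:
        raise ValueError("sync_retries must be in 0 .. 3")
    step = sync_refresh_period / 8
    return [step * attempt for attempt in range(1, sync_retries + 1)]
```
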
sync_qlen_max - UNSIGNED LONG

	Hard limit for queued sync messages that are not sent yet. It
	defaults to 1/32 of the memory pages but actually represents
	number of messages. It will protect us from allocating large
	parts of memory when the sending rate is lower than the queuing
	rate.

sync_sock_size - INTEGER
	default 0

	Configuration of SNDBUF (master) or RCVBUF (slave) socket limit.
	Default value is 0 (preserve system defaults).

sync_ports - INTEGER
	default 1

	The number of threads that master and backup servers can use for
	sync traffic. Every thread will use a single UDP port: thread 0
	will use the default port 8848 while the last thread will use
	port 8848+sync_ports-1.

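The port assignment for sync threads follows directly from the text
and can be listed as below; the function name and the ``base_port``
parameter are hypothetical.

```python
def sync_port_numbers(sync_ports=1, base_port=8848):
    """UDP port used by each sync thread, indexed by thread number."""
    return [base_port + thread for thread in range(sync_ports)]
```

With sync_ports set to 4, threads 0 to 3 would use ports 8848 to 8851.
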
snat_reroute - BOOLEAN
	- 0 - disabled
	- not 0 - enabled (default)

	If enabled, recalculate the route of SNATed packets from
	realservers so that they are routed as if they originate from the
	director. Otherwise they are routed as if they are forwarded by the
	director.

	If policy routing is in effect then it is possible that a packet
	originating from the director is routed differently from a
	packet being forwarded by the director.

	If policy routing is not in effect then the recalculated route will
	always be the same as the original route so it is an optimisation
	to disable snat_reroute and avoid the recalculation.

sync_persist_mode - INTEGER
	default 0

	Controls the synchronisation of connections when using persistence

	0: All types of connections are synchronised

	1: Attempt to reduce the synchronisation traffic depending on
	the connection type. For persistent services avoid synchronisation
	for normal connections, do it only for persistence templates.
	In such case, for TCP and SCTP it may need enabling sloppy_tcp and
	sloppy_sctp flags on backup servers. For non-persistent services
	such optimization is not applied, mode 0 is assumed.

sync_version - INTEGER
	default 1

	The version of the synchronisation protocol used when sending
	synchronisation messages.

	0 selects the original synchronisation protocol (version 0). This
	should be used when sending synchronisation messages to a legacy
	system that only understands the original synchronisation protocol.

	1 selects the current synchronisation protocol (version 1). This
	should be used where possible.

	Kernels with this sync_version entry are able to receive messages
	of both version 0 and version 1 of the synchronisation protocol.

run_estimation - BOOLEAN
	- 0 - disabled
	- not 0 - enabled (default)

	If disabled, the estimation will be suspended and kthread tasks
	stopped.

	You can always re-enable estimation by setting this value to 1.
	But be careful, the first estimation after re-enabling is not
	accurate.