.. SPDX-License-Identifier: GPL-2.0

===========
IPvs-sysctl
===========

/proc/sys/net/ipv4/vs/* Variables:
==================================

am_droprate - INTEGER
        default 10

        It sets the always mode drop rate, which is used in mode 3
        of the drop_packet defense.

amemthresh - INTEGER
        default 1024

        It sets the available memory threshold (in pages), which is
        used in the automatic modes of defense. When there is not
        enough available memory, the respective strategy will be
        enabled and the variable is automatically set to 2, otherwise
        the strategy is disabled and the variable is set to 1.

backup_only - BOOLEAN
        - 0 - disabled (default)
        - not 0 - enabled

        If set, disable the director function while the server is
        in backup mode to avoid packet loops for DR/TUN methods.

conn_lfactor - INTEGER
        Possible values: -8 (larger table) .. 8 (smaller table)

        Default: -4

        Controls the sizing of the connection hash table based on the
        load factor (number of connections per table bucket):

                2^conn_lfactor = nodes / buckets

        As a result, the table grows if load increases and shrinks when
        load decreases in the range of 2^8 - 2^conn_tab_bits (module
        parameter).
        The value is a shift count where negative values select
        buckets = (connection hash nodes << -value) while positive
        values select buckets = (connection hash nodes >> value). The
        negative values reduce the collisions and reduce the time for
        lookups but increase the table size. Positive values will
        tolerate load above 100% when using a smaller table is
        preferred, at the cost of more collisions. If using NAT
        connections consider decreasing the value by one because
        they add two nodes in the hash table.

        Example:
        -4: grow if load goes above 6% (buckets = nodes * 16)
        2: grow if load goes above 400% (buckets = nodes / 4)

conn_max - INTEGER
        Limit for number of connections, per netns.

        Controls the soft and hard limit for number of connections.
        Initially, the platform specific limit is assigned for init_net.
        The value can be changed and later the soft limit is propagated
        to other networking namespaces.

        A privileged admin can change both limits up to the value of the
        platform limit while an unprivileged admin can change only the
        soft limit up to the value of the hard limit.

        For setups using conntrack=1 (CONFIG_IP_VS_NFCT for
        Netfilter connection tracking) the connections can be
        limited also by nf_conntrack_max.

        ::

                              soft limit       hard limit
          =====================================================
          init_net:
            create netns      platform         platform
            priv admin        0 .. platform    0 .. platform
          =====================================================
          new netns:
            create netns      init_net:soft    init_net:soft
            priv admin        0 .. platform    0 .. platform
            unpriv admin      0 .. hard        N/A

        Limits per platform:
        1,073,741,824 (2^30 for 64-bit)
        16,777,216 (2^24 for 32-bit)

        Possible values: 0 .. platform limit

        Default: platform limit

conn_reuse_mode - INTEGER
        1 - default

        Controls how ipvs will deal with connections for which port
        reuse is detected. It is a bitmap, with the values being:

        0: disable any special handling on port reuse. The new
        connection will be delivered to the same real server that was
        servicing the previous connection.

        bit 1: enable rescheduling of new connections when it is safe.
        That is, whenever expire_nodest_conn is set and, for TCP sockets,
        when the connection is in TIME_WAIT state (which is only possible
        if you use NAT mode).

        bit 2: it is bit 1 plus, for TCP connections, when connections
        are in FIN_WAIT state, as this is the last state seen by the load
        balancer in Direct Routing mode. This bit helps when adding new
        real servers to a very busy cluster.

conntrack - BOOLEAN
        - 0 - disabled (default)
        - not 0 - enabled

        If set, maintain connection tracking entries for
        connections handled by IPVS.

        This should be enabled if connections handled by IPVS are to be
        also handled by stateful firewall rules. That is, iptables rules
        that make use of connection tracking. It is a performance
        optimisation to disable this setting otherwise.

        Connections handled by the IPVS FTP application module
        will have connection tracking entries regardless of this setting.

        Only available when IPVS is compiled with CONFIG_IP_VS_NFCT enabled.

cache_bypass - BOOLEAN
        - 0 - disabled (default)
        - not 0 - enabled

        If it is enabled, forward packets to the original destination
        directly when no cache server is available and the destination
        address is not local (iph->daddr is RTN_UNICAST). It is mostly
        used in transparent web cache clusters.

debug_level - INTEGER
        - 0 - transmission error messages (default)
        - 1 - non-fatal error messages
        - 2 - configuration
        - 3 - destination trash
        - 4 - drop entry
        - 5 - service lookup
        - 6 - scheduling
        - 7 - connection new/expire, lookup and synchronization
        - 8 - state transition
        - 9 - binding destination, template checks and applications
        - 10 - IPVS packet transmission
        - 11 - IPVS packet handling (ip_vs_in/ip_vs_out)
        - 12 or more - packet traversal

        Only available when IPVS is compiled with CONFIG_IP_VS_DEBUG enabled.

        Higher debugging levels include the messages for lower debugging
        levels, so setting debug level 2 includes level 0, 1 and 2
        messages. Thus, logging becomes more and more verbose the higher
        the level.

drop_entry - INTEGER
        - 0 - disabled (default)

        The drop_entry defense is to randomly drop entries in the
        connection hash table, just in order to collect back some
        memory for new connections.
        In the current code, the
        drop_entry procedure can be activated every second, then it
        randomly scans 1/32 of the whole table and drops entries that
        are in the SYN-RECV/SYNACK state, which should be effective
        against syn-flooding attacks.

        The valid values of drop_entry are from 0 to 3, where 0 means
        that this strategy is always disabled, 1 and 2 mean automatic
        modes (when there is not enough available memory, the strategy
        is enabled and the variable is automatically set to 2,
        otherwise the strategy is disabled and the variable is set to
        1), and 3 means that the strategy is always enabled.

drop_packet - INTEGER
        - 0 - disabled (default)

        The drop_packet defense is designed to drop 1/rate packets
        before forwarding them to real servers. If the rate is 1, then
        all the incoming packets are dropped.

        The value definition is the same as that of drop_entry. In
        the automatic mode, the rate is determined by the following
        formula: rate = amemthresh / (amemthresh - available_memory)
        when available memory is less than the available memory
        threshold. When mode 3 is set, the always mode drop rate
        is controlled by /proc/sys/net/ipv4/vs/am_droprate.

est_cpulist - CPULIST
        Allowed CPUs for estimation kthreads

        Syntax: standard cpulist format
        empty list - stop kthread tasks and estimation
        default - the system's housekeeping CPUs for kthreads

        Example:
        "all": all possible CPUs
        "0-N": all possible CPUs, N denotes last CPU number
        "0,1-N:1/2": first and all CPUs with odd number
        "": empty list

est_nice - INTEGER
        default 0
        Valid range: -20 (more favorable) ..
        19 (less favorable)

        Niceness value to use for the estimation kthreads (scheduling
        priority)

expire_nodest_conn - BOOLEAN
        - 0 - disabled (default)
        - not 0 - enabled

        With the default value of 0, the load balancer will silently
        drop packets when their destination server is not available.
        This may be useful when a user-space monitoring program deletes
        the destination server (because of server overload or wrong
        detection) and adds the server back later, so that the
        connections to the server can continue.

        If this feature is enabled, the load balancer will expire the
        connection immediately when a packet arrives and its
        destination server is not available, then the client program
        will be notified that the connection is closed. This is
        equivalent to the feature some people require to flush
        connections when their destination is not available.

expire_quiescent_template - BOOLEAN
        - 0 - disabled (default)
        - not 0 - enabled

        When set to a non-zero value, the load balancer will expire
        persistent templates when the destination server is quiescent.
        This may be useful when a user makes a destination server
        quiescent by setting its weight to 0 and it is desired that
        subsequent otherwise persistent connections are sent to a
        different destination server. By default new persistent
        connections are allowed to quiescent destination servers.

        If this feature is enabled, the load balancer will expire the
        persistence template if it is to be used to schedule a new
        connection and the destination server is quiescent.

ignore_tunneled - BOOLEAN
        - 0 - disabled (default)
        - not 0 - enabled

        If set, ipvs will set the ipvs_property on all packets which are of
        unrecognized protocols. This prevents us from routing tunneled
        protocols like ipip, which is useful to prevent rescheduling
        packets that have been tunneled to the ipvs host (i.e.
        to prevent
        ipvs routing loops when ipvs is also acting as a real server).

nat_icmp_send - BOOLEAN
        - 0 - disabled (default)
        - not 0 - enabled

        It controls sending icmp error messages (ICMP_DEST_UNREACH)
        for VS/NAT when the load balancer receives packets from real
        servers but the connection entries don't exist.

pmtu_disc - BOOLEAN
        - 0 - disabled
        - not 0 - enabled (default)

        By default, reject with FRAG_NEEDED all DF packets that exceed
        the PMTU, irrespective of the forwarding method. For the TUN
        method the flag can be disabled to fragment such packets.

secure_tcp - INTEGER
        - 0 - disabled (default)

        The secure_tcp defense is to use a more complicated TCP state
        transition table. For VS/NAT, it also delays entering the
        TCP ESTABLISHED state until the three-way handshake is completed.

        The value definition is the same as that of drop_entry and
        drop_packet.

svc_lfactor - INTEGER
        Possible values: -8 (larger table) .. 8 (smaller table)

        Default: -3

        Controls the sizing of the service hash table based on the
        load factor (number of services per table bucket). The table
        will grow and shrink in the range of 2^4 - 2^20.
        See conn_lfactor for explanation.

sync_threshold - vector of 2 INTEGERs: sync_threshold, sync_period
        default 3 50

        It sets the synchronization threshold, which is the minimum
        number of incoming packets that a connection needs to receive
        before the connection will be synchronized. A connection will be
        synchronized every time the number of its incoming packets
        modulo sync_period equals the threshold. The range of the
        threshold is from 0 to sync_period.
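
        As an illustration of the modulus rule above, the following
        minimal Python sketch lists the packet counts at which a
        connection would be synchronized. It is not the kernel
        implementation: the function name ``sync_points`` and its
        arguments are hypothetical, and it assumes sync_period > 0
        while ignoring state changes, sync_refresh_period and
        sync_retries.

        .. code-block:: python

            def sync_points(sync_threshold, sync_period, max_pkts):
                """Packet counts in 1..max_pkts that satisfy
                pkts % sync_period == sync_threshold."""
                if sync_period <= 0:
                    raise ValueError("this sketch only covers sync_period > 0")
                return [pkts for pkts in range(1, max_pkts + 1)
                        if pkts % sync_period == sync_threshold]

            # Default setting "3 50" over the first 120 incoming packets.
            print(sync_points(3, 50, 120))  # [3, 53, 103]

        With the default "3 50", the sketch reports a sync on the 3rd
        incoming packet and then once every 50 packets.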

        When sync_period and sync_refresh_period are 0, send sync only
        for state changes or only once when pkts matches sync_threshold.

sync_refresh_period - UNSIGNED INTEGER
        default 0

        In seconds, difference in reported connection timer that triggers
        a new sync message. It can be used to avoid sync messages for the
        specified period (or half of the connection timeout if it is lower)
        if the connection state has not changed since the last sync.

        This is useful for normal connections with high traffic to reduce
        the sync rate. Additionally, retry sync_retries times with a
        period of sync_refresh_period/8.

sync_retries - INTEGER
        default 0

        Defines sync retries with a period of sync_refresh_period/8.
        Useful to protect against loss of sync messages. The range of
        sync_retries is from 0 to 3.

sync_qlen_max - UNSIGNED LONG

        Hard limit for queued sync messages that are not sent yet. It
        defaults to 1/32 of the memory pages but actually represents
        a number of messages. It will protect us from allocating large
        parts of memory when the sending rate is lower than the queuing
        rate.

sync_sock_size - INTEGER
        default 0

        Configuration of SNDBUF (master) or RCVBUF (slave) socket limit.
        Default value is 0 (preserve system defaults).

sync_ports - INTEGER
        default 1

        The number of threads that master and backup servers can use for
        sync traffic. Every thread will use a single UDP port: thread 0
        will use the default port 8848 while the last thread will use
        port 8848+sync_ports-1.

snat_reroute - BOOLEAN
        - 0 - disabled
        - not 0 - enabled (default)

        If enabled, recalculate the route of SNATed packets from
        realservers so that they are routed as if they originate from the
        director. Otherwise they are routed as if they are forwarded by the
        director.

        If policy routing is in effect then it is possible that a packet
        originating from the director is routed differently to a
        packet being forwarded by the director.

        If policy routing is not in effect then the recalculated route will
        always be the same as the original route so it is an optimisation
        to disable snat_reroute and avoid the recalculation.

sync_persist_mode - INTEGER
        default 0

        Controls the synchronisation of connections when using persistence.

        0: All types of connections are synchronised

        1: Attempt to reduce the synchronisation traffic depending on
        the connection type. For persistent services avoid synchronisation
        for normal connections, do it only for persistence templates.
        In such a case, for TCP and SCTP it may be necessary to enable the
        sloppy_tcp and sloppy_sctp flags on backup servers. For
        non-persistent services such optimization is not applied, mode 0
        is assumed.

sync_version - INTEGER
        default 1

        The version of the synchronisation protocol used when sending
        synchronisation messages.

        0 selects the original synchronisation protocol (version 0). This
        should be used when sending synchronisation messages to a legacy
        system that only understands the original synchronisation protocol.

        1 selects the current synchronisation protocol (version 1). This
        should be used where possible.

        Kernels with this sync_version entry are able to receive messages
        of both version 0 and version 1 of the synchronisation protocol.

run_estimation - BOOLEAN
        - 0 - disabled
        - not 0 - enabled (default)

        If disabled, the estimation will be suspended and kthread tasks
        stopped.

        You can always re-enable estimation by setting this value to 1.
        But be careful, the first estimation after re-enabling is not
        accurate.
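
The shift-count rule described under conn_lfactor (and reused by
svc_lfactor) can be illustrated with a minimal Python sketch. It is not
the kernel implementation: the helper names ``buckets_for`` and
``grow_load_percent`` are hypothetical, and the sketch ignores the
clamping of the table size to its documented minimum and maximum.

.. code-block:: python

    def buckets_for(nodes, lfactor):
        """Bucket count selected for a given number of hash nodes."""
        if not -8 <= lfactor <= 8:
            raise ValueError("lfactor must be in -8 .. 8")
        if lfactor < 0:
            return nodes << -lfactor   # larger table, fewer collisions
        return nodes >> lfactor        # smaller table, more collisions

    def grow_load_percent(lfactor):
        """Load (nodes per bucket, in percent) above which the table grows."""
        return 100.0 * 2.0 ** lfactor

    print(buckets_for(1024, -4), grow_load_percent(-4))  # 16384 6.25
    print(buckets_for(1024, 2), grow_load_percent(2))    # 256 400.0

The two calls reproduce the documented examples: a value of -4 gives
buckets = nodes * 16 (grow above roughly 6% load) and a value of 2 gives
buckets = nodes / 4 (grow above 400% load).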