1.\" Copyright (c) 2001, Matthew Dillon. Terms and conditions are those of 2.\" the BSD Copyright as specified in the file "/usr/src/COPYRIGHT" in 3.\" the source tree. 4.\" 5.\" $FreeBSD$ 6.\" 7.Dd June 25, 2002 8.Dt TUNING 7 9.Os 10.Sh NAME 11.Nm tuning 12.Nd performance tuning under FreeBSD 13.Sh SYSTEM SETUP - DISKLABEL, NEWFS, TUNEFS, SWAP 14When using 15.Xr bsdlabel 8 16or 17.Xr sysinstall 8 18to lay out your file systems on a hard disk it is important to remember 19that hard drives can transfer data much more quickly from outer tracks 20than they can from inner tracks. 21To take advantage of this you should 22try to pack your smaller file systems and swap closer to the outer tracks, 23follow with the larger file systems, and end with the largest file systems. 24It is also important to size system standard file systems such that you 25will not be forced to resize them later as you scale the machine up. 26I usually create, in order, a 128M root, 1G swap, 128M 27.Pa /var , 28128M 29.Pa /var/tmp , 303G 31.Pa /usr , 32and use any remaining space for 33.Pa /home . 34.Pp 35You should typically size your swap space to approximately 2x main memory. 36If you do not have a lot of RAM, though, you will generally want a lot 37more swap. 38It is not recommended that you configure any less than 39256M of swap on a system and you should keep in mind future memory 40expansion when sizing the swap partition. 41The kernel's VM paging algorithms are tuned to perform best when there is 42at least 2x swap versus main memory. 43Configuring too little swap can lead 44to inefficiencies in the VM page scanning code as well as create issues 45later on if you add more memory to your machine. 46Finally, on larger systems 47with multiple SCSI disks (or multiple IDE disks operating on different 48controllers), we strongly recommend that you configure swap on each drive. 49The swap partitions on the drives should be approximately the same size. 50The kernel can handle arbitrary sizes but 51internal data structures scale to 4 times the largest swap partition. 52Keeping 53the swap partitions near the same size will allow the kernel to optimally 54stripe swap space across the N disks. 55Do not worry about overdoing it a 56little, swap space is the saving grace of 57.Ux 58and even if you do not normally use much swap, it can give you more time to 59recover from a runaway program before being forced to reboot. 60.Pp 61How you size your 62.Pa /var 63partition depends heavily on what you intend to use the machine for. 64This 65partition is primarily used to hold mailboxes, the print spool, and log 66files. 67Some people even make 68.Pa /var/log 69its own partition (but except for extreme cases it is not worth the waste 70of a partition ID). 71If your machine is intended to act as a mail 72or print server, 73or you are running a heavily visited web server, you should consider 74creating a much larger partition \(en perhaps a gig or more. 75It is very easy 76to underestimate log file storage requirements. 77.Pp 78Sizing 79.Pa /var/tmp 80depends on the kind of temporary file usage you think you will need. 81128M is 82the minimum we recommend. 83Also note that sysinstall will create a 84.Pa /tmp 85directory. 86Dedicating a partition for temporary file storage is important for 87two reasons: first, it reduces the possibility of file system corruption 88in a crash, and second it reduces the chance of a runaway process that 89fills up 90.Oo Pa /var Oc Ns Pa /tmp 91from blowing up more critical subsystems (mail, 92logging, etc). 
.Pp
You should typically size your swap space to approximately 2x main memory.
If you do not have a lot of RAM, though, you will generally want a lot
more swap.
It is not recommended that you configure any less than
256M of swap on a system and you should keep in mind future memory
expansion when sizing the swap partition.
The kernel's VM paging algorithms are tuned to perform best when there is
at least 2x swap versus main memory.
Configuring too little swap can lead
to inefficiencies in the VM page scanning code as well as create issues
later on if you add more memory to your machine.
Finally, on larger systems
with multiple SCSI disks (or multiple IDE disks operating on different
controllers), we strongly recommend that you configure swap on each drive.
The swap partitions on the drives should be approximately the same size.
The kernel can handle arbitrary sizes but
internal data structures scale to 4 times the largest swap partition.
Keeping
the swap partitions near the same size will allow the kernel to optimally
stripe swap space across the N disks.
Do not worry about overdoing it a
little, swap space is the saving grace of
.Ux
and even if you do not normally use much swap, it can give you more time to
recover from a runaway program before being forced to reboot.
.Pp
How you size your
.Pa /var
partition depends heavily on what you intend to use the machine for.
This
partition is primarily used to hold mailboxes, the print spool, and log
files.
Some people even make
.Pa /var/log
its own partition (but except for extreme cases it is not worth the waste
of a partition ID).
If your machine is intended to act as a mail
or print server,
or you are running a heavily visited web server, you should consider
creating a much larger partition \(en perhaps a gig or more.
It is very easy
to underestimate log file storage requirements.
.Pp
Sizing
.Pa /var/tmp
depends on the kind of temporary file usage you think you will need.
128M is
the minimum we recommend.
Also note that sysinstall will create a
.Pa /tmp
directory.
Dedicating a partition for temporary file storage is important for
two reasons: first, it reduces the possibility of file system corruption
in a crash, and second, it reduces the chance of a runaway process that
fills up
.Oo Pa /var Oc Ns Pa /tmp
from blowing up more critical subsystems (mail,
logging, etc).
Filling up
.Oo Pa /var Oc Ns Pa /tmp
is a very common problem to have.
.Pp
In the old days there were differences between
.Pa /tmp
and
.Pa /var/tmp ,
but the introduction of
.Pa /var
(and
.Pa /var/tmp )
led to massive confusion
by program writers so today programs haphazardly use one or the
other and thus no real distinction can be made between the two.
So it makes sense to have just one temporary directory and
softlink to it from the other
.Pa tmp
directory locations.
However you handle
.Pa /tmp ,
the one thing you do not want to do is leave it sitting
on the root partition where it might cause root to fill up or possibly
corrupt root in a crash/reboot situation.
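.Pp
For example, assuming
.Pa /var/tmp
is the real directory and
.Pa /tmp
is an ordinary, unused directory on the root partition, you could replace
it with a softlink (a sketch only; verify that nothing is using
.Pa /tmp
before removing it):
.Bd -literal -offset indent
# rm -rf /tmp
# ln -s /var/tmp /tmp
.Ed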
.Pp
The
.Pa /usr
partition holds the bulk of the files required to support the system and
a subdirectory within it called
.Pa /usr/local
holds the bulk of the files installed from the
.Xr ports 7
hierarchy.
If you do not use ports all that much and do not intend to keep
system source
.Pq Pa /usr/src
on the machine, you can get away with
a 1 gigabyte
.Pa /usr
partition.
However, if you install a lot of ports
(especially window managers and Linux-emulated binaries), we recommend
at least a 2 gigabyte
.Pa /usr
and if you also intend to keep system source
on the machine, we recommend a 3 gigabyte
.Pa /usr .
Do not underestimate the
amount of space you will need in this partition, it can creep up and
surprise you!
.Pp
The
.Pa /home
partition is typically used to hold user-specific data.
I usually size it to the remainder of the disk.
.Pp
Why partition at all?
Why not create one big
.Pa /
partition and be done with it?
Then I do not have to worry about undersizing things!
Well, there are several reasons this is not a good idea.
First,
each partition has different operational characteristics and separating them
allows the file system to tune itself to those characteristics.
For example,
the root and
.Pa /usr
partitions are read-mostly, with very little writing, while
a lot of reading and writing could occur in
.Pa /var
and
.Pa /var/tmp .
By properly
partitioning your system, fragmentation introduced in the smaller more
heavily write-loaded partitions will not bleed over into the mostly-read
partitions.
Additionally, keeping the write-loaded partitions closer to
the edge of the disk (i.e., before the really big partitions instead of after
in the partition table) will increase I/O performance in the partitions
where you need it the most.
Now it is true that you might also need I/O
performance in the larger partitions, but they are so large that shifting
them more towards the edge of the disk will not lead to a significant
performance improvement whereas moving
.Pa /var
to the edge can have a huge impact.
Finally, there are safety concerns.
Having a small neat root partition that
is essentially read-only gives it a greater chance of surviving a bad crash
intact.
.Pp
Properly partitioning your system also allows you to tune
.Xr newfs 8
and
.Xr tunefs 8
parameters.
Tuning
.Xr newfs 8
requires more experience but can lead to significant improvements in
performance.
There are three parameters that are relatively safe to tune:
.Em blocksize , bytes/i-node ,
and
.Em cylinders/group .
.Pp
.Fx
performs best when using 8K or 16K file system block sizes.
The default file system block size is 16K,
which provides best performance for most applications,
with the exception of those that perform random access on large files
(such as database server software).
Such applications tend to perform better with a smaller block size,
although modern disk characteristics are such that the performance
gain from using a smaller block size may not be worth consideration.
Using a block size larger than 16K
can cause fragmentation of the buffer cache and
lead to lower performance.
.Pp
The defaults may be unsuitable
for a file system that requires a very large number of i-nodes
or is intended to hold a large number of very small files.
Such a file system should be created with an 8K or 4K block size.
This also requires you to specify a smaller
fragment size.
We recommend always using a fragment size that is 1/8
the block size (less testing has been done on other fragment size factors).
The
.Xr newfs 8
options for this would be
.Dq Li "newfs -f 1024 -b 8192 ..." .
.Pp
If a large partition is intended to be used to hold fewer, larger files, such
as database files, you can increase the
.Em bytes/i-node
ratio which reduces the number of i-nodes (maximum number of files and
directories that can be created) for that partition.
Decreasing the number
of i-nodes in a file system can greatly reduce
.Xr fsck 8
recovery times after a crash.
Do not use this option
unless you are actually storing large files on the partition, because if you
overcompensate you can wind up with a file system that has lots of free
space remaining but cannot accommodate any more files.
Using 32768, 65536, or 262144 bytes/i-node is recommended.
You can go higher but
it will have only incremental effects on
.Xr fsck 8
recovery times.
For example,
.Dq Li "newfs -i 32768 ..." .
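.Pp
These options may be combined.
As an illustrative sketch only (the device name is hypothetical, and the
.Fl U
softupdates option requires
.Fx 4.5
or later), a partition intended for large database files might be created
with:
.Bd -literal -offset indent
# newfs -U -b 16384 -f 2048 -i 65536 /dev/da0s1e
.Ed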
.Pp
.Xr tunefs 8
may be used to further tune a file system.
This command can be run in
single-user mode without having to reformat the file system.
However, this is possibly the most abused program in the system.
Many people attempt to
increase available file system space by setting the min-free percentage to 0.
This can lead to severe file system fragmentation and we do not recommend
that you do this.
Really the only
.Xr tunefs 8
option worthwhile here is turning on
.Em softupdates
with
.Dq Li "tunefs -n enable /filesystem" .
(Note: in
.Fx 4.5
and later, softupdates can be turned on using the
.Fl U
option to
.Xr newfs 8 ,
and
.Xr sysinstall 8
will typically enable softupdates automatically for non-root file systems).
Softupdates drastically improves meta-data performance, mainly file
creation and deletion.
We recommend enabling softupdates on most file systems; however, there
are two limitations to softupdates that you should be aware of when
determining whether to use it on a file system.
First, softupdates guarantees file system consistency in the
case of a crash but could very easily be several seconds (even a minute!\&)
behind on pending writes to the physical disk.
If you crash you may lose more work
than otherwise.
Secondly, softupdates delays the freeing of file system
blocks.
If you have a file system (such as the root file system) which is
close to full, doing a major update of it, e.g.\&
.Dq Li "make installworld" ,
can run it out of space and cause the update to fail.
For this reason, softupdates will not be enabled on the root file system
during a typical install.
There is no loss of performance since the root
file system is rarely written to.
.Pp
A number of run-time
.Xr mount 8
options exist that can help you tune the system.
The most obvious and most dangerous one is
.Cm async .
Do not ever use it; it is far too dangerous.
A less dangerous and more
useful
.Xr mount 8
option is called
.Cm noatime .
.Ux
file systems normally update the last-accessed time of a file or
directory whenever it is accessed.
This operation is handled in
.Fx
with a delayed write and normally does not create a burden on the system.
However, if your system is accessing a huge number of files on a continuing
basis the buffer cache can wind up getting polluted with atime updates,
creating a burden on the system.
For example, if you are running a heavily
loaded web site, or a news server with lots of readers, you might want to
consider turning off atime updates on your larger partitions with this
.Xr mount 8
option.
However, you should not gratuitously turn off atime
updates everywhere.
For example, the
.Pa /var
file system customarily
holds mailboxes, and atime (in combination with mtime) is used to
determine whether a mailbox has new mail.
You might as well leave
atime turned on for mostly read-only partitions such as
.Pa /
and
.Pa /usr
as well.
This is especially useful for
.Pa /
since some system utilities
use the atime field for reporting.
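.Pp
For example, to try
.Cm noatime
on a mounted file system and then make it permanent (the
.Pa /home
target and device name are illustrative):
.Bd -literal -offset indent
# mount -u -o noatime /home
.Ed
.Pp
The corresponding
.Xr fstab 5
entry would resemble:
.Bd -literal -offset indent
/dev/ad0s1g   /home   ufs   rw,noatime   2   2
.Ed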
.Sh STRIPING DISKS
In larger systems you can stripe partitions from several drives together
to create a much larger overall partition.
Striping can also improve
the performance of a file system by splitting I/O operations across two
or more disks.
The
.Xr vinum 8
and
.Xr ccdconfig 8
utilities may be used to create simple striped file systems.
Generally
speaking, striping smaller partitions such as the root and
.Pa /var/tmp ,
or essentially read-only partitions such as
.Pa /usr
is a complete waste of time.
You should only stripe partitions that require serious I/O performance,
typically
.Pa /var , /home ,
or custom partitions used to hold databases and web pages.
Choosing the proper stripe size is also
important.
File systems tend to store meta-data on power-of-2 boundaries
and you usually want to reduce seeking rather than increase seeking.
This
means you want to use a large off-center stripe size such as 1152 sectors
so sequential I/O does not seek both disks and so meta-data is distributed
across both disks rather than concentrated on a single disk.
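.Pp
As a sketch, a two-disk stripe with a 1152-sector interleave might be
assembled with
.Xr ccdconfig 8
as follows (the component device names are illustrative):
.Bd -literal -offset indent
# ccdconfig ccd0 1152 none /dev/da0s1e /dev/da1s1e
# newfs /dev/ccd0c
.Ed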
.Pp
If you really need to get sophisticated, we recommend using a real hardware
RAID controller from the list of
.Fx
supported controllers.
.Sh SYSCTL TUNING
.Xr sysctl 8
variables permit system behavior to be monitored and controlled at
run-time.
Some sysctls simply report on the behavior of the system; others allow
the system behavior to be modified;
some may be set at boot time using
.Xr rc.conf 5 ,
but most will be set via
.Xr sysctl.conf 5 .
There are several hundred sysctls in the system, including many that appear
to be candidates for tuning but actually are not.
In this document we will only cover the ones that have the greatest effect
on the system.
.Pp
The
.Va kern.ipc.shm_use_phys
sysctl defaults to 0 (off) and may be set to 0 (off) or 1 (on).
Setting
this parameter to 1 will cause all System V shared memory segments to be
mapped to unpageable physical RAM.
This feature only has an effect if you
are either (A) mapping small amounts of shared memory across many (hundreds)
of processes, or (B) mapping large amounts of shared memory across any
number of processes.
This feature allows the kernel to remove a great deal
of internal memory management page-tracking overhead at the cost of wiring
the shared memory into core, making it unswappable.
.Pp
The
.Va vfs.vmiodirenable
sysctl defaults to 1 (on).
This parameter controls how directories are cached
by the system.
Most directories are small and use but a single fragment
(typically 1K) in the file system and even less (typically 512 bytes) in
the buffer cache.
However, when operating in the default mode the buffer
cache will only cache a fixed number of directories even if you have a huge
amount of memory.
Turning on this sysctl allows the buffer cache to use
the VM Page Cache to cache the directories.
The advantage is that all of
memory is now available for caching directories.
The disadvantage is that
the minimum in-core memory used to cache a directory is the physical page
size (typically 4K) rather than 512 bytes.
We recommend turning this option off in memory-constrained environments;
however, when on, it will substantially improve the performance of services
that manipulate a large number of files.
Such services can include web caches, large mail systems, and news systems.
Turning on this option will generally not reduce performance even with the
wasted memory but you should experiment to find out.
.Pp
The
.Va vfs.write_behind
sysctl defaults to 1 (on).
This tells the file system to issue media
writes as full clusters are collected, which typically occurs when writing
large sequential files.
The idea is to avoid saturating the buffer
cache with dirty buffers when it would not benefit I/O performance.
However,
this may stall processes and under certain circumstances you may wish to turn
it off.
.Pp
The
.Va vfs.hirunningspace
sysctl determines how much outstanding write I/O may be queued to
disk controllers system-wide at any given instant.
The default is
usually sufficient but on machines with lots of disks you may want to bump
it up to four or five megabytes.
Note that setting too high a value
(exceeding the buffer cache's write threshold) can lead to extremely
bad clustering performance.
Do not set this value arbitrarily high!
Also,
higher write queueing values may add latency to reads occurring at the same
time.
.Pp
There are various other buffer-cache and VM page cache related sysctls.
We do not recommend modifying these values.
As of
.Fx 4.3 ,
the VM system does an extremely good job tuning itself.
.Pp
The
.Va net.inet.tcp.sendspace
and
.Va net.inet.tcp.recvspace
sysctls are of particular interest if you are running network intensive
applications.
They control the amount of send and receive buffer space
allowed for any given TCP connection.
The default sending buffer is 32K; the default receiving buffer
is 64K.
You can often
improve bandwidth utilization by increasing the default at the cost of
eating up more kernel memory for each connection.
We do not recommend
increasing the defaults if you are serving hundreds or thousands of
simultaneous connections because it is possible to quickly run the system
out of memory due to stalled connections building up.
But if you need
high bandwidth over a smaller number of connections, especially if you have
gigabit Ethernet, increasing these defaults can make a huge difference.
You can adjust the buffer size for incoming and outgoing data separately.
For example, if your machine is primarily doing web serving you may want
to decrease the recvspace in order to be able to increase the
sendspace without eating too much kernel memory.
Note that the routing table (see
.Xr route 8 )
can be used to introduce route-specific send and receive buffer size
defaults.
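.Pp
For example, a server moving bulk data over a small number of fast
connections might raise the send buffer globally, or only for a specific
route (a sketch; the buffer value and destination network are
illustrative):
.Bd -literal -offset indent
# sysctl net.inet.tcp.sendspace=65536
# route change -net 10.1.0.0/16 -sendpipe 65536 -recvpipe 65536
.Ed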
.Pp
As an additional management tool you can use pipes in your
firewall rules (see
.Xr ipfw 8 )
to limit the bandwidth going to or from particular IP blocks or ports.
For example, if you have a T1 you might want to limit your web traffic
to 70% of the T1's bandwidth in order to leave the remainder available
for mail and interactive use.
Normally a heavily loaded web server
will not introduce significant latencies into other services even if
the network link is maxed out, but enforcing a limit can smooth things
out and lead to longer term stability.
Many people also enforce artificial
bandwidth limitations in order to ensure that they are not charged for
using too much bandwidth.
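.Pp
As a sketch, limiting outbound web traffic to roughly 70% of a T1 (about
1080Kbit/s) might look like the following, assuming a kernel with
.Xr dummynet 4
support and an illustrative
.Li fxp0
interface:
.Bd -literal -offset indent
# ipfw pipe 1 config bw 1080Kbit/s
# ipfw add 100 pipe 1 tcp from any 80 to any out via fxp0
.Ed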
.Pp
Setting the send or receive TCP buffer to values larger than 65535 will
result in a marginal performance improvement unless both hosts support the
window scaling extension of the TCP protocol, which is controlled by the
.Va net.inet.tcp.rfc1323
sysctl.
These extensions should be enabled and the TCP buffer size should be set
to a value larger than 65535 in order to obtain good performance from
certain types of network links; specifically, gigabit WAN links and
high-latency satellite links.
RFC1323 support is enabled by default.
.Pp
The
.Va net.inet.tcp.always_keepalive
sysctl determines whether or not the TCP implementation should attempt
to detect dead TCP connections by intermittently delivering
.Dq keepalives
on the connection.
By default, this is enabled for all applications; by setting this
sysctl to 0, only applications that specifically request keepalives
will use them.
In most environments, TCP keepalives will improve the management of
system state by expiring dead TCP connections, particularly for
systems serving dialup users who may not always terminate individual
TCP connections before disconnecting from the network.
However, in some environments, temporary network outages may be
incorrectly identified as dead sessions, resulting in unexpectedly
terminated TCP connections.
In such environments, setting the sysctl to 0 may reduce the occurrence of
TCP session disconnections.
.Pp
The
.Va net.inet.tcp.delayed_ack
TCP feature is largely misunderstood.
Historically speaking, this feature
was designed to allow the acknowledgement of transmitted data to be returned
along with the response.
For example, when you type over a remote shell,
the acknowledgement of the character you send can be returned along with the
data representing the echo of the character.
With delayed acks turned off,
the acknowledgement may be sent in its own packet, before the remote service
has a chance to echo the data it just received.
This same concept also
applies to any interactive protocol (e.g.\& SMTP, WWW, POP3), and can cut the
number of tiny packets flowing across the network in half.
The
.Fx
delayed ACK implementation also follows the TCP protocol rule that
at least every other packet be acknowledged even if the standard 100ms
timeout has not yet passed.
Normally the worst a delayed ACK can do is
slightly delay the teardown of a connection, or slightly delay the ramp-up
of a slow-start TCP connection.
While we are not sure, we believe that
the several FAQs related to packages such as SAMBA and SQUID which advise
turning off delayed acks are referring to the slow-start issue.
In
.Fx ,
it would be more beneficial to increase the slow-start flightsize via
the
.Va net.inet.tcp.slowstart_flightsize
sysctl rather than disable delayed acks.
.Pp
The
.Va net.inet.tcp.inflight_enable
sysctl turns on bandwidth delay product limiting for all TCP connections.
The system will attempt to calculate the bandwidth delay product for each
connection and limit the amount of data queued to the network to just the
amount required to maintain optimum throughput.
This feature is useful
if you are serving data over modems, GigE, or high speed WAN links (or
any other link with a high bandwidth*delay product), especially if you are
also using window scaling or have configured a large send window.
If you enable this option, you should also be sure to set
.Va net.inet.tcp.inflight_debug
to 0 (disable debugging), and for production use setting
.Va net.inet.tcp.inflight_min
to at least 6144 may be beneficial.
Note however, that setting high
minimums may effectively disable bandwidth limiting depending on the link.
The limiting feature reduces the amount of data built up in intermediate
router and switch packet queues as well as reduces the amount of data built
up in the local host's interface queue.
With fewer packets queued up,
interactive connections, especially over slow modems, will also be able
to operate with lower round trip times.
However, note that this feature
only affects data transmission (uploading / server-side).
It does not
affect data reception (downloading).
.Pp
Adjusting
.Va net.inet.tcp.inflight_stab
is not recommended.
This parameter defaults to 20, representing 2 maximal packets added
to the bandwidth delay product window calculation.
The additional
window is required to stabilize the algorithm and improve responsiveness
to changing conditions, but it can also result in higher ping times
over slow links (though still much lower than you would get without
the inflight algorithm).
In such cases you may
wish to try reducing this parameter to 15, 10, or 5, and you may also
have to reduce
.Va net.inet.tcp.inflight_min
(for example, to 3500) to get the desired effect.
Reducing these parameters
should be done as a last resort only.
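.Pp
A production configuration enabling the inflight limiter might place the
following in
.Xr sysctl.conf 5 ,
using the values suggested above:
.Bd -literal -offset indent
net.inet.tcp.inflight_enable=1
net.inet.tcp.inflight_debug=0
net.inet.tcp.inflight_min=6144
.Ed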
.Pp
The
.Va net.inet.ip.portrange.*
sysctls control the port number ranges automatically bound to TCP and UDP
sockets.
There are three ranges: a low range, a default range, and a
high range, selectable via the
.Dv IP_PORTRANGE
.Xr setsockopt 2
call.
Most
network programs use the default range which is controlled by
.Va net.inet.ip.portrange.first
and
.Va net.inet.ip.portrange.last ,
which default to 1024 and 5000, respectively.
Bound port ranges are
used for outgoing connections, and it is possible to run the system out
of ports under certain circumstances.
This most commonly occurs when you are
running a heavily loaded web proxy.
The port range is not an issue
when running servers which handle mainly incoming connections, such as a
normal web server, or which make only a limited number of outgoing
connections, such as a mail relay.
For situations where you may run yourself out of
ports, we recommend increasing
.Va net.inet.ip.portrange.last
modestly.
A value of 10000, 20000, or 30000 may be reasonable.
You should also consider firewall effects when changing the port range.
Some firewalls
may block large ranges of ports (usually low-numbered ports) and expect systems
to use higher ranges of ports for outgoing connections.
For this reason,
we do not recommend that
.Va net.inet.ip.portrange.first
be lowered.
.Pp
The
.Va kern.ipc.somaxconn
sysctl limits the size of the listen queue for accepting new TCP connections.
The default value of 128 is typically too low for robust handling of new
connections in a heavily loaded web server environment.
For such environments,
we recommend increasing this value to 1024 or higher.
The service daemon
may itself limit the listen queue size (e.g.\&
.Xr sendmail 8 ,
apache) but will
often have a directive in its configuration file to adjust the queue size up.
Larger listen queues also do a better job of fending off denial of service
attacks.
.Pp
The
.Va kern.maxfiles
sysctl determines how many open files the system supports.
The default is
typically a few thousand but you may need to bump this up to ten or twenty
thousand if you are running databases or large descriptor-heavy daemons.
The read-only
.Va kern.openfiles
sysctl may be interrogated to determine the current number of open files
on the system.
.Pp
The
.Va vm.swap_idle_enabled
sysctl is useful in large multi-user systems where you have lots of users
entering and leaving the system and lots of idle processes.
Such systems
tend to generate a great deal of continuous pressure on free memory reserves.
Turning this feature on and adjusting the swapout hysteresis (in idle
seconds) via
.Va vm.swap_idle_threshold1
and
.Va vm.swap_idle_threshold2
allows you to depress the priority of pages associated with idle processes
more quickly than the normal pageout algorithm.
This gives a helping hand
to the pageout daemon.
Do not turn this option on unless you need it,
because the tradeoff you are making is to essentially pre-page memory sooner
rather than later, eating more swap and disk bandwidth.
In a small system
this option will have a detrimental effect but in a large system that is
already doing moderate paging this option allows the VM system to stage
whole processes into and out of memory more easily.
.Sh LOADER TUNABLES
Some aspects of the system behavior may not be tunable at runtime because
memory allocations they perform must occur early in the boot process.
To change loader tunables, you must set their values in
.Xr loader.conf 5
and reboot the system.
.Pp
.Va kern.maxusers
controls the scaling of a number of static system tables, including defaults
for the maximum number of open files, sizing of network memory resources, etc.
As of
.Fx 4.5 ,
.Va kern.maxusers
is automatically sized at boot based on the amount of memory available in
the system, and may be determined at run-time by inspecting the value of the
read-only
.Va kern.maxusers
sysctl.
Some sites will require larger or smaller values of
.Va kern.maxusers
and may set it as a loader tunable; values of 64, 128, and 256 are not
uncommon.
We do not recommend going above 256 unless you need a huge number
of file descriptors; many of the tunable values set to their defaults by
.Va kern.maxusers
may be individually overridden at boot-time or run-time as described
elsewhere in this document.
Systems older than
.Fx 4.4
must set this value via the kernel
.Xr config 8
option
.Cd maxusers
instead.
.Pp
.Va kern.ipc.nmbclusters
may be adjusted to increase the number of network mbufs the system is
willing to allocate.
Each cluster represents approximately 2K of memory,
so a value of 1024 represents 2M of kernel memory reserved for network
buffers.
You can do a simple calculation to figure out how many you need.
If you have a web server which maxes out at 1000 simultaneous connections,
and each connection eats a 16K receive and 16K send buffer, you need
approximately 32MB worth of network buffers to deal with it.
A good rule of
thumb is to multiply by 2, so 2 x 32MB = 64MB, and 64MB / 2K = 32768 clusters.
So for this case
you would want to set
.Va kern.ipc.nmbclusters
to 32768.
We recommend values between
1024 and 4096 for machines with moderate amounts of memory, and between 4096
and 32768 for machines with greater amounts of memory.
Under no circumstances
should you specify an arbitrarily high value for this parameter, it could
lead to a boot-time crash.
The
.Fl m
option to
.Xr netstat 1
may be used to observe network cluster use.
Older versions of
.Fx
do not have this tunable and require that the
kernel
.Xr config 8
option
.Dv NMBCLUSTERS
be set instead.
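.Pp
For example, the web server scenario above might be configured in
.Xr loader.conf 5
as follows (a sketch; the
.Va kern.maxusers
setting is shown only for illustration and is often best left automatic):
.Bd -literal -offset indent
kern.maxusers="256"
kern.ipc.nmbclusters="32768"
.Ed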
.Pp
More and more programs are using the
.Xr sendfile 2
system call to transmit files over the network.
The
.Va kern.ipc.nsfbufs
sysctl controls the number of file system buffers
.Xr sendfile 2
is allowed to use to perform its work.
This parameter nominally scales
with
.Va kern.maxusers
so you should not need to modify this parameter except under extreme
circumstances.
See the
.Sx TUNING
section in the
.Xr sendfile 2
manual page for details.
.Sh KERNEL CONFIG TUNING
There are a number of kernel options that you may have to fiddle with in
a large-scale system.
In order to change these options you need to be
able to compile a new kernel from source.
The
.Xr config 8
manual page and the handbook are good starting points for learning how to
do this.
Generally the first thing you do when creating your own custom
kernel is to strip out all the drivers and services you do not use.
Removing things like
.Dv INET6
and drivers you do not have will reduce the size of your kernel, sometimes
by a megabyte or more, leaving more memory available for applications.
.Pp
.Dv SCSI_DELAY
and
.Dv IDE_DELAY
may be used to reduce system boot times.
The defaults are fairly high and
can be responsible for 15+ seconds of delay in the boot process.
Reducing
.Dv SCSI_DELAY
to 5 seconds usually works (especially with modern drives).
Reducing
.Dv IDE_DELAY
also works but you have to be a little more careful.
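.Pp
For example, a custom kernel configuration file might shorten the SCSI
settle delay to 5 seconds with the following line (the value is in
milliseconds; treat this as a sketch, and adjust
.Dv IDE_DELAY
in a similar fashion if needed):
.Bd -literal -offset indent
options         SCSI_DELAY=5000
.Ed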
.Pp
There are a number of
.Dv *_CPU
options that can be commented out.
If you only want the kernel to run
on a Pentium class CPU, you can easily remove
.Dv I386_CPU
and
.Dv I486_CPU ,
but only remove
.Dv I586_CPU
if you are sure your CPU is being recognized as a Pentium II or better.
Some clones may be recognized as a Pentium or even a 486 and not be able
to boot without those options.
If it works, great!
The operating system
will be able to better use higher-end CPU features for MMU, task switching,
timebase, and even device operations.
Additionally, higher-end CPUs support
4MB MMU pages, which the kernel uses to map the kernel itself into memory,
increasing its efficiency under heavy syscall loads.
.Sh IDE WRITE CACHING
.Fx 4.3
flirted with turning off IDE write caching.
This reduced write bandwidth
to IDE disks but was considered necessary due to serious data consistency
issues introduced by hard drive vendors.
Basically the problem is that
IDE drives lie about when a write completes.
With IDE write caching turned
on, IDE hard drives will not only write data to disk out of order, they
will sometimes delay some of the blocks indefinitely under heavy disk
load.
A crash or power failure can result in serious file system
corruption.
So our default was changed to be safe.
Unfortunately, the
result was such a huge loss in performance that we caved in and changed the
default back to on after the release.
You should check the default on
your system by observing the
.Va hw.ata.wc
sysctl variable.
If IDE write caching is turned off, you can turn it back
on by setting the
.Va hw.ata.wc
loader tunable to 1.
More information on tuning the ATA driver system may be found in the
.Xr ata 4
manual page.
If you need performance, go with SCSI.
.Sh CPU, MEMORY, DISK, NETWORK
The type of tuning you do depends heavily on where your system begins to
bottleneck as load increases.
If your system runs out of CPU (idle times
are perpetually 0%) then you need to consider upgrading the CPU or moving to
an SMP motherboard (multiple CPUs), or perhaps you need to revisit the
programs that are causing the load and try to optimize them.
If your system
is paging to swap a lot you need to consider adding more memory.
If your
system is saturating the disk you typically see high CPU idle times and
total disk saturation.
.Xr systat 1
can be used to monitor this.
There are many solutions to saturated disks:
increasing memory for caching, mirroring disks, distributing operations across
several machines, and so forth.
If disk performance is an issue and you
are using IDE drives, switching to SCSI can help a great deal.
While modern
IDE drives compare with SCSI in raw sequential bandwidth, the moment you
start seeking around the disk SCSI drives usually win.
.Pp
Finally, you might run out of network suds.
The first line of defense for
improving network performance is to make sure you are using switches instead
of hubs, especially these days when switches are almost as cheap as hubs.
Hubs
have severe problems under heavy loads due to collision back-off and one bad
host can severely degrade the entire LAN.
Second, optimize the network path
as much as possible.
For example, in
.Xr firewall 7
we describe a firewall protecting internal hosts with a topology where
the externally visible hosts are not routed through it.
Use 100BaseT rather
than 10BaseT, or use 1000BaseT rather than 100BaseT, depending on your needs.
Most bottlenecks occur at the WAN link (e.g.\&
modem, T1, DSL, whatever).
If expanding the link is not an option it may be possible to use the
.Xr dummynet 4
feature to implement peak shaving or other forms of traffic shaping to
prevent the overloaded service (such as web services) from affecting other
services (such as email), or vice versa.
In home installations this could
be used to give interactive traffic (your browser,
.Xr ssh 1
logins) priority
over services you export from your box (web services, email).
.Sh SEE ALSO
.Xr netstat 1 ,
.Xr systat 1 ,
.Xr ata 4 ,
.Xr dummynet 4 ,
.Xr login.conf 5 ,
.Xr rc.conf 5 ,
.Xr sysctl.conf 5 ,
.Xr firewall 7 ,
.Xr hier 7 ,
.Xr ports 7 ,
.Xr boot 8 ,
.Xr bsdlabel 8 ,
.Xr ccdconfig 8 ,
.Xr config 8 ,
.Xr fsck 8 ,
.Xr ifconfig 8 ,
.Xr ipfw 8 ,
.Xr loader 8 ,
.Xr mount 8 ,
.Xr newfs 8 ,
.Xr route 8 ,
.Xr sysctl 8 ,
.Xr sysinstall 8 ,
.Xr tunefs 8 ,
.Xr vinum 8
.Sh HISTORY
The
.Nm
manual page was originally written by
.An Matthew Dillon
and first appeared
in
.Fx 4.3 ,
May 2001.