.\" Copyright (c) 2001, Matthew Dillon.  Terms and conditions are those of
.\" the BSD Copyright as specified in the file "/usr/src/COPYRIGHT" in
.\" the source tree.
.\"
.\" $FreeBSD$
.\"
.Dd June 25, 2002
.Dt TUNING 7
.Os
.Sh NAME
.Nm tuning
.Nd performance tuning under FreeBSD
.Sh SYSTEM SETUP - DISKLABEL, NEWFS, TUNEFS, SWAP
When using
.Xr disklabel 8
or
.Xr sysinstall 8
to lay out your filesystems on a hard disk it is important to remember
that hard drives can transfer data much more quickly from outer tracks
than they can from inner tracks.
To take advantage of this you should
try to pack your smaller filesystems and swap closer to the outer tracks,
follow with the larger filesystems, and end with the largest filesystems.
It is also important to size system standard filesystems such that you
will not be forced to resize them later as you scale the machine up.
I usually create, in order, a 128M root, 1G swap, 128M
.Pa /var ,
128M
.Pa /var/tmp ,
3G
.Pa /usr ,
and use any remaining space for
.Pa /home .
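.Pp
On a typical single-disk machine that ordering might translate into a
layout such as the following (an illustrative sketch only; partition
letters follow the usual BSD convention, with
.Ql c
reserved for the whole disk, and the sizes should be adapted to your
hardware):
.Bd -literal -offset indent
a:  128M  /         (root, nearest the outer tracks)
b:    1G  swap
d:  128M  /var
e:  128M  /var/tmp
f:    3G  /usr
g:  rest  /home
.Ed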
.Pp
You should typically size your swap space to approximately 2x main memory.
If you do not have a lot of RAM, though, you will generally want a lot
more swap.
It is not recommended that you configure any less than
256M of swap on a system and you should keep in mind future memory
expansion when sizing the swap partition.
The kernel's VM paging algorithms are tuned to perform best when there is
at least 2x swap versus main memory.
Configuring too little swap can lead
to inefficiencies in the VM page scanning code as well as create issues
later on if you add more memory to your machine.
Finally, on larger systems
with multiple SCSI disks (or multiple IDE disks operating on different
controllers), we strongly recommend that you configure swap on each drive
(up to four drives).
The swap partitions on the drives should be approximately the same size.
The kernel can handle arbitrary sizes but
internal data structures scale to 4 times the largest swap partition.
Keeping
the swap partitions near the same size will allow the kernel to optimally
stripe swap space across the N disks.
Do not worry about overdoing it a
little, swap space is the saving grace of
.Ux
and even if you do not normally use much swap, it can give you more time to
recover from a runaway program before being forced to reboot.
.Pp
How you size your
.Pa /var
partition depends heavily on what you intend to use the machine for.
This
partition is primarily used to hold mailboxes, the print spool, and log
files.
Some people even make
.Pa /var/log
its own partition (but except for extreme cases it is not worth the waste
of a partition ID).
If your machine is intended to act as a mail
or print server,
or you are running a heavily visited web server, you should consider
creating a much larger partition \(en perhaps a gig or more.
It is very easy
to underestimate log file storage requirements.
.Pp
Sizing
.Pa /var/tmp
depends on the kind of temporary file usage you think you will need.
128M is
the minimum we recommend.
Also note that sysinstall will create a
.Pa /tmp
directory.
Dedicating a partition for temporary file storage is important for
two reasons: first, it reduces the possibility of filesystem corruption
in a crash, and second it reduces the chance of a runaway process that
fills up
.Oo Pa /var Oc Ns Pa /tmp
from blowing up more critical subsystems (mail,
logging, etc).
Filling up
.Oo Pa /var Oc Ns Pa /tmp
is a very common problem to have.
.Pp
In the old days there were differences between
.Pa /tmp
and
.Pa /var/tmp ,
but the introduction of
.Pa /var
(and
.Pa /var/tmp )
led to massive confusion
by program writers so today programs haphazardly use one or the
other and thus no real distinction can be made between the two.
So it makes sense to have just one temporary directory and
softlink to it from the other tmp directory locations,
as shown in the example below.
However you handle
.Pa /tmp ,
the one thing you do not want to do is leave it sitting
on the root partition where it might cause root to fill up or possibly
corrupt root in a crash/reboot situation.
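.Pp
For example, to make
.Pa /tmp
a softlink to
.Pa /var/tmp
(a sketch only; this assumes nothing in
.Pa /tmp
needs to be preserved):
.Bd -literal -offset indent
# rm -rf /tmp
# ln -s /var/tmp /tmp
.Ed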
.Pp
The
.Pa /usr
partition holds the bulk of the files required to support the system and
a subdirectory within it called
.Pa /usr/local
holds the bulk of the files installed from the
.Xr ports 7
hierarchy.
If you do not use ports all that much and do not intend to keep
system source
.Pq Pa /usr/src
on the machine, you can get away with
a 1 gigabyte
.Pa /usr
partition.
However, if you install a lot of ports
(especially window managers and Linux-emulated binaries), we recommend
at least a 2 gigabyte
.Pa /usr
and if you also intend to keep system source
on the machine, we recommend a 3 gigabyte
.Pa /usr .
Do not underestimate the
amount of space you will need in this partition; it can creep up and
surprise you!
.Pp
The
.Pa /home
partition is typically used to hold user-specific data.
I usually size it to the remainder of the disk.
.Pp
Why partition at all?
Why not create one big
.Pa /
partition and be done with it?
Then I do not have to worry about undersizing things!
Well, there are several reasons this is not a good idea.
First,
each partition has different operational characteristics and separating them
allows the filesystem to tune itself to those characteristics.
For example,
the root and
.Pa /usr
partitions are read-mostly, with very little writing, while
a lot of reading and writing could occur in
.Pa /var
and
.Pa /var/tmp .
By properly
partitioning your system, fragmentation introduced in the smaller more
heavily write-loaded partitions will not bleed over into the mostly-read
partitions.
Additionally, keeping the write-loaded partitions closer to
the edge of the disk (i.e.\& before the really big partitions instead of after
in the partition table) will increase I/O performance in the partitions
where you need it the most.
Now it is true that you might also need I/O
performance in the larger partitions, but they are so large that shifting
them more towards the edge of the disk will not lead to a significant
performance improvement whereas moving
.Pa /var
to the edge can have a huge impact.
Finally, there are safety concerns.
Having a small neat root partition that
is essentially read-only gives it a greater chance of surviving a bad crash
intact.
.Pp
Properly partitioning your system also allows you to tune
.Xr newfs 8
and
.Xr tunefs 8
parameters.
Tuning
.Xr newfs 8
requires more experience but can lead to significant improvements in
performance.
There are three parameters that are relatively safe to tune:
.Em blocksize , bytes/i-node ,
and
.Em cylinders/group .
.Pp
.Fx
performs best when using 8K or 16K filesystem block sizes.
The default filesystem block size is 16K,
which provides best performance for most applications,
with the exception of those that perform random access on large files
(such as database server software).
Such applications tend to perform better with a smaller block size,
although modern disk characteristics are such that the performance
gain from using a smaller block size may not be worth consideration.
Using a block size larger than 16K
can cause fragmentation of the buffer cache and
lead to lower performance.
.Pp
The defaults may be unsuitable
for a filesystem that requires a very large number of i-nodes
or is intended to hold a large number of very small files.
Such a filesystem should be created with an 8K or 4K block size.
This also requires you to specify a smaller
fragment size.
We recommend always using a fragment size that is 1/8
the block size (less testing has been done on other fragment size factors).
The
.Xr newfs 8
options for this would be
.Dq Li "newfs -f 1024 -b 8192 ..." .
.Pp
If a large partition is intended to be used to hold fewer, larger files, such
as database files, you can increase the
.Em bytes/i-node
ratio which reduces the number of i-nodes (maximum number of files and
directories that can be created) for that partition.
Decreasing the number
of i-nodes in a filesystem can greatly reduce
.Xr fsck 8
recovery times after a crash.
Do not use this option
unless you are actually storing large files on the partition, because if you
overcompensate you can wind up with a filesystem that has lots of free
space remaining but cannot accommodate any more files.
Using 32768, 65536, or 262144 bytes/i-node is recommended.
You can go higher but
it will have only incremental effects on
.Xr fsck 8
recovery times.
For example,
.Dq Li "newfs -i 32768 ..." .
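.Pp
Putting these parameters together, a partition meant to hold a small
number of large database files might be created with something like the
following (a sketch only; the device name is purely illustrative):
.Bd -literal -offset indent
# newfs -b 16384 -f 2048 -i 65536 /dev/da0s1e
.Ed
.Pp
Here the 16K block size keeps sequential throughput high, the fragment
size is 1/8 of the block size as recommended above, and the large
bytes/i-node ratio reduces
.Xr fsck 8
recovery times.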
.Pp
.Xr tunefs 8
may be used to further tune a filesystem.
This command can be run in
single-user mode without having to reformat the filesystem.
However, this is possibly the most abused program in the system.
Many people attempt to
increase available filesystem space by setting the min-free percentage to 0.
This can lead to severe filesystem fragmentation and we do not recommend
that you do this.
Really the only
.Xr tunefs 8
option worthwhile here is turning on
.Em softupdates
with
.Dq Li "tunefs -n enable /filesystem" .
(Note: in
.Fx 4.5
and later, softupdates can be turned on using the
.Fl U
option to
.Xr newfs 8 ,
and
.Xr sysinstall 8
will typically enable softupdates automatically for non-root filesystems).
Softupdates drastically improves meta-data performance, mainly file
creation and deletion.
We recommend enabling softupdates on most filesystems; however, there
are two limitations to softupdates that you should be aware of when
determining whether to use it on a filesystem.
First, softupdates guarantees filesystem consistency in the
case of a crash but could very easily be several seconds (even a minute!)
behind on pending writes to the physical disk.
If you crash you may lose more work
than otherwise.
Secondly, softupdates delays the freeing of filesystem
blocks.
If you have a filesystem (such as the root filesystem) which is
close to full, doing a major update of it, e.g.\&
.Dq Li "make installworld" ,
can run it out of space and cause the update to fail.
For this reason, softupdates will not be enabled on the root filesystem
during a typical install.
There is no loss of performance since the root
filesystem is rarely written to.
.Pp
A number of run-time
.Xr mount 8
options exist that can help you tune the system.
The most obvious and most dangerous one is
.Cm async .
Do not ever use it; it is far too dangerous.
A less dangerous and more
useful
.Xr mount 8
option is called
.Cm noatime .
.Ux
filesystems normally update the last-accessed time of a file or
directory whenever it is accessed.
This operation is handled in
.Fx
with a delayed write and normally does not create a burden on the system.
However, if your system is accessing a huge number of files on a continuing
basis the buffer cache can wind up getting polluted with atime updates,
creating a burden on the system.
For example, if you are running a heavily
loaded web site, or a news server with lots of readers, you might want to
consider turning off atime updates on your larger partitions with this
.Xr mount 8
option.
However, you should not gratuitously turn off atime
updates everywhere.
For example, the
.Pa /var
filesystem customarily
holds mailboxes, and atime (in combination with mtime) is used to
determine whether a mailbox has new mail.
You should also leave
atime turned on for mostly read-only partitions such as
.Pa /
and
.Pa /usr .
This is especially useful for
.Pa /
since some system utilities
use the atime field for reporting.
.Sh STRIPING DISKS
In larger systems you can stripe partitions from several drives together
to create a much larger overall partition.
Striping can also improve
the performance of a filesystem by splitting I/O operations across two
or more disks.
The
.Xr vinum 8
and
.Xr ccdconfig 8
utilities may be used to create simple striped filesystems.
Generally
speaking, striping smaller partitions such as the root and
.Pa /var/tmp ,
or essentially read-only partitions such as
.Pa /usr
is a complete waste of time.
You should only stripe partitions that require serious I/O performance,
typically
.Pa /var , /home ,
or custom partitions used to hold databases and web pages.
Choosing the proper stripe size is also
important.
Filesystems tend to store meta-data on power-of-2 boundaries
and you usually want to reduce seeking rather than increase seeking.
This
means you want to use a large off-center stripe size such as 1152 sectors
so sequential I/O does not seek both disks and so meta-data is distributed
across both disks rather than concentrated on a single disk.
If
you really need to get sophisticated, we recommend using a real hardware
RAID controller from the list of
.Fx
supported controllers.
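.Pp
As an illustration, a simple two-disk
.Xr ccdconfig 8
stripe using the off-center stripe size suggested above might be set up
with something like the following (a sketch only; the device names are
purely illustrative and the component partitions must already exist):
.Bd -literal -offset indent
# ccdconfig ccd0 1152 none /dev/da1s1e /dev/da2s1e
# disklabel -r -w ccd0 auto
# newfs /dev/ccd0c
.Ed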
.Sh SYSCTL TUNING
.Xr sysctl 8
variables permit system behavior to be monitored and controlled at
run-time.
Some sysctls simply report on the behavior of the system; others allow
the system behavior to be modified;
some may be set at boot time using
.Xr rc.conf 5 ,
but most will be set via
.Xr sysctl.conf 5 .
There are several hundred sysctls in the system, including many that appear
to be candidates for tuning but actually are not.
In this document we will only cover the ones that have the greatest effect
on the system.
.Pp
The
.Va kern.ipc.shm_use_phys
sysctl defaults to 0 (off) and may be set to 0 (off) or 1 (on).
Setting
this parameter to 1 will cause all System V shared memory segments to be
mapped to unpageable physical RAM.
This feature only has an effect if you
are either (A) mapping small amounts of shared memory across many (hundreds)
of processes, or (B) mapping large amounts of shared memory across any
number of processes.
This feature allows the kernel to remove a great deal
of internal memory management page-tracking overhead at the cost of wiring
the shared memory into core, making it unswappable.
.Pp
The
.Va vfs.vmiodirenable
sysctl defaults to 1 (on).
This parameter controls how directories are cached
by the system.
Most directories are small and use but a single fragment
(typically 1K) in the filesystem and even less (typically 512 bytes) in
the buffer cache.
However, when operating in the default mode the buffer
cache will only cache a fixed number of directories even if you have a huge
amount of memory.
Turning on this sysctl allows the buffer cache to use
the VM Page Cache to cache the directories.
The advantage is that all of
memory is now available for caching directories.
The disadvantage is that
the minimum in-core memory used to cache a directory is the physical page
size (typically 4K) rather than 512 bytes.
We recommend turning this option off in memory-constrained environments;
however, when on, it will substantially improve the performance of services
that manipulate a large number of files.
Such services can include web caches, large mail systems, and news systems.
Turning on this option will generally not reduce performance even with the
wasted memory but you should experiment to find out.
.Pp
The
.Va vfs.write_behind
sysctl defaults to 1 (on).
This tells the filesystem to issue media
writes as full clusters are collected, which typically occurs when writing
large sequential files.
The idea is to avoid saturating the buffer
cache with dirty buffers when it would not benefit I/O performance.
However,
this may stall processes and under certain circumstances you may wish to turn
it off.
.Pp
The
.Va vfs.hirunningspace
sysctl determines how much outstanding write I/O may be queued to
disk controllers system-wide at any given time.
The default is
usually sufficient but on machines with lots of disks you may want to bump
it up to four or five megabytes.
Note that setting too high a value
(exceeding the buffer cache's write threshold) can lead to extremely
bad clustering performance.
Do not set this value arbitrarily high!
Also,
higher write queueing values may add latency to reads occurring at the same
time.
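.Pp
Sysctl values may be changed on a running system with
.Xr sysctl 8
or set at boot via
.Xr sysctl.conf 5 .
For example, placing the following line in
.Xr sysctl.conf 5
would raise the write queue limit to four megabytes at boot (an
illustrative value only, for a machine with many disks):
.Bd -literal -offset indent
vfs.hirunningspace=4194304
.Ed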
.Pp
There are various other buffer-cache and VM page cache related sysctls.
We do not recommend modifying these values.
As of
.Fx 4.3 ,
the VM system does an extremely good job tuning itself.
.Pp
The
.Va net.inet.tcp.sendspace
and
.Va net.inet.tcp.recvspace
sysctls are of particular interest if you are running network intensive
applications.
They control the amount of send and receive buffer space
allowed for any given TCP connection.
The default sending buffer is 32K; the default receiving buffer
is 64K.
You can often
improve bandwidth utilization by increasing the default at the cost of
eating up more kernel memory for each connection.
We do not recommend
increasing the defaults if you are serving hundreds or thousands of
simultaneous connections because it is possible to quickly run the system
out of memory due to stalled connections building up.
But if you need
high bandwidth over a smaller number of connections, especially if you have
gigabit Ethernet, increasing these defaults can make a huge difference.
You can adjust the buffer size for incoming and outgoing data separately.
For example, if your machine is primarily doing web serving you may want
to decrease the recvspace in order to be able to increase the
sendspace without eating too much kernel memory.
Note that the routing table (see
.Xr route 8 )
can be used to introduce route-specific send and receive buffer size
defaults.
.Pp
As an additional management tool you can use pipes in your
firewall rules (see
.Xr ipfw 8 )
to limit the bandwidth going to or from particular IP blocks or ports.
For example, if you have a T1 you might want to limit your web traffic
to 70% of the T1's bandwidth in order to leave the remainder available
for mail and interactive use.
Normally a heavily loaded web server
will not introduce significant latencies into other services even if
the network link is maxed out, but enforcing a limit can smooth things
out and lead to longer term stability.
Many people also enforce artificial
bandwidth limitations in order to ensure that they are not charged for
using too much bandwidth.
.Pp
Setting the send or receive TCP buffer to values larger than 65535 will
result in only a marginal performance improvement unless both hosts support
the window scaling extension of the TCP protocol, which is controlled by the
.Va net.inet.tcp.rfc1323
sysctl.
These extensions should be enabled and the TCP buffer size should be set
to a value larger than 65536 in order to obtain good performance from
certain types of network links; specifically, gigabit WAN links and
high-latency satellite links.
RFC1323 support is enabled by default.
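.Pp
For example, a server moving bulk data over a small number of fast,
high-latency links might place something like the following in
.Xr sysctl.conf 5
(the buffer sizes are illustrative only, and remember that each
connection consumes this much kernel memory):
.Bd -literal -offset indent
net.inet.tcp.rfc1323=1
net.inet.tcp.sendspace=131072
net.inet.tcp.recvspace=131072
.Ed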
.Pp
The
.Va net.inet.tcp.always_keepalive
sysctl determines whether or not the TCP implementation should attempt
to detect dead TCP connections by intermittently delivering
.Dq keepalives
on the connection.
By default, this is enabled for all applications; by setting this
sysctl to 0, only applications that specifically request keepalives
will use them.
In most environments, TCP keepalives will improve the management of
system state by expiring dead TCP connections, particularly for
systems serving dialup users who may not always terminate individual
TCP connections before disconnecting from the network.
However, in some environments, temporary network outages may be
incorrectly identified as dead sessions, resulting in unexpectedly
terminated TCP connections.
In such environments, setting the sysctl to 0 may reduce the occurrence of
TCP session disconnections.
.Pp
The
.Va net.inet.tcp.delayed_ack
TCP feature is largely misunderstood.
Historically speaking, this feature
was designed to allow the acknowledgement of transmitted data to be returned
along with the response.
For example, when you type over a remote shell
the acknowledgement of the character you send can be returned along with the
data representing the echo of the character.
With delayed acks turned off
the acknowledgement may be sent in its own packet before the remote service
has a chance to echo the data it just received.
This same concept also
applies to any interactive protocol (e.g.\& SMTP, WWW, POP3) and can cut the
number of tiny packets flowing across the network in half.
The
.Fx
delayed-ack implementation also follows the TCP protocol rule that
at least every other packet be acknowledged even if the standard 100ms
timeout has not yet passed.
Normally the worst a delayed ack can do is
slightly delay the teardown of a connection, or slightly delay the ramp-up
of a slow-start TCP connection.
While we aren't sure, we believe that the several FAQs related to packages
such as SAMBA and SQUID which advise turning off delayed acks may be
referring to the slow-start issue.
In
.Fx
it would be more beneficial to increase the slow-start flightsize via
the
.Va net.inet.tcp.slowstart_flightsize
sysctl rather than to disable delayed acks.
.Pp
The
.Va net.inet.tcp.inflight_enable
sysctl turns on bandwidth delay product limiting for all TCP connections.
The system will attempt to calculate the bandwidth delay product for each
connection and limit the amount of data queued to the network to just the
amount required to maintain optimum throughput.
This feature is useful
if you are serving data over modems, GigE, or high speed WAN links (or
any other link with a high bandwidth*delay product), especially if you are
also using window scaling or have configured a large send window.
If you enable this option you should also be sure to set
.Va net.inet.tcp.inflight_debug
to 0 (disable debugging), and for production use setting
.Va net.inet.tcp.inflight_min
to at least 6144 may be beneficial.
Note, however, that setting high
minimums may effectively disable bandwidth limiting depending on the link.
The limiting feature reduces the amount of data built up in intermediate
router and switch packet queues as well as reduces the amount of data built
up in the local host's interface queue.
With fewer packets queued up,
interactive connections, especially over slow modems, will also be able
to operate with lower round trip times.
However, note that this feature
only affects data transmission (uploading / server-side).
It does not
affect data reception (downloading).
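.Pp
For example, to enable bandwidth delay product limiting with the
production settings suggested above, something like the following could
be placed in
.Xr sysctl.conf 5
(a sketch only; whether these values help depends on your links):
.Bd -literal -offset indent
net.inet.tcp.inflight_enable=1
net.inet.tcp.inflight_debug=0
net.inet.tcp.inflight_min=6144
.Ed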
.Pp
The
.Va net.inet.ip.portrange.*
sysctls control the port number ranges automatically bound to TCP and UDP
sockets.
There are three ranges: a low range, a default range, and a
high range, selectable via an
.Dv IP_PORTRANGE
.Xr setsockopt 2
call.
Most network programs use the default range which is controlled by
.Va net.inet.ip.portrange.first
and
.Va net.inet.ip.portrange.last ,
which default to 1024 and 5000, respectively.
Bound port ranges are
used for outgoing connections and it is possible to run the system out
of ports under certain circumstances.
This most commonly occurs when you are
running a heavily loaded web proxy.
The port range is not an issue
when running servers which handle mainly incoming connections, such as a
normal web server, or which have a limited number of outgoing connections,
such as a mail relay.
For situations where you may run yourself out of
ports we recommend increasing
.Va net.inet.ip.portrange.last
modestly.
A value of 10000, 20000, or 30000 may be reasonable.
You should
also consider firewall effects when changing the port range.
Some firewalls
may block large ranges of ports (usually low-numbered ports) and expect systems
to use higher ranges of ports for outgoing connections.
For this reason
we do not recommend that
.Va net.inet.ip.portrange.first
be lowered.
.Pp
The
.Va kern.ipc.somaxconn
sysctl limits the size of the listen queue for accepting new TCP connections.
The default value of 128 is typically too low for robust handling of new
connections in a heavily loaded web server environment.
For such environments,
we recommend increasing this value to 1024 or higher.
The service daemon
may itself limit the listen queue size (e.g.\&
.Xr sendmail 8 ,
apache) but will
often have a directive in its configuration file to adjust the queue size up.
Larger listen queues also do a better job of fending off denial of service
attacks.
.Pp
The
.Va kern.maxfiles
sysctl determines how many open files the system supports.
The default is
typically a few thousand but you may need to bump this up to ten or twenty
thousand if you are running databases or large descriptor-heavy daemons.
The read-only
.Va kern.openfiles
sysctl may be interrogated to determine the current number of open files
on the system.
.Pp
The
.Va vm.swap_idle_enabled
sysctl is useful in large multi-user systems where you have lots of users
entering and leaving the system and lots of idle processes.
Such systems
tend to generate a great deal of continuous pressure on free memory reserves.
Turning this feature on and adjusting the swapout hysteresis (in idle
seconds) via
.Va vm.swap_idle_threshold1
and
.Va vm.swap_idle_threshold2
allows you to depress the priority of pages associated with idle processes
more quickly than the normal pageout algorithm.
This gives a helping hand
to the pageout daemon.
Do not turn this option on unless you need it,
because the tradeoff you are making is to essentially pre-page memory sooner
rather than later, eating more swap and disk bandwidth.
In a small system
this option will have a detrimental effect but in a large system that is
already doing moderate paging this option allows the VM system to stage
whole processes into and out of memory more easily.
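.Pp
For example, a heavily loaded web proxy that also accepts many incoming
connections might place something like the following in
.Xr sysctl.conf 5
(illustrative values drawn from the recommendations above):
.Bd -literal -offset indent
net.inet.ip.portrange.last=20000
kern.ipc.somaxconn=1024
kern.maxfiles=16384
.Ed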
.Sh LOADER TUNABLES
Some aspects of the system behavior may not be tunable at runtime because
memory allocations they perform must occur early in the boot process.
To change loader tunables, you must set their values in
.Xr loader.conf 5
and reboot the system.
.Pp
.Va kern.maxusers
controls the scaling of a number of static system tables, including defaults
for the maximum number of open files, sizing of network memory resources, etc.
As of
.Fx 4.5 ,
.Va kern.maxusers
is automatically sized at boot based on the amount of memory available in
the system, and may be determined at run-time by inspecting the value of the
read-only
.Va kern.maxusers
sysctl.
Some sites will require larger or smaller values of
.Va kern.maxusers
and may set it as a loader tunable; values of 64, 128, and 256 are not
uncommon.
We do not recommend going above 256 unless you need a huge number
of file descriptors; many of the tunable values set to their defaults by
.Va kern.maxusers
may be individually overridden at boot-time or run-time as described
elsewhere in this document.
Systems older than
.Fx 4.4
must set this value via the kernel
.Xr config 8
option
.Cd maxusers
instead.
.Pp
.Va kern.ipc.nmbclusters
may be adjusted to increase the number of network mbufs the system is
willing to allocate.
Each cluster represents approximately 2K of memory,
so a value of 1024 represents 2M of kernel memory reserved for network
buffers.
You can do a simple calculation to figure out how many you need.
If you have a web server which maxes out at 1000 simultaneous connections,
and each connection eats a 16K receive and 16K send buffer, you need
approximately 32MB worth of network buffers to deal with it.
A good rule of
thumb is to multiply by 2, so 32MB x 2 = 64MB, and 64MB / 2K = 32768 clusters.
So for this case
you would want to set
.Va kern.ipc.nmbclusters
to 32768.
We recommend values between
1024 and 4096 for machines with moderate amounts of memory, and between 4096
and 32768 for machines with greater amounts of memory.
Under no circumstances
should you specify an arbitrarily high value for this parameter, as it could
lead to a boot-time crash.
The
.Fl m
option to
.Xr netstat 1
may be used to observe network cluster use.
Older versions of
.Fx
do not have this tunable and require that the
kernel
.Xr config 8
option
.Dv NMBCLUSTERS
be set instead.
.Pp
More and more programs are using the
.Xr sendfile 2
system call to transmit files over the network.
The
.Va kern.ipc.nsfbufs
sysctl controls the number of filesystem buffers
.Xr sendfile 2
is allowed to use to perform its work.
This parameter nominally scales
with
.Va kern.maxusers
so you should not need to modify this parameter except under extreme
circumstances.
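.Pp
For example, a large-memory web server sized as in the calculation above
might place the following in
.Xr loader.conf 5
(illustrative values; remember that a reboot is required for loader
tunables to take effect):
.Bd -literal -offset indent
kern.maxusers="256"
kern.ipc.nmbclusters="32768"
.Ed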
.Sh KERNEL CONFIG TUNING
There are a number of kernel options that you may have to fiddle with in
a large-scale system.
In order to change these options you need to be
able to compile a new kernel from source.
The
.Xr config 8
manual page and the handbook are good starting points for learning how to
do this.
Generally the first thing you do when creating your own custom
kernel is to strip out all the drivers and services you do not use.
Removing things like
.Dv INET6
and drivers you do not have will reduce the size of your kernel, sometimes
by a megabyte or more, leaving more memory available for applications.
.Pp
.Dv SCSI_DELAY
and
.Dv IDE_DELAY
may be used to reduce system boot times.
The defaults are fairly high and
can be responsible for 15+ seconds of delay in the boot process.
Reducing
.Dv SCSI_DELAY
to 5 seconds usually works (especially with modern drives).
Reducing
.Dv IDE_DELAY
also works but you have to be a little more careful.
.Pp
There are a number of
.Dv *_CPU
options that can be commented out.
If you only want the kernel to run
on a Pentium class CPU, you can easily remove
.Dv I386_CPU
and
.Dv I486_CPU ,
but only remove
.Dv I586_CPU
if you are sure your CPU is being recognized as a Pentium II or better.
Some clones may be recognized as a Pentium or even a 486 and not be able
to boot without those options.
If it works, great!
The operating system
will be able to better use higher-end CPU features for MMU, task switching,
timebase, and even device operations.
Additionally, higher-end CPUs support
4MB MMU pages, which the kernel uses to map the kernel itself into memory,
increasing its efficiency under heavy syscall loads.
.Sh IDE WRITE CACHING
.Fx 4.3
flirted with turning off IDE write caching.
This reduced write bandwidth
to IDE disks but was considered necessary due to serious data consistency
issues introduced by hard drive vendors.
Basically the problem is that
IDE drives lie about when a write completes.
With IDE write caching turned
on, IDE hard drives will not only write data to disk out of order, they
will sometimes delay some of the blocks indefinitely under heavy disk
load.
A crash or power failure can result in serious filesystem
corruption.
So our default was changed to be safe.
Unfortunately, the
result was such a huge loss in performance that we caved in and changed the
default back to on after the release.
You should check the default on
your system by observing the
.Va hw.ata.wc
sysctl variable.
If IDE write caching is turned off, you can turn it back
on by setting the
.Va hw.ata.wc
loader tunable to 1.
More information on tuning the ATA driver system may be found in the
.Xr ata 4
man page.
.Pp
There is a new experimental feature for IDE hard drives called
.Va hw.ata.tags
(you also set this in the boot loader) which allows write caching to be safely
turned on.
This brings SCSI tagging features to IDE drives.
As of this
writing only IBM DPTA and DTLA drives support the feature.
Warning!
These
drives apparently have quality control problems and I do not recommend
purchasing them at this time.
If you need performance, go with SCSI.
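.Pp
For example, to force IDE write caching on at boot, add the following
line to
.Xr loader.conf 5 :
.Bd -literal -offset indent
hw.ata.wc="1"
.Ed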
.Sh CPU, MEMORY, DISK, NETWORK
The type of tuning you do depends heavily on where your system begins to
bottleneck as load increases.
If your system runs out of CPU (idle times
are perpetually 0%) then you need to consider upgrading the CPU or moving to
an SMP motherboard (multiple CPU's), or perhaps you need to revisit the
programs that are causing the load and try to optimize them.
If your system
is paging to swap a lot you need to consider adding more memory.
If your
system is saturating the disk you typically see high CPU idle times and
total disk saturation.
.Xr systat 1
can be used to monitor this.
There are many solutions to saturated disks:
increasing memory for caching, mirroring disks, distributing operations across
several machines, and so forth.
If disk performance is an issue and you
are using IDE drives, switching to SCSI can help a great deal.
While modern
IDE drives compare with SCSI in raw sequential bandwidth, the moment you
start seeking around the disk SCSI drives usually win.
.Pp
Finally, you might run out of network suds.
The first line of defense for
improving network performance is to make sure you are using switches instead
of hubs, especially these days when switches are almost as cheap.
Hubs
have severe problems under heavy loads due to collision backoff and one bad
host can severely degrade the entire LAN.
Second, optimize the network path
as much as possible.
For example, in
.Xr firewall 7
we describe a firewall protecting internal hosts with a topology where
the externally visible hosts are not routed through it.
Use 100BaseT rather
than 10BaseT, or use 1000BaseT rather than 100BaseT, depending on your needs.
Most bottlenecks occur at the WAN link (e.g.\&
modem, T1, DSL, whatever).
If expanding the link is not an option it may be possible to use the
.Xr dummynet 4
feature to implement peak shaving or other forms of traffic shaping to
prevent the overloaded service (such as web services) from affecting other
services (such as email), or vice versa.
In home installations this could
be used to give interactive traffic (your browser,
.Xr ssh 1
logins) priority
over services you export from your box (web services, email).
.Sh SEE ALSO
.Xr netstat 1 ,
.Xr systat 1 ,
.Xr ata 4 ,
.Xr dummynet 4 ,
.Xr login.conf 5 ,
.Xr rc.conf 5 ,
.Xr sysctl.conf 5 ,
.Xr firewall 7 ,
.Xr hier 7 ,
.Xr ports 7 ,
.Xr boot 8 ,
.Xr ccdconfig 8 ,
.Xr config 8 ,
.Xr disklabel 8 ,
.Xr fsck 8 ,
.Xr ifconfig 8 ,
.Xr ipfw 8 ,
.Xr loader 8 ,
.Xr mount 8 ,
.Xr newfs 8 ,
.Xr route 8 ,
.Xr sysctl 8 ,
.Xr sysinstall 8 ,
.Xr tunefs 8 ,
.Xr vinum 8
.Sh HISTORY
The
.Nm
manual page was originally written by
.An Matthew Dillon
and first appeared
in
.Fx 4.3 ,
May 2001.