.\" Copyright (c) 2001, Matthew Dillon.  Terms and conditions are those of
.\" the BSD Copyright as specified in the file "/usr/src/COPYRIGHT" in
.\" the source tree.
.\"
.\" $FreeBSD$
.\"
.Dd June 25, 2002
.Dt TUNING 7
.Os
.Sh NAME
.Nm tuning
.Nd performance tuning under FreeBSD
.Sh SYSTEM SETUP - DISKLABEL, NEWFS, TUNEFS, SWAP
When using
.Xr disklabel 8
or
.Xr sysinstall 8
to lay out your filesystems on a hard disk it is important to remember
that hard drives can transfer data much more quickly from outer tracks
than they can from inner tracks.
To take advantage of this you should
try to pack your smaller filesystems and swap closer to the outer tracks,
follow with the larger filesystems, and end with the largest filesystems.
It is also important to size system standard filesystems such that you
will not be forced to resize them later as you scale the machine up.
I usually create, in order, a 128M root, 1G swap, 128M
.Pa /var ,
128M
.Pa /var/tmp ,
3G
.Pa /usr ,
and use any remaining space for
.Pa /home .
.Pp
You should typically size your swap space to approximately 2x main memory.
If you do not have a lot of RAM, though, you will generally want a lot
more swap.
It is not recommended that you configure any less than
256M of swap on a system and you should keep in mind future memory
expansion when sizing the swap partition.
The kernel's VM paging algorithms are tuned to perform best when there is
at least 2x swap versus main memory.
Configuring too little swap can lead
to inefficiencies in the VM page scanning code as well as create issues
later on if you add more memory to your machine.
Finally, on larger systems
with multiple SCSI disks (or multiple IDE disks operating on different
controllers), we strongly recommend that you configure swap on each drive
(up to four drives).
The swap partitions on the drives should be approximately the same size.
The kernel can handle arbitrary sizes but
internal data structures scale to 4 times the largest swap partition.
Keeping
the swap partitions near the same size will allow the kernel to optimally
stripe swap space across the N disks.
Do not worry about overdoing it a
little; swap space is the saving grace of
.Ux
and even if you do not normally use much swap, it can give you more time to
recover from a runaway program before being forced to reboot.
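.Pp
As a rough sketch of the multi-drive layout described above, assuming two
SCSI disks whose swap partitions appear as
.Pa /dev/da0s1b
and
.Pa /dev/da1s1b
(hypothetical device names; substitute your own), the corresponding
.Xr fstab 5
entries might look like this:
.Bd -literal -offset indent
# Device       Mountpoint  FStype  Options  Dump  Pass#
# Assumed device names; both partitions should be about the same size.
/dev/da0s1b    none        swap    sw       0     0
/dev/da1s1b    none        swap    sw       0     0
.Ed
.Pp
Once both partitions are active,
.Xr swapinfo 8
should list each device.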
.Pp
How you size your
.Pa /var
partition depends heavily on what you intend to use the machine for.
This
partition is primarily used to hold mailboxes, the print spool, and log
files.
Some people even make
.Pa /var/log
its own partition (but except for extreme cases it is not worth the waste
of a partition ID).
If your machine is intended to act as a mail
or print server,
or you are running a heavily visited web server, you should consider
creating a much larger partition \(en perhaps a gigabyte or more.
It is very easy
to underestimate log file storage requirements.
.Pp
Sizing
.Pa /var/tmp
depends on the kind of temporary file usage you think you will need.
128M is
the minimum we recommend.
Also note that sysinstall will create a
.Pa /tmp
directory.
Dedicating a partition for temporary file storage is important for
two reasons: first, it reduces the possibility of filesystem corruption
in a crash, and second, it reduces the chance of a runaway process that
fills up
.Oo Pa /var Oc Ns Pa /tmp
from blowing up more critical subsystems (mail,
logging, etc.).
Filling up
.Oo Pa /var Oc Ns Pa /tmp
is a very common problem to have.
.Pp
In the old days there were differences between
.Pa /tmp
and
.Pa /var/tmp ,
but the introduction of
.Pa /var
(and
.Pa /var/tmp )
led to massive confusion
by program writers so today programs haphazardly use one or the
other and thus no real distinction can be made between the two.
So it makes sense to have just one temporary directory and
softlink to it from the other tmp directory locations.
However you handle
.Pa /tmp ,
the one thing you do not want to do is leave it sitting
on the root partition where it might cause root to fill up or possibly
corrupt root in a crash/reboot situation.
.Pp
The
.Pa /usr
partition holds the bulk of the files required to support the system and
a subdirectory within it called
.Pa /usr/local
holds the bulk of the files installed from the
.Xr ports 7
hierarchy.
If you do not use ports all that much and do not intend to keep
system source
.Pq Pa /usr/src
on the machine, you can get away with
a 1 gigabyte
.Pa /usr
partition.
However, if you install a lot of ports
(especially window managers and Linux-emulated binaries), we recommend
at least a 2 gigabyte
.Pa /usr
and if you also intend to keep system source
on the machine, we recommend a 3 gigabyte
.Pa /usr .
Do not underestimate the
amount of space you will need in this partition; it can creep up and
surprise you!
.Pp
The
.Pa /home
partition is typically used to hold user-specific data.
I usually size it to the remainder of the disk.
.Pp
Why partition at all?
Why not create one big
.Pa /
partition and be done with it?
Then I do not have to worry about undersizing things!
Well, there are several reasons this is not a good idea.
First,
each partition has different operational characteristics and separating them
allows the filesystem to tune itself to those characteristics.
For example,
the root and
.Pa /usr
partitions are read-mostly, with very little writing, while
a lot of reading and writing could occur in
.Pa /var
and
.Pa /var/tmp .
By properly
partitioning your system, fragmentation introduced in the smaller, more
heavily write-loaded partitions will not bleed over into the mostly-read
partitions.
Additionally, keeping the write-loaded partitions closer to
the edge of the disk (i.e. before the really big partitions instead of after
in the partition table) will increase I/O performance in the partitions
where you need it the most.
Now it is true that you might also need I/O
performance in the larger partitions, but they are so large that shifting
them more towards the edge of the disk will not lead to a significant
performance improvement whereas moving
.Pa /var
to the edge can have a huge impact.
Finally, there are safety concerns.
Having a small neat root partition that
is essentially read-only gives it a greater chance of surviving a bad crash
intact.
.Pp
Properly partitioning your system also allows you to tune
.Xr newfs 8
and
.Xr tunefs 8
parameters.
Tuning
.Xr newfs 8
requires more experience but can lead to significant improvements in
performance.
There are three parameters that are relatively safe to tune:
.Em blocksize , bytes/i-node ,
and
.Em cylinders/group .
.Pp
.Fx
performs best when using 8K or 16K filesystem block sizes.
The default filesystem block size is 16K,
which provides best performance for most applications,
with the exception of those that perform random access on large files
(such as database server software).
Such applications tend to perform better with a smaller block size,
although modern disk characteristics are such that the performance
gain from using a smaller block size may not be worth consideration.
Using a block size larger than 16K
can cause fragmentation of the buffer cache and
lead to lower performance.
.Pp
The defaults may be unsuitable
for a filesystem that requires a very large number of i-nodes
or is intended to hold a large number of very small files.
Such a filesystem should be created with an 8K or 4K block size.
This also requires you to specify a smaller
fragment size.
We recommend always using a fragment size that is 1/8
the block size (less testing has been done on other fragment size factors).
The
.Xr newfs 8
options for this would be
.Dq Li "newfs -f 1024 -b 8192 ..." .
.Pp
If a large partition is intended to be used to hold fewer, larger files, such
as database files, you can increase the
.Em bytes/i-node
ratio which reduces the number of i-nodes (maximum number of files and
directories that can be created) for that partition.
Decreasing the number
of i-nodes in a filesystem can greatly reduce
.Xr fsck 8
recovery times after a crash.
Do not use this option
unless you are actually storing large files on the partition, because if you
overcompensate you can wind up with a filesystem that has lots of free
space remaining but cannot accommodate any more files.
Using 32768, 65536, or 262144 bytes/i-node is recommended.
You can go higher but
it will have only incremental effects on
.Xr fsck 8
recovery times.
For example,
.Dq Li "newfs -i 32768 ..." .
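.Pp
The two cases above might be written out as complete commands along the
following lines.
This is only a sketch: the device names
.Pa /dev/da0s1e
and
.Pa /dev/da1s1e
are placeholders, and
.Xr newfs 8
destroys any existing data on the partition it is given.
.Bd -literal -offset indent
# Many small files: 8K blocks with 1K fragments (1/8 of the block size).
# /dev/da0s1e is an assumed device name.
newfs -b 8192 -f 1024 /dev/da0s1e

# A few large files: one i-node per 65536 bytes of data space.
# /dev/da1s1e is an assumed device name.
newfs -i 65536 /dev/da1s1e
.Ed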
.Pp
.Xr tunefs 8
may be used to further tune a filesystem.
This command can be run in
single-user mode without having to reformat the filesystem.
However, this is possibly the most abused program in the system.
Many people attempt to
increase available filesystem space by setting the min-free percentage to 0.
This can lead to severe filesystem fragmentation and we do not recommend
that you do this.
Really the only
.Xr tunefs 8
option worthwhile here is turning on
.Em softupdates
with
.Dq Li "tunefs -n enable /filesystem" .
(Note: in
.Fx 4.5
and later, softupdates can be turned on using the
.Fl U
option to
.Xr newfs 8 ,
and
.Xr sysinstall 8
will typically enable softupdates automatically for non-root filesystems).
Softupdates drastically improves meta-data performance, mainly file
creation and deletion.
We recommend enabling softupdates on most filesystems; however, there
are two limitations to softupdates that you should be aware of when
determining whether to use it on a filesystem.
First, softupdates guarantees filesystem consistency in the
case of a crash but could very easily be several seconds (even a minute!)
behind on pending writes to the physical disk.
If you crash you may lose more work
than otherwise.
Secondly, softupdates delays the freeing of filesystem
blocks.
If you have a filesystem (such as the root filesystem) which is
close to full, doing a major update of it, e.g.\&
.Dq Li "make installworld" ,
can run it out of space and cause the update to fail.
For this reason, softupdates will not be enabled on the root filesystem
during a typical install.
There is no loss of performance since the root
filesystem is rarely written to.
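.Pp
A minimal sketch of enabling softupdates on an existing filesystem, run
from single-user mode with the filesystem unmounted, follows.
The device name
.Pa /dev/da0s1f
is an assumption; use the device that backs your own filesystem, and note
that the final
.Xr mount 8
relies on the device being listed in
.Xr fstab 5 .
.Bd -literal -offset indent
# /dev/da0s1f is an assumed device name.
umount /dev/da0s1f
tunefs -n enable /dev/da0s1f
# Print the current tuning values to confirm the change.
tunefs -p /dev/da0s1f
mount /dev/da0s1f
.Ed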
.Pp
A number of run-time
.Xr mount 8
options exist that can help you tune the system.
The most obvious and most dangerous one is
.Cm async .
Do not ever use it; it is far too dangerous.
A less dangerous and more
useful
.Xr mount 8
option is called
.Cm noatime .
.Ux
filesystems normally update the last-accessed time of a file or
directory whenever it is accessed.
This operation is handled in
.Fx
with a delayed write and normally does not create a burden on the system.
However, if your system is accessing a huge number of files on a continuing
basis the buffer cache can wind up getting polluted with atime updates,
creating a burden on the system.
For example, if you are running a heavily
loaded web site, or a news server with lots of readers, you might want to
consider turning off atime updates on your larger partitions with this
.Xr mount 8
option.
However, you should not gratuitously turn off atime
updates everywhere.
For example, the
.Pa /var
filesystem customarily
holds mailboxes, and atime (in combination with mtime) is used to
determine whether a mailbox has new mail.
You might as well leave
atime turned on for mostly read-only partitions such as
.Pa /
and
.Pa /usr .
This is especially useful for
.Pa /
since some system utilities
use the atime field for reporting.
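.Pp
As an illustration, and assuming a large
.Pa /home
filesystem on the hypothetical device
.Pa /dev/da0s1g ,
atime updates could be turned off either permanently in
.Xr fstab 5
or on an already mounted filesystem:
.Bd -literal -offset indent
# /etc/fstab entry (the device name is an assumption):
/dev/da0s1g  /home  ufs  rw,noatime  2  2

# Or change the option on the live filesystem without unmounting it:
mount -u -o noatime /home
.Ed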
.Sh STRIPING DISKS
In larger systems you can stripe partitions from several drives together
to create a much larger overall partition.
Striping can also improve
the performance of a filesystem by splitting I/O operations across two
or more disks.
The
.Xr vinum 8
and
.Xr ccdconfig 8
utilities may be used to create simple striped filesystems.
Generally
speaking, striping smaller partitions such as the root and
.Pa /var/tmp ,
or essentially read-only partitions such as
.Pa /usr
is a complete waste of time.
You should only stripe partitions that require serious I/O performance,
typically
.Pa /var , /home ,
or custom partitions used to hold databases and web pages.
Choosing the proper stripe size is also
important.
Filesystems tend to store meta-data on power-of-2 boundaries
and you usually want to reduce seeking rather than increase seeking.
This
means you want to use a large off-center stripe size such as 1152 sectors
so sequential I/O does not seek both disks and so meta-data is distributed
across both disks rather than concentrated on a single disk.
If
you really need to get sophisticated, we recommend using a real hardware
RAID controller from the list of
.Fx
supported controllers.
.Sh SYSCTL TUNING
.Xr sysctl 8
variables permit system behavior to be monitored and controlled at
run-time.
Some sysctls simply report on the behavior of the system; others allow
the system behavior to be modified;
some may be set at boot time using
.Xr rc.conf 5 ,
but most will be set via
.Xr sysctl.conf 5 .
There are several hundred sysctls in the system, including many that appear
to be candidates for tuning but actually are not.
In this document we will only cover the ones that have the greatest effect
on the system.
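.Pp
For example, a variable can be inspected and changed on a running system
with
.Xr sysctl 8 ,
and the same assignment can be placed in
.Pa /etc/sysctl.conf
so that it is applied at every boot.
The variable and value used here are only illustrative:
.Bd -literal -offset indent
# Report the current value.
sysctl vfs.write_behind

# Change it on the running system (illustrative value).
sysctl vfs.write_behind=0

# Equivalent line for /etc/sysctl.conf:
vfs.write_behind=0
.Ed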
.Pp
The
.Va kern.ipc.shm_use_phys
sysctl defaults to 0 (off) and may be set to 0 (off) or 1 (on).
Setting
this parameter to 1 will cause all System V shared memory segments to be
mapped to unpageable physical RAM.
This feature only has an effect if you
are either (A) mapping small amounts of shared memory across many (hundreds
of) processes, or (B) mapping large amounts of shared memory across any
number of processes.
This feature allows the kernel to remove a great deal
of internal memory management page-tracking overhead at the cost of wiring
the shared memory into core, making it unswappable.
.Pp
The
.Va vfs.vmiodirenable
sysctl defaults to 1 (on).
This parameter controls how directories are cached
by the system.
Most directories are small and use but a single fragment
(typically 1K) in the filesystem and even less (typically 512 bytes) in
the buffer cache.
However, when this sysctl is turned off the buffer
cache will only cache a fixed number of directories even if you have a huge
amount of memory.
Turning on this sysctl allows the buffer cache to use
the VM Page Cache to cache the directories.
The advantage is that all of
memory is now available for caching directories.
The disadvantage is that
the minimum in-core memory used to cache a directory is the physical page
size (typically 4K) rather than 512 bytes.
We recommend turning this option off in memory-constrained environments;
however, when on, it will substantially improve the performance of services
that manipulate a large number of files.
Such services can include web caches, large mail systems, and news systems.
Turning on this option will generally not reduce performance even with the
wasted memory but you should experiment to find out.
.Pp
The
.Va vfs.write_behind
sysctl defaults to 1 (on).
This tells the filesystem to issue media
writes as full clusters are collected, which typically occurs when writing
large sequential files.
The idea is to avoid saturating the buffer
cache with dirty buffers when it would not benefit I/O performance.
However,
this may stall processes and under certain circumstances you may wish to turn
it off.
.Pp
The
.Va vfs.hirunningspace
sysctl determines how much outstanding write I/O may be queued to
disk controllers system-wide at any given time.
The default is
usually sufficient but on machines with lots of disks you may want to bump
it up to four or five megabytes.
Note that setting too high a value
(exceeding the buffer cache's write threshold) can lead to extremely
bad clustering performance.
Do not set this value arbitrarily high!
Also,
higher write queueing values may add latency to reads occurring at the same
time.
.Pp
There are various other buffer-cache and VM page cache related sysctls.
We do not recommend modifying these values.
As of
.Fx 4.3 ,
the VM system does an extremely good job tuning itself.
.Pp
The
.Va net.inet.tcp.sendspace
and
.Va net.inet.tcp.recvspace
sysctls are of particular interest if you are running network intensive
applications.
They control the amount of send and receive buffer space
allowed for any given TCP connection.
The default sending buffer is 32K; the default receiving buffer
is 64K.
You can often
improve bandwidth utilization by increasing the default at the cost of
eating up more kernel memory for each connection.
We do not recommend
increasing the defaults if you are serving hundreds or thousands of
simultaneous connections because it is possible to quickly run the system
out of memory due to stalled connections building up.
But if you need
high bandwidth over fewer connections, especially if you have
gigabit Ethernet, increasing these defaults can make a huge difference.
You can adjust the buffer size for incoming and outgoing data separately.
For example, if your machine is primarily doing web serving you may want
to decrease the recvspace in order to be able to increase the
sendspace without eating too much kernel memory.
Note that the routing table (see
.Xr route 8 )
can be used to introduce route-specific send and receive buffer size
defaults.
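.Pp
For instance, a machine that mostly serves web pages might use
.Xr sysctl.conf 5
settings along these lines.
The values are illustrative assumptions, not recommendations; they simply
trade a smaller receive buffer for a larger send buffer:
.Bd -literal -offset indent
# Illustrative values only.
net.inet.tcp.sendspace=65536
net.inet.tcp.recvspace=16384
.Ed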
.Pp
As an additional management tool you can use pipes in your
firewall rules (see
.Xr ipfw 8 )
to limit the bandwidth going to or from particular IP blocks or ports.
For example, if you have a T1 you might want to limit your web traffic
to 70% of the T1's bandwidth in order to leave the remainder available
for mail and interactive use.
Normally a heavily loaded web server
will not introduce significant latencies into other services even if
the network link is maxed out, but enforcing a limit can smooth things
out and lead to longer term stability.
Many people also enforce artificial
bandwidth limitations in order to ensure that they are not charged for
using too much bandwidth.
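.Pp
A sketch of the T1 example follows.
It assumes a kernel built with
.Xr dummynet 4
support, a web server on port 80, and a T1 rate of 1544 Kbit/s, 70% of
which is roughly 1080 Kbit/s; the pipe and rule numbers are arbitrary.
.Bd -literal -offset indent
# 70% of a 1544 Kbit/s T1 is roughly 1080 Kbit/s.
ipfw pipe 1 config bw 1080Kbit/s

# Send outgoing web replies (source port 80) through the pipe.
# Rule number 1000 is an arbitrary choice.
ipfw add 1000 pipe 1 tcp from any 80 to any out
.Ed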
.Pp
Setting the send or receive TCP buffer to values larger than 65535 will result
in only a marginal performance improvement unless both hosts support the window
scaling extension of the TCP protocol, which is controlled by the
.Va net.inet.tcp.rfc1323
sysctl.
These extensions should be enabled and the TCP buffer size should be set
to a value larger than 65536 in order to obtain good performance from
certain types of network links; specifically, gigabit WAN links and
high-latency satellite links.
RFC1323 support is enabled by default.
.Pp
The
.Va net.inet.tcp.always_keepalive
sysctl determines whether or not the TCP implementation should attempt
to detect dead TCP connections by intermittently delivering
.Dq keepalives
on the connection.
By default, this is enabled for all applications; by setting this
sysctl to 0, only applications that specifically request keepalives
will use them.
In most environments, TCP keepalives will improve the management of
system state by expiring dead TCP connections, particularly for
systems serving dialup users who may not always terminate individual
TCP connections before disconnecting from the network.
However, in some environments, temporary network outages may be
incorrectly identified as dead sessions, resulting in unexpectedly
terminated TCP connections.
In such environments, setting the sysctl to 0 may reduce the occurrence of
TCP session disconnections.
.Pp
The
.Va net.inet.tcp.delayed_ack
TCP feature is largely misunderstood.
Historically speaking this feature
was designed to allow the acknowledgement to transmitted data to be returned
along with the response.
For example, when you type over a remote shell
the acknowledgement to the character you send can be returned along with the
data representing the echo of the character.
With delayed acks turned off
the acknowledgement may be sent in its own packet before the remote service
has a chance to echo the data it just received.
This same concept also
applies to any interactive protocol (e.g. SMTP, WWW, POP3) and can cut the
number of tiny packets flowing across the network in half.
The
.Fx
delayed-ack implementation also follows the TCP protocol rule that
at least every other packet be acknowledged even if the standard 100ms
timeout has not yet passed.
Normally the worst a delayed ack can do is
slightly delay the teardown of a connection, or slightly delay the ramp-up
of a slow-start TCP connection.
While we are not sure, we believe that
the several FAQs related to packages such as SAMBA and SQUID which advise
turning off delayed acks may be referring to the slow-start issue.
In
.Fx
it would be more beneficial to increase the slow-start flightsize via
the
.Va net.inet.tcp.slowstart_flightsize
sysctl rather than disable delayed acks.
.Pp
The
.Va net.inet.tcp.inflight_enable
sysctl turns on bandwidth delay product limiting for all TCP connections.
The system will attempt to calculate the bandwidth delay product for each
connection and limit the amount of data queued to the network to just the
amount required to maintain optimum throughput.
This feature is useful
if you are serving data over modems, GigE, or high speed WAN links (or
any other link with a high bandwidth*delay product), especially if you are
also using window scaling or have configured a large send window.
If
you enable this option you should also be sure to set
.Va net.inet.tcp.inflight_debug
to 0 (disable debugging), and for production use setting
.Va net.inet.tcp.inflight_min
to at least 6144 may be beneficial.
Note, however, that setting high
minimums may effectively disable bandwidth limiting depending on the link.
The limiting feature reduces the amount of data built up in intermediate
router and switch packet queues as well as reduces the amount of data built
up in the local host's interface queue.
With fewer packets queued up,
interactive connections, especially over slow modems, will also be able
to operate with lower round trip times.
However, note that this feature
only affects data transmission (uploading / server-side).
It does not
affect data reception (downloading).
.Pp
The
.Va net.inet.ip.portrange.*
sysctls control the port number ranges automatically bound to TCP and UDP
sockets.
There are three ranges: a low range, a default range, and a
high range, selectable via an IP_PORTRANGE setsockopt() call.
Most
network programs use the default range which is controlled by
.Va net.inet.ip.portrange.first
and
.Va net.inet.ip.portrange.last ,
which default to 1024 and 5000 respectively.
Bound port ranges are
used for outgoing connections and it is possible to run the system out
of ports under certain circumstances.
This most commonly occurs when you are
running a heavily loaded web proxy.
The port range is not an issue
when running servers which handle mainly incoming connections such as a
normal web server, or which have a limited number of outgoing connections such
as a mail relay.
For situations where you may run yourself out of
ports we recommend increasing
.Va net.inet.ip.portrange.last
modestly.
A value of 10000, 20000, or 30000 may be reasonable.
You should
also consider firewall effects when changing the port range.
Some firewalls
may block large ranges of ports (usually low-numbered ports) and expect systems
to use higher ranges of ports for outgoing connections.
For this reason
we do not recommend that
.Va net.inet.ip.portrange.first
be lowered.
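.Pp
For a busy web proxy, the adjustment described above might look like this
in
.Xr sysctl.conf 5 .
The value 20000 is simply one of the values suggested above:
.Bd -literal -offset indent
# Widen the default outgoing port range from 1024-5000 to 1024-20000.
net.inet.ip.portrange.last=20000
.Ed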
.Pp
The
.Va kern.ipc.somaxconn
sysctl limits the size of the listen queue for accepting new TCP connections.
The default value of 128 is typically too low for robust handling of new
connections in a heavily loaded web server environment.
For such environments,
we recommend increasing this value to 1024 or higher.
The service daemon
may itself limit the listen queue size (e.g.\&
.Xr sendmail 8 ,
apache) but will
often have a directive in its configuration file to adjust the queue size up.
Larger listen queues also do a better job of fending off denial of service
attacks.
.Pp
The
.Va kern.maxfiles
sysctl determines how many open files the system supports.
The default is
typically a few thousand but you may need to bump this up to ten or twenty
thousand if you are running databases or large descriptor-heavy daemons.
The read-only
.Va kern.openfiles
sysctl may be interrogated to determine the current number of open files
on the system.
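.Pp
To see how close the system is to its limit, compare the two values, and
raise the limits if needed.
The value chosen for
.Va kern.maxfiles
below is an illustrative assumption:
.Bd -literal -offset indent
# Current number of open files versus the system-wide maximum.
sysctl kern.openfiles kern.maxfiles

# Raise the limits on the running system.
sysctl kern.maxfiles=16384
sysctl kern.ipc.somaxconn=1024
.Ed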
.Pp
The
.Va vm.swap_idle_enabled
sysctl is useful in large multi-user systems where you have lots of users
entering and leaving the system and lots of idle processes.
Such systems
tend to generate a great deal of continuous pressure on free memory reserves.
Turning this feature on and adjusting the swapout hysteresis (in idle
seconds) via
.Va vm.swap_idle_threshold1
and
.Va vm.swap_idle_threshold2
allows you to depress the priority of pages associated with idle processes
more quickly than the normal pageout algorithm.
This gives a helping hand
to the pageout daemon.
Do not turn this option on unless you need it,
because the tradeoff you are making is to essentially pre-page memory sooner
rather than later, eating more swap and disk bandwidth.
In a small system
this option will have a detrimental effect but in a large system that is
already doing moderate paging this option allows the VM system to stage
whole processes into and out of memory more easily.
.Sh LOADER TUNABLES
Some aspects of the system behavior may not be tunable at runtime because
memory allocations they perform must occur early in the boot process.
To change loader tunables, you must set their values in
.Xr loader.conf 5
and reboot the system.
.Pp
.Va kern.maxusers
controls the scaling of a number of static system tables, including defaults
for the maximum number of open files, sizing of network memory resources, etc.
As of
.Fx 4.5 ,
.Va kern.maxusers
is automatically sized at boot based on the amount of memory available in
the system, and may be determined at run-time by inspecting the value of the
read-only
.Va kern.maxusers
sysctl.
Some sites will require larger or smaller values of
.Va kern.maxusers
and may set it as a loader tunable; values of 64, 128, and 256 are not
uncommon.
We do not recommend going above 256 unless you need a huge number
of file descriptors; many of the tunable values set to their defaults by
.Va kern.maxusers
may be individually overridden at boot-time or run-time as described
elsewhere in this document.
Systems older than
.Fx 4.4
must set this value via the kernel
.Xr config 8
option
.Cd maxusers
instead.
.Pp
.Va kern.ipc.nmbclusters
may be adjusted to increase the number of network mbufs the system is
willing to allocate.
Each cluster represents approximately 2K of memory,
so a value of 1024 represents 2M of kernel memory reserved for network
buffers.
You can do a simple calculation to figure out how many you need.
If you have a web server which maxes out at 1000 simultaneous connections,
and each connection eats a 16K receive and 16K send buffer, you need
approximately 32MB worth of network buffers to deal with it.
A good rule of
thumb is to multiply by 2, so 32MB x 2 = 64MB, and 64MB / 2K = 32768.
So for this case
you would want to set
.Va kern.ipc.nmbclusters
to 32768.
We recommend values between
1024 and 4096 for machines with moderate amounts of memory, and between 4096
and 32768 for machines with greater amounts of memory.
Under no circumstances
should you specify an arbitrarily high value for this parameter, as it could
lead to a boot-time crash.
The
.Fl m
option to
.Xr netstat 1
may be used to observe network cluster use.
Older versions of
.Fx
do not have this tunable and require that the
kernel
.Xr config 8
option
.Dv NMBCLUSTERS
be set instead.
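.Pp
The sizing calculation for the 1000-connection web server can be reproduced
with shell arithmetic, and the result placed in
.Xr loader.conf 5 .
The connection count and buffer sizes are the ones from that example;
adjust them for your own workload:
.Bd -literal -offset indent
# 1000 connections x (16K send + 16K receive), doubled, in 2K clusters.
# The exact result is 32000; rounding 32000K up to 32MB gives 32768.
echo $(( 1000 * (16 + 16) * 2 / 2 ))

# /boot/loader.conf entry:
kern.ipc.nmbclusters="32768"
.Ed
.Pp
After rebooting,
.Dq Li "netstat -m"
shows how many clusters are actually in use.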
.Pp
More and more programs are using the
.Xr sendfile 2
system call to transmit files over the network.
The
.Va kern.ipc.nsfbufs
sysctl controls the number of filesystem buffers
.Xr sendfile 2
is allowed to use to perform its work.
This parameter nominally scales
with
.Va kern.maxusers
so you should not need to modify this parameter except under extreme
circumstances.
.Sh KERNEL CONFIG TUNING
There are a number of kernel options that you may have to fiddle with in
a large-scale system.
In order to change these options you need to be
able to compile a new kernel from source.
The
.Xr config 8
manual page and the handbook are good starting points for learning how to
do this.
Generally the first thing you do when creating your own custom
kernel is to strip out all the drivers and services you do not use.
Removing things like
.Dv INET6
and drivers you do not have will reduce the size of your kernel, sometimes
by a megabyte or more, leaving more memory available for applications.
.Pp
.Dv SCSI_DELAY
and
.Dv IDE_DELAY
may be used to reduce system boot times.
The defaults are fairly high and
can be responsible for 15+ seconds of delay in the boot process.
Reducing
.Dv SCSI_DELAY
to 5 seconds usually works (especially with modern drives).
Reducing
.Dv IDE_DELAY
also works but you have to be a little more careful.
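.Pp
In the kernel configuration file this might look as follows.
This sketch assumes that
.Dv SCSI_DELAY
is given in milliseconds, as in the stock GENERIC configuration, so that
5 seconds is written as 5000:
.Bd -literal -offset indent
# Wait 5 seconds (value assumed to be in milliseconds) before probing SCSI.
options         SCSI_DELAY=5000
.Ed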
.Pp
There are a number of
.Dv *_CPU
options that can be commented out.
If you only want the kernel to run
on a Pentium class CPU, you can easily remove
.Dv I386_CPU
and
.Dv I486_CPU ,
but only remove
.Dv I586_CPU
if you are sure your CPU is being recognized as a Pentium II or better.
Some clones may be recognized as a Pentium or even a 486 and not be able
to boot without those options.
If it works, great!
The operating system
will be able to better use higher-end CPU features for MMU, task switching,
timebase, and even device operations.
Additionally, higher-end CPUs support
4MB MMU pages, which the kernel uses to map the kernel itself into memory,
increasing its efficiency under heavy syscall loads.
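.Pp
For a machine that is known to be recognized as a Pentium II or better,
the CPU section of a kernel configuration file might be reduced to the
sketch below.
.Dv I686_CPU
is not discussed above; it is assumed here to be the remaining CPU option
from the stock GENERIC configuration.
Keep
.Dv I586_CPU
if you are unsure how your CPU is recognized.
.Bd -literal -offset indent
#cpu            I386_CPU
#cpu            I486_CPU
#cpu            I586_CPU
# Assumed to be the only remaining entry from GENERIC:
cpu             I686_CPU
.Ed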
.Sh IDE WRITE CACHING
.Fx 4.3
flirted with turning off IDE write caching.
This reduced write bandwidth
to IDE disks but was considered necessary due to serious data consistency
issues introduced by hard drive vendors.
Basically, the problem is that
IDE drives lie about when a write completes.
With IDE write caching turned
on, IDE hard drives will not only write data to disk out of order, they
will sometimes delay some of the blocks indefinitely under heavy disk
load.
A crash or power failure can result in serious filesystem
corruption.
So our default was changed to be safe.
Unfortunately, the
result was such a huge loss in performance that we caved in and changed the
default back to on after the release.
You should check the default on
your system by observing the
.Va hw.ata.wc
sysctl variable.
If IDE write caching is turned off, you can turn it back
on by setting the
.Va hw.ata.wc
loader tunable to 1.
More information on tuning the ATA driver system may be found in the
.Xr ata 4
man page.
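.Pp
As a small example, the
.Va hw.ata.wc
tunable can be set at boot time with an entry in
.Pa /boot/loader.conf .
The file location is the usual one for loader tunables but is an assumption
here; check
.Xr loader 8
for your release.
.Bd -literal -offset indent
# Enable IDE write caching (1 = on, 0 = off).
hw.ata.wc="1"
.Ed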
.Pp
There is a new experimental feature for IDE hard drives called
.Va hw.ata.tags
(you also set this in the boot loader) which allows write caching to be safely
turned on.
This brings SCSI tagging features to IDE drives.
As of this
writing, only IBM DPTA and DTLA drives support the feature.
Warning!
These
drives apparently have quality control problems and I do not recommend
purchasing them at this time.
If you need performance, go with SCSI.
.Sh CPU, MEMORY, DISK, NETWORK
The type of tuning you do depends heavily on where your system begins to
bottleneck as load increases.
If your system runs out of CPU (idle times
are perpetually 0%) then you need to consider upgrading the CPU or moving to
an SMP motherboard (multiple CPUs), or perhaps you need to revisit the
programs that are causing the load and try to optimize them.
If your system
is paging to swap a lot you need to consider adding more memory.
If your
system is saturating the disk you typically see high CPU idle times and
total disk saturation.
.Xr systat 1
can be used to monitor this.
There are many solutions to saturated disks:
increasing memory for caching, mirroring disks, distributing operations across
several machines, and so forth.
If disk performance is an issue and you
are using IDE drives, switching to SCSI can help a great deal.
While modern
IDE drives compare with SCSI in raw sequential bandwidth, the moment you
start seeking around the disk SCSI drives usually win.
.Pp
Finally, you might run out of network bandwidth.
The first line of defense for
improving network performance is to make sure you are using switches instead
of hubs, especially now that switches are almost as cheap.
Hubs
have severe problems under heavy loads due to collision backoff, and one bad
host can severely degrade the entire LAN.
Second, optimize the network path
as much as possible.
For example, in
.Xr firewall 7
we describe a firewall protecting internal hosts with a topology where
the externally visible hosts are not routed through it.
Use 100BaseT rather
than 10BaseT, or use 1000BaseT rather than 100BaseT, depending on your needs.
Most bottlenecks occur at the WAN link (e.g.\&
modem, T1, DSL, whatever).
If expanding the link is not an option it may be possible to use the
.Xr dummynet 4
feature to implement peak shaving or other forms of traffic shaping to
prevent the overloaded service (such as web services) from affecting other
services (such as email), or vice versa.
In home installations this could
be used to give interactive traffic (your browser,
.Xr ssh 1
logins) priority
over services you export from your box (web services, email).
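.Pp
As a hedged sketch of the peak shaving idea, the following
.Xr ipfw 8
ruleset fragment limits outgoing web traffic to a fixed bandwidth so that it
cannot starve other services.
The pipe number, rule number, 512Kbit/s limit, and the
.Li fxp0
interface name are illustrative assumptions, and the kernel needs
.Xr dummynet 4
support for the pipe to take effect.
.Bd -literal -offset indent
# Pipe 1: cap whatever is sent through it at 512Kbit/s.
pipe 1 config bw 512Kbit/s
# Send outgoing replies from the local web server (port 80) through pipe 1.
add 1000 pipe 1 tcp from any 80 to any out via fxp0
.Ed
.Pp
The lines are written as they would appear in a ruleset file read by
.Xr ipfw 8 ;
on the command line each one would be prefixed with
.Li ipfw .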
.Sh SEE ALSO
.Xr netstat 1 ,
.Xr systat 1 ,
.Xr ata 4 ,
.Xr dummynet 4 ,
.Xr login.conf 5 ,
.Xr rc.conf 5 ,
.Xr sysctl.conf 5 ,
.Xr firewall 7 ,
.Xr hier 7 ,
.Xr ports 7 ,
.Xr boot 8 ,
.Xr ccdconfig 8 ,
.Xr config 8 ,
.Xr disklabel 8 ,
.Xr fsck 8 ,
.Xr ifconfig 8 ,
.Xr ipfw 8 ,
.Xr loader 8 ,
.Xr mount 8 ,
.Xr newfs 8 ,
.Xr route 8 ,
.Xr sysctl 8 ,
.Xr sysinstall 8 ,
.Xr tunefs 8 ,
.Xr vinum 8
.Sh HISTORY
The
.Nm
manual page was originally written by
.An Matthew Dillon
and first appeared
in
.Fx 4.3 ,
May 2001.