.\" Copyright (c) 1998
.\"	The Regents of the University of California.  All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\" 3. All advertising materials mentioning features or use of this software
.\"    must display the following acknowledgement:
.\"	This product includes software developed by the University of
.\"	California, Berkeley and its contributors.
.\" 4. Neither the name of the University nor the names of its contributors
.\"    may be used to endorse or promote products derived from this software
.\"    without specific prior written permission.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\" $FreeBSD$
.\"
.Dd December 22, 1998
.Dt BUF 9
.Os
.Sh NAME
.Nm BUF
.Nd "kernel buffer I/O scheme used in FreeBSD VM system"
.Sh DESCRIPTION
.Pp
The kernel implements a KVM abstraction of the buffer cache which allows it
to map potentially disparate vm_page structures into contiguous KVM for use
by (mainly filesystem) devices and device I/O.  This abstraction supports
block sizes from DEV_BSIZE (usually 512) up to several pages or more.
It also supports a relatively primitive byte-granular valid range and dirty
range, currently hardcoded for use by NFS.  The code implementing the
VM buffer abstraction is mostly concentrated in
.Pa /usr/src/sys/kern/vfs_bio.c .
.Pp
One of the most important things to remember when dealing with buffer pointers
(struct buf) is that the underlying pages are mapped directly from the buffer
cache.  No data copying occurs in the scheme proper, though some filesystems
such as UFS do have to copy a little when dealing with file fragments.  The
second most important thing to remember is that due to the underlying page
mapping, the b_data base pointer in a buf is always *page* aligned, not
*block* aligned.  When you have a VM buffer representing some b_offset and
b_size, the actual start of the buffer is (b_data + (b_offset & PAGE_MASK))
and not just b_data.  Finally, the VM system's core buffer cache supports
valid and dirty bits (m->valid, m->dirty) for pages in DEV_BSIZE chunks.  Thus
a platform with a hardware page size of 4096 bytes and a DEV_BSIZE of 512 has
8 valid and 8 dirty bits per page.  These bits are generally set and cleared
in groups based on the block size of the device backing the page.  A complete
page's worth of bits is often referred to using the VM_PAGE_BITS_ALL bitmask
(i.e. 0xFF if the hardware page size is 4096).
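.Pp
The address and bit arithmetic described above can be sketched in user-space C.
This is an illustration only, not code from the kernel.  The constants assume
the 4096-byte page and 512-byte DEV_BSIZE used in the text, and struct mini_buf,
buf_data_start(), page_bits_all() and page_bits_for_range() are hypothetical
names invented for this example.
.Bd -literal -offset indent
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed values, matching the 4096-byte page example in the text. */
#define PAGE_SIZE	4096u
#define PAGE_MASK	(PAGE_SIZE - 1)
#define DEV_BSIZE	512u

/* Hypothetical, pared-down stand-in for struct buf. */
struct mini_buf {
	char		*b_data;	/* page-aligned base of the mapping */
	uint64_t	 b_offset;	/* byte offset of the buffer */
	size_t		 b_size;	/* size of the buffer in bytes */
};

/* The first byte of buffer data, as opposed to the page-aligned base. */
static char *
buf_data_start(const struct mini_buf *bp)
{
	return (bp->b_data + (bp->b_offset & PAGE_MASK));
}

/* One valid (or dirty) bit per DEV_BSIZE chunk of a page. */
static unsigned
page_bits_all(void)
{
	unsigned nbits = PAGE_SIZE / DEV_BSIZE;

	return ((1u << nbits) - 1);
}

/*
 * Bits covering the byte range [off, off + len) within one page.
 * Chunks that are only partially covered are included.
 */
static unsigned
page_bits_for_range(unsigned off, unsigned len)
{
	unsigned first, last;

	assert(len > 0 && off + len <= PAGE_SIZE);
	first = off / DEV_BSIZE;
	last = (off + len - 1) / DEV_BSIZE;
	return (((2u << last) - 1) & ~((1u << first) - 1));
}
```
.Ed
.Pp
Under these assumptions a buffer whose b_offset is 9216 starts 1024 bytes past
b_data, and a 1024-byte range at offset 1024 within a page corresponds to the
bit pattern 0x0C.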
.Pp
VM buffers also keep track of a byte-granular dirty range and valid range.
This feature is normally only used by the NFS subsystem.  I'm not sure why it
is used at all, actually, since we have DEV_BSIZE valid/dirty granularity
within the VM buffer.  If a buffer dirty operation creates a 'hole',
the dirty range will extend to cover the hole.  If a buffer validation
operation creates a 'hole', the byte-granular valid range is left alone and
will not take into account the new extension.  Thus the whole byte-granular
abstraction is considered a bad hack and it would be nice if we could get rid
of it completely.
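.Pp
The way a dirty operation swallows a hole can be shown with a minimal sketch.
It assumes the dirty range is kept as a single offset/end pair.  The names
struct byte_range and dirty_range_add() are made up for this example and are
not taken from the kernel sources.
.Bd -literal -offset indent
```c
#include <assert.h>

/* Hypothetical byte range; end is exclusive, empty when off == end. */
struct byte_range {
	int off;
	int end;
};

/*
 * Record that bytes [off, end) were dirtied.  A single range cannot
 * describe two disjoint pieces, so any hole between the old range and
 * the new one is swallowed and becomes dirty as well.
 */
static void
dirty_range_add(struct byte_range *r, int off, int end)
{
	assert(off >= 0 && off < end);
	if (r->off == r->end) {		/* nothing dirty yet */
		r->off = off;
		r->end = end;
		return;
	}
	if (off < r->off)
		r->off = off;
	if (end > r->end)
		r->end = end;
}
```
.Ed
.Pp
Dirtying bytes 0 to 100 and then bytes 300 to 400 leaves a single dirty range
of 0 to 400 in this sketch, so the untouched bytes in between are treated as
dirty too.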
.Pp
A VM buffer is capable of mapping the underlying VM cache pages into KVM in
order to allow the kernel to directly manipulate the data associated with
the (vnode,b_offset,b_size) tuple.  The kernel typically unmaps VM buffers the
moment they are no longer needed but often keeps the 'struct buf' structure
instantiated, and even the bp->b_pages array instantiated, despite having
unmapped them from KVM.  If a page making up a VM buffer is about to undergo
I/O, the system typically unmaps it from KVM and replaces the page in the
b_pages[] array with a placemarker called bogus_page.  The placemarker forces
any kernel subsystem referencing the associated struct buf to look up the
associated page again.  I believe the placemarker hack is used to allow
sophisticated devices such as filesystem devices to remap underlying pages in
order to deal with, for example, remapping a file fragment into a file block.
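.Pp
The placemarker idea can be modelled in a few lines of user-space C.  This is
a sketch under stated assumptions rather than the kernel's implementation:
struct page, struct paged_buf, object_lookup(), buf_mark_bogus() and
buf_get_page() are all hypothetical, and the fixed object_pages[] array merely
stands in for whatever lookup the VM system really performs.
.Bd -literal -offset indent
```c
#include <assert.h>
#include <stddef.h>

#define NPAGES	4

/* Hypothetical stand-ins for struct vm_page and struct buf. */
struct page {
	int index;			/* page index within the object */
};

struct paged_buf {
	struct page *b_pages[NPAGES];
};

/* The placemarker is one distinguished page, compared by address. */
static struct page bogus_page_store;
static struct page *const bogus_page = &bogus_page_store;

/* Stand-in for the pages held by the backing object. */
static struct page object_pages[NPAGES] = {
	{ 0 }, { 1 }, { 2 }, { 3 }
};

static struct page *
object_lookup(int index)
{
	assert(index >= 0 && index < NPAGES);
	return (&object_pages[index]);
}

/* Hide slot i behind the placemarker, e.g. before I/O on that page. */
static void
buf_mark_bogus(struct paged_buf *bp, int i)
{
	bp->b_pages[i] = bogus_page;
}

/*
 * Fetch the page backing slot i.  A placemarker cannot be used
 * directly, so the real page is looked up again and the slot repaired.
 */
static struct page *
buf_get_page(struct paged_buf *bp, int i)
{
	if (bp->b_pages[i] == bogus_page)
		bp->b_pages[i] = object_lookup(i);
	return (bp->b_pages[i]);
}
```
.Ed
.Pp
Because every consumer has to compare against bogus_page before using a slot,
a stale page pointer cannot be used by accident once the slot has been marked.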
.Pp
VM buffers are used to track I/O operations within the kernel.  Unfortunately,
the I/O implementation is also somewhat of a hack because the kernel wants
to clear the dirty bit on the underlying pages the moment it queues the I/O
to the VFS device, not when the physical I/O is actually initiated.  This
can create confusion within filesystem devices that use delayed-writes because
you wind up with pages marked clean that are actually still dirty.  If not
treated carefully, these pages could be thrown away!  Indeed, a number of
serious bugs related to this hack were not fixed until the 2.2.8/3.0 releases.
The kernel uses an instantiated VM buffer (i.e. struct buf) to placemark pages
in this special state.  The buffer is typically flagged B_DELWRI.  When a
device no longer needs a buffer it typically flags it as B_RELBUF.  Because
the underlying pages are marked clean, the B_DELWRI|B_RELBUF combination must
be interpreted to mean that the buffer is still actually dirty and must be
written to its backing store before it can actually be released.  In the case
where B_DELWRI is not set, the underlying dirty pages are still properly
marked as dirty and the buffer can be completely freed without losing that
clean/dirty state information.  (XXX: do we have to check other flags in
this situation?)
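.Pp
The flag interpretation above amounts to a small decision table, sketched here
in user-space C.  The bit values are illustrative and are not the ones in the
kernel headers.  The enum and buf_release_action() are hypothetical names, and
the sketch looks only at the two flags discussed, leaving the open XXX question
about other flags unanswered.
.Bd -literal -offset indent
```c
#include <assert.h>

/* Illustrative bit values only; the kernel headers define the real ones. */
#define B_DELWRI	0x1u	/* delayed write pending */
#define B_RELBUF	0x2u	/* owner no longer needs the buffer */

enum release_action {
	REL_KEEP,		/* release was not requested */
	REL_WRITE_FIRST,	/* still dirty: write, then release */
	REL_FREE_NOW		/* page dirty bits are intact: just free */
};

static enum release_action
buf_release_action(unsigned b_flags)
{
	if ((b_flags & B_RELBUF) == 0)
		return (REL_KEEP);
	if ((b_flags & B_DELWRI) != 0)
		return (REL_WRITE_FIRST);
	return (REL_FREE_NOW);
}
```
.Ed
.Pp
With both flags set the sketch reports that the buffer has to be written
first.  With B_RELBUF alone it reports that the buffer can be freed at once.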
.Pp
The kernel reserves a portion of its KVM space to hold the data maps of VM
buffers.  Even though this is virtual space (since the buffers are mapped
from the buffer cache), we cannot make it arbitrarily large because
instantiated VM buffers (struct bufs) prevent their underlying pages in the
buffer cache from being freed.  This can complicate the life of the paging
system.
11588b85f74SMatthew Dillon.Pp
1165d70612bSMike Pritchard.\" .Sh SEE ALSO
1175d70612bSMike Pritchard.\" .Xr <fillmein> 9
11888b85f74SMatthew Dillon.Sh HISTORY
11988b85f74SMatthew DillonThe
12088b85f74SMatthew Dillon.Nm
121a169de1eSAlexey Zelkinmanual page was originally written by
122a169de1eSAlexey Zelkin.An Matthew Dillon
123a169de1eSAlexey Zelkinand first appeared in
124a169de1eSAlexey Zelkin.Fx 3.1 ,
1255d70612bSMike PritchardDecember 1998.