.\" Copyright (c) 1998
.\"	The Regents of the University of California.  All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\" 3. Neither the name of the University nor the names of its contributors
.\"    may be used to endorse or promote products derived from this software
.\"    without specific prior written permission.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.Dd December 22, 1998
.Dt BUF 9
.Os
.Sh NAME
.Nm buf
.Nd "kernel buffer I/O scheme used in FreeBSD VM system"
.Sh DESCRIPTION
The kernel implements a KVM abstraction of the buffer cache which allows it
to map potentially disparate vm_page structures into contiguous KVM for use by
(mainly file system) devices and device I/O.
This abstraction supports
block sizes from DEV_BSIZE (usually 512) up to several pages or more.
It also supports a relatively primitive byte-granular valid range and dirty
range, currently hardcoded for use by NFS.
The code implementing the
VM Buffer abstraction is mostly concentrated in
.Pa /usr/src/sys/kern/vfs_bio.c .
.Pp
One of the most important things to remember when dealing with buffer pointers
(struct buf) is that the underlying pages are mapped directly from the buffer
cache.
No data copying occurs in the scheme proper, though some file systems
such as UFS do have to copy a little when dealing with file fragments.
The second most important thing to remember is that due to the underlying page
mapping, the b_data base pointer in a buf is always *page* aligned, not
*block* aligned.
When you have a VM buffer representing some b_offset and
b_size, the actual start of the buffer is (b_data + (b_offset & PAGE_MASK))
and not just b_data.
Finally, the VM system's core buffer cache supports
valid and dirty bits (m->valid, m->dirty) for pages in DEV_BSIZE chunks.
Thus
a platform with a hardware page size of 4096 bytes has 8 valid and 8 dirty
bits per page.
These bits are generally set and cleared in groups based on the device
block size of the device backing the page.
A complete page's worth of bits is often
referred to using the VM_PAGE_BITS_ALL bitmask (i.e., 0xFF if the hardware page
size is 4096).
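.Pp
The pointer arithmetic and the per-page bitmask described above can be
sketched in userland C.
This is a minimal illustration and not kernel code: the SK_ constants,
struct sk_buf, sk_buf_start() and sk_page_bits() are hypothetical names
introduced here, and the 4096-byte page and 512-byte DEV_BSIZE are assumed
values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed values for illustration; a kernel takes these from machine headers. */
#define SK_PAGE_SIZE     4096u
#define SK_PAGE_MASK     (SK_PAGE_SIZE - 1u)
#define SK_DEV_BSHIFT    9u                 /* log2 of a 512-byte DEV_BSIZE */
#define SK_PAGE_BITS_ALL 0xFFu              /* 4096 / 512 = 8 bits per page */

/* Hypothetical stand-in for the two struct buf fields discussed above. */
struct sk_buf {
	char    *b_data;    /* page-aligned base of the mapping */
	int64_t  b_offset;  /* byte offset of the buffer in its backing object */
};

/* First byte of the buffer's data: b_data + (b_offset & PAGE_MASK). */
static char *
sk_buf_start(const struct sk_buf *bp)
{
	return (bp->b_data + (size_t)((uint64_t)bp->b_offset & SK_PAGE_MASK));
}

/* Mask of the DEV_BSIZE chunks touched by [base, base + size) in one page. */
static unsigned
sk_page_bits(unsigned base, unsigned size)
{
	unsigned first, last;

	assert(base + size <= SK_PAGE_SIZE);
	if (size == 0)
		return (0);
	first = base >> SK_DEV_BSHIFT;
	last = (base + size - 1) >> SK_DEV_BSHIFT;
	/* Bits first..last inclusive. */
	return ((2u << last) - (1u << first));
}
```

Under these assumptions, sk_page_bits(0, SK_PAGE_SIZE) yields 0xFF, which
matches the VM_PAGE_BITS_ALL value quoted above for a 4096-byte page.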
.Pp
VM buffers also keep track of a byte-granular dirty range and valid range.
This feature is normally only used by the NFS subsystem.
I am not sure why it
is used at all, actually, since we have DEV_BSIZE valid/dirty granularity
within the VM buffer.
If a buffer dirty operation creates a 'hole',
the dirty range will extend to cover the hole.
If a buffer validation
operation creates a 'hole', the byte-granular valid range is left alone and
will not take into account the new extension.
Thus the whole byte-granular
abstraction is considered a bad hack and it would be nice if we could get rid
of it completely.
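.Pp
The asymmetry between the dirty range and the valid range can be sketched
as follows.
This is only one reading of the behaviour described above, written for
userland: struct sk_range, sk_mark_dirty() and sk_mark_valid() are
hypothetical names, and the half-open [off, end) convention is an assumption
of the sketch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical half-open byte range [off, end); off >= end means empty. */
struct sk_range {
	int off;
	int end;
};

static bool
sk_range_empty(const struct sk_range *r)
{
	return (r->off >= r->end);
}

/*
 * Dirtying always leaves a single span, so any gap between the old
 * range and the new bytes is absorbed into the dirty range.
 */
static void
sk_mark_dirty(struct sk_range *dirty, int off, int end)
{
	assert(off <= end);
	if (off == end)
		return;
	if (sk_range_empty(dirty)) {
		dirty->off = off;
		dirty->end = end;
		return;
	}
	if (off < dirty->off)
		dirty->off = off;
	if (end > dirty->end)
		dirty->end = end;
}

/*
 * Validation only grows the range when the new bytes touch or overlap
 * it.  A disjoint extension would create a hole, so the range is left
 * alone and false is returned.
 */
static bool
sk_mark_valid(struct sk_range *valid, int off, int end)
{
	assert(off <= end);
	if (off == end)
		return (true);
	if (sk_range_empty(valid)) {
		valid->off = off;
		valid->end = end;
		return (true);
	}
	if (off > valid->end || end < valid->off)
		return (false);
	if (off < valid->off)
		valid->off = off;
	if (end > valid->end)
		valid->end = end;
	return (true);
}
```

Starting from [0, 100), dirtying [300, 400) in this sketch produces a dirty
range of [0, 400), while validating [300, 400) leaves the valid range at
[0, 100).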
.Pp
A VM buffer is capable of mapping the underlying VM cache pages into KVM in
order to allow the kernel to directly manipulate the data associated with
the (vnode,b_offset,b_size).
The kernel typically unmaps VM buffers the moment
they are no longer needed but often keeps the 'struct buf' structure
instantiated, and even the bp->b_pages array instantiated, despite having
unmapped them from KVM.
If a page making up a VM buffer is about to undergo I/O, the
system typically unmaps it from KVM and replaces the page in the b_pages[]
array with a place-marker called bogus_page.
The place-marker forces any kernel
subsystems referencing the associated struct buf to re-lookup the associated
page.
I believe the place-marker hack is used to allow sophisticated devices
such as file system devices to remap underlying pages in order to deal with,
for example, re-mapping a file fragment into a file block.
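.Pp
The place-marker idea can be sketched in userland C.
This illustrates only the pattern described above and makes no claim about
how vfs_bio.c implements it: struct sk_page, struct sk_object,
struct sk_pbuf, sk_bogus_page and both helper functions are hypothetical
names, and the fixed four-slot array is an assumption made to keep the
sketch small.

```c
#include <assert.h>
#include <stddef.h>

#define SK_NPAGES 4	/* assumed slot count, for illustration only */

struct sk_page {
	int id;
};

/* One shared placeholder, playing the role the text gives to bogus_page. */
static struct sk_page sk_bogus_page;

/* Hypothetical authoritative page table for the backing object. */
struct sk_object {
	struct sk_page *pages[SK_NPAGES];
};

/* Hypothetical buffer whose b_pages[] caches pointers into the object. */
struct sk_pbuf {
	struct sk_object *obj;
	struct sk_page *b_pages[SK_NPAGES];
};

/* Swap slot i for the placeholder before I/O starts on that page. */
static void
sk_buf_hide_page(struct sk_pbuf *bp, size_t i)
{
	assert(i < SK_NPAGES);
	bp->b_pages[i] = &sk_bogus_page;
}

/*
 * Return the page for slot i.  A placeholder in the slot means the
 * cached pointer cannot be trusted, so look the page up again.
 */
static struct sk_page *
sk_buf_page(struct sk_pbuf *bp, size_t i)
{
	assert(i < SK_NPAGES);
	if (bp->b_pages[i] == &sk_bogus_page)
		bp->b_pages[i] = bp->obj->pages[i];
	return (bp->b_pages[i]);
}
```

Because every reader goes through sk_buf_page(), a page that was remapped
in the object while the placeholder was in the slot is picked up on the next
access instead of a stale pointer.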
.Pp
VM buffers are used to track I/O operations within the kernel.
Unfortunately,
the I/O implementation is also somewhat of a hack because the kernel wants
to clear the dirty bit on the underlying pages the moment it queues the I/O
to the VFS device, not when the physical I/O is actually initiated.
This
can create confusion within file system devices that use delayed-writes because
you wind up with pages marked clean that are actually still dirty.
If not
treated carefully, these pages could be thrown away!
Indeed, a number of
serious bugs related to this hack were not fixed until the 2.2.8/3.0 release.
The kernel uses an instantiated VM buffer (i.e., struct buf) to place-mark pages
in this special state.
The buffer is typically flagged B_DELWRI.
When a
device no longer needs a buffer it typically flags it as B_RELBUF.
Due to
the underlying pages being marked clean, the B_DELWRI|B_RELBUF combination must
be interpreted to mean that the buffer is still actually dirty and must be
written to its backing store before it can actually be released.
In the case
where B_DELWRI is not set, the underlying dirty pages are still properly
marked as dirty and the buffer can be completely freed without losing that
clean/dirty state information.
(XXX do we have to check other flags with
regard to this situation?)
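.Pp
The flag interpretation above can be written as a small decision function.
This is a hedged userland sketch: the SK_B_ bit values are placeholders
chosen here and are not copied from sys/buf.h, and enum sk_release and
sk_classify_release() are hypothetical names.
It covers only the two flags discussed and does not answer the open question
about other flags.

```c
#include <assert.h>

/* Placeholder bit values for this sketch, not the kernel's definitions. */
#define SK_B_DELWRI 0x1u
#define SK_B_RELBUF 0x2u

enum sk_release {
	SK_KEEP,         /* B_RELBUF not set: no release was requested */
	SK_WRITE_FIRST,  /* pages look clean but the buffer is still dirty */
	SK_FREE_NOW      /* pages carry their own dirty state; safe to free */
};

static enum sk_release
sk_classify_release(unsigned b_flags)
{
	/* The two flags must be distinct bits for the tests below to work. */
	assert((SK_B_DELWRI & SK_B_RELBUF) == 0);

	if ((b_flags & SK_B_RELBUF) == 0)
		return (SK_KEEP);
	/*
	 * B_DELWRI|B_RELBUF: the dirty bits on the pages were already
	 * cleared, so the buffer is the only record that the data still
	 * has to reach the backing store.
	 */
	if ((b_flags & SK_B_DELWRI) != 0)
		return (SK_WRITE_FIRST);
	return (SK_FREE_NOW);
}
```

The point of the sketch is the ordering of the checks: B_DELWRI has to be
examined before a B_RELBUF buffer is freed, because the page-level dirty bits
can no longer be relied on in that state.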
.Pp
The kernel reserves a portion of its KVM space to hold the data maps of
VM buffers.
Even though this is virtual space (since the buffers are mapped
from the buffer cache), we cannot make it arbitrarily large because
instantiated VM buffers (struct buf structures) prevent their underlying
pages in the buffer cache from being freed.
This can complicate the life of the paging
system.
.Sh HISTORY
The
.Nm
manual page was originally written by
.An Matthew Dillon
and first appeared in
.Fx 3.1 ,
December 1998.