#
63692125 |
| 28-Feb-2001 |
Matthew Dillon <dillon@FreeBSD.org> |
Fix lockup for loopback NFS mounts. The pipelined I/O limitations could be hit on the client side and prevent the server side from retiring writes. Pipeline operations turned off for all READs (no b
Fix lockup for loopback NFS mounts. The pipelined I/O limitations could be hit on the client side and prevent the server side from retiring writes. Pipeline operations turned off for all READs (no big loss since reads are usually synchronous) and for NFS writes, and left on for the default bwrite(). (MFC expected prior to 4.3 freeze)
Testing by: mjacob, dillon
show more ...
|
#
ba091d96 |
| 06-Feb-2001 |
Jeroen Ruigrok van der Werven <asmodai@FreeBSD.org> |
Fix typo: teh -> the.
|
#
bcc740c4 |
| 19-Jan-2001 |
Matthew Dillon <dillon@FreeBSD.org> |
Do not cluster with B_LOCKED buffers.
This is an odd one. This patch appears to fix a panic related to background bitmap writes (for FFS), though neither Kirk, Ian, or I can figure out how B_CLUSTE
Do not cluster with B_LOCKED buffers.
This is an odd one. This patch appears to fix a panic related to background bitmap writes (for FFS), though neither Kirk, Ian, or I can figure out how B_CLUSTEROK could possibly be set on a bitmap block to cause the clustering code to improperly cluster with a buffer undergoing a background write.
In anycase, the clustering code is very fragile and this patch helps with that, as well as possibly fixing a bug Andre was having.
Suggested by: Ian Dowse <iedowse@maths.tcd.ie> Testing by: Andre Albsmeier <andre.albsmeier@mchp.siemens.de>
show more ...
|
#
2b6b0df7 |
| 26-Dec-2000 |
Matthew Dillon <dillon@FreeBSD.org> |
This implements a better launder limiting solution. There was a solution in 4.2-REL which I ripped out in -stable and -current when implementing the low-memory handling solution. However, maxlaunde
This implements a better launder limiting solution. There was a solution in 4.2-REL which I ripped out in -stable and -current when implementing the low-memory handling solution. However, maxlaunder turns out to be the saving grace in certain very heavily loaded systems (e.g. newsreader box). The new algorithm limits the number of pages laundered in the first pageout daemon pass. If that is not sufficient then suceessive will be run without any limit.
Write I/O is now pipelined using two sysctls, vfs.lorunningspace and vfs.hirunningspace. This prevents excessive buffered writes in the disk queues which cause long (multi-second) delays for reads. It leads to more stable (less jerky) and generally faster I/O streaming to disk by allowing required read ops (e.g. for indirect blocks and such) to occur without interrupting the write stream, amoung other things.
NOTE: eventually, filesystem write I/O pipelining needs to be done on a per-device basis. At the moment it is globalized.
show more ...
|
Revision tags: release/4.2.0 |
|
#
936524aa |
| 19-Nov-2000 |
Matthew Dillon <dillon@FreeBSD.org> |
Implement a low-memory deadlock solution.
Removed most of the hacks that were trying to deal with low-memory situations prior to now.
The new code is based on the concept that I/O must
Implement a low-memory deadlock solution.
Removed most of the hacks that were trying to deal with low-memory situations prior to now.
The new code is based on the concept that I/O must be able to function in a low memory situation. All major modules related to I/O (except networking) have been adjusted to allow allocation out of the system reserve memory pool. These modules now detect a low memory situation but rather then block they instead continue to operate, then return resources to the memory pool instead of cache them or leave them wired.
Code has been added to stall in a low-memory situation prior to a vnode being locked.
Thus situations where a process blocks in a low-memory condition while holding a locked vnode have been reduced to near nothing. Not only will I/O continue to operate, but many prior deadlock conditions simply no longer exist.
Implement a number of VFS/BIO fixes
(found by Ian): in biodone(), bogus-page replacement code, the loop was not properly incrementing loop variables prior to a continue statement. We do not believe this code can be hit anyway but we aren't taking any chances. We'll turn the whole section into a panic (as it already is in brelse()) after the release is rolled.
In biodone(), the foff calculation was incorrectly clamped to the iosize, causing the wrong foff to be calculated for pages in the case of an I/O error or biodone() called without initiating I/O. The problem always caused a panic before. Now it doesn't. The problem is mainly an issue with NFS.
Fixed casts for ~PAGE_MASK. This code worked properly before only because the calculations use signed arithmatic. Better to properly extend PAGE_MASK first before inverting it for the 64 bit masking op.
In brelse(), the bogus_page fixup code was improperly throwing away the original contents of 'm' when it did the j-loop to fix the bogus pages. The result was that it would potentially invalidate parts of the *WRONG* page(!), leading to corruption.
There may still be cases where a background bitmap write is being duplicated, causing potential corruption. We have identified a potentially serious bug related to this but the fix is still TBD. So instead this patch contains a KASSERT to detect the problem and panic the machine rather then continue to corrupt the filesystem. The problem does not occur very often.. it is very hard to reproduce, and it may or may not be the cause of the corruption people have reported.
Review by: (VFS/BIO: mckusick, Ian Dowse <iedowse@maths.tcd.ie>) Testing by: (VM/Deadlock) Paul Saab <ps@yahoo-inc.com>
show more ...
|
#
e5c5b829 |
| 18-Nov-2000 |
Tor Egge <tegge@FreeBSD.org> |
Don't attempt to cluster write buffers where the VMIO flag isn't set.
|
Revision tags: release/4.1.1_cvs, release/4.1.0, release/3.5.0_cvs |
|
#
a2e7a027 |
| 16-Jun-2000 |
Poul-Henning Kamp <phk@FreeBSD.org> |
Virtualizes & untangles the bioops operations vector.
Ref: Message-ID: <18317.961014572@critter.freebsd.dk> To: current@
|
#
9626b608 |
| 05-May-2000 |
Poul-Henning Kamp <phk@FreeBSD.org> |
Separate the struct bio related stuff out of <sys/buf.h> into <sys/bio.h>.
<sys/bio.h> is now a prerequisite for <sys/buf.h> but it shall not be made a nested include according to bdes teachings on
Separate the struct bio related stuff out of <sys/buf.h> into <sys/bio.h>.
<sys/bio.h> is now a prerequisite for <sys/buf.h> but it shall not be made a nested include according to bdes teachings on the subject of nested includes.
Diskdrivers and similar stuff below specfs::strategy() should no longer need to include <sys/buf.> unless they need caching of data.
Still a few bogus uses of struct buf to track down.
Repocopy by: peter
show more ...
|
#
87150cb0 |
| 29-Apr-2000 |
Poul-Henning Kamp <phk@FreeBSD.org> |
s/biowait/bufwait/g
Prodded by: several.
|
#
8177437d |
| 15-Apr-2000 |
Poul-Henning Kamp <phk@FreeBSD.org> |
Complete the bio/buf divorce for all code below devfs::strategy
Exceptions: Vinum untouched. This means that it cannot be compiled. Greg Lehey is on the case.
CCD not conve
Complete the bio/buf divorce for all code below devfs::strategy
Exceptions: Vinum untouched. This means that it cannot be compiled. Greg Lehey is on the case.
CCD not converted yet, casts to struct buf (still safe)
atapi-cd casts to struct buf to examine B_PHYS
show more ...
|
#
c244d2de |
| 02-Apr-2000 |
Poul-Henning Kamp <phk@FreeBSD.org> |
Move B_ERROR flag to b_ioflags and call it BIO_ERROR.
(Much of this done by script)
Move B_ORDERED flag to b_ioflags and call it BIO_ORDERED.
Move b_pblkno and b_iodone_chain to struct bio while w
Move B_ERROR flag to b_ioflags and call it BIO_ERROR.
(Much of this done by script)
Move B_ORDERED flag to b_ioflags and call it BIO_ORDERED.
Move b_pblkno and b_iodone_chain to struct bio while we transition, they will be obsoleted once bio structs chain/stack.
Add bio_queue field for struct bio aware disksort.
Address a lot of stylistic issues brought up by bde.
show more ...
|
#
e4649cfa |
| 02-Apr-2000 |
Matthew Dillon <dillon@FreeBSD.org> |
Change the write-behind code to take more care when starting async I/O's. The sequential read heuristic has been extended to cover writes as well. We continue to call cluster_write() normal
Change the write-behind code to take more care when starting async I/O's. The sequential read heuristic has been extended to cover writes as well. We continue to call cluster_write() normally, thus blocks in the file will still be reallocated for large (but still random) I/O's, but I/O will only be initiated for truely sequential writes.
This solves a number of annoying situations, especially with DBM (hash method) writes, and also has the side effect of fixing a number of (stupid) benchmarks.
Reviewed-by: mckusick
show more ...
|
#
21144e3b |
| 20-Mar-2000 |
Poul-Henning Kamp <phk@FreeBSD.org> |
Remove B_READ, B_WRITE and B_FREEBUF and replace them with a new field in struct buf: b_iocmd. The b_iocmd is enforced to have exactly one bit set.
B_WRITE was bogusly defined as zero giving rise t
Remove B_READ, B_WRITE and B_FREEBUF and replace them with a new field in struct buf: b_iocmd. The b_iocmd is enforced to have exactly one bit set.
B_WRITE was bogusly defined as zero giving rise to obvious coding mistakes.
Also eliminate the redundant struct buf flag B_CALL, it can just as efficiently be done by comparing b_iodone to NULL.
Should you get a panic or drop into the debugger, complaining about "b_iocmd", don't continue. It is likely to write on your disk where it should have been reading.
This change is a step in the direction towards a stackable BIO capability.
A lot of this patch were machine generated (Thanks to style(9) compliance!)
Vinum users: Greg has not had time to test this yet, be careful.
show more ...
|
Revision tags: release/4.0.0_cvs, release/3.4.0_cvs |
|
#
923502ff |
| 29-Oct-1999 |
Poul-Henning Kamp <phk@FreeBSD.org> |
useracc() the prequel:
Merge the contents (less some trivial bordering the silly comments) of <vm/vm_prot.h> and <vm/vm_inherit.h> into <vm/vm.h>. This puts the #defines for the vm_inherit_t and vm
useracc() the prequel:
Merge the contents (less some trivial bordering the silly comments) of <vm/vm_prot.h> and <vm/vm_inherit.h> into <vm/vm.h>. This puts the #defines for the vm_inherit_t and vm_prot_t types next to their typedefs.
This paves the road for the commit to follow shortly: change useracc() to use VM_PROT_{READ|WRITE} rather than B_{READ|WRITE} as argument.
show more ...
|
#
1b5464ef |
| 29-Sep-1999 |
Poul-Henning Kamp <phk@FreeBSD.org> |
Remove v_maxio from struct vnode.
Replace it with mnt_iosize_max in struct mount.
Nits from: bde
|
#
552f337f |
| 20-Sep-1999 |
Poul-Henning Kamp <phk@FreeBSD.org> |
Initialize vp->v_maxio to its default in getnetvnode() rather than four different places in vfs_cluster.c
|
Revision tags: release/3.3.0_cvs |
|
#
87f7b9a9 |
| 31-Aug-1999 |
Tor Egge <tegge@FreeBSD.org> |
If integration of a buffer into a cluster write operation fails, release the buffer instead of creating a future deadlock. PR: 12800 Submitted by: dillon
|
#
c3aac50f |
| 28-Aug-1999 |
Peter Wemm <peter@FreeBSD.org> |
$Id$ -> $FreeBSD$
|
#
ad8ac923 |
| 08-Jul-1999 |
Kirk McKusick <mckusick@FreeBSD.org> |
These changes appear to give us benefits with both small (32MB) and large (1G) memory machine configurations. I was able to run 'dbench 32' on a 32MB system without bring the machine to a grinding h
These changes appear to give us benefits with both small (32MB) and large (1G) memory machine configurations. I was able to run 'dbench 32' on a 32MB system without bring the machine to a grinding halt.
* buffer cache hash table now dynamically allocated. This will have no effect on memory consumption for smaller systems and will help scale the buffer cache for larger systems.
* minor enhancement to pmap_clearbit(). I noticed that all the calls to it used constant arguments. Making it an inline allows the constants to propogate to deeper inlines and should produce better code.
* removal of inherent vfs_ioopt support through the emplacement of appropriate #ifdef's, with John's permission. If we do not find a use for it by the end of the year we will remove it entirely.
* removal of getnewbufloops* counters & sysctl's - no longer necessary for debugging, getnewbuf() is now optimal.
* buffer hash table functions removed from sys/buf.h and localized to vfs_bio.c
* VFS_BIO_NEED_DIRTYFLUSH flag and support code added ( bwillwrite() ), allowing processes to block when too many dirty buffers are present in the system.
* removal of a softdep test in bdwrite() that is no longer necessary now that bdwrite() no longer attempts to flush dirty buffers.
* slight optimization added to bqrelse() - there is no reason to test for available buffer space on B_DELWRI buffers.
* addition of reverse-scanning code to vfs_bio_awrite(). vfs_bio_awrite() will attempt to locate clusterable areas in both the forward and reverse direction relative to the offset of the buffer passed to it. This will probably not make much of a difference now, but I believe we will start to rely on it heavily in the future if we decide to shift some of the burden of the clustering closer to the actual I/O initiation.
* Removal of the newbufcnt and lastnewbuf counters that Kirk added. They do not fix any race conditions that haven't already been fixed by the gbincore() test done after the only call to getnewbuf(). getnewbuf() is a static, so there is no chance of it being misused by other modules. ( Unless Kirk can think of a specific thing that this code fixes. I went through it very carefully and didn't see anything ).
* removal of VOP_ISLOCKED() check in flushbufqueues(). I do not think this check is necessary, the buffer should flush properly whether the vnode is locked or not. ( yes? ).
* removal of extra arguments passed to getnewbuf() that are not necessary.
* missed cluster_wbuild() that had to be a cluster_wbuild_wb() in vfs_cluster.c
* vn_write() now calls bwillwrite() *PRIOR* to locking the vnode, which should greatly aid flushing operations in heavy load situations - both the pageout and update daemons will be able to operate more efficiently.
* removal of b_usecount. We may add it back in later but for now it is useless. Prior implementations of the buffer cache never had enough buffers for it to be useful, and current implementations which make more buffers available might not benefit relative to the amount of sophistication required to implement a b_usecount. Straight LRU should work just as well, especially when most things are VMIO backed. I expect that (even though John will not like this assumption) directories will become VMIO backed some point soon.
Submitted by: Matthew Dillon <dillon@backplane.com> Reviewed by: Kirk McKusick <mckusick@mckusick.com>
show more ...
|
#
1c9ca585 |
| 04-Jul-1999 |
Kirk McKusick <mckusick@FreeBSD.org> |
The vfs.write_behind sysctl and related code support has been added to allow changes to the filesystem's write_behind behavior. By the default the filesystem aggressively issues write_behind's. Thr
The vfs.write_behind sysctl and related code support has been added to allow changes to the filesystem's write_behind behavior. By the default the filesystem aggressively issues write_behind's. Three values may be specified for vfs.write_behind. 0 disables write_behind, 1 results in historical operation (agressive write_behind), and 2 is an experimental backed-off write_behind. The values of 0 and 1 are recommended. The value of 0 is recommended in conjuction with an increase in the number of NBUF's and the number of dirty buffers allowed (vfs.{lo,hi}dirtybuffers). Note that a value of 0 will radically increase the dirty buffer load on the system. Future work on write_behind behavior will use values 2 and greater for testing purposes.
Submitted by: Matthew Dillon <dillon@apollo.backplane.com> Reviewed by: Kirk McKusick <mckusick@mckusick.com>
show more ...
|
#
ddebd879 |
| 29-Jun-1999 |
Peter Wemm <peter@FreeBSD.org> |
Hopefully fix the remaining glitches with the BUF_*() changes. This should (really this time) fix pageout to swap and a couple of clustering cases.
This simplifies BUF_KERNPROC() so that it uncondi
Hopefully fix the remaining glitches with the BUF_*() changes. This should (really this time) fix pageout to swap and a couple of clustering cases.
This simplifies BUF_KERNPROC() so that it unconditionally reassigns the lock owner rather than testing B_ASYNC and having the caller decide when to do the reassign. At present this is required because some places use B_CALL/b_iodone to free the buffers without B_ASYNC being set. Also, vfs_cluster.c explicitly calls BUF_KERNPROC() when attaching the buffers rather than the parent walking the cluster_head tailq.
Reviewed by: Kirk McKusick <mckusick@mckusick.com>
show more ...
|
#
67812eac |
| 26-Jun-1999 |
Kirk McKusick <mckusick@FreeBSD.org> |
Convert buffer locking from using the B_BUSY and B_WANTED flags to using lockmgr locks. This commit should be functionally equivalent to the old semantics. That is, all buffer locking is done with LK
Convert buffer locking from using the B_BUSY and B_WANTED flags to using lockmgr locks. This commit should be functionally equivalent to the old semantics. That is, all buffer locking is done with LK_EXCLUSIVE requests. Changes to take advantage of LK_SHARED and LK_RECURSIVE will be done in future commits.
show more ...
|
#
4e1b7540 |
| 17-Jun-1999 |
Julian Elischer <julian@FreeBSD.org> |
Reformat comment to match indentation of code around it.
|
#
cd3fe8d0 |
| 16-Jun-1999 |
David Greenman <dg@FreeBSD.org> |
Changed trypbuf to a getpbuf to work around a problem where redundant writes would occur when clustering them - caused by running out of buffers and taking a degenerate code path as a result. It appe
Changed trypbuf to a getpbuf to work around a problem where redundant writes would occur when clustering them - caused by running out of buffers and taking a degenerate code path as a result. It appears that waiting instead for buffers to become available is okay.
Submitted by: Matthew Dillon <dillon@apollo.backplane.com> Discovered by: Craig A Soules <soules+@andrew.cmu.edu>
show more ...
|
Revision tags: release/3.2.0 |
|
#
4221e284 |
| 03-May-1999 |
Alan Cox <alc@FreeBSD.org> |
The VFS/BIO subsystem contained a number of hacks in order to optimize piecemeal, middle-of-file writes for NFS. These hacks have caused no end of trouble, especially when combined with mmap(). I'v
The VFS/BIO subsystem contained a number of hacks in order to optimize piecemeal, middle-of-file writes for NFS. These hacks have caused no end of trouble, especially when combined with mmap(). I've removed them. Instead, NFS will issue a read-before-write to fully instantiate the struct buf containing the write. NFS does, however, optimize piecemeal appends to files. For most common file operations, you will not notice the difference. The sole remaining fragment in the VFS/BIO system is b_dirtyoff/end, which NFS uses to avoid cache coherency issues with read-merge-write style operations. NFS also optimizes the write-covers-entire-buffer case by avoiding the read-before-write. There is quite a bit of room for further optimization in these areas.
The VM system marks pages fully-valid (AKA vm_page_t->valid = VM_PAGE_BITS_ALL) in several places, most noteably in vm_fault. This is not correct operation. The vm_pager_get_pages() code is now responsible for marking VM pages all-valid. A number of VM helper routines have been added to aid in zeroing-out the invalid portions of a VM page prior to the page being marked all-valid. This operation is necessary to properly support mmap(). The zeroing occurs most often when dealing with file-EOF situations. Several bugs have been fixed in the NFS subsystem, including bits handling file and directory EOF situations and buf->b_flags consistancy issues relating to clearing B_ERROR & B_INVAL, and handling B_DONE.
getblk() and allocbuf() have been rewritten. B_CACHE operation is now formally defined in comments and more straightforward in implementation. B_CACHE for VMIO buffers is based on the validity of the backing store. B_CACHE for non-VMIO buffers is based simply on whether the buffer is B_INVAL or not (B_CACHE set if B_INVAL clear, and vise-versa). biodone() is now responsible for setting B_CACHE when a successful read completes. B_CACHE is also set when a bdwrite() is initiated and when a bwrite() is initiated. VFS VOP_BWRITE routines (there are only two - nfs_bwrite() and bwrite()) are now expected to set B_CACHE. This means that bowrite() and bawrite() also set B_CACHE indirectly.
There are a number of places in the code which were previously using buf->b_bufsize (which is DEV_BSIZE aligned) when they should have been using buf->b_bcount. These have been fixed. getblk() now clears B_DONE on return because the rest of the system is so bad about dealing with B_DONE.
Major fixes to NFS/TCP have been made. A server-side bug could cause requests to be lost by the server due to nfs_realign() overwriting other rpc's in the same TCP mbuf chain. The server's kernel must be recompiled to get the benefit of the fixes.
Submitted by: Matthew Dillon <dillon@apollo.backplane.com>
show more ...
|