#
dfdcada3 |
| 26-Mar-2008 |
Doug Rabson <dfr@FreeBSD.org> |
Add the new kernel-mode NFS Lock Manager. To use it instead of the user-mode lock manager, build a kernel with the NFSLOCKD option and add '-k' to 'rpc_lockd_flags' in rc.conf.
Highlights include:
Add the new kernel-mode NFS Lock Manager. To use it instead of the user-mode lock manager, build a kernel with the NFSLOCKD option and add '-k' to 'rpc_lockd_flags' in rc.conf.
Highlights include:
* Thread-safe kernel RPC client - many threads can use the same RPC client handle safely with replies being de-multiplexed at the socket upcall (typically driven directly by the NIC interrupt) and handed off to whichever thread matches the reply. For UDP sockets, many RPC clients can share the same socket. This allows the use of a single privileged UDP port number to talk to an arbitrary number of remote hosts.
* Single-threaded kernel RPC server. Adding support for multi-threaded server would be relatively straightforward and would follow approximately the Solaris KPI. A single thread should be sufficient for the NLM since it should rarely block in normal operation.
* Kernel mode NLM server supporting cancel requests and granted callbacks. I've tested the NLM server reasonably extensively - it passes both my own tests and the NFS Connectathon locking tests running on Solaris, Mac OS X and Ubuntu Linux.
* Userland NLM client supported. While the NLM server doesn't have support for the local NFS client's locking needs, it does have to field async replies and granted callbacks from remote NLMs that the local client has contacted. We relay these replies to the userland rpc.lockd over a local domain RPC socket.
* Robust deadlock detection for the local lock manager. In particular it will detect deadlocks caused by a lock request that covers more than one blocking request. As required by the NLM protocol, all deadlock detection happens synchronously - a user is guaranteed that if a lock request isn't rejected immediately, the lock will eventually be granted. The old system allowed for a 'deferred deadlock' condition where a blocked lock request could wake up and find that some other deadlock-causing lock owner had beaten them to the lock.
* Since both local and remote locks are managed by the same kernel locking code, local and remote processes can safely use file locks for mutual exclusion. Local processes have no fairness advantage compared to remote processes when contending to lock a region that has just been unlocked - the local lock manager enforces a strict first-come first-served model for both local and remote lockers.
Sponsored by: Isilon Systems PR: 95247 107555 115524 116679 MFC after: 2 weeks
show more ...
|
Revision tags: release/7.0.0_cvs, release/7.0.0, release/6.3.0_cvs, release/6.3.0 |
|
#
22db15c0 |
| 13-Jan-2008 |
Attilio Rao <attilio@FreeBSD.org> |
VOP_LOCK1() (and so VOP_LOCK()) and VOP_UNLOCK() are only used in conjuction with 'thread' argument passing which is always curthread. Remove the unuseful extra-argument and pass explicitly curthread
VOP_LOCK1() (and so VOP_LOCK()) and VOP_UNLOCK() are only used in conjuction with 'thread' argument passing which is always curthread. Remove the unuseful extra-argument and pass explicitly curthread to lower layer functions, when necessary.
KPI results broken by this change, which should affect several ports, so version bumping and manpage update will be further committed.
Tested by: kris, pho, Diego Sardina <siarodx at gmail dot com>
show more ...
|
#
cb05b60a |
| 10-Jan-2008 |
Attilio Rao <attilio@FreeBSD.org> |
vn_lock() is currently only used with the 'curthread' passed as argument. Remove this argument and pass curthread directly to underlying VOP_LOCK1() VFS method. This modify makes the code cleaner and
vn_lock() is currently only used with the 'curthread' passed as argument. Remove this argument and pass curthread directly to underlying VOP_LOCK1() VFS method. This modify makes the code cleaner and in particular remove an annoying dependence helping next lockmgr() cleanup. KPI results, obviously, changed.
Manpage and FreeBSD_version will be updated through further commits.
As a side note, would be valuable to say that next commits will address a similar cleanup about VFS methods, in particular vop_lock1 and vop_unlock.
Tested by: Diego Sardina <siarodx at gmail dot com>, Andrea Di Pasquale <whyx dot it at gmail dot com>
show more ...
|
#
cb65c1ee |
| 19-Oct-2007 |
Bruce Evans <bde@FreeBSD.org> |
Implement the async (really, delayed-write) mount option for msdosfs.
This is much simpler than for ffs since there are many fewer places where we need to choose between a delayed write and a sync w
Implement the async (really, delayed-write) mount option for msdosfs.
This is much simpler than for ffs since there are many fewer places where we need to choose between a delayed write and a sync write -- just 5 in msdosfs and more than 30 in ffs.
This is more complete and correct than in ffs. Several places in ffs are are still missing the choice. ffs_update() has a layering violation that breaks callers which want to force a sync update (mainly fsync(2) and O_SYNC write(2)).
However, fsync(2) and O_SYNC write(2) are still more broken than in ffs, since they are broken for default (non-sync non-async) mounts too. Both fail to sync the FAT in all cases, and both fail to sync the directory entry in some cases after losing a race. Async everything is probably safer than the half-baked sync of metadata given by default mounts.
show more ...
|
#
cefb5582 |
| 18-Oct-2007 |
Bruce Evans <bde@FreeBSD.org> |
In msdosfs_settattr(), don't do synchronous updates of the denode (except indirectly for the size pseudo-attribute). If anything deserves a sync update, then it is ids and immutable flags, since the
In msdosfs_settattr(), don't do synchronous updates of the denode (except indirectly for the size pseudo-attribute). If anything deserves a sync update, then it is ids and immutable flags, since these are related to security, but ffs never synced these and msdosfs doesn't support them. (ufs_setattr() only does an update in one case where it is least needed (for timestamps); it did pessimal sync updates for timestamps until 1998/03/08 but was changed for unlogged reasons related to soft updates.)
Now msdosfs calls deupdat() with waitfor == 0, which normally gives a delayed update to disk but always gives a sync update of timestamps in core, while for ffs everything is delayed until the syncer daemon or other activity causes an update (except for timestamps).
This gives a large optimization mainly for things like cp -p, where attribute adjustment could easily triple the number of physical I/O's if it is done synchronously (but cp -p to msdosfs is not as bad as that, since msdosfs doesn't support many attributes so null adjustments are more common, and msdosfs doesn't support ctimes so even if cp doesn't weed out null adjustments they don't become non-null after clobbering the ctime).
show more ...
|
#
c2819440 |
| 01-Sep-2007 |
Bruce Evans <bde@FreeBSD.org> |
Fix races in msdosfs_lookup() and msdosfs_readdir(). These functions can easily block in bread(), and then there was nothing to prevent the static buffer (nambuf_{ptr,len,last_id}) being clobbered b
Fix races in msdosfs_lookup() and msdosfs_readdir(). These functions can easily block in bread(), and then there was nothing to prevent the static buffer (nambuf_{ptr,len,last_id}) being clobbered by another thread.
The effects of the bug seem to have been limited to failed lookups and mangled names in readdir(), since Giant locking provides enough serialization to prevent concurrent calls to the functions that access the buffer. They were very obvious for multiple concurrent tree walks, especially with a small cluster size.
The bug was introduced in msdosfs_conv.c 1.34 and associated changes, and is in all releases starting with 5.2.
The fix is to allocate the buffer as a local variable and pass around pointers to it like "_r" functions in libc do. Stack use from this is large but not too large. This also fixes a memory leak on module unload.
Reviewed by: kib Approved by: re (kensmith)
show more ...
|
#
a4e6807c |
| 07-Aug-2007 |
Bruce Evans <bde@FreeBSD.org> |
In msdosfs_read() and msdosfs_write(), don't check explicitly for (uio_offset < 0) since this can't happen. If this happens, then the general code handles the problem safely (better than before for
In msdosfs_read() and msdosfs_write(), don't check explicitly for (uio_offset < 0) since this can't happen. If this happens, then the general code handles the problem safely (better than before for reading, returning 0 (EOF) instead of the bogus errno EINVAL, and the same as before for writing, returning EFBIG).
In msdosfs_read(), don't check for (uio_resid < 0). msdosfs_write() already didn't check.
In msdosfs_read(), document in a comment our assumptions that the caller passed a valid uio_offset and uio_resid. ffs checks using KASSERT(), and that is enough sanity checking. In the same comment, partly document there is no need to check for the EOVERFLOW case, unlike in ffs where this case can happen at least in theory.
In msdosfs_write(), add a comment about why the checking of (uio_resid == 0) is explicit, unlike in ffs.
In msdosfs_write(), check for impossibly large final offsets before checking if the file size rlimit would be exceeded, so that we don't have an overflow bug in the rlimit check and are consistent with ffs. We now return EFBIG instead of EFBIG plus a SIGXFSZ signal if the final offset would be impossibly large but not so large as to cause overflow. Overflow normally gave the benign behaviour of no signal.
Approved by: re (kensmith) (blanket)
show more ...
|
#
b7837a91 |
| 07-Aug-2007 |
Bruce Evans <bde@FreeBSD.org> |
Fix and update the comments about the effect of the read-only flag on writing. They are still too verbose.
Remove nearby unreachable code for handling symlinks.
Approved by: re (kensmith) (blanket)
|
#
c0f5121c |
| 07-Aug-2007 |
Bruce Evans <bde@FreeBSD.org> |
Fix some style bugs (don't assume that off_t == int64_t; fix some comments; remove some parentheses; fix only a couple of whtespace errors).
Approved by: re (kensmith) (blanket)
|
#
d2bb66ba |
| 07-Aug-2007 |
Bruce Evans <bde@FreeBSD.org> |
Sort includes.
Remove rotted banal comment attached to includes.
Approved by: re (kensmith) (blanket)
|
#
eba34270 |
| 07-Aug-2007 |
Bruce Evans <bde@FreeBSD.org> |
Include <sys/mutex.h> and its prerequisite <sys/lock.h> instead of depending on namespace pollution in <sys/buf.h> and/or <sys/vnode.h>
Approved by: re (kensmith) (blanket)
|
#
6fd81fc7 |
| 07-Aug-2007 |
Bruce Evans <bde@FreeBSD.org> |
Remove unused include(s).
Approved by: re (kensmith) (blanket)
|
#
6b6c5f5e |
| 20-Jul-2007 |
Bruce Evans <bde@FreeBSD.org> |
Implement vfs clustering for msdosfs.
This gives a very large speedup for small block sizes (in my tests, about 5 times for write and 3 times for read with a block size of 512, if clustering is poss
Implement vfs clustering for msdosfs.
This gives a very large speedup for small block sizes (in my tests, about 5 times for write and 3 times for read with a block size of 512, if clustering is possible) and a moderate speedup for the moderatatly large block sizes that should be used on non-small media (4K is the best size in most cases, and the speedup for that is about 1.3 times for write and 1.2 times for read). mmap() should benefit from clustering like read()/write(), but the current implementation of vm only supports clustering (at least for getpages) if the fs block size is >= PAGE SIZE.
msdosfs is now only slightly slower than ffs with soft updates for writing and slightly faster for reading when both use their best block sizes. Writing is slower for msdosfs because of more sync writes. Reading is faster for msdosfs because indirect blocks interfere with clustering in ffs.
The changes in msdosfs_read() and msdosfs_write() are simpler merges of corresponding code in ffs (after fixing some style bugs in ffs). msdosfs_bmap() needs fs-specific code. This implementation loops calling a lower level bmap function to do the hard parts. This is a bit inefficient, but is efficient enough since msdsfs_bmap() is only called when there is physical i/o to do.
Approved by: re (hrs)
show more ...
|
#
d34b0a1b |
| 20-Jul-2007 |
Bruce Evans <bde@FreeBSD.org> |
Clean up before implementing vfs clustering for msdosfs:
In msdosfs_read(), mainly reorder the main loop to the same order as in ffs_read().
In msdosfs_write() and extendfile(), use vfs_bio_clrbuf(
Clean up before implementing vfs clustering for msdosfs:
In msdosfs_read(), mainly reorder the main loop to the same order as in ffs_read().
In msdosfs_write() and extendfile(), use vfs_bio_clrbuf() instead of clrbuf(). I think this just just a bogus optimization, but ffs always does it and msdosfs already did it in one place, and it is what I've tested.
In msdosfs_write(), merge good bits from a comment in ffs_write(), and fix 1 style bug.
In the main comment for msdosfs_pcbmap(), improve wording and catch up with 13 years of changes in the function. This comment belongs in VOP_BMAP.9 but that doesn't exist.
In msdosfs_bmap(), return EFBIG if the requested cluster number is out of bounds instead of blindly truncating it, and fix many style bugs.
Approved by: re (hrs)
show more ...
|
#
32f9753c |
| 12-Jun-2007 |
Robert Watson <rwatson@FreeBSD.org> |
Eliminate now-unused SUSER_ALLOWJAIL arguments to priv_check_cred(); in some cases, move to priv_check() if it was an operation on a thread and no other flags were present.
Eliminate caller-side jai
Eliminate now-unused SUSER_ALLOWJAIL arguments to priv_check_cred(); in some cases, move to priv_check() if it was an operation on a thread and no other flags were present.
Eliminate caller-side jail exception checking (also now-unused); jail privilege exception code now goes solely in kern_jail.c.
We can't yet eliminate suser() due to some cases in the KAME code where a privilege check is performed and then used in many different deferred paths. Do, however, move those prototypes to priv.h.
Reviewed by: csjp Obtained from: TrustedBSD Project
show more ...
|
#
10bcafe9 |
| 15-Feb-2007 |
Pawel Jakub Dawidek <pjd@FreeBSD.org> |
Move vnode-to-file-handle translation from vfs_vptofh to vop_vptofh method. This way we may support multiple structures in v_data vnode field within one file system without using black magic.
Vnode-
Move vnode-to-file-handle translation from vfs_vptofh to vop_vptofh method. This way we may support multiple structures in v_data vnode field within one file system without using black magic.
Vnode-to-file-handle should be VOP in the first place, but was made VFS operation to keep interface as compatible as possible with SUN's VFS. BTW. Now Solaris also implements vnode-to-file-handle as VOP operation.
VFS_VPTOFH() was left for API backward compatibility, but is marked for removal before 8.0-RELEASE.
Approved by: mckusick Discussed with: many (on IRC) Tested with: ufs, msdosfs, cd9660, nullfs and zfs
show more ...
|
#
61ad2e26 |
| 30-Jan-2007 |
Tai-hwa Liang <avatar@FreeBSD.org> |
Fixing compilation bustage by removing references to opt_msdosfs.h.
This auto-generated header file no longer exists since the removal of MSDOSFS_LARGE in sys/conf/options:1.574.
|
#
f458f2a5 |
| 30-Jan-2007 |
Craig Rodrigues <rodrigc@FreeBSD.org> |
Add a "-o large" mount option for msdosfs. Convert compile-time checks for #ifdef MSDOSFS_LARGE to run-time checks to see if "-o large" was specified.
Test case provided by Oliver Fromme: truncat
Add a "-o large" mount option for msdosfs. Convert compile-time checks for #ifdef MSDOSFS_LARGE to run-time checks to see if "-o large" was specified.
Test case provided by Oliver Fromme: truncate -s 200G test.img mdconfig -a -t vnode -f test.img -u 9 newfs_msdos -s 419430400 -n 1 /dev/md9 zip250 mount -t msdosfs /dev/md9 /mnt # should fail mount -t msdosfs -o large /dev/md9 /mnt # should succeed
PR: 105964 Requested by: Oliver Fromme <olli lurza secnetix de> Tested by: trhodes MFC after: 2 weeks
show more ...
|
Revision tags: release/6.2.0_cvs, release/6.2.0 |
|
#
1c5cf521 |
| 03-Dec-2006 |
Maxim Konovalov <maxim@FreeBSD.org> |
o Do not leave uninitialized birthtime: in MSDOSFSMNT_LONGNAME set birthtime to FAT CTime (creation time) and in the other cases set birthtime to -1.
o Set ctime to mtime instead of FAT CTime which
o Do not leave uninitialized birthtime: in MSDOSFSMNT_LONGNAME set birthtime to FAT CTime (creation time) and in the other cases set birthtime to -1.
o Set ctime to mtime instead of FAT CTime which has completely different meaning.
PR: kern/106018 Submitted by: Oliver Fromme MFC after: 1 month
show more ...
|
#
acd3428b |
| 06-Nov-2006 |
Robert Watson <rwatson@FreeBSD.org> |
Sweep kernel replacing suser(9) calls with priv(9) calls, assigning specific privilege names to a broad range of privileges. These may require some future tweaking.
Sponsored by: nCircle
Sweep kernel replacing suser(9) calls with priv(9) calls, assigning specific privilege names to a broad range of privileges. These may require some future tweaking.
Sponsored by: nCircle Network Security, Inc. Obtained from: TrustedBSD Project Discussed on: arch@ Reviewed (at least in part) by: mlaier, jmg, pjd, bde, ceri, Alex Lyashkov <umka at sevcity dot net>, Skip Ford <skip dot ford at verizon dot net>, Antoine Brodin <antoine dot brodin at laposte dot net>
show more ...
|
#
3c960d93 |
| 24-Oct-2006 |
Poul-Henning Kamp <phk@FreeBSD.org> |
Replace slightly crummy fattime<->timespec conversion functions.
|
Revision tags: release/5.5.0_cvs, release/5.5.0, release/6.1.0_cvs, release/6.1.0 |
|
#
89b0e109 |
| 01-Feb-2006 |
Jeff Roberson <jeff@FreeBSD.org> |
- Reorder calls to vrele() after calls to vput() when the vrele is a directory. vrele() may lock the passed vnode, which in these cases would give an invalid lock order of child -> parent. Th
- Reorder calls to vrele() after calls to vput() when the vrele is a directory. vrele() may lock the passed vnode, which in these cases would give an invalid lock order of child -> parent. These situations are deadlock prone although do not typically deadlock because the vrele is typically not releasing the last reference to the vnode. Users of vrele must consider it as a call to vn_lock() and order it appropriately.
MFC After: 1 week Sponsored by: Isilon Systems, Inc. Tested by: kkenn
show more ...
|
#
9fc31f8a |
| 23-Jan-2006 |
Tom Rhodes <trhodes@FreeBSD.org> |
Update incorrect comments here, there should not be a call to panic() over fs corruption.
Discussed with: alfred, phk
|
#
710a9acc |
| 22-Jan-2006 |
Max Khon <fjoe@FreeBSD.org> |
Do not assume that `char direntry::deExtension[3]' starts right after `char direntry::deName[8]' and access deExtension[] explicitly.
Found by: Coverity Prevent(tm) CID: 350, 351, 352
|
Revision tags: release/6.0.0_cvs, release/6.0.0, release/5.4.0_cvs, release/5.4.0 |
|
#
7ce296cf |
| 27-Feb-2005 |
Poul-Henning Kamp <phk@FreeBSD.org> |
Remove debug printout of major/minor numbers, print name instead.
|