#
1709ccf9 |
| 29-Mar-2014 |
Martin Matuska <mm@FreeBSD.org> |
Merge head up to r263906.
|
#
4a144410 |
| 16-Mar-2014 |
Robert Watson <rwatson@FreeBSD.org> |
Update kernel inclusions of capability.h to use capsicum.h instead; some further refinement is required as some device drivers intended to be portable over FreeBSD versions rely on __FreeBSD_version
Update kernel inclusions of capability.h to use capsicum.h instead; some further refinement is required as some device drivers intended to be portable over FreeBSD versions rely on __FreeBSD_version to decide whether to include capability.h.
MFC after: 3 weeks
show more ...
|
#
5748b897 |
| 19-Feb-2014 |
Martin Matuska <mm@FreeBSD.org> |
Merge head up to r262222 (last merge was incomplete).
|
Revision tags: release/10.0.0 |
|
#
e01ff621 |
| 09-Jan-2014 |
Glen Barber <gjb@FreeBSD.org> |
MFH: tracking commit (head@r260486)
Sponsored by: The FreeBSD Foundation
|
#
d473bac7 |
| 03-Jan-2014 |
Alexander Motin <mav@FreeBSD.org> |
Rework NFS Duplicate Request Cache cleanup logic.
- Introduce additional hash to group requests by hash of sockref. This allows to process TCP acknowledgements without looping though all the cache
Rework NFS Duplicate Request Cache cleanup logic.
- Introduce additional hash to group requests by hash of sockref. This allows to process TCP acknowledgements without looping though all the cache, and as result allows to do it every time. - Indroduce additional callbacks to notify application layer about sockets disconnection. Without this last few requests processed just before socket disconnection never processed their ACKs and stuck in cache for many hours. - Implement transport-specific method for tracking reply acknowledgements. New implementation does not cross multiple stack layers to get the data and does not have race conditions that previously made some requests stuck in cache. This could be done more efficiently at sockbuf layer, but that would broke some KBIs, while I don't know other consumers for it aside NFS. - Instead of traversing all DRC twice per request, run cleaning only once per request, and except in some conditions traverse only single hash slot at a time.
Together this limits NFS DRC growth only to situations of real connectivity problems. If network is working well, and so all replies are acknowledged, cache remains almost empty even after hours of heavy load. Without this change on the same test cache was growing to many thousand requests even with perfectly working local network.
As another result this reduces CPU time spent on the DRC handling during SPEC NFS benchmark from about 10% to 0.5%.
Sponsored by: iXsystems, Inc.
show more ...
|
#
43a213bb |
| 25-Dec-2013 |
Rick Macklem <rmacklem@FreeBSD.org> |
The NFSv4 server would call VOP_SETATTR() with a shared locked vnode when a Getattr for a file is done by a client other than the one that holds the file's delegation. This would only happen when del
The NFSv4 server would call VOP_SETATTR() with a shared locked vnode when a Getattr for a file is done by a client other than the one that holds the file's delegation. This would only happen when delegations are enabled and the problem is fixed by this patch.
MFC after: 1 week
show more ...
|
#
0c695afb |
| 24-Dec-2013 |
Rick Macklem <rmacklem@FreeBSD.org> |
An intermittent problem with NFSv4 exporting of ZFS snapshots was reported to the freebsd-fs mailing list. I believe the problem was caused by the Readdir operation using VFS_VGET() for a snapshot fi
An intermittent problem with NFSv4 exporting of ZFS snapshots was reported to the freebsd-fs mailing list. I believe the problem was caused by the Readdir operation using VFS_VGET() for a snapshot file entry instead of VOP_LOOKUP(). This would not occur for NFSv3, since it will do a VFS_VGET() of "." which fails with ENOTSUPP at the beginning of the directory, whereas NFSv4 does not check "." or "..". This patch adds a call to VFS_VGET() for the directory being read to check for ENOTSUPP. I also observed that the mount_on_fileid and fsid attributes were not correct at the snapshot's auto mountpoints when looking at packet traces for the Readdir. This patch fixes the attributes by doing a check for different v_mount structure, even if the vnode v_mountedhere is not set.
Reported by: jas@cse.yorku.ca Tested by: jas@cse.yorku.ca Reviewed by: asomers MFC after: 1 week
show more ...
|
#
0bfd163f |
| 18-Oct-2013 |
Gleb Smirnoff <glebius@FreeBSD.org> |
Merge head r233826 through r256722.
|
#
1ccca3b5 |
| 10-Oct-2013 |
Alan Somers <asomers@FreeBSD.org> |
IFC @256277
Approved by: ken (mentor)
|
Revision tags: release/9.2.0 |
|
#
ef90af83 |
| 20-Sep-2013 |
Peter Grehan <grehan@FreeBSD.org> |
IFC @ r255692
Comment out IA32_MISC_ENABLE MSR access - this doesn't exist on AMD. Need to sort out how arch-specific MSRs will be handled.
|
#
47823319 |
| 11-Sep-2013 |
Peter Grehan <grehan@FreeBSD.org> |
IFC @ r255459
|
#
0fbf163e |
| 06-Sep-2013 |
Mark Murray <markm@FreeBSD.org> |
MFC
|
#
d1d01586 |
| 05-Sep-2013 |
Simon J. Gerraty <sjg@FreeBSD.org> |
Merge from head
|
#
7008be5b |
| 05-Sep-2013 |
Pawel Jakub Dawidek <pjd@FreeBSD.org> |
Change the cap_rights_t type from uint64_t to a structure that we can extend in the future in a backward compatible (API and ABI) way.
The cap_rights_t represents capability rights. We used to use o
Change the cap_rights_t type from uint64_t to a structure that we can extend in the future in a backward compatible (API and ABI) way.
The cap_rights_t represents capability rights. We used to use one bit to represent one right, but we are running out of spare bits. Currently the new structure provides place for 114 rights (so 50 more than the previous cap_rights_t), but it is possible to grow the structure to hold at least 285 rights, although we can make it even larger if 285 rights won't be enough.
The structure definition looks like this:
struct cap_rights { uint64_t cr_rights[CAP_RIGHTS_VERSION + 2]; };
The initial CAP_RIGHTS_VERSION is 0.
The top two bits in the first element of the cr_rights[] array contain total number of elements in the array - 2. This means if those two bits are equal to 0, we have 2 array elements.
The top two bits in all remaining array elements should be 0. The next five bits in all array elements contain array index. Only one bit is used and bit position in this five-bits range defines array index. This means there can be at most five array elements in the future.
To define new right the CAPRIGHT() macro must be used. The macro takes two arguments - an array index and a bit to set, eg.
#define CAP_PDKILL CAPRIGHT(1, 0x0000000000000800ULL)
We still support aliases that combine few rights, but the rights have to belong to the same array element, eg:
#define CAP_LOOKUP CAPRIGHT(0, 0x0000000000000400ULL) #define CAP_FCHMOD CAPRIGHT(0, 0x0000000000002000ULL)
#define CAP_FCHMODAT (CAP_FCHMOD | CAP_LOOKUP)
There is new API to manage the new cap_rights_t structure:
cap_rights_t *cap_rights_init(cap_rights_t *rights, ...); void cap_rights_set(cap_rights_t *rights, ...); void cap_rights_clear(cap_rights_t *rights, ...); bool cap_rights_is_set(const cap_rights_t *rights, ...);
bool cap_rights_is_valid(const cap_rights_t *rights); void cap_rights_merge(cap_rights_t *dst, const cap_rights_t *src); void cap_rights_remove(cap_rights_t *dst, const cap_rights_t *src); bool cap_rights_contains(const cap_rights_t *big, const cap_rights_t *little);
Capability rights to the cap_rights_init(), cap_rights_set(), cap_rights_clear() and cap_rights_is_set() functions are provided by separating them with commas, eg:
cap_rights_t rights;
cap_rights_init(&rights, CAP_READ, CAP_WRITE, CAP_FSTAT);
There is no need to terminate the list of rights, as those functions are actually macros that take care of the termination, eg:
#define cap_rights_set(rights, ...) \ __cap_rights_set((rights), __VA_ARGS__, 0ULL) void __cap_rights_set(cap_rights_t *rights, ...);
Thanks to using one bit as an array index we can assert in those functions that there are no two rights belonging to different array elements provided together. For example this is illegal and will be detected, because CAP_LOOKUP belongs to element 0 and CAP_PDKILL to element 1:
cap_rights_init(&rights, CAP_LOOKUP | CAP_PDKILL);
Providing several rights that belongs to the same array's element this way is correct, but is not advised. It should only be used for aliases definition.
This commit also breaks compatibility with some existing Capsicum system calls, but I see no other way to do that. This should be fine as Capsicum is still experimental and this change is not going to 9.x.
Sponsored by: The FreeBSD Foundation
show more ...
|
#
46ed9e49 |
| 04-Sep-2013 |
Peter Grehan <grehan@FreeBSD.org> |
IFC @ r255209
|
#
93c5875b |
| 14-Aug-2013 |
Rick Macklem <rmacklem@FreeBSD.org> |
Fix several performance related issues in the new NFS server's DRC for NFS over TCP. - Increase the size of the hash tables. - Create a separate mutex for each hash list of the TCP hash table. - Sing
Fix several performance related issues in the new NFS server's DRC for NFS over TCP. - Increase the size of the hash tables. - Create a separate mutex for each hash list of the TCP hash table. - Single thread the code that deletes stale cache entries. - Add a tunable called vfs.nfsd.tcphighwater, which can be increased to allow the cache to grow larger, avoiding the overhead of frequent scans to delete stale cache entries. (The default value will result in frequent scans to delete stale cache entries, analagous to what the pre-patched code does.) - Add a tunable called vfs.nfsd.cachetcp that can be used to disable DRC caching for NFS over TCP, since the old NFS server didn't DRC cache TCP. It also adjusts the size of nfsrc_floodlevel dynamically, so that it is always greater than vfs.nfsd.tcphighwater.
For UDP the algorithm remains the same as the pre-patched code, but the tunable vfs.nfsd.udphighwater can be used to allow the cache to grow larger and reduce the overhead caused by frequent scans for stale entries. UDP also uses a larger hash table size than the pre-patched code.
Reported by: wollman Tested by: wollman (earlier version of patch) Submitted by: ivoras (earlier patch) Reviewed by: jhb (earlier version of patch) MFC after: 1 month
show more ...
|
#
40f65a4d |
| 07-Aug-2013 |
Peter Grehan <grehan@FreeBSD.org> |
IFC @ r254014
|
#
552311f4 |
| 17-Jul-2013 |
Xin LI <delphij@FreeBSD.org> |
IFC @253398
|
#
cfe30d02 |
| 19-Jun-2013 |
Gleb Smirnoff <glebius@FreeBSD.org> |
Merge fresh head.
|
Revision tags: release/8.4.0 |
|
#
22a72260 |
| 31-May-2013 |
Jeff Roberson <jeff@FreeBSD.org> |
- Convert the bufobj lock to rwlock. - Use a shared bufobj lock in getblk() and inmem(). - Convert softdep's lk to rwlock to match the bufobj lock. - Move INFREECNT to b_flags and protect it with
- Convert the bufobj lock to rwlock. - Use a shared bufobj lock in getblk() and inmem(). - Convert softdep's lk to rwlock to match the bufobj lock. - Move INFREECNT to b_flags and protect it with the buf lock. - Remove unnecessary locking around bremfree() and BKGRDINPROG.
Sponsored by: EMC / Isilon Storage Division Discussed with: mckusick, kib, mdf
show more ...
|
#
72ccd4cc |
| 15-May-2013 |
Dag-Erling Smørgrav <des@FreeBSD.org> |
Fix typo in comment.
Submitted by: Alex Weber <alexwebr@gmail.com> MFC after: 1 week
|
#
c93c82f4 |
| 29-Apr-2013 |
Dag-Erling Smørgrav <des@FreeBSD.org> |
Fix a bug that allows NFS clients to issue READDIR on files.
PR: kern/178016 Security: CVE-2013-3266 Security: FreeBSD-SA-13:05.nfsserver
|
#
d96b98a3 |
| 17-Apr-2013 |
Kenneth D. Merry <ken@FreeBSD.org> |
Revamp the old NFS server's File Handle Affinity (FHA) code so that it will work with either the old or new server.
The FHA code keeps a cache of currently active file handles for NFSv2 and v3 reque
Revamp the old NFS server's File Handle Affinity (FHA) code so that it will work with either the old or new server.
The FHA code keeps a cache of currently active file handles for NFSv2 and v3 requests, so that read and write requests for the same file are directed to the same group of threads (reads) or thread (writes). It does not currently work for NFSv4 requests. They are more complex, and will take more work to support.
This improves read-ahead performance, especially with ZFS, if the FHA tuning parameters are configured appropriately. Without the FHA code, concurrent reads that are part of a sequential read from a file will be directed to separate NFS threads. This has the effect of confusing the ZFS zfetch (prefetch) code and makes sequential reads significantly slower with clients like Linux that do a lot of prefetching.
The FHA code has also been updated to direct write requests to nearby file offsets to the same thread in the same way it batches reads, and the FHA code will now also send writes to multiple threads when needed.
This improves sequential write performance in ZFS, because writes to a file are now more ordered. Since NFS writes (generally less than 64K) are smaller than the typical ZFS record size (usually 128K), out of order NFS writes to the same block can trigger a read in ZFS. Sending them down the same thread increases the odds of their being in order.
In order for multiple write threads per file in the FHA code to be useful, writes in the NFS server have been changed to use a LK_SHARED vnode lock, and upgrade that to LK_EXCLUSIVE if the filesystem doesn't allow multiple writers to a file at once. ZFS is currently the only filesystem that allows multiple writers to a file, because it has internal file range locking. This change does not affect the NFSv4 code.
This improves random write performance to a single file in ZFS, since we can now have multiple writers inside ZFS at one time.
I have changed the default tuning parameters to a 22 bit (4MB) window size (from 256K) and unlimited commands per thread as a result of my benchmarking with ZFS.
The FHA code has been updated to allow configuring the tuning parameters from loader tunable variables in addition to sysctl variables. The read offset window calculation has been slightly modified as well. Instead of having separate bins, each file handle has a rolling window of bin_shift size. This minimizes glitches in throughput when shifting from one bin to another.
sys/conf/files: Add nfs_fha_new.c and nfs_fha_old.c. Compile nfs_fha.c when either the old or the new NFS server is built.
sys/fs/nfs/nfsport.h, sys/fs/nfs/nfs_commonport.c: Bring in changes from Rick Macklem to newnfs_realign that allow it to operate in blocking (M_WAITOK) or non-blocking (M_NOWAIT) mode.
sys/fs/nfs/nfs_commonsubs.c, sys/fs/nfs/nfs_var.h: Bring in a change from Rick Macklem to allow telling nfsm_dissect() whether or not to wait for mallocs.
sys/fs/nfs/nfsm_subs.h: Bring in changes from Rick Macklem to create a new nfsm_dissect_nonblock() inline function and NFSM_DISSECT_NONBLOCK() macro.
sys/fs/nfs/nfs_commonkrpc.c, sys/fs/nfsclient/nfs_clkrpc.c: Add the malloc wait flag to a newnfs_realign() call.
sys/fs/nfsserver/nfs_nfsdkrpc.c: Setup the new NFS server's RPC thread pool so that it will call the FHA code.
Add the malloc flag argument to newnfs_realign().
Unstaticize newnfs_nfsv3_procid[] so that we can use it in the FHA code.
sys/fs/nfsserver/nfs_nfsdsocket.c: In nfsrvd_dorpc(), add NFSPROC_WRITE to the list of RPC types that use the LK_SHARED lock type.
sys/fs/nfsserver/nfs_nfsdport.c: In nfsd_fhtovp(), if we're starting a write, check to see whether the underlying filesystem supports shared writes. If not, upgrade the lock type from LK_SHARED to LK_EXCLUSIVE.
sys/nfsserver/nfs_fha.c: Remove all code that is specific to the NFS server implementation. Anything that is server-specific is now accessed through a callback supplied by that server's FHA shim in the new softc.
There are now separate sysctls and tunables for the FHA implementations for the old and new NFS servers. The new NFS server has its tunables under vfs.nfsd.fha, the old NFS server's tunables are under vfs.nfsrv.fha as before.
In fha_extract_info(), use callouts for all server-specific code. Getting file handles and offsets is now done in the individual server's shim module.
In fha_hash_entry_choose_thread(), change the way we decide whether two reads are in proximity to each other. Previously, the calculation was a simple shift operation to see whether the offsets were in the same power of 2 bucket. The issue was that there would be a bucket (and therefore thread) transition, even if the reads were in close proximity. When there is a thread transition, reads wind up going somewhat out of order, and ZFS gets confused.
The new calculation simply tries to see whether the offsets are within 1 << bin_shift of each other. If they are, the reads will be sent to the same thread.
The effect of this change is that for sequential reads, if the client doesn't exceed the max_reqs_per_nfsd parameter and the bin_shift is set to a reasonable value (22, or 4MB works well in my tests), the reads in any sequential stream will largely be confined to a single thread.
Change fha_assign() so that it takes a softc argument. It is now called from the individual server's shim code, which will pass in the softc.
Change fhe_stats_sysctl() so that it takes a softc parameter. It is now called from the individual server's shim code. Add the current offset to the list of things printed out about each active thread.
Change the num_reads and num_writes counters in the fha_hash_entry structure to 32-bit values, and rename them num_rw and num_exclusive, respectively, to reflect their changed usage.
Add an enable sysctl and tunable that allows the user to disable the FHA code (when vfs.XXX.fha.enable = 0). This is useful for before/after performance comparisons.
nfs_fha.h: Move most structure definitions out of nfs_fha.c and into the header file, so that the individual server shims can see them.
Change the default bin_shift to 22 (4MB) instead of 18 (256K). Allow unlimited commands per thread.
sys/nfsserver/nfs_fha_old.c, sys/nfsserver/nfs_fha_old.h, sys/fs/nfsserver/nfs_fha_new.c, sys/fs/nfsserver/nfs_fha_new.h: Add shims for the old and new NFS servers to interface with the FHA code, and callbacks for the
The shims contain all of the code and definitions that are specific to the NFS servers.
They setup the server-specific callbacks and set the server name for the sysctl and loader tunable variables.
sys/nfsserver/nfs_srvkrpc.c: Configure the RPC code to call fhaold_assign() instead of fha_assign().
sys/modules/nfsd/Makefile: Add nfs_fha.c and nfs_fha_new.c.
sys/modules/nfsserver/Makefile: Add nfs_fha_old.c.
Reviewed by: rmacklem Sponsored by: Spectra Logic MFC after: 2 weeks
show more ...
|
#
69e6d7b7 |
| 12-Apr-2013 |
Simon J. Gerraty <sjg@FreeBSD.org> |
sync from head
|
#
a03fbc7e |
| 09-Mar-2013 |
Martin Matuska <mm@FreeBSD.org> |
MFC @248093
|