History log of /linux/fs/xfs/xfs_exchrange.c (Results 1 – 25 of 28)
Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
Revision tags: v6.12-rc2, v6.12-rc1
# 52c996d3 27-Sep-2024 Arnaldo Carvalho de Melo <acme@redhat.com>

Merge remote-tracking branch 'torvalds/master' into perf-tools

To pick up changes in other trees that may affect perf, such as libbpf
and in general the header files that perf has copies of, so that

Merge remote-tracking branch 'torvalds/master' into perf-tools

To pick up changes in other trees that may affect perf, such as libbpf
and in general the header files that perf has copies of, so that we can
do the sync with the kernel sources.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

show more ...


# c8d430db 06-Oct-2024 Paolo Bonzini <pbonzini@redhat.com>

Merge tag 'kvmarm-fixes-6.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/arm64 fixes for 6.12, take #1

- Fix pKVM error path on init, making sure we do not chang

Merge tag 'kvmarm-fixes-6.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/arm64 fixes for 6.12, take #1

- Fix pKVM error path on init, making sure we do not change critical
system registers as we're about to fail

- Make sure that the host's vector length is at capped by a value
common to all CPUs

- Fix kvm_has_feat*() handling of "negative" features, as the current
code is pretty broken

- Promote Joey to the status of official reviewer, while James steps
down -- hopefully only temporarly

show more ...


# 0c436dfe 02-Oct-2024 Takashi Iwai <tiwai@suse.de>

Merge tag 'asoc-fix-v6.12-rc1' of https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus

ASoC: Fixes for v6.12

A bunch of fixes here that came in during the merge window and t

Merge tag 'asoc-fix-v6.12-rc1' of https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus

ASoC: Fixes for v6.12

A bunch of fixes here that came in during the merge window and the first
week of release, plus some new quirks and device IDs. There's nothing
major here, it's a bit bigger than it might've been due to there being
no fixes sent during the merge window due to your vacation.

show more ...


# 2cd86f02 01-Oct-2024 Maarten Lankhorst <maarten.lankhorst@linux.intel.com>

Merge remote-tracking branch 'drm/drm-fixes' into drm-misc-fixes

Required for a panthor fix that broke when
FOP_UNSIGNED_OFFSET was added in place of FMODE_UNSIGNED_OFFSET.

Signed-off-by: Maarten L

Merge remote-tracking branch 'drm/drm-fixes' into drm-misc-fixes

Required for a panthor fix that broke when
FOP_UNSIGNED_OFFSET was added in place of FMODE_UNSIGNED_OFFSET.

Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>

show more ...


# 3a39d672 27-Sep-2024 Paolo Abeni <pabeni@redhat.com>

Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Cross-merge networking fixes after downstream PR.

No conflicts and no adjacent changes.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>


# 36ec807b 20-Sep-2024 Dmitry Torokhov <dmitry.torokhov@gmail.com>

Merge branch 'next' into for-linus

Prepare input updates for 6.12 merge window.


Revision tags: v6.11, v6.11-rc7, v6.11-rc6, v6.11-rc5, v6.11-rc4
# 50470d38 13-Aug-2024 Andrii Nakryiko <andrii@kernel.org>

Merge remote-tracking branch 'vfs/stable-struct_fd'

Merge Al Viro's struct fd refactorings.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>


Revision tags: v6.11-rc3, v6.11-rc2, v6.11-rc1
# 3daee2e4 16-Jul-2024 Dmitry Torokhov <dmitry.torokhov@gmail.com>

Merge tag 'v6.10' into next

Sync up with mainline to bring in device_for_each_child_node_scoped()
and other newer APIs.


# f8ffbc36 23-Sep-2024 Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'pull-stable-struct_fd' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull 'struct fd' updates from Al Viro:
"Just the 'struct fd' layout change, with conversion to accessor

Merge tag 'pull-stable-struct_fd' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull 'struct fd' updates from Al Viro:
"Just the 'struct fd' layout change, with conversion to accessor
helpers"

* tag 'pull-stable-struct_fd' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
add struct fd constructors, get rid of __to_fd()
struct fd: representation change
introduce fd_file(), convert all accessors to it.

show more ...


Revision tags: v6.10, v6.10-rc7, v6.10-rc6, v6.10-rc5, v6.10-rc4, v6.10-rc3, v6.10-rc2
# 1da91ea8 31-May-2024 Al Viro <viro@zeniv.linux.org.uk>

introduce fd_file(), convert all accessors to it.

For any changes of struct fd representation we need to
turn existing accesses to fields into calls of wrappers.
Accesses to struct fd::flags are ve

introduce fd_file(), convert all accessors to it.

For any changes of struct fd representation we need to
turn existing accesses to fields into calls of wrappers.
Accesses to struct fd::flags are very few (3 in linux/file.h,
1 in net/socket.c, 3 in fs/overlayfs/file.c and 3 more in
explicit initializers).
Those can be dealt with in the commit converting to
new layout; accesses to struct fd::file are too many for that.
This commit converts (almost) all of f.file to
fd_file(f). It's not entirely mechanical ('file' is used as
a member name more than just in struct fd) and it does not
even attempt to distinguish the uses in pointer context from
those in boolean context; the latter will be eventually turned
into a separate helper (fd_empty()).

NOTE: mass conversion to fd_empty(), tempting as it
might be, is a bad idea; better do that piecewise in commit
that convert from fdget...() to CLASS(...).

[conflicts in fs/fhandle.c, kernel/bpf/syscall.c, mm/memcontrol.c
caught by git; fs/stat.c one got caught by git grep]
[fs/xattr.c conflict]

Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

show more ...


# 8751b21a 19-Sep-2024 Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'xfs-6.12-merge-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

Pull xfs updates from Chandan Babu:
"New code:

- Introduce new ioctls to exchange contents of two files.

The

Merge tag 'xfs-6.12-merge-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

Pull xfs updates from Chandan Babu:
"New code:

- Introduce new ioctls to exchange contents of two files.

The first ioctl does the preparation work to exchange the contents
of two files while the second ioctl performs the actual exchange if
the target file has not been changed since a given sampling point.

Fixes:

- Fix bugs associated with calculating the maximum range of realtime
extents to scan for free space.

- Copy keys instead of records when resizing the incore BMBT root
block.

- Do not report FITRIMming more bytes than possibly exist in the
filesystem.

- Modify xfs_fs.h to prevent C++ compilation errors.

- Do not over eagerly free post-EOF speculative preallocation.

- Ensure st_blocks never goes to zero during COW writes

Cleanups/refactors:

- Use Xarray to hold per-AG data instead of a Radix tree.

- Cleanups to:
- realtime bitmap
- inode allocator
- quota
- inode rooted btree code"

* tag 'xfs-6.12-merge-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (61 commits)
xfs: ensure st_blocks never goes to zero during COW writes
xfs: use xas_for_each_marked in xfs_reclaim_inodes_count
xfs: convert perag lookup to xarray
xfs: simplify tagged perag iteration
xfs: move the tagged perag lookup helpers to xfs_icache.c
xfs: use kfree_rcu_mightsleep to free the perag structures
xfs: use LIST_HEAD() to simplify code
xfs: Remove duplicate xfs_trans_priv.h header
xfs: remove unnecessary check
xfs: Use xfs set and clear mp state helpers
xfs: reclaim speculative preallocations for append only files
xfs: simplify extent lookup in xfs_can_free_eofblocks
xfs: check XFS_EOFBLOCKS_RELEASED earlier in xfs_release_eofblocks
xfs: only free posteof blocks on first close
xfs: don't free post-EOF blocks on read close
xfs: skip all of xfs_file_release when shut down
xfs: don't bother returning errors from xfs_file_release
xfs: refactor f_op->release handling
xfs: remove the i_mode check in xfs_release
xfs: standardize the btree maxrecs function parameters
...

show more ...


# 41c38bf0 03-Sep-2024 Chandan Babu R <chandanbabu@kernel.org>

Merge tag 'atomic-file-commits-6.12_2024-09-02' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-6.12-mergeA

xfs: atomic file content commits [v31.1 1/8]

This series cre

Merge tag 'atomic-file-commits-6.12_2024-09-02' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-6.12-mergeA

xfs: atomic file content commits [v31.1 1/8]

This series creates XFS_IOC_START_COMMIT and XFS_IOC_COMMIT_RANGE ioctls
to perform the exchange only if the target file has not been changed
since a given sampling point.

This new functionality uses the mechanism underlying EXCHANGE_RANGE to
stage and commit file updates such that reader programs will see either
the old contents or the new contents in their entirety, with no chance
of torn writes. A successful call completion guarantees that the new
contents will be seen even if the system fails. The pair of ioctls
allows userspace to perform what amounts to a compare and exchange
operation on entire file contents.

Note that there are ongoing arguments in the community about how best to
implement some sort of file data write counter that nfsd could also use
to signal invalidations to clients. Until such a thing is implemented,
this patch will rely on ctime/mtime updates.

Here are the proposed manual pages:

IOCTL-XFS-COMMIT-RANGE(2) System Calls ManualIOCTL-XFS-COMMIT-RANGE(2)

NAME
ioctl_xfs_start_commit - prepare to exchange the contents of
two files ioctl_xfs_commit_range - conditionally exchange the
contents of parts of two files

SYNOPSIS
#include <sys/ioctl.h>
#include <xfs/xfs_fs.h>

int ioctl(int file2_fd, XFS_IOC_START_COMMIT, struct xfs_com‐
mit_range *arg);

int ioctl(int file2_fd, XFS_IOC_COMMIT_RANGE, struct xfs_com‐
mit_range *arg);

DESCRIPTION
Given a range of bytes in a first file file1_fd and a second
range of bytes in a second file file2_fd, this ioctl(2) ex‐
changes the contents of the two ranges if file2_fd passes cer‐
tain freshness criteria.

Before exchanging the contents, the program must call the
XFS_IOC_START_COMMIT ioctl to sample freshness data for
file2_fd. If the sampled metadata does not match the file
metadata at commit time, XFS_IOC_COMMIT_RANGE will return
EBUSY.

Exchanges are atomic with regards to concurrent file opera‐
tions. Implementations must guarantee that readers see either
the old contents or the new contents in their entirety, even if
the system fails.

The system call parameters are conveyed in structures of the
following form:

struct xfs_commit_range {
__s32 file1_fd;
__u32 pad;
__u64 file1_offset;
__u64 file2_offset;
__u64 length;
__u64 flags;
__u64 file2_freshness[5];
};

The field pad must be zero.

The fields file1_fd, file1_offset, and length define the first
range of bytes to be exchanged.

The fields file2_fd, file2_offset, and length define the second
range of bytes to be exchanged.

The field file2_freshness is an opaque field whose contents are
determined by the kernel. These file attributes are used to
confirm that file2_fd has not changed by another thread since
the current thread began staging its own update.

Both files must be from the same filesystem mount. If the two
file descriptors represent the same file, the byte ranges must
not overlap. Most disk-based filesystems require that the
starts of both ranges must be aligned to the file block size.
If this is the case, the ends of the ranges must also be so
aligned unless the XFS_EXCHANGE_RANGE_TO_EOF flag is set.

The field flags control the behavior of the exchange operation.

XFS_EXCHANGE_RANGE_TO_EOF
Ignore the length parameter. All bytes in file1_fd
from file1_offset to EOF are moved to file2_fd, and
file2's size is set to (file2_offset+(file1_length-
file1_offset)). Meanwhile, all bytes in file2 from
file2_offset to EOF are moved to file1 and file1's
size is set to (file1_offset+(file2_length-
file2_offset)).

XFS_EXCHANGE_RANGE_DSYNC
Ensure that all modified in-core data in both file
ranges and all metadata updates pertaining to the
exchange operation are flushed to persistent storage
before the call returns. Opening either file de‐
scriptor with O_SYNC or O_DSYNC will have the same
effect.

XFS_EXCHANGE_RANGE_FILE1_WRITTEN
Only exchange sub-ranges of file1_fd that are known
to contain data written by application software.
Each sub-range may be expanded (both upwards and
downwards) to align with the file allocation unit.
For files on the data device, this is one filesystem
block. For files on the realtime device, this is
the realtime extent size. This facility can be used
to implement fast atomic scatter-gather writes of
any complexity for software-defined storage targets
if all writes are aligned to the file allocation
unit.

XFS_EXCHANGE_RANGE_DRY_RUN
Check the parameters and the feasibility of the op‐
eration, but do not change anything.

RETURN VALUE
On error, -1 is returned, and errno is set to indicate the er‐
ror.

ERRORS
Error codes can be one of, but are not limited to, the follow‐
ing:

EBADF file1_fd is not open for reading and writing or is open
for append-only writes; or file2_fd is not open for
reading and writing or is open for append-only writes.

EBUSY The file2 inode number and timestamps supplied do not
match file2_fd.

EINVAL The parameters are not correct for these files. This
error can also appear if either file descriptor repre‐
sents a device, FIFO, or socket. Disk filesystems gen‐
erally require the offset and length arguments to be
aligned to the fundamental block sizes of both files.

EIO An I/O error occurred.

EISDIR One of the files is a directory.

ENOMEM The kernel was unable to allocate sufficient memory to
perform the operation.

ENOSPC There is not enough free space in the filesystem ex‐
change the contents safely.

EOPNOTSUPP
The filesystem does not support exchanging bytes between
the two files.

EPERM file1_fd or file2_fd are immutable.

ETXTBSY
One of the files is a swap file.

EUCLEAN
The filesystem is corrupt.

EXDEV file1_fd and file2_fd are not on the same mounted
filesystem.

CONFORMING TO
This API is XFS-specific.

USE CASES
Several use cases are imagined for this system call. Coordina‐
tion between multiple threads is performed by the kernel.

The first is a filesystem defragmenter, which copies the con‐
tents of a file into another file and wishes to exchange the
space mappings of the two files, provided that the original
file has not changed.

An example program might look like this:

int fd = open("/some/file", O_RDWR);
int temp_fd = open("/some", O_TMPFILE | O_RDWR);
struct stat sb;
struct xfs_commit_range args = {
.flags = XFS_EXCHANGE_RANGE_TO_EOF,
};

/* gather file2's freshness information */
ioctl(fd, XFS_IOC_START_COMMIT, &args);
fstat(fd, &sb);

/* make a fresh copy of the file with terrible alignment to avoid reflink */
clone_file_range(fd, NULL, temp_fd, NULL, 1, 0);
clone_file_range(fd, NULL, temp_fd, NULL, sb.st_size - 1, 0);

/* commit the entire update */
args.file1_fd = temp_fd;
ret = ioctl(fd, XFS_IOC_COMMIT_RANGE, &args);
if (ret && errno == EBUSY)
printf("file changed while defrag was underway
");

The second is a data storage program that wants to commit non-
contiguous updates to a file atomically. This program cannot
coordinate updates to the file and therefore relies on the ker‐
nel to reject the COMMIT_RANGE command if the file has been up‐
dated by someone else. This can be done by creating a tempo‐
rary file, calling FICLONE(2) to share the contents, and stag‐
ing the updates into the temporary file. The FULL_FILES flag
is recommended for this purpose. The temporary file can be
deleted or punched out afterwards.

An example program might look like this:

int fd = open("/some/file", O_RDWR);
int temp_fd = open("/some", O_TMPFILE | O_RDWR);
struct xfs_commit_range args = {
.flags = XFS_EXCHANGE_RANGE_TO_EOF,
};

/* gather file2's freshness information */
ioctl(fd, XFS_IOC_START_COMMIT, &args);

ioctl(temp_fd, FICLONE, fd);

/* append 1MB of records */
lseek(temp_fd, 0, SEEK_END);
write(temp_fd, data1, 1000000);

/* update record index */
pwrite(temp_fd, data1, 600, 98765);
pwrite(temp_fd, data2, 320, 54321);
pwrite(temp_fd, data2, 15, 0);

/* commit the entire update */
args.file1_fd = temp_fd;
ret = ioctl(fd, XFS_IOC_COMMIT_RANGE, &args);
if (ret && errno == EBUSY)
printf("file changed before commit; will roll back
");

NOTES
Some filesystems may limit the amount of data or the number of
extents that can be exchanged in a single call.

SEE ALSO
ioctl(2)

XFS 2024-02-18 IOCTL-XFS-COMMIT-RANGE(2)

With a bit of luck, this should all go splendidly.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>

* tag 'atomic-file-commits-6.12_2024-09-02' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux:
xfs: introduce new file range commit ioctls

show more ...


# 398597c3 31-Aug-2024 Darrick J. Wong <djwong@kernel.org>

xfs: introduce new file range commit ioctls

This patch introduces two more new ioctls to manage atomic updates to
file contents -- XFS_IOC_START_COMMIT and XFS_IOC_COMMIT_RANGE. The
commit mechanis

xfs: introduce new file range commit ioctls

This patch introduces two more new ioctls to manage atomic updates to
file contents -- XFS_IOC_START_COMMIT and XFS_IOC_COMMIT_RANGE. The
commit mechanism here is exactly the same as what XFS_IOC_EXCHANGE_RANGE
does, but with the additional requirement that file2 cannot have changed
since some sampling point. The start-commit ioctl performs the sampling
of file attributes.

Note: This patch currently samples i_ctime during START_COMMIT and
checks that it hasn't changed during COMMIT_RANGE. This isn't entirely
safe in kernels prior to 6.12 because ctime only had coarse grained
granularity and very fast updates could collide with a COMMIT_RANGE.
With the multi-granularity ctime introduced by Jeff Layton, it's now
possible to update ctime such that this does not happen.

It is critical, then, that this patch must not be backported to any
kernel that does not support fine-grained file change timestamps.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Acked-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

show more ...


# afeea275 04-Jul-2024 Maxime Ripard <mripard@kernel.org>

Merge drm-misc-next-2024-07-04 into drm-misc-next-fixes

Let's start the drm-misc-next-fixes cycle.

Signed-off-by: Maxime Ripard <mripard@kernel.org>


# d754ed28 19-Jun-2024 Jani Nikula <jani.nikula@intel.com>

Merge drm/drm-next into drm-intel-next

Sync to v6.10-rc3.

Signed-off-by: Jani Nikula <jani.nikula@intel.com>


# 89aa02ed 12-Jun-2024 Rodrigo Vivi <rodrigo.vivi@intel.com>

Merge drm/drm-next into drm-xe-next

Needed to get tracing cleanup and add mmio tracing series.

Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>


# 92815da4 12-Jun-2024 Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

Merge remote-tracking branch 'drm-misc/drm-misc-next' into HEAD

Merge drm-misc-next tree into the msm-next tree in order to be able to
use HDMI connector framework for the MSM HDMI driver.


# 375c4d15 27-May-2024 Maxime Ripard <mripard@kernel.org>

Merge drm/drm-next into drm-misc-next

Let's start the new release cycle.

Signed-off-by: Maxime Ripard <mripard@kernel.org>


# 0c8ea05e 04-Jul-2024 Peter Zijlstra <peterz@infradead.org>

Merge branch 'tip/x86/cpu'

The Lunarlake patches rely on the new VFM stuff.

Signed-off-by: Peter Zijlstra <peterz@infradead.org>


# 594ce0b8 10-Jun-2024 Russell King (Oracle) <rmk+kernel@armlinux.org.uk>

Merge topic branches 'clkdev' and 'fixes' into for-linus


# f73a058b 28-May-2024 Maarten Lankhorst <maarten.lankhorst@linux.intel.com>

Merge remote-tracking branch 'drm/drm-fixes' into drm-misc-fixes

v6.10-rc1 is released, forward from v6.9

Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>


Revision tags: v6.10-rc1
# 119d1b8a 20-May-2024 Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'xfs-6.10-merge-6' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

Pull xfs updates from Chandan Babu:
"Online repair feature continues to be expanded. Also, we now support
delayed all

Merge tag 'xfs-6.10-merge-6' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

Pull xfs updates from Chandan Babu:
"Online repair feature continues to be expanded. Also, we now support
delayed allocation for realtime devices which have an extent size that
is equal to filesystem's block size.

New code:

- Introduce Parent Pointer extended attribute for inodes

- Bring back delalloc support for realtime devices which have an
extent size that is equal to filesystem's block size

- Improve performance of log incompat feature handling

Online Repair:

- Implement atomic file content exchanges i.e. exchange ranges of
bytes between two files atomically

- Create temporary files to repair file-based metadata. This uses
atomic file content exchange facility to swap file fork mappings
between the temporary file and the metadata inode

- Allow callers of directory/xattr code to set an explicit owner
number to be written into the header fields of any new blocks that
are created. This is required to avoid walking every block of the
new structure and modify their ownership during online repair

- Repair more data structures:
- Extended attributes
- Inode unlinked state
- Directories
- Symbolic links
- AGI's unlinked inode list
- Parent pointers

- Move Orphan files to lost and found directory

- Fixes for Inode repair functionality

- Introduce a new sub-AG FITRIM implementation to reduce the duration
for which the AGF lock is held

- Updates for the design documentation

- Use Parent Pointers to assist in checking directories, parent
pointers, extended attributes, and link counts

Fixes:

- Prevent userspace from reading invalid file data due to incorrect.
updation of file size when performing a non-atomic clone operation

- Minor fixes to online repair

- Fix confusing return values from xfs_bmapi_write()

- Fix an out of bounds access due to incorrect h_size during log
recovery

- Defer upgrading the extent counters in xfs_reflink_end_cow_extent()
until we know we are going to modify the extent mapping

- Remove racy access to if_bytes check in
xfs_reflink_end_cow_extent()

- Fix sparse warnings

Cleanups:

- Hold inode locks on all files involved in a rename until the
completion of the operation. This is in preparation for the parent
pointers patchset where parent pointers are applied in a separate
chained update from the actual directory update

- Compile out v4 support when disabled

- Cleanup xfs_extent_busy_clear()

- Remove unused flags and fields from struct xfs_da_args

- Remove definitions of unused functions

- Improve extended attribute validation

- Add higher level directory operations helpers to remove duplication
of code

- Cleanup quota (un)reservation interfaces"

* tag 'xfs-6.10-merge-6' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (221 commits)
xfs: simplify iext overflow checking and upgrade
xfs: remove a racy if_bytes check in xfs_reflink_end_cow_extent
xfs: upgrade the extent counters in xfs_reflink_end_cow_extent later
xfs: xfs_quota_unreserve_blkres can't fail
xfs: consolidate the xfs_quota_reserve_blkres definitions
xfs: clean up buffer allocation in xlog_do_recovery_pass
xfs: fix log recovery buffer allocation for the legacy h_size fixup
xfs: widen flags argument to the xfs_iflags_* helpers
xfs: minor cleanups of xfs_attr3_rmt_blocks
xfs: create a helper to compute the blockcount of a max sized remote value
xfs: turn XFS_ATTR3_RMT_BUF_SPACE into a function
xfs: use unsigned ints for non-negative quantities in xfs_attr_remote.c
xfs: do not allocate the entire delalloc extent in xfs_bmapi_write
xfs: fix xfs_bmap_add_extent_delay_real for partial conversions
xfs: remove the xfs_iext_peek_prev_extent call in xfs_bmapi_allocate
xfs: pass the actual offset and len to allocate to xfs_bmapi_allocate
xfs: don't open code XFS_FILBLKS_MIN in xfs_bmapi_write
xfs: lift a xfs_valid_startblock into xfs_bmapi_allocate
xfs: remove the unusued tmp_logflags variable in xfs_bmapi_allocate
xfs: fix error returns from xfs_bmapi_write
...

show more ...


Revision tags: v6.9, v6.9-rc7, v6.9-rc6, v6.9-rc5
# 22d5a8e5 16-Apr-2024 Chandan Babu R <chandanbabu@kernel.org>

Merge tag 'atomic-file-updates-6.10_2024-04-15' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-6.10-mergeA

xfs: atomic file content exchanges

This series creates a new

Merge tag 'atomic-file-updates-6.10_2024-04-15' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-6.10-mergeA

xfs: atomic file content exchanges

This series creates a new XFS_IOC_EXCHANGE_RANGE ioctl to exchange
ranges of bytes between two files atomically.

This new functionality enables data storage programs to stage and commit
file updates such that reader programs will see either the old contents
or the new contents in their entirety, with no chance of torn writes. A
successful call completion guarantees that the new contents will be seen
even if the system fails.

The ability to exchange file fork mappings between files in this manner
is critical to supporting online filesystem repair, which is built upon
the strategy of constructing a clean copy of a damaged structure and
committing the new structure into the metadata file atomically. The
ioctls exist to facilitate testing of the new functionality and to
enable future application program designs.

User programs will be able to update files atomically by opening an
O_TMPFILE, reflinking the source file to it, making whatever updates
they want to make, and exchange the relevant ranges of the temp file
with the original file. If the updates are aligned with the file block
size, a new (since v2) flag provides for exchanging only the written
areas. Note that application software must quiesce writes to the file
while it stages an atomic update. This will be addressed by a
subsequent series.

This mechanism solves the clunkiness of two existing atomic file update
mechanisms: for O_TRUNC + rewrite, this eliminates the brief period
where other programs can see an empty file. For create tempfile +
rename, the need to copy file attributes and extended attributes for
each file update is eliminated.

However, this method introduces its own awkwardness -- any program
initiating an exchange now needs to have a way to signal to other
programs that the file contents have changed. For file access mediated
via read and write, fanotify or inotify are probably sufficient. For
mmaped files, that may not be fast enough.

Here is the proposed manual page:

IOCTL-XFS-EXCHANGE-RANGE(2System Calls ManuIOCTL-XFS-EXCHANGE-RANGE(2)

NAME
ioctl_xfs_exchange_range - exchange the contents of parts of
two files

SYNOPSIS
#include <sys/ioctl.h>
#include <xfs/xfs_fs.h>

int ioctl(int file2_fd, XFS_IOC_EXCHANGE_RANGE, struct xfs_ex‐
change_range *arg);

DESCRIPTION
Given a range of bytes in a first file file1_fd and a second
range of bytes in a second file file2_fd, this ioctl(2) ex‐
changes the contents of the two ranges.

Exchanges are atomic with regards to concurrent file opera‐
tions. Implementations must guarantee that readers see either
the old contents or the new contents in their entirety, even if
the system fails.

The system call parameters are conveyed in structures of the
following form:

struct xfs_exchange_range {
__s32 file1_fd;
__u32 pad;
__u64 file1_offset;
__u64 file2_offset;
__u64 length;
__u64 flags;
};

The field pad must be zero.

The fields file1_fd, file1_offset, and length define the first
range of bytes to be exchanged.

The fields file2_fd, file2_offset, and length define the second
range of bytes to be exchanged.

Both files must be from the same filesystem mount. If the two
file descriptors represent the same file, the byte ranges must
not overlap. Most disk-based filesystems require that the
starts of both ranges must be aligned to the file block size.
If this is the case, the ends of the ranges must also be so
aligned unless the XFS_EXCHANGE_RANGE_TO_EOF flag is set.

The field flags control the behavior of the exchange operation.

XFS_EXCHANGE_RANGE_TO_EOF
Ignore the length parameter. All bytes in file1_fd
from file1_offset to EOF are moved to file2_fd, and
file2's size is set to (file2_offset+(file1_length-
file1_offset)). Meanwhile, all bytes in file2 from
file2_offset to EOF are moved to file1 and file1's
size is set to (file1_offset+(file2_length-
file2_offset)).

XFS_EXCHANGE_RANGE_DSYNC
Ensure that all modified in-core data in both file
ranges and all metadata updates pertaining to the
exchange operation are flushed to persistent storage
before the call returns. Opening either file de‐
scriptor with O_SYNC or O_DSYNC will have the same
effect.

XFS_EXCHANGE_RANGE_FILE1_WRITTEN
Only exchange sub-ranges of file1_fd that are known
to contain data written by application software.
Each sub-range may be expanded (both upwards and
downwards) to align with the file allocation unit.
For files on the data device, this is one filesystem
block. For files on the realtime device, this is
the realtime extent size. This facility can be used
to implement fast atomic scatter-gather writes of
any complexity for software-defined storage targets
if all writes are aligned to the file allocation
unit.

XFS_EXCHANGE_RANGE_DRY_RUN
Check the parameters and the feasibility of the op‐
eration, but do not change anything.

RETURN VALUE
On error, -1 is returned, and errno is set to indicate the er‐
ror.

ERRORS
Error codes can be one of, but are not limited to, the follow‐
ing:

EBADF file1_fd is not open for reading and writing or is open
for append-only writes; or file2_fd is not open for
reading and writing or is open for append-only writes.

EINVAL The parameters are not correct for these files. This
error can also appear if either file descriptor repre‐
sents a device, FIFO, or socket. Disk filesystems gen‐
erally require the offset and length arguments to be
aligned to the fundamental block sizes of both files.

EIO An I/O error occurred.

EISDIR One of the files is a directory.

ENOMEM The kernel was unable to allocate sufficient memory to
perform the operation.

ENOSPC There is not enough free space in the filesystem ex‐
change the contents safely.

EOPNOTSUPP
The filesystem does not support exchanging bytes between
the two files.

EPERM file1_fd or file2_fd are immutable.

ETXTBSY
One of the files is a swap file.

EUCLEAN
The filesystem is corrupt.

EXDEV file1_fd and file2_fd are not on the same mounted
filesystem.

CONFORMING TO
This API is XFS-specific.

USE CASES
Several use cases are imagined for this system call. In all
cases, application software must coordinate updates to the file
because the exchange is performed unconditionally.

The first is a data storage program that wants to commit non-
contiguous updates to a file atomically and coordinates write
access to that file. This can be done by creating a temporary
file, calling FICLONE(2) to share the contents, and staging the
updates into the temporary file. The FULL_FILES flag is recom‐
mended for this purpose. The temporary file can be deleted or
punched out afterwards.

An example program might look like this:

int fd = open("/some/file", O_RDWR);
int temp_fd = open("/some", O_TMPFILE | O_RDWR);

ioctl(temp_fd, FICLONE, fd);

/* append 1MB of records */
lseek(temp_fd, 0, SEEK_END);
write(temp_fd, data1, 1000000);

/* update record index */
pwrite(temp_fd, data1, 600, 98765);
pwrite(temp_fd, data2, 320, 54321);
pwrite(temp_fd, data2, 15, 0);

/* commit the entire update */
struct xfs_exchange_range args = {
.file1_fd = temp_fd,
.flags = XFS_EXCHANGE_RANGE_TO_EOF,
};

ioctl(fd, XFS_IOC_EXCHANGE_RANGE, &args);

The second is a software-defined storage host (e.g. a disk
jukebox) which implements an atomic scatter-gather write com‐
mand. Provided the exported disk's logical block size matches
the file's allocation unit size, this can be done by creating a
temporary file and writing the data at the appropriate offsets.
It is recommended that the temporary file be truncated to the
size of the regular file before any writes are staged to the
temporary file to avoid issues with zeroing during EOF exten‐
sion. Use this call with the FILE1_WRITTEN flag to exchange
only the file allocation units involved in the emulated de‐
vice's write command. The temporary file should be truncated
or punched out completely before being reused to stage another
write.

An example program might look like this:

int fd = open("/some/file", O_RDWR);
int temp_fd = open("/some", O_TMPFILE | O_RDWR);
struct stat sb;
int blksz;

fstat(fd, &sb);
blksz = sb.st_blksize;

/* land scatter gather writes between 100fsb and 500fsb */
pwrite(temp_fd, data1, blksz * 2, blksz * 100);
pwrite(temp_fd, data2, blksz * 20, blksz * 480);
pwrite(temp_fd, data3, blksz * 7, blksz * 257);

/* commit the entire update */
struct xfs_exchange_range args = {
.file1_fd = temp_fd,
.file1_offset = blksz * 100,
.file2_offset = blksz * 100,
.length = blksz * 400,
.flags = XFS_EXCHANGE_RANGE_FILE1_WRITTEN |
XFS_EXCHANGE_RANGE_FILE1_DSYNC,
};

ioctl(fd, XFS_IOC_EXCHANGE_RANGE, &args);

NOTES
Some filesystems may limit the amount of data or the number of
extents that can be exchanged in a single call.

SEE ALSO
ioctl(2)

XFS 2024-02-10 IOCTL-XFS-EXCHANGE-RANGE(2)

The reference implementation in XFS creates a new log incompat feature
and log intent items to track high level progress of swapping ranges of
two files and finish interrupted work if the system goes down. Sample
code can be found in the corresponding changes to xfs_io to exercise the
use case mentioned above.

Note that this function is /not/ the O_DIRECT atomic untorn file writes
concept that has also been floating around for years. It is also not
the RWF_ATOMIC patchset that has been shared. This RFC is constructed
entirely in software, which means that there are no limitations other
than the general filesystem limits.

As a side note, the original motivation behind the kernel functionality
is online repair of file-based metadata. The atomic file content
exchange is implemented as an atomic exchange of file fork mappings,
which means that we can implement online reconstruction of extended
attributes and directories by building a new one in another inode and
exchanging the contents.

Subsequent patchsets adapt the online filesystem repair code to use
atomic file exchanges. This enables repair functions to construct a
clean copy of a directory, xattr information, symbolic links, realtime
bitmaps, and realtime summary information in a temporary inode. If this
completes successfully, the new contents can be committed atomically
into the inode being repaired. This is essential to avoid making
corruption problems worse if the system goes down in the middle of
running repair.

For userspace, this series also includes the userspace pieces needed to
test the new functionality, and a sample implementation of atomic file
updates.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>

* tag 'atomic-file-updates-6.10_2024-04-15' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux:
xfs: enable logged file mapping exchange feature
docs: update swapext -> exchmaps language
xfs: capture inode generation numbers in the ondisk exchmaps log item
xfs: support non-power-of-two rtextsize with exchange-range
xfs: make file range exchange support realtime files
xfs: condense symbolic links after a mapping exchange operation
xfs: condense directories after a mapping exchange operation
xfs: condense extended attributes after a mapping exchange operation
xfs: add error injection to test file mapping exchange recovery
xfs: bind together the front and back ends of the file range exchange code
xfs: create deferred log items for file mapping exchanges
xfs: introduce a file mapping exchange log intent item
xfs: create a incompat flag for atomic file mapping exchanges
xfs: introduce new file range exchange ioctl
vfs: export remap and write check helpers

show more ...


# b3e60f84 15-Apr-2024 Darrick J. Wong <djwong@kernel.org>

xfs: support non-power-of-two rtextsize with exchange-range

The generic exchange-range alignment checks use (fast) bitmasking
operations to perform block alignment checks on the exchange parameters.

xfs: support non-power-of-two rtextsize with exchange-range

The generic exchange-range alignment checks use (fast) bitmasking
operations to perform block alignment checks on the exchange parameters.
Unfortunately, bitmasks require that the alignment size be a power of
two. This isn't true for realtime devices with a non-power-of-two
extent size, so we have to copy-pasta the generic checks using long
division for this to work properly.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

show more ...


# e6294110 15-Apr-2024 Darrick J. Wong <djwong@kernel.org>

xfs: make file range exchange support realtime files

Now that bmap items support the realtime device, we can add the
necessary pieces to the file range exchange code to support exchanging
mappings.

xfs: make file range exchange support realtime files

Now that bmap items support the realtime device, we can add the
necessary pieces to the file range exchange code to support exchanging
mappings. All we really need to do here is adjust the blockcount
upwards to the end of the rt extent and remove the inode checks.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

show more ...


12