Documentation: iomap: Add missing flags description

Let's document the use of these flags in the iomap design doc, where the other flags are defined too:

- IOMAP_F_BOUNDARY was added by XFS to prevent merging of I/O and I/O completions across RTG boundaries.
- IOMAP_F_ATOMIC_BIO was added to support atomic I/O operations, letting filesystems inform iomap that it needs a HW-offload based mechanism for torn-write protection.

While we are at it, let's also fix the description of the IOMAP_F_PRIVATE flag after the recent commit 923936efeb74b3 ("iomap: Fix conflicting values of iomap flags").

Signed-off-by: "Ritesh Harjani (IBM)" <ritesh.list@gmail.com>
Link: https://lore.kernel.org/8d8534a704c4f162f347a84830710db32a927b2e.1744432270.git.ritesh.list@gmail.com
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
iomap: rework IOMAP atomic flags

Flag IOMAP_ATOMIC_SW is not really required. The idea of having this flag is that the FS ->iomap_begin callback could check if this flag is set to decide whether to do a SW (FS-based) atomic write. But the FS can already choose which ->iomap_begin callback it uses when deciding to do a FS-based atomic write.

Furthermore, it was thought that IOMAP_ATOMIC_HW is not a proper name, as the block driver can use SW methods to emulate an atomic write. So change back to IOMAP_ATOMIC.

The ->iomap_begin callback does, however, need to indicate to the iomap core that REQ_ATOMIC needs to be set, so add IOMAP_F_ATOMIC_BIO for that.

These changes were suggested by Christoph Hellwig and Dave Chinner.

Signed-off-by: John Garry <john.g.garry@oracle.com>
Link: https://lore.kernel.org/r/20250320120250.4087011-4-john.g.garry@oracle.com
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner <brauner@kernel.org>
iomap: Support SW-based atomic writes

Currently, atomic write support requires dedicated HW support. This imposes a restriction on the filesystem that disk blocks need to be aligned and contiguously mapped to FS blocks to issue atomic writes.

XFS has no method to guarantee FS block alignment for regular, non-RT files. As such, atomic writes are currently limited to 1x FS block there.

To deal with the scenario where we are issuing an atomic write over misaligned or discontiguous data blocks, and to raise the atomic write size limit, support a software-emulated (SW-based) atomic write mode. For XFS, these SW-based atomic writes would use CoW support to issue emulated untorn writes.

It is the responsibility of the FS to detect discontiguous atomic writes, switch to IOMAP_DIO_ATOMIC_SW mode and retry the write. Indeed, SW-based atomic writes could always be used when the mounted bdev does not support HW offload, but this strategy is not initially expected to be used.

Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: John Garry <john.g.garry@oracle.com>
Link: https://lore.kernel.org/r/20250303171120.2837067-6-john.g.garry@oracle.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
iomap: Rename IOMAP_ATOMIC -> IOMAP_ATOMIC_HW

In future, xfs will support a SW-based atomic write, so rename IOMAP_ATOMIC -> IOMAP_ATOMIC_HW to make clear which mode is being used.

Also relocate the setting of IOMAP_ATOMIC_HW to the write path in __iomap_dio_rw(), to make clear that this flag is only relevant to writes.

Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: John Garry <john.g.garry@oracle.com>
Link: https://lore.kernel.org/r/20250303171120.2837067-3-john.g.garry@oracle.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
Merge branch 'vfs-6.15.shared.iomap' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs

Bring in iomap changes that xfs relies on.

Signed-off-by: Christian Brauner <brauner@kernel.org>
iomap: make buffered writes work with RWF_DONTCACHE

Add iomap buffered write support for RWF_DONTCACHE. If RWF_DONTCACHE is set for a write, mark the folios being written as uncached. Writeback completion will then drop the pages. The write_iter handler simply kicks off writeback for the pages, and writeback completion will take care of the rest.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Link: https://lore.kernel.org/r/20250204184047.356762-2-axboe@kernel.dk
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner <brauner@kernel.org>
iomap: add a IOMAP_F_ANON_WRITE flag

Add an IOMAP_F_ANON_WRITE flag that indicates that the write I/O does not have a target block assigned to it yet at iomap time, and that the file system will assign one in the bio submission handler, splitting the I/O as needed.

This is used to implement Zone Append based I/O for zoned XFS, where splitting writes to the hardware limits and assigning a zone to them happens just before sending the I/O off to the block layer, but it could also be useful for other things like compressed I/O.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20250206064035.2323428-4-hch@lst.de
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
iomap: allow the file system to submit the writeback bios

Change ->prepare_ioend to ->submit_ioend and require file systems that implement it to submit the bio. This is needed for file systems that do their own work on the bios before submitting them to the block layer, like btrfs or zoned xfs. To make this easier, also pass the writeback context to the method.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20250206064035.2323428-2-hch@lst.de
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Documentation: filesystems: fix two misspells

This fixes two small misspellings in the filesystems documentation.

Signed-off-by: Bingwu Zhang <xtex@aosc.io>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Link: https://lore.kernel.org/r/20241208035447.162465-2-xtex@envs.net
Merge tag 'vfs-6.13.untorn.writes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull vfs untorn write support from Christian Brauner:

 "An atomic write is a write issued with torn-write protection. This means that, on a power failure or any hardware failure, all or none of the data from the write will be stored, never a mix of old and new data.

  This work is already supported for block devices. If a block device is opened with O_DIRECT and the block device supports atomic write, then FMODE_CAN_ATOMIC_WRITE is added to the file of the opened block device.

  This contains the work to expand atomic write support to filesystems, specifically ext4 and XFS. Currently, only support for writing exactly one filesystem block atomically is added.

  Since it's now possible to have filesystem block size > page size for XFS, it's possible to write 4K+ blocks atomically on x86"

* tag 'vfs-6.13.untorn.writes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  iomap: drop an obsolete comment in iomap_dio_bio_iter
  ext4: Do not fallback to buffered-io for DIO atomic write
  ext4: Support setting FMODE_CAN_ATOMIC_WRITE
  ext4: Check for atomic writes support in write iter
  ext4: Add statx support for atomic writes
  xfs: Support setting FMODE_CAN_ATOMIC_WRITE
  xfs: Validate atomic writes
  xfs: Support atomic write for statx
  fs: iomap: Atomic write support
  fs: Export generic_atomic_write_valid()
  block: Add bdev atomic write limits helpers
  fs/block: Check for IOCB_DIRECT in generic_atomic_write_valid()
  block/fs: Pass an iocb to generic_atomic_write_valid()
fs: iomap: Atomic write support

Support direct I/O atomic writes by producing a single bio with the REQ_ATOMIC flag set.

Initially, FSes (XFS) should only support writing a single FS block atomically.

As with any atomic write, we should produce a single bio which covers the complete write length.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: John Garry <john.g.garry@oracle.com>
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
[djwong: clarify a couple of things in the docs]
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
iomap: remove iomap_file_buffered_write_punch_delalloc

Currently, iomap_file_buffered_write_punch_delalloc can be called from XFS either with the invalidate lock held or not. To fix this while keeping the locking in the file system and not in the iomap library code, we'll need to lift the locking up into the file system.

To prepare for that, open code iomap_file_buffered_write_punch_delalloc in the only caller, and instead export iomap_write_delalloc_release.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
Merge tag 'vfs-6.12.blocksize' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs

Pull vfs blocksize updates from Christian Brauner:

 "This contains the vfs infrastructure as well as the xfs bits to enable support for block sizes (bs) larger than page sizes (ps), plus a few fixes to related infrastructure.

  There have been efforts over the last 16 years to enable Large Block Sizes (LBS), that is, block sizes in filesystems where bs > page size. Through these efforts we have learned that one of the main blockers to supporting bs > ps in filesystems has been a way to allocate pages that are at least the filesystem block size on the page cache where bs > ps.

  Thanks to various previous efforts, it is possible to support bs > ps in XFS with only a few changes in XFS itself. Most changes are to the page cache, to support a minimum folio order matching the target block size of the filesystem.

  A motivation for Large Block Sizes today is to support high-capacity (many terabytes) QLC SSDs, where the internal Indirection Units (IUs) are typically greater than 4k, to help reduce DRAM and so in turn cost and space. In practice, this allows different architectures to use a base page size of 4k while still enabling support for block sizes aligned to the larger IUs, by relying on high-order folios on the page cache when needed.

  It also allows taking advantage of the drive's support for atomics larger than 4k with buffered IO support in Linux. As described this year at LSFMM, supporting large atomics greater than 4k enables databases to remove the need to rely on their own journaling, so they can disable double buffered writes, which is a feature different cloud providers are already enabling through custom storage solutions"

* tag 'vfs-6.12.blocksize' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs: (22 commits)
  Documentation: iomap: fix a typo
  iomap: remove the iomap_file_buffered_write_punch_delalloc return value
  iomap: pass the iomap to the punch callback
  iomap: pass flags to iomap_file_buffered_write_punch_delalloc
  iomap: improve shared block detection in iomap_unshare_iter
  iomap: handle a post-direct I/O invalidate race in iomap_write_delalloc_release
  docs:filesystems: fix spelling and grammar mistakes in iomap design page
  filemap: fix htmldoc warning for mapping_align_index()
  iomap: make zero range flush conditional on unwritten mappings
  iomap: fix handling of dirty folios over unwritten extents
  iomap: add a private argument for iomap_file_buffered_write
  iomap: remove set_memor_ro() on zero page
  xfs: enable block size larger than page size support
  xfs: make the calculation generic in xfs_sb_validate_fsb_count()
  xfs: expose block size in stat
  xfs: use kvmalloc for xattr buffers
  iomap: fix iomap_dio_zero() for fs bs > system page size
  filemap: cap PTE range to be created to allowed zero fill in folio_map_range()
  mm: split a folio in minimum folio order chunks
  readahead: allocate folios with mapping_min_order in readahead
  ...
Documentation: iomap: fix a typo

Change voidw -> void.

Signed-off-by: Pankaj Raghav <p.raghav@samsung.com>
Link: https://lore.kernel.org/r/20240820161329.1293718-1-kernel@pankajraghav.com
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
docs:filesystems: fix spelling and grammar mistakes in iomap design page

Signed-off-by: Dennis Lam <dennis.lamerice@gmail.com>
Link: https://lore.kernel.org/r/20240908172841.9616-2-dennis.lamerice@gmail.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
Fix spelling and gramatical errors

Fixed 3 typos in design.rst.

Signed-off-by: Xiaxi Shen <shenxiaxi26@gmail.com>
Link: https://lore.kernel.org/r/20240807070536.14536-1-shenxiaxi26@gmail.com
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Documentation: the design of iomap and how to port

Capture the design of iomap and how to port filesystems to use it. Apologies for all the rst formatting, but it's necessary to distinguish code from regular text.

A lot of this has been collected from various email conversations, code comments, commit messages, my own understanding of iomap, and Ritesh/Luis' previous efforts to create a document. Please note a large part of this has been taken from Dave's reply to the last iomap doc patchset. Thanks to Ritesh, Luis, Dave, Darrick, Matthew, Christoph and other iomap developers who have taken time to explain the iomap design in various emails, commits, comments etc.

Cc: Dave Chinner <david@fromorbit.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Inspired-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Link: https://lore.kernel.org/r/20240614214347.GK6125@frogsfrogsfrogs
Signed-off-by: Christian Brauner <brauner@kernel.org>