# 43a7548e | 12-Mar-2024 | Linus Torvalds <torvalds@linux-foundation.org>
Merge tag 'for-6.9-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs updates from David Sterba: "Mostly stabilization, refactoring and cleanup changes. The rest are minor performance optimizations due to caching or lock contention reduction and a few notable fixes.
Performance improvements:
- minor speedup in logging when a repeatedly allocated structure is preallocated only once, improves latency and decreases lock contention
- minor throughput increase (+6%), reduced lock contention after clearing delayed allocation bits, applies to several common workload types
- skip full quota rescan if a new relation is added in the same transaction
Fixes:
- zstd fix for inline compressed file in subpage mode, updated version of the fix from the 6.8 cycle
- proper qgroup inheritance ioctl parameter validation
- more fiemap followup fixes after reduced locking done in 6.8:
  - fix race when detecting delalloc ranges
Core changes:
- more debugging code:
  - added assertions for a very rare crash in raid56 calculation
  - tree-checker dumps page state to give more insights into possible reference counting issues
- add checksum calculation offloading sysfs knob, for now enabled under DEBUG only to determine a good heuristic for deciding between offloading and synchronous calculation; this depends on various factors (block group profile, device speed) and is not as clear as initially thought (checksum type)
- error handling improvements, added assertions
- more page to folio conversion (defrag, truncate), cached size and shift
- preparation for more fine grained locking of sectors in subpage mode
- cleanups and refactoring:
  - include cleanups, forward declarations
  - pointer-to-structure helpers
  - redundant argument removals
  - removed unused code
  - slab cache updates, last use of SLAB_MEM_SPREAD removed"
* tag 'for-6.9-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (114 commits)
  btrfs: reuse cloned extent buffer during fiemap to avoid re-allocations
  btrfs: fix race when detecting delalloc ranges during fiemap
  btrfs: fix off-by-one chunk length calculation at contains_pending_extent()
  btrfs: qgroup: allow quick inherit if snapshot is created and added to the same parent
  btrfs: qgroup: validate btrfs_qgroup_inherit parameter
  btrfs: include device major and minor numbers in the device scan notice
  btrfs: mark btrfs_put_caching_control() static
  btrfs: remove SLAB_MEM_SPREAD flag use
  btrfs: qgroup: always free reserved space for extent records
  btrfs: tree-checker: dump the page status if hit something wrong
  btrfs: compression: remove dead comments in btrfs_compress_heuristic()
  btrfs: subpage: make writer lock utilize bitmap
  btrfs: subpage: make reader lock utilize bitmap
  btrfs: unexport btrfs_subpage_start_writer() and btrfs_subpage_end_and_test_writer()
  btrfs: pass a valid extent map cache pointer to __get_extent_map()
  btrfs: merge btrfs_del_delalloc_inode() helpers
  btrfs: pass btrfs_device to btrfs_scratch_superblocks()
  btrfs: handle transaction commit errors in flush_reservations()
  btrfs: use KMEM_CACHE() to create btrfs_free_space cache
  btrfs: use KMEM_CACHE() to create delayed ref caches
  ...
# 25da852d | 22-Feb-2024 | Qu Wenruo <wqu@suse.com>
btrfs: compression: remove dead comments in btrfs_compress_heuristic()
Since commit a440d48c7f93 ("Btrfs: heuristic: implement sampling logic"), btrfs_compress_heuristic() is no longer a simple "return true", but more complex to determine if we should compress.
Thus the comment is dead and can be confusing, just remove it.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
# 41044b41 | 14-Sep-2023 | David Sterba <dsterba@suse.com>
btrfs: add helper to get fs_info from struct inode pointer
Add a convenience helper to get a fs_info from a VFS inode pointer instead of open coding the chain or using btrfs_sb() that in some cases does one more pointer hop. This is implemented as a macro (still with type checking) so we don't need full definitions of struct btrfs_inode, btrfs_root or btrfs_fs_info.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
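A minimal user-space sketch of a type-checked accessor macro, assuming heavily simplified mock types; the struct layouts, the offsetof()-based BTRFS_I() and the macro body are illustrative, not copied from the kernel. The type check comes from a _Generic selection with a single association, so only a struct inode pointer is accepted.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical mock types; the real kernel structures are far larger. */
struct btrfs_fs_info { const char *label; };
struct btrfs_root { struct btrfs_fs_info *fs_info; };
struct inode { unsigned long i_ino; };
struct btrfs_inode {
	struct btrfs_root *root;
	struct inode vfs_inode;	/* embedded VFS inode */
};

/* container_of-style conversion from the embedded VFS inode. */
#define BTRFS_I(_ino) \
	((struct btrfs_inode *)((char *)(_ino) - offsetof(struct btrfs_inode, vfs_inode)))

/*
 * _Generic has exactly one association and no default, so any argument
 * that is not a struct inode * is a compile-time error.
 */
#define inode_to_fs_info(_inode) \
	(BTRFS_I(_Generic((_inode), struct inode *: (_inode)))->root->fs_info)

/* Example caller: no open-coded BTRFS_I(inode)->root->fs_info chain. */
static const char *fs_label(struct inode *inode)
{
	struct btrfs_fs_info *fs_info = inode_to_fs_info(inode);

	assert(fs_info != NULL);
	return fs_info->label;
}
```

With these mocks, inode_to_fs_info(&bi.vfs_inode) yields the fs_info reached through the root, while inode_to_fs_info(&bi) is rejected at compile time because struct btrfs_inode * has no matching association.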
# b33d2e53 | 14-Sep-2023 | David Sterba <dsterba@suse.com>
btrfs: add helpers to get fs_info from page/folio pointers
Add convenience helpers to get a fs_info from a page or folio pointer instead of open coding the chain or using btrfs_sb() that in some cases does one more pointer hop. This is implemented as a macro (still with type checking) so we don't need full definitions of struct page, folio, btrfs_root and btrfs_fs_info. The helpers can't be static inlines as this would create an include loop between ctree.h <-> fs.h, or the headers would have to be restructured.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
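The page and folio variants follow the same pattern with one more hop through the mapping. A sketch under stated assumptions: all types are hypothetical mocks and the macro bodies are illustrative. Each macro type-checks its argument with _Generic and then walks page (or folio) -> mapping -> host inode -> root -> fs_info.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical mock types standing in for the kernel structures. */
struct btrfs_fs_info { int id; };
struct btrfs_root { struct btrfs_fs_info *fs_info; };
struct inode { unsigned long i_ino; };
struct btrfs_inode { struct btrfs_root *root; struct inode vfs_inode; };
struct address_space { struct inode *host; };
struct page { struct address_space *mapping; };
struct folio { struct address_space *mapping; };

#define BTRFS_I(_ino) \
	((struct btrfs_inode *)((char *)(_ino) - offsetof(struct btrfs_inode, vfs_inode)))

/* Each macro accepts exactly one pointer type; anything else fails to compile. */
#define page_to_inode(_page) \
	(BTRFS_I(_Generic((_page), struct page *: (_page))->mapping->host))
#define folio_to_inode(_folio) \
	(BTRFS_I(_Generic((_folio), struct folio *: (_folio))->mapping->host))

/* The fs_info helpers build on the inode helpers. */
#define page_to_fs_info(_page)   (page_to_inode(_page)->root->fs_info)
#define folio_to_fs_info(_folio) (folio_to_inode(_folio)->root->fs_info)
```

Keeping these as macros means a header using them only needs the pointer types to be declared at the point of use, which is the header-dependency reason the commit gives.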
# 2b712e3b | 25-Jan-2024 | David Sterba <dsterba@suse.com>
btrfs: remove unused included headers
With the help of neovim, LSP and clangd we can identify header files that are not actually needed to be included in the .c files. This is focused only on removal (with minor fixups); further cleanups are possible but will require doing the header files properly with forward declarations, minimized includes and include-what-you-use care.
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
# 03c11eb3 | 14-Feb-2024 | Ingo Molnar <mingo@kernel.org>
Merge tag 'v6.8-rc4' into x86/percpu, to resolve conflicts and refresh the branch
Conflicts: arch/x86/include/asm/percpu.h arch/x86/include/asm/text-patching.h
Signed-off-by: Ingo Molnar <mingo@kernel.org>
# 42ac0be1 | 26-Jan-2024 | Ingo Molnar <mingo@kernel.org>
Merge branch 'linus' into x86/mm, to refresh the branch and pick up fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
# 06f609b3 | 25-Jan-2024 | Jakub Kicinski <kuba@kernel.org>
Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR.
No conflicts or adjacent changes.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
# 2f910859 | 26-Feb-2024 | Maxime Ripard <mripard@kernel.org>
Merge drm/drm-fixes into drm-misc-fixes
Sima needs a more recent release to apply a patch.
Signed-off-by: Maxime Ripard <mripard@kernel.org>
# 349bd87f | 02-Feb-2024 | Andrew Morton <akpm@linux-foundation.org>
Merge branch 'master' into mm-hotfixes-stable
# d4ea2bd1 | 01-Feb-2024 | Takashi Iwai <tiwai@suse.de>
Merge tag 'asoc-fix-v6.8-rc2-2' of https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus
ASoC: Fixes for v6.8
This pull request adds Richard Fitzgerald's series with extensive fixes for the CS35L56, he said:
These patches fix various things that were undocumented, unknown or uncertain when the original driver code was written. And also a few things that were just bugs.
# e81fdba0 | 01-Feb-2024 | Mark Brown <broonie@kernel.org>
ALSA: Various fixes for Cirrus Logic CS35L56 support
Merge series from Richard Fitzgerald <rf@opensource.cirrus.com>:
These patches fix various things that were undocumented, unknown or uncertain when the original driver code was written. And also a few things that were just bugs.
# fe33c0fb | 17-Jan-2024 | Andrew Morton <akpm@linux-foundation.org>
Merge branch 'master' into mm-hotfixes-stable
# cf79f291 | 22-Jan-2024 | Maxime Ripard <mripard@kernel.org>
Merge v6.8-rc1 into drm-misc-fixes
Let's kickstart the 6.8 fix cycle.
Signed-off-by: Maxime Ripard <mripard@kernel.org>
# 5d9248ee | 22-Jan-2024 | Linus Torvalds <torvalds@linux-foundation.org>
Merge tag 'for-6.8-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
- zoned mode fixes:
  - fix slowdown when writing large file sequentially by looking up block groups with enough space faster
  - locking fixes when activating a zone
- new mount API fixes:
  - preserve mount options for a ro/rw mount of the same subvolume
- scrub fixes:
  - fix use-after-free in case the chunk length is not aligned to 64K, this does not happen normally but has been reported on images converted from ext4
  - similar alignment check was missing with raid-stripe-tree
- subvolume deletion fixes:
  - prevent calling ioctl on already deleted subvolume
  - properly track the flag marking a deleted subvolume
- in subpage mode, fix decompression of an inline extent (zlib, lzo, zstd)
- fix crash when starting writeback on a folio, after integration with recent MM changes this needs to be started conditionally
- reject unknown flags in defrag ioctl
- error handling, API fixes, minor warning fixes
* tag 'for-6.8-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: scrub: limit RST scrub to chunk boundary
  btrfs: scrub: avoid use-after-free when chunk length is not 64K aligned
  btrfs: don't unconditionally call folio_start_writeback in subpage
  btrfs: use the original mount's mount options for the legacy reconfigure
  btrfs: don't warn if discard range is not aligned to sector
  btrfs: tree-checker: fix inline ref size in error messages
  btrfs: zstd: fix and simplify the inline extent decompression
  btrfs: lzo: fix and simplify the inline extent decompression
  btrfs: zlib: fix and simplify the inline extent decompression
  btrfs: defrag: reject unknown flags of btrfs_ioctl_defrag_range_args
  btrfs: avoid copying BTRFS_ROOT_SUBVOL_DEAD flag to snapshot of subvolume being deleted
  btrfs: don't abort filesystem when attempting to snapshot deleted subvolume
  btrfs: zoned: fix lock ordering in btrfs_zone_activate()
  btrfs: fix unbalanced unlock of mapping_tree_lock
  btrfs: ref-verify: free ref cache before clearing mount opt
  btrfs: fix kvcalloc() arguments order in btrfs_ioctl_send()
  btrfs: zoned: optimize hint byte for zoned allocator
  btrfs: zoned: factor out prepare_allocation_zoned()
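The scrub fix listed above concerns chunks whose length is not a multiple of 64K. A hypothetical sketch of the boundary clamp involved, assuming a 64K scrub stripe unit; the helper name is illustrative and not the kernel's. The last stripe of an unaligned chunk has to be shortened so scrub never touches data past the chunk end.

```c
#include <assert.h>
#include <stdint.h>

#define STRIPE_LEN (64u * 1024)	/* assumed 64K scrub stripe unit */

/*
 * Hypothetical helper: length of the scrub stripe starting at 'logical',
 * clamped so it never runs past the end of the chunk.
 */
static uint64_t stripe_len_in_chunk(uint64_t chunk_start, uint64_t chunk_len,
				    uint64_t logical)
{
	uint64_t chunk_end = chunk_start + chunk_len;
	uint64_t remaining;

	assert(logical >= chunk_start && logical < chunk_end);
	remaining = chunk_end - logical;
	return remaining < STRIPE_LEN ? remaining : STRIPE_LEN;
}
```

For a chunk of 128K + 4K, the first two stripes are a full 64K and the last one is only 4K.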
# 2c25716d | 08-Jan-2024 | Qu Wenruo <wqu@suse.com>
btrfs: zlib: fix and simplify the inline extent decompression
[BUG]
If we have a filesystem with 4k sectorsize, and an inlined compressed extent created like this:
    item 4 key (257 INODE_ITEM 0) itemoff 15863 itemsize 160
        generation 8 transid 8 size 4096 nbytes 4096
        block group 0 mode 100600 links 1 uid 0 gid 0 rdev 0
        sequence 1 flags 0x0(none)
    item 5 key (257 INODE_REF 256) itemoff 15839 itemsize 24
        index 2 namelen 14 name: source_inlined
    item 6 key (257 EXTENT_DATA 0) itemoff 15770 itemsize 69
        generation 8 type 0 (inline)
        inline extent data size 48 ram_bytes 4096 compression 1 (zlib)
Which has an inline compressed extent at file offset 0, and its decompressed size is 4K, allowing us to reflink that 4K range to another location (which will not be compressed).
If we do such reflink on a subpage system, it would fail like this:
    # xfs_io -f -c "reflink $mnt/source_inlined 0 60k 4k" $mnt/dest
    XFS_IOC_CLONE_RANGE: Input/output error
[CAUSE]
In zlib_decompress(), we didn't treat @start_byte as just a page offset, but also used it as an indicator of whether we should switch our output buffer.
In reality, for subpage cases, although @start_byte can be non-zero, we should never switch input/output buffer, since the whole input/output buffer should never exceed one sector.
Note: the above assumption would only be broken if we were to support multi-page sectorsize.
Thus the current code using @start_byte as a condition to switch input/output buffer or finish the decompression is completely incorrect.
[FIX]
The fix involves several modifications:
- Rename @start_byte to @dest_pgoff to properly express its meaning
- Add an extra ASSERT() inside btrfs_decompress() to make sure the input/output size never exceeds one sector.
- Use Z_FINISH flag to make sure the decompression happens in one go
- Remove the loop needed to switch input/output buffers
- Use correct destination offset inside the destination page
- Consider early end as an error
After the fix, even on 64K page sized aarch64, above reflink now works as expected:
    # xfs_io -f -c "reflink $mnt/source_inlined 0 60k 4k" $mnt/dest
    linked 4096/4096 bytes at offset 61440
And it results in a correct file layout:
    item 9 key (258 INODE_ITEM 0) itemoff 15542 itemsize 160
        generation 10 transid 10 size 65536 nbytes 4096
        block group 0 mode 100600 links 1 uid 0 gid 0 rdev 0
        sequence 1 flags 0x0(none)
    item 10 key (258 INODE_REF 256) itemoff 15528 itemsize 14
        index 3 namelen 4 name: dest
    item 11 key (258 XATTR_ITEM 3817753667) itemoff 15445 itemsize 83
        location key (0 UNKNOWN.0 0) type XATTR
        transid 10 data_len 37 name_len 16
        name: security.selinux
        data unconfined_u:object_r:unlabeled_t:s0
    item 12 key (258 EXTENT_DATA 61440) itemoff 15392 itemsize 53
        generation 10 type 1 (regular)
        extent data disk byte 13631488 nr 4096
        extent data offset 0 nr 4096 ram 4096
        extent compression 0 (none)
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
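The control flow described under [FIX] can be sketched in user space. Assumptions: a 4K sectorsize and a 64K page size as in the aarch64 example, and a toy run-length decoder (rle_decode_all(), hypothetical) standing in for zlib so the sketch needs no external library. Only the structure is meant to carry over: decompress in a single call, treat dest_pgoff purely as an offset inside the destination page, and report a short result as -EIO.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SECTORSIZE 4096u	/* assumed 4K sectorsize */
#define PAGESIZE   65536u	/* assumed 64K page, as on the aarch64 example */

/*
 * Hypothetical stand-in for zlib: decodes (count, byte) pairs into out,
 * producing at most outlen bytes, and returns how many bytes it produced.
 * It is NOT zlib; it only models "decompress everything in one call".
 */
static size_t rle_decode_all(const uint8_t *src, size_t srclen,
			     uint8_t *out, size_t outlen)
{
	size_t produced = 0;

	for (size_t i = 0; i + 1 < srclen; i += 2)
		for (unsigned int n = 0; n < src[i] && produced < outlen; n++)
			out[produced++] = src[i + 1];
	return produced;
}

/* dest_pgoff only positions the output inside the page; no buffer switching. */
static int inline_decompress(const uint8_t *src, size_t srclen, uint8_t *page,
			     size_t dest_pgoff, size_t destlen)
{
	uint8_t buf[SECTORSIZE];
	size_t produced;

	/* Mirrors the new ASSERT(): input/output never exceed one sector. */
	assert(srclen <= SECTORSIZE && destlen <= SECTORSIZE);
	assert(dest_pgoff + destlen <= PAGESIZE);

	produced = rle_decode_all(src, srclen, buf, destlen);
	if (produced != destlen)
		return -EIO;	/* an early end is treated as an error */

	memcpy(page + dest_pgoff, buf, destlen);
	return 0;
}
```

Decoding {4, 'a'} into a 4-byte range at offset 61440 (60K) succeeds, while asking for 8 bytes from the same input returns -EIO instead of silently leaving part of the range unfilled.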
# ab1c2470 | 19-Dec-2023 | Arnaldo Carvalho de Melo <acme@redhat.com>
Merge remote-tracking branch 'torvalds/master' into perf-tools-next
To pick up fixes that went thru perf-tools for v6.7 and to get in sync with upstream to check for drift in the copies of headers, etc.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
# 3bf3e21c | 15-Nov-2023 | Maxime Ripard <mripard@kernel.org>
Merge drm/drm-next into drm-misc-next
Let's kickstart the v6.8 release cycle.
Signed-off-by: Maxime Ripard <mripard@kernel.org>
# affc5af3 | 10-Jan-2024 | Linus Torvalds <torvalds@linux-foundation.org>
Merge tag 'for-6.8-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs updates from David Sterba: "There are no exciting changes for users, it's been mostly API conversions and some fixes or refactoring.
The mount API conversion is a base for future improvements that would come with VFS. Metadata processing has been converted to folios, not yet enabling the large folios but it's one patch away once everything gets tested enough.
Core changes:
- convert extent buffers to folios:
  - direct API conversion where possible
  - performance can drop by a few percent on metadata heavy workloads, the folio sizes are not constant and the calculations add up in the item helpers
  - both regular and subpage modes
  - data cannot be converted yet, we need to port that to iomap and there are some other generic changes required
- convert mount to the new API, should not be user visible:
  - options deprecated a long time ago have been removed: inode_cache, recovery
  - the new logic that splits mount into two phases slightly changes the timing of device scanning for multi-device filesystems
  - LSM options will now work (like for selinux)
- convert delayed nodes radix tree to xarray, preserving the preload-like logic that still allows to allocate with GFP_NOFS
- more validation of sysfs value of scrub_speed_max
- refactor chunk map structure, reduce size and improve performance
- extent map refactoring, smaller data structures, improved performance
- reduce size of struct extent_io_tree, embedded in several structures
- temporary pages used for compression are cached and attached to a shrinker, this may slightly improve performance
- in zoned mode, remove redirty extent buffer tracking; zeros are written in case an out-of-order write is detected and proper data are written to the actual write pointer
- cleanups, refactoring, error message improvements, updated tests
- verify and update branch name or tag
- remove unwanted text"
* tag 'for-6.8-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (89 commits)
  btrfs: pass btrfs_io_geometry into btrfs_max_io_len
  btrfs: pass struct btrfs_io_geometry to set_io_stripe
  btrfs: open code set_io_stripe for RAID56
  btrfs: change block mapping to switch/case in btrfs_map_block
  btrfs: factor out block mapping for single profiles
  btrfs: factor out block mapping for RAID5/6
  btrfs: reduce scope of data_stripes in btrfs_map_block
  btrfs: factor out block mapping for RAID10
  btrfs: factor out block mapping for DUP profiles
  btrfs: factor out RAID1 block mapping
  btrfs: factor out block-mapping for RAID0
  btrfs: re-introduce struct btrfs_io_geometry
  btrfs: factor out helper for single device IO check
  btrfs: migrate btrfs_repair_io_failure() to folio interfaces
  btrfs: migrate eb_bitmap_offset() to folio interfaces
  btrfs: migrate various end io functions to folios
  btrfs: migrate subpage code to folio interfaces
  btrfs: migrate get_eb_page_index() and get_eb_offset_in_page() to folios
  btrfs: don't double put our subpage reference in alloc_extent_buffer
  btrfs: cleanup metadata page pointer usage
  ...
# a700ca5e | 12-Dec-2023 | Qu Wenruo <wqu@suse.com>
btrfs: migrate various end io functions to folios
If we still use the old page-based iterator functions, like bio_for_each_segment_all(), we can hit middle pages of a folio (compound page).
In that case if we set any page flag on those middle pages, we can easily trigger VM_BUG_ON(), as for compound page flags, they should follow their flag policies (normally only set on leading or tail pages).
To avoid such problems in the future full folio migration, we do the following:
- Change from bio_for_each_segment_all() to bio_for_each_folio_all()
  This completely removes the ability to access the middle page.
- Add extra ASSERT()s for data read/write paths
  To ensure we only get single paged folio for data now.
- Rename those end io functions to follow a certain schema
  * end_bbio_compressed_read()
  * end_bbio_compressed_write()
These two endio functions don't set any page flags, as they use pages not mapped to any address space. They can be very good candidates for higher order folio testing.
And they are shared between compression and encoded IO.
  * end_bbio_data_read()
  * end_bbio_data_write()
  * end_bbio_meta_read()
  * end_bbio_meta_write()
The old function names are not unified:
  - end_bio_extent_writepage()
  - end_bio_extent_readpage()
  - extent_buffer_write_end_io()
  - extent_buffer_read_end_io()
They share no schema on where the "end_*io" string should be, and using just "extent_buffer" and "extent" to distinguish data and metadata paths can be confusing.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
# 55151ea9 | 12-Dec-2023 | Qu Wenruo <wqu@suse.com>
btrfs: migrate subpage code to folio interfaces
Although subpage itself conflicts with higher order folios, since subpage (sectorsize < PAGE_SIZE and nodesize < PAGE_SIZE) means we will never need higher order folios, there is a hidden pitfall:
- btrfs_page_*() helpers
Those helpers are an abstraction to handle both subpage and non-subpage cases, which means we're going to pass page pointers to those helpers.
And since those helpers are shared between data and metadata paths, it's unavoidable to let them handle folios (including higher order folios).
Meanwhile, for the true subpage case, we should only have single-page backed folios anyway, thus add a new ASSERT() to btrfs_subpage_assert() to ensure that.
Also since those helpers are shared between both data and metadata, add some extra ASSERT()s for data path to make sure we only get single page backed folio for now.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
# 09e6cef1 | 29-Nov-2023 | Qu Wenruo <wqu@suse.com>
btrfs: refactor alloc_extent_buffer() to allocate-then-attach method
Currently alloc_extent_buffer() utilizes find_or_create_page() to allocate one page at a time for an extent buffer.
This method has the following disadvantages:
- find_or_create_page() is the legacy way of allocating new pages
  With the new folio infrastructure, find_or_create_page() is just redirected to filemap_get_folio().
- Lacks the way to support higher order (order >= 1) folios
  As we can not yet let filemap give us a higher order folio.
This patch changes the workflow in the following way:
Old                                | new
-----------------------------------+-------------------------------------
                                   | ret = btrfs_alloc_page_array();
for (i = 0; i < num_pages; i++) {  | for (i = 0; i < num_pages; i++) {
    p = find_or_create_page();     |     ret = filemap_add_folio();
    /* Attach page private */      |     /* Reuse page cache if needed */
    /* Reused eb if needed */      |
                                   |     /* Attach page private and
                                   |        reuse eb if needed */
                                   | }
This splits page allocation and private attaching into two parts, making future updates to each part easier and easing the migration to folio interfaces (especially for possible higher order folios).
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
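A small user-space sketch of the allocate-then-attach split, assuming a mock page type and hypothetical helper names (alloc_page_array(), attach_private()). It leaves out filemap_add_folio() and page cache reuse entirely and only shows the ordering: every allocation finishes, or is unwound, before any private pointer is attached.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical mock: a "page" that may carry a private pointer. */
struct mock_page { void *priv; };

/* Phase 1: allocate every page up front; on failure undo and bail out. */
static int alloc_page_array(size_t nr, struct mock_page **pages)
{
	for (size_t i = 0; i < nr; i++) {
		pages[i] = calloc(1, sizeof(*pages[i]));
		if (!pages[i]) {
			while (i--)
				free(pages[i]);
			return -ENOMEM;
		}
	}
	return 0;
}

/* Phase 2: only runs once all allocations succeeded. */
static void attach_private(size_t nr, struct mock_page **pages, void *eb)
{
	for (size_t i = 0; i < nr; i++) {
		assert(pages[i]->priv == NULL);
		pages[i]->priv = eb;
	}
}
```

Because the error path of phase 1 never sees an attached page, it can simply free what it allocated, which is the kind of separation that makes each part easier to change independently.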
# f86f7a75 | 04-Dec-2023 | Filipe Manana <fdmanana@suse.com>
btrfs: use the flags of an extent map to identify the compression type
Currently, in struct extent_map, we use an unsigned int (32 bits) to identify the compression type of an extent and an unsigned long (64 bits on a 64 bits platform, 32 bits otherwise) for flags. We are only using 6 different flags, so an unsigned long is excessive and we can use flags to identify the compression type instead of using a dedicated 32 bits field.
We can easily have tens or hundreds of thousands (or more) of extent maps on busy and large filesystems, especially with compression enabled or with many or large files with tons of small extents. So it's convenient to have the extent_map structure as small as possible in order to use less memory.
So remove the compression type field from struct extent_map, use flags to identify the compression type and shorten the flags field from an unsigned long to a u32. This saves 8 bytes (on 64 bits platforms) and reduces the size of the structure from 136 bytes down to 128 bytes, using now only two cache lines, and increases the number of extent maps we can have per 4K page from 30 to 32. By using a u32 for the flags instead of an unsigned long, we no longer use test_bit(), set_bit() and clear_bit(), but that level of atomicity is not needed as most flags are never cleared once set (before adding an extent map to the tree), and the ones that can be cleared or set after an extent map is added to the tree, are always performed while holding the write lock on the extent map tree, while the reader holds a lock on the tree or tests for a flag that never changes once the extent map is in the tree (such as compression flags).
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
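A sketch of encoding the compression type in a u32 flags word, assuming hypothetical flag names and bit positions (one bit per compression type next to an unrelated PINNED bit); the kernel's actual names and layout may differ. The enum follows the on-disk numbering (0 none, 1 zlib, 2 lzo, 3 zstd). The size figures also check out: 4096 / 136 = 30 and 4096 / 128 = 32 extent maps per 4K page.

```c
#include <assert.h>
#include <stdint.h>

enum compress_type { COMPRESS_NONE = 0, COMPRESS_ZLIB, COMPRESS_LZO, COMPRESS_ZSTD };

/* Hypothetical flag layout: one bit per compression type in a u32. */
#define EM_FLAG_PINNED        (1u << 0)
#define EM_FLAG_COMPRESS_ZLIB (1u << 1)
#define EM_FLAG_COMPRESS_LZO  (1u << 2)
#define EM_FLAG_COMPRESS_ZSTD (1u << 3)
#define EM_FLAG_COMPRESS_MASK \
	(EM_FLAG_COMPRESS_ZLIB | EM_FLAG_COMPRESS_LZO | EM_FLAG_COMPRESS_ZSTD)

struct mock_extent_map { uint32_t flags; };

/* Plain, non-atomic updates: the commit relies on the tree lock instead. */
static void em_set_compression(struct mock_extent_map *em, enum compress_type type)
{
	em->flags &= ~EM_FLAG_COMPRESS_MASK;
	if (type == COMPRESS_ZLIB)
		em->flags |= EM_FLAG_COMPRESS_ZLIB;
	else if (type == COMPRESS_LZO)
		em->flags |= EM_FLAG_COMPRESS_LZO;
	else if (type == COMPRESS_ZSTD)
		em->flags |= EM_FLAG_COMPRESS_ZSTD;
}

/* Derive the compression type from the flags; no dedicated field needed. */
static enum compress_type em_compression(const struct mock_extent_map *em)
{
	if (em->flags & EM_FLAG_COMPRESS_ZLIB)
		return COMPRESS_ZLIB;
	if (em->flags & EM_FLAG_COMPRESS_LZO)
		return COMPRESS_LZO;
	if (em->flags & EM_FLAG_COMPRESS_ZSTD)
		return COMPRESS_ZSTD;
	return COMPRESS_NONE;
}
```

Setting a type clears the other compression bits first, so other flags such as PINNED survive and at most one compression bit is ever set.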
# 4cea422a | 15-Nov-2023 | David Sterba <dsterba@suse.com>
btrfs: use shrinker for compression page pool
The pages are now allocated and freed centrally, so we can extend the logic to manage the lifetime. The main idea is to keep a few recently used pages and hand them to all writers. Ideally we won't have to go to the allocator at all (a slight performance gain) and we also raise the chance that the pages will be available (slightly increased reliability).
In order to avoid gathering too many pages, the shrinker is attached to the cache so we can free them when MM demands that. The first implementation will drain the whole cache. Further this can be refined to keep some minimal number of pages for emergency purposes. The ultimate goal is to avoid memory allocation failures on the write out path from the compression.
The pool threshold is set to cover full BTRFS_MAX_COMPRESSED / PAGE_SIZE for minimal thread pool, which is 8 (btrfs_init_fs_info()). This is 128K / 4K * 8 = 256 pages at maximum, which is 1MiB.
The pool is shared by all currently mounted filesystems; with heavy use of compression IO the allocator is still needed. The cache helps for short burst IO.
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
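A single-threaded sketch of such a pool, assuming a plain array instead of whatever list the kernel uses, no locking, and hypothetical names. The threshold is computed the same way as in the text (128K / 4K * 8 = 256 pages), and compr_pool_shrink() models the first shrinker implementation, which drains the whole cache.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_COMPRESSED (128u * 1024)	/* 128K, as BTRFS_MAX_COMPRESSED */
#define PAGE_SZ        4096u		/* assumed 4K pages */
#define MIN_THREADS    8u		/* minimal thread pool from the text */
#define POOL_THRESHOLD (MAX_COMPRESSED / PAGE_SZ * MIN_THREADS)	/* 256 */

/* Hypothetical global pool shared by every mounted filesystem. */
static void *pool[POOL_THRESHOLD];
static size_t pool_count;

/* Reuse a cached page if there is one, else fall back to the allocator. */
static void *compr_alloc_page(void)
{
	if (pool_count > 0)
		return pool[--pool_count];
	return malloc(PAGE_SZ);
}

/* Keep the page for reuse while under the threshold, else free it. */
static void compr_free_page(void *page)
{
	if (pool_count < POOL_THRESHOLD)
		pool[pool_count++] = page;
	else
		free(page);
}

/* Shrinker callback: drain everything and report how much was freed. */
static size_t compr_pool_shrink(void)
{
	size_t freed = pool_count;

	while (pool_count > 0)
		free(pool[--pool_count]);
	return freed;
}
```

A page freed by one writer is handed straight to the next allocation without touching the allocator, and the threshold caps the cache at 256 pages (1MiB with 4K pages) no matter how many filesystems are mounted.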
# 9ba965dc | 15-Nov-2023 | David Sterba <dsterba@suse.com>
btrfs: use page alloc/free wrappers for compression pages
This is a preparation for managing compression pages in a cache-like manner, instead of asking the allocator each time. The common allocation and free wrappers are introduced and are functionally equivalent to the current code.
The freeing helpers need to be carefully placed where the last reference is dropped. This is either after directly allocating (error handling) or when there are no other users of the pages (after copying the contents).
It's safe to not use the helper and use put_page() that will handle the reference count. Not using the helper means there's a lower number of pages that could be reused without passing them back to the allocator.
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>