| 58829512 | 12-May-2026 |
Jason Gunthorpe <jgg@nvidia.com> |
iommupt: Fix the end_index calculation in __map_range_leaf()
Sashiko noticed a mismatch of units in this math: num_leaves is actually the number of leaf *entries* (so a 16-item contiguous leaf is on
iommupt: Fix the end_index calculation in __map_range_leaf()
Sashiko noticed a mismatch of units in this math: num_leaves is actually the number of leaf *entries* (so a 16-item contiguous leaf is one num_leaves), while index is in items. The mismatch in maths causes __map_range_leaf() to exit early instead of efficiently filling a larger range of contiguous PTEs.
The early exit is caught by the functions above and then __map_range_leaf() is re-invoked, so there is no functional issue.
Correct the misuse of units by adjusting num_leaves with the leaf size and avoid the performance cost of looping externally.
There are also some mismatched types for num_leaves; simplify things to remove the duplicated calculations.
Fixes: d6c65b0fd621 ("iommupt: Avoid rewalking during map") Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Samiullah Khawaja <skhawaja@google.com> Reviewd-by: Pranjal Shrivastava <praan@google.com> Tested-by: Josua Mayer <josua@solid-run.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
show more ...
|
| 8b72aa57 | 26-Mar-2026 |
Sherry Yang <sherry.yang@oracle.com> |
iommupt/amdv1: mark amdv1pt_install_leaf_entry as __always_inline
After enabling CONFIG_GCOV_KERNEL and CONFIG_GCOV_PROFILE_ALL, following build failure is observed under GCC 14.2.1:
In function 'a
iommupt/amdv1: mark amdv1pt_install_leaf_entry as __always_inline
After enabling CONFIG_GCOV_KERNEL and CONFIG_GCOV_PROFILE_ALL, following build failure is observed under GCC 14.2.1:
In function 'amdv1pt_install_leaf_entry', inlined from '__do_map_single_page' at drivers/iommu/generic_pt/fmt/../iommu_pt.h:650:3, inlined from '__map_single_page0' at drivers/iommu/generic_pt/fmt/../iommu_pt.h:661:1, inlined from 'pt_descend' at drivers/iommu/generic_pt/fmt/../pt_iter.h:391:9, inlined from '__do_map_single_page' at drivers/iommu/generic_pt/fmt/../iommu_pt.h:657:10, inlined from '__map_single_page1.constprop' at drivers/iommu/generic_pt/fmt/../iommu_pt.h:661:1: ././include/linux/compiler_types.h:706:45: error: call to '__compiletime_assert_71' declared with attribute error: FIELD_PREP: value too large for the field 706 | _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__) |
......
drivers/iommu/generic_pt/fmt/amdv1.h:220:26: note: in expansion of macro 'FIELD_PREP' 220 | FIELD_PREP(AMDV1PT_FMT_OA, | ^~~~~~~~~~
In the path '__do_map_single_page()', level 0 always invokes 'pt_install_leaf_entry(&pts, map->oa, PAGE_SHIFT, …)'. At runtime that lands in the 'if (oasz_lg2 == isz_lg2)' arm of 'amdv1pt_install_leaf_entry()'; the contiguous-only 'else' block is unreachable for 4 KiB pages.
With CONFIG_GCOV_KERNEL + CONFIG_GCOV_PROFILE_ALL, the extra instrumentation changes GCC's inlining so that the "dead" 'else' branch still gets instantiated. The compiler constant-folds the contiguous OA expression, runs the 'FIELD_PREP()' compile-time check, and produces:
FIELD_PREP: value too large for the field
gcov-enabled builds therefore fail even though the code path never executes.
Fix this by marking amdv1pt_install_leaf_entry as __always_inline.
Fixes: dcd6a011a8d5 ("iommupt: Add map_pages op") Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Sherry Yang <sherry.yang@oracle.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
show more ...
|
| d6c65b0f | 27-Feb-2026 |
Jason Gunthorpe <jgg@nvidia.com> |
iommupt: Avoid rewalking during map
Currently the core code provides a simplified interface to drivers where it fragments a requested multi-page map into single page size steps after doing all the c
iommupt: Avoid rewalking during map
Currently the core code provides a simplified interface to drivers where it fragments a requested multi-page map into single page size steps after doing all the calculations to figure out what page size is appropriate. Each step rewalks the page tables from the start.
Since iommupt has a single implementation of the mapping algorithm it can internally compute each step as it goes while retaining its current position in the walk.
Add a new function pt_pgsz_count() which computes the same page size fragement of a large mapping operations.
Compute the next fragment when all the leaf entries of the current fragement have been written, then continue walking from the current point.
The function pointer is run through pt_iommu_ops instead of iommu_domain_ops to discourage using it outside iommupt. All drivers with their own page tables should continue to use the simplified map_pages() style interfaces.
Reviewed-by: Samiullah Khawaja <skhawaja@google.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
show more ...
|
| 99fb8afa | 27-Feb-2026 |
Jason Gunthorpe <jgg@nvidia.com> |
iommupt: Directly call iommupt's unmap_range()
The common algorithm in iommupt does not require the iommu_pgsize() calculations, it can directly unmap any arbitrary range. Add a new function pointer
iommupt: Directly call iommupt's unmap_range()
The common algorithm in iommupt does not require the iommu_pgsize() calculations, it can directly unmap any arbitrary range. Add a new function pointer to directly call an iommupt unmap_range op and make __iommu_unmap() call it directly.
Gives about a 5% gain on single page unmappings.
The function pointer is run through pt_iommu_ops instead of iommu_domain_ops to discourage using it outside iommupt. All drivers with their own page tables should continue to use the simplified map/unmap_pages() style interfaces.
Reviewed-by: Samiullah Khawaja <skhawaja@google.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
show more ...
|
| 199036ae | 27-Feb-2026 |
Jason Gunthorpe <jgg@nvidia.com> |
iommupt: Optimize the gather processing for DMA-FQ mode
In PT_FEAT_FLUSH_RANGE mode the gather was accumulated but never flushed and then the accumulated range was discarded by the dma-iommu code in
iommupt: Optimize the gather processing for DMA-FQ mode
In PT_FEAT_FLUSH_RANGE mode the gather was accumulated but never flushed and then the accumulated range was discarded by the dma-iommu code in DMA-FQ mode. This is basically optimal.
However for PT_FEAT_FLUSH_RANGE_NO_GAPS the page table would push flushes that are redundant with the flush all generated by the DMA-FQ mode.
Disable all range accumulation in the gather, and iommu_pt triggered flushing when in iommu_iotlb_gather_queued() indicates it is in DMA-FQ mode.
Reported-by: Robin Murphy <robin.murphy@arm.com> Closes: https://lore.kernel.org/r/794b6121-b66b-4819-b291-9761ed21cd83@arm.com Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Samiullah Khawaja <skhawaja@google.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
show more ...
|
| b48ca920 | 03-Feb-2026 |
Yu Zhang <zhangyu1@linux.microsoft.com> |
iommupt: Always add IOVA range to iotlb_gather in gather_range_pages()
Add current (iova, len) to the iotlb gather, regardless of the setting of PT_FEAT_FLUSH_RANGE or PT_FEAT_FLUSH_RANGE_NO_GAPS.
iommupt: Always add IOVA range to iotlb_gather in gather_range_pages()
Add current (iova, len) to the iotlb gather, regardless of the setting of PT_FEAT_FLUSH_RANGE or PT_FEAT_FLUSH_RANGE_NO_GAPS.
In gather_range_pages(), the current IOVA range is only added to iotlb_gather when PT_FEAT_FLUSH_RANGE is set. Yet a virtual IOMMU with NpCache uses only PT_FEAT_FLUSH_RANGE_NO_GAPS. In that case, iotlb_gather will stay empty (start=ULONG_MAX, end=0) after initialization, and the current (iova, len) will not be added to the iotlb_gather, causing subsequent iommu_iotlb_sync() to perform IOTLB invalidation with wrong parameters (e.g., amd_iommu_iotlb_sync() computes size from gather->end - gather->start + 1, leading to an invalid range).
The disjoint check and sync for PT_FEAT_FLUSH_RANGE_NO_GAPS remain unchanged: when the new range is disjoint from the existing gather, we still sync first and then add the new range, so semantics for NO_GAPS are preserved.
Fixes: 7c53f4238aa8 ("iommupt: Add unmap_pages op") Cc: stable@vger.kernel.org Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Yu Zhang <zhangyu1@linux.microsoft.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
show more ...
|
| 6a3d5fda | 09-Jan-2026 |
Jason Gunthorpe <jgg@nvidia.com> |
iommupt: Make pt_feature() always_inline
gcc 8.5 on powerpc does not automatically inline these functions even though they evaluate to constants in key cases. Since the constant propagation is essen
iommupt: Make pt_feature() always_inline
gcc 8.5 on powerpc does not automatically inline these functions even though they evaluate to constants in key cases. Since the constant propagation is essential for some code elimination and built-time checks this causes a build failure:
ERROR: modpost: "__pt_no_sw_bit" [drivers/iommu/generic_pt/fmt/iommu_amdv1.ko] undefined!
Caused by this:
if (pts_feature(&pts, PT_FEAT_DMA_INCOHERENT) && !pt_test_sw_bit_acquire(&pts, SW_BIT_CACHE_FLUSH_DONE)) flush_writes_item(&pts);
Where pts_feature() evaluates to a constant false. Mark them as __always_inline to force it to evaluate to a constant and trigger the code elimination.
Fixes: 7c5b184db714 ("genpt: Generic Page Table base API") Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202512230720.9y9DtWIo-lkp@intel.com/ Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
show more ...
|
| 1eb0ae6f | 28-Nov-2025 |
Jason Gunthorpe <jgg@nvidia.com> |
iommupt/vtd: Support mgaw's less than a 4 level walk for first stage
If the IOVA is limited to less than 48 the page table will be constructed with a 3 level configuration which is unsupported by ha
iommupt/vtd: Support mgaw's less than a 4 level walk for first stage
If the IOVA is limited to less than 48 the page table will be constructed with a 3 level configuration which is unsupported by hardware.
Like the second stage the caller needs to pass in both the top_level an the vasz to specify a table that has more levels than required to hold the IOVA range.
Fixes: 6cbc09b7719e ("iommu/vt-d: Restore previous domain::aperture_end calculation") Reported-by: Calvin Owens <calvin@wbinvd.org> Closes: https://lore.kernel.org/r/8f257d2651eb8a4358fcbd47b0145002e5f1d638.1764237717.git.calvin@wbinvd.org Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Tested-by: Calvin Owens <calvin@wbinvd.org> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
show more ...
|
| d856f9d2 | 28-Nov-2025 |
Jason Gunthorpe <jgg@nvidia.com> |
iommupt/vtd: Allow VT-d to have a larger table top than the vasz requires
VT-d second stage HW specifies both the maximum IOVA and the supported table walk starting points. Weirdly there is HW that
iommupt/vtd: Allow VT-d to have a larger table top than the vasz requires
VT-d second stage HW specifies both the maximum IOVA and the supported table walk starting points. Weirdly there is HW that only supports a 4 level walk but has a maximum IOVA that only needs 3.
The current code miscalculates this and creates a wrongly sized page table which ultimately fails the compatibility check for number of levels.
This is fixed by allowing the page table to be created with both a vasz and top_level input. The vasz will set the aperture for the domain while the top_level will set the page table geometry.
Add top_level to vtdss and correct the logic in VT-d to generate the right top_level and vasz from mgaw and sagaw.
Fixes: d373449d8e97 ("iommu/vt-d: Use the generic iommu page table") Reported-by: Calvin Owens <calvin@wbinvd.org> Closes: https://lore.kernel.org/r/8f257d2651eb8a4358fcbd47b0145002e5f1d638.1764237717.git.calvin@wbinvd.org Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Tested-by: Calvin Owens <calvin@wbinvd.org> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
show more ...
|
| 01569c21 | 12-Nov-2025 |
Geert Uytterhoeven <geert+renesas@glider.be> |
genpt: Make GENERIC_PT invisible
There is no point in asking the user about the Generic Radix Page Table API: - All IOMMU drivers that use this API already select GENERIC_PT when needed, - M
genpt: Make GENERIC_PT invisible
There is no point in asking the user about the Generic Radix Page Table API: - All IOMMU drivers that use this API already select GENERIC_PT when needed, - Most users probably do not know what to answer anyway.
Fixes: 7c5b184db7145fd4 ("genpt: Generic Page Table base API") Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
show more ...
|
| 5de863ef | 26-Nov-2025 |
Jason Gunthorpe <jgg@nvidia.com> |
iommupt: Avoid a compiler bug with sw_bit
gcc 13, in some cases, gets confused if the __builtin_constant_p() is inside the switch. It thinks that bitnr can have the value max+1 and fails. Lift the c
iommupt: Avoid a compiler bug with sw_bit
gcc 13, in some cases, gets confused if the __builtin_constant_p() is inside the switch. It thinks that bitnr can have the value max+1 and fails. Lift the check outside the switch to avoid it.
Fixes: ef7bfe5bbffd ("iommupt/x86: Support SW bits and permit PT_FEAT_DMA_INCOHERENT") Fixes: 5448c1558f60 ("iommupt: Add the Intel VT-d second stage page table format") Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202511242012.I7g504Ab-lkp@intel.com/ Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
show more ...
|
| 152c862c | 20-Nov-2025 |
Jason Gunthorpe <jgg@nvidia.com> |
iommupt: Fix unlikely flows in increase_top()
Since increase_top() does it's own READ_ONCE() on top_of_table, the caller's prior READ_ONCE() could be inconsistent and the first time through the loop
iommupt: Fix unlikely flows in increase_top()
Since increase_top() does it's own READ_ONCE() on top_of_table, the caller's prior READ_ONCE() could be inconsistent and the first time through the loop we may actually already have the right level if two threads are racing map.
In this case new_level will be left uninitialized.
Further all the exits from the loop have to either commit to the new top or free any memory allocated so the early return must be a goto err_free.
Make it so the only break from the loop always sets new_level to the right value and all other exits go to err_free. Use pts.level (the pts represents the top we are stacking) within the loop instead of new_level.
Fixes: dcd6a011a8d5 ("iommupt: Add map_pages op") Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Closes: https://lore.kernel.org/r/aRwgNW9PiW2j-Qwo@stanley.mountain Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com> Reviewed-by: Vasant Hegde <vasant.hegde@amd.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
show more ...
|
| 4fd24676 | 08-Nov-2025 |
Bagas Sanjaya <bagasdotme@gmail.com> |
iommupt: Actually correct pt_test_sw_bit_{acquire_release}() parameter description
In review comment for v1 of genpt documentation fixes [1], Randy suggested that pt_test_sw_bit_acquire() parameters
iommupt: Actually correct pt_test_sw_bit_{acquire_release}() parameter description
In review comment for v1 of genpt documentation fixes [1], Randy suggested that pt_test_sw_bit_acquire() parameters description should be written using "to read". Commit e4dfaf25df1210 ("iommupt: Describe @bitnr parameter"), however, misunderstood the review by instead using "to read" on @bitnr parameter on both pt_test_sw_bit_acquire() and pt_test_sw_bit_release().
Actually correct the description.
[1]: https://lore.kernel.org/linux-doc/9dba0eb7-6f32-41b7-b70b-12379364585f@infradead.org/
Fixes: e4dfaf25df1210 ("iommupt: Describe @bitnr parameter") Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com> Reviewed-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
show more ...
|
| 9ad64801 | 07-Nov-2025 |
Joerg Roedel <joerg.roedel@amd.com> |
iommu/iommupt: Fix build error in genericpt unit-tests
Fix the include of iommu-pages.h in the KUnit tests for the IOMMU generic page-table code to make the tests compile again.
Reported-by: Thorst
iommu/iommupt: Fix build error in genericpt unit-tests
Fix the include of iommu-pages.h in the KUnit tests for the IOMMU generic page-table code to make the tests compile again.
Reported-by: Thorsten Leemhuis <linux@leemhuis.info> Closes: https://lore.kernel.org/all/9844d4cb-f517-478b-9911-b6dc1a963b8e@leemhuis.info/ Reported-by: "Longia, Amandeep Kaur" <amandeepkaur.longia@amd.com> Closes: https://lore.kernel.org/all/e641c955-25ad-4eae-b3fe-4392966cf768@amd.com/ Fixes: 1dd4187f53c3 ("iommupt: Add a kunit test for Generic Page Table") Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Tested-by: Thorsten Leemhuis <linux@leemhuis.info> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
show more ...
|
| 5cb637d9 | 06-Nov-2025 |
Jason Gunthorpe <jgg@nvidia.com> |
iommupt: Documentation fixes
Some adjustments pointed out by Randy:
"decodes an full 64-bit" -> "decodes the full 64 bit"
Correct the function parameter name for iova_to_phys()
Use the recomme
iommupt: Documentation fixes
Some adjustments pointed out by Randy:
"decodes an full 64-bit" -> "decodes the full 64 bit"
Correct the function parameter name for iova_to_phys()
Use the recommended section heading style.
Suggested-by: Randy Dunlap <rdunlap@infradead.org> Fixes: ab0b572847ac ("genpt: Add Documentation/ files") Fixes: 879ced2bab1b ("iommupt: Add the AMD IOMMU v1 page table format") Fixes: 9d4c274cd7d5 ("iommupt: Add iova_to_phys op") Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Randy Dunlap <rdunlap@infradead.org> Tested-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
show more ...
|