Lines Matching +full:in +full:- +full:tree
1 // SPDX-License-Identifier: GPL-2.0
11 * Ext4 extents status tree core functions.
21 * According to previous discussions at the Ext4 Developer Workshop, we
22 * will introduce a new structure called io tree to track all extent
23 * status in order to solve some problems that we have encountered
24 * (e.g. Reservation space warning), and provide extent-level locking.
25 * Delay extent tree is the first step to achieve this goal. It is
27 * extent tree, whose goal is only to track delayed extents in memory to
30 * delay extent tree at the first commit. But to better convey
31 * what it does, it has been renamed to extent status tree.
35 * tracked in the tree. It maintains the delayed extent when a delayed
41 * status tree and future works.
44 * In this step all extent statuses are tracked by the extent status tree.
45 * Thus, we can first try to lookup a block mapping in this tree before
46 * finding it in the extent tree. Hence, the single extent cache can be removed
47 * because the extent status tree does a better job. Extents in the status
48 * tree are loaded on-demand. Therefore, the extent status tree may not
49 * contain all of the extents in a file. Meanwhile we define a shrinker
50 * to reclaim memory from the extent status tree because a fragmented extent
51 * tree will make the status tree cost too much memory. Written, unwritten,
52 * and hole extents in the tree will be reclaimed by this shrinker when we
58 * Extent status tree implementation for ext4.
62 * Extent status tree tracks all extent status.
64 * 1. Why do we need to implement an extent status tree?
66 * Without an extent status tree, ext4 identifies a delayed extent by looking
67 * up the page cache; this has several deficiencies - complicated, buggy,
73 * Let us have a look at how they work without the extent status tree.
74 * -- FIEMAP
77 * -- SEEK_HOLE/DATA
80 * -- bigalloc
85 * -- writeout
90 * With extent status tree implementation, FIEMAP, SEEK_HOLE/DATA,
93 * not by searching the extent tree.
97 * 2. Ext4 extent status tree implementation
99 * -- extent
101 * physically. Unlike an extent in the extent tree, this extent in ext4 is
102 * an in-memory struct; there is no corresponding on-disk data. There
106 * -- extent status tree
107 * Every inode has an extent status tree and all allocation blocks
108 * are added to the tree with different statuses. The extents in the
109 * tree are ordered by logical block number.
111 * -- operations on an extent status tree
112 * There are three important operations on a delayed extent tree: find
115 * -- race on an extent status tree
116 * Extent status tree is protected by inode->i_es_lock.
118 * -- memory consumption
119 * A fragmented extent tree will make the extent status tree cost too much
121 * the tree under heavy memory pressure.
127 * -- overhead
129 * not very random, space-adding operations run in O(1) time.
131 * -- gain
139 * -- Refactor delayed space reservation
141 * -- Extent-level locking
163 return -ENOMEM; in ext4_init_es()
172 void ext4_es_init_tree(struct ext4_es_tree *tree) in ext4_es_init_tree() argument
174 tree->root = RB_ROOT; in ext4_es_init_tree()
175 tree->cache_es = NULL; in ext4_es_init_tree()
181 struct ext4_es_tree *tree; in ext4_es_print_tree() local
184 printk(KERN_DEBUG "status extents for inode %lu:", inode->i_ino); in ext4_es_print_tree()
185 tree = &EXT4_I(inode)->i_es_tree; in ext4_es_print_tree()
186 node = rb_first(&tree->root); in ext4_es_print_tree()
191 es->es_lblk, es->es_len, in ext4_es_print_tree()
203 BUG_ON(es->es_lblk + es->es_len < es->es_lblk); in ext4_es_end()
204 return es->es_lblk + es->es_len - 1; in ext4_es_end()
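The end-of-extent arithmetic in `ext4_es_end()` is simple but wraparound-sensitive, which is why the kernel guards it with a BUG_ON. A minimal user-space sketch of the same computation (the `es_end_lblk` helper name is illustrative, not kernel code):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Last logical block covered by an extent, mirroring ext4_es_end():
 * the kernel BUG_ONs when es_lblk + es_len wraps around, which the
 * assert below models in user space.
 */
static uint32_t es_end_lblk(uint32_t es_lblk, uint32_t es_len)
{
    assert(es_lblk + es_len >= es_lblk);  /* catch 32-bit wraparound */
    return es_lblk + es_len - 1;
}
```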
208 * search through the tree for a delayed extent with a given offset. If
214 struct rb_node *node = root->rb_node; in __es_tree_search()
219 if (lblk < es->es_lblk) in __es_tree_search()
220 node = node->rb_left; in __es_tree_search()
222 node = node->rb_right; in __es_tree_search()
227 if (es && lblk < es->es_lblk) in __es_tree_search()
231 node = rb_next(&es->rb_node); in __es_tree_search()
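The search contract above (return the extent containing `lblk`, else the nearest extent to its right, else NULL) can be sketched in user space over a sorted array instead of an rbtree; the names here are stand-ins, not the kernel's:

```c
#include <stddef.h>
#include <stdint.h>

struct es_span {
    uint32_t lblk;  /* first logical block */
    uint32_t len;   /* number of blocks */
};

/*
 * Binary search over extents sorted by lblk: return the extent that
 * contains lblk, or the first extent beyond it, or NULL if lblk lies
 * past the last extent - the same contract as __es_tree_search().
 */
static struct es_span *es_span_search(struct es_span *es, size_t n,
                                      uint32_t lblk)
{
    size_t lo = 0, hi = n;

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        uint32_t end = es[mid].lblk + es[mid].len - 1;

        if (end < lblk)         /* extent entirely left of lblk */
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo < n ? &es[lo] : NULL;
}
```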
240 * ext4_es_find_extent_range - find extent with specified status within block
241 * range or next extent following block range in
242 * extents status tree
244 * @inode - file containing the range
245 * @matching_fn - pointer to function that matches extents with desired status
246 * @lblk - logical block defining start of range
247 * @end - logical block defining end of range
248 * @es - extent found, if any
251 * in the extents status tree that satisfies @matching_fn. If a match
252 * is found, it's returned in @es. If not, and a matching extent is found
253 * beyond the block range, it's returned in @es. If no match is found, an
254 * extent is returned in @es whose es_lblk, es_len, and es_pblk components
262 struct ext4_es_tree *tree = NULL; in __es_find_extent_range() local
269 tree = &EXT4_I(inode)->i_es_tree; in __es_find_extent_range()
272 es->es_lblk = es->es_len = es->es_pblk = 0; in __es_find_extent_range()
273 es1 = READ_ONCE(tree->cache_es); in __es_find_extent_range()
274 if (es1 && in_range(lblk, es1->es_lblk, es1->es_len)) { in __es_find_extent_range()
276 lblk, es1->es_lblk, es1->es_len, in __es_find_extent_range()
281 es1 = __es_tree_search(&tree->root, lblk); in __es_find_extent_range()
285 while ((node = rb_next(&es1->rb_node)) != NULL) { in __es_find_extent_range()
287 if (es1->es_lblk > end) { in __es_find_extent_range()
297 WRITE_ONCE(tree->cache_es, es1); in __es_find_extent_range()
298 es->es_lblk = es1->es_lblk; in __es_find_extent_range()
299 es->es_len = es1->es_len; in __es_find_extent_range()
300 es->es_pblk = es1->es_pblk; in __es_find_extent_range()
313 es->es_lblk = es->es_len = es->es_pblk = 0; in ext4_es_find_extent_range()
315 if (EXT4_SB(inode->i_sb)->s_mount_state & EXT4_FC_REPLAY) in ext4_es_find_extent_range()
320 read_lock(&EXT4_I(inode)->i_es_lock); in ext4_es_find_extent_range()
322 read_unlock(&EXT4_I(inode)->i_es_lock); in ext4_es_find_extent_range()
328 * __es_scan_range - search block range for block with specified status
329 * in extents status tree
331 * @inode - file containing the range
332 * @matching_fn - pointer to function that matches extents with desired status
333 * @lblk - logical block defining start of range
334 * @end - logical block defining end of range
336 * Returns true if at least one block in the specified block range satisfies
339 * in the cluster with that status. Should only be called by code that has
350 return false; /* no matching extent in the tree */ in __es_scan_range()
368 if (EXT4_SB(inode->i_sb)->s_mount_state & EXT4_FC_REPLAY) in ext4_es_scan_range()
371 read_lock(&EXT4_I(inode)->i_es_lock); in ext4_es_scan_range()
373 read_unlock(&EXT4_I(inode)->i_es_lock); in ext4_es_scan_range()
379 * __es_scan_clu - search cluster for block with specified status in
380 * extents status tree
382 * @inode - file containing the cluster
383 * @matching_fn - pointer to function that matches extents with desired status
384 * @lblk - logical block in cluster to be searched
386 * Returns true if at least one extent in the cluster containing @lblk
389 * in the cluster with that status. Should only be called by code that has
396 struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb); in __es_scan_clu()
400 lblk_end = lblk_start + sbi->s_cluster_ratio - 1; in __es_scan_clu()
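`__es_scan_clu()` derives the cluster's block range from `lblk` with masking arithmetic (the kernel's `EXT4_LBLK_CMASK` plus `s_cluster_ratio`); a sketch of that arithmetic, with `cluster_bits` standing in for `s_cluster_bits`:

```c
#include <stdint.h>

/* First block of the cluster containing lblk (cf. EXT4_LBLK_CMASK). */
static uint32_t clu_start(uint32_t lblk, unsigned cluster_bits)
{
    return lblk & ~((1u << cluster_bits) - 1);
}

/* Last block of that cluster: start + cluster_ratio - 1. */
static uint32_t clu_end(uint32_t lblk, unsigned cluster_bits)
{
    return clu_start(lblk, cluster_bits) + (1u << cluster_bits) - 1;
}
```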
414 if (EXT4_SB(inode->i_sb)->s_mount_state & EXT4_FC_REPLAY) in ext4_es_scan_clu()
417 read_lock(&EXT4_I(inode)->i_es_lock); in ext4_es_scan_clu()
419 read_unlock(&EXT4_I(inode)->i_es_lock); in ext4_es_scan_clu()
427 struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb); in ext4_es_list_add()
429 if (!list_empty(&ei->i_es_list)) in ext4_es_list_add()
432 spin_lock(&sbi->s_es_lock); in ext4_es_list_add()
433 if (list_empty(&ei->i_es_list)) { in ext4_es_list_add()
434 list_add_tail(&ei->i_es_list, &sbi->s_es_list); in ext4_es_list_add()
435 sbi->s_es_nr_inode++; in ext4_es_list_add()
437 spin_unlock(&sbi->s_es_lock); in ext4_es_list_add()
443 struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb); in ext4_es_list_del()
445 spin_lock(&sbi->s_es_lock); in ext4_es_list_del()
446 if (!list_empty(&ei->i_es_list)) { in ext4_es_list_del()
447 list_del_init(&ei->i_es_list); in ext4_es_list_del()
448 sbi->s_es_nr_inode--; in ext4_es_list_del()
449 WARN_ON_ONCE(sbi->s_es_nr_inode < 0); in ext4_es_list_del()
451 spin_unlock(&sbi->s_es_lock); in ext4_es_list_del()
491 es->es_lblk = lblk; in ext4_es_init_extent()
492 es->es_len = len; in ext4_es_init_extent()
493 es->es_pblk = pblk; in ext4_es_init_extent()
497 if (!EXT4_I(inode)->i_es_shk_nr++) in ext4_es_init_extent()
499 percpu_counter_inc(&EXT4_SB(inode->i_sb)-> in ext4_es_init_extent()
503 EXT4_I(inode)->i_es_all_nr++; in ext4_es_init_extent()
504 percpu_counter_inc(&EXT4_SB(inode->i_sb)->s_es_stats.es_stats_all_cnt); in ext4_es_init_extent()
514 EXT4_I(inode)->i_es_all_nr--; in ext4_es_free_extent()
515 percpu_counter_dec(&EXT4_SB(inode->i_sb)->s_es_stats.es_stats_all_cnt); in ext4_es_free_extent()
519 BUG_ON(EXT4_I(inode)->i_es_shk_nr == 0); in ext4_es_free_extent()
520 if (!--EXT4_I(inode)->i_es_shk_nr) in ext4_es_free_extent()
522 percpu_counter_dec(&EXT4_SB(inode->i_sb)-> in ext4_es_free_extent()
532 * - logical block number is contiguous
533 * - physical block number is contiguous
534 * - status is equal
542 if (((__u64) es1->es_len) + es2->es_len > EXT_MAX_BLOCKS) { in ext4_es_can_be_merged()
546 es1->es_len, es2->es_len, EXT_MAX_BLOCKS); in ext4_es_can_be_merged()
551 if (((__u64) es1->es_lblk) + es1->es_len != es2->es_lblk) in ext4_es_can_be_merged()
555 (ext4_es_pblock(es1) + es1->es_len == ext4_es_pblock(es2))) in ext4_es_can_be_merged()
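The merge conditions listed above can be condensed into one small predicate. This is a user-space model: the field names and `MODEL_MAX_BLOCKS` are stand-ins, and `mapped` approximates the written/unwritten case where physical contiguity matters.

```c
#include <stdbool.h>
#include <stdint.h>

#define MODEL_MAX_BLOCKS 0xFFFFFFFFu  /* stand-in for EXT_MAX_BLOCKS */

struct es_info {
    uint32_t lblk;    /* logical start */
    uint32_t len;     /* length in blocks */
    uint64_t pblk;    /* physical start (valid when mapped) */
    unsigned status;  /* written/unwritten/delayed/hole */
    bool     mapped;  /* has a physical mapping */
};

/* Mirrors the conditions checked by ext4_es_can_be_merged(). */
static bool es_can_merge(const struct es_info *a, const struct es_info *b)
{
    if (a->status != b->status)                        /* equal status */
        return false;
    if ((uint64_t)a->len + b->len > MODEL_MAX_BLOCKS)  /* merged len fits */
        return false;
    if ((uint64_t)a->lblk + a->len != b->lblk)         /* logical contiguity */
        return false;
    if (a->mapped && a->pblk + a->len != b->pblk)      /* physical contiguity */
        return false;
    return true;
}
```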
571 struct ext4_es_tree *tree = &EXT4_I(inode)->i_es_tree; in ext4_es_try_to_merge_left() local
575 node = rb_prev(&es->rb_node); in ext4_es_try_to_merge_left()
581 es1->es_len += es->es_len; in ext4_es_try_to_merge_left()
584 rb_erase(&es->rb_node, &tree->root); in ext4_es_try_to_merge_left()
595 struct ext4_es_tree *tree = &EXT4_I(inode)->i_es_tree; in ext4_es_try_to_merge_right() local
599 node = rb_next(&es->rb_node); in ext4_es_try_to_merge_right()
605 es->es_len += es1->es_len; in ext4_es_try_to_merge_right()
608 rb_erase(node, &tree->root); in ext4_es_try_to_merge_right()
628 path = ext4_find_extent(inode, es->es_lblk, NULL, EXT4_EX_NOCACHE); in ext4_es_insert_extent_ext_check()
637 ee_block = le32_to_cpu(ex->ee_block); in ext4_es_insert_extent_ext_check()
649 if (in_range(es->es_lblk, ee_block, ee_len)) { in ext4_es_insert_extent_ext_check()
655 inode->i_ino, ee_block, ee_len, in ext4_es_insert_extent_ext_check()
657 es->es_lblk, es->es_len, in ext4_es_insert_extent_ext_check()
664 * We don't check ee_block == es->es_lblk, etc. because es in ext4_es_insert_extent_ext_check()
667 if (es->es_lblk < ee_block || in ext4_es_insert_extent_ext_check()
668 ext4_es_pblock(es) != ee_start + es->es_lblk - ee_block) { in ext4_es_insert_extent_ext_check()
671 "es_status [%d/%d/%llu/%c]\n", inode->i_ino, in ext4_es_insert_extent_ext_check()
673 ee_status ? 'u' : 'w', es->es_lblk, es->es_len, in ext4_es_insert_extent_ext_check()
681 "es_status [%d/%d/%llu/%c]\n", inode->i_ino, in ext4_es_insert_extent_ext_check()
683 ee_status ? 'u' : 'w', es->es_lblk, es->es_len, in ext4_es_insert_extent_ext_check()
695 "[%d/%d/%llu/%x]\n", inode->i_ino, in ext4_es_insert_extent_ext_check()
696 es->es_lblk, es->es_lblk, es->es_len, in ext4_es_insert_extent_ext_check()
712 * The 'Indirect' structure is defined in indirect.c, so we cannot in ext4_es_insert_extent_ind_check()
713 * access the direct/indirect tree from outside it. It would be too ugly in ext4_es_insert_extent_ind_check()
714 * to define this function in the indirect.c file. in ext4_es_insert_extent_ind_check()
717 map.m_lblk = es->es_lblk; in ext4_es_insert_extent_ind_check()
718 map.m_len = es->es_len; in ext4_es_insert_extent_ind_check()
730 inode->i_ino, es->es_lblk, es->es_len, in ext4_es_insert_extent_ind_check()
734 if (retval != es->es_len) { in ext4_es_insert_extent_ind_check()
737 inode->i_ino, retval, es->es_len); in ext4_es_insert_extent_ind_check()
744 inode->i_ino, map.m_pblk, in ext4_es_insert_extent_ind_check()
751 * indirect-based file doesn't have it. in ext4_es_insert_extent_ind_check()
760 inode->i_ino, es->es_lblk, es->es_len, in ext4_es_insert_extent_ind_check()
774 BUG_ON(!rwsem_is_locked(&EXT4_I(inode)->i_data_sem)); in ext4_es_insert_extent_check()
790 struct ext4_es_tree *tree = &EXT4_I(inode)->i_es_tree; in __es_insert_extent() local
791 struct rb_node **p = &tree->root.rb_node; in __es_insert_extent()
799 if (newes->es_lblk < es->es_lblk) { in __es_insert_extent()
805 es->es_lblk = newes->es_lblk; in __es_insert_extent()
806 es->es_len += newes->es_len; in __es_insert_extent()
810 newes->es_pblk); in __es_insert_extent()
814 p = &(*p)->rb_left; in __es_insert_extent()
815 } else if (newes->es_lblk > ext4_es_end(es)) { in __es_insert_extent()
817 es->es_len += newes->es_len; in __es_insert_extent()
821 p = &(*p)->rb_right; in __es_insert_extent()
824 return -EINVAL; in __es_insert_extent()
833 return -ENOMEM; in __es_insert_extent()
834 ext4_es_init_extent(inode, es, newes->es_lblk, newes->es_len, in __es_insert_extent()
835 newes->es_pblk); in __es_insert_extent()
837 rb_link_node(&es->rb_node, parent, p); in __es_insert_extent()
838 rb_insert_color(&es->rb_node, &tree->root); in __es_insert_extent()
841 tree->cache_es = es; in __es_insert_extent()
847 * status tree.
854 ext4_lblk_t end = lblk + len - 1; in ext4_es_insert_extent()
857 struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb); in ext4_es_insert_extent()
863 if (EXT4_SB(inode->i_sb)->s_mount_state & EXT4_FC_REPLAY) in ext4_es_insert_extent()
866 es_debug("add [%u/%u) %llu %x %d to extent status tree of inode %lu\n", in ext4_es_insert_extent()
867 lblk, len, pblk, status, delalloc_reserve_used, inode->i_ino); in ext4_es_insert_extent()
882 revise_pending = sbi->s_cluster_ratio > 1 && in ext4_es_insert_extent()
883 test_opt(inode->i_sb, DELALLOC) && in ext4_es_insert_extent()
893 write_lock(&EXT4_I(inode)->i_es_lock); in ext4_es_insert_extent()
900 if (!es1->es_len) in ext4_es_insert_extent()
906 if (err2 == -ENOMEM && !ext4_es_must_keep(&newes)) in ext4_es_insert_extent()
912 if (!es2->es_len) in ext4_es_insert_extent()
928 write_unlock(&EXT4_I(inode)->i_es_lock); in ext4_es_insert_extent()
933 * cluster is not included in the reserved count. in ext4_es_insert_extent()
938 * non-delayed allocated blocks (e.g. hole) or 2) contains non-delayed in ext4_es_insert_extent()
959 * tree if and only if there isn't information about the range in
968 ext4_lblk_t end = lblk + len - 1; in ext4_es_cache_extent()
970 if (EXT4_SB(inode->i_sb)->s_mount_state & EXT4_FC_REPLAY) in ext4_es_cache_extent()
983 write_lock(&EXT4_I(inode)->i_es_lock); in ext4_es_cache_extent()
985 es = __es_tree_search(&EXT4_I(inode)->i_es_tree.root, lblk); in ext4_es_cache_extent()
986 if (!es || es->es_lblk > end) in ext4_es_cache_extent()
988 write_unlock(&EXT4_I(inode)->i_es_lock); in ext4_es_cache_extent()
992 * ext4_es_lookup_extent() looks up an extent in extent status tree.
1002 struct ext4_es_tree *tree; in ext4_es_lookup_extent() local
1008 if (EXT4_SB(inode->i_sb)->s_mount_state & EXT4_FC_REPLAY) in ext4_es_lookup_extent()
1012 es_debug("lookup extent in block %u\n", lblk); in ext4_es_lookup_extent()
1014 tree = &EXT4_I(inode)->i_es_tree; in ext4_es_lookup_extent()
1015 read_lock(&EXT4_I(inode)->i_es_lock); in ext4_es_lookup_extent()
1017	/* first, try to find the extent in the cache */ in ext4_es_lookup_extent()
1018 es->es_lblk = es->es_len = es->es_pblk = 0; in ext4_es_lookup_extent()
1019 es1 = READ_ONCE(tree->cache_es); in ext4_es_lookup_extent()
1020 if (es1 && in_range(lblk, es1->es_lblk, es1->es_len)) { in ext4_es_lookup_extent()
1022 lblk, es1->es_lblk, es1->es_len); in ext4_es_lookup_extent()
1027 node = tree->root.rb_node; in ext4_es_lookup_extent()
1030 if (lblk < es1->es_lblk) in ext4_es_lookup_extent()
1031 node = node->rb_left; in ext4_es_lookup_extent()
1033 node = node->rb_right; in ext4_es_lookup_extent()
1041 stats = &EXT4_SB(inode->i_sb)->s_es_stats; in ext4_es_lookup_extent()
1044 es->es_lblk = es1->es_lblk; in ext4_es_lookup_extent()
1045 es->es_len = es1->es_len; in ext4_es_lookup_extent()
1046 es->es_pblk = es1->es_pblk; in ext4_es_lookup_extent()
1049 percpu_counter_inc(&stats->es_stats_cache_hits); in ext4_es_lookup_extent()
1051 node = rb_next(&es1->rb_node); in ext4_es_lookup_extent()
1055 *next_lblk = es1->es_lblk; in ext4_es_lookup_extent()
1060 percpu_counter_inc(&stats->es_stats_cache_misses); in ext4_es_lookup_extent()
1063 read_unlock(&EXT4_I(inode)->i_es_lock); in ext4_es_lookup_extent()
1080 * init_rsvd - initialize reserved count data before removing block range
1081 * in file from extent status tree
1083 * @inode - file containing range
1084 * @lblk - first block in range
1085 * @es - pointer to first extent in range
1086 * @rc - pointer to reserved count data
1093 struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb); in init_rsvd()
1096 rc->ndelayed = 0; in init_rsvd()
1099 * for bigalloc, note the first delayed block in the range has not in init_rsvd()
1104 if (sbi->s_cluster_ratio > 1) { in init_rsvd()
1105 rc->first_do_lblk_found = false; in init_rsvd()
1106 if (lblk > es->es_lblk) { in init_rsvd()
1107 rc->left_es = es; in init_rsvd()
1109 node = rb_prev(&es->rb_node); in init_rsvd()
1110 rc->left_es = node ? rb_entry(node, in init_rsvd()
1114 rc->partial = false; in init_rsvd()
1119 * count_rsvd - count the clusters containing delayed blocks in a range
1120 * within an extent and add to the running tally in rsvd_count
1122 * @inode - file containing extent
1123 * @lblk - first block in range
1124 * @len - length of range in blocks
1125 * @es - pointer to extent containing clusters to be counted
1126 * @rc - pointer to reserved count data
1134 struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb); in count_rsvd()
1142 if (sbi->s_cluster_ratio == 1) { in count_rsvd()
1143 rc->ndelayed += (int) len; in count_rsvd()
1149 i = (lblk < es->es_lblk) ? es->es_lblk : lblk; in count_rsvd()
1150 end = lblk + (ext4_lblk_t) len - 1; in count_rsvd()
1154 if (!rc->first_do_lblk_found) { in count_rsvd()
1155 rc->first_do_lblk = i; in count_rsvd()
1156 rc->first_do_lblk_found = true; in count_rsvd()
1159 /* update the last lblk in the region seen so far */ in count_rsvd()
1160 rc->last_do_lblk = end; in count_rsvd()
1166 if (rc->partial && (rc->lclu != EXT4_B2C(sbi, i))) { in count_rsvd()
1167 rc->ndelayed++; in count_rsvd()
1168 rc->partial = false; in count_rsvd()
1177 rc->ndelayed++; in count_rsvd()
1178 rc->partial = false; in count_rsvd()
1185 * number of whole delayed clusters in the extent in count_rsvd()
1187 if ((i + sbi->s_cluster_ratio - 1) <= end) { in count_rsvd()
1188 nclu = (end - i + 1) >> sbi->s_cluster_bits; in count_rsvd()
1189 rc->ndelayed += nclu; in count_rsvd()
1190 i += nclu << sbi->s_cluster_bits; in count_rsvd()
1197 if (!rc->partial && i <= end) { in count_rsvd()
1198 rc->partial = true; in count_rsvd()
1199 rc->lclu = EXT4_B2C(sbi, i); in count_rsvd()
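The middle section of `count_rsvd()` skips over whole delayed clusters with shift arithmetic. Assuming `i` is cluster-aligned, the count works out as below (a sketch, with `cluster_bits` standing in for `s_cluster_bits`):

```c
#include <stdint.h>

/* Logical block -> logical cluster, as EXT4_B2C() does. */
static uint32_t b2c(uint32_t lblk, unsigned cluster_bits)
{
    return lblk >> cluster_bits;
}

/*
 * Number of whole clusters inside [i, end], assuming i is aligned to
 * a cluster boundary - the same test and shift count_rsvd() uses:
 * if (i + ratio - 1 <= end), nclu = (end - i + 1) >> cluster_bits.
 */
static uint32_t whole_clusters(uint32_t i, uint32_t end, unsigned cluster_bits)
{
    uint32_t ratio = 1u << cluster_bits;

    if (i + ratio - 1 > end)  /* range shorter than one cluster */
        return 0;
    return (end - i + 1) >> cluster_bits;
}
```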
1204 * __pr_tree_search - search for a pending cluster reservation
1206 * @root - root of pending reservation tree
1207 * @lclu - logical cluster to search for
1216 struct rb_node *node = root->rb_node; in __pr_tree_search()
1221 if (lclu < pr->lclu) in __pr_tree_search()
1222 node = node->rb_left; in __pr_tree_search()
1223 else if (lclu > pr->lclu) in __pr_tree_search()
1224 node = node->rb_right; in __pr_tree_search()
1228 if (pr && lclu < pr->lclu) in __pr_tree_search()
1230 if (pr && lclu > pr->lclu) { in __pr_tree_search()
1231 node = rb_next(&pr->rb_node); in __pr_tree_search()
1239 * get_rsvd - calculates and returns the number of cluster reservations to be
1240 * released when removing a block range from the extent status tree
1243 * @inode - file containing block range
1244 * @end - last block in range
1245 * @right_es - pointer to extent containing next block beyond end or NULL
1246 * @rc - pointer to reserved count data
1257 struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb); in get_rsvd()
1259 struct ext4_pending_tree *tree = &EXT4_I(inode)->i_pending_tree; in get_rsvd() local
1265 if (sbi->s_cluster_ratio > 1) { in get_rsvd()
1267 if (rc->partial) in get_rsvd()
1268 rc->ndelayed++; in get_rsvd()
1270 if (rc->ndelayed == 0) in get_rsvd()
1273 first_lclu = EXT4_B2C(sbi, rc->first_do_lblk); in get_rsvd()
1274 last_lclu = EXT4_B2C(sbi, rc->last_do_lblk); in get_rsvd()
1278 * ends of the range that still contain delayed blocks - in get_rsvd()
1283 es = rc->left_es; in get_rsvd()
1285 EXT4_LBLK_CMASK(sbi, rc->first_do_lblk)) { in get_rsvd()
1287 rc->ndelayed--; in get_rsvd()
1291 node = rb_prev(&es->rb_node); in get_rsvd()
1300 node = rb_next(&right_es->rb_node); in get_rsvd()
1304 while (es && es->es_lblk <= in get_rsvd()
1305 EXT4_LBLK_CFILL(sbi, rc->last_do_lblk)) { in get_rsvd()
1307 rc->ndelayed--; in get_rsvd()
1311 node = rb_next(&es->rb_node); in get_rsvd()
1325 * should be released with the information available in the in get_rsvd()
1326 * extents status tree. in get_rsvd()
1337 last_lclu--; in get_rsvd()
1351 pr = __pr_tree_search(&tree->root, first_lclu); in get_rsvd()
1352 while (pr && pr->lclu <= last_lclu) { in get_rsvd()
1353 rc->ndelayed--; in get_rsvd()
1354 node = rb_next(&pr->rb_node); in get_rsvd()
1355 rb_erase(&pr->rb_node, &tree->root); in get_rsvd()
1364 return rc->ndelayed; in get_rsvd()
1369 * __es_remove_extent - removes block range from extent status tree
1371 * @inode - file containing range
1372 * @lblk - first block in range
1373 * @end - last block in range
1374 * @reserved - number of cluster reservations released
1375 * @prealloc - pre-allocated es to avoid memory allocation failures
1386 struct ext4_es_tree *tree = &EXT4_I(inode)->i_es_tree; in __es_remove_extent() local
1396 if (reserved == NULL || !test_opt(inode->i_sb, DELALLOC)) in __es_remove_extent()
1399 es = __es_tree_search(&tree->root, lblk); in __es_remove_extent()
1402 if (es->es_lblk > end) in __es_remove_extent()
1406 tree->cache_es = NULL; in __es_remove_extent()
1410 orig_es.es_lblk = es->es_lblk; in __es_remove_extent()
1411 orig_es.es_len = es->es_len; in __es_remove_extent()
1412 orig_es.es_pblk = es->es_pblk; in __es_remove_extent()
1414 len1 = lblk > es->es_lblk ? lblk - es->es_lblk : 0; in __es_remove_extent()
1415 len2 = ext4_es_end(es) > end ? ext4_es_end(es) - end : 0; in __es_remove_extent()
1417 es->es_len = len1; in __es_remove_extent()
1428 orig_es.es_len - len2; in __es_remove_extent()
1436 es->es_lblk = orig_es.es_lblk; in __es_remove_extent()
1437 es->es_len = orig_es.es_len; in __es_remove_extent()
1441 es->es_lblk = end + 1; in __es_remove_extent()
1442 es->es_len = len2; in __es_remove_extent()
1445 block = orig_es.es_pblk + orig_es.es_len - len2; in __es_remove_extent()
1451 orig_es.es_len - len1 - len2, &orig_es, &rc); in __es_remove_extent()
1457 count_rsvd(inode, lblk, orig_es.es_len - len1, in __es_remove_extent()
1459 node = rb_next(&es->rb_node); in __es_remove_extent()
1468 count_rsvd(inode, es->es_lblk, es->es_len, es, &rc); in __es_remove_extent()
1469 node = rb_next(&es->rb_node); in __es_remove_extent()
1470 rb_erase(&es->rb_node, &tree->root); in __es_remove_extent()
1479 if (es && es->es_lblk < end + 1) { in __es_remove_extent()
1480 ext4_lblk_t orig_len = es->es_len; in __es_remove_extent()
1482 len1 = ext4_es_end(es) - end; in __es_remove_extent()
1484 count_rsvd(inode, es->es_lblk, orig_len - len1, in __es_remove_extent()
1486 es->es_lblk = end + 1; in __es_remove_extent()
1487 es->es_len = len1; in __es_remove_extent()
1489 block = es->es_pblk + orig_len - len1; in __es_remove_extent()
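The `len1`/`len2` bookkeeping in `__es_remove_extent()` splits an extent around the removed range; the arithmetic in isolation (struct and function names here are illustrative):

```c
#include <stdint.h>

struct es_split {
    uint32_t len1;  /* blocks surviving left of the removed range */
    uint32_t len2;  /* blocks surviving right of it */
};

/*
 * When [lblk, end] is punched out of an extent starting at es_lblk
 * with es_len blocks, len1 keeps what precedes lblk and len2 what
 * follows end - the same expressions __es_remove_extent() computes.
 */
static struct es_split es_split_around(uint32_t es_lblk, uint32_t es_len,
                                       uint32_t lblk, uint32_t end)
{
    uint32_t es_end = es_lblk + es_len - 1;
    struct es_split s;

    s.len1 = lblk > es_lblk ? lblk - es_lblk : 0;
    s.len2 = es_end > end ? es_end - end : 0;
    return s;
}
```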
1502 * ext4_es_remove_extent - removes block range from extent status tree
1504 * @inode - file containing range
1505 * @lblk - first block in range
1506 * @len - number of blocks to remove
1519 if (EXT4_SB(inode->i_sb)->s_mount_state & EXT4_FC_REPLAY) in ext4_es_remove_extent()
1523 es_debug("remove [%u/%u) from extent status tree of inode %lu\n", in ext4_es_remove_extent()
1524 lblk, len, inode->i_ino); in ext4_es_remove_extent()
1529 end = lblk + len - 1; in ext4_es_remove_extent()
1540 write_lock(&EXT4_I(inode)->i_es_lock); in ext4_es_remove_extent()
1544 if (!es->es_len) in ext4_es_remove_extent()
1548 write_unlock(&EXT4_I(inode)->i_es_lock); in ext4_es_remove_extent()
1568 es_stats = &sbi->s_es_stats; in __es_shrink()
1572 spin_lock(&sbi->s_es_lock); in __es_shrink()
1573 nr_to_walk = sbi->s_es_nr_inode; in __es_shrink()
1574 while (nr_to_walk-- > 0) { in __es_shrink()
1575 if (list_empty(&sbi->s_es_list)) { in __es_shrink()
1576 spin_unlock(&sbi->s_es_lock); in __es_shrink()
1579 ei = list_first_entry(&sbi->s_es_list, struct ext4_inode_info, in __es_shrink()
1582 list_move_tail(&ei->i_es_list, &sbi->s_es_list); in __es_shrink()
1588 if (!retried && ext4_test_inode_state(&ei->vfs_inode, in __es_shrink()
1594 if (ei == locked_ei || !write_trylock(&ei->i_es_lock)) { in __es_shrink()
1602 spin_unlock(&sbi->s_es_lock); in __es_shrink()
1605 write_unlock(&ei->i_es_lock); in __es_shrink()
1609 spin_lock(&sbi->s_es_lock); in __es_shrink()
1611 spin_unlock(&sbi->s_es_lock); in __es_shrink()
1627 if (likely(es_stats->es_stats_scan_time)) in __es_shrink()
1628 es_stats->es_stats_scan_time = (scan_time + in __es_shrink()
1629 es_stats->es_stats_scan_time*3) / 4; in __es_shrink()
1631 es_stats->es_stats_scan_time = scan_time; in __es_shrink()
1632 if (scan_time > es_stats->es_stats_max_scan_time) in __es_shrink()
1633 es_stats->es_stats_max_scan_time = scan_time; in __es_shrink()
1634 if (likely(es_stats->es_stats_shrunk)) in __es_shrink()
1635 es_stats->es_stats_shrunk = (nr_shrunk + in __es_shrink()
1636 es_stats->es_stats_shrunk*3) / 4; in __es_shrink()
1638 es_stats->es_stats_shrunk = nr_shrunk; in __es_shrink()
1640 trace_ext4_es_shrink(sbi->s_sb, nr_shrunk, scan_time, in __es_shrink()
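The scan-time and shrunk-count statistics above are smoothed with a 1/4-weight exponential moving average, `avg' = (sample + 3*avg) / 4`, seeded by the first sample; in isolation:

```c
#include <stdint.h>

/*
 * The smoothing used for es_stats_scan_time and es_stats_shrunk in
 * __es_shrink(): weight the new sample 1/4 against the running
 * average, or take the sample as-is the first time around.
 */
static uint64_t ewma_update(uint64_t avg, uint64_t sample)
{
    return avg ? (sample + 3 * avg) / 4 : sample;
}
```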
1651 sbi = shrink->private_data; in ext4_es_count()
1652 nr = percpu_counter_read_positive(&sbi->s_es_stats.es_stats_shk_cnt); in ext4_es_count()
1653 trace_ext4_es_shrink_count(sbi->s_sb, sc->nr_to_scan, nr); in ext4_es_count()
1660 struct ext4_sb_info *sbi = shrink->private_data; in ext4_es_scan()
1661 int nr_to_scan = sc->nr_to_scan; in ext4_es_scan()
1664 ret = percpu_counter_read_positive(&sbi->s_es_stats.es_stats_shk_cnt); in ext4_es_scan()
1665 trace_ext4_es_shrink_scan_enter(sbi->s_sb, nr_to_scan, ret); in ext4_es_scan()
1669 ret = percpu_counter_read_positive(&sbi->s_es_stats.es_stats_shk_cnt); in ext4_es_scan()
1670 trace_ext4_es_shrink_scan_exit(sbi->s_sb, nr_shrunk, ret); in ext4_es_scan()
1676 struct ext4_sb_info *sbi = EXT4_SB((struct super_block *) seq->private); in ext4_seq_es_shrinker_info_show()
1677 struct ext4_es_stats *es_stats = &sbi->s_es_stats; in ext4_seq_es_shrinker_info_show()
1685 spin_lock(&sbi->s_es_lock); in ext4_seq_es_shrinker_info_show()
1686 list_for_each_entry(ei, &sbi->s_es_list, i_es_list) { in ext4_seq_es_shrinker_info_show()
1688 if (max && max->i_es_all_nr < ei->i_es_all_nr) in ext4_seq_es_shrinker_info_show()
1693 spin_unlock(&sbi->s_es_lock); in ext4_seq_es_shrinker_info_show()
1696 percpu_counter_sum_positive(&es_stats->es_stats_all_cnt), in ext4_seq_es_shrinker_info_show()
1697 percpu_counter_sum_positive(&es_stats->es_stats_shk_cnt)); in ext4_seq_es_shrinker_info_show()
1699 percpu_counter_sum_positive(&es_stats->es_stats_cache_hits), in ext4_seq_es_shrinker_info_show()
1700 percpu_counter_sum_positive(&es_stats->es_stats_cache_misses)); in ext4_seq_es_shrinker_info_show()
1705 div_u64(es_stats->es_stats_scan_time, 1000)); in ext4_seq_es_shrinker_info_show()
1706 seq_printf(seq, " %lu shrunk objects\n", es_stats->es_stats_shrunk); in ext4_seq_es_shrinker_info_show()
1711 max->vfs_inode.i_ino, max->i_es_all_nr, max->i_es_shk_nr, in ext4_seq_es_shrinker_info_show()
1712 div_u64(es_stats->es_stats_max_scan_time, 1000)); in ext4_seq_es_shrinker_info_show()
1723 INIT_LIST_HEAD(&sbi->s_es_list); in ext4_es_register_shrinker()
1724 sbi->s_es_nr_inode = 0; in ext4_es_register_shrinker()
1725 spin_lock_init(&sbi->s_es_lock); in ext4_es_register_shrinker()
1726 sbi->s_es_stats.es_stats_shrunk = 0; in ext4_es_register_shrinker()
1727 err = percpu_counter_init(&sbi->s_es_stats.es_stats_cache_hits, 0, in ext4_es_register_shrinker()
1731 err = percpu_counter_init(&sbi->s_es_stats.es_stats_cache_misses, 0, in ext4_es_register_shrinker()
1735 sbi->s_es_stats.es_stats_scan_time = 0; in ext4_es_register_shrinker()
1736 sbi->s_es_stats.es_stats_max_scan_time = 0; in ext4_es_register_shrinker()
1737 err = percpu_counter_init(&sbi->s_es_stats.es_stats_all_cnt, 0, GFP_KERNEL); in ext4_es_register_shrinker()
1740 err = percpu_counter_init(&sbi->s_es_stats.es_stats_shk_cnt, 0, GFP_KERNEL); in ext4_es_register_shrinker()
1744 sbi->s_es_shrinker = shrinker_alloc(0, "ext4-es:%s", sbi->s_sb->s_id); in ext4_es_register_shrinker()
1745 if (!sbi->s_es_shrinker) { in ext4_es_register_shrinker()
1746 err = -ENOMEM; in ext4_es_register_shrinker()
1750 sbi->s_es_shrinker->scan_objects = ext4_es_scan; in ext4_es_register_shrinker()
1751 sbi->s_es_shrinker->count_objects = ext4_es_count; in ext4_es_register_shrinker()
1752 sbi->s_es_shrinker->private_data = sbi; in ext4_es_register_shrinker()
1754 shrinker_register(sbi->s_es_shrinker); in ext4_es_register_shrinker()
1758 percpu_counter_destroy(&sbi->s_es_stats.es_stats_shk_cnt); in ext4_es_register_shrinker()
1760 percpu_counter_destroy(&sbi->s_es_stats.es_stats_all_cnt); in ext4_es_register_shrinker()
1762 percpu_counter_destroy(&sbi->s_es_stats.es_stats_cache_misses); in ext4_es_register_shrinker()
1764 percpu_counter_destroy(&sbi->s_es_stats.es_stats_cache_hits); in ext4_es_register_shrinker()
1770 percpu_counter_destroy(&sbi->s_es_stats.es_stats_cache_hits); in ext4_es_unregister_shrinker()
1771 percpu_counter_destroy(&sbi->s_es_stats.es_stats_cache_misses); in ext4_es_unregister_shrinker()
1772 percpu_counter_destroy(&sbi->s_es_stats.es_stats_all_cnt); in ext4_es_unregister_shrinker()
1773 percpu_counter_destroy(&sbi->s_es_stats.es_stats_shk_cnt); in ext4_es_unregister_shrinker()
1774 shrinker_free(sbi->s_es_shrinker); in ext4_es_unregister_shrinker()
1778 * Shrink extents in given inode from ei->i_es_shrink_lblk till end. Scan at
1781 * Return 0 if we hit end of tree / interval, 1 if we exhausted nr_to_scan.
1783 * ei->i_es_shrink_lblk to where we should continue scanning.
1788 struct inode *inode = &ei->vfs_inode; in es_do_reclaim_extents()
1789 struct ext4_es_tree *tree = &ei->i_es_tree; in es_do_reclaim_extents() local
1793 es = __es_tree_search(&tree->root, ei->i_es_shrink_lblk); in es_do_reclaim_extents()
1798 if (es->es_lblk > end) { in es_do_reclaim_extents()
1799 ei->i_es_shrink_lblk = end + 1; in es_do_reclaim_extents()
1803 (*nr_to_scan)--; in es_do_reclaim_extents()
1804 node = rb_next(&es->rb_node); in es_do_reclaim_extents()
1813 rb_erase(&es->rb_node, &tree->root); in es_do_reclaim_extents()
1821 ei->i_es_shrink_lblk = es->es_lblk; in es_do_reclaim_extents()
1824 ei->i_es_shrink_lblk = 0; in es_do_reclaim_extents()
1830 struct inode *inode = &ei->vfs_inode; in es_reclaim_extents()
1832 ext4_lblk_t start = ei->i_es_shrink_lblk; in es_reclaim_extents()
1836 if (ei->i_es_shk_nr == 0) in es_reclaim_extents()
1841 ext4_warning(inode->i_sb, "forced shrink of precached extents"); in es_reclaim_extents()
1845 es_do_reclaim_extents(ei, start - 1, nr_to_scan, &nr_shrunk); in es_reclaim_extents()
1847 ei->i_es_tree.cache_es = NULL; in es_reclaim_extents()
1860 struct ext4_es_tree *tree; in ext4_clear_inode_es() local
1863 write_lock(&ei->i_es_lock); in ext4_clear_inode_es()
1864 tree = &EXT4_I(inode)->i_es_tree; in ext4_clear_inode_es()
1865 tree->cache_es = NULL; in ext4_clear_inode_es()
1866 node = rb_first(&tree->root); in ext4_clear_inode_es()
1871 rb_erase(&es->rb_node, &tree->root); in ext4_clear_inode_es()
1876 write_unlock(&ei->i_es_lock); in ext4_clear_inode_es()
static void ext4_print_pending_tree(struct inode *inode)
{
	struct ext4_pending_tree *tree;
	struct rb_node *node;
	struct pending_reservation *pr;

	printk(KERN_DEBUG "pending reservations for inode %lu:", inode->i_ino);
	tree = &EXT4_I(inode)->i_pending_tree;
	node = rb_first(&tree->root);
	while (node) {
		pr = rb_entry(node, struct pending_reservation, rb_node);
		printk(KERN_DEBUG " %u", pr->lclu);
		node = rb_next(node);
	}
	printk(KERN_DEBUG "\n");
}
int __init ext4_init_pending(void)
{
	/* ... create the pending_reservation slab cache ... */
	if (ext4_pending_cachep == NULL)
		return -ENOMEM;
	return 0;
}
void ext4_init_pending_tree(struct ext4_pending_tree *tree)
{
	tree->root = RB_ROOT;
}
/*
 * __get_pending - retrieve a pointer to a pending reservation
 *
 * @inode - file containing the pending cluster reservation
 * @lclu - logical cluster of interest
 *
 * Returns a pointer to a pending reservation if it's a member of
 * the set, and NULL if not.  Must be called holding i_es_lock.
 */
static struct pending_reservation *__get_pending(struct inode *inode,
						 ext4_lblk_t lclu)
{
	struct ext4_pending_tree *tree;
	struct rb_node *node;
	struct pending_reservation *pr = NULL;

	tree = &EXT4_I(inode)->i_pending_tree;
	node = (&tree->root)->rb_node;

	while (node) {
		pr = rb_entry(node, struct pending_reservation, rb_node);
		if (lclu < pr->lclu)
			node = node->rb_left;
		else if (lclu > pr->lclu)
			node = node->rb_right;
		else if (lclu == pr->lclu)
			return pr;
	}
	return NULL;
}
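__get_pending() is a standard ordered-tree lookup: descend left or right on a three-way key compare until the cluster is found or the subtree runs out. A self-contained user-space sketch of the same descent, with a plain binary search tree in place of the rb-tree (`toy_*` names are illustrative, not kernel API):

```c
#include <assert.h>
#include <stddef.h>

struct toy_pending {
	unsigned int lclu;			/* logical cluster number */
	struct toy_pending *left, *right;
};

static struct toy_pending *toy_get_pending(struct toy_pending *root,
					   unsigned int lclu)
{
	struct toy_pending *node = root;

	while (node) {
		if (lclu < node->lclu)
			node = node->left;
		else if (lclu > node->lclu)
			node = node->right;
		else
			return node;	/* cluster has a pending reservation */
	}
	return NULL;
}
```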
/*
 * __insert_pending - adds a pending cluster reservation to the set of
 *                    pending reservations
 *
 * @inode - file containing the cluster
 * @lblk - logical block in the cluster to be added
 * @prealloc - preallocated pending entry
 *
 * Returns 1 on successful insertion and -ENOMEM on failure.  If the
 * pending reservation is already in the set, returns successfully.
 */
static int __insert_pending(struct inode *inode, ext4_lblk_t lblk,
			    struct pending_reservation **prealloc)
{
	struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
	struct ext4_pending_tree *tree = &EXT4_I(inode)->i_pending_tree;
	struct rb_node **p = &tree->root.rb_node;
	struct rb_node *parent = NULL;
	struct pending_reservation *pr;
	ext4_lblk_t lclu;
	int ret = 0;

	lclu = EXT4_B2C(sbi, lblk);
	/* search to find parent for insertion */
	while (*p) {
		parent = *p;
		pr = rb_entry(parent, struct pending_reservation, rb_node);

		if (lclu < pr->lclu) {
			p = &(*p)->rb_left;
		} else if (lclu > pr->lclu) {
			p = &(*p)->rb_right;
		} else {
			/* pending reservation already inserted */
			goto out;
		}
	}

	if (*prealloc == NULL) {
		/* ... allocate a new entry; on failure: ... */
			ret = -ENOMEM;
			goto out;
	} else {
		pr = *prealloc;
		*prealloc = NULL;
	}
	pr->lclu = lclu;

	rb_link_node(&pr->rb_node, parent, p);
	rb_insert_color(&pr->rb_node, &tree->root);
	/* ... */
out:
	return ret;
}
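__insert_pending() descends the tree while remembering the link that will point at the new node, bailing out if the cluster already has a reservation. A sketch of that descent with a plain (unbalanced) BST; the real code additionally rebalances via rb_insert_color() and draws the node from a preallocated entry, both omitted here, and the `toy_*` names are hypothetical:

```c
#include <assert.h>
#include <stddef.h>

struct toy_pr {
	unsigned int lclu;
	struct toy_pr *left, *right;
};

/* Returns 1 if inserted, 0 if lclu was already present in the set. */
static int toy_insert_pending(struct toy_pr **root, struct toy_pr *pr)
{
	struct toy_pr **p = root;	/* link that will receive the node */

	while (*p) {
		if (pr->lclu < (*p)->lclu)
			p = &(*p)->left;
		else if (pr->lclu > (*p)->lclu)
			p = &(*p)->right;
		else
			return 0;	/* already in the set */
	}
	pr->left = pr->right = NULL;
	*p = pr;			/* cf. rb_link_node() */
	return 1;
}
```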
/*
 * __remove_pending - removes a pending cluster reservation from the set
 *                    of pending reservations
 *
 * @inode - file containing the cluster
 * @lblk - logical block in the pending cluster reservation to be removed
 *
 * Returns successfully if pending reservation is not a member of the set.
 */
static void __remove_pending(struct inode *inode, ext4_lblk_t lblk)
{
	struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
	struct pending_reservation *pr;
	struct ext4_pending_tree *tree;

	pr = __get_pending(inode, EXT4_B2C(sbi, lblk));
	if (pr != NULL) {
		tree = &EXT4_I(inode)->i_pending_tree;
		rb_erase(&pr->rb_node, &tree->root);
		/* ... free the entry ... */
	}
}
/*
 * ext4_remove_pending - removes a pending cluster reservation from the set
 *                       of pending reservations
 *
 * @inode - file containing the cluster
 * @lblk - logical block in the pending cluster reservation to be removed
 */
void ext4_remove_pending(struct inode *inode, ext4_lblk_t lblk)
{
	struct ext4_inode_info *ei = EXT4_I(inode);

	write_lock(&ei->i_es_lock);
	__remove_pending(inode, lblk);
	write_unlock(&ei->i_es_lock);
}
/*
 * ext4_is_pending - determine whether a cluster has a pending reservation
 *                   on it
 *
 * @inode - file containing the cluster
 * @lblk - logical block in the cluster
 *
 * Returns true if there's a pending reservation for the cluster in the
 * set of pending reservations, and false if not.
 */
bool ext4_is_pending(struct inode *inode, ext4_lblk_t lblk)
{
	struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
	struct ext4_inode_info *ei = EXT4_I(inode);
	bool ret;

	read_lock(&ei->i_es_lock);
	ret = (__get_pending(inode, EXT4_B2C(sbi, lblk)) != NULL);
	read_unlock(&ei->i_es_lock);

	return ret;
}
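ext4_is_pending() converts the logical block to its cluster with EXT4_B2C() before the lookup. Under bigalloc a cluster is a power-of-two number of blocks, so the conversion is a right shift by the cluster bits; a sketch with illustrative values (`toy_b2c` is a stand-in, not the kernel macro):

```c
#include <assert.h>

/*
 * Map a logical block number to its logical cluster number, assuming a
 * cluster is 2^cluster_bits blocks (as with ext4's s_cluster_bits).
 */
static unsigned int toy_b2c(unsigned int lblk, unsigned int cluster_bits)
{
	return lblk >> cluster_bits;
}
```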
/*
 * ext4_es_insert_delayed_extent - adds some delayed blocks to the extents
 *                                 status tree, adding a pending reservation
 *                                 where needed
 *
 * @inode - file containing the newly added block
 * @lblk - start logical block to be added
 * @len - length of blocks to be added
 * @lclu_allocated/end_allocated - indicates whether a physical cluster has
 *                                 been allocated for the logical cluster
 *                                 that contains the start/end block. Note that
 *                                 end_allocated should always be false
 *                                 if the start and the end block are in the
 *                                 same cluster
 */
void ext4_es_insert_delayed_extent(struct inode *inode, ext4_lblk_t lblk,
				   ext4_lblk_t len, bool lclu_allocated,
				   bool end_allocated)
{
	struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
	ext4_lblk_t end = lblk + len - 1;
	/* ... */

	if (EXT4_SB(inode->i_sb)->s_mount_state & EXT4_FC_REPLAY)
		return;

	es_debug("add [%u/%u) delayed to extent status tree of inode %lu\n",
		 lblk, len, inode->i_ino);
	/* ... preallocate extent status and pending reservation entries ... */

	write_lock(&EXT4_I(inode)->i_es_lock);
	/*
	 * ... remove overlapping extents, insert the delayed extent, and
	 * add pending reservations; preallocated entries that were not
	 * consumed are freed:
	 */
	if (!es1->es_len)
		__es_free_extent(es1);
	/* ... */
	if (!es2->es_len)
		__es_free_extent(es2);
	/* ... */
	write_unlock(&EXT4_I(inode)->i_es_lock);
	/* ... */
}
/*
 * __revise_pending - makes, cancels, or leaves unchanged pending cluster
 *                    reservations for a specified block range
 *
 * @inode - file containing the range
 * @lblk - logical block defining the start of range
 * @len - length of range in blocks
 * @prealloc - preallocated pending entry
 *
 * Used after a newly allocated extent is added to the extents status tree.
 * Requires that the extents in the range have either written or unwritten
 * status.  Must be called while holding i_es_lock.  Returns the number of
 * pending reservations inserted, or -ENOMEM on failure.
 */
static int __revise_pending(struct inode *inode, ext4_lblk_t lblk,
			    ext4_lblk_t len,
			    struct pending_reservation **prealloc)
{
	struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
	ext4_lblk_t end = lblk + len - 1;
	ext4_lblk_t first, last;
	/* ... */

	/*
	 * Two cases - block range within single cluster and block range
	 * spanning two or more clusters.
	 * [...] cluster boundary, requiring that any pre-existing pending
	 * [...] inserted in the extents status tree due to ENOSPC.
	 */
	first = EXT4_LBLK_CMASK(sbi, lblk);
	/* single cluster: scan for delayed blocks outside the range, e.g. */
	/* ...				first, lblk - 1); */
	/* ...	EXT4_LBLK_CMASK(sbi, end) + sbi->s_cluster_ratio - 1; */

	/* two or more clusters: check the partial clusters at both ends */
	/* ...				first, lblk - 1); */
	last = EXT4_LBLK_CMASK(sbi, end) + sbi->s_cluster_ratio - 1;
	/* ... */
}
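The boundary arithmetic above relies on EXT4_LBLK_CMASK() rounding a logical block down to the start of its cluster; the last block of the cluster containing the end of the range is then CMASK(end) + cluster_ratio - 1. A sketch of that math under the power-of-two cluster-ratio assumption (the `toy_*` helpers are illustrative, not the kernel macros):

```c
#include <assert.h>

/* Round a logical block down to the first block of its cluster. */
static unsigned int toy_lblk_cmask(unsigned int lblk, unsigned int ratio)
{
	return lblk & ~(ratio - 1);	/* ratio must be a power of two */
}

/*
 * Compute the first block of the cluster containing lblk and the last
 * block of the cluster containing end, as __revise_pending() does when
 * checking for delayed blocks outside [lblk, end].
 */
static void toy_cluster_bounds(unsigned int lblk, unsigned int end,
			       unsigned int ratio,
			       unsigned int *first, unsigned int *last)
{
	*first = toy_lblk_cmask(lblk, ratio);
	*last = toy_lblk_cmask(end, ratio) + ratio - 1;
}
```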