Lines Matching full:we
41 * throughout the pool. This means that each TXG we will have to append some
42 * FREE records to almost every metaslab. With log space maps, we hold their
45 * more unflushed changes accumulate in memory, we flush a selected group
47 * when loading the pool. Flushing a metaslab to disk relieves memory as we
56 * is activated when we create the first log space map and remains active
70 * the metaslab hasn't had its changes flushed. During import, we use this
72 * from a TXG before ms_unflushed_txg. At that point, we also populate its
74 * we flush that metaslab.
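To make the ms_unflushed_txg bookkeeping concrete, here is a minimal sketch of what a flush does with it; metaslab_flush_sketch() is a hypothetical simplification (the real flush in metaslab.c also updates accounting, the summary, and feature state, and newer OpenZFS spells the range-tree API zfs_range_tree_*):

    static void
    metaslab_flush_sketch(metaslab_t *msp, dmu_tx_t *tx)
    {
        /* Persist the in-memory unflushed changes into the metaslab's
         * own space map. */
        space_map_write(msp->ms_sm, msp->ms_unflushed_allocs,
            SM_ALLOC, SM_NO_VDEVID, tx);
        space_map_write(msp->ms_sm, msp->ms_unflushed_frees,
            SM_FREE, SM_NO_VDEVID, tx);

        /* The trees' contents are now on disk; drop them to relieve
         * memory. */
        range_tree_vacate(msp->ms_unflushed_allocs, NULL, NULL);
        range_tree_vacate(msp->ms_unflushed_frees, NULL, NULL);

        /* Nothing before this TXG needs replaying for this metaslab
         * on the next import. */
        msp->ms_unflushed_txg = dmu_tx_get_txg(tx);
    }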
80 * end of the TXG and will be destroyed when it becomes fully obsolete. We
84 * doesn't have the changes from that log and we can therefore destroy it.
91 * reasons. First of all, it is used during flushing where we try to flush
97 * enabled, as we don't immediately add all of the pool's metaslabs but we
99 * we do that is to ease these pools into the behavior of the flushing
126 * * we modify or pop entries from its head when we flush metaslabs
127 * * we modify or append entries to its tail when we sync changes.
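A loose sketch of the shape of those two update paths (the entry layout and helper names are simplified guesses, not the exact spa_log_summary code):

    typedef struct log_summary_entry {
        uint64_t lse_start;     /* earliest TXG covered by the entry */
        uint64_t lse_blkcount;  /* log blocks accounted to the entry */
        uint64_t lse_mscount;   /* metaslabs last flushed in its range */
        list_node_t lse_node;
    } log_summary_entry_t;

    /* Tail path (syncing changes): account this TXG's new log blocks. */
    static void
    summary_append_sketch(list_t *summary, uint64_t txg, uint64_t nblocks)
    {
        log_summary_entry_t *e = list_tail(summary);

        if (e == NULL) {        /* the real code also caps an entry's span */
            e = kmem_zalloc(sizeof (*e), KM_SLEEP);
            e->lse_start = txg;
            list_insert_tail(summary, e);
        }
        e->lse_blkcount += nblocks;
    }

    /* Head path (flushing a metaslab): one fewer metaslab points at the
     * oldest entry; once none do, its blocks become reclaimable. */
    static void
    summary_pop_sketch(list_t *summary)
    {
        log_summary_entry_t *e = list_head(summary);

        ASSERT3U(e->lse_mscount, !=, 0);
        e->lse_mscount--;
    }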
138 * We keep track of the memory used by the unflushed trees from all the
139 * metaslabs [see sus_memused of spa_unflushed_stats] and we ensure that it
142 * spa_log_exceeds_memlimit()]. When we see that the memory usage of the
143 * unflushed changes is passing that threshold, we flush metaslabs, which
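A sketch of what the memory-limit check can look like, modeled on spa_log_exceeds_memlimit(); the tunables zfs_unflushed_max_mem_amt and zfs_unflushed_max_mem_ppm are real, the body is a simplification:

    static boolean_t
    spa_log_exceeds_memlimit_sketch(spa_t *spa)
    {
        uint64_t memused = spa->spa_unflushed_stats.sus_memused;

        /* Absolute cap on memory held by the unflushed trees. */
        if (memused > zfs_unflushed_max_mem_amt)
            return (B_TRUE);

        /* Relative cap: parts-per-million of system memory. */
        uint64_t allowed = ((physmem * PAGESIZE) *
            zfs_unflushed_max_mem_ppm) / 1000000;
        if (memused > allowed)
            return (B_TRUE);

        return (B_FALSE);
    }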
147 * We try to keep the total number of blocks in the log space maps in check
148 * so the log doesn't grow indefinitely and we don't induce a lot of overhead
149 * when loading the pool. At the same time we don't want to flush a lot of
151 * As a result we set a limit on the number of blocks that we think it's
155 * In order to stay below the block limit every TXG we have to estimate how
156 * many metaslabs we need to flush based on the current rate of incoming blocks
158 * the question of how many metaslabs we need to flush in order to get rid
159 * of at least X log space map blocks. We can answer this question
162 * mentioned above comes in handy as it reduces the number of things that we have
164 * to its aggregation of data). So with that in mind, we project the incoming
166 * metaslabs would we need to flush from now in order to avoid exceeding our
167 * block limit at different points in the future (granted that we would keep
168 * flushing the same number of metaslabs for every TXG). Then we take the
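For illustration, with made-up numbers: say the block limit is 1,000, the log currently holds 900 blocks, and we project 100 incoming blocks per TXG. Three TXGs out the log would sit at roughly 1,200 blocks, so within those three TXGs our flushing has to let us destroy old log space maps worth about 200 blocks, i.e. roughly 67 blocks' worth per TXG; the summary translates that block target into a metaslab count. Repeating the projection over successive horizons and taking the most demanding answer is the idea the estimation lines later in this listing walk through.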
177 * block size as we expect to be writing a lot of data to them at
210 * map's block size as we accumulate more changes per flush.
216 * As a rule of thumb we default this tunable to 400% based on the following:
221 * obsolete entries, and the most recent one the least). With this we could
230 * [3] Even if [1] and [2] are slightly less than 2 each, we haven't taken into
234 * metaslab space map entries). Depending on the workload, we've seen ~1.8
236 * ~600%. Since most of these estimates, though, are workload-dependent, we
239 * Thus we could say that even in the worst
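Rounding [1] and [2] up to a factor of 2 each gives the back-of-the-envelope product behind the default:

    2 ([1]) * 2 ([2]) = 4x the baseline, i.e. 400%

with the slack between "slightly less than 2" and a full 2 in each factor absorbing the overheads listed in [3] (and the observed worst cases reaching toward ~600%).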
242 * That said, regardless of the number of metaslabs in the pool we need to
249 * If the number of metaslabs is small and our incoming rate is high, we could
250 * get into a situation where we are flushing all our metaslabs every TXG. Thus
251 * we always allow at least this many log blocks.
257 * terms of performance. Thus we have a hard limit on the size of the log in
263 * Also, we have a hard limit on the size of the log in terms of dirty TXGs.
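Putting the percentage, the floor, and the hard cap together, the limit computation reduces to a clamp; a sketch using the real zfs_unflushed_log_block_{pct,min,max} tunables (the base quantity the percentage applies to is elided from the matched lines, so the parameter here is illustrative):

    static uint64_t
    blocklimit_sketch(uint64_t base)
    {
        uint64_t limit = base * zfs_unflushed_log_block_pct / 100;

        /* Floor: small pools with high incoming rates would otherwise
         * flush every metaslab every TXG.  Cap: bound the worst-case
         * pool load time. */
        return (MIN(MAX(limit, zfs_unflushed_log_block_min),
            zfs_unflushed_log_block_max));
    }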
277 * Setting this to 0 has no effect since if the pool is idle we won't even be
278 * creating log space maps and therefore we won't be flushing. On the other
282 * The point of this tunable is to be used in extreme cases where we really
288 * Tunable that specifies how far in the past we want to look when trying to
292 * effect of the incoming rates from the most recent TXGs as we take the
293 * average over all the blocks that we walk
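A sketch of that bounded backward walk, modeled on spa_estimate_incoming_log_blocks(); spa_sm_logs_by_txg, sls_txg, sls_nblocks, and zfs_max_log_walking are the real names, the rest is simplified:

    static uint64_t
    estimate_incoming_sketch(spa_t *spa)
    {
        uint64_t steps = 0, sum = 0;

        for (spa_log_sm_t *sls = avl_last(&spa->spa_sm_logs_by_txg);
            sls != NULL && steps < zfs_max_log_walking;
            sls = AVL_PREV(&spa->spa_sm_logs_by_txg, sls)) {
            /* The currently-syncing log is still growing. */
            if (sls->sls_txg == spa_syncing_txg(spa))
                continue;
            sum += sls->sls_nblocks;
            steps++;
        }
        return (steps > 0 ? DIV_ROUND_UP(sum, steps) : 0);
    }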
409 * We typically flush the oldest flushed metaslab so the first (and oldest)
411 * we may flush the second oldest one which may be part of an entry later in
412 * the summary. Moreover, if we call into this function from metaslab_fini()
413 * the metaslabs probably won't be ordered by ms_unflushed_txg. Thus we ask
414 * for a txg as an argument so we can locate the appropriate summary entry for
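The txg argument boils down to a lookup like the following sketch (entries are ordered by their starting TXG; the entry layout matches the sketch after the summary-list description above):

    static log_summary_entry_t *
    summary_entry_for_txg_sketch(spa_t *spa, uint64_t txg)
    {
        log_summary_entry_t *target = NULL;

        /* Pick the last entry that starts at or before txg. */
        for (log_summary_entry_t *e = list_head(&spa->spa_log_summary);
            e != NULL; e = list_next(&spa->spa_log_summary, e)) {
            if (e->lse_start > txg)
                break;
            target = e;
        }
        return (target);
    }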
421 * We don't track summary data for read-only pools and this function in spa_log_summary_decrement_mscount()
437 * We didn't find a summary entry for this metaslab. We must be in spa_log_summary_decrement_mscount()
451 * Update the log summary information to reflect the fact that we destroyed
452 * old log space maps. Since we can only destroy the oldest log space maps,
453 * we decrement the block count of the oldest summary entry and potentially
462 * There are certain scenarios, though, that don't work exactly like that, so we
465 * Scenario [1]: It is possible that after we flushed the oldest flushed
466 * metaslab and we destroyed the oldest log space map, more recent logs had 0
467 * metaslabs pointing to them so we got rid of them too. This can happen due
469 * flushed metaslab was loading but we kept flushing more recently flushed
471 * we always iterate from the beginning of the summary and if blocks_gone is
472 * bigger than the block_count of the current entry, we free that entry (we
473 * expect its metaslab count to be zero), we decrement blocks_gone and move on to
483 * because they became obsolete after the removal. Thus, iterating as we did
486 * Scenario [3]: At times we decide to flush all the metaslabs in the pool
487 * in one TXG (either because we are exporting the pool or because our flushing
491 * we flush, entries in the summary are also destroyed. This brings a weird
492 * corner case when we flush the last metaslab and the log space map of the
494 * are older. When that happens we are eventually left with this one last
497 * metaslabs in the pool as they all got flushed). Under this scenario we can't
500 * we close the syncing log space map). Thus we just decrement its current
502 * its metaslab count will be decremented over time as we call metaslab_fini()
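A sketch of the head-first iteration that scenarios [1] through [3] are defending; the real spa_log_summary_decrement_blkcount() is more careful about the entry of the currently-syncing TXG (scenario [3]):

    static void
    decrement_blkcount_sketch(spa_t *spa, uint64_t blocks_gone)
    {
        log_summary_entry_t *e = list_head(&spa->spa_log_summary);

        while (blocks_gone > 0 && e != NULL) {
            if (e->lse_blkcount > blocks_gone) {
                e->lse_blkcount -= blocks_gone;
                blocks_gone = 0;
            } else {
                /* This entry's blocks are all gone; no metaslab
                 * should point at it anymore. */
                VERIFY0(e->lse_mscount);
                blocks_gone -= e->lse_blkcount;
                list_remove(&spa->spa_log_summary, e);
                kmem_free(e, sizeof (*e));
                e = list_head(&spa->spa_log_summary);
            }
        }
    }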
540 * Ensure that there is no way we are trying to remove more blocks in spa_log_summary_decrement_blkcount()
555 * We must be at the teardown of a spa_load() attempt that in spa_log_sm_decrement_mscount()
626 * we flush to satisfy our block heuristic for the log spacemap
632 * flushes we would need to do for future TXGs individually to
651 * TXGs in the future we are so we can make our estimations. in spa_estimate_metaslabs_to_flush()
656 * This variable tells us how much room we have until we hit in spa_estimate_metaslabs_to_flush()
657 * our limit. When it goes negative, it means that we've exceeded in spa_estimate_metaslabs_to_flush()
658 * our limit and we need to flush. in spa_estimate_metaslabs_to_flush()
660 * Note that since we start at the first TXG in the future (i.e. in spa_estimate_metaslabs_to_flush()
661 * txgs_in_future starts from 1) we already decrement this in spa_estimate_metaslabs_to_flush()
674 * keep the log size within the limit when we reach txgs_in_future. in spa_estimate_metaslabs_to_flush()
682 * For our estimations we only look as far in the future in spa_estimate_metaslabs_to_flush()
689 * If there is still room before we exceed our limit in spa_estimate_metaslabs_to_flush()
691 * based on the incoming rate until we exceed it. in spa_estimate_metaslabs_to_flush()
705 * At this point we're far enough into the future where in spa_estimate_metaslabs_to_flush()
706 * the limit was just exceeded and we flush metaslabs in spa_estimate_metaslabs_to_flush()
717 * we've done so far over the number of TXGs in the in spa_estimate_metaslabs_to_flush()
718 * future that we are. The idea here is to estimate in spa_estimate_metaslabs_to_flush()
719 * the average number of flushes that we should do in spa_estimate_metaslabs_to_flush()
720 * every TXG so that when we are that many TXGs in the in spa_estimate_metaslabs_to_flush()
721 * future we stay under the limit. in spa_estimate_metaslabs_to_flush()
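Pulling the variables above together, the core of the projection can be sketched like this (a simplification of spa_estimate_metaslabs_to_flush(); the real function also bounds the horizon, honors the dirty-TXG limit, and handles empty summaries):

    static uint64_t
    estimate_flushes_sketch(spa_t *spa)
    {
        uint64_t incoming = estimate_incoming_sketch(spa);

        /* Room until the block limit; negative means we must flush.
         * txgs_in_future starts at 1, so the first future TXG's
         * incoming blocks are already subtracted here. */
        int64_t avail = spa_log_sm_blocklimit(spa) -
            spa_log_sm_nblocks(spa) - incoming;

        uint64_t flushes = 0, estimate = 0;
        log_summary_entry_t *e = list_head(&spa->spa_log_summary);

        for (uint64_t txgs_in_future = 1; e != NULL; txgs_in_future++) {
            /* While this point in the future exceeds the limit,
             * "flush" whole summary entries off the log's head. */
            while (avail < 0 && e != NULL) {
                avail += e->lse_blkcount;
                flushes += e->lse_mscount;
                e = list_next(&spa->spa_log_summary, e);
            }

            /* Average the flushes done so far over how many TXGs
             * out we are; keep the most demanding rate seen. */
            estimate = MAX(estimate,
                DIV_ROUND_UP(flushes, txgs_in_future));

            avail -= incoming;  /* next TXG's incoming blocks */
        }
        return (estimate);
    }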
767 * If we don't have any metaslabs with unflushed changes in spa_flush_metaslabs()
774 * During SPA export we leave a few empty TXGs to go by [see in spa_flush_metaslabs()
781 * we try to flush all the metaslabs for that TXG before in spa_flush_metaslabs()
782 * exporting the pool, thus we ensure that we didn't get a in spa_flush_metaslabs()
783 * request to flush everything before we attempt to return in spa_flush_metaslabs()
792 * We need to generate a log space map before flushing because this in spa_flush_metaslabs()
797 * That is not to say that we may generate a log space map when we in spa_flush_metaslabs()
798 * don't need it. If we are flushing metaslabs, that means that we in spa_flush_metaslabs()
799 * were going to write changes to disk anyway, so even if we were in spa_flush_metaslabs()
806 * This variable tells us how many metaslabs we want to flush based in spa_flush_metaslabs()
808 * of log space map feature). We also decrement this as we flush in spa_flush_metaslabs()
823 * Ideally we would iterate through spa_metaslabs_by_flushed in spa_flush_metaslabs()
824 * using only one variable (curr). We can't do that because in spa_flush_metaslabs()
827 * Thus we always keep track of the original next node of the in spa_flush_metaslabs()
836 * If this metaslab has been flushed this txg then we've done in spa_flush_metaslabs()
843 * If we are done flushing for the block heuristic and the in spa_flush_metaslabs()
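A sketch of the traversal discipline those notes describe (metaslab_flush() and spa_metaslabs_by_flushed are real; want_to_flush stands in for the block-heuristic counter):

    static void
    flush_loop_sketch(spa_t *spa, uint64_t want_to_flush, dmu_tx_t *tx)
    {
        metaslab_t *next = avl_first(&spa->spa_metaslabs_by_flushed);

        while (want_to_flush > 0 && next != NULL) {
            metaslab_t *curr = next;

            /* Capture the successor first: a successful flush
             * updates curr's ms_unflushed_txg, which re-sorts it
             * within the tree and would derail the traversal. */
            next = AVL_NEXT(&spa->spa_metaslabs_by_flushed, curr);

            /* Already flushed this TXG: we've gone full circle. */
            if (curr->ms_unflushed_txg == dmu_tx_get_txg(tx))
                break;

            if (metaslab_flush(curr, tx))
                want_to_flush--;
        }
    }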
885 * Note that we can't assert that sls_mscount is not 0, in spa_sync_close_syncing_log_sm()
887 * in spa_metaslabs_by_flushed is loading and we were in spa_sync_close_syncing_log_sm()
899 * At this point we tried to flush as many metaslabs as we in spa_sync_close_syncing_log_sm()
986 * We pass UINT64_MAX as the space map's representation size in spa_generate_syncing_log_sm()
989 * to being more restrictive (given that we're already going in spa_generate_syncing_log_sm()
1050 * import code path. In general, we would have placed a in spa_ld_log_sm_metadata()
1053 * but since this is the import code path we can be a bit more in spa_ld_log_sm_metadata()
1054 * lenient. Thus, for DEBUG builds we always cause a panic, while in spa_ld_log_sm_metadata()
1055 * in production we log the error and just fail the import. in spa_ld_log_sm_metadata()
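That policy tends to reduce to the following pattern (a sketch; the message strings and the validation condition are illustrative): an assertion that trips only on DEBUG builds, followed by a graceful failure path for production:

    static int
    validate_metadata_sketch(spa_t *spa, metaslab_t *ms)
    {
        if (ms == NULL) {
            ASSERT(B_FALSE);    /* DEBUG builds: panic here */

            /* Production builds: log and fail the import. */
            zfs_dbgmsg("inconsistent log space map metadata");
            spa_load_failed(spa, "inconsistent log space map "
                "metadata");
            return (SET_ERROR(EFAULT));
        }
        return (0);
    }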
1100 * If we have already flushed entries for this TXG to this in spa_ld_log_sm_cb()
1101 * metaslab's space map, then ignore it. Note that we flush in spa_ld_log_sm_cb()
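The filter at the heart of the replay callback can be sketched as follows (modeled on spa_ld_log_sm_cb(); entry-to-metaslab resolution is elided, and the real code cancels opposing ALLOC/FREE ranges rather than blindly adding):

    static int
    replay_entry_sketch(metaslab_t *ms, space_map_entry_t *sme,
        uint64_t entry_txg)
    {
        /* The metaslab was flushed in a later TXG, so its own space
         * map already contains this change; replaying it here would
         * apply it twice. */
        if (entry_txg < ms->ms_unflushed_txg)
            return (0);

        /* Rebuild the in-memory unflushed trees. */
        if (sme->sme_type == SM_ALLOC)
            range_tree_add(ms->ms_unflushed_allocs,
                sme->sme_offset, sme->sme_run);
        else
            range_tree_add(ms->ms_unflushed_frees,
                sme->sme_offset, sme->sme_run);

        return (0);
    }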
1137 * If we are not going to do any writes there is no need in spa_ld_log_sm_data()
1246 * Note that even in the case where we get here because of an in spa_ld_log_sm_data()
1247 * error (e.g. error != 0), we still want to update the fields in spa_ld_log_sm_data()
1356 * Note: we don't actually expect anything to change at this point in spa_ld_log_spacemaps()
1357 * but we grab the config lock so we don't fail any assertions in spa_ld_log_spacemaps()