Idle/background work classes design doc
=======================================

Right now, our behaviour at idle isn't ideal: it was designed for servers under
sustained load, keeping pending work at a "medium" level so that work builds up
and can be processed in more efficient batches, while still leaving headroom
for bursts in load.

But for desktops or mobile, where work is less sustained and power usage is
more important, we want to operate differently, with a "rush to idle" so the
system can go to sleep. We don't want to be dribbling out background work while
the system should be idle.

The complicating factor is that there are a number of background tasks, which
form a hierarchy (or a digraph, depending on how you divide it up): one
background task may generate work for another.

Thus proper idle detection needs to model this hierarchy. From the top down:

- Foreground writes
- Page cache writeback
- Copygc, rebalance
- Journal reclaim
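One way this hierarchy might be encoded is sketched below, assuming each class is an enum value listed top-down and each class records, as a bitmask, the classes that generate work for it. All identifiers are hypothetical, not existing bcachefs code. The text states that rebalance is fed by page cache writeback and journal reclaim by copygc and rebalance; the other edges (foreground writes feeding writeback, writeback feeding copygc) are assumptions.

```c
#include <stdbool.h>

/* Hypothetical work classes, ordered from the top of the hierarchy down. */
enum work_class {
	WORK_FOREGROUND,	/* foreground writes */
	WORK_WRITEBACK,		/* page cache writeback */
	WORK_COPYGC,
	WORK_REBALANCE,
	WORK_JOURNAL_RECLAIM,
	WORK_NR,
};

/*
 * Edges of the digraph: work_class_parents[c] is a bitmask of the classes
 * that generate work for class c. Only the rebalance and journal reclaim
 * edges come from the text; the others are assumptions.
 */
static const unsigned work_class_parents[WORK_NR] = {
	[WORK_FOREGROUND]	= 0,
	[WORK_WRITEBACK]	= 1U << WORK_FOREGROUND,
	[WORK_COPYGC]		= 1U << WORK_WRITEBACK,
	[WORK_REBALANCE]	= 1U << WORK_WRITEBACK,
	[WORK_JOURNAL_RECLAIM]	= (1U << WORK_COPYGC) | (1U << WORK_REBALANCE),
};

/* True if class @from directly generates work for class @to. */
static bool work_class_feeds(enum work_class from, enum work_class to)
{
	return work_class_parents[to] & (1U << from);
}

/* True if every edge points downwards, i.e. the enum is in top-down order. */
static bool work_classes_topo_ordered(void)
{
	for (int c = 0; c < WORK_NR; c++)
		if (work_class_parents[c] >> c)	/* a parent at or below c */
			return false;
	return true;
}
```

Keeping the enum in top-down order means a class only ever needs to look at lower-numbered classes, which the ordering check makes explicit.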

When we implement idle detection and rush to idle, we need to be careful not
to disturb the existing behaviour too much: it works reasonably well when the
system is under sustained load (though it could perhaps be improved in the case
of rebalance, which currently does not actively attempt to let work batch up).

SUSTAINED LOAD REGIME
---------------------

When the system is under continuous load, we want these jobs to run
continuously. This is perhaps best modelled with a PD (proportional-derivative)
controller, where each job tries to keep some quantity (e.g. fragmented disk
space, available journal space) roughly in the middle of a target range.
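As a sketch of the PD-controller idea, assuming integer arithmetic and caller-chosen gains, each interval a job could compare the tracked quantity with the middle of its range and issue work proportional to the error plus the error's rate of change. The struct and function names are hypothetical. For quantities where less is worse, such as available journal space, the caller would track the complement instead (e.g. used journal space).

```c
#include <stdint.h>

/* Hypothetical PD controller state for one background job. */
struct pd_controller {
	int64_t target;		/* setpoint: the middle of the desired range */
	int64_t kp;		/* proportional gain */
	int64_t kd;		/* derivative gain */
	int64_t last_error;	/* error seen on the previous interval */
};

/*
 * @value: current measurement of a quantity where more means more pending
 * work, e.g. bytes of fragmented disk space.
 *
 * Returns how much work to issue this interval. Being above target, or
 * moving away from it, increases the output; the output is clamped at zero
 * because a job cannot do negative work. The first call sees
 * last_error == 0, so its derivative term is the whole initial error.
 */
static int64_t pd_controller_step(struct pd_controller *pd, int64_t value)
{
	int64_t error = value - pd->target;
	int64_t derivative = error - pd->last_error;
	int64_t output = pd->kp * error + pd->kd * derivative;

	pd->last_error = error;
	return output > 0 ? output : 0;
}
```

The derivative term lets a job react to a burst before the quantity reaches the edge of its range; with ``kd`` set to zero this degenerates to a purely proportional controller.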

The goal under sustained load is to balance our ability to handle load spikes
without running out of some resource (free disk space, free space in the
journal) against letting some work accumulate so that it can be batched (or
become unnecessary).

For example, we don't want to run copygc too aggressively, because then it
will be evacuating buckets that would have become empty (been overwritten or
deleted) anyway; but we also don't want to wait until we're almost out of free
space, because then the system will behave unpredictably: suddenly we're doing
a lot more work to service each write and the system becomes much slower.

IDLE REGIME
-----------

When the system becomes idle, we should start flushing our pending work more
quickly so the system can go to sleep.

Note that the definition of "idle" depends on where in the hierarchy a task
is: a task should start flushing work more quickly when the task above it has
stopped generating new work.

For example, rebalance should start flushing more quickly when page cache
writeback is idle, and journal reclaim should only start flushing more quickly
when both copygc and rebalance are idle.
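This per-task definition of idle can be expressed with bitmasks. The sketch assumes each class is one bit, that a mask records which classes have currently stopped generating new work, and that each class knows the mask of classes above it; all names are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical one-bit-per-class encoding. */
enum {
	WORK_WRITEBACK	= 1U << 0,
	WORK_COPYGC	= 1U << 1,
	WORK_REBALANCE	= 1U << 2,
};

/* The classes feeding each of the two examples in the text. */
#define REBALANCE_PARENTS	WORK_WRITEBACK
#define JOURNAL_RECLAIM_PARENTS	(WORK_COPYGC | WORK_REBALANCE)

/*
 * @parents:   classes that generate work for the class being asked about
 * @idle_mask: classes that have currently stopped generating new work
 *
 * A class may start flushing more quickly only when none of its parents is
 * still generating work, i.e. every parent bit is also set in @idle_mask.
 */
static bool parents_idle(unsigned parents, unsigned idle_mask)
{
	return (parents & ~idle_mask) == 0;
}
```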

It's important to let work accumulate while more work is still incoming and we
still have room, because flushing is always more efficient if we let it batch
up. New writes may overwrite data before rebalance moves it, and tasks may be
generating more updates for the btree nodes that journal reclaim needs to flush.

On idle, how much work we do at each interval should be proportional to the
length of time we have been idle for. If we've been idle only briefly, we
shouldn't flush everything right away: the system might wake up and start
generating new work soon, and flushing immediately might end up doing a lot of
work that would have been unnecessary if we'd allowed things to batch more.
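A minimal sketch of such a budget, growing linearly with idle time, assuming a hypothetical tuning parameter ``per_ms`` (work allowed per millisecond of idle time) and pending work measured in some unit such as sectors. A short idle period yields a small budget, so a brief pause doesn't flush everything; a long one eventually covers all pending work.

```c
#include <stdint.h>

/*
 * @pending: work currently queued for this class (e.g. sectors to move)
 * @idle_ms: how long the class(es) above this one have been idle
 * @per_ms:  hypothetical tuning knob: work allowed per ms of idle time
 *
 * Returns how much work to issue this interval: proportional to @idle_ms,
 * but never more than is actually pending.
 */
static uint64_t idle_work_budget(uint64_t pending, uint64_t idle_ms,
				 uint64_t per_ms)
{
	uint64_t budget;

	/* A product that would overflow just means "flush everything". */
	if (per_ms && idle_ms > UINT64_MAX / per_ms)
		return pending;

	budget = idle_ms * per_ms;
	return budget < pending ? budget : pending;
}
```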

To summarize, we will need:

 - A list of classes for background tasks that generate work, which will
   include one "foreground" class.
 - Tracking for each class: "Am I doing work, or have I gone to sleep?"
 - For each class, a check of the class or classes above it when deciding how
   much work to issue.
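Putting these pieces together, a sketch under the following assumptions: classes are numbered top-down and identified by bit position, each class records the time it last generated new work (answering "am I doing work, or have I gone to sleep?"), and a class's budget is its sustained-load amount (e.g. a PD controller's output) plus an idle boost that grows with how long its parents have been quiet. Every name and parameter here is hypothetical.

```c
#include <stdint.h>

#define NR_WORK_CLASSES 5	/* hypothetical: foreground ... journal reclaim */

/* Per-class tracking: when did each class last generate new work? */
struct work_tracker {
	uint64_t last_active_ms[NR_WORK_CLASSES];
};

/* Hypothetical hook a class would call whenever it generates new work. */
static void work_class_active(struct work_tracker *t, unsigned c,
			      uint64_t now_ms)
{
	t->last_active_ms[c] = now_ms;
}

/*
 * How long all classes in @parents (a bitmask) have been quiet, which is
 * decided by the most recently active one. The foreground class has no
 * parents and isn't budgeted this way, so an empty mask returns 0.
 */
static uint64_t parents_idle_ms(const struct work_tracker *t,
				unsigned parents, uint64_t now_ms)
{
	uint64_t newest = 0;

	if (!parents)
		return 0;

	for (unsigned i = 0; i < NR_WORK_CLASSES; i++)
		if ((parents & (1U << i)) && t->last_active_ms[i] > newest)
			newest = t->last_active_ms[i];

	return now_ms > newest ? now_ms - newest : 0;
}

/*
 * Work to issue this interval: the sustained-load amount @steady plus
 * @per_ms for every millisecond the parents have been idle, capped at
 * what is actually @pending.
 */
static uint64_t work_class_budget(const struct work_tracker *t,
				  unsigned parents, uint64_t now_ms,
				  uint64_t pending, uint64_t steady,
				  uint64_t per_ms)
{
	uint64_t idle_ms = parents_idle_ms(t, parents, now_ms);
	uint64_t budget;

	/* Saturate rather than overflow after very long idle periods. */
	if (per_ms && idle_ms > (UINT64_MAX - steady) / per_ms)
		budget = UINT64_MAX;
	else
		budget = steady + idle_ms * per_ms;

	return budget < pending ? budget : pending;
}
```

With ``per_ms`` at zero this reduces to the sustained-load amount alone, which is one way to leave the existing behaviour under load undisturbed.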