Idle/background work classes design doc:

Right now, our behaviour at idle isn't ideal: it was designed for servers that
would be under sustained load, to keep pending work at a "medium" level, to
let work build up so we can process it in more efficient batches, while also
giving headroom for bursts in load.

But for desktops or mobile - scenarios where work is less sustained and power
usage is more important - we want to operate differently, with a "rush to
idle" so the system can go to sleep. We don't want to be dribbling out
background work while the system should be idle.

The complicating factor is that there are a number of background tasks, which
form a hierarchy (or a digraph, depending on how you divide it up) - one
background task may generate work for another.

Thus proper idle detection needs to model this hierarchy:

- Foreground writes
- Page cache writeback
- Copygc, rebalance
- Journal reclaim

When we implement idle detection and rush to idle, we need to be careful not
to disturb the existing behaviour too much, since it works reasonably well
when the system is under sustained load (or perhaps improve it in the case of
rebalance, which currently does not actively attempt to let work batch up).
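
As a rough illustration of the hierarchy listed above, the classes can be
written down as a small dependency table. This is only a sketch: the enum
names, the bitmask layout and the upstream_idle() helper are hypothetical and
are not existing bcachefs identifiers, and the assumption that copygc sits
directly below page cache writeback (like rebalance) is ours.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical class names, taken from the list above. */
enum work_class {
	WORK_foreground,
	WORK_writeback,
	WORK_copygc,
	WORK_rebalance,
	WORK_journal_reclaim,
	WORK_NR,
};

#define CLASS_BIT(c)	(1U << (c))

/* upstream[c]: bitmask of the classes that directly generate work for c. */
static const unsigned upstream[WORK_NR] = {
	[WORK_foreground]	= 0,
	[WORK_writeback]	= CLASS_BIT(WORK_foreground),
	[WORK_copygc]		= CLASS_BIT(WORK_writeback),	/* assumption */
	[WORK_rebalance]	= CLASS_BIT(WORK_writeback),
	[WORK_journal_reclaim]	= CLASS_BIT(WORK_copygc) |
				  CLASS_BIT(WORK_rebalance),
};

/*
 * busy: bitmask of the classes that are currently doing work.
 *
 * Returns true if every class directly above c has gone to sleep. The
 * foreground class has nothing above it, so this is trivially true for it.
 */
static bool upstream_idle(enum work_class c, unsigned busy)
{
	assert((unsigned) c < (unsigned) WORK_NR);
	return !(upstream[c] & busy);
}
```

A bitmask is used here because the structure is a digraph rather than a
strict chain: journal reclaim has two classes above it, and both must be
quiet before it is considered idle.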

SUSTAINED LOAD REGIME
---------------------

When the system is under continuous load, we want these jobs to run
continuously - this is perhaps best modelled with a P/D controller, where
they'll be trying to keep a target value (e.g. fragmented disk space,
available journal space) roughly in the middle of some range.

The goal under sustained load is to balance our ability to handle load spikes
without running out of some resource (free disk space, free space in the
journal), while also letting some work accumulate to be batched (or become
unnecessary).

For example, we don't want to run copygc too aggressively, because then it
will be evacuating buckets that would have become empty (been overwritten or
deleted) anyway, and we don't want to wait until we're almost out of free
space, because then the system will behave unpredictably - suddenly we're
doing a lot more work to service each write and the system becomes much
slower.

IDLE REGIME
-----------

When the system becomes idle, we should start flushing our pending work
more quickly so the system can go to sleep.

Note that the definition of "idle" depends on where in the hierarchy a task
is - a task should start flushing work more quickly when the task above it has
stopped generating new work.

E.g. rebalance should start flushing more quickly when page cache writeback is
idle, and journal reclaim should only start flushing more quickly when both
copygc and rebalance are idle.

It's important to let work accumulate when more work is still incoming and we
still have room, because flushing is always more efficient if we let it batch
up. New writes may overwrite data before rebalance moves it, and tasks may be
generating more updates for the btree nodes that journal reclaim needs to flush.

On idle, how much work we do at each interval should be proportional to the
length of time we have been idle for. If we're idle only for a short duration,
we shouldn't flush everything right away; the system might wake up and start
generating new work soon, and flushing immediately might end up doing a lot of
work that would have been unnecessary if we'd allowed things to batch more.

To summarize, we will need:

 - A list of classes for background tasks that generate work, which will
   include one "foreground" class.
 - Tracking for each class - "Am I doing work, or have I gone to sleep?"
 - And each class should check the class above it when deciding how much work
   to issue.
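
To make the two regimes concrete, here is a minimal sketch of how a single
class might decide how much work to issue per interval. Everything in it is
an assumption made for illustration: struct bg_class, its fields, the tick
unit, and the gains (1/4 proportional, 1/2 derivative) are hypothetical and
not taken from bcachefs. The caller is assumed to pass in how long the class
above has been idle, with 0 meaning it is still generating work.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-class state; counters are assumed small enough not to overflow. */
struct bg_class {
	uint64_t pending;	/* outstanding work, in arbitrary units */
	uint64_t prev_pending;	/* pending at the previous interval */
	uint64_t target;	/* middle of the range we want to hold */
	uint64_t ramp_ticks;	/* idle time after which we flush everything */
};

/* Never issue negative work, and never more than is actually pending. */
static uint64_t clamp_work(int64_t want, uint64_t pending)
{
	if (want <= 0)
		return 0;
	return (uint64_t) want < pending ? (uint64_t) want : pending;
}

/* Sustained load: P/D controller that steers pending towards target. */
static uint64_t work_sustained(const struct bg_class *c)
{
	int64_t err   = (int64_t) c->pending - (int64_t) c->target;
	int64_t deriv = (int64_t) c->pending - (int64_t) c->prev_pending;

	return clamp_work(err / 4 + deriv / 2, c->pending);
}

/* Idle: issue a share of pending proportional to the time spent idle. */
static uint64_t work_idle(const struct bg_class *c, uint64_t idle_ticks)
{
	assert(c->ramp_ticks > 0);

	if (idle_ticks >= c->ramp_ticks)
		return c->pending;
	return c->pending * idle_ticks / c->ramp_ticks;
}

/* Called once per interval; returns how much work to issue now. */
static uint64_t work_to_issue(struct bg_class *c, uint64_t upstream_idle_ticks)
{
	uint64_t n = work_sustained(c);

	if (upstream_idle_ticks) {
		uint64_t idle = work_idle(c, upstream_idle_ticks);

		/* Going idle should only ever speed flushing up. */
		if (idle > n)
			n = idle;
	}

	c->prev_pending = c->pending;
	return n;
}
```

While the class above is busy and pending is below the target, this issues
nothing and lets work batch up. Once the class above goes quiet, the amount
issued grows linearly with idle time until everything has been flushed, so a
brief pause does not trigger a full flush.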