==========================
Short users guide for SLUB
==========================

The basic philosophy of SLUB is very different from SLAB. SLAB
requires rebuilding the kernel to activate debug options for all
slab caches. SLUB always includes full debugging but it is off by default.
SLUB can enable debugging only for selected slabs in order to avoid
an impact on overall system performance which may make a bug more
difficult to find.

In order to switch debugging on one can add an option ``slab_debug``
to the kernel command line. That will enable full debugging for
all slabs.

Typically one would then use the ``slabinfo`` command to get statistical
data and perform operations on the slabs. By default ``slabinfo`` only lists
slabs that have data in them. See ``slabinfo -h`` for more options when
running the command. ``slabinfo`` can be compiled with
::

	gcc -o slabinfo tools/mm/slabinfo.c

Some of the modes of operation of ``slabinfo`` require that slub debugging
be enabled on the command line. For example, no tracking information will be
available without debugging on and validation can only partially
be performed if debugging was not switched on.
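
Whether the running kernel was booted with ``slab_debug`` can be read off its
command line. The following is a minimal sketch, assuming the command line is
available as text (for example from ``/proc/cmdline``). The helper name
``slab_debug_value`` is made up for this illustration and is not part of
``slabinfo``:

```sh
# Sketch: report the slab_debug setting found in a kernel command line.
# slab_debug_value is a hypothetical helper name, not an existing tool.
# $1: command line text, e.g. "$(cat /proc/cmdline)"
# Prints the option string ("full" for a bare slab_debug) and returns 0,
# or prints nothing and returns 1 if slab_debug is absent.
slab_debug_value() (
    set -f          # keep '*' in slab names from being expanded as a file glob
    found=1
    for arg in $1; do
        case $arg in
        slab_debug)   value=full; found=0 ;;
        slab_debug=*) value=${arg#slab_debug=}; found=0 ;;
        esac
    done
    [ "$found" -eq 0 ] && printf '%s\n' "$value"
    exit "$found"   # the function body is a subshell, so this only ends the function
)
```

Calling ``slab_debug_value "$(cat /proc/cmdline)"`` prints the option string if
the parameter is present. If it occurs more than once, this sketch reports the
last occurrence.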

Some more sophisticated uses of slab_debug:
-------------------------------------------

Parameters may be given to ``slab_debug``. If none is specified then full
debugging is enabled. Format:

slab_debug=<Debug-Options>
	Enable options for all slabs

slab_debug=<Debug-Options>,<slab name1>,<slab name2>,...
	Enable options only for select slabs (no spaces
	after a comma)

Multiple blocks of options for all slabs or selected slabs can be given, with
blocks of options delimited by ';'. The last of the "all slabs" blocks is applied
to all slabs except those that match one of the "select slabs" blocks. Options
of the first "select slabs" block that matches the slab's name are applied.

Possible debug options are::

	F		Sanity checks on (enables SLAB_DEBUG_CONSISTENCY_CHECKS
			Sorry SLAB legacy issues)
	Z		Red zoning
	P		Poisoning (object and padding)
	U		User tracking (free and alloc)
	T		Trace (please only use on single slabs)
	A		Enable failslab filter mark for the cache
	O		Switch debugging off for caches that would have
			caused higher minimum slab orders
	-		Switch all debugging off (useful if the kernel is
			configured with CONFIG_SLUB_DEBUG_ON)

For example, in order to boot just with sanity checks and red zoning one would
specify::

	slab_debug=FZ

Trying to find an issue in the dentry cache? Try::

	slab_debug=,dentry

to only enable debugging on the dentry cache.  You may use an asterisk at the
end of the slab name, in order to cover all slabs with the same prefix.  For
example, here's how you can poison the dentry cache as well as all kmalloc
slabs::

	slab_debug=P,kmalloc-*,dentry

Red zoning and tracking may realign the slab.  We can just apply sanity checks
to the dentry cache with::

	slab_debug=F,dentry

Debugging options may require the minimum possible slab order to increase as
a result of storing the metadata (for example, caches with PAGE_SIZE object
sizes).  This has a higher likelihood of resulting in slab allocation errors
in low memory situations or if there's high fragmentation of memory.  To
switch off debugging for such caches by default, use::

	slab_debug=O

You can apply different options to different lists of slab names, using blocks
of options. This will enable red zoning for dentry and user tracking for
kmalloc. All other slabs will not get any debugging enabled::

	slab_debug=Z,dentry;U,kmalloc-*

You can also enable options (e.g. sanity checks and poisoning) for all caches
except some that are deemed too performance critical and don't need to be
debugged by specifying global debug options followed by a list of slab names
with "-" as options::

	slab_debug=FZ;-,zs_handle,zspage
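
The block rules can be sketched as a small shell function that reports which
options would apply to one cache. This only illustrates the rules as described
in this section and is not the kernel's parser. ``slab_debug_opts_for`` is a
made-up name, an empty option string is reported as ``full``, a cache that no
block applies to is reported as ``none``, and shell pattern matching is used
for slab names, which accepts an asterisk anywhere and not only at the end:

```sh
# Sketch: which slab_debug options apply to one cache?
# slab_debug_opts_for is a hypothetical helper, not a kernel or slabinfo feature.
# $1: the text after "slab_debug=" (empty for a bare slab_debug)
# $2: slab cache name
slab_debug_opts_for() (
    spec=$1
    name=$2
    global=none     # options of the last "all slabs" block, if any
    selected=       # options of the first matching "select slabs" block
    found=0
    [ -z "$spec" ] && global=   # bare slab_debug: full debugging for all slabs
    set -f          # do not expand '*' against files in the current directory
    IFS=';'
    for block in $spec; do
        case $block in
        *,*)
            # "select slabs" block: only the first matching block counts
            [ "$found" -eq 1 ] && continue
            opts=${block%%,*}
            IFS=','
            for pat in ${block#*,}; do
                case $name in
                $pat) selected=$opts; found=1; break ;;
                esac
            done
            ;;
        *)
            # "all slabs" block: a later one replaces an earlier one
            global=$block
            ;;
        esac
    done
    if [ "$found" -eq 1 ]; then result=$selected; else result=$global; fi
    printf '%s\n' "${result:-full}"
)
```

With the value ``Z,dentry;U,kmalloc-*`` this prints ``Z`` for ``dentry``,
``U`` for ``kmalloc-64`` and ``none`` for any other cache.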

The state of each debug option for a slab can be found in the respective files
under::

	/sys/kernel/slab/<slab name>/

If the file contains 1, the option is enabled, 0 means disabled. The debug
options from the ``slab_debug`` parameter translate to the following files::

	F	sanity_checks
	Z	red_zone
	P	poison
	U	store_user
	T	trace
	A	failslab

The ``failslab`` file is writable, so writing 1 or 0 will enable or disable
the option at runtime. The write returns -EINVAL if the cache is an alias.

Careful with tracing: It may spew out lots of information and never stop if
used on the wrong slab.
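
The mapping from files back to option letters can be sketched as follows. This
is an illustration only: ``slab_debug_letters`` is a made-up helper name, the
directory is passed in as an argument, and files that are missing or unreadable
are treated as disabled:

```sh
# Sketch: print the slab_debug option letters enabled for one cache.
# slab_debug_letters is a hypothetical helper name.
# $1: cache directory, e.g. /sys/kernel/slab/dentry
slab_debug_letters() (
    dir=$1
    out=
    for pair in F:sanity_checks Z:red_zone P:poison U:store_user T:trace A:failslab; do
        file=$dir/${pair#*:}
        # A file containing 1 means the option is enabled
        if [ -r "$file" ] && [ "$(cat "$file")" = 1 ]; then
            out=$out${pair%%:*}
        fi
    done
    # Print "-" when no option is enabled
    printf '%s\n' "${out:--}"
)
```

For a cache with sanity checks and user tracking enabled,
``slab_debug_letters /sys/kernel/slab/<slab name>`` prints ``FU``.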

Slab merging
============

If no debug options are specified then SLUB may merge similar slabs together
in order to reduce overhead and increase cache hotness of objects.
``slabinfo -a`` displays which slabs were merged together.

Slab validation
===============

SLUB can validate all objects if the kernel was booted with ``slab_debug``. In
order to do so you must have the ``slabinfo`` tool. Then you can do
::

	slabinfo -v

which will test all objects. Output will be generated to the syslog.

This also works in a more limited way if the kernel was booted without
``slab_debug``. In that case ``slabinfo -v`` simply tests all reachable
objects. Usually these are in the cpu slabs and the partial slabs. Full slabs
are not tracked by SLUB in a non debug situation.

Getting more performance
========================

To some degree SLUB's performance is limited by the need to take the
list_lock once in a while to deal with partial slabs. That overhead is
governed by the order of the allocation for each slab. The allocations
can be influenced by kernel parameters::

	slab_min_objects=x		(default: automatically scaled by number of cpus)
	slab_min_order=x		(default 0)
	slab_max_order=x		(default 3 (PAGE_ALLOC_COSTLY_ORDER))

``slab_min_objects``
	specifies how many objects must at least fit into one
	slab in order for the allocation order to be acceptable.  In
	general slub will be able to perform this number of
	allocations on a slab without consulting centralized resources
	(list_lock) where contention may occur.

``slab_min_order``
	specifies a minimum order of slabs. It has a similar effect to
	``slab_min_objects``.

``slab_max_order``
	specifies the order at which ``slab_min_objects`` should no
	longer be checked. This is useful to avoid SLUB trying to
	generate super large order pages to fit ``slab_min_objects``
	of a slab cache with large object sizes into one high order
	page. Setting the command line parameter
	``debug_guardpage_minorder=N`` (N > 0) forces
	``slab_max_order`` to 0, which causes slabs to be allocated
	with the minimum possible order.
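
How the three parameters interact can be illustrated with some arithmetic: a
slab of order N spans ``PAGE_SIZE << N`` bytes, so it holds that many bytes
divided by the object size. The sketch below is deliberately simplified and is
not the kernel's actual order calculation. It ignores per-object debug metadata
and any consideration of wasted space, assumes 4096-byte pages unless told
otherwise, and ``min_slab_order`` is a made-up name:

```sh
# Sketch: smallest order at which at least min_objects objects fit in a slab,
# starting at min_order and giving up at max_order.
# min_slab_order is a hypothetical helper; this is not the kernel's algorithm.
# Usage: min_slab_order <object size> <min objects> [min order] [max order] [page size]
min_slab_order() (
    size=$1
    min_objects=$2
    min_order=${3:-0}       # default of slab_min_order
    max_order=${4:-3}       # default of slab_max_order
    page_size=${5:-4096}    # assumed page size in bytes
    order=$min_order
    # Grow the order until enough objects fit or max_order is reached
    while [ "$order" -lt "$max_order" ] &&
          [ $(( (page_size << order) / size )) -lt "$min_objects" ]; do
        order=$((order + 1))
    done
    printf '%s\n' "$order"
)
```

Under these assumptions, 16 objects of 1024 bytes need an order 2 slab
(16384 bytes), while 16 objects of 4096 bytes would need order 4 and are
therefore capped at the default maximum order of 3.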

SLUB Debug output
=================

Here is a sample of slub debug output::

 ====================================================================
 BUG kmalloc-8: Right Redzone overwritten
 --------------------------------------------------------------------

 INFO: 0xc90f6d28-0xc90f6d2b. First byte 0x00 instead of 0xcc
 INFO: Slab 0xc528c530 flags=0x400000c3 inuse=61 fp=0xc90f6d58
 INFO: Object 0xc90f6d20 @offset=3360 fp=0xc90f6d58
 INFO: Allocated in get_modalias+0x61/0xf5 age=53 cpu=1 pid=554

 Bytes b4 (0xc90f6d10): 00 00 00 00 00 00 00 00 5a 5a 5a 5a 5a 5a 5a 5a ........ZZZZZZZZ
 Object   (0xc90f6d20): 31 30 31 39 2e 30 30 35                         1019.005
 Redzone  (0xc90f6d28): 00 cc cc cc                                     .
 Padding  (0xc90f6d50): 5a 5a 5a 5a 5a 5a 5a 5a                         ZZZZZZZZ

   [<c010523d>] dump_trace+0x63/0x1eb
   [<c01053df>] show_trace_log_lvl+0x1a/0x2f
   [<c010601d>] show_trace+0x12/0x14
   [<c0106035>] dump_stack+0x16/0x18
   [<c017e0fa>] object_err+0x143/0x14b
   [<c017e2cc>] check_object+0x66/0x234
   [<c017eb43>] __slab_free+0x239/0x384
   [<c017f446>] kfree+0xa6/0xc6
   [<c02e2335>] get_modalias+0xb9/0xf5
   [<c02e23b7>] dmi_dev_uevent+0x27/0x3c
   [<c027866a>] dev_uevent+0x1ad/0x1da
   [<c0205024>] kobject_uevent_env+0x20a/0x45b
   [<c020527f>] kobject_uevent+0xa/0xf
   [<c02779f1>] store_uevent+0x4f/0x58
   [<c027758e>] dev_attr_store+0x29/0x2f
   [<c01bec4f>] sysfs_write_file+0x16e/0x19c
   [<c0183ba7>] vfs_write+0xd1/0x15a
   [<c01841d7>] sys_write+0x3d/0x72
   [<c0104112>] sysenter_past_esp+0x5f/0x99
   [<b7f7b410>] 0xb7f7b410
   =======================

 FIX kmalloc-8: Restoring Redzone 0xc90f6d28-0xc90f6d2b=0xcc

If SLUB encounters a corrupted object (full detection requires the kernel
to be booted with ``slab_debug``) then the following output will be dumped
into the syslog:

1. Description of the problem encountered

   This will be a message in the system log starting with::

     ===============================================
     BUG <slab cache affected>: <What went wrong>
     -----------------------------------------------

     INFO: <corruption start>-<corruption_end> <more info>
     INFO: Slab <address> <slab information>
     INFO: Object <address> <object information>
     INFO: Allocated in <kernel function> age=<jiffies since alloc> cpu=<allocated by
	cpu> pid=<pid of the process>
     INFO: Freed in <kernel function> age=<jiffies since free> cpu=<freed by cpu>
	pid=<pid of the process>

   (Object allocation / free information is only available if SLAB_STORE_USER is
   set for the slab. ``slab_debug`` sets that option)

2. The object contents if an object was involved.

   Various types of lines can follow the BUG SLUB line:

   Bytes b4 <address> : <bytes>
	Shows a few bytes before the object where the problem was detected.
	Can be useful if the corruption does not stop with the start of the
	object.

   Object <address> : <bytes>
	The bytes of the object. If the object is inactive then the bytes
	typically contain poison values. Any non-poison value shows a
	corruption by a write after free.

   Redzone <address> : <bytes>
	The Redzone following the object. The Redzone is used to detect
	writes after the object. All bytes should always have the same
	value. If there is any deviation then it is due to a write after
	the object boundary.

	(Redzone information is only available if SLAB_RED_ZONE is set.
	``slab_debug`` sets that option)

   Padding <address> : <bytes>
	Unused data to fill up the space in order to get the next object
	properly aligned. In the debug case we make sure that there are
	at least 4 bytes of padding. This allows the detection of writes
	before the object.

3. A stackdump

   The stackdump describes the location where the error was detected. The cause
   of the corruption is more likely to be found by looking at the function that
   allocated or freed the object.

4. Report on how the problem was dealt with in order to ensure the continued
   operation of the system.

   These are messages in the system log beginning with::

	FIX <slab cache affected>: <corrective action taken>

   In the above sample SLUB found that the Redzone of an active object has
   been overwritten. Here a string of 8 characters was written into a slab that
   has the length of 8 characters. However, an 8 character string needs a
   terminating 0. That zero has overwritten the first byte of the Redzone field.
   After reporting the details of the issue encountered the FIX SLUB message
   tells us that SLUB has restored the Redzone to its proper value and then
   system operations continue.
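
When a log contains many such reports, the ``BUG`` lines can be condensed into
one line per affected cache and problem. The sketch below assumes the log lines
follow the ``BUG <slab cache affected>: <What went wrong>`` shape shown above
and ignores lines in any other shape. ``slub_bug_summary`` is a made-up helper
name:

```sh
# Sketch: count SLUB "BUG <cache>: <problem>" reports in kernel log text.
# slub_bug_summary is a hypothetical helper name.
# Reads log text on stdin, prints "<count> <cache>: <problem>", most frequent first.
slub_bug_summary() {
    sed -n 's/^.*BUG \([^ :]*\): \(.*\)$/\1: \2/p' |
        sort | uniq -c | sort -rn | sed 's/^ *//'
}
```

It could be fed from the kernel log, for example with
``dmesg | slub_bug_summary``.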

Emergency operations
====================

Minimal debugging (sanity checks alone) can be enabled by booting with::

	slab_debug=F

This will generally be enough to enable the resiliency features of slub
which will keep the system running even if a bad kernel component
keeps corrupting objects. This may be important for production systems.
Performance will be impacted by the sanity checks and there will be a
continual stream of error messages to the syslog but no additional memory
will be used (unlike full debugging).

No guarantees. The kernel component still needs to be fixed. Performance
may be optimized further by locating the slab that experiences corruption
and enabling debugging only for that cache.

I.e.::

	slab_debug=F,dentry

If the corruption occurs by writing after the end of the object then it
may be advisable to enable a Redzone to avoid corrupting the beginning
of other objects::

	slab_debug=FZ,dentry

Extended slabinfo mode and plotting
===================================

The ``slabinfo`` tool has a special 'extended' ('-X') mode that includes:

 - Slabcache Totals
 - Slabs sorted by size (up to -N <num> slabs, default 1)
 - Slabs sorted by loss (up to -N <num> slabs, default 1)

Additionally, in this mode ``slabinfo`` does not dynamically scale
sizes (G/M/K) and reports everything in bytes (this functionality is
also available to other slabinfo modes via '-B' option) which makes
reporting more precise and accurate. Moreover, in some sense the '-X'
mode also simplifies the analysis of slabs' behaviour, because its
output can be plotted using the ``slabinfo-gnuplot.sh`` script. So it
pushes the analysis from looking through the numbers (tons of numbers)
to something easier -- visual analysis.

To generate plots:

a) collect slabinfo extended records, for example::

	while [ 1 ]; do slabinfo -X >> FOO_STATS; sleep 1; done

b) pass stats file(-s) to ``slabinfo-gnuplot.sh`` script::

	slabinfo-gnuplot.sh FOO_STATS [FOO_STATS2 .. FOO_STATSN]

   The ``slabinfo-gnuplot.sh`` script pre-processes the collected records
   and generates 3 png files (and 3 pre-processing cache files) per STATS
   file:

   - Slabcache Totals: FOO_STATS-totals.png
   - Slabs sorted by size: FOO_STATS-slabs-by-size.png
   - Slabs sorted by loss: FOO_STATS-slabs-by-loss.png
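
The collection loop above runs until it is interrupted. A bounded variant that
stops after a fixed number of samples might look like the sketch below. The
function name ``collect_slab_stats`` and the ``SLABINFO`` environment variable
(used to point at a ``slabinfo`` binary that is not in ``PATH``) are
assumptions made for this example only:

```sh
# Sketch: append a fixed number of "slabinfo -X" samples to a stats file.
# collect_slab_stats and the SLABINFO override are hypothetical names.
# Usage: collect_slab_stats <stats file> <samples> [interval in seconds]
collect_slab_stats() {
    out=$1
    samples=$2
    interval=${3:-1}
    i=0
    while [ "$i" -lt "$samples" ]; do
        # Stop early if slabinfo cannot be run
        ${SLABINFO:-slabinfo} -X >> "$out" || return 1
        i=$((i + 1))
        # No need to wait after the last sample
        if [ "$i" -lt "$samples" ]; then sleep "$interval"; fi
    done
    return 0
}
```

For example, ``collect_slab_stats FOO_STATS 60`` appends 60 samples taken one
second apart, after which ``FOO_STATS`` can be passed to
``slabinfo-gnuplot.sh`` as in step b).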

Another use case, when ``slabinfo-gnuplot.sh`` can be useful, is when you
need to compare slabs' behaviour "prior to" and "after" some code
modification.  To help you out there, the ``slabinfo-gnuplot.sh`` script
can 'merge' the `Slabcache Totals` sections from different
measurements. To visually compare N plots:

a) Collect as many STATS1, STATS2, .. STATSN files as you need::

	while [ 1 ]; do slabinfo -X >> STATS<X>; sleep 1; done

b) Pre-process those STATS files::

	slabinfo-gnuplot.sh STATS1 STATS2 .. STATSN

c) Execute ``slabinfo-gnuplot.sh`` in '-t' mode, passing all of the
   generated pre-processed \*-totals::

	slabinfo-gnuplot.sh -t STATS1-totals STATS2-totals .. STATSN-totals

   This will produce a single plot (png file).

   Plots, expectedly, can be large so some fluctuations or small spikes
   can go unnoticed. To deal with that, ``slabinfo-gnuplot.sh`` has two
   options to 'zoom-in'/'zoom-out':

   a) ``-s %d,%d`` -- overrides the default image width and height
   b) ``-r %d,%d`` -- specifies a range of samples to use (for example,
      in ``slabinfo -X >> FOO_STATS; sleep 1;`` case, using a ``-r
      40,60`` range will plot only samples collected between 40th and
      60th seconds).


DebugFS files for SLUB
======================

For more information about the current state of SLUB caches with the user
tracking debug option enabled, debugfs files are available, typically under
/sys/kernel/debug/slab/<cache>/ (created only for caches with enabled user
tracking). There are 2 types of these files with the following debug
information:

1. alloc_traces::

    Prints information about unique allocation traces of the currently
    allocated objects. The output is sorted by frequency of each trace.

    Information in the output:
    Number of objects, allocating function, possible memory wastage of
    kmalloc objects (total/per-object), minimal/average/maximal jiffies
    since alloc, pid range of the allocating processes, cpu mask of
    allocating cpus, numa node mask of origins of memory, and stack trace.

    Example:::

    338 pci_alloc_dev+0x2c/0xa0 waste=521872/1544 age=290837/291891/293509 pid=1 cpus=106 nodes=0-1
        __kmem_cache_alloc_node+0x11f/0x4e0
        kmalloc_trace+0x26/0xa0
        pci_alloc_dev+0x2c/0xa0
        pci_scan_single_device+0xd2/0x150
        pci_scan_slot+0xf7/0x2d0
        pci_scan_child_bus_extend+0x4e/0x360
        acpi_pci_root_create+0x32e/0x3b0
        pci_acpi_scan_root+0x2b9/0x2d0
        acpi_pci_root_add.cold.11+0x110/0xb0a
        acpi_bus_attach+0x262/0x3f0
        device_for_each_child+0xb7/0x110
        acpi_dev_for_each_child+0x77/0xa0
        acpi_bus_attach+0x108/0x3f0
        device_for_each_child+0xb7/0x110
        acpi_dev_for_each_child+0x77/0xa0
        acpi_bus_attach+0x108/0x3f0

2. free_traces::

    Prints information about unique freeing traces of the currently allocated
    objects. The freeing traces thus come from the previous life-cycle of the
    objects and are reported as not available for objects allocated for the first
    time. The output is sorted by frequency of each trace.

    Information in the output:
    Number of objects, freeing function, minimal/average/maximal jiffies since free,
    pid range of the freeing processes, cpu mask of freeing cpus, and stack trace.

    Example:::

    1980 <not-available> age=4294912290 pid=0 cpus=0
    51 acpi_ut_update_ref_count+0x6a6/0x782 age=236886/237027/237772 pid=1 cpus=1
	kfree+0x2db/0x420
	acpi_ut_update_ref_count+0x6a6/0x782
	acpi_ut_update_object_reference+0x1ad/0x234
	acpi_ut_remove_reference+0x7d/0x84
	acpi_rs_get_prt_method_data+0x97/0xd6
	acpi_get_irq_routing_table+0x82/0xc4
	acpi_pci_irq_find_prt_entry+0x8e/0x2e0
	acpi_pci_irq_lookup+0x3a/0x1e0
	acpi_pci_irq_enable+0x77/0x240
	pcibios_enable_device+0x39/0x40
	do_pci_enable_device.part.0+0x5d/0xe0
	pci_enable_device_flags+0xfc/0x120
	pci_enable_device+0x13/0x20
	virtio_pci_probe+0x9e/0x170
	local_pci_probe+0x48/0x80
	pci_device_probe+0x105/0x1c0
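
Because each trace starts with a header line whose first field is the object
count, the headers can be pulled out of either file to get a quick overview
without the stack traces. This is a minimal sketch that assumes the output
layout matches the examples above. ``slab_trace_summary`` is a made-up helper
name:

```sh
# Sketch: list "<count> <function>" for each trace in alloc_traces or free_traces.
# slab_trace_summary is a hypothetical helper name.
# Reads the file contents on stdin; $1 limits the number of traces shown (default 10).
slab_trace_summary() {
    # Header lines start with a purely numeric field; stack trace lines do not
    awk '$1 ~ /^[0-9]+$/ && NF >= 2 { print $1, $2 }' | head -n "${1:-10}"
}
```

With debugfs mounted in the usual place it could be used as
``slab_trace_summary 5 < /sys/kernel/debug/slab/<cache>/alloc_traces``.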

Christoph Lameter, May 30, 2007
Sergey Senozhatsky, October 23, 2015