.. _slub:

==========================
Short users guide for SLUB
==========================

The basic philosophy of SLUB is very different from SLAB. SLAB
requires rebuilding the kernel to activate debug options for all
slab caches. SLUB always includes full debugging but it is off by default.
SLUB can enable debugging only for selected slabs in order to avoid
an impact on overall system performance which may make a bug more
difficult to find.

In order to switch debugging on one can add an option ``slub_debug``
to the kernel command line. That will enable full debugging for
all slabs.

Typically one would then use the ``slabinfo`` command to get statistical
data and perform operations on the slabs. By default ``slabinfo`` only lists
slabs that have data in them. See ``slabinfo -h`` for more options when
running the command. ``slabinfo`` can be compiled with
::

	gcc -o slabinfo tools/mm/slabinfo.c

Some of the modes of operation of ``slabinfo`` require that slub debugging
be enabled on the command line. For example, no tracking information will be
available without debugging on and validation can only partially
be performed if debugging was not switched on.

Some more sophisticated uses of slub_debug:
-------------------------------------------

Parameters may be given to ``slub_debug``. If none is specified then full
debugging is enabled. Format:

slub_debug=<Debug-Options>
	Enable options for all slabs

slub_debug=<Debug-Options>,<slab name1>,<slab name2>,...
	Enable options only for select slabs (no spaces
	after a comma)

Multiple blocks of options for all slabs or selected slabs can be given, with
blocks of options delimited by ';'. The last of the "all slabs" blocks is
applied to all slabs except those that match one of the "select slabs" blocks.
Options of the first "select slabs" block that matches the slab's name are
applied.

Possible debug options are::

	F		Sanity checks on (enables SLAB_DEBUG_CONSISTENCY_CHECKS
			Sorry SLAB legacy issues)
	Z		Red zoning
	P		Poisoning (object and padding)
	U		User tracking (free and alloc)
	T		Trace (please only use on single slabs)
	A		Enable failslab filter mark for the cache
	O		Switch debugging off for caches that would have
			caused higher minimum slab orders
	-		Switch all debugging off (useful if the kernel is
			configured with CONFIG_SLUB_DEBUG_ON)

For example, in order to boot just with sanity checks and red zoning one would
specify::

	slub_debug=FZ

Trying to find an issue in the dentry cache? Try::

	slub_debug=,dentry

to only enable debugging on the dentry cache.  You may use an asterisk at the
end of the slab name, in order to cover all slabs with the same prefix.  For
example, here's how you can poison the dentry cache as well as all kmalloc
slabs::

	slub_debug=P,kmalloc-*,dentry

Red zoning and tracking may realign the slab.  We can just apply sanity checks
to the dentry cache with::

	slub_debug=F,dentry

Debugging options may require the minimum possible slab order to increase as
a result of storing the metadata (for example, caches with PAGE_SIZE object
sizes).  This has a higher likelihood of resulting in slab allocation errors
in low memory situations or if there's high fragmentation of memory.  To
switch off debugging for such caches by default, use::

	slub_debug=O

You can apply different options to different lists of slab names, using blocks
of options. The following enables red zoning for dentry and user tracking for
kmalloc. All other slabs will not get any debugging enabled::

	slub_debug=Z,dentry;U,kmalloc-*

You can also enable options (e.g. sanity checks and red zoning) for all caches
except some that are deemed too performance critical and don't need to be
debugged by specifying global debug options followed by a list of slab names
with "-" as options::

	slub_debug=FZ;-,zs_handle,zspage

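
The block rules described in this section can be modelled in a few lines of
Python. This is only a sketch of the matching order given in the text, not the
kernel's parser: the function names are invented for the illustration, and it
assumes that an empty option list ("full debugging") expands to F, Z, P and U.

```python
"""Illustrative model of slub_debug block matching; not the kernel's parser."""

# Assumption: "full debugging" (an empty option list) means F, Z, P and U.
FULL_DEBUG = "FZPU"


def _expand(options):
    """Normalize an option string: '' means full debugging, '-' means none."""
    if options == "":
        return FULL_DEBUG
    if options == "-":
        return ""
    return options


def _matches(pattern, cache):
    """A trailing asterisk matches any cache name with that prefix."""
    if pattern.endswith("*"):
        return cache.startswith(pattern[:-1])
    return pattern == cache


def resolve_slub_debug(param, cache):
    """Return the debug option letters that `param` selects for `cache`."""
    global_options = ""      # no "all slabs" block seen: debugging stays off
    selected = None
    for block in param.split(";"):
        options, _, names = block.partition(",")
        if not names:
            global_options = _expand(options)   # the last such block wins
        elif selected is None and any(
            _matches(p, cache) for p in names.split(",")
        ):
            selected = _expand(options)         # the first matching block wins
    return global_options if selected is None else selected
```

With this model, ``resolve_slub_debug("FZ;-,zs_handle,zspage", "zspage")``
yields an empty string (no debugging), while any other cache name yields
``"FZ"``.
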
The state of each debug option for a slab can be found in the respective files
under::

	/sys/kernel/slab/<slab name>/

If the file contains 1, the option is enabled, 0 means disabled. The debug
options from the ``slub_debug`` parameter translate to the following files::

	F	sanity_checks
	Z	red_zone
	P	poison
	U	store_user
	T	trace
	A	failslab

The ``failslab`` file is writable, so writing 1 or 0 will enable or disable
the option at runtime. The write returns -EINVAL if the cache is an alias.

Careful with tracing: It may spew out lots of information and never stop if
used on the wrong slab.

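
As a minimal sketch, the option files listed above can be read from a script.
The helper name ``read_debug_state`` and its ``root`` parameter are made up for
this example; ``root`` exists only so the function can be pointed at a
directory other than ``/sys/kernel/slab``.

```python
import os

# slub_debug option letters and their sysfs file names, as listed above.
OPTION_FILES = {
    "F": "sanity_checks",
    "Z": "red_zone",
    "P": "poison",
    "U": "store_user",
    "T": "trace",
    "A": "failslab",
}


def read_debug_state(cache, root="/sys/kernel/slab"):
    """Return {letter: bool} for the option files that exist for `cache`.

    Files that cannot be read (missing cache, missing file, no permission)
    are left out of the result instead of raising an error.
    """
    state = {}
    for letter, filename in OPTION_FILES.items():
        path = os.path.join(root, cache, filename)
        try:
            with open(path) as f:
                state[letter] = f.read().strip() == "1"
        except OSError:
            continue
    return state
```

For instance, ``read_debug_state("dentry")`` returns a dictionary such as
``{"F": True, "Z": False, ...}``, depending on what the running kernel exposes.
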
Slab merging
============

If no debug options are specified then SLUB may merge similar slabs together
in order to reduce overhead and increase cache hotness of objects.
``slabinfo -a`` displays which slabs were merged together.

Slab validation
===============

SLUB can validate all objects if the kernel was booted with ``slub_debug``. In
order to do so you must have the ``slabinfo`` tool. Then you can do
::

	slabinfo -v

which will test all objects. Output will be generated to the syslog.

This also works in a more limited way if boot was without ``slub_debug``.
In that case ``slabinfo -v`` simply tests all reachable objects. Usually
these are in the cpu slabs and the partial slabs. Full slabs are not
tracked by SLUB in a non-debug situation.

Getting more performance
========================

To some degree SLUB's performance is limited by the need to take the
list_lock once in a while to deal with partial slabs. That overhead is
governed by the order of the allocation for each slab. The allocations
can be influenced by kernel parameters:

.. slub_min_objects=x		(default 4)
.. slub_min_order=x		(default 0)
.. slub_max_order=x		(default 3 (PAGE_ALLOC_COSTLY_ORDER))

``slub_min_objects``
	allows specifying how many objects must at least fit into one
	slab in order for the allocation order to be acceptable.  In
	general slub will be able to perform this number of
	allocations on a slab without consulting centralized resources
	(list_lock) where contention may occur.

``slub_min_order``
	specifies a minimum order of slabs. It has a similar effect to
	``slub_min_objects``.

``slub_max_order``
	specifies the order at which ``slub_min_objects`` should no
	longer be checked. This is useful to avoid SLUB trying to
	generate super large order pages to fit ``slub_min_objects``
	of a slab cache with large object sizes into one high order
	page. Setting the command line parameter
	``debug_guardpage_minorder=N`` (N > 0) forces
	``slub_max_order`` to 0, which results in the minimum possible
	order for slab allocations.

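
How the three parameters interact can be illustrated with a deliberately
simplified model. The real allocator also weighs wasted space and other
factors, so the sketch below is not its algorithm; the 4096-byte page size and
the function names are assumptions made for this example.

```python
PAGE_SIZE = 4096  # assumed page size; this is architecture dependent


def objects_per_slab(size, order):
    """Number of `size`-byte objects that fit in a slab of 2**order pages."""
    return (PAGE_SIZE << order) // size


def pick_order(size, min_objects=4, min_order=0, max_order=3):
    """Simplified order selection for objects of `size` bytes."""
    # Prefer the smallest order that holds at least min_objects objects.
    for order in range(min_order, max_order + 1):
        if objects_per_slab(size, order) >= min_objects:
            return order
    # Past max_order the min_objects requirement is dropped: take the
    # smallest order that still fits a single object.
    order = min_order
    while objects_per_slab(size, order) < 1:
        order += 1
    return order
```

In this model a 2048-byte object needs order 1 to reach four objects per slab,
while setting ``max_order=0`` drops it back to order 0 with two objects per
slab.
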
SLUB Debug output
=================

Here is a sample of slub debug output::

 ====================================================================
 BUG kmalloc-8: Right Redzone overwritten
 --------------------------------------------------------------------

 INFO: 0xc90f6d28-0xc90f6d2b. First byte 0x00 instead of 0xcc
 INFO: Slab 0xc528c530 flags=0x400000c3 inuse=61 fp=0xc90f6d58
 INFO: Object 0xc90f6d20 @offset=3360 fp=0xc90f6d58
 INFO: Allocated in get_modalias+0x61/0xf5 age=53 cpu=1 pid=554

 Bytes b4 (0xc90f6d10): 00 00 00 00 00 00 00 00 5a 5a 5a 5a 5a 5a 5a 5a ........ZZZZZZZZ
 Object   (0xc90f6d20): 31 30 31 39 2e 30 30 35                         1019.005
 Redzone  (0xc90f6d28): 00 cc cc cc                                     .
 Padding  (0xc90f6d50): 5a 5a 5a 5a 5a 5a 5a 5a                         ZZZZZZZZ

   [<c010523d>] dump_trace+0x63/0x1eb
   [<c01053df>] show_trace_log_lvl+0x1a/0x2f
   [<c010601d>] show_trace+0x12/0x14
   [<c0106035>] dump_stack+0x16/0x18
   [<c017e0fa>] object_err+0x143/0x14b
   [<c017e2cc>] check_object+0x66/0x234
   [<c017eb43>] __slab_free+0x239/0x384
   [<c017f446>] kfree+0xa6/0xc6
   [<c02e2335>] get_modalias+0xb9/0xf5
   [<c02e23b7>] dmi_dev_uevent+0x27/0x3c
   [<c027866a>] dev_uevent+0x1ad/0x1da
   [<c0205024>] kobject_uevent_env+0x20a/0x45b
   [<c020527f>] kobject_uevent+0xa/0xf
   [<c02779f1>] store_uevent+0x4f/0x58
   [<c027758e>] dev_attr_store+0x29/0x2f
   [<c01bec4f>] sysfs_write_file+0x16e/0x19c
   [<c0183ba7>] vfs_write+0xd1/0x15a
   [<c01841d7>] sys_write+0x3d/0x72
   [<c0104112>] sysenter_past_esp+0x5f/0x99
   [<b7f7b410>] 0xb7f7b410
   =======================

 FIX kmalloc-8: Restoring Redzone 0xc90f6d28-0xc90f6d2b=0xcc

If SLUB encounters a corrupted object (full detection requires the kernel
to be booted with slub_debug) then the following output will be dumped
into the syslog:

1. Description of the problem encountered

   This will be a message in the system log starting with::

     ===============================================
     BUG <slab cache affected>: <What went wrong>
     -----------------------------------------------

     INFO: <corruption start>-<corruption_end> <more info>
     INFO: Slab <address> <slab information>
     INFO: Object <address> <object information>
     INFO: Allocated in <kernel function> age=<jiffies since alloc> cpu=<allocated by
	cpu> pid=<pid of the process>
     INFO: Freed in <kernel function> age=<jiffies since free> cpu=<freed by cpu>
	pid=<pid of the process>

   (Object allocation / free information is only available if SLAB_STORE_USER is
   set for the slab. slub_debug sets that option.)

2. The object contents if an object was involved.

   Various types of lines can follow the BUG SLUB line:

   Bytes b4 <address> : <bytes>
	Shows a few bytes before the object where the problem was detected.
	Can be useful if the corruption does not stop with the start of the
	object.

   Object <address> : <bytes>
	The bytes of the object. If the object is inactive then the bytes
	typically contain poison values. Any non-poison value shows a
	corruption by a write after free.

   Redzone <address> : <bytes>
	The Redzone following the object. The Redzone is used to detect
	writes after the object. All bytes should always have the same
	value. If there is any deviation then it is due to a write after
	the object boundary.

	(Redzone information is only available if SLAB_RED_ZONE is set.
	slub_debug sets that option.)

   Padding <address> : <bytes>
	Unused data to fill up the space in order to get the next object
	properly aligned. In the debug case we make sure that there are
	at least 4 bytes of padding. This allows the detection of writes
	before the object.

3. A stackdump

   The stackdump describes the location where the error was detected. The cause
   of the corruption is more likely to be found by looking at the function that
   allocated or freed the object.

4. Report on how the problem was dealt with in order to ensure the continued
   operation of the system.

   These are messages in the system log beginning with::

	FIX <slab cache affected>: <corrective action taken>

   In the above sample SLUB found that the Redzone of an active object has
   been overwritten. Here a string of 8 characters was written into a slab that
   has the length of 8 characters. However, an 8 character string needs a
   terminating 0. That zero has overwritten the first byte of the Redzone field.
   After reporting the details of the issue encountered the FIX SLUB message
   tells us that SLUB has restored the Redzone to its proper value and then
   system operations continue.

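
The overflow in the sample report can be replayed in user space. The following
is a toy model only: it mimics the check and the repair on a Python
``bytearray`` and shares no code with the allocator. The helper names are
invented for the example; the fill byte 0xcc and the 4-byte red zone are taken
from the sample output.

```python
RED_ACTIVE = 0xCC   # red zone fill byte seen in the sample report


def make_object(size, redzone_len=4):
    """Model one object followed by its red zone as a single bytearray."""
    return bytearray(size) + bytearray([RED_ACTIVE] * redzone_len)


def check_redzone(buf, size):
    """Return (offset, value) of the first bad red zone byte, or None."""
    for offset in range(size, len(buf)):
        if buf[offset] != RED_ACTIVE:
            return offset, buf[offset]
    return None


def restore_redzone(buf, size):
    """Rewrite the whole red zone with its fill value."""
    for offset in range(size, len(buf)):
        buf[offset] = RED_ACTIVE


obj = make_object(8)
payload = b"1019.005" + b"\x00"      # 8 characters plus the terminating 0
obj[:len(payload)] = payload         # 9 bytes written into an 8 byte object

print(check_redzone(obj, 8))         # (8, 0): first red zone byte is 0x00
restore_redzone(obj, 8)
print(check_redzone(obj, 8))         # None: red zone is intact again
```

The object contents are left untouched by the repair; only the red zone bytes
are rewritten, which mirrors the "Restoring Redzone" message in the sample.
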
Emergency operations
====================

Minimal debugging (sanity checks alone) can be enabled by booting with::

	slub_debug=F

This will generally be enough to enable the resiliency features of slub
which will keep the system running even if a bad kernel component keeps
corrupting objects. This may be important for production systems.
Performance will be impacted by the sanity checks and there will be a
continual stream of error messages to the syslog but no additional memory
will be used (unlike full debugging).

No guarantees. The kernel component still needs to be fixed. Performance
may be optimized further by locating the slab that experiences corruption
and enabling debugging only for that cache.

For example::

	slub_debug=F,dentry

If the corruption occurs by writing after the end of the object then it
may be advisable to enable a Redzone to avoid corrupting the beginning
of other objects::

	slub_debug=FZ,dentry

Extended slabinfo mode and plotting
===================================

The ``slabinfo`` tool has a special 'extended' ('-X') mode that includes:

 - Slabcache Totals
 - Slabs sorted by size (up to -N <num> slabs, default 1)
 - Slabs sorted by loss (up to -N <num> slabs, default 1)

Additionally, in this mode ``slabinfo`` does not dynamically scale
sizes (G/M/K) and reports everything in bytes (this functionality is
also available to other slabinfo modes via the '-B' option) which makes
reporting more precise and accurate. Moreover, in some sense the '-X'
mode also simplifies the analysis of slabs' behaviour, because its
output can be plotted using the ``slabinfo-gnuplot.sh`` script. So it
pushes the analysis from looking through the numbers (tons of numbers)
to something easier: visual analysis.

To generate plots:

a) collect slabinfo extended records, for example::

	while [ 1 ]; do slabinfo -X >> FOO_STATS; sleep 1; done

b) pass stats file(-s) to ``slabinfo-gnuplot.sh`` script::

	slabinfo-gnuplot.sh FOO_STATS [FOO_STATS2 .. FOO_STATSN]

   The ``slabinfo-gnuplot.sh`` script pre-processes the collected records
   and generates 3 png files (and 3 pre-processing cache files) per STATS
   file:

   - Slabcache Totals: FOO_STATS-totals.png
   - Slabs sorted by size: FOO_STATS-slabs-by-size.png
   - Slabs sorted by loss: FOO_STATS-slabs-by-loss.png

Another case where ``slabinfo-gnuplot.sh`` can be useful is when you
need to compare slabs' behaviour "prior to" and "after" some code
modification.  To help you out there, the ``slabinfo-gnuplot.sh`` script
can 'merge' the `Slabcache Totals` sections from different
measurements. To visually compare N plots:

a) Collect as many STATS1, STATS2, .. STATSN files as you need::

	while [ 1 ]; do slabinfo -X >> STATS<X>; sleep 1; done

b) Pre-process those STATS files::

	slabinfo-gnuplot.sh STATS1 STATS2 .. STATSN

c) Execute ``slabinfo-gnuplot.sh`` in '-t' mode, passing all of the
   generated pre-processed \*-totals::

	slabinfo-gnuplot.sh -t STATS1-totals STATS2-totals .. STATSN-totals

   This will produce a single plot (png file).

   Plots, expectedly, can be large so some fluctuations or small spikes
   can go unnoticed. To deal with that, ``slabinfo-gnuplot.sh`` has two
   options to 'zoom-in'/'zoom-out':

   a) ``-s %d,%d`` -- overwrites the default image width and height
   b) ``-r %d,%d`` -- specifies a range of samples to use (for example,
      in ``slabinfo -X >> FOO_STATS; sleep 1;`` case, using a ``-r
      40,60`` range will plot only samples collected between 40th and
      60th seconds).


DebugFS files for SLUB
======================

For more information about the current state of SLUB caches with the user
tracking debug option enabled, debugfs files are available, typically under
/sys/kernel/debug/slab/<cache>/ (created only for caches with enabled user
tracking). There are 2 types of these files with the following debug
information:

1. alloc_traces

   Prints information about unique allocation traces of the currently
   allocated objects. The output is sorted by frequency of each trace.

   Information in the output:
   Number of objects, allocating function, possible memory wastage of
   kmalloc objects (total/per-object), minimal/average/maximal jiffies
   since alloc, pid range of the allocating processes, cpu mask of
   allocating cpus, numa node mask of origins of memory, and stack trace.

   Example::

     338 pci_alloc_dev+0x2c/0xa0 waste=521872/1544 age=290837/291891/293509 pid=1 cpus=106 nodes=0-1
         __kmem_cache_alloc_node+0x11f/0x4e0
         kmalloc_trace+0x26/0xa0
         pci_alloc_dev+0x2c/0xa0
         pci_scan_single_device+0xd2/0x150
         pci_scan_slot+0xf7/0x2d0
         pci_scan_child_bus_extend+0x4e/0x360
         acpi_pci_root_create+0x32e/0x3b0
         pci_acpi_scan_root+0x2b9/0x2d0
         acpi_pci_root_add.cold.11+0x110/0xb0a
         acpi_bus_attach+0x262/0x3f0
         device_for_each_child+0xb7/0x110
         acpi_dev_for_each_child+0x77/0xa0
         acpi_bus_attach+0x108/0x3f0
         device_for_each_child+0xb7/0x110
         acpi_dev_for_each_child+0x77/0xa0
         acpi_bus_attach+0x108/0x3f0

2. free_traces

   Prints information about unique freeing traces of the currently allocated
   objects. The freeing traces thus come from the previous life-cycle of the
   objects and are reported as not available for objects allocated for the first
   time. The output is sorted by frequency of each trace.

   Information in the output:
   Number of objects, freeing function, minimal/average/maximal jiffies since free,
   pid range of the freeing processes, cpu mask of freeing cpus, and stack trace.

   Example::

     1980 <not-available> age=4294912290 pid=0 cpus=0
     51 acpi_ut_update_ref_count+0x6a6/0x782 age=236886/237027/237772 pid=1 cpus=1
         kfree+0x2db/0x420
         acpi_ut_update_ref_count+0x6a6/0x782
         acpi_ut_update_object_reference+0x1ad/0x234
         acpi_ut_remove_reference+0x7d/0x84
         acpi_rs_get_prt_method_data+0x97/0xd6
         acpi_get_irq_routing_table+0x82/0xc4
         acpi_pci_irq_find_prt_entry+0x8e/0x2e0
         acpi_pci_irq_lookup+0x3a/0x1e0
         acpi_pci_irq_enable+0x77/0x240
         pcibios_enable_device+0x39/0x40
         do_pci_enable_device.part.0+0x5d/0xe0
         pci_enable_device_flags+0xfc/0x120
         pci_enable_device+0x13/0x20
         virtio_pci_probe+0x9e/0x170
         local_pci_probe+0x48/0x80
         pci_device_probe+0x105/0x1c0

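
The first line of each record in these files is regular enough to be split by
a script. The following sketch assumes the layout shown in the examples above
(a count, a function, then ``key=value`` fields); the function name
``parse_trace_header`` is made up for the illustration, and stack trace lines
are not handled.

```python
def parse_trace_header(line):
    """Split the first line of an alloc_traces/free_traces record."""
    tokens = line.split()
    record = {"count": int(tokens[0]), "function": tokens[1]}
    for token in tokens[2:]:
        key, _, value = token.partition("=")
        if key in ("waste", "age"):
            # Slash-separated numbers: total/per-object for waste,
            # min/avg/max (or a single value) for age.
            record[key] = [int(v) for v in value.split("/")]
        else:
            record[key] = value   # pid, cpus and nodes are kept as text
    return record
```

Applied to the alloc_traces example, the result has ``count`` 338, ``function``
``pci_alloc_dev+0x2c/0xa0`` and ``waste`` ``[521872, 1544]``.
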
Christoph Lameter, May 30, 2007
Sergey Senozhatsky, October 23, 2015