
1 .\" Copyright (c) 1988 The Regents of the University of California.
7 .\" 1. Redistributions of source code must retain the above copyright
8 .\" notice, this list of conditions and the following disclaimer.
10 .\" notice, this list of conditions and the following disclaimer in the
12 .\" 3. Neither the name of the University nor the names of its contributors
18 .\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
21 .\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
22 .\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
23 .\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
25 .\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
\fIProceedings of the San Francisco USENIX Conference\fP,
Design of a General Purpose Memory Allocator for the 4.3BSD UNIX\(dg Kernel
.EH 'Design of a General Purpose Memory ...''McKusick, Karels'
.OH 'McKusick, Karels''Design of a General Purpose Memory ...'
\(dgUNIX is a registered trademark of AT&T in the US and other countries.
Department of Electrical Engineering and Computer Science
University of California, Berkeley
each designed for the particular needs of the utilizing subsystem.
that can be used by all of the kernel subsystems.
The design of this allocator takes advantage of known memory usage
results in more efficient use of global memory by eliminating
The paper concludes with a discussion of our experience in using
Some of them handle large blocks,
some of them handle small chained data structures,
Often the allocations are for small pieces of memory that are only
needed for the duration of a single system call.
it is not feasible to allocate even moderate blocks of memory on it.
Other blocks of memory must be more persistent than a single system call
the duration of the network connection.
Each time a new type of memory allocation has been required,
For example, the block device subsystem provides a crude form of
memory allocation through the allocation of empty buffers [Thompson78].
The allocation is slow because of the implied semantics of
finding the oldest buffer, pushing its contents to disk if they are dirty,
and moving physical memory into or out of the buffer to create
the requested size.
To reduce the overhead, a ``new'' memory allocator was built in 4.3BSD
for name translation that allocates a pool of empty buffers.
First, the new allocator can only handle a limited range of sizes.
Finally, it creates yet another interface of
which the programmer must be aware.
Rather than providing many semi-specialized ways of allocating memory,
it helps avoid the syndrome of creating yet another special
purpose memory allocator.
To ease the task of understanding how to use it,
size of memory that is needed.
The range of sizes for memory requests should not be constrained.
The free routine should take a pointer to the storage being freed,
and should not require additional information such as the size
of the piece of memory being freed.
Good use of memory is measured by the amount of memory needed to hold
a set of allocations at any point in time.
Percentage utilization is expressed as
.EQ
utilization~=~requested over required
.EN
Here, ``requested'' is the sum of the memory that has been requested
and not yet freed.
``Required'' is the amount of memory that has been
allocated for the pool from which the requests are filled.
An allocator requires more memory than requested because of fragmentation
and a need to have a ready supply of free memory for future requests.
A perfect memory allocator would have a utilization of 100%.
unused parts of their address space can be paged out.
that are part of the ``required'' pool that are not
Speed of allocation is more critical when executing in the
and if it is slow, it will degrade the performance of every process
Instead they will build their own memory allocator on top of the
original by maintaining their own pool of memory blocks.
The kernel ends up with many different free lists of memory
instead of a single free list from which all allocation can be drawn.
consider the case of two subsystems that need memory.
If they have their own free lists,
the amount of memory tied up in the two lists will be the
sum of the greatest amount of memory that each of
the two subsystems ever used.
If instead they share a free list,
the amount of memory tied up in the free list may be as low as the
greatest amount of memory that either subsystem used.
As the number of subsystems grows,
the savings from having a single free list grow.
implementations of user-level memory allocators.
A survey of those available on UNIX systems appeared in [Korn85].
Nearly all of the memory allocators tested made good use of memory,
though most of them were too slow for use in the kernel.
The fastest memory allocator in the survey by nearly a factor of two
was the memory allocator provided on 4.2BSD originally
written by Chris Kingsley at California Institute of Technology.
The 4.2BSD user-level memory allocator works by maintaining a set of lists
that are ordered by increasing powers of two.
Each list contains a set of memory blocks of its corresponding size.
To satisfy a memory request,
the size of the request is rounded up to the next power of two.
A piece of memory is then removed from the list corresponding
to the specified power of two and returned to the requester.
Thus, a request for a block of memory of size 53 returns
a 64-byte block.
When a block is freed,
the block of memory is put back onto the list from which it came.
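.P
To make the strategy concrete, the following C sketch models the
power-of-two list algorithm described above.
It is our illustration, not the 4.2BSD source: the names
\fIfreelist\fP, \fIbucket_index\fP, \fIpow2_alloc\fP, and
\fIpow2_free\fP are invented, and the sketch takes the size on free
rather than recording it in a header as the real allocator does.
.DS
#include <stddef.h>
#include <stdlib.h>

#define NBUCKETS 32             /* lists for sizes 2^0 through 2^31 */

/* One free list per power of two; the first word of each free
 * block links it to the next block on its list. */
static void *freelist[NBUCKETS];

/* Index of the list whose blocks are the least power of two >= size. */
static int
bucket_index(size_t size)
{
	int indx = 0;

	while (((size_t)1 << indx) < size)
		indx++;
	return (indx);
}

void *
pow2_alloc(size_t size)
{
	int indx = bucket_index(size);
	void *block = freelist[indx];

	if (block != NULL)              /* reuse a block from the list */
		freelist[indx] = *(void **)block;
	else                            /* list empty: get fresh memory */
		block = malloc((size_t)1 << indx);
	return (block);
}

void
pow2_free(void *block, size_t size)
{
	int indx = bucket_index(size);

	/* put the block back onto the list from which it came */
	*(void **)block = freelist[indx];
	freelist[indx] = block;
}
.DE
.P
A request for 53 bytes, for example, indexes the 64-byte list;
whatever block comes back is at least 64 bytes long.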
This number is never more than the amount of physical memory on the machine,
Thus, the kernel can statically allocate a set of data structures
For a user process, the maximum amount of memory that may be allocated
is a function of the maximum size of its virtual memory.
Another special condition of the kernel memory allocator is that it
the kernel can keep an arena of kernel addresses and allocate
The effect is much the same as a user process that has parts of
its address space paged out,
except that the kernel can explicitly control the set of pages
allocated to its address space.
The result is that the ``working set'' of pages in use by the
kernel exactly corresponds to the set of pages that it is really using.
all of the different uses of dynamic memory are known in advance.
Each one of these uses of dynamic memory can be assigned a type.
For each type of dynamic memory that is allocated,
the kernel can keep statistics and place limits on its use, so that
no single allocator could starve the rest of the kernel of all
its available memory.
By putting limits on each type of memory,
\*(Lb shows the memory usage of the kernel over a one-day period
the ``Requests'' field is the number of allocations since system startup;
the ``High Use'' field is the maximum value of
the ``Mem Use'' field since system startup.
fast for small pieces of memory.
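.P
The per-type accounting just described might be kept in a record like
the following sketch; the field and function names are our own,
chosen to match the fields of the usage table rather than copied from
the kernel's actual declarations.
.DS
/* Accounting kept for each type of dynamic memory (illustrative). */
struct type_stats {
	long	calls;          /* "Requests": allocations since startup */
	long	memuse;         /* "Mem Use": bytes currently held */
	long	highuse;        /* "High Use": maximum of memuse */
	long	limit;          /* ceiling on memuse for this type */
};

/* Charge an allocation to its type, enforcing the type's limit so
 * that no single type can starve the others of memory. */
static int
type_charge(struct type_stats *tp, long size)
{
	if (tp->memuse + size > tp->limit)
		return (0);     /* over limit: caller must wait or fail */
	tp->memuse += size;
	tp->calls++;
	if (tp->memuse > tp->highuse)
		tp->highuse = tp->memuse;
	return (1);
}
.DE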
.H 1 "Implementation of the Kernel Memory Allocator"
none of their strategies could be used without some modification.
Small allocations are done using the 4.2BSD power-of-two list strategy;
the typical allocation requires only a computation of
the list to use and the removal of an element if it is available,
so it is quite fast.
Macros are provided to avoid the cost of a subroutine call.
Appendix A shows the data structures and implementation of the macros.
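.P
In outline, the allocation path is an interrupt-protected list
removal with a fall-back call into the allocator proper.
The sketch below is written as a C function for readability;
the kernel expresses the same logic as a macro (see Appendix A) so
that the common case costs no subroutine call.
The names are illustrative: splimp() and splx() stand for whatever
mechanism blocks out and restores interrupts, bucket[] is the array
of power-of-two free lists, bucket_index() is as in the earlier
sketch, and slow_malloc() is the full allocator.
.DS
struct bucket { void *next; };  /* head of one power-of-two free list */
extern struct bucket bucket[];

void *
fast_malloc(size_t size)
{
	struct bucket *kbp = &bucket[bucket_index(size)];
	void *space;
	int s;

	s = splimp();                   /* block out interrupts */
	if (kbp->next == NULL) {
		/* list empty: take the slow path into the allocator */
		space = slow_malloc(size);
	} else {
		/* common case: unlink the first element of the list */
		space = kbp->next;
		kbp->next = *(void **)space;
	}
	splx(s);                        /* restore interrupt priority */
	return (space);
}
.DE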
Similarly, freeing a block of memory can be done with a macro.
The free routine is called only if the block of memory is
considered to be a large allocation.
Including the cost of blocking out interrupts,
Because of the inefficiency of power-of-two allocation strategies
The selection of two kilobytes is derived from our statistics on
the utilization of memory within the kernel,
which showed that 95 to 98% of allocations are of size one kilobyte or less.
A frequent caller of the memory allocator
pieces of memory in multiples of pages.
Consequently, the actual allocation size for requests of size
observes that the power-of-two algorithm yields sizes of 1, 2, 4, 8, \&...
pages, while the allocation in multiples
of pages yields sizes of 1, 2, 3, 4, \&... pages.
Thus for allocations of sizes between one and two pages
both algorithms use two pages;
it is not until allocations of sizes between two and three pages
that a difference emerges where the power-of-two algorithm will use
four pages while the allocation in page multiples uses only three.
Large allocations are first rounded up to be a multiple of the page size.
Thus a request for a five kilobyte piece of memory will use exactly
five pages of memory rather than eight kilobytes as with
the power-of-two allocation strategy.
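.P
The two sizing rules are simple enough to state directly in C.
The sketch below assumes a one kilobyte page; in the kernel the
constant would come from the machine parameter files rather than
being written out this way.
.DS
#include <stddef.h>

#define PAGESIZE 1024           /* assumed page size, in bytes */

/* Size actually allocated by the power-of-two strategy. */
size_t
pow2_roundup(size_t size)
{
	size_t allocsize = 1;

	while (allocsize < size)
		allocsize <<= 1;
	return (allocsize);
}

/* Size actually allocated by the multiple-of-page strategy. */
size_t
page_roundup(size_t size)
{
	return ((size + PAGESIZE - 1) / PAGESIZE * PAGESIZE);
}
.DE
.P
For a five kilobyte request, page_roundup() returns 5120 bytes
(five pages) where pow2_roundup() returns 8192.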
When a large piece of memory is freed,
its pages are returned to the free page pool.
Another technique to improve both the efficiency of memory utilization
and the speed of allocation
is to cluster same-sized small allocations on a page.
When a list for a power-of-two allocation is empty,
a new page is allocated and divided into pieces of the needed size.
This strategy speeds future allocations as several pieces of memory
become available as a result of the call into the allocator.
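.P
A sketch of this page-carving step, again with invented names and the
PAGESIZE constant from the earlier sketch:
.DS
/* Divide a fresh page into pieces of one size and chain every piece
 * onto that size's free list, so that a single call into the
 * allocator satisfies many future requests. */
static void
carve_page(char *page, size_t piecesize, void **listp)
{
	char *p;

	for (p = page; p + piecesize <= page + PAGESIZE; p += piecesize) {
		*(void **)p = *listp;   /* link piece onto the list */
		*listp = p;
	}
}
.DE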
.FI "Calculation of allocation size"
Because the size is not specified when a block of memory is freed,
the allocator must keep track of the sizes of the pieces it has handed out.
The 4.2BSD user-level allocator stores the size of each block
in a header just before the allocation.
However, this strategy doubles the memory requirement for allocations that
require a power-of-two-sized block.
Therefore,
instead of storing the size of each piece of memory with the piece itself,
the kernel associates the size information with the memory page.
The size of a piece of memory that is being freed
is found by calculating the page in which it resides
and looking up the size associated with that page.
Eliminating the cost of the overhead per piece improved utilization
far more than expected.
The reason is that many allocations in the kernel are for blocks of
memory whose size is exactly a power of two.
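.P
This page-oriented bookkeeping can be sketched as a single array
indexed by page number; the names kmembase and kmemsizes here are
illustrative stand-ins for the kernel's own arena base and usage
array.
.DS
#define PAGESIZE   1024         /* assumed page size */
#define NKMEMPAGES 1024         /* assumed size of the arena, in pages */

static char *kmembase;               /* base of the arena (set at boot) */
static short kmemsizes[NKMEMPAGES];  /* piece size for each page */

/* The size of a piece being freed is found from its address alone:
 * compute the page it lies in and look up that page's piece size. */
static long
piece_size(void *addr)
{
	long pagenum = ((char *)addr - kmembase) / PAGESIZE;

	return ((long)kmemsizes[pagenum]);
}
.DE
.P
Because every piece on a page has the same size, one entry per page
suffices and no per-piece header is needed.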
The allocator can be called both from the top half of the kernel,
which is willing to wait for memory to become available,
and from the interrupt routines in the bottom half of the kernel
that cannot wait for memory to become available.
.H 1 "Results of the Implementation"
Many of the special purpose allocators have been eliminated.
Many of the special purpose memory allocators built on
top of other allocators have also been eliminated.
For example, the allocator that was built on top of the buffer pool allocator
for name translation has been eliminated.
we have found that none of the special purpose pools are needed.
Indeed, the allocation is about the same as the previous cost of
Quantifying the performance of the allocator is difficult because
it is hard to measure the amount of time spent allocating
and freeing memory in the kernel.
The usual approach is to compile a kernel for profiling
and then compare the running time of the routines that
and for its intended purpose of providing buffers to the filesystem.
To get a measure of the cost of memory allocation before
the new allocator was introduced,
we summed up the running time of all the routines whose
exclusive task was memory allocation,
together with the fraction
of the running time of the multi-purpose routines that could
reasonably be charged to memory allocation.
This number showed that approximately three percent of
the time spent in the kernel could be accounted to memory allocation.
because the usual case of the memory allocator is implemented as a macro.
Thus, its running time is a small fraction of the running time of the
numerous routines in the kernel that use it.
Factoring out the cost of the statistics collection and the
at most four percent of time in the kernel.
It also reduces the amount of bookkeeping information associated
with the allocator to four kilobytes of information
per megabyte of memory under management (with a one kilobyte page size).
Our next project is to convert many of the static
kernel tables to be dynamically allocated.
First, it will reduce the amount of memory
a quarter megabyte piece of memory once,
If a burst of requests came in for a particular size,
that size would acquire a large amount of memory
that would never be freed.
One way to alleviate this problem is a routine
that can run as part of the idle loop that would sort the elements
on each of the free lists into order of increasing address.
Since any given page has only one size of elements allocated from it,
the effect of the sorting would be to sort the list into distinct pages.
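.P
A sketch of such a sorting pass follows; it assumes free lists that
link blocks through their first word, as in the earlier sketches,
and uses a simple insertion sort since the idle loop is in no hurry.
.DS
#include <stddef.h>

/* Sort one free list into increasing address order and return the
 * new head.  After sorting, pieces from the same page are adjacent,
 * so a page whose every piece appears in one run could be unlinked
 * and returned to the page pool. */
static void *
sort_free_list(void *head)
{
	void *sorted = NULL;

	while (head != NULL) {
		void *piece = head;
		void **pp = &sorted;

		head = *(void **)head;          /* detach first piece */
		while (*pp != NULL && (char *)*pp < (char *)piece)
			pp = (void **)*pp;      /* advance to next link */
		*(void **)piece = *pp;          /* insert before *pp */
		*pp = piece;
	}
	return (sorted);
}
.DE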
When all the pieces of a page became free,
the page itself could be released back to the free page pool.
Although there is no guarantee that all the pieces of a page would ever
be freed,
most allocations are short-lived, lasting only for the duration of
a single system call.
As new allocations would be made from the page at
the front of the list,
return of elements from pages at the back would eventually
allow complete pages at the back to be freed.
.P
Two of the traditional UNIX
memory allocators remain in the current system.
That part of the system is expected to undergo major revision within
This change will allow the size of the buffer pool to be changed
.P
In the spirit of community support,
we have made various versions of our allocator available to our test sites.
The feedback from the Usenix program committee on the initial draft of
this paper suggested numerous important improvements.
Korn, D., Vo, K.,
``In Search of a Better Malloc''
\fIProceedings of the Portland Usenix Conference\fP,
June 1985.
.P
\fIProceedings of the Portland Usenix Conference\fP,
.P
\fIProceedings of the San Francisco Usenix Conference\fP,