Following are change highlights associated with official releases. Important
bug fixes are all mentioned, but some internal enhancements are omitted here for
brevity. Much more detail can be found in the git revision history:

    https://github.com/jemalloc/jemalloc

* 5.0.0 (June 13, 2017)

  Unlike all previous jemalloc releases, this release does not use naturally
  aligned "chunks" for virtual memory management, and instead uses page-aligned
  "extents". This change has few externally visible effects, but the internal
  impacts are... extensive. Many other internal changes combine to make this
  the most cohesively designed version of jemalloc so far, with ample
  opportunity for further enhancements.

  Continuous integration is now an integral aspect of development thanks to the
  efforts of @davidtgoldblatt, and the dev branch tends to remain reasonably
  stable on the tested platforms (Linux, FreeBSD, macOS, and Windows). As a
  side effect the official release frequency may decrease over time.

  New features:
  - Implement optional per-CPU arena support; threads choose which arena to use
    based on current CPU rather than on fixed thread-->arena associations.
    (@interwq)
  - Implement two-phase decay of unused dirty pages. Pages transition from
    dirty-->muzzy-->clean, where the first phase transition relies on
    madvise(...
    MADV_FREE) semantics, and the second phase transition discards
    pages such that they are replaced with demand-zeroed pages on next access.
    (@jasone)
  - Increase decay time resolution from seconds to milliseconds. (@jasone)
  - Implement opt-in per CPU background threads, and use them for asynchronous
    decay-driven unused dirty page purging. (@interwq)
  - Add mutex profiling, which collects a variety of statistics useful for
    diagnosing overhead/contention issues. (@interwq)
  - Add C++ new/delete operator bindings. (@djwatson)
  - Support manually created arena destruction, such that all data and metadata
    are discarded. Add MALLCTL_ARENAS_DESTROYED for accessing merged stats
    associated with destroyed arenas. (@jasone)
  - Add MALLCTL_ARENAS_ALL as a fixed index for use in accessing
    merged/destroyed arena statistics via mallctl. (@jasone)
  - Add opt.abort_conf to optionally abort if invalid configuration options are
    detected during initialization. (@interwq)
  - Add opt.stats_print_opts, so that e.g. JSON output can be selected for the
    stats dumped during exit if opt.stats_print is true. (@jasone)
  - Add --with-version=VERSION for use when embedding jemalloc into another
    project's git repository. (@jasone)
  - Add --disable-thp to support cross compiling. (@jasone)
  - Add --with-lg-hugepage to support cross compiling.
    (@jasone)
  - Add mallctl interfaces (various authors):
    + background_thread
    + opt.abort_conf
    + opt.retain
    + opt.percpu_arena
    + opt.background_thread
    + opt.{dirty,muzzy}_decay_ms
    + opt.stats_print_opts
    + arena.<i>.initialized
    + arena.<i>.destroy
    + arena.<i>.{dirty,muzzy}_decay_ms
    + arena.<i>.extent_hooks
    + arenas.{dirty,muzzy}_decay_ms
    + arenas.bin.<i>.slab_size
    + arenas.nlextents
    + arenas.lextent.<i>.size
    + arenas.create
    + stats.background_thread.{num_threads,num_runs,run_interval}
    + stats.mutexes.{ctl,background_thread,prof,reset}.
      {num_ops,num_spin_acq,num_wait,max_wait_time,total_wait_time,max_num_thds,
      num_owner_switch}
    + stats.arenas.<i>.{dirty,muzzy}_decay_ms
    + stats.arenas.<i>.uptime
    + stats.arenas.<i>.{pmuzzy,base,internal,resident}
    + stats.arenas.<i>.{dirty,muzzy}_{npurge,nmadvise,purged}
    + stats.arenas.<i>.bins.<j>.{nslabs,reslabs,curslabs}
    + stats.arenas.<i>.bins.<j>.mutex.
      {num_ops,num_spin_acq,num_wait,max_wait_time,total_wait_time,max_num_thds,
      num_owner_switch}
    + stats.arenas.<i>.lextents.<j>.{nmalloc,ndalloc,nrequests,curlextents}
    + stats.arenas.<i>.mutexes.{large,extent_avail,extents_dirty,extents_muzzy,
      extents_retained,decay_dirty,decay_muzzy,base,tcache_list}.
      {num_ops,num_spin_acq,num_wait,max_wait_time,total_wait_time,max_num_thds,
      num_owner_switch}

  Portability improvements:
  - Improve reentrant allocation support, such that deadlock is less likely if
    e.g. a system library call in turn allocates memory. (@davidtgoldblatt,
    @interwq)
  - Support static linking of jemalloc with glibc. (@djwatson)

  Optimizations and refactors:
  - Organize virtual memory as "extents" of virtual memory pages, rather than as
    naturally aligned "chunks", and store all metadata in arbitrarily distant
    locations. This reduces virtual memory external fragmentation, and will
    interact better with huge pages (not yet explicitly supported). (@jasone)
  - Fold large and huge size classes together; only small and large size classes
    remain. (@jasone)
  - Unify the allocation paths, and merge most fast-path branching decisions.
    (@davidtgoldblatt, @interwq)
  - Embed per thread automatic tcache into thread-specific data, which reduces
    conditional branches and dereferences. Also reorganize tcache to increase
    fast-path data locality. (@interwq)
  - Rewrite atomics to closely model the C11 API, convert various
    synchronization from mutex-based to atomic, and use the explicit memory
    ordering control to resolve various hypothetical races without increasing
    synchronization overhead.
    (@davidtgoldblatt)
  - Extensively optimize rtree via various methods:
    + Add multiple layers of rtree lookup caching, since rtree lookups are now
      part of fast-path deallocation. (@interwq)
    + Determine rtree layout at compile time. (@jasone)
    + Make the tree shallower for common configurations. (@jasone)
    + Embed the root node in the top-level rtree data structure, thus avoiding
      one level of indirection. (@jasone)
    + Further specialize leaf elements as compared to internal node elements,
      and directly embed extent metadata needed for fast-path deallocation.
      (@jasone)
    + Ignore leading always-zero address bits (architecture-specific).
      (@jasone)
  - Reorganize headers (ongoing work) to make them hermetic, and disentangle
    various module dependencies. (@davidtgoldblatt)
  - Convert various internal data structures such as size class metadata from
    boot-time-initialized to compile-time-initialized. Propagate resulting data
    structure simplifications, such as making arena metadata fixed-size.
    (@jasone)
  - Simplify size class lookups when constrained to size classes that are
    multiples of the page size. This speeds lookups, but the primary benefit is
    complexity reduction in code that was the source of numerous regressions.
    (@jasone)
  - Lock individual extents when possible for localized extent operations,
    rather than relying on a top-level arena lock.
    (@davidtgoldblatt, @jasone)
  - Use first fit layout policy instead of best fit, in order to improve
    packing. (@jasone)
  - If munmap(2) is not in use, use an exponential series to grow each arena's
    virtual memory, so that the number of disjoint virtual memory mappings
    remains low. (@jasone)
  - Implement per arena base allocators, so that arenas never share any virtual
    memory pages. (@jasone)
  - Automatically generate private symbol name mangling macros. (@jasone)

  Incompatible changes:
  - Replace chunk hooks with an expanded/normalized set of extent hooks.
    (@jasone)
  - Remove ratio-based purging. (@jasone)
  - Remove --disable-tcache. (@jasone)
  - Remove --disable-tls. (@jasone)
  - Remove --enable-ivsalloc. (@jasone)
  - Remove --with-lg-size-class-group. (@jasone)
  - Remove --with-lg-tiny-min. (@jasone)
  - Remove --disable-cc-silence. (@jasone)
  - Remove --enable-code-coverage. (@jasone)
  - Remove --disable-munmap (replaced by opt.retain). (@jasone)
  - Remove Valgrind support. (@jasone)
  - Remove quarantine support. (@jasone)
  - Remove redzone support.
    (@jasone)
  - Remove mallctl interfaces (various authors):
    + config.munmap
    + config.tcache
    + config.tls
    + config.valgrind
    + opt.lg_chunk
    + opt.purge
    + opt.lg_dirty_mult
    + opt.decay_time
    + opt.quarantine
    + opt.redzone
    + opt.thp
    + arena.<i>.lg_dirty_mult
    + arena.<i>.decay_time
    + arena.<i>.chunk_hooks
    + arenas.initialized
    + arenas.lg_dirty_mult
    + arenas.decay_time
    + arenas.bin.<i>.run_size
    + arenas.nlruns
    + arenas.lrun.<i>.size
    + arenas.nhchunks
    + arenas.hchunk.<i>.size
    + arenas.extend
    + stats.cactive
    + stats.arenas.<i>.lg_dirty_mult
    + stats.arenas.<i>.decay_time
    + stats.arenas.<i>.metadata.{mapped,allocated}
    + stats.arenas.<i>.{npurge,nmadvise,purged}
    + stats.arenas.<i>.huge.{allocated,nmalloc,ndalloc,nrequests}
    + stats.arenas.<i>.bins.<j>.{nruns,reruns,curruns}
    + stats.arenas.<i>.lruns.<j>.{nmalloc,ndalloc,nrequests,curruns}
    + stats.arenas.<i>.hchunks.<j>.{nmalloc,ndalloc,nrequests,curhchunks}

  Bug fixes:
  - Improve interval-based profile dump triggering to dump only one profile when
    a single allocation's size exceeds the interval.
    (@jasone)
  - Use prefixed function names (as controlled by --with-jemalloc-prefix) when
    pruning backtrace frames in jeprof. (@jasone)

* 4.5.0 (February 28, 2017)

  This is the first release to benefit from much broader continuous integration
  testing, thanks to @davidtgoldblatt. Had we had this testing infrastructure
  in place for prior releases, it would have caught all of the most serious
  regressions fixed by this release.

  New features:
  - Add --disable-thp and the opt.thp mallctl to provide opt-out mechanisms for
    transparent huge page integration. (@jasone)
  - Update zone allocator integration to work with macOS 10.12. (@glandium)
  - Restructure *CFLAGS configuration, so that CFLAGS behaves typically, and
    EXTRA_CFLAGS provides a way to specify e.g. -Werror during building, but not
    during configuration. (@jasone, @ronawho)

  Bug fixes:
  - Fix DSS (sbrk(2)-based) allocation. This regression was first released in
    4.3.0. (@jasone)
  - Handle race in per size class utilization computation. This functionality
    was first released in 4.0.0. (@interwq)
  - Fix lock order reversal during gdump. (@jasone)
  - Fix/refactor tcache synchronization. This regression was first released in
    4.0.0. (@jasone)
  - Fix various JSON-formatted malloc_stats_print() bugs. This functionality
    was first released in 4.3.0.
    (@jasone)
  - Fix huge-aligned allocation. This regression was first released in 4.4.0.
    (@jasone)
  - When transparent huge page integration is enabled, detect what state pages
    start in according to the kernel's current operating mode, and only convert
    arena chunks to non-huge during purging if that is not their initial state.
    This functionality was first released in 4.4.0. (@jasone)
  - Fix lg_chunk clamping for the --enable-cache-oblivious --disable-fill case.
    This regression was first released in 4.0.0. (@jasone, @428desmo)
  - Properly detect sparc64 when building for Linux. (@glaubitz)

* 4.4.0 (December 3, 2016)

  New features:
  - Add configure support for *-*-linux-android. (@cferris1000, @jasone)
  - Add the --disable-syscall configure option, for use on systems that place
    security-motivated limitations on syscall(2). (@jasone)
  - Add support for Debian GNU/kFreeBSD. (@thesam)

  Optimizations:
  - Add extent serial numbers and use them where appropriate as a sort key that
    is higher priority than address, so that the allocation policy prefers older
    extents. This tends to improve locality (decrease fragmentation) when
    memory grows downward. (@jasone)
  - Refactor madvise(2) configuration so that MADV_FREE is detected and utilized
    on Linux 4.5 and newer. (@jasone)
  - Mark partially purged arena chunks as non-huge-page.
    This improves interaction with Linux's transparent huge page functionality.
    (@jasone)

  Bug fixes:
  - Fix size class computations for edge conditions involving extremely large
    allocations. This regression was first released in 4.0.0. (@jasone,
    @ingvarha)
  - Remove overly restrictive assertions related to the cactive statistic. This
    regression was first released in 4.1.0. (@jasone)
  - Implement a more reliable detection scheme for os_unfair_lock on macOS.
    (@jszakmeister)

* 4.3.1 (November 7, 2016)

  Bug fixes:
  - Fix a severe virtual memory leak. This regression was first released in
    4.3.0. (@interwq, @jasone)
  - Refactor atomic and prng APIs to restore support for 32-bit platforms that
    use pre-C11 toolchains, e.g. FreeBSD's mips. (@jasone)

* 4.3.0 (November 4, 2016)

  This is the first release that passes the test suite for multiple Windows
  configurations, thanks in large part to @glandium setting up continuous
  integration via AppVeyor (and Travis CI for Linux and OS X).

  New features:
  - Add "J" (JSON) support to malloc_stats_print(). (@jasone)
  - Add Cray compiler support. (@ronawho)

  Optimizations:
  - Add/use adaptive spinning for bootstrapping and radix tree node
    initialization.
    (@jasone)

  Bug fixes:
  - Fix large allocation to search starting in the optimal size class heap,
    which can substantially reduce virtual memory churn and fragmentation. This
    regression was first released in 4.0.0. (@mjp41, @jasone)
  - Fix stats.arenas.<i>.nthreads accounting. (@interwq)
  - Fix and simplify decay-based purging. (@jasone)
  - Make DSS (sbrk(2)-related) operations lockless, which resolves potential
    deadlocks during thread exit. (@jasone)
  - Fix over-sized allocation of radix tree leaf nodes. (@mjp41, @ogaun,
    @jasone)
  - Fix over-sized allocation of arena_t (plus associated stats) data
    structures. (@jasone, @interwq)
  - Fix EXTRA_CFLAGS to not affect configuration. (@jasone)
  - Fix a Valgrind integration bug. (@ronawho)
  - Disallow 0x5a junk filling when running in Valgrind. (@jasone)
  - Fix a file descriptor leak on Linux. This regression was first released in
    4.2.0. (@vsarunas, @jasone)
  - Fix static linking of jemalloc with glibc. (@djwatson)
  - Use syscall(2) rather than {open,read,close}(2) during boot on Linux. This
    works around other libraries' system call wrappers performing reentrant
    allocation. (@kspinka, @Whissi, @jasone)
  - Fix OS X default zone replacement to work with OS X 10.12.
    (@glandium, @jasone)
  - Fix cached memory management to avoid needless commit/decommit operations
    during purging, which resolves permanent virtual memory map fragmentation
    issues on Windows. (@mjp41, @jasone)
  - Fix TSD fetches to avoid (recursive) allocation. This is relevant to
    non-TLS and Windows configurations. (@jasone)
  - Fix malloc_conf overriding to work on Windows. (@jasone)
  - Forcibly disable lazy-lock on Windows (was forcibly *enabled*). (@jasone)

* 4.2.1 (June 8, 2016)

  Bug fixes:
  - Fix bootstrapping issues for configurations that require allocation during
    tsd initialization (e.g. --disable-tls). (@cferris1000, @jasone)
  - Fix gettimeofday() version of nstime_update(). (@ronawho)
  - Fix Valgrind regressions in calloc() and chunk_alloc_wrapper(). (@ronawho)
  - Fix potential VM map fragmentation regression. (@jasone)
  - Fix opt_zero-triggered in-place huge reallocation zeroing. (@jasone)
  - Fix heap profiling context leaks in reallocation edge cases. (@jasone)

* 4.2.0 (May 12, 2016)

  New features:
  - Add the arena.<i>.reset mallctl, which makes it possible to discard all of
    an arena's allocations in a single operation. (@jasone)
  - Add the stats.retained and stats.arenas.<i>.retained statistics. (@jasone)
  - Add the --with-version configure option.
    (@jasone)
  - Support --with-lg-page values larger than actual page size. (@jasone)

  Optimizations:
  - Use pairing heaps rather than red-black trees for various hot data
    structures. (@djwatson, @jasone)
  - Streamline fast paths of rtree operations. (@jasone)
  - Optimize the fast paths of calloc() and [m,d,sd]allocx(). (@jasone)
  - Decommit unused virtual memory if the OS does not overcommit. (@jasone)
  - Specify MAP_NORESERVE on Linux if [heuristic] overcommit is active, in order
    to avoid unfortunate interactions during fork(2). (@jasone)

  Bug fixes:
  - Fix chunk accounting related to triggering gdump profiles. (@jasone)
  - Link against librt for clock_gettime(2) if glibc < 2.17. (@jasone)
  - Scale leak report summary according to sampling probability. (@jasone)

* 4.1.1 (May 3, 2016)

  This bugfix release resolves a variety of mostly minor issues, though the
  bitmap fix is critical for 64-bit Windows.

  Bug fixes:
  - Fix the linear scan version of bitmap_sfu() to shift by the proper amount
    even when sizeof(long) is not the same as sizeof(void *), as on 64-bit
    Windows. (@jasone)
  - Fix hashing functions to avoid unaligned memory accesses (and resulting
    crashes). This is relevant at least to some ARM-based platforms.
    (@rkmisra)
  - Fix fork()-related lock rank ordering reversals.
    These reversals were unlikely to cause deadlocks in practice except when
    heap profiling was enabled and active. (@jasone)
  - Fix various chunk leaks in OOM code paths. (@jasone)
  - Fix malloc_stats_print() to print opt.narenas correctly. (@jasone)
  - Fix MSVC-specific build/test issues. (@rustyx, @yuslepukhin)
  - Fix a variety of test failures that were due to test fragility rather than
    core bugs. (@jasone)

* 4.1.0 (February 28, 2016)

  This release is primarily about optimizations, but it also incorporates a lot
  of portability-motivated refactoring and enhancements. Many people worked on
  this release, to an extent that even with the omission here of minor changes
  (see git revision history), and of the people who reported and diagnosed
  issues, so much of the work was contributed that starting with this release,
  changes are annotated with author credits to help reflect the collaborative
  effort involved.

  New features:
  - Implement decay-based unused dirty page purging, a major optimization with
    mallctl API impact. This is an alternative to the existing ratio-based
    unused dirty page purging, and is intended to eventually become the sole
    purging mechanism.
    New mallctls:
    + opt.purge
    + opt.decay_time
    + arena.<i>.decay
    + arena.<i>.decay_time
    + arenas.decay_time
    + stats.arenas.<i>.decay_time
    (@jasone, @cevans87)
  - Add --with-malloc-conf, which makes it possible to embed a default
    options string during configuration. This was motivated by the desire to
    specify --with-malloc-conf=purge:decay , since the default must remain
    purge:ratio until the 5.0.0 release. (@jasone)
  - Add MS Visual Studio 2015 support. (@rustyx, @yuslepukhin)
  - Make *allocx() size class overflow behavior defined. The maximum
    size class is now less than PTRDIFF_MAX to protect applications against
    numerical overflow, and all allocation functions are guaranteed to indicate
    errors rather than potentially crashing if the request size exceeds the
    maximum size class. (@jasone)
  - jeprof:
    + Add raw heap profile support. (@jasone)
    + Add --retain and --exclude for backtrace symbol filtering. (@jasone)

  Optimizations:
  - Optimize the fast path to combine various bootstrapping and configuration
    checks and execute more streamlined code in the common case. (@interwq)
  - Use linear scan for small bitmaps (used for small object tracking). In
    addition to speeding up bitmap operations on 64-bit systems, this reduces
    allocator metadata overhead by approximately 0.2%.
    (@djwatson)
  - Separate arena_avail trees, which substantially speeds up run tree
    operations. (@djwatson)
  - Use memoization (boot-time-computed table) for run quantization. Separate
    arena_avail trees reduced the importance of this optimization. (@jasone)
  - Attempt mmap-based in-place huge reallocation. This can dramatically speed
    up incremental huge reallocation. (@jasone)

  Incompatible changes:
  - Make opt.narenas unsigned rather than size_t. (@jasone)

  Bug fixes:
  - Fix stats.cactive accounting regression. (@rustyx, @jasone)
  - Handle unaligned keys in hash(). This caused problems for some ARM systems.
    (@jasone, @cferris1000)
  - Refactor arenas array. In addition to fixing a fork-related deadlock, this
    makes arena lookups faster and simpler. (@jasone)
  - Move retained memory allocation out of the default chunk allocation
    function, to a location that gets executed even if the application installs
    a custom chunk allocation function. This resolves a virtual memory leak.
    (@buchgr)
  - Fix a potential tsd cleanup leak. (@cferris1000, @jasone)
  - Fix run quantization. In practice this bug had no impact unless
    applications requested memory with alignment exceeding one page.
    (@jasone, @djwatson)
  - Fix LinuxThreads-specific bootstrapping deadlock. (Cosmin Paraschiv)
  - jeprof:
    + Don't discard curl options if timeout is not defined.
      (@djwatson)
    + Detect failed profile fetches. (@djwatson)
  - Fix stats.arenas.<i>.{dss,lg_dirty_mult,decay_time,pactive,pdirty} for
    the --disable-stats case. (@jasone)

* 4.0.4 (October 24, 2015)

  This bugfix release fixes another xallocx() regression. No other regressions
  have come to light in over a month, so this is likely a good starting point
  for people who prefer to wait for "dot one" releases with all the major issues
  shaken out.

  Bug fixes:
  - Fix xallocx(..., MALLOCX_ZERO) to zero the last full trailing page of large
    allocations that have been randomly assigned an offset of 0 when
    the --enable-cache-oblivious configure option is enabled.

* 4.0.3 (September 24, 2015)

  This bugfix release continues the trend of xallocx() and heap profiling fixes.

  Bug fixes:
  - Fix xallocx(..., MALLOCX_ZERO) to zero all trailing bytes of large
    allocations when the --enable-cache-oblivious configure option is enabled.
  - Fix xallocx(..., MALLOCX_ZERO) to zero trailing bytes of huge allocations
    when resizing from/to a size class that is not a multiple of the chunk size.
  - Fix prof_tctx_dump_iter() to filter out nodes that were created after heap
    profile dumping started.
  - Work around a potentially bad thread-specific data initialization
    interaction with NPTL (glibc's pthreads implementation).

* 4.0.2 (September 21, 2015)

  This bugfix release addresses a few bugs specific to heap profiling.

  Bug fixes:
  - Fix ixallocx_prof_sample() to never modify nor create sampled small
    allocations. xallocx() is in general incapable of moving small allocations,
    so this fix removes buggy code without loss of generality.
  - Fix irallocx_prof_sample() to always allocate large regions, even when
    alignment is non-zero.
  - Fix prof_alloc_rollback() to read tdata from thread-specific data rather
    than dereferencing a potentially invalid tctx.

* 4.0.1 (September 15, 2015)

  This is a bugfix release that is somewhat high risk due to the amount of
  refactoring required to address deep xallocx() problems. As a side effect of
  these fixes, xallocx() now tries harder to partially fulfill requests for
  optional extra space. Note that a couple of minor heap profiling
  optimizations are included, but these are better thought of as performance
  fixes that were integral to discovering most of the other bugs.

  Optimizations:
  - Avoid a chunk metadata read in arena_prof_tctx_set(), since it is in the
    fast path when heap profiling is enabled. Additionally, split a special
    case out into arena_prof_tctx_reset(), which also avoids chunk metadata
    reads.
  - Optimize irallocx_prof() to optimistically update the sampler state.
    The prior implementation appears to have been a holdover from when
    rallocx()/xallocx() functionality was combined as rallocm().

  Bug fixes:
  - Fix TLS configuration such that it is enabled by default for platforms on
    which it works correctly.
  - Fix arenas_cache_cleanup() and arena_get_hard() to handle
    allocation/deallocation within the application's thread-specific data
    cleanup functions even after arenas_cache is torn down.
  - Fix xallocx() bugs related to size+extra exceeding HUGE_MAXCLASS.
  - Fix chunk purge hook calls for in-place huge shrinking reallocation to
    specify the old chunk size rather than the new chunk size. This bug caused
    no correctness issues for the default chunk purge function, but was
    visible to custom functions set via the "arena.<i>.chunk_hooks" mallctl.
  - Fix heap profiling bugs:
    + Fix heap profiling to distinguish among otherwise identical sample sites
      with interposed resets (triggered via the "prof.reset" mallctl). This bug
      could cause data structure corruption that would most likely result in a
      segfault.
    + Fix irealloc_prof() to call prof_alloc_rollback() on OOM.
    + Make one call to prof_active_get_unlocked() per allocation event, and use
      the result throughout the relevant functions that handle an allocation
      event. Also add a missing check in prof_realloc(). These fixes protect
      allocation events against concurrent prof_active changes.
    + Fix ixallocx_prof() to pass usize_max and zero to ixallocx_prof_sample()
      in the correct order.
    + Fix prof_realloc() to call prof_free_sampled_object() after calling
      prof_malloc_sample_object(). Prior to this fix, if tctx and old_tctx were
      the same, the tctx could have been prematurely destroyed.
  - Fix portability bugs:
    + Don't bitshift by negative amounts when encoding/decoding run sizes in
      chunk header maps. This affected systems with page sizes greater than 8
      KiB.
    + Rename index_t to szind_t to avoid an existing type on Solaris.
    + Add JEMALLOC_CXX_THROW to the memalign() function prototype, in order to
      match glibc and avoid compilation errors when including both
      jemalloc/jemalloc.h and malloc.h in C++ code.
    + Don't assume that /bin/sh is appropriate when running size_classes.sh
      during configuration.
    + Consider __sparcv9 a synonym for __sparc64__ when defining LG_QUANTUM.
    + Link tests to librt if it contains clock_gettime(2).

* 4.0.0 (August 17, 2015)

  This version contains many speed and space optimizations, both minor and
  major. The major themes are generalization, unification, and simplification.
  Although many of these optimizations cause no visible behavior change, their
  cumulative effect is substantial.

  New features:
  - Normalize size class spacing to be consistent across the complete size
    range.
    By default there are four size classes per size doubling, but this
    is now configurable via the --with-lg-size-class-group option. Also add the
    --with-lg-page, --with-lg-page-sizes, --with-lg-quantum, and
    --with-lg-tiny-min options, which can be used to tweak page and size class
    settings. Impacts:
    + Worst case performance for incrementally growing/shrinking reallocation
      is improved because there are far fewer size classes, and therefore
      copying happens less often.
    + Internal fragmentation is limited to 20% for all but the smallest size
      classes (those less than four times the quantum). Requests of
      (1 B + 4 KiB) and (1 B + 4 MiB) previously suffered nearly 50% internal
      fragmentation.
    + Chunk fragmentation tends to be lower because there are fewer distinct run
      sizes to pack.
  - Add support for explicit tcaches. The "tcache.create", "tcache.flush", and
    "tcache.destroy" mallctls control tcache lifetime and flushing, and the
    MALLOCX_TCACHE(tc) and MALLOCX_TCACHE_NONE flags to the *allocx() API
    control which tcache is used for each operation.
  - Implement per thread heap profiling, as well as the ability to
    enable/disable heap profiling on a per thread basis. Add the "prof.reset",
    "prof.lg_sample", "thread.prof.name", "thread.prof.active",
    "opt.prof_thread_active_init", and "prof.thread_active_init" mallctls.
  - Add support for per arena application-specified chunk allocators, configured
    via the "arena.<i>.chunk_hooks" mallctl.
  - Refactor huge allocation to be managed by arenas, so that arenas now
    function as general purpose independent allocators. This is important in
    the context of user-specified chunk allocators, aside from the scalability
    benefits. Related new statistics:
    + The "stats.arenas.<i>.huge.allocated", "stats.arenas.<i>.huge.nmalloc",
      "stats.arenas.<i>.huge.ndalloc", and "stats.arenas.<i>.huge.nrequests"
      mallctls provide high level per arena huge allocation statistics.
    + The "arenas.nhchunks", "arenas.hchunk.<i>.size",
      "stats.arenas.<i>.hchunks.<j>.nmalloc",
      "stats.arenas.<i>.hchunks.<j>.ndalloc",
      "stats.arenas.<i>.hchunks.<j>.nrequests", and
      "stats.arenas.<i>.hchunks.<j>.curhchunks" mallctls provide per size class
      statistics.
  - Add the 'util' column to malloc_stats_print() output, which reports the
    proportion of available regions that are currently in use for each small
    size class.
  - Add "alloc" and "free" modes for junk filling (see the "opt.junk"
    mallctl), so that it is possible to separately enable junk filling for
    allocation versus deallocation.
  - Add the jemalloc-config script, which provides information about how
    jemalloc was configured, and how to integrate it into application builds.
  - Add metadata statistics, which are accessible via the "stats.metadata",
    "stats.arenas.<i>.metadata.mapped", and
    "stats.arenas.<i>.metadata.allocated" mallctls.
  - Add the "stats.resident" mallctl, which reports the upper limit of
    physically resident memory mapped by the allocator.
  - Add per arena control over unused dirty page purging, via the
    "arenas.lg_dirty_mult", "arena.<i>.lg_dirty_mult", and
    "stats.arenas.<i>.lg_dirty_mult" mallctls.
  - Add the "prof.gdump" mallctl, which makes it possible to toggle the gdump
    feature on/off during program execution.
  - Add sdallocx(), which implements sized deallocation. The primary
    optimization over dallocx() is the removal of a metadata read, which often
    suffers an L1 cache miss.
  - Add missing header includes in jemalloc/jemalloc.h, so that applications
    only have to #include <jemalloc/jemalloc.h>.
  - Add support for additional platforms:
    + Bitrig
    + Cygwin
    + DragonFlyBSD
    + iOS
    + OpenBSD
    + OpenRISC/or1k

  Optimizations:
  - Maintain dirty runs in per arena LRUs rather than in per arena trees of
    dirty-run-containing chunks. In practice this change significantly reduces
    dirty page purging volume.
  - Integrate whole chunks into the unused dirty page purging machinery.
    This reduces the cost of repeated huge allocation/deallocation, because it
    effectively introduces a cache of chunks.
  - Split the arena chunk map into two separate arrays, in order to increase
    cache locality for the frequently accessed bits.
  - Move small run metadata out of runs, into arena chunk headers. This reduces
    run fragmentation, smaller runs reduce external fragmentation for small size
    classes, and packed (less uniformly aligned) metadata layout improves CPU
    cache set distribution.
  - Randomly distribute large allocation base pointer alignment relative to page
    boundaries in order to more uniformly utilize CPU cache sets. This can be
    disabled via the --disable-cache-oblivious configure option, and queried via
    the "config.cache_oblivious" mallctl.
  - Micro-optimize the fast paths for the public API functions.
  - Refactor thread-specific data to reside in a single structure. This assures
    that only a single TLS read is necessary per call into the public API.
  - Implement in-place huge allocation growing and shrinking.
  - Refactor rtree (radix tree for chunk lookups) to be lock-free, and make
    additional optimizations that reduce maximum lookup depth to one or two
    levels. This resolves what was a concurrency bottleneck for per arena huge
    allocation, because a global data structure is critical for determining
    which arenas own which huge allocations.
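
  The size class spacing listed under New features for 4.0.0 (four classes per
  size doubling, internal fragmentation limited to 20%) can be checked with a
  little arithmetic. The following is a minimal sketch, not jemalloc source:
  the function names are hypothetical, tiny and quantum-spaced classes are not
  modeled, and overflow is ignored. Passing lg_group = 0 (one class per
  doubling) is only a stand-in that reproduces the "nearly 50%" figure for the
  two quoted requests; it is not a model of how earlier releases spaced size
  classes in general.

```c
#include <assert.h>
#include <stddef.h>

/* floor(log2(x)) for x > 0. */
static unsigned
lg_floor_sketch(size_t x)
{
    unsigned lg = 0;

    assert(x > 0);
    while (x >>= 1)
        lg++;
    return lg;
}

/*
 * Round size up to the next size class, assuming 2^lg_group classes per size
 * doubling. Illustrative model only; sizes at or below 2^(lg_group + 1) are
 * returned unchanged.
 */
static size_t
size_class_ceil(size_t size, unsigned lg_group)
{
    unsigned lg;
    size_t delta;

    assert(size > 0);
    if (size <= ((size_t)1 << (lg_group + 1)))
        return size;
    lg = lg_floor_sketch(size - 1);
    /* Spacing between classes within the doubling that contains size. */
    delta = (size_t)1 << (lg - lg_group);
    return (size + delta - 1) & ~(delta - 1);
}

/* Fraction of the chosen size class that a request of req bytes wastes. */
static double
internal_fragmentation(size_t req, unsigned lg_group)
{
    size_t usize = size_class_ceil(req, lg_group);

    return (double)(usize - req) / (double)usize;
}
```

  With lg_group = 2, a request of (1 B + 4 KiB) rounds up to 5 KiB, which
  wastes just under 20%. The worst case is always a request one byte past a
  power of two, where the next class is 1.25 times that power of two.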

  Incompatible changes:
  - Replace --enable-cc-silence with --disable-cc-silence to suppress spurious
    warnings by default.
  - Assure that the constness of malloc_usable_size()'s return type matches that
    of the system implementation.
  - Change the heap profile dump format to support per thread heap profiling,
    rename pprof to jeprof, and enhance it with the --thread=<n> option. As a
    result, the bundled jeprof must now be used rather than the upstream
    (gperftools) pprof.
  - Disable "opt.prof_final" by default, in order to avoid atexit(3), which can
    internally deadlock on some platforms.
  - Change the "arenas.nlruns" mallctl type from size_t to unsigned.
  - Replace the "stats.arenas.<i>.bins.<j>.allocated" mallctl with
    "stats.arenas.<i>.bins.<j>.curregs".
  - Ignore MALLOC_CONF in set{uid,gid,cap} binaries.
  - Ignore MALLOCX_ARENA(a) in dallocx(), in favor of using the
    MALLOCX_TCACHE(tc) and MALLOCX_TCACHE_NONE flags to control tcache usage.

  Removed features:
  - Remove the *allocm() API, which is superseded by the *allocx() API.
  - Remove the --enable-dss options, and make dss non-optional on all platforms
    which support sbrk(2).
  - Remove the "arenas.purge" mallctl, which was obsoleted by the
    "arena.<i>.purge" mallctl in 3.1.0.
  - Remove the unnecessary "opt.valgrind" mallctl; jemalloc automatically
    detects whether it is running inside Valgrind.
  - Remove the "stats.huge.allocated", "stats.huge.nmalloc", and
    "stats.huge.ndalloc" mallctls.
  - Remove the --enable-mremap option.
  - Remove the "stats.chunks.current", "stats.chunks.total", and
    "stats.chunks.high" mallctls.

  Bug fixes:
  - Fix the cactive statistic to decrease (rather than increase) when active
    memory decreases. This regression was first released in 3.5.0.
  - Fix OOM handling in memalign() and valloc(). A variant of this bug existed
    in all releases since 2.0.0, which introduced these functions.
  - Fix an OOM-related regression in arena_tcache_fill_small(), which could
    cause cache corruption on OOM. This regression was present in all releases
    from 2.2.0 through 3.6.0.
  - Fix size class overflow handling for malloc(), posix_memalign(), memalign(),
    calloc(), and realloc() when profiling is enabled.
  - Fix the "arena.<i>.dss" mallctl to return an error if "primary" or
    "secondary" precedence is specified, but sbrk(2) is not supported.
  - Fix fallback lg_floor() implementations to handle extremely large inputs.
  - Ensure the default purgeable zone is after the default zone on OS X.
  - Fix latent bugs in atomic_*().
  - Fix the "arena.<i>.dss" mallctl to handle read-only calls.
  - Fix tls_model configuration to enable the initial-exec model when possible.
  - Mark malloc_conf as a weak symbol so that the application can override it.
  - Correctly detect glibc's adaptive pthread mutexes.
  - Fix the --without-export configure option.

* 3.6.0 (March 31, 2014)

  This version contains a critical bug fix for a regression present in 3.5.0 and
  3.5.1.

  Bug fixes:
  - Fix a regression in arena_chunk_alloc() that caused crashes during
    small/large allocation if chunk allocation failed. In the absence of this
    bug, chunk allocation failure would result in allocation failure, e.g. NULL
    return from malloc(). This regression was introduced in 3.5.0.
  - Fix gcc intrinsics-based backtracing by specifying
    -fno-omit-frame-pointer to gcc. Note that the application (and all the
    libraries it links to) must also be compiled with this option for
    backtracing to be reliable.
  - Use dss allocation precedence for huge allocations as well as small/large
    allocations.
  - Fix test assertion failure message formatting. This bug did not manifest on
    x86_64 systems because of implementation subtleties in va_list.
  - Fix inconsequential test failures for hash and SFMT code.

  New features:
  - Support heap profiling on FreeBSD. This feature depends on the proc
    filesystem being mounted during heap profile dumping.

* 3.5.1 (February 25, 2014)

  This version primarily addresses minor bugs in test code.

  Bug fixes:
  - Configure Solaris/Illumos to use MADV_FREE.
  - Fix junk filling for mremap(2)-based huge reallocation. This is only
    relevant if configuring with the --enable-mremap option specified.
  - Avoid compilation failure if 'restrict' C99 keyword is not supported by the
    compiler.
  - Add a configure test for SSE2 rather than assuming it is usable on i686
    systems. This fixes test compilation errors, especially on 32-bit Linux
    systems.
  - Fix mallctl argument size mismatches (size_t vs. uint64_t) in the stats unit
    test.
  - Fix/remove flawed alignment-related overflow tests.
  - Prevent compiler optimizations that could change backtraces in the
    prof_accum unit test.

* 3.5.0 (January 22, 2014)

  This version focuses on refactoring and automated testing, though it also
  includes some non-trivial heap profiling optimizations not mentioned below.

  New features:
  - Add the *allocx() API, which is a successor to the experimental *allocm()
    API. The *allocx() functions are slightly simpler to use because they have
    fewer parameters, they directly return the results of primary interest, and
    mallocx()/rallocx() avoid the strict aliasing pitfall that
    allocm()/rallocm() share with posix_memalign(). Note that *allocm() is
    slated for removal in the next non-bugfix release.
  - Add support for LinuxThreads.
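
  The strict aliasing pitfall mentioned for allocm()/rallocm() and
  posix_memalign() comes from returning the allocation through a void **
  out-parameter. The following is a minimal sketch of the two interface
  shapes. It does not use jemalloc: alloc_outparam() and alloc_ret() are
  hypothetical stand-ins built on malloc(), introduced only to show the
  calling pattern.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical out-parameter allocator, shaped like posix_memalign(). */
static int
alloc_outparam(void **ptr, size_t size)
{
    assert(ptr != NULL);
    *ptr = malloc(size);
    return (*ptr == NULL) ? 1 : 0;
}

/* Hypothetical return-value allocator, shaped like mallocx(). */
static void *
alloc_ret(size_t size)
{
    return malloc(size);
}

/* Out-parameter style done safely: go through a real void * temporary. */
static int *
make_ints_outparam(size_t n)
{
    void *tmp;

    /*
     * Passing (void **)&typed_pointer instead would make the callee store
     * into an int * object through a void * lvalue, which violates the
     * strict aliasing rules.
     */
    if (alloc_outparam(&tmp, n * sizeof(int)) != 0)
        return NULL;
    return tmp;
}

/* Return-value style: no temporary and no cast are needed. */
static int *
make_ints_ret(size_t n)
{
    return alloc_ret(n * sizeof(int));
}
```

  Both helpers return memory that the caller releases with free(). The second
  form leaves no room for the tempting (void **) cast, which is the advantage
  the entry above attributes to mallocx()/rallocx().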

  Bug fixes:
  - Unless heap profiling is enabled, disable floating point code and don't link
    with libm. This, in combination with e.g. EXTRA_CFLAGS=-mno-sse on x64
    systems, makes it possible to completely disable floating point register
    use. Some versions of glibc neglect to save/restore caller-saved floating
    point registers during dynamic lazy symbol loading, and the symbol loading
    code uses whatever malloc the application happens to have linked/loaded
    with, the result being potential floating point register corruption.
  - Report ENOMEM rather than EINVAL if an OOM occurs during heap profiling
    backtrace creation in imemalign(). This bug impacted posix_memalign() and
    aligned_alloc().
  - Fix a file descriptor leak in a prof_dump_maps() error path.
  - Fix prof_dump() to close the dump file descriptor for all relevant error
    paths.
  - Fix rallocm() to use the arena specified by the ALLOCM_ARENA(s) flag for
    allocation, not just deallocation.
  - Fix a data race for large allocation stats counters.
  - Fix a potential infinite loop during thread exit. This bug occurred on
    Solaris, and could affect other platforms with similar pthreads TSD
    implementations.
  - Don't junk-fill reallocations unless usable size changes. This fixes a
    violation of the *allocx()/*allocm() semantics.
  - Fix growing large reallocation to junk fill new space.
  - Fix huge deallocation to junk fill when munmap is disabled.
  - Change the default private namespace prefix from empty to je_, and change
    --with-private-namespace-prefix so that it prepends an additional prefix
    rather than replacing je_. This reduces the likelihood of applications
    which statically link jemalloc experiencing symbol name collisions.
  - Add missing private namespace mangling (relevant when
    --with-private-namespace is specified).
  - Add and use JEMALLOC_INLINE_C so that static inline functions are marked as
    static even for debug builds.
  - Add a missing mutex unlock in a malloc_init_hard() error path. In practice
    this error path is never executed.
  - Fix numerous bugs in malloc_strtoumax() error handling/reporting. These
    bugs had no impact except for malformed inputs.
  - Fix numerous bugs in malloc_snprintf(). These bugs were not exercised by
    existing calls, so they had no impact.

* 3.4.1 (October 20, 2013)

  Bug fixes:
  - Fix a race in the "arenas.extend" mallctl that could cause memory corruption
    of internal data structures and subsequent crashes.
  - Fix Valgrind integration flaws that caused Valgrind warnings about reads of
    uninitialized memory in:
    + arena chunk headers
    + internal zero-initialized data structures (relevant to tcache and prof
      code)
  - Preserve errno during the first allocation.
    A readlink(2) call during initialization fails unless /etc/malloc.conf
    exists, so errno was typically set during the first allocation prior to
    this fix.
  - Fix compilation warnings reported by gcc 4.8.1.

* 3.4.0 (June 2, 2013)

  This version is essentially a small bugfix release, but the addition of
  aarch64 support requires that the minor version be incremented.

  Bug fixes:
  - Fix race-triggered deadlocks in chunk_record(). These deadlocks were
    typically triggered by multiple threads concurrently deallocating huge
    objects.

  New features:
  - Add support for the aarch64 architecture.

* 3.3.1 (March 6, 2013)

  This version fixes bugs that are typically encountered only when utilizing
  custom run-time options.

  Bug fixes:
  - Fix a locking order bug that could cause deadlock during fork if heap
    profiling were enabled.
  - Fix a chunk recycling bug that could cause the allocator to lose track of
    whether a chunk was zeroed. On FreeBSD, NetBSD, and OS X, it could cause
    corruption if allocating via sbrk(2) (unlikely unless running with the
    "dss:primary" option specified).
    This was completely harmless on Linux
    unless using mlockall(2) (and unlikely even then, unless the
    --disable-munmap configure option or the "dss:primary" option was
    specified). This regression was introduced in 3.1.0 by the
    mlockall(2)/madvise(2) interaction fix.
  - Fix TLS-related memory corruption that could occur during thread exit if the
    thread never allocated memory. Only the quarantine and prof facilities were
    susceptible.
  - Fix two quarantine bugs:
    + Internal reallocation of the quarantined object array leaked the old
      array.
    + Reallocation failure for internal reallocation of the quarantined object
      array (very unlikely) resulted in memory corruption.
  - Fix Valgrind integration to annotate all internally allocated memory in a
    way that keeps Valgrind happy about internal data structure access.
  - Fix building for s390 systems.

* 3.3.0 (January 23, 2013)

  This version includes a few minor performance improvements in addition to the
  listed new features and bug fixes.

  New features:
  - Add clipping support to lg_chunk option processing.
  - Add the --enable-ivsalloc option.
  - Add the --without-export option.
  - Add the --disable-zone-allocator option.

  Bug fixes:
  - Fix "arenas.extend" mallctl to output the number of arenas.
  - Fix chunk_recycle() to unconditionally inform Valgrind that returned memory
    is undefined.
  - Fix build break on FreeBSD related to alloca.h.

* 3.2.0 (November 9, 2012)

  In addition to a couple of bug fixes, this version modifies page run
  allocation and dirty page purging algorithms in order to better control
  page-level virtual memory fragmentation.

  Incompatible changes:
  - Change the "opt.lg_dirty_mult" default from 5 to 3 (32:1 to 8:1).

  Bug fixes:
  - Fix dss/mmap allocation precedence code to use recyclable mmap memory only
    after primary dss allocation fails.
  - Fix deadlock in the "arenas.purge" mallctl. This regression was introduced
    in 3.1.0 by the addition of the "arena.<i>.purge" mallctl.

* 3.1.0 (October 16, 2012)

  New features:
  - Auto-detect whether running inside Valgrind, thus removing the need to
    manually specify MALLOC_CONF=valgrind:true.
  - Add the "arenas.extend" mallctl, which allows applications to create
    manually managed arenas.
  - Add the ALLOCM_ARENA() flag for {,r,d}allocm().
  - Add the "opt.dss", "arena.<i>.dss", and "stats.arenas.<i>.dss" mallctls,
    which provide control over dss/mmap precedence.
  - Add the "arena.<i>.purge" mallctl, which obsoletes "arenas.purge".
  - Define LG_QUANTUM for hppa.
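
  The ratios quoted for the 3.2.0 "opt.lg_dirty_mult" default change (32:1 to
  8:1) follow from reading the option as the base 2 logarithm of the minimum
  active:dirty page ratio. The following is a minimal sketch of that
  arithmetic, not jemalloc source. The function name dirty_page_limit() is
  hypothetical, and treating a negative value as "purging disabled" is an
  assumption based on the documented meaning of -1 for this option.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Illustrative model: the largest number of dirty pages tolerated for
 * nactive active pages when the minimum active:dirty ratio is
 * 2^lg_dirty_mult:1. A negative lg_dirty_mult means no limit.
 */
static size_t
dirty_page_limit(size_t nactive, int lg_dirty_mult)
{
    assert(lg_dirty_mult < (int)(sizeof(size_t) * 8));
    if (lg_dirty_mult < 0)
        return SIZE_MAX;
    return nactive >> lg_dirty_mult;
}
```

  For an arena with 1024 active pages, the old default of 5 tolerates 32 dirty
  pages before purging, and the new default of 3 tolerates 128.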

  Incompatible changes:
  - Disable tcache by default if running inside Valgrind, in order to avoid
    making unallocated objects appear reachable to Valgrind.
  - Drop const from malloc_usable_size() argument on Linux.

  Bug fixes:
  - Fix heap profiling crash if sampled object is freed via realloc(p, 0).
  - Remove const from __*_hook variable declarations, so that glibc can modify
    them during process forking.
  - Fix mlockall(2)/madvise(2) interaction.
  - Fix fork(2)-related deadlocks.
  - Fix error return value for "thread.tcache.enabled" mallctl.

* 3.0.0 (May 11, 2012)

  Although this version adds some major new features, the primary focus is on
  internal code cleanup that facilitates maintainability and portability, most
  of which is not reflected in the ChangeLog. This is the first release to
  incorporate substantial contributions from numerous other developers, and the
  result is a more broadly useful allocator (see the git revision history for
  contribution details). Note that the license has been unified, thanks to
  Facebook granting a license under the same terms as the other copyright
  holders (see COPYING).

  New features:
  - Implement Valgrind support, redzones, and quarantine.
  - Add support for additional platforms:
    + FreeBSD
    + Mac OS X Lion
    + MinGW
    + Windows (no support yet for replacing the system malloc)
  - Add support for additional architectures:
    + MIPS
    + SH4
    + Tilera
  - Add support for cross compiling.
  - Add nallocm(), which rounds a request size up to the nearest size class
    without actually allocating.
  - Implement aligned_alloc() (blame C11).
  - Add the "thread.tcache.enabled" mallctl.
  - Add the "opt.prof_final" mallctl.
  - Update pprof (from gperftools 2.0).
  - Add the --with-mangling option.
  - Add the --disable-experimental option.
  - Add the --disable-munmap option, and make it the default on Linux.
  - Add the --enable-mremap option, which disables use of mremap(2) by default.

  Incompatible changes:
  - Enable stats by default.
  - Enable fill by default.
  - Disable lazy locking by default.
  - Rename the "tcache.flush" mallctl to "thread.tcache.flush".
  - Rename the "arenas.pagesize" mallctl to "arenas.page".
  - Change the "opt.lg_prof_sample" default from 0 to 19 (1 B to 512 KiB).
  - Change the "opt.prof_accum" default from true to false.
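
  The lg_* options in this file store base-2 logarithms, so a default of 19
  for "opt.lg_prof_sample" means 2^19 bytes, and a default of 3 for
  "opt.lg_dirty_mult" means a ratio of 2^3:1. The following is only an
  illustrative sketch of that arithmetic; the helper names lg_to_value and
  describe_bytes are hypothetical and are not part of jemalloc.

```python
def lg_to_value(lg: int) -> int:
    """Return 2**lg, the quantity that an lg_* option value encodes."""
    return 1 << lg


def describe_bytes(n: int) -> str:
    """Format a byte count with the largest binary unit that divides it evenly."""
    for unit, size in (("GiB", 1 << 30), ("MiB", 1 << 20), ("KiB", 1 << 10)):
        if n >= size and n % size == 0:
            return f"{n // size} {unit}"
    return f"{n} B"


# "opt.lg_prof_sample" defaults quoted in the 3.0.0 entry: 0 and 19.
print(describe_bytes(lg_to_value(0)))   # 1 B
print(describe_bytes(lg_to_value(19)))  # 512 KiB

# "opt.lg_dirty_mult" defaults quoted in the 3.2.0 entry: 5 and 3.
print(f"{lg_to_value(5)}:1")  # 32:1
print(f"{lg_to_value(3)}:1")  # 8:1
```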

  Removed features:
  - Remove the swap feature, including the "config.swap", "swap.avail",
    "swap.prezeroed", "swap.nfds", and "swap.fds" mallctls.
  - Remove highruns statistics, including the
    "stats.arenas.<i>.bins.<j>.highruns" and
    "stats.arenas.<i>.lruns.<j>.highruns" mallctls.
  - As part of small size class refactoring, remove the "opt.lg_[qc]space_max",
    "arenas.cacheline", "arenas.subpage", "arenas.[tqcs]space_{min,max}", and
    "arenas.[tqcs]bins" mallctls.
  - Remove the "arenas.chunksize" mallctl.
  - Remove the "opt.lg_prof_tcmax" option.
  - Remove the "opt.lg_prof_bt_max" option.
  - Remove the "opt.lg_tcache_gc_sweep" option.
  - Remove the --disable-tiny option, including the "config.tiny" mallctl.
  - Remove the --enable-dynamic-page-shift configure option.
  - Remove the --enable-sysv configure option.

  Bug fixes:
  - Fix a statistics-related bug in the "thread.arena" mallctl that could cause
    invalid statistics and crashes.
  - Work around TLS deallocation via free() on Linux. This bug could cause
    write-after-free memory corruption.
  - Fix a potential deadlock that could occur during interval- and
    growth-triggered heap profile dumps.
  - Fix large calloc() zeroing bugs due to dropping chunk map unzeroed flags.
  - Fix chunk_alloc_dss() to stop claiming memory is zeroed.
    This bug could cause memory corruption and crashes with --enable-dss
    specified.
  - Fix fork-related bugs that could cause deadlock in children between fork
    and exec.
  - Fix malloc_stats_print() to honor 'b' and 'l' in the opts parameter.
  - Fix realloc(p, 0) to act like free(p).
  - Do not enforce minimum alignment in memalign().
  - Check for NULL pointer in malloc_usable_size().
  - Fix an off-by-one heap profile statistics bug that could be observed in
    interval- and growth-triggered heap profiles.
  - Fix the "epoch" mallctl to update cached stats even if the passed in epoch
    is 0.
  - Fix bin->runcur management to fix a layout policy bug. This bug did not
    affect correctness.
  - Fix a bug in choose_arena_hard() that potentially caused more arenas to be
    initialized than necessary.
  - Add missing "opt.lg_tcache_max" mallctl implementation.
  - Use glibc allocator hooks to make mixed allocator usage less likely.
  - Fix build issues for --disable-tcache.
  - Don't mangle pthread_create() when --with-private-namespace is specified.

* 2.2.5 (November 14, 2011)

  Bug fixes:
  - Fix huge_ralloc() race when using mremap(2). This is a serious bug that
    could cause memory corruption and/or crashes.
  - Fix huge_ralloc() to maintain chunk statistics.
  - Fix malloc_stats_print(..., "a") output.

* 2.2.4 (November 5, 2011)

  Bug fixes:
  - Initialize arenas_tsd before using it. This bug existed for 2.2.[0-3], as
    well as for --disable-tls builds in earlier releases.
  - Do not assume a 4 KiB page size in test/rallocm.c.

* 2.2.3 (August 31, 2011)

  This version fixes numerous bugs related to heap profiling.

  Bug fixes:
  - Fix a prof-related race condition. This bug could cause memory corruption,
    but only occurred in non-default configurations (prof_accum:false).
  - Fix off-by-one backtracing issues (make sure that prof_alloc_prep() is
    excluded from backtraces).
  - Fix a prof-related bug in realloc() (only triggered by OOM errors).
  - Fix prof-related bugs in allocm() and rallocm().
  - Fix prof_tdata_cleanup() for --disable-tls builds.
  - Fix a relative include path, to fix objdir builds.

* 2.2.2 (July 30, 2011)

  Bug fixes:
  - Fix a build error for --disable-tcache.
  - Fix assertions in arena_purge() (for real this time).
  - Add the --with-private-namespace option. This is a workaround for symbol
    conflicts that can inadvertently arise when using static libraries.

* 2.2.1 (March 30, 2011)

  Bug fixes:
  - Implement atomic operations for x86/x64.
    This fixes compilation failures for versions of gcc that are still in wide
    use.
  - Fix an assertion in arena_purge().

* 2.2.0 (March 22, 2011)

  This version incorporates several improvements to algorithms and data
  structures that tend to reduce fragmentation and increase speed.

  New features:
  - Add the "stats.cactive" mallctl.
  - Update pprof (from google-perftools 1.7).
  - Improve backtracing-related configuration logic, and add the
    --disable-prof-libgcc option.

  Bug fixes:
  - Change default symbol visibility from "internal" to "hidden", which
    decreases the overhead of library-internal function calls.
  - Fix symbol visibility so that it is also set on OS X.
  - Fix a build dependency regression caused by the introduction of the .pic.o
    suffix for PIC object files.
  - Add missing checks for mutex initialization failures.
  - Don't use libgcc-based backtracing except on x64, where it is known to work.
  - Fix deadlocks on OS X that were due to memory allocation in
    pthread_mutex_lock().
  - Heap profiling-specific fixes:
    + Fix memory corruption due to integer overflow in small region index
      computation, when using a small enough sample interval that profiling
      context pointers are stored in small run headers.
    + Fix a bootstrap ordering bug that only occurred with TLS disabled.
  - Fix a rallocm() rsize bug.
  - Fix error detection bugs for aligned memory allocation.

* 2.1.3 (March 14, 2011)

  Bug fixes:
  - Fix a cpp logic regression (due to the "thread.{de,}allocatedp" mallctl fix
    for OS X in 2.1.2).
  - Fix a "thread.arena" mallctl bug.
  - Fix a thread cache stats merging bug.

* 2.1.2 (March 2, 2011)

  Bug fixes:
  - Fix "thread.{de,}allocatedp" mallctl for OS X.
  - Add missing jemalloc.a to build system.

* 2.1.1 (January 31, 2011)

  Bug fixes:
  - Fix aligned huge reallocation (affected allocm()).
  - Fix the ALLOCM_LG_ALIGN macro definition.
  - Fix a heap dumping deadlock.
  - Fix a "thread.arena" mallctl bug.

* 2.1.0 (December 3, 2010)

  This version incorporates some optimizations that can't quite be considered
  bug fixes.

  New features:
  - Use Linux's mremap(2) for huge object reallocation when possible.
  - Avoid locking in mallctl*() when possible.
  - Add the "thread.[de]allocatedp" mallctls.
  - Convert the manual page source from roff to DocBook, and generate both roff
    and HTML manuals.

  Bug fixes:
  - Fix a crash due to incorrect bootstrap ordering. This only impacted
    --enable-debug --enable-dss configurations.
  - Fix a minor statistics bug for mallctl("swap.avail", ...).

* 2.0.1 (October 29, 2010)

  Bug fixes:
  - Fix a race condition in heap profiling that could cause undefined behavior
    if "opt.prof_accum" were disabled.
  - Add missing mutex unlocks for some OOM error paths in the heap profiling
    code.
  - Fix a compilation error for non-C99 builds.

* 2.0.0 (October 24, 2010)

  This version focuses on the experimental *allocm() API, and on improved
  run-time configuration/introspection. Nonetheless, numerous performance
  improvements are also included.

  New features:
  - Implement the experimental {,r,s,d}allocm() API, which provides a superset
    of the functionality available via malloc(), calloc(), posix_memalign(),
    realloc(), malloc_usable_size(), and free(). These functions can be used to
    allocate/reallocate aligned zeroed memory, ask for optional extra memory
    during reallocation, prevent object movement during reallocation, etc.
  - Replace JEMALLOC_OPTIONS/JEMALLOC_PROF_PREFIX with MALLOC_CONF, which is
    more human-readable, and more flexible.
    For example:
      JEMALLOC_OPTIONS=AJP
    is now:
      MALLOC_CONF=abort:true,fill:true,stats_print:true
  - Port to Apple OS X. Sponsored by Mozilla.
  - Make it possible for the application to control thread-->arena mappings via
    the "thread.arena" mallctl.
  - Add compile-time support for all TLS-related functionality via pthreads TSD.
    This is mainly of interest for OS X, which does not support TLS, but has a
    TSD implementation with similar performance.
  - Override memalign() and valloc() if they are provided by the system.
  - Add the "arenas.purge" mallctl, which can be used to synchronously purge all
    dirty unused pages.
  - Make cumulative heap profiling data optional, so that it is possible to
    limit the amount of memory consumed by heap profiling data structures.
  - Add per thread allocation counters that can be accessed via the
    "thread.allocated" and "thread.deallocated" mallctls.

  Incompatible changes:
  - Remove JEMALLOC_OPTIONS and malloc_options (see MALLOC_CONF above).
  - Increase default backtrace depth from 4 to 128 for heap profiling.
  - Disable interval-based profile dumps by default.

  Bug fixes:
  - Remove bad assertions in fork handler functions. These assertions could
    cause aborts for some combinations of configure settings.
  - Fix strerror_r() usage to deal with non-standard semantics in GNU libc.
  - Fix leak context reporting. This bug tended to cause the number of contexts
    to be underreported (though the reported numbers of objects and bytes were
    correct).
  - Fix a realloc() bug for large in-place growing reallocation. This bug could
    cause memory corruption, but it was hard to trigger.
  - Fix an allocation bug for small allocations that could be triggered if
    multiple threads raced to create a new run of backing pages.
  - Enhance the heap profiler to trigger samples based on usable size, rather
    than request size.
  - Fix a heap profiling bug due to sometimes losing track of requested object
    size for sampled objects.

* 1.0.3 (August 12, 2010)

  Bug fixes:
  - Fix the libunwind-based implementation of stack backtracing (used for heap
    profiling). This bug could cause zero-length backtraces to be reported.
  - Add a missing mutex unlock in library initialization code. If multiple
    threads raced to initialize malloc, some of them could end up permanently
    blocked.

* 1.0.2 (May 11, 2010)

  Bug fixes:
  - Fix junk filling of large objects, which could cause memory corruption.
  - Add MAP_NORESERVE support for chunk mapping, because otherwise virtual
    memory limits could cause swap file configuration to fail. Contributed by
    Jordan DeLong.

* 1.0.1 (April 14, 2010)

  Bug fixes:
  - Fix compilation when --enable-fill is specified.
  - Fix threads-related profiling bugs that affected accuracy and caused memory
    to be leaked during thread exit.
  - Fix dirty page purging race conditions that could cause crashes.
  - Fix crash in tcache flushing code during thread destruction.

* 1.0.0 (April 11, 2010)

  This release focuses on speed and run-time introspection. Numerous
  algorithmic improvements make this release substantially faster than its
  predecessors.

  New features:
  - Implement autoconf-based configuration system.
  - Add mallctl*(), for the purposes of introspection and run-time
    configuration.
  - Make it possible for the application to manually flush a thread's cache, via
    the "tcache.flush" mallctl.
  - Base maximum dirty page count on proportion of active memory.
  - Compute various additional run-time statistics, including per size class
    statistics for large objects.
  - Expose malloc_stats_print(), which can be called repeatedly by the
    application.
  - Simplify the malloc_message() signature to only take one string argument,
    and incorporate an opaque data pointer argument for use by the application
    in combination with malloc_stats_print().
  - Add support for allocation backed by one or more swap files, and allow the
    application to disable over-commit if swap files are in use.
  - Implement allocation profiling and leak checking.

  Removed features:
  - Remove the dynamic arena rebalancing code, since thread-specific caching
    reduces its utility.

  Bug fixes:
  - Modify chunk allocation to work when address space layout randomization
    (ASLR) is in use.
  - Fix thread cleanup bugs related to TLS destruction.
  - Handle 0-size allocation requests in posix_memalign().
  - Fix a chunk leak. The leaked chunks were never touched, so this impacted
    virtual memory usage, but not physical memory usage.

* linux_2008082[78]a (August 27/28, 2008)

  These snapshot releases are the simple result of incorporating Linux-specific
  support into the FreeBSD malloc sources.

--------------------------------------------------------------------------------
vim:filetype=text:textwidth=80