Following are change highlights associated with official releases. Important
bug fixes are all mentioned, but some internal enhancements are omitted here for
brevity. Much more detail can be found in the git revision history:

    https://github.com/jemalloc/jemalloc

* 5.2.1 (August 5, 2019)

  This release is primarily about Windows. A critical virtual memory leak is
  resolved on all Windows platforms. The regression was present in all releases
  since 5.0.0.

  Bug fixes:
  - Fix a severe virtual memory leak on Windows. This regression was first
    released in 5.0.0. (@Ignition, @j0t, @frederik-h, @davidtgoldblatt,
    @interwq)
  - Fix size 0 handling in posix_memalign(). This regression was first released
    in 5.2.0. (@interwq)
  - Fix the prof_log unit test which may observe unexpected backtraces from
    compiler optimizations. The test was first added in 5.2.0. (@marxin,
    @gnzlbg, @interwq)
  - Fix the declaration of the extent_avail tree. This regression was first
    released in 5.1.0. (@zoulasc)
  - Fix an incorrect reference in jeprof. This functionality was first released
    in 3.0.0. (@prehistoric-penguin)
  - Fix an assertion on the deallocation fast-path. This regression was first
    released in 5.2.0. (@yinan1048576)
  - Fix the TLS_MODEL attribute in headers.
    This regression was first released in 5.0.0. (@zoulasc, @interwq)

  Optimizations and refactors:
  - Implement opt.retain on Windows and enable by default on 64-bit. (@interwq,
    @davidtgoldblatt)
  - Optimize away a branch on the operator delete[] path. (@mgrice)
  - Add format annotation to the format generator function. (@zoulasc)
  - Refactor and improve the size class header generation. (@yinan1048576)
  - Remove best fit. (@djwatson)
  - Avoid blocking on background thread locks for stats. (@oranagra, @interwq)

* 5.2.0 (April 2, 2019)

  This release includes a few notable improvements, which are summarized below:
  1) improved fast-path performance from the optimizations by @djwatson; 2)
  reduced virtual memory fragmentation and metadata usage; and 3) bug fixes on
  setting the number of background threads. In addition, peak / spike memory
  usage is improved with certain allocation patterns. As usual, the release and
  prior dev versions have gone through large-scale production testing.

  New features:
  - Implement oversize_threshold, which uses a dedicated arena for allocations
    crossing the specified threshold to reduce fragmentation. (@interwq)
  - Add extents usage information to stats. (@tyleretzel)
  - Log time information for sampled allocations. (@tyleretzel)
  - Support 0 size in sdallocx.
    (@djwatson)
  - Output rate for certain counters in malloc_stats. (@zinoale)
  - Add configure option --enable-readlinkat, which allows the use of readlinkat
    over readlink. (@davidtgoldblatt)
  - Add configure options --{enable,disable}-{static,shared} to allow not
    building unwanted libraries. (@Ericson2314)
  - Add configure option --disable-libdl to enable fully static builds.
    (@interwq)
  - Add mallctl interfaces:
    + opt.oversize_threshold (@interwq)
    + stats.arenas.<i>.extent_avail (@tyleretzel)
    + stats.arenas.<i>.extents.<j>.n{dirty,muzzy,retained} (@tyleretzel)
    + stats.arenas.<i>.extents.<j>.{dirty,muzzy,retained}_bytes
      (@tyleretzel)

  Portability improvements:
  - Update MSVC builds. (@maksqwe, @rustyx)
  - Work around a compiler optimizer bug on s390x. (@rkmisra)
  - Make use of pthread_set_name_np(3) on FreeBSD. (@trasz)
  - Implement malloc_getcpu() to enable percpu_arena for Windows. (@santagada)
  - Link against -pthread instead of -lpthread. (@paravoid)
  - Make background_thread not dependent on libdl. (@interwq)
  - Add stringify to fix a linker directive issue on MSVC. (@daverigby)
  - Detect and fall back when 8-bit atomics are unavailable. (@interwq)
  - Fall back to the default pthread_create if dlsym(3) fails.
    (@interwq)

  Optimizations and refactors:
  - Refactor the TSD module. (@davidtgoldblatt)
  - Avoid taking extents_muzzy mutex when muzzy is disabled. (@interwq)
  - Avoid taking large_mtx for auto arenas on the tcache flush path. (@interwq)
  - Optimize ixalloc by avoiding a size lookup. (@interwq)
  - Implement opt.oversize_threshold which uses a dedicated arena for requests
    crossing the threshold, also eagerly purges the oversize extents. Default
    the threshold to 8 MiB. (@interwq)
  - Clean compilation with -Wextra. (@gnzlbg, @jasone)
  - Refactor the size class module. (@davidtgoldblatt)
  - Refactor the stats emitter. (@tyleretzel)
  - Optimize pow2_ceil. (@rkmisra)
  - Avoid runtime detection of lazy purging on FreeBSD. (@trasz)
  - Optimize mmap(2) alignment handling on FreeBSD. (@trasz)
  - Improve error handling for THP state initialization. (@jsteemann)
  - Rework the malloc() fast path. (@djwatson)
  - Rework the free() fast path. (@djwatson)
  - Refactor and optimize the tcache fill / flush paths. (@djwatson)
  - Optimize sync / lwsync on PowerPC. (@chmeeedalf)
  - Bypass extent_dalloc() when retain is enabled. (@interwq)
  - Optimize the locking on large deallocation. (@interwq)
  - Reduce the number of pages committed from sanity checking in debug build.
    (@trasz, @interwq)
  - Deprecate OSSpinLock.
    (@interwq)
  - Lower the default number of background threads to 4 (when the feature
    is enabled). (@interwq)
  - Optimize the trylock spin wait. (@djwatson)
  - Use arena index for arena-matching checks. (@interwq)
  - Avoid forced decay on thread termination when using background threads.
    (@interwq)
  - Disable muzzy decay by default. (@djwatson, @interwq)
  - Only initialize libgcc unwinder when profiling is enabled. (@paravoid,
    @interwq)

  Bug fixes (all only relevant to jemalloc 5.x):
  - Fix background thread index issues with max_background_threads. (@djwatson,
    @interwq)
  - Fix stats output for opt.lg_extent_max_active_fit. (@interwq)
  - Fix opt.prof_prefix initialization. (@davidtgoldblatt)
  - Properly trigger decay on tcache destroy. (@interwq, @amosbird)
  - Fix tcache.flush. (@interwq)
  - Detect whether explicit extent zero out is necessary with huge pages or
    custom extent hooks, which may change the purge semantics. (@interwq)
  - Fix a side effect caused by extent_max_active_fit combined with decay-based
    purging, where freed extents can accumulate and not be reused for an
    extended period of time. (@interwq, @mpghf)
  - Fix a missing unlock on extent register error handling. (@zoulasc)

  Testing:
  - Simplify the Travis script output.
    (@gnzlbg)
  - Update the test scripts for FreeBSD. (@devnexen)
  - Add unit tests for the producer-consumer pattern. (@interwq)
  - Add Cirrus-CI config for FreeBSD builds. (@jasone)
  - Add size-matching sanity checks on tcache flush. (@davidtgoldblatt,
    @interwq)

  Incompatible changes:
  - Remove --with-lg-page-sizes. (@davidtgoldblatt)

  Documentation:
  - Attempt to build docs by default, however skip doc building when xsltproc
    is missing. (@interwq, @cmuellner)

* 5.1.0 (May 4, 2018)

  This release is primarily about fine-tuning, ranging from several new features
  to numerous notable performance and portability enhancements. The release and
  prior dev versions have been running in multiple large scale applications for
  months, and the cumulative improvements are substantial in many cases.

  Given the long and successful production runs, this release is likely a good
  candidate for applications to upgrade, from both jemalloc 5.0 and before. For
  performance-critical applications, the newly added TUNING.md provides
  guidelines on jemalloc tuning.

  New features:
  - Implement transparent huge page support for internal metadata. (@interwq)
  - Add opt.thp to allow enabling / disabling transparent huge pages for all
    mappings.
    (@interwq)
  - Add maximum background thread count option. (@djwatson)
  - Allow prof_active to control opt.lg_prof_interval and prof.gdump.
    (@interwq)
  - Allow arena index lookup based on allocation addresses via mallctl.
    (@lionkov)
  - Allow disabling initial-exec TLS model. (@davidtgoldblatt, @KenMacD)
  - Add opt.lg_extent_max_active_fit to set the max ratio between the size of
    the active extent selected (to split off from) and the size of the requested
    allocation. (@interwq, @davidtgoldblatt)
  - Add retain_grow_limit to set the max size when growing virtual address
    space. (@interwq)
  - Add mallctl interfaces:
    + arena.<i>.retain_grow_limit (@interwq)
    + arenas.lookup (@lionkov)
    + max_background_threads (@djwatson)
    + opt.lg_extent_max_active_fit (@interwq)
    + opt.max_background_threads (@djwatson)
    + opt.metadata_thp (@interwq)
    + opt.thp (@interwq)
    + stats.metadata_thp (@interwq)

  Portability improvements:
  - Support GNU/kFreeBSD configuration. (@paravoid)
  - Support m68k, nios2 and SH3 architectures. (@paravoid)
  - Fall back to FD_CLOEXEC when O_CLOEXEC is unavailable. (@zonyitoo)
  - Fix symbol listing for cross-compiling. (@tamird)
  - Fix high bits computation on ARM. (@davidtgoldblatt, @paravoid)
  - Disable the CPU_SPINWAIT macro for Power.
    (@davidtgoldblatt, @marxin)
  - Fix MSVC 2015 & 2017 builds. (@rustyx)
  - Improve RISC-V support. (@EdSchouten)
  - Set name mangling script in strict mode. (@nicolov)
  - Avoid MADV_HUGEPAGE on ARM. (@marxin)
  - Modify configure to determine return value of strerror_r.
    (@davidtgoldblatt, @cferris1000)
  - Make sure CXXFLAGS is tested with CPP compiler. (@nehaljwani)
  - Fix 32-bit build on MSVC. (@rustyx)
  - Fix external symbol on MSVC. (@maksqwe)
  - Avoid a printf format specifier warning. (@jasone)
  - Add configure option --disable-initial-exec-tls which can allow jemalloc to
    be dynamically loaded after program startup. (@davidtgoldblatt, @KenMacD)
  - AArch64: Add ILP32 support. (@cmuellner)
  - Add --with-lg-vaddr configure option to support cross compiling.
    (@cmuellner, @davidtgoldblatt)

  Optimizations and refactors:
  - Improve active extent fit with extent_max_active_fit. This considerably
    reduces fragmentation over time and improves virtual memory and metadata
    usage. (@davidtgoldblatt, @interwq)
  - Eagerly coalesce large extents to reduce fragmentation. (@interwq)
  - sdallocx: only read size info when page aligned (i.e. possibly sampled),
    which speeds up the sized deallocation path significantly. (@interwq)
  - Avoid attempting new mappings for in place expansion with retain, since
    it rarely succeeds in practice and causes high overhead.
    (@interwq)
  - Refactor OOM handling in newImpl. (@wqfish)
  - Add internal fine-grained logging functionality for debugging use.
    (@davidtgoldblatt)
  - Refactor arena / tcache interactions. (@davidtgoldblatt)
  - Refactor extent management with dumpable flag. (@davidtgoldblatt)
  - Add runtime detection of lazy purging. (@interwq)
  - Use pairing heap instead of red-black tree for extents_avail. (@djwatson)
  - Use sysctl on startup in FreeBSD. (@trasz)
  - Use thread local prng state instead of atomic. (@djwatson)
  - Make decay always purge one more extent than before, because in
    practice large extents are usually the ones that cross the decay threshold.
    Purging the additional extent helps save memory as well as reduce VM
    fragmentation. (@interwq)
  - Fast division by dynamic values. (@davidtgoldblatt)
  - Improve the fit for aligned allocation. (@interwq, @edwinsmith)
  - Refactor extent_t bitpacking. (@rkmisra)
  - Optimize the generated assembly for ticker operations. (@davidtgoldblatt)
  - Convert stats printing to use a structured text emitter. (@davidtgoldblatt)
  - Remove preserve_lru feature for extents management. (@djwatson)
  - Consolidate two memory loads into one on the fast deallocation path.
    (@davidtgoldblatt, @interwq)

  Bug fixes (most of the issues are only relevant to jemalloc 5.0):
  - Fix deadlock with multithreaded fork in OS X.
    (@davidtgoldblatt)
  - Validate returned file descriptor before use. (@zonyitoo)
  - Fix a few background thread initialization and shutdown issues. (@interwq)
  - Fix an extent coalesce + decay race by taking both coalescing extents off
    the LRU list. (@interwq)
  - Fix a potentially unbounded increase during decay, caused by one thread
    continually stashing memory to purge while other threads generate new
    pages. The number of pages to purge is checked to prevent this. (@interwq)
  - Fix a FreeBSD bootstrap assertion. (@strejda, @interwq)
  - Handle 32 bit mutex counters. (@rkmisra)
  - Fix an indexing bug when creating background threads. (@davidtgoldblatt,
    @binliu19)
  - Fix arguments passed to extent_init. (@yuleniwo, @interwq)
  - Fix addresses used for ordering mutexes. (@rkmisra)
  - Fix abort_conf processing during bootstrap. (@interwq)
  - Fix include path order for out-of-tree builds. (@cmuellner)

  Incompatible changes:
  - Remove --disable-thp. (@interwq)
  - Remove mallctl interfaces:
    + config.thp (@interwq)

  Documentation:
  - Add TUNING.md. (@interwq, @davidtgoldblatt, @djwatson)

* 5.0.1 (July 1, 2017)

  This bugfix release fixes several issues, most of which are obscure enough
  that typical applications are not impacted.

  Bug fixes:
  - Update decay->nunpurged before purging, in order to avoid potential update
    races and subsequent incorrect purging volume. (@interwq)
  - Only abort on dlsym(3) error if the failure impacts an enabled feature (lazy
    locking and/or background threads). This mitigates an initialization
    failure bug for which we still do not have a clear reproduction test case.
    (@interwq)
  - Modify tsd management so that it neither crashes nor leaks if a thread's
    only allocation activity is to call free() after TLS destructors have been
    executed. This behavior was observed when operating with GNU libc, and is
    unlikely to be an issue with other libc implementations. (@interwq)
  - Mask signals during background thread creation. This prevents signals from
    being inadvertently delivered to background threads. (@jasone,
    @davidtgoldblatt, @interwq)
  - Avoid inactivity checks within background threads, in order to prevent
    recursive mutex acquisition. (@interwq)
  - Fix extent_grow_retained() to use the specified hooks when the
    arena.<i>.extent_hooks mallctl is used to override the default hooks.
    (@interwq)
  - Add missing reentrancy support for custom extent hooks which allocate.
    (@interwq)
  - Post-fork(2), re-initialize the list of tcaches associated with each arena
    to contain no tcaches except the forking thread's. (@interwq)
  - Add missing post-fork(2) mutex reinitialization for extent_grow_mtx.
    This fixes potential deadlocks after fork(2). (@interwq)
  - Enforce minimum autoconf version (currently 2.68), since 2.63 is known to
    generate corrupt configure scripts. (@jasone)
  - Ensure that the configured page size (--with-lg-page) is no larger than the
    configured huge page size (--with-lg-hugepage). (@jasone)

* 5.0.0 (June 13, 2017)

  Unlike all previous jemalloc releases, this release does not use naturally
  aligned "chunks" for virtual memory management, and instead uses page-aligned
  "extents". This change has few externally visible effects, but the internal
  impacts are... extensive. Many other internal changes combine to make this
  the most cohesively designed version of jemalloc so far, with ample
  opportunity for further enhancements.

  Continuous integration is now an integral aspect of development thanks to the
  efforts of @davidtgoldblatt, and the dev branch tends to remain reasonably
  stable on the tested platforms (Linux, FreeBSD, macOS, and Windows). As a
  side effect the official release frequency may decrease over time.

  New features:
  - Implement optional per-CPU arena support; threads choose which arena to use
    based on current CPU rather than on fixed thread-->arena associations.
    (@interwq)
  - Implement two-phase decay of unused dirty pages.
    Pages transition from dirty-->muzzy-->clean, where the first phase
    transition relies on madvise(... MADV_FREE) semantics, and the second phase
    transition discards pages such that they are replaced with demand-zeroed
    pages on next access. (@jasone)
  - Increase decay time resolution from seconds to milliseconds. (@jasone)
  - Implement opt-in per CPU background threads, and use them for asynchronous
    decay-driven unused dirty page purging. (@interwq)
  - Add mutex profiling, which collects a variety of statistics useful for
    diagnosing overhead/contention issues. (@interwq)
  - Add C++ new/delete operator bindings. (@djwatson)
  - Support manually created arena destruction, such that all data and metadata
    are discarded. Add MALLCTL_ARENAS_DESTROYED for accessing merged stats
    associated with destroyed arenas. (@jasone)
  - Add MALLCTL_ARENAS_ALL as a fixed index for use in accessing
    merged/destroyed arena statistics via mallctl. (@jasone)
  - Add opt.abort_conf to optionally abort if invalid configuration options are
    detected during initialization. (@interwq)
  - Add opt.stats_print_opts, so that e.g. JSON output can be selected for the
    stats dumped during exit if opt.stats_print is true. (@jasone)
  - Add --with-version=VERSION for use when embedding jemalloc into another
    project's git repository. (@jasone)
  - Add --disable-thp to support cross compiling. (@jasone)
  - Add --with-lg-hugepage to support cross compiling.
    (@jasone)
  - Add mallctl interfaces (various authors):
    + background_thread
    + opt.abort_conf
    + opt.retain
    + opt.percpu_arena
    + opt.background_thread
    + opt.{dirty,muzzy}_decay_ms
    + opt.stats_print_opts
    + arena.<i>.initialized
    + arena.<i>.destroy
    + arena.<i>.{dirty,muzzy}_decay_ms
    + arena.<i>.extent_hooks
    + arenas.{dirty,muzzy}_decay_ms
    + arenas.bin.<i>.slab_size
    + arenas.nlextents
    + arenas.lextent.<i>.size
    + arenas.create
    + stats.background_thread.{num_threads,num_runs,run_interval}
    + stats.mutexes.{ctl,background_thread,prof,reset}.
      {num_ops,num_spin_acq,num_wait,max_wait_time,total_wait_time,max_num_thds,
      num_owner_switch}
    + stats.arenas.<i>.{dirty,muzzy}_decay_ms
    + stats.arenas.<i>.uptime
    + stats.arenas.<i>.{pmuzzy,base,internal,resident}
    + stats.arenas.<i>.{dirty,muzzy}_{npurge,nmadvise,purged}
    + stats.arenas.<i>.bins.<j>.{nslabs,reslabs,curslabs}
    + stats.arenas.<i>.bins.<j>.mutex.
      {num_ops,num_spin_acq,num_wait,max_wait_time,total_wait_time,max_num_thds,
      num_owner_switch}
    + stats.arenas.<i>.lextents.<j>.{nmalloc,ndalloc,nrequests,curlextents}
    + stats.arenas.<i>.mutexes.{large,extent_avail,extents_dirty,extents_muzzy,
      extents_retained,decay_dirty,decay_muzzy,base,tcache_list}.
      {num_ops,num_spin_acq,num_wait,max_wait_time,total_wait_time,max_num_thds,
      num_owner_switch}

  Portability improvements:
  - Improve reentrant allocation support, such that deadlock is less likely if
    e.g. a system library call in turn allocates memory. (@davidtgoldblatt,
    @interwq)
  - Support static linking of jemalloc with glibc. (@djwatson)

  Optimizations and refactors:
  - Organize virtual memory as "extents" of virtual memory pages, rather than as
    naturally aligned "chunks", and store all metadata in arbitrarily distant
    locations. This reduces virtual memory external fragmentation, and will
    interact better with huge pages (not yet explicitly supported). (@jasone)
  - Fold large and huge size classes together; only small and large size classes
    remain. (@jasone)
  - Unify the allocation paths, and merge most fast-path branching decisions.
    (@davidtgoldblatt, @interwq)
  - Embed per thread automatic tcache into thread-specific data, which reduces
    conditional branches and dereferences. Also reorganize tcache to increase
    fast-path data locality. (@interwq)
  - Rewrite atomics to closely model the C11 API, convert various
    synchronization from mutex-based to atomic, and use the explicit memory
    ordering control to resolve various hypothetical races without increasing
    synchronization overhead.
    (@davidtgoldblatt)
  - Extensively optimize rtree via various methods:
    + Add multiple layers of rtree lookup caching, since rtree lookups are now
      part of fast-path deallocation. (@interwq)
    + Determine rtree layout at compile time. (@jasone)
    + Make the tree shallower for common configurations. (@jasone)
    + Embed the root node in the top-level rtree data structure, thus avoiding
      one level of indirection. (@jasone)
    + Further specialize leaf elements as compared to internal node elements,
      and directly embed extent metadata needed for fast-path deallocation.
      (@jasone)
    + Ignore leading always-zero address bits (architecture-specific).
      (@jasone)
  - Reorganize headers (ongoing work) to make them hermetic, and disentangle
    various module dependencies. (@davidtgoldblatt)
  - Convert various internal data structures such as size class metadata from
    boot-time-initialized to compile-time-initialized. Propagate resulting data
    structure simplifications, such as making arena metadata fixed-size.
    (@jasone)
  - Simplify size class lookups when constrained to size classes that are
    multiples of the page size. This speeds lookups, but the primary benefit is
    complexity reduction in code that was the source of numerous regressions.
    (@jasone)
  - Lock individual extents when possible for localized extent operations,
    rather than relying on a top-level arena lock.
    (@davidtgoldblatt, @jasone)
  - Use first fit layout policy instead of best fit, in order to improve
    packing. (@jasone)
  - If munmap(2) is not in use, use an exponential series to grow each arena's
    virtual memory, so that the number of disjoint virtual memory mappings
    remains low. (@jasone)
  - Implement per arena base allocators, so that arenas never share any virtual
    memory pages. (@jasone)
  - Automatically generate private symbol name mangling macros. (@jasone)

  Incompatible changes:
  - Replace chunk hooks with an expanded/normalized set of extent hooks.
    (@jasone)
  - Remove ratio-based purging. (@jasone)
  - Remove --disable-tcache. (@jasone)
  - Remove --disable-tls. (@jasone)
  - Remove --enable-ivsalloc. (@jasone)
  - Remove --with-lg-size-class-group. (@jasone)
  - Remove --with-lg-tiny-min. (@jasone)
  - Remove --disable-cc-silence. (@jasone)
  - Remove --enable-code-coverage. (@jasone)
  - Remove --disable-munmap (replaced by opt.retain). (@jasone)
  - Remove Valgrind support. (@jasone)
  - Remove quarantine support. (@jasone)
  - Remove redzone support.
    (@jasone)
  - Remove mallctl interfaces (various authors):
    + config.munmap
    + config.tcache
    + config.tls
    + config.valgrind
    + opt.lg_chunk
    + opt.purge
    + opt.lg_dirty_mult
    + opt.decay_time
    + opt.quarantine
    + opt.redzone
    + opt.thp
    + arena.<i>.lg_dirty_mult
    + arena.<i>.decay_time
    + arena.<i>.chunk_hooks
    + arenas.initialized
    + arenas.lg_dirty_mult
    + arenas.decay_time
    + arenas.bin.<i>.run_size
    + arenas.nlruns
    + arenas.lrun.<i>.size
    + arenas.nhchunks
    + arenas.hchunk.<i>.size
    + arenas.extend
    + stats.cactive
    + stats.arenas.<i>.lg_dirty_mult
    + stats.arenas.<i>.decay_time
    + stats.arenas.<i>.metadata.{mapped,allocated}
    + stats.arenas.<i>.{npurge,nmadvise,purged}
    + stats.arenas.<i>.huge.{allocated,nmalloc,ndalloc,nrequests}
    + stats.arenas.<i>.bins.<j>.{nruns,reruns,curruns}
    + stats.arenas.<i>.lruns.<j>.{nmalloc,ndalloc,nrequests,curruns}
    + stats.arenas.<i>.hchunks.<j>.{nmalloc,ndalloc,nrequests,curhchunks}

  Bug fixes:
  - Improve interval-based profile dump triggering to dump only one profile when
    a single allocation's size exceeds the interval.
    (@jasone)
  - Use prefixed function names (as controlled by --with-jemalloc-prefix) when
    pruning backtrace frames in jeprof. (@jasone)

* 4.5.0 (February 28, 2017)

  This is the first release to benefit from much broader continuous integration
  testing, thanks to @davidtgoldblatt. Had we had this testing infrastructure
  in place for prior releases, it would have caught all of the most serious
  regressions fixed by this release.

  New features:
  - Add --disable-thp and the opt.thp mallctl to provide opt-out mechanisms for
    transparent huge page integration. (@jasone)
  - Update zone allocator integration to work with macOS 10.12. (@glandium)
  - Restructure *CFLAGS configuration, so that CFLAGS behaves typically, and
    EXTRA_CFLAGS provides a way to specify e.g. -Werror during building, but not
    during configuration. (@jasone, @ronawho)

  Bug fixes:
  - Fix DSS (sbrk(2)-based) allocation. This regression was first released in
    4.3.0. (@jasone)
  - Handle race in per size class utilization computation. This functionality
    was first released in 4.0.0. (@interwq)
  - Fix lock order reversal during gdump. (@jasone)
  - Fix/refactor tcache synchronization. This regression was first released in
    4.0.0. (@jasone)
  - Fix various JSON-formatted malloc_stats_print() bugs. This functionality
    was first released in 4.3.0.
    (@jasone)
  - Fix huge-aligned allocation. This regression was first released in 4.4.0.
    (@jasone)
  - When transparent huge page integration is enabled, detect what state pages
    start in according to the kernel's current operating mode, and only convert
    arena chunks to non-huge during purging if that is not their initial state.
    This functionality was first released in 4.4.0. (@jasone)
  - Fix lg_chunk clamping for the --enable-cache-oblivious --disable-fill case.
    This regression was first released in 4.0.0. (@jasone, @428desmo)
  - Properly detect sparc64 when building for Linux. (@glaubitz)

* 4.4.0 (December 3, 2016)

  New features:
  - Add configure support for *-*-linux-android. (@cferris1000, @jasone)
  - Add the --disable-syscall configure option, for use on systems that place
    security-motivated limitations on syscall(2). (@jasone)
  - Add support for Debian GNU/kFreeBSD. (@thesam)

  Optimizations:
  - Add extent serial numbers and use them where appropriate as a sort key that
    is higher priority than address, so that the allocation policy prefers older
    extents. This tends to improve locality (decrease fragmentation) when
    memory grows downward. (@jasone)
  - Refactor madvise(2) configuration so that MADV_FREE is detected and utilized
    on Linux 4.5 and newer. (@jasone)
  - Mark partially purged arena chunks as non-huge-page.
    This improves interaction with Linux's transparent huge page functionality.
    (@jasone)

  Bug fixes:
  - Fix size class computations for edge conditions involving extremely large
    allocations. This regression was first released in 4.0.0. (@jasone,
    @ingvarha)
  - Remove overly restrictive assertions related to the cactive statistic. This
    regression was first released in 4.1.0. (@jasone)
  - Implement a more reliable detection scheme for os_unfair_lock on macOS.
    (@jszakmeister)

* 4.3.1 (November 7, 2016)

  Bug fixes:
  - Fix a severe virtual memory leak. This regression was first released in
    4.3.0. (@interwq, @jasone)
  - Refactor atomic and prng APIs to restore support for 32-bit platforms that
    use pre-C11 toolchains, e.g. FreeBSD's mips. (@jasone)

* 4.3.0 (November 4, 2016)

  This is the first release that passes the test suite for multiple Windows
  configurations, thanks in large part to @glandium setting up continuous
  integration via AppVeyor (and Travis CI for Linux and OS X).

  New features:
  - Add "J" (JSON) support to malloc_stats_print(). (@jasone)
  - Add Cray compiler support. (@ronawho)

  Optimizations:
  - Add/use adaptive spinning for bootstrapping and radix tree node
    initialization.
    (@jasone)

  Bug fixes:
  - Fix large allocation to search starting in the optimal size class heap,
    which can substantially reduce virtual memory churn and fragmentation. This
    regression was first released in 4.0.0. (@mjp41, @jasone)
  - Fix stats.arenas.<i>.nthreads accounting. (@interwq)
  - Fix and simplify decay-based purging. (@jasone)
  - Make DSS (sbrk(2)-related) operations lockless, which resolves potential
    deadlocks during thread exit. (@jasone)
  - Fix over-sized allocation of radix tree leaf nodes. (@mjp41, @ogaun,
    @jasone)
  - Fix over-sized allocation of arena_t (plus associated stats) data
    structures. (@jasone, @interwq)
  - Fix EXTRA_CFLAGS to not affect configuration. (@jasone)
  - Fix a Valgrind integration bug. (@ronawho)
  - Disallow 0x5a junk filling when running in Valgrind. (@jasone)
  - Fix a file descriptor leak on Linux. This regression was first released in
    4.2.0. (@vsarunas, @jasone)
  - Fix static linking of jemalloc with glibc. (@djwatson)
  - Use syscall(2) rather than {open,read,close}(2) during boot on Linux. This
    works around other libraries' system call wrappers performing reentrant
    allocation. (@kspinka, @Whissi, @jasone)
  - Fix OS X default zone replacement to work with OS X 10.12.
    (@glandium, @jasone)
  - Fix cached memory management to avoid needless commit/decommit operations
    during purging, which resolves permanent virtual memory map fragmentation
    issues on Windows. (@mjp41, @jasone)
  - Fix TSD fetches to avoid (recursive) allocation. This is relevant to
    non-TLS and Windows configurations. (@jasone)
  - Fix malloc_conf overriding to work on Windows. (@jasone)
  - Forcibly disable lazy-lock on Windows (was forcibly *enabled*). (@jasone)

* 4.2.1 (June 8, 2016)

  Bug fixes:
  - Fix bootstrapping issues for configurations that require allocation during
    tsd initialization (e.g. --disable-tls). (@cferris1000, @jasone)
  - Fix gettimeofday() version of nstime_update(). (@ronawho)
  - Fix Valgrind regressions in calloc() and chunk_alloc_wrapper(). (@ronawho)
  - Fix potential VM map fragmentation regression. (@jasone)
  - Fix opt_zero-triggered in-place huge reallocation zeroing. (@jasone)
  - Fix heap profiling context leaks in reallocation edge cases. (@jasone)

* 4.2.0 (May 12, 2016)

  New features:
  - Add the arena.<i>.reset mallctl, which makes it possible to discard all of
    an arena's allocations in a single operation. (@jasone)
  - Add the stats.retained and stats.arenas.<i>.retained statistics. (@jasone)
  - Add the --with-version configure option.
    (@jasone)
  - Support --with-lg-page values larger than actual page size. (@jasone)

  Optimizations:
  - Use pairing heaps rather than red-black trees for various hot data
    structures. (@djwatson, @jasone)
  - Streamline fast paths of rtree operations. (@jasone)
  - Optimize the fast paths of calloc() and [m,d,sd]allocx(). (@jasone)
  - Decommit unused virtual memory if the OS does not overcommit. (@jasone)
  - Specify MAP_NORESERVE on Linux if [heuristic] overcommit is active, in order
    to avoid unfortunate interactions during fork(2). (@jasone)

  Bug fixes:
  - Fix chunk accounting related to triggering gdump profiles. (@jasone)
  - Link against librt for clock_gettime(2) if glibc < 2.17. (@jasone)
  - Scale leak report summary according to sampling probability. (@jasone)

* 4.1.1 (May 3, 2016)

  This bugfix release resolves a variety of mostly minor issues, though the
  bitmap fix is critical for 64-bit Windows.

  Bug fixes:
  - Fix the linear scan version of bitmap_sfu() to shift by the proper amount
    even when sizeof(long) is not the same as sizeof(void *), as on 64-bit
    Windows. (@jasone)
  - Fix hashing functions to avoid unaligned memory accesses (and resulting
    crashes). This is relevant at least to some ARM-based platforms.
    (@rkmisra)
  - Fix fork()-related lock rank ordering reversals.
    These reversals were unlikely to cause deadlocks in practice except when
    heap profiling was enabled and active. (@jasone)
  - Fix various chunk leaks in OOM code paths. (@jasone)
  - Fix malloc_stats_print() to print opt.narenas correctly. (@jasone)
  - Fix MSVC-specific build/test issues. (@rustyx, @yuslepukhin)
  - Fix a variety of test failures that were due to test fragility rather than
    core bugs. (@jasone)

* 4.1.0 (February 28, 2016)

  This release is primarily about optimizations, but it also incorporates a lot
  of portability-motivated refactoring and enhancements. Many people worked on
  this release. Even with minor changes (see git revision history) and the
  people who reported and diagnosed issues omitted here, so much of the work
  was contributed that, starting with this release, changes are annotated with
  author credits to help reflect the collaborative effort involved.

  New features:
  - Implement decay-based unused dirty page purging, a major optimization with
    mallctl API impact. This is an alternative to the existing ratio-based
    unused dirty page purging, and is intended to eventually become the sole
    purging mechanism.
    New mallctls:
    + opt.purge
    + opt.decay_time
    + arena.<i>.decay
    + arena.<i>.decay_time
    + arenas.decay_time
    + stats.arenas.<i>.decay_time
    (@jasone, @cevans87)
  - Add --with-malloc-conf, which makes it possible to embed a default
    options string during configuration. This was motivated by the desire to
    specify --with-malloc-conf=purge:decay, since the default must remain
    purge:ratio until the 5.0.0 release. (@jasone)
  - Add MS Visual Studio 2015 support. (@rustyx, @yuslepukhin)
  - Make *allocx() size class overflow behavior defined. The maximum
    size class is now less than PTRDIFF_MAX to protect applications against
    numerical overflow, and all allocation functions are guaranteed to indicate
    errors rather than potentially crashing if the request size exceeds the
    maximum size class. (@jasone)
  - jeprof:
    + Add raw heap profile support. (@jasone)
    + Add --retain and --exclude for backtrace symbol filtering. (@jasone)

  Optimizations:
  - Optimize the fast path to combine various bootstrapping and configuration
    checks and execute more streamlined code in the common case. (@interwq)
  - Use linear scan for small bitmaps (used for small object tracking). In
    addition to speeding up bitmap operations on 64-bit systems, this reduces
    allocator metadata overhead by approximately 0.2%.
    (@djwatson)
  - Separate arena_avail trees, which substantially speeds up run tree
    operations. (@djwatson)
  - Use memoization (boot-time-computed table) for run quantization. Separate
    arena_avail trees reduced the importance of this optimization. (@jasone)
  - Attempt mmap-based in-place huge reallocation. This can dramatically speed
    up incremental huge reallocation. (@jasone)

  Incompatible changes:
  - Make opt.narenas unsigned rather than size_t. (@jasone)

  Bug fixes:
  - Fix stats.cactive accounting regression. (@rustyx, @jasone)
  - Handle unaligned keys in hash(). This caused problems for some ARM systems.
    (@jasone, @cferris1000)
  - Refactor arenas array. In addition to fixing a fork-related deadlock, this
    makes arena lookups faster and simpler. (@jasone)
  - Move retained memory allocation out of the default chunk allocation
    function, to a location that gets executed even if the application installs
    a custom chunk allocation function. This resolves a virtual memory leak.
    (@buchgr)
  - Fix a potential tsd cleanup leak. (@cferris1000, @jasone)
  - Fix run quantization. In practice this bug had no impact unless
    applications requested memory with alignment exceeding one page.
    (@jasone, @djwatson)
  - Fix LinuxThreads-specific bootstrapping deadlock. (Cosmin Paraschiv)
  - jeprof:
    + Don't discard curl options if timeout is not defined.
      (@djwatson)
    + Detect failed profile fetches. (@djwatson)
  - Fix stats.arenas.<i>.{dss,lg_dirty_mult,decay_time,pactive,pdirty} for
    the --disable-stats case. (@jasone)

* 4.0.4 (October 24, 2015)

  This bugfix release fixes another xallocx() regression. No other regressions
  have come to light in over a month, so this is likely a good starting point
  for people who prefer to wait for "dot one" releases with all the major issues
  shaken out.

  Bug fixes:
  - Fix xallocx(..., MALLOCX_ZERO) to zero the last full trailing page of large
    allocations that have been randomly assigned an offset of 0 when the
    --enable-cache-oblivious configure option is enabled.

* 4.0.3 (September 24, 2015)

  This bugfix release continues the trend of xallocx() and heap profiling fixes.

  Bug fixes:
  - Fix xallocx(..., MALLOCX_ZERO) to zero all trailing bytes of large
    allocations when the --enable-cache-oblivious configure option is enabled.
  - Fix xallocx(..., MALLOCX_ZERO) to zero trailing bytes of huge allocations
    when resizing from/to a size class that is not a multiple of the chunk size.
  - Fix prof_tctx_dump_iter() to filter out nodes that were created after heap
    profile dumping started.
  - Work around a potentially bad thread-specific data initialization
    interaction with NPTL (glibc's pthreads implementation).
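
  The xallocx(..., MALLOCX_ZERO) fixes in 4.0.3 and 4.0.4 all restore one
  invariant: when an allocation grows in place with zeroing requested, every
  byte between the old and new sizes must read as zero. The following is a
  minimal sketch of that invariant only, not jemalloc code. The helpers
  sketch_resize_in_place() and sketch_range_is_zero() are hypothetical and
  operate on a caller-owned buffer rather than on allocator state.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for an in-place resize with a "zero" flag.
 * buf is a caller-owned buffer of capacity bytes; old_size and new_size
 * play the role of usable sizes. Returns the resulting size, which is
 * old_size when the request cannot be satisfied in place.
 */
static size_t
sketch_resize_in_place(unsigned char *buf, size_t capacity, size_t old_size,
    size_t new_size, int zero)
{
    assert(buf != NULL && old_size <= capacity);
    if (new_size > capacity) {
        /* Cannot grow in place; leave the buffer untouched. */
        return old_size;
    }
    if (zero && new_size > old_size) {
        /* Zero every trailing byte in [old_size, new_size). */
        memset(buf + old_size, 0, new_size - old_size);
    }
    return new_size;
}

/* Returns 1 if every byte in [begin, end) is zero, and 0 otherwise. */
static int
sketch_range_is_zero(const unsigned char *buf, size_t begin, size_t end)
{
    for (size_t i = begin; i < end; i++) {
        if (buf[i] != 0) {
            return 0;
        }
    }
    return 1;
}
```

  A caller could fill a 64-byte buffer with 0x5a, grow it from 16 to 48 bytes
  with the zero flag set, and then check sketch_range_is_zero(buf, 16, 48);
  bytes outside that range keep their junk value.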

* 4.0.2 (September 21, 2015)

  This bugfix release addresses a few bugs specific to heap profiling.

  Bug fixes:
  - Fix ixallocx_prof_sample() to never modify nor create sampled small
    allocations. xallocx() is in general incapable of moving small allocations,
    so this fix removes buggy code without loss of generality.
  - Fix irallocx_prof_sample() to always allocate large regions, even when
    alignment is non-zero.
  - Fix prof_alloc_rollback() to read tdata from thread-specific data rather
    than dereferencing a potentially invalid tctx.

* 4.0.1 (September 15, 2015)

  This is a bugfix release that is somewhat high risk due to the amount of
  refactoring required to address deep xallocx() problems. As a side effect of
  these fixes, xallocx() now tries harder to partially fulfill requests for
  optional extra space. Note that a couple of minor heap profiling
  optimizations are included, but these are better thought of as performance
  fixes that were integral to discovering most of the other bugs.

  Optimizations:
  - Avoid a chunk metadata read in arena_prof_tctx_set(), since it is in the
    fast path when heap profiling is enabled. Additionally, split a special
    case out into arena_prof_tctx_reset(), which also avoids chunk metadata
    reads.
  - Optimize irallocx_prof() to optimistically update the sampler state.
    The prior implementation appears to have been a holdover from when
    rallocx()/xallocx() functionality was combined as rallocm().

  Bug fixes:
  - Fix TLS configuration such that it is enabled by default for platforms on
    which it works correctly.
  - Fix arenas_cache_cleanup() and arena_get_hard() to handle
    allocation/deallocation within the application's thread-specific data
    cleanup functions even after arenas_cache is torn down.
  - Fix xallocx() bugs related to size+extra exceeding HUGE_MAXCLASS.
  - Fix chunk purge hook calls for in-place huge shrinking reallocation to
    specify the old chunk size rather than the new chunk size. This bug caused
    no correctness issues for the default chunk purge function, but was
    visible to custom functions set via the "arena.<i>.chunk_hooks" mallctl.
  - Fix heap profiling bugs:
    + Fix heap profiling to distinguish among otherwise identical sample sites
      with interposed resets (triggered via the "prof.reset" mallctl). This bug
      could cause data structure corruption that would most likely result in a
      segfault.
    + Fix irealloc_prof() to call prof_alloc_rollback() on OOM.
    + Make one call to prof_active_get_unlocked() per allocation event, and use
      the result throughout the relevant functions that handle an allocation
      event. Also add a missing check in prof_realloc(). These fixes protect
      allocation events against concurrent prof_active changes.
    + Fix ixallocx_prof() to pass usize_max and zero to ixallocx_prof_sample()
      in the correct order.
    + Fix prof_realloc() to call prof_free_sampled_object() after calling
      prof_malloc_sample_object(). Prior to this fix, if tctx and old_tctx were
      the same, the tctx could have been prematurely destroyed.
  - Fix portability bugs:
    + Don't bitshift by negative amounts when encoding/decoding run sizes in
      chunk header maps. This affected systems with page sizes greater than 8
      KiB.
    + Rename index_t to szind_t to avoid an existing type on Solaris.
    + Add JEMALLOC_CXX_THROW to the memalign() function prototype, in order to
      match glibc and avoid compilation errors when including both
      jemalloc/jemalloc.h and malloc.h in C++ code.
    + Don't assume that /bin/sh is appropriate when running size_classes.sh
      during configuration.
    + Consider __sparcv9 a synonym for __sparc64__ when defining LG_QUANTUM.
    + Link tests to librt if it contains clock_gettime(2).

* 4.0.0 (August 17, 2015)

  This version contains many speed and space optimizations, both minor and
  major. The major themes are generalization, unification, and simplification.
  Although many of these optimizations cause no visible behavior change, their
  cumulative effect is substantial.

  New features:
  - Normalize size class spacing to be consistent across the complete size
    range.
    By default there are four size classes per size doubling, but this
    is now configurable via the --with-lg-size-class-group option. Also add the
    --with-lg-page, --with-lg-page-sizes, --with-lg-quantum, and
    --with-lg-tiny-min options, which can be used to tweak page and size class
    settings. Impacts:
    + Worst case performance for incrementally growing/shrinking reallocation
      is improved because there are far fewer size classes, and therefore
      copying happens less often.
    + Internal fragmentation is limited to 20% for all but the smallest size
      classes (those less than four times the quantum). Requests of
      (1B + 4 KiB) and (1B + 4 MiB) previously suffered nearly 50% internal
      fragmentation.
    + Chunk fragmentation tends to be lower because there are fewer distinct run
      sizes to pack.
  - Add support for explicit tcaches. The "tcache.create", "tcache.flush", and
    "tcache.destroy" mallctls control tcache lifetime and flushing, and the
    MALLOCX_TCACHE(tc) and MALLOCX_TCACHE_NONE flags to the *allocx() API
    control which tcache is used for each operation.
  - Implement per thread heap profiling, as well as the ability to
    enable/disable heap profiling on a per thread basis. Add the "prof.reset",
    "prof.lg_sample", "thread.prof.name", "thread.prof.active",
    "opt.prof_thread_active_init", and "prof.thread_active_init" mallctls.
  - Add support for per arena application-specified chunk allocators, configured
    via the "arena.<i>.chunk_hooks" mallctl.
  - Refactor huge allocation to be managed by arenas, so that arenas now
    function as general purpose independent allocators. This is important in
    the context of user-specified chunk allocators, aside from the scalability
    benefits. Related new statistics:
    + The "stats.arenas.<i>.huge.allocated", "stats.arenas.<i>.huge.nmalloc",
      "stats.arenas.<i>.huge.ndalloc", and "stats.arenas.<i>.huge.nrequests"
      mallctls provide high level per arena huge allocation statistics.
    + The "arenas.nhchunks", "arenas.hchunk.<i>.size",
      "stats.arenas.<i>.hchunks.<j>.nmalloc",
      "stats.arenas.<i>.hchunks.<j>.ndalloc",
      "stats.arenas.<i>.hchunks.<j>.nrequests", and
      "stats.arenas.<i>.hchunks.<j>.curhchunks" mallctls provide per size class
      statistics.
  - Add the 'util' column to malloc_stats_print() output, which reports the
    proportion of available regions that are currently in use for each small
    size class.
  - Add "alloc" and "free" modes for junk filling (see the "opt.junk"
    mallctl), so that it is possible to separately enable junk filling for
    allocation versus deallocation.
  - Add the jemalloc-config script, which provides information about how
    jemalloc was configured, and how to integrate it into application builds.
  - Add metadata statistics, which are accessible via the "stats.metadata",
    "stats.arenas.<i>.metadata.mapped", and
    "stats.arenas.<i>.metadata.allocated" mallctls.
  - Add the "stats.resident" mallctl, which reports the upper limit of
    physically resident memory mapped by the allocator.
  - Add per arena control over unused dirty page purging, via the
    "arenas.lg_dirty_mult", "arena.<i>.lg_dirty_mult", and
    "stats.arenas.<i>.lg_dirty_mult" mallctls.
  - Add the "prof.gdump" mallctl, which makes it possible to toggle the gdump
    feature on/off during program execution.
  - Add sdallocx(), which implements sized deallocation. The primary
    optimization over dallocx() is the removal of a metadata read, which often
    suffers an L1 cache miss.
  - Add missing header includes in jemalloc/jemalloc.h, so that applications
    only have to #include <jemalloc/jemalloc.h>.
  - Add support for additional platforms:
    + Bitrig
    + Cygwin
    + DragonFlyBSD
    + iOS
    + OpenBSD
    + OpenRISC/or1k

  Optimizations:
  - Maintain dirty runs in per arena LRUs rather than in per arena trees of
    dirty-run-containing chunks. In practice this change significantly reduces
    dirty page purging volume.
  - Integrate whole chunks into the unused dirty page purging machinery.
    This reduces the cost of repeated huge allocation/deallocation, because it
    effectively introduces a cache of chunks.
  - Split the arena chunk map into two separate arrays, in order to increase
    cache locality for the frequently accessed bits.
  - Move small run metadata out of runs, into arena chunk headers. This reduces
    run fragmentation, smaller runs reduce external fragmentation for small size
    classes, and packed (less uniformly aligned) metadata layout improves CPU
    cache set distribution.
  - Randomly distribute large allocation base pointer alignment relative to page
    boundaries in order to more uniformly utilize CPU cache sets. This can be
    disabled via the --disable-cache-oblivious configure option, and queried via
    the "config.cache_oblivious" mallctl.
  - Micro-optimize the fast paths for the public API functions.
  - Refactor thread-specific data to reside in a single structure. This ensures
    that only a single TLS read is necessary per call into the public API.
  - Implement in-place huge allocation growing and shrinking.
  - Refactor rtree (radix tree for chunk lookups) to be lock-free, and make
    additional optimizations that reduce maximum lookup depth to one or two
    levels. This resolves what was a concurrency bottleneck for per arena huge
    allocation, because a global data structure is critical for determining
    which arenas own which huge allocations.
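
  The cache-oblivious item in the list above can be illustrated with a small
  sketch. This is not jemalloc's code: the function name, the 4 KiB page size,
  and the 64-byte cache line size are assumptions chosen for illustration. Given
  a random value, it picks a cache-line-aligned offset inside one page, so that
  large allocation base addresses do not all map to the same cache sets:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical constants; real values depend on the platform. */
#define SKETCH_LG_PAGE 12      /* assumed 4 KiB page */
#define SKETCH_LG_CACHELINE 6  /* assumed 64-byte cache line */

/*
 * Map an arbitrary random value to a byte offset that is a multiple of the
 * cache line size and strictly smaller than one page.
 */
static size_t
sketch_random_offset(uint64_t r)
{
    /* Number of cache-line-sized slots in one page (64 with these values). */
    uint64_t nslots = (uint64_t)1 << (SKETCH_LG_PAGE - SKETCH_LG_CACHELINE);

    return (size_t)((r % nslots) << SKETCH_LG_CACHELINE);
}
```

  With these assumed constants there are 64 possible offsets (0, 64, 128, and
  so on up to 4032), so consecutive large allocations are spread across cache
  sets instead of all starting on a page boundary.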

  Incompatible changes:
  - Replace --enable-cc-silence with --disable-cc-silence to suppress spurious
    warnings by default.
  - Ensure that the constness of malloc_usable_size()'s return type matches that
    of the system implementation.
  - Change the heap profile dump format to support per thread heap profiling,
    rename pprof to jeprof, and enhance it with the --thread=<n> option. As a
    result, the bundled jeprof must now be used rather than the upstream
    (gperftools) pprof.
  - Disable "opt.prof_final" by default, in order to avoid atexit(3), which can
    internally deadlock on some platforms.
  - Change the "arenas.nlruns" mallctl type from size_t to unsigned.
  - Replace the "stats.arenas.<i>.bins.<j>.allocated" mallctl with
    "stats.arenas.<i>.bins.<j>.curregs".
  - Ignore MALLOC_CONF in set{uid,gid,cap} binaries.
  - Ignore MALLOCX_ARENA(a) in dallocx(), in favor of using the
    MALLOCX_TCACHE(tc) and MALLOCX_TCACHE_NONE flags to control tcache usage.

  Removed features:
  - Remove the *allocm() API, which is superseded by the *allocx() API.
  - Remove the --enable-dss options, and make dss non-optional on all platforms
    which support sbrk(2).
  - Remove the "arenas.purge" mallctl, which was obsoleted by the
    "arena.<i>.purge" mallctl in 3.1.0.
  - Remove the unnecessary "opt.valgrind" mallctl; jemalloc automatically
    detects whether it is running inside Valgrind.
  - Remove the "stats.huge.allocated", "stats.huge.nmalloc", and
    "stats.huge.ndalloc" mallctls.
  - Remove the --enable-mremap option.
  - Remove the "stats.chunks.current", "stats.chunks.total", and
    "stats.chunks.high" mallctls.

  Bug fixes:
  - Fix the cactive statistic to decrease (rather than increase) when active
    memory decreases. This regression was first released in 3.5.0.
  - Fix OOM handling in memalign() and valloc(). A variant of this bug existed
    in all releases since 2.0.0, which introduced these functions.
  - Fix an OOM-related regression in arena_tcache_fill_small(), which could
    cause cache corruption on OOM. This regression was present in all releases
    from 2.2.0 through 3.6.0.
  - Fix size class overflow handling for malloc(), posix_memalign(), memalign(),
    calloc(), and realloc() when profiling is enabled.
  - Fix the "arena.<i>.dss" mallctl to return an error if "primary" or
    "secondary" precedence is specified, but sbrk(2) is not supported.
  - Fix fallback lg_floor() implementations to handle extremely large inputs.
  - Ensure the default purgeable zone is after the default zone on OS X.
  - Fix latent bugs in atomic_*().
  - Fix the "arena.<i>.dss" mallctl to handle read-only calls.
  - Fix tls_model configuration to enable the initial-exec model when possible.
  - Mark malloc_conf as a weak symbol so that the application can override it.
  - Correctly detect glibc's adaptive pthread mutexes.
  - Fix the --without-export configure option.

* 3.6.0 (March 31, 2014)

  This version contains a critical bug fix for a regression present in 3.5.0 and
  3.5.1.

  Bug fixes:
  - Fix a regression in arena_chunk_alloc() that caused crashes during
    small/large allocation if chunk allocation failed. In the absence of this
    bug, chunk allocation failure would result in allocation failure, e.g. NULL
    return from malloc(). This regression was introduced in 3.5.0.
  - Fix gcc intrinsics-based backtracing by specifying
    -fno-omit-frame-pointer to gcc. Note that the application (and all the
    libraries it links to) must also be compiled with this option for
    backtracing to be reliable.
  - Use dss allocation precedence for huge allocations as well as small/large
    allocations.
  - Fix test assertion failure message formatting. This bug did not manifest on
    x86_64 systems because of implementation subtleties in va_list.
  - Fix inconsequential test failures for hash and SFMT code.

  New features:
  - Support heap profiling on FreeBSD. This feature depends on the proc
    filesystem being mounted during heap profile dumping.

* 3.5.1 (February 25, 2014)

  This version primarily addresses minor bugs in test code.

  Bug fixes:
  - Configure Solaris/Illumos to use MADV_FREE.
  - Fix junk filling for mremap(2)-based huge reallocation. This is only
    relevant if configuring with the --enable-mremap option specified.
  - Avoid compilation failure if 'restrict' C99 keyword is not supported by the
    compiler.
  - Add a configure test for SSE2 rather than assuming it is usable on i686
    systems. This fixes test compilation errors, especially on 32-bit Linux
    systems.
  - Fix mallctl argument size mismatches (size_t vs. uint64_t) in the stats unit
    test.
  - Fix/remove flawed alignment-related overflow tests.
  - Prevent compiler optimizations that could change backtraces in the
    prof_accum unit test.

* 3.5.0 (January 22, 2014)

  This version focuses on refactoring and automated testing, though it also
  includes some non-trivial heap profiling optimizations not mentioned below.

  New features:
  - Add the *allocx() API, which is a successor to the experimental *allocm()
    API. The *allocx() functions are slightly simpler to use because they have
    fewer parameters, they directly return the results of primary interest, and
    mallocx()/rallocx() avoid the strict aliasing pitfall that
    allocm()/rallocm() share with posix_memalign(). Note that *allocm() is
    slated for removal in the next non-bugfix release.
  - Add support for LinuxThreads.

  Bug fixes:
  - Unless heap profiling is enabled, disable floating point code and don't link
    with libm. This, in combination with e.g. EXTRA_CFLAGS=-mno-sse on x64
    systems, makes it possible to completely disable floating point register
    use. Some versions of glibc neglect to save/restore caller-saved floating
    point registers during dynamic lazy symbol loading, and the symbol loading
    code uses whatever malloc the application happens to have linked/loaded
    with, the result being potential floating point register corruption.
  - Report ENOMEM rather than EINVAL if an OOM occurs during heap profiling
    backtrace creation in imemalign(). This bug impacted posix_memalign() and
    aligned_alloc().
  - Fix a file descriptor leak in a prof_dump_maps() error path.
  - Fix prof_dump() to close the dump file descriptor for all relevant error
    paths.
  - Fix rallocm() to use the arena specified by the ALLOCM_ARENA(s) flag for
    allocation, not just deallocation.
  - Fix a data race for large allocation stats counters.
  - Fix a potential infinite loop during thread exit. This bug occurred on
    Solaris, and could affect other platforms with similar pthreads TSD
    implementations.
  - Don't junk-fill reallocations unless usable size changes. This fixes a
    violation of the *allocx()/*allocm() semantics.
  - Fix growing large reallocation to junk fill new space.
  - Fix huge deallocation to junk fill when munmap is disabled.
  - Change the default private namespace prefix from empty to je_, and change
    --with-private-namespace-prefix so that it prepends an additional prefix
    rather than replacing je_. This reduces the likelihood of applications
    which statically link jemalloc experiencing symbol name collisions.
  - Add missing private namespace mangling (relevant when
    --with-private-namespace is specified).
  - Add and use JEMALLOC_INLINE_C so that static inline functions are marked as
    static even for debug builds.
  - Add a missing mutex unlock in a malloc_init_hard() error path. In practice
    this error path is never executed.
  - Fix numerous bugs in malloc_strtoumax() error handling/reporting. These
    bugs had no impact except for malformed inputs.
  - Fix numerous bugs in malloc_snprintf(). These bugs were not exercised by
    existing calls, so they had no impact.

* 3.4.1 (October 20, 2013)

  Bug fixes:
  - Fix a race in the "arenas.extend" mallctl that could cause memory corruption
    of internal data structures and subsequent crashes.
  - Fix Valgrind integration flaws that caused Valgrind warnings about reads of
    uninitialized memory in:
    + arena chunk headers
    + internal zero-initialized data structures (relevant to tcache and prof
      code)
  - Preserve errno during the first allocation.
    A readlink(2) call during initialization fails unless /etc/malloc.conf
    exists, so errno was typically set during the first allocation prior to
    this fix.
  - Fix compilation warnings reported by gcc 4.8.1.

* 3.4.0 (June 2, 2013)

  This version is essentially a small bugfix release, but the addition of
  aarch64 support requires that the minor version be incremented.

  Bug fixes:
  - Fix race-triggered deadlocks in chunk_record(). These deadlocks were
    typically triggered by multiple threads concurrently deallocating huge
    objects.

  New features:
  - Add support for the aarch64 architecture.

* 3.3.1 (March 6, 2013)

  This version fixes bugs that are typically encountered only when utilizing
  custom run-time options.

  Bug fixes:
  - Fix a locking order bug that could cause deadlock during fork if heap
    profiling were enabled.
  - Fix a chunk recycling bug that could cause the allocator to lose track of
    whether a chunk was zeroed. On FreeBSD, NetBSD, and OS X, it could cause
    corruption if allocating via sbrk(2) (unlikely unless running with the
    "dss:primary" option specified).
    This was completely harmless on Linux unless using mlockall(2) (and
    unlikely even then, unless the --disable-munmap configure option or the
    "dss:primary" option was specified). This regression was introduced in
    3.1.0 by the mlockall(2)/madvise(2) interaction fix.
  - Fix TLS-related memory corruption that could occur during thread exit if the
    thread never allocated memory. Only the quarantine and prof facilities were
    susceptible.
  - Fix two quarantine bugs:
    + Internal reallocation of the quarantined object array leaked the old
      array.
    + Reallocation failure for internal reallocation of the quarantined object
      array (very unlikely) resulted in memory corruption.
  - Fix Valgrind integration to annotate all internally allocated memory in a
    way that keeps Valgrind happy about internal data structure access.
  - Fix building for s390 systems.

* 3.3.0 (January 23, 2013)

  This version includes a few minor performance improvements in addition to the
  listed new features and bug fixes.

  New features:
  - Add clipping support to lg_chunk option processing.
  - Add the --enable-ivsalloc option.
  - Add the --without-export option.
  - Add the --disable-zone-allocator option.

  Bug fixes:
  - Fix "arenas.extend" mallctl to output the number of arenas.
  - Fix chunk_recycle() to unconditionally inform Valgrind that returned memory
    is undefined.
  - Fix build break on FreeBSD related to alloca.h.

* 3.2.0 (November 9, 2012)

  In addition to a couple of bug fixes, this version modifies page run
  allocation and dirty page purging algorithms in order to better control
  page-level virtual memory fragmentation.

  Incompatible changes:
  - Change the "opt.lg_dirty_mult" default from 5 to 3 (32:1 to 8:1).

  Bug fixes:
  - Fix dss/mmap allocation precedence code to use recyclable mmap memory only
    after primary dss allocation fails.
  - Fix deadlock in the "arenas.purge" mallctl. This regression was introduced
    in 3.1.0 by the addition of the "arena.<i>.purge" mallctl.

* 3.1.0 (October 16, 2012)

  New features:
  - Auto-detect whether running inside Valgrind, thus removing the need to
    manually specify MALLOC_CONF=valgrind:true.
  - Add the "arenas.extend" mallctl, which allows applications to create
    manually managed arenas.
  - Add the ALLOCM_ARENA() flag for {,r,d}allocm().
  - Add the "opt.dss", "arena.<i>.dss", and "stats.arenas.<i>.dss" mallctls,
    which provide control over dss/mmap precedence.
  - Add the "arena.<i>.purge" mallctl, which obsoletes "arenas.purge".
  - Define LG_QUANTUM for hppa.

  Incompatible changes:
  - Disable tcache by default if running inside Valgrind, in order to avoid
    making unallocated objects appear reachable to Valgrind.
  - Drop const from malloc_usable_size() argument on Linux.

  Bug fixes:
  - Fix heap profiling crash if sampled object is freed via realloc(p, 0).
  - Remove const from __*_hook variable declarations, so that glibc can modify
    them during process forking.
  - Fix mlockall(2)/madvise(2) interaction.
  - Fix fork(2)-related deadlocks.
  - Fix error return value for "thread.tcache.enabled" mallctl.

* 3.0.0 (May 11, 2012)

  Although this version adds some major new features, the primary focus is on
  internal code cleanup that facilitates maintainability and portability, most
  of which is not reflected in the ChangeLog. This is the first release to
  incorporate substantial contributions from numerous other developers, and the
  result is a more broadly useful allocator (see the git revision history for
  contribution details). Note that the license has been unified, thanks to
  Facebook granting a license under the same terms as the other copyright
  holders (see COPYING).

  New features:
  - Implement Valgrind support, redzones, and quarantine.
  - Add support for additional platforms:
    + FreeBSD
    + Mac OS X Lion
    + MinGW
    + Windows (no support yet for replacing the system malloc)
  - Add support for additional architectures:
    + MIPS
    + SH4
    + Tilera
  - Add support for cross compiling.
  - Add nallocm(), which rounds a request size up to the nearest size class
    without actually allocating.
  - Implement aligned_alloc() (blame C11).
  - Add the "thread.tcache.enabled" mallctl.
  - Add the "opt.prof_final" mallctl.
  - Update pprof (from gperftools 2.0).
  - Add the --with-mangling option.
  - Add the --disable-experimental option.
  - Add the --disable-munmap option, and make it the default on Linux.
  - Add the --enable-mremap option, and disable use of mremap(2) by default.

  Incompatible changes:
  - Enable stats by default.
  - Enable fill by default.
  - Disable lazy locking by default.
  - Rename the "tcache.flush" mallctl to "thread.tcache.flush".
  - Rename the "arenas.pagesize" mallctl to "arenas.page".
  - Change the "opt.lg_prof_sample" default from 0 to 19 (1 B to 512 KiB).
  - Change the "opt.prof_accum" default from true to false.
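
  Several options in this file are base-2 logarithms, such as the
  "opt.lg_prof_sample" change listed above (0 to 19, i.e. 1 B to 512 KiB). A
  minimal sketch of that arithmetic follows. The helper name is hypothetical
  and is not part of jemalloc:

```c
#include <stdint.h>

/*
 * Convert a base-2 logarithm option value (for example 19) into the quantity
 * it denotes (2^19 = 524288). Hypothetical helper, valid only for lg < 64.
 */
static uint64_t
sketch_lg_to_value(unsigned lg)
{
    return (uint64_t)1 << lg;
}
```

  Here sketch_lg_to_value(19) is 524288 bytes (512 KiB) and
  sketch_lg_to_value(0) is 1 byte. The same reading applies to the
  "opt.lg_dirty_mult" ratios listed under 3.2.0: 5 corresponds to 32:1 and 3
  corresponds to 8:1.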

  Removed features:
  - Remove the swap feature, including the "config.swap", "swap.avail",
    "swap.prezeroed", "swap.nfds", and "swap.fds" mallctls.
  - Remove highruns statistics, including the
    "stats.arenas.<i>.bins.<j>.highruns" and
    "stats.arenas.<i>.lruns.<j>.highruns" mallctls.
  - As part of small size class refactoring, remove the "opt.lg_[qc]space_max",
    "arenas.cacheline", "arenas.subpage", "arenas.[tqcs]space_{min,max}", and
    "arenas.[tqcs]bins" mallctls.
  - Remove the "arenas.chunksize" mallctl.
  - Remove the "opt.lg_prof_tcmax" option.
  - Remove the "opt.lg_prof_bt_max" option.
  - Remove the "opt.lg_tcache_gc_sweep" option.
  - Remove the --disable-tiny option, including the "config.tiny" mallctl.
  - Remove the --enable-dynamic-page-shift configure option.
  - Remove the --enable-sysv configure option.

  Bug fixes:
  - Fix a statistics-related bug in the "thread.arena" mallctl that could cause
    invalid statistics and crashes.
  - Work around TLS deallocation via free() on Linux. This bug could cause
    write-after-free memory corruption.
  - Fix a potential deadlock that could occur during interval- and
    growth-triggered heap profile dumps.
  - Fix large calloc() zeroing bugs due to dropping chunk map unzeroed flags.
  - Fix chunk_alloc_dss() to stop claiming memory is zeroed.
    This bug could cause memory corruption and crashes with --enable-dss
    specified.
  - Fix fork-related bugs that could cause deadlock in children between fork
    and exec.
  - Fix malloc_stats_print() to honor 'b' and 'l' in the opts parameter.
  - Fix realloc(p, 0) to act like free(p).
  - Do not enforce minimum alignment in memalign().
  - Check for NULL pointer in malloc_usable_size().
  - Fix an off-by-one heap profile statistics bug that could be observed in
    interval- and growth-triggered heap profiles.
  - Fix the "epoch" mallctl to update cached stats even if the passed in epoch
    is 0.
  - Fix bin->runcur management to fix a layout policy bug. This bug did not
    affect correctness.
  - Fix a bug in choose_arena_hard() that potentially caused more arenas to be
    initialized than necessary.
  - Add missing "opt.lg_tcache_max" mallctl implementation.
  - Use glibc allocator hooks to make mixed allocator usage less likely.
  - Fix build issues for --disable-tcache.
  - Don't mangle pthread_create() when --with-private-namespace is specified.

* 2.2.5 (November 14, 2011)

  Bug fixes:
  - Fix huge_ralloc() race when using mremap(2). This is a serious bug that
    could cause memory corruption and/or crashes.
  - Fix huge_ralloc() to maintain chunk statistics.
  - Fix malloc_stats_print(..., "a") output.

* 2.2.4 (November 5, 2011)

  Bug fixes:
  - Initialize arenas_tsd before using it. This bug existed for 2.2.[0-3], as
    well as for --disable-tls builds in earlier releases.
  - Do not assume a 4 KiB page size in test/rallocm.c.

* 2.2.3 (August 31, 2011)

  This version fixes numerous bugs related to heap profiling.

  Bug fixes:
  - Fix a prof-related race condition. This bug could cause memory corruption,
    but only occurred in non-default configurations (prof_accum:false).
  - Fix off-by-one backtracing issues (make sure that prof_alloc_prep() is
    excluded from backtraces).
  - Fix a prof-related bug in realloc() (only triggered by OOM errors).
  - Fix prof-related bugs in allocm() and rallocm().
  - Fix prof_tdata_cleanup() for --disable-tls builds.
  - Fix a relative include path, to fix objdir builds.

* 2.2.2 (July 30, 2011)

  Bug fixes:
  - Fix a build error for --disable-tcache.
  - Fix assertions in arena_purge() (for real this time).
  - Add the --with-private-namespace option. This is a workaround for symbol
    conflicts that can inadvertently arise when using static libraries.

* 2.2.1 (March 30, 2011)

  Bug fixes:
  - Implement atomic operations for x86/x64.
    This fixes compilation failures for versions of gcc that are still in wide
    use.
  - Fix an assertion in arena_purge().

* 2.2.0 (March 22, 2011)

  This version incorporates several improvements to algorithms and data
  structures that tend to reduce fragmentation and increase speed.

  New features:
  - Add the "stats.cactive" mallctl.
  - Update pprof (from google-perftools 1.7).
  - Improve backtracing-related configuration logic, and add the
    --disable-prof-libgcc option.

  Bug fixes:
  - Change default symbol visibility from "internal" to "hidden", which
    decreases the overhead of library-internal function calls.
  - Fix symbol visibility so that it is also set on OS X.
  - Fix a build dependency regression caused by the introduction of the .pic.o
    suffix for PIC object files.
  - Add missing checks for mutex initialization failures.
  - Don't use libgcc-based backtracing except on x64, where it is known to work.
  - Fix deadlocks on OS X that were due to memory allocation in
    pthread_mutex_lock().
  - Heap profiling-specific fixes:
    + Fix memory corruption due to integer overflow in small region index
      computation, when using a small enough sample interval that profiling
      context pointers are stored in small run headers.
    + Fix a bootstrap ordering bug that only occurred with TLS disabled.
    + Fix a rallocm() rsize bug.
    + Fix error detection bugs for aligned memory allocation.

* 2.1.3 (March 14, 2011)

  Bug fixes:
  - Fix a cpp logic regression (due to the "thread.{de,}allocatedp" mallctl fix
    for OS X in 2.1.2).
  - Fix a "thread.arena" mallctl bug.
  - Fix a thread cache stats merging bug.

* 2.1.2 (March 2, 2011)

  Bug fixes:
  - Fix "thread.{de,}allocatedp" mallctl for OS X.
  - Add missing jemalloc.a to build system.

* 2.1.1 (January 31, 2011)

  Bug fixes:
  - Fix aligned huge reallocation (affected allocm()).
  - Fix the ALLOCM_LG_ALIGN macro definition.
  - Fix a heap dumping deadlock.
  - Fix a "thread.arena" mallctl bug.

* 2.1.0 (December 3, 2010)

  This version incorporates some optimizations that can't quite be considered
  bug fixes.

  New features:
  - Use Linux's mremap(2) for huge object reallocation when possible.
  - Avoid locking in mallctl*() when possible.
  - Add the "thread.[de]allocatedp" mallctls.
  - Convert the manual page source from roff to DocBook, and generate both roff
    and HTML manuals.
1383a4bd5210SJason Evans 1384a4bd5210SJason Evans Bug fixes: 1385a4bd5210SJason Evans - Fix a crash due to incorrect bootstrap ordering. This only impacted 1386a4bd5210SJason Evans --enable-debug --enable-dss configurations. 1387a4bd5210SJason Evans - Fix a minor statistics bug for mallctl("swap.avail", ...). 1388a4bd5210SJason Evans 1389a4bd5210SJason Evans* 2.0.1 (October 29, 2010) 1390a4bd5210SJason Evans 1391a4bd5210SJason Evans Bug fixes: 1392a4bd5210SJason Evans - Fix a race condition in heap profiling that could cause undefined behavior 1393a4bd5210SJason Evans if "opt.prof_accum" were disabled. 1394a4bd5210SJason Evans - Add missing mutex unlocks for some OOM error paths in the heap profiling 1395a4bd5210SJason Evans code. 1396a4bd5210SJason Evans - Fix a compilation error for non-C99 builds. 1397a4bd5210SJason Evans 1398a4bd5210SJason Evans* 2.0.0 (October 24, 2010) 1399a4bd5210SJason Evans 1400a4bd5210SJason Evans This version focuses on the experimental *allocm() API, and on improved 1401a4bd5210SJason Evans run-time configuration/introspection. Nonetheless, numerous performance 1402a4bd5210SJason Evans improvements are also included. 1403a4bd5210SJason Evans 1404a4bd5210SJason Evans New features: 1405a4bd5210SJason Evans - Implement the experimental {,r,s,d}allocm() API, which provides a superset 1406a4bd5210SJason Evans of the functionality available via malloc(), calloc(), posix_memalign(), 1407a4bd5210SJason Evans realloc(), malloc_usable_size(), and free(). These functions can be used to 1408a4bd5210SJason Evans allocate/reallocate aligned zeroed memory, ask for optional extra memory 1409a4bd5210SJason Evans during reallocation, prevent object movement during reallocation, etc. 1410a4bd5210SJason Evans - Replace JEMALLOC_OPTIONS/JEMALLOC_PROF_PREFIX with MALLOC_CONF, which is 1411a4bd5210SJason Evans more human-readable, and more flexible. 
For example: 1412a4bd5210SJason Evans JEMALLOC_OPTIONS=AJP 1413a4bd5210SJason Evans is now: 1414a4bd5210SJason Evans MALLOC_CONF=abort:true,fill:true,stats_print:true 1415a4bd5210SJason Evans - Port to Apple OS X. Sponsored by Mozilla. 1416a4bd5210SJason Evans - Make it possible for the application to control thread-->arena mappings via 1417a4bd5210SJason Evans the "thread.arena" mallctl. 1418a4bd5210SJason Evans - Add compile-time support for all TLS-related functionality via pthreads TSD. 1419a4bd5210SJason Evans This is mainly of interest for OS X, which does not support TLS, but has a 1420a4bd5210SJason Evans TSD implementation with similar performance. 1421a4bd5210SJason Evans - Override memalign() and valloc() if they are provided by the system. 1422a4bd5210SJason Evans - Add the "arenas.purge" mallctl, which can be used to synchronously purge all 1423a4bd5210SJason Evans dirty unused pages. 1424a4bd5210SJason Evans - Make cumulative heap profiling data optional, so that it is possible to 1425a4bd5210SJason Evans limit the amount of memory consumed by heap profiling data structures. 1426a4bd5210SJason Evans - Add per thread allocation counters that can be accessed via the 1427a4bd5210SJason Evans "thread.allocated" and "thread.deallocated" mallctls. 1428a4bd5210SJason Evans 1429a4bd5210SJason Evans Incompatible changes: 1430a4bd5210SJason Evans - Remove JEMALLOC_OPTIONS and malloc_options (see MALLOC_CONF above). 1431a4bd5210SJason Evans - Increase default backtrace depth from 4 to 128 for heap profiling. 1432a4bd5210SJason Evans - Disable interval-based profile dumps by default. 1433a4bd5210SJason Evans 1434a4bd5210SJason Evans Bug fixes: 1435a4bd5210SJason Evans - Remove bad assertions in fork handler functions. These assertions could 1436a4bd5210SJason Evans cause aborts for some combinations of configure settings. 1437a4bd5210SJason Evans - Fix strerror_r() usage to deal with non-standard semantics in GNU libc. 
  - Fix leak context reporting. This bug tended to cause the number of contexts
    to be underreported (though the reported number of objects and bytes were
    correct).
  - Fix a realloc() bug for large in-place growing reallocation. This bug could
    cause memory corruption, but it was hard to trigger.
  - Fix an allocation bug for small allocations that could be triggered if
    multiple threads raced to create a new run of backing pages.
  - Enhance the heap profiler to trigger samples based on usable size, rather
    than request size.
  - Fix a heap profiling bug due to sometimes losing track of requested object
    size for sampled objects.

* 1.0.3 (August 12, 2010)

  Bug fixes:
  - Fix the libunwind-based implementation of stack backtracing (used for heap
    profiling). This bug could cause zero-length backtraces to be reported.
  - Add a missing mutex unlock in library initialization code. If multiple
    threads raced to initialize malloc, some of them could end up permanently
    blocked.

* 1.0.2 (May 11, 2010)

  Bug fixes:
  - Fix junk filling of large objects, which could cause memory corruption.
  - Add MAP_NORESERVE support for chunk mapping, because otherwise virtual
    memory limits could cause swap file configuration to fail. Contributed by
    Jordan DeLong.

* 1.0.1 (April 14, 2010)

  Bug fixes:
  - Fix compilation when --enable-fill is specified.
  - Fix threads-related profiling bugs that affected accuracy and caused memory
    to be leaked during thread exit.
  - Fix dirty page purging race conditions that could cause crashes.
  - Fix crash in tcache flushing code during thread destruction.

* 1.0.0 (April 11, 2010)

  This release focuses on speed and run-time introspection. Numerous
  algorithmic improvements make this release substantially faster than its
  predecessors.

  New features:
  - Implement autoconf-based configuration system.
  - Add mallctl*(), for the purposes of introspection and run-time
    configuration.
  - Make it possible for the application to manually flush a thread's cache, via
    the "tcache.flush" mallctl.
  - Base maximum dirty page count on proportion of active memory.
  - Compute various additional run-time statistics, including per size class
    statistics for large objects.
  - Expose malloc_stats_print(), which can be called repeatedly by the
    application.
  - Simplify the malloc_message() signature to only take one string argument,
    and incorporate an opaque data pointer argument for use by the application
    in combination with malloc_stats_print().
  - Add support for allocation backed by one or more swap files, and allow the
    application to disable over-commit if swap files are in use.
  - Implement allocation profiling and leak checking.

  Removed features:
  - Remove the dynamic arena rebalancing code, since thread-specific caching
    reduces its utility.

  Bug fixes:
  - Modify chunk allocation to work when address space layout randomization
    (ASLR) is in use.
  - Fix thread cleanup bugs related to TLS destruction.
  - Handle 0-size allocation requests in posix_memalign().
  - Fix a chunk leak. The leaked chunks were never touched, so this impacted
    virtual memory usage, but not physical memory usage.

* linux_2008082[78]a (August 27/28, 2008)

  These snapshot releases are the simple result of incorporating Linux-specific
  support into the FreeBSD malloc sources.

--------------------------------------------------------------------------------
vim:filetype=text:textwidth=80