Following are change highlights associated with official releases.  Important
bug fixes are all mentioned, but some internal enhancements are omitted here for
brevity.  Much more detail can be found in the git revision history:

    https://github.com/jemalloc/jemalloc

* 5.2.1 (August 5, 2019)

  This release is primarily about Windows.  A critical virtual memory leak is
  resolved on all Windows platforms.  The regression was present in all releases
  since 5.0.0.

  Bug fixes:
  - Fix a severe virtual memory leak on Windows.  This regression was first
    released in 5.0.0.  (@Ignition, @j0t, @frederik-h, @davidtgoldblatt,
    @interwq)
  - Fix size 0 handling in posix_memalign().  This regression was first released
    in 5.2.0.  (@interwq)
  - Fix the prof_log unit test, which may observe unexpected backtraces caused
    by compiler optimizations.  The test was first added in 5.2.0.  (@marxin,
    @gnzlbg, @interwq)
  - Fix the declaration of the extent_avail tree.  This regression was first
    released in 5.1.0.  (@zoulasc)
  - Fix an incorrect reference in jeprof.  This functionality was first released
    in 3.0.0.  (@prehistoric-penguin)
  - Fix an assertion on the deallocation fast-path.  This regression was first
    released in 5.2.0.  (@yinan1048576)
  - Fix the TLS_MODEL attribute in headers.  This regression was first released
    in 5.0.0.  (@zoulasc, @interwq)
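
  A minimal sketch, assuming a POSIX system, of calling posix_memalign() with a
  size of 0 in a portable way.  POSIX leaves the result implementation-defined
  (either a null pointer or a unique pointer, both of which may be passed to
  free()), so a caller should accept either outcome.  The helper name
  try_memalign() is hypothetical.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper: returns 0 on success and stores the result in *out.
 * For size == 0, *out may be NULL or a unique pointer; either is safe to
 * pass to free().
 */
static int try_memalign(void **out, size_t alignment, size_t size) {
    void *p = NULL;
    int err = posix_memalign(&p, alignment, size);
    if (err != 0) {
        *out = NULL;
        return err; /* EINVAL (bad alignment) or ENOMEM */
    }
    /* Any non-NULL result must honor the requested alignment. */
    assert(p == NULL || (uintptr_t)p % alignment == 0);
    *out = p;
    return 0;
}
```

  Usage: `void *p; if (try_memalign(&p, 64, 0) == 0) free(p);` works whichever
  size-0 behavior the allocator chooses.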

  Optimizations and refactors:
  - Implement opt.retain on Windows and enable by default on 64-bit.  (@interwq,
    @davidtgoldblatt)
  - Optimize away a branch on the operator delete[] path.  (@mgrice)
  - Add format annotation to the format generator function.  (@zoulasc)
  - Refactor and improve the size class header generation.  (@yinan1048576)
  - Remove best fit.  (@djwatson)
  - Avoid blocking on background thread locks for stats.  (@oranagra, @interwq)

* 5.2.0 (April 2, 2019)

  This release includes a few notable improvements, which are summarized below:
  1) improved fast-path performance from the optimizations by @djwatson; 2)
  reduced virtual memory fragmentation and metadata usage; and 3) bug fixes for
  setting the number of background threads.  In addition, peak / spike memory
  usage is improved with certain allocation patterns.  As usual, the release and
  prior dev versions have gone through large-scale production testing.

  New features:
  - Implement oversize_threshold, which uses a dedicated arena for allocations
    crossing the specified threshold to reduce fragmentation.  (@interwq)
  - Add extents usage information to stats.  (@tyleretzel)
  - Log time information for sampled allocations.  (@tyleretzel)
  - Support 0 size in sdallocx.  (@djwatson)
  - Output rates for certain counters in malloc_stats.  (@zinoale)
  - Add configure option --enable-readlinkat, which allows the use of readlinkat
    over readlink.  (@davidtgoldblatt)
  - Add configure options --{enable,disable}-{static,shared} to allow skipping
    unwanted libraries.  (@Ericson2314)
  - Add configure option --disable-libdl to enable fully static builds.
    (@interwq)
  - Add mallctl interfaces:
    + opt.oversize_threshold  (@interwq)
    + stats.arenas.<i>.extent_avail  (@tyleretzel)
    + stats.arenas.<i>.extents.<j>.n{dirty,muzzy,retained}  (@tyleretzel)
    + stats.arenas.<i>.extents.<j>.{dirty,muzzy,retained}_bytes
      (@tyleretzel)
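
  A minimal sketch of the routing decision behind oversize_threshold.  It
  assumes a request goes to the dedicated arena when its size is at or above
  the threshold (the exact comparison is an assumption), and it uses the 8 MiB
  default stated for opt.oversize_threshold in this release's notes.  The arena
  IDs and choose_arena() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical arena IDs: normal arenas are >= 0, the dedicated one is -1. */
#define ARENA_OVERSIZE (-1)

/* 8 MiB, the default given for opt.oversize_threshold in 5.2.0. */
#define OVERSIZE_THRESHOLD ((size_t)8 << 20)

/*
 * Route a request either to the calling thread's usual arena or to the
 * dedicated oversize arena.  Keeping very large extents out of the normal
 * arenas is intended to reduce fragmentation there.
 */
static int choose_arena(size_t size, int thread_arena) {
    assert(thread_arena >= 0);
    if (size >= OVERSIZE_THRESHOLD) { /* assumed comparison */
        return ARENA_OVERSIZE;
    }
    return thread_arena;
}
```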

  Portability improvements:
  - Update MSVC builds.  (@maksqwe, @rustyx)
  - Work around a compiler optimizer bug on s390x.  (@rkmisra)
  - Make use of pthread_set_name_np(3) on FreeBSD.  (@trasz)
  - Implement malloc_getcpu() to enable percpu_arena on Windows.  (@santagada)
  - Link against -pthread instead of -lpthread.  (@paravoid)
  - Make background_thread not dependent on libdl.  (@interwq)
  - Add stringify to fix a linker directive issue on MSVC.  (@daverigby)
  - Detect and fall back when 8-bit atomics are unavailable.  (@interwq)
  - Fall back to the default pthread_create if dlsym(3) fails.  (@interwq)

  Optimizations and refactors:
  - Refactor the TSD module.  (@davidtgoldblatt)
  - Avoid taking the extents_muzzy mutex when muzzy is disabled.  (@interwq)
  - Avoid taking large_mtx for auto arenas on the tcache flush path.  (@interwq)
  - Optimize ixalloc by avoiding a size lookup.  (@interwq)
  - Implement opt.oversize_threshold, which uses a dedicated arena for requests
    crossing the threshold and eagerly purges the oversize extents.  The
    threshold defaults to 8 MiB.  (@interwq)
  - Compile cleanly with -Wextra.  (@gnzlbg, @jasone)
  - Refactor the size class module.  (@davidtgoldblatt)
  - Refactor the stats emitter.  (@tyleretzel)
  - Optimize pow2_ceil.  (@rkmisra)
  - Avoid runtime detection of lazy purging on FreeBSD.  (@trasz)
  - Optimize mmap(2) alignment handling on FreeBSD.  (@trasz)
  - Improve error handling for THP state initialization.  (@jsteemann)
  - Rework the malloc() fast path.  (@djwatson)
  - Rework the free() fast path.  (@djwatson)
  - Refactor and optimize the tcache fill / flush paths.  (@djwatson)
  - Optimize sync / lwsync on PowerPC.  (@chmeeedalf)
  - Bypass extent_dalloc() when retain is enabled.  (@interwq)
  - Optimize the locking on large deallocation.  (@interwq)
  - Reduce the number of pages committed from sanity checking in debug builds.
    (@trasz, @interwq)
  - Deprecate OSSpinLock.  (@interwq)
  - Lower the default number of background threads to 4 (when the feature
    is enabled).  (@interwq)
  - Optimize the trylock spin wait.  (@djwatson)
  - Use arena index for arena-matching checks.  (@interwq)
  - Avoid forced decay on thread termination when using background threads.
    (@interwq)
  - Disable muzzy decay by default.  (@djwatson, @interwq)
  - Only initialize the libgcc unwinder when profiling is enabled.  (@paravoid,
    @interwq)

  Bug fixes (all only relevant to jemalloc 5.x):
  - Fix background thread index issues with max_background_threads.  (@djwatson,
    @interwq)
  - Fix stats output for opt.lg_extent_max_active_fit.  (@interwq)
  - Fix opt.prof_prefix initialization.  (@davidtgoldblatt)
  - Properly trigger decay on tcache destroy.  (@interwq, @amosbird)
  - Fix tcache.flush.  (@interwq)
  - Detect whether explicit extent zeroing is necessary with huge pages or
    custom extent hooks, which may change the purge semantics.  (@interwq)
  - Fix a side effect caused by extent_max_active_fit combined with decay-based
    purging, where freed extents can accumulate and not be reused for an
    extended period of time.  (@interwq, @mpghf)
  - Fix a missing unlock in extent register error handling.  (@zoulasc)

  Testing:
  - Simplify the Travis script output.  (@gnzlbg)
  - Update the test scripts for FreeBSD.  (@devnexen)
  - Add unit tests for the producer-consumer pattern.  (@interwq)
  - Add Cirrus-CI config for FreeBSD builds.  (@jasone)
  - Add size-matching sanity checks on tcache flush.  (@davidtgoldblatt,
    @interwq)

  Incompatible changes:
  - Remove --with-lg-page-sizes.  (@davidtgoldblatt)

  Documentation:
  - Attempt to build docs by default, but skip doc building when xsltproc
    is missing.  (@interwq, @cmuellner)

* 5.1.0 (May 4, 2018)

  This release is primarily about fine-tuning, ranging from several new features
  to numerous notable performance and portability enhancements.  The release and
  prior dev versions have been running in multiple large-scale applications for
  months, and the cumulative improvements are substantial in many cases.

  Given the long and successful production runs, this release is likely a good
  upgrade candidate for applications running jemalloc 5.0 or earlier.  For
  performance-critical applications, the newly added TUNING.md provides
  guidelines on jemalloc tuning.

  New features:
  - Implement transparent huge page support for internal metadata.  (@interwq)
  - Add opt.thp to allow enabling / disabling transparent huge pages for all
    mappings.  (@interwq)
  - Add a maximum background thread count option.  (@djwatson)
  - Allow prof_active to control opt.lg_prof_interval and prof.gdump.
    (@interwq)
  - Allow arena index lookup based on allocation addresses via mallctl.
    (@lionkov)
  - Allow disabling the initial-exec TLS model.  (@davidtgoldblatt, @KenMacD)
  - Add opt.lg_extent_max_active_fit to set the max ratio between the size of
    the active extent selected (to split off from) and the size of the requested
    allocation.  (@interwq, @davidtgoldblatt)
  - Add retain_grow_limit to set the max size when growing virtual address
    space.  (@interwq)
  - Add mallctl interfaces:
    + arena.<i>.retain_grow_limit  (@interwq)
    + arenas.lookup  (@lionkov)
    + max_background_threads  (@djwatson)
    + opt.lg_extent_max_active_fit  (@interwq)
    + opt.max_background_threads  (@djwatson)
    + opt.metadata_thp  (@interwq)
    + opt.thp  (@interwq)
    + stats.metadata_thp  (@interwq)
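
  A minimal sketch of the kind of check opt.lg_extent_max_active_fit implies:
  an active extent is rejected for splitting when it is too large relative to
  the request.  Expressing the ratio as a right shift by the lg (log2) value is
  an assumption of this sketch, as is the function name
  extent_fit_acceptable().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical check: accept the extent for splitting only if its size,
 * shifted right by lg_max_fit, does not exceed the requested size.
 */
static bool extent_fit_acceptable(size_t extent_size, size_t request,
                                  unsigned lg_max_fit) {
    assert(request > 0);
    assert(lg_max_fit < sizeof(size_t) * 8); /* keep the shift well-defined */
    return (extent_size >> lg_max_fit) <= request;
}
```

  With lg_max_fit = 6 (a ratio of 64, chosen here only as an example), a 1 MiB
  extent is rejected for an 8 KiB request but accepted for a 16 KiB one.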

  Portability improvements:
  - Support GNU/kFreeBSD configuration.  (@paravoid)
  - Support m68k, nios2 and SH3 architectures.  (@paravoid)
  - Fall back to FD_CLOEXEC when O_CLOEXEC is unavailable.  (@zonyitoo)
  - Fix symbol listing for cross-compiling.  (@tamird)
  - Fix high bits computation on ARM.  (@davidtgoldblatt, @paravoid)
  - Disable the CPU_SPINWAIT macro for Power.  (@davidtgoldblatt, @marxin)
  - Fix MSVC 2015 & 2017 builds.  (@rustyx)
  - Improve RISC-V support.  (@EdSchouten)
  - Run the name mangling script in strict mode.  (@nicolov)
  - Avoid MADV_HUGEPAGE on ARM.  (@marxin)
  - Modify configure to determine the return value of strerror_r.
    (@davidtgoldblatt, @cferris1000)
  - Make sure CXXFLAGS is tested with the C++ compiler.  (@nehaljwani)
  - Fix 32-bit build on MSVC.  (@rustyx)
  - Fix external symbol on MSVC.  (@maksqwe)
  - Avoid a printf format specifier warning.  (@jasone)
  - Add configure option --disable-initial-exec-tls, which can allow jemalloc to
    be dynamically loaded after program startup.  (@davidtgoldblatt, @KenMacD)
  - AArch64: Add ILP32 support.  (@cmuellner)
  - Add --with-lg-vaddr configure option to support cross compiling.
    (@cmuellner, @davidtgoldblatt)

  Optimizations and refactors:
  - Improve active extent fit with extent_max_active_fit.  This considerably
    reduces fragmentation over time and improves virtual memory and metadata
    usage.  (@davidtgoldblatt, @interwq)
  - Eagerly coalesce large extents to reduce fragmentation.  (@interwq)
  - sdallocx: only read size info when page aligned (i.e. possibly sampled),
    which speeds up the sized deallocation path significantly.  (@interwq)
  - Avoid attempting new mappings for in-place expansion with retain, since
    it rarely succeeds in practice and causes high overhead.  (@interwq)
  - Refactor OOM handling in newImpl.  (@wqfish)
  - Add internal fine-grained logging functionality for debugging use.
    (@davidtgoldblatt)
  - Refactor arena / tcache interactions.  (@davidtgoldblatt)
  - Refactor extent management with dumpable flag.  (@davidtgoldblatt)
  - Add runtime detection of lazy purging.  (@interwq)
  - Use a pairing heap instead of a red-black tree for extents_avail.
    (@djwatson)
  - Use sysctl on startup on FreeBSD.  (@trasz)
  - Use thread-local prng state instead of atomic state.  (@djwatson)
  - Make decay always purge one more extent than before, because in
    practice large extents are usually the ones that cross the decay threshold.
    Purging the additional extent helps save memory as well as reduce VM
    fragmentation.  (@interwq)
  - Implement fast division by dynamic values.  (@davidtgoldblatt)
  - Improve the fit for aligned allocation.  (@interwq, @edwinsmith)
  - Refactor extent_t bitpacking.  (@rkmisra)
  - Optimize the generated assembly for ticker operations.  (@davidtgoldblatt)
  - Convert stats printing to use a structured text emitter.  (@davidtgoldblatt)
  - Remove the preserve_lru feature from extents management.  (@djwatson)
  - Consolidate two memory loads into one on the fast deallocation path.
    (@davidtgoldblatt, @interwq)
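
  The note on fast division by dynamic values does not describe the method.
  One well-known technique, sketched here under the assumption that every
  division is exact (the dividend is a multiple of the divisor, as when a byte
  offset within a slab is turned into a region index), precomputes a 32-bit
  reciprocal once per divisor and then replaces each division with a multiply
  and a shift.  The names fast_div_t, fast_div_init() and fast_div() are
  hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint32_t d;     /* divisor, kept only to check preconditions */
    uint32_t magic; /* ceil(2^32 / d) */
} fast_div_t;

/* Precompute the rounded-up reciprocal of d, scaled by 2^32. */
static void fast_div_init(fast_div_t *info, uint32_t d) {
    assert(d >= 2); /* d == 1 would need magic == 2^32, which does not fit */
    const uint64_t two_to_32 = (uint64_t)1 << 32;
    uint64_t magic = two_to_32 / d;
    if (two_to_32 % d != 0) {
        magic++; /* round up */
    }
    info->d = d;
    info->magic = (uint32_t)magic;
}

/* Compute n / d with a multiply and a shift; n must be a multiple of d. */
static uint32_t fast_div(const fast_div_t *info, uint32_t n) {
    assert(n % info->d == 0); /* the trick is only exact for exact division */
    return (uint32_t)(((uint64_t)n * info->magic) >> 32);
}
```

  Why it works: with magic = 2^32/d + e, where 0 <= e < 1, the product
  n * magic equals (n/d) * 2^32 + n*e, and n*e < 2^32 because n < 2^32, so the
  shift yields exactly n/d.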

  Bug fixes (most of the issues are only relevant to jemalloc 5.0):
  - Fix deadlock with multithreaded fork on OS X.  (@davidtgoldblatt)
  - Validate returned file descriptor before use.  (@zonyitoo)
  - Fix a few background thread initialization and shutdown issues.  (@interwq)
  - Fix an extent coalesce + decay race by taking both coalescing extents off
    the LRU list.  (@interwq)
  - Fix a potentially unbounded increase during decay, caused by one thread
    continually stashing memory to purge while other threads generate new
    pages.  The number of pages to purge is checked to prevent this.
    (@interwq)
  - Fix a FreeBSD bootstrap assertion.  (@strejda, @interwq)
  - Handle 32-bit mutex counters.  (@rkmisra)
  - Fix an indexing bug when creating background threads.  (@davidtgoldblatt,
    @binliu19)
  - Fix arguments passed to extent_init.  (@yuleniwo, @interwq)
  - Fix addresses used for ordering mutexes.  (@rkmisra)
  - Fix abort_conf processing during bootstrap.  (@interwq)
  - Fix include path order for out-of-tree builds.  (@cmuellner)

  Incompatible changes:
  - Remove --disable-thp.  (@interwq)
  - Remove mallctl interfaces:
    + config.thp  (@interwq)

  Documentation:
  - Add TUNING.md.  (@interwq, @davidtgoldblatt, @djwatson)

* 5.0.1 (July 1, 2017)

  This bugfix release fixes several issues, most of which are obscure enough
  that typical applications are not impacted.

  Bug fixes:
  - Update decay->nunpurged before purging, in order to avoid potential update
    races and subsequent incorrect purging volume.  (@interwq)
  - Only abort on dlsym(3) error if the failure impacts an enabled feature (lazy
    locking and/or background threads).  This mitigates an initialization
    failure bug for which we still do not have a clear reproduction test case.
    (@interwq)
  - Modify tsd management so that it neither crashes nor leaks if a thread's
    only allocation activity is to call free() after TLS destructors have been
    executed.  This behavior was observed when operating with GNU libc, and is
    unlikely to be an issue with other libc implementations.  (@interwq)
  - Mask signals during background thread creation.  This prevents signals from
    being inadvertently delivered to background threads.  (@jasone,
    @davidtgoldblatt, @interwq)
  - Avoid inactivity checks within background threads, in order to prevent
    recursive mutex acquisition.  (@interwq)
  - Fix extent_grow_retained() to use the specified hooks when the
    arena.<i>.extent_hooks mallctl is used to override the default hooks.
    (@interwq)
  - Add missing reentrancy support for custom extent hooks which allocate.
    (@interwq)
  - Post-fork(2), re-initialize the list of tcaches associated with each arena
    to contain no tcaches except the forking thread's.  (@interwq)
  - Add missing post-fork(2) mutex reinitialization for extent_grow_mtx.  This
    fixes potential deadlocks after fork(2).  (@interwq)
  - Enforce minimum autoconf version (currently 2.68), since 2.63 is known to
    generate corrupt configure scripts.  (@jasone)
  - Ensure that the configured page size (--with-lg-page) is no larger than the
    configured huge page size (--with-lg-hugepage).  (@jasone)

* 5.0.0 (June 13, 2017)

  Unlike all previous jemalloc releases, this release does not use naturally
  aligned "chunks" for virtual memory management, and instead uses page-aligned
  "extents".  This change has few externally visible effects, but the internal
  impacts are... extensive.  Many other internal changes combine to make this
  the most cohesively designed version of jemalloc so far, with ample
  opportunity for further enhancements.

  Continuous integration is now an integral aspect of development thanks to the
  efforts of @davidtgoldblatt, and the dev branch tends to remain reasonably
  stable on the tested platforms (Linux, FreeBSD, macOS, and Windows).  As a
  side effect, the official release frequency may decrease over time.

  New features:
  - Implement optional per-CPU arena support; threads choose which arena to use
    based on the current CPU rather than on fixed thread-->arena associations.
    (@interwq)
  - Implement two-phase decay of unused dirty pages.  Pages transition from
    dirty-->muzzy-->clean, where the first phase transition relies on
    madvise(... MADV_FREE) semantics, and the second phase transition discards
    pages such that they are replaced with demand-zeroed pages on next access.
    (@jasone)
  - Increase decay time resolution from seconds to milliseconds.  (@jasone)
  - Implement opt-in per-CPU background threads, and use them for asynchronous
    decay-driven unused dirty page purging.  (@interwq)
  - Add mutex profiling, which collects a variety of statistics useful for
    diagnosing overhead/contention issues.  (@interwq)
  - Add C++ new/delete operator bindings.  (@djwatson)
  - Support destruction of manually created arenas, such that all data and
    metadata are discarded.  Add MALLCTL_ARENAS_DESTROYED for accessing merged
    stats associated with destroyed arenas.  (@jasone)
  - Add MALLCTL_ARENAS_ALL as a fixed index for use in accessing
    merged/destroyed arena statistics via mallctl.  (@jasone)
  - Add opt.abort_conf to optionally abort if invalid configuration options are
    detected during initialization.  (@interwq)
  - Add opt.stats_print_opts, so that e.g. JSON output can be selected for the
    stats dumped during exit if opt.stats_print is true.  (@jasone)
  - Add --with-version=VERSION for use when embedding jemalloc into another
    project's git repository.  (@jasone)
  - Add --disable-thp to support cross compiling.  (@jasone)
  - Add --with-lg-hugepage to support cross compiling.  (@jasone)
  - Add mallctl interfaces (various authors):
    + background_thread
    + opt.abort_conf
    + opt.retain
    + opt.percpu_arena
    + opt.background_thread
    + opt.{dirty,muzzy}_decay_ms
    + opt.stats_print_opts
    + arena.<i>.initialized
    + arena.<i>.destroy
    + arena.<i>.{dirty,muzzy}_decay_ms
    + arena.<i>.extent_hooks
    + arenas.{dirty,muzzy}_decay_ms
    + arenas.bin.<i>.slab_size
    + arenas.nlextents
    + arenas.lextent.<i>.size
    + arenas.create
    + stats.background_thread.{num_threads,num_runs,run_interval}
    + stats.mutexes.{ctl,background_thread,prof,reset}.
      {num_ops,num_spin_acq,num_wait,max_wait_time,total_wait_time,max_num_thds,
      num_owner_switch}
    + stats.arenas.<i>.{dirty,muzzy}_decay_ms
    + stats.arenas.<i>.uptime
    + stats.arenas.<i>.{pmuzzy,base,internal,resident}
    + stats.arenas.<i>.{dirty,muzzy}_{npurge,nmadvise,purged}
    + stats.arenas.<i>.bins.<j>.{nslabs,reslabs,curslabs}
    + stats.arenas.<i>.bins.<j>.mutex.
      {num_ops,num_spin_acq,num_wait,max_wait_time,total_wait_time,max_num_thds,
      num_owner_switch}
    + stats.arenas.<i>.lextents.<j>.{nmalloc,ndalloc,nrequests,curlextents}
    + stats.arenas.<i>.mutexes.{large,extent_avail,extents_dirty,extents_muzzy,
      extents_retained,decay_dirty,decay_muzzy,base,tcache_list}.
      {num_ops,num_spin_acq,num_wait,max_wait_time,total_wait_time,max_num_thds,
      num_owner_switch}
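
  The opt.* settings listed above are normally chosen at startup.  One way to
  do that is the global malloc_conf string, which jemalloc parses during
  initialization.  This sketch assumes an unprefixed 5.x build (a build
  configured with --with-jemalloc-prefix uses a prefixed symbol name instead);
  the values are illustrative rather than recommendations, and with
  abort_conf:true an invalid option aborts the program at startup.

```c
/*
 * Option string read by jemalloc during initialization.  The keys are the
 * names exposed as opt.* mallctls in 5.0.0; the values are examples only.
 */
const char *malloc_conf =
    "abort_conf:true,"        /* abort rather than continue on invalid options */
    "background_thread:true," /* purge from background threads */
    "dirty_decay_ms:10000,"   /* approximate decay time for dirty pages */
    "muzzy_decay_ms:10000";   /* approximate decay time for muzzy pages */
```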

  Portability improvements:
  - Improve reentrant allocation support, such that deadlock is less likely if
    e.g. a system library call in turn allocates memory.  (@davidtgoldblatt,
    @interwq)
  - Support static linking of jemalloc with glibc.  (@djwatson)

  Optimizations and refactors:
  - Organize virtual memory as "extents" of virtual memory pages, rather than as
    naturally aligned "chunks", and store all metadata in arbitrarily distant
    locations.  This reduces virtual memory external fragmentation, and will
    interact better with huge pages (not yet explicitly supported).  (@jasone)
  - Fold large and huge size classes together; only small and large size classes
    remain.  (@jasone)
  - Unify the allocation paths, and merge most fast-path branching decisions.
    (@davidtgoldblatt, @interwq)
  - Embed the per-thread automatic tcache into thread-specific data, which
    reduces conditional branches and dereferences.  Also reorganize tcache to
    increase fast-path data locality.  (@interwq)
  - Rewrite atomics to closely model the C11 API, convert various
    synchronization from mutex-based to atomic, and use the explicit memory
    ordering control to resolve various hypothetical races without increasing
    synchronization overhead.  (@davidtgoldblatt)
  - Extensively optimize rtree via various methods:
    + Add multiple layers of rtree lookup caching, since rtree lookups are now
      part of fast-path deallocation.  (@interwq)
    + Determine rtree layout at compile time.  (@jasone)
    + Make the tree shallower for common configurations.  (@jasone)
    + Embed the root node in the top-level rtree data structure, thus avoiding
      one level of indirection.  (@jasone)
    + Further specialize leaf elements as compared to internal node elements,
      and directly embed extent metadata needed for fast-path deallocation.
      (@jasone)
    + Ignore leading always-zero address bits (architecture-specific).
      (@jasone)
  - Reorganize headers (ongoing work) to make them hermetic, and disentangle
    various module dependencies.  (@davidtgoldblatt)
  - Convert various internal data structures such as size class metadata from
    boot-time-initialized to compile-time-initialized.  Propagate resulting data
    structure simplifications, such as making arena metadata fixed-size.
    (@jasone)
  - Simplify size class lookups when constrained to size classes that are
    multiples of the page size.  This speeds lookups, but the primary benefit is
    complexity reduction in code that was the source of numerous regressions.
    (@jasone)
  - Lock individual extents when possible for localized extent operations,
    rather than relying on a top-level arena lock.  (@davidtgoldblatt, @jasone)
  - Use a first-fit layout policy instead of best-fit, in order to improve
    packing.  (@jasone)
  - If munmap(2) is not in use, use an exponential series to grow each arena's
    virtual memory, so that the number of disjoint virtual memory mappings
    remains low.  (@jasone)
  - Implement per-arena base allocators, so that arenas never share any virtual
    memory pages.  (@jasone)
  - Automatically generate private symbol name mangling macros.  (@jasone)
426b7eaed25SJason Evans
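  A rough illustration of why the exponential growth series above keeps the
  number of disjoint mappings low: the C sketch below assumes each new mapping
  is twice the size of the previous one, starting from a hypothetical 2 MiB.
  jemalloc's actual growth schedule may differ, and grow_mappings_needed() and
  linear_mappings_needed() are hypothetical helpers, not jemalloc functions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: number of mappings needed to reach `target` bytes of
 * arena virtual memory when each mapping doubles the previous one, starting
 * at `first` bytes (assumed doubling series, not jemalloc's exact schedule). */
static unsigned
grow_mappings_needed(uint64_t first, uint64_t target)
{
    assert(first > 0);
    uint64_t total = 0, next = first;
    unsigned n = 0;
    while (total < target) {
        total += next;  /* map the next region */
        next *= 2;      /* exponential series: double the next request */
        n++;
    }
    return n;
}

/* Hypothetical comparison: same target, fixed-size growth steps. */
static uint64_t
linear_mappings_needed(uint64_t step, uint64_t target)
{
    assert(step > 0);
    return (target + step - 1) / step;
}
```

  Under these assumptions, reaching 1 GiB takes 10 mappings when doubling,
  versus 512 fixed 2 MiB mappings.
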
  Incompatible changes:
  - Replace chunk hooks with an expanded/normalized set of extent hooks.
    (@jasone)
  - Remove ratio-based purging.  (@jasone)
  - Remove --disable-tcache.  (@jasone)
  - Remove --disable-tls.  (@jasone)
  - Remove --enable-ivsalloc.  (@jasone)
  - Remove --with-lg-size-class-group.  (@jasone)
  - Remove --with-lg-tiny-min.  (@jasone)
  - Remove --disable-cc-silence.  (@jasone)
  - Remove --enable-code-coverage.  (@jasone)
  - Remove --disable-munmap (replaced by opt.retain).  (@jasone)
  - Remove Valgrind support.  (@jasone)
  - Remove quarantine support.  (@jasone)
  - Remove redzone support.  (@jasone)
  - Remove mallctl interfaces (various authors):
    + config.munmap
    + config.tcache
    + config.tls
    + config.valgrind
    + opt.lg_chunk
    + opt.purge
    + opt.lg_dirty_mult
    + opt.decay_time
    + opt.quarantine
    + opt.redzone
    + opt.thp
    + arena.<i>.lg_dirty_mult
    + arena.<i>.decay_time
    + arena.<i>.chunk_hooks
    + arenas.initialized
    + arenas.lg_dirty_mult
    + arenas.decay_time
    + arenas.bin.<i>.run_size
    + arenas.nlruns
    + arenas.lrun.<i>.size
    + arenas.nhchunks
    + arenas.hchunk.<i>.size
    + arenas.extend
    + stats.cactive
    + stats.arenas.<i>.lg_dirty_mult
    + stats.arenas.<i>.decay_time
    + stats.arenas.<i>.metadata.{mapped,allocated}
    + stats.arenas.<i>.{npurge,nmadvise,purged}
    + stats.arenas.<i>.huge.{allocated,nmalloc,ndalloc,nrequests}
    + stats.arenas.<i>.bins.<j>.{nruns,reruns,curruns}
    + stats.arenas.<i>.lruns.<j>.{nmalloc,ndalloc,nrequests,curruns}
    + stats.arenas.<i>.hchunks.<j>.{nmalloc,ndalloc,nrequests,curhchunks}

  Bug fixes:
  - Improve interval-based profile dump triggering to dump only one profile when
    a single allocation's size exceeds the interval.  (@jasone)
  - Use prefixed function names (as controlled by --with-jemalloc-prefix) when
    pruning backtrace frames in jeprof.  (@jasone)

* 4.5.0 (February 28, 2017)

  This is the first release to benefit from much broader continuous integration
  testing, thanks to @davidtgoldblatt.  Had we had this testing infrastructure
  in place for prior releases, it would have caught all of the most serious
  regressions fixed by this release.

  New features:
  - Add --disable-thp and the opt.thp mallctl to provide opt-out mechanisms for
    transparent huge page integration.  (@jasone)
  - Update zone allocator integration to work with macOS 10.12.  (@glandium)
  - Restructure *CFLAGS configuration, so that CFLAGS behaves typically, and
    EXTRA_CFLAGS provides a way to specify e.g. -Werror during building, but not
    during configuration.  (@jasone, @ronawho)

  Bug fixes:
  - Fix DSS (sbrk(2)-based) allocation.  This regression was first released in
    4.3.0.  (@jasone)
  - Handle race in per size class utilization computation.  This functionality
    was first released in 4.0.0.  (@interwq)
  - Fix lock order reversal during gdump.  (@jasone)
  - Fix/refactor tcache synchronization.  This regression was first released in
    4.0.0.  (@jasone)
  - Fix various JSON-formatted malloc_stats_print() bugs.  This functionality
    was first released in 4.3.0.  (@jasone)
  - Fix huge-aligned allocation.  This regression was first released in 4.4.0.
    (@jasone)
  - When transparent huge page integration is enabled, detect what state pages
    start in according to the kernel's current operating mode, and only convert
    arena chunks to non-huge during purging if that is not their initial state.
    This functionality was first released in 4.4.0.  (@jasone)
  - Fix lg_chunk clamping for the --enable-cache-oblivious --disable-fill case.
    This regression was first released in 4.0.0.  (@jasone, @428desmo)
  - Properly detect sparc64 when building for Linux.  (@glaubitz)

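  The kernel-mode detection described above can be sketched in C.  On Linux,
  /sys/kernel/mm/transparent_hugepage/enabled lists the available modes with
  the active one in brackets, e.g. "always [madvise] never".  The sketch parses
  that text (reading the file is left out) and treats only "always" as meaning
  that pages start out huge.  thp_mode_parse() and thp_pages_start_huge() are
  hypothetical names, and jemalloc's real logic may be more involved.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum { THP_ALWAYS, THP_MADVISE, THP_NEVER, THP_UNKNOWN } thp_mode_t;

/* Hypothetical parser for the contents of
 * /sys/kernel/mm/transparent_hugepage/enabled: the active mode is the one
 * enclosed in brackets. */
static thp_mode_t
thp_mode_parse(const char *buf)
{
    static const struct {
        const char *token;
        thp_mode_t mode;
    } modes[] = {
        { "[always]", THP_ALWAYS },
        { "[madvise]", THP_MADVISE },
        { "[never]", THP_NEVER },
    };
    assert(buf != NULL);
    for (size_t i = 0; i < sizeof(modes) / sizeof(modes[0]); i++) {
        if (strstr(buf, modes[i].token) != NULL)
            return modes[i].mode;
    }
    return THP_UNKNOWN;
}

/* Assumption: only in "always" mode do fresh pages start out huge, so only
 * then does purging need to convert them to non-huge. */
static int
thp_pages_start_huge(thp_mode_t mode)
{
    return mode == THP_ALWAYS;
}
```
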
* 4.4.0 (December 3, 2016)

  New features:
  - Add configure support for *-*-linux-android.  (@cferris1000, @jasone)
  - Add the --disable-syscall configure option, for use on systems that place
    security-motivated limitations on syscall(2).  (@jasone)
  - Add support for Debian GNU/kFreeBSD.  (@thesam)

  Optimizations:
  - Add extent serial numbers and use them where appropriate as a sort key that
    is higher priority than address, so that the allocation policy prefers older
    extents.  This tends to improve locality (decrease fragmentation) when
    memory grows downward.  (@jasone)
  - Refactor madvise(2) configuration so that MADV_FREE is detected and utilized
    on Linux 4.5 and newer.  (@jasone)
  - Mark partially purged arena chunks as non-huge-page.  This improves
    interaction with Linux's transparent huge page functionality.  (@jasone)

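  Using a serial number as a sort key with higher priority than address, as
  described above, amounts to a two-level comparison.  A C sketch suitable for
  qsort(), assuming a simplified, hypothetical extent_key_t that holds only an
  integer serial number (smaller means older) and a base address:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for an extent's ordering key. */
typedef struct {
    uint64_t sn;      /* serial number: smaller means created earlier */
    uintptr_t addr;   /* base address */
} extent_key_t;

/* Order by serial number first and address second, so that searching this
 * ordering from the front prefers the oldest extents. */
static int
extent_sn_addr_comp(const void *a, const void *b)
{
    const extent_key_t *ea = a, *eb = b;
    assert(ea != NULL && eb != NULL);
    int ret = (ea->sn > eb->sn) - (ea->sn < eb->sn);
    if (ret == 0)
        ret = (ea->addr > eb->addr) - (ea->addr < eb->addr);
    return ret;
}
```

  Sorting { (2, 0x1000), (1, 0x9000), (1, 0x3000), (3, 0x2000) } with this
  comparator puts both serial-number-1 extents first, even though the
  serial-number-2 extent has the lowest address.
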
  Bug fixes:
  - Fix size class computations for edge conditions involving extremely large
    allocations.  This regression was first released in 4.0.0.  (@jasone,
    @ingvarha)
  - Remove overly restrictive assertions related to the cactive statistic.  This
    regression was first released in 4.1.0.  (@jasone)
  - Implement a more reliable detection scheme for os_unfair_lock on macOS.
    (@jszakmeister)

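  For context on what these size class computations do, the C sketch below
  rounds a request up to its size class, assuming the default spacing of four
  size classes per size doubling described under 4.0.0, and a hypothetical
  maximum class of 2^62 bytes on 64-bit systems (2^30 on 32-bit), which stays
  below PTRDIFF_MAX.  Checking the maximum first keeps the intermediate
  (size << 1) from overflowing for extremely large requests.  The macro names,
  the minimum class, and size_class_round() are assumptions, not jemalloc's
  code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical parameters: four classes per doubling (2^2), a 64-byte
 * smallest spaced class, and a largest class well below PTRDIFF_MAX. */
#define LG_CLASSES_PER_DOUBLING 2
#define MIN_CLASS ((size_t)64)
#define MAX_CLASS ((size_t)1 << (sizeof(size_t) * 8 - 2))

/* Index of the highest set bit. */
static unsigned
lg_floor(size_t x)
{
    assert(x > 0);
    unsigned lg = 0;
    while (x >>= 1)
        lg++;
    return lg;
}

/* Round size up to its size class, or return 0 if no class can hold it, so
 * the caller can report an error instead of computing garbage. */
static size_t
size_class_round(size_t size)
{
    if (size > MAX_CLASS)
        return 0;
    if (size <= MIN_CLASS)
        return MIN_CLASS;   /* simplification for small sizes */
    /* x = ceil(log2(size)); size <= MAX_CLASS, so (size << 1) cannot overflow. */
    unsigned x = lg_floor((size << 1) - 1);
    /* Classes in (2^(x-1), 2^x] are spaced 2^(x-1-LG_CLASSES_PER_DOUBLING). */
    size_t delta = (size_t)1 << (x - LG_CLASSES_PER_DOUBLING - 1);
    return (size + delta - 1) & ~(delta - 1);
}
```

  Under these assumptions, a 1025-byte request rounds to 1280 bytes, wasting
  255 bytes (just under 20%), and any request above the maximum class yields 0.
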
* 4.3.1 (November 7, 2016)

  Bug fixes:
  - Fix a severe virtual memory leak.  This regression was first released in
    4.3.0.  (@interwq, @jasone)
  - Refactor atomic and prng APIs to restore support for 32-bit platforms that
    use pre-C11 toolchains, e.g. FreeBSD's mips.  (@jasone)

* 4.3.0 (November 4, 2016)

  This is the first release that passes the test suite for multiple Windows
  configurations, thanks in large part to @glandium setting up continuous
  integration via AppVeyor (and Travis CI for Linux and OS X).

  New features:
  - Add "J" (JSON) support to malloc_stats_print().  (@jasone)
  - Add Cray compiler support.  (@ronawho)

  Optimizations:
  - Add/use adaptive spinning for bootstrapping and radix tree node
    initialization.  (@jasone)

  Bug fixes:
  - Fix large allocation to search starting in the optimal size class heap,
    which can substantially reduce virtual memory churn and fragmentation.  This
    regression was first released in 4.0.0.  (@mjp41, @jasone)
  - Fix stats.arenas.<i>.nthreads accounting.  (@interwq)
  - Fix and simplify decay-based purging.  (@jasone)
  - Make DSS (sbrk(2)-related) operations lockless, which resolves potential
    deadlocks during thread exit.  (@jasone)
  - Fix over-sized allocation of radix tree leaf nodes.  (@mjp41, @ogaun,
    @jasone)
  - Fix over-sized allocation of arena_t (plus associated stats) data
    structures.  (@jasone, @interwq)
  - Fix EXTRA_CFLAGS to not affect configuration.  (@jasone)
  - Fix a Valgrind integration bug.  (@ronawho)
  - Disallow 0x5a junk filling when running in Valgrind.  (@jasone)
  - Fix a file descriptor leak on Linux.  This regression was first released in
    4.2.0.  (@vsarunas, @jasone)
  - Fix static linking of jemalloc with glibc.  (@djwatson)
  - Use syscall(2) rather than {open,read,close}(2) during boot on Linux.  This
    works around other libraries' system call wrappers performing reentrant
    allocation.  (@kspinka, @Whissi, @jasone)
  - Fix OS X default zone replacement to work with OS X 10.12.  (@glandium,
    @jasone)
  - Fix cached memory management to avoid needless commit/decommit operations
    during purging, which resolves permanent virtual memory map fragmentation
    issues on Windows.  (@mjp41, @jasone)
  - Fix TSD fetches to avoid (recursive) allocation.  This is relevant to
    non-TLS and Windows configurations.  (@jasone)
  - Fix malloc_conf overriding to work on Windows.  (@jasone)
  - Forcibly disable lazy-lock on Windows (was forcibly *enabled*).  (@jasone)

* 4.2.1 (June 8, 2016)

  Bug fixes:
  - Fix bootstrapping issues for configurations that require allocation during
    tsd initialization (e.g. --disable-tls).  (@cferris1000, @jasone)
  - Fix gettimeofday() version of nstime_update().  (@ronawho)
  - Fix Valgrind regressions in calloc() and chunk_alloc_wrapper().  (@ronawho)
  - Fix potential VM map fragmentation regression.  (@jasone)
  - Fix opt_zero-triggered in-place huge reallocation zeroing.  (@jasone)
  - Fix heap profiling context leaks in reallocation edge cases.  (@jasone)

* 4.2.0 (May 12, 2016)

  New features:
  - Add the arena.<i>.reset mallctl, which makes it possible to discard all of
    an arena's allocations in a single operation.  (@jasone)
  - Add the stats.retained and stats.arenas.<i>.retained statistics.  (@jasone)
  - Add the --with-version configure option.  (@jasone)
  - Support --with-lg-page values larger than actual page size.  (@jasone)

  Optimizations:
  - Use pairing heaps rather than red-black trees for various hot data
    structures.  (@djwatson, @jasone)
  - Streamline fast paths of rtree operations.  (@jasone)
  - Optimize the fast paths of calloc() and [m,d,sd]allocx().  (@jasone)
  - Decommit unused virtual memory if the OS does not overcommit.  (@jasone)
  - Specify MAP_NORESERVE on Linux if [heuristic] overcommit is active, in order
    to avoid unfortunate interactions during fork(2).  (@jasone)

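  As an illustration of the data structure named above, here is a minimal C
  sketch of a min pairing heap with two-pass pairing on delete-min.  It is a
  simplified, int-keyed version (ph_node_t and the ph_*() functions are
  hypothetical names), not jemalloc's implementation.  Insertion and melding
  are O(1), and delete-min is amortized O(log n).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified node; a real allocator would embed the link
 * fields in the tracked object (intrusive) and compare richer keys. */
typedef struct ph_node_s {
    int key;
    struct ph_node_s *child;    /* leftmost child */
    struct ph_node_s *sibling;  /* next sibling to the right */
} ph_node_t;

/* Meld two heap roots: the root with the larger key becomes the leftmost
 * child of the other. */
static ph_node_t *
ph_meld(ph_node_t *a, ph_node_t *b)
{
    if (a == NULL)
        return b;
    if (b == NULL)
        return a;
    if (b->key < a->key) {
        ph_node_t *t = a;
        a = b;
        b = t;
    }
    b->sibling = a->child;
    a->child = b;
    return a;
}

static ph_node_t *
ph_insert(ph_node_t *root, ph_node_t *n)
{
    assert(n != NULL);
    n->child = NULL;
    n->sibling = NULL;
    return ph_meld(root, n);
}

/* Two-pass pairing: meld children in pairs from left to right, then meld the
 * pair results from right to left (as the recursion unwinds). */
static ph_node_t *
ph_merge_pairs(ph_node_t *first)
{
    if (first == NULL || first->sibling == NULL)
        return first;
    ph_node_t *a = first;
    ph_node_t *b = first->sibling;
    ph_node_t *rest = b->sibling;
    a->sibling = NULL;
    b->sibling = NULL;
    return ph_meld(ph_meld(a, b), ph_merge_pairs(rest));
}

/* Detach and return the minimum node, or NULL if the heap is empty. */
static ph_node_t *
ph_remove_min(ph_node_t **root)
{
    ph_node_t *min = *root;
    if (min != NULL) {
        *root = ph_merge_pairs(min->child);
        min->child = NULL;
    }
    return min;
}
```

  This sketch recurses once per pair of children in ph_merge_pairs(); a
  production version would likely use an iterative loop instead.
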
  Bug fixes:
  - Fix chunk accounting related to triggering gdump profiles.  (@jasone)
  - Link against librt for clock_gettime(2) if glibc < 2.17.  (@jasone)
  - Scale leak report summary according to sampling probability.  (@jasone)

* 4.1.1 (May 3, 2016)

  This bugfix release resolves a variety of mostly minor issues, though the
  bitmap fix is critical for 64-bit Windows.

  Bug fixes:
  - Fix the linear scan version of bitmap_sfu() to shift by the proper amount
    even when sizeof(long) is not the same as sizeof(void *), as on 64-bit
    Windows.  (@jasone)
  - Fix hashing functions to avoid unaligned memory accesses (and resulting
    crashes).  This is relevant at least to some ARM-based platforms.
    (@rkmisra)
  - Fix fork()-related lock rank ordering reversals.  These reversals were
    unlikely to cause deadlocks in practice except when heap profiling was
    enabled and active.  (@jasone)
  - Fix various chunk leaks in OOM code paths.  (@jasone)
  - Fix malloc_stats_print() to print opt.narenas correctly.  (@jasone)
  - Fix MSVC-specific build/test issues.  (@rustyx, @yuslepukhin)
  - Fix a variety of test failures that were due to test fragility rather than
    core bugs.  (@jasone)

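  Avoiding unaligned accesses in a hash function generally comes down to
  reading multi-byte blocks through memcpy() rather than through a cast
  pointer, which is undefined behavior when misaligned and can fault on some
  ARM cores.  The C sketch below applies this to a MurmurHash3 (x86_32) style
  block loop; it assumes a MurmurHash3-like structure for illustration and is
  not jemalloc's own code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static uint32_t
rotl32(uint32_t x, int r)
{
    return (x << r) | (x >> (32 - r));
}

/* Alignment-safe 32-bit load: memcpy() is valid for any address and is
 * typically compiled to a single load where the target allows it. */
static uint32_t
load_u32(const uint8_t *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof(v));
    return v;
}

/* MurmurHash3 x86_32-style hash; every block read goes through load_u32().
 * Illustrative sketch only. */
static uint32_t
murmur3_32(const void *key, size_t len, uint32_t seed)
{
    const uint8_t *data = key;
    const uint32_t c1 = 0xcc9e2d51, c2 = 0x1b873593;
    uint32_t h = seed, k;
    size_t nblocks = len / 4;

    assert(key != NULL || len == 0);
    for (size_t i = 0; i < nblocks; i++) {
        k = load_u32(data + 4 * i);
        k *= c1;
        k = rotl32(k, 15);
        k *= c2;
        h ^= k;
        h = rotl32(h, 13);
        h = h * 5 + 0xe6546b64;
    }

    /* Tail bytes are read one at a time, so alignment does not matter. */
    const uint8_t *tail = data + 4 * nblocks;
    k = 0;
    switch (len & 3) {
    case 3:
        k ^= (uint32_t)tail[2] << 16;
        /* fall through */
    case 2:
        k ^= (uint32_t)tail[1] << 8;
        /* fall through */
    case 1:
        k ^= tail[0];
        k *= c1;
        k = rotl32(k, 15);
        k *= c2;
        h ^= k;
    }

    /* Finalization mix. */
    h ^= (uint32_t)len;
    h ^= h >> 16;
    h *= 0x85ebca6b;
    h ^= h >> 13;
    h *= 0xc2b2ae35;
    h ^= h >> 16;
    return h;
}
```

  Hashing the same bytes from an aligned and a misaligned address yields the
  same value, with no misaligned 32-bit dereference on either path.
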
* 4.1.0 (February 28, 2016)

  This release is primarily about optimizations, but it also incorporates a lot
  of portability-motivated refactoring and enhancements.  Many people worked on
  this release: even omitting minor changes (see git revision history) and the
  people who reported and diagnosed issues, so much of the work was contributed
  that starting with this release, changes are annotated with author credits to
  help reflect the collaborative effort involved.

  New features:
  - Implement decay-based unused dirty page purging, a major optimization with
    mallctl API impact.  This is an alternative to the existing ratio-based
    unused dirty page purging, and is intended to eventually become the sole
    purging mechanism.  New mallctls:
    + opt.purge
    + opt.decay_time
    + arena.<i>.decay
    + arena.<i>.decay_time
    + arenas.decay_time
    + stats.arenas.<i>.decay_time
    (@jasone, @cevans87)
  - Add --with-malloc-conf, which makes it possible to embed a default
    options string during configuration.  This was motivated by the desire to
    specify --with-malloc-conf=purge:decay, since the default must remain
    purge:ratio until the 5.0.0 release.  (@jasone)
  - Add MS Visual Studio 2015 support.  (@rustyx, @yuslepukhin)
  - Make *allocx() size class overflow behavior defined.  The maximum
    size class is now less than PTRDIFF_MAX to protect applications against
    numerical overflow, and all allocation functions are guaranteed to indicate
    errors rather than potentially crashing if the request size exceeds the
    maximum size class.  (@jasone)
  - jeprof:
    + Add raw heap profile support.  (@jasone)
    + Add --retain and --exclude for backtrace symbol filtering.  (@jasone)

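  The idea behind decay-based purging can be sketched as a curve over time:
  dirty pages that became unused recently may mostly stay, and the allowance
  falls smoothly to zero once they are decay_time seconds old.  The C sketch
  below assumes a cubic smoothstep curve and a single batch of pages;
  jemalloc's real implementation tracks pages across time epochs and its curve
  may differ.  decay_allowed_dirty() is a hypothetical helper.

```c
#include <assert.h>

/* Cubic smoothstep: 0 at x <= 0, 1 at x >= 1, with zero slope at both ends.
 * The cubic form is an assumption of this sketch. */
static double
smoothstep(double x)
{
    if (x <= 0.0)
        return 0.0;
    if (x >= 1.0)
        return 1.0;
    return x * x * (3.0 - 2.0 * x);
}

/* Hypothetical decay rule: how many of `ndirty` pages that became unused
 * `age` seconds ago may remain unpurged, given `decay_time` seconds. */
static double
decay_allowed_dirty(double ndirty, double age, double decay_time)
{
    assert(decay_time > 0.0);
    return ndirty * (1.0 - smoothstep(age / decay_time));
}
```

  With decay_time set to 10 seconds, a batch of 1000 dirty pages may keep all
  1000 pages at age 0, 500 at age 5, and none at age 10 or later.
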
  Optimizations:
  - Optimize the fast path to combine various bootstrapping and configuration
    checks and execute more streamlined code in the common case.  (@interwq)
  - Use linear scan for small bitmaps (used for small object tracking).  In
    addition to speeding up bitmap operations on 64-bit systems, this reduces
    allocator metadata overhead by approximately 0.2%.  (@djwatson)
  - Separate arena_avail trees, which substantially speeds up run tree
    operations.  (@djwatson)
  - Use memoization (boot-time-computed table) for run quantization.  Separate
    arena_avail trees reduced the importance of this optimization.  (@jasone)
  - Attempt mmap-based in-place huge reallocation.  This can dramatically speed
    up incremental huge reallocation.  (@jasone)

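  A linear-scan "set first unset" over a small bitmap might look like the C
  sketch below.  It uses fixed 64-bit groups and __builtin_ctzll() (a GCC/Clang
  builtin), so the shift and index arithmetic do not depend on sizeof(long),
  the class of problem behind the 4.1.1 bitmap_sfu() fix for 64-bit Windows.
  The type, macro, and bitmap_sfu_linear() are hypothetical, not jemalloc's
  bitmap API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: bit i set means region i is in use.  Groups are fixed
 * at 64 bits instead of being sized by sizeof(long). */
typedef uint64_t bitmap_group_t;
#define BITMAP_GROUP_NBITS 64

/* Set the lowest clear bit among the first nbits bits and return its index,
 * or return nbits if all of them are already set. */
static size_t
bitmap_sfu_linear(bitmap_group_t *groups, size_t nbits)
{
    assert(groups != NULL);
    size_t ngroups = (nbits + BITMAP_GROUP_NBITS - 1) / BITMAP_GROUP_NBITS;
    for (size_t i = 0; i < ngroups; i++) {
        bitmap_group_t unset = ~groups[i];
        if (unset == 0)
            continue;   /* group is full; keep scanning linearly */
        /* unsigned long long is at least 64 bits on every platform, unlike
         * long (32 bits on 64-bit Windows). */
        unsigned bit = (unsigned)__builtin_ctzll(unset);
        size_t index = i * BITMAP_GROUP_NBITS + bit;
        if (index >= nbits)
            return nbits;   /* only padding bits past nbits were clear */
        groups[i] |= (bitmap_group_t)1 << bit;
        return index;
    }
    return nbits;
}
```

  Returning nbits when no bit is free lets the caller detect a full bitmap;
  clear padding bits past nbits in the last group are deliberately ignored.
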
  Incompatible changes:
  - Make opt.narenas unsigned rather than size_t.  (@jasone)

  Bug fixes:
  - Fix stats.cactive accounting regression.  (@rustyx, @jasone)
  - Handle unaligned keys in hash().  This caused problems for some ARM systems.
    (@jasone, @cferris1000)
  - Refactor arenas array.  In addition to fixing a fork-related deadlock, this
    makes arena lookups faster and simpler.  (@jasone)
  - Move retained memory allocation out of the default chunk allocation
    function, to a location that gets executed even if the application installs
    a custom chunk allocation function.  This resolves a virtual memory leak.
    (@buchgr)
  - Fix a potential tsd cleanup leak.  (@cferris1000, @jasone)
  - Fix run quantization.  In practice this bug had no impact unless
    applications requested memory with alignment exceeding one page.
    (@jasone, @djwatson)
  - Fix LinuxThreads-specific bootstrapping deadlock.  (Cosmin Paraschiv)
  - jeprof:
    + Don't discard curl options if timeout is not defined.  (@djwatson)
    + Detect failed profile fetches.  (@djwatson)
  - Fix stats.arenas.<i>.{dss,lg_dirty_mult,decay_time,pactive,pdirty} for
    the --disable-stats case.  (@jasone)

* 4.0.4 (October 24, 2015)

  This bugfix release fixes another xallocx() regression.  No other regressions
  have come to light in over a month, so this is likely a good starting point
  for people who prefer to wait for "dot one" releases with all the major issues
  shaken out.

  Bug fixes:
  - Fix xallocx(..., MALLOCX_ZERO) to zero the last full trailing page of large
    allocations that have been randomly assigned an offset of 0 when the
    --enable-cache-oblivious configure option is enabled.

* 4.0.3 (September 24, 2015)

  This bugfix release continues the trend of xallocx() and heap profiling fixes.

  Bug fixes:
  - Fix xallocx(..., MALLOCX_ZERO) to zero all trailing bytes of large
    allocations when the --enable-cache-oblivious configure option is enabled.
  - Fix xallocx(..., MALLOCX_ZERO) to zero trailing bytes of huge allocations
    when resizing from/to a size class that is not a multiple of the chunk size.
  - Fix prof_tctx_dump_iter() to filter out nodes that were created after heap
    profile dumping started.
  - Work around a potentially bad thread-specific data initialization
    interaction with NPTL (glibc's pthreads implementation).

* 4.0.2 (September 21, 2015)

  This bugfix release addresses a few bugs specific to heap profiling.

  Bug fixes:
  - Fix ixallocx_prof_sample() to neither modify nor create sampled small
    allocations.  xallocx() is in general incapable of moving small allocations,
    so this fix removes buggy code without loss of generality.
  - Fix irallocx_prof_sample() to always allocate large regions, even when
    alignment is non-zero.
  - Fix prof_alloc_rollback() to read tdata from thread-specific data rather
    than dereferencing a potentially invalid tctx.

* 4.0.1 (September 15, 2015)

  This is a bugfix release that is somewhat high risk due to the amount of
  refactoring required to address deep xallocx() problems.  As a side effect of
  these fixes, xallocx() now tries harder to partially fulfill requests for
  optional extra space.  Note that a couple of minor heap profiling
  optimizations are included, but these are better thought of as performance
  fixes that were integral to discovering most of the other bugs.

  Optimizations:
  - Avoid a chunk metadata read in arena_prof_tctx_set(), since it is in the
    fast path when heap profiling is enabled.  Additionally, split a special
    case out into arena_prof_tctx_reset(), which also avoids chunk metadata
    reads.
  - Optimize irallocx_prof() to optimistically update the sampler state.  The
    prior implementation appears to have been a holdover from when
    rallocx()/xallocx() functionality was combined as rallocm().

  Bug fixes:
  - Fix TLS configuration such that it is enabled by default for platforms on
    which it works correctly.
  - Fix arenas_cache_cleanup() and arena_get_hard() to handle
    allocation/deallocation within the application's thread-specific data
    cleanup functions even after arenas_cache is torn down.
  - Fix xallocx() bugs related to size+extra exceeding HUGE_MAXCLASS.
  - Fix chunk purge hook calls for in-place huge shrinking reallocation to
    specify the old chunk size rather than the new chunk size.  This bug caused
    no correctness issues for the default chunk purge function, but was
    visible to custom functions set via the "arena.<i>.chunk_hooks" mallctl.
  - Fix heap profiling bugs:
    + Fix heap profiling to distinguish among otherwise identical sample sites
      with interposed resets (triggered via the "prof.reset" mallctl).  This bug
      could cause data structure corruption that would most likely result in a
      segfault.
    + Fix irealloc_prof() to call prof_alloc_rollback() on OOM.
    + Make one call to prof_active_get_unlocked() per allocation event, and use
      the result throughout the relevant functions that handle an allocation
      event.  Also add a missing check in prof_realloc().  These fixes protect
      allocation events against concurrent prof_active changes.
    + Fix ixallocx_prof() to pass usize_max and zero to ixallocx_prof_sample()
      in the correct order.
    + Fix prof_realloc() to call prof_free_sampled_object() after calling
      prof_malloc_sample_object().  Prior to this fix, if tctx and old_tctx were
      the same, the tctx could have been prematurely destroyed.
  - Fix portability bugs:
    + Don't bitshift by negative amounts when encoding/decoding run sizes in
      chunk header maps.  This affected systems with page sizes greater than
      8 KiB.
    + Rename index_t to szind_t to avoid a conflict with an existing type on
      Solaris.
    + Add JEMALLOC_CXX_THROW to the memalign() function prototype, in order to
      match glibc and avoid compilation errors when including both
      jemalloc/jemalloc.h and malloc.h in C++ code.
    + Don't assume that /bin/sh is appropriate when running size_classes.sh
      during configuration.
    + Consider __sparcv9 a synonym for __sparc64__ when defining LG_QUANTUM.
    + Link tests to librt if it contains clock_gettime(2).

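  The negative-shift portability fix can be illustrated with a hypothetical
  map-bits layout in which the run size field starts at bit 13, so the net
  shift (13 - lg_page) turns negative once pages exceed 8 KiB.  Because
  shifting by a negative count is undefined behavior in C, the sketch picks the
  shift direction explicitly.  SIZE_FIELD_LSB and the mapbits_size_*()
  functions are assumptions, not jemalloc's actual chunk map encoding.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: the page-multiple run size is stored with its lg_page
 * always-zero low bits dropped, starting at bit SIZE_FIELD_LSB; bits below
 * that hold flags. */
#define SIZE_FIELD_LSB 13

static uint64_t
mapbits_size_encode(size_t size, unsigned lg_page)
{
    int shift = SIZE_FIELD_LSB - (int)lg_page;  /* negative for pages > 8 KiB */
    assert((size & (((size_t)1 << lg_page) - 1)) == 0);
    if (shift >= 0)
        return (uint64_t)size << shift;
    return (uint64_t)size >> -shift;
}

static size_t
mapbits_size_decode(uint64_t mapbits, unsigned lg_page)
{
    int shift = SIZE_FIELD_LSB - (int)lg_page;
    /* Mask off the flag bits below the size field. */
    uint64_t field = mapbits & ~(((uint64_t)1 << SIZE_FIELD_LSB) - 1);
    if (shift >= 0)
        return (size_t)(field >> shift);
    return (size_t)(field << -shift);
}
```

  With 64 KiB pages (lg_page 16), a 192 KiB run encodes as 24576, and flag bits
  below bit 13 do not disturb decoding.
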
* 4.0.0 (August 17, 2015)

  This version contains many speed and space optimizations, both minor and
  major.  The major themes are generalization, unification, and simplification.
  Although many of these optimizations cause no visible behavior change, their
  cumulative effect is substantial.

  New features:
  - Normalize size class spacing to be consistent across the complete size
    range.  By default there are four size classes per size doubling, but this
    is now configurable via the --with-lg-size-class-group option.  Also add the
    --with-lg-page, --with-lg-page-sizes, --with-lg-quantum, and
    --with-lg-tiny-min options, which can be used to tweak page and size class
    settings.  Impacts:
    + Worst case performance for incrementally growing/shrinking reallocation
      is improved because there are far fewer size classes, and therefore
      copying happens less often.
    + Internal fragmentation is limited to 20% for all but the smallest size
      classes (those less than four times the quantum).  (1B + 4 KiB)
      and (1B + 4 MiB) previously suffered nearly 50% internal fragmentation.
    + Chunk fragmentation tends to be lower because there are fewer distinct run
      sizes to pack.
  - Add support for explicit tcaches.  The "tcache.create", "tcache.flush", and
    "tcache.destroy" mallctls control tcache lifetime and flushing, and the
    MALLOCX_TCACHE(tc) and MALLOCX_TCACHE_NONE flags to the *allocx() API
    control which tcache is used for each operation.
  - Implement per thread heap profiling, as well as the ability to
    enable/disable heap profiling on a per thread basis.  Add the "prof.reset",
    "prof.lg_sample", "thread.prof.name", "thread.prof.active",
    "opt.prof_thread_active_init", and "prof.thread_active_init" mallctls.
  - Add support for per arena application-specified chunk allocators, configured
    via the "arena.<i>.chunk_hooks" mallctl.
  - Refactor huge allocation to be managed by arenas, so that arenas now
    function as general purpose independent allocators.  This is important in
    the context of user-specified chunk allocators, aside from the scalability
    benefits.  Related new statistics:
    + The "stats.arenas.<i>.huge.allocated", "stats.arenas.<i>.huge.nmalloc",
      "stats.arenas.<i>.huge.ndalloc", and "stats.arenas.<i>.huge.nrequests"
      mallctls provide high level per arena huge allocation statistics.
    + The "arenas.nhchunks", "arenas.hchunk.<i>.size",
      "stats.arenas.<i>.hchunks.<j>.nmalloc",
      "stats.arenas.<i>.hchunks.<j>.ndalloc",
      "stats.arenas.<i>.hchunks.<j>.nrequests", and
      "stats.arenas.<i>.hchunks.<j>.curhchunks" mallctls provide per size class
      statistics.
  - Add the 'util' column to malloc_stats_print() output, which reports the
    proportion of available regions that are currently in use for each small
    size class.
  - Add "alloc" and "free" modes for junk filling (see the "opt.junk"
    mallctl), so that it is possible to separately enable junk filling for
    allocation versus deallocation.
  - Add the jemalloc-config script, which provides information about how
    jemalloc was configured, and how to integrate it into application builds.
  - Add metadata statistics, which are accessible via the "stats.metadata",
    "stats.arenas.<i>.metadata.mapped", and
    "stats.arenas.<i>.metadata.allocated" mallctls.
  - Add the "stats.resident" mallctl, which reports the upper limit of
    physically resident memory mapped by the allocator.
  - Add per arena control over unused dirty page purging, via the
    "arenas.lg_dirty_mult", "arena.<i>.lg_dirty_mult", and
    "stats.arenas.<i>.lg_dirty_mult" mallctls.
  - Add the "prof.gdump" mallctl, which makes it possible to toggle the gdump
    feature on/off during program execution.
884d0e79aa3SJason Evans    feature on/off during program execution.
885d0e79aa3SJason Evans  - Add sdallocx(), which implements sized deallocation.  The primary
886d0e79aa3SJason Evans    optimization over dallocx() is the removal of a metadata read, which often
887d0e79aa3SJason Evans    suffers an L1 cache miss.
888d0e79aa3SJason Evans  - Add missing header includes in jemalloc/jemalloc.h, so that applications
889d0e79aa3SJason Evans    only have to #include <jemalloc/jemalloc.h>.
890d0e79aa3SJason Evans  - Add support for additional platforms:
891d0e79aa3SJason Evans    + Bitrig
892d0e79aa3SJason Evans    + Cygwin
893d0e79aa3SJason Evans    + DragonFlyBSD
894d0e79aa3SJason Evans    + iOS
895d0e79aa3SJason Evans    + OpenBSD
896d0e79aa3SJason Evans    + OpenRISC/or1k
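
  The 20% bound on internal fragmentation in the size class item above can be
  checked with a small model.  The sketch below is a simplified assumption, not
  jemalloc's size class code: it assumes a 16-byte quantum, quantum-spaced
  classes up to four times the quantum, then four evenly spaced classes per
  size doubling, and it ignores tiny classes, page and chunk boundaries, and
  overflow near SIZE_MAX.  The function names are hypothetical.

```c
#include <stddef.h>

/* Hypothetical, simplified size-class model (not jemalloc's code):
 * a 16-byte quantum, quantum-spaced classes up to 4 * QUANTUM, then four
 * evenly spaced classes per size doubling.  Tiny classes, page/chunk
 * boundaries, and overflow near SIZE_MAX are ignored. */
#define QUANTUM ((size_t)16)

size_t size_class_model(size_t size) {
    if (size <= 4 * QUANTUM) {
        /* Round up to the next multiple of the quantum. */
        return (size + QUANTUM - 1) / QUANTUM * QUANTUM;
    }
    /* Find lg such that 2^lg < size <= 2^(lg + 1). */
    unsigned lg = 0;
    while (((size_t)1 << (lg + 1)) < size) {
        lg++;
    }
    /* Four classes per doubling: the spacing is 2^lg / 4. */
    size_t delta = (size_t)1 << (lg - 2);
    return (size + delta - 1) / delta * delta;
}

/* Nonzero if rounding up wastes less than 20% of the class size,
 * i.e. 5 * (class - size) < class. */
int waste_under_20_percent(size_t size) {
    size_t cls = size_class_model(size);
    return 5 * (cls - size) < cls;
}
```

  Under this model a (4 KiB + 1 B) request rounds up to the 5 KiB class and
  wastes just under 20% of it, whereas rounding the same request up to 8 KiB
  wastes nearly half, in line with the roughly 50% figure quoted above.  Below
  four times the quantum the bound does not hold: a 17-byte request occupies a
  32-byte class.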

  Optimizations:
  - Maintain dirty runs in per arena LRUs rather than in per arena trees of
    dirty-run-containing chunks.  In practice this change significantly reduces
    dirty page purging volume.
  - Integrate whole chunks into the unused dirty page purging machinery.  This
    reduces the cost of repeated huge allocation/deallocation, because it
    effectively introduces a cache of chunks.
  - Split the arena chunk map into two separate arrays, in order to increase
    cache locality for the frequently accessed bits.
  - Move small run metadata out of runs, into arena chunk headers.  This reduces
    run fragmentation; smaller runs reduce external fragmentation for small size
    classes; and the packed (less uniformly aligned) metadata layout improves
    CPU cache set distribution.
  - Randomly distribute large allocation base pointer alignment relative to page
    boundaries in order to more uniformly utilize CPU cache sets.  This can be
    disabled via the --disable-cache-oblivious configure option, and queried via
    the "config.cache_oblivious" mallctl.
  - Micro-optimize the fast paths for the public API functions.
  - Refactor thread-specific data to reside in a single structure.  This ensures
    that only a single TLS read is necessary per call into the public API.
  - Implement in-place huge allocation growing and shrinking.
  - Refactor rtree (radix tree for chunk lookups) to be lock-free, and make
    additional optimizations that reduce maximum lookup depth to one or two
    levels.  This resolves what was a concurrency bottleneck for per arena huge
    allocation, because a global data structure is critical for determining
    which arenas own which huge allocations.
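
  The cache-oblivious item above randomizes where a large allocation starts
  relative to a page boundary.  The sketch below shows one way to compute such
  an offset, assuming 4 KiB pages, 64-byte cache lines, and a toy linear
  congruential generator standing in for whatever PRNG jemalloc actually uses;
  the function names are hypothetical.  Each offset is cache-line aligned and
  smaller than a page.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed constants for illustration: 4 KiB pages, 64-byte cache lines. */
#define SKETCH_LG_PAGE 12
#define SKETCH_LG_CACHELINE 6

/* Toy 64-bit linear congruential generator (Knuth's MMIX constants);
 * a stand-in for a real PRNG, not jemalloc's. */
static uint64_t lcg_next(uint64_t *state) {
    *state = *state * 6364136223846793005ULL + 1442695040888963407ULL;
    return *state;
}

/* Return a cache-line-aligned offset in [0, page size) at which a large
 * allocation could start.  Uses the top bits of the LCG output, which are
 * better distributed than its low bits. */
size_t random_base_offset(uint64_t *state) {
    const unsigned nbits = SKETCH_LG_PAGE - SKETCH_LG_CACHELINE; /* 6 bits: 64 slots */
    uint64_t slot = lcg_next(state) >> (64 - nbits);               /* in [0, 64) */
    return (size_t)slot << SKETCH_LG_CACHELINE;                    /* multiple of 64 */
}
```

  A real allocator must also reserve enough extra space for the shifted object
  to fit; the sketch only computes the offset.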

  Incompatible changes:
  - Replace --enable-cc-silence with --disable-cc-silence to suppress spurious
    warnings by default.
  - Ensure that the constness of malloc_usable_size()'s return type matches that
    of the system implementation.
  - Change the heap profile dump format to support per thread heap profiling,
    rename pprof to jeprof, and enhance it with the --thread=<n> option.  As a
    result, the bundled jeprof must now be used rather than the upstream
    (gperftools) pprof.
  - Disable "opt.prof_final" by default, in order to avoid atexit(3), which can
    internally deadlock on some platforms.
  - Change the "arenas.nlruns" mallctl type from size_t to unsigned.
  - Replace the "stats.arenas.<i>.bins.<j>.allocated" mallctl with
    "stats.arenas.<i>.bins.<j>.curregs".
  - Ignore MALLOC_CONF in set{uid,gid,cap} binaries.
  - Ignore MALLOCX_ARENA(a) in dallocx(), in favor of using the
    MALLOCX_TCACHE(tc) and MALLOCX_TCACHE_NONE flags to control tcache usage.

  Removed features:
  - Remove the *allocm() API, which is superseded by the *allocx() API.
  - Remove the --enable-dss options, and make dss non-optional on all platforms
    which support sbrk(2).
  - Remove the "arenas.purge" mallctl, which was obsoleted by the
    "arena.<i>.purge" mallctl in 3.1.0.
  - Remove the unnecessary "opt.valgrind" mallctl; jemalloc automatically
    detects whether it is running inside Valgrind.
  - Remove the "stats.huge.allocated", "stats.huge.nmalloc", and
    "stats.huge.ndalloc" mallctls.
  - Remove the --enable-mremap option.
  - Remove the "stats.chunks.current", "stats.chunks.total", and
    "stats.chunks.high" mallctls.

  Bug fixes:
  - Fix the cactive statistic to decrease (rather than increase) when active
    memory decreases.  This regression was first released in 3.5.0.
  - Fix OOM handling in memalign() and valloc().  A variant of this bug existed
    in all releases since 2.0.0, which introduced these functions.
  - Fix an OOM-related regression in arena_tcache_fill_small(), which could
    cause cache corruption on OOM.  This regression was present in all releases
    from 2.2.0 through 3.6.0.
  - Fix size class overflow handling for malloc(), posix_memalign(), memalign(),
    calloc(), and realloc() when profiling is enabled.
  - Fix the "arena.<i>.dss" mallctl to return an error if "primary" or
    "secondary" precedence is specified, but sbrk(2) is not supported.
  - Fix fallback lg_floor() implementations to handle extremely large inputs.
  - Ensure the default purgeable zone is after the default zone on OS X.
  - Fix latent bugs in atomic_*().
  - Fix the "arena.<i>.dss" mallctl to handle read-only calls.
  - Fix tls_model configuration to enable the initial-exec model when possible.
  - Mark malloc_conf as a weak symbol so that the application can override it.
  - Correctly detect glibc's adaptive pthread mutexes.
  - Fix the --without-export configure option.
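
  The lg_floor() item above concerns the portable fallback used when no
  count-leading-zeros builtin is available.  One pitfall for such fallbacks is
  an intermediate step that overflows when the top bit of the input is set,
  for example rounding up to the next power of two.  The sketch below, which is
  not jemalloc's code and uses the hypothetical name lg_floor_sketch(), computes
  floor(log2(x)) for any nonzero size_t, including SIZE_MAX, without that
  overflow.  It assumes size_t is at most 64 bits wide.

```c
#include <stddef.h>
#include <stdint.h>

/* floor(log2(x)) for x > 0, without compiler builtins (x == 0 is not a
 * valid input).  Hypothetical sketch, not jemalloc's lg_floor(); assumes
 * size_t is at most 64 bits wide. */
unsigned lg_floor_sketch(size_t x) {
    /* Smear the highest set bit into every lower position, so that x
     * becomes 2^(k+1) - 1, where k is the answer. */
    x |= x >> 1;
    x |= x >> 2;
    x |= x >> 4;
    x |= x >> 8;
    x |= x >> 16;
#if SIZE_MAX > 0xffffffffu
    x |= x >> 32;
#endif
    /* Adding 1 here to reach 2^(k+1) would overflow when k is the top bit
     * (e.g. x == SIZE_MAX), so count the set bits instead. */
    unsigned nbits = 0;
    while (x != 0) {
        nbits += (unsigned)(x & 1);
        x >>= 1;
    }
    return nbits - 1;
}
```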

* 3.6.0 (March 31, 2014)

  This version contains a critical bug fix for a regression present in 3.5.0 and
  3.5.1.

  Bug fixes:
  - Fix a regression in arena_chunk_alloc() that caused crashes during
    small/large allocation if chunk allocation failed.  In the absence of this
    bug, chunk allocation failure would result in allocation failure, e.g. a
    NULL return from malloc().  This regression was introduced in 3.5.0.
  - Fix gcc intrinsics-based backtracing by specifying -fno-omit-frame-pointer
    to gcc.  Note that the application (and all the libraries it links to) must
    also be compiled with this option for backtracing to be reliable.
  - Use dss allocation precedence for huge allocations as well as small/large
    allocations.
  - Fix test assertion failure message formatting.  This bug did not manifest on
    x86_64 systems because of implementation subtleties in va_list.
  - Fix inconsequential test failures for hash and SFMT code.

  New features:
  - Support heap profiling on FreeBSD.  This feature depends on the proc
    filesystem being mounted during heap profile dumping.

* 3.5.1 (February 25, 2014)

  This version primarily addresses minor bugs in test code.

  Bug fixes:
  - Configure Solaris/Illumos to use MADV_FREE.
  - Fix junk filling for mremap(2)-based huge reallocation.  This is only
    relevant if configuring with the --enable-mremap option specified.
  - Avoid compilation failure if the C99 'restrict' keyword is not supported by
    the compiler.
  - Add a configure test for SSE2 rather than assuming it is usable on i686
    systems.  This fixes test compilation errors, especially on 32-bit Linux
    systems.
  - Fix mallctl argument size mismatches (size_t vs. uint64_t) in the stats unit
    test.
  - Fix/remove flawed alignment-related overflow tests.
  - Prevent compiler optimizations that could change backtraces in the
    prof_accum unit test.

* 3.5.0 (January 22, 2014)

  This version focuses on refactoring and automated testing, though it also
  includes some non-trivial heap profiling optimizations not mentioned below.

  New features:
  - Add the *allocx() API, which is a successor to the experimental *allocm()
    API.  The *allocx() functions are slightly simpler to use because they have
    fewer parameters, they directly return the results of primary interest, and
    mallocx()/rallocx() avoid the strict aliasing pitfall that
    allocm()/rallocm() share with posix_memalign().  Note that *allocm() is
    slated for removal in the next non-bugfix release.
  - Add support for LinuxThreads.
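
  The strict aliasing pitfall mentioned in the *allocx() item arises because
  posix_memalign() (like allocm()) hands back the pointer through a void **
  argument.  Passing (void **)&dp, where dp is a double *, stores a void * value
  into an object of type double *, which C's aliasing rules do not permit.  The
  portable idiom, sketched below with a hypothetical helper, passes the address
  of a genuine void * and converts afterwards; an API that returns the pointer
  directly, as mallocx() does, never needs the cast.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L
#endif
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: allocate n doubles aligned to `alignment` bytes
 * (a power of two multiple of sizeof(void *), as posix_memalign()
 * requires), or return NULL. */
double *alloc_aligned_doubles(size_t n, size_t alignment) {
    if (n > SIZE_MAX / sizeof(double)) {
        return NULL;  /* n * sizeof(double) would overflow */
    }
    void *tmp;  /* a genuine void * object for posix_memalign() to write */
    if (posix_memalign(&tmp, alignment, n * sizeof(double)) != 0) {
        return NULL;
    }
    return tmp;  /* well-defined void * -> double * conversion */
}
```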

  Bug fixes:
  - Unless heap profiling is enabled, disable floating point code and don't link
    with libm.  This, in combination with e.g. EXTRA_CFLAGS=-mno-sse on x64
    systems, makes it possible to completely disable floating point register
    use.  Some versions of glibc neglect to save/restore caller-saved floating
    point registers during dynamic lazy symbol loading, and the symbol loading
    code uses whatever malloc the application happens to have linked/loaded
    with, the result being potential floating point register corruption.
  - Report ENOMEM rather than EINVAL if an OOM occurs during heap profiling
    backtrace creation in imemalign().  This bug impacted posix_memalign() and
    aligned_alloc().
  - Fix a file descriptor leak in a prof_dump_maps() error path.
  - Fix prof_dump() to close the dump file descriptor for all relevant error
    paths.
  - Fix rallocm() to use the arena specified by the ALLOCM_ARENA(s) flag for
    allocation, not just deallocation.
  - Fix a data race for large allocation stats counters.
  - Fix a potential infinite loop during thread exit.  This bug occurred on
    Solaris, and could affect other platforms with similar pthreads TSD
    implementations.
  - Don't junk-fill reallocations unless usable size changes.  This fixes a
    violation of the *allocx()/*allocm() semantics.
  - Fix growing large reallocation to junk fill new space.
  - Fix huge deallocation to junk fill when munmap is disabled.
  - Change the default private namespace prefix from empty to je_, and change
    --with-private-namespace so that it prepends an additional prefix rather
    than replacing je_.  This reduces the likelihood of applications which
    statically link jemalloc experiencing symbol name collisions.
  - Add missing private namespace mangling (relevant when
    --with-private-namespace is specified).
  - Add and use JEMALLOC_INLINE_C so that static inline functions are marked as
    static even for debug builds.
  - Add a missing mutex unlock in a malloc_init_hard() error path.  In practice
    this error path is never executed.
  - Fix numerous bugs in malloc_strtoumax() error handling/reporting.  These
    bugs had no impact except for malformed inputs.
  - Fix numerous bugs in malloc_snprintf().  These bugs were not exercised by
    existing calls, so they had no impact.
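
  The malloc_strtoumax() item above concerns error handling in a
  string-to-integer routine that jemalloc implements itself.  For comparison,
  the sketch below shows careful error handling around the standard C
  strtoumax() from <inttypes.h>; it is not jemalloc's routine, and the wrapper
  name parse_umax() is hypothetical.  It distinguishes empty or signed input,
  trailing garbage, and out-of-range values.

```c
#include <ctype.h>
#include <errno.h>
#include <inttypes.h>
#include <stdint.h>

/* Hypothetical wrapper around the standard strtoumax(), base 10 only.
 * Returns 0 and stores the value in *out on success; returns EINVAL for
 * empty, signed, whitespace-prefixed, or trailing-garbage input, and ERANGE
 * if the value does not fit in uintmax_t. */
int parse_umax(const char *s, uintmax_t *out) {
    /* strtoumax() skips leading whitespace and accepts a sign (negating the
     * result for '-'), so insist on a leading decimal digit. */
    if (!isdigit((unsigned char)s[0])) {
        return EINVAL;
    }
    char *end;
    errno = 0;
    uintmax_t value = strtoumax(s, &end, 10);
    if (errno == ERANGE) {
        return ERANGE;  /* strtoumax() returned UINTMAX_MAX */
    }
    if (*end != '\0') {
        return EINVAL;  /* characters left over after the number */
    }
    *out = value;
    return 0;
}
```

  Checking the first character matters because strtoumax() accepts a leading
  minus sign and silently negates the result, so "-1" would otherwise parse as
  UINTMAX_MAX.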

* 3.4.1 (October 20, 2013)

  Bug fixes:
  - Fix a race in the "arenas.extend" mallctl that could cause memory corruption
    of internal data structures and subsequent crashes.
  - Fix Valgrind integration flaws that caused Valgrind warnings about reads of
    uninitialized memory in:
    + arena chunk headers
    + internal zero-initialized data structures (relevant to tcache and prof
      code)
  - Preserve errno during the first allocation.  A readlink(2) call during
    initialization fails unless /etc/malloc.conf exists, so errno was typically
    set during the first allocation prior to this fix.
  - Fix compilation warnings reported by gcc 4.8.1.
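
  The errno item above reflects a common pattern in allocator initialization:
  system calls that are expected to fail should not leave errno changed for
  the caller.  The sketch below illustrates that save-and-restore pattern
  around readlink(2); it is not jemalloc's initialization code, and the
  function name read_conf_link() and its parameters are hypothetical.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L
#endif
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical: read the target of a symbolic link used as a configuration
 * source into buf (NUL-terminated; bufsize must be at least 1).  A missing
 * link simply means "no configuration", so errno is restored and an empty
 * string is returned. */
size_t read_conf_link(const char *path, char *buf, size_t bufsize) {
    int saved_errno = errno;  /* caller-visible errno before we start */
    ssize_t len = readlink(path, buf, bufsize - 1);
    if (len < 0) {
        len = 0;              /* no link: treat as empty configuration */
        errno = saved_errno;  /* hide the expected failure from the caller */
    }
    buf[len] = '\0';          /* readlink() does not NUL-terminate */
    return (size_t)len;
}
```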
10882b06b201SJason Evans
1089f8ca2db1SJason Evans* 3.4.0 (June 2, 2013)
1090f8ca2db1SJason Evans
1091f8ca2db1SJason Evans  This version is essentially a small bugfix release, but the addition of
1092f8ca2db1SJason Evans  aarch64 support requires that the minor version be incremented.
1093f8ca2db1SJason Evans
1094f8ca2db1SJason Evans  Bug fixes:
1095f8ca2db1SJason Evans  - Fix race-triggered deadlocks in chunk_record().  These deadlocks were
1096f8ca2db1SJason Evans    typically triggered by multiple threads concurrently deallocating huge
1097f8ca2db1SJason Evans    objects.
1098f8ca2db1SJason Evans
1099f8ca2db1SJason Evans  New features:
1100f8ca2db1SJason Evans  - Add support for the aarch64 architecture.
1101f8ca2db1SJason Evans
1102f8ca2db1SJason Evans* 3.3.1 (March 6, 2013)
1103f8ca2db1SJason Evans
1104f8ca2db1SJason Evans  This version fixes bugs that are typically encountered only when utilizing
1105f8ca2db1SJason Evans  custom run-time options.
1106f8ca2db1SJason Evans
1107f8ca2db1SJason Evans  Bug fixes:
1108f8ca2db1SJason Evans  - Fix a locking order bug that could cause deadlock during fork if heap
1109f8ca2db1SJason Evans    profiling were enabled.
1110f8ca2db1SJason Evans  - Fix a chunk recycling bug that could cause the allocator to lose track of
1111f8ca2db1SJason Evans    whether a chunk was zeroed.  On FreeBSD, NetBSD, and OS X, it could cause
1112f8ca2db1SJason Evans    corruption if allocating via sbrk(2) (unlikely unless running with the
1113f8ca2db1SJason Evans    "dss:primary" option specified).  This was completely harmless on Linux
1114f8ca2db1SJason Evans    unless using mlockall(2) (and unlikely even then, unless the
1115f8ca2db1SJason Evans    --disable-munmap configure option or the "dss:primary" option was
1116f8ca2db1SJason Evans    specified).  This regression was introduced in 3.1.0 by the
1117f8ca2db1SJason Evans    mlockall(2)/madvise(2) interaction fix.
1118f8ca2db1SJason Evans  - Fix TLS-related memory corruption that could occur during thread exit if the
1119f8ca2db1SJason Evans    thread never allocated memory.  Only the quarantine and prof facilities were
1120f8ca2db1SJason Evans    susceptible.
1121f8ca2db1SJason Evans  - Fix two quarantine bugs:
1122f8ca2db1SJason Evans    + Internal reallocation of the quarantined object array leaked the old
1123f8ca2db1SJason Evans      array.
1124f8ca2db1SJason Evans    + Reallocation failure for internal reallocation of the quarantined object
1125f8ca2db1SJason Evans      array (very unlikely) resulted in memory corruption.
1126f8ca2db1SJason Evans  - Fix Valgrind integration to annotate all internally allocated memory in a
1127f8ca2db1SJason Evans    way that keeps Valgrind happy about internal data structure access.
1128f8ca2db1SJason Evans  - Fix building for s390 systems.
1129f8ca2db1SJason Evans
113088ad2f8dSJason Evans* 3.3.0 (January 23, 2013)
113188ad2f8dSJason Evans
113288ad2f8dSJason Evans  This version includes a few minor performance improvements in addition to the
113388ad2f8dSJason Evans  listed new features and bug fixes.
113488ad2f8dSJason Evans
113588ad2f8dSJason Evans  New features:
113688ad2f8dSJason Evans  - Add clipping support to lg_chunk option processing.
113788ad2f8dSJason Evans  - Add the --enable-ivsalloc option.
113888ad2f8dSJason Evans  - Add the --without-export option.
113988ad2f8dSJason Evans  - Add the --disable-zone-allocator option.
114088ad2f8dSJason Evans
114188ad2f8dSJason Evans  Bug fixes:
114288ad2f8dSJason Evans  - Fix "arenas.extend" mallctl to output the number of arenas.
11432b06b201SJason Evans  - Fix chunk_recycle() to unconditionally inform Valgrind that returned memory
114488ad2f8dSJason Evans    is undefined.
114588ad2f8dSJason Evans  - Fix build break on FreeBSD related to alloca.h.
114688ad2f8dSJason Evans
114782872ac0SJason Evans* 3.2.0 (November 9, 2012)
114882872ac0SJason Evans
114982872ac0SJason Evans  In addition to a couple of bug fixes, this version modifies page run
115082872ac0SJason Evans  allocation and dirty page purging algorithms in order to better control
115182872ac0SJason Evans  page-level virtual memory fragmentation.
115282872ac0SJason Evans
115382872ac0SJason Evans  Incompatible changes:
115482872ac0SJason Evans  - Change the "opt.lg_dirty_mult" default from 5 to 3 (32:1 to 8:1).
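
  The ratios in the lg_dirty_mult item above follow from treating the option as
  a base-2 logarithm: 2^5 = 32 and 2^3 = 8.  In a simplified model, purging
  starts once dirty pages exceed active pages divided by 2^lg_dirty_mult, so the
  new default tolerates roughly one dirty page per eight active pages instead of
  one per 32.  The sketch below encodes that model only; the function name is
  hypothetical, and jemalloc's real purge logic tracks more state.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified model, not jemalloc's arena code: purge once the number of
 * dirty pages exceeds nactive / 2^lg_dirty_mult.  A negative value
 * disables purging in this model. */
bool should_purge_model(size_t nactive, size_t ndirty, int lg_dirty_mult) {
    if (lg_dirty_mult < 0) {
        return false;
    }
    /* Shifting by the full width of size_t is undefined, so clamp. */
    size_t threshold = (lg_dirty_mult >= (int)(sizeof(size_t) * CHAR_BIT))
        ? 0
        : nactive >> lg_dirty_mult;
    return ndirty > threshold;
}
```

  With 800 active pages, the old default of 5 allows up to 25 dirty pages before
  purging, while the new default of 3 allows up to 100.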

  Bug fixes:
  - Fix dss/mmap allocation precedence code to use recyclable mmap memory only
    after primary dss allocation fails.
  - Fix a deadlock in the "arenas.purge" mallctl.  This regression was
    introduced in 3.1.0 by the addition of the "arena.<i>.purge" mallctl.

* 3.1.0 (October 16, 2012)

  New features:
  - Auto-detect whether running inside Valgrind, thus removing the need to
    manually specify MALLOC_CONF=valgrind:true.
  - Add the "arenas.extend" mallctl, which allows applications to create
    manually managed arenas.
  - Add the ALLOCM_ARENA() flag for {,r,d}allocm().
  - Add the "opt.dss", "arena.<i>.dss", and "stats.arenas.<i>.dss" mallctls,
    which provide control over dss/mmap precedence.
  - Add the "arena.<i>.purge" mallctl, which obsoletes "arenas.purge".
  - Define LG_QUANTUM for hppa.

  Incompatible changes:
  - Disable tcache by default if running inside Valgrind, in order to avoid
    making unallocated objects appear reachable to Valgrind.
  - Drop const from malloc_usable_size() argument on Linux.

  Bug fixes:
  - Fix a heap profiling crash if a sampled object is freed via realloc(p, 0).
  - Remove const from __*_hook variable declarations, so that glibc can modify
    them during process forking.
  - Fix mlockall(2)/madvise(2) interaction.
  - Fix fork(2)-related deadlocks.
  - Fix the error return value for the "thread.tcache.enabled" mallctl.

* 3.0.0 (May 11, 2012)

  Although this version adds some major new features, the primary focus is on
  internal code cleanup that facilitates maintainability and portability, most
  of which is not reflected in the ChangeLog.  This is the first release to
  incorporate substantial contributions from numerous other developers, and the
  result is a more broadly useful allocator (see the git revision history for
  contribution details).  Note that the license has been unified, thanks to
  Facebook granting a license under the same terms as the other copyright
  holders (see COPYING).

  New features:
  - Implement Valgrind support, redzones, and quarantine.
  - Add support for additional platforms:
    + FreeBSD
    + Mac OS X Lion
    + MinGW
    + Windows (no support yet for replacing the system malloc)
  - Add support for additional architectures:
    + MIPS
    + SH4
    + Tilera
  - Add support for cross compiling.
  - Add nallocm(), which rounds a request size up to the nearest size class
    without actually allocating.
  - Implement aligned_alloc() (blame C11).
  - Add the "thread.tcache.enabled" mallctl.
  - Add the "opt.prof_final" mallctl.
  - Update pprof (from gperftools 2.0).
  - Add the --with-mangling option.
  - Add the --disable-experimental option.
  - Add the --disable-munmap option, and make it the default on Linux.
  - Add the --enable-mremap option, which disables use of mremap(2) by default.

  Incompatible changes:
  - Enable stats by default.
  - Enable fill by default.
  - Disable lazy locking by default.
  - Rename the "tcache.flush" mallctl to "thread.tcache.flush".
  - Rename the "arenas.pagesize" mallctl to "arenas.page".
  - Change the "opt.lg_prof_sample" default from 0 to 19 (1 B to 512 KiB).
  - Change the "opt.prof_accum" default from true to false.

  Removed features:
  - Remove the swap feature, including the "config.swap", "swap.avail",
    "swap.prezeroed", "swap.nfds", and "swap.fds" mallctls.
  - Remove highruns statistics, including the
    "stats.arenas.<i>.bins.<j>.highruns" and
    "stats.arenas.<i>.lruns.<j>.highruns" mallctls.
  - As part of small size class refactoring, remove the "opt.lg_[qc]space_max",
    "arenas.cacheline", "arenas.subpage", "arenas.[tqcs]space_{min,max}", and
    "arenas.[tqcs]bins" mallctls.
  - Remove the "arenas.chunksize" mallctl.
  - Remove the "opt.lg_prof_tcmax" option.
  - Remove the "opt.lg_prof_bt_max" option.
  - Remove the "opt.lg_tcache_gc_sweep" option.
  - Remove the --disable-tiny option, including the "config.tiny" mallctl.
  - Remove the --enable-dynamic-page-shift configure option.
  - Remove the --enable-sysv configure option.

  Bug fixes:
  - Fix a statistics-related bug in the "thread.arena" mallctl that could cause
    invalid statistics and crashes.
  - Work around TLS deallocation via free() on Linux.  This bug could cause
    write-after-free memory corruption.
  - Fix a potential deadlock that could occur during interval- and
    growth-triggered heap profile dumps.
  - Fix large calloc() zeroing bugs due to dropping chunk map unzeroed flags.
  - Fix chunk_alloc_dss() to stop claiming memory is zeroed.  This bug could
    cause memory corruption and crashes with --enable-dss specified.
  - Fix fork-related bugs that could cause deadlock in children between fork
    and exec.
  - Fix malloc_stats_print() to honor 'b' and 'l' in the opts parameter.
  - Fix realloc(p, 0) to act like free(p).
  - Do not enforce minimum alignment in memalign().
  - Check for a NULL pointer in malloc_usable_size().
  - Fix an off-by-one heap profile statistics bug that could be observed in
    interval- and growth-triggered heap profiles.
  - Fix the "epoch" mallctl to update cached stats even if the passed-in epoch
    is 0.
  - Fix bin->runcur management to fix a layout policy bug.  This bug did not
    affect correctness.
  - Fix a bug in choose_arena_hard() that potentially caused more arenas to be
    initialized than necessary.
  - Add missing "opt.lg_tcache_max" mallctl implementation.
  - Use glibc allocator hooks to make mixed allocator usage less likely.
  - Fix build issues for --disable-tcache.
  - Don't mangle pthread_create() when --with-private-namespace is specified.

1277a4bd5210SJason Evans* 2.2.5 (November 14, 2011)
1278a4bd5210SJason Evans
1279a4bd5210SJason Evans  Bug fixes:
1280a4bd5210SJason Evans  - Fix huge_ralloc() race when using mremap(2).  This is a serious bug that
1281a4bd5210SJason Evans    could cause memory corruption and/or crashes.
1282a4bd5210SJason Evans  - Fix huge_ralloc() to maintain chunk statistics.
1283a4bd5210SJason Evans  - Fix malloc_stats_print(..., "a") output.
1284a4bd5210SJason Evans
1285a4bd5210SJason Evans* 2.2.4 (November 5, 2011)
1286a4bd5210SJason Evans
1287a4bd5210SJason Evans  Bug fixes:
1288a4bd5210SJason Evans  - Initialize arenas_tsd before using it.  This bug existed for 2.2.[0-3], as
1289a4bd5210SJason Evans    well as for --disable-tls builds in earlier releases.
1290a4bd5210SJason Evans  - Do not assume a 4 KiB page size in test/rallocm.c.

* 2.2.3 (August 31, 2011)

  This version fixes numerous bugs related to heap profiling.

  Bug fixes:
  - Fix a prof-related race condition.  This bug could cause memory corruption,
    but only occurred in non-default configurations (prof_accum:false).
  - Fix off-by-one backtracing issues (make sure that prof_alloc_prep() is
    excluded from backtraces).
  - Fix a prof-related bug in realloc() (only triggered by OOM errors).
  - Fix prof-related bugs in allocm() and rallocm().
  - Fix prof_tdata_cleanup() for --disable-tls builds.
  - Fix a relative include path, to fix objdir builds.

* 2.2.2 (July 30, 2011)

  Bug fixes:
  - Fix a build error for --disable-tcache.
  - Fix assertions in arena_purge() (for real this time).
  - Add the --with-private-namespace option.  This is a workaround for symbol
    conflicts that can inadvertently arise when using static libraries.
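
  Symbol-prefixing schemes of this kind are commonly built on C token pasting:
  internal code keeps using plain names, and a header maps each one to a
  prefixed symbol.  A minimal sketch of that general technique, assuming a
  fixed prefix je_ and a hypothetical internal function example_fn (a real
  build would take the prefix from the configure option):

```c
#include <assert.h>

/* Hypothetical prefix; a configure-time option would normally supply it. */
#define PRIVATE_PREFIX je_

/* Two-level concatenation so that PRIVATE_PREFIX is expanded before ##. */
#define CONCAT_(a, b) a##b
#define CONCAT(a, b) CONCAT_(a, b)
#define PRIVATE_NAME(n) CONCAT(PRIVATE_PREFIX, n)

/* A renaming header would contain one line like this per internal symbol. */
#define example_fn PRIVATE_NAME(example_fn)

/* Source code says example_fn, but the compiler sees je_example_fn, so a
 * static library cannot clash with an application symbol of the plain name. */
int example_fn(int x)
{
    assert(x >= 0);
    return x * 2;
}
```

  Running nm on the resulting object would list je_example_fn rather than
  example_fn.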

* 2.2.1 (March 30, 2011)

  Bug fixes:
  - Implement atomic operations for x86/x64.  This fixes compilation failures
    for versions of gcc that are still in wide use.
  - Fix an assertion in arena_purge().

* 2.2.0 (March 22, 2011)

  This version incorporates several improvements to algorithms and data
  structures that tend to reduce fragmentation and increase speed.

  New features:
  - Add the "stats.cactive" mallctl.
  - Update pprof (from google-perftools 1.7).
  - Improve backtracing-related configuration logic, and add the
    --disable-prof-libgcc option.

  Bug fixes:
  - Change default symbol visibility from "internal" to "hidden", which
    decreases the overhead of library-internal function calls.
  - Fix symbol visibility so that it is also set on OS X.
  - Fix a build dependency regression caused by the introduction of the .pic.o
    suffix for PIC object files.
  - Add missing checks for mutex initialization failures.
  - Don't use libgcc-based backtracing except on x64, where it is known to work.
  - Fix deadlocks on OS X that were due to memory allocation in
    pthread_mutex_lock().
  - Heap profiling-specific fixes:
    + Fix memory corruption due to integer overflow in small region index
      computation, when using a small enough sample interval that profiling
      context pointers are stored in small run headers.
    + Fix a bootstrap ordering bug that only occurred with TLS disabled.
    + Fix a rallocm() rsize bug.
    + Fix error detection bugs for aligned memory allocation.

* 2.1.3 (March 14, 2011)

  Bug fixes:
  - Fix a cpp logic regression (due to the "thread.{de,}allocatedp" mallctl fix
    for OS X in 2.1.2).
  - Fix a "thread.arena" mallctl bug.
  - Fix a thread cache stats merging bug.

* 2.1.2 (March 2, 2011)

  Bug fixes:
  - Fix "thread.{de,}allocatedp" mallctl for OS X.
  - Add missing jemalloc.a to the build system.

* 2.1.1 (January 31, 2011)

  Bug fixes:
  - Fix aligned huge reallocation (affected allocm()).
  - Fix the ALLOCM_LG_ALIGN macro definition.
  - Fix a heap dumping deadlock.
  - Fix a "thread.arena" mallctl bug.

* 2.1.0 (December 3, 2010)

  This version incorporates some optimizations that can't quite be considered
  bug fixes.

  New features:
  - Use Linux's mremap(2) for huge object reallocation when possible.
  - Avoid locking in mallctl*() when possible.
  - Add the "thread.[de]allocatedp" mallctls.
  - Convert the manual page source from roff to DocBook, and generate both roff
    and HTML manuals.

  Bug fixes:
  - Fix a crash due to incorrect bootstrap ordering.  This only impacted
    --enable-debug --enable-dss configurations.
  - Fix a minor statistics bug for mallctl("swap.avail", ...).

* 2.0.1 (October 29, 2010)

  Bug fixes:
  - Fix a race condition in heap profiling that could cause undefined behavior
    if "opt.prof_accum" were disabled.
  - Add missing mutex unlocks for some OOM error paths in the heap profiling
    code.
  - Fix a compilation error for non-C99 builds.

* 2.0.0 (October 24, 2010)

  This version focuses on the experimental *allocm() API, and on improved
  run-time configuration/introspection.  Nonetheless, numerous performance
  improvements are also included.

  New features:
  - Implement the experimental {,r,s,d}allocm() API, which provides a superset
    of the functionality available via malloc(), calloc(), posix_memalign(),
    realloc(), malloc_usable_size(), and free().  These functions can be used to
    allocate/reallocate aligned zeroed memory, ask for optional extra memory
    during reallocation, prevent object movement during reallocation, etc.
  - Replace JEMALLOC_OPTIONS/JEMALLOC_PROF_PREFIX with MALLOC_CONF, which is
    more human-readable and more flexible.  For example:
      JEMALLOC_OPTIONS=AJP
    is now:
      MALLOC_CONF=abort:true,fill:true,stats_print:true
  - Port to Apple OS X.  Sponsored by Mozilla.
  - Make it possible for the application to control thread-->arena mappings via
    the "thread.arena" mallctl.
  - Add compile-time support for all TLS-related functionality via pthreads TSD.
    This is mainly of interest for OS X, which does not support TLS, but has a
    TSD implementation with similar performance.
  - Override memalign() and valloc() if they are provided by the system.
  - Add the "arenas.purge" mallctl, which can be used to synchronously purge all
    dirty unused pages.
  - Make cumulative heap profiling data optional, so that it is possible to
    limit the amount of memory consumed by heap profiling data structures.
  - Add per-thread allocation counters that can be accessed via the
    "thread.allocated" and "thread.deallocated" mallctls.
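
  The MALLOC_CONF example above is a comma-separated list of key:value pairs.
  A minimal C sketch of parsing that shape, restricted to the three boolean
  options shown; struct conf_opts and parse_conf() are hypothetical names, and
  jemalloc's own parser accepts many more options and value types:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical option set covering only the keys in the example above. */
struct conf_opts {
    bool opt_abort;
    bool opt_fill;
    bool opt_stats_print;
};

/* Return true if the first len bytes of s equal the string lit. */
static bool token_eq(const char *s, size_t len, const char *lit)
{
    return len == strlen(lit) && strncmp(s, lit, len) == 0;
}

/* Parse "key:value,key:value,..." with boolean values.  Returns 0 on
 * success, or -1 on a malformed pair or an unknown key. */
static int parse_conf(const char *conf, struct conf_opts *out)
{
    assert(conf != NULL && out != NULL);
    const char *p = conf;
    while (*p != '\0') {
        size_t klen = strcspn(p, ":,");
        if (p[klen] != ':')
            return -1;               /* key without a value */
        const char *v = p + klen + 1;
        size_t vlen = strcspn(v, ",");
        bool val;
        if (token_eq(v, vlen, "true"))
            val = true;
        else if (token_eq(v, vlen, "false"))
            val = false;
        else
            return -1;               /* not a boolean */

        if (token_eq(p, klen, "abort"))
            out->opt_abort = val;
        else if (token_eq(p, klen, "fill"))
            out->opt_fill = val;
        else if (token_eq(p, klen, "stats_print"))
            out->opt_stats_print = val;
        else
            return -1;               /* unknown key */

        p = v + vlen;
        if (*p == ',')
            p++;                     /* step over the separator */
    }
    return 0;
}
```

  Spelling each option as a named key is what makes the new format easier to
  read than the single-letter JEMALLOC_OPTIONS flags.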

  Incompatible changes:
  - Remove JEMALLOC_OPTIONS and malloc_options (see MALLOC_CONF above).
  - Increase default backtrace depth from 4 to 128 for heap profiling.
  - Disable interval-based profile dumps by default.

  Bug fixes:
  - Remove bad assertions in fork handler functions.  These assertions could
    cause aborts for some combinations of configure settings.
  - Fix strerror_r() usage to deal with non-standard semantics in GNU libc.
  - Fix leak context reporting.  This bug tended to cause the number of contexts
    to be underreported (though the reported numbers of objects and bytes were
    correct).
  - Fix a realloc() bug for large in-place growing reallocation.  This bug could
    cause memory corruption, but it was hard to trigger.
  - Fix an allocation bug for small allocations that could be triggered if
    multiple threads raced to create a new run of backing pages.
  - Enhance the heap profiler to trigger samples based on usable size, rather
    than request size.
  - Fix a heap profiling bug due to sometimes losing track of requested object
    size for sampled objects.
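
  GNU libc provides two strerror_r() variants: the XSI one returns an int and
  fills the caller's buffer, while the GNU one (selected by _GNU_SOURCE)
  returns a char * that may point at an immutable static string instead of the
  buffer.  A minimal sketch of a wrapper that copes with both, assuming glibc's
  usual feature-test macro behavior; portable_strerror() is a hypothetical
  name, not jemalloc's code:

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L  /* request the XSI strerror_r() by default */
#endif
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Return a readable message for errnum.  The result points either into buf
 * or, with the GNU variant, possibly at a static string owned by libc. */
static const char *portable_strerror(int errnum, char *buf, size_t buflen)
{
    assert(buf != NULL && buflen > 0);
#if defined(__GLIBC__) && defined(_GNU_SOURCE)
    /* GNU variant: the return value is the message; buf may be unused. */
    return strerror_r(errnum, buf, buflen);
#else
    /* XSI variant: returns 0 on success and writes the message into buf. */
    if (strerror_r(errnum, buf, buflen) != 0)
        snprintf(buf, buflen, "Unknown error %d", errnum);
    return buf;
#endif
}
```

  Callers must use the returned pointer rather than assume the text is in buf,
  since that assumption only holds for the XSI variant.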

* 1.0.3 (August 12, 2010)

  Bug fixes:
  - Fix the libunwind-based implementation of stack backtracing (used for heap
    profiling).  This bug could cause zero-length backtraces to be reported.
  - Add a missing mutex unlock in library initialization code.  If multiple
    threads raced to initialize malloc, some of them could end up permanently
    blocked.

* 1.0.2 (May 11, 2010)

  Bug fixes:
  - Fix junk filling of large objects, which could cause memory corruption.
  - Add MAP_NORESERVE support for chunk mapping, because otherwise virtual
    memory limits could cause swap file configuration to fail.  Contributed by
    Jordan DeLong.

* 1.0.1 (April 14, 2010)

  Bug fixes:
  - Fix compilation when --enable-fill is specified.
  - Fix threads-related profiling bugs that affected accuracy and caused memory
    to be leaked during thread exit.
  - Fix dirty page purging race conditions that could cause crashes.
  - Fix a crash in tcache flushing code during thread destruction.

* 1.0.0 (April 11, 2010)

  This release focuses on speed and run-time introspection.  Numerous
  algorithmic improvements make this release substantially faster than its
  predecessors.

  New features:
  - Implement an autoconf-based configuration system.
  - Add mallctl*(), for the purposes of introspection and run-time
    configuration.
  - Make it possible for the application to manually flush a thread's cache, via
    the "tcache.flush" mallctl.
  - Base the maximum dirty page count on a proportion of active memory.
  - Compute various additional run-time statistics, including per-size-class
    statistics for large objects.
  - Expose malloc_stats_print(), which can be called repeatedly by the
    application.
  - Simplify the malloc_message() signature to take only one string argument,
    and incorporate an opaque data pointer argument for use by the application
    in combination with malloc_stats_print().
  - Add support for allocation backed by one or more swap files, and allow the
    application to disable over-commit if swap files are in use.
  - Implement allocation profiling and leak checking.
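
  With the opaque pointer, statistics output is delivered through an
  application-supplied callback together with application state.  A minimal C
  sketch of such a callback that appends each string to a caller-owned buffer,
  assuming the void (*)(void *, const char *) callback type documented for
  malloc_stats_print() in current jemalloc manuals; struct stats_sink and
  sink_write() are hypothetical names:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical caller-owned sink, reached through the opaque pointer. */
struct stats_sink {
    char buf[4096];
    size_t len;       /* bytes used, excluding the terminating NUL */
};

/* Callback with the (void *cbopaque, const char *s) shape.  Output that does
 * not fit is silently truncated. */
static void sink_write(void *cbopaque, const char *s)
{
    struct stats_sink *sink = cbopaque;
    assert(sink != NULL && s != NULL);
    size_t room = sizeof(sink->buf) - 1 - sink->len;
    size_t n = strlen(s);
    if (n > room)
        n = room;
    memcpy(sink->buf + sink->len, s, n);
    sink->len += n;
    sink->buf[sink->len] = '\0';
}
```

  When linking against jemalloc, a zero-initialized struct stats_sink passed
  as malloc_stats_print(sink_write, &sink, NULL) would collect the report in
  sink.buf instead of sending it to the default stderr output.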

  Removed features:
  - Remove the dynamic arena rebalancing code, since thread-specific caching
    reduces its utility.

  Bug fixes:
  - Modify chunk allocation to work when address space layout randomization
    (ASLR) is in use.
  - Fix thread cleanup bugs related to TLS destruction.
  - Handle 0-size allocation requests in posix_memalign().
  - Fix a chunk leak.  The leaked chunks were never touched, so this impacted
    virtual memory usage, but not physical memory usage.
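
  POSIX leaves the zero-size case of posix_memalign() implementation-defined:
  on success, the returned pointer may be either a null pointer or a unique
  pointer that must not be dereferenced, and either may be passed to free().
  A minimal sketch of caller code that is correct under both behaviors;
  alloc_aligned() is a hypothetical helper:

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Allocate size bytes aligned to alignment (a power of two that is a multiple
 * of sizeof(void *)).  Returns 0 on success or an error number such as
 * ENOMEM or EINVAL.  For size == 0, *out may legitimately be NULL. */
static int alloc_aligned(void **out, size_t alignment, size_t size)
{
    assert(out != NULL);
    void *p = NULL;
    int err = posix_memalign(&p, alignment, size);
    if (err != 0) {
        *out = NULL;     /* don't rely on p after a failure */
        return err;
    }
    /* Either a null or a properly aligned pointer is a valid result. */
    assert(p == NULL || (uintptr_t)p % alignment == 0);
    *out = p;
    return 0;
}
```

  Because free(NULL) is a no-op, callers can release the result
  unconditionally without special-casing size 0.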

* linux_2008082[78]a (August 27/28, 2008)

  These snapshot releases are the simple result of incorporating Linux-specific
  support into the FreeBSD malloc sources.

--------------------------------------------------------------------------------
vim:filetype=text:textwidth=80