1*5ffd83dbSDimitry Andric// -*- C++ -*- 2*5ffd83dbSDimitry Andric//===--------------------------- barrier ----------------------------------===// 3*5ffd83dbSDimitry Andric// 4*5ffd83dbSDimitry Andric// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. 5*5ffd83dbSDimitry Andric// See https://llvm.org/LICENSE.txt for license information. 6*5ffd83dbSDimitry Andric// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception 7*5ffd83dbSDimitry Andric// 8*5ffd83dbSDimitry Andric//===----------------------------------------------------------------------===// 9*5ffd83dbSDimitry Andric 10*5ffd83dbSDimitry Andric#ifndef _LIBCPP_BARRIER 11*5ffd83dbSDimitry Andric#define _LIBCPP_BARRIER 12*5ffd83dbSDimitry Andric 13*5ffd83dbSDimitry Andric/* 14*5ffd83dbSDimitry Andric barrier synopsis 15*5ffd83dbSDimitry Andric 16*5ffd83dbSDimitry Andricnamespace std 17*5ffd83dbSDimitry Andric{ 18*5ffd83dbSDimitry Andric 19*5ffd83dbSDimitry Andric template<class CompletionFunction = see below> 20*5ffd83dbSDimitry Andric class barrier 21*5ffd83dbSDimitry Andric { 22*5ffd83dbSDimitry Andric public: 23*5ffd83dbSDimitry Andric using arrival_token = see below; 24*5ffd83dbSDimitry Andric 25*5ffd83dbSDimitry Andric constexpr explicit barrier(ptrdiff_t phase_count, 26*5ffd83dbSDimitry Andric CompletionFunction f = CompletionFunction()); 27*5ffd83dbSDimitry Andric ~barrier(); 28*5ffd83dbSDimitry Andric 29*5ffd83dbSDimitry Andric barrier(const barrier&) = delete; 30*5ffd83dbSDimitry Andric barrier& operator=(const barrier&) = delete; 31*5ffd83dbSDimitry Andric 32*5ffd83dbSDimitry Andric [[nodiscard]] arrival_token arrive(ptrdiff_t update = 1); 33*5ffd83dbSDimitry Andric void wait(arrival_token&& arrival) const; 34*5ffd83dbSDimitry Andric 35*5ffd83dbSDimitry Andric void arrive_and_wait(); 36*5ffd83dbSDimitry Andric void arrive_and_drop(); 37*5ffd83dbSDimitry Andric 38*5ffd83dbSDimitry Andric private: 39*5ffd83dbSDimitry Andric CompletionFunction completion; // exposition only 40*5ffd83dbSDimitry Andric }; 41*5ffd83dbSDimitry Andric 42*5ffd83dbSDimitry Andric} 43*5ffd83dbSDimitry Andric 44*5ffd83dbSDimitry Andric*/ 45*5ffd83dbSDimitry Andric 46*5ffd83dbSDimitry Andric#include <__config> 47*5ffd83dbSDimitry Andric#include <atomic> 48*5ffd83dbSDimitry Andric#ifndef _LIBCPP_HAS_NO_TREE_BARRIER 49*5ffd83dbSDimitry Andric# include <memory> 50*5ffd83dbSDimitry Andric#endif 51*5ffd83dbSDimitry Andric 52*5ffd83dbSDimitry Andric#if !defined(_LIBCPP_HAS_NO_PRAGMA_SYSTEM_HEADER) 53*5ffd83dbSDimitry Andric#pragma GCC system_header 54*5ffd83dbSDimitry Andric#endif 55*5ffd83dbSDimitry Andric 56*5ffd83dbSDimitry Andric#ifdef _LIBCPP_HAS_NO_THREADS 57*5ffd83dbSDimitry Andric# error <barrier> is not supported on this single threaded system 58*5ffd83dbSDimitry Andric#endif 59*5ffd83dbSDimitry Andric 60*5ffd83dbSDimitry Andric#if _LIBCPP_STD_VER >= 14 61*5ffd83dbSDimitry Andric 62*5ffd83dbSDimitry Andric_LIBCPP_BEGIN_NAMESPACE_STD 63*5ffd83dbSDimitry Andric 64*5ffd83dbSDimitry Andricstruct __empty_completion 65*5ffd83dbSDimitry Andric{ 66*5ffd83dbSDimitry Andric inline _LIBCPP_INLINE_VISIBILITY 67*5ffd83dbSDimitry Andric void operator()() noexcept 68*5ffd83dbSDimitry Andric { 69*5ffd83dbSDimitry Andric } 70*5ffd83dbSDimitry Andric}; 71*5ffd83dbSDimitry Andric 72*5ffd83dbSDimitry Andric#ifndef _LIBCPP_HAS_NO_TREE_BARRIER 73*5ffd83dbSDimitry Andric 74*5ffd83dbSDimitry Andric/* 75*5ffd83dbSDimitry Andric 76*5ffd83dbSDimitry AndricThe default implementation of __barrier_base is a classic tree barrier. 77*5ffd83dbSDimitry Andric 78*5ffd83dbSDimitry AndricIt looks different from literature pseudocode for two main reasons: 79*5ffd83dbSDimitry Andric 1. Threads that call into std::barrier functions do not provide indices, 80*5ffd83dbSDimitry Andric so a numbering step is added before the actual barrier algorithm, 81*5ffd83dbSDimitry Andric appearing as an N+1 round to the N rounds of the tree barrier. 82*5ffd83dbSDimitry Andric 2. A great deal of attention has been paid to avoid cache line thrashing 83*5ffd83dbSDimitry Andric by flattening the tree structure into cache-line sized arrays, that 84*5ffd83dbSDimitry Andric are indexed in an efficient way. 85*5ffd83dbSDimitry Andric 86*5ffd83dbSDimitry Andric*/ 87*5ffd83dbSDimitry Andric 88*5ffd83dbSDimitry Andricusing __barrier_phase_t = uint8_t; 89*5ffd83dbSDimitry Andric 90*5ffd83dbSDimitry Andricclass __barrier_algorithm_base; 91*5ffd83dbSDimitry Andric 92*5ffd83dbSDimitry Andric_LIBCPP_AVAILABILITY_SYNC _LIBCPP_EXPORTED_FROM_ABI 93*5ffd83dbSDimitry Andric__barrier_algorithm_base* __construct_barrier_algorithm_base(ptrdiff_t& __expected); 94*5ffd83dbSDimitry Andric 95*5ffd83dbSDimitry Andric_LIBCPP_AVAILABILITY_SYNC _LIBCPP_EXPORTED_FROM_ABI 96*5ffd83dbSDimitry Andricbool __arrive_barrier_algorithm_base(__barrier_algorithm_base* __barrier, 97*5ffd83dbSDimitry Andric __barrier_phase_t __old_phase); 98*5ffd83dbSDimitry Andric 99*5ffd83dbSDimitry Andric_LIBCPP_AVAILABILITY_SYNC _LIBCPP_EXPORTED_FROM_ABI 100*5ffd83dbSDimitry Andricvoid __destroy_barrier_algorithm_base(__barrier_algorithm_base* __barrier); 101*5ffd83dbSDimitry Andric 102*5ffd83dbSDimitry Andrictemplate<class _CompletionF> 103*5ffd83dbSDimitry Andricclass __barrier_base { 104*5ffd83dbSDimitry Andric 105*5ffd83dbSDimitry Andric ptrdiff_t __expected; 106*5ffd83dbSDimitry Andric unique_ptr<__barrier_algorithm_base, 107*5ffd83dbSDimitry Andric void (*)(__barrier_algorithm_base*)> __base; 108*5ffd83dbSDimitry Andric __atomic_base<ptrdiff_t> __expected_adjustment; 109*5ffd83dbSDimitry Andric _CompletionF __completion; 110*5ffd83dbSDimitry Andric __atomic_base<__barrier_phase_t> __phase; 111*5ffd83dbSDimitry Andric 112*5ffd83dbSDimitry Andricpublic: 113*5ffd83dbSDimitry Andric using arrival_token = __barrier_phase_t; 114*5ffd83dbSDimitry Andric 115*5ffd83dbSDimitry Andric static constexpr ptrdiff_t max() noexcept { 116*5ffd83dbSDimitry Andric return numeric_limits<ptrdiff_t>::max(); 117*5ffd83dbSDimitry Andric } 118*5ffd83dbSDimitry Andric 119*5ffd83dbSDimitry Andric _LIBCPP_AVAILABILITY_SYNC _LIBCPP_INLINE_VISIBILITY 120*5ffd83dbSDimitry Andric __barrier_base(ptrdiff_t __expected, _CompletionF __completion = _CompletionF()) 121*5ffd83dbSDimitry Andric : __expected(__expected), __base(__construct_barrier_algorithm_base(this->__expected), 122*5ffd83dbSDimitry Andric &__destroy_barrier_algorithm_base), 123*5ffd83dbSDimitry Andric __expected_adjustment(0), __completion(move(__completion)), __phase(0) 124*5ffd83dbSDimitry Andric { 125*5ffd83dbSDimitry Andric } 126*5ffd83dbSDimitry Andric [[nodiscard]] _LIBCPP_AVAILABILITY_SYNC _LIBCPP_INLINE_VISIBILITY 127*5ffd83dbSDimitry Andric arrival_token arrive(ptrdiff_t update) 128*5ffd83dbSDimitry Andric { 129*5ffd83dbSDimitry Andric auto const __old_phase = __phase.load(memory_order_relaxed); 130*5ffd83dbSDimitry Andric for(; update; --update) 131*5ffd83dbSDimitry Andric if(__arrive_barrier_algorithm_base(__base.get(), __old_phase)) { 132*5ffd83dbSDimitry Andric __completion(); 133*5ffd83dbSDimitry Andric __expected += __expected_adjustment.load(memory_order_relaxed); 134*5ffd83dbSDimitry Andric __expected_adjustment.store(0, memory_order_relaxed); 135*5ffd83dbSDimitry Andric __phase.store(__old_phase + 2, memory_order_release); 136*5ffd83dbSDimitry Andric __phase.notify_all(); 137*5ffd83dbSDimitry Andric } 138*5ffd83dbSDimitry Andric return __old_phase; 139*5ffd83dbSDimitry Andric } 140*5ffd83dbSDimitry Andric _LIBCPP_AVAILABILITY_SYNC _LIBCPP_INLINE_VISIBILITY 141*5ffd83dbSDimitry Andric void wait(arrival_token&& __old_phase) const 142*5ffd83dbSDimitry Andric { 143*5ffd83dbSDimitry Andric auto const __test_fn = [=]() -> bool { 144*5ffd83dbSDimitry Andric return __phase.load(memory_order_acquire) != __old_phase; 145*5ffd83dbSDimitry Andric }; 146*5ffd83dbSDimitry Andric __libcpp_thread_poll_with_backoff(__test_fn, __libcpp_timed_backoff_policy()); 147*5ffd83dbSDimitry Andric } 148*5ffd83dbSDimitry Andric _LIBCPP_AVAILABILITY_SYNC _LIBCPP_INLINE_VISIBILITY 149*5ffd83dbSDimitry Andric void arrive_and_drop() 150*5ffd83dbSDimitry Andric { 151*5ffd83dbSDimitry Andric __expected_adjustment.fetch_sub(1, memory_order_relaxed); 152*5ffd83dbSDimitry Andric (void)arrive(1); 153*5ffd83dbSDimitry Andric } 154*5ffd83dbSDimitry Andric}; 155*5ffd83dbSDimitry Andric 156*5ffd83dbSDimitry Andric#else 157*5ffd83dbSDimitry Andric 158*5ffd83dbSDimitry Andric/* 159*5ffd83dbSDimitry Andric 160*5ffd83dbSDimitry AndricThe alternative implementation of __barrier_base is a central barrier. 161*5ffd83dbSDimitry Andric 162*5ffd83dbSDimitry AndricTwo versions of this algorithm are provided: 163*5ffd83dbSDimitry Andric 1. A fairly straightforward implementation of the litterature for the 164*5ffd83dbSDimitry Andric general case where the completion function is not empty. 165*5ffd83dbSDimitry Andric 2. An optimized implementation that exploits 2's complement arithmetic 166*5ffd83dbSDimitry Andric and well-defined overflow in atomic arithmetic, to handle the phase 167*5ffd83dbSDimitry Andric roll-over for free. 168*5ffd83dbSDimitry Andric 169*5ffd83dbSDimitry Andric*/ 170*5ffd83dbSDimitry Andric 171*5ffd83dbSDimitry Andrictemplate<class _CompletionF> 172*5ffd83dbSDimitry Andricclass __barrier_base { 173*5ffd83dbSDimitry Andric 174*5ffd83dbSDimitry Andric __atomic_base<ptrdiff_t> __expected; 175*5ffd83dbSDimitry Andric __atomic_base<ptrdiff_t> __arrived; 176*5ffd83dbSDimitry Andric _CompletionF __completion; 177*5ffd83dbSDimitry Andric __atomic_base<bool> __phase; 178*5ffd83dbSDimitry Andricpublic: 179*5ffd83dbSDimitry Andric using arrival_token = bool; 180*5ffd83dbSDimitry Andric 181*5ffd83dbSDimitry Andric static constexpr ptrdiff_t max() noexcept { 182*5ffd83dbSDimitry Andric return numeric_limits<ptrdiff_t>::max(); 183*5ffd83dbSDimitry Andric } 184*5ffd83dbSDimitry Andric 185*5ffd83dbSDimitry Andric _LIBCPP_INLINE_VISIBILITY 186*5ffd83dbSDimitry Andric __barrier_base(ptrdiff_t __expected, _CompletionF __completion = _CompletionF()) 187*5ffd83dbSDimitry Andric : __expected(__expected), __arrived(__expected), __completion(move(__completion)), __phase(false) 188*5ffd83dbSDimitry Andric { 189*5ffd83dbSDimitry Andric } 190*5ffd83dbSDimitry Andric [[nodiscard]] _LIBCPP_AVAILABILITY_SYNC _LIBCPP_INLINE_VISIBILITY 191*5ffd83dbSDimitry Andric arrival_token arrive(ptrdiff_t update) 192*5ffd83dbSDimitry Andric { 193*5ffd83dbSDimitry Andric auto const __old_phase = __phase.load(memory_order_relaxed); 194*5ffd83dbSDimitry Andric auto const __result = __arrived.fetch_sub(update, memory_order_acq_rel) - update; 195*5ffd83dbSDimitry Andric auto const new_expected = __expected.load(memory_order_relaxed); 196*5ffd83dbSDimitry Andric if(0 == __result) { 197*5ffd83dbSDimitry Andric __completion(); 198*5ffd83dbSDimitry Andric __arrived.store(new_expected, memory_order_relaxed); 199*5ffd83dbSDimitry Andric __phase.store(!__old_phase, memory_order_release); 200*5ffd83dbSDimitry Andric __phase.notify_all(); 201*5ffd83dbSDimitry Andric } 202*5ffd83dbSDimitry Andric return __old_phase; 203*5ffd83dbSDimitry Andric } 204*5ffd83dbSDimitry Andric _LIBCPP_AVAILABILITY_SYNC _LIBCPP_INLINE_VISIBILITY 205*5ffd83dbSDimitry Andric void wait(arrival_token&& __old_phase) const 206*5ffd83dbSDimitry Andric { 207*5ffd83dbSDimitry Andric __phase.wait(__old_phase, memory_order_acquire); 208*5ffd83dbSDimitry Andric } 209*5ffd83dbSDimitry Andric _LIBCPP_AVAILABILITY_SYNC _LIBCPP_INLINE_VISIBILITY 210*5ffd83dbSDimitry Andric void arrive_and_drop() 211*5ffd83dbSDimitry Andric { 212*5ffd83dbSDimitry Andric __expected.fetch_sub(1, memory_order_relaxed); 213*5ffd83dbSDimitry Andric (void)arrive(1); 214*5ffd83dbSDimitry Andric } 215*5ffd83dbSDimitry Andric}; 216*5ffd83dbSDimitry Andric 217*5ffd83dbSDimitry Andrictemplate<> 218*5ffd83dbSDimitry Andricclass __barrier_base<__empty_completion> { 219*5ffd83dbSDimitry Andric 220*5ffd83dbSDimitry Andric static constexpr uint64_t __expected_unit = 1ull; 221*5ffd83dbSDimitry Andric static constexpr uint64_t __arrived_unit = 1ull << 32; 222*5ffd83dbSDimitry Andric static constexpr uint64_t __expected_mask = __arrived_unit - 1; 223*5ffd83dbSDimitry Andric static constexpr uint64_t __phase_bit = 1ull << 63; 224*5ffd83dbSDimitry Andric static constexpr uint64_t __arrived_mask = (__phase_bit - 1) & ~__expected_mask; 225*5ffd83dbSDimitry Andric 226*5ffd83dbSDimitry Andric __atomic_base<uint64_t> __phase_arrived_expected; 227*5ffd83dbSDimitry Andric 228*5ffd83dbSDimitry Andric static _LIBCPP_INLINE_VISIBILITY 229*5ffd83dbSDimitry Andric constexpr uint64_t __init(ptrdiff_t __count) _NOEXCEPT 230*5ffd83dbSDimitry Andric { 231*5ffd83dbSDimitry Andric return ((uint64_t(1u << 31) - __count) << 32) 232*5ffd83dbSDimitry Andric | (uint64_t(1u << 31) - __count); 233*5ffd83dbSDimitry Andric } 234*5ffd83dbSDimitry Andric 235*5ffd83dbSDimitry Andricpublic: 236*5ffd83dbSDimitry Andric using arrival_token = uint64_t; 237*5ffd83dbSDimitry Andric 238*5ffd83dbSDimitry Andric static constexpr ptrdiff_t max() noexcept { 239*5ffd83dbSDimitry Andric return ptrdiff_t(1u << 31) - 1; 240*5ffd83dbSDimitry Andric } 241*5ffd83dbSDimitry Andric 242*5ffd83dbSDimitry Andric _LIBCPP_INLINE_VISIBILITY 243*5ffd83dbSDimitry Andric explicit inline __barrier_base(ptrdiff_t __count, __empty_completion = __empty_completion()) 244*5ffd83dbSDimitry Andric : __phase_arrived_expected(__init(__count)) 245*5ffd83dbSDimitry Andric { 246*5ffd83dbSDimitry Andric } 247*5ffd83dbSDimitry Andric [[nodiscard]] inline _LIBCPP_AVAILABILITY_SYNC _LIBCPP_INLINE_VISIBILITY 248*5ffd83dbSDimitry Andric arrival_token arrive(ptrdiff_t update) 249*5ffd83dbSDimitry Andric { 250*5ffd83dbSDimitry Andric auto const __inc = __arrived_unit * update; 251*5ffd83dbSDimitry Andric auto const __old = __phase_arrived_expected.fetch_add(__inc, memory_order_acq_rel); 252*5ffd83dbSDimitry Andric if((__old ^ (__old + __inc)) & __phase_bit) { 253*5ffd83dbSDimitry Andric __phase_arrived_expected.fetch_add((__old & __expected_mask) << 32, memory_order_relaxed); 254*5ffd83dbSDimitry Andric __phase_arrived_expected.notify_all(); 255*5ffd83dbSDimitry Andric } 256*5ffd83dbSDimitry Andric return __old & __phase_bit; 257*5ffd83dbSDimitry Andric } 258*5ffd83dbSDimitry Andric inline _LIBCPP_AVAILABILITY_SYNC _LIBCPP_INLINE_VISIBILITY 259*5ffd83dbSDimitry Andric void wait(arrival_token&& __phase) const 260*5ffd83dbSDimitry Andric { 261*5ffd83dbSDimitry Andric auto const __test_fn = [=]() -> bool { 262*5ffd83dbSDimitry Andric uint64_t const __current = __phase_arrived_expected.load(memory_order_acquire); 263*5ffd83dbSDimitry Andric return ((__current & __phase_bit) != __phase); 264*5ffd83dbSDimitry Andric }; 265*5ffd83dbSDimitry Andric __libcpp_thread_poll_with_backoff(__test_fn, __libcpp_timed_backoff_policy()); 266*5ffd83dbSDimitry Andric } 267*5ffd83dbSDimitry Andric inline _LIBCPP_AVAILABILITY_SYNC _LIBCPP_INLINE_VISIBILITY 268*5ffd83dbSDimitry Andric void arrive_and_drop() 269*5ffd83dbSDimitry Andric { 270*5ffd83dbSDimitry Andric __phase_arrived_expected.fetch_add(__expected_unit, memory_order_relaxed); 271*5ffd83dbSDimitry Andric (void)arrive(1); 272*5ffd83dbSDimitry Andric } 273*5ffd83dbSDimitry Andric}; 274*5ffd83dbSDimitry Andric 275*5ffd83dbSDimitry Andric#endif //_LIBCPP_HAS_NO_TREE_BARRIER 276*5ffd83dbSDimitry Andric 277*5ffd83dbSDimitry Andrictemplate<class _CompletionF = __empty_completion> 278*5ffd83dbSDimitry Andricclass barrier { 279*5ffd83dbSDimitry Andric 280*5ffd83dbSDimitry Andric __barrier_base<_CompletionF> __b; 281*5ffd83dbSDimitry Andricpublic: 282*5ffd83dbSDimitry Andric using arrival_token = typename __barrier_base<_CompletionF>::arrival_token; 283*5ffd83dbSDimitry Andric 284*5ffd83dbSDimitry Andric static constexpr ptrdiff_t max() noexcept { 285*5ffd83dbSDimitry Andric return __barrier_base<_CompletionF>::max(); 286*5ffd83dbSDimitry Andric } 287*5ffd83dbSDimitry Andric 288*5ffd83dbSDimitry Andric _LIBCPP_AVAILABILITY_SYNC _LIBCPP_INLINE_VISIBILITY 289*5ffd83dbSDimitry Andric barrier(ptrdiff_t __count, _CompletionF __completion = _CompletionF()) 290*5ffd83dbSDimitry Andric : __b(__count, std::move(__completion)) { 291*5ffd83dbSDimitry Andric } 292*5ffd83dbSDimitry Andric 293*5ffd83dbSDimitry Andric barrier(barrier const&) = delete; 294*5ffd83dbSDimitry Andric barrier& operator=(barrier const&) = delete; 295*5ffd83dbSDimitry Andric 296*5ffd83dbSDimitry Andric [[nodiscard]] _LIBCPP_AVAILABILITY_SYNC _LIBCPP_INLINE_VISIBILITY 297*5ffd83dbSDimitry Andric arrival_token arrive(ptrdiff_t update = 1) 298*5ffd83dbSDimitry Andric { 299*5ffd83dbSDimitry Andric return __b.arrive(update); 300*5ffd83dbSDimitry Andric } 301*5ffd83dbSDimitry Andric _LIBCPP_AVAILABILITY_SYNC _LIBCPP_INLINE_VISIBILITY 302*5ffd83dbSDimitry Andric void wait(arrival_token&& __phase) const 303*5ffd83dbSDimitry Andric { 304*5ffd83dbSDimitry Andric __b.wait(std::move(__phase)); 305*5ffd83dbSDimitry Andric } 306*5ffd83dbSDimitry Andric _LIBCPP_AVAILABILITY_SYNC _LIBCPP_INLINE_VISIBILITY 307*5ffd83dbSDimitry Andric void arrive_and_wait() 308*5ffd83dbSDimitry Andric { 309*5ffd83dbSDimitry Andric wait(arrive()); 310*5ffd83dbSDimitry Andric } 311*5ffd83dbSDimitry Andric _LIBCPP_AVAILABILITY_SYNC _LIBCPP_INLINE_VISIBILITY 312*5ffd83dbSDimitry Andric void arrive_and_drop() 313*5ffd83dbSDimitry Andric { 314*5ffd83dbSDimitry Andric __b.arrive_and_drop(); 315*5ffd83dbSDimitry Andric } 316*5ffd83dbSDimitry Andric}; 317*5ffd83dbSDimitry Andric 318*5ffd83dbSDimitry Andric_LIBCPP_END_NAMESPACE_STD 319*5ffd83dbSDimitry Andric 320*5ffd83dbSDimitry Andric#endif // _LIBCPP_STD_VER >= 14 321*5ffd83dbSDimitry Andric 322*5ffd83dbSDimitry Andric#endif //_LIBCPP_BARRIER 323