# CDDL HEADER START
#
# The contents of this file are subject to the terms of the
# Common Development and Distribution License (the "License").
# You may not use this file except in compliance with the License.
#
# You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
# or http://www.opensolaris.org/os/licensing.
# See the License for the specific language governing permissions
# and limitations under the License.
#
# When distributing Covered Code, include this CDDL HEADER in each
# file and include the License file at usr/src/OPENSOLARIS.LICENSE.
# If applicable, add the following below this CDDL HEADER, with the
# fields enclosed by brackets "[]" replaced with your own identifying
# information: Portions Copyright [yyyy] [name of copyright owner]
#
# CDDL HEADER END
#
#
# Copyright 2008 Sun Microsystems, Inc.  All rights reserved.
# Use is subject to license terms.
#

TITLE: Dynamic Memory Implementation Overview

DATE:  10/13/2000

AUTHOR: Jim Guerrera (james.guerrera@east)


1.0 Dynamic Memory Implementation in the SCM Module

The system memory allocation required by the Storage Cache Manager (SCM)
has been modified to conform more fully to the requirements of the Solaris
OS. The previous implementation allocated the total memory requirement of
the package 'up front' during bootup and never released it. The current
implementation performs 'on demand' allocations piecemeal, at the time
memory is required. In addition, the requisitioned memory is released
back to the system at some later time.

2.0 Implementation

2.1 Memory Allocation

The memory allocation involves modifications primarily to sd_alloc_buf()
in module sd_bcache.c. When a request is received for cache and system
resources it is broken down and each piece is categorized both as an
independent entity and as a member of a group with close neighbors. Cache
resources comprise cache control entries (ccent), write control entries
(wctrl for FWC support) and system memory. The current allocation algorithm
for ccent and wctrl remains the same. The memory allocation has been modified
and falls into two general categories - single page and multi-page
allocations.

2.1.1 A single page allocation means exactly that - the ccent points to and
owns one page of system memory. If two or more ccent are requisitioned to
support the caching request (a multi-page allocation) then only the first
entry in the group actually owns the allocated memory of two or more pages.
The secondary entries simply point to page boundaries within this larger
piece of contiguous memory. The first entry is termed a host and the
secondaries are termed parasites.

The process for determining what is a host, a parasite or anything else is
done in three phases. Phase one simply determines whether the caching request
references a disk area already in cache and marks it as such. If it is not
in cache it is typed as eligible - i.e. needing memory allocation. Phase
two scans this list of typed cache entries and, based on its immediate
neighbors, categorizes each eligible entry as host or pest (parasite), or
downgrades it to other. A host can only exist if there is one or more
eligible entries immediately following it and it itself either starts the
list or immediately follows a non-eligible entry. If either condition proves
false the category remains as eligible (i.e. needs memory allocation) but
the type is cleared to not host (i.e. other). Phase three is simply a matter
of scanning the cache entry list and allocating multipage memory for hosts,
single page entries for others or simply setting up pointers in the
parasitic entries into their corresponding host multipage memory
allocation block.
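
The phase two rules can be illustrated with a short Python sketch. It is a
simplified model under stated assumptions, not the driver code: the function
classify(), its tag strings and the max_list parameter (standing in for
sdbc_max_dyn_list) are hypothetical, and the rule that a run longer than
max_list is split into further chains is an assumption that the text above
does not spell out.

```python
def classify(in_cache, max_list=8):
    """Tag each entry of a request as 'cached', 'host', 'pest' or 'other'.

    in_cache is a list of booleans, one per cache entry (phase one result).
    max_list is a hypothetical stand-in for sdbc_max_dyn_list.
    """
    tags = []
    i, n = 0, len(in_cache)
    while i < n:
        if in_cache[i]:
            tags.append("cached")          # already in cache, no allocation
            i += 1
            continue
        # Measure the run of consecutive eligible entries starting at i.
        j = i
        while j < n and not in_cache[j]:
            j += 1
        run = j - i
        # ASSUMPTION: a run longer than max_list is cut into several chains.
        while run > 0:
            chain = min(run, max_list)
            if chain >= 2:
                tags.extend(["host"] + ["pest"] * (chain - 1))
            else:
                tags.append("other")       # lone eligible entry: single page
            run -= chain
        i = j
    return tags
```

For example, classify([False, False, False, True, False]) yields host, pest,
pest, cached, other. With max_list=1 every eligible entry becomes other,
which matches the statement that a list length of 1 removes the concept of
host/parasite.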

2.1.2 The maximum number of parasitic entries following a host memory
allocation is adjustable by the system administrator. The details of this
are under the description of the KSTAT interface (Sec. 3.0).

2.2 Memory Deallocation

Memory deallocation is implemented in sd_dealloc_dm() in module sd_io.c.
This possibly overly complicated routine works as follows:

In general the routine sleeps a specified amount of time then wakes and
examines the entire centry list. If an entry is available (i.e. not in use
by another thread and has memory which may be deallocated) it takes
possession and ages the centry by one tick. It then determines if the
centry has aged sufficiently to have its memory deallocated and for it to
be placed at the top of the lru.

2.3 There are two general deallocation schemes in place depending on
whether the centry is a single page allocation centry or a member
of a host/parasite multipage allocation chain.

2.3.1 The behavior for a single page allocation centry is as follows:

If the given centry is selected as a 'holdover' it will age normally;
however, at full aging it will only be placed at the head of the lru.
Its memory will not be deallocated until a further aging level has
been reached. The entries selected for this behavior are governed by
counting the number of these holdovers in existence on each wakeup
and comparing it to a specified percentage. This comparison is always
one cycle out of date and will float in the relative vicinity of the
specified number.

In addition there is a placeholder for centries identified as 'sticky
meta-data' with its own aging counter. It operates exactly as the holdover
entries with regard to aging but is absolute - i.e. no percentage governs
the number of such entries.
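
The single page aging rules can be sketched as follows. This is only an
illustrative model: the dictionary fields and the functions age_tick() and
run_until_dealloc() are hypothetical, and in particular the assumption that
a holdover gives up its memory once its age reaches ct1 + ct3 is not taken
from the driver. The text says only that 'a further aging level' must be
reached.

```python
def age_tick(entry, ct1=3, ct3=3):
    """Age one available single page entry by one tick; return the action.

    entry is a dict with 'age' (int), 'holdover' (bool), 'has_memory' (bool).
    ct1 and ct3 stand in for sdbc_cache_aging_ct1 and sdbc_cache_aging_ct3.
    """
    entry["age"] += 1
    if entry["age"] < ct1:
        return "keep"                     # not fully aged yet
    # ASSUMPTION: a holdover keeps its memory until age reaches ct1 + ct3.
    if entry["holdover"] and entry["age"] < ct1 + ct3:
        return "requeue_keep_memory"      # on the lru, memory retained
    entry["has_memory"] = False
    return "dealloc_and_requeue"          # memory released, back on the lru


def run_until_dealloc(holdover):
    """Return the actions taken from the first tick until memory is freed."""
    entry = {"age": 0, "holdover": holdover, "has_memory": True}
    actions = []
    while entry["has_memory"]:
        actions.append(age_tick(entry))
    return actions
```

Under these assumptions and the default counts, an ordinary entry loses its
memory on the third tick while a holdover keeps it until the sixth.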

2.3.2 The percentage and additional aging count are adjustable by the
system administrator. The details of this are under the description of
the KSTAT interface (Sec. 3.0).

2.3.3 The behavior for a host/parasite chain is as follows:

The host/parasite subchain is examined. If all entries are fully aged the
entire chain is removed - i.e. memory is deallocated from the host centry,
all centry fields are cleared and each entry is requeued on to the lru.

There are three sleep times and two percentage levels specifiable by the
system administrator. A meaningful relationship between these variables
is:

sleeptime1 >= sleeptime2 >= sleeptime3 and
100% >= pcntfree1 >= pcntfree2 >= 0%

sleeptime1 is honored between 100% free and pcntfree1. sleeptime2 is
honored between pcntfree1 and pcntfree2. sleeptime3 is honored between
pcntfree2 and 0% free. The general thrust here is to automatically
adjust sleep time to centry load.
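
A minimal Python sketch of this three-level selection is given here. The
function name pick_sleep() is hypothetical, the default values are the ones
listed under Defaults (Sec. 3.5), and which side of each threshold is
inclusive is an assumption, since the text does not say.

```python
def pick_sleep(pcnt_free, sleeptimes=(10, 5, 1), pcntfree1=50, pcntfree2=25):
    """Choose the sleep time (seconds) for a given percent of free entries."""
    sleeptime1, sleeptime2, sleeptime3 = sleeptimes
    if pcnt_free >= pcntfree1:      # 100% .. pcntfree1 (boundary assumed)
        return sleeptime1
    if pcnt_free >= pcntfree2:      # pcntfree1 .. pcntfree2
        return sleeptime2
    return sleeptime3               # pcntfree2 .. 0%
```

With the defaults, 80% free selects the 10 second sleep, 40% free selects
5 seconds and 10% free selects 1 second, so the thread wakes more often as
fewer cache entries remain free.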

In addition there exists an accelerated aging flag which mimics hysteresis
behavior. If the available centries fall between pcntfree1 and pcntfree2
an 8 bit counter is switched on. The effect is to keep the timer value
at sleeptime2 for 8 cycles even if the number of available cache entries
drifts above pcntfree1. If it falls below pcntfree2 an additional 8 bit
counter is switched on. This causes the sleep timer to remain at sleeptime3
for at least 8 cycles even if it floats above pcntfree2 or even pcntfree1.
The overall effect of this is to accelerate the release of system resources
under what the thread thinks is a heavy load as measured by the number of
used cache entries.
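
The hysteresis behavior can be modeled roughly as below. This sketch uses
two plain countdown counters of 8 cycles each. How the driver actually
encodes its 8 bit counters is not described here, so the class SleepPicker,
its fields, and the choice to count the triggering cycle as the first of
the 8 are all assumptions made for illustration.

```python
HOLD_CYCLES = 8  # cycles a lower sleep level stays pinned once triggered


class SleepPicker:
    """Pick a sleep time with a simple hysteresis model (hypothetical)."""

    def __init__(self, sleeptimes=(10, 5, 1), pcntfree1=50, pcntfree2=25):
        self.sleeptimes = sleeptimes
        self.pcntfree1 = pcntfree1
        self.pcntfree2 = pcntfree2
        self.hold2 = 0   # cycles left pinned at sleeptime2
        self.hold3 = 0   # cycles left pinned at sleeptime3

    def next_sleep(self, pcnt_free):
        """Return the sleep time for this wake cycle."""
        if pcnt_free < self.pcntfree2:
            self.hold3 = HOLD_CYCLES      # heavy load: (re)arm level 3
        elif pcnt_free < self.pcntfree1:
            self.hold2 = HOLD_CYCLES      # moderate load: (re)arm level 2
        if self.hold3 > 0:
            self.hold3 -= 1
            return self.sleeptimes[2]
        if self.hold2 > 0:
            self.hold2 -= 1
            return self.sleeptimes[1]
        return self.sleeptimes[0]
```

In this model a single dip below pcntfree2 keeps the thread at sleeptime3
for 8 wake cycles, even if the free percentage recovers immediately.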

3.0 Dynamic Memory Tuning

A number of behavior modification variables are accessible via system calls
to the kstat library. A sample program exercising the various features can
be found in ./src/cmd/ns/sdbc/sdbc_dynmem.c. In addition the behavior variable
identifiers can be placed in the sdbc.conf file and will take effect on bootup.
There are also a number of dynamic memory statistics available to gauge its
current state.

3.1 Behavior Variables

sdbc_monitor_dynmem --- D0=monitor thread shutdown in the console window
                        D1=print deallocation thread stats to the console
                        window
                        D2=print more deallocation thread stats to the console
                        window
                        (usage: setting a value of 6 = 2+4 sets D1 and D2)
sdbc_max_dyn_list ----- 1 to ?: sets the maximum host/parasite list length
                        (A length of 1 prevents any multipage allocations from
                        occurring and effectively removes the concept of
                        host/parasite.)
sdbc_cache_aging_ct1 -- 1 to 255: fully aged count (everything but meta and
                        holdover)
sdbc_cache_aging_ct2 -- 1 to 255: fully aged count for meta-data entries
sdbc_cache_aging_ct3 -- 1 to 255: fully aged count for holdovers
sdbc_cache_aging_sec1 - 1 to 255: sleep level 1 for 100% to pcnt1 free cache
                        entries
sdbc_cache_aging_sec2 - 1 to 255: sleep level 2 for pcnt1 to pcnt2 free cache
                        entries
sdbc_cache_aging_sec3 - 1 to 255: sleep level 3 for pcnt2 to 0% free cache
                        entries
sdbc_cache_aging_pcnt1- 0 to 100: cache free percent for transition from
                        sleep1 to sleep2
sdbc_cache_aging_pcnt2- 0 to 100: cache free percent for transition from
                        sleep2 to sleep3
sdbc_max_holds_pcnt --- 0 to 100: max percent of cache entries to be maintained
                        as holdovers
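
The D0/D1/D2 notation used for sdbc_monitor_dynmem denotes bit positions,
so D1 has the value 2 and D2 the value 4. The small Python sketch below
shows how such a value is composed and decoded. The helper names and the
label strings are hypothetical and exist only for this example.

```python
# Bit positions D0..D2 of sdbc_monitor_dynmem; label strings are hypothetical.
MONITOR_BITS = {
    0: "D0 monitor thread shutdown",
    1: "D1 deallocation thread stats",
    2: "D2 more deallocation thread stats",
}


def encode_monitor(*bits):
    """Combine bit positions (0, 1, 2) into a single integer value."""
    value = 0
    for bit in bits:
        value |= 1 << bit
    return value


def decode_monitor(value):
    """Return the labels of the bits that are set in value."""
    return [label for bit, label in sorted(MONITOR_BITS.items())
            if value & (1 << bit)]
```

Here encode_monitor(1, 2) gives 6, the value quoted in the usage note of
the table, and decode_monitor(6) lists the D1 and D2 labels.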

3.2 Statistical Variables

Cache Stats (per wake cycle) (r/w):
sdbc_alloc_ct --------- total allocations performed
sdbc_dealloc_ct ------- total deallocations performed
sdbc_history ---------- current hysteresis flag setting
sdbc_nodatas ---------- cache entries w/o memory assigned
sdbc_candidates ------- cache entries ready to be aged or released
sdbc_deallocs --------- cache entries w/memory deallocated and requeued
sdbc_hosts ------------ number of host cache entries
sdbc_pests ------------ number of parasitic cache entries
sdbc_metas ------------ number of meta-data cache entries
sdbc_holds ------------ number of holdovers (fully aged w/memory and requeued)
sdbc_others ----------- number of not [host, pests or metas]
sdbc_notavail --------- number of cache entries to bypass (nodatas+'in use by
                        other processes')
sdbc_process_directive- D0=1 wake thread
                        D1=1 temporarily accelerate aging (set the hysteresis
                        flag)
sdbc_simplect --------- simple count of the number of times the kstat update
                        routine has been called


3.3 Range Checks and Limits

Only range limits are checked. Internal inconsistencies are not checked
(e.g. pcnt2 > pcnt1). Inconsistencies won't break the system; you just won't
get meaningful behavior.
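
Since consistency is left to the administrator, a user level tool may want
to check it before writing values. The following Python sketch separates
the two kinds of check. The limits are copied from Sec. 3.1, while the
function names and the message strings are hypothetical and not part of any
existing utility.

```python
# (low, high) limits from Sec. 3.1. sdbc_max_dyn_list is left out because
# its upper limit is not stated.
RANGES = {
    "sdbc_cache_aging_ct1": (1, 255),
    "sdbc_cache_aging_ct2": (1, 255),
    "sdbc_cache_aging_ct3": (1, 255),
    "sdbc_cache_aging_sec1": (1, 255),
    "sdbc_cache_aging_sec2": (1, 255),
    "sdbc_cache_aging_sec3": (1, 255),
    "sdbc_cache_aging_pcnt1": (0, 100),
    "sdbc_cache_aging_pcnt2": (0, 100),
    "sdbc_max_holds_pcnt": (0, 100),
}


def out_of_range(settings):
    """Names whose values fall outside the documented limits."""
    return sorted(name for name, value in settings.items()
                  if name in RANGES
                  and not RANGES[name][0] <= value <= RANGES[name][1])


def inconsistencies(settings):
    """Relationships that are not range checked but affect behavior."""
    problems = []
    if settings["sdbc_cache_aging_pcnt2"] > settings["sdbc_cache_aging_pcnt1"]:
        problems.append("pcnt2 > pcnt1")
    secs = [settings["sdbc_cache_aging_sec%d" % i] for i in (1, 2, 3)]
    if not secs[0] >= secs[1] >= secs[2]:
        problems.append("sleep times not in descending order")
    return problems
```

A setting such as pcnt1 = 25 with pcnt2 = 50 passes out_of_range() but is
reported by inconsistencies().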

The aging counter and sleep timer are arbitrarily limited to a byte wide
counter (maximum 255). This can be expanded. However maxing the values under
the current implementation yields about 18 hours for full aging.
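
The 18 hour figure can be checked with a short calculation, assuming one
aging tick per wake cycle as described in Sec. 2.2. The variable names are
chosen for this example only.

```python
MAX_AGING_COUNT = 255   # largest value of a byte wide aging counter
MAX_SLEEP_SEC = 255     # largest value of a byte wide sleep timer

# One aging tick per wake cycle, so full aging takes count * sleep seconds.
full_aging_seconds = MAX_AGING_COUNT * MAX_SLEEP_SEC
full_aging_hours = full_aging_seconds / 3600

print(f"{full_aging_seconds} s = {full_aging_hours:.2f} h")
```

This prints 65025 s = 18.06 h, in line with the figure quoted above.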

3.4 Kstat Lookup Name

The kstat_lookup() module name is "sdbc:dynmem" with an instance of 0.

3.5 Defaults

Default values are:
sdbc_max_dyn_list = 8
sdbc_monitor_dynmem = 0
sdbc_cache_aging_ct1 = 3
sdbc_cache_aging_ct2 = 3
sdbc_cache_aging_ct3 = 3
sdbc_cache_aging_sec1 = 10
sdbc_cache_aging_sec2 = 5
sdbc_cache_aging_sec3 = 1
sdbc_cache_aging_pcnt1 = 50
sdbc_cache_aging_pcnt2 = 25
sdbc_max_holds_pcnt = 0

To make the dynmem act for all intents and purposes like the static model
beyond the initial startup the appropriate values are:
sdbc_max_dyn_list = 1,
sdbc_cache_aging_ct1/2/3=255,
sdbc_cache_aging_sec1/2/3=255
The remaining variables are irrelevant.

4.0 KSTAT Implementation for Existing Statistics

The existing cache statistical reporting mechanism has been replaced by
the kstat library reporting mechanism. In general the statistics fall into
two categories - global and shared. The global stats reflect gross
behavior over all cached volumes and the shared stats reflect behavior
particular to each cached volume.

4.1 Global KSTAT lookup_name

The kstat_lookup() module name is "sdbc:gstats" with an instance of 0. The
identifying ASCII strings and associated values matching the sd_stats driver
structure are:

sdbc_dirty -------- net_dirty
sdbc_pending ------ net_pending
sdbc_free --------- net_free
sdbc_count -------- st_count            - number of opens for device
sdbc_loc_count ---- st_loc_count        - number of open devices
sdbc_rdhits ------- st_rdhits           - number of read hits
sdbc_rdmiss ------- st_rdmiss           - number of read misses
sdbc_wrhits ------- st_wrhits           - number of write hits
sdbc_wrmiss ------- st_wrmiss           - number of write misses
sdbc_blksize ------ st_blksize          - cache block size
sdbc_num_memsize -- SD_MAX_MEM          - number of defined blocks
                                          (currently 6)
To find the size of each memory block append the numbers 0 to 5 to
'sdbc_memsize'.
sdbc_memsize0 ----- local memory
sdbc_memsize1 ----- cache memory
sdbc_memsize2 ----- iobuf memory
sdbc_memsize3 ----- hash memory
sdbc_memsize4 ----- global memory
sdbc_memsize5 ----- stats memory
sdbc_total_cmem --- st_total_cmem       - memory used by cache structs
sdbc_total_smem --- st_total_smem       - memory used by stat structs
sdbc_lru_blocks --- st_lru_blocks
sdbc_lru_noreq ---- st_lru_noreq
sdbc_lru_req ------ st_lru_req
sdbc_num_wlru_inq - MAX_CACHE_NET       - number of nets (currently 4)
To find the size of the least recently used write cache per net append
the numbers 0-3 to sdbc_wlru_inq
sdbc_wlru_inq0 ---- net 0
sdbc_wlru_inq1 ---- net 1
sdbc_wlru_inq2 ---- net 2
sdbc_wlru_inq3 ---- net 3
sdbc_cachesize ---- st_cachesize        - cache size
sdbc_numblocks ---- st_numblocks        - cache blocks
sdbc_num_shared --- MAXFILES*2          - number of shared structures (one for
                                          each cached volume)
                                          This number dictates the maximum
                                          index size for shared stats and
                                          names given below.
sdbc_simplect ----- simple count of the number of times the kstat update routine
                    has been called

All fields are read only.
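
The indexed statistic names are formed by appending a number to a common
prefix. A reader program might build them as in the Python sketch below.
The helper indexed_names() is hypothetical, and the counts would normally
be taken from sdbc_num_memsize and sdbc_num_wlru_inq rather than hard
coded.

```python
def indexed_names(prefix, count):
    """Return prefix0 .. prefix(count-1), e.g. sdbc_memsize0..sdbc_memsize5."""
    return ["%s%d" % (prefix, i) for i in range(count)]


# Counts as currently documented: 6 memory blocks and 4 nets.
memsize_names = indexed_names("sdbc_memsize", 6)
wlru_names = indexed_names("sdbc_wlru_inq", 4)
```

Each resulting string is then used as the name of the value to look up
within the "sdbc:gstats" data.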


4.2 Shared Structures KSTAT lookup_name

The kstat_lookup() module names are "sdbc:shstats" and "sdbc:shname", both
with an instance of 0. The identifying ASCII strings and associated values
matching the sd_shared driver structure are:

sdbc:shstats module
sdbc_index ------- structure index number
sdbc_alloc ------- sh_alloc             - is this allocated?
sdbc_failed ------ sh_failed            - Disk failure status (0=ok, 1=i/o error,
                                          2=open failed)
sdbc_cd ---------- sh_cd                - the cache descriptor. (for stats)
sdbc_cache_read -- sh_cache_read        - Number of bytes read from cache
sdbc_cache_write - sh_cache_write       - Number of bytes written to cache
sdbc_disk_read --- sh_disk_read         - Number of bytes read from disk
sdbc_disk_write -- sh_disk_write        - Number of bytes written to disk
sdbc_filesize ---- sh_filesize          - Filesize
sdbc_numdirty ---- sh_numdirty          - Number of dirty blocks
sdbc_numio ------- sh_numio             - Number of blocks on way to disk
sdbc_numfail ----- sh_numfail           - Number of blocks failed
sdbc_flushloop --- sh_flushloop         - Loops delayed so far
sdbc_flag -------- sh_flag              - Flags visible to user programs
sdbc_simplect ---- simple count of the number of times the kstat update routine
                   has been called

sdbc:shname module
Read in as raw bytes and interpreted as a nul terminated ASCII string.

These two modules operate hand in hand based on information obtained from the
"sdbc:gstats" module. "sdbc:gstats - sdbc_num_shared" gives the maximum
possible number of shared devices. It does not tell how many devices are
actually cached - just the maximum possible. In order to determine the number
present and retrieve the statistics for each device the user must:

1. open and read "sdbc:shstats"
2. set the index "sdbc_index" to a starting value (presumably 0)
3. write the kstat module (the only item in the module is sdbc_index)

What this does is set a starting index for all subsequent reads.

4. to get the device count and associated statistics the user now simply
reads each module "sdbc:shstats" and "sdbc:shname" as a group repeatedly -
the index will auto increment

To reset the index set "sdbc:shstats - sdbc_index" to the required value
and write the module.

The first entry returning a nul string to "sdbc:shname" signifies no more
configured devices.
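
The read loop described above can be sketched in Python against a stand-in
object. FakeSharedStats is not a kstat binding. It is a hypothetical class
that merely imitates the documented behavior (a writable index that auto
increments on each grouped read, and an empty name once the configured
devices are exhausted) so that the control flow can be shown. The device
names in the usage note are likewise made up.

```python
class FakeSharedStats:
    """Imitates the "sdbc:shstats"/"sdbc:shname" pair (hypothetical)."""

    def __init__(self, names):
        self._names = list(names)   # configured device names
        self._index = 0

    def write_index(self, index):
        """Steps 2 and 3: set sdbc_index and write the module."""
        self._index = index

    def read_pair(self):
        """Step 4: read both modules as a group; the index auto increments."""
        i = self._index
        name = self._names[i] if i < len(self._names) else ""
        self._index += 1
        return {"sdbc_index": i}, name


def list_cached_devices(ks, max_shared):
    """Return (index, name) for every configured device.

    max_shared plays the role of "sdbc:gstats - sdbc_num_shared".
    """
    ks.write_index(0)
    devices = []
    for _ in range(max_shared):
        stats, name = ks.read_pair()
        if name == "":              # nul string: no more configured devices
            break
        devices.append((stats["sdbc_index"], name))
    return devices
```

For instance, list_cached_devices(FakeSharedStats(["vol_a", "vol_b"]), 8)
returns [(0, "vol_a"), (1, "vol_b")]. The loop stops at the first empty
name instead of running to the maximum of 8.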