For lack of a better place to put them, this file contains notes on
some of the more intricate details of GEOM.

-----------------------------------------------------------------------
Locking of bio_children and bio_inbed

bio_children and bio_inbed are used by g_clone_bio() and g_std_done()
to keep track of children cloned off a request.  g_clone_bio()
increments the parent's bio_children counter each time it is called.
g_std_done() increments bio_inbed on every call and, if the two
counters are then equal, calls g_io_deliver() on the parent bio.

The general assumption is that g_clone_bio() is called only in
the g_down thread, and g_std_done() only in the g_up thread, and
therefore the two fields do not generally need locking.  These
restrictions are not enforced by the code, but only with great
care should they be violated.

It is the responsibility of the class implementation to avoid the
following race condition:  A class intends to split a bio into two
children.  It clones the bio and requests I/O on the first child.
This I/O operation completes before the second child is cloned,
so g_std_done() sees both counters equal to 1 and finishes off
the parent bio prematurely.

There is no race present in the common case where the bio is split
into multiple parts in the class start method and the I/O is requested
on another GEOM class below:  There is only one g_down thread and
the class below will not get its start method run until we return
from our start method, and consequently the I/O cannot complete
prematurely.

In all other cases, this race needs to be mitigated, for instance
by cloning all children before I/O is requested on any of them.

Notice that cloning an "extra" child and calling g_std_done() on
it directly opens another race, since the assumption is that
g_std_done() is only called in the g_up thread.
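
The counter protocol and the premature-completion race described in
this section can be illustrated with a small userland model.  This is
only a sketch: struct toy_bio and the toy_*() functions are hypothetical
stand-ins invented for the example, not the kernel's struct bio,
g_clone_bio() or g_std_done(), and the "delivered" flag merely marks
the point where g_io_deliver() would be called on the parent.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct bio; only the counters are modelled. */
struct toy_bio {
	struct toy_bio *parent;	/* NULL for the original request */
	unsigned children;	/* models bio_children */
	unsigned inbed;		/* models bio_inbed */
	bool delivered;		/* set where g_io_deliver() would run */
};

/* Models g_clone_bio(): account for one more child on the parent. */
static void
toy_clone(struct toy_bio *parent, struct toy_bio *child)
{
	child->parent = parent;
	child->children = child->inbed = 0;
	child->delivered = false;
	parent->children++;
}

/* Models g_std_done(): count a finished child, finish parent if all in. */
static void
toy_done(struct toy_bio *child)
{
	struct toy_bio *parent = child->parent;

	parent->inbed++;
	assert(parent->inbed <= parent->children);
	if (parent->inbed == parent->children)
		parent->delivered = true;
}

/*
 * Racy order: the first child completes before the second one has been
 * cloned.  Returns true if the parent was finished at that point, with
 * only one of the two intended children accounted for.
 */
static bool
toy_split_racy(void)
{
	struct toy_bio parent = { 0 }, c1;

	toy_clone(&parent, &c1);
	toy_done(&c1);		/* both counters are 1 here */
	return (parent.delivered);
}

/*
 * Mitigated order: clone every child before any of them can complete.
 * Returns true if the parent was finished only after the last child.
 */
static bool
toy_split_safe(void)
{
	struct toy_bio parent = { 0 }, c1, c2;
	bool early;

	toy_clone(&parent, &c1);
	toy_clone(&parent, &c2);
	toy_done(&c1);		/* counters are 2 and 1: not finished */
	early = parent.delivered;
	toy_done(&c2);		/* counters are 2 and 2: finished */
	return (!early && parent.delivered);
}
```

In this model both toy_split_racy() and toy_split_safe() return true:
the first because the parent is finished after a single child, the
second because it is finished only once both children are in.  The
model is single-threaded and therefore says nothing about the locking
assumptions discussed above.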

-----------------------------------------------------------------------
Statistics collection

Statistics collection can run at three levels controlled by the
"kern.geom.collectstats" sysctl.

At level zero, only the number of transactions started and completed
are counted, and this is only because GEOM internally uses the
difference between these two as a sanity check.

At level one we collect the full statistics.  Higher levels are
reserved for future use.  Statistics are collected independently
on both the provider and the consumer, because multiple consumers
can be active against the same provider at the same time.

The statistics collection falls into two parts:

The first and simpler part consists of g_io_request() timestamping
the struct bio when the request is first started and g_io_deliver()
updating the consumer's and provider's statistics based on fields in
the bio when it is completed.  There are no concurrency or locking
concerns in this part.  The statistics collected consist of the number
of requests, number of bytes, number of ENOMEM errors, number of
other errors and duration of the request for each of the three
major request types: BIO_READ, BIO_WRITE and BIO_DELETE.

The second part tries to keep track of the "busy%".

If in g_io_request() we find that there are no outstanding requests
(based on the counters for scheduled and completed requests being
equal), we set a timestamp in the "wentbusy" field.  Since there
are no outstanding requests, and as long as there is only one thread
pushing the g_down queue, we cannot possibly conflict with
g_io_deliver() until we ship the current request down.

In g_io_deliver() we calculate the delta-T from wentbusy and add this
to the "bt" field, and set wentbusy to the current timestamp.  We
take care to do this before we increment the "requests completed"
counter, since that prevents g_io_request() from touching the
"wentbusy" timestamp concurrently.

The statistics data is made available to userland through the use
of a special allocator (in geom_stats.c) which through a device
allows userland to mmap(2) the pages containing the statistics data.
In order to indicate to userland when the data in a statistics
structure might be inconsistent, g_io_deliver() atomically sets a
flag "updating" and resets it when the structure is again consistent.
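
The two parts of the collection described above can be sketched as a
single-threaded userland model.  Everything here is an assumption made
for illustration: struct toy_stat, struct toy_req, the toy_*() functions
and the plain integer timestamps are hypothetical and do not mirror the
real structure layout.  Only the "wentbusy" and "bt" names are taken
from the text, and the "updating" flag is not modelled.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

enum toy_cmd { TOY_READ, TOY_WRITE, TOY_DELETE, TOY_NCMD };

/* Hypothetical statistics record for one provider or consumer. */
struct toy_stat {
	uint64_t nstart;		/* requests scheduled */
	uint64_t nend;			/* requests completed */
	uint64_t ops[TOY_NCMD];		/* requests, per type */
	uint64_t bytes[TOY_NCMD];	/* bytes, per type */
	uint64_t duration[TOY_NCMD];	/* summed request time, per type */
	uint64_t enomem[TOY_NCMD];	/* ENOMEM errors, per type */
	uint64_t errors[TOY_NCMD];	/* other errors, per type */
	uint64_t wentbusy;		/* time of last busy-time update */
	uint64_t bt;			/* accumulated busy time */
};

/* Hypothetical request; t0 plays the role of the bio timestamp. */
struct toy_req {
	enum toy_cmd cmd;
	uint64_t length;
	uint64_t t0;
};

/* Models the g_io_request() side. */
static void
toy_request(struct toy_stat *st, struct toy_req *rq, uint64_t now)
{
	rq->t0 = now;
	if (st->nstart == st->nend)	/* nothing outstanding: went busy */
		st->wentbusy = now;
	st->nstart++;
}

/* Models the g_io_deliver() side. */
static void
toy_deliver(struct toy_stat *st, const struct toy_req *rq, int error,
    uint64_t now)
{
	assert(st->nstart > st->nend);
	assert(rq->cmd < TOY_NCMD);

	/* Busy time is accounted before the completion counter moves. */
	st->bt += now - st->wentbusy;
	st->wentbusy = now;

	st->ops[rq->cmd]++;
	st->bytes[rq->cmd] += rq->length;
	st->duration[rq->cmd] += now - rq->t0;
	if (error == ENOMEM)
		st->enomem[rq->cmd]++;
	else if (error != 0)
		st->errors[rq->cmd]++;

	st->nend++;
}

/* Two overlapping requests, an idle gap, then a third request. */
static uint64_t
toy_example_busy_time(void)
{
	struct toy_stat st = { 0 };
	struct toy_req a = { TOY_READ, 4096, 0 };
	struct toy_req b = { TOY_WRITE, 512, 0 };

	toy_request(&st, &a, 0);		/* idle -> busy at t=0 */
	toy_request(&st, &b, 2);
	toy_deliver(&st, &a, 0, 3);
	toy_deliver(&st, &b, ENOMEM, 6);	/* idle again at t=6 */
	toy_request(&st, &a, 10);		/* busy again at t=10 */
	toy_deliver(&st, &a, 0, 12);
	return (st.bt);				/* (6 - 0) + (12 - 10) */
}
```

toy_example_busy_time() returns 8: the idle gap between t=6 and t=10
is not counted, because the next toy_request() finds the two counters
equal and restarts "wentbusy".  Being single-threaded, the model only
mirrors the ordering of the updates; it does not demonstrate the
concurrency argument itself.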
-----------------------------------------------------------------------
maxsize, stripesize and stripeoffset

maxsize is the biggest request we are willing to handle.  If not
set there is no upper bound on the size of a request and the code
is responsible for chopping it up.  Only hardware methods should
set an upper bound in this field.  Geom_disk will inherit the upper
bound set by the device driver.

stripesize is the width of any natural request boundaries for the
device.  This would be the width of a stripe on a RAID-5 unit or
one zone in GBDE.  The idea with this field is to hint to clustering
type code to not trivially overrun these boundaries.

stripeoffset is the amount of the first stripe which lies before the
device's beginning.

If we have a device with 64k stripes:
	[0...64k[
	[64k...128k[
	[128k...192k[
Then it will have stripesize = 64k and stripeoffset = 0.

If we put an MBR on this device, where slice#1 starts on sector#63,
then this slice will have: stripesize = 64k, stripeoffset = 63 * sectorsize.

If the clustering code wants to widen a request which writes to
sector#53 of the slice, it can calculate how many bytes remain until
the end of the stripe as:
	stripesize - (53 * sectorsize + stripeoffset) % stripesize.
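
As a worked example of the calculation above, here is a minimal sketch
in C.  The helper name bytes_to_stripe_end() is hypothetical, and the
512-byte sector size used in the numbers below is an assumption; the
text does not fix a sector size.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Number of bytes from byte offset "offset" (relative to the start of
 * the provider) up to the next stripe boundary.  An offset that sits
 * exactly on a boundary yields a full stripesize.
 */
static uint64_t
bytes_to_stripe_end(uint64_t offset, uint64_t stripesize,
    uint64_t stripeoffset)
{
	assert(stripesize > 0);
	return (stripesize - (offset + stripeoffset) % stripesize);
}
```

With 512-byte sectors, bytes_to_stripe_end(53 * 512, 65536, 63 * 512)
gives 65536 - (27136 + 32256) % 65536 = 6144 bytes, i.e. 12 sectors.
That matches the layout: sector#53 of the slice is sector#116 of the
underlying device, and the first stripe ends at sector#128.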
-----------------------------------------------------------------------

#include file usage:

                 geom.h|geom_int.h|geom_ext.h|geom_ctl.h|libgeom.h
----------------+------+----------+----------+----------+---------+
geom class      |      |          |          |          |         |
implementation  |   X  |          |          |          |         |
----------------+------+----------+----------+----------+---------+
geom kernel     |      |          |          |          |         |
infrastructure  |   X  |      X   |  X       |    X     |         |
----------------+------+----------+----------+----------+---------+
libgeom         |      |          |          |          |         |
implementation  |      |          |  X       |    X     |  X      |
----------------+------+----------+----------+----------+---------+
geom aware      |      |          |          |          |         |
application     |      |          |          |    X     |  X      |
----------------+------+----------+----------+----------+---------+

geom_slice.h is special in that it documents a "library" for implementing
a specific kind of class, and consequently does not appear in the above
matrix.
-----------------------------------------------------------------------
Removable media.

In general, the theory is that a drive creates the provider when it has
media and destroys it when the media disappears.

In a more realistic world, we will allow a provider to be opened medialess
(set any sectorsize and a mediasize==0) in order to allow operations like
open/close tray etc.