.\"
.\" Copyright (c) 2002 Poul-Henning Kamp
.\" Copyright (c) 2002 Networks Associates Technology, Inc.
.\" All rights reserved.
.\"
.\" This software was developed for the FreeBSD Project by Poul-Henning Kamp
.\" and NAI Labs, the Security Research Division of Network Associates, Inc.
.\" under DARPA/SPAWAR contract N66001-01-C-8035 ("CBOSS"), as part of the
.\" DARPA CHATS research program.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\" 3. The names of the authors may not be used to endorse or promote
.\"    products derived from this software without specific prior written
.\"    permission.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\" $FreeBSD$
.\"
.Dd March 27, 2002
.Os FreeBSD 5.0
.Dt GEOM 4
.Sh NAME
.Nm GEOM
.Nd modular disk I/O request transformation framework
.Sh DESCRIPTION
The GEOM framework provides an infrastructure in which modules
can perform transformations on disk I/O requests on their path from
the upper kernel to the device drivers and back.
.Pp
Transformations in a GEOM context range from the simple geometric
displacement performed in typical disklabel modules, through RAID
algorithms and device multipath resolution, to full-blown cryptographic
protection of the stored data.
.Pp
Compared to traditional "volume management", GEOM differs from most,
and in some cases all, previous implementations in the following ways:
.Bl -bullet
.It
GEOM is extensible.
It is trivially simple to write a new class
of transformation and it will not be given stepchild treatment.
If someone for some reason wanted to mount IBM MVS diskpacks, a class
recognizing and configuring their VTOC information would be a trivial
matter.
.It
GEOM is topologically agnostic.
Most volume management implementations
have very strict notions of how classes can fit together; very often
one fixed hierarchy is provided, for instance subdisk - plex -
volume.
.El
.Pp
Being extensible means that new transformations are treated no differently
than existing transformations.
.Pp
Fixed hierarchies are bad because they make it impossible to express
the intent efficiently.
In the fixed hierarchy above it is not possible to mirror two
physical disks and then partition the mirror into subdisks; instead
one is forced to make subdisks on the physical volumes and to mirror
these two and two, resulting in a much more complex configuration.
GEOM, on the other hand, does not care in which order things are done;
the only restriction is that cycles in the graph are not allowed.
.Sh "TERMINOLOGY AND TOPOLOGY"
GEOM is quite object oriented and consequently the terminology
borrows a lot of context and semantics from the OO vocabulary:
.Pp
A "class", represented by the data structure "g_class", implements one
particular kind of transformation.
Typical examples are MBR disk
partition, BSD disklabel or RAID5 classes.
.Pp
An instance of a class is called a "geom" and is represented by the
data structure "g_geom".
In a typical i386 FreeBSD system, there
will be one geom of class MBR for each disk.
.Pp
A "provider", represented by the data structure "g_provider", is
the front gate at which a geom offers service.
A provider is "a disk-like thing which appears in /dev" - a logical
disk in other words.
All providers have three main properties: name, sectorsize and size.
.Pp
A "consumer" is the backdoor through which a geom connects to another
geom's provider and through which I/O requests are sent.
.Pp
The topological relationships between these entities are as follows:
.Bl -bullet
.It
A class has zero or more geom instances.
.It
A geom has exactly one class it is derived from.
.It
A geom has zero or more consumers.
.It
A geom has zero or more providers.
.It
A consumer can be attached to zero or one providers.
.It
A provider can have zero or more consumers attached.
.El
.Pp
All geoms have a rank number assigned, which is used to detect and
prevent loops in the acyclic directed graph.
This rank number is assigned as follows:
.Bl -enum
.It
A geom with no attached consumers has rank=1.
.It
A geom with attached consumers has a rank one higher than the
highest rank of the geoms of the providers its consumers are
attached to.
.El
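.Pp
The two ranking rules above can be sketched in C roughly as follows.
This is an illustrative userland model only, not the kernel implementation:
the names toy_geom, toy_rank and toy_would_loop are hypothetical, and the
loop check shown is a plain reachability walk chosen for clarity.
.Bd -literal -offset indent
```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_CONSUMERS 4

/* Toy stand-in for a geom; this is NOT the kernel's struct g_geom. */
struct toy_geom {
	const char *name;
	/* Geoms owning the providers our consumers are attached to. */
	struct toy_geom *below[TOY_MAX_CONSUMERS];
	size_t nbelow;
};

/*
 * Rule 1: a geom with no attached consumers has rank 1.
 * Rule 2: otherwise its rank is one more than the highest rank below.
 */
static int
toy_rank(const struct toy_geom *gp)
{
	int highest = 0;

	assert(gp != NULL && gp->nbelow <= TOY_MAX_CONSUMERS);
	for (size_t i = 0; i < gp->nbelow; i++) {
		int r = toy_rank(gp->below[i]);

		if (r > highest)
			highest = r;
	}
	return (highest + 1);
}

/*
 * Would attaching a consumer of 'upper' to a provider of 'lower'
 * close a loop?  It would if 'upper' is reachable from 'lower'.
 */
static int
toy_would_loop(const struct toy_geom *upper, const struct toy_geom *lower)
{
	assert(upper != NULL && lower != NULL);
	if (lower == upper)
		return (1);
	for (size_t i = 0; i < lower->nbelow; i++)
		if (toy_would_loop(upper, lower->below[i]))
			return (1);
	return (0);
}
```
.Ed
.Pp
For a disk geom with an MBR geom on top and a BSD geom on top of that,
this model yields ranks 1, 2 and 3, and refuses to attach the disk geom
to a provider of the BSD geom.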
.Sh "SPECIAL TOPOLOGICAL MANEUVERS"
In addition to the straightforward attach, which attaches a consumer
to a provider, and detach, which breaks the bond, a number of special
topological maneuvers exist to facilitate configuration and to
improve the overall flexibility.
.Pp
.Em TASTING
is a process which happens whenever a new class or new provider
is created, and it is the class's chance to automatically configure an
instance on providers which it recognizes as its own.
A typical example is the MBR disk-partition class, which will look for
the MBR table in the first sector and, if found and validated, will
instantiate a geom to multiplex according to the contents of the MBR.
.Pp
A new class will be offered all existing providers in turn and a new
provider will be offered to all classes in turn.
.Pp
Exactly what a class does to recognize if it should accept the offered
provider is not defined by GEOM, but the sensible set of options is:
.Bl -bullet
.It
Examine specific data structures on the disk.
.It
Examine properties like sectorsize or mediasize for the provider.
.It
Examine the rank number of the provider's geom.
.It
Examine the method name of the provider's geom.
.El
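.Pp
As a rough illustration of the first two options, a taste routine for
an MBR-like class might look like the sketch below.
The names toy_provider and toy_mbr_taste are hypothetical, the insistence
on 512-byte sectors is a simplifying assumption, and only the two-byte
signature at the end of the first sector is checked; a real class would
validate the partition table as well.
.Bd -literal -offset indent
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MBR_SECTOR 512

/* Toy view of an offered provider; NOT the kernel's struct g_provider. */
struct toy_provider {
	uint32_t sectorsize;
	uint64_t mediasize;
	const uint8_t *sector0;	/* contents of the first sector */
};

/* Returns 1 if an MBR class should claim the provider, else 0. */
static int
toy_mbr_taste(const struct toy_provider *pp)
{
	assert(pp != NULL);
	/* Examine properties of the provider first (assumed limits). */
	if (pp->sectorsize != TOY_MBR_SECTOR ||
	    pp->mediasize < TOY_MBR_SECTOR)
		return (0);
	if (pp->sector0 == NULL)
		return (0);
	/* Examine on-disk data: an MBR sector ends in 0x55 0xAA. */
	return (pp->sector0[510] == 0x55 && pp->sector0[511] == 0xAA);
}
```
.Ed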
.Pp
.Em ORPHANIZATION
is the process by which a provider is removed while
it potentially is still being used.
.Pp
When a geom orphans a provider, all future I/O requests will
"bounce" on the provider with an error code set by the geom.
Any
consumers attached to the provider will receive notification about
the orphanization and need to take appropriate action.
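.Pp
The "bounce" behaviour can be pictured with the following minimal model.
The names toy_prov, toy_orphan and toy_start are hypothetical, and the
notification of attached consumers is left out entirely.
.Bd -literal -offset indent
```c
#include <assert.h>
#include <stddef.h>

/* Toy provider state; NOT the kernel's struct g_provider. */
struct toy_prov {
	int error;	/* 0 while healthy; non-zero once orphaned */
};

/* Mark the provider as an orphan with the geom's chosen error code. */
static void
toy_orphan(struct toy_prov *pp, int error)
{
	assert(pp != NULL && error != 0);
	pp->error = error;
}

/* Returns 0 if a request may proceed, or the error it bounces with. */
static int
toy_start(const struct toy_prov *pp)
{
	assert(pp != NULL);
	return (pp->error);
}
```
.Ed
.Pp
Which error code a geom sets is its own choice; a caller of this sketch
might pass ENXIO from <errno.h>, but that is an assumption rather than
something the text above prescribes.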
.Pp
A geom which came into being as a result of a normal taste operation
should self-destruct unless it has a way to keep functioning.
Geoms like disklabels and stripes should therefore self-destruct, whereas
RAID5 or mirror geoms can continue to function as long as they do
not lose quorum.
.Pp
When a provider is orphaned, this does not result in any immediate
change in the topology: any attached consumers are still attached,
any opened paths are still open, and it is the responsibility of the
geoms above to close and detach as soon as this can happen.
.Pp
The typical scenario is that a device driver notices a disk has
gone and orphans the provider for it.
The geoms on top receive the orphanization event and orphan all
their providers in turn.
Providers which are not attached to are destroyed right away.
Eventually, at the top level, the geom which interfaces
to DEVFS receives an orphan event on its consumer; it
calls destroy_dev(9), does an explicit close if the
device was open, and then detaches its consumer.
The provider below is now no longer attached to and can be
destroyed.
If the geom has no more providers, it can detach
its consumer and self-destruct, and so the carnage passes back
down the tree until the original provider is detached from
and can be destroyed by the geom serving the device driver.
.Pp
While this approach seems byzantine, it does provide the maximum
flexibility in handling disappearing devices.
.Pp
.Em SPOILING
is a special case of orphanization used to protect
against stale metadata.
It is probably easiest to understand spoiling by going through
an example.
.Pp
Imagine a disk, "da0", on top of which an MBR geom provides
"da0s1" and "da0s2", and on top of "da0s1" a BSD geom provides
"da0s1a" through "da0s1e"; both the MBR and BSD geoms have
autoconfigured based on data structures on the disk media.
Now imagine the case where "da0" is opened for writing and those
data structures are modified or overwritten: the geoms would now
be operating on stale metadata unless some notification system
can inform them otherwise.
To avoid this situation, when "da0" is opened for writing,
all attached consumers are told about this, and geoms like
MBR and BSD will self-destruct as a result.
When "da0" is closed again, it will be offered for tasting again
and, if the data structures for MBR and BSD are still there, new
geoms will instantiate themselves anew.
.Pp
Now for the fine print:
.Pp
If any of the paths through the MBR or BSD module were open, they
would have opened downwards with an exclusive bit, rendering it
impossible to open "da0" for writing in that case.
Conversely,
the requested exclusive bit would render it impossible to open a
path through the MBR geom while "da0" is open for writing.
.Pp
From this it also follows that changing the size of open geoms can
only be done through their cooperation.
.Pp
Finally: the spoiling only happens when the write count goes from
zero to non-zero and the retasting only when the write count goes
back to zero.
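.Pp
The write-count rule can be modelled as a small state transition, as in
the sketch below.
The names toy_access_state, toy_event and toy_access are hypothetical;
the sketch tracks only the write count and ignores read and exclusive
counts.
.Bd -literal -offset indent
```c
#include <assert.h>
#include <stddef.h>

enum toy_event { TOY_NONE, TOY_SPOIL, TOY_RETASTE };

struct toy_access_state {
	int writers;	/* current write count on the provider */
};

/* Apply a change in write count; report which event, if any, fires. */
static enum toy_event
toy_access(struct toy_access_state *as, int dw)
{
	int before, after;

	assert(as != NULL && as->writers + dw >= 0);
	before = as->writers;
	after = before + dw;
	as->writers = after;
	if (before == 0 && after > 0)
		return (TOY_SPOIL);	/* zero to non-zero */
	if (before > 0 && after == 0)
		return (TOY_RETASTE);	/* back to zero */
	return (TOY_NONE);
}
```
.Ed
.Pp
A second writer opening the provider, or one of two writers closing it,
therefore triggers neither event in this model.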
.Pp
.Em INSERT/DELETE
are very special operations which allow a new geom
to be instantiated between a consumer and a provider attached to
each other, and to be removed again.
.Pp
To understand the utility of this, imagine a provider
being mounted as a filesystem.
Between the DEVFS geom's consumer and its provider we insert
a mirror module which configures itself with one mirror
copy and consequently is transparent to the I/O requests
on the path.
We can now configure yet another mirror copy on the mirror geom,
request a synchronization, and finally drop the first mirror
copy.
We have now in essence moved a mounted filesystem from one
disk to another while it was being used.
At this point the mirror geom can be deleted from the path
again; it has served its purpose.
.Pp
.Em CONFIGURE
is the process where the administrator issues instructions
for a particular class to instantiate itself.
There are multiple
ways to express intent in this case; a particular provider can be
specified with a level of override, forcing for instance a BSD
disklabel module to attach to a provider which was not found palatable
during the TASTE operation.
.Pp
Finally, I/O is the reason we even do this: it concerns itself with
sending I/O requests through the graph.
.Pp
.Em "I/O REQUESTS" ,
represented by struct bio, originate at a consumer,
are scheduled on its attached provider and, when processed, are returned
to the consumer.
It is important to realize that the struct bio which
enters through the provider of a particular geom does not "come
out on the other side".
Even simple transformations like MBR and BSD will clone the
struct bio, modify the clone and schedule the clone on their
own consumer.
Note that cloning the struct bio does not involve cloning the
actual data area specified in the I/O request.
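.Pp
The clone-and-modify step can be sketched as below for a simple slice
that only displaces the offset.
The structure toy_bio and the function toy_slice_clone are hypothetical
simplifications; the real struct bio carries many more fields, and the
sketch returns the clone by value instead of allocating it.
.Bd -literal -offset indent
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy request descriptor; NOT the kernel's struct bio. */
struct toy_bio {
	uint64_t offset;	/* byte offset on the target provider */
	uint64_t length;	/* byte count */
	void *data;		/* data area, shared between clones */
	struct toy_bio *parent;	/* request this one was cloned from */
};

/* Clone 'bp' for a slice that starts 'start' bytes into the disk below. */
static struct toy_bio
toy_slice_clone(struct toy_bio *bp, uint64_t start)
{
	struct toy_bio clone;

	assert(bp != NULL);
	clone = *bp;		/* copies the descriptor, not the data */
	clone.offset = bp->offset + start;	/* geometric displacement */
	clone.parent = bp;	/* lets completion travel back upward */
	return (clone);
}
```
.Ed
.Pp
The original request is left untouched, and both descriptors point at
the same data area.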
.Pp
In total six different I/O requests exist in GEOM: read, write,
delete, format, get attribute and set attribute.
.Pp
Read and write are pretty self-explanatory.
.Pp
Delete indicates that a certain range of data is no longer used
and that it can be erased or freed as the underlying technology
supports.
Technologies like flash adaptation layers can arrange to erase
the relevant blocks before they are reassigned, and
cryptographic devices may want to fill random bits into the
range to reduce the amount of data available for attack.
.Pp
It is important to recognize that a delete indication is not a
request and consequently there is no guarantee that the data actually
will be erased or made unavailable unless guaranteed by specific
geoms in the graph.
If "secure delete" semantics are required, a
geom should be pushed which converts delete indications into (a
sequence of) write requests.
.Pp
Get attribute and set attribute support inspection and manipulation
of out-of-band attributes on a particular provider or path.
Attributes are named by ASCII strings and they will be discussed in
a separate section below.
.Pp
(stay tuned while the author rests his brain and fingers: more to come.)
.Sh HISTORY
This software was developed for the FreeBSD Project by Poul-Henning Kamp
and NAI Labs, the Security Research Division of Network Associates, Inc.
under DARPA/SPAWAR contract N66001-01-C-8035 ("CBOSS"), as part of the
DARPA CHATS research program.
.Pp
The first precursor for GEOM was a gruesome hack to Minix 1.2 and was
never distributed.
An earlier attempt to implement a less general scheme in FreeBSD
never succeeded.
.Sh AUTHORS
.An "Poul-Henning Kamp" Aq phk@FreeBSD.org