.\"
.\" Copyright (c) 2002 Poul-Henning Kamp
.\" Copyright (c) 2002 Networks Associates Technology, Inc.
.\" All rights reserved.
.\"
.\" This software was developed for the FreeBSD Project by Poul-Henning Kamp
.\" and NAI Labs, the Security Research Division of Network Associates, Inc.
.\" under DARPA/SPAWAR contract N66001-01-C-8035 ("CBOSS"), as part of the
.\" DARPA CHATS research program.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\" 3. The names of the authors may not be used to endorse or promote
.\"    products derived from this software without specific prior written
.\"    permission.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\" $FreeBSD$
.\"
.Dd March 27, 2002
.Os
.Dt GEOM 4
.Sh NAME
.Nm GEOM
.Nd "modular disk I/O request transformation framework"
.Sh DESCRIPTION
The
.Nm
framework provides an infrastructure in which
.Dq classes
can perform transformations on disk I/O requests on their path from
the upper kernel to the device drivers and back.
.Pp
Transformations in a
.Nm
context range from the simple geometric
displacement performed in typical disk partitioning modules over RAID
algorithms and device multipath resolution to full-blown cryptographic
protection of the stored data.
.Pp
Compared to traditional
.Dq "volume management" ,
.Nm
differs from most
and in some cases all previous implementations in the following ways:
.Bl -bullet
.It
.Nm
is extensible.
It is trivially simple to write a new class
of transformation and it will not be given stepchild treatment.
If
someone for some reason wanted to mount IBM MVS diskpacks, a class
recognizing and configuring their VTOC information would be a trivial
matter.
.It
.Nm
is topologically agnostic.
Most volume management implementations
have very strict notions of how classes can fit together; very often
one fixed hierarchy is provided, for instance, subdisk - plex -
volume.
.El
.Pp
Being extensible means that new transformations are treated no differently
than existing transformations.
.Pp
Fixed hierarchies are bad because they make it impossible to express
the intent efficiently.
In the fixed hierarchy above, it is not possible to mirror two
physical disks and then partition the mirror into subdisks; instead
one is forced to make subdisks on the physical volumes and to mirror
these two and two, resulting in a much more complex configuration.
.Nm
on the other hand does not care in which order things are done;
the only restriction is that cycles in the graph are not allowed.
.Sh "TERMINOLOGY AND TOPOLOGY"
.Nm
is quite object-oriented and consequently the terminology
borrows a lot of context and semantics from the OO vocabulary:
.Pp
A
.Dq class ,
represented by the data structure
.Vt g_class ,
implements one
particular kind of transformation.
Typical examples are MBR disk
partition, BSD disklabel, and RAID5 classes.
.Pp
An instance of a class is called a
.Dq geom
and represented by the data structure
.Vt g_geom .
In a typical i386
.Fx
system, there
will be one geom of class MBR for each disk.
.Pp
A
.Dq provider ,
represented by the data structure
.Vt g_provider ,
is the front gate at which a geom offers service.
A provider is
.Do
a disk-like thing which appears in
.Pa /dev
.Dc - a logical
disk in other words.
All providers have three main properties:
.Dq name ,
.Dq sectorsize
and
.Dq size .
.Pp
A
.Dq consumer
is the backdoor through which a geom connects to another
geom's provider and through which I/O requests are sent.
.Pp
The topological relationships between these entities are as follows:
.Bl -bullet
.It
A class has zero or more geom instances.
.It
A geom has exactly one class it is derived from.
.It
A geom has zero or more consumers.
.It
A geom has zero or more providers.
.It
A consumer can be attached to zero or one providers.
.It
A provider can have zero or more consumers attached.
.El
.Pp
All geoms have a rank-number assigned, which is used to detect and
prevent loops in the acyclic directed graph.
This rank number is
assigned as follows:
.Bl -enum
.It
A geom with no attached consumers has rank=1.
.It
A geom with attached consumers has a rank one higher than the
highest rank of the geoms of the providers its consumers are
attached to.
.El
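.Pp
The two ranking rules can be sketched in a few lines of C.
This is a minimal illustration under stated assumptions, not the
kernel's implementation: the
.Vt toy_geom ,
.Vt toy_provider
and
.Vt toy_consumer
structures, the
.Dv TOY_MAX_CONSUMERS
limit and the
.Fn toy_rank
function are hypothetical names invented for this example.
.Bd -literal -offset indent
```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_CONSUMERS 4	/* hypothetical limit for the sketch */

struct toy_geom;

/* A provider belongs to exactly one geom. */
struct toy_provider {
	struct toy_geom *geom;
};

/* A consumer is attached to zero or one providers. */
struct toy_consumer {
	struct toy_provider *provider;	/* NULL when detached */
};

/* A geom has zero or more consumers. */
struct toy_geom {
	struct toy_consumer consumers[TOY_MAX_CONSUMERS];
	size_t nconsumers;
};

/*
 * Rule 1: a geom with no attached consumers has rank 1.
 * Rule 2: otherwise its rank is one higher than the highest rank
 * among the geoms whose providers its consumers are attached to.
 * The recursion terminates because the graph is acyclic.
 */
static int
toy_rank(const struct toy_geom *gp)
{
	int highest = 0;
	size_t i;

	assert(gp->nconsumers <= TOY_MAX_CONSUMERS);
	for (i = 0; i < gp->nconsumers; i++) {
		const struct toy_provider *pp = gp->consumers[i].provider;
		int below;

		if (pp == NULL)
			continue;	/* detached consumers do not count */
		below = toy_rank(pp->geom);
		if (below > highest)
			highest = below;
	}
	return (highest + 1);
}
```
.Ed
.Pp
For a disk geom with no consumers, an MBR geom consuming the disk and a
BSD geom consuming one of the MBR geom's providers,
.Fn toy_rank
yields 1, 2 and 3 respectively.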
.Sh "SPECIAL TOPOLOGICAL MANEUVERS"
In addition to the straightforward attach, which attaches a consumer
to a provider, and detach, which breaks the bond, a number of special
topological maneuvers exist to facilitate configuration and to
improve the overall flexibility.
.Bl -inset
.It Em TASTING
is a process that happens whenever a new class or new provider
is created, and it provides the class a chance to automatically configure an
instance on providers which it recognizes as its own.
A typical example is the MBR disk-partition class which will look for
the MBR table in the first sector and, if found and validated, will
instantiate a geom to multiplex according to the contents of the MBR.
.Pp
A new class will be offered to all existing providers in turn and a new
provider will be offered to all classes in turn.
.Pp
Exactly what a class does to recognize if it should accept the offered
provider is not defined by
.Nm ,
but the sensible set of options is:
.Bl -bullet
.It
Examine specific data structures on the disk.
.It
Examine properties like
.Dq sectorsize
or
.Dq mediasize
for the provider.
.It
Examine the rank number of the provider's geom.
.It
Examine the class name of the provider's geom.
.El
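.Pp
The offering of a new provider to every class in turn can be pictured
with the following sketch.
It is only an illustration: the
.Vt toy_tprovider
and
.Vt toy_class
structures and the
.Fn toy_mbr_taste
and
.Fn toy_offer_provider
functions are hypothetical, and checking nothing but the 0x55 0xAA
signature in the last two bytes of a 512-byte first sector is an
assumed, deliberately simplistic stand-in for real MBR validation.
.Bd -literal -offset indent
```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a provider: its first sector and size. */
struct toy_tprovider {
	const unsigned char *sector0;
	size_t sectorsize;
};

/* A taste function returns nonzero when the class accepts. */
typedef int toy_taste_t(const struct toy_tprovider *);

struct toy_class {
	const char *name;
	toy_taste_t *taste;
};

/* Examine a data structure on the disk: the MBR signature bytes. */
static int
toy_mbr_taste(const struct toy_tprovider *pp)
{
	if (pp->sectorsize < 512)
		return (0);
	return (pp->sector0[510] == 0x55 && pp->sector0[511] == 0xaa);
}

/* Offer one new provider to all classes in turn; count the takers. */
static size_t
toy_offer_provider(const struct toy_class *classes, size_t nclasses,
    const struct toy_tprovider *pp)
{
	size_t i, accepted = 0;

	assert(pp->sector0 != NULL);
	for (i = 0; i < nclasses; i++)
		if (classes[i].taste(pp))
			accepted++;
	return (accepted);
}
```
.Ed
.Pp
A provider whose first sector is all zeroes is accepted by no class in
this sketch, while one carrying the signature is accepted by the MBR
class.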
.It Em ORPHANIZATION
is the process by which a provider is removed while
it potentially is still being used.
.Pp
When a geom orphans a provider, all future I/O requests will
.Dq bounce
on the provider with an error code set by the geom.
Any
consumers attached to the provider will receive notification about
the orphanization when the event loop gets around to it, and they
can take appropriate action at that time.
.Pp
A geom which came into being as a result of a normal taste operation
should self-destruct unless it has a way to keep functioning whilst
lacking the orphaned provider.
Geoms like disk slicers should therefore self-destruct whereas
RAID5 or mirror geoms will be able to continue as long as they do
not lose quorum.
.Pp
When a provider is orphaned, this does not necessarily result in any
immediate change in the topology: any attached consumers are still
attached, any opened paths are still open, any outstanding I/O
requests are still outstanding.
.Pp
The typical scenario is:
.Pp
.Bl -bullet -offset indent -compact
.It
A device driver detects a disk has departed and orphans the provider for it.
.It
The geoms on top of the disk receive the orphanization event and
orphan all their providers in turn.
Providers which are not attached to will typically self-destruct
right away.
This process continues in a quasi-recursive fashion until all
relevant pieces of the tree have heard the bad news.
.It
Eventually the buck stops when it reaches geom_dev at the top
of the stack.
.It
Geom_dev will call
.Xr destroy_dev 9
to stop any more requests from
coming in.
It will sleep until any and all outstanding I/O requests have
been returned.
It will explicitly close (i.e., zero the access counts), a change
which will propagate all the way down through the mesh.
It will then detach and destroy its geom.
.It
The geom whose provider is now detached will destroy the provider,
detach and destroy its consumer and destroy its geom.
.It
This process percolates all the way down through the mesh, until
the cleanup is complete.
.El
.Pp
While this approach seems byzantine, it does provide the maximum
flexibility and robustness in handling disappearing devices.
.Pp
The one absolutely crucial detail to be aware of is that if the
device driver does not return all I/O requests, the tree will
not unravel.
.It Em SPOILING
is a special case of orphanization used to protect
against stale metadata.
It is probably easiest to understand spoiling by going through
an example.
.Pp
Imagine a disk,
.Pa da0 ,
on top of which an MBR geom provides
.Pa da0s1
and
.Pa da0s2 ,
and on top of
.Pa da0s1
a BSD geom provides
.Pa da0s1a
through
.Pa da0s1e ,
and that both the MBR and BSD geoms have
autoconfigured based on data structures on the disk media.
Now imagine the case where
.Pa da0
is opened for writing and those
data structures are modified or overwritten: now the geoms would
be operating on stale metadata unless some notification system
can inform them otherwise.
.Pp
To avoid this situation, when the open of
.Pa da0
for write happens,
all attached consumers are told about this and geoms like
MBR and BSD will self-destruct as a result.
When
.Pa da0
is closed, it will be offered for tasting again
and, if the data structures for MBR and BSD are still there, new
geoms will instantiate themselves anew.
.Pp
Now for the fine print:
.Pp
If any of the paths through the MBR or BSD module were open, they
would have opened downwards with an exclusive bit thus rendering it
impossible to open
.Pa da0
for writing in that case.
Conversely,
the requested exclusive bit would render it impossible to open a
path through the MBR geom while
.Pa da0
is open for writing.
.Pp
From this it also follows that changing the size of open geoms can
only be done with their cooperation.
.Pp
Finally: the spoiling only happens when the write count goes from
zero to non-zero and the retasting happens only when the write count goes
from non-zero to zero.
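.Pp
The edge-triggered nature of spoiling and retasting can be made
concrete with a small sketch.
The
.Vt toy_wprov
structure, the
.Vt toy_event
enumeration and the
.Fn toy_wprov_access
function are hypothetical names used only for illustration; the sketch
tracks nothing but a write count and ignores read and exclusive counts.
.Bd -literal -offset indent
```c
#include <assert.h>

enum toy_event { TOY_NONE, TOY_SPOIL, TOY_RETASTE };

/* Hypothetical provider state: only the write-open count. */
struct toy_wprov {
	int write_count;
};

/*
 * Apply a change to the write count and report which event, if any,
 * the transition triggers.  Only the 0 -> non-zero and
 * non-zero -> 0 edges matter; changes in between are silent.
 */
static enum toy_event
toy_wprov_access(struct toy_wprov *pp, int delta_w)
{
	int before = pp->write_count;
	int after = before + delta_w;

	assert(after >= 0);
	pp->write_count = after;
	if (before == 0 && after > 0)
		return (TOY_SPOIL);
	if (before > 0 && after == 0)
		return (TOY_RETASTE);
	return (TOY_NONE);
}
```
.Ed
.Pp
Opening the provider for writing twice and then closing it twice
produces, in order, a spoil event, no event, no event and a retaste
event.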
.It Em INSERT/DELETE
are very special operations which allow a new geom
to be instantiated between a consumer and a provider attached to
each other and to remove it again.
.Pp
To understand the utility of this, imagine a provider
being mounted as a file system.
Between the DEVFS geom's consumer and its provider we insert
a mirror module which configures itself with one mirror
copy and consequently is transparent to the I/O requests
on the path.
We can now configure yet another mirror copy on the mirror geom,
request a synchronization, and finally drop the first mirror
copy.
We have now, in essence, moved a mounted file system from one
disk to another while it was being used.
At this point the mirror geom can be deleted from the path
again; it has served its purpose.
.It Em CONFIGURE
is the process where the administrator issues instructions
for a particular class to instantiate itself.
There are multiple
ways to express intent in this case - a particular provider may be
specified with a level of override forcing, for instance, a BSD
disklabel module to attach to a provider which was not found palatable
during the TASTE operation.
.Pp
Finally, I/O is the reason we even do this: it concerns itself with
sending I/O requests through the graph.
.It Em "I/O REQUESTS" ,
represented by
.Vt "struct bio" ,
originate at a consumer,
are scheduled on its attached provider and, when processed, are returned
to the consumer.
It is important to realize that the
.Vt "struct bio"
which enters through the provider of a particular geom does not
.Do
come out on the other side
.Dc .
Even simple transformations like MBR and BSD will clone the
.Vt "struct bio" ,
modify the clone, and schedule the clone on their
own consumer.
Note that cloning the
.Vt "struct bio"
does not involve cloning the
actual data area specified in the I/O request.
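.Pp
The clone-and-modify pattern can be sketched as follows for a simple
slicing transformation.
The
.Vt toy_bio
and
.Vt toy_slice
structures and the
.Fn toy_slice_start
function are hypothetical and far simpler than the kernel's
.Vt "struct bio" ;
the helpers the kernel actually provides are described in
.Xr g_bio 9 .
.Bd -literal -offset indent
```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily reduced I/O request. */
struct toy_bio {
	long long offset;	/* byte offset on the provider */
	size_t length;		/* byte count */
	void *data;		/* data area, shared with any clone */
	struct toy_bio *parent;	/* request this one was cloned from */
};

/* Hypothetical slice: a window onto the underlying provider. */
struct toy_slice {
	long long start;	/* byte offset of the slice */
	long long size;		/* byte length of the slice */
};

/*
 * Clone the incoming request into *clone and translate the clone's
 * offset into the coordinates of the provider below.  The original
 * request is left untouched and the data area is not copied.
 * Returns 0 on success, -1 if the request falls outside the slice.
 */
static int
toy_slice_start(const struct toy_slice *sl, struct toy_bio *bp,
    struct toy_bio *clone)
{
	assert(bp->offset >= 0);
	if (bp->offset > sl->size ||
	    bp->length > (size_t)(sl->size - bp->offset))
		return (-1);
	*clone = *bp;			/* same data pointer */
	clone->offset = bp->offset + sl->start;
	clone->parent = bp;
	return (0);
}
```
.Ed
.Pp
After a successful call, the clone carries the translated offset and
the same data pointer as the original, which is what would be
scheduled on the geom's own consumer.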
.Pp
In total, four different I/O requests exist in
.Nm :
read, write, delete, and
.Dq "get attribute" .
.Pp
Read and write are self-explanatory.
.Pp
Delete indicates that a certain range of data is no longer used
and that it can be erased or freed as the underlying technology
supports.
Technologies like flash adaptation layers can arrange to erase
the relevant blocks before they will become reassigned and
cryptographic devices may want to fill random bits into the
range to reduce the amount of data available for attack.
.Pp
It is important to recognize that a delete indication is not a
request and consequently there is no guarantee that the data actually
will be erased or made unavailable unless guaranteed by specific
geoms in the graph.
If
.Dq "secure delete"
semantics are required, a
geom should be pushed which converts delete indications into (a
sequence of) write requests.
.Pp
.Dq "Get attribute"
supports inspection and manipulation
of out-of-band attributes on a particular provider or path.
Attributes are named by
.Tn ASCII
strings and they will be discussed in
a separate section below.
.El
.Pp
(Stay tuned while the author rests his brain and fingers: more to come.)
.Sh DIAGNOSTICS
Several flags are provided for tracing
.Nm
operations and unlocking
protection mechanisms via the
.Va kern.geom.debugflags
sysctl.
All of these flags are off by default, and great care should be taken in
turning them on.
.Bl -tag -width indent
.It 0x01 Pq Dv G_T_TOPOLOGY
Provide tracing of topology change events.
.It 0x02 Pq Dv G_T_BIO
Provide tracing of buffer I/O requests.
.It 0x04 Pq Dv G_T_ACCESS
Provide tracing of access check controls.
.It 0x08 (unused)
.It 0x10 (allow foot shooting)
Allow writing to Rank 1 providers.
This would, for example, allow the super-user to overwrite the MBR on the root
disk or write random sectors elsewhere to a mounted disk.
The implications are obvious.
.It 0x20 Pq Dv G_T_DETAILS
This appears to be unused at this time.
.It 0x40 Pq Dv G_F_DISKIOCTL
This appears to be unused at this time.
.It 0x80 Pq Dv G_F_CTLDUMP
Dump contents of gctl requests.
.El
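.Pp
Because each flag is a single bit, several flags can be combined in one
value.
The following sketch shows how such a value might be taken apart; the
.Dv TOY_
constants and both functions are hypothetical names that merely mirror
the numeric values listed above and are not the kernel's definitions.
.Bd -literal -offset indent
```c
#include <assert.h>

/* Hypothetical names for the bit values listed in the table. */
#define TOY_T_TOPOLOGY	0x01u
#define TOY_T_BIO	0x02u
#define TOY_T_ACCESS	0x04u
#define TOY_FOOTSHOOT	0x10u
#define TOY_CTLDUMP	0x80u

#define TOY_TRACE_MASK	(TOY_T_TOPOLOGY | TOY_T_BIO | TOY_T_ACCESS)

/* Return nonzero if the given single-bit flag is set. */
static int
toy_flag_is_set(unsigned int flags, unsigned int bit)
{
	/* Exactly one bit must be passed in. */
	assert(bit != 0 && (bit & (bit - 1)) == 0);
	return ((flags & bit) != 0);
}

/* Keep only the three tracing bits, dropping everything else. */
static unsigned int
toy_tracing_only(unsigned int flags)
{
	return (flags & TOY_TRACE_MASK);
}
```
.Ed
.Pp
For example, the value 0x13 combines topology tracing, bio tracing and
the foot-shooting override;
.Fn toy_tracing_only
reduces it to 0x03.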
.Sh HISTORY
This software was developed for the
.Fx
Project by
.An Poul-Henning Kamp
and NAI Labs, the Security Research Division of Network Associates, Inc.\&
under DARPA/SPAWAR contract N66001-01-C-8035
.Pq Dq CBOSS ,
as part of the
DARPA CHATS research program.
.Pp
The first precursor for
.Nm
was a gruesome hack to Minix 1.2 and was
never distributed.
An earlier attempt to implement a less general scheme
in
.Fx
never succeeded.
.Sh AUTHORS
.An "Poul-Henning Kamp" Aq phk@FreeBSD.org