#
# CDDL HEADER START
#
# The contents of this file are subject to the terms of the
# Common Development and Distribution License, Version 1.0 only
# (the "License"). You may not use this file except in compliance
# with the License.
#
# You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
# or http://www.opensolaris.org/os/licensing.
# See the License for the specific language governing permissions
# and limitations under the License.
#
# When distributing Covered Code, include this CDDL HEADER in each
# file and include the License file at usr/src/OPENSOLARIS.LICENSE.
# If applicable, add the following below this CDDL HEADER, with the
# fields enclosed by brackets "[]" replaced with your own identifying
# information: Portions Copyright [yyyy] [name of copyright owner]
#
# CDDL HEADER END
#
# Copyright (c) 1995 Sun Microsystems, Inc. All Rights Reserved
#
#ident "%W% %E% SMI"
#
# design notes that are likely to be of general (rather than
# merely historical) interest.

Table of Contents

    Overview                    what filesync does

    Primary Data Structures
        general principles      why they exist
        key concepts            what they represent
        data structures         major structures and their contents

    Overview of Passes          main phases of program execution

    Modules                     list and descriptions of files

    Studying the Code
        active ingredients      a reading list of high points
        the whole thing         a suggested order for everything

    Gross calling structure     who calls whom

    Helpful hints               good things to know

Overview

    The purpose of this program is to compare pairs of directory
    trees with a baseline snapshot, to determine which files have
    changed, and to propagate the changes in order to bring the
    trees back into congruency. The baseline snapshot describes
    size, ownership, ... for all files that filesync is managing
    WHEN THEY WERE LAST IN SYNC.

    The files and directory trees to be compared are determined
    by a relatively flexible (user editable) rules file, whose
    format (packingrules.4) permits files and/or trees to be
    specified explicitly, implicitly, or with wild cards.
    There are also provisions for filtering out unwanted files
    and for running programs to generate lists of files and
    directories to be included or excluded.

    The comparisons begin by comparing the structured name
    spaces. For names that appear in both trees, the files
    are then compared on the basis of type, size, contents,
    ownership and protections. For files that are already
    in the baseline snapshot, if the sizes and modification
    times have not changed, we do not bother to recheck the
    contents.
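
    The shortcut in the last paragraph can be sketched as follows.
    This is only an illustration: the structure and function names
    are hypothetical and are not taken from the filesync sources.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>
#include <time.h>

/* Hypothetical subset of the per-file information (not the real struct). */
struct info_sketch {
    off_t  size;     /* file size in bytes */
    time_t modtime;  /* modification time */
};

/*
 * Returns true if the contents have to be re-examined: either the file
 * has no baseline entry, or its size or modification time differs from
 * what the baseline recorded.
 */
bool
contents_need_check(const struct info_sketch *baseline,
    const struct info_sketch *current)
{
    assert(current != NULL);
    if (baseline == NULL)
        return (true);
    return (baseline->size != current->size ||
        baseline->modtime != current->modtime);
}
```

    A file whose size and modification time both match its baseline
    entry is taken to have unchanged contents, which saves reading it.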

    The reconciliation process (resolving the differences)
    will only propagate a change if it is obvious what should
    be done (one side has changed relative to the snapshot,
    while the other has not). If there are conflicting changes,
    the file is flagged and the user is asked to reconcile the
    differences manually. There are, however, a few switches
    that can be used to constrain the analysis or reconciliation,
    or to force one particular side to win in case of a conflict.
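
    The decision rule above can be sketched as a small function.
    All names here are hypothetical. The sketch deliberately ignores
    refinements the real code may make (for example, both sides
    having changed in the same way), so treat it as an outline of
    the rule rather than as what recon.c does.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the "force one side to win" switches. */
enum force_sketch { FORCE_NONE, FORCE_SRC, FORCE_DST };

enum verdict_sketch {
    V_NOTHING,      /* neither side changed relative to the baseline */
    V_SRC_TO_DST,   /* propagate the source version to the destination */
    V_DST_TO_SRC,   /* propagate the destination version to the source */
    V_CONFLICT      /* both changed: flag for manual reconciliation */
};

enum verdict_sketch
decide(bool src_changed, bool dst_changed, enum force_sketch force)
{
    if (src_changed && dst_changed) {
        /* A conflict is only resolved if a side was forced to win. */
        if (force == FORCE_SRC)
            return (V_SRC_TO_DST);
        if (force == FORCE_DST)
            return (V_DST_TO_SRC);
        return (V_CONFLICT);
    }
    if (src_changed)
        return (V_SRC_TO_DST);
    if (dst_changed)
        return (V_DST_TO_SRC);
    return (V_NOTHING);
}
```

    Note that the force argument matters only when both sides have
    changed; an unambiguous change is propagated regardless.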


Primary Data Structures

    general principles:
        we will build up an in-memory tree that represents
        the union of the name spaces found in the baseline
        and on the source and destination sides.

        keep in mind that the baseline recalls the state of
        files THE LAST TIME THEY WERE IN AGREEMENT. If files
        have disagreed for a long time, the baseline still
        remembers what they were like when they agreed. If
        files have never agreed, the baseline has no notion
        of how they "used to be".

    key concepts:
        a "base pair" is a pair of directories whose
        contents (or a subset of whose contents) are to
        be synchronized. The "base pairs" to be managed
        are specified in the packing rules file.

        associated with each "base pair" is a set of rules
        that describe which files (under those directories)
        are to be kept in sync. Each rule is a list of:
            files and/or directories to be included
            wild cards for files or directories to be included
            programs to generate lists of names for inclusion
            file names to be ignored
            wild cards for file names to be ignored
            programs to generate lists of names for ignoring
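
        as an illustration of the wild card parts of a rule, the
        sketch below checks a name against include and ignore
        patterns using the POSIX fnmatch() call. The function
        names are hypothetical, the precedence of ignore over
        include is an assumption, and rules that run programs
        to generate names are not covered.

```c
#include <assert.h>
#include <fnmatch.h>
#include <stdbool.h>
#include <stddef.h>

/* Returns true if name matches at least one of the n shell patterns. */
bool
matches_any(const char *name, const char *const patterns[], size_t n)
{
    assert(name != NULL);
    for (size_t i = 0; i < n; i++) {
        if (fnmatch(patterns[i], name, 0) == 0)
            return (true);
    }
    return (false);
}

/*
 * A name is selected if some include pattern matches it and no ignore
 * pattern does.  Letting ignore win is an assumption of this sketch.
 */
bool
name_selected(const char *name,
    const char *const include[], size_t n_include,
    const char *const ignore[], size_t n_ignore)
{
    return (matches_any(name, include, n_include) &&
        !matches_any(name, ignore, n_ignore));
}
```

        with include patterns "*.c" and "Makefile" and the ignore
        pattern "*.tmp.c", the name "main.c" is selected while
        "junk.tmp.c" and "core" are not.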

        as a result of the "evaluation" process we build up
        (under each base pair) a tree that represents all of
        the files that we are supposed to keep in sync, and
        contains everything we need to know about each one
        of those files. The structure of the tree mirrors
        the directory hierarchy ... actually the union of the
        three hierarchies (baseline, source and destination).

        for each file, we record interesting information (type,
        size, owner, protection, mod time) and keep separate
        note of what these values were:
            in the baseline, the last time the two sides agreed
            on the source side, as we just examined it
            on the destination side, as we just examined it

    data structures:

        there is an ordered list of "base" structures
        for each base, we maintain
            three lists of associated "rule" descriptions:
                inclusion rules
                exclusion rules
                restriction rules (from the command line)
            a "file" tree, representing all files below the bases
            a list of statistics to be printed as a summary

        for each "rule", we maintain
            some flags describing the type of rule
            the character string that is the rule

        for each "file", we maintain
            sibling and child pointers to give them tree structure
            flags to describe what we have done/should do
            "fileinfo" information from the src, dest, and baseline

            in addition there are some fields that are used
            to add the file to a list of files requiring
            reconciliation and record what happened to it.

        a "fileinfo" structure contains a subset of the information
        that we obtain from a stat call:
            major/minor/inum
            type
            link count
            ownership, protection, and acls
            size
            modification time

        there is also, built up during analysis, a reconciliation
        list. This is an ordered list of "file" structures which
        are believed to describe files that have changed and require
        reconciliation. The ordering is important both for correctness
        and to preserve relative modification times.
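
        a rough picture of the "file" and "fileinfo" structures
        described above is sketched below. Every type and field
        name is hypothetical; the authoritative declarations are
        the ones in database.h. The small counting routine is
        there only to show how the child and sibling pointers
        form a tree.

```c
#include <stddef.h>
#include <sys/types.h>
#include <time.h>

/* Subset of stat information kept for one side of a file (hypothetical). */
struct fileinfo_sketch {
    long    dev_major, dev_minor;   /* device holding the file */
    ino_t   inum;                   /* inode number */
    mode_t  type_and_mode;          /* file type and protection bits */
    nlink_t nlink;                  /* link count */
    uid_t   uid;                    /* owner */
    gid_t   gid;                    /* group */
    off_t   size;                   /* size in bytes */
    time_t  modtime;                /* modification time */
};

enum { SIDE_BASE, SIDE_SRC, SIDE_DST, SIDE_COUNT };

/* One node of the per-base file tree (hypothetical layout). */
struct file_sketch {
    const char *name;
    unsigned flags;                               /* done/should-do bits */
    struct fileinfo_sketch info[SIDE_COUNT];      /* baseline, src, dest */
    struct file_sketch *child;      /* first entry below this directory */
    struct file_sketch *sibling;    /* next entry in the same directory */
    struct file_sketch *next_recon; /* link in the reconciliation list */
};

/* Count the nodes reachable from f through child and sibling pointers. */
size_t
count_files(const struct file_sketch *f)
{
    size_t n = 0;

    for (; f != NULL; f = f->sibling)
        n += 1 + count_files(f->child);
    return (n);
}
```

        keeping the three "fileinfo" records side by side in each
        node is what lets later passes compare the source and the
        destination against the baseline without further stat calls.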

Overview of passes:

    pass I (evaluate)

        stat every file that we might be interested in
        (on both src/dest sides). This includes walking
        the trees under all directories in order to
        find out what files exist and stat-ing all of
        them.

        the main trick in this pass is that there may be
        files we don't want to evaluate (because we are
        limiting our attention to specific files and trees).
        There is a LISTED flag kept in the database that
        tells us whether or not we need to stat/descend any
        given node.

        all restrictions and ignores take effect during this pass.
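
        the pruning effect of the LISTED flag can be sketched on
        an in-memory tree. The node layout and names below are
        hypothetical, and setting the "evaluated" field stands in
        for the stat call the real pass makes.

```c
#include <stdbool.h>
#include <stddef.h>

struct node_sketch {
    const char *name;
    bool listed;                  /* stands in for the LISTED flag */
    bool evaluated;               /* set where real code would stat */
    struct node_sketch *child;    /* first entry below this node */
    struct node_sketch *sibling;  /* next entry at the same level */
};

/*
 * Visit n and its siblings.  A node that is not listed is neither
 * evaluated nor descended into, so everything below it is skipped.
 * Returns the number of nodes evaluated.
 */
size_t
evaluate_sketch(struct node_sketch *n)
{
    size_t count = 0;

    for (; n != NULL; n = n->sibling) {
        if (!n->listed)
            continue;
        n->evaluated = true;
        count += 1 + evaluate_sketch(n->child);
    }
    return (count);
}
```

        in this sketch a listed file below an unlisted directory is
        never reached, which is how limiting attention to specific
        trees keeps the pass from touching everything else.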

    pass II (analyze)

        given the baseline and all of the current stat information
        gained during pass I, figure out what might conceivably
        have changed and queue it for pass III. This pass doesn't
        try to figure out what happened or who should win ... it
        merely identifies candidates for pass III. This pass
        ignores any nodes that were not evaluated during pass I.

        the queueing process, however, determines the order in
        which the files will be processed in pass III, and the
        order is very important.
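
        a minimal sketch of queueing candidates onto an ordered
        list follows. The names are hypothetical, and ordering
        purely by modification time is an assumption made for
        illustration; these notes do not spell out the ordering
        rules that anal.c actually applies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

struct cand_sketch {
    const char *name;
    time_t modtime;             /* newest mod time seen for the file */
    bool differs;               /* a side no longer matches the baseline */
    struct cand_sketch *next;   /* link in the reconciliation list */
};

/* Insert c in ascending modtime order; ties keep their arrival order. */
void
queue_candidate(struct cand_sketch **head, struct cand_sketch *c)
{
    struct cand_sketch **pp = head;

    assert(head != NULL && c != NULL);
    while (*pp != NULL && (*pp)->modtime <= c->modtime)
        pp = &(*pp)->next;
    c->next = *pp;
    *pp = c;
}

/* Queue every entry that differs from the baseline; return how many. */
size_t
analyze_sketch(struct cand_sketch *all, size_t n, struct cand_sketch **head)
{
    size_t queued = 0;

    for (size_t i = 0; i < n; i++) {
        if (all[i].differs) {
            queue_candidate(head, &all[i]);
            queued++;
        }
    }
    return (queued);
}
```

        the point of the sketch is the division of labor: this pass
        only selects and orders candidates, and leaves every decision
        about who should win to pass III.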

    pass III (reconcile)

        process the list of candidates, figuring out what has
        actually changed and which versions deserve to win. If
        it is clear what needs doing, we actually do it in this
        pass.

Modules

    filesync.h
        defines for limits, sizes and return codes
        declarations for global variables (mostly cmd-line parms)
        defines for default file names
        declarations for routines of general interest

    database.h
        data-structures for recording rules
        data-structures for recording information about files
        declarations for routines that operate on/with those structures

    messages.h
        the text of all localizable messages

    debug.h
        definitions and declarations for routines for error
        simulation and bit-map display.

    acls.c
        routines to get, set, compare, and display Access Control Lists
    action.c
        routines to do the real work of copying, deleting, or
        changing ownership in order to make one side agree
        with the other.
    anal.c
        routines to examine the in-core list of files and
        determine what has changed (and therefore what
        files are candidates for reconciliation). This
        analysis includes figuring out which files should
        be links rather than copies.
    base.c
        routines to read and write the baseline file
        routines to search and manipulate the in-core base list
    debug.c
        data structures and routines, used to simulate errors
        and produce debug output, that map between bits (as found
        in various flag words) and character string names for
        their meanings.
    eval.c
        routines to build up the internal tree that describes
        the status of all of the files that are described
        by the current rules.
    files.c
        routines to manipulate file name arguments, including
        wild cards and embedded environment variables.
    ignore.c
        routines to maintain a list of names or patterns for
        files to be ignored, and to check file names against
        that list.
    main.c
        global variables, cmd-line parameter processing,
        parameter validation, error reporting, and the
        main loop.
    recon.c
        routines to examine a list of files that appear to
        have changed, and figure out what the appropriate
        reconciliation course of action is.
    rename.c
        routines to search the tree to determine whether
        or not any creates/deletes are actually renames.
    rules.c
        routines to read and write the rules file
        routines to add rules and enumerate in-core rules

    filecheck.c
        not really a part of filesync, but rather a utility
        program that is used in the test suite. It extracts
        information about files that is not readily available
        from other unix commands.
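
    the bit-to-name mapping mentioned under debug.c can be pictured
    with the sketch below. The table type, the function and the bit
    values used with it are hypothetical and are not copied from
    debug.c; only the general technique is illustrated.

```c
#include <stdio.h>
#include <string.h>

/* One entry of a bit-to-name table (hypothetical). */
struct bitname_sketch {
    unsigned    bit;
    const char *name;
};

/*
 * Write the names of the bits set in flags into buf, separated by '|'.
 * Stops early if buf is too small, in which case the last name may be
 * cut short.  Returns buf.
 */
char *
show_bits(const struct bitname_sketch *map, size_t n, unsigned flags,
    char *buf, size_t len)
{
    size_t used = 0;

    if (len == 0)
        return (buf);
    buf[0] = '\0';
    for (size_t i = 0; i < n; i++) {
        if ((flags & map[i].bit) == 0)
            continue;
        int w = snprintf(buf + used, len - used, "%s%s",
            used ? "|" : "", map[i].name);
        if (w < 0 || (size_t)w >= len - used)
            break;
        used += (size_t)w;
    }
    return (buf);
}
```

    given a table that names bits 0x01, 0x02 and 0x04 as "NEW",
    "LISTED" and "SPARSE", the flag word 0x05 is rendered as
    "NEW|SPARSE".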

Comments on studying the code

    if you are only interested in the "active ingredients":

        read the above notes on data structures and then

        read the structure declarations in database.h

        read the above notes overviewing the passes

        in recon.c: read reconcile

            this routine almost makes sense on its own,
            and it is unquestionably the most important
            routine in the entire program. Everything
            else just gathers data for reconcile to use,
            or updates the books to reflect the changes.

        in eval.c: read evaluate, eval_file, walker, and note_info

            this is the main guts of pass I

        in anal.c: read analyze, check_file, check_changes & queue_file

            this is the main guts of pass II

    if you want to read the whole thing:

        the routines in the following modules do fundamentally
        simple things in simple ways, and can (for the most part)
        be understood in vacuo. The things they do are sufficiently
        obvious that you can probably understand the more interesting
        code without having read them at all.

            base.c
            rules.c
            files.c
            debug.c
            ignore.c
            acls.c

        the routines in the following modules constitute the real
        meat of the program, and while they are broken into
        specialized modules, they probably need to be understood
        as an organic whole:

            main.c      setup and control
            eval.c      pass I
            anal.c      pass II
            recon.c     pass III
            action.c    execution and book-keeping
            rename.c    a special case for a common situation


Gross calling structure / flow of control

    main.c:main
        findfiles
        read_baseline
        read_rules
        if new rules
            add_base
            add_include
        evaluate
        analyze
        write_baseline
        write_summary

    eval.c:evaluate
        add_file_to_base
        add_glob
        add_run
        ignore_pgm
        ignore_file
        ignore_expr
        eval_file

    eval.c:eval_file
        note_info
        nftw
            walker
                note_info

    anal.c:analyze
        check_file
        reconcile

    anal.c:check_file
        check_changes
        queue_file

    recon.c:reconcile
        samedata
        samestuff
        do_copy
            copy
            do_like
            update_info
        do_like
        do_remove

Helpful Hints

    the "file" structure contains a bunch of flags. Many of them
    just summarize what we know about the file (e.g. where it was
    found). Others are more subtle and control the evaluation
    process or the writing out of the baseline file. You can't
    really understand the processing unless you understand what
    these flags mean.

        F_NEW           added by a new rule

        F_LISTED        this name was generated by a rule

        F_SPARSE        this directory is an intermediate on
                        the way to a name generated by a rule
                        and should not be recursively walked.

        F_EVALUATE      this node was found in evaluation and
                        has up-to-date stat information

        F_CONFLICT      there is a conflict on this node so
                        baseline should remain unchanged

        F_REMOVE        this node should be purged from the baseline

        F_STAT_ERROR    it was impossible to stat this file
                        (and anything below it)

    the implications of these flags on processing are

        F_NEW, F_LISTED, F_SPARSE

            affect whether or not a particular node should
            be included in the evaluation pass.

            in some situations, only new rules are interpreted.

            listed files and directories should be evaluated
            and analyzed. sparse directories should not be
            recursively enumerated.

        F_EVALUATE

            determines whether or not a node is included
            in the analysis pass. Only nodes that have
            been evaluated will be analyzed.

        F_CONFLICT, F_REMOVE, F_EVALUATE

            affect how a node should be written back into the
            baseline file.

            if there is a conflict or we haven't evaluated
            a node, we won't update the baseline.

            if a node is marked for removal, it will be
            excluded from the baseline when it is written out.

        F_STAT_ERROR

            if we could not get proper status information
            about a file (or the tree under it) we cannot,
            with any confidence, determine what its state
            is or do anything about it. Such files are
            flagged as "in conflict".

            it is somewhat kinky that we put error flagged
            files on the reconciliation list. We do this
            because this is the easiest way to pull them
            out for reporting as conflicts.
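
    the flag implications above can be summarized as a few
    predicates. The flag names come from these notes, but the bit
    values, the function names, and the choice to test F_REMOVE
    before F_CONFLICT are assumptions of this sketch; the real
    definitions live in database.h.

```c
#include <stdbool.h>

/* Hypothetical bit values for the flags described above. */
#define F_NEW        0x01
#define F_LISTED     0x02
#define F_SPARSE     0x04
#define F_EVALUATE   0x08
#define F_CONFLICT   0x10
#define F_REMOVE     0x20
#define F_STAT_ERROR 0x40

enum baseline_action_sketch {
    BL_KEEP_OLD,    /* leave the existing baseline entry unchanged */
    BL_UPDATE,      /* write the newly agreed state */
    BL_DROP         /* exclude the node from the baseline */
};

/* Only nodes that were evaluated in pass I are analyzed in pass II. */
bool
should_analyze(unsigned flags)
{
    return ((flags & F_EVALUATE) != 0);
}

/* A file we could not stat is treated as being in conflict. */
unsigned
note_stat_error(unsigned flags)
{
    return (flags | F_STAT_ERROR | F_CONFLICT);
}

/* Decide how a node is written back to the baseline file. */
enum baseline_action_sketch
baseline_action(unsigned flags)
{
    if (flags & F_REMOVE)
        return (BL_DROP);
    if ((flags & F_CONFLICT) || !(flags & F_EVALUATE))
        return (BL_KEEP_OLD);
    return (BL_UPDATE);
}
```

    a node is therefore rewritten in the baseline only when it was
    evaluated, is not in conflict, and is not marked for removal.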