The Regents of the University of California. All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
1. Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
3. Neither the name of the University nor the names of its contributors
may be used to endorse or promote products derived from this software
without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
SUCH DAMAGE.
@(#)2.2.t 8.1 (Berkeley) 6/8/93
.sh "File system
Overview
The file system abstraction provides access to a hierarchical file system structure. The file system contains directories (each of which may contain other sub-directories) as well as files and references to other objects such as devices and inter-process communications sockets.
Each file is organized as a linear array of bytes. No record boundaries or system related information is present in a file. Files may be read and written in a random-access fashion. The user may read the data in a directory as though it were an ordinary file to determine the names of the contained files, but only the system may write into the directories. The file system stores only a small amount of ownership, protection and usage information with a file.

Naming
The file system calls take path name arguments. These consist of zero or more component file names separated by ``/\^'' characters, where each file name is up to 255 ASCII characters excluding null and ``/\^''.
Each process always has two naming contexts: one for the root directory of the file system and one for the current working directory. These are used by the system in the filename translation process. If a path name begins with a ``/\^'', it is called a full path name and interpreted relative to the root directory context. If the path name does not begin with a ``/\^'' it is called a relative path name and interpreted relative to the current directory context.
The system limits the total length of a path name to 1024 characters.
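The naming rules above can be checked mechanically. The following is a minimal sketch, not part of the system interface: the function and macro names are hypothetical, and it assumes the terminating null does not count toward the 1024 character limit, which the text does not state.

```c
#include <assert.h>
#include <string.h>

/* Limits quoted in the text; the macro names are illustrative only. */
#define MAX_COMPONENT_LEN 255
#define MAX_PATH_LEN      1024

/* Returns 1 if the path name begins with '/', i.e. is a full path name
 * interpreted relative to the root directory context. */
int is_full_path_name(const char *path)
{
    assert(path != NULL);
    return path[0] == '/';
}

/* Returns 1 if the total length and every component fit the limits. */
int path_within_limits(const char *path)
{
    size_t component = 0;

    assert(path != NULL);
    if (strlen(path) > MAX_PATH_LEN)
        return 0;
    for (; *path != '\0'; path++) {
        if (*path == '/')
            component = 0;      /* separator: start a new component */
        else if (++component > MAX_COMPONENT_LEN)
            return 0;           /* component longer than 255 characters */
    }
    return 1;
}
```

A caller might reject a name with path_within_limits before passing it to a file system call, and use is_full_path_name to decide which naming context applies.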
The file name ``..'' in each directory refers to the parent directory of that directory. The parent directory of the root of the file system is the root directory itself.
The calls

    chdir(path);
    char *path;

    chroot(path);
    char *path;

change the current working directory and root directory context of a process. Only the super-user can change the root directory context of a process.

Creation and removal
The file system allows directories, files, special devices, and ``portals'' to be created and removed from the file system.

Directory creation and removal
A directory is created with the mkdir system call:

    mkdir(path, mode);
    char *path; int mode;

where the mode is defined as for files (see below). Directories are removed with the rmdir system call:

    rmdir(path);
    char *path;

A directory must be empty if it is to be deleted.

File creation
Files are created with the open system call,

    fd = open(path, oflag, mode);
    result int fd; char *path; int oflag, mode;

The path parameter specifies the name of the file to be created. The oflag parameter must include O_CREAT from below to cause the file to be created. Bits for oflag are defined in <sys/file.h>:

._d
#define O_RDONLY    000     /* open for reading */
#define O_WRONLY    001     /* open for writing */
#define O_RDWR      002     /* open for read & write */
#define O_NDELAY    004     /* non-blocking open */
#define O_APPEND    010     /* append on each write */
#define O_CREAT     01000   /* open with file create */
#define O_TRUNC     02000   /* open with truncation */
#define O_EXCL      04000   /* error on create if file exists */
One of O_RDONLY, O_WRONLY and O_RDWR should be specified, indicating which operations are to be performed on the open file. The operations will be checked against the user's access rights to the file before allowing the open to succeed. Specifying O_APPEND causes writes to automatically append to the file. The flag O_CREAT causes the file to be created if it does not exist, owned by the current user and the group of the containing directory. The protection for the new file is specified in mode. The file mode is used as a three-digit octal number. Each digit encodes read access as 4, write access as 2 and execute access as 1, or'ed together. The 0700 bits describe owner access, the 070 bits describe the access rights for processes in the same group as the file, and the 07 bits describe the access rights for other processes.
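As an illustration of the mode encoding and of the O_CREAT and O_EXCL flags listed above, the following sketch builds a mode from one octal digit per class and creates a file that must not already exist. It assumes a current system where the flags are declared in <fcntl.h>; the names make_mode, create_exclusive and demo_exclusive_create are hypothetical, as is the temporary directory under /tmp.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>

/* Build a three-digit octal mode from one digit per class; each digit
 * is the or of 4 (read), 2 (write) and 1 (execute). */
int make_mode(int owner, int group, int other)
{
    assert(owner >= 0 && owner <= 7);
    assert(group >= 0 && group <= 7);
    assert(other >= 0 && other <= 7);
    return (owner << 6) | (group << 3) | other;
}

/* Create path for writing, failing if the name already exists.
 * Returns a descriptor, or -1 with errno set. */
int create_exclusive(const char *path, int mode)
{
    return open(path, O_WRONLY | O_CREAT | O_EXCL, (mode_t)mode);
}

/* Returns 0 if a second exclusive create of the same name is refused. */
int demo_exclusive_create(void)
{
    char dir[] = "/tmp/fsdemoXXXXXX";
    char path[64];
    int fd, second, saved, ok;

    if (mkdtemp(dir) == NULL)
        return -1;
    snprintf(path, sizeof path, "%s/lock", dir);

    fd = create_exclusive(path, make_mode(6, 4, 0));      /* mode 0640 */
    second = create_exclusive(path, make_mode(6, 4, 0));  /* should fail */
    saved = errno;

    ok = (fd >= 0 && second == -1 && saved == EEXIST);
    if (fd >= 0)
        close(fd);
    if (second >= 0)
        close(second);
    unlink(path);
    rmdir(dir);
    return ok ? 0 : -1;
}
```

The protection actually given to the new file may be narrower than the requested mode on systems that apply a file creation mask.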
If the open requests creation with O_EXCL and the file already exists, the open fails without affecting the file in any way. This provides a simple exclusive access facility. If the name exists but is a symbolic link, the open fails whether or not the file named by the link exists.

Creating references to devices
The file system allows entries which reference peripheral devices. Peripherals are distinguished as block or character devices according to their ability to support block-oriented operations. Devices are identified by their ``major'' and ``minor'' device numbers. The major device number determines the kind of peripheral it is, while the minor device number indicates one of possibly many peripherals of that kind. Structured devices have all operations performed internally in ``block'' quantities while unstructured devices often have a number of special ioctl operations, and may have input and output performed in varying units. The mknod call creates special entries:

    mknod(path, mode, dev);
    char *path; int mode, dev;

where mode is formed from the object type and access permissions. The parameter dev is a configuration dependent parameter used to identify specific character or block I/O devices.

Portal creation\(dg
.FS
\(dg The portal call is not implemented in 4.3BSD.
.FE
The call

    fd = portal(name, server, param, dtype, protocol, domain, socktype)
    result int fd; char *name, *server, *param; int dtype, protocol;
    int domain, socktype;

places a name in the file system name space that causes connection to a server process when the name is used. The portal call returns an active portal in fd as though an access had occurred to activate an inactive portal, as now described.
When an inactive portal is accessed, the system sets up a socket of the specified socktype in the specified communications domain (see section 2.3), and creates the server process, giving it the specified param as argument to help it identify the portal, and also giving it the newly created socket as descriptor number 0. The accessor of the portal will create a socket in the same domain and connect to the server. The user will then wrap the socket in the specified protocol to create an object of the required descriptor type dtype and proceed with the operation which was in progress before the portal was encountered.
While the server process holds the socket (which it received as fd from the portal call on descriptor 0 at activation) further references will result in connections being made to the same socket.

File, device, and portal removal
A reference to a file, special device or portal may be removed with the unlink call,

    unlink(path);
    char *path;

The caller must have write access to the directory in which the file is located for this call to be successful.

Reading and modifying file attributes
Detailed information about the attributes of a file may be obtained with the calls:

    #include <sys/stat.h>

    stat(path, stb);
    char *path; result struct stat *stb;

    fstat(fd, stb);
    int fd; result struct stat *stb;

The stat structure includes the file type, protection, ownership, access times, size, and a count of hard links. If the file is a symbolic link, then the status of the link itself (rather than the file the link references) may be found using the lstat call:

    lstat(path, stb);
    char *path; result struct stat *stb;
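A short sketch of retrieving two of these attributes follows. It assumes the field names st_size and st_nlink from a current <sys/stat.h>; the helper size_and_links, the demonstration function and the temporary directory under /tmp are hypothetical.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>

/* Returns the size in bytes and the hard link count of path through
 * the result parameters; 0 on success, -1 if stat fails. */
int size_and_links(const char *path, long long *size, long *links)
{
    struct stat stb;

    assert(path != NULL && size != NULL && links != NULL);
    if (stat(path, &stb) < 0)
        return -1;
    *size = (long long)stb.st_size;
    *links = (long)stb.st_nlink;
    return 0;
}

/* Creates a five-byte file and checks what stat reports for it.
 * Returns 0 if the size is 5 and the link count is 1. */
int demo_stat(void)
{
    char dir[] = "/tmp/fsdemoXXXXXX";
    char path[64];
    long long size = -1;
    long links = -1;
    int fd, ok = 0;

    if (mkdtemp(dir) == NULL)
        return -1;
    snprintf(path, sizeof path, "%s/data", dir);

    fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0600);
    if (fd >= 0 && write(fd, "hello", 5) == 5 &&
        size_and_links(path, &size, &links) == 0)
        ok = (size == 5 && links == 1);

    if (fd >= 0)
        close(fd);
    unlink(path);
    rmdir(dir);
    return ok ? 0 : -1;
}
```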
A newly created file is assigned the user id of the process that created it and the group id of the directory in which it was created. The ownership of a file may be changed by either of the calls

    chown(path, owner, group);
    char *path; int owner, group;

    fchown(fd, owner, group);
    int fd, owner, group;
In addition to ownership, each file has three levels of access protection associated with it. These levels are owner relative, group relative, and global (all users and groups). Each level of access has separate indicators for read permission, write permission, and execute permission. The protection bits associated with a file may be set by either of the calls:

    chmod(path, mode);
    char *path; int mode;

    fchmod(fd, mode);
    int fd, mode;

where mode is a value indicating the new protection of the file, as listed in section 2.2.3.2.
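The effect of the two calls can be confirmed by reading the protection back with stat. The following is a sketch under the assumption of a current POSIX-style system; protection_bits and demo_chmod are hypothetical names, and the file lives in a temporary directory under /tmp.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>

/* Returns the nine owner, group and other protection bits of path,
 * or -1 if stat fails. */
int protection_bits(const char *path)
{
    struct stat stb;

    assert(path != NULL);
    if (stat(path, &stb) < 0)
        return -1;
    return (int)(stb.st_mode & 0777);
}

/* Changes the protection by name and then by descriptor.
 * Returns 0 if both changes are visible through stat. */
int demo_chmod(void)
{
    char dir[] = "/tmp/fsdemoXXXXXX";
    char path[64];
    int fd, ok = 0;

    if (mkdtemp(dir) == NULL)
        return -1;
    snprintf(path, sizeof path, "%s/data", dir);

    fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0600);
    if (fd >= 0 &&
        chmod(path, 0640) == 0 && protection_bits(path) == 0640 &&
        fchmod(fd, 0400) == 0 && protection_bits(path) == 0400)
        ok = 1;

    if (fd >= 0)
        close(fd);
    unlink(path);
    rmdir(dir);
    return ok ? 0 : -1;
}
```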
Finally, the access and modify times on a file may be set by the call:

    utimes(path, tvp);
    char *path; struct timeval *tvp[2];

This is particularly useful when moving files between media, to preserve relationships between the times the file was modified.

Links and renaming
Links allow multiple names for a file to exist. Links exist independently of the file linked to.
Two types of links exist, hard links and symbolic links. A hard link is a reference counting mechanism that allows a file to have multiple names within the same file system. Symbolic links cause string substitution during the pathname interpretation process.
Hard links and symbolic links have different properties. A hard link ensures the target file will always be accessible, even after its original directory entry is removed; no such guarantee exists for a symbolic link. Symbolic links can span file system boundaries.
The following calls create a new link, named path2, to path1:

    link(path1, path2);
    char *path1, *path2;

    symlink(path1, path2);
    char *path1, *path2;

The unlink primitive may be used to remove either type of link.
If a file is a symbolic link, the ``value'' of the link may be read with the readlink call,

    len = readlink(path, buf, bufsize);
    result int len; result char *path, *buf; int bufsize;

This call returns, in buf, the null-terminated string substituted into pathnames passing through path\|.
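The difference between the two kinds of link can be demonstrated by removing the original name. The sketch below is an illustration under stated assumptions rather than a definitive test: the helper and file names are hypothetical, the files are created in a temporary directory under /tmp, and the code terminates the readlink buffer itself because current readlink implementations return the length without appending a null byte.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>

/* Reads the value of a symbolic link into buf as a C string.
 * Returns the length, or -1 on failure. */
long read_link_value(const char *path, char *buf, size_t size)
{
    ssize_t len;

    assert(path != NULL && buf != NULL && size > 0);
    len = readlink(path, buf, size - 1);
    if (len < 0)
        return -1;
    buf[len] = '\0';    /* terminate explicitly */
    return (long)len;
}

/* Returns 0 if the hard link survives removal of the original name
 * while the symbolic link is left dangling. */
int demo_links(void)
{
    char dir[] = "/tmp/fsdemoXXXXXX";
    char orig[64], hard[64], soft[64], value[64];
    struct stat stb;
    int fd, ok = 0;

    if (mkdtemp(dir) == NULL)
        return -1;
    snprintf(orig, sizeof orig, "%s/orig", dir);
    snprintf(hard, sizeof hard, "%s/hard", dir);
    snprintf(soft, sizeof soft, "%s/soft", dir);

    fd = open(orig, O_WRONLY | O_CREAT | O_EXCL, 0600);
    if (fd < 0)
        goto out;
    close(fd);

    if (link(orig, hard) < 0 || symlink(orig, soft) < 0)
        goto out;
    if (read_link_value(soft, value, sizeof value) < 0 ||
        strcmp(value, orig) != 0)
        goto out;
    if (stat(orig, &stb) < 0 || stb.st_nlink != 2)
        goto out;                       /* two names, one file */
    if (lstat(soft, &stb) < 0 || !S_ISLNK(stb.st_mode))
        goto out;                       /* lstat sees the link itself */

    if (unlink(orig) < 0)
        goto out;
    if (stat(hard, &stb) < 0)
        goto out;                       /* hard link still reaches the file */
    if (stat(soft, &stb) == 0 || errno != ENOENT)
        goto out;                       /* symbolic link no longer resolves */
    ok = 1;
out:
    unlink(orig);
    unlink(hard);
    unlink(soft);
    rmdir(dir);
    return ok ? 0 : -1;
}
```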
Atomic renaming of file system resident objects is possible with the rename call:

    rename(oldname, newname);
    char *oldname, *newname;

where both oldname and newname must be in the same file system. If newname exists and is a directory, then it must be empty.

Extension and truncation
Files are created with zero length and may be extended simply by writing or appending to them. While a file is open the system maintains a pointer into the file indicating the current location in the file associated with the descriptor. This pointer may be moved about in the file in a random access fashion. To set the current offset into a file, the lseek call may be used,

    oldoffset = lseek(fd, offset, type);
    result off_t oldoffset; int fd; off_t offset; int type;

where type is given in <sys/file.h> as one of:

._d
#define L_SET   0   /* set absolute file offset */
#define L_INCR  1   /* set file offset relative to current position */
#define L_XTND  2   /* set offset relative to end-of-file */

The call ``lseek(fd, 0, L_INCR)'' returns the current offset into the file.
Files may have ``holes'' in them. Holes are void areas in the linear extent of the file where data has never been written. These may be created by seeking to a location in a file past the current end-of-file and writing. Holes are treated by the system as zero valued bytes.
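The following sketch creates a hole by seeking past end-of-file and writing, then reads a byte from inside the hole. It assumes a current system, where the constants SEEK_SET, SEEK_CUR and SEEK_END from <unistd.h> play the roles of L_SET, L_INCR and L_XTND; the function names and the temporary directory under /tmp are hypothetical.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>

/* Returns the current offset of fd without moving it, or -1. */
long long current_offset(int fd)
{
    assert(fd >= 0);
    return (long long)lseek(fd, 0, SEEK_CUR);
}

/* Returns 0 if a byte read from inside the hole is zero and the file
 * length reflects the write made past the old end-of-file. */
int demo_hole(void)
{
    char dir[] = "/tmp/fsdemoXXXXXX";
    char path[64];
    char byte = 'x';
    int fd, ok = 0;

    if (mkdtemp(dir) == NULL)
        return -1;
    snprintf(path, sizeof path, "%s/sparse", dir);

    fd = open(path, O_RDWR | O_CREAT | O_EXCL, 0600);
    if (fd < 0)
        goto out;
    if (write(fd, "A", 1) != 1)
        goto out;
    if (lseek(fd, 4096, SEEK_SET) != 4096)   /* past end-of-file */
        goto out;
    if (write(fd, "Z", 1) != 1)              /* extends the file */
        goto out;
    if (current_offset(fd) != 4097)
        goto out;
    if (lseek(fd, 0, SEEK_END) != 4097)      /* file length */
        goto out;
    if (lseek(fd, 100, SEEK_SET) != 100)     /* inside the hole */
        goto out;
    if (read(fd, &byte, 1) != 1 || byte != '\0')
        goto out;
    ok = 1;
out:
    if (fd >= 0)
        close(fd);
    unlink(path);
    rmdir(dir);
    return ok ? 0 : -1;
}
```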
A file may be truncated with either of the calls:

    truncate(path, length);
    char *path; int length;

    ftruncate(fd, length);
    int fd, length;

reducing the size of the specified file to length bytes.

Checking accessibility
A process running with different real and effective user ids may interrogate the accessibility of a file to the real user by using the access call:

    accessible = access(path, how);
    result int accessible; char *path; int how;

Here how is constructed by or'ing the following bits, defined in <sys/file.h>:

._d
#define F_OK    0   /* file exists */
#define X_OK    1   /* file is executable */
#define W_OK    2   /* file is writable */
#define R_OK    4   /* file is readable */

The presence or absence of advisory locks does not affect the result of access\|.

Locking
The file system provides basic facilities that allow cooperating processes to synchronize their access to shared files. A process may place an advisory read or write lock on a file, so that other cooperating processes may avoid interfering with the process' access. This simple mechanism provides locking with file granularity. More granular locking can be built using the IPC facilities to provide a lock manager. The system does not force processes to obey the locks; they are of an advisory nature only.
Locking is performed after an open call by applying the flock primitive,

    flock(fd, how);
    int fd, how;

where the how parameter is formed from bits defined in <sys/file.h>:

._d
#define LOCK_SH 1   /* shared lock */
#define LOCK_EX 2   /* exclusive lock */
#define LOCK_NB 4   /* don't block when locking */
#define LOCK_UN 8   /* unlock */

Successive lock calls may be used to increase or decrease the level of locking. If an object is currently locked by another process when a flock call is made, the caller will be blocked until the current lock owner releases the lock; this may be avoided by including LOCK_NB in the how parameter. Specifying LOCK_UN removes all locks associated with the descriptor. Advisory locks held by a process are automatically deleted when the process terminates.

Disk quotas
As an optional facility, each file system may be requested to impose limits on a user's disk usage. Two quantities are limited: the total amount of disk space which a user may allocate in a file system and the total number of files a user may create in a file system. Quotas are expressed as hard limits and soft limits. A hard limit is always imposed; if a user would exceed a hard limit, the operation which caused the resource request will fail. A soft limit results in the user receiving a warning message, but with allocation succeeding. Facilities are provided to turn soft limits into hard limits if a user has exceeded a soft limit for an unreasonable period of time.
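The distinction between hard and soft limits can be summarized as a small decision function. This is only a sketch of the policy described above, not the system's implementation: every name in it is hypothetical, usage and limits are assumed to be in the same units, and the conversion of a soft limit into a hard limit after an unreasonable period of time is left out.

```c
#include <assert.h>

/* Possible outcomes of a resource request under quotas. */
enum quota_result { QUOTA_OK, QUOTA_WARN, QUOTA_DENY };

/* Decide the outcome of allocating `request` more units when `usage`
 * units are already allocated. */
enum quota_result check_quota(long usage, long request, long soft, long hard)
{
    assert(usage >= 0 && request >= 0 && soft <= hard);
    if (usage + request > hard)
        return QUOTA_DENY;      /* hard limit: the operation fails */
    if (usage + request > soft)
        return QUOTA_WARN;      /* soft limit: warn, allocation succeeds */
    return QUOTA_OK;
}
```

The same check would apply separately to each limited quantity, disk space and number of files.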
To enable disk quotas on a file system the setquota call is used:

    setquota(special, file);
    char *special, *file;

where special refers to a structured device file where a mounted file system exists, and file refers to a disk quota file (residing on the file system associated with special) from which user quotas should be obtained. The format of the disk quota file is implementation dependent.
To manipulate disk quotas the quota call is provided:

    #include <sys/quota.h>

    quota(cmd, uid, arg, addr);
    int cmd, uid, arg; caddr_t addr;

The indicated cmd is applied to the user ID uid. The parameters arg and addr are command specific. The file <sys/quota.h> contains definitions pertinent to the use of this call.