.\" Copyright (c) 1990, 1991, 1993, 1994
.\"	The Regents of the University of California.  All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\" 3. Neither the name of the University nor the names of its contributors
.\"    may be used to endorse or promote products derived from this software
.\"    without specific prior written permission.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\"	@(#)split.1	8.3 (Berkeley) 4/16/94
.\"
.Dd May 26, 2023
.Dt SPLIT 1
.Os
.Sh NAME
.Nm split
.Nd split a file into pieces
.Sh SYNOPSIS
.Nm
.Op Fl cd
.Op Fl l Ar line_count
.Op Fl a Ar suffix_length
.Op Ar file Op Ar prefix
.Nm
.Op Fl cd
.Fl b Ar byte_count Ns
.Oo
.Sm off
.Cm K | k | M | m | G | g
.Sm on
.Oc
.Op Fl a Ar suffix_length
.Op Ar file Op Ar prefix
.Nm
.Op Fl cd
.Fl n Ar chunk_count
.Op Fl a Ar suffix_length
.Op Ar file Op Ar prefix
.Nm
.Op Fl cd
.Fl p Ar pattern
.Op Fl a Ar suffix_length
.Op Ar file Op Ar prefix
.Sh DESCRIPTION
The
.Nm
utility reads the given
.Ar file
and breaks it up into files of 1000 lines each
(if no options are specified), leaving the
.Ar file
unchanged.
If
.Ar file
is a single dash
.Pq Sq Fl
or absent,
.Nm
reads from the standard input.
.Pp
The options are as follows:
.Bl -tag -width indent
.It Fl a Ar suffix_length
Use
.Ar suffix_length
letters to form the suffix of the file name.
.It Fl b Ar byte_count Ns Oo
.Sm off
.Cm K | k | M | m | G | g
.Sm on
.Oc
Create split files
.Ar byte_count
bytes in length.
If
.Cm k
or
.Cm K
is appended to the number, the file is split into
.Ar byte_count
kilobyte pieces.
If
.Cm m
or
.Cm M
is appended to the number, the file is split into
.Ar byte_count
megabyte pieces.
If
.Cm g
or
.Cm G
is appended to the number, the file is split into
.Ar byte_count
gigabyte pieces.
.It Fl c
Continue creating files and do not overwrite existing
output files.
.It Fl d
Use a numeric suffix instead of an alphabetic suffix.
.It Fl l Ar line_count
Create split files
.Ar line_count
lines in length.
.It Fl n Ar chunk_count
Split file into
.Ar chunk_count
smaller files.
The first n - 1 files will be of size (size of
.Ar file
/
.Ar chunk_count
)
and the last file will contain the remaining bytes.
.It Fl p Ar pattern
The file is split whenever an input line matches
.Ar pattern ,
which is interpreted as an extended regular expression.
The matching line will be the first line of the next output file.
This option is incompatible with the
.Fl b
and
.Fl l
options.
.El
.Pp
If additional arguments are specified, the first is used as the name
of the input file which is to be split.
If a second additional argument is specified, it is used as a prefix
for the names of the files into which the file is split.
In this case, each file into which the file is split is named by the
prefix followed by a lexically ordered suffix using
.Ar suffix_length
characters in the range
.Dq Li a Ns - Ns Li z .
If
.Fl a
is not specified, two letters are used as the initial suffix.
If the output does not fit into the resulting number of files and the
.Fl d
flag is not specified, then the suffix length is automatically extended as
needed such that all output files continue to sort in lexical order.
.Pp
If the
.Ar prefix
argument is not specified, the file is split into lexically ordered
files named with the prefix
.Dq Li x
and with suffixes as above.
.Pp
By default,
.Nm
will overwrite any existing output files.
If the
.Fl c
flag is specified,
.Nm
will instead create files with names that do not already exist.
.Sh ENVIRONMENT
The
.Ev LANG , LC_ALL , LC_CTYPE
and
.Ev LC_COLLATE
environment variables affect the execution of
.Nm
as described in
.Xr environ 7 .
.Sh EXIT STATUS
.Ex -std
.Sh EXAMPLES
Split input into as many files as needed, so that each file contains at most 2
lines:
.Bd -literal -offset indent
$ echo -e "first line\\nsecond line\\nthird line\\nfourth line" | split -l2
.Ed
.Pp
Split input in chunks of 10 bytes using numeric suffixes for file names.
This generates two files of 10 bytes (x00 and x01) and a third file (x02) with the
remaining 2 bytes:
.Bd -literal -offset indent
$ echo -e "This is 22 bytes long" | split -d -b10
.Ed
.Pp
Split input generating 6 files:
.Bd -literal -offset indent
$ echo -e "This is 22 bytes long" | split -n 6
.Ed
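.Pp
The sizes implied by the
.Fl n
description for a 22-byte input and 6 chunks can be worked out with shell
arithmetic.
This is only a sketch of the formula stated under
.Fl n ;
it does not run
.Nm ,
and the variable names are illustrative:
.Bd -literal -offset indent
```shell
size=22      # bytes of input, as in the 22-byte examples above
chunks=6     # value that would be given to -n

# Per the -n description: each of the first (chunks - 1) files holds
# size / chunks bytes (integer division); the last file holds the rest.
each=$((size / chunks))
last=$((size - each * (chunks - 1)))

echo "first $((chunks - 1)) files: $each bytes each"
echo "last file: $last bytes"
```
.Ed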
.Pp
Split input creating a new file every time a line matches the regular expression
for a
.Dq t
followed by either
.Dq a
or
.Dq u ,
thus creating two files:
.Bd -literal -offset indent
$ echo -e "stack\\nstock\\nstuck\\nanother line" | split -p 't[au]'
.Ed
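.Pp
The naming rules described above (an explicit
.Ar prefix
combined with
.Fl a )
can be illustrated with the following sketch.
The scratch directory and the prefix
.Dq part.
are assumptions chosen for the example; three input lines split two per file
should yield part.aaa and part.aab:
.Bd -literal -offset indent
```shell
# Work in a scratch directory so no existing files are overwritten.
workdir=$(mktemp -d "${TMPDIR:-/tmp}/split.XXXXXX")
cd "$workdir" || exit 1

# Read standard input (-), write two lines per file, use three-letter
# suffixes, and name the pieces with the prefix "part." instead of "x".
{ echo alpha; echo beta; echo gamma; } | split -l 2 -a 3 - part.

# List the pieces; they sort in lexical order.
ls
```
.Ed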
.Sh SEE ALSO
.Xr csplit 1 ,
.Xr re_format 7
.Sh STANDARDS
The
.Nm
utility conforms to
.St -p1003.1-2001 .
.Sh HISTORY
A
.Nm
command appeared in
.At v3 .
.Pp
Before
.Fx 14 ,
pattern and line matching only operated on lines shorter than 65,536 bytes.