xref: /freebsd/contrib/libdivsufsort/README.md (revision 27067774dce3388702a4cf744d7096c6fb71b688)
1*c08cbc64SXin LI# libdivsufsort
2*c08cbc64SXin LI
3*c08cbc64SXin LIlibdivsufsort is a software library that implements a lightweight suffix array construction algorithm.
4*c08cbc64SXin LI
5*c08cbc64SXin LI## News
6*c08cbc64SXin LI* 2015-03-21: The project has moved from [Google Code](http://code.google.com/p/libdivsufsort/) to [GitHub](https://github.com/y-256/libdivsufsort)
7*c08cbc64SXin LI
8*c08cbc64SXin LI## Introduction
9*c08cbc64SXin LIThis library provides a simple and an efficient C API to construct a suffix array and a Burrows-Wheeler transformed string from a given string over a constant-size alphabet.
10*c08cbc64SXin LIThe algorithm runs in O(n log n) worst-case time using only 5n+O(1) bytes of memory space, where n is the length of
11*c08cbc64SXin LIthe string.
12*c08cbc64SXin LI
13*c08cbc64SXin LI## Build requirements
14*c08cbc64SXin LI* An ANSI C Compiler (e.g. GNU GCC)
15*c08cbc64SXin LI* [CMake](http://www.cmake.org/ "CMake") version 2.4.2 or newer
16*c08cbc64SXin LI* CMake-supported build tool
17*c08cbc64SXin LI
18*c08cbc64SXin LI## Building on GNU/Linux
19*c08cbc64SXin LI1. Get the source code from GitHub. You can either
20*c08cbc64SXin LI    * use git to clone the repository
21*c08cbc64SXin LI    ```
22*c08cbc64SXin LI    git clone https://github.com/y-256/libdivsufsort.git
23*c08cbc64SXin LI    ```
24*c08cbc64SXin LI    * or download a [zip file](../../archive/master.zip) directly
25*c08cbc64SXin LI2. Create a `build` directory in the package source directory.
26*c08cbc64SXin LI```shell
27*c08cbc64SXin LI$ cd libdivsufsort
28*c08cbc64SXin LI$ mkdir build
29*c08cbc64SXin LI$ cd build
30*c08cbc64SXin LI```
31*c08cbc64SXin LI3. Configure the package for your system.
32*c08cbc64SXin LIIf you want to install to a different location,  change the -DCMAKE_INSTALL_PREFIX option.
33*c08cbc64SXin LI```shell
34*c08cbc64SXin LI$ cmake -DCMAKE_BUILD_TYPE="Release" \
35*c08cbc64SXin LI-DCMAKE_INSTALL_PREFIX="/usr/local" ..
36*c08cbc64SXin LI```
37*c08cbc64SXin LI4. Compile the package.
38*c08cbc64SXin LI```shell
39*c08cbc64SXin LI$ make
40*c08cbc64SXin LI```
41*c08cbc64SXin LI5. (Optional) Install the library and header files.
42*c08cbc64SXin LI```shell
43*c08cbc64SXin LI$ sudo make install
44*c08cbc64SXin LI```
45*c08cbc64SXin LI
46*c08cbc64SXin LI## API
47*c08cbc64SXin LI```c
48*c08cbc64SXin LI/* Data types */
49*c08cbc64SXin LItypedef int32_t saint_t;
50*c08cbc64SXin LItypedef int32_t saidx_t;
51*c08cbc64SXin LItypedef uint8_t sauchar_t;
52*c08cbc64SXin LI
53*c08cbc64SXin LI/*
54*c08cbc64SXin LI * Constructs the suffix array of a given string.
55*c08cbc64SXin LI * @param T[0..n-1] The input string.
56*c08cbc64SXin LI * @param SA[0..n-1] The output array or suffixes.
57*c08cbc64SXin LI * @param n The length of the given string.
58*c08cbc64SXin LI * @return 0 if no error occurred, -1 or -2 otherwise.
59*c08cbc64SXin LI */
60*c08cbc64SXin LIsaint_t
61*c08cbc64SXin LIdivsufsort(const sauchar_t *T, saidx_t *SA, saidx_t n);
62*c08cbc64SXin LI
63*c08cbc64SXin LI/*
64*c08cbc64SXin LI * Constructs the burrows-wheeler transformed string of a given string.
65*c08cbc64SXin LI * @param T[0..n-1] The input string.
66*c08cbc64SXin LI * @param U[0..n-1] The output string. (can be T)
67*c08cbc64SXin LI * @param A[0..n-1] The temporary array. (can be NULL)
68*c08cbc64SXin LI * @param n The length of the given string.
69*c08cbc64SXin LI * @return The primary index if no error occurred, -1 or -2 otherwise.
70*c08cbc64SXin LI */
71*c08cbc64SXin LIsaidx_t
72*c08cbc64SXin LIdivbwt(const sauchar_t *T, sauchar_t *U, saidx_t *A, saidx_t n);
73*c08cbc64SXin LI```
74*c08cbc64SXin LI
75*c08cbc64SXin LI## Example Usage
76*c08cbc64SXin LI```c
77*c08cbc64SXin LI#include <stdio.h>
78*c08cbc64SXin LI#include <stdlib.h>
79*c08cbc64SXin LI#include <string.h>
80*c08cbc64SXin LI
81*c08cbc64SXin LI#include <divsufsort.h>
82*c08cbc64SXin LI
83*c08cbc64SXin LIint main() {
84*c08cbc64SXin LI    // intput data
85*c08cbc64SXin LI    char *Text = "abracadabra";
86*c08cbc64SXin LI    int n = strlen(Text);
87*c08cbc64SXin LI    int i, j;
88*c08cbc64SXin LI
89*c08cbc64SXin LI    // allocate
90*c08cbc64SXin LI    int *SA = (int *)malloc(n * sizeof(int));
91*c08cbc64SXin LI
92*c08cbc64SXin LI    // sort
93*c08cbc64SXin LI    divsufsort((unsigned char *)Text, SA, n);
94*c08cbc64SXin LI
95*c08cbc64SXin LI    // output
96*c08cbc64SXin LI    for(i = 0; i < n; ++i) {
97*c08cbc64SXin LI        printf("SA[%2d] = %2d: ", i, SA[i]);
98*c08cbc64SXin LI        for(j = SA[i]; j < n; ++j) {
99*c08cbc64SXin LI            printf("%c", Text[j]);
100*c08cbc64SXin LI        }
101*c08cbc64SXin LI        printf("$\n");
102*c08cbc64SXin LI    }
103*c08cbc64SXin LI
104*c08cbc64SXin LI    // deallocate
105*c08cbc64SXin LI    free(SA);
106*c08cbc64SXin LI
107*c08cbc64SXin LI    return 0;
108*c08cbc64SXin LI}
109*c08cbc64SXin LI```
110*c08cbc64SXin LISee the [examples](examples) directory for a few other examples.
111*c08cbc64SXin LI
112*c08cbc64SXin LI## Benchmarks
113*c08cbc64SXin LISee [Benchmarks](https://github.com/y-256/libdivsufsort/blob/wiki/SACA_Benchmarks.md) page for details.
114*c08cbc64SXin LI
115*c08cbc64SXin LI## License
116*c08cbc64SXin LIlibdivsufsort is released under the [MIT license](LICENSE "MIT license").
117*c08cbc64SXin LI> The MIT License (MIT)
118*c08cbc64SXin LI>
119*c08cbc64SXin LI> Copyright (c) 2003 Yuta Mori All rights reserved.
120*c08cbc64SXin LI>
121*c08cbc64SXin LI> Permission is hereby granted, free of charge, to any person obtaining a copy
122*c08cbc64SXin LI> of this software and associated documentation files (the "Software"), to deal
123*c08cbc64SXin LI> in the Software without restriction, including without limitation the rights
124*c08cbc64SXin LI> to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
125*c08cbc64SXin LI> copies of the Software, and to permit persons to whom the Software is
126*c08cbc64SXin LI> furnished to do so, subject to the following conditions:
127*c08cbc64SXin LI>
128*c08cbc64SXin LI> The above copyright notice and this permission notice shall be included in all
129*c08cbc64SXin LI> copies or substantial portions of the Software.
130*c08cbc64SXin LI>
131*c08cbc64SXin LI> THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
132*c08cbc64SXin LI> IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
133*c08cbc64SXin LI> FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
134*c08cbc64SXin LI> AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
135*c08cbc64SXin LI> LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
136*c08cbc64SXin LI> OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
137*c08cbc64SXin LI> SOFTWARE.
138*c08cbc64SXin LI
139*c08cbc64SXin LI## Author
140*c08cbc64SXin LI* Yuta Mori
141