/****************************************************************
Copyright (C) Lucent Technologies 1997
All Rights Reserved

Permission to use, copy, modify, and distribute this software and
its documentation for any purpose and without fee is hereby
granted, provided that the above copyright notice appear in all
copies and that both that the copyright notice and this
permission notice and warranty disclaimer appear in supporting
documentation, and that the name Lucent Technologies or any of
its entities not be used in advertising or publicity pertaining
to distribution of the software without specific, written prior
permission.

LUCENT DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE,
INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS.
IN NO EVENT SHALL LUCENT OR ANY OF ITS ENTITIES BE LIABLE FOR ANY
SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER
IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION,
ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF
THIS SOFTWARE.
****************************************************************/

This file lists all bug fixes, changes, etc., made since the
second edition of the AWK book was published in September 2023.

Oct 30, 2023:
	multiple fixes and a minor code cleanup.
	disabled utf-8 for non-multibyte locales, such as C or POSIX.
	fixed a bad char * cast that causes incorrect results on big-endian
	systems. also fixed an out-of-bounds read for empty CCL.
	fixed a buffer overflow in substr with utf-8 strings.
	many thanks to Todd C Miller.


Sep 24, 2023:
	fnematch and getrune have been overhauled to solve issues around
	unicode FS and RS. also fixed gsub null match issue with unicode.
	big thanks to Arnold Robbins.

Sep 12, 2023:
	Fixed a length error in u8_byte2char that set RSTART to
	incorrect (cannot happen) value for EOL match(str, /$/).


-----------------------------------------------------------------

[This entry is a summary, not a precise list of changes.]

	Added --csv option to enable processing of comma-separated
	values inputs. When --csv is enabled, fields are separated
	by commas, fields may be quoted with " double quotes, and
	fields may contain embedded newlines.

	If no explicit separator argument is provided, split() uses
	the setting of --csv to determine how fields are split.

	Strings may now contain UTF-8 code points (not necessarily
	characters). Functions that operate on characters, like
	length, substr, index, match, etc., use UTF-8, so the length
	of a string of 3 emojis is 3, not 12 as it would be if bytes
	were counted.

	Regular expressions are processed as UTF-8.

	Unicode literals can be written as \u followed by one
	to eight hexadecimal digits. These may appear in strings and
	regular expressions.
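
	The counting and quoting rules summarized above can be
	illustrated with a short sketch. It is written in Python,
	not awk, and uses Python's own string and csv handling as a
	stand-in; it is not this awk's implementation, and the
	variable names are made up for the example. The assumption
	is only that both follow the same general rules: lengths
	count code points, and a quoted CSV field may hold commas
	and newlines.

```python
import csv
import io

# Three emoji; each is one code point that UTF-8 encodes in four bytes.
emoji = "\U0001F600\U0001F601\U0001F602"
code_points = len(emoji)                  # counts code points
byte_count = len(emoji.encode("utf-8"))   # counts bytes

# A code point is not necessarily a character: "e" followed by a
# combining acute accent displays as one character but is two code points.
combining = "e\u0301"

# One CSV record with a quoted comma and a quoted embedded newline.
# Python's csv module is used here only to show the quoting rules.
record = 'a,"b,c","line one\nline two"\n'
fields = next(csv.reader(io.StringIO(record)))

print(code_points, byte_count, len(combining), len(fields))
```

	Counting code points gives 3 for the emoji string where
	counting bytes gives 12, which matches the figures in the
	entry above. The CSV record yields three fields even though
	it spans two physical lines.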