mirror of
git://git.sv.gnu.org/sed
synced 2025-04-18 02:37:37 +03:00
* ABOUT BUGS

Before reporting a bug, please check the list of known bugs
and the list of oft-reported non-bugs (below).

Bugs and comments may be sent to bonzini@gnu.org; please
include in the Subject: header the first line of the output of
"sed --version".

Please do not send a bug report like this:

    [while building frobme-1.3.4]
    $ configure
    sed: file sedscr line 1: Unknown option to 's'

If sed doesn't configure your favorite package, take a few extra
minutes to identify the specific problem and make a stand-alone
test case.

A stand-alone test case includes all the data necessary to perform
the test, and the specific invocation of sed that causes the
problem. The smaller a stand-alone test case is, the better. A test
case should not involve something as far removed from sed as "try
to configure frobme-1.3.4". Yes, that is in principle enough
information to look for the bug, but that is not a very practical
prospect.

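A stand-alone test case can be as small as one pipeline. The sketch below is hypothetical and does not come from a real report. The input is given inline with printf, and the script is deliberately invalid ('k' is not a flag of the 's' command), so sed rejects it with the same kind of error shown in the report above.

```shell
#!/bin/sh
# Hypothetical stand-alone test case: the input data is inline and the
# sed invocation is exact, so it can be rerun without any other package.
# 'k' is not a valid flag of the 's' command, so sed rejects the script.
printf 'abc\n' | sed 's/a/b/k'
echo "sed exit status: $?"
```
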
* NON-BUGS

'N' command on the last line

Most versions of sed exit without printing anything when the 'N'
command is issued on the last line of a file. GNU sed instead
prints the pattern space before exiting, unless of course the '-n'
command-line option has been specified. More information on the
reason behind this choice can be found in the Info manual.

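A quick way to see the GNU behavior is to feed three lines to 'N'. The first cycle joins lines 1 and 2, and the second cycle reaches 'N' on line 3, which is the last line. The '$!N' form (run 'N' on every line except the last) is a commonly used idiom for scripts that should not depend on how a particular sed handles this case.

```shell
#!/bin/sh
# Cycle 1: pattern space "a", N appends "b"; "a\nb" is printed.
# Cycle 2: pattern space "c", N finds no next line; GNU sed still
# prints "c" before exiting (unless -n is given).
printf 'a\nb\nc\n' | sed N

# Guarding N with $! keeps it from running on the last line at all,
# so the output does not depend on the sed implementation.
printf 'a\nb\nc\n' | sed '$!N'
```
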
regex syntax clashes (problems with backslashes)

sed uses the POSIX basic regular expression syntax. According to
the standard, the meaning of some escape sequences is undefined in
this syntax; notably, in the case of GNU sed these are '\|', '\+',
'\?', '\`', '\'', '\<', '\>', '\b', '\B', '\w', and '\W'.

As in all GNU programs that use POSIX basic regular expressions,
sed interprets these escape sequences as meta-characters. So
'x\+' matches one or more occurrences of 'x', and 'abc\|def'
matches either 'abc' or 'def'.

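The two examples above can be tried directly. The sketch below assumes GNU sed and shows '\+' replacing a whole run of 'x' and '\|' acting as alternation.

```shell
#!/bin/sh
# \+ means "one or more" in GNU sed's basic regular expressions:
# the run "xxx" is replaced as a whole.
printf 'xxxy\n' | sed 's/x\+/X/'            # prints: Xy

# \| is alternation: both "abc" and "def" are replaced.
printf 'abc def\n' | sed 's/abc\|def/_/g'   # prints: _ _
```
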
This syntax may cause problems when running scripts written for
other seds. Some sed programs have been written with the
assumption that '\|' and '\+' match the literal characters '|'
and '+'. Such scripts must be modified by removing the spurious
backslashes if they are to be used with recent versions of sed
(not only GNU sed).

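In a basic regular expression, an unescaped '+' or '|' is an ordinary character, so dropping the backslash is enough to get a literal match. A minimal sketch:

```shell
#!/bin/sh
# Without a backslash, '+' and '|' are matched literally.
printf 'a+b c|d\n' | sed 's/a+b/PLUS/; s/c|d/BAR/'   # prints: PLUS BAR
```
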
On the other hand, some scripts use 's|abc\|def||g' to remove
occurrences of _either_ 'abc' or 'def'. While this worked until
sed 4.0.x, newer versions interpret this as removing the string
'abc|def'. This is again undefined behavior according to POSIX,
but this interpretation is arguably more robust. The older one,
for example, required that the regex matcher parse '\/' as '/' in
the common case of escaping a slash, which is again undefined
behavior. The new behavior avoids this, which matters because the
regex matcher is only partially under our control.

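The difference is easy to check with an input that contains both the literal string 'abc|def' and a lone 'abc'. This sketch assumes a GNU sed newer than 4.0.x. To get alternation, it uses a delimiter other than '|', which is one way to keep '\|' meaning "or".

```shell
#!/bin/sh
# With '|' as the delimiter, '\|' is just an escaped delimiter, so the
# regex is the literal string "abc|def"; the lone "abc" survives.
printf 'abc|def abc\n' | sed 's|abc\|def||g'   # prints: " abc"

# With '/' as the delimiter, '\|' keeps its alternation meaning and
# every "abc" and "def" is removed.
printf 'abc|def abc\n' | sed 's/abc\|def//g'   # prints: "| "
```
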
In addition, GNU sed supports several escape sequences (some of
which span more than two characters) to insert non-printable
characters in scripts ('\a', '\c', '\d', '\o', '\r', '\t', '\v',
'\x'). These can cause similar problems with scripts written for
other seds.

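Here is a short sketch of two of these escapes in a replacement, assuming GNU sed. '\t' produces a TAB, and '\x41' produces the character with hexadecimal code 41, that is 'A'. Another sed may not recognize these sequences and may produce something else, such as a literal 't'.

```shell
#!/bin/sh
# GNU sed turns \t into a TAB and \x41 into 'A' (hex code 41).
printf 'a b\n' | sed 's/ /\t/'     # prints: a<TAB>b
printf 'x\n' | sed 's/x/\x41/'     # prints: A
```
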
-i clobbers read-only files

In short, 'sed d -i' will let one delete the contents of a
read-only file, and in general the '-i' option will let one
clobber protected files. This is not a bug, but rather a
consequence of how the Unix file system works.

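This can be reproduced safely in a scratch directory. The sketch assumes mktemp(1) is available and that the temporary directory is writable by the current user. Only the file is made read-only, not the directory.

```shell
#!/bin/sh
# Scratch directory owned by us, so it is writable.
dir=$(mktemp -d)
printf 'secret\n' > "$dir/file"
chmod a-w "$dir/file"        # the file itself is now read-only
sed -i d "$dir/file"         # deletes every line anyway
ls -l "$dir/file"            # the file is now 0 bytes long
rm -rf "$dir"
```
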
The permissions on a file say what can happen to the data in
that file, while the permissions on a directory say what can
happen to the list of files in that directory. 'sed -i' never
opens for writing a file that is already on disk; rather, it
works on a temporary file that is finally renamed to the
original name. Renaming or deleting files actually modifies the
contents of the directory, so the operation depends on the
permissions of the directory, not of the file. For the same
reason, sed will not let one use '-i' on a writable file in a
read-only directory, and it will break hard or symbolic links
when '-i' is used on a linked file.

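The rename step explains the broken hard link. After 'sed -i', the edited name points to the new file, while the other link still points to the original data. Here is a minimal sketch, again using a scratch directory from mktemp(1):

```shell
#!/bin/sh
dir=$(mktemp -d)
printf 'one\n' > "$dir/a"
ln "$dir/a" "$dir/b"           # a and b name the same file (inode)
sed -i 's/one/two/' "$dir/a"   # writes a temporary file, renames it to a
cat "$dir/a"                   # prints: two
cat "$dir/b"                   # prints: one (the link has been broken)
rm -rf "$dir"
```
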
'0a' does not work (gives an error)

There is no line 0. 0 is a special address that is only used to
treat addresses like '0,/RE/' as active when the script starts. If
you write '1,/abc/d' and the first line includes the word 'abc',
then that match is ignored, because address ranges must span at
least two lines (barring the end of the file). What you probably
wanted is to delete every line up to the first one including 'abc',
and this is obtained with '0,/abc/d'.

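The difference shows up when the first line already matches. The last command is a suggested replacement for '0a'. It uses 'i' on line 1 to add text before the first line. The one-line 'i text' form is a GNU extension.

```shell
#!/bin/sh
# Line 1 contains "abc". With 1,/abc/ the end of the range is searched
# only from line 2 on, so the range runs through line 3.
printf 'abc\nx\nabc\ny\n' | sed '1,/abc/d'   # prints: y

# With 0,/abc/ the end regex may match line 1 itself, so only
# line 1 is deleted.
printf 'abc\nx\nabc\ny\n' | sed '0,/abc/d'   # prints: x, abc, y

# To add text before the first line, insert on line 1 instead of '0a'.
printf 'x\n' | sed '1i header'              # prints: header, x
```
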
'[a-z]' is case insensitive
's/.*//' does not clear pattern space

You are encountering problems with locales. POSIX mandates that
'[a-z]' use the current locale's collation order; in C parlance,
that means strcoll(3) instead of strcmp(3). Some locales have a
case-insensitive strcoll, others don't.

Another problem is that [a-z] tries to use collation symbols. This
only happens if you are on the GNU system, using GNU libc's regular
expression matcher instead of compiling the one supplied with GNU
sed. In a Danish locale, for example, the regular expression
'^[a-z]$' matches the string 'aa', because 'aa' is a single
collating symbol that comes after 'a' and before 'b'; 'll' behaves
similarly in Spanish locales, and 'ij' in Dutch locales.

Another common localization-related problem happens if your input
stream includes invalid multibyte sequences. POSIX mandates that
such sequences are _not_ matched by '.', so 's/.*//' will not
clear the pattern space as you would expect. In fact, there is no
way to clear sed's buffers in the middle of the script in most
multibyte locales (including UTF-8 locales). For this reason, GNU
sed provides a 'z' command (for 'zap') as an extension.

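The sketch below assumes GNU sed. 'z' empties the pattern space whatever it contains, including a line with the invalid byte \377. The 's' command that follows then sees an empty pattern space.

```shell
#!/bin/sh
# 'z' clears the pattern space regardless of its contents or the locale.
printf 'hello\n' | sed 'z; s/^$/(empty)/'      # prints: (empty)
printf 'a\377b\n' | sed 'z; s/^$/(empty)/'     # prints: (empty)
```
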
To work around both of these problems, which may cause bugs in
shell scripts, you can set the LC_ALL environment variable to 'C',
or set the locale on a more fine-grained basis with the other LC_*
environment variables.
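Setting LC_ALL for a single command limits the change to that invocation of sed. In the C locale, ranges compare bytes and every byte is a character. As a result, '[a-z]' matches only lowercase ASCII letters, and '.' also matches a byte that is not valid UTF-8. This is a minimal sketch:

```shell
#!/bin/sh
# [a-z] matches only the lowercase ASCII letters in the C locale.
printf 'ABC abc\n' | LC_ALL=C sed 's/[a-z]//g'   # prints: "ABC "

# '.' matches the byte \377 too, so s/.*// clears the whole line.
printf 'a\377b\n' | LC_ALL=C sed 's/.*//'        # prints an empty line
```
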