mirror of https://github.com/postgres/postgres.git synced 2025-07-28 23:42:10 +03:00

Improve the recently-added support for properly pluralized error messages

by extending the ereport() API to cater for pluralization directly.  This
is better than the original method of calling ngettext outside the elog.c
code because (1) it avoids double translation, which wastes cycles and in
the worst case could give a wrong result; and (2) it avoids having to use
a different coding method in PL code than in the core backend.  The
client-side uses of ngettext are not touched since neither of these concerns
is very pressing in the client environment.  Per my proposal of yesterday.
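
As a rough illustration of the idea in this message (selecting the plural form and translating it in a single step inside the reporting layer, rather than calling ngettext at each call site), here is a minimal sketch. The helper name format_plural_sketch and its buffer-based interface are hypothetical and are not taken from elog.c; only ngettext() and vsnprintf() are real library calls. The argument order mirrors the errmsg_plural() signature documented in the diff below.

```c
#include <assert.h>
#include <libintl.h>    /* ngettext() */
#include <stdarg.h>
#include <stdio.h>

/*
 * Hypothetical helper, NOT the PostgreSQL implementation.
 *
 * Takes the English singular and plural format strings plus the control
 * value n, lets ngettext() pick (and, if a catalog is bound, translate)
 * the right form exactly once, then formats the remaining arguments
 * into buf.  Returns what vsnprintf() returns.
 */
int
format_plural_sketch(char *buf, size_t buflen,
                     const char *fmt_singular, const char *fmt_plural,
                     unsigned long n, ...)
{
    va_list     args;
    const char *fmt;
    int         len;

    assert(buf != NULL && buflen > 0);

    /* Single lookup: form selection and translation happen together. */
    fmt = ngettext(fmt_singular, fmt_plural, n);

    va_start(args, n);
    len = vsnprintf(buf, buflen, fmt, args);
    va_end(args);

    return len;
}
```

A caller would pass the count twice, once as the control value and once as a format argument, for example format_plural_sketch(buf, sizeof(buf), "copied %d file", "copied %d files", (unsigned long) n, n). With no message catalog bound, ngettext() falls back to the English rule (singular only when n is 1).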
Tom Lane
2009-06-04 18:33:08 +00:00
parent fd416db406
commit 76d4abf2d9
17 changed files with 292 additions and 102 deletions

@ -1,4 +1,4 @@
<!-- $PostgreSQL: pgsql/doc/src/sgml/nls.sgml,v 1.17 2009/01/09 10:54:07 petere Exp $ -->
<!-- $PostgreSQL: pgsql/doc/src/sgml/nls.sgml,v 1.18 2009/06/04 18:33:06 tgl Exp $ -->
<chapter id="nls">
<chapterinfo>
@ -46,7 +46,7 @@
<filename>msgmerge</filename>, respectively, in a GNU-compatible
implementation. Later, we will try to arrange it so that if you
use a packaged source distribution, you won't need
<filename>xgettext</filename>. (From CVS, you will still need
<filename>xgettext</filename>. (If working from CVS, you will still need
it.) <application>GNU Gettext 0.10.36</application> or later is currently recommended.
</para>
@ -152,7 +152,7 @@ msgstr "another translated"
If there are already some <filename>.po</filename> files, then
someone has already done some translation work. The files are
named <filename><replaceable>language</replaceable>.po</filename>,
where <replaceable>language</replaceable> is the
where <replaceable>language</replaceable> is the
<ulink url="http://lcweb.loc.gov/standards/iso639-2/englangn.html">
ISO 639-1 two-letter language code (in lower case)</ulink>, e.g.,
<filename>fr.po</filename> for French. If there is really a need
@ -224,7 +224,7 @@ gmake update-po
that gives room for other people to pick up your work. However,
you are encouraged to give priority to removing fuzzy entries
after doing a merge. Remember that fuzzy entries will not be
installed; they only serve as reference what might be the right
installed; they only serve as reference for what might be the right
translation.
</para>
@ -347,8 +347,8 @@ fprintf(stderr, "panic level %d\n", lvl);
<programlisting>
fprintf(stderr, gettext("panic level %d\n"), lvl);
</programlisting>
(<symbol>gettext</symbol> is defined as a no-op if no NLS is
configured.)
(<symbol>gettext</symbol> is defined as a no-op if NLS support is
not configured.)
</para>
<para>
@ -421,6 +421,9 @@ fprintf(stderr, gettext("panic level %d\n"), lvl);
them here. If the translatable string is not the first
argument, the item needs to be of the form
<literal>func:2</literal> (for the second argument).
If you have a function that supports pluralized messages,
the item should look like <literal>func:1,2</literal>
(identifying the singular and plural message arguments).
</para>
</listitem>
</varlistentry>
@ -451,8 +454,8 @@ fprintf(stderr, gettext("panic level %d\n"), lvl);
printf("Files were %s.\n", flag ? "copied" : "removed");
</programlisting>
The word order within the sentence might be different in other
languages. Also, even if you remember to call gettext() on each
fragment, the fragments might not translate well separately. It's
languages. Also, even if you remember to call <function>gettext()</> on
each fragment, the fragments might not translate well separately. It's
better to duplicate a little code so that each message to be
translated is a coherent whole. Only numbers, file names, and
such-like run-time variables should be inserted at run time into
@ -475,13 +478,44 @@ else
printf("copied %d files", n);
</programlisting>
then be disappointed. Some languages have more than two forms,
with some peculiar rules. We might have a solution for this in
the future, but for now the matter is best avoided altogether.
You could write:
with some peculiar rules. It's often best to design the message
to avoid the issue altogether, for instance like this:
<programlisting>
printf("number of copied files: %d", n);
</programlisting>
</para>
<para>
If you really want to construct a properly pluralized message,
there is support for this, but it's a bit awkward. When generating
a primary or detail error message in <function>ereport()</>, you can
write something like this:
<programlisting>
errmsg_plural("copied %d file",
"copied %d files",
n,
n)
</programlisting>
The first argument is the format string appropriate for English
singular form, the second is the format string appropriate for
English plural form, and the third is the integer control value
that determines which plural form to use. Subsequent arguments
are formatted per the format string as usual. (Normally, the
pluralization control value will also be one of the values to be
formatted, so it has to be written twice.) In English it only
matters whether <replaceable>n</> is 1 or not 1, but in other
languages there can be many different plural forms. The translator
sees the two English forms as a group and has the opportunity to
supply multiple substitute strings, with the appropriate one being
selected based on the run-time value of <replaceable>n</>.
</para>
<para>
If you need to pluralize a message that isn't going directly to an
<function>errmsg</> or <function>errdetail</> report, you have to use
the underlying function <function>ngettext</>. See the gettext
documentation.
</para>
</listitem>
<listitem>

@ -1,4 +1,4 @@
<!-- $PostgreSQL: pgsql/doc/src/sgml/sources.sgml,v 2.33 2009/04/27 16:27:36 momjian Exp $ -->
<!-- $PostgreSQL: pgsql/doc/src/sgml/sources.sgml,v 2.34 2009/06/04 18:33:06 tgl Exp $ -->
<chapter id="source">
<title>PostgreSQL Coding Conventions</title>
@ -181,6 +181,19 @@ ereport(ERROR,
not worth expending translation effort on.
</para>
</listitem>
<listitem>
<para>
<function>errmsg_plural(const char *fmt_singular, const char *fmt_plural,
unsigned long n, ...)</function> is like <function>errmsg</>, but with
support for various plural forms of the message.
<replaceable>fmt_singular</> is the English singular format,
<replaceable>fmt_plural</> is the English plural format,
<replaceable>n</> is the integer value that determines which plural
form is needed, and the remaining arguments are formatted according
to the selected format string. For more information see
<xref linkend="nls-guidelines">.
</para>
</listitem>
<listitem>
<para>
<function>errdetail(const char *msg, ...)</function> supplies an optional
@ -201,6 +214,14 @@ ereport(ERROR,
sent to the client.
</para>
</listitem>
<listitem>
<para>
<function>errdetail_plural(const char *fmt_singular, const char *fmt_plural,
unsigned long n, ...)</function> is like <function>errdetail</>, but with
support for various plural forms of the message.
For more information see <xref linkend="nls-guidelines">.
</para>
</listitem>
<listitem>
<para>
<function>errhint(const char *msg, ...)</function> supplies an optional
@ -390,14 +411,14 @@ Hint: the addendum
<para>
There are functions in the backend that will double-quote their own output
at need (for example, <function>format_type_be</>()). Do not put
additional quotes around the output of such functions.
additional quotes around the output of such functions.
</para>
<para>
Rationale: Objects can have names that create ambiguity when embedded in a
message. Be consistent about denoting where a plugged-in name starts and
ends. But don't clutter messages with unnecessary or duplicate quote
marks.
marks.
</para>
</simplesect>
@ -413,7 +434,7 @@ Hint: the addendum
<para>
Primary error messages: Do not capitalize the first letter. Do not end a
message with a period. Do not even think about ending a message with an
exclamation point.
exclamation point.
</para>
<para>
@ -430,7 +451,7 @@ Hint: the addendum
long enough to be more than one sentence, they should be split into
primary and detail parts.) However, detail and hint messages are longer
and might need to include multiple sentences. For consistency, they should
follow complete-sentence style even when there's only one sentence.
follow complete-sentence style even when there's only one sentence.
</para>
</simplesect>
@ -473,7 +494,7 @@ Hint: the addendum
<para>
Use past tense if an attempt to do something failed, but could perhaps
succeed next time (perhaps after fixing some problem). Use present tense
if the failure is certainly permanent.
if the failure is certainly permanent.
</para>
<para>
@ -489,20 +510,20 @@ cannot open file "%s"
message should give a reason, such as <quote>disk full</quote> or
<quote>file doesn't exist</quote>. The past tense is appropriate because
next time the disk might not be full anymore or the file in question might
exist.
exist.
</para>
<para>
The second form indicates that the functionality of opening the named file
does not exist at all in the program, or that it's conceptually
impossible. The present tense is appropriate because the condition will
persist indefinitely.
persist indefinitely.
</para>
<para>
Rationale: Granted, the average user will not be able to draw great
conclusions merely from the tense of the message, but since the language
provides us with a grammar we should use it correctly.
provides us with a grammar we should use it correctly.
</para>
</simplesect>
@ -552,7 +573,7 @@ could not open file %s: %m
to paste this into a single smooth sentence, so some sort of punctuation
is needed. Putting the embedded text in parentheses has also been
suggested, but it's unnatural if the embedded text is likely to be the
most important part of the message, as is often the case.
most important part of the message, as is often the case.
</para>
</simplesect>
@ -579,7 +600,7 @@ BETTER: could not open file %s (I/O failure)
Don't include the name of the reporting routine in the error text. We have
other mechanisms for finding that out when needed, and for most users it's
not helpful information. If the error text doesn't make as much sense
without the function name, reword it.
without the function name, reword it.
<programlisting>
BAD: pg_atoi: error in "z": cannot parse "z"
BETTER: invalid input syntax for integer: "z"
@ -620,7 +641,7 @@ BETTER: could not open file %s: %m
<para>
Error messages like <quote>bad result</quote> are really hard to interpret
intelligently. It's better to write why the result is <quote>bad</quote>,
e.g., <quote>invalid format</quote>.
e.g., <quote>invalid format</quote>.
</para>
</formalpara>
@ -638,7 +659,7 @@ BETTER: could not open file %s: %m
Try to avoid <quote>unknown</quote>. Consider <quote>error: unknown
response</quote>. If you don't know what the response is, how do you know
it's erroneous? <quote>Unrecognized</quote> is often a better choice.
Also, be sure to include the value being complained of.
Also, be sure to include the value being complained of.
<programlisting>
BAD: unknown node type
BETTER: unrecognized node type: 42
@ -654,7 +675,7 @@ BETTER: unrecognized node type: 42
couldn't <quote>find</quote> the resource. If, on the other hand, the
expected location of the resource is known but the program cannot access
it there then say that the resource doesn't <quote>exist</quote>. Using
<quote>find</quote> in this case sounds weak and confuses the issue.
<quote>find</quote> in this case sounds weak and confuses the issue.
</para>
</formalpara>