Improve the recently-added support for properly pluralized error messages by extending the ereport() API to cater for pluralization directly. This is better than the original method of calling ngettext outside the elog.c code because (1) it avoids double translation, which wastes cycles and in the worst case could give a wrong result; and (2) it avoids having to use a different coding method in PL code than in the core backend. The client-side uses of ngettext are not touched since neither of these concerns is very pressing in the client environment. Per my proposal of yesterday.
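
For readers unfamiliar with the underlying mechanism, here is a minimal sketch of the plural selection that ngettext performs, which is the step this commit moves inside the ereport() machinery. The helper name format_copied is hypothetical and is not part of PostgreSQL; the sketch assumes a libc that provides <libintl.h> (glibc or musl, for example) and that no message catalog is loaded, in which case ngettext falls back to the English rule.

```c
#include <libintl.h>   /* ngettext() */
#include <stdio.h>     /* snprintf(), size_t */

/*
 * Hypothetical helper, not part of PostgreSQL: format a "copied N file(s)"
 * message into buf, letting ngettext() choose the format string from n.
 * With no message catalog loaded, ngettext() returns the first string
 * when n == 1 and the second string otherwise.
 */
int
format_copied(char *buf, size_t len, unsigned long n)
{
    const char *fmt = ngettext("copied %lu file", "copied %lu files", n);

    /* n appears twice: once to pick the form, once as the value printed */
    return snprintf(buf, len, fmt, n);
}
```

Calling format_copied(buf, sizeof buf, 1) should leave "copied 1 file" in buf, while any other count selects the plural string. With a translated catalog installed, the same call could select among more than two forms.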
2025-07-28 23:42:10 +03:00 · 2009-06-04 18:33:08 +00:00
parent fd416db406
commit 76d4abf2d9
17 changed files with 292 additions and 102 deletions
--- a/doc/src/sgml/nls.sgml
+++ b/doc/src/sgml/nls.sgml
@ -1,4 +1,4 @@
-<!-- $PostgreSQL: pgsql/doc/src/sgml/nls.sgml,v 1.17 2009/01/09 10:54:07 petere Exp $ -->
+<!-- $PostgreSQL: pgsql/doc/src/sgml/nls.sgml,v 1.18 2009/06/04 18:33:06 tgl Exp $ -->

 <chapter id="nls">
 <chapterinfo>
@ -46,7 +46,7 @@
    <filename>msgmerge</filename>, respectively, in a GNU-compatible
    implementation.  Later, we will try to arrange it so that if you
    use a packaged source distribution, you won't need
-    <filename>xgettext</filename>.  (From CVS, you will still need
+    <filename>xgettext</filename>.  (If working from CVS, you will still need
    it.)  <application>GNU Gettext 0.10.36</application> or later is currently recommended.
   </para>

@ -152,7 +152,7 @@ msgstr "another translated"
    If there are already some <filename>.po</filename> files, then
    someone has already done some translation work.  The files are
    named <filename><replaceable>language</replaceable>.po</filename>,
-    where <replaceable>language</replaceable> is the 
+    where <replaceable>language</replaceable> is the
    <ulink url="http://lcweb.loc.gov/standards/iso639-2/englangn.html">
    ISO 639-1 two-letter language code (in lower case)</ulink>, e.g.,
    <filename>fr.po</filename> for French.  If there is really a need
@ -224,7 +224,7 @@ gmake update-po
    that gives room for other people to pick up your work.  However,
    you are encouraged to give priority to removing fuzzy entries
    after doing a merge.  Remember that fuzzy entries will not be
-    installed; they only serve as reference what might be the right
+    installed; they only serve as reference for what might be the right
    translation.
   </para>

@ -347,8 +347,8 @@ fprintf(stderr, "panic level %d\n", lvl);
 <programlisting>
 fprintf(stderr, gettext("panic level %d\n"), lvl);
 </programlisting>
-     (<symbol>gettext</symbol> is defined as a no-op if no NLS is
-     configured.)
+     (<symbol>gettext</symbol> is defined as a no-op if NLS support is
+     not configured.)
    </para>

    <para>
@ -421,6 +421,9 @@ fprintf(stderr, gettext("panic level %d\n"), lvl);
         them here.  If the translatable string is not the first
         argument, the item needs to be of the form
         <literal>func:2</literal> (for the second argument).
+         If you have a function that supports pluralized messages,
+         the item should look like <literal>func:1,2</literal>
+         (identifying the singular and plural message arguments).
        </para>
       </listitem>
      </varlistentry>
@ -451,8 +454,8 @@ fprintf(stderr, gettext("panic level %d\n"), lvl);
 printf("Files were %s.\n", flag ? "copied" : "removed");
 </programlisting>
      The word order within the sentence might be different in other
-      languages.  Also, even if you remember to call gettext() on each
-      fragment, the fragments might not translate well separately.  It's
+      languages.  Also, even if you remember to call <function>gettext()</> on
+      each fragment, the fragments might not translate well separately.  It's
      better to duplicate a little code so that each message to be
      translated is a coherent whole.  Only numbers, file names, and
      such-like run-time variables should be inserted at run time into
@ -475,13 +478,44 @@ else
    printf("copied %d files", n):
 </programlisting>
      then be disappointed.  Some languages have more than two forms,
-      with some peculiar rules.  We might have a solution for this in
-      the future, but for now the matter is best avoided altogether.
-      You could write:
+      with some peculiar rules.  It's often best to design the message
+      to avoid the issue altogether, for instance like this:
 <programlisting>
 printf("number of copied files: %d", n);
 </programlisting>
     </para>
+
+     <para>
+      If you really want to construct a properly pluralized message,
+      there is support for this, but it's a bit awkward.  When generating
+      a primary or detail error message in <function>ereport()</>, you can
+      write something like this:
+<programlisting>
+errmsg_plural("copied %d file",
+              "copied %d files",
+              n,
+              n)
+</programlisting>
+      The first argument is the format string appropriate for English
+      singular form, the second is the format string appropriate for
+      English plural form, and the third is the integer control value
+      that determines which plural form to use.  Subsequent arguments
+      are formatted per the format string as usual.  (Normally, the
+      pluralization control value will also be one of the values to be
+      formatted, so it has to be written twice.)  In English it only
+      matters whether <replaceable>n</> is 1 or not 1, but in other
+      languages there can be many different plural forms.  The translator
+      sees the two English forms as a group and has the opportunity to
+      supply multiple substitute strings, with the appropriate one being
+      selected based on the run-time value of <replaceable>n</>.
+     </para>
+
+     <para>
+      If you need to pluralize a message that isn't going directly to an
+      <function>errmsg</> or <function>errdetail</> report, you have to use
+      the underlying function <function>ngettext</>.  See the gettext
+      documentation.
+     </para>
    </listitem>

    <listitem>
--- a/doc/src/sgml/sources.sgml
+++ b/doc/src/sgml/sources.sgml
@ -1,4 +1,4 @@
-<!-- $PostgreSQL: pgsql/doc/src/sgml/sources.sgml,v 2.33 2009/04/27 16:27:36 momjian Exp $ -->
+<!-- $PostgreSQL: pgsql/doc/src/sgml/sources.sgml,v 2.34 2009/06/04 18:33:06 tgl Exp $ -->

 <chapter id="source">
  <title>PostgreSQL Coding Conventions</title>
@ -181,6 +181,19 @@ ereport(ERROR,
     not worth expending translation effort on.
    </para>
   </listitem>
+   <listitem>
+    <para>
+     <function>errmsg_plural(const char *fmt_singular, const char *fmt_plural,
+     unsigned long n, ...)</function> is like <function>errmsg</>, but with
+     support for various plural forms of the message.
+     <replaceable>fmt_singular</> is the English singular format,
+     <replaceable>fmt_plural</> is the English plural format,
+     <replaceable>n</> is the integer value that determines which plural
+     form is needed, and the remaining arguments are formatted according
+     to the selected format string.  For more information see
+     <xref linkend="nls-guidelines">.
+    </para>
+   </listitem>
   <listitem>
    <para>
     <function>errdetail(const char *msg, ...)</function> supplies an optional
@ -201,6 +214,14 @@ ereport(ERROR,
     sent to the client.
    </para>
   </listitem>
+   <listitem>
+    <para>
+     <function>errdetail_plural(const char *fmt_singular, const char *fmt_plural,
+     unsigned long n, ...)</function> is like <function>errdetail</>, but with
+     support for various plural forms of the message.
+     For more information see <xref linkend="nls-guidelines">.
+    </para>
+   </listitem>
   <listitem>
    <para>
     <function>errhint(const char *msg, ...)</function> supplies an optional
@ -390,14 +411,14 @@ Hint:       the addendum
   <para>
    There are functions in the backend that will double-quote their own output
    at need (for example, <function>format_type_be</>()).  Do not put
-    additional quotes around the output of such functions. 
+    additional quotes around the output of such functions.
   </para>

   <para>
    Rationale: Objects can have names that create ambiguity when embedded in a
    message.  Be consistent about denoting where a plugged-in name starts and
    ends.  But don't clutter messages with unnecessary or duplicate quote
-    marks. 
+    marks.
   </para>

  </simplesect>
@ -413,7 +434,7 @@ Hint:       the addendum
   <para>
    Primary error messages: Do not capitalize the first letter.  Do not end a
    message with a period.  Do not even think about ending a message with an
-    exclamation point. 
+    exclamation point.
   </para>

   <para>
@ -430,7 +451,7 @@ Hint:       the addendum
    long enough to be more than one sentence, they should be split into
    primary and detail parts.)  However, detail and hint messages are longer
    and might need to include multiple sentences.  For consistency, they should
-    follow complete-sentence style even when there's only one sentence. 
+    follow complete-sentence style even when there's only one sentence.
   </para>

  </simplesect>
@ -473,7 +494,7 @@ Hint:       the addendum
   <para>
    Use past tense if an attempt to do something failed, but could perhaps
    succeed next time (perhaps after fixing some problem).  Use present tense
-    if the failure is certainly permanent. 
+    if the failure is certainly permanent.
   </para>

   <para>
@ -489,20 +510,20 @@ cannot open file "%s"
    message should give a reason, such as <quote>disk full</quote> or
    <quote>file doesn't exist</quote>.  The past tense is appropriate because
    next time the disk might not be full anymore or the file in question might
-    exist. 
+    exist.
   </para>

   <para>
    The second form indicates that the functionality of opening the named file
    does not exist at all in the program, or that it's conceptually
    impossible.  The present tense is appropriate because the condition will
-    persist indefinitely. 
+    persist indefinitely.
   </para>

   <para>
    Rationale: Granted, the average user will not be able to draw great
    conclusions merely from the tense of the message, but since the language
-    provides us with a grammar we should use it correctly. 
+    provides us with a grammar we should use it correctly.
   </para>

  </simplesect>
@ -552,7 +573,7 @@ could not open file %s: %m
    to paste this into a single smooth sentence, so some sort of punctuation
    is needed.  Putting the embedded text in parentheses has also been
    suggested, but it's unnatural if the embedded text is likely to be the
-    most important part of the message, as is often the case. 
+    most important part of the message, as is often the case.
   </para>

  </simplesect>
@ -579,7 +600,7 @@ BETTER: could not open file %s (I/O failure)
    Don't include the name of the reporting routine in the error text. We have
    other mechanisms for finding that out when needed, and for most users it's
    not helpful information.  If the error text doesn't make as much sense
-    without the function name, reword it. 
+    without the function name, reword it.
 <programlisting>
 BAD:    pg_atoi: error in "z": cannot parse "z"
 BETTER: invalid input syntax for integer: "z"
@ -620,7 +641,7 @@ BETTER: could not open file %s: %m
   <para>
    Error messages like <quote>bad result</quote> are really hard to interpret
    intelligently.  It's better to write why the result is <quote>bad</quote>,
-    e.g., <quote>invalid format</quote>. 
+    e.g., <quote>invalid format</quote>.
   </para>
  </formalpara>

@ -638,7 +659,7 @@ BETTER: could not open file %s: %m
    Try to avoid <quote>unknown</quote>.  Consider <quote>error: unknown
    response</quote>.  If you don't know what the response is, how do you know
    it's erroneous? <quote>Unrecognized</quote> is often a better choice.
-    Also, be sure to include the value being complained of. 
+    Also, be sure to include the value being complained of.
 <programlisting>
 BAD:    unknown node type
 BETTER: unrecognized node type: 42
@ -654,7 +675,7 @@ BETTER: unrecognized node type: 42
    couldn't <quote>find</quote> the resource.  If, on the other hand, the
    expected location of the resource is known but the program cannot access
    it there then say that the resource doesn't <quote>exist</quote>.  Using
-    <quote>find</quote> in this case sounds weak and confuses the issue. 
+    <quote>find</quote> in this case sounds weak and confuses the issue.
   </para>
  </formalpara>