postgres

mirror of https://github.com/postgres/postgres.git synced 2025-07-05 07:21:24 +03:00

Author	SHA1	Message	Date
Heikki Linnakangas	dd12bef58c	Include array size in forward declaration. Some compilers require it. At least Visual Studio, according to the buildfarm, and gcc with the -pedantic flag.	2017-03-13 21:53:38 +02:00
Heikki Linnakangas	aeed17d000	Use radix tree for character encoding conversions. Replace the mapping tables used to convert between UTF-8 and other character encodings with new radix tree-based maps. Looking up an entry in a radix tree is much faster than a binary search in the old maps. As a bonus, the radix tree representation is also more compact, making the binaries slightly smaller. The "combined" maps work the same as before, with binary search. They are much smaller than the main tables, so it doesn't matter so much. However, the "combined" maps are now stored in the same .map files as the main tables. This seems more clear, since they're always used together, and generated from the same source files. Patch by Kyotaro Horiguchi, with lot of hacking by me at various stages. Reviewed by Michael Paquier and Daniel Gustafsson. Discussion: https://www.postgresql.org/message-id/20170306.171609.204324917.horiguchi.kyotaro%40lab.ntt.co.jp	2017-03-13 20:46:39 +02:00
Heikki Linnakangas	84892692fd	Remove obsolete references to JIS0201.TXT JIS0208.TXT. We don't use those files anymore, since commit `1de9cc0dcc`.	2017-03-13 19:06:56 +02:00
Heikki Linnakangas	53dd2da257	Add KOI8-U map files to Makefile. These were left out by mistake back when support for KOI8-U encoding was added. Extracted from Kyotaro Horiguchi's larger patch.	2017-02-02 14:12:35 +02:00
Heikki Linnakangas	bc1686f3f6	Small fixes to the Perl scripts to create unicode conversion tables. Add missing semicolons in UCS_to_* perl scripts. For consistency, use "$hashref->{key}" style everywhere. Kyotaro Horiguchi Discussion: https://www.postgresql.org/message-id/20170130.153738.139030994.horiguchi.kyotaro@lab.ntt.co.jp	2017-02-01 11:23:53 +02:00
Bruce Momjian	1d25779284	Update copyright via script for 2017	2017-01-03 13:48:53 -05:00
Heikki Linnakangas	021d254d9a	Make all unicode perl scripts to use strict, rearrange logic for clarity. The loops were a bit difficult to understand, due to breaking out of them early. Also fix things that perlcritic complained about. Daniel Gustafsson	2016-11-30 18:06:34 +02:00
Heikki Linnakangas	1de9cc0dcc	Rewrite the perl scripts to produce our Unicode conversion tables. Generate EUC_CN mappings from gb-18030-2000.xml, because GB2312.TXT is no longer available. Get UHC from windows-949-2000.xml, it's more up-to-date. Plus tons more small changes. With these changes, the perl scripts faithfully produce the *.map files we have in the repository, from the external source files. In the passing, fix the Makefile to also download CP932.TXT and CP950.TXT. Based on patches by Kyotaro Horiguchi, reviewed by Daniel Gustafsson. Discussion: https://postgr.es/m/08e7892a-d55c-eefe-76e6-7910bc8dd1f3@iki.fi	2016-11-30 14:54:52 +02:00
Heikki Linnakangas	6c303223be	Remove leading zeros, for consistency with other map files. The common style is to pad to 4 digits. Running the current perl scripts to generate these map files would override this change, but the next commit will rewrite the perl scripts to produce this style. I'm doing this as a separate commit, to make it more clear what non-cosmetic changes the next commit makes to the map files. Discussion: https://postgr.es/m/08e7892a-d55c-eefe-76e6-7910bc8dd1f3@iki.fi	2016-11-30 14:54:41 +02:00
Heikki Linnakangas	2c09c93ce1	Remove code points < 0x80 from character conversion tables. PostgreSQL treats characters with < 0x80 leading byte as plain ASCII, and they are not even passed to the conversion routines. There is no point in having them in the conversion tables. Everything in the tables were direct ASCII-ASCII mappings, except for two: * SHIFT_JIS_2004 code point 0x5C (backslash in ASCII) was mapped to Unicode YEN SIGN character. * Unicode 0x5C (backslash again) was mapped to "REVERSE SOLIDUS" in SHIFT_JIS_2004 These mappings never had any effect, so there's no functional change from removing them. Discussion: https://postgr.es/m/08e7892a-d55c-eefe-76e6-7910bc8dd1f3@iki.fi	2016-11-30 14:53:57 +02:00
Robert Haas	00c6d8077f	Fix broken statement in UCS_to_most.pl. This has been wrong for a very long time, and it's puzzling to me how it ever worked for anyone. Kyotaro Horiguchi	2016-11-15 09:41:53 -05:00
Peter Eisentraut	3a47c704fb	Add make rules to download raw Unicode mapping files. This serves as implicit documentation and is handy if someone wants to tweak things. The rules are not part of a normal build, like this entire directory.	2016-11-01 11:54:58 -04:00
Heikki Linnakangas	0aec7f9aec	Remove bogus mapping from UTF-8 to SJIS conversion table. 0xc19c is not a valid UTF-8 byte sequence. It doesn't do any harm, AFAICS, but it's surely not intentional. No backpatching though, just to be sure. In the passing, also add a file header comment to the file, like the UCS_to_SJIS.pl script would produce. (The file was originally created with UCS_to_SJIS.pl, but has been modified by hand since then. That's questionable, but I'll leave fixing that for later.) Kyotaro Horiguchi Discussion: <20160907.155050.233844095.horiguchi.kyotaro@lab.ntt.co.jp>	2016-10-07 23:56:42 +03:00
Noah Misch	3be0a62ffe	Finish pgindent run for 9.6: Perl files.	2016-06-12 04:19:56 -04:00
Peter Eisentraut	f9e5ed61ed	UCS_to_EUC_JIS_2004.pl: Turn off "test" mode by default. It produces debugging output files that are of no further use, so we don't need that by default.	2016-03-16 10:43:05 -04:00
Peter Eisentraut	9dbcb500ca	Make spacing and punctuation consistent	2016-03-16 10:43:05 -04:00
Peter Eisentraut	1fa2a6b1d4	Add prerequisite for KOI8-U.TXT. This was missed when the encoding was added.	2016-03-03 20:44:47 -05:00
Peter Eisentraut	b497abc602	Make some adjustments in variable assignments. These variables aren't really used for anything interesting, but it seems the existing grouping was somewhat nonsensical.	2016-03-03 20:44:47 -05:00
Peter Eisentraut	7a4a813c99	Add missing rules related to EUC_JIS_2004 and SHIFT_JIS_2004 encodings. This was apparently forgotten in commit `75c6519ff6`.	2016-03-03 20:44:47 -05:00
Peter Eisentraut	bd6cf3f237	Add Unicode map generation scripts as rule prerequisites. That way, the rules will trigger when the scripts change.	2016-02-29 21:19:28 -05:00
Peter Eisentraut	cc074bf6c1	Fix comments. Some of these comments were copied and pasted without updating them, some of them were duplicates.	2016-02-29 21:19:24 -05:00
Peter Eisentraut	9a3e06baa2	UCS_to_most.pl: Make executable, for consistency with other scripts	2016-02-29 21:19:17 -05:00
Bruce Momjian	ee94300446	Update copyright for 2016. Backpatch certain files through 9.1	2016-01-02 13:33:40 -05:00
Tom Lane	5afdfc9cbb	Update UCS_to_GB18030.pl with info about origin of the reference file.	2015-11-27 17:31:26 -05:00
Tom Lane	e17dab53ea	Auto-generate file header comments in Unicode mapping files. Some of the Unicode/.map files had identification comments added to them, evidently by hand. Others did not. Modify the generating scripts to produce these comments automatically, and update the generated files that lacked them. This is just minor cleanup as a by-product of trying to verify that the .map files can indeed be reproduced from authoritative data. There are a depressingly large number that fail to reproduce from the claimed sources. I have not touched those in this commit, except for the JIS 2004-related files which required only a single comment update to match. Since this only affects comments, no need to consider a back-patch.	2015-11-27 16:50:47 -05:00
Bruce Momjian	807b9e0dff	pgindent run for 9.5	2015-05-23 21:35:49 -04:00
Tom Lane	8d3e0906df	Extend GB18030 encoding conversion to cover full Unicode range. Our previous code for GB18030 <-> UTF8 conversion only covered Unicode code points up to U+FFFF, but the actual spec defines conversions for all code points up to U+10FFFF. That would be rather impractical as a lookup table, but fortunately there is a simple algorithmic conversion between the additional code points and the equivalent GB18030 byte patterns. Make use of the just-added callback facility in LocalToUtf/UtfToLocal to perform the additional conversions. Having created the infrastructure to do that, we can use the same code to map certain linearly-related subranges of the Unicode space below U+FFFF, allowing removal of the corresponding lookup table entries. This more than halves the lookup table size, which is a substantial savings; utf8_and_gb18030.so drops from nearly a megabyte to about half that. In support of doing that, replace ISO10646-GB18030.TXT with the data file gb-18030-2000.xml (retrieved from http://source.icu-project.org/repos/icu/data/trunk/charset/data/xml/ ) in which these subranges have been deleted from the simple lookup entries. Per bug #12845 from Arjen Nienhuis. The conversion code added here is based on his proposed patch, though I whacked it around rather heavily.	2015-05-15 15:02:13 -04:00
Tom Lane	7730f48ede	Teach UtfToLocal/LocalToUtf to support algorithmic encoding conversions. Until now, these functions have only supported encoding conversions using lookup tables, which is fine as long as there's not too many code points to convert. However, GB18030 expects all 1.1 million Unicode code points to be convertible, which would require a ridiculously-sized lookup table. Fortunately, a large fraction of those conversions can be expressed through arithmetic, ie the conversions are one-to-one in certain defined ranges. To support that, provide a callback function that is used after consulting the lookup tables. (This patch doesn't actually change anything about the GB18030 conversion behavior, just provide infrastructure for fixing it.) Since this requires changing the APIs of UtfToLocal/LocalToUtf anyway, take the opportunity to rearrange their argument lists into what seems to me a saner order. And beautify the call sites by using lengthof() instead of error-prone sizeof() arithmetic. In passing, also mark all the lookup tables used by these calls "const". This moves an impressive amount of stuff into the text segment, at least on my machine, and is safer anyhow.	2015-05-14 22:27:12 -04:00
Bruce Momjian	4baaf863ec	Update copyright for 2015. Backpatch certain files through 9.0	2015-01-06 11:43:47 -05:00
Bruce Momjian	7e04792a1c	Update copyright for 2014. Update all files in head, and files COPYRIGHT and legal.sgml in all back branches.	2014-01-07 16:05:30 -05:00
Bruce Momjian	bd61a623ac	Update copyrights for 2013. Fully update git head, and update back branches in ./COPYRIGHT and legal.sgml files.	2013-01-01 17:15:01 -05:00
Bruce Momjian	042d9ffc28	Run newly-configured perltidy script on Perl files. Run on HEAD and 9.2.	2012-07-04 21:47:49 -04:00
Bruce Momjian	e126958c2e	Update copyright notices for year 2012.	2012-01-01 18:01:58 -05:00
Bruce Momjian	5d950e3b0c	Stamp copyrights for year 2011.	2011-01-01 13:18:15 -05:00
Peter Eisentraut	fc946c39ae	Remove useless whitespace at end of lines	2010-11-23 22:34:55 +02:00
Magnus Hagander	9f2e211386	Remove cvs keywords from all files.	2010-09-20 22:08:53 +02:00
Tom Lane	f679cfe97b	Replace last remaining $Id$ with $PostgreSQL$.	2010-09-19 16:27:17 +00:00
Peter Eisentraut	3f11971916	Remove extra newlines at end and beginning of files, add missing newlines at end of files.	2010-08-19 05:57:36 +00:00
Bruce Momjian	346a721eed	Remove personal copyright now that file has been rewritten using existing *.pl conversion script. Andreas 'ads' Scherbaum	2010-02-16 20:35:07 +00:00
Bruce Momjian	0239800893	Update copyright for the year 2010.	2010-01-02 16:58:17 +00:00
Tatsuo Ishii	5c7f55342b	Update UTF-8 <--> EUC_KR, JOHAB, UHC mappings. Patch contributed by Chuck McDevitt	2009-05-03 01:17:41 +00:00
Heikki Linnakangas	afcde99b1b	Fix case of the just resurrected UCS_to_BIG5.pl script, and update Makefile to use it.	2009-03-18 16:26:18 +00:00
Heikki Linnakangas	2dbbf33f4a	Add seven kanji characters defined in the Windows 950 codepage to our big5/win950 <-> UTF8 conversion tables. Per report by Roger Chang.	2009-03-18 16:17:58 +00:00
Peter Eisentraut	8b9dd6b5fd	Support for KOI8U encoding	2009-02-10 19:29:39 +00:00
Peter Eisentraut	06941da30b	Add possibility to generate only some files, by passing command-line arguments.	2009-02-10 16:36:55 +00:00
Bruce Momjian	511db38ace	Update copyright for 2009.	2009-01-01 17:24:05 +00:00
Tom Lane	ce9baa06f0	Fix some missed copyright updates.	2008-01-01 20:31:21 +00:00
Bruce Momjian	9098ab9e32	Update copyrights in source tree to 2008.	2008-01-01 19:46:01 +00:00
Tatsuo Ishii	75c6519ff6	Add new encodings EUC_JIS_2004 and SHIFT_JIS_2004, along with new conversions among EUC_JIS_2004, SHIFT_JIS_2004 and UTF-8. The catalog version has been bumped.	2007-03-25 11:56:04 +00:00
Tatsuo Ishii	4c35ec53a9	Allow 4-byte UTF-8 (UCS-4 range 00010000-001FFFFF). This is necessary to support JIS X 0213 <--> UTF-8 conversion.	2007-03-23 13:51:30 +00:00
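
Commits `7730f48ede` and `8d3e0906df` above describe converting the GB18030 code points beyond U+FFFF by arithmetic rather than by lookup table. The following is a minimal sketch of that idea, not the PostgreSQL implementation (which is written in C and plugs into UtfToLocal/LocalToUtf through a callback). It assumes the standard GB18030 four-byte layout, in which U+10000 corresponds to the byte sequence 0x90 0x30 0x81 0x30 and the mapping is linear from there. All function and constant names are hypothetical, chosen for illustration.

```python
# Illustrative sketch only: not PostgreSQL's C code. Names are hypothetical.
# GB18030 four-byte codes have the form b1 b2 b3 b4 with
#   b1, b3 in 0x81..0xFE (126 values) and b2, b4 in 0x30..0x39 (10 values).

SUPPLEMENTARY_BASE = 0x10000  # first code point above U+FFFF
LINEAR_BASE = 189000          # linear index of 0x90308130, which maps to U+10000


def gb_four_byte_to_linear(b1, b2, b3, b4):
    """Flatten a four-byte GB18030 code into a single linear index."""
    return (((b1 - 0x81) * 10 + (b2 - 0x30)) * 126 + (b3 - 0x81)) * 10 + (b4 - 0x30)


def linear_to_gb_four_byte(n):
    """Inverse of gb_four_byte_to_linear: split a linear index into four bytes."""
    n, d4 = divmod(n, 10)
    n, d3 = divmod(n, 126)
    d1, d2 = divmod(n, 10)
    return bytes([0x81 + d1, 0x30 + d2, 0x81 + d3, 0x30 + d4])


def unicode_to_gb18030_supplementary(cp):
    """Convert a code point in U+10000..U+10FFFF to its GB18030 bytes."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("code point outside the supplementary range")
    return linear_to_gb_four_byte(LINEAR_BASE + (cp - SUPPLEMENTARY_BASE))


def gb18030_to_unicode_supplementary(seq):
    """Convert a four-byte GB18030 sequence back to a supplementary code point."""
    if len(seq) != 4:
        raise ValueError("expected exactly four bytes")
    b1, b2, b3, b4 = seq
    if not (0x81 <= b1 <= 0xFE and 0x30 <= b2 <= 0x39
            and 0x81 <= b3 <= 0xFE and 0x30 <= b4 <= 0x39):
        raise ValueError("not a well-formed GB18030 four-byte code")
    cp = SUPPLEMENTARY_BASE + gb_four_byte_to_linear(b1, b2, b3, b4) - LINEAR_BASE
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("four-byte code is outside the supplementary range")
    return cp
```

Because the mapping is a pure offset between two linear ranges, no table entries are needed for roughly a million code points. As the commit message for `8d3e0906df` notes, the same approach also covers certain linearly-related subranges below U+FFFF; those ranges are not modelled in this sketch.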


85 Commits