are not alphabetic characters, although they are not word-breakers either.
So, treat them as part of a word.
Per off-list discussion with Dibyendra Hyoju <dibyendra@gmail.com> and
Bal Krishna Bal <balkrishna7bal@gmail.com> about the Nepali language and
the Devanagari alphabet.
- pg_wchar and wchar_t could have different sizes, so char2wchar
doesn't call pg_mb2wchar_with_len, to prevent an out-of-bounds
memory bug
- make char2wchar/wchar2char symmetric; now they should not be
called with C-locale because mbstowcs/wcstombs often don't
work correctly with C-locale.
- Text parser uses pg_mb2wchar_with_len directly in case of
C-locale and multibyte encoding
Per bug report by Hiroshi Inoue <inoue@tpf.co.jp> and
following discussion.
Backpatch back to 8.2, where multibyte support was implemented in tsearch.
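
The buffer-size concern above can be illustrated with a minimal sketch. It is
not the PostgreSQL char2wchar; sketch_char2wchar is a hypothetical helper that
only shows the idea of sizing the output in wchar_t elements and letting the C
library's mbstowcs() do the conversion, so nothing is ever written in units of
a differently sized character type.

```c
#include <stdlib.h>
#include <wchar.h>

/*
 * Hypothetical helper (an assumption for illustration, not PostgreSQL code):
 * convert a NUL-terminated multibyte string into a caller-supplied buffer of
 * tolen wchar_t elements.
 *
 * Returns the number of wide characters written, not counting the
 * terminator, or (size_t) -1 if the input is invalid in the current locale
 * or the buffer cannot hold even a terminator.
 */
size_t
sketch_char2wchar(wchar_t *to, size_t tolen, const char *from)
{
    size_t      n;

    if (tolen == 0)
        return (size_t) -1;

    /* Convert at most tolen - 1 characters so a terminator always fits. */
    n = mbstowcs(to, from, tolen - 1);
    if (n == (size_t) -1)
        return n;

    /* mbstowcs() does not terminate the output when it fills the buffer. */
    to[n] = L'\0';
    return n;
}
```

Input longer than the buffer is truncated rather than overrunning it. As the
commit message notes, the real routines take a different path for C-locale,
which this sketch does not attempt to model.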
to 10, to compensate for the recent change in default statistics target.
The original number was pulled out of the air anyway :-(, but it was picked
in the context of the old default, so holding the default size of the
MCELEM array constant seems the best thing. Per discussion.
on the most common individual lexemes in place of the mostly-useless default
behavior of counting duplicate tsvectors. Future work: create selectivity
estimation functions that actually do something with these stats.
(Some other things we ought to look at doing: using the Lossy Counting
algorithm in compute_minimal_stats, and using the element-counting idea for
stats on regular arrays.)
Jan Urbanski
by installing an error context subroutine that will provide the file name
and line number for all errors detected while reading a config file.
Some of the reader routines were already doing that in an ad-hoc way for
errors detected directly in the reader, but it didn't help for problems
detected in subroutines, such as encoding violations.
Back-patch to 8.3 because 8.3 is where people will be trying to debug
configuration files.
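
A minimal, self-contained sketch of the error-context idea follows. All names
here (ErrCtx, err_ctx_stack, report_error, read_config_line) are hypothetical
stand-ins, not the backend's actual definitions; the point is only that a
callback pushed by the file reader can add the file name and line number to an
error raised in a subroutine that knows nothing about files.

```c
#include <stdio.h>

/* Hypothetical, simplified error context stack entry. */
typedef struct ErrCtx
{
    struct ErrCtx *previous;    /* next outer context, or NULL */
    void        (*callback) (void *arg, char *buf, size_t buflen);
    void       *arg;
} ErrCtx;

static ErrCtx *err_ctx_stack = NULL;

/* State the config file reader exposes to its context callback. */
typedef struct ConfigReadState
{
    const char *filename;
    int         lineno;
} ConfigReadState;

static void
config_read_context(void *arg, char *buf, size_t buflen)
{
    ConfigReadState *st = (ConfigReadState *) arg;

    snprintf(buf, buflen, "line %d of configuration file \"%s\"",
             st->lineno, st->filename);
}

/* Format the message plus one CONTEXT line per active callback. */
static void
report_error(const char *msg, char *out, size_t outlen)
{
    size_t      used = (size_t) snprintf(out, outlen, "ERROR: %s", msg);
    ErrCtx     *c;

    for (c = err_ctx_stack; c != NULL && used < outlen; c = c->previous)
    {
        char        ctx[256];

        c->callback(c->arg, ctx, sizeof(ctx));
        used += (size_t) snprintf(out + used, outlen - used,
                                  "\nCONTEXT: %s", ctx);
    }
}

/* A subroutine with no knowledge of files or line numbers. */
static int
check_encoding(const char *s, char *out, size_t outlen)
{
    for (; *s; s++)
    {
        if ((unsigned char) *s >= 0x80)
        {
            report_error("invalid byte sequence", out, outlen);
            return 0;
        }
    }
    return 1;
}

/* Reader routine: push the context, call the subroutine, pop the context. */
int
read_config_line(const char *filename, int lineno, const char *line,
                 char *out, size_t outlen)
{
    ConfigReadState st = {filename, lineno};
    ErrCtx      ctx;
    int         ok;

    ctx.previous = err_ctx_stack;
    ctx.callback = config_read_context;
    ctx.arg = &st;
    err_ctx_stack = &ctx;

    ok = check_encoding(line, out, outlen);

    err_ctx_stack = ctx.previous;
    return ok;
}
```

Because the context is attached once by the reader, every error path below it
gets the same location information without each subroutine having to pass the
file name and line number around.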
unnecessary #include lines in it. Also, move some tuple routine prototypes and
macros to htup.h, which allows removal of heapam.h inclusion from some .c
files.
For this to work, a new header file access/sysattr.h needed to be created,
initially containing attribute numbers of system columns, for pg_dump usage.
While at it, make contrib ltree, intarray and hstore header files more
consistent with our header style.
strings. This patch introduces four support functions cstring_to_text,
cstring_to_text_with_len, text_to_cstring, and text_to_cstring_buffer, and
two macros CStringGetTextDatum and TextDatumGetCString. A number of
existing macros that provided variants on these themes were removed.
Most of the places that need to make such conversions now require just one
function or macro call, in place of the multiple notational layers that used
to be needed. There are no longer any direct calls of textout or textin,
and we got most of the places that were using handmade conversions via
memcpy (there may be a few still lurking, though).
This commit doesn't make any serious effort to eliminate transient memory
leaks caused by detoasting toasted text objects before they reach
text_to_cstring. We changed PG_GETARG_TEXT_P to PG_GETARG_TEXT_PP in a few
places where it was easy, but much more could be done.
Brendan Jurd and Tom Lane
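
To illustrate what the four support functions do, here is a sketch over a
simplified, hypothetical length-prefixed type. SketchText is not the backend's
text representation, the sketch_ functions use malloc rather than the
backend's allocator, and toasting is ignored entirely; only the shape of the
conversions is meant to carry over.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a text value: byte count plus unterminated data. */
typedef struct SketchText
{
    size_t      len;            /* number of data bytes, no terminator */
    char        data[];         /* flexible array member */
} SketchText;

/* Build a text value from the first len bytes of s. */
SketchText *
sketch_cstring_to_text_with_len(const char *s, size_t len)
{
    SketchText *t = malloc(sizeof(SketchText) + len);

    if (t == NULL)
        return NULL;
    t->len = len;
    memcpy(t->data, s, len);
    return t;
}

/* Build a text value from a NUL-terminated string. */
SketchText *
sketch_cstring_to_text(const char *s)
{
    return sketch_cstring_to_text_with_len(s, strlen(s));
}

/* Return a freshly allocated NUL-terminated copy of the text value. */
char *
sketch_text_to_cstring(const SketchText *t)
{
    char       *s = malloc(t->len + 1);

    if (s == NULL)
        return NULL;
    memcpy(s, t->data, t->len);
    s[t->len] = '\0';
    return s;
}

/* Copy into a caller-supplied buffer, truncating to fit; always terminated. */
void
sketch_text_to_cstring_buffer(const SketchText *t, char *dst, size_t dst_len)
{
    size_t      n;

    if (dst_len == 0)
        return;
    n = (t->len < dst_len - 1) ? t->len : dst_len - 1;
    memcpy(dst, t->data, n);
    dst[n] = '\0';
}
```

Each conversion is a single call, which is the simplification the commit
message describes.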
unused memory holes in tsquery.
Per report by Richard Huxton <dev@archonet.com>.
It was working well because tsquery->size is in fact not used for any
kind of operation except comparing tsqueries. So, in HEAD it's enough to
fix the to_tsquery function, but in previous versions the optimization in
CompareTSQ has to be removed, to avoid requiring that all stored tsqueries
be renewed.
regis. Correct the latter's oversight that a bracket-expression needs to be
terminated. Reduce the ereports to elogs, since they are now not expected to
ever be hit (thus addressing Alvaro's original complaint).
In passing, const-ify the string argument to RS_compile.
subtlety that this function only returns a null terminator if it's
fed input that includes one; which, in the usage here, it's not.
This probably fixes bugs reported by Thomas Haegi.
Allow tag and entity names that follow XML rules. Provide for hexadecimal
as well as decimal numeric entities. Adjust code names to coincide with
new descriptions.
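
As an illustration of the numeric entity forms mentioned above, the following
sketch recognizes a decimal (&#NNN;) or hexadecimal (&#xHHH;) entity at the
start of a string. sketch_numeric_entity_len is a hypothetical function, not
the text parser's state machine, and it follows the XML character reference
grammar, which permits only a lowercase x; whether the parser is more lenient
is not stated here.

```c
#include <ctype.h>
#include <stddef.h>

/*
 * Hypothetical recognizer (an assumption for illustration): return the
 * length in bytes of a numeric entity at the start of s, including the
 * leading "&#" and the trailing ';', or 0 if s does not start with one.
 */
size_t
sketch_numeric_entity_len(const char *s)
{
    size_t      i = 2;
    size_t      digits = 0;
    int         hex = 0;

    if (s[0] != '&' || s[1] != '#')
        return 0;

    /* An 'x' after "&#" selects hexadecimal digits. */
    if (s[i] == 'x')
    {
        hex = 1;
        i++;
    }

    while (hex ? isxdigit((unsigned char) s[i])
           : isdigit((unsigned char) s[i]))
    {
        i++;
        digits++;
    }

    /* At least one digit and a terminating semicolon are required. */
    if (digits == 0 || s[i] != ';')
        return 0;

    return i + 1;
}
```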
gives the old behavior; selecting false allows the dictionary to be used
as a filter ahead of other dictionaries, because it will pass on rather
than accept words that aren't in its stopword list.
Jan Urbanski
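
The filtering behavior described above can be sketched as follows. The names
are hypothetical (the accept flag stands in for the option discussed, and
SketchResult is not the dictionary API's return convention); the sketch only
shows how the flag changes the outcome for a word that is not a stop word.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical outcomes of offering a word to the dictionary. */
typedef enum
{
    SKETCH_STOPWORD,            /* recognized as a stop word: discarded */
    SKETCH_ACCEPTED,            /* recognized: later dictionaries are skipped */
    SKETCH_PASSED_ON            /* not recognized: next dictionary is tried */
} SketchResult;

/*
 * Hypothetical lookup (an assumption for illustration). With accept true,
 * every non-stop word is claimed by this dictionary; with accept false,
 * non-stop words are handed on, so the dictionary acts purely as a
 * stop-word filter ahead of other dictionaries.
 */
SketchResult
sketch_lexize(const char *const *stopwords, size_t nstop,
              int accept, const char *word)
{
    size_t      i;

    for (i = 0; i < nstop; i++)
    {
        if (strcmp(stopwords[i], word) == 0)
            return SKETCH_STOPWORD;
    }
    return accept ? SKETCH_ACCEPTED : SKETCH_PASSED_ON;
}
```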
Throw an error for actual stop words, rather than a warning. This fixes
problems with cache reloading causing warning messages.
Re-enable stop words in regression tests; they were disabled by Tom.
Document "?" as API change.
behavior of wchar2char/char2wchar; this should resolve bug #3730. Avoid
excess computations of pg_mblen in t_isalpha and friends. Const-ify
APIs where possible.