1
0
mirror of https://github.com/postgres/postgres.git synced 2025-07-02 09:02:37 +03:00

Indexing support for pattern matching operations via separate operator

class when lc_collate is not C.
This commit is contained in:
Peter Eisentraut
2003-05-15 15:50:21 +00:00
parent 2a2f6cfa39
commit 2c0556068f
20 changed files with 489 additions and 208 deletions

View File

@ -1,4 +1,4 @@
<!-- $Header: /cvsroot/pgsql/doc/src/sgml/charset.sgml,v 2.35 2003/04/15 13:26:54 petere Exp $ -->
<!-- $Header: /cvsroot/pgsql/doc/src/sgml/charset.sgml,v 2.36 2003/05/15 15:50:18 petere Exp $ -->
<chapter id="charset">
<title>Localization</>
@ -213,23 +213,13 @@ initdb --locale=sv_SE
The <function>to_char</> family of functions
</para>
</listitem>
<listitem>
<para>
The <literal>LIKE</> and <literal>~</> operators for pattern
matching
</para>
</listitem>
</itemizedlist>
</para>
<para>
The only severe drawback of using the locale support in
<productname>PostgreSQL</> is its speed. So use locales only if you
actually need it. It should be noted in particular that selecting
a non-C locale disables index optimizations for <literal>LIKE</> and
<literal>~</> operators, which can make a huge difference in the
speed of searches that use those operators.
<productname>PostgreSQL</> is its speed. So use locales only if
you actually need them.
</para>
</sect2>

View File

@ -1,4 +1,4 @@
<!-- $Header: /cvsroot/pgsql/doc/src/sgml/indices.sgml,v 1.40 2003/03/25 16:15:36 petere Exp $ -->
<!-- $Header: /cvsroot/pgsql/doc/src/sgml/indices.sgml,v 1.41 2003/05/15 15:50:18 petere Exp $ -->
<chapter id="indexes">
<title id="indexes-title">Indexes</title>
@ -132,6 +132,19 @@ CREATE INDEX test1_id_index ON test1 (id);
</simplelist>
</para>
<para>
The optimizer can also use a B-tree index for queries involving the
pattern matching operators <literal>LIKE</>,
<literal>ILIKE</literal>, <literal>~</literal>, and
<literal>~*</literal>, <emphasis>if</emphasis> the pattern is
anchored to the beginning of the string, e.g., <literal>col LIKE
'foo%'</literal> or <literal>col ~ '^foo'</literal>, but not
<literal>col LIKE '%bar'</literal>. However, if your server does
not use the C locale you will need to create the index with a
special operator class. See <xref linkend="indexes-opclass">
below.
</para>
<para>
<indexterm>
<primary>indexes</primary>
@ -405,6 +418,36 @@ CREATE INDEX <replaceable>name</replaceable> ON <replaceable>table</replaceable>
<literal>bigbox_ops</literal>.
</para>
</listitem>
<listitem>
<para>
The operator classes <literal>text_pattern_ops</literal>,
<literal>varchar_pattern_ops</literal>,
<literal>bpchar_pattern_ops</literal>, and
<literal>name_pattern_ops</literal> support B-tree indexes on
the types <type>text</type>, <type>varchar</type>,
<type>char</type>, and <type>name</type>, respectively. The
difference to the ordinary operator classes is that the values
are compared strictly character by character rather than
according to the locale-specific collation rules. This makes
these operator classes suitable for use by queries involving
pattern matching expressions (<literal>LIKE</literal> or POSIX
regular expressions) if the server does not use the standard
<quote>C</quote> locale. As an example, to index a
<type>varchar</type> column like this:
<programlisting>
CREATE INDEX test_index ON test_table (col varchar_pattern_ops);
</programlisting>
If you do use the C locale, you should instead create an index
with the default operator class. Also note that you should
create an index with the default operator class if you want
queries involving ordinary comparisons to use an index. Such
queries cannot use the
<literal><replaceable>xxx</replaceable>_pattern_ops</literal>
operator classes. It is possible, however, to create multiple
indexes on the same column with different operator classes.
</para>
</listitem>
</itemizedlist>
</para>

View File

@ -1,5 +1,5 @@
<!--
$Header: /cvsroot/pgsql/doc/src/sgml/release.sgml,v 1.187 2003/05/14 03:25:59 tgl Exp $
$Header: /cvsroot/pgsql/doc/src/sgml/release.sgml,v 1.188 2003/05/15 15:50:18 petere Exp $
-->
<appendix id="release">
@ -24,6 +24,7 @@ CDATA means the content is "SGML-free", so you can write without
worries about funny characters.
-->
<literallayout><![CDATA[
Pattern matching operations can use indexes regardless of locale
New frontend/backend protocol supports many long-requested features
SET AUTOCOMMIT TO OFF is no longer supported
Reimplementation of NUMERIC datatype for more speed

View File

@ -1,5 +1,5 @@
<!--
$Header: /cvsroot/pgsql/doc/src/sgml/runtime.sgml,v 1.179 2003/05/14 03:26:00 tgl Exp $
$Header: /cvsroot/pgsql/doc/src/sgml/runtime.sgml,v 1.180 2003/05/15 15:50:18 petere Exp $
-->
<Chapter Id="runtime">
@ -133,26 +133,13 @@ postgres$ <userinput>initdb -D /usr/local/pgsql/data</userinput>
</para>
<para>
<command>initdb</command> also initializes the default locale<indexterm><primary>locale</></> for
the database cluster. Normally, it will just take the locale
settings in the environment and apply them to the initialized
database. It is possible to specify a different locale for the
database; more information about that can be found in <xref
linkend="locale">. One surprise you might encounter while running
<command>initdb</command> is a notice similar to this:
<screen>
The database cluster will be initialized with locale de_DE.
This locale setting will prevent the use of indexes for pattern matching
operations. If that is a concern, rerun initdb with the collation order
set to "C". For more information see the documentation.
</screen>
This is intended to warn you that the currently selected locale
will cause indexes to be sorted in an order that prevents them from
being used for <literal>LIKE</> and regular-expression searches. If you need
good performance in such searches, you should set your current
locale to <literal>C</> and re-run <command>initdb</command>, e.g.,
by running <literal>initdb --lc-collate=C</literal>. The sort
order used within a particular database cluster is set by
<command>initdb</command> also initializes the default
locale<indexterm><primary>locale</></> for the database cluster.
Normally, it will just take the locale settings in the environment
and apply them to the initialized database. It is possible to
specify a different locale for the database; more information about
that can be found in <xref linkend="locale">. The sort order used
within a particular database cluster is set by
<command>initdb</command> and cannot be changed later, short of
dumping all data, rerunning <command>initdb</command>, and
reloading the data. So it's important to make this choice correctly