var_samp(), stddev_pop(), and stddev_samp(). var_samp() and stddev_samp()
are just renamings of the historical Postgres aggregates variance() and
stddev() -- the latter names have been kept for backward compatibility.
This patch includes updates for the documentation and regression tests.
The catversion has been bumped.
NB: SQL2003 requires that DISTINCT not be specified for any of these
aggregates. Per discussion on -patches, I have NOT implemented this
restriction: if the user asks for stddev(DISTINCT x), presumably they
know what they are doing.
performance issue during regular merge passes not only the 'final merge'
case. The original design contemplated that there would never be more
than about one free block per 'tape', hence no need for an efficient
method of keeping the free blocks sorted. But given the later addition
of merge preread behavior in tuplesort.c, there is likely to be about
work_mem worth of free blocks, which is not so small ... and for that
matter the number of tapes isn't necessarily small anymore either. So
we'd better get rid of the assumption entirely. Instead, I'm assuming
that the usage pattern will involve alternation between merge preread
and writing of a new run. This makes it reasonable to just add blocks
to the list without sorting during successive ltsReleaseBlock calls,
and then do a qsort() when we start getting ltsGetFreeBlock() calls.
Experimentation seems to confirm that there aren't many qsort calls
relative to the number of ltsReleaseBlock/ltsGetFreeBlock calls.
we are doing the final merge pass on-the-fly, and not writing the data
back onto a 'tape', the number of free blocks in the tape set will become
large, leading to a lot of time wasted in ltsReleaseBlock(). There is
really no need to track the free blocks anymore in this state, so add a
simple shutoff switch. Per report from Stefan Kaltenbrunner.
(respectively) to rename yylex and related symbols. Some were doing
it this way already, while others used not-too-reliable sed hacks in
the Makefiles. It's all nice and consistent now.
not likely ever to be implemented seeing it's been removed from SQL2003.
This allows getting rid of the 'filter' version of yylex() that we had in
parser.c, which should save at least a few microseconds in parsing.
- new function justify_interval(interval)
- modified function justify_hours(interval)
- modified function justify_days(interval)
These functions are defined to meet the requirements as discussed in
this thread. Specifically:
- justify_hours makes certain the sign bit on the hours
matches the sign bit on the days. It only checks the
sign bit on the days, and not the months, when
determining if the hours should be positive or negative.
After the call, -24 < hours < 24.
- justify_days makes certain the sign bit on the days
matches the sign bit on the months. It's behavior does
not depend on the hours, nor does it modify the hours.
After the call, -30 < days < 30.
- justify_interval makes sure the sign bits on all three
fields months, days, and hours are all the same. After
the call, -24 < hours < 24 AND -30 < days < 30.
Mark Dilger
In particular, ensure that enlargement of the memtuples[] array doesn't
fall foul of MaxAllocSize when work_mem is very large, and don't bother
enlarging it if that would force an immediate switch into 'tape' mode anyway.
we'll go over to disk-based sort if we reach that limit.
This fixes Stefan Kaltenbrunner's observation that sorting can suffer an
'invalid memory alloc request size' failure when sort_mem is set large
enough. It's unfortunately not so easy to fix in 8.1 ...
are unnecessarily allocated on the heap rather than the stack. If the
StringInfo doesn't outlive the stack frame in which it is created,
there is no need to allocate it on the heap via makeStringInfo() --
stack allocation is faster. While it's not a big deal unless the
code is in a critical path, I don't see a reason not to save a few
cycles -- using stack allocation is not less readable.
I also cleaned up a bit of code along the way: moved variable
declarations into a more tightly-enclosing scope where possible,
fixed some pointless copying of strings in dblink, etc.
creation of a shell type. This allows a less hacky way of dealing with
the mutual dependency between a datatype and its I/O functions: make a
shell type, then make the functions, then define the datatype fully.
We should fix pg_dump to handle things this way, but this commit just deals
with the backend.
Martijn van Oosterhout, with some corrections by Tom Lane.
each tuple, as per my proposal of several days ago. Also, clean up
sort memory management by keeping all working data in a separate memory
context, and refine the handling of low-memory conditions.
the script is not executable as UCS_to_most.pl is in CVS. It also won't
pick up any custom setting of the perl version/location to use. This
patch calls perl scripts like $(PERL) $(srcdir)/script.pl.
Kris Jurka
allocates the control data. The per-tape buffers are allocated only
on first use. This saves memory in situations where tuplesort.c
overestimates the number of tapes needed (ie, there are fewer runs
than tapes). Also, this makes legitimate the coding in inittapes()
that includes tape buffer space in the maximum-memory calculation:
when inittapes runs, we've already expended the whole allowed memory
on tuple storage, and so we'd better not allocate all the tape buffers
until we've flushed some tuples out of memory.
with fixed merge order (fixed number of "tapes") was based on obsolete
assumptions, namely that tape drives are expensive. Since our "tapes"
are really just a couple of buffers, we can have a lot of them given
adequate workspace. This allows reduction of the number of merge passes
with consequent savings of I/O during large sorts.
Simon Riggs with some rework by Tom Lane
up a bunch of the support utilities.
In src/backend/utils/mb/Unicode remove nearly duplicate copies of the
UCS_to_XXX perl script and replace with one version to handle all generic
files. Update the Makefile so that it knows about all the map files.
This produces a slight difference in some of the map files, using a
uniform naming convention and not mapping the null character.
In src/backend/utils/mb/conversion_procs create a master utf8<->win
codepage function like the ISO 8859 versions instead of having a separate
handler for each conversion.
There is an externally visible change in the name of the win1258 to utf8
conversion. According to the documentation notes, it was named
incorrectly and this changes it to a standard name.
Running the Unicode mapping perl scripts has shown some additional mapping
changes in koi8r and iso8859-7.
id (CVE-2006-0553). Also fix related bug in SET SESSION AUTHORIZATION that
allows unprivileged users to crash the server, if it has been compiled with
Asserts enabled. The escalation-of-privilege risk exists only in 8.1.0-8.1.2.
However, the Assert-crash risk exists in all releases back to 7.3.
Thanks to Akio Ishida for reporting this problem.
regardless of the current schema search path. Since CREATE OPERATOR CLASS
only allows one default opclass per datatype regardless of schemas, this
should have minimal impact, and it fixes problems with failure to find a
desired opclass while restoring dump files. Per discussion at
http://archives.postgresql.org/pgsql-hackers/2006-02/msg00284.php.
Remove now-redundant-or-unused code in typcache.c and namespace.c,
and backpatch as far as 8.0.
If the second output column value is 'a\nb', the 'b' should appear
in the second display column, rather than the first column as it
does now.
Change libpq's PQdsplen() to return more useful values.
> Note: this changes the PQdsplen function, it can now return zero or
> minus one which was not possible before. It doesn't appear anyone is
> actually using the functions other than psql but it is a change. The
> functions are not actually documentated anywhere so it's not like we're
> breaking a defined interface. The new semantics follow the Unicode
> standard.
BACKWARD COMPATIBLE CHANGE.
The only user-visible change I saw in the regression tests is that a
SELECT * on a table where all the columns have been dropped doesn't
return a blank line like before. This seems like a step forward.
Martijn van Oosterhout
the format on Tuple(Numeric) and the format to calculate(NumericVar)
are different. I understood that to reduce I/O. However, when many
comparisons or calculations of NUMERIC are executed, the conversion
of Numeric and NumericVar becomes a bottleneck.
It is profile result when "create index on NUMERIC column" is executed:
% cumulative self self total
time seconds seconds calls s/call s/call name
17.61 10.27 10.27 34542006 0.00 0.00 cmp_numerics
11.90 17.21 6.94 34542006 0.00 0.00 comparetup_index
7.42 21.54 4.33 71102587 0.00 0.00 AllocSetAlloc
7.02 25.64 4.09 69084012 0.00 0.00 set_var_from_num
4.87 28.48 2.84 69084012 0.00 0.00 alloc_var
4.79 31.27 2.79 142205745 0.00 0.00 AllocSetFreeIndex
4.55 33.92 2.65 34542004 0.00 0.00 cmp_abs
4.07 36.30 2.38 71101189 0.00 0.00 AllocSetFree
3.83 38.53 2.23 69084012 0.00 0.00 free_var
The create index command executes many comparisons of Numeric values.
Functions other than comparetup_index spent a lot of cycles for
conversion from Numeric to NumericVar.
An attached patch enables the comparison of Numeric values without
executing conversion to NumericVar. The execution time of that SQL
becomes half.
o Test SQL (index_test table has 1,000,000 tuples)
create index index_test_idx on index_test(num_col);
o Test results (executed the test five times)
(1)PentiumIII
original: 39.789s 36.823s 36.737s 37.752s 37.019s
patched : 18.560s 19.103s 18.830s 18.408s 18.853s
4.07 36.30 2.38 71101189 0.00 0.00 AllocSetFree
3.83 38.53 2.23 69084012 0.00 0.00 free_var
The create index command executes many comparisons of Numeric values.
Functions other than comparetup_index spent a lot of cycles for
conversion from Numeric to NumericVar.
An attached patch enables the comparison of Numeric values without
executing conversion to NumericVar. The execution time of that SQL
becomes half.
o Test SQL (index_test table has 1,000,000 tuples)
create index index_test_idx on index_test(num_col);
o Test results (executed the test five times)
(1)PentiumIII
original: 39.789s 36.823s 36.737s 37.752s 37.019s
patched : 18.560s 19.103s 18.830s 18.408s 18.853s
(2)Pentium4
original: 16.349s 14.997s 12.979s 13.169s 12.955s
patched : 7.005s 6.594s 6.770s 6.740s 6.828s
(3)Itanium2
original: 15.392s 15.447s 15.350s 15.370s 15.417s
patched : 7.413s 7.330s 7.334s 7.339s 7.339s
(4)Ultra Sparc
original: 64.435s 59.336s 59.332s 58.455s 59.781s
patched : 28.630s 28.666s 28.983s 28.744s 28.595s
Atsushi Ogawa
While we normally prefer the notation "foo.*" for a whole-row Var, that does
not work at SELECT top level, because in that context the parser will assume
that what is wanted is to expand the "*" into a list of separate target
columns, yielding behavior different from a whole-row Var. We have to emit
just "foo" instead in that context. Per report from Sokolov Yura.
and rely exclusively on the SQL type system to tell the difference between
the types. Prevent creation of invalid CIDR values via casting from INET
or set_masklen() --- both of these operations now silently zero any bits
to the right of the netmask. Remove duplicate CIDR comparison operators,
letting the type rely on the INET operators instead.