var_samp(), stddev_pop(), and stddev_samp(). var_samp() and stddev_samp()
are just renamings of the historical Postgres aggregates variance() and
stddev() -- the latter names have been kept for backward compatibility.
This patch includes updates for the documentation and regression tests.
The catversion has been bumped.
NB: SQL2003 requires that DISTINCT not be specified for any of these
aggregates. Per discussion on -patches, I have NOT implemented this
restriction: if the user asks for stddev(DISTINCT x), presumably they
know what they are doing.
the format on Tuple(Numeric) and the format to calculate(NumericVar)
are different. I understood that to reduce I/O. However, when many
comparisons or calculations of NUMERIC are executed, the conversion
of Numeric and NumericVar becomes a bottleneck.
It is profile result when "create index on NUMERIC column" is executed:
% cumulative self self total
time seconds seconds calls s/call s/call name
17.61 10.27 10.27 34542006 0.00 0.00 cmp_numerics
11.90 17.21 6.94 34542006 0.00 0.00 comparetup_index
7.42 21.54 4.33 71102587 0.00 0.00 AllocSetAlloc
7.02 25.64 4.09 69084012 0.00 0.00 set_var_from_num
4.87 28.48 2.84 69084012 0.00 0.00 alloc_var
4.79 31.27 2.79 142205745 0.00 0.00 AllocSetFreeIndex
4.55 33.92 2.65 34542004 0.00 0.00 cmp_abs
4.07 36.30 2.38 71101189 0.00 0.00 AllocSetFree
3.83 38.53 2.23 69084012 0.00 0.00 free_var
The create index command executes many comparisons of Numeric values.
Functions other than comparetup_index spent a lot of cycles for
conversion from Numeric to NumericVar.
An attached patch enables the comparison of Numeric values without
executing conversion to NumericVar. The execution time of that SQL
becomes half.
o Test SQL (index_test table has 1,000,000 tuples)
create index index_test_idx on index_test(num_col);
o Test results (executed the test five times)
(1)PentiumIII
original: 39.789s 36.823s 36.737s 37.752s 37.019s
patched : 18.560s 19.103s 18.830s 18.408s 18.853s
4.07 36.30 2.38 71101189 0.00 0.00 AllocSetFree
3.83 38.53 2.23 69084012 0.00 0.00 free_var
The create index command executes many comparisons of Numeric values.
Functions other than comparetup_index spent a lot of cycles for
conversion from Numeric to NumericVar.
An attached patch enables the comparison of Numeric values without
executing conversion to NumericVar. The execution time of that SQL
becomes half.
o Test SQL (index_test table has 1,000,000 tuples)
create index index_test_idx on index_test(num_col);
o Test results (executed the test five times)
(1)PentiumIII
original: 39.789s 36.823s 36.737s 37.752s 37.019s
patched : 18.560s 19.103s 18.830s 18.408s 18.853s
(2)Pentium4
original: 16.349s 14.997s 12.979s 13.169s 12.955s
patched : 7.005s 6.594s 6.770s 6.740s 6.828s
(3)Itanium2
original: 15.392s 15.447s 15.350s 15.370s 15.417s
patched : 7.413s 7.330s 7.334s 7.339s 7.339s
(4)Ultra Sparc
original: 64.435s 59.336s 59.332s 58.455s 59.781s
patched : 28.630s 28.666s 28.983s 28.744s 28.595s
Atsushi Ogawa
comment line where output as too long, and update typedefs for /lib
directory. Also fix case where identifiers were used as variable names
in the backend, but as typedefs in ecpg (favor the backend for
indenting).
Backpatch to 8.1.X.
functionality, but I still need to make another pass looking at places
that incidentally use arrays (such as ACL manipulation) to make sure they
are null-safe. Contrib needs work too.
I have not changed the behaviors that are still under discussion about
array comparison and what to do with lower bounds.
optional arguments as text input functions, ie, typioparam OID and
atttypmod. Make all the datatypes that use typmod enforce it the same
way in typreceive as they do in typinput. This fixes a problem with
failure to enforce length restrictions during COPY FROM BINARY.
functions. This patch optimizes int2_sum(), int4_sum(), float4_accum()
and float8_accum() to avoid needing to copy the transition function's
state for each input tuple of the aggregate. In an extreme case
(e.g. SELECT sum(int2_col) FROM table where table has a single column),
it improves performance by about 20%. For more complex queries or tables
with wider rows, the relative performance improvement will not be as
significant.
performance hack Tom introduced recently. This means we can avoid
copying the transition array for each input tuple if these functions
are invoked as aggregate transition functions.
To test the performance improvement, I created a 1 million row table
with a single int4 column. Without the patch, SELECT avg(col) FROM
table took about 4.2 seconds (after the data was cached); with the
patch, it took about 3.2 seconds. Naturally, the performance
improvement for a less trivial query (or a table with wider rows)
would be relatively smaller.
bigint variants). Clean up some inconsistencies in error message wording.
Fix scanint8 to allow trailing whitespace in INT64_MIN case. Update
int8-exp-three-digits.out, which seems to have been ignored by the last
couple of people to modify the int8 regression test, and remove
int8-exp-three-digits-win32.out which is thereby exposed as redundant.
a variant of the function for the 'numeric' datatype; it would be possible
to add additional variants for other datatypes, but I haven't done so yet.
This commit includes regression tests and minimal documentation; if we
want developers to actually use this function in applications, we'll
probably need to document what it does more fully.
conversion of basic ASCII letters. Remove all uses of strcasecmp and
strncasecmp in favor of new functions pg_strcasecmp and pg_strncasecmp;
remove most but not all direct uses of toupper and tolower in favor of
pg_toupper and pg_tolower. These functions use the same notions of
case folding already developed for identifier case conversion. I left
the straight locale-based folding in place for situations where we are
just manipulating user data and not trying to match it to built-in
strings --- for example, the SQL upper() function is still locale
dependent. Perhaps this will prove not to be what's wanted, but at
the moment we can initdb and pass regression tests in Turkish locale.
is asked to assign a variable to itself, it will result in doing a
memcpy() on an entirely-overlapping memory range, which results in
undefined behavior according to ANSI C. That said, it is unlikely to
actually do anything bad on any sane libc, but this keeps valgrind quiet.
to note:
1) arttype is numeric. I thought this was the best way of allowing
arbitarily large factorials, even though factorial(2^63) is a large
number. Happy to change to integers if this is overkill.
2) since we're accepting numeric arguments, the patch tests for floats.
If a numeric is passed with non-zero decimal portion, an error is raised
since (from memory) they are undefined.
Gavin Sherry
float8smaller, float8larger (and thereby the MIN/MAX aggregates on these
datatypes) to agree with the datatypes' comparison operations as
regards NaN handling. In all these datatypes, NaN is arbitrarily
considered larger than any normal value ... but MIN/MAX had not gotten
the word. Per recent discussion on pgsql-sql.
some of the algorithms for higher functions. I see about a factor of ten
speedup on the 'numeric' regression test, but it's unlikely that that test
is representative of real-world applications.
initdb forced due to change of on-disk representation for NUMERIC.
division and modulo functions, to avoid problems on OS X (which fails to
trap 0 divide at all) and Windows (which traps it in some bizarre
nonstandard fashion). Standardize on 'division by zero' as the one true
spelling of this error message. Add regression tests as suggested by
Neil Conway.
specifically ceil(), floor(), and sign(). There may be other functions
that need to be added, but this is a start. I've included some simple
regression tests.
Neil Conway
so that precision of result is always at least as good as you'd get from
float8 arithmetic (ie, always at least 16 digits of accuracy). Per
pg_hackers discussion a few days ago.
to be flexible about assignment casts without introducing ambiguity in
operator/function resolution. Introduce a well-defined promotion hierarchy
for numeric datatypes (int2->int4->int8->numeric->float4->float8).
Change make_const to initially label numeric literals as int4, int8, or
numeric (never float8 anymore).
Explicitly mark Func and RelabelType nodes to indicate whether they came
from a function call, explicit cast, or implicit cast; use this to do
reverse-listing more accurately and without so many heuristics.
Explicit casts to char, varchar, bit, varbit will truncate or pad without
raising an error (the pre-7.2 behavior), while assigning to a column without
any explicit cast will still raise an error for wrong-length data like 7.3.
This more nearly follows the SQL spec than 7.2 behavior (we should be
reporting a 'completion condition' in the explicit-cast cases, but we have
no mechanism for that, so just do silent truncation).
Fix some problems with enforcement of typmod for array elements;
it didn't work at all in 'UPDATE ... SET array[n] = foo', for example.
Provide a generalized array_length_coerce() function to replace the
specialized per-array-type functions that used to be needed (and were
missing for NUMERIC as well as all the datetime types).
Add missing conversions int8<->float4, text<->numeric, oid<->int8.
initdb forced.
array header, and to compute sizing and alignment of array elements
the same way normal tuple access operations do --- viz, using the
tupmacs.h macros att_addlength and att_align. This makes the world
safe for arrays of cstrings or intervals, and should make it much
easier to write array-type-polymorphic functions; as examples see
the cleanups of array_out and contrib/array_iterator. By Joe Conway
and Tom Lane.