Subject: [PATCHES] Re: [PORTS] AIX 6.1 fixes...
Here are the patches for the two things that wouldn't make it thru the AIX
compiler. The geo_ops.c change is harmless I believe. The nbtcompare.c patch
fixes me, but I don't know about any other ports. Maybe wait on that one
until Vadim decides what to do about the unsigned vs signed chars varlena
issue.
when btree used in innerscan with run-time key which value
passed by pointer.
Fix: keys ordering stuff moved to _bt_first().
Pointed by Thomas Lockhart.
OK, here are a passel of patches for the geometric data types.
These add a "circle" data type, new operators and functions
for the existing data types, and change the default formats
for some of the existing types to make them consistant with
each other. Current formatting conventions (e.g. compatible
with v6.0 to allow dump/reload) are supported, but the new
conventions should be an improvement and we can eventually
drop the old conventions entirely.
For example, there are two kinds of paths (connected line segments),
open and closed, and the old format was
'(1,2,1,2,3,4)' for a closed path with two points (1,2) and (3,4)
'(0,2,1,2,3,4)' for an open path with two points (1,2) and (3,4)
Pretty arcane, huh? The new format for paths is
'((1,2),(3,4))' for a closed path with two points (1,2) and (3,4)
'[(1,2),(3,4)]' for an open path with two points (1,2) and (3,4)
For polygons, the old convention is
'(0,4,2,0,4,3)' for a triangle with points at (0,0),(4,4), and (2,3)
and the new convention is
'((0,0),(4,4),(2,3))' for a triangle with points at (0,0),(4,4), and (2,3)
Other data types which are also represented as lists of points
(e.g. boxes, line segments, and polygons) have similar representations
(they surround each point with parens).
For v6.1, any format which can be interpreted as the old style format
is decoded as such; we can remove that backwards compatibility but ugly
convention for v7.0. This will allow dump/reloads from v6.0.
These include some updates to the regression test files to change the test
for creating a data type from "circle" to "widget" to keep the test from
trashing the new builtin circle type.
index tuple (logical position within A LEVEL). bti_oid & bti_dummy
taken off from BTItemData.
2. Fix for multi-column indices (nbtsearch.c):
_bt_binsrch() - for searches on internal pages having keysize <
number of attrs we point at the last item < the scankey, not at the
first item = the scankey;
_bt_moveright() - if keysize < number of attrs we compare scankey with
_last_ item on current page to decide should we move right or
not.
Reply-To: hackers@hub.org, Dan McGuirk <mcguirk@indirect.com>
To: hackers@hub.org
Subject: [HACKERS] tmin writeback optimization
I was doing some profiling of the backend, and noticed that during a certain
benchmark I was running somewhere between 30% and 75% of the backend's CPU
time was being spent in calls to TransactionIdDidCommit() from
HeapTupleSatisfiesNow() or HeapTupleSatisfiesItself() to determine that
changed rows' transactions had in fact been committed even though the rows'
tmin values had not yet been set.
When a query looks at a given row, it needs to figure out whether the
transaction that changed the row has been committed and hence it should pay
attention to the row, or whether on the other hand the transaction is still
in progress or has been aborted and hence the row should be ignored. If
a tmin value is set, it is known definitively that the row's transaction
has been committed. However, if tmin is not set, the transaction
referred to in xmin must be looked up in pg_log, and this is what the
backend was spending a lot of time doing during my benchmark.
So, implementing a method suggested by Vadim, I created the following
patch that, the first time a query finds a committed row whose tmin value
is not set, sets it, and marks the buffer where the row is stored as
dirty. (It works for tmax, too.) This doesn't result in the boost in
real time performance I was hoping for, however it does decrease backend
CPU usage by up to two-thirds in certain situations, so it could be
rather beneficial in high-concurrency settings.
Actually required by multi-column indices support.
We still don't use btree for 'A is (not) null', but
now btree keep items with NULL attrs using single rule
for placing/finding items on pages:
NULLs greater NOT_NULLs and NULL = NULL.
+ Bulkload code (nbtsort.c) support for multi-column indices
building and NULLs.
+ Fix for btendscan()->pfree(scanopaque) from Chris Dunlop.
Subject: [HACKERS] linux/alpha patches
These patches lay the groundwork for a Linux/Alpha port. The port doesn't
actually work unless you tweak the linker to put all the pointers in the
first 32 bits of the address space, but it's at least a start. It
implements the test-and-set instruction in Alpha assembly, and also fixes
a lot of pointer-to-integer conversions, which is probably good anyway.
Subject: [HACKERS] abort failed transaction patch
This patch allows you to end a transaction that has failed on an error
using the 'ABORT' statement without generating another error message.
(By default you get an error unless you use 'END' to terminate the
transaction, which has already been aborted anyway.)
Patches from: aoki@CS.Berkeley.EDU (Paul M. Aoki)
i gave jolly my btree bulkload code a long, long time ago but never
gave him a bunch of my bugfixes. here's a diff against the 6.0
baseline.
for some reason, this code has slowed down somewhat relative to the
insertion-build code on very small tables. don't know why -- it used
to be within about 10%. anyway, here are some (highly unscientific!)
timings on a dec 3000/300 for synthetic tables with 10k, 100k and
1000k tuples (basically, 1mb, 10mb and 100mb heaps). 'c' means
clustered (pre-sorted) inputs and 'u' means unclustered (randomly
ordered) inputs. the 10k table basically fits in the buffer pool, but
the 100k and 1000k tables don't. as you can see, insertion build is
fine if you've sorted your heaps on your index key or if your heap
fits in core, but is absolutely horrible on unordered data (yes,
that's 7.5 hours to index 100mb of data...) because of the zillions of
random i/os.
if it doesn't work for you for whatever reason, you can always turn it
back off by flipping the FastBuild flag in nbtree.c. i don't have
time to maintain it.
good luck!
baseline code:
time psql -c 'create index c10 on k10 using btree (c int4_ops)' bttest
real 8.6
time psql -c 'create index u10 on k10 using btree (b int4_ops)' bttest
real 9.1
time psql -c 'create index c100 on k100 using btree (c int4_ops)' bttest
real 59.2
time psql -c 'create index u100 on k100 using btree (b int4_ops)' bttest
real 652.4
time psql -c 'create index c1000 on k1000 using btree (c int4_ops)' bttest
real 636.1
time psql -c 'create index u1000 on k1000 using btree (b int4_ops)' bttest
real 26772.9
bulkloading code:
time psql -c 'create index c10 on k10 using btree (c int4_ops)' bttest
real 11.3
time psql -c 'create index u10 on k10 using btree (b int4_ops)' bttest
real 10.4
time psql -c 'create index c100 on k100 using btree (c int4_ops)' bttest
real 59.5
time psql -c 'create index u100 on k100 using btree (b int4_ops)' bttest
real 63.5
time psql -c 'create index c1000 on k1000 using btree (c int4_ops)' bttest
real 636.9
time psql -c 'create index u1000 on k1000 using btree (b int4_ops)' bttest
real 701.0
As an example I sent a bug-report on 26 Nov to tell that the fix included
below is necessary to compile pg95-current on Ultrix with Digital's
standard C compiler c89. In fact I think that this fix is needed
for any C compiler sticking very close the standard, see my discussion
in the original bug report.
Erik Bertelsen
cc1: warnings being treated as errors
transsup.c: In function `TransBlockGetLastTransactionIdStatus':
transsup.c:122: warning: unsigned value >= 0 is always 1
gmake[3]: *** [transsup.o] Error 1
...
(old _bt_compare always returned >= 0 while comparing with P_HIKEY
on root page - it breaks root page when _bt_insertonpg tries insert
new minimal key into root page).
2. Fixed bug concerns "empty" pages: non-rightmost pages with only P_HIKEY
present on it. Such pages appear after vacuum.