#endif
/**

@page secConcepts MySQL Cluster Concepts

The <em>NDB Kernel</em> is the collection of storage nodes
belonging to a MySQL Cluster.
The application programmer can for most purposes view the
set of all storage nodes as a single entity.
Each storage node is made up of three main components:
- TC : The transaction co-ordinator
- ACC : Index storage component
- TUP : Data storage component

When an application program executes a transaction,
it connects to one transaction co-ordinator on one storage node.
Usually, the programmer does not need to specify which TC should be used,
but in some cases when performance is important, the programmer can
provide "hints" to use a certain TC.
(If the node with the desired transaction co-ordinator is down, then another TC will
automatically take over the work.)

Every storage node has an ACC and a TUP which store
the indexes and data portions of the database table fragment.
Even though one TC is responsible for the transaction,
several ACCs and TUPs on other storage nodes might be involved in the
execution of the transaction.
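
In practice, an application reaches the NDB Kernel by creating an
Ndb_cluster_connection and then one or more Ndb objects on top of it. The
following is only a minimal sketch: the connectstring, the database name, and the
exact connect()/wait_until_ready() parameters are illustrative and should be
checked against the NDB API version in use.

@code
  Ndb_cluster_connection cluster_connection("ndb_mgmd_host:1186"); // assumed connectstring
  if (cluster_connection.connect(4, 5, 1) != 0) {   // retries, delay (s), verbose
    // handle error: management server could not be contacted
  }
  if (cluster_connection.wait_until_ready(30, 0) < 0) {
    // handle error: storage nodes did not become ready in time
  }

  Ndb ndb(&cluster_connection, "my_database");      // illustrative database name
  if (ndb.init() == -1) {
    // handle error: see ndb.getNdbError() for details
  }
@endcode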

@section secNdbKernelConnection Selecting a Transaction Co-ordinator

The default method is to select the transaction co-ordinator (TC) determined to be
the "closest" storage node, using a heuristic for proximity based on
the type of transporter connection. In order of closest to most distant, these are:
- SCI
- SHM
- TCP/IP (localhost)
- TCP/IP (remote host)

If there are several connections available with the same proximity, they will each be
selected in a round-robin fashion for every transaction. Optionally,
one may set the method for TC selection to round-robin mode, where each new set of
transactions is placed on the next storage node. The pool of connections from which this
selection is made consists of all available connections.

As noted previously, the application programmer can provide hints to the NDB API as to
which transaction co-ordinator it should use. This is done by
providing a <em>partition key</em> (usually the primary key).
By using the primary key as the partition key,
the transaction will be placed on the node where the primary replica
of that record resides.
Note that this is only a hint; the system can be
reconfigured at any time, in which case the NDB API will choose a transaction
co-ordinator without using the hint.
For more information, see NdbDictionary::Column::getPartitionKey() and
Ndb::startTransaction(). The application programmer can specify
the partition key from SQL by using the construct
<code>CREATE TABLE ... ENGINE=NDB PARTITION BY KEY (<var>attribute-list</var>);</code>.
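
The sketch below illustrates what such a hint can look like in application code.
It assumes an initialized Ndb object named <code>ndb</code>; the table name and the
single-int partition key layout are illustrative, and the exact startTransaction()
overload taking a table and key data should be checked against the NDB API
version in use.

@code
  const NdbDictionary::Dictionary *dict  = ndb.getDictionary();
  const NdbDictionary::Table      *table = dict->getTable("my_table"); // assumed table name

  int key_value = 42;                              // assumed single-int partition key
  NdbTransaction *trans =
    ndb.startTransaction(table, (const char*) &key_value, sizeof(key_value));
  // If the hint can be honoured, the selected TC is the one on the node that
  // holds the primary replica of the fragment to which key_value maps.
@endcode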

@section secRecordStruct NDB Record Structure

The NDB Cluster engine used by MySQL Cluster is a relational database engine
storing records in tables just as with any other RDBMS.
Table rows represent records as tuples of relational data.
When a new table is created, its attribute schema is specified for the table as a whole,
and thus each record of the table has the same structure. Again, this is typical
of relational databases, and NDB is no different in this regard.

@subsection secKeys Primary Keys
@section secTrans Transactions

Transactions are committed first to main memory,
and then to disk after a global checkpoint (GCP) is issued.
Since all data is (in most NDB Cluster configurations)
synchronously replicated and stored on multiple NDB nodes,
the system can still handle processor failures without loss
of data.
However, in the case of a system failure (e.g. the whole system goes down),
all transactions (committed or not) occurring since the latest GCP are lost.

@subsection secConcur Concurrency Control
cannot be attained within a specified time,
then a timeout error occurs.

Concurrent transactions, as requested by parallel application programs and
thread-based applications, can sometimes deadlock when they try to access
the same information simultaneously.
Thus, applications need to be written so that timeout errors
occurring due to such deadlocks are handled gracefully. This generally
means that the transaction encountering a timeout should be rolled back
and restarted.
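
The retry pattern this implies is sketched below. It is only an outline under
assumptions: an initialized Ndb object named <code>ndb</code> is available, the
operations defined on the transaction are elided, the retry budget is
illustrative, and the exact execute() signature and error classification should
be checked against the NDB API version in use.

@code
  int retries = 10;                                // illustrative retry budget
  while (retries-- > 0)
  {
    NdbTransaction *trans = ndb.startTransaction();
    if (trans == NULL)
      break;                                       // could not start a transaction

    // ... define read/update operations on trans here ...

    if (trans->execute(NdbTransaction::Commit) == 0)
    {
      ndb.closeTransaction(trans);
      break;                                       // committed successfully
    }

    NdbError err = trans->getNdbError();
    ndb.closeTransaction(trans);                   // abort and release the transaction
    if (err.status != NdbError::TemporaryError)
      break;                                       // permanent error: do not retry
    // temporary error such as a lock wait timeout: loop and retry the transaction
  }
@endcode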

@section secHint Hints and Performance

Placing the transaction co-ordinator in close proximity
to the actual data used in the transaction can in many cases
improve performance significantly. This is particularly true for
systems using TCP/IP. For example, a Solaris system using a single 500 MHz processor
has a cost model for TCP/IP communication which can be represented by the formula

<code>[30 microseconds] + ([100 nanoseconds] * [<var>number of bytes</var>])</code>

This means that if we can ensure that we use "popular" links, we increase
buffering and thus drastically reduce the communication cost.
The same system using SCI has a different cost model:

<code>[5 microseconds] + ([10 nanoseconds] * [<var>number of bytes</var>])</code>
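
As an illustrative calculation using these two models: sending a 100-byte signal
costs roughly 30 microseconds + (100 nanoseconds * 100) = 40 microseconds over
TCP/IP, but only about 5 microseconds + (10 nanoseconds * 100) = 6 microseconds
over SCI.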

Thus, the efficiency of an SCI system is much less dependent on selection of
transaction co-ordinators.
Typically, TCP/IP systems spend 30-60% of their working time on communication,
whereas for SCI systems this figure is closer to 5-10%.
Thus, employing SCI for data transport means that less care from the NDB API
programmer is required and greater scalability can be achieved, even for
applications using data from many different parts of the database.

A simple example is an application that uses many simple updates where
a transaction needs to update one record.
i.e. get/setValue("kalle[3]");
@subsection secArrays Array Attributes
A table attribute in NDB Cluster can be of type <var>Array</var>,
meaning that the attribute consists of an ordered sequence of
elements. In such cases, <var>attribute size</var> is the size
(expressed in bits) of any one element making up the array; the
<var>array size</var> is the number of elements in the array.
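For instance (the exact mapping depends on the column type), a column declared as
<code>CHAR(20)</code> can be viewed as an array attribute with an attribute size
of 8 bits and an array size of 20.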
*/
/**
* The Ndb object represents a connection to a database.
*
* @note The init() method must be called before the Ndb object may actually be used.
*
* @param ndb_cluster_connection is a connection to the cluster containing
* the database to be used
* @param aCatalogName is the name of the catalog to be used.
* @note The catalog name provides a namespace for the tables and
* indexes created in any connection from the Ndb object.
* @param aSchemaName is the name of the schema to be used.
* @note The schema name provides an additional namespace
* for the tables and indexes created in a given catalog.
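*
* A minimal usage sketch (assuming an already connected Ndb_cluster_connection
* named cluster_connection; the database name is illustrative):
* @code
*   Ndb ndb(&cluster_connection, "my_database");
*   if (ndb.init() == -1) {
*     // handle initialization failure, e.g. by inspecting ndb.getNdbError()
*   }
* @endcode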
*/
Ndb(Ndb_cluster_connection *ndb_cluster_connection,