diff --git a/doc/src/sgml/acronyms.sgml b/doc/src/sgml/acronyms.sgml index 751c46de6d4..638ffc9fe83 100644 --- a/doc/src/sgml/acronyms.sgml +++ b/doc/src/sgml/acronyms.sgml @@ -369,6 +369,16 @@ + + JIT + + + Just-in-Time + compilation + + + + JSON diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml index 4d899e3b244..dc9ed22eb41 100644 --- a/doc/src/sgml/config.sgml +++ b/doc/src/sgml/config.sgml @@ -4136,6 +4136,62 @@ ANY num_sync ( + jit_above_cost (floating point) + + jit_above_cost configuration parameter + + + + + Sets the planner's cutoff above which JIT compilation is used as part + of query execution (see ). Performing + JIT costs time but can accelerate query execution. + + The default is 100000. + + + + + + jit_optimize_above_cost (floating point) + + jit_optimize_above_cost configuration parameter + + + + + Sets the planner's cutoff above which JIT compiled programs (see ) are optimized. Optimization initially + takes time, but can improve execution speed. It is not meaningful to + set this to a lower value than . + + The default is 500000. + + + + + + jit_inline_above_cost (floating point) + + jit_inline_above_cost configuration parameter + + + + + Sets the planner's cutoff above which JIT compiled programs (see ) attempt to inline functions and + operators. Inlining initially takes time, but can improve execution + speed. It is unlikely to be beneficial to set + jit_inline_above_cost below + jit_optimize_above_cost. + + The default is 500000. + + + + @@ -4418,6 +4474,23 @@ SELECT * FROM parent WHERE key = 2400; + + jit (boolean) + + jit configuration parameter + + + + + Determines whether JIT may be used by + PostgreSQL, if available (see ). + + The default is on. + + + + join_collapse_limit (integer) @@ -7412,6 +7485,29 @@ SET XML OPTION { DOCUMENT | CONTENT }; + + + jit_provider (string) + + jit_provider configuration parameter + + + + + Determines which JIT provider (see ) is + used. The built-in default is llvmjit. 
+ + + If set to a non-existent library, JIT will not be + available, but no error will be raised. This allows JIT support to be + installed separately from the main + PostgreSQL package. + + This parameter can only be set at server start. + + + + @@ -8658,7 +8754,92 @@ LOG: CleanUpLock: deleting: lock(0xb7acd844) id(24688,24696,0,0,0,1) - + + + jit_debugging_support (boolean) + + jit_debugging_support configuration parameter + + + + + If LLVM has the required functionality, register generated functions + with GDB. This makes debugging easier. + + The default setting is off, and it can only be set at + server start. + + + + + + jit_dump_bitcode (boolean) + + jit_dump_bitcode configuration parameter + + + + + Writes the generated LLVM IR out to the + filesystem, inside . This is only + useful for working on the internals of the JIT implementation. + + The default setting is off, and it can only be + changed by a superuser. + + + + + + jit_expressions (boolean) + + jit_expressions configuration parameter + + + + + Determines whether expressions are JIT compiled, subject to costing + decisions (see ). The default is + on. + + + + + + jit_profiling_support (boolean) + + jit_profiling_support configuration parameter + + + + + If LLVM has the required functionality, emit required data to allow + perf to profile functions generated by JIT. + This writes out files to $HOME/.debug/jit/; the + user is responsible for performing cleanup when desired. + + The default setting is off, and it can only be set at + server start. + + + + + + jit_tuple_deforming (boolean) + + jit_tuple_deforming configuration parameter + + + + + Determines whether tuple deforming is JIT compiled, subject to costing + decisions (see ). The default is + on. 
+ + + + + Short Options diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml index 732b8ab7d0b..56b8da04488 100644 --- a/doc/src/sgml/filelist.sgml +++ b/doc/src/sgml/filelist.sgml @@ -48,6 +48,7 @@ + diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index 7b1a85fc717..9d1772f349a 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -15942,6 +15942,14 @@ SELECT * FROM pg_ls_dir('.') WITH ORDINALITY AS t(ls,n); is schema another session's temporary schema? + + pg_jit_available() + boolean + is JIT available in this session (see )? Returns false if is set to false. + + pg_listening_channels() setof text diff --git a/doc/src/sgml/installation.sgml b/doc/src/sgml/installation.sgml index 2d24153bdcc..30921cf4868 100644 --- a/doc/src/sgml/installation.sgml +++ b/doc/src/sgml/installation.sgml @@ -758,6 +758,39 @@ su - postgres + + + + + Build with support for LLVM based + JIT compilation (see ). This + requires the LLVM library to be installed. + The minimum required version of LLVM is + currently 3.9. + + + llvm-config + will be used to find the required compilation options. + llvm-config, and then + llvm-config-$major-$minor for all supported + versions, will be searched for on PATH. If that would not + yield the correct binary, use LLVM_CONFIG to specify a + path to the correct llvm-config. For example + +./configure ... --with-llvm LLVM_CONFIG='/path/to/llvm/bin/llvm-config' + + + + + LLVM support requires a compatible + clang compiler (specified, if necessary, using the + CLANG environment variable), and a working C++ + compiler (specified, if necessary, using the CXX + environment variable). + + + + @@ -1342,6 +1375,16 @@ su - postgres + + CLANG + + + path to clang program used to process source code + for inlining when compiling with --with-llvm + + + + CPP @@ -1432,6 +1475,16 @@ su - postgres + + LLVM_CONFIG + + + llvm-config program used to locate the + LLVM installation. 
+ + + + MSGFMT diff --git a/doc/src/sgml/jit.sgml b/doc/src/sgml/jit.sgml new file mode 100644 index 00000000000..f59e4923e14 --- /dev/null +++ b/doc/src/sgml/jit.sgml @@ -0,0 +1,299 @@ + + + + Just-in-Time Compilation (<acronym>JIT</acronym>) + + + JIT + + + + Just-In-Time compilation + JIT + + + + This chapter explains what just-in-time compilation is, and how it can be + configured in PostgreSQL. + + + + What is <acronym>JIT</acronym>? + + + Just-in-time compilation (JIT) is the process of turning + some form of interpreted program evaluation into a native program, and + doing so at runtime. + + For example, instead of using a facility that can evaluate arbitrary SQL + expressions to evaluate an SQL predicate like WHERE a.col = + 3, it is possible to generate a function that can be natively + executed by the CPU that just handles that expression, yielding a speedup. + + + + PostgreSQL has builtin support to perform + JIT using LLVM when + PostgreSQL is built with + --with-llvm (see ). + + + + See src/backend/jit/README for further details. + + + + <acronym>JIT</acronym> Accelerated Operations + + Currently PostgreSQL's JIT + implementation has support for accelerating expression evaluation and + tuple deforming. Several other operations could be accelerated in the + future. + + + Expression evaluation is used to evaluate WHERE + clauses, target lists, aggregates and projections. It can be accelerated + by generating code specific to each case. + + + Tuple deforming is the process of transforming an on-disk tuple (see ) into its in-memory representation. It can be + accelerated by creating a function specific to the table layout and the + number of columns to be extracted. + + + + + Optimization + + LLVM has support for optimizing generated + code. Some of the optimizations are cheap enough to be performed whenever + JIT is used, while others are only beneficial for + longer running queries. + + See for + more details about optimizations. 
+ + + + + Inlining + + PostgreSQL is very extensible and allows new + datatypes, functions, operators and other database objects to be defined; + see . In fact the built-in ones are implemented + using nearly the same mechanisms. This extensibility implies some + overhead, for example due to function calls (see ). + To reduce that overhead JIT compilation can inline the + bodies of small functions into the expressions using them. That allows a + significant percentage of the overhead to be optimized away. + + + + + + + When to <acronym>JIT</acronym>? + + + JIT is beneficial primarily for long-running CPU bound + queries. Frequently these will be analytical queries. For short queries + the overhead of performing JIT will often be higher than + the time it can save. + + + + To determine whether JIT is used, the total cost of a + query (see and ) is used. + + + + The cost of the query will be compared with GUC. If the cost is higher, + JIT compilation will be performed. + + + + If the planner, based on the above criterion, decided that + JIT is beneficial, two further decisions are + made. Firstly, if the query is more costly than the GUC, expensive optimizations are + used to improve the generated code. Secondly, if the query is more costly + than the GUC, short functions + and operators used in the query will be inlined. Both of these operations + increase the JIT overhead, but can reduce query + execution time considerably. + + + + This cost based decision will be made at plan time, not execution + time. This means that when prepared statements are in use, and the generic + plan is used (see ), the values of the + GUCs set at prepare time take effect, not the settings at execution time. + + + + + If is set to off, or no + JIT implementation is available (for example because + the server was compiled without --with-llvm), + JIT will not be performed, even if considered to be + beneficial based on the above criteria. 
Setting + to off takes effect both at plan and at execution time. + + + + + can be used to see whether + JIT is used or not. As an example, here is a query that + is not using JIT: + +=# EXPLAIN ANALYZE SELECT SUM(relpages) FROM pg_class; +┌─────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ +│ QUERY PLAN │ +├─────────────────────────────────────────────────────────────────────────────────────────────────────────────┤ +│ Aggregate (cost=16.27..16.29 rows=1 width=8) (actual time=0.303..0.303 rows=1 loops=1) │ +│ -> Seq Scan on pg_class (cost=0.00..15.42 rows=342 width=4) (actual time=0.017..0.111 rows=356 loops=1) │ +│ Planning Time: 0.116 ms │ +│ Execution Time: 0.365 ms │ +└─────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ +(4 rows) + + Given the cost of the plan, it is entirely reasonable that no + JIT was used, the cost of JIT would + have been bigger than the savings. 
Adjusting the cost limits will lead to + JIT use: + +=# SET jit_above_cost = 10; +SET +=# EXPLAIN ANALYZE SELECT SUM(relpages) FROM pg_class; +┌─────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ +│ QUERY PLAN │ +├─────────────────────────────────────────────────────────────────────────────────────────────────────────────┤ +│ Aggregate (cost=16.27..16.29 rows=1 width=8) (actual time=6.049..6.049 rows=1 loops=1) │ +│ -> Seq Scan on pg_class (cost=0.00..15.42 rows=342 width=4) (actual time=0.019..0.052 rows=356 loops=1) │ +│ Planning Time: 0.133 ms │ +│ JIT: │ +│ Functions: 3 │ +│ Generation Time: 1.259 ms │ +│ Inlining: false │ +│ Inlining Time: 0.000 ms │ +│ Optimization: false │ +│ Optimization Time: 0.797 ms │ +│ Emission Time: 5.048 ms │ +│ Execution Time: 7.416 ms │ +└─────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ + + As visible here, JIT was used, but inlining and + optimization were not. If , + were lowered, just like , that would change. + + + + + Configuration + + + determines whether JIT is + enabled or disabled. + + + + As explained in the configuration variables + , , decide whether JIT + compilation is performed for a query, and how much effort is spent doing + so. + + + + For development and debugging purposes a few additional GUCs exist. allows the generated bitcode to be + inspected. allows GDB to see + generated functions. emits + information so the perf profiler can interpret + JIT generated functions sensibly. + + + + determines which JIT + implementation is used. It rarely is required to be changed. See . + + + + + Extensibility + + + Inlining Support for Extensions + + PostgreSQL's JIT + implementation can inline the implementation of operators and functions + (of type C and internal). See . To do so for functions in extensions, the + definition of these functions needs to be made available. 
When using PGXS to build an extension against a server + that has been compiled with LLVM support, the relevant files will be + installed automatically. + + + + The relevant files have to be installed into + $pkglibdir/bitcode/$extension/ and a summary of them + to $pkglibdir/bitcode/$extension.index.bc, where + $pkglibdir is the directory returned by + pg_config --pkglibdir and $extension + the basename of the extension's shared library. + + + + For functions built into PostgreSQL itself, + the bitcode is installed into + $pkglibdir/bitcode/postgres. + + + + + + + Pluggable <acronym>JIT</acronym> Provider + + + PostgreSQL provides a JIT + implementation based on LLVM. The interface to + the JIT provider is pluggable and the provider can be + changed without recompiling. The provider is chosen via the GUC. + + + + <acronym>JIT</acronym> Provider Interface + + A JIT provider is loaded by dynamically loading the + named shared library. The normal library search path is used to locate + the library. To provide the required JIT provider + callbacks and to indicate that the library is actually a + JIT provider it needs to provide a function named + _PG_jit_provider_init. This function is passed a + struct that needs to be filled with the callback function pointers for + individual actions. 
+ +struct JitProviderCallbacks +{ + JitProviderResetAfterErrorCB reset_after_error; + JitProviderReleaseContextCB release_context; + JitProviderCompileExprCB compile_expr; +}; +extern void _PG_jit_provider_init(JitProviderCallbacks *cb); + + + + + + + diff --git a/doc/src/sgml/postgres.sgml b/doc/src/sgml/postgres.sgml index 054347b17d9..0070603fc36 100644 --- a/doc/src/sgml/postgres.sgml +++ b/doc/src/sgml/postgres.sgml @@ -163,6 +163,7 @@ &diskusage; &wal; &logical-replication; + &jit; &regress; diff --git a/doc/src/sgml/storage.sgml b/doc/src/sgml/storage.sgml index c0e548fa5bc..70a822e0597 100644 --- a/doc/src/sgml/storage.sgml +++ b/doc/src/sgml/storage.sgml @@ -875,7 +875,7 @@ data. Empty in ordinary tables. src/include/storage/bufpage.h. - + Following the page header are item identifiers (ItemIdData), each requiring four bytes. diff --git a/src/backend/jit/README b/src/backend/jit/README new file mode 100644 index 00000000000..b37dcbe0c16 --- /dev/null +++ b/src/backend/jit/README @@ -0,0 +1,289 @@ +What is Just-in-Time Compilation? +================================= + +Just-in-Time compilation (JIT) is the process of turning some form of +interpreted program evaluation into a native program, and doing so at +runtime. + +For example, instead of using a facility that can evaluate arbitrary +SQL expressions to evaluate an SQL predicate like WHERE a.col = 3, it +is possible to generate a function that can be natively executed by +the CPU that just handles that expression, yielding a speedup. + +That this is done at query execution time, possibly even only in cases +where the relevant task is done a number of times, makes it JIT, rather than +ahead-of-time (AOT). Given the way JIT compilation is used in +postgres, the lines between interpretation, AOT and JIT are somewhat +blurry. + +Note that the interpreted program turned into a native program does +not necessarily have to be a program in the classical sense. E.g. 
it +is highly beneficial to JIT compile tuple deforming into a native +function just handling a specific type of table, despite tuple +deforming not commonly being understood as a "program". + + +Why JIT? +======== + +Parts of postgres are commonly bottlenecked by comparatively small +pieces of CPU intensive code. In a number of cases that is because the +relevant code has to be very generic (e.g. handling arbitrary SQL +level expressions, over arbitrary tables, with arbitrary extensions +installed). This often leads to a large number of indirect jumps and +unpredictable branches, and generally a high number of instructions +for a given task. E.g. just evaluating an expression comparing a +column in a database to an integer ends up needing several hundred +cycles. + +By generating native code large numbers of indirect jumps can be +removed by either making them into direct branches (e.g. replacing the +indirect call to an SQL operator's implementation with a direct call +to that function), or by removing them entirely (e.g. by evaluating the +branch at compile time because the input is constant). Similarly a lot +of branches can be entirely removed (e.g. by again evaluating the +branch at compile time because the input is constant). The latter is +particularly beneficial for removing branches during tuple deforming. + + +How to JIT +========== + +Postgres, by default, uses LLVM to perform JIT. LLVM was chosen +because it is developed by several large corporations and therefore +unlikely to be discontinued, because it has a license compatible with +PostgreSQL, and because its LLVM IR can be generated from C +using the clang compiler. + + +Shared Library Separation +------------------------- + +To avoid the main PostgreSQL binary directly depending on LLVM, which +would prevent LLVM support being independently installed by OS package +managers, the LLVM dependent code is located in a shared library that +is loaded on-demand. 
+ +An additional benefit of doing so is that it is relatively easy to +evaluate JIT compilation that does not use LLVM, by changing out the +shared library used to provide JIT compilation. + +To achieve this, code intending to perform JIT (e.g. expression +evaluation) calls an LLVM independent wrapper located in jit.c to do so. If +the shared library providing JIT support can be loaded (i.e. postgres +was compiled with LLVM support and the shared library is installed), +the task of JIT compiling an expression gets handed off to the shared +library. This obviously requires that the function in jit.c is allowed +to fail in case no JIT provider can be loaded. + +Which shared library is loaded is determined by the jit_provider GUC, +defaulting to "llvmjit". + +Cloistering code performing JIT into a shared library unfortunately +also means that code doing JIT compilation for various parts of code +has to be located separately from the code doing so without +JIT. E.g. the JITed version of execExprInterp.c is located in +jit/llvm/ rather than executor/. + + +JIT Context +----------- + +For performance and convenience reasons it is useful to allow JITed +functions to be emitted and deallocated together. It is e.g. very +common to create a number of functions at query initialization time, +use them during query execution, and then deallocate all of them +together at the end of the query. + +Lifetimes of JITed functions are managed via JITContext. Exactly one +such context should be created for work in which all created JITed +functions should have the same lifetime. E.g. there's exactly one +JITContext for each query executed, in the query's EState. Only the +release of a JITContext is exposed to the provider independent +facility, as the creation of one is done on-demand by the JIT +implementations. 
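The two points above (a provider independent wrapper that is allowed to
fail when no provider can be loaded, and a context that the provider
creates on demand while generic code only releases it) can be
illustrated with a small, self-contained C sketch. Everything named
sketch_* / Sketch* below is hypothetical and exists only for this
example; it is not the code in jit.c or in the LLVM provider, and the
"available" argument merely simulates whether a provider library could
be loaded.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a JIT context: owns what was emitted for one query. */
typedef struct SketchJitContext
{
    int         nfunctions;     /* functions defined in this context so far */
} SketchJitContext;

/* Hypothetical stand-in for per-query executor state. */
typedef struct SketchQueryState
{
    SketchJitContext *jit_context;  /* NULL until the provider needs one */
} SketchQueryState;

/* Callbacks a provider fills in (illustrative subset, assumed signatures). */
typedef struct SketchProviderCallbacks
{
    bool        (*compile_expr) (SketchQueryState *query);
    void        (*release_context) (SketchJitContext *context);
} SketchProviderCallbacks;

static SketchProviderCallbacks sketch_provider;
static bool sketch_provider_loaded = false;

/* Provider side: create the context lazily, then record one more function. */
static bool
sketch_provider_compile_expr(SketchQueryState *query)
{
    if (query->jit_context == NULL)
    {
        query->jit_context = calloc(1, sizeof(SketchJitContext));
        if (query->jit_context == NULL)
            return false;
    }
    query->jit_context->nfunctions++;
    return true;
}

/* Provider side: all functions in the context go away together. */
static void
sketch_provider_release_context(SketchJitContext *context)
{
    free(context);
}

/* Simulates loading the provider library; "available" fakes its presence. */
bool
sketch_load_provider(bool available)
{
    sketch_provider_loaded = available;
    if (available)
    {
        sketch_provider.compile_expr = sketch_provider_compile_expr;
        sketch_provider.release_context = sketch_provider_release_context;
    }
    return available;
}

/* Generic wrapper: returning false means "not JITed, keep interpreting". */
bool
sketch_jit_compile_expr(SketchQueryState *query)
{
    if (!sketch_provider_loaded)
        return false;
    return sketch_provider.compile_expr(query);
}

/* Generic code only ever releases a context; it never creates one. */
void
sketch_jit_release_context(SketchQueryState *query)
{
    if (query->jit_context == NULL)
        return;
    sketch_provider.release_context(query->jit_context);
    query->jit_context = NULL;
}
```

A caller in this sketch would try sketch_jit_compile_expr() for each
expression, fall back to interpretation whenever it returns false, and
call sketch_jit_release_context() once at the end of the query.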
+ +Emitting individual functions separately is more expensive than +emitting several functions at once, and emitting them together can +provide additional optimization opportunities. To facilitate that the +LLVM provider separates function definition from emitting them in an +executable way. + +Creating functions into the current mutable module (a module +essentially is LLVM's equivalent of a translation unit in C) is done +using + extern LLVMModuleRef llvm_mutable_module(LLVMJitContext *context); +into which the caller can then emit as much code using the LLVM APIs as it +wants. Whenever a function actually needs to be called + extern void *llvm_get_function(LLVMJitContext *context, const char *funcname); +returns a pointer to it. + +E.g. in the expression evaluation case this setup allows most +functions in a query to be defined during ExecInitNode(), delaying the +function emission until the first time a function is actually +used. + + +Error Handling +-------------- + +There are two aspects to error handling. Firstly, generated (LLVM IR) +and emitted functions (mmap()ed segments) need to be cleaned up both +after a successful query execution and after an error. This is done by +registering each created JITContext with the current resource owner, +and cleaning it up on error / end of transaction. If it is desirable +to release resources earlier, jit_release_context() can be used. + +The second, less pretty, aspect of error handling is OOM handling +inside LLVM itself. The above resowner based mechanism takes care of +cleaning up emitted code upon ERROR, but there's also the chance that +LLVM itself runs out of memory. LLVM by default does *not* use any C++ +exceptions. Its allocations are primarily funneled through the +standard "new" handlers, with some direct use of malloc() and +mmap(). 
For the former a 'new handler' exists +(http://en.cppreference.com/w/cpp/memory/new/set_new_handler); for the +latter LLVM provides callbacks that get called upon failure +(unfortunately mmap() failures are treated as fatal rather than OOM +errors). What we've, for now, chosen to do, is to have two functions +that LLVM using code must use: +extern void llvm_enter_fatal_on_oom(void); +extern void llvm_leave_fatal_on_oom(void); +before interacting with LLVM code. + +When a libstdc++ new or LLVM error occurs, the handlers set up by the +above functions trigger a FATAL error. We have to use FATAL rather +than ERROR, as we *cannot* reliably throw ERROR inside a foreign +library without risking corrupting its internal state. + +Users of the above sections do *not* have to use PG_TRY/CATCH blocks; +the handlers instead are reset at the toplevel sigsetjmp() level. + +Using a relatively small enter/leave protected section of code, rather +than setting up these handlers globally, avoids negative interactions +with extensions that might use C++ like e.g. postgis. As LLVM code +generation should never execute arbitrary code, just setting these +handlers temporarily ought to suffice. + + +Type Synchronization +-------------------- + +To be able to generate code performing tasks that are done in "interpreted" +postgres, it obviously is required that code generation knows about at +least a few postgres types. While it is possible to inform LLVM about +type definitions by recreating them manually in C code, that is failure +prone and labor intensive. + +Instead there is one small file (llvmjit_types.c) which references each of +the types required for JITing. That file is translated to bitcode at +compile time, and loaded when LLVM is initialized in a backend. + +That works very well to synchronize the type definitions; unfortunately +it does *not* synchronize offsets as the IR level representation doesn't +know field names. 
Instead required offsets are maintained as defines in +the original struct definition. E.g. +#define FIELDNO_TUPLETABLESLOT_NVALID 9 + int tts_nvalid; /* # of valid values in tts_values */ +While that still needs to be defined, it's only required for a +relatively small number of fields, and it's bunched together with the +struct definition, so it's easily kept synchronized. + + +Inlining +-------- + +One big advantage of JITing expressions is that it can significantly +reduce the overhead of postgres's extensible function/operator +mechanism, by inlining the body of called functions / operators. + +It obviously is undesirable to maintain a second implementation of +commonly used functions, just for inlining purposes. Instead we take +advantage of the fact that the clang compiler can emit LLVM IR. + +The ability to do so allows us to get the LLVM IR for all operators +(e.g. int8eq, float8pl etc), without maintaining two copies. These +bitcode files get installed into the server's + $pkglibdir/bitcode/postgres/ +Using existing LLVM functionality (for parallel LTO compilation), +an index over these is additionally stored to +$pkglibdir/bitcode/postgres.index.bc + +Similarly extensions can install code into + $pkglibdir/bitcode/[extension]/ +accompanied by + $pkglibdir/bitcode/[extension].index.bc + +just alongside the actual library. An extension's index will be used +to look up symbols when located in the corresponding shared +library. Symbols that are used inside the extension, when inlined, +will be first looked up in the main binary and then the extension's. + + +Caching +------- + +Currently it is not yet possible to cache generated functions, even +though that'd be desirable from a performance point of view. The +problem is that the generated functions commonly contain pointers into +per-execution memory. The expression evaluation functionality needs to +be redesigned a bit to avoid that. 
Basically all per-execution memory +needs to be referenced as an offset to one block of memory stored in +an ExprState, rather than absolute pointers into memory. + +Once that is addressed, adding an LRU cache that's keyed by the +generated LLVM IR will allow the use of optimized functions even for +shorter functions. + +A longer term project is to move expression compilation to the planner +stage, allowing to tie + +What to JIT +=========== + +Currently expression evaluation and tuple deforming are JITed. Those +were chosen because they commonly are major CPU bottlenecks in +analytics queries, but are by no means the only potentially beneficial cases. + +For JITing to be beneficial a piece of code first and foremost has to +be a CPU bottleneck. But also importantly, JITing can only be +beneficial if overhead can be removed by doing so. E.g. in the tuple +deforming case the knowledge about the number of columns and their +types can remove a significant number of branches, and in the +expression evaluation case a lot of indirect jumps/calls can be +removed. If neither of these is the case, JITing is a waste of +resources. + +Future avenues for JITing are tuple sorting, COPY parsing/output +generation, and later compiling larger parts of queries. + + +When to JIT +=========== + +Currently there are a number of GUCs that influence JITing: + +- jit_above_cost = -1, 0-DBL_MAX - all queries with a higher total cost + get JITed, *without* optimization (expensive part), corresponding to + -O0. This commonly already results in significant speedups if + expression/deforming is a bottleneck (removing dynamic branches + mostly). +- jit_optimize_above_cost = -1, 0-DBL_MAX - all queries with a higher total cost + get JITed, *with* optimization (expensive part). +- jit_inline_above_cost = -1, 0-DBL_MAX - inlining is tried if the query has + a higher cost. + +Whenever a query's total cost is above these limits, JITing is +performed. + +Alternative costing models, e.g. 
by generating separate paths for +parts of a query with lower cpu_* costs, are also a possibility, but +it's doubtful that the gain would justify the overhead of doing so. Another +alternative would be to count the number of times individual +expressions are estimated to be evaluated, and perform JITing of these +individual expressions. + +The obvious seeming approach of JITing expressions individually after +a number of executions turns out not to work too well, primarily +because emitting many small functions individually has significant +overhead, and secondarily because the time till JITing occurs causes +relative slowdowns that eat into the gain of JIT compilation.
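
The threshold comparison described under "When to JIT" can be sketched
in a few lines of C. This is an illustration only, not the planner's
code: the Sketch* / sketch_* names and flag values are hypothetical,
and treating -1 as "disabled" is an assumption based on the value range
listed for the GUCs above. As stated in the documentation part of this
patch, optimization and inlining are only considered once the query has
already exceeded jit_above_cost.

```c
#include <stdbool.h>

/* Hypothetical flag bits for this sketch; not the server's definitions. */
enum
{
    SKETCH_JIT_NONE = 0,
    SKETCH_JIT_PERFORM = 1 << 0,    /* JIT compile at all */
    SKETCH_JIT_OPTIMIZE = 1 << 1,   /* run the expensive optimizations */
    SKETCH_JIT_INLINE = 1 << 2      /* try to inline functions/operators */
};

/* Hypothetical snapshot of the relevant settings at plan time. */
typedef struct SketchJitSettings
{
    bool        jit_enabled;
    double      jit_above_cost;
    double      jit_optimize_above_cost;
    double      jit_inline_above_cost;
} SketchJitSettings;

/* Decide, from the plan's total cost, which JIT steps would be requested. */
int
sketch_jit_flags(const SketchJitSettings *s, double total_cost)
{
    int         flags = SKETCH_JIT_NONE;

    /* Assumption: a negative threshold disables the corresponding step. */
    if (!s->jit_enabled || s->jit_above_cost < 0 ||
        total_cost <= s->jit_above_cost)
        return flags;

    flags |= SKETCH_JIT_PERFORM;

    if (s->jit_optimize_above_cost >= 0 &&
        total_cost > s->jit_optimize_above_cost)
        flags |= SKETCH_JIT_OPTIMIZE;

    if (s->jit_inline_above_cost >= 0 &&
        total_cost > s->jit_inline_above_cost)
        flags |= SKETCH_JIT_INLINE;

    return flags;
}
```

With the documented defaults (100000, 500000, 500000) a plan costing
16.29 requests nothing; after lowering jit_above_cost to 10 the same
plan requests JIT without optimization or inlining, which matches the
EXPLAIN output shown in the documentation part of this patch.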