<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
|
|
<html><head>
|
|
<title>Apache API notes</title>
|
|
</head>
|
|
<!-- Background white, links blue (unvisited), navy (visited), red (active) -->
|
|
<BODY
 BGCOLOR="#FFFFFF"
 TEXT="#000000"
 LINK="#0000FF"
 VLINK="#000080"
 ALINK="#FF0000"
>
|
|
<!--#include virtual="header.html" -->
|
|
<h1 ALIGN="CENTER">Apache API notes</h1>
|
|
|
|
These are some notes on the Apache API and the data structures you
|
|
have to deal with, etc. They are not yet nearly complete, but
|
|
hopefully, they will help you get your bearings. Keep in mind that
|
|
the API is still subject to change as we gain experience with it.
|
|
(See the TODO file for what <em>might</em> be coming). However,
|
|
it will be easy to adapt modules to any changes that are made.
|
|
(We have more modules to adapt than you do).
|
|
<p>
|
|
|
|
A few notes on general pedagogical style here. In the interest of
|
|
conciseness, all structure declarations here are incomplete --- the
|
|
real ones have more slots that I'm not telling you about. For the
|
|
most part, these are reserved to one component of the server core or
|
|
another, and should be altered by modules with caution. However, in
|
|
some cases, they really are things I just haven't gotten around to
|
|
yet. Welcome to the bleeding edge.<p>
|
|
|
|
Finally, here's an outline, to give you some bare idea of what's
|
|
coming up, and in what order:
|
|
|
|
<ul>
|
|
<li> <a href="#basics">Basic concepts.</a>
|
|
<menu>
|
|
<li> <a href="#HMR">Handlers, Modules, and Requests</a>
|
|
<li> <a href="#moduletour">A brief tour of a module</a>
|
|
</menu>
|
|
<li> <a href="#handlers">How handlers work</a>
|
|
<menu>
|
|
<li> <a href="#req_tour">A brief tour of the <code>request_rec</code></a>
|
|
<li> <a href="#req_orig">Where request_rec structures come from</a>
|
|
<li> <a href="#req_return">Handling requests, declining, and returning error codes</a>
|
|
<li> <a href="#resp_handlers">Special considerations for response handlers</a>
|
|
<li> <a href="#auth_handlers">Special considerations for authentication handlers</a>
|
|
<li> <a href="#log_handlers">Special considerations for logging handlers</a>
|
|
</menu>
|
|
<li> <a href="#pools">Resource allocation and resource pools</a>
|
|
<li> <a href="#config">Configuration, commands and the like</a>
|
|
<menu>
|
|
<li> <a href="#per-dir">Per-directory configuration structures</a>
|
|
<li> <a href="#commands">Command handling</a>
|
|
<li> <a href="#servconf">Side notes --- per-server configuration, virtual servers, etc.</a>
|
|
</menu>
|
|
</ul>
|
|
|
|
<h2><a name="basics">Basic concepts.</a></h2>
|
|
|
|
We begin with an overview of the basic concepts behind the
|
|
API, and how they are manifested in the code.
|
|
|
|
<h3><a name="HMR">Handlers, Modules, and Requests</a></h3>
|
|
|
|
Apache breaks down request handling into a series of steps, more or
|
|
less the same way the Netscape server API does (although this API has
|
|
a few more stages than NetSite does, as hooks for stuff I thought
|
|
might be useful in the future). These are:
|
|
|
|
<ul>
|
|
<li> URI -> Filename translation
|
|
<li> Auth ID checking [is the user who they say they are?]
|
|
<li> Auth access checking [is the user authorized <em>here</em>?]
|
|
<li> Access checking other than auth
|
|
<li> Determining MIME type of the object requested
|
|
<li> `Fixups' --- there aren't any of these yet, but the phase is
|
|
intended as a hook for possible extensions like
|
|
<code>SetEnv</code>, which don't really fit well elsewhere.
|
|
<li> Actually sending a response back to the client.
|
|
<li> Logging the request
|
|
</ul>
|
|
|
|
These phases are handled by looking at each of a succession of
|
|
<em>modules</em>, looking to see if each of them has a handler for the
|
|
phase, and attempting to invoke it if so. The handler can typically do
|
|
one of three things:
|
|
|
|
<ul>
|
|
<li> <em>Handle</em> the request, and indicate that it has done so
|
|
by returning the magic constant <code>OK</code>.
|
|
<li> <em>Decline</em> to handle the request, by returning the magic
|
|
integer constant <code>DECLINED</code>. In this case, the
|
|
server behaves in all respects as if the handler simply hadn't
|
|
been there.
|
|
<li> Signal an error, by returning one of the HTTP error codes.
|
|
This terminates normal handling of the request, although an
|
|
ErrorDocument may be invoked to try to mop up, and it will be
|
|
logged in any case.
|
|
</ul>
|
|
|
|
Most phases are terminated by the first module that handles them;
|
|
however, for logging, `fixups', and non-access authentication
|
|
checking, all handlers always run (barring an error). Also, the
|
|
response phase is unique in that modules may declare multiple handlers
|
|
for it, via a dispatch table keyed on the MIME type of the requested
|
|
object. Modules may declare a response-phase handler which can handle
|
|
<em>any</em> request, by giving it the key <code>*/*</code> (i.e., a
|
|
wildcard MIME type specification). However, wildcard handlers are
|
|
only invoked if the server has already tried and failed to find a more
|
|
specific response handler for the MIME type of the requested object
|
|
(either none existed, or they all declined).<p>
|
|
|
|
The handlers themselves are functions of one argument (a
|
|
<code>request_rec</code> structure; <i>vide infra</i>), each of which returns an
|
|
integer, as above.<p>
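The dispatch rules above can be sketched in a few lines of C. This is
only an illustration, not the server's actual code: the cut-down
<code>request_rec</code>, the <code>trace</code> slot, the names
<code>run_first</code> and <code>run_all</code>, and the three toy
handlers are all made up for the example, and the numeric values of the
constants are stand-ins for the ones in the server headers.<p>

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in values; the real constants live in the server headers. */
enum { OK = 0, DECLINED = -1, NOT_FOUND = 404 };

/* Stand-in for the real request_rec, which has many more slots.
 * The trace slot exists only so the example can show who ran. */
typedef struct request_rec { const char *uri; int trace; } request_rec;

typedef int (*handler_fn)(request_rec *);

/* Run-first policy: stop at the first handler that does not decline. */
int run_first(handler_fn const *handlers, size_t n, request_rec *r)
{
    size_t i;
    int rv;

    for (i = 0; i < n; i++) {
        if (handlers[i] == NULL) continue;   /* module has no handler here */
        rv = handlers[i](r);
        if (rv != DECLINED) return rv;       /* OK, or an HTTP error code */
    }
    return DECLINED;
}

/* Run-all policy (logging, fixups): keep going unless an error turns up. */
int run_all(handler_fn const *handlers, size_t n, request_rec *r)
{
    size_t i;
    int rv;

    for (i = 0; i < n; i++) {
        if (handlers[i] == NULL) continue;
        rv = handlers[i](r);
        if (rv != OK && rv != DECLINED) return rv;
    }
    return OK;
}

/* Three toy handlers, one per possible outcome. */
int declines(request_rec *r) { r->trace += 1;   return DECLINED; }
int handles(request_rec *r)  { r->trace += 10;  return OK; }
int fails(request_rec *r)    { r->trace += 100; return NOT_FOUND; }
```

Given the chain <code>declines, handles, fails</code>,
<code>run_first</code> stops at <code>handles</code>, while
<code>run_all</code> carries on until <code>fails</code> reports an
error.<p>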
|
|
|
|
<h3><a name="moduletour">A brief tour of a module</a></h3>
|
|
|
|
At this point, we need to explain the structure of a module. Our
|
|
candidate will be one of the messier ones, the CGI module --- this
|
|
handles both CGI scripts and the <code>ScriptAlias</code> config file
|
|
command. It's actually a great deal more complicated than most
|
|
modules, but if we're going to have only one example, it might as well
|
|
be the one with its fingers in every place.<p>
|
|
|
|
Let's begin with handlers. In order to handle the CGI scripts, the
|
|
module declares a response handler for them. Because of
|
|
<code>ScriptAlias</code>, it also has handlers for the name
|
|
translation phase (to recognize <code>ScriptAlias</code>ed URIs) and the
|
|
type-checking phase (any <code>ScriptAlias</code>ed request is typed
|
|
as a CGI script).<p>
|
|
|
|
The module needs to maintain some per (virtual)
|
|
server information, namely, the <code>ScriptAlias</code>es in effect;
|
|
the module structure therefore contains pointers to a function which
|
|
builds these structures, and to another which combines two of them (in
|
|
case the main server and a virtual server both have
|
|
<code>ScriptAlias</code>es declared).<p>
|
|
|
|
Finally, this module contains code to handle the
|
|
<code>ScriptAlias</code> command itself. This particular module only
|
|
declares one command, but there could be more, so modules have
|
|
<em>command tables</em> which declare their commands, and describe
|
|
where they are permitted, and how they are to be invoked. <p>
|
|
|
|
A final note on the declared types of the arguments of some of these
|
|
commands: a <code>pool</code> is a pointer to a <em>resource pool</em>
|
|
structure; these are used by the server to keep track of the memory
|
|
which has been allocated, files opened, etc., either to service a
|
|
particular request, or to handle the process of configuring itself.
|
|
That way, when the request is over (or, for the configuration pool,
|
|
when the server is restarting), the memory can be freed, and the files
|
|
closed, <i>en masse</i>, without anyone having to write explicit code to
|
|
track them all down and dispose of them. Also, a
|
|
<code>cmd_parms</code> structure contains various information about
|
|
the config file being read, and other status information, which is
|
|
sometimes of use to the function which processes a config-file command
|
|
(such as <code>ScriptAlias</code>).
|
|
|
|
With no further ado, the module itself:
|
|
|
|
<pre>
|
|
/* Declarations of handlers. */

int translate_scriptalias (request_rec *);
int type_scriptalias (request_rec *);
int cgi_handler (request_rec *);

/* Subsidiary dispatch table for response-phase handlers, by MIME type */

handler_rec cgi_handlers[] = {
{ "application/x-httpd-cgi", cgi_handler },
{ NULL }
};

/* Declarations of routines to manipulate the module's configuration
 * info.  Note that these are returned, and passed in, as void *'s;
 * the server core keeps track of them, but it doesn't, and can't,
 * know their internal structure.
 */

void *make_cgi_server_config (pool *);
void *merge_cgi_server_config (pool *, void *, void *);

/* Declarations of routines to handle config-file commands */

extern char *script_alias(cmd_parms *, void *per_dir_config, char *fake,
                          char *real);

command_rec cgi_cmds[] = {
{ "ScriptAlias", script_alias, NULL, RSRC_CONF, TAKE2,
    "a fakename and a realname"},
{ NULL }
};

module cgi_module = {
   STANDARD_MODULE_STUFF,
   NULL,                     /* initializer */
   NULL,                     /* dir config creator */
   NULL,                     /* dir merger --- default is to override */
   make_cgi_server_config,   /* server config */
   merge_cgi_server_config,  /* merge server config */
   cgi_cmds,                 /* command table */
   cgi_handlers,             /* handlers */
   translate_scriptalias,    /* filename translation */
   NULL,                     /* check_user_id */
   NULL,                     /* check auth */
   NULL,                     /* check access */
   type_scriptalias,         /* type_checker */
   NULL,                     /* fixups */
   NULL,                     /* logger */
   NULL                      /* header parser */
};
|
|
</pre>
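To make the two configuration routines a little more concrete, here is
a rough sketch of what they might look like. Everything in it is
assumed for the sake of illustration: the layout of
<code>cgi_server_conf</code>, the fixed-size arrays, the helpers
<code>add_alias</code> and <code>lookup_alias</code>, the choice to let
the virtual server's aliases take precedence, and the
<code>calloc</code>-based stand-in for pool allocation. The real
module is free to do all of this differently.<p>

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in pool: the real one tracks allocations; this one just calls
 * calloc and never frees, which is good enough for a sketch. */
typedef struct pool { int unused; } pool;
static void *pcalloc_standin(pool *p, size_t n) { (void)p; return calloc(1, n); }

#define MAX_ALIASES 16

/* Hypothetical layout for the module's per-server data. */
typedef struct {
    const char *fake[MAX_ALIASES];
    const char *real[MAX_ALIASES];
    int n;
} cgi_server_conf;

void *make_cgi_server_config(pool *p)
{
    return pcalloc_standin(p, sizeof(cgi_server_conf));   /* n starts at 0 */
}

static void add_alias(cgi_server_conf *c, const char *fake, const char *real)
{
    if (c->n < MAX_ALIASES) {
        c->fake[c->n] = fake;
        c->real[c->n] = real;
        c->n++;
    }
}

/* Combine main-server and virtual-server settings.  The virtual server's
 * aliases go first, so they win when both declare the same fake name
 * (an assumption of this sketch). */
void *merge_cgi_server_config(pool *p, void *basev, void *overridesv)
{
    cgi_server_conf *base = basev, *over = overridesv;
    cgi_server_conf *m = pcalloc_standin(p, sizeof(*m));
    int i;

    if (m == NULL) return NULL;
    for (i = 0; i < over->n; i++) add_alias(m, over->fake[i], over->real[i]);
    for (i = 0; i < base->n; i++) add_alias(m, base->fake[i], base->real[i]);
    return m;
}

/* First prefix match wins; returns NULL if the URI is not aliased. */
const char *lookup_alias(const cgi_server_conf *c, const char *uri)
{
    int i;

    for (i = 0; i < c->n; i++)
        if (strncmp(uri, c->fake[i], strlen(c->fake[i])) == 0)
            return c->real[i];
    return NULL;
}
```

Note that the server core only ever sees these structures as
<code>void *</code>; the casts back to the module's own type happen
inside the module.<p>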
|
|
|
|
<h2><a name="handlers">How handlers work</a></h2>
|
|
|
|
The sole argument to handlers is a <code>request_rec</code> structure.
|
|
This structure describes a particular request which has been made to
|
|
the server, on behalf of a client. In most cases, each connection to
|
|
the client generates only one <code>request_rec</code> structure.<p>
|
|
|
|
<h3><a name="req_tour">A brief tour of the <code>request_rec</code></a></h3>
|
|
|
|
The <code>request_rec</code> contains pointers to a resource pool
|
|
which will be cleared when the server is finished handling the
|
|
request; to structures containing per-server and per-connection
|
|
information, and most importantly, information on the request itself.<p>
|
|
|
|
The most important such information is a small set of character
|
|
strings describing attributes of the object being requested, including
|
|
its URI, filename, content-type and content-encoding (these being filled
|
|
in by the translation and type-check handlers which handle the
|
|
request, respectively). <p>
|
|
|
|
Other commonly used data items are tables giving the MIME headers on
|
|
the client's original request, MIME headers to be sent back with the
|
|
response (which modules can add to at will), and environment variables
|
|
for any subprocesses which are spawned off in the course of servicing
|
|
the request. These tables are manipulated using the
|
|
<code>table_get</code> and <code>table_set</code> routines. <p>
|
|
<BLOCKQUOTE>
|
|
Note that the <SAMP>Content-type</SAMP> header value <EM>cannot</EM> be
|
|
set by module content-handlers using the <SAMP>table_*()</SAMP>
|
|
routines. Rather, it is set by pointing the <SAMP>content_type</SAMP>
|
|
field in the <SAMP>request_rec</SAMP> structure to an appropriate
|
|
string. <EM>E.g.</EM>,
|
|
<PRE>
|
|
r->content_type = "text/html";
|
|
</PRE>
|
|
</BLOCKQUOTE>
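For readers who have not met this sort of table before, the following
toy version shows the general idea of a get/set pair over a list of
key/value strings. It is a sketch only: the fixed-size array, the
<code>key_eq</code> helper, and the exact signatures are invented here,
and the case-insensitive matching is an assumption based on the fact
that MIME header names ignore case. The server's own tables are
allocated from pools and differ in detail.<p>

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define TABLE_MAX 32

typedef struct { const char *key, *val; } table_entry;
typedef struct { table_entry e[TABLE_MAX]; int n; } table;

/* Case-insensitive string equality (header names ignore case). */
static int key_eq(const char *a, const char *b)
{
    while (*a && *b) {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == *b;   /* equal only if both strings ended together */
}

/* Return the value stored under key, or NULL if there is none. */
const char *table_get(const table *t, const char *key)
{
    int i;

    for (i = 0; i < t->n; i++)
        if (key_eq(t->e[i].key, key)) return t->e[i].val;
    return NULL;
}

/* Replace an existing entry or append a new one.  The sketch stores the
 * caller's pointers as they are; it copies nothing. */
void table_set(table *t, const char *key, const char *val)
{
    int i;

    for (i = 0; i < t->n; i++)
        if (key_eq(t->e[i].key, key)) { t->e[i].val = val; return; }
    if (t->n < TABLE_MAX) {
        t->e[t->n].key = key;
        t->e[t->n].val = val;
        t->n++;
    }
}
```

Setting <code>Location</code> and then <code>location</code> in this
sketch leaves a single entry holding the second value.<p>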
|
|
Finally, there are pointers to two data structures which, in turn,
|
|
point to per-module configuration structures. Specifically, these
|
|
hold pointers to the data structures which the module has built to
|
|
describe the way it has been configured to operate in a given
|
|
directory (via <code>.htaccess</code> files or
|
|
<code>&lt;Directory&gt;</code> sections), and for private data it has
|
|
built in the course of servicing the request (so modules' handlers for
|
|
one phase can pass `notes' to their handlers for other phases). There
|
|
is another such configuration vector in the <code>server_rec</code>
|
|
data structure pointed to by the <code>request_rec</code>, which
|
|
contains per (virtual) server configuration data.<p>
|
|
|
|
Here is an abridged declaration, giving the fields most commonly used:<p>
|
|
|
|
<pre>
|
|
struct request_rec {

  pool *pool;
  conn_rec *connection;
  server_rec *server;

  /* What object is being requested */

  char *uri;
  char *filename;
  char *path_info;
  char *args;           /* QUERY_ARGS, if any */
  struct stat finfo;    /* Set by server core;
                         * st_mode set to zero if no such file */

  char *content_type;
  char *content_encoding;

  /* MIME header environments, in and out.  Also, an array containing
   * environment variables to be passed to subprocesses, so people can
   * write modules to add to that environment.
   *
   * The difference between headers_out and err_headers_out is that
   * the latter are printed even on error, and persist across internal
   * redirects (so the headers printed for ErrorDocument handlers will
   * have them).
   */

  table *headers_in;
  table *headers_out;
  table *err_headers_out;
  table *subprocess_env;

  /* Info about the request itself... */

  int header_only;     /* HEAD request, as opposed to GET */
  char *protocol;      /* Protocol, as given to us, or HTTP/0.9 */
  char *method;        /* GET, HEAD, POST, etc. */
  int method_number;   /* M_GET, M_POST, etc. */

  /* Info for logging */

  char *the_request;
  int bytes_sent;

  /* A flag which modules can set, to indicate that the data being
   * returned is volatile, and clients should be told not to cache it.
   */

  int no_cache;

  /* Various other config info which may change with .htaccess files
   * These are config vectors, with one void* pointer for each module
   * (the thing pointed to being the module's business).
   */

  void *per_dir_config;   /* Options set in config files, etc. */
  void *request_config;   /* Notes on *this* request */

};
|
|
|
|
</pre>
|
|
|
|
<h3><a name="req_orig">Where request_rec structures come from</a></h3>
|
|
|
|
Most <code>request_rec</code> structures are built by reading an HTTP
|
|
request from a client, and filling in the fields. However, there are
|
|
a few exceptions:
|
|
|
|
<ul>
|
|
<li> If the request is to an imagemap, a type map (i.e., a
|
|
<code>*.var</code> file), or a CGI script which returned a
|
|
local `Location:', then the resource which the user requested
|
|
is going to be ultimately located by some URI other than what
|
|
the client originally supplied. In this case, the server does
|
|
an <em>internal redirect</em>, constructing a new
|
|
<code>request_rec</code> for the new URI, and processing it
|
|
almost exactly as if the client had requested the new URI
|
|
directly. <p>
|
|
|
|
<li> If some handler signaled an error, and an
|
|
<code>ErrorDocument</code> is in scope, the same internal
|
|
redirect machinery comes into play.<p>
|
|
|
|
<li> Finally, a handler occasionally needs to investigate `what
|
|
would happen if' some other request were run. For instance,
|
|
the directory indexing module needs to know what MIME type
|
|
would be assigned to a request for each directory entry, in
|
|
order to figure out what icon to use.<p>
|
|
|
|
Such handlers can construct a <em>sub-request</em>, using the
|
|
functions <code>sub_req_lookup_file</code> and
|
|
<code>sub_req_lookup_uri</code>; this constructs a new
|
|
<code>request_rec</code> structure and processes it as you
|
|
would expect, up to but not including the point of actually
|
|
sending a response. (These functions skip over the access
|
|
checks if the sub-request is for a file in the same directory
|
|
as the original request).<p>
|
|
|
|
(Server-side includes work by building sub-requests and then
|
|
actually invoking the response handler for them, via the
|
|
function <code>run_sub_request</code>).
|
|
</ul>
|
|
|
|
<h3><a name="req_return">Handling requests, declining, and returning error codes</a></h3>
|
|
|
|
As discussed above, each handler, when invoked to handle a particular
|
|
<code>request_rec</code>, has to return an <code>int</code> to
|
|
indicate what happened. That can either be
|
|
|
|
<ul>
|
|
<li> OK --- the request was handled successfully. This may or may
|
|
not terminate the phase.
|
|
<li> DECLINED --- no erroneous condition exists, but the module
|
|
declines to handle the phase; the server tries to find another.
|
|
<li> an HTTP error code, which aborts handling of the request.
|
|
</ul>
|
|
|
|
Note that if the error code returned is <code>REDIRECT</code>, then
|
|
the module should put a <code>Location</code> in the request's
|
|
<code>headers_out</code>, to indicate where the client should be
|
|
redirected <em>to</em>. <p>
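A redirecting handler might therefore look something like the sketch
below. The handler name <code>moved_handler</code>, the
<code>/old/</code> prefix, the target URL, and the one-slot stand-in
for <code>headers_out</code> are all invented for the example, and the
constant values are stand-ins for the ones in the server headers. With
the real API the header would be set through the table routines.<p>

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in values; the real constants live in the server headers. */
enum { OK = 0, DECLINED = -1, REDIRECT = 302 };

/* One-slot stand-in for the headers_out table. */
typedef struct { const char *location; } out_headers;

typedef struct request_rec {
    const char *uri;
    out_headers headers_out;
} request_rec;

/* Hypothetical handler: send requests for /old/... somewhere else. */
int moved_handler(request_rec *r)
{
    if (strncmp(r->uri, "/old/", 5) != 0)
        return DECLINED;                 /* not ours; let others try */

    /* Say where the client should go, then report the redirect. */
    r->headers_out.location = "http://www.example.com/new/";
    return REDIRECT;
}
```

A request the handler does not care about is declined with the
outgoing headers left untouched.<p>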
|
|
|
|
<h3><a name="resp_handlers">Special considerations for response handlers</a></h3>
|
|
|
|
Handlers for most phases do their work by simply setting a few fields
|
|
in the <code>request_rec</code> structure (or, in the case of access
|
|
checkers, simply by returning the correct error code). However,
|
|
response handlers have to actually send a response back to the client. <p>
|
|
|
|
They should begin by sending an HTTP response header, using the
|
|
function <code>send_http_header</code>. (You don't have to do
|
|
anything special to skip sending the header for HTTP/0.9 requests; the
|
|
function figures out on its own that it shouldn't do anything). If
|
|
the request is marked <code>header_only</code>, that's all they should
|
|
do; they should return after that, without attempting any further
|
|
output. <p>
|
|
|
|
Otherwise, they should produce a response body which responds to the
|
|
client as appropriate. The primitives for this are <code>rputc</code>
|
|
and <code>rprintf</code>, for internally generated output, and
|
|
<code>send_fd</code>, to copy the contents of some <code>FILE *</code>
|
|
straight to the client. <p>
|
|
|
|
At this point, you should more or less understand the following piece
|
|
of code, which is the handler which handles <code>GET</code> requests
|
|
which have no more specific handler; it also shows how conditional
|
|
<code>GET</code>s can be handled, if it's desirable to do so in a
|
|
particular response handler --- <code>set_last_modified</code> checks
|
|
against the <code>If-modified-since</code> value supplied by the
|
|
client, if any, and returns an appropriate code (which will, if
|
|
nonzero, be USE_LOCAL_COPY). No similar considerations apply for
|
|
<code>set_content_length</code>, but it returns an error code for
|
|
symmetry.<p>
|
|
|
|
<pre>
|
|
int default_handler (request_rec *r)
{
    int errstatus;
    FILE *f;

    if (r->method_number != M_GET) return DECLINED;
    if (r->finfo.st_mode == 0) return NOT_FOUND;

    if ((errstatus = set_content_length (r, r->finfo.st_size))
        || (errstatus = set_last_modified (r, r->finfo.st_mtime)))
        return errstatus;

    /* Open through the pool, to match the pfclose below */
    f = pfopen (r->pool, r->filename, "r");

    if (f == NULL) {
        log_reason("file permissions deny server access",
                   r->filename, r);
        return FORBIDDEN;
    }

    register_timeout ("send", r);
    send_http_header (r);

    if (!r->header_only) send_fd (f, r);
    pfclose (r->pool, f);
    return OK;
}
|
|
</pre>
|
|
|
|
Finally, if all of this is too much of a challenge, there are a few
|
|
ways out of it. First off, as shown above, a response handler which
|
|
has not yet produced any output can simply return an error code, in
|
|
which case the server will automatically produce an error response.
|
|
Secondly, it can punt to some other handler by invoking
|
|
<code>internal_redirect</code>, which is how the internal redirection
|
|
machinery discussed above is invoked. A response handler which has
|
|
internally redirected should always return <code>OK</code>. <p>
|
|
|
|
(Invoking <code>internal_redirect</code> from handlers which are
|
|
<em>not</em> response handlers will lead to serious confusion).
|
|
|
|
<h3><a name="auth_handlers">Special considerations for authentication handlers</a></h3>
|
|
|
|
Stuff that should be discussed here in detail:
|
|
|
|
<ul>
|
|
<li> Authentication-phase handlers not invoked unless auth is
|
|
configured for the directory.
|
|
<li> Common auth configuration stored in the core per-dir
|
|
configuration; it has accessors <code>auth_type</code>,
|
|
<code>auth_name</code>, and <code>requires</code>.
|
|
<li> Common routines, to handle the protocol end of things, at least
|
|
for HTTP basic authentication (<code>get_basic_auth_pw</code>,
|
|
which sets the <code>connection->user</code> structure field
|
|
automatically, and <code>note_basic_auth_failure</code>, which
|
|
arranges for the proper <code>WWW-Authenticate:</code> header
|
|
to be sent back).
|
|
</ul>
|
|
|
|
<h3><a name="log_handlers">Special considerations for logging handlers</a></h3>
|
|
|
|
When a request has internally redirected, there is the question of
|
|
what to log. Apache handles this by bundling the entire chain of
|
|
redirects into a list of <code>request_rec</code> structures which are
|
|
threaded through the <code>r->prev</code> and <code>r->next</code>
|
|
pointers. The <code>request_rec</code> which is passed to the logging
|
|
handlers in such cases is the one which was originally built for the
|
|
initial request from the client; note that the bytes_sent field will
|
|
only be correct in the last request in the chain (the one for which a
|
|
response was actually sent).
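<p>
A logging handler that wants the byte count therefore has to walk to
the end of the chain first. Here is a minimal sketch of that walk,
using a cut-down <code>request_rec</code> which keeps only the slots
mentioned above; the helper names <code>last_in_chain</code> and
<code>logged_bytes</code> are made up for the example.<p>

```c
#include <assert.h>
#include <stddef.h>

/* Cut-down stand-in: only the slots this example needs. */
typedef struct request_rec {
    const char *uri;
    int bytes_sent;
    struct request_rec *prev, *next;
} request_rec;

/* Follow r->next to the request that actually produced the response. */
const request_rec *last_in_chain(const request_rec *r)
{
    while (r->next != NULL) r = r->next;
    return r;
}

/* What a logger might record: the byte count from the end of the chain,
 * whichever request in the chain it was handed. */
int logged_bytes(const request_rec *orig)
{
    return last_in_chain(orig)->bytes_sent;
}
```

The logger can still report the URI from the original request, which is
usually what the person reading the log wants to see.<p>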
|
|
|
|
<h2><a name="pools">Resource allocation and resource pools</a></h2>
|
|
|
|
One of the problems of writing and designing a server-pool server is
|
|
that of preventing leakage, that is, allocating resources (memory,
|
|
open files, etc.), without subsequently releasing them. The resource
|
|
pool machinery is designed to make it easy to prevent this from
|
|
happening, by allowing resources to be allocated in such a way that
|
|
they are <em>automatically</em> released when the server is done with
|
|
them. <p>
|
|
|
|
The way this works is as follows: the memory which is allocated, files
|
|
opened, etc., to deal with a particular request are tied to a
|
|
<em>resource pool</em> which is allocated for the request. The pool
|
|
is a data structure which itself tracks the resources in question. <p>
|
|
|
|
When the request has been processed, the pool is <em>cleared</em>. At
|
|
that point, all the memory associated with it is released for reuse,
|
|
all files associated with it are closed, and any other clean-up
|
|
functions which are associated with the pool are run. When this is
|
|
over, we can be confident that all the resources tied to the pool have
|
|
been released, and that none of them have leaked. <p>
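The bookkeeping behind this can be pictured as a list of clean-up
functions hanging off the pool, all of which are run when the pool is
cleared. The sketch below is a toy, not the server's implementation:
the <code>toy_</code> names, the single-callback signature, and the
newest-first order are assumptions made for the example, and the
list nodes come from plain <code>malloc</code>.<p>

```c
#include <assert.h>
#include <stdlib.h>

typedef struct cleanup {
    void (*fn)(void *);
    void *data;
    struct cleanup *next;
} cleanup;

typedef struct toy_pool { cleanup *cleanups; } toy_pool;

/* Tie a resource to the pool by recording how to release it.
 * Returns 0 on success, -1 if the bookkeeping node can't be allocated. */
int toy_register_cleanup(toy_pool *p, void *data, void (*fn)(void *))
{
    cleanup *c = malloc(sizeof(*c));

    if (c == NULL) return -1;
    c->fn = fn;
    c->data = data;
    c->next = p->cleanups;      /* newest first */
    p->cleanups = c;
    return 0;
}

/* Release everything tied to the pool; the pool itself stays usable. */
void toy_clear_pool(toy_pool *p)
{
    while (p->cleanups != NULL) {
        cleanup *c = p->cleanups;

        p->cleanups = c->next;
        c->fn(c->data);
        free(c);
    }
}

/* Example cleanup: count releases through the data pointer. */
void note_release(void *data) { ++*(int *)data; }
```

Clearing the pool a second time is harmless here, since the list is
already empty by then.<p>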
|
|
|
|
Server restarts, and allocation of memory and resources for per-server
|
|
configuration, are handled in a similar way. There is a
|
|
<em>configuration pool</em>, which keeps track of resources which were
|
|
allocated while reading the server configuration files, and handling
|
|
the commands therein (for instance, the memory that was allocated for
|
|
per-server module configuration, log files and other files that were
|
|
opened, and so forth). When the server restarts, and has to reread
|
|
the configuration files, the configuration pool is cleared, and so the
|
|
memory and file descriptors which were taken up by reading them the
|
|
last time are made available for reuse. <p>
|
|
|
|
It should be noted that use of the pool machinery isn't generally
|
|
obligatory, except for situations like logging handlers, where you
|
|
really need to register cleanups to make sure that the log file gets
|
|
closed when the server restarts (this is most easily done by using the
|
|
function <code><a href="#pool-files">pfopen</a></code>, which also
|
|
arranges for the underlying file descriptor to be closed before any
|
|
child processes, such as for CGI scripts, are <code>exec</code>ed), or
|
|
in case you are using the timeout machinery (which isn't yet even
|
|
documented here). However, there are two benefits to using it:
|
|
resources allocated to a pool never leak (even if you allocate a
|
|
scratch string, and just forget about it); also, for memory
|
|
allocation, <code>palloc</code> is generally faster than
|
|
<code>malloc</code>.<p>
|
|
|
|
We begin here by describing how memory is allocated to pools, and then
|
|
discuss how other resources are tracked by the resource pool
|
|
machinery.
|
|
|
|
<h3>Allocation of memory in pools</h3>
|
|
|
|
Memory is allocated to pools by calling the function
|
|
<code>palloc</code>, which takes two arguments, one being a pointer to
|
|
a resource pool structure, and the other being the amount of memory to
|
|
allocate (in <code>char</code>s). Within handlers for handling
|
|
requests, the most common way of getting a resource pool structure is
|
|
by looking at the <code>pool</code> slot of the relevant
|
|
<code>request_rec</code>; hence the repeated appearance of the
|
|
following idiom in module code:
|
|
|
|
<pre>
|
|
int my_handler(request_rec *r)
{
    struct my_structure *foo;
    ...

    foo = (struct my_structure *)palloc (r->pool, sizeof(struct my_structure));
}
|
|
</pre>
|
|
|
|
Note that <em>there is no <code>pfree</code></em> ---
|
|
<code>palloc</code>ed memory is freed only when the associated
|
|
resource pool is cleared. This means that <code>palloc</code> does not
|
|
have to do as much accounting as <code>malloc()</code>; all it does in
|
|
the typical case is to round up the size, bump a pointer, and do a
|
|
range check.<p>
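To show what "round up, bump a pointer, range check" amounts to, here
is a toy allocator over a single fixed block. It is a sketch under
stated assumptions, not the real <code>palloc</code>: the
<code>toy_</code> names, the 256-byte block, and the 8-byte alignment
are all invented, and where this version gives up and returns
<code>NULL</code>, a real pool would presumably go and get more
memory.<p>

```c
#include <assert.h>
#include <stddef.h>

#define BLOCK_SIZE 256
#define ALIGNMENT  8      /* assumed; must be a power of two */

typedef struct toy_pool {
    /* The union is only there to give the byte array a sane alignment. */
    union { char bytes[BLOCK_SIZE]; double d; long l; void *p; } block;
    size_t used;
} toy_pool;

/* Round up, range-check, bump.  Returns NULL when the block is full. */
void *toy_palloc(toy_pool *p, size_t n)
{
    size_t rounded = (n + ALIGNMENT - 1) & ~(size_t)(ALIGNMENT - 1);
    char *result;

    if (rounded < n || rounded > BLOCK_SIZE - p->used)
        return NULL;                  /* overflow, or not enough room */
    result = p->block.bytes + p->used;
    p->used += rounded;
    return result;
}

/* "Freeing" everything is just resetting the pointer. */
void toy_clear_pool(toy_pool *p) { p->used = 0; }
```

There is no per-allocation header and no free list, which is where the
savings over <code>malloc()</code> come from in this picture.<p>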
|
|
|
|
(It also raises the possibility that heavy use of <code>palloc</code>
|
|
could cause a server process to grow excessively large. There are
|
|
two ways to deal with this, which are dealt with below; briefly, you
|
|
can use <code>malloc</code>, and try to be sure that all of the memory
|
|
gets explicitly <code>free</code>d, or you can allocate a sub-pool of
|
|
the main pool, allocate your memory in the sub-pool, and clear it out
|
|
periodically. The latter technique is discussed in the section on
|
|
sub-pools below, and is used in the directory-indexing code, in order
|
|
to avoid excessive storage allocation when listing directories with
|
|
thousands of files).
|
|
|
|
<h3>Allocating initialized memory</h3>
|
|
|
|
There are functions which allocate initialized memory, and are
|
|
frequently useful. The function <code>pcalloc</code> has the same
|
|
interface as <code>palloc</code>, but clears out the memory it
|
|
allocates before it returns it. The function <code>pstrdup</code>
|
|
takes a resource pool and a <code>char *</code> as arguments, and
|
|
allocates memory for a copy of the string the pointer points to,
|
|
returning a pointer to the copy. Finally <code>pstrcat</code> is a
|
|
varargs-style function, which takes a pointer to a resource pool, and
|
|
at least two <code>char *</code> arguments, the last of which must be
|
|
<code>NULL</code>. It allocates enough memory to fit copies of each
|
|
of the strings, as a unit; for instance:
|
|
|
|
<pre>
|
|
pstrcat (r->pool, "foo", "/", "bar", NULL);
|
|
</pre>
|
|
|
|
returns a pointer to 8 bytes worth of memory, initialized to
|
|
<code>"foo/bar"</code>.
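<p>
A function of this shape can be written with <code>&lt;stdarg.h&gt;</code>
in two passes, one to measure and one to copy. The version below is an
illustrative sketch only: it is called <code>toy_pstrcat</code> to make
clear that it is not the server's code, the <code>pool</code> type is
an empty stand-in which is ignored, and the memory comes from
<code>malloc</code> rather than from a pool.<p>

```c
#include <assert.h>
#include <stdarg.h>
#include <stdlib.h>
#include <string.h>

typedef struct pool { int unused; } pool;   /* stand-in, ignored below */

/* Toy pstrcat: the argument list must end with a null char pointer. */
char *toy_pstrcat(pool *p, ...)
{
    va_list ap;
    char *s, *result, *cursor;
    size_t len = 0;

    (void)p;

    /* Pass 1: measure. */
    va_start(ap, p);
    while ((s = va_arg(ap, char *)) != NULL) len += strlen(s);
    va_end(ap);

    result = malloc(len + 1);     /* a pool version would allocate from p */
    if (result == NULL) return NULL;

    /* Pass 2: copy each piece onto the end of the result. */
    cursor = result;
    va_start(ap, p);
    while ((s = va_arg(ap, char *)) != NULL) {
        size_t n = strlen(s);

        memcpy(cursor, s, n);
        cursor += n;
    }
    va_end(ap);
    *cursor = '\0';
    return result;
}
```

When calling a varargs function like this one, it is safest to write
the terminator as <code>(char *)NULL</code>, since a bare
<code>NULL</code> is not guaranteed to be passed as a pointer.<p>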
|
|
|
|
<h3><a name="pool-files">Tracking open files, etc.</a></h3>
|
|
|
|
As indicated above, resource pools are also used to track other sorts
|
|
of resources besides memory. The most common are open files. The
|
|
routine which is typically used for this is <code>pfopen</code>, which
|
|
takes a resource pool and two strings as arguments; the strings are
|
|
the same as the typical arguments to <code>fopen</code>, e.g.,
|
|
|
|
<pre>
|
|
...
|
|
FILE *f = pfopen (r->pool, r->filename, "r");
|
|
|
|
if (f == NULL) { ... } else { ... }
|
|
</pre>
|
|
|
|
There is also a <code>popenf</code> routine, which parallels the
|
|
lower-level <code>open</code> system call. Both of these routines
|
|
arrange for the file to be closed when the resource pool in question
|
|
is cleared. <p>
|
|
|
|
Unlike the case for memory, there <em>are</em> functions to close
|
|
files allocated with <code>pfopen</code> and <code>popenf</code>,
|
|
namely <code>pfclose</code> and <code>pclosef</code>. (This is
|
|
because, on many systems, the number of files which a single process
|
|
can have open is quite limited). It is important to use these
|
|
functions to close files allocated with <code>pfopen</code> and
|
|
<code>popenf</code>, since to do otherwise could cause fatal errors on
|
|
systems such as Linux, which react badly if the same
|
|
<code>FILE*</code> is closed more than once. <p>
|
|
|
|
(Using the <code>close</code> functions is not mandatory, since the
|
|
file will eventually be closed regardless, but you should consider it
|
|
in cases where your module is opening, or could open, a lot of files).
|
|
|
|
<h3>Other sorts of resources --- cleanup functions</h3>
|
|
|
|
More text goes here. Describe the cleanup primitives in terms of
|
|
which the file stuff is implemented; also, <code>spawn_process</code>.
|
|
|
|
<h3>Fine control --- creating and dealing with sub-pools, with a note
on sub-requests</h3>
|
|
|
|
On rare occasions, too-free use of <code>palloc()</code> and the
|
|
associated primitives may result in undesirably profligate resource
|
|
allocation. You can deal with such a case by creating a
|
|
<em>sub-pool</em>, allocating within the sub-pool rather than the main
|
|
pool, and clearing or destroying the sub-pool, which releases the
|
|
resources which were associated with it. (This really <em>is</em> a
|
|
rare situation; the only case in which it comes up in the standard
|
|
module set is in case of listing directories, and then only with
|
|
<em>very</em> large directories. Unnecessary use of the primitives
|
|
discussed here can hair up your code quite a bit, with very little
|
|
gain). <p>
|
|
|
|
The primitive for creating a sub-pool is <code>make_sub_pool</code>,
|
|
which takes another pool (the parent pool) as an argument. When the
|
|
main pool is cleared, the sub-pool will be destroyed. The sub-pool
|
|
may also be cleared or destroyed at any time, by calling the functions
|
|
<code>clear_pool</code> and <code>destroy_pool</code>, respectively.
|
|
(The difference is that <code>clear_pool</code> frees resources
|
|
associated with the pool, while <code>destroy_pool</code> also
|
|
deallocates the pool itself. In the former case, you can allocate new
|
|
resources within the pool, and clear it again, and so forth; in the
|
|
latter case, it is simply gone). <p>
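The difference between clearing and destroying is easiest to see in a
toy model of the pool tree. In the sketch below, the <code>toy_</code>
names, the <code>live</code> counter standing in for "resources held",
and the linked-list representation of children are all assumptions
made for illustration; only the behaviour described in the paragraph
above is taken from the text.<p>

```c
#include <assert.h>
#include <stdlib.h>

typedef struct toy_pool {
    struct toy_pool *parent, *first_child, *sibling;
    int live;            /* stand-in for "resources held by this pool" */
} toy_pool;

/* Create a pool; pass NULL for a top-level pool with no parent. */
toy_pool *toy_make_sub_pool(toy_pool *parent)
{
    toy_pool *p = calloc(1, sizeof(*p));

    if (p == NULL) return NULL;
    p->parent = parent;
    if (parent != NULL) {
        p->sibling = parent->first_child;
        parent->first_child = p;
    }
    return p;
}

void toy_destroy_pool(toy_pool *p);

/* Release the pool's resources and destroy its sub-pools; p stays usable. */
void toy_clear_pool(toy_pool *p)
{
    while (p->first_child != NULL)
        toy_destroy_pool(p->first_child);   /* each child unlinks itself */
    p->live = 0;
}

/* Clear the pool, unlink it from its parent, and free the pool itself. */
void toy_destroy_pool(toy_pool *p)
{
    toy_clear_pool(p);
    if (p->parent != NULL) {
        toy_pool **link = &p->parent->first_child;

        while (*link != p) link = &(*link)->sibling;
        *link = p->sibling;
    }
    free(p);
}
```

After <code>toy_clear_pool</code> the pointer is still good and the
pool can be filled again; after <code>toy_destroy_pool</code> it must
not be touched.<p>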
|
|
|
|
One final note --- sub-requests have their own resource pools, which
|
|
are sub-pools of the resource pool for the main request. The polite
|
|
way to reclaim the resources associated with a sub request which you
|
|
have allocated (using the <code>sub_req_lookup_...</code> functions)
|
|
is <code>destroy_sub_request</code>, which frees the resource pool.
|
|
Before calling this function, be sure to copy anything that you care
|
|
about which might be allocated in the sub-request's resource pool into
|
|
someplace a little less volatile (for instance, the filename in its
|
|
<code>request_rec</code> structure). <p>
|
|
|
|
(Again, under most circumstances, you shouldn't feel obliged to call
|
|
this function; only 2K of memory or so are allocated for a typical sub
|
|
request, and it will be freed anyway when the main request pool is
|
|
cleared. It is only when you are allocating many, many sub-requests
|
|
for a single main request that you should seriously consider the
|
|
<code>destroy...</code> functions).
|
|
|
|
<h2><a name="config">Configuration, commands and the like</a></h2>
|
|
|
|
One of the design goals for this server was to maintain external
|
|
compatibility with the NCSA 1.3 server --- that is, to read the same
|
|
configuration files, to process all the directives therein correctly,
|
|
and in general to be a drop-in replacement for NCSA. On the other
|
|
hand, another design goal was to move as much of the server's
|
|
functionality into modules which have as little as possible to do with
|
|
the monolithic server core. The only way to reconcile these goals is
|
|
to move the handling of most commands from the central server into the
|
|
modules. <p>
|
|
|
|
However, just giving the modules command tables is not enough to
|
|
divorce them completely from the server core. The server has to
|
|
remember the commands in order to act on them later. That involves
|
|
maintaining data which is private to the modules, and which can be
|
|
either per-server, or per-directory. Most things are per-directory,
|
|
including in particular access control and authorization information,
|
|
but also information on how to determine file types from suffixes,
|
|
which can be modified by <code>AddType</code> and
|
|
<code>DefaultType</code> directives, and so forth. In general, the
|
|
governing philosophy is that anything which <em>can</em> be made
|
|
configurable by directory should be; per-server information is
|
|
generally used in the standard set of modules for information like
|
|
<code>Alias</code>es and <code>Redirect</code>s which come into play
|
|
before the request is tied to a particular place in the underlying
|
|
file system. <p>
|
|
|
|
Another requirement for emulating the NCSA server is being able to
|
|
handle the per-directory configuration files, generally called
|
|
<code>.htaccess</code> files, though even in the NCSA server they can
|
|
contain directives which have nothing at all to do with access
|
|
control. Accordingly, after URI -> filename translation, but before
|
|
performing any other phase, the server walks down the directory
|
|
hierarchy of the underlying filesystem, following the translated
|
|
pathname, to read any <code>.htaccess</code> files which might be
|
|
present. The information which is read in then has to be
|
|
<em>merged</em> with the applicable information from the server's own
|
|
config files (either from the <code>&lt;Directory&gt;</code> sections
|
|
in <code>access.conf</code>, or from defaults in
|
|
<code>srm.conf</code>, which actually behaves for most purposes almost
|
|
exactly like <code>&lt;Directory /&gt;</code>).<p>
|
|
|
|
Finally, after having served a request which involved reading
|
|
<code>.htaccess</code> files, we need to discard the storage allocated
|
|
for handling them. That is solved the same way it is solved wherever
|
|
else similar problems come up, by tying those structures to the
|
|
per-transaction resource pool. <p>
|
|
|
|
<h3><a name="per-dir">Per-directory configuration structures</a></h3>
|
|
|
|
Let's look at how all of this plays out in <code>mod_mime.c</code>,
|
|
which defines the file typing handler which emulates the NCSA server's
|
|
behavior of determining file types from suffixes. What we'll be
|
|
looking at, here, is the code which implements the
|
|
<code>AddType</code> and <code>AddEncoding</code> commands. These
|
|
commands can appear in <code>.htaccess</code> files, so they must be
|
|
handled in the module's private per-directory data, which in fact
|
|
consists of two separate <code>table</code>s for MIME types and
|
|
encoding information, and is declared as follows:
|
|
|
|
<pre>
typedef struct {
    table *forced_types;      /* Additional AddTyped stuff */
    table *encoding_types;    /* Added with AddEncoding... */
} mime_dir_config;
</pre>
|
|
|
|
When the server is reading a configuration file, or
|
|
<code>&lt;Directory&gt;</code> section, which includes one of the MIME
|
|
module's commands, it needs to create a <code>mime_dir_config</code>
|
|
structure, so those commands have something to act on. It does this
|
|
by invoking the function it finds in the module's `create per-dir
config' slot, with two arguments: the name of the directory to which
|
|
this configuration information applies (or <code>NULL</code> for
|
|
<code>srm.conf</code>), and a pointer to a resource pool in which the
|
|
allocation should happen. <p>
|
|
|
|
(If we are reading a <code>.htaccess</code> file, that resource pool
|
|
is the per-request resource pool for the request; otherwise it is a
|
|
resource pool which is used for configuration data, and cleared on
|
|
restarts. Either way, it is important for the structure being created
|
|
to vanish when the pool is cleared, by registering a cleanup on the
|
|
pool if necessary). <p>
|
|
|
|
For the MIME module, the per-dir config creation function just
|
|
<code>palloc</code>s the structure above, and creates a couple of
|
|
<code>table</code>s to fill it. That looks like this:
|
|
|
|
<pre>
void *create_mime_dir_config (pool *p, char *dummy)
{
    mime_dir_config *new =
      (mime_dir_config *) palloc (p, sizeof(mime_dir_config));

    new->forced_types = make_table (p, 4);
    new->encoding_types = make_table (p, 4);

    return new;
}
</pre>
|
|
|
|
Now, suppose we've just read in a <code>.htaccess</code> file. We
|
|
already have the per-directory configuration structure for the next
|
|
directory up in the hierarchy. If the <code>.htaccess</code> file we
|
|
just read in didn't have any <code>AddType</code> or
|
|
<code>AddEncoding</code> commands, the parent's per-directory config structure
|
|
for the MIME module is still valid, and we can just use it.
|
|
Otherwise, we need to merge the two structures somehow. <p>
|
|
|
|
To do that, the server invokes the module's per-directory config merge
|
|
function, if one is present. That function takes three arguments:
|
|
the two structures being merged, and a resource pool in which to
|
|
allocate the result. For the MIME module, all that needs to be done
is to overlay the tables from the new per-directory config structure on
those from the parent, so that the subdirectory's entries take precedence:
|
|
|
|
<pre>
void *merge_mime_dir_configs (pool *p, void *parent_dirv, void *subdirv)
{
    mime_dir_config *parent_dir = (mime_dir_config *)parent_dirv;
    mime_dir_config *subdir = (mime_dir_config *)subdirv;
    mime_dir_config *new =
      (mime_dir_config *)palloc (p, sizeof(mime_dir_config));

    new->forced_types = overlay_tables (p, subdir->forced_types,
                                        parent_dir->forced_types);
    new->encoding_types = overlay_tables (p, subdir->encoding_types,
                                          parent_dir->encoding_types);

    return new;
}
</pre>
|
|
|
|
As a note --- if there is no per-directory merge function present, the
|
|
server will just use the subdirectory's configuration info, and ignore
|
|
the parent's. For some modules, that works just fine (e.g., for the
|
|
includes module, whose per-directory configuration information
|
|
consists solely of the state of the <code>XBITHACK</code>), and for
|
|
those modules, you can just not declare one, and leave the
|
|
corresponding structure slot in the module itself <code>NULL</code>.<p>
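To make the effect of the merge concrete, here is a small
self-contained sketch of overlay semantics. The <code>toy_</code>
types and functions are hypothetical stand-ins for the server's
<code>table</code> routines, and the sketch assumes that a lookup
returns the first matching entry, which is what makes the
subdirectory's settings win. <p>

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX 16

/* Hypothetical fixed-size table; the real table type lives in a pool. */
typedef struct { const char *key, *val; } toy_entry;
typedef struct { toy_entry e[TOY_MAX]; int n; } toy_table;

/* Set key to val, replacing an existing entry with the same key. */
void toy_set(toy_table *t, const char *key, const char *val)
{
    int i;
    for (i = 0; i < t->n; i++)
        if (strcmp(t->e[i].key, key) == 0) { t->e[i].val = val; return; }
    if (t->n < TOY_MAX) {
        t->e[t->n].key = key;
        t->e[t->n].val = val;
        t->n++;
    }
}

/* Return the value of the first entry matching key, or NULL. */
const char *toy_get(const toy_table *t, const char *key)
{
    int i;
    for (i = 0; i < t->n; i++)
        if (strcmp(t->e[i].key, key) == 0) return t->e[i].val;
    return NULL;
}

/* Build a new table: overlay entries first, then base entries, so that
 * toy_get finds the overlay's value when both define the same key. */
toy_table toy_overlay(const toy_table *overlay, const toy_table *base)
{
    toy_table out;
    int i;
    memset(&out, 0, sizeof out);
    for (i = 0; i < overlay->n && out.n < TOY_MAX; i++)
        out.e[out.n++] = overlay->e[i];
    for (i = 0; i < base->n && out.n < TOY_MAX; i++)
        out.e[out.n++] = base->e[i];
    return out;
}

int overlay_demo(void)
{
    toy_table parent, subdir, merged;
    memset(&parent, 0, sizeof parent);
    memset(&subdir, 0, sizeof subdir);

    toy_set(&parent, "html", "text/html");
    toy_set(&parent, "txt",  "text/plain");
    toy_set(&subdir, "txt",  "text/x-notes");   /* overrides the parent */

    merged = toy_overlay(&subdir, &parent);
    assert(strcmp(toy_get(&merged, "txt"),  "text/x-notes") == 0);
    assert(strcmp(toy_get(&merged, "html"), "text/html") == 0);
    assert(toy_get(&merged, "gif") == NULL);

    /* The parent's own table is left untouched by the merge. */
    assert(strcmp(toy_get(&parent, "txt"), "text/plain") == 0);
    return 0;
}
```

Note that the merge builds a new structure rather than modifying either
input; the parent's configuration must stay intact for other
directories that inherit from it. <p>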
|
|
|
|
<h3><a name="commands">Command handling</a></h3>
|
|
|
|
Now that we have these structures, we need to be able to figure out
|
|
how to fill them. That involves processing the actual
|
|
<code>AddType</code> and <code>AddEncoding</code> commands. To find
|
|
commands, the server looks in the module's <code>command table</code>.
|
|
That table contains information on how many arguments the commands
take, and in what formats, where they are permitted, and so forth. That
|
|
information is sufficient to allow the server to invoke most
|
|
command-handling functions with pre-parsed arguments. Without further
|
|
ado, let's look at the <code>AddType</code> command handler, which
|
|
looks like this (the <code>AddEncoding</code> command looks basically
|
|
the same, and won't be shown here):
|
|
|
|
<pre>
char *add_type(cmd_parms *cmd, mime_dir_config *m, char *ct, char *ext)
{
    if (*ext == '.') ++ext;
    table_set (m->forced_types, ext, ct);
    return NULL;
}
</pre>
|
|
|
|
This command handler is unusually simple. As you can see, it takes
four arguments: a pointer to a <code>cmd_parms</code> structure, the
per-directory configuration structure for the module in question, and
the two pre-parsed arguments. The <code>cmd_parms</code> structure
contains a bunch of arguments which are frequently of
|
|
use to some, but not all, commands, including a resource pool (from
|
|
which memory can be allocated, and to which cleanups should be tied),
|
|
and the (virtual) server being configured, from which the module's
|
|
per-server configuration data can be obtained if required.<p>
|
|
|
|
Another way in which this particular command handler is unusually
|
|
simple is that there are no error conditions which it can encounter.
|
|
If there were, it could return an error message instead of
<code>NULL</code>. In the main config files, this causes an error to be
printed out on the server's <code>stderr</code>, followed by a quick
exit; for a <code>.htaccess</code> file, the syntax
|
|
error is logged in the server error log (along with an indication of
|
|
where it came from), and the request is bounced with a server error
|
|
response (HTTP error status, code 500). <p>
|
|
|
|
The MIME module's command table has entries for these commands, which
|
|
look like this:
|
|
|
|
<pre>
command_rec mime_cmds[] = {
{ "AddType", add_type, NULL, OR_FILEINFO, TAKE2,
    "a mime type followed by a file extension" },
{ "AddEncoding", add_encoding, NULL, OR_FILEINFO, TAKE2,
    "an encoding (e.g., gzip), followed by a file extension" },
{ NULL }
};
</pre>
|
|
|
|
The entries in these tables are:
|
|
|
|
<ul>
|
|
<li> The name of the command
|
|
<li> The function which handles it
|
|
<li> A <code>(void *)</code> pointer, which is passed in the
|
|
<code>cmd_parms</code> structure to the command handler ---
|
|
this is useful in case many similar commands are handled by the
|
|
same function.
|
|
<li> A bit mask indicating where the command may appear. There are
|
|
mask bits corresponding to each <code>AllowOverride</code>
|
|
option, and an additional mask bit, <code>RSRC_CONF</code>,
|
|
indicating that the command may appear in the server's own
|
|
config files, but <em>not</em> in any <code>.htaccess</code>
|
|
file.
|
|
<li> A flag indicating how many arguments the command handler wants
|
|
pre-parsed, and how they should be passed in.
|
|
<code>TAKE2</code> indicates two pre-parsed arguments. Other
|
|
options are <code>TAKE1</code>, which indicates one pre-parsed
|
|
argument, <code>FLAG</code>, which indicates that the argument
|
|
should be <code>On</code> or <code>Off</code>, and is passed in
|
|
as a boolean flag, and <code>RAW_ARGS</code>, which causes the
|
|
server to give the command the raw, unparsed arguments
|
|
(everything but the command name itself). There is also
|
|
<code>ITERATE</code>, which means that the handler looks the
|
|
same as <code>TAKE1</code>, but that if multiple arguments are
|
|
present, it should be called multiple times, and finally
|
|
<code>ITERATE2</code>, which indicates that the command handler
|
|
looks like a <code>TAKE2</code>, but if more arguments are
|
|
present, then it should be called multiple times, holding the
|
|
first argument constant.
|
|
<li> Finally, we have a string which describes the arguments that
|
|
should be present. If the arguments in the actual config file
|
|
are not as required, this string will be used to help give a
|
|
more specific error message. (You can safely leave this
|
|
<code>NULL</code>).
|
|
</ul>
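The argument-passing flags are easiest to see in a toy dispatcher.
The sketch below is an illustration only: the <code>toy_</code> names,
the simplified table layout and the single handler signature are all
assumptions, and it matches command names exactly rather than
following whatever rules the server's own parser uses. It shows how an
<code>ITERATE2</code>-style entry calls its handler once per extra
argument while holding the first argument constant. <p>

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical counterparts of TAKE1, TAKE2, ITERATE and ITERATE2. */
enum toy_how { TOY_TAKE1, TOY_TAKE2, TOY_ITERATE, TOY_ITERATE2 };

typedef const char *(*toy_handler)(void *cfg, const char *a, const char *b);

typedef struct {
    const char  *name;     /* command name */
    toy_handler  func;     /* handler; returns NULL on success */
    enum toy_how how;      /* how arguments are passed */
    const char  *errmsg;   /* shown when the arguments don't fit */
} toy_command;

/* Look up name in cmds and call its handler as the how field directs. */
const char *toy_dispatch(const toy_command *cmds, void *cfg,
                         const char *name, int argc, const char **argv)
{
    const toy_command *c;
    const char *err;
    int i;

    for (c = cmds; c->name != NULL; c++)
        if (strcmp(c->name, name) == 0) break;
    if (c->name == NULL) return "unknown command";

    switch (c->how) {
    case TOY_TAKE1:
        if (argc != 1) return c->errmsg;
        return c->func(cfg, argv[0], NULL);
    case TOY_TAKE2:
        if (argc != 2) return c->errmsg;
        return c->func(cfg, argv[0], argv[1]);
    case TOY_ITERATE:            /* one call per argument */
        if (argc < 1) return c->errmsg;
        for (i = 0; i < argc; i++)
            if ((err = c->func(cfg, argv[i], NULL)) != NULL) return err;
        return NULL;
    case TOY_ITERATE2:           /* first argument held constant */
        if (argc < 2) return c->errmsg;
        for (i = 1; i < argc; i++)
            if ((err = c->func(cfg, argv[0], argv[i])) != NULL) return err;
        return NULL;
    }
    return "bad command table entry";
}

/* Demo handler: counts calls and remembers the last argument pair. */
typedef struct { int calls; const char *first; const char *last; } toy_cfg;

const char *toy_record(void *cfgv, const char *a, const char *b)
{
    toy_cfg *cfg = cfgv;
    cfg->calls++;
    cfg->first = a;
    cfg->last = b;
    return NULL;
}

int dispatch_demo(void)
{
    static const toy_command cmds[] = {
        { "ToyPair",  toy_record, TOY_TAKE2,    "two arguments expected" },
        { "ToyMulti", toy_record, TOY_ITERATE2,
          "a type followed by one or more extensions" },
        { NULL, NULL, TOY_TAKE1, NULL }
    };
    const char *args[] = { "text/html", "html", "htm", "shtml" };
    toy_cfg cfg = { 0, NULL, NULL };

    /* Three extensions, so three calls, each with "text/html" first. */
    assert(toy_dispatch(cmds, &cfg, "ToyMulti", 4, args) == NULL);
    assert(cfg.calls == 3);
    assert(strcmp(cfg.first, "text/html") == 0);
    assert(strcmp(cfg.last, "shtml") == 0);

    /* A TAKE2-style command rejects four arguments without calling in. */
    assert(toy_dispatch(cmds, &cfg, "ToyPair", 4, args) != NULL);
    assert(cfg.calls == 3);
    return 0;
}
```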
|
|
|
|
Finally, having set this all up, we have to use it. This is
|
|
ultimately done in the module's handlers, specifically in its
|
|
file-typing handler, which looks more or less like this; note that the
|
|
per-directory configuration structure is extracted from the
|
|
<code>request_rec</code>'s per-directory configuration vector by using
|
|
the <code>get_module_config</code> function.
|
|
|
|
<pre>
int find_ct(request_rec *r)
{
    int i;
    char *fn = pstrdup (r->pool, r->filename);
    mime_dir_config *conf = (mime_dir_config *)
             get_module_config(r->per_dir_config, &mime_module);
    char *type;

    if (S_ISDIR(r->finfo.st_mode)) {
        r->content_type = DIR_MAGIC_TYPE;
        return OK;
    }

    if((i=rind(fn,'.')) < 0) return DECLINED;
    ++i;

    if ((type = table_get (conf->encoding_types, &fn[i])))
    {
        r->content_encoding = type;

        /* go back to previous extension to try to use it as a type */

        fn[i-1] = '\0';
        if((i=rind(fn,'.')) < 0) return OK;
        ++i;
    }

    if ((type = table_get (conf->forced_types, &fn[i])))
    {
        r->content_type = type;
    }

    return OK;
}
</pre>
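The two-step suffix lookup in the handler above can be tried outside
the server with a stand-alone sketch. The <code>toy_</code> names and
the static lookup arrays are assumptions made for illustration, and
<code>strrchr</code> stands in for the server's <code>rind</code>; the
control flow follows the handler, with a return of 0 playing the role
of <code>DECLINED</code>. <p>

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical suffix map, terminated by an entry with a NULL ext. */
typedef struct { const char *ext, *value; } toy_map;

static const char *toy_lookup(const toy_map *m, const char *ext)
{
    for (; m->ext != NULL; m++)
        if (strcmp(m->ext, ext) == 0) return m->value;
    return NULL;
}

/* Sets *type and *encoding (either may end up NULL).  Returns 0 when the
 * name has no suffix at all (or is too long for the toy buffer), 1
 * otherwise. */
int toy_find_ct(const char *filename,
                const toy_map *types, const toy_map *encodings,
                const char **type, const char **encoding)
{
    char fn[256];
    char *dot;

    *type = *encoding = NULL;
    if (strlen(filename) >= sizeof fn) return 0;
    strcpy(fn, filename);               /* work on a private copy */

    if ((dot = strrchr(fn, '.')) == NULL) return 0;

    if ((*encoding = toy_lookup(encodings, dot + 1)) != NULL) {
        /* Last suffix was an encoding: strip it and try the one before. */
        *dot = '\0';
        if ((dot = strrchr(fn, '.')) == NULL) return 1;
    }

    *type = toy_lookup(types, dot + 1);
    return 1;
}

int find_ct_demo(void)
{
    static const toy_map types[] = {
        { "html", "text/html" }, { "txt", "text/plain" }, { NULL, NULL }
    };
    static const toy_map encodings[] = {
        { "gz", "x-gzip" }, { NULL, NULL }
    };
    const char *type, *enc;

    assert(toy_find_ct("index.html.gz", types, encodings, &type, &enc) == 1);
    assert(strcmp(type, "text/html") == 0 && strcmp(enc, "x-gzip") == 0);

    assert(toy_find_ct("notes.txt", types, encodings, &type, &enc) == 1);
    assert(strcmp(type, "text/plain") == 0 && enc == NULL);

    assert(toy_find_ct("archive.gz", types, encodings, &type, &enc) == 1);
    assert(type == NULL && strcmp(enc, "x-gzip") == 0);

    assert(toy_find_ct("README", types, encodings, &type, &enc) == 0);
    return 0;
}
```

As in the handler, the filename is copied before it is modified, since
truncating the caller's string at the encoding suffix would corrupt
it. <p>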
|
|
|
|
<h3><a name="servconf">Side notes --- per-server configuration, virtual servers, etc.</a></h3>
|
|
|
|
The ideas behind per-server module configuration are basically
|
|
the same as those for per-directory configuration; there is a creation
|
|
function and a merge function, the latter being invoked where a
|
|
virtual server has partially overridden the base server configuration,
|
|
and a combined structure must be computed. (As with per-directory
|
|
configuration, the default if no merge function is specified, and a
|
|
module is configured in some virtual server, is that the base
|
|
configuration is simply ignored). <p>
|
|
|
|
The only substantial difference is that when a command needs to
|
|
configure the per-server private module data, it needs to go to the
|
|
<code>cmd_parms</code> data to get at it. Here's an example, from the
|
|
alias module, which also indicates how a syntax error can be returned
|
|
(note that the per-directory configuration argument to the command
|
|
handler is declared as a dummy, since the module doesn't actually have
|
|
per-directory config data):
|
|
|
|
<pre>
char *add_redirect(cmd_parms *cmd, void *dummy, char *f, char *url)
{
    server_rec *s = cmd->server;
    alias_server_conf *conf = (alias_server_conf *)
            get_module_config(s->module_config,&alias_module);
    alias_entry *new = push_array (conf->redirects);

    if (!is_url (url)) return "Redirect to non-URL";

    new->fake = f; new->real = url;
    return NULL;
}
</pre>
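The route from the command handler to the per-server data can be
sketched in isolation. All of the <code>toy_</code> types below are
hypothetical, and two things are assumptions rather than facts taken
from these notes: that a module's private data is found by indexing a
per-server vector with a per-module slot number, and that a
<code>://</code> test is an adequate stand-in for
<code>is_url</code>. <p>

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_MODULES   4
#define TOY_REDIRECTS 8

/* Hypothetical stand-ins for module, server_rec and cmd_parms. */
typedef struct { int index; } toy_module;
typedef struct { void *module_config[TOY_MODULES]; } toy_server;
typedef struct { toy_server *server; } toy_cmd_parms;

typedef struct { const char *fake, *real; } toy_alias_entry;
typedef struct {
    toy_alias_entry redirects[TOY_REDIRECTS];
    int n;
} toy_alias_conf;

toy_module toy_alias_module = { 2 };   /* slot number chosen arbitrarily */

/* Assumed lookup scheme: one slot per module in a per-server vector. */
void *toy_get_module_config(void **confv, const toy_module *m)
{
    return confv[m->index];
}

/* Crude stand-in for is_url: looks for a "scheme://" separator. */
static int toy_is_url(const char *s)
{
    return strstr(s, "://") != NULL;
}

const char *toy_add_redirect(toy_cmd_parms *cmd, void *dummy,
                             const char *f, const char *url)
{
    toy_alias_conf *conf =
        toy_get_module_config(cmd->server->module_config, &toy_alias_module);

    (void)dummy;                       /* no per-directory data here */

    if (!toy_is_url(url)) return "Redirect to non-URL";
    if (conf->n == TOY_REDIRECTS) return "too many redirects";

    conf->redirects[conf->n].fake = f;
    conf->redirects[conf->n].real = url;
    conf->n++;
    return NULL;
}

int per_server_demo(void)
{
    toy_alias_conf conf;
    toy_server server;
    toy_cmd_parms cmd;

    memset(&conf, 0, sizeof conf);
    memset(&server, 0, sizeof server);
    server.module_config[toy_alias_module.index] = &conf;
    cmd.server = &server;

    assert(toy_add_redirect(&cmd, NULL, "/old",
                            "http://example.com/new") == NULL);
    assert(conf.n == 1);

    /* A bad target yields an error string and records nothing. */
    assert(toy_add_redirect(&cmd, NULL, "/bad", "not-a-url") != NULL);
    assert(conf.n == 1);
    return 0;
}
```

This sketch validates the URL before recording anything, so a rejected
command leaves the configuration unchanged. <p>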
|
|
<!--#include virtual="footer.html" -->
|
|
</body></html>
|