mirror of
				https://github.com/apache/httpd.git
				synced 2025-11-03 17:53:20 +03:00 
			
		
		
		
	git-svn-id: https://svn.apache.org/repos/asf/httpd/httpd/trunk@96194 13f79535-47bb-0310-9956-ffa450edef68
		
			
				
	
	
		
			1246 lines
		
	
	
		
			52 KiB
		
	
	
	
		
			HTML
		
	
	
	
	
	
			
		
		
	
	
			1246 lines
		
	
	
		
			52 KiB
		
	
	
	
		
			HTML
		
	
	
	
	
	
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
 | 
						|
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
 | 
						|
 | 
						|
<html xmlns="http://www.w3.org/1999/xhtml">
 | 
						|
  <head>
 | 
						|
    <meta name="generator" content="HTML Tidy, see www.w3.org" />
 | 
						|
 | 
						|
    <title>Apache API notes</title>
 | 
						|
  </head>
 | 
						|
  <!-- Background white, links blue (unvisited), navy (visited), red (active) -->
 | 
						|
 | 
						|
  <body bgcolor="#FFFFFF" text="#000000" link="#0000FF"
 | 
						|
  vlink="#000080" alink="#FF0000">
 | 
						|
    <!--#include virtual="header.html" -->
 | 
						|
 | 
						|
    <blockquote>
 | 
						|
      <strong>Warning:</strong> This document has not been updated
 | 
						|
      to take into account changes made in the 2.0 version of the
 | 
						|
      Apache HTTP Server. Some of the information may still be
 | 
						|
      relevant, but please use it with care.
 | 
						|
    </blockquote>
 | 
						|
 | 
						|
    <h1 align="center">Apache API notes</h1>
 | 
						|
    These are some notes on the Apache API and the data structures
 | 
						|
    you have to deal with, <em>etc.</em> They are not yet nearly
 | 
						|
    complete, but hopefully, they will help you get your bearings.
 | 
						|
    Keep in mind that the API is still subject to change as we gain
 | 
						|
    experience with it. (See the TODO file for what <em>might</em>
 | 
						|
    be coming). However, it will be easy to adapt modules to any
 | 
						|
    changes that are made. (We have more modules to adapt than you
 | 
						|
    do). 
 | 
						|
 | 
						|
    <p>A few notes on general pedagogical style here. In the
 | 
						|
    interest of conciseness, all structure declarations here are
 | 
						|
    incomplete --- the real ones have more slots that I'm not
 | 
						|
    telling you about. For the most part, these are reserved to one
 | 
						|
    component of the server core or another, and should be altered
 | 
						|
    by modules with caution. However, in some cases, they really
 | 
						|
    are things I just haven't gotten around to yet. Welcome to the
 | 
						|
    bleeding edge.</p>
 | 
						|
 | 
						|
    <p>Finally, here's an outline, to give you some bare idea of
 | 
						|
    what's coming up, and in what order:</p>
 | 
						|
 | 
						|
    <ul>
 | 
						|
      <li>
 | 
						|
        <a href="#basics">Basic concepts.</a> 
 | 
						|
 | 
						|
        <ul>
 | 
						|
          <li><a href="#HMR">Handlers, Modules, and
 | 
						|
          Requests</a></li>
 | 
						|
 | 
						|
          <li><a href="#moduletour">A brief tour of a
 | 
						|
          module</a></li>
 | 
						|
        </ul>
 | 
						|
      </li>
 | 
						|
 | 
						|
      <li>
 | 
						|
        <a href="#handlers">How handlers work</a> 
 | 
						|
 | 
						|
        <ul>
 | 
						|
          <li><a href="#req_tour">A brief tour of the
 | 
						|
          <code>request_rec</code></a></li>
 | 
						|
 | 
						|
          <li><a href="#req_orig">Where request_rec structures come
 | 
						|
          from</a></li>
 | 
						|
 | 
						|
          <li><a href="#req_return">Handling requests, declining,
 | 
						|
          and returning error codes</a></li>
 | 
						|
 | 
						|
          <li><a href="#resp_handlers">Special considerations for
 | 
						|
          response handlers</a></li>
 | 
						|
 | 
						|
          <li><a href="#auth_handlers">Special considerations for
 | 
						|
          authentication handlers</a></li>
 | 
						|
 | 
						|
          <li><a href="#log_handlers">Special considerations for
 | 
						|
          logging handlers</a></li>
 | 
						|
        </ul>
 | 
						|
      </li>
 | 
						|
 | 
						|
      <li><a href="#pools">Resource allocation and resource
 | 
						|
      pools</a></li>
 | 
						|
 | 
						|
      <li>
 | 
						|
        <a href="#config">Configuration, commands and the like</a> 
 | 
						|
 | 
						|
        <ul>
 | 
						|
          <li><a href="#per-dir">Per-directory configuration
 | 
						|
          structures</a></li>
 | 
						|
 | 
						|
          <li><a href="#commands">Command handling</a></li>
 | 
						|
 | 
						|
          <li><a href="#servconf">Side notes --- per-server
 | 
						|
          configuration, virtual servers, <em>etc</em>.</a></li>
 | 
						|
        </ul>
 | 
						|
      </li>
 | 
						|
    </ul>
 | 
						|
 | 
						|
    <h2><a id="basics" name="basics">Basic concepts.</a></h2>
 | 
						|
    We begin with an overview of the basic concepts behind the API,
 | 
						|
    and how they are manifested in the code. 
 | 
						|
 | 
						|
    <h3><a id="HMR" name="HMR">Handlers, Modules, and
 | 
						|
    Requests</a></h3>
 | 
						|
    Apache breaks down request handling into a series of steps,
 | 
						|
    more or less the same way the Netscape server API does
 | 
						|
    (although this API has a few more stages than NetSite does, as
 | 
						|
    hooks for stuff I thought might be useful in the future). These
 | 
						|
    are: 
 | 
						|
 | 
						|
    <ul>
 | 
						|
      <li>URI -> Filename translation</li>
 | 
						|
 | 
						|
      <li>Auth ID checking [is the user who they say they
 | 
						|
      are?]</li>
 | 
						|
 | 
						|
      <li>Auth access checking [is the user authorized
 | 
						|
      <em>here</em>?]</li>
 | 
						|
 | 
						|
      <li>Access checking other than auth</li>
 | 
						|
 | 
						|
      <li>Determining MIME type of the object requested</li>
 | 
						|
 | 
						|
      <li>`Fixups' --- there aren't any of these yet, but the phase
 | 
						|
      is intended as a hook for possible extensions like
 | 
						|
      <code>SetEnv</code>, which don't really fit well
 | 
						|
      elsewhere.</li>
 | 
						|
 | 
						|
      <li>Actually sending a response back to the client.</li>
 | 
						|
 | 
						|
      <li>Logging the request</li>
 | 
						|
    </ul>
 | 
						|
    These phases are handled by looking at each of a succession of
 | 
						|
    <em>modules</em>, looking to see if each of them has a handler
 | 
						|
    for the phase, and attempting invoking it if so. The handler
 | 
						|
    can typically do one of three things: 
 | 
						|
 | 
						|
    <ul>
 | 
						|
      <li><em>Handle</em> the request, and indicate that it has
 | 
						|
      done so by returning the magic constant <code>OK</code>.</li>
 | 
						|
 | 
						|
      <li><em>Decline</em> to handle the request, by returning the
 | 
						|
      magic integer constant <code>DECLINED</code>. In this case,
 | 
						|
      the server behaves in all respects as if the handler simply
 | 
						|
      hadn't been there.</li>
 | 
						|
 | 
						|
      <li>Signal an error, by returning one of the HTTP error
 | 
						|
      codes. This terminates normal handling of the request,
 | 
						|
      although an ErrorDocument may be invoked to try to mop up,
 | 
						|
      and it will be logged in any case.</li>
 | 
						|
    </ul>
 | 
						|
    Most phases are terminated by the first module that handles
 | 
						|
    them; however, for logging, `fixups', and non-access
 | 
						|
    authentication checking, all handlers always run (barring an
 | 
						|
    error). Also, the response phase is unique in that modules may
 | 
						|
    declare multiple handlers for it, via a dispatch table keyed on
 | 
						|
    the MIME type of the requested object. Modules may declare a
 | 
						|
    response-phase handler which can handle <em>any</em> request,
 | 
						|
    by giving it the key <code>*/*</code> (<em>i.e.</em>, a
 | 
						|
    wildcard MIME type specification). However, wildcard handlers
 | 
						|
    are only invoked if the server has already tried and failed to
 | 
						|
    find a more specific response handler for the MIME type of the
 | 
						|
    requested object (either none existed, or they all declined). 
 | 
						|
 | 
						|
    <p>The handlers themselves are functions of one argument (a
 | 
						|
    <code>request_rec</code> structure. vide infra), which returns
 | 
						|
    an integer, as above.</p>
 | 
						|
 | 
						|
    <h3><a id="moduletour" name="moduletour">A brief tour of a
 | 
						|
    module</a></h3>
 | 
						|
    At this point, we need to explain the structure of a module.
 | 
						|
    Our candidate will be one of the messier ones, the CGI module
 | 
						|
    --- this handles both CGI scripts and the
 | 
						|
    <code>ScriptAlias</code> config file command. It's actually a
 | 
						|
    great deal more complicated than most modules, but if we're
 | 
						|
    going to have only one example, it might as well be the one
 | 
						|
    with its fingers in every place. 
 | 
						|
 | 
						|
    <p>Let's begin with handlers. In order to handle the CGI
 | 
						|
    scripts, the module declares a response handler for them.
 | 
						|
    Because of <code>ScriptAlias</code>, it also has handlers for
 | 
						|
    the name translation phase (to recognize
 | 
						|
    <code>ScriptAlias</code>ed URIs), the type-checking phase (any
 | 
						|
    <code>ScriptAlias</code>ed request is typed as a CGI
 | 
						|
    script).</p>
 | 
						|
 | 
						|
    <p>The module needs to maintain some per (virtual) server
 | 
						|
    information, namely, the <code>ScriptAlias</code>es in effect;
 | 
						|
    the module structure therefore contains pointers to a functions
 | 
						|
    which builds these structures, and to another which combines
 | 
						|
    two of them (in case the main server and a virtual server both
 | 
						|
    have <code>ScriptAlias</code>es declared).</p>
 | 
						|
 | 
						|
    <p>Finally, this module contains code to handle the
 | 
						|
    <code>ScriptAlias</code> command itself. This particular module
 | 
						|
    only declares one command, but there could be more, so modules
 | 
						|
    have <em>command tables</em> which declare their commands, and
 | 
						|
    describe where they are permitted, and how they are to be
 | 
						|
    invoked.</p>
 | 
						|
 | 
						|
    <p>A final note on the declared types of the arguments of some
 | 
						|
    of these commands: a <code>pool</code> is a pointer to a
 | 
						|
    <em>resource pool</em> structure; these are used by the server
 | 
						|
    to keep track of the memory which has been allocated, files
 | 
						|
    opened, <em>etc.</em>, either to service a particular request,
 | 
						|
    or to handle the process of configuring itself. That way, when
 | 
						|
    the request is over (or, for the configuration pool, when the
 | 
						|
    server is restarting), the memory can be freed, and the files
 | 
						|
    closed, <em>en masse</em>, without anyone having to write
 | 
						|
    explicit code to track them all down and dispose of them. Also,
 | 
						|
    a <code>cmd_parms</code> structure contains various information
 | 
						|
    about the config file being read, and other status information,
 | 
						|
    which is sometimes of use to the function which processes a
 | 
						|
    config-file command (such as <code>ScriptAlias</code>). With no
 | 
						|
    further ado, the module itself:</p>
 | 
						|
<pre>
 | 
						|
/* Declarations of handlers. */
 | 
						|
 | 
						|
int translate_scriptalias (request_rec *);
 | 
						|
int type_scriptalias (request_rec *);
 | 
						|
int cgi_handler (request_rec *);
 | 
						|
 | 
						|
/* Subsidiary dispatch table for response-phase handlers, by MIME type */
 | 
						|
 | 
						|
handler_rec cgi_handlers[] = {
 | 
						|
{ "application/x-httpd-cgi", cgi_handler },
 | 
						|
{ NULL }
 | 
						|
};
 | 
						|
 | 
						|
/* Declarations of routines to manipulate the module's configuration
 | 
						|
 * info.  Note that these are returned, and passed in, as void *'s;
 | 
						|
 * the server core keeps track of them, but it doesn't, and can't,
 | 
						|
 * know their internal structure.
 | 
						|
 */
 | 
						|
 | 
						|
void *make_cgi_server_config (pool *);
 | 
						|
void *merge_cgi_server_config (pool *, void *, void *);
 | 
						|
 | 
						|
/* Declarations of routines to handle config-file commands */
 | 
						|
 | 
						|
extern char *script_alias(cmd_parms *, void *per_dir_config, char *fake,
 | 
						|
                          char *real);
 | 
						|
 | 
						|
command_rec cgi_cmds[] = {
 | 
						|
{ "ScriptAlias", script_alias, NULL, RSRC_CONF, TAKE2,
 | 
						|
    "a fakename and a realname"},
 | 
						|
{ NULL }
 | 
						|
};
 | 
						|
 | 
						|
module cgi_module = {
 | 
						|
   STANDARD_MODULE_STUFF,
 | 
						|
   NULL,                     /* initializer */
 | 
						|
   NULL,                     /* dir config creator */
 | 
						|
   NULL,                     /* dir merger --- default is to override */
 | 
						|
   make_cgi_server_config,   /* server config */
 | 
						|
   merge_cgi_server_config,  /* merge server config */
 | 
						|
   cgi_cmds,                 /* command table */
 | 
						|
   cgi_handlers,             /* handlers */
 | 
						|
   translate_scriptalias,    /* filename translation */
 | 
						|
   NULL,                     /* check_user_id */
 | 
						|
   NULL,                     /* check auth */
 | 
						|
   NULL,                     /* check access */
 | 
						|
   type_scriptalias,         /* type_checker */
 | 
						|
   NULL,                     /* fixups */
 | 
						|
   NULL,                     /* logger */
 | 
						|
   NULL                      /* header parser */
 | 
						|
};
 | 
						|
</pre>
 | 
						|
 | 
						|
    <h2><a id="handlers" name="handlers">How handlers work</a></h2>
 | 
						|
    The sole argument to handlers is a <code>request_rec</code>
 | 
						|
    structure. This structure describes a particular request which
 | 
						|
    has been made to the server, on behalf of a client. In most
 | 
						|
    cases, each connection to the client generates only one
 | 
						|
    <code>request_rec</code> structure. 
 | 
						|
 | 
						|
    <h3><a id="req_tour" name="req_tour">A brief tour of the
 | 
						|
    <code>request_rec</code></a></h3>
 | 
						|
    The <code>request_rec</code> contains pointers to a resource
 | 
						|
    pool which will be cleared when the server is finished handling
 | 
						|
    the request; to structures containing per-server and
 | 
						|
    per-connection information, and most importantly, information
 | 
						|
    on the request itself. 
 | 
						|
 | 
						|
    <p>The most important such information is a small set of
 | 
						|
    character strings describing attributes of the object being
 | 
						|
    requested, including its URI, filename, content-type and
 | 
						|
    content-encoding (these being filled in by the translation and
 | 
						|
    type-check handlers which handle the request,
 | 
						|
    respectively).</p>
 | 
						|
 | 
						|
    <p>Other commonly used data items are tables giving the MIME
 | 
						|
    headers on the client's original request, MIME headers to be
 | 
						|
    sent back with the response (which modules can add to at will),
 | 
						|
    and environment variables for any subprocesses which are
 | 
						|
    spawned off in the course of servicing the request. These
 | 
						|
    tables are manipulated using the <code>ap_table_get</code> and
 | 
						|
    <code>ap_table_set</code> routines.</p>
 | 
						|
 | 
						|
    <blockquote>
 | 
						|
      Note that the <samp>Content-type</samp> header value
 | 
						|
      <em>cannot</em> be set by module content-handlers using the
 | 
						|
      <samp>ap_table_*()</samp> routines. Rather, it is set by
 | 
						|
      pointing the <samp>content_type</samp> field in the
 | 
						|
      <samp>request_rec</samp> structure to an appropriate string.
 | 
						|
      <em>E.g.</em>, 
 | 
						|
<pre>
 | 
						|
  r->content_type = "text/html";
 | 
						|
</pre>
 | 
						|
    </blockquote>
 | 
						|
    Finally, there are pointers to two data structures which, in
 | 
						|
    turn, point to per-module configuration structures.
 | 
						|
    Specifically, these hold pointers to the data structures which
 | 
						|
    the module has built to describe the way it has been configured
 | 
						|
    to operate in a given directory (via <code>.htaccess</code>
 | 
						|
    files or <code><Directory></code> sections), for private
 | 
						|
    data it has built in the course of servicing the request (so
 | 
						|
    modules' handlers for one phase can pass `notes' to their
 | 
						|
    handlers for other phases). There is another such configuration
 | 
						|
    vector in the <code>server_rec</code> data structure pointed to
 | 
						|
    by the <code>request_rec</code>, which contains per (virtual)
 | 
						|
    server configuration data. 
 | 
						|
 | 
						|
    <p>Here is an abridged declaration, giving the fields most
 | 
						|
    commonly used:</p>
 | 
						|
<pre>
 | 
						|
struct request_rec {
 | 
						|
 | 
						|
  pool *pool;
 | 
						|
  conn_rec *connection;
 | 
						|
  server_rec *server;
 | 
						|
 | 
						|
  /* What object is being requested */
 | 
						|
 | 
						|
  char *uri;
 | 
						|
  char *filename;
 | 
						|
  char *path_info;
 | 
						|
  char *args;           /* QUERY_ARGS, if any */
 | 
						|
  struct stat finfo;    /* Set by server core;
 | 
						|
                         * st_mode set to zero if no such file */
 | 
						|
 | 
						|
  char *content_type;
 | 
						|
  char *content_encoding;
 | 
						|
 | 
						|
  /* MIME header environments, in and out.  Also, an array containing
 | 
						|
   * environment variables to be passed to subprocesses, so people can
 | 
						|
   * write modules to add to that environment.
 | 
						|
   *
 | 
						|
   * The difference between headers_out and err_headers_out is that
 | 
						|
   * the latter are printed even on error, and persist across internal
 | 
						|
   * redirects (so the headers printed for ErrorDocument handlers will
 | 
						|
   * have them).
 | 
						|
   */
 | 
						|
 | 
						|
  table *headers_in;
 | 
						|
  table *headers_out;
 | 
						|
  table *err_headers_out;
 | 
						|
  table *subprocess_env;
 | 
						|
 | 
						|
  /* Info about the request itself... */
 | 
						|
 | 
						|
  int header_only;     /* HEAD request, as opposed to GET */
 | 
						|
  char *protocol;      /* Protocol, as given to us, or HTTP/0.9 */
 | 
						|
  char *method;        /* GET, HEAD, POST, <em>etc.</em> */
 | 
						|
  int method_number;   /* M_GET, M_POST, <em>etc.</em> */
 | 
						|
 | 
						|
  /* Info for logging */
 | 
						|
 | 
						|
  char *the_request;
 | 
						|
  int bytes_sent;
 | 
						|
 | 
						|
  /* A flag which modules can set, to indicate that the data being
 | 
						|
   * returned is volatile, and clients should be told not to cache it.
 | 
						|
   */
 | 
						|
 | 
						|
  int no_cache;
 | 
						|
 | 
						|
  /* Various other config info which may change with .htaccess files
 | 
						|
   * These are config vectors, with one void* pointer for each module
 | 
						|
   * (the thing pointed to being the module's business).
 | 
						|
   */
 | 
						|
 | 
						|
  void *per_dir_config;   /* Options set in config files, <em>etc.</em> */
 | 
						|
  void *request_config;   /* Notes on *this* request */
 | 
						|
 | 
						|
};
 | 
						|
 | 
						|
</pre>
 | 
						|
 | 
						|
    <h3><a id="req_orig" name="req_orig">Where request_rec
 | 
						|
    structures come from</a></h3>
 | 
						|
    Most <code>request_rec</code> structures are built by reading
 | 
						|
    an HTTP request from a client, and filling in the fields.
 | 
						|
    However, there are a few exceptions: 
 | 
						|
 | 
						|
    <ul>
 | 
						|
      <li>If the request is to an imagemap, a type map
 | 
						|
      (<em>i.e.</em>, a <code>*.var</code> file), or a CGI script
 | 
						|
      which returned a local `Location:', then the resource which
 | 
						|
      the user requested is going to be ultimately located by some
 | 
						|
      URI other than what the client originally supplied. In this
 | 
						|
      case, the server does an <em>internal redirect</em>,
 | 
						|
      constructing a new <code>request_rec</code> for the new URI,
 | 
						|
      and processing it almost exactly as if the client had
 | 
						|
      requested the new URI directly.</li>
 | 
						|
 | 
						|
      <li>If some handler signaled an error, and an
 | 
						|
      <code>ErrorDocument</code> is in scope, the same internal
 | 
						|
      redirect machinery comes into play.</li>
 | 
						|
 | 
						|
      <li>
 | 
						|
        Finally, a handler occasionally needs to investigate `what
 | 
						|
        would happen if' some other request were run. For instance,
 | 
						|
        the directory indexing module needs to know what MIME type
 | 
						|
        would be assigned to a request for each directory entry, in
 | 
						|
        order to figure out what icon to use. 
 | 
						|
 | 
						|
        <p>Such handlers can construct a <em>sub-request</em>,
 | 
						|
        using the functions <code>ap_sub_req_lookup_file</code>,
 | 
						|
        <code>ap_sub_req_lookup_uri</code>, and
 | 
						|
        <code>ap_sub_req_method_uri</code>; these construct a new
 | 
						|
        <code>request_rec</code> structure and processes it as you
 | 
						|
        would expect, up to but not including the point of actually
 | 
						|
        sending a response. (These functions skip over the access
 | 
						|
        checks if the sub-request is for a file in the same
 | 
						|
        directory as the original request).</p>
 | 
						|
 | 
						|
        <p>(Server-side includes work by building sub-requests and
 | 
						|
        then actually invoking the response handler for them, via
 | 
						|
        the function <code>ap_run_sub_req</code>).</p>
 | 
						|
      </li>
 | 
						|
    </ul>
 | 
						|
 | 
						|
    <h3><a id="req_return" name="req_return">Handling requests,
 | 
						|
    declining, and returning error codes</a></h3>
 | 
						|
    As discussed above, each handler, when invoked to handle a
 | 
						|
    particular <code>request_rec</code>, has to return an
 | 
						|
    <code>int</code> to indicate what happened. That can either be 
 | 
						|
 | 
						|
    <ul>
 | 
						|
      <li>OK --- the request was handled successfully. This may or
 | 
						|
      may not terminate the phase.</li>
 | 
						|
 | 
						|
      <li>DECLINED --- no erroneous condition exists, but the
 | 
						|
      module declines to handle the phase; the server tries to find
 | 
						|
      another.</li>
 | 
						|
 | 
						|
      <li>an HTTP error code, which aborts handling of the
 | 
						|
      request.</li>
 | 
						|
    </ul>
 | 
						|
    Note that if the error code returned is <code>REDIRECT</code>,
 | 
						|
    then the module should put a <code>Location</code> in the
 | 
						|
    request's <code>headers_out</code>, to indicate where the
 | 
						|
    client should be redirected <em>to</em>. 
 | 
						|
 | 
						|
    <h3><a id="resp_handlers" name="resp_handlers">Special
 | 
						|
    considerations for response handlers</a></h3>
 | 
						|
    Handlers for most phases do their work by simply setting a few
 | 
						|
    fields in the <code>request_rec</code> structure (or, in the
 | 
						|
    case of access checkers, simply by returning the correct error
 | 
						|
    code). However, response handlers have to actually send a
 | 
						|
    request back to the client. 
 | 
						|
 | 
						|
    <p>They should begin by sending an HTTP response header, using
 | 
						|
    the function <code>ap_send_http_header</code>. (You don't have
 | 
						|
    to do anything special to skip sending the header for HTTP/0.9
 | 
						|
    requests; the function figures out on its own that it shouldn't
 | 
						|
    do anything). If the request is marked
 | 
						|
    <code>header_only</code>, that's all they should do; they
 | 
						|
    should return after that, without attempting any further
 | 
						|
    output.</p>
 | 
						|
 | 
						|
    <p>Otherwise, they should produce a request body which responds
 | 
						|
    to the client as appropriate. The primitives for this are
 | 
						|
    <code>ap_rputc</code> and <code>ap_rprintf</code>, for
 | 
						|
    internally generated output, and <code>ap_send_fd</code>, to
 | 
						|
    copy the contents of some <code>FILE *</code> straight to the
 | 
						|
    client.</p>
 | 
						|
 | 
						|
    <p>At this point, you should more or less understand the
 | 
						|
    following piece of code, which is the handler which handles
 | 
						|
    <code>GET</code> requests which have no more specific handler;
 | 
						|
    it also shows how conditional <code>GET</code>s can be handled,
 | 
						|
    if it's desirable to do so in a particular response handler ---
 | 
						|
    <code>ap_set_last_modified</code> checks against the
 | 
						|
    <code>If-modified-since</code> value supplied by the client, if
 | 
						|
    any, and returns an appropriate code (which will, if nonzero,
 | 
						|
    be USE_LOCAL_COPY). No similar considerations apply for
 | 
						|
    <code>ap_set_content_length</code>, but it returns an error
 | 
						|
    code for symmetry.</p>
 | 
						|
<pre>
 | 
						|
int default_handler (request_rec *r)
 | 
						|
{
 | 
						|
    int errstatus;
 | 
						|
    FILE *f;
 | 
						|
 | 
						|
    if (r->method_number != M_GET) return DECLINED;
 | 
						|
    if (r->finfo.st_mode == 0) return NOT_FOUND;
 | 
						|
 | 
						|
    if ((errstatus = ap_set_content_length (r, r->finfo.st_size))
 | 
						|
    || (errstatus = ap_set_last_modified (r, r->finfo.st_mtime)))
 | 
						|
        return errstatus;
 | 
						|
 | 
						|
    f = fopen (r->filename, "r");
 | 
						|
 | 
						|
    if (f == NULL) {
 | 
						|
        log_reason("file permissions deny server access",
 | 
						|
                   r->filename, r);
 | 
						|
        return FORBIDDEN;
 | 
						|
    }
 | 
						|
 | 
						|
    register_timeout ("send", r);
 | 
						|
    ap_send_http_header (r);
 | 
						|
 | 
						|
    if (!r->header_only) send_fd (f, r);
 | 
						|
    ap_pfclose (r->pool, f);
 | 
						|
    return OK;
 | 
						|
}
 | 
						|
</pre>
 | 
						|
    Finally, if all of this is too much of a challenge, there are a
 | 
						|
    few ways out of it. First off, as shown above, a response
 | 
						|
    handler which has not yet produced any output can simply return
 | 
						|
    an error code, in which case the server will automatically
 | 
						|
    produce an error response. Secondly, it can punt to some other
 | 
						|
    handler by invoking <code>ap_internal_redirect</code>, which is
 | 
						|
    how the internal redirection machinery discussed above is
 | 
						|
    invoked. A response handler which has internally redirected
 | 
						|
    should always return <code>OK</code>. 
 | 
						|
 | 
						|
    <p>(Invoking <code>ap_internal_redirect</code> from handlers
 | 
						|
    which are <em>not</em> response handlers will lead to serious
 | 
						|
    confusion).</p>
 | 
						|
 | 
						|
    <h3><a id="auth_handlers" name="auth_handlers">Special
 | 
						|
    considerations for authentication handlers</a></h3>
 | 
						|
    Stuff that should be discussed here in detail: 
 | 
						|
 | 
						|
    <ul>
 | 
						|
      <li>Authentication-phase handlers not invoked unless auth is
 | 
						|
      configured for the directory.</li>
 | 
						|
 | 
						|
      <li>Common auth configuration stored in the core per-dir
 | 
						|
      configuration; it has accessors <code>ap_auth_type</code>,
 | 
						|
      <code>ap_auth_name</code>, and <code>ap_requires</code>.</li>
 | 
						|
 | 
						|
      <li>Common routines, to handle the protocol end of things, at
 | 
						|
      least for HTTP basic authentication
 | 
						|
      (<code>ap_get_basic_auth_pw</code>, which sets the
 | 
						|
      <code>connection->user</code> structure field
 | 
						|
      automatically, and <code>ap_note_basic_auth_failure</code>,
 | 
						|
      which arranges for the proper <code>WWW-Authenticate:</code>
 | 
						|
      header to be sent back).</li>
 | 
						|
    </ul>
 | 
						|
 | 
						|
    <h3><a id="log_handlers" name="log_handlers">Special
 | 
						|
    considerations for logging handlers</a></h3>
 | 
						|
    When a request has internally redirected, there is the question
 | 
						|
    of what to log. Apache handles this by bundling the entire
 | 
						|
    chain of redirects into a list of <code>request_rec</code>
 | 
						|
    structures which are threaded through the
 | 
						|
    <code>r->prev</code> and <code>r->next</code> pointers.
 | 
						|
    The <code>request_rec</code> which is passed to the logging
 | 
						|
    handlers in such cases is the one which was originally built
 | 
						|
    for the initial request from the client; note that the
 | 
						|
    bytes_sent field will only be correct in the last request in
 | 
						|
    the chain (the one for which a response was actually sent). 
 | 
						|
 | 
						|
    <h2><a id="pools" name="pools">Resource allocation and resource
 | 
						|
    pools</a></h2>
 | 
						|
 | 
						|
    <p>One of the problems of writing and designing a server-pool
 | 
						|
    server is that of preventing leakage, that is, allocating
 | 
						|
    resources (memory, open files, <em>etc.</em>), without
 | 
						|
    subsequently releasing them. The resource pool machinery is
 | 
						|
    designed to make it easy to prevent this from happening, by
 | 
						|
    allowing resource to be allocated in such a way that they are
 | 
						|
    <em>automatically</em> released when the server is done with
 | 
						|
    them.</p>
 | 
						|
 | 
						|
    <p>The way this works is as follows: the memory which is
 | 
						|
    allocated, file opened, <em>etc.</em>, to deal with a
 | 
						|
    particular request are tied to a <em>resource pool</em> which
 | 
						|
    is allocated for the request. The pool is a data structure
 | 
						|
    which itself tracks the resources in question.</p>
 | 
						|
 | 
						|
    <p>When the request has been processed, the pool is
 | 
						|
    <em>cleared</em>. At that point, all the memory associated with
 | 
						|
    it is released for reuse, all files associated with it are
 | 
						|
    closed, and any other clean-up functions which are associated
 | 
						|
    with the pool are run. When this is over, we can be confident
 | 
						|
    that all the resource tied to the pool have been released, and
 | 
						|
    that none of them have leaked.</p>
 | 
						|
 | 
						|
    <p>Server restarts, and allocation of memory and resources for
 | 
						|
    per-server configuration, are handled in a similar way. There
 | 
						|
    is a <em>configuration pool</em>, which keeps track of
 | 
						|
    resources which were allocated while reading the server
 | 
						|
    configuration files, and handling the commands therein (for
 | 
						|
    instance, the memory that was allocated for per-server module
 | 
						|
    configuration, log files and other files that were opened, and
 | 
						|
    so forth). When the server restarts, and has to reread the
 | 
						|
    configuration files, the configuration pool is cleared, and so
 | 
						|
    the memory and file descriptors which were taken up by reading
 | 
						|
    them the last time are made available for reuse.</p>
 | 
						|
 | 
						|
    <p>It should be noted that use of the pool machinery isn't
 | 
						|
    generally obligatory, except for situations like logging
 | 
						|
    handlers, where you really need to register cleanups to make
 | 
						|
    sure that the log file gets closed when the server restarts
 | 
						|
    (this is most easily done by using the function <code><a
 | 
						|
    href="#pool-files">ap_pfopen</a></code>, which also arranges
 | 
						|
    for the underlying file descriptor to be closed before any
 | 
						|
    child processes, such as for CGI scripts, are
 | 
						|
    <code>exec</code>ed), or in case you are using the timeout
 | 
						|
    machinery (which isn't yet even documented here). However,
 | 
						|
    there are two benefits to using it: resources allocated to a
 | 
						|
    pool never leak (even if you allocate a scratch string, and
 | 
						|
    just forget about it); also, for memory allocation,
 | 
						|
    <code>ap_palloc</code> is generally faster than
 | 
						|
    <code>malloc</code>.</p>
 | 
						|
 | 
						|
    <p>We begin here by describing how memory is allocated to
 | 
						|
    pools, and then discuss how other resources are tracked by the
 | 
						|
    resource pool machinery.</p>
 | 
						|
 | 
						|
    <h3>Allocation of memory in pools</h3>
 | 
						|
 | 
						|
    <p>Memory is allocated to pools by calling the function
 | 
						|
    <code>ap_palloc</code>, which takes two arguments, one being a
 | 
						|
    pointer to a resource pool structure, and the other being the
 | 
						|
    amount of memory to allocate (in <code>char</code>s). Within
 | 
						|
    handlers for handling requests, the most common way of getting
 | 
						|
    a resource pool structure is by looking at the
 | 
						|
    <code>pool</code> slot of the relevant
 | 
						|
    <code>request_rec</code>; hence the repeated appearance of the
 | 
						|
    following idiom in module code:</p>
 | 
						|
<pre>
 | 
						|
int my_handler(request_rec *r)
 | 
						|
{
 | 
						|
    struct my_structure *foo;
 | 
						|
    ...
 | 
						|
 | 
						|
    foo = (foo *)ap_palloc (r->pool, sizeof(my_structure));
 | 
						|
}
 | 
						|
</pre>
 | 
						|
 | 
						|
    <p>Note that <em>there is no <code>ap_pfree</code></em> ---
 | 
						|
    <code>ap_palloc</code>ed memory is freed only when the
 | 
						|
    associated resource pool is cleared. This means that
 | 
						|
    <code>ap_palloc</code> does not have to do as much accounting
 | 
						|
    as <code>malloc()</code>; all it does in the typical case is to
 | 
						|
    round up the size, bump a pointer, and do a range check.</p>
 | 
						|
 | 
						|
    <p>(It also raises the possibility that heavy use of
 | 
						|
    <code>ap_palloc</code> could cause a server process to grow
 | 
						|
    excessively large. There are two ways to deal with this, which
 | 
						|
    are dealt with below; briefly, you can use <code>malloc</code>,
 | 
						|
    and try to be sure that all of the memory gets explicitly
 | 
						|
    <code>free</code>d, or you can allocate a sub-pool of the main
 | 
						|
    pool, allocate your memory in the sub-pool, and clear it out
 | 
						|
    periodically. The latter technique is discussed in the section
 | 
						|
    on sub-pools below, and is used in the directory-indexing code,
 | 
						|
    in order to avoid excessive storage allocation when listing
 | 
						|
    directories with thousands of files).</p>
 | 
						|
 | 
						|
    <h3>Allocating initialized memory</h3>
 | 
						|
 | 
						|
    <p>There are functions which allocate initialized memory, and
 | 
						|
    are frequently useful. The function <code>ap_pcalloc</code> has
 | 
						|
    the same interface as <code>ap_palloc</code>, but clears out
 | 
						|
    the memory it allocates before it returns it. The function
 | 
						|
    <code>ap_pstrdup</code> takes a resource pool and a <code>char
 | 
						|
    *</code> as arguments, and allocates memory for a copy of the
 | 
						|
    string the pointer points to, returning a pointer to the copy.
 | 
						|
    Finally <code>ap_pstrcat</code> is a varargs-style function,
 | 
						|
    which takes a pointer to a resource pool, and at least two
 | 
						|
    <code>char *</code> arguments, the last of which must be
 | 
						|
    <code>NULL</code>. It allocates enough memory to fit copies of
 | 
						|
    each of the strings, as a unit; for instance:</p>
 | 
						|
<pre>
 | 
						|
     ap_pstrcat (r->pool, "foo", "/", "bar", NULL);
 | 
						|
</pre>
 | 
						|
 | 
						|
    <p>returns a pointer to 8 bytes worth of memory, initialized to
 | 
						|
    <code>"foo/bar"</code>.</p>
 | 
						|
 | 
						|
    <h3><a id="pools-used" name="pools-used">Commonly-used pools in
 | 
						|
    the Apache Web server</a></h3>
 | 
						|
 | 
						|
    <p>A pool is really defined by its lifetime more than anything
 | 
						|
    else. There are some static pools in http_main which are passed
 | 
						|
    to various non-http_main functions as arguments at opportune
 | 
						|
    times. Here they are:</p>
 | 
						|
 | 
						|
    <dl compact="compact">
 | 
						|
      <dt>permanent_pool</dt>
 | 
						|
 | 
						|
      <dd>
 | 
						|
        <ul>
 | 
						|
          <li>never passed to anything else, this is the ancestor
 | 
						|
          of all pools</li>
 | 
						|
        </ul>
 | 
						|
      </dd>
 | 
						|
 | 
						|
      <dt>pconf</dt>
 | 
						|
 | 
						|
      <dd>
 | 
						|
        <ul>
 | 
						|
          <li>subpool of permanent_pool</li>
 | 
						|
 | 
						|
          <li>created at the beginning of a config "cycle"; exists
 | 
						|
          until the server is terminated or restarts; passed to all
 | 
						|
          config-time routines, either via cmd->pool, or as the
 | 
						|
          "pool *p" argument on those which don't take pools</li>
 | 
						|
 | 
						|
          <li>passed to the module init() functions</li>
 | 
						|
        </ul>
 | 
						|
      </dd>
 | 
						|
 | 
						|
      <dt>ptemp</dt>
 | 
						|
 | 
						|
      <dd>
 | 
						|
        <ul>
 | 
						|
          <li>sorry I lie, this pool isn't called this currently in
 | 
						|
          1.3, I renamed it this in my pthreads development. I'm
 | 
						|
          referring to the use of ptrans in the parent... contrast
 | 
						|
          this with the later definition of ptrans in the
 | 
						|
          child.</li>
 | 
						|
 | 
						|
          <li>subpool of permanent_pool</li>
 | 
						|
 | 
						|
          <li>created at the beginning of a config "cycle"; exists
 | 
						|
          until the end of config parsing; passed to config-time
 | 
						|
          routines <em>via</em> cmd->temp_pool. Somewhat of a
 | 
						|
          "bastard child" because it isn't available everywhere.
 | 
						|
          Used for temporary scratch space which may be needed by
 | 
						|
          some config routines but which is deleted at the end of
 | 
						|
          config.</li>
 | 
						|
        </ul>
 | 
						|
      </dd>
 | 
						|
 | 
						|
      <dt>pchild</dt>
 | 
						|
 | 
						|
      <dd>
 | 
						|
        <ul>
 | 
						|
          <li>subpool of permanent_pool</li>
 | 
						|
 | 
						|
          <li>created when a child is spawned (or a thread is
 | 
						|
          created); lives until that child (thread) is
 | 
						|
          destroyed</li>
 | 
						|
 | 
						|
          <li>passed to the module child_init functions</li>
 | 
						|
 | 
						|
          <li>destruction happens right after the child_exit
 | 
						|
          functions are called... (which may explain why I think
 | 
						|
          child_exit is redundant and unneeded)</li>
 | 
						|
        </ul>
 | 
						|
      </dd>
 | 
						|
 | 
						|
      <dt>ptrans</dt>
 | 
						|
 | 
						|
      <dd>
 | 
						|
        <ul>
 | 
						|
          <li>should be a subpool of pchild, but currently is a
 | 
						|
          subpool of permanent_pool, see above</li>
 | 
						|
 | 
						|
          <li>cleared by the child before going into the accept()
 | 
						|
          loop to receive a connection</li>
 | 
						|
 | 
						|
          <li>used as connection->pool</li>
 | 
						|
        </ul>
 | 
						|
      </dd>
 | 
						|
 | 
						|
      <dt>r->pool</dt>
 | 
						|
 | 
						|
      <dd>
 | 
						|
        <ul>
 | 
						|
          <li>for the main request this is a subpool of
 | 
						|
          connection->pool; for subrequests it is a subpool of
 | 
						|
          the parent request's pool.</li>
 | 
						|
 | 
						|
          <li>exists until the end of the request (<em>i.e.</em>,
 | 
						|
          ap_destroy_sub_req, or in child_main after
 | 
						|
          process_request has finished)</li>
 | 
						|
 | 
						|
          <li>note that r itself is allocated from r->pool;
 | 
						|
          <em>i.e.</em>, r->pool is first created and then r is
 | 
						|
          the first thing palloc()d from it</li>
 | 
						|
        </ul>
 | 
						|
      </dd>
 | 
						|
    </dl>
 | 
						|
 | 
						|
    <p>For almost everything folks do, r->pool is the pool to
 | 
						|
    use. But you can see how other lifetimes, such as pchild, are
 | 
						|
    useful to some modules... such as modules that need to open a
 | 
						|
    database connection once per child, and wish to clean it up
 | 
						|
    when the child dies.</p>
 | 
						|
 | 
						|
    <p>You can also see how some bugs have manifested themself,
 | 
						|
    such as setting connection->user to a value from r->pool
 | 
						|
    -- in this case connection exists for the lifetime of ptrans,
 | 
						|
    which is longer than r->pool (especially if r->pool is a
 | 
						|
    subrequest!). So the correct thing to do is to allocate from
 | 
						|
    connection->pool.</p>
 | 
						|
 | 
						|
    <p>And there was another interesting bug in
 | 
						|
    mod_include/mod_cgi. You'll see in those that they do this test
 | 
						|
    to decide if they should use r->pool or r->main->pool.
 | 
						|
    In this case the resource that they are registering for cleanup
 | 
						|
    is a child process. If it were registered in r->pool, then
 | 
						|
    the code would wait() for the child when the subrequest
 | 
						|
    finishes. With mod_include this could be any old #include, and
 | 
						|
    the delay can be up to 3 seconds... and happened quite
 | 
						|
    frequently. Instead the subprocess is registered in
 | 
						|
    r->main->pool which causes it to be cleaned up when the
 | 
						|
    entire request is done -- <em>i.e.</em>, after the output has
 | 
						|
    been sent to the client and logging has happened.</p>
 | 
						|
 | 
						|
    <h3><a id="pool-files" name="pool-files">Tracking open files,
 | 
						|
    etc.</a></h3>
 | 
						|
 | 
						|
    <p>As indicated above, resource pools are also used to track
 | 
						|
    other sorts of resources besides memory. The most common are
 | 
						|
    open files. The routine which is typically used for this is
 | 
						|
    <code>ap_pfopen</code>, which takes a resource pool and two
 | 
						|
    strings as arguments; the strings are the same as the typical
 | 
						|
    arguments to <code>fopen</code>, <em>e.g.</em>,</p>
 | 
						|
<pre>
 | 
						|
     ...
 | 
						|
     FILE *f = ap_pfopen (r->pool, r->filename, "r");
 | 
						|
 | 
						|
     if (f == NULL) { ... } else { ... }
 | 
						|
</pre>
 | 
						|
 | 
						|
    <p>There is also a <code>ap_popenf</code> routine, which
 | 
						|
    parallels the lower-level <code>open</code> system call. Both
 | 
						|
    of these routines arrange for the file to be closed when the
 | 
						|
    resource pool in question is cleared.</p>
 | 
						|
 | 
						|
    <p>Unlike the case for memory, there <em>are</em> functions to
 | 
						|
    close files allocated with <code>ap_pfopen</code>, and
 | 
						|
    <code>ap_popenf</code>, namely <code>ap_pfclose</code> and
 | 
						|
    <code>ap_pclosef</code>. (This is because, on many systems, the
 | 
						|
    number of files which a single process can have open is quite
 | 
						|
    limited). It is important to use these functions to close files
 | 
						|
    allocated with <code>ap_pfopen</code> and
 | 
						|
    <code>ap_popenf</code>, since to do otherwise could cause fatal
 | 
						|
    errors on systems such as Linux, which react badly if the same
 | 
						|
    <code>FILE*</code> is closed more than once.</p>
 | 
						|
 | 
						|
    <p>(Using the <code>close</code> functions is not mandatory,
 | 
						|
    since the file will eventually be closed regardless, but you
 | 
						|
    should consider it in cases where your module is opening, or
 | 
						|
    could open, a lot of files).</p>
 | 
						|
 | 
						|
    <h3>Other sorts of resources --- cleanup functions</h3>
 | 
						|
 | 
						|
    <blockquote>
 | 
						|
      More text goes here. Describe the the cleanup primitives in
 | 
						|
      terms of which the file stuff is implemented; also,
 | 
						|
      <code>spawn_process</code>.
 | 
						|
    </blockquote>
 | 
						|
 | 
						|
    <p>Pool cleanups live until clear_pool() is called:
 | 
						|
    clear_pool(a) recursively calls destroy_pool() on all subpools
 | 
						|
    of a; then calls all the cleanups for a; then releases all the
 | 
						|
    memory for a. destroy_pool(a) calls clear_pool(a) and then
 | 
						|
    releases the pool structure itself. <em>i.e.</em>,
 | 
						|
    clear_pool(a) doesn't delete a, it just frees up all the
 | 
						|
    resources and you can start using it again immediately.</p>
 | 
						|
 | 
						|
    <h3>Fine control --- creating and dealing with sub-pools, with
 | 
						|
    a note on sub-requests</h3>
 | 
						|
    On rare occasions, too-free use of <code>ap_palloc()</code> and
 | 
						|
    the associated primitives may result in undesirably profligate
 | 
						|
    resource allocation. You can deal with such a case by creating
 | 
						|
    a <em>sub-pool</em>, allocating within the sub-pool rather than
 | 
						|
    the main pool, and clearing or destroying the sub-pool, which
 | 
						|
    releases the resources which were associated with it. (This
 | 
						|
    really <em>is</em> a rare situation; the only case in which it
 | 
						|
    comes up in the standard module set is in case of listing
 | 
						|
    directories, and then only with <em>very</em> large
 | 
						|
    directories. Unnecessary use of the primitives discussed here
 | 
						|
    can hair up your code quite a bit, with very little gain). 
 | 
						|
 | 
						|
    <p>The primitive for creating a sub-pool is
 | 
						|
    <code>ap_make_sub_pool</code>, which takes another pool (the
 | 
						|
    parent pool) as an argument. When the main pool is cleared, the
 | 
						|
    sub-pool will be destroyed. The sub-pool may also be cleared or
 | 
						|
    destroyed at any time, by calling the functions
 | 
						|
    <code>ap_clear_pool</code> and <code>ap_destroy_pool</code>,
 | 
						|
    respectively. (The difference is that
 | 
						|
    <code>ap_clear_pool</code> frees resources associated with the
 | 
						|
    pool, while <code>ap_destroy_pool</code> also deallocates the
 | 
						|
    pool itself. In the former case, you can allocate new resources
 | 
						|
    within the pool, and clear it again, and so forth; in the
 | 
						|
    latter case, it is simply gone).</p>
 | 
						|
 | 
						|
    <p>One final note --- sub-requests have their own resource
 | 
						|
    pools, which are sub-pools of the resource pool for the main
 | 
						|
    request. The polite way to reclaim the resources associated
 | 
						|
    with a sub request which you have allocated (using the
 | 
						|
    <code>ap_sub_req_...</code> functions) is
 | 
						|
    <code>ap_destroy_sub_req</code>, which frees the resource pool.
 | 
						|
    Before calling this function, be sure to copy anything that you
 | 
						|
    care about which might be allocated in the sub-request's
 | 
						|
    resource pool into someplace a little less volatile (for
 | 
						|
    instance, the filename in its <code>request_rec</code>
 | 
						|
    structure).</p>
 | 
						|
 | 
						|
    <p>(Again, under most circumstances, you shouldn't feel obliged
 | 
						|
    to call this function; only 2K of memory or so are allocated
 | 
						|
    for a typical sub request, and it will be freed anyway when the
 | 
						|
    main request pool is cleared. It is only when you are
 | 
						|
    allocating many, many sub-requests for a single main request
 | 
						|
    that you should seriously consider the
 | 
						|
    <code>ap_destroy_...</code> functions).</p>
 | 
						|
 | 
						|
    <h2><a id="config" name="config">Configuration, commands and
 | 
						|
    the like</a></h2>
 | 
						|
    One of the design goals for this server was to maintain
 | 
						|
    external compatibility with the NCSA 1.3 server --- that is, to
 | 
						|
    read the same configuration files, to process all the
 | 
						|
    directives therein correctly, and in general to be a drop-in
 | 
						|
    replacement for NCSA. On the other hand, another design goal
 | 
						|
    was to move as much of the server's functionality into modules
 | 
						|
    which have as little as possible to do with the monolithic
 | 
						|
    server core. The only way to reconcile these goals is to move
 | 
						|
    the handling of most commands from the central server into the
 | 
						|
    modules. 
 | 
						|
 | 
						|
    <p>However, just giving the modules command tables is not
 | 
						|
    enough to divorce them completely from the server core. The
 | 
						|
    server has to remember the commands in order to act on them
 | 
						|
    later. That involves maintaining data which is private to the
 | 
						|
    modules, and which can be either per-server, or per-directory.
 | 
						|
    Most things are per-directory, including in particular access
 | 
						|
    control and authorization information, but also information on
 | 
						|
    how to determine file types from suffixes, which can be
 | 
						|
    modified by <code>AddType</code> and <code>DefaultType</code>
 | 
						|
    directives, and so forth. In general, the governing philosophy
 | 
						|
    is that anything which <em>can</em> be made configurable by
 | 
						|
    directory should be; per-server information is generally used
 | 
						|
    in the standard set of modules for information like
 | 
						|
    <code>Alias</code>es and <code>Redirect</code>s which come into
 | 
						|
    play before the request is tied to a particular place in the
 | 
						|
    underlying file system.</p>
 | 
						|
 | 
						|
    <p>Another requirement for emulating the NCSA server is being
 | 
						|
    able to handle the per-directory configuration files, generally
 | 
						|
    called <code>.htaccess</code> files, though even in the NCSA
 | 
						|
    server they can contain directives which have nothing at all to
 | 
						|
    do with access control. Accordingly, after URI -> filename
 | 
						|
    translation, but before performing any other phase, the server
 | 
						|
    walks down the directory hierarchy of the underlying
 | 
						|
    filesystem, following the translated pathname, to read any
 | 
						|
    <code>.htaccess</code> files which might be present. The
 | 
						|
    information which is read in then has to be <em>merged</em>
 | 
						|
    with the applicable information from the server's own config
 | 
						|
    files (either from the <code><Directory></code> sections
 | 
						|
    in <code>access.conf</code>, or from defaults in
 | 
						|
    <code>srm.conf</code>, which actually behaves for most purposes
 | 
						|
    almost exactly like <code><Directory /></code>).</p>
 | 
						|
 | 
						|
    <p>Finally, after having served a request which involved
 | 
						|
    reading <code>.htaccess</code> files, we need to discard the
 | 
						|
    storage allocated for handling them. That is solved the same
 | 
						|
    way it is solved wherever else similar problems come up, by
 | 
						|
    tying those structures to the per-transaction resource
 | 
						|
    pool.</p>
 | 
						|
 | 
						|
    <h3><a id="per-dir" name="per-dir">Per-directory configuration
 | 
						|
    structures</a></h3>
 | 
						|
    Let's look out how all of this plays out in
 | 
						|
    <code>mod_mime.c</code>, which defines the file typing handler
 | 
						|
    which emulates the NCSA server's behavior of determining file
 | 
						|
    types from suffixes. What we'll be looking at, here, is the
 | 
						|
    code which implements the <code>AddType</code> and
 | 
						|
    <code>AddEncoding</code> commands. These commands can appear in
 | 
						|
    <code>.htaccess</code> files, so they must be handled in the
 | 
						|
    module's private per-directory data, which in fact, consists of
 | 
						|
    two separate <code>table</code>s for MIME types and encoding
 | 
						|
    information, and is declared as follows: 
 | 
						|
<pre>
 | 
						|
typedef struct {
 | 
						|
    table *forced_types;      /* Additional AddTyped stuff */
 | 
						|
    table *encoding_types;    /* Added with AddEncoding... */
 | 
						|
} mime_dir_config;
 | 
						|
</pre>
 | 
						|
    When the server is reading a configuration file, or
 | 
						|
    <code><Directory></code> section, which includes one of
 | 
						|
    the MIME module's commands, it needs to create a
 | 
						|
    <code>mime_dir_config</code> structure, so those commands have
 | 
						|
    something to act on. It does this by invoking the function it
 | 
						|
    finds in the module's `create per-dir config slot', with two
 | 
						|
    arguments: the name of the directory to which this
 | 
						|
    configuration information applies (or <code>NULL</code> for
 | 
						|
    <code>srm.conf</code>), and a pointer to a resource pool in
 | 
						|
    which the allocation should happen. 
 | 
						|
 | 
						|
    <p>(If we are reading a <code>.htaccess</code> file, that
 | 
						|
    resource pool is the per-request resource pool for the request;
 | 
						|
    otherwise it is a resource pool which is used for configuration
 | 
						|
    data, and cleared on restarts. Either way, it is important for
 | 
						|
    the structure being created to vanish when the pool is cleared,
 | 
						|
    by registering a cleanup on the pool if necessary).</p>
 | 
						|
 | 
						|
    <p>For the MIME module, the per-dir config creation function
 | 
						|
    just <code>ap_palloc</code>s the structure above, and a creates
 | 
						|
    a couple of <code>table</code>s to fill it. That looks like
 | 
						|
    this:</p>
 | 
						|
<pre>
 | 
						|
void *create_mime_dir_config (pool *p, char *dummy)
 | 
						|
{
 | 
						|
    mime_dir_config *new =
 | 
						|
      (mime_dir_config *) ap_palloc (p, sizeof(mime_dir_config));
 | 
						|
 | 
						|
    new->forced_types = ap_make_table (p, 4);
 | 
						|
    new->encoding_types = ap_make_table (p, 4);
 | 
						|
 | 
						|
    return new;
 | 
						|
}
 | 
						|
</pre>
 | 
						|
    Now, suppose we've just read in a <code>.htaccess</code> file.
 | 
						|
    We already have the per-directory configuration structure for
 | 
						|
    the next directory up in the hierarchy. If the
 | 
						|
    <code>.htaccess</code> file we just read in didn't have any
 | 
						|
    <code>AddType</code> or <code>AddEncoding</code> commands, its
 | 
						|
    per-directory config structure for the MIME module is still
 | 
						|
    valid, and we can just use it. Otherwise, we need to merge the
 | 
						|
    two structures somehow. 
 | 
						|
 | 
						|
    <p>To do that, the server invokes the module's per-directory
 | 
						|
    config merge function, if one is present. That function takes
 | 
						|
    three arguments: the two structures being merged, and a
 | 
						|
    resource pool in which to allocate the result. For the MIME
 | 
						|
    module, all that needs to be done is overlay the tables from
 | 
						|
    the new per-directory config structure with those from the
 | 
						|
    parent:</p>
 | 
						|
<pre>
 | 
						|
void *merge_mime_dir_configs (pool *p, void *parent_dirv, void *subdirv)
 | 
						|
{
 | 
						|
    mime_dir_config *parent_dir = (mime_dir_config *)parent_dirv;
 | 
						|
    mime_dir_config *subdir = (mime_dir_config *)subdirv;
 | 
						|
    mime_dir_config *new =
 | 
						|
      (mime_dir_config *)ap_palloc (p, sizeof(mime_dir_config));
 | 
						|
 | 
						|
    new->forced_types = ap_overlay_tables (p, subdir->forced_types,
 | 
						|
                                        parent_dir->forced_types);
 | 
						|
    new->encoding_types = ap_overlay_tables (p, subdir->encoding_types,
 | 
						|
                                          parent_dir->encoding_types);
 | 
						|
 | 
						|
    return new;
 | 
						|
}
 | 
						|
</pre>
 | 
						|
    As a note --- if there is no per-directory merge function
 | 
						|
    present, the server will just use the subdirectory's
 | 
						|
    configuration info, and ignore the parent's. For some modules,
 | 
						|
    that works just fine (<em>e.g.</em>, for the includes module,
 | 
						|
    whose per-directory configuration information consists solely
 | 
						|
    of the state of the <code>XBITHACK</code>), and for those
 | 
						|
    modules, you can just not declare one, and leave the
 | 
						|
    corresponding structure slot in the module itself
 | 
						|
    <code>NULL</code>. 
 | 
						|
 | 
						|
    <h3><a id="commands" name="commands">Command handling</a></h3>
 | 
						|
    Now that we have these structures, we need to be able to figure
 | 
						|
    out how to fill them. That involves processing the actual
 | 
						|
    <code>AddType</code> and <code>AddEncoding</code> commands. To
 | 
						|
    find commands, the server looks in the module's <code>command
 | 
						|
    table</code>. That table contains information on how many
 | 
						|
    arguments the commands take, and in what formats, where it is
 | 
						|
    permitted, and so forth. That information is sufficient to
 | 
						|
    allow the server to invoke most command-handling functions with
 | 
						|
    pre-parsed arguments. Without further ado, let's look at the
 | 
						|
    <code>AddType</code> command handler, which looks like this
 | 
						|
    (the <code>AddEncoding</code> command looks basically the same,
 | 
						|
    and won't be shown here): 
 | 
						|
<pre>
 | 
						|
char *add_type(cmd_parms *cmd, mime_dir_config *m, char *ct, char *ext)
 | 
						|
{
 | 
						|
    if (*ext == '.') ++ext;
 | 
						|
    ap_table_set (m->forced_types, ext, ct);
 | 
						|
    return NULL;
 | 
						|
}
 | 
						|
</pre>
 | 
						|
    This command handler is unusually simple. As you can see, it
 | 
						|
    takes four arguments, two of which are pre-parsed arguments,
 | 
						|
    the third being the per-directory configuration structure for
 | 
						|
    the module in question, and the fourth being a pointer to a
 | 
						|
    <code>cmd_parms</code> structure. That structure contains a
 | 
						|
    bunch of arguments which are frequently of use to some, but not
 | 
						|
    all, commands, including a resource pool (from which memory can
 | 
						|
    be allocated, and to which cleanups should be tied), and the
 | 
						|
    (virtual) server being configured, from which the module's
 | 
						|
    per-server configuration data can be obtained if required. 
 | 
						|
 | 
						|
    <p>Another way in which this particular command handler is
 | 
						|
    unusually simple is that there are no error conditions which it
 | 
						|
    can encounter. If there were, it could return an error message
 | 
						|
    instead of <code>NULL</code>; this causes an error to be
 | 
						|
    printed out on the server's <code>stderr</code>, followed by a
 | 
						|
    quick exit, if it is in the main config files; for a
 | 
						|
    <code>.htaccess</code> file, the syntax error is logged in the
 | 
						|
    server error log (along with an indication of where it came
 | 
						|
    from), and the request is bounced with a server error response
 | 
						|
    (HTTP error status, code 500).</p>
 | 
						|
 | 
						|
    <p>The MIME module's command table has entries for these
 | 
						|
    commands, which look like this:</p>
 | 
						|
<pre>
 | 
						|
command_rec mime_cmds[] = {
 | 
						|
{ "AddType", add_type, NULL, OR_FILEINFO, TAKE2,
 | 
						|
    "a mime type followed by a file extension" },
 | 
						|
{ "AddEncoding", add_encoding, NULL, OR_FILEINFO, TAKE2,
 | 
						|
    "an encoding (<em>e.g.</em>, gzip), followed by a file extension" },
 | 
						|
{ NULL }
 | 
						|
};
 | 
						|
</pre>
 | 
						|
    The entries in these tables are: 
 | 
						|
 | 
						|
    <ul>
 | 
						|
      <li>The name of the command</li>
 | 
						|
 | 
						|
      <li>The function which handles it</li>
 | 
						|
 | 
						|
      <li>a <code>(void *)</code> pointer, which is passed in the
 | 
						|
      <code>cmd_parms</code> structure to the command handler ---
 | 
						|
      this is useful in case many similar commands are handled by
 | 
						|
      the same function.</li>
 | 
						|
 | 
						|
      <li>A bit mask indicating where the command may appear. There
 | 
						|
      are mask bits corresponding to each
 | 
						|
      <code>AllowOverride</code> option, and an additional mask
 | 
						|
      bit, <code>RSRC_CONF</code>, indicating that the command may
 | 
						|
      appear in the server's own config files, but <em>not</em> in
 | 
						|
      any <code>.htaccess</code> file.</li>
 | 
						|
 | 
						|
      <li>A flag indicating how many arguments the command handler
 | 
						|
      wants pre-parsed, and how they should be passed in.
 | 
						|
      <code>TAKE2</code> indicates two pre-parsed arguments. Other
 | 
						|
      options are <code>TAKE1</code>, which indicates one
 | 
						|
      pre-parsed argument, <code>FLAG</code>, which indicates that
 | 
						|
      the argument should be <code>On</code> or <code>Off</code>,
 | 
						|
      and is passed in as a boolean flag, <code>RAW_ARGS</code>,
 | 
						|
      which causes the server to give the command the raw, unparsed
 | 
						|
      arguments (everything but the command name itself). There is
 | 
						|
      also <code>ITERATE</code>, which means that the handler looks
 | 
						|
      the same as <code>TAKE1</code>, but that if multiple
 | 
						|
      arguments are present, it should be called multiple times,
 | 
						|
      and finally <code>ITERATE2</code>, which indicates that the
 | 
						|
      command handler looks like a <code>TAKE2</code>, but if more
 | 
						|
      arguments are present, then it should be called multiple
 | 
						|
      times, holding the first argument constant.</li>
 | 
						|
 | 
						|
      <li>Finally, we have a string which describes the arguments
 | 
						|
      that should be present. If the arguments in the actual config
 | 
						|
      file are not as required, this string will be used to help
 | 
						|
      give a more specific error message. (You can safely leave
 | 
						|
      this <code>NULL</code>).</li>
 | 
						|
    </ul>
 | 
						|
    Finally, having set this all up, we have to use it. This is
 | 
						|
    ultimately done in the module's handlers, specifically for its
 | 
						|
    file-typing handler, which looks more or less like this; note
 | 
						|
    that the per-directory configuration structure is extracted
 | 
						|
    from the <code>request_rec</code>'s per-directory configuration
 | 
						|
    vector by using the <code>ap_get_module_config</code> function.
 | 
						|
    
 | 
						|
<pre>
 | 
						|
int find_ct(request_rec *r)
 | 
						|
{
 | 
						|
    int i;
 | 
						|
    char *fn = ap_pstrdup (r->pool, r->filename);
 | 
						|
    mime_dir_config *conf = (mime_dir_config *)
 | 
						|
             ap_get_module_config(r->per_dir_config, &mime_module);
 | 
						|
    char *type;
 | 
						|
 | 
						|
    if (S_ISDIR(r->finfo.st_mode)) {
 | 
						|
        r->content_type = DIR_MAGIC_TYPE;
 | 
						|
        return OK;
 | 
						|
    }
 | 
						|
 | 
						|
    if((i=ap_rind(fn,'.')) < 0) return DECLINED;
 | 
						|
    ++i;
 | 
						|
 | 
						|
    if ((type = ap_table_get (conf->encoding_types, &fn[i])))
 | 
						|
    {
 | 
						|
        r->content_encoding = type;
 | 
						|
 | 
						|
        /* go back to previous extension to try to use it as a type */
 | 
						|
 | 
						|
        fn[i-1] = '\0';
 | 
						|
        if((i=ap_rind(fn,'.')) < 0) return OK;
 | 
						|
        ++i;
 | 
						|
    }
 | 
						|
 | 
						|
    if ((type = ap_table_get (conf->forced_types, &fn[i])))
 | 
						|
    {
 | 
						|
        r->content_type = type;
 | 
						|
    }
 | 
						|
 | 
						|
    return OK;
 | 
						|
}
 | 
						|
 | 
						|
</pre>
 | 
						|
 | 
						|
    <h3><a id="servconf" name="servconf">Side notes --- per-server
 | 
						|
    configuration, virtual servers, <em>etc</em>.</a></h3>
 | 
						|
    The basic ideas behind per-server module configuration are
 | 
						|
    basically the same as those for per-directory configuration;
 | 
						|
    there is a creation function and a merge function, the latter
 | 
						|
    being invoked where a virtual server has partially overridden
 | 
						|
    the base server configuration, and a combined structure must be
 | 
						|
    computed. (As with per-directory configuration, the default if
 | 
						|
    no merge function is specified, and a module is configured in
 | 
						|
    some virtual server, is that the base configuration is simply
 | 
						|
    ignored). 
 | 
						|
 | 
						|
    <p>The only substantial difference is that when a command needs
 | 
						|
    to configure the per-server private module data, it needs to go
 | 
						|
    to the <code>cmd_parms</code> data to get at it. Here's an
 | 
						|
    example, from the alias module, which also indicates how a
 | 
						|
    syntax error can be returned (note that the per-directory
 | 
						|
    configuration argument to the command handler is declared as a
 | 
						|
    dummy, since the module doesn't actually have per-directory
 | 
						|
    config data):</p>
 | 
						|
<pre>
 | 
						|
char *add_redirect(cmd_parms *cmd, void *dummy, char *f, char *url)
 | 
						|
{
 | 
						|
    server_rec *s = cmd->server;
 | 
						|
    alias_server_conf *conf = (alias_server_conf *)
 | 
						|
            ap_get_module_config(s->module_config,&alias_module);
 | 
						|
    alias_entry *new = ap_push_array (conf->redirects);
 | 
						|
 | 
						|
    if (!ap_is_url (url)) return "Redirect to non-URL";
 | 
						|
 | 
						|
    new->fake = f; new->real = url;
 | 
						|
    return NULL;
 | 
						|
}
 | 
						|
</pre>
 | 
						|
    <!--#include virtual="footer.html" -->
 | 
						|
  </body>
 | 
						|
</html>
 | 
						|
 |