mirror of
				https://github.com/apache/httpd.git
				synced 2025-11-03 17:53:20 +03:00 
			
		
		
		
	other than GET, and const'd the definition of method in request_rec. Submitted by: Greg Stein <gstein@lyra.org> Reviewed by: Roy Fielding, Dean Gaudet, Doug MacEachern git-svn-id: https://svn.apache.org/repos/asf/httpd/httpd/trunk@82870 13f79535-47bb-0310-9956-ffa450edef68
		
			
				
	
	
		
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<HTML><HEAD>
<TITLE>Apache API notes</TITLE>
</HEAD>
<!-- Background white, links blue (unvisited), navy (visited), red (active) -->
<BODY
 BGCOLOR="#FFFFFF"
 TEXT="#000000"
 LINK="#0000FF"
 VLINK="#000080"
 ALINK="#FF0000"
>
<!--#include virtual="header.html" -->
<H1 ALIGN="CENTER">Apache API notes</H1>

These are some notes on the Apache API and the data structures you
have to deal with, <EM>etc.</EM>  They are not yet nearly complete, but
hopefully, they will help you get your bearings.  Keep in mind that
the API is still subject to change as we gain experience with it.
(See the TODO file for what <EM>might</EM> be coming).  However,
it will be easy to adapt modules to any changes that are made.
(We have more modules to adapt than you do).
<P>

A few notes on general pedagogical style here.  In the interest of
conciseness, all structure declarations here are incomplete --- the
real ones have more slots that I'm not telling you about.  For the
most part, these are reserved to one component of the server core or
another, and should be altered by modules with caution.  However, in
some cases, they really are things I just haven't gotten around to
yet.  Welcome to the bleeding edge.<P>

Finally, here's an outline, to give you some bare idea of what's
coming up, and in what order:

<UL>
<LI> <A HREF="#basics">Basic concepts.</A>
<MENU>
 <LI> <A HREF="#HMR">Handlers, Modules, and Requests</A>
 <LI> <A HREF="#moduletour">A brief tour of a module</A>
</MENU>
<LI> <A HREF="#handlers">How handlers work</A>
<MENU>
 <LI> <A HREF="#req_tour">A brief tour of the <CODE>request_rec</CODE></A>
 <LI> <A HREF="#req_orig">Where request_rec structures come from</A>
 <LI> <A HREF="#req_return">Handling requests, declining, and returning error
  codes</A>
 <LI> <A HREF="#resp_handlers">Special considerations for response handlers</A>
 <LI> <A HREF="#auth_handlers">Special considerations for authentication
  handlers</A>
 <LI> <A HREF="#log_handlers">Special considerations for logging handlers</A>
</MENU>
<LI> <A HREF="#pools">Resource allocation and resource pools</A>
<LI> <A HREF="#config">Configuration, commands and the like</A>
<MENU>
 <LI> <A HREF="#per-dir">Per-directory configuration structures</A>
 <LI> <A HREF="#commands">Command handling</A>
 <LI> <A HREF="#servconf">Side notes --- per-server configuration,
  virtual servers, <EM>etc</EM>.</A>
</MENU>
</UL>

<H2><A NAME="basics">Basic concepts.</A></H2>

We begin with an overview of the basic concepts behind the
API, and how they are manifested in the code.

<H3><A NAME="HMR">Handlers, Modules, and Requests</A></H3>

Apache breaks down request handling into a series of steps, more or
less the same way the Netscape server API does (although this API has
a few more stages than NetSite does, as hooks for stuff I thought
might be useful in the future).  These are:

<UL>
  <LI> URI -> Filename translation
  <LI> Auth ID checking [is the user who they say they are?]
  <LI> Auth access checking [is the user authorized <EM>here</EM>?]
  <LI> Access checking other than auth
  <LI> Determining MIME type of the object requested
  <LI> `Fixups' --- there aren't any of these yet, but the phase is
       intended as a hook for possible extensions like
       <CODE>SetEnv</CODE>, which don't really fit well elsewhere.
  <LI> Actually sending a response back to the client.
  <LI> Logging the request
</UL>

These phases are handled by looking at each of a succession of
<EM>modules</EM>, checking whether each of them has a handler for the
phase, and attempting to invoke it if so.  The handler can typically do
one of three things:

<UL>
  <LI> <EM>Handle</EM> the request, and indicate that it has done so
       by returning the magic constant <CODE>OK</CODE>.
  <LI> <EM>Decline</EM> to handle the request, by returning the magic
       integer constant <CODE>DECLINED</CODE>.  In this case, the
       server behaves in all respects as if the handler simply hadn't
       been there.
  <LI> Signal an error, by returning one of the HTTP error codes.
       This terminates normal handling of the request, although an
       ErrorDocument may be invoked to try to mop up, and it will be
       logged in any case.
</UL>

Most phases are terminated by the first module that handles them;
however, for logging, `fixups', and non-access authentication
checking, all handlers always run (barring an error).  Also, the
response phase is unique in that modules may declare multiple handlers
for it, via a dispatch table keyed on the MIME type of the requested
object.  Modules may declare a response-phase handler which can handle
<EM>any</EM> request, by giving it the key <CODE>*/*</CODE> (<EM>i.e.</EM>, a
wildcard MIME type specification).  However, wildcard handlers are
only invoked if the server has already tried and failed to find a more
specific response handler for the MIME type of the requested object
(either none existed, or they all declined).<P>

The handlers themselves are functions of one argument (a
<CODE>request_rec</CODE> structure, <EM>vide infra</EM>), which return an
integer, as above.<P>
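<P>
The dispatch rule just described can be sketched in a few lines of C.
This is only an illustration under stated assumptions, not the server's
own code: <CODE>run_phase</CODE>, its <CODE>run_all</CODE> flag, the stub
<CODE>request_rec</CODE> and the three sample handlers are made up for
the example, and the values given to <CODE>OK</CODE> and
<CODE>DECLINED</CODE> are stand-ins for the real constants.
</P>

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in values for this sketch; the real constants come from the
 * server headers. */
#define OK        0
#define DECLINED (-1)

/* Hypothetical stub: just enough state to observe which handlers ran. */
typedef struct request_rec { int calls; } request_rec;
typedef int (*handler_fn)(request_rec *);

/* Sample handlers, one for each of the three possible outcomes. */
int decline_it(request_rec *r) { (void)r; return DECLINED; }
int handle_it(request_rec *r)  { r->calls++; return OK; }
int refuse_it(request_rec *r)  { (void)r; return 403; }

/* Run one phase over a NULL-terminated list of handlers.
 * run_all == 0: the first handler that does not decline ends the phase.
 * run_all != 0: every handler runs, unless one of them reports an error. */
int run_phase(handler_fn *handlers, request_rec *r, int run_all)
{
    int handled = 0;
    size_t i;

    assert(handlers != NULL && r != NULL);
    for (i = 0; handlers[i] != NULL; ++i) {
        int rv = handlers[i](r);
        if (rv == DECLINED) continue;  /* as if the handler were absent */
        if (rv != OK) return rv;       /* error code aborts the phase   */
        if (!run_all) return OK;       /* first taker wins              */
        handled = 1;
    }
    return handled ? OK : DECLINED;
}
```

<P>
With <CODE>run_all</CODE> set to zero this models phases such as filename
translation; with it set, it models logging and fixups, where every
handler gets a turn.
</P>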

<H3><A NAME="moduletour">A brief tour of a module</A></H3>

At this point, we need to explain the structure of a module.  Our
candidate will be one of the messier ones, the CGI module --- this
handles both CGI scripts and the <CODE>ScriptAlias</CODE> config file
command.  It's actually a great deal more complicated than most
modules, but if we're going to have only one example, it might as well
be the one with its fingers in every place.<P>

Let's begin with handlers.  In order to handle the CGI scripts, the
module declares a response handler for them. Because of
<CODE>ScriptAlias</CODE>, it also has handlers for the name
translation phase (to recognize <CODE>ScriptAlias</CODE>ed URIs) and the
type-checking phase (any <CODE>ScriptAlias</CODE>ed request is typed
as a CGI script).<P>

The module needs to maintain some per (virtual)
server information, namely, the <CODE>ScriptAlias</CODE>es in effect;
the module structure therefore contains pointers to a function which
builds these structures, and to another which combines two of them (in
case the main server and a virtual server both have
<CODE>ScriptAlias</CODE>es declared).<P>

Finally, this module contains code to handle the
<CODE>ScriptAlias</CODE> command itself.  This particular module only
declares one command, but there could be more, so modules have
<EM>command tables</EM> which declare their commands, and describe
where they are permitted, and how they are to be invoked.  <P>

A final note on the declared types of the arguments of some of these
commands: a <CODE>pool</CODE> is a pointer to a <EM>resource pool</EM>
structure; these are used by the server to keep track of the memory
which has been allocated, files opened, <EM>etc.</EM>, either to service a
particular request, or to handle the process of configuring itself.
That way, when the request is over (or, for the configuration pool,
when the server is restarting), the memory can be freed, and the files
closed, <EM>en masse</EM>, without anyone having to write explicit code to
track them all down and dispose of them.  Also, a
<CODE>cmd_parms</CODE> structure contains various information about
the config file being read, and other status information, which is
sometimes of use to the function which processes a config-file command
(such as <CODE>ScriptAlias</CODE>).

With no further ado, the module itself:

<PRE>
/* Declarations of handlers. */

int translate_scriptalias (request_rec *);
int type_scriptalias (request_rec *);
int cgi_handler (request_rec *);

/* Subsidiary dispatch table for response-phase handlers, by MIME type */

handler_rec cgi_handlers[] = {
{ "application/x-httpd-cgi", cgi_handler },
{ NULL }
};

/* Declarations of routines to manipulate the module's configuration
 * info.  Note that these are returned, and passed in, as void *'s;
 * the server core keeps track of them, but it doesn't, and can't,
 * know their internal structure.
 */

void *make_cgi_server_config (pool *);
void *merge_cgi_server_config (pool *, void *, void *);

/* Declarations of routines to handle config-file commands */

extern char *script_alias(cmd_parms *, void *per_dir_config, char *fake,
                          char *real);

command_rec cgi_cmds[] = {
{ "ScriptAlias", script_alias, NULL, RSRC_CONF, TAKE2,
    "a fakename and a realname"},
{ NULL }
};

module cgi_module = {
   STANDARD_MODULE_STUFF,
   NULL,                     /* initializer */
   NULL,                     /* dir config creator */
   NULL,                     /* dir merger --- default is to override */
   make_cgi_server_config,   /* server config */
   merge_cgi_server_config,  /* merge server config */
   cgi_cmds,                 /* command table */
   cgi_handlers,             /* handlers */
   translate_scriptalias,    /* filename translation */
   NULL,                     /* check_user_id */
   NULL,                     /* check auth */
   NULL,                     /* check access */
   type_scriptalias,         /* type_checker */
   NULL,                     /* fixups */
   NULL,                     /* logger */
   NULL                      /* header parser */
};
</PRE>
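<P>
To make the role of the <CODE>handler_rec</CODE> table concrete, here is
a rough sketch of how a response-phase dispatch keyed on MIME type, with
a <CODE>*/*</CODE> fallback, might look.  It is an assumption-laden
simplification: it searches a single table rather than every module's
table, compares types with a plain <CODE>strcmp</CODE>, and the names
<CODE>try_key</CODE>, <CODE>invoke_response_handler</CODE>,
<CODE>picky_handler</CODE>, <CODE>demo_handlers</CODE> and the
<CODE>served_by</CODE> field are invented for the example.
</P>

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in values for this sketch. */
#define OK        0
#define DECLINED (-1)

/* Hypothetical stub; served_by only records which handler answered. */
typedef struct request_rec {
    const char *content_type;
    const char *served_by;
} request_rec;

typedef struct {
    const char *content_type;          /* dispatch key */
    int (*handler)(request_rec *);
} handler_rec;

int cgi_handler(request_rec *r)     { r->served_by = "cgi"; return OK; }
int picky_handler(request_rec *r)   { (void)r; return DECLINED; }
int default_handler(request_rec *r) { r->served_by = "default"; return OK; }

handler_rec demo_handlers[] = {
    { "application/x-httpd-cgi", cgi_handler },
    { "text/html", picky_handler },
    { "*/*", default_handler },
    { NULL, NULL }
};

/* Call each handler registered under exactly this key, stopping at the
 * first one that does not decline. */
static int try_key(const handler_rec *t, const char *key, request_rec *r)
{
    for (; t->content_type != NULL; ++t) {
        if (strcmp(t->content_type, key) == 0) {
            int rv = t->handler(r);
            if (rv != DECLINED) return rv;
        }
    }
    return DECLINED;
}

/* Specific handlers first; wildcard handlers only if those all declined
 * (or none existed). */
int invoke_response_handler(const handler_rec *table, request_rec *r)
{
    int rv;

    assert(table != NULL && r != NULL && r->content_type != NULL);
    rv = try_key(table, r->content_type, r);
    if (rv != DECLINED) return rv;
    return try_key(table, "*/*", r);
}
```

<P>
In this sketch a <CODE>text/html</CODE> request ends up with
<CODE>default_handler</CODE>, because the only specific handler for that
type declines.
</P>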

<H2><A NAME="handlers">How handlers work</A></H2>

The sole argument to handlers is a <CODE>request_rec</CODE> structure.
This structure describes a particular request which has been made to
the server, on behalf of a client.  In most cases, each connection to
the client generates only one <CODE>request_rec</CODE> structure.<P>

<H3><A NAME="req_tour">A brief tour of the <CODE>request_rec</CODE></A></H3>

The <CODE>request_rec</CODE> contains pointers to a resource pool
which will be cleared when the server is finished handling the
request; to structures containing per-server and per-connection
information, and most importantly, information on the request itself.<P>

The most important such information is a small set of character
strings describing attributes of the object being requested, including
its URI, filename, content-type and content-encoding (these being filled
in by the translation and type-check handlers which handle the
request, respectively). <P>

Other commonly used data items are tables giving the MIME headers on
the client's original request, MIME headers to be sent back with the
response (which modules can add to at will), and environment variables
for any subprocesses which are spawned off in the course of servicing
the request.  These tables are manipulated using the
<CODE>ap_table_get</CODE> and <CODE>ap_table_set</CODE> routines. <P>
<BLOCKQUOTE>
 Note that the <SAMP>Content-type</SAMP> header value <EM>cannot</EM> be
 set by module content-handlers using the <SAMP>ap_table_*()</SAMP>
 routines.  Rather, it is set by pointing the <SAMP>content_type</SAMP>
 field in the <SAMP>request_rec</SAMP> structure to an appropriate
 string.  <EM>E.g.</EM>,
 <PRE>
  r->content_type = "text/html";
 </PRE>
</BLOCKQUOTE>
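<P>
The division of labour between the header tables and the
<CODE>content_type</CODE> field can be illustrated with a toy table.
Everything here is a stand-in written for this note:
<CODE>toy_table</CODE>, <CODE>toy_table_get</CODE>,
<CODE>toy_table_set</CODE>, <CODE>toy_request</CODE> and
<CODE>prepare_response</CODE> are hypothetical, the table has a fixed
capacity, and it stores the caller's pointers instead of copying strings
into a pool.  It only mimics two behaviours of the real routines: header
names are matched without regard to case, and setting a key that already
exists replaces its value.
</P>

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

#define TOY_MAX 16

typedef struct { const char *key; const char *val; } toy_entry;
typedef struct { toy_entry e[TOY_MAX]; int n; } toy_table;

/* Case-insensitive comparison of two header names. */
static int same_key(const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0') {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        ++a;
        ++b;
    }
    return *a == *b;   /* true only if both strings ended together */
}

/* Return the value stored under key, or NULL if there is none. */
const char *toy_table_get(const toy_table *t, const char *key)
{
    int i;
    for (i = 0; i < t->n; ++i)
        if (same_key(t->e[i].key, key)) return t->e[i].val;
    return NULL;
}

/* Store val under key, replacing any existing value for that key. */
void toy_table_set(toy_table *t, const char *key, const char *val)
{
    int i;
    for (i = 0; i < t->n; ++i) {
        if (same_key(t->e[i].key, key)) {
            t->e[i].val = val;
            return;
        }
    }
    assert(t->n < TOY_MAX);   /* the toy table cannot grow */
    t->e[t->n].key = key;
    t->e[t->n].val = val;
    t->n++;
}

/* Hypothetical request stub with just the two members discussed. */
typedef struct {
    toy_table headers_out;
    const char *content_type;
} toy_request;

/* Ordinary headers go through the table; the content type does not. */
void prepare_response(toy_request *r)
{
    toy_table_set(&r->headers_out, "Cache-Control", "no-cache");
    r->content_type = "text/html";
}
```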
Finally, there are pointers to two data structures which, in turn,
point to per-module configuration structures.  Specifically, these
hold pointers to the data structures which the module has built to
describe the way it has been configured to operate in a given
directory (via <CODE>.htaccess</CODE> files or
<CODE>&lt;Directory&gt;</CODE> sections), and to private data it has
built in the course of servicing the request (so modules' handlers for
one phase can pass `notes' to their handlers for other phases).  There
is another such configuration vector in the <CODE>server_rec</CODE>
data structure pointed to by the <CODE>request_rec</CODE>, which
contains per (virtual) server configuration data.<P>

Here is an abridged declaration, giving the fields most commonly used:<P>

<PRE>
struct request_rec {

  pool *pool;
  conn_rec *connection;
  server_rec *server;

  /* What object is being requested */

  char *uri;
  char *filename;
  char *path_info;
  char *args;           /* QUERY_ARGS, if any */
  struct stat finfo;    /* Set by server core;
                         * st_mode set to zero if no such file */

  char *content_type;
  char *content_encoding;

  /* MIME header environments, in and out.  Also, an array containing
   * environment variables to be passed to subprocesses, so people can
   * write modules to add to that environment.
   *
   * The difference between headers_out and err_headers_out is that
   * the latter are printed even on error, and persist across internal
   * redirects (so the headers printed for ErrorDocument handlers will
   * have them).
   */

  table *headers_in;
  table *headers_out;
  table *err_headers_out;
  table *subprocess_env;

  /* Info about the request itself... */

  int header_only;     /* HEAD request, as opposed to GET */
  char *protocol;      /* Protocol, as given to us, or HTTP/0.9 */
  char *method;        /* GET, HEAD, POST, <EM>etc.</EM> */
  int method_number;   /* M_GET, M_POST, <EM>etc.</EM> */

  /* Info for logging */

  char *the_request;
  int bytes_sent;

  /* A flag which modules can set, to indicate that the data being
   * returned is volatile, and clients should be told not to cache it.
   */

  int no_cache;

  /* Various other config info which may change with .htaccess files
   * These are config vectors, with one void* pointer for each module
   * (the thing pointed to being the module's business).
   */

  void *per_dir_config;   /* Options set in config files, <EM>etc.</EM> */
  void *request_config;   /* Notes on *this* request */

};

</PRE>

<H3><A NAME="req_orig">Where request_rec structures come from</A></H3>

Most <CODE>request_rec</CODE> structures are built by reading an HTTP
request from a client, and filling in the fields.  However, there are
a few exceptions:

<UL>
  <LI> If the request is to an imagemap, a type map (<EM>i.e.</EM>, a
       <CODE>*.var</CODE> file), or a CGI script which returned a
       local `Location:', then the resource which the user requested
       is going to be ultimately located by some URI other than what
       the client originally supplied.  In this case, the server does
       an <EM>internal redirect</EM>, constructing a new
       <CODE>request_rec</CODE> for the new URI, and processing it
       almost exactly as if the client had requested the new URI
       directly. <P>

  <LI> If some handler signaled an error, and an
       <CODE>ErrorDocument</CODE> is in scope, the same internal
       redirect machinery comes into play.<P>

  <LI> Finally, a handler occasionally needs to investigate `what
       would happen if' some other request were run.  For instance,
       the directory indexing module needs to know what MIME type
       would be assigned to a request for each directory entry, in
       order to figure out what icon to use.<P>

       Such handlers can construct a <EM>sub-request</EM>, using the
       functions <CODE>ap_sub_req_lookup_file</CODE>,
       <CODE>ap_sub_req_lookup_uri</CODE>, and
       <CODE>ap_sub_req_method_uri</CODE>; these construct a new
       <CODE>request_rec</CODE> structure and process it as you
       would expect, up to but not including the point of actually
       sending a response.  (These functions skip over the access
       checks if the sub-request is for a file in the same directory
       as the original request).<P>

       (Server-side includes work by building sub-requests and then
       actually invoking the response handler for them, via the
       function <CODE>ap_run_sub_req</CODE>).
</UL>

<H3><A NAME="req_return">Handling requests, declining, and returning error
 codes</A></H3>

As discussed above, each handler, when invoked to handle a particular
<CODE>request_rec</CODE>, has to return an <CODE>int</CODE> to
indicate what happened.  That can either be

<UL>
  <LI> OK --- the request was handled successfully.  This may or may
       not terminate the phase.
  <LI> DECLINED --- no erroneous condition exists, but the module
       declines to handle the phase; the server tries to find another.
  <LI> an HTTP error code, which aborts handling of the request.
</UL>

Note that if the error code returned is <CODE>REDIRECT</CODE>, then
the module should put a <CODE>Location</CODE> in the request's
<CODE>headers_out</CODE>, to indicate where the client should be
redirected <EM>to</EM>. <P>

<H3><A NAME="resp_handlers">Special considerations for response
 handlers</A></H3>

Handlers for most phases do their work by simply setting a few fields
in the <CODE>request_rec</CODE> structure (or, in the case of access
checkers, simply by returning the correct error code).  However,
response handlers have to actually send a response back to the client. <P>

They should begin by sending an HTTP response header, using the
function <CODE>ap_send_http_header</CODE>.  (You don't have to do
anything special to skip sending the header for HTTP/0.9 requests; the
function figures out on its own that it shouldn't do anything).  If
the request is marked <CODE>header_only</CODE>, that's all they should
do; they should return after that, without attempting any further
output.  <P>

Otherwise, they should produce a response body which answers the
client as appropriate.  The primitives for this are <CODE>ap_rputc</CODE>
and <CODE>ap_rprintf</CODE>, for internally generated output, and
<CODE>ap_send_fd</CODE>, to copy the contents of some <CODE>FILE *</CODE>
straight to the client.  <P>

At this point, you should more or less understand the following piece
of code, which is the handler for <CODE>GET</CODE> requests
which have no more specific handler; it also shows how conditional
<CODE>GET</CODE>s can be handled, if it's desirable to do so in a
particular response handler --- <CODE>ap_set_last_modified</CODE> checks
against the <CODE>If-modified-since</CODE> value supplied by the
client, if any, and returns an appropriate code (which will, if
nonzero, be USE_LOCAL_COPY).   No similar considerations apply for
<CODE>ap_set_content_length</CODE>, but it returns an error code for
symmetry.<P>

<PRE>
int default_handler (request_rec *r)
{
    int errstatus;
    FILE *f;

    if (r->method_number != M_GET) return DECLINED;
    if (r->finfo.st_mode == 0) return NOT_FOUND;

    if ((errstatus = ap_set_content_length (r, r->finfo.st_size))
        || (errstatus = ap_set_last_modified (r, r->finfo.st_mtime)))
        return errstatus;

    f = fopen (r->filename, "r");

    if (f == NULL) {
        log_reason("file permissions deny server access",
                   r->filename, r);
        return FORBIDDEN;
    }

    register_timeout ("send", r);
    ap_send_http_header (r);

    if (!r->header_only) ap_send_fd (f, r);
    ap_pfclose (r->pool, f);
    return OK;
}
</PRE>

Finally, if all of this is too much of a challenge, there are a few
ways out of it.  First off, as shown above, a response handler which
has not yet produced any output can simply return an error code, in
which case the server will automatically produce an error response.
Secondly, it can punt to some other handler by invoking
<CODE>ap_internal_redirect</CODE>, which is how the internal redirection
machinery discussed above is invoked.  A response handler which has
internally redirected should always return <CODE>OK</CODE>. <P>

(Invoking <CODE>ap_internal_redirect</CODE> from handlers which are
<EM>not</EM> response handlers will lead to serious confusion).

<H3><A NAME="auth_handlers">Special considerations for authentication
 handlers</A></H3>

Stuff that should be discussed here in detail:

<UL>
  <LI> Authentication-phase handlers not invoked unless auth is
       configured for the directory.
  <LI> Common auth configuration stored in the core per-dir
       configuration; it has accessors <CODE>ap_auth_type</CODE>,
       <CODE>ap_auth_name</CODE>, and <CODE>ap_requires</CODE>.
  <LI> Common routines, to handle the protocol end of things, at least
       for HTTP basic authentication (<CODE>ap_get_basic_auth_pw</CODE>,
       which sets the <CODE>connection->user</CODE> structure field
       automatically, and <CODE>ap_note_basic_auth_failure</CODE>, which
       arranges for the proper <CODE>WWW-Authenticate:</CODE> header
       to be sent back).
</UL>

<H3><A NAME="log_handlers">Special considerations for logging handlers</A></H3>

When a request has internally redirected, there is the question of
what to log.  Apache handles this by bundling the entire chain of
redirects into a list of <CODE>request_rec</CODE> structures which are
threaded through the <CODE>r->prev</CODE> and <CODE>r->next</CODE>
pointers.  The <CODE>request_rec</CODE> which is passed to the logging
handlers in such cases is the one which was originally built for the
initial request from the client; note that the bytes_sent field will
only be correct in the last request in the chain (the one for which a
response was actually sent).
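<P>
A logging handler that wants the byte count therefore has to walk to the
end of the chain first.  The following is a minimal sketch of that walk,
using a cut-down <CODE>request_rec</CODE> with only the members involved;
the helper names <CODE>last_in_chain</CODE> and
<CODE>logged_bytes</CODE> are made up for this example and are not part
of the API.
</P>

```c
#include <assert.h>
#include <stddef.h>

/* Cut-down stand-in holding only the members this sketch needs. */
typedef struct request_rec request_rec;
struct request_rec {
    const char *uri;
    int bytes_sent;
    request_rec *prev;   /* request this one was redirected from, if any */
    request_rec *next;   /* request this one was redirected to, if any   */
};

/* Follow the redirect chain to the request that was actually answered. */
const request_rec *last_in_chain(const request_rec *r)
{
    assert(r != NULL);
    while (r->next != NULL)
        r = r->next;
    return r;
}

/* Byte count to log for the original request handed to a logger. */
int logged_bytes(const request_rec *orig)
{
    return last_in_chain(orig)->bytes_sent;
}
```

<P>
The URI to log would still be taken from the original
<CODE>request_rec</CODE>, since that is what the client asked for.
</P>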

<H2><A NAME="pools">Resource allocation and resource pools</A></H2>
<P>
One of the problems of writing and designing a server-pool server is
that of preventing leakage, that is, allocating resources (memory,
open files, <EM>etc.</EM>), without subsequently releasing them.  The resource
pool machinery is designed to make it easy to prevent this from
happening, by allowing resources to be allocated in such a way that
they are <EM>automatically</EM> released when the server is done with
them.
</P>
<P>
The way this works is as follows:  the memory which is allocated, files
opened, <EM>etc.</EM>, to deal with a particular request are tied to a
<EM>resource pool</EM> which is allocated for the request.  The pool
is a data structure which itself tracks the resources in question.
</P>
<P>
When the request has been processed, the pool is <EM>cleared</EM>.  At
that point, all the memory associated with it is released for reuse,
all files associated with it are closed, and any other clean-up
functions which are associated with the pool are run.  When this is
over, we can be confident that all the resources tied to the pool have
been released, and that none of them have leaked.
</P>
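<P>
The idea of tying clean-up functions to a pool can be sketched with a
toy implementation.  This is not the server's pool code, just a small
model under obvious assumptions: <CODE>toy_pool</CODE>,
<CODE>toy_register_cleanup</CODE>, <CODE>toy_clear_pool</CODE> and
<CODE>count_release</CODE> are hypothetical names, the list nodes come
from plain <CODE>malloc</CODE>, and only clean-up callbacks are tracked.
</P>

```c
#include <assert.h>
#include <stdlib.h>

/* One registered clean-up: a callback plus the data to hand to it. */
typedef struct toy_cleanup {
    void (*fn)(void *);
    void *data;
    struct toy_cleanup *next;
} toy_cleanup;

/* A toy pool that tracks nothing but its clean-up list. */
typedef struct { toy_cleanup *cleanups; } toy_pool;

/* Arrange for fn(data) to be called when the pool is cleared. */
void toy_register_cleanup(toy_pool *p, void *data, void (*fn)(void *))
{
    toy_cleanup *c;

    assert(p != NULL && fn != NULL);
    c = malloc(sizeof *c);
    assert(c != NULL);       /* a sketch: no graceful recovery here */
    c->fn = fn;
    c->data = data;
    c->next = p->cleanups;   /* most recently registered runs first */
    p->cleanups = c;
}

/* Run every registered clean-up exactly once and empty the list. */
void toy_clear_pool(toy_pool *p)
{
    assert(p != NULL);
    while (p->cleanups != NULL) {
        toy_cleanup *c = p->cleanups;
        p->cleanups = c->next;
        c->fn(c->data);
        free(c);
    }
}

/* Example clean-up: count how many resources were released. */
void count_release(void *data)
{
    ++*(int *)data;
}
```

<P>
A caller never releases anything by hand in this model; it registers
what needs doing and relies on the single call that clears the pool.
</P>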
<P>
Server restarts, and allocation of memory and resources for per-server
configuration, are handled in a similar way.  There is a
<EM>configuration pool</EM>, which keeps track of resources which were
allocated while reading the server configuration files, and handling
the commands therein (for instance, the memory that was allocated for
per-server module configuration, log files and other files that were
opened, and so forth).  When the server restarts, and has to reread
the configuration files, the configuration pool is cleared, and so the
memory and file descriptors which were taken up by reading them the
last time are made available for reuse.
</P>
<P>
It should be noted that use of the pool machinery isn't generally
obligatory, except for situations like logging handlers, where you
really need to register cleanups to make sure that the log file gets
closed when the server restarts (this is most easily done by using the
function <CODE><A HREF="#pool-files">ap_pfopen</A></CODE>, which also
arranges for the underlying file descriptor to be closed before any
child processes, such as for CGI scripts, are <CODE>exec</CODE>ed), or
in case you are using the timeout machinery (which isn't yet even
documented here).  However, there are two benefits to using it:
resources allocated to a pool never leak (even if you allocate a
scratch string, and just forget about it); also, for memory
allocation, <CODE>ap_palloc</CODE> is generally faster than
<CODE>malloc</CODE>.
</P>
<P>
We begin here by describing how memory is allocated to pools, and then
discuss how other resources are tracked by the resource pool
machinery.
</P>
<H3>Allocation of memory in pools</H3>
<P>
Memory is allocated to pools by calling the function
<CODE>ap_palloc</CODE>, which takes two arguments, one being a pointer to
a resource pool structure, and the other being the amount of memory to
allocate (in <CODE>char</CODE>s).  Within handlers for handling
requests, the most common way of getting a resource pool structure is
by looking at the <CODE>pool</CODE> slot of the relevant
<CODE>request_rec</CODE>; hence the repeated appearance of the
following idiom in module code:
</P>
<PRE>
int my_handler(request_rec *r)
{
    struct my_structure *foo;
    ...

    foo = (struct my_structure *)ap_palloc (r->pool, sizeof(struct my_structure));
}
</PRE>
<P>
 | 
						|
Note that <EM>there is no <CODE>ap_pfree</CODE></EM> ---
 | 
						|
<CODE>ap_palloc</CODE>ed memory is freed only when the associated
 | 
						|
resource pool is cleared.  This means that <CODE>ap_palloc</CODE> does not
 | 
						|
have to do as much accounting as <CODE>malloc()</CODE>; all it does in
 | 
						|
the typical case is to round up the size, bump a pointer, and do a
 | 
						|
range check.
 | 
						|
</P>
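<P>
The "round up the size, bump a pointer, and do a range check" path can
be sketched with a toy allocator.  This is only an illustration, not the
server's implementation: the names <CODE>toy_pool</CODE>,
<CODE>toy_palloc</CODE> and <CODE>toy_clear_pool</CODE>, the fixed
capacity, and the 8-byte alignment are all assumptions made for the
example.
</P>

```c
#include <stddef.h>

#define TOY_ALIGN     8      /* assumed alignment unit for this sketch */
#define TOY_POOL_SIZE 4096   /* assumed fixed capacity; no block chaining */

typedef struct {
    union {
        char   bytes[TOY_POOL_SIZE];
        double force_align;          /* keeps bytes[] suitably aligned */
    } u;
    size_t used;                     /* offset of the first free byte */
} toy_pool;

/* Hypothetical stand-in for ap_palloc: returns NULL when the pool is full. */
void *toy_palloc(toy_pool *p, size_t nbytes)
{
    void *result;

    /* Round the request up to a multiple of TOY_ALIGN. */
    size_t rounded = (nbytes + TOY_ALIGN - 1) & ~(size_t)(TOY_ALIGN - 1);

    /* Range check (also catches wrap-around in the rounding above). */
    if (rounded < nbytes || rounded > TOY_POOL_SIZE - p->used)
        return NULL;

    /* Bump the pointer. */
    result = p->u.bytes + p->used;
    p->used += rounded;
    return result;
}

/* Everything is released at once; there is no per-allocation free. */
void toy_clear_pool(toy_pool *p)
{
    p->used = 0;
}
```

<P>
Note that the allocator keeps no per-allocation bookkeeping at all, which
is why individual allocations cannot be freed on their own.
</P>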
 | 
						|
<P>
 | 
						|
(It also raises the possibility that heavy use of <CODE>ap_palloc</CODE>
could cause a server process to grow excessively large.  There are
two ways to deal with this, both described below; briefly, you
can use <CODE>malloc</CODE>, and try to be sure that all of the memory
gets explicitly <CODE>free</CODE>d, or you can allocate a sub-pool of
the main pool, allocate your memory in the sub-pool, and clear it out
periodically.  The latter technique is discussed in the section on
sub-pools below, and is used in the directory-indexing code, in order
to avoid excessive storage allocation when listing directories with
thousands of files).
 | 
						|
</P>
 | 
						|
<H3>Allocating initialized memory</H3>
 | 
						|
<P>
 | 
						|
There are functions which allocate initialized memory, and are
 | 
						|
frequently useful.  The function <CODE>ap_pcalloc</CODE> has the same
 | 
						|
interface as <CODE>ap_palloc</CODE>, but clears out the memory it
 | 
						|
allocates before it returns it.  The function <CODE>ap_pstrdup</CODE>
 | 
						|
takes a resource pool and a <CODE>char *</CODE> as arguments, and
 | 
						|
allocates memory for a copy of the string the pointer points to,
 | 
						|
returning a pointer to the copy.  Finally <CODE>ap_pstrcat</CODE> is a
 | 
						|
varargs-style function, which takes a pointer to a resource pool, and
 | 
						|
at least two <CODE>char *</CODE> arguments, the last of which must be
 | 
						|
<CODE>NULL</CODE>.  It allocates enough memory to fit copies of each
 | 
						|
of the strings, as a unit; for instance:
 | 
						|
</P>
 | 
						|
<PRE>
     ap_pstrcat (r->pool, "foo", "/", "bar", NULL);
</PRE>
 | 
						|
<P>
 | 
						|
returns a pointer to 8 bytes worth of memory, initialized to
 | 
						|
<CODE>"foo/bar"</CODE>.
 | 
						|
</P>
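<P>
To make the size arithmetic concrete, here is a minimal sketch of a
<CODE>NULL</CODE>-terminated varargs concatenation.  It is not the
server's code: <CODE>toy_strcat_all</CODE> is a hypothetical name, and
it uses <CODE>malloc</CODE> rather than a pool, so the caller has to
<CODE>free</CODE> the result.
</P>

```c
#include <stdarg.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for ap_pstrcat, backed by malloc instead of a pool. */
char *toy_strcat_all(const char *first, ...)
{
    va_list ap;
    size_t total = 1;               /* room for the terminating '\0' */
    const char *s;
    char *result, *dst;

    /* Pass 1: add up the lengths of all strings up to the NULL marker. */
    va_start(ap, first);
    for (s = first; s != NULL; s = va_arg(ap, const char *))
        total += strlen(s);
    va_end(ap);

    result = malloc(total);
    if (result == NULL)
        return NULL;

    /* Pass 2: copy each piece into place, one after the other. */
    dst = result;
    va_start(ap, first);
    for (s = first; s != NULL; s = va_arg(ap, const char *)) {
        size_t n = strlen(s);
        memcpy(dst, s, n);
        dst += n;
    }
    va_end(ap);

    *dst = '\0';
    return result;
}
```

<P>
Called as <CODE>toy_strcat_all("foo", "/", "bar", (char *)NULL)</CODE>,
the first pass computes 3 + 1 + 3 characters plus one terminator, which
is where the 8 bytes come from.  The cast on the final
<CODE>NULL</CODE> keeps the call portable to platforms where a bare
<CODE>NULL</CODE> is not pointer-sized.
</P>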
 | 
						|
<H3><A NAME="pools-used">Commonly-used pools in the Apache Web server</A></H3>
 | 
						|
<P>
 | 
						|
A pool is really defined by its lifetime more than anything else.  There
 | 
						|
are some static pools in http_main which are passed to various
 | 
						|
non-http_main functions as arguments at opportune times.  Here they are:
 | 
						|
</P>
 | 
						|
<DL COMPACT>
 | 
						|
 <DT>permanent_pool
 | 
						|
 </DT>
 | 
						|
 <DD>
 | 
						|
  <UL>
 | 
						|
   <LI>never passed to anything else, this is the ancestor of all pools
 | 
						|
   </LI>
 | 
						|
  </UL>
 | 
						|
 </DD>
 | 
						|
 <DT>pconf
 | 
						|
 </DT>
 | 
						|
 <DD>
 | 
						|
  <UL>
 | 
						|
   <LI>subpool of permanent_pool
 | 
						|
   </LI>
 | 
						|
   <LI>created at the beginning of a config "cycle"; exists until the
    server is terminated or restarts; passed to all config-time
    routines, either via cmd->pool, or as the "pool *p" argument on
    those which don't take a cmd_parms
 | 
						|
   </LI>
 | 
						|
   <LI>passed to the module init() functions
 | 
						|
   </LI>
 | 
						|
  </UL>
 | 
						|
 </DD>
 | 
						|
 <DT>ptemp
 | 
						|
 </DT>
 | 
						|
 <DD>
 | 
						|
  <UL>
 | 
						|
   <LI>strictly speaking, this pool does not go by this name in 1.3;
    the name ptemp comes from my pthreads development.  It refers to
    the use of ptrans in the parent... contrast this with the later
    definition of ptrans in the child.
 | 
						|
   </LI>
 | 
						|
   <LI>subpool of permanent_pool
 | 
						|
   </LI>
 | 
						|
   <LI>created at the beginning of a config "cycle"; exists until the
 | 
						|
    end of config parsing; passed to config-time routines <EM>via</EM>
 | 
						|
    cmd->temp_pool.  Somewhat of a "bastard child" because it isn't
 | 
						|
    available everywhere.  Used for temporary scratch space which
 | 
						|
    may be needed by some config routines but which is deleted at
 | 
						|
    the end of config.
 | 
						|
   </LI>
 | 
						|
  </UL>
 | 
						|
 </DD>
 | 
						|
 <DT>pchild
 | 
						|
 </DT>
 | 
						|
 <DD>
 | 
						|
  <UL>
 | 
						|
   <LI>subpool of permanent_pool
 | 
						|
   </LI>
 | 
						|
   <LI>created when a child is spawned (or a thread is created); lives
 | 
						|
    until that child (thread) is destroyed
 | 
						|
   </LI>
 | 
						|
   <LI>passed to the module child_init functions
 | 
						|
   </LI>
 | 
						|
   <LI>destruction happens right after the child_exit functions are
 | 
						|
    called... (which may explain why I think child_exit is redundant
 | 
						|
    and unneeded)
 | 
						|
   </LI>
 | 
						|
  </UL>
 | 
						|
 </DD>
 | 
						|
 <DT>ptrans
 | 
						|
 </DT>
 | 
						|
 <DD>
 | 
						|
  <UL>
 | 
						|
   <LI>should be a subpool of pchild, but currently is a subpool of
 | 
						|
    permanent_pool, see above
 | 
						|
   </LI>
 | 
						|
   <LI>cleared by the child before going into the accept() loop to receive
 | 
						|
    a connection
 | 
						|
   </LI>
 | 
						|
   <LI>used as connection->pool
 | 
						|
   </LI>
 | 
						|
  </UL>
 | 
						|
 </DD>
 | 
						|
 <DT>r->pool
 | 
						|
 </DT>
 | 
						|
 <DD>
 | 
						|
  <UL>
 | 
						|
   <LI>for the main request this is a subpool of connection->pool; for
 | 
						|
    subrequests it is a subpool of the parent request's pool.
 | 
						|
   </LI>
 | 
						|
   <LI>exists until the end of the request (<EM>i.e.</EM>,
 | 
						|
    ap_destroy_sub_req, or
 | 
						|
    in child_main after process_request has finished)
 | 
						|
   </LI>
 | 
						|
   <LI>note that r itself is allocated from r->pool; <EM>i.e.</EM>,
 | 
						|
    r->pool is
 | 
						|
    first created and then r is the first thing palloc()d from it
 | 
						|
   </LI>
 | 
						|
  </UL>
 | 
						|
 </DD>
 | 
						|
</DL>
 | 
						|
<P>
 | 
						|
For almost everything folks do, r->pool is the pool to use.  But you
 | 
						|
can see how other lifetimes, such as pchild, are useful to some
 | 
						|
modules... such as modules that need to open a database connection once
 | 
						|
per child, and wish to clean it up when the child dies.
 | 
						|
</P>
 | 
						|
<P>
 | 
						|
You can also see how some bugs have manifested themselves, such as setting
connection->user to a value from r->pool.  In this case
connection exists
for the lifetime of ptrans, which is longer than r->pool (especially if
r->pool belongs to a subrequest!).  So the correct thing to do is to allocate
from connection->pool.
 | 
						|
</P>
 | 
						|
<P>
 | 
						|
And there was another interesting bug in mod_include/mod_cgi.  You'll see
that both of them test whether they should use r->pool
or r->main->pool.  In this case the resource that they are registering
for cleanup is a child process.  If it were registered in r->pool,
then the code would wait() for the child when the subrequest finishes.
With mod_include this could be any old #include, and the delay could be up
to 3 seconds... and it happened quite frequently.  Instead the subprocess
is registered in r->main->pool, which causes it to be cleaned up when
the entire request is done, <EM>i.e.</EM>, after the output has been sent to
the client and logging has happened.
 | 
						|
</P>
 | 
						|
<H3><A NAME="pool-files">Tracking open files, etc.</A></H3>
 | 
						|
<P>
 | 
						|
As indicated above, resource pools are also used to track other sorts
 | 
						|
of resources besides memory.  The most common are open files.  The
 | 
						|
routine which is typically used for this is <CODE>ap_pfopen</CODE>, which
 | 
						|
takes a resource pool and two strings as arguments; the strings are
 | 
						|
the same as the typical arguments to <CODE>fopen</CODE>, <EM>e.g.</EM>,
 | 
						|
</P>
 | 
						|
<PRE>
     ...
     FILE *f = ap_pfopen (r->pool, r->filename, "r");

     if (f == NULL) { ... } else { ... }
</PRE>
 | 
						|
<P>
 | 
						|
There is also an <CODE>ap_popenf</CODE> routine, which parallels the
lower-level <CODE>open</CODE> system call.  Both of these routines
arrange for the file to be closed when the resource pool in question
is cleared.
 | 
						|
</P>
 | 
						|
<P>
 | 
						|
Unlike the case for memory, there <EM>are</EM> functions to close
files allocated with <CODE>ap_pfopen</CODE> and <CODE>ap_popenf</CODE>,
namely <CODE>ap_pfclose</CODE> and <CODE>ap_pclosef</CODE>.  (This is
because, on many systems, the number of files which a single process
can have open is quite limited).  It is important to use these
functions, rather than plain <CODE>fclose</CODE> or <CODE>close</CODE>,
to close files allocated with <CODE>ap_pfopen</CODE> and
<CODE>ap_popenf</CODE>: the pool will still try to close the file when
it is cleared, and closing it a second time could cause fatal errors on
systems such as Linux, which react badly if the same
<CODE>FILE*</CODE> is closed more than once.
 | 
						|
</P>
 | 
						|
<P>
 | 
						|
(Using the <CODE>close</CODE> functions is not mandatory, since the
 | 
						|
file will eventually be closed regardless, but you should consider it
 | 
						|
in cases where your module is opening, or could open, a lot of files).
 | 
						|
</P>
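<P>
The reason an early close has to go through the pool can be sketched
with a toy tracker.  Nothing here is the server's code:
<CODE>toy_tracker</CODE>, <CODE>toy_track</CODE>,
<CODE>toy_close_now</CODE> and <CODE>toy_tracker_clear</CODE> are
hypothetical names, the capacity is fixed, and a counter stands in for a
real file so that the number of closes can be observed.
</P>

```c
#include <stddef.h>

#define TOY_MAX_TRACKED 16   /* assumed fixed capacity for this sketch */

typedef void (*toy_close_fn)(void *resource);

typedef struct {
    void        *resource[TOY_MAX_TRACKED];
    toy_close_fn close_fn[TOY_MAX_TRACKED];
    int          count;
} toy_tracker;

/* Remember a resource so that toy_tracker_clear() will close it.
 * Returns 0 on success, -1 if the tracker is full. */
int toy_track(toy_tracker *t, void *resource, toy_close_fn fn)
{
    if (t->count >= TOY_MAX_TRACKED)
        return -1;
    t->resource[t->count] = resource;
    t->close_fn[t->count] = fn;
    t->count++;
    return 0;
}

/* Early close: close the resource now AND forget it, so that the later
 * clear does not close it a second time. */
void toy_close_now(toy_tracker *t, void *resource)
{
    int i;

    for (i = 0; i < t->count; i++) {
        if (t->resource[i] == resource) {
            t->close_fn[i](resource);
            t->count--;
            t->resource[i] = t->resource[t->count];
            t->close_fn[i] = t->close_fn[t->count];
            return;
        }
    }
}

/* Pool clear: close whatever is still being tracked. */
void toy_tracker_clear(toy_tracker *t)
{
    while (t->count > 0) {
        t->count--;
        t->close_fn[t->count](t->resource[t->count]);
    }
}

/* Stand-in for fclose(): counts how often a "file" has been closed. */
void toy_count_close(void *resource)
{
    ++*(int *)resource;
}
```

<P>
If the resource were closed behind the tracker's back instead of through
<CODE>toy_close_now</CODE>, the later clear would close it again, which
is exactly the double close the text warns about.
</P>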
 | 
						|
<H3>Other sorts of resources --- cleanup functions</H3>
 | 
						|
<BLOCKQUOTE>
 | 
						|
More text goes here.  Describe the cleanup primitives in terms of
 | 
						|
which the file stuff is implemented; also, <CODE>spawn_process</CODE>.
 | 
						|
</BLOCKQUOTE>
 | 
						|
<P>
 | 
						|
Pool cleanups live until clear_pool() is called:  clear_pool(a) recursively
calls destroy_pool() on all subpools of a; then calls all the cleanups for a;
then releases all the memory for a.  destroy_pool(a) calls clear_pool(a)
and then releases the pool structure itself.  In other words, clear_pool(a) doesn't
delete a; it just frees up all the resources, and you can start using it
again immediately.
 | 
						|
</P>
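<P>
The ordering described above can be modelled in a few lines.  This is a
sketch under stated assumptions rather than the server's code: the
<CODE>toy_</CODE> names are hypothetical, the pool carries no memory
blocks of its own, and cleanups run most-recently-registered first
simply because that is the easiest list to build.
</P>

```c
#include <stdlib.h>

typedef struct toy_cleanup {
    void (*fn)(void *data);
    void *data;
    struct toy_cleanup *next;
} toy_cleanup;

typedef struct toy_pool {
    struct toy_pool *parent;
    struct toy_pool *first_child;
    struct toy_pool *sibling;
    toy_cleanup     *cleanups;
} toy_pool;

/* Create a pool; pass NULL for a pool with no parent. */
toy_pool *toy_make_sub_pool(toy_pool *parent)
{
    toy_pool *p = calloc(1, sizeof *p);

    if (p == NULL)
        return NULL;
    p->parent = parent;
    if (parent != NULL) {
        p->sibling = parent->first_child;
        parent->first_child = p;
    }
    return p;
}

/* Returns 0 on success, -1 if out of memory. */
int toy_register_cleanup(toy_pool *p, void *data, void (*fn)(void *))
{
    toy_cleanup *c = malloc(sizeof *c);

    if (c == NULL)
        return -1;
    c->fn = fn;
    c->data = data;
    c->next = p->cleanups;
    p->cleanups = c;
    return 0;
}

void toy_destroy_pool(toy_pool *p);

void toy_clear_pool(toy_pool *p)
{
    /* 1. Destroy every sub-pool (each one unlinks itself from p). */
    while (p->first_child != NULL)
        toy_destroy_pool(p->first_child);

    /* 2. Run and discard the cleanups registered on p. */
    while (p->cleanups != NULL) {
        toy_cleanup *c = p->cleanups;
        p->cleanups = c->next;
        c->fn(c->data);
        free(c);
    }

    /* 3. A real pool would release its memory blocks here.
     *    The pool structure itself survives and can be reused. */
}

void toy_destroy_pool(toy_pool *p)
{
    toy_clear_pool(p);

    /* Unlink from the parent's list of children, then free the pool. */
    if (p->parent != NULL) {
        toy_pool **link = &p->parent->first_child;
        while (*link != p)
            link = &(*link)->sibling;
        *link = p->sibling;
    }
    free(p);
}

/* Sample cleanup: counts how many times it has been run. */
void toy_count_cleanup(void *data)
{
    ++*(int *)data;
}
```

<P>
Clearing a pool therefore leaves it empty but usable, while destroying
it also removes it from its parent.
</P>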
 | 
						|
<H3>Fine control --- creating and dealing with sub-pools, with a note
 | 
						|
on sub-requests</H3>
 | 
						|
 | 
						|
On rare occasions, too-free use of <CODE>ap_palloc()</CODE> and the
 | 
						|
associated primitives may result in undesirably profligate resource
 | 
						|
allocation.  You can deal with such a case by creating a
 | 
						|
<EM>sub-pool</EM>, allocating within the sub-pool rather than the main
 | 
						|
pool, and clearing or destroying the sub-pool, which releases the
 | 
						|
resources which were associated with it.  (This really <EM>is</EM> a
 | 
						|
rare situation; the only case in which it comes up in the standard
 | 
						|
module set is in case of listing directories, and then only with
 | 
						|
<EM>very</EM> large directories.  Unnecessary use of the primitives
 | 
						|
discussed here can hair up your code quite a bit, with very little
 | 
						|
gain). <P>
 | 
						|
 | 
						|
The primitive for creating a sub-pool is <CODE>ap_make_sub_pool</CODE>,
 | 
						|
which takes another pool (the parent pool) as an argument.  When the
 | 
						|
main pool is cleared, the sub-pool will be destroyed.  The sub-pool
 | 
						|
may also be cleared or destroyed at any time, by calling the functions
 | 
						|
<CODE>ap_clear_pool</CODE> and <CODE>ap_destroy_pool</CODE>, respectively.
 | 
						|
(The difference is that <CODE>ap_clear_pool</CODE> frees resources
 | 
						|
associated with the pool, while <CODE>ap_destroy_pool</CODE> also
 | 
						|
deallocates the pool itself.  In the former case, you can allocate new
 | 
						|
resources within the pool, and clear it again, and so forth; in the
 | 
						|
latter case, it is simply gone). <P>
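The effect of clearing a scratch pool inside a loop can be sketched as
follows.  This is a toy model, not the directory-indexing code: the
names <CODE>scratch_pool</CODE>, <CODE>scratch_alloc</CODE>,
<CODE>scratch_clear</CODE> and <CODE>list_entries</CODE> are
hypothetical, and the pool is a small fixed buffer that records its
high-water mark so the saving is visible.

```c
#include <stddef.h>
#include <string.h>

#define SCRATCH_SIZE 256     /* assumed capacity of the toy sub-pool */

typedef struct {
    char   buf[SCRATCH_SIZE];
    size_t used;             /* bytes currently allocated */
    size_t peak;             /* largest value 'used' has reached */
} scratch_pool;

void scratch_init(scratch_pool *p)
{
    p->used = 0;
    p->peak = 0;
}

/* Bump allocation for character data; returns NULL when full. */
char *scratch_alloc(scratch_pool *p, size_t n)
{
    char *out;

    if (n > SCRATCH_SIZE - p->used)
        return NULL;
    out = p->buf + p->used;
    p->used += n;
    if (p->used > p->peak)
        p->peak = p->used;
    return out;
}

void scratch_clear(scratch_pool *p)
{
    p->used = 0;
}

/* Build one output line per name in scratch memory, then clear the
 * scratch pool before moving on.  Returns the number of entries handled
 * and stores the total number of characters produced in *total_len. */
int list_entries(scratch_pool *p, const char *const *names, int count,
                 size_t *total_len)
{
    int i, handled = 0;

    *total_len = 0;
    for (i = 0; i < count; i++) {
        size_t n = strlen(names[i]);
        char *line = scratch_alloc(p, n + 2);   /* name, '\n', '\0' */

        if (line == NULL)
            break;
        memcpy(line, names[i], n);
        line[n] = '\n';
        line[n + 1] = '\0';
        *total_len += n + 1;    /* stand-in for sending the line out */
        handled++;

        scratch_clear(p);       /* give the memory back before the next entry */
    }
    return handled;
}
```

With the clear inside the loop, the peak usage is that of the single
largest entry rather than the sum over all entries. <P>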
 | 
						|
 | 
						|
One final note --- sub-requests have their own resource pools, which
 | 
						|
are sub-pools of the resource pool for the main request.  The polite
 | 
						|
way to reclaim the resources associated with a sub request which you
 | 
						|
have allocated (using the <CODE>ap_sub_req_...</CODE> functions)
 | 
						|
is <CODE>ap_destroy_sub_req</CODE>, which frees the resource pool.
 | 
						|
Before calling this function, be sure to copy anything that you care
 | 
						|
about which might be allocated in the sub-request's resource pool into
 | 
						|
someplace a little less volatile (for instance, the filename in its
 | 
						|
<CODE>request_rec</CODE> structure). <P>
 | 
						|
 | 
						|
(Again, under most circumstances, you shouldn't feel obliged to call
 | 
						|
this function; only 2K of memory or so are allocated for a typical sub
 | 
						|
request, and it will be freed anyway when the main request pool is
 | 
						|
cleared.  It is only when you are allocating many, many sub-requests
 | 
						|
for a single main request that you should seriously consider the
 | 
						|
<CODE>ap_destroy_...</CODE> functions).
 | 
						|
 | 
						|
<H2><A NAME="config">Configuration, commands and the like</A></H2>
 | 
						|
 | 
						|
One of the design goals for this server was to maintain external
 | 
						|
compatibility with the NCSA 1.3 server --- that is, to read the same
 | 
						|
configuration files, to process all the directives therein correctly,
 | 
						|
and in general to be a drop-in replacement for NCSA.  On the other
 | 
						|
hand, another design goal was to move as much of the server's
 | 
						|
functionality into modules which have as little as possible to do with
 | 
						|
the monolithic server core.  The only way to reconcile these goals is
 | 
						|
to move the handling of most commands from the central server into the
 | 
						|
modules.  <P>
 | 
						|
 | 
						|
However, just giving the modules command tables is not enough to
 | 
						|
divorce them completely from the server core.  The server has to
 | 
						|
remember the commands in order to act on them later.  That involves
 | 
						|
maintaining data which is private to the modules, and which can be
 | 
						|
either per-server, or per-directory.  Most things are per-directory,
 | 
						|
including in particular access control and authorization information,
 | 
						|
but also information on how to determine file types from suffixes,
 | 
						|
which can be modified by <CODE>AddType</CODE> and
 | 
						|
<CODE>DefaultType</CODE> directives, and so forth.  In general, the
 | 
						|
governing philosophy is that anything which <EM>can</EM> be made
 | 
						|
configurable by directory should be; per-server information is
 | 
						|
generally used in the standard set of modules for information like
 | 
						|
<CODE>Alias</CODE>es and <CODE>Redirect</CODE>s which come into play
 | 
						|
before the request is tied to a particular place in the underlying
 | 
						|
file system. <P>
 | 
						|
 | 
						|
Another requirement for emulating the NCSA server is being able to
 | 
						|
handle the per-directory configuration files, generally called
 | 
						|
<CODE>.htaccess</CODE> files, though even in the NCSA server they can
 | 
						|
contain directives which have nothing at all to do with access
 | 
						|
control.  Accordingly, after URI -> filename translation, but before
 | 
						|
performing any other phase, the server walks down the directory
 | 
						|
hierarchy of the underlying filesystem, following the translated
 | 
						|
pathname, to read any <CODE>.htaccess</CODE> files which might be
 | 
						|
present.  The information which is read in then has to be
 | 
						|
<EM>merged</EM> with the applicable information from the server's own
 | 
						|
config files (either from the <CODE>&lt;Directory&gt;</CODE> sections
in <CODE>access.conf</CODE>, or from defaults in
<CODE>srm.conf</CODE>, which actually behaves for most purposes almost
exactly like <CODE>&lt;Directory /&gt;</CODE>).<P>
 | 
						|
 | 
						|
Finally, after having served a request which involved reading
 | 
						|
<CODE>.htaccess</CODE> files, we need to discard the storage allocated
 | 
						|
for handling them.  That is solved the same way it is solved wherever
 | 
						|
else similar problems come up, by tying those structures to the
 | 
						|
per-transaction resource pool.  <P>
 | 
						|
 | 
						|
<H3><A NAME="per-dir">Per-directory configuration structures</A></H3>
 | 
						|
 | 
						|
Let's look at how all of this plays out in <CODE>mod_mime.c</CODE>,
which defines the file typing handler which emulates the NCSA server's
behavior of determining file types from suffixes.  What we'll be
looking at, here, is the code which implements the
<CODE>AddType</CODE> and <CODE>AddEncoding</CODE> commands.  These
commands can appear in <CODE>.htaccess</CODE> files, so they must be
handled in the module's private per-directory data, which in fact
consists of two separate <CODE>table</CODE>s for MIME types and
encoding information, and is declared as follows:
 | 
						|
 | 
						|
<PRE>
typedef struct {
    table *forced_types;      /* Additional AddTyped stuff */
    table *encoding_types;    /* Added with AddEncoding... */
} mime_dir_config;
</PRE>
 | 
						|
 | 
						|
When the server is reading a configuration file, or
<CODE>&lt;Directory&gt;</CODE> section, which includes one of the MIME
module's commands, it needs to create a <CODE>mime_dir_config</CODE>
structure, so those commands have something to act on.  It does this
by invoking the function it finds in the module's `create per-dir
config' slot, with two arguments: the name of the directory to which
this configuration information applies (or <CODE>NULL</CODE> for
<CODE>srm.conf</CODE>), and a pointer to a resource pool in which the
allocation should happen. <P>
 | 
						|
 | 
						|
(If we are reading a <CODE>.htaccess</CODE> file, that resource pool
 | 
						|
is the per-request resource pool for the request; otherwise it is a
 | 
						|
resource pool which is used for configuration data, and cleared on
 | 
						|
restarts.  Either way, it is important for the structure being created
 | 
						|
to vanish when the pool is cleared, by registering a cleanup on the
 | 
						|
pool if necessary). <P>
 | 
						|
 | 
						|
For the MIME module, the per-dir config creation function just
<CODE>ap_palloc</CODE>s the structure above, and creates a couple of
<CODE>table</CODE>s to fill it.  That looks like this:
 | 
						|
 | 
						|
<PRE>
void *create_mime_dir_config (pool *p, char *dummy)
{
    mime_dir_config *new =
      (mime_dir_config *) ap_palloc (p, sizeof(mime_dir_config));

    new->forced_types = ap_make_table (p, 4);
    new->encoding_types = ap_make_table (p, 4);

    return new;
}
</PRE>
 | 
						|
 | 
						|
Now, suppose we've just read in a <CODE>.htaccess</CODE> file.  We
already have the per-directory configuration structure for the next
directory up in the hierarchy.  If the <CODE>.htaccess</CODE> file we
just read in didn't have any <CODE>AddType</CODE> or
<CODE>AddEncoding</CODE> commands, the parent directory's per-directory
config structure for the MIME module is still valid, and we can just use it.
Otherwise, we need to merge the two structures somehow. <P>
 | 
						|
 | 
						|
To do that, the server invokes the module's per-directory config merge
 | 
						|
function, if one is present.  That function takes three arguments:
 | 
						|
the two structures being merged, and a resource pool in which to
 | 
						|
allocate the result.  For the MIME module, all that needs to be done
 | 
						|
is overlay the tables from the new per-directory config structure with
 | 
						|
those from the parent:
 | 
						|
 | 
						|
<PRE>
void *merge_mime_dir_configs (pool *p, void *parent_dirv, void *subdirv)
{
    mime_dir_config *parent_dir = (mime_dir_config *)parent_dirv;
    mime_dir_config *subdir = (mime_dir_config *)subdirv;
    mime_dir_config *new =
      (mime_dir_config *)ap_palloc (p, sizeof(mime_dir_config));

    new->forced_types = ap_overlay_tables (p, subdir->forced_types,
                                        parent_dir->forced_types);
    new->encoding_types = ap_overlay_tables (p, subdir->encoding_types,
                                          parent_dir->encoding_types);

    return new;
}
</PRE>
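What "overlaying" buys you is that a lookup sees the subdirectory's
entries before the parent's.  The following is a minimal sketch of that
precedence, not the server's table code: <CODE>toy_entry</CODE>,
<CODE>toy_get</CODE> and <CODE>toy_overlay</CODE> are hypothetical
names, tables are plain arrays ending in a <CODE>NULL</CODE> key, and
the result goes into a caller-supplied buffer rather than a pool.

```c
#include <stddef.h>
#include <string.h>

typedef struct {
    const char *key;
    const char *val;
} toy_entry;

/* Return the value of the first entry matching key, or NULL if none. */
const char *toy_get(const toy_entry *t, const char *key)
{
    for (; t->key != NULL; t++)
        if (strcmp(t->key, key) == 0)
            return t->val;
    return NULL;
}

/* Write the overlay's entries followed by the base's entries into out,
 * which has room for cap entries including the terminator.  Returns the
 * number of entries copied (entries that do not fit are dropped). */
size_t toy_overlay(toy_entry *out, size_t cap,
                   const toy_entry *overlay, const toy_entry *base)
{
    const toy_entry *src[2];
    size_t n = 0;
    int i;

    if (cap == 0)
        return 0;
    src[0] = overlay;       /* searched first, so it wins on conflicts */
    src[1] = base;
    for (i = 0; i < 2; i++) {
        const toy_entry *e;
        for (e = src[i]; e->key != NULL && n + 1 < cap; e++)
            out[n++] = *e;
    }
    out[n].key = NULL;
    out[n].val = NULL;
    return n;
}
```

Because the merged table is a new object, neither the parent's nor the
subdirectory's table is modified, which matters when the parent's
configuration outlives the request. <P>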
 | 
						|
 | 
						|
As a note --- if there is no per-directory merge function present, the
 | 
						|
server will just use the subdirectory's configuration info, and ignore
 | 
						|
the parent's.  For some modules, that works just fine (<EM>e.g.</EM>, for the
 | 
						|
includes module, whose per-directory configuration information
 | 
						|
consists solely of the state of the <CODE>XBITHACK</CODE>), and for
 | 
						|
those modules, you can just not declare one, and leave the
 | 
						|
corresponding structure slot in the module itself <CODE>NULL</CODE>.<P>
 | 
						|
 | 
						|
<H3><A NAME="commands">Command handling</A></H3>
 | 
						|
 | 
						|
Now that we have these structures, we need to be able to figure out
 | 
						|
how to fill them.  That involves processing the actual
 | 
						|
<CODE>AddType</CODE> and <CODE>AddEncoding</CODE> commands.  To find
 | 
						|
commands, the server looks in the module's <CODE>command table</CODE>.
 | 
						|
That table contains information on how many arguments the commands
 | 
						|
take, and in what formats, where it is permitted, and so forth.  That
 | 
						|
information is sufficient to allow the server to invoke most
 | 
						|
command-handling functions with pre-parsed arguments.  Without further
 | 
						|
ado, let's look at the <CODE>AddType</CODE> command handler, which
 | 
						|
looks like this (the <CODE>AddEncoding</CODE> command looks basically
 | 
						|
the same, and won't be shown here):
 | 
						|
 | 
						|
<PRE>
char *add_type(cmd_parms *cmd, mime_dir_config *m, char *ct, char *ext)
{
    if (*ext == '.') ++ext;
    ap_table_set (m->forced_types, ext, ct);
    return NULL;
}
</PRE>
 | 
						|
 | 
						|
This command handler is unusually simple.  As you can see, it takes
four arguments: a pointer to a <CODE>cmd_parms</CODE> structure,
the per-directory configuration structure for the module in question,
and two pre-parsed arguments.
The <CODE>cmd_parms</CODE> structure contains a bunch of arguments which
are frequently of
use to some, but not all, commands, including a resource pool (from
which memory can be allocated, and to which cleanups should be tied),
and the (virtual) server being configured, from which the module's
per-server configuration data can be obtained if required.<P>
 | 
						|
 | 
						|
Another way in which this particular command handler is unusually
 | 
						|
simple is that there are no error conditions which it can encounter.
 | 
						|
If there were, it could return an error message instead of
 | 
						|
<CODE>NULL</CODE>; this causes an error to be printed out on the
 | 
						|
server's <CODE>stderr</CODE>, followed by a quick exit, if it is in
 | 
						|
the main config files; for a <CODE>.htaccess</CODE> file, the syntax
 | 
						|
error is logged in the server error log (along with an indication of
 | 
						|
where it came from), and the request is bounced with a server error
 | 
						|
response (HTTP error status, code 500). <P>
 | 
						|
 | 
						|
The MIME module's command table has entries for these commands, which
 | 
						|
look like this:
 | 
						|
 | 
						|
<PRE>
command_rec mime_cmds[] = {
{ "AddType", add_type, NULL, OR_FILEINFO, TAKE2,
    "a mime type followed by a file extension" },
{ "AddEncoding", add_encoding, NULL, OR_FILEINFO, TAKE2,
    "an encoding (e.g., gzip), followed by a file extension" },
{ NULL }
};
</PRE>
 | 
						|
 | 
						|
The entries in these tables are:
 | 
						|
 | 
						|
<UL>
 | 
						|
  <LI> The name of the command
 | 
						|
  <LI> The function which handles it
 | 
						|
  <LI> A <CODE>(void *)</CODE> pointer, which is passed in the
 | 
						|
       <CODE>cmd_parms</CODE> structure to the command handler ---
 | 
						|
       this is useful in case many similar commands are handled by the
 | 
						|
       same function.
 | 
						|
  <LI> A bit mask indicating where the command may appear.  There are
 | 
						|
       mask bits corresponding to each <CODE>AllowOverride</CODE>
 | 
						|
       option, and an additional mask bit, <CODE>RSRC_CONF</CODE>,
 | 
						|
       indicating that the command may appear in the server's own
 | 
						|
       config files, but <EM>not</EM> in any <CODE>.htaccess</CODE>
 | 
						|
       file.
 | 
						|
  <LI> A flag indicating how many arguments the command handler wants
       pre-parsed, and how they should be passed in.
       <CODE>TAKE2</CODE> indicates two pre-parsed arguments.  Other
       options are <CODE>TAKE1</CODE>, which indicates one pre-parsed
       argument; <CODE>FLAG</CODE>, which indicates that the argument
       should be <CODE>On</CODE> or <CODE>Off</CODE>, and is passed in
       as a boolean flag; and <CODE>RAW_ARGS</CODE>, which causes the
       server to give the command the raw, unparsed arguments
       (everything but the command name itself).  There is also
       <CODE>ITERATE</CODE>, which means that the handler looks the
       same as <CODE>TAKE1</CODE>, but that if multiple arguments are
       present, it should be called multiple times, and finally
       <CODE>ITERATE2</CODE>, which indicates that the command handler
       looks like a <CODE>TAKE2</CODE>, but if more arguments are
       present, then it should be called multiple times, holding the
       first argument constant.
 | 
						|
  <LI> Finally, we have a string which describes the arguments that
 | 
						|
       should be present.  If the arguments in the actual config file
 | 
						|
       are not as required, this string will be used to help give a
 | 
						|
       more specific error message.  (You can safely leave this
 | 
						|
       <CODE>NULL</CODE>).
 | 
						|
</UL>
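How a table like this gets used can be sketched with a toy dispatcher.
This is an illustration only and much simpler than the server's config
reader: the <CODE>toy_</CODE> names are hypothetical, handlers take at
most two arguments, the `where may it appear' mask is left out, and
command names are compared exactly.

```c
#include <stddef.h>
#include <string.h>

/* Handlers return NULL on success or an error message. */
typedef const char *(*toy_handler)(void *config,
                                   const char *arg1, const char *arg2);

typedef struct {
    const char *name;     /* the name of the command */
    toy_handler func;     /* the function which handles it */
    int         nargs;    /* pre-parsed arguments expected: 1 or 2 */
    const char *errmsg;   /* describes the arguments; may be NULL */
} toy_command;

/* A stand-in for the module's private configuration. */
typedef struct {
    const char *ext;
    const char *type;
} toy_mime_config;

const char *toy_add_type(void *config, const char *ct, const char *ext)
{
    toy_mime_config *m = config;

    if (*ext == '.')
        ++ext;
    m->ext = ext;
    m->type = ct;
    return NULL;
}

const toy_command toy_cmds[] = {
    { "AddType", toy_add_type, 2,
      "a mime type followed by a file extension" },
    { NULL, NULL, 0, NULL }
};

/* Find the command by name, check the argument count, and call the
 * handler with pre-parsed arguments.  Returns NULL on success. */
const char *toy_dispatch(const toy_command *cmds, void *config,
                         const char *name, int argc,
                         const char *const *argv)
{
    const toy_command *c;

    for (c = cmds; c->name != NULL; c++) {
        if (strcmp(c->name, name) != 0)
            continue;
        if (argc != c->nargs)
            return c->errmsg ? c->errmsg : "wrong number of arguments";
        return c->func(config, argv[0], argc > 1 ? argv[1] : NULL);
    }
    return "unknown command";
}
```

The point of the table is that the argument counting lives in one place,
so each handler only sees arguments that have already been checked. <P>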
 | 
						|
 | 
						|
Finally, having set this all up, we have to use it.  This is
ultimately done in the module's handlers, specifically in its
file-typing handler, which looks more or less like this; note that the
per-directory configuration structure is extracted from the
<CODE>request_rec</CODE>'s per-directory configuration vector by using
the <CODE>ap_get_module_config</CODE> function.
 | 
						|
 | 
						|
<PRE>
int find_ct(request_rec *r)
{
    int i;
    char *fn = ap_pstrdup (r->pool, r->filename);  /* private copy; truncated below */
    mime_dir_config *conf = (mime_dir_config *)
             ap_get_module_config(r->per_dir_config, &mime_module);
    char *type;

    if (S_ISDIR(r->finfo.st_mode)) {
        r->content_type = DIR_MAGIC_TYPE;
        return OK;
    }

    /* find the last extension; no '.' means there is nothing for us to do */
    if((i=ap_rind(fn,'.')) < 0) return DECLINED;
    ++i;

    if ((type = ap_table_get (conf->encoding_types, &fn[i])))
    {
        r->content_encoding = type;

        /* go back to previous extension to try to use it as a type */

        fn[i-1] = '\0';
        if((i=ap_rind(fn,'.')) < 0) return OK;
        ++i;
    }

    if ((type = ap_table_get (conf->forced_types, &fn[i])))
    {
        r->content_type = type;
    }

    return OK;
}
</PRE>
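The two-step suffix lookup can be tried outside the server with a small
stand-alone version.  This is a sketch, not the module's code:
<CODE>toy_map</CODE>, <CODE>toy_lookup</CODE> and
<CODE>toy_find_ct</CODE> are hypothetical names, the tables are plain
arrays ending in a <CODE>NULL</CODE> extension, and
<CODE>strrchr</CODE> stands in for <CODE>ap_rind</CODE>.

```c
#include <stddef.h>
#include <string.h>

typedef struct {
    const char *ext;
    const char *value;
} toy_map;

typedef struct {
    const char *type;       /* NULL if no type was found */
    const char *encoding;   /* NULL if no encoding was found */
} toy_result;

const char *toy_lookup(const toy_map *m, const char *ext)
{
    for (; m->ext != NULL; m++)
        if (strcmp(m->ext, ext) == 0)
            return m->value;
    return NULL;
}

/* fn must be a writable copy of the filename: like the handler above,
 * this truncates it in place when an encoding suffix is found. */
toy_result toy_find_ct(char *fn, const toy_map *types,
                       const toy_map *encodings)
{
    toy_result res = { NULL, NULL };
    char *dot = strrchr(fn, '.');

    if (dot == NULL)
        return res;                 /* no extension at all */

    res.encoding = toy_lookup(encodings, dot + 1);
    if (res.encoding != NULL) {
        *dot = '\0';                /* chop off the encoding suffix */
        dot = strrchr(fn, '.');     /* and look at the one before it */
        if (dot == NULL)
            return res;
    }

    res.type = toy_lookup(types, dot + 1);
    return res;
}
```

Given a type table mapping <CODE>html</CODE> and an encoding table
mapping <CODE>gz</CODE>, a name such as <CODE>index.html.gz</CODE>
yields both an encoding and a type, while <CODE>index.html</CODE> yields
only a type.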
 | 
						|
 | 
						|
<H3><A NAME="servconf">Side notes --- per-server configuration, virtual
 | 
						|
 servers, <EM>etc</EM>.</A></H3>
 | 
						|
 | 
						|
The ideas behind per-server module configuration are basically
the same as those for per-directory configuration; there is a creation
function and a merge function, the latter being invoked where a
virtual server has partially overridden the base server configuration,
and a combined structure must be computed.  (As with per-directory
configuration, the default if no merge function is specified, and a
module is configured in some virtual server, is that the base
configuration is simply ignored). <P>
 | 
						|
 | 
						|
The only substantial difference is that when a command needs to
 | 
						|
configure the per-server private module data, it needs to go to the
 | 
						|
<CODE>cmd_parms</CODE> data to get at it.  Here's an example, from the
 | 
						|
alias module, which also indicates how a syntax error can be returned
 | 
						|
(note that the per-directory configuration argument to the command
 | 
						|
handler is declared as a dummy, since the module doesn't actually have
 | 
						|
per-directory config data):
 | 
						|
 | 
						|
<PRE>
char *add_redirect(cmd_parms *cmd, void *dummy, char *f, char *url)
{
    server_rec *s = cmd->server;
    alias_server_conf *conf = (alias_server_conf *)
            ap_get_module_config(s->module_config,&alias_module);
    alias_entry *new = ap_push_array (conf->redirects);

    if (!ap_is_url (url)) return "Redirect to non-URL";

    new->fake = f; new->real = url;
    return NULL;
}
</PRE>
 | 
						|
<!--#include virtual="footer.html" -->
 | 
						|
</BODY></HTML>
 |