mirror of
https://github.com/apache/httpd.git
synced 2025-08-26 05:42:34 +03:00
Thanks to Jean-Philippe BAUDOUIN <baudouin@cyberaccess.fr> for hints. git-svn-id: https://svn.apache.org/repos/asf/httpd/httpd/trunk@81944 13f79535-47bb-0310-9956-ffa450edef68
397 lines
16 KiB
HTML
397 lines
16 KiB
HTML
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
|
|
<HTML><HEAD>
|
|
<TITLE>An In-Depth Discussion of VirtualHost Matching</TITLE>
|
|
</HEAD>
|
|
|
|
<!-- Background white, links blue (unvisited), navy (visited), red (active) -->
|
|
<BODY
|
|
BGCOLOR="#FFFFFF"
|
|
TEXT="#000000"
|
|
LINK="#0000FF"
|
|
VLINK="#000080"
|
|
ALINK="#FF0000"
|
|
>
|
|
<!--#include virtual="header.html" -->
|
|
<H1 ALIGN="CENTER">An In-Depth Discussion of VirtualHost Matching</H1>
|
|
|
|
<P>This is a very rough document that was probably out of date the moment
|
|
it was written. It attempts to explain exactly what the code does when
|
|
deciding what virtual host to serve a hit from. It's provided on the
|
|
assumption that something is better than nothing. The server version
|
|
under discussion is Apache 1.2.
|
|
|
|
<P>If you just want to "make it work" without understanding
|
|
how, there's a <A HREF="#whatworks">What Works</A> section at the bottom.
|
|
|
|
<H3>Config File Parsing</H3>
|
|
|
|
<P>There is a main_server which consists of all the definitions appearing
|
|
outside of <CODE>VirtualHost</CODE> sections. There are virtual servers,
|
|
called <EM>vhosts</EM>, which are defined by
|
|
<A
|
|
HREF="../mod/core.html#virtualhost"
|
|
><SAMP>VirtualHost</SAMP></A>
|
|
sections.
|
|
|
|
<P>The directives
|
|
<A
|
|
HREF="../mod/core.html#port"
|
|
><SAMP>Port</SAMP></A>,
|
|
<A
|
|
HREF="../mod/core.html#servername"
|
|
><SAMP>ServerName</SAMP></A>,
|
|
<A
|
|
HREF="../mod/core.html#serverpath"
|
|
><SAMP>ServerPath</SAMP></A>,
|
|
and
|
|
<A
|
|
HREF="../mod/core.html#serveralias"
|
|
><SAMP>ServerAlias</SAMP></A>
|
|
can appear anywhere within the definition of
|
|
a server. However, each appearance overrides the previous appearance
|
|
(within that server).
|
|
|
|
<P>The default value of the <CODE>Port</CODE> field for main_server
|
|
is 80. The main_server has no default <CODE>ServerName</CODE>,
|
|
<CODE>ServerPath</CODE>, or <CODE>ServerAlias</CODE>.
|
|
|
|
<P>In the absence of any
|
|
<A
|
|
HREF="../mod/core.html#listen"
|
|
><SAMP>Listen</SAMP></A>
|
|
directives, the (final if there
|
|
are multiple) <CODE>Port</CODE> directive in the main_server indicates
|
|
which port httpd will listen on.
|
|
|
|
<P> The <CODE>Port</CODE> and <CODE>ServerName</CODE> directives for
|
|
any server main or virtual are used when generating URLs such as during
|
|
redirects.
|
|
|
|
<P> Each address appearing in the <CODE>VirtualHost</CODE> directive
|
|
can have an optional port. If the port is unspecified it defaults to
|
|
the value of the main_server's most recent <CODE>Port</CODE> statement.
|
|
The special port <SAMP>*</SAMP> indicates a wildcard that matches any port.
|
|
Collectively the entire set of addresses (including multiple
|
|
<SAMP>A</SAMP> record
|
|
results from DNS lookups) are called the vhost's <EM>address set</EM>.
|
|
|
|
<P> The magic <CODE>_default_</CODE> address has significance during
|
|
the matching algorithm. It essentially matches any unspecified address.
|
|
|
|
<P> After parsing the <CODE>VirtualHost</CODE> directive, the vhost server
|
|
is given a default <CODE>Port</CODE> equal to the port assigned to the
|
|
first name in its <CODE>VirtualHost</CODE> directive. The complete
|
|
list of names in the <CODE>VirtualHost</CODE> directive are treated
|
|
just like a <CODE>ServerAlias</CODE> (but are not overridden by any
|
|
<CODE>ServerAlias</CODE> statement). Note that subsequent <CODE>Port</CODE>
|
|
statements for this vhost will not affect the ports assigned in the
|
|
address set.
|
|
|
|
<P>
|
|
All vhosts are stored in a list which is in the reverse order that
|
|
they appeared in the config file. For example, if the config file is:
|
|
|
|
<BLOCKQUOTE><PRE>
|
|
<VirtualHost A>
|
|
...
|
|
</VirtualHost>
|
|
|
|
<VirtualHost B>
|
|
...
|
|
</VirtualHost>
|
|
|
|
<VirtualHost C>
|
|
...
|
|
</VirtualHost>
|
|
</PRE></BLOCKQUOTE>
|
|
|
|
Then the list will be ordered: main_server, C, B, A. Keep this in mind.
|
|
|
|
<P>
|
|
After parsing has completed, the list of servers is scanned, and various
|
|
merges and default values are set. In particular:
|
|
|
|
<OL>
|
|
<LI>If a vhost has no
|
|
<A
|
|
HREF="../mod/core.html#serveradmin"
|
|
><CODE>ServerAdmin</CODE></A>,
|
|
<A
|
|
HREF="../mod/core.html#resourceconfig"
|
|
><CODE>ResourceConfig</CODE></A>,
|
|
<A
|
|
HREF="../mod/core.html#accessconfig"
|
|
><CODE>AccessConfig</CODE></A>,
|
|
<A
|
|
HREF="../mod/core.html#timeout"
|
|
><CODE>Timeout</CODE></A>,
|
|
<A
|
|
HREF="../mod/core.html#keepalivetimeout"
|
|
><CODE>KeepAliveTimeout</CODE></A>,
|
|
<A
|
|
HREF="../mod/core.html#keepalive"
|
|
><CODE>KeepAlive</CODE></A>,
|
|
<A
|
|
HREF="../mod/core.html#maxkeepaliverequests"
|
|
><CODE>MaxKeepAliveRequests</CODE></A>,
|
|
or
|
|
<A
|
|
HREF="../mod/core.html#sendbuffersize"
|
|
><CODE>SendBufferSize</CODE></A>
|
|
directive then the respective value is
|
|
inherited from the main_server. (That is, inherited from whatever
|
|
the final setting of that value is in the main_server.)
|
|
|
|
<LI>The "lookup defaults" that define the default directory
|
|
permissions
|
|
for a vhost are merged with those of the main server. This includes
|
|
any per-directory configuration information for any module.
|
|
|
|
<LI>The per-server configs for each module from the main_server are
|
|
merged into the vhost server.
|
|
</OL>
|
|
|
|
Essentially, the main_server is treated as "defaults" or a
|
|
"base" on
|
|
which to build each vhost. But the positioning of these main_server
|
|
definitions in the config file is largely irrelevant -- the entire
|
|
config of the main_server has been parsed when this final merging occurs.
|
|
So even if a main_server definition appears after a vhost definition
|
|
it might affect the vhost definition.
|
|
|
|
<P> If the main_server has no <CODE>ServerName</CODE> at this point,
|
|
then the hostname of the machine that httpd is running on is used
|
|
instead. We will call the <EM>main_server address set</EM> those IP
|
|
addresses returned by a DNS lookup on the <CODE>ServerName</CODE> of
|
|
the main_server.
|
|
|
|
<P> Now a pass is made through the vhosts to fill in any missing
|
|
<CODE>ServerName</CODE> fields and to classify the vhost as either
|
|
an <EM>IP-based</EM> vhost or a <EM>name-based</EM> vhost. A vhost is
|
|
considered a name-based vhost if any of its address set overlaps the
|
|
main_server (the port associated with each address must match the
|
|
main_server's <CODE>Port</CODE>). Otherwise it is considered an IP-based
|
|
vhost.
|
|
|
|
<P> For any undefined <CODE>ServerName</CODE> fields, a name-based vhost
|
|
defaults to the address given first in the <CODE>VirtualHost</CODE>
|
|
statement defining the vhost. Any vhost that includes the magic
|
|
<SAMP>_default_</SAMP> wildcard is given the same <CODE>ServerName</CODE> as
|
|
the main_server. Otherwise the vhost (which is necessarily an IP-based
|
|
vhost) is given a <CODE>ServerName</CODE> based on the result of a reverse
|
|
DNS lookup on the first address given in the <CODE>VirtualHost</CODE>
|
|
statement.
|
|
|
|
<P>
|
|
|
|
<H3>Vhost Matching</H3>
|
|
|
|
|
|
<P><STRONG>Apache 1.3 differs from what is documented
|
|
here, and documentation still has to be written.</STRONG>
|
|
|
|
<P>
|
|
The server determines which vhost to use for a request as follows:
|
|
|
|
<P> <CODE>find_virtual_server</CODE>: When the connection is first made
|
|
by the client, the local IP address (the IP address to which the client
|
|
connected) is looked up in the server list. A vhost is matched if it
|
|
is an IP-based vhost, the IP address matches and the port matches
|
|
(taking into account wildcards).
|
|
|
|
<P> If no vhosts are matched then the last occurrence, if it appears,
|
|
of a <SAMP>_default_</SAMP> address (which if you recall the ordering of the
|
|
server list mentioned above means that this would be the first occurrence
|
|
of <SAMP>_default_</SAMP> in the config file) is matched.
|
|
|
|
<P> In any event, if nothing above has matched, then the main_server is
|
|
matched.
|
|
|
|
<P> The vhost resulting from the above search is stored with data
|
|
about the connection. We'll call this the <EM>connection vhost</EM>.
|
|
The connection vhost is constant over all requests in a particular TCP/IP
|
|
session -- that is, over all requests in a KeepAlive/persistent session.
|
|
|
|
<P> For each request made on the connection the following sequence of
|
|
events further determines the actual vhost that will be used to serve
|
|
the request.
|
|
|
|
<P> <CODE>check_fulluri</CODE>: If the requestURI is an absoluteURI, that
|
|
is it includes <CODE>http://hostname/</CODE>, then an attempt is made to
|
|
determine if the hostname's address (and optional port) match that of
|
|
the connection vhost. If it does then the hostname portion of the URI
|
|
is saved as the <EM>request_hostname</EM>. If it does not match, then the
|
|
URI remains untouched. <STRONG>Note</STRONG>: to achieve this address
|
|
comparison,
|
|
the hostname supplied goes through a DNS lookup unless it matches the
|
|
<CODE>ServerName</CODE> or the local IP address of the client's socket.
|
|
|
|
<P> <CODE>parse_uri</CODE>: If the URI begins with a protocol
|
|
(<EM>i.e.</EM>, <CODE>http:</CODE>, <CODE>ftp:</CODE>) then the request is
|
|
considered a proxy request. Note that even though we may have stripped
|
|
an <CODE>http://hostname/</CODE> in the previous step, this could still
|
|
be a proxy request.
|
|
|
|
<P> <CODE>read_request</CODE>: If the request does not have a hostname
|
|
from the earlier step, then any <CODE>Host:</CODE> header sent by the
|
|
client is used as the request hostname.
|
|
|
|
<P> <CODE>check_hostalias</CODE>: If the request now has a hostname,
|
|
then an attempt is made to match for this hostname. The first step
|
|
of this match is to compare any port, if one was given in the request,
|
|
against the <CODE>Port</CODE> field of the connection vhost. If there's
|
|
a mismatch then the vhost used for the request is the connection vhost.
|
|
(This is a bug, see observations.)
|
|
|
|
<P>
|
|
If the port matches, then httpd scans the list of vhosts starting with
|
|
the next server <STRONG>after</STRONG> the connection vhost. This scan does not
|
|
stop if there are any matches, it goes through all possible vhosts,
|
|
and in the end uses the last match it found. The comparisons performed
|
|
are as follows:
|
|
|
|
<UL>
|
|
<LI>Compare the request hostname:port with the vhost
|
|
<CODE>ServerName</CODE> and <CODE>Port</CODE>.
|
|
|
|
<LI>Compare the request hostname against any and all addresses given in
|
|
the <CODE>VirtualHost</CODE> directive for this vhost.
|
|
|
|
<LI>Compare the request hostname against the <CODE>ServerAlias</CODE>
|
|
given for the vhost.
|
|
</UL>
|
|
|
|
<P>
|
|
<CODE>check_serverpath</CODE>: If the request has no hostname
|
|
(back up a few paragraphs) then a scan similar to the one
|
|
in <CODE>check_hostalias</CODE> is performed to match any
|
|
<CODE>ServerPath</CODE> directives given in the vhosts. Note that the
|
|
<STRONG>last match</STRONG> is used regardless (again consider the ordering of
|
|
the virtual hosts).
|
|
|
|
<H3>Observations</H3>
|
|
|
|
<UL>
|
|
|
|
<LI>It is difficult to define an IP-based vhost for the machine's
|
|
"main IP address". You essentially have to create a bogus
|
|
<CODE>ServerName</CODE> for the main_server that does not match the
|
|
machine's IPs.
|
|
<P>
|
|
|
|
<LI>During the scans in both <CODE>check_hostalias</CODE> and
|
|
<CODE>check_serverpath</CODE> no check is made that the vhost being
|
|
scanned is actually a name-based vhost. This means, for example, that
|
|
it's possible to match an IP-based vhost through another address. But
|
|
because the scan starts in the vhost list at the first vhost that
|
|
matched the local IP address of the connection, not all IP-based vhosts
|
|
can be matched.
|
|
<P>
|
|
Consider the config file above with three vhosts A, B, C. Suppose
|
|
that B is a named-based vhost, and A and C are IP-based vhosts. If
|
|
a request comes in on B or C's address containing a header
|
|
"<SAMP>Host: A</SAMP>" then
|
|
it will be served from A's config. If a request comes in on A's
|
|
address then it will always be served from A's config regardless of
|
|
any Host: header.
|
|
</P>
|
|
|
|
<LI>Unless you have a <SAMP>_default_</SAMP> vhost,
|
|
it doesn't matter if you mix name-based vhosts in amongst IP-based
|
|
vhosts. During the <CODE>find_virtual_server</CODE> phase above no
|
|
named-based vhost will be matched, so the main_server will remain the
|
|
connection vhost. Then scans will cover all vhosts in the vhost list.
|
|
<P>
|
|
If you do have a <SAMP>_default_</SAMP> vhost, then you cannot place
|
|
named-based vhosts after it in the config. This is because on any
|
|
connection to the main server IPs the connection vhost will always be
|
|
the <SAMP>_default_</SAMP> vhost since none of the name-based are
|
|
considered during <CODE>find_virtual_server</CODE>.
|
|
</P>
|
|
|
|
<LI>You should never specify DNS names in <CODE>VirtualHost</CODE>
|
|
directives because it will force your server to rely on DNS to boot.
|
|
Furthermore it poses a security threat if you do not control the
|
|
DNS for all the domains listed.
|
|
<A HREF="dns-caveats.html">There's more information
|
|
available on this and the next two topics</A>.
|
|
<P>
|
|
|
|
<LI><CODE>ServerName</CODE> should always be set for each vhost. Otherwise
|
|
A DNS lookup is required for each vhost.
|
|
<P>
|
|
|
|
<LI>A DNS lookup is always required for the main_server's
|
|
<CODE>ServerName</CODE> (or to generate that if it isn't specified
|
|
in the config).
|
|
<P>
|
|
|
|
<LI>If a <CODE>ServerPath</CODE> directive exists which is a prefix of
|
|
another <CODE>ServerPath</CODE> directive that appears later in
|
|
the configuration file, then the former will always be matched
|
|
and the latter will never be matched. (That is assuming that no
|
|
Host header was available to disambiguate the two.)
|
|
<P>
|
|
|
|
<LI>If a vhost that would otherwise be a name-vhost includes a
|
|
<CODE>Port</CODE> statement that doesn't match the main_server
|
|
<CODE>Port</CODE> then it will be considered an IP-based vhost.
|
|
Then <CODE>find_virtual_server</CODE> will match it (because
|
|
the ports associated with each address in the address set default
|
|
to the port of the main_server) as the connection vhost. Then
|
|
<CODE>check_hostalias</CODE> will refuse to check any other name-based
|
|
vhost because of the port mismatch. The result is that the vhost
|
|
will steal all hits going to the main_server address.
|
|
<P>
|
|
|
|
<LI>If two IP-based vhosts have an address in common, the vhost appearing
|
|
later in the file is always matched. Such a thing might happen
|
|
inadvertently. If the config has name-based vhosts and for some reason
|
|
the main_server <CODE>ServerName</CODE> resolves to the wrong address
|
|
then all the name-based vhosts will be parsed as ip-based vhosts.
|
|
Then the last of them will steal all the hits.
|
|
<P>
|
|
|
|
<LI>The last name-based vhost in the config is always matched for any hit
|
|
which doesn't match one of the other name-based vhosts.
|
|
|
|
</UL>
|
|
|
|
<H3><A NAME="whatworks">What Works</A></H3>
|
|
|
|
<P>In addition to the tips on the <A HREF="dns-caveats.html#tips">DNS
|
|
Issues</A> page, here are some further tips:
|
|
|
|
<UL>
|
|
|
|
<LI>Place all main_server definitions before any VirtualHost definitions.
|
|
(This is to aid the readability of the configuration -- the post-config
|
|
merging process makes it non-obvious that definitions mixed in around
|
|
virtualhosts might affect all virtualhosts.)
|
|
<P>
|
|
|
|
<LI>Arrange your VirtualHosts such
|
|
that all name-based virtual hosts come first, followed by IP-based
|
|
virtual hosts, followed by any <SAMP>_default_</SAMP> virtual host
|
|
<P>
|
|
|
|
<LI>Avoid <CODE>ServerPaths</CODE> which are prefixes of other
|
|
<CODE>ServerPaths</CODE>. If you cannot avoid this then you have to
|
|
ensure that the longer (more specific) prefix vhost appears earlier in
|
|
the configuration file than the shorter (less specific) prefix
|
|
(<EM>i.e.</EM>, "ServerPath /abc" should appear after
|
|
"ServerPath /abcdef").
|
|
<P>
|
|
|
|
<LI>Do not use <EM>port-based</EM> vhosts in the same server as
|
|
name-based vhosts. A loose definition for port-based is a vhost which
|
|
is determined by the port on the server (<EM>i.e.</EM>, one server with
|
|
ports 8000, 8080, and 80 - all of which have different configurations).
|
|
<P>
|
|
|
|
</UL>
|
|
|
|
<!--#include virtual="footer.html" -->
|
|
</BODY>
|
|
</HTML>
|