mirror of
https://github.com/apache/httpd.git
synced 2025-05-19 02:21:09 +03:00
corrections coming up. git-svn-id: https://svn.apache.org/repos/asf/httpd/httpd/trunk@80130 13f79535-47bb-0310-9956-ffa450edef68
155 lines
5.9 KiB
HTML
155 lines
5.9 KiB
HTML
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
|
|
<HTML>
|
|
<HEAD>
|
|
<META NAME="description" CONTENT="Some 'how to' tips for the Apache httpd server">
|
|
<META NAME="keywords" CONTENT="apache,redirect,robots,rotate,logfiles">
|
|
<TITLE>Apache HOWTO documentation</TITLE>
|
|
</HEAD>
|
|
|
|
<!-- Background white, links blue (unvisited), navy (visited), red (active) -->
|
|
<BODY
|
|
BGCOLOR="#FFFFFF"
|
|
TEXT="#000000"
|
|
LINK="#0000FF"
|
|
VLINK="#000080"
|
|
ALINK="#FF0000"
|
|
>
|
|
<!--#include virtual="header.html" -->
|
|
<H1 ALIGN="CENTER">Apache HOWTO documentation</H1>
|
|
|
|
How to:
|
|
<UL>
|
|
<LI><A HREF="#redirect">redirect an entire server or directory to a single URL</A>
|
|
<LI><A HREF="#logreset">reset your log files</A>
|
|
<LI><A HREF="#stoprob">stop/restrict robots</A>
|
|
</UL>
|
|
|
|
<HR>
|
|
<H2><A name="redirect">How to redirect an entire server or directory to a single URL</A></H2>
|
|
|
|
<P>There are two chief ways to redirect all requests for an entire
|
|
server to a single location: one which requires the use of
|
|
<CODE>mod_rewrite</CODE>, and another which uses a CGI script.
|
|
|
|
<P>First: if all you need to do is migrate a server from one name to
|
|
another, simply use the <CODE>Redirect</CODE> directive, as supplied
|
|
by <CODE>mod_alias</CODE>:
|
|
|
|
<BLOCKQUOTE><PRE>
|
|
Redirect / http://www.apache.org/
|
|
</PRE></BLOCKQUOTE>
|
|
|
|
<P>Since <CODE>Redirect</CODE> will forward along the complete path,
|
|
however, it may not be appropriate - for example, when the directory
|
|
structure has changed after the move, and you simply want to direct people
|
|
to the home page.
|
|
|
|
<P>The best option is to use the standard Apache module <CODE>mod_rewrite</CODE>.
|
|
If that module is compiled in, the following lines:
|
|
|
|
<BLOCKQUOTE><PRE>RewriteEngine On
|
|
RewriteRule /.* http://www.apache.org/ [R]
|
|
</PRE></BLOCKQUOTE>
|
|
|
|
This will send an HTTP 302 Redirect back to the client, and no matter
|
|
what they gave in the original URL, they'll be sent to
|
|
"http://www.apache.org".
|
|
|
|
The second option is to set up a <CODE>ScriptAlias</CODE> pointing to
|
|
a <STRONG>cgi script</STRONG> which outputs a 301 or 302 status and the location
|
|
of the other server.</P>
|
|
|
|
<P>By using a <STRONG>cgi-script</STRONG> you can intercept various requests and
|
|
treat them specially, e.g. you might want to intercept <STRONG>POST</STRONG>
|
|
requests, so that the client isn't redirected to a script on the other
|
|
server which expects POST information (a redirect will lose the POST
|
|
information.) You might also want to use a CGI script if you don't
|
|
want to compile mod_rewrite into your server.
|
|
|
|
<P>Here's how to redirect all requests to a script... In the server
|
|
configuration file,
|
|
<BLOCKQUOTE><PRE>ScriptAlias / /usr/local/httpd/cgi-bin/redirect_script</PRE></BLOCKQUOTE>
|
|
|
|
and here's a simple perl script to redirect requests:
|
|
|
|
<BLOCKQUOTE><PRE>
|
|
#!/usr/local/bin/perl
|
|
|
|
print "Status: 302 Moved Temporarily\r
|
|
Location: http://www.some.where.else.com/\r\n\r\n";
|
|
|
|
</PRE></BLOCKQUOTE></P>
|
|
|
|
<HR>
|
|
|
|
<H2><A name="logreset">How to reset your log files</A></H2>
|
|
|
|
<P>Sooner or later, you'll want to reset your log files (access_log and
|
|
error_log) because they are too big, or full of old information you don't
|
|
need.</P>
|
|
|
|
<P><CODE>access.log</CODE> typically grows by 1Mb for each 10,000 requests.</P>
|
|
|
|
<P>Most people's first attempt at replacing the logfile is to just move the
|
|
logfile or remove the logfile. This doesn't work.</P>
|
|
|
|
<P>Apache will continue writing to the logfile at the same offset as before the
|
|
logfile moved. This results in a new logfile being created which is just
|
|
as big as the old one, but it now contains thousands (or millions) of null
|
|
characters.</P>
|
|
|
|
<P>The correct procedure is to move the logfile, then signal Apache to tell it to reopen the logfiles.</P>
|
|
|
|
<P>Apache is signaled using the <STRONG>SIGHUP</STRONG> (-1) signal. e.g.
|
|
<BLOCKQUOTE><CODE>
|
|
mv access_log access_log.old<BR>
|
|
kill -1 `cat httpd.pid`
|
|
</CODE></BLOCKQUOTE>
|
|
</P>
|
|
|
|
<P>Note: <CODE>httpd.pid</CODE> is a file containing the <STRONG>p</STRONG>rocess <STRONG>id</STRONG>
|
|
of the Apache httpd daemon, Apache saves this in the same directory as the log
|
|
files.</P>
|
|
|
|
<P>Many people use this method to replace (and backup) their logfiles on a
|
|
nightly or weekly basis.</P>
|
|
<HR>
|
|
|
|
<H2><A name="stoprob">How to stop or restrict robots</A></H2>
|
|
|
|
<P>Ever wondered why so many clients are interested in a file called
|
|
<CODE>robots.txt</CODE> which you don't have, and never did have?</P>
|
|
|
|
<P>These clients are called <STRONG>robots</STRONG> (also known as crawlers,
|
|
spiders and other cute name) - special automated clients which
|
|
wander around the web looking for interesting resources.</P>
|
|
|
|
<P>Most robots are used to generate some kind of <EM>web index</EM> which
|
|
is then used by a <EM>search engine</EM> to help locate information.</P>
|
|
|
|
<P><CODE>robots.txt</CODE> provides a means to request that robots limit their
|
|
activities at the site, or more often than not, to leave the site alone.</P>
|
|
|
|
<P>When the first robots were developed, they had a bad reputation for
|
|
sending hundreds/thousands of requests to each site, often resulting
|
|
in the site being overloaded. Things have improved dramatically since
|
|
then, thanks to <A
|
|
HREF="http://info.webcrawler.com/mak/projects/robots/guidelines.html">
|
|
Guidelines for Robot Writers</A>, but even so, some robots may exhibit
|
|
unfriendly behavior which the webmaster isn't willing to tolerate, and
|
|
will want to stop.</P>
|
|
|
|
<P>Another reason some webmasters want to block access to robots, is to
|
|
stop them indexing dynamic information. Many search engines will use the
|
|
data collected from your pages for months to come - not much use if your
|
|
serving stock quotes, news, weather reports or anything else that will be
|
|
stale by the time people find it in a search engine.</P>
|
|
|
|
<P>If you decide to exclude robots completely, or just limit the areas
|
|
in which they can roam, create a <CODE>robots.txt</CODE> file; refer
|
|
to the <A HREF="http://info.webcrawler.com/mak/projects/robots/robots.html">robot information pages</A> provided by Martijn Koster for the syntax.</P>
|
|
|
|
<!--#include virtual="footer.html" -->
|
|
</BODY>
|
|
</HTML>
|