<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<HTML>
<HEAD>
<TITLE>Apache HOWTO documentation</TITLE>
</HEAD>
<BODY>
<!--#include virtual="header.html" -->
<H1>Apache HOWTO documentation</H1>
How to:
<ul>
<li><A HREF="#redirect">redirect an entire server or directory</A>
<li><A HREF="#logreset">reset your log files</A>
<li><A HREF="#stoprob">stop robots</A>
</ul>
<hr>
<H2><A name="redirect">How to redirect an entire server or directory</A></H2>
One way to redirect all requests for an entire server is to set up a
<CODE>Redirect</CODE> to a <B>cgi script</B> which outputs a 301 or 302 status
and the location of the other server.<P>
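If no special handling is needed, the <CODE>Redirect</CODE> directive alone
can send clients elsewhere, for a single directory or for the whole server;
a minimal sketch (the path and target hostname are illustrative):

<blockquote><code>
Redirect /docs http://www.some.where.else.com/docs
</code></blockquote>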
By using a <B>cgi-script</B> you can intercept various requests and treat them
specially, e.g. you might want to intercept <B>POST</B> requests, so that the
client isn't redirected to a script on the other server which expects POST
information (a redirect will lose the POST information).<P>
Here's how to redirect all requests to a script. In the server configuration
file,

<blockquote><code>ScriptAlias / /usr/local/httpd/cgi-bin/redirect_script</code></blockquote>
and here's a simple Perl script to perform the redirect:
<blockquote><code>
#!/usr/local/bin/perl <br>
<br>
print "Status: 302 Moved Temporarily\r\n"; <br>
print "Location: http://www.some.where.else.com/\r\n\r\n"; <br>
</code></blockquote><p>
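As suggested above, the script can treat <B>POST</B> requests specially by
checking the standard CGI <CODE>REQUEST_METHOD</CODE> environment variable;
a sketch (the message text is illustrative):

<blockquote><code>
#!/usr/local/bin/perl <br>
<br>
if ($ENV{'REQUEST_METHOD'} eq 'POST') { <br>
&nbsp;&nbsp;&nbsp;&nbsp;# handle the POST here rather than redirecting it <br>
&nbsp;&nbsp;&nbsp;&nbsp;print "Content-type: text/html\r\n\r\n"; <br>
&nbsp;&nbsp;&nbsp;&nbsp;print "This service has moved to http://www.some.where.else.com/\n"; <br>
} else { <br>
&nbsp;&nbsp;&nbsp;&nbsp;print "Status: 302 Moved Temporarily\r\n"; <br>
&nbsp;&nbsp;&nbsp;&nbsp;print "Location: http://www.some.where.else.com/\r\n\r\n"; <br>
} <br>
</code></blockquote><p><hr>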
<H2><A name="logreset">How to reset your log files</A></H2>
Sooner or later, you'll want to reset your log files (access_log and
error_log) because they are too big, or full of old information you don't
need.<p>
<CODE>access_log</CODE> typically grows by 1Mb for each 10,000 requests
(roughly 100 bytes per request).<p>
Most people's first attempt at replacing the logfile is to just move the
logfile or remove the logfile. This doesn't work.<p>
Apache will continue writing to the logfile at the same offset as before the
logfile was moved. This results in a new logfile being created which is just
as big as the old one, but it now contains thousands (or millions) of null
characters.<p>
The correct procedure is to move the logfile, then signal Apache to
reopen its logfiles.<p>
Apache is signalled using the <B>SIGHUP</B> (-1) signal, e.g.
<blockquote><code>
mv access_log access_log.old ; kill -1 `cat httpd.pid`
</code></blockquote>
Note: <code>httpd.pid</code> is a file containing the <B>p</B>rocess <B>id</B>
of the Apache httpd daemon. Apache saves this in the same directory as the log
files.<P>
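For example, a minimal rotation script, suitable for running nightly from
<CODE>cron</CODE> (the log directory path is an assumption; use your server's
actual log directory):

<blockquote><code>
#!/bin/sh <br>
# move the logs aside, then tell httpd to reopen them <br>
cd /usr/local/httpd/logs <br>
mv access_log access_log.old <br>
mv error_log error_log.old <br>
kill -1 `cat httpd.pid` <br>
</code></blockquote>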
Many people use this method to replace (and back up) their logfiles on a
nightly basis.<p><hr>
<H2><A name="stoprob">How to stop robots</A></H2>
Ever wondered why so many clients are interested in a file called
<code>robots.txt</code> which you don't have, and never did have?<p>
These clients are called <B>robots</B> - special automated clients which
wander around the web looking for interesting resources.<p>
Most robots are used to generate some kind of <em>web index</em> which
is then used by a <em>search engine</em> to help locate information.<P>
<code>robots.txt</code> provides a means to request that robots limit their
activities at the site, or more often than not, to leave the site alone.<P>
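For example, a <CODE>robots.txt</CODE> placed at the top level of the
document tree might ask all robots to keep out of a few areas (a sketch;
the directory names are illustrative):

<blockquote><code>
User-agent: * <br>
Disallow: /cgi-bin/ <br>
Disallow: /tmp/ <br>
</code></blockquote>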
When the first robots were developed, they had a bad reputation for
sending hundreds of requests to each site, often resulting in the site
being overloaded. Things have improved dramatically since then, thanks
to <A HREF="http://web.nexor.co.uk/mak/doc/robots/guidelines.html">Guidelines
for Robot Writers</A>, but even so, some robots may exhibit unfriendly
behaviour which the webmaster isn't willing to tolerate.<P>
Another reason some webmasters want to block robot access results
from the way in which the information collected by the robots is subsequently
indexed. <B>There are currently no widely used systems for annotating documents
such that they can be indexed by wandering robots.</B> Hence, the index
writer will often resort to unsatisfactory algorithms to determine what gets
indexed.<p>
Typically, indexes are built around text which appears in
document titles (&lt;TITLE&gt;) or main headings (&lt;H1&gt;), and more
often than not, the words it indexes on are completely irrelevant or
misleading for the document's subject. The worst index is one based on
every word in the document. This inevitably leads to the search engines
offering poor suggestions which waste both the user's and the server's
valuable time.<P>
So if you decide to exclude robots completely, or just limit the areas
in which they can roam, set up a <CODE>robots.txt</CODE> file, and refer
to the <A HREF="http://web.nexor.co.uk/mak/doc/robots/norobots.html">robot
exclusion documentation</A>.<p>
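For instance, to ask all robots to leave the entire site alone, two lines
suffice:

<blockquote><code>
User-agent: * <br>
Disallow: / <br>
</code></blockquote>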
Much better systems exist to both index your site and publicise its
resources, e.g.
<A HREF="http://web.nexor.co.uk/public/aliweb/aliweb.html">ALIWEB</A>, which
uses site-defined index files.<p>
<!--#include virtual="footer.html" -->
</BODY>
</HTML>