<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<HTML>
<HEAD>
<TITLE>Apache HOWTO documentation</TITLE>
</HEAD>
<BODY>
<!--#include virtual="header.html" -->
<H1>Apache HOWTO documentation</H1>
How to:
<ul>
<li><A HREF="#redirect">redirect an entire server or directory</A>
<li><A HREF="#logreset">reset your log files</A>
<li><A HREF="#stoprob">stop robots</A>
</ul>
<hr>
<H2><A name="redirect">How to redirect an entire server or directory</A></H2>
One way to redirect all requests for an entire server is to set up a
<CODE>Redirect</CODE> to a <B>cgi script</B> which outputs a 301 or 302 status
and the location of the other server.<P>
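If no special handling is needed, the <CODE>Redirect</CODE> directive alone
can send clients elsewhere, for a single directory or for the whole server;
a minimal sketch (the path and target hostname are illustrative):

<blockquote><code>
Redirect /docs http://www.some.where.else.com/docs
</code></blockquote>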
By using a <B>cgi-script</B> you can intercept various requests and treat them
specially, e.g. you might want to intercept <B>POST</B> requests, so that the
client isn't redirected to a script on the other server which expects POST
information (a redirect will lose the POST information).<P>
Here's how to redirect all requests to a script. In the server configuration
file,

<blockquote><code>ScriptAlias / /usr/local/httpd/cgi-bin/redirect_script</code></blockquote>
and here's a simple Perl script to perform the redirect:
<blockquote><code>
#!/usr/local/bin/perl <br>
<br>
print "Status: 302 Moved Temporarily\r\n"; <br>
print "Location: http://www.some.where.else.com/\r\n\r\n"; <br>
</code></blockquote><p>
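As suggested above, the script can treat <B>POST</B> requests specially by
checking the standard CGI <CODE>REQUEST_METHOD</CODE> environment variable;
a sketch (the message text is illustrative):

<blockquote><code>
#!/usr/local/bin/perl <br>
<br>
if ($ENV{'REQUEST_METHOD'} eq 'POST') { <br>
&nbsp;&nbsp;&nbsp;&nbsp;# handle the POST here rather than redirecting it <br>
&nbsp;&nbsp;&nbsp;&nbsp;print "Content-type: text/html\r\n\r\n"; <br>
&nbsp;&nbsp;&nbsp;&nbsp;print "This service has moved to http://www.some.where.else.com/\n"; <br>
} else { <br>
&nbsp;&nbsp;&nbsp;&nbsp;print "Status: 302 Moved Temporarily\r\n"; <br>
&nbsp;&nbsp;&nbsp;&nbsp;print "Location: http://www.some.where.else.com/\r\n\r\n"; <br>
} <br>
</code></blockquote><p><hr>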
<H2><A name="logreset">How to reset your log files</A></H2>
Sooner or later, you'll want to reset your log files (access_log and
error_log) because they are too big, or full of old information you don't
need.<p>
<CODE>access_log</CODE> typically grows by 1Mb for each 10,000 requests
(roughly 100 bytes per request).<p>
Most people's first attempt at replacing the logfile is to just move the
logfile or remove the logfile. This doesn't work.<p>
Apache will continue writing to the logfile at the same offset as before the
logfile was moved. This results in a new logfile being created which is just
as big as the old one, but it now contains thousands (or millions) of null
characters.<p>
The correct procedure is to move the logfile, then signal Apache to
reopen its logfiles.<p>
Apache is signalled using the <B>SIGHUP</B> (-1) signal, e.g.
<blockquote><code>
mv access_log access_log.old ; kill -1 `cat httpd.pid`
</code></blockquote>
Note: <code>httpd.pid</code> is a file containing the <B>p</B>rocess <B>id</B>
of the Apache httpd daemon. Apache saves this in the same directory as the log
files.<P>
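For example, a minimal rotation script, suitable for running nightly from
<CODE>cron</CODE> (the log directory path is an assumption; use your server's
actual log directory):

<blockquote><code>
#!/bin/sh <br>
# move the logs aside, then tell httpd to reopen them <br>
cd /usr/local/httpd/logs <br>
mv access_log access_log.old <br>
mv error_log error_log.old <br>
kill -1 `cat httpd.pid` <br>
</code></blockquote>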
Many people use this method to replace (and back up) their logfiles on a
nightly basis.<p><hr>
<H2><A name="stoprob">How to stop robots</A></H2>
Ever wondered why so many clients are interested in a file called
<code>robots.txt</code> which you don't have, and never did have?<p>
These clients are called <B>robots</B> - special automated clients which
wander around the web looking for interesting resources.<p>
Most robots are used to generate some kind of <em>web index</em> which
is then used by a <em>search engine</em> to help locate information.<P>
<code>robots.txt</code> provides a means to request that robots limit their
activities at the site, or more often than not, to leave the site alone.<P>
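For example, a <CODE>robots.txt</CODE> placed at the top level of the
document tree might ask all robots to keep out of a few areas (a sketch;
the directory names are illustrative):

<blockquote><code>
User-agent: * <br>
Disallow: /cgi-bin/ <br>
Disallow: /tmp/ <br>
</code></blockquote>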
When the first robots were developed, they had a bad reputation for
sending hundreds of requests to each site, often resulting in the site
being overloaded. Things have improved dramatically since then, thanks
to <A HREF="http://web.nexor.co.uk/mak/doc/robots/guidelines.html">Guidelines
for Robot Writers</A>, but even so, some robots may exhibit unfriendly
behaviour which the webmaster isn't willing to tolerate.<P>
Another reason some webmasters want to block robot access results
from the way in which the information collected by the robots is subsequently
indexed. <B>There are currently no widely used systems for annotating documents
such that they can be indexed by wandering robots.</B> Hence, the index
writer will often resort to unsatisfactory algorithms to determine what gets
indexed.<p>
Typically, indexes are built around text which appears in
document titles (&lt;TITLE&gt;) or main headings (&lt;H1&gt;), and more
often than not, the words it indexes on are completely irrelevant or
misleading for the document's subject. The worst index is one based on
every word in the document. This inevitably leads to the search engines
offering poor suggestions which waste both the user's and the server's
valuable time.<P>
So if you decide to exclude robots completely, or just limit the areas
in which they can roam, set up a <CODE>robots.txt</CODE> file, and refer
to the <A HREF="http://web.nexor.co.uk/mak/doc/robots/norobots.html">robot
exclusion documentation</A>.<p>
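For instance, to ask all robots to leave the entire site alone, two lines
suffice:

<blockquote><code>
User-agent: * <br>
Disallow: / <br>
</code></blockquote>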
Much better systems exist to both index your site and publicise its
resources, e.g.
<A HREF="http://web.nexor.co.uk/public/aliweb/aliweb.html">ALIWEB</A>, which
uses site-defined index files.<p>
<!--#include virtual="footer.html" -->
</BODY>
</HTML>