mirror of
https://github.com/apache/httpd.git
synced 2025-06-03 10:42:03 +03:00
git-svn-id: https://svn.apache.org/repos/asf/httpd/httpd/trunk@96194 13f79535-47bb-0310-9956-ffa450edef68
649 lines
16 KiB
HTML
649 lines
16 KiB
HTML
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
|
|
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
|
|
|
|
<html xmlns="http://www.w3.org/1999/xhtml">
|
|
<head>
|
|
<meta name="generator" content="HTML Tidy, see www.w3.org" />
|
|
|
|
<title>The Apache EBCDIC Port</title>
|
|
</head>
|
|
<!-- Background white, links blue (unvisited), navy (visited), red (active) -->
|
|
|
|
<body bgcolor="#FFFFFF" text="#000000" link="#0000FF"
|
|
vlink="#000080" alink="#FF0000">
|
|
<!--#include virtual="header.html" -->
|
|
|
|
<blockquote>
|
|
<strong>Warning:</strong> This document has not been updated
|
|
to take into account changes made in the 2.0 version of the
|
|
Apache HTTP Server. Some of the information may still be
|
|
relevant, but please use it with care.
|
|
</blockquote>
|
|
|
|
<h1 align="center">Overview of the Apache EBCDIC Port</h1>
|
|
|
|
<p>Version 1.3 of the Apache HTTP Server is the first version
|
|
which includes a port to a (non-ASCII) mainframe machine which
|
|
uses the EBCDIC character set as its native codeset.<br />
|
|
(It is the SIEMENS family of mainframes running the <a
|
|
href="http://www.siemens.de/servers/bs2osd/osdbc_us.htm">BS2000/OSD
|
|
operating system</a>. This mainframe OS nowadays features a
|
|
SVR4-derived POSIX subsystem).</p>
|
|
|
|
<p>The port was started initially to</p>
|
|
|
|
<ul>
|
|
<li>prove the feasibility of porting <a
|
|
href="http://dev.apache.org/">the Apache HTTP server</a> to
|
|
this platform</li>
|
|
|
|
<li>find a "worthy and capable" successor for the venerable
|
|
<a href="http://www.w3.org/Daemon/">CERN-3.0</a> daemon
|
|
(which was ported a couple of years ago), and to</li>
|
|
|
|
<li>prove that Apache's preforking process model can on this
|
|
platform easily outperform the accept-fork-serve model used
|
|
by CERN by a factor of 5 or more.</li>
|
|
</ul>
|
|
<br />
|
|
<br />
|
|
|
|
|
|
<p>This document serves as a rationale to describe some of the
|
|
design decisions of the port to this machine.</p>
|
|
|
|
<h2 align="center">Design Goals</h2>
|
|
|
|
<p>One objective of the EBCDIC port was to maintain enough
|
|
backwards compatibility with the (EBCDIC) CERN server to make
|
|
the transition to the new server attractive and easy. This
|
|
required the addition of a configurable method to define
|
|
whether a HTML document was stored in ASCII (the only format
|
|
accepted by the old server) or in EBCDIC (the native document
|
|
format in the POSIX subsystem, and therefore the only realistic
|
|
format in which the other POSIX tools like grep or sed could
|
|
operate on the documents). The current solution to this is a
|
|
"pseudo-MIME-format" which is intercepted and interpreted by
|
|
the Apache server (see below). Future versions might solve the
|
|
problem by defining an "ebcdic-handler" for all documents which
|
|
must be converted.</p>
|
|
|
|
<h2 align="center">Technical Solution</h2>
|
|
|
|
<p>Since all Apache input and output is based upon the BUFF
|
|
data type and its methods, the easiest solution was to add the
|
|
conversion to the BUFF handling routines. The conversion must
|
|
be settable at any time, so a BUFF flag was added which defines
|
|
whether a BUFF object has currently enabled conversion or not.
|
|
This flag is modified at several points in the HTTP
|
|
protocol:</p>
|
|
|
|
<ul>
|
|
<li><strong>set</strong> before a request is received
|
|
(because the request and the request header lines are always
|
|
in ASCII format)</li>
|
|
|
|
<li><strong>set/unset</strong> when the request body is
|
|
received - depending on the content type of the request body
|
|
(because the request body may contain ASCII text or a binary
|
|
file)</li>
|
|
|
|
<li><strong>set</strong> before a reply header is sent
|
|
(because the response header lines are always in ASCII
|
|
format)</li>
|
|
|
|
<li><strong>set/unset</strong> when the response body is sent
|
|
- depending on the content type of the response body (because
|
|
the response body may contain text or a binary file)</li>
|
|
</ul>
|
|
<br />
|
|
<br />
|
|
|
|
|
|
<h2 align="center">Porting Notes</h2>
|
|
|
|
<ol>
|
|
<li>
|
|
The relevant changes in the source are #ifdef'ed into two
|
|
categories:
|
|
|
|
<dl>
|
|
<dt><code><strong>#ifdef
|
|
CHARSET_EBCDIC</strong></code></dt>
|
|
|
|
<dd>Code which is needed for any EBCDIC based machine.
|
|
This includes character translations, differences in
|
|
contiguity of the two character sets, flags which
|
|
indicate which part of the HTTP protocol has to be
|
|
converted and which part doesn't <em>etc.</em></dd>
|
|
|
|
<dt><code><strong>#ifdef _OSD_POSIX</strong></code></dt>
|
|
|
|
<dd>Code which is needed for the SIEMENS BS2000/OSD
|
|
mainframe platform only. This deals with include file
|
|
differences and socket implementation topics which are
|
|
only required on the BS2000/OSD platform.</dd>
|
|
</dl>
|
|
</li>
|
|
|
|
<li style="list-style: none"><br />
|
|
</li>
|
|
|
|
<li>The possibility to translate between ASCII and EBCDIC at
|
|
the socket level (on BS2000 POSIX, there is a socket option
|
|
which supports this) was intentionally <em>not</em> chosen,
|
|
because the byte stream at the HTTP protocol level consists
|
|
of a mixture of protocol related strings and non-protocol
|
|
related raw file data. HTTP protocol strings are always
|
|
encoded in ASCII (the GET request, any Header: lines, the
|
|
chunking information <em>etc.</em>) whereas the file transfer
|
|
parts (<em>i.e.</em>, GIF images, CGI output <em>etc.</em>)
|
|
should usually be just "passed through" by the server. This
|
|
separation between "protocol string" and "raw data" is
|
|
reflected in the server code by functions like bgets() or
|
|
rvputs() for strings, and functions like bwrite() for binary
|
|
data. A global translation of everything would therefore be
|
|
inadequate.<br />
|
|
(In the case of text files of course, provisions must be
|
|
made so that EBCDIC documents are always served in
|
|
ASCII)</li>
|
|
|
|
<li style="list-style: none"><br />
|
|
</li>
|
|
|
|
<li>This port therefore features a built-in protocol level
|
|
conversion for the server-internal strings (which the
|
|
compiler translated to EBCDIC strings) and thus for all
|
|
server-generated documents. The hard coded ASCII escapes \012
|
|
and \015 which are ubiquitous in the server code are an
|
|
exception: they are already the binary encoding of the ASCII
|
|
\n and \r and must not be converted to ASCII a second time.
|
|
This exception is only relevant for server-generated strings;
|
|
and <em>external</em> EBCDIC documents are not expected to
|
|
contain ASCII newline characters.</li>
|
|
|
|
<li style="list-style: none"><br />
|
|
</li>
|
|
|
|
<li>By examining the call hierarchy for the BUFF management
|
|
routines, I added an "ebcdic/ascii conversion layer" which
|
|
would be crossed on every puts/write/get/gets, and a
|
|
conversion flag which allowed enabling/disabling the
|
|
conversions on-the-fly. Usually, a document crosses this
|
|
layer twice from its origin source (a file or CGI output) to
|
|
its destination (the requesting client): <samp>file ->
|
|
Apache</samp>, and <samp>Apache -> client</samp>.<br />
|
|
The server can now read the header lines of a CGI-script
|
|
output in EBCDIC format, and then find out that the remainder
|
|
of the script's output is in ASCII (like in the case of the
|
|
output of a WWW Counter program: the document body contains a
|
|
GIF image). All header processing is done in the native
|
|
EBCDIC format; the server then determines, based on the type
|
|
of document being served, whether the document body (except
|
|
for the chunking information, of course) is in ASCII already
|
|
or must be converted from EBCDIC.</li>
|
|
|
|
<li style="list-style: none"><br />
|
|
</li>
|
|
|
|
<li>
|
|
For Text documents (MIME types text/plain, text/html
|
|
<em>etc.</em>), an implicit translation to ASCII can be
|
|
used, or (if the users prefer to store some documents in
|
|
raw ASCII form for faster serving, or because the files
|
|
reside on a NFS-mounted directory tree) can be served
|
|
without conversion.<br />
|
|
<strong>Example:</strong>
|
|
|
|
<blockquote>
|
|
to serve files with the suffix .ahtml as a raw ASCII
|
|
text/html document without implicit conversion (and
|
|
suffix .ascii as ASCII text/plain), use the directives:
|
|
<pre>
|
|
AddType text/x-ascii-html .ahtml
|
|
AddType text/x-ascii-plain .ascii
|
|
|
|
</pre>
|
|
</blockquote>
|
|
Similarly, any text/foo MIME type can be served as "raw
|
|
ASCII" by configuring a MIME type "text/x-ascii-foo" for it
|
|
using AddType.
|
|
</li>
|
|
|
|
<li style="list-style: none"><br />
|
|
</li>
|
|
|
|
<li>Non-text documents are always served "binary" without
|
|
conversion. This seems to be the most sensible choice for,
|
|
.<em>e.g.</em>, GIF/ZIP/AU file types. This of course
|
|
requires the user to copy them to the mainframe host using
|
|
the "rcp -b" binary switch.</li>
|
|
|
|
<li style="list-style: none"><br />
|
|
</li>
|
|
|
|
<li>Server parsed files are always assumed to be in native
|
|
(<em>i.e.</em>, EBCDIC) format as used on the machine, and
|
|
are converted after processing.</li>
|
|
|
|
<li style="list-style: none"><br />
|
|
</li>
|
|
|
|
<li>For CGI output, the CGI script determines whether a
|
|
conversion is needed or not: by setting the appropriate
|
|
Content-Type, text files can be converted, or GIF output can
|
|
be passed through unmodified. An example for the latter case
|
|
is the wwwcount program which we ported as well.</li>
|
|
|
|
<li style="list-style: none"><br />
|
|
</li>
|
|
</ol>
|
|
<br />
|
|
<br />
|
|
|
|
|
|
<h2 align="center">Document Storage Notes</h2>
|
|
|
|
<h3 align="center">Binary Files</h3>
|
|
|
|
<p>All files with a <samp>Content-Type:</samp> which does not
|
|
start with <samp>text/</samp> are regarded as <em>binary
|
|
files</em> by the server and are not subject to any conversion.
|
|
Examples for binary files are GIF images, gzip-compressed files
|
|
and the like.</p>
|
|
|
|
<p>When exchanging binary files between the mainframe host and
|
|
a Unix machine or Windows PC, be sure to use the ftp "binary"
|
|
(<samp>TYPE I</samp>) command, or use the
|
|
<samp>rcp -b</samp> command from the mainframe host (the
|
|
-b switch is not supported in unix rcp's).</p>
|
|
|
|
<h3 align="center">Text Documents</h3>
|
|
|
|
<p>The default assumption of the server is that Text Files
|
|
(<em>i.e.</em>, all files whose <samp>Content-Type:</samp>
|
|
starts with <samp>text/</samp>) are stored in the native
|
|
character set of the host, EBCDIC.</p>
|
|
|
|
<h3 align="center">Server Side Included Documents</h3>
|
|
|
|
<p>SSI documents must currently be stored in EBCDIC only. No
|
|
provision is made to convert it from ASCII before
|
|
processing.</p>
|
|
|
|
<h2 align="center">Apache Modules' Status</h2>
|
|
|
|
<table border="1" align="center">
|
|
<tr>
|
|
<th>Module</th>
|
|
|
|
<th>Status</th>
|
|
|
|
<th>Notes</th>
|
|
</tr>
|
|
|
|
<tr>
|
|
<td align="left">http_core</td>
|
|
|
|
<td align="center">+</td>
|
|
|
|
<td>
|
|
</td>
|
|
</tr>
|
|
|
|
<tr>
|
|
<td align="left">mod_access</td>
|
|
|
|
<td align="center">+</td>
|
|
|
|
<td>
|
|
</td>
|
|
</tr>
|
|
|
|
<tr>
|
|
<td align="left">mod_actions</td>
|
|
|
|
<td align="center">+</td>
|
|
|
|
<td>
|
|
</td>
|
|
</tr>
|
|
|
|
<tr>
|
|
<td align="left">mod_alias</td>
|
|
|
|
<td align="center">+</td>
|
|
|
|
<td>
|
|
</td>
|
|
</tr>
|
|
|
|
<tr>
|
|
<td align="left">mod_asis</td>
|
|
|
|
<td align="center">+</td>
|
|
|
|
<td>
|
|
</td>
|
|
</tr>
|
|
|
|
<tr>
|
|
<td align="left">mod_auth</td>
|
|
|
|
<td align="center">+</td>
|
|
|
|
<td>
|
|
</td>
|
|
</tr>
|
|
|
|
<tr>
|
|
<td align="left">mod_auth_anon</td>
|
|
|
|
<td align="center">+</td>
|
|
|
|
<td>
|
|
</td>
|
|
</tr>
|
|
|
|
<tr>
|
|
<td align="left">mod_auth_dbm</td>
|
|
|
|
<td align="center">?</td>
|
|
|
|
<td>with own libdb.a</td>
|
|
</tr>
|
|
|
|
<tr>
|
|
<td align="left">mod_autoindex</td>
|
|
|
|
<td align="center">+</td>
|
|
|
|
<td>
|
|
</td>
|
|
</tr>
|
|
|
|
<tr>
|
|
<td align="left">mod_cern_meta</td>
|
|
|
|
<td align="center">?</td>
|
|
|
|
<td>
|
|
</td>
|
|
</tr>
|
|
|
|
<tr>
|
|
<td align="left">mod_cgi</td>
|
|
|
|
<td align="center">+</td>
|
|
|
|
<td>
|
|
</td>
|
|
</tr>
|
|
|
|
<tr>
|
|
<td align="left">mod_digest</td>
|
|
|
|
<td align="center">+</td>
|
|
|
|
<td>
|
|
</td>
|
|
</tr>
|
|
|
|
<tr>
|
|
<td align="left">mod_dir</td>
|
|
|
|
<td align="center">+</td>
|
|
|
|
<td>
|
|
</td>
|
|
</tr>
|
|
|
|
<tr>
|
|
<td align="left">mod_so</td>
|
|
|
|
<td align="center">-</td>
|
|
|
|
<td>no shared libs</td>
|
|
</tr>
|
|
|
|
<tr>
|
|
<td align="left">mod_env</td>
|
|
|
|
<td align="center">+</td>
|
|
|
|
<td>
|
|
</td>
|
|
</tr>
|
|
|
|
<tr>
|
|
<td align="left">mod_example</td>
|
|
|
|
<td align="center">-</td>
|
|
|
|
<td>(test bed only)</td>
|
|
</tr>
|
|
|
|
<tr>
|
|
<td align="left">mod_expires</td>
|
|
|
|
<td align="center">+</td>
|
|
|
|
<td>
|
|
</td>
|
|
</tr>
|
|
|
|
<tr>
|
|
<td align="left">mod_headers</td>
|
|
|
|
<td align="center">+</td>
|
|
|
|
<td>
|
|
</td>
|
|
</tr>
|
|
|
|
<tr>
|
|
<td align="left">mod_imap</td>
|
|
|
|
<td align="center">+</td>
|
|
|
|
<td>
|
|
</td>
|
|
</tr>
|
|
|
|
<tr>
|
|
<td align="left">mod_include</td>
|
|
|
|
<td align="center">+</td>
|
|
|
|
<td>
|
|
</td>
|
|
</tr>
|
|
|
|
<tr>
|
|
<td align="left">mod_info</td>
|
|
|
|
<td align="center">+</td>
|
|
|
|
<td>
|
|
</td>
|
|
</tr>
|
|
|
|
<tr>
|
|
<td align="left">mod_log_agent</td>
|
|
|
|
<td align="center">+</td>
|
|
|
|
<td>
|
|
</td>
|
|
</tr>
|
|
|
|
<tr>
|
|
<td align="left">mod_log_config</td>
|
|
|
|
<td align="center">+</td>
|
|
|
|
<td>
|
|
</td>
|
|
</tr>
|
|
|
|
<tr>
|
|
<td align="left">mod_log_referer</td>
|
|
|
|
<td align="center">+</td>
|
|
|
|
<td>
|
|
</td>
|
|
</tr>
|
|
|
|
<tr>
|
|
<td align="left">mod_mime</td>
|
|
|
|
<td align="center">+</td>
|
|
|
|
<td>
|
|
</td>
|
|
</tr>
|
|
|
|
<tr>
|
|
<td align="left">mod_mime_magic</td>
|
|
|
|
<td align="center">?</td>
|
|
|
|
<td>not ported yet</td>
|
|
</tr>
|
|
|
|
<tr>
|
|
<td align="left">mod_negotiation</td>
|
|
|
|
<td align="center">+</td>
|
|
|
|
<td>
|
|
</td>
|
|
</tr>
|
|
|
|
<tr>
|
|
<td align="left">mod_proxy</td>
|
|
|
|
<td align="center">+</td>
|
|
|
|
<td>
|
|
</td>
|
|
</tr>
|
|
|
|
<tr>
|
|
<td align="left">mod_rewrite</td>
|
|
|
|
<td align="center">+</td>
|
|
|
|
<td>untested</td>
|
|
</tr>
|
|
|
|
<tr>
|
|
<td align="left">mod_setenvif</td>
|
|
|
|
<td align="center">+</td>
|
|
|
|
<td>
|
|
</td>
|
|
</tr>
|
|
|
|
<tr>
|
|
<td align="left">mod_speling</td>
|
|
|
|
<td align="center">+</td>
|
|
|
|
<td>
|
|
</td>
|
|
</tr>
|
|
|
|
<tr>
|
|
<td align="left">mod_status</td>
|
|
|
|
<td align="center">+</td>
|
|
|
|
<td>
|
|
</td>
|
|
</tr>
|
|
|
|
<tr>
|
|
<td align="left">mod_unique_id</td>
|
|
|
|
<td align="center">+</td>
|
|
|
|
<td>
|
|
</td>
|
|
</tr>
|
|
|
|
<tr>
|
|
<td align="left">mod_userdir</td>
|
|
|
|
<td align="center">+</td>
|
|
|
|
<td>
|
|
</td>
|
|
</tr>
|
|
|
|
<tr>
|
|
<td align="left">mod_usertrack</td>
|
|
|
|
<td align="center">?</td>
|
|
|
|
<td>untested</td>
|
|
</tr>
|
|
</table>
|
|
|
|
<h2 align="center">Third Party Modules' Status</h2>
|
|
|
|
<table border="1" align="center">
|
|
<tr>
|
|
<th>Module</th>
|
|
|
|
<th>Status</th>
|
|
|
|
<th>Notes</th>
|
|
</tr>
|
|
|
|
<tr>
|
|
<td align="left"><a
|
|
href="http://java.apache.org/">mod_jserv</a> </td>
|
|
|
|
<td align="center">-</td>
|
|
|
|
<td>JAVA still being ported.</td>
|
|
</tr>
|
|
|
|
<tr>
|
|
<td align="left"><a href="http://www.php.net/">mod_php3</a>
|
|
</td>
|
|
|
|
<td align="center">+</td>
|
|
|
|
<td>mod_php3 runs fine, with LDAP and GD and FreeType
|
|
libraries</td>
|
|
</tr>
|
|
|
|
<tr>
|
|
<td align="left"><a
|
|
href="http://hpwww.ec-lyon.fr/~vincent/apache/mod_put.html">
|
|
mod_put</a> </td>
|
|
|
|
<td align="center">?</td>
|
|
|
|
<td>untested</td>
|
|
</tr>
|
|
|
|
<tr>
|
|
<td align="left"><a
|
|
href="ftp://hachiman.vidya.com/pub/apache/">mod_session</a>
|
|
</td>
|
|
|
|
<td align="center">-</td>
|
|
|
|
<td>untested</td>
|
|
</tr>
|
|
</table>
|
|
<!--#include virtual="footer.html" -->
|
|
</body>
|
|
</html>
|
|
|