Module mod_charset_lite

This module is contained in the mod_charset_lite.c file and is available in Apache 2.0 and later. It provides the ability to specify character set translation (recoding) per directory, location, or virtual server. It is not compiled into the server by default. mod_charset_lite requires that Apache be compiled with APACHE_XLATE defined.

This module provides a small subset of the configuration mechanisms implemented by Russian Apache and its associated mod_charset.

Summary

This is an experimental module and should be used with care. Experiment with your mod_charset_lite configuration to ensure that it performs the desired function.

mod_charset_lite allows the administrator to specify the source character set of objects as well as the character set they should be translated into before being sent to the client. mod_charset_lite does not translate the data itself but instead tells Apache what translation to perform.

mod_charset_lite is applicable to both EBCDIC and ASCII host environments. In an EBCDIC environment, Apache normally translates text content from the code page of the Apache process locale to ISO-8859-1; mod_charset_lite can be used to specify that a different translation is to be performed. In an ASCII environment, Apache normally performs no translation, so mod_charset_lite is needed in order for any translation to take place.
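
As a rough illustration of what recoding from a source character set to a target character set means, the following sketch converts a byte string from UTF-16BE to ISO-8859-1. It is not how Apache or APR perform the translation: Python's built-in codecs stand in for iconv here, the function name recode is hypothetical, and the codec names are Python's own spellings rather than names that iconv necessarily accepts.

```python
def recode(data: bytes, source_charset: str, target_charset: str) -> bytes:
    """Decode bytes from the source charset, then re-encode them in the target charset."""
    text = data.decode(source_charset)
    return text.encode(target_charset)


# The word "cafe" with an acute accent on the e, stored on disk as UTF-16BE.
stored = "caf\u00e9".encode("utf-16-be")

# What a client would receive after translation to ISO-8859-1.
sent = recode(stored, "utf-16-be", "iso-8859-1")

print(len(stored), len(sent))
```

The stored form uses two bytes per character while the translated form uses one, which is why the byte length of a response can change when translation is in effect.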

Directives


CharsetSourceEnc

Syntax: CharsetSourceEnc charset
Default: None
Context: directory, virtual host
Override: FileInfo
Status: Experimental
Module: mod_charset_lite
Compatibility: Only available in Apache 2.0 or later

The CharsetSourceEnc directive specifies the source charset of files in the associated container.

The value of the charset argument must be accepted as a valid character set name by the character set support in APR. Generally, this means that it must be supported by iconv.

Example:
    <Directory "/export/home/trawick/apacheinst/htdocs/convert">
    CharsetSourceEnc  UTF-16BE
    CharsetDefault    ISO8859-1
    </Directory>
  
The character set names in this example work with the iconv translation support in Solaris 8.

CharsetDefault

Syntax: CharsetDefault charset
Default: None
Context: directory, virtual host
Override: FileInfo
Status: Experimental
Module: mod_charset_lite
Compatibility: Only available in Apache 2.0 or later

The CharsetDefault directive specifies the charset that content in the associated container should be translated to.

The value of the charset argument must be accepted as a valid character set name by the character set support in APR. Generally, this means that it must be supported by iconv.

Example:
    <Directory "/export/home/trawick/apacheinst/htdocs/convert">
    CharsetSourceEnc  UTF-16BE
    CharsetDefault    ISO8859-1
    </Directory>
  

CharsetDebug

Syntax: CharsetDebug on/off
Default: off
Context: directory, virtual host
Override: FileInfo
Status: Experimental
Module: mod_charset_lite
Compatibility: Only available in Apache 2.0 or later

The CharsetDebug directive specifies whether mod_charset_lite performs verbose logging. Such messages are written to the Apache error log at level debug, so they appear only when the LogLevel setting permits debug-level messages.

Common Problems

Invalid character set names

The character set names given to CharsetSourceEnc and CharsetDefault must be acceptable to the translation mechanism used by APR on the system where mod_charset_lite is deployed. These character set names are not standardized and are usually not the same as the corresponding values used in HTTP headers. Currently, APR can only use iconv(3), so you can easily test your character set names with the iconv(1) program, as follows:

  iconv -f charsetsourceenc-value -t charsetdefault-value
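
The same check can be scripted. The following is a minimal sketch, assuming that an iconv(1) program is on the PATH and that it exits with a non-zero status when it rejects a character set name; the helper names build_iconv_command and names_accepted are hypothetical.

```python
import subprocess
from typing import List, Optional


def build_iconv_command(source_enc: str, default_enc: str) -> List[str]:
    """Build the iconv(1) command line shown above for a pair of charset names."""
    return ["iconv", "-f", source_enc, "-t", default_enc]


def names_accepted(source_enc: str, default_enc: str) -> Optional[bool]:
    """Return True if iconv exits successfully for this pair, False if it
    reports an error, or None if the iconv program could not be run."""
    try:
        result = subprocess.run(
            build_iconv_command(source_enc, default_enc),
            input=b"",  # empty input: only the charset names are being checked
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except OSError:
        return None
    return result.returncode == 0


if __name__ == "__main__":
    # The names from the example configuration; acceptance depends on the platform.
    print(names_accepted("UTF-16BE", "ISO8859-1"))
```

Because the names are platform-specific, run such a check on the same system that runs Apache.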
  

Mismatch between character set of content and translation rules

If the translation rules don't make sense for the content, translation can fail in various ways, including:

  • The translation mechanism may return an error, in which case the connection is aborted.
  • The translation mechanism may silently place special characters (e.g., question marks) in the output buffer when it cannot translate the input buffer.
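
Both failure modes can be reproduced outside Apache. The sketch below uses Python's built-in codecs as a stand-in for the translation mechanism; this is an assumption made purely for illustration, and APR and iconv may differ in detail. The function name translate is hypothetical.

```python
def translate(data: bytes, source: str, target: str, errors: str = "strict") -> bytes:
    """Decode from the source charset and re-encode in the target charset."""
    return data.decode(source, errors=errors).encode(target, errors=errors)


# Failure mode 1: the content is really UTF-8, but the rules claim UTF-16BE.
# Five bytes cannot be a whole number of 16-bit units, so strict decoding fails.
utf8_content = "hello".encode("utf-8")
try:
    translate(utf8_content, "utf-16-be", "iso-8859-1")
    failed = False
except UnicodeError:
    failed = True

# Failure mode 2: the euro sign has no ISO-8859-1 equivalent, so a lenient
# translation silently substitutes a question mark.
lossy = translate("10 \u20ac".encode("utf-8"), "utf-8", "iso-8859-1", errors="replace")

print(failed, lossy)
```

In the first case the error is visible immediately; in the second the response is delivered but its content is quietly damaged, which is harder to notice.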