encoding

encoding - interface for the encoding conversion functions

Interface for the encoding conversion functions needed for XML basic encoding and iconv() support.

Related specs are:

rfc2044 (UTF-8 and UTF-16), F. Yergeau, Alis Technologies
[ISO-10646] UTF-8 and UTF-16 in Annexes
[ISO-8859-1] ISO Latin-1 character codes.
[UNICODE] The Unicode Consortium, "The Unicode Standard -- Worldwide Character Encoding -- Version 1.0", Addison-Wesley, Volume 1, 1991, Volume 2, 1992. UTF-8 is described in Unicode Technical Report #4.
[US-ASCII] Coded Character Set--7-bit American Standard Code for Information Interchange, ANSI X3.4-1986.

Author(s): Daniel Veillard

Synopsis

typedef enum xmlCharEncoding;
typedef struct _xmlCharEncodingHandler xmlCharEncodingHandler;
typedef xmlCharEncodingHandler * xmlCharEncodingHandlerPtr;
int	UTF8Toisolat1	(unsigned char * out,
int * outlen,
const unsigned char * in,
int * inlen);
int	isolat1ToUTF8	(unsigned char * out,
int * outlen,
const unsigned char * in,
int * inlen);
int	xmlAddEncodingAlias	(const char * name,
const char * alias);
int	xmlCharEncCloseFunc	(xmlCharEncodingHandler * handler);
int	xmlCharEncFirstLine	(xmlCharEncodingHandler * handler,
xmlBufferPtr out,
xmlBufferPtr in);
int	xmlCharEncInFunc	(xmlCharEncodingHandler * handler,
xmlBufferPtr out,
xmlBufferPtr in);
int	xmlCharEncOutFunc	(xmlCharEncodingHandler * handler,
xmlBufferPtr out,
xmlBufferPtr in);
typedef int	xmlCharEncodingInputFunc	(unsigned char * out,
int * outlen,
const unsigned char * in,
int * inlen);
typedef int	xmlCharEncodingOutputFunc	(unsigned char * out,
int * outlen,
const unsigned char * in,
int * inlen);
void	xmlCleanupCharEncodingHandlers	(void);
void	xmlCleanupEncodingAliases	(void);
int	xmlDelEncodingAlias	(const char * alias);
xmlCharEncoding	xmlDetectCharEncoding	(const unsigned char * in,
int len);
xmlCharEncodingHandlerPtr	xmlFindCharEncodingHandler	(const char * name);
xmlCharEncodingHandlerPtr	xmlGetCharEncodingHandler	(xmlCharEncoding enc);
const char *	xmlGetCharEncodingName	(xmlCharEncoding enc);
const char *	xmlGetEncodingAlias	(const char * alias);
void	xmlInitCharEncodingHandlers	(void);
xmlCharEncodingHandlerPtr	xmlNewCharEncodingHandler	(const char * name,
xmlCharEncodingInputFunc input,
xmlCharEncodingOutputFunc output);
xmlCharEncoding	xmlParseCharEncoding	(const char * name);
void	xmlRegisterCharEncodingHandler	(xmlCharEncodingHandlerPtr handler);

Description

Details

Enum xmlCharEncoding

enum xmlCharEncoding {
    XML_CHAR_ENCODING_ERROR = -1 /* No char encoding detected */
    XML_CHAR_ENCODING_NONE = 0 /* No char encoding detected */
    XML_CHAR_ENCODING_UTF8 = 1 /* UTF-8 */
    XML_CHAR_ENCODING_UTF16LE = 2 /* UTF-16 little endian */
    XML_CHAR_ENCODING_UTF16BE = 3 /* UTF-16 big endian */
    XML_CHAR_ENCODING_UCS4LE = 4 /* UCS-4 little endian */
    XML_CHAR_ENCODING_UCS4BE = 5 /* UCS-4 big endian */
    XML_CHAR_ENCODING_EBCDIC = 6 /* EBCDIC uh! */
    XML_CHAR_ENCODING_UCS4_2143 = 7 /* UCS-4 unusual ordering */
    XML_CHAR_ENCODING_UCS4_3412 = 8 /* UCS-4 unusual ordering */
    XML_CHAR_ENCODING_UCS2 = 9 /* UCS-2 */
    XML_CHAR_ENCODING_8859_1 = 10 /* ISO-8859-1 ISO Latin 1 */
    XML_CHAR_ENCODING_8859_2 = 11 /* ISO-8859-2 ISO Latin 2 */
    XML_CHAR_ENCODING_8859_3 = 12 /* ISO-8859-3 */
    XML_CHAR_ENCODING_8859_4 = 13 /* ISO-8859-4 */
    XML_CHAR_ENCODING_8859_5 = 14 /* ISO-8859-5 */
    XML_CHAR_ENCODING_8859_6 = 15 /* ISO-8859-6 */
    XML_CHAR_ENCODING_8859_7 = 16 /* ISO-8859-7 */
    XML_CHAR_ENCODING_8859_8 = 17 /* ISO-8859-8 */
    XML_CHAR_ENCODING_8859_9 = 18 /* ISO-8859-9 */
    XML_CHAR_ENCODING_2022_JP = 19 /* ISO-2022-JP */
    XML_CHAR_ENCODING_SHIFT_JIS = 20 /* Shift_JIS */
    XML_CHAR_ENCODING_EUC_JP = 21 /* EUC-JP */
    XML_CHAR_ENCODING_ASCII = 22 /*  pure ASCII */
};


Structure xmlCharEncodingHandler

struct _xmlCharEncodingHandler {
    char *	name
    xmlCharEncodingInputFunc	input
    xmlCharEncodingOutputFunc	output
    iconv_t	iconv_in
    iconv_t	iconv_out
    struct _uconv_t *	uconv_in
    struct _uconv_t *	uconv_out
} xmlCharEncodingHandler;


Typedef xmlCharEncodingHandlerPtr

xmlCharEncodingHandler * xmlCharEncodingHandlerPtr;







xmlCharEncCloseFunc ()

int	xmlCharEncCloseFunc		(xmlCharEncodingHandler * handler)

Generic front-end for encoding handler close function

handler:char encoding transformation data structure
Returns:0 if success, or -1 in case of error

xmlCharEncFirstLine ()

int	xmlCharEncFirstLine		(xmlCharEncodingHandler * handler, 
xmlBufferPtr out,
xmlBufferPtr in)

Front-end for the encoding handler input function that handles only the very first line, i.e. it limits itself to 45 chars.

handler:char encoding transformation data structure
out:an xmlBuffer for the output.
in:an xmlBuffer for the input
Returns:the number of bytes written on success, -1 for a general error, or -2 if the transcoding fails (because *in is not a valid UTF-8 string or the result of the transformation can't fit into the encoding we want)

xmlCharEncInFunc ()

int	xmlCharEncInFunc		(xmlCharEncodingHandler * handler, 
xmlBufferPtr out,
xmlBufferPtr in)

Generic front-end for the encoding handler input function

handler:char encoding transformation data structure
out:an xmlBuffer for the output.
in:an xmlBuffer for the input
Returns:the number of bytes written on success, -1 for a general error, or -2 if the transcoding fails (because *in is not a valid UTF-8 string or the result of the transformation can't fit into the encoding we want)

xmlCharEncOutFunc ()

int	xmlCharEncOutFunc		(xmlCharEncodingHandler * handler, 
xmlBufferPtr out,
xmlBufferPtr in)

Generic front-end for the encoding handler output function. A first call with @in == NULL has to be made first to initiate the output, in case a non-stateless encoding needs to initialize its state or the output (like the BOM in UTF16). In case of UTF8 sequence conversion errors for the given encoder, the content will be automatically remapped to a CharRef sequence.

handler:char encoding transformation data structure
out:an xmlBuffer for the output.
in:an xmlBuffer for the input
Returns:the number of bytes written on success, -1 for a general error, or -2 if the transcoding fails (because *in is not a valid UTF-8 string or the result of the transformation can't fit into the encoding we want)
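
The CharRef remapping mentioned for xmlCharEncOutFunc can be illustrated in isolation. What follows is a minimal standalone sketch, not the library's implementation: it assumes a plain ASCII target encoding, and the names sketch_utf8_decode and sketch_utf8_to_ascii_charref are hypothetical. Its return codes only loosely mirror the -1 / -2 convention documented above.

```c
#include <stdio.h>
#include <string.h>

/* Decode one UTF-8 sequence starting at in[0].
 * Returns its length in bytes (1 to 4) and stores the code point in *cp,
 * or returns 0 if the sequence is malformed or truncated.
 * Simplification: overlong forms and surrogates are not rejected. */
int sketch_utf8_decode(const unsigned char *in, int len, unsigned int *cp)
{
    int n, i;

    if (len < 1)
        return 0;
    if (in[0] < 0x80) {
        *cp = in[0];
        return 1;
    } else if ((in[0] & 0xE0) == 0xC0) {
        *cp = in[0] & 0x1F; n = 2;
    } else if ((in[0] & 0xF0) == 0xE0) {
        *cp = in[0] & 0x0F; n = 3;
    } else if ((in[0] & 0xF8) == 0xF0) {
        *cp = in[0] & 0x07; n = 4;
    } else {
        return 0;
    }
    if (len < n)
        return 0;
    for (i = 1; i < n; i++) {
        if ((in[i] & 0xC0) != 0x80)
            return 0;
        *cp = (*cp << 6) | (in[i] & 0x3F);
    }
    return n;
}

/* Convert UTF-8 to NUL-terminated ASCII, writing a decimal character
 * reference (&#N;) for every code point that ASCII cannot represent.
 * Returns the number of bytes written (excluding the NUL),
 * -1 if out is too small, or -2 if the input is not valid UTF-8. */
int sketch_utf8_to_ascii_charref(char *out, int outsize,
                                 const unsigned char *in, int inlen)
{
    int written = 0;
    int pos = 0;

    if (outsize < 1)
        return -1;
    while (pos < inlen) {
        unsigned int cp;
        int n = sketch_utf8_decode(in + pos, inlen - pos, &cp);

        if (n == 0)
            return -2;
        if (cp < 0x80) {
            /* keep one byte in reserve for the terminating NUL */
            if (written + 1 >= outsize)
                return -1;
            out[written++] = (char)cp;
        } else {
            char ref[16];
            int r = snprintf(ref, sizeof ref, "&#%u;", cp);

            if (written + r >= outsize)
                return -1;
            memcpy(out + written, ref, (size_t)r);
            written += r;
        }
        pos += n;
    }
    out[written] = '\0';
    return written;
}
```

With this sketch, the five input bytes of "café" encoded in UTF-8 become the nine ASCII bytes `caf&#233;`.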

xmlCleanupCharEncodingHandlers ()

void	xmlCleanupCharEncodingHandlers	(void)

DEPRECATED: This function will be made private. Call xmlCleanupParser to free global state but see the warnings there. xmlCleanupParser should be only called once at program exit. In most cases, you don't have to call cleanup functions at all. Cleans up the memory allocated for the char encoding support; it unregisters all the encoding handlers and the aliases.




xmlDetectCharEncoding ()

xmlCharEncoding	xmlDetectCharEncoding	(const unsigned char * in, 
int len)

Guess the encoding of the entity using the first bytes of the entity content according to the non-normative appendix F of the XML-1.0 recommendation.

in:a pointer to the first bytes of the XML entity, must be at least 2 bytes long (at least 4 if encoding is a UCS-4 variant).
len:the length of the buffer
Returns:one of the XML_CHAR_ENCODING_... values.
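
To make the idea of guessing from the first bytes concrete, here is a minimal standalone sketch of that kind of detection. It is not the library's code: the enum and the function name sketch_detect are hypothetical, and only a subset of the byte patterns listed in appendix F of the XML 1.0 recommendation is checked (for instance, UCS-4 byte order marks and the unusual UCS-4 orderings are left out).

```c
#include <stddef.h>

/* Hypothetical result codes for this sketch; not the library's enum. */
typedef enum {
    SKETCH_ENC_NONE = 0,
    SKETCH_ENC_UTF8,
    SKETCH_ENC_UTF16LE,
    SKETCH_ENC_UTF16BE,
    SKETCH_ENC_UCS4LE,
    SKETCH_ENC_UCS4BE,
    SKETCH_ENC_EBCDIC
} sketch_encoding;

/* Guess an encoding from the first bytes of an entity. */
sketch_encoding sketch_detect(const unsigned char *in, int len)
{
    if (in == NULL)
        return SKETCH_ENC_NONE;
    if (len >= 4) {
        /* "<" encoded as UCS-4, big endian then little endian */
        if (in[0] == 0x00 && in[1] == 0x00 && in[2] == 0x00 && in[3] == 0x3C)
            return SKETCH_ENC_UCS4BE;
        if (in[0] == 0x3C && in[1] == 0x00 && in[2] == 0x00 && in[3] == 0x00)
            return SKETCH_ENC_UCS4LE;
        /* "<?xm" in EBCDIC */
        if (in[0] == 0x4C && in[1] == 0x6F && in[2] == 0xA7 && in[3] == 0x94)
            return SKETCH_ENC_EBCDIC;
        /* "<?xm" in an ASCII-compatible encoding: assume UTF-8 */
        if (in[0] == 0x3C && in[1] == 0x3F && in[2] == 0x78 && in[3] == 0x6D)
            return SKETCH_ENC_UTF8;
        /* "<?" in UTF-16 without a byte order mark */
        if (in[0] == 0x3C && in[1] == 0x00 && in[2] == 0x3F && in[3] == 0x00)
            return SKETCH_ENC_UTF16LE;
        if (in[0] == 0x00 && in[1] == 0x3C && in[2] == 0x00 && in[3] == 0x3F)
            return SKETCH_ENC_UTF16BE;
    }
    if (len >= 3) {
        /* UTF-8 byte order mark */
        if (in[0] == 0xEF && in[1] == 0xBB && in[2] == 0xBF)
            return SKETCH_ENC_UTF8;
    }
    if (len >= 2) {
        /* UTF-16 byte order marks */
        if (in[0] == 0xFE && in[1] == 0xFF)
            return SKETCH_ENC_UTF16BE;
        if (in[0] == 0xFF && in[1] == 0xFE)
            return SKETCH_ENC_UTF16LE;
    }
    return SKETCH_ENC_NONE;
}
```

The length checks show why at least 2 bytes are needed for the UTF-16 cases and at least 4 for the UCS-4 cases.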

xmlFindCharEncodingHandler ()

xmlCharEncodingHandlerPtr	xmlFindCharEncodingHandler	(const char * name)

Search the registered set for the handler able to read/write that encoding.

name:a string describing the char encoding.
Returns:the handler or NULL if not found

xmlGetCharEncodingHandler ()

xmlCharEncodingHandlerPtr	xmlGetCharEncodingHandler	(xmlCharEncoding enc)

Search the registered set for the handler able to read/write that encoding.

enc:an xmlCharEncoding value.
Returns:the handler or NULL if not found

xmlGetCharEncodingName ()

const char *	xmlGetCharEncodingName	(xmlCharEncoding enc)

The "canonical" name for XML encoding. Cf. http://www.w3.org/TR/REC-xml#charencoding Section 4.3.3 Character Encoding in Entities

enc:the encoding
Returns:the canonical name for the given encoding


xmlInitCharEncodingHandlers ()

void	xmlInitCharEncodingHandlers	(void)

DEPRECATED: This function will be made private. Call xmlInitParser to initialize the library. Initializes the char encoding support; it registers the default supported encodings. NOTE: while public, this function usually doesn't need to be called in normal processing.


xmlNewCharEncodingHandler ()

xmlCharEncodingHandlerPtr	xmlNewCharEncodingHandler	(const char * name, 
xmlCharEncodingInputFunc input,
xmlCharEncodingOutputFunc output)

Creates and registers an xmlCharEncodingHandler.

name:the encoding name, in UTF-8 format (ASCII actually)
input:the xmlCharEncodingInputFunc to read that encoding
output:the xmlCharEncodingOutputFunc to write that encoding
Returns:the xmlCharEncodingHandlerPtr created (or NULL in case of error).
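
As an illustration of what a conversion callback might look like, the sketch below converts ISO-8859-1 to UTF-8 using the same parameter list as the xmlCharEncodingInputFunc typedef in the Synopsis. It is standalone and does not include any library header. The function name sketch_latin1_to_utf8 is hypothetical, and the in/out convention is an assumption of this sketch: on entry *outlen is the capacity of out and *inlen the number of input bytes, and on return they hold the bytes written and the bytes consumed.

```c
#include <stddef.h>

/* Convert ISO-8859-1 bytes to UTF-8.
 * Assumed convention (see the lead-in): *outlen and *inlen are sizes on
 * entry and are updated to bytes written / bytes consumed on return.
 * Returns the number of bytes written, or -1 on invalid arguments.
 * Conversion stops early, without error, when out is full. */
int sketch_latin1_to_utf8(unsigned char *out, int *outlen,
                          const unsigned char *in, int *inlen)
{
    int written = 0;
    int consumed = 0;

    if (out == NULL || outlen == NULL || in == NULL || inlen == NULL)
        return -1;

    while (consumed < *inlen) {
        unsigned char c = in[consumed];

        if (c < 0x80) {
            /* ASCII range maps to a single identical byte */
            if (written + 1 > *outlen)
                break;
            out[written++] = c;
        } else {
            /* 0x80..0xFF map to a two-byte UTF-8 sequence */
            if (written + 2 > *outlen)
                break;
            out[written++] = (unsigned char)(0xC0 | (c >> 6));
            out[written++] = (unsigned char)(0x80 | (c & 0x3F));
        }
        consumed++;
    }

    *outlen = written;
    *inlen = consumed;
    return written;
}
```

A function of this shape is the kind of value that would be passed as the input argument when creating a handler; the matching output function would perform the reverse conversion.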

xmlParseCharEncoding ()

xmlCharEncoding	xmlParseCharEncoding	(const char * name)

Compare the string to the encoding schemes already known. Note that the comparison is case insensitive, according to section [XML] 4.3.3 Character Encoding in Entities.

name:the encoding name as parsed, in UTF-8 format (ASCII actually)
Returns:one of the XML_CHAR_ENCODING_... values or XML_CHAR_ENCODING_NONE if not recognized.
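
The case-insensitive comparison can be sketched as a simple table lookup. This is a standalone illustration only: the enum, the table contents and the names sketch_name_equal and sketch_parse_encoding are hypothetical, and the set of names the library actually recognizes may differ.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical encoding codes for this sketch; not the library's enum. */
typedef enum {
    SKETCH_NONE = 0,
    SKETCH_UTF8,
    SKETCH_8859_1,
    SKETCH_SHIFT_JIS,
    SKETCH_EUC_JP,
    SKETCH_ASCII
} sketch_enc;

/* Compare two ASCII strings, ignoring case. Returns 1 if equal, else 0. */
int sketch_name_equal(const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0') {
        if (toupper((unsigned char)*a) != toupper((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == '\0' && *b == '\0';
}

/* Map an encoding name to a code; unknown or NULL names give SKETCH_NONE. */
sketch_enc sketch_parse_encoding(const char *name)
{
    /* Illustrative table; the entries are assumptions of this sketch. */
    static const struct {
        const char *name;
        sketch_enc enc;
    } table[] = {
        { "UTF-8",      SKETCH_UTF8 },
        { "ISO-8859-1", SKETCH_8859_1 },
        { "Shift_JIS",  SKETCH_SHIFT_JIS },
        { "EUC-JP",     SKETCH_EUC_JP },
        { "US-ASCII",   SKETCH_ASCII }
    };
    size_t i;

    if (name == NULL)
        return SKETCH_NONE;
    for (i = 0; i < sizeof table / sizeof table[0]; i++) {
        if (sketch_name_equal(name, table[i].name))
            return table[i].enc;
    }
    return SKETCH_NONE;
}
```

With this table, "utf-8", "UTF-8" and "Utf-8" all map to the same code, while an unknown name falls through to the "none" value.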

xmlRegisterCharEncodingHandler ()

void	xmlRegisterCharEncodingHandler	(xmlCharEncodingHandlerPtr handler)

Register the char encoding handler.

handler:the xmlCharEncodingHandlerPtr handler block