HTMLparser

Name

HTMLparser

Synopsis



typedef     htmlParserCtxt;
typedef     htmlParserCtxtPtr;
typedef     htmlParserNodeInfo;
typedef     htmlSAXHandler;
typedef     htmlSAXHandlerPtr;
typedef     htmlParserInput;
typedef     htmlParserInputPtr;
typedef     htmlDocPtr;
typedef     htmlNodePtr;
htmlElemDescPtr htmlTagLookup               (const xmlChar *tag);
htmlEntityDescPtr htmlEntityLookup          (const xmlChar *name);
int         htmlIsAutoClosed                (htmlDocPtr doc,
                                             htmlNodePtr elem);
int         htmlAutoCloseTag                (htmlDocPtr doc,
                                             const xmlChar *name,
                                             htmlNodePtr elem);
htmlEntityDescPtr htmlParseEntityRef        (htmlParserCtxtPtr ctxt,
                                             xmlChar **str);
int         htmlParseCharRef                (htmlParserCtxtPtr ctxt);
void        htmlParseElement                (htmlParserCtxtPtr ctxt);
htmlDocPtr  htmlSAXParseDoc                 (xmlChar *cur,
                                             const char *encoding,
                                             htmlSAXHandlerPtr sax,
                                             void *userData);
htmlDocPtr  htmlParseDoc                    (xmlChar *cur,
                                             const char *encoding);
htmlDocPtr  htmlSAXParseFile                (const char *filename,
                                             const char *encoding,
                                             htmlSAXHandlerPtr sax,
                                             void *userData);
htmlDocPtr  htmlParseFile                   (const char *filename,
                                             const char *encoding);

Description

Details

htmlParserCtxt


htmlParserCtxtPtr


htmlParserNodeInfo


htmlSAXHandler


htmlSAXHandlerPtr


htmlParserInput


htmlParserInputPtr


htmlDocPtr


htmlNodePtr


htmlTagLookup ()

htmlElemDescPtr htmlTagLookup               (const xmlChar *tag);

Look up the HTML tag in the ElementTable.

tag : 
Returns : 


htmlEntityLookup ()

htmlEntityDescPtr htmlEntityLookup          (const xmlChar *name);

Look up the given entity in EntitiesTable.

TODO: the linear scan is really ugly; a hash table is really needed.

name : 
Returns : 
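
The TODO above asks for a hash table in place of the linear scan. The following is a minimal sketch of that idea, not libxml's implementation: the demo_entity type, the seven-entry table and every demo_* name are hypothetical and exist only for illustration. It uses an FNV-1a string hash with linear probing over a fixed-size bucket array.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, cut-down entity record; NOT libxml's entity descriptor. */
typedef struct {
    const char *name;    /* entity name without '&' and ';' */
    unsigned int value;  /* code point the entity stands for */
} demo_entity;

/* Illustrative subset of the HTML entities. */
static const demo_entity demo_entities[] = {
    { "quot", 34 }, { "amp", 38 }, { "lt", 60 }, { "gt", 62 },
    { "nbsp", 160 }, { "copy", 169 }, { "eacute", 233 },
};

#define DEMO_NENT (sizeof(demo_entities) / sizeof(demo_entities[0]))
#define DEMO_NBUCKETS 16u  /* power of two, larger than DEMO_NENT */

static const demo_entity *demo_buckets[DEMO_NBUCKETS];
static int demo_ready = 0;

/* FNV-1a string hash. */
static unsigned int demo_hash(const char *s)
{
    unsigned int h = 2166136261u;
    while (*s != '\0') {
        h ^= (unsigned char)*s++;
        h *= 16777619u;
    }
    return h;
}

/* Fill the bucket array once, resolving collisions by linear probing. */
static void demo_build(void)
{
    size_t i;
    for (i = 0; i < DEMO_NENT; i++) {
        unsigned int slot = demo_hash(demo_entities[i].name) & (DEMO_NBUCKETS - 1);
        while (demo_buckets[slot] != NULL)
            slot = (slot + 1) & (DEMO_NBUCKETS - 1);
        demo_buckets[slot] = &demo_entities[i];
    }
    demo_ready = 1;
}

/* Returns the matching record, or NULL if the name is unknown. */
const demo_entity *demo_entity_lookup(const char *name)
{
    unsigned int slot;
    assert(name != NULL);
    if (!demo_ready)
        demo_build();
    slot = demo_hash(name) & (DEMO_NBUCKETS - 1);
    while (demo_buckets[slot] != NULL) {
        if (strcmp(demo_buckets[slot]->name, name) == 0)
            return demo_buckets[slot];
        slot = (slot + 1) & (DEMO_NBUCKETS - 1);
    }
    return NULL; /* hit an empty slot: not in the table */
}
```

Calling demo_entity_lookup("amp") yields the record whose value is 38, while an unknown name yields NULL. Because the bucket array is larger than the table, every probe sequence ends at an empty slot. The lazy initialisation through demo_ready is not thread-safe.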


htmlIsAutoClosed ()

int         htmlIsAutoClosed                (htmlDocPtr doc,
                                             htmlNodePtr elem);

The HTML DTD allows a tag to implicitly close other tags. The list is kept in the htmlStartClose array. This function checks whether a tag is autoclosed by one of its children.

doc : 
elem : 
Returns : 


htmlAutoCloseTag ()

int         htmlAutoCloseTag                (htmlDocPtr doc,
                                             const xmlChar *name,
                                             htmlNodePtr elem);

The HTML DTD allows a tag to implicitly close other tags. The list is kept in the htmlStartClose array. This function checks whether the element or one of its children would autoclose the given tag.

doc : 
name : 
elem : 
Returns : 
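
The two autoclose checks described above can be sketched over a toy tree. This is an illustration under stated assumptions, not libxml's code: demo_node stands in for the real node type, the demo_table pairs are a small illustrative subset rather than the contents of htmlStartClose, and all demo_* names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One "opening START implicitly closes an open CLOSES" rule. */
typedef struct {
    const char *start;   /* tag being opened */
    const char *closes;  /* open tag it implicitly closes */
} demo_start_close;

/* Illustrative subset only; NOT the actual htmlStartClose contents. */
static const demo_start_close demo_table[] = {
    { "li", "li" }, { "p", "p" }, { "div", "p" }, { "h1", "p" },
    { "td", "td" }, { "td", "th" }, { "tr", "td" }, { "tr", "th" },
    { "tr", "tr" }, { "dt", "dd" }, { "dd", "dt" },
};

/* Hypothetical minimal tree node standing in for the real node type. */
typedef struct demo_node {
    const char *name;
    struct demo_node *children;  /* first child */
    struct demo_node *next;      /* next sibling */
} demo_node;

/* 1 if opening newtag implicitly closes an open oldtag, else 0. */
int demo_start_closes(const char *newtag, const char *oldtag)
{
    size_t i;
    assert(newtag != NULL && oldtag != NULL);
    for (i = 0; i < sizeof(demo_table) / sizeof(demo_table[0]); i++) {
        if (strcmp(demo_table[i].start, newtag) == 0 &&
            strcmp(demo_table[i].closes, oldtag) == 0)
            return 1;
    }
    return 0;
}

/* 1 if elem or one of its descendants would autoclose the tag `name`. */
int demo_auto_close_tag(const char *name, const demo_node *elem)
{
    const demo_node *child;
    if (elem == NULL)
        return 0;
    if (demo_start_closes(elem->name, name))
        return 1;
    for (child = elem->children; child != NULL; child = child->next) {
        if (demo_auto_close_tag(name, child))
            return 1;
    }
    return 0;
}

/* 1 if elem is autoclosed by something in one of its child subtrees. */
int demo_is_auto_closed(const demo_node *elem)
{
    const demo_node *child;
    if (elem == NULL)
        return 0;
    for (child = elem->children; child != NULL; child = child->next) {
        if (demo_auto_close_tag(elem->name, child))
            return 1;
    }
    return 0;
}
```

With this table, a "p" node that has a "div" child is reported as autoclosed, because opening "div" implicitly closes "p". A "b" node containing only a "span" is not.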


htmlParseEntityRef ()

htmlEntityDescPtr htmlParseEntityRef        (htmlParserCtxtPtr ctxt,
                                             xmlChar **str);

Parse an HTML entity reference.

[68] EntityRef ::= '&' Name ';'

ctxt : 
str : 
Returns : 
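
As a rough illustration of the EntityRef production, the sketch below recognises '&' Name ';' at the start of a string and copies out the name. It makes simplifying assumptions that the real parser does not: Name is restricted to an ASCII letter followed by ASCII letters or digits, and the input is a plain NUL-terminated char string instead of a parser context. The function name demo_parse_entity_ref is hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/*
 * Try to match  '&' Name ';'  at the start of `in`.
 * On success the name is copied into `name` (capacity `cap`, including
 * the terminating NUL) and the number of input characters consumed is
 * returned. On failure `name` is set to "" and 0 is returned.
 * Simplification: Name is an ASCII letter followed by letters or digits.
 */
size_t demo_parse_entity_ref(const char *in, char *name, size_t cap)
{
    size_t i = 1;  /* index into `in`, just past the '&' */
    size_t n = 0;  /* characters copied into `name` */

    assert(in != NULL && name != NULL && cap > 0);
    name[0] = '\0';

    if (in[0] != '&' || !isalpha((unsigned char)in[1]))
        return 0;

    while (isalnum((unsigned char)in[i])) {
        if (n + 1 >= cap) {      /* no room left for the NUL */
            name[0] = '\0';
            return 0;
        }
        name[n++] = in[i++];
    }

    if (in[i] != ';') {          /* unterminated reference */
        name[0] = '\0';
        return 0;
    }
    name[n] = '\0';
    return i + 1;                /* include the ';' */
}
```

For the input "&amp; rest" the function stores "amp" and reports 5 characters consumed. An input such as "&amp" without the closing ';' is rejected.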


htmlParseCharRef ()

int         htmlParseCharRef                (htmlParserCtxtPtr ctxt);

Parse a character reference.

[66] CharRef ::= '&#' [0-9]+ ';' | '&#x' [0-9a-fA-F]+ ';'

ctxt : 
Returns : 


htmlParseElement ()

void        htmlParseElement                (htmlParserCtxtPtr ctxt);

Parse an HTML element. This function is highly recursive.

[39] element ::= EmptyElemTag | STag content ETag

[41] Attribute ::= Name Eq AttValue

ctxt : 


htmlSAXParseDoc ()

htmlDocPtr  htmlSAXParseDoc                 (xmlChar *cur,
                                             const char *encoding,
                                             htmlSAXHandlerPtr sax,
                                             void *userData);

Parse an HTML in-memory document and build a tree. It uses the given SAX function block to handle the parsing callbacks. If sax is NULL, it falls back to the default DOM tree building routines.

cur : 
encoding : 
sax : 
userData : 
Returns : 


htmlParseDoc ()

htmlDocPtr  htmlParseDoc                    (xmlChar *cur,
                                             const char *encoding);

Parse an HTML in-memory document and build a tree.

cur : 
encoding : 
Returns : 


htmlSAXParseFile ()

htmlDocPtr  htmlSAXParseFile                (const char *filename,
                                             const char *encoding,
                                             htmlSAXHandlerPtr sax,
                                             void *userData);

Parse an HTML file and build a tree. Automatic support for ZLIB/Compress compressed documents is provided by default if found at compile time. It uses the given SAX function block to handle the parsing callbacks. If sax is NULL, it falls back to the default DOM tree building routines.

filename : 
encoding : 
sax : 
userData : 
Returns : 


htmlParseFile ()

htmlDocPtr  htmlParseFile                   (const char *filename,
                                             const char *encoding);

Parse an HTML file and build a tree. Automatic support for ZLIB/Compress compressed documents is provided by default if found at compile time.

filename : 
encoding : 
Returns :