EDMLParser


Inherits From:
NSObject
Declared In:
EDMLParser.h


Class Description

This parser was implemented before the widespread adoption of XML and today's abundance of XML parsers, but it remains useful, especially in conjunction with the EDAOMTagProcessor. It is probably even more useful today as more and more applications have to deal with XML files. The parser can cope with fairly bad markup, as often found in HTML documents, and also handles more complex constructs such as XML namespaces. It is implemented as a shift/reduce parser, which makes it efficient but not tolerant of missing end tags.
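To illustrate why a shift/reduce design cannot recover from a missing end tag, here is a minimal sketch of the general technique using only Foundation. It is not EDMLParser's implementation; the function TagsAreBalanced and the token format (bare tag names, with a leading "/" for end tags) are assumptions made for this example. Start tags are shifted onto a stack and each end tag reduces the topmost one, so a missing end tag shows up as a mismatch.

```objc
#import <Foundation/Foundation.h>

// Illustrative sketch only, not EDMLParser's code: a stack-based
// (shift/reduce) check of tag nesting. Tokens are tag names; an end
// tag carries a leading "/", e.g. @"p" and @"/p".
// Returns YES if every start tag is closed by a matching end tag.
static BOOL TagsAreBalanced(NSArray *tokens)
{
    NSMutableArray *stack = [NSMutableArray array];

    for (NSString *token in tokens) {
        if ([token hasPrefix:@"/"]) {
            // Reduce: the end tag must match the most recent open start tag.
            NSString *name = [token substringFromIndex:1];
            if ([stack count] == 0 || ![[stack lastObject] isEqualToString:name])
                return NO;
            [stack removeLastObject];
        } else {
            // Shift: remember the start tag until its end tag arrives.
            [stack addObject:token];
        }
    }
    // Any start tag still on the stack was never closed.
    return [stack count] == 0;
}
```

With the tokens html, p, /html the reduce step finds p on top of the stack when it expects html; this is the kind of mismatch for which EDMLParser raises an exception.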

The parser needs a tag processor that implements the EDMLTagProcessor protocol; the parser invokes the processor's callback methods as events occur while it parses a string or document. (A few methods deal with configuration.) This mode of operation is very similar to the SAX API and provides great flexibility. Most applications, however, will use the AOM tag processor, which transforms the document into a tree in which the nodes are represented by objects. This is very DOM-like, with one significant exception: unlike typical DOM parsers, EDAOMTagProcessor uses node classes created by the application developer. This means, of course, that the resulting tree is directly meaningful in the application and that the node classes can contain application-specific behaviour. See the implementations in the EDSLProcessor framework for an example.

Typically, a parser with a custom tag processor is used as follows:     
    
    id <EDMLTagProcessor> myTagProcessor; // assume this exists
    NSString *myDocument; // assume this exists
    EDMLParser *parser;
    NSArray *toplevelElements;
    
    parser = [EDMLParser parserWithTagProcessor:myTagProcessor];
    toplevelElements = [parser parseString:myDocument];
    

EDAOMTagProcessor provides a convenience method that creates a parser with an AOM tag processor, used as follows. (See the class description of EDAOMTagProcessor for an explanation of the "tag definitions" dictionary.)
    
    NSDictionary *myTagDefinitions; // assume this exists
    NSString *myDocument; // assume this exists
    EDMLParser *parser;
    NSArray *toplevelElements;
    
    parser = [EDMLParser parserWithTagDefinitions:myTagDefinitions];
    toplevelElements = [parser parseString:myDocument];
    

Note that there are convenience methods to parse XML: parseXMLDocument:, parseXMLDocumentAtPath: and parseXMLFragment:.


Global Variables

Synopsis:

NSString *EDMLParserException;

Description:

Name of the exception raised when the parser encounters an error, generally a syntax error.


Instance Variables

BOOL preservesWhitespace;
id <EDMLTagProcessor> tagProcessor;
NSDictionary *entityTable;
unichar *source;
unichar *charp;
unsigned int lexmode;
id peekedToken;
NSMutableArray *stack;
NSMutableArray *namespaceStack;

All instance variables are private.


Method Types

Creating parser instances
+ parserWithTagProcessor:
- init
- initWithTagProcessor:
Assigning a tag processor
- setTagProcessor:
- tagProcessor
Configuring the parser
- setPreservesWhitespace:
- preservesWhitespace
- setEntityTable:
- entityTable
Parsing
- parseDocument:
- parseString:
- parseXMLDocumentAtPath:
- parseXMLDocument:
- parseXMLFragment:
Relevant character sets
+ spaceCharacterSet
+ idCharacterSet
+ textCharacterSet
+ attrStopCharacterSet


Class Methods

attrStopCharacterSet

+ (NSCharacterSet *)attrStopCharacterSet

Returns the character set that definitely marks the end of an attribute. In an ideal world this would be the inverse of the idCharacterSet, but in the real world it is simply {"=", ">"}. Override in subclasses for further customisation.


idCharacterSet

+ (NSCharacterSet *)idCharacterSet

Returns the character set containing all characters that can legally appear in tag and attribute names. The default is the alphanumeric set plus {"-", ":", "."}. Override in subclasses for further customisation.
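The defaults described for +idCharacterSet and +attrStopCharacterSet can be reproduced with Foundation alone, which may be a convenient starting point for a subclass override. The two function names below are hypothetical; the parser's own class methods return the real sets.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: alphanumerics plus "-", ":" and ".", the
// documented default for characters in tag and attribute names.
static NSCharacterSet *DefaultIdCharacterSet(void)
{
    NSMutableCharacterSet *set =
        [NSMutableCharacterSet characterSetWithCharactersInString:@"-:."];
    [set formUnionWithCharacterSet:[NSCharacterSet alphanumericCharacterSet]];
    return set;
}

// Hypothetical helper: "=" and ">", the documented default for
// characters that mark the end of an attribute.
static NSCharacterSet *DefaultAttrStopCharacterSet(void)
{
    return [NSCharacterSet characterSetWithCharactersInString:@"=>"];
}
```

A subclass that also wanted to accept "_" in names could, for instance, add it to such a set and return the result from its override of +idCharacterSet.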


parserWithTagProcessor:

+ (id)parserWithTagProcessor:(id <EDMLTagProcessor>)aTagProcessor

Creates and returns a parser which will use aTagProcessor.


spaceCharacterSet

+ (NSCharacterSet *)spaceCharacterSet

Returns the character set containing all characters that should be considered spaces. The default is the whitespaceAndNewlineCharacterSet as returned by NSCharacterSet. Override in subclasses for further customisation.


textCharacterSet

+ (NSCharacterSet *)textCharacterSet

Returns the character set containing all characters that can legally appear between tags, excluding whitespace. The default is every character except whitespace and the less-than and greater-than characters ("<" and ">"). Override in subclasses for further customisation.
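Similarly, the documented defaults for +spaceCharacterSet and +textCharacterSet can be sketched with Foundation, again using hypothetical function names. The text set is built by inverting the union of the space set and {"<", ">"}.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: the documented default space set.
static NSCharacterSet *DefaultSpaceCharacterSet(void)
{
    return [NSCharacterSet whitespaceAndNewlineCharacterSet];
}

// Hypothetical helper: every character except whitespace/newlines
// and the angle brackets "<" and ">".
static NSCharacterSet *DefaultTextCharacterSet(void)
{
    NSMutableCharacterSet *excluded =
        [NSMutableCharacterSet characterSetWithCharactersInString:@"<>"];
    [excluded formUnionWithCharacterSet:DefaultSpaceCharacterSet()];
    return [excluded invertedSet];
}
```

Because the space set is folded into the excluded characters, a subclass that overrides +spaceCharacterSet would probably want to keep its text set consistent with it.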


Instance Methods

entityTable

- (NSDictionary *)entityTable

Returns the parser's entity table. See setEntityTable: for details.


init

- (id)init

Initialises a newly allocated parser.


initWithTagProcessor:

- (id)initWithTagProcessor:(id <EDMLTagProcessor>)aTagProcessor

Initialises a newly allocated parser and sets the tag processor to aTagProcessor.


parseDocument:

- (id)parseDocument:(NSString *)aString

Parses, or tries to parse, the document contained in aString using parseString:, then creates a document object by passing the resulting elements to the tag processor's documentForElements: method. Consequently, the class of the returned object depends on the tag processor; please refer to the respective class documentation.


parseString:

- (NSArray *)parseString:(NSString *)aString

Parses, or tries to parse, the text contained in aString. During parsing, methods from the EDMLTagProcessor protocol are sent to the current tag processor. parseString: returns an array of all top-level elements found in the string, as created by the tag processor. Exceptions are raised when syntax errors or mismatched container tags are encountered. (If the tag processor raises an exception, the parser shuts down cleanly and re-raises it.)
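The clean-up-then-propagate behaviour described in the parenthesis can be expressed in Objective-C with a @finally block. The following is a minimal sketch of that pattern, not EDMLParser's code; ProcessToken, ParseTokens and the exception name TagProcessorError are hypothetical, and the array merely stands in for the parser's internal stack.

```objc
#import <Foundation/Foundation.h>

// Hypothetical stand-in for a tag processor callback that can fail.
static void ProcessToken(NSString *token)
{
    if ([token isEqualToString:@"bad"])
        [NSException raise:@"TagProcessorError" format:@"cannot handle %@", token];
}

// Sketch of "shut down cleanly and re-raise": the @finally block resets
// the parse stack whether or not an exception occurred. An exception
// raised inside @try continues on to the caller after @finally runs.
static NSArray *ParseTokens(NSArray *tokens, NSMutableArray *stack)
{
    NSArray *result = nil;

    @try {
        for (NSString *token in tokens) {
            ProcessToken(token);
            [stack addObject:token];
        }
        result = [NSArray arrayWithArray:stack];
    }
    @finally {
        [stack removeAllObjects];
    }
    return result;
}
```

The caller therefore sees the original exception, while the parser's state is already reset for the next parse.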


parseXMLDocument:

- (id)parseXMLDocument:(NSData *)xmlData

Determines the string encoding of xmlData, converts the data into a string and calls parseDocument:. The class of the returned object depends on the tag processor.

Note that this method automatically loads the standard XML entity table if no entity table is set.
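This reference does not say how the encoding is determined. One common approach, sketched here as an assumption rather than a description of EDMLParser, is to look for a UTF-16 byte order mark, then for an encoding="..." pseudo-attribute in the XML declaration, and otherwise fall back to UTF-8, the XML default. GuessXMLEncoding is a hypothetical helper; it recognises only a handful of encoding names and does not handle spaces around "=".

```objc
#import <Foundation/Foundation.h>
#include <string.h>

// Hypothetical helper, not EDMLParser's implementation: guess the
// encoding of raw XML data. Only a few encoding names are recognised.
static NSStringEncoding GuessXMLEncoding(NSData *data)
{
    const unsigned char *bytes = [data bytes];
    NSUInteger length = [data length];
    const char *key = "encoding=";
    NSUInteger keyLength = strlen(key);
    NSUInteger limit = 0, i;

    // 1. UTF-16 byte order marks.
    if (length >= 2 && bytes[0] == 0xFE && bytes[1] == 0xFF)
        return NSUTF16BigEndianStringEncoding;
    if (length >= 2 && bytes[0] == 0xFF && bytes[1] == 0xFE)
        return NSUTF16LittleEndianStringEncoding;

    // 2. encoding="..." inside the declaration, which ends at the first '>'.
    while (limit < length && bytes[limit] != '>')
        limit++;
    for (i = 0; i + keyLength < limit; i++) {
        NSUInteger start, end, nameLength;
        char name[64];
        NSDictionary *known;
        NSNumber *encoding;

        if (memcmp(bytes + i, key, keyLength) != 0)
            continue;
        start = i + keyLength;                  // opening quote
        if (bytes[start] != '"' && bytes[start] != '\'')
            break;
        end = start + 1;
        while (end < limit && bytes[end] != bytes[start])
            end++;                              // closing quote
        nameLength = end - start - 1;
        if (end >= limit || nameLength >= sizeof(name))
            break;
        memcpy(name, bytes + start + 1, nameLength);
        name[nameLength] = '\0';

        known = [NSDictionary dictionaryWithObjectsAndKeys:
            [NSNumber numberWithUnsignedInteger:NSUTF8StringEncoding], @"utf-8",
            [NSNumber numberWithUnsignedInteger:NSISOLatin1StringEncoding], @"iso-8859-1",
            [NSNumber numberWithUnsignedInteger:NSASCIIStringEncoding], @"us-ascii",
            nil];
        // Latin-1 decoding never fails, so the lookup key is never nil.
        encoding = [known objectForKey:
            [[NSString stringWithCString:name encoding:NSISOLatin1StringEncoding] lowercaseString]];
        if (encoding != nil)
            return [encoding unsignedIntegerValue];
        break;
    }
    // 3. XML's default encoding.
    return NSUTF8StringEncoding;
}
```

Once the encoding is known, the data could be converted with NSString's initWithData:encoding: before parsing.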


parseXMLDocumentAtPath:

- (id)parseXMLDocumentAtPath:(NSString *)path

Loads the contents of the file at path and passes them to parseXMLDocument:.


parseXMLFragment:

- (NSArray *)parseXMLFragment:(NSString *)xmlString

Parses the XML fragment (which has to be passed as a string) and returns an array of all top-level elements found in the string.

Note that this method automatically loads the standard XML entity table if no entity table is set.


preservesWhitespace

- (BOOL)preservesWhitespace

Returns the parser's whitespace handling mode. See setPreservesWhitespace: for details.


setEntityTable:

- (void)setEntityTable:(NSDictionary *)aDictionary

Sets the "entity table" for the parser. This table maps entities of the form &ename; to another string, usually a single character. Entities are only replaced within text, not within tags.

Example table:     
    {    
        lt = "<";
        gt = ">";
        amp = "&";
        apos = "'";
        quot = "\"";
    }
    
    

Note that if you use the parseXML... methods the standard XML entity table is automatically set if no other entity table was provided before.
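A minimal sketch of the replacement rule, assuming the table maps entity names (without "&" and ";") to replacement strings as in the example table. ReplaceEntities is a hypothetical helper, not part of EDMLParser; it leaves unknown entities untouched, which may differ from what the parser does.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: replace each "&name;" whose name is a key in
// table with the corresponding value; unknown entities are kept as-is.
static NSString *ReplaceEntities(NSString *text, NSDictionary *table)
{
    NSMutableString *result = [NSMutableString string];
    NSScanner *scanner = [NSScanner scannerWithString:text];

    [scanner setCharactersToBeSkipped:nil];     // keep all whitespace
    while (![scanner isAtEnd]) {
        NSString *chunk = nil, *name = nil, *value = nil;
        NSUInteger ampersand;

        // Copy plain text up to the next "&".
        if ([scanner scanUpToString:@"&" intoString:&chunk])
            [result appendString:chunk];
        if ([scanner isAtEnd])
            break;
        ampersand = [scanner scanLocation];
        [scanner setScanLocation:ampersand + 1];  // skip "&"
        if ([scanner scanUpToString:@";" intoString:&name] && ![scanner isAtEnd]
            && (value = [table objectForKey:name]) != nil) {
            [result appendString:value];
            [scanner setScanLocation:[scanner scanLocation] + 1];  // skip ";"
        } else {
            // Not a known entity: emit "&" literally and rescan after it.
            [result appendString:@"&"];
            [scanner setScanLocation:ampersand + 1];
        }
    }
    return result;
}
```

With a dictionary equivalent to the example table, "a &lt; b" becomes "a < b".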


setPreservesWhitespace:

- (void)setPreservesWhitespace:(BOOL)flag

Controls whitespace handling. If set to YES, the parser passes each whitespace sequence to its tag processor exactly as it appears in the source. If set to NO, it replaces the sequence with a single space, @" ". The default is not to preserve whitespace.

Note that the tag processor can specify that whitespace within text, i.e. between tags, should be treated as text.
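The effect of not preserving whitespace can be sketched as follows, using the default space character set (whitespaceAndNewlineCharacterSet). CollapseWhitespace is a hypothetical helper that applies the rule to a whole string; the parser itself applies it to each whitespace sequence it passes to the tag processor.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: replace every run of whitespace and newlines
// with a single space, mirroring setPreservesWhitespace:NO.
static NSString *CollapseWhitespace(NSString *text)
{
    NSCharacterSet *space = [NSCharacterSet whitespaceAndNewlineCharacterSet];
    NSScanner *scanner = [NSScanner scannerWithString:text];
    NSMutableString *result = [NSMutableString string];
    NSString *part = nil;

    [scanner setCharactersToBeSkipped:nil];
    while (![scanner isAtEnd]) {
        // Copy non-space characters unchanged.
        if ([scanner scanUpToCharactersFromSet:space intoString:&part])
            [result appendString:part];
        // A whole run of spaces becomes one @" ".
        if ([scanner scanCharactersFromSet:space intoString:NULL])
            [result appendString:@" "];
    }
    return result;
}
```

Each loop iteration consumes either a run of text or a run of whitespace, so the scan always advances and terminates.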


setTagProcessor:

- (void)setTagProcessor:(id <EDMLTagProcessor>)aTagProcessor

Sets the tag processor to aTagProcessor. The parser retains its tag processor. Changing the tag processor while a string is being parsed is probably unwise.


tagProcessor

- (id <EDMLTagProcessor>)tagProcessor

Returns the parser's tag processor.


Version 2.2 Copyright ©2002. All Rights Reserved.