- Inherits From:
- NSObject
- Declared In:
- EDMLParser.h
The parser needs a tag processor that implements the EDMLTagProcessor protocol; the parser invokes the protocol's callback methods at events during the parsing of a string or document. (A few methods deal with configuration.) This mode of operation is very similar to the SAX API and provides great flexibility. Most applications, however, will use the AOM tag processor, which transforms the document into a tree whose nodes are represented by objects. This is very DOM-like, with one significant exception: unlike typical DOM parsers, EDAOMTagProcessor uses node classes created by the application developer. This means that the resulting tree is directly meaningful in the application and that the node classes can contain application-specific behaviour. See the implementations in the EDSLProcessor framework for an example.
Typically, a parser with a custom tag processor is used as follows:
id <EDMLTagProcessor> myTagProcessor; // assume this exists
NSString *myDocument; // assume this exists
EDMLParser *parser;
NSArray *toplevelElements;
parser = [EDMLParser parserWithTagProcessor:myTagProcessor];
toplevelElements = [parser parseString:myDocument];
EDAOMTagProcessor provides a convenience method to initialise a parser with the AOM processor. (See the class description of EDAOMTagProcessor for an explanation of the "tag definitions" dictionary.) It is used as follows:
NSDictionary *myTagDefinitions; // assume this exists
NSString *myDocument; // assume this exists
EDMLParser *parser;
NSArray *toplevelElements;
parser = [EDMLParser parserWithTagDefinitions:myTagDefinitions];
toplevelElements = [parser parseString:myDocument];
Note that there are convenience methods to parse XML.
Synopsis:
NSString *EDMLParserException;
Description:
Exception thrown when the parser encounters an error; generally a syntax error.
BOOL preservesWhitespace;
id <EDMLTagProcessor> tagProcessor;
NSDictionary *entityTable;
unichar *source;
unichar *charp;
unsigned int lexmode;
id peekedToken;
NSMutableArray *stack;
NSMutableArray *namespaceStack;
All instance variables are private.
Creating parser instances
- + parserWithTagProcessor:
- - init
- - initWithTagProcessor:
Assigning a tag processor
- - setTagProcessor:
- - tagProcessor
Configuring the parser
- - setPreservesWhitespace:
- - preservesWhitespace
- - setEntityTable:
- - entityTable
Parsing
- - parseDocument:
- - parseString:
- - parseXMLDocumentAtPath:
- - parseXMLDocument:
- - parseXMLFragment:
Relevant character sets
- + spaceCharacterSet
- + idCharacterSet
- + textCharacterSet
- + attrStopCharacterSet
+ (NSCharacterSet *)attrStopCharacterSet
Returns the character set that definitely marks the end of an attribute. In an ideal world this would be the inverse of the idCharacterSet, but in the real world it is simply {"=", ">"}. Override in subclasses for further customisation.
+ (NSCharacterSet *)idCharacterSet
Returns the character set containing all characters that can legally appear in tag and attribute names. The default is the alphanumeric set plus {"-", ":", "."}. Override in subclasses for further customisation.
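The customisation hook described above could be used roughly as follows. This is a minimal sketch, not part of the framework: the subclass name MYUnderscoreParser is hypothetical, the header location is an assumption, and the example assumes that the parser consults the class method of its own class when scanning names.

```objc
#import <Foundation/Foundation.h>
#import "EDMLParser.h" // assumed header location; adjust to your project setup

// Hypothetical subclass that also accepts "_" in tag and attribute names.
@interface MYUnderscoreParser : EDMLParser
@end

@implementation MYUnderscoreParser

+ (NSCharacterSet *)idCharacterSet
{
    // Start with the extra character, then merge in the inherited default set.
    NSMutableCharacterSet *set =
        [NSMutableCharacterSet characterSetWithCharactersInString:@"_"];
    [set formUnionWithCharacterSet:[super idCharacterSet]];
    return set;
}

@end
```

The set is rebuilt on every call to keep the sketch short; a real subclass would probably cache it.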
+ (id)parserWithTagProcessor:(id <EDMLTagProcessor>)aTagProcessor
Creates and returns a parser which will use aTagProcessor.
+ (NSCharacterSet *)spaceCharacterSet
Returns the character set containing all characters that should be considered spaces. The default is the "whitespaceAndNewlineCharacterSet" set as returned by NSCharacterSet. Override in subclasses for further customisation.
+ (NSCharacterSet *)textCharacterSet
Returns the character set containing all characters that can legally appear between tags, minus whitespace. The default is "everything" minus whitespace minus the greater-than and less-than characters. Override in subclasses for further customisation.
- (NSDictionary *)entityTable
Returns the parser's entity table. See setEntityTable: for details.
- (id)init
Initialises a newly allocated parser.
- (id)initWithTagProcessor:(id <EDMLTagProcessor>)aTagProcessor
Initialises a newly allocated parser and sets the tag processor to aTagProcessor.
- (id)parseDocument:(NSString *)aString
Parses, or tries to parse, the document contained in aString using parseString:. Then creates a document object using documentForElements: in the tag processor. Consequently, the class of the returned object depends on the tag processor. Please refer to the respective class documentation.
- (NSArray *)parseString:(NSString *)aString
Parses, or tries to parse, the text contained in aString. During the process, methods from the EDMLTagProcessor protocol are sent to the current tag processor. parseString: returns an array of all top-level elements found in the string, as created by the tag processor. Exceptions are raised when syntax errors or mismatched container tags are encountered. (If the tag processor raises an exception, the parser shuts down properly and re-raises it.)
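Because parseString: can raise, callers may want to separate parser errors from exceptions that originate in the tag processor. The following is a minimal sketch: the function name MYParseOrNil is hypothetical, the header location is an assumption, and it assumes that EDMLParserException is used as the name of the raised NSException.

```objc
#import <Foundation/Foundation.h>
#import "EDMLParser.h" // assumed header location; adjust to your project setup

// Hypothetical helper: returns the top-level elements, or nil on a parser error.
static NSArray *MYParseOrNil(EDMLParser *parser, NSString *text)
{
    NSArray *elements = nil;

    @try {
        elements = [parser parseString:text];
    }
    @catch (NSException *exception) {
        // Assumption: EDMLParserException is the exception's name.
        if (![[exception name] isEqualToString:EDMLParserException]) {
            @throw; // not a parser error, e.g. raised by the tag processor
        }
        NSLog(@"Could not parse text: %@", [exception reason]);
    }
    return elements;
}
```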
- (id)parseXMLDocument:(NSData *)xmlData
Determines the string encoding of the xmlData, converts it into a string and calls parseDocument:. The class of the returned object depends on the tag processor.
Note that this method automatically loads the standard XML entity table if no entity table is set.
- (id)parseXMLDocumentAtPath:(NSString *)path
Loads the file at path and calls parseXMLDocument:
- (NSArray *)parseXMLFragment:(NSString *)xmlString
Parses the XML fragment (which has to be passed as a string) and returns an array of all top-level elements found in the string.
Note that this method automatically loads the standard XML entity table if no entity table is set.
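The XML convenience methods could be used along these lines. This is only a sketch: the helper names are hypothetical, the header location is an assumption, and the tag processor is assumed to exist already.

```objc
#import <Foundation/Foundation.h>
#import "EDMLParser.h" // assumed header location; adjust to your project setup

// Hypothetical helper: parses an XML file; the class of the result
// depends on the tag processor.
static id MYDocumentAtPath(id <EDMLTagProcessor> processor, NSString *path)
{
    EDMLParser *parser = [EDMLParser parserWithTagProcessor:processor];
    return [parser parseXMLDocumentAtPath:path];
}

// Hypothetical helper: parses a small fragment. No entity table is set
// here, so the standard XML table is loaded and "&amp;" is replaced.
static NSArray *MYSampleElements(id <EDMLTagProcessor> processor)
{
    EDMLParser *parser = [EDMLParser parserWithTagProcessor:processor];
    return [parser parseXMLFragment:@"<p>Fish &amp; chips</p>"];
}
```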
- (BOOL)preservesWhitespace
Returns the parser's whitespace handling mode. See setPreservesWhitespace: for details.
- (void)setEntityTable:(NSDictionary *)aDictionary
Sets the "entity table" for the parser. This table maps entities of the form &ename; to another string, usually a single character. Entities are only replaced within text, not within tags.
Example table:
{
lt = "<";
gt = ">";
amp = "&";
apos = "'";
quot = "\"";
}
Note that if you use the parseXML... methods the standard XML entity table is automatically set if no other entity table was provided before.
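A custom table might be installed as in the following sketch. The helper name and the additional nbsp entry are illustrative assumptions, as is the header location. The keys are the bare entity names, as in the example table. Since the standard XML table is only loaded when no table has been set, the sketch repeats the standard entries it still needs.

```objc
#import <Foundation/Foundation.h>
#import "EDMLParser.h" // assumed header location; adjust to your project setup

// Hypothetical helper: installs a table with a few standard entities plus "nbsp".
static void MYInstallEntities(EDMLParser *parser)
{
    // U+00A0 is the no-break space character.
    NSString *nbsp = [NSString stringWithFormat:@"%C", (unichar)0x00A0];

    // dictionaryWithObjectsAndKeys: takes value, key pairs.
    NSDictionary *table = [NSDictionary dictionaryWithObjectsAndKeys:
        @"<",  @"lt",
        @">",  @"gt",
        @"&",  @"amp",
        nbsp,  @"nbsp",
        nil];

    [parser setEntityTable:table];
}
```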
- (void)setPreservesWhitespace:(BOOL)flag
Controls whitespace handling. If set to YES, the parser will pass the exact whitespace sequence to its tag processor. If set to NO, it replaces each sequence with a simple @" ". The default is not to preserve whitespace.
Note that the tag processor can specify that whitespace within text, i.e. between tags, should be treated as text.
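For content where layout matters, such as preformatted text, the flag can be switched on before parsing. This is a minimal sketch with a hypothetical helper name and an assumed header location:

```objc
#import <Foundation/Foundation.h>
#import "EDMLParser.h" // assumed header location; adjust to your project setup

// Hypothetical helper: parses text while passing whitespace through unchanged.
static NSArray *MYParsePreservingWhitespace(id <EDMLTagProcessor> processor,
                                            NSString *text)
{
    EDMLParser *parser = [EDMLParser parserWithTagProcessor:processor];
    [parser setPreservesWhitespace:YES]; // the default is NO
    return [parser parseString:text];
}
```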
- (void)setTagProcessor:(id <EDMLTagProcessor>)aTagProcessor
Sets the tag processor to aTagProcessor. The parser retains its tag processor. Note that it is probably not wise to change tag processors while parsing a string.
- (id <EDMLTagProcessor>)tagProcessor
Returns the parser's tag processor.