<?xml version="1.0" encoding="iso-8859-1"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
          "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html>
<head>
<!--
                            __  __            _
                         ___\ \/ /_ __   __ _| |_
                        / _ \\  /| '_ \ / _` | __|
                       |  __//  \| |_) | (_| | |_
                        \___/_/\_\ .__/ \__,_|\__|
                                 |_| XML parser

   Copyright (c) 2000      Clark Cooper <coopercc@users.sourceforge.net>
   Copyright (c) 2000-2004 Fred L. Drake, Jr. <fdrake@users.sourceforge.net>
   Copyright (c) 2002-2012 Karl Waclawek <karl@waclawek.net>
   Copyright (c) 2017-2022 Sebastian Pipping <sebastian@pipping.org>
   Copyright (c) 2017      Jakub Wilk <jwilk@jwilk.net>
   Copyright (c) 2021      Tomas Korbar <tkorbar@redhat.com>
   Copyright (c) 2021      Nicolas Cavallari <nicolas.cavallari@green-communications.fr>
   Copyright (c) 2022      Thijs Schreijer <thijs@thijsschreijer.nl>
   Licensed under the MIT license:

   Permission is  hereby granted, free of charge, to any person obtaining
   a copy of this software and associated documentation files (the
   "Software"), to deal in the Software without restriction, including
   without limitation the rights to use, copy, modify, merge, publish,
   distribute, sublicense, and/or sell copies of the Software, and to permit
   persons to whom the Software is furnished to do so, subject to the
   following conditions:

   The above copyright notice and this permission notice shall be included
   in all copies or substantial portions of the Software.

   THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
   EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
   MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN
   NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM,
   DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR
   OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE
   USE OR OTHER DEALINGS IN THE SOFTWARE.
-->
  <title>Expat XML Parser</title>
  <meta name="author" content="Clark Cooper, coopercc@netheaven.com" />
  <meta http-equiv="Content-Style-Type" content="text/css" />
  <link href="ok.min.css" rel="stylesheet" type="text/css" />
  <link href="style.css" rel="stylesheet" type="text/css" />
</head>
<body>
  <div>
    <h1>
      The Expat XML Parser
      <small>Release 2.4.7</small>
    </h1>
  </div>
<div class="content">

<p>Expat is a library, written in C, for parsing XML documents. It's
the underlying XML parser for the open source Mozilla project, Perl's
<code>XML::Parser</code>, Python's <code>xml.parsers.expat</code>, and
other open-source XML parsers.</p>

<p>This library is the creation of James Clark, who's also given us
groff (an nroff look-alike), Jade (an implementation of ISO's DSSSL
stylesheet language for SGML), XP (a Java XML parser package), and XT (a
Java XSL engine).  James was also the technical lead on the XML
Working Group at W3C that produced the XML specification.</p>

<p>This is free software, licensed under the <a
href="../COPYING">MIT/X Consortium license</a>. You may download it
from <a href="http://www.libexpat.org/">the Expat home page</a>.
</p>

<p>The bulk of this document was originally commissioned as an article
by <a href="http://www.xml.com/">XML.com</a>. They graciously allowed
Clark Cooper to retain copyright and to distribute it with Expat.
This version has been substantially extended to include documentation
on features which have been added since the original article was
published, and additional information on using the original
interface.</p>

<hr />
<h2>Table of Contents</h2>
<ul>
  <li><a href="#overview">Overview</a></li>
  <li><a href="#building">Building and Installing</a></li>
  <li><a href="#using">Using Expat</a></li>
  <li><a href="#reference">Reference</a>
  <ul>
    <li><a href="#creation">Parser Creation Functions</a>
    <ul>
      <li><a href="#XML_ParserCreate">XML_ParserCreate</a></li>
      <li><a href="#XML_ParserCreateNS">XML_ParserCreateNS</a></li>
      <li><a href="#XML_ParserCreate_MM">XML_ParserCreate_MM</a></li>
      <li><a href="#XML_ExternalEntityParserCreate">XML_ExternalEntityParserCreate</a></li>
      <li><a href="#XML_ParserFree">XML_ParserFree</a></li>
      <li><a href="#XML_ParserReset">XML_ParserReset</a></li>
    </ul>
    </li>
    <li><a href="#parsing">Parsing Functions</a>
    <ul>
      <li><a href="#XML_Parse">XML_Parse</a></li>
      <li><a href="#XML_ParseBuffer">XML_ParseBuffer</a></li>
      <li><a href="#XML_GetBuffer">XML_GetBuffer</a></li>
      <li><a href="#XML_StopParser">XML_StopParser</a></li>
      <li><a href="#XML_ResumeParser">XML_ResumeParser</a></li>
      <li><a href="#XML_GetParsingStatus">XML_GetParsingStatus</a></li>
    </ul>
    </li>
    <li><a href="#setting">Handler Setting Functions</a>
    <ul>
      <li><a href="#XML_SetStartElementHandler">XML_SetStartElementHandler</a></li>
      <li><a href="#XML_SetEndElementHandler">XML_SetEndElementHandler</a></li>
      <li><a href="#XML_SetElementHandler">XML_SetElementHandler</a></li>
      <li><a href="#XML_SetCharacterDataHandler">XML_SetCharacterDataHandler</a></li>
      <li><a href="#XML_SetProcessingInstructionHandler">XML_SetProcessingInstructionHandler</a></li>
      <li><a href="#XML_SetCommentHandler">XML_SetCommentHandler</a></li>
      <li><a
href="#XML_SetStartCdataSectionHandler">XML_SetStartCdataSectionHandler</a></li>
      <li><a href="#XML_SetEndCdataSectionHandler">XML_SetEndCdataSectionHandler</a></li>
      <li><a href="#XML_SetCdataSectionHandler">XML_SetCdataSectionHandler</a></li>
      <li><a href="#XML_SetDefaultHandler">XML_SetDefaultHandler</a></li>
      <li><a href="#XML_SetDefaultHandlerExpand">XML_SetDefaultHandlerExpand</a></li>
      <li><a href="#XML_SetExternalEntityRefHandler">XML_SetExternalEntityRefHandler</a></li>
      <li><a href="#XML_SetExternalEntityRefHandlerArg">XML_SetExternalEntityRefHandlerArg</a></li>
      <li><a href="#XML_SetSkippedEntityHandler">XML_SetSkippedEntityHandler</a></li>
      <li><a href="#XML_SetUnknownEncodingHandler">XML_SetUnknownEncodingHandler</a></li>
      <li><a href="#XML_SetStartNamespaceDeclHandler">XML_SetStartNamespaceDeclHandler</a></li>
      <li><a href="#XML_SetEndNamespaceDeclHandler">XML_SetEndNamespaceDeclHandler</a></li>
      <li><a href="#XML_SetNamespaceDeclHandler">XML_SetNamespaceDeclHandler</a></li>
      <li><a href="#XML_SetXmlDeclHandler">XML_SetXmlDeclHandler</a></li>
      <li><a href="#XML_SetStartDoctypeDeclHandler">XML_SetStartDoctypeDeclHandler</a></li>
      <li><a href="#XML_SetEndDoctypeDeclHandler">XML_SetEndDoctypeDeclHandler</a></li>
      <li><a href="#XML_SetDoctypeDeclHandler">XML_SetDoctypeDeclHandler</a></li>
      <li><a href="#XML_SetElementDeclHandler">XML_SetElementDeclHandler</a></li>
      <li><a href="#XML_SetAttlistDeclHandler">XML_SetAttlistDeclHandler</a></li>
      <li><a href="#XML_SetEntityDeclHandler">XML_SetEntityDeclHandler</a></li>
      <li><a href="#XML_SetUnparsedEntityDeclHandler">XML_SetUnparsedEntityDeclHandler</a></li>
      <li><a href="#XML_SetNotationDeclHandler">XML_SetNotationDeclHandler</a></li>
      <li><a href="#XML_SetNotStandaloneHandler">XML_SetNotStandaloneHandler</a></li>
    </ul>
    </li>
    <li><a href="#position">Parse Position and Error Reporting Functions</a>
    <ul>
      <li><a
href="#XML_GetErrorCode">XML_GetErrorCode</a></li>
      <li><a href="#XML_ErrorString">XML_ErrorString</a></li>
      <li><a href="#XML_GetCurrentByteIndex">XML_GetCurrentByteIndex</a></li>
      <li><a href="#XML_GetCurrentLineNumber">XML_GetCurrentLineNumber</a></li>
      <li><a href="#XML_GetCurrentColumnNumber">XML_GetCurrentColumnNumber</a></li>
      <li><a href="#XML_GetCurrentByteCount">XML_GetCurrentByteCount</a></li>
      <li><a href="#XML_GetInputContext">XML_GetInputContext</a></li>
    </ul>
    </li>
    <li>
      <a href="#billion-laughs">Billion Laughs Attack Protection</a>
      <ul>
        <li><a href="#XML_SetBillionLaughsAttackProtectionMaximumAmplification">XML_SetBillionLaughsAttackProtectionMaximumAmplification</a></li>
        <li><a href="#XML_SetBillionLaughsAttackProtectionActivationThreshold">XML_SetBillionLaughsAttackProtectionActivationThreshold</a></li>
      </ul>
    </li>
    <li><a href="#miscellaneous">Miscellaneous Functions</a>
    <ul>
      <li><a href="#XML_SetUserData">XML_SetUserData</a></li>
      <li><a href="#XML_GetUserData">XML_GetUserData</a></li>
      <li><a href="#XML_UseParserAsHandlerArg">XML_UseParserAsHandlerArg</a></li>
      <li><a href="#XML_SetBase">XML_SetBase</a></li>
      <li><a href="#XML_GetBase">XML_GetBase</a></li>
      <li><a href="#XML_GetSpecifiedAttributeCount">XML_GetSpecifiedAttributeCount</a></li>
      <li><a href="#XML_GetIdAttributeIndex">XML_GetIdAttributeIndex</a></li>
      <li><a href="#XML_GetAttributeInfo">XML_GetAttributeInfo</a></li>
      <li><a href="#XML_SetEncoding">XML_SetEncoding</a></li>
      <li><a href="#XML_SetParamEntityParsing">XML_SetParamEntityParsing</a></li>
      <li><a href="#XML_SetHashSalt">XML_SetHashSalt</a></li>
      <li><a href="#XML_UseForeignDTD">XML_UseForeignDTD</a></li>
      <li><a href="#XML_SetReturnNSTriplet">XML_SetReturnNSTriplet</a></li>
      <li><a href="#XML_DefaultCurrent">XML_DefaultCurrent</a></li>
      <li><a href="#XML_ExpatVersion">XML_ExpatVersion</a></li>
      <li><a
href="#XML_ExpatVersionInfo">XML_ExpatVersionInfo</a></li>
      <li><a href="#XML_GetFeatureList">XML_GetFeatureList</a></li>
      <li><a href="#XML_FreeContentModel">XML_FreeContentModel</a></li>
      <li><a href="#XML_MemMalloc">XML_MemMalloc</a></li>
      <li><a href="#XML_MemRealloc">XML_MemRealloc</a></li>
      <li><a href="#XML_MemFree">XML_MemFree</a></li>
    </ul>
    </li>
  </ul>
  </li>
</ul>

<hr />
<h2><a name="overview">Overview</a></h2>

<p>Expat is a stream-oriented parser. You register callback (or
handler) functions with the parser and then start feeding it the
document.  As the parser recognizes parts of the document, it will
call the appropriate handler for that part (if you've registered one).
The document is fed to the parser in pieces, so you can start parsing
before you have the whole document. This also allows you to parse
documents that are too large to fit into memory.</p>

<p>Expat can be intimidating due to the many kinds of handlers and
options you can set. But you only need to learn four functions in
order to do 90% of what you'll want to do with it:</p>

<dl>

<dt><code><a href= "#XML_ParserCreate"
             >XML_ParserCreate</a></code></dt>
  <dd>Create a new parser object.</dd>

<dt><code><a href= "#XML_SetElementHandler"
             >XML_SetElementHandler</a></code></dt>
  <dd>Set handlers for start and end tags.</dd>

<dt><code><a href= "#XML_SetCharacterDataHandler"
             >XML_SetCharacterDataHandler</a></code></dt>
  <dd>Set handler for text.</dd>

<dt><code><a href= "#XML_Parse"
             >XML_Parse</a></code></dt>
  <dd>Pass a buffer full of document to the parser.</dd>
</dl>

<p>These functions and others are described in the <a
href="#reference">reference</a> part of this document.
The reference section also describes in detail the parameters passed to the
different types of handlers.</p>

<p>Let's look at a very simple example program that only uses three of the
above functions (it doesn't need to set a character handler). The
program <a href="../examples/outline.c">outline.c</a> prints an
element outline, indenting child elements to distinguish them from the
parent element that contains them. The start handler does all the
work.  It prints two indenting spaces for every level of ancestor
elements, then it prints the element and attribute
information. Finally it increments the global <code>Depth</code>
variable.</p>

<pre class="eg">
int Depth;

void XMLCALL
start(void *data, const char *el, const char **attr) {
  int i;

  for (i = 0; i &lt; Depth; i++)
    printf("  ");

  printf("%s", el);

  for (i = 0; attr[i]; i += 2) {
    printf(" %s='%s'", attr[i], attr[i + 1]);
  }

  printf("\n");
  Depth++;
}  /* End of start handler */
</pre>

<p>The end tag handler simply does the bookkeeping work of decrementing
<code>Depth</code>.</p>
<pre class="eg">
void XMLCALL
end(void *data, const char *el) {
  Depth--;
}  /* End of end handler */
</pre>

<p>Note the <code>XMLCALL</code> annotation used for the callbacks.
This is used to ensure that Expat and the callbacks are using the
same calling convention in case the compiler options used for Expat
itself and the client code are different.  Expat tries not to care
what the default calling convention is, though it may require that it
be compiled with a default convention of "cdecl" on some platforms.
For code which uses Expat, however, the calling convention is
specified by the <code>XMLCALL</code> annotation on most platforms;
callbacks should be defined using this annotation.</p>

<p>The <code>XMLCALL</code> annotation was added in Expat 1.95.7, but
existing working Expat applications don't need to add it (since they
are already using the "cdecl" calling convention, or they wouldn't be
working).  The annotation is only needed if the default calling
convention may be something other than "cdecl".  To use the annotation
safely with older versions of Expat, you can conditionally define it
<em>after</em> including Expat's header file:</p>

<pre class="eg">
#include &lt;expat.h&gt;

#ifndef XMLCALL
#if defined(_MSC_EXTENSIONS) &amp;&amp; !defined(__BEOS__) &amp;&amp; !defined(__CYGWIN__)
#define XMLCALL __cdecl
#elif defined(__GNUC__)
#define XMLCALL __attribute__((cdecl))
#else
#define XMLCALL
#endif
#endif
</pre>

<p>After creating the parser, the main program just has the job of
shoveling the document to the parser so that it can do its work.</p>

<hr />
<h2><a name="building">Building and Installing Expat</a></h2>

<p>The Expat distribution comes as a compressed (with GNU gzip) tar
file.  You may download the latest version from <a href=
"http://sourceforge.net/projects/expat/" >SourceForge</a>.  After
unpacking this, cd into the directory. Then follow either the Win32
directions or Unix directions below.</p>

<h3>Building under Win32</h3>

<p>If you're using the GNU compiler under cygwin, follow the Unix
directions in the next section. Otherwise if you have Microsoft's
Developer Studio installed,
you can use CMake to generate a <code>.sln</code> file, e.g.
<code>
cmake -G"Visual Studio 15 2017" -DCMAKE_BUILD_TYPE=RelWithDebInfo .
</code>, and then build Expat using <code>msbuild /m expat.sln</code>.</p>

<p>Alternatively, you may download the Win32 binary package that
contains the "expat.h" include file and a pre-built DLL.</p>

<h3>Building under Unix (or GNU)</h3>

<p>First you'll need to run the configure shell script in order to
configure the Makefiles and headers for your system.</p>

<p>If you're happy with all the defaults that configure picks for you,
and you have permission on your system to install into /usr/local, you
can install Expat with this sequence of commands:</p>

<pre class="eg">
./configure
make
make install
</pre>

<p>There are some options that you can provide to this script, but the
only one we'll mention here is the <code>--prefix</code> option. You
can find out all the options available by running configure with just
the <code>--help</code> option.</p>

<p>By default, the configure script sets things up so that the library
gets installed in <code>/usr/local/lib</code> and the associated
header file in <code>/usr/local/include</code>.  But if you were to
give the option, <code>--prefix=/home/me/mystuff</code>, then the
library and header would get installed in
<code>/home/me/mystuff/lib</code> and
<code>/home/me/mystuff/include</code> respectively.</p>

<h3>Configuring Expat Using the Pre-Processor</h3>

<p>Expat's feature set can be configured using a small number of
pre-processor definitions.  The definition of these symbols does not
affect the set of entry points for Expat, only the behavior of the API
and the definition of character types in the case of
<code>XML_UNICODE_WCHAR_T</code>.  The symbols are:</p>

<dl class="cpp-symbols">
<dt>XML_DTD</dt>
<dd>Include support for using and reporting DTD-based content.
If this is defined, default attribute values from an external DTD subset
are reported and attribute value normalization occurs based on the
type of attributes defined in the external subset.  Without
this, Expat has a smaller memory footprint and can be faster, but will
not load external entities or process conditional sections.  This does
not affect the set of functions available in the API.</dd>

<dt>XML_NS</dt>
<dd>When defined, support for the <cite><a href=
"http://www.w3.org/TR/REC-xml-names/" >Namespaces in XML</a></cite>
specification is included.</dd>

<dt>XML_UNICODE</dt>
<dd>When defined, character data reported to the application is
encoded in UTF-16 using wide characters of the type
<code>XML_Char</code>.  This is implied if
<code>XML_UNICODE_WCHAR_T</code> is defined.</dd>

<dt>XML_UNICODE_WCHAR_T</dt>
<dd>If defined, causes the <code>XML_Char</code> character type to be
defined using the <code>wchar_t</code> type; otherwise, <code>unsigned
short</code> is used.  Defining this implies
<code>XML_UNICODE</code>.</dd>

<dt>XML_LARGE_SIZE</dt>
<dd>If defined, causes the <code>XML_Size</code> and <code>XML_Index</code>
integer types to be at least 64 bits in size. This is intended to support
processing of very large input streams, where the return values of
<code><a href="#XML_GetCurrentByteIndex" >XML_GetCurrentByteIndex</a></code>,
<code><a href="#XML_GetCurrentLineNumber" >XML_GetCurrentLineNumber</a></code> and
<code><a href="#XML_GetCurrentColumnNumber" >XML_GetCurrentColumnNumber</a></code>
could overflow. It may not be supported by all compilers, and is turned
off by default.</dd>

<dt>XML_CONTEXT_BYTES</dt>
<dd>The number of input bytes of markup context which the parser will
ensure are available for reporting via <code><a href=
"#XML_GetInputContext" >XML_GetInputContext</a></code>.
This is normally set to 1024, and must be set to a positive integer.  If this
is not defined, the input context will not be available and <code><a
href= "#XML_GetInputContext" >XML_GetInputContext</a></code> will
always report NULL.  Without this, Expat has a smaller memory
footprint and can be faster.</dd>

<dt>XML_STATIC</dt>
<dd>On Windows, this should be set if Expat is going to be linked
statically with the code that calls it; this is required to get all
the MSVC magic annotations correct.  This is ignored on other
platforms.</dd>

<dt>XML_ATTR_INFO</dt>
<dd>If defined, makes the additional function <code><a href=
"#XML_GetAttributeInfo" >XML_GetAttributeInfo</a></code> available
for reporting attribute byte offsets.</dd>
</dl>

<hr />
<h2><a name="using">Using Expat</a></h2>

<h3>Compiling and Linking Against Expat</h3>

<p>Unless you installed Expat in a location not expected by your
compiler and linker, all you have to do to use Expat in your programs
is to include the Expat header (<code>#include &lt;expat.h&gt;</code>)
in your files that make calls to it and to tell the linker that it
needs to link against the Expat library.  On Unix systems, this would
usually be done with the <code>-lexpat</code> argument.  Otherwise,
you'll need to tell the compiler where to look for the Expat header
and the linker where to find the Expat library.
You may also need to
take steps to tell the operating system where to find this library at
run time.</p>

<p>On a Unix-based system, here's what a Makefile might look like when
Expat is installed in a standard location:</p>

<pre class="eg">
CC=cc
LDFLAGS=
LIBS= -lexpat
xmlapp: xmlapp.o
	$(CC) $(LDFLAGS) -o xmlapp xmlapp.o $(LIBS)
</pre>

<p>If you installed Expat in, say, <code>/home/me/mystuff</code>, then
the Makefile would look like this:</p>

<pre class="eg">
CC=cc
CFLAGS= -I/home/me/mystuff/include
LDFLAGS=
LIBS= -L/home/me/mystuff/lib -lexpat
xmlapp: xmlapp.o
	$(CC) $(LDFLAGS) -o xmlapp xmlapp.o $(LIBS)
</pre>

<p>You'd also have to set the environment variable
<code>LD_LIBRARY_PATH</code> to <code>/home/me/mystuff/lib</code> (or
to <code>${LD_LIBRARY_PATH}:/home/me/mystuff/lib</code> if
LD_LIBRARY_PATH already has some directories in it) in order to run
your application.</p>

<h3>Expat Basics</h3>

<p>As we saw in the example in the overview, the first step in parsing
an XML document with Expat is to create a parser object. There are <a
href="#creation">three functions</a> in the Expat API for creating a
parser object.  However, only two of these (<code><a href=
"#XML_ParserCreate" >XML_ParserCreate</a></code> and <code><a href=
"#XML_ParserCreateNS" >XML_ParserCreateNS</a></code>) can be used for
constructing a parser for a top-level document.  The object returned
by these functions is an opaque pointer (i.e. "expat.h" declares it as
void *) to data with further internal structure. In order to free the
memory associated with this object you must call <code><a href=
"#XML_ParserFree" >XML_ParserFree</a></code>.
Note that if you have
provided any <a href="#userdata">user data</a> that gets stored in the
parser, then your application is responsible for freeing it prior to
calling <code>XML_ParserFree</code>.</p>

<p>The objects returned by the parser creation functions are good for
parsing only one XML document or external parsed entity. If your
application needs to parse many XML documents, then it needs to create
a parser object for each one. The best way to deal with this is to
create a higher level object that contains all the default
initialization you want for your parser objects.</p>

<p>Walking through a document hierarchy with a stream-oriented parser
will require a good stack mechanism in order to keep track of current
context.  For instance, to answer the simple question, "What element
does this text belong to?" requires a stack, since the parser may have
descended into other elements that are children of the current one and
has encountered this text on the way out.</p>

<p>The things you're likely to want to keep on a stack are the
currently opened element and its attributes. You push this
information onto the stack in the start handler and you pop it off in
the end handler.</p>

<p>For some tasks, it is sufficient to just keep information on what
the depth of the stack is (or would be if you had one). The outline
program shown above presents one example. Another such task would be
skipping over a complete element. When you see the start tag for the
element you want to skip, you set a skip flag and record the depth at
which the element started.  When the end tag handler encounters the
same depth, the skipped element has ended and the flag may be
cleared.
If you follow the convention that the root element starts at
1, then you can use the same variable for skip flag and skip
depth.</p>

<pre class="eg">
void
init_info(Parseinfo *info) {
  info-&gt;skip = 0;
  info-&gt;depth = 1;
  /* Other initializations here */
}  /* End of init_info */

void XMLCALL
rawstart(void *data, const char *el, const char **attr) {
  Parseinfo *inf = (Parseinfo *) data;

  if (! inf-&gt;skip) {
    if (should_skip(inf, el, attr)) {
      inf-&gt;skip = inf-&gt;depth;
    }
    else
      start(inf, el, attr);     /* This does rest of start handling */
  }

  inf-&gt;depth++;
}  /* End of rawstart */

void XMLCALL
rawend(void *data, const char *el) {
  Parseinfo *inf = (Parseinfo *) data;

  inf-&gt;depth--;

  if (! inf-&gt;skip)
    end(inf, el);              /* This does rest of end handling */

  if (inf-&gt;skip == inf-&gt;depth)
    inf-&gt;skip = 0;
}  /* End rawend */
</pre>

<p>Notice in the above example the difference in how depth is
manipulated in the start and end handlers. The end tag handler should
be the mirror image of the start tag handler. This is necessary to
properly model containment. Since, in the start tag handler, we
incremented depth <em>after</em> the main body of start tag code, then
in the end handler, we need to manipulate it <em>before</em> the main
body.  If we'd decided to increment it first thing in the start
handler, then we'd have had to decrement it last thing in the end
handler.</p>

<h3 id="userdata">Communicating between handlers</h3>

<p>In order to be able to pass information between different handlers
without using globals, you'll need to define a data structure to hold
the shared variables. You can then tell Expat (with the <code><a href=
"#XML_SetUserData" >XML_SetUserData</a></code> function) to pass a
pointer to this structure to the handlers.  This is the first
argument received by most handlers.
In the <a href="#reference"
>reference section</a>, an argument to a callback function is named
<code>userData</code> and has type <code>void *</code> if the user
data is passed; it will have the type <code>XML_Parser</code> if the
parser itself is passed.  When the parser is passed, the user data may
be retrieved using <code><a href="#XML_GetUserData"
>XML_GetUserData</a></code>.</p>

<p>One common case where multiple calls to a single handler may need
to communicate using an application data structure is the case when
content passed to the character data handler (set by <code><a href=
"#XML_SetCharacterDataHandler"
>XML_SetCharacterDataHandler</a></code>) needs to be accumulated.  A
common first-time mistake with any of the event-oriented interfaces to
an XML parser is to expect all the text contained in an element to be
reported by a single call to the character data handler.  Expat, like
many other XML parsers, reports such data as a sequence of calls;
there's no way to know when the end of the sequence is reached until a
different callback is made.  A buffer referenced by the user data
structure proves to be both an effective and convenient place to
accumulate character data.</p>

<!-- XXX example needed here -->


<h3>XML Version</h3>

<p>Expat is an XML 1.0 parser, and as such never complains based on
the value of the <code>version</code> pseudo-attribute in the XML
declaration, if present.</p>

<p>If an application needs to check the version number (to support
alternate processing), it should use the <code><a href=
"#XML_SetXmlDeclHandler" >XML_SetXmlDeclHandler</a></code> function to
set a handler that uses the information in the XML declaration to
determine what to do.
This example shows how to check that only a
version number of <code>"1.0"</code> is accepted:</p>

<pre class="eg">
static int wrong_version;
static XML_Parser parser;

static void XMLCALL
xmldecl_handler(void            *userData,
                const XML_Char  *version,
                const XML_Char  *encoding,
                int              standalone)
{
  static const XML_Char Version_1_0[] = {'1', '.', '0', 0};

  int i;

  for (i = 0; i &lt; (sizeof(Version_1_0) / sizeof(Version_1_0[0])); ++i) {
    if (version[i] != Version_1_0[i]) {
      wrong_version = 1;
      /* also clear all other handlers: */
      XML_SetCharacterDataHandler(parser, NULL);
      ...
      return;
    }
  }
  ...
}
</pre>

<h3>Namespace Processing</h3>

<p>When the parser is created using the <code><a href=
"#XML_ParserCreateNS" >XML_ParserCreateNS</a></code> function, Expat
performs namespace processing. Under namespace processing, Expat
consumes <code>xmlns</code> and <code>xmlns:...</code> attributes,
which declare namespaces for the scope of the element in which they
occur. This means that your start handler will not see these
attributes.  Your application can still be informed of these
declarations by setting namespace declaration handlers with <a href=
"#XML_SetNamespaceDeclHandler"
><code>XML_SetNamespaceDeclHandler</code></a>.</p>

<p>Element type and attribute names that belong to a given namespace
are passed to the appropriate handler in expanded form. By default
this expanded form is a concatenation of the namespace URI, the
separator character (which is the second argument to <code><a href=
"#XML_ParserCreateNS" >XML_ParserCreateNS</a></code>), and the local
name (i.e. the part after the colon). Names with undeclared prefixes
are not well-formed when namespace processing is enabled, and will
trigger an error.
Unprefixed attribute names are never expanded,
and unprefixed element names are only expanded when they are in the
scope of a default namespace.</p>

<p>However if <code><a href= "#XML_SetReturnNSTriplet"
>XML_SetReturnNSTriplet</a></code> has been called with a non-zero
<code>do_nst</code> parameter, then the expanded form for names with
an explicit prefix is a concatenation of: URI, separator, local name,
separator, prefix.</p>

<p>You can set handlers for the start of a namespace declaration and
for the end of a scope of a declaration with the <code><a href=
"#XML_SetNamespaceDeclHandler" >XML_SetNamespaceDeclHandler</a></code>
function.  The StartNamespaceDeclHandler is called prior to the start
tag handler and the EndNamespaceDeclHandler is called after the
corresponding end tag that ends the namespace's scope.  The namespace
start handler gets passed the prefix and URI for the namespace.  For a
default namespace declaration (xmlns='...'), the prefix will be null.
The URI will be null for the case where the default namespace is being
unset.  The namespace end handler just gets the prefix for the closing
scope.</p>

<p>These handlers are called for each declaration. So if, for
instance, a start tag had three namespace declarations, then the
StartNamespaceDeclHandler would be called three times before the start
tag handler is called, once for each declaration.</p>

<h3>Character Encodings</h3>

<p>While XML is based on Unicode, and every XML processor is required
to recognize UTF-8 and UTF-16 (Unicode encodings based on 8-bit and
16-bit code units),
other encodings may be declared in XML documents or entities.
For the
main document, an XML declaration may contain an encoding
declaration:</p>
<pre>
&lt;?xml version="1.0" encoding="ISO-8859-2"?&gt;
</pre>

<p>External parsed entities may begin with a text declaration, which
looks like an XML declaration with just an encoding declaration:</p>
<pre>
&lt;?xml encoding="Big5"?&gt;
</pre>

<p>With Expat, you may also specify an encoding at the time of
creating a parser. This is useful when the encoding information may
come from a source outside the document itself (like a higher-level
protocol).</p>

<p><a name="builtin_encodings"></a>There are four built-in encodings
in Expat:</p>
<ul>
<li>UTF-8</li>
<li>UTF-16</li>
<li>ISO-8859-1</li>
<li>US-ASCII</li>
</ul>

<p>Anything else discovered in an encoding declaration or in the
protocol encoding specified in the parser constructor triggers a call
to the <code>UnknownEncodingHandler</code>. This handler gets passed
the encoding name and a pointer to an <code>XML_Encoding</code> data
structure. Your handler must fill in this structure and return
<code>XML_STATUS_OK</code> if it knows how to deal with the
encoding. Otherwise the handler should return
<code>XML_STATUS_ERROR</code>.  The handler also gets passed a pointer
to an optional application data structure that you may indicate when
you set the handler.</p>

<p>Expat places restrictions on character encodings that it can
support by filling in the <code>XML_Encoding</code> structure.
The restrictions are:</p>
<ol>
<li>Every ASCII character that can appear in a well-formed XML document
must be represented by a single byte, and that byte must correspond to
its ASCII encoding (except for the characters $@\^'{}~).</li>
<li>Characters must be encoded in 4 bytes or fewer.</li>
<li>All characters encoded must have Unicode scalar values less than or
equal to 65535 (0xFFFF). <em>This does not apply to the built-in support
for UTF-16 and UTF-8.</em></li>
<li>No character may be encoded by more than one distinct sequence of
bytes.</li>
</ol>

<p><code>XML_Encoding</code> contains an array of integers that
correspond to the first byte of an encoding sequence. If the value in
the array for a byte is zero or positive, then the byte is a single
byte encoding that encodes the Unicode scalar value contained in the
array. A -1 in this array indicates a malformed byte. If the value is
-2, -3, or -4, then the byte is the beginning of a 2, 3, or 4 byte
sequence respectively. Multi-byte sequences are sent to the convert
function pointed at in the <code>XML_Encoding</code> structure. This
function should return the Unicode scalar value for the sequence or -1
if the sequence is malformed.</p>

<p>One pitfall that novice Expat users are likely to fall into is that
although Expat may accept input in various encodings, the strings that
it passes to the handlers are always encoded in UTF-8 or UTF-16
(depending on how Expat was compiled). Your application is responsible
for any translation of these strings into other encodings.</p>

<h3>Handling External Entity References</h3>

<p>Expat does not read or parse external entities directly. Note that
any external DTD is a special case of an external entity. If you've
set no <code>ExternalEntityRefHandler</code>, then external entity
references are silently ignored.
Otherwise, Expat calls your handler with
the information needed to read and parse the external entity.</p>

<p>Your handler isn't actually responsible for parsing the entity, but
it is responsible for creating a subsidiary parser with <code><a href=
"#XML_ExternalEntityParserCreate"
>XML_ExternalEntityParserCreate</a></code> that will do the job. This
returns an instance of <code>XML_Parser</code> that has handlers and
other data structures initialized from the parent parser. You may then
use <code><a href= "#XML_Parse" >XML_Parse</a></code> or <code><a
href= "#XML_ParseBuffer">XML_ParseBuffer</a></code> calls against this
parser. Since external entities may refer to other external entities,
your handler should be prepared to be called recursively.</p>

<h3>Parsing DTDs</h3>

<p>In order to parse parameter entities, before starting the parse,
you must call <code><a href= "#XML_SetParamEntityParsing"
>XML_SetParamEntityParsing</a></code> with one of the following
arguments:</p>
<dl>
<dt><code>XML_PARAM_ENTITY_PARSING_NEVER</code></dt>
<dd>Don't parse parameter entities or the external subset.</dd>
<dt><code>XML_PARAM_ENTITY_PARSING_UNLESS_STANDALONE</code></dt>
<dd>Parse parameter entities and the external subset unless
<code>standalone</code> was set to "yes" in the XML declaration.</dd>
<dt><code>XML_PARAM_ENTITY_PARSING_ALWAYS</code></dt>
<dd>Always parse parameter entities and the external subset.</dd>
</dl>

<p>In order to read an external DTD, you also have to set an external
entity reference handler as described above.</p>

<h3 id="stop-resume">Temporarily Stopping Parsing</h3>

<p>Expat 1.95.8 introduces a new feature: it's now possible to stop
parsing temporarily from within a handler function, even if more data
has already been passed into the parser.
Applications for this
include:</p>

<ul>
  <li>Supporting the <a href= "http://www.w3.org/TR/xinclude/"
  >XInclude</a> specification.</li>

  <li>Delaying further processing until additional information is
  available from some other source.</li>

  <li>Adjusting processor load as task priorities shift within an
  application.</li>

  <li>Stopping parsing completely (simply free or reset the parser
  instead of resuming in the outer parsing loop). This can be useful
  if an application-domain error is found in the XML being parsed or if
  the result of the parse is determined not to be useful after
  all.</li>
</ul>

<p>To take advantage of this feature, the main parsing loop of an
application needs to support this specifically. It cannot be
supported with a parsing loop compatible with Expat 1.95.7 or
earlier (though existing loops will continue to work without
supporting the stop/resume feature).</p>

<p>An application that uses this feature for a single parser will have
the rough structure (in pseudo-code):</p>

<pre class="pseudocode">
fd = open_input()
p = create_parser()

if parse_xml(p, fd) {
  /* suspended */

  int suspended = 1;

  while (suspended) {
    do_something_else()
    if ready_to_resume() {
      suspended = continue_parsing(p, fd);
    }
  }
}
</pre>

<p>An application that may resume any of several parsers based on
input (either from the XML being parsed or some other source) will
certainly have more interesting control structures.</p>

<p>This C function could be used for the <code>parse_xml</code>
function mentioned in the pseudo-code above:</p>

<pre class="eg">
#define BUFF_SIZE 10240

/* Parse a document from the open file descriptor 'fd' until the parse
   is complete (the document has been completely parsed, or there's
   been an error), or the parse is stopped.
   Return non-zero when
   the parse is merely suspended.
*/
int
parse_xml(XML_Parser p, int fd)
{
  for (;;) {
    int bytes_read;
    enum XML_Status status;

    void *buff = XML_GetBuffer(p, BUFF_SIZE);
    if (buff == NULL) {
      /* handle error... */
      return 0;
    }
    bytes_read = read(fd, buff, BUFF_SIZE);
    if (bytes_read < 0) {
      /* handle error... */
      return 0;
    }
    status = XML_ParseBuffer(p, bytes_read, bytes_read == 0);
    switch (status) {
      case XML_STATUS_ERROR:
        /* handle error... */
        return 0;
      case XML_STATUS_SUSPENDED:
        return 1;
      default:
        break;
    }
    if (bytes_read == 0)
      return 0;
  }
}
</pre>

<p>The corresponding <code>continue_parsing</code> function is
somewhat simpler, since it only needs to deal with the return code from
<code><a href= "#XML_ResumeParser">XML_ResumeParser</a></code>; it can
delegate the input handling to the <code>parse_xml</code>
function:</p>

<pre class="eg">
/* Continue parsing a document which had been suspended.  The 'p' and
   'fd' arguments are the same as passed to parse_xml().  Return
   non-zero when the parse is suspended.
*/
int
continue_parsing(XML_Parser p, int fd)
{
  enum XML_Status status = XML_ResumeParser(p);
  switch (status) {
    case XML_STATUS_ERROR:
      /* handle error...; XML_GetErrorCode(p) reports
         XML_ERROR_NOT_SUSPENDED if the parser was not suspended */
      return 0;
    case XML_STATUS_SUSPENDED:
      return 1;
    default:
      break;
  }
  return parse_xml(p, fd);
}
</pre>

<p>Now that we've seen what a mess the top-level parsing loop can
become, what have we gained?  Very simply, we can now use the <code><a
href= "#XML_StopParser" >XML_StopParser</a></code> function to stop
parsing, without having to go to great lengths to avoid additional
processing that we're expecting to ignore.
As a bonus, we get to stop
parsing <em>temporarily</em>, and come back to it when we're
ready.</p>

<p>To stop parsing from a handler function, use the <code><a href=
"#XML_StopParser" >XML_StopParser</a></code> function. This function
takes two arguments: the parser being stopped and a flag indicating
whether the parse can be resumed in the future.</p>

<!-- XXX really need more here -->


<hr />
<!-- ================================================================ -->

<h2><a name="reference">Expat Reference</a></h2>

<h3><a name="creation">Parser Creation</a></h3>

<h4 id="XML_ParserCreate">XML_ParserCreate</h4>
<pre class="fcndec">
XML_Parser XMLCALL
XML_ParserCreate(const XML_Char *encoding);
</pre>
<div class="fcndef">
Construct a new parser. If encoding is non-null, it specifies a
character encoding to use for the document. This overrides the document
encoding declaration. There are four built-in encodings:
<ul>
<li>US-ASCII</li>
<li>UTF-8</li>
<li>UTF-16</li>
<li>ISO-8859-1</li>
</ul>
Any other value will invoke a call to the UnknownEncodingHandler.
</div>

<h4 id="XML_ParserCreateNS">XML_ParserCreateNS</h4>
<pre class="fcndec">
XML_Parser XMLCALL
XML_ParserCreateNS(const XML_Char *encoding,
                   XML_Char sep);
</pre>
<div class="fcndef">
Constructs a new parser that has namespace processing in effect. Namespace
expanded element names and attribute names are returned as a concatenation
of the namespace URI, <em>sep</em>, and the local part of the name. This
means that you should pick a character for <em>sep</em> that can't be part
of a URI. Since Expat does not check namespace URIs for conformance, the
only safe choice for a namespace separator is a character that is illegal
in XML. For instance, <code>'\xFF'</code> is not legal in UTF-8, and
<code>'\xFFFF'</code> is not legal in UTF-16.
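<p>As a rough illustration of what a handler receives, the sketch below
splits an expanded name at the separator. The helper
<code>split_expanded_name</code> is hypothetical and not part of Expat; it
assumes a build where <code>XML_Char</code> is <code>char</code>, a non-null
separator that cannot occur in the URI, and that namespace triplets are not
enabled.</p>

<pre class="eg">
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not an Expat function.  Split a namespace-expanded
   name of the form URI + sep + local-part at the first occurrence of sep.
   Returns a pointer to the local part and stores the length of the URI
   in *uri_len (0 when the name carries no namespace URI).
*/
static const char *
split_expanded_name(const char *name, char sep, size_t *uri_len)
{
  /* a null separator cannot be searched for, so treat it as "no split" */
  const char *p = (sep != '\0') ? strchr(name, sep) : NULL;

  if (p == NULL) {
    /* no separator found: the name was not expanded */
    *uri_len = 0;
    return name;
  }
  *uri_len = (size_t)(p - name);
  return p + 1;
}
</pre>

<p>A start element handler could call this helper on its <code>name</code>
argument; names that were not expanded come back unchanged with a URI
length of zero.</p>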
There is a special case when
<em>sep</em> is the null character <code>'\0'</code>: the namespace URI and
the local part will be concatenated without any separator; this is intended
to support RDF processors. It is a programming error to use the null separator
with <a href= "#XML_SetReturnNSTriplet">namespace triplets</a>.</div>

<p><strong>Note:</strong>
Expat does not validate namespace URIs (beyond encoding)
against RFC 3986 today (and is not required to do so with regard to
the XML 1.0 namespaces specification) but it may start doing that
in future releases. Before that, an application using Expat must
be ready to receive namespace URIs containing non-URI characters.
</p>

<h4 id="XML_ParserCreate_MM">XML_ParserCreate_MM</h4>
<pre class="fcndec">
XML_Parser XMLCALL
XML_ParserCreate_MM(const XML_Char *encoding,
                    const XML_Memory_Handling_Suite *ms,
                    const XML_Char *sep);
</pre>
<pre class="signature">
typedef struct {
  void *(XMLCALL *malloc_fcn)(size_t size);
  void *(XMLCALL *realloc_fcn)(void *ptr, size_t size);
  void (XMLCALL *free_fcn)(void *ptr);
} XML_Memory_Handling_Suite;
</pre>
<div class="fcndef">
<p>Construct a new parser using the suite of memory handling functions
specified in <code>ms</code>. If <code>ms</code> is NULL, then use the
standard set of memory management functions.
If <code>sep</code> is
non-NULL, then namespace processing is enabled in the created parser
and the character pointed at by <code>sep</code> is used as the separator between
the namespace URI and the local part of the name.</p>
</div>

<h4 id="XML_ExternalEntityParserCreate">XML_ExternalEntityParserCreate</h4>
<pre class="fcndec">
XML_Parser XMLCALL
XML_ExternalEntityParserCreate(XML_Parser p,
                               const XML_Char *context,
                               const XML_Char *encoding);
</pre>
<div class="fcndef">
Construct a new <code>XML_Parser</code> object for parsing an external
general entity. Context is the context argument passed in a call to an
ExternalEntityRefHandler. Other state information such as handlers,
user data, and namespace processing is inherited from the parser passed as
the 1st argument. So you shouldn't need to call any of the behavior
changing functions on this parser (unless you want it to act
differently than the parent parser).
</div>

<h4 id="XML_ParserFree">XML_ParserFree</h4>
<pre class="fcndec">
void XMLCALL
XML_ParserFree(XML_Parser p);
</pre>
<div class="fcndef">
Free memory used by the parser. Your application is responsible for
freeing any memory associated with <a href="#userdata">user data</a>.
</div>

<h4 id="XML_ParserReset">XML_ParserReset</h4>
<pre class="fcndec">
XML_Bool XMLCALL
XML_ParserReset(XML_Parser p,
                const XML_Char *encoding);
</pre>
<div class="fcndef">
Clean up the memory structures maintained by the parser so that it may
be used again. After this has been called, <code>p</code> is
ready to start parsing a new document. All handlers are cleared from
the parser, except for the unknownEncodingHandler. The parser's external
state is re-initialized except for the values of ns and ns_triplets.
This function may not be used on a parser created using <code><a href=
"#XML_ExternalEntityParserCreate" >XML_ExternalEntityParserCreate</a
></code>; it will return <code>XML_FALSE</code> in that case. Returns
<code>XML_TRUE</code> on success. Your application is responsible for
dealing with any memory associated with <a href="#userdata">user data</a>.
</div>

<h3><a name="parsing">Parsing</a></h3>

<p>To state the obvious: the three parsing functions <code><a href=
"#XML_Parse" >XML_Parse</a></code>, <code><a href= "#XML_ParseBuffer">
XML_ParseBuffer</a></code> and <code><a href= "#XML_GetBuffer">
XML_GetBuffer</a></code> must not be called from within a handler
unless they operate on a separate parser instance, that is, one that
did not call the handler. For example, it is OK to call the parsing
functions from within an <code>XML_ExternalEntityRefHandler</code>,
if they apply to the parser created by
<code><a href= "#XML_ExternalEntityParserCreate"
>XML_ExternalEntityParserCreate</a></code>.</p>

<p>Note: The <code>len</code> argument passed to these functions
should be considerably less than the maximum value for an integer,
as it could create an integer overflow situation if the added
lengths of a buffer and the unprocessed portion of the previous buffer
exceed the maximum integer value. Input data at the end of a buffer
will remain unprocessed if it is part of an XML token for which the
end is not part of that buffer.</p>

<h4 id="XML_Parse">XML_Parse</h4>
<pre class="fcndec">
enum XML_Status XMLCALL
XML_Parse(XML_Parser p,
          const char *s,
          int len,
          int isFinal);
</pre>
<pre class="signature">
enum XML_Status {
  XML_STATUS_ERROR = 0,
  XML_STATUS_OK = 1
};
</pre>
<div class="fcndef">
Parse some more of the document.
The string <code>s</code> is a buffer
containing part (or perhaps all) of the document. The number of bytes of <code>s</code>
that are part of the document is indicated by <code>len</code>. This means
that <code>s</code> doesn't have to be null terminated. It also means that
if <code>len</code> is larger than the number of bytes in the block of
memory that <code>s</code> points at, then a memory fault is likely. The
<code>isFinal</code> parameter informs the parser that this is the last
piece of the document. Frequently, the last piece is empty (i.e.,
<code>len</code> is zero).
If a parse error occurred, it returns <code>XML_STATUS_ERROR</code>.
Otherwise, it returns <code>XML_STATUS_OK</code>.
</div>

<h4 id="XML_ParseBuffer">XML_ParseBuffer</h4>
<pre class="fcndec">
enum XML_Status XMLCALL
XML_ParseBuffer(XML_Parser p,
                int len,
                int isFinal);
</pre>
<div class="fcndef">
This is just like <code><a href= "#XML_Parse" >XML_Parse</a></code>,
except in this case Expat provides the buffer. By obtaining the
buffer from Expat with the <code><a href= "#XML_GetBuffer"
>XML_GetBuffer</a></code> function, the application can avoid double
copying of the input.
</div>

<h4 id="XML_GetBuffer">XML_GetBuffer</h4>
<pre class="fcndec">
void * XMLCALL
XML_GetBuffer(XML_Parser p,
              int len);
</pre>
<div class="fcndef">
Obtain a buffer of size <code>len</code> to read a piece of the document
into. A NULL value is returned if Expat can't allocate enough memory for
this buffer. A NULL value may also be returned if <code>len</code> is zero.
This has to be called prior to every call to
<code><a href= "#XML_ParseBuffer" >XML_ParseBuffer</a></code>.
A
typical use would look like this:

<pre class="eg">
for (;;) {
  int bytes_read;
  void *buff = XML_GetBuffer(p, BUFF_SIZE);
  if (buff == NULL) {
    /* handle error */
  }

  bytes_read = read(docfd, buff, BUFF_SIZE);
  if (bytes_read < 0) {
    /* handle error */
  }

  if (! XML_ParseBuffer(p, bytes_read, bytes_read == 0)) {
    /* handle parse error */
  }

  if (bytes_read == 0)
    break;
}
</pre>
</div>

<h4 id="XML_StopParser">XML_StopParser</h4>
<pre class="fcndec">
enum XML_Status XMLCALL
XML_StopParser(XML_Parser p,
               XML_Bool resumable);
</pre>
<div class="fcndef">

<p>Stops parsing, causing <code><a href= "#XML_Parse"
>XML_Parse</a></code> or <code><a href= "#XML_ParseBuffer"
>XML_ParseBuffer</a></code> to return. Must be called from within a
call-back handler, except when aborting (when <code>resumable</code>
is <code>XML_FALSE</code>) an already suspended parser. Some
call-backs may still follow because they would otherwise get
lost, including</p>
<ul>
  <li> the end element handler for empty elements when stopped in the
  start element handler,</li>
  <li> the end namespace declaration handler when stopped in the end
  element handler,</li>
  <li> the character data handler when stopped in the character data handler
  while making multiple call-backs on a contiguous chunk of characters,</li>
</ul>
<p>and possibly others.</p>

<p>This can be called from most handlers, including DTD related
call-backs, except when parsing an external parameter entity and
<code>resumable</code> is <code>XML_TRUE</code>. Returns
<code>XML_STATUS_OK</code> when successful,
<code>XML_STATUS_ERROR</code> otherwise.
The possible error codes
are:</p>
<dl>
  <dt><code>XML_ERROR_SUSPENDED</code></dt>
  <dd>when suspending an already suspended parser.</dd>
  <dt><code>XML_ERROR_FINISHED</code></dt>
  <dd>when the parser has already finished.</dd>
  <dt><code>XML_ERROR_SUSPEND_PE</code></dt>
  <dd>when suspending while parsing an external PE.</dd>
</dl>

<p>Since the stop/resume feature requires application support in the
outer parsing loop, it is an error to call this function for a parser
not being handled appropriately; see <a href= "#stop-resume"
>Temporarily Stopping Parsing</a> for more information.</p>

<p>When <code>resumable</code> is <code>XML_TRUE</code> then parsing
is <em>suspended</em>, that is, <code><a href= "#XML_Parse"
>XML_Parse</a></code> and <code><a href= "#XML_ParseBuffer"
>XML_ParseBuffer</a></code> return <code>XML_STATUS_SUSPENDED</code>.
Otherwise, parsing is <em>aborted</em>, that is, <code><a href=
"#XML_Parse" >XML_Parse</a></code> and <code><a href=
"#XML_ParseBuffer" >XML_ParseBuffer</a></code> return
<code>XML_STATUS_ERROR</code> with error code
<code>XML_ERROR_ABORTED</code>.</p>

<p><strong>Note:</strong>
This will be applied to the current parser instance only, that is, if
there is a parent parser then it will continue parsing when the
external entity reference handler returns.
It is up to the
implementation of that handler to call <code><a href=
"#XML_StopParser" >XML_StopParser</a></code> on the parent parser
(recursively), if one wants to stop parsing altogether.</p>

<p>When suspended, parsing can be resumed by calling <code><a href=
"#XML_ResumeParser" >XML_ResumeParser</a></code>.</p>

<p>New in Expat 1.95.8.</p>
</div>

<h4 id="XML_ResumeParser">XML_ResumeParser</h4>
<pre class="fcndec">
enum XML_Status XMLCALL
XML_ResumeParser(XML_Parser p);
</pre>
<div class="fcndef">
<p>Resumes parsing after it has been suspended with <code><a href=
"#XML_StopParser" >XML_StopParser</a></code>. Must not be called from
within a handler call-back. Returns same status codes as <code><a
href= "#XML_Parse">XML_Parse</a></code> or <code><a href=
"#XML_ParseBuffer" >XML_ParseBuffer</a></code>. An additional error
code, <code>XML_ERROR_NOT_SUSPENDED</code>, will be returned if the
parser was not currently suspended.</p>

<p><strong>Note:</strong>
This must be called on the most deeply nested child parser instance
first, and on its parent parser only after the child parser has
finished, to be applied recursively until the document entity's parser
is restarted.
That is, the parent parser will not resume by itself
and it is up to the application to call <code><a href=
"#XML_ResumeParser" >XML_ResumeParser</a></code> on it at the
appropriate moment.</p>

<p>New in Expat 1.95.8.</p>
</div>

<h4 id="XML_GetParsingStatus">XML_GetParsingStatus</h4>
<pre class="fcndec">
void XMLCALL
XML_GetParsingStatus(XML_Parser p,
                     XML_ParsingStatus *status);
</pre>
<pre class="signature">
enum XML_Parsing {
  XML_INITIALIZED,
  XML_PARSING,
  XML_FINISHED,
  XML_SUSPENDED
};

typedef struct {
  enum XML_Parsing parsing;
  XML_Bool finalBuffer;
} XML_ParsingStatus;
</pre>
<div class="fcndef">
<p>Returns status of parser with respect to being initialized,
parsing, finished, or suspended, and whether the final buffer is being
processed. The <code>status</code> parameter <em>must not</em> be
NULL.</p>

<p>New in Expat 1.95.8.</p>
</div>


<h3><a name="setting">Handler Setting</a></h3>

<p>Although handlers are typically set prior to parsing and left alone, an
application may choose to set or change the handler for a parsing event
while the parse is in progress. For instance, your application may choose
to ignore all text not descended from a <code>para</code> element. One
way it could do this is to set the character handler when a para start tag
is seen, and unset it for the corresponding end tag.</p>

<p>A handler may be <em>unset</em> by providing a NULL pointer to the
appropriate handler setter. None of the handler setting functions have
a return value.</p>

<p>Your handlers will be receiving strings in arrays of type
<code>XML_Char</code>. This type is conditionally defined in expat.h as
either <code>char</code>, <code>wchar_t</code> or <code>unsigned short</code>.
The former implies UTF-8 encoding, the latter two imply UTF-16 encoding.
Note that you'll receive them in this form independent of the original
encoding of the document.</p>

<div class="handler">
<h4 id="XML_SetStartElementHandler">XML_SetStartElementHandler</h4>
<pre class="setter">
void XMLCALL
XML_SetStartElementHandler(XML_Parser p,
                           XML_StartElementHandler start);
</pre>
<pre class="signature">
typedef void
(XMLCALL *XML_StartElementHandler)(void *userData,
                                   const XML_Char *name,
                                   const XML_Char **atts);
</pre>
<p>Set handler for start (and empty) tags. Attributes are passed to the start
handler as a pointer to a vector of char pointers. Each attribute seen in
a start (or empty) tag occupies 2 consecutive places in this vector: the
attribute name followed by the attribute value. These pairs are terminated
by a null pointer.</p>
<p>Note that an empty tag generates a call to both start and end handlers
(in that order).</p>
</div>

<div class="handler">
<h4 id="XML_SetEndElementHandler">XML_SetEndElementHandler</h4>
<pre class="setter">
void XMLCALL
XML_SetEndElementHandler(XML_Parser p,
                         XML_EndElementHandler);
</pre>
<pre class="signature">
typedef void
(XMLCALL *XML_EndElementHandler)(void *userData,
                                 const XML_Char *name);
</pre>
<p>Set handler for end (and empty) tags.
As noted above, an empty tag
generates a call to both start and end handlers.</p>
</div>

<div class="handler">
<h4 id="XML_SetElementHandler">XML_SetElementHandler</h4>
<pre class="setter">
void XMLCALL
XML_SetElementHandler(XML_Parser p,
                      XML_StartElementHandler start,
                      XML_EndElementHandler end);
</pre>
<p>Set handlers for start and end tags with one call.</p>
</div>

<div class="handler">
<h4 id="XML_SetCharacterDataHandler">XML_SetCharacterDataHandler</h4>
<pre class="setter">
void XMLCALL
XML_SetCharacterDataHandler(XML_Parser p,
                            XML_CharacterDataHandler charhndl)
</pre>
<pre class="signature">
typedef void
(XMLCALL *XML_CharacterDataHandler)(void *userData,
                                    const XML_Char *s,
                                    int len);
</pre>
<p>Set a text handler. The string your handler receives
is <em>NOT null-terminated</em>. You have to use the length argument
to deal with the end of the string. A single block of contiguous text
free of markup may still result in a sequence of calls to this handler.
In other words, if you're searching for a pattern in the text, it may
be split across calls to this handler.
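</p>

<p>One way to cope with split text is to collect the chunks and search
only once the block is complete. The sketch below is a minimal
illustration under stated assumptions, not code from Expat: the
<code>text_buf</code> structure and the <code>append_text</code> function
are hypothetical names, and the code assumes a build where
<code>XML_Char</code> is <code>char</code>. To register it with Expat, the
function would additionally need the <code>XMLCALL</code> calling
convention.</p>

<pre class="eg">
#include <stdlib.h>
#include <string.h>

/* Hypothetical accumulator, handed to the handler as user data. */
struct text_buf {
  char *data;    /* null-terminated once non-NULL */
  size_t len;    /* bytes stored, not counting the terminator */
  int failed;    /* set to 1 when an allocation fails */
};

/* Character data handler sketch: append each chunk to the buffer.
   The parameters mirror XML_CharacterDataHandler with XML_Char as char.
*/
static void
append_text(void *userData, const char *s, int len)
{
  struct text_buf *buf = userData;
  char *grown;

  if (buf->failed || len <= 0)
    return;
  /* room for the old text, the new chunk and a terminator */
  grown = realloc(buf->data, buf->len + (size_t)len + 1);
  if (grown == NULL) {
    buf->failed = 1;
    return;
  }
  memcpy(grown + buf->len, s, (size_t)len);
  buf->len += (size_t)len;
  grown[buf->len] = '\0';
  buf->data = grown;
}
</pre>

<p>An end element handler could then search <code>data</code> for the
pattern and set <code>len</code> back to zero before the next element.</p>

<p>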
Note: Setting this handler to NULL
may <em>NOT immediately</em> terminate call-backs if the parser is currently
processing such a single block of contiguous markup-free text, as the parser
will continue calling back until the end of the block is reached.</p>
</div>

<div class="handler">
<h4 id="XML_SetProcessingInstructionHandler">XML_SetProcessingInstructionHandler</h4>
<pre class="setter">
void XMLCALL
XML_SetProcessingInstructionHandler(XML_Parser p,
                                    XML_ProcessingInstructionHandler proc)
</pre>
<pre class="signature">
typedef void
(XMLCALL *XML_ProcessingInstructionHandler)(void *userData,
                                            const XML_Char *target,
                                            const XML_Char *data);

</pre>
<p>Set a handler for processing instructions. The target is the first word
in the processing instruction. The data is the rest of the characters in
it after skipping all whitespace after the initial word.</p>
</div>

<div class="handler">
<h4 id="XML_SetCommentHandler">XML_SetCommentHandler</h4>
<pre class="setter">
void XMLCALL
XML_SetCommentHandler(XML_Parser p,
                      XML_CommentHandler cmnt)
</pre>
<pre class="signature">
typedef void
(XMLCALL *XML_CommentHandler)(void *userData,
                              const XML_Char *data);
</pre>
<p>Set a handler for comments.
The data is all text inside the comment
delimiters.</p>
</div>

<div class="handler">
<h4 id="XML_SetStartCdataSectionHandler">XML_SetStartCdataSectionHandler</h4>
<pre class="setter">
void XMLCALL
XML_SetStartCdataSectionHandler(XML_Parser p,
                                XML_StartCdataSectionHandler start);
</pre>
<pre class="signature">
typedef void
(XMLCALL *XML_StartCdataSectionHandler)(void *userData);
</pre>
<p>Set a handler that gets called at the beginning of a CDATA section.</p>
</div>

<div class="handler">
<h4 id="XML_SetEndCdataSectionHandler">XML_SetEndCdataSectionHandler</h4>
<pre class="setter">
void XMLCALL
XML_SetEndCdataSectionHandler(XML_Parser p,
                              XML_EndCdataSectionHandler end);
</pre>
<pre class="signature">
typedef void
(XMLCALL *XML_EndCdataSectionHandler)(void *userData);
</pre>
<p>Set a handler that gets called at the end of a CDATA section.</p>
</div>

<div class="handler">
<h4 id="XML_SetCdataSectionHandler">XML_SetCdataSectionHandler</h4>
<pre class="setter">
void XMLCALL
XML_SetCdataSectionHandler(XML_Parser p,
                           XML_StartCdataSectionHandler start,
                           XML_EndCdataSectionHandler end)
</pre>
<p>Sets both CDATA section handlers with one call.</p>
</div>

<div class="handler">
<h4 id="XML_SetDefaultHandler">XML_SetDefaultHandler</h4>
<pre class="setter">
void XMLCALL
XML_SetDefaultHandler(XML_Parser p,
                      XML_DefaultHandler hndl)
</pre>
<pre class="signature">
typedef void
(XMLCALL *XML_DefaultHandler)(void *userData,
                              const XML_Char *s,
                              int len);
</pre>

<p>Sets a handler for any characters in the document which wouldn't
otherwise be handled. This includes both data for which no handlers
can be set (like some kinds of DTD declarations) and data which could
be reported but which currently has no handler set.
The characters
are passed exactly as they were present in the XML document except
that they will be encoded in UTF-8 or UTF-16. Line boundaries are not
normalized. Note that a byte order mark character is not passed to the
default handler. There are no guarantees about how characters are
divided between calls to the default handler: for example, a comment
might be split between multiple calls. Setting the handler with
this call has the side effect of turning off expansion of references
to internally defined general entities. Instead these references are
passed to the default handler.</p>

<p>See also <code><a
href="#XML_DefaultCurrent">XML_DefaultCurrent</a></code>.</p>
</div>

<div class="handler">
<h4 id="XML_SetDefaultHandlerExpand">XML_SetDefaultHandlerExpand</h4>
<pre class="setter">
void XMLCALL
XML_SetDefaultHandlerExpand(XML_Parser p,
                            XML_DefaultHandler hndl)
</pre>
<pre class="signature">
typedef void
(XMLCALL *XML_DefaultHandler)(void *userData,
                              const XML_Char *s,
                              int len);
</pre>
<p>This sets a default handler, but doesn't inhibit the expansion of
internal entity references. The entity reference will not be passed
to the default handler.</p>

<p>See also <code><a
href="#XML_DefaultCurrent">XML_DefaultCurrent</a></code>.</p>
</div>

<div class="handler">
<h4 id="XML_SetExternalEntityRefHandler">XML_SetExternalEntityRefHandler</h4>
<pre class="setter">
void XMLCALL
XML_SetExternalEntityRefHandler(XML_Parser p,
                                XML_ExternalEntityRefHandler hndl)
</pre>
<pre class="signature">
typedef int
(XMLCALL *XML_ExternalEntityRefHandler)(XML_Parser p,
                                        const XML_Char *context,
                                        const XML_Char *base,
                                        const XML_Char *systemId,
                                        const XML_Char *publicId);
</pre>
<p>Set an external entity reference handler.
This handler is also
called for processing an external DTD subset if parameter entity parsing
is in effect. (See <a href="#XML_SetParamEntityParsing">
<code>XML_SetParamEntityParsing</code></a>.)</p>

<p>The <code>context</code> parameter specifies the parsing context in
the format expected by the <code>context</code> argument to <code><a
href="#XML_ExternalEntityParserCreate"
>XML_ExternalEntityParserCreate</a></code>. <code>context</code> is
valid only until the handler returns, so if the referenced entity is
to be parsed later, it must be copied. <code>context</code> is NULL
only when the entity is a parameter entity, which is how one can
differentiate between general and parameter entities.</p>

<p>The <code>base</code> parameter is the base to use for relative
system identifiers. It is set by <code><a
href="#XML_SetBase">XML_SetBase</a></code> and may be NULL. The
<code>publicId</code> parameter is the public id given in the entity
declaration and may be NULL. <code>systemId</code> is the system
identifier specified in the entity declaration and is never NULL.</p>

<p>There are a couple of ways in which this handler differs from
others. First, this handler returns a status indicator (an
integer). <code>XML_STATUS_OK</code> should be returned for successful
handling of the external entity reference. Returning
<code>XML_STATUS_ERROR</code> indicates failure, and causes the
calling parser to return an
<code>XML_ERROR_EXTERNAL_ENTITY_HANDLING</code> error.</p>

<p>Second, instead of having the user data as its first argument, it
receives the parser that encountered the entity reference. This, along
with the context parameter, may be used as arguments to a call to
<code><a href= "#XML_ExternalEntityParserCreate"
>XML_ExternalEntityParserCreate</a></code>.
Using the returned 1549parser, the body of the external entity can be recursively parsed.</p> 1550 1551<p>Since this handler may be called recursively, it should not be saving 1552information into global or static variables.</p> 1553</div> 1554 1555<h4 id="XML_SetExternalEntityRefHandlerArg">XML_SetExternalEntityRefHandlerArg</h4> 1556<pre class="fcndec"> 1557void XMLCALL 1558XML_SetExternalEntityRefHandlerArg(XML_Parser p, 1559 void *arg) 1560</pre> 1561<div class="fcndef"> 1562<p>Set the argument passed to the ExternalEntityRefHandler. If 1563<code>arg</code> is not NULL, it is the new value passed to the 1564handler set using <code><a href="#XML_SetExternalEntityRefHandler" 1565>XML_SetExternalEntityRefHandler</a></code>; if <code>arg</code> is 1566NULL, the argument passed to the handler function will be the parser 1567object itself.</p> 1568 1569<p><strong>Note:</strong> 1570The type of <code>arg</code> and the type of the first argument to the 1571ExternalEntityRefHandler do not match. This function takes a 1572<code>void *</code> to be passed to the handler, while the handler 1573accepts an <code>XML_Parser</code>. This is a historical accident, 1574but will not be corrected before Expat 2.0 (at the earliest) to avoid 1575causing compiler warnings for code that's known to work with this 1576API. It is the responsibility of the application code to know the 1577actual type of the argument passed to the handler and to manage it 1578properly.</p> 1579</div> 1580 1581<div class="handler"> 1582<h4 id="XML_SetSkippedEntityHandler">XML_SetSkippedEntityHandler</h4> 1583<pre class="setter"> 1584void XMLCALL 1585XML_SetSkippedEntityHandler(XML_Parser p, 1586 XML_SkippedEntityHandler handler) 1587</pre> 1588<pre class="signature"> 1589typedef void 1590(XMLCALL *XML_SkippedEntityHandler)(void *userData, 1591 const XML_Char *entityName, 1592 int is_parameter_entity); 1593</pre> 1594<p>Set a skipped entity handler. 
This is called in two situations:</p>
<ol>
  <li>An entity reference is encountered for which no declaration
      has been read <em>and</em> this is not an error.</li>
  <li>An internal entity reference is read, but not expanded, because
      <a href="#XML_SetDefaultHandler"><code>XML_SetDefaultHandler</code></a>
      has been called.</li>
</ol>
<p>The <code>is_parameter_entity</code> argument will be non-zero for
a parameter entity and zero for a general entity.</p> <p>Note: Skipped
parameter entities in declarations and skipped general entities in
attribute values cannot be reported, because the event would be out of
sync with the reporting of the declarations or attribute values.</p>
</div>

<div class="handler">
<h4 id="XML_SetUnknownEncodingHandler">XML_SetUnknownEncodingHandler</h4>
<pre class="setter">
void XMLCALL
XML_SetUnknownEncodingHandler(XML_Parser p,
                              XML_UnknownEncodingHandler enchandler,
                              void *encodingHandlerData)
</pre>
<pre class="signature">
typedef int
(XMLCALL *XML_UnknownEncodingHandler)(void *encodingHandlerData,
                                      const XML_Char *name,
                                      XML_Encoding *info);

typedef struct {
  int map[256];
  void *data;
  int (XMLCALL *convert)(void *data, const char *s);
  void (XMLCALL *release)(void *data);
} XML_Encoding;
</pre>
<p>Set a handler to deal with encodings other than the <a
href="#builtin_encodings">built-in set</a>. This should be done before
<code><a href= "#XML_Parse" >XML_Parse</a></code> or <code><a href=
"#XML_ParseBuffer" >XML_ParseBuffer</a></code> has been called on the
given parser.</p> <p>If the handler knows how to deal with an encoding
with the given name, it should fill in the <code>info</code> data
structure and return <code>XML_STATUS_OK</code>. Otherwise it
should return <code>XML_STATUS_ERROR</code>.
The handler will be called 1638at most once per parsed (external) entity. The optional application 1639data pointer <code>encodingHandlerData</code> will be passed back to 1640the handler.</p> 1641 1642<p>The map array contains information for every possible leading 1643byte in a byte sequence. If the corresponding value is >= 0, then it's 1644a single byte sequence and the byte encodes that Unicode value. If the 1645value is -1, then that byte is invalid as the initial byte in a sequence. 1646If the value is -n, where n is an integer > 1, then n is the number of 1647bytes in the sequence and the actual conversion is accomplished by a 1648call to the function pointed at by convert. This function may return -1 1649if the sequence itself is invalid. The convert pointer may be null if 1650there are only single byte codes. The data parameter passed to the convert 1651function is the data pointer from <code>XML_Encoding</code>. The 1652string s is <em>NOT</em> null-terminated and points at the sequence of 1653bytes to be converted.</p> 1654 1655<p>The function pointed at by <code>release</code> is called by the 1656parser when it is finished with the encoding. It may be NULL.</p> 1657</div> 1658 1659<div class="handler"> 1660<h4 id="XML_SetStartNamespaceDeclHandler">XML_SetStartNamespaceDeclHandler</h4> 1661<pre class="setter"> 1662void XMLCALL 1663XML_SetStartNamespaceDeclHandler(XML_Parser p, 1664 XML_StartNamespaceDeclHandler start); 1665</pre> 1666<pre class="signature"> 1667typedef void 1668(XMLCALL *XML_StartNamespaceDeclHandler)(void *userData, 1669 const XML_Char *prefix, 1670 const XML_Char *uri); 1671</pre> 1672<p>Set a handler to be called when a namespace is declared. Namespace 1673declarations occur inside start tags. 
But the namespace declaration start 1674handler is called before the start tag handler for each namespace declared 1675in that start tag.</p> 1676</div> 1677 1678<div class="handler"> 1679<h4 id="XML_SetEndNamespaceDeclHandler">XML_SetEndNamespaceDeclHandler</h4> 1680<pre class="setter"> 1681void XMLCALL 1682XML_SetEndNamespaceDeclHandler(XML_Parser p, 1683 XML_EndNamespaceDeclHandler end); 1684</pre> 1685<pre class="signature"> 1686typedef void 1687(XMLCALL *XML_EndNamespaceDeclHandler)(void *userData, 1688 const XML_Char *prefix); 1689</pre> 1690<p>Set a handler to be called when leaving the scope of a namespace 1691declaration. This will be called, for each namespace declaration, 1692after the handler for the end tag of the element in which the 1693namespace was declared.</p> 1694</div> 1695 1696<div class="handler"> 1697<h4 id="XML_SetNamespaceDeclHandler">XML_SetNamespaceDeclHandler</h4> 1698<pre class="setter"> 1699void XMLCALL 1700XML_SetNamespaceDeclHandler(XML_Parser p, 1701 XML_StartNamespaceDeclHandler start, 1702 XML_EndNamespaceDeclHandler end) 1703</pre> 1704<p>Sets both namespace declaration handlers with a single call.</p> 1705</div> 1706 1707<div class="handler"> 1708<h4 id="XML_SetXmlDeclHandler">XML_SetXmlDeclHandler</h4> 1709<pre class="setter"> 1710void XMLCALL 1711XML_SetXmlDeclHandler(XML_Parser p, 1712 XML_XmlDeclHandler xmldecl); 1713</pre> 1714<pre class="signature"> 1715typedef void 1716(XMLCALL *XML_XmlDeclHandler)(void *userData, 1717 const XML_Char *version, 1718 const XML_Char *encoding, 1719 int standalone); 1720</pre> 1721<p>Sets a handler that is called for XML declarations and also for 1722text declarations discovered in external entities. The way to 1723distinguish is that the <code>version</code> parameter will be NULL 1724for text declarations. The <code>encoding</code> parameter may be NULL 1725for an XML declaration. 
The <code>standalone</code> argument will 1726contain -1, 0, or 1 indicating respectively that there was no 1727standalone parameter in the declaration, that it was given as no, or 1728that it was given as yes.</p> 1729</div> 1730 1731<div class="handler"> 1732<h4 id="XML_SetStartDoctypeDeclHandler">XML_SetStartDoctypeDeclHandler</h4> 1733<pre class="setter"> 1734void XMLCALL 1735XML_SetStartDoctypeDeclHandler(XML_Parser p, 1736 XML_StartDoctypeDeclHandler start); 1737</pre> 1738<pre class="signature"> 1739typedef void 1740(XMLCALL *XML_StartDoctypeDeclHandler)(void *userData, 1741 const XML_Char *doctypeName, 1742 const XML_Char *sysid, 1743 const XML_Char *pubid, 1744 int has_internal_subset); 1745</pre> 1746<p>Set a handler that is called at the start of a DOCTYPE declaration, 1747before any external or internal subset is parsed. Both <code>sysid</code> 1748and <code>pubid</code> may be NULL. The <code>has_internal_subset</code> 1749will be non-zero if the DOCTYPE declaration has an internal subset.</p> 1750</div> 1751 1752<div class="handler"> 1753<h4 id="XML_SetEndDoctypeDeclHandler">XML_SetEndDoctypeDeclHandler</h4> 1754<pre class="setter"> 1755void XMLCALL 1756XML_SetEndDoctypeDeclHandler(XML_Parser p, 1757 XML_EndDoctypeDeclHandler end); 1758</pre> 1759<pre class="signature"> 1760typedef void 1761(XMLCALL *XML_EndDoctypeDeclHandler)(void *userData); 1762</pre> 1763<p>Set a handler that is called at the end of a DOCTYPE declaration, 1764after parsing any external subset.</p> 1765</div> 1766 1767<div class="handler"> 1768<h4 id="XML_SetDoctypeDeclHandler">XML_SetDoctypeDeclHandler</h4> 1769<pre class="setter"> 1770void XMLCALL 1771XML_SetDoctypeDeclHandler(XML_Parser p, 1772 XML_StartDoctypeDeclHandler start, 1773 XML_EndDoctypeDeclHandler end); 1774</pre> 1775<p>Set both doctype handlers with one call.</p> 1776</div> 1777 1778<div class="handler"> 1779<h4 id="XML_SetElementDeclHandler">XML_SetElementDeclHandler</h4> 1780<pre class="setter"> 1781void XMLCALL 
1782XML_SetElementDeclHandler(XML_Parser p, 1783 XML_ElementDeclHandler eldecl); 1784</pre> 1785<pre class="signature"> 1786typedef void 1787(XMLCALL *XML_ElementDeclHandler)(void *userData, 1788 const XML_Char *name, 1789 XML_Content *model); 1790</pre> 1791<pre class="signature"> 1792enum XML_Content_Type { 1793 XML_CTYPE_EMPTY = 1, 1794 XML_CTYPE_ANY, 1795 XML_CTYPE_MIXED, 1796 XML_CTYPE_NAME, 1797 XML_CTYPE_CHOICE, 1798 XML_CTYPE_SEQ 1799}; 1800 1801enum XML_Content_Quant { 1802 XML_CQUANT_NONE, 1803 XML_CQUANT_OPT, 1804 XML_CQUANT_REP, 1805 XML_CQUANT_PLUS 1806}; 1807 1808typedef struct XML_cp XML_Content; 1809 1810struct XML_cp { 1811 enum XML_Content_Type type; 1812 enum XML_Content_Quant quant; 1813 const XML_Char * name; 1814 unsigned int numchildren; 1815 XML_Content * children; 1816}; 1817</pre> 1818<p>Sets a handler for element declarations in a DTD. The handler gets 1819called with the name of the element in the declaration and a pointer 1820to a structure that contains the element model. It's the user code's 1821responsibility to free model when finished with it. See <code> 1822<a href="#XML_FreeContentModel">XML_FreeContentModel</a></code>. 1823There is no need to free the model from the handler, it can be kept 1824around and freed at a later stage.</p> 1825 1826<p>The <code>model</code> argument is the root of a tree of 1827<code>XML_Content</code> nodes. If <code>type</code> equals 1828<code>XML_CTYPE_EMPTY</code> or <code>XML_CTYPE_ANY</code>, then 1829<code>quant</code> will be <code>XML_CQUANT_NONE</code>, and the other 1830fields will be zero or NULL. 
If <code>type</code> is 1831<code>XML_CTYPE_MIXED</code>, then <code>quant</code> will be 1832<code>XML_CQUANT_NONE</code> or <code>XML_CQUANT_REP</code> and 1833<code>numchildren</code> will contain the number of elements that are 1834allowed to be mixed in and <code>children</code> points to an array of 1835<code>XML_Content</code> structures that will all have type 1836XML_CTYPE_NAME with no quantification. Only the root node can be type 1837<code>XML_CTYPE_EMPTY</code>, <code>XML_CTYPE_ANY</code>, or 1838<code>XML_CTYPE_MIXED</code>.</p> 1839 1840<p>For type <code>XML_CTYPE_NAME</code>, the <code>name</code> field 1841points to the name and the <code>numchildren</code> and 1842<code>children</code> fields will be zero and NULL. The 1843<code>quant</code> field will indicate any quantifiers placed on the 1844name.</p> 1845 1846<p>Types <code>XML_CTYPE_CHOICE</code> and <code>XML_CTYPE_SEQ</code> 1847indicate a choice or sequence respectively. The 1848<code>numchildren</code> field indicates how many nodes in the choice 1849or sequence and <code>children</code> points to the nodes.</p> 1850</div> 1851 1852<div class="handler"> 1853<h4 id="XML_SetAttlistDeclHandler">XML_SetAttlistDeclHandler</h4> 1854<pre class="setter"> 1855void XMLCALL 1856XML_SetAttlistDeclHandler(XML_Parser p, 1857 XML_AttlistDeclHandler attdecl); 1858</pre> 1859<pre class="signature"> 1860typedef void 1861(XMLCALL *XML_AttlistDeclHandler)(void *userData, 1862 const XML_Char *elname, 1863 const XML_Char *attname, 1864 const XML_Char *att_type, 1865 const XML_Char *dflt, 1866 int isrequired); 1867</pre> 1868<p>Set a handler for attlist declarations in the DTD. This handler is 1869called for <em>each</em> attribute. So a single attlist declaration 1870with multiple attributes declared will generate multiple calls to this 1871handler. The <code>elname</code> parameter returns the name of the 1872element for which the attribute is being declared. 
The attribute name
is in the <code>attname</code> parameter. The attribute type is in the
<code>att_type</code> parameter.  It is the string representing the
type in the declaration with whitespace removed.</p>

<p>The <code>dflt</code> parameter holds the default value. It will be
NULL in the case of "#IMPLIED" or "#REQUIRED" attributes. You can
distinguish these two cases by checking the <code>isrequired</code>
parameter, which will be true in the case of "#REQUIRED" attributes.
Attributes which are "#FIXED" will also have a true
<code>isrequired</code>, but they will have the non-NULL fixed value
in the <code>dflt</code> parameter.</p>
</div>

<div class="handler">
<h4 id="XML_SetEntityDeclHandler">XML_SetEntityDeclHandler</h4>
<pre class="setter">
void XMLCALL
XML_SetEntityDeclHandler(XML_Parser p,
                         XML_EntityDeclHandler handler);
</pre>
<pre class="signature">
typedef void
(XMLCALL *XML_EntityDeclHandler)(void *userData,
                                 const XML_Char *entityName,
                                 int is_parameter_entity,
                                 const XML_Char *value,
                                 int value_length,
                                 const XML_Char *base,
                                 const XML_Char *systemId,
                                 const XML_Char *publicId,
                                 const XML_Char *notationName);
</pre>
<p>Sets a handler that will be called for all entity declarations.
The <code>is_parameter_entity</code> argument will be non-zero in the
case of parameter entities and zero otherwise.</p>

<p>For internal entities (<code>&lt;!ENTITY foo "bar"&gt;</code>),
<code>value</code> will be non-NULL and <code>systemId</code>,
<code>publicId</code>, and <code>notationName</code> will all be NULL.
The value string is <em>not</em> null-terminated; the length is
provided in the <code>value_length</code> parameter. Do not use
<code>value_length</code> to test for internal entities, since it is
legal to have zero-length values.
Instead check for whether or not 1916<code>value</code> is NULL.</p> <p>The <code>notationName</code> 1917argument will have a non-NULL value only for unparsed entity 1918declarations.</p> 1919</div> 1920 1921<div class="handler"> 1922<h4 id="XML_SetUnparsedEntityDeclHandler">XML_SetUnparsedEntityDeclHandler</h4> 1923<pre class="setter"> 1924void XMLCALL 1925XML_SetUnparsedEntityDeclHandler(XML_Parser p, 1926 XML_UnparsedEntityDeclHandler h) 1927</pre> 1928<pre class="signature"> 1929typedef void 1930(XMLCALL *XML_UnparsedEntityDeclHandler)(void *userData, 1931 const XML_Char *entityName, 1932 const XML_Char *base, 1933 const XML_Char *systemId, 1934 const XML_Char *publicId, 1935 const XML_Char *notationName); 1936</pre> 1937<p>Set a handler that receives declarations of unparsed entities. These 1938are entity declarations that have a notation (NDATA) field:</p> 1939 1940<div id="eg"><pre> 1941<!ENTITY logo SYSTEM "images/logo.gif" NDATA gif> 1942</pre></div> 1943<p>This handler is obsolete and is provided for backwards 1944compatibility. 
Use instead <a href= "#XML_SetEntityDeclHandler" 1945>XML_SetEntityDeclHandler</a>.</p> 1946</div> 1947 1948<div class="handler"> 1949<h4 id="XML_SetNotationDeclHandler">XML_SetNotationDeclHandler</h4> 1950<pre class="setter"> 1951void XMLCALL 1952XML_SetNotationDeclHandler(XML_Parser p, 1953 XML_NotationDeclHandler h) 1954</pre> 1955<pre class="signature"> 1956typedef void 1957(XMLCALL *XML_NotationDeclHandler)(void *userData, 1958 const XML_Char *notationName, 1959 const XML_Char *base, 1960 const XML_Char *systemId, 1961 const XML_Char *publicId); 1962</pre> 1963<p>Set a handler that receives notation declarations.</p> 1964</div> 1965 1966<div class="handler"> 1967<h4 id="XML_SetNotStandaloneHandler">XML_SetNotStandaloneHandler</h4> 1968<pre class="setter"> 1969void XMLCALL 1970XML_SetNotStandaloneHandler(XML_Parser p, 1971 XML_NotStandaloneHandler h) 1972</pre> 1973<pre class="signature"> 1974typedef int 1975(XMLCALL *XML_NotStandaloneHandler)(void *userData); 1976</pre> 1977<p>Set a handler that is called if the document is not "standalone". 1978This happens when there is an external subset or a reference to a 1979parameter entity, but does not have standalone set to "yes" in an XML 1980declaration. If this handler returns <code>XML_STATUS_ERROR</code>, 1981then the parser will throw an <code>XML_ERROR_NOT_STANDALONE</code> 1982error.</p> 1983</div> 1984 1985<h3><a name="position">Parse position and error reporting functions</a></h3> 1986 1987<p>These are the functions you'll want to call when the parse 1988functions return <code>XML_STATUS_ERROR</code> (a parse error has 1989occurred), although the position reporting functions are useful outside 1990of errors. The position reported is the byte position (in the original 1991document or entity encoding) of the first of the sequence of 1992characters that generated the current event (or the error that caused 1993the parse functions to return <code>XML_STATUS_ERROR</code>.) 
The
exceptions are callbacks triggered by declarations in the document
prologue, in which case the exact position reported is somewhere in the
relevant markup, but not necessarily as meaningful as for other
events.</p>

<p>The position reporting functions are accurate only outside of the
DTD.  In other words, they usually return bogus information when
called from within a DTD declaration handler.</p>

<h4 id="XML_GetErrorCode">XML_GetErrorCode</h4>
<pre class="fcndec">
enum XML_Error XMLCALL
XML_GetErrorCode(XML_Parser p);
</pre>
<div class="fcndef">
Return what type of error has occurred.
</div>

<h4 id="XML_ErrorString">XML_ErrorString</h4>
<pre class="fcndec">
const XML_LChar * XMLCALL
XML_ErrorString(enum XML_Error code);
</pre>
<div class="fcndef">
Return a string describing the error corresponding to code.
The code should be one of the enums that can be returned from
<code><a href= "#XML_GetErrorCode" >XML_GetErrorCode</a></code>.
</div>

<h4 id="XML_GetCurrentByteIndex">XML_GetCurrentByteIndex</h4>
<pre class="fcndec">
XML_Index XMLCALL
XML_GetCurrentByteIndex(XML_Parser p);
</pre>
<div class="fcndef">
Return the byte offset of the position.  This always corresponds to
the values returned by <code><a href= "#XML_GetCurrentLineNumber"
>XML_GetCurrentLineNumber</a></code> and <code><a href=
"#XML_GetCurrentColumnNumber" >XML_GetCurrentColumnNumber</a></code>.
</div>

<h4 id="XML_GetCurrentLineNumber">XML_GetCurrentLineNumber</h4>
<pre class="fcndec">
XML_Size XMLCALL
XML_GetCurrentLineNumber(XML_Parser p);
</pre>
<div class="fcndef">
Return the line number of the position.  The first line is reported as
<code>1</code>.
</div>

<h4 id="XML_GetCurrentColumnNumber">XML_GetCurrentColumnNumber</h4>
<pre class="fcndec">
XML_Size XMLCALL
XML_GetCurrentColumnNumber(XML_Parser p);
</pre>
<div class="fcndef">
Return the offset, from the beginning of the current line, of
the position.
</div>

<h4 id="XML_GetCurrentByteCount">XML_GetCurrentByteCount</h4>
<pre class="fcndec">
int XMLCALL
XML_GetCurrentByteCount(XML_Parser p);
</pre>
<div class="fcndef">
Return the number of bytes in the current event. Returns
<code>0</code> if the event is inside a reference to an internal
entity and for the end-tag event for empty element tags (the latter can
be used to distinguish empty-element tags from empty elements using
separate start and end tags).
</div>

<h4 id="XML_GetInputContext">XML_GetInputContext</h4>
<pre class="fcndec">
const char * XMLCALL
XML_GetInputContext(XML_Parser p,
                    int *offset,
                    int *size);
</pre>
<div class="fcndef">

<p>Returns the parser's input buffer, sets the integer pointed at by
<code>offset</code> to the offset within this buffer of the current
parse position, and sets the integer pointed at by <code>size</code> to
the size of the returned buffer.</p>

<p>This should only be called from within a handler during an active
parse and the returned buffer should only be referred to from within
the handler that made the call.
This input buffer contains the 2085untranslated bytes of the input.</p> 2086 2087<p>Only a limited amount of context is kept, so if the event 2088triggering a call spans over a very large amount of input, the actual 2089parse position may be before the beginning of the buffer.</p> 2090 2091<p>If <code>XML_CONTEXT_BYTES</code> is not defined, this will always 2092return NULL.</p> 2093</div> 2094 2095<h3><a name="billion-laughs">Billion Laughs Attack Protection</a></h3> 2096 2097<p>The functions in this section configure the built-in 2098 protection against various forms of 2099 <a href="https://en.wikipedia.org/wiki/Billion_laughs_attack">billion laughs attacks</a>.</p> 2100 2101<h4 id="XML_SetBillionLaughsAttackProtectionMaximumAmplification">XML_SetBillionLaughsAttackProtectionMaximumAmplification</h4> 2102<pre class="fcndec"> 2103/* Added in Expat 2.4.0. */ 2104XML_Bool XMLCALL 2105XML_SetBillionLaughsAttackProtectionMaximumAmplification(XML_Parser p, 2106 float maximumAmplificationFactor); 2107</pre> 2108<div class="fcndef"> 2109 <p> 2110 Sets the maximum tolerated amplification factor 2111 for protection against 2112 <a href="https://en.wikipedia.org/wiki/Billion_laughs_attack">billion laughs attacks</a> 2113 (default: <code>100.0</code>) 2114 of parser <code>p</code> to <code>maximumAmplificationFactor</code>, and 2115 returns <code>XML_TRUE</code> upon success and <code>XML_FALSE</code> upon error. 2116 </p> 2117 2118 The amplification factor is calculated as .. 2119 <pre> 2120 amplification := (direct + indirect) / direct 2121 </pre> 2122 .. while parsing, whereas 2123 <code>direct</code> is the number of bytes read from the primary document in parsing and 2124 <code>indirect</code> is the number of bytes added by expanding entities and reading of external DTD files, combined. 
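<p>To make the formula above concrete, here is a minimal sketch in plain C.
The helper names <code>amplification</code> and <code>looks_like_attack</code>
are hypothetical and not part of Expat's API, and the way the activation
threshold and the maximum factor are combined (including the exact
comparison operators) is an assumption for illustration, not a description
of Expat's internal implementation.</p>

```c
#include <stdbool.h>

/* Hypothetical helper, not part of Expat: the amplification factor
   (direct + indirect) / direct.  Returns 1.0f when no bytes have been
   read from the primary document yet, to avoid dividing by zero. */
static float
amplification(unsigned long long direct, unsigned long long indirect) {
  if (direct == 0)
    return 1.0f;
  return (float)(direct + indirect) / (float)direct;
}

/* Hypothetical check (an assumption, not Expat's actual code): the input
   is only treated as suspicious once the total number of output bytes has
   reached the activation threshold AND the amplification factor exceeds
   the configured maximum. */
static bool
looks_like_attack(unsigned long long direct, unsigned long long indirect,
                  unsigned long long activationThresholdBytes,
                  float maximumAmplificationFactor) {
  if (direct + indirect < activationThresholdBytes)
    return false;
  return amplification(direct, indirect) > maximumAmplificationFactor;
}
```

<p>For example, 100 bytes of primary document that expand into a further
900 bytes give a factor of 10. With the documented defaults (factor
<code>100.0</code>, threshold 8 MiB), such a small payload would stay far
below the activation threshold in this sketch.</p>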
2125 2126 <p>For a call to <code>XML_SetBillionLaughsAttackProtectionMaximumAmplification</code> to succeed:</p> 2127 <ul> 2128 <li>parser <code>p</code> must be a non-<code>NULL</code> root parser (without any parent parsers) and</li> 2129 <li><code>maximumAmplificationFactor</code> must be non-<code>NaN</code> and greater than or equal to <code>1.0</code>.</li> 2130 </ul> 2131 2132 <p> 2133 <strong>Note:</strong> 2134 If you ever need to increase this value for non-attack payload, 2135 please <a href="https://github.com/libexpat/libexpat/issues">file a bug report</a>. 2136 </p> 2137 2138 <p> 2139 <strong>Note:</strong> 2140 Peak amplifications 2141 of factor 15,000 for the entire payload and 2142 of factor 30,000 in the middle of parsing 2143 have been observed with small benign files in practice. 2144 2145 So if you do reduce the maximum allowed amplification, 2146 please make sure that the activation threshold is still big enough 2147 to not end up with undesired false positives (i.e. benign files being rejected). 2148 </p> 2149</div> 2150 2151<h4 id="XML_SetBillionLaughsAttackProtectionActivationThreshold">XML_SetBillionLaughsAttackProtectionActivationThreshold</h4> 2152<pre class="fcndec"> 2153/* Added in Expat 2.4.0. */ 2154XML_Bool XMLCALL 2155XML_SetBillionLaughsAttackProtectionActivationThreshold(XML_Parser p, 2156 unsigned long long activationThresholdBytes); 2157</pre> 2158<div class="fcndef"> 2159 <p> 2160 Sets number of output bytes (including amplification from entity expansion and reading DTD files) 2161 needed to activate protection against 2162 <a href="https://en.wikipedia.org/wiki/Billion_laughs_attack">billion laughs attacks</a> 2163 (default: <code>8 MiB</code>) 2164 of parser <code>p</code> to <code>activationThresholdBytes</code>, and 2165 returns <code>XML_TRUE</code> upon success and <code>XML_FALSE</code> upon error. 
2166 </p> 2167 2168 <p>For a call to <code>XML_SetBillionLaughsAttackProtectionActivationThreshold</code> to succeed:</p> 2169 <ul> 2170 <li>parser <code>p</code> must be a non-<code>NULL</code> root parser (without any parent parsers).</li> 2171 </ul> 2172 2173 <p> 2174 <strong>Note:</strong> 2175 If you ever need to increase this value for non-attack payload, 2176 please <a href="https://github.com/libexpat/libexpat/issues">file a bug report</a>. 2177 </p> 2178 2179 <p> 2180 <strong>Note:</strong> 2181 Activation thresholds below 4 MiB are known to break support for 2182 <a href="https://en.wikipedia.org/wiki/Darwin_Information_Typing_Architecture">DITA</a> 1.3 payload 2183 and are hence not recommended. 2184 </p> 2185</div> 2186 2187<h3><a name="miscellaneous">Miscellaneous functions</a></h3> 2188 2189<p>The functions in this section either obtain state information from 2190the parser or can be used to dynamically set parser options.</p> 2191 2192<h4 id="XML_SetUserData">XML_SetUserData</h4> 2193<pre class="fcndec"> 2194void XMLCALL 2195XML_SetUserData(XML_Parser p, 2196 void *userData); 2197</pre> 2198<div class="fcndef"> 2199This sets the user data pointer that gets passed to handlers. It 2200overwrites any previous value for this pointer. Note that the 2201application is responsible for freeing the memory associated with 2202<code>userData</code> when it is finished with the parser. So if you 2203call this when there's already a pointer there, and you haven't freed 2204the memory associated with it, then you've probably just leaked 2205memory. 2206</div> 2207 2208<h4 id="XML_GetUserData">XML_GetUserData</h4> 2209<pre class="fcndec"> 2210void * XMLCALL 2211XML_GetUserData(XML_Parser p); 2212</pre> 2213<div class="fcndef"> 2214This returns the user data pointer that gets passed to handlers. 2215It is actually implemented as a macro. 
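<p>As an illustration of how the user data pointer is typically used, the
following sketch defines an application struct and two functions with the
shape of start and end element handlers. It is deliberately kept free of
Expat headers: <code>struct depth_info</code> and both function names are
hypothetical, <code>XML_Char</code> is assumed to be plain
<code>char</code>, and the <code>XMLCALL</code> calling convention macro is
omitted. In a real program the struct would be registered with
<code>XML_SetUserData</code> and the parser would call the handlers.</p>

```c
/* Hypothetical application state that would be handed to the parser
   as user data. */
struct depth_info {
  int depth;     /* current element nesting depth */
  int max_depth; /* deepest nesting seen so far */
};

/* Has the shape of a start element handler (XML_Char assumed to be
   char): the first argument is the user data pointer. */
static void
start_element(void *userData, const char *name, const char **atts) {
  struct depth_info *info = (struct depth_info *)userData;
  (void)name; /* unused in this sketch */
  (void)atts; /* unused in this sketch */
  info->depth++;
  if (info->depth > info->max_depth)
    info->max_depth = info->depth;
}

/* Has the shape of an end element handler. */
static void
end_element(void *userData, const char *name) {
  struct depth_info *info = (struct depth_info *)userData;
  (void)name; /* unused in this sketch */
  info->depth--;
}
```

<p>Because the state travels through the user data pointer rather than
through global variables, several parsers can run side by side, each with
its own struct. The application remains responsible for the lifetime of
the struct.</p>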
2216</div> 2217 2218<h4 id="XML_UseParserAsHandlerArg">XML_UseParserAsHandlerArg</h4> 2219<pre class="fcndec"> 2220void XMLCALL 2221XML_UseParserAsHandlerArg(XML_Parser p); 2222</pre> 2223<div class="fcndef"> 2224After this is called, handlers receive the parser in their 2225<code>userData</code> arguments. The user data can still be obtained 2226using the <code><a href= "#XML_GetUserData" 2227>XML_GetUserData</a></code> function. 2228</div> 2229 2230<h4 id="XML_SetBase">XML_SetBase</h4> 2231<pre class="fcndec"> 2232enum XML_Status XMLCALL 2233XML_SetBase(XML_Parser p, 2234 const XML_Char *base); 2235</pre> 2236<div class="fcndef"> 2237Set the base to be used for resolving relative URIs in system 2238identifiers. The return value is <code>XML_STATUS_ERROR</code> if 2239there's no memory to store base, otherwise it's 2240<code>XML_STATUS_OK</code>. 2241</div> 2242 2243<h4 id="XML_GetBase">XML_GetBase</h4> 2244<pre class="fcndec"> 2245const XML_Char * XMLCALL 2246XML_GetBase(XML_Parser p); 2247</pre> 2248<div class="fcndef"> 2249Return the base for resolving relative URIs. 2250</div> 2251 2252<h4 id="XML_GetSpecifiedAttributeCount">XML_GetSpecifiedAttributeCount</h4> 2253<pre class="fcndec"> 2254int XMLCALL 2255XML_GetSpecifiedAttributeCount(XML_Parser p); 2256</pre> 2257<div class="fcndef"> 2258When attributes are reported to the start handler in the atts vector, 2259attributes that were explicitly set in the element occur before any 2260attributes that receive their value from default information in an 2261ATTLIST declaration. This function returns the number of attributes 2262that were explicitly set times two, thus giving the offset in the 2263<code>atts</code> array passed to the start tag handler of the first 2264attribute set due to defaults. It supplies information for the last 2265call to a start handler. If called inside a start handler, then that 2266means the current call. 
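<p>The indexing described above can be sketched as follows. The helpers are
hypothetical and operate on a plain <code>atts</code> vector so that they
stand alone; <code>XML_Char</code> is assumed to be <code>char</code>, the
vector is assumed to be terminated by a null pointer as passed to the start
handler, and <code>specifiedCount</code> stands for the value that
<code>XML_GetSpecifiedAttributeCount</code> would return inside that
handler.</p>

```c
/* Hypothetical helper: number of name/value pairs in an atts vector
   that is terminated by a null pointer. */
static int
count_pairs(const char **atts) {
  int n = 0;
  while (atts[2 * n] != 0)
    n++;
  return n;
}

/* Number of attributes that received their value from ATTLIST defaults.
   specifiedCount is twice the number of explicitly set attributes. */
static int
defaulted_pairs(const char **atts, int specifiedCount) {
  return count_pairs(atts) - specifiedCount / 2;
}

/* Name of the first defaulted attribute, or a null pointer if every
   attribute was set explicitly (the slot then holds the terminator). */
static const char *
first_defaulted_name(const char **atts, int specifiedCount) {
  return atts[specifiedCount];
}
```

<p>For an element with two explicit attributes and one defaulted attribute,
<code>specifiedCount</code> is 4, so <code>atts[4]</code> is the name of
the defaulted attribute and <code>atts[5]</code> its value.</p>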
2267</div> 2268 2269<h4 id="XML_GetIdAttributeIndex">XML_GetIdAttributeIndex</h4> 2270<pre class="fcndec"> 2271int XMLCALL 2272XML_GetIdAttributeIndex(XML_Parser p); 2273</pre> 2274<div class="fcndef"> 2275Returns the index of the ID attribute passed in the atts array in the 2276last call to <code><a href= "#XML_StartElementHandler" 2277>XML_StartElementHandler</a></code>, or -1 if there is no ID 2278attribute. If called inside a start handler, then that means the 2279current call. 2280</div> 2281 2282<h4 id="XML_GetAttributeInfo">XML_GetAttributeInfo</h4> 2283<pre class="fcndec"> 2284const XML_AttrInfo * XMLCALL 2285XML_GetAttributeInfo(XML_Parser parser); 2286</pre> 2287<pre class="signature"> 2288typedef struct { 2289 XML_Index nameStart; /* Offset to beginning of the attribute name. */ 2290 XML_Index nameEnd; /* Offset after the attribute name's last byte. */ 2291 XML_Index valueStart; /* Offset to beginning of the attribute value. */ 2292 XML_Index valueEnd; /* Offset after the attribute value's last byte. */ 2293} XML_AttrInfo; 2294</pre> 2295<div class="fcndef"> 2296Returns an array of <code>XML_AttrInfo</code> structures for the 2297attribute/value pairs passed in the last call to the 2298<code>XML_StartElementHandler</code> that were specified 2299in the start-tag rather than defaulted. Each attribute/value pair counts 2300as 1; thus the number of entries in the array is 2301<code>XML_GetSpecifiedAttributeCount(parser) / 2</code>. 2302</div> 2303 2304<h4 id="XML_SetEncoding">XML_SetEncoding</h4> 2305<pre class="fcndec"> 2306enum XML_Status XMLCALL 2307XML_SetEncoding(XML_Parser p, 2308 const XML_Char *encoding); 2309</pre> 2310<div class="fcndef"> 2311Set the encoding to be used by the parser. It is equivalent to 2312passing a non-null encoding argument to the parser creation functions. 
2313It must not be called after <code><a href= "#XML_Parse" 2314>XML_Parse</a></code> or <code><a href= "#XML_ParseBuffer" 2315>XML_ParseBuffer</a></code> have been called on the given parser. 2316Returns <code>XML_STATUS_OK</code> on success or 2317<code>XML_STATUS_ERROR</code> on error. 2318</div> 2319 2320<h4 id="XML_SetParamEntityParsing">XML_SetParamEntityParsing</h4> 2321<pre class="fcndec"> 2322int XMLCALL 2323XML_SetParamEntityParsing(XML_Parser p, 2324 enum XML_ParamEntityParsing code); 2325</pre> 2326<div class="fcndef"> 2327This enables parsing of parameter entities, including the external 2328parameter entity that is the external DTD subset, according to 2329<code>code</code>. 2330The choices for <code>code</code> are: 2331<ul> 2332<li><code>XML_PARAM_ENTITY_PARSING_NEVER</code></li> 2333<li><code>XML_PARAM_ENTITY_PARSING_UNLESS_STANDALONE</code></li> 2334<li><code>XML_PARAM_ENTITY_PARSING_ALWAYS</code></li> 2335</ul> 2336<b>Note:</b> If <code>XML_SetParamEntityParsing</code> is called after 2337<code>XML_Parse</code> or <code>XML_ParseBuffer</code>, then it has 2338no effect and will always return 0. 2339</div> 2340 2341<h4 id="XML_SetHashSalt">XML_SetHashSalt</h4> 2342<pre class="fcndec"> 2343int XMLCALL 2344XML_SetHashSalt(XML_Parser p, 2345 unsigned long hash_salt); 2346</pre> 2347<div class="fcndef"> 2348Sets the hash salt to use for internal hash calculations. 2349Helps in preventing DoS attacks based on predicting hash 2350function behavior. In order to have an effect this must be called 2351before parsing has started. Returns 1 if successful, 0 when called 2352after <code>XML_Parse</code> or <code>XML_ParseBuffer</code>. 
<p><b>Note:</b> This call is optional, as the parser will auto-generate
a new random salt value if no value has been set at the start of parsing.</p>
<p><b>Note:</b> One should not call <code>XML_SetHashSalt</code> with a
hash salt value of 0, as this value is used as a sentinel value to indicate
that <code>XML_SetHashSalt</code> has <b>not</b> been called. Consequently
such a call will have no effect, even if it returns 1.</p>
</div>

<h4 id="XML_UseForeignDTD">XML_UseForeignDTD</h4>
<pre class="fcndec">
enum XML_Error XMLCALL
XML_UseForeignDTD(XML_Parser parser, XML_Bool useDTD);
</pre>
<div class="fcndef">
<p>This function allows an application to provide an external subset
for the document type declaration for documents which do not specify
an external subset of their own. For documents which specify an
external subset in their DOCTYPE declaration, the application-provided
subset will be ignored. If the document does not contain a DOCTYPE
declaration at all and <code>useDTD</code> is true, the
application-provided subset will be parsed, but the
<code>startDoctypeDeclHandler</code> and
<code>endDoctypeDeclHandler</code> functions, if set, will not be
called. The setting of parameter entity parsing, controlled using
<code><a href= "#XML_SetParamEntityParsing"
>XML_SetParamEntityParsing</a></code>, will be honored.</p>

<p>The application-provided external subset is read by calling the
external entity reference handler set via <code><a href=
"#XML_SetExternalEntityRefHandler"
>XML_SetExternalEntityRefHandler</a></code> with both
<code>publicId</code> and <code>systemId</code> set to NULL.</p>

<p>If this function is called after parsing has begun, it returns
<code>XML_ERROR_CANT_CHANGE_FEATURE_ONCE_PARSING</code> and ignores
<code>useDTD</code>.
If called when Expat has been compiled without
DTD support, it returns
<code>XML_ERROR_FEATURE_REQUIRES_XML_DTD</code>. Otherwise, it
returns <code>XML_ERROR_NONE</code>.</p>

<p><b>Note:</b> For the purpose of checking WFC: Entity Declared, passing
<code>useDTD == XML_TRUE</code> will make the parser behave as if
the document had a DTD with an external subset. This holds true even if
the external entity reference handler returns without action.</p>
</div>

<h4 id="XML_SetReturnNSTriplet">XML_SetReturnNSTriplet</h4>
<pre class="fcndec">
void XMLCALL
XML_SetReturnNSTriplet(XML_Parser parser,
                       int do_nst);
</pre>
<div class="fcndef">
<p>
This function only has an effect when using a parser created with
<code><a href= "#XML_ParserCreateNS" >XML_ParserCreateNS</a></code>,
i.e. when namespace processing is in effect. The <code>do_nst</code>
argument sets whether or not prefixes are returned with names qualified
with a namespace prefix. If this function is called with <code>do_nst</code>
non-zero, then afterwards namespace qualified names (that is, names qualified
with a prefix as opposed to belonging to a default namespace) are
returned as a triplet with the three parts separated by the namespace
separator specified when the parser was created. The order of
returned parts is URI, local name, and prefix.</p>

<p>If <code>do_nst</code> is zero, then namespaces are reported in the
default manner, URI then local name separated by the namespace
separator.</p>
</div>

<h4 id="XML_DefaultCurrent">XML_DefaultCurrent</h4>
<pre class="fcndec">
void XMLCALL
XML_DefaultCurrent(XML_Parser parser);
</pre>
<div class="fcndef">
This can be called within a handler for a start element, end element,
processing instruction or character data.
It causes the corresponding
markup to be passed to the default handler set by <code><a
href="#XML_SetDefaultHandler" >XML_SetDefaultHandler</a></code> or
<code><a href="#XML_SetDefaultHandlerExpand"
>XML_SetDefaultHandlerExpand</a></code>. It does nothing if there is
not a default handler.
</div>

<h4 id="XML_ExpatVersion">XML_ExpatVersion</h4>
<pre class="fcndec">
const XML_LChar * XMLCALL
XML_ExpatVersion(void);
</pre>
<div class="fcndef">
Return the library version as a string (e.g. <code>"expat_1.95.1"</code>).
</div>

<h4 id="XML_ExpatVersionInfo">XML_ExpatVersionInfo</h4>
<pre class="fcndec">
XML_Expat_Version XMLCALL
XML_ExpatVersionInfo(void);
</pre>
<pre class="signature">
typedef struct {
  int major;
  int minor;
  int micro;
} XML_Expat_Version;
</pre>
<div class="fcndef">
Return the library version information as a structure.
Some macros are also defined that support compile-time tests of the
library version:
<ul>
<li><code>XML_MAJOR_VERSION</code></li>
<li><code>XML_MINOR_VERSION</code></li>
<li><code>XML_MICRO_VERSION</code></li>
</ul>
Testing these constants is currently the best way to determine if
particular parts of the Expat API are available.
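<p>The version tests described above can be sketched as follows. This
is a minimal illustration and not part of Expat:
<code>version_at_least</code> is a hypothetical helper name, and it
takes plain integers so that it can be fed either the
<code>XML_*_VERSION</code> macros or the fields of the structure
returned by <code>XML_ExpatVersionInfo</code>.</p>

```c
/* Hypothetical helper, not part of the Expat API.
   Returns 1 when version (major, minor, micro) is the same as or newer
   than (want_major, want_minor, want_micro), and 0 otherwise. */
int
version_at_least(int major, int minor, int micro,
                 int want_major, int want_minor, int want_micro) {
  /* The most significant component that differs decides the result. */
  if (major != want_major)
    return major > want_major;
  if (minor != want_minor)
    return minor > want_minor;
  return micro >= want_micro;
}
```

<p>Assuming <code>expat.h</code> has been included, a call such as
<code>version_at_least(XML_MAJOR_VERSION, XML_MINOR_VERSION,
XML_MICRO_VERSION, 2, 4, 7)</code> would test the headers the
application was compiled against, while passing the <code>major</code>,
<code>minor</code> and <code>micro</code> fields of the returned
structure would test the library in use at run time.</p>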
</div>

<h4 id="XML_GetFeatureList">XML_GetFeatureList</h4>
<pre class="fcndec">
const XML_Feature * XMLCALL
XML_GetFeatureList(void);
</pre>
<pre class="signature">
enum XML_FeatureEnum {
  XML_FEATURE_END = 0,
  XML_FEATURE_UNICODE,
  XML_FEATURE_UNICODE_WCHAR_T,
  XML_FEATURE_DTD,
  XML_FEATURE_CONTEXT_BYTES,
  XML_FEATURE_MIN_SIZE,
  XML_FEATURE_SIZEOF_XML_CHAR,
  XML_FEATURE_SIZEOF_XML_LCHAR,
  XML_FEATURE_NS,
  XML_FEATURE_LARGE_SIZE
};

typedef struct {
  enum XML_FeatureEnum  feature;
  const XML_LChar      *name;
  long int              value;
} XML_Feature;
</pre>
<div class="fcndef">
<p>Returns a list of "feature" records, providing details on how
Expat was configured at compile time. Most applications should not
need to worry about this, but this information is otherwise not
available from Expat. This function allows code that does need to
check these features to do so at runtime.</p>

<p>The return value is an array of <code>XML_Feature</code>,
terminated by a record with a <code>feature</code> of
<code>XML_FEATURE_END</code> and <code>name</code> of NULL,
identifying the feature-test macros Expat was compiled with. Since an
application that requires this kind of information needs to determine
the type of character the <code>name</code> points to, records for the
<code>XML_FEATURE_SIZEOF_XML_CHAR</code> and
<code>XML_FEATURE_SIZEOF_XML_LCHAR</code> will be located at the
beginning of the list, followed by <code>XML_FEATURE_UNICODE</code>
and <code>XML_FEATURE_UNICODE_WCHAR_T</code>, if they are present at
all.</p>

<p>Some features have an associated value. If there isn't an
associated value, the <code>value</code> field is set to 0.
At this
time, the following features have been defined to have values:</p>

<dl>
  <dt><code>XML_FEATURE_SIZEOF_XML_CHAR</code></dt>
  <dd>The number of bytes occupied by one <code>XML_Char</code>
  character.</dd>
  <dt><code>XML_FEATURE_SIZEOF_XML_LCHAR</code></dt>
  <dd>The number of bytes occupied by one <code>XML_LChar</code>
  character.</dd>
  <dt><code>XML_FEATURE_CONTEXT_BYTES</code></dt>
  <dd>The maximum number of characters of context which can be
  reported by <code><a href= "#XML_GetInputContext"
  >XML_GetInputContext</a></code>.</dd>
</dl>
</div>

<h4 id="XML_FreeContentModel">XML_FreeContentModel</h4>
<pre class="fcndec">
void XMLCALL
XML_FreeContentModel(XML_Parser parser, XML_Content *model);
</pre>
<div class="fcndef">
Function to deallocate the <code>model</code> argument passed to the
<code>XML_ElementDeclHandler</code> callback set using <code><a
href="#XML_SetElementDeclHandler" >XML_SetElementDeclHandler</a></code>.
This function should not be used for any other purpose.
</div>

<p>The following functions allow external code to share the memory
allocator an <code>XML_Parser</code> has been configured to use. This
is especially useful for third-party libraries that interact with a
parser object created by application code, or heavily layered
applications. This can be essential when using dynamically loaded
libraries which use different C standard libraries (this can happen on
Windows, at least).</p>

<h4 id="XML_MemMalloc">XML_MemMalloc</h4>
<pre class="fcndec">
void * XMLCALL
XML_MemMalloc(XML_Parser parser, size_t size);
</pre>
<div class="fcndef">
Allocate <code>size</code> bytes of memory using the allocator the
<code>parser</code> object has been configured to use. Returns a
pointer to the memory or NULL on failure.
Memory allocated in this
way must be freed using <code><a href="#XML_MemFree"
>XML_MemFree</a></code>.
</div>

<h4 id="XML_MemRealloc">XML_MemRealloc</h4>
<pre class="fcndec">
void * XMLCALL
XML_MemRealloc(XML_Parser parser, void *ptr, size_t size);
</pre>
<div class="fcndef">
Allocate <code>size</code> bytes of memory using the allocator the
<code>parser</code> object has been configured to use.
<code>ptr</code> must point to a block of memory allocated by <code><a
href="#XML_MemMalloc" >XML_MemMalloc</a></code> or
<code>XML_MemRealloc</code>, or be NULL. This function tries to
expand the block pointed to by <code>ptr</code> if possible. Returns
a pointer to the memory or NULL on failure. On success, the original
block has either been expanded or freed. On failure, the original
block has not been freed; the caller is responsible for freeing the
original block. Memory allocated in this way must be freed using
<code><a href="#XML_MemFree"
>XML_MemFree</a></code>.
</div>

<h4 id="XML_MemFree">XML_MemFree</h4>
<pre class="fcndec">
void XMLCALL
XML_MemFree(XML_Parser parser, void *ptr);
</pre>
<div class="fcndef">
Free a block of memory pointed to by <code>ptr</code>. The block must
have been allocated by <code><a href="#XML_MemMalloc"
>XML_MemMalloc</a></code> or <code>XML_MemRealloc</code>, or be NULL.
</div>

<hr />

  <div class="footer">
    Found a bug in the documentation?
    <a href="https://github.com/libexpat/libexpat/issues">Please file a bug report.</a>
  </div>

</div>
</body>
</html>