1<?xml version="1.0" encoding="utf-8"?> 2<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" 3 "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"> 4<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"> 5 <head> 6 <!-- 7 __ __ _ 8 ___\ \/ /_ __ __ _| |_ 9 / _ \\ /| '_ \ / _` | __| 10 | __// \| |_) | (_| | |_ 11 \___/_/\_\ .__/ \__,_|\__| 12 |_| XML parser 13 14 Copyright (c) 2000 Clark Cooper <coopercc@users.sourceforge.net> 15 Copyright (c) 2000-2004 Fred L. Drake, Jr. <fdrake@users.sourceforge.net> 16 Copyright (c) 2002-2012 Karl Waclawek <karl@waclawek.net> 17 Copyright (c) 2017-2026 Sebastian Pipping <sebastian@pipping.org> 18 Copyright (c) 2017 Jakub Wilk <jwilk@jwilk.net> 19 Copyright (c) 2021 Tomas Korbar <tkorbar@redhat.com> 20 Copyright (c) 2021 Nicolas Cavallari <nicolas.cavallari@green-communications.fr> 21 Copyright (c) 2022 Thijs Schreijer <thijs@thijsschreijer.nl> 22 Copyright (c) 2023-2025 Hanno Böck <hanno@gentoo.org> 23 Copyright (c) 2023 Sony Corporation / Snild Dolkow <snild@sony.com> 24 Licensed under the MIT license: 25 26 Permission is hereby granted, free of charge, to any person obtaining 27 a copy of this software and associated documentation files (the 28 "Software"), to deal in the Software without restriction, including 29 without limitation the rights to use, copy, modify, merge, publish, 30 distribute, sublicense, and/or sell copies of the Software, and to permit 31 persons to whom the Software is furnished to do so, subject to the 32 following conditions: 33 34 The above copyright notice and this permission notice shall be included 35 in all copies or substantial portions of the Software. 36 37 THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, 38 EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF 39 MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN 40 NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, 41 DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR 42 OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE 43 USE OR OTHER DEALINGS IN THE SOFTWARE. 44--> 45 46 <title> 47 Expat XML Parser 48 </title> 49 <meta name="author" content="Clark Cooper, coopercc@netheaven.com" /> 50 <link href="ok.min.css" rel="stylesheet" /> 51 <link href="style.css" rel="stylesheet" /> 52 </head> 53 <body> 54 <div> 55 <h1> 56 The Expat XML Parser <small>Release 2.8.0</small> 57 </h1> 58 </div> 59 60 <div class="content"> 61 <p> 62 Expat is a library, written in C, for parsing XML documents. It's the underlying 63 XML parser for the open source Mozilla project, Perl's <code>XML::Parser</code>, 64 Python's <code>xml.parsers.expat</code>, and other open-source XML parsers. 65 </p> 66 67 <p> 68 This library is the creation of James Clark, who's also given us groff (an nroff 69 look-alike), Jade (an implementation of ISO's DSSSL stylesheet language for 70 SGML), XP (a Java XML parser package), XT (a Java XSL engine). James was also the 71 technical lead on the XML Working Group at W3C that produced the XML 72 specification. 73 </p> 74 75 <p> 76 This is free software, licensed under the <a href="../COPYING">MIT/X Consortium 77 license</a>. You may download it from <a href="https://libexpat.github.io/">the 78 Expat home page</a>. 79 </p> 80 81 <p> 82 The bulk of this document was originally commissioned as an article by <a href= 83 "https://www.xml.com/">XML.com</a>. They graciously allowed Clark Cooper to 84 retain copyright and to distribute it with Expat. This version has been 85 substantially extended to include documentation on features which have been added 86 since the original article was published, and additional information on using the 87 original interface. 88 </p> 89 90 <hr /> 91 92 <h2> 93 Table of Contents 94 </h2> 95 96 <ul> 97 <li> 98 <a href="#overview">Overview</a> 99 </li> 100 101 <li> 102 <a href="#building">Building and Installing</a> 103 </li> 104 105 <li> 106 <a href="#using">Using Expat</a> 107 </li> 108 109 <li> 110 <a href="#reference">Reference</a> 111 <ul> 112 <li> 113 <a href="#creation">Parser Creation Functions</a> 114 <ul> 115 <li> 116 <a href="#XML_ParserCreate">XML_ParserCreate</a> 117 </li> 118 119 <li> 120 <a href="#XML_ParserCreateNS">XML_ParserCreateNS</a> 121 </li> 122 123 <li> 124 <a href="#XML_ParserCreate_MM">XML_ParserCreate_MM</a> 125 </li> 126 127 <li> 128 <a href= 129 "#XML_ExternalEntityParserCreate">XML_ExternalEntityParserCreate</a> 130 </li> 131 132 <li> 133 <a href="#XML_ParserFree">XML_ParserFree</a> 134 </li> 135 136 <li> 137 <a href="#XML_ParserReset">XML_ParserReset</a> 138 </li> 139 </ul> 140 </li> 141 142 <li> 143 <a href="#parsing">Parsing Functions</a> 144 <ul> 145 <li> 146 <a href="#XML_Parse">XML_Parse</a> 147 </li> 148 149 <li> 150 <a href="#XML_ParseBuffer">XML_ParseBuffer</a> 151 </li> 152 153 <li> 154 <a href="#XML_GetBuffer">XML_GetBuffer</a> 155 </li> 156 157 <li> 158 <a href="#XML_StopParser">XML_StopParser</a> 159 </li> 160 161 <li> 162 <a href="#XML_ResumeParser">XML_ResumeParser</a> 163 </li> 164 165 <li> 166 <a href="#XML_GetParsingStatus">XML_GetParsingStatus</a> 167 </li> 168 </ul> 169 </li> 170 171 <li> 172 <a href="#setting">Handler Setting Functions</a> 173 <ul> 174 <li> 175 <a href="#XML_SetStartElementHandler">XML_SetStartElementHandler</a> 176 </li> 177 178 <li> 179 <a href="#XML_SetEndElementHandler">XML_SetEndElementHandler</a> 180 </li> 181 182 <li> 183 <a href="#XML_SetElementHandler">XML_SetElementHandler</a> 184 </li> 185 186 <li> 187 <a href="#XML_SetCharacterDataHandler">XML_SetCharacterDataHandler</a> 188 </li> 189 190 <li> 191 <a href= 192 "#XML_SetProcessingInstructionHandler">XML_SetProcessingInstructionHandler</a> 193 </li> 194 195 <li> 196 <a href="#XML_SetCommentHandler">XML_SetCommentHandler</a> 197 </li> 198 199 <li> 200 <a href= 201 "#XML_SetStartCdataSectionHandler">XML_SetStartCdataSectionHandler</a> 202 </li> 203 204 <li> 205 <a href= 206 "#XML_SetEndCdataSectionHandler">XML_SetEndCdataSectionHandler</a> 207 </li> 208 209 <li> 210 <a href="#XML_SetCdataSectionHandler">XML_SetCdataSectionHandler</a> 211 </li> 212 213 <li> 214 <a href="#XML_SetDefaultHandler">XML_SetDefaultHandler</a> 215 </li> 216 217 <li> 218 <a href="#XML_SetDefaultHandlerExpand">XML_SetDefaultHandlerExpand</a> 219 </li> 220 221 <li> 222 <a href= 223 "#XML_SetExternalEntityRefHandler">XML_SetExternalEntityRefHandler</a> 224 </li> 225 226 <li> 227 <a href= 228 "#XML_SetExternalEntityRefHandlerArg">XML_SetExternalEntityRefHandlerArg</a> 229 </li> 230 231 <li> 232 <a href="#XML_SetSkippedEntityHandler">XML_SetSkippedEntityHandler</a> 233 </li> 234 235 <li> 236 <a href= 237 "#XML_SetUnknownEncodingHandler">XML_SetUnknownEncodingHandler</a> 238 </li> 239 240 <li> 241 <a href= 242 "#XML_SetStartNamespaceDeclHandler">XML_SetStartNamespaceDeclHandler</a> 243 </li> 244 245 <li> 246 <a href= 247 "#XML_SetEndNamespaceDeclHandler">XML_SetEndNamespaceDeclHandler</a> 248 </li> 249 250 <li> 251 <a href="#XML_SetNamespaceDeclHandler">XML_SetNamespaceDeclHandler</a> 252 </li> 253 254 <li> 255 <a href="#XML_SetXmlDeclHandler">XML_SetXmlDeclHandler</a> 256 </li> 257 258 <li> 259 <a href= 260 "#XML_SetStartDoctypeDeclHandler">XML_SetStartDoctypeDeclHandler</a> 261 </li> 262 263 <li> 264 <a href= 265 "#XML_SetEndDoctypeDeclHandler">XML_SetEndDoctypeDeclHandler</a> 266 </li> 267 268 <li> 269 <a href="#XML_SetDoctypeDeclHandler">XML_SetDoctypeDeclHandler</a> 270 </li> 271 272 <li> 273 <a href="#XML_SetElementDeclHandler">XML_SetElementDeclHandler</a> 274 </li> 275 276 <li> 277 <a href="#XML_SetAttlistDeclHandler">XML_SetAttlistDeclHandler</a> 278 </li> 279 280 <li> 281 <a href="#XML_SetEntityDeclHandler">XML_SetEntityDeclHandler</a> 282 </li> 283 284 <li> 285 <a href= 286 "#XML_SetUnparsedEntityDeclHandler">XML_SetUnparsedEntityDeclHandler</a> 287 </li> 288 289 <li> 290 <a href="#XML_SetNotationDeclHandler">XML_SetNotationDeclHandler</a> 291 </li> 292 293 <li> 294 <a href="#XML_SetNotStandaloneHandler">XML_SetNotStandaloneHandler</a> 295 </li> 296 </ul> 297 </li> 298 299 <li> 300 <a href="#position">Parse Position and Error Reporting Functions</a> 301 <ul> 302 <li> 303 <a href="#XML_GetErrorCode">XML_GetErrorCode</a> 304 </li> 305 306 <li> 307 <a href="#XML_ErrorString">XML_ErrorString</a> 308 </li> 309 310 <li> 311 <a href="#XML_GetCurrentByteIndex">XML_GetCurrentByteIndex</a> 312 </li> 313 314 <li> 315 <a href="#XML_GetCurrentLineNumber">XML_GetCurrentLineNumber</a> 316 </li> 317 318 <li> 319 <a href="#XML_GetCurrentColumnNumber">XML_GetCurrentColumnNumber</a> 320 </li> 321 322 <li> 323 <a href="#XML_GetCurrentByteCount">XML_GetCurrentByteCount</a> 324 </li> 325 326 <li> 327 <a href="#XML_GetInputContext">XML_GetInputContext</a> 328 </li> 329 </ul> 330 </li> 331 332 <li> 333 <a href="#attack-protection">Attack Protection</a> 334 <ul> 335 <li> 336 <a href= 337 "#XML_SetBillionLaughsAttackProtectionMaximumAmplification">XML_SetBillionLaughsAttackProtectionMaximumAmplification</a> 338 </li> 339 340 <li> 341 <a href= 342 "#XML_SetBillionLaughsAttackProtectionActivationThreshold">XML_SetBillionLaughsAttackProtectionActivationThreshold</a> 343 </li> 344 345 <li> 346 <a href= 347 "#XML_SetAllocTrackerMaximumAmplification">XML_SetAllocTrackerMaximumAmplification</a> 348 </li> 349 350 <li> 351 <a href= 352 "#XML_SetAllocTrackerActivationThreshold">XML_SetAllocTrackerActivationThreshold</a> 353 </li> 354 355 <li> 356 <a href= 357 "#XML_SetReparseDeferralEnabled">XML_SetReparseDeferralEnabled</a> 358 </li> 359 </ul> 360 </li> 361 362 <li> 363 <a href="#miscellaneous">Miscellaneous Functions</a> 364 <ul> 365 <li> 366 <a href="#XML_SetUserData">XML_SetUserData</a> 367 </li> 368 369 <li> 370 <a href="#XML_GetUserData">XML_GetUserData</a> 371 </li> 372 373 <li> 374 <a href="#XML_UseParserAsHandlerArg">XML_UseParserAsHandlerArg</a> 375 </li> 376 377 <li> 378 <a href="#XML_SetBase">XML_SetBase</a> 379 </li> 380 381 <li> 382 <a href="#XML_GetBase">XML_GetBase</a> 383 </li> 384 385 <li> 386 <a href= 387 "#XML_GetSpecifiedAttributeCount">XML_GetSpecifiedAttributeCount</a> 388 </li> 389 390 <li> 391 <a href="#XML_GetIdAttributeIndex">XML_GetIdAttributeIndex</a> 392 </li> 393 394 <li> 395 <a href="#XML_GetAttributeInfo">XML_GetAttributeInfo</a> 396 </li> 397 398 <li> 399 <a href="#XML_SetEncoding">XML_SetEncoding</a> 400 </li> 401 402 <li> 403 <a href="#XML_SetParamEntityParsing">XML_SetParamEntityParsing</a> 404 </li> 405 406 <li> 407 <a href="#XML_SetHashSalt">XML_SetHashSalt</a> (deprecated) 408 </li> 409 410 <li> 411 <a href="#XML_SetHashSalt16Bytes">XML_SetHashSalt16Bytes</a> 412 </li> 413 414 <li> 415 <a href="#XML_UseForeignDTD">XML_UseForeignDTD</a> 416 </li> 417 418 <li> 419 <a href="#XML_SetReturnNSTriplet">XML_SetReturnNSTriplet</a> 420 </li> 421 422 <li> 423 <a href="#XML_DefaultCurrent">XML_DefaultCurrent</a> 424 </li> 425 426 <li> 427 <a href="#XML_ExpatVersion">XML_ExpatVersion</a> 428 </li> 429 430 <li> 431 <a href="#XML_ExpatVersionInfo">XML_ExpatVersionInfo</a> 432 </li> 433 434 <li> 435 <a href="#XML_GetFeatureList">XML_GetFeatureList</a> 436 </li> 437 438 <li> 439 <a href="#XML_FreeContentModel">XML_FreeContentModel</a> 440 </li> 441 442 <li> 443 <a href="#XML_MemMalloc">XML_MemMalloc</a> 444 </li> 445 446 <li> 447 <a href="#XML_MemRealloc">XML_MemRealloc</a> 448 </li> 449 450 <li> 451 <a href="#XML_MemFree">XML_MemFree</a> 452 </li> 453 </ul> 454 </li> 455 </ul> 456 </li> 457 </ul> 458 459 <hr /> 460 461 <h2> 462 <a id="overview" name="overview">Overview</a> 463 </h2> 464 465 <p> 466 Expat is a stream-oriented parser. You register callback (or handler) functions 467 with the parser and then start feeding it the document. As the parser recognizes 468 parts of the document, it will call the appropriate handler for that part (if 469 you've registered one.) The document is fed to the parser in pieces, so you can 470 start parsing before you have all the document. This also allows you to parse 471 really huge documents that won't fit into memory. 472 </p> 473 474 <p> 475 Expat can be intimidating due to the many kinds of handlers and options you can 476 set. But you only need to learn four functions in order to do 90% of what you'll 477 want to do with it: 478 </p> 479 480 <dl> 481 <dt> 482 <code><a href="#XML_ParserCreate">XML_ParserCreate</a></code> 483 </dt> 484 485 <dd> 486 Create a new parser object. 487 </dd> 488 489 <dt> 490 <code><a href="#XML_SetElementHandler">XML_SetElementHandler</a></code> 491 </dt> 492 493 <dd> 494 Set handlers for start and end tags. 495 </dd> 496 497 <dt> 498 <code><a href= 499 "#XML_SetCharacterDataHandler">XML_SetCharacterDataHandler</a></code> 500 </dt> 501 502 <dd> 503 Set handler for text. 504 </dd> 505 506 <dt> 507 <code><a href="#XML_Parse">XML_Parse</a></code> 508 </dt> 509 510 <dd> 511 Pass a buffer full of document to the parser 512 </dd> 513 </dl> 514 515 <p> 516 These functions and others are described in the <a href= 517 "#reference">reference</a> part of this document. The reference section also 518 describes in detail the parameters passed to the different types of handlers. 519 </p> 520 521 <p> 522 Let's look at a very simple example program that only uses 3 of the above 523 functions (it doesn't need to set a character handler.) The program <a href= 524 "../examples/outline.c">outline.c</a> prints an element outline, indenting child 525 elements to distinguish them from the parent element that contains them. The 526 start handler does all the work. It prints two indenting spaces for every level 527 of ancestor elements, then it prints the element and attribute information. 528 Finally it increments the global <code>Depth</code> variable. 529 </p> 530 531 <pre class="eg"> 532int Depth; 533 534void XMLCALL 535start(void *data, const char *el, const char **attr) { 536 int i; 537 538 for (i = 0; i < Depth; i++) 539 printf(" "); 540 541 printf("%s", el); 542 543 for (i = 0; attr[i]; i += 2) { 544 printf(" %s='%s'", attr[i], attr[i + 1]); 545 } 546 547 printf("\n"); 548 Depth++; 549} /* End of start handler */ 550</pre> 551 <p> 552 The end tag simply does the bookkeeping work of decrementing <code>Depth</code>. 553 </p> 554 555 <pre class="eg"> 556void XMLCALL 557end(void *data, const char *el) { 558 Depth--; 559} /* End of end handler */ 560</pre> 561 <p> 562 Note the <code>XMLCALL</code> annotation used for the callbacks. This is used to 563 ensure that the Expat and the callbacks are using the same calling convention in 564 case the compiler options used for Expat itself and the client code are 565 different. Expat tries not to care what the default calling convention is, though 566 it may require that it be compiled with a default convention of "cdecl" on some 567 platforms. For code which uses Expat, however, the calling convention is 568 specified by the <code>XMLCALL</code> annotation on most platforms; callbacks 569 should be defined using this annotation. 570 </p> 571 572 <p> 573 The <code>XMLCALL</code> annotation was added in Expat 1.95.7, but existing 574 working Expat applications don't need to add it (since they are already using the 575 "cdecl" calling convention, or they wouldn't be working). The annotation is only 576 needed if the default calling convention may be something other than "cdecl". To 577 use the annotation safely with older versions of Expat, you can conditionally 578 define it <em>after</em> including Expat's header file: 579 </p> 580 581 <pre class="eg"> 582#include <expat.h> 583 584#ifndef XMLCALL 585#if defined(_MSC_VER) && !defined(__BEOS__) && !defined(__CYGWIN__) 586#define XMLCALL __cdecl 587#elif defined(__GNUC__) 588#define XMLCALL __attribute__((cdecl)) 589#else 590#define XMLCALL 591#endif 592#endif 593</pre> 594 <p> 595 After creating the parser, the main program just has the job of shoveling the 596 document to the parser so that it can do its work. 597 </p> 598 599 <hr /> 600 601 <h2> 602 <a id="building" name="building">Building and Installing Expat</a> 603 </h2> 604 605 <p> 606 The Expat distribution comes as a compressed (with GNU gzip) tar file. You may 607 download the latest version from <a href= 608 "https://sourceforge.net/projects/expat/">Source Forge</a>. After unpacking this, 609 cd into the directory. Then follow either the Win32 directions or Unix directions 610 below. 611 </p> 612 613 <h3> 614 Building under Win32 615 </h3> 616 617 <p> 618 If you're using the GNU compiler under cygwin, follow the Unix directions in the 619 next section. Otherwise if you have Microsoft's Developer Studio installed, you 620 can use CMake to generate a <code>.sln</code> file, e.g. <code>cmake -G"Visual 621 Studio 17 2022" -DCMAKE_BUILD_TYPE=RelWithDebInfo .</code> , and build Expat 622 using <code>msbuild /m expat.sln</code> after. 623 </p> 624 625 <p> 626 Alternatively, you may download the Win32 binary package that contains the 627 "expat.h" include file and a pre-built DLL. 628 </p> 629 630 <h3> 631 Building under Unix (or GNU) 632 </h3> 633 634 <p> 635 First you'll need to run the configure shell script in order to configure the 636 Makefiles and headers for your system. 637 </p> 638 639 <p> 640 If you're happy with all the defaults that configure picks for you, and you have 641 permission on your system to install into /usr/local, you can install Expat with 642 this sequence of commands: 643 </p> 644 645 <pre class="eg"> 646./configure 647make 648make install 649</pre> 650 <p> 651 There are some options that you can provide to this script, but the only one 652 we'll mention here is the <code>--prefix</code> option. You can find out all the 653 options available by running configure with just the <code>--help</code> option. 654 </p> 655 656 <p> 657 By default, the configure script sets things up so that the library gets 658 installed in <code>/usr/local/lib</code> and the associated header file in 659 <code>/usr/local/include</code>. But if you were to give the option, 660 <code>--prefix=/home/me/mystuff</code>, then the library and header would get 661 installed in <code>/home/me/mystuff/lib</code> and 662 <code>/home/me/mystuff/include</code> respectively. 663 </p> 664 665 <h3> 666 Configuring Expat Using the Pre-Processor 667 </h3> 668 669 <p> 670 Expat's feature set can be configured using a small number of pre-processor 671 definitions. The symbols are: 672 </p> 673 674 <dl class="cpp-symbols"> 675 <dt> 676 <a id="XML_GE" name="XML_GE">XML_GE</a> 677 </dt> 678 679 <dd> 680 Added in Expat 2.6.0. Include support for <a href= 681 "https://www.w3.org/TR/2006/REC-xml-20060816/#sec-physical-struct">general 682 entities</a> (syntax <code>&e1;</code> to reference and syntax 683 <code><!ENTITY e1 'value1'></code> (an internal general entity) or 684 <code><!ENTITY e2 SYSTEM 'file2'></code> (an external general entity) to 685 declare). With <code>XML_GE</code> enabled, general entities will be replaced 686 by their declared replacement text; for this to work for <em>external</em> 687 general entities, in addition an <code><a href= 688 "#XML_SetExternalEntityRefHandler">XML_ExternalEntityRefHandler</a></code> must 689 be set using <code><a href= 690 "#XML_SetExternalEntityRefHandler">XML_SetExternalEntityRefHandler</a></code>. 691 Also, enabling <code>XML_GE</code> makes the functions <code><a href= 692 "#XML_SetBillionLaughsAttackProtectionMaximumAmplification">XML_SetBillionLaughsAttackProtectionMaximumAmplification</a></code> 693 and <code><a href= 694 "#XML_SetBillionLaughsAttackProtectionActivationThreshold">XML_SetBillionLaughsAttackProtectionActivationThreshold</a></code> 695 available.<br /> 696 With <code>XML_GE</code> disabled, Expat has a smaller memory footprint and can 697 be faster, but will not load external general entities and will replace all 698 general entities (except the <a href= 699 "https://www.w3.org/TR/2006/REC-xml-20060816/#sec-predefined-ent">predefined 700 five</a>: <code>amp</code>, <code>apos</code>, <code>gt</code>, 701 <code>lt</code>, <code>quot</code>) with a self-reference: for example, 702 referencing an entity <code>e1</code> via <code>&e1;</code> will be 703 replaced by text <code>&e1;</code>. 704 </dd> 705 706 <dt> 707 <a id="XML_DTD" name="XML_DTD">XML_DTD</a> 708 </dt> 709 710 <dd> 711 Include support for using and reporting DTD-based content. If this is defined, 712 default attribute values from an external DTD subset are reported and attribute 713 value normalization occurs based on the type of attributes defined in the 714 external subset. Without this, Expat has a smaller memory footprint and can be 715 faster, but will not load external parameter entities or process conditional 716 sections. If defined, makes the functions <code><a href= 717 "#XML_SetBillionLaughsAttackProtectionMaximumAmplification">XML_SetBillionLaughsAttackProtectionMaximumAmplification</a></code> 718 and <code><a href= 719 "#XML_SetBillionLaughsAttackProtectionActivationThreshold">XML_SetBillionLaughsAttackProtectionActivationThreshold</a></code> 720 available. 721 </dd> 722 723 <dt> 724 <a id="XML_NS" name="XML_NS">XML_NS</a> 725 </dt> 726 727 <dd> 728 When defined, support for the <cite><a href= 729 "https://www.w3.org/TR/REC-xml-names/">Namespaces in XML</a></cite> 730 specification is included. 731 </dd> 732 733 <dt> 734 <a id="XML_UNICODE" name="XML_UNICODE">XML_UNICODE</a> 735 </dt> 736 737 <dd> 738 When defined, character data reported to the application is encoded in UTF-16 739 using wide characters of the type <code>XML_Char</code>. This is implied if 740 <code>XML_UNICODE_WCHAR_T</code> is defined. 741 </dd> 742 743 <dt> 744 <a id="XML_UNICODE_WCHAR_T" name="XML_UNICODE_WCHAR_T">XML_UNICODE_WCHAR_T</a> 745 </dt> 746 747 <dd> 748 If defined, causes the <code>XML_Char</code> character type to be defined using 749 the <code>wchar_t</code> type; otherwise, <code>unsigned short</code> is used. 750 Defining this implies <code>XML_UNICODE</code>. 751 </dd> 752 753 <dt> 754 <a id="XML_LARGE_SIZE" name="XML_LARGE_SIZE">XML_LARGE_SIZE</a> 755 </dt> 756 757 <dd> 758 If defined, causes the <code>XML_Size</code> and <code>XML_Index</code> integer 759 types to be at least 64 bits in size. This is intended to support processing of 760 very large input streams, where the return values of <code><a href= 761 "#XML_GetCurrentByteIndex">XML_GetCurrentByteIndex</a></code>, <code><a href= 762 "#XML_GetCurrentLineNumber">XML_GetCurrentLineNumber</a></code> and 763 <code><a href= 764 "#XML_GetCurrentColumnNumber">XML_GetCurrentColumnNumber</a></code> could 765 overflow. It may not be supported by all compilers, and is turned off by 766 default. 767 </dd> 768 769 <dt> 770 <a id="XML_CONTEXT_BYTES" name="XML_CONTEXT_BYTES">XML_CONTEXT_BYTES</a> 771 </dt> 772 773 <dd> 774 The number of input bytes of markup context which the parser will ensure are 775 available for reporting via <code><a href= 776 "#XML_GetInputContext">XML_GetInputContext</a></code>. This is normally set to 777 1024, and must be set to a positive integer to enable. If this is set to zero, 778 the input context will not be available and <code><a href= 779 "#XML_GetInputContext">XML_GetInputContext</a></code> will always report 780 <code>NULL</code>. Without this, Expat has a smaller memory footprint and can 781 be faster. 782 </dd> 783 784 <dt> 785 <a id="XML_STATIC" name="XML_STATIC">XML_STATIC</a> 786 </dt> 787 788 <dd> 789 On Windows, this should be set if Expat is going to be linked statically with 790 the code that calls it; this is required to get all the right MSVC magic 791 annotations correct. This is ignored on other platforms. 792 </dd> 793 794 <dt> 795 <a id="XML_ATTR_INFO" name="XML_ATTR_INFO">XML_ATTR_INFO</a> 796 </dt> 797 798 <dd> 799 If defined, makes the additional function <code><a href= 800 "#XML_GetAttributeInfo">XML_GetAttributeInfo</a></code> available for reporting 801 attribute byte offsets. 802 </dd> 803 </dl> 804 805 <hr /> 806 807 <h2> 808 <a id="using" name="using">Using Expat</a> 809 </h2> 810 811 <h3> 812 Compiling and Linking Against Expat 813 </h3> 814 815 <p> 816 Unless you installed Expat in a location not expected by your compiler and 817 linker, all you have to do to use Expat in your programs is to include the Expat 818 header (<code>#include <expat.h></code>) in your files that make calls to 819 it and to tell the linker that it needs to link against the Expat library. On 820 Unix systems, this would usually be done with the <code>-lexpat</code> argument. 821 Otherwise, you'll need to tell the compiler where to look for the Expat header 822 and the linker where to find the Expat library. You may also need to take steps 823 to tell the operating system where to find this library at run time. 824 </p> 825 826 <p> 827 On a Unix-based system, here's what a Makefile might look like when Expat is 828 installed in a standard location: 829 </p> 830 831 <pre class="eg"> 832CC=cc 833LDFLAGS= 834LIBS= -lexpat 835xmlapp: xmlapp.o 836 $(CC) $(LDFLAGS) -o xmlapp xmlapp.o $(LIBS) 837</pre> 838 <p> 839 If you installed Expat in, say, <code>/home/me/mystuff</code>, then the Makefile 840 would look like this: 841 </p> 842 843 <pre class="eg"> 844CC=cc 845CFLAGS= -I/home/me/mystuff/include 846LDFLAGS= 847LIBS= -L/home/me/mystuff/lib -lexpat 848xmlapp: xmlapp.o 849 $(CC) $(LDFLAGS) -o xmlapp xmlapp.o $(LIBS) 850</pre> 851 <p> 852 You'd also have to set the environment variable <code>LD_LIBRARY_PATH</code> to 853 <code>/home/me/mystuff/lib</code> (or to 854 <code>${LD_LIBRARY_PATH}:/home/me/mystuff/lib</code> if LD_LIBRARY_PATH already 855 has some directories in it) in order to run your application. 856 </p> 857 858 <h3> 859 Expat Basics 860 </h3> 861 862 <p> 863 As we saw in the example in the overview, the first step in parsing an XML 864 document with Expat is to create a parser object. There are <a href= 865 "#creation">three functions</a> in the Expat API for creating a parser object. 866 However, only two of these (<code><a href= 867 "#XML_ParserCreate">XML_ParserCreate</a></code> and <code><a href= 868 "#XML_ParserCreateNS">XML_ParserCreateNS</a></code>) can be used for constructing 869 a parser for a top-level document. The object returned by these functions is an 870 opaque pointer (i.e. "expat.h" declares it as void *) to data with further 871 internal structure. In order to free the memory associated with this object you 872 must call <code><a href="#XML_ParserFree">XML_ParserFree</a></code>. Note that if 873 you have provided any <a href="#userdata">user data</a> that gets stored in the 874 parser, then your application is responsible for freeing it prior to calling 875 <code>XML_ParserFree</code>. 876 </p> 877 878 <p> 879 The objects returned by the parser creation functions are good for parsing only 880 one XML document or external parsed entity. If your application needs to parse 881 many XML documents, then it needs to create a parser object for each one. The 882 best way to deal with this is to create a higher level object that contains all 883 the default initialization you want for your parser objects. 884 </p> 885 886 <p> 887 Walking through a document hierarchy with a stream oriented parser will require a 888 good stack mechanism in order to keep track of current context. For instance, to 889 answer the simple question, "What element does this text belong to?" requires a 890 stack, since the parser may have descended into other elements that are children 891 of the current one and has encountered this text on the way out. 892 </p> 893 894 <p> 895 The things you're likely to want to keep on a stack are the currently opened 896 element and it's attributes. You push this information onto the stack in the 897 start handler and you pop it off in the end handler. 898 </p> 899 900 <p> 901 For some tasks, it is sufficient to just keep information on what the depth of 902 the stack is (or would be if you had one.) The outline program shown above 903 presents one example. Another such task would be skipping over a complete 904 element. When you see the start tag for the element you want to skip, you set a 905 skip flag and record the depth at which the element started. When the end tag 906 handler encounters the same depth, the skipped element has ended and the flag may 907 be cleared. If you follow the convention that the root element starts at 1, then 908 you can use the same variable for skip flag and skip depth. 909 </p> 910 911 <pre class="eg"> 912void 913init_info(Parseinfo *info) { 914 info->skip = 0; 915 info->depth = 1; 916 /* Other initializations here */ 917} /* End of init_info */ 918 919void XMLCALL 920rawstart(void *data, const char *el, const char **attr) { 921 Parseinfo *inf = (Parseinfo *) data; 922 923 if (! inf->skip) { 924 if (should_skip(inf, el, attr)) { 925 inf->skip = inf->depth; 926 } 927 else 928 start(inf, el, attr); /* This does rest of start handling */ 929 } 930 931 inf->depth++; 932} /* End of rawstart */ 933 934void XMLCALL 935rawend(void *data, const char *el) { 936 Parseinfo *inf = (Parseinfo *) data; 937 938 inf->depth--; 939 940 if (! inf->skip) 941 end(inf, el); /* This does rest of end handling */ 942 943 if (inf->skip == inf->depth) 944 inf->skip = 0; 945} /* End rawend */ 946</pre> 947 <p> 948 Notice in the above example the difference in how depth is manipulated in the 949 start and end handlers. The end tag handler should be the mirror image of the 950 start tag handler. This is necessary to properly model containment. Since, in the 951 start tag handler, we incremented depth <em>after</em> the main body of start tag 952 code, then in the end handler, we need to manipulate it <em>before</em> the main 953 body. If we'd decided to increment it first thing in the start handler, then we'd 954 have had to decrement it last thing in the end handler. 955 </p> 956 957 <h3 id="userdata"> 958 Communicating between handlers 959 </h3> 960 961 <p> 962 In order to be able to pass information between different handlers without using 963 globals, you'll need to define a data structure to hold the shared variables. You 964 can then tell Expat (with the <code><a href= 965 "#XML_SetUserData">XML_SetUserData</a></code> function) to pass a pointer to this 966 structure to the handlers. This is the first argument received by most handlers. 967 In the <a href="#reference">reference section</a>, an argument to a callback 968 function is named <code>userData</code> and have type <code>void *</code> if the 969 user data is passed; it will have the type <code>XML_Parser</code> if the parser 970 itself is passed. When the parser is passed, the user data may be retrieved using 971 <code><a href="#XML_GetUserData">XML_GetUserData</a></code>. 972 </p> 973 974 <p> 975 One common case where multiple calls to a single handler may need to communicate 976 using an application data structure is the case when content passed to the 977 character data handler (set by <code><a href= 978 "#XML_SetCharacterDataHandler">XML_SetCharacterDataHandler</a></code>) needs to 979 be accumulated. A common first-time mistake with any of the event-oriented 980 interfaces to an XML parser is to expect all the text contained in an element to 981 be reported by a single call to the character data handler. Expat, like many 982 other XML parsers, reports such data as a sequence of calls; there's no way to 983 know when the end of the sequence is reached until a different callback is made. 984 A buffer referenced by the user data structure proves both an effective and 985 convenient place to accumulate character data. 986 </p> 987 <!-- XXX example needed here --> 988 989 <h3> 990 XML Version 991 </h3> 992 993 <p> 994 Expat is an XML 1.0 parser, and as such never complains based on the value of the 995 <code>version</code> pseudo-attribute in the XML declaration, if present. 996 </p> 997 998 <p> 999 If an application needs to check the version number (to support alternate 1000 processing), it should use the <code><a href= 1001 "#XML_SetXmlDeclHandler">XML_SetXmlDeclHandler</a></code> function to set a 1002 handler that uses the information in the XML declaration to determine what to do. 1003 This example shows how to check that only a version number of <code>"1.0"</code> 1004 is accepted: 1005 </p> 1006 1007 <pre class="eg"> 1008static int wrong_version; 1009static XML_Parser parser; 1010 1011static void XMLCALL 1012xmldecl_handler(void *userData, 1013 const XML_Char *version, 1014 const XML_Char *encoding, 1015 int standalone) 1016{ 1017 static const XML_Char Version_1_0[] = {'1', '.', '0', 0}; 1018 1019 int i; 1020 1021 for (i = 0; i < (sizeof(Version_1_0) / sizeof(Version_1_0[0])); ++i) { 1022 if (version[i] != Version_1_0[i]) { 1023 wrong_version = 1; 1024 /* also clear all other handlers: */ 1025 XML_SetCharacterDataHandler(parser, NULL); 1026 ... 1027 return; 1028 } 1029 } 1030 ... 1031} 1032</pre> 1033 <h3> 1034 Namespace Processing 1035 </h3> 1036 1037 <p> 1038 When the parser is created using the <code><a href= 1039 "#XML_ParserCreateNS">XML_ParserCreateNS</a></code>, function, Expat performs 1040 namespace processing. Under namespace processing, Expat consumes 1041 <code>xmlns</code> and <code>xmlns:...</code> attributes, which declare 1042 namespaces for the scope of the element in which they occur. This means that your 1043 start handler will not see these attributes. Your application can still be 1044 informed of these declarations by setting namespace declaration handlers with 1045 <a href= 1046 "#XML_SetNamespaceDeclHandler"><code>XML_SetNamespaceDeclHandler</code></a>. 1047 </p> 1048 1049 <p> 1050 Element type and attribute names that belong to a given namespace are passed to 1051 the appropriate handler in expanded form. By default this expanded form is a 1052 concatenation of the namespace URI, the separator character (which is the 2nd 1053 argument to <code><a href="#XML_ParserCreateNS">XML_ParserCreateNS</a></code>), 1054 and the local name (i.e. the part after the colon). Names with undeclared 1055 prefixes are not well-formed when namespace processing is enabled, and will 1056 trigger an error. Unprefixed attribute names are never expanded, and unprefixed 1057 element names are only expanded when they are in the scope of a default 1058 namespace. 1059 </p> 1060 1061 <p> 1062 However if <code><a href= 1063 "#XML_SetReturnNSTriplet">XML_SetReturnNSTriplet</a></code> has been called with 1064 a non-zero <code>do_nst</code> parameter, then the expanded form for names with 1065 an explicit prefix is a concatenation of: URI, separator, local name, separator, 1066 prefix. 1067 </p> 1068 1069 <p> 1070 You can set handlers for the start of a namespace declaration and for the end of 1071 a scope of a declaration with the <code><a href= 1072 "#XML_SetNamespaceDeclHandler">XML_SetNamespaceDeclHandler</a></code> function. 1073 The StartNamespaceDeclHandler is called prior to the start tag handler and the 1074 EndNamespaceDeclHandler is called after the corresponding end tag that ends the 1075 namespace's scope. The namespace start handler gets passed the prefix and URI for 1076 the namespace. For a default namespace declaration (xmlns='...'), the prefix will 1077 be <code>NULL</code>. The URI will be <code>NULL</code> for the case where the 1078 default namespace is being unset. The namespace end handler just gets the prefix 1079 for the closing scope. 1080 </p> 1081 1082 <p> 1083 These handlers are called for each declaration. So if, for instance, a start tag 1084 had three namespace declarations, then the StartNamespaceDeclHandler would be 1085 called three times before the start tag handler is called, once for each 1086 declaration. 1087 </p> 1088 1089 <h3> 1090 Character Encodings 1091 </h3> 1092 1093 <p> 1094 While XML is based on Unicode, and every XML processor is required to recognized 1095 UTF-8 and UTF-16 (1 and 2 byte encodings of Unicode), other encodings may be 1096 declared in XML documents or entities. For the main document, an XML declaration 1097 may contain an encoding declaration: 1098 </p> 1099 1100 <pre> 1101<?xml version="1.0" encoding="ISO-8859-2"?> 1102</pre> 1103 <p> 1104 External parsed entities may begin with a text declaration, which looks like an 1105 XML declaration with just an encoding declaration: 1106 </p> 1107 1108 <pre> 1109<?xml encoding="Big5"?> 1110</pre> 1111 <p> 1112 With Expat, you may also specify an encoding at the time of creating a parser. 1113 This is useful when the encoding information may come from a source outside the 1114 document itself (like a higher level protocol.) 1115 </p> 1116 1117 <p> 1118 <a id="builtin_encodings" name="builtin_encodings"></a>There are four built-in 1119 encodings in Expat: 1120 </p> 1121 1122 <ul> 1123 <li>UTF-8 1124 </li> 1125 1126 <li>UTF-16 1127 </li> 1128 1129 <li>ISO-8859-1 1130 </li> 1131 1132 <li>US-ASCII 1133 </li> 1134 </ul> 1135 1136 <p> 1137 Anything else discovered in an encoding declaration or in the protocol encoding 1138 specified in the parser constructor, triggers a call to the 1139 <code>UnknownEncodingHandler</code>. This handler gets passed the encoding name 1140 and a pointer to an <code>XML_Encoding</code> data structure. Your handler must 1141 fill in this structure and return <code>XML_STATUS_OK</code> if it knows how to 1142 deal with the encoding. Otherwise the handler should return 1143 <code>XML_STATUS_ERROR</code>. The handler also gets passed a pointer to an 1144 optional application data structure that you may indicate when you set the 1145 handler. 1146 </p> 1147 1148 <p> 1149 Expat places restrictions on character encodings that it can support by filling 1150 in the <code>XML_Encoding</code> structure. include file: 1151 </p> 1152 1153 <ol> 1154 <li>Every ASCII character that can appear in a well-formed XML document must be 1155 represented by a single byte, and that byte must correspond to it's ASCII 1156 encoding (except for the characters $@\^'{}~) 1157 </li> 1158 1159 <li>Characters must be encoded in 4 bytes or less. 1160 </li> 1161 1162 <li>All characters encoded must have Unicode scalar values less than or equal to 1163 65535 (0xFFFF)<em>This does not apply to the built-in support for UTF-16 and 1164 UTF-8</em> 1165 </li> 1166 1167 <li>No character may be encoded by more that one distinct sequence of bytes 1168 </li> 1169 </ol> 1170 1171 <p> 1172 <code>XML_Encoding</code> contains an array of integers that correspond to the 1173 1st byte of an encoding sequence. If the value in the array for a byte is zero or 1174 positive, then the byte is a single byte encoding that encodes the Unicode scalar 1175 value contained in the array. A -1 in this array indicates a malformed byte. If 1176 the value is -2, -3, or -4, then the byte is the beginning of a 2, 3, or 4 byte 1177 sequence respectively. Multi-byte sequences are sent to the convert function 1178 pointed at in the <code>XML_Encoding</code> structure. This function should 1179 return the Unicode scalar value for the sequence or -1 if the sequence is 1180 malformed. 1181 </p> 1182 1183 <p> 1184 One pitfall that novice Expat users are likely to fall into is that although 1185 Expat may accept input in various encodings, the strings that it passes to the 1186 handlers are always encoded in UTF-8 or UTF-16 (depending on how Expat was 1187 compiled). Your application is responsible for any translation of these strings 1188 into other encodings. 1189 </p> 1190 1191 <h3> 1192 Handling External Entity References 1193 </h3> 1194 1195 <p> 1196 Expat does not read or parse external entities directly. Note that any external 1197 DTD is a special case of an external entity. If you've set no 1198 <code>ExternalEntityRefHandler</code>, then external entity references are 1199 silently ignored. Otherwise, it calls your handler with the information needed to 1200 read and parse the external entity. 1201 </p> 1202 1203 <p> 1204 Your handler isn't actually responsible for parsing the entity, but it is 1205 responsible for creating a subsidiary parser with <code><a href= 1206 "#XML_ExternalEntityParserCreate">XML_ExternalEntityParserCreate</a></code> that 1207 will do the job. This returns an instance of <code>XML_Parser</code> that has 1208 handlers and other data structures initialized from the parent parser. You may 1209 then use <code><a href="#XML_Parse">XML_Parse</a></code> or <code><a href= 1210 "#XML_ParseBuffer">XML_ParseBuffer</a></code> calls against this parser. Since 1211 external entities my refer to other external entities, your handler should be 1212 prepared to be called recursively. 1213 </p> 1214 1215 <h3> 1216 Parsing DTDs 1217 </h3> 1218 1219 <p> 1220 In order to parse parameter entities, before starting the parse, you must call 1221 <code><a href="#XML_SetParamEntityParsing">XML_SetParamEntityParsing</a></code> 1222 with one of the following arguments: 1223 </p> 1224 1225 <dl> 1226 <dt> 1227 <code>XML_PARAM_ENTITY_PARSING_NEVER</code> 1228 </dt> 1229 1230 <dd> 1231 Don't parse parameter entities or the external subset 1232 </dd> 1233 1234 <dt> 1235 <code>XML_PARAM_ENTITY_PARSING_UNLESS_STANDALONE</code> 1236 </dt> 1237 1238 <dd> 1239 Parse parameter entities and the external subset unless <code>standalone</code> 1240 was set to "yes" in the XML declaration. 1241 </dd> 1242 1243 <dt> 1244 <code>XML_PARAM_ENTITY_PARSING_ALWAYS</code> 1245 </dt> 1246 1247 <dd> 1248 Always parse parameter entities and the external subset 1249 </dd> 1250 </dl> 1251 1252 <p> 1253 In order to read an external DTD, you also have to set an external entity 1254 reference handler as described above. 1255 </p> 1256 1257 <h3 id="stop-resume"> 1258 Temporarily Stopping Parsing 1259 </h3> 1260 1261 <p> 1262 Expat 1.95.8 introduces a new feature: its now possible to stop parsing 1263 temporarily from within a handler function, even if more data has already been 1264 passed into the parser. Applications for this include 1265 </p> 1266 1267 <ul> 1268 <li>Supporting the <a href="https://www.w3.org/TR/xinclude/">XInclude</a> 1269 specification. 1270 </li> 1271 1272 <li>Delaying further processing until additional information is available from 1273 some other source. 1274 </li> 1275 1276 <li>Adjusting processor load as task priorities shift within an application. 1277 </li> 1278 1279 <li>Stopping parsing completely (simply free or reset the parser instead of 1280 resuming in the outer parsing loop). This can be useful if an application-domain 1281 error is found in the XML being parsed or if the result of the parse is 1282 determined not to be useful after all. 1283 </li> 1284 </ul> 1285 1286 <p> 1287 To take advantage of this feature, the main parsing loop of an application needs 1288 to support this specifically. It cannot be supported with a parsing loop 1289 compatible with Expat 1.95.7 or earlier (though existing loops will continue to 1290 work without supporting the stop/resume feature). 1291 </p> 1292 1293 <p> 1294 An application that uses this feature for a single parser will have the rough 1295 structure (in pseudo-code): 1296 </p> 1297 1298 <pre class="pseudocode"> 1299fd = open_input() 1300p = create_parser() 1301 1302if parse_xml(p, fd) { 1303 /* suspended */ 1304 1305 int suspended = 1; 1306 1307 while (suspended) { 1308 do_something_else() 1309 if ready_to_resume() { 1310 suspended = continue_parsing(p, fd); 1311 } 1312 } 1313} 1314</pre> 1315 <p> 1316 An application that may resume any of several parsers based on input (either from 1317 the XML being parsed or some other source) will certainly have more interesting 1318 control structures. 1319 </p> 1320 1321 <p> 1322 This C function could be used for the <code>parse_xml</code> function mentioned 1323 in the pseudo-code above: 1324 </p> 1325 1326 <pre class="eg"> 1327#define BUFF_SIZE 10240 1328 1329/* Parse a document from the open file descriptor 'fd' until the parse 1330 is complete (the document has been completely parsed, or there's 1331 been an error), or the parse is stopped. Return non-zero when 1332 the parse is merely suspended. 1333*/ 1334int 1335parse_xml(XML_Parser p, int fd) 1336{ 1337 for (;;) { 1338 int last_chunk; 1339 int bytes_read; 1340 enum XML_Status status; 1341 1342 void *buff = XML_GetBuffer(p, BUFF_SIZE); 1343 if (buff == NULL) { 1344 /* handle error... */ 1345 return 0; 1346 } 1347 bytes_read = read(fd, buff, BUFF_SIZE); 1348 if (bytes_read < 0) { 1349 /* handle error... */ 1350 return 0; 1351 } 1352 status = XML_ParseBuffer(p, bytes_read, bytes_read == 0); 1353 switch (status) { 1354 case XML_STATUS_ERROR: 1355 /* handle error... */ 1356 return 0; 1357 case XML_STATUS_SUSPENDED: 1358 return 1; 1359 } 1360 if (bytes_read == 0) 1361 return 0; 1362 } 1363} 1364</pre> 1365 <p> 1366 The corresponding <code>continue_parsing</code> function is somewhat simpler, 1367 since it only need deal with the return code from <code><a href= 1368 "#XML_ResumeParser">XML_ResumeParser</a></code>; it can delegate the input 1369 handling to the <code>parse_xml</code> function: 1370 </p> 1371 1372 <pre class="eg"> 1373/* Continue parsing a document which had been suspended. The 'p' and 1374 'fd' arguments are the same as passed to parse_xml(). Return 1375 non-zero when the parse is suspended. 1376*/ 1377int 1378continue_parsing(XML_Parser p, int fd) 1379{ 1380 enum XML_Status status = XML_ResumeParser(p); 1381 switch (status) { 1382 case XML_STATUS_ERROR: 1383 /* handle error... */ 1384 return 0; 1385 case XML_ERROR_NOT_SUSPENDED: 1386 /* handle error... */ 1387 return 0;. 1388 case XML_STATUS_SUSPENDED: 1389 return 1; 1390 } 1391 return parse_xml(p, fd); 1392} 1393</pre> 1394 <p> 1395 Now that we've seen what a mess the top-level parsing loop can become, what have 1396 we gained? Very simply, we can now use the <code><a href= 1397 "#XML_StopParser">XML_StopParser</a></code> function to stop parsing, without 1398 having to go to great lengths to avoid additional processing that we're expecting 1399 to ignore. As a bonus, we get to stop parsing <em>temporarily</em>, and come back 1400 to it when we're ready. 1401 </p> 1402 1403 <p> 1404 To stop parsing from a handler function, use the <code><a href= 1405 "#XML_StopParser">XML_StopParser</a></code> function. This function takes two 1406 arguments; the parser being stopped and a flag indicating whether the parse can 1407 be resumed in the future. 1408 </p> 1409 <!-- XXX really need more here --> 1410 1411 <hr /> 1412 <!-- ================================================================ --> 1413 1414 <h2> 1415 <a id="reference" name="reference">Expat Reference</a> 1416 </h2> 1417 1418 <h3> 1419 <a id="creation" name="creation">Parser Creation</a> 1420 </h3> 1421 1422 <h4 id="XML_ParserCreate"> 1423 XML_ParserCreate 1424 </h4> 1425 1426 <pre class="fcndec"> 1427XML_Parser XMLCALL 1428XML_ParserCreate(const XML_Char *encoding); 1429</pre> 1430 <div class="fcndef"> 1431 <p> 1432 Construct a new parser. If encoding is non-<code>NULL</code>, it specifies a 1433 character encoding to use for the document. This overrides the document 1434 encoding declaration. There are four built-in encodings: 1435 </p> 1436 1437 <ul> 1438 <li>US-ASCII 1439 </li> 1440 1441 <li>UTF-8 1442 </li> 1443 1444 <li>UTF-16 1445 </li> 1446 1447 <li>ISO-8859-1 1448 </li> 1449 </ul> 1450 1451 <p> 1452 Any other value will invoke a call to the UnknownEncodingHandler. 1453 </p> 1454 </div> 1455 1456 <h4 id="XML_ParserCreateNS"> 1457 XML_ParserCreateNS 1458 </h4> 1459 1460 <pre class="fcndec"> 1461XML_Parser XMLCALL 1462XML_ParserCreateNS(const XML_Char *encoding, 1463 XML_Char sep); 1464</pre> 1465 <div class="fcndef"> 1466 Constructs a new parser that has namespace processing in effect. Namespace 1467 expanded element names and attribute names are returned as a concatenation of the 1468 namespace URI, <em>sep</em>, and the local part of the name. This means that you 1469 should pick a character for <em>sep</em> that can't be part of an URI. Since 1470 Expat does not check namespace URIs for conformance, the only safe choice for a 1471 namespace separator is a character that is illegal in XML. For instance, 1472 <code>'\xFF'</code> is not legal in UTF-8, and <code>'\xFFFF'</code> is not legal 1473 in UTF-16. There is a special case when <em>sep</em> is the null character 1474 <code>'\0'</code>: the namespace URI and the local part will be concatenated 1475 without any separator - this is intended to support RDF processors. It is a 1476 programming error to use the null separator with <a href= 1477 "#XML_SetReturnNSTriplet">namespace triplets</a>. 1478 </div> 1479 1480 <p> 1481 <strong>Note:</strong> Expat does not validate namespace URIs (beyond encoding) 1482 against RFC 3986 today (and is not required to do so with regard to the XML 1.0 1483 namespaces specification) but it may start doing that in future releases. Before 1484 that, an application using Expat must be ready to receive namespace URIs 1485 containing non-URI characters. 1486 </p> 1487 1488 <h4 id="XML_ParserCreate_MM"> 1489 XML_ParserCreate_MM 1490 </h4> 1491 1492 <pre class="fcndec"> 1493XML_Parser XMLCALL 1494XML_ParserCreate_MM(const XML_Char *encoding, 1495 const XML_Memory_Handling_Suite *ms, 1496 const XML_Char *sep); 1497</pre> 1498 1499 <pre class="signature"> 1500typedef struct { 1501 void *(XMLCALL *malloc_fcn)(size_t size); 1502 void *(XMLCALL *realloc_fcn)(void *ptr, size_t size); 1503 void (XMLCALL *free_fcn)(void *ptr); 1504} XML_Memory_Handling_Suite; 1505</pre> 1506 <div class="fcndef"> 1507 <p> 1508 Construct a new parser using the suite of memory handling functions specified 1509 in <code>ms</code>. If <code>ms</code> is <code>NULL</code>, then use the 1510 standard set of memory management functions. If <code>sep</code> is 1511 non-<code>NULL</code>, then namespace processing is enabled in the created 1512 parser and the character pointed at by sep is used as the separator between the 1513 namespace URI and the local part of the name. 1514 </p> 1515 </div> 1516 1517 <h4 id="XML_ExternalEntityParserCreate"> 1518 XML_ExternalEntityParserCreate 1519 </h4> 1520 1521 <pre class="fcndec"> 1522XML_Parser XMLCALL 1523XML_ExternalEntityParserCreate(XML_Parser p, 1524 const XML_Char *context, 1525 const XML_Char *encoding); 1526</pre> 1527 <div class="fcndef"> 1528 <p> 1529 Construct a new <code>XML_Parser</code> object for parsing an external general 1530 entity. Context is the context argument passed in a call to a 1531 ExternalEntityRefHandler. Other state information such as handlers, user data, 1532 namespace processing is inherited from the parser passed as the 1st argument. 1533 So you shouldn't need to call any of the behavior changing functions on this 1534 parser (unless you want it to act differently than the parent parser). 1535 </p> 1536 1537 <p> 1538 <strong>Note:</strong> Please be sure to free subparsers created by 1539 <code><a href= 1540 "#XML_ExternalEntityParserCreate">XML_ExternalEntityParserCreate</a></code> 1541 <em>prior to</em> freeing their related parent parser, as subparsers reference 1542 and use parts of their respective parent parser, internally. Parent parsers 1543 must outlive subparsers. 1544 </p> 1545 </div> 1546 1547 <h4 id="XML_ParserFree"> 1548 XML_ParserFree 1549 </h4> 1550 1551 <pre class="fcndec"> 1552void XMLCALL 1553XML_ParserFree(XML_Parser p); 1554</pre> 1555 <div class="fcndef"> 1556 <p> 1557 Free memory used by the parser. 1558 </p> 1559 1560 <p> 1561 <strong>Note:</strong> Your application is responsible for freeing any memory 1562 associated with <a href="#userdata">user data</a>. 1563 </p> 1564 1565 <p> 1566 <strong>Note:</strong> Please be sure to free subparsers created by 1567 <code><a href= 1568 "#XML_ExternalEntityParserCreate">XML_ExternalEntityParserCreate</a></code> 1569 <em>prior to</em> freeing their related parent parser, as subparsers reference 1570 and use parts of their respective parent parser, internally. Parent parsers 1571 must outlive subparsers. 1572 </p> 1573 </div> 1574 1575 <h4 id="XML_ParserReset"> 1576 XML_ParserReset 1577 </h4> 1578 1579 <pre class="fcndec"> 1580XML_Bool XMLCALL 1581XML_ParserReset(XML_Parser p, 1582 const XML_Char *encoding); 1583</pre> 1584 <div class="fcndef"> 1585 Clean up the memory structures maintained by the parser so that it may be used 1586 again. After this has been called, <code>parser</code> is ready to start parsing 1587 a new document. All handlers are cleared from the parser, except for the 1588 unknownEncodingHandler. The parser's external state is re-initialized except for 1589 the values of ns and ns_triplets. This function may not be used on a parser 1590 created using <code><a href= 1591 "#XML_ExternalEntityParserCreate">XML_ExternalEntityParserCreate</a></code>; it 1592 will return <code>XML_FALSE</code> in that case. Returns <code>XML_TRUE</code> on 1593 success. Your application is responsible for dealing with any memory associated 1594 with <a href="#userdata">user data</a>. 1595 </div> 1596 1597 <h3> 1598 <a id="parsing" name="parsing">Parsing</a> 1599 </h3> 1600 1601 <p> 1602 To state the obvious: the three parsing functions <code><a href= 1603 "#XML_Parse">XML_Parse</a></code>, <code><a href= 1604 "#XML_ParseBuffer">XML_ParseBuffer</a></code> and <code><a href= 1605 "#XML_GetBuffer">XML_GetBuffer</a></code> must not be called from within a 1606 handler unless they operate on a separate parser instance, that is, one that did 1607 not call the handler. For example, it is OK to call the parsing functions from 1608 within an <code>XML_ExternalEntityRefHandler</code>, if they apply to the parser 1609 created by <code><a href= 1610 "#XML_ExternalEntityParserCreate">XML_ExternalEntityParserCreate</a></code>. 1611 </p> 1612 1613 <p> 1614 Note: The <code>len</code> argument passed to these functions should be 1615 considerably less than the maximum value for an integer, as it could create an 1616 integer overflow situation if the added lengths of a buffer and the unprocessed 1617 portion of the previous buffer exceed the maximum integer value. Input data at 1618 the end of a buffer will remain unprocessed if it is part of an XML token for 1619 which the end is not part of that buffer. 1620 </p> 1621 1622 <p> 1623 <a id="isFinal" name="isFinal"></a>The application <em>must</em> make a 1624 concluding <code><a href="#XML_Parse">XML_Parse</a></code> or <code><a href= 1625 "#XML_ParseBuffer">XML_ParseBuffer</a></code> call with <code>isFinal</code> set 1626 to <code>XML_TRUE</code>. 1627 </p> 1628 1629 <h4 id="XML_Parse"> 1630 XML_Parse 1631 </h4> 1632 1633 <pre class="fcndec"> 1634enum XML_Status XMLCALL 1635XML_Parse(XML_Parser p, 1636 const char *s, 1637 int len, 1638 int isFinal); 1639</pre> 1640 1641 <pre class="signature"> 1642enum XML_Status { 1643 XML_STATUS_ERROR = 0, 1644 XML_STATUS_OK = 1 1645}; 1646</pre> 1647 <div class="fcndef"> 1648 <p> 1649 Parse some more of the document. The string <code>s</code> is a buffer 1650 containing part (or perhaps all) of the document. The number of bytes of s that 1651 are part of the document is indicated by <code>len</code>. This means that 1652 <code>s</code> doesn't have to be null-terminated. It also means that if 1653 <code>len</code> is larger than the number of bytes in the block of memory that 1654 <code>s</code> points at, then a memory fault is likely. Negative values for 1655 <code>len</code> are rejected since Expat 2.2.1. The <code>isFinal</code> 1656 parameter informs the parser that this is the last piece of the document. 1657 Frequently, the last piece is empty (i.e. <code>len</code> is zero.) 1658 </p> 1659 1660 <p> 1661 If a parse error occurred, it returns <code>XML_STATUS_ERROR</code>. Otherwise 1662 it returns <code>XML_STATUS_OK</code> value. Note that regardless of the return 1663 value, there is no guarantee that all provided input has been parsed; only 1664 after <a href="#isFinal">the concluding call</a> will all handler callbacks and 1665 parsing errors have happened. 1666 </p> 1667 1668 <p> 1669 Simplified, <code>XML_Parse</code> can be considered a convenience wrapper that 1670 is pairing calls to <code><a href="#XML_GetBuffer">XML_GetBuffer</a></code> and 1671 <code><a href="#XML_ParseBuffer">XML_ParseBuffer</a></code> (when Expat is 1672 built with macro <code>XML_CONTEXT_BYTES</code> defined to a positive value, 1673 which is both common and default). <code>XML_Parse</code> is then functionally 1674 equivalent to calling <code><a href="#XML_GetBuffer">XML_GetBuffer</a></code>, 1675 <code>memcpy</code>, and <code><a href= 1676 "#XML_ParseBuffer">XML_ParseBuffer</a></code>. 1677 </p> 1678 1679 <p> 1680 To avoid double copying of the input, direct use of functions <code><a href= 1681 "#XML_GetBuffer">XML_GetBuffer</a></code> and <code><a href= 1682 "#XML_ParseBuffer">XML_ParseBuffer</a></code> is advised for most production 1683 use, e.g. if you're using <code>read</code> or similar functionality to fill 1684 your buffers, fill directly into the buffer from <code><a href= 1685 "#XML_GetBuffer">XML_GetBuffer</a></code>, then parse with <code><a href= 1686 "#XML_ParseBuffer">XML_ParseBuffer</a></code>. 1687 </p> 1688 </div> 1689 1690 <h4 id="XML_ParseBuffer"> 1691 XML_ParseBuffer 1692 </h4> 1693 1694 <pre class="fcndec"> 1695enum XML_Status XMLCALL 1696XML_ParseBuffer(XML_Parser p, 1697 int len, 1698 int isFinal); 1699</pre> 1700 <div class="fcndef"> 1701 <p> 1702 This is just like <code><a href="#XML_Parse">XML_Parse</a></code>, except in 1703 this case Expat provides the buffer. By obtaining the buffer from Expat with 1704 the <code><a href="#XML_GetBuffer">XML_GetBuffer</a></code> function, the 1705 application can avoid double copying of the input. 1706 </p> 1707 1708 <p> 1709 Negative values for <code>len</code> are rejected since Expat 2.6.3. 1710 </p> 1711 </div> 1712 1713 <h4 id="XML_GetBuffer"> 1714 XML_GetBuffer 1715 </h4> 1716 1717 <pre class="fcndec"> 1718void * XMLCALL 1719XML_GetBuffer(XML_Parser p, 1720 int len); 1721</pre> 1722 <div class="fcndef"> 1723 Obtain a buffer of size <code>len</code> to read a piece of the document into. A 1724 <code>NULL</code> value is returned if Expat can't allocate enough memory for 1725 this buffer. A <code>NULL</code> value may also be returned if <code>len</code> 1726 is zero. This has to be called prior to every call to <code><a href= 1727 "#XML_ParseBuffer">XML_ParseBuffer</a></code>. A typical use would look like 1728 this: 1729 1730 <pre class="eg"> 1731for (;;) { 1732 int bytes_read; 1733 void *buff = XML_GetBuffer(p, BUFF_SIZE); 1734 if (buff == NULL) { 1735 /* handle error */ 1736 } 1737 1738 bytes_read = read(docfd, buff, BUFF_SIZE); 1739 if (bytes_read < 0) { 1740 /* handle error */ 1741 } 1742 1743 if (! XML_ParseBuffer(p, bytes_read, bytes_read == 0)) { 1744 /* handle parse error */ 1745 } 1746 1747 if (bytes_read == 0) 1748 break; 1749} 1750</pre> 1751 </div> 1752 1753 <h4 id="XML_StopParser"> 1754 XML_StopParser 1755 </h4> 1756 1757 <pre class="fcndec"> 1758enum XML_Status XMLCALL 1759XML_StopParser(XML_Parser p, 1760 XML_Bool resumable); 1761</pre> 1762 <div class="fcndef"> 1763 <p> 1764 Stops parsing, causing <code><a href="#XML_Parse">XML_Parse</a></code> or 1765 <code><a href="#XML_ParseBuffer">XML_ParseBuffer</a></code> to return. Must be 1766 called from within a call-back handler, except when aborting (when 1767 <code>resumable</code> is <code>XML_FALSE</code>) an already suspended parser. 1768 Some call-backs may still follow because they would otherwise get lost, 1769 including 1770 </p> 1771 1772 <ul> 1773 <li>the end element handler for empty elements when stopped in the start 1774 element handler, 1775 </li> 1776 1777 <li>the end namespace declaration handler when stopped in the end element 1778 handler, 1779 </li> 1780 1781 <li>the character data handler when stopped in the character data handler while 1782 making multiple call-backs on a contiguous chunk of characters, 1783 </li> 1784 </ul> 1785 1786 <p> 1787 and possibly others. 1788 </p> 1789 1790 <p> 1791 This can be called from most handlers, including DTD related call-backs, except 1792 when parsing an external parameter entity and <code>resumable</code> is 1793 <code>XML_TRUE</code>. Returns <code>XML_STATUS_OK</code> when successful, 1794 <code>XML_STATUS_ERROR</code> otherwise. The possible error codes are: 1795 </p> 1796 1797 <dl> 1798 <dt> 1799 <code>XML_ERROR_NOT_STARTED</code> 1800 </dt> 1801 1802 <dd> 1803 when stopping or suspending a parser before it has started, added in Expat 1804 2.6.4. 1805 </dd> 1806 1807 <dt> 1808 <code>XML_ERROR_SUSPENDED</code> 1809 </dt> 1810 1811 <dd> 1812 when suspending an already suspended parser. 1813 </dd> 1814 1815 <dt> 1816 <code>XML_ERROR_FINISHED</code> 1817 </dt> 1818 1819 <dd> 1820 when the parser has already finished. 1821 </dd> 1822 1823 <dt> 1824 <code>XML_ERROR_SUSPEND_PE</code> 1825 </dt> 1826 1827 <dd> 1828 when suspending while parsing an external PE. 1829 </dd> 1830 </dl> 1831 1832 <p> 1833 Since the stop/resume feature requires application support in the outer parsing 1834 loop, it is an error to call this function for a parser not being handled 1835 appropriately; see <a href="#stop-resume">Temporarily Stopping Parsing</a> for 1836 more information. 1837 </p> 1838 1839 <p> 1840 When <code>resumable</code> is <code>XML_TRUE</code> then parsing is 1841 <em>suspended</em>, that is, <code><a href="#XML_Parse">XML_Parse</a></code> 1842 and <code><a href="#XML_ParseBuffer">XML_ParseBuffer</a></code> return 1843 <code>XML_STATUS_SUSPENDED</code>. Otherwise, parsing is <em>aborted</em>, that 1844 is, <code><a href="#XML_Parse">XML_Parse</a></code> and <code><a href= 1845 "#XML_ParseBuffer">XML_ParseBuffer</a></code> return 1846 <code>XML_STATUS_ERROR</code> with error code <code>XML_ERROR_ABORTED</code>. 1847 </p> 1848 1849 <p> 1850 <strong>Note:</strong> This will be applied to the current parser instance 1851 only, that is, if there is a parent parser then it will continue parsing when 1852 the external entity reference handler returns. It is up to the implementation 1853 of that handler to call <code><a href= 1854 "#XML_StopParser">XML_StopParser</a></code> on the parent parser (recursively), 1855 if one wants to stop parsing altogether. 1856 </p> 1857 1858 <p> 1859 When suspended, parsing can be resumed by calling <code><a href= 1860 "#XML_ResumeParser">XML_ResumeParser</a></code>. 1861 </p> 1862 1863 <p> 1864 New in Expat 1.95.8. 1865 </p> 1866 </div> 1867 1868 <h4 id="XML_ResumeParser"> 1869 XML_ResumeParser 1870 </h4> 1871 1872 <pre class="fcndec"> 1873enum XML_Status XMLCALL 1874XML_ResumeParser(XML_Parser p); 1875</pre> 1876 <div class="fcndef"> 1877 <p> 1878 Resumes parsing after it has been suspended with <code><a href= 1879 "#XML_StopParser">XML_StopParser</a></code>. Must not be called from within a 1880 handler call-back. Returns same status codes as <code><a href= 1881 "#XML_Parse">XML_Parse</a></code> or <code><a href= 1882 "#XML_ParseBuffer">XML_ParseBuffer</a></code>. An additional error code, 1883 <code>XML_ERROR_NOT_SUSPENDED</code>, will be returned if the parser was not 1884 currently suspended. 1885 </p> 1886 1887 <p> 1888 <strong>Note:</strong> This must be called on the most deeply nested child 1889 parser instance first, and on its parent parser only after the child parser has 1890 finished, to be applied recursively until the document entity's parser is 1891 restarted. That is, the parent parser will not resume by itself and it is up to 1892 the application to call <code><a href= 1893 "#XML_ResumeParser">XML_ResumeParser</a></code> on it at the appropriate 1894 moment. 1895 </p> 1896 1897 <p> 1898 New in Expat 1.95.8. 1899 </p> 1900 </div> 1901 1902 <h4 id="XML_GetParsingStatus"> 1903 XML_GetParsingStatus 1904 </h4> 1905 1906 <pre class="fcndec"> 1907void XMLCALL 1908XML_GetParsingStatus(XML_Parser p, 1909 XML_ParsingStatus *status); 1910</pre> 1911 1912 <pre class="signature"> 1913enum XML_Parsing { 1914 XML_INITIALIZED, 1915 XML_PARSING, 1916 XML_FINISHED, 1917 XML_SUSPENDED 1918}; 1919 1920typedef struct { 1921 enum XML_Parsing parsing; 1922 XML_Bool finalBuffer; 1923} XML_ParsingStatus; 1924</pre> 1925 <div class="fcndef"> 1926 <p> 1927 Returns status of parser with respect to being initialized, parsing, finished, 1928 or suspended, and whether the final buffer is being processed. The 1929 <code>status</code> parameter <em>must not</em> be <code>NULL</code>. 1930 </p> 1931 1932 <p> 1933 New in Expat 1.95.8. 1934 </p> 1935 </div> 1936 1937 <h3> 1938 <a id="setting" name="setting">Handler Setting</a> 1939 </h3> 1940 1941 <p> 1942 Although handlers are typically set prior to parsing and left alone, an 1943 application may choose to set or change the handler for a parsing event while the 1944 parse is in progress. For instance, your application may choose to ignore all 1945 text not descended from a <code>para</code> element. One way it could do this is 1946 to set the character handler when a para start tag is seen, and unset it for the 1947 corresponding end tag. 1948 </p> 1949 1950 <p> 1951 A handler may be <em>unset</em> by providing a <code>NULL</code> pointer to the 1952 appropriate handler setter. None of the handler setting functions have a return 1953 value. 1954 </p> 1955 1956 <p> 1957 Your handlers will be receiving strings in arrays of type <code>XML_Char</code>. 1958 This type is conditionally defined in expat.h as either <code>char</code>, 1959 <code>wchar_t</code> or <code>unsigned short</code>. The former implies UTF-8 1960 encoding, the latter two imply UTF-16 encoding. Note that you'll receive them in 1961 this form independent of the original encoding of the document. 1962 </p> 1963 1964 <div class="handler"> 1965 <h4 id="XML_SetStartElementHandler"> 1966 XML_SetStartElementHandler 1967 </h4> 1968 1969 <pre class="setter"> 1970void XMLCALL 1971XML_SetStartElementHandler(XML_Parser p, 1972 XML_StartElementHandler start); 1973</pre> 1974 1975 <pre class="signature"> 1976typedef void 1977(XMLCALL *XML_StartElementHandler)(void *userData, 1978 const XML_Char *name, 1979 const XML_Char **atts); 1980</pre> 1981 <p> 1982 Set handler for start (and empty) tags. Attributes are passed to the start 1983 handler as a pointer to a vector of char pointers. Each attribute seen in a 1984 start (or empty) tag occupies 2 consecutive places in this vector: the 1985 attribute name followed by the attribute value. These pairs are terminated by a 1986 <code>NULL</code> pointer. 1987 </p> 1988 1989 <p> 1990 Note that an empty tag generates a call to both start and end handlers (in that 1991 order). 1992 </p> 1993 </div> 1994 1995 <div class="handler"> 1996 <h4 id="XML_SetEndElementHandler"> 1997 XML_SetEndElementHandler 1998 </h4> 1999 2000 <pre class="setter"> 2001void XMLCALL 2002XML_SetEndElementHandler(XML_Parser p, 2003 XML_EndElementHandler); 2004</pre> 2005 2006 <pre class="signature"> 2007typedef void 2008(XMLCALL *XML_EndElementHandler)(void *userData, 2009 const XML_Char *name); 2010</pre> 2011 <p> 2012 Set handler for end (and empty) tags. As noted above, an empty tag generates a 2013 call to both start and end handlers. 2014 </p> 2015 </div> 2016 2017 <div class="handler"> 2018 <h4 id="XML_SetElementHandler"> 2019 XML_SetElementHandler 2020 </h4> 2021 2022 <pre class="setter"> 2023void XMLCALL 2024XML_SetElementHandler(XML_Parser p, 2025 XML_StartElementHandler start, 2026 XML_EndElementHandler end); 2027</pre> 2028 <p> 2029 Set handlers for start and end tags with one call. 2030 </p> 2031 </div> 2032 2033 <div class="handler"> 2034 <h4 id="XML_SetCharacterDataHandler"> 2035 XML_SetCharacterDataHandler 2036 </h4> 2037 2038 <pre class="setter"> 2039void XMLCALL 2040XML_SetCharacterDataHandler(XML_Parser p, 2041 XML_CharacterDataHandler charhndl) 2042</pre> 2043 2044 <pre class="signature"> 2045typedef void 2046(XMLCALL *XML_CharacterDataHandler)(void *userData, 2047 const XML_Char *s, 2048 int len); 2049</pre> 2050 <p> 2051 Set a text handler. The string your handler receives is <em>NOT 2052 null-terminated</em>. You have to use the length argument to deal with the end 2053 of the string. A single block of contiguous text free of markup may still 2054 result in a sequence of calls to this handler. In other words, if you're 2055 searching for a pattern in the text, it may be split across calls to this 2056 handler. Note: Setting this handler to <code>NULL</code> may <em>NOT 2057 immediately</em> terminate call-backs if the parser is currently processing 2058 such a single block of contiguous markup-free text, as the parser will continue 2059 calling back until the end of the block is reached. 2060 </p> 2061 </div> 2062 2063 <div class="handler"> 2064 <h4 id="XML_SetProcessingInstructionHandler"> 2065 XML_SetProcessingInstructionHandler 2066 </h4> 2067 2068 <pre class="setter"> 2069void XMLCALL 2070XML_SetProcessingInstructionHandler(XML_Parser p, 2071 XML_ProcessingInstructionHandler proc) 2072</pre> 2073 2074 <pre class="signature"> 2075typedef void 2076(XMLCALL *XML_ProcessingInstructionHandler)(void *userData, 2077 const XML_Char *target, 2078 const XML_Char *data); 2079 2080</pre> 2081 <p> 2082 Set a handler for processing instructions. The target is the first word in the 2083 processing instruction. The data is the rest of the characters in it after 2084 skipping all whitespace after the initial word. 2085 </p> 2086 </div> 2087 2088 <div class="handler"> 2089 <h4 id="XML_SetCommentHandler"> 2090 XML_SetCommentHandler 2091 </h4> 2092 2093 <pre class="setter"> 2094void XMLCALL 2095XML_SetCommentHandler(XML_Parser p, 2096 XML_CommentHandler cmnt) 2097</pre> 2098 2099 <pre class="signature"> 2100typedef void 2101(XMLCALL *XML_CommentHandler)(void *userData, 2102 const XML_Char *data); 2103</pre> 2104 <p> 2105 Set a handler for comments. The data is all text inside the comment delimiters. 2106 </p> 2107 </div> 2108 2109 <div class="handler"> 2110 <h4 id="XML_SetStartCdataSectionHandler"> 2111 XML_SetStartCdataSectionHandler 2112 </h4> 2113 2114 <pre class="setter"> 2115void XMLCALL 2116XML_SetStartCdataSectionHandler(XML_Parser p, 2117 XML_StartCdataSectionHandler start); 2118</pre> 2119 2120 <pre class="signature"> 2121typedef void 2122(XMLCALL *XML_StartCdataSectionHandler)(void *userData); 2123</pre> 2124 <p> 2125 Set a handler that gets called at the beginning of a CDATA section. 2126 </p> 2127 </div> 2128 2129 <div class="handler"> 2130 <h4 id="XML_SetEndCdataSectionHandler"> 2131 XML_SetEndCdataSectionHandler 2132 </h4> 2133 2134 <pre class="setter"> 2135void XMLCALL 2136XML_SetEndCdataSectionHandler(XML_Parser p, 2137 XML_EndCdataSectionHandler end); 2138</pre> 2139 2140 <pre class="signature"> 2141typedef void 2142(XMLCALL *XML_EndCdataSectionHandler)(void *userData); 2143</pre> 2144 <p> 2145 Set a handler that gets called at the end of a CDATA section. 2146 </p> 2147 </div> 2148 2149 <div class="handler"> 2150 <h4 id="XML_SetCdataSectionHandler"> 2151 XML_SetCdataSectionHandler 2152 </h4> 2153 2154 <pre class="setter"> 2155void XMLCALL 2156XML_SetCdataSectionHandler(XML_Parser p, 2157 XML_StartCdataSectionHandler start, 2158 XML_EndCdataSectionHandler end) 2159</pre> 2160 <p> 2161 Sets both CDATA section handlers with one call. 2162 </p> 2163 </div> 2164 2165 <div class="handler"> 2166 <h4 id="XML_SetDefaultHandler"> 2167 XML_SetDefaultHandler 2168 </h4> 2169 2170 <pre class="setter"> 2171void XMLCALL 2172XML_SetDefaultHandler(XML_Parser p, 2173 XML_DefaultHandler hndl) 2174</pre> 2175 2176 <pre class="signature"> 2177typedef void 2178(XMLCALL *XML_DefaultHandler)(void *userData, 2179 const XML_Char *s, 2180 int len); 2181</pre> 2182 <p> 2183 Sets a handler for any characters in the document which wouldn't otherwise be 2184 handled. This includes both data for which no handlers can be set (like some 2185 kinds of DTD declarations) and data which could be reported but which currently 2186 has no handler set. The characters are passed exactly as they were present in 2187 the XML document except that they will be encoded in UTF-8 or UTF-16. Line 2188 boundaries are not normalized. Note that a byte order mark character is not 2189 passed to the default handler. There are no guarantees about how characters are 2190 divided between calls to the default handler: for example, a comment might be 2191 split between multiple calls. Setting the handler with this call has the side 2192 effect of turning off expansion of references to internally defined general 2193 entities. Instead these references are passed to the default handler. 2194 </p> 2195 2196 <p> 2197 See also <code><a href="#XML_DefaultCurrent">XML_DefaultCurrent</a></code>. 2198 </p> 2199 </div> 2200 2201 <div class="handler"> 2202 <h4 id="XML_SetDefaultHandlerExpand"> 2203 XML_SetDefaultHandlerExpand 2204 </h4> 2205 2206 <pre class="setter"> 2207void XMLCALL 2208XML_SetDefaultHandlerExpand(XML_Parser p, 2209 XML_DefaultHandler hndl) 2210</pre> 2211 2212 <pre class="signature"> 2213typedef void 2214(XMLCALL *XML_DefaultHandler)(void *userData, 2215 const XML_Char *s, 2216 int len); 2217</pre> 2218 <p> 2219 This sets a default handler, but doesn't inhibit the expansion of internal 2220 entity references. The entity reference will not be passed to the default 2221 handler. 2222 </p> 2223 2224 <p> 2225 See also <code><a href="#XML_DefaultCurrent">XML_DefaultCurrent</a></code>. 2226 </p> 2227 </div> 2228 2229 <div class="handler"> 2230 <h4 id="XML_SetExternalEntityRefHandler"> 2231 XML_SetExternalEntityRefHandler 2232 </h4> 2233 2234 <pre class="setter"> 2235void XMLCALL 2236XML_SetExternalEntityRefHandler(XML_Parser p, 2237 XML_ExternalEntityRefHandler hndl) 2238</pre> 2239 2240 <pre class="signature"> 2241typedef int 2242(XMLCALL *XML_ExternalEntityRefHandler)(XML_Parser p, 2243 const XML_Char *context, 2244 const XML_Char *base, 2245 const XML_Char *systemId, 2246 const XML_Char *publicId); 2247</pre> 2248 <p> 2249 Set an external entity reference handler. This handler is also called for 2250 processing an external DTD subset if parameter entity parsing is in effect. 2251 (See <a href= 2252 "#XML_SetParamEntityParsing"><code>XML_SetParamEntityParsing</code></a>.) 2253 </p> 2254 2255 <p> 2256 <strong>Warning:</strong> Using an external entity reference handler can lead 2257 to <a href="https://libexpat.github.io/doc/xml-security/#external-entities">XXE 2258 vulnerabilities</a>. It should only be used in applications that do not parse 2259 untrusted XML input. 2260 </p> 2261 2262 <p> 2263 The <code>context</code> parameter specifies the parsing context in the format 2264 expected by the <code>context</code> argument to <code><a href= 2265 "#XML_ExternalEntityParserCreate">XML_ExternalEntityParserCreate</a></code>. 2266 <code>code</code> is valid only until the handler returns, so if the referenced 2267 entity is to be parsed later, it must be copied. <code>context</code> is 2268 <code>NULL</code> only when the entity is a parameter entity, which is how one 2269 can differentiate between general and parameter entities. 2270 </p> 2271 2272 <p> 2273 The <code>base</code> parameter is the base to use for relative system 2274 identifiers. It is set by <code><a href="#XML_SetBase">XML_SetBase</a></code> 2275 and may be <code>NULL</code>. The <code>publicId</code> parameter is the public 2276 id given in the entity declaration and may be <code>NULL</code>. 2277 <code>systemId</code> is the system identifier specified in the entity 2278 declaration and is never <code>NULL</code>. 2279 </p> 2280 2281 <p> 2282 There are a couple of ways in which this handler differs from others. First, 2283 this handler returns a status indicator (an integer). 2284 <code>XML_STATUS_OK</code> should be returned for successful handling of the 2285 external entity reference. Returning <code>XML_STATUS_ERROR</code> indicates 2286 failure, and causes the calling parser to return an 2287 <code>XML_ERROR_EXTERNAL_ENTITY_HANDLING</code> error. 2288 </p> 2289 2290 <p> 2291 Second, instead of having the user data as its first argument, it receives the 2292 parser that encountered the entity reference. This, along with the context 2293 parameter, may be used as arguments to a call to <code><a href= 2294 "#XML_ExternalEntityParserCreate">XML_ExternalEntityParserCreate</a></code>. 2295 Using the returned parser, the body of the external entity can be recursively 2296 parsed. 2297 </p> 2298 2299 <p> 2300 Since this handler may be called recursively, it should not be saving 2301 information into global or static variables. 2302 </p> 2303 </div> 2304 2305 <h4 id="XML_SetExternalEntityRefHandlerArg"> 2306 XML_SetExternalEntityRefHandlerArg 2307 </h4> 2308 2309 <pre class="fcndec"> 2310void XMLCALL 2311XML_SetExternalEntityRefHandlerArg(XML_Parser p, 2312 void *arg) 2313</pre> 2314 <div class="fcndef"> 2315 <p> 2316 Set the argument passed to the ExternalEntityRefHandler. If <code>arg</code> is 2317 not <code>NULL</code>, it is the new value passed to the handler set using 2318 <code><a href= 2319 "#XML_SetExternalEntityRefHandler">XML_SetExternalEntityRefHandler</a></code>; 2320 if <code>arg</code> is <code>NULL</code>, the argument passed to the handler 2321 function will be the parser object itself. 2322 </p> 2323 2324 <p> 2325 <strong>Note:</strong> The type of <code>arg</code> and the type of the first 2326 argument to the ExternalEntityRefHandler do not match. This function takes a 2327 <code>void *</code> to be passed to the handler, while the handler accepts an 2328 <code>XML_Parser</code>. This is a historical accident, but will not be 2329 corrected before Expat 2.0 (at the earliest) to avoid causing compiler warnings 2330 for code that's known to work with this API. It is the responsibility of the 2331 application code to know the actual type of the argument passed to the handler 2332 and to manage it properly. 2333 </p> 2334 </div> 2335 2336 <div class="handler"> 2337 <h4 id="XML_SetSkippedEntityHandler"> 2338 XML_SetSkippedEntityHandler 2339 </h4> 2340 2341 <pre class="setter"> 2342void XMLCALL 2343XML_SetSkippedEntityHandler(XML_Parser p, 2344 XML_SkippedEntityHandler handler) 2345</pre> 2346 2347 <pre class="signature"> 2348typedef void 2349(XMLCALL *XML_SkippedEntityHandler)(void *userData, 2350 const XML_Char *entityName, 2351 int is_parameter_entity); 2352</pre> 2353 <p> 2354 Set a skipped entity handler. This is called in two situations: 2355 </p> 2356 2357 <ol> 2358 <li>An entity reference is encountered for which no declaration has been read 2359 <em>and</em> this is not an error. 2360 </li> 2361 2362 <li>An internal entity reference is read, but not expanded, because <a href= 2363 "#XML_SetDefaultHandler"><code>XML_SetDefaultHandler</code></a> has been 2364 called. 2365 </li> 2366 </ol> 2367 2368 <p> 2369 The <code>is_parameter_entity</code> argument will be non-zero for a parameter 2370 entity and zero for a general entity. 2371 </p> 2372 2373 <p> 2374 Note: Skipped parameter entities in declarations and skipped general entities 2375 in attribute values cannot be reported, because the event would be out of sync 2376 with the reporting of the declarations or attribute values 2377 </p> 2378 </div> 2379 2380 <div class="handler"> 2381 <h4 id="XML_SetUnknownEncodingHandler"> 2382 XML_SetUnknownEncodingHandler 2383 </h4> 2384 2385 <pre class="setter"> 2386void XMLCALL 2387XML_SetUnknownEncodingHandler(XML_Parser p, 2388 XML_UnknownEncodingHandler enchandler, 2389 void *encodingHandlerData) 2390</pre> 2391 2392 <pre class="signature"> 2393typedef int 2394(XMLCALL *XML_UnknownEncodingHandler)(void *encodingHandlerData, 2395 const XML_Char *name, 2396 XML_Encoding *info); 2397 2398typedef struct { 2399 int map[256]; 2400 void *data; 2401 int (XMLCALL *convert)(void *data, const char *s); 2402 void (XMLCALL *release)(void *data); 2403} XML_Encoding; 2404</pre> 2405 <p> 2406 Set a handler to deal with encodings other than the <a href= 2407 "#builtin_encodings">built in set</a>. This should be done before 2408 <code><a href="#XML_Parse">XML_Parse</a></code> or <code><a href= 2409 "#XML_ParseBuffer">XML_ParseBuffer</a></code> have been called on the given 2410 parser. 2411 </p> 2412 2413 <p> 2414 If the handler knows how to deal with an encoding with the given name, it 2415 should fill in the <code>info</code> data structure and return 2416 <code>XML_STATUS_OK</code>. Otherwise it should return 2417 <code>XML_STATUS_ERROR</code>. The handler will be called at most once per 2418 parsed (external) entity. The optional application data pointer 2419 <code>encodingHandlerData</code> will be passed back to the handler. 2420 </p> 2421 2422 <p> 2423 The map array contains information for every possible leading byte in a byte 2424 sequence. If the corresponding value is >= 0, then it's a single byte 2425 sequence and the byte encodes that Unicode value. If the value is -1, then that 2426 byte is invalid as the initial byte in a sequence. If the value is -n, where n 2427 is an integer > 1, then n is the number of bytes in the sequence and the 2428 actual conversion is accomplished by a call to the function pointed at by 2429 convert. This function may return -1 if the sequence itself is invalid. The 2430 convert pointer may be <code>NULL</code> if there are only single byte codes. 2431 The data parameter passed to the convert function is the data pointer from 2432 <code>XML_Encoding</code>. The string s is <em>NOT</em> null-terminated and 2433 points at the sequence of bytes to be converted. 2434 </p> 2435 2436 <p> 2437 The function pointed at by <code>release</code> is called by the parser when it 2438 is finished with the encoding. It may be <code>NULL</code>. 2439 </p> 2440 </div> 2441 2442 <div class="handler"> 2443 <h4 id="XML_SetStartNamespaceDeclHandler"> 2444 XML_SetStartNamespaceDeclHandler 2445 </h4> 2446 2447 <pre class="setter"> 2448void XMLCALL 2449XML_SetStartNamespaceDeclHandler(XML_Parser p, 2450 XML_StartNamespaceDeclHandler start); 2451</pre> 2452 2453 <pre class="signature"> 2454typedef void 2455(XMLCALL *XML_StartNamespaceDeclHandler)(void *userData, 2456 const XML_Char *prefix, 2457 const XML_Char *uri); 2458</pre> 2459 <p> 2460 Set a handler to be called when a namespace is declared. Namespace declarations 2461 occur inside start tags. But the namespace declaration start handler is called 2462 before the start tag handler for each namespace declared in that start tag. 2463 </p> 2464 </div> 2465 2466 <div class="handler"> 2467 <h4 id="XML_SetEndNamespaceDeclHandler"> 2468 XML_SetEndNamespaceDeclHandler 2469 </h4> 2470 2471 <pre class="setter"> 2472void XMLCALL 2473XML_SetEndNamespaceDeclHandler(XML_Parser p, 2474 XML_EndNamespaceDeclHandler end); 2475</pre> 2476 2477 <pre class="signature"> 2478typedef void 2479(XMLCALL *XML_EndNamespaceDeclHandler)(void *userData, 2480 const XML_Char *prefix); 2481</pre> 2482 <p> 2483 Set a handler to be called when leaving the scope of a namespace declaration. 2484 This will be called, for each namespace declaration, after the handler for the 2485 end tag of the element in which the namespace was declared. 2486 </p> 2487 </div> 2488 2489 <div class="handler"> 2490 <h4 id="XML_SetNamespaceDeclHandler"> 2491 XML_SetNamespaceDeclHandler 2492 </h4> 2493 2494 <pre class="setter"> 2495void XMLCALL 2496XML_SetNamespaceDeclHandler(XML_Parser p, 2497 XML_StartNamespaceDeclHandler start, 2498 XML_EndNamespaceDeclHandler end) 2499</pre> 2500 <p> 2501 Sets both namespace declaration handlers with a single call. 2502 </p> 2503 </div> 2504 2505 <div class="handler"> 2506 <h4 id="XML_SetXmlDeclHandler"> 2507 XML_SetXmlDeclHandler 2508 </h4> 2509 2510 <pre class="setter"> 2511void XMLCALL 2512XML_SetXmlDeclHandler(XML_Parser p, 2513 XML_XmlDeclHandler xmldecl); 2514</pre> 2515 2516 <pre class="signature"> 2517typedef void 2518(XMLCALL *XML_XmlDeclHandler)(void *userData, 2519 const XML_Char *version, 2520 const XML_Char *encoding, 2521 int standalone); 2522</pre> 2523 <p> 2524 Sets a handler that is called for XML declarations and also for text 2525 declarations discovered in external entities. The way to distinguish is that 2526 the <code>version</code> parameter will be <code>NULL</code> for text 2527 declarations. The <code>encoding</code> parameter may be <code>NULL</code> for 2528 an XML declaration. The <code>standalone</code> argument will contain -1, 0, or 2529 1 indicating respectively that there was no standalone parameter in the 2530 declaration, that it was given as no, or that it was given as yes. 2531 </p> 2532 </div> 2533 2534 <div class="handler"> 2535 <h4 id="XML_SetStartDoctypeDeclHandler"> 2536 XML_SetStartDoctypeDeclHandler 2537 </h4> 2538 2539 <pre class="setter"> 2540void XMLCALL 2541XML_SetStartDoctypeDeclHandler(XML_Parser p, 2542 XML_StartDoctypeDeclHandler start); 2543</pre> 2544 2545 <pre class="signature"> 2546typedef void 2547(XMLCALL *XML_StartDoctypeDeclHandler)(void *userData, 2548 const XML_Char *doctypeName, 2549 const XML_Char *sysid, 2550 const XML_Char *pubid, 2551 int has_internal_subset); 2552</pre> 2553 <p> 2554 Set a handler that is called at the start of a DOCTYPE declaration, before any 2555 external or internal subset is parsed. Both <code>sysid</code> and 2556 <code>pubid</code> may be <code>NULL</code>. The 2557 <code>has_internal_subset</code> will be non-zero if the DOCTYPE declaration 2558 has an internal subset. 2559 </p> 2560 </div> 2561 2562 <div class="handler"> 2563 <h4 id="XML_SetEndDoctypeDeclHandler"> 2564 XML_SetEndDoctypeDeclHandler 2565 </h4> 2566 2567 <pre class="setter"> 2568void XMLCALL 2569XML_SetEndDoctypeDeclHandler(XML_Parser p, 2570 XML_EndDoctypeDeclHandler end); 2571</pre> 2572 2573 <pre class="signature"> 2574typedef void 2575(XMLCALL *XML_EndDoctypeDeclHandler)(void *userData); 2576</pre> 2577 <p> 2578 Set a handler that is called at the end of a DOCTYPE declaration, after parsing 2579 any external subset. 2580 </p> 2581 </div> 2582 2583 <div class="handler"> 2584 <h4 id="XML_SetDoctypeDeclHandler"> 2585 XML_SetDoctypeDeclHandler 2586 </h4> 2587 2588 <pre class="setter"> 2589void XMLCALL 2590XML_SetDoctypeDeclHandler(XML_Parser p, 2591 XML_StartDoctypeDeclHandler start, 2592 XML_EndDoctypeDeclHandler end); 2593</pre> 2594 <p> 2595 Set both doctype handlers with one call. 2596 </p> 2597 </div> 2598 2599 <div class="handler"> 2600 <h4 id="XML_SetElementDeclHandler"> 2601 XML_SetElementDeclHandler 2602 </h4> 2603 2604 <pre class="setter"> 2605void XMLCALL 2606XML_SetElementDeclHandler(XML_Parser p, 2607 XML_ElementDeclHandler eldecl); 2608</pre> 2609 2610 <pre class="signature"> 2611typedef void 2612(XMLCALL *XML_ElementDeclHandler)(void *userData, 2613 const XML_Char *name, 2614 XML_Content *model); 2615</pre> 2616 2617 <pre class="signature"> 2618enum XML_Content_Type { 2619 XML_CTYPE_EMPTY = 1, 2620 XML_CTYPE_ANY, 2621 XML_CTYPE_MIXED, 2622 XML_CTYPE_NAME, 2623 XML_CTYPE_CHOICE, 2624 XML_CTYPE_SEQ 2625}; 2626 2627enum XML_Content_Quant { 2628 XML_CQUANT_NONE, 2629 XML_CQUANT_OPT, 2630 XML_CQUANT_REP, 2631 XML_CQUANT_PLUS 2632}; 2633 2634typedef struct XML_cp XML_Content; 2635 2636struct XML_cp { 2637 enum XML_Content_Type type; 2638 enum XML_Content_Quant quant; 2639 const XML_Char * name; 2640 unsigned int numchildren; 2641 XML_Content * children; 2642}; 2643</pre> 2644 <p> 2645 Sets a handler for element declarations in a DTD. The handler gets called with 2646 the name of the element in the declaration and a pointer to a structure that 2647 contains the element model. It's the user code's responsibility to free model 2648 when finished with via a call to <code><a href= 2649 "#XML_FreeContentModel">XML_FreeContentModel</a></code>. There is no need to 2650 free the model from the handler, it can be kept around and freed at a later 2651 stage. 2652 </p> 2653 2654 <p> 2655 The <code>model</code> argument is the root of a tree of 2656 <code>XML_Content</code> nodes. If <code>type</code> equals 2657 <code>XML_CTYPE_EMPTY</code> or <code>XML_CTYPE_ANY</code>, then 2658 <code>quant</code> will be <code>XML_CQUANT_NONE</code>, and the other fields 2659 will be zero or <code>NULL</code>. If <code>type</code> is 2660 <code>XML_CTYPE_MIXED</code>, then <code>quant</code> will be 2661 <code>XML_CQUANT_NONE</code> or <code>XML_CQUANT_REP</code> and 2662 <code>numchildren</code> will contain the number of elements that are allowed 2663 to be mixed in and <code>children</code> points to an array of 2664 <code>XML_Content</code> structures that will all have type XML_CTYPE_NAME with 2665 no quantification. Only the root node can be type <code>XML_CTYPE_EMPTY</code>, 2666 <code>XML_CTYPE_ANY</code>, or <code>XML_CTYPE_MIXED</code>. 2667 </p> 2668 2669 <p> 2670 For type <code>XML_CTYPE_NAME</code>, the <code>name</code> field points to the 2671 name and the <code>numchildren</code> and <code>children</code> fields will be 2672 zero and <code>NULL</code>. The <code>quant</code> field will indicate any 2673 quantifiers placed on the name. 2674 </p> 2675 2676 <p> 2677 Types <code>XML_CTYPE_CHOICE</code> and <code>XML_CTYPE_SEQ</code> indicate a 2678 choice or sequence respectively. The <code>numchildren</code> field indicates 2679 how many nodes in the choice or sequence and <code>children</code> points to 2680 the nodes. 2681 </p> 2682 </div> 2683 2684 <div class="handler"> 2685 <h4 id="XML_SetAttlistDeclHandler"> 2686 XML_SetAttlistDeclHandler 2687 </h4> 2688 2689 <pre class="setter"> 2690void XMLCALL 2691XML_SetAttlistDeclHandler(XML_Parser p, 2692 XML_AttlistDeclHandler attdecl); 2693</pre> 2694 2695 <pre class="signature"> 2696typedef void 2697(XMLCALL *XML_AttlistDeclHandler)(void *userData, 2698 const XML_Char *elname, 2699 const XML_Char *attname, 2700 const XML_Char *att_type, 2701 const XML_Char *dflt, 2702 int isrequired); 2703</pre> 2704 <p> 2705 Set a handler for attlist declarations in the DTD. This handler is called for 2706 <em>each</em> attribute. So a single attlist declaration with multiple 2707 attributes declared will generate multiple calls to this handler. The 2708 <code>elname</code> parameter returns the name of the element for which the 2709 attribute is being declared. The attribute name is in the <code>attname</code> 2710 parameter. The attribute type is in the <code>att_type</code> parameter. It is 2711 the string representing the type in the declaration with whitespace removed. 2712 </p> 2713 2714 <p> 2715 The <code>dflt</code> parameter holds the default value. It will be 2716 <code>NULL</code> in the case of "#IMPLIED" or "#REQUIRED" attributes. You can 2717 distinguish these two cases by checking the <code>isrequired</code> parameter, 2718 which will be true in the case of "#REQUIRED" attributes. Attributes which are 2719 "#FIXED" will have also have a true <code>isrequired</code>, but they will have 2720 the non-<code>NULL</code> fixed value in the <code>dflt</code> parameter. 2721 </p> 2722 </div> 2723 2724 <div class="handler"> 2725 <h4 id="XML_SetEntityDeclHandler"> 2726 XML_SetEntityDeclHandler 2727 </h4> 2728 2729 <pre class="setter"> 2730void XMLCALL 2731XML_SetEntityDeclHandler(XML_Parser p, 2732 XML_EntityDeclHandler handler); 2733</pre> 2734 2735 <pre class="signature"> 2736typedef void 2737(XMLCALL *XML_EntityDeclHandler)(void *userData, 2738 const XML_Char *entityName, 2739 int is_parameter_entity, 2740 const XML_Char *value, 2741 int value_length, 2742 const XML_Char *base, 2743 const XML_Char *systemId, 2744 const XML_Char *publicId, 2745 const XML_Char *notationName); 2746</pre> 2747 <p> 2748 Sets a handler that will be called for all entity declarations. The 2749 <code>is_parameter_entity</code> argument will be non-zero in the case of 2750 parameter entities and zero otherwise. 2751 </p> 2752 2753 <p> 2754 For internal entities (<code><!ENTITY foo "bar"></code>), 2755 <code>value</code> will be non-<code>NULL</code> and <code>systemId</code>, 2756 <code>publicId</code>, and <code>notationName</code> will all be 2757 <code>NULL</code>. The value string is <em>not</em> null-terminated; the length 2758 is provided in the <code>value_length</code> parameter. Do not use 2759 <code>value_length</code> to test for internal entities, since it is legal to 2760 have zero-length values. Instead check for whether or not <code>value</code> is 2761 <code>NULL</code>. 2762 </p> 2763 2764 <p> 2765 The <code>notationName</code> argument will have a non-<code>NULL</code> value 2766 only for unparsed entity declarations. 2767 </p> 2768 </div> 2769 2770 <div class="handler"> 2771 <h4 id="XML_SetUnparsedEntityDeclHandler"> 2772 XML_SetUnparsedEntityDeclHandler 2773 </h4> 2774 2775 <pre class="setter"> 2776void XMLCALL 2777XML_SetUnparsedEntityDeclHandler(XML_Parser p, 2778 XML_UnparsedEntityDeclHandler h) 2779</pre> 2780 2781 <pre class="signature"> 2782typedef void 2783(XMLCALL *XML_UnparsedEntityDeclHandler)(void *userData, 2784 const XML_Char *entityName, 2785 const XML_Char *base, 2786 const XML_Char *systemId, 2787 const XML_Char *publicId, 2788 const XML_Char *notationName); 2789</pre> 2790 <p> 2791 Set a handler that receives declarations of unparsed entities. These are entity 2792 declarations that have a notation (NDATA) field: 2793 </p> 2794 2795 <div id="eg"> 2796 <pre> 2797<!ENTITY logo SYSTEM "images/logo.gif" NDATA gif> 2798</pre> 2799 </div> 2800 2801 <p> 2802 This handler is obsolete and is provided for backwards compatibility. Use 2803 instead <a href="#XML_SetEntityDeclHandler">XML_SetEntityDeclHandler</a>. 2804 </p> 2805 </div> 2806 2807 <div class="handler"> 2808 <h4 id="XML_SetNotationDeclHandler"> 2809 XML_SetNotationDeclHandler 2810 </h4> 2811 2812 <pre class="setter"> 2813void XMLCALL 2814XML_SetNotationDeclHandler(XML_Parser p, 2815 XML_NotationDeclHandler h) 2816</pre> 2817 2818 <pre class="signature"> 2819typedef void 2820(XMLCALL *XML_NotationDeclHandler)(void *userData, 2821 const XML_Char *notationName, 2822 const XML_Char *base, 2823 const XML_Char *systemId, 2824 const XML_Char *publicId); 2825</pre> 2826 <p> 2827 Set a handler that receives notation declarations. 2828 </p> 2829 </div> 2830 2831 <div class="handler"> 2832 <h4 id="XML_SetNotStandaloneHandler"> 2833 XML_SetNotStandaloneHandler 2834 </h4> 2835 2836 <pre class="setter"> 2837void XMLCALL 2838XML_SetNotStandaloneHandler(XML_Parser p, 2839 XML_NotStandaloneHandler h) 2840</pre> 2841 2842 <pre class="signature"> 2843typedef int 2844(XMLCALL *XML_NotStandaloneHandler)(void *userData); 2845</pre> 2846 <p> 2847 Set a handler that is called if the document is not "standalone". This happens 2848 when there is an external subset or a reference to a parameter entity, but does 2849 not have standalone set to "yes" in an XML declaration. If this handler returns 2850 <code>XML_STATUS_ERROR</code>, then the parser will throw an 2851 <code>XML_ERROR_NOT_STANDALONE</code> error. 2852 </p> 2853 </div> 2854 2855 <h3> 2856 <a id="position" name="position">Parse position and error reporting functions</a> 2857 </h3> 2858 2859 <p> 2860 These are the functions you'll want to call when the parse functions return 2861 <code>XML_STATUS_ERROR</code> (a parse error has occurred), although the position 2862 reporting functions are useful outside of errors. The position reported is the 2863 byte position (in the original document or entity encoding) of the first of the 2864 sequence of characters that generated the current event (or the error that caused 2865 the parse functions to return <code>XML_STATUS_ERROR</code>.) The exceptions are 2866 callbacks triggered by declarations in the document prologue, in which case they 2867 exact position reported is somewhere in the relevant markup, but not necessarily 2868 as meaningful as for other events. 2869 </p> 2870 2871 <p> 2872 The position reporting functions are accurate only outside of the DTD. In other 2873 words, they usually return bogus information when called from within a DTD 2874 declaration handler. 2875 </p> 2876 2877 <h4 id="XML_GetErrorCode"> 2878 XML_GetErrorCode 2879 </h4> 2880 2881 <pre class="fcndec"> 2882enum XML_Error XMLCALL 2883XML_GetErrorCode(XML_Parser p); 2884</pre> 2885 <div class="fcndef"> 2886 Return what type of error has occurred. 2887 </div> 2888 2889 <h4 id="XML_ErrorString"> 2890 XML_ErrorString 2891 </h4> 2892 2893 <pre class="fcndec"> 2894const XML_LChar * XMLCALL 2895XML_ErrorString(enum XML_Error code); 2896</pre> 2897 <div class="fcndef"> 2898 Return a string describing the error corresponding to code. The code should be 2899 one of the enums that can be returned from <code><a href= 2900 "#XML_GetErrorCode">XML_GetErrorCode</a></code>. 2901 </div> 2902 2903 <h4 id="XML_GetCurrentByteIndex"> 2904 XML_GetCurrentByteIndex 2905 </h4> 2906 2907 <pre class="fcndec"> 2908XML_Index XMLCALL 2909XML_GetCurrentByteIndex(XML_Parser p); 2910</pre> 2911 <div class="fcndef"> 2912 Return the byte offset of the position. This always corresponds to the values 2913 returned by <code><a href= 2914 "#XML_GetCurrentLineNumber">XML_GetCurrentLineNumber</a></code> and 2915 <code><a href="#XML_GetCurrentColumnNumber">XML_GetCurrentColumnNumber</a></code>. 2916 </div> 2917 2918 <h4 id="XML_GetCurrentLineNumber"> 2919 XML_GetCurrentLineNumber 2920 </h4> 2921 2922 <pre class="fcndec"> 2923XML_Size XMLCALL 2924XML_GetCurrentLineNumber(XML_Parser p); 2925</pre> 2926 <div class="fcndef"> 2927 Return the line number of the position. The first line is reported as 2928 <code>1</code>. 2929 </div> 2930 2931 <h4 id="XML_GetCurrentColumnNumber"> 2932 XML_GetCurrentColumnNumber 2933 </h4> 2934 2935 <pre class="fcndec"> 2936XML_Size XMLCALL 2937XML_GetCurrentColumnNumber(XML_Parser p); 2938</pre> 2939 <div class="fcndef"> 2940 Return the <em>offset</em>, from the beginning of the current line, of the 2941 position. The first column is reported as <code>0</code>. 2942 </div> 2943 2944 <h4 id="XML_GetCurrentByteCount"> 2945 XML_GetCurrentByteCount 2946 </h4> 2947 2948 <pre class="fcndec"> 2949int XMLCALL 2950XML_GetCurrentByteCount(XML_Parser p); 2951</pre> 2952 <div class="fcndef"> 2953 Return the number of bytes in the current event. Returns <code>0</code> if the 2954 event is inside a reference to an internal entity and for the end-tag event for 2955 empty element tags (the later can be used to distinguish empty-element tags from 2956 empty elements using separate start and end tags). 2957 </div> 2958 2959 <h4 id="XML_GetInputContext"> 2960 XML_GetInputContext 2961 </h4> 2962 2963 <pre class="fcndec"> 2964const char * XMLCALL 2965XML_GetInputContext(XML_Parser p, 2966 int *offset, 2967 int *size); 2968</pre> 2969 <div class="fcndef"> 2970 <p> 2971 Returns the parser's input buffer, sets the integer pointed at by 2972 <code>offset</code> to the offset within this buffer of the current parse 2973 position, and set the integer pointed at by <code>size</code> to the size of 2974 the returned buffer. 2975 </p> 2976 2977 <p> 2978 This should only be called from within a handler during an active parse and the 2979 returned buffer should only be referred to from within the handler that made 2980 the call. This input buffer contains the untranslated bytes of the input. 2981 </p> 2982 2983 <p> 2984 Only a limited amount of context is kept, so if the event triggering a call 2985 spans over a very large amount of input, the actual parse position may be 2986 before the beginning of the buffer. 2987 </p> 2988 2989 <p> 2990 If <code>XML_CONTEXT_BYTES</code> is zero, this will always return 2991 <code>NULL</code>. 2992 </p> 2993 </div> 2994 2995 <h3> 2996 <a id="attack-protection" name="attack-protection">Attack Protection</a><a id= 2997 "billion-laughs" name="billion-laughs"></a> 2998 </h3> 2999 3000 <h4 id="XML_SetBillionLaughsAttackProtectionMaximumAmplification"> 3001 XML_SetBillionLaughsAttackProtectionMaximumAmplification 3002 </h4> 3003 3004 <pre class="fcndec"> 3005/* Added in Expat 2.4.0. */ 3006XML_Bool XMLCALL 3007XML_SetBillionLaughsAttackProtectionMaximumAmplification(XML_Parser p, 3008 float maximumAmplificationFactor); 3009</pre> 3010 <div class="fcndef"> 3011 <p> 3012 Sets the maximum tolerated amplification factor for protection against <a href= 3013 "https://en.wikipedia.org/wiki/Billion_laughs_attack">billion laughs 3014 attacks</a> (default: <code>100.0</code>) of parser <code>p</code> to 3015 <code>maximumAmplificationFactor</code>, and returns <code>XML_TRUE</code> upon 3016 success and <code>XML_FALSE</code> upon error. 3017 </p> 3018 3019 <p> 3020 Once the <a href= 3021 "#XML_SetBillionLaughsAttackProtectionActivationThreshold">threshold for 3022 activation</a> is reached, the amplification factor is calculated as .. 3023 </p> 3024 3025 <pre>amplification := (direct + indirect) / direct</pre> 3026 <p> 3027 .. while parsing, whereas <code>direct</code> is the number of bytes read from 3028 the primary document in parsing and <code>indirect</code> is the number of 3029 bytes added by expanding entities and reading of external DTD files, combined. 3030 </p> 3031 3032 <p> 3033 For a call to 3034 <code>XML_SetBillionLaughsAttackProtectionMaximumAmplification</code> to 3035 succeed: 3036 </p> 3037 3038 <ul> 3039 <li>parser <code>p</code> must be a non-<code>NULL</code> root parser (without 3040 any parent parsers) and 3041 </li> 3042 3043 <li> 3044 <code>maximumAmplificationFactor</code> must be non-<code>NaN</code> and 3045 greater than or equal to <code>1.0</code>. 3046 </li> 3047 </ul> 3048 3049 <p> 3050 <strong>Note:</strong> If you ever need to increase this value for non-attack 3051 payload, please <a href="https://github.com/libexpat/libexpat/issues">file a 3052 bug report</a>. 3053 </p> 3054 3055 <p> 3056 <strong>Note:</strong> Peak amplifications of factor 15,000 for the entire 3057 payload and of factor 30,000 in the middle of parsing have been observed with 3058 small benign files in practice. So if you do reduce the maximum allowed 3059 amplification, please make sure that the activation threshold is still big 3060 enough to not end up with undesired false positives (i.e. benign files being 3061 rejected). 3062 </p> 3063 </div> 3064 3065 <h4 id="XML_SetBillionLaughsAttackProtectionActivationThreshold"> 3066 XML_SetBillionLaughsAttackProtectionActivationThreshold 3067 </h4> 3068 3069 <pre class="fcndec"> 3070/* Added in Expat 2.4.0. */ 3071XML_Bool XMLCALL 3072XML_SetBillionLaughsAttackProtectionActivationThreshold(XML_Parser p, 3073 unsigned long long activationThresholdBytes); 3074</pre> 3075 <div class="fcndef"> 3076 <p> 3077 Sets number of output bytes (including amplification from entity expansion and 3078 reading DTD files) needed to activate protection against <a href= 3079 "https://en.wikipedia.org/wiki/Billion_laughs_attack">billion laughs 3080 attacks</a> (default: <code>8 MiB</code>) of parser <code>p</code> to 3081 <code>activationThresholdBytes</code>, and returns <code>XML_TRUE</code> upon 3082 success and <code>XML_FALSE</code> upon error. 3083 </p> 3084 3085 <p> 3086 For a call to 3087 <code>XML_SetBillionLaughsAttackProtectionActivationThreshold</code> to 3088 succeed: 3089 </p> 3090 3091 <ul> 3092 <li>parser <code>p</code> must be a non-<code>NULL</code> root parser (without 3093 any parent parsers). 3094 </li> 3095 </ul> 3096 3097 <p> 3098 <strong>Note:</strong> If you ever need to increase this value for non-attack 3099 payload, please <a href="https://github.com/libexpat/libexpat/issues">file a 3100 bug report</a>. 3101 </p> 3102 3103 <p> 3104 <strong>Note:</strong> Activation thresholds below 4 MiB are known to break 3105 support for <a href= 3106 "https://en.wikipedia.org/wiki/Darwin_Information_Typing_Architecture">DITA</a> 3107 1.3 payload and are hence not recommended. 3108 </p> 3109 </div> 3110 3111 <h4 id="XML_SetAllocTrackerMaximumAmplification"> 3112 XML_SetAllocTrackerMaximumAmplification 3113 </h4> 3114 3115 <pre class="fcndec"> 3116/* Added in Expat 2.7.2. */ 3117XML_Bool 3118XML_SetAllocTrackerMaximumAmplification(XML_Parser p, 3119 float maximumAmplificationFactor); 3120</pre> 3121 <div class="fcndef"> 3122 <p> 3123 Sets the maximum tolerated amplification factor between direct input and bytes 3124 of dynamic memory allocated (default: <code>100.0</code>) of parser 3125 <code>p</code> to <code>maximumAmplificationFactor</code>, and returns 3126 <code>XML_TRUE</code> upon success and <code>XML_FALSE</code> upon error. 3127 </p> 3128 3129 <p> 3130 <strong>Note:</strong> There are three types of allocations that intentionally 3131 bypass tracking and limiting: 3132 </p> 3133 3134 <ul> 3135 <li>application calls to functions <code><a href= 3136 "#XML_MemMalloc">XML_MemMalloc</a></code> and <code><a href="#XML_MemRealloc"> 3137 XML_MemRealloc</a></code> — <em>healthy</em> use of these two functions 3138 continues to be a responsibility of the application using Expat —, 3139 </li> 3140 3141 <li>the main character buffer used by functions <code><a href="#XML_GetBuffer"> 3142 XML_GetBuffer</a></code> and <code><a href= 3143 "#XML_ParseBuffer">XML_ParseBuffer</a></code> (and thus also by plain 3144 <code><a href="#XML_Parse">XML_Parse</a></code>), and 3145 </li> 3146 3147 <li>the <a href="#XML_SetElementDeclHandler">content model memory</a> (that is 3148 passed to the <a href="#XML_SetElementDeclHandler">element declaration 3149 handler</a> and freed by a call to <code><a href= 3150 "#XML_FreeContentModel">XML_FreeContentModel</a></code>). 3151 </li> 3152 </ul> 3153 3154 <p> 3155 Once the <a href="#XML_SetAllocTrackerActivationThreshold">threshold for 3156 activation</a> is reached, the amplification factor is calculated as .. 3157 </p> 3158 3159 <pre>amplification := allocated / direct</pre> 3160 <p> 3161 .. while parsing, whereas <code>direct</code> is the number of bytes read from 3162 the primary document in parsing and <code>allocated</code> is the number of 3163 bytes of dynamic memory allocated in the parser hierarchy. 3164 </p> 3165 3166 <p> 3167 For a call to <code>XML_SetAllocTrackerMaximumAmplification</code> to succeed: 3168 </p> 3169 3170 <ul> 3171 <li>parser <code>p</code> must be a non-<code>NULL</code> root parser (without 3172 any parent parsers) and 3173 </li> 3174 3175 <li> 3176 <code>maximumAmplificationFactor</code> must be non-<code>NaN</code> and 3177 greater than or equal to <code>1.0</code>. 3178 </li> 3179 </ul> 3180 3181 <p> 3182 <strong>Note:</strong> If you ever need to increase this value for non-attack 3183 payload, please <a href="https://github.com/libexpat/libexpat/issues">file a 3184 bug report</a>. 3185 </p> 3186 3187 <p> 3188 <strong>Note:</strong> Amplifications factors greater than <code>100.0</code> 3189 can been observed near the start of parsing even with benign files in practice. 3190 So if you do reduce the maximum allowed amplification, please make sure that 3191 the activation threshold is still big enough to not end up with undesired false 3192 positives (i.e. benign files being rejected). 3193 </p> 3194 </div> 3195 3196 <h4 id="XML_SetAllocTrackerActivationThreshold"> 3197 XML_SetAllocTrackerActivationThreshold 3198 </h4> 3199 3200 <pre class="fcndec"> 3201/* Added in Expat 2.7.2. */ 3202XML_Bool 3203XML_SetAllocTrackerActivationThreshold(XML_Parser p, 3204 unsigned long long activationThresholdBytes); 3205</pre> 3206 <div class="fcndef"> 3207 <p> 3208 Sets number of allocated bytes of dynamic memory needed to activate protection 3209 against disproportionate use of RAM (default: <code>64 MiB</code>) of parser 3210 <code>p</code> to <code>activationThresholdBytes</code>, and returns 3211 <code>XML_TRUE</code> upon success and <code>XML_FALSE</code> upon error. 3212 </p> 3213 3214 <p> 3215 <strong>Note:</strong> For types of allocations that intentionally bypass 3216 tracking and limiting, please see <code><a href= 3217 "#XML_SetAllocTrackerMaximumAmplification">XML_SetAllocTrackerMaximumAmplification</a></code> 3218 above. 3219 </p> 3220 3221 <p> 3222 For a call to <code>XML_SetAllocTrackerActivationThreshold</code> to succeed: 3223 </p> 3224 3225 <ul> 3226 <li>parser <code>p</code> must be a non-<code>NULL</code> root parser (without 3227 any parent parsers). 3228 </li> 3229 </ul> 3230 3231 <p> 3232 <strong>Note:</strong> If you ever need to increase this value for non-attack 3233 payload, please <a href="https://github.com/libexpat/libexpat/issues">file a 3234 bug report</a>. 3235 </p> 3236 </div> 3237 3238 <h4 id="XML_SetReparseDeferralEnabled"> 3239 XML_SetReparseDeferralEnabled 3240 </h4> 3241 3242 <pre class="fcndec"> 3243/* Added in Expat 2.6.0. */ 3244XML_Bool XMLCALL 3245XML_SetReparseDeferralEnabled(XML_Parser parser, XML_Bool enabled); 3246</pre> 3247 <div class="fcndef"> 3248 <p> 3249 Large tokens may require many parse calls before enough data is available for 3250 Expat to parse it in full. If Expat retried parsing the token on every parse 3251 call, parsing could take quadratic time. To avoid this, Expat only retries once 3252 a significant amount of new data is available. This function allows disabling 3253 this behavior. 3254 </p> 3255 3256 <p> 3257 The <code>enabled</code> argument should be <code>XML_TRUE</code> or 3258 <code>XML_FALSE</code>. 3259 </p> 3260 3261 <p> 3262 Returns <code>XML_TRUE</code> on success, and <code>XML_FALSE</code> on error. 3263 </p> 3264 </div> 3265 3266 <h3> 3267 <a id="miscellaneous" name="miscellaneous">Miscellaneous functions</a> 3268 </h3> 3269 3270 <p> 3271 The functions in this section either obtain state information from the parser or 3272 can be used to dynamically set parser options. 3273 </p> 3274 3275 <h4 id="XML_SetUserData"> 3276 XML_SetUserData 3277 </h4> 3278 3279 <pre class="fcndec"> 3280void XMLCALL 3281XML_SetUserData(XML_Parser p, 3282 void *userData); 3283</pre> 3284 <div class="fcndef"> 3285 This sets the user data pointer that gets passed to handlers. It overwrites any 3286 previous value for this pointer. Note that the application is responsible for 3287 freeing the memory associated with <code>userData</code> when it is finished with 3288 the parser. So if you call this when there's already a pointer there, and you 3289 haven't freed the memory associated with it, then you've probably just leaked 3290 memory. 3291 </div> 3292 3293 <h4 id="XML_GetUserData"> 3294 XML_GetUserData 3295 </h4> 3296 3297 <pre class="fcndec"> 3298void * XMLCALL 3299XML_GetUserData(XML_Parser p); 3300</pre> 3301 <div class="fcndef"> 3302 This returns the user data pointer that gets passed to handlers. It is actually 3303 implemented as a macro. 3304 </div> 3305 3306 <h4 id="XML_UseParserAsHandlerArg"> 3307 XML_UseParserAsHandlerArg 3308 </h4> 3309 3310 <pre class="fcndec"> 3311void XMLCALL 3312XML_UseParserAsHandlerArg(XML_Parser p); 3313</pre> 3314 <div class="fcndef"> 3315 After this is called, handlers receive the parser in their <code>userData</code> 3316 arguments. The user data can still be obtained using the <code><a href= 3317 "#XML_GetUserData">XML_GetUserData</a></code> function. 3318 </div> 3319 3320 <h4 id="XML_SetBase"> 3321 XML_SetBase 3322 </h4> 3323 3324 <pre class="fcndec"> 3325enum XML_Status XMLCALL 3326XML_SetBase(XML_Parser p, 3327 const XML_Char *base); 3328</pre> 3329 <div class="fcndef"> 3330 Set the base to be used for resolving relative URIs in system identifiers. The 3331 return value is <code>XML_STATUS_ERROR</code> if there's no memory to store base, 3332 otherwise it's <code>XML_STATUS_OK</code>. 3333 </div> 3334 3335 <h4 id="XML_GetBase"> 3336 XML_GetBase 3337 </h4> 3338 3339 <pre class="fcndec"> 3340const XML_Char * XMLCALL 3341XML_GetBase(XML_Parser p); 3342</pre> 3343 <div class="fcndef"> 3344 Return the base for resolving relative URIs. 3345 </div> 3346 3347 <h4 id="XML_GetSpecifiedAttributeCount"> 3348 XML_GetSpecifiedAttributeCount 3349 </h4> 3350 3351 <pre class="fcndec"> 3352int XMLCALL 3353XML_GetSpecifiedAttributeCount(XML_Parser p); 3354</pre> 3355 <div class="fcndef"> 3356 When attributes are reported to the start handler in the atts vector, attributes 3357 that were explicitly set in the element occur before any attributes that receive 3358 their value from default information in an ATTLIST declaration. This function 3359 returns the number of attributes that were explicitly set times two, thus giving 3360 the offset in the <code>atts</code> array passed to the start tag handler of the 3361 first attribute set due to defaults. It supplies information for the last call to 3362 a start handler. If called inside a start handler, then that means the current 3363 call. 3364 </div> 3365 3366 <h4 id="XML_GetIdAttributeIndex"> 3367 XML_GetIdAttributeIndex 3368 </h4> 3369 3370 <pre class="fcndec"> 3371int XMLCALL 3372XML_GetIdAttributeIndex(XML_Parser p); 3373</pre> 3374 <div class="fcndef"> 3375 Returns the index of the ID attribute passed in the atts array in the last call 3376 to <code><a href="#XML_StartElementHandler">XML_StartElementHandler</a></code>, 3377 or -1 if there is no ID attribute. If called inside a start handler, then that 3378 means the current call. 3379 </div> 3380 3381 <h4 id="XML_GetAttributeInfo"> 3382 XML_GetAttributeInfo 3383 </h4> 3384 3385 <pre class="fcndec"> 3386const XML_AttrInfo * XMLCALL 3387XML_GetAttributeInfo(XML_Parser parser); 3388</pre> 3389 3390 <pre class="signature"> 3391typedef struct { 3392 XML_Index nameStart; /* Offset to beginning of the attribute name. */ 3393 XML_Index nameEnd; /* Offset after the attribute name's last byte. */ 3394 XML_Index valueStart; /* Offset to beginning of the attribute value. */ 3395 XML_Index valueEnd; /* Offset after the attribute value's last byte. */ 3396} XML_AttrInfo; 3397</pre> 3398 <div class="fcndef"> 3399 Returns an array of <code>XML_AttrInfo</code> structures for the attribute/value 3400 pairs passed in the last call to the <code>XML_StartElementHandler</code> that 3401 were specified in the start-tag rather than defaulted. Each attribute/value pair 3402 counts as 1; thus the number of entries in the array is 3403 <code>XML_GetSpecifiedAttributeCount(parser) / 2</code>. 3404 </div> 3405 3406 <h4 id="XML_SetEncoding"> 3407 XML_SetEncoding 3408 </h4> 3409 3410 <pre class="fcndec"> 3411enum XML_Status XMLCALL 3412XML_SetEncoding(XML_Parser p, 3413 const XML_Char *encoding); 3414</pre> 3415 <div class="fcndef"> 3416 Set the encoding to be used by the parser. It is equivalent to passing a 3417 non-<code>NULL</code> encoding argument to the parser creation functions. It must 3418 not be called after <code><a href="#XML_Parse">XML_Parse</a></code> or 3419 <code><a href="#XML_ParseBuffer">XML_ParseBuffer</a></code> have been called on 3420 the given parser. Returns <code>XML_STATUS_OK</code> on success or 3421 <code>XML_STATUS_ERROR</code> on error. 3422 </div> 3423 3424 <h4 id="XML_SetParamEntityParsing"> 3425 XML_SetParamEntityParsing 3426 </h4> 3427 3428 <pre class="fcndec"> 3429int XMLCALL 3430XML_SetParamEntityParsing(XML_Parser p, 3431 enum XML_ParamEntityParsing code); 3432</pre> 3433 <div class="fcndef"> 3434 This enables parsing of parameter entities, including the external parameter 3435 entity that is the external DTD subset, according to <code>code</code>. The 3436 choices for <code>code</code> are: 3437 <ul> 3438 <li> 3439 <code>XML_PARAM_ENTITY_PARSING_NEVER</code> 3440 </li> 3441 3442 <li> 3443 <code>XML_PARAM_ENTITY_PARSING_UNLESS_STANDALONE</code> 3444 </li> 3445 3446 <li> 3447 <code>XML_PARAM_ENTITY_PARSING_ALWAYS</code> 3448 </li> 3449 </ul> 3450 <b>Note:</b> If <code>XML_SetParamEntityParsing</code> is called after 3451 <code>XML_Parse</code> or <code>XML_ParseBuffer</code>, then it has no effect and 3452 will always return 0. 3453 </div> 3454 3455 <h4 id="XML_SetHashSalt"> 3456 XML_SetHashSalt (deprecated) 3457 </h4> 3458 3459 <pre class="fcndec"> 3460int XMLCALL 3461XML_SetHashSalt(XML_Parser parser, 3462 unsigned long hash_salt); 3463</pre> 3464 <div class="fcndef"> 3465 Sets the hash salt to use for internal hash calculations. Helps in preventing DoS 3466 attacks based on predicting hash function behavior. In order to have an effect 3467 this must be called before parsing has started. Returns 1 if successful, 0 when 3468 called after <code>XML_Parse</code> or <code>XML_ParseBuffer</code> or when 3469 <code>parser</code> is <code>NULL</code>. 3470 <p> 3471 <b>Note:</b> Function <code>XML_SetHashSalt</code> is 3472 <strong>deprecated</strong>. Please use function <code><a href= 3473 "#XML_SetHashSalt16Bytes">XML_SetHashSalt16Bytes</a></code> instead for better 3474 security. <code>XML_SetHashSalt</code> only provides 4 to 8 bytes of entropy 3475 (depending on the size of type <code>unsigned long</code>) while the SipHash 3476 implementation used by Expat can leverage up to 16 bytes of entropy — at least 3477 twice as much. Function <code><a href= 3478 "#XML_SetHashSalt16Bytes">XML_SetHashSalt16Bytes</a></code> of Expat >=2.8.0 3479 (and where backported) matches the amount of entropy supported by SipHash. 3480 </p> 3481 3482 <p> 3483 <b>Note:</b> This call is optional, as the parser will auto-generate a new 3484 random salt value internally if no value has been set by the start of parsing. 3485 </p> 3486 3487 <p> 3488 <b>Note:</b> One should not call <code>XML_SetHashSalt</code> with a hash salt 3489 value of 0, as this value is used as sentinel value to indicate that 3490 <code>XML_SetHashSalt</code> has <b>not</b> been called. Consequently such a 3491 call will have no effect, even if it returns 1. 3492 </p> 3493 </div> 3494 3495 <h4 id="XML_SetHashSalt16Bytes"> 3496 XML_SetHashSalt16Bytes 3497 </h4> 3498 3499 <pre class="fcndec"> 3500/* Added in Expat 2.8.0. */ 3501XML_Bool XMLCALL 3502XML_SetHashSalt16Bytes(XML_Parser parser, 3503 const uint8_t entropy[16]); 3504</pre> 3505 <div class="fcndef"> 3506 Sets the hash salt to use for internal hash calculations. Helps in preventing DoS 3507 attacks based on predicting hash function behavior. In order to have an effect 3508 this must be called before parsing has started. Returns <code>XML_TRUE</code> if 3509 successful, <code>XML_FALSE</code> when called after <code>XML_Parse</code> or 3510 <code>XML_ParseBuffer</code> or when <code>parser</code> is <code>NULL</code>. 3511 <p> 3512 <b>Note:</b> Setting a salt that is <em>not</em> from a source of high quality 3513 entropy (like <code>getentropy(3)</code>) will make the parser vulnerable to 3514 hash flooding attacks. 3515 </p> 3516 3517 <p> 3518 <b>Note:</b> This call is optional, as the parser will auto-generate a new 3519 random salt value internally if no value has been set by the start of parsing. 3520 </p> 3521 </div> 3522 3523 <h4 id="XML_UseForeignDTD"> 3524 XML_UseForeignDTD 3525 </h4> 3526 3527 <pre class="fcndec"> 3528enum XML_Error XMLCALL 3529XML_UseForeignDTD(XML_Parser parser, XML_Bool useDTD); 3530</pre> 3531 <div class="fcndef"> 3532 <p> 3533 This function allows an application to provide an external subset for the 3534 document type declaration for documents which do not specify an external subset 3535 of their own. For documents which specify an external subset in their DOCTYPE 3536 declaration, the application-provided subset will be ignored. If the document 3537 does not contain a DOCTYPE declaration at all and <code>useDTD</code> is true, 3538 the application-provided subset will be parsed, but the 3539 <code>startDoctypeDeclHandler</code> and <code>endDoctypeDeclHandler</code> 3540 functions, if set, will not be called. The setting of parameter entity parsing, 3541 controlled using <code><a href= 3542 "#XML_SetParamEntityParsing">XML_SetParamEntityParsing</a></code>, will be 3543 honored. 3544 </p> 3545 3546 <p> 3547 The application-provided external subset is read by calling the external entity 3548 reference handler set via <code><a href= 3549 "#XML_SetExternalEntityRefHandler">XML_SetExternalEntityRefHandler</a></code> 3550 with both <code>publicId</code> and <code>systemId</code> set to 3551 <code>NULL</code>. 3552 </p> 3553 3554 <p> 3555 If this function is called after parsing has begun, it returns 3556 <code>XML_ERROR_CANT_CHANGE_FEATURE_ONCE_PARSING</code> and ignores 3557 <code>useDTD</code>. If called when Expat has been compiled without DTD 3558 support, it returns <code>XML_ERROR_FEATURE_REQUIRES_XML_DTD</code>. Otherwise, 3559 it returns <code>XML_ERROR_NONE</code>. 3560 </p> 3561 3562 <p> 3563 <b>Note:</b> For the purpose of checking WFC: Entity Declared, passing 3564 <code>useDTD == XML_TRUE</code> will make the parser behave as if the document 3565 had a DTD with an external subset. This holds true even if the external entity 3566 reference handler returns without action. 3567 </p> 3568 </div> 3569 3570 <h4 id="XML_SetReturnNSTriplet"> 3571 XML_SetReturnNSTriplet 3572 </h4> 3573 3574 <pre class="fcndec"> 3575void XMLCALL 3576XML_SetReturnNSTriplet(XML_Parser parser, 3577 int do_nst); 3578</pre> 3579 <div class="fcndef"> 3580 <p> 3581 This function only has an effect when using a parser created with 3582 <code><a href="#XML_ParserCreateNS">XML_ParserCreateNS</a></code>, i.e. when 3583 namespace processing is in effect. The <code>do_nst</code> sets whether or not 3584 prefixes are returned with names qualified with a namespace prefix. If this 3585 function is called with <code>do_nst</code> non-zero, then afterwards namespace 3586 qualified names (that is qualified with a prefix as opposed to belonging to a 3587 default namespace) are returned as a triplet with the three parts separated by 3588 the namespace separator specified when the parser was created. The order of 3589 returned parts is URI, local name, and prefix. 3590 </p> 3591 3592 <p> 3593 If <code>do_nst</code> is zero, then namespaces are reported in the default 3594 manner, URI then local_name separated by the namespace separator. 3595 </p> 3596 </div> 3597 3598 <h4 id="XML_DefaultCurrent"> 3599 XML_DefaultCurrent 3600 </h4> 3601 3602 <pre class="fcndec"> 3603void XMLCALL 3604XML_DefaultCurrent(XML_Parser parser); 3605</pre> 3606 <div class="fcndef"> 3607 This can be called within a handler for a start element, end element, processing 3608 instruction or character data. It causes the corresponding markup to be passed to 3609 the default handler set by <code><a href= 3610 "#XML_SetDefaultHandler">XML_SetDefaultHandler</a></code> or <code><a href= 3611 "#XML_SetDefaultHandlerExpand">XML_SetDefaultHandlerExpand</a></code>. It does 3612 nothing if there is not a default handler. 3613 </div> 3614 3615 <h4 id="XML_ExpatVersion"> 3616 XML_ExpatVersion 3617 </h4> 3618 3619 <pre class="fcndec"> 3620XML_LChar * XMLCALL 3621XML_ExpatVersion(); 3622</pre> 3623 <div class="fcndef"> 3624 Return the library version as a string (e.g. <code>"expat_1.95.1"</code>). 3625 </div> 3626 3627 <h4 id="XML_ExpatVersionInfo"> 3628 XML_ExpatVersionInfo 3629 </h4> 3630 3631 <pre class="fcndec"> 3632struct XML_Expat_Version XMLCALL 3633XML_ExpatVersionInfo(); 3634</pre> 3635 3636 <pre class="signature"> 3637typedef struct { 3638 int major; 3639 int minor; 3640 int micro; 3641} XML_Expat_Version; 3642</pre> 3643 <div class="fcndef"> 3644 Return the library version information as a structure. Some macros are also 3645 defined that support compile-time tests of the library version: 3646 <ul> 3647 <li> 3648 <code>XML_MAJOR_VERSION</code> 3649 </li> 3650 3651 <li> 3652 <code>XML_MINOR_VERSION</code> 3653 </li> 3654 3655 <li> 3656 <code>XML_MICRO_VERSION</code> 3657 </li> 3658 </ul> 3659 Testing these constants is currently the best way to determine if particular 3660 parts of the Expat API are available. 3661 </div> 3662 3663 <h4 id="XML_GetFeatureList"> 3664 XML_GetFeatureList 3665 </h4> 3666 3667 <pre class="fcndec"> 3668const XML_Feature * XMLCALL 3669XML_GetFeatureList(); 3670</pre> 3671 3672 <pre class="signature"> 3673enum XML_FeatureEnum { 3674 XML_FEATURE_END = 0, 3675 XML_FEATURE_UNICODE, 3676 XML_FEATURE_UNICODE_WCHAR_T, 3677 XML_FEATURE_DTD, 3678 XML_FEATURE_CONTEXT_BYTES, 3679 XML_FEATURE_MIN_SIZE, 3680 XML_FEATURE_SIZEOF_XML_CHAR, 3681 XML_FEATURE_SIZEOF_XML_LCHAR, 3682 XML_FEATURE_NS, 3683 XML_FEATURE_LARGE_SIZE 3684}; 3685 3686typedef struct { 3687 enum XML_FeatureEnum feature; 3688 XML_LChar *name; 3689 long int value; 3690} XML_Feature; 3691</pre> 3692 <div class="fcndef"> 3693 <p> 3694 Returns a list of "feature" records, providing details on how Expat was 3695 configured at compile time. Most applications should not need to worry about 3696 this, but this information is otherwise not available from Expat. This function 3697 allows code that does need to check these features to do so at runtime. 3698 </p> 3699 3700 <p> 3701 The return value is an array of <code>XML_Feature</code>, terminated by a 3702 record with a <code>feature</code> of <code>XML_FEATURE_END</code> and 3703 <code>name</code> of <code>NULL</code>, identifying the feature-test macros 3704 Expat was compiled with. Since an application that requires this kind of 3705 information needs to determine the type of character the <code>name</code> 3706 points to, records for the <code>XML_FEATURE_SIZEOF_XML_CHAR</code> and 3707 <code>XML_FEATURE_SIZEOF_XML_LCHAR</code> will be located at the beginning of 3708 the list, followed by <code>XML_FEATURE_UNICODE</code> and 3709 <code>XML_FEATURE_UNICODE_WCHAR_T</code>, if they are present at all. 3710 </p> 3711 3712 <p> 3713 Some features have an associated value. If there isn't an associated value, the 3714 <code>value</code> field is set to 0. At this time, the following features have 3715 been defined to have values: 3716 </p> 3717 3718 <dl> 3719 <dt> 3720 <code>XML_FEATURE_SIZEOF_XML_CHAR</code> 3721 </dt> 3722 3723 <dd> 3724 The number of bytes occupied by one <code>XML_Char</code> character. 3725 </dd> 3726 3727 <dt> 3728 <code>XML_FEATURE_SIZEOF_XML_LCHAR</code> 3729 </dt> 3730 3731 <dd> 3732 The number of bytes occupied by one <code>XML_LChar</code> character. 3733 </dd> 3734 3735 <dt> 3736 <code>XML_FEATURE_CONTEXT_BYTES</code> 3737 </dt> 3738 3739 <dd> 3740 The maximum number of characters of context which can be reported by 3741 <code><a href="#XML_GetInputContext">XML_GetInputContext</a></code>. 3742 </dd> 3743 </dl> 3744 </div> 3745 3746 <h4 id="XML_FreeContentModel"> 3747 XML_FreeContentModel 3748 </h4> 3749 3750 <pre class="fcndec"> 3751void XMLCALL 3752XML_FreeContentModel(XML_Parser parser, XML_Content *model); 3753</pre> 3754 <div class="fcndef"> 3755 Function to deallocate the <code>model</code> argument passed to the 3756 <code>XML_ElementDeclHandler</code> callback set using <code><a href= 3757 "#XML_SetElementDeclHandler">XML_ElementDeclHandler</a></code>. This function 3758 should not be used for any other purpose. 3759 </div> 3760 3761 <p> 3762 The following functions allow external code to share the memory allocator an 3763 <code>XML_Parser</code> has been configured to use. This is especially useful for 3764 third-party libraries that interact with a parser object created by application 3765 code, or heavily layered applications. This can be essential when using 3766 dynamically loaded libraries which use different C standard libraries (this can 3767 happen on Windows, at least). 3768 </p> 3769 3770 <h4 id="XML_MemMalloc"> 3771 XML_MemMalloc 3772 </h4> 3773 3774 <pre class="fcndec"> 3775void * XMLCALL 3776XML_MemMalloc(XML_Parser parser, size_t size); 3777</pre> 3778 <div class="fcndef"> 3779 Allocate <code>size</code> bytes of memory using the allocator the 3780 <code>parser</code> object has been configured to use. Returns a pointer to the 3781 memory or <code>NULL</code> on failure. Memory allocated in this way must be 3782 freed using <code><a href="#XML_MemFree">XML_MemFree</a></code>. 3783 </div> 3784 3785 <h4 id="XML_MemRealloc"> 3786 XML_MemRealloc 3787 </h4> 3788 3789 <pre class="fcndec"> 3790void * XMLCALL 3791XML_MemRealloc(XML_Parser parser, void *ptr, size_t size); 3792</pre> 3793 <div class="fcndef"> 3794 Allocate <code>size</code> bytes of memory using the allocator the 3795 <code>parser</code> object has been configured to use. <code>ptr</code> must 3796 point to a block of memory allocated by <code><a href= 3797 "#XML_MemMalloc">XML_MemMalloc</a></code> or <code>XML_MemRealloc</code>, or be 3798 <code>NULL</code>. This function tries to expand the block pointed to by 3799 <code>ptr</code> if possible. Returns a pointer to the memory or 3800 <code>NULL</code> on failure. On success, the original block has either been 3801 expanded or freed. On failure, the original block has not been freed; the caller 3802 is responsible for freeing the original block. Memory allocated in this way must 3803 be freed using <code><a href="#XML_MemFree">XML_MemFree</a></code>. 3804 </div> 3805 3806 <h4 id="XML_MemFree"> 3807 XML_MemFree 3808 </h4> 3809 3810 <pre class="fcndec"> 3811void XMLCALL 3812XML_MemFree(XML_Parser parser, void *ptr); 3813</pre> 3814 <div class="fcndef"> 3815 Free a block of memory pointed to by <code>ptr</code>. The block must have been 3816 allocated by <code><a href="#XML_MemMalloc">XML_MemMalloc</a></code> or 3817 <code>XML_MemRealloc</code>, or be <code>NULL</code>. 3818 </div> 3819 3820 <hr /> 3821 3822 <div class="footer"> 3823 Found a bug in the documentation? <a href= 3824 "https://github.com/libexpat/libexpat/issues">Please file a bug report.</a> 3825 </div> 3826 </div> 3827 </body> 3828</html> 3829