1<?xml version="1.0" encoding="utf-8"?> 2<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" 3 "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"> 4<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"> 5 <head> 6 <!-- 7 __ __ _ 8 ___\ \/ /_ __ __ _| |_ 9 / _ \\ /| '_ \ / _` | __| 10 | __// \| |_) | (_| | |_ 11 \___/_/\_\ .__/ \__,_|\__| 12 |_| XML parser 13 14 Copyright (c) 2000 Clark Cooper <coopercc@users.sourceforge.net> 15 Copyright (c) 2000-2004 Fred L. Drake, Jr. <fdrake@users.sourceforge.net> 16 Copyright (c) 2002-2012 Karl Waclawek <karl@waclawek.net> 17 Copyright (c) 2017-2026 Sebastian Pipping <sebastian@pipping.org> 18 Copyright (c) 2017 Jakub Wilk <jwilk@jwilk.net> 19 Copyright (c) 2021 Tomas Korbar <tkorbar@redhat.com> 20 Copyright (c) 2021 Nicolas Cavallari <nicolas.cavallari@green-communications.fr> 21 Copyright (c) 2022 Thijs Schreijer <thijs@thijsschreijer.nl> 22 Copyright (c) 2023-2025 Hanno Böck <hanno@gentoo.org> 23 Copyright (c) 2023 Sony Corporation / Snild Dolkow <snild@sony.com> 24 Licensed under the MIT license: 25 26 Permission is hereby granted, free of charge, to any person obtaining 27 a copy of this software and associated documentation files (the 28 "Software"), to deal in the Software without restriction, including 29 without limitation the rights to use, copy, modify, merge, publish, 30 distribute, sublicense, and/or sell copies of the Software, and to permit 31 persons to whom the Software is furnished to do so, subject to the 32 following conditions: 33 34 The above copyright notice and this permission notice shall be included 35 in all copies or substantial portions of the Software. 36 37 THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, 38 EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF 39 MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN 40 NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, 41 DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR 42 OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE 43 USE OR OTHER DEALINGS IN THE SOFTWARE. 44--> 45 46 <title> 47 Expat XML Parser 48 </title> 49 <meta name="author" content="Clark Cooper, coopercc@netheaven.com" /> 50 <link href="ok.min.css" rel="stylesheet" /> 51 <link href="style.css" rel="stylesheet" /> 52 </head> 53 <body> 54 <div> 55 <h1> 56 The Expat XML Parser <small>Release 2.8.2</small> 57 </h1> 58 </div> 59 60 <div class="content"> 61 <p> 62 Expat is a library, written in C, for parsing XML documents. It's the underlying 63 XML parser for the open source Mozilla project, Perl's <code>XML::Parser</code>, 64 Python's <code>xml.parsers.expat</code>, and other open-source XML parsers. 65 </p> 66 67 <p> 68 This library is the creation of James Clark, who's also given us groff (an nroff 69 look-alike), Jade (an implementation of ISO's DSSSL stylesheet language for 70 SGML), XP (a Java XML parser package), XT (a Java XSL engine). James was also the 71 technical lead on the XML Working Group at W3C that produced the XML 72 specification. 73 </p> 74 75 <p> 76 This is free software, licensed under the <a href="../COPYING">MIT/X Consortium 77 license</a>. You may download it from <a href="https://libexpat.github.io/">the 78 Expat home page</a>. 79 </p> 80 81 <p> 82 The bulk of this document was originally commissioned as an article by <a href= 83 "https://www.xml.com/">XML.com</a>. They graciously allowed Clark Cooper to 84 retain copyright and to distribute it with Expat. This version has been 85 substantially extended to include documentation on features which have been added 86 since the original article was published, and additional information on using the 87 original interface. 88 </p> 89 90 <hr /> 91 92 <h2> 93 Table of Contents 94 </h2> 95 96 <ul> 97 <li> 98 <a href="#overview">Overview</a> 99 </li> 100 101 <li> 102 <a href="#building">Building and Installing</a> 103 </li> 104 105 <li> 106 <a href="#using">Using Expat</a> 107 </li> 108 109 <li> 110 <a href="#reference">Reference</a> 111 <ul> 112 <li> 113 <a href="#creation">Parser Creation Functions</a> 114 <ul> 115 <li> 116 <a href="#XML_ParserCreate">XML_ParserCreate</a> 117 </li> 118 119 <li> 120 <a href="#XML_ParserCreateNS">XML_ParserCreateNS</a> 121 </li> 122 123 <li> 124 <a href="#XML_ParserCreate_MM">XML_ParserCreate_MM</a> 125 </li> 126 127 <li> 128 <a href= 129 "#XML_ExternalEntityParserCreate">XML_ExternalEntityParserCreate</a> 130 </li> 131 132 <li> 133 <a href="#XML_ParserFree">XML_ParserFree</a> 134 </li> 135 136 <li> 137 <a href="#XML_ParserReset">XML_ParserReset</a> 138 </li> 139 </ul> 140 </li> 141 142 <li> 143 <a href="#parsing">Parsing Functions</a> 144 <ul> 145 <li> 146 <a href="#XML_Parse">XML_Parse</a> 147 </li> 148 149 <li> 150 <a href="#XML_ParseBuffer">XML_ParseBuffer</a> 151 </li> 152 153 <li> 154 <a href="#XML_GetBuffer">XML_GetBuffer</a> 155 </li> 156 157 <li> 158 <a href="#XML_StopParser">XML_StopParser</a> 159 </li> 160 161 <li> 162 <a href="#XML_ResumeParser">XML_ResumeParser</a> 163 </li> 164 165 <li> 166 <a href="#XML_GetParsingStatus">XML_GetParsingStatus</a> 167 </li> 168 </ul> 169 </li> 170 171 <li> 172 <a href="#setting">Handler Setting Functions</a> 173 <ul> 174 <li> 175 <a href="#XML_SetStartElementHandler">XML_SetStartElementHandler</a> 176 </li> 177 178 <li> 179 <a href="#XML_SetEndElementHandler">XML_SetEndElementHandler</a> 180 </li> 181 182 <li> 183 <a href="#XML_SetElementHandler">XML_SetElementHandler</a> 184 </li> 185 186 <li> 187 <a href="#XML_SetCharacterDataHandler">XML_SetCharacterDataHandler</a> 188 </li> 189 190 <li> 191 <a href= 192 "#XML_SetProcessingInstructionHandler">XML_SetProcessingInstructionHandler</a> 193 </li> 194 195 <li> 196 <a href="#XML_SetCommentHandler">XML_SetCommentHandler</a> 197 </li> 198 199 <li> 200 <a href= 201 "#XML_SetStartCdataSectionHandler">XML_SetStartCdataSectionHandler</a> 202 </li> 203 204 <li> 205 <a href= 206 "#XML_SetEndCdataSectionHandler">XML_SetEndCdataSectionHandler</a> 207 </li> 208 209 <li> 210 <a href="#XML_SetCdataSectionHandler">XML_SetCdataSectionHandler</a> 211 </li> 212 213 <li> 214 <a href="#XML_SetDefaultHandler">XML_SetDefaultHandler</a> 215 </li> 216 217 <li> 218 <a href="#XML_SetDefaultHandlerExpand">XML_SetDefaultHandlerExpand</a> 219 </li> 220 221 <li> 222 <a href= 223 "#XML_SetExternalEntityRefHandler">XML_SetExternalEntityRefHandler</a> 224 </li> 225 226 <li> 227 <a href= 228 "#XML_SetExternalEntityRefHandlerArg">XML_SetExternalEntityRefHandlerArg</a> 229 </li> 230 231 <li> 232 <a href="#XML_SetSkippedEntityHandler">XML_SetSkippedEntityHandler</a> 233 </li> 234 235 <li> 236 <a href= 237 "#XML_SetUnknownEncodingHandler">XML_SetUnknownEncodingHandler</a> 238 </li> 239 240 <li> 241 <a href= 242 "#XML_SetStartNamespaceDeclHandler">XML_SetStartNamespaceDeclHandler</a> 243 </li> 244 245 <li> 246 <a href= 247 "#XML_SetEndNamespaceDeclHandler">XML_SetEndNamespaceDeclHandler</a> 248 </li> 249 250 <li> 251 <a href="#XML_SetNamespaceDeclHandler">XML_SetNamespaceDeclHandler</a> 252 </li> 253 254 <li> 255 <a href="#XML_SetXmlDeclHandler">XML_SetXmlDeclHandler</a> 256 </li> 257 258 <li> 259 <a href= 260 "#XML_SetStartDoctypeDeclHandler">XML_SetStartDoctypeDeclHandler</a> 261 </li> 262 263 <li> 264 <a href= 265 "#XML_SetEndDoctypeDeclHandler">XML_SetEndDoctypeDeclHandler</a> 266 </li> 267 268 <li> 269 <a href="#XML_SetDoctypeDeclHandler">XML_SetDoctypeDeclHandler</a> 270 </li> 271 272 <li> 273 <a href="#XML_SetElementDeclHandler">XML_SetElementDeclHandler</a> 274 </li> 275 276 <li> 277 <a href="#XML_SetAttlistDeclHandler">XML_SetAttlistDeclHandler</a> 278 </li> 279 280 <li> 281 <a href="#XML_SetEntityDeclHandler">XML_SetEntityDeclHandler</a> 282 </li> 283 284 <li> 285 <a href= 286 "#XML_SetUnparsedEntityDeclHandler">XML_SetUnparsedEntityDeclHandler</a> 287 </li> 288 289 <li> 290 <a href="#XML_SetNotationDeclHandler">XML_SetNotationDeclHandler</a> 291 </li> 292 293 <li> 294 <a href="#XML_SetNotStandaloneHandler">XML_SetNotStandaloneHandler</a> 295 </li> 296 </ul> 297 </li> 298 299 <li> 300 <a href="#position">Parse Position and Error Reporting Functions</a> 301 <ul> 302 <li> 303 <a href="#XML_GetErrorCode">XML_GetErrorCode</a> 304 </li> 305 306 <li> 307 <a href="#XML_ErrorString">XML_ErrorString</a> 308 </li> 309 310 <li> 311 <a href="#XML_GetCurrentByteIndex">XML_GetCurrentByteIndex</a> 312 </li> 313 314 <li> 315 <a href="#XML_GetCurrentLineNumber">XML_GetCurrentLineNumber</a> 316 </li> 317 318 <li> 319 <a href="#XML_GetCurrentColumnNumber">XML_GetCurrentColumnNumber</a> 320 </li> 321 322 <li> 323 <a href="#XML_GetCurrentByteCount">XML_GetCurrentByteCount</a> 324 </li> 325 326 <li> 327 <a href="#XML_GetInputContext">XML_GetInputContext</a> 328 </li> 329 </ul> 330 </li> 331 332 <li> 333 <a href="#attack-protection">Attack Protection</a> 334 <ul> 335 <li> 336 <a href= 337 "#XML_SetBillionLaughsAttackProtectionMaximumAmplification">XML_SetBillionLaughsAttackProtectionMaximumAmplification</a> 338 </li> 339 340 <li> 341 <a href= 342 "#XML_SetBillionLaughsAttackProtectionActivationThreshold">XML_SetBillionLaughsAttackProtectionActivationThreshold</a> 343 </li> 344 345 <li> 346 <a href= 347 "#XML_SetAllocTrackerMaximumAmplification">XML_SetAllocTrackerMaximumAmplification</a> 348 </li> 349 350 <li> 351 <a href= 352 "#XML_SetAllocTrackerActivationThreshold">XML_SetAllocTrackerActivationThreshold</a> 353 </li> 354 355 <li> 356 <a href= 357 "#XML_SetReparseDeferralEnabled">XML_SetReparseDeferralEnabled</a> 358 </li> 359 </ul> 360 </li> 361 362 <li> 363 <a href="#miscellaneous">Miscellaneous Functions</a> 364 <ul> 365 <li> 366 <a href="#XML_SetUserData">XML_SetUserData</a> 367 </li> 368 369 <li> 370 <a href="#XML_GetUserData">XML_GetUserData</a> 371 </li> 372 373 <li> 374 <a href="#XML_UseParserAsHandlerArg">XML_UseParserAsHandlerArg</a> 375 </li> 376 377 <li> 378 <a href="#XML_SetBase">XML_SetBase</a> 379 </li> 380 381 <li> 382 <a href="#XML_GetBase">XML_GetBase</a> 383 </li> 384 385 <li> 386 <a href= 387 "#XML_GetSpecifiedAttributeCount">XML_GetSpecifiedAttributeCount</a> 388 </li> 389 390 <li> 391 <a href="#XML_GetIdAttributeIndex">XML_GetIdAttributeIndex</a> 392 </li> 393 394 <li> 395 <a href="#XML_GetAttributeInfo">XML_GetAttributeInfo</a> 396 </li> 397 398 <li> 399 <a href="#XML_SetEncoding">XML_SetEncoding</a> 400 </li> 401 402 <li> 403 <a href="#XML_SetParamEntityParsing">XML_SetParamEntityParsing</a> 404 </li> 405 406 <li> 407 <a href="#XML_SetHashSalt">XML_SetHashSalt</a> (deprecated) 408 </li> 409 410 <li> 411 <a href="#XML_SetHashSalt16Bytes">XML_SetHashSalt16Bytes</a> 412 </li> 413 414 <li> 415 <a href="#XML_UseForeignDTD">XML_UseForeignDTD</a> 416 </li> 417 418 <li> 419 <a href="#XML_SetReturnNSTriplet">XML_SetReturnNSTriplet</a> 420 </li> 421 422 <li> 423 <a href="#XML_DefaultCurrent">XML_DefaultCurrent</a> 424 </li> 425 426 <li> 427 <a href="#XML_ExpatVersion">XML_ExpatVersion</a> 428 </li> 429 430 <li> 431 <a href="#XML_ExpatVersionInfo">XML_ExpatVersionInfo</a> 432 </li> 433 434 <li> 435 <a href="#XML_GetFeatureList">XML_GetFeatureList</a> 436 </li> 437 438 <li> 439 <a href="#XML_FreeContentModel">XML_FreeContentModel</a> 440 </li> 441 442 <li> 443 <a href="#XML_MemMalloc">XML_MemMalloc</a> 444 </li> 445 446 <li> 447 <a href="#XML_MemRealloc">XML_MemRealloc</a> 448 </li> 449 450 <li> 451 <a href="#XML_MemFree">XML_MemFree</a> 452 </li> 453 </ul> 454 </li> 455 </ul> 456 </li> 457 </ul> 458 459 <hr /> 460 461 <h2> 462 <a id="overview" name="overview">Overview</a> 463 </h2> 464 465 <p> 466 Expat is a stream-oriented parser. You register callback (or handler) functions 467 with the parser and then start feeding it the document. As the parser recognizes 468 parts of the document, it will call the appropriate handler for that part (if 469 you've registered one.) The document is fed to the parser in pieces, so you can 470 start parsing before you have all the document. This also allows you to parse 471 really huge documents that won't fit into memory. 472 </p> 473 474 <p> 475 Expat can be intimidating due to the many kinds of handlers and options you can 476 set. But you only need to learn four functions in order to do 90% of what you'll 477 want to do with it: 478 </p> 479 480 <dl> 481 <dt> 482 <code><a href="#XML_ParserCreate">XML_ParserCreate</a></code> 483 </dt> 484 485 <dd> 486 Create a new parser object. 487 </dd> 488 489 <dt> 490 <code><a href="#XML_SetElementHandler">XML_SetElementHandler</a></code> 491 </dt> 492 493 <dd> 494 Set handlers for start and end tags. 495 </dd> 496 497 <dt> 498 <code><a href= 499 "#XML_SetCharacterDataHandler">XML_SetCharacterDataHandler</a></code> 500 </dt> 501 502 <dd> 503 Set handler for text. 504 </dd> 505 506 <dt> 507 <code><a href="#XML_Parse">XML_Parse</a></code> 508 </dt> 509 510 <dd> 511 Pass a buffer full of document to the parser 512 </dd> 513 </dl> 514 515 <p> 516 These functions and others are described in the <a href= 517 "#reference">reference</a> part of this document. The reference section also 518 describes in detail the parameters passed to the different types of handlers. 519 </p> 520 521 <p> 522 Let's look at a very simple example program that only uses 3 of the above 523 functions (it doesn't need to set a character handler.) The program <a href= 524 "../examples/outline.c">outline.c</a> prints an element outline, indenting child 525 elements to distinguish them from the parent element that contains them. The 526 start handler does all the work. It prints two indenting spaces for every level 527 of ancestor elements, then it prints the element and attribute information. 528 Finally it increments the global <code>Depth</code> variable. 529 </p> 530 531 <pre class="eg"> 532int Depth; 533 534void XMLCALL 535start(void *data, const char *el, const char **attr) { 536 int i; 537 538 for (i = 0; i < Depth; i++) 539 printf(" "); 540 541 printf("%s", el); 542 543 for (i = 0; attr[i]; i += 2) { 544 printf(" %s='%s'", attr[i], attr[i + 1]); 545 } 546 547 printf("\n"); 548 Depth++; 549} /* End of start handler */ 550</pre> 551 <p> 552 The end tag simply does the bookkeeping work of decrementing <code>Depth</code>. 553 </p> 554 555 <pre class="eg"> 556void XMLCALL 557end(void *data, const char *el) { 558 Depth--; 559} /* End of end handler */ 560</pre> 561 <p> 562 Note the <code>XMLCALL</code> annotation used for the callbacks. This is used to 563 ensure that the Expat and the callbacks are using the same calling convention in 564 case the compiler options used for Expat itself and the client code are 565 different. Expat tries not to care what the default calling convention is, though 566 it may require that it be compiled with a default convention of "cdecl" on some 567 platforms. For code which uses Expat, however, the calling convention is 568 specified by the <code>XMLCALL</code> annotation on most platforms; callbacks 569 should be defined using this annotation. 570 </p> 571 572 <p> 573 The <code>XMLCALL</code> annotation was added in Expat 1.95.7, but existing 574 working Expat applications don't need to add it (since they are already using the 575 "cdecl" calling convention, or they wouldn't be working). The annotation is only 576 needed if the default calling convention may be something other than "cdecl". To 577 use the annotation safely with older versions of Expat, you can conditionally 578 define it <em>after</em> including Expat's header file: 579 </p> 580 581 <pre class="eg"> 582#include <expat.h> 583 584#ifndef XMLCALL 585#if defined(_MSC_VER) && !defined(__BEOS__) && !defined(__CYGWIN__) 586#define XMLCALL __cdecl 587#elif defined(__GNUC__) 588#define XMLCALL __attribute__((cdecl)) 589#else 590#define XMLCALL 591#endif 592#endif 593</pre> 594 <p> 595 After creating the parser, the main program just has the job of shoveling the 596 document to the parser so that it can do its work. 597 </p> 598 599 <hr /> 600 601 <h2> 602 <a id="building" name="building">Building and Installing Expat</a> 603 </h2> 604 605 <p> 606 The Expat distribution comes as a compressed (with GNU gzip) tar file. You may 607 download the latest version from <a href= 608 "https://sourceforge.net/projects/expat/">Source Forge</a>. After unpacking this, 609 cd into the directory. Then follow either the Win32 directions or Unix directions 610 below. 611 </p> 612 613 <h3> 614 Building under Win32 615 </h3> 616 617 <p> 618 If you're using the GNU compiler under cygwin, follow the Unix directions in the 619 next section. Otherwise if you have Microsoft's Developer Studio installed, you 620 can use CMake to generate a <code>.sln</code> file, e.g. <code>cmake -G"Visual 621 Studio 17 2022" -DCMAKE_BUILD_TYPE=RelWithDebInfo .</code> , and build Expat 622 using <code>msbuild /m expat.sln</code> after. 623 </p> 624 625 <p> 626 Alternatively, you may download the Win32 binary package that contains the 627 "expat.h" include file and a pre-built DLL. 628 </p> 629 630 <h3> 631 Building under Unix (or GNU) 632 </h3> 633 634 <p> 635 First you'll need to run the configure shell script in order to configure the 636 Makefiles and headers for your system. 637 </p> 638 639 <p> 640 If you're happy with all the defaults that configure picks for you, and you have 641 permission on your system to install into /usr/local, you can install Expat with 642 this sequence of commands: 643 </p> 644 645 <pre class="eg"> 646./configure 647make 648make install 649</pre> 650 <p> 651 There are some options that you can provide to this script, but the only one 652 we'll mention here is the <code>--prefix</code> option. You can find out all the 653 options available by running configure with just the <code>--help</code> option. 654 </p> 655 656 <p> 657 By default, the configure script sets things up so that the library gets 658 installed in <code>/usr/local/lib</code> and the associated header file in 659 <code>/usr/local/include</code>. But if you were to give the option, 660 <code>--prefix=/home/me/mystuff</code>, then the library and header would get 661 installed in <code>/home/me/mystuff/lib</code> and 662 <code>/home/me/mystuff/include</code> respectively. 663 </p> 664 665 <h3> 666 Configuring Expat Using the Pre-Processor 667 </h3> 668 669 <p> 670 Expat's feature set can be configured using a small number of pre-processor 671 definitions. The symbols are: 672 </p> 673 674 <dl class="cpp-symbols"> 675 <dt> 676 <a id="XML_GE" name="XML_GE">XML_GE</a> 677 </dt> 678 679 <dd> 680 Added in Expat 2.6.0. Include support for <a href= 681 "https://www.w3.org/TR/2006/REC-xml-20060816/#sec-physical-struct">general 682 entities</a> (syntax <code>&e1;</code> to reference and syntax 683 <code><!ENTITY e1 'value1'></code> (an internal general entity) or 684 <code><!ENTITY e2 SYSTEM 'file2'></code> (an external general entity) to 685 declare). With <code>XML_GE</code> enabled, general entities will be replaced 686 by their declared replacement text; for this to work for <em>external</em> 687 general entities, in addition an <code><a href= 688 "#XML_SetExternalEntityRefHandler">XML_ExternalEntityRefHandler</a></code> must 689 be set using <code><a href= 690 "#XML_SetExternalEntityRefHandler">XML_SetExternalEntityRefHandler</a></code>. 691 Also, enabling <code>XML_GE</code> makes the functions <code><a href= 692 "#XML_SetBillionLaughsAttackProtectionMaximumAmplification">XML_SetBillionLaughsAttackProtectionMaximumAmplification</a></code> 693 and <code><a href= 694 "#XML_SetBillionLaughsAttackProtectionActivationThreshold">XML_SetBillionLaughsAttackProtectionActivationThreshold</a></code> 695 available.<br /> 696 With <code>XML_GE</code> disabled, Expat has a smaller memory footprint and can 697 be faster, but will not load external general entities and will replace all 698 general entities (except the <a href= 699 "https://www.w3.org/TR/2006/REC-xml-20060816/#sec-predefined-ent">predefined 700 five</a>: <code>amp</code>, <code>apos</code>, <code>gt</code>, 701 <code>lt</code>, <code>quot</code>) with a self-reference: for example, 702 referencing an entity <code>e1</code> via <code>&e1;</code> will be 703 replaced by text <code>&e1;</code>. 704 </dd> 705 706 <dt> 707 <a id="XML_DTD" name="XML_DTD">XML_DTD</a> 708 </dt> 709 710 <dd> 711 Include support for using and reporting DTD-based content. If this is defined, 712 default attribute values from an external DTD subset are reported and attribute 713 value normalization occurs based on the type of attributes defined in the 714 external subset. Without this, Expat has a smaller memory footprint and can be 715 faster, but will not load external parameter entities or process conditional 716 sections. If defined, makes the functions <code><a href= 717 "#XML_SetBillionLaughsAttackProtectionMaximumAmplification">XML_SetBillionLaughsAttackProtectionMaximumAmplification</a></code> 718 and <code><a href= 719 "#XML_SetBillionLaughsAttackProtectionActivationThreshold">XML_SetBillionLaughsAttackProtectionActivationThreshold</a></code> 720 available. 721 </dd> 722 723 <dt> 724 <a id="XML_NS" name="XML_NS">XML_NS</a> 725 </dt> 726 727 <dd> 728 When defined, support for the <cite><a href= 729 "https://www.w3.org/TR/REC-xml-names/">Namespaces in XML</a></cite> 730 specification is included. 731 </dd> 732 733 <dt> 734 <a id="XML_UNICODE" name="XML_UNICODE">XML_UNICODE</a> 735 </dt> 736 737 <dd> 738 When defined, character data reported to the application is encoded in UTF-16 739 using wide characters of the type <code>XML_Char</code>. This is implied if 740 <code>XML_UNICODE_WCHAR_T</code> is defined. 741 </dd> 742 743 <dt> 744 <a id="XML_UNICODE_WCHAR_T" name="XML_UNICODE_WCHAR_T">XML_UNICODE_WCHAR_T</a> 745 </dt> 746 747 <dd> 748 If defined, causes the <code>XML_Char</code> character type to be defined using 749 the <code>wchar_t</code> type; otherwise, <code>unsigned short</code> is used. 750 Defining this implies <code>XML_UNICODE</code>. 751 </dd> 752 753 <dt> 754 <a id="XML_LARGE_SIZE" name="XML_LARGE_SIZE">XML_LARGE_SIZE</a> 755 </dt> 756 757 <dd> 758 If defined, causes the <code>XML_Size</code> and <code>XML_Index</code> integer 759 types to be at least 64 bits in size. This is intended to support processing of 760 very large input streams, where the return values of <code><a href= 761 "#XML_GetCurrentByteIndex">XML_GetCurrentByteIndex</a></code>, <code><a href= 762 "#XML_GetCurrentLineNumber">XML_GetCurrentLineNumber</a></code> and 763 <code><a href= 764 "#XML_GetCurrentColumnNumber">XML_GetCurrentColumnNumber</a></code> could 765 overflow. It may not be supported by all compilers, and is turned off by 766 default. 767 </dd> 768 769 <dt> 770 <a id="XML_CONTEXT_BYTES" name="XML_CONTEXT_BYTES">XML_CONTEXT_BYTES</a> 771 </dt> 772 773 <dd> 774 The number of input bytes of markup context which the parser will ensure are 775 available for reporting via <code><a href= 776 "#XML_GetInputContext">XML_GetInputContext</a></code>. This is normally set to 777 1024, and must be set to a positive integer to enable. If this is set to zero, 778 the input context will not be available and <code><a href= 779 "#XML_GetInputContext">XML_GetInputContext</a></code> will always report 780 <code>NULL</code>. Without this, Expat has a smaller memory footprint and can 781 be faster. 782 </dd> 783 784 <dt> 785 <a id="XML_STATIC" name="XML_STATIC">XML_STATIC</a> 786 </dt> 787 788 <dd> 789 On Windows, this should be set if Expat is going to be linked statically with 790 the code that calls it; this is required to get all the right MSVC magic 791 annotations correct. This is ignored on other platforms. 792 </dd> 793 794 <dt> 795 <a id="XML_ATTR_INFO" name="XML_ATTR_INFO">XML_ATTR_INFO</a> 796 </dt> 797 798 <dd> 799 If defined, makes the additional function <code><a href= 800 "#XML_GetAttributeInfo">XML_GetAttributeInfo</a></code> available for reporting 801 attribute byte offsets. 802 </dd> 803 </dl> 804 805 <hr /> 806 807 <h2> 808 <a id="using" name="using">Using Expat</a> 809 </h2> 810 811 <h3> 812 Compiling and Linking Against Expat 813 </h3> 814 815 <p> 816 Unless you installed Expat in a location not expected by your compiler and 817 linker, all you have to do to use Expat in your programs is to include the Expat 818 header (<code>#include <expat.h></code>) in your files that make calls to 819 it and to tell the linker that it needs to link against the Expat library. On 820 Unix systems, this would usually be done with the <code>-lexpat</code> argument. 821 Otherwise, you'll need to tell the compiler where to look for the Expat header 822 and the linker where to find the Expat library. You may also need to take steps 823 to tell the operating system where to find this library at run time. 824 </p> 825 826 <p> 827 On a Unix-based system, here's what a Makefile might look like when Expat is 828 installed in a standard location: 829 </p> 830 831 <pre class="eg"> 832CC=cc 833LDFLAGS= 834LIBS= -lexpat 835xmlapp: xmlapp.o 836 $(CC) $(LDFLAGS) -o xmlapp xmlapp.o $(LIBS) 837</pre> 838 <p> 839 If you installed Expat in, say, <code>/home/me/mystuff</code>, then the Makefile 840 would look like this: 841 </p> 842 843 <pre class="eg"> 844CC=cc 845CFLAGS= -I/home/me/mystuff/include 846LDFLAGS= 847LIBS= -L/home/me/mystuff/lib -lexpat 848xmlapp: xmlapp.o 849 $(CC) $(LDFLAGS) -o xmlapp xmlapp.o $(LIBS) 850</pre> 851 <p> 852 You'd also have to set the environment variable <code>LD_LIBRARY_PATH</code> to 853 <code>/home/me/mystuff/lib</code> (or to 854 <code>${LD_LIBRARY_PATH}:/home/me/mystuff/lib</code> if LD_LIBRARY_PATH already 855 has some directories in it) in order to run your application. 856 </p> 857 858 <h3> 859 Expat Basics 860 </h3> 861 862 <p> 863 As we saw in the example in the overview, the first step in parsing an XML 864 document with Expat is to create a parser object. There are <a href= 865 "#creation">three functions</a> in the Expat API for creating a parser object. 866 However, only two of these (<code><a href= 867 "#XML_ParserCreate">XML_ParserCreate</a></code> and <code><a href= 868 "#XML_ParserCreateNS">XML_ParserCreateNS</a></code>) can be used for constructing 869 a parser for a top-level document. The object returned by these functions is an 870 opaque pointer (i.e. "expat.h" declares it as void *) to data with further 871 internal structure. In order to free the memory associated with this object you 872 must call <code><a href="#XML_ParserFree">XML_ParserFree</a></code>. Note that if 873 you have provided any <a href="#userdata">user data</a> that gets stored in the 874 parser, then your application is responsible for freeing it prior to calling 875 <code>XML_ParserFree</code>. 876 </p> 877 878 <p> 879 The objects returned by the parser creation functions are good for parsing only 880 one XML document or external parsed entity. If your application needs to parse 881 many XML documents, then it needs to create a parser object for each one. The 882 best way to deal with this is to create a higher level object that contains all 883 the default initialization you want for your parser objects. 884 </p> 885 886 <p> 887 Walking through a document hierarchy with a stream oriented parser will require a 888 good stack mechanism in order to keep track of current context. For instance, to 889 answer the simple question, "What element does this text belong to?" requires a 890 stack, since the parser may have descended into other elements that are children 891 of the current one and has encountered this text on the way out. 892 </p> 893 894 <p> 895 The things you're likely to want to keep on a stack are the currently opened 896 element and it's attributes. You push this information onto the stack in the 897 start handler and you pop it off in the end handler. 898 </p> 899 900 <p> 901 For some tasks, it is sufficient to just keep information on what the depth of 902 the stack is (or would be if you had one.) The outline program shown above 903 presents one example. Another such task would be skipping over a complete 904 element. When you see the start tag for the element you want to skip, you set a 905 skip flag and record the depth at which the element started. When the end tag 906 handler encounters the same depth, the skipped element has ended and the flag may 907 be cleared. If you follow the convention that the root element starts at 1, then 908 you can use the same variable for skip flag and skip depth. 909 </p> 910 911 <pre class="eg"> 912void 913init_info(Parseinfo *info) { 914 info->skip = 0; 915 info->depth = 1; 916 /* Other initializations here */ 917} /* End of init_info */ 918 919void XMLCALL 920rawstart(void *data, const char *el, const char **attr) { 921 Parseinfo *inf = (Parseinfo *) data; 922 923 if (! inf->skip) { 924 if (should_skip(inf, el, attr)) { 925 inf->skip = inf->depth; 926 } 927 else 928 start(inf, el, attr); /* This does rest of start handling */ 929 } 930 931 inf->depth++; 932} /* End of rawstart */ 933 934void XMLCALL 935rawend(void *data, const char *el) { 936 Parseinfo *inf = (Parseinfo *) data; 937 938 inf->depth--; 939 940 if (! inf->skip) 941 end(inf, el); /* This does rest of end handling */ 942 943 if (inf->skip == inf->depth) 944 inf->skip = 0; 945} /* End rawend */ 946</pre> 947 <p> 948 Notice in the above example the difference in how depth is manipulated in the 949 start and end handlers. The end tag handler should be the mirror image of the 950 start tag handler. This is necessary to properly model containment. Since, in the 951 start tag handler, we incremented depth <em>after</em> the main body of start tag 952 code, then in the end handler, we need to manipulate it <em>before</em> the main 953 body. If we'd decided to increment it first thing in the start handler, then we'd 954 have had to decrement it last thing in the end handler. 955 </p> 956 957 <h3 id="userdata"> 958 Communicating between handlers 959 </h3> 960 961 <p> 962 In order to be able to pass information between different handlers without using 963 globals, you'll need to define a data structure to hold the shared variables. You 964 can then tell Expat (with the <code><a href= 965 "#XML_SetUserData">XML_SetUserData</a></code> function) to pass a pointer to this 966 structure to the handlers. This is the first argument received by most handlers. 967 In the <a href="#reference">reference section</a>, an argument to a callback 968 function is named <code>userData</code> and have type <code>void *</code> if the 969 user data is passed; it will have the type <code>XML_Parser</code> if the parser 970 itself is passed. When the parser is passed, the user data may be retrieved using 971 <code><a href="#XML_GetUserData">XML_GetUserData</a></code>. 972 </p> 973 974 <p> 975 One common case where multiple calls to a single handler may need to communicate 976 using an application data structure is the case when content passed to the 977 character data handler (set by <code><a href= 978 "#XML_SetCharacterDataHandler">XML_SetCharacterDataHandler</a></code>) needs to 979 be accumulated. A common first-time mistake with any of the event-oriented 980 interfaces to an XML parser is to expect all the text contained in an element to 981 be reported by a single call to the character data handler. Expat, like many 982 other XML parsers, reports such data as a sequence of calls; there's no way to 983 know when the end of the sequence is reached until a different callback is made. 984 A buffer referenced by the user data structure proves both an effective and 985 convenient place to accumulate character data. 986 </p> 987 <!-- XXX example needed here --> 988 989 <h3> 990 XML Version 991 </h3> 992 993 <p> 994 Expat is an XML 1.0 parser, and as such never complains based on the value of the 995 <code>version</code> pseudo-attribute in the XML declaration, if present. 996 </p> 997 998 <p> 999 If an application needs to check the version number (to support alternate 1000 processing), it should use the <code><a href= 1001 "#XML_SetXmlDeclHandler">XML_SetXmlDeclHandler</a></code> function to set a 1002 handler that uses the information in the XML declaration to determine what to do. 1003 This example shows how to check that only a version number of <code>"1.0"</code> 1004 is accepted: 1005 </p> 1006 1007 <pre class="eg"> 1008static int wrong_version; 1009static XML_Parser parser; 1010 1011static void XMLCALL 1012xmldecl_handler(void *userData, 1013 const XML_Char *version, 1014 const XML_Char *encoding, 1015 int standalone) 1016{ 1017 static const XML_Char Version_1_0[] = {'1', '.', '0', 0}; 1018 1019 int i; 1020 1021 for (i = 0; i < (sizeof(Version_1_0) / sizeof(Version_1_0[0])); ++i) { 1022 if (version[i] != Version_1_0[i]) { 1023 wrong_version = 1; 1024 /* also clear all other handlers: */ 1025 XML_SetCharacterDataHandler(parser, NULL); 1026 ... 1027 return; 1028 } 1029 } 1030 ... 1031} 1032</pre> 1033 <h3> 1034 Namespace Processing 1035 </h3> 1036 1037 <p> 1038 When the parser is created using the <code><a href= 1039 "#XML_ParserCreateNS">XML_ParserCreateNS</a></code>, function, Expat performs 1040 namespace processing. Under namespace processing, Expat consumes 1041 <code>xmlns</code> and <code>xmlns:...</code> attributes, which declare 1042 namespaces for the scope of the element in which they occur. This means that your 1043 start handler will not see these attributes. Your application can still be 1044 informed of these declarations by setting namespace declaration handlers with 1045 <a href= 1046 "#XML_SetNamespaceDeclHandler"><code>XML_SetNamespaceDeclHandler</code></a>. 1047 </p> 1048 1049 <p> 1050 Element type and attribute names that belong to a given namespace are passed to 1051 the appropriate handler in expanded form. By default this expanded form is a 1052 concatenation of the namespace URI, the separator character (which is the 2nd 1053 argument to <code><a href="#XML_ParserCreateNS">XML_ParserCreateNS</a></code>), 1054 and the local name (i.e. the part after the colon). Names with undeclared 1055 prefixes are not well-formed when namespace processing is enabled, and will 1056 trigger an error. Unprefixed attribute names are never expanded, and unprefixed 1057 element names are only expanded when they are in the scope of a default 1058 namespace. 1059 </p> 1060 1061 <p> 1062 However if <code><a href= 1063 "#XML_SetReturnNSTriplet">XML_SetReturnNSTriplet</a></code> has been called with 1064 a non-zero <code>do_nst</code> parameter, then the expanded form for names with 1065 an explicit prefix is a concatenation of: URI, separator, local name, separator, 1066 prefix. 1067 </p> 1068 1069 <p> 1070 You can set handlers for the start of a namespace declaration and for the end of 1071 a scope of a declaration with the <code><a href= 1072 "#XML_SetNamespaceDeclHandler">XML_SetNamespaceDeclHandler</a></code> function. 1073 The StartNamespaceDeclHandler is called prior to the start tag handler and the 1074 EndNamespaceDeclHandler is called after the corresponding end tag that ends the 1075 namespace's scope. The namespace start handler gets passed the prefix and URI for 1076 the namespace. For a default namespace declaration (xmlns='...'), the prefix will 1077 be <code>NULL</code>. The URI will be <code>NULL</code> for the case where the 1078 default namespace is being unset. The namespace end handler just gets the prefix 1079 for the closing scope. 1080 </p> 1081 1082 <p> 1083 These handlers are called for each declaration. So if, for instance, a start tag 1084 had three namespace declarations, then the StartNamespaceDeclHandler would be 1085 called three times before the start tag handler is called, once for each 1086 declaration. 1087 </p> 1088 1089 <h3> 1090 Character Encodings 1091 </h3> 1092 1093 <p> 1094 While XML is based on Unicode, and every XML processor is required to recognized 1095 UTF-8 and UTF-16 (1 and 2 byte encodings of Unicode), other encodings may be 1096 declared in XML documents or entities. For the main document, an XML declaration 1097 may contain an encoding declaration: 1098 </p> 1099 1100 <pre> 1101<?xml version="1.0" encoding="ISO-8859-2"?> 1102</pre> 1103 <p> 1104 External parsed entities may begin with a text declaration, which looks like an 1105 XML declaration with just an encoding declaration: 1106 </p> 1107 1108 <pre> 1109<?xml encoding="Big5"?> 1110</pre> 1111 <p> 1112 With Expat, you may also specify an encoding at the time of creating a parser. 1113 This is useful when the encoding information may come from a source outside the 1114 document itself (like a higher level protocol.) 1115 </p> 1116 1117 <p> 1118 <a id="builtin_encodings" name="builtin_encodings"></a>There are four built-in 1119 encodings in Expat: 1120 </p> 1121 1122 <ul> 1123 <li>UTF-8 1124 </li> 1125 1126 <li>UTF-16 1127 </li> 1128 1129 <li>ISO-8859-1 1130 </li> 1131 1132 <li>US-ASCII 1133 </li> 1134 </ul> 1135 1136 <p> 1137 Anything else discovered in an encoding declaration or in the protocol encoding 1138 specified in the parser constructor, triggers a call to the 1139 <code>UnknownEncodingHandler</code>. This handler gets passed the encoding name 1140 and a pointer to an <code>XML_Encoding</code> data structure. Your handler must 1141 fill in this structure and return <code>XML_STATUS_OK</code> if it knows how to 1142 deal with the encoding. Otherwise the handler should return 1143 <code>XML_STATUS_ERROR</code>. The handler also gets passed a pointer to an 1144 optional application data structure that you may indicate when you set the 1145 handler. 1146 </p> 1147 1148 <p> 1149 Expat places restrictions on character encodings that it can support by filling 1150 in the <code>XML_Encoding</code> structure. include file: 1151 </p> 1152 1153 <ol> 1154 <li>Every ASCII character that can appear in a well-formed XML document must be 1155 represented by a single byte, and that byte must correspond to it's ASCII 1156 encoding (except for the characters $@\^'{}~) 1157 </li> 1158 1159 <li>Characters must be encoded in 4 bytes or less. 1160 </li> 1161 1162 <li>All characters encoded must have Unicode scalar values less than or equal to 1163 65535 (0xFFFF)<em>This does not apply to the built-in support for UTF-16 and 1164 UTF-8</em> 1165 </li> 1166 1167 <li>No character may be encoded by more that one distinct sequence of bytes 1168 </li> 1169 </ol> 1170 1171 <p> 1172 <code>XML_Encoding</code> contains an array of integers that correspond to the 1173 1st byte of an encoding sequence. If the value in the array for a byte is zero or 1174 positive, then the byte is a single byte encoding that encodes the Unicode scalar 1175 value contained in the array. A -1 in this array indicates a malformed byte. If 1176 the value is -2, -3, or -4, then the byte is the beginning of a 2, 3, or 4 byte 1177 sequence respectively. Multi-byte sequences are sent to the convert function 1178 pointed at in the <code>XML_Encoding</code> structure. This function should 1179 return the Unicode scalar value for the sequence or -1 if the sequence is 1180 malformed. 1181 </p> 1182 1183 <p> 1184 One pitfall that novice Expat users are likely to fall into is that although 1185 Expat may accept input in various encodings, the strings that it passes to the 1186 handlers are always encoded in UTF-8 or UTF-16 (depending on how Expat was 1187 compiled). Your application is responsible for any translation of these strings 1188 into other encodings. 1189 </p> 1190 1191 <h3> 1192 Handling External Entity References 1193 </h3> 1194 1195 <p> 1196 Expat does not read or parse external entities directly. Note that any external 1197 DTD is a special case of an external entity. If you've set no 1198 <code>ExternalEntityRefHandler</code>, then external entity references are 1199 silently ignored. Otherwise, it calls your handler with the information needed to 1200 read and parse the external entity. 1201 </p> 1202 1203 <p> 1204 Your handler isn't actually responsible for parsing the entity, but it is 1205 responsible for creating a subsidiary parser with <code><a href= 1206 "#XML_ExternalEntityParserCreate">XML_ExternalEntityParserCreate</a></code> that 1207 will do the job. This returns an instance of <code>XML_Parser</code> that has 1208 handlers and other data structures initialized from the parent parser. You may 1209 then use <code><a href="#XML_Parse">XML_Parse</a></code> or <code><a href= 1210 "#XML_ParseBuffer">XML_ParseBuffer</a></code> calls against this parser. Since 1211 external entities my refer to other external entities, your handler should be 1212 prepared to be called recursively. 1213 </p> 1214 1215 <h3> 1216 Parsing DTDs 1217 </h3> 1218 1219 <p> 1220 In order to parse parameter entities, before starting the parse, you must call 1221 <code><a href="#XML_SetParamEntityParsing">XML_SetParamEntityParsing</a></code> 1222 with one of the following arguments: 1223 </p> 1224 1225 <dl> 1226 <dt> 1227 <code>XML_PARAM_ENTITY_PARSING_NEVER</code> 1228 </dt> 1229 1230 <dd> 1231 Don't parse parameter entities or the external subset 1232 </dd> 1233 1234 <dt> 1235 <code>XML_PARAM_ENTITY_PARSING_UNLESS_STANDALONE</code> 1236 </dt> 1237 1238 <dd> 1239 Parse parameter entities and the external subset unless <code>standalone</code> 1240 was set to "yes" in the XML declaration. 1241 </dd> 1242 1243 <dt> 1244 <code>XML_PARAM_ENTITY_PARSING_ALWAYS</code> 1245 </dt> 1246 1247 <dd> 1248 Always parse parameter entities and the external subset 1249 </dd> 1250 </dl> 1251 1252 <p> 1253 In order to read an external DTD, you also have to set an external entity 1254 reference handler as described above. 1255 </p> 1256 1257 <h3 id="stop-resume"> 1258 Temporarily Stopping Parsing 1259 </h3> 1260 1261 <p> 1262 Expat 1.95.8 introduces a new feature: its now possible to stop parsing 1263 temporarily from within a handler function, even if more data has already been 1264 passed into the parser. Applications for this include 1265 </p> 1266 1267 <ul> 1268 <li>Supporting the <a href="https://www.w3.org/TR/xinclude/">XInclude</a> 1269 specification. 1270 </li> 1271 1272 <li>Delaying further processing until additional information is available from 1273 some other source. 1274 </li> 1275 1276 <li>Adjusting processor load as task priorities shift within an application. 1277 </li> 1278 1279 <li>Stopping parsing completely (simply free or reset the parser instead of 1280 resuming in the outer parsing loop). This can be useful if an application-domain 1281 error is found in the XML being parsed or if the result of the parse is 1282 determined not to be useful after all. 1283 </li> 1284 </ul> 1285 1286 <p> 1287 To take advantage of this feature, the main parsing loop of an application needs 1288 to support this specifically. It cannot be supported with a parsing loop 1289 compatible with Expat 1.95.7 or earlier (though existing loops will continue to 1290 work without supporting the stop/resume feature). 1291 </p> 1292 1293 <p> 1294 An application that uses this feature for a single parser will have the rough 1295 structure (in pseudo-code): 1296 </p> 1297 1298 <pre class="pseudocode"> 1299fd = open_input() 1300p = create_parser() 1301 1302if parse_xml(p, fd) { 1303 /* suspended */ 1304 1305 int suspended = 1; 1306 1307 while (suspended) { 1308 do_something_else() 1309 if ready_to_resume() { 1310 suspended = continue_parsing(p, fd); 1311 } 1312 } 1313} 1314</pre> 1315 <p> 1316 An application that may resume any of several parsers based on input (either from 1317 the XML being parsed or some other source) will certainly have more interesting 1318 control structures. 1319 </p> 1320 1321 <p> 1322 This C function could be used for the <code>parse_xml</code> function mentioned 1323 in the pseudo-code above: 1324 </p> 1325 1326 <pre class="eg"> 1327#define BUFF_SIZE 10240 1328 1329/* Parse a document from the open file descriptor 'fd' until the parse 1330 is complete (the document has been completely parsed, or there's 1331 been an error), or the parse is stopped. Return non-zero when 1332 the parse is merely suspended. 1333*/ 1334int 1335parse_xml(XML_Parser p, int fd) 1336{ 1337 for (;;) { 1338 int last_chunk; 1339 int bytes_read; 1340 enum XML_Status status; 1341 1342 void *buff = XML_GetBuffer(p, BUFF_SIZE); 1343 if (buff == NULL) { 1344 /* handle error... */ 1345 return 0; 1346 } 1347 bytes_read = read(fd, buff, BUFF_SIZE); 1348 if (bytes_read < 0) { 1349 /* handle error... */ 1350 return 0; 1351 } 1352 status = XML_ParseBuffer(p, bytes_read, bytes_read == 0); 1353 switch (status) { 1354 case XML_STATUS_ERROR: 1355 /* handle error... */ 1356 return 0; 1357 case XML_STATUS_SUSPENDED: 1358 return 1; 1359 } 1360 if (bytes_read == 0) 1361 return 0; 1362 } 1363} 1364</pre> 1365 <p> 1366 The corresponding <code>continue_parsing</code> function is somewhat simpler, 1367 since it only need deal with the return code from <code><a href= 1368 "#XML_ResumeParser">XML_ResumeParser</a></code>; it can delegate the input 1369 handling to the <code>parse_xml</code> function: 1370 </p> 1371 1372 <pre class="eg"> 1373/* Continue parsing a document which had been suspended. The 'p' and 1374 'fd' arguments are the same as passed to parse_xml(). Return 1375 non-zero when the parse is suspended. 1376*/ 1377int 1378continue_parsing(XML_Parser p, int fd) 1379{ 1380 enum XML_Status status = XML_ResumeParser(p); 1381 switch (status) { 1382 case XML_STATUS_ERROR: 1383 /* handle error... */ 1384 return 0; 1385 case XML_ERROR_NOT_SUSPENDED: 1386 /* handle error... */ 1387 return 0;. 1388 case XML_STATUS_SUSPENDED: 1389 return 1; 1390 } 1391 return parse_xml(p, fd); 1392} 1393</pre> 1394 <p> 1395 Now that we've seen what a mess the top-level parsing loop can become, what have 1396 we gained? Very simply, we can now use the <code><a href= 1397 "#XML_StopParser">XML_StopParser</a></code> function to stop parsing, without 1398 having to go to great lengths to avoid additional processing that we're expecting 1399 to ignore. As a bonus, we get to stop parsing <em>temporarily</em>, and come back 1400 to it when we're ready. 1401 </p> 1402 1403 <p> 1404 To stop parsing from a handler function, use the <code><a href= 1405 "#XML_StopParser">XML_StopParser</a></code> function. This function takes two 1406 arguments; the parser being stopped and a flag indicating whether the parse can 1407 be resumed in the future. 1408 </p> 1409 <!-- XXX really need more here --> 1410 1411 <hr /> 1412 <!-- ================================================================ --> 1413 1414 <h2> 1415 <a id="reference" name="reference">Expat Reference</a> 1416 </h2> 1417 1418 <h3> 1419 <a id="creation" name="creation">Parser Creation</a> 1420 </h3> 1421 1422 <h4 id="XML_ParserCreate"> 1423 XML_ParserCreate 1424 </h4> 1425 1426 <pre class="fcndec"> 1427XML_Parser XMLCALL 1428XML_ParserCreate(const XML_Char *encoding); 1429</pre> 1430 <div class="fcndef"> 1431 <p> 1432 Construct a new parser. If encoding is non-<code>NULL</code>, it specifies a 1433 character encoding to use for the document. This overrides the document 1434 encoding declaration. There are four built-in encodings: 1435 </p> 1436 1437 <ul> 1438 <li>US-ASCII 1439 </li> 1440 1441 <li>UTF-8 1442 </li> 1443 1444 <li>UTF-16 1445 </li> 1446 1447 <li>ISO-8859-1 1448 </li> 1449 </ul> 1450 1451 <p> 1452 Any other value will invoke a call to the UnknownEncodingHandler. 1453 </p> 1454 </div> 1455 1456 <h4 id="XML_ParserCreateNS"> 1457 XML_ParserCreateNS 1458 </h4> 1459 1460 <pre class="fcndec"> 1461XML_Parser XMLCALL 1462XML_ParserCreateNS(const XML_Char *encoding, 1463 XML_Char sep); 1464</pre> 1465 <div class="fcndef"> 1466 Constructs a new parser that has namespace processing in effect. Namespace 1467 expanded element names and attribute names are returned as a concatenation of the 1468 namespace URI, <em>sep</em>, and the local part of the name. This means that you 1469 should pick a character for <em>sep</em> that can't be part of an URI. Since 1470 Expat does not check namespace URIs for conformance, the only safe choice for a 1471 namespace separator is a character that is illegal in XML. For instance, 1472 <code>'\xFF'</code> is not legal in UTF-8, and <code>'\xFFFF'</code> is not legal 1473 in UTF-16. There is a special case when <em>sep</em> is the null character 1474 <code>'\0'</code>: the namespace URI and the local part will be concatenated 1475 without any separator - this is intended to support RDF processors. It is a 1476 programming error to use the null separator with <a href= 1477 "#XML_SetReturnNSTriplet">namespace triplets</a>. 1478 </div> 1479 1480 <p> 1481 <strong>Note:</strong> Expat does not validate namespace URIs (beyond encoding) 1482 against RFC 3986 today (and is not required to do so with regard to the XML 1.0 1483 namespaces specification) but it may start doing that in future releases. Before 1484 that, an application using Expat must be ready to receive namespace URIs 1485 containing non-URI characters. 1486 </p> 1487 1488 <h4 id="XML_ParserCreate_MM"> 1489 XML_ParserCreate_MM 1490 </h4> 1491 1492 <pre class="fcndec"> 1493XML_Parser XMLCALL 1494XML_ParserCreate_MM(const XML_Char *encoding, 1495 const XML_Memory_Handling_Suite *ms, 1496 const XML_Char *sep); 1497</pre> 1498 1499 <pre class="signature"> 1500typedef struct { 1501 void *(XMLCALL *malloc_fcn)(size_t size); 1502 void *(XMLCALL *realloc_fcn)(void *ptr, size_t size); 1503 void (XMLCALL *free_fcn)(void *ptr); 1504} XML_Memory_Handling_Suite; 1505</pre> 1506 <div class="fcndef"> 1507 <p> 1508 Construct a new parser using the suite of memory handling functions specified 1509 in <code>ms</code>. If <code>ms</code> is <code>NULL</code>, then use the 1510 standard set of memory management functions. If <code>sep</code> is 1511 non-<code>NULL</code>, then namespace processing is enabled in the created 1512 parser and the character pointed at by sep is used as the separator between the 1513 namespace URI and the local part of the name. 1514 </p> 1515 </div> 1516 1517 <h4 id="XML_ExternalEntityParserCreate"> 1518 XML_ExternalEntityParserCreate 1519 </h4> 1520 1521 <pre class="fcndec"> 1522XML_Parser XMLCALL 1523XML_ExternalEntityParserCreate(XML_Parser p, 1524 const XML_Char *context, 1525 const XML_Char *encoding); 1526</pre> 1527 <div class="fcndef"> 1528 <p> 1529 Construct a new <code>XML_Parser</code> object for parsing an external general 1530 entity. Context is the context argument passed in a call to a 1531 ExternalEntityRefHandler. Other state information such as handlers, user data, 1532 namespace processing is inherited from the parser passed as the 1st argument. 1533 So you shouldn't need to call any of the behavior changing functions on this 1534 parser (unless you want it to act differently than the parent parser). 1535 </p> 1536 1537 <p> 1538 <strong>Note:</strong> Please be sure to free subparsers created by 1539 <code><a href= 1540 "#XML_ExternalEntityParserCreate">XML_ExternalEntityParserCreate</a></code> 1541 <em>prior to</em> freeing their related parent parser, as subparsers reference 1542 and use parts of their respective parent parser, internally. Parent parsers 1543 must outlive subparsers. 1544 </p> 1545 </div> 1546 1547 <h4 id="XML_ParserFree"> 1548 XML_ParserFree 1549 </h4> 1550 1551 <pre class="fcndec"> 1552void XMLCALL 1553XML_ParserFree(XML_Parser p); 1554</pre> 1555 <div class="fcndef"> 1556 <p> 1557 Free memory used by the parser. 1558 </p> 1559 1560 <p> 1561 <strong>Note:</strong> Your application is responsible for freeing any memory 1562 associated with <a href="#userdata">user data</a>. 1563 </p> 1564 1565 <p> 1566 <strong>Note:</strong> Please be sure to free subparsers created by 1567 <code><a href= 1568 "#XML_ExternalEntityParserCreate">XML_ExternalEntityParserCreate</a></code> 1569 <em>prior to</em> freeing their related parent parser, as subparsers reference 1570 and use parts of their respective parent parser, internally. Parent parsers 1571 must outlive subparsers. 1572 </p> 1573 </div> 1574 1575 <h4 id="XML_ParserReset"> 1576 XML_ParserReset 1577 </h4> 1578 1579 <pre class="fcndec"> 1580XML_Bool XMLCALL 1581XML_ParserReset(XML_Parser p, 1582 const XML_Char *encoding); 1583</pre> 1584 <div class="fcndef"> 1585 Clean up the memory structures maintained by the parser so that it may be used 1586 again. After this has been called, <code>parser</code> is ready to start parsing 1587 a new document. All handlers are cleared from the parser, except for the 1588 unknownEncodingHandler. The parser's external state is re-initialized except for 1589 the values of ns and ns_triplets. This function may not be used on a parser 1590 created using <code><a href= 1591 "#XML_ExternalEntityParserCreate">XML_ExternalEntityParserCreate</a></code>; it 1592 will return <code>XML_FALSE</code> in that case. Returns <code>XML_TRUE</code> on 1593 success. Your application is responsible for dealing with any memory associated 1594 with <a href="#userdata">user data</a>. 1595 </div> 1596 1597 <h3> 1598 <a id="parsing" name="parsing">Parsing</a> 1599 </h3> 1600 1601 <p> 1602 To state the obvious: the three parsing functions <code><a href= 1603 "#XML_Parse">XML_Parse</a></code>, <code><a href= 1604 "#XML_ParseBuffer">XML_ParseBuffer</a></code> and <code><a href= 1605 "#XML_GetBuffer">XML_GetBuffer</a></code> as well as the two cleanup functions 1606 <code><a href="#XML_ParserFree">XML_ParserFree</a></code> and <code><a href= 1607 "#XML_ParserReset">XML_ParserReset</a></code> must not be called from within a 1608 handler unless they operate on a separate parser instance, that is, one that did 1609 not call the handler. For example, it is OK to call the parsing functions from 1610 within an <code>XML_ExternalEntityRefHandler</code>, if they apply to the parser 1611 created by <code><a href= 1612 "#XML_ExternalEntityParserCreate">XML_ExternalEntityParserCreate</a></code>. 1613 </p> 1614 1615 <p> 1616 Note: The <code>len</code> argument passed to these functions should be 1617 considerably less than the maximum value for an integer, as it could create an 1618 integer overflow situation if the added lengths of a buffer and the unprocessed 1619 portion of the previous buffer exceed the maximum integer value. Input data at 1620 the end of a buffer will remain unprocessed if it is part of an XML token for 1621 which the end is not part of that buffer. 1622 </p> 1623 1624 <p> 1625 <a id="isFinal" name="isFinal"></a>The application <em>must</em> make a 1626 concluding <code><a href="#XML_Parse">XML_Parse</a></code> or <code><a href= 1627 "#XML_ParseBuffer">XML_ParseBuffer</a></code> call with <code>isFinal</code> set 1628 to <code>XML_TRUE</code>. 1629 </p> 1630 1631 <h4 id="XML_Parse"> 1632 XML_Parse 1633 </h4> 1634 1635 <pre class="fcndec"> 1636enum XML_Status XMLCALL 1637XML_Parse(XML_Parser p, 1638 const char *s, 1639 int len, 1640 int isFinal); 1641</pre> 1642 1643 <pre class="signature"> 1644enum XML_Status { 1645 XML_STATUS_ERROR = 0, 1646 XML_STATUS_OK = 1 1647}; 1648</pre> 1649 <div class="fcndef"> 1650 <p> 1651 Parse some more of the document. The string <code>s</code> is a buffer 1652 containing part (or perhaps all) of the document. The number of bytes of s that 1653 are part of the document is indicated by <code>len</code>. This means that 1654 <code>s</code> doesn't have to be null-terminated. It also means that if 1655 <code>len</code> is larger than the number of bytes in the block of memory that 1656 <code>s</code> points at, then a memory fault is likely. Negative values for 1657 <code>len</code> are rejected since Expat 2.2.1. The <code>isFinal</code> 1658 parameter informs the parser that this is the last piece of the document. 1659 Frequently, the last piece is empty (i.e. <code>len</code> is zero.) 1660 </p> 1661 1662 <p> 1663 If a parse error occurred, it returns <code>XML_STATUS_ERROR</code>. Otherwise 1664 it returns <code>XML_STATUS_OK</code> value. Note that regardless of the return 1665 value, there is no guarantee that all provided input has been parsed; only 1666 after <a href="#isFinal">the concluding call</a> will all handler callbacks and 1667 parsing errors have happened. 1668 </p> 1669 1670 <p> 1671 Simplified, <code>XML_Parse</code> can be considered a convenience wrapper that 1672 is pairing calls to <code><a href="#XML_GetBuffer">XML_GetBuffer</a></code> and 1673 <code><a href="#XML_ParseBuffer">XML_ParseBuffer</a></code> (when Expat is 1674 built with macro <code>XML_CONTEXT_BYTES</code> defined to a positive value, 1675 which is both common and default). <code>XML_Parse</code> is then functionally 1676 equivalent to calling <code><a href="#XML_GetBuffer">XML_GetBuffer</a></code>, 1677 <code>memcpy</code>, and <code><a href= 1678 "#XML_ParseBuffer">XML_ParseBuffer</a></code>. 1679 </p> 1680 1681 <p> 1682 To avoid double copying of the input, direct use of functions <code><a href= 1683 "#XML_GetBuffer">XML_GetBuffer</a></code> and <code><a href= 1684 "#XML_ParseBuffer">XML_ParseBuffer</a></code> is advised for most production 1685 use, e.g. if you're using <code>read</code> or similar functionality to fill 1686 your buffers, fill directly into the buffer from <code><a href= 1687 "#XML_GetBuffer">XML_GetBuffer</a></code>, then parse with <code><a href= 1688 "#XML_ParseBuffer">XML_ParseBuffer</a></code>. 1689 </p> 1690 </div> 1691 1692 <h4 id="XML_ParseBuffer"> 1693 XML_ParseBuffer 1694 </h4> 1695 1696 <pre class="fcndec"> 1697enum XML_Status XMLCALL 1698XML_ParseBuffer(XML_Parser p, 1699 int len, 1700 int isFinal); 1701</pre> 1702 <div class="fcndef"> 1703 <p> 1704 This is just like <code><a href="#XML_Parse">XML_Parse</a></code>, except in 1705 this case Expat provides the buffer. By obtaining the buffer from Expat with 1706 the <code><a href="#XML_GetBuffer">XML_GetBuffer</a></code> function, the 1707 application can avoid double copying of the input. 1708 </p> 1709 1710 <p> 1711 Negative values for <code>len</code> are rejected since Expat 2.6.3. 1712 </p> 1713 </div> 1714 1715 <h4 id="XML_GetBuffer"> 1716 XML_GetBuffer 1717 </h4> 1718 1719 <pre class="fcndec"> 1720void * XMLCALL 1721XML_GetBuffer(XML_Parser p, 1722 int len); 1723</pre> 1724 <div class="fcndef"> 1725 Obtain a buffer of size <code>len</code> to read a piece of the document into. A 1726 <code>NULL</code> value is returned if Expat can't allocate enough memory for 1727 this buffer. A <code>NULL</code> value may also be returned if <code>len</code> 1728 is zero. This has to be called prior to every call to <code><a href= 1729 "#XML_ParseBuffer">XML_ParseBuffer</a></code>. A typical use would look like 1730 this: 1731 1732 <pre class="eg"> 1733for (;;) { 1734 int bytes_read; 1735 void *buff = XML_GetBuffer(p, BUFF_SIZE); 1736 if (buff == NULL) { 1737 /* handle error */ 1738 } 1739 1740 bytes_read = read(docfd, buff, BUFF_SIZE); 1741 if (bytes_read < 0) { 1742 /* handle error */ 1743 } 1744 1745 if (! XML_ParseBuffer(p, bytes_read, bytes_read == 0)) { 1746 /* handle parse error */ 1747 } 1748 1749 if (bytes_read == 0) 1750 break; 1751} 1752</pre> 1753 </div> 1754 1755 <h4 id="XML_StopParser"> 1756 XML_StopParser 1757 </h4> 1758 1759 <pre class="fcndec"> 1760enum XML_Status XMLCALL 1761XML_StopParser(XML_Parser p, 1762 XML_Bool resumable); 1763</pre> 1764 <div class="fcndef"> 1765 <p> 1766 Stops parsing, causing <code><a href="#XML_Parse">XML_Parse</a></code> or 1767 <code><a href="#XML_ParseBuffer">XML_ParseBuffer</a></code> to return. Must be 1768 called from within a call-back handler, except when aborting (when 1769 <code>resumable</code> is <code>XML_FALSE</code>) an already suspended parser. 1770 Some call-backs may still follow because they would otherwise get lost, 1771 including 1772 </p> 1773 1774 <ul> 1775 <li>the end element handler for empty elements when stopped in the start 1776 element handler, 1777 </li> 1778 1779 <li>the end namespace declaration handler when stopped in the end element 1780 handler, 1781 </li> 1782 1783 <li>the character data handler when stopped in the character data handler while 1784 making multiple call-backs on a contiguous chunk of characters, 1785 </li> 1786 </ul> 1787 1788 <p> 1789 and possibly others. 1790 </p> 1791 1792 <p> 1793 This can be called from most handlers, including DTD related call-backs, except 1794 when parsing an external parameter entity and <code>resumable</code> is 1795 <code>XML_TRUE</code>. Returns <code>XML_STATUS_OK</code> when successful, 1796 <code>XML_STATUS_ERROR</code> otherwise. The possible error codes are: 1797 </p> 1798 1799 <dl> 1800 <dt> 1801 <code>XML_ERROR_NOT_STARTED</code> 1802 </dt> 1803 1804 <dd> 1805 when stopping or suspending a parser before it has started, added in Expat 1806 2.6.4. 1807 </dd> 1808 1809 <dt> 1810 <code>XML_ERROR_SUSPENDED</code> 1811 </dt> 1812 1813 <dd> 1814 when suspending an already suspended parser. 1815 </dd> 1816 1817 <dt> 1818 <code>XML_ERROR_FINISHED</code> 1819 </dt> 1820 1821 <dd> 1822 when the parser has already finished. 1823 </dd> 1824 1825 <dt> 1826 <code>XML_ERROR_SUSPEND_PE</code> 1827 </dt> 1828 1829 <dd> 1830 when suspending while parsing an external PE. 1831 </dd> 1832 </dl> 1833 1834 <p> 1835 Since the stop/resume feature requires application support in the outer parsing 1836 loop, it is an error to call this function for a parser not being handled 1837 appropriately; see <a href="#stop-resume">Temporarily Stopping Parsing</a> for 1838 more information. 1839 </p> 1840 1841 <p> 1842 When <code>resumable</code> is <code>XML_TRUE</code> then parsing is 1843 <em>suspended</em>, that is, <code><a href="#XML_Parse">XML_Parse</a></code> 1844 and <code><a href="#XML_ParseBuffer">XML_ParseBuffer</a></code> return 1845 <code>XML_STATUS_SUSPENDED</code>. Otherwise, parsing is <em>aborted</em>, that 1846 is, <code><a href="#XML_Parse">XML_Parse</a></code> and <code><a href= 1847 "#XML_ParseBuffer">XML_ParseBuffer</a></code> return 1848 <code>XML_STATUS_ERROR</code> with error code <code>XML_ERROR_ABORTED</code>. 1849 </p> 1850 1851 <p> 1852 <strong>Note:</strong> This will be applied to the current parser instance 1853 only, that is, if there is a parent parser then it will continue parsing when 1854 the external entity reference handler returns. It is up to the implementation 1855 of that handler to call <code><a href= 1856 "#XML_StopParser">XML_StopParser</a></code> on the parent parser (recursively), 1857 if one wants to stop parsing altogether. 1858 </p> 1859 1860 <p> 1861 When suspended, parsing can be resumed by calling <code><a href= 1862 "#XML_ResumeParser">XML_ResumeParser</a></code>. 1863 </p> 1864 1865 <p> 1866 New in Expat 1.95.8. 1867 </p> 1868 </div> 1869 1870 <h4 id="XML_ResumeParser"> 1871 XML_ResumeParser 1872 </h4> 1873 1874 <pre class="fcndec"> 1875enum XML_Status XMLCALL 1876XML_ResumeParser(XML_Parser p); 1877</pre> 1878 <div class="fcndef"> 1879 <p> 1880 Resumes parsing after it has been suspended with <code><a href= 1881 "#XML_StopParser">XML_StopParser</a></code>. Must not be called from within a 1882 handler call-back. Returns same status codes as <code><a href= 1883 "#XML_Parse">XML_Parse</a></code> or <code><a href= 1884 "#XML_ParseBuffer">XML_ParseBuffer</a></code>. An additional error code, 1885 <code>XML_ERROR_NOT_SUSPENDED</code>, will be returned if the parser was not 1886 currently suspended. 1887 </p> 1888 1889 <p> 1890 <strong>Note:</strong> This must be called on the most deeply nested child 1891 parser instance first, and on its parent parser only after the child parser has 1892 finished, to be applied recursively until the document entity's parser is 1893 restarted. That is, the parent parser will not resume by itself and it is up to 1894 the application to call <code><a href= 1895 "#XML_ResumeParser">XML_ResumeParser</a></code> on it at the appropriate 1896 moment. 1897 </p> 1898 1899 <p> 1900 New in Expat 1.95.8. 1901 </p> 1902 </div> 1903 1904 <h4 id="XML_GetParsingStatus"> 1905 XML_GetParsingStatus 1906 </h4> 1907 1908 <pre class="fcndec"> 1909void XMLCALL 1910XML_GetParsingStatus(XML_Parser p, 1911 XML_ParsingStatus *status); 1912</pre> 1913 1914 <pre class="signature"> 1915enum XML_Parsing { 1916 XML_INITIALIZED, 1917 XML_PARSING, 1918 XML_FINISHED, 1919 XML_SUSPENDED 1920}; 1921 1922typedef struct { 1923 enum XML_Parsing parsing; 1924 XML_Bool finalBuffer; 1925} XML_ParsingStatus; 1926</pre> 1927 <div class="fcndef"> 1928 <p> 1929 Returns status of parser with respect to being initialized, parsing, finished, 1930 or suspended, and whether the final buffer is being processed. The 1931 <code>status</code> parameter <em>must not</em> be <code>NULL</code>. 1932 </p> 1933 1934 <p> 1935 New in Expat 1.95.8. 1936 </p> 1937 </div> 1938 1939 <h3> 1940 <a id="setting" name="setting">Handler Setting</a> 1941 </h3> 1942 1943 <p> 1944 Although handlers are typically set prior to parsing and left alone, an 1945 application may choose to set or change the handler for a parsing event while the 1946 parse is in progress. For instance, your application may choose to ignore all 1947 text not descended from a <code>para</code> element. One way it could do this is 1948 to set the character handler when a para start tag is seen, and unset it for the 1949 corresponding end tag. 1950 </p> 1951 1952 <p> 1953 A handler may be <em>unset</em> by providing a <code>NULL</code> pointer to the 1954 appropriate handler setter. None of the handler setting functions have a return 1955 value. 1956 </p> 1957 1958 <p> 1959 Your handlers will be receiving strings in arrays of type <code>XML_Char</code>. 1960 This type is conditionally defined in expat.h as either <code>char</code>, 1961 <code>wchar_t</code> or <code>unsigned short</code>. The former implies UTF-8 1962 encoding, the latter two imply UTF-16 encoding. Note that you'll receive them in 1963 this form independent of the original encoding of the document. 1964 </p> 1965 1966 <div class="handler"> 1967 <h4 id="XML_SetStartElementHandler"> 1968 XML_SetStartElementHandler 1969 </h4> 1970 1971 <pre class="setter"> 1972void XMLCALL 1973XML_SetStartElementHandler(XML_Parser p, 1974 XML_StartElementHandler start); 1975</pre> 1976 1977 <pre class="signature"> 1978typedef void 1979(XMLCALL *XML_StartElementHandler)(void *userData, 1980 const XML_Char *name, 1981 const XML_Char **atts); 1982</pre> 1983 <p> 1984 Set handler for start (and empty) tags. Attributes are passed to the start 1985 handler as a pointer to a vector of char pointers. Each attribute seen in a 1986 start (or empty) tag occupies 2 consecutive places in this vector: the 1987 attribute name followed by the attribute value. These pairs are terminated by a 1988 <code>NULL</code> pointer. 1989 </p> 1990 1991 <p> 1992 Note that an empty tag generates a call to both start and end handlers (in that 1993 order). 1994 </p> 1995 </div> 1996 1997 <div class="handler"> 1998 <h4 id="XML_SetEndElementHandler"> 1999 XML_SetEndElementHandler 2000 </h4> 2001 2002 <pre class="setter"> 2003void XMLCALL 2004XML_SetEndElementHandler(XML_Parser p, 2005 XML_EndElementHandler); 2006</pre> 2007 2008 <pre class="signature"> 2009typedef void 2010(XMLCALL *XML_EndElementHandler)(void *userData, 2011 const XML_Char *name); 2012</pre> 2013 <p> 2014 Set handler for end (and empty) tags. As noted above, an empty tag generates a 2015 call to both start and end handlers. 2016 </p> 2017 </div> 2018 2019 <div class="handler"> 2020 <h4 id="XML_SetElementHandler"> 2021 XML_SetElementHandler 2022 </h4> 2023 2024 <pre class="setter"> 2025void XMLCALL 2026XML_SetElementHandler(XML_Parser p, 2027 XML_StartElementHandler start, 2028 XML_EndElementHandler end); 2029</pre> 2030 <p> 2031 Set handlers for start and end tags with one call. 2032 </p> 2033 </div> 2034 2035 <div class="handler"> 2036 <h4 id="XML_SetCharacterDataHandler"> 2037 XML_SetCharacterDataHandler 2038 </h4> 2039 2040 <pre class="setter"> 2041void XMLCALL 2042XML_SetCharacterDataHandler(XML_Parser p, 2043 XML_CharacterDataHandler charhndl) 2044</pre> 2045 2046 <pre class="signature"> 2047typedef void 2048(XMLCALL *XML_CharacterDataHandler)(void *userData, 2049 const XML_Char *s, 2050 int len); 2051</pre> 2052 <p> 2053 Set a text handler. The string your handler receives is <em>NOT 2054 null-terminated</em>. You have to use the length argument to deal with the end 2055 of the string. A single block of contiguous text free of markup may still 2056 result in a sequence of calls to this handler. In other words, if you're 2057 searching for a pattern in the text, it may be split across calls to this 2058 handler. Note: Setting this handler to <code>NULL</code> may <em>NOT 2059 immediately</em> terminate call-backs if the parser is currently processing 2060 such a single block of contiguous markup-free text, as the parser will continue 2061 calling back until the end of the block is reached. 2062 </p> 2063 </div> 2064 2065 <div class="handler"> 2066 <h4 id="XML_SetProcessingInstructionHandler"> 2067 XML_SetProcessingInstructionHandler 2068 </h4> 2069 2070 <pre class="setter"> 2071void XMLCALL 2072XML_SetProcessingInstructionHandler(XML_Parser p, 2073 XML_ProcessingInstructionHandler proc) 2074</pre> 2075 2076 <pre class="signature"> 2077typedef void 2078(XMLCALL *XML_ProcessingInstructionHandler)(void *userData, 2079 const XML_Char *target, 2080 const XML_Char *data); 2081 2082</pre> 2083 <p> 2084 Set a handler for processing instructions. The target is the first word in the 2085 processing instruction. The data is the rest of the characters in it after 2086 skipping all whitespace after the initial word. 2087 </p> 2088 </div> 2089 2090 <div class="handler"> 2091 <h4 id="XML_SetCommentHandler"> 2092 XML_SetCommentHandler 2093 </h4> 2094 2095 <pre class="setter"> 2096void XMLCALL 2097XML_SetCommentHandler(XML_Parser p, 2098 XML_CommentHandler cmnt) 2099</pre> 2100 2101 <pre class="signature"> 2102typedef void 2103(XMLCALL *XML_CommentHandler)(void *userData, 2104 const XML_Char *data); 2105</pre> 2106 <p> 2107 Set a handler for comments. The data is all text inside the comment delimiters. 2108 </p> 2109 </div> 2110 2111 <div class="handler"> 2112 <h4 id="XML_SetStartCdataSectionHandler"> 2113 XML_SetStartCdataSectionHandler 2114 </h4> 2115 2116 <pre class="setter"> 2117void XMLCALL 2118XML_SetStartCdataSectionHandler(XML_Parser p, 2119 XML_StartCdataSectionHandler start); 2120</pre> 2121 2122 <pre class="signature"> 2123typedef void 2124(XMLCALL *XML_StartCdataSectionHandler)(void *userData); 2125</pre> 2126 <p> 2127 Set a handler that gets called at the beginning of a CDATA section. 2128 </p> 2129 </div> 2130 2131 <div class="handler"> 2132 <h4 id="XML_SetEndCdataSectionHandler"> 2133 XML_SetEndCdataSectionHandler 2134 </h4> 2135 2136 <pre class="setter"> 2137void XMLCALL 2138XML_SetEndCdataSectionHandler(XML_Parser p, 2139 XML_EndCdataSectionHandler end); 2140</pre> 2141 2142 <pre class="signature"> 2143typedef void 2144(XMLCALL *XML_EndCdataSectionHandler)(void *userData); 2145</pre> 2146 <p> 2147 Set a handler that gets called at the end of a CDATA section. 2148 </p> 2149 </div> 2150 2151 <div class="handler"> 2152 <h4 id="XML_SetCdataSectionHandler"> 2153 XML_SetCdataSectionHandler 2154 </h4> 2155 2156 <pre class="setter"> 2157void XMLCALL 2158XML_SetCdataSectionHandler(XML_Parser p, 2159 XML_StartCdataSectionHandler start, 2160 XML_EndCdataSectionHandler end) 2161</pre> 2162 <p> 2163 Sets both CDATA section handlers with one call. 2164 </p> 2165 </div> 2166 2167 <div class="handler"> 2168 <h4 id="XML_SetDefaultHandler"> 2169 XML_SetDefaultHandler 2170 </h4> 2171 2172 <pre class="setter"> 2173void XMLCALL 2174XML_SetDefaultHandler(XML_Parser p, 2175 XML_DefaultHandler hndl) 2176</pre> 2177 2178 <pre class="signature"> 2179typedef void 2180(XMLCALL *XML_DefaultHandler)(void *userData, 2181 const XML_Char *s, 2182 int len); 2183</pre> 2184 <p> 2185 Sets a handler for any characters in the document which wouldn't otherwise be 2186 handled. This includes both data for which no handlers can be set (like some 2187 kinds of DTD declarations) and data which could be reported but which currently 2188 has no handler set. The characters are passed exactly as they were present in 2189 the XML document except that they will be encoded in UTF-8 or UTF-16. Line 2190 boundaries are not normalized. Note that a byte order mark character is not 2191 passed to the default handler. There are no guarantees about how characters are 2192 divided between calls to the default handler: for example, a comment might be 2193 split between multiple calls. Setting the handler with this call has the side 2194 effect of turning off expansion of references to internally defined general 2195 entities. Instead these references are passed to the default handler. 2196 </p> 2197 2198 <p> 2199 See also <code><a href="#XML_DefaultCurrent">XML_DefaultCurrent</a></code>. 2200 </p> 2201 </div> 2202 2203 <div class="handler"> 2204 <h4 id="XML_SetDefaultHandlerExpand"> 2205 XML_SetDefaultHandlerExpand 2206 </h4> 2207 2208 <pre class="setter"> 2209void XMLCALL 2210XML_SetDefaultHandlerExpand(XML_Parser p, 2211 XML_DefaultHandler hndl) 2212</pre> 2213 2214 <pre class="signature"> 2215typedef void 2216(XMLCALL *XML_DefaultHandler)(void *userData, 2217 const XML_Char *s, 2218 int len); 2219</pre> 2220 <p> 2221 This sets a default handler, but doesn't inhibit the expansion of internal 2222 entity references. The entity reference will not be passed to the default 2223 handler. 2224 </p> 2225 2226 <p> 2227 See also <code><a href="#XML_DefaultCurrent">XML_DefaultCurrent</a></code>. 2228 </p> 2229 </div> 2230 2231 <div class="handler"> 2232 <h4 id="XML_SetExternalEntityRefHandler"> 2233 XML_SetExternalEntityRefHandler 2234 </h4> 2235 2236 <pre class="setter"> 2237void XMLCALL 2238XML_SetExternalEntityRefHandler(XML_Parser p, 2239 XML_ExternalEntityRefHandler hndl) 2240</pre> 2241 2242 <pre class="signature"> 2243typedef int 2244(XMLCALL *XML_ExternalEntityRefHandler)(XML_Parser p, 2245 const XML_Char *context, 2246 const XML_Char *base, 2247 const XML_Char *systemId, 2248 const XML_Char *publicId); 2249</pre> 2250 <p> 2251 Set an external entity reference handler. This handler is also called for 2252 processing an external DTD subset if parameter entity parsing is in effect. 2253 (See <a href= 2254 "#XML_SetParamEntityParsing"><code>XML_SetParamEntityParsing</code></a>.) 2255 </p> 2256 2257 <p> 2258 <strong>Warning:</strong> Using an external entity reference handler can lead 2259 to <a href="https://libexpat.github.io/doc/xml-security/#external-entities">XXE 2260 vulnerabilities</a>. It should only be used in applications that do not parse 2261 untrusted XML input. 2262 </p> 2263 2264 <p> 2265 The <code>context</code> parameter specifies the parsing context in the format 2266 expected by the <code>context</code> argument to <code><a href= 2267 "#XML_ExternalEntityParserCreate">XML_ExternalEntityParserCreate</a></code>. 2268 <code>code</code> is valid only until the handler returns, so if the referenced 2269 entity is to be parsed later, it must be copied. <code>context</code> is 2270 <code>NULL</code> only when the entity is a parameter entity, which is how one 2271 can differentiate between general and parameter entities. 2272 </p> 2273 2274 <p> 2275 The <code>base</code> parameter is the base to use for relative system 2276 identifiers. It is set by <code><a href="#XML_SetBase">XML_SetBase</a></code> 2277 and may be <code>NULL</code>. The <code>publicId</code> parameter is the public 2278 id given in the entity declaration and may be <code>NULL</code>. 2279 <code>systemId</code> is the system identifier specified in the entity 2280 declaration and is never <code>NULL</code>. 2281 </p> 2282 2283 <p> 2284 There are a couple of ways in which this handler differs from others. First, 2285 this handler returns a status indicator (an integer). 2286 <code>XML_STATUS_OK</code> should be returned for successful handling of the 2287 external entity reference. Returning <code>XML_STATUS_ERROR</code> indicates 2288 failure, and causes the calling parser to return an 2289 <code>XML_ERROR_EXTERNAL_ENTITY_HANDLING</code> error. 2290 </p> 2291 2292 <p> 2293 Second, instead of having the user data as its first argument, it receives the 2294 parser that encountered the entity reference. This, along with the context 2295 parameter, may be used as arguments to a call to <code><a href= 2296 "#XML_ExternalEntityParserCreate">XML_ExternalEntityParserCreate</a></code>. 2297 Using the returned parser, the body of the external entity can be recursively 2298 parsed. 2299 </p> 2300 2301 <p> 2302 Since this handler may be called recursively, it should not be saving 2303 information into global or static variables. 2304 </p> 2305 </div> 2306 2307 <h4 id="XML_SetExternalEntityRefHandlerArg"> 2308 XML_SetExternalEntityRefHandlerArg 2309 </h4> 2310 2311 <pre class="fcndec"> 2312void XMLCALL 2313XML_SetExternalEntityRefHandlerArg(XML_Parser p, 2314 void *arg) 2315</pre> 2316 <div class="fcndef"> 2317 <p> 2318 Set the argument passed to the ExternalEntityRefHandler. If <code>arg</code> is 2319 not <code>NULL</code>, it is the new value passed to the handler set using 2320 <code><a href= 2321 "#XML_SetExternalEntityRefHandler">XML_SetExternalEntityRefHandler</a></code>; 2322 if <code>arg</code> is <code>NULL</code>, the argument passed to the handler 2323 function will be the parser object itself. 2324 </p> 2325 2326 <p> 2327 <strong>Note:</strong> The type of <code>arg</code> and the type of the first 2328 argument to the ExternalEntityRefHandler do not match. This function takes a 2329 <code>void *</code> to be passed to the handler, while the handler accepts an 2330 <code>XML_Parser</code>. This is a historical accident, but will not be 2331 corrected before Expat 2.0 (at the earliest) to avoid causing compiler warnings 2332 for code that's known to work with this API. It is the responsibility of the 2333 application code to know the actual type of the argument passed to the handler 2334 and to manage it properly. 2335 </p> 2336 </div> 2337 2338 <div class="handler"> 2339 <h4 id="XML_SetSkippedEntityHandler"> 2340 XML_SetSkippedEntityHandler 2341 </h4> 2342 2343 <pre class="setter"> 2344void XMLCALL 2345XML_SetSkippedEntityHandler(XML_Parser p, 2346 XML_SkippedEntityHandler handler) 2347</pre> 2348 2349 <pre class="signature"> 2350typedef void 2351(XMLCALL *XML_SkippedEntityHandler)(void *userData, 2352 const XML_Char *entityName, 2353 int is_parameter_entity); 2354</pre> 2355 <p> 2356 Set a skipped entity handler. This is called in two situations: 2357 </p> 2358 2359 <ol> 2360 <li>An entity reference is encountered for which no declaration has been read 2361 <em>and</em> this is not an error. 2362 </li> 2363 2364 <li>An internal entity reference is read, but not expanded, because <a href= 2365 "#XML_SetDefaultHandler"><code>XML_SetDefaultHandler</code></a> has been 2366 called. 2367 </li> 2368 </ol> 2369 2370 <p> 2371 The <code>is_parameter_entity</code> argument will be non-zero for a parameter 2372 entity and zero for a general entity. 2373 </p> 2374 2375 <p> 2376 Note: Skipped parameter entities in declarations and skipped general entities 2377 in attribute values cannot be reported, because the event would be out of sync 2378 with the reporting of the declarations or attribute values 2379 </p> 2380 </div> 2381 2382 <div class="handler"> 2383 <h4 id="XML_SetUnknownEncodingHandler"> 2384 XML_SetUnknownEncodingHandler 2385 </h4> 2386 2387 <pre class="setter"> 2388void XMLCALL 2389XML_SetUnknownEncodingHandler(XML_Parser p, 2390 XML_UnknownEncodingHandler enchandler, 2391 void *encodingHandlerData) 2392</pre> 2393 2394 <pre class="signature"> 2395typedef int 2396(XMLCALL *XML_UnknownEncodingHandler)(void *encodingHandlerData, 2397 const XML_Char *name, 2398 XML_Encoding *info); 2399 2400typedef struct { 2401 int map[256]; 2402 void *data; 2403 int (XMLCALL *convert)(void *data, const char *s); 2404 void (XMLCALL *release)(void *data); 2405} XML_Encoding; 2406</pre> 2407 <p> 2408 Set a handler to deal with encodings other than the <a href= 2409 "#builtin_encodings">built in set</a>. This should be done before 2410 <code><a href="#XML_Parse">XML_Parse</a></code> or <code><a href= 2411 "#XML_ParseBuffer">XML_ParseBuffer</a></code> have been called on the given 2412 parser. 2413 </p> 2414 2415 <p> 2416 If the handler knows how to deal with an encoding with the given name, it 2417 should fill in the <code>info</code> data structure and return 2418 <code>XML_STATUS_OK</code>. Otherwise it should return 2419 <code>XML_STATUS_ERROR</code>. The handler will be called at most once per 2420 parsed (external) entity. The optional application data pointer 2421 <code>encodingHandlerData</code> will be passed back to the handler. 2422 </p> 2423 2424 <p> 2425 The map array contains information for every possible leading byte in a byte 2426 sequence. If the corresponding value is >= 0, then it's a single byte 2427 sequence and the byte encodes that Unicode value. If the value is -1, then that 2428 byte is invalid as the initial byte in a sequence. If the value is -n, where n 2429 is an integer > 1, then n is the number of bytes in the sequence and the 2430 actual conversion is accomplished by a call to the function pointed at by 2431 convert. This function may return -1 if the sequence itself is invalid. The 2432 convert pointer may be <code>NULL</code> if there are only single byte codes. 2433 The data parameter passed to the convert function is the data pointer from 2434 <code>XML_Encoding</code>. The string s is <em>NOT</em> null-terminated and 2435 points at the sequence of bytes to be converted. 2436 </p> 2437 2438 <p> 2439 The function pointed at by <code>release</code> is called by the parser when it 2440 is finished with the encoding. It may be <code>NULL</code>. 2441 </p> 2442 </div> 2443 2444 <div class="handler"> 2445 <h4 id="XML_SetStartNamespaceDeclHandler"> 2446 XML_SetStartNamespaceDeclHandler 2447 </h4> 2448 2449 <pre class="setter"> 2450void XMLCALL 2451XML_SetStartNamespaceDeclHandler(XML_Parser p, 2452 XML_StartNamespaceDeclHandler start); 2453</pre> 2454 2455 <pre class="signature"> 2456typedef void 2457(XMLCALL *XML_StartNamespaceDeclHandler)(void *userData, 2458 const XML_Char *prefix, 2459 const XML_Char *uri); 2460</pre> 2461 <p> 2462 Set a handler to be called when a namespace is declared. Namespace declarations 2463 occur inside start tags. But the namespace declaration start handler is called 2464 before the start tag handler for each namespace declared in that start tag. 2465 </p> 2466 </div> 2467 2468 <div class="handler"> 2469 <h4 id="XML_SetEndNamespaceDeclHandler"> 2470 XML_SetEndNamespaceDeclHandler 2471 </h4> 2472 2473 <pre class="setter"> 2474void XMLCALL 2475XML_SetEndNamespaceDeclHandler(XML_Parser p, 2476 XML_EndNamespaceDeclHandler end); 2477</pre> 2478 2479 <pre class="signature"> 2480typedef void 2481(XMLCALL *XML_EndNamespaceDeclHandler)(void *userData, 2482 const XML_Char *prefix); 2483</pre> 2484 <p> 2485 Set a handler to be called when leaving the scope of a namespace declaration. 2486 This will be called, for each namespace declaration, after the handler for the 2487 end tag of the element in which the namespace was declared. 2488 </p> 2489 </div> 2490 2491 <div class="handler"> 2492 <h4 id="XML_SetNamespaceDeclHandler"> 2493 XML_SetNamespaceDeclHandler 2494 </h4> 2495 2496 <pre class="setter"> 2497void XMLCALL 2498XML_SetNamespaceDeclHandler(XML_Parser p, 2499 XML_StartNamespaceDeclHandler start, 2500 XML_EndNamespaceDeclHandler end) 2501</pre> 2502 <p> 2503 Sets both namespace declaration handlers with a single call. 2504 </p> 2505 </div> 2506 2507 <div class="handler"> 2508 <h4 id="XML_SetXmlDeclHandler"> 2509 XML_SetXmlDeclHandler 2510 </h4> 2511 2512 <pre class="setter"> 2513void XMLCALL 2514XML_SetXmlDeclHandler(XML_Parser p, 2515 XML_XmlDeclHandler xmldecl); 2516</pre> 2517 2518 <pre class="signature"> 2519typedef void 2520(XMLCALL *XML_XmlDeclHandler)(void *userData, 2521 const XML_Char *version, 2522 const XML_Char *encoding, 2523 int standalone); 2524</pre> 2525 <p> 2526 Sets a handler that is called for XML declarations and also for text 2527 declarations discovered in external entities. The way to distinguish is that 2528 the <code>version</code> parameter will be <code>NULL</code> for text 2529 declarations. The <code>encoding</code> parameter may be <code>NULL</code> for 2530 an XML declaration. The <code>standalone</code> argument will contain -1, 0, or 2531 1 indicating respectively that there was no standalone parameter in the 2532 declaration, that it was given as no, or that it was given as yes. 2533 </p> 2534 </div> 2535 2536 <div class="handler"> 2537 <h4 id="XML_SetStartDoctypeDeclHandler"> 2538 XML_SetStartDoctypeDeclHandler 2539 </h4> 2540 2541 <pre class="setter"> 2542void XMLCALL 2543XML_SetStartDoctypeDeclHandler(XML_Parser p, 2544 XML_StartDoctypeDeclHandler start); 2545</pre> 2546 2547 <pre class="signature"> 2548typedef void 2549(XMLCALL *XML_StartDoctypeDeclHandler)(void *userData, 2550 const XML_Char *doctypeName, 2551 const XML_Char *sysid, 2552 const XML_Char *pubid, 2553 int has_internal_subset); 2554</pre> 2555 <p> 2556 Set a handler that is called at the start of a DOCTYPE declaration, before any 2557 external or internal subset is parsed. Both <code>sysid</code> and 2558 <code>pubid</code> may be <code>NULL</code>. The 2559 <code>has_internal_subset</code> will be non-zero if the DOCTYPE declaration 2560 has an internal subset. 2561 </p> 2562 </div> 2563 2564 <div class="handler"> 2565 <h4 id="XML_SetEndDoctypeDeclHandler"> 2566 XML_SetEndDoctypeDeclHandler 2567 </h4> 2568 2569 <pre class="setter"> 2570void XMLCALL 2571XML_SetEndDoctypeDeclHandler(XML_Parser p, 2572 XML_EndDoctypeDeclHandler end); 2573</pre> 2574 2575 <pre class="signature"> 2576typedef void 2577(XMLCALL *XML_EndDoctypeDeclHandler)(void *userData); 2578</pre> 2579 <p> 2580 Set a handler that is called at the end of a DOCTYPE declaration, after parsing 2581 any external subset. 2582 </p> 2583 </div> 2584 2585 <div class="handler"> 2586 <h4 id="XML_SetDoctypeDeclHandler"> 2587 XML_SetDoctypeDeclHandler 2588 </h4> 2589 2590 <pre class="setter"> 2591void XMLCALL 2592XML_SetDoctypeDeclHandler(XML_Parser p, 2593 XML_StartDoctypeDeclHandler start, 2594 XML_EndDoctypeDeclHandler end); 2595</pre> 2596 <p> 2597 Set both doctype handlers with one call. 2598 </p> 2599 </div> 2600 2601 <div class="handler"> 2602 <h4 id="XML_SetElementDeclHandler"> 2603 XML_SetElementDeclHandler 2604 </h4> 2605 2606 <pre class="setter"> 2607void XMLCALL 2608XML_SetElementDeclHandler(XML_Parser p, 2609 XML_ElementDeclHandler eldecl); 2610</pre> 2611 2612 <pre class="signature"> 2613typedef void 2614(XMLCALL *XML_ElementDeclHandler)(void *userData, 2615 const XML_Char *name, 2616 XML_Content *model); 2617</pre> 2618 2619 <pre class="signature"> 2620enum XML_Content_Type { 2621 XML_CTYPE_EMPTY = 1, 2622 XML_CTYPE_ANY, 2623 XML_CTYPE_MIXED, 2624 XML_CTYPE_NAME, 2625 XML_CTYPE_CHOICE, 2626 XML_CTYPE_SEQ 2627}; 2628 2629enum XML_Content_Quant { 2630 XML_CQUANT_NONE, 2631 XML_CQUANT_OPT, 2632 XML_CQUANT_REP, 2633 XML_CQUANT_PLUS 2634}; 2635 2636typedef struct XML_cp XML_Content; 2637 2638struct XML_cp { 2639 enum XML_Content_Type type; 2640 enum XML_Content_Quant quant; 2641 const XML_Char * name; 2642 unsigned int numchildren; 2643 XML_Content * children; 2644}; 2645</pre> 2646 <p> 2647 Sets a handler for element declarations in a DTD. The handler gets called with 2648 the name of the element in the declaration and a pointer to a structure that 2649 contains the element model. It's the user code's responsibility to free model 2650 when finished with via a call to <code><a href= 2651 "#XML_FreeContentModel">XML_FreeContentModel</a></code>. There is no need to 2652 free the model from the handler, it can be kept around and freed at a later 2653 stage. 2654 </p> 2655 2656 <p> 2657 The <code>model</code> argument is the root of a tree of 2658 <code>XML_Content</code> nodes. If <code>type</code> equals 2659 <code>XML_CTYPE_EMPTY</code> or <code>XML_CTYPE_ANY</code>, then 2660 <code>quant</code> will be <code>XML_CQUANT_NONE</code>, and the other fields 2661 will be zero or <code>NULL</code>. If <code>type</code> is 2662 <code>XML_CTYPE_MIXED</code>, then <code>quant</code> will be 2663 <code>XML_CQUANT_NONE</code> or <code>XML_CQUANT_REP</code> and 2664 <code>numchildren</code> will contain the number of elements that are allowed 2665 to be mixed in and <code>children</code> points to an array of 2666 <code>XML_Content</code> structures that will all have type XML_CTYPE_NAME with 2667 no quantification. Only the root node can be type <code>XML_CTYPE_EMPTY</code>, 2668 <code>XML_CTYPE_ANY</code>, or <code>XML_CTYPE_MIXED</code>. 2669 </p> 2670 2671 <p> 2672 For type <code>XML_CTYPE_NAME</code>, the <code>name</code> field points to the 2673 name and the <code>numchildren</code> and <code>children</code> fields will be 2674 zero and <code>NULL</code>. The <code>quant</code> field will indicate any 2675 quantifiers placed on the name. 2676 </p> 2677 2678 <p> 2679 Types <code>XML_CTYPE_CHOICE</code> and <code>XML_CTYPE_SEQ</code> indicate a 2680 choice or sequence respectively. The <code>numchildren</code> field indicates 2681 how many nodes in the choice or sequence and <code>children</code> points to 2682 the nodes. 2683 </p> 2684 </div> 2685 2686 <div class="handler"> 2687 <h4 id="XML_SetAttlistDeclHandler"> 2688 XML_SetAttlistDeclHandler 2689 </h4> 2690 2691 <pre class="setter"> 2692void XMLCALL 2693XML_SetAttlistDeclHandler(XML_Parser p, 2694 XML_AttlistDeclHandler attdecl); 2695</pre> 2696 2697 <pre class="signature"> 2698typedef void 2699(XMLCALL *XML_AttlistDeclHandler)(void *userData, 2700 const XML_Char *elname, 2701 const XML_Char *attname, 2702 const XML_Char *att_type, 2703 const XML_Char *dflt, 2704 int isrequired); 2705</pre> 2706 <p> 2707 Set a handler for attlist declarations in the DTD. This handler is called for 2708 <em>each</em> attribute. So a single attlist declaration with multiple 2709 attributes declared will generate multiple calls to this handler. The 2710 <code>elname</code> parameter returns the name of the element for which the 2711 attribute is being declared. The attribute name is in the <code>attname</code> 2712 parameter. The attribute type is in the <code>att_type</code> parameter. It is 2713 the string representing the type in the declaration with whitespace removed. 2714 </p> 2715 2716 <p> 2717 The <code>dflt</code> parameter holds the default value. It will be 2718 <code>NULL</code> in the case of "#IMPLIED" or "#REQUIRED" attributes. You can 2719 distinguish these two cases by checking the <code>isrequired</code> parameter, 2720 which will be true in the case of "#REQUIRED" attributes. Attributes which are 2721 "#FIXED" will have also have a true <code>isrequired</code>, but they will have 2722 the non-<code>NULL</code> fixed value in the <code>dflt</code> parameter. 2723 </p> 2724 </div> 2725 2726 <div class="handler"> 2727 <h4 id="XML_SetEntityDeclHandler"> 2728 XML_SetEntityDeclHandler 2729 </h4> 2730 2731 <pre class="setter"> 2732void XMLCALL 2733XML_SetEntityDeclHandler(XML_Parser p, 2734 XML_EntityDeclHandler handler); 2735</pre> 2736 2737 <pre class="signature"> 2738typedef void 2739(XMLCALL *XML_EntityDeclHandler)(void *userData, 2740 const XML_Char *entityName, 2741 int is_parameter_entity, 2742 const XML_Char *value, 2743 int value_length, 2744 const XML_Char *base, 2745 const XML_Char *systemId, 2746 const XML_Char *publicId, 2747 const XML_Char *notationName); 2748</pre> 2749 <p> 2750 Sets a handler that will be called for all entity declarations. The 2751 <code>is_parameter_entity</code> argument will be non-zero in the case of 2752 parameter entities and zero otherwise. 2753 </p> 2754 2755 <p> 2756 For internal entities (<code><!ENTITY foo "bar"></code>), 2757 <code>value</code> will be non-<code>NULL</code> and <code>systemId</code>, 2758 <code>publicId</code>, and <code>notationName</code> will all be 2759 <code>NULL</code>. The value string is <em>not</em> null-terminated; the length 2760 is provided in the <code>value_length</code> parameter. Do not use 2761 <code>value_length</code> to test for internal entities, since it is legal to 2762 have zero-length values. Instead check for whether or not <code>value</code> is 2763 <code>NULL</code>. 2764 </p> 2765 2766 <p> 2767 The <code>notationName</code> argument will have a non-<code>NULL</code> value 2768 only for unparsed entity declarations. 2769 </p> 2770 </div> 2771 2772 <div class="handler"> 2773 <h4 id="XML_SetUnparsedEntityDeclHandler"> 2774 XML_SetUnparsedEntityDeclHandler 2775 </h4> 2776 2777 <pre class="setter"> 2778void XMLCALL 2779XML_SetUnparsedEntityDeclHandler(XML_Parser p, 2780 XML_UnparsedEntityDeclHandler h) 2781</pre> 2782 2783 <pre class="signature"> 2784typedef void 2785(XMLCALL *XML_UnparsedEntityDeclHandler)(void *userData, 2786 const XML_Char *entityName, 2787 const XML_Char *base, 2788 const XML_Char *systemId, 2789 const XML_Char *publicId, 2790 const XML_Char *notationName); 2791</pre> 2792 <p> 2793 Set a handler that receives declarations of unparsed entities. These are entity 2794 declarations that have a notation (NDATA) field: 2795 </p> 2796 2797 <div id="eg"> 2798 <pre> 2799<!ENTITY logo SYSTEM "images/logo.gif" NDATA gif> 2800</pre> 2801 </div> 2802 2803 <p> 2804 This handler is obsolete and is provided for backwards compatibility. Use 2805 instead <a href="#XML_SetEntityDeclHandler">XML_SetEntityDeclHandler</a>. 2806 </p> 2807 </div> 2808 2809 <div class="handler"> 2810 <h4 id="XML_SetNotationDeclHandler"> 2811 XML_SetNotationDeclHandler 2812 </h4> 2813 2814 <pre class="setter"> 2815void XMLCALL 2816XML_SetNotationDeclHandler(XML_Parser p, 2817 XML_NotationDeclHandler h) 2818</pre> 2819 2820 <pre class="signature"> 2821typedef void 2822(XMLCALL *XML_NotationDeclHandler)(void *userData, 2823 const XML_Char *notationName, 2824 const XML_Char *base, 2825 const XML_Char *systemId, 2826 const XML_Char *publicId); 2827</pre> 2828 <p> 2829 Set a handler that receives notation declarations. 2830 </p> 2831 </div> 2832 2833 <div class="handler"> 2834 <h4 id="XML_SetNotStandaloneHandler"> 2835 XML_SetNotStandaloneHandler 2836 </h4> 2837 2838 <pre class="setter"> 2839void XMLCALL 2840XML_SetNotStandaloneHandler(XML_Parser p, 2841 XML_NotStandaloneHandler h) 2842</pre> 2843 2844 <pre class="signature"> 2845typedef int 2846(XMLCALL *XML_NotStandaloneHandler)(void *userData); 2847</pre> 2848 <p> 2849 Set a handler that is called if the document is not "standalone". This happens 2850 when there is an external subset or a reference to a parameter entity, but does 2851 not have standalone set to "yes" in an XML declaration. If this handler returns 2852 <code>XML_STATUS_ERROR</code>, then the parser will throw an 2853 <code>XML_ERROR_NOT_STANDALONE</code> error. 2854 </p> 2855 </div> 2856 2857 <h3> 2858 <a id="position" name="position">Parse position and error reporting functions</a> 2859 </h3> 2860 2861 <p> 2862 These are the functions you'll want to call when the parse functions return 2863 <code>XML_STATUS_ERROR</code> (a parse error has occurred), although the position 2864 reporting functions are useful outside of errors. The position reported is the 2865 byte position (in the original document or entity encoding) of the first of the 2866 sequence of characters that generated the current event (or the error that caused 2867 the parse functions to return <code>XML_STATUS_ERROR</code>.) The exceptions are 2868 callbacks triggered by declarations in the document prologue, in which case they 2869 exact position reported is somewhere in the relevant markup, but not necessarily 2870 as meaningful as for other events. 2871 </p> 2872 2873 <p> 2874 The position reporting functions are accurate only outside of the DTD. In other 2875 words, they usually return bogus information when called from within a DTD 2876 declaration handler. 2877 </p> 2878 2879 <h4 id="XML_GetErrorCode"> 2880 XML_GetErrorCode 2881 </h4> 2882 2883 <pre class="fcndec"> 2884enum XML_Error XMLCALL 2885XML_GetErrorCode(XML_Parser p); 2886</pre> 2887 <div class="fcndef"> 2888 Return what type of error has occurred. 2889 </div> 2890 2891 <h4 id="XML_ErrorString"> 2892 XML_ErrorString 2893 </h4> 2894 2895 <pre class="fcndec"> 2896const XML_LChar * XMLCALL 2897XML_ErrorString(enum XML_Error code); 2898</pre> 2899 <div class="fcndef"> 2900 Return a string describing the error corresponding to code. The code should be 2901 one of the enums that can be returned from <code><a href= 2902 "#XML_GetErrorCode">XML_GetErrorCode</a></code>. 2903 </div> 2904 2905 <h4 id="XML_GetCurrentByteIndex"> 2906 XML_GetCurrentByteIndex 2907 </h4> 2908 2909 <pre class="fcndec"> 2910XML_Index XMLCALL 2911XML_GetCurrentByteIndex(XML_Parser p); 2912</pre> 2913 <div class="fcndef"> 2914 Return the byte offset of the position. This always corresponds to the values 2915 returned by <code><a href= 2916 "#XML_GetCurrentLineNumber">XML_GetCurrentLineNumber</a></code> and 2917 <code><a href="#XML_GetCurrentColumnNumber">XML_GetCurrentColumnNumber</a></code>. 2918 </div> 2919 2920 <h4 id="XML_GetCurrentLineNumber"> 2921 XML_GetCurrentLineNumber 2922 </h4> 2923 2924 <pre class="fcndec"> 2925XML_Size XMLCALL 2926XML_GetCurrentLineNumber(XML_Parser p); 2927</pre> 2928 <div class="fcndef"> 2929 Return the line number of the position. The first line is reported as 2930 <code>1</code>. 2931 </div> 2932 2933 <h4 id="XML_GetCurrentColumnNumber"> 2934 XML_GetCurrentColumnNumber 2935 </h4> 2936 2937 <pre class="fcndec"> 2938XML_Size XMLCALL 2939XML_GetCurrentColumnNumber(XML_Parser p); 2940</pre> 2941 <div class="fcndef"> 2942 Return the <em>offset</em>, from the beginning of the current line, of the 2943 position. The first column is reported as <code>0</code>. 2944 </div> 2945 2946 <h4 id="XML_GetCurrentByteCount"> 2947 XML_GetCurrentByteCount 2948 </h4> 2949 2950 <pre class="fcndec"> 2951int XMLCALL 2952XML_GetCurrentByteCount(XML_Parser p); 2953</pre> 2954 <div class="fcndef"> 2955 Return the number of bytes in the current event. Returns <code>0</code> if the 2956 event is inside a reference to an internal entity and for the end-tag event for 2957 empty element tags (the later can be used to distinguish empty-element tags from 2958 empty elements using separate start and end tags). 2959 </div> 2960 2961 <h4 id="XML_GetInputContext"> 2962 XML_GetInputContext 2963 </h4> 2964 2965 <pre class="fcndec"> 2966const char * XMLCALL 2967XML_GetInputContext(XML_Parser p, 2968 int *offset, 2969 int *size); 2970</pre> 2971 <div class="fcndef"> 2972 <p> 2973 Returns the parser's input buffer, sets the integer pointed at by 2974 <code>offset</code> to the offset within this buffer of the current parse 2975 position, and set the integer pointed at by <code>size</code> to the size of 2976 the returned buffer. 2977 </p> 2978 2979 <p> 2980 This should only be called from within a handler during an active parse and the 2981 returned buffer should only be referred to from within the handler that made 2982 the call. This input buffer contains the untranslated bytes of the input. 2983 </p> 2984 2985 <p> 2986 Only a limited amount of context is kept, so if the event triggering a call 2987 spans over a very large amount of input, the actual parse position may be 2988 before the beginning of the buffer. 2989 </p> 2990 2991 <p> 2992 If <code>XML_CONTEXT_BYTES</code> is zero, this will always return 2993 <code>NULL</code>. 2994 </p> 2995 </div> 2996 2997 <h3> 2998 <a id="attack-protection" name="attack-protection">Attack Protection</a><a id= 2999 "billion-laughs" name="billion-laughs"></a> 3000 </h3> 3001 3002 <h4 id="XML_SetBillionLaughsAttackProtectionMaximumAmplification"> 3003 XML_SetBillionLaughsAttackProtectionMaximumAmplification 3004 </h4> 3005 3006 <pre class="fcndec"> 3007/* Added in Expat 2.4.0. */ 3008XML_Bool XMLCALL 3009XML_SetBillionLaughsAttackProtectionMaximumAmplification(XML_Parser p, 3010 float maximumAmplificationFactor); 3011</pre> 3012 <div class="fcndef"> 3013 <p> 3014 Sets the maximum tolerated amplification factor for protection against <a href= 3015 "https://en.wikipedia.org/wiki/Billion_laughs_attack">billion laughs 3016 attacks</a> (default: <code>100.0</code>) of parser <code>p</code> to 3017 <code>maximumAmplificationFactor</code>, and returns <code>XML_TRUE</code> upon 3018 success and <code>XML_FALSE</code> upon error. 3019 </p> 3020 3021 <p> 3022 Once the <a href= 3023 "#XML_SetBillionLaughsAttackProtectionActivationThreshold">threshold for 3024 activation</a> is reached, the amplification factor is calculated as .. 3025 </p> 3026 3027 <pre>amplification := (direct + indirect) / direct</pre> 3028 <p> 3029 .. while parsing, whereas <code>direct</code> is the number of bytes read from 3030 the primary document in parsing and <code>indirect</code> is the number of 3031 bytes added by expanding entities and reading of external DTD files, combined. 3032 </p> 3033 3034 <p> 3035 For a call to 3036 <code>XML_SetBillionLaughsAttackProtectionMaximumAmplification</code> to 3037 succeed: 3038 </p> 3039 3040 <ul> 3041 <li>parser <code>p</code> must be a non-<code>NULL</code> root parser (without 3042 any parent parsers) and 3043 </li> 3044 3045 <li> 3046 <code>maximumAmplificationFactor</code> must be non-<code>NaN</code> and 3047 greater than or equal to <code>1.0</code>. 3048 </li> 3049 </ul> 3050 3051 <p> 3052 <strong>Note:</strong> If you ever need to increase this value for non-attack 3053 payload, please <a href="https://github.com/libexpat/libexpat/issues">file a 3054 bug report</a>. 3055 </p> 3056 3057 <p> 3058 <strong>Note:</strong> Peak amplifications of factor 15,000 for the entire 3059 payload and of factor 30,000 in the middle of parsing have been observed with 3060 small benign files in practice. So if you do reduce the maximum allowed 3061 amplification, please make sure that the activation threshold is still big 3062 enough to not end up with undesired false positives (i.e. benign files being 3063 rejected). 3064 </p> 3065 </div> 3066 3067 <h4 id="XML_SetBillionLaughsAttackProtectionActivationThreshold"> 3068 XML_SetBillionLaughsAttackProtectionActivationThreshold 3069 </h4> 3070 3071 <pre class="fcndec"> 3072/* Added in Expat 2.4.0. */ 3073XML_Bool XMLCALL 3074XML_SetBillionLaughsAttackProtectionActivationThreshold(XML_Parser p, 3075 unsigned long long activationThresholdBytes); 3076</pre> 3077 <div class="fcndef"> 3078 <p> 3079 Sets number of output bytes (including amplification from entity expansion and 3080 reading DTD files) needed to activate protection against <a href= 3081 "https://en.wikipedia.org/wiki/Billion_laughs_attack">billion laughs 3082 attacks</a> (default: <code>8 MiB</code>) of parser <code>p</code> to 3083 <code>activationThresholdBytes</code>, and returns <code>XML_TRUE</code> upon 3084 success and <code>XML_FALSE</code> upon error. 3085 </p> 3086 3087 <p> 3088 For a call to 3089 <code>XML_SetBillionLaughsAttackProtectionActivationThreshold</code> to 3090 succeed: 3091 </p> 3092 3093 <ul> 3094 <li>parser <code>p</code> must be a non-<code>NULL</code> root parser (without 3095 any parent parsers). 3096 </li> 3097 </ul> 3098 3099 <p> 3100 <strong>Note:</strong> If you ever need to increase this value for non-attack 3101 payload, please <a href="https://github.com/libexpat/libexpat/issues">file a 3102 bug report</a>. 3103 </p> 3104 3105 <p> 3106 <strong>Note:</strong> Activation thresholds below 4 MiB are known to break 3107 support for <a href= 3108 "https://en.wikipedia.org/wiki/Darwin_Information_Typing_Architecture">DITA</a> 3109 1.3 payload and are hence not recommended. 3110 </p> 3111 </div> 3112 3113 <h4 id="XML_SetAllocTrackerMaximumAmplification"> 3114 XML_SetAllocTrackerMaximumAmplification 3115 </h4> 3116 3117 <pre class="fcndec"> 3118/* Added in Expat 2.7.2. */ 3119XML_Bool 3120XML_SetAllocTrackerMaximumAmplification(XML_Parser p, 3121 float maximumAmplificationFactor); 3122</pre> 3123 <div class="fcndef"> 3124 <p> 3125 Sets the maximum tolerated amplification factor between direct input and bytes 3126 of dynamic memory allocated (default: <code>100.0</code>) of parser 3127 <code>p</code> to <code>maximumAmplificationFactor</code>, and returns 3128 <code>XML_TRUE</code> upon success and <code>XML_FALSE</code> upon error. 3129 </p> 3130 3131 <p> 3132 <strong>Note:</strong> There are three types of allocations that intentionally 3133 bypass tracking and limiting: 3134 </p> 3135 3136 <ul> 3137 <li>application calls to functions <code><a href= 3138 "#XML_MemMalloc">XML_MemMalloc</a></code> and <code><a href="#XML_MemRealloc"> 3139 XML_MemRealloc</a></code> — <em>healthy</em> use of these two functions 3140 continues to be a responsibility of the application using Expat —, 3141 </li> 3142 3143 <li>the main character buffer used by functions <code><a href="#XML_GetBuffer"> 3144 XML_GetBuffer</a></code> and <code><a href= 3145 "#XML_ParseBuffer">XML_ParseBuffer</a></code> (and thus also by plain 3146 <code><a href="#XML_Parse">XML_Parse</a></code>), and 3147 </li> 3148 3149 <li>the <a href="#XML_SetElementDeclHandler">content model memory</a> (that is 3150 passed to the <a href="#XML_SetElementDeclHandler">element declaration 3151 handler</a> and freed by a call to <code><a href= 3152 "#XML_FreeContentModel">XML_FreeContentModel</a></code>). 3153 </li> 3154 </ul> 3155 3156 <p> 3157 Once the <a href="#XML_SetAllocTrackerActivationThreshold">threshold for 3158 activation</a> is reached, the amplification factor is calculated as .. 3159 </p> 3160 3161 <pre>amplification := allocated / direct</pre> 3162 <p> 3163 .. while parsing, whereas <code>direct</code> is the number of bytes read from 3164 the primary document in parsing and <code>allocated</code> is the number of 3165 bytes of dynamic memory allocated in the parser hierarchy. 3166 </p> 3167 3168 <p> 3169 For a call to <code>XML_SetAllocTrackerMaximumAmplification</code> to succeed: 3170 </p> 3171 3172 <ul> 3173 <li>parser <code>p</code> must be a non-<code>NULL</code> root parser (without 3174 any parent parsers) and 3175 </li> 3176 3177 <li> 3178 <code>maximumAmplificationFactor</code> must be non-<code>NaN</code> and 3179 greater than or equal to <code>1.0</code>. 3180 </li> 3181 </ul> 3182 3183 <p> 3184 <strong>Note:</strong> If you ever need to increase this value for non-attack 3185 payload, please <a href="https://github.com/libexpat/libexpat/issues">file a 3186 bug report</a>. 3187 </p> 3188 3189 <p> 3190 <strong>Note:</strong> Amplifications factors greater than <code>100.0</code> 3191 can been observed near the start of parsing even with benign files in practice. 3192 So if you do reduce the maximum allowed amplification, please make sure that 3193 the activation threshold is still big enough to not end up with undesired false 3194 positives (i.e. benign files being rejected). 3195 </p> 3196 </div> 3197 3198 <h4 id="XML_SetAllocTrackerActivationThreshold"> 3199 XML_SetAllocTrackerActivationThreshold 3200 </h4> 3201 3202 <pre class="fcndec"> 3203/* Added in Expat 2.7.2. */ 3204XML_Bool 3205XML_SetAllocTrackerActivationThreshold(XML_Parser p, 3206 unsigned long long activationThresholdBytes); 3207</pre> 3208 <div class="fcndef"> 3209 <p> 3210 Sets number of allocated bytes of dynamic memory needed to activate protection 3211 against disproportionate use of RAM (default: <code>64 MiB</code>) of parser 3212 <code>p</code> to <code>activationThresholdBytes</code>, and returns 3213 <code>XML_TRUE</code> upon success and <code>XML_FALSE</code> upon error. 3214 </p> 3215 3216 <p> 3217 <strong>Note:</strong> For types of allocations that intentionally bypass 3218 tracking and limiting, please see <code><a href= 3219 "#XML_SetAllocTrackerMaximumAmplification">XML_SetAllocTrackerMaximumAmplification</a></code> 3220 above. 3221 </p> 3222 3223 <p> 3224 For a call to <code>XML_SetAllocTrackerActivationThreshold</code> to succeed: 3225 </p> 3226 3227 <ul> 3228 <li>parser <code>p</code> must be a non-<code>NULL</code> root parser (without 3229 any parent parsers). 3230 </li> 3231 </ul> 3232 3233 <p> 3234 <strong>Note:</strong> If you ever need to increase this value for non-attack 3235 payload, please <a href="https://github.com/libexpat/libexpat/issues">file a 3236 bug report</a>. 3237 </p> 3238 </div> 3239 3240 <h4 id="XML_SetReparseDeferralEnabled"> 3241 XML_SetReparseDeferralEnabled 3242 </h4> 3243 3244 <pre class="fcndec"> 3245/* Added in Expat 2.6.0. */ 3246XML_Bool XMLCALL 3247XML_SetReparseDeferralEnabled(XML_Parser parser, XML_Bool enabled); 3248</pre> 3249 <div class="fcndef"> 3250 <p> 3251 Large tokens may require many parse calls before enough data is available for 3252 Expat to parse it in full. If Expat retried parsing the token on every parse 3253 call, parsing could take quadratic time. To avoid this, Expat only retries once 3254 a significant amount of new data is available. This function allows disabling 3255 this behavior. 3256 </p> 3257 3258 <p> 3259 The <code>enabled</code> argument should be <code>XML_TRUE</code> or 3260 <code>XML_FALSE</code>. 3261 </p> 3262 3263 <p> 3264 Returns <code>XML_TRUE</code> on success, and <code>XML_FALSE</code> on error. 3265 </p> 3266 </div> 3267 3268 <h3> 3269 <a id="miscellaneous" name="miscellaneous">Miscellaneous functions</a> 3270 </h3> 3271 3272 <p> 3273 The functions in this section either obtain state information from the parser or 3274 can be used to dynamically set parser options. 3275 </p> 3276 3277 <h4 id="XML_SetUserData"> 3278 XML_SetUserData 3279 </h4> 3280 3281 <pre class="fcndec"> 3282void XMLCALL 3283XML_SetUserData(XML_Parser p, 3284 void *userData); 3285</pre> 3286 <div class="fcndef"> 3287 This sets the user data pointer that gets passed to handlers. It overwrites any 3288 previous value for this pointer. Note that the application is responsible for 3289 freeing the memory associated with <code>userData</code> when it is finished with 3290 the parser. So if you call this when there's already a pointer there, and you 3291 haven't freed the memory associated with it, then you've probably just leaked 3292 memory. 3293 </div> 3294 3295 <h4 id="XML_GetUserData"> 3296 XML_GetUserData 3297 </h4> 3298 3299 <pre class="fcndec"> 3300void * XMLCALL 3301XML_GetUserData(XML_Parser p); 3302</pre> 3303 <div class="fcndef"> 3304 This returns the user data pointer that gets passed to handlers. It is actually 3305 implemented as a macro. 3306 </div> 3307 3308 <h4 id="XML_UseParserAsHandlerArg"> 3309 XML_UseParserAsHandlerArg 3310 </h4> 3311 3312 <pre class="fcndec"> 3313void XMLCALL 3314XML_UseParserAsHandlerArg(XML_Parser p); 3315</pre> 3316 <div class="fcndef"> 3317 After this is called, handlers receive the parser in their <code>userData</code> 3318 arguments. The user data can still be obtained using the <code><a href= 3319 "#XML_GetUserData">XML_GetUserData</a></code> function. 3320 </div> 3321 3322 <h4 id="XML_SetBase"> 3323 XML_SetBase 3324 </h4> 3325 3326 <pre class="fcndec"> 3327enum XML_Status XMLCALL 3328XML_SetBase(XML_Parser p, 3329 const XML_Char *base); 3330</pre> 3331 <div class="fcndef"> 3332 Set the base to be used for resolving relative URIs in system identifiers. The 3333 return value is <code>XML_STATUS_ERROR</code> if there's no memory to store base, 3334 otherwise it's <code>XML_STATUS_OK</code>. 3335 </div> 3336 3337 <h4 id="XML_GetBase"> 3338 XML_GetBase 3339 </h4> 3340 3341 <pre class="fcndec"> 3342const XML_Char * XMLCALL 3343XML_GetBase(XML_Parser p); 3344</pre> 3345 <div class="fcndef"> 3346 Return the base for resolving relative URIs. 3347 </div> 3348 3349 <h4 id="XML_GetSpecifiedAttributeCount"> 3350 XML_GetSpecifiedAttributeCount 3351 </h4> 3352 3353 <pre class="fcndec"> 3354int XMLCALL 3355XML_GetSpecifiedAttributeCount(XML_Parser p); 3356</pre> 3357 <div class="fcndef"> 3358 When attributes are reported to the start handler in the atts vector, attributes 3359 that were explicitly set in the element occur before any attributes that receive 3360 their value from default information in an ATTLIST declaration. This function 3361 returns the number of attributes that were explicitly set times two, thus giving 3362 the offset in the <code>atts</code> array passed to the start tag handler of the 3363 first attribute set due to defaults. It supplies information for the last call to 3364 a start handler. If called inside a start handler, then that means the current 3365 call. 3366 </div> 3367 3368 <h4 id="XML_GetIdAttributeIndex"> 3369 XML_GetIdAttributeIndex 3370 </h4> 3371 3372 <pre class="fcndec"> 3373int XMLCALL 3374XML_GetIdAttributeIndex(XML_Parser p); 3375</pre> 3376 <div class="fcndef"> 3377 Returns the index of the ID attribute passed in the atts array in the last call 3378 to <code><a href="#XML_StartElementHandler">XML_StartElementHandler</a></code>, 3379 or -1 if there is no ID attribute. If called inside a start handler, then that 3380 means the current call. 3381 </div> 3382 3383 <h4 id="XML_GetAttributeInfo"> 3384 XML_GetAttributeInfo 3385 </h4> 3386 3387 <pre class="fcndec"> 3388const XML_AttrInfo * XMLCALL 3389XML_GetAttributeInfo(XML_Parser parser); 3390</pre> 3391 3392 <pre class="signature"> 3393typedef struct { 3394 XML_Index nameStart; /* Offset to beginning of the attribute name. */ 3395 XML_Index nameEnd; /* Offset after the attribute name's last byte. */ 3396 XML_Index valueStart; /* Offset to beginning of the attribute value. */ 3397 XML_Index valueEnd; /* Offset after the attribute value's last byte. */ 3398} XML_AttrInfo; 3399</pre> 3400 <div class="fcndef"> 3401 Returns an array of <code>XML_AttrInfo</code> structures for the attribute/value 3402 pairs passed in the last call to the <code>XML_StartElementHandler</code> that 3403 were specified in the start-tag rather than defaulted. Each attribute/value pair 3404 counts as 1; thus the number of entries in the array is 3405 <code>XML_GetSpecifiedAttributeCount(parser) / 2</code>. 3406 </div> 3407 3408 <h4 id="XML_SetEncoding"> 3409 XML_SetEncoding 3410 </h4> 3411 3412 <pre class="fcndec"> 3413enum XML_Status XMLCALL 3414XML_SetEncoding(XML_Parser p, 3415 const XML_Char *encoding); 3416</pre> 3417 <div class="fcndef"> 3418 Set the encoding to be used by the parser. It is equivalent to passing a 3419 non-<code>NULL</code> encoding argument to the parser creation functions. It must 3420 not be called after <code><a href="#XML_Parse">XML_Parse</a></code> or 3421 <code><a href="#XML_ParseBuffer">XML_ParseBuffer</a></code> have been called on 3422 the given parser. Returns <code>XML_STATUS_OK</code> on success or 3423 <code>XML_STATUS_ERROR</code> on error. 3424 </div> 3425 3426 <h4 id="XML_SetParamEntityParsing"> 3427 XML_SetParamEntityParsing 3428 </h4> 3429 3430 <pre class="fcndec"> 3431int XMLCALL 3432XML_SetParamEntityParsing(XML_Parser p, 3433 enum XML_ParamEntityParsing code); 3434</pre> 3435 <div class="fcndef"> 3436 This enables parsing of parameter entities, including the external parameter 3437 entity that is the external DTD subset, according to <code>code</code>. The 3438 choices for <code>code</code> are: 3439 <ul> 3440 <li> 3441 <code>XML_PARAM_ENTITY_PARSING_NEVER</code> 3442 </li> 3443 3444 <li> 3445 <code>XML_PARAM_ENTITY_PARSING_UNLESS_STANDALONE</code> 3446 </li> 3447 3448 <li> 3449 <code>XML_PARAM_ENTITY_PARSING_ALWAYS</code> 3450 </li> 3451 </ul> 3452 <b>Note:</b> If <code>XML_SetParamEntityParsing</code> is called after 3453 <code>XML_Parse</code> or <code>XML_ParseBuffer</code>, then it has no effect and 3454 will always return 0. 3455 </div> 3456 3457 <h4 id="XML_SetHashSalt"> 3458 XML_SetHashSalt (deprecated) 3459 </h4> 3460 3461 <pre class="fcndec"> 3462int XMLCALL 3463XML_SetHashSalt(XML_Parser parser, 3464 unsigned long hash_salt); 3465</pre> 3466 <div class="fcndef"> 3467 Sets the hash salt to use for internal hash calculations. Helps in preventing DoS 3468 attacks based on predicting hash function behavior. In order to have an effect 3469 this must be called before parsing has started. Returns 1 if successful, 0 when 3470 called after <code>XML_Parse</code> or <code>XML_ParseBuffer</code> or when 3471 <code>parser</code> is <code>NULL</code>. 3472 <p> 3473 <b>Note:</b> Function <code>XML_SetHashSalt</code> is 3474 <strong>deprecated</strong>. Please use function <code><a href= 3475 "#XML_SetHashSalt16Bytes">XML_SetHashSalt16Bytes</a></code> instead for better 3476 security. <code>XML_SetHashSalt</code> only provides 4 to 8 bytes of entropy 3477 (depending on the size of type <code>unsigned long</code>) while the SipHash 3478 implementation used by Expat can leverage up to 16 bytes of entropy — at least 3479 twice as much. Function <code><a href= 3480 "#XML_SetHashSalt16Bytes">XML_SetHashSalt16Bytes</a></code> of Expat >=2.8.0 3481 (and where backported) matches the amount of entropy supported by SipHash. 3482 </p> 3483 3484 <p> 3485 <b>Note:</b> This call is optional, as the parser will auto-generate a new 3486 random salt value internally if no value has been set by the start of parsing. 3487 </p> 3488 3489 <p> 3490 <b>Note:</b> One should not call <code>XML_SetHashSalt</code> with a hash salt 3491 value of 0, as this value is used as sentinel value to indicate that 3492 <code>XML_SetHashSalt</code> has <b>not</b> been called. Consequently such a 3493 call will have no effect, even if it returns 1. 3494 </p> 3495 </div> 3496 3497 <h4 id="XML_SetHashSalt16Bytes"> 3498 XML_SetHashSalt16Bytes 3499 </h4> 3500 3501 <pre class="fcndec"> 3502/* Added in Expat 2.8.0. */ 3503XML_Bool XMLCALL 3504XML_SetHashSalt16Bytes(XML_Parser parser, 3505 const uint8_t entropy[16]); 3506</pre> 3507 <div class="fcndef"> 3508 Sets the hash salt to use for internal hash calculations. Helps in preventing DoS 3509 attacks based on predicting hash function behavior. In order to have an effect 3510 this must be called before parsing has started. Returns <code>XML_TRUE</code> if 3511 successful, <code>XML_FALSE</code> when called after <code>XML_Parse</code> or 3512 <code>XML_ParseBuffer</code> or when <code>parser</code> is <code>NULL</code>. 3513 <p> 3514 <b>Note:</b> Setting a salt that is <em>not</em> from a source of high quality 3515 entropy (like <code>getentropy(3)</code>) will make the parser vulnerable to 3516 hash flooding attacks. 3517 </p> 3518 3519 <p> 3520 <b>Note:</b> This call is optional, as the parser will auto-generate a new 3521 random salt value internally if no value has been set by the start of parsing. 3522 </p> 3523 </div> 3524 3525 <h4 id="XML_UseForeignDTD"> 3526 XML_UseForeignDTD 3527 </h4> 3528 3529 <pre class="fcndec"> 3530enum XML_Error XMLCALL 3531XML_UseForeignDTD(XML_Parser parser, XML_Bool useDTD); 3532</pre> 3533 <div class="fcndef"> 3534 <p> 3535 This function allows an application to provide an external subset for the 3536 document type declaration for documents which do not specify an external subset 3537 of their own. For documents which specify an external subset in their DOCTYPE 3538 declaration, the application-provided subset will be ignored. If the document 3539 does not contain a DOCTYPE declaration at all and <code>useDTD</code> is true, 3540 the application-provided subset will be parsed, but the 3541 <code>startDoctypeDeclHandler</code> and <code>endDoctypeDeclHandler</code> 3542 functions, if set, will not be called. The setting of parameter entity parsing, 3543 controlled using <code><a href= 3544 "#XML_SetParamEntityParsing">XML_SetParamEntityParsing</a></code>, will be 3545 honored. 3546 </p> 3547 3548 <p> 3549 The application-provided external subset is read by calling the external entity 3550 reference handler set via <code><a href= 3551 "#XML_SetExternalEntityRefHandler">XML_SetExternalEntityRefHandler</a></code> 3552 with both <code>publicId</code> and <code>systemId</code> set to 3553 <code>NULL</code>. 3554 </p> 3555 3556 <p> 3557 If this function is called after parsing has begun, it returns 3558 <code>XML_ERROR_CANT_CHANGE_FEATURE_ONCE_PARSING</code> and ignores 3559 <code>useDTD</code>. If called when Expat has been compiled without DTD 3560 support, it returns <code>XML_ERROR_FEATURE_REQUIRES_XML_DTD</code>. Otherwise, 3561 it returns <code>XML_ERROR_NONE</code>. 3562 </p> 3563 3564 <p> 3565 <b>Note:</b> For the purpose of checking WFC: Entity Declared, passing 3566 <code>useDTD == XML_TRUE</code> will make the parser behave as if the document 3567 had a DTD with an external subset. This holds true even if the external entity 3568 reference handler returns without action. 3569 </p> 3570 </div> 3571 3572 <h4 id="XML_SetReturnNSTriplet"> 3573 XML_SetReturnNSTriplet 3574 </h4> 3575 3576 <pre class="fcndec"> 3577void XMLCALL 3578XML_SetReturnNSTriplet(XML_Parser parser, 3579 int do_nst); 3580</pre> 3581 <div class="fcndef"> 3582 <p> 3583 This function only has an effect when using a parser created with 3584 <code><a href="#XML_ParserCreateNS">XML_ParserCreateNS</a></code>, i.e. when 3585 namespace processing is in effect. The <code>do_nst</code> sets whether or not 3586 prefixes are returned with names qualified with a namespace prefix. If this 3587 function is called with <code>do_nst</code> non-zero, then afterwards namespace 3588 qualified names (that is qualified with a prefix as opposed to belonging to a 3589 default namespace) are returned as a triplet with the three parts separated by 3590 the namespace separator specified when the parser was created. The order of 3591 returned parts is URI, local name, and prefix. 3592 </p> 3593 3594 <p> 3595 If <code>do_nst</code> is zero, then namespaces are reported in the default 3596 manner, URI then local_name separated by the namespace separator. 3597 </p> 3598 </div> 3599 3600 <h4 id="XML_DefaultCurrent"> 3601 XML_DefaultCurrent 3602 </h4> 3603 3604 <pre class="fcndec"> 3605void XMLCALL 3606XML_DefaultCurrent(XML_Parser parser); 3607</pre> 3608 <div class="fcndef"> 3609 This can be called within a handler for a start element, end element, processing 3610 instruction or character data. It causes the corresponding markup to be passed to 3611 the default handler set by <code><a href= 3612 "#XML_SetDefaultHandler">XML_SetDefaultHandler</a></code> or <code><a href= 3613 "#XML_SetDefaultHandlerExpand">XML_SetDefaultHandlerExpand</a></code>. It does 3614 nothing if there is not a default handler. 3615 </div> 3616 3617 <h4 id="XML_ExpatVersion"> 3618 XML_ExpatVersion 3619 </h4> 3620 3621 <pre class="fcndec"> 3622XML_LChar * XMLCALL 3623XML_ExpatVersion(); 3624</pre> 3625 <div class="fcndef"> 3626 Return the library version as a string (e.g. <code>"expat_1.95.1"</code>). 3627 </div> 3628 3629 <h4 id="XML_ExpatVersionInfo"> 3630 XML_ExpatVersionInfo 3631 </h4> 3632 3633 <pre class="fcndec"> 3634struct XML_Expat_Version XMLCALL 3635XML_ExpatVersionInfo(); 3636</pre> 3637 3638 <pre class="signature"> 3639typedef struct { 3640 int major; 3641 int minor; 3642 int micro; 3643} XML_Expat_Version; 3644</pre> 3645 <div class="fcndef"> 3646 Return the library version information as a structure. Some macros are also 3647 defined that support compile-time tests of the library version: 3648 <ul> 3649 <li> 3650 <code>XML_MAJOR_VERSION</code> 3651 </li> 3652 3653 <li> 3654 <code>XML_MINOR_VERSION</code> 3655 </li> 3656 3657 <li> 3658 <code>XML_MICRO_VERSION</code> 3659 </li> 3660 </ul> 3661 Testing these constants is currently the best way to determine if particular 3662 parts of the Expat API are available. 3663 </div> 3664 3665 <h4 id="XML_GetFeatureList"> 3666 XML_GetFeatureList 3667 </h4> 3668 3669 <pre class="fcndec"> 3670const XML_Feature * XMLCALL 3671XML_GetFeatureList(); 3672</pre> 3673 3674 <pre class="signature"> 3675enum XML_FeatureEnum { 3676 XML_FEATURE_END = 0, 3677 XML_FEATURE_UNICODE, 3678 XML_FEATURE_UNICODE_WCHAR_T, 3679 XML_FEATURE_DTD, 3680 XML_FEATURE_CONTEXT_BYTES, 3681 XML_FEATURE_MIN_SIZE, 3682 XML_FEATURE_SIZEOF_XML_CHAR, 3683 XML_FEATURE_SIZEOF_XML_LCHAR, 3684 XML_FEATURE_NS, 3685 XML_FEATURE_LARGE_SIZE 3686}; 3687 3688typedef struct { 3689 enum XML_FeatureEnum feature; 3690 XML_LChar *name; 3691 long int value; 3692} XML_Feature; 3693</pre> 3694 <div class="fcndef"> 3695 <p> 3696 Returns a list of "feature" records, providing details on how Expat was 3697 configured at compile time. Most applications should not need to worry about 3698 this, but this information is otherwise not available from Expat. This function 3699 allows code that does need to check these features to do so at runtime. 3700 </p> 3701 3702 <p> 3703 The return value is an array of <code>XML_Feature</code>, terminated by a 3704 record with a <code>feature</code> of <code>XML_FEATURE_END</code> and 3705 <code>name</code> of <code>NULL</code>, identifying the feature-test macros 3706 Expat was compiled with. Since an application that requires this kind of 3707 information needs to determine the type of character the <code>name</code> 3708 points to, records for the <code>XML_FEATURE_SIZEOF_XML_CHAR</code> and 3709 <code>XML_FEATURE_SIZEOF_XML_LCHAR</code> will be located at the beginning of 3710 the list, followed by <code>XML_FEATURE_UNICODE</code> and 3711 <code>XML_FEATURE_UNICODE_WCHAR_T</code>, if they are present at all. 3712 </p> 3713 3714 <p> 3715 Some features have an associated value. If there isn't an associated value, the 3716 <code>value</code> field is set to 0. At this time, the following features have 3717 been defined to have values: 3718 </p> 3719 3720 <dl> 3721 <dt> 3722 <code>XML_FEATURE_SIZEOF_XML_CHAR</code> 3723 </dt> 3724 3725 <dd> 3726 The number of bytes occupied by one <code>XML_Char</code> character. 3727 </dd> 3728 3729 <dt> 3730 <code>XML_FEATURE_SIZEOF_XML_LCHAR</code> 3731 </dt> 3732 3733 <dd> 3734 The number of bytes occupied by one <code>XML_LChar</code> character. 3735 </dd> 3736 3737 <dt> 3738 <code>XML_FEATURE_CONTEXT_BYTES</code> 3739 </dt> 3740 3741 <dd> 3742 The maximum number of characters of context which can be reported by 3743 <code><a href="#XML_GetInputContext">XML_GetInputContext</a></code>. 3744 </dd> 3745 </dl> 3746 </div> 3747 3748 <h4 id="XML_FreeContentModel"> 3749 XML_FreeContentModel 3750 </h4> 3751 3752 <pre class="fcndec"> 3753void XMLCALL 3754XML_FreeContentModel(XML_Parser parser, XML_Content *model); 3755</pre> 3756 <div class="fcndef"> 3757 Function to deallocate the <code>model</code> argument passed to the 3758 <code>XML_ElementDeclHandler</code> callback set using <code><a href= 3759 "#XML_SetElementDeclHandler">XML_ElementDeclHandler</a></code>. This function 3760 should not be used for any other purpose. 3761 </div> 3762 3763 <p> 3764 The following functions allow external code to share the memory allocator an 3765 <code>XML_Parser</code> has been configured to use. This is especially useful for 3766 third-party libraries that interact with a parser object created by application 3767 code, or heavily layered applications. This can be essential when using 3768 dynamically loaded libraries which use different C standard libraries (this can 3769 happen on Windows, at least). 3770 </p> 3771 3772 <h4 id="XML_MemMalloc"> 3773 XML_MemMalloc 3774 </h4> 3775 3776 <pre class="fcndec"> 3777void * XMLCALL 3778XML_MemMalloc(XML_Parser parser, size_t size); 3779</pre> 3780 <div class="fcndef"> 3781 Allocate <code>size</code> bytes of memory using the allocator the 3782 <code>parser</code> object has been configured to use. Returns a pointer to the 3783 memory or <code>NULL</code> on failure. Memory allocated in this way must be 3784 freed using <code><a href="#XML_MemFree">XML_MemFree</a></code>. 3785 </div> 3786 3787 <h4 id="XML_MemRealloc"> 3788 XML_MemRealloc 3789 </h4> 3790 3791 <pre class="fcndec"> 3792void * XMLCALL 3793XML_MemRealloc(XML_Parser parser, void *ptr, size_t size); 3794</pre> 3795 <div class="fcndef"> 3796 Allocate <code>size</code> bytes of memory using the allocator the 3797 <code>parser</code> object has been configured to use. <code>ptr</code> must 3798 point to a block of memory allocated by <code><a href= 3799 "#XML_MemMalloc">XML_MemMalloc</a></code> or <code>XML_MemRealloc</code>, or be 3800 <code>NULL</code>. This function tries to expand the block pointed to by 3801 <code>ptr</code> if possible. Returns a pointer to the memory or 3802 <code>NULL</code> on failure. On success, the original block has either been 3803 expanded or freed. On failure, the original block has not been freed; the caller 3804 is responsible for freeing the original block. Memory allocated in this way must 3805 be freed using <code><a href="#XML_MemFree">XML_MemFree</a></code>. 3806 </div> 3807 3808 <h4 id="XML_MemFree"> 3809 XML_MemFree 3810 </h4> 3811 3812 <pre class="fcndec"> 3813void XMLCALL 3814XML_MemFree(XML_Parser parser, void *ptr); 3815</pre> 3816 <div class="fcndef"> 3817 Free a block of memory pointed to by <code>ptr</code>. The block must have been 3818 allocated by <code><a href="#XML_MemMalloc">XML_MemMalloc</a></code> or 3819 <code>XML_MemRealloc</code>, or be <code>NULL</code>. 3820 </div> 3821 3822 <hr /> 3823 3824 <div class="footer"> 3825 Found a bug in the documentation? <a href= 3826 "https://github.com/libexpat/libexpat/issues">Please file a bug report.</a> 3827 </div> 3828 </div> 3829 </body> 3830</html> 3831