Lines Matching +full:pre +full:- +full:processing

1 <?xml version="1.0" encoding="iso-8859-1"?>
2 <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
3 "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
6 <!--
15 Copyright (c) 2000-2004 Fred L. Drake, Jr. <fdrake@users.sourceforge.net>
16 Copyright (c) 2002-2012 Karl Waclawek <karl@waclawek.net>
17 Copyright (c) 2017-2025 Sebastian Pipping <sebastian@pipping.org>
20 Copyright (c) 2021 Nicolas Cavallari <nicolas.cavallari@green-communications.fr>
44 -->
47 <meta http-equiv="Content-Style-Type" content="text/css" />
63 other open-source XML parsers.</p>
66 groff (an nroff look-alike), Jade (an implementation of ISO's DSSSL
156 <a href="#attack-protection">Attack Protection</a>
197 <p>Expat is a stream-oriented parser. You register callback (or
243 <pre class="eg">
262 </pre>
266 <pre class="eg">
269 Depth--;
271 </pre>
291 <pre class="eg">
303 </pre>
324 cmake -G"Visual Studio 17 2022" -DCMAKE_BUILD_TYPE=RelWithDebInfo .
328 contains the "expat.h" include file and a pre-built DLL.</p>
339 <pre class="eg">
343 </pre>
346 only one we'll mention here is the <code>--prefix</code> option. You
348 the <code>--help</code> option.</p>
353 give the option, <code>--prefix=/home/me/mystuff</code>, then the
358 <h3>Configuring Expat Using the Pre-Processor</h3>
361 pre-processor definitions. The symbols are:</p>
363 <dl class="cpp-symbols">
368 <a href="https://www.w3.org/TR/2006/REC-xml-20060816/#sec-physical-struct">general entities</a>
384 (except the <a href="https://www.w3.org/TR/2006/REC-xml-20060816/#sec-predefined-ent">predefined five</a>:
386 with a self-reference:
392 <dd>Include support for using and reporting DTD-based content. If
406 "https://www.w3.org/TR/REC-xml-names/" >Namespaces in XML</a></cite>
411 encoded in UTF-16 using wide characters of the type
424 processing of very large input streams, where the return values of
463 usually be done with the <code>-lexpat</code> argument. Otherwise,
469 <p>On a Unix-based system, here's what a Makefile might look like when
472 <pre class="eg">
475 LIBS= -lexpat
477 $(CC) $(LDFLAGS) -o xmlapp xmlapp.o $(LIBS)
478 </pre>
483 <pre class="eg">
485 CFLAGS= -I/home/me/mystuff/include
487 LIBS= -L/home/me/mystuff/lib -lexpat
489 $(CC) $(LDFLAGS) -o xmlapp xmlapp.o $(LIBS)
490 </pre>
506 constructing a parser for a top-level document. The object returned
545 <pre class="eg">
548 info->skip = 0;
549 info->depth = 1;
557 if (! inf->skip) {
559 inf->skip = inf->depth;
565 inf->depth++;
572 inf->depth--;
574 if (! inf->skip)
577 if (inf->skip == inf->depth)
578 inf->skip = 0;
580 </pre>
612 common first-time mistake with any of the event-oriented interfaces to
621 <!-- XXX example needed here -->
627 the value of the <code>version</code> pseudo-attribute in the XML
631 alternate processing), it should use the <code><a href=
637 <pre class="eg">
662 </pre>
664 <h3>Namespace Processing</h3>
668 performs namespace processing. Under namespace processing, Expat
683 are not well-formed when namespace processing is enabled, and will
689 >XML_SetReturnNSTriplet</a></code> has been called with a non-zero
715 to recognize UTF-8 and UTF-16 (1 and 2 byte encodings of Unicode),
719 <pre>
720 &lt;?xml version="1.0" encoding="ISO-8859-2"?&gt;
721 </pre>
725 <pre>
727 </pre>
734 <p><a name="builtin_encodings"></a>There are four built-in encodings
737 <li>UTF-8</li>
738 <li>UTF-16</li>
739 <li>ISO-8859-1</li>
740 <li>US-ASCII</li>
758 <li>Every ASCII character that can appear in a well-formed XML document
763 equal to 65535 (0xFFFF). <em>This does not apply to the built-in support
764 for UTF-16 and UTF-8.</em></li>
773 array. A -1 in this array indicates a malformed byte. If the value is
774 -2, -3, or -4, then the byte is the beginning of a 2, 3, or 4 byte
775 sequence respectively. Multi-byte sequences are sent to the convert
777 function should return the Unicode scalar value for the sequence or -1
782 it passes to the handlers are always encoded in UTF-8 or UTF-16
824 <h3 id="stop-resume">Temporarily Stopping Parsing</h3>
835 <li>Delaying further processing until additional information is
843 if an application-domain error is found in the XML being parsed or if
855 the rough structure (in pseudo-code):</p>
857 <pre class="pseudocode">
873 </pre>
880 function mentioned in the pseudo-code above:</p>
882 <pre class="eg">
887 been an error), or the parse is stopped. Return non-zero when
920 </pre>
928 <pre class="eg">
931 non-zero when the parse is suspended.
949 </pre>
951 <p>Now that we've seen what a mess the top-level parsing loop can
955 processing that we're expecting to ignore. As a bonus, we get to stop
964 <!-- XXX really need more here -->
968 <!-- ================================================================ -->
975 <pre class="fcndec">
978 </pre>
981 Construct a new parser. If encoding is non-<code>NULL</code>, it specifies a
983 encoding declaration. There are four built-in encodings:
986 <li>US-ASCII</li>
987 <li>UTF-8</li>
988 <li>UTF-16</li>
989 <li>ISO-8859-1</li>
997 <pre class="fcndec">
1001 </pre>
1003 Constructs a new parser that has namespace processing in effect. Namespace
1009 in XML. For instance, <code>'\xFF'</code> is not legal in UTF-8, and
1010 <code>'\xFFFF'</code> is not legal in UTF-16. There is a special case when
1012 the local part will be concatenated without any separator - this is intended
1021 be ready to receive namespace URIs containing non-URI characters.
1025 <pre class="fcndec">
1030 </pre>
1031 <pre class="signature">
1037 </pre>
1042 non-<code>NULL</code>, then namespace processing is enabled in the created parser
1048 <pre class="fcndec">
1053 </pre>
1058 user data, namespace processing is inherited from the parser passed as
1065 <pre class="fcndec">
1068 </pre>
1075 <pre class="fcndec">
1079 </pre>
1085 state is re-initialized except for the values of ns and ns_triplets.
1120 <pre class="fcndec">
1126 </pre>
1127 <pre class="signature">
1132 </pre>
1138 that <code>s</code> doesn't have to be null-terminated. It also means that
1183 <pre class="fcndec">
1188 </pre>
1204 <pre class="fcndec">
1208 </pre>
1217 <pre class="eg">
1237 </pre>
1241 <pre class="fcndec">
1245 </pre>
1251 call-back handler, except when aborting (when <code>resumable</code>
1253 call-backs may still follow because they would otherwise get
1261 while making multiple call-backs on a contiguous chunk of characters,</li>
1266 call-backs, except when parsing an external parameter entity and
1287 not being handled appropriately; see <a href= "#stop-resume"
1315 <pre class="fcndec">
1318 </pre>
1322 within a handler call-back. Returns same status codes as <code><a
1341 <pre class="fcndec">
1345 </pre>
1346 <pre class="signature">
1358 </pre>
1385 The former implies UTF-8 encoding, the latter two imply UTF-16 encoding.
1391 <pre class="setter">
1395 </pre>
1396 <pre class="signature">
1401 </pre>
1413 <pre class="setter">
1417 </pre>
1418 <pre class="signature">
1422 </pre>
1429 <pre class="setter">
1434 </pre>
1440 <pre class="setter">
1444 </pre>
1445 <pre class="signature">
1450 </pre>
1452 is <em>NOT null-terminated</em>. You have to use the length argument
1457 may <em>NOT immediately</em> terminate call-backs if the parser is currently
1458 processing such a single block of contiguous markup-free text, as the parser
1464 <pre class="setter">
1468 </pre>
1469 <pre class="signature">
1475 </pre>
1476 <p>Set a handler for processing instructions. The target is the first word
1477 in the processing instruction. The data is the rest of the characters in
1483 <pre class="setter">
1487 </pre>
1488 <pre class="signature">
1492 </pre>
1499 <pre class="setter">
1503 </pre>
1504 <pre class="signature">
1507 </pre>
1513 <pre class="setter">
1517 </pre>
1518 <pre class="signature">
1521 </pre>
1527 <pre class="setter">
1532 </pre>
1538 <pre class="setter">
1542 </pre>
1543 <pre class="signature">
1548 </pre>
1555 that they will be encoded in UTF-8 or UTF-16. Line boundaries are not
1570 <pre class="setter">
1574 </pre>
1575 <pre class="signature">
1580 </pre>
1591 <pre class="setter">
1595 </pre>
1596 <pre class="signature">
1603 </pre>
1605 called for processing an external DTD subset if parameter entity parsing
1645 <pre class="fcndec">
1649 </pre>
1672 <pre class="setter">
1676 </pre>
1677 <pre class="signature">
1682 </pre>
1691 <p>The <code>is_parameter_entity</code> argument will be non-zero for
1700 <pre class="setter">
1705 </pre>
1706 <pre class="signature">
1718 </pre>
1734 value is -1, then that byte is invalid as the initial byte in a sequence.
1735 If the value is -n, where n is an integer &gt; 1, then n is the number of
1737 call to the function pointed at by convert. This function may return -1
1741 string s is <em>NOT</em> null-terminated and points at the sequence of
1750 <pre class="setter">
1754 </pre>
1755 <pre class="signature">
1760 </pre>
1769 <pre class="setter">
1773 </pre>
1774 <pre class="signature">
1778 </pre>
1787 <pre class="setter">
1792 </pre>
1798 <pre class="setter">
1802 </pre>
1803 <pre class="signature">
1809 </pre>
1815 contain -1, 0, or 1 indicating respectively that there was no
1822 <pre class="setter">
1826 </pre>
1827 <pre class="signature">
1834 </pre>
1838 will be non-zero if the DOCTYPE declaration has an internal subset.</p>
1843 <pre class="setter">
1847 </pre>
1848 <pre class="signature">
1851 </pre>
1858 <pre class="setter">
1863 </pre>
1869 <pre class="setter">
1873 </pre>
1874 <pre class="signature">
1879 </pre>
1880 <pre class="signature">
1906 </pre>
1943 <pre class="setter">
1947 </pre>
1948 <pre class="signature">
1956 </pre>
1971 <code>isrequired</code>, but they will have the non-<code>NULL</code> fixed value
1977 <pre class="setter">
1981 </pre>
1982 <pre class="signature">
1993 </pre>
1995 The <code>is_parameter_entity</code> argument will be non-zero in the
1999 <code>value</code> will be non-<code>NULL</code> and <code>systemId</code>,
2001 The value string is <em>not</em> null-terminated; the length is
2004 legal to have zero-length values. Instead check for whether or not
2006 argument will have a non-<code>NULL</code> value only for unparsed entity
2012 <pre class="setter">
2016 </pre>
2017 <pre class="signature">
2025 </pre>
2029 <div id="eg"><pre>
2031 </pre></div>
2039 <pre class="setter">
2043 </pre>
2044 <pre class="signature">
2051 </pre>
2057 <pre class="setter">
2061 </pre>
2062 <pre class="signature">
2065 </pre>
2093 <pre class="fcndec">
2096 </pre>
2102 <pre class="fcndec">
2105 </pre>
2113 <pre class="fcndec">
2116 </pre>
2125 <pre class="fcndec">
2128 </pre>
2135 <pre class="fcndec">
2138 </pre>
2145 <pre class="fcndec">
2148 </pre>
2152 entity and for the end-tag event for empty element tags (the latter can
2153 be used to distinguish empty-element tags from empty elements using
2158 <pre class="fcndec">
2163 </pre>
2184 <h3><a name="attack-protection">Attack Protection</a><a name="billion-laughs"></a></h3>
2187 <pre class="fcndec">
2192 </pre>
2207 <pre>amplification := (direct + indirect) / direct</pre>
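<p>As a rough illustration of the formula above, the following sketch evaluates the ratio for a given pair of counts. The helper name <code>amplification</code> and its parameter types are hypothetical and not part of the Expat API; it assumes that <code>direct</code> and <code>indirect</code> are byte counts and that <code>direct</code> is greater than zero.</p>

```c
#include <assert.h>

/* Hypothetical helper, not part of Expat: evaluates the documented
   formula  amplification := (direct + indirect) / direct.
   Assumption: both arguments are byte counts. */
static double
amplification(unsigned long long direct, unsigned long long indirect) {
  assert(direct > 0); /* the ratio is undefined without any direct input */
  return (double)(direct + indirect) / (double)direct;
}
```

<p>Under these assumptions, 100 direct bytes and 900 indirect bytes give an amplification of 10.0, while input with no indirect bytes gives exactly 1.0.</p>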
2216 <li>parser <code>p</code> must be a non-<code>NULL</code> root parser (without any parent parsers) and</li>
2217 <li><code>maximumAmplificationFactor</code> must be non-<code>NaN</code> and greater than or equal to <code>1.0</code>.</li>
2222 If you ever need to increase this value for non-attack payload,
2240 <pre class="fcndec">
2245 </pre>
2258 <li>parser <code>p</code> must be a non-<code>NULL</code> root parser (without any parent parsers).</li>
2263 If you ever need to increase this value for non-attack payload,
2276 <pre class="fcndec">
2281 </pre>
2327 <pre>amplification := allocated / direct</pre>
2336 <li>parser <code>p</code> must be a non-<code>NULL</code> root parser (without any parent parsers) and</li>
2337 <li><code>maximumAmplificationFactor</code> must be non-<code>NaN</code> and greater than or equal to <code>1.0</code>.</li>
2342 If you ever need to increase this value for non-attack payload,
2358 <pre class="fcndec">
2363 </pre>
2382 <li>parser <code>p</code> must be a non-<code>NULL</code> root parser (without any parent parsers).</li>
2387 If you ever need to increase this value for non-attack payload,
2393 <pre class="fcndec">
2397 </pre>
2419 <pre class="fcndec">
2423 </pre>
2435 <pre class="fcndec">
2438 </pre>
2445 <pre class="fcndec">
2448 </pre>
2457 <pre class="fcndec">
2461 </pre>
2470 <pre class="fcndec">
2473 </pre>
2479 <pre class="fcndec">
2482 </pre>
2496 <pre class="fcndec">
2499 </pre>
2503 >XML_StartElementHandler</a></code>, or -1 if there is no ID
2509 <pre class="fcndec">
2512 </pre>
2513 <pre class="signature">
2520 </pre>
2525 in the start-tag rather than defaulted. Each attribute/value pair counts
2531 <pre class="fcndec">
2535 </pre>
2538 passing a non-<code>NULL</code> encoding argument to the parser creation functions.
2547 <pre class="fcndec">
2551 </pre>
2568 <pre class="fcndec">
2572 </pre>
2579 <p><b>Note:</b> This call is optional, as the parser will auto-generate
2588 <pre class="fcndec">
2591 </pre>
2596 external subset in their DOCTYPE declaration, the application-provided
2599 application-provided subset will be parsed, but the
2606 <p>The application-provided external subset is read by calling the
2626 <pre class="fcndec">
2630 </pre>
2635 i.e. when namespace processing is in effect. The <code>do_nst</code>
2638 non-zero, then afterwards namespace qualified names (that is qualified
2649 <pre class="fcndec">
2652 </pre>
2655 processing instruction or character data. It causes the corresponding
2664 <pre class="fcndec">
2667 </pre>
2673 <pre class="fcndec">
2676 </pre>
2677 <pre class="signature">
2683 </pre>
2686 Some macros are also defined that support compile-time tests of the
2698 <pre class="fcndec">
2701 </pre>
2702 <pre class="signature">
2721 </pre>
2732 identifying the feature-test macros Expat was compiled with. Since an
2760 <pre class="fcndec">
2763 </pre>
2773 is especially useful for third-party libraries that interact with a
2780 <pre class="fcndec">
2783 </pre>
2793 <pre class="fcndec">
2796 </pre>
2813 <pre class="fcndec">
2816 </pre>