Extensible Markup Language (XML) is a widely used standard for storing data in a text format that many different programs can access. It combines the actual data with 'mark-up' which indicates how the data should be interpreted.
The ⎕XML system function can be used to extract data from XML format into an APL array, and to generate XML from an APL array. The direction of conversion is determined by the right argument: a character vector of XML text is converted to an APL array, and a nested matrix is converted to XML.
See also the ⎕IMPORT and ⎕EXPORT functions, which allow data to be transferred to and from XML files in a single step.
A full description of XML is beyond the scope of this document. However, the following simple but complete XML example demonstrates some of the main features:
<?xml version="1.0" encoding="utf-8"?>
<sales>
  <!-- Sales by month -->
  <month>January
    <item>
      <name>Ice Cream</name>
      <amount currency="dollars">25.10</amount>
    </item>
    <item>
      <name>Fizzy Drinks</name>
      <amount currency="dollars">360.92</amount>
    </item>
  </month>
  <month>February
    <item>
      <name>Ice Cream</name>
      <amount currency="dollars">5.02</amount>
    </item>
    <item>
      <name>Fizzy Drinks</name>
      <amount currency="dollars">403.16</amount>
    </item>
  </month>
</sales>
The first line (the XML prologue) specifies the XML version and character encoding used, and the third line ("Sales by month") is a comment. The remainder of the document consists of elements which contain the data. Each element begins with a start tag and ends with a matching end tag, for example:
<name>...</name>
Element tag names are case-sensitive.
An element may contain data, other elements nested within it, or both. In addition, the start tag may include one or more attributes specifying how the data is to be interpreted. Each attribute is a pair of the form name="value", for example:
<amount currency="dollars">25.10</amount>
An empty element, which contains no data and no nested elements, can be written as:
<name/>
In an XML document, the amount of white space is usually not significant: for example, the number of spaces used to indent an element, or the positions of line breaks. The following is equally valid XML:
<item><name>Ice Cream</name><amount currency="dollars">25.10</amount></item>
Syntax:
R←[options] ⎕XML CHRVEC
The right argument is a character vector (with embedded carriage returns and/or line feeds) containing the XML text to be converted. The optional left argument gives some control over the conversion process and is discussed below.
The result is an N-row, 5-column matrix containing a flattened representation of the XML data. Each element in the XML data will produce one row in the result. The columns are as follows:
Column 1: | An integer indicating the depth of nesting of the element. A value of 0 is used for the outer-most nesting level, with deeper nesting being indicated by higher numbers. |
Column 2: | The element name as specified in the start tag. |
Column 3: | The element data as a character vector. |
Column 4: | An M-row, 2-column nested matrix containing any attribute name/value pairs. Each item in the matrix is a character vector. If the element has no attributes, this matrix will have 0 rows. |
Column 5: | A code to help interpret the type of data the row contains (see below). |
For example, when presented with the XML sample listed above the array produced is as follows:
      ⎕XML xml_data
 0 sales                                  3
 1 month                                  7
 2        January                         4
 2 item                                   3
 3 name   Ice Cream                       5
 3 amount 25.10         currency dollars  5
 2 item                                   3
 3 name   Fizzy Drinks                    5
 3 amount 360.92        currency dollars  5
 1 month                                  7
 2        February                        4
 2 item                                   3
 3 name   Ice Cream                       5
 3 amount 5.02          currency dollars  5
 2 item                                   3
 3 name   Fizzy Drinks                    5
 3 amount 403.16        currency dollars  5

Displaying the same result with ⎕DISPLAY ⎕XML xml_data shows its nested structure: every name and data item is a character vector, and column 4 is an empty (0-row) attribute matrix in every row except the amount rows, where it is a 1-row matrix holding the pair currency dollars.
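The flattening shown above can be modelled outside APL. The following Python sketch is an illustration only, not the ⎕XML implementation: it uses the standard xml.etree.ElementTree parser (which discards comments by default), assumes white space around data is stripped, and returns the first four columns as (depth, name, data, attributes) tuples. Text that sits alongside child elements becomes its own row with an empty name, one level deeper, like the January and February rows above. The function name flatten is hypothetical.

```python
import xml.etree.ElementTree as ET

def flatten(xml_text):
    """Flatten XML into (depth, name, data, attributes) rows in document order.

    Illustrative model only: white space around data is stripped, and
    comments are discarded because ElementTree drops them by default.
    """
    rows = []

    def visit(elem, depth):
        attrs = list(elem.attrib.items())          # column 4: (name, value) pairs
        text = (elem.text or "").strip()
        children = list(elem)
        if not children:
            # Element without children: its data (possibly empty) goes in column 3.
            rows.append((depth, elem.tag, text, attrs))
            return
        # Element with children: column 3 stays empty.
        rows.append((depth, elem.tag, "", attrs))
        if text:
            rows.append((depth + 1, "", text, []))      # data before the first child
        for child in children:
            visit(child, depth + 1)
            tail = (child.tail or "").strip()
            if tail:
                rows.append((depth + 1, "", tail, []))  # data following this child

    visit(ET.fromstring(xml_text), 0)
    return rows

for row in flatten('<month>January<item><name>Ice Cream</name>'
                   '<amount currency="dollars">25.10</amount></item></month>'):
    print(row)
```

Keeping depth as an explicit integer, rather than nesting the rows, is what lets the whole document fit into one matrix with a fixed number of columns.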
The conversion from XML to an APL array described above can be controlled by an optional left argument which consists of one or more keyword/value pairs, for example:
R←('markup' 'preserve') ('whitespace' 'preserve') ⎕XML xml_data
The supported keywords are:
The fifth column of the array produced by ⎕XML contains a type code which can be used to interpret the row. Its value depends on whether the XML element has any children.
A child can be of any of the following types (note that if markup is stripped, only the first of these types can occur in the final result):
<Parent> <Child>...</Child> </Parent>
<Parent> <!--Comment--> </Parent>
<Parent> <?Processing instruction?> </Parent>
<Parent> <!ELEMENT name (#PCDATA)> </Parent>
(a) If the XML element has children, its type code is the sum of the following values, reflecting the types of children found on subsequent rows:
1 | Element has a tag (in column 2) (Always true) |
2 | Element contains nested child element |
4 | Element contains data as well as nested items |
8 | Element contains nested XML markup |
16 | Element contains nested XML comment |
32 | Element contains nested XML Processing Instruction |
For example, when markup and comments are preserved, the <Weight> element below has a type code of 21 (1 + 16 + 4):
<Weight> <!-- All weights approximate--> 100 </Weight>
Notice that an XML element with children always has a tag name in column 2. It never has any data in column 3; all of its data is returned in subsequent rows.
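Because each value in table (a) is a distinct power of two, the sum behaves like a set of bit flags: each kind of child contributes its value once, however many children of that kind there are (each month element above has two item children and some data, giving 1 + 2 + 4 = 7). A short Python sketch, with hypothetical names, that computes the code from the kinds of children found:

```python
# Values from table (a); the dictionary keys are hypothetical labels for this sketch.
CHILD_VALUES = {
    "element": 2,    # nested child element
    "data": 4,       # data as well as nested items
    "markup": 8,     # nested XML markup
    "comment": 16,   # nested XML comment
    "pi": 32,        # nested XML processing instruction
}

def parent_type_code(child_kinds):
    """Type code for an element with children, given the kind of each child."""
    code = 1                          # the element always has a tag
    for kind in set(child_kinds):     # each kind of child counts only once
        code += CHILD_VALUES[kind]
    return code

# <Weight> <!-- All weights approximate--> 100 </Weight>, with comments preserved:
print(parent_type_code(["comment", "data"]))
```

The call above reproduces the value 21 given for the <Weight> example.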
(b) The following type codes are used for XML elements which don't have any children:
1 | Element is an empty XML tag, e.g. <empty/>. The tag name is returned in column 2, and column 3 is blank. |
4 | Row is data for the parent element (see below). The data is returned in column 3, and column 2 is blank. |
5 | Element has an XML tag and data, e.g. <Tag>Data</Tag>. The tag name is returned in column 2 and the data in column 3. |
8 | Element is unprocessed XML markup, e.g. <!ELEMENT name (#PCDATA)>. The markup is returned in column 2, and column 3 is blank. |
16 | Element is an XML comment, e.g. <!--Comment-->. The comment is returned in column 2, and column 3 is blank. |
32 | Element is an XML processing instruction, e.g. <?xml version="1.0" encoding="utf-8"?>. The processing instruction is returned in column 2, and column 3 is blank. |
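To see how the codes in table (b) map onto a parsed document, the following Python sketch classifies childless nodes using xml.etree.ElementTree with comments and processing instructions kept (the insert_comments and insert_pis options of TreeBuilder, Python 3.8 or later). It is an approximation, not ⎕XML's behaviour: ElementTree does not report declarations such as <!ELEMENT ...>, so code 8 is not modelled, and data rows (code 4) are not produced here. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET

def parse_keeping_markup(xml_text):
    """Parse XML, keeping comments and processing instructions inside the root."""
    builder = ET.TreeBuilder(insert_comments=True, insert_pis=True)
    return ET.fromstring(xml_text, parser=ET.XMLParser(target=builder))

def leaf_type_code(node):
    """Type code for a node without children, following table (b)."""
    if node.tag is ET.Comment:
        return 16                      # XML comment
    if node.tag is ET.ProcessingInstruction:
        return 32                      # XML processing instruction
    if (node.text or "").strip():
        return 5                       # tag and data, e.g. <Tag>Data</Tag>
    return 1                           # empty tag, e.g. <empty/>

root = parse_keeping_markup(
    "<Parent><!--Comment--><?target data?><empty/><Tag>Data</Tag></Parent>")
print([leaf_type_code(child) for child in root])
```

For the four children of <Parent>, the codes come out in document order as comment, processing instruction, empty tag, and tag with data.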
The following example illustrates how the codes are used:
<Tag1>Text <Tag2> <Tag3>Text</Tag3> </Tag2> More Text </Tag1>
When converted by ⎕XML, this produces the following array (column 4 is an empty attribute matrix in every row):
      ⎕XML xml_data
 0 Tag1            7
 1      Text       4
 1 Tag2            3
 2 Tag3 Text       5
 1      More Text  4
Syntax:
R←[options] ⎕XML NSTMAT
When presented with an array of APL data, ⎕XML converts it to its XML representation. The result is a character vector with embedded line-feed characters.
The right argument must be a nested matrix with one row for each XML element, and between 3 and 5 columns, as follows:
Column 1: | An integer indicating the depth of nesting of the element. A value of 0 is used for the outer-most nesting level, with deeper nesting being indicated by higher numbers. |
Column 2: | The element name to use for the start tag. |
Column 3: | The element data (see below) |
Column 4: | (Optional) An M-row, 2-column nested matrix containing any attribute name/value pairs. Each item in the matrix is a character vector. If the element has no attributes you can specify a 0-row matrix, or a pair of empty character vectors. If none of the elements have any attributes you can omit column 4 completely. |
Column 5: | (Optional) An integer type code (ignored). This column is only used to facilitate round-trip conversions from XML to APL and back again. |
The data specified in column 3 will usually be a character vector or scalar. However, as a convenience, ⎕XML also allows you to specify numeric values. These are formatted as character data before being copied to the XML result. Numeric values are also allowed for attribute values (but not names).
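The reverse conversion can be sketched the same way. The Python function below is a hypothetical illustration, not ⎕XML itself: it takes (depth, name, data, attributes) rows in document order, treats a row as a parent when the next row is deeper, skips attribute pairs whose name is empty (the '' '' convention above), formats numbers with str(), writes a childless element with no data as an empty tag, and escapes special characters with xml.sax.saxutils. It does not handle prologue rows or data-only rows, and the two-space indentation is an arbitrary choice.

```python
from xml.sax.saxutils import escape, quoteattr

def to_xml(rows, indent="  "):
    """Build XML text from (depth, name, data, attributes) rows in document order.

    Illustrative sketch: attribute pairs with an empty name are ignored,
    numbers are formatted with str(), and lines are joined with line feeds.
    """
    lines = []
    open_tags = []                     # stack of (depth, name) awaiting an end tag

    def close_tags(depth):
        # Emit end tags for every open element at this depth or deeper.
        while open_tags and open_tags[-1][0] >= depth:
            d, name = open_tags.pop()
            lines.append(f"{indent * d}</{name}>")

    for i, (depth, name, data, attrs) in enumerate(rows):
        close_tags(depth)
        attr_text = "".join(f" {n}={quoteattr(str(v))}" for n, v in attrs if n)
        text = escape(str(data))
        pad = indent * depth
        has_children = i + 1 < len(rows) and rows[i + 1][0] > depth
        if has_children:
            lines.append(f"{pad}<{name}{attr_text}>{text}")
            open_tags.append((depth, name))
        elif text:
            lines.append(f"{pad}<{name}{attr_text}>{text}</{name}>")
        else:
            lines.append(f"{pad}<{name}{attr_text}/>")   # empty element
    close_tags(0)
    return "\n".join(lines)

rows = [(0, "Person", "", []),
        (1, "Name", "", [("order", "western")]),
        (2, "FirstName", "Fred", []),
        (2, "LastName", "Smith", []),
        (1, "DateOfBirth", "", [("", "")]),
        (2, "Year", 1943, [])]
print(to_xml(rows))
```

Deciding parenthood from the depth of the following row is what allows a flat, depth-annotated matrix to carry the full nesting.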
Example:
      array←1 4⍴0 '?xml version="1.0" encoding="utf-8"?' '' ('' '')
      array←array⍪0 'Person' '' ('' '')
      array←array⍪1 'Name' '' ('order' 'western')
      array←array⍪2 'FirstName' 'Fred' ('' '')
      array←array⍪2 'LastName' 'Smith' ('' '')
      array←array⍪1 'DateOfBirth' '' ('' '')
      array←array⍪2 'Year' 1943 ('' '')
      array←array⍪2 'Month' 12 ('' '')
      array←array⍪2 'Day' 17 ('' '')
      XML←⎕XML array
      ⎕SS XML ⎕L ⎕R     ⍝ Convert line feeds to carriage returns for display
<?xml version="1.0" encoding="utf-8"?>
<Person>
  <Name order="western">
    <FirstName>Fred</FirstName>
    <LastName>Smith</LastName>
  </Name>
  <DateOfBirth>
    <Year>1943</Year>
    <Month>12</Month>
    <Day>17</Day>
  </DateOfBirth>
</Person>
The conversion process can be controlled by an optional left argument, for example:
R←('whitespace' 'preserve') ⎕XML apl_data
The only supported option is:
An XML file should normally start with a line containing an XML prologue (the XML declaration), e.g.
<?xml version="1.0" encoding="utf-8"?>
Note that ⎕XML does not add the prologue automatically. To ensure that it is present, do one of the following:
(a) Make sure that the first row of the array used to generate the XML contains a valid prologue, as in the example above, or
(b) Prepend the prologue after the XML has been generated:
      XML←⎕XML 1 'Name' 'Fred Smith'
      XML←'<?xml version="1.0" encoding="utf-8"?>',⎕L,XML
If you create an XML file using ŒEXPORT, APLX will automatically add the prologue if it is missing from the array.
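The "add the prologue if it is missing" behaviour described for ⎕EXPORT can be expressed in a few lines. This Python helper is hypothetical and only illustrates the check, not how ⎕EXPORT works internally: it looks for an XML declaration at the very start of the text (the only position where XML allows one), so a leading <?xml-stylesheet?> processing instruction does not count, and it joins with a line feed as ⎕L does in the APL example above.

```python
import re

PROLOGUE = '<?xml version="1.0" encoding="utf-8"?>'

def ensure_prologue(xml_text, prologue=PROLOGUE):
    """Return xml_text with an XML declaration at the start, adding one if absent."""
    # A declaration is "<?xml" followed by white space, at the very first character;
    # "<?xml-stylesheet ...?>" is a different processing instruction.
    if re.match(r"<\?xml\s", xml_text):
        return xml_text
    return prologue + "\n" + xml_text

print(ensure_prologue("<Name>Fred Smith</Name>"))
```

Applying the helper twice leaves the text unchanged, since the second call finds the declaration already in place.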
This work is based on the original design concepts and implementation by Mark E. Johns, and has been designed in cooperation with Dyalog Ltd.