ŒXML Convert to/from XML


Extensible Markup Language (XML) is a widely used standard for storing data in a text format that many different programs can access. It combines the actual data with 'mark-up' which indicates how the data should be interpreted.

The ŒXML system function can be used to extract data from XML format into an APL array, and to generate XML from an APL array. The direction of conversion is determined by the type of the right argument.

See also the ŒIMPORT and ŒEXPORT functions, which allow data to be transferred to/from XML files in a single step.

An Example of XML format

A full description of XML is beyond the scope of this document. However, the following simple but complete XML example demonstrates some of the main features:


<?xml version="1.0" encoding="utf-8"?>
<sales>
    <!-- Sales by month -->
    <month>January
        <item>
            <name>Ice Cream</name>
            <amount currency="dollars">25.10</amount>
       </item>
       <item>
            <name>Fizzy Drinks</name>
            <amount currency="dollars">360.92</amount>
       </item>
    </month>
    <month>February
        <item>
            <name>Ice Cream</name>
            <amount currency="dollars">5.02</amount>
       </item>
       <item>
            <name>Fizzy Drinks</name>
            <amount currency="dollars">403.16</amount>
       </item>
    </month>
</sales>

The first line specifies the XML version used, and the third line ("Sales by month") is a comment. The remainder of the document consists of elements which contain the data. Each element begins with a start tag and ends with a matching end tag, for example:

    <name>...</name>

Element tag names are case-sensitive.

An element may contain data, or other elements nested within it, or both. In addition the start tag may include one or more attributes specifying how the data is to be interpreted. Each attribute is a pair of the form name="value", for example:

    <amount currency="dollars">25.10</amount>

An empty element which contains no data and no other elements nested within it can be written as:

    <name/>

Within an XML document the amount of white space used is usually not significant, for example the number of spaces used to indent an element or the positions of line breaks. The following single-line form is therefore also valid XML:

<item><name>Ice Cream</name><amount currency="dollars">25.10</amount></item>

Converting XML Data to an APL Array

Syntax: R„[options] ŒXML CHRVEC

The right argument is a character vector (with embedded carriage returns and/or line feeds) containing the XML text to be converted. The optional left argument gives some control over the conversion process and is discussed below.

The result is an N-row, 5-column matrix containing a flattened representation of the XML data. Each element in the XML data will produce one row in the result. The columns are as follows:

Column 1: An integer indicating the depth of nesting of the element. A value of 0 is used for the outer-most nesting level, with deeper nesting being indicated by higher numbers.
Column 2: The element name as specified in the start tag.
Column 3: The element data as a character vector.
Column 4: An M-row, 2-column nested matrix containing any attribute name/value pairs. Each item in the matrix is a character vector. If the element has no attributes, this matrix will have 0 rows.
Column 5: A code to help interpret the type of data the row contains (see below).

For example, when presented with the XML sample listed above the array produced is as follows:


      ŒXML xml_data
 0 sales                                  3
 1 month                                  7
 2        January                         4
 2 item                                   3
 3 name   Ice Cream                       5
 3 amount 25.10         currency dollars  5
 2 item                                   3
 3 name   Fizzy Drinks                    5
 3 amount 360.92        currency dollars  5
 1 month                                  7
 2        February                        4
 2 item                                   3
 3 name   Ice Cream                       5
 3 amount 5.02          currency dollars  5
 2 item                                   3
 3 name   Fizzy Drinks                    5
 3 amount 403.16        currency dollars  5
 
 
      ŒDISPLAY ŒXML xml_data
Ú…ÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÌ
‡   Ú…ÎÎÎÎÌ  Ú´Ì            Ú…ÎÎÎÎÎÎÎÎÌ                Û
Û 0 ÛsalesÛ  Û Û            ² Ú´Ì Ú´Ì Û              3 Û
Û   ÀÎÎÎÎÎÙ  ÀÎÙ            Û Û Û Û Û Û                Û
Û                           Û ÀÎÙ ÀÎÙ Û                Û
Û                           À¹ÎÎÎÎÎÎÎÎÙ                Û
Û   Ú…ÎÎÎÎÌ  Ú´Ì            Ú…ÎÎÎÎÎÎÎÎÌ                Û
Û 1 ÛmonthÛ  Û Û            ² Ú´Ì Ú´Ì Û              7 Û
Û   ÀÎÎÎÎÎÙ  ÀÎÙ            Û Û Û Û Û Û                Û
Û                           Û ÀÎÙ ÀÎÙ Û                Û
Û                           À¹ÎÎÎÎÎÎÎÎÙ                Û
Û   Ú´Ì      Ú…ÎÎÎÎÎÎÌ      Ú…ÎÎÎÎÎÎÎÎÌ                Û
Û 2 Û Û      ÛJanuaryÛ      ² Ú´Ì Ú´Ì Û              4 Û
Û   ÀÎÙ      ÀÎÎÎÎÎÎÎÙ      Û Û Û Û Û Û                Û
Û                           Û ÀÎÙ ÀÎÙ Û                Û
Û                           À¹ÎÎÎÎÎÎÎÎÙ                Û
Û   Ú…ÎÎÎÌ   Ú´Ì            Ú…ÎÎÎÎÎÎÎÎÌ                Û
Û 2 ÛitemÛ   Û Û            ² Ú´Ì Ú´Ì Û              3 Û
Û   ÀÎÎÎÎÙ   ÀÎÙ            Û Û Û Û Û Û                Û
Û                           Û ÀÎÙ ÀÎÙ Û                Û
Û                           À¹ÎÎÎÎÎÎÎÎÙ                Û
Û   Ú…ÎÎÎÌ   Ú…ÎÎÎÎÎÎÎÎÌ    Ú…ÎÎÎÎÎÎÎÎÌ                Û
Û 3 ÛnameÛ   ÛIce CreamÛ    ² Ú´Ì Ú´Ì Û              5 Û
Û   ÀÎÎÎÎÙ   ÀÎÎÎÎÎÎÎÎÎÙ    Û Û Û Û Û Û                Û
Û                           Û ÀÎÙ ÀÎÙ Û                Û
Û                           À¹ÎÎÎÎÎÎÎÎÙ                Û
Û   Ú…ÎÎÎÎÎÌ Ú…ÎÎÎÎÌ        Ú…ÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÌ   Û
Û 3 ÛamountÛ Û25.10Û        ‡ Ú…ÎÎÎÎÎÎÎÌ Ú…ÎÎÎÎÎÎÌ Û 5 Û
Û   ÀÎÎÎÎÎÎÙ ÀÎÎÎÎÎÙ        Û ÛcurrencyÛ ÛdollarsÛ Û   Û
Û                           Û ÀÎÎÎÎÎÎÎÎÙ ÀÎÎÎÎÎÎÎÙ Û   Û
Û                           À¹ÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÙ   Û
Û   Ú…ÎÎÎÌ   Ú´Ì            Ú…ÎÎÎÎÎÎÎÎÌ                Û
Û 2 ÛitemÛ   Û Û            ² Ú´Ì Ú´Ì Û              3 Û
Û   ÀÎÎÎÎÙ   ÀÎÙ            Û Û Û Û Û Û                Û
Û                           Û ÀÎÙ ÀÎÙ Û                Û
Û                           À¹ÎÎÎÎÎÎÎÎÙ                Û
Û   Ú…ÎÎÎÌ   Ú…ÎÎÎÎÎÎÎÎÎÎÎÌ Ú…ÎÎÎÎÎÎÎÎÌ                Û
Û 3 ÛnameÛ   ÛFizzy DrinksÛ ² Ú´Ì Ú´Ì Û              5 Û
Û   ÀÎÎÎÎÙ   ÀÎÎÎÎÎÎÎÎÎÎÎÎÙ Û Û Û Û Û Û                Û
Û                           Û ÀÎÙ ÀÎÙ Û                Û
Û                           À¹ÎÎÎÎÎÎÎÎÙ                Û
Û   Ú…ÎÎÎÎÎÌ Ú…ÎÎÎÎÎÌ       Ú…ÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÌ   Û
Û 3 ÛamountÛ Û360.92Û       ‡ Ú…ÎÎÎÎÎÎÎÌ Ú…ÎÎÎÎÎÎÌ Û 5 Û
Û   ÀÎÎÎÎÎÎÙ ÀÎÎÎÎÎÎÙ       Û ÛcurrencyÛ ÛdollarsÛ Û   Û
Û                           Û ÀÎÎÎÎÎÎÎÎÙ ÀÎÎÎÎÎÎÎÙ Û   Û
Û                           À¹ÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÙ   Û
Û   Ú…ÎÎÎÎÌ  Ú´Ì            Ú…ÎÎÎÎÎÎÎÎÌ                Û
Û 1 ÛmonthÛ  Û Û            ² Ú´Ì Ú´Ì Û              7 Û
Û   ÀÎÎÎÎÎÙ  ÀÎÙ            Û Û Û Û Û Û                Û
Û                           Û ÀÎÙ ÀÎÙ Û                Û
Û                           À¹ÎÎÎÎÎÎÎÎÙ                Û
Û   Ú´Ì      Ú…ÎÎÎÎÎÎÎÌ     Ú…ÎÎÎÎÎÎÎÎÌ                Û
Û 2 Û Û      ÛFebruaryÛ     ² Ú´Ì Ú´Ì Û              4 Û
Û   ÀÎÙ      ÀÎÎÎÎÎÎÎÎÙ     Û Û Û Û Û Û                Û
Û                           Û ÀÎÙ ÀÎÙ Û                Û
Û                           À¹ÎÎÎÎÎÎÎÎÙ                Û
Û   Ú…ÎÎÎÌ   Ú´Ì            Ú…ÎÎÎÎÎÎÎÎÌ                Û
Û 2 ÛitemÛ   Û Û            ² Ú´Ì Ú´Ì Û              3 Û
Û   ÀÎÎÎÎÙ   ÀÎÙ            Û Û Û Û Û Û                Û
Û                           Û ÀÎÙ ÀÎÙ Û                Û
Û                           À¹ÎÎÎÎÎÎÎÎÙ                Û
Û   Ú…ÎÎÎÌ   Ú…ÎÎÎÎÎÎÎÎÌ    Ú…ÎÎÎÎÎÎÎÎÌ                Û
Û 3 ÛnameÛ   ÛIce CreamÛ    ² Ú´Ì Ú´Ì Û              5 Û
Û   ÀÎÎÎÎÙ   ÀÎÎÎÎÎÎÎÎÎÙ    Û Û Û Û Û Û                Û
Û                           Û ÀÎÙ ÀÎÙ Û                Û
Û                           À¹ÎÎÎÎÎÎÎÎÙ                Û
Û   Ú…ÎÎÎÎÎÌ Ú…ÎÎÎÌ         Ú…ÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÌ   Û
Û 3 ÛamountÛ Û5.02Û         ‡ Ú…ÎÎÎÎÎÎÎÌ Ú…ÎÎÎÎÎÎÌ Û 5 Û
Û   ÀÎÎÎÎÎÎÙ ÀÎÎÎÎÙ         Û ÛcurrencyÛ ÛdollarsÛ Û   Û
Û                           Û ÀÎÎÎÎÎÎÎÎÙ ÀÎÎÎÎÎÎÎÙ Û   Û
Û                           À¹ÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÙ   Û
Û   Ú…ÎÎÎÌ   Ú´Ì            Ú…ÎÎÎÎÎÎÎÎÌ                Û
Û 2 ÛitemÛ   Û Û            ² Ú´Ì Ú´Ì Û              3 Û
Û   ÀÎÎÎÎÙ   ÀÎÙ            Û Û Û Û Û Û                Û
Û                           Û ÀÎÙ ÀÎÙ Û                Û
Û                           À¹ÎÎÎÎÎÎÎÎÙ                Û
Û   Ú…ÎÎÎÌ   Ú…ÎÎÎÎÎÎÎÎÎÎÎÌ Ú…ÎÎÎÎÎÎÎÎÌ                Û
Û 3 ÛnameÛ   ÛFizzy DrinksÛ ² Ú´Ì Ú´Ì Û              5 Û
Û   ÀÎÎÎÎÙ   ÀÎÎÎÎÎÎÎÎÎÎÎÎÙ Û Û Û Û Û Û                Û
Û                           Û ÀÎÙ ÀÎÙ Û                Û
Û                           À¹ÎÎÎÎÎÎÎÎÙ                Û
Û   Ú…ÎÎÎÎÎÌ Ú…ÎÎÎÎÎÌ       Ú…ÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÌ   Û
Û 3 ÛamountÛ Û403.16Û       ‡ Ú…ÎÎÎÎÎÎÎÌ Ú…ÎÎÎÎÎÎÌ Û 5 Û
Û   ÀÎÎÎÎÎÎÙ ÀÎÎÎÎÎÎÙ       Û ÛcurrencyÛ ÛdollarsÛ Û   Û
Û                           Û ÀÎÎÎÎÎÎÎÎÙ ÀÎÎÎÎÎÎÎÙ Û   Û
Û                           À¹ÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÙ   Û
À¹ÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÙ

Options for converting XML to an APL array

The conversion from XML to an APL array described above can be controlled by an optional left argument which consists of one or more keyword/value pairs, for example:


     R„('markup' 'preserve') ('whitespace' 'preserve') ŒXML xml_data

The supported keywords are:

Type code returned by ŒXML

The fifth column of the array produced by ŒXML contains a type code which can be used to interpret the row. Its value depends on whether the XML element has any children.

Possible children can be of the following types. (Note that if markup is stripped only the first of these types can occur in the final result).

(a) If the XML element has children its type code is formed from a sum of the following values, reflecting the types of children found on subsequent rows:

 1  Element has a tag (in column 2) (always true)
 2  Element contains nested child element
 4  Element contains data as well as nested items
 8  Element contains nested XML markup
16  Element contains nested XML comment
32  Element contains nested XML Processing Instruction

For example, the element <Weight> shown below has a type code of 21 (1 + 4 + 16) when markup and comments are preserved:


    <Weight>
        <!-- All weights approximate-->
        100
    </Weight>

Notice that an XML element with children always has a tag name in column 2. It never has any data in column 3: all the data is returned in subsequent rows.
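Because the type code for an element with children is a sum of distinct powers of two, it can be split back into its components with bitwise tests. The following is a minimal sketch in Python rather than APL; the function name and the short labels are illustrative assumptions, not part of APLX.

```python
# Component values of the type code, as listed in the table above.
# The short labels are illustrative only.
FLAGS = [
    (1, "tag"),
    (2, "child element"),
    (4, "data"),
    (8, "markup"),
    (16, "comment"),
    (32, "processing instruction"),
]


def decode_type_code(code):
    """Return the labels of the components that are summed into a type code."""
    return [label for bit, label in FLAGS if code & bit]


print(decode_type_code(21))  # ['tag', 'data', 'comment']
```

The same decomposition applies to the codes for childless rows, for example 5 splits into 1 (tag) and 4 (data).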

(b) The following type codes are used for XML elements which don't have any children:

 1  Element is an empty XML tag, e.g. <empty/>. The tag name is returned in column 2, and column 3 is blank.
 4  Row is data for parent (see below). The data is returned in column 3, and column 2 is blank.
 5  Element has an XML tag and data, e.g. <Tag>Data</Tag>. The tag name is returned in column 2 and the data in column 3.
 8  Element is unprocessed XML markup, e.g. <!ELEMENT name (#PCDATA)>. The markup is returned in column 2, and column 3 is blank.
16  Element is an XML comment, e.g. <!--Comment-->. The comment is returned in column 2, and column 3 is blank.
32  Element is an XML Processing Instruction, e.g. <?xml version="1.0" encoding="utf-8"?>. The processing instruction is returned in column 2, and column 3 is blank.

The following example illustrates how the codes are used:


  <Tag1>Text
        <Tag2>
        <Tag3>Text</Tag3>
        </Tag2> 
        More Text
  </Tag1>

When converted by ŒXML this will produce the following array:


     ŒDISPLAY ŒXML xml_data
Ú…ÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÌ
‡   Ú…ÎÎÎÌ Ú´Ì         Ú…ÎÎÎÎÎÎÎÎÌ   Û
Û 0 ÛTag1Û Û Û         ² Ú´Ì Ú´Ì Û 7 Û
Û   ÀÎÎÎÎÙ ÀÎÙ         Û Û Û Û Û Û   Û
Û                      Û ÀÎÙ ÀÎÙ Û   Û
Û                      À¹ÎÎÎÎÎÎÎÎÙ   Û
Û   Ú´Ì    Ú…ÎÎÎÌ      Ú…ÎÎÎÎÎÎÎÎÌ   Û
Û 1 Û Û    ÛTextÛ      ² Ú´Ì Ú´Ì Û 4 Û
Û   ÀÎÙ    ÀÎÎÎÎÙ      Û Û Û Û Û Û   Û
Û                      Û ÀÎÙ ÀÎÙ Û   Û
Û                      À¹ÎÎÎÎÎÎÎÎÙ   Û
Û   Ú…ÎÎÎÌ Ú´Ì         Ú…ÎÎÎÎÎÎÎÎÌ   Û
Û 1 ÛTag2Û Û Û         ² Ú´Ì Ú´Ì Û 3 Û
Û   ÀÎÎÎÎÙ ÀÎÙ         Û Û Û Û Û Û   Û
Û                      Û ÀÎÙ ÀÎÙ Û   Û
Û                      À¹ÎÎÎÎÎÎÎÎÙ   Û
Û   Ú…ÎÎÎÌ Ú…ÎÎÎÌ      Ú…ÎÎÎÎÎÎÎÎÌ   Û
Û 2 ÛTag3Û ÛTextÛ      ² Ú´Ì Ú´Ì Û 5 Û
Û   ÀÎÎÎÎÙ ÀÎÎÎÎÙ      Û Û Û Û Û Û   Û
Û                      Û ÀÎÙ ÀÎÙ Û   Û
Û                      À¹ÎÎÎÎÎÎÎÎÙ   Û
Û   Ú´Ì    Ú…ÎÎÎÎÎÎÎÎÌ Ú…ÎÎÎÎÎÎÎÎÌ   Û
Û 1 Û Û    ÛMore TextÛ ² Ú´Ì Ú´Ì Û 4 Û
Û   ÀÎÙ    ÀÎÎÎÎÎÎÎÎÎÙ Û Û Û Û Û Û   Û
Û                      Û ÀÎÙ ÀÎÙ Û   Û
Û                      À¹ÎÎÎÎÎÎÎÎÙ   Û
À¹ÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÎÙ
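The flattening and the type codes for elements and data rows can be modelled outside APL. The sketch below uses Python's standard xml.etree.ElementTree module to build rows of (depth, tag, data, attributes, type code) from the example above. It is an illustration of the row layout only, not the APLX implementation: the function name is hypothetical, attributes are held as a list of pairs rather than a 2-column matrix, white space is assumed to be trimmed, and comments and other markup are assumed to be stripped.

```python
import xml.etree.ElementTree as ET


def flatten(elem, depth=0, rows=None):
    """Return a list of (depth, tag, data, attributes, type_code) rows."""
    if rows is None:
        rows = []
    text = (elem.text or "").strip()
    children = list(elem)
    attrs = list(elem.attrib.items())
    if not children:
        # Childless element: code 1 (empty tag) or 5 (tag plus data).
        rows.append((depth, elem.tag, text, attrs, 1 + (4 if text else 0)))
        return rows
    # Text that follows a child element is stored on that child as its tail.
    tails = [(child.tail or "").strip() for child in children]
    has_data = bool(text) or any(tails)
    # Element with children: 1 (tag) + 2 (child elements) + 4 if it has data.
    rows.append((depth, elem.tag, "", attrs, 3 + (4 if has_data else 0)))
    if text:
        rows.append((depth + 1, "", text, [], 4))  # data row for the parent
    for child, tail in zip(children, tails):
        flatten(child, depth + 1, rows)
        if tail:
            rows.append((depth + 1, "", tail, [], 4))
    return rows


sample = "<Tag1>Text<Tag2><Tag3>Text</Tag3></Tag2>More Text</Tag1>"
for row in flatten(ET.fromstring(sample)):
    print(row)
```

For this sample the depth, tag, data and type code of each row agree with the array displayed above.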

Creating XML Data from an APL Array

Syntax: R„[options] ŒXML NSTMAT

When presented with an array of APL data, ŒXML will convert it to its XML representation. The result is a character vector with embedded line-feed characters.

The right argument must be a nested matrix with one row for each XML element, and between 3 and 5 columns as follows:

Column 1: An integer indicating the depth of nesting of the element. A value of 0 is used for the outer-most nesting level, with deeper nesting being indicated by higher numbers.
Column 2: The element name to use for the start tag.
Column 3: The element data (see below).
Column 4: (Optional) An M-row, 2-column nested matrix containing any attribute name/value pairs. Each item in the matrix is a character vector. If the element has no attributes you can specify a 0-row matrix, or a pair of empty character vectors. If none of the elements have any attributes you can omit column 4 completely.
Column 5: (Optional) An integer type code (ignored). This column is only used to facilitate round-trip conversions from XML to APL and back again.

The data specified in Column 3 will usually be a character vector or scalar. However, as a convenience ŒXML also allows you to specify numeric values. These are formatted as character data before copying to the XML result. Numeric values are also allowed for attribute values (but not names).

Example:


      array„1 4½0 '?xml version="1.0" encoding="utf-8"?' '' ('' '')
      array„array®0 'Person' '' ('' '')
      array„array®1 'Name' '' ('order' 'western')
      array„array®2 'FirstName' 'Fred' ('' '')
      array„array®2 'LastName' 'Smith' ('' '')
      array„array®1 'DateOfBirth' '' ('' '')
      array„array®2 'Year' 1943 ('' '')
      array„array®2 'Month' 12 ('' '')
      array„array®2 'Day' 17 ('' '')
      XML„ŒXML array
      ŒSS XML ŒL ŒR    © Convert line feeds to carriage returns for display
<?xml version="1.0" encoding="utf-8"?>
<Person>
    <Name order="western">
        <FirstName>Fred</FirstName>
        <LastName>Smith</LastName>
    </Name>
    <DateOfBirth>
        <Year>1943</Year>
        <Month>12</Month>
        <Day>17</Day>
    </DateOfBirth>
</Person>
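The way the depth column drives the nesting of start and end tags can be modelled as follows. This is a minimal sketch in Python, not the APLX implementation: the function name is hypothetical, each row is a tuple of (depth, name, data) with an optional list of attribute pairs, and the four-space indentation is an assumption taken from the output shown above.

```python
from xml.sax.saxutils import escape, quoteattr


def rows_to_xml(rows, indent="    "):
    """Build indented XML text from (depth, name, data[, attributes]) rows."""
    lines = []
    open_elems = []  # stack of (depth, name) for elements awaiting an end tag
    for i, row in enumerate(rows):
        depth, name, data = row[0], row[1], str(row[2])  # numbers are formatted
        attrs = row[3] if len(row) > 3 else []
        # Close every open element at this depth or deeper.
        while open_elems and open_elems[-1][0] >= depth:
            d, open_name = open_elems.pop()
            lines.append(f"{indent * d}</{open_name}>")
        pad = indent * depth
        if not name:  # data row belonging to the parent element
            lines.append(pad + escape(data))
            continue
        if name.startswith("?"):  # prologue or other processing instruction
            lines.append(f"{pad}<{name}>")
            continue
        # Pairs with an empty name stand for "no attributes" and are skipped.
        attr_text = "".join(f" {k}={quoteattr(str(v))}" for k, v in attrs if k)
        has_children = i + 1 < len(rows) and rows[i + 1][0] > depth
        if has_children:
            lines.append(f"{pad}<{name}{attr_text}>")
            open_elems.append((depth, name))
            if data:
                lines.append(indent * (depth + 1) + escape(data))
        elif data:
            lines.append(f"{pad}<{name}{attr_text}>{escape(data)}</{name}>")
        else:
            lines.append(f"{pad}<{name}{attr_text}/>")
    while open_elems:
        d, open_name = open_elems.pop()
        lines.append(f"{indent * d}</{open_name}>")
    return "\n".join(lines)


rows = [
    (0, "Person", "", []),
    (1, "Name", "", [("order", "western")]),
    (2, "FirstName", "Fred", []),
    (1, "Born", 1943),
]
print(rows_to_xml(rows))
```

An end tag is written whenever a later row returns to the same or a shallower depth, which is why no explicit end-tag rows are needed in the array.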

The conversion process can be controlled by an optional left argument, for example:


     R„('whitespace' 'preserve') ŒXML apl_data

The only supported option is the 'whitespace' keyword shown in the example above.

Adding the XML Prologue

An XML file should normally start with a line containing an XML prologue (XML 1.0 recommends it, and XML 1.1 requires it), e.g.


<?xml version="1.0" encoding="utf-8"?>

Note that ŒXML does not add the prologue automatically. To ensure that the XML is valid you must do one of two things:

(a) Make sure that the first row of the array used to generate the XML contains a valid prologue, as in the example above, or

(b) Prepend the prologue after the XML has been generated:


      XML„ŒXML 1 3½0 'Name' 'Fred Smith'
      XML„'<?xml version="1.0" encoding="utf-8"?>',ŒL,XML

If you create an XML file using ŒEXPORT, APLX will automatically add the prologue if it is missing from the array.
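The "add the prologue only if it is missing" behaviour can be sketched as below. This is an illustrative Python model with a hypothetical function name; how ŒEXPORT actually detects a missing prologue is not described here, so the check used is an assumption.

```python
PROLOGUE = '<?xml version="1.0" encoding="utf-8"?>'


def ensure_prologue(xml_text):
    """Prepend the XML prologue unless the text already starts with one."""
    # Assumption: a prologue is recognised by the leading "<?xml " marker.
    if xml_text.lstrip().startswith("<?xml "):
        return xml_text
    return PROLOGUE + "\n" + xml_text


print(ensure_prologue("<Name>Fred Smith</Name>"))
```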


Acknowledgment

This work is based on the original design concepts and implementation by Mark E. Johns, and has been designed in cooperation with Dyalog Ltd.

