The ⎕NREAD function allows you to read data from anywhere in the file, specifying an optional start byte, count and conversion mode. The file must have been opened in a mode which permits reading. The full syntax is ({} means optional):
R ← ⎕NREAD TIENO {,CONV {,COUNT {,STARTBYTE}}}
TIENO is the tie number associated with the file to read
CONV specifies any conversion to apply to the data - e.g. read as raw data,
translated characters, 4-byte integers, booleans, etc. The default is to
read the file as raw character data. See below for details.
COUNT specifies the number of elements to read (except when CONV = 11,
see below). The number of bytes read from the file will depend on the
conversion mode used (see below). A value of ¯1 (default) means read to
end-of-file.
STARTBYTE may be used to specify the offset in bytes from the beginning
of the file at which to start reading data. A value of ¯1 (default)
specifies the current file position, the position at which the last
successful read or write operation completed.
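The interaction between COUNT, STARTBYTE and the current file position can be modelled in a few lines. The Python sketch below is not APLX code: `TiedFile` and `nread_raw` are hypothetical names, it handles only raw-byte reads (CONV = 0, one byte per element), and it deliberately omits the error checks described later.

```python
class TiedFile:
    """Hypothetical stand-in for a tied file: a byte buffer plus a position."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0  # current file position, in bytes from the start

    def nread_raw(self, count: int = -1, startbyte: int = -1) -> bytes:
        """Raw-byte read (CONV = 0); error checking is omitted here."""
        # STARTBYTE = -1: start at the current file position
        start = self.pos if startbyte == -1 else startbyte
        # COUNT = -1: read everything up to end-of-file
        end = len(self.data) if count == -1 else start + count
        self.pos = end  # the position is left where this read completed
        return self.data[start:end]


f = TiedFile(b"ABCDEFGHIJ")
print(f.nread_raw(3))      # b'ABC'
print(f.nread_raw(2, 5))   # b'FG', read from offset 5
print(f.nread_raw())       # b'HIJ', continues from offset 7
```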
The conversion mode parameter CONV can be used to read data very easily from a file with a known structure. Data can be read as raw bytes, translated characters, booleans, integers or floating point numbers. In addition, byte-swapping facilities allow data to be read from a file created on a host with different local byte-ordering conventions. The full list of supported values for the conversion mode is as follows:
Normal modes:
 0    read data as a stream of raw bytes
 1    read data as booleans, 1 bit per element
 2    read data as 32-bit integers
 3    read data as 64-bit IEEE double-precision floating point numbers
 4    read character data and translate from external representation to APLX's own internal format
 5    read Unicode UTF-16 characters (two bytes per element) and convert to APLX internal representation as characters. Any Unicode values which cannot be mapped to APLX characters are converted to the value set by ⎕MC (by default, question mark).
 6    read data as 32-bit IEEE single-precision floating point numbers
 8    read Unicode UTF-8 characters (variable bytes per element) and convert to APLX internal representation as characters. Any Unicode values which cannot be mapped to APLX characters are converted to the value set by ⎕MC (by default, question mark).
Byte-swapped modes:
¯2    read data as 32-bit byte-swapped integers
¯3    read data as 64-bit byte-swapped floats
¯5    read data as byte-swapped Unicode UTF-16 characters
¯6    read data as 32-bit byte-swapped floats
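The difference between a normal and a byte-swapped mode is only the order in which each element's bytes are assembled. As an illustration (a Python sketch, not APLX code), the hypothetical `decode` function below models modes 2, 3 and 6 and their negative counterparts with the standard `struct` module. The host's byte order is passed in explicitly so the effect is visible on any machine, and 32-bit integers are assumed to be signed.

```python
import struct

# Element types for three numeric modes; the sign of CONV picks the byte order.
ELEMENT_TYPE = {2: "i", 3: "d", 6: "f"}  # signed 32-bit int, 64-bit double, 32-bit float
OPPOSITE = {"<": ">", ">": "<"}          # little-endian <-> big-endian


def decode(data: bytes, conv: int, host_order: str = "<") -> tuple:
    """Decode `data` as mode `conv` on a host with byte order `host_order`
    ('<' = little-endian, '>' = big-endian). Negative `conv` = byte-swapped."""
    code = ELEMENT_TYPE[abs(conv)]
    order = host_order if conv > 0 else OPPOSITE[host_order]
    count = len(data) // struct.calcsize(order + code)
    return struct.unpack(f"{order}{count}{code}", data)


raw = bytes([1, 0, 0, 0])     # the integer 1 as written on a little-endian host
print(decode(raw, 2, "<"))    # (1,): mode 2 on the same kind of host
print(decode(raw, -2, ">"))   # (1,): mode -2 on a big-endian host
print(decode(raw, 2, ">"))    # (16777216,): wrong mode, bytes misread
```

The last line shows why the sign matters: the same four bytes give a very different value when assembled in the wrong order.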
For compatibility with some other APL interpreters, the following conversion specifiers are also supported:
  11    read data as booleans (same as mode=1)
  82    read data as raw characters (same as mode=0)
 163    read data as unsigned 16-bit integers. Values are converted to 32-bit integers before being returned.
 323    read data as 32-bit integers (same as mode=2)
 325    read data as 32-bit floating point numbers (same as mode=6)
 645    read data as 64-bit floating point numbers (same as mode=3)
¯163    read data as unsigned 16-bit integers with byte-swapping
¯323    read data as 32-bit integers with byte-swapping (same as mode=¯2)
¯325    read data as 32-bit floats with byte-swapping (same as mode=¯6)
¯645    read data as 64-bit floats with byte-swapping (same as mode=¯3)
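A hypothetical Python sketch of the compatibility table follows: `native_mode` maps each alias to the mode it duplicates, and `read_uint16` models codes 163 and ¯163, which have no native equivalent. Both names are illustrative only and are not part of APLX.

```python
import struct

# Compatibility codes that the table above describes as aliases of native modes.
COMPAT_TO_NATIVE = {82: 0, 11: 1, 323: 2, 645: 3, 325: 6,
                    -323: -2, -645: -3, -325: -6}


def native_mode(conv: int) -> int:
    """Native equivalent of `conv`; codes without an alias (e.g. 163) pass through.
    Note: 11 and 1 decode the same way, but COUNT means bytes for 11."""
    return COMPAT_TO_NATIVE.get(conv, conv)


def read_uint16(data: bytes, conv: int, host_order: str = "<") -> list:
    """Model of 163 / -163: unsigned 16-bit values returned as ordinary ints."""
    swapped = {"<": ">", ">": "<"}[host_order]
    order = host_order if conv == 163 else swapped
    return list(struct.unpack(f"{order}{len(data) // 2}H", data))


print(native_mode(645), native_mode(-325))   # 3 -6
print(read_uint16(b"\x00\x80", 163))         # [32768], never negative
```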
Under APLX64, the following additional conversion types are available:
   7    read data as 64-bit integers
  ¯7    read data as 64-bit integers with byte-swapping
 643    read data as 64-bit integers (same as mode=7)
¯643    read data as 64-bit integers with byte-swapping (same as mode=¯7)
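The 64-bit integer modes behave like the 32-bit ones with an 8-byte element. The Python sketch below is not APLX code; `read_int64` is a hypothetical name, and it assumes the values are signed, which the table above does not state.

```python
import struct


def read_int64(data: bytes, conv: int, host_order: str = "<") -> list:
    """Model of 7 / 643 (host byte order) and -7 / -643 (byte-swapped),
    assuming signed 64-bit values."""
    if conv not in (7, 643, -7, -643):
        raise ValueError(f"{conv} is not a 64-bit integer mode")
    order = host_order if conv > 0 else {"<": ">", ">": "<"}[host_order]
    return list(struct.unpack(f"{order}{len(data) // 8}q", data))


# 2**40 does not fit in 32 bits, so only a 64-bit mode can read it back whole.
print(read_int64(struct.pack("<q", 2**40), 7))   # [1099511627776]
```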
The optional COUNT specified in the ⎕NREAD argument relates to the number of elements to read, not necessarily the number of bytes. For example, when reading 32-bit integers, the number of bytes read will be four times the value of COUNT. Note that when reading boolean data with CONV = 11, the value of COUNT specifies the number of bytes to read rather than the number of bits. This is done for compatibility with some other APL interpreters.
Note that all file i/o operations start on a byte boundary. In particular, following a boolean read operation that returns a non-integral number of bytes, the current file position will be aligned on the next byte boundary.
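The byte arithmetic in the two paragraphs above can be summarized as a small Python function. This is an illustrative sketch, not part of APLX: `bytes_consumed` and the table of element sizes are hypothetical names derived from the mode list, and UTF-8 (mode 8) is left out because its element size varies.

```python
# Bytes per element for the fixed-size modes listed above
# (UTF-8 mode 8 is variable-length and left out).
BYTES_PER_ELEMENT = {0: 1, 4: 1, 5: 2, 2: 4, 6: 4, 3: 8, 7: 8, 163: 2}


def bytes_consumed(conv: int, count: int) -> int:
    """Bytes a read of `count` elements uses, i.e. how far the position moves."""
    if conv == 11:
        return count                 # COUNT is a byte count for CONV 11
    if conv == 1:
        return (count + 7) // 8      # 1 bit each, rounded up to a byte boundary
    return count * BYTES_PER_ELEMENT[abs(conv)]


print(bytes_consumed(2, 10))    # 40: ten 32-bit integers
print(bytes_consumed(1, 10))    # 2: ten bits, position realigned to the next byte
print(bytes_consumed(11, 10))   # 10: ten bytes, i.e. 80 booleans
```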
Two possible errors may occur when specifying an inappropriate value of COUNT or CONV. If the number of bytes remaining in the file is insufficient to satisfy the request, a FILE I/O ERROR occurs and the current file position is unchanged. For example it is an error to try to read 10000 bytes from a file with only 9000 bytes remaining :-
      ⎕NREAD 100 0 10000
Insufficient data available
FILE I/O ERROR
      ⎕NREAD 100 0 10000
      ^
To read all data up to the end of file, omit the count parameter or specify it as ¯1.
A second type of error can arise when trying to read integers or floats. If the count is not explicitly specified and the number of bytes remaining in the file is not an exact multiple of the element size, a FILE I/O ERROR occurs and the current file position is again unchanged. For example, if there are 23 bytes remaining in the file an attempt to read them as 4-byte integers will fail since there is too much data for five integers and not enough for six:
      ⎕NREAD 100 2
Wrong number of bytes remain for data type requested
FILE I/O ERROR
      ⎕NREAD 100 2
      ^
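The two error conditions can be expressed as a single check. The Python sketch below is a hypothetical model, not the APLX implementation: `elements_available` takes the number of bytes left before end-of-file and the element size for the chosen mode, and raises Python's `IOError` where ⎕NREAD would signal FILE I/O ERROR.

```python
def elements_available(remaining: int, elem_size: int, count: int = -1) -> int:
    """Hypothetical check mirroring the two FILE I/O ERROR cases above.
    `remaining` is the number of bytes between the file position and EOF."""
    if count == -1:
        # Reading to EOF: leftover bytes that don't fill an element are an error
        if remaining % elem_size != 0:
            raise IOError("Wrong number of bytes remain for data type requested")
        return remaining // elem_size
    # Explicit COUNT: there must be enough bytes for every requested element
    if count * elem_size > remaining:
        raise IOError("Insufficient data available")
    return count


print(elements_available(24, 4))      # 6
print(elements_available(23, 4, 5))   # 5: an explicit count avoids the leftover check
```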
Examples:
Read all bytes from current file position to end of file as raw data:
      ⎕NREAD 100
Read from current file position to end of file as integers:
      ⎕NREAD 100 2
Read next ten integers from file:
      ⎕NREAD 100 2 10
Read ten 64-bit floats starting at offset 20 bytes from start of file:
      ⎕NREAD 100 3 10 20
By convention, Unicode UTF-16 plain-text files start with a 'byte-order' mark. This is the special hex value FEFF, represented as a two-byte value in the byte-ordering used to create the file. Thus, on 'big-endian' systems such as the PowerPC Macintosh, the first two bytes of the file will normally be hex FE and FF (decimal 254 and 255). On a 'little-endian' system such as Windows or x86 Linux, multi-byte numbers are stored least-significant byte first, so the first two bytes will normally be FF and FE.
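The two byte layouts can be cross-checked with Python's explicit-endian UTF-16 codecs, which encode U+FEFF without adding a byte-order mark of their own. This only confirms the byte values; it is not APLX code.

```python
# U+FEFF written with each byte ordering.
mark = "\ufeff"
print(list(mark.encode("utf-16-be")))   # [254, 255]  big-endian file
print(list(mark.encode("utf-16-le")))   # [255, 254]  little-endian file
```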
You can use this information to determine whether to use conversion type 5 or ¯5 when reading the contents of a UTF-16 text file, by reading the first element of the file as a 16-bit integer (conversion code 163 for ⎕NREAD). If you get the value 65279 (hex FEFF), the Unicode file was written using the same byte-ordering as the machine you are running on, so no byte reversal is required and you can use conversion code 5 to read the Unicode characters from the remainder of the file. If you get the value 65534 (hex FFFE), the Unicode file was written using the opposite byte-ordering convention to that of the machine you are using, so you need to use conversion code ¯5. For example:
      'c:\temp\uni.txt' ⎕NTIE 1      ⍝ Open a UTF-16 text file
      ⎕NREAD 1 163 1 0               ⍝ Read first two bytes as 16-bit integer
65279                                ⍝ This is the correct value for hex FEFF
      ⎕AF 4 ⎕DR 65279
0 0 254 255
      TEXT←⎕NREAD 1 5 ¯1             ⍝ Read the remainder of the file as Unicode
      ⎕NUNTIE 1
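The same decision procedure is sketched below in Python (not APLX). `utf16_mode_from_bom` and `decode_utf16_body` are hypothetical helpers: the first interprets the first two bytes as an unsigned 16-bit integer in host byte order, as conversion code 163 does, and chooses 5 or ¯5; the second decodes the rest of the file the way that mode would. The host byte order is a parameter so the example behaves the same on any machine.

```python
import struct


def utf16_mode_from_bom(first_two: bytes, host_order: str = "<") -> int:
    """Read the first element in host order (like CONV 163) and pick 5 or -5."""
    (value,) = struct.unpack(host_order + "H", first_two)
    if value == 0xFEFF:
        return 5      # file byte order matches the host: no swapping
    if value == 0xFFFE:
        return -5     # file was written with the opposite byte order
    raise ValueError("no UTF-16 byte-order mark")


def decode_utf16_body(body: bytes, mode: int, host_order: str = "<") -> str:
    """Decode the text after the mark the way mode 5 / -5 would."""
    host_is_little = host_order == "<"
    file_is_little = host_is_little if mode == 5 else not host_is_little
    return body.decode("utf-16-le" if file_is_little else "utf-16-be")


data = "\ufeffHi".encode("utf-16-be")   # a big-endian file: mark, then "Hi"
mode = utf16_mode_from_bom(data[:2], host_order="<")
print(mode, decode_utf16_body(data[2:], mode, host_order="<"))   # -5 Hi
```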
If you want to read UTF-16 files without converting them to APLX characters, use conversion type 163 or ¯163 to read them as 2-byte (unsigned) integers, with byte-swapping if necessary. This allows you to process Unicode values which cannot be represented in the APLX character set. If you later need to convert the returned integer values to APLX text, use ⎕UCS.
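The trade-off between mode 5 and reading integers can be illustrated with a Python sketch (not APLX code). The hypothetical `as_text` function models the lossy step in mode 5: each value outside the character set is replaced by a substitute character, the role ⎕MC plays. `in_charset` is an arbitrary stand-in for the real APLX character set, which is not limited to the 256 values assumed here.

```python
def as_text(units: list, representable, mc: str = "?") -> str:
    """Convert integer code values to text, replacing any value that
    `representable` rejects with `mc` (the role ⎕MC plays for mode 5)."""
    return "".join(chr(u) if representable(u) else mc for u in units)


def in_charset(u: int) -> bool:
    """Hypothetical stand-in for 'has an APLX character'."""
    return u < 256


units = [0x41, 0x4E2D, 0x42]        # as a 163 read might return 'A', U+4E2D, 'B'
print(as_text(units, in_charset))   # A?B: the CJK character is lost
print(hex(units[1]))                # 0x4e2d: still available as an integer
```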
See also ⎕MC, which contains the character used to replace Unicode characters which cannot be represented in APLX.