Structure and General Syntax

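A Rexx program is built from a series of clauses. Each clause is composed of optional leading whitespace (which is ignored), a sequence of tokens (see Tokens), optional trailing whitespace (again ignored), and a semicolon (;) delimiter that can be implied by a line end, certain keywords, or a colon (:).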

Conceptually, each clause is scanned from left to right before processing, and the tokens composing it are identified. Instruction keywords are recognized at this stage, comments are removed, and sequences of whitespace characters (except within literal strings) are converted to single blanks. Whitespace characters adjacent to operator characters and special characters are also removed.

Characters

A character is a member of a defined set of elements that is used for the control or representation of data. You can usually enter a character with a single keystroke. The coded representation of a character is its representation in digital form. A character, the letter A, for example, differs from its coded representation or encoding. Various coded character sets (such as ASCII and EBCDIC) use different encodings for the letter A (decimal values 65 and 193, respectively). This book uses characters to convey meanings and not to imply a specific character code, except where otherwise stated. The exceptions are certain built-in functions that convert between characters and their representations. The functions C2D, C2X, D2C, X2C, and XRANGE depend on the character set used.
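The dependence on the character set can be seen with a short sketch. The variable names letter, code, and roundtrip are illustrative only, and the encodings displayed depend on the platform (65 is expected only where ASCII is in use).

```rexx
/* Sketch only: the displayed values depend on the platform's character set. */
letter = "A"
code = c2d(letter)              -- decimal encoding: 65 in ASCII, 193 in EBCDIC
say "Decimal encoding:" code
say "Hex encoding:" c2x(letter)
roundtrip = d2c(code)           -- converts the encoding back to the character
say "Round trip:" roundtrip
say "Via hex:" x2c(c2x(letter))
say "Range:" xrange("a", "e")   -- characters with encodings from "a" through "e"
```

The round trips return the original character on any character set, whereas the numeric encodings themselves differ between, for example, ASCII and EBCDIC systems.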

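A code page specifies the encodings for each character in a set. Be aware that some code pages do not contain all characters that Rexx defines as valid (for example, the logical NOT character), and that the same character can have different encodings in different code pages.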

Whitespace

A whitespace character is one that the interpreter recognizes as a "blank" or "space" character. Rexx uses two whitespace characters, which can be used interchangeably:

(blank)

A "blank" or "space" character. This is represented by '20'X in ASCII implementations.

(horizontal tab)

A "tab". This is represented by '09'X in ASCII implementations.

Horizontal tabs encountered in Rexx program source are converted into blanks, allowing tab characters and blanks to be used interchangeably in source. Additionally, Rexx operations such as the PARSE instruction or the SUBWORD() built-in function also accept either blank or tab characters as word delimiters.
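As a minimal sketch of the word-delimiter rule, the following builds a string that contains a tab and then splits it into words. The variable names are illustrative, and "09"x is the tab encoding on ASCII implementations only.

```rexx
tab = "09"x                          -- horizontal tab (ASCII encoding)
text = "alpha" || tab || "beta gamma"
say words(text)                      -- counts words delimited by tab or blank
say subword(text, 2, 1)              -- second word of the string
parse var text first second third    -- PARSE splits on either character
say first "/" second "/" third
```

Under the rule described above, the tab and the blank both separate words, so the string is treated as three words.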

Comments

A comment is a sequence of characters delimited by specific characters. It is ignored by the program but acts as a separator. For example, a token containing one comment is treated as two tokens.

The interpreter recognizes the following types of comments:

A line comment is started by two consecutive minus signs (--) and ends at the end of the line. Example:

"Fred"
"Don't Panic!"
'You shouldn''t'        -- Same as "You shouldn't"
""

In this example, the language processor processes the statements from 'Fred' to 'You shouldn''t', ignores the words following the line comment, and continues to process the statement "".

A standard comment is a sequence of characters (on one or more lines) delimited by /* and */. Within these delimiters any characters are allowed. Standard comments can contain other standard comments, as long as each begins and ends with the necessary delimiters. They are called nested comments. Standard comments can be anywhere and of any length.

/* This is an example of a valid Rexx comment */
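The following sketch shows a nested standard comment together with both comment types placed after code on the same line. The variable name count is illustrative.

```rexx
/* The outer comment begins here
   /* this nested comment is closed first */
   and this line is still inside the outer comment
*/
count = 1            /* a standard comment after code */
count = count + 1    -- a line comment after code
say count            -- displays 2
```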

Take special care when commenting out lines of code containing /* or */ as part of a literal string. Consider the following program segment:

01    parse pull input
02    if substr(input,1,5) = "/*123"
03      then call process
04    dept = substr(input,32,5)

To comment out lines 2 and 3, the following change would be incorrect:

01    parse pull input
02 /* if substr(input,1,5) = "/*123"
03      then call process
04 */ dept = substr(input,32,5)

This is incorrect because the language processor would interpret the /* that is part of the literal string /*123 as the start of a nested standard comment. It would not process the rest of the program because it would be looking for a matching standard comment end (*/).

You can avoid this type of problem by using concatenation for literal strings containing /* or */; line 2 would be:

if substr(input,1,5) = "/" || "*123"

You could comment out lines 2 and 3 correctly as follows:

01    parse pull input
02 /* if substr(input,1,5) = "/" || "*123"
03      then call process
04 */ dept = substr(input,32,5)

Both types of comments can be mixed and nested. However, when you nest the two types, the type of comment that comes first takes precedence over the one nested. Here is an example:

"Fred"
"Don't Panic!"
'You shouldn''t'        /* Same as "You shouldn't"
""                      -- The null string         */

In this example, the language processor ignores everything after 'You shouldn''t' up to the end of the last line. In this case, the standard comment has precedence over the line comment.

When nesting the two comment types, make sure that the start delimiter of the standard comment /* is not in the line commented out with the line comment signs.

Example:

"Fred"
"Don't Panic!"
'You shouldn''t'        -- Same as /* "You shouldn't"
""                      The null string         */

This example produces an error because the language processor ignores the start delimiter of the standard comment, which is commented out using the line comment.

Tokens

A token is the unit of low-level syntax from which clauses are built. Programs written in Rexx are composed of tokens. Tokens can be of any length, up to an implementation-restricted maximum. They are separated by whitespace or comments, or by the nature of the tokens themselves. The classes of tokens are:

Literal Strings

A literal string is a sequence including any characters except line feed (X"0A") and delimited by a single quotation mark (') or a double quotation mark ("). You use two consecutive double quotation marks ("") to represent one double quotation mark (") within a string delimited by double quotation marks. Similarly, you use two consecutive single quotation marks ('') to represent one single quotation mark (') within a string delimited by single quotation marks. A literal string is a constant and its contents are never modified when it is processed. Literal strings must be complete on a single line. This means that unmatched quotation marks can be detected on the line where they occur.

A literal string with no characters (that is, a string of length 0) is called a null string.

These are valid strings:

"Fred"
"Don't Panic!"
'You shouldn''t'        /* Same as "You shouldn't" */
""                      /* The null string         */

Implementation maximum: A literal string has no upper bound on the number of characters; it is limited only by available memory.
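The quotation-mark rules can be checked with a short sketch. The variable names are illustrative.

```rexx
single = 'You shouldn''t'      -- doubled ' inside a '-delimited string
double = "You shouldn't"       -- no doubling needed inside "-delimited string
say single == double           -- displays 1: both are the same 13 characters
say length(single)             -- displays 13
quote = "He said ""hi"""       -- doubled " inside a "-delimited string
say quote                      -- displays He said "hi"
say length("")                 -- displays 0: the null string
```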

Note that a string immediately followed by a left parenthesis is considered to be the name of a function. If immediately followed by the symbol X or x, it is considered to be a hexadecimal string. If followed immediately by the symbol B or b, it is considered to be a binary string.

Hexadecimal Strings

A hexadecimal string is a literal string, expressed using a hexadecimal notation of its encoding. It is any sequence of zero or more hexadecimal digits (0-9, a-f, A-F), grouped in pairs. A single leading 0 is assumed, if necessary, at the beginning of the string to make an even number of hexadecimal digits. The groups of digits are optionally separated by one or more whitespace characters, and the whole sequence is delimited by single or double quotation marks and immediately followed by the symbol X or x. Neither x nor X can be part of a longer symbol. The whitespace characters, which can appear only at byte boundaries (and not at the beginning or end of the string), are to improve readability. The language processor ignores them.

A hexadecimal string is a literal string formed by packing the hexadecimal digits given. Packing the hexadecimal digits removes whitespace and converts each pair of hexadecimal digits into its equivalent character, for example, "41"X to A.

Hexadecimal strings let you include characters in a program even if you cannot directly enter the characters themselves. These are valid hexadecimal strings:

"ABCD"x
"1d ec f8"X
"1 d8"x

Note: A hexadecimal string is not a representation of a number. It is an escape mechanism that lets a user describe a character in terms of its encoding (and, therefore, is machine-dependent). In ASCII, "20"X is the encoding for a blank. In every case, a string of the form "....."x is an alternative to a straightforward string. In ASCII "41"x and "A" are identical, as are "20"x and a blank, and must be treated identically.

Implementation maximum: The packed length of a hexadecimal string (the string with whitespace removed) is unlimited.
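A minimal sketch of packing follows, using C2X to display the packed result. The variable names are illustrative, and the comparison with "A" holds only on ASCII implementations.

```rexx
letterA = "41"x
say letterA == "A"        -- displays 1 on ASCII implementations
packed = "1d ec f8"X      -- whitespace at byte boundaries is ignored
say length(packed)        -- displays 3: three packed characters
say c2x(packed)           -- displays 1DECF8
padded = "1 d8"x          -- odd digit count: a leading 0 is assumed
say c2x(padded)           -- displays 01D8
say length(""x)           -- displays 0: an empty hexadecimal string
```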

Binary Strings

A binary string is a literal string, expressed using a binary representation of its encoding. It is any sequence of zero or more binary digits (0 or 1) in groups of 8 (bytes) or 4 (nibbles). The first group can have fewer than four digits; in this case, up to three 0 digits are assumed to the left of the first digit, making a total of four digits. The groups of digits are optionally separated by one or more whitespace characters, and the whole sequence is delimited by matching single or double quotation marks and immediately followed by the symbol b or B. Neither b nor B can be part of a longer symbol. The whitespace characters, which can appear only at byte or nibble boundaries (and not at the beginning or end of the string), are to improve readability. The language processor ignores them.

A binary string is a literal string formed by packing the binary digits given. If the number of binary digits is not a multiple of 8, leading zeros are added on the left to make a multiple of 8 before packing. Binary strings allow you to specify characters explicitly, bit by bit. These are valid binary strings:

"11110000"b        /* == "f0"x                  */
"101 1101"b        /* == "5d"x                  */
"1"b               /* == "00000001"b and "01"x  */
"10000 10101010"b  /* == "0001 0000 1010 1010"b */
""b                /* == ""                     */

Implementation maximum: The packed length of a binary-literal string is unlimited.
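The padding and packing rules can be sketched by converting binary strings back to hexadecimal with C2X. The variable names are illustrative.

```rexx
nibbles = "101 1101"b          -- first group padded on the left to 0101
say c2x(nibbles)               -- displays 5D
onebit = "1"b                  -- padded to a full byte
say c2x(onebit)                -- displays 01
twobytes = "10000 10101010"b   -- 13 digits padded to 16
say c2x(twobytes)              -- displays 10AA
say length(twobytes)           -- displays 2
say length(""b)                -- displays 0
```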

Symbols

Symbols are groups of characters, selected from the:

  • English alphabetic characters (A-Z and a-z)

  • Numeric characters (0-9)

  • Characters . ! ? and underscore (_)

Any lowercase alphabetic character in a symbol is translated to uppercase (that is, lowercase a-z to uppercase A-Z) before use.

These are valid symbols:

Fred
Albert.Hall
WHERE?

If a symbol does not begin with a digit or a period, you can use it as a variable and can assign it a value. If you have not assigned a value to it, its value is the characters of the symbol itself, translated to uppercase (that is, lowercase a-z to uppercase A-Z). Symbols that begin with a digit or a period are constant symbols and cannot directly be assigned a value. (See Environment Symbols.)
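The following sketch illustrates uppercase translation and the default value of an unassigned symbol. The names fred, where?, and unset_name are illustrative, and the sketch assumes that the NOVALUE condition is not being trapped.

```rexx
say where?                 -- unassigned: displays WHERE?
fred = "hello"
say Fred FRED fred         -- one variable: displays hello hello hello
say symbol("fred")         -- displays VAR: the symbol names a variable
say symbol("unset_name")   -- displays LIT: no value has been assigned
say symbol("3rd")          -- displays LIT: a constant symbol
```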

One other form of symbol is allowed to support the representation of numbers in exponential format. The symbol starts with a digit (0-9) or a period, and it can end with the sequence E or e, followed immediately by an optional sign (- or +), followed immediately by one or more digits (which cannot be followed by any other symbol characters). The sign in this context is part of the symbol and is not an operator.

These are valid numbers in exponential notation:

17.3E-12
.03e+9

Numbers

Numbers are character strings consisting of one or more decimal digits, with an optional prefix of a plus (+) or minus (-) sign, and optionally including a single period (.) that represents a decimal point. A number can also have a power of 10 suffixed in conventional exponential notation: an E (uppercase or lowercase), followed optionally by a plus or minus sign, then followed by one or more decimal digits defining the power of 10. Whenever a character string is used as a number, rounding can occur to a precision specified by the NUMERIC DIGITS instruction (the default is nine digits). See Numbers and Arithmetic for a full definition of numbers.

Numbers can have leading whitespace (before and after the sign) and trailing whitespace. Whitespace characters cannot be embedded among the digits of a number or in the exponential part. Note that a symbol or a literal string can be a number. A number cannot be the name of a variable.

These are valid numbers:

12
"-17.9"
127.0650
73e+128
" + 7.9E5 "

You can specify numbers with or without quotation marks around them. Note that the sequence -17.9 (without quotation marks) in an expression is not simply a number. It is a minus operator (which can be prefix minus if no term is to the left of it) followed by a positive number. The result of the operation is a number, which might be rounded or reformatted into exponential form depending on the size of the number and the current NUMERIC DIGITS setting.
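A short sketch of these rules follows, using the DATATYPE built-in function to test whether a string is a number. The variable names are illustrative, and the first results assume the default NUMERIC DIGITS setting.

```rexx
padded = " + 7.9E5 "
say datatype(padded)       -- displays NUM: whitespace around the sign is allowed
say padded + 0             -- displays 790000
say datatype("1 2")        -- displays CHAR: whitespace among the digits
negative = -17.9           -- prefix minus applied to the number 17.9
say negative               -- displays -17.9
numeric digits 5
say 123456 + 0             -- displays 1.2346E+5: rounded to five digits
```

The last line shows a result being rounded and reformatted into exponential notation because it has more digits than the current NUMERIC DIGITS setting allows.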

A whole number is a number that has no decimal part and that the language processor would not usually express in exponential notation. That is, it has no more digits before the decimal point than the current setting of NUMERIC DIGITS.

Implementation maximum: The exponent of a number expressed in exponential notation can have up to nine digits.

Operator Characters

The characters + - \ / % * | & = ¬ > < and the sequences >= <= \> \< \= >< <> == \== // && || ** ¬> ¬< ¬= ¬== >> << >>= \<< ¬<< \>> ¬>> <<= indicate operations (see Operators). A few of these are also used in parsing templates, and the equal sign and the sequences +=, -=, *=, /=, %=, //=, ||=, &=, |=, and &&= are also used to indicate assignment. Whitespace characters adjacent to operator characters are removed. Therefore, the following are identical in meaning:

345>=123
345 >=123
345 >= 123
345 > = 123

Some of these characters (and some special characters—see the next section) might not be available in all character sets. In this case, appropriate translations can be used. In particular, the vertical bar (|) is often shown as a split vertical bar (¦).

Note: The Rexx interpreter uses ASCII character 124 in the concatenation operator and as the logical OR operator. Depending on the code page or keyboard for your particular country, ASCII 124 can be shown as a solid vertical bar (|) or a split vertical bar (¦). The character on the screen might not match the character engraved on the key. If you receive error 13, Invalid character in program, on an instruction including a vertical bar character, make sure this character is ASCII 124.

Throughout the language, the NOT (¬) character is synonymous with the backslash (\). You can use the two characters interchangeably according to availability and personal preference.

The Rexx interpreter recognizes both ASCII character 170 ('AA'X) and ASCII character 172 ('AC'X) for the logical NOT operator. Depending on your country, the ¬ might not appear on your keyboard. If the character is not available, you can use the backslash (\) in place of ¬.
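The operator rules in this section can be sketched as follows. The variable names are illustrative, and the example assumes an interpreter that supports the assignment sequences listed above.

```rexx
first  = 345>=123
second = 345 > = 123       -- whitespace between operator characters is removed
say first second           -- displays 1 1
say 1 \= 2                 -- backslash as NOT: displays 1
total = 10
total += 5                 -- same as total = total + 5
total ||= "!"              -- same as total = total || "!"
say total                  -- displays 15!
```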

Special Characters

The following characters, together with the operator characters, have special significance when found outside of literal strings:

,   ;   :   (   )   [   ]   ~

These characters constitute the set of special characters. They all act as token delimiters, and whitespace characters (blank or horizontal tab) adjacent to any of these are removed. There is an exception: a whitespace character adjacent to the outside of a parenthesis or bracket is deleted only if it is also adjacent to another special character (unless the character is a parenthesis or bracket and the whitespace character is outside it, too). For example, the language processor does not remove the blank in A (Z). This is a concatenation that is not equivalent to A(Z), a function call. The language processor removes the blanks in (A) + (Z) because this is equivalent to (A)+(Z).
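The difference between a function call and a concatenation can be sketched with the LENGTH built-in function. The variable name z is illustrative, and the sketch assumes that LENGTH has not been assigned a value as a variable.

```rexx
z = "abc"
say length(z)       -- function call: displays 3
say length (z)      -- blank retained: displays LENGTH abc
say (z) || (z)      -- blanks removed around the operator: displays abcabc
```

In the second line, LENGTH is an unassigned symbol, so its value is its own name, which is then concatenated with a blank and the value of z.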

Example

The following example shows how a clause is composed of tokens:

"REPEAT"   A + 3;

This example is composed of six tokens—a literal string ("REPEAT"), a blank operator, a symbol (A, which can have an assigned value), an operator (+), a second symbol (3, which is a number and a symbol), and the clause delimiter (;). The blanks between the A and the + and between the + and the 3 are removed. However, one of the blanks between the "REPEAT" and the A remains as an operator. Thus, this clause is treated as though written:

"REPEAT" A+3;

Implied Semicolons

The last element in a clause is the semicolon (;) delimiter. The language processor implies the semicolon at a line end, after certain keywords, and after a colon if it follows a single symbol. This means that you need to include semicolons only when there is more than one clause on a line or to end an instruction whose last character is a comma.

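A line end usually marks the end of a clause, so Rexx implies a semicolon at most line ends. The exceptions are a line that ends in the middle of a standard comment, in which case the clause continues on the next line, and a line whose last token is a continuation character (see Continuations).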

Rexx also implies semicolons after a colon that follows a single symbol or literal string (that is, a label) and after certain keywords when they are in the correct context. The keywords that have this effect are ELSE, OTHERWISE, and THEN. These special cases reduce typographical errors significantly.
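A minimal sketch of explicit and implied semicolons follows. The label name begin and the variable names are illustrative.

```rexx
x = 1; y = 2                     -- explicit semicolon: two clauses on one line
begin: say "x + y =" x + y       -- the colon after the label implies a semicolon
if x < y then say "x is less"    -- THEN implies a semicolon before SAY
else say "x is not less"         -- ELSE implies one as well
```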

Note: The two characters forming the comment delimiters, /* and */, must not be split by a line end (that is, / and * should not appear on different lines) because they could not then be recognized correctly; an implied semicolon would be added.

Continuations

One way to continue a clause on the next line is to use the comma or the minus sign (-), which is referred to as the continuation character. The continuation character is functionally replaced by a blank, and, thus, no semicolon is implied. One or more comments can follow the continuation character before the end of the line.
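The replacement of the continuation character by a blank can be sketched as follows. The variable names are illustrative, and the second assignment assumes an interpreter that accepts the minus sign as a continuation character, as described above.

```rexx
total = 1 + 2 +,           -- comma: the clause continues on the next line
        3 + 4
say total                  -- displays 10

message = "split" -        -- minus sign used as the continuation character
          "clause"
say message                -- displays split clause
```

Because the continuation character is replaced by a blank, the two literal strings in the second assignment are joined by the blank concatenation operator.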

The following example shows how to use the continuation character to continue a clause:

say "You can use a comma",       -- this line is continued
"to continue this clause."

or

say "You can use a minus"-       -- this line is continued
"to continue this clause."