BARfly Help - BAR Implementation File Reference - Fundamental Components

  String Literals

String literals are sequences of ASCII characters delimited by single quotes, double quotes, or CDATA sections.  Unicode characters are not supported.

String ::= NullTermString | SmallString | CDATAString

The three forms fall into two types of string literals:  null-terminated string literals (double-quoted strings and CDATA sections) and non-null-terminated string literals (single-quoted strings).

Null-terminated String Literals

NullTermStringPortion ::= '"' ([#32-#33] | [#35-#91] | [#93-#126] | EscapedCharacter)* '"'
NullTermString ::= (NullTermStringPortion S?)+

The literal is composed of all characters listed in consecutive chunks of double-quote-bounded character sequences; a trailing null byte is appended automatically.

In general, a single double-quote-bounded character sequence is enough, but several can be used if a long string will not fit on a single line.  Only whitespace can occur between consecutive chunks for the entire series to be considered a single string; preprocessor directives, remarks, and other punctuators are not allowed.

A null-terminated string literal cannot contain more than 512 characters.
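The chunk-joining rule and the length limit can be illustrated with a short Python sketch.  It is not the BAR compiler's implementation: the function name parse_null_term_string is hypothetical, whitespace (S) is assumed to mean space, tab, CR, and LF, the 512-character limit is assumed to exclude the appended null byte, and escape sequences are rejected rather than decoded to keep the example short.

```python
# Hypothetical sketch, not the BAR compiler: join "..." chunks into one value.
WHITESPACE = " \t\r\n"  # assumed meaning of S in the grammar
MAX_CHARS = 512         # documented limit; assumed to exclude the null byte


def _allowed(ch: str) -> bool:
    """True for characters permitted verbatim between double quotes."""
    code = ord(ch)
    return 32 <= code <= 126 and code not in (34, 92)  # no raw '"' or '\'


def parse_null_term_string(text: str) -> bytes:
    """Concatenate every double-quoted chunk in text and append a null byte.

    Only whitespace may separate chunks.  Escape sequences are not decoded;
    a backslash is rejected to keep the sketch short.
    """
    chunks = []
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        if ch in WHITESPACE:
            i += 1
            continue
        if ch != '"':
            raise ValueError(f"only whitespace may separate chunks, found {ch!r}")
        end = text.find('"', i + 1)
        if end == -1:
            raise ValueError("unterminated chunk")
        body = text[i + 1:end]
        for c in body:
            if c == "\\":
                raise ValueError("escape sequences are not handled in this sketch")
            if not _allowed(c):
                raise ValueError(f"character {ord(c)} is not allowed in a literal")
        chunks.append(body)
        i = end + 1
    if not chunks:
        raise ValueError("no double-quoted chunk found")
    value = "".join(chunks)
    if len(value) > MAX_CHARS:
        raise ValueError(f"literal exceeds {MAX_CHARS} characters")
    return value.encode("ascii") + b"\0"
```

For example, '"Hello, "' followed by a line break and '"world"' yields b'Hello, world\x00', while any non-whitespace character between the chunks raises ValueError.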

Non-null-terminated String Literals (Small Strings)

SmallString ::= "'" ([#32-#38] | [#40-#91] | [#93-#126] | EscapedCharacter)* "'"

The literal is composed of all characters listed in a single-quote-bounded character sequence.  A null byte is not appended to the string automatically.

A non-null-terminated string literal cannot contain more than 10 characters.  Because this kind of literal is often used to give the numerical representation of characters, which requires only 1 to 8 characters, it is sometimes called a “small string.”
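A hypothetical Python sketch can check a small string against the SmallString production and the 10-character limit.  Unlike a null-terminated literal, the result has no null byte added.  The function name parse_small_string is illustrative only, and escape sequences are rejected rather than decoded to keep the example short.

```python
# Hypothetical sketch, not the BAR compiler: validate a '...' small string.
MAX_SMALL = 10  # documented limit for non-null-terminated literals


def parse_small_string(text: str) -> bytes:
    """Return the characters between single quotes, with no null appended.

    Escape sequences are not decoded here; a backslash is rejected.
    """
    if len(text) < 2 or text[0] != "'" or text[-1] != "'":
        raise ValueError("small string must be enclosed in single quotes")
    body = text[1:-1]
    for c in body:
        code = ord(c)
        if c == "\\":
            raise ValueError("escape sequences are not handled in this sketch")
        # Allowed ranges: #32-#38, #40-#91, #93-#126 (no raw quote or backslash).
        if not 32 <= code <= 126 or code == 39:
            raise ValueError(f"character {code} is not allowed in a small string")
    if len(body) > MAX_SMALL:
        raise ValueError(f"small string exceeds {MAX_SMALL} characters")
    return body.encode("ascii")  # note: no trailing null byte
```

For example, 'TEXT' yields the four bytes b'TEXT', and an 11-character body raises ValueError.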

Escaped Characters

EscapedCharacter ::= '\' ('a' | 'b' | 't' | 'n' | 'v' | 'f' | 'r' | '"' | "'" | '\' | ('x' HexDigit+) | ([0-9]+) | (#13? #10))

There is often a need to fill string literals with unprintable characters or characters that cause problems in compilation if they appear verbatim.  For this reason, escaped characters are supported by the BAR compiler to allow any ASCII character to be a part of a string literal.

The following are recognized escaped characters:

  • \a is character 7 (bell).
  • \b is character 8 (backspace).
  • \t is character 9 (horizontal tab).
  • \n is character 10 (newline or line feed).
  • \v is character 11 (vertical tab).
  • \f is character 12 (form feed).
  • \r is character 13 (carriage return).
  • \" is character 34 (double-quote character).
  • \' is character 39 (single-quote character).
  • \\ is character 92 (backslash character).
  • \x(nnn) is a custom character specified by ASCII code. The value of “nnn” is a hexadecimal number representing the code.
  • \(nnn) is a custom character specified by ASCII code. The value of “nnn” is an octal number representing the code.
  • \(line break) continues the string at the first non-whitespace character after the line break (LF or CR LF).
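The escape rules can be summarized as a small Python decoder.  This is a sketch under stated assumptions, not the BAR compiler's code: decode_escapes is a hypothetical name; every character code is assumed to fit in a single byte (0 to 255); \x consumes every following hex digit, as HexDigit+ in the grammar implies; and, because the grammar writes [0-9]+ while the list above describes the value as octal, the sketch follows the prose and consumes only the digits 0 to 7.

```python
# Hypothetical sketch of BAR escape-sequence decoding; not the compiler's code.
SIMPLE_ESCAPES = {
    "a": 7, "b": 8, "t": 9, "n": 10, "v": 11, "f": 12, "r": 13,
    '"': 34, "'": 39, "\\": 92,
}
HEX_DIGITS = "0123456789abcdefABCDEF"
OCT_DIGITS = "01234567"  # prose says octal; the grammar itself writes [0-9]+
WHITESPACE = " \t\r\n"


def _append_code(out: bytearray, value: int) -> None:
    """Append one character code, assuming codes must fit in a single byte."""
    if not 0 <= value <= 255:
        raise ValueError(f"character code {value} does not fit in one byte")
    out.append(value)


def decode_escapes(body: str) -> bytes:
    """Decode the text between a literal's delimiters into byte values."""
    out = bytearray()
    i, n = 0, len(body)
    while i < n:
        ch = body[i]
        if ch != "\\":
            _append_code(out, ord(ch))  # ordinary character, copied as-is
            i += 1
            continue
        i += 1  # step past the backslash
        if i >= n:
            raise ValueError("dangling backslash at end of literal")
        ch = body[i]
        if ch in SIMPLE_ESCAPES:
            _append_code(out, SIMPLE_ESCAPES[ch])
            i += 1
        elif ch == "x":
            j = i + 1
            while j < n and body[j] in HEX_DIGITS:  # HexDigit+ is greedy
                j += 1
            if j == i + 1:
                raise ValueError("\\x must be followed by at least one hex digit")
            _append_code(out, int(body[i + 1:j], 16))
            i = j
        elif ch in OCT_DIGITS:
            j = i
            while j < n and body[j] in OCT_DIGITS:
                j += 1
            _append_code(out, int(body[i:j], 8))
            i = j
        elif ch == "\n" or body.startswith("\r\n", i):
            i += 1 if ch == "\n" else 2  # consume LF or CR LF
            while i < n and body[i] in WHITESPACE:
                i += 1  # resume at the first non-whitespace character
        else:
            raise ValueError(f"unknown escape sequence \\{ch}")
    return bytes(out)
```

With these rules, the text \x41\102 decodes to the two bytes "AB" (hex 41 and octal 102 are both ASCII letters), and a backslash at the end of a line joins that line to the next one without adding any characters.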

CDATA Sections

CDATAString ::= '<![CDATA[' ^']]>'* ']]>'

A CDATA section, borrowed from SGML syntax, represents a string in which nothing between the start and end of the declaration is treated as markup: every character, including tabs and line breaks, is taken verbatim.  The only character combination not allowed inside a CDATA section is the three-character closing combination, ']]>'.

Use CDATA sections to define long multi-line strings, such as blocks of documentation.  A CDATA section can define a string of up to 16,383 characters.

The compiler appends a null byte to the end of the string (CDATA sections are always null-terminated).
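A CDATA string can be extracted by locating the first ']]>' after the opening delimiter.  In this hypothetical Python sketch, parse_cdata returns the null-terminated value together with the index just past the closing delimiter.  It assumes that "verbatim" means backslashes are kept literally rather than decoded, and that the 16,383-character limit excludes the null byte.

```python
# Hypothetical sketch, not the BAR compiler: extract a CDATA string.
CDATA_OPEN = "<![CDATA["
CDATA_CLOSE = "]]>"
MAX_CDATA = 16_383  # documented limit; assumed to exclude the null byte


def parse_cdata(text: str, start: int = 0) -> tuple[bytes, int]:
    """Return (value, index just past ']]>') for a CDATA section at start.

    Everything between the delimiters is kept verbatim: no escapes are
    decoded, and tabs and line breaks are preserved.
    """
    if not text.startswith(CDATA_OPEN, start):
        raise ValueError("expected '<![CDATA['")
    body_start = start + len(CDATA_OPEN)
    end = text.find(CDATA_CLOSE, body_start)  # first ']]>' ends the section
    if end == -1:
        raise ValueError("unterminated CDATA section")
    body = text[body_start:end]
    if len(body) > MAX_CDATA:
        raise ValueError(f"CDATA string exceeds {MAX_CDATA} characters")
    # encode("ascii") raises (a ValueError subclass) on non-ASCII input.
    return body.encode("ascii") + b"\0", end + len(CDATA_CLOSE)
```

Because the search stops at the first ']]>', that sequence cannot appear inside the section, matching the rule stated above.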


  See also:  [Punctuators] [Operators] [Keywords]
[Identifiers] [Numbers] [String literals] [Remarks]
[Preprocessor directives] [Whitespace] [Unrecognized characters]


BARfly Help Copyright © 2009 Christopher Allen