UBF(A)
is the data transport encoding for Armstrong's
Universal Binary Format.
It provides four primitive types: atoms (symbolic constants), integers, strings, and binary data.
There are two compound types: fixed-length tuples and variable-length lists.
Ubfa(2)
provides basic support in Limbo for reading and writing streams of UBF(A)-encoded data.
The
input
syntax is defined by the following rules:
input ::= item* '$'
item ::= integer | atom | string | binary | tuple | list | store | push | comment | tag
integer ::= '-'?[0-9]+
atom ::= "'" ([^\'] | '\\' | "\'")* "'"
string ::= '"' ([^\"] | '\\' | '\"')* '"'
binary ::= '~' byte* '~' # preceded by integer byte count
tuple ::= '{' item* '}'
list ::= '#' (item '&')*
store ::= '>' reg
push ::= reg
reg ::= [^-%"~'`{}#& \n\r\t,0-9]
comment ::= '%' ([^\%] | '\\' | '\%')* '%'
tag ::= '`' ([^\`] | '\\' | '\`')* '`'
White space is any sequence of blank, tab, newline or carriage-return characters, and can appear
before or after any instance of
item
in the grammar.
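
The atom, string, comment and tag rules share one quoting convention: the item runs from its opening delimiter to the next unescaped copy of that delimiter, and inside it only a doubled backslash or a backslash followed by the delimiter is allowed. The following Python sketch illustrates that reading of the grammar. The name read_quoted is hypothetical and is not part of ubfa(2); rejecting every other backslash sequence is an assumption based on a strict reading of the rules above.

```python
def read_quoted(data, pos):
    """Scan one quoted item (atom, string, comment or tag).

    data[pos] must be the opening delimiter. Returns a pair:
    (contents with escapes resolved, index just past the closing delimiter).
    """
    delim = data[pos]
    if delim not in b"'\"%`":
        raise ValueError("not a quoted item")
    out = bytearray()
    i = pos + 1
    while i < len(data):
        c = data[i]
        if c == delim:                      # unescaped delimiter ends the item
            return bytes(out), i + 1
        if c == 0x5C:                       # backslash starts an escape
            if i + 1 >= len(data):
                break
            nxt = data[i + 1]
            if nxt != 0x5C and nxt != delim:
                # Assumption: only \\ and \<delimiter> are legal escapes.
                raise ValueError("unsupported escape")
            out.append(nxt)
            i += 2
            continue
        out.append(c)
        i += 1
    raise ValueError("unterminated quoted item")
```

The same function serves all four delimiters because the rules differ only in the delimiter byte.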
The
input
data is interpreted by a simple virtual machine.
The machine contains a stack of values of primitive and compound types, and a set of registers also containing
values of those types.
White space and comments are ignored.
Primitive
integer,
atom
and
string
values are pushed onto the stack as they are recognised.
Certain input bytes outside any value act as operators:
- {
- Note the current stack depth.
- }
- Pop stack values to restore the most recently noted stack depth.
Push a single value
representing a tuple of those items; the left-most value in the tuple is the last one popped
(the first in the original input stream).
- ~
- Pop an integer value
n
from the stack.
Read
n
bytes from the input stream and push a value onto the stack that represents them.
The next byte must be the character
~,
which is discarded.
- #
- Push a value representing an empty list onto the stack.
- &
- Pop a value
v.
Pop another value
l,
which must represent a list.
Push a value that represents
the list
v::l.
(Note that the items in a
list
therefore appear in reverse order in the input stream.)
- >reg
- Pop the top value from the stack and store it in a register labelled by the byte
reg.
- reg
- Push the value of register
reg
(which must be non-null) onto the stack.
- tag
- Associate the tag string with the value on top of the stack.
The
ubfa(2)
implementation does so by replacing that value with a special
Tag
tuple.
- $
- End-of-input: there must be exactly one value on the stack,
which is the result.
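
The operators above can be sketched as a small stack machine. The Python below is an illustration only, not the ubfa(2) implementation: the names decode, Atom and Tagged are hypothetical, strings are assumed to be UTF-8, a backslash is handled leniently (the next byte is taken literally), and a byte that the reg rule excludes but no other rule covers (such as a comma) is rejected. Malformed input that underflows the stack surfaces as IndexError rather than a tidy diagnostic.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Atom:
    name: str

@dataclass(frozen=True)
class Tagged:              # stands in for the special Tag value
    tag: str
    value: object

WHITE = b" \t\n\r"
NOT_REG = b"-%\"~'`{}#& \n\r\t,0123456789"   # bytes excluded by the reg rule

def _quoted(data, i):
    """Scan a quoted item whose opening delimiter is data[i]."""
    delim, out, i = data[i], bytearray(), i + 1
    while i < len(data) and data[i] != delim:
        if data[i] == 0x5C:                  # backslash: take next byte literally
            i += 1
            if i >= len(data):
                break
        out.append(data[i])
        i += 1
    if i >= len(data):
        raise ValueError("unterminated quoted item")
    return out.decode("utf-8"), i + 1

def decode(data):
    """Run the machine over data and return the single result value."""
    stack, marks, regs = [], [], {}
    i = 0
    while i < len(data):
        c = data[i:i + 1]
        if c in WHITE:
            i += 1
        elif c == b"%":                      # comment: scanned and ignored
            _, i = _quoted(data, i)
        elif c == b"'":
            s, i = _quoted(data, i)
            stack.append(Atom(s))
        elif c == b'"':
            s, i = _quoted(data, i)
            stack.append(s)
        elif c == b"`":                      # tag applies to the top of stack
            s, i = _quoted(data, i)
            stack.append(Tagged(s, stack.pop()))
        elif c == b"-" or c.isdigit():
            j = i + 1
            while j < len(data) and data[j:j + 1].isdigit():
                j += 1
            stack.append(int(data[i:j].decode("ascii")))
            i = j
        elif c == b"{":                      # note the current stack depth
            marks.append(len(stack))
            i += 1
        elif c == b"}":                      # collect items above the noted depth
            depth = marks.pop()
            items = tuple(stack[depth:])     # input order, left-most first
            del stack[depth:]
            stack.append(items)
            i += 1
        elif c == b"~":                      # binary: byte count is on the stack
            n = stack.pop()
            if not isinstance(n, int) or n < 0:
                raise ValueError("binary item needs a byte count")
            blob = data[i + 1:i + 1 + n]
            if len(blob) != n or data[i + 1 + n:i + 2 + n] != b"~":
                raise ValueError("bad binary item")
            stack.append(blob)
            i += n + 2
        elif c == b"#":
            stack.append([])
            i += 1
        elif c == b"&":                      # push v::l
            v = stack.pop()
            l = stack.pop()
            if not isinstance(l, list):
                raise ValueError("& needs a list below the top value")
            stack.append([v] + l)
            i += 1
        elif c == b">":
            reg = data[i + 1:i + 2]
            if not reg or reg in NOT_REG:
                raise ValueError("bad register name")
            regs[reg] = stack.pop()
            i += 2
        elif c == b"$":
            if len(stack) != 1:
                raise ValueError("expected exactly one value at $")
            return stack[0]
        else:                                # any other byte names a register
            if c not in regs:
                raise ValueError("unset register or unexpected byte")
            stack.append(regs[c])
            i += 1
    raise ValueError("input ended before $")
```

With this sketch, decode(b"# 1 & 2 & 3 & $") yields the list [3, 2, 1], which shows the reversal noted under the & operator, and decode(b'"hi" >x {x x} $') yields a tuple holding the stored string twice.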
Applications using UBF(A) typically take turns to exchange
input
values on a communication channel.
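
The sending side of such an exchange has to produce one complete input value, ending in $, for each turn. A minimal encoder sketch in Python follows. The names encode, encode_item and Atom are hypothetical, UTF-8 is assumed for atoms and strings, and registers, tags and comments are not generated. List items are written in reverse so that the receiving machine rebuilds the list in its original order.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Atom:
    name: str

def _quote(text, delim):
    """Wrap text in delim, escaping backslashes and embedded delimiters."""
    body = text.replace("\\", "\\\\").replace(delim, "\\" + delim)
    return (delim + body + delim).encode("utf-8")

def encode_item(v):
    """Encode one Python value as a UBF(A) item (without the final $)."""
    if isinstance(v, bool):
        raise TypeError("no boolean type in this sketch")
    if isinstance(v, int):
        return str(v).encode("ascii")
    if isinstance(v, Atom):
        return _quote(v.name, "'")
    if isinstance(v, str):
        return _quote(v, '"')
    if isinstance(v, (bytes, bytearray)):
        # Byte count first, then the raw bytes between ~ characters.
        return str(len(v)).encode("ascii") + b"~" + bytes(v) + b"~"
    if isinstance(v, tuple):
        return b"{" + b" ".join(encode_item(x) for x in v) + b"}"
    if isinstance(v, list):
        # Each & conses onto the front, so emit the last element first.
        return b"#" + b"".join(b" " + encode_item(x) + b"&" for x in reversed(v))
    raise TypeError("cannot encode " + type(v).__name__)

def encode(v):
    """Encode v as a complete input value terminated by $."""
    return encode_item(v) + b" $"
```

For example, encode([1, 2, 3]) produces the bytes "# 3& 2& 1& $", and encode((Atom("point"), 3, -4)) produces "{'point' 3 -4} $".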