A critical step in deserialization is a categorical action
taken as part of a binary file's deserialization procedure in BAR. The
action takes the form of a critical-step method of a
construct (a block or structure).
The full deserialization procedure is very complicated, but you
don't need to know the full procedure. BAR allows you to choose only
the critical steps that matter the most to you. There are so
many ways to read and interpret a data file that it is easiest to design an
I.F. around only the points that make a difference.
Of course, such versatility of the BAR engine means there may be more than one
way to construct the critical steps. This is the intent: if there
is more than one way, it gets you to your solution that much faster.
The critical-step methods of deserialization have the following
function names: Initialization, Validation, BlockSize, Termination,
OffsetTweak, and Deserialize.
Initialization
The Initialization method is called when BAR is considering
whether node data match a construct, but before there is any attempt at
actually validating the construct. This function is used to calculate
important quantities before any other methods are called on a
construct.
Definition of this method is optional. If the method is not defined, BAR
proceeds directly to the Validation method.
The Initialization method has return type void and has no
parameters. Examples of implementations:
//Saves a pointer to the node; this pointer can be referenced later.
void Initialization() { _lh_ptr = this; };
//Resets a global counter to zero; this counter can be used as a way to count
//iterations before terminating a repeating list.
void Initialization() { _count = 0; };
Validation
The Validation method provides a means of determining
whether or not a block or data structure is of a specific type.
BAR calls this method as a way of checking if the node data forms a "match"
with the presumed construct.
Definition of this method is optional. If the method is not defined,
BAR assumes a default validation condition: true.
Validation methods are invoked during deserialization of a file
in order to ensure file format compliance. Validation methods
also distinguish between constructs of different types in lists.
The Validation method has a Boolean return
value, with a true condition representing a matched
construct and false representing a mismatch.
The function has no parameters.
Examples of implementations:
bool Validation() {
    _bfOffBits = bfOffBits; //Saves state variable.
    //Returns true if the header starts with the two letters 'BM'.
    return (bfType == 'BM');
};
bool Validation() {
    _offset = 0; //Resets running offset calculation.
    //Returns false if the header starts with something other than the four
    //characters 'BAR1' or has a version number less than 1.
    if (*(long*)BAR_ID != 'BAR1' || version < 1) return false;
    return true; //Got past false conditions; return true condition.
};
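For readers who want to try the idea outside the I.F. language, the following is a minimal C++ sketch of a signature check in the spirit of the first example. The function name `has_bmp_signature` and the raw byte-buffer interface are assumptions made for illustration; they are not part of BAR.

```cpp
#include <cstddef>

// Hypothetical helper (not a BAR API): returns true when the buffer starts
// with the two ASCII letters 'B' and 'M', mirroring the bfType test above.
bool has_bmp_signature(const unsigned char* data, std::size_t size) {
    if (data == nullptr || size < 2) return false;  // too small to hold the signature
    return data[0] == 'B' && data[1] == 'M';
}
```

A false result here plays the same role as a false return from Validation: the data is simply not treated as a match for the construct.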
BlockSize
The BlockSize method calculates the size of a block, in bytes,
bits, or units. BAR calls this method to size a block.
For this reason, BlockSize is only used when deserializing
blocks. BlockSize is not used when deserializing data structures.
Definition of this method is optional. If the method is not defined,
the block size automatically defaults to the maximum available size.
A BlockSize method has a long integer return value,
representing the size calculated. The function has no
parameters.
BlockSize is called immediately after the Validation
critical step for blocks. If Validation had failed, BlockSize
will not be called--BAR never calls BlockSize on an
invalid node.
The return value unit depends on the type of block being sized:
- For unorganized blocks, the return value represents a unit count.
- For bit scan blocks, the return value represents a bit count.
- For organized blocks that are not bit scan blocks, the return value
  represents a byte count.
BlockSize methods are invoked during deserialization of a file
in order to calculate the size of a block. A BlockSize method
can be defined as a direct method of a block, or it can be
defined as an indirect method (a child of the block's header
structure). If a header structure has a
BlockSize method, this method is used to size the block--not
the header structure.
A BlockSize method always has a range of values with which to
work whenever it is called. The global variable maxblocksize
represents the largest possible return value, while the global
variable minblocksize represents the smallest possible
return value. A critical error results if
BlockSize returns a value outside the range of [minblocksize,
maxblocksize].
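The range rule can be modeled in a few lines of C++. This is only a sketch of the check as described here, not BAR's implementation; the function name `block_size_in_range` is an assumption, and the two bounds are passed as parameters rather than read from globals.

```cpp
// Hypothetical model (not BAR source): reports whether a BlockSize result
// would be accepted, i.e. whether it lies in [minblocksize, maxblocksize].
bool block_size_in_range(long size, long minblocksize, long maxblocksize) {
    return size >= minblocksize && size <= maxblocksize;
}
```

A BlockSize method that might compute an out-of-range value could apply the same comparison to its result before returning.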
BlockSize only serves to limit a block's size. It
cannot expand a block's size beyond its theoretical upper limits, and it cannot
return zero. Furthermore, the individual deserialization
nuances of a block's children might further limit a block's
size to a quantity lower than the value BlockSize might
return.
Examples of implementations:
//Return sum of two individual block portion sizes.
long BlockSize() { return (textblock_offset + textblock_size); };
//Return header-provided size, plus additional fields and aligned on a
//two-byte boundary.
long BlockSize() { return ((chunkSize + 8 + 1) & ~1); };
//Return field count scaled by the size of each field.
long BlockSize() { return (_num_relocation_items * 4); };
//Return half the maximum block size.
long BlockSize() { return (maxblocksize / 2); };
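The alignment arithmetic in the second example is easy to misread, so here is the same expression as a standalone C++ function with worked values. The name `aligned_chunk_size` is illustrative, and the sketch assumes a non-negative `chunkSize`.

```cpp
// Worked version of the second example: header-provided size plus 8 bytes of
// additional fields, rounded up to the next even byte count.
// Illustrative name, not a BAR function. Assumes chunkSize >= 0.
long aligned_chunk_size(long chunkSize) {
    // Adding 1 and then clearing bit 0 rounds an odd total up to even
    // and leaves an even total unchanged.
    return (chunkSize + 8 + 1) & ~1L;
}
// aligned_chunk_size(4) == 12   (4 + 8 = 12, already even)
// aligned_chunk_size(5) == 14   (5 + 8 = 13, rounded up to 14)
```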
Termination
The Termination method provides a means of determining when a
parent block’s node list has no more iterations
remaining. It is called after BlockSize.
Use the Termination method when there is no easy way to
size blocks ahead of actually processing their contents.
Repeating node lists can be used to capture a variable number
of children in an organized block, but these lists still need to be told when
to "stop." The Termination method informs a parent block
that this iteration is the last iteration.
A Termination method has a Boolean return
value, with true flagging the end of the list,
and false flagging the go-ahead to continue repeating the
list. The function has no parameters.
Termination methods are invoked during deserialization of a
file in order to find the last iteration of a node list. If the
method is not defined, BAR assumes a default termination condition:
false.
Examples of implementations:
//Have construct act as a de-facto terminator: wherever it appears, it
//terminates the list.
bool Termination() { return true; };
//Terminate only after a specific number of iterations.
bool Termination() { return (++_current_lump >= _num_lumps); };
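The counting pattern in the second example can be traced with a small C++ model. The struct `LumpCounter` and its members are hypothetical stand-ins for the global state the I.F. example relies on; nothing here is BAR code.

```cpp
// Hypothetical stand-in for the global state used by the second example.
struct LumpCounter {
    long current_lump = 0;  // iterations seen so far
    long num_lumps = 0;     // total iterations expected

    // Mirrors `return (++_current_lump >= _num_lumps);`:
    // returns true on the iteration that should be the last one.
    bool termination() { return ++current_lump >= num_lumps; }
};
```

With `num_lumps` set to 3, the first two calls return false and the third returns true, so the list would repeat exactly three times.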
OffsetTweak (advanced deserialization)
The OffsetTweak method calculates the new absolute position of
the parse cursor during deserialization. Offset
tweaking, as it is known in BAR, is an advanced concept that is
only designed to be used with nonlinear file formats.
If an OffsetTweak method is defined, it is called after BlockSize
and Termination have completed for the node, and after
deserialization is complete for all of the children of
the construct.
An OffsetTweak method has a long integer return value,
representing the new absolute position of the parse cursor. There are
also two parameters. The first parameter
is a long integer indicating the default absolute position
of the parse cursor; the second parameter is a Boolean
value indicating whether the method is invoked from a deserialization
operation (true) or serialization operation (false).
Because OffsetTweak is called by both deserialization and
serialization procedures, the second parameter is necessary to distinguish one
category of critical step from the other.
Offset tweaking is a very advanced way of conducting
deserialization, and should not be used unless absolutely necessary. When
the contents of a file are mapped in a nonlinear manner, OffsetTweak
methods are quite powerful: the constructs are linearly
organized while deserialized, but nonlinearly
organized while serialized.
If the method is not defined, the default absolute position of the parse
cursor is passed directly to the return value. The default
position is the absolute byte position just after the last byte of the
evaluated construct (forward-moving cursor with no bytes wasted). Unless
BAR is explicitly told to "jump," so to speak, the normal behavior is to
proceed with deserialization as if the file were linear.
An OffsetTweak method always has a range of values with
which to work whenever it is called. The global variable totalfilesize
represents the largest possible return value, while zero
represents the smallest possible return value. A critical
error results if OffsetTweak returns a value
outside the range of [0, totalfilesize].
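The default pass-through behavior and the range rule can be sketched in C++ as follows. Both function names are assumptions made for illustration, and `totalfilesize` is passed as a parameter rather than read from a global.

```cpp
// Default behavior when no OffsetTweak is defined: the default position is
// passed straight through. (Illustrative model, not BAR source.)
long default_offset_tweak(long old_offset, bool /*is_deserializing*/) {
    return old_offset;
}

// Range rule: a returned position is acceptable only inside [0, totalfilesize].
bool offset_in_range(long new_offset, long totalfilesize) {
    return new_offset >= 0 && new_offset <= totalfilesize;
}
```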
Example of implementation:
long OffsetTweak(long old_offset, bool is_deserializing) {
    _directory_offset += 16;
    _lump_size = lump_size;
    if (!is_deserializing) return old_offset; //Serialization unsupported
    if (_lump_size == 0) {
        //Don't move parse cursor if at end
        if (_current_lump >= _num_lumps) return old_offset;
        return _directory_offset; //Move parse cursor to next directory entry
    }
    else {
        return lump_offset; //Move parse cursor to next lump
    }
};
Deserialize (advanced deserialization)
The Deserialize method performs translation of
a serialized chunk of data to its deserialized equivalent.
A Deserialize method returns a
pointer or reference to the construct in its deserialized
form. The function has no parameters. The size and
type of the deserialized construct need not match the size and type of the
serialized chunk of data. You have a node going in, and just about any
node going out.
Deserialize methods are invoked immediately after the OffsetTweak
critical step, if it exists. However, it is only necessary to
actually define a Deserialize method if the serialized
form of the data is of a very convoluted nature, such as a compressed
and/or encrypted bit stream.
If the method is not defined, the contents of the construct undergo no
changes as part of deserialization. In other words, you get
as output exactly the node processed as part of all previous critical-step
methods.
What makes Deserialize especially powerful is the prospect of
creating many from one, or one from many.
You can translate an organized block with many children and sub-children
into just a simple data structure. Conversely, you can
translate just a few bytes into a massive organized block
with many "tree branches," all within the body of a single
function.
Examples of implementations:
void *Deserialize() {
    char *oldbuffer = (char*)this;
    long count = totalfilesize;
    long final_size = totalfilesize;
    char *newbuffer;
    char *newcursor;
    while (count) {
        //Each byte equal to 13 (carriage return) shrinks the output by one.
        if (*oldbuffer == 13) --final_size;
        ++oldbuffer;
        --count;
    }
    ...
};
void *Deserialize() {
    ...
    switch (composite_choice) {
        case CTYPE_CHAR:
        case CTYPE_UCHAR:
            newptr = new chardata[total_result_size];
            if (!newptr) { m_Last_Error = alloc_error_str; return 0; }
            break;
        case CTYPE_CHAR_BE:
        ...
};
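The visible part of the first example computes an output size by subtracting one for every byte equal to 13 (carriage return). The rest of that method is elided, so the following C++ sketch rests on an assumption: that the intent is to copy every byte except carriage returns into a new buffer. The function `strip_carriage_returns` is a standalone illustration and does not use any BAR types.

```cpp
#include <string>

// Assumption: the translation drops carriage returns (byte value 13) and
// keeps everything else. Standalone illustration, not BAR code.
std::string strip_carriage_returns(const std::string& serialized) {
    std::string deserialized;
    deserialized.reserve(serialized.size());  // output is never larger than input
    for (char byte : serialized) {
        if (byte != 13) deserialized.push_back(byte);  // drop CR, keep all else
    }
    return deserialized;
}
```

The sketch also shows the point made earlier in this section: the deserialized form may be smaller than the serialized chunk it came from.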
Rules governing critical steps
It is quite unusual to have a construct declared in an I.F. with every
critical-step method defined. The usage of critical steps depends on
the need for a particular type of calculation at a given point in
deserialization.
Still, an I.F. designer might become confused as to what is "actually happening"
during the deserialization procedure if one or two critical steps do not appear
to work as expected. The following rules should help the user understand
how critical steps fit together.
General order for critical steps is the following:
- Minimum size test (exit early if too small)
- Initialization (exit early if critical error)
- Validation (exit early if invalid)
- BlockSize (exit early if critical error; only called when deserializing
  blocks)
- Termination (exit early if critical error)
- Organized block body deserialization (exit early if body is invalid; only
  called when deserializing organized blocks)
- OffsetTweak (exit early if critical error)
- Deserialize (exit early if critical error)
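The ordering above can be traced with a small C++ model that records which steps would run for a given construct. Everything in it (the `StepOutcomes` flags, the step labels, the function `steps_run`) is hypothetical and exists only to illustrate the sequence; the critical-error exits are left out to keep the sketch short.

```cpp
#include <string>
#include <vector>

// Hypothetical outcome flags for one construct. None of these names come from BAR.
struct StepOutcomes {
    bool large_enough = true;  // minimum size test passed
    bool valid = true;         // Validation returned true
    bool is_block = true;      // BlockSize applies to blocks only
    bool is_organized = true;  // body deserialization applies to organized blocks only
    bool body_valid = true;    // organized block body deserialized cleanly
};

// Returns the names of the steps that run, in order, stopping at the first exit.
std::vector<std::string> steps_run(const StepOutcomes& o) {
    std::vector<std::string> ran;
    ran.push_back("MinimumSizeTest");
    if (!o.large_enough) return ran;   // too small: no critical steps run
    ran.push_back("Initialization");
    ran.push_back("Validation");
    if (!o.valid) return ran;          // invalid: stop before BlockSize
    if (o.is_block) ran.push_back("BlockSize");
    ran.push_back("Termination");
    if (o.is_block && o.is_organized) {
        ran.push_back("Body");
        if (!o.body_valid) return ran; // invalid body: stop before OffsetTweak
    }
    ran.push_back("OffsetTweak");
    ran.push_back("Deserialize");
    return ran;
}
```

In this model a valid organized block passes through all eight entries, while a construct that fails Validation stops after three.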
If an organized block and its header structure both have the same
critical-step method (one direct declaration and one indirect declaration):
- Initialization for block is called first; Initialization for header
  structure is called after BlockSize for block.
- Validation for block is called first; Validation for header structure is
  called after BlockSize for block.
- BlockSize for block is called first; BlockSize for header structure is
  called after BlockSize for block.
- Termination for header structure is called first; Termination for block
  is called immediately afterwards.
- OffsetTweak is only called for header structure. It is not called for block.
- Deserialize is only called for header structure. It is not called for block.
Minimum size test:
The minimum size test is a special pre-deserialization step
for each construct. An unorganized block has a minimum
size of 1 unit, a data structure has a
minimum size (bits or bytes) equalling its fixed size as
declared in the I.F., and an organized block has a minimum
size of 1 byte or the size of its header structure,
if one exists, whichever one is larger. Failure to satisfy the minimum
size test renders the construct invalid before any critical steps are
executed.
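The three minimum-size rules can be written out as a short C++ function. The enum `ConstructKind` and the function `minimum_size` are illustrative names, and the sizes are passed in directly rather than taken from an I.F. declaration.

```cpp
#include <algorithm>

// Illustrative construct categories; the enum and function are not BAR names.
enum class ConstructKind { UnorganizedBlock, DataStructure, OrganizedBlock };

// Returns the minimum size described above.
// fixed_size:  declared size of a data structure (bits or bytes).
// header_size: size of an organized block's header structure, 0 if none.
long minimum_size(ConstructKind kind, long fixed_size, long header_size) {
    switch (kind) {
        case ConstructKind::UnorganizedBlock: return 1;           // 1 unit
        case ConstructKind::DataStructure:    return fixed_size;  // declared size
        case ConstructKind::OrganizedBlock:   return std::max(1L, header_size);
    }
    return 0;  // not reached for valid enum values
}
```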
Organized block body deserialization varies depending on the construct type
selected as the body:
- Data structure: Just one data structure composes the body. This is
  deserialized just after the Termination critical step of the parent
  organized block.
- Unorganized block: Just one unorganized block composes the body. This is
  deserialized just after the Termination critical step of the parent
  organized block.
- Organized block: Just one organized block composes the body. This is
  deserialized just after the Termination critical step of the parent
  organized block.
- Decision list: A choice between possible alternatives.
    - BAR tries to validate the first construct choice in the list,
      accepting it as the block body if it is valid. If it is not valid, BAR
      tries the second choice in the list, and so on.
    - If BAR has gone through the entire list without finding a valid
      construct, and the decision list is optional, the block body is left
      empty.
    - If BAR has gone through the entire list without finding a valid
      construct, and the decision list is NOT optional, the parent block is
      rendered invalid.
- Node list: A sequence of individual child constructs to compose the
  organized block's node children. The children can be blocks, structures,
  or decision lists.
    - If any list member other than an optional decision list is invalid,
      the parent block is rendered invalid.
    - If the list does not repeat, the block body is composed of just a
      single node list iteration.
    - If the list does repeat, the block body can be composed of one or more
      iterations of the node list.
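The decision-list rules above amount to a first-match search with two possible fallbacks. A minimal C++ sketch follows, assuming each choice can be represented by a callable that reports whether it validates; `DecisionResult` and `resolve_decision_list` are hypothetical names, not BAR constructs.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Result of resolving a decision list. Illustrative names only.
struct DecisionResult {
    int chosen_index;   // index of the accepted choice, or -1 if none
    bool parent_valid;  // false when a required list finds no valid choice
};

// Tries each choice's validator in order and accepts the first that passes.
DecisionResult resolve_decision_list(
        const std::vector<std::function<bool()>>& validators, bool optional) {
    for (std::size_t i = 0; i < validators.size(); ++i) {
        if (validators[i]()) return {static_cast<int>(i), true};
    }
    // No valid choice: an optional list leaves the body empty, while a
    // required list invalidates the parent block.
    return {-1, optional};
}
```

Because the search stops at the first valid choice, the order in which choices are declared matters whenever more than one could match the same data.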
A repeating node list terminates under the following circumstances:
- Maximum space for block body is exhausted. A BlockSize method limits the
  size of a block, and if the last iteration of a node list takes up all
  possible space, the node list terminates by default.
- A Termination routine returned true for a child. A Termination method in
  one of the parent block's children flags the iteration as the last one.
  This will further limit a block size even if it was explicitly assigned
  via a BlockSize method.
- The node list auto-terminates in the event of an invalid iteration. The
  "autoterminate" attribute for a node list automatically terminates a list
  if a single invalid iteration is encountered. In the event of such an
  invalid iteration, the partial iteration is discarded, leaving only valid
  iterations.
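The three termination conditions can be combined into one loop. The C++ sketch below is a simplified model rather than BAR's algorithm: each attempted iteration is reduced to a size, a validity flag, and a termination flag, and an iteration that would not fit in the remaining space is treated as invalid, which is an assumption. All names are hypothetical.

```cpp
#include <vector>

// One attempted iteration of a repeating node list. Illustrative model only.
struct Iteration {
    long size;        // bytes the iteration would consume
    bool valid;       // whether every list member validated
    bool terminates;  // whether a child's Termination returned true
};

// Returns how many iterations are kept, or -1 if the parent block is invalid.
long count_iterations(const std::vector<Iteration>& attempts,
                      long max_body_size, bool autoterminate) {
    long used = 0;
    long kept = 0;
    for (const Iteration& it : attempts) {
        // Assumption: an iteration that does not fit counts as invalid.
        if (!it.valid || used + it.size > max_body_size) {
            // autoterminate discards the bad iteration and keeps earlier ones,
            // but only if at least one valid iteration already exists.
            return (autoterminate && kept > 0) ? kept : -1;
        }
        used += it.size;
        ++kept;
        if (it.terminates) break;          // a child flagged the last iteration
        if (used == max_body_size) break;  // space for the body is exhausted
    }
    return kept > 0 ? kept : -1;  // a list of zero iterations is not allowed
}
```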
Block minimum size considerations:
- Unorganized blocks must have at least one unit. You cannot size an
  unorganized block to zero units as a way to declare it invalid; you must
  explicitly flag its invalid condition with the Validation function.
- Organized blocks can have a zero-length body only if the block has a
  header structure. A zero-length body for a block without a header
  structure yields a zero-length parent organized block, which is not
  allowed. Instead of sizing an organized block to zero bytes, you should
  explicitly flag its invalid condition with the Validation function.
- The "autoterminate" attribute only works for node lists with at least one
  iteration. You cannot use "autoterminate" to create a node list of zero
  iterations.