Archive for the ‘BAR Design Strategies’ Category

Software Development Case Study!

Wednesday, September 30th, 2009

Now here’s something software developers will find very useful! The following is a case study on how to use BARfly to design and implement levels for a video game.

I’ve already got several video game file types supported here. In particular, WAD (for Doom), NIB and NGR (for Nibbler), and a special Kroz level extractor from source code, which I’ve mentioned in previous entries.

But what if your level designs are more complex than linear? Whether it’s in binary or text format, many developers can get by with just a linear file format. A grid speaks for itself, with or without run-length encoding, and objects can be spot-placed with coordinate pairs (or triplets). Unfortunately, many games have trickier design formats. Like these:

  1. Doom. This and other FPS games geometrically organize the points, lines, and textures. But when it comes to linking all these objects together, you’ll need a system to “compile” them into something usable before a game can be played.
  2. Abuse. Not as well-known as Doom, but still worth a mention. Abuse is a very curious beast because LISP is used to drive almost all parts of the gameplay. What is LISP? It’s a linked-list based language, one you’re likely to encounter only in grad-level computer science. Each object, each enemy, etc. has one or more “links” to another object or enemy. Pretty open-ended!
  3. Diablo. This game, and its sequel, Diablo II, have dynamic level design. This means you don’t just have a level grid or spot-placements of objects: you have patterns, probabilities, and schemes determining what a level “might” look like. Levels are mixed in a similar fashion to Nethack or Rogue, which are text-based predecessor games that influenced Diablo’s creators significantly.

Questions to ask: how do you store links and relationships between objects? How do such relationships shape the data structures and storage formats themselves? How much work must be done in the construction of the levels versus the game engine’s actual setup process?

For credibility’s sake, I have no choice but to answer the above questions myself. I’m coming out with a new game soon, called Brian’s Journey. This uses a new game engine, which, at the core, is just a platform-independent graphical wrapper around the BAR engine. To design a level in this game, you’ll need two files: a room definition file and a shape interpretation file.

Text or binary? How about both? This is a profound “gotcha” for game developers. People care immensely about the end product (the game), but the path to the end product can be impossibly bumpy. Do you make text files, which means you use just a text editor, saving time, but perhaps making visualization harder? And making a tricky text-to-binary conversion part of the engine? Or do you pack everything in binary, forcing you to construct a complex editor that few people might ever use, and even fewer like using? Giving you the advantage, perhaps, of less pre-map processing when the level is actually played?

Thanks to BARfly, you can actually have both. Here’s what the shape interpretation file looks like for Brian’s Journey:

Shape file (text)

Looks kind of like source code. In a manner of speaking, it is. Now look at what happens when BARfly loads it:

Shape file (BARfly)

Awesome. Everything’s a node. But do we really want to cycle through all that filler syntax to get at the associations we need? After running the I.F. function “Normalize,” the data looks like this:

Shape file (final)

This function strips out comments and whitespace and gives us just name-value pairs! Architecturally, a program won’t have any trouble accessing this vectored data. Here’s my actual game engine code that loads a shape interpretation file:

//Load interface to shape file.
sh8 = i.Create_BAR("lf8r_shapeinterp.bar");
if (!sh8) return 1;

//Load file.
if (sh8->Load(filename) < 0) return 2;

//Normalize (get rid of extra components).
assoc_count = sh8->Function_Call_L("ShapeCharAssociations.Normalize");

Not many lines, huh? With so much of the parsing and conversion done in the I.F., the game developer is free to keep the game’s business logic pegged to architecture, rather than low-level parsing.
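For readers without BARfly at hand, here is a rough Python sketch of what a Normalize-style pass does conceptually. The real shape file syntax appears only in the screenshots above, so the `symbol = name;` lines and `//` comments below are a made-up stand-in, and `normalize` is a hypothetical helper, not part of BAR.

```python
def normalize(text):
    """Strip comments and whitespace, keeping only name-value pairs."""
    pairs = []
    for line in text.splitlines():
        # Drop a trailing // comment and any surrounding padding.
        line = line.split("//", 1)[0].strip()
        if not line:
            continue
        name, _, value = line.partition("=")
        pairs.append((name.strip(), value.strip().rstrip(";")))
    return pairs


# Hypothetical shape associations; the real file format may differ.
sample = """
// Shape associations (illustrative only)
# = solid_wall;   // impassable
. = floor;

@ = player_start;
"""

pairs = normalize(sample)
for name, value in pairs:
    print(name, "->", value)
```

The point is only that the caller ends up with a flat list of pairs and never sees the filler syntax.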

Okay, great. But that was the easy part. What about the room definition file format? Here’s a snippet of a room definition file:

Room file (text)

Quite complex, indeed. The room definition has some parts ordered and some parts random–taking the best principles from Nethack, Diablo, Kroz, Vintage Hyperactive, and a variety of static level design formats. And because it’s text-based, you won’t need to design rooms in an editor (although you can later). Loading in BARfly gives you the following:

Room file (BARfly)

Do we have normalization I.F. functions here, too? We do. We first run Assimilate_Binary_Style, which removes comments, whitespace, etc…

Room file (intermediate)

…and then Normalize, which translates the remaining text-based nodes to binary-based nodes.

Room file (final)

Before, everything was a text field. Now, everything is a binary structure. If you save this file and reload it, it stays a binary file!
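The two passes can be imitated in a few lines of Python. This is a sketch under loud assumptions: the `name = number` room syntax is invented for illustration, and `assimilate` and `normalize` are hypothetical stand-ins for the I.F. functions, not their actual implementation. Each value is packed as a little-endian unsigned short purely as an example of a binary field.

```python
import struct


def assimilate(text):
    """Pass 1: drop comments and blank lines, keep text name/value nodes."""
    nodes = []
    for line in text.splitlines():
        line = line.split("//", 1)[0].strip()
        if line:
            name, _, value = line.partition("=")
            nodes.append((name.strip(), value.strip()))
    return nodes


def normalize(nodes):
    """Pass 2: convert each decimal text value to a little-endian unsigned short."""
    return [(name, struct.pack("<H", int(value))) for name, value in nodes]


# Hypothetical room definition; the real format is richer than this.
room_text = """
// illustrative room definition
width  = 40
height = 25   // rows
exits  = 2
"""

text_nodes = assimilate(room_text)
binary_nodes = normalize(text_nodes)
blob = b"".join(data for _, data in binary_nodes)
print(text_nodes)
print(blob.hex())
```

Once `blob` is written out, nothing downstream needs the text form again, which is the property the room file has after Normalize.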

This is a rather novel method, from a software architect’s perspective, when translating text into binary. Traditionally, an architect will need to store text and binary fields separately, forcing a kind of “double vision” to develop. On one side you have your human-readable text, and on the other side you have your binary structures and relationships, which require their own separate architectures for processing.

But with BARfly, you can implement and test all the parsing and conversion logic without ever having to make the video game that uses it (I haven’t yet). There is no “dual architecture.” There is only one. Rather than throw away the source, you preserve the physical relationships of the nodes, and gradually “nibble around the edges” of the text architecture until it becomes a binary format.

Because the I.F. for the room definition files does all the heavy lifting, the game engine only needs to deal with the binary format. Compilation isn’t just implicit as part of the level’s load process: it’s totally invisible to the game’s architecture!

Some of you sharp developers might have started to wonder what this means for games that would like to have the ability to read other games’ file formats. Currently, it’s not worth it, because you’d need to invest in a whole separate library of code devoted to reading just that other format. But if the architecture is pre-built into its own BAR implementation file? You’d just need to distribute that I.F., nothing more.

RegExp in BAR: An Application

Monday, August 24th, 2009

We know regular expressions are used all the time. But what do they look like, and how do they fit into the larger scheme of things with BAR?

I’ve provided a case study here. There is an I.F. on this site for level files from Kroz (one of Apogee’s earliest games). When I originally designed the I.F., it picked data out of the source code using BAR 1.0’s functionality, which was relatively limited: binary parsing only. But source code is best parsed as text, by regular expressions, which only debuted with version 1.3b of BAR.

No one should reasonably expect to write binary parsing code for most text formats, especially if you aren’t dealing with a cornucopia of powerfully optimized functions in your arsenal. Instead, just write regular expressions for the simplest of syntaxes and work your way up in complexity. Here’s a complete list of the regular expression assignments used in kroz_alt.bar, the 1.3b-capable version of the Kroz level file reader:


//Unordered level syntax
block unorganized textual df_header ::= "DF[" ["0-9"]+ "]:=" [^"'"]* "'";
block unorganized textual df_chars ::= [^"'"]*;
block unorganized textual df_transition ::= "'" ["\x0-\x20"]* "+" [^"'"]* "'";
block unorganized nofragment df_spawncounts { unittype = unsigned short; };

//Ordered level syntax
block unorganized textual procedure_header ::= "procedure Level" ["0-9"]+ ";" ["\x0-\x20"]* "begin";
block unorganized textual fp_padding ::= ^"FP[";
block unorganized textual fp_header ::= "FP[" ["0-9"]+ "]:=" ["\x0-\x20"]* "'";
block unorganized textual fp_chars ::= [^"'"]*;
block unorganized textual fp_remainder ::= ^"end;";
block unorganized textual nofragment fp_symbol_line {
unittype = unsigned char;
enum = kroz_char_enums;
};

//Transition syntax
block unorganized textual df_filler ::= ^"DF[";
block unorganized textual df_filler2 ::= "DF[" ^"DF[";
block unorganized textual fp_filler ::= ^"procedure Level";
block unorganized textual fp_filler2 ::= "procedure Level" ^"procedure Level";

You'll never need to write memcmps and strcmps and strtoks and mids and strlens and...you get the idea. All the above unorganized blocks instantly validate and size properly if the pattern matches!
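If you want to try the same patterns outside BAR, two of them translate almost symbol for symbol into Python's `re` module. This is my own hand translation, not output from BAR, and the two-row sample is made up. `search` plays the role of the `fp_padding` block by skipping ahead to the next `FP[`.

```python
import re

FP_HEADER = re.compile(r"FP\[[0-9]+\]:=[\x00-\x20]*'")   # fp_header
FP_CHARS = re.compile(r"[^']*")                          # fp_chars


def level_rows(source):
    """Collect the quoted row text that follows each FP[n]:= header."""
    rows, pos = [], 0
    while True:
        header = FP_HEADER.search(source, pos)
        if header is None:
            return rows
        chars = FP_CHARS.match(source, header.end())
        rows.append(chars.group())
        pos = chars.end()


# Hypothetical two-row level fragment in the Kroz source style.
sample = "FP[1]:='## K ##';\nFP[2]:= '#  @  #';\n"
rows = level_rows(sample)
print(rows)
```

As with the BAR blocks, a successful match gives you both validation and the field's size (`end()` minus the start offset) in one step.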

Of course, there is often a need to put fields together in particular patterns, in which you must extract individual portions of each field. You don't want to make a truly massive regular expression for the entire dataset in such a case--it gives you only one node of data. Instead, you'll want to employ organized blocks to characterize the text fields in a more list-friendly fashion:


block organized ordered_level_line {
mainbody nodelist {
block fp_padding;
block fp_header;
block fp_chars;
};
bool Termination() { return (++iterations >= max_ylines); };
};

block organized ordered_level {
void Initialization() { iterations = 0; };

mainbody nodelist {
block procedure_header;
block organized ordered_level_lines {
mainbody nodelist repeats {
block ordered_level_line;
};
};
block fp_remainder;
};
};

block organized unordered_level {
mainbody nodelist {
block df_header;
block df_chars;
choice optional { block df_transition; };
choice optional { block df_chars; };
choice optional { block df_transition; };
choice optional { block df_chars; };
};
};
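To make the idea of an organized block concrete, here is a loose Python analogy for `unordered_level`: a list of patterns applied in order, some required and some optional, yielding one node per field. The `parse_nodelist` helper and the sample text are hypothetical, and skipping empty optional matches is my own choice, since I am not asserting how BAR treats them.

```python
import re

# Python approximations of the unorganized blocks used by unordered_level.
DF_HEADER = re.compile(r"DF\[[0-9]+\]:=[^']*'")
DF_CHARS = re.compile(r"[^']*")
DF_TRANSITION = re.compile(r"'[\x00-\x20]*\+[^']*'")

# (name, pattern, required), in the same order as the mainbody nodelist.
UNORDERED_LEVEL = [
    ("df_header", DF_HEADER, True),
    ("df_chars", DF_CHARS, True),
    ("df_transition", DF_TRANSITION, False),
    ("df_chars", DF_CHARS, False),
    ("df_transition", DF_TRANSITION, False),
    ("df_chars", DF_CHARS, False),
]


def parse_nodelist(spec, text, pos=0):
    """Match each pattern in order; return (nodes, end offset)."""
    nodes = []
    for name, pattern, required in spec:
        m = pattern.match(text, pos)
        if m is None or (not required and m.end() == pos):
            if required:
                raise ValueError(f"expected {name} at offset {pos}")
            continue  # optional field absent (or empty): skip it
        nodes.append((name, m.group()))
        pos = m.end()
    return nodes, pos


# Made-up sample in the DF[n]:= style, with one '+ continuation.
sample = "DF[2]:=\n{ comment }\n'10 20 '+\n{ more }\n' 5'"
nodes, end = parse_nodelist(UNORDERED_LEVEL, sample)
for name, text in nodes:
    print(name, repr(text))
```

Each field comes back as its own node, which is exactly what a single giant regular expression would not give you.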

In the final release of the I.F., I've made the block definitions a bit more complex, of course. This is because the above definitions only give you the text fields as an ordered list--people would find the data a heckuva lot more useful if the fields had undergone alphanumeric-to-binary conversion, had enumerations broken out, etc.

To do this, just add some Deserialize calls and you're good to go.

The output for an unordered level looks like this:


struct unordered_spawn_counts {
slow_enemy = 600,
medium_enemy = 0,
fast_enemy = 0,
breakable_block = 0,
whip = 20,
stairs = 1,
chest = 0,
slow_time = 5,
gem = 30,
blindness_potion = 0,
teleport_scroll = 5,
key = 0,
door = 0,
solid_wall = 0,
speed_time = 0,
teleport_trap = 0,
river = 0,
power_ring = 0,
forest = 0,
tree = 0,
bomb = 0,
lava = 0,
pit = 0,
staff = 0,
tunnel = 0,
freeze_time = 0,
nugget_or_artifact = 20,
quake_trap = 5,
invisible_breakable_block = 0,
invisible_solid_wall = 0,
invisible_door = 0,
enemy_stop_space = 0,
enemy_activator = 0,
enemy_zap_spell = 0,
enemy_creation_trap = 0,
enemy_generator = 0,
enemy_activator2 = 0,
moving_block = 700...

A list of descriptive spawn counts for a randomly generated level! Now that's useful. But is it an improvement? See for yourself how Scott Miller encoded them originally:


DF[21]:=
{ 1 2 3 X W L C S + I T K D # F . R Q B V = A U Z * E ; : - @ ] G ( M )}
'600 20 1 5 30 5 20 5 700 '+
{ P ! O H N [ | " 4 5 6 7 8 9 Y 0 ~ $}

Good Lord! I’m dead serious! If you can make sense of that, you let me know!

As far as parsing difficulty is concerned, I’d place Kroz level files in the “moderate” category when it comes to what you can do with regular expressions. XML would fall into the “easy” category because it’s very sound and has a well-defined syntax. METAR would fall into the “hard” category because it’s ill-defined, inconsistent, and barely even human-readable. Not that any text format would be impossible.

Since BAR is a relatively new technology, I’m all ears for interesting new challenges people have with ETL or other readability/portability/conversion issues. Chances are good that BAR can tear it to pieces within hours.

Regular Expressions: A subset of BAR

Monday, July 27th, 2009

Great news! We’re now at 1.3b of the BAR engine. This means that you can define both text and binary syntaxes easily with BAR.

BAR was syntactically weak when it came to validating and sizing text strings in earlier versions. Take the text “procedure Level17 ;” for example. If this is a de-facto header, it doesn’t fit within a neat header structure in BAR. You’ll need to account for lots of variable-length portions of data, with optional whitespace, and character combinations not easily reconciled by the basic scripting functionality:

char procedure_start_string[] = "procedure Level";
block unorganized textual nofragment procedure_header_1 {
unittype = char;

bool Validation() {
return (!memcmp(this, procedure_start_string, strlen(procedure_start_string)));
};
long BlockSize() {
return strlen(procedure_start_string);
};
};

And this is only one portion! The entire header is characterized by the following:

block organized procedure_header {
mainbody nodelist {
block procedure_header_1;
block numerals;
choice optional { block whitespace; };
block semicolon;
};
};

Note we haven’t even declared how whitespace, numerals, or semicolon are supposed to characterize our fields. The bottom line, folks: this is a yucky, yucky way to characterize text formats.

With BAR 1.3b, you can simplify everything. Replace all the above with just this one line:

block unorganized textual procedure_header ::= "procedure Level" ["0-9"]+ ["\x0-\x20"]* ";";

That’s it! Just one line for a node with a complex syntax.
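For comparison, the same one-liner in a mainstream regex dialect looks like this. It is a hand translation into Python's `re` syntax, not something BAR produces, and `header_size` is a hypothetical helper that mimics the validate-and-size behavior: it returns the matched length, or `None` when the text is not a valid header.

```python
import re

# "procedure Level" ["0-9"]+ ["\x0-\x20"]* ";"
PROCEDURE_HEADER = re.compile(r"procedure Level[0-9]+[\x00-\x20]*;")


def header_size(text):
    """Return the length of the header at the start of text, or None."""
    m = PROCEDURE_HEADER.match(text)
    return m.end() if m else None


print(header_size("procedure Level17 ;"))
print(header_size("procedure Level;"))
```

The second call fails validation because `["0-9"]+` demands at least one digit after "Level".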

Regular expressions, which are often defined using either Perl “slashed” syntax or Extended Backus-Naur Form (EBNF), are rather difficult to read if you’re not familiar with them. However, they are easy to understand once you get the hang of them, and syntactically, they are incredibly powerful.

In BAR’s case, I have chosen to use regular expression syntax that closely resembles the EBNF definitions found on W3C’s website for XML and other formats (http://www.w3.org). I’ve also been designing a still-unreleased I.F. called XML.BAR, which uses many of the same expressions from W3C as a way to characterize unorganized blocks in BAR.

BAR now supports most of the staples found in regular expressions:

  • Quoted strings: using "abc" or 'abc', indicates presence of whole, case-sensitive strings.
  • Character classes []: using brackets, indicates multiple character choices that can be present at any one particular character location.
  • Asterisk (*): place on end of expression to repeat indefinitely, and make expression optional.
  • Plus (+): place on end of expression to repeat indefinitely, and force presence of at least one iteration.
  • Question (?): place on end of expression to make expression optional (0 or 1 instance only).
  • Specific Repeat Counts {3, 5}: place on end of expression to make expression have a repeat count within a specific range. In this example, minimum is 3 iterations, maximum is 5 iterations.
  • NOT operator (^): place inside character class, in front of quoted string, or in front of parenthetical notation to match every possibility BUT the combination to the right.
  • AND, OR, and AND NOT operators (space, |, -): adjacent expressions with just a space between them (AND), a pipe between them (OR), or a hyphen between them (AND NOT) act as boolean operators when testing multiple conditions in expressions.
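If these staples are new to you, the quickest way to get a feel for them is to poke at the closest equivalents in a language you already have installed. The checks below use Python's `re` module as a stand-in; they show the conventional meaning of each operator, which is my reading of the list above rather than a test of BAR itself. Note that Python wants `{3,5}` without the space.

```python
import re


def full(pattern, text):
    """True when the whole text matches the pattern."""
    return re.fullmatch(pattern, text) is not None


results = {
    "quoted string":   full(r"abc", "abc") and not full(r"abc", "ABC"),
    "character class": full(r"[0-9]", "7") and not full(r"[0-9]", "x"),
    "asterisk":        full(r"[0-9]*", "") and full(r"[0-9]*", "2009"),
    "plus":            full(r"[0-9]+", "2009") and not full(r"[0-9]+", ""),
    "question":        full(r"-?[0-9]+", "-5") and full(r"-?[0-9]+", "5"),
    "repeat count":    full(r"[0-9]{3,5}", "1234") and not full(r"[0-9]{3,5}", "12"),
    "NOT in class":    full(r"[^0-9]+", "abc") and not full(r"[^0-9]+", "a1c"),
    "AND (adjacent)":  full(r"DF\[[0-9]+\]", "DF[21]"),
    "OR (pipe)":       full(r"abc|123", "123"),
}

for name, ok in results.items():
    print(f"{name}: {ok}")
```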

There are still limitations:

  • Character classes allow a NOT operator inside brackets, but it must not be quoted.
  • Character classes have valid characters or ranges inside quotes (single or double). All markup is consistent with BAR’s backslash-oriented markup for string literals; there is no Perl-like markup for whitespace such as \s or related escapes.
  • To specify the hyphen character in a character class, it must be placed at the very beginning of the string. All other appearances count as range specifiers.
  • ^"abc" has the effect of returning all characters leading UP to the combination "abc", if it exists. If "abc" doesn’t exist, the entire set of remaining characters is returned.
  • [^"0-9"] has the effect of matching all characters EXCEPT numerals.
  • ^("abc" | "123") only looks at the IMMEDIATE location for a non-match to "abc" or "123". It will not scan for either combination and then stop.
  • ["a-z"]* - "aa" only excludes a combination that starts with "aa". It will not extend to the first arbitrary point at which "aa" is found.
  • AND has higher priority than OR, which in turn has higher priority than AND NOT.
  • Unorganized blocks are forced to have 1-byte character unit type as well as the nofragment attribute.
  • You cannot specify already-declared names of unorganized blocks in these expressions. For example, you can’t declare “Name” first, then declare Name2 ::= Name " " Name " " Name.
  • Organized blocks cannot be declared using regular expressions.
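Two of the NOT behaviors above are easy to misread, so here is how I would approximate them in Python. These are analogies built from the descriptions in the list, not BAR's implementation: `up_to` and the lookahead patterns are my own constructions.

```python
import re


def up_to(text, stop):
    """Like ^"stop": everything before the first `stop`, or all of text if absent."""
    index = text.find(stop)
    return text if index == -1 else text[:index]


# The same idea as a regular expression: keep consuming characters
# for as long as "abc" does not begin at the current position.
UP_TO_ABC = re.compile(r"(?:(?!abc).)*", re.DOTALL)

# ["a-z"]* - "aa": lowercase letters, rejected only when the run STARTS with "aa".
LETTERS_NOT_AA = re.compile(r"(?!aa)[a-z]*")

print(repr(up_to("xyzabcdef", "abc")))
print(repr(UP_TO_ABC.match("xyzabcdef").group()))
print(LETTERS_NOT_AA.fullmatch("aardvark"))
print(LETTERS_NOT_AA.fullmatch("baa").group())
```

The last line shows the limitation in action: "baa" is accepted whole, because the exclusion is only checked at the starting position and the match is not cut short where "aa" later appears.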

I would eventually like to relax many of these restrictions, especially the last two. Feedback on what sort of improvements you’d like to see in this arena is more than welcome.

The file size and time-to-implementation for many of my formats in the works has dropped dramatically as a result of these changes! A few syntactic changes can go a long way. When each BAR I.F. can act as a unique integrated compiler, the possibilities are endless.

Why Use BAR?

Thursday, July 9th, 2009

One of the most common questions I’ve found people asking me is this: Why would I want to use BAR or BARfly? What advantage do I gain by using this product?

Hmmm…if the product features described in the Main BARfly Website don’t provide a good answer, it will be a hard question to answer.

It’s possible your needs are very specific. For this reason, I’ve provided in the documentation some ideas about who you might be and why you would want to use BARfly. A quick rehash:

  1. Software Developers: People wanting to write code to support and maintain particular file formats
  2. Software Architects: People wanting to design structural elements of a software application
  3. Software Testers: People wanting to examine the contents of generated files or memory content
  4. Security Auditors: People wanting to study a company’s ability to keep data secure from hackers and crackers
  5. Database Administrators: People wanting to detect flaws and inefficiencies in a database, as well as develop solutions
  6. System Troubleshooters: People wanting to audit, diagnose, and fix files (tasks that were expensive or impossible before BARfly)
  7. Network Administrators: People wanting to examine traffic over a network in a schema-oriented fashion
  8. System Administrators: People wanting to do a number of the things listed above
  9. Cryptographers and Cryptanalysts: People trying to design and crack encrypted formats
  10. Casual Validators: Analysts who want to check a file for consistency
  11. Data Entry Specialists: Individuals that must perform high-throughput data entry and format conversions
  12. Very Curious People: Individuals wanting to find out what all that weird unreadable stuff on their hard drive is

There are three builds of BARfly, which have capabilities reflecting the needs of the user:

  1. BARfly Bronze: Contains only viewing capability. You can view files, but you cannot edit them. Nor can you develop your own BAR implementation files.
  2. BARfly Silver: Contains viewing and editing capability. You can view files, edit them, and save them. You cannot develop your own BAR implementation files.
  3. BARfly Gold: Contains viewing and editing capability, plus the ability to develop BAR implementation files. This build comes with an integrated compiler, BARCC, that allows a user unlimited ability to create, edit, test, and use customized schemas.

Now, as for using the BAR engine in your own application, I’ll let you answer that question yourself. There are a HUGE number of technologies and languages being used. The fact that many “meta-languages” have been created has actually made the problem worse, because when people program in a meta-language, it actually reduces the readability and comprehensibility of the code or markup being written.

What’s to gain by learning an entirely new one? A lot, actually. For the most part, there’s very little to learn. If you know C++, you’ll know about 98% of BAR. BAR tries to keep the scripting and data-definition language low-level for handling low-level data. Where BAR is truly unique is in two simple definition types: blocks, and lists.

Think of BAR as the “XML Schema” for all files, text or binary. There are profound advantages to having applications reference platform-independent schemas for all their architecture, whether it’s in-memory only, secondary-storage only, or some combination of the two.

As of this writing, no attempt has been made to offer the BAR software development kit on this website. If enough people have looked at the documentation and are interested in trying it out, I’ll release it at a very reasonable cost, perhaps even for free.