NOTE: This entry is rather technical in nature, geared towards programmers.
A fellow named Robby recently posed a software engineer’s dilemma about characterizing a file like a database. The question is: at what point is the data useful to me? It might be useful only once it has been converted into the form I want. Or, alternatively, it might be useful only when completely raw. Or, possibly, somewhere in the middle.
In other words, where, in the process of deserialization, do you “stop”?
The nice thing about BAR is that you can choose, in the schema, exactly how far you wish to go when you deserialize. You are still limited by the nature of the file format itself, of course: those formats that are constructed with little consideration given to hierarchy, organization, or resynchronization on error will limit a person’s options.
Robby’s example was a JPEG file. Good example, because I offer a free JPEG I.F. on the website.
At the rawest of raw, use FLAT. This yields the entire JPEG file as a single unorganized block. You can read or write anything with FLAT–but chances are you want just a bit more detail.
The next step up is the free format, which breaks the data into segments. The actual image scan itself, though, is untouched.
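To make the "segments" level concrete, here is a minimal sketch in plain Python. It is not BAR and not the free format itself; the function name split_segments and the (label, payload) layout are my own assumptions for illustration. It relies only on the standard JPEG marker layout: a 0xFF byte, a marker code, and (for most markers) a two-byte big-endian length that counts itself. The entropy-coded scan after the SOS header is returned untouched, as one raw block.

```python
import struct

# Marker codes that stand alone, with no length field: SOI, EOI, TEM, RST0-RST7.
STANDALONE = {0xD8, 0xD9, 0x01} | set(range(0xD0, 0xD8))

def split_segments(data: bytes):
    """Split a JPEG byte string into (label, payload) pairs.

    Labels are two-digit hex marker codes ("D8", "E0", ...) or "scan"
    for the raw entropy-coded data that follows an SOS header.
    """
    segments = []
    pos = 0
    while pos < len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        # Skip the 0xFF prefix and any 0xFF fill bytes before the marker code.
        while pos < len(data) and data[pos] == 0xFF:
            pos += 1
        if pos >= len(data):
            break
        marker = data[pos]
        pos += 1
        if marker in STANDALONE:
            segments.append((f"{marker:02X}", b""))
            if marker == 0xD9:  # EOI: end of image
                break
            continue
        # The length field includes its own two bytes.
        (length,) = struct.unpack_from(">H", data, pos)
        segments.append((f"{marker:02X}", data[pos + 2 : pos + length]))
        pos += length
        if marker == 0xDA:  # SOS: the scan data runs until the next real marker
            start = pos
            while pos + 1 < len(data):
                nxt = data[pos + 1]
                # FF00 is a stuffed byte and FFD0-FFD7 are restart markers;
                # both belong to the scan, so keep going past them.
                if data[pos] == 0xFF and nxt != 0x00 and not 0xD0 <= nxt <= 0xD7:
                    break
                pos += 1
            else:
                pos = len(data)
            segments.append(("scan", data[start:pos]))
    return segments
```

Calling split_segments(open("photo.jpg", "rb").read()) on a hypothetical file would give you the header segments plus one opaque "scan" block per scan, which is roughly the stopping point described above.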
The next step up is characterization of the bit scan fields (Arithmetic or Huffman). However, no attempt at converting the fields takes place.
The next step up is converting the bit fields into data that can be used. But…I’m being too kind here! In fact, there are four or five individual “stopping points” you could rest at, since JPEG decoding takes many steps. These include…
1) Arithmetic/Huffman field translation
2) Dequantization
3) IDCT (Inverse Discrete Cosine Transform) translation
4) Component generation (generally YUV)
5) Image pixel generation (generally RGB)
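The stopping points after the entropy decoder can be sketched for a single 8x8 block. This is a plain-Python illustration, not BAR code; the names decode_block and stop_after are hypothetical. It assumes the usual baseline decoder order (dequantize, then inverse DCT, then level shift) and the JFIF YCbCr-to-RGB constants. The IDCT is the naive textbook formula, chosen for readability rather than speed.

```python
import math

def dequantize(coeffs, qtable):
    """Multiply each quantized coefficient by its quantization-table entry."""
    return [[coeffs[v][u] * qtable[v][u] for u in range(8)] for v in range(8)]

def idct_8x8(block):
    """Naive 2-D inverse DCT over one 8x8 block (four nested loops)."""
    def c(k):
        return 1 / math.sqrt(2) if k == 0 else 1.0
    out = [[0.0] * 8 for _ in range(8)]
    for y in range(8):
        for x in range(8):
            total = 0.0
            for v in range(8):
                for u in range(8):
                    total += (c(u) * c(v) * block[v][u]
                              * math.cos((2 * x + 1) * u * math.pi / 16)
                              * math.cos((2 * y + 1) * v * math.pi / 16))
            out[y][x] = total / 4
    return out

def level_shift(block):
    """Undo the encoder's -128 shift and clamp each sample to 0..255."""
    return [[min(255, max(0, round(s + 128))) for s in row] for row in block]

def ycbcr_to_rgb(y, cb, cr):
    """Convert one pixel from JFIF YCbCr to RGB, clamped to 0..255."""
    r = y + 1.402 * (cr - 128)
    g = y - 0.344136 * (cb - 128) - 0.714136 * (cr - 128)
    b = y + 1.772 * (cb - 128)
    return tuple(min(255, max(0, round(ch))) for ch in (r, g, b))

def decode_block(coeffs, qtable, stop_after="samples"):
    """Run the stages in order and return the data at the chosen stopping point.

    stop_after is one of "dequantize", "idct", or "samples" (hypothetical names).
    """
    data = dequantize(coeffs, qtable)
    if stop_after == "dequantize":
        return data
    data = idct_8x8(data)
    if stop_after == "idct":
        return data
    return level_shift(data)
```

The point of stop_after is the same as the point of this post: each intermediate result is a legitimate place to rest, and which one is "useful" depends on what you want from the file.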
Robby’s question was what was useful to him. But I’m asking a more radical question: what are all the possible ways this format can be useful to you?
BAR gives you an unprecedented luxury in being able to “see” the progress as it’s being done. If you need to develop an encoding or decoding implementation on your own, you generally have to rely on classic debugging and testing techniques: conditional breakpoints, single-stepping, debug-output dumps of iterative data. Not to mention cumbersome exception-handlers when you’ve inevitably screwed up.
If you screw up in BAR? No exceptions. Full call stack report. Full node record report. Immediate API return. All that, and it’s platform-independent, AND language-independent.
BAR can be used to characterize. BAR can be used to convert. I won’t presume to know exactly what each individual wants to do with his or her files. The power of BAR is really in the question, not the answer.