There Is No Such Thing As Binary Data In This World

«Binary data» is a myth, created by very unintelligent people. It is just another undefined term in IT among millions of its brethren. Oh, dear! If you disagree, I challenge you to define it, or at least to look up a definition. Seriously, give it a try.
I have challenged many advocates of «binary data» to define their beloved product of words. All of them (those not indoctrinated enough to refuse the challenge altogether) immediately slipped into reasoning about text editors, terminals, and ASCII. But wait: «my terminal cannot display binary data» is your terminal's problem and nothing more. Some terminals cannot display Unicode; does that make Unicode «binary»? Different kinds of terminals are unable to display different kinds of data. The same goes for text editors: which particular byte sequences make a text editor cringe and glitch is defined within that text editor and nowhere else.

Besides that, I believe it is important to remember that the «data» in the present context is neither a text editor nor a terminal, nor any other program used to operate on the data in question. So please stay on topic and tell me: can you define the property «binary» on the data's own merits?

Surprisingly, the answer is «yes». More than that, the answer is «Yes, certainly! Why not?!». If properly tortured by the Socratic method, the majority of «binary data» advocates agree on the following formula: «A string of bytes is considered binary if it contains any byte from a certain fixed set». However, defining this set of «binary» bytes is itself a problem. It brought us back to the original failed reasoning about terminals and text editors, and so it was subjected to the same procedure: «please explain the motives for the membership of each particular byte in the set». It turned out that most of the proposed «binary» bytes derive from very specific software requirements, which leads to a non-universal definition: a chunk of data is «binary» RELATIVE TO a particular set of programs. This is not surprising, but it is very unsatisfactory, because when people say «binary» they apparently mean a UNIVERSAL property.
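
To make the relativity concrete, here is a minimal C sketch of the formula above. The function name is_binary_relative_to and the sample «forbidden» set are my own illustration, not anyone's canonical definition; the point is precisely that the set must be supplied by the caller, i.e. by some particular piece of software.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* "Binary" made honest: the classification is relative to a
 * caller-supplied set of bytes that some particular software
 * cannot stomach. The set is a 256-entry lookup table. */
static bool is_binary_relative_to(const unsigned char *data, size_t len,
                                  const bool forbidden[256])
{
    for (size_t i = 0; i < len; i++)
        if (forbidden[data[i]])
            return true;  /* "binary" -- for THIS set of programs only */
    return false;
}

int main(void)
{
    /* One plausible set: control bytes except TAB, LF, CR, which is
     * roughly what an ASCII-era pager might object to. A UTF-8 editor
     * or an EBCDIC printer would fill this table differently. */
    bool forbidden[256] = { false };
    for (int b = 0; b < 32; b++)
        forbidden[b] = (b != '\t' && b != '\n' && b != '\r');
    forbidden[127] = true;  /* DEL */

    const unsigned char sample[] = "plain text\n";
    printf("%s\n", is_binary_relative_to(sample, sizeof sample - 1, forbidden)
                       ? "binary (for this set)"
                       : "text (for this set)");
    return 0;
}
```

Swap in a different table and the very same bytes flip from «text» to «binary». That is the whole problem.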

And this universal property condenses into the following definition: «A string of bytes is considered binary if it contains a 0-byte». What a marvel of numerology! Clean, sharp, charming, and completely useless. Well done!
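
In code, the «universal» definition is a one-liner. The function name below is mine, but the heuristic itself is real enough: Git's diff, for instance, declares a file binary when it finds a NUL byte within the first few kilobytes.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* The "universal" definition, verbatim: a byte string is "binary"
 * iff it contains a 0-byte anywhere within it. */
static bool is_binary_universal(const void *data, size_t len)
{
    return memchr(data, 0, len) != NULL;
}

int main(void)
{
    /* UTF-16LE "Hi" is 48 00 69 00: perfectly ordinary text, yet
     * "binary" by the 0-byte definition. */
    const unsigned char utf16_hi[] = { 0x48, 0x00, 0x69, 0x00 };
    return is_binary_universal(utf16_hi, sizeof utf16_hi) ? 0 : 1;
}
```

Note the demonstration in main: UTF-16-encoded English text is riddled with 0-bytes, so this «universal» test happily brands perfectly ordinary text as «binary». Hence: completely useless.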

To illustrate the uselessness of this definition, I refer you to the PostgreSQL type system. There is a pair of types, TEXT and BYTEA; the distinction between them is described in the mainstream «binary data» terms and manifests in the output format. Internally, however, these types are indistinguishable: they are treated equally in every aspect other than input and output, and they share the same internal representation, «varlena». You only need a few minutes of reading the code to discover how identical these types are. I believe the shallowness of the practical difference between them tells a lot.
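
For the skeptical reader, here is that shared representation, paraphrased from memory of PostgreSQL's src/include/c.h; consult the actual header for the authoritative version, including its compiler-dependent definition of FLEXIBLE_ARRAY_MEMBER.

```c
/* The generic variable-length datum header that PostgreSQL's TEXT
 * and BYTEA both boil down to. Paraphrased from src/include/c.h. */
#define FLEXIBLE_ARRAY_MEMBER  /* C99 flexible array member */

struct varlena
{
    char vl_len_[4];                    /* total size; accessed via macros only */
    char vl_dat[FLEXIBLE_ARRAY_MEMBER]; /* the payload bytes themselves */
};

/* And the punchline, also from c.h: both SQL types are nothing
 * but aliases of the very same struct. */
typedef struct varlena text;
typedef struct varlena bytea;
```

Two typedefs of one struct: the entire «text versus binary» distinction lives in the types' input/output functions and nowhere deeper.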
