It’s the 21st century, computers are all around us, and explaining them can yield some pretty interesting blog content.
If you have used a computer, you’ve used, or at the very least heard of, a file, and you have probably seen many different file extensions (.png, .wav, .docx, .txt).
“What makes them different?” you might ask.
To a computer, a file looks exactly like any other file: strings of binary, rows upon rows of 0’s and 1’s. A computer has no notion of a, b, or c unless we tell it something like 001 is a, 010 is b, and 100 is c. So that’s what we do. And to make it more readable, we group 8 bits (a bit is a single binary digit) into a byte, which can be written more compactly as 2 hex digits. This turns 10001010 (base-2) into 8a (base-16).
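If you want to check that conversion yourself, here is a minimal Python sketch. The helper name bits_to_hex is made up for this post, and it assumes the input is exactly one byte written as eight 0/1 characters.

```python
def bits_to_hex(bits):
    """Turn a string of eight 0/1 characters into two hex digits.

    Hypothetical helper for illustration; only handles a single byte.
    """
    if len(bits) != 8 or set(bits) - {"0", "1"}:
        raise ValueError("expected exactly eight binary digits")
    value = int(bits, 2)          # read the string as a base-2 number
    return format(value, "02x")   # write it back out as two base-16 digits


print(int("10001010", 2), bits_to_hex("10001010"))  # 138 8a
```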
This means that different files are just different ways of reading those bytes. Some files have strict formatting rules and some have no rules at all.
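As a rough illustration of "different ways of reading those bytes", the sketch below takes the same four bytes and reads them three ways. The byte values are arbitrary sample data chosen for this example, not taken from any real file.

```python
# Four arbitrary sample bytes, written in hex.
data = bytes.fromhex("48692121")

# Reading 1: each byte is an ASCII character.
as_text = data.decode("ascii")

# Reading 2: each byte is a small number from 0 to 255.
as_numbers = list(data)

# Reading 3: all four bytes together form one big-endian integer.
as_uint32 = int.from_bytes(data, "big")

print(as_text)     # Hi!!
print(as_numbers)  # [72, 105, 33, 33]
print(as_uint32)
```

The bytes never change; only the rules for interpreting them do.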
There are essentially two different kinds of files, even though all files are really just bytes: human-readable and binary. Binary files aren’t really intended to be read by humans, while human-readable files are exactly what they sound like.
.txt files are human-readable. If you open one up and convert the bytes directly to characters, without following any formatting rules, you’ll get text that you should be able to read.
.csv files are also human-readable but follow a common format: commas separate all of the values. These are common for spreadsheets.
On the opposite end, something like .docx, the format used to hold your Microsoft Word document, is binary. It sounds confusing, but a .docx (under the hood, a ZIP archive of XML files) is capable of holding pictures and formatting and colors and so many things that a conventional .txt couldn’t hold.
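To make the .txt versus .csv difference concrete, here is a small sketch using Python's built-in csv module. The raw bytes and the names in them are invented sample data. The first step (bytes to characters) is all a .txt needs; the second step applies the comma rule on top.

```python
import csv
import io

# Invented sample bytes standing in for the contents of a small .csv file.
raw = b"name,role\nBarry,bee\nVanessa,florist\n"

# Step 1: treat the bytes as characters, exactly like a .txt file.
text = raw.decode("utf-8")

# Step 2: apply the .csv formatting rule, splitting each line on commas.
rows = list(csv.reader(io.StringIO(text)))

for row in rows:
    print(row)
```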
Another binary file could be something like .png, which can display cool images given the proper program to read it, but looks like this when you open it in a hex editor.
The right side shows what the byte values on the left look like as characters, and is what you will see if you try to open a .png in a text editor (like Notepad). If you didn’t have a program to interpret it (like Paint), you wouldn’t be able to get an image.
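If you don't have a hex editor handy, a rough imitation of that two-column view fits in a few lines of Python. This is only a sketch: hex_dump is a made-up helper, and the sample bytes are the start of a PNG file (the 8-byte signature, a chunk length, and the chunk name) typed in by hand rather than read from a real image.

```python
def hex_dump(data, width=16):
    """Return a hex-editor style view: offset, hex bytes, printable characters."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        # Left side: every byte as two hex digits.
        hex_part = " ".join(format(b, "02x") for b in chunk)
        # Right side: the same bytes as characters, with "." for unprintable ones.
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:08x}  {hex_part.ljust(width * 3 - 1)}  {text_part}")
    return "\n".join(lines)


# First 16 bytes of a PNG: signature, length of the first chunk (13), "IHDR".
sample = b"\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR"
print(hex_dump(sample))
```

To try it on a real file, you could pass in the result of open(path, "rb").read() instead of the sample bytes.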
A couple of things are worth noting here, though. Notice “IHDR” on the first line?
That indicates to a .png reader that it is the first chunk of the .png. It has to be there, and all the later data is interpreted based on that chunk.
On the flip side, “IEND” indicates the last chunk of the .png. This lets the .png reader know to stop reading the file, since it won’t get any more information about the image.
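Here is a sketch of how a reader might walk from IHDR to IEND. It relies on the PNG chunk layout (a 4-byte length, a 4-byte name, the data, then a 4-byte checksum). The function names are made up for this post, and the 1x1 grayscale image is generated in code purely as a stand-in so the example runs without a file on disk.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def make_chunk(name, payload=b""):
    """Build one PNG chunk: length, name, payload, CRC of name + payload."""
    crc = zlib.crc32(name + payload)
    return struct.pack(">I", len(payload)) + name + payload + struct.pack(">I", crc)


def make_tiny_png():
    """Stand-in image: 1x1 pixel, 8-bit grayscale, one black pixel."""
    header = struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0)
    pixels = zlib.compress(b"\x00\x00")  # filter byte + one pixel value
    return (PNG_SIGNATURE + make_chunk(b"IHDR", header)
            + make_chunk(b"IDAT", pixels) + make_chunk(b"IEND"))


def list_chunks(data):
    """Return the chunk names in order and the offset where IEND finishes."""
    if data[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    names, pos = [], 8
    while pos + 8 <= len(data):
        length, name = struct.unpack(">I4s", data[pos:pos + 8])
        names.append(name.decode("ascii"))
        pos += 12 + length  # 4 length + 4 name + payload + 4 CRC
        if name == b"IEND":
            break  # the reader stops here
    return names, pos


names, end = list_chunks(make_tiny_png())
print(names)  # ['IHDR', 'IDAT', 'IEND']
```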
This means that you could shove a ton of data at the end of a .png file and an image reader won’t look at it. For example: the entire Bee Movie script.
While retaining a completely normal .png image of Barry from the Bee Movie, you can actually put the entire Bee Movie Script by Script-o-rama.com on the end of it. That said, if you download that image right now, the script won’t be on there, because WordPress’s image reader chops it all off after reading only what is needed for the image.
If you did decide to open up a text editor and try it yourself, it would look something like this, and the image would look exactly the same when you opened it up.
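The same trick can be sketched in Python without touching a text editor. Everything here is a stand-in: the image is a 1x1 PNG built in code, the "script" is a placeholder string, and split_at_iend is a made-up helper. It is not how WordPress actually processes uploads, just one way a reader could keep the image and drop whatever follows IEND.

```python
import struct
import zlib


def chunk(name, payload=b""):
    """Build one PNG chunk: length, name, payload, CRC."""
    body = name + payload
    return struct.pack(">I", len(payload)) + body + struct.pack(">I", zlib.crc32(body))


# Stand-in for the real picture: a 1x1 grayscale PNG.
image = (b"\x89PNG\r\n\x1a\n"
         + chunk(b"IHDR", struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0))
         + chunk(b"IDAT", zlib.compress(b"\x00\x00"))
         + chunk(b"IEND"))

# Placeholder text standing in for the script.
script = b"(imagine the whole script pasted here)"

# "Shoving" the data on the end is plain concatenation.
stuffed = image + script


def split_at_iend(data):
    """Split PNG bytes into (image part, trailing data after IEND).

    Raises struct.error if the data runs out before an IEND chunk is found.
    """
    pos = 8  # skip the 8-byte signature
    while True:
        length, name = struct.unpack(">I4s", data[pos:pos + 8])
        pos += 12 + length
        if name == b"IEND":
            return data[:pos], data[pos:]


picture, extra = split_at_iend(stuffed)
print(picture == image)  # True
print(extra)
```

Writing stuffed to a file with a .png extension should give you an image that opens normally, with the extra text riding along at the end.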