Bits, Bytes and Binary
Learn how binary is practically used.
Computers have billions of tiny switches that can turn electrical signals on and off. In this respect, a computer is an electronic device that works much like a light bulb connected to a power source: each switch is either on or off.
In computers, information is represented and processed using binary digits, or bits. These bits are typically represented by voltage levels, where a high voltage represents a 1 and a low voltage represents a 0 (ON=1, OFF=0). These voltages are carried between the parts of the computer by wires.
Here is an example of storing the denary number 21 in an electrical circuit: in binary, 21 is 10101, so the five signals are ON, OFF, ON, OFF, ON.
Each individual digit in a binary value is referred to as a bit (from the term binary digit). In a computer we can represent binary values by using ON and OFF voltage signals for each individual bit.
For n bits, a computer can produce 2^n different combinations of values, and the maximum value it can produce is (2^n)-1. For example, for 3 bits, a computer can produce 2^3 (8) different values, which are:
000 = 0
001 = 1
010 = 2
011 = 3
100 = 4
101 = 5
110 = 6
111 = 7
As you can see, there are 8 possible combinations of the binary sequences for 3 bits, and the maximum value is 7.
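The counting rule above can be checked with a short sketch. The helper name `bit_patterns` is made up for this illustration; it simply lists every pattern of n bits and converts each one back to denary.

```python
from itertools import product

def bit_patterns(n):
    """Return every n-bit pattern as a string, in counting order."""
    return ["".join(bits) for bits in product("01", repeat=n)]

patterns = bit_patterns(3)
for p in patterns:
    print(p, "=", int(p, 2))  # int(p, 2) reads the string as a base-2 number

print(len(patterns))                      # 8, which is 2**3
print(max(int(p, 2) for p in patterns))   # 7, which is (2**3) - 1
```

Changing the argument to 8 gives the 256 patterns that fit in one byte.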
As a binary value gets larger, it needs more bits to store the number. A computer has fixed wiring that cannot be adjusted to accommodate more bits; instead, it works with bits grouped together into units called bytes. A byte is a collection of 8 bits. Two or more bytes can be grouped together to hold larger values.
Here is an example of storing the denary number 21 in an electrical circuit in a single byte: the value is padded with leading zeros to fill all 8 bits, giving 00010101.
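As a rough illustration of fitting a value into bytes, the sketch below pads 21 to eight bits and then splits a value that is too large for one byte across two. The function name `to_byte_string` and the value 300 are chosen only for this example.

```python
def to_byte_string(value):
    """Format a denary value from 0 to 255 as eight binary digits."""
    if not 0 <= value <= 255:
        raise ValueError("value does not fit in one byte")
    return format(value, "08b")

print(to_byte_string(21))  # 00010101

# A value too big for one byte can be spread over two bytes.
two_bytes = (300).to_bytes(2, "big")
print(list(two_bytes))     # [1, 44] because 300 = 1 * 256 + 44
```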
Prefix  Symbol  Value
Kilo    KB      1,000
Mega    MB      1,000,000
Giga    GB      1,000,000,000
Tera    TB      1,000,000,000,000
Traditionally, computer scientists used these same number prefixes to refer to groups of bytes. The decimal values above are correct for most everyday uses, but in computing they were inaccurate, because sizes are naturally powers of 2 (a "kilobyte" of memory was really 1,024 bytes, not 1,000). So in 1998, the International Electrotechnical Commission (IEC) established different prefixes to represent multiples based on powers of 2 instead of powers of 10.
Prefix  Symbol  Value
Kibi    KiB     1,024
Mebi    MiB     1,048,576
Gibi    GiB     1,073,741,824
Tebi    TiB     1,099,511,627,776
In 1963 the American Standard Code for Information Interchange (ASCII) was established to encode symbols found in the English alphabet. It uses a 7-bit character set, giving 128 possible binary codes.
Every character on your keyboard is represented by a binary value. Upper case letters have different binary values from their lower case counterparts (e.g. the denary value of A is 65, whilst the value of a is 97). There are also values that represent punctuation and spacing (the denary value for a space is 32).
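The values quoted above can be looked up directly. This is a minimal sketch using Python's built-in `ord`; the helper name `char_code_bits` is invented for the example.

```python
def char_code_bits(ch):
    """Return the denary code of a character and its 7-bit binary form."""
    code = ord(ch)
    return code, format(code, "07b")

for ch in ["A", "a", " "]:
    code, bits = char_code_bits(ch)
    print(repr(ch), code, bits)
# 'A' 65 1000001
# 'a' 97 1100001
# ' ' 32 0100000
```

Note that upper and lower case versions of the same letter differ by exactly 32, which is a single bit.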
In addition, the first 32 values of ASCII are control characters, meaning they are used to send commands/instructions to the hardware. An eighth bit was later used by extended versions of ASCII to add extra characters such as © and ®.
Numbers are also encoded in ASCII. For example, the character 9 in ASCII is 0111001, whereas the binary representation of the number 9 is 00001001. Unlike the binary representation, the ASCII form cannot be used directly in mathematical operations. The "9" in ASCII is like a string in a programming language, whereas the binary representation is an integer.
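The distinction between the character "9" and the number 9 shows up in most programming languages. The variable names below are illustrative only.

```python
text_nine = "9"   # the ASCII character
int_nine = 9      # the integer value

print(format(ord(text_nine), "07b"))  # 0111001 (denary 57)
print(format(int_nine, "08b"))        # 00001001

print(text_nine + text_nine)  # 99, because strings are joined, not added
print(int_nine + int_nine)    # 18

# The character must be converted before any arithmetic is done on it.
print(int(text_nine) + 1)     # 10
```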
As you may have already recognised, ASCII is very limited. It only has the characters mainly used to write the English language. To solve this, the Unicode system was introduced to standardise the encoding of characters from all languages.
Unicode text is stored using encodings such as UTF-8, which is variable length (1 to 4 bytes per character), UTF-16 (2 or 4 bytes per character) and UTF-32 (always 4 bytes per character). To improve the adoption of this new standard, the first 128 characters of Unicode were chosen to be the same as the ones in the ASCII set.
One main disadvantage of using Unicode is that more bits are needed for non-ASCII characters, which can cause files to take up more space than they would with ASCII or other encoding systems that are language specific.
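The storage cost can be seen by encoding a few characters and counting the bytes. This is a sketch, not a full treatment of Unicode; the helper `encoded_sizes` and the sample characters are chosen for illustration.

```python
def encoded_sizes(ch):
    """Return the byte length of one character in UTF-8, UTF-16 and UTF-32."""
    return (
        len(ch.encode("utf-8")),
        len(ch.encode("utf-16-be")),  # "-be" avoids the 2-byte order mark
        len(ch.encode("utf-32-be")),
    )

# A, e with acute accent, euro sign, and an emoji, written as escapes.
for ch in ["A", "\u00e9", "\u20ac", "\U0001F600"]:
    print(f"U+{ord(ch):04X}", encoded_sizes(ch))
# U+0041 (1, 2, 4)
# U+00E9 (2, 2, 4)
# U+20AC (3, 2, 4)
# U+1F600 (4, 4, 4)

# ASCII text is byte-for-byte identical when stored as UTF-8.
print("Binary".encode("utf-8") == "Binary".encode("ascii"))  # True
```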
When data is transmitted, it doesn't always arrive exactly as it was sent, because it can be corrupted along the way.
Here are a few reasons why data can be corrupted:
Electrical interference
Power surges (unexpected changes in voltage)
Synchronisation issues
Wear and tear on the cable or connectors
These problems with transmission cause the bits to flip from 1s to 0s and 0s to 1s.
Here are a few techniques used to check for errors in data transmission:
Parity bits
Majority voting
Check digits
Checksums
Once an error has been detected, the receiver requests that the data be resent, because most of these techniques can only detect corruption, not correct it.
Computers transmit data in bytes (8 bits). Normal ASCII characters don't make use of the most significant bit (the 8th bit), as they only use 7 bits. This leaves one unused bit in every byte we send. We can put it to good use by turning it into a parity bit.
Specific computers are designed to use either odd parity or even parity. The sender sets the parity bit so that the total number of 1s in the byte matches the agreed protocol. The receiving computer then counts the number of 1s in the byte and checks whether the count is even or odd. If it doesn't match the protocol (for example, the number of 1s is odd on an even-parity computer), the computer recognises that the data has been corrupted.
The main disadvantage of using a parity bit is that if two bits (or any even number of bits) in a byte are corrupted, the number of 1s stays in the same even/odd classification, and therefore the error will not be detected.
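A minimal sketch of an even-parity check, assuming the parity bit is placed in front of the 7 data bits. The function names are invented for this example, and bytes are handled as strings of 0s and 1s to keep the counting visible.

```python
def add_even_parity(seven_bits):
    """Prefix 7 data bits with a parity bit so the total count of 1s is even."""
    parity = seven_bits.count("1") % 2
    return str(parity) + seven_bits

def passes_even_parity(byte):
    """Return True when the received byte contains an even number of 1s."""
    return byte.count("1") % 2 == 0

def flip_bit(byte, index):
    """Simulate corruption by inverting the bit at the given position."""
    flipped = "1" if byte[index] == "0" else "0"
    return byte[:index] + flipped + byte[index + 1:]

sent = add_even_parity("1000001")      # 'A' has two 1s, so the parity bit is 0
print(sent)                            # 01000001
print(passes_even_parity(sent))        # True

one_flip = flip_bit(sent, 6)
print(passes_even_parity(one_flip))    # False: the error is detected

two_flips = flip_bit(one_flip, 5)
print(passes_even_parity(two_flips))   # True: two errors cancel out, undetected
```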
Majority voting is when each bit of the message is sent 3 times. The recipient computer then looks at the three copies of each bit and assumes that the value which appears most often is the bit that was intended to be sent. For example, if one of the bits was 1 in two of the transmissions and 0 in the last transmission, the computer will assume that the sender intended to send 1.
A disadvantage of majority voting is that you have to send three times as many bits: 24 bits just to transmit an 8-bit character.
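The scheme can be sketched in a few lines, assuming the three copies of each bit are sent next to each other. The function names are hypothetical.

```python
def encode_triple(bits):
    """Repeat every bit three times before sending."""
    return "".join(b * 3 for b in bits)

def decode_majority(received):
    """Recover each bit by taking the most common value in each group of 3."""
    if len(received) % 3 != 0:
        raise ValueError("expected a multiple of 3 bits")
    decoded = []
    for i in range(0, len(received), 3):
        group = received[i:i + 3]
        decoded.append("1" if group.count("1") >= 2 else "0")
    return "".join(decoded)

sent = encode_triple("101")
print(sent)                           # 111000111

# One corrupted copy in the first group is outvoted by the other two.
print(decode_majority("110000111"))   # 101

# Two corrupted copies in the same group win the vote, so the result is wrong.
print(decode_majority("100000111"))   # 001

print(len(encode_triple("01000001")))  # 24 bits for one 8-bit character
```

Unlike a parity bit, this approach can repair a single flipped copy without asking for the data again, at the cost of tripling the traffic.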
Check digits are a form of redundancy check used for error detection on identification numbers such as bank account numbers and book ISBNs. They are usually used where long strings of numbers are typed in by hand, which makes them prone to human typing errors; a check digit lets the computer catch most of these mistakes.
Printed books and other products have a unique barcode with an ISBN (International Standard Book Number) or EAN (European Article Number). The first 12 digits of the barcode are the unique item number; the 13th is the check digit, calculated by an algorithm based on the other 12 digits. This can be calculated using the Modulo 10 system.
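The source does not spell out the Modulo 10 steps, so the sketch below assumes the standard 13-digit barcode rule: weight the first 12 digits alternately by 1 and 3 from the left, add them up, and choose the check digit that brings the total to a multiple of 10. The function names are made up for this example.

```python
def ean13_check_digit(first12):
    """Compute the 13th digit from the first 12 digits of a barcode."""
    if len(first12) != 12 or not first12.isdigit():
        raise ValueError("expected exactly 12 digits")
    # Positions 1, 3, 5, ... are weighted 1; positions 2, 4, 6, ... are weighted 3.
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(first12))
    return (10 - total % 10) % 10

def is_valid_ean13(code):
    """Check that the last digit of a 13-digit code matches the other 12."""
    return (
        len(code) == 13
        and code.isdigit()
        and ean13_check_digit(code[:12]) == int(code[12])
    )

print(ean13_check_digit("978030640615"))  # 7 (weighted sum is 93)
print(is_valid_ean13("9780306406157"))    # True
print(is_valid_ean13("9780306406151"))    # False: last digit mistyped
```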
Checksums work in a similar way to check digits. Before sending the data, the computer uses an algorithm to turn the data into a hash. The whole of the data is then transmitted alongside the hash. The receiving computer recalculates the checksum of the data and compares it with the hash given by the sending computer. If both hashes are the same, the data was not corrupted. If the hashes do not match, the data has been altered or corrupted.
Checksums are often used when downloading large files. For example, if you want to download an ISO file of over 2 GB for Ubuntu, you will see a pre-calculated checksum on their website. Once you have downloaded the ISO file, you can run the same hash algorithm locally and check that your checksum matches theirs.
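A local check like this can be sketched with Python's standard `hashlib` module. SHA-256 is assumed here as the hash algorithm; use whichever algorithm the download page names. The function names and the small stand-in file are illustrative only.

```python
import hashlib
import os
import tempfile

def sha256_of_file(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so a large ISO never has to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches_published(path, published_hex):
    """Compare the local hash with the one shown on the download page."""
    return sha256_of_file(path) == published_hex.strip().lower()

# Demo with a small stand-in file instead of a real ISO.
content = b"pretend this is a large ISO image"
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(content)
    demo_path = tmp.name

published = hashlib.sha256(content).hexdigest()  # what the website would list
print(matches_published(demo_path, published))   # True
print(matches_published(demo_path, "0" * 64))    # False: treat file as corrupted
os.remove(demo_path)
```

Reading in chunks is a design choice for large downloads; for a small file, hashing the whole contents at once gives the same result.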