2️⃣Bits, Bytes and Binary

Learn how binary is practically used.

Circuits

Computers contain billions of tiny switches that turn electrical signals on and off. In that sense, a computer is an electronic device that works much like a light bulb connected to a power source through a switch: at any moment the current is either flowing or it is not.

In computers, information is represented and processed using binary digits or bits. These bits are typically represented by voltage levels, where a high voltage might represent a 1 and a low voltage represents a 0 (ON=1, OFF=0). These voltages are ‘transferred’ around the parts of the computer using wires.

Here is an example of storing the denary number 21 in an electrical circuit: 21 is 10101 in binary (16 + 4 + 1), so five wires would carry ON, OFF, ON, OFF, ON.

Binary Digits

Each individual digit in a binary value is referred to as a bit (from the term binary digit). In a computer, we can represent binary values by using ON and OFF voltage signals for each individual bit.

For n bits, a computer can produce 2^n different combinations of values, and the maximum value it can represent is (2^n)-1. For example, with 3 bits a computer can produce 2^3 (8) different values, which are:

000 = 0
001 = 1
010 = 2
011 = 3
100 = 4
101 = 5
110 = 6
111 = 7

As you can see, there are 8 possible combinations of the binary sequence for 3 bits, and the maximum value is 7.
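The 2^n rule can be checked with a short Python sketch. The function name `bit_patterns` is made up for this illustration; it simply counts from 0 to (2^n)-1 and prints each value as an n-bit pattern.

```python
def bit_patterns(n):
    """Return every n-bit pattern as a string, in counting order."""
    return [format(value, f"0{n}b") for value in range(2 ** n)]


patterns = bit_patterns(3)
for pattern in patterns:
    # int(pattern, 2) converts the binary string back to denary
    print(pattern, "=", int(pattern, 2))

print("combinations:", len(patterns))   # 2^3
print("maximum value:", 2 ** 3 - 1)     # (2^3)-1
```

Changing the 3 to another number of bits shows how quickly the number of combinations grows.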

Bytes

As a binary value gets larger, it needs more bits to store the number. A computer has fixed wiring that cannot be adjusted to accommodate more bits; instead, it works with bits grouped together into units called bytes. A byte is a collection of 8 bits. Two or more bytes can be grouped together to hold larger values.

Here is an example of storing the denary number 21 in an electrical circuit in a single byte: the five bits 10101 are padded with leading zeros to fill all 8 bits, giving 00010101.
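As a rough sketch of the same idea in Python, the hypothetical helper `to_byte` below pads a denary value out to 8 bits, and the list `switches` shows which of the eight lines would be ON or OFF. Both names are invented for this example.

```python
def to_byte(value):
    """Return the 8-bit pattern for a denary value from 0 to 255."""
    if not 0 <= value <= 255:
        raise ValueError("value does not fit in a single byte")
    return format(value, "08b")


bits = to_byte(21)
print(bits)  # 00010101

# One ON/OFF state per bit, most significant bit first
switches = ["ON" if bit == "1" else "OFF" for bit in bits]
print(switches)
```

Values above 255 need more than one byte, which is why the helper rejects them.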

Prefixes for Bytes

Prefix    Symbol    Multiple
Kilo      kB        1,000
Mega      MB        1,000,000
Giga      GB        1,000,000,000
Tera      TB        1,000,000,000,000

Traditionally, computer scientists used these same prefixes to refer to groups of bytes, but counted in powers of 2, so a "kilobyte" often meant 1,024 bytes rather than 1,000. The prefixes above had been used historically for many things, but used this way in computing they were inaccurate. So in 1998, the International Electrotechnical Commission (IEC) established separate prefixes to represent multiples in base 2 (so instead of 10 being the base of the powers, it is 2).

Prefix    Symbol    Multiple
Kibi      KiB       1,024
Mebi      MiB       1,048,576
Gibi      GiB       1,073,741,824
Tebi      TiB       1,099,511,627,776
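The difference between the two sets of prefixes is easiest to see by converting the same number of bytes both ways. This is a minimal sketch: the dictionary names and the `convert` function are invented for the example, and the decimal symbols kB, MB, GB and TB are assumed to be the usual ones.

```python
# Decimal (base-10) prefixes; symbols are an assumption for this sketch
DECIMAL_UNITS = {"kB": 10**3, "MB": 10**6, "GB": 10**9, "TB": 10**12}
# Binary (base-2) prefixes defined by the IEC
BINARY_UNITS = {"KiB": 2**10, "MiB": 2**20, "GiB": 2**30, "TiB": 2**40}


def convert(size_in_bytes, unit):
    """Express a size in bytes in the given decimal or binary unit."""
    units = {**DECIMAL_UNITS, **BINARY_UNITS}
    return size_in_bytes / units[unit]


size = 2 * 10**9  # two thousand million bytes
print(convert(size, "GB"))             # 2.0
print(round(convert(size, "GiB"), 2))  # 1.86
```

The same file is 2 GB but only about 1.86 GiB, which is why the size reported for a file or drive can differ depending on which prefix a program uses.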

ASCII

In 1963, the American Standard Code for Information Interchange (ASCII) was established to encode the symbols used in written English. It is a 7-bit character set, giving 128 possible binary codes.

Every character on your keyboard is represented by a binary value. Upper case letters have different binary values from their lower case counterparts (e.g. the denary value of A is 65, whilst the value for a is 97). There are also values that represent punctuation and spacing (the denary value for a space is 32).

In addition, the first 32 values of ASCII are control characters, meaning they are used to send commands/instructions to the hardware rather than to display a symbol. An eighth bit was later used by extended versions of ASCII to add extra characters such as © and ®.

Numbers are also encoded in ASCII. For example, the character 9 in ASCII is 0111001 (denary 57), whereas the binary representation of the number 9 is 00001001. You cannot perform mathematical operations directly on the ASCII character, unlike the binary representation. The "9" in ASCII is like a string in programming languages, whereas the binary representation is an integer.
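The difference between the character "9" and the number 9 can be seen directly in Python. This is only an illustrative sketch; `ascii_bits` is a made-up helper built on the standard `ord` function.

```python
def ascii_bits(character):
    """Return the 7-bit ASCII pattern for a single character."""
    code = ord(character)
    if code > 127:
        raise ValueError("not an ASCII character")
    return format(code, "07b")


print(ascii_bits("9"))               # 0111001 (the character "9", denary 57)
print(format(9, "08b"))              # 00001001 (the integer 9)
print(ord("A"), ord("a"), ord(" "))  # 65 97 32

print(9 + 1)         # arithmetic works on the integer
print(int("9") + 1)  # the character must be converted first
try:
    "9" + 1          # adding a number to a string is an error
except TypeError as error:
    print("cannot add:", error)
```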

Unicode

As you may have already recognised, ASCII is very limited. It only has the characters that are mainly used to write the English language. To solve this, the Unicode system was introduced to standardise the encoding of characters from all languages.

Unicode text is stored using encodings such as UTF-8 (variable length, 1 to 4 bytes per character), UTF-16 (variable length, 2 or 4 bytes) and UTF-32 (a fixed 32 bits, or 4 bytes). To improve the adoption of this new standard, the first 128 characters of Unicode were chosen to be the same as the ones in the ASCII set.

One main disadvantage of using Unicode is that more bits are needed for non-ASCII characters, which can make files take up more space than they would in ASCII or in other encoding systems that are language-specific.
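The extra space can be measured by encoding a few characters and counting the bytes. The sketch below uses Python's built-in `str.encode`; the helper name `encoded_size` and the sample characters are chosen just for this illustration, and the "-le" variants are used so that no byte-order mark is added to the count.

```python
def encoded_size(text, encoding):
    """Return how many bytes the text occupies in the given encoding."""
    return len(text.encode(encoding))


for text in ["A", "é", "€", "😀"]:
    print(
        text,
        encoded_size(text, "utf-8"),
        encoded_size(text, "utf-16-le"),
        encoded_size(text, "utf-32-le"),
    )

# The first 128 Unicode characters match ASCII byte for byte in UTF-8
print("A".encode("utf-8") == "A".encode("ascii"))
```

In UTF-8 the letter A still takes a single byte, while the other characters take two, three and four bytes respectively.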

Transmission Errors

When data is transmitted, it doesn't always arrive exactly as it was sent, because it can get corrupted along the way.

Here are a few reasons for data being corrupted:

Electrical interference
Power surges (unexpected changes in voltage)
Synchronisation issues
Wear and tear on the cable or connectors

These transmission problems can cause bits to flip from 1s to 0s and from 0s to 1s.

Error Detection

Here are a few techniques used to check for errors in data transmission:

Parity bits
Majority voting
Check digits
Checksums

With most of these techniques, once an error has been confirmed, a request to resend the data is issued, because the check only tells us that the data is corrupted and not how to correct it.

Parity Bit

Computers transmit data in bytes (8 bits). Standard ASCII characters don't make use of the most significant bit (the 8th bit), as they only use 7 bits. This leaves one unused bit in every byte we send. We can put it to good use by turning it into a parity bit.

Systems are designed to use either odd parity or even parity. The sender sets the parity bit so that the total number of 1s in the byte is even (even parity) or odd (odd parity). The receiving computer then counts the number of 1s in the byte it received and checks whether the count is even or odd. If it doesn't match the protocol (for example, the number of 1s is odd on an even parity system), the computer recognises that the data has been corrupted.

The main disadvantage of using a parity bit is that if two bits (or any even number of bits) in a byte are corrupted, the number of 1s stays in the same even/odd classification, and therefore the error is not detected.
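Both the check and its weakness can be sketched in a few lines of Python. This assumes even parity with the parity bit placed in the most significant position; the function names are invented for the example.

```python
def add_even_parity(seven_bits):
    """Prefix a 7-bit string with a parity bit so the count of 1s is even."""
    parity = seven_bits.count("1") % 2
    return str(parity) + seven_bits


def passes_even_parity(byte):
    """Return True if the received byte contains an even number of 1s."""
    return byte.count("1") % 2 == 0


def flip(bits, index):
    """Simulate corruption by inverting the bit at the given position."""
    flipped = "1" if bits[index] == "0" else "0"
    return bits[:index] + flipped + bits[index + 1:]


sent = add_even_parity("1000001")    # the letter A (65) has two 1s
one_error = flip(sent, 3)            # one bit corrupted
two_errors = flip(one_error, 5)      # a second bit corrupted

print(sent, passes_even_parity(sent))              # arrives intact
print(one_error, passes_even_parity(one_error))    # error is detected
print(two_errors, passes_even_parity(two_errors))  # error goes unnoticed
```

The last line passes the check even though the byte no longer matches what was sent, which is exactly the two-bit weakness described above.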

Majority Voting

This is when each bit of the message is sent 3 times. The receiving computer then compares the three copies of each bit and assumes that the value which appears most often is the one that was intended. For example, if a bit was 1 in two of the transmissions and 0 in the last transmission, the computer will assume that the sender intended to send 1.

A disadvantage of majority voting is that you have to send three times as many bits: an 8-bit character takes 24 bits to transmit.
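A minimal sketch of the voting step is given here, assuming the message arrives as three complete copies that are compared position by position. The function name `majority_vote` and the sample data are made up for this illustration.

```python
def majority_vote(copies):
    """Rebuild a message from three equal-length copies of its bits."""
    result = []
    for bits in zip(*copies):
        # bits holds the three received versions of one position
        result.append("1" if bits.count("1") >= 2 else "0")
    return "".join(result)


received = [
    "01000001",  # first transmission, intact
    "01001001",  # second transmission, one bit flipped
    "01000001",  # third transmission, intact
]
print(majority_vote(received))            # 01000001
print(sum(len(copy) for copy in received), "bits sent for 8 bits of data")
```

The flipped bit in the second copy is outvoted by the other two, so the original byte is recovered without asking for the data again.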

Check Digits

Check digits are a form of redundancy check used for error detection on identification numbers such as bank account numbers and book ISBNs. They are usually used where long strings of numbers are typed in by hand, which makes them prone to human typing errors; a check digit lets the system catch many of these mistakes.

Printed books and other products have a unique barcode carrying an ISBN (International Standard Book Number) or EAN (European Article Number). The first 12 digits of the barcode are the unique item number; the 13th is the check digit, calculated by an algorithm based on the other 12 digits. This can be calculated using the Modulo 10 system.

Calculating the check digits
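The sketch below shows the Modulo 10 calculation as it is commonly defined for 13-digit barcodes: the first 12 digits are multiplied alternately by 1 and 3, the results are added up, and the check digit is whatever brings the total up to the next multiple of 10. The function names are invented for this example, and the sample number is a widely quoted ISBN-13 used purely as test data.

```python
def ean13_check_digit(first_twelve):
    """Calculate the 13th digit from the first 12 digits of a barcode."""
    if len(first_twelve) != 12 or not first_twelve.isdigit():
        raise ValueError("expected a string of exactly 12 digits")
    total = 0
    for position, char in enumerate(first_twelve):
        # Weights alternate 1, 3, 1, 3, ... starting from the left
        weight = 1 if position % 2 == 0 else 3
        total += int(char) * weight
    # Distance from the total to the next multiple of 10 (0 if already there)
    return (10 - total % 10) % 10


def is_valid_ean13(code):
    """Check that the last digit of a 13-digit code matches the other 12."""
    if len(code) != 13 or not code.isdigit():
        return False
    return ean13_check_digit(code[:12]) == int(code[12])


print(ean13_check_digit("978030640615"))  # 7
print(is_valid_ean13("9780306406157"))    # True
print(is_valid_ean13("9780306406517"))    # False: two digits swapped
```

For 978030640615 the weighted total is 93, so the check digit is 7. Swapping two neighbouring digits changes the total, which is how this kind of typing mistake gets caught.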

Checksum

Checksums work in a similar way to check digits. Before sending the data, the computer uses an algorithm to turn the data into a short value (the checksum, often a hash). The whole of the data is then transmitted alongside the checksum. The receiving computer recalculates the checksum from the data it received and compares it with the one given by the sending computer. If the two values are the same, the data was almost certainly not corrupted. If they do not match, the data has been altered or corrupted.

Checksums are often used when downloading large files. For example, if you want to download an ISO file of over 2 GB for Ubuntu, you will see a pre-calculated checksum on their website. Once you have downloaded the ISO file, you can run the same hash algorithm locally and check that your checksum matches theirs.
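A local check like this could be sketched with Python's standard `hashlib` module. SHA-256 is assumed here only as an example algorithm; use whichever algorithm the website actually names. The function names and the file name in the usage note are hypothetical.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Hash a file in chunks so large downloads need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Keep reading until read() returns an empty bytes object
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path, published_checksum):
    """Compare the local checksum with the one copied from the website."""
    return sha256_of_file(path) == published_checksum.strip().lower()
```

Calling `matches_published("downloaded.iso", "<checksum from the website>")` would return True only if every byte of the file produces the same checksum as the published one.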
