5️⃣Compression & Encryption

Learn different ways of reducing file size & ways in which data can be hidden.

Data Transfer & Storage

As you already know, data is constantly being moved between computers and across networks, because interconnected networks are now so widely used.

We often face problems with data transfer and storage. Data transfer needs to be fast and accurate, but over longer distances it becomes slower and more prone to interference. Likewise, data storage is limited in how much it can hold.

Compression

To help with storage and transfer, we can compress text, image and sound data. Compression can significantly reduce the size of the data, so it travels faster across the internet and more of it fits on a computer.

Reducing the size of the data brings the following benefits:

  • Data is sent more quickly

  • Less bandwidth is used, which matters where transfer limits apply

  • Buffering on audio/video streams is less likely to occur

  • Less storage is required

Types of Compression

Lossy: Non-essential data is permanently removed, for example different shades of the same colour in an image, or sound frequencies outside the range of human hearing.

Lossless: Patterns in the data are spotted and used to summarise it in a shorter format without permanently removing any information.

Lossy Compression – JPG

JPEG compression reduces file size by changing the colour values and blocking together groups of pixels into a more uniform colour, so that it doesn't have to store as many different ones. While this does decrease the file size, it also alters the true image by changing the colours, which can have an adverse effect on the image's quality.
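The loss of detail can be illustrated with something far simpler than JPEG. The sketch below is not the JPEG algorithm. It only rounds each colour value down to a multiple of a step size, using a hypothetical `quantise` helper and made-up pixel values, so that similar shades collapse into one value and the original shades cannot be recovered afterwards.

```python
def quantise(pixels, step=32):
    """Round each 0-255 colour value down to a multiple of `step`."""
    return [(value // step) * step for value in pixels]

# Eight slightly different shades of grey (illustrative values).
original = [200, 201, 203, 205, 206, 210, 215, 220]
reduced = quantise(original)

print(reduced)  # similar shades now share one value
print(len(set(original)), "distinct values before")
print(len(set(reduced)), "distinct values after")
```

Fewer distinct values means less to store, but nothing in `reduced` tells you which pixel was 201 and which was 215. That is what makes the process lossy.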

Lossy Compression – MP3

MP3 lossy compression removes sounds at frequencies beyond the range of human hearing, so playback quality is not affected much. Quieter sounds that play at the same time as louder ones are also removed.

Lossless Compression

Lossless compression records patterns in the data and stores those patterns instead of the whole data. Using pattern information can reduce the file size without losing any data. A disadvantage of lossless compression is that it reduces file size less than lossy compression does.

The two main types of lossless compression are Run-Length-Encoding & Dictionary-based compression. Both methods work best on files that contain lots of repeated data.
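The effect of repeated data can be tried out with Python's standard `zlib` module. It implements DEFLATE, a widely used lossless method that combines a dictionary-style technique with further coding, so it is a stand-in here rather than either of the two methods described in these notes. The byte strings are made up for illustration.

```python
import os
import zlib

repetitive = b"ABABABAB" * 1000   # 8000 bytes of repeated data
random_like = os.urandom(8000)    # 8000 bytes with no obvious pattern

packed_repetitive = zlib.compress(repetitive)
packed_random = zlib.compress(random_like)

print(len(repetitive), "->", len(packed_repetitive))
print(len(random_like), "->", len(packed_random))

# Lossless: decompressing gives back exactly the original bytes.
assert zlib.decompress(packed_repetitive) == repetitive
assert zlib.decompress(packed_random) == random_like
```

The repetitive input should shrink dramatically, while the patternless input should barely shrink at all.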

Run-Length-Encoding

RLE is a basic method of compression that summarises consecutive runs of the same data. It works best with images & sounds, where the same value can be repeated many times in a row.

Example of RLE Compression

As the image above shows, RLE replaces each run of repeated pixels with one pixel value and a repetition count, which reduces the storage space required to represent the image.
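The same idea can be sketched in a few lines of Python. This is a minimal illustration, not a real image format: the function names and the row of "pixels" (W for white, B for black) are made up for the example.

```python
from itertools import groupby

def rle_encode(data):
    """Collapse each run of equal symbols into a (symbol, count) pair."""
    return [(symbol, len(list(run))) for symbol, run in groupby(data)]

def rle_decode(pairs):
    """Expand (symbol, count) pairs back into the original string."""
    return "".join(symbol * count for symbol, count in pairs)

row = "WWWWWWBBBWWWW"  # one row of a black-and-white image (illustrative)
encoded = rle_encode(row)
print(encoded)  # [('W', 6), ('B', 3), ('W', 4)]
assert rle_decode(encoded) == row
```

Thirteen symbols become three pairs. On data with few repeats, such as "WBWBWB", every symbol would need its own pair and the "compressed" form would be larger than the original.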

Dictionary-based Compression

This type of compression spots regularly occurring data and stores it separately in a dictionary. Whenever the repeated data occurs, a short reference (key) to the dictionary entry is stored instead of the data itself.

A disadvantage of the dictionary method is that the dictionary itself must also be stored in the same file, which increases the size of the compressed file. This has a significant effect when the data to be compressed is short.
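A word-level version of this can be sketched as follows. It is a simplified illustration with hypothetical function names and a made-up sentence. Real dictionary coders work on byte sequences rather than whole words.

```python
def dict_compress(text):
    """Replace each word with an index into a list of unique words."""
    dictionary = []  # unique words, in order of first appearance
    positions = {}   # word -> index in `dictionary`
    codes = []
    for word in text.split(" "):
        if word not in positions:
            positions[word] = len(dictionary)
            dictionary.append(word)
        codes.append(positions[word])
    return dictionary, codes

def dict_decompress(dictionary, codes):
    """Look every index up again to rebuild the original text."""
    return " ".join(dictionary[code] for code in codes)

text = "the cat sat on the mat because the cat was tired"
dictionary, codes = dict_compress(text)
print(dictionary)
print(codes)
assert dict_decompress(dictionary, codes) == text
```

Both `dictionary` and `codes` have to be kept to rebuild the text, which is why a short input with few repeated words gains little or nothing.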

Compressing large volumes

  • In a text document each letter could be stored as an ASCII code of 8 bits.

  • In this document the word ‘because’ requires 56 bits of data (7 letters x 8 bits)

  • Instead, the word could be added to a dictionary and assigned the binary code 01, which is a reduction of 54 bits for each occurrence.
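The arithmetic above can be checked with a short calculation. The overhead model, in which the dictionary entry costs the word itself plus its code, is an assumption made for this sketch. Real formats store their dictionaries in different ways.

```python
BITS_PER_CHAR = 8  # ASCII stored in 8 bits, as above
word = "because"
code_bits = 2      # the code 01 is two bits long

word_bits = len(word) * BITS_PER_CHAR   # 7 letters x 8 bits = 56
saving_per_use = word_bits - code_bits  # 56 - 2 = 54

def net_saving(occurrences):
    """Bits saved overall, assuming the dictionary entry costs the
    word itself plus its code (an illustrative overhead model)."""
    overhead = word_bits + code_bits
    return occurrences * saving_per_use - overhead

for n in (1, 2, 10, 100):
    print(n, "occurrences ->", net_saving(n), "bits saved")
```

Under this model a single occurrence actually costs 4 bits more than storing the word directly, while 100 occurrences save 5342 bits. This is why the method pays off on large volumes of text.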

Lossy Compression | Lossless Compression
Some data is lost during the process | No loss of data
Quality of the data is reduced | No loss of quality
The extent to which file size can be reduced is not limited | There is a limit to how much a file can be compressed

Encryption

Encryption is the process of systematically scrambling data so that it cannot be understood even if it is intercepted. This helps keep it secure during transmission.

Unencrypted data is referred to as plaintext, whilst encrypted data is called ciphertext. A cipher is a type of encryption method. In order to decrypt ciphertext, you must know the encryption method used and the key that was used to encrypt it.

Caesar Ciphers

Caesar ciphers encrypt information by replacing characters. A given character is always replaced by the same character. There are two main types of Caesar cipher: Shift Ciphers & Substitution Ciphers.

Shift Ciphers

When encrypting with a shift cipher, all of the letters in the alphabet are shifted by the same amount. The amount by which the characters are shifted forms the key, which can be used to decrypt the ciphertext.
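A shift cipher is short enough to sketch in full. The function names, the message and the key of 3 are chosen for illustration only.

```python
import string

ALPHABET = string.ascii_uppercase

def shift_encrypt(plaintext, key):
    """Shift every letter A-Z forward by `key` places, wrapping at Z."""
    result = []
    for char in plaintext.upper():
        if char in ALPHABET:
            index = (ALPHABET.index(char) + key) % 26
            result.append(ALPHABET[index])
        else:
            result.append(char)  # leave spaces and punctuation alone
    return "".join(result)

def shift_decrypt(ciphertext, key):
    """Undo the shift by moving every letter back by `key` places."""
    return shift_encrypt(ciphertext, -key)

secret = shift_encrypt("HELLO WORLD", 3)
print(secret)  # KHOOR ZRUOG
assert shift_decrypt(secret, 3) == "HELLO WORLD"
```

Because the alphabet wraps around, there are only 26 possible keys, and a key of 0 or 26 leaves the text unchanged.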

Substitution Ciphers

In this type of cipher, each character is replaced by another according to a randomly chosen mapping, rather than a fixed shift. This method is a bit better than shift ciphers but it is still relatively easy to crack.

An example of a substitution cipher
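In code, the key of a substitution cipher is a shuffled alphabet. The sketch below uses hypothetical helper names and a fixed seed so that the demonstration is repeatable. Python's `random` module is not suitable for real secrets.

```python
import random
import string

ALPHABET = string.ascii_uppercase

def make_key(seed=None):
    """Return a shuffled alphabet; position i replaces ALPHABET[i]."""
    letters = list(ALPHABET)
    random.Random(seed).shuffle(letters)
    return "".join(letters)

def substitute(text, source, target):
    """Replace each character found in `source` with its partner in `target`."""
    table = str.maketrans(source, target)
    return text.upper().translate(table)

key = make_key(seed=42)  # fixed seed only so the demo repeats
ciphertext = substitute("MEET AT NOON", ALPHABET, key)
print(key)
print(ciphertext)

# Decryption swaps the roles of the two alphabets.
assert substitute(ciphertext, key, ALPHABET) == "MEET AT NOON"
```

There are far more possible keys than for a shift cipher, but every E in the plaintext still becomes the same ciphertext letter, so letter patterns survive encryption.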

Vernam Cipher

The Vernam cipher is an example of a one-time pad cipher. This means that each key should only ever be used once. Additionally, the Vernam cipher requires the key to be random and at least as long as the plaintext that is to be encrypted.

How the Vernam Cipher Works

The Vernam cipher works by aligning the plaintext with the key and converting each character of both to binary using a character encoding (for example ASCII). The XOR logical operation is then applied to each pair of bit patterns, and the result is converted back to a character.

Example of Vernam cipher encryption
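The XOR step can be sketched in Python as below. The `vernam` function name and the message are illustrative. The key comes from the standard `secrets` module, which draws on the operating system's random source. That is a stand-in for the truly random key a real one-time pad needs.

```python
import secrets

def vernam(data, key):
    """XOR each byte of `data` with the matching byte of `key`."""
    if len(key) < len(data):
        raise ValueError("key must be at least as long as the message")
    return bytes(d ^ k for d, k in zip(data, key))

plaintext = "HELLO".encode("ascii")
# Stand-in key: a real one-time pad would need a truly random key.
key = secrets.token_bytes(len(plaintext))

ciphertext = vernam(plaintext, key)
recovered = vernam(ciphertext, key)  # XOR with the same key undoes it
print(ciphertext.hex())
assert recovered == plaintext

# One character worked by hand: 'H' is 01001000, 'K' is 01001011.
print(format(ord("H") ^ ord("K"), "08b"))  # 00000011
```

Encryption and decryption are the same operation, because XORing with the same key twice returns the original bits.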

Conditions for good security

To make sure that your Vernam cipher messages cannot be decoded, you must follow certain security rules: use a truly random key that is at least as long as the plaintext; discard the key after every use so that each key is only used once; and share the key with the recipient by hand so that no-one else knows it.
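The rule about never reusing a key can be demonstrated directly. In this sketch the key bytes and both messages are made up. The point is that XORing two ciphertexts made with the same key cancels the key out.

```python
def xor_bytes(a, b):
    """XOR two equal-length byte strings together."""
    return bytes(x ^ y for x, y in zip(a, b))

key = bytes([0x3A, 0x91, 0x5C, 0x07, 0xE2])  # illustrative key, reused below
message_1 = b"HELLO"
message_2 = b"WORLD"

cipher_1 = xor_bytes(message_1, key)
cipher_2 = xor_bytes(message_2, key)

# XORing the two ciphertexts cancels the key out completely.
leak = xor_bytes(cipher_1, cipher_2)
assert leak == xor_bytes(message_1, message_2)

# Anyone who guesses one message can now read the other without the key.
print(xor_bytes(leak, message_1))  # b'WORLD'
```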

One-time pads

The one-time pad must be truly random, generated from a physical and unpredictable phenomenon.

  • Sources may include: atmospheric noise, radioactive decay, the movements of a mouse or snapshots of a lava lamp.

  • A truly random key will render any frequency analysis useless as it would have a uniform distribution.

  • Computer-generated ‘random’ sequences are not actually random

Cracking Ciphers

All ciphers other than the Vernam cipher are, in theory, crackable. Caesar ciphers can easily be cracked via methods such as brute forcing and frequency analysis.

Brute Force

A brute force attack attempts to apply every possible key to decrypt ciphertext until one works. Check out my website which specialises in decrypting ciphertext by brute forcing the keys.

Spaces are removed from ciphertext in order to make decryption harder.
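For a shift cipher, brute force means trying all 26 keys. The sketch below uses hypothetical function names and a made-up ciphertext with the spaces already removed. It simply lists every candidate for a person to read through.

```python
import string

ALPHABET = string.ascii_uppercase

def shift(text, key):
    """Shift every letter A-Z by `key` places (negative keys shift back)."""
    return "".join(
        ALPHABET[(ALPHABET.index(c) + key) % 26] if c in ALPHABET else c
        for c in text
    )

def brute_force(ciphertext):
    """Return every candidate plaintext, one per possible key."""
    return {key: shift(ciphertext, -key) for key in range(26)}

candidates = brute_force("KHOORZRUOG")
for key, guess in candidates.items():
    print(key, guess)

# A person (or a word list) then picks out the guess that reads as English.
assert candidates[3] == "HELLOWORLD"
```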

Frequency Analysis

Letters are not used equally often. In English, E is by far the most commonly used letter, followed by T, A, O, I, N, S, R, then H. To crack a cipher, you can check which character is repeated most in the target ciphertext, compare this with the frequencies in normal text, and so work out which letter it stands for. Once you discover just one character, a shift cipher is completely cracked, because the key can be found. Substitution ciphers are a little better but are still relatively easy to crack.
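Against a shift cipher, that reasoning can be sketched as follows. The function names and the message are made up, and the approach assumes that the most common ciphertext letter stands for E. That assumption can easily fail on short texts.

```python
from collections import Counter
import string

ALPHABET = string.ascii_uppercase

def shift(text, key):
    """Shift every letter A-Z by `key` places, leaving other characters."""
    return "".join(
        ALPHABET[(ALPHABET.index(c) + key) % 26] if c in ALPHABET else c
        for c in text
    )

def guess_key(ciphertext):
    """Guess the shift by assuming the most common letter stands for E."""
    counts = Counter(c for c in ciphertext if c in ALPHABET)
    most_common_letter = counts.most_common(1)[0][0]
    return (ALPHABET.index(most_common_letter) - ALPHABET.index("E")) % 26

plaintext = "MEET ME NEAR THE GREEN TREE"
ciphertext = shift(plaintext, 3)

key = guess_key(ciphertext)
print(key)
print(shift(ciphertext, -key))
```

One correctly identified letter is enough here because a shift cipher has a single key for the whole alphabet. A substitution cipher needs the same comparison repeated letter by letter.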

Here is an awesome playlist you can use to learn more!
