Compression & Encryption
Learn different ways of reducing file size and ways in which data can be kept secret.
Data Transfer & Storage
As you already know, data is constantly being moved between computers and across networks because interconnected networks are so widely used.
Data transfer and storage often cause problems. Data transfer needs to be fast and accurate, but over longer distances it is slower and more prone to interference. Likewise, data storage is limited in how much it can hold.
Compression
To help with storage and transfer, we can compress text, image and sound data. This can significantly reduce the size of the data, so it can be transferred faster over the internet and more of it can be stored on a computer.
Reducing file size brings the following benefits:
Data is sent more quickly
Less bandwidth is used, which matters where transfer limits apply
Buffering of audio/video streams is less likely to occur
Less storage is required
Types of Compression
Lossy: non-essential data is permanently removed, for example slightly different shades of the same colour in an image, or sound frequencies outside the range of human hearing.
Lossless: patterns in the data are spotted and used to summarise the data in a shorter format, without permanently removing any information.
Lossy Compression - JPEG
JPEG compression reduces file size by changing colour values and grouping blocks of pixels into a more uniform colour, so that fewer different colours need to be stored. Although this decreases the file size, it also alters the original image by changing its colours, which can have an adverse effect on image quality.
Lossy Compression - MP3
MP3 compression removes sounds at frequencies beyond the range of human hearing, so playback quality is not affected much. Quieter sounds played at the same time as louder sounds are also removed, because the louder sounds largely mask them.
Lossless Compression
Lossless compression records patterns in the data and stores those patterns instead of the whole data. Using pattern information reduces file size without losing any data. A disadvantage of lossless compression is that it usually reduces file size less than lossy compression does.
Two main types of lossless compression are run-length encoding (RLE) and dictionary-based compression. Both work best on files that contain lots of repeated data.
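To see both claims in practice (nothing is lost, and repeated data shrinks the most), here is a small sketch using Python's standard `zlib` module. `zlib` implements DEFLATE, which combines a dictionary-style method (LZ77) with Huffman coding, so it is a real-world relative of the techniques described here rather than a pure RLE or dictionary example. The test strings are arbitrary choices, and the exact compressed sizes will vary.

```python
import os
import zlib

# Highly repetitive data: the same 8-byte pattern repeated 125 times (1000 bytes).
repetitive = b"because " * 125
# Random data has almost no repeated patterns for the compressor to exploit.
random_bytes = os.urandom(1000)

for label, data in [("repetitive", repetitive), ("random", random_bytes)]:
    compressed = zlib.compress(data)
    # Lossless: decompressing gives back exactly the original bytes.
    assert zlib.decompress(compressed) == data
    print(f"{label}: {len(data)} bytes -> {len(compressed)} bytes")
```

The repetitive input typically shrinks to a few dozen bytes, while the random input does not shrink at all (it may even grow slightly).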
Run-Length Encoding
RLE is a basic compression method that summarises consecutive runs of the same data value. It works best with images and sounds in which the same value may be repeated consecutively many times.
As the image above shows, replacing each run of repeated pixels with one pixel value and a count of its repetitions reduces the storage space required to represent the image.
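The run-and-count idea can be sketched in a few lines of Python. This is a minimal illustration that stores each run as a `(value, count)` pair; the function names and the sample row of pixels (W = white, B = black) are hypothetical, and real image formats pack runs into bytes rather than Python tuples.

```python
def rle_encode(data):
    """Compress a sequence into a list of (value, run_length) pairs."""
    encoded = []
    for value in data:
        if encoded and encoded[-1][0] == value:
            # Same as the previous value: extend the current run.
            encoded[-1] = (value, encoded[-1][1] + 1)
        else:
            # A different value starts a new run of length 1.
            encoded.append((value, 1))
    return encoded


def rle_decode(encoded):
    """Expand (value, run_length) pairs back into the original list."""
    decoded = []
    for value, count in encoded:
        decoded.extend([value] * count)
    return decoded


# One row of pixels: W = white, B = black.
row = list("WWWWWWBBBWWWW")
packed = rle_encode(row)
print(packed)  # [('W', 6), ('B', 3), ('W', 4)]
assert rle_decode(packed) == row  # lossless: the original row is restored
```

Thirteen pixel values become three pairs. On data with few repeats, RLE can make the result larger, since every single value still needs its own count.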
Dictionary-based Compression
This type of compression spots regularly occurring data and stores it once, separately, in a dictionary. Each time that data occurs again, a short reference (key) to its dictionary entry is stored instead of the data itself.
A disadvantage of the dictionary method is that the dictionary must also be stored in the compressed file, which increases its size. This matters most when the data to be compressed is short, because the dictionary can then take up a large share of the file.
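A word-level version of this idea can be sketched in Python. This is a simplified illustration, not a specific real compression format: it assumes words are separated by single spaces and gives each new word the next unused integer key. The function names are hypothetical.

```python
def dict_compress(text):
    """Replace each word with an integer key into a dictionary of words."""
    dictionary = {}  # word -> key
    keys = []
    for word in text.split():
        if word not in dictionary:
            dictionary[word] = len(dictionary)  # next unused key
        keys.append(dictionary[word])
    return dictionary, keys


def dict_decompress(dictionary, keys):
    """Look each key back up in the dictionary to rebuild the text."""
    words_by_key = {key: word for word, key in dictionary.items()}
    return " ".join(words_by_key[key] for key in keys)


text = "the cat sat because the cat was tired because it sat"
dictionary, keys = dict_compress(text)
print(keys)  # [0, 1, 2, 3, 0, 1, 4, 5, 3, 6, 2]
assert dict_decompress(dictionary, keys) == text
```

Note that `dictionary` has to be kept alongside `keys`, since without it the keys cannot be decoded. That is exactly the storage overhead described above.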
Compressing large volumes
In a text document each letter could be stored as an ASCII code of 8 bits.
In such a document, the word "because" requires 56 bits of data (7 letters x 8 bits).
Instead, the word could be added to a dictionary and assigned the binary code 01, a saving of 54 bits (56 - 2) for each occurrence.
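The arithmetic can be extended to show why the dictionary only pays off once a word repeats. The sketch below assumes, for illustration only, that the dictionary entry costs the word's full 8-bit ASCII size plus its 2-bit code and is stored once. Real formats store their dictionaries differently.

```python
def bits_saved(word, occurrences, code_bits=2, bits_per_char=8):
    """Bits saved by replacing every occurrence of `word` with a short code.

    Assumption (illustrative only): the dictionary entry stores the word at
    `bits_per_char` bits per letter plus its `code_bits`-bit code, once.
    """
    word_bits = len(word) * bits_per_char
    original = word_bits * occurrences
    dictionary_entry = word_bits + code_bits
    compressed = code_bits * occurrences + dictionary_entry
    return original - compressed


print(bits_saved("because", 1))   # -4  (the dictionary costs more than it saves)
print(bits_saved("because", 10))  # 482
```

Each occurrence still saves 54 bits. With only one occurrence, however, the stored dictionary entry outweighs that saving.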
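| Lossy compression | Lossless compression |
| --- | --- |
| Some data is lost during the process | No loss of data |
| Quality of the data is reduced | No loss of quality |
| File size can be reduced much further, limited mainly by how much quality loss is acceptable | There is a limit to how much a file can be compressed |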
Encryption
Encryption is the process of systematically scrambling data so that it cannot be understood even if it is intercepted. This helps keep data secure during transmission.
Unencrypted data is referred to as plaintext, whilst encrypted data is called ciphertext. A cipher is a method of encryption. To decrypt ciphertext, you must know both the encryption method used and the key that was used to encrypt it.
Caesar Ciphers
Caesar ciphers encrypt information by replacing characters, and a given character is always replaced by the same character. There are two main types of Caesar cipher: shift ciphers and substitution ciphers.
Shift Ciphers
When encrypting with a shift cipher, every letter of the alphabet is shifted by the same amount. The amount by which the characters are shifted forms the key, which can be used to decrypt the ciphertext.
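A shift cipher can be sketched in Python as follows. This minimal version assumes uppercase English letters only, wraps from Z back to A, and leaves spaces and punctuation unchanged. The function names are hypothetical.

```python
import string

ALPHABET = string.ascii_uppercase  # 'A'..'Z'


def shift_encrypt(plaintext, key):
    """Shift each letter `key` places along the alphabet, wrapping Z -> A."""
    result = []
    for ch in plaintext.upper():
        if ch in ALPHABET:
            result.append(ALPHABET[(ALPHABET.index(ch) + key) % 26])
        else:
            result.append(ch)  # leave spaces and punctuation unchanged
    return "".join(result)


def shift_decrypt(ciphertext, key):
    """Decrypting is just shifting back by the same key."""
    return shift_encrypt(ciphertext, -key)


ciphertext = shift_encrypt("ATTACK AT DAWN", 3)
print(ciphertext)                    # DWWDFN DW GDZQ
print(shift_decrypt(ciphertext, 3))  # ATTACK AT DAWN
```

The `% 26` handles wrap-around in both directions, so decryption can reuse the encryption function with a negative key.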
Substitution Ciphers
In this type of cipher, each letter is replaced by a randomly chosen letter, with no fixed shift between them. This method is a bit better than a shift cipher, but it is still relatively easy to crack.
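A substitution cipher can be sketched by shuffling the alphabet to form the key. This is an illustration only. It uses Python's `random` module, which is not suitable for real security. The function names are hypothetical.

```python
import random
import string

ALPHABET = string.ascii_uppercase


def make_substitution_key():
    """Build a random one-to-one mapping from each letter to another letter."""
    shuffled = random.sample(ALPHABET, len(ALPHABET))  # a random permutation
    return dict(zip(ALPHABET, shuffled))


def substitute(text, key):
    """Replace each letter using `key`; other characters are left unchanged."""
    return "".join(key.get(ch, ch) for ch in text.upper())


key = make_substitution_key()
# Decryption uses the reverse mapping (ciphertext letter -> plaintext letter).
inverse_key = {cipher: plain for plain, cipher in key.items()}

ciphertext = substitute("ATTACK AT DAWN", key)
print(ciphertext)  # varies from run to run, because the key is random
assert substitute(ciphertext, inverse_key) == "ATTACK AT DAWN"
```

The doubled "TT" in "ATTACK" always becomes a doubled ciphertext letter. Patterns like this are what make substitution ciphers crackable.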
Vernam Cipher
The Vernam cipher is an example of a one-time pad cipher, which means that each key should only ever be used once. The Vernam cipher also requires the key to be random and at least as long as the plaintext that is to be encrypted.
How the Vernam Cipher Works
The Vernam cipher works by aligning the plaintext with the key and converting each character to binary (for example, using its ASCII code). The XOR logical operation is then applied to the two bit patterns, and the result is converted back to a character.
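The XOR step can be sketched in Python on ASCII bytes. This is a minimal illustration. It uses the standard `secrets` module to generate the key, which is a software source and not the kind of physical randomness a genuine one-time pad calls for. XOR output is often not a printable character, so the ciphertext is shown as hex. The `vernam` function name is hypothetical.

```python
import secrets


def vernam(data: bytes, key: bytes) -> bytes:
    """XOR each byte of `data` with the matching byte of `key`."""
    if len(key) < len(data):
        raise ValueError("key must be at least as long as the data")
    return bytes(d ^ k for d, k in zip(data, key))


plaintext = "HELLO".encode("ascii")        # characters -> ASCII bytes
key = secrets.token_bytes(len(plaintext))  # one random key byte per character

ciphertext = vernam(plaintext, key)
recovered = vernam(ciphertext, key)        # XOR with the same key undoes it
print(ciphertext.hex())
print(recovered.decode("ascii"))           # HELLO
```

For a single character: H is 01001000 in ASCII. XOR with the key byte 00110101 gives 01111101 (125, the character `}`). XOR-ing 01111101 with the same key byte gives 01001000 back. The same function therefore both encrypts and decrypts.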
Conditions for good security
To make sure that your Vernam cipher messages cannot be decoded, you must follow certain security rules: use a truly random key that is at least as long as the message; discard the key after every use so that each key is only used once; and share the key directly with the recipient, by hand, so that no one else knows it.
One-time pads
The one-time pad must be truly random, generated from a physical and unpredictable phenomenon.
Sources may include atmospheric noise, radioactive decay, the movements of a mouse or snapshots of a lava lamp.
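A truly random key makes frequency analysis useless, because every character in the ciphertext would be equally likely (a uniform distribution).
Computer-generated "random" sequences are not actually random: they are pseudorandom, produced by a deterministic algorithm from a starting value (a seed).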
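This can be demonstrated with Python's `random` module. Two generators given the same seed produce the same "random" numbers. The seed value 2024 is an arbitrary choice. The `secrets` module, which draws on the operating system's cryptographically secure source, is shown for contrast. It is far harder to predict than a seeded generator, although whether it counts as "truly random" in the physical sense described above is a separate question.

```python
import random
import secrets

# Two generators started from the same seed produce identical sequences,
# because a pseudorandom generator is a deterministic algorithm.
first = random.Random(2024)
second = random.Random(2024)
seq1 = [first.randint(0, 255) for _ in range(5)]
seq2 = [second.randint(0, 255) for _ in range(5)]
print(seq1 == seq2)  # True

# For keys, `secrets` uses the operating system's secure randomness
# instead of a seeded generator.
key = secrets.token_bytes(16)
print(len(key))  # 16
```

Anyone who learns the seed of a seeded generator can regenerate every "random" key it produced.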
Cracking Ciphers
All ciphers other than the Vernam cipher are, in theory, crackable. Caesar ciphers can easily be cracked via methods such as brute forcing and frequency analysis.
Brute Force
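A brute-force attack tries every possible key until readable plaintext appears. For a shift cipher there are only 25 possible shifts, so this is quick. Spaces are often removed from ciphertext so that word lengths give no clues, which makes decryption harder.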
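A brute-force attack on a shift cipher can be sketched in Python by simply trying all 25 shifts. This is an illustration with hypothetical function names. A person, or a word-list check, then picks out the candidate that reads as English. The example ciphertext has its spaces removed.

```python
import string

ALPHABET = string.ascii_uppercase


def shift_decrypt(ciphertext, key):
    """Shift each letter back by `key` places, wrapping A -> Z."""
    return "".join(
        ALPHABET[(ALPHABET.index(ch) - key) % 26] if ch in ALPHABET else ch
        for ch in ciphertext
    )


def brute_force(ciphertext):
    """Return every possible decryption, one per shift key 1..25."""
    return {key: shift_decrypt(ciphertext, key) for key in range(1, 26)}


for key, guess in brute_force("DWWDFNDWGDZQ").items():
    print(key, guess)
# Scanning the output, key 3 gives ATTACKATDAWN.
```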
Frequency Analysis
Letters are not used equally often. In English, E is by far the most common letter, followed by T, A, O, I, N, S, H and R. To crack a cipher, you can find the character that is repeated most often in the ciphertext and compare it with letter frequencies in normal text, for example by assuming that it stands for E. Once you discover just one character, a shift cipher can be cracked completely, because the size of the shift (the key) can then be worked out. Substitution ciphers are a little stronger but are still relatively easy to crack.
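For a shift cipher, this reasoning can be sketched in Python: count the letters, assume the most common one stands for E, and compute the shift. This is a simplified illustration with hypothetical function names. It only works reliably on longer texts, because a short message may have a different most common letter.

```python
from collections import Counter
import string

ALPHABET = string.ascii_uppercase


def guess_shift(ciphertext):
    """Guess a shift key by assuming the most common letter stands for E."""
    letters = [ch for ch in ciphertext.upper() if ch in ALPHABET]
    most_common, _ = Counter(letters).most_common(1)[0]
    return (ALPHABET.index(most_common) - ALPHABET.index("E")) % 26


def shift_decrypt(ciphertext, key):
    """Shift each letter back by `key` places, wrapping A -> Z."""
    return "".join(
        ALPHABET[(ALPHABET.index(ch) - key) % 26] if ch in ALPHABET else ch
        for ch in ciphertext.upper()
    )


# "MEET ME HERE" encrypted with a shift of 4, spaces removed.
ciphertext = "QIIXQILIVI"
key = guess_shift(ciphertext)
print(key)                             # 4
print(shift_decrypt(ciphertext, key))  # MEETMEHERE
```

I appears five times in the ciphertext. Mapping I back to E gives a shift of 4, which decrypts the whole message.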