Steganography in Principle
Written by Emmanuel Sodipo
1.0 Introduction
Steganography is one of the oldest arts, one that people have sought since they first began communicating with each other, yet it remains among the least researched. Most people study steganography either as an academic discipline or out of curiosity, and I belong to the latter camp. Although steganography is used in military and commercial circles, the level of application and understanding is very low.
The terms steganography and cryptography are both derived from Greek. The prefix crypto comes from the Greek word kryptos, which means hidden or secret. The suffix graphy is derived from graphia, which means writing. Cryptography is essentially the art of secret writing, and its goal is to maintain the secrecy of a message even when the message itself is visible. Steganography is also a form of writing (concealed writing): the Greek word steganos means covered or hidden. Although closely related to cryptography, steganography is different, and it should be seen not as a replacement for cryptography but as a complement to it. The goal of cryptography is to conceal the content of a message, while the goal of steganography is to conceal the existence of a message. The two techniques can be combined effectively by encrypting the secret message before embedding it in cover data. Concealing the transmission of encrypted messages enhances their overall security, since outsiders are unaware of the communication.
Encrypted data can attract the attention of hackers and investigators through its mere existence; if it is concealed, however, no attempt will be made to break the code or to obtain the secret key. Steganographic methods primarily use image or audio files to hide encrypted data. Such techniques typically conceal information in the least significant bits of the carrier medium, which serves as a hiding place. It is important that the carrier medium does not perceptibly change as a result of the embedding process.
Another technique similar to steganography is watermarking. The goal of watermarking is to tie an image or sound file to its owner by making subtle modifications to the file. These modifications should be unnoticeable yet very robust: nobody should be able to remove an existing mark, or to mark an already marked file as belonging to himself. This technique is of great interest to the entertainment industry because it offers an efficient way to determine whether a file was illegally downloaded from the web or rightfully purchased.
A good steganographic system should fulfil the same requirement that Kerckhoffs's principle poses in cryptography: the security of a system should not rely on its method of operation being unknown to the enemy, but rather on the choice of a secret key.
1.1 Background
In recent years there has been an exciting convergence of information protection technologies, with the main emphasis on information hiding as opposed to encryption. Two big policy issues, copyright protection and state surveillance, motivated this development. The more information is placed on the Internet or in public media, the more its owners need to protect themselves from theft and abuse. The entertainment industry is particularly nervous because of the ease with which exact copies of digital music and video can be made. The way forward is to embrace advanced technology to protect investment rather than to oppose it. Part of the solution may be a change in the way music and video are sold; one mechanism is copyright marking (hiding notices and serial numbers in a way that is difficult for pirates to remove). Systems and techniques that can uncover hidden information will be useful in computer forensics and digital traffic analysis, and understanding the limitations of current techniques can help in developing more robust ones. The principal focus is hiding information, or at least stopping other people from hiding information.
2.0 Steganography
Steganographic techniques were used in both World War I and World War II, when chemicals were developed and used as secret inks that became visible when brought into contact with other chemicals. The practice is much older, however, and a brief history of steganography gives valuable background.
2.1 History
The Greek historian Herodotus recorded the earliest known instances of steganography. When Histiaeus had to send a secret message to his son-in-law, he shaved the head of a slave and tattooed the message on it; he then waited until the hair had grown back before dispatching the slave, in order to avoid detection. In another account, Demaratus scraped the wax off writing tablets and wrote a message on the underlying wood; he then covered the wood with wax again to conceal the message. The tablets appeared blank and unused when inspected.
Invisible ink has always been a popular method of steganography. Ancient Romans wrote between the lines using invisible inks made from substances such as milk, urine and fruit juices. When heated, the invisible ink darkened and became legible.
Gaspar Schott wrote one of the earliest books on steganography, Schola Steganographica, in 1665. A major development in the field occurred in 1883 with the publication of Auguste Kerckhoffs's La Cryptographie Militaire. Although the work was mostly on cryptography, it provides valuable principles for the design of new steganographic systems [SEL03].
2.2 Steganography In Principle
Bruce Schneier describes steganography as follows: "Steganography serves to hide secret messages in other messages, such that the secret’s very existence is concealed" [SCH96]. Another basic definition would simply be the act of hidden communication. Whatever definition you find suitable, the fundamental principle is the same. The message is the information to be hidden and may be text, an image, audio or anything else that can be embedded into a bitstream. The cover is the medium in which the message is hidden. The cover and the embedded message together form the stego-medium (also called the stego-carrier), and the process may require a stegokey. The stegokey is additional secret information such as a password. A possible formula for the process is as follows:
Cover medium + embedded message + stegokey = stego-medium
Hiding information in electronic media requires alterations to the media properties, which may introduce some form of degradation. This degradation can sometimes be visible and point to the signatures of the steganographic methods and tools. These signatures may actually broadcast the existence of the embedded message thereby defeating the purpose of steganography.
A steganographic system is considered broken:
If the attacker can detect the use of steganography.
If the attacker can read the embedded message.
Traditional cryptography succeeds by locking up messages in a mathematical safe, whereas steganography offers stealth and exploits the randomness of bits. The possible techniques are as follows:
Noise: The simplest technique is to replace the noise in a sound or image file with the message. For example, one spot in a picture may have 220 units of pink on a scale of 0 to 255. The average eye would not notice if that one spot were changed to 219 units of pink. Done systematically, this makes it possible to hide volumes of information below the threshold of perception.
Spread information: Spreading the information increases its resilience to destruction. The algorithm distributes the information in such a way that not all of the bits are required to reassemble the original data. Data usually falls into patterns, and observing those patterns makes it possible to exploit the decision processes of computers.
Randomness: Information can be hidden in place of the random bits. A few algorithms allow the broadcast of information without revealing its identity [WAY02].
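The noise technique above can be illustrated with a minimal sketch of least-significant-bit replacement. It assumes the cover is simply a list of 8-bit colour values; the function names embed_lsb and extract_lsb and the sample values are made up for illustration and do not come from any particular tool. Each cover value changes by at most one unit, in line with the 220-to-219 example.

```python
def embed_lsb(samples, message):
    """Return a copy of `samples` with the message bits written into the LSBs."""
    # Expand the message into bits, most significant bit of each byte first.
    bits = [(byte >> shift) & 1 for byte in message for shift in range(7, -1, -1)]
    if len(bits) > len(samples):
        raise ValueError("cover is too small for the message")
    stego = list(samples)
    for i, bit in enumerate(bits):
        stego[i] = (stego[i] & ~1) | bit  # clear the LSB, then set it to the message bit
    return stego


def extract_lsb(samples, n_bytes):
    """Read `n_bytes` bytes back out of the LSBs of the first 8 * n_bytes samples."""
    out = bytearray()
    for i in range(n_bytes):
        byte = 0
        for sample in samples[8 * i:8 * i + 8]:
            byte = (byte << 1) | (sample & 1)
        out.append(byte)
    return bytes(out)


cover = [220, 219, 200, 13, 255, 0, 128, 77] * 4  # 32 made-up colour values
stego = embed_lsb(cover, b"Hi")
recovered = extract_lsb(stego, 2)
```

The sketch writes the message into the first samples of the cover in plain order, which is the weakest possible choice of positions; it is meant only to show how little each value changes.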
2.3 Stegosystem
The steganographic system, referred to as the stegosystem, defines all the relationships between the data and the processes involved.
2.4 Private And Public Key Steganography
In order to describe private-key and public-key steganography effectively, it is important to understand the prisoners’ problem. The prisoners’ problem, first proposed by G. J. Simmons in 1983, is considered the standard model for covert communication. In this problem, two individuals attempt to communicate covertly without alerting a warden who controls the communication channel. One assumption is that the participants are allowed to share some secret information (such as an encryption key) prior to imprisonment. The other assumption makes the problem more difficult: the warden is allowed to read and modify the messages sent between the prisoners.
2.4.1 Private-Key Steganography
In this scenario we assume that Alice and Bob are allowed to share a secret key prior to imprisonment. This gives them the opportunity to communicate covertly and to defeat an active warden (Wendy). In the discussion so far, steganography has simply meant encrypting a message in such a way that the ciphertext appears random, and then embedding the bits of the message in a known subliminal channel.
In the presence of an active warden, it is not enough to embed a message in a known place. If Alice can alter the bits in an image, then the warden can do the same, thereby destroying any message sent through the subliminal channel. Instead, a cryptographically secure pseudo-random generator, seeded by the secret key, can be used to pick a subset of the pixels in an image in which to conceal the data. If Wendy makes changes to the image, she will scramble only a small percentage of the channel bits, since she does not know where they are. The scrambling can be corrected with an error-correcting code. Sharing keys before imprisonment gives Alice and Bob a lot of freedom, and a digital signature on the secret message provides additional security by preventing impersonation. In real life, however, having to exchange keys far in advance of any covert communication is somewhat impractical.
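The idea of keyed position selection combined with error correction can be sketched as follows. This is only an illustration under several assumptions: the cover is a list of 8-bit values, positions are ordered by an HMAC-SHA256 tag of their index (standing in for the cryptographically secure generator), the error-correcting code is a plain five-fold repetition code, and the key, function names and warden model are all hypothetical.

```python
import hashlib
import hmac
import random

REPEAT = 5  # each message bit is stored REPEAT times (a simple repetition code)


def keyed_positions(key, n_cover, n_needed):
    """Pick n_needed distinct cover positions in an order determined by the key."""
    def tag(i):
        return hmac.new(key, i.to_bytes(4, "big"), hashlib.sha256).digest()
    return sorted(range(n_cover), key=tag)[:n_needed]


def embed(cover, bits, key):
    """Write every bit REPEAT times into the LSBs of key-selected positions."""
    if len(bits) * REPEAT > len(cover):
        raise ValueError("cover is too small for the message")
    stego = list(cover)
    for j, pos in enumerate(keyed_positions(key, len(cover), len(bits) * REPEAT)):
        stego[pos] = (stego[pos] & ~1) | bits[j // REPEAT]
    return stego


def extract(stego, n_bits, key):
    """Recover each bit by majority vote over its REPEAT copies."""
    positions = keyed_positions(key, len(stego), n_bits * REPEAT)
    votes = [stego[p] & 1 for p in positions]
    return [int(sum(votes[i * REPEAT:(i + 1) * REPEAT]) > REPEAT // 2)
            for i in range(n_bits)]


def warden_attack(stego, n_flips, seed=0):
    """Model an active warden who flips the LSB of n_flips positions she picks blindly."""
    rng = random.Random(seed)
    attacked = list(stego)
    for pos in rng.sample(range(len(attacked)), n_flips):
        attacked[pos] ^= 1
    return attacked


key = b"shared before imprisonment"  # hypothetical shared secret
cover = [i % 256 for i in range(2048)]
secret_bits = [1, 0, 1, 1, 0, 0, 1, 0]
stego = embed(cover, secret_bits, key)
attacked = warden_attack(stego, n_flips=2)
recovered = extract(attacked, len(secret_bits), key)
```

With five copies per bit, the majority vote survives as long as fewer than three copies of any one bit are flipped, so the two flips in this run cannot corrupt the message wherever they land. A practical system would use a stronger code than simple repetition.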
2.4.2 Public-Key Steganography
In this approach the secret key does not have to be agreed between Alice and Bob before imprisonment, but one must know the public key of the other. This is a more practical requirement in the real world.
1. Alice knows Bob’s public key and uses it to encrypt the message, obtaining the ciphertext C.
2. Alice embeds C in a channel known to Bob (and also known to Wendy) and sends the stego-medium to Bob.
3. Bob has no prior knowledge that a message was hidden in the channel; if he suspects one, he extracts it and attempts to decrypt it with his private key.
The problem with this approach is that Bob has to suspect a hidden message in every object he receives. This is not a serious problem if we assume that the steganographic technique is known to all and that the embedded data can be easily extracted. A related and more practical problem arises when a large group of recipients is involved, with everyone having to suspect hidden content that is intended for only one recipient.
2.5 Steganographic Methods
The task of embedding a secret message can be performed by a combination of various techniques. Most steganographic programs follow these steps:
Finding the Redundant Bits.
Choosing the Cover Bits.
Embedding the Data.
2.5.1 Finding the redundant bits: Most programs assume that the least significant bits are redundant and can be replaced without analysing the cover object. A more successful technique is to embed data in all regions of an object that are not informative. To determine these regions, the image is split into single bit planes and analysed. Every 8×8 block of each bit plane is tested against a threshold; in blocks that score above the threshold, a secret message can be inserted without significantly altering the cover object.
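The text does not say which test is applied to each block. One plausible choice, assumed here purely for illustration, is a border-complexity measure: the fraction of adjacent pixel pairs inside a block whose bits differ. The sketch below splits an image (a 2-D list of 8-bit values) into bit planes and lists the 8×8 blocks whose complexity exceeds a threshold; the threshold of 0.3 and all function names are hypothetical.

```python
def bit_plane(image, k):
    """Extract bit plane k (0 = least significant) from a 2-D list of 8-bit pixels."""
    return [[(p >> k) & 1 for p in row] for row in image]


def block_complexity(plane, top, left, size=8):
    """Fraction of horizontally or vertically adjacent pairs in the block that differ."""
    changes = 0
    for r in range(top, top + size):
        for c in range(left, left + size):
            if c + 1 < left + size and plane[r][c] != plane[r][c + 1]:
                changes += 1
            if r + 1 < top + size and plane[r][c] != plane[r + 1][c]:
                changes += 1
    return changes / (2 * size * (size - 1))  # 112 adjacent pairs in an 8x8 block


def noisy_blocks(plane, threshold=0.3, size=8):
    """Return (top, left) corners of blocks complex enough to be treated as noise."""
    rows, cols = len(plane), len(plane[0])
    return [(top, left)
            for top in range(0, rows - size + 1, size)
            for left in range(0, cols - size + 1, size)
            if block_complexity(plane, top, left, size) > threshold]


# Left block: flat colour. Right block: values alternating between 200 and 201.
image = [[200] * 8 + [200 + ((r + c) & 1) for c in range(8)] for r in range(8)]
candidates = noisy_blocks(bit_plane(image, 0))
```

In this toy image only the right-hand block of the lowest bit plane is reported, because the flat left-hand block has no bit transitions at all.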
2.5.2 Choosing the cover bits: The number of bits required to embed a secret message is usually not equal to the number of redundant bits, so it is necessary to choose a subset of the redundant bits.
Most programs embed the message in the first n redundant bits at the beginning of the cover file and leave the rest of the file untouched; this approach exposes the modified object to visual attack. A pseudo-random permutation ensures that all cover bits are chosen with the same probability, spreading the message evenly among the redundant cover bits. This technique depends on a secret key that determines the positions of the selected cover bits; it also reduces exposure to visual and statistical attacks.
2.5.3 Embedding the data: There are several techniques for embedding secret messages, but the most common methods overwrite the cover bits with the encrypted secret message. Although this technique provides a large capacity by embedding one bit of the secret message in each cover bit, it can be detected by visual and statistical attacks.
Matrix encoding enables more than two secret bits to be embedded for every change made to the cover. In the simplest case, two secret bits are embedded in three cover bits by changing at most one of them: the first secret bit is compared with the parity (XOR) of the first and third cover bits, and the second secret bit with the parity of the second and third cover bits. If only the first comparison fails, the first cover bit is flipped; if only the second fails, the second cover bit is flipped; if both fail, the third cover bit is flipped. The drawback is that an increased embedding efficiency reduces the capacity of the cover file [HET02].
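The simplest form of matrix encoding, in which two secret bits are carried by three cover bits with at most one change, can be sketched as below. The parities used are those of the first and third cover bits and of the second and third cover bits; the function names are chosen for illustration only.

```python
from itertools import product


def matrix_embed(a1, a2, a3, x1, x2):
    """Embed secret bits x1, x2 into cover bits a1, a2, a3, changing at most one of them."""
    mismatch1 = x1 != (a1 ^ a3)
    mismatch2 = x2 != (a2 ^ a3)
    if mismatch1 and mismatch2:
        a3 ^= 1  # flipping a3 corrects both parities at once
    elif mismatch1:
        a1 ^= 1  # only the first parity is wrong
    elif mismatch2:
        a2 ^= 1  # only the second parity is wrong
    return a1, a2, a3


def matrix_extract(a1, a2, a3):
    """Recover the two secret bits from the parities of the cover bits."""
    return a1 ^ a3, a2 ^ a3


# Count how many cover bits change over all 32 combinations of cover and secret bits.
total_changes = 0
for a1, a2, a3, x1, x2 in product((0, 1), repeat=5):
    stego = matrix_embed(a1, a2, a3, x1, x2)
    total_changes += sum(s != a for s, a in zip(stego, (a1, a2, a3)))
average_changes = total_changes / 32
```

In one case out of four no change is needed, so on average 0.75 cover bits are changed for every two secret bits, compared with one change on average when two bits are written by simple overwriting. The price is capacity: three cover bits are consumed to carry two secret bits.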
2.6 Attacks on Steganography
Two aspects of attacks on steganography are the detection and the destruction of the embedded message. Any object can be manipulated with the intent of destroying hidden information, whether an embedded message exists or not [JOJ98].
Attacking a steganographic algorithm is very similar to attacking a cryptographic algorithm, and similar techniques apply. If the original, unmodified file used as a cover by the stegosystem is available to an attacker or investigator, a bit-by-bit comparison with the suspect version is all that is needed to establish the presence of steganographic content. That is why publicly available files (sound files from a CD or images from the Internet) should never be used as a cover. The strength of a steganographic algorithm depends on its ability to withstand attacks.
A few of the possible attacks are as follows:
File Only: The attacker has access to the file and must determine whether there is a message hidden inside. This is the weakest form of attack, but resisting it is also the minimum threshold for successful steganography. A file-only attack relies on statistical analysis to reveal the presence of a message in a file.
File and Original Copy: In some cases the attacker may have a copy of the file with the encoded message and a copy of the original. If the two files are different, there must be some hidden information inside. The attacker can simply replace the modified file with the original to destroy the hidden information.
Reformat Attack: One possible attack is to change the format of the file. This can work because different file formats store data in different ways.
Compression Attack: One of the simplest forms of attack is to compress the file. Compression algorithms try to remove extraneous information from a file. A good example is JPEG, where the stored image is not an exact copy but rather an approximation of the original.
Another possible attack is simply to destroy the message, or to encode a new message over it if you have access to the algorithm. A message with hidden information may be detectable, but this only becomes an issue if someone is trying to detect it. Detecting hidden information first saves time during message elimination, because only the messages found to contain hidden information need to be processed.
2.7 Steganalysis
Steganalysis is the art of discovering hidden data in covert messages. As in cryptanalysis, we assume that the steganographic method is publicly known, with the exception of a secret key. A more practical definition of steganalysis, by Neil Johnson, is "the art of discovering and rendering useless such covert messages". Identifying the existence of a hidden message is often enough for an attacker: the messages are often fragile, and the attacker can destroy a message without reading it. A steganalyst is one who applies steganalysis in an attempt to detect the existence of hidden information.
There are three basic approaches to a successful attack:
Visual or Aural attack.
Structural attack.
Statistical attack.
2.7.1 Visual Attack
The visual attack is a stego-only attack that strips away part of the object in a way that allows a human to search for visual anomalies. The most common attack is to display the least significant bits of an object. Digital equipment such as cameras and scanners is not perfect and often leaves echoes of the image in the least significant bits; regions where these echoes give way to completely random noise indicate the existence of a hidden message. The aural equivalent relies on the fact that the average ear can pick up subtle differences in sound. However, this is a very slow and costly attack.
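Displaying the least significant bits can be sketched in a few lines. The example assumes a greyscale image held as a 2-D list of 8-bit values and renders the lowest bit plane as text, with "#" for a 1 bit and "." for a 0 bit; the toy cover image and the function names are made up for illustration.

```python
def lsb_plane(image):
    """Keep only the least significant bit of every pixel."""
    return [[p & 1 for p in row] for row in image]


def render(plane):
    """Draw a bit plane as text so that a human can look for anomalies."""
    return "\n".join("".join("#" if b else "." for b in row) for row in plane)


# A toy cover whose least significant bits are all 0 (every value is even).
cover = [[200 + 2 * (c // 4) for c in range(16)] for r in range(4)]

# Overwrite the LSBs of the first row with the 16 bits of the two bytes "Hi".
message_bits = [int(b) for b in "0100100001101001"]
stego = [row[:] for row in cover]
stego[0] = [(p & ~1) | b for p, b in zip(stego[0], message_bits)]

print(render(lsb_plane(cover)))
print()
print(render(lsb_plane(stego)))
```

In the second picture the first row stands out as noise against an otherwise uniform plane, which is the kind of anomaly a human inspector looks for. Real images have far less tidy bit planes, so the contrast is rarely this stark.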
2.7.2 Structural Attack
Steganographic algorithms often leave behind a characteristic structure in the data. The format of the data file is often different when information is embedded. The attacker may detect the presence of a message by examining the statistical profile of the bits. These changes to the data file usually fall into easily detectable patterns that give an indication of a hidden message.
2.7.3 Statistical Attack
The statistical attack is similar to the visual attack. Most programs rely on the assumption that the least significant bits of a cover file are random and can therefore be overwritten with a secret message, but this assumption is not necessarily true. The idea of the statistical attack is to compare the frequency distribution observed in a suspect file with the distribution theoretically expected of a file that carries a message. If the data does not have the statistical profile that unmodified data is expected to have, then it probably contains a hidden message.
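One common way to make this comparison concrete is a chi-square test on pairs of values that differ only in their least significant bit (2k and 2k+1). Overwriting least significant bits with message bits tends to equalise the two counts in each pair, so a statistic close to zero is suspicious. The sketch below is a simplified illustration of that idea, assuming 8-bit samples; it returns only the statistic and the degrees of freedom, because turning them into a probability needs a chi-square distribution function that the Python standard library does not provide.

```python
from collections import Counter


def chi_square_pov(samples):
    """Chi-square statistic comparing the count of each even value 2k with the
    mean of the counts of 2k and 2k+1, which is what LSB overwriting would produce."""
    hist = Counter(samples)
    stat = 0.0
    pairs_used = 0
    for k in range(0, 256, 2):
        even, odd = hist.get(k, 0), hist.get(k + 1, 0)
        expected = (even + odd) / 2
        if expected > 0:
            stat += (even - expected) ** 2 / expected
            pairs_used += 1
    return stat, max(pairs_used - 1, 0)  # (statistic, degrees of freedom)


# Made-up data: a cover that only uses even values, and a version whose
# least significant bits have been filled evenly with 0s and 1s.
clean = [10] * 50 + [20] * 50
suspect = [10, 11] * 25 + [20, 21] * 25

clean_stat, _ = chi_square_pov(clean)
suspect_stat, _ = chi_square_pov(suspect)
```

Here the clean data yields a large statistic and the suspect data a statistic of zero. On real files the test is usually applied to a growing prefix of the data, so that the point at which the embedded message ends becomes visible.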
2.8 Algorithms
A secure steganographic algorithm should hide a message within other, more extensive data. Many steganographic algorithms are weak against visual and statistical attacks, and those without these weaknesses allow only a small steganographic message. For instance, MP3Stego withstands auditory attacks and all known statistical attacks because of its low embedding rate, but it offers the message less than 1% of the capacity of the modified medium, while the F5 algorithm withstands both visual and statistical attacks and still offers a high steganographic capacity by improving the efficiency of the embedding process. In the following sections we consider a few of these algorithms.
2.8.1 Jsteg: Derek Upham developed this algorithm for JPEG images; the JPEG standard uses lossy encoding to compress its data. The encoding process is split into lossy and non-lossy stages: the lossy (see glossary) stages use a discrete cosine transform and a quantization step to compress the image data, and the non-lossy stage uses Huffman coding to compress the image data further. The algorithm is resistant to visual attack and still offers a high capacity for the steganographic message: about 12.8% of the modified medium. The embedding mechanism skips all coefficients with the values 0 and 1 and replaces the least significant bits of the remaining frequency coefficients with the secret message. However, because Jsteg replaces bits, it is exposed to statistical attack. The effectiveness of this technique is reasonable, but not astounding [WES01].
2.8.2 Snow: Matthew Kwan developed this algorithm, which exploits the steganographic nature of whitespace. It allows messages to be hidden in ASCII text without affecting the visual appearance. Since spaces and tabs occur naturally, their existence should not alert an observer. Snow uses a Huffman encoding scheme for compression, and its built-in encryption algorithm is a 64-bit block cipher called ICE. The algorithm is inefficient because it runs the cipher in 1-bit cipher feedback (CFB) mode, but this provides good security because different messages can be encrypted with the same password. CFB makes use of the first 64 bits of the key as the initialisation vector (IV), but the key is encrypted. The program runs in two modes: message concealment and message extraction [KWA01].
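The embedding rule described above (skip coefficients equal to 0 or 1, overwrite the least significant bit of the others) can be sketched on a plain list of integers standing in for quantized DCT coefficients. This is an illustrative sketch only, not Jsteg's source code: the coefficient values are made up, the function names are hypothetical, and negative coefficients are handled with Python's two's-complement bit operations, which may differ from what the original program does.

```python
def jsteg_embed(coeffs, bits):
    """Write the message bits into the LSBs of all coefficients other than 0 and 1."""
    usable = [i for i, c in enumerate(coeffs) if c not in (0, 1)]
    if len(bits) > len(usable):
        raise ValueError("not enough usable coefficients for the message")
    out = list(coeffs)
    for i, bit in zip(usable, bits):
        out[i] = (out[i] & ~1) | bit  # replace the least significant bit
    return out


def jsteg_extract(coeffs, n_bits):
    """Read the LSBs of the first n_bits coefficients that are neither 0 nor 1."""
    usable = [c for c in coeffs if c not in (0, 1)]
    return [c & 1 for c in usable[:n_bits]]


coeffs = [12, 0, -3, 1, 5, 0, 0, 2, -1, 7]  # made-up quantized coefficients
bits = [1, 0, 1, 1, 0, 1]
stego = jsteg_embed(coeffs, bits)
recovered = jsteg_extract(stego, len(bits))
```

Because replacing the least significant bit can never turn another value into 0 or 1, the extractor skips exactly the same coefficients as the embedder and the two stay in step. The same bit replacement is what pairs up neighbouring values and exposes the method to the statistical attack.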
Message concealment:
Message -> optional compression -> optional encryption -> text concealment
Message extraction reverses the process:
Extracted data from text -> optional decryption -> optional decompression -> Message
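The text-concealment step at the end of this pipeline can be illustrated with a deliberately simplified whitespace scheme. It is not Snow's actual encoding, and the compression and ICE encryption stages are left out: the sketch simply appends eight trailing characters to a line for each message byte, using a space for a 0 bit and a tab for a 1 bit. The function names and the cover text are made up.

```python
BITS_PER_LINE = 8  # one message byte is hidden at the end of each cover line


def conceal(cover_text, message):
    """Append each message byte to a cover line as trailing spaces (0) and tabs (1)."""
    lines = [line.rstrip(" \t") for line in cover_text.split("\n")]
    if len(message) > len(lines):
        raise ValueError("cover text has too few lines for the message")
    for i, byte in enumerate(message):
        trail = "".join("\t" if (byte >> shift) & 1 else " "
                        for shift in range(BITS_PER_LINE - 1, -1, -1))
        lines[i] += trail
    return "\n".join(lines)


def extract(stego_text):
    """Rebuild the message from lines that end in exactly BITS_PER_LINE whitespace characters."""
    out = bytearray()
    for line in stego_text.split("\n"):
        trail = line[len(line.rstrip(" \t")):]
        if len(trail) == BITS_PER_LINE:
            out.append(int("".join("1" if ch == "\t" else "0" for ch in trail), 2))
    return bytes(out)


cover_text = "Dear Bob,\nthe weather is fine.\nSee you soon.\nAlice"
stego_text = conceal(cover_text, b"OK")
recovered = extract(stego_text)
```

Most viewers do not show the added characters, but any editor or mail system that strips trailing whitespace destroys the message.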
2.8.3 F5: A newer algorithm developed by Pfitzmann and Westfeld, F5 is a secure, high-capacity steganographic algorithm for JPEG images. F5 implements matrix encoding to improve the efficiency of the embedding process: if the full capacity of the steganogram (the modified medium) is not used, matrix encoding decreases the number of necessary changes. F5 also employs permutative straddling to spread the changes uniformly over the whole steganogram. The F5 algorithm accepts six inputs:
Quality factor of the stego-image.
Input files (TIFF, BMP, JPEG or GIF).
Output file name.
File containing the secret message.
User password to be used as a seed for PRNG.
Comment to be inserted in the header.
F5 withstands statistical and visual attacks and still offers a large steganographic capacity [FMH02]. The major drawback of this algorithm is that it only works on JPEG files.
2.8.4 MP3Stego: Created by Fabien A. P. Petitcolas, MP3Stego hides information in MP3 files during the compression process. MP3 is an effective steganographic medium because it employs a lossy compression algorithm. The data is first compressed and encrypted, and then hidden in the MP3 bit stream. MP3Stego uses 3DES encryption and the SHA-1 hash: 3DES and a passphrase are used to protect the hidden data payload, and SHA-1 is employed to generate pseudo-random bits for use in the hiding process. MP3Stego can also be used as a copyright marking system for MP3 files. The algorithm withstands auditory attacks and all known statistical attacks but offers low steganographic capacity.
2.9 Conclusion
Digital steganography and its derivatives are growing in use and application. The majority of steganographic algorithms suffer from fundamental weaknesses, and many of the older steganographic programs leave behind statistical anomalies that can be detected by steganalysis.
The embedding techniques must not cause significant changes to the properties of the cover data such that the use of steganography is perceptible.
The development of attacks is necessary to assess security; the most common are statistical and visual attacks. Statistical tests are superior to visual attacks because they are less dependent on the cover, which allows them to be automated and deployed on a large scale.