Stefan's Vigenere Cypher Challenge

and some C programmes to crack arbitrary cyphertext

STOP! If you're a Windows script kiddie with little idea of or interest in Linux, this is not the page for you!

These programmes are written to run on a Linux / Unix system. I have no interest in Windows systems. I will ignore all requests to email you a Windows binary.

If you're a Linux / Unix user who has genuinely tried to solve your own problem but still has questions, I'd be really delighted to respond and help you. Likewise, if you develop a Windows equivalent of what I've done here, I'd be happy to hear from you and add a reference to your work onto my pages.

Some time ago, I read a popular book on the history of encryption and code breaking called The Code Book by Simon Singh. I really enjoyed reading a little about the history of the Enigma machine during World War 2.

One day, a friend of my daughter Erin set a puzzle for her to decrypt a short piece of text. When Erin asked for help, I pointed her at the book and told her to do some reading. As I had suspected, the code she'd been set was a simple Caesar shifted code. It was relatively easy for her to decypher when she'd read the book and learned a little about the different ways that things can be encrypted.

After some time, I decided to set her and her friend a new and trickier challenge. Here is the page I handed to them (Microsoft Word format). This is the C programme I used to generate the cyphertext. If you have a C compiler handy, you can try it for yourself.

Of course, it is a Vigenere cypher. Although based on the Caesar shift, it's quite a lot more complex to crack because you can't simply use brute force: a Caesar shift has only 25 possible keys to try, whereas the number of possible Vigenere keys multiplies by 26 with every extra letter in the keyword. In fact, the code stood uncracked for generations. Eventually, Charles Babbage developed a process for cracking the code, and although somewhat tedious, it is very reliable provided your cyphertext is long enough for you to make a meaningful frequency analysis.
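To make the relationship to the Caesar shift concrete, here is a minimal sketch of Vigenere encryption in C. It is an illustration only, not the programme linked above: the function name `vigenere`, the `dir` parameter and the choice to pass punctuation through unchanged are all assumptions made for this example, and it assumes plain ASCII text and a key made only of letters.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: encrypt (dir = +1) or decrypt (dir = -1) text in place.
 * Each letter is Caesar-shifted by the matching key letter (A = 0 ... Z = 25).
 * Non-letters are left unchanged and do not advance the key.
 * Assumes ASCII and a non-empty key containing only letters. */
void vigenere(char *text, const char *key, int dir)
{
    size_t keylen = strlen(key);
    size_t k = 0;                     /* number of letters processed so far */

    if (keylen == 0)
        return;

    for (size_t i = 0; text[i] != '\0'; i++) {
        unsigned char c = (unsigned char)text[i];
        if (!isalpha(c))
            continue;
        int shift = toupper((unsigned char)key[k % keylen]) - 'A';
        int base = isupper(c) ? 'A' : 'a';
        /* +26 keeps the value non-negative when decrypting */
        text[i] = (char)(base + (c - base + dir * shift + 26) % 26);
        k++;
    }
}
```

For example, encrypting ATTACKATDAWN with the key LEMON gives LXFOPVEFRNHR, and calling the function again with `dir` set to -1 restores the original. With a one-letter key the function reduces to an ordinary Caesar shift.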

Erin's friend assured me that he could find a web site that would crack the cyphertext for him. I hadn't thought that he'd choose such a coward's way of finding the solution, but I got to wondering whether it really could be that easy to write an automated decryption tool. After half an hour of web searching, finding half a dozen so-called on-line decoding tools for the Vigenere cypher and discovering that their authors' claims of being able to decode text were all overstated, I started to feel a little more comfortable that my challenge wasn't going to yield that easily. In fact, I have never heard another word from this person on the subject and can only presume that he, like me, could not find a web-based tool to do the 'hard work' of cracking the Vigenere cypher.

None of the web-based tools I found came even close. Not a single one could even guess the length of the key that I'd chosen, let alone recover the key and decypher the text. So I got to wondering whether there are just a hell of a lot of incompetent jocks out there on the Internet, or whether the Vigenere cypher really is difficult to decypher automatically.
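Guessing the key length is the first hurdle, so here is a rough sketch of one common way to do it. This uses the index of coincidence, which is a technique chosen for this illustration and not necessarily what the cracking tool on this page does. The function names and the `threshold` parameter are hypothetical. English text has an index of coincidence of roughly 0.067 and random letters roughly 0.038, so a threshold somewhere around 0.06 is a reasonable starting assumption.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: copy only the letters of in[] to out[], upper-cased.
 * out[] must have room for every character of in[] plus a terminator.
 * Returns the number of letters copied. Assumes ASCII. */
size_t keep_letters(const char *in, char *out)
{
    size_t n = 0;
    for (size_t i = 0; in[i] != '\0'; i++)
        if (isalpha((unsigned char)in[i]))
            out[n++] = (char)toupper((unsigned char)in[i]);
    out[n] = '\0';
    return n;
}

/* Index of coincidence of every step-th letter, starting at offset.
 * letters[] must contain only 'A'..'Z'. */
static double column_ic(const char *letters, size_t n, size_t offset, size_t step)
{
    long count[26] = {0};
    long total = 0;

    for (size_t i = offset; i < n; i += step) {
        count[letters[i] - 'A']++;
        total++;
    }
    if (total < 2)
        return 0.0;

    double sum = 0.0;
    for (int c = 0; c < 26; c++)
        sum += (double)count[c] * (double)(count[c] - 1);
    return sum / ((double)total * (double)(total - 1));
}

/* Return the smallest key length in 1..max_len whose columns, on average,
 * look like natural language (average IC above threshold), or 0 if none do. */
size_t guess_key_length(const char *letters, size_t n, size_t max_len, double threshold)
{
    for (size_t len = 1; len <= max_len; len++) {
        double sum = 0.0;
        for (size_t off = 0; off < len; off++)
            sum += column_ic(letters, n, off, len);
        if (sum / (double)len > threshold)
            return len;
    }
    return 0;
}
```

Taking the smallest length that passes the threshold avoids reporting a multiple of the true key length, since multiples also produce natural-looking columns. On short cyphertexts the estimate becomes unreliable, which fits the point above about needing enough characters for a meaningful frequency analysis.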

One hour later, I had written myself another C programme, barely more complicated than the simple encryption tool I'd written, and I was amazed that I could decypher arbitrary Vigenere codes at will. My simple tool certainly had no trouble with the cyphertext I'd set my daughter and her friend, and which those other web-based tools had failed to crack. Here is a copy of the Vigenere cracking tool I came up with.

The trick I employed in doing the decyphering was to find a suitably good, long text (i.e. a book) to use as a basis for calculating the natural frequency of each letter in the English language. My favourite 'English' book is Pride and Prejudice, by Jane Austen, and several years ago I discovered that you can find this and many other fine texts right here on the Internet. If that link breaks over time, go to the Gutenberg home page and search from there.
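The idea of measuring natural letter frequencies from a book can be sketched in C as follows. This is a hedged illustration rather than the code from the cracking tool: the function names are hypothetical, and the scoring rule (pick the shift whose letter counts correlate best with the reference frequencies) is an assumption made for this example. It also assumes the key length is already known and that the cyphertext has been reduced to upper-case letters only.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

/* Fill freq[0..25] with the relative frequency of each letter read from a
 * reference text (for instance a plain-text copy of a long book).
 * Returns the number of letters counted. Assumes ASCII. */
long letter_frequencies(FILE *in, double freq[26])
{
    long count[26] = {0};
    long total = 0;
    int ch;

    while ((ch = fgetc(in)) != EOF) {
        if (!isalpha(ch))
            continue;
        int idx = toupper(ch) - 'A';
        if (idx >= 0 && idx < 26) {   /* ignore any non-ASCII letters */
            count[idx]++;
            total++;
        }
    }
    for (int c = 0; c < 26; c++)
        freq[c] = total ? (double)count[c] / (double)total : 0.0;
    return total;
}

/* For one key position, look at every step-th letter starting at offset and
 * return the shift (0..25) under which those letters best match freq[].
 * letters[] must contain only 'A'..'Z'; step is the assumed key length. */
int best_shift(const char *letters, size_t n, size_t offset, size_t step,
               const double freq[26])
{
    long count[26] = {0};
    for (size_t i = offset; i < n; i += step)
        count[letters[i] - 'A']++;

    int best = 0;
    double best_score = -1.0;
    for (int shift = 0; shift < 26; shift++) {
        /* plaintext letter p would appear as cyphertext letter p + shift */
        double score = 0.0;
        for (int p = 0; p < 26; p++)
            score += freq[p] * (double)count[(p + shift) % 26];
        if (score > best_score) {
            best_score = score;
            best = shift;
        }
    }
    return best;
}
```

Calling `best_shift` once per key position, with `offset` running from 0 to the key length minus one, yields one shift per position, and the key letter for each position is 'A' plus that shift. Each position is effectively a separate Caesar cypher, which is why only 26 candidates need to be tried per key letter.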

When you're decoding an unknown Vigenere cypher, it can help if you know whether it is written in US or UK style English - i.e. with American spelling or British spelling. You will find that the letter Z, for example, has a much higher frequency in US English than in British English - because of the way Americans use the ending 'ize' in place of the British 'ise'. The copy of Pride and Prejudice I used followed the British way of spelling, which is what I naturally use as an Australian.

Anyway, with the tools I provide on this page, you should be able to decypher arbitrary Vigenere text in a fraction of a second. It will certainly take you far longer to download the code, compile it and type the command than it will take to crack the seemingly uncrackable Vigenere code.

Stefan Keller-Tuberg. June 2006.

Contact me
