| Author |
Replies: 190 / Views: 8,496 |
|
|
|
Pillar of the Community
 United States
8904 Posts |
|
|
Valued Member
Canada
287 Posts |
Oh no, I was told that screen will not come up if I upgraded. They were right. Now it simply reboots without the screen, no need to press Ctrl-Alt-Delete. They didn't lie.
I have a spreadsheet I made that I put your code on line 2 and I type on line 1. As I type it fills in the rest of the code with the matching letters.
Now take a list of 1012 three letter words and type them into the first word one at a time. After each word, type in each three letter word that fits into the next 3 letter word, in this case matching the middle letter. And remembering no words will contain an 'S'.
Once you have found the two 3 letter words, take out the list of 3904 four letter words and start the same process with the second word in this puzzle.
Oh, and remember to save your place in the list from time to time.
|
|
Pillar of the Community
 United States
8904 Posts |
This is the theory of my code solving: Quote: My solving program begins with a random KEY, which is the 26 letters of the alphabet arranged in any order. It then decrypts the ciphertext (Ct) according to this KEY and 'looks' at the result. The program then makes a one letter change to the KEY, by exchanging one of the letters with the letter next to it (flips a pair of letters). It decrypts the ciphertext with this KEY and 'looks' at the result. The program then decides, 'better or worse.'
It keeps the better KEY and changes it by one letter (flips a different pair of letters). It uses this changed KEY to decrypt the ciphertext and 'looks' at the result and again makes the 'better or worse' decision. In the program it does this 325 times (26 x 25)/2 and prints the 'best' KEY and the resulting decryption (plaintext) of the ciphertext on the screen. Whenever it finds a better KEY, at the end of a round, it prints the KEY and the plaintext on the screen.
It continues in this manner until, after going through a complete round of flipping (325 flips), it cannot find a 'better' KEY than the one from the previous round. At that point it generates a completely new KEY randomly and begins the next round. The program never really 'knows' when it has found the correct KEY, since it has no way of telling whether there is in fact a 'better' KEY that has not yet been tried. For a monoalphabetic cipher, the number of different possible KEYs is 26! (factorial), which is a VERY large number. The average rate of KEY generation by this program is about 50,000 KEYs per second. At this rate it would take approximately 256 trillion years (2.557 x 10^14) of continuous operation to generate and test all the possible KEYs. So, it is up to the user to look at the plaintext on the screen and determine if the program has found the correct KEY or a KEY that is very close to correct and stop the process.
The algorithm that controls the flipping and the length of the round is called a random KEY generator. There are a number of different KEY generator algorithms, but experimentation has shown that the random approach is usually more effective, possibly because it is extremely fast. The primary requirement for this algorithm is that it must always produce a different KEY.
The algorithm that determines 'better or worse' is called the scoring algorithm. It is also called a 'hill climber' since the process can be visualized as one of climbing a hill. A 'worse' result would be a step down the hill and a 'better' result is a step up the hill. When a long series of changes to the KEY (a round) does not result in a step up the hill the process is said to have 'plateaued.' When a plateau is reached, it is time to leave the hill and begin climbing another one (new random KEY).
Now, the difficulty with this approach to finding the correct KEY using a computer program is the problem of 'teaching' the program how to read and therefore recognize which choice of plaintext is 'better or worse.'
Here is an example ciphertext - GLOX OX NWGGWD. Depending upon the KEY used, here are two possible decrypts of this ciphertext - THIS IS BETTER or - THIS IS RATTAN. In this example the first decrypt assumes N=B and in the second N=R and so forth. For a person who can read and is familiar with 'normal' English, the first sentence will appear to be more likely correct (better) than the second one. But, without some means to tell the computer program that the five letter word 'BETTER' is more likely correct than the five letter word 'RATTAN,' it has no basis for deciding the 'better or worse' question.
This is where the frequency of occurrence profile data (NGraph file) comes in. The tetragraph (4Graph) file is a table of data that is simply the count of the number of times a particular four letter group appears in a sample of English text. This table has an entry for every possible four letter group, from AAAA to ZZZZ. That would be 26 x 26 x 26 x 26 entries (456,976). A fairly large number. In actuality, the table used by the computer program only contains those tetragraphs that really appear in a sample of English text. The English.tet file that is the default NGraph file for the program contains 52,483 entries. If the program 'looks' in the table and can't find an entry, the value is assumed to be zero (0). So, most of the possible four letter groups are not actually in the table, only the ones that actually appeared in the sample text used to create the file.
One additional consideration. In the special case of the monoalphabetic ciphertext that does not have correct word lengths the example above would really appear like this - THISISBETTER Vs. THISISRATTAN. In this example there are eight (8) tetragraphs in the sentence (Number of characters (12) - letters in group (4) - 1). The first one is THIS. The second one is HISI. The third ISIS and so forth. Therefore, the program that counts the occurrence of letter groups in a sample of English text must also ignore correct word lengths during the count by eliminating the spaces between words.
In the example above, when the program 'looks up' the four letter group 'THIS' in the table it will find a relatively large number. In the English.tet file that value is 927. That four letter group appears in both possible decrypt sentences so the number of occurrences of that group is of no help in making the 'better or worse' decision. However when the program evaluates the 'BETT' group Vs. the 'RATT' group it finds that 'BETT' occurs 90 times and 'RATT' occurs 42 times. From the programs standpoint, the 'BETT' group is 'twice as good' as 'RATT.' The next tetragraph in the sentence is 'ETTE,' which occurs 210 times and 'ATTA' which only occurs 31 times. 'RATTAN' is looking worse and worse. The program soon decides that 'BETTER' is more likely to be correct than 'RATTAN.'
Interestingly, when word lengths are ignored, an analysis of common English text produces a tetragraph profile in which the four letter group 'OFTH' is the most frequently appearing tetragraph. The high occurrence begin with the appearance of wording like OFTHE, OFTHIS or OFTHAT and so forth.
This situation, of course, is true for any other NGraph file. A trigraph profile would have a high occurrence of 'THE' and 'OFT' and 'FTH' and so forth. The three letter group 'AND' would certainly be one that would have a high occurrence, but, as it turns out, only about half of the number of occurrences of 'THE.' That aspect of the profile could make it difficult for the program to successfully solve a ciphertext containing a number of occurrences of the word 'AND' and no instances of the word 'THE,' assuming the remainder of the text is 'normal.' Obviously, this would probably not be much of a problem for an English reader.
That is the weakness of this type of ciphertext solver. It can be fooled or foiled by creating a ciphertext that does not exhibit the 'normal' frequency profiles of English text. The program evaluates the frequency profile the ciphertext and displays the results on the screen in the form of an Index of Coincidence value. This is a value that expresses the likelihood that any pair of letters in a message text are equal to each other. The Index of Coincidence for the 26 letters of the English alphabet as used in normal plaintext is 0.0667, while for a text that consists of letters of the English alphabet randomly chosen is 0.0385.
It is fairly easy to produce text whose single letter or even letter pair (digraph) frequency profile is not 'normal.' The sentence - 'A quick brown fox jumps over the lazy dog' is a classic example. The Index of Coincidence of this sentence is .0189! In this sentence every letter in the alphabet appears at least once, which is extremely unusual in a 'normal' English sample of this size (33 letters). However, it becomes more difficult to achieve a significant deviation from a trigraph or tetragraph profile. At the other end of the scale, pentagraphs (5 letters) and hexagraphs (six letters), distortion can be achieved by limiting the text to three and four letter words. Interestingly, the program can usually 'find' the words 'over the' in the brown fox sentence, using the English.tet NGraph file and the words 'a quick' by using a trigraph file.
Another problem with this sentence is its length. The shorter the ciphertext, the more difficult it is to find a correct solution. Ciphertext that contains less than 50 characters are 'difficult' merely because of their length. On the other hand, ciphertext of 250 characters, is usually sufficient for the program to be able to correctly score a KEY. Naturally, the longer the ciphertext, the longer it takes the program to do the actual decrypt and perform the lookup to develop a score. When faced with a large ciphertext, a solution may be found more quickly by merely using a portion of the ciphertext (250-300 characters).
There remains yet another consideration in the problem of teaching the program how to 'read.' It is, how to choose the sample text with which to do the frequency profile. The English.tet file that is the default file for the program was developed using the text of the book "Life on the Mississippi" by Mark Twain. That sample contains about 146,000 words and 664,000 characters. This seems to be adequate for creating most frequency profiles (NGraphs) and the tetragraph profile seems to be most efficient, of the four choices (3 through 6 NGraphs) available with the program. However, in cases where the ciphertext frequency profile is significantly different than the frequency profile of the "Life on the Mississippi" text, the program is unable to 'read' the plaintext decryption and make a correct 'better or worse' decision. In such cases, it is possible to improve the program's performance by supplying a NGraph frequency profile data file that has been created using a different sample text, one that more closely matches the frequency profile of the ciphertext.
A simple example would be the case of ciphertext that is concealing military message traffic. Military messages, because of their unique syntax and word usage have a profile that is somewhat different than 'normal' English. So, to facilitate solving military message ciphertext, a sample of military text could be used to create a unique 'military' NGraph file. The same should be true for foreign language ciphertext. Although the program has not been tested with a foreign language ciphertext, it is theoretically possible, by using a sample of the foreign language text, to create a suitable NGraph file. The only constraint in this situation is the makeup of the plaintext alphabet. This program assumes the English alphabet 'abcdefg....' This would rule out a foreign language that uses a different alphabet.
In summary, the operation of this program is relatively simple. It finds the correct KEY in a manner similar to the way a person would do it when unaided by a computer. First it creates a possible KEY and then uses it to produce a decrypt. It scores the decrypt. Then it changes the KEY to one that has yet to be evaluated and repeats the scoring of the resulting decrypt. Then it continues in that manner until the user stops the program. The advantage the program has over a person is the speed in which it can produce KEYs and decrypt the ciphertext. The advantage the person has over the program is the ability to recognize a correct result.
Edited by Moe145 01/26/2011 11:37 pm
|
|
Valued Member
Canada
287 Posts |
Don't suppose you have a copy of the 'English.tet' file. This sounds like a mighty powerful tool. Certainly would help to automate the process. Might take some of hte fun out if it as well.
So far, I have not found a four letter word that once I typed in the 2 three letter words. Which is why I think that 'D > D'.
|
|
Valued Member
Canada
287 Posts |
Did you create that program Moe? It does sound impressive.
|
|
Valued Member
Canada
287 Posts |
Took a break, opened a beer, and grabbed a roll of pennies from a half emptied box. To my surprise it was all Lincoln's. Not odd for you maybe, but here in Canada I generally get about 10% US coins when I roll search. Better yet, 30 of them were wheat cents ranging from 1927 to 1959. Nothing special though. So I just grabbed another roll and it is the same. Must be my lucky night. Edit: second roll had 34 wheats, including three 1943 steel (one needs rust remover), and a 1912S.
Edited by rikcando 01/27/2011 12:37 am
|
|
Valued Member
Canada
287 Posts |
Well, I'm off to bed. Mu best guess. D > G
|
|
Valued Member
United States
185 Posts |
Does NASA use your theory of code solving? 
|
|
Pillar of the Community
 United States
8904 Posts |
Quote: Does NASA use your theory of code solving? No, but the Klingons do...  
|
|
Valued Member
Canada
287 Posts |
Battle hardened brute force decryption. Sounds like a Klingon tactic.
Spending the day with my darling daughter. Back to battle the Klingons later tonight.
|
|
Pillar of the Community
 United States
8904 Posts |
Are you giving up? 
|
|
Valued Member
Canada
287 Posts |
No, but at the current rate it could possibly take me until July to finish it. July, 2012.
Seriously, I am working on it but considering there are 580 three letter words to use (reduced from 1019 by removing those with an 'S' and some uncommon ones), and 1900 four letter words (reduced from 3900), and then check those sets against a reduced list of 4600 five letter words... this may take a while to finish. But I am working on it.
|
|
Valued Member
Canada
287 Posts |
... FIX TACO DIP _|_|CO_| ..nope FIX ....
... FIX YOUR KID _|_|UR_| ..nope FIX ....
Edited by rikcando 01/30/2011 10:05 am
|
|
Valued Member
Canada
287 Posts |
... GOD LIKE ROW _|_|KE_| ..nope GOD ....
still working on it actually, the above does work with TAKEN but could not work it into the rest of the sentence(s) since 'Y' = vowel. (R=vowel though)
Edited by rikcando 01/31/2011 09:39 am
|
|
Valued Member
Canada
287 Posts |
I went back to the beginning and trying a slightly different approach. more extensive, but more accurate. I have been able to greatly reduce the number of possible words by expanding it out to look at the double JJ and double SS. Further behind but this time tomorrow I'll be just as far ahead.
|
| |
Replies: 190 / Views: 8,496 |