Cracking the Agrippa Code

Technical description

I wrote the following in an attempt to synthesize the submissions, and put the complicated technical details into accessible language. Any errors or omissions in this summary description are my fault. I can't fully express how involved and dedicated all contestants were—constantly answering my probing questions and going beyond all expectations of speed, thoroughness and accuracy. I would also like to thank Abram Hindle for his helpful assistance. ~ Quinn

The contest proved that Agrippa does use encryption, although the cryptographic algorithm is very insecure, even for its release in 1992.

There are four main aspects of the program that were revealed by the submissions: the compiled binary, the main cryptographic algorithm, the encryption effect that runs after the poem finishes scrolling, and the "self-destruct" mechanism that prohibits running the program more than once.

The compiled binary

The Agrippa program was developed using Macintosh Allegro Common Lisp, possibly version 1.2.2 or 1.2.3 (see Robert), and bundled as a self-extracting binary. As mentioned in the archival documents, the auto-run, "virus-like" mechanism was never developed, and while there are some tricks to impede reverse-engineering the program, the programmer's boasts that it would be impossible to run through a debugger are unfounded. The Macintosh Lisp compiler uses Lempel–Ziv–Welch (LZW) compression, with the poem stored as encrypted text in a string variable ("zi") (see Robert for a suggestion that variable names may correspond to non-English words). This variable is encoded in the MacRoman character set, but only uses ASCII characters (low in the ASCII table), so the visible effect is indistinguishable from ASCII. Offset values are known for most aspects of the Agrippa binary (see Ayal).

As was known at the time, and exploited by a number of the contestants, Macintosh Lisp contains a error (not binding a keyboard handler at a particular point) that allows one to drop out of the program and into a Lisp Run-Eval-Print-Loop (REPL) console. With access to the REPL, arbitrary code may be run and interaction with the variables and routines reveals much of the source contents (however, the REPL is of limited utility because many variables are uninitialized, so constants are incorrect). By exploiting this error, it was discovered that the shipping code does not exactly match the archived (partial) source code printout. This suggests that the archival document was from an earlier stage of development. Several routines either changed names or no longer exist, e.g., UN-WAYMUTE-IT, UN-PERMUTE-IT, and UN-ROLL-THE-TEXT (which is thought to likely correspond to UN-ROLL-ZI) (see Brian).

The cryptographic algorithm

The Agrippa poem is pre-encrypted and stored as a string variable in the program. The ciphertext is not visible in the binary due to the LZW compression. All contestants discovered that the cryptographic algorithm is a custom RSA function that encrypts in three-character blocks, with additional "bit-scrambling" permutations (a kind of simple substitution cipher). Because the poem comes pre-encrypted there is no encryption routine in the program: the program simply loads the ciphertext, decrypts it to memory, and then abandons the plaintext (still in memory). As proof of this, the same decryption routine can work on a "fresh" disk as well as previously-run "corrupted" disk (more on the "corruption" below) (see Nikita's reverse engineering of the LZW compression that enables running the decryption routine on the compiled program in either state).

The cryptography is applied identically and independently to three-character blocks (resulting in cryptographic weaknesses, see below). On each block the cryptographic routine performs three distinct steps: first a bit permutation (substitution cipher) that converts three 8-bit characters into two 12-bit characters, then an RSA transformation using a 12-bit key, and finally another bit permutation that converts the two 12-bit characters back into three 8-bit characters (corresponding to ASCII-encoded text) (see Jeremy and Nikita's graphical depiction of this process).

The RSA cryptography has a public modulus of 4097 (from primes 17 x 241), and a public exponent of 11. Due to the extremely short bit-length of the key, the private exponent can be found easily (either through brute-force or the Chinese Remainder Theorem), and was revealed to be 3491 (the private modulus is always the same as the public modulus, 4097). So, the RSA encryption process is to take some number x to the 3491st power, modulus 4097—however, remember that the encryption routine is not present in Agrippa, and was reverse-engineered once the private exponent was discovered (see Wikipedia's article on RSA for more information). Evidently the anonymous programmer was either undecided or confused about what kind of encryption to employ, remarking in a letter that the cryptography would be similar to both DES and RSA (the prior being a symmetrical key algorithm, and the latter asymmetrical): "another info source would by anything on the Data Encryption standard or mathematical works by Rivest, Shanker, Aldemann" (Letter from Programmer). Likewise, the cryptographic routines are named with references to DES, even though they are RSA.

The anonymous programmer attempted to strengthen the cryptography by using a more sophisticated mechanism for block enciphering (where the same key is re-used for each block, but ideally "mixed" with the neighboring block); the programmer remarked, "The value, both character and numerical, of any particular character is determined by the characters next to it, which from a cryptoanalysis or code-breaking point of view is an utter nightmare" (Letter from Programmer). Yet, in reality the encryption is applied identically to each three-character block, in a mode of operation known as Electronic Codebook (ECB). This simple mode of operation has many cryptographic weaknesses, most visibly the fact that identical blocks will encrypt to the same result. For example, in the plaintext of the poem there are numerous sections of three consecutive spaces which encrypt to "space, e with circumflex, backslash", or in decimal 20 136 92 (with quotes added for clarity: " ê\") (see Robert and Ayal). Even without reverse-engineering the algorithm, this weakness is significant and exposes the ciphertext to old-fashioned statistical analysis (where common three-letter words or three-letter combinations would be visible in the ciphertext). Similarly, the three-character block size and ECB mode of operation explain the curious two spaces at the end of the poem (earlier thought to be a marker for the end of the poem, a good guess but incorrect, see Freek Wiedijk's contribution to The Agrippa Files). Rather, the two spaces at the end of the poem are needed to pad the block, otherwise the cryptographic routine would fail.

When Agrippa was released in 1992 the United States famously classified all cryptographic materials as munitions and restricted the export of "strong" cryptography. Agrippa was seen by cipherpunks and computer scientists as a challenge against these stifling and backwards cryptography export controls, yet given the extremely short key-length (12 bits), Agrippa would never have been prevented from export, which in 1992 permitted an RSA modulus of 512 bits (see RSA). Indeed, the political involvement of John Perry Barlow and the Electronic Frontier Foundation (see Letter from John Perry Barlow to Kevin Begos Jr.) was superfluous given that Agrippa would not have been considered strong cryptography.

Encryption effect

When Agrippa is run the poem slowly scrolls down the screen, and when finished it displays an encryption effect, seemingly to evoke the idea that the poem is re-encrypted. As discussed above, there is no RSA encryption algorithm in the binary, however, by running the plaintext through a permutation routine (re-purposed from the main decryption algorithm) the plaintext is effectively encrypted using a simple substitution cipher. While this is a re-encryption of the plaintext, it is actually only for visual effect since the cryptotext generated by the substitution cipher is not saved back to disk, but is instead displayed and then abandoned. Once Agrippa is run, only one change is saved to disk: the "self-destruct" mechanism.

The "self-destruct" mechanism

Agrippa famously runs only a single time. When originally released there was considerable discussion about a "virus-like" mechanism or a trigger that would prevent running Agrippa more than once, and given the uncertainty around the mechanism The Agrippa Files had to go to some lengths to forensically extract the Agrippa disk image without potentially triggering a self-destruction of the original diskette. Agrippa does self-destruct once run, which is effectively irreversible. In this sense, there is no "round trip".

There are a number of possible mechanisms to cause a self-modifying program to run only once. For example, one could flip a switch in the binary that alerts the main routine that the program has previously been run, or re-encrypt the data and throw the key away (possibly generated dynamically with runtime variables), and so on. The anonymous programmer of Agrippa choose a simple mechanism: write a large string of data over a portion of the binary that contains necessary run routines. In the archived source code printout this self-destruct mechanism wrote 40,000 ASCII characters (ASCII code 255) to a specified offset, leaving a string of 320,000 binary 1's to corrupt the program (see Brian and Nikita). Evidently, at some later stage of development someone thought it would be more artistic to write a fake genetic sequence (CTAG's) instead of merely 1's.

This self-destruct routine is called MAKE-SOME-SHIT, and is located in the archival source code listing halfway across page three and the missing page four. It is possible that page four of the source code listing was omitted from the archive due to the presence of the word "SHIT" in the routine name. It was revealed that MAKE-SOME-SHIT uses a fixed seed to call the Mac Toolbox Random Number Generator (RNG) (see Brian), which saves 6000 characters (either C, T, A, or G) to the disk (at offset 680168). While the offset chosen for self-destruction does effectively corrupt the program, it does not destroy the ciphertext. Two decryption implementations can decrypt the poem from either a "fresh" or corrupted binary, since the self-destruct mechanism left the ciphertext intact (see Nikita's nikita_decode_from_file.py & Brian's agrippa-extract-ciphertext.cpp). In this sense, there is indeed a round trip.

The submissions

The winner, Robert Xiao's submission
integ3r supplied a binwalk output to aid further reverse engineering
Jeremy Cooper at Cryptography Research, Inc. took second place with his submission
Ayal Ryger's submission
Nikita Shulga's (aka malfet) submission
Brian Carnes's submission

The winners

Robert will be receiving his copies of all published William Gibson books shortly (generously provided by the Faculty of Information at the University of Toronto). For his narrow defeat but excellent submission Jeremy will receive a copy of William Gibson's Neuromancer. All the other submissions are excellent and invaluable, and should be consulted by anyone interested in Agrippa.

All submissions (and implementations) were submitted under a Creative Commons Attribution-NonCommercial 3.0 Unported License and are available for use and derivation.

Agrippa in action

All submissions included an implementation of the Agrippa decryption algorithm, however, Ayal Ryger implemented the decryption and (reverse-engineered) encryption in Javascript so you can run it in your web browser. See Agrippa in action!

Research and website by Quinn DuPont