Back to Table of Contents
Brian Carnes's submission:
Table of contents supplied for easier reference:
A program is run, which displays a poem that slowly scrolls on the screen. At program termination, the program will not run again (ever). The display was evidently intended as an ephemeral, one-shot performance.
The patter surrounding this behavior, as presented by the contest, and presumably surrounding the original performance, is that the program and poem are encrypted at the end of the run.
A long string of DNA base pairs is found in the program binary file after the run, perhaps causing the inability to run again, or perhaps just a message in addition to the program being disabled by a different method. Does this string encode the poem? Does this string have any meaning?
It is possible the poem starts off unencrypted, and is then encrypted at program close, as the patter states.
It is more likely that the poem starts off encrypted (to frustrate easy extraction of the poem). At termination the ability of the poem to be decrypted is either removed by overwriting the key, or the ciphertext itself is overwritten, or the program as a whole is rendered inoperable. Or any combination of the above.
The partial scanned source code, if it can be trusted, certainly indicates the poem starts off encrypted and is being decrypted for display…
The puzzle presented by www.crackingagrippa.net presents multiple issues to be dealt with.
The nominal core of the challenge is to look for the mechanism – possibly encryption – that locks up the poem after it is displayed once. Let us address that challenge after diving deeper on the other program behavior...
How the poem is stored to begin with is the first puzzle to be cracked. We must learn how the poem was encoded, and then determine if the poem survives into the output form at all.
No obvious evidence of the poem in plaintext is present in the binary. It is reported by Wiedijk on Agrippa Files that upon running, the full plaintext is present in memory. Some combination of compression/encoding and decryption must be present. (It turns out, both).
Obtaining the pure ciphertext from the system is the first challenge, after which determination and validation of the decryption algorithm can proceed. After that, on to the matter of the DNA base pairs and inability to run a second time.
Setting up an emulated environment for MacOS7 and the tools to investigate that environment is the first hurdle. The open-source Mini vMac emulator and vintage tools from the 68k can be brought to bear on this problem.
The instructions at Agrippa Files were sufficient to get a first run of the poem, and through cloning and replacing the virtual floppy disk image, could be run repeatedly.
The final analysis used a build of Mini vMac, modified with tracing extensions capable of providing many additional insights into program operation.
To obtain the ciphertext, some detailed inspection of the running system needed to be performed.
The program was evidently written to intentionally frustrate these sorts of efforts. It tries to prevent copies of itself from being present anywhere on the host system; it disables various methods of interrupting and debugging the program; etc.
As mentioned on Agrippa Files, the plaintext is present in memory during the display portion of the program run. By observing the populating of this area, the ciphertext source could be obtained.
It turns out the ciphertext is decrypted in place at the same location as the plaintext, so it was straightforward to extract it without yet confirming the decryption algorithm.
Having obtained it, it turns out this ciphertext is itself also not present on disk.
The program was written in Macintosh Lisp (MACL), consistent with the supposed source code printouts. The distributed program actually consists of compiled lisp, thus some investigation required interpreting 68k assembly, virtual lisp machine mechanisms, tail call program flow, and other compiler internals.
It turns out that the ciphertext is stored as a string within the code segment of the compiled MACL program, which itself is compressed along with the rest of the compiled lisp code in a MACL-specific routine. This was presumed to be similar to UPX (upx.sourceforge.net) and other executable packing strategies. (See later for details).
It is possible that the unpacking routine itself has been modified or possesses an encryption key. As the resulting ciphertext obtained is compatible with the partial source code presented at Agrippa Files, and there is no indication of lower-level manipulation of this part of program startup, this appears unlikely.
Upon allowing the executable to uncompress itself, the encrypted poem is then present alongside the compiled lisp code.
This is again consistent with the scanned images of source code, which shows a large block of unprintable text embedded within the source – presumably the quoted encrypted poem.
The incomplete nature of the scans and the low quality of the image, combined with the unprintable nature of many of the characters, of course made recovering the ciphertext from the images impossible.
Instrumenting the emulator to detect certain conditions and then snapshot its state to disk allows complete recovery of the ciphertext at the precise moment after MACL decompression, and before Agrippa begins to decrypt its poem.
The extracted ciphertext can be obtained below:
It is available in binary, text (hex dump), and C header file formats. If you wish to perform your own analysis, stop reading here. Go obtain the ciphertext (which was missing from the original challenge), and have fun!
Given the discussions mentioning multiple versions of the poem and multiple versions of the program, no assumptions were made that the partial source from the scanned images represents the source used to build the version being run.
Program tracing confirms that the decryption algorithm is, however, essentially identical to what was present in the scanned LISP source.
Source code for a decryption routine capable of producing the entire poem can be found here: agrippa-decrypt.cpp
The decryption algorithm mentions “DES” in one of the helper function names, but bears little relation to DES; rather, it is a simple RSA-type algorithm operating on blocks of 3 characters at a time, using a modulus of 4097 = 17 * 241 with exponent 11. See source code above for additional details: extra byte scrambling occurs at various stages. (Note that the factorization above trivially yields the private & public key pairs (4097,3491) and (4097,11), but this proved unimportant, as the program was determined to not actually encrypt at program close as the patter implied...)
The emulated system was instrumented to facilitate experimentation with the movement of the poem through the system.
No evidence was found for any encryption routines moving from the plaintext into the DNA base pair sequence, or any other form of cipher. All evidence points to the presence of just the pre-encrypted poem embedded in the program, which is decrypted for display, and then abandoned.
The brief display of "garbage" at the end of a live run appears to be a red herring, unrelated to either the (non-existent) re-encryption algorithm, nor to the program modification and DNA base pairs.
The modifications of the program that prevents it running it a 2nd time by injecting the DNA base pairs actually happens at program startup, before the poem starts scrolling. This can be verified by halting/rebooting a virtual run of the poem, or the poem running on a physical machine, and observing that it cannot run a 2nd time already.
This also means that had the computer froze or lost power during the original performance, the disk would be unusable even though the full poem had not been presented.
The above modification appears to be the work of a routine with the Lisp symbol name of "MAKE-SOME-SHIT".
I can speculate that the missing page 4 of the source code scanned images was possibly destroyed/withheld due to the presence of that last word? But that last page also contains the logic behind the base pairs, which is the remaining crux of this puzzle.
The program binary has 6000 bytes (starting at offset 680168) replaced with a DNA sequence. With two bits of information per base pair, there are 12,000 bits, or 1500 bytes of information.
Taking the plaintext of the poem and compressing it using modern compressors (gzip, bzip2, etc. at highest compression levels) results in >4700 bytes to encode the 9918 bytes of plaintext.
It is unlikely that a program of such a vintage, apparently written in pure lisp, and ostensibly missing just one page of code, is encrypting the poem within the 1500 bytes – a feat beyond the standard compressors a decade later.
These bytes written to the middle of this executable render the program inoperable, presumably from overwriting essential bits of machine code, or corrupting the structures underlying the unpacking/uncompression routines part of MACL startup.
It is possible that this exact offset and length was chosen to only overwrite "dead code" and the program itself is self-checking and intentionally aborting, but it is more likely this (like with the earlier compression) is instead just another behavior inherited from the MACL runtime, which is either finding its data compromised or the MACL startup routines themselves are being corrupted.
It is most likely this is a simple self-destruct, and is irreversible. Determining the structure and layout of the executable, and what the range of overwritten bytes were used for during initial run operation, could prove this.
Alternatively if the original ciphertext can be recovered, then the destruction of the information is "reversible", in a way, even if the program is non-functional. More on this later.
But what do the base pairs represent? If not the poem, is there any message or structure present within the 6000 character string?
If the scanned source printout is trustworthy, we are missing just one page of source, only a portion of which can create the base pair string. Thus the complexity of encoding or encrypting is probably extremely low.
Repeated running of the program produced identical DNA strings, so its generation is deterministic.
Further instrumentation showed many calls through Mac Toolbox RNG routine. So the next focus became looking for converting of the (low-quality) RNG stream into DNA base pairs, with a fixed initial seed.
With a little experimentation, the proper seed and a simple routine for producing the 6000 base-pair sequence was found.
Code to generate the DNA sequence can be found at the above link, in the file DNA-generate.cpp, as well as text files containing the DNA string extracted from a live run, which matches the generated string exactly.
We now know the program does self-destruct and renders itself inoperable with random DNA base pairs. However, further probing determines it does not effectively destroy the compressed ciphertext or the decryption routines.
The original programmer probably experimentally chose various offsets and lengths, and after finding one which caused the program to fail to execute, left it at that without being aware whether the portions containing the ciphertext itself were actually destroyed.
The MACL runtime uses a LZW variant to uncompress and load the precompiled Lisp code with bundled ciphertext.
The segment containing the ciphertext and Agrippa code is entirely outside of the region of program data destroyed by the DNA base pairs. (See code below for details).
Thus it is possible to recover the ciphertext from Agrippa even after it has self-destructed.
Run agrippa-extract-ciphertext.cpp, giving it the filename of the Agrippa executable (just the data fork is enough).
It will display whether the executable has been "destroyed" or not, and then extract the ciphertext regardless. The ciphertext can then be fed into the previous program to extract the poem.
From the point of view of being able to run the program under OS7 and get the full, intended Agrippa scrolling presentation – the program does self-destruct irreversibly, short of restoring those 6000 trashed bytes.
However, the destruction chosen by the programmer does not render the poem payload irretrievable, due to the continued presence of the compressed ciphertext and even the decryption routine. This compiled module (a fasl) could be loaded into any MACL runtime, or extracted directly even outside of MACL, as I have done above.
I have had no contact with the original author or any other Agrippa specific knowledge other than that which is available on the Agrippa Files page.
While I grew up happily reading Neuromancer and the Burning Chrome short story collection, this was the first time I had heard of the Agrippa art project.
Thanks to the folks at the UCSB Agrippa Files for the archive.
Thanks to Quinn DuPont of the Agrippa Challenge website for bringing this multifaceted puzzle to my attention.
$ ./agrippa-extract-ciphertext Agrippa This copy of agrippa is: Pristine Recovered ciphertext: c1 67 bb 7c 6f 35 ee a5 7d 71 7a f5 02 54 c9 1a 8f 35 35 1a bf 9f 88 a4 fa 51 f8 9b 3a 5f 35 08 79 05 fb e3 01 a5 5e 01 21 79 3f 8a a6 97 00 7d ...
$ ./agrippa-extract-ciphertext Agrippa_disabled This copy of agrippa is: Destroyed Recovered ciphertext: c1 67 bb 7c 6f 35 ee a5 7d 71 7a f5 02 54 c9 1a 8f 35 35 1a bf 9f 88 a4 fa 51 f8 9b 3a 5f 35 08 79 05 fb e3 01 a5 5e 01 21 79 3f 8a a6 97 00 7d ...
$ ./agrippa-decrypt "I hesitated before untying the bow that bound this book together. A black book: ALBUMS CA. AGRIPPA Order Extra Leaves By Letter and Name ...
Hitting Command-M during the Lisp runtime startup will interrupt the program during a window of vulnerability before the main routine disables interrupts and begins catching errors. The best time is shortly after the white display box appears, hit command-M once, and then wait several seconds.
You'll get an interactive lisp shell in which you can query the symbol table (including compiled lisp routines), and even inject lisp code to be run (with many restrictions in this limited runtime).
This is less useful than it could be, as the runtime is only partially initialized - all of the defvars are uninitialized and so the compiled routines will use incorrect constants while executing.
Attempts to set the variables properly will appear to succeed, but the compiled routines are linked to the uninitialized locations. There may be some lisp intern/package/closure black magic I'm missing, or this is a product of stopping compiled lisp code in an inappropriate early state before anybody should be calling them.
The locations can be set properly manually via TMON if you find their internal storage location. Setting exponent and modulus, for example, allows un-make-des to return correct results in interactive mode.
(setq *backtrace-on-break* t) (_ShowCursor) (provide "PPRINT") (continue) (apropos-list "") (apropos '...)
Running (make-some-shit) interactively in this break prompt gives a File Save dialog for the Agrippa executable, hinting that it is the original routine responsible for the DNA base destruction. Presumably other uninitialized variables prevent it from operating properly and silently as it does during the normal run.
This view into the lisp symbol names does hint that the disk version differs from the scanned source code:
(Or it is possible the LISP compiler has inlined some routines and discarded their names, and abbreviated other names.)
Once the main program logic has begun, the lisp interpreter is unavailable. I did not need to find out how to disable the error catching or how to inject a (break) call.
Below values are valid for memory usage set at 2500K using the OS7 image.
Cipher & Plaintext @ 0016b8e0 first byte = x22 is written at PC=00105a00 initial load is done by pc=00146e78 first read of cipher text pc=001059a6
Tracing the decryption shows distinct cycles of 3.
Entering full trace mode between read of 1st ciphertext char and writing of 3rd plaintext char gives one cycle.
Differencing this trace against read of 4th ciphertext and writing 6th plaintext, causes all relevant mathematical operations to “pop out”, and the algorithm was validated independently of the scanned source.
Thus the scanned source was doubly unnecessary (between this, which was sufficient alone, and getting into the interpreter above).
Lisp machine numbers have high bit set.
A6 is Lisp stack, grows downward A5 is function table pointer uninitialized p-q @ 309fc4 -> 0x1001 uninitialized e-2 @ 309fe4 -> 0xb 17106c contains addresses to be initialized (309fc4) 176FCE - convert lisp number to GATC 17DB70 - produce a GATC by calling A5(30da) 17DB7C - filling in GATC 1016a2 gets GATC A0 = 16b820 -> G 16b82c -> C, 16b814 -> A, 16b808 -> T 16f8ea - load 808=T 16f91a - load 814=A 16f948 - load 820=G 16f976 - load 82c=C 16f8b8 - call RNG 16f8d0 mod 4 Trap A861 Random 0x40839c == _Random
Full trace through A861 trap gives the RNG algorithm in use.
Much simpler using 64bit arithmetic compared to the 68k version...
Modifications to Mini vMac:
Break on divide/multiply by a particular value (which you can then trigger by via the interactive lisp shell calling into the compiled decryption routines)
Snapshot memory in response to Control-1
When flagged, dump full instruction trace & changed register values.
Memory hardware breakpoints: address, address&value, or just value
Interpret function table subroutines (calls through A5(offset))
0x162a: name = "mul"; 0xf8d0: name = "pop+mul"; 0xf7a2: name = "sub"; 0xf7d2: name = "add"; 0x1042: name = "push"; 0x104a: name = "push?"; 0xf602: name = "call"; 0x0ada: name = "push2"; f0x602/call DEST (in A7) 0x10b1b2: mod *(A6) *(A6+4) 0x100aa2: setf? 0x10b596: expt *(A6+8) *(A6+12)