Plots, love letters and remedies: The medieval secrets being revealed by AI

Sandrine Ceurstemont
An encrypted page from the Copiale cipher covered in handwritten symbols and letters (Credit: Beáta Megyesi)

Historic messages and documents obscured by incomprehensible ciphers can be found in libraries and archives all over the world. Artificial intelligence is helping historians crack open these mysterious texts.

Deep in the archives of the Vatican library, a mysterious hand-written book, scrawled with strange symbols, had lain unread for more than 400 years. Its cryptic pages apparently concealed secret remedies "for affections of the human body", according to some text scratched inside the cover. Such healing practices were kept under wraps at the time since they could attract suspicion or even accusations of witchcraft.

Known as the Borg cipher, the 408-page-long manuscript is mostly incomprehensible – coded using 34 obscure symbols with a few Roman letters and a front page written in Arabic. There was no known key to reveal what was encrypted. Some of the pages are also damaged due to their age, making the code even more challenging to read. 

But with the help of machine learning – a form of artificial intelligence – researchers were able to unravel the code. They discovered the text was filled with thousands of bizarre treatments such as drinking several glasses of high-quality red wine or fermenting a nutmeg in some dough to combat dysentery.

"It is like detective work where every symbol, pattern, and partial solution may bring us closer to someone's secrets and to a lost historical world," says Beáta Megyesi, a professor in computational linguistics at Stockholm University in Sweden, who was part of the team who decoded the text. Even with the help of AI, the process of unlocking the cipher key was painstaking.

Now Megyesi and her colleagues are leading efforts to harness the power of AI to crack historic ciphers, potentially unlocking a wealth of coded information from the past that has previously been uncrackable.

According to some estimates, around 1% of the material in archives and libraries around the world is fully or partially encrypted. Some of the earliest known ciphers date back to Ancient Greece and Rome.

Decoys, dead languages and bad handwriting

Together, coded historic documents conceal diplomatic intelligence, the rituals of secret societies, medical knowledge, love affairs or everyday details that people wanted to keep secret. This is information currently missing from historical narratives. In some cases, decoding these documents has the potential to rewrite what we know about a famous individual or an entire period of history. (One recent example was a collection of coded letters that were found to have been written by Mary, Queen of Scots during her long imprisonment in England. They revealed her involvement in plots to regain her throne and her tense relationship with her son, James VI of Scotland and future King James I of England.)

Historic ciphers can be relatively simple: the Borg cipher, for example, is a substitution cipher, meaning that each Roman letter was swapped for a single symbol to hide what was written. Others, however, can be difficult to unravel. In some cases, nothing is known about the original language the uncoded text was written in. Extra, meaningless symbols can also be inserted as decoys to throw off anyone hoping to snoop on the text. In other cases, several signs can be used to represent the same letter.
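To make those tricks concrete, here is a minimal Python sketch of a toy cipher that uses both decoy symbols and several signs for the same letter. The key, the decoy symbols and the function names are all invented for illustration; none of them come from a real historic cipher.

```python
import random

# Hypothetical toy key, invented for illustration (not a real historic key).
# Each letter maps to one or more cipher symbols, so a common letter
# such as "e" can hide behind several different signs.
KEY = {
    "a": ["13"],
    "e": ["11", "27", "43"],
    "m": ["14"],
    "n": ["15"],
    "o": ["16"],
    "t": ["12", "35"],
}
NULLS = ["90", "91"]  # decoy symbols that stand for nothing


def encrypt(plaintext, rng, null_rate=0.2):
    """Replace each letter with one of its symbols, sprinkling in decoys."""
    symbols = []
    for letter in plaintext:
        if letter not in KEY:
            continue  # skip spaces and anything the toy key does not cover
        symbols.append(rng.choice(KEY[letter]))
        if rng.random() < null_rate:
            symbols.append(rng.choice(NULLS))
    return symbols


def decrypt(symbols):
    """Map symbols back to letters, silently dropping the decoys."""
    reverse = {s: letter for letter, options in KEY.items() for s in options}
    return "".join(reverse[s] for s in symbols if s in reverse)


if __name__ == "__main__":
    cipher = encrypt("meet me at noon", random.Random())
    print(".".join(cipher))  # differs from run to run
    print(decrypt(cipher))
```

Anyone holding the key can read the message easily, but a codebreaker without it first has to work out which symbols are decoys and which ones share a letter.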

This can mean a huge amount of work – often involving trial and error – to decode even a small amount of text. It took Cecile Pierrot, a cryptologist at the French National Institute for Computer Science Research (INRIA) in Nancy, France, and her colleagues six months to gradually unravel the key to a 500-year-old letter from Charles V, the Holy Roman Emperor and King of Spain, that had been written using 120 different cipher symbols across three pages. (The decrypted letter revealed Charles V – one of the most powerful men of his time – undone by fear of a plot to kill him. The king was terrified that an Italian mercenary warlord serving the French king, Francis I, was about to assassinate him.)

The Borg cipher is thought to be around 400 years old and contains a mixture of strange cipher symbols and some Latin script on its 408 pages (Credit: Biblioteca Apostolica Vaticana)

Before code-breaking can begin, researchers must first painstakingly transform a handwritten cipher into a digital document that can be fed into code-cracking software. Bad handwriting and fading of the ink can make this task even harder.

Pierrot says it typically takes her a day just to transcribe a two-page letter containing symbols that are unfamiliar to her.

How AI is helping speed-read secrets

But AI is starting to speed up the process. Michelle Waldispühl, a professor of German linguistics at the University of Oslo in Norway, and her colleagues recently used an online AI platform called Transkribus to transcribe a secret letter written by nobleman Sigismund Heusner von Wandersleben to the Swedish Lord High Chancellor Axel Oxenstierna in 1637. It was sent at the height of the 30 Years' War, a religious conflict that would ultimately claim millions of lives and devastate huge swathes of Europe.

The tool has been trained on various languages, scripts and handwriting styles that cover several centuries. After the image of a document is uploaded to the system, the AI detects blocks of text and individual lines before scanning the whole text character by character to turn it into a digital form.

Although some manual corrections were needed, the tool worked quite well on Von Wandersleben's letter as it was only partly encrypted using numbers separated by dots that were neatly written with clear spaces between them. Other parts were not coded and simply written in 17th-Century German script.

Existing AI transcription platforms often struggle when manuscripts are encrypted with unusual characters, such as invented signs, astrological symbols or numbers that are written in an odd way. But Megyesi, Waldispühl and their colleagues are developing their own AI tool to turn handwritten historical texts with obscure symbols or scripts into machine-readable documents as part of the multinational Descrypt project.

"We are developing more adaptable models trained and tested across a broad range of scripts, alphabets and symbolic repertoires," says Megyesi.

Many archives and libraries around the world contain encrypted texts that may contain valuable historical information (Credit: Getty Images)

Once a secret document has been transcribed, the detective work can begin. At the moment, cryptologists often rely on specially designed non-AI software, which uses algorithms to try to determine what cipher was used and to break the code. Simple ciphers can often be cracked by analysing the frequency of the symbols used and matching them to letters of the alphabet that appear at the same rate in a language. In English, for example, the letter E is the most common while Z, Q and X are among the least frequent.
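The frequency-matching idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name is hypothetical, the letter ordering is one commonly quoted approximation for English that varies from corpus to corpus, and real code-cracking software is considerably more sophisticated.

```python
from collections import Counter

# Approximate ordering of English letters from most to least frequent.
# The exact order depends on the corpus; this is an assumption for the sketch.
ENGLISH_BY_FREQUENCY = "etaoinshrdlcumwfgypbvkjxqz"


def guess_key(cipher_symbols):
    """Pair the most common cipher symbols with the most common letters."""
    counts = Counter(cipher_symbols)
    ranked = [symbol for symbol, _ in counts.most_common()]
    return dict(zip(ranked, ENGLISH_BY_FREQUENCY))


if __name__ == "__main__":
    # Toy input: the symbol "X" is the most frequent, so it is guessed to be "e".
    print(guess_key(list("XXXXXYYYZ")))
```

On a short text the first guess is rarely right for every symbol, so it serves as a starting point for trial and error, not as a finished solution.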

But in Von Wandersleben's letter from the frontlines of the 30 Years' War, for example, he used up to eight different symbols to represent the letter E. That meant trial and error, as well as Waldispühl's knowledge of old German, were needed to gradually unpick the code.
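A small Python sketch shows why spreading one letter across several symbols blunts a simple frequency count. The sample sentence, the symbol names "E0" to "E7" and both helper functions are invented for illustration and have nothing to do with the actual letter.

```python
from collections import Counter
from itertools import cycle


def encode(text, homophones_for_e):
    """Toy encoding: "e" rotates through several symbols, other letters stay put."""
    e_symbols = cycle([f"E{i}" for i in range(homophones_for_e)])
    return [next(e_symbols) if ch == "e" else ch.upper()
            for ch in text if ch.isalpha()]


def top_symbol_share(symbols):
    """Fraction of the text taken up by the single most common symbol."""
    return Counter(symbols).most_common(1)[0][1] / len(symbols)


if __name__ == "__main__":
    text = "the secret letter never reached the western regiment"
    print(top_symbol_share(encode(text, 1)))  # one symbol for E stands out
    print(top_symbol_share(encode(text, 8)))  # eight symbols hide it
```

With a single symbol for E, that symbol dominates the count. With eight, each E symbol appears only once or twice, and another letter becomes the most frequent instead, which sends a naive frequency match down the wrong path.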

"It was very much back and forth between the machine and the human validator," says Waldispühl. "Maybe at some point AI can do it completely independently."

Hidden behind the cipher were Von Wandersleben's warnings about the threat posed by factions of Sweden's Protestant allies in the war. He told Oxenstierna that he had been forced to make strategic retreats from the conflict after being told about a conspiracy among his allies, including Lord Franz Heinrich of Saxony.

Reopening cold case codes

Megyesi and her team are now exploring how AI could skip the transcription stage altogether, simply by analysing photos of the pages to decipher secret messages. They recently showed how the approach could work for simple codes, where every letter is replaced by a single symbol.

They tested the system on a 105-page manuscript they had already decoded, known as the Copiale cipher, which details the rituals, rules and ideals of an 18th-Century German secret society. By training the AI on generic handwriting, followed by images of specific lines from the cipher and the corresponding, decoded German text, the system was able to accurately decipher parts of the text it hadn't seen before.

Such a system could be especially useful when the underlying language of a cipher is unknown.

"This opens up exciting possibilities for rare and non-standard writing systems," says Megyesi. "The ultimate goal is to combine transcription and decipherment in one single step."

The symbols on the 4,000-year-old Phaistos disc – found in the remains of a Minoan palace on Crete – remain largely undeciphered (Credit: Getty Images)

Waldispühl and her colleagues have been scouring old archives in search of cipher scripts to compile into a database. This could prove vital as a way of gathering sufficient data to train an AI capable of cracking codes. Large language models that underpin AI chatbots such as ChatGPT are trained on trillions of words from books, articles and websites. Finding equivalent amounts of data for code cracking is challenging.

Amongst the material they have collected are 400 mysterious postcards written in cipher script from the late 1800s to early 1900s. The few scraps decoded so far reveal some of these to be love letters written in German.

Megyesi's team have used their work to create an AI chatbot-style tool that combines transcription and decryption in a single step. The chatbot combines algorithms for decryption trained on pairs of cipher characters and the text they represent with large language models trained on historical texts from different time periods to help provide clues about a code. Image recognition algorithms, trained on annotated handwriting, are also being incorporated. The AI tool will also be able to improve itself by incorporating corrections from experts that use it.

The idea would be that researchers, or even the public, could give the chatbot a coded, historical text and have it reveal what is written.

When Megyesi and her colleagues tested their AI chatbot with the Borg cipher, they found it could transcribe and decode a 500-symbol extract in a little over 29 minutes. It even provided an English translation, documented the process and explained why the solution was plausible. This is important to make sure that the AI is not hallucinating or inventing interpretations.

The team also recently tested the system with two other ciphers they had previously decoded which represent different time periods, languages, types of secret codes and levels of complexity. It quickly decrypted them too, showing that it is capable of tackling a range of ciphers.

"AI helps most with scale, speed, pattern discovery and integration of tasks," says Megyesi.

Such AI tools could be key to cracking historical ciphers that have been elusive to date. They will also help with ancient texts written in scripts that nobody can read today. The 4,000-year-old Phaistos Disc from Crete, for example, remains undeciphered, as does the Minoan script known as "Linear A".

"What excites me is not only the possibility of solving one specific historical puzzle, but the prospect of creating methods that can assist researchers across many different cases," says Megyesi.
