Could you briefly introduce yourself and describe your research profile?
I am part of the international Regesta Imperii project, which, among other things, is dedicated to making the documents of Sigismund of Luxembourg accessible. I have been working on the project for over ten years, preparing regesta of documents in German, and now I am also working on a GA ČR grant. I have long focused on late medieval diplomatics and archival theory, which today inevitably includes digital archiving.
Your dissertation focused on mortgage deeds. How did you come across this topic?
It was a coincidence. At one of the first project meetings, we came across a suspicious document. Project leader Petr Elbel noted that it could be a forgery, and I based my entire dissertation on this case. It turned out that no one had yet systematically analyzed Sigismund's mortgage deeds.
How did you conduct this analysis?
I had to locate the documents, identify them, transcribe them, and compare their forms and characteristics. Over 500 documents have been preserved for Czech recipients, but only about a tenth of them are originals; the rest are known to us only from brief records.
So, did this process also lead you to identify the forgery?
Yes, it was a traditional analysis of internal and external characteristics. The document was suspicious because of the way Sigismund's name was written, and also because of its script and parchment. In addition, it was a mortgage deed for church estates, which Sigismund was not legally allowed to pledge at that time. It was supposed to have been issued in the 1420s, when he used the Hungarian seal for this type of document, but it bore the imperial seal. That in itself pointed to its problematic nature.
Looking back, what did these topics give you and how did they influence your current research direction?
Thanks to my work on mortgage deeds, I gained a deeper understanding of the reign of Sigismund of Luxembourg. My colleagues and I gradually processed all of his deeds stored in Czech archives. Today, each of us focuses on a different region: part of the team works on Munich, others on Poland or Italy. I was assigned Nuremberg, which I have been working on since 2019. Our latest grant is based on the Nuremberg deeds.
What are you working on now? Where is your research headed?
I am currently focusing on analyzing documents for the imperial city of Nuremberg, one of the most important centers of Sigismund's empire. In addition to a number of privileges, the so-called missive books, i.e., council correspondence, have also been preserved here. We are investigating how the city communicated with the monarch.
The collection is so extensive that traditional methods are insufficient. That is why we use digital humanities approaches with the vision that they can also be applied to other Sigismund documents in other archives. In Sigismund's case, there are tens of thousands of documents in total—the oldest edition contains over 12,000. Such a volume cannot be handled by a single researcher, which is why digital processing is essential.
Can you give specific examples of the digital humanities methods you use?
At the beginning, we decided to go in two directions. The first is automated text transcription using HTR (Handwritten Text Recognition) technology. In this regard, we are collaborating with the ERC grant "From Digital to Distant Diplomatics" (DiDip), led by Professor Georg Vogeler at the University of Graz. For the transcription we use Transkribus, a tool that enables automatic recognition of handwritten texts.
It is often said that medieval manuscripts are a tough nut to crack. In our case, however, it is not that complicated: our colleague Tobias Heil brought back from a winter school in Vienna a well-trained HTR model for German-language texts from the second half of the 15th century. He then trained it further on Sigismund's documents. Since these are chancery documents, the texts are relatively easy to read. After the first training, the model achieved an error rate of around 3%, which is an exceptionally good result. This allows us to obtain text data very quickly.
The second approach is automated analysis. In theory, the quantity of documents is small enough that we could analyze them using traditional methods, but we decided to use this opportunity to train sophisticated search algorithms. Our goal is to create a well-processed corpus that will be available to other researchers.
The problem is that such analyses encounter a lack of comparative data. Many researchers give up on data preparation precisely because they have nothing to compare it with. If we repeated this approach over and over again, we would not get anywhere. That is why we are trying to create a basic corpus that will enable further research and open the way to new methods in diplomatics and archiving.
Is network analysis also relevant to your research?
Yes, we tried to use it in a project focused on Sigismund's party in Hussite Bohemia. However, it turned out that working with medieval data has its limitations—mentions of individuals tend to be sporadic and random, which complicates their collection and cleaning. In the end, we opted for a more traditional approach: we identified Sigismund's supporters and prepared their biographies.
We collaborated with the Center for Medieval Studies at the Institute of Philosophy of the Czech Academy of Sciences in Prague, which began building an online database of documents from the Hussite period. The digital edition differs from the traditional one—instead of fixed text, it allows access to different layers of the document and continuous updating of data. This paves the way for dynamic editions that can also be used for further research.
How do you think network analysis and other methods of digital humanities complement traditional historiography?
While working on Sigismund's documents in Nuremberg, we debated what digital tools could really offer us. Many questions can be addressed using traditional methods, so it is crucial to identify when the use of digital methods makes sense and adds value.
Network analysis can reveal connections that would otherwise escape the researcher. Sometimes it brings unexpected results that can be interpreted retrospectively.
The second level involves working with large data sets. In the past, we had a community genealogy database project based on users transcribing registry records. That made sense at the time, but today we would use automatic indexing or text recognition and simply approach it in a slightly different way.
It is always a balance between the quantity of data and how clean it is. Involving lay users will bring a large amount of information, but not always of perfect quality. Moreover, projects evolve: with new tools, the original approaches are often surpassed.
In recent years, you have been involved in both formulating the Vision for Czech Archiving and preparing the publication Digital Archiving. How do these two projects differ, and what trends do you think will determine the future of the field?
Vision for Czech Archiving originated as a conceptual text that opened up debate on the role of archives in society. This was followed by an extensive analysis of the current situation, based on one of the largest questionnaire surveys in Czech archiving. The results led to recommendations, discussion of new legislation, and the preparation of a concept for the development of the field over the next ten years.
By contrast, the publication Digital Archiving was a specialist text for a university course. Archiving is traditionally linked to auxiliary historical sciences, which is why teaching has long focused on the historical aspect. We decided to strengthen the segment of digital archiving and information technology, but we encountered a lack of literature. That is why we prepared a publication that filled this gap.
My role here was primarily editorial—I coordinated the team and compiled a glossary of basic terms. Today, these terms are widely known, so we are working on a new edition of the text that will take into account legislative changes and current topics, such as the use of artificial intelligence in archives.
The publication focused on several key segments:
- electronic document management and e-government,
- digital archive architecture and electronic archive management,
- electronic processing of analog and digital documents,
- digitization and accessibility of archives,
- recording of archives in internal systems.
These areas show that digital archiving is one of the strongest trends today. The future of the field will be determined by the ability to combine traditional practices with new technologies and open up archives to society in a digital environment.
At Masaryk University, you teach a CORE course entitled Archives as Interdisciplinary Databanks for the 21st Century. How do you introduce it to students from other disciplines?
The course presents the archive primarily from the user's perspective. Each field can find its own source of information there, although a large part of archival holdings is still analog and requires specific search strategies. Students become familiar with the archival network, learn various search methods, and explore thematic areas ranging from science and technology to landscape and personal data. An expert from the National Archives also participates externally, introducing the archiving of scientific data and the management of their own research materials.
What obstacles and risks await Czech archiving?
The biggest weakness is the fragmentation of digitization projects in public archives, which in the past created their own digital strategies. The National Archives is now trying to unify this situation through the National Archives Portal.
In addition, new challenges are emerging. Inspiration comes from a project by the National Archives, which tested the processing of the audio archive of Czech BBC broadcasts using tools for machine transcription of speech. Simply listening to the audio archive would take an estimated 300 days. Artificial intelligence thus paves the way for effective data access.
Another challenge is working with analog documents, where digitization does not always make sense, and also selecting from the huge number of born-digital documents. Their mass production requires the use of AI tools to help pre-select and decide what should be preserved.
How do you perceive the role of technology in historical research?
In archival practice, technology represents enormous potential, much like machines did in the 19th century, changing the nature of work. It can advance the field and raise its profile in society. Less than half of the archival materials in Czech archives have been processed, and technology is paving the way to showcase their richness.
In historical research, its use is more complex. Technology is a double-edged sword—it is necessary to know why we use it. It makes sense where the human approach is insufficient, for example in modern history with its inexhaustible amount of material. Contextual analysis has great potential, for example when searching old newspapers, where the user formulates a question and the tool offers corresponding results.
What are your plans for the near future?
The main project is the aforementioned GA ČR grant, focused on the documents of Sigismund of Luxembourg in Nuremberg, which will continue for another two years.
The biggest challenge is finding suitable methods for processing. At the summer school of auxiliary historical sciences, my colleagues from Graz and I discussed how to use new methods in processing the registers of Sigismund of Luxembourg's documents. In addition to analyzing the Nuremberg privileges, we want to test methods that could be applied to his documents throughout Europe.
The potential is enormous—there are thousands of Sigismund's documents. Even if eight researchers worked on the project, the work would last them a lifetime. That is why we are looking for ways to do it faster and more efficiently, and technology may be the key.