exploratory programming

Digital Oulipo: Programming Potential Literature

I am pleased to announce the publication of my article, Digital Oulipo: Programming Potential Literature, in the current issue of Digital Humanities Quarterly! It recounts the project I completed at the Center for Digital Humanities @ Princeton: its design, its difficulties, and its results (both expected and unexpected). Most importantly, in it I offer my two cents on the use of exploratory programming in digital humanities scholarship. Enjoy!

http://digitalhumanities.org/dhq/vol/11/3/000325/000325.html

Digital Humanities Summer School

Thanks to a travel grant from the Center for Digital Humanities @ Princeton, I have just completed the intensive week-long Digital Humanities summer school at the OBVIL laboratory at La Sorbonne. OBVIL stands for "Observatoire de la vie littéraire," or the Observatory of Literary Life. After my Digital Oulipo project and continued work on the Oulipo Archival Project, I could not agree more with the metaphor of an observatory. Digital humanities methods allow researchers to examine texts from a distance, which complements the "close reading" of traditional literary scholarship. Now more than ever, I believe humanities scholarship needs both perspectives to succeed.

In this intensive and rich program, I was able to keep developing the XML-TEI skills I had been learning through the Oulipo Archival Project. I also discovered exciting new software such as TXM, Phoebus, Médite, and Iramuteq, and learned how these tools can be used to study large corpora of text. My favorite part of the program was that it offered a specifically French introduction to European developments in the digital humanities, which broadened my perspective on the discipline.

Here is a brief summary of what I learned day by day. I am happy to answer any specific questions by email. Feel free to contact me if you want to know more about the OBVIL summer school, the specific tools discussed there, or digital humanities in general.

Day 1

The first day of the summer school was a general introduction to the history of digital humanities methods and to establishing a corpus for study with these methods. It was especially interesting for me to learn the history of methods I have been experimenting with for months. I had no idea that the Text Encoding Initiative (TEI) had been launched in 1987, before I was even born, as a new form of "literate" programming.

Surprisingly, the most useful workshop was a basic introduction to the various states of digital texts. While I already knew most types of digital documents as a natural byproduct of using computers in my day-to-day life, it was useful to learn the specific terminology (in French, even!) used to describe these forms of text, along with the advantages and disadvantages of each. For instance, while I knew that some PDFs are searchable and others are not, it was still useful to discuss how to create each kind and how to convert from one format to another.

Day 2

The second day of the summer school began with the not-so-simple question, "What's in a word?" In the following sessions, we learned about everything from analyzing word frequencies in texts to processing natural language automatically through tokenization (segmenting text into elementary units), tagging (determining the characteristics of those units, such as part of speech), and lemmatization (identifying the base form of each word). We then had workshops introducing ready-made tools for processing language automatically. We did not discuss NLTK, however, which I am currently using to program the S+7 method for my Digital Oulipo project, most likely because NLTK requires a basic understanding of programming in Python, which was beyond the scope of this short summer school.
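To make tokenization and word frequency concrete, here is a minimal sketch in plain Python rather than any of the workshop tools. The tokenization rule (a token is any run of letters, so French elisions like "l'homme" split into "l" and "homme") is a naive assumption of mine; real tokenizers, taggers, and lemmatizers are far more careful, and tagging and lemmatization are not attempted here.

```python
import re
from collections import Counter

def tokenize(text: str) -> list[str]:
    """Split text into lowercase tokens: runs of letters (accents included).

    [^\\W\\d_] means "a word character that is not a digit or underscore",
    i.e. a letter in Python's Unicode-aware regex engine.
    """
    return re.findall(r"[^\W\d_]+", text.lower())

def word_frequencies(text: str, top: int = 5) -> list[tuple[str, int]]:
    """Return the `top` most frequent tokens with their counts."""
    return Counter(tokenize(text)).most_common(top)

sample = "Le chat voit le chien, et le chien voit le chat."
print(tokenize("L'homme"))       # tokens split at the apostrophe
print(word_frequencies(sample))  # "le" comes first with 4 occurrences
```

Function words such as "le" dominate raw counts, which is one reason frequency studies usually rely on tagging or stop-word lists before drawing conclusions.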

The second half of the day was an introduction to text encoding: how it works and why it is useful for analyzing large corpora. While I was already familiar with everything covered here, it was still interesting to hear about applications of TEI beyond the Oulipo archive, especially to highly structured texts such as 17th-century French theater.

Day 3

This day was extremely technical. First we looked at co-occurrences of characters in Phèdre as an example of network graphs. Since the main technical work had been done for us, it was somewhat frustrating to be confronted with a result we had played no part in creating. As a former mathematician, I knew how to read a network graph, but many other students did not and took its spatial organization to be meaningful in itself. This demonstrates a potential pitfall in digital humanities research: one needs a proper technical understanding of the tools and how they function in order to interpret their results accurately.
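Here is a minimal sketch of how such a co-occurrence network can be computed, assuming (as I understand the general technique) that two characters are linked whenever they appear in the same scene, weighted by how many scenes they share. The scene lists and the names A to D are hypothetical placeholders, not data from Racine's play, and the DOT output is my own choice of format for Graphviz.

```python
from collections import Counter
from itertools import combinations

# Hypothetical scene lists: which characters are on stage in each scene.
SCENES = [
    {"A", "B"},
    {"A", "C"},
    {"A", "B", "C"},
    {"C", "D"},
]

def cooccurrence_edges(scenes):
    """Weight each pair of characters by the number of scenes they share."""
    weights = Counter()
    for cast in scenes:
        for pair in combinations(sorted(cast), 2):
            weights[pair] += 1
    return weights

def to_dot(weights):
    """Emit undirected DOT source. Only nodes, edges, and weights carry
    information; where a layout program places the nodes is a drawing
    decision, not a finding about the play."""
    lines = ["graph cooccurrence {"]
    for (u, v), w in sorted(weights.items()):
        lines.append(f'    "{u}" -- "{v}" [penwidth={w}, label="{w}"];')
    lines.append("}")
    return "\n".join(lines)

print(to_dot(cooccurrence_edges(SCENES)))
```

Everything the graph "says" is contained in the weighted pairs; the picture adds no information beyond them, which is exactly why reading meaning into node positions is risky.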

In addition to network graphs, we discussed how to use the XPath feature in Oxygen (an XML editor) to count various elements in classical theater, such as each character's spoken lines, verses, or the scenes in which a character appears. Once again, it was interesting to see how a computer could take over such tedious manual labor and how the results could interest a scholar, but interpreting the statistical aspects of large corpora is tricky work for someone whose last statistics class was in high school. This gave me the idea to create a course that would teach students, through workshops, how to use these tools and understand their results.
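The same kind of counting can be sketched outside Oxygen with Python's standard-library ElementTree, which supports a limited subset of XPath. This is a minimal sketch assuming the common TEI drama encoding (`<sp who="...">` containing `<speaker>` and verse lines `<l>`, inside `<div type="scene">`); the fragment is hypothetical and abridged, with the verse text replaced by "...". In an editor, a roughly equivalent XPath 1.0 expression would be `count(//tei:sp[@who='#phedre']/tei:l)`, assuming the `tei` prefix is bound to the TEI namespace.

```python
import xml.etree.ElementTree as ET
from collections import Counter

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical, heavily abridged TEI fragment (verse text replaced by "...").
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body>
  <div type="act" n="1"><div type="scene" n="3">
    <sp who="#phedre"><speaker>PHÈDRE</speaker><l>...</l><l>...</l></sp>
    <sp who="#oenone"><speaker>ŒNONE</speaker><l>...</l></sp>
  </div></div>
</body></text></TEI>"""

def verses_per_speaker(tei_xml: str) -> Counter:
    """Count <l> (verse line) elements inside each <sp>, keyed by @who."""
    root = ET.fromstring(tei_xml)
    counts = Counter()
    for sp in root.iterfind(".//tei:sp", TEI_NS):
        counts[sp.get("who")] += len(sp.findall("tei:l", TEI_NS))
    return counts

def scenes_per_speaker(tei_xml: str) -> Counter:
    """Count the scenes in which each @who speaks at least once."""
    root = ET.fromstring(tei_xml)
    counts = Counter()
    for scene in root.iterfind(".//tei:div[@type='scene']", TEI_NS):
        counts.update({sp.get("who") for sp in scene.iterfind(".//tei:sp", TEI_NS)})
    return counts

print(verses_per_speaker(SAMPLE))
print(scenes_per_speaker(SAMPLE))
```

The counts are easy to produce; as noted above, deciding what they mean is the hard part.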

Day 4

This was another workshop on ready-made tools, in which we discussed using OBVIL's programs Médite and Phoebus to edit online texts more efficiently and to find differences between editions. This was very interesting, but probably more useful for publishing houses than for graduate students.
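To illustrate the general idea of comparing editions, here is a minimal word-level diff using Python's standard `difflib`. This is a generic stand-in, not Médite's own algorithm; the two "editions" are made-up sentences for illustration.

```python
import difflib

def compare_editions(old: str, new: str) -> list[tuple[str, str, str]]:
    """List word-level differences between two versions of a text.

    Returns (operation, old words, new words) for each non-matching
    stretch, where operation is 'replace', 'delete', or 'insert'.
    """
    a, b = old.split(), new.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    changes = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op != "equal":
            changes.append((op, " ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return changes

first = "the little peas slept in their pod"
second = "the three little peas slept in the pod"
for change in compare_editions(first, second):
    print(change)
```

Splitting on whitespace keeps the sketch short; a real comparison of editions would also need to handle punctuation, line breaks, and moved passages.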

The rest of the day was meant to introduce us to textometry using TXM, but there were so many technical issues with the computers provided by the university that we spent the entire session downloading the software onto our personal laptops. This was not only frustrating but ironic: one would think that a digital humanities summer school run mostly by computer scientists would not have such technical difficulties.

Day 5

The final day of the program (Friday the 9th was devoted to discussing our personal projects with the staff) continued the work on TXM. Since my section had had such problems the previous day, I decided to switch to the other group. This was a good decision, as the head of that session took a more pedagogical approach, assigning a series of small exercises to introduce us to TXM. By experimenting with tokenization using TreeTagger and with word concordances, we began to write a bit of code that could parse a text and find specific groups of words.
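A concordance ("keyword in context") is simple enough to sketch in plain Python. This is not TXM or its query language, just a minimal illustration of the technique under a naive letters-only tokenization; the sample sentence is invented.

```python
import re

def tokenize(text: str) -> list[str]:
    """Lowercase letter-only tokens (a naive stand-in for a real tokenizer)."""
    return re.findall(r"[^\W\d_]+", text.lower())

def concordance(text: str, keyword: str, width: int = 3) -> list[str]:
    """Keyword in context: each hit with `width` tokens on either side."""
    tokens = tokenize(text)
    keyword = keyword.lower()
    lines = []
    for i, token in enumerate(tokens):
        if token == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            # Right-align the left context so the keywords line up.
            lines.append(f"{left:>30} [{token}] {right}")
    return lines

for line in concordance("un petit pois vert et un autre pois jaune", "pois", width=2):
    print(line)
```

Aligning every occurrence in one column is what makes a concordance useful: patterns in the surrounding words become visible at a glance.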

This introduction was practical and hands-on, but I wish there had been more of it. While I now know roughly how to use TXM to analyze texts, I have no experience formulating the questions that such techniques might help me answer. This seems to me the key to effective digital humanities scholarship: asking a solvable question and knowing which tools can help you resolve it.

Digital Oulipo: Graph Theory for Literary Analysis

Raymond Queneau published Un conte à votre façon (literally, "a tale your way") in 1967, amid the enthusiasm for Russian formalism spurred by the translations of Vladimir Propp and his contemporaries. Propp's premise is that all folktales can be broken down into 31 basic narrative functions following an initial situation. The "à votre façon" of the title refers to a potential reader, who can compose the tale as he or she sees fit from a set of binary choices provided by Queneau. This "tree structure" comes from graph theory, a mathematical field that fellow Oulipian Claude Berge was helping to develop at the time. Queneau's tale can, incidentally, be represented as a graph.

Figure: A graphical representation of the text, designed by Queneau and published in a collected volume of the Oulipo.

Queneau's story first gives the reader a choice between the story of three little peas, three big skinny beanpoles, or three average mediocre bushes. The following choices are all binary, and mostly stark oppositions. This is also a feature of algorithms: a binary choice must offer mutually exclusive alternatives, or the system becomes contradictory. For instance, either the reader chooses to read the tale of the three peas or he does not. Should the reader decline this first option, he or she will find that the two alternatives offer rather meager results; refusing all three terminates the program almost immediately.

As with these preliminary choices, most nodes in Queneau's short tale are superficial or contradictory, giving the reader only an illusion of choice. Although the tale and its various itineraries can be visualized as a flow chart, it leaves very little freedom to the reader. The genius of the text lies in the simultaneous display of all possible options, which gives the reader unparalleled insight into the structure (and shortcomings) of this experimental tale.

By the end of Learn Python the Hard Way, I was able to write my own Un conte à votre façon program that let a reader move through a "Map" with a pre-established number of scenes. While I was proud of having written a program, I was still not satisfied: I wanted the graph, and I wanted it to interact with the program somehow.

My technical lead, Cliff Wulfman, introduced me to Graphviz, an open-source graph (network) visualization tool that can be used from Python. In graph theory, a graph is defined as a set of nodes together with a set of edges connecting pairs of those nodes. Likewise, to make a graph in Graphviz, I had to define all of its nodes and edges; the program then generates a spatially efficient layout of the graph. As an exercise, I made a graph of the famous graph theory problem of the Seven Bridges of Königsberg.
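The Königsberg exercise can be sketched as follows. Rather than relying on the Python `graphviz` package, this minimal sketch writes the DOT source by hand (which Graphviz could then render, e.g. with `neato -Tpng konigsberg.dot -o konigsberg.png`) and checks Euler's criterion: a connected multigraph has a walk crossing every edge exactly once only if it has zero or two nodes of odd degree. The labels are my own assumption: A is the central island, B and C the two river banks, D the second island.

```python
from collections import Counter

# Four land masses and the seven bridges between them (a multigraph:
# A-B and A-C are each joined by two bridges).
BRIDGES = [("A", "B"), ("A", "B"), ("A", "C"), ("A", "C"),
           ("A", "D"), ("B", "D"), ("C", "D")]

def degrees(edges):
    """Count how many edge ends touch each node."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg

def has_euler_walk(edges):
    """Euler's criterion, assuming the graph is connected (Königsberg is):
    a walk using every edge once exists iff 0 or 2 nodes have odd degree."""
    odd = [node for node, d in degrees(edges).items() if d % 2 == 1]
    return len(odd) in (0, 2)

def to_dot(edges, name="Konigsberg"):
    """Undirected DOT source; repeated edges are drawn as separate bridges."""
    lines = [f"graph {name} {{"]
    lines += [f"    {u} -- {v};" for u, v in edges]
    lines.append("}")
    return "\n".join(lines)

print(degrees(BRIDGES))         # every land mass has odd degree
print(has_euler_walk(BRIDGES))  # so no such walk exists
print(to_dot(BRIDGES))
```

All four nodes have odd degree (5, 3, 3, 3), which is Euler's reason why no stroll can cross each bridge exactly once.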

With this graph-theoretical program in my Python arsenal, I was able to make my own graph of Un conte à votre façon. Still unsatisfied, I wanted to integrate the two programs, so Cliff and I decided to give my original program the structure of a graph, allowing it to generate a graph with Graphviz at the end. In my program, nodes now correspond to Queneau's prompts and edges to the reader's choices.

With this structure, I was able to write a program that enters the graph at the first node and, depending on the user's choices, proceeds through the story, printing out the selected nodes and the corresponding edges. The program records the reader's path and, at the end, outputs a graph with that path filled in green, to represent the little peas. While my little program does not exploit the full potential of the intersection of graph theory and literature proposed by Queneau's text, I am very pleased with how it functions. For instance, I can leave to the reader the mathematical exercise of finding the minimum number of read-throughs required to visit every node. While there is still more to be done, the graph my program generates is itself informative: side by side with the text, it lets the reader learn more about the potential of this three-little-peas story.
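The design described above can be sketched roughly as follows. This is a minimal illustration, not my actual program: it covers only a hypothetical toy fragment (the three opening questions), and the node names and abridged prompts are placeholders rather than Queneau's numbering or wording. Edges are keyed by (node, choice); the reader's path is recorded and then written out as DOT source with the visited nodes filled green.

```python
# Hypothetical toy fragment of the story graph: placeholder names and prompts.
NODES = {
    "peas": "Story of the three little peas?",
    "beanpoles": "Story of the three big skinny beanpoles?",
    "bushes": "Story of the three average mediocre bushes?",
    "pea_tale": "(the peas' story continues)",
    "beanpole_tale": "(the beanpoles' story)",
    "bush_tale": "(the bushes' story)",
    "end": "The end.",
}

# Edges are labelled by the reader's choice: (node, choice) -> next node.
EDGES = {
    ("peas", "yes"): "pea_tale", ("peas", "no"): "beanpoles",
    ("beanpoles", "yes"): "beanpole_tale", ("beanpoles", "no"): "bushes",
    ("bushes", "yes"): "bush_tale", ("bushes", "no"): "end",
}

def walk(choices, start="peas"):
    """Follow a sequence of choices from `start`; return the nodes visited."""
    path = [start]
    for choice in choices:
        key = (path[-1], choice)
        if key not in EDGES:
            raise ValueError(f"no {choice!r} edge from {path[-1]!r}")
        path.append(EDGES[key])
    return path

def play(start="peas"):
    """Interactive version: prompt at each node until a node with no exits."""
    path = [start]
    while any(src == path[-1] for src, _ in EDGES):
        choice = input(NODES[path[-1]] + " (yes/no) ").strip().lower()
        if (path[-1], choice) in EDGES:  # ignore answers that match no edge
            path.append(EDGES[(path[-1], choice)])
    print(NODES[path[-1]])
    return path

def to_dot(path):
    """DOT source with the reader's path filled green (for the little peas)."""
    visited, taken = set(path), set(zip(path, path[1:]))
    lines = ["digraph conte {", "    node [style=filled];"]
    for name, prompt in NODES.items():
        fill = "green" if name in visited else "white"
        lines.append(f'    {name} [label="{prompt}", fillcolor={fill}];')
    for (src, choice), dst in EDGES.items():
        extra = ", color=green" if (src, dst) in taken else ""
        lines.append(f'    {src} -> {dst} [label="{choice}"{extra}];')
    lines.append("}")
    return "\n".join(lines)

print(to_dot(walk(["no", "no", "no"])))  # the reader refuses all three
```

Keeping `walk` separate from the interactive `play` makes it easy to replay or test a path without typing answers; the DOT text can be rendered with Graphviz's `dot` command.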

Digital OuLiPo: Learn Python the Hard Way

I wanted to write a brief blog post in praise of this free online Python textbook. Over the January break, so that I could move on to more complicated parts of my project (the Cent mille milliards de poèmes was a fairly basic introduction to programming), my technical lead suggested that I work my way through it.

Written by Zed A. Shaw, this book has been helpful on many levels, not least because it introduced me to using the terminal rather than relying on an outside program. For my Cent mille milliards de poèmes annex, I had primarily been using Aptana Studio, a very powerful piece of software that allowed me to avoid learning the basics of programming. The first few chapters of Learn Python the Hard Way forced me to acquaint myself with the terminal.

Most of the chapters were similar to Codecademy, but working through the exercises outside an online platform and running them in the terminal was more instructive, as was the way the activities built on one another. I now feel more autonomous as a programmer.

So for anyone else looking to learn a new (programming) language, I would highly recommend this free and easy online resource. Anything worth learning is worth learning the right way, and in this case, the “right” way seems to be “the hard way”!

Digital OuLiPo: Coding as Analysis

Given my lack of formal training in programming, the order in which I pursue this project depends on the relative difficulty of each annex. I have therefore chosen to begin with the 3rd and 5th annexes. Annex 3 is a simple exercise that can serve as a basic introduction to programming in Python, while Annex 5 involves no coding at all, only becoming familiar with a ready-made tool called Tinderbox, which can produce hypertexts.

Step 1: Creating the Data Structure of the Cent mille milliards de poèmes

From what I understand so far about Python and object-oriented programming languages, Annex 3 is a simple exercise that I will be able to complete even as a beginner. To program this text, I need to think like a computer. The Cent mille milliards de poèmes is composed of 10 sonnets, each of which, like all sonnets, has 14 lines. In computer science terms, a sonnet is an array: an ordered collection of elements, each of which is assigned a numbered index (here, lines 1-14). Since corresponding lines of each sonnet rhyme with one another, they can be swapped. The whole book therefore becomes an array of arrays, and each verse can be represented by two indices: one indicating which numbered line it is (1-14) and another indicating which poem it belongs to (1-10).

As a former math major, I find this type of thinking familiar. What is new to me is the practical work of creating a program to manipulate these verses. In mathematics, for instance, you can label these verses however you want; in Python, list indices begin at 0, not 1. Item 0,0 in my program therefore corresponds to the first verse of the first poem.
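Here is a minimal sketch of this array of arrays, with placeholder strings standing in for Queneau's verses (the real data would hold his 10 × 14 lines). I have assumed an outer list of sonnets and inner lists of lines, so `SONNETS[0][0]` is the first verse of the first poem; the helper converts the 1-based numbering a reader would use into Python's 0-based indices. The function names are hypothetical.

```python
# Placeholder verses: SONNETS[s][l] is line l of sonnet s, both 0-based.
SONNETS = [[f"sonnet {s + 1}, line {l + 1}" for l in range(14)] for s in range(10)]

def verse(sonnet: int, line: int) -> str:
    """Look up a verse using the 1-based numbering a reader would use."""
    return SONNETS[sonnet - 1][line - 1]

def compose(choices: list[int]) -> list[str]:
    """Build a sonnet: choices[i] is the (1-based) sonnet supplying line i + 1."""
    if len(choices) != 14 or not all(1 <= c <= 10 for c in choices):
        raise ValueError("need 14 choices, each between 1 and 10")
    return [verse(c, line) for line, c in enumerate(choices, start=1)]

print(SONNETS[0][0])  # the first verse of the first poem
print("\n".join(compose([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1, 2, 3, 4])))
```

Letting the reader supply all 14 choices, as `compose` does, is one way of keeping the reader involved instead of handing the choice to the machine.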

Step 2: Performing Operations

Once I have created the data structure for my program based on these insights, I will be able to write a program that generates "random" poems from Queneau's prefabricated elements. This is my second step: performing operations on these arrays. I have experimented with letting the reader choose one line of one sonnet, and I am now trying to come up with a satisfying way to generate pseudo-random sonnets. Since a computer cannot generate truly random numbers, I am designing my own pseudo-random number generators (PRNGs) that will bear some relation to the reader, or user. As the Oulipo is starkly anti-chance, this seems promising to me, since computers are utterly incapable of generating truly random sonnets from this collection. Instead, I will create a unique "key" based on user input. Below is some brainstorming:

  • Using the current date and time to determine which verses to pull. That way, the reader can generate a new sonnet every second. However, I will need to subject this date to some calculation, as a simple 14-digit string of the time and date (for instance, hh-mm-ss-MM-DD-YYYY) will produce very similar poems from year to year, decade to decade, century to century, and millennium to millennium. Additionally, the first digits of the hours, minutes, and seconds do not take every value between 0 and 9.
  • Pulling a number based on the computer hardware on which the poem is produced, but such numbers are often longer than 14 digits.
  • Using the coordinates of Earth in the galaxy to generate such a number.
  • Asking the reader to input data about him/herself (age, height, weight, etc.) to generate the numbers. This would almost certainly lead to the same problem as the first bullet point, since similar inputs would produce similar poems.
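One way to address the similarity problem from the first bullet point is to scramble the key before using it. The sketch below uses a cryptographic hash (SHA-256 from Python's `hashlib`) as a swapped-in technique; it is not a PRNG I designed, and the function names and placeholder verses are hypothetical. The result stays fully deterministic (the same key always yields the same sonnet), which seems in keeping with the Oulipo's distrust of chance, but keys that differ by a single second, or a single centimeter of height, give effectively unrelated digits, and every digit can take all ten values.

```python
import hashlib
from datetime import datetime

# Placeholder verses standing in for Queneau's 10 sonnets of 14 lines.
SONNETS = [[f"sonnet {s + 1}, line {l + 1}" for l in range(14)] for s in range(10)]

def time_key(moment: datetime) -> str:
    """The hh-mm-ss-MM-DD-YYYY string from the first bullet point."""
    return moment.strftime("%H-%M-%S-%m-%d-%Y")

def key_to_digits(key: str, n: int = 14) -> list[int]:
    """Scramble any key into n digits (0-9) via SHA-256."""
    number = int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest(), "big")
    digits = []
    for _ in range(n):
        number, digit = divmod(number, 10)  # read off base-10 digits
        digits.append(digit)
    return digits

def sonnet_from_key(key: str) -> list[str]:
    """Digit i picks which of the 10 sonnets supplies line i."""
    return [SONNETS[d][line] for line, d in enumerate(key_to_digits(key))]

print("\n".join(sonnet_from_key(time_key(datetime.now()))))
print("\n".join(sonnet_from_key("age=30;height=170")))  # a reader-supplied key
```

A 256-bit digest has about 77 decimal digits, so taking 14 of them leaves plenty of room; the same reader typing the same answers will always get the same poem back.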

Exploring these possibilities is putting me in an Oulipian mindset, helping me understand their conception of chance by forcing me to create the way they do.

Insights into Computer Influence

I expect that learning to program the Cent mille milliards de poèmes will help me as a researcher understand the text on a deeper level. The paratextual elements (an epigraph by Alan Turing and a "user's manual") make clear that the author himself was inspired by computers. Indeed, he had someone program it around the time it was first published. Understanding how computers function will therefore help me grasp Queneau's intentions for this odd volume.

I have never been satisfied with online versions of this text that require the reader to push a button and then get a “random” poem out of a hundred thousand billion. And even now that I understand that these versions by their very nature cannot produce truly random poems, I am still dissatisfied. Regardless of the method, these versions still seem to limit reader involvement, which I suspect is key.

The Oulipo members themselves grappled with this same issue as they created computer programs of early texts such as this one. They were interested in the potential of computers as tools for text production, but always took them with a grain of salt. Making my own computer program has not only helped me understand how computer programs function, teaching me the basics of coding, but has also made me more critical of the Oulipian notion of chance and how it is to be understood.

Digital OuLiPo: The Plan

As a former math major now completing a Ph.D. in literature, I was naturally drawn to the term "digital humanities." Indeed, in my own research practice, I attempt to use unconventional tools to examine an experimental writing workshop called the Oulipo. With this in mind, I decided to embark on a project with the Center for Digital Humanities @ Princeton, which I hope will teach me basic concepts of exploratory programming as well as create an interactive addition to my dissertation work.

Below is a summary of my initial expectations for this project, which I intend to complement with periodic blog posts in the hope of inspiring other academics (or even amateurs) to pursue their own projects. The description that follows is a modified version of the initial proposal I submitted this month; however, I expect that much will change, as the process of learning to code is already forcing me to refine my original goals and nuance the scope of the project.

Chapter 1: Set Theory

The first chapter of my dissertation deals with set theory, a late 19th- and early 20th-century mathematical development that attempted to replace the original foundation of mathematics (the number) with a new language of "sets," or collections of objects called "elements." Set theory was popularized in France by an odd, semi-clandestine group of former École Normale Supérieure mathematics students who published under the pseudonym Nicolas Bourbaki starting in the 1930s. In the period following World War II, this group's influence extended beyond mathematicians, reaching groups such as the Oulipo. The focus of my first chapter is to examine the extent of this influence and to understand exactly how the Oulipo has applied set theory to literature, as well as what distinguishes this work from other movements of the time that were influenced by Bourbaki, such as structuralism.

With this in mind, I have decided that my first digital annex will treat texts as sets of words, or perhaps of other elements. By working with canonical texts (in both French and English) through an interface for basic set-theoretical operations, readers can learn to treat literature mathematically. An obvious example would be letting readers find intersections in vocabulary, for instance the words common to plays by Racine and Corneille. This prefabricated humanities computing would also introduce scholars to a specific brand of digital humanities scholarship.
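Python's built-in sets map directly onto these operations. Here is a minimal sketch, assuming a text's "elements" are its lowercased words (a naive letters-only rule). The two snippets are invented placeholders standing in for full plays, not quotations from Racine or Corneille.

```python
import re

def vocabulary(text: str) -> set[str]:
    """Treat a text as the set of its (lowercased) words."""
    return set(re.findall(r"[^\W\d_]+", text.lower()))

# Invented snippets standing in for two complete plays.
text_a = "le roi aime la reine"
text_b = "la reine craint le roi et son fils"

a, b = vocabulary(text_a), vocabulary(text_b)
shared = a & b      # intersection: words used in both texts
only_a = a - b      # difference: words found only in the first text
either = a | b      # union: the combined vocabulary
exclusive = a ^ b   # symmetric difference: words in exactly one text

print(sorted(shared), sorted(only_a), len(either), sorted(exclusive))
```

With whole plays, the same four operators would work unchanged; the interesting decisions lie in what counts as an element (word forms, lemmas, or something else entirely).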

Chapter 2: Algebra

In this chapter, I understand algebra broadly as the mathematical discipline dealing with symbols and the rules for manipulating them, including basic counting, arithmetic, elementary algebra, number theory, and abstract algebra. For the annex, I have chosen the canonical Oulipian procedure, Jean Lescure's S+7, in which one takes a text and replaces every noun (S = substantif, French for "noun") with the noun found seven entries later in a dictionary of the author's choice. My program will allow the user to experiment with the S+7 on individual texts, as well as with the procedure itself. Some potential avenues: replacing S with another part of speech (verb, adverb, etc.); applying a more general S+n and seeing how different values of n change the result (including an S-7 function so readers can verify the validity of Oulipian S+7s); changing dictionaries; etc.
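A generalized S+n can be sketched in a few lines, under several stated assumptions. The "dictionary" here is a hypothetical toy list containing only nouns, so any word found in it is treated as a noun; a real version would need a part-of-speech tagger (NLTK's `pos_tag`, for example) and a real dictionary. Wrapping around at the end of the list is my own design choice, made so that S-7 exactly undoes S+7.

```python
import re

# Hypothetical toy dictionary: an alphabetical list containing only nouns.
DICTIONARY = ["apple", "bird", "cat", "dog", "egg", "fish", "goat",
              "hat", "ink", "jam", "kite", "lamp", "moon", "nest"]

def s_plus_n(text: str, n: int = 7, dictionary=DICTIONARY) -> str:
    """Replace each dictionary noun with the entry n places later.

    Assumptions: any word in the toy dictionary counts as a noun, replaced
    words come out lowercase, and the dictionary wraps around at the end.
    """
    position = {word: i for i, word in enumerate(dictionary)}

    def replace(match):
        word = match.group(0)
        i = position.get(word.lower())
        if i is None:
            return word  # not in the dictionary: leave it untouched
        return dictionary[(i + n) % len(dictionary)]

    return re.sub(r"[^\W\d_]+", replace, text)

s7 = s_plus_n("the cat sat on the mat")  # "cat" becomes "jam"
print(s7)
print(s_plus_n(s7, -7))                  # S-7 restores the original
```

Changing `n` or swapping in another `dictionary` covers two of the avenues listed above; replacing nouns with another part of speech would require the tagger.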

Chapter 3: Combinatorics

Combinatorics is the branch of mathematics that studies finite or countable discrete structures. Central to the Oulipo and its aesthetics, combinatorics raises questions of probability and entropy, which help us understand the Oulipo's insistent opposition to chance.

My third annex will be a digital edition of Raymond Queneau's first Oulipian text, the Cent mille milliards de poèmes (1961), which allows a reader to combine the corresponding verses of 10 pre-written sonnets to create 100,000,000,000,000 (10^14, or one hundred thousand billion) sonnets. In Queneau's original paper edition, the reader is free to select particular poems, which adds a pedagogical intention to the text. Unlike the electronic versions that the Oulipo created in the 1960s and 1970s and other amateur versions available on the internet, I hope that my version will restore some of this original freedom to the reader of this constrained text.
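The count follows from the multiplication principle: each of the 14 line positions can be filled, independently, by the corresponding line of any of the 10 sonnets. The title states the same number in French units, where a milliard is 10^9 (an English "billion"):

```latex
\underbrace{10 \times 10 \times \cdots \times 10}_{14\ \text{lines}}
  = 10^{14}
  = \underbrace{100\,000}_{\text{cent mille}} \times \underbrace{10^{9}}_{\text{milliard}}
  = 100\,000\,000\,000\,000
```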

Chapter 4: Algorithms

The fourth chapter of my dissertation deals with algorithmic literature, written with computers in mind and often reformatted for them. In its early years, the Oulipo experimented formally with computers, creating interactive electronic editions of various texts. However, this early interest in technology eventually waned, disappearing entirely in 1981 when a tangential group known as the ALAMO[1] (Atelier de Littérature Assistée par la Mathématique et les Ordinateurs, or Workshop for Literature Assisted by Mathematics and Computers) was created by computer scientist Paul Braffort and mathematician Jacques Roubaud. In 2004, the Oulipo released a CD-ROM[2] through Gallimard with interactive computerized editions of several of their texts. Essential to my fourth chapter is understanding the nature of these early examples of proto-digital humanities work and why the Oulipo abandoned them.

The fourth annex will consist of an electronic version of Un conte à votre façon (1973), a choose-your-own-adventure tale of three little peas in a pod. While several online editions already exist, the reader's involvement could be improved. As this text is inspired by computer programs, and therefore by algorithmic flowcharts and graph theory, I want to integrate my program with the graph of all the story's possible nodes, which will allow the reader to understand various "glitches" that occur in Queneau's "program."

Chapter 5: Geometry

This chapter deals with geometry, whose name comes from the Greek for "measuring the earth." Since the mathematical discipline deals with abstract space, how does one reconcile it with the physical space in which we live? This question becomes central in two Oulipian texts, both of which exhibit geometrical structures indicated by their tables of contents: Italo Calvino's Le città invisibili (1972) and Michèle Audin's Mai quai Conti (2014). Both authors meticulously organize their novels according to geometrical structures in an attempt to reconcile the messiness of their subjects with a regular design.

In Le città invisibili, the Italian author organizes a fragmented and incoherent collection of theoretical prose poems according to a rigid mathematical figure (a parallelogram). However, the philosophical and theoretical content of the individual pieces does not seem to correspond to this crystalline structure. My fifth annex will take the form of an interactive table of contents for Calvino's novel, where one can enter the text from any angle, allowing for multiple readings that lead to various conclusions, as the author indicated he wanted in his Six Memos for the Next Millennium.

[1] http://www.alamo.free.fr/pmwiki.php?n=Alamo.Accueil

[2] http://www.gallimard.fr/Catalogue/GALLIMARD/CD-ROM/Machines-a-ecrire