chainsawriot

How I prepare ICA papers (version 2023): emacs, overleaf, literate programming

Posted on Nov 1, 2023 by Chung-hong Chan

The ICA writing season is now behind me, so I can write about how I write my ICA papers again. Previously, I wrote about how I prepared ICA papers in 2021. In that post, I said I wrote my ICA papers in RMarkdown with trackdown, which basically means putting the RMarkdown source code on Google Docs.

Now, it’s 2023 and I have another job. I think it’s time to give you an update.

First, I abandoned trackdown, which also means I partially abandoned RMarkdown. The new kid, Quarto, actually has the same problem. Collaboratively writing a Markdown file on Google Docs is simply unnatural. The workflow also introduces many issues. For example, Google Docs converts all straight double quotes to curly ones, the same weird shit Microsoft Word does. It is also the worst of both worlds: it has no preview of what the paper would look like, and it does not run any embedded R code. Both tech and non-tech people are unhappy. Once, I tried to write a Markdown paper collaboratively on HackMD. HedgeDoc/HackMD open source politics aside, writing an RMarkdown paper on HackMD was significantly better than trackdown because of the HTML preview. But still, HackMD was the worst of one world: it won’t run R code.

To many people, writing a paper locally, submitting patches via Git, and editing files via pull requests is just unnatural. There is only one super techie and R-savvy person (my team lead David Schoch) who can write papers with me this way (we have written two papers this way so far). I just can’t nudge my other collaborators to write papers this way. My e-mails saying “the paper is available on GitHub too” were usually just a useless side note.

There should be an RMarkdown / Quarto collaborative writing environment that supports code execution. It bothers me that RMarkdown has been available for almost a decade and yet no one sees this as a business opportunity. Look at all those Notebook environments!

I talked to my collaborator Nathan TeBlunthuis previously about his writing setup. His advice was “always write in \(\LaTeX\).” Although it sounds like the so-called Atwood’s Law, he has a point. It is like the mantra of Choose Boring Technology. Yeah, \(\LaTeX\) is not as flashy and trendy as any new shit. But it has been available since 1984. You know, the year when the perpetual war between Oceania, Eurasia, and Eastasia was still going on and that running lady with a hammer did not smash the Telescreen of the Big Brother.

Of course, another reason is Overleaf. It has been the de facto \(\LaTeX\) environment for many fields, including my neighboring field Political Science. So, I decided to write all my ICA papers this year in pure \(\LaTeX\) on Overleaf with my coauthors. And this decision was also opportunistic because all of my coauthors this time are (I believe) tech-savvy.

Overleaf is fine. Of course, it is a freemium service¹. But it is probably the best of two ordinary worlds. At least it has instant rendering and some form of literate programming. Here are the tricks:

Literate programming on Overleaf

Actually, Overleaf supports literate programming in some way. Of course, I can upload my data files to my Overleaf project. To use literate programming, the \(\LaTeX\) file needs to have the .Rtex file extension. So, I just rename my \(\LaTeX\) files to *.Rtex and Overleaf will allow code execution.

The next question is: How the heck can I embed code in those *.Rtex files? Well, I need to use the Sweave…-inspired knitr notation. Isn’t knitr just for RMarkdown / Quarto? Well, it supports \(\LaTeX\) as well.

A code chunk and inline R code look like this:

<<test, echo = FALSE >>=
mod <- lm(Sepal.Length~Sepal.Width, data = iris)
@

The regression coefficient is \Sexpr{round(coef(mod)[2], 2)}

For this code, I can just render the project and I will get the result. And complex things like ggplot2 also work. The problem, however, is that I cannot install any R package. I must use the ones preinstalled. I recommend running this to get a list of all preinstalled R packages.

<<listpackages, echo = TRUE >>=
dimnames(installed.packages())[1]
@
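If the full list is too long to scan, a narrower check can help. The following is only a minimal sketch in base R: the vector `wanted` is a hypothetical name, and the two packages in it are simply the ones mentioned in this post. Put it in a chunk like the one above and it reports which of the packages a paper needs are actually installed.

```r
# Hypothetical list of packages the paper depends on; adjust to your needs.
wanted <- c("rio", "here")

# TRUE for each package that is installed on this machine, FALSE otherwise.
# installed.packages() returns a matrix whose row names are package names.
available <- setNames(wanted %in% rownames(installed.packages()), wanted)
print(available)
```

Anything that comes back as FALSE has to be handled outside Overleaf.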

Actually, there are many packages. Notably, all tidyverse packages and rio (thank you) are available. Even some which I think are potentially dangerous are available too, e.g. fs, ps, and curl. But unfortunately, some packages I like are not available, e.g. the here package. The point is to be mindful of the limited set of R packages available. For analyses that need unavailable packages, I recommend running the analysis locally and saving the output as RDS files, then uploading those RDS files and using the literate programming interface to unpack the output in the \(\LaTeX\) source. I would add the R scripts for generating those RDS files to the project too.
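The RDS round trip could look roughly like the sketch below. The file name `results.rds` and the `results` list are hypothetical, and the model is just the iris regression from the earlier example standing in for an analysis that needs packages Overleaf lacks. Only base R functions (`saveRDS()` and `readRDS()`) are used.

```r
# --- Local part: run on your own machine, e.g. in a script kept in the project ---
# Fit the model with whatever packages are available locally, then keep only
# the numbers the paper needs so that the uploaded file stays small.
mod <- lm(Sepal.Length ~ Sepal.Width, data = iris)
results <- list(
  slope = unname(coef(mod)[2]),  # regression coefficient of Sepal.Width
  n = nrow(iris)                 # number of observations
)
saveRDS(results, "results.rds")  # hypothetical file name; upload this file

# --- Overleaf part: goes inside a code chunk of the .Rtex file ---
# Reading the file back needs nothing beyond base R.
results <- readRDS("results.rds")
round(results$slope, 2)
```

In the \(\LaTeX\) text, the stored value can then be reported inline with something like \Sexpr{round(results$slope, 2)}, without refitting the model on Overleaf.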

From online to offline

The Overleaf project can be brought offline to my computer via Git (a premium feature). Click on the Main Menu and there are GitHub, Git, and Dropbox integrations. The Git option gives you the command to clone the current project to your machine, something like

git clone https://git.overleaf.com/xxxxxxxxxxx

On my local machine, the question is how to render those files the way Overleaf does online. Suppose the file is called wonderful.Rtex.

Rscript -e "knitr::knit(\"wonderful.Rtex\", output = \"wonderful.tex\")"

Like RMarkdown, knitr::knit() executes the code chunks and inline R code and generates a clean \(\LaTeX\) file. That \(\LaTeX\) file can then be rendered by my favorite \(\LaTeX\) renderer. These days, I am getting lazy and I just use latexmk. I usually write a Makefile like this:

appendix:
	Rscript -e "knitr::knit(\"wonderful.Rtex\", output = \"wonderful.tex\")"
	latexmk -pdf wonderful.tex
	rm -f *.out *aux *bbl *blg *log *toc *.ptb *.tod *.fls *.fdb_latexmk *.lof *.fff *.run.xml *.bcf *.ttt
	rm wonderful.tex

Another issue is how to edit those .Rtex files offline. In RStudio, it is just a matter of a double click. In emacs, I just need to associate the .Rtex file extension with poly-noweb+r-mode.

(add-to-list 'auto-mode-alist '("\\.Rtex\\'" . poly-noweb+r-mode))

From offline to online

Actually, the Git integration allows me to edit the files offline and then push my edits back to Overleaf. However, this can be problematic because my collaborators might have edited the files via the web interface. And my edits via Git might overwrite my collaborators’ edits.

So, most of the time I just use the web interface. The emacs keybinding on the Overleaf web interface is a joke, don’t use it. Previously, I wrote about using GhostText with Overleaf and editing the text with emacs. Now, it works unreliably. I really hope that it can work again.

For writing from scratch, I usually just write my text offline in emacs and then paste it into the file on Overleaf. For editing existing text, I use the web interface.

Conclusion

Despite all these limitations, it is still a usable online platform for literate programming. If Overleaf can serve as a gateway drug to nudge people away from using Microsoft Word / Google Docs and all this US big-tech path-dependency bullshit, and to embrace open source and computational reproducibility, it is already a good service. I also like the fact that my files are not locked in the platform. So, I think I will support this little uncomfy service. But, dear Overleaf: Please don’t sell your damn company to an evil conglomerate like Elsevier or Springer Nature. Thank you!

¹ Disclaimer: My organization actually has the premium subscription.