chainsawriot

Learning enough elisp development to be dangerous: ess_rproj

Posted on Jul 19, 2020 by Chung-hong Chan

tl;dr: I have wasted a lot of time on writing software with only one user again. And then I have wasted some more time to write a blog post that no one is interested in about that software. Probably I should use my time more effectively in churning out another paper, another grant proposal or finishing another R&R. This blog post is about how ineffective my use of time is. Thank you and good night!

Prologue

I program primarily in R these days. I am also an Emacs/Emacs Speaks Statistics (ESS) user. Due to the recent explosion in the number of data scientists, I think R has a very healthy blog scene. But I don’t see a lot of ESS users blogging about their experience. I think I am at the super small intersection of this Venn diagram.

According to this blog, I have at least 10 years of experience using emacs. I remember the early days when I used Tinn-R on a Windows machine provided by the hospital to write R programs ¹. It was wonky, but I thought it was better than the editor that comes with the Windows version of R. Speaking of which, I used to have a colleague back in Hong Kong who uses that RGui built-in editor very fondly, despite the fact that way better editors such as RStudio are available. My multiple attempts at a conversion to either RStudio or emacs/ESS failed ². That’s even more wonky.

Like many things in my life, I can’t recall why I switched to emacs a decade ago. In spite of the fact that almost everyone is using RStudio or PyCharm or whatnot, I still use emacs, almost every day to be exact.

For education purposes, I think I need to explain a bit about what emacs and ESS are. Emacs is a text editor from the mid-70s which is still widely used today. It is known for having an interesting learning curve. Emacs is also well known for its limitless extensibility. Thanks to its programmability with a programming language called emacs lisp (or elisp), one can do any task with emacs. For example, one can play video files with emacs. These extensions, called modes, are usually developed by the emacs user community ³.

Of course, one can write statistical programs with emacs. Emacs Speaks Statistics, or ESS, is an emacs mode to support several statistical programs, e.g. SAS, Stata, Stan, Julia, S-PLUS and of course R. Recent versions of ESS support integration of devtools. Another mode, Polymode, supports mixing R development with another language, e.g. C++ (Rcpp) or Markdown (R Markdown).

Similar to my experience with Javascript, I can never claim mastery with emacs. I always feel that I have a lot of knowledge gaps about the software. Of course, compared to other emacs wizards, my usage is not efficient. But there is a strong path dependency: I think I am not efficient in emacs, but my work efficiency with emacs is still higher than my work efficiency with other editors. I have of course tried RStudio. I teach my students to use RStudio too ⁴. I gave V…SCode a try earlier this year too ⁵. I am still using emacs.

One knowledge gap is my lack of experience with elisp. I love Lisp as my recreational programming language. I have given a Lisp workshop. But I have not written any serious program in (e)lisp, except for copying and pasting elisp code off the Internet into my .emacs file (my emacs configuration file).

One major reason was that I did not have the need to write anything in elisp. Up until recently.

“One more competing standard” and I am the “nobody”

One problem I have encountered with my increasingly collaborative R development is defaults. I am working more and more with others’ code. And predictably, most of this R code was written with RStudio. As you may know, RStudio uses 2 spaces for indentation by default. ESS, instead, uses 4 spaces. You may see in my R packages that all of them are indented with 4 spaces.

Therefore, if I edit R code written with RStudio in emacs, I break the style the moment I press the tab key. Of course, I can manually set the indentation to 2 spaces temporarily.

Is there a solution to this? Of course yes. Solutions, to be exact.

Many editors support EditorConfig, a well-defined configuration file specifying nitty gritty details about how the code in a code project should be written. Supporting editors look for this EditorConfig file and then configure themselves to match the prescribed style. Hopefully, all code written in a project with the same EditorConfig file has the same style, even though the code might have been written in different editors.

A wonderful solution, isn’t it? But well, the most popular IDE in the R realm, RStudio, does not support it. Despite pleas (e.g. one and two) from the community about supporting it, RStudio stands with its own standard.

Enter RStudio Project.

RStudio Project, or Rproj, is a way for RStudio the IDE to save the metadata about a software project in a directory. In a nutshell, it is also a plain-text file saving style-related configurations. But to be fair, Rproj does significantly more than EditorConfig. It solves a lot of problems that are frustrating for beginners: working directory, workspace management, etc. It even has advanced things like where the git remote is.

Double click an Rproj file, and whoosh, all these things have been set up nicely for you. It is the easiest way to achieve the Project-oriented workflow advocated by one of RStudio’s employees, Dr Jenny Bryan.

By no means is this a diss. Rproj is great. I teach my students to use it. But just like any new standard, one more new standard is one more competing standard. It also creates an interoperability problem.

The defaults problem mentioned above is a symptom of this interoperability problem. Many R packages to which I have contributed pull requests have an Rproj file. But still, I can break their code with emacs. The reason is simple: Emacs supports EditorConfig but not Rproj. Punkt.

ESS has no plan to support Rproj. One of the developers even proposed to have yet another standard using existing project management tools in the emacs ecosystem such as projectile. Oh, come on…

Taiwanese g0v civic hacker ethos can be summarized in one sentence: ask not why nobody is doing this. You are the “nobody”! Instead of asking why nobody is making Emacs/ESS support Rproj, I am the “nobody” ⁶. Enter ess_rproj.

ess_rproj

Parsing a plain text file (Rproj file) and then adjusting the editor’s settings shouldn’t be rocket science, right?

No, it shouldn’t. But with my very limited knowledge of elisp, I was like the apes at the beginning of 2001: A Space Odyssey. An alien monolith (emacs) appears before me, and I need to discover how to use a bone as a weapon. I learned how to program elisp starting from how to evaluate elisp code. ielm is my preferred way to explore the software lisp machine within emacs. It is like developing R: one code buffer and then a REPL.

My lisp is a bit rusty. But I am glad that I have some knowledge of other lisp dialects, so I could pick it up again quite swiftly.

Even with my knowledge of lisp, in order to develop something useful, one needs to read about the entire system of emacs APIs. The authoritative manual for this is An Introduction to Programming in Emacs Lisp. It is a 271-page textbook. I needed something quick and hands-on.

One useful resource I found is Emacs In a Box by Caio Rodrigues.

The most useful resource, of course, is UTSL. Use the source, Luke.

I have read the source code of ESS to study how ESS’s internals work. At the same time, I have learned some idiomatic elisp.

I have also cleared up some elementary emacs concepts for myself. For example, I finally understand that customization is done mostly with mode hooks. A mode hook holds the (elisp) functions to run when a mode is turned on. The function needs to be passed as a quoted expression. Lucky me: as the instructor of a functional programming workshop, I know what a quoted expression is ⁷.

Therefore, in order for my customization to work, I need to add a mode hook to ESS, like so:

(add-hook 'ess-mode-hook #'ess_rproj)
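To see the hook mechanics in isolation, here is a minimal sketch with a made-up hook variable and function. Only `add-hook` and `run-hooks` are real Emacs functions here; `my-demo-mode-hook`, `my-demo-log` and `my-demo-setup` are hypothetical names for illustration.

```elisp
;; Hypothetical names throughout; `add-hook' and `run-hooks' are the
;; only real Emacs functions used.
(defvar my-demo-mode-hook nil
  "Functions to run when the made-up `my-demo-mode' is turned on.")

(defvar my-demo-log nil
  "Records which hook functions have run, newest first.")

(defun my-demo-setup ()
  "Stand-in for a customization function such as `ess_rproj'."
  (push 'my-demo-setup my-demo-log))

;; The quote on the hook name passes the symbol itself, not its value.
;; #' is a quote too, and additionally marks the symbol as a function name.
(add-hook 'my-demo-mode-hook #'my-demo-setup)

;; A mode does roughly this when it is turned on:
;; call every function stored in the hook, in order.
(run-hooks 'my-demo-mode-hook)
```

In a fresh session, evaluating these forms in ielm leaves `my-demo-log` holding `(my-demo-setup)`: the function was stored as data first and only called later, when the hook ran.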

In the end, I have something like this. Not the prettiest elisp code out there. I think I should drop some let calls there. But I have tried not to use setq.

But hey, it is not perfect but it works! Perfect is the enemy of good enough.
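The general shape of such a function can be sketched as follows. This is not the actual ess_rproj code: every `my-rproj-` name is made up, and the sketch assumes that `ess-indent-offset` is the variable ESS reads for indentation and that the Rproj file carries the `NumSpacesForTab` and `UseSpacesForTab` keys that RStudio normally writes.

```elisp
;; A hypothetical sketch, not the actual ess_rproj code.
;; All `my-rproj-' names are made up for illustration.
(require 'subr-x)                  ; for `string-trim'

;; Assumed to be defined by ESS; declared here so the sketch stands alone.
(defvar ess-indent-offset)

(defun my-rproj-find (dir)
  "Return the first .Rproj file in DIR or one of its parents, or nil."
  (let ((root (locate-dominating-file
               dir
               (lambda (d)
                 (and (file-directory-p d)
                      (directory-files d nil "\\.Rproj\\'"))))))
    (when root
      (car (directory-files root t "\\.Rproj\\'")))))

(defun my-rproj-parse (text)
  "Parse TEXT, the contents of an Rproj file, into an alist of strings."
  (let (settings)
    (dolist (line (split-string text "\n" t))
      (when (string-match "\\`\\([^:]+\\):\\(.*\\)\\'" line)
        ;; Copy both groups before trimming, because `string-trim'
        ;; may overwrite the match data.
        (let ((key (match-string 1 line))
              (value (match-string 2 line)))
          (push (cons (string-trim key) (string-trim value)) settings))))
    (nreverse settings)))

(defun my-rproj-apply (settings)
  "Apply the indentation keys in SETTINGS to the current buffer only."
  (let ((width (cdr (assoc "NumSpacesForTab" settings)))
        (spaces (cdr (assoc "UseSpacesForTab" settings))))
    (when width
      (setq-local ess-indent-offset (string-to-number width)))
    (when spaces
      (setq-local indent-tabs-mode (not (string= spaces "Yes"))))))

(defun my-rproj-setup ()
  "Find, parse and apply the nearest Rproj file, if there is one."
  (let ((file (my-rproj-find default-directory)))
    (when file
      (my-rproj-apply
       (my-rproj-parse
        (with-temp-buffer
          (insert-file-contents file)
          (buffer-string)))))))
```

Hooked up with `(add-hook 'ess-mode-hook #'my-rproj-setup)`, this would run for each buffer in which the hook fires. The settings are made buffer-local so that one project's style does not leak into another project's buffers.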

I am a good citizen and I don’t break others’ code… style. Hopefully, Ken Benoit doesn’t need to lint my code after merging my future pull requests, although quanteda doesn’t use Rproj!

Conclusion

I always enjoy writing lisp. When things work as expected, that always brings a smile to my face. Parentheses and lambdas are my super power!

¹ Back in the day, there were just a few choices. Tinn-R’s contemporaries were Eclipse StatET, which I didn’t even dare to install, or a zillion honorable attempts at a better GUI, e.g. Rcmdr, RKWard. Tinn-R was actually quite decent back then.
² Even the code editor bundled with the Mac version is better than its Windows counterpart. But this silly comparison is like comparing Internet Explorer 6 with Internet Explorer 4.
³ In his book “The Cathedral and the Bazaar”, Eric Raymond uses the distinction between the development of the Emacs C core and the user-driven elisp ecology as his analogy for the Cathedral-builder and Bazaar styles. Due to the relatively lower entry barrier, Bazaar-style development is usually quicker and generates more unexpected innovations. If this book were written in 2020, Eric Raymond would probably use the distinction between the development of R Core by a team of white male members and the development of over 16000 R packages by a wide array of R users.
⁴ It is inhumane to ask my students to learn emacs.
⁵ If you get this as a joke, you are a geek!
⁶ I take this ethos very seriously, or maybe too seriously sometimes. And I think this ethos is the root of my ever expanding number of side projects. Maybe I should be like the others and be a moaner instead. Why is nobody developing a framework for validating topic models? Why is nobody developing an ODS parser? Why is nobody developing a multilingual topic model package? Why is nobody validating those sentiment dictionaries?
⁷ If you don’t know: when you write an expression, for example x <- 1, the default behaviour of your programming language is to evaluate this expression and give you the result, i.e. the binding of variable x to 1. Then, you can use the result in your subsequent program, e.g. using x in another expression such as x <- x + 1. A quoted expression, e.g. quote(x <- 1), is a runnable expression. Instead of being evaluated directly to give you the result, this quoted expression is still an expression and passable as data to another function. If you want, you can use some other program to modify this quoted expression before evaluating it into a result. This is what we call metaprogramming: programs that use other programs as their data.