docker2

This is the second blog post on R development, the one dealing with the development environment. I found a bug in oolong by dockerizing it.

Preamble

oolong 0.4 is out. The most important feature, of course, is the ability to deploy the test online.

But I must say I see this only as a WIP. No doubt, WIP is still work. But perhaps it is just a baby-step kind of work. I always find that developing oolong (actually many packages) traps me in a dilemma, or rather a trilemma. On the one hand, I find oolong doesn’t have the required functionalities. On the other hand, I can only develop functions that my imaginary users can handle. On the third hand (if I had one), I actually don’t know who my imaginary users are. Given a package with fewer than 100 downloads per week, it is not false modesty for me to say that oolong is not popular and perhaps also not important. So, most of the functionalities that I want to develop are perhaps just solutions looking for a problem.

An example of this trilemma is how the test should be deployed.

The current vignette suggests that users deploy it on shinyapps.io. It is an optimal choice for various reasons. First, it has a free tier. Second, it is the easiest to deploy to, especially when one is using RStudio the IDE. However, the ease of deployment comes at a big price: the local file system on shinyapps.io is NOT permanent. Your deployed app is actually not on the same server all the time. It will be distributed across different servers.

This volatile nature actually kills many permanent storage solutions. For example, you cannot save the results as an RDS file and store it on shinyapps.io. Also, you cannot use database systems that keep their data in a flat file in the file system, e.g. SQLite, DuckDB etc. RStudio the company provides a guide on how to deal with this. They suggest several “remote” options, e.g. Amazon S3, Dropbox, Google Sheets etc.

As for me, I don’t want to force a choice on my imaginary users, especially a choice that costs money (e.g. Amazon S3) or is possibly insecure (e.g. Dropbox). Also, I don’t want to push my users toward the ethical borderline of abandoning oolong and using something like Mechanical Turk [1].

Of course, there is always a solution. For example, I can recommend that my imaginary users set up PostgreSQL or MongoDB on their on-prem server. It solves all the problems, right? But it also raises the question: if my imaginary users can set up PostgreSQL on their on-prem server, then why would they need to deploy their oolong tests on shinyapps.io in the first place?

I hope this discussion illustrates how difficult it is for me to make decisions about the tech stack of oolong. Because of that, the current version of oolong makes a compromise: it doesn’t store the data; instead, coders need to download their own data and give it back to the administrator. It is a stupid solution, but it makes the fewest assumptions about the tech stack. It is an improvement over having the coders install R on their machines, yet it still looks extremely stupid. Nonetheless, I still think it fulfills the sole purpose of oolong: remove the perceived technical difficulties (a.k.a. excuses) of setting up validation tests and nudge researchers to validate their tools. I have tested this setup in a sort of “massive” coding session: a double-digit number of (actually 10) students from National Taiwan University coded the same test and returned the data file via Google Drive. It worked.

As I said, suggesting shinyapps.io is optimal but stupid; still, I believe in choice. In the vignette, I state that seasoned Shiny users might have their own way of deploying the oolong test online. Unlike under authoritarianism, shinyapps.io is not the only “right” answer. It is just a suggested answer. Sure, there are many answers. One should always seek one’s own answer. This is autonomy. If you just read someone’s suggested answer (a software tutorial, a vignette, or the mumbo jumbo of some seniors, in terms of age and not knowledge, like the one you are reading) and then consider the other’s answer to be your own, that’s heteronomy. You are being ruled by someone else. Don’t fall into the trap of TINA (“There is no alternative”) [2].

One alternative to shinyapps.io is to deploy the oolong test to your own Shiny server. In an authentically autonomous ethos, one should really try to set up a Shiny server. But there are so many technical details one needs to understand before even trying. If I need to set the bar that high before one can use the oolong deployment feature, this feature is doomed.

The devops people actually know the pain and that’s why they use Docker. I have written about Docker and its prebuilt image Rocker for reproducible data analysis [3]. Actually, we can use the same tech stack to set up a Shiny server ourselves and then deploy an oolong test there. If you can get your oolong test running in Docker on your local machine, you can get it running on any machine that can run Docker, i.e. virtually all computers, on-prem servers, AWS, Linode, Heroku or whatever cloud service, the Large Hadron Collider, or the International Space Station.

It’s actually a three-step process.

Empowering statement

This article is going to explain how to dockerize an oolong test. Really, the same procedure can be used to dockerize any Shiny app.

Step 1: Create your exported oolong test

Suppose you have created an oolong test using oolong 0.4 and then exported it as a standalone Shiny app. If you still don’t know how to do that, it goes like this:

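library(oolong)
# Create a word intrusion test from the bundled example model
x <- wi(abstracts_keyatm)
# Export the test as a standalone Shiny app into the directory "app"
export(x, "app")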

The directory “app” contains the Shiny app. It can be launched by running this on your local machine.

shiny::runApp("app", port = 3838)

If you run it from the shell, you can do the same thing with

R -e "shiny::runApp('app', port = 3838)"

This bit of information looks trivial but is very important, because we’ll need to use it again later.

Step 2: Build your Docker image and run it as a container

If you don’t know how to do it, I recommend reading my previous blog post on Rocker. As nobody reads these days, I am going to give a short version of the same thing.

We are going to use Rocker again. Rocker provides several prebuilt docker images at your disposal. Among them, rocker/shiny is the image we are going to use as the basis of our Docker image. As the name implies, it is “Shiny Server on Debian stable”. Therefore, it is a nicely configured Shiny server.

Having only a Shiny server doesn’t cut it. You also need to, for example, install oolong, copy the exported oolong test over, and then launch the test. To do these things, we need to specify them in a Dockerfile. There is a reference for the Dockerfile format. This is the Dockerfile that we are going to use.

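# Start from the Rocker image that ships a configured Shiny server
FROM rocker/shiny:latest

# System libraries that R packages commonly need at install time
RUN apt-get update -qq && apt-get -y --no-install-recommends install \
    libxml2-dev \
    libcairo2-dev \
    libsqlite3-dev \
    libmariadbd-dev \
    libpq-dev \
    libssh2-1-dev \
    unixodbc-dev \
    libcurl4-openssl-dev \
    libssl-dev \
    git

# Bring the system packages up to date
RUN apt-get update && \
    apt-get upgrade -y && \
    apt-get clean

# Copy the exported oolong test into the image
ADD app ./app

# Install the GitHub version of oolong
RUN Rscript -e 'install.packages("devtools"); devtools::install_github("chainsawriot/oolong")'

# Document the port the app listens on
EXPOSE 3838

# Launch the exported test when the container starts
CMD ["R", "-e", "shiny::runApp('app', host = '0.0.0.0', port = 3838)"]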

Each instruction in a Dockerfile starts with a command, written in capitals by convention. Following the Pareto principle, you only need to know 20% of the functionalities to deal with 80% of the situations. Actually, that 20% boils down to 4 commands: FROM, RUN, ADD, CMD. EXPOSE is an extra verb that you may also need to know.

FROM tells docker what the basis of the docker image is. In our case, we want to use rocker/shiny:latest. And then we use RUN to install dependencies and run a system update. These actions are only triggered during the docker image creation step.

And then ADD: the ADD command copies files or directories from your local machine to the docker image [4]. In our case, we copy app into the docker image.
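Because ADD copies from the build context, the app directory has to sit next to the Dockerfile when you build the image. A small pre-flight check along the following lines might save a failed build. This is only a sketch: `check_build_context` is a name I made up for illustration, and it only checks that a Dockerfile and a non-empty app directory exist, not that the app itself is valid.

```shell
#!/bin/sh
# check_build_context [DIR]   (hypothetical helper, not part of docker or oolong)
# Succeeds only if DIR contains a Dockerfile and a non-empty "app" directory.
check_build_context() {
    dir="${1:-.}"
    if [ ! -f "$dir/Dockerfile" ]; then
        echo "missing: $dir/Dockerfile" >&2
        return 1
    fi
    # ls -A lists every entry except "." and "..", so empty output means an empty directory
    if [ ! -d "$dir/app" ] || [ -z "$(ls -A "$dir/app")" ]; then
        echo "missing or empty: $dir/app" >&2
        return 1
    fi
    return 0
}
```

You could call it as `check_build_context .` in the directory where you are about to build the image.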

We RUN again. This time we tell docker to install the GitHub version of oolong.

EXPOSE declares the port that the docker container listens on. The port is actually published to your machine by the -p flag of docker run.

Finally, CMD is the command you want docker to run when you launch this image as a container. Usually, a Dockerfile should only have one CMD command. In this case, we want this docker container to run our exported oolong test. As you can see, the command is essentially the shell command from the end of Step 1, with host = '0.0.0.0' added so that the app is reachable from outside the container.

OK, we can now build our Docker image and then run the image as a container. Depending on the way you installed docker, you may need to prefix these commands with sudo. In this case, we name the Docker image “docker-oolong” and the Docker container “chainsawriotisevil”.

docker build -t docker-oolong .
docker run -d --rm -p 3838:3838 --name "chainsawriotisevil" docker-oolong
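Since the need for sudo depends on your installation, one way to avoid editing every command is to keep the docker invocation in a variable. The following is only a sketch under that assumption: the `DOCKER` variable and the `build_and_run` function are names I made up for illustration; only the docker flags themselves come from the commands in this post.

```shell
#!/bin/sh
# DOCKER is a hypothetical variable: set it to "sudo docker" if your setup needs sudo.
DOCKER="${DOCKER:-docker}"

# build_and_run IMAGE CONTAINER [HOST_PORT]   (hypothetical helper)
# Builds the image from the current directory, then runs it detached (-d),
# removes the container once it stops (--rm), and maps HOST_PORT on your
# machine to port 3838 inside the container (-p).
build_and_run() {
    image="$1"
    container="$2"
    port="${3:-3838}"
    # $DOCKER is intentionally unquoted so that "sudo docker" splits into two words.
    $DOCKER build -t "$image" . &&
        $DOCKER run -d --rm -p "$port:3838" --name "$container" "$image"
}
```

With this, `build_and_run docker-oolong chainsawriotisevil` should do the same as the two commands above, and setting `DOCKER="sudo docker"` beforehand covers the sudo case.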

Step 3: There’s no step 3!


Well, actually there’s no step 3.

You can access the dockerized version of the oolong test from your browser: localhost:3838
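The app may need a few seconds to come up after the container starts. If you would rather not keep refreshing the browser, a polling loop like the one below is one option. It is a sketch that assumes curl is installed; `wait_for_app` and its parameters are names I made up for illustration.

```shell
#!/bin/sh
# wait_for_app URL [TRIES] [DELAY]   (hypothetical helper)
# Polls URL until it answers with a successful HTTP status.
# Returns 0 on success, 1 if it gives up after TRIES attempts.
wait_for_app() {
    url="$1"
    tries="${2:-30}"
    delay="${3:-1}"
    i=0
    while [ "$i" -lt "$tries" ]; do
        # -f: treat HTTP errors as failure; -s: silent; -o /dev/null: discard the body
        if curl -fs -o /dev/null "$url"; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}
```

For the container in this post, that would be something like `wait_for_app http://localhost:3838 && echo "oolong test is up"`.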

If you really need a step 3, here it is.

You can stop the docker container.

docker stop chainsawriotisevil

Or even delete the docker image.

docker image rm docker-oolong

Theoretically, the file system of the docker container is not volatile. But oolong can’t take advantage of the fact… yet.

Contribution

In this blog post, I illustrate how to dockerize an oolong test. In the process, you may have learned how to use Docker for both Shiny app development and deployment.

Footnotes

  1. Trust me, it creates more problems than the money problem it solves. I have made a pledge not to use it again. Also, don’t use it casually like many political scientists in those big-name US universities. 

  2. Protesilaos Stavrou, a.k.a. Prot, wrote an excellent blog post on the moral lessons from switching to Emacs. And I borrow his notions of the difference between autonomy and heteronomy, as well as the concept of TINA here. 

  3. Recently, Musashi Harukawa (Oxon) also wrote an excellent piece on using Docker.

  4. Actually, another way of doing it is to use COPY. It is not entirely true, but I usually consider ADD a superset of COPY. ADD has extra features such as auto-unpacking local tar archives or fetching files from a URL. COPY, as the name implies, does a local copy.