Werden wir Helden für einen Tag

Home | About | Archive

Don't use paste() to handle JSON

Posted on May 10, 2023 by Chung-hong Chan

A quick note documenting how I went insane working with JSON and what should be the correct way to fix the problem.

The issue is this: For one project, I need to use MongoDB to store text data (I bold this for a purpose). You might know that almost all the things in MongoDB are in JSON (or its binary sister BSON): the query language is JSON, the return value is JSON, the internal representation of data is BSON.

If you want to deal with MongoDB via R, the only viable game now is mongolite.

The package is wonderful. For example, if I want to replace a record this is the code:

collection <- "somecollection"
db <- "somedb"
con <- mongolite::mongo(collection = collection, db = db)
con$update('{"link": "chainsawriot.com"}', '{"$set": {"text": "Werden wir Helden für einen Tag"}}')

So far so good. The update method takes two parameters: first parameter is the text (or character) presentation of a JSON query searching for some records (or “documents”, in MongoDB lingo); the second parameter is what I want to modify, again also the text presentation of a JSON query. But you might notice the ugly mixture of double and single quotes. And the double quotes in the JSON queries are needed (although in mongosh, some of them are optional, e.g. field names).

So level up. How to parameterize this? Let’s say I have many links to update. A quick-and-dirty solution is to use paste0().

link <- "chainsawriot.com"
natural_text <- "Werden wir Helden für einen Tag"

con <- mongolite::mongo(collection = collection, db = db)
con$update(paste0('{"link": "', link'"}', paste0('{"$set": {"text": "', natural_text, '"}}')

And it works… until it goes downhill. Try to fuzz it a bit, it dies. Specifically, any JSON special character will kill this and there are several JSON special characters: double quote, backslash, forward slash, backspace, newline and many more! Basically, any Indo-European natural language would probably kill this. One can test with this:

natural_text <- "so-called \"expert\""
json_content <- paste0('{"$set": {"text": "', natural_text, '"}}')
jsonlite::validate(json_content) ## no

This is an issue related to escaping text and I would say base R is an extremely confusing environment for doing that. I am sure you have the experience of adding slashes until something work. I can never remember when I should use one slash, two slashes, or even four slashes. It can quickly drive me insane.

The issue here is extremely similar to the hard lesson I learned from readODS 1.8. Instead of rolling out my own XML generator, here I roll out my own JSON generator inside a confusing R environment where string escaping is difficult. And I don’t want to fuzz it and solve it myself. So the solution is actually simple: use something well tested and don’t roll out your own sh*t!

natural_text <- "so-called \"expert\""
x <- list()
x$`$set` <- list()
x$`$set`$`text` <- natural_text
json_content <- jsonlite::toJSON(x, auto_unbox = TRUE)
jsonlite::validate(json_content)

You might wonder, why the second solution works. It is because the toJSON function is exporting an object called json and it handles all the escaping automatically, not regular character. And I am less insane now for using that.

The conventional wisdom says that solutions involving paste() (or her friend paste0()) are usually not very robust. I provide here yet another example, although it also mixes with the complexity of the string escaping business.

Postscript

As a preemptive response, I test here also glue.

natural_text <- "so-called \"expert\""
## glue::glue('{"$set": {"text": "{natural_text}"}}') doesn't work
json_content <- glue::glue('{"$set": {"text": "[[natural_text]]"}}', .open = "[[", .close = "]]")
jsonlite::validate(json_content) ## no

If you really want to do that with paste() or her friends, the correct answer is five slashes.

natural_text <- "so-called \"expert\""
unnatural_text <- gsub("\"", "\\\\\"", natural_text)
json_content <- paste0('{"$set": {"text": "', unnatural_text, '"}}')
jsonlite::validate(json_content)

Powered by Jekyll and profdr theme