Werden wir Helden für einen Tag


Rust is fast, but how fast (vs R)?

Posted on Jan 22, 2023 by Chung-hong Chan

I am currently “learning Rust”¹. As I wrote in a previous post, I originally embarked on a hypothetical adventure of “learning Rust” for its own sake. I am reading The Rust Programming Language. It is a good book, but after a few chapters I get bored, or rather, too itchy to try things out. So I build things to solve problems (even when I don’t 100% understand Rust). And man, this is the best way to learn Rust.

The issue

In a previous post, I wrote about how I write academic papers. As you may know, I mostly write in Markdown formats. In fact, my current job wants me to write in Quarto, which I like tremendously.

I have a main BibTeX file with all my entries, and I cite from it. The issue is that this is not reproducible: I need to provide a smaller BibTeX file containing only the entries cited in a Markdown file, or in a set of Markdown files. This process is called “condensation”. The only software I know of that does this is condensebib by Andreas Beger, which I also wrote about in that post. This nice little non-CRAN R package worked quite well for me, up until a certain point.
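Condensation boils down to two steps: collect the citation keys used in the Markdown source, then keep only the matching entries of the main BibTeX file. The std-only Rust sketch below illustrates that idea; it is not how bibcon is actually implemented. It assumes Pandoc-style @key citations and well-formed @type{key, ...} entries with balanced braces, and the functions citation_keys and condense are hypothetical names.

```rust
use std::collections::HashSet;

/// Collect Pandoc-style citation keys such as `@smith2020` or `[@doe2019, p. 3]`.
/// Simplified sketch: Pandoc's real key grammar has more rules than this.
fn citation_keys(markdown: &str) -> HashSet<String> {
    let chars: Vec<char> = markdown.chars().collect();
    let mut keys = HashSet::new();
    for i in 0..chars.len() {
        // Only an '@' that does not follow a letter or digit starts a citation,
        // which skips e-mail addresses like `me@example.com`.
        if chars[i] != '@' || (i > 0 && chars[i - 1].is_alphanumeric()) {
            continue;
        }
        let key: String = chars[i + 1..]
            .iter()
            .take_while(|c| c.is_alphanumeric() || "_-:./".contains(**c))
            .collect();
        // Drop trailing sentence punctuation, e.g. the '.' in "see @smith2020."
        let key = key.trim_end_matches(|c: char| !c.is_alphanumeric() && c != '_');
        if !key.is_empty() {
            keys.insert(key.to_string());
        }
    }
    keys
}

/// Keep only the BibTeX entries whose key is in `keys`.
/// Assumes entries look like `@type{key, ...}` with balanced braces.
fn condense(bibtex: &str, keys: &HashSet<String>) -> String {
    let mut out = String::new();
    let mut rest = bibtex;
    while let Some(at) = rest.find('@') {
        rest = &rest[at..];
        let Some(open) = rest.find('{') else { break };
        // Walk forward, tracking brace depth, to find the brace closing this entry.
        let mut depth = 0;
        let mut end = None;
        for (j, c) in rest[open..].char_indices() {
            match c {
                '{' => depth += 1,
                '}' => {
                    depth -= 1;
                    if depth == 0 {
                        end = Some(open + j + 1);
                        break;
                    }
                }
                _ => {}
            }
        }
        let Some(end) = end else { break };
        let entry = &rest[..end];
        // The key is the text between the opening brace and the first comma.
        let key = entry[open + 1..].split(',').next().unwrap_or("").trim();
        if keys.contains(key) {
            out.push_str(entry);
            out.push_str("\n\n");
        }
        rest = &rest[end..];
    }
    out
}

fn main() {
    let markdown = "As @smith2020 argues [see also @doe2019, p. 3]. Mail: me@example.com.";
    let bibtex = "@article{smith2020,\n  title = {A {Nested} Title}\n}\n\n\
                  @book{unused2001,\n  title = {Never cited}\n}\n\n\
                  @misc{doe2019,\n  note = {Cited in brackets}\n}\n";
    let keys = citation_keys(markdown);
    // Prints the smith2020 and doe2019 entries, but not unused2001.
    print!("{}", condense(bibtex, &keys));
}
```

Tracking brace depth, rather than splitting on lines, is what keeps a field such as title = {A {Nested} Title} intact. A robust parser would also have to handle quoted values, @string macros and comments, which is exactly where the parsing problems described below tend to hide.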

First, the underlying R package bibtex was orphaned. Although it now has a new maintainer and is back on CRAN, you know how easily scientific software can end up on the brink of vanishing. Consider the whole chain of R packages depending on bibtex: bibtex => RefManageR => (eurostat => (ARPALData, iotables), knitcitations => (datelife, OpenTreeChronograms, wallace)). And I am talking about those on CRAN only. Even back on CRAN, its BibTeX parser is not very robust. During the last ICA writing season (last October), I spent the last few days debugging what went wrong with the parser, which wasted a lot of my time. I couldn’t figure it out (someone else did), so I did the condensation manually.

Also, condensebib is not feature-complete. I contributed to a bookdown project in which each chapter is a separate R Markdown file, and for that project I needed to condense the BibTeX file down to the citations from all chapters. The current version can’t do that. I modified the package and it worked. I submitted a PR, but it was never merged.
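Supporting a multi-chapter project like this mostly means taking the union of the keys cited across all chapter files before condensing. Here is a minimal Rust sketch of that step, again not bibcon’s actual code: keys_in and keys_in_all are hypothetical helpers, the tokenizer is a simplification of Pandoc’s citation syntax, and the chapter file names in the usage comment are made up.

```rust
use std::collections::BTreeSet;
use std::env;
use std::fs;
use std::io;

/// Citation keys in one document, e.g. `@smith2020`, `[@doe2019; @roe2021]`, `[-@lee2018]`.
/// Simplified tokenizer, not Pandoc's full citation grammar.
fn keys_in(doc: &str) -> impl Iterator<Item = String> + '_ {
    doc.split(|c: char| c.is_whitespace() || "[];,()".contains(c))
        // `-@key` suppresses the author name in Pandoc; the key is the same.
        .filter_map(|token| token.trim_start_matches('-').strip_prefix('@'))
        // Drop trailing punctuation such as the '.' in "@smith2020."
        .map(|key| key.trim_end_matches(|c: char| !c.is_alphanumeric() && c != '_'))
        .filter(|key| !key.is_empty())
        .map(str::to_string)
}

/// Union of the keys cited in all chapters, de-duplicated and sorted.
fn keys_in_all(docs: &[String]) -> BTreeSet<String> {
    docs.iter().flat_map(|doc| keys_in(doc)).collect()
}

fn main() -> io::Result<()> {
    // Hypothetical usage: `cargo run -- ch1.Rmd ch2.Rmd ch3.Rmd`
    let docs = env::args()
        .skip(1)
        .map(fs::read_to_string)
        .collect::<io::Result<Vec<String>>>()?;
    for key in keys_in_all(&docs) {
        println!("{key}");
    }
    Ok(())
}
```

A BTreeSet drops keys that several chapters cite and keeps the key list sorted, so repeated runs produce the same order.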

All of my frustration with condensebib (and the associated R BibTeX ecosystem) culminated in this moment: rewrite condensebib.

In R? No, in Rust.

Learning enough Rust to be safe

I like to say “learning enough x to be dangerous.” But this time, I say “learning enough Rust to be safe.” Memory safe, to be exact. Actually, it is quite difficult to write memory-unsafe Rust code unless one explicitly opts in with an unsafe block.

My condensebib rewrite is called bibcon, and it’s available from my project’s GitHub. It works exceptionally well. But the question is: how fast is it?

I did a quick benchmark to compare condensebib and bibcon. The comparison is totally unfair; it is like proving that riding a high-speed train is faster than walking.

The times were 2.235 s vs 0.013 s, a speed-up of more than 171x. Most of the time saved comes from parsing the BibTeX file, so if your BibTeX file is very large, the time saved would be substantial.
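For anyone who wants to run this kind of comparison themselves, here is a minimal Rust sketch, not the script behind the numbers above, that times a closure with std::time::Instant and computes the speed-up ratio. The time_it and speedup helpers are hypothetical, and the summed range is only a stand-in workload; plugging in the reported 2.235 s and 0.013 s gives a ratio of about 172.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Run `f` `runs` times and return the median wall-clock duration
/// (the upper median when `runs` is even). Panics if `runs` is 0.
fn time_it<F: FnMut()>(runs: usize, mut f: F) -> Duration {
    assert!(runs > 0, "need at least one run");
    let mut samples: Vec<Duration> = (0..runs)
        .map(|_| {
            let start = Instant::now();
            f();
            start.elapsed()
        })
        .collect();
    samples.sort();
    samples[samples.len() / 2]
}

/// How many times faster `new` is than `old`.
fn speedup(old: Duration, new: Duration) -> f64 {
    old.as_secs_f64() / new.as_secs_f64()
}

fn main() {
    // Timings reported in the post: condensebib vs bibcon.
    let ratio = speedup(Duration::from_millis(2235), Duration::from_millis(13));
    println!("bibcon is {ratio:.1}x faster"); // prints "bibcon is 171.9x faster"

    // Stand-in workload; replace the closure body with the real call to benchmark.
    let median = time_it(5, || {
        black_box((0..1_000_000u64).sum::<u64>());
    });
    println!("median of 5 runs: {median:?}");
}
```

Taking the median of several runs, rather than a single measurement, makes the result less sensitive to one-off noise such as a cold file cache.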

I can’t sell this software by saying it is fast. I (and many other people) don’t care about a 2 s difference. But bibcon is stable, feature-complete, and actively maintained (by the friendly owner of this blog). I have already used it for several of my recent papers. And hopefully, with bibcon, I won’t need to fiddle with BibTeX in the coming paper-writing season.

  1. Yes, it’s for Projekt 71. 
