chainsawriot

readODS 1.7 is released

Posted on Jul 12, 2020 by Chung-hong Chan

readODS 1.7 is released. Last release was in November 2018. I can’t remember what I was doing in November 2018. I can’t keep track of time now.

As the maintainer of the package, I am happy that a new release is there. In this blog post, I’m going to show you a few new features and the road map to readODS 2.0. In the very end, I will reflect a little bit about blogging my packages. If you care about R, you can skip the last part. If you care about me as a person but not so much about R per se, you can read the reflection directly.

readODS 1.7

readODS 1.7 is going to be the last release in the 1.0 series. It serves as the interim version before the next major release. In the end of this blog, I am going to show you the future breaking changes of 2.0. Here, I am going to show you the new features of 1.7 first.

For those who don’t know what is readODS: readODS is a package for reading and writing OpenDocument Spreadsheet (ODS) files. ODS is suggested to be an open standard for spreadsheets. It is famous for being the default format used by LibreOffice and its predecessor OpenOffice. Common spreadsheet software tools such as Microsoft Excel and Google Sheets also support this format. Many governments around the world use this format to circulate open data.

write_ods

Thanks to the contributions from John Foster, write_ods has improved significantly. The initial implementation of write_ods was done by Thomas J. Leeper. Similar to a xlsx file, an ODS file can have multiple sheets. In version 1.7, write_ods can write a data frame to an existing ODS file. New arguments append and update make this possible.

write_ods is very easy to use. It takes two essential argument: the data frame you would like to be written into a file (e.g. PlantGrowth) and the file name (“plant.ods”).

library(readODS)
write_ods(PlantGrowth, "plant.ods", sheet = "plant")

The ODS file can be read using read_ods.

read_ods("plant.ods")

##    weight group
## 1    4.17  ctrl
## 2    5.58  ctrl
## 3    5.18  ctrl
## 4    6.11  ctrl
## 5    4.50  ctrl
## 6    4.61  ctrl
## 7    5.17  ctrl
## 8    4.53  ctrl
## 9    5.33  ctrl
## 10   5.14  ctrl
## 11   4.81  trt1
## 12   4.17  trt1
## 13   4.41  trt1
## 14   3.59  trt1
## 15   5.87  trt1
## 16   3.83  trt1
## 17   6.03  trt1
## 18   4.89  trt1
## 19   4.32  trt1
## 20   4.69  trt1
## 21   6.31  trt2
## 22   5.12  trt2
## 23   5.54  trt2
## 24   5.50  trt2
## 25   5.37  trt2
## 26   5.29  trt2
## 27   4.92  trt2
## 28   6.15  trt2
## 29   5.80  trt2
## 30   5.26  trt2

Or listing all the sheets in the file.

list_ods_sheets("plant.ods")

## [1] "plant"

Using the append argument, you can append a new data frame (e.g. mtcars) to an existing ODS file (e.g. “plant.ods”).

write_ods(mtcars, "plant.ods", sheet = "mtcars_ods", append = TRUE)

Listing all sheets.

list_ods_sheets("plant.ods")

## [1] "plant"      "mtcars_ods"

Reading from a particularly sheet. Notice the missing row names.

read_ods("plant.ods", sheet = "mtcars_ods")

##     mpg cyl  disp  hp drat    wt  qsec vs am gear carb
## 1  21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
## 2  21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
## 3  22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
## 4  21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
## 5  18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
## 6  18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
## 7  14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
## 8  24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
## 9  22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
## 10 19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
## 11 17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
## 12 16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
## 13 17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
## 14 15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
## 15 10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
## 16 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
## 17 14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
## 18 32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
## 19 30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
## 20 33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
## 21 21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
## 22 15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2
## 23 15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2
## 24 13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4
## 25 19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
## 26 27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
## 27 26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
## 28 30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
## 29 15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
## 30 19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
## 31 15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8
## 32 21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2

Suppose you are not happy with the data in a sheet, you can use the argument update instead to update the content in a sheet.

write_ods(mtcars, "plant.ods", sheet = "mtcars_ods", update = TRUE, row_names = TRUE)
read_ods("plant.ods", sheet = "mtcars_ods")

## Warning: Missing column names filled in: 'X1' [1]

##                     NA  mpg cyl  disp  hp drat    wt  qsec vs am gear carb
## 1            Mazda RX4 21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
## 2        Mazda RX4 Wag 21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
## 3           Datsun 710 22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
## 4       Hornet 4 Drive 21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
## 5    Hornet Sportabout 18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
## 6              Valiant 18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
## 7           Duster 360 14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
## 8            Merc 240D 24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
## 9             Merc 230 22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
## 10            Merc 280 19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
## 11           Merc 280C 17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
## 12          Merc 450SE 16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
## 13          Merc 450SL 17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
## 14         Merc 450SLC 15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
## 15  Cadillac Fleetwood 10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
## 16 Lincoln Continental 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
## 17   Chrysler Imperial 14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
## 18            Fiat 128 32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
## 19         Honda Civic 30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
## 20      Toyota Corolla 33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
## 21       Toyota Corona 21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
## 22    Dodge Challenger 15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2
## 23         AMC Javelin 15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2
## 24          Camaro Z28 13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4
## 25    Pontiac Firebird 19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
## 26           Fiat X1-9 27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
## 27       Porsche 914-2 26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
## 28        Lotus Europa 30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
## 29      Ford Pantera L 15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
## 30        Ferrari Dino 19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
## 31       Maserati Bora 15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8
## 32          Volvo 142E 21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2

read_ods

The improvements to read_ods are relatively subtle. Almost invisible, in a good way.

One noticeable improvement is the output. Actually, you have seen that also in the previous examples of write_ods as well. In version 1.7, a new argument verbose is added to both read_ods and write_ods. By default, it is FALSE and thus It suppresses all messages.

read_ods("plant.ods", verbose = TRUE)

## Parsed with column specification:
## cols(
##   weight = col_double(),
##   group = col_character()
## )

##    weight group
## 1    4.17  ctrl
## 2    5.58  ctrl
## 3    5.18  ctrl
## 4    6.11  ctrl
## 5    4.50  ctrl
## 6    4.61  ctrl
## 7    5.17  ctrl
## 8    4.53  ctrl
## 9    5.33  ctrl
## 10   5.14  ctrl
## 11   4.81  trt1
## 12   4.17  trt1
## 13   4.41  trt1
## 14   3.59  trt1
## 15   5.87  trt1
## 16   3.83  trt1
## 17   6.03  trt1
## 18   4.89  trt1
## 19   4.32  trt1
## 20   4.69  trt1
## 21   6.31  trt2
## 22   5.12  trt2
## 23   5.54  trt2
## 24   5.50  trt2
## 25   5.37  trt2
## 26   5.29  trt2
## 27   4.92  trt2
## 28   6.15  trt2
## 29   5.80  trt2
## 30   5.26  trt2

Another improvement is the strings_as_factors argument. (Thanks to John Foster) This argument is historically controversial. Similar to the latest R 4.0, this is set to FALSE by default.

read_ods("plant.ods", strings_as_factors = TRUE)$group

##  [1] ctrl ctrl ctrl ctrl ctrl ctrl ctrl ctrl ctrl ctrl trt1 trt1 trt1 trt1 trt1
## [16] trt1 trt1 trt1 trt1 trt1 trt2 trt2 trt2 trt2 trt2 trt2 trt2 trt2 trt2 trt2
## Levels: ctrl trt1 trt2

Documentation and more

readme

README file of the package has finally been updated. One important change is the hex logo!

Also, a vignette is added!

Road map to readODS 2.0

As said, this 1.7 is the last release of the 1.0 series. If nothing stupid is going on, next release would be 2.0. There are many things planned for 2.0. The most important thing is that there would be quite a few breaking changes.

Deprecation

Many old functions will be deprecated. Actually, these functions are still there, only they will be renamed (and improved!). This has been going on since my tenure as the maintainer. As a member of a team studying the naming conventions of R packages, I firmly believe that a package should fanatically maintain the usage of only one single name convention. Quanteda, a package I religiously admire, has a very consistent naming convention. Its creator once told me they have a OCD maintaining that.

In the upcoming version 2.0, the following functions will be deprecated.

read.ods (the dot)
getNrOfSheetsInODS (the camel)
ods_sheets (no verb)

The updated functions of these are read_ods, get_num_sheets_in_ods and list_ods_sheets. Actually, the name of the package should also be renamed to readods too. But that’s historical and I don’t want to create even more problems.

The overwrite argument of write_ods will also be deprecated. For other file-writing functions, no one would expect to have this argument. Overwriting an existing file is the default behavior.

Just like every transition, this is going to be tough. Although read_ods has been introduced since 2016, read.ods is still in widespread use. Numerous online tutorials and Q&As have been written with read.ods, including a tutorial by a notorious data science education company (you know which one, right? If you don’t know, you don’t need to know).

In version 1.7, a warning message is displayed when you use these functions. Please don’t use those old functions in your code. Use read_ods, write_ods, list_ods_sheets and etc. Better yet, use rio. Thank you very much!

x <- read.ods("plant.ods")

## Warning in read.ods("plant.ods"): read.ods will be depreciated in the next
## version. Use read_ods instead.

At the mean time, I’m making pull requests to packages that are still using those old functions.

I think the next 2.0 CRAN release would be in next year. Probably around the summer. So now, we have one year for this (slow) transition.

readODS 2.0: what to expect?

I’ve decided to peg the expected behaviors of readODS to 3 functions: readxl::read_xlsx, writexl::write_xlsx and xlsx::write.xlsx. As the maintainer of the least popular (and arguably, the least important too) package among them, I don’t mind to be the trend-follower. I rather let them to be trend-setters.

So, what you would expect to see in readODS 2.0, you can see it in the current versions of these 3 packages, namely:

read_ods

returning tibbles
using range to select specific sheet, e.g. “Budget!B2:G14”.
trim_ws to trim white spaces before and after string data
speed

write_ods

speed

This is the end of the release note. You can skip the next part.

Reflection: blogging my packages

This is the first time for me to write a blog about my package. I’ve not even written one for oolong or rectr, my so-called mainstays. I used to think that blogging packages is not a very productive thing to do. But I think I have changed my mind.

My primary consideration, of course, is that readODS is not like oolong or rectr. readODS was created by Gerrit-Jan Schutten and I am just the maintainer. Although there is not much difference between a creator and a maintainer, this package was already quite widely used before I uptook the maintainership in 2016. It is like the reversed story of rio. rio was created by me in 2013 but it took off after Thomas Leeper’s wonderful leadership.

I don’t want to upset the users of readODS. Unlike oolong or rectr, readODS is the most used package under my wing. There are many users out there, whom I may not even know. I think it is better to open about my role.

Another issue, of course, is about popularity. The reason for me to uptake the maintainership of the package is to popularize the ODS format. I hope that this format can really be the open format. I don’t know if I have helped.

The data science community loves blog posts. I’ve read the book by Emily Robinson and Jacqueline Nolis and there is a chapter only about blogging. Although I don’t consider myself to be a data scientist, I can see why data scientists think blog posts are important.

I think from now on, I will blog about at least the CRAN release of my packages.

It also suits my OKR of “think out loud”.