Don't ask me why I did that. I'm not doing this for a PhD. I think I deserve a CS PhD (from the mighty University of 1-555-confide, of course) for doing this without resorting to retyping. Well, opinions vary.

The data is not completely clean and will only be production-ready when the version number reaches 1.0.

Data source: Government Gazette, Extraordinary No. 10 of 2012 (政府憲報 2012 年第 10 號號外公告)

Version 0.1:




Version 0.2:

* fixed the incorrect number of nominees (should be n = 310) and some OCR craziness in leung.txt


Version 1.0:

* chomped all trailing whitespace on the name list
* combined all three files into one csv for easy Excel import
* added the R script used for data cleansing, dataprocessing.R
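
The whitespace cleanup and the merge into a single csv could look roughly like the sketch below. It is written in Python purely for illustration and is not the contents of dataprocessing.R. The function name `combine_name_lists`, the column names `candidate` and `nominee`, and any source label other than `leung` are assumptions made for this example.

```python
import csv
import io


def combine_name_lists(name_lists):
    """Merge per-source name lists into one CSV string.

    name_lists maps a source label (hypothetical, e.g. "leung") to the
    raw lines read from that source's text file.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["candidate", "nominee"])  # assumed column names
    for candidate, lines in name_lists.items():
        for line in lines:
            name = line.rstrip()  # chomp trailing whitespace, newline included
            if name:  # skip lines that were blank or whitespace only
                writer.writerow([candidate, name])
    return buffer.getvalue()
```

When the result is written to disk for Excel, an encoding such as `utf-8-sig` is usually needed so that non-ASCII names display correctly; whether the original csv was produced that way is not stated here.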

Dataset: nominee

Version 1.1:

* fixed more incorrect cells
* matched the English names to those in ElectComm.csv
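
As a rough idea of what such a name-matching pass might involve, here is a minimal Python sketch. It assumes, without confirmation from the data itself, that ElectComm.csv can be turned into a lookup from the name as it appears in the nominee list to an English name. The function `attach_english_names` and the shape of `lookup` are hypothetical.

```python
def attach_english_names(nominees, lookup):
    """Split nominees into matched and unmatched entries.

    nominees: list of name strings from the nominee list.
    lookup: dict mapping a nominee name to an English name
            (assumed to be built from ElectComm.csv).
    """
    matched = []
    unmatched = []
    for name in nominees:
        key = name.strip()  # ignore stray surrounding whitespace
        english = lookup.get(key)
        if english is None:
            unmatched.append(key)  # left for manual checking
        else:
            matched.append((key, english))
    return matched, unmatched
```

Keeping the unmatched names in a separate list makes it easy to see which cells still need fixing by hand.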