Don't ask me why I did that. I'm not doing this for a PhD. I think I deserve a CS PhD (from the mighty University of 1-555-confide, of course) for doing this without resorting to retyping. Well, opinions vary.
The data is not completely clean and will only be production-ready when the version number reaches 1.0.
Data source: Government Gazette, Extraordinary No. 10 of 2012 (政府憲報 2012 年第 10 號號外公告)
Version 0.1:
Version 0.2:
* fixed the incorrect number of nominees (should be n = 310) and some OCR craziness in leung.txt
Version 1.0:
* chomped all trailing whitespace on the name list
* combined all three files into one csv for easy Excel import
* added the R script for data cleansing, dataprocessing.R
Dataset: nominee
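For the curious, the two Version 1.0 cleanup steps (chomping trailing whitespace and merging the name lists into one CSV) can be sketched roughly as below. This is a Python illustration only, not the contents of dataprocessing.R. It assumes each input file holds one name per line, and the function name `combine_name_lists` and the column names `source` and `name` are made up for the example.

```python
import csv
from pathlib import Path


def combine_name_lists(paths, out_path):
    """Strip trailing whitespace from every name and write one combined CSV.

    Assumes one name per line in each input file (an assumption, not
    something the original data files are known to guarantee).
    """
    rows = []
    for path in paths:
        source = Path(path).stem  # e.g. "leung" for "leung.txt"
        with open(path, encoding="utf-8") as fh:
            for line in fh:
                name = line.rstrip()  # drops trailing spaces, tabs and the newline
                if name:  # skip blank lines
                    rows.append((source, name))

    # "utf-8-sig" writes a byte order mark, which helps Excel detect UTF-8.
    with open(out_path, "w", newline="", encoding="utf-8-sig") as fh:
        writer = csv.writer(fh)
        writer.writerow(["source", "name"])  # hypothetical column names
        writer.writerows(rows)
    return rows
```

Calling `combine_name_lists([...three .txt paths...], "combined.csv")` would then produce a single file that opens directly in Excel; the output file name here is a placeholder too.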
Version 1.1:
* fixed more incorrect cells
* matched the English name in ElectComm.csv