The IASTED African Conference on
Health Informatics
AfricaHI 2010

Science and Technology Applications for Health and Sustainable Development

September 6 – 8, 2010
Gaborone, Botswana


Impact of Missing Data Estimation

Prof. Tshilidzi Marwala
University of Johannesburg, South Africa


This tutorial presents an impact assessment for the imputation of missing data. The assessment measures the impact of imputing missing data on the statistical nature of the data, on a classifier, and on a logistic regression system. The data set used is HIV seroprevalence data from an antenatal clinic survey performed in 2001. Data imputation is performed with Random Forests, selected because they gave the best imputation performance when compared with five other techniques. Test sets are developed which consist of the original data and of imputed data in which varying numbers of specifically selected variables have been treated as missing and imputed. Results indicate that, for this data set, the evaluated properties and tested paradigms are fairly immune to missing data imputation. The impact is not highly significant: for example, in the logistic regression analysis, the HIV status probability predictions made with the full set and with a set containing two imputed variables show a linear correlation of 96%.
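The kind of comparison described above can be illustrated with a minimal sketch. It is not the tutorial's method or data. The records are synthetic, the logistic regression weights and bias are hypothetical, and simple mean imputation stands in for the Random Forest imputer. The sketch only shows how predictions from a full set and from a set with two imputed variables might be compared by linear correlation.

```python
import math
import random


def pearson(xs, ys):
    """Pearson linear correlation between two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def predict_proba(row, weights, bias):
    """Logistic regression probability for a single record."""
    z = bias + sum(w * v for w, v in zip(weights, row))
    return 1.0 / (1.0 + math.exp(-z))


def mean_impute(rows, missing_cols):
    """Treat the given columns as missing and fill them with column means.

    Stand-in for a real imputer (the tutorial uses Random Forests). For
    simplicity the means are taken from the same synthetic records.
    """
    means = {c: sum(r[c] for r in rows) / len(rows) for c in missing_cols}
    return [[means[c] if c in means else v for c, v in enumerate(r)]
            for r in rows]


rng = random.Random(0)

# Synthetic records with four numeric features (assumption, not survey data).
rows = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(500)]

# Hypothetical, fixed logistic regression parameters (not fitted values).
weights = [1.5, -1.0, 0.3, 0.2]
bias = -0.5

# Predictions with the full set versus a set with two imputed variables.
full = [predict_proba(r, weights, bias) for r in rows]
imputed_rows = mean_impute(rows, missing_cols=[2, 3])
imputed = [predict_proba(r, weights, bias) for r in imputed_rows]

r = pearson(full, imputed)
print(f"Linear correlation between full and imputed predictions: {r:.3f}")
```

The correlation obtained depends entirely on the assumed weights and on which variables are imputed, so the printed value says nothing about the result reported for the survey data.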

Qualifications of the Instructor(s)

Tshilidzi Marwala, born 28 July 1971 in Venda, Limpopo, South Africa, is the Executive Dean of the Faculty of Engineering and the Built Environment at the University of Johannesburg. He was previously a full Professor of Electrical Engineering, the Carl and Emily Fuchs Chair of Systems and Control Engineering, and the DST/NRF South Africa Research Chair of Systems Engineering at the University of the Witwatersrand. He is the youngest recipient of the Order of Mapungubwe and was the first African engineer to be awarded the President Award by the National Research Foundation of South Africa. He holds a Bachelor of Science in Mechanical Engineering (Magna Cum Laude) from Case Western Reserve University, a Master of Engineering from the University of Pretoria, and a PhD in Engineering from the University of Cambridge, and he completed the Program for Leadership Development at Harvard Business School. He was a post-doctoral research associate at the Imperial College of Science, Technology and Medicine, and from 2006 to 2007 he was a visiting fellow at Harvard University. For 2007 to 2008 he was appointed a visiting fellow of Wolfson College, Cambridge. His research interests include the application of computational intelligence to engineering, computer science, finance, social science and medicine. He has supervised 38 master's and PhD students and has published over 200 papers in journals (such as the American Institute of Aeronautics and Astronautics Journal), proceedings and book chapters. His work has appeared in publications such as The Economist, Time Magazine and New Scientist.