Big Data Experience

I have experience working with 18 big data sets: BTD, USTD, GDP6, LAQN, HD2015, USRDS, NIS, BDHS-99, BDHS-11, CARDIA, MESA, NGHS, NSD, NCD, K12SD, USPOST, EXPCS, and CSD. I have also explored two other big data sets from the DC Sentencing Commission, which I cannot disclose because of an NDA (Non-Disclosure Agreement).

BTD and USTD stand for Bangladesh temperature data and USA temperature data. GDP6 stands for GDP data from six countries (Australia, Canada, Germany, Japan, UK, and USA) from 1970 to 2015. These data are used in a graduate-level Time Series course for a comparative study of parametric and nonparametric time series models. LAQN stands for London Air Quality Network data. I am currently working on this data for one-step kernel log-likelihood estimation and two-step smoothing estimation of time-varying parameters.

USRDS stands for United States Renal Data System, and NIS stands for National Inpatient Sample. I worked on the USRDS and NIS data sets when I was a Postdoctoral Fellow at NIAMS, NIH. BDHS-99 and BDHS-11 stand for the Bangladesh Demographic and Health Survey for the years 1999 and 2011. NGHS stands for National Growth and Health Study, a longitudinal study conducted from September 1985 to March 2000 under the supervision of NIH. My PhD dissertation data comes from this study. I am very familiar with the MESA and CARDIA studies, as I collaborated with NIH and Johns Hopkins University under the supervision of Dr. Colin Wu and Dr. Joao Lima. MESA stands for Multi-Ethnic Study of Atherosclerosis. CARDIA stands for Coronary Artery Risk Development in Young Adults. So far, there have been eight CARDIA studies, and I have used SAS macros to combine the SAS files from each study.

NSD stands for National School Data Set. I came across this data set when I worked as an analytic researcher at K12. K12SD stands for K12 School Data Set. NCD stands for National Crime Data Set. I worked on this data when I was a part-time statistical modeler at the Washington DC Sentencing Commission. USPOST stands for United States Postal Service Data Set. I worked on this data when I was a summer intern at IHS Global Insight in Washington DC. EXPCS stands for Excel Academy Public School Data Set. I used this data when I worked as chief data analyst at Excel Academy Public Charter School in Washington DC. CSD stands for Convenience Store Data Set, which I came across when I worked as a summer intern at NACS headquarters in Virginia.