Conference News : Minnesota Pop Center Data Initiatives

Two reports on data initiatives at MPC were given:

David Van Riper reported on Terra Populus (TerraPop), an NSF-funded DataNet project that seeks to lower the barriers to conducting research on human-environment interactions.

Dr. Miriam King reported on the Integrated Demographic and Health Series, a new NICHD-funded data integration project that lowers the barriers to cross-temporal and cross-country research.

Jean Sack posted this resumé based on her notes:

David Van Riper presented Terra Populus: Integrated Data on Population and Environment, one of 5 projects funded under the National Science Foundation's DataNet program, which was established in 2007.

GIS mapping at the Minnesota Population Center, in collaboration with CIESIN at Columbia University, ICPSR, and the Institute on the Environment at the Univ. of Minnesota; humans in the environment; INNE crop data (needs documentation and archiving).

  1. DataONE (second 5 years of funding given but reduced)
  2. Data Conservancy (failed 18 month review and lost funding)
  3. DFC: DataNet Federation Consortium in N. Carolina
  4. ??
  5. SEAD: Sustainable Environments – Actionable Data

TerraPop is like a blender that mixes together 3 levels of data to make them interoperable: microdata, area-level data, and raster data (land cover tied to spatial coordinates, such as trees, crops, water bodies, and roads tied to GIS). TerraPop is trying to create tools to quickly link population and raster data with climate data and income data.


Improved data access – some data comes in tiles from FTP sites, so access is now simplified to global resource sets broken down into country sets. “Liberating data from hard-to-use formats”: some data is in HTML tables, such as in Croatia (not machine readable); Laos data came in tables, which were converted into digital metadata.

Preservation – need plans! Previous versions of data are difficult or impossible to find when a new collection supersedes old collections.

Documentation – data lacks sufficient (or any) metadata. E.g. EarthStat had none on its FTP site, only TIFF files without a data description; TerraPop wrote a Python script to give tips in metadata. Need user permissions. Needed area information and a lineage statement. GLI data is used by National Geographic, but gaining access to it falls apart.

Data Creation – construct historical subnational GIS data. E.g. departments in 1980 are different from those in 1970 in Tucuman, Argentina. Partners in Brazil and Latin America will be trained with GIS and cameras to disseminate through TerraPop to study population changes. Used Harvard’s map collection and took digital pictures, which can incorporate other data sources. Interlibrary loans from the Census Bureau International Collection and the Library of Congress were used to take photos.

Transformations: the learning curve for sociologists would be difficult, so TerraPop can show a precipitation map over the traditional geographic/political authority boundaries. Can calculate the % of tree cover in Brazil. Excellent coverage in South America and parts of Asia with censuses, down to county levels, but military and politics often block lower-level information sharing. 175 crops, land cover, WorldClim 1950-2000, 12 MODIS land cover types.
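To make the raster-to-area transformation concrete, here is a minimal sketch of a "% tree cover per unit" calculation. It is not TerraPop's implementation: the grid values, the class code `TREE`, the region labels, and the function name `percent_cover` are all made up for illustration, and it assumes every raster cell covers the same area (which is not true of real rasters in geographic coordinates).

```python
# Hypothetical toy data, not real MODIS codes or real boundaries.
TREE = 1  # assumed class code standing in for "tree cover"

# Land-cover raster: one class code per cell.
land_cover = [
    [1, 1, 2, 3],
    [1, 2, 2, 3],
    [1, 1, 1, 3],
]

# Rasterized boundaries: which administrative unit each cell falls in.
region_ids = [
    ["A", "A", "B", "B"],
    ["A", "A", "B", "B"],
    ["A", "A", "B", "B"],
]


def percent_cover(land_cover, region_ids, target_class):
    """Return {region: percent of its cells whose class is target_class}."""
    totals = {}
    hits = {}
    for cover_row, region_row in zip(land_cover, region_ids):
        for cover, region in zip(cover_row, region_row):
            totals[region] = totals.get(region, 0) + 1
            if cover == target_class:
                hits[region] = hits.get(region, 0) + 1
    return {r: 100.0 * hits.get(r, 0) / totals[r] for r in totals}


tree_pct = percent_cover(land_cover, region_ids, TREE)
for region in sorted(tree_pct):
    print(f"{region}: {tree_pct[region]:.1f}% tree cover")
```

The result is an area-level variable (one number per unit) that could then be attached to census microdata for people living in that unit.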

Project year 4 – in August 2015, will roll out a new TerraPop website.

Q Who are your big users, and how can I encourage our colleagues to use it?

A Let’s build it, they will come… We need to talk to environmentalists (who don’t know what data exists). David knows who owns what country data.

Q How can your data relate to other socio-economic indicators?

A The type of crop may involve child labor, gender, deforestation for soybean production, etc.

Q Will the website give geographic level by country?

A Yes, we do have geographic level info, provided we have geographic boundary sets.

Q Could economists use this?

A Yes, the Gates HarvestPlus folks were excited about the crop information. Yes for women farmers, too.

Q Is there a group that specializes in environment and population?

A University of Colorado

Q How are you leveraging your data to gain more funding?

A We have a post-doc and are learning about his former datasets. Many datasets do exist, but others are needed, and funding is needed.

Q Can people download maps?

A Yes, in August the maps and shapefiles can be downloaded from the revised TerraPop website.

IDHS: An Integrated System

Dr. Miriam King coordinates the integration of DHS and GIS, but the level of geography is not good (small samples and only representative geography). This is the third year of work on interoperability of DHS data. IDHS covers countries in Africa, plus India, with more than two DHS surveys, and is committed to integrating the newest surveys (as in Nigeria, 2013): 18 countries and 76 surveys, to look at dramatic change over time from the mid 1980s to recent years.

Works like IPUMS, so that experience carries over into the IDHS. They take the publicly available data sets and make them easier to understand, handle file management, and let users select variables in: demographics, geography, household possessions, SES, education, media, FP, sex practices & attitudes, condom use & access, HIV/AIDS, STIs, antenatal & delivery care, insurance & care access, NEW: fistula, TB; childhood diarrhea, respiratory illness, child nutrition, alcohol & tobacco use, female genital cutting, domestic violence, household decision-making. May be adding pregnancy termination.

Early 2016 will add: BIRTHS as 3rd unit of analysis and children under 3 as comparison for Feb 2016; many nonstandard variables by country, new countries Cameroon, Madagascar, Rwanda

How to use IDHS – start with unit of analysis (women or children); select samples of interest

Variable available at a glance – which survey has FGM?

Easy access to variable information

Customized datasets, easy to modify to ignore those you do not need

Variable integration without loss of detail (have now defined variables into a given code)

Users wanting to download data can use a DHS/IPF login ID or apply for access to DHS data (info on organization, contact info, reasons for using). Samples can be used from the web freely, but researchers who might want to upload data must register.

Dropdown menu of topics appears with sample data so subgroups can be seen. E.g. Domestic Violence has 14 variables (husband accuses spouse of unfaithfulness, husband ever threatens with harm); the X shows that the DHS does include the variable. Can learn about individual variables and how they differ between samples, how they changed, with clickable links to explanations. Gives case counts on variables so users can check that sample sizes are adequate and age ranges match. TABS show how variables were constructed and the survey question text; other texts are linked to those variables to see the context of the question, along with original survey forms and model questionnaires translated into English!
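The "X" availability display can be pictured with a small sketch. This is only an illustration of the idea, not the IDHS code or its data: the sample names, variable names, and the functions `availability_grid` and `samples_with` are hypothetical.

```python
# Hypothetical samples and variables, for illustration only.
sample_variables = {
    "sample_1": {"var_fgm", "var_domestic_violence"},
    "sample_2": {"var_domestic_violence"},
    "sample_3": {"var_fgm"},
}


def availability_grid(sample_variables):
    """Build {variable: {sample: "X" or "."}} across all samples."""
    samples = sorted(sample_variables)
    variables = sorted(set().union(*sample_variables.values()))
    grid = {
        var: {s: ("X" if var in sample_variables[s] else ".") for s in samples}
        for var in variables
    }
    return samples, grid


def samples_with(variable, sample_variables):
    """Answer 'which survey has this variable?' as a sorted list."""
    return sorted(s for s, names in sample_variables.items() if variable in names)


samples, grid = availability_grid(sample_variables)
print("variable".ljust(24) + " ".join(samples))
for var, row in grid.items():
    print(var.ljust(24) + " ".join(row[s].center(len(s)) for s in samples))

print(samples_with("var_fgm", sample_variables))
```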

Customized datasets are possible after log in: select samples and variables (data cart), merge files on the fly to create a single custom extract over time, select format (SPSS, Stata, SAS, CSV, ASCII), and download a fully-integrated file (with variable codes translated/integrated over time; meanings obvious). Email notification when the dataset is ready, with specifications saved by IDHS. Easy to modify datasets to add or delete samples.
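As a rough picture of what "merge files on the fly" into a single extract means, the sketch below pools records from the selected samples, keeps only the selected variables, and writes one CSV. It is an assumption-laden toy, not how IDHS builds extracts: the sample names, variable names, and the functions `build_extract` and `to_csv` are invented here.

```python
import csv
import io

# Hypothetical per-sample records; real samples have many more variables.
samples = {
    "sample_1": [
        {"age": 25, "educ": 2, "unused": 9},
        {"age": 31, "educ": 1, "unused": 7},
    ],
    "sample_2": [
        {"age": 19, "unused": 3},  # "educ" not collected in this sample
    ],
}


def build_extract(samples, selected_samples, selected_variables):
    """Stack the chosen samples into one list of rows with the chosen variables."""
    rows = []
    for name in selected_samples:
        for record in samples[name]:
            row = {"sample": name}
            for var in selected_variables:
                row[var] = record.get(var)  # None when a sample lacks the variable
            rows.append(row)
    return rows


def to_csv(rows, variables):
    """Serialize the pooled rows as CSV text; None is written as an empty field."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["sample"] + variables)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


variables = ["age", "educ"]
rows = build_extract(samples, ["sample_1", "sample_2"], variables)
csv_text = to_csv(rows, variables)
print(csv_text)
```

Adding or deleting a sample is then just a change to the list passed to `build_extract`, which mirrors the "easy to modify" point above.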

Variable integration without loss of detail: IDHS uses a variety of national surveys, not just DHS, to compare and point out differences for variables, issues of comparability, and consistent codes. Uses IPUMS-International, with international census data, for 3 countries: Bangladesh, Mexico, and Kenya. An Excel translation table harmonizes codes and labels with 3-digit composite codes, e.g. married, single, divorced. The MPC dataset saves so much time otherwise spent leafing through code books and merging files, so researchers can focus on their questions.
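A translation table of this kind can be sketched as a lookup from (sample, original code) to a harmonized composite code. Everything below is hypothetical: the sample names, the original codes, the composite codes and labels, and the convention that the first digit is the general category while the trailing digits hold extra detail are assumptions made for illustration, not the actual IDHS table.

```python
# Hypothetical composite codes: first digit = general category,
# remaining digits = detail available only in some samples.
LABELS = {
    100: "never married",
    200: "married (type not specified)",
    210: "married, monogamous",
    220: "married, polygamous",
    300: "divorced",
}

# Hypothetical translation table: (sample, original code) -> composite code.
TRANSLATION = {
    ("sample_1", 1): 100,
    ("sample_1", 2): 200,  # this sample only records "married"
    ("sample_1", 3): 300,
    ("sample_2", 0): 100,
    ("sample_2", 1): 210,  # this sample distinguishes marriage types
    ("sample_2", 2): 220,
    ("sample_2", 3): 300,
}


def harmonize(sample, original_code):
    """Translate a sample-specific code into the shared composite code."""
    return TRANSLATION[(sample, original_code)]


def general_category(composite_code):
    """Collapse a composite code to its first digit for cross-sample comparison."""
    return composite_code // 100


for sample, code in [("sample_1", 2), ("sample_2", 1), ("sample_2", 2)]:
    composite = harmonize(sample, code)
    print(sample, code, "->", composite, LABELS[composite],
          "| general:", general_category(composite))
```

Under this assumed scheme, detail such as monogamous vs. polygamous is kept where a sample has it, while the first digit still lets all samples be compared as simply "married".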

Q Does harmonization occur before samples or on the fly?

A Broadest ranges included (perhaps not monogamy or polygamy)

Q Why didn’t DHS or Macro International [ICF] do this?

A They are funded for getting out in the field, negotiating with countries, and creating final reports; they never had the orientation of looking at the historical impact of data and economic change. They lack a backward-looking perspective and are too busy getting surveys into the field and cleaning data. Minnesota Population Center had much experience and in-house knowledge of multiple data sets ($2.5m over 5 years, which is being cut by 20%, is not much to start from scratch). Dee Ruggles of US Census Bureau had experienced his own student frustrations with historical censuses.

Q How do researchers cite the data appropriately?

A On the website is a link to the proper citation format for using IDHS versions. When people apply for access they are asked to cite the data properly. Funding agencies won’t fund unless we mention the use of IDHS! The MPC bibliography names datasets. Working on YouTube instructions; hands-on exercises used at the PAA workshop on April 28th will be put online, and a help email is on the web.

Q Are there other “new” huge datasets that you are using?

A The USA-based Integrated Health survey will integrate a MEPS survey (treatment, costs of care, drugs taken, combined with status of smokers, depression, education). Multiple Indicator Cluster Surveys (MICS) from UNICEF may be harmonized… NHANES someday…

