May 8, 2013

Building a custom database of country time-series data using Quandl

Encouraged by this post I had another look at quandl for collecting datasets from different agencies. Right now I need to get data for four countries on a couple of dozen indicators.


This graphic is just a quick example with only two indicators of what I am aiming to be able to do.

The process on Quandl at the moment is a bit fiddly:

  • there is no search function in the API
  • the country codes used are different from agency to agency

So my workflow is this. It isn’t as complicated as it sounds. I have used spreadsheets to store country codes and queries to make it all as re-useable as possible. You can download the spreadsheets here and here.

  • edit the csv spreadsheet of the 2-and 3-digit ISO country codes, plus the actual names. Also, WHO for some reasons uses some other codes which I had to paste in by hand. If you find your sources are also using yet other codes, you can add them to the spreadsheet. Put an x in the enabled” column to mark the countries you want to use.
  • search manually at quandl for interesting queries and add them to the other csv spreadsheet, replacing the country code with %s, again putting an x in the enabled” column for the queries you want, adding a human-readable title in the title” column if you want and putting alpha2” or alpha3″ etc in the country_sign column to mark which kind of country code is being used.
  • run the script below.











for(qq in 1:nrow(queries)){


for(cc in 1:nrow(codesE)){












rr$Value=ifelse(!$Value),rr$Value,rr$Percent) #you might have to do something like this if your queries are returning data in columns with some other label than Value

#then try a graphic for demonstration purposes

ggplot(data=rr,aes(x=Year,y=Value,group=Country,colour=Country))+geom_point(size=3)+geom_line()+facet_grid(Indicator~.,scales=“free”)+ theme(strip.text.y = theme_text(size = 13, hjust=0,angle = 0))+theme(axis.text.x=element_text(angle=90))

And voila.

I wanted to put the spreadsheets as a google spreadsheet but it seems RGoogleDocs is not working for R 3.0.

development evaluation R reproducibleResearch code

Previous post
Changing figure options mid-chunk (in a loop) using the pander package. I wrote already about changing figure options mid-chunk in reproducible research. This can be important  e.g. if you are looping through a dataset
Next post
Omni test for statistical significance Rplot13 In survey research, our datasets nearly always comprise variables with mixed measurement levels - in particular, nominal, ordinal and

This blog by Steve Powell is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License, syndicated on r-bloggers and powered by Blot.
Privacy Policy