Click on the title links to download the data for each session. Unzip the folders and place them on your desktop.
Most of these files are in CSV format, which stands for comma-separated values. These are plain text files, in which fields in the data are separated by commas, and each record is on a separate row. CSV is a common format for storing and exchanging data, and can be read by most data analysis and visualization software. Values that are intended to be treated as text, rather than numbers, are often enclosed in quote marks.
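As a minimal sketch of how such a file is read, the following uses Python's standard csv module on a small in-memory sample. The sample rows and the read_records helper are hypothetical, made up for illustration, and not taken from any of the session files. Note how the quoted name keeps its internal comma, and how every value arrives as text until it is converted.

```python
import csv
import io

# Hypothetical sample data, invented for illustration only.
sample = 'name,team,salary\n"Smith, John",ABC,500000\n"Lee, Ana",XYZ,750000\n'

def read_records(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))

records = read_records(sample)
for row in records:
    # Values are read as strings; convert explicitly where numbers are needed.
    print(row["name"], int(row["salary"]))
```

To read one of the downloaded files instead, the same csv.DictReader call can be given a file opened with open(path, newline="").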
The characters used to separate the variables, called ‘delimiters,’ may vary. A .tsv extension, for instance, indicates that the variables are separated by tabs. More generally, text files have the extension .txt.
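The choice of delimiter usually only changes one setting when reading a file. The sketch below, again using Python's standard csv module, reads tab-separated text by passing delimiter="\t". The sample text and the read_delimited helper are hypothetical and shown only to illustrate the idea.

```python
import csv
import io

# Hypothetical tab-separated sample, invented for illustration only.
tsv_text = "region\tyear\tvalue\nEurope\t2000\t1.5\nAsia\t2000\t2.5\n"

def read_delimited(text, delimiter=","):
    """Split delimited text into a list of rows, each a list of strings."""
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))

rows = read_delimited(tsv_text, delimiter="\t")
for row in rows:
    print(row)
```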
The file ca_counties_medicare.zip is a shapefile, which is a common format used for geographical data.
Shapefiles are usually made available for download as zipped folders, and actually consist of a series of files. At a minimum, a shapefile must contain three component files, with the same root name and the following extensions:
.shp
The main file containing the geometry of the points, lines or polygons mapped in the shapefile.
.dbf
A database file in dBASE format containing a table of data relating to the components of the geometry. For example, in a world shapefile giving national boundaries, this table might contain data about the countries including their names, capital cities, population, annual GDP, and so on.
.shx
A positional index of the shapefile’s geometry.

Zipped folder containing the following files:
oil_production.csv
Data on oil production by world region from 2000 onwards, in thousands of barrels per day, from the U.S. Energy Information Administration.
gdp_pc.csv
World Bank data on Gross Domestic Product (GDP) per capita for nations and groups of nations, from 1990 onwards, in current international dollars, corrected for purchasing power in different territories.
mlb_salaries_2015.csv
Salaries of players in Major League Baseball at the start of the 2015 season, from the Lahman Baseball Database.
disease_democ.csv
Data illustrating a controversial theory suggesting that the emergence of democratic political systems has depended largely on nations having low rates of infectious disease, from the Global Infectious Diseases and Epidemiology Network and Democratization: A Comparative Analysis of 170 Countries.
food_stamps.csv
Percentage of Americans participating in the federal Supplemental Nutrition Assistance Program from 1969 onwards, from the U.S. Department of Agriculture, plus the percentage of Americans below the federal poverty level, from the U.S. Census Bureau.
nations_2015.csv
Data from the World Bank Indicators portal, which is an incredibly rich resource. Contains the following fields:
iso2c
iso3c
Two- and three-letter codes for each country, assigned by the International Organization for Standardization.
country
Country name.
year
2015 for this data.
gdp_percap
Gross Domestic Product per capita in current international dollars, corrected for purchasing power in different territories.
life_expect
Life expectancy at birth, in years.
population
Estimated total population at mid-year, including all residents apart from refugees.
region
income
World Bank regions and income groups, explained here.

Zipped folder containing the following:
nations.csv
Data from the World Bank Indicators portal. Contains the following fields:
iso2c
iso3c
Two- and three-letter codes for each country, assigned by the International Organization for Standardization.
country
Country name.
year
gdp_percap
Gross Domestic Product per capita in current international dollars, corrected for purchasing power in different territories.
life_expect
Life expectancy at birth, in years.
population
Estimated total population at mid-year, including all residents apart from refugees.
birth_rate
Live births during the year per 1,000 people, based on mid-year population estimate.
neonat_mortal_rate
Neonatal mortality rate: babies dying before reaching 28 days of age, per 1,000 live births in a given year.
region
income
World Bank regions and income groups, explained here.

Zipped folder containing the following files:
ca_counties_medicare.zip
Zipped shapefile with data on Medicare reimbursement per enrollee by California county in 2012, from the Dartmouth Atlas of Healthcare.
healthcare_facilities.csv
Locations and other data for hospitals and other healthcare facilities in California, from the California Department of Public Health. I have geocoded those facilities that lacked latitude and longitude coordinates in the raw data.

The folders for each session each also contain a template web page, test.html, for embedding the visualizations that you will make.
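Before working with a zipped shapefile such as ca_counties_medicare.zip, it can help to confirm that the archive holds the three required components described earlier (.shp, .dbf and .shx). The following is a minimal sketch using Python's standard zipfile module. The missing_shapefile_parts helper is a hypothetical name, and the demonstration builds a small in-memory archive with placeholder files rather than opening the real download. It only checks that each extension is present, not that the files share a root name.

```python
import io
import os
import zipfile

# The three component files every shapefile needs.
REQUIRED_EXTENSIONS = {".shp", ".dbf", ".shx"}

def missing_shapefile_parts(zip_source):
    """Return the required extensions absent from a zip archive, sorted.

    zip_source may be a file path or a file-like object.
    """
    with zipfile.ZipFile(zip_source) as archive:
        found = {os.path.splitext(name)[1].lower() for name in archive.namelist()}
    return sorted(REQUIRED_EXTENSIONS - found)

# Demonstration with an in-memory archive holding empty placeholder files,
# so the sketch runs without any download. The .shx file is left out on purpose.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("demo.shp", b"")
    archive.writestr("demo.dbf", b"")

print(missing_shapefile_parts(buffer))  # ['.shx']
```

Passing the path to a downloaded archive in place of the in-memory buffer should return an empty list when all three components are present.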