Click on the title links to download the data. Please contact me before the class in which data will be used if you have any problems downloading!
Contains:
urls.txt Links scraped from the State of New York site documenting disciplinary actions against doctors.
ny_dr_discip_urls.html Web page with a table giving links to pages documenting individual disciplinary actions against New York doctors.
cleaning Folder containing:
techexports.xls High-technology exports from 1990 to 2015, in current US dollars, from the UN Comtrade database, supplied via the World Bank. High-technology exports include products in aerospace, computers, pharmaceuticals, scientific instruments, and electrical machinery.
ny_dr_discip_actions.xls The data on disciplinary actions against doctors in New York state, scraped from this website in the previous class.
ucb_stanford_2014.csv Data on federal government grants to UC Berkeley and Stanford University in 2014, downloaded from USASpending.gov.
geocoding Folder containing:
sf_test_addresses.tsv Text file containing a list of 100 addresses in San Francisco.
sf_addresses_short.tsv The first 10 addresses from the previous file.
refine-geocoder.json A script in JSON format that we will use to automate geocoding.
pfizer Folder containing:
pfizer.sqlite SQLite database detailing payments by the drug company Pfizer to doctors.
pfizer.txt Tab-delimited text file containing the same data.
fda.txt Tab-delimited text file documenting warning letters from the Food and Drug Administration to doctors involved in research on experimental drugs or medical devices.
Data used in reporting this news story.
Containing:
nations.csv Data from the World Bank Indicators portal, which is an incredibly rich resource. Contains the following fields:
iso2c iso3c Two- and three-letter codes for each country, assigned by the International Organization for Standardization.
country Country name.
year
gdp_percap Gross Domestic Product per capita in current international dollars, corrected for purchasing power in different territories.
life_expect Life expectancy at birth, in years.
population Estimated total population at mid-year, including all residents apart from refugees.
birth_rate Live births during the year per 1,000 people, based on mid-year population estimate.
neonat_mortal_rate Neonatal mortality rate: babies dying before reaching 28 days of age, per 1,000 live births in a given year.
region income World Bank regions and income groups, explained here.
Containing:
oregon_dams.csv Data on dams in Oregon, from the state’s dams inventory. Contains fields including hazard (H=High, S=Significant, L=Low), and storage, giving each dam’s capacity in acre-feet.
seismic_risk.zip Data on seismic risk for the contiguous US, calculated by the US Geological Survey. The risk values are in the field acc_val, and represent the peak ground acceleration, expressed as a percentage of the acceleration due to gravity, that has a 2% chance of being exceeded over 50 years. This data is a zipped shapefile, a common format for geographic data. Broadly speaking, a shapefile contains a table of data, which is associated with map geometries that can be polygons, lines, or points.
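To make the units of acc_val concrete, here is a minimal Python sketch; it is not part of the class materials. It converts a value from percent of gravity to meters per second squared, and estimates the return period implied by a 2% chance over 50 years. It assumes standard gravity of 9.80665 m/s² and that exceedances follow a Poisson process. The function names and the sample value of 40 are purely illustrative.

```python
import math

# Conventional standard gravity in m/s^2 (an assumption; local gravity varies slightly).
STANDARD_GRAVITY = 9.80665


def pga_to_ms2(acc_val_percent_g):
    """Convert peak ground acceleration from percent of g to m/s^2."""
    return acc_val_percent_g / 100 * STANDARD_GRAVITY


def return_period_years(probability, window_years):
    """Return period implied by a probability of exceedance within a time window.

    Assumes exceedances follow a Poisson process, so that
    probability = 1 - exp(-window_years / return_period).
    """
    return -window_years / math.log(1 - probability)


sample_acc_val = 40  # hypothetical acc_val, not taken from the shapefile
print(f"{sample_acc_val}% g is about {pga_to_ms2(sample_acc_val):.2f} m/s^2")
print(f"2% in 50 years implies roughly a {return_period_years(0.02, 50):.0f}-year return period")
```

Under these assumptions, a 2% chance over 50 years corresponds to shaking expected roughly once every 2,475 years.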
We will also import data from the USGS earthquakes API, for earthquakes since 1960 with a magnitude of 5 and above, within a 2,000 km radius of the geographic center of Oregon, using this URL:
http://earthquake.usgs.gov/fdsnws/event/1/query?starttime=1960-01-01T00:00:00&latitude=43.9336&longitude=-120.5583&maxradiuskm=2000&minmagnitude=5&format=csv&orderby=time
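Each part of the query string in that URL maps onto one of the conditions described above. As a rough sketch only, and not necessarily how we will import the data in class, the same URL can be assembled in Python from its parameters using the standard library. The names build_quake_url and fetch_quakes are hypothetical helpers introduced for illustration; fetch_quakes needs a network connection and is not called here.

```python
import csv
import io
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://earthquake.usgs.gov/fdsnws/event/1/query"


def build_quake_url():
    """Assemble the query URL from the parameters in the URL given above."""
    params = {
        "starttime": "1960-01-01T00:00:00",  # earthquakes since 1960
        "latitude": 43.9336,                 # geographic center of Oregon
        "longitude": -120.5583,
        "maxradiuskm": 2000,                 # search radius in kilometers
        "minmagnitude": 5,                   # magnitude 5 and above
        "format": "csv",
        "orderby": "time",
    }
    # safe=":" leaves the colons in the timestamp unescaped, matching the URL above.
    return BASE_URL + "?" + urlencode(params, safe=":")


def fetch_quakes(url):
    """Download the CSV and return its rows as a list of dictionaries (needs network access)."""
    with urlopen(url) as response:
        text = response.read().decode("utf-8")
    return list(csv.DictReader(io.StringIO(text)))


print(build_quake_url())
```

Changing a value in params, such as minmagnitude or maxradiuskm, produces a URL for a different query without editing the string by hand.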