The most common spatial format that geographic data is distributed in is ESRI shapefiles. The geographic datasets provided at the City of Chicago's open data portal are no different. In this post, I'm going to walk through converting an ESRI shapefile into SQL, importing that SQL into a PostGIS enabled PostgreSQL database, and querying the geographic data.
Before we get started you will need to download and/or install the following data and software:
- PostGIS/PostgreSQL database
- The GDAL libraries
- A dataset from the City of Chicago Data Portal. For this example I am going to use the Ward data provided by the city.
- pgAdminIII - A PostgreSQL admin and management tool that is optional but highly recommended.
Step 1: SHP file to SQL file
The data published by the city uses a spacial reference system (SRS) called NAD 1983 StatePlane Illinois East FIPS 1201 Feet to store data. This SRS falls into the State Plane category which is typically great for measurement and maps covering a limited geographic area like a state or city. For most contexts this is perfect but, if you want to use this spatial data with mapping layers or geocoding services from Google or Bing, it will cause problems. Mapping providers like Google and Microsoft Bing expect data to be stored using the WGS 84 (EPSG code 4326) SRS. This means that if we want to use latitude and longitude coordinates from these services or utilize their mapping layers we will have to convert our data from State Plane to WGS 84. Thankfully the GDAL tools provide a utility to transform the data.
ogr2ogr -f "ESRI Shapefile" -s_srs "EPSG:102671" -t_srs "EPSG:4326" Wards4326.shp Wards.shp
Once the shapefile has been converted from State Plane to WGS 84, we can then use the shp2pgsql tool to generate the SQL statements necessary to add the geography data to our database.
shp2pgsql -s 4326 -I -c -W UTF-8 Wards4326.shp wards > wards.sql
Here is a sample of one of the insert statements generated by the shp2pgsql with geometry column (the_geom) abbreviated:
INSERT INTO "wards" ("data_admin","perimeter","ward","alderman","class","ward_phone","hall_phone","hall_offic","address","edit_date1","shape_area","shape_len",the_geom) VALUES ('111078823.37393899','98206.78600274','20','WILLIE COCHRAN','1','773-955-5610','312-744-6840','121 N LASALLE ST, RM 300 OFFICE 19, 60602','6357 S COTTAGE GROVE','20030527','111200062.771999999','98166.05290559999','0106000020E6100...'
Step 2: Insert Data into DB
Now that the dataset has been transformed and the SQL has been generated, it is time to add it to the database. You can do so by running the following command. (NOTE: you will have to swap in your own database name and user for the $DB_NAME and $DB_USER arguments).
psql -d $DB_NAME $DB_USER < wards.sql
Step 3: Write Queries
With the data entered into the database we can now write queries against it. Here is a query that returns information about the ward that contains the point specified by latitude and longitude arguments.
SELECT ward, alderman FROM wards WHERE ST_Contains(the_geom, ST_GeomFromText('POINT(41.927064 87.645658)', 4326));
The WHERE clause is calling the spatial relationship function ST_Contains. Spatial relationship functions in PostGIS take two input geometries and return either a boolean or another geometry. In our case, ST_Contains returns a boolean indicating if the point geometry passed as the second argument is contained in the geometry stored in the the_geom column of the wards table. The point geometry is constructed from another spatial function ST_GeomFromText. This function takes a textual representation of the geometry along with a spatial reference id (SRID) specifying the SRS that the geometry should be created in. In our case, we are using latitude and longitude values obtained from Google so we are creating the point geometry in the EPSG 4326 SRS.
I found the following links useful as I started learning about PostGIS.