On the PostGIS discussion list, a question was asked:
> I have a polygon table that has many small areas and holes. Now, I would
> like to remove small areas and holes that are 2800 m^2.
> Any help or advice would be really appreciated.
I decided to have a bash at this as part of my continuing to learn PostGIS.
This article will firstly look at a simple (single) polygon (I will use WKT to construct a polygon with two inner holes) which will make it easy for my readers to replicate. Then it will consider how to handle situations where a supplied polygon has no holes (inner rings) and how to handle polygons and multipolygons together before concluding with some performance figures.
1. Single Polygon with Inner Rings (holes)
The function that can help us do what we want is the ST_DumpRings(geometry) function. Here we can see how the function, applied to our test geomety, will extract all the rings for processing:
SELECT ST_AsText(b.the_geom)AS final_geom, ST_Area(b.the_geom)AS area
FROM(SELECT(ST_DumpRings(a.geom)).geom AS the_geom
Note that the output is a MULTIPOLYGON with each ring being a separate polygon in the collection and not a single POLYGON (as was the source data) with inner rings which is what we do want as the result.
Searching the PostGIS documentation indicated that I should be able to reconstitute the rings into a single polygon via use of the ST_Difference aggregate. Here we want to take the outer ring as the first argument to ST_Difference and subtract the set of all inner rings from it:
The splitting of the SQL into the two halves to extract the outer ring and the inner rings (separately) had to be done because the only method of reconstructing the polygon was via the ST_MakePolygon function and it needs two inputs: a linestring for the outer ring and an array of linestrings for the inner rings left after being filtered by area. Sadly, ST_Collect() on the straight output of ST_DumpRings (with no filtering by path only area) just generates a multipolygon. I tried playing around with ST_Intersection etc but these still return a multipolygon.
Now, this SQL is fine for a single, hardcoded test geometry, but I think it would be messy if we tried to use it to process a lot of polygons in a table. The best way to approach this is to encapsulate the above SQL (a complete algorithm) in a function and then use the function when processing our tabular data.
After submission to the PostGIS discussion list, the greatly skilled and experienced Regina Obe suggested a few small amendments to my SQL and function.
“You should do this:
Instead of this:
SELECT (ST_DumpRings(a.geom)).geom As the_geom, path(ST_DumpRings(a.geom)) as path
(Which would mean in the upper part you would need to reference by .geom instead of the_geom.)
The reason for that is internally PostgreSQL will call ST_DumpRings twice [cf Deterministic functions in Oracle). This was pointed out to me by a very experienced PostgreSQL developer: His blog entry about it is here”
Implementing this suggestion leads to this change in the filter_rings function.
Of course, having a filter function for polygons is great, but many real-world definitions of area features involves multiple outer shells. So, we really need an implementation that handles mutli-polygons. Let’s try the following.
In summary, I think Regina’s suggested change to use ST_BuildArea is neater and cleaner than my use of ST_MakePolygon. But my final solution is a combination of both.
What is now needed is some performance analysis of Regina and my approaches especially given Regina’s comment alluding to performance.
I have a dataset which contains some very complicated rural polygons with multiple inner rings so I thought I would test using that. Here are the two test SQL statements:
-- Test SQL 1
FROM rural_areas a
-- Test SQL 2
SELECT filter_rings(a.geom,10000)/* , and ms */
FROM rural_areas a
I also recompiled filter_rings to include the fuller multi-geometry handling and ran the last SQL statement again. The relative performance can be seen in the table below.
Run 1 ms
Run 2 ms
Clearly the single polygon-handling ST_MakePolygon-based filter_rings performs faster than the ST_BuildArea-based upgis_filter_rings function. Converting filter_rings so that it handles multi-polygons pushes the performance out: this is probably due to the use of ST_BuildArea and the multi-polygon disaggregation/aggregation. I suspect changing Regina’s upgis_filter_rings to be multi-polygon aware would push the performance of that function out even further.
My attempt at converting Regina’s function is:
CREATEORREPLACEFUNCTION upgis_filter_rings(geometry,FLOAT)RETURNS geometry AS
FROM(SELECT(ST_DumpRings(ST_GeometryN(ST_Multi($1),/*ST_Multi converts any Single Polygons to MultiPolygons */
(b.path>0AND ST_Area(b.geom)> $2)
Testing against the rural_areas table showed the function ran at 147281 (Run 1) and 144813 (Run 2) ms. Regina was right: upgis_filter_rings is slower because of its use of ST_BuildArea.
In summary, this has demonstrated to me that PostGIS is very functionally rich in the handling of basic simple features geometries. It has more construction and editing tools than any of the current commercial products though many of those have functionality that PostGIS still does not have.
I hope this article gives, the reader, some idea as to the direction you could take…