Go to content Go to navigation and search

Home

Current PostGIS Blog Articles


Search

Browse

RSS / Atom

Email me

textpattern

Creative Commons License
All Blog Articles, Data Models and Free Source Code by Simon Greener, The SpatialDB Advisor is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.

Filtering Rings in Polygon (PostGIS)

Thursday January 01 2009 at 07:30

Keywordsfilter ring rings hole holes postgis polygon multipolygon
Summary

This article shows how to create a function to filter the inner rings of a (multi)polygon in PostGIS.

This is my first blog article on PostGIS.

On the PostGIS discussion list, a question was asked:

> I have a polygon table that has many small areas and holes. Now, I would
> like to remove small areas and holes that are 2800 m^2.
>
> Any help or advice would be really appreciated.

I decided to have a bash at this as part of my continuing to learn PostGIS.

This article will firstly look at a simple (single) polygon (I will use WKT to construct a polygon with two inner holes) which will make it easy for my readers to replicate. Then it will consider how to handle situations where a supplied polygon has no holes (inner rings) and how to handle polygons and multipolygons together before concluding with some performance figures.

1. Single Polygon with Inner Rings (holes)

The function that can help us do what we want is the ST_DumpRings(geometry) function. Here we can see how the function, applied to our test geomety, will extract all the rings for processing:

  1. SELECT ST_AsText(b.the_geom) AS final_geom, ST_Area(b.the_geom) AS area
  2.   FROM (SELECT (ST_DumpRings(a.geom)).geom AS the_geom
  3.           FROM (SELECT ST_PolyFromText('POLYGON((0 0,20 0,20 20,0 20,0 0),(10 10,10 11,11 11,11 10,10 10),(5 5,5 7,7 7,7 5,5 5))') AS geom
  4.                 ) a
  5.         ) b;

Results:

geometry area
POLYGON ((0 0,20 0,20 20,0 20,0 0))” 400
POLYGON ((10 10,10 11,11 11,11 10,10 10))” 1
POLYGON ((5 5,5 7,7 7,7 5,5 5))” 4

My first attempt to use ST_DumpRings to achieve the required outcome resulted in the following SQL (note how we can apply our area filter to the extracted rings):

  1. SELECT ST_AsText(c.the_geom) AS final_geom
  2.   FROM (SELECT ST_Collect(b.the_geom) AS the_geom
  3.           FROM (SELECT (ST_DumpRings(a.geom)).geom AS the_geom
  4.                   FROM (SELECT ST_PolyFromText('POLYGON((0 0,20 0,20 20,0 20,0 0),(10 10,10 11,11 11,11 10,10 10),(5 5,5 7,7 7,7 5,5 5))') AS geom
  5.                        ) a
  6.                 ) b
  7.           WHERE ST_Area(b.the_geom) > 1
  8.         ) c;
  9. -- Result
  10. "MULTIPOLYGON(((0 0,20 0,20 20,0 20,0 0)),((5 5,5 7,7 7,7 5,5 5)))"

Note that the output is a MULTIPOLYGON with each ring being a separate polygon in the collection and not a single POLYGON (as was the source data) with inner rings which is what we do want as the result.

Searching the PostGIS documentation indicated that I should be able to reconstitute the rings into a single polygon via use of the ST_Difference aggregate. Here we want to take the outer ring as the first argument to ST_Difference and subtract the set of all inner rings from it:

  1. SELECT ST_AsEWKT(d.final_geom)
  2.   FROM (SELECT ST_Difference(                   ST_GeometryN(c.the_geom,1),
  3.                              (SELECT ST_Collect(ST_GeometryN(c.the_geom,
  4.                                                              generate_series(2,numgeometries(c.the_geom)))))::geometry
  5.                             ) AS final_geom
  6.           FROM (SELECT ST_Collect(b.the_geom) AS the_geom
  7.                   FROM (SELECT (ST_DumpRings(a.geom)).geom AS the_geom
  8.                           FROM (SELECT ST_PolyFromText('POLYGON((0 0,20 0,20 20,0 20,0 0),(10 10,10 11,11 11,11 10,10 10),(5 5,5 7,7 7,7 5,5 5))') AS geom ) a
  9.                         ) b
  10.                  WHERE ST_Area(b.the_geom) > 1
  11.                ) c
  12.          GROUP BY c.the_geom
  13.         ) d;
  14. -- Result
  15. ERROR:  set-valued FUNCTION called IN context that cannot accept a SET

This is because the ST_Collect returns geometry[] (ie a multi-geometry) and not a single geometry. (See documentation note: “Do not call with a GeometryCollection as an argument”)

Are there other solutions?

The ST_MakePolygon function looks useful:

ST_MakePolygon(linestring, [linestring[]])

Creates a Polygon formed by the given shell and array of holes. You can construct a geometry array using ST_Accum.

Though it requires linestring and not polygon arguments which will result in more complex SQL (as is shown below):

  1. SELECT ST_AsEWKT(ST_MakePolygon(c.outer_ring, d.inner_rings )) AS final_geom
  2.   FROM (/* Get outer ring of polygon */
  3.         SELECT ST_ExteriorRing(b.the_geom) AS outer_ring
  4.           FROM (SELECT (ST_DumpRings(a.geom)).geom AS the_geom, path(ST_DumpRings(a.geom)) AS path
  5.                   FROM (SELECT ST_PolyFromText('POLYGON((0 0 1,20 0 1,20 20 1,0 20 1,0 0 1),(10 10 2,10 11 2,11 11 2,11 10 2,10 10 2),(5 5 3,5 7 3,7 7 3,7 5 3,5 5 3))') AS geom
  6.                        ) a
  7.                 ) b
  8.           WHERE path[1] = 0 /* ie the outer ring */
  9.         ) c,
  10.        (/* Get all inner rings > a particular area */
  11.         SELECT ST_Accum(ST_ExteriorRing(b.the_geom)) AS inner_rings
  12.           FROM (SELECT (ST_DumpRings(a.geom)).geom AS the_geom, path(ST_DumpRings(a.geom)) AS path
  13.                   FROM (SELECT ST_PolyFromText('POLYGON((0 0 1,20 0 1,20 20 1,0 20 1,0 0 1),(10 10 2,10 11 2,11 11 2,11 10 2,10 10 2),(5 5 3,5 7 3,7 7 3,7 5 3,5 5 3))') AS geom
  14.                        ) a
  15.                 ) b
  16.           WHERE path[1] > 0 /* ie not the outer ring */
  17.             AND ST_Area(b.the_geom) > 1
  18.         ) d;
  19. -- Result
  20. "POLYGON((0 0 1,20 0 1,20 20 1,0 20 1,0 0 1),(5 5 3,5 7 3,7 7 3,7 5 3,5 5 3))"

Note, that we get our desired result.

The splitting of the SQL into the two halves to extract the outer ring and the inner rings (separately) had to be done because the only method of reconstructing the polygon was via the ST_MakePolygon function and it needs two inputs: a linestring for the outer ring and an array of linestrings for the inner rings left after being filtered by area. Sadly, ST_Collect() on the straight output of ST_DumpRings (with no filtering by path only area) just generates a multipolygon. I tried playing around with ST_Intersection etc but these still return a multipolygon.

Now, this SQL is fine for a single, hardcoded test geometry, but I think it would be messy if we tried to use it to process a lot of polygons in a table. The best way to approach this is to encapsulate the above SQL (a complete algorithm) in a function and then use the function when processing our tabular data.

  1. CREATE OR REPLACE FUNCTION Filter_Rings(geometry,FLOAT)
  2. RETURNS geometry AS
  3. $$
  4. SELECT ST_MakePolygon(c.outer_ring, d.inner_rings) AS final_geom
  5.   FROM (/* Get outer ring of polygon */
  6.         SELECT ST_ExteriorRing(b.the_geom) AS outer_ring
  7.           FROM (SELECT (ST_DumpRings($1)).geom AS the_geom, path(ST_DumpRings($1)) AS path) b
  8.           WHERE b.path[1] = 0 /* ie the outer ring */
  9.         ) c,
  10.        (/* Get all inner rings > a particular area */
  11.         SELECT ST_Accum(ST_ExteriorRing(b.the_geom)) AS inner_rings
  12.           FROM (SELECT (ST_DumpRings($1)).geom AS the_geom, path(ST_DumpRings($1)) AS path) b
  13.           WHERE b.path[1] > 0 /* ie not the outer ring */
  14.             AND ST_Area(b.the_geom) > $2
  15.         ) d
  16. $$
  17.   LANGUAGE 'sql' IMMUTABLE;

And example of the use of Filter_Rings is:

  1. SELECT ST_AsEWKT(Filter_Rings(ST_PolyFromText('POLYGON((0 0 1,20 0 1,20 20 1,0 20 1,0 0 1),(10 10 2,10 11 2,11 11 2,11 10 2,10 10 2),(5 5 3,5 7 3,7 7 3,7 5 3,5 5 3))') ,1::FLOAT));
  2. -- Result
  3. "POLYGON((0 0 1,20 0 1,20 20 1,0 20 1,0 0 1),(5 5 3,5 7 3,7 7 3,7 5 3,5 5 3))"

After submission to the PostGIS discussion list, the greatly skilled and experienced Regina Obe suggested a few small amendments to my SQL and function.

“You should do this:

SELECT (ST_DumpRings(a.geom)).*

Instead of this:

SELECT (ST_DumpRings(a.geom)).geom As the_geom, path(ST_DumpRings(a.geom)) as path

(Which would mean in the upper part you would need to reference by .geom instead of the_geom.)

The reason for that is internally PostgreSQL will call ST_DumpRings twice [cf Deterministic functions in Oracle). This was pointed out to me by a very experienced PostgreSQL developer: His blog entry about it is here

Implementing this suggestion leads to this change in the filter_rings function.

  1. CREATE OR REPLACE FUNCTION filter_rings(geometry, FLOAT)
  2.   RETURNS geometry AS
  3. $BODY$
  4. SELECT ST_MakePolygon(c.outer_ring, d.inner_rings) AS final_geom
  5.   FROM (/* Get outer ring of polygon */
  6.         SELECT ST_ExteriorRing(b.geom) AS outer_ring
  7.           FROM (SELECT (ST_DumpRings($1)).*) b
  8.           WHERE b.path[1] = 0 /* ie the outer ring */
  9.         ) c,
  10.        (/* Get all inner rings > a particular area */
  11.         SELECT ST_Accum(ST_ExteriorRing(b.geom)) AS inner_rings
  12.           FROM (SELECT (ST_DumpRings($1)).*) b
  13.           WHERE b.path[1] > 0 /* ie not the outer ring */
  14.             AND ST_Area(b.geom) > $2
  15.         ) d
  16. $BODY$
  17.   LANGUAGE 'sql' IMMUTABLE;

However, Regina also pointed out a way to shorten the SQL to something like that which I wanted at the start of this article by suggesting:

“I think ST_BuildArea might be better than ST_MakePolygon in this regard. It will work fine with a single closed ring and if multiple, it turns the inners to holes.

What I was thinking of was this.

  1. CREATE OR REPLACE FUNCTION upgis_filter_rings(geometry,FLOAT) RETURNS geometry AS
  2. $$ SELECT ST_BuildArea(ST_Collect(a.geom)) AS final_geom
  3.         FROM ST_DumpRings($1) AS a
  4.           WHERE a.path[1] = 0 OR
  5.                 (a.path[1] > 0 AND ST_Area(a.geom) > $2)
  6. $$
  7.   LANGUAGE 'sql' IMMUTABLE;

“BuildArea just assumes everything outside is a polygon and everything inside is hole where as ST_MakePolygon takes your categorization of hole/vs shell as gospel.

So […] compar[ing]:

  1. SELECT ST_AsText(upgis_filter_rings(ST_GeomFromText('POLYGON((0 0,20 0,20 20,0 20,0 0),(10 10,10 11,11 11,11 10,10 10),(5 5,5 7,7 7,7 5,5 5))'),2));
  2. -- Result
  3. "POLYGON((0 0,0 20,20 20,20 0,0 0),(5 5,7 5,7 7,5 7,5 5))"
  4. -- Another ....
  5. SELECT ST_AsText(filter_rings(ST_GeomFromText('POLYGON((0 0,20 0,20 20,0 20,0 0),(10 10,10 11,11 11,11 10,10 10),(5 5,5 7,7 7,7 5,5 5))'),2));
  6. -- Result
  7. "POLYGON((0 0,0 20,20 20,20 0,0 0),(5 5,7 5,7 7,5 7,5 5))"
  8. .

We note that they yield the same answer.”

Regina goes on to say:

“The main disadvantage aside from possibly speed over Simon’s is that if you have 3D polygons, I think the above will squash them to 2D where as his approach will support 3D.”

Testing the 3D/2D issue leads to:

  1. SELECT ST_AsEWKT(upgis_filter_rings(ST_GeomFromText('POLYGON((0 0 1,20 0 1,20 20 1,0 20 1,0 0 1),(10 10 2,10 11 2,11 11 2,11 10 2,10 10 2),(5 5 3,5 7 3,7 7 3,7 5 3,5 5 3))'),2));
  2. -- Result
  3. "POLYGON((0 0 1,0 20 1,20 20 1,20 0 1,0 0 1),(5 5 3,7 5 3,7 7 3,5 7 3,5 5 3))"
  4. .
  5. SELECT ST_AsEWKT(      filter_rings(ST_GeomFromText('POLYGON((0 0 1,20 0 1,20 20 1,0 20 1,0 0 1),(10 10 2,10 11 2,11 11 2,11 10 2,10 10 2),(5 5 3,5 7 3,7 7 3,7 5 3,5 5 3))'),2));
  6. -- Result
  7. "POLYGON((0 0 1,20 0 1,20 20 1,0 20 1,0 0 1),(5 5 3,5 7 3,7 7 3,7 5 3,5 5 3))"
  8. .

Which shows that both approaches honour the third dimension.

2. Handling of Polygons with no Inner Rings

It is important that any function we deploy handles polygons with an outer ring but no inner rings. Testing Regina’s function and mine leads to this:

  1. SELECT ST_AsText(upgis_filter_rings(ST_GeomFromText('POLYGON((0 0,20 0,20 20,0 20,0 0))'),2));
  2. -- Result
  3. "POLYGON((0 0,0 20,20 20,20 0,0 0))"
  4. .
  5. SELECT ST_AsText(filter_rings(ST_GeomFromText('POLYGON((0 0,20 0,20 20,0 20,0 0))'),2));
  6. -- Result
  7. ""

So, the implementation based on ST_MakePolygon fails while Regina’s, based on ST_BuildArea, passes.

The problem with my approach appears to be in the handling of the “ST_Collect“ing (or “ST_Accum“ulating) of the inner rings. Regina again:

“It has to do with this annoying behavior of ST_MakePolygon and hmm ST_Accum that when the second argument is null it nullifies everything and I think ST_Accum is returning null when no result.

Now if you changed your function to this — it would work fine and ARRAY is faster than ST_Accum anyway.

Also you really don’t need the subselects so I took that out as well.”

So, the new filter_rings based on ST_MakePolygon becomes:

  1. CREATE OR REPLACE FUNCTION filter_rings(geometry, DOUBLE PRECISION)
  2.   RETURNS geometry AS
  3. $BODY$
  4. SELECT ST_MakePolygon((/* Get outer ring of polygon */
  5.         SELECT ST_ExteriorRing(geom) AS outer_ring
  6.           FROM ST_DumpRings($1)
  7.           WHERE path[1] = 0 /* ie the outer ring */
  8.         ),  ARRAY(/* Get all inner rings > a particular area */
  9.         SELECT ST_ExteriorRing(geom) AS inner_rings
  10.           FROM ST_DumpRings($1)
  11.           WHERE path[1] > 0 /* ie not the outer ring */
  12.             AND ST_Area(geom) > $2
  13.         ) ) AS final_geom
  14. $BODY$
  15.   LANGUAGE 'sql' IMMUTABLE;

And, testing:

  1. SELECT ST_AsEWKT(filter_rings(ST_GeomFromText('POLYGON((0 0 1,20 0 1,20 20 1,0 20 1,0 0 1),(10 10 2,10 11 2,11 11 2,11 10 2,10 10 2),(5 5 3,5 7 3,7 7 3,7 5 3,5 5 3))'),2));
  2. -- Result
  3. "POLYGON((0 0 1,20 0 1,20 20 1,0 20 1,0 0 1),(5 5 3,5 7 3,7 7 3,7 5 3,5 5 3))"
  4. .
  5. SELECT ST_AsText(filter_rings(ST_GeomFromText('POLYGON((0 0,20 0,20 20,0 20,0 0))'),2));
  6. -- Result
  7. "POLYGON((0 0,20 0,20 20,0 20,0 0))"

Success: thanks Regina!

3. Multi-Polygons

Of course, having a filter function for polygons is great, but many real-world definitions of area features involves multiple outer shells. So, we really need an implementation that handles mutli-polygons. Let’s try the following.

  1. SELECT ST_AsEWKT(ST_Collect(c.final_geom)) AS filtered_geom
  2.   FROM (SELECT ST_MakePolygon((/* Get outer ring of polygon */
  3.     SELECT ST_ExteriorRing(b.the_geom) AS outer_ring /* ie the outer ring */
  4.     ),  ARRAY(/* Get all inner rings > a particular area */
  5.      SELECT ST_ExteriorRing(e.geom) AS inner_ring
  6.        FROM (SELECT (ST_DumpRings(b.the_geom)).*) e
  7.       WHERE e.path[1] > 0 /* ie not the outer ring */
  8.         AND ST_Area(e.geom) > 2
  9.     ) ) AS final_geom
  10.          FROM (SELECT ST_GeometryN(a.geom,generate_series(1,numgeometries(a.geom))) AS the_geom
  11.                  FROM (SELECT ST_Multi(ST_GeomFromEWKT('MULTIPOLYGON(((0 0 1,20 0 1,20 20 1,0 20 1,0 0 1),(10 10 2,10 11 2,11 11 2,11 10 2,10 10 2),(5 5 3,5 7 3,7 7 3,7 5 3,5 5 3)),((50 5 3,50 7 3,70 7 3,70 5 3,50 5 3)))')) AS geom
  12.                        ) a
  13.               ) b
  14.        ) c;
  15. -- Result
  16. "MULTIPOLYGON(((0 0 1,20 0 1,20 20 1,0 20 1,0 0 1),(5 5 3,5 7 3,7 7 3,7 5 3,5 5 3)),((50 5 3,50 7 3,70 7 3,70 5 3,50 5 3)))"

Encapsulating this in our filter_rings function:

  1. CREATE OR REPLACE FUNCTION filter_rings(geometry, DOUBLE PRECISION)
  2.   RETURNS geometry AS
  3. $BODY$
  4. SELECT ST_BuildArea(ST_Collect(b.final_geom)) AS filtered_geom
  5.   FROM (SELECT ST_MakePolygon((/* Get outer ring of polygon */
  6.     SELECT ST_ExteriorRing(a.the_geom) AS outer_ring /* ie the outer ring */
  7.     ),  ARRAY(/* Get all inner rings > a particular area */
  8.      SELECT ST_ExteriorRing(b.geom) AS inner_ring
  9.        FROM (SELECT (ST_DumpRings(a.the_geom)).*) b
  10.       WHERE b.path[1] > 0 /* ie not the outer ring */
  11.         AND ST_Area(b.geom) > $2
  12.     ) ) AS final_geom
  13.          FROM (SELECT ST_GeometryN(ST_Multi($1),/*ST_Multi converts any Single Polygons to MultiPolygons */
  14.                                    generate_series(1,ST_NumGeometries(ST_Multi($1)))
  15.                                    ) AS the_geom
  16.                ) a
  17.        ) b
  18. $BODY$
  19.   LANGUAGE 'sql' IMMUTABLE;

Which, testing, gives us:

  1. SELECT ST_AsText(filter_rings(ST_GeomFromText('POLYGON((0 0,20 0,20 20,0 20,0 0))'),2));
  2. -- Result
  3. "POLYGON((0 10,0 20,20 20,20 0,0 0))"
  4. .
  5. SELECT ST_AsEWKT(filter_rings(ST_GeomFromEWKT('POLYGON((0 0 1,20 0 1,20 20 1,0 20 1,0 0 1),(10 10 2,10 11 2,11 11 2,11 10 2,10 10 2),(5 5 3,5 7 3,7 7 3,7 5 3,5 5 3))'),2));
  6. -- Result
  7. "POLYGON((0 0 1,0 20 1,20 20 1,20 0 1,0 0 1),(5 5 3,7 5 3,7 7 3,5 7 3,5 5 3))"
  8. .
  9. SELECT ST_AsEWKT(filter_rings(ST_GeomFromEWKT('MULTIPOLYGON(((0 0 1,20 0 1,20 20 1,0 20 1,0 0 1),(10 10 2,10 11 2,11 11 2,11 10 2,10 10 2),(5 5 3,5 7 3,7 7 3,7 5 3,5 5 3)),((50 5 3,50 7 3,70 7 3,70 5 3,50 5 3)))'),2));
  10. -- Result
  11. "MULTIPOLYGON(((0 0 1,0 20 1,20 20 1,20 0 1,0 0 1),(5 5 3,7 5 3,7 7 3,5 7 3,5 5 3)),((50 5 3,50 7 3,70 7 3,70 5 3,50 5 3)))"

Which, looks pretty good to me.

In summary, I think Regina’s suggested change to use ST_BuildArea is neater and cleaner than my use of ST_MakePolygon. But my final solution is a combination of both.

4. Performance

What is now needed is some performance analysis of Regina and my approaches especially given Regina’s comment alluding to performance.

I have a dataset which contains some very complicated rural polygons with multiple inner rings so I thought I would test using that. Here are the two test SQL statements:

  1. -- Test SQL 1
  2. SELECT upgis_filter_rings(a.geom,10000)
  3.   FROM rural_areas a
  4.  WHERE GeometryType(geom) = 'POLYGON'
  5.  ORDER BY random()
  6.  LIMIT 1000;
  7. -- Test SQL 2
  8. SELECT filter_rings(a.geom,10000) /* ,  and  ms */
  9.   FROM rural_areas a
  10.  WHERE GeometryType(geom) = 'POLYGON'
  11.  ORDER BY random()
  12.  LIMIT 1000;

I also recompiled filter_rings to include the fuller multi-geometry handling and ran the last SQL statement again. The relative performance can be seen in the table below.

Function Run 1 ms Run 2 ms
upgis_filter_rings 77453 76500
filter_rings 23781 23890
(multi) filter_rings 86563 87625

Clearly the single polygon-handling ST_MakePolygon-based filter_rings performs faster than the ST_BuildArea-based upgis_filter_rings function. Converting filter_rings so that it handles multi-polygons pushes the performance out: this is probably due to the use of ST_BuildArea and the multi-polygon disaggregation/aggregation. I suspect changing Regina’s upgis_filter_rings to be multi-polygon aware would push the performance of that function out even further.

My attempt at converting Regina’s function is:

  1. CREATE OR REPLACE FUNCTION upgis_filter_rings(geometry,FLOAT) RETURNS geometry AS
  2. $$ SELECT ST_BuildArea(ST_Collect(d.built_geom)) AS filtered_geom
  3.      FROM (SELECT ST_BuildArea(ST_Collect(c.geom)) AS built_geom
  4.              FROM (SELECT b.geom
  5.                      FROM (SELECT (ST_DumpRings(ST_GeometryN(ST_Multi($1),/*ST_Multi converts any Single Polygons to MultiPolygons */
  6.                                                             generate_series(1,ST_NumGeometries(ST_Multi($1)) )
  7.                                                             ))).*
  8.                            ) b
  9.                     WHERE b.path[1] = 0 OR
  10.                          (b.path[1] > 0 AND ST_Area(b.geom) > $2)
  11.                    ) c
  12.            ) d
  13. $$
  14.   LANGUAGE 'sql' IMMUTABLE;

Testing against the rural_areas table showed the function ran at 147281 (Run 1) and 144813 (Run 2) ms. Regina was right: upgis_filter_rings is slower because of its use of ST_BuildArea.

Summary

In summary, this has demonstrated to me that PostGIS is very functionally rich in the handling of basic simple features geometries. It has more construction and editing tools than any of the current commercial products though many of those have functionality that PostGIS still does not have.

I hope this article gives, the reader, some idea as to the direction you could take…

Creative Commons License

post this at del.icio.uspost this at Diggpost this at Technoratipost this at Redditpost this at Farkpost this at Yahoo! my webpost this at Windows Livepost this at Google Bookmarkspost this to Twitter

Comment [1]

I think you have problem with multi-polygons.
In 3. Multi-Polygons
“ST_BuildArea” is extra in
SELECT ST_BuildArea(ST_Collect(d.built_geom)) as filtered_geom” row.

— Jevgeni · 9 March 2010, 08:44 · #