Go to content Go to navigation and search

Home

Straws in the Wind Blog Articles


Search

Browse

RSS / Atom

Email me

textpattern

Creative Commons License
All Blog Articles, Data Models and Free Source Code by Simon Greener, The SpatialDB Advisor is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.

Spatial Database Independence

Monday September 26 2011 at 06:53

I often hear asserted in discussions about accessing data in databases that the approach taken should be “database independent”. The assertion is always made as if it were some sort of Scientific Truth (Dogma even). And it is always assumed that this is easy to achieve. Finally, it always seems to be an accepted reality as if it was backed by some sort of consensus in the Computer Science Literature.

(I have almost always heard this when discussing non-Microsoft databases and technologies but rarely when the Microsoft technologies are the principle components of a solution. But, perhaps that is just a selective reading based on my own career!)

This article will provide a simple comparison based on generating “vectors” from the rings of a multi-polygon using Oracle’s ST_Geometry, SQL Server 2008’s STGeometry and PostgreSQL/PostGIS’s ST_Geometry.

Initial Comments on Hierarchy

In my comparison work I use the OGC SFS 1.1 standard with preference for the SQL/MM standard. Since all three databases are “Object” relational databases (ORDBMS) we need to start with a comment about this aspect of the implementation.

Oracle’s ST_Geometry is a true type hierarchy with only those functions defined on specific elements available for geometries of that type. So, for example, you cannot call ST_Area on an ST_CURVE (ST_LINESTRING, ST_MULTILINESTRING etc), whereas in SQL Server 2008 and PostGIS you can.

Oracle

  1. SELECT a.geom.ST_Area() AS area
  2.   FROM (SELECT TREAT(ST_Geometry.FROM_WKT('LINESTRING(10.0 10.0, 15.0 10.0, 15.0 15.0, 10.0 15.0, 10.0 10.0)',NULL) AS st_linestring) AS geom
  3.           FROM DUAL ) a;
  4. -- Result
  5. ORA-00904: "A"."GEOM"."ST_AREA": invalid identifier
  6. 00904. 00000 -  "%s: invalid identifier"

The availability of a true object hierarchy was not always the case. Up to 10g Oracle only had a singly-inherited type library. The original SDO_GEOMETRY approach is still singly inherited: only the SQLMM/OGC implementation from 10g onwards is not.

In the above SQL you will notice a lack of an ST_LineFromText() constructor that returns an ST_LineString object: this is not available in the ST_CURVE or ST_LINESTRING sub-class objects. Given the use of ST_Geometry one would have thought an ST_GeomFromText() method would be available rather than the non-standard FROM_WKT: nope, sorry, Oracle has not implemented one. The only ST*FromText() method available are:

  1. SELECT object_name || '.' || procedure_name AS conversion_method
  2.   FROM all_procedures
  3.  WHERE owner = 'MDSYS'
  4.    AND object_name LIKE 'ST%'
  5.    AND procedure_name LIKE 'ST%TEXT';
  6. -- Result
  7. CONVERSION_METHOD
  8. -------------------------------------
  9. ST_ANNOTATION_TEXT.ST_ANNOTATION_TEXT
  10. ST_MULTIPOLYGON.ST_BDMPOLYFROMTEXT
  11. ST_MULTIPOLYGON.ST_BDMPOLYFROMTEXT
  12. ST_POLYGON.ST_BDPOLYFROMTEXT
  13. ST_POLYGON.ST_BDPOLYFROMTEXT

Finally, with Oracle, some constructors or accessors return ST_Geometry objects even if they are ST_LineStrings etc. To access the methods of the underlying sub-class object, we need to “cast” them to the required type via the Oracle TREAT function. eg SELECT TREAT (geom as ST_LineString) as line FROM …..

PostGIS

Note that in PostGIS you cannot use the . notation to access the methods of an object.

  1. SELECT ST_Area(a.geom) AS area
  2.   FROM (SELECT ST_LineFromText('LINESTRING(10.0 10.0, 15.0 10.0, 15.0 15.0, 10.0 15.0, 10.0 10.0)',0) AS geom ) a;
  3. -- Result
  4. area
  5. ----
  6. 0
p. Because PostGIS is "singly inherited", it does not use a "dot" object.method invocation from a base class. Thus ST_LineFromText() function is an independent function that returns a geometry (ST_Geometry): No casting is required. *SQL Server 2008*
  1. SELECT a.geom.STArea() AS area
  2.   FROM (SELECT geometry::STLineFromText('LINESTRING(10.0 10.0, 15.0 10.0, 15.0 15.0, 10.0 15.0, 10.0 10.0)',0) AS geom) a;
  3. -- Result
  4. area
  5. ----
  6. 0

SQL Server 2008 is a “singly inherited” type system that uses a “dot” object.method invocation from a base class in object creation or method execution. Thus ST_STLineFromText() function requires the use of “geometry::” that acts as a surrogate for a missing ST_Geometry abstract class.

Comparisons

The following three SQL statements are the result of trying to minimise the differences in SQL between the three databases. Where possible, I avoided use of anything that is specific to a database (eg rownum in Oracle). In addition, I use my SQL Server and Oracle versions of PostgreSQL’s generate_series() set returning function.

Oracle

  1. -- MultiPoly with two elements (polygons) one with an Inner Ring
  2. WITH poly AS (
  3. SELECT ROW_NUMBER() OVER (PARTITION BY b.n ORDER BY b.n) AS polyNo,
  4.        a.geom
  5.   FROM (SELECT TREAT(VALUE(t) AS st_polygon) AS geom
  6.           FROM (SELECT ST_MultiPolygon.ST_BdMPolyFromText(
  7.                        'MULTIPOLYGON(((10.0 10.0, 15.0 10.0, 15.0 15.0, 10.0 15.0, 10.0 10.0),
  8.                                      (12.0 12.0, 12.0 14.0, 14.0 14.0, 14.0 12.0, 12.0 12.0)),
  9.                                     ((100.0 100.0, 150.0 100.0, 150.0 150.0, 100.0 150.0, 100.0 100.0)))',0)
  10.                        AS geom
  11.                   FROM dual) o,
  12.                 TABLE(o.geom.ST_GEOMETRIES()) t
  13.        ) a,
  14.        (SELECT 1 AS n FROM dual) b
  15.  WHERE a.geom.ST_GeometryType() = 'ST_POLYGON'
  16. )
  17. SELECT a.polyNo,
  18.        a.ringNo,
  19.        ROW_NUMBER() OVER (PARTITION BY a.PolyNo, a.ringNo, b.n ORDER BY b.n) AS vectorNo,
  20.        a.sp.ST_X() AS sx, a.sp.ST_Y() AS sy,
  21.        a.ep.ST_X() AS ex, a.ep.ST_Y() AS ey
  22.   FROM ( SELECT g.polyNo,
  23.                 g.ringNo,
  24.                 g.geom.ST_PointN(p.column_value  ) AS sp,
  25.                 g.geom.ST_PointN(p.column_value+1) AS ep
  26.            FROM (SELECT p.polyNo,
  27.                         1 AS ringNo,
  28.                         p.geom.ST_ExteriorRing() AS geom
  29.                    FROM poly p
  30.                   UNION ALL
  31.                  SELECT p.polyNo,
  32.                         p.column_value+1 AS ringNo,
  33.                         p.geom.ST_InteriorRingN(p.column_value) AS geom
  34.                    FROM poly p,
  35.                         TABLE(codesys.geom.generate_series(1,p.geom.ST_NumInteriorRing(),1)) p
  36.                   WHERE p.column_value < p.geom.ST_NumInteriorRing()  /* Reports 2 but actually only 1: the first */
  37.                 ) g,
  38.                 TABLE(codesys.geom.generate_series(1,g.geom.ST_NumPoints(),1)) p
  39.           WHERE p.column_value < g.geom.ST_NumPoints()
  40.        ) a,
  41.        (SELECT 1 AS n FROM dual) b
  42.  ORDER BY a.polyNo, a.ringNo, 3;
  43. -- Result
  44. POLYNO RINGNO VECTORNO SX   SY  EX  EY
  45. ------ ------ -------- ---- --- --- ---
  46. 1      1      1        10   10  15  10
  47. 1      1      2        15   10  15  15
  48. 1      1      3        15   15  10  15
  49. 1      1      4        10   15  10  10
  50. 1      2      1        12   12  14  12
  51. 1      2      2        12   14  12  12
  52. 1      2      3        14   14  12  14
  53. 1      2      4        14   12  14  14
  54. 2      1      1        100  100 150 100
  55. 2      1      2        150  100 150 150
  56. 2      1      3        150  150 100 150
  57. 2      1      4        100  150 100 100
  58. -- Result
  59.  12 ROWS selected

Notes:

  1. Need for “(select 1 as n from dual) b” in order to generate a unique row number as Row_Number() cannot be used outside of a partition or order by (this is the same problem across all three databases).
  2. generate_series return referenced by database specific “column_value” column name;
  3. The use of TREAT to ensure the polygons extracted from the MultiPolygon are correctly cast to ST_Polygons so its methods eg ST_ExteriorRing() can be used.
  4. ST_NumInteriorRing() seems to also report the exterior ring in its count yet it cannot be used (this is not the same as PostGIS and SQL Server 2008).
  5. FROM dual” cannot be avoided;
  6. ST_GeometryType returns “ST_POLYGON” (uppercase);
  7. TABLE functions cannot be avoided but they can be used with values from other tables in a query;
  8. TABLE function must be used to disaggregate the individual polygons returned by the non-standard ST_GEOMETRIES() method – Oracle does not implement ST_GeometryN();
  9. Constructor function name ST_BdMPolyFromText different from other databases.

However, an additional comment is warranted with regards to ROW_NUMBER. In Oracle – but not the other 2 databases – it is possible to generate a row number as follows:

  1. SELECT ROW_NUMBER() OVER (partition BY 1 ORDER BY 1) AS rowno, b.column_value
  2.   FROM TABLE(codesys.geom.generate_series(1,5,1)) b;
  3. -- Result
  4. ROWNO COLUMN_VALUE
  5. ----- ------------
  6. 1     1
  7. 2     2
  8. 3     3
  9. 4     4
  10. 5     5

To ensure commonality across databases in generating row numbers, I had to use the constructs as shown in the different SQL.

SQL Server 2008

  1. -- MultiPoly with two elements (polygons) one with an Inner Ring
  2. WITH poly AS (
  3.   SELECT ROW_NUMBER() OVER (PARTITION BY B.N ORDER BY B.N) AS polyNo,
  4.          a.geom
  5.     FROM (SELECT o.geom.STGeometryN(t.IntValue) AS geom
  6.       FROM (SELECT geometry::STMPolyFromText(
  7.                      'MULTIPOLYGON(((10.0 10.0, 15.0 10.0, 15.0 15.0, 10.0 15.0, 10.0 10.0),
  8.                                     (12.0 12.0, 12.0 14.0, 14.0 14.0, 14.0 12.0, 12.0 12.0)),
  9.                                    ((100.0 100.0, 150.0 100.0, 150.0 150.0, 100.0 150.0, 100.0 100.0)))',0)
  10.                    AS geom
  11.          ) o
  12.          CROSS APPLY
  13.          generate_series(1,o.geom.STNumGeometries(),1) t
  14.          ) a,
  15.          (SELECT 1 AS n) b
  16.    WHERE UPPER(a.geom.STGeometryType()) = 'POLYGON'
  17. )
  18. SELECT a.polyNo,
  19.        a.ringNo,
  20.        ROW_NUMBER() OVER (PARTITION BY a.PolyNo, a.ringNo, b.n ORDER BY b.n) AS vectorNo,
  21.        a.sp.STX AS sx, a.sp.STY AS sy,
  22.        a.ep.STX AS ex, a.ep.STY AS ey
  23.   FROM ( SELECT g.polyNo,
  24.                 g.ringNo,
  25.                 g.geom.STPointN(p.IntValue  ) AS sp,
  26.                 g.geom.STPointN(p.IntValue+1) AS ep
  27.            FROM (SELECT polyNo,
  28.                         1 AS ringNo,
  29.                         p.geom.STExteriorRing() AS geom
  30.                    FROM poly p
  31.                   UNION ALL
  32.                  SELECT polyNo,
  33.                         t.IntValue+1 AS ringNo,
  34.                         p.geom.STInteriorRingN(t.IntValue) AS geom
  35.                    FROM poly p
  36.                         CROSS APPLY
  37.                         generate_series(1,p.geom.STNumInteriorRing(),1) t
  38.                 ) g
  39.                 CROSS APPLY
  40.                 generate_series(1,g.geom.STNumPoints(),1) p
  41.           WHERE p.IntValue < g.geom.STNumPoints()
  42.        ) a,
  43.        (SELECT 1 AS n) b
  44.  ORDER BY a.polyNo, a.ringNo, 3;
  45. -- Results
  46. polyNo ringNo vectorNo sx  sy  ex  ey
  47. ------ ------ -------- --- --- --- ---
  48. 1      1      1        10  10  15  10
  49. 1      1      2        15  10  15  15
  50. 1      1      3        15  15  10  15
  51. 1      1      4        10  15  10  10
  52. 1      2      1        12  12  12  14
  53. 1      2      2        12  14  14  14
  54. 1      2      3        14  14  14  12
  55. 1      2      4        14  12  12  12
  56. 2      1      1        100 100 150 100
  57. 2      1      2        150 100 150 150
  58. 2      1      3        150 150 100 150
  59. 2      1      4        100 150 100 100

Notes:

  1. Use of non-standard “CROSS APPLY” to allow for table values to be used by the generate_series function;
  2. generate_series return referenced by database specific “IntValue” column name;
  3. Methods have ST prefix with no underscore;
  4. Use of “geometry::” cast.
  5. STGeometryType returns “Polygon” (no ST_ prefix as in other two databases);
  6. Constructor function name STMPolyFromText different from other databases;
  7. STNumInteriorRing reports only interior rings;
  8. STX/STY are not methods.

PostgreSQL/PostGIS

  1. WITH poly AS (
  2. SELECT ROW_NUMBER() OVER (PARTITION BY b.n ORDER BY b.n) AS polyNo,
  3.        a.geom
  4.   FROM (SELECT ST_GeometryN(o.geom,generate_series(1,ST_NumGeometries(o.geom),1)) AS geom
  5.           FROM (SELECT ST_MultiPolygonFromText(
  6.                       'MULTIPOLYGON(((10.0 10.0, 15.0 10.0, 15.0 15.0, 10.0 15.0, 10.0 10.0),
  7.                                     (12.0 12.0, 12.0 14.0, 14.0 14.0, 14.0 12.0, 12.0 12.0)),
  8.                                    ((100.0 100.0, 150.0 100.0, 150.0 150.0, 100.0 150.0, 100.0 100.0)))',0)
  9.                        AS geom
  10.                 ) o
  11.        ) a,
  12.        (SELECT 1 AS n ) b
  13.  WHERE UPPER(ST_GeometryType(a.geom)) = 'ST_POLYGON'
  14. )
  15. SELECT a.polyNo,
  16.        a.ringNo,
  17.        ROW_NUMBER() OVER (PARTITION BY a.PolyNo, a.ringNo, b.n ORDER BY b.n) AS vectorNo,
  18.        ST_X(a.sp) AS sx,ST_Y(a.sp) AS sy,
  19.        ST_X(a.ep) AS ex,ST_Y(a.ep) AS ey
  20.   FROM ( SELECT g.polyNo,
  21.                 g.ringNo,
  22.                 ST_PointN(g.geom,generate_series(1,ST_NumPoints(g.geom)-1,1)) AS sp,
  23.                 ST_PointN(g.geom,generate_series(2,ST_NumPoints(g.geom)  ,1)) AS ep
  24.            FROM (SELECT p.polyNo,
  25.                         1 AS ringNo,
  26.                         ST_ExteriorRing(p.geom) AS geom
  27.                    FROM poly p
  28.                   UNION ALL
  29.                  SELECT p.polyNo,
  30.                         r.ringNo + 1,
  31.                         ST_InteriorRingN(p.geom,r.ringNo) AS geom
  32.                    FROM poly p,
  33.                        (SELECT generate_series(1,ST_NumInteriorRing(p.geom),1) AS ringNo FROM poly p) AS r
  34.                 ) g
  35.        ) a,
  36.        (SELECT 1 AS n) b
  37.  ORDER BY a.polyNo, a.ringNo, 3;
  38. -- Results
  39. polyNo ringNo vectorNo sx  sy  ex  ey
  40. ------ ------ -------- --- --- --- ---
  41. 1      1      1        10  10  15  10
  42. 1      1      2        15  10  15  15
  43. 1      1      3        15  15  10  15
  44. 1      1      4        10  15  10  10
  45. 1      2      1        12  12  12  14
  46. 1      2      2        12  14  14  14
  47. 1      2      3        14  14  14  12
  48. 1      2      4        14  12  12  12
  49. 2      1      1        100 100 150 100
  50. 2      1      2        150 100 150 150
  51. 2      1      3        150 150 100 150
  52. 2      1      4        100 150 100 100

Notes:

  1. Constructor function name MultiPolygonFromText different from other databases;
  2. STNumInteriorRing reports only interior rings.
  3. No object.method notation;
  4. ST_GeometryType returns “ST_Polygon” which is upper cased for cross-database comparison purposes;
  5. set returning function must be implemented as an attribute in a select statement in order to be able to use geometry methods.

Set Returning Functions

PostgreSQL’s inability for set returning functions eg generate_series() to draw their values/parameters from another table’s columns in the SQL is one of the major issues in trying to create cross-database SQL.

Here are some examples that show the issue. The first two show how generate_series can be used reasonably “naturally” with the value from the first table “a” being fed into the generate_series of “b”. The only difference is SQL Server’s keywords “CROSS APPLY” vs Oracle’s “, TABLE”.

  1. -- SQL Server 2008
  2. SELECT b.*
  3.   FROM (SELECT 5 AS maxValue) a
  4.        CROSS APPLY
  5.        generate_series(1,a.maxValue,1) b;
  6. -- Result
  7. IntValue
  8. --------
  9. 1
  10. 2
  11. 3
  12. 4
  13. 5
  1. -- Oracle
  2. SELECT b.*
  3.   FROM (SELECT 5 AS maxValue FROM DUAL) a,
  4.        TABLE(codesys.geom.generate_series(1,a.maxValue,1)) b;
  5. -- Result
  6. COLUMN_VALUE
  7. ------------
  8. 1
  9. 2
  10. 3
  11. 4
  12. 5

But with PostgreSQL the above pattern cannot be applied.

  1. -- PostgreSQL - Attempt 1
  2. SELECT b.*
  3.   FROM (SELECT 5 AS maxValue) a,
  4.        generate_series(1,a.maxValue,1) b;
  5. -- Result
  6. ERROR:  FUNCTION expression IN FROM cannot refer TO other relations OF same query level
  7. LINE 3:        generate_series(1,a.maxValue,1) b;
  1. -- PostgreSQL - Attempt 2
  2. SELECT b.*
  3.   FROM (SELECT 5 AS maxValue) a,
  4.        (SELECT generate_series(1,a.maxValue,1) AS column_value) b;
  5. -- Result
  6. ERROR:  subquery IN FROM cannot refer TO other relations OF same query level
  7. LINE 3:        (SELECT generate_series(1,a.maxValue,1) AS column_val...
  1. -- PostgreSQL - Attempt 3
  2. WITH aTable AS (
  3. SELECT 5 AS maxValue
  4. )
  5. SELECT b.*, a.maxValue
  6.   FROM aTable a,
  7.        (SELECT generate_series(1,a.maxValue,1) AS column_value FROM aTable a) b;
  8. -- Result
  9. column_value maxValue
  10. ------------ --------
  11. 1            5
  12. 2            5
  13. 3            5
  14. 4            5
  15. 5            5

Finally, we get a result we can work with with PostgreSQL. However, to use this as the template for “cross database” SQL coding is, I think, pretty unacceptable. It is ugly, wordy and contorted, leading to semantic interpretation problems, never mind the potential for voluminous SQL (with multiple CTEs).

There is a better method in PostgreSQL which is as follows.

  1. -- PostgreSQL - Attempt 4
  2. SELECT generate_series(1,a.maxValue,1) b, a.maxValue
  3.   FROM (SELECT 5 AS maxValue) a;
  4. -- Result
  5. column_value maxValue
  6. ------------ --------
  7. 1            5
  8. 2            5
  9. 3            5
  10. 4            5
  11. 5            5

However, this use of a table function as an attribute is not possible in SQL Server and Oracle, thus it cannot be a basis for a cross-platform SQL “template”.

I am sure there are better methods. Anyone?

Summary

Here is a table comparing the OGC/SQLMM Methods Used:

Standard (SQL/MM ie ISO TC211) Oracle (11gR2) SQL Server 2008 PostGIS 1.5
ST_MPolyFromText() ST_BdMPolyFromText() STMPolyFromText() ST_MultiPolygonFromText()
ST_ExteriorRing() ST_ExteriorRing() STExteriorRing() ST_ExteriorRing()
ST_GeometryN() ST_Geometries() STGeometryN() ST_GeometryN()
ST_GeometryType() ST_GeometryType() STGeometryType() ST_GeometryType()
ST_InteriorRingN() ST_InteriorRingN() STInteriorRingN() ST_InteriorRingN()
ST_NumGeometries() none STNumGeometries() ST_NumGeometries()
ST_NumInteriorRing() ST_NumInteriorRing() STNumInteriorRing() ST_NumInteriorRing()
ST_NumPoints() ST_NumPoints() STNumPoints() ST_NumPoints()
ST_PointN() ST_PointN() STPointN() ST_PointN()
ST_X() ST_X() STX ST_X()
ST_Y() ST_Y() STY ST_Y()

In short, the OGC/SQLMM side of the database access problem is reasonably straight-forward (if anything ever is). Sure, there are naming differences but they can overcome. The lack of a true object hierarchy does provide some interesting implementation issues but these, I think, can be encapsulated via WITH or CTE expressions.

But, on the whole “database (or any technology) independence” is not easy to achieve for a whole host of non-spatial reasons.

1. Implementations of the relevant SQL standards (eg SQL99 Analytics) differ across all databases;
2. Non-standard database-specific functionality exists eg ROWNUM/DUAL/TABLE function (Oracle) etc;
3. Lack of “common” methods across all database products for handling collections eg PostgreSQL has no similar “CROSS APPLY”/TABLE function handling capability (can do so via methods that the other databases do not provide);
4. OGC/SQLMM implementation varies across databases with with methods missing (eg ST_GeometryN in Oracle), and limited or no inheritance;
5. Interpretation of OGC/SQLMM standards differ eg SQL Server 2008 + PostGIS has geography datatype which the standard does not mention or require.
6. API keywords vary;
7. Database architectures differ.

Conclusion

Well, this is one attempt at looking at the cross-database. I have not discussed other architectural/philosophical design patterns like Oracle’s “Writers never block readers”, database ROI etc. That stuff is well covered elsewhere eg Tom Kyte’s views are well represented in this article by Mark Rittman called The Cost Of Database Independence and Comparison of different SQL implementations.

I have probably made some errors as I don’t profess to being a PostgreSQL expert (just an experienced user). If anyone else wishes to contribute to this discussion please do via some constructive comments.

Creative Commons License

post this at del.icio.uspost this at Diggpost this at Technoratipost this at Redditpost this at Farkpost this at Yahoo! my webpost this at Windows Livepost this at Google Bookmarkspost this to Twitter

Comment