Current Oracle Spatial Blog Articles
The “production” database included a single table (containing around 70 million individual point records that described around 13,000 individual hydrographic surveys. Dynamic searching and visualising individual surveys in the base table was considered to be too slow to be used in the interactive mapping that was required for the data discovery web application so “surrogate” vector linestrings existed against which searching and visualisation would occur.
No one had yet been able to implement search and visualisation against the base point data: until I came along.
I have not included a description of the original tables in this blog; what I have done is create a set of tables and scripts to generate random data which are included in a download for you to run yourself. Also, to simplify the material for this article, I will not discuss how the original code handles surveys with more 3D points than Oracle’s Sdo_Geometry Sdo_Ordinate_Array can hold (which is ( 1048576 – 1 ) / 3 = 349,525 coordinates): all generated data will not break this limit.
The rest of this article describes the testing that was done testing different methods for the creation of single multi-point records from the base single point data. The article also shows how Oracle’s PL/SQL profiling package was used in improving the performance of the actual code.
Note that all timings in this article are based on Oracle’s DBMS_PROFILER package whose use will be described in another blog article.
Note, sb_eno is the enterprise identifier of a single singlebeam survey; the pointno is the survey point within a singlebeam survey – normally, sb_eno,pointno is the primary key but, for simplicity, the generated pointno will actually be unique across all surveys.
To populate this table we can do this:
This data does not need to be indexed so this is not covered.
The target table that will hold our multipoint sdo_geometry objects is:
Generating the MultiPoint objects
Of course, in a perfect world, the generation of the multipoint sdo_geometries should be as simple as (the APPEND hint provides a more efficient “direct path” insert into the table):
However, except for small datasets, this method is slow (some numbers justifying this assertion will be provided later).
Method Two: “Appendix D”
The Oracle Spatial Team’s recommended approach for the most efficient use of the SDO_AGGR_UNION function is documented in Section D.4 of Appendix D of the Oracle® Spatial User’s Guide and Reference 10g Release 2 (10.2) documentation.
The general idea here is to avoid expensive “context swapping” in the Oracle aggregation engine by implementing a form of “parallelisation”. The numbers presented later will show that this is effective but that it still does not perform fast enough compared to other methods.
Method Three: Pipelined Function
During my testing at GeoScience Australia I had the idea of using PL/SQL to generate the multi-points using a well-known, scalable, Oracle collection aggregation technique. The best way to describe this is by example in the following PL/SQL (the code includes a number of standard optimisations familiar to PL/SQL developers i.e., the use of CURSORs, BULK COLLECTion, FORALL “array” insertion, ROWTYPEs etc). It is the SQL statement that is of most interest and has been highlighted in the code.
This first method included the use of a pipelined function (pipelined function use will be covered in another blog) which turned an Sdo_Point object into an Sdo_Ordinate_Array object which could then be BULK COLLECTED in the highlighted SQL:
The BULK COLLECT aggregates the individual Sdo_Ordinate_Arrays returned by the function into a single array for the survey. What I discovered was this approach was substantially faster than the previous two methods.
After getting this to work, I then realised I didn’t need the Function as is described in the next method.
Method Four: PL/SQL SDO_Ordinate_Array aggregation
The next improvement to the algorithm was to get rid of the custom “asOrdinates” function in favour of using the Oracle Spatial type library directly.
All the heavy lifting in this approach is done by directly referencing the individual ordinates coded in the geometry’s 3D Sdo_Point object and converting them into an Sdo_Ordinate_Array which can then be BULK COLLECTED into a resultant Sdo_Ordinate_Array.
This method was the fastest of all the methods used. However, because some surveys where larger than can be represented in an Sdo_Ordinate_Array, I did look at creating “custom” point arrays that did not have the million ordinate restriction that is in the MDSYS.SDO_ORDINATE_ARRAY object.
Method Five: Custom array of SDO_Point_Type
Another method using aggregation involved the use of a custom array of SDO_Point_Type as in the following PL/SQL. (One of the advantages of this approach is the fact that the aggregation SELECT statement can be used for all multi-point aggregation regardless as to whether the resulting object breaks the SDO_Ordinate_Array limit; a negative is the need to turn the array of SDO_Point_Type back into array bound “safe” SDO_Ordinate_Arrays on insert. But the method is illustrative of what can be done if one “thinks outside the square”.)
To ensure that the performance of all four methods can be compared, the first two methods were “wrapped” by PL/SQL such that the SELECT statement was executed once for each sb_eno. Also, the final methods went through a few different interations in the handling of those surveys whose points broke the SDO_Ordinate_Array limit (see the actual SQL for how this was handled).
The first chart shows how substantially different the “native” versus “custom” methods can be: one cannot see the actual timings for the “custom” methods:
When one removes the “native” methods, one can see more clearly the relative performance of the “custom” methods.
Replication of the full, 67 million row raw point data, to a 13,000 row multi-point table was achieved in less than 50 minutes of processing. This was a substantial improvement and as a result of the work GeoScience Australia moved to a database replication strategy that included the efficient generation of multi-point survey data from the raw observations. This also gives the end-user the ability to work with the actual points in a web application instead of the surrogate averaged linework resulting in more accurate analysis and data discovery.
The “take home message” for users of Oracle Spatial/Locator should look to the power of the whole database to solve data warehousing and analysis problems and not just the Spatial packages themselves.
A fully worked example showing the fastest method for you to try youself is available here