Creative Commons License
All Blog Articles, Data Models and Free Source Code by Simon Greener, The SpatialDB Advisor is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.

String Tokenizer for Oracle

Wednesday June 29 2011 at 04:49

Keywords: string tokenizer returns token separator
Summary

This article contains two versions of a string tokenizer. The first returns only the tokens that are delimited by a list of separator characters; the second returns both the tokens and the separators, in the order in which they are found in the string.

UPDATED FUNCTION AT END OF ARTICLE RETURNS SEPARATORS

Applications often need a string tokenizer. Here is one for Oracle, based on work by Pawel Barut.

    -- We need a type to hold the returned tokens
    --
    CREATE OR REPLACE TYPE T_TokenSet AS TABLE OF VARCHAR2(4000);
    /
    SHOW errors
    GRANT EXECUTE ON T_TokenSet TO public WITH GRANT OPTION;
    /*********************************************************************************
    ** @function    : Tokenizer
    ** @precis      : Splits any string into its tokens.
    ** @description : Supplied a string and a list of separators this function
    **                returns resultant tokens as a pipelined collection.
    ** @example     : SELECT t.column_value
    **                  FROM TABLE(tokenizer('The rain in spain, stays mainly on the plain.!',' ,.!') ) t;
    ** @param       : p_string. The string to be Tokenized.
    ** @param       : p_separators. The characters that are used to split the string.
    ** @requires    : t_TokenSet type to be declared.
    ** @history     : Pawel Barut, http://pbarut.blogspot.com/2007/03/yet-another-tokenizer-in-oracle.html
    ** @history     : Simon Greener - July 2006 - Original coding (extended SQL sourced from a blog on the internet)
    **/
    CREATE OR REPLACE
      FUNCTION Tokenizer(p_string     IN VarChar2,
                         p_separators IN VarChar2 DEFAULT ' ')
        RETURN T_TokenSet Pipelined
      AS
        -- Unqualified name: resolves to the T_TokenSet type created above in the current schema
        v_strs T_TokenSet;
      BEGIN
        -- Nothing to tokenize, so return an empty set
        IF ( p_string IS NULL
             OR
             p_separators IS NULL ) THEN
           RETURN;
        END IF;
        -- Find the position of every separator, add sentinels at 0 and LENGTH+1,
        -- then cut out the text lying between each consecutive pair of positions
        WITH sel_string AS (SELECT p_string fullstring FROM dual)
        SELECT substr(fullstring, beg+1, end_p-beg-1) token
               Bulk Collect INTO v_strs
          FROM (SELECT beg, Lead(beg) OVER (ORDER BY beg) end_p, fullstring
                  FROM (SELECT beg, fullstring
                          FROM (SELECT Level beg, fullstring
                                  FROM sel_string
                                CONNECT BY Level <= LENGTH(fullstring)
                               )
                         WHERE instr(p_separators,substr(fullstring,beg,1)) > 0
                        UNION ALL
                        SELECT 0, fullstring
                          FROM sel_string
                        UNION ALL
                        SELECT LENGTH(fullstring)+1, fullstring
                          FROM sel_string)
               )
         WHERE end_p IS NOT NULL
           AND end_p > beg + 1;
        -- Loop on COUNT: FIRST..LAST are NULL (and raise ORA-06502) when no tokens were found
        FOR i IN 1..v_strs.COUNT Loop
          PIPE ROW(v_strs(i));
        END Loop;
        RETURN;
      END Tokenizer;
    /
    SHOW errors
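
To see how the query inside the function finds its tokens, it can help to run the position-finding step on its own. The following is only a sketch and not part of the original function: the sample string 'a,b;c' and the separators ',;' are illustrative values, and `positions` is a name introduced here. It lists each separator position, plus the two sentinels (0 and LENGTH+1), paired with the next position via LEAD.

```sql
-- Illustrative input only: string 'a,b;c', separators ',;'
WITH sel_string AS (
       SELECT 'a,b;c' AS fullstring FROM dual
     ),
     positions AS (
       -- One row for every character of the string that is a separator
       SELECT beg
         FROM (SELECT LEVEL AS beg, fullstring
                 FROM sel_string
              CONNECT BY LEVEL <= LENGTH(fullstring))
        WHERE INSTR(',;', SUBSTR(fullstring, beg, 1)) > 0
       UNION ALL
       SELECT 0 FROM dual                              -- sentinel before the first character
       UNION ALL
       SELECT LENGTH(fullstring) + 1 FROM sel_string   -- sentinel after the last character
     )
SELECT beg,
       LEAD(beg) OVER (ORDER BY beg) AS end_p           -- next separator position, if any
  FROM positions
 ORDER BY beg;
```

For this input the pairs should be (0, 2), (2, 4), (4, 6) and (6, NULL). SUBSTR(fullstring, beg+1, end_p-beg-1) then yields a, b and c, and the last row is discarded because its end_p is NULL.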

Here are my simple tests.

    SELECT DISTINCT t.column_value AS token
      FROM TABLE(Tokenizer('LineString:MultiLineString:MultiPoint:MultiPolygon:Point:Point:LineString:Polygon:Polygon',':')) t;

Result.

token
LineString
MultiLineString
MultiPoint
MultiPolygon
Point
Polygon

    SELECT t.column_value AS token
      FROM TABLE(tokenizer('The rain in spain, stays mainly on the plain.!',' ,.!')) t;

Result.

token
The
rain
in
spain
stays
mainly
on
the
plain

Updated Function

Sometimes it is very handy to have the tokenizer also return each separator, in the position at which it was found in the string. Here is an updated version of the above function that does this.

    -- New types
    --
    -- We need a type to hold the returned tokens
    --
    DROP TYPE T_Token    Force;
    DROP TYPE T_TokenSet Force;
    CREATE TYPE T_Token AS Object (
       id        INTEGER,
       token     varchar2(30000),
       separator varchar2(30000)
    );
    /
    SHOW errors
    GRANT EXECUTE ON T_Token TO public WITH GRANT OPTION;
    -- Unqualified name: resolves to the T_Token type created above in the current schema
    CREATE TYPE T_TokenSet AS TABLE OF T_Token;
    /
    SHOW errors
    GRANT EXECUTE ON T_TokenSet TO public WITH GRANT OPTION;
    /*********************************************************************************
    ** @function    : Tokenizer
    ** @precis      : Splits any string into its tokens.
    ** @description : Supplied a string and a list of separators this function
    **                returns resultant tokens, and the separator that follows each,
    **                as a pipelined collection.
    ** @example     : SELECT t.id, t.token, t.separator
    **                  FROM TABLE(tokenizer('The rain in spain, stays mainly on the plain.!',' ,.!') ) t;
    ** @param       : p_string. The string to be Tokenized.
    ** @param       : p_separators. The characters that are used to split the string.
    ** @requires    : t_TokenSet type to be declared.
    ** @history     : Pawel Barut, http://pbarut.blogspot.com/2007/03/yet-another-tokenizer-in-oracle.html
    ** @history     : Simon Greener - July 2006 - Original coding (extended SQL sourced from a blog on the internet)
    ** @history     : Simon Greener - Apr 2012  - Extended to include returning of separators
    **/
    CREATE OR REPLACE
      FUNCTION Tokenizer(p_string     IN VarChar2,
                         p_separators IN VarChar2 DEFAULT ' ')
        RETURN T_TokenSet Pipelined
      AS
        v_tokens T_TokenSet;
      BEGIN
        -- Nothing to tokenize, so return an empty set
        IF ( p_string IS NULL
             OR
             p_separators IS NULL ) THEN
           RETURN;
        END IF;
        -- myCTE: position (beg) and character (sep) of every separator found in the
        -- string, plus a sentinel row at position 0, numbered in string order (rid)
        WITH myCTE AS (
           SELECT c.beg, c.sep, ROW_NUMBER() OVER(ORDER BY c.beg ASC) rid
             FROM (SELECT b.beg, c.sep
                     FROM (SELECT Level beg
                             FROM dual
                            CONNECT BY Level <= LENGTH(p_string)
                          ) b,
                          (SELECT SubStr(p_separators,level,1) AS sep
                             FROM dual
                            CONNECT BY Level <= LENGTH(p_separators)
                          ) c
                    WHERE instr(c.sep,substr(p_string,b.beg,1)) > 0
                   UNION ALL SELECT 0, CAST(NULL AS varchar2(10)) FROM dual
                 ) c
        )
        -- Each token is the text between one separator and the next;
        -- the separator reported with a token is the one that follows it
        SELECT T_Token(ROW_NUMBER() OVER (ORDER BY a.rid ASC),
                       CASE WHEN LENGTH(a.token) = 0 THEN NULL ELSE a.token END,
                       a.sep) AS token
          Bulk Collect INTO v_tokens
          FROM (SELECT d.rid,
                       SubStr(p_string,
                              (d.beg + 1),
                              NVL((Lead(d.beg,1) OVER (ORDER BY d.rid ASC) - d.beg - 1),LENGTH(p_string)) ) AS token,
                       Lead(d.sep,1) OVER (ORDER BY d.rid ASC) AS sep
                  FROM MyCTE d
               ) a
         WHERE LENGTH(a.token) <> 0 OR LENGTH(a.sep) <> 0;
        -- Loop on COUNT: FIRST..LAST are NULL (and raise ORA-06502) when nothing was collected
        FOR v_i IN 1..v_tokens.COUNT loop
           PIPE ROW(v_tokens(v_i));
        END LOOP;
        RETURN;
      END Tokenizer;
    /
    SHOW errors

Testing

    SELECT DISTINCT t.token
      FROM TABLE(Tokenizer('LineString:MultiLineString:MultiPoint:MultiPolygon:Point:Point:LineString:Polygon:Polygon',':')) t;

Results

TOKEN
LineString
MultiLineString
MultiPoint
MultiPolygon
Point
Polygon

The classic “Rain in Spain…”.

    SELECT t.*
      FROM TABLE(tokenizer('The rain in spain, stays mainly on the plain.!',' ,.!')) t;

Results

ID  TOKEN   SEPARATOR
 1  The     {SPACE}
 2  rain    {SPACE}
 3  in      {SPACE}
 4  spain   ,
 5  (null)  {SPACE}
 6  stays   {SPACE}
 7  mainly  {SPACE}
 8  on      {SPACE}
 9  the     {SPACE}
10  plain   .
11  (null)  !
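
Because every row carries the separator that followed its token, the original string can be put back together from the output. The query below is a minimal sketch, assuming the updated Tokenizer above is installed and that LISTAGG is available (Oracle 11gR2 or later); the column alias `rebuilt` is just a name chosen here.

```sql
-- Concatenate token and separator of each row, in id order, to rebuild the input.
-- In Oracle a NULL token or separator simply contributes nothing to the concatenation.
SELECT LISTAGG(t.token || t.separator) WITHIN GROUP (ORDER BY t.id) AS rebuilt
  FROM TABLE(tokenizer('The rain in spain, stays mainly on the plain.!',' ,.!')) t;
```

For the eleven rows listed above this should give back 'The rain in spain, stays mainly on the plain.!'. Note that LISTAGG is subject to the SQL VARCHAR2 length limit, so this approach only suits reasonably short strings.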

Now, let’s process a POLYGON WKT.

    SELECT t.id, t.token, t.separator
      FROM TABLE(tokenizer('POLYGON((2300 400, 2300 700, 2800 1100, 2300 1100, 1800 1100, 2300 400), (2300 1000, 2400  900, 2200 900, 2300 1000))',' ,()')) t;

Results

ID  TOKEN    SEPARATOR
 1  POLYGON  (
 2  (null)   (
 3  2300     {SPACE}
 4  400      ,
 5  (null)   {SPACE}
 6  2300     {SPACE}
 7  700      ,
 8  (null)   {SPACE}
 9  2800     {SPACE}
10  1100     ,
11  (null)   {SPACE}
12  2300     {SPACE}
13  1100     ,
14  (null)   {SPACE}
15  1800     {SPACE}
16  1100     ,
17  (null)   {SPACE}
18  2300     {SPACE}
19  400      )
20  (null)   ,
21  (null)   {SPACE}
22  (null)   (
23  2300     {SPACE}
24  1000     ,
25  (null)   {SPACE}
26  2400     {SPACE}
27  (null)   {SPACE}
28  900      ,
29  (null)   {SPACE}
30  2200     {SPACE}
31  900      ,
32  (null)   {SPACE}
33  2300     {SPACE}
34  1000     )
35  (null)   )
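
Many of these rows exist only to carry a separator. If just the ordinates are wanted, the NULL tokens and the POLYGON keyword can be filtered out. This is a sketch only: the short WKT string is an illustrative value, and the pattern assumes unsigned integer ordinates as in the example above (negative or decimal values would need a wider pattern).

```sql
-- Keep only tokens made up entirely of digits and return them as numbers.
-- REGEXP_SUBSTR returns NULL for any non-matching token, so TO_NUMBER never
-- sees text such as 'POLYGON'.
SELECT t.id,
       TO_NUMBER(REGEXP_SUBSTR(t.token, '^[0-9]+$')) AS ordinate
  FROM TABLE(tokenizer('POLYGON((0 0, 10 0, 10 10, 0 0))',' ,()')) t
 WHERE REGEXP_LIKE(t.token, '^[0-9]+$')
 ORDER BY t.id;
```

The id values keep their original numbering, so gaps appear where separator-only rows were removed.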

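This time, don't include the space as a separator. Each coordinate pair now comes back as a single token. Whitespace inside a token is kept, so a pair that follows a comma still carries its leading space, even though the listing below does not show it.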

    SELECT t.id, t.token, t.separator
      FROM TABLE(tokenizer('POLYGON((2300 400, 2300 700, 2800 1100, 2300 1100, 1800 1100, 2300 400), (2300 1000, 2400  900, 2200 900, 2300 1000))',',()')) t;

Results

ID  TOKEN      SEPARATOR
 1  POLYGON    (
 2  (null)     (
 3  2300 400   ,
 4  2300 700   ,
 5  2800 1100  ,
 6  2300 1100  ,
 7  1800 1100  ,
 8  2300 400   )
 9  (null)     ,
10  {SPACE}    (
11  2300 1000  ,
12  2400 900   ,
13  2200 900   ,
14  2300 1000  )
15  (null)     )
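
With each coordinate pair arriving as one token, the pair can be split into separate X and Y values in the same query. The following is a sketch under the same assumption of unsigned integer ordinates; the column aliases `x` and `y` are names chosen here for illustration.

```sql
-- Pick the first and second run of digits out of each token.
-- Leading and repeated spaces inside a token are skipped by the pattern, and
-- tokens without digits (POLYGON, a lone space, NULL) are filtered out.
SELECT t.id,
       TO_NUMBER(REGEXP_SUBSTR(t.token, '[0-9]+', 1, 1)) AS x,
       TO_NUMBER(REGEXP_SUBSTR(t.token, '[0-9]+', 1, 2)) AS y
  FROM TABLE(tokenizer('POLYGON((2300 400, 2300 700, 2800 1100, 2300 1100, 1800 1100, 2300 400), (2300 1000, 2400  900, 2200 900, 2300 1000))',',()')) t
 WHERE REGEXP_LIKE(t.token, '[0-9]')
 ORDER BY t.id;
```

This should return one row per vertex (ids 3 to 8 for the outer ring and 11 to 14 for the inner ring in the listing above).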

I hope that someone out there finds this useful.
