Wednesday, August 10, 2011

Determining the Spatial Features in a SQL Server Table


I’ve been using the SQL Server Denali CTP(s) for just over 6 months now. As a result, I’ve become accustomed to some of the new features and forget that they aren’t generally available in production environments yet… it always comes as a bit of a surprise when you have to do some work on a SQL Server 2008/R2 database and suddenly discover half of the feature you’ve come to rely on aren’t there any more (I pity those consultants who turn up at a client’s site to find SQL Server 2000/7/6.5 instances – fortunately, if you’re dealing with spatial data, this doesn’t really happen ;)

One such example happened today, when I was trying to determine the extent of the features in a column of spatial data, in order to create a bounding box for a spatial index covering all the data. In SQL Server Denali, this can be done very easily using an EnvelopeAggregate. Although not documented, any envelope created by SQL Server (i.e. using STEnvelope() or EnvelopeAggregate()) will be always have its first point at the bottom left of the envelope, and its third point at the opposing, top right corner. You can therefore calculate the minimum and maximum extents of the features in the MyGeomColumn of geometry data as follows:
  geometry::EnvelopeAggregate(MyGeomColumn).STPointN(1).STX AS MinX,
  geometry::EnvelopeAggregate(MyGeomColumn).STPointN(1).STY AS MinY,
  geometry::EnvelopeAggregate(MyGeomColumn).STPointN(3).STX AS MaxX,
  geometry::EnvelopeAggregate(MyGeomColumn).STPointN(3).STY AS MaxX
FROM BigOlTableOfData;
So what do you do in SQL Server 2008/R2, where the EnvelopeAggregate() is not available? Here’s a couple of options…

Option #1 : With a Cursor

Well, you could come up with your own envelope aggregate using a (dreaded) cursor. Something like this:
  @geom geometry,
  @geomaggregate geometry = 'GEOMETRYCOLLECTION EMPTY';
DECLARE @i int = 0;

DECALRE geocursor cursor fast_forward
FOR SELECT MyGeomColumn FROM BigOlTableOfData;
OPEN geoCursor;
FETCH NEXT FROM geoCursor INTO @geom;
  SET @geomaggregate = @geomagregate.STUnion(@geom);
  FETCH NEXT FROM geoCursor INTO @geom;
  SET @i += 1;
CLOSE geocursor;
deallocate geoCursor;
Then, create an envelope around @geomaggregate using STEnvelope(), and select the first and third points as before.

Option #2 : CLR Function

Alternatively, you could choose not to explicitly create the envelope, but create a CLR function that looped through all the points of each geometry (using STPointN() and STNumPoints()) , and keep track of the minimum and maximum X and Y values in the loop. Something like this:
double xMin = double.MaxValue;
double xMax = double.MinValue;
double yMin = double.MaxValue;
double yMax = double.MinValue;

// Loop through each geometry in the dataset
foreach(SqlGeometry geom in MyGeomColumn) {

  // Loop through each point in this geometry
  for(int i=1; i<= geom.STNumPoints(); i++)
    SqlGeometry point = geom.STPointN(i);

    // Check whether this point is a new min/max value
    if (point.X < xMin)     { xMin = point.X; }     else if (point.X > xMax)
    { xMax = point.X; }

    if (point.Y < yMin)     { yMin = point.Y; }     else if (point.Y > yMax)
    { yMax = point.Y; }

Option #3 : CTE

Here’s a nice alternative using two CTEs, which I first saw described here. The first CTE creates a table, ENVELOPE, containing the individual envelopes of each feature in the table. The second table selects just the corner points of each of those envelopes, with a UNION ALL query (note that this is a regular UNION, not a STUnion()).

Then, you simply select the minimum and maximum X and Y values of any of the points listed in the CORNERS table:
    SELECT MyGeomColumn.STEnvelope() as envelope from BigOlTableOfData
    SELECT envelope.STPointN(1) as point from ENVELOPE
    UNION ALL select envelope.STPointN(3) from ENVELOPE

  MIN(point.STX) as MinX,
  MIN(point.STY) as MinY,
  MAX(point.STX) as MaxX,
  MAX(point.STY) as MaxY

Option #4 : Persisted Envelopes

Another option, and the one I chose to use today, is to alter your table structure to include computed columns containing the minimum and maximum X and Y values of the envelope around each feature on a per-row basis. I chose to make my columns PERSISTED, so that the computed values would actually be materialised and saved in the database rather than computed on-the-fly:
  MinX AS (CONVERT(int, MyGeomColumn.STEnvelope().STPointN((1)).STX, 0)) PERSISTED,
  MinY AS (CONVERT(int, MyGeomColumn.STEnvelope().STPointN((1)).STY, 0)) PERSISTED,
  MaxX AS (CONVERT(int, MyGeomColumn.STEnvelope().STPointN((3)).STX, 0)) PERSISTED,
  MaxY AS (CONVERT(int, MyGeomColumn.STEnvelope().STPointN((3)).STY, 0)) PERSISTED;
This requires a little bit more processing up-front, (and on each additional INSERT into the table), but the data in my table was going to remain relatively static. The advantage of this approach is that you can now work out the full extent of values in the dataset at any time quickly and simply by:
  MIN(MinX) AS MinX,
  MIN(MinY) AS MinY,
  MAX(MaxX) AS MaxX,
  MAX(MaxY) AS MaxY;
Seeing as these are just integer columns, this is much quicker than relying on the more complex spatial methods, of having to employ any looping-type behaviour. You can also add an index to the integer coordinate values in the computed columns, which might enable you to perform faster search and retrieval queries of e.g. values lying within a rectangular area of interest (such as what features are visible in a given Bing Maps view) than using the more powerful, but frequently slower, built-in spatial indexes.


No comments:

Post a Comment

Share This: