Thread: Finding sum of values in floating point raster

  1. #1
    Mark Andersen
    Join Date
    May 2010
    Posts
    11
    Points
    0
    Answers Provided
    0


    0

    Finding sum of values in floating point raster

    I am trying to find the fastest way (in terms of processing speed) to total all the values in a raster. In other words, I am looking for a tool that adds up the values of all cells in a raster and returns that total as a single number. Ideally, it would produce a table with just one row, as the "Get Raster Properties" tool does. The "Get Raster Properties" tool in the "Raster Properties" toolbox in ArcToolbox comes very close, in that it provides summary statistics for a raster, but there are two problems: 1) there is no option for "sum"; and 2) it appears to round off the answer to 6 decimal places, and I need much greater precision than that.

    The only way I've found to do this is to use the "Zonal Statistics As Table" tool with a dummy "zone" raster that has a single value ("1") for all pixels covering my study area, so that it computes the statistics (including sum) for all cells in the raster. However, this is pretty slow, mostly (I think) because it checks each pixel to see which zone it's in.

    It seems that there should be a built-in way to get a sum of all cell values in a raster. Help!

  2. #2
    Mark Andersen

    I should also mention that I need to do this with over 2,000 grids, so I'm hoping for something that's easy to batch either in Model Builder or simply by using the batch functionality of ArcToolbox.

  3. #3
    Dan Patterson

    Join Date
    Apr 2010
    Posts
    1,755
    Points
    442
    Answers Provided
    41


    0

    Re: Finding sum of values in floating point raster

    Presuming that the mean in Get Raster Properties is determined from the sum divided by the number of cells (avg = sum/N), perhaps you could exploit COLUMNCOUNT * ROWCOUNT to determine N and recover the sum as mean * N. Of course, this would not work if there are NoData cells in the raster, and I would suggest that you explore this, since the help gives no indication of whether NoData values are accounted for when calculating the statistical properties.

  4. #4
    Mark Andersen

    Dan--I thought of that. The problem is that the mean that is given by the "Get Raster Properties" is rounded to 6 decimal places. My grids contain very small values, and over 400,000,000 cells, so any rounding error at all means that such a multiplication would produce erroneous results.
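
    To put numbers on this, here is a small sketch of the mean-times-count shortcut. It assumes a hypothetical grid of 400,000,000 cells whose values sum to 1 and a mean that is reported rounded to 6 decimal places; the variable names are made up for illustration.

```python
# Hypothetical grid: a probability surface whose cells sum to 1.
n_cells = 400_000_000
true_sum = 1.0

# The exact mean is far smaller than the 6th decimal place.
true_mean = true_sum / n_cells          # 2.5e-09

# Assumed behaviour: the tool reports the mean rounded to 6 decimals.
reported_mean = round(true_mean, 6)     # 0.0

# Recovering the sum as mean * N then loses everything.
recovered_sum = reported_mean * n_cells

print(true_mean, reported_mean, recovered_sum)
```

    Under these assumptions the recovered sum is 0 rather than 1, which is the kind of error I mean.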

  5. #5
    Dan Patterson


    OK, a large grid. Presuming that the output is a table, perhaps the display is just set to 6 decimal places. If you have any way to get the raster out to a numpy array, then the summation is simple (raster shown here as a list of lists):
    Code:
    >>> import numpy
    >>> a = numpy.array([[1,1],[2,2]])
    >>> a.sum()
    6
    and all of your grids could be converted to arrays and summed this way, using Python as your programming platform.
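
    For a real grid the same idea might look like the sketch below. It assumes the raster has already been read into a 2-D float32 NumPy array and that NoData cells carry a known sentinel value; `sum_raster_array`, `nodata` and `block_rows` are hypothetical names, and how the array is obtained from the GIS is left out. The accumulator is forced to double precision, and rows are processed in blocks so that a very large grid does not need a second full-size copy in memory.

```python
import numpy as np

def sum_raster_array(cells, nodata=None, block_rows=1024):
    """Sum a 2-D array of cell values in double precision, skipping NoData."""
    total = 0.0   # Python float is a C double
    count = 0     # number of cells that actually contributed
    for start in range(0, cells.shape[0], block_rows):
        block = cells[start:start + block_rows]
        if nodata is not None:
            block = block[block != nodata]   # keep only valid cells (1-D copy)
        total += float(block.sum(dtype=np.float64))
        count += int(block.size)
    return total, count

# Tiny stand-in for a raster: -9999 marks NoData (an assumed convention).
demo = np.array([[1.0, 1.0], [2.0, -9999.0]], dtype=np.float32)
total, count = sum_raster_array(demo, nodata=-9999.0)
print(total, count)   # 4.0 3
```

    Returning the count of valid cells alongside the sum also settles the NoData question for N.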

  6. #6
    William Huber

    Join Date
    Apr 2010
    Posts
    694
    Points
    73
    Answers Provided
    2


    0

    Re: Finding sum of values in floating point raster

    Quote (originally posted by markandersen):
    Dan--I thought of that. The problem is that the mean that is given by the "Get Raster Properties" is rounded to 6 decimal places. My grids contain very small values, and over 400,000,000 cells, so any rounding error at all means that such a multiplication would produce erroneous results.

    Actually, the value likely is not rounded: the data values themselves are maintained only as single-precision IEEE floats. I suspect they are summed using single-precision arithmetic (rather than double precision), implying the result will be available only in single precision: about six to seven significant decimal digits.

    You can do better by writing code that reads each value, promotes it to a double, and performs the summation in double precision.
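
    A rough way to see the difference, assuming NumPy is available: the sketch below adds one million single-precision values of 1e-6 twice, once with a single-precision running total and once after promoting each value to double. `np.cumsum` is used for the single-precision case only because it accumulates strictly left to right, as a plain loop would; the grid is a made-up stand-in, not real data.

```python
import numpy as np

# Made-up stand-in for a grid: one million tiny values stored as float32.
values = np.full(1_000_000, 1e-6, dtype=np.float32)

# Running total kept in single precision (last element of the running sum).
single_total = float(np.cumsum(values, dtype=np.float32)[-1])

# Each value promoted to double before it is added.
double_total = float(values.sum(dtype=np.float64))

print("single:", single_total)
print("double:", double_total)
```

    The double-precision total should land very close to 1, while the single-precision running total drifts away from it by a much larger amount.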

    However, I do not follow how the current results, even if rounded, could be characterized as "erroneous," nor do I see how the error depends on the sizes of the values in the grid. The tough case occurs when there are many positive and negative values in the grid that almost balance: the floating point error could then be larger than the final result.
    --Bill Huber
    Quantitative Decisions
    For more help, visit the worldwide community at http://gis.stackexchange.com

  7. #7
    Mark Andersen

    Bill,

    The number of significant digits far exceeds the number of decimal places being shown. For example, a value of 0.000000045821 is a valid value (a probability density function, where the probabilities for all cells across the study area sum to 1), but it would be displayed as 0.000000 when I use the Get Raster Properties tool. When I sum the grid values, if there is this much rounding in the calculations, they will sum to much less than 1. That's what I mean by erroneous.

    I do most of my scripting with VBA. Is this something that might be better done with Python? I haven't done much scripting with rasters, so I'm not sure how to access the values for individual cells. Any samples you have handy for something like this?

    Thanks!

  8. #8
    William Huber


    Quote (originally posted by markandersen):
    The number of significant digits far exceeds the number of decimal places being shown. For example, a value of 0.000000045821 is a valid value (a probability density function, where the probabilities for all cells across the study area sum to 1), but it would be displayed as 0.000000 when I use the Get Raster Properties tool. When I sum the grid values, if there is this much rounding in the calculations, they will sum to much less than 1. That's what I mean by erroneous.

    I think we might be miscommunicating about sig figs. 0.000000045821 has only five sig figs; the leading zeros don't contribute to the significant figures. There will be no internal rounding of this value.

    The "Get Raster Properties" tool might be rounding to a fixed decimal precision for numeric display but it is not the same code that sums the values, so its behavior is not relevant.

    The main difficulty with summing a pdf on a grid occurs when the values vary by orders of magnitude. This is easy to see by doing the calculations on a hypothetical base-10 computer having only two significant digits (instead of on an actual base-2 computer having 24 significant binary digits, as in single precision). On such a computer a grid might have 900 values of 0.001 (stored as 1.0 * 10^-3), one value of 0.1 (stored as 1.0 * 10^-1), and 99 values of 0. These 1000 values sum to 1. The computer is capable of accurately performing computations like
    0.001 + 0.001 = 0.002
    and
    0.01 + 0.001 = 0.011,
    but using only two significant figures it will determine, for example, that
    0.1 + 0.001 = 0.1
    because the correct sum of 0.101 must be rounded to two significant figures, yielding 0.10. Thus, in the worst case the sum would be performed as
    (...(((0.1 + 0.001)+0.001)+0.001)+...+0.001) +0+...+0 = 0.1
    because all the sums will result in 0.1. In reality the 0.1 is unlikely to appear first so the error won't be this big, but it will still be substantial. It is attributable to the fact that the component numbers in the sum vary too much in magnitude compared to the inherent precision of the computer's addition mechanism.
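
    The two-digit computer can be imitated with Python's `decimal` module by setting the working precision to 2 significant digits. This is only a sketch of the thought experiment above; `two_digit_sum` is a made-up helper name.

```python
from decimal import Decimal, localcontext

def two_digit_sum(values):
    """Add values left to right, rounding every partial sum to 2 significant digits."""
    with localcontext() as ctx:
        ctx.prec = 2
        total = Decimal(0)
        for v in values:
            total = total + v   # result is rounded to ctx.prec digits
        return total

small = [Decimal("0.001")] * 900
large = [Decimal("0.1")]
zeros = [Decimal(0)] * 99

worst_case = two_digit_sum(large + small + zeros)    # 0.1 comes first
ascending = two_digit_sum(zeros + small + large)     # 0.1 comes last
print(worst_case, ascending)
```

    With the 0.1 first the total stays at 0.10. With the 0.1 last the running total of small values stalls once it reaches 0.10, so the final answer is only 0.20 instead of 1.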

    With ESRI floating grids, this becomes an issue when numeric magnitudes vary by roughly four or more orders of magnitude within a grid. (The threshold depends on grid size; the problem becomes more acute with larger grids.) One way to cope is to split the original grid into pieces by size of the values: create a grid of all values between 0.0001 and 1, for example (with zeros elsewhere); create another of all the values between 0.00000001 and 0.0001; etc. Sum the values in these grids separately (the sums will individually be fairly accurate), then sum the sums in VBA (or Python or whatever). In this fashion you won't have to pick the grids apart into individual cells (which I suspect might be a relatively slow operation).
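
    As a rough illustration of this splitting idea, assuming NumPy and a grid already held as a float32 array: the sketch below sums each magnitude band with a single-precision running total (standing in for a tool that only offers single precision), then adds the band totals in double precision. The band edges, the test grid and the name `sum_by_magnitude` are made up for the example.

```python
import numpy as np

def single_precision_total(values):
    """Left-to-right running total kept in float32, like a naive single-precision loop."""
    return float(np.cumsum(values, dtype=np.float32)[-1])

def sum_by_magnitude(values, edges):
    """Sum each magnitude band in float32, then add the band totals as doubles.

    The edges must cover the full range of magnitudes, or values are left out.
    """
    magnitudes = np.abs(values)
    total = 0.0  # Python float is a double
    for low, high in zip(edges[:-1], edges[1:]):
        band = values[(magnitudes >= low) & (magnitudes < high)]
        if band.size:
            total += single_precision_total(band)
    return total

# Made-up grid: one large value followed by 999,999 tiny ones.
grid = np.full(1_000_000, 3.6e-10, dtype=np.float32)
grid[0] = 0.9996

exact = float(grid.sum(dtype=np.float64))            # double-precision reference
naive = single_precision_total(grid)                 # tiny values are swallowed
banded = sum_by_magnitude(grid, [0.0, 1e-8, 1e-4, 2.0])

print("naive error: ", abs(naive - exact))
print("banded error:", abs(banded - exact))
```

    The banded total is not exact, since each band is still summed in single precision, but its error should be far smaller than that of the naive running total.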
    Last edited by whuber; 05-11-2010 at 10:15 AM.

  9. #9
    Mark Andersen

    Bill,

    Is there any reason to expect that the zonal statistics function (using a "zone" raster mask where all zone values are the same) will use single precision? When I do it that way (use a raster layer that simply has values of "1" for all cells overlaying my original raster), the "sum" I get seems reasonable.

  10. #10
    William Huber


    Quote (originally posted by markandersen):
    Is there any reason to expect that the zonal statistics function (using a "zone" raster mask where all zone values are the same) will use single precision? When I do it that way (use a raster layer that simply has values of "1" for all cells overlaying my original raster), the "sum" I get seems reasonable.

    Due to the absence of adequate technical documentation in the ArcGIS system, Mark, the only way to address such a question is by testing.

    Your test does not necessarily stress the system, because a large number of ones will be added accurately in single precision; you will start losing precision only in grids with more than about 16.8 million (2^24) cells. Furthermore, your 1's were probably stored (and therefore computed) as integers, which doesn't test the floating point precision at all.

    To reproduce a situation like your probability distribution, I created a grid with 999,999 tiny values and one large value of 0.999638 (as reported both in the legend and the Identify tool). The zonal sum is reported as 1, exactly. I then removed this large value, replacing it by zero, and recomputed the zonal sum, which is now given in the table as "3.62344E-04". Double-precision addition of these numbers yields 1.00000034400000, which is close enough to 1 to suggest the zonal stats might be computed with better than single precision.

    Then, emulating your test, I created a grid of 25 million cells, each with the value 1.1 (to force it into floats) and computed a zonal sum: "2.75E+07" is the answer, bang-on correct.

    I have to conclude that zonal sums (and therefore means, etc.) are computed in double precision.
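
    The reasoning behind this kind of test can be sketched as follows, assuming NumPy: build a grid whose single- and double-precision sums differ clearly, then see which reference the tool's reported value agrees with. The grid here is scaled down to one million cells of 1.1, `reported` is a placeholder for whatever the tool prints, and `closest_precision` is a hypothetical helper.

```python
import numpy as np

def closest_precision(reported, cells):
    """Say whether a reported sum is nearer the single- or double-precision total."""
    single = float(np.cumsum(cells, dtype=np.float32)[-1])   # left-to-right float32
    double = float(cells.sum(dtype=np.float64))               # promoted to double
    return "single" if abs(reported - single) < abs(reported - double) else "double"

cells = np.full(1_000_000, 1.1, dtype=np.float32)
reported = 1.1e6   # placeholder: a tool that reports the exact answer
print(closest_precision(reported, cells))
```

    A match with the double-precision reference is suggestive rather than conclusive: a smarter summation order in single precision could also come close.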
