Following up on a discussion in last night's Linear Algebra refresher, here is some recent data from Intel on the number of clock cycles needed to perform floating-point arithmetic in software. If I can find the corresponding hardware figures, I will post them as an addendum or edit to this thread.
http://www.intel.com/technology/itj/2007/v11i1/s2-decimal/1-sidebar.htm
The discussion was about the operation count in forward and backward substitution. The question was whether the loop-counter increment materially affects the overall cost of the calculation.
Based on the numbers in this paper (specifically, the bar chart), it should be clear that operation counting should at the very least attach more weight to floating-point arithmetic (whose cost per operation is measured in tens or even hundreds of clock cycles, depending on the type of operation) than to the "housekeeping" processing in a given algorithm. This ignores the time taken to load the input value(s) from RAM and store the output value(s) back to RAM, where required.
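For concreteness, here is a minimal forward-substitution sketch (my own illustrative code, not anything from the course) for solving Ly = b with a lower-triangular L, with comments marking which operations are floating point and which are housekeeping:

```cpp
#include <vector>

// Sketch of forward substitution for a lower-triangular system L y = b.
// Illustrative code only.
std::vector<double> forward_subst(const std::vector<std::vector<double>>& L,
                                  const std::vector<double>& b) {
    const std::size_t n = b.size();
    std::vector<double> y(n);
    for (std::size_t i = 0; i < n; ++i) {    // housekeeping: increment + compare
        double s = b[i];                     // load from memory
        for (std::size_t j = 0; j < i; ++j)  // housekeeping: increment + compare
            s -= L[i][j] * y[j];             // FP multiply + FP subtract
        y[i] = s / L[i][i];                  // FP divide (the expensive one) + store
    }
    return y;
}
```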
To use last night's example of housekeeping processing: the loop-counter increment is an integer increment (usually of a register, not of a value in memory) and takes one cycle, regardless of how many bits a particular increment affects. Any CPU that took more than one cycle to do this would never get off the drawing board. Saving results to an output matrix or vector is likely to cost more than incrementing the loop counter, but even that (given an efficient data structure and code) will be dominated by the floating-point math.
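As a rough illustration of the gap, here is a crude microbenchmark (not a proper profile; timings depend heavily on compiler, flags, and CPU, and the volatile accesses themselves add overhead):

```cpp
#include <chrono>
#include <cstdio>

int main() {
    const long N = 100000000;  // 1e8 iterations

    // Integer increments: a stand-in for loop-counter housekeeping.
    // volatile forces the compiler to keep the loop body.
    volatile long counter = 0;
    auto t0 = std::chrono::steady_clock::now();
    for (long i = 0; i < N; ++i) counter = counter + 1;
    auto t1 = std::chrono::steady_clock::now();

    // FP divisions: typically the slowest basic FP operation.
    volatile double x = 1.0;
    auto t2 = std::chrono::steady_clock::now();
    for (long i = 0; i < N; ++i) x = x / 1.0000001;
    auto t3 = std::chrono::steady_clock::now();

    using ms = std::chrono::milliseconds;
    std::printf("integer increments: %lld ms\n",
                (long long)std::chrono::duration_cast<ms>(t1 - t0).count());
    std::printf("FP divisions:       %lld ms\n",
                (long long)std::chrono::duration_cast<ms>(t3 - t2).count());
    return 0;
}
```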
My (performance purist's) view is that this type of operation counting gives, at best, a broad estimate of how the code will perform in live operation. The only way to measure performance properly is to profile the code under production load (or a simulation thereof) over time. The main utility of operation counting is to verify that your code performs no more math operations than the underlying mathematical algorithm implies.
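To illustrate that last use: instrument the sketch above with a counter and check it against the closed-form count for an n x n lower-triangular system, which is n divisions plus n(n-1)/2 multiply-subtract pairs, i.e. n^2 floating-point operations in total. Again, illustrative code only:

```cpp
#include <cassert>
#include <cstdio>
#include <vector>

// Forward substitution with a flop counter, to check the code performs
// exactly the n*n floating-point operations the algorithm implies.
std::vector<double> forward_subst_counted(const std::vector<std::vector<double>>& L,
                                          const std::vector<double>& b,
                                          long& flops) {
    const std::size_t n = b.size();
    std::vector<double> y(n);
    flops = 0;
    for (std::size_t i = 0; i < n; ++i) {
        double s = b[i];
        for (std::size_t j = 0; j < i; ++j) {
            s -= L[i][j] * y[j];  // 1 multiply + 1 subtract
            flops += 2;
        }
        y[i] = s / L[i][i];       // 1 divide
        flops += 1;
    }
    return y;
}

int main() {
    const std::size_t n = 100;
    // Simple well-conditioned lower-triangular test system.
    std::vector<std::vector<double>> L(n, std::vector<double>(n, 0.0));
    std::vector<double> b(n, 1.0);
    for (std::size_t i = 0; i < n; ++i) {
        L[i][i] = 2.0;
        for (std::size_t j = 0; j < i; ++j) L[i][j] = 0.5;
    }
    long flops = 0;
    forward_subst_counted(L, b, flops);
    assert(flops == static_cast<long>(n * n));  // n divides + n(n-1) mult/sub
    std::printf("flops = %ld (expected %zu)\n", flops, n * n);
    return 0;
}
```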