Nat!

retain/release one more time + math

This adds some easy math as background to the preceding log entry. It doesn't prove anything about the worthwhileness of the "new" version.

Looking at the original "old" code, what is underlaid with the bluish tint is the actual "algorithmic" part. Let's call this part M for multithreaded, and let's name the time spent executing this part of the method tM.

- (void) release
{
   if( MulleAtomicDecrement( &refCountMinusOne_) == -1)  // M
      [self dealloc];
}

Let's call the code added in the "new" version, the code with the red underlay, S for singlethreaded and its time tS. We will ignore the time spent in calling dealloc, since that is outside the scope of current interest.

- (void) release
{
   if( ! refCountMinusOne_)  // S
   {
      [self dealloc];
      return;
   }
   if( MulleAtomicDecrement( &refCountMinusOne_) == -1)  // M
      [self dealloc];
}

Simple Functions

Assume an object is allocated and then released, and thereby deallocated, with no intervening retains and releases, like [[Foo new] release];

This shall constitute the lifetime of the object, and its lifetime is measured in releases (or n). Therefore the lifetime of this object is 1. In this case the "old" code spends tM time and the "new" code tS time.

If on the other hand the object had been retained and released once in between (p = [[Foo new] retain]; [p release]; [p release];), there are now two releases. The lifetime for this instance is therefore 2, and the "old" code spends tM + tM, whereas the "new" code spends tS + tM + tS. Continuing this sequence, we get the following table for a few retain/release cycles:

# releases (n)   old                       new
1                tM                        tS
2                tM + tM                   tS + tS + tM
3                tM + tM + tM              tS + tS + tM + tS + tM
4                tM + tM + tM + tM         tS + tS + tM + tS + tM + tS + tM

It should be easy to see that this can be generalized to (shortening the number of releases to n):

old( n) = tM * n
new( n) = tS * n + tM * (n - 1)

With a little further math, we can put tS and tM as well as n into relation.

old( n) = new( n)
 tM * n = tS * n + tM * (n - 1)
 tM * n = tS * n + tM * n - tM
      0 = tS * n - tM
     tM = tS * n
     tS = tM / n
      n = tM / tS

But what does that mean? It gives the relationship where using the "new" code costs exactly the same as using the "old" code for a certain number of releases. So for example, if we set tM to 1 and want the "new" code to be as fast as the "old" one up to a release count of 3, we need tS to be 1/3 tM.

Probabilities

To judge the improvement of the "new" code, we need a model of the probability that an object will be released n times over its lifetime. I don't have any hard data yet, so for illustration purposes I am assuming this Gaussian-ish kind of distribution, heavily favoring the lower release counts and ignoring any objects with a lifetime larger than 6 (more than 6 releases simply don't happen in this simplified scenario). The table lists the likelihood of each possible lifetime. The percentages add up to 100. The factor simply scales the likelihood or probability to a number between 0 and 1. All factors add up to 1.

# releases (n)   percentage   factor
1                20%          0.2
2                25%          0.25
3                20%          0.2
4                20%          0.2
5                10%          0.1
6                 5%          0.05

Then the average time spent for "old" will work out to,

sum_old( 1, n) := factor( n) * old( n)

where sum_old( 1, n) means the summation of the right-hand side of := over n from 1 to the maximum lifetime (here 6). factor( n) is the value taken from the table and old( n) is the formula from above. This is simply a weighted average.

Assuming 1 for tM, this works out to

factor( n) * old( n) = factor( n) * tM * n
                     = factor( n) * 1 * n
                     = factor( n) * n
                     = 0.2 * 1 + 0.25 * 2 + 0.2 * 3 + 0.2 * 4 + 0.1 * 5 + 0.05 * 6
                     = 0.2 + 0.5 + 0.6 + 0.8 + 0.5 + 0.3
                     = 2.9

The average time spent on "new" is likewise

sum_new( 1, n) := factor( n) * new( n)

The same for tS, still assuming it is 1/3 tM...

factor( n) * new( n) = factor( n) * (tS * n + tM * (n - 1))
                     = factor( n) * (1/3 tM * n + tM * (n - 1))
                     = factor( n) * tM * (1/3 * n + (n - 1))
                     = factor( n) * tM * (4/3 * n - 1)
                     = factor( n) * (4/3 * n - 1)
                     = 0.2 * 0.33 + 0.25 * 1.67 + 0.2 * 3 + 0.2 * 4.33 + 0.1 * 5.67 + 0.05 * 7
                     = 0.067 + 0.417 + 0.6 + 0.867 + 0.567 + 0.35
                     = 2.87

So in this scenario, with all these assumptions, "old" would be (2.9 - 2.87) * 100 / 2.87 ≈ 1.2% slower. And that's what I meant when I wrote that it could be a statistical win.

What to take home

The "new" code could be beneficial not just for objects with no intervening retains, but also for objects with very few intervening retains. How much depends on how much quicker tS executes than tM.

Stuff to do

  • find out the release statistics for a common Cocoa application
  • measure tM and tS for MulleAtomic

Stuff to remember

  • the possible change of tM by the preceding execution of tS is, for simplicity, reflected only in tS