Unsubscribed from cocoa-dev
Come to think of it, the only mailing lists I regularly read now are the GnuSTEP mailing lists, for their entertainment value (I don't use GnuSTEP :))
Desktop Tower Defense is totally addictive. And even if you complete it, you still wonder if there isn't a better way to play it... Don't try this game if you have work to do.
Supreme Commander needs some new hardware. Checking the Gas Powered Games forum and various benchmarks, I figure I need a machine with at least 2 GB RAM, an E6600 CPU and an 8800GTS video card. I kind of wonder whether such a machine is actually more powerful than a PS3, comparable, or maybe even weaker? I have no idea.
Ok enough aimless rambling...
For an extremely limited time we can offer you the one and only Mulle kybernetiK Guitar Pick.
Its thickness (gauge) is 1.25 (extra heavy). They are all yellow, and the print is on one side of the pick. It's perfectly shaped for fast alternate picking, and pinch harmonics don't take any effort with it, we guarantee you. It's a special kind of pick though: you can only play METAL with it.
So if you want one pick, it's gonna cost 2 Euros including shipping. Or take the extra special offer: three picks for 4 Euros. That's quite a competitive price compared to the competition. Send an email if you are interested.
With the 8800GTS I can max out all the settings (except resolution, my monitor is too weak), and wow does it look good!
The immediate concern is that maybe my 17" display is too small. SPC units are fairly tiny at a comfortable zoom level, and a resolution of 2560xwhatever on a 30" display suddenly appears to make sense for the first time...
CPU-power-wise, an E6600 CPU may not be enough! Today I played a sandbox game with 500 units, and reaction to the commands was delayed by several seconds. The graphics animations ran silky smooth at the same time, but my commander followed my commands with perceptible delays. I did crowd my commander with about 50 to 100 T3 construction bots, so the pathfinding wasn't simple for sure.
A look at the task manager revealed that CPU-1 was going full tilt, while CPU-2 wasn't doing much at all, maybe 20% load.
From what I have read, the first CPU does all the AI and world calculations, whereas CPU-2 is dedicated to driving the graphics. If that is true, and it certainly looks that way, then a quad-core CPU will not help much, because the main logic appears to run in a single thread.
Well, more on this later... I'll check how many threads are really running.
It is just no fun having to delete SPAM on a daily basis.
On the other hand, since the code has been gathering dust for about five years, I don't feel like I am throwing out the crown jewels with the trash when I put it on my blog. In all likelihood, I would not be doing anything with this code otherwise.
Originally I wanted to use the code in MulleEOF eventually. My basic assumption back then was that I would optimize MulleEOF so far that I would be limited by memory speed and by malloc speed, the final frontier. That turned out not to be true. MulleEOF is fairly code intensive, and malloc speed isn't really the limiting factor, even if you were to go all out and reduce all the Objective-C overhead to C levels... or at least, that is my current way of thinking.
Well, that's not exactly true either: malloc speed can be a factor, but I used the "bunch" object allocation approach for a few choice classes, and that worked well enough. So MulleEOF doesn't currently bottleneck in memory allocation.
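The "bunch" approach isn't shown here. Here is a rough sketch, assuming it means getting storage for many same-sized objects with one malloc and then handing them out one at a time. The names `struct bunch_allocator`, `bunch_alloc`, `bunch_destroy` and `BUNCH_COUNT` are all hypothetical:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical sketch: carve fixed-size objects out of bigger blocks,
 * so malloc is called once per BUNCH_COUNT objects instead of once per
 * object. Single objects are never freed; bunch_destroy frees it all.
 * object_size should be a multiple of the objects' alignment. */
#define BUNCH_COUNT  64

struct bunch
{
   struct bunch   *next;      /* previously filled bunches */
   size_t         used;       /* objects handed out from this bunch */
   char           *storage;   /* BUNCH_COUNT * object_size bytes */
};

struct bunch_allocator
{
   size_t         object_size;
   struct bunch   *current;
};

void   *bunch_alloc( struct bunch_allocator *a)
{
   struct bunch   *b;

   assert( a->object_size > 0);

   b = a->current;
   if( ! b || b->used == BUNCH_COUNT)
   {
      /* current bunch is full (or there is none): get a fresh one */
      b = malloc( sizeof( struct bunch));
      if( ! b)
         return( NULL);
      b->storage = malloc( BUNCH_COUNT * a->object_size);
      if( ! b->storage)
      {
         free( b);
         return( NULL);
      }
      b->used    = 0;
      b->next    = a->current;
      a->current = b;
   }
   return( &b->storage[ b->used++ * a->object_size]);
}

void   bunch_destroy( struct bunch_allocator *a)
{
   struct bunch   *b;
   struct bunch   *next;

   for( b = a->current; b; b = next)
   {
      next = b->next;
      free( b->storage);
      free( b);
   }
   a->current = NULL;
}
```

The gain is one malloc per 64 objects. The cost is that individual objects can't be given back, so this only suits objects that tend to be released together.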
A preview of the next installments:
I want to start off with a little historic background on why I wrote it. Then I want to talk a little about the real-world ramifications that a malloc library faces and must deal with, about ways to test this, and about how to hook into OS X. Then I will likely write about how to approach such a task, with some general blather about optimizing again. Finally I am just gonna show the code and discuss the pros and cons of the approach.
At the beginning of a project like the Mullocator, I will try to figure out what the optimum result could be and what the current state of affairs is. Yes, that's what the old folks call "back of the envelope calculations" and yes I do it too, because I am old... :)
The goal for the Mullocator was to write a malloc implementation that could allocate memory as fast as possible. Ignoring the complicated task of managing freed memory, the simplest and fastest malloc routine would appear to be something like this:
/* No, this is not the Mullocator :) */

/* some .c file */
char   memory[ 0x1000000];   /* 4K pages of 4K size, 16 MB */
char   *buf = memory;

/* malloc.h, replaces the C library malloc, so don't mix with <stdlib.h> */
#include <stddef.h>          /* size_t */

static inline void   *malloc( size_t size)
{
   extern char   memory[];
   extern char   *buf;
   void          *p;

   p   = buf;
   buf = &buf[ size];   /* one memory write, inlined code, hard to beat ... */
   return( p);
}
Although this optimal malloc is so small, there are already a few problems with the code, one of them being: this may not be the fastest way to allocate memory!
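One of these problems can be made concrete. Because `buf` just advances by the requested byte count, an odd-sized request leaves the next pointer misaligned. Depending on the CPU, that makes later accesses slower or even faulting. The small sketch below uses the same allocator, renamed to a hypothetical `bump_alloc` so it doesn't clash with the C library's `malloc`, with a smaller arena. The helper names are made up for this example:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The first allocator, renamed, with a 64 KB arena. No bounds checking,
 * as before. The arena is declared as longs so it starts long-aligned. */
static long   arena_storage[ 0x10000 / sizeof( long)];
static char   *bump = (char *) arena_storage;

static inline void   *bump_alloc( size_t size)
{
   void   *p;

   p    = bump;
   bump = &bump[ size];
   return( p);
}

/* nonzero if p is suitably aligned for a long */
static int   is_long_aligned( void *p)
{
   return( ((uintptr_t) p % sizeof( long)) == 0);
}

/* start over, then request 3 bytes followed by 8 bytes */
static void   show_misalignment( void)
{
   char   *a;
   char   *b;

   bump = (char *) arena_storage;
   a    = bump_alloc( 3);
   b    = bump_alloc( 8);
   assert( is_long_aligned( a));
   assert( ! is_long_aligned( b));   /* b sits 3 bytes into the arena */
}
```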
/* some .c file */
long   memory[ 0x1000000 / sizeof( long)];
long   *sentinel = &memory[ 0x1000000 / sizeof( long)];
long   *buf = memory;

/* malloc.h */
#include <stddef.h>          /* size_t, NULL */

static inline void   *malloc( size_t size)
{
   extern long   memory[];
   extern long   *sentinel;
   extern long   *buf;
   void          *p;
   long          *q;

   /* round up to whole longs; assume the compiler does reduce that to shifting, it will... */
   q = &buf[ (size + sizeof( long) - 1) / sizeof( long)];
   if( q > sentinel)    /* out of memory */
      return( NULL);

   p   = buf;
   buf = q;
   return( p);
}
Note how the code itself got a little slower: one add, one shift, and a comparison with a not-taken branch.
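The rounding in the second version is the usual round-up-to-a-multiple idiom. Because `sizeof( long)` is a power of two on common platforms, the division can become a shift, and the same byte count can be computed with a mask. The quick sketch below checks that the two forms agree. The function names are made up for illustration, and the code assumes `sizeof( long)` is a power of two:

```c
#include <assert.h>
#include <stddef.h>

/* number of longs needed to hold size bytes, as computed above */
static size_t   longs_for( size_t size)
{
   return( (size + sizeof( long) - 1) / sizeof( long));
}

/* the same amount, as a byte count rounded up with a mask;
 * only valid because sizeof( long) is a power of two */
static size_t   round_up_bytes( size_t size)
{
   assert( (sizeof( long) & (sizeof( long) - 1)) == 0);  /* one bit set */
   return( (size + sizeof( long) - 1) & ~(sizeof( long) - 1));
}
```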
Fixes for #3 and #4 next time. Any questions? Use the comments.
/* malloc.c */
#include <pthread.h>
#include <stdlib.h>          /* abort */

pthread_mutex_t   lock;
long              memory[ 0x1000000 / sizeof( long)];
long              *sentinel = &memory[ 0x1000000 / sizeof( long)];
long              *buf = memory;

/* assume that this function is called before any call to malloc,
 * sorta like +load, just on a C-level.
 * Otherwise we'd need to pull the lock initializer into malloc itself
 */
void   initialize_malloc( void)
{
   if( pthread_mutex_init( &lock, NULL))
      abort();
}

/* malloc.h */
#include <pthread.h>
#include <stddef.h>          /* size_t, NULL */

static inline void   *malloc( size_t size)
{
   extern pthread_mutex_t   lock;
   extern long              memory[];
   extern long              *sentinel;
   extern long              *buf;
   void                     *p;
   long                     *q;

   if( pthread_mutex_lock( &lock))   /* can fail, but what to do ? */
      return( NULL);

   p = NULL;
   q = &buf[ (size + sizeof( long) - 1) / sizeof( long)];
   if( q <= sentinel)
   {
      p   = buf;
      buf = q;
   }
   pthread_mutex_unlock( &lock);     /* fail next malloc.. */
   return( p);
}
When you run both functions against each other, the result is:

2007-03-27 21:07:37.651 PunishmentByLocks[2320] single-thread compatible
2007-03-27 21:07:38.341 PunishmentByLocks[2320] multi-thread compatible
2007-03-27 21:07:41.932 PunishmentByLocks[2320] done

So that's .349+.341=0.690 seconds for no locks and .659+2.0+.932=3.591 seconds with locks.

The "optimal" malloc just became more than five times slower. In other words, my 2.5 GHz G5 seems to be running at about 480 MHz...
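The PunishmentByLocks test program itself isn't shown. The sketch below is one rough way to repeat the comparison on a current machine. Its structure, names and counts are all assumptions. Unlike the inline versions above, both allocators are called through a function pointer, so call overhead is measured too, and the numbers will of course differ from the 2007 G5 figures:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical harness, not the original PunishmentByLocks program */
#define N_ROUNDS   200
#define N_ALLOCS   10000          /* requests of two longs per round */

static long              arena[ N_ALLOCS * 2];
static long              *buf;
static pthread_mutex_t   lock = PTHREAD_MUTEX_INITIALIZER;

/* bump allocation without locking; the caller guarantees the bounds */
static void   *plain_alloc( size_t size)
{
   void   *p;

   p   = buf;
   buf = &buf[ (size + sizeof( long) - 1) / sizeof( long)];
   return( p);
}

/* the same, wrapped in a pthread mutex */
static void   *locked_alloc( size_t size)
{
   void   *p;

   pthread_mutex_lock( &lock);
   p = plain_alloc( size);
   pthread_mutex_unlock( &lock);
   return( p);
}

static double   seconds_now( void)
{
   struct timespec   ts;

   clock_gettime( CLOCK_MONOTONIC, &ts);
   return( ts.tv_sec + ts.tv_nsec / 1e9);
}

/* N_ROUNDS passes of N_ALLOCS allocations, resetting the arena each
 * pass; the volatile sink keeps the calls from being optimized out */
static double   time_allocs( void *(*alloc)( size_t))
{
   static void * volatile   sink;
   double                   start;
   int                      r, i;

   start = seconds_now();
   for( r = 0; r < N_ROUNDS; r++)
   {
      buf = arena;
      for( i = 0; i < N_ALLOCS; i++)
         sink = alloc( 2 * sizeof( long));
      assert( buf == &arena[ N_ALLOCS * 2]);   /* pass filled the arena */
   }
   return( seconds_now() - start);
}
```

Calling `time_allocs( plain_alloc)` and `time_allocs( locked_alloc)` gives the two timings. The locked version pays for a lock and an unlock on every call even without any contention. How big the ratio turns out depends on the machine and the pthread implementation.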
This is terrible. Unfortunately, since people have come to expect that they can malloc a memory area in one thread and free it in another, there seems to be no good way around it, except to optimize the locking operation itself by inlining the locking code.
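The lock is there because, without it, two threads can both read the same `buf` before either stores the new value. They then hand out the same block, and one of the updates is lost. The sketch below exercises the locked version from several threads. Several choices are made for this example only and are not taken from the code above: the allocator is renamed to `locked_alloc`, the arena is sized to be used up exactly, the lock is statically initialized with `PTHREAD_MUTEX_INITIALIZER` instead of `initialize_malloc()`, and the bounds check counts elements instead of forming a pointer past the end:

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

#define N_THREADS     4
#define N_ALLOCS      10000
#define ARENA_LONGS   (N_THREADS * N_ALLOCS * 2)

static pthread_mutex_t   lock = PTHREAD_MUTEX_INITIALIZER;
static long              arena[ ARENA_LONGS];
static long              *sentinel = &arena[ ARENA_LONGS];
static long              *buf = arena;

static void   *locked_alloc( size_t size)
{
   size_t   n;
   void     *p;

   n = (size + sizeof( long) - 1) / sizeof( long);
   if( pthread_mutex_lock( &lock))
      return( NULL);
   p = NULL;
   if( n <= (size_t) (sentinel - buf))
   {
      p   = buf;
      buf = &buf[ n];
   }
   assert( buf <= sentinel);
   pthread_mutex_unlock( &lock);
   return( p);
}

/* each thread grabs N_ALLOCS blocks of two longs and tags them */
static void   *worker( void *arg)
{
   long   tag;
   long   *p;
   int    i;

   tag = (long) (intptr_t) arg;
   for( i = 0; i < N_ALLOCS; i++)
   {
      p = locked_alloc( 2 * sizeof( long));
      if( ! p)
         return( (void *) 1);   /* arena exhausted */
      p[ 0] = tag;
      p[ 1] = tag;
   }
   return( NULL);
}

/* 1 if all threads succeeded, every long of the arena was written and
 * the arena was used up exactly; a lost update would leave buf short */
static int   run_threads( void)
{
   pthread_t   threads[ N_THREADS];
   void        *result;
   int         ok;
   long        i;

   for( i = 0; i < N_THREADS; i++)
      if( pthread_create( &threads[ i], NULL, worker, (void *) (intptr_t) (i + 1)))
         return( 0);   /* sketch: threads already started are not joined */

   ok = 1;
   for( i = 0; i < N_THREADS; i++)
      if( pthread_join( threads[ i], &result) || result)
         ok = 0;

   for( i = 0; i < ARENA_LONGS; i++)
      if( arena[ i] == 0)
         ok = 0;

   return( ok && buf == sentinel);
}
```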
This leads me to the following rant (yeah! :)):
Generally it was, and is, a nightmare to have two processes running and communicating. The bearded UNIX guys gave us pipes and communication over file descriptors; some UNIXes had shared memory and semaphores. Interprocess communication is a royal pain in the ass. Ever sent a serialized object over a socket? Was RPC (Ugh!) easy to program?
Threads came into being because people wanted something more lightweight. Threads are more lightweight than processes, since there is no memory protection between threads. Therefore context switching becomes cheaper, because the virtual memory page tables need not be exchanged in the MMU/CPU. Since threads are fairly easy to implement even in user space, they just happened, at first without operating system support.
Why did people want something more lightweight? I suspect it was chiefly because of windowed GUIs, where threads are sort of a necessity for handling multiple windows.
Of course threads now crash other threads willy-nilly and something else has been lost too (see above)...
What should have been done is this: create very easy-to-use process interfaces, provided by the operating system. User code should run without locks; basically the user programmer should not even need to know the concept of a lock. When accessing shared-process objects (NSWindow would be a good candidate), the locking should be done for him automatically. Something like Distributed Objects, only simpler: restricted to localhost, possibly just within the same process "family".
Ok enough of the rant. Threads are a reality and must be dealt with. Even if we substitute the pthread code with inlined spinlocks, this is a toll we have to pay.
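Here is one possible shape for "inlined spinlocks" using today's tools. It relies on C11 `<stdatomic.h>`, which didn't exist when this was written. It swaps in a different technique for illustration and is not the lock code used here. The pthread mutex is replaced by an `atomic_flag` that the allocator spins on itself. The names `spin`, `spin_alloc` and `ARENA_LONGS` are hypothetical:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define ARENA_LONGS  (0x1000000 / sizeof( long))

static atomic_flag   spin = ATOMIC_FLAG_INIT;
static long          arena[ ARENA_LONGS];
static long          *buf = arena;

static inline void   *spin_alloc( size_t size)
{
   size_t   n;
   void     *p;

   n = (size + sizeof( long) - 1) / sizeof( long);

   /* busy-wait until we own the flag; acquire pairs with the release below */
   while( atomic_flag_test_and_set_explicit( &spin, memory_order_acquire))
      ;

   p = NULL;
   if( n <= (size_t) (&arena[ ARENA_LONGS] - buf))
   {
      p   = buf;
      buf = &buf[ n];
   }
   assert( buf <= &arena[ ARENA_LONGS]);
   atomic_flag_clear_explicit( &spin, memory_order_release);
   return( p);
}
```

When uncontended, this avoids calling into the pthread library. A waiting thread burns CPU while another holds the flag, which is tolerable only because the critical section is a handful of instructions. It is still a toll, just a smaller one.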
This page contains all entries posted to Nat!'s Web Journal in March 2007. They are listed from oldest to newest.