Nat!

Senior Mull

char, did I ever know you ?

What the C Standard defines

Here are some quotes from the C standard (WG14/N1256 Committee Draft — September 7, 2007 ISO/IEC 9899:TC3):

An object declared as type char is large enough to store any
member of the basic execution character set. 
If a member of the basic execution character set is stored 
in a char object, its value is guaranteed to be nonnegative. 

Interestingly, this definition of char doesn’t say anything about bits or memory sizes. It’s tied to the notion of characters. Now what is the basic execution character set ?

Both the basic source and basic execution character sets 
shall have the following members: 
the **26** uppercase letters of the Latin alphabet
ABCDEFGHIJKLMNOPQRSTUVWXYZ
the **26** lowercase letters of the Latin alphabet 
abcdefghijklmnopqrstuvwxyz
the **10** decimal digits 
0123456789
the following **29** graphic characters
!"#%&'()*+,-./:;<=>?[\]^_{|}~

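As quoted here, the list contains no SPACE (or LF). That is not a bug in the draft, though. The same paragraph of the standard continues with the space character and control characters for horizontal tab, vertical tab and form feed, and the basic execution character set additionally has control characters for alert, backspace, carriage return and new line, plus the null character.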
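To make the nonnegativity guarantee from the first quote concrete, here is a minimal sketch that stores each of the 91 listed members in a char and counts how many come out negative. The names basic_members, member_count and count_negative_members are made up for this example and do not come from the standard.

```c
#include <stddef.h>

/* The 91 members quoted above, as one string literal.
   Only the double quote and the backslash need escaping. */
static const char basic_members[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789"
    "!\"#%&'()*+,-./:;<=>?[\\]^_{|}~";

/* Number of listed members, not counting the terminating null. */
size_t member_count(void)
{
    return sizeof basic_members - 1;
}

/* Counts members whose value is negative when stored in a plain char.
   The standard guarantees this is 0, even where plain char is signed. */
size_t count_negative_members(void)
{
    size_t i, negatives = 0;

    for (i = 0; i < member_count(); i++)
        if (basic_members[i] < 0)
            negatives++;
    return negatives;
}
```

On a conforming implementation count_negative_members() should return 0 whether plain char is signed or unsigned. Where char is unsigned, a compiler may warn that the comparison is always false, which is the guarantee showing up at compile time.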

So the basic execution character set contains at least the 91 characters listed above and therefore needs at least 7 bits of room mathematically. Then, if char is a signed type, char needs to be 8 bits or larger. See http://stackoverflow.com/questions/2098149/what-platforms-have-something-other-than-8-bit-char for platforms where char is not 8 bits wide.

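One might object that the charset could be encoded with negative numbers too, so that 7 bits would be enough even for a signed char. The first quote rules that out: a member of the basic execution character set is guaranteed to be nonnegative when stored in a char, so a signed char needs a sign bit on top of the 7 value bits. Independently of that, the standard requires CHAR_BIT in limits.h to be at least 8.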
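The 7-bit figure is just a base-2 logarithm rounded up. A small sketch of that arithmetic follows; bits_needed is a hypothetical helper name, not anything the standard defines. The width char really has on a given implementation is available as CHAR_BIT from limits.h, which the standard requires to be at least 8.

```c
#include <limits.h>

/* Smallest number of bits that can distinguish `count` different values.
   The width check comes first so the shift never reaches the full width
   of unsigned long. */
unsigned bits_needed(unsigned long count)
{
    unsigned bits = 0;

    while (bits < sizeof count * CHAR_BIT && (1UL << bits) < count)
        bits++;
    return bits;
}
```

With this, bits_needed(91) is 7, because 64 values are too few and 128 are enough.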

pre C99 : sizeof makes char the smallest value type

It is interesting to note that sizeof always gives the size of char as 1, even if char is actually 24 bits wide. I got this from http://stackoverflow.com/questions/2215445/are-there-machines-where-sizeofchar-1. This means that there can be only one type smaller than char: void.

The smallest granularity of memory access is therefore also the size of a char. It’s an interesting consequence of sizeof always returning a multiple of the size of char.
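One way to see that granularity, as a rough sketch: any object can be read one unsigned char at a time, and it takes exactly sizeof of those reads to cover it. The function name roundtrip_through_chars is invented for this illustration.

```c
#include <stddef.h>
#include <string.h>

/* Reads an int one char-sized unit at a time, then rebuilds it
   from those units. sizeof value is the number of units. */
int roundtrip_through_chars(int value)
{
    unsigned char units[sizeof value];
    const unsigned char *src = (const unsigned char *)&value;
    int copy;
    size_t i;

    for (i = 0; i < sizeof value; i++)
        units[i] = src[i];              /* one char-sized access each */

    memcpy(&copy, units, sizeof copy);  /* reassemble the object */
    return copy;
}
```

The value that comes back is the value that went in, whatever the width of char happens to be.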

At first I found it bizarre how the notion of a character set is eventually tied to memory size by the language. But it’s actually an implicit relationship that is never really explained.

post C99 : sizeof now returns the size in bytes

The sizeof operator yields the size (in bytes) of its operand,
which may be an expression or the parenthesized name of a type.
The size is determined from the type of the operand. The result
is an integer.

This changes things a bit. There is now a possibility that an integer type exists that is smaller than char. This definition does introduce the notion that memory is organized in bytes, though. It would make more sense to specify it in bits, but that would break everything.

If sizeof ever returned an integer in terms of char, I would have preferred sizeof staying the way it was and the introduction of a separate bitsizeof.
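There is no bitsizeof in C, but something close can be sketched as a macro on top of sizeof and CHAR_BIT. BITSIZEOF is a hypothetical name, and note that it reports storage bits including any padding bits, which is not necessarily the width of the type.

```c
#include <limits.h>

/* Hypothetical bitsizeof: storage size of a type or expression in bits.
   Padding bits, if the type has any, are counted as well. */
#define BITSIZEOF(x) (sizeof(x) * CHAR_BIT)
```

BITSIZEOF(char) is CHAR_BIT by construction, and BITSIZEOF(int) would typically be 32 on current desktop platforms.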

So there you go in C99. Memory is organized in bytes. _Bool may be smaller than char, but it only stores 0 and 1. There is still no defined value type smaller than char, which keeps memory addressable in char quantities, though uint8_t could possibly magically work with compiler-specific types.

Thus, int8_t denotes a signed integer type with a width of exactly 8 bits.

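Therefore sizeof( uint8_t) is guaranteed to be 1 wherever the type exists. The exact-width types are optional, so an implementation whose char is wider than 8 bits simply does not provide uint8_t.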
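A hedged sketch of how code can check this: stdint.h defines the UINT8_MAX macro only when uint8_t is provided, so it can serve as a feature test. The function name uint8_size is made up for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns sizeof(uint8_t) where the type is provided, 0 otherwise.
   UINT8_MAX is defined by stdint.h only together with uint8_t. */
size_t uint8_size(void)
{
#ifdef UINT8_MAX
    return sizeof(uint8_t);
#else
    return 0;
#endif
}
```

On an implementation with 8-bit chars this returns 1. On one with wider chars the type is absent and the function returns 0.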

