Early UNIX file system formats
Here is a summary of the on-disk file system formats
used in early versions of UNIX.
Formats are presented in chronological order,
so far as I can determine that.
The focus is on how the bits were laid out on the disk,
rather than on the system calls used to access them,
though some notes on the latter are inevitable.
Some principles applying
to all the formats described here
(and to many of their successors):
-
A file system is constructed atop
a contiguous array of storage blocks
of uniform size;
64 words on the PDP-7,
512 bytes on the PDP-11.
Blocks are numbered starting at 0.
-
A data structure called an
i-node
describes the storage used by a file
(i.e. a list of block addresses),
and lists other attributes
such as permissions and ownership.
I-nodes are usually stored in a contiguous array
called the
i-list,
and named by their indexes within the array
(the i-number).
I-nodes are usually numbered starting at 1.
-
A file's size is measured in bytes or words;
the size may not be an even number of blocks.
Only whole blocks are allocated, however;
hence the last block of a file
may have wasted space at the end.
-
A file may be sparsely allocated:
there may be blocks represented by empty (zero)
block addresses.
Such blocks are assumed by the operating system to contain zeroes.
-
Certain i-nodes are somehow marked as
special files
that access devices rather than files in the file system.
Such files have no storage blocks,
and
the part of the i-node normally used for block addresses
may contain other data.
-
The first block or two of the file system
contains basic data
whence the rest of the file system structure
may be located;
this starting data is usually called the
super-block.
-
Directories are files
with specially-marked i-nodes.
A directory contains an array of
directory entries,
which in turn contain
a filename
and the i-number for the corresponding file.
One directory,
called the
root directory,
is the starting point
for naming any file
(the root of the directory tree,
when it is a tree);
a particular i-number is reserved for the root.
-
File system data structures
contain integers of various sizes;
these are stored in the format
native to the running system.
In particular,
file systems used on the PDP-11
use PDP-11 short and long integers.
PDP-7 source code printout,
unknown date
The information in this section
was puzzled out of an old,
nearly uncommented source code listing.
The date of the printout is unknown;
it is new enough to have
fork.
The PDP-7 is a word-addressed machine,
so all data structures are composed of words
rather than bytes.
A word is 18 bits.
Character strings conventionally reside in successive 9-bit half-words
(I don't know in which order).
Disk blocks are 64 words.
Block 0 contains basic information,
called
sysdata
within operating system;
it is not known whether the term
`super-block'
was used yet.
The i-list begins in block 2.
The
sysdata
area contains:
-
The address of the next block
containing a chunk of the free list
(one word).
-
A counter (one word) and an array (ten words)
of listing up to ten blocks known to be free.
-
The file system's current unique number
(one word).
-
Current system time
in sixtieths of a second
since an unrecorded base date
(two words;
most significant word first).
The disk copy of
sysdata
is updated regularly,
perhaps at every process switch.
The array of free blocks works as follows:
-
To allocate a block,
first
decrement the counter;
if the result is greater than zero,
return
array
[
counter]
.
(The array starts at zero.)
Otherwise,
read the block at the next-chunk address;
copy the first word of its contents
to the next-chunk address,
store the number of the block just read
in array element 0,
copy the next nine words of its contents
to the remaining words of the array,
set the counter to ten,
and start over.
If the next-chunk address was zero,
there are no more blocks.
-
To free a block:
if the counter is less than ten,
set the element indexed by the counter
to the newly-freed block address,
increment the counter,
and return.
Otherwise
write the current next-block address
followed by elements 1-9 of the array
into the block just freed;
store the address of the block just written
as the next block;
and set the counter to one.
Notice that this algorithm has a bug:
once a block becomes part of the chain
containing chunks of the free list,
it cannot be allocated to a file
even after that part of the list has been used up.
Perhaps the intent was for the allocator
to return the address of the block whence the chunk was just read
instead of starting over.
I-nodes are 12 words long:
-
Flags:
allocated,
large file,
special file,
directory,
read and write permissions
for owner and others.
-
Seven block addresses (each a single word).
-
Owner's user ID.
-
Link count.
-
File size,
in words.
-
Unique number for this i-node.
The first i-node is numbered 0,
but is never used.
I-node 3
is the root directory.
Other i-nodes numbered 1-16
are reserved for special files;
the i-number selects a device driver.
The remaining i-numbers have no special meaning.
There are two schemes for locating data blocks,
depending on the `large file' flag:
-
If the flag is clear,
this is a small file,
containing no more than seven blocks.
The block addresses name the data blocks directly,
i.e. address n
names file block n.
-
If the flag is set,
this is a large file;
each address in the i-node
names an indirect block
containing 64 block addresses,
which in turn name the data blocks.
Hence the address for file block n
is found in entry n
mod
64
in the indirect block named by address n/64 in the i-node array.
A directory entry is eight words long,
but only six words are defined:
an i-number,
four words
(eight characters)
of filename,
and the unique number for that i-node.
The filename is filled to the right
with space characters
(octal 040).
When an i-node is allocated,
the unique number in
sysdata
is incremented
and copied into the new i-node.
Hence when an i-node is re-used for a different file,
it is likely to have a different unique number.
When a directory entry is made,
the unique number is copied from the i-node.
The system thus could check
for directory entries pointing to the `wrong' i-node.
(This version didn't.)
Limits and notes:
-
The block-address scheme can handle files
of at most 7 * 64 = 448 blocks:
28672 words,
56 kilobytes
(half-word bytes).
-
The largest possible block address is
262144,
or roughly eight megabytes.
This was much bigger than the disk.
-
Notice that the i-node stores a user ID
but no group ID;
UNIX didn't have groups yet.
-
The base time is unknown,
but the time representation had a long potential life:
36 bits of 1/60-second ticks
is about 36 years.
First Edition manual,
file system
and
directory(V),
3 November 1971
The file system is much changed
from the PDP-7 verson.
Blocks 0 and 1 are the super-block,
containing bitmaps listing the free blocks
and free i-nodes in the file system.
The i-list
begins at block 2;
blocks after the i-list
are available for file data.
The contents of the super-block,
in more detail:
-
Number of bytes in the free-storage map
(one 16-bit word).
The count is always even.
-
The free-storage map.
Each bit represents a block;
if the bit is set,
the block is free.
Blocks used by the super-block
and the i-list are included:
i.e. the first bit is block 0.
-
Number of bytes in the free i-node map
(one 16-bit word);
always even.
-
The free-i-node map.
Each bit represents an i-node;
if the bit is set,
the i-node is allocated
(unlike the free-storage map).
The first bit is i-node 41;
the first 40 i-nodes are always allocated.
-
On the root file system only,
several words of status information
follow the free-i-node map.
Times are in sixtieths of a second.
-
The current time (32 bits)
-
Total time spent executing in the system since last boot
(32 bits)
-
Time spent waiting for I/O on
rf
and rk
disks
(32 bits)
-
Time spent executing in user processes (32 bits)
-
Error counts for
rf
and rk
disks
(one byte each)
I-nodes are 32 bytes long, and contain:
-
Flags (16 bits):
i-node allocated;
directory;
large file;
set user ID on execution;
a single `executable' flag;
read and write permissions
for owner and others.
-
Link count (8 bits).
-
Owner's user ID (8 bits).
-
File size in bytes (16 bits).
-
Eight 16-bit block addresses.
-
A file creation time
and a last-modified time
(32 bits apiece).
I-nodes 1-40
are special files;
the i-number selects a device.
I-node 41 is the root directory of the file system.
Other i-numbers have no special meaning.
If an i-node is marked free in the free map
but its `allocated' flag is set,
it is taken to be allocated,
so a corrupt map will not re-use i-nodes.
Block addressing
(small and large files)
works as on the PDP-7,
except that there are eight addresses in the i-node now,
and an indirect block now contains 256 16-bit block addresses.
The address for file block n
is found in entry n mod
256
in the indirect block named by address n/256 in the i-node array.
A directory entries is 20 bytes long:
a 16-bit i-number
followed by an eight-byte filename,
filled to the right with NUL bytes.
A zero i-number means the entry is unused;
the filename bytes of an empty entry
may contain garbage.
By convention,
the first two entries in each directory are
for .
(with this directory's i-number)
and ..
(its parent's i-number,
or its own if this is the root directory),
but this convention is not enforced by the operating system.
Limits and notes:
-
The block allocation scheme can represent a file of at most
2048 blocks (256 * 8),
or a megabyte;
since the file size is stored in 16 bits,
in fact the largest possible file is 65535 bytes.
-
The largest possible block address is 65535
(or a file system of 32 megabytes),
but the 1024-byte super-block limits the real file system size
to about 8100 blocks (a bit less than 4 megabytes).
This sounds like a serious constraint,
but remember that disks (and operating systems)
were much smaller in 1971.
The BUGS section in the file system
manual entry
says
Two blocks are not enough to handle the i- and free-storage maps
for an RP02 disk pack,
which contains around 10 million words.
This note remains in the manual through the Third Edition.
-
High addresses in the root file system
are used for swapping;
these blocks are marked free
in the free list,
even though they aren't.
-
The size of a directory entry
doesn't evenly divide a file system block.
Was the system willing to allow an entry to span blocks?
-
Calendar times
(e.g. the last-modified time in the i-node)
are in sixtieths of a second
since 00:00 1 January 1971.
The units are the same as on the PDP-7,
but because there are only 32 bits available,
the lifetime is much shorter;
with this base,
the format
will overflow 32 bits
in April 1973.
-
Notice that there is a user ID but no group ID;
UNIX didn't have groups yet.
Third Edition manual,
file system
and
directory(V),
15 March 1972
The file system format has not changed,
but times are now sixtieths of a second since
00:00 1 January 1972,
extending the life of the time format
until April 1974.
Fourth Edition manual:
file system
and
directory(V),
7 and 10 September 1973
The file system structure has changed quite a bit,
probably at the same time the operating system
was rewritten in C.
The new format is the almost that of the more familiar
Sixth Edition system.
Block 0 is reserved for a bootstrap program.
Block 1 is the super-block.
The i-list begins in block 2;
data blocks follow.
The super-block is structured as follows:
-
16-bit count of blocks in the i-list.
-
16-bit count of blocks in the entire file system;
said to be used by
check(I)
to validate block numbers,
but not by the operating system.
-
Free list header:
a count and a 100-element array of block numbers.
-
Free i-node cache:
a count and a 100-element array of i-node numbers.
-
The 32-bit time when the file system was last modified.
The free list is reminiscent of that
in the PDP-7 system,
but without the bug:
-
To allocate a block:
decrement count;
array
[
count]
is the block to be allocated.
If the resulting block number is zero,
there are no blocks left.
If count
is now zero,
read a new free-list segment
(a count followed by a 100-element array)
from the contents of the block.
-
To free a block:
first check whether count
is 100;
if so,
write the current count and array
into the block being freed,
and set count to zero.
Then set
array
[
count]
to the freed block number
and increment the count.
The free-inode cache is similar,
but is just a cache:
-
To allocate an i-node:
if count is nonzero,
decrement it
and fetch the i-node numbered
array
[
count]
.
If its `allocated' flag is clear,
return that i-number;
otherwise try again.
If count is zero,
search the i-list from the beginning,
storing free i-numbers in the array until it is full
or until all i-nodes have been examined,
then start over.
-
To free an i-node:
clear its `allocated' flag;
if count
is less than 100,
store the free i-number in
array
[
count]
and increment the counter.
I-nodes are 32 bytes long, and contain:
-
Flags (16 bits): i-node allocated; 2-bit file type
(plain file,
directory,
character special,
block special);
large file;
set user ID on execution;
set group ID on execution;
read, write, and execute permissions
for owner, group, and others.
-
Link count (8 bits).
-
Owner's user ID (8 bits).
-
Owner's group ID (8 bits).
-
File size in bytes (24 bits;
high eight bits stored in a separate byte).
-
Eight 16-bit block addresses.
-
A last-access time and a last-modified time
(32 bits apiece).
I-node 1 is the root directory
of the file system;
other i-numbers have no special meaning.
Special files are now distinguished by
the file type in the flags,
rather than by i-number.
The block addresses are used in the same way as in the
previous format,
with direct addresses for small files
and indirect blocks for large ones.
Special files have no data blocks;
instead,
the first `block address' is a number naming a device.
The high byte is an index into the system's table of devices;
the low byte indentifies a particular device unit,
and is interpreted in different ways by different drivers.
A directory entry is now 16 bytes long:
16 bits of i-number,
14 bytes of filename,
the latter filled to the right with NUL bytes.
The .
and ..
entries are unchanged.
Limits and notes:
-
The file size field in the i-node can now handle 16 megabytes,
but the block-addressing scheme is still limited to a megabyte
(2048 blocks), so that is the size of the biggest possible file.
-
The number of blocks in a file system is now limited
only by the use of 16-bit block numbers,
so a 32-megabyte file system is possible.
-
Blocks used for swapping no longer appear in the free list.
-
Time format has been changed to that
still in use in the late 1990s:
whole seconds since 1 January 1970 UTC,
which overflows in
January 2038.
(Most programs treat it as a signed number,
so the future is limited to 31 bits.)
-
Groups have appeared;
they show up elsewhere in the system as well.
Sixth Edition manual,
file system
and
directory(V),
9 February 1975
Evidently files larger than one megabyte are needed now,
so the block-addressing scheme has changed.
For large files,
the eighth address in the i-node
now points to a doubly-indirect block,
containing the addresses of 256 ordinary indirect blocks,
each containing 256 data block addresses.
Hence the addressing scheme can now represent files with
(256 * 7) + (256 * 256) = 67328 blocks,
or just under 33 megabytes;
since the file size is still stored in 24 bits,
the biggest possible file is now 16 megabytes.
Seventh Edition manual,
filsys
and
dir(5),
January 1979
The file system code has been rewritten again.
Types and constants are more carefully parameterized,
as part of the general cleanup of the system
that happened at this time.
As before,
block 0 is reserved for a bootstrap program;
block 1 is the super-block;
the i-list begins at block 2;
data blocks follow.
The super-block is structured as follows:
-
Number of first block not in i-list (16 bits).
-
Number of blocks in entire file system (32 bits).
-
Free-block list header:
16-bit count,
array of
NICFREE
,
32-bit block numbers.
-
Free-i-node cache:
16-bit count,
array of
NICINOD
16-bit i-numbers.
-
Four bytes of flags meaningful only inside the running system.
-
Time when super-block last updated (32 bits).
-
Several numbers that are `not maintained by this version of the system:'
-
total free block and i-node counters
-
two 16-bit `interleave factors'
-
six character file system and file system pack names
The free-block list and free-i-node cache
work as they have since the Fourth Edition format.
I-nodes have grown to 64 bytes:
-
The former `flags' are now called `type and mode':
file type
(regular,
directory,
block or character special,
block or character multiplexed special);
set user ID, set group ID on execution;
save swapped text after use;
read, write, and execute permissions
for owner, group, and others.
-
Number of links (now 16 bits).
-
Owner's user ID (now 16 bits).
-
Owner's group ID (now 16 bits).
-
File size in bytes (now 32 bits).
-
40 bytes of block addresses:
13 addresses of 3 bytes each
(hence 24 bytes)
followed by one unused byte.
-
Times when the file was last accessed
and last modified,
and when either the file or the i-node was last changed;
32 bits each.
The `last-changed' time is used to decide
whether a file should be saved on an incremental backup.
Comments in the data structure shown in the manual
(and hence probably in the header file
sys/ino.h
)
call it `time created,'
which will spawn years of confusion.
I-node 2 is now the root directory of the file system;
other i-numbers have no special meaning.
Instead of `small' and `large' files,
different addresses are used in different ways:
-
The first ten addresses name
the file's first ten blocks.
-
The eleventh address names an indirect block
containing the addresses of 128 additional data blocks.
Indirect blocks are simple arrays of 32-bit numbers.
-
The twelfth address is doubly-indirect:
it names a block of 128 addresses,
each naming a block of 128 addresses,
each naming a data block,
yielding 128 * 128 more blocks.
-
The thirteenth and last address is triply-indirect:
it names a block of 128 addresses,
each of a block of 128 addresses,
each of a block of 128 addresses of data blocks;
128 * 128 * 128 more blocks.
Directory entries are the same as the Fourth Edition format.
Limits and notes:
-
The file-size field in the i-node can now describe a file of two gigabytes
(constrained to 31 bits
because the operating system treats file pointers as signed numbers).
The real limit on file size is the number of blocks:
at most 10 + 128 + (128 * 128) + (128 * 128 * 128) are allowed,
a total of 2113674 blocks,
or just over a gigabyte.
-
The number of blocks in a file system is now limited by
the 24-bit addresses stored in the i-numbers:
an eight-gigabyte file system is possible.
Again,
signed file pointers provide the practical limit:
file system maintenance programs
cannot build or repair a file system
of more than two gigabytes.