The same holds for the "housekeeping information" on the disk. If, for
example, the software has recently read from a particular directory, then it's quite
likely that there will be a later request for the same directory.
A disk cache is a region of main memory that holds copies of some of the disk blocks.
When a block is requested, the file system software first checks to see whether
it's already in the cache, and if so a slow disk operation can be avoided. If the
block is not in the cache, it is placed in the cache as part of the read operation.
As the cache fills up, older blocks have to be discarded. Usually the software uses
some form of "least recently used" algorithm, which discards those blocks
that haven't been used for a long time.
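To make the "least recently used" idea concrete, here is a minimal sketch of a block cache. It is an illustration only, not the way any real file system is coded; the names BlockCache and fake_disk_read are invented for the example, and the "disk" is just a function that returns a string.

```python
from collections import OrderedDict


class BlockCache:
    """Toy disk cache: holds copies of disk blocks, discarding the least recently used."""

    def __init__(self, capacity, read_block_from_disk):
        self.capacity = capacity                    # maximum number of cached blocks
        self.read_from_disk = read_block_from_disk  # hypothetical slow disk read
        self.blocks = OrderedDict()                 # block number -> data, oldest first
        self.hits = 0
        self.misses = 0

    def read(self, block_no):
        if block_no in self.blocks:
            self.hits += 1
            self.blocks.move_to_end(block_no)       # now the most recently used
            return self.blocks[block_no]
        self.misses += 1
        data = self.read_from_disk(block_no)        # the slow path
        self.blocks[block_no] = data                # cached as part of the read
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)         # discard the least recently used
        return data


def fake_disk_read(block_no):
    # Stand-in for a real disk operation (assumption for the example).
    return f"data-{block_no}"


cache = BlockCache(capacity=2, read_block_from_disk=fake_disk_read)
for n in (1, 2, 1, 3, 2):
    cache.read(n)
print(cache.hits, cache.misses, list(cache.blocks))  # prints: 1 4 [3, 2]
```

With room for only two blocks, the second read of block 1 is a hit, but block 2 has been discarded by the time it is wanted again.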
Smart caching algorithms often also include some form of "read ahead".
If you read block N of a file, then it's likely that you'll soon want block N+1,
so the file system uses any spare processor time to read that block as well. Sometimes
you won't want block N+1, in which case the read-ahead has been wasted; but on average,
this approach saves enough time to compensate for the occasional wasted operation.
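A rough sketch of read-ahead follows. The class name ReadAheadCache is made up, the cache never discards anything, and the prefetch is done immediately rather than in spare processor time, so treat it as a picture of the bookkeeping rather than a realistic implementation.

```python
class ReadAheadCache:
    """Toy cache that fetches block N+1 whenever block N is read."""

    def __init__(self, read_block_from_disk):
        self.read_from_disk = read_block_from_disk  # hypothetical slow disk read
        self.blocks = {}                            # block number -> data
        self.demand_reads = 0                       # reads the caller had to wait for
        self.prefetches = 0                         # reads done ahead of time

    def read(self, block_no):
        if block_no not in self.blocks:
            self.demand_reads += 1
            self.blocks[block_no] = self.read_from_disk(block_no)
        data = self.blocks[block_no]
        self._read_ahead(block_no + 1)
        return data

    def _read_ahead(self, block_no):
        # A real file system would schedule this for when the processor is idle.
        if block_no not in self.blocks:
            self.prefetches += 1
            self.blocks[block_no] = self.read_from_disk(block_no)


ra_cache = ReadAheadCache(lambda n: f"data-{n}")
for n in range(5):                 # sequential read of blocks 0..4
    ra_cache.read(n)
print(ra_cache.demand_reads, ra_cache.prefetches)  # prints: 1 5
```

Only the very first read makes the caller wait. The fifth prefetch (block 5) is the wasted one, since nobody asks for it.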
As long as you're only reading from the disk, and not writing, the cache contents
are identical with what is on the disk. Once you start writing, this might no longer
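be true. There are two popular ways to manage write operations with a cache: write
through, where each write goes to the disk immediately, and lazy write, where the
write is held in the cache and sent to the disk a little later.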
[When I used Windows, I found it simply wasn't safe to enable lazy writing, because
the disk corruption got a little worse each time an application crashed the system.
Now that I use OS/2 I usually have Lazy Write enabled, because faulty applications
usually don't stop the entire system.]
Surprisingly, quite a few people seem to shut down OS/2 by turning off the power,
rather than going through a proper shutdown operation. Such people should never
have Lazy Write enabled. (Actually, such people shouldn't be allowed in the same
room as a computer - but that's another story.) Lazy Write should also be disabled
on systems where security is more important than speed.
If you ever do get into the situation where your system is locked up so hard that
you can't shut it down, you should use the Ctrl/Alt/Del method of stopping the system.
(If nothing happens, wait a few seconds and try Ctrl/Alt/Del again. The second attempt
usually succeeds.) Although this doesn't do a proper shutdown, it does at least
attempt to ensure that the disk caches are flushed.
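The risk with lazy writing, and the reason a flush matters, can be shown with a small model. This is a sketch under my own assumptions: the class WriteCache and its fields are invented, and the immediate-write policy is what is commonly called "write through".

```python
class WriteCache:
    """Toy write cache that either writes through or writes lazily."""

    def __init__(self, lazy_write):
        self.lazy_write = lazy_write
        self.disk = {}        # stands in for the physical disk
        self.cache = {}       # block number -> data held in main memory
        self.dirty = set()    # blocks changed in the cache but not yet on disk

    def write(self, block_no, data):
        self.cache[block_no] = data
        if self.lazy_write:
            self.dirty.add(block_no)       # defer the slow disk write
        else:
            self.disk[block_no] = data     # write through immediately

    def flush(self):
        # What a proper shutdown does: push every deferred write to the disk.
        for block_no in sorted(self.dirty):
            self.disk[block_no] = self.cache[block_no]
        self.dirty.clear()


safe = WriteCache(lazy_write=False)
safe.write(7, "new")

lazy = WriteCache(lazy_write=True)
lazy.write(7, "new")
lost_if_power_fails = 7 not in lazy.disk   # True: the data exists only in memory
lazy.flush()
```

Until flush() is called, the lazy cache and the disk disagree, which is exactly the window in which turning off the power loses data.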
So much for the theory. Now let's look at the cache parameters you can control.
DISKCACHE=3328,LW,128,AC:CD
Remark: many OS/2 users keep one or more FAT partitions for DOS compatibility, but
have their most important files on HPFS partitions. In practice this means that
they just don't use FAT files very often. In such cases, it probably makes sense
to keep the DISKCACHE small, or eliminate it altogether. This would slow down FAT
operations, but would save enough main memory to make the rest of the system faster.
IFS=F:\OS2\HPFS.IFS /CACHE:2048 /CRECL:64 /AUTOCHECK:EF
You might also see a parameter like /F:2 on this line. This specifies what "level"
of disk checking is to be done by CHKDSK. The default value is 2, and that's the
best value for almost all situations.
The command "CACHE", without any parameters, returns a list of the parameters
currently in force. Here's a typical output:
DiskIdle: 1000 milliseconds
MaxAge: 30000 milliseconds
BufferIdle: 10000 milliseconds
Cache size: 2048 kbytes
3 Lazy write worker(s) are enabled.
1 Read ahead worker(s) are enabled.
Disk <--> cache <--> buffers <--> application software
The original 8086 was not significantly faster than conventional memory, so it did
not include a cache. (It had an internal instruction pipeline that could be thought
of as a cache, but it was only a few bytes long.) Later members of the 80x86 family
are faster, and include an internal cache - that is, a cache memory that's implemented
as part of the processor chip. The newer the processor, the bigger the on-chip cache.
With the very fastest processors, you need a bigger cache than will fit on the chip.
(OK, strictly speaking you don't need it; but it does make a big performance
difference.) Some motherboard manufacturers now include a "level 2 cache",
which effectively increases the size of the on-chip cache. The main difference between
an on-chip cache and a level 2 cache is that the latter is physically placed on
the motherboard, rather than on the processor chip.
[Remark: the hardware manufacturers often neglect to test their hardware with true
multitasking software, which means that the hardware can't always go as fast as
the manufacturer thinks it can. As a result, OS/2 sometimes can't be installed on
hardware with a level 2 cache. The solution is to disable the level 2 cache (via
the BIOS setup options), install OS/2, and then re-enable the level 2 cache.]
Unlike disk caches, the operation of a processor cache is not controlled by software.
You need to do it all in hardware. Of course the necessary hardware is always included,
so this is a non-issue for most users.
Some people discover that their system slows down if they add more main memory.
(Normally, adding memory should make your system faster.) When this happens, it
means that the new memory is not being cached. To fix the problem, it's necessary
to configure the hardware in such a way that caching is enabled for all of main
memory. Typically this means that the size of the level 2 cache must be increased.
Paging is a different hardware memory management scheme, whose primary purpose is
to support disk swapping (see later). Paging hardware typically also contains some
protection mechanisms - e.g. the designation of some pages as read-only - but the
protection is not as complete as with segmentation.
Segmentation hardware in the 80x86 family first appeared in the 80286. (It's sometimes
said that the 8086 had segmentation, but that's an abuse of terminology. It had
an addressing mechanism that looked a little like segmented addressing, but this
wasn't true segmentation, because the protection hardware was missing.) Starting
with the 80386, the processors in this family have both segmentation and paging.
OS/2 uses both of these hardware features, which is why the current versions of
OS/2 require an 80386 or better.
The ideas behind segmentation are actually much older than the 8086; but for many
years there were very few computers that put the ideas into practice, because of
the high cost of the hardware. (Paging, being rather cruder and simpler, got more
support from the hardware designers.) One of the most important innovations in the
80286 was that the designers managed to fit the segmentation hardware onto the processor
chip itself, which made mass production possible at an affordable cost.
The 80386 address translation hardware is a little unusual in that it uses a two-stage
translation. Within a program - that is, in the executable machine code - addresses
are expressed as a pair (segment number, offset within segment). (In the majority
of machine language instructions the segment number is not explicitly included,
because the hardware provides for some "current segment" defaults. Nevertheless,
the programmers have to remain aware of which segment they're talking about.) The
segmentation hardware translates each such address into what's called a "linear
address". Then the paging hardware takes the linear address, splits it up into
the pair (page number, offset within page), looks up its own tables that give the
physical address of each page, and finally produces a physical address. It is this
physical address that is sent to the main memory as part of an instruction fetch,
a memory read, etc.
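The two stages can be sketched in a few lines. The tables below are invented for the example, and the page table is flattened into a single level to keep things short (the real 80386 tables have more structure); the 4 KB page size is the one the 80386 uses.

```python
PAGE_SIZE = 4096  # 4 KB pages

# Hypothetical segment table: segment number -> (base linear address, size in bytes)
segment_table = {1: (0x10000, 0x3000)}

# Hypothetical single-level page table: linear page number -> physical page number
page_table = {0x10: 0x2A, 0x11: 0x07, 0x12: 0x15}


def to_linear(segment, offset):
    """Stage 1: the segmentation hardware produces a linear address."""
    base, size = segment_table[segment]
    if not 0 <= offset < size:
        raise MemoryError("offset outside segment")   # protection check
    return base + offset


def to_physical(linear):
    """Stage 2: the paging hardware produces a physical address."""
    page, offset = divmod(linear, PAGE_SIZE)
    return page_table[page] * PAGE_SIZE + offset


def translate(segment, offset):
    return to_physical(to_linear(segment, offset))


print(hex(to_linear(1, 0x1234)))   # prints: 0x11234
print(hex(translate(1, 0x1234)))   # prints: 0x7234
```

Notice that the protection check happens in the first stage, while the second stage is concerned only with where a page currently lives.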
At first sight it might appear that the segmentation hardware and the paging hardware
are doing the same thing. If this were true, it would of course be a waste to have
both sets of hardware, since it wouldn't be doing any more than could be achieved
with a one-stage translation. There are, however, several important differences:
Both the segmentation hardware and the paging hardware have to look up tables in
main memory in order to do their job. This sounds like a major overhead, and indeed
it would be if every address translation triggered several extra memory references.
What makes the whole system work is that both sets of hardware have their own private
caches (in high-speed memory) for the translation tables.
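The effect of such a private cache is easy to model. In this sketch the class name TranslationCache is my own invention, and the cache is allowed to grow without limit, which real hardware certainly cannot do.

```python
class TranslationCache:
    """Toy model of a private cache sitting in front of a translation table."""

    def __init__(self, page_table):
        self.page_table = page_table   # hypothetical table held in main memory
        self.cached = {}               # fast copy of recently used entries
        self.table_lookups = 0         # counts the slow main-memory references

    def physical_page(self, page):
        if page not in self.cached:
            self.table_lookups += 1    # only a miss costs a memory reference
            self.cached[page] = self.page_table[page]
        return self.cached[page]


translation_cache = TranslationCache({5: 0x2A, 6: 0x07})
results = [translation_cache.physical_page(p) for p in (5, 5, 5, 6, 5)]
print(translation_cache.table_lookups)   # prints: 2
```

Five translations cost only two table lookups, because programs tend to keep touching the same few pages.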
Swapping is a technique that lets you use a disk file as an extension of main memory.
It works as follows. A program's memory consists of a number of what are called
virtual pages. The paging hardware maps virtual pages into physical pages. Each
entry in the page table (the address translation table for paging) contains a physical
page number, but it also contains several flags, and one of these flags is used
to signal a "physical page not present" condition.
As long as your software is using memory pages that are physically present in main
memory, nothing unusual happens. If, however, the paging hardware detects a "page
not present" condition, it issues an interrupt called a "page fault"
interrupt. The interrupt routine then has to deal with this condition.
There are at least two possible causes of a page fault. The obvious cause is a programming
error where the software is trying to address something outside its legal range.
Of course there's nothing you can do about that but abort the program. The less
obvious cause, but in fact the most common one, is that the address is legal but
the swapping software has not yet loaded that page into main memory. When that happens,
your program is temporarily suspended while that page is fetched from disk; and
then the program can proceed again.
The overall effect is that the effective size of main memory is increased by the
size of a special disk file called the swap file. The system software that looks
after paging and swapping manages the movement of pages between main memory and
the swap file as needed.
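Here is a minimal sketch of the page fault logic just described. The names Pager and PageNotPresent are hypothetical, a Python exception stands in for the hardware interrupt, and nothing is ever swapped out, so it shows only the "bring the page in on demand" half of the story.

```python
class PageNotPresent(Exception):
    """Stands in for the hardware "page fault" interrupt."""


class Pager:
    def __init__(self, legal_pages, swap_file):
        self.legal_pages = set(legal_pages)  # pages the program may address
        self.swap_file = swap_file           # virtual page -> contents on disk
        self.memory = {}                     # pages physically present
        self.faults = 0

    def _hardware_access(self, page):
        if page not in self.memory:
            raise PageNotPresent(page)       # the "not present" flag is set
        return self.memory[page]

    def access(self, page):
        try:
            return self._hardware_access(page)
        except PageNotPresent:
            self.faults += 1
            if page not in self.legal_pages:
                # Programming error: nothing to do but abort the program.
                raise MemoryError(f"illegal access to page {page}") from None
            # Legal but not loaded: fetch from disk, then retry the access.
            self.memory[page] = self.swap_file[page]
            return self._hardware_access(page)


pager = Pager(legal_pages=[0, 1], swap_file={0: "code", 1: "data"})
first = pager.access(0)     # page fault, page is loaded from the swap file
second = pager.access(0)    # already present, no fault
try:
    pager.access(5)         # outside the legal range
    aborted = False
except MemoryError:
    aborted = True
```

The program making the access never sees the first kind of fault; it simply runs a little slower while the page is fetched.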
Some systems require the swap file to have a fixed size. In OS/2, a line in CONFIG.SYS
specifies the initial size of the swap file, but the swap file can subsequently
grow and shrink as needed. This flexibility comes at a cost: while the swap file
is growing, the extra overhead causes your system to slow down substantially. To
avoid that overhead, it's best to make the initial swap file size so large that
it will rarely need to grow.
Quite a lot of what's in main memory is executable code that won't be altered during
execution. When this code has to be bumped out of main memory to make room for something
else, the memory image doesn't need to be saved in the swap file; it can be re-read,
the next time it's swapped in, from the original source file. This helps to keep
the swap file small. Executable code pages are normally marked "discardable",
to tell the swapper that they need not be saved in the swap file.
There is, however, a slight time penalty in making code pages discardable. Code
saved in the swap file is saved in the form of a memory image, i.e. it's an exact
copy of what was in main memory. Code in an EXE or DLL file has a slightly more
complicated format, and requires some processing by the system loader as it's being
loaded into memory. To reduce the time overhead, some frequently used code is marked
"swappable" rather than "discardable", to force it to be written
to the swap file when it's swapped out. (If you've used several versions of OS/2,
you might have noticed that the swap file gets bigger than it used to be in earlier
versions.) Although this increases the disk overhead, it makes the overall system
a little faster.
Segmentation and protection were introduced with the 80286, but the designers faced
a compatibility problem: most of the existing software was written for the 8086,
therefore the 80286 had to be capable of executing 8086 software. The solution they
adopted was to define two operating modes for the processor. In "real mode",
the processor acted just like an 8086, and the segmentation hardware was disabled.
In "protected mode", the new protection features were enabled.
As it turned out, not much software was written for the 80286. The dominance of
the DOS/Windows market meant that most people used the 80286 as an 8086 emulator.
The advanced features were largely wasted.
The 80386 introduced a new twist. It still had real and protected modes, but in
protected mode it was possible to define a special segment type that acted as an
8086 emulator. This allowed you to run a protected-mode operating system, and still
have a mechanism to run all those legacy applications without having to re-boot
back to real mode. It's this "virtual 8086" mode that OS/2 uses to run
DOS/Windows applications.
Given this feature, there's no longer much need for real mode. The processor still
boots up in real mode, but the OS/2 initialisation routines switch the processor
into protected mode almost immediately.
Now and then you get a DOS application - usually a game - that won't run even under
the OS/2 DOS emulation. In that case your only option is to switch back into real
mode and run "pure" DOS. OS/2 provides a "dual boot" feature
that does this for you - in effect, it re-boots the machine so that OS/2 is no longer
in charge.
It's all based on a misconception. I'll explain why later in this section. In fact,
32-bit software is usually slower and more memory-hungry than equivalent 16-bit
versions.
In the original 8086, most internal registers were 16 bits wide. The processor used
32-bit addresses, but these were broken down into a 16-bit segment base and a 16-bit
offset. Since the segment base was implicit in most instructions, it was common
to refer to these 32-bit addresses as 16-bit addresses. (To complicate matters,
main memory addresses were only 20 bits wide.)
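The arithmetic behind those 20-bit addresses is simple enough to show directly: the 8086 shifts the 16-bit segment value left by four bits and adds the offset. The function name below is my own; the calculation is the standard one.

```python
def real_mode_address(segment, offset):
    """Combine a 16-bit segment value and a 16-bit offset into a 20-bit address."""
    assert 0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF
    return ((segment << 4) + offset) & 0xFFFFF   # the 8086 keeps only 20 bits


a = real_mode_address(0x1234, 0x0010)
b = real_mode_address(0x1235, 0x0000)
wrapped = real_mode_address(0xFFFF, 0x0010)

print(hex(a), hex(b), hex(wrapped))   # prints: 0x12350 0x12350 0x0
```

Two different (segment, offset) pairs can name the same memory location, and an address past the top of memory wraps around to zero. Nothing here checks whether the access is legal.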
Later models of the processor, starting with the 80386, expanded many of the internal
registers to a 32-bit width. This turned the addresses into 48-bit addresses: 16
bits for the segment number, and 32 bits for the offset within the segment. (Again,
most people call these addresses 32-bit addresses rather than 48-bit addresses.)
In addition, these later processor models introduced some new instructions and new
addressing modes.
This brings us right back to the question of upwards compatibility. You can't just
change the register sizes and still expect the old 8086 software to execute correctly.
To solve this, each code segment has a special flag, which is stored in the
segment descriptor. (The segment descriptor is the table entry that the segmentation
hardware uses to do its address translation.) This flag says whether, by default, the
code uses 32-bit data registers and addresses with 32-bit offsets, or just the lower
16 bits of the registers and addresses with 16-bit offsets.
This is done on a per-segment basis, thereby allowing a mixture of software
using the new and the old conventions. You can even call a 16-bit procedure from
a 32-bit code segment, or vice versa.
(In fact, there are even special "escape" codes that allow isolated 32-bit
instructions in a 16-bit segment, or vice versa.)
Do you really need 32-bit data registers? My own experience (and I've written a
lot of software over the years) is that 16 bits is adequate nearly all of the time.
There are a few special situations where 32-bit variables must be used, but those
situations don't arise all that often.
On the other hand, the use of 16-bit data does mean that the programmer has to be
conscious of the possibility of overflow, and to design the software accordingly.
The world has a lot more bad programmers than good programmers, and there's a lot
of software out there that doesn't take this complication into account. For the
weaker programmers, a move to 32-bit data registers reduces the probability of error,
and that's probably a good thing.
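A small worked example shows the kind of overflow in question. Python integers never overflow, so the helper below (a name of my own choosing) imitates what a fixed-width two's-complement register would hold.

```python
def wrap_signed(value, bits):
    """Interpret value as a two's-complement integer of the given width."""
    value &= (1 << bits) - 1          # keep only the low-order bits
    if value >= 1 << (bits - 1):      # top bit set means a negative number
        value -= 1 << bits
    return value


total16 = wrap_signed(30000 + 10000, 16)
total32 = wrap_signed(30000 + 10000, 32)

print(total16, total32)   # prints: -25536 40000
```

Both operands fit comfortably in 16 bits, yet their sum does not. A careful programmer plans for this; a careless one gets a negative total.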
The situation with respect to addresses is a bit more complicated. With a 16-bit
offset, the maximum segment size is 64 kilobytes, and that's not a lot of memory.
There are several situations where you need bigger segments.
OK, that goes some way towards explaining what 32-bit software is all about. It
doesn't yet explain why so many people are in a hurry to get rid of their 16-bit
applications and "move up" to 32-bit versions. What's the attraction?
The answer lies in a historical accident. The 32-bit support in the 80x86 family
appeared at roughly the same time as operating systems were starting to take advantage
of the processor's protected mode. The move to protected mode was definitely a step
forward. Most PC users were getting heartily sick of the "General Protection
Fault" syndrome, and it was a real relief to move away from the situation
where one crashed application could bring down the entire system. The software vendors
weren't particularly clear on the distinction between "protected mode"
and "32-bit software". The sloppy advertising meant that most users were
also confused about, or even unaware of, the distinction.
Unfortunately, segmentation was expensive. It required hardware that did sophisticated
address translation at high speed (so as not to create unacceptable time overheads).
This was possible with the technology of that time, but it would have added significantly
to the overall cost of a computer. Presumably the hardware designers decided that
the extra cost was not justified; in any event, there were very few commercial implementations
of the idea.
By the time the 80286 appeared, the technology had finally caught up; at last, it
was possible to put into practice, at an acceptable cost, an idea that had been
around for many years.
It's a pity, then, that hardly anyone used the segmentation hardware. There were
several reasons for this.
An address space is linear if addresses can be combined according to the laws of
linear arithmetic. For example, if p1 and p2 are two pointers, then the operations
p1+p2 and p1-p2 should produce two other valid pointers.
A segmented address space is definitely not linear. With a segmented memory, p1+p2
never makes sense, and p1-p2 has a meaning only if p1 and p2 are pointers into the
same segment. The nonlinearity of a segmented address space puts some severe limitations
on what sort of address arithmetic is possible.
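These restrictions can be expressed as a small pointer type. This is a sketch only: SegPtr is a hypothetical class, and the errors it raises stand in for what a compiler or the protection hardware would reject.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SegPtr:
    """A pointer in a segmented address space: (segment number, offset)."""
    segment: int
    offset: int

    def __add__(self, other):
        if isinstance(other, int):
            # Moving a pointer by a number of bytes stays inside its segment.
            return SegPtr(self.segment, self.offset + other)
        raise TypeError("adding two pointers never makes sense")

    def __sub__(self, other):
        if not isinstance(other, SegPtr):
            return NotImplemented
        if self.segment != other.segment:
            raise ValueError("pointers into different segments cannot be subtracted")
        return self.offset - other.offset


p1 = SegPtr(segment=3, offset=0x40)
p2 = SegPtr(segment=3, offset=0x10)
q = SegPtr(segment=4, offset=0x10)

distance = p1 - p2      # 48: both pointers are in segment 3
moved = p2 + 0x30       # SegPtr(segment=3, offset=64), equal to p1
```

Here p1 - q raises ValueError and p1 + p2 raises TypeError, which are the two cases the text rules out.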
In a strongly typed programming language, these limitations make sense. In fact,
a segmented address space is ideally suited to the implementation of the more modern
programming languages (e.g. Ada, Modula-2, Oberon) which stress the concept of modularity.
There is a very natural mapping between modules and segments.
As it happened, however, many of today's operating systems were designed at a time
when the older language C was near a peak of its popularity, and C does not
match particularly well with a segmented memory model. C permits some operations
(mixing pointers to code with pointers to data, unrestricted linear address arithmetic,
etc.) which the segmentation hardware would trap as illegal. One might well argue
that the only things that segmentation would prevent are those things that a sensible
programmer wouldn't do, but that's irrelevant. A compiler has to permit anything
that the language standard permits - even the stupid things - or it's not a standard-conforming
compiler.
As a result, C programmers generally insist on having a linear address space - and
this tradition seems to be continued by the C++ programmers. This has had a major
influence on OS/2, because so much OS/2 software is written in C or C++.
Can you get a linear address space with 80x86 hardware? Well, you can't actually
disable the segmentation, but there's a way of pretending that it isn't there. The
trick is to combine your entire program (both code and data) into one huge segment,
and to set up the segment registers so that they all select the same segment. Another
way of looking at this is to say that you have several segments, but they overlap
precisely so that they're all in the same physical memory. And that's what the "flat
memory model" of the 80x86 (more precisely, the 80386 or higher) is all about.
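In terms of the translation tables, the trick looks something like this. The base of zero and the 4-gigabyte size are assumptions made for illustration, not the values any particular OS/2 version uses; the register names are the usual 80x86 ones.

```python
FOUR_GB = 1 << 32

# Every segment register selects a segment with the same base and size
# (illustrative values, assumed for this sketch).
flat_segments = {
    "CS": (0, FOUR_GB),   # code
    "DS": (0, FOUR_GB),   # data
    "SS": (0, FOUR_GB),   # stack
}


def linear(seg_reg, offset):
    """Segment translation in the flat model: effectively does nothing."""
    base, size = flat_segments[seg_reg]
    if not 0 <= offset < size:
        raise MemoryError("offset outside segment")
    return base + offset


addr = 0x0012_3456
same_everywhere = linear("CS", addr) == linear("DS", addr) == linear("SS", addr) == addr
print(same_everywhere)   # prints: True
```

Because every segment covers the same range, the offset alone identifies a memory location, and a code pointer and a data pointer become interchangeable. That is precisely the protection that has been given up.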
As you might guess from the above discussion, I'm not particularly a fan of the
flat memory model. In fact, I think it's crazy to throw away the advantages of segmentation.
I can afford to say that because I'm not selling any OS/2 software. The software
vendors don't have the same luxury; you can go out of business by calling your customers
crazy. The flat memory model is what most OS/2 programmers want, and that's what
they get.