NSTL Debunks Optimization Myth In Affiliation
from The Executive Software Team
http://www.execsoft.com
NSTL (National Software Testing Laboratory)
has just released a white paper entitled "System Performance and File Fragmentation
In Windows NT". This is the first-ever third-party technical examination
of defragmentation and its effect on system performance. As part of this
examination, NSTL thoroughly exposes the optimization "theory" --
and how this can have a negative impact on a real-world network environment.
The entire section on optimization from the NSTL white paper is below. The full
text of the white paper will be available shortly on our Web site:
http://www.executive.com
"Optimization" is Not a Solution
Some utility vendors propose implementing additional file placement schemes,
such as "optimization," to improve system performance. At the time of this
writing, there is no standardized method. In some cases, file
optimization attempts to place files strategically on the disk so that
little-used files reside at the physical "center" of the disk, while
frequently used files are placed near the outside edge or perimeter. In
other cases, optimization methods do just the opposite, attempting to place
recently accessed files towards the "center" of the disk. The idea is that
head movement will be reduced, because frequently used files will be grouped
together, and that fragmentation will be prevented, because little-used
files will stay in a stable area and so will not compete for free space with
frequently used files. The theory is flawed in many respects.
First, on a busy server, the number of accessed and changing files is very
large. Especially on a system that is dedicated to data as opposed to
programs, even infrequently used files may be accessed enough that the
effect on performance of strategic placement ("optimization") is essentially
immeasurable. Additionally, the cost in resource overhead to accomplish
such a task can result in a net loss to performance.
There is also the fact that physical clusters on a disk may not be aligned
as linearly as logical clusters. When you force a file into a specific
position on the disk by specifying exact Logical Cluster Numbers (LCNs),
how do you know where it really is? You have to take into account the
difference between logical cluster numbers and physical cluster numbers
(PCNs). These two are not the same thing. LCNs are assigned to most of the
physical clusters, and the remainder are used as spares and for maintenance
purposes. What's more, magnetic disks are far from perfect, and
sometimes clusters "go bad." In fact, it is a rarity for a magnetic disk to
leave the manufacturer without some bad clusters. When a disk is formatted,
the bad blocks are detected and "revectored" to spares. Revectored means
that the LCN assigned to that physical cluster is reassigned to some other
physical cluster. This revectoring can also be done on the fly while your
disk is in use. The new block after revectoring might be on
the same track and physically close to the original, but then again it may
not. Thus, not every LCN corresponds to the physical cluster of the same
number, and two consecutive LCNs may actually be widely separated on the
disk.
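The revectoring described above can be sketched in a few lines of Python. This is a toy model, not a description of any real drive firmware or file system: the function name `build_lcn_map`, the 100-cluster disk, and the decision to keep the spare clusters at the end of the disk are all assumptions made for illustration.

```python
def build_lcn_map(num_physical, num_spares, bad_clusters):
    """Map each LCN to a physical cluster number (PCN).

    Hypothetical layout: the last `num_spares` physical clusters are spares.
    A healthy cluster keeps the PCN equal to its LCN; a bad one is
    revectored to the next unused spare.
    """
    num_logical = num_physical - num_spares
    spares = list(range(num_logical, num_physical))
    lcn_to_pcn = {}
    for lcn in range(num_logical):
        if lcn in bad_clusters:
            # Revector: the LCN now points at a spare somewhere else on the disk.
            lcn_to_pcn[lcn] = spares.pop(0)
        else:
            lcn_to_pcn[lcn] = lcn
    return lcn_to_pcn


# Toy disk: 100 physical clusters, 4 spares, physical cluster 10 has gone bad.
mapping = build_lcn_map(num_physical=100, num_spares=4, bad_clusters={10})

# LCNs 10 and 11 are logically adjacent but physically far apart.
gap = abs(mapping[11] - mapping[10])
print(mapping[10], mapping[11], gap)
```

In this model a utility that asks for LCNs 10 and 11 to keep a file "contiguous" ends up with the two clusters 85 physical clusters apart, which is the gamble the paragraph above warns about.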
Still another flaw with this system has to do with forcing a file to a
specific position on the disk. The problem is: how do you know where the
file really is? You may be playing probabilities and perhaps you should
think twice before gambling with user data and system performance.
Just as LCNs do not necessarily correspond to the
physical cluster of the same number, a similar situation holds for
multi-spindle disks (those with two or more sets of platters rotating on
separate spindles). There are several different types of multi-spindle
disks. Besides the common volume sets and stripe sets, there are also disks
that use multiple spindles for speed and reliability. These appear to the
operating system as a single disk drive. Again the question is: Where is
the "center" or "outside edge" of such a disk? The answer: Such disk
volumes actually have several "centers" and "outer edges" when speaking in
terms of access times.
On a disk that has multiple partitions, it is not possible to place a
subsequent partition's Metadata (MFT, File Allocation Table, etc.) on the
outer or middle physical tracks of the disk. Take, for example, a single
disk that has physical clusters 0 to 9999 and is partitioned into three
equally sized logical volumes, C:, D:, and E:. When each logical volume is
formatted, it will have logical clusters numbered from 0 to roughly 3332.
This is what your applications and Windows NT will see and use to allocate
and deallocate space when files are created, modified, or deleted.
The Metadata for each of these three separate logical volumes is stored in
its respective allocated space on the physical disk. Therefore, partition
C: will have its Metadata on some area of the disk, perhaps near the very
beginning of the physical disk, but partitions D: and E: will have their
Metadata elsewhere on the physical disk, nowhere near partition C:'s
Metadata or each other's. In this example, there are really
three different "beginning," "middle," and "end" locations, yet only one
outer track or physical center.
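The three-partition example can be made concrete with a small sketch. The figures here are assumptions chosen to match the example loosely: 3,333 clusters per volume, partitions laid out back to back, and each volume's Metadata starting at its own logical cluster 0 (real file systems may place it elsewhere). The names `PARTITION_START` and `physical_cluster` are hypothetical.

```python
CLUSTERS_PER_VOLUME = 3333  # assumed: three equal volumes on a ~10,000-cluster disk

# Assumed back-to-back layout: first physical cluster of each volume.
PARTITION_START = {"C:": 0, "D:": 3333, "E:": 6666}


def physical_cluster(volume, logical_cluster):
    """Translate a volume-relative logical cluster to a position on the whole disk."""
    if not 0 <= logical_cluster < CLUSTERS_PER_VOLUME:
        raise ValueError("logical cluster outside the volume")
    return PARTITION_START[volume] + logical_cluster


# Every volume believes its first cluster is number 0 ...
starts = {vol: physical_cluster(vol, 0) for vol in PARTITION_START}
# ... but on the physical disk those "beginnings" are thousands of clusters apart.
print(starts)
```

Each volume has its own "beginning," and only C:'s happens to coincide with the start of the physical disk.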
Generally speaking, multiple partitions are set up to allow for better
allocation and ease of management. Many put the operating system on one
partition, programs and applications on another, and data on yet another.
When using the computer, one, two, or all of these individual partitions may
be accessed at any given time. The user issues a command to the application
(partition D: is accessed), the application makes a call to an operating
system function (partition C: is accessed), and then data is searched for
and located (partition E: is accessed). The Read/Write heads will be moving
back and forth across the disk in a random pattern.
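A rough sketch of that access pattern, under stated assumptions: the partition layout is the hypothetical one from the three-volume example (3,333 clusters each, back to back), and distance in clusters is used as a crude stand-in for head movement, which ignores real drive geometry. The helper `head_travel` is illustrative only.

```python
# Assumed layout: first physical cluster of each volume.
PARTITION_START = {"C:": 0, "D:": 3333, "E:": 6666}


def head_travel(accesses):
    """Total clusters crossed when visiting (volume, logical_cluster) pairs in order."""
    positions = [PARTITION_START[vol] + lcn for vol, lcn in accesses]
    return sum(abs(b - a) for a, b in zip(positions, positions[1:]))


# Application on D:, operating system call on C:, data on E:.
# Case 1: every file sits at the very front of its own volume.
front = head_travel([("D:", 0), ("C:", 0), ("E:", 0)])

# Case 2: every file sits in the middle of its own volume.
middle = head_travel([("D:", 1666), ("C:", 1666), ("E:", 1666)])

print(front, middle)
```

In this toy case the total travel is the same whether the files sit at the front or in the middle of their volumes, because the jumps between partitions account for all of the movement.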
File placement capability (optimization) was designed for the real-time
laboratory environment in which a single process has continuous control of
the computer system. In such a system, the time consumed by head movement
from one particular file to another can be critical to the success of the
process. The system designer can minimize the critical time lag by
calculating the ideal location for the second file in relation to the first
file and forcing the two files into exact locations on the disk. Then, when
the process has completed reading the first file, access to the second is
effected with minimal delay.
For a network solution, by comparison, it is important to consider a typical
interactive user environment. Dozens or even hundreds of interactive users
may be logged onto the system and active at any given moment. They may be
running a variety of applications and accessing innumerable files located in
every conceivable section of the hard disk. With this extremely random mode
of operation, a disk optimizer cannot position files at exact locations in
order to reduce disk access times. File positioning is as likely to
worsen system performance as to improve it. Even if the two effects
balance out at zero, the overhead involved can result in a net loss.
@Macarlo, Inc.