Andy's Binary Folding Editor is primarily designed for structured browsing, although
it also provides minimal editing facilities.
This program is designed to take in a set of binary files, and with the aid of
an initialisation file, decode and display the definitions (structures
or unions) within them. BE is particularly suited to displaying non-variable length
definitions within the files.
This makes examination of known file types easy, and allows rapid and reliable
navigation of memory dumps.
For a summary of how to use the editor, see the section Using the editor.
BE has the following features :-
usage: be [-w width] [-h height] [-c colscheme] [-p] [-r]
          [-i inifile] {-I incpath} {-D symbol} {-S name=val}
          [-d defn] [-a addr] [-f field] [-v viewflags] [-C dx] [-g]
          {-Y symfmt} {-y symfile[@bias]} { binfile[@addr] | mx![args[@addr]] }
flags: -w width         screen width
       -h height        screen height
       -c colscheme     set colour scheme (0 to 3, default: 0)
       -p               print data to stdout, non-interactive
       -r               restricted mode, no shelling out allowed
       -i inifile       override default initialisation file
       -I incpath       append include path(s) for use by inifile
       -D symbol        pre-$define symbol(s) for use by inifile
       -S name=val      set constant name to be value
       -d defn          initial definition to use (default: main)
       -a addr          initial address to use (default: 0)
       -f field         field name within defn (list link, or array to expand)
       -v viewflags     combinations of A,O,L,I,Y,a,e,b,o,d,h,J,+,-
       -C dx            code disassembler extension
       -g               perform seg:off->physical mapping on all addresses
       -Y symfmt        symbol table format
       -y symfile@bias  input symbol table file(s) (with optional bias)
       binfile@addr     binary file(s) (with optional address, default: 0)
       mx!args@addr     memory extension with arguments (and optional address)
The -w and -h arguments can be used to try to override the
current screen size. This doesn't work on UNIX, but does on DOS, OS/2 and Windows.
The -c argument allows you to choose from a small selection of colour schemes.
The -p flag causes BE to be invoked in a non-interactive manner. It
decodes the address given, as a structure of the type specified, and writes the
result to the screen (as stdout).
The -r flag prevents a user of BE from shelling out a nested operating
system command.
The -i flag overrides the default initialisation file.
The -I flag affects the operation of the include
command in the initialisation file.
The -D flag allows the definition of symbols which may be accessed via
the $ifdef and similar directives in the initialisation file.
The -S flag allows the definition of a named constant
for use in numeric expressions in the configuration file.
The initial structure definition and address to decode may be overridden with
the -d and -a flags. Normally BE starts by looking up the 'main'
definition, and decoding the data at address 0 as such. The address
expression is allowed to refer to symbols in symbol tables, as it is evaluated after
the symbol tables have been loaded. All the other numeric command line arguments
are evaluated before any symbol table loading takes place, and so can't refer to
symbols.
If the -f flag is used, it must identify a field within the specified
structure. If the field is a pointer to a structure of the same type, BE will initially
display a linked list of structures, rather than just one structure. Otherwise,
the field is assumed to be an array of fields, and an element list is displayed
instead.
The -v flag allows you to state that addresses, offsets, lengths and
array indices are to be displayed next to the data on display initially (note that
-vI turns off indices). You can also turn on the symbolic display of addresses.
In addition, you can specify the display mode of indices, one of binary, octal,
decimal or hex. The + and - flags affect the initial level of detail
of display, and only have an effect when used with the -f flag. This is particularly
useful when combined with the -p flag.
The -g argument is the 'segmented mode' switch. When enabled, BE translates
all 32 bit addresses prior to using them to fetch or store data, i.e. address 0xSSSSOOOO
is mapped to SSSS*16+OOOO. This is intended for examining dumps
from embedded Intel processors, and anyone with a sensible file format can
ignore this flag.
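The mapping described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not BE's own code, and the function name seg_to_physical is made up for the example :-

```python
def seg_to_physical(addr):
    """Map a 32 bit 0xSSSSOOOO address to SSSS*16+OOOO."""
    segment = (addr >> 16) & 0xFFFF   # top 16 bits
    offset = addr & 0xFFFF            # bottom 16 bits
    return segment * 16 + offset


# 0x1234:0x0010 lands at physical address 0x12350
print(hex(seg_to_physical(0x12340010)))
```

Note that many different seg:off pairs map to the same physical address, so the mapping cannot be reversed uniquely.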
Symbol table(s) may be specified using the -y flag. Symbol files are
assumed to be the format generated by the ARM linker. However, the -Y flag
can be used to tell BE that symbols in other formats follow. Multiple symbol files
in differing formats may be specified, as in :-
be -Y aix_nm -y syms.nm -Y arm -y syms.sym ...
See the section on symbol table formats for a description
of the supported file formats.
If a bias is specified, then it is added to each symbol value in the file. This
is handy when a symbol table contains relative values, rather than absolute addresses.
Multiple input binary files can be specified, and they should be loaded at non-overlapping
address ranges.
Each binary file provides data for a part of the memory space which BE can edit.
Therefore each binary file may be described as a memory section.
Alternatively a memory section may be specified as mx!args. This instructs
BE to load a memory extension, and to access the data identified
by the arguments via the memory extension. This feature allows BE to be extended
to be able to edit non-file data directly, such as sectors on a disk.
The -C dx option may be used to extend BE by the use of a disassembler
extension. This is a piece of code with a well defined interface, which BE uses
to disassemble data annotated as code.
Typical invocations of BE might be :-
be picture.bmp
to edit a file, which is loaded into the BE memory space at 0 onwards.
be -y gizmo.sym gizmo.rom gizmo.ram@0x8000
to edit dumps from the RAM and ROM of a coprocessor,
where the ROM starts at 0, and the RAM at 0x8000.
gizmo.sym is the symbol table for the microcode the coprocessor was running.
be -Y map -y ucode.map -i ucode.ini -g -C i86 coproc!io=0x400,mem=0xc0000
to live edit a running coprocessor.
ucode.map has the symbols for the microcode the coprocessor is running.
ucode.ini is a custom initialisation file.
BEcoproc.DLL provides BE with access to coprocessor memory.
io=0x400,mem=0xc0000 tells BEcoproc.DLL how to find the coprocessor.
BEi86.DLL allows BE to disassemble any code in the data.
be -d HEADER -a 512 -p -vA file.dat
display the header at 512 bytes into file.dat.
decoded data is to be written to stdout, BE is not interactive.
addresses are to be displayed next to the data.
One of the first things BE does is to find and load the initialisation file,
and this tells BE the layout of various file formats and the structures within them.
Under OS/2, Windows and DOS, BE finds the initialisation file by searching along
the path for the .EXE file, and then looking for a .INI file with
the same name.
Under UNIX, BE looks for ~/.berc, and failing that, it looks along the
path for be and then appends .ini. If be is renamed to
xx, then the files will be ~/.xxrc and xx.ini.
BE can be made to look elsewhere using the -i command line option.
This initialisation file may contain C or C++ style comments.
Also, $define, $undef, $ifdef, $ifndef, $else,
$endif and $error are supported, as a form of a pre-processing/conditional
processing step. The -D command line option may be used to pre-$define
such conditional processing symbols.
It should be noted that $define, $undef, $ifdef and
$ifndef can all be given a list of symbols (rather than just one). This
causes $define or $undef to define or undefine all the symbols
in the list. It causes $ifdef or $ifndef to check that all the
symbols in the list are defined or that they are all undefined.
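The list behaviour of $ifdef and $ifndef described above can be sketched as follows. This is a Python illustration of the stated rule only; the function names and the set of defined symbols are assumptions made for the example :-

```python
def ifdef(symbols, defined):
    """True only if every symbol in the list is defined."""
    return all(s in defined for s in symbols)


def ifndef(symbols, defined):
    """True only if every symbol in the list is undefined."""
    return all(s not in defined for s in symbols)


defined = {"UNIX", "LINUX", "LE"}          # hypothetical environment
print(ifdef(["UNIX", "LINUX"], defined))   # both defined
print(ifdef(["UNIX", "AIX"], defined))     # AIX is not defined
print(ifndef(["AIX", "DOS"], defined))     # neither defined
```

With a mixed list (some symbols defined, some not), neither test succeeds under this reading.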
If BE is running on OS/2, then OS2 is pre-$defined. If running
on Windows, then WIN32 is pre-$defined. If running on a type of
UNIX, then UNIX is pre-$defined. If running specifically on AIX,
then AIX is pre-$defined. If running specifically on Linux, then
LINUX is pre-$defined. If running on DOS, then DOS is
pre-$defined. Either BE or LE will be pre-$defined,
depending upon whether BE is running on a big-endian or little-endian machine. These
$defines allow you to write initialisation files with sensible defaults,
relevant for the current environment.
An include directive is supported, and included files
will be searched for by looking in the current directory, then along an internal
include path, along the BEINCLUDE environment variable, and finally along
the PATH environment variable. The internal include path is usually empty,
but may be appended to by the use of the -I command line option.
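The search order for included files can be sketched like this. It is a Python illustration only: the function names are made up, and it assumes that BEINCLUDE and PATH use the platform's usual path separator, which the text above does not state :-

```python
import os


def include_search_dirs(internal_path, env):
    """Directories to try, in order: current directory, internal
    include path (-I), then BEINCLUDE, then PATH."""
    dirs = ["."]
    dirs += list(internal_path)
    for var in ("BEINCLUDE", "PATH"):
        value = env.get(var, "")
        # Assumption: entries are separated by os.pathsep
        dirs += [d for d in value.split(os.pathsep) if d]
    return dirs


def find_include(name, internal_path, env, exists=os.path.exists):
    """Return the first candidate path that exists, or None."""
    for d in include_search_dirs(internal_path, env):
        candidate = os.path.join(d, name)
        if exists(candidate):
            return candidate
    return None
```

The exists parameter is there only so that the search can be tried out without touching the real file system.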
By the time the initialisation file is processed, any symbol files specified
on the command line will have been loaded, along with any data files. This means
that initialisation files may make reference to symbols and also to the data itself.
The initialisation file contains commands to set the default
data display attributes, set constant, structure
definitions, alignment declarations and include
statements.
As BE processes the initialisation file, it generates warnings (such as undefined
symbol table symbol), and error messages into an internal buffer. If there are no
errors, then this buffer is discarded. If there are errors, then all the warnings
and errors are listed, and BE aborts.
Wherever the initialisation file calls for a number, the following variants may
be used :-
Note that the commas in the [ and [[ expressions can be omitted,
although this is not recommended. Consider the expression [n32 0xf000 -5]:
this looks like it means 'the 32 bit word from address 0xf000, or -5 if it can't
be fetched', but it actually means 'the 32 bit word at 0xeffb, with no default if
it can't be fetched'. Writing [n32 0xf000 (-5)] would fix this problem,
but using commas makes the intention explicit.
It should be noted that when using the offsetof or map keywords,
leading and trailing space is not significant in the "mapletstring"
or "fieldname".
Expressions may be constructed by use of brackets and also the following operators,
with usual C language meanings. Operators grouped together have equal precedence.
Higher precedence operators listed first :-
+, -, ~, !        unary plus, unary minus, complement, not
*, /, %           multiply, divide, modulo
+, -              add (plus), subtract (minus)
<<, >>, >>>       shift left, shift right (signed), shift right (unsigned) [Note 1]
>, <, >=, <=      greater than, less than, greater than or equal, less than or equal
==, !=            equal, not equal
&                 bitwise AND
^                 bitwise exclusive OR
|                 bitwise inclusive OR
&&                logical AND
^^                logical exclusive OR [Note 2]
||                logical inclusive OR
? :               conditional expression
Note 1: The >> operator is a signed shift right, and >>>
is the unsigned shift right (much like Java). This distinction is necessary as all
numbers in BE expressions are unsigned. (This affects the outcome of expressions
like -2/2, which is 0xfffffffe/2, which is 0x7fffffff,
rather than the -1 you might expect.)
Note 2: C/C++ does not have a logical exclusive OR, but BE does for symmetry.
Note also that the operator precedence now matches that of C++.
Earlier versions of BE had fewer operators and had different precedences for
&, | and ^. This change shouldn't break anything.
Such numeric expressions can also be used when BE prompts for a number, not just
in the initialisation file.
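The effect of unsigned arithmetic, and the difference between the two right shifts, can be sketched in Python. This is an illustration only; it assumes 32 bit numbers (as the 0xfffffffe example in Note 1 suggests), and the helper names are made up :-

```python
MASK32 = 0xFFFFFFFF


def u32(x):
    """Wrap a Python integer to an unsigned 32 bit value."""
    return x & MASK32


def div_unsigned(a, b):
    """Division as seen when both operands are unsigned."""
    return u32(a) // u32(b)


def shr_signed(x, n):
    """Like >> : the top bit is copied down as the value shifts."""
    x = u32(x)
    if x & 0x80000000:
        x -= 1 << 32          # reinterpret as a negative number
    return u32(x >> n)


def shr_unsigned(x, n):
    """Like >>> : zeros are shifted in from the top."""
    return u32(x) >> n


print(hex(div_unsigned(-2, 2)))          # 0x7fffffff, not -1
print(hex(shr_signed(0x80000000, 4)))    # 0xf8000000
print(hex(shr_unsigned(0x80000000, 4)))  # 0x8000000
```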
Some example expressions :-
addr "tablebase" + 4 * sizeof RGB
-- symbol tablebase plus four times the size of the RGB definition
[ n32 be , 0x70200+0x44 ] + 27
-- fetch big-endian 32 bit word from 0x70244, then add 27
[ n16 be bits 11:4 , 0x1000 ]
-- get big-endian 16 bit word from 0x1000, extract bits 11 to 4 inclusive
-- if the word was 0x1234, this would give a result of 0x23
[[ "SIGNATURE" , 0x1000 , 0x2000 , 4 ]]
-- locate "SIGNATURE" between 0x1000 and 0x2000, 4 byte aligned
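What the fetch, bits and search examples above compute can be sketched in Python. This is an illustration of the arithmetic, not BE's implementation. The function names are made up, and the sketch assumes that the search range excludes its end address and that alignment is measured from address 0; the text above does not say either way :-

```python
def fetch_be(mem, addr, nbytes):
    """Fetch a big-endian number of nbytes bytes from mem at addr."""
    return int.from_bytes(mem[addr:addr + nbytes], "big")


def extract_bits(value, ms, ls):
    """Return bits ms down to ls (inclusive) of value."""
    width = ms - ls + 1
    return (value >> ls) & ((1 << width) - 1)


def find_aligned(mem, pattern, start, end, align):
    """First aligned address in [start, end) holding pattern, or None."""
    addr = -(-start // align) * align      # round start up to alignment
    while addr + len(pattern) <= end:
        if mem[addr:addr + len(pattern)] == pattern:
            return addr
        addr += align
    return None


mem = bytearray(0x2000)
mem[0x1000:0x1002] = b"\x12\x34"
print(hex(extract_bits(fetch_be(mem, 0x1000, 2), 11, 4)))   # 0x23
```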
BE maintains a smallish list of global numeric constants. eg:
set num_elements 14+5
Avoid using constant names which clash with other identifiers, such as map or
structure definition names. Also, avoid clashing with reserved
words in the initialisation file language.
The constant can be assigned any numeric expression, including
referencing other constants.
This feature allows initialisation files with the following technique for managing
multiple configurations of data :-
$ifdef BIG_DATA_FILE
set n_entries 100
$else
set n_entries 10
$endif
def DATA_RECORD
{
n_entries buf 100 asc "names "
n_entries n32 dec "salaries"
}
Attempting to set a constant which is already defined produces an error.
The unset command can be used to undefine a previous value. It is not
an error to unset a constant which is not previously set to anything :-
set elems 100
unset elems
set elems 200
The -S command line flag can be used to set a constant before the initialisation
file is processed. Because the constant is set before the initialisation
file is processed, the expression the constant is set to can't refer to things within
the initialisation file. Assuming the initialisation file debinfo.ini uses
a constant called tabsize :-
be -i debinfo.ini -S tabsize=10 debug.dat is fine
be -i debinfo.ini -S tabsize=10+4 debug.dat is fine
be -i debinfo.ini -S "tabsize=sizeof STRUCT" debug.dat is illegal
The special constant nosym, if set, is returned when the addr "symbol"
syntax is used in an expression, to try to determine the numeric value of a symbol
which isn't defined. The usual use of this is in defining a value which is miles
away from any sensible value.
When the program starts parsing the initialisation file, the default data display
attributes are le unsigned hex nomul abs nonull nocode nolj noseg nozterm.
To change this default setting, just include one or more of the following keywords
in the file :-
Note that when multibyte numeric values are displayed in ASCII or EBCDIC, the
ordering of the characters produced works like this :-
Type    Value         Displayed as
n8      0x41          'A'
n16     0x4142        'AB'
n24     0x414243      'ABC'
n32     0x41424344    'ABCD'
This can have the side effect that when people design eye-catcher values as numbers
to store into memory, they may appear reversed when displayed. In such cases, it
might make more sense to decode the field as a N byte ASCII buffer, rather than
a number.
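The ordering rule, and one way in which a value can come out reversed, can be sketched in Python. This is an illustration only; the function name as_text is made up, and the little-endian scenario is just one plausible case :-

```python
def as_text(value, nbytes):
    """Show a number as characters, most significant byte first."""
    return value.to_bytes(nbytes, "big").decode("ascii")


print(as_text(0x41424344, 4))     # ABCD, as in the table above

# Suppose memory holds the bytes 'A' 'B' 'C' 'D' and the field is
# decoded as a little-endian n32: the number is 0x44434241.
word = int.from_bytes(b"ABCD", "little")
print(hex(word))
print(as_text(word, 4))           # DCBA, reversed relative to memory
```

Decoding the same four bytes as a 4 byte ASCII buffer would show them in memory order instead.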
Mappings are BE's equivalent to C enumerated types and bitfield support.
These define a mapping between symbolic names and numeric values. A typical mapping
definition in the initialisation file might be :-
map compression_type
{
"uncompressed" 1
"huffman" 2
"lzw" 3
}
If the numeric value on display matches the value given, then it can be converted
to the textual description.
Bitfields may be achieved in the following fashion :-
map pending_events
{
"reconfiguration" 0x0001 : 0x0001
"flush_cache" 0x0002 : 0x0002
"restart_io" 0x0004 : 0x0004
}
The : symbol introduces an additional mask. The number to string conversion
algorithm inside BE works like this :-
for each maplet in the map
  if ( value & maplet.mask ) == maplet.value then
    display the maplet.name
if some unexplained bits left over then
  display the remaining value in hex
The case where the value and following mask are the same is much more common
than the case where they are not. So BE provides a typing shortcut where .
in the mask means 'the same as the value'. So the above example can be written :-
map pending_events
{
"reconfiguration" 0x0001 : .
"flush_cache" 0x0002 : .
"restart_io" 0x0004 : .
}
It is possible to have multiple field decodes from a single value :-
map twobitfields
{
"green" 0x0001 : 0x000f
"blue" 0x0002 : 0x000f
"red" 0x0003 : 0x000f
"small" 0x0100 : 0x0f00
"large" 0x0200 : 0x0f00
}
The value 0x0243 would be converted to red|large|0x40.
It has been alluded to above, that when supplying numeric expressions, the map
keyword may also be used. In the following example, the expression evaluates to
0x0105 :-
map twobitfields "small" + 5
In fact, if there is no constant or symbol with the same name, you can use the
following shorthand for the above example :-
small + 5
Even sophisticated mappings like the following will work as expected :-
map attribute_byte
{
"colour" 0x10 : 0xf0
"red" 0x13 : 0xff
"green" 0x14 : 0xff
"shape" 0x20 : 0xf0
"round" 0x23 : 0xff
"square" 0x24 : 0xff
}
In this example the meaning of the bottom 4 bits is dependent on the value of
the top 4 bits. The top 4 bits encode whether the attribute is encoding information
about the colour or shape of something, and the bottom 4 bits encode which colour
or shape. The value 0x23 is displayed as "shape|round".
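The number to string conversion can be sketched in Python and checked against the worked values above. This is an illustration, not BE's code. The names maplet and decode are made up, and the sketch assumes that 'unexplained bits' means the bits not covered by the mask of any maplet that matched :-

```python
def maplet(name, value, mask=None):
    """One map entry; a mask of None plays the role of '.'
    (the mask is the same as the value)."""
    return (name, value, value if mask is None else mask)


def decode(value, maplets):
    names = []
    explained = 0
    for name, val, mask in maplets:
        if (value & mask) == val:
            names.append(name)
            explained |= mask          # these bits are accounted for
    leftover = value & ~explained
    if leftover:
        names.append(hex(leftover))    # show what remains in hex
    return "|".join(names)


twobitfields = [
    maplet("green", 0x0001, 0x000F),
    maplet("blue",  0x0002, 0x000F),
    maplet("red",   0x0003, 0x000F),
    maplet("small", 0x0100, 0x0F00),
    maplet("large", 0x0200, 0x0F00),
]
print(decode(0x0243, twobitfields))    # red|large|0x40
```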
When displaying a maplet decoded value, the M key can be used to bring
up a list of the maplets and whether they decode or not. Through this, the value
can be edited.
Definitions are BE's equivalent to C structures and unions.
Definitions are a list of at OFFSET clauses, align ALIGNMENT
clauses and field definitions. When the structure definition is processed, then
the current-offset is initialised to 0.
An at OFFSET clause moves the current-offset to the specified numeric
value.
An align ALIGNMENT clause moves the current-offset to be the next integer
multiple of the specified numeric value.
A field definition defines a field which lives at the current-offset
into the structure. After definition of the field, the current-offset is moved to
the end of the field, so that the next field will immediately follow it (unless
another at OFFSET clause is used, or a union is being defined).
The size of the structure is the largest value that the current-offset ever attains.
This is the value returned whenever sizeof DEFN is used as a number.
Duplicate definitions of the same named definition are not allowed.
A structure definition may have zero or more fields, align ALIGNMENT
clauses and/or at OFFSET clauses.
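The current-offset rules above can be sketched as a small layout calculator in Python. This is an illustration only. The tuple format and the function name layout are made up, and the sketch assumes that align leaves the current-offset alone when it is already a multiple of the alignment :-

```python
def layout(items):
    """items is a list of ("at", n), ("align", n) or
    ("field", name, size_in_bytes).
    Returns (offset of each field, size of the definition)."""
    offset = 0
    size = 0
    offsets = {}
    for item in items:
        kind = item[0]
        if kind == "at":
            offset = item[1]
        elif kind == "align":
            n = item[1]
            offset = (offset + n - 1) // n * n   # round up to multiple
        elif kind == "field":
            _, name, nbytes = item
            offsets[name] = offset
            offset += nbytes                     # next field follows on
        else:
            raise ValueError("unknown item: %r" % (item,))
        size = max(size, offset)   # largest current-offset ever attained
    return offsets, size


# An n32 followed by an n8 placed back at offset 0
print(layout([("field", "first", 4), ("at", 0), ("field", "low", 1)]))
```

Here the second field overlays the first, and the size stays at 4.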
A structure definition may behave like a C struct definition, in that
each field follows on from the previous one in memory. Or it may behave like a C
union definition, in that all fields overlay each other in memory, and
the total size is the size of the largest field.
def A_STRUCTURE struct
{
n32 "first field, bytes 0 to 3"
n32 "next field, bytes 4 to 7"
// sizeof A_STRUCTURE is 8
}
def A_UNION union
{
n32 "first field, bytes 0 to 3"
n16 "second field, bytes 0 to 1"
// sizeof A_UNION is 4
}
The keyword struct is unnecessary, and may be omitted.
These may be combined, like in the following :-
def MY_COMPLICATED_STRUCTURE
{
  n32 "first field, occupying bytes 0 to 3"
  union
  {
    n32 "second field, occupying bytes 4 to 7"
    struct
    {
      n16 "the bottom 16 bits of the second field, occupying bytes 4 to 5"
      n8 "the upper middle byte, occupying byte 6"
      n8 "the top byte, occupying byte 7"
    }
  }
}
The at OFFSET clause also allows the same areas of a structure to be
displayed in more than one way, thus also allowing the implementation of unions
:-
def UNION_THE_HARD_WAY
{
n32 le "first value, bytes 0 to 3"
at 0 n8 "the lower byte, byte 0"
// sizeof UNION_THE_HARD_WAY is 4
}
Note: in the above style of example, you can't use the offsetof keyword
to position a new field on top of an earlier field, because whilst you are defining
a structure definition, it isn't actually fully defined yet, and so the offsetof
keyword will not be able to find it.
Here are some examples of field definitions :-
n8 asc "initial"
buf 20 "surname"
n16 be unsigned dec "age"
3 pet "pet names"
3 n16 be unsigned dec "pet costs"
2 n32 le unsigned hex ptr person "2 pointers to parents"
2 n32 ptr person null "2 pointers, null legal"
person "a person"
n32 sym code "__main"
1024 n32 unsigned dec "memory as 32 bit words"
9 n16 map errorcodes "results"
buf 100 asc zterm "a C style string"
GENERIC_POINTER suppress "pointer"
n32 ptr FRED add -. "link"
n32 bits 31:28 "top 4 bits"
n32 bits 27:0 "bottom 28 bits"
n32 sym code width 10 "function"
Each example is of the form :-
optional-count type optional-attrs name
The field describes count data items of the specified type, count is restricted
to being >= 1, and if it is > 1, then the field is initially displayed by
just showing its type (eg: 10 n32 le unsigned hex "numbers").
When you select the field, you are presented with an element list, with count lines,
from which you can select the element you are interested in.
The type of the data is one of n8, n16, n24, n32,
buf N or DEFN, where DEFN is the name of a previously defined
definition. This type may be considered to be the way in which BE is told the size
of the data item concerned. n8, n16, n24 and n32
mean 8, 16, 24 or 32 bit numeric data item. buf N means a buffer of N bytes.
The field has the default data display attributes, unless
data display attribute keywords (as defined above) are included in the field definition.
In addition to the data display attribute keywords given above, there is the map
MAP attribute, which means: display the numeric field by looking up a textual
equivalent of the numeric value using the mapping, which must
have previously been defined.
If the field is numeric, the bits MS:LS designation can be used to say
that only a subset of the bits fetched are to be displayed. Also, if you edit the
field, only the subset of bits are changed. BE does a read-modify-write of the numeric
field to achieve this. Despite only showing a subset of the bits, the field is still
the same 'size', and the union mechanism must be used to decode multiple
bit ranges in the same numeric field. eg:
union
{
n16 be bits 15:12 bin "top 4 bits"
n16 be bits 11:0 hex "bottom 12 bits"
}
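The read-modify-write used when editing a bits MS:LS field can be sketched in Python. This shows the general technique only, not BE's code, and the function names are made up :-

```python
def read_bits(word, ms, ls):
    """The value shown for a 'bits ms:ls' field."""
    width = ms - ls + 1
    return (word >> ls) & ((1 << width) - 1)


def write_bits(word, ms, ls, new):
    """Replace bits ms:ls of word with new, leaving the rest alone."""
    width = ms - ls + 1
    mask = ((1 << width) - 1) << ls
    return (word & ~mask) | ((new << ls) & mask)


word = 0x1234
word = write_bits(word, 15, 12, 0xA)   # edit the "top 4 bits" field
print(hex(word))                       # 0xa234
print(hex(read_bits(word, 11, 0)))     # bottom 12 bits unchanged: 0x234
```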
The ptr DEFN attribute says that the numeric value is in fact a pointer
to a definition of type DEFN. DEFN need not be defined yet in the initialisation
file. The mul/nomul attribute described above specifies whether
to multiply the pointer value by the size of the data item being pointed to. You
can use mult MULT to multiply the pointer value by MULT (therefore mul
is effectively the same as mult sizeof DEFN). The null/nonull
attribute described above specifies whether this pointer may be followed if the
numeric value is 0. The keyword add BASE may be used, and there is also
an align ALIGNMENT keyword. ALIGNMENT can only be 1, 2, 4, 8 or 16 in the
current implementation. Also, the rel/abs attribute described
above specifies whether to add the address of the pointer itself to the numeric
value. By using combinations of the pointer keywords, various effects may be achieved
:-
The procedure for following pointers is :-
The seg keyword works by taking the top 16 bits of the pointer value
as the segment, the bottom as the offset, and producing a new pointer value which
is segment*16+offset. This feature may be of use for decoding large memory model
program dumps which have been running on x86 processors running in real mode, or
a 16:16 protected mode with a linear selector mapping. This feature is not recommended;
it's much easier to use the new -g command line switch instead. Anyone
with a sensible file format to decode, or a dump taken from the memory space of
a processor of a sensible architecture, can ignore this feature.
The keyword open may be given and this has the effect of increasing
the level of detail that is initially displayed. See the description of the level
of detail of display feature later in this document. This feature has its problems
(bugs), but can be used to ensure that small arrays and short definitions are displayed
in full without the user having to increase the level of detail by hand.
Also, the suppress keyword may be used. Normally all fields are shown
when a definition is being viewed, but some can be marked as suppressed. Fields
which are suppressed are shown with their values in round brackets, when you are
viewing a definition with a field to a line. When a whole definition is shown on
one line (by expanding the level of detail of display), those fields marked with
suppress, are not shown.
The tag attribute may be given. When this field is initially displayed,
the line will initially be tagged. Typically you might pre-tag one or two specific
fields in a structure, if the structure were large, and certain fields were more
important than others.
The width WIDTH attribute may also be given. By default, field widths
are 0, which means don't pad or truncate fields when they are displayed. When set
non-0, each field (or each individual field of an array) is padded or truncated
to be the given width. If a field is truncated, a > or <
symbol is shown. The width can be changed interactively by the user.
Finally the name of the field must be given. You used to have to pad all field
names of the same definition to be the same width with spaces, so that when displayed,
everything lines up nicely. But now BE does this automatically for you.
A typical structure definition might look like :-
def FROGLISTELEM
{
n32 ptr FROGLISTELEM "next_frog_in_list"
buf 100 asc "name_of_this_frog"
}
However, consider the case that BE is being used to edit a dump of a processor's
memory space. In this case we also wish to be able to see all the global variables,
whose addresses are determined by a symbol (rather than some fixed address). So
it is typical to take advantage of the fact that fields can be placed at any offset
into a structure (using at EXPR), and that expressions may refer to the
symbol table (using addr "SYM"). You put such fields in a structure
holding global variables, which would be decoded from address 0. You'd write something
like :-
def GLOBAL_VARS
{
at addr "frog_list" n32 ptr FROG "frog_list"
...
}
Now this can be a very common idiom, and you usually want the displayed field
name to match the symbol name. So to avoid typing everything twice, BE provides
a short-cut :-
def GLOBAL_VARS
{
n32 ptr FROG at "frog_list"
...
}
Normally, when parsing a structure definition, each field
is positioned immediately after the one before (unless the union, align,
or at keywords are used).
When BE begins processing the initialisation file, it believes that all n8,
n16, n24 and n32 variables should be aligned on a 1 byte
boundary. In other words, no special alignment is to be automatically performed.
This is radically different from the way the high level languages such as C lay
out the fields within their structures and unions. These languages enforce constraints
such as '32 bit integers are aligned on 4 byte boundaries'. This is usually done
because certain processor architectures either can't access certain sizes of data
from odd alignments, or are slower doing so. This can be accounted for by manually
adding padding to structure definitions :-
def ALIGNED_USING_MANUAL_PADDING
{
n8 "fred"
buf 3 "padding to align bill on a 4 byte boundary"
n32 "bill"
}
Or alternatively, the align keyword could be used :-
def ALIGN_USING_align_KEYWORD
{
n8 "fred"
align 4
n32 "bill"
}
It is possible to tell BE to automatically align n8, n16, n24,
n32 or nested definition fields on specific byte (offset) boundaries by
constructs such as the following (which corresponds to many 32 bit C compilers)
:-
align n16 2
align n32 4
align def 4
align { 4
align } 4
def ALIGNED_AUTOMATICALLY
{
n8 "fred"
n32 "bill"
}
The align { directive specifies that nested definitions must start on
the indicated boundary. The align } directive specifies that structure
sizes get rounded up to a multiple of the alignment.
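The effect of automatic alignment on the ALIGNED_AUTOMATICALLY example can be sketched in Python. This is an illustration only. The names are made up, and nested definitions (align def and align {) are left out to keep the sketch short :-

```python
SIZE = {"n8": 1, "n16": 2, "n24": 3, "n32": 4}   # sizes in bytes


def round_up(x, n):
    return (x + n - 1) // n * n


def layout_aligned(fields, align_of, size_round=1):
    """fields is a list of (name, type). align_of maps a type to its
    alignment (default 1). size_round models 'align }'."""
    offset = 0
    offsets = {}
    for name, typ in fields:
        offset = round_up(offset, align_of.get(typ, 1))
        offsets[name] = offset
        offset += SIZE[typ]
    return offsets, round_up(offset, size_round)


fields = [("fred", "n8"), ("bill", "n32")]
print(layout_aligned(fields, {}))                          # no alignment
print(layout_aligned(fields, {"n16": 2, "n32": 4}, 4))     # as declared above
```

Without any align declarations bill sits at offset 1 and the size is 5; with them, bill moves to offset 4 and the size becomes 8.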
Clearly, this feature is more useful when BE is being used to probe memory spaces
of running programs via a memory extension, or doing post-mortem
examination of program dumps.
Most data file formats don't-need-to and/or don't-bother-to align their fields.
The initialisation file can contain the following, as long as it is outside of
any other definition :-
include "anotherfile.ini"
Be sure to notice that this is an initialisation language command, not a pre-processor
directive like $ifdef. This is why it is not $include.
There is also a tryinclude variant, which tries to open the file specified,
but does not get upset if it can't :-
The following are reserved words, and so should be avoided as names of constants
in the initialisation file :-
abs add addr align asc at be bin bits buf code dec def ebc hex include le
lj map mul mult n16 n24 n32 n8 nocode nolj nomul nonull noseg nozterm
null oct offsetof open ptr rel seg set signed sizeof struct suppress
sym tag tryinclude union unset unsigned width zterm
Here is a snippet from a real initialisation file :-
le unsigned hex abs // set defaults, just to be sure
lj // allow ARM specific symbolic lookup of code addresses
map DE_
{
"DP_Pending" -1
"DS_Success" 0
"DE_Failure" 1
}
def DPB
{
n32 ptr DPB "DPB_Next" // Link to the next one
n32 sym code "DPB_Address"
n8 map DC_ "DPB_Number"
n8 "DPB_Flag2"
n8 map SY_ "DPB_Flag"
n8 signed map DE_ "DPB_Dsb"
}
def NOP
{
DPB "NOP_Header"
n8 "NOP_Spare1"
n8 "NOP_Spare2"
n8 "NOP_Spare3"
n8 dec "NOP_Period"
n32 dec "NOP_Value"
}
def main // the entire memory map
{
at addr "noptable" 100 NOP "noptable"
at addr "currentdpb" n32 ptr DPB "currentdpb"
}
In the above example, note how the DPB_Next field points to another
DPB. As this is the first field, it will be the selected one when the DPB
is first shown. Thus, if they are strung together in a linked list, it can be a
simple matter of pressing Enter to step to the next element of the linked
list.
Sometimes, if the 'pointer to the next' field is not the first, people code the
following type of definition :-
def BLOB
{
at 4 // Goto where link is
n16 ptr rel BLOB add -. "BLOB_Next" // Note . == 4
at 0 // Go back to top
n16 hex "BLOB_FirstWord"
n16 hex "BLOB_SecondWord"
at 6 // skip link, we've already shown it at the top
n16 hex "BLOB_FourthWord"
buf 512-. hex "BLOB_PadToBlock"
}
Although messy, this re-ordering can make traversal of long linked lists significantly
faster.
This technique falls over when very long linked lists are traversed, because
you must manually select the link field and press Enter to go to the next
linked list element. This can be time consuming. Also, each level of nesting consumes
a non-trivial amount of memory.
The solution which more effectively handles linked lists of small or large lengths is the use of the show a list mechanism, which is described later.
The supplied initialisation file contains enough definitions to enable you to
examine the contents of many file formats.
Bitmap files supported include :-
Animation formats :-
Also, the following miscellaneous file formats :-
The definitions in the initialisation file are in no way complete, or intended
to be a definitive statement of such files' contents, but are merely intended to
aid in the browsing of the contents of such files.
Limitations of BE make it awkward to decode certain data structures in some files,
so the attitude taken is typically 'display as best you can', and where data may
be of variable length 'display the first few bytes worth...'.
If you are simply interested in looking at some of the raw file contents, you can use
the DB, DW and DD definitions that come supplied in the
default initialisation file. If you wanted to look at memory at 0x8000 as dwords,
you could type :-
BE displays most of the non-obvious keys you may press on the 2nd line of its
status area, at the top of the screen.
BE works by presenting lists to the user. These can be lists of data fields, lists of array elements etc. A user action can result in a new list being displayed on top of the previous one. Effectively, there is a 'stack' of lists, where you always get to see the topmost one. The level of nesting is always on display at the top right hand corner of the screen.
Although not displayed, the arrow keys, such as Up, Down, PgUp,
PgDn, Home, End, Left and Right all work in the
obvious ways, traversing the list on display. The Wordstar keys ^E, ^X,
^R, ^C, ^W, ^Z, ^S and ^D also work.
As you move around the current list, your line number and total number of lines
in the list are shown on the top right of the screen in the form line/totallines.
The user can discard the current list, and go back to the previous one by pressing
Esc.
q or @X (ie: Alt+X) exits the program. If you have made any changes,
you will be prompted as to whether BE should write them out to disk. @W can
write out any unsaved changes.
p allows you to 'print' the list on display to a file. You can specify
the filename, and whether to append to or overwrite any existing file of that name.
Non-printable (but displayable) characters get converted to '.' dots.
f or / or F9 allows you to do a find over the list on display.
This only searches as much as the user could see if he were to manually page up
and down through the list. The find command is case sensitive. n or F10
can be used to repeat the last find. If a find is taking a long time, it may be
interrupted using Ctrl+Break on OS/2, Windows and DOS. On AIX or Linux, the
Esc key may be used. The \ key will reverse the direction of the find,
ready for when you next use the 'repeat the last find' function.
i allows you to generate a new list, which only has lines which include
a pattern you specify. This new list pops-up on top of the current one. For example,
if you have an array of trace-point events, you can easily generate a list of just
trace-points from one module. Similarly, x allows you generate a display
which excludes lines which match the pattern.
S can be used to generate a new list which is the same as the current
list, except the lines are sorted. You are prompted for a 'sort after' pattern,
and as to whether the result is to be sorted in ascending or descending order. Anything
on each line, up to and including the 'sort after' pattern, is ignored for the purpose
of the sort.
The find, include and exclude commands normally do a straight case sensitive
textual comparison. The editor can be toggled in and out of Extended Regular Expression
mode (as in UNIX egrep), using the @R key. When set into this mode, future
finds, includes and excludes all work with extended regular expressions. eg: include
(fred|bill)[0-9]+ will include all lines with 'fred' or 'bill', followed
by one or more digits.
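The include and exclude behaviour described above can be sketched in a few lines of C++. This is only an illustration of the idea using the standard std::regex class in POSIX extended mode; filter_lines is a hypothetical helper, not a function from BE, and BE's own matcher may differ in detail.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper, not BE's code: keep the lines that contain a match
// for an extended regular expression (keep=true, like 'include'), or keep
// the lines that do not (keep=false, like 'exclude').
std::vector<std::string> filter_lines(const std::vector<std::string>& lines,
                                      const std::string& pattern,
                                      bool keep, bool ignore_case = false)
{
    std::regex::flag_type flags = std::regex::extended;
    if (ignore_case)
        flags |= std::regex::icase;
    const std::regex re(pattern, flags);

    std::vector<std::string> out;
    for (const auto& line : lines)
        if (std::regex_search(line, re) == keep)
            out.push_back(line);
    return out;
}
```

Calling filter_lines(lines, "(fred|bill)[0-9]+", true) on a list would keep a line such as 'bill7 read' but drop 'bill closed', since the latter has no digits after the name.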
Similarly, @I can be used to toggle in and out of case sensitive search
mode.
The Extended Regular Expression mode and case sensitivity mode also affect the sort
command. The sort command and the use of Extended Regular Expression mode go naturally
hand in hand, because you often want to be able to sort upon the Nth field of each
line. It is trivial to write an ERE like ,[^,]*, which matches the first
pair of commas (so the sort can be done on the third field), or 0x[0-9a-f]+
which matches the first hex number.
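As a rough sketch of how a 'sort after' pattern could work, the following C++ takes the text following the first match of the pattern as the sort key. The names sort_key and sort_lines are hypothetical, and treating a line with no match as keyed on the whole line is an assumption of this sketch, not documented BE behaviour.

```cpp
#include <algorithm>
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: the sort key is whatever follows the first match of
// the 'sort after' pattern. A line with no match is keyed on the whole line
// (an assumption made for this sketch).
std::string sort_key(const std::string& line, const std::regex& sort_after)
{
    std::smatch m;
    if (std::regex_search(line, m, sort_after))
        return m.suffix().str();
    return line;
}

// Return a sorted copy of the lines, leaving the original list untouched.
std::vector<std::string> sort_lines(std::vector<std::string> lines,
                                    const std::string& pattern, bool ascending)
{
    const std::regex re(pattern, std::regex::extended);
    std::stable_sort(lines.begin(), lines.end(),
        [&](const std::string& a, const std::string& b) {
            return ascending ? sort_key(a, re) < sort_key(b, re)
                             : sort_key(b, re) < sort_key(a, re);
        });
    return lines;
}
```

With the pattern ,[^,]*, the key for 'a,1,zeta' is 'zeta', so the list is ordered on its third comma separated field.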
The Extended Regular Expression mode and case sensitivity mode also affect the
'power address slide' patterns, and tag/untag all matching commands, as explained
later.
The r key causes a refresh. BE re-fetches all the data on display. The
R key is a slightly more aggressive form of refresh. If a memory
extension providing data to BE was caching data, this type of refresh causes
it to drop its cache. Sometimes BE is used with an extension to watch live real-time
data, and continual refresh is desired. By pressing the periodic update key, @U,
you can put BE into a mode whereby it refreshes at regular intervals. The interval
is user-selectable. You exit this mode using Ctrl+Break on OS/2, Windows
or DOS, or by using Esc on AIX or Linux.
Tags may be placed or removed within the list on display by pressing the @T
key. You may quickly move backwards or forwards between tags by pressing ^Home
or ^End. Tags appear as little 'T's on the right hand side of the line. Placing
or removing tags in one session or list has no effect on any others.
T and U may be used to tag or untag all lines matching a given
pattern or extended regular expression.
The ! key may be used to execute an operating system command. This capability
can be disabled by the -r command line flag.
@V can be used to bring up a view of a regular text file. There is no
text editing capability. As special cases, F1 tries to bring up the help file,
and F2 tries to bring up the configuration file.
BE doesn't just maintain a single stack of lists. In fact it maintains 10 parallel
stacks, or 'sessions'. You can jump between them using the @0, @1,
... @9 keys. This allows you to be looking at several places within your
data at once, and to be able to easily hop between them. The current session number
is the second from last number on display on the top right corner of the screen.
It is initially 1.
@C copies the stack of lists from the previous session onto the current
session. Typically you use this when you've found something interesting, and you'd
like to leave the current session showing the interesting data, and yet you'd also
like to continue investigations around that area.
Given there are 10 sessions, each with any amount of nesting, it can be easy
to get lost, so the @K allows you to generate a summary of where you are
in each session.
@Z may be used to pop off all the lists in the current session, and effectively
reset the nesting level to 1.
@F1 to @F4 inclusive may be used to change the colour scheme to
scheme 0 to 3, as initially specified by the -c command line argument,
or as initially defaulting to 0.
The keys A,O,L,I toggle the display of addresses, offsets, lengths and
array indices. @A, @E, @B, @O, @D and @H
may be used to set the display mode of the array indices to ASCII, EBCDIC, binary,
octal, decimal or hex. Also, @Y toggles the display of addresses between
raw hex, and symbol table entry and offset. The @J command toggles the display
of symbolic code addresses which have the lj attribute between the short
and long forms. By default, at startup, BE chooses only to show array indices, the
array index mode is hex, addresses are not shown symbolic, and long jumps are shown
in their short form. The -v command line flag can also be used to change
the startup display flags.
The | (pipe-bar) key toggles the display of pipe bars between flags
in a mapping. This is typically only used when a mapping has been cleverly defined
to do something like RISC instruction set disassembly, to tidy up the display.
Pressing @ will cause BE to prompt for a structure definition
name, and then an address. It will then pop-up a new list, decoding the memory at
the given address as if it were of the specified structure type.
The C allows you to disassemble from a given address, assuming a disassembler
extension has been supplied to BE via the -C command line argument.
D can be used to pass user-options through to the disassembler.
Initially, if a symbol table is supplied to BE, disassembly stops when the addresses
symbols (as in symbol+offset) change. ie: BE stops disassembling more than one function.
Although one compiled C function typically has one label, hand written assembler
tends to have many labels within one function, so the Y key can toggle between
stopping on label changes and ignoring them.
The @F key pops up a list of the memory sections BE is editing. There
is one for each file (or memory extension invocation) currently
being edited. Against each, BE says whether it has any unsaved changes.
The editor holds a list of 12 'address slide' patterns, and these may be displayed
by pressing @M. These are used when the 'power address slide' feature is
used. You can set one of the 12 patterns by using the ~F1 to ~F12
keys. To disable one, you specify a new pattern as an empty string.
The editor holds an 'address slide' delta value. Initially this delta value is
4, but it may be changed using the # key. When using #, dot '.' may
be used in the numeric expression, and its current value is the current delta value.
This delta value is used by the manual 'address slide' feature using the <
and > keys, and also the 'power address slide' feature.
If you press ?, BE will prompt for a numeric expression, which it will
then evaluate. It will show the result in unsigned decimal, signed decimal and unsigned
hex.
When you use the ^L key, you are prompted for a count and a keystroke.
BE presses the keystroke on the current line, and then steps down a line. It does
this once for each of the count of lines you specified. The count value can be 0
or blank, meaning up to the end of the list on display. This keypress, step down
and repeat loop, will stop if the keypress is not 'understood' by the line it is
pressed on. This means that only keypresses which operate on a given line are sensible
for using with ^L. It will also stop if the end of the list is reached.
@G can be used to go to the Nth line on display. 0 means the first line, a blank line number, or a very large number means the very last line.
At any given time you may be displaying some data from some start address, as
indicated on the title at the top of the screen.
The . key can be used to change the current address, and the ,
key can be used to add to the current address.
The editor provides a feature known as 'address sliding'.
You can use the ( and ) keys to step (slide) the address backwards
or forwards by 1.
You can also use the < and > keys to step (slide) the address
backwards or forwards by a particular delta (as setup by the # key, described
above).
The 'power address slide' feature is the combination of regular 'address sliding'
with a pattern match capability. You set up the power address slide patterns and
then press [ or ] (for a backwards or forwards search). You then state
whether one, all, or all-in-order of the patterns must match, and how to refresh
the screen as the search proceeds. You're also prompted for an address to stop at.
BE then slides through memory, checking to see whether the patterns can be matched
with the screen, and if so it stops.
A 'power address slide' may be interrupted via Ctrl+Break (OS/2, Windows
and DOS), or Esc (AIX or Linux).
There are a few main uses of address sliding :-
The justification for the default delta of 4 is that many structures within processor
memory spaces or within files are 4 byte aligned.
The @ command described earlier works a little better when you are viewing
data, because a dot used in the numeric address expression is taken to mean the
current address (as shown on the title).
Similarly, the C command described earlier works a little better when
you are viewing data, because a dot used in the numeric address expression is taken
to mean the current address (as shown on the title).
Often you may find yourself looking at a definition that is actually a member of a larger definition. If you know the offset of the smaller definition in the larger definition, you can subtract this from the current address and display the larger parent definition. This can be awkward, so the @P key will pop-up a list of all possible parent definitions, with an entry for every time the smaller definition appears in another definition.
g/l is displayed if you are allowed to change the memory interpretation
mode to big or little endian.
s/u is displayed if you are allowed to change the signed display mode
to signed or unsigned.
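To make the effect of the g/l and s/u modes concrete, here is a small C++ sketch of decoding the same bytes as big or little endian, and as signed or unsigned. The function names are illustrative only and are not taken from BE.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assemble n bytes (1 to 8) into an unsigned value, in either byte order.
std::uint64_t decode_unsigned(const unsigned char* p, std::size_t n, bool big_endian)
{
    std::uint64_t v = 0;
    for (std::size_t i = 0; i < n; i++)
    {
        // Always consume the most significant byte first.
        unsigned char byte = big_endian ? p[i] : p[n - 1 - i];
        v = (v << 8) | byte;
    }
    return v;
}

// Reinterpret the low n*8 bits of v as a two's complement signed value.
std::int64_t to_signed(std::uint64_t v, std::size_t n)
{
    if (n < 8 && (v & (std::uint64_t(1) << (n * 8 - 1))))
        v |= ~std::uint64_t(0) << (n * 8);   // sign-extend the top bit
    return static_cast<std::int64_t>(v);
}
```

For example, the two bytes FE FF read little endian give 0xFFFE, which is 65534 unsigned or -2 signed.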
A subset of the keys a/e/b/o/d/h/y/m may be displayed if you are allowed
to change the viewing mode to ASCII, EBCDIC, binary, octal, decimal, hex, symbolic
or via a mapping table.
z is displayed if you are allowed to toggle the 'stop displaying when
a nul terminator is found' attribute.
The t will decode the current field as if it were raw ASCII text, and
will break it up into lines upon CR, LF or CR-LF pair boundaries. The new line-by-line
list pops-up on top of the current list.
If the datum is a code address (marked with the code attribute in the
initialisation file), then c can disassemble the code at that address.
+/- is displayed to indicate that the level of detail of display may be
increased or decreased. Level 0 means display the data type only. Level 1 means
display the first level of data. Levels 2 and above mean display additional levels
of detail.
Increasing the level of display can make BE open up an array, and enumerate the
elements. eg: 3 n32 to [123,123,456].
Increasing the level of display can also make BE open up a definition, and display
the fields. eg: VAR to {"name",123}.
This is capable of opening up the datastructure pointed to by a pointer, providing
the pointer may be fetched and followed.
Some examples :-
Level 0 (type)   Level 1        Level 2             Level 3
n32              7              7                   7
3 n32            3 n32          [8,9,10]            [8,9,10]
VAR              VAR            {"a",1}             {"a",1}
2 VAR            2 VAR          [VAR,VAR]           [{"b",2},{"c",3}]
n16 ptr VAR      22->VAR        22->{"d",4}         22->{"d",4}
2 n8 ptr VAR     2 n8 ptr VAR   [33->VAR,44->VAR]   [33->{"e",5},44->{"f",6}]
Enter is displayed if you can press enter to either show the contents
of the sub-definition, or to follow a pointer and show the definition there. This
results in a new list of fields or array elements being popped-up. The Esc
key brings you back to where you are now.
There is a shorthand of the above @ command. If you are on a numeric field,
and you know this is an absolute pointer to a structure definition,
you can use the follow pointer key *. BE will then prompt for the definition
name. This shortcut ignores any pointer information that may be deducible from
the value on display, so even if you are looking at a relative pointer which is
aligned, BE will decode a definition at an absolute address.
The editor provides the @L key, which makes the job of following long
linked lists especially easy. If you are looking at the members of a definition,
and are on a member which is in fact a pointer to the same type of definition,
then you can use the @L (show list) key. You will be presented with the elements
in the linked list (at least the first 4000), and at the end the reason the link
following ended. This reason can be that there are too many to show at once, 'can't
fetch value', 'can't follow null pointer', or the list has 'looped back' to an element
shown earlier. If your list is really long, you can always go to the last linked
list element on display, select it, and then use the @L key again to get
the next 4000 elements!
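The list following behaviour, with its four reasons for stopping, can be sketched as below. This is an illustration only: memory is modelled as a byte vector, the link is assumed to be a 32 bit little endian absolute pointer at a fixed offset within each element, and the names ListWalk, fetch32 and walk_list are invented for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <string>
#include <vector>

struct ListWalk
{
    std::vector<std::uint32_t> elements;  // addresses of the elements visited
    std::string reason;                   // why the walk stopped
};

// Fetch a 32 bit little endian value; returns false if it is out of range.
bool fetch32(const std::vector<unsigned char>& mem, std::uint32_t addr, std::uint32_t& v)
{
    if (mem.size() < 4 || addr > mem.size() - 4)
        return false;
    v = mem[addr] | (mem[addr + 1] << 8) | (mem[addr + 2] << 16)
      | (std::uint32_t(mem[addr + 3]) << 24);
    return true;
}

// Follow the 'next' pointer found at next_offset within each element.
ListWalk walk_list(const std::vector<unsigned char>& mem, std::uint32_t first,
                   std::uint32_t next_offset, std::size_t limit = 4000)
{
    ListWalk w;
    std::set<std::uint32_t> seen;
    std::uint32_t addr = first;
    for (;;)
    {
        if (w.elements.size() == limit) { w.reason = "too many to show at once"; break; }
        w.elements.push_back(addr);
        seen.insert(addr);
        std::uint32_t next;
        if (!fetch32(mem, addr + next_offset, next)) { w.reason = "can't fetch value"; break; }
        if (next == 0)        { w.reason = "can't follow null pointer"; break; }
        if (seen.count(next)) { w.reason = "looped back"; break; }
        addr = next;
    }
    return w;
}
```

Remembering every address already visited is what allows a 'looped back' list to be detected rather than followed forever.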
The = key may be used to edit the current field on display.
If the current field is a numeric value, then you can type a new expression,
according to the rules for numbers and expressions used when
parsing the initialisation file. Dot '.' evaluates to the field's current numeric
value. Examples include :-
1
1+2
addr "symbol"
sizeof RGBTRIPLE
map FF_ "FF_Split" | 0x20
If the current field is displayed via a mapping table, then
the M key can be used to bring up a list of the maplets, and whether each
of them can be decoded from the numeric value. The current field's value can be edited
from this new list. Esc quits the maplet list.
If the current field is a buffer, then either ASCII data or raw hex bytes may
be supplied :-
"a string within quotes"
@1234FF00
If the zterm attribute is applicable to the current field, then after
the data is stored, a NUL terminator is appended.
The @S key toggles the suppress attribute of the current datum. This affects
how the current structure shall be displayed, when displayed in short. The @N
key unconditionally sets the suppress attribute of the current datum. Only non-suppressed
fields are shown in the one line summary.
w can be used to set the field width. Normally fields are shown in as
many characters as are necessary. This corresponds to a field width value of 0.
When non-0, fields are padded or truncated to the indicated width.
Del and Ins can be used to copy and paste between the current datum
and a memory clipboard or file. To use the memory clipboard, simply specify a blank
filename when prompted. Only smallish blocks of data (<=4MB) can be copied or
pasted. The amount of data transferred is always the minimum of the datum size,
the clipboard size and 4MB.
The external edit key, E, works by prompting you for an editor command.
It then saves away the current datum into a temporary file and invokes the editor
on it. Afterwards, the file contents are re-read. At most 4MB can be processed in
this way. This might be useful if a file contained a chunk of free-flow text, and
you wished to perform some complicated editing on it, involving inserting and deleting
- you could externally edit that chunk using a text editor. Or, sometimes when editing
binary data, you might like to see it in a typical hex dump and edit raw hex - you
can externally edit with a normal hex editor. This command doesn't work if BE is
running in restricted mode, ie: has been invoked with the -r command line
argument.
Z will zero the current datum. Only datums of 4MB or smaller can be zeroed.
Each possible maplet in the mapping is displayed in the list.
Each maplet has a mask and value, and the maplet is deemed to match if :-
value & maplet.mask = maplet.value
In this case a 1 is displayed next to it, otherwise a 0 is
shown.
If you press 0 then the value is anded with the complement of the maplet's mask.
If you press 1 then the value is anded with the complement of the maplet's mask,
and then the maplet's value is or-ed in.
Although this may seem strange, the net effect is that when maps are being used
for enumerations, 1 will change the value from whatever it was before to
the new desired value.
When the mapping is used for decoding bitfields, 1 will turn on a bit
and 0 will turn it off.
Examples of enumeration and bitfield style mappings :-
map ENUMERATION              map BITFIELD
  {                            {
  "first value"  1             "lowest bit" 0x01 : 0x01
  "second value" 2             "next bit"   0x02 : 0x02
  "third value"  3             "high bit"   0x80 : 0x80
  }                            }
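The matching rule and the effect of the 0 and 1 keys can be written down directly. The sketch below is not BE's code; the Maplet structure and function names are invented, and it assumes that an enumeration style maplet (written with a value but no explicit mask) has a mask covering every bit.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of one maplet: a mask and a value.
struct Maplet
{
    std::uint32_t mask;
    std::uint32_t value;
};

// A maplet matches when the masked bits of v equal the maplet value.
bool maplet_matches(std::uint32_t v, const Maplet& m)
{
    return (v & m.mask) == m.value;
}

// Pressing 0: clear the bits covered by the mask.
std::uint32_t press_0(std::uint32_t v, const Maplet& m)
{
    return v & ~m.mask;
}

// Pressing 1: clear the bits covered by the mask, then or-in the maplet value.
std::uint32_t press_1(std::uint32_t v, const Maplet& m)
{
    return (v & ~m.mask) | m.value;
}
```

With an all-ones mask, press_1 replaces the whole value, which is the enumeration behaviour. With a mask of 0x80 and a value of 0x80, press_1 and press_0 simply turn the high bit on and off.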
If the current line of code references another routine or code address,
c can be used to pop up another list of the referenced routine.
Similarly, if data is referenced, and the address is easily determinable by the disassembler, the * can be used to follow a pointer and display a structure at that address.
W can be used to write back any unsaved changes on the current memory
section. This isn't normally necessary, as when you leave BE using q or @X,
you are prompted as to whether you wish to save any unsaved changes on a memory
section by memory section basis.
o can be used to pass a user-supplied option string to the memory extension piece of code providing the memory section. The memory extension is given the memory section instance and the option string. It can parse the option string in any way it sees fit. If there is a syntax error, or other problem, it can fail the options command with an error message to say why. If a memory section is provided from a file, this command will fail (files have no options). This user-exit mechanism might be used to allow you to tell a memory extension to change how much caching it can do.
These are shown in the list brought up by the @M key, as described earlier.
It is a list of 12 entries, each of which may be disabled, a pattern or an Extended
Regular Expression.
You can set one of the entries using the = key. This is the same as using ~F1 to ~F12.
Many of the keystrokes listed above were chosen so as to match the default key
bindings of Andys Source Code Folding Editor (AE).
Although OS/2, Windows, AIX, Linux and DOS machines are able to support Alt keys,
not all UNIXes are. In fact Alt key support for UNIX can vary depending upon terminal
types. Therefore BE provides a 'feature' whereby Esc quickly followed by
a key is equivalent to pressing Alt and the key together.
This editor has extended regular expression support as in UNIX egrep.
<character>     matches <character>
\<character>    matches <character> (escaping special meaning).
.               matches any single character.
[<class>]       matches any character in <class>.
[^<class>]      matches any character not in <class>.
<re>?           matches zero or one occurrence of <re>.
<re>+           matches one or more occurrences of <re>.
<re>*           matches zero or more occurrences of <re>.
<re><re>        2 regular expressions catenated form a <re>.
<re>|<re>       matches one <re> or the other <re>.
^               matches the left most position (left-anchor).
$               matches the right most position (right-anchor).
The \{n,m\} notation, often used to indicate between n and
m occurrences of a regular expression, is not supported.
The term 'bracket expression' has been introduced by POSIX to describe the character
class constructs found within [ and ] brackets.
A \ within a bracket expression escapes the special meaning of the next
character.
The following POSIX character class identifiers may be used within a bracket
expression. These correspond pretty much one-to-one to the isxxx(ch) family
of functions found in <ctype.h> on most systems. POSIX introduced
these variants as they generally increase portability. In summary :-
[:alnum:]     alphanumeric characters
[:alpha:]     alphabetic characters
[:blank:]     space and tab characters
[:cntrl:]     control characters
[:digit:]     numeric characters
[:graph:]     printable and visible (non-space) characters
[:lower:]     lowercase characters
[:print:]     printable characters (including space)
[:punct:]     punctuation characters
[:space:]     whitespace characters
[:upper:]     uppercase characters
[:xdigit:]    hexadecimal digits
POSIX collating symbols and POSIX equivalence classes allowed within bracket
expressions are not supported.
As BE is often used for viewing memory dumps from embedded programs, support for symbol tables is highly desirable. Although BE technically need only support one format, it actually supports a few of the more commonly used formats to avoid a proliferation of symbol file conversion programs.
The arm symbol format is the default. Each non-blank line in the symbol file has the symbol name, followed by a number of spaces, followed by the address specified in hex (without an 0x prefix). Additional information is sometimes present on the end of the line (particularly if overlays are used), but this is ignored.
On a Linux computer, the 'proc' filesystem provides a special file called /proc/ksyms.
Each line of this file has an address in hex (without an 0x prefix), followed
by a space, followed by the symbol name.
This is the ksyms symbol table format.
eg:
be -Y ksyms -y /proc/ksyms kernel.dat
-- assuming kernel.dat is a dump of the kernel's memory
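For illustration, a line in the ksyms format as described above (hex address without a 0x prefix, a space, then the symbol name) could be parsed along these lines. Symbol and parse_ksyms_line are hypothetical names, and any further fields on the line are simply ignored in this sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>

struct Symbol
{
    std::uint64_t addr;
    std::string name;
};

// Parse "<hex address> <symbol name>"; returns false for lines that don't fit.
bool parse_ksyms_line(const std::string& line, Symbol& sym)
{
    std::istringstream in(line);
    // std::hex makes the extraction accept hex digits without a 0x prefix.
    return static_cast<bool>(in >> std::hex >> sym.addr >> sym.name);
}
```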
The nm command on an AIX 4.1 or later machine generates output which
is understood by the aix_nm symbol table format.
Typically nm is invoked with the -e argument, so that only
external symbols get listed.
Each line has the symbol name, followed by a symbol type character, followed
by an address and optionally followed by a length. Fields are separated with white
space. Addresses and lengths are 0x preceded if they are listed in hex
(this is caused by invoking nm with the -x flag).
BE ignores 4 byte type d data entries from the table, as these tend
to refer to TOC entries.
BE also ignores machine generated symbols which start _$STATIC.
C++ symbol names are typically listed demangled, and so can contain spaces. BE has quite complicated special logic to handle this.
The map format corresponds to the .map files written by the 16
bit DOS link.exe program.
This has a section at the beginning of the file which declares segment names,
positions and sizes. BE ignores this.
Next the symbols are listed, ordered by name, and BE ignores this too.
Finally the symbols are listed again, ordered by value. BE reads this data.
Each line is of the form :-
SSSS:OOOO SymbolName
BE enters an entry in the symbol table of value 0xSSSSOOOO for each
symbol. This works well in conjunction with BE's -g command line argument.
eg:
be -Y map -y embedded.map -g dump.dat@0xf0000
-- assuming embedded.map is the map file from linking some embedded
-- application, and that dump.dat is a dump of the memory starting
-- at physical location 0xf0000
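The relationship between a SSSS:OOOO map file entry, the 0xSSSSOOOO symbol value and a physical address can be sketched as follows. The names are invented for the example, and the seg:off to physical conversion shown is the usual real mode 8086 rule (segment times 16, plus offset), which is assumed here to be what -g applies.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

struct MapSymbol
{
    std::uint32_t value;   // 0xSSSSOOOO
    std::string name;
};

// Parse a "SSSS:OOOO SymbolName" line; returns false for any other line.
bool parse_map_line(const std::string& line, MapSymbol& sym)
{
    unsigned seg = 0, off = 0;
    char name[128];
    if (std::sscanf(line.c_str(), "%4x:%4x %127s", &seg, &off, name) != 3)
        return false;
    sym.value = (std::uint32_t(seg) << 16) | off;
    sym.name = name;
    return true;
}

// Real mode 8086 rule: physical = segment * 16 + offset.
std::uint32_t segoff_to_physical(std::uint32_t value)
{
    return ((value >> 16) << 4) + (value & 0xFFFF);
}
```

Under that assumption, a symbol listed at F000:0100 gets the value 0xF0000100, which maps to physical address 0xF0100 and so falls inside a dump loaded at 0xf0000.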
The binary file arguments to BE are normally of the form :-
filename[@address]
This tells BE to load the file and whenever data at a memory address from address
to address+filelength is accessed, to supply the data from the file.
However, it is possible to supply binary file arguments of the form :-
extension!args[@address]
Memory extensions may be written to provide either read-only, or read-write access
to their data.
BE loads the memory extension DLL or shared library. It then passes the args
and address to the memory extension, which does something of its own choosing
with them. The memory extension DLL can then supply data to BE on request.
One use of the BE memory extension feature is the provision of a memory extension
for handling files too massive to load into memory all at once. The memory extension
opens a file handle and reads bytes demanded by BE upon request. Source for BEBIG
is included in this document. The user can type :-
be big!verybigfile.dat
It ought to be noted that the author regularly uses BE on files of several megabytes
in size, without a problem. However, files of several gigabytes would present a
problem!
Another use is in the live debugging of running adapter cards. The memory extension
can provide data bytes directly from the memory space of the adapter. args
could be used to identify the slot the adapter is in. Alternatively, args
could identify IO base addresses, memory window addresses, or a device driver to
use to access the data. Memory extensions which do this, do exist, and they almost
turn BE into a debugger (almost, because there is no run, stop, or single step).
Run, stop and single step of an adapter could be driven by the options mechanism,
if that were possible and/or desired. When using these, a customised initialisation
file is typically also used, which understands all the structure definitions and
variables used in the firmware on the adapter.
Yet another use might be providing BE with access to physical or virtual or
process specific linear address spaces, perhaps via the use of a device driver.
Shared memory windows might give addressability of datastructures in other programs.
A simple example of this is a memory extension which reads bytes from the /dev/kmem
special device in the AIX or Linux environment. Using this, kernel device drivers
may be debugged.
Also, the surface of a disk or block device can be made accessible via a memory
extension. Again, a memory extension which does this does exist (but it uses a non-standard
mechanism for accessing the disk blocks). BE could then debug and repair filesystem
data.
Perhaps bytes sent down a communications port could be made to appear as a stream
of binary data.
The file bememext.h documents the extension interface. Currently extensions
may be built for :-
BEBIG is a simple memory extension for accessing enormous files. The source for
it is included here primarily as a reference for writing others. Despite not implementing
the full richness of the memory extension interface, it should serve well to get
you writing and testing your own extensions.
The C++ source code, bebig.C, looks like :-
//
// bebig.C - BE memory extension for editing massive files
//
// This is a rather simple implementation that simply seeks
// around an open file, and gets and puts bytes.
//

#include <stdio.h>
#include <ctype.h>
#include <string.h>
#include <stddef.h>
#include <stdlib.h>
#include <memory.h>
#include "bememext.h"

class BeMem
    {
    FILE *fp;
    unsigned base_addr, len;
    Boolean read_only;
public:
    BeMem(FILE *fp, unsigned base_addr, Boolean read_only)
        :
        fp(fp), base_addr(base_addr), read_only(read_only)
        { fseek(fp, 0L, SEEK_END); len = ftell(fp); }
    ~BeMem() { fclose(fp); }
    Boolean read(unsigned addr, unsigned char & b)
        {
        addr -= base_addr;
        if ( addr >= len )
            return FALSE;
        if ( fseek(fp, addr, SEEK_SET) != 0 )
            return FALSE;
        return fread(&b, 1, 1, fp) == 1;
        }
    Boolean write(unsigned addr, unsigned char b)
        {
        if ( read_only )
            return FALSE;
        addr -= base_addr;
        if ( addr >= len )
            return FALSE;
        if ( fseek(fp, addr, SEEK_SET) != 0 )
            return FALSE;
        return fwrite(&b, 1, 1, fp) == 1;
        }
    };

BEMEMEXPORT Boolean BEMEMENTRY bemem_read(
    void * ptr, unsigned addr, unsigned char & b
    )
    {
    BeMem *bemem = (BeMem *) ptr;
    return bemem->read(addr, b);
    }

BEMEMEXPORT Boolean BEMEMENTRY bemem_write(
    void * ptr, unsigned addr, unsigned char b
    )
    {
    BeMem *bemem = (BeMem *) ptr;
    return bemem->write(addr, b);
    }

BEMEMEXPORT void * BEMEMENTRY bemem_create(
    const char *args, unsigned addr, const char *(&err)
    )
    {
    FILE *fp;
    BeMem *bemem;
    if ( !memcmp(args, "RO:", 3) )
        {
        if ( (fp = fopen(args+3, "rb")) == 0 )
            { err = "can't open file in read only mode"; return 0; }
        bemem = new BeMem(fp, addr, TRUE);
        }
    else
        {
        if ( (fp = fopen(args, "rb+")) == 0 )
            { err = "can't open file in read/write mode"; return 0; }
        bemem = new BeMem(fp, addr, FALSE);
        }
    if ( bemem == 0 )
        { fclose(fp); err = "out of memory"; return 0; }
    return (void *) bemem;
    }

BEMEMEXPORT void BEMEMENTRY bemem_delete(
    void * ptr
    )
    {
    BeMem *bemem = (BeMem *) ptr;
    delete bemem;
    }

#ifdef DOS32
// Note: Required due to the way DOS CauseWay DLLs are constructed
int main(int term) { term=term; return 0; }
#endif

#ifdef AIX
// Note: The need for this section may vanish if the AIX version of BE
// stops using loadAndInit to load shared libraries, and uses dlopen.
extern "C" {
BEMEM_EXPORT * __start(void)
    {
    static BEMEM_EXPORT exports[] =
        {
        (BEMEM_EP) bemem_read  , "bemem_read"  ,
        (BEMEM_EP) bemem_write , "bemem_write" ,
        (BEMEM_EP) bemem_create, "bemem_create",
        (BEMEM_EP) bemem_delete, "bemem_delete",
        (BEMEM_EP) 0           , 0
        };
    return exports;
    }
}
#endif
Yes, sure, I could use C++ streams and I could cache the data read, but this
is supposed to be just a simple example.
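For readers who do want caching, the following is one possible sketch of a single block read cache over a FILE pointer. It is not part of BE or BEBIG: the CachedReader class, its 4096 byte block size and its fills counter are all invented for the example, and it works on plain file offsets rather than the bememext.h interface.

```cpp
#include <assert.h>
#include <stdio.h>

// Hypothetical read path with one cached block, refilled on a miss.
class CachedReader
{
    enum { BLOCK = 4096 };
    FILE *fp;
    unsigned char buf[BLOCK];
    long block_start;     // file offset of buf[0], or -1 when the cache is empty
    size_t block_len;     // number of valid bytes in buf
public:
    unsigned fills;       // how many times the cache has been refilled
    CachedReader(FILE *fp) : fp(fp), block_start(-1), block_len(0), fills(0) {}
    // Forget the cached block, so the next read goes back to the file.
    void drop() { block_start = -1; block_len = 0; }
    bool read(long offset, unsigned char & b)
    {
        if ( offset < 0 )
            return false;
        if ( block_start < 0 || offset < block_start ||
             offset >= block_start + (long) block_len )
        {
            // Miss: load the aligned block containing the offset.
            long start = offset - offset % BLOCK;
            if ( fseek(fp, start, SEEK_SET) != 0 )
                return false;
            block_len = fread(buf, 1, BLOCK, fp);
            block_start = start;
            fills++;
            if ( offset >= block_start + (long) block_len )
                return false;   // offset is beyond the end of the file
        }
        b = buf[offset - block_start];
        return true;
    }
};
```

A real extension built this way would need to call something like drop() whenever it writes to the file, and when BE asks for the more aggressive form of refresh.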
Under OS/2, using IBM Visual Age C++, a module definition file, bebig.def,
is needed :-
LIBRARY BEBIG INITINSTANCE TERMINSTANCE
DATA MULTIPLE NONSHARED READWRITE
CODE PRELOAD EXECUTEREAD
EXPORTS
bemem_create
bemem_delete
bemem_read
bemem_write
Under OS/2, the makefile will typically look like :-
bebig.dll:	bebig.obj bebig.def
	ilink /NOI /NOLOGO /OUT:$@ $**

bebig.obj:	bebig.C bememext.h
	icc /C+ /W3 /Wcmp+cnd+dcl+ord+par+use+ \
	  /Ge-d-m+ /Q+ /DOS2 /Tp $*.C
Under Windows, the makefile is very similar :-
bebig.dll:	bebig.obj
	link /NOLOGO /INCREMENTAL:NO /DLL $** /OUT:$@

bebig.obj:	bebig.C bememext.h
	cl /c /DWIN32 /G4 /Gs /Oit /MT /nologo /W3 /WX /Tp $*.C
Under DOS, the makefile looks like the following example. If you don't
explicitly reference plib3r.lib, and the C++ code uses operator new
then its multithreaded equivalent gets dragged in (which causes link problems) :-
bebig.dll:	bebig.obj
	wlink @<<
System CWDLLR
Name $@
File bebig.obj
Library %watcom%\lib386\plib3r.lib
Option Quiet
<<

bebig.obj:	bebig.C bememext.h
	wpp386 -bt=DOS -dDOS32 -oit -4r -s -w3 -zp4 -mf -zq -fr -bd $*.C
Under AIX, the makefile looks like :-
bebig:	bebig.o
	/usr/lpp/xlC/bin/makeC++SharedLib \
	  -p 1 -n __start -o $@ bebig.o
	chmod a-x $@

bebig.o:	bebig.C bememext.h
	xlC -DUNIX -DAIX -c $*.C
Under Linux, the makefile looks like :-
bebig.so:	bebig.o
	g++ -shared -o $@ bebig.o
	chmod a-x $@

bebig.o:	bebig.C bememext.h
	g++ -DUNIX -DLINUX -fPIC -c $*.C
Despite BE being compiled multi-threaded on OS/2 and Win32, it's not compiled
that way for DOS, AIX and Linux. One day BE for AIX and Linux may be multi-threaded
and thus the makefiles for making BE memory extensions may need appropriate modifications.
Even though BE only uses one thread, compiling multi-threaded gives the memory extension
writer the flexibility to write code which tries to read data in the background
in advance of it being needed.
The -C dx command line argument is a way of telling BE to load and use
a disassembler extension for displaying any code in the data.
The same rules for naming and locating disassembly extensions apply, as for memory
extensions.
eg: If you have an Intel 8086 disassembler, you could type :-
be -C i86 dump.ram
This assumes that under OS/2, Windows or DOS, the disassembler is provided by
the file BEI86.DLL, or via bei86 under AIX, or via bei86.so
under Linux.
The file bedisext.h documents the extension interface.
Disassembler extensions are compiled and linked in exactly the same way as memory extensions (see example above), although they obviously provide different entrypoints.
When editing files, changes to the data are recorded in memory. When BE is closed
down, it attempts to write any changes back into the disk files where the data
originally came from. BE will prompt you as to whether to save the changes back
to disk.
If a memory extension is providing the data to BE for
display, and the memory extension supports modification of the data, it has a choice
:-
As most memory extensions provide a live view of some real-time data, they tend to opt for the first choice.
The latest version of BE is most easily obtainable over the Internet via the
links on my home page(s) :-
BE is a Win32 application, which has had extensive testing on Windows NT. Rather less testing has been performed with Windows 95, and quite a few bugs in the Windows 95 version of the Win32 Console API (used for screen redraw) have been identified and worked around. Some oddities relating to the use of the unusual screen sizes still remain. I would not be surprised if there are more problems to be found...
On AIX, because BE is a curses program, best keyboard and colour support is obtained
by using an aixterm, or by logging in from OS/2 using HFTTERM.EXE.
It should be noted that HFTTERM.EXE appears to have a bug whereby it doesn't
generate the correct datastream for the @9 and @0 keystrokes.
On Linux, best colour and keyboard support is found using the regular linux terminal.
Obviously, because BE for DOS uses a DOS extender, the machine upon which you run it must have a 32 bit processor and have suitable DPMI and/or VCPI drivers loaded to enable access to extended/high/upper memory.
Unfortunately I don't have continual access to all the platforms, so improvements in one version may not yet be reflected into the others.
Copying of this program is encouraged, as it is fully public domain. The source code is not publicly available. Caveat Emptor.