Source code in the editor
The HBasic editor does not store the source code of an HBasic program as
plain ASCII text. Instead it uses a special format with an internal representation
for each text line. There are three important reasons for this:
-
The parser works faster because it does not need to recognize tokens such as
standard identifiers or other keywords.
-
The editor can put additional information into the source code that is
used by special GUI functions.
-
The format can save memory because parts such as standard identifiers
or leading blanks are stored in a short form.
This document describes the format of the source code within the HBasic
editor.
Layout of the source code in memory
In HBasic each module (BASIC source combined with a GUI description) is stored
in an object of class CBasicDocument. This class (source file basic_document.cpp)
has a pointer to a memory block that holds the BASIC source code of the
module. Within this block each source line is stored between two bytes that
each hold the length of the line. This is needed because there are no
carriage return characters or other delimiters at the end of a line.
At the start and at the end of the whole source there is an additional byte with
the value 0xFF. A source with one text line that is 8 bytes long
would look like the following hex codes when stored in the editor:
0xFF 0x08 <6 bytes line description> 0x08 0xFF
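The layout above can be walked in both directions, which is presumably why the length is stored on both sides of a line. The following Python fragment is only a sketch of that idea, not the code in basic_document.cpp; the names iter_lines and prev_line_start are made up for this illustration.

```python
SENTINEL = 0xFF  # marker byte at both ends of the block, as described above


def iter_lines(block: bytes):
    """Yield every stored line (both length bytes included) of a source block."""
    if len(block) < 2 or block[0] != SENTINEL or block[-1] != SENTINEL:
        raise ValueError("block must start and end with 0xFF")
    pos = 1                # first byte after the leading 0xFF
    end = len(block) - 1   # index of the trailing 0xFF
    while pos < end:
        length = block[pos]  # leading length byte counts the whole line
        if length < 2 or pos + length > end:
            raise ValueError(f"bad line length at offset {pos}")
        line = block[pos:pos + length]
        if line[-1] != length:  # trailing length byte must repeat the value
            raise ValueError(f"length bytes disagree at offset {pos}")
        yield line
        pos += length


def prev_line_start(block: bytes, pos: int):
    """Return the offset of the line before the line starting at pos."""
    if pos <= 1:
        return None  # pos is the first line; only the 0xFF marker precedes it
    # The byte just before pos is the trailing length byte of the previous line.
    return pos - block[pos - 1]
```

With the one-line example from the text, iter_lines yields a single 8-byte line. Stepping backwards needs only the byte directly before the current line, so no scan from the start of the block is required.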
Format of one source line
Each single line of code is made up of the following parts:
-
one byte length of the line (including length bytes)
-
one byte number of leading blanks
-
one flag byte that describes special flags for the current line
-
a long value that is used for debugging purposes (holds the start of the
compiled runtime-code for the line).
-
n bytes of code tags that describe the source contents of the line. The
tags that HBasic knows are listed under "Valid source code tags" below
-
one byte length of the line
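As an illustration of the line format listed above, the sketch below unpacks and rebuilds one line. It assumes that the long value occupies 4 bytes in little-endian order, which the text does not state; the names SourceLine, parse_line and build_line are hypothetical and do not come from the HBasic sources.

```python
import struct
from typing import NamedTuple


class SourceLine(NamedTuple):
    leading_blanks: int
    flags: int
    debug_value: int
    tags: bytes


# length, leading blanks, flag byte, debug long.
# Assumption: the long is 4 bytes, little-endian, without padding.
HEADER = struct.Struct("<BBBI")


def parse_line(raw: bytes) -> SourceLine:
    """Split one stored line (both length bytes included) into its parts."""
    if len(raw) < HEADER.size + 1:
        raise ValueError("line too short")
    length, blanks, flags, debug = HEADER.unpack_from(raw, 0)
    if length != len(raw) or raw[-1] != length:
        raise ValueError("length bytes do not match the line size")
    return SourceLine(blanks, flags, debug, raw[HEADER.size:-1])


def build_line(blanks: int, flags: int, debug: int, tags: bytes) -> bytes:
    """Assemble a stored line from its parts."""
    length = HEADER.size + len(tags) + 1  # header + tags + trailing length byte
    if length > 0xFF:
        raise ValueError("line does not fit into a one-byte length")
    return HEADER.pack(length, blanks, flags, debug) + tags + bytes([length])
```

Under the 4-byte assumption a line without any tags is exactly 8 bytes long (1 + 1 + 1 + 4 + 1).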
Valid source code tags
-
0xFB string constant, stored without the enclosing '"'. A 1-byte length and
n bytes of text follow.
-
0xFC normal identifier. A 1-byte length and n bytes of name follow.
-
0xFD standard identifier (number in next byte).
-
0xFE comment text until end of line (may be skipped by the parser)
All other byte values in the range 0x00..0xFA currently represent ASCII characters.
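A small decoder shows how the tag bytes listed above could be turned back into tokens. This is a sketch under assumptions: the comment text is taken to follow the 0xFE tag as raw bytes, the token names are invented for the example, and standard identifiers are left as numbers because their table is not given here.

```python
TAG_STRING = 0xFB      # 1 byte length + n bytes of text
TAG_IDENT = 0xFC       # 1 byte length + n bytes of name
TAG_STD_IDENT = 0xFD   # number of the standard identifier in the next byte
TAG_COMMENT = 0xFE     # comment text until end of line


def decode_tags(tags: bytes):
    """Turn the tag bytes of one line into a list of (kind, value) tuples."""
    tokens = []
    pos = 0
    while pos < len(tags):
        b = tags[pos]
        if b in (TAG_STRING, TAG_IDENT):
            if pos + 1 >= len(tags):
                raise ValueError("truncated tag")
            n = tags[pos + 1]
            data = tags[pos + 2:pos + 2 + n]
            if len(data) != n:
                raise ValueError("truncated tag")
            kind = "string" if b == TAG_STRING else "ident"
            tokens.append((kind, data.decode("latin-1")))
            pos += 2 + n
        elif b == TAG_STD_IDENT:
            if pos + 1 >= len(tags):
                raise ValueError("truncated tag")
            tokens.append(("std_ident", tags[pos + 1]))
            pos += 2
        elif b == TAG_COMMENT:
            # Assumption: the rest of the line is the comment text.
            tokens.append(("comment", tags[pos + 1:].decode("latin-1")))
            pos = len(tags)
        elif b <= 0xFA:
            tokens.append(("char", chr(b)))  # plain character
            pos += 1
        else:
            raise ValueError(f"unknown tag 0x{b:02X}")
    return tokens
```

Because every tag either has a fixed size or carries its own length, the decoder never has to search for a delimiter.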
Flags used in the source line
Each editor line has one flag byte which is used to mark special conditions
of the source code in the line. Among other things it marks lines that must
be parsed in the first compiler phase. In the flag byte the following flags
may be used:
EDF_START_SUB
Marks the start of a subroutine or method definition. In newer
versions this flag is used for all declarations in the source code (SUB,
METHOD, CLASS, EVENT...). This is important for the parser which has to
recognize all definitions in the first compiler phase. It may become important
for the editor if the editor should mark the beginning of a subroutine
in a special way or list all subroutines.
EDF_END_SUB
Marks the end of a subroutine or other definition.
EDF_ERR_LINE
Marks a line where the parser recognized a syntax error. The
advantage of this representation of an error is that moving the source
line also moves the error flag. In the error line the error number and
the column where the error occurred are both stored as short values in
the long value that is otherwise used for debugging in the source line.
EDF_BREAK_LINE
This flag is used to mark break lines (lines where the interpreter
should stop execution).
EDF_MARKED_LINE
Marks the end of a subroutine or other definition.
EDF_PREPARSE_DIM
Marks a line with a DIM statement. This is useful for the first
parser phase where variable declarations are parsed.
EDF_INHERIT
Marks a line with an inherit statement. The parser must recognize
inherit statements in the first compiler phase.
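The flags above can be modelled as bits in one byte. The sketch below is illustrative only: the bit values are placeholders (the real ones are defined in the HBasic sources), and the order of error number and column inside the long value is an assumption, since the text only says that both are stored as short values.

```python
from enum import IntFlag


class EditorFlag(IntFlag):
    # Placeholder bit values, chosen for this example only.
    EDF_START_SUB = 0x01
    EDF_END_SUB = 0x02
    EDF_ERR_LINE = 0x04
    EDF_BREAK_LINE = 0x08
    EDF_MARKED_LINE = 0x10
    EDF_PREPARSE_DIM = 0x20
    EDF_INHERIT = 0x40


# Flags that the text explicitly ties to the first compiler phase.
FIRST_PHASE = (EditorFlag.EDF_START_SUB
               | EditorFlag.EDF_PREPARSE_DIM
               | EditorFlag.EDF_INHERIT)


def needs_first_phase(flags: int) -> bool:
    """True if the flag byte marks a line for the first compiler phase."""
    return (int(flags) & int(FIRST_PHASE)) != 0


def pack_error(error_number: int, column: int) -> int:
    """Pack two 16-bit values into one 32-bit value.

    Assumption: error number in the low half, column in the high half.
    """
    return ((column & 0xFFFF) << 16) | (error_number & 0xFFFF)


def unpack_error(value: int):
    """Return (error_number, column) from a value built by pack_error."""
    return value & 0xFFFF, (value >> 16) & 0xFFFF
```

A first-phase pass could then skip every line for which needs_first_phase returns False without looking at its tags.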