
Source code in the editor

The HBasic editor does not store the source code of a module as plain ASCII text. Instead, it uses a special format with an internal representation of each text line. There were three important reasons for this:

The parser works faster because it does not need to recognize tokens such as standard identifiers or other keywords.
The editor can put additional information into the source code that is used by special GUI functions.
The format may save memory because parts such as standard identifiers or leading blanks are stored in a short form.

This document describes the format of the source code within the HBasic editor.

Layout of the source code in memory

In HBasic each module (BASIC source combined with a GUI description) is stored in an object of class CBasicDocument. This class (source file basic_document.cpp) has a pointer to a memory block that holds the BASIC source code of the module. Within this block each source line is stored between two bytes that give the length of the line. This is needed because there are no carriage return characters or other delimiters at the end of a line. At the start and at the end of the whole source there is an additional byte with the value 0xFF. A source with one text line of 8 bytes would look like the following hex codes when stored in the editor:

0xFF 0x08 <6 Byte line description> 0x08 0xFF
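
The layout above can be walked from front to back using only the length bytes. The following is a minimal sketch, not the code from basic_document.cpp: the function name lineOffsets and the use of std::vector are assumptions made for illustration, and it assumes that the length byte counts the whole record including both length bytes, as in the 8-byte example.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of HBasic): returns the start offset of
// every line record in a block laid out as 0xFF, <line>, <line>, ..., 0xFF.
// Assumption: each length byte counts the whole record, including both
// the leading and the trailing length byte.
std::vector<std::size_t> lineOffsets(const std::vector<std::uint8_t>& block) {
    std::vector<std::size_t> offsets;
    if (block.size() < 2 || block.front() != 0xFF || block.back() != 0xFF)
        return offsets;                        // missing 0xFF delimiters

    std::size_t pos = 1;                       // skip the leading 0xFF
    const std::size_t end = block.size() - 1;  // index of the trailing 0xFF
    while (pos < end) {
        const std::size_t len = block[pos];
        if (len < 2 || pos + len > end) break;   // record would overrun the block
        if (block[pos + len - 1] != len) break;  // trailing length byte must match
        offsets.push_back(pos);
        pos += len;                            // jump to the next record
    }
    return offsets;
}
```

Because the length is repeated at the end of each record, the same idea works in the opposite direction: reading the byte just before a record gives the length of the previous line.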

Format of one source line

Each single line of code is made up of the following parts:

one byte: length of the line (including both length bytes)
one byte: number of leading blanks
one flag byte that holds special flags for the current line
a long value that is used for debugging purposes (holds the start of the compiled runtime code for the line)
n bytes of code tags that describe the source contents of the line; the tags that HBasic knows are listed in the following table
one byte: length of the line (repeated)
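
As an illustration, the fixed part of a line record could be read as sketched below. This is not HBasic's own code: the names LineHeader, readLineHeader and tagByteCount are hypothetical, the long value is assumed to be 4 bytes wide (which matches the 8-byte minimal line in the example above), and it is assumed to be stored in the byte order of the host machine.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical view of the fixed fields of one line record.
struct LineHeader {
    std::uint8_t  length;         // whole record, including both length bytes
    std::uint8_t  leadingBlanks;  // number of leading blanks
    std::uint8_t  flags;          // EDF_* flag byte
    std::uint32_t debugValue;     // assumption: the "long" is 4 bytes
};

// 1 length + 1 blanks + 1 flags + 4 debug value + 1 trailing length
constexpr std::size_t kMinRecord = 8;

// Reads the header of the record starting at rec. avail is the number of
// bytes that may be read from rec. Returns false if the record is malformed.
bool readLineHeader(const std::uint8_t* rec, std::size_t avail, LineHeader& out) {
    if (avail < kMinRecord) return false;
    out.length = rec[0];
    if (out.length < kMinRecord || out.length > avail) return false;
    if (rec[out.length - 1] != out.length) return false;  // trailing copy must match
    out.leadingBlanks = rec[1];
    out.flags = rec[2];
    // memcpy avoids an unaligned read; host byte order is an assumption.
    std::memcpy(&out.debugValue, rec + 3, sizeof out.debugValue);
    return true;
}

// Number of code tag bytes (the "n" in the list above).
// Only meaningful for a header accepted by readLineHeader.
std::size_t tagByteCount(const LineHeader& h) {
    return h.length - kMinRecord;
}
```

The tag bytes themselves start at offset 7 of the record under these assumptions.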

Valid source code tags

0xFB string constant, stored without the enclosing '"'; a 1-byte length and n bytes of string content follow
0xFC normal identifier; a 1-byte length and n bytes of name follow
0xFD standard identifier (number in the next byte)
0xFE comment text until the end of the line (may be skipped by the parser)

All other byte values in the range 0x00..0xFA currently represent ASCII characters.
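
A decoder for the tag bytes of one line might look like the following sketch. The types TokenKind and Token and the function decodeTags are hypothetical names introduced here, not HBasic identifiers. The sketch treats 0xFF inside a line as an error, which is an assumption based on 0xFF being the block delimiter.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

enum class TokenKind { Char, String, Identifier, StandardId, Comment };

struct Token {
    TokenKind    kind;
    std::string  text;    // Char, String, Identifier and Comment
    std::uint8_t number;  // StandardId only
};

// Decodes the n tag bytes of one line into tokens.
// Returns false if the bytes are truncated or contain 0xFF.
bool decodeTags(const std::uint8_t* p, std::size_t n, std::vector<Token>& out) {
    std::size_t i = 0;
    while (i < n) {
        const std::uint8_t b = p[i++];
        switch (b) {
        case 0xFB:    // string constant: length byte, then the characters
        case 0xFC: {  // normal identifier: length byte, then the name
            if (i >= n) return false;
            const std::size_t len = p[i++];
            if (len > n - i) return false;
            out.push_back({b == 0xFB ? TokenKind::String : TokenKind::Identifier,
                           std::string(reinterpret_cast<const char*>(p + i), len), 0});
            i += len;
            break;
        }
        case 0xFD:    // standard identifier: its number is in the next byte
            if (i >= n) return false;
            out.push_back({TokenKind::StandardId, std::string(), p[i++]});
            break;
        case 0xFE:    // comment: everything up to the end of the line
            out.push_back({TokenKind::Comment,
                           std::string(reinterpret_cast<const char*>(p + i), n - i), 0});
            i = n;
            break;
        case 0xFF:    // assumption: reserved as block delimiter, invalid here
            return false;
        default:      // 0x00..0xFA: plain ASCII character
            out.push_back({TokenKind::Char, std::string(1, static_cast<char>(b)), 0});
        }
    }
    return true;
}
```

A parser that does not need comments can simply stop when it sees 0xFE, since nothing else follows on that line.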

Flags used in the source line

Each editor line has one flag byte that marks special conditions of the source code in the line, for example lines that must be parsed in the first compiler phase. The following flags may be used in the flag byte:

EDF_START_SUB

Marks the start of a subroutine or method definition. In newer versions this flag is used for all declarations in the source code (SUB, METHOD, CLASS, EVENT...). This is important for the parser which has to recognize all definitions in the first compiler phase. It may become important for the editor if the editor should mark the beginning of a subroutine in a special way or list all subroutines.

EDF_END_SUB

Marks the end of a subroutine or other definition.

EDF_ERR_LINE

Marks a line where the parser recognized a syntax error. The advantage of representing an error this way is that moving the source line also moves the error flag. In an error line, the error number and the column where the error occurred are both stored as short values inside the long value that is otherwise used for debugging.
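
Storing two short values in one long can be sketched as follows. The text does not say which half holds which value, so placing the error number in the upper 16 bits and the column in the lower 16 bits is purely an assumption, as are the function names and the 32-bit width of the long.

```cpp
#include <cstdint>

// Assumption: error number in the high 16 bits, column in the low 16 bits.
// The actual order used by HBasic is not stated in this document.
constexpr std::uint32_t packError(std::uint16_t errorNumber, std::uint16_t column) {
    return (static_cast<std::uint32_t>(errorNumber) << 16) | column;
}

constexpr std::uint16_t errorNumberOf(std::uint32_t value) {
    return static_cast<std::uint16_t>(value >> 16);
}

constexpr std::uint16_t errorColumnOf(std::uint32_t value) {
    return static_cast<std::uint16_t>(value & 0xFFFFu);
}
```

Since the same field normally holds the start of the compiled runtime code, a reader of the field has to check EDF_ERR_LINE first to know which interpretation applies.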

EDF_BREAK_LINE

This flag is used to mark break lines (lines where the interpreter should stop execution).

EDF_MARKED_LINE

Marks the end of a subroutine or other definition.

EDF_PREPARSE_DIM

Marks a line with a DIM statement. This is useful for the first parser phase, in which variable declarations are parsed.

EDF_INHERIT

Marks a line with an inherit statement. The parser must recognize inherit statements in the first compiler phase.
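
Since all flags share one byte, they can be handled as a bit mask. The sketch below shows how the first compiler phase might select the lines it has to look at. The numeric values are placeholders chosen for illustration only, because this document does not list the real values, and the helper names are hypothetical. Whether EDF_END_SUB also belongs in the first-phase mask is not stated, so it is left out here.

```cpp
#include <cstdint>

// Placeholder bit values; the real values are defined in the HBasic sources.
enum EditorLineFlag : std::uint8_t {
    EDF_START_SUB    = 0x01,
    EDF_END_SUB      = 0x02,
    EDF_ERR_LINE     = 0x04,
    EDF_BREAK_LINE   = 0x08,
    EDF_MARKED_LINE  = 0x10,
    EDF_PREPARSE_DIM = 0x20,
    EDF_INHERIT      = 0x40
};

// Flags that the text above names as relevant for the first compiler phase.
constexpr std::uint8_t kFirstPhaseMask =
    EDF_START_SUB | EDF_PREPARSE_DIM | EDF_INHERIT;

constexpr bool needsFirstPhaseParse(std::uint8_t flags) {
    return (flags & kFirstPhaseMask) != 0;
}

constexpr std::uint8_t setFlag(std::uint8_t flags, EditorLineFlag f) {
    return static_cast<std::uint8_t>(flags | f);
}

constexpr std::uint8_t clearFlag(std::uint8_t flags, EditorLineFlag f) {
    return static_cast<std::uint8_t>(flags & ~f);
}
```

With such a mask the first phase can skip every line whose flag byte has none of these bits set, without decoding the line's tags.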