Interactive BASIC Compiler Project

Sunday, August 24, 2014

Qt Creator (GDB) No Watch Variables With GCC 4.8

The next set of changes are rather complicated and required debugging. Upon reaching the first breakpoint, no variables were displayed in the Locals and Expressions debugging window in Qt Creator. After a little research, the problem was determined to be with GCC 4.8.1, which is using a new format (named DWARF-4) to write debugging symbols to the executable.

The problem is that GDB (debugger) does not support this newer format, at least older versions prior to GDB 7.5. Mint 13 (Ubuntu 12.04) has only version 7.4. This problem does not affect Windows with the programs installed as recently described (see the Windows tagged posts). The latest MSYS with MinGW has GDB 7.6.1 and MinGW-w64 has GDB 7.7. Mint 17 (Ubuntu 14.04) is also fine with GDB 7.7.

There are two ways to solve this problem. The GCC compiler has an option for generating the older DWARF-3 format debugging symbols. Instead of permanently adding this option to the CMake build file, the following option can be added to the CMake command line or in the Arguments field in the Run CMake wizard within Qt Creator:

-DCMAKE_CXX_FLAGS_DEBUG='-g -gdwarf-3'

The other solution is to build and install GDB from source code. Click Continue... for details on the procedure for this.

Continued... »

Thursday, August 21, 2014

Token – Status As Enumeration Class

I misinterpreted how the compiler generates code for switch statements (caused by looking at disassembled output instead of assembler output of the compiler). The compiler does not generate an if-elseif chain in circumstances where it doesn't simplify a switch statement to an array of return values (when the return values are simple types as previously described).

The compiler still generates an array for a switch statement, but instead of an array of return values (of simple types), it generates an array of pointers. Each pointer is the address to the code for the case. During run-time, the processor indexes on the expression of the switch into this array and jumps to the address. While this is not space efficient, it is run-time efficient.
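As a rough picture of this, the jump table can be imitated by hand with an array of function pointers indexed by the switch expression. This is only an analogy written in C++, not actual compiler output, and the names Action, handlers and dispatch are hypothetical:

```cpp
#include <cstddef>

// Hypothetical enumeration standing in for the switch expression.
enum class Action { Load, Store, Halt };

// One small function per case; the compiler would instead emit a block of
// code for each case inside the same function.
static int onLoad()  { return 10; }
static int onStore() { return 20; }
static int onHalt()  { return 30; }

// The "array of pointers": one code address per case, in enumerator order.
static int (*const handlers[])() = { onLoad, onStore, onHalt };

// Index into the table with the switch expression and jump to that address.
int dispatch(Action action)
{
    return handlers[static_cast<std::size_t>(action)]();
}
```

The table costs one pointer per case (the space trade-off), but selecting a case is a single indexed jump no matter how many cases there are.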

Unfortunately, for the token status enumeration, simple C style string constants cannot be used because each must be the result of the tr() translation function. This was not necessary for the other enumeration to string functions because these are only used for testing (translation is not necessary).

The switch statement for converting a token status enumerator was put into the message function of the Token class. This function previously returned an element from the static message array (which was removed). The status enumeration was put into the Token class as an enumeration class, therefore requiring the Token::Status:: scoping prefix on the status enumerators. The _TokenStatus suffix was removed from each (the bug statuses did not have this suffix, but now require the scoping prefix).
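The shape of this change can be sketched without Qt as follows. The tr() function here is a stand-in so the sketch compiles on its own, the return type is std::string instead of QString, and the three status enumerators and their messages are hypothetical (the project's actual list is much longer):

```cpp
#include <string>

// Stand-in for Qt's tr() so the sketch compiles without Qt (an assumption;
// the project calls the real translation function).
static std::string tr(const char *text) { return text; }

class Token
{
public:
    // Hypothetical status enumerators, for illustration only.
    enum class Status { Good, ExpectedExpression, BugInvalidMode };

    explicit Token(Status status) : m_status{status} {}

    // Each case calls tr() at run-time, so the strings cannot be collapsed
    // into a constant array of C style strings.
    std::string message() const
    {
        switch (m_status)
        {
        case Status::Good:
            return tr("good");
        case Status::ExpectedExpression:
            return tr("expected expression");
        case Status::BugInvalidMode:
            return tr("BUG: invalid mode");
        }
        return "";  // not reached; silences the missing return warning
    }

private:
    Status m_status;
};
```

Outside of the class, a status is written with the full scope, for example Token::Status::ExpectedExpression.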

The generation of the token status enumeration was removed from the auto-generation awk script and the token source file was removed as a dependency to the auto-generated enumerations header file. Several of the token status enumerators were not being used and were removed. The only auto-generated code remaining is for the code enumeration and the code enumerator to code name array. These will be handled later when the Table class is redesigned.

[branch cpp11 commit cfed68f09b]

Wednesday, August 20, 2014

Token – Type As Enumeration Class

A series of changes were made to change the token type enumeration to an enumeration class because its size of enumerator was used to dimension several arrays. The changes were put into several individual commits on a work branch, which were later combined (squashed using the interactive git rebase command) into a single commit and merged back to the cpp11 branch before pushing to GitHub. Details of these changes follow.

1) Unrelated to the token type enumeration, the Table class convert code function was moved to the Token class. This function contained arguments for a token with a data type and the desired data type to convert to. It returned a conversion code. However, it did not actually access the table. Since codes are not part of the table (the table is only indexed by codes) and this function took a token pointer, it made more sense for this to be a Token member function.

2) Also unrelated to the token type enumeration, the auto-generated data type to string map was changed to a brute force static function with a switch statement in the test source file as described in the previous post. (The compiler in-lined this function when set to release build.) The generate map function was removed from the test names awk script and the dependency on the main header file was removed from the auto-generated test names source file from the CMake build file.

3) Two of the uses of the token type enumerators were the static has parentheses flag and precedence arrays, used to determine if the token type has a parenthesis and the precedence of the token type. These arrays were initialized by the static initialize function. These arrays were changed to unordered maps with initializer lists for their values. Note that the maps do not contain values for all the token types. When a map is accessed for one of these other token types, a new element will be added with the correct default value (false for the flag, zero for the precedence). Since these maps are initialized, the initialize function was no longer needed and was removed.

4) The token type enumeration was changed to an enumeration class and was moved into the Token class where it belongs. Outside of the Token class, the token type enumerators need a double scope as in the Constant_TokenType enumerator becomes Token::Type::Constant.

5) The auto-generated token type to string array was changed to a brute force function also. (The compiler also in-lined this function.) The find enumeration function in the test names awk script was now no longer used and so was removed and the dependency on the token header file was removed from the auto-generated test names source file from the CMake build file.
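The maps described in item 3 might look something like the following sketch. The token types listed, the values assigned and the function names are hypothetical; only the technique (an unordered map with an initializer list, relying on the default value for missing keys) is taken from the description above:

```cpp
#include <cstddef>
#include <unordered_map>

// Hypothetical subset of token types, for illustration only.
enum class Type { Command, Operator, IntFuncP, Constant };

// Generic hash for enumeration classes.
struct EnumClassHash
{
    template <typename T>
    std::size_t operator()(T t) const
    {
        return static_cast<std::size_t>(t);
    }
};

// Only the token types with non-default values are listed (values made up).
static std::unordered_map<Type, bool, EnumClassHash> hasParen {
    {Type::IntFuncP, true}
};
static std::unordered_map<Type, int, EnumClassHash> precedence {
    {Type::Command, -1},
    {Type::Operator, -2}
};

// operator[] value-initializes a missing element (false or zero) and inserts
// it into the map, which is why the maps cannot be declared const here.
bool typeHasParen(Type type) { return hasParen[type]; }
int typePrecedence(Type type) { return precedence[type]; }
```

The side effect to be aware of is that every lookup of an unlisted token type grows the map by one element.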

[branch cpp11 commit e7dc6a964a]

Tuesday, August 19, 2014

Better Enumerators To Strings Solution

Several enumerations need to be converted to strings for output during testing. This was accomplished with an awk script that scanned source files for the enumerations and automatically generated a source file that contained C style string arrays of the enumerator names, which were indexed by an enumerator. With enumeration classes, the enumerators cannot be used as indexes to arrays. Another solution was to generate an unordered map that can be indexed by the enumerators provided that a generic hash is defined for enumeration classes.

The method of automatically generating source code from source code is kludgy and should be eliminated. Also, changing a source file read by an awk script forces the entire project to be rebuilt since all the source files are generally dependent on the auto-generated source file. A different solution was needed.

Another solution is using a function that takes an enumerator value as input and returns the string using the brute force method with switch statement and returning a string for each case, for example:

const char *enumName(Enum value)
{
    switch (value)
    {
    case Enum::Value1:
        return "Value1";
    case Enum::Value2:
        return "Value2";
    ...
    case Enum::ValueN:
        return "ValueN";
    }
}

A good optimizing compiler will convert this switch statement to an array instead of an if-elseif chain, which the GCC compiler does at the higher optimization levels. However, there are some conditions that are required. First, the compiler will only generate a look up array if the function is static (local to the source file where used). Also, the return values need to be relatively simple types.

However, if the return type is QString and each return value is QString("Value1"), the compiler no longer generates an array, but instead an if-elseif chain. Though note that the actual return value can still be a C style string where the QString constructor is called to create the QString return value. (This information was acquired by looking at the assembly code generated.)

On the other hand, if the return value is a standard string (STL std::string), then the compiler does generate an array. This probably has something to do with the fact that the standard string class supports move constructors (a C++11 addition not discussed here). The QString class does not support these (at least Qt4 classes do not, but Qt5 classes do). The problem with using standard strings is that the return values of these functions are used with Qt stream classes, which do not support standard strings. Therefore, C style strings will be used for now.

Functions like the example above will be used. These functions could be auto-generated, but again this practice will no longer be used. The nice thing about auto-generation is that it eliminates the mistake of missing an enumerator. However, with all warnings enabled (and treated as errors), if an enumerator is missing, a compiler warning is issued for the switch statement that not all values are handled (provided no default case is included). This is not a perfect solution, but probably the best within the limits of the C++ language.

There is one final issue with the function above. The compiler sees that there is no return value at the end of the function and issues a warning (error). The compiler is apparently not smart enough to realize that execution does not get past the switch statement since all cases return. To silence this warning, a return ""; statement is needed at the end of the function.

Sunday, August 17, 2014

More Enumeration Classes

Several more enumerations were changed to enumeration classes. While more of these included a size of enumerator, they were not being used. This included the translator test mode, translator reference type, dictionary entry type, error list type, program model operation type, and the table multiple enumerations. Two instances where the enum keyword was in front of the enumeration name were removed (this is something required for C but not C++).

This leaves several enumerations that are not as easy as adding the class keyword and renaming the enumerators like those changed above. These include the code (auto-generated), token status (auto-generated), sub-code (bit masks), table flag (bit masks), table search type, token type, and test option enumerations.

The auto-generated enumerations need a good C++ solution and not the current kludgy C solution using an external awk script. The bit mask enumerations will need operator functions implemented (bit-wise OR and AND), which will require using type casting. For the others, the size of enumerators are used for either dimensioning arrays or for looping over the enumerators.
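For the bit mask enumerations, the operator functions could be sketched as follows. The Flag enumeration and its enumerators are hypothetical (the project's sub-code and table flag values are not shown here); the point is that the type casts are confined to the operators themselves:

```cpp
// Hypothetical flag enumeration, for illustration only.
enum class Flag : unsigned {
    None      = 0,
    Reference = 1u << 0,
    Hidden    = 1u << 1,
    Multiple  = 1u << 2
};

// Bit-wise OR of two flags; the casts stay inside the operator.
constexpr Flag operator|(Flag a, Flag b)
{
    return static_cast<Flag>(static_cast<unsigned>(a)
        | static_cast<unsigned>(b));
}

// Bit-wise AND of two flags.
constexpr Flag operator&(Flag a, Flag b)
{
    return static_cast<Flag>(static_cast<unsigned>(a)
        & static_cast<unsigned>(b));
}

// Convenience test so that callers never need a cast of their own.
constexpr bool hasFlag(Flag value, Flag flag)
{
    return (value & flag) != Flag::None;
}
```

With these in place, an expression like Flag::Reference | Flag::Hidden keeps the Flag type instead of decaying to an integer.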

[branch cpp11 commit e71fe6f0fe]

Data Type As Enumeration Class

Now that the number of and size of enumerators of the data type enumeration have been removed, there was one more enumerator left that needed to be removed if possible, specifically the No_DataType enumerator that was assigned to a -1, which was used to indicate an unset data type. If this enumerator was left in place, then switch statements would need to have a case for it or would need a blank default.

C++11 allows a default enumerator in the form EnumName{} just like for any other type. For user types, it calls the default constructor (if there is one, else a compile error is reported). For built in types, the value is assigned a zero. And so is the case for a default enumerator, which also has the equivalent value of zero. For unscoped enumerations, the value is zero, but for an enumeration class, the value can't be used as an integer without a cast.

Therefore, the No_DataType enumerator was replaced with the default enumerator DataType{}. In addition, the first enumerator can't have the default value of zero, so was assigned to the value of one. This is not an issue since data type enumerators are no longer used as indexes to arrays (only as keys to maps). All of the Xx_DataType enumerators were changed to just Xx in the class enumeration definition and to DataType::Xx for all uses in the code since the enumerators are now scoped.
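A minimal sketch of the default enumerator idea follows. The enumerator list is abbreviated, and the isUnset function and Item structure are hypothetical names used only to show the comparison:

```cpp
// The first enumerator starts at one so that DataType{} (zero) does not
// match any named enumerator and can mean "unset".
enum class DataType { Double = 1, Integer, String };

// Returns true when no data type has been assigned yet.
bool isUnset(DataType dataType)
{
    return dataType == DataType{};
}

// Hypothetical holder whose data type starts out unset.
struct Item
{
    DataType dataType = DataType{};
};
```

Because DataType{} is not a named enumerator, switch statements over the data type do not need a case for it to avoid the unhandled enumerator warning.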

There was one remaining issue. The test names awk script scanned the header file for the data type enumeration and created an array of C style strings with the names of the enumerators. This awk script generates the test names header file used in the test source file for converting enumerators to strings for test output. The awk script was modified to instead look for the enumeration class and to generate an unordered map. This code was put into a new function in the script. This script also generates C style string arrays for the code and token type enumerations. These will also be changed at some point.

[branch cpp11 commit 06ef6f17c5]

Size of Data Types Enumerator

The size of enumerator was removed but it was used for the dimension of two arrays. Both of these arrays were two dimensional, so some sort of compound map would have been needed. Instead, simple switch statements and conditional operators were used. This should be nearly as efficient as an array as the compiler may generate a lookup table for a switch and this is not in highly critical code so absolute efficiency is not required.

The first array was used to obtain a conversion code needed to convert the data type in a token to the desired data type. The conversion code obtained could be a null code representing no conversion needed or an invalid code representing a data type that can't be converted.

This array was used by the convert code and find code functions in the Table class. The convert code function was modified to process the two data types directly using compound switch statements (the first on the token data type, and the second on the needed data type). The find code function used the array for a data type in a token with the needed data type and so was changed to simply call the convert code function instead.
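The compound switch approach can be sketched like this. The Code enumerators (Null, Invalid, CvtInt, CvtDbl) and the exact conversion rules are assumptions made for the example and are not the project's actual codes:

```cpp
enum class DataType { Double = 1, Integer, String };

// Hypothetical conversion codes, for illustration only.
enum class Code { Null, Invalid, CvtInt, CvtDbl };

// Returns the code needed to convert from the data type a token has to the
// data type that is needed.
Code convertCode(DataType have, DataType need)
{
    switch (have)
    {
    case DataType::Double:
        switch (need)
        {
        case DataType::Double:  return Code::Null;     // no conversion
        case DataType::Integer: return Code::CvtInt;
        case DataType::String:  return Code::Invalid;  // can't convert
        }
        break;
    case DataType::Integer:
        switch (need)
        {
        case DataType::Double:  return Code::CvtDbl;
        case DataType::Integer: return Code::Null;
        case DataType::String:  return Code::Invalid;
        }
        break;
    case DataType::String:
        // a conditional operator is enough where a second switch is overkill
        return need == DataType::String ? Code::Null : Code::Invalid;
    }
    return Code::Invalid;  // unset or unexpected data type
}
```

This replaces a two dimensional array indexed by both data types, at the cost of spelling out each combination.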

The second use of the size of enumerator was to dimension an array used in the expected error status function of the Translator class. This array was dimensioned for the number of data types and the number of reference types (none, variable, variable or defined function, or all) and used to obtain a token error status when a reference is expected but not found. This function was changed to use a compound switch statement (the first on the data type, and the second on the reference type, or the ternary operator was used where a second switch was overkill). The size of enumerator for the reference enumerator was only used for this array and was also removed.

[branch cpp11 commit 4183a87f74]

Saturday, August 16, 2014

Number of Data Types Enumerator

Not only does the data type enumerator have the size of enumerator (which needs to be removed before changing to an enumeration class), it also has a number of enumerator, which was placed after the three main data types (double, integer and string). This enumerator was removed, but it was used to dimension two arrays that needed to be changed.

In the Table constructor, there is a section that scans the secondary associated codes of a main code, the purpose of which is to set the expected data type for the main code. If the main code has associated codes for both doubles and integers, the expected data type is set to number (for example, the minus operator); and if there are associated codes for all three types (numbers and string), the expected data type is set to any (for example, the plus operator).

This is accomplished by using bit masks where there is a bit for each type (double, integer and string). For each associated code, the data type of the code is bit-wise ORed together. After all the associated codes are scanned, the final value is checked to see if it has the two number bits set or all three bits set to determine the expected data type.

Originally, an array of three elements was defined with the bit masks for the three data types. The number of enumerator was used to dimension this array. With the number of enumerator removed, this array was replaced with an unordered map, with an initializer list to set the bit masks for each data type enumerator.
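The bit mask scan with the unordered map might be sketched as follows. The function name, the bit values and the choice to return DataType{} when neither pattern matches are assumptions for the example; the Number and Any enumerators stand for the "number" and "any" expected data types mentioned above:

```cpp
#include <cstddef>
#include <unordered_map>
#include <vector>

enum class DataType { Double = 1, Integer, String, Number, Any };

// Generic hash for enumeration classes.
struct EnumClassHash
{
    template <typename T>
    std::size_t operator()(T t) const
    {
        return static_cast<std::size_t>(t);
    }
};

// One bit per main data type, set with an initializer list.
static const std::unordered_map<DataType, int, EnumClassHash> bitMask {
    {DataType::Double,  1},
    {DataType::Integer, 2},
    {DataType::String,  4}
};

// OR together the bits of the data types of the associated codes, then
// check for the two number bits or all three bits.
DataType expectedDataType(const std::vector<DataType> &associated)
{
    int bits = 0;
    for (DataType dataType : associated)
    {
        bits |= bitMask.at(dataType);  // throws for an unlisted data type
    }
    if (bits == (1 | 2 | 4))
    {
        return DataType::Any;
    }
    if (bits == (1 | 2))
    {
        return DataType::Number;
    }
    return DataType{};  // assumption: leave the expected data type as is
}
```

Since the map is const, at() is used instead of the [] operator, which would otherwise try to insert a missing key.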

The other use of the number of enumerator was in the equivalent data type function of the Table class, which contained an array from data type to equivalent data type, but when the sub-string and temporary string data types were removed, this function ended up essentially just returning the data type passed to it. In other words, this function no longer does anything, so it was removed and the one use of it was replaced with the data type (in the LET translator routine).

[branch cpp11 commit 8f56e1117a]

Enumeration Class Hash

In order to use the STL unordered map class, a hash function is required for the key of the map. Hash functions are provided for all the built in types plus some of the other STL classes (like std::string), but unfortunately, not for enumeration classes. For the built in types (integers), the number itself is used as the hash. However, enumeration class enumerator values can't be used this way because they can't be used as integers even though their values are just numbers. The solution is to use a generic (template) function object type that will work for all enumeration classes and that converts the enumerators to integers:

struct EnumClassHash
{
    template <typename T>
    std::size_t operator()(T t) const
    {
        return static_cast<std::size_t>(t);
    }
};

This works for any enumeration class by returning an integer for an enumerator (a size_t is an integer that is large enough to cover any enumeration) using a static cast (hopefully the only place a cast will be needed). An unordered map to use an enumeration class with an initializer list to assign values to the enumerators would be defined as:

std::unordered_map<EnumName, QString, EnumClassHash> names {
    {EnumName::First, "First"},
    {EnumName::Second, "Second"},
    {EnumName::Third, "Third"}
};

The name of the map does not need to be repeated for an assignment of each value (which could be tedious for a long name or with a lot of enumerators). This does not resolve the issue of forgotten values, but hopefully when an undefined enumerator value is accessed, the default value (in this case a blank string) would be detected or identified easily.
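The forgotten value behavior can be seen in a small Qt-free sketch. Here std::string is substituted for QString so that the example stands alone, and the nameOf function is a hypothetical helper:

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>

enum class EnumName { First, Second, Third };

// Generic hash for enumeration classes.
struct EnumClassHash
{
    template <typename T>
    std::size_t operator()(T t) const
    {
        return static_cast<std::size_t>(t);
    }
};

// EnumName::Third is deliberately left out to show the forgotten value case.
static std::unordered_map<EnumName, std::string, EnumClassHash> names {
    {EnumName::First, "First"},
    {EnumName::Second, "Second"}
};

// The [] operator adds a default (blank) string for a missing enumerator.
std::string nameOf(EnumName value)
{
    return names[value];
}
```

A blank name in the test output is then the hint that an enumerator was missed in the initializer list.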

QMap vs. Standard Map (Initializer Lists)

Using enumeration classes will require using an associative array container class (like QMap or QHash) since the size of the enumeration (the number of enumerators) is not obtainable (without kludgy type casting) to dimension a C style array. As mentioned in the previous post, using an associative array container class requires run-time assignments to fill the container.

C++11 provides a solution with initializer lists. Unfortunately, the Qt containers do not support C++11 initializer lists (specifically Qt4 doesn't because Qt5 does contain support for initializer lists). The Standard Template Library (STL) containers do support initializer lists. Therefore, the STL containers will be used as needed until the inevitable change to Qt5. STL containers are technically already available since the various Qt containers have method functions for converting STL containers to and from Qt containers.

For an associative array, either the QMap or std::map container could be used. Both of these containers order the keys of the elements. Ordering of the keys is not required in this case. Qt provides the QHash class for an associative array not requiring ordered keys. Similarly, STL provides std::unordered_map, which will be the class used.

Enumerators As Indexes – Using A Map

The first enumeration that will be changed to an enumeration class will be the data type enumeration. One of the differences with enumerations is that unscoped enumerators can be used as integers. One of the other things I preferred to do when defining an enumeration was add a size of enumerator at the end:

enum EnumName {
    First_EnumName,
    Second_EnumName,
    Third_EnumName,
    sizeof_EnumName
};

This size of enumerator can then be used to dimension an array, like to hold a conversion to another value (another enumeration, string, etc.). This will not work with enumeration classes because the enumerators can't be used as integers without resorting to kludgy type casting (something I prefer to avoid if possible). Plus, once this size of enumerator is added, then any switch statement using the enumeration will generate a warning since there is no case statement for this enumerator. Adding a blank default statement is also kludgy.

Alternatively, using an associative array (map) instead of a C style array solves the issue of needing to know the size of the enumeration. Compare the array solution to the map solution:

QString names[sizeof_EnumName] = {    QMap<EnumName, QString> names;
    "First",                          names[First_EnumName] = "First";
    "Second",                         names[Second_EnumName] = "Second";
    "Third"                           names[Third_EnumName] = "Third";
};

The array method is error-prone and care must be taken to make sure the right values (strings in this case) are applied to the right elements in the array. This could be solved by using assignments that would look identical to the map assignments, though may not be as efficient because the assignments are done at run-time instead of at compile time, and the array still needs to be dimensioned.

The map method of using assignments (same for the array method using assignments) can also be error-prone because it is easy to miss an assignment of one of the values. In the array case, a null pointer would be returned, which will probably cause a segmentation fault unless it is checked for.

For the map method, however, accessing a value that doesn't exist simply adds a new element to the map with a default value (in this case a blank string). This is still a problem, but much less fatal. Alternatively, in the case of QMap, the value method function could be used instead of the [] operator, which doesn't add an element for a non-existing element and allows a default value to be returned for the non-existing element (for example, in this case a default value like "BUG" could be used).
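The difference between the two kinds of access can be sketched with the standard library, using std::map in place of QMap so that the example stands alone. The valueOr helper is a hypothetical, rough analogue of the QMap value method function with a default value, and makeNames is just a helper for the example:

```cpp
#include <map>
#include <string>

enum EnumName { First_EnumName, Second_EnumName, Third_EnumName };

// Looks up a key without adding an element; returns the supplied default
// value when the key does not exist.
template <typename Map>
typename Map::mapped_type valueOr(const Map &map,
    const typename Map::key_type &key,
    const typename Map::mapped_type &defaultValue)
{
    auto it = map.find(key);
    return it == map.end() ? defaultValue : it->second;
}

// Fills the map with assignments; Third_EnumName is deliberately forgotten.
std::map<EnumName, std::string> makeNames()
{
    std::map<EnumName, std::string> names;
    names[First_EnumName] = "First";
    names[Second_EnumName] = "Second";
    return names;
}
```

Calling valueOr(names, Third_EnumName, "BUG") returns "BUG" and leaves the map unchanged, whereas names[Third_EnumName] returns a blank string and adds a third element.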

C++11 – New Enumeration Class

With the original implementation of C/C++ enumerations (enum), I preferred to add a suffix to each of the enumerators to associate them to the enumeration so that when one appears in the code, it is easy to identify it to the enumeration:

enum EnumName {
    First_EnumName,
    Second_EnumName,
    Third_EnumName
};

Without this suffix, it is difficult to see which enumeration the value is associated to. (Though using an IDE like Qt Creator, there is a command to go right to the definition.) In addition, obviously the same name can't be used for two different enumerations. A suffix (a prefix could have been used instead) solves this issue.

C++11 introduces a new enum class where the enumerators are scoped inside the enumeration. The enumerators for these are also more strongly typed than the normal enumerations (which represent integers and can be used as such). As an enumeration class, the above enumeration becomes:

enum class EnumName {
    First,
    Second,
    Third
};

In order to use these enumerations, they must be scoped, so to refer to the second enumerator, the name EnumName::Second would be used. This is similar to my solution of using a suffix (in this case it is a prefix). An example of an enumeration class was added to the try compile test program. The intention is to change the enumerations used in the project to enumeration classes.
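Both properties (scoping and stronger typing) can be seen in a short sketch. The Fruit and Color enumerations and the function names are hypothetical:

```cpp
// The same enumerator name can appear in two enumeration classes.
enum class Fruit { Apple, Orange };
enum class Color { Red, Orange };

// Overloads are selected by the enumeration type, not by an integer value.
bool isOrange(Fruit fruit) { return fruit == Fruit::Orange; }
bool isOrange(Color color) { return color == Color::Orange; }

// int n = Color::Orange;  // would not compile: no implicit conversion

// An explicit cast is required to get at the underlying integer.
int toInt(Color color) { return static_cast<int>(color); }
```

With unscoped enumerations, the second Orange would be a redefinition error, which is what the suffix convention was working around.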

[branch cpp11 commit 2a8e3899ee]

Thursday, August 14, 2014

Project – Enable C++11 Compiling

By default, GCC C++ compiles for the corrected C++98 standard (C++03) with GNU extensions (there is no option for compiling for the uncorrected original C++98 standard). There are two options for compiling C++11, one for the standard and one with GNU extensions. The option for the standard C++11 will be used and was added to the CMake build file.

Though not necessary, I decided that it was a reasonable idea to validate that C++11 compiling was enabled and that C++11 was supported while generating the make file. This meant adding a try_compile command to the CMake build file for a simple C++11 program. If compiling of the simple program fails, an error is reported and CMake aborts.

The initial simple C++11 program tests two C++11 features. As more C++11 features are used in the project, this simple program will be expanded to test those features also. The two initial features tested are the new universal initializer syntax and the new null pointer. Click Continue... for details of these two features.
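For a rough idea of what such a check could exercise, here is a sketch of the two features. This is not the project's actual try compile program (which is not shown here); the Point structure and the function name are hypothetical:

```cpp
#include <vector>

// Hypothetical aggregate used to show brace initialization.
struct Point { int x; int y; };

// Uses the universal initializer syntax and the null pointer; this only
// compiles when C++11 is enabled.
int sumOfFeatures()
{
    int count {3};                      // built in type
    Point point {1, 2};                 // aggregate
    std::vector<int> values {4, 5, 6};  // container initializer list
    const char *unset = nullptr;        // replaces NULL or 0

    return count + point.x + point.y + static_cast<int>(values.size())
        + (unset == nullptr ? 1 : 0);
}
```

In a try compile source file, a function like this would simply be called from main; if the compiler is not in C++11 mode, the brace initializers and nullptr fail to compile and CMake can report the error.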

I discovered something about CMake syntax, namely else and endif statements no longer require repeating the expression in the if statement (as of CMake 2.6.0). The expression may now be left empty. Repeating the expression can sometimes be misleading, for example:

if (NOT ${SOMETHING})
... <not something processing here> ...
else (NOT ${SOMETHING})
... <something processing here> ...
endif (NOT ${SOMETHING})

Which makes it seem like the else part is for handling the "not something" condition. Leaving the expression blank, as in else() and endif(), is much less confusing. The rest of the if-else-endif statements in the CMake file will be updated to this style at some point. (This syntax was already used for handling issues with Windows.)

There was also a statement in the CMake build file that obtained the GCC version for checking, but statements that were checking the version have been since removed, so this statement was also removed. Work is now taking place on a new cpp11 branch.

[branch cpp11 commit 18ed38e9af]

Continued... »

Wednesday, August 13, 2014

Project – Compiler Warnings

In [Stroustrup, 2013], he frequently mentions that the compiler should issue a warning for questionably formed C++ statements. This made me realize that only the default warnings in the compiler were being used. To possibly catch more problems, all of the warnings were enabled using the following GCC compiler options:

-Wall       enable all (well actually most) warnings
-Wextra     enable warnings not enabled by -Wall
-pedantic   enable warnings demanded by strict ISO C++ standard
-Werror     cause compiler to treat warnings as errors

The last option causes the compiler to abort after issuing a warning instead of proceeding, which will require the warning to be corrected; however, this does not affect warnings from the pedantic option, therefore the -pedantic-errors option was used instead, which has the same effect. After adding these options, several warnings were reported, which were of six different types. Click Continue... for details of these warning types that were corrected. Since this commit is development related and not specific to any topic, the commit was just made to the develop branch.

[commit bb0124444f]

Continued... »

Tuesday, August 12, 2014

Transitioning to the New Branching Model

The current topic in development was integrating the program model with recreator into the edit box, which has been going well, but there is at least one issue remaining. When the text for a line entered in the edit box is recreated and replaced back into the edit box, extra undo commands are added to the undo stack. So upon undoing the line, it first undoes the recreation and then undoes the change. This issue will probably not be trivial to correct.

I want to start working on improving the C++ usage including using features that are part of the C++11 standard. This implies a new topic branch for this work. A topic branch is also needed for the recreator to edit box integration, somewhere on the current branch0.6 branch. Finally for the new branching model, the develop branch is needed somewhere.

Since v0.6.1 was the last tag, the develop branch should go at this tag. For the new C++11 branch, it should branch off of the develop branch (at this tag). However, a lot of improvements are on the current branch0.6 branch, so I've decided to split branches further on from the last tag.

Therefore, the new develop branch was created at branch0.6. The current branch0.6 was dead ended with an appropriate abandoned commit. There will be no more of these branches. Normally, git does not allow an empty commit with just a message, but this can be circumvented by using the --allow-empty option on the commit command.

A new recreator-editbox topic branch was placed where the failing encoder test expected results were corrected, before the latest two commits where the various Windows build issues were corrected, since these commits have nothing to do with this topic. When work commences on this topic, changes can be merged from the develop branch.

[commit d3bb314f24] [commit 727f7d223f]

Monday, August 11, 2014

Project – New Git Branching Model

I found a better Git branching model. Currently, a branch is created for the development release series. For example, branch0.5 was created for all the 0.5.x versions (0.5.0 through 0.5.3). Once this development series was completed, the branch was merged back to the master branch. Since there were no other changes on the master branch, the pointer to master was simply moved from 0.4.6 (the last of the 0.4 development series) to 0.5.3 - what is known as a fast-forward merge. In the end though, the repository is just a long row of linear commits.

The new Git branch model is detailed in this blog post. I'm going to adopt this branching model. The main development branch will be named develop, which will replace the branchX.X branches that have been used for development. When development is ready for a release, a releaseX.X branch will be created from the develop branch. There will be at least one commit used for updating the version number, and possibly another to do minor clean up (like fixing comments), plus any for possible platform related bug fixes discovered during testing. This will replace the "updated files for release X.X.X" commits just before a release. If testing goes well (on all platforms), then this branch will be merged to master and tagged for the release (as vX.X.X).

When a new feature, function or change is being added (what is called a "topic"), a topic branch will be created off of the develop branch and will be named for the topic. This will more clearly state what is being developed (branchX.X states nothing except the release series). When the topic is complete, it will be merged back to the develop branch. If for some reason the topic doesn't work out, then the branch will be abandoned and an empty commit will be made with the reason for the abandonment. These topic branches will allow more than one topic to be worked on at a given time.

This model also allows for hot fixes to the master branch, where a hotfix-X.X.X branch will be created for the fix (which includes the proposed hot fix version number). Once complete, the hot fix branch will be merged back to the master branch, and it can be merged to the develop branch as needed.

As the blog post explains, all merges will use the no-fast-forward option (--no-ff), meaning that Git will make a merge commit showing the branch being merged instead of just moving the branch pointer. Using this option will allow it to be clearly seen where the branch started and where it was merged. With the current model, there is no way to see where the branch originally started, only where it ended up (assuming it isn't deleted, which is why the older development branches were not deleted).
