UMUC Asia - CMSC430 - Theory of Language Translation

#contents

** Configuring A Linux Environment - Installing Cygwin for 32-bit Windows 7 [#a3bd504c]

+ Go to http://www.cygwin.com/install.html
+ Download and run setup-x86.exe
+ Select Root Install Directory -> C:\cygwin
+ Select Local Package Directory -> C:\Users\username\Downloads
+ Select Internet Connection -> Direct Connection
+ Choose A Download Site -> choose a nearby mirror site (for example, one under .ac.jp)
+ Select Packages
- Devel (Cur)
-- Bin
--- binutils: GNU assembler, linker, and similar utilities
--- gcc-core: GNU Compiler Collection (C, OpenMP)
--- gettext: GNU Internationalization development utilities
--- mingw-gcc-core: GNU Compiler Collection (C, OpenMP)
--- mingw-gcc-core-g++: GNU Compiler Collection (C, OpenMP)
--- mingw-runtime: MinGW.org MSVC & compiler runtime headers and runtime
-- Bin/Src
--- bison: GNU yacc-compatible parser generator
--- flex: A fast lexical analyzer generator
--- gcc-g++: GNU Compiler Collection (C++)
--- make: The GNU version of the 'make' utility
- Resolving Dependencies -> Select required packages
- The installation should complete within 10 minutes

 $ whereis gcc
 gcc: /usr/bin/gcc.exe /usr/lib/gcc /usr/share/man/man1/gcc.1.gz /usr/src/gcc-4.9.2-1.src/gcc.cygport

 $ whereis bison
 bison: /usr/src/bison-3.0.2-1.src/bison.cygport

 $ whereis flex
 flex: /usr/src/flex-2.5.39-1/flex.mknetrel /usr/src/flex-2.5.39-1/flex.skl

 $ whereis make
 make: /usr/src/make-4.0-2/make.1 /usr/src/make-4.0-2/make.lnk /usr/src/make-4.0-2/make.mknetrel

- Additional Packages
-- Editors
--- vim: VI IMproved - enhanced vi editor
--- vim-common: VI IMproved - enhanced vi editor (common runtime)
-- Net
--- bind: DNS server
--- curl: Multi-protocol file transfer tool
--- whois: GNU Whois
--- ping: A basic network tool to test IP network connectivity
-- Web
--- wget: Utility to retrieve files from the WWW via HTTP and FTP

//** Introductions
//
//Hello class, my name is Yuji Shimojo and I'm 26. I am from Okinawa, Japan and live there now. I'm a technical sales representative at an IT company based in Okinawa. Actually, I used to study Information Engineering at University of the Ryukyus in Okinawa, but I dropped out of the college 5 years ago. While in the college, I had lived in San Jose, California for 6 months for an internship. I'm currently majoring in Computer Science with a minor in Business Administration. This class is required to obtain my B.S.. I took CMIS242, CMSC335, CMSC350, CMSC325, CMSC330, CMSC412, and CMIS330 as part of my major in the past. I'm aiming to enhance my sales engineer career through studying in UMUC.
//
//I have learned C, C++, Java, and CASL II in schools. In my job, I have used PHP, Ruby, and SQL. Among them, my favorite language is Java because it has many of object-oriented features, doesn't depend on OS, and behaves the same way on all platforms.
//
//In my spare time, I enjoy traveling. I've been to California, Taiwan, and several places in Japan. I especially like Taiwan because it is very near from my place. So far I went there 5 times.
//
//I also enjoy scuba diving. Okinawa is known as one of the best islands for scuba diving, so you can enjoy all year round!

** Week 1 Discussion [#ga621d0d]

- Question

Languages can be fully compiled, fully interpreted or partially compiled and then interpreted. Choose a language from the list of the top 20 currently most popular languages at http://www.tiobe.com/index.php/content/paperinfo/tpci/index.html. Choose one that has not already been chosen by one of your classmates. Discuss which of the three translation mechanisms is used with that language. Offer some rationale for why that mechanism was chosen.

- Answer

I chose Java. Java uses the third mechanism: source code is first compiled to bytecode by the Java compiler, and the bytecode is then interpreted by the Java Virtual Machine (JVM). Because the bytecode targets the JVM rather than a particular machine, you can develop programs without depending on a specific operating system.

A major function of the JVM is garbage collection, which automatically frees dynamically allocated memory on the heap. When you create an instance of a class by using the new operator, memory is automatically allocated on the heap. Therefore, unlike in C, you don't have to manage memory allocation and deallocation yourself.

- Professor's comment

What is the difference between the original JVMs and the newer JIT compilers?

- My answer

Professor,

The original JVM was designed as an interpreter (the Java interpreter) for Java bytecode. The bytecode is an abstract machine language for the JVM. The interpreter decodes and executes it one instruction at a time. JIT compilers, on the other hand, load the bytecode into memory and compile it to native code, method by method, before execution. Although a JIT compiler uses more memory because it keeps the compiled native code in memory, it improves the execution speed of the application.

The Java HotSpot VM takes a hybrid approach that combines the interpreter with a JIT compiler: it compiles only frequently executed methods to native code.

** Week 2 Discussion [#df1d8c3a]

- Question

Define some language using a regular expression. The language must contain an infinite number of strings. The underlying alphabet must have at least three different characters. Draw a deterministic finite state automaton that accepts the strings of that language. Give two character strings that are accepted by that finite state automaton and two that are not.

There is a nice tool available for helping you understand regular expressions and draw finite state automata. It is called JFLAP. I would encourage you to download it and use it for this exercise.

** Week 2 Project 1 Discussion [#pd961c5c]

Be sure to read all the attached posts before beginning the first project. Also post any questions about the project as a response to this topic.

In particular you should be sure to read the following three topics I have posted:

-- Project 1 Skeleton Code
-- Test Data for Project 1
-- Decoupling the Listing Code

** Week 2 Project 1 Skeleton Code [#ia483c18]

Project 1 involves writing the lexical analyzer for the compiler. It requires that you provide an input file for flex that generates the lexical analyzer. The main method can be included in that file. I would recommend that you provide a separate C++ file for the code that generates the compilation listing. I suggest also providing a file named tokens.h, which contains the token definitions as an enumerated type. In project 2, this file will be automatically generated by bison. You must also include a makefile that builds your program. To help you with this project, I have provided a skeleton of the flex input, a skeleton tokens.h file and a possible makefile. None of these are requirements, just some help getting you started if you need it:

-- scanner.l http://nova.umuc.edu/~jarc/cmsc430/scanner.html
-- tokens.h http://nova.umuc.edu/~jarc/cmsc430/tokens.html
-- makefile http://nova.umuc.edu/~jarc/cmsc430/makefile.html

In addition, here are examples of how the outputs should look for a correct program and one that contains two lexical errors:

-- successful compilation http://nova.umuc.edu/~jarc/cmsc430/success.html
-- lexical error http://nova.umuc.edu/~jarc/cmsc430/failure.html

** Week 2 Project 1 Decoupling the Listing Code [#t492c63b]

As a matter of good object-oriented design, it is a good idea to decouple the code that displays the error messages from the flex source code. In the later stages of this project, there will be syntactic and semantic errors. The code for displaying and counting these errors should be separate from the flex and bison code since it will be called from both places.

In the skeleton code that I have provided you I am calling the functions as members of a Listing class. If defined as a class, this would really be defining a singleton object. One way to create a singleton is to make all the functions and data static. The standard practice in C++ is to put the class definition in the header file listing.h and the bodies of the member functions in a corresponding .cc file.

Because, unlike Java, C++ does not require all functions to be put in classes, another alternative would be to define these as ordinary functions that are not members of any class. In that case the function prototypes would still go in listing.h and the function bodies in the corresponding .cc file.

In the call to the appendError function I have passed in a parameter named LEXICAL. This is to designate that this error is a lexical error. My suggestion would be to define an enumerated type as follows:

 enum ErrorType {LEXICAL, SYNTAX, SEMANTIC};

The listing.h file would be a good place to put this enumerated type definition. By supplying the error type, the Listing class can keep a count of the number of messages of each kind, which should be displayed when the end of the program is reached. Adding another function to the Listing class to display the error count is the best way to accomplish this.

Finally, let me explain the purpose of the appendError function. It should queue up the error messages so they are displayed at the end of the line. All error messages that occurred on that line can then be displayed by the nextLine function once the line is complete.
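
The design described above could be sketched roughly as follows. This is only a minimal illustration, not the professor's actual skeleton: the names appendError, nextLine, ErrorType and LEXICAL come from the post, while errorCount, totalErrors, the message queue and the line counter are assumptions made for the example. For brevity the class definition and the static member definitions are shown together; in practice they would be split between listing.h and listing.cc.

```cpp
#include <iostream>
#include <queue>
#include <string>

// Error categories, as suggested in the post.
enum ErrorType {LEXICAL, SYNTAX, SEMANTIC};

// Singleton-style class: every member is static, so no object is created.
class Listing
{
public:
    // Queue a message for the current line and count it by category.
    static void appendError(ErrorType type, const std::string& message)
    {
        pending.push(message);
        ++counts[type];
    }

    // Called when a source line is complete: print the queued messages,
    // then advance the line counter and return the new line number.
    static int nextLine()
    {
        while (!pending.empty())
        {
            std::cout << pending.front() << '\n';
            pending.pop();
        }
        return ++lineNumber;
    }

    // Hypothetical helper: errors recorded so far for one category.
    static int errorCount(ErrorType type) { return counts[type]; }

    // Hypothetical helper: errors recorded so far across all categories.
    static int totalErrors()
    {
        return counts[LEXICAL] + counts[SYNTAX] + counts[SEMANTIC];
    }

private:
    static std::queue<std::string> pending;  // messages for the current line
    static int counts[3];                    // one counter per ErrorType
    static int lineNumber;                   // assumed to start at 1
};

// Static data members need one definition outside the class
// (this part would normally live in listing.cc).
std::queue<std::string> Listing::pending;
int Listing::counts[3] = {0, 0, 0};
int Listing::lineNumber = 1;
```

With this sketch, the flex actions would call Listing::appendError(LEXICAL, ...) when an invalid character is seen and Listing::nextLine() when a newline is matched, so the messages appear directly under the offending line.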

** Week 2 Test Data for Project 1 [#oefbfbea]

Attached are the three test cases you should use for project 1. The first is an example of a correct program. The second is one that contains lexical errors. The third one is to test to be sure you are generating correct tokens.

-- test1.txt(191 Bytes) https://learn.umuc.edu/d2l/le/47318/discussions/posts/7416507/ViewAttachment?fileId=2017001
-- test2.txt(191 Bytes) https://learn.umuc.edu/d2l/le/47318/discussions/posts/7416507/ViewAttachment?fileId=2017002
-- test3.txt(313 Bytes) https://learn.umuc.edu/d2l/le/47318/discussions/posts/7416507/ViewAttachment?fileId=2017003

** Week 3 Discussion [#t6b8e619]

- Question

Define a language that is not a regular language using a context-free grammar that is not ambiguous. The alphabet of the
language must contain at least five characters. Describe in English the strings of the language. Choose any string that is in
the language and demonstrate that it is in the language by showing the left-most derivation that derives it. Draw the parse
tree that corresponds to the left-most derivation. Give an example of one string that is not in the language.

- Answer

My language:

 A -> I = E
 I -> x | y | z
 E -> E + T | T
 T -> T * F | F
 F -> ( E ) | I

The left-most derivation:

A -> I = E
-> x = E
-> x = E + T
-> x = T + T
-> x = F + T
-> x = ( E ) + T
-> x = ( T ) + T
-> x = ( T * F ) + T
-> x = ( F * F ) + T
-> x = ( I * F ) + T
-> x = ( x * F ) + T
-> x = ( x * I ) + T
-> x = ( x * y ) + T
-> x = ( x * y ) + F
-> x = ( x * y ) + ( E )
-> x = ( x * y ) + ( T )
-> x = ( x * y ) + ( T * F )
-> x = ( x * y ) + ( F * F )
-> x = ( x * y ) + ( ( E ) * F )
-> x = ( x * y ) + ( ( E + T ) * F )
-> x = ( x * y ) + ( ( T + T ) * F )
-> x = ( x * y ) + ( ( F + T ) * F )
-> x = ( x * y ) + ( ( I + T ) * F )
-> x = ( x * y ) + ( ( x + T ) * F )
-> x = ( x * y ) + ( ( x + F ) * F )
-> x = ( x * y ) + ( ( x + I ) * F )
-> x = ( x * y ) + ( ( x + y ) * F )
-> x = ( x * y ) + ( ( x + y ) * I )
-> x = ( x * y ) + ( ( x + y ) * z )

An example of one string that is not in the language:

x = ( a * b ) + ( ( x + y ) * z )
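
One way to check membership mechanically is a small recursive-descent recognizer. The sketch below assumes that I derives only x, y and z, that blanks between symbols are ignored, and that the left-recursive rules for E and T are rewritten as loops (which accepts the same strings). The names Recognizer and inLanguage are made up for this example.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Recognizer for:  A -> I = E,  E -> E + T | T,  T -> T * F | F,
//                  F -> ( E ) | I,  I -> x | y | z   (assumed form of I)
class Recognizer
{
public:
    explicit Recognizer(const std::string& text) : input(text), pos(0) {}

    // True when the whole input can be derived from A.
    bool accepts()
    {
        return identifier() && match('=') && expression() && peek() == '\0';
    }

private:
    std::string input;
    std::size_t pos;

    // Next non-blank character, or '\0' at the end of the input.
    char peek()
    {
        while (pos < input.size() &&
               std::isspace(static_cast<unsigned char>(input[pos])))
            ++pos;
        return pos < input.size() ? input[pos] : '\0';
    }

    // Consume the expected character if it is next.
    bool match(char expected)
    {
        if (peek() != expected) return false;
        ++pos;
        return true;
    }

    // I -> x | y | z
    bool identifier()
    {
        char c = peek();
        if (c != 'x' && c != 'y' && c != 'z') return false;
        ++pos;
        return true;
    }

    // E -> E + T | T, with the left recursion rewritten as a loop
    bool expression()
    {
        if (!term()) return false;
        while (peek() == '+')
        {
            ++pos;
            if (!term()) return false;
        }
        return true;
    }

    // T -> T * F | F, with the left recursion rewritten as a loop
    bool term()
    {
        if (!factor()) return false;
        while (peek() == '*')
        {
            ++pos;
            if (!factor()) return false;
        }
        return true;
    }

    // F -> ( E ) | I
    bool factor()
    {
        if (peek() == '(')
        {
            ++pos;
            return expression() && match(')');
        }
        return identifier();
    }
};

// Convenience wrapper (hypothetical name).
bool inLanguage(const std::string& text)
{
    return Recognizer(text).accepts();
}
```

Calling inLanguage on the derived string x = ( x * y ) + ( ( x + y ) * z ) should return true, while the counterexample containing a and b should return false, because a and b are not symbols of the alphabet.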

** Week 4 Project 2 Submit Files [#a858caa5]

The second project involves writing the syntactic analyzer for the compiler that was begun in the previous project. The grammar of the language is the following:

COLOR(red){program}:~
　{COLOR(red){function}}

COLOR(red){function}:~
　COLOR(blue){FUNCTION IDENTIFIER} [COLOR(red){parameters}] COLOR(blue){RETURNS} COLOR(red){type} COLOR(blue){;} COLOR(red){body}

COLOR(red){parameters}:~
　COLOR(red){parameter} {COLOR(blue){,} COLOR(red){parameter}}

COLOR(red){parameter}:~
　COLOR(blue){IDENTIFIER} : COLOR(red){type}

COLOR(red){type}:~
　COLOR(blue){INTEGER} | COLOR(blue){REAL} | COLOR(blue){BOOLEAN}

COLOR(red){body}:~
　{COLOR(red){variable}} COLOR(blue){BEGIN} COLOR(red){statement} COLOR(blue){END ;}

COLOR(red){variable}:~
　COLOR(blue){IDENTIFIER :} COLOR(red){type} COLOR(blue){IS} COLOR(red){statement}

COLOR(red){statement}:~
　COLOR(red){expression} COLOR(blue){;} |~
　COLOR(blue){IF} COLOR(red){expression} COLOR(blue){THEN} COLOR(red){statement} COLOR(blue){ELSE} COLOR(red){statement} COLOR(blue){ENDIF ;}

COLOR(red){expression}:~
　COLOR(blue){IDENTIFIER} |~
　COLOR(blue){IDENTIFIER (}COLOR(red){expression} {COLOR(blue){,} COLOR(red){expression}}COLOR(blue){)} |~
　COLOR(blue){INT_LITERAL} | COLOR(blue){REAL_LITERAL} | COLOR(blue){BOOLEAN_LITERAL} |~
　COLOR(blue){NOT} COLOR(red){expression} |~
　COLOR(red){expression operator expression} |~
　COLOR(blue){(}COLOR(red){expression}COLOR(blue){)} 

COLOR(red){operator}: COLOR(blue){ADDOP} | COLOR(blue){MULOP} | COLOR(blue){RELOP} | COLOR(blue){AND} | COLOR(blue){OR}

In the above grammar, the red symbols are nonterminals, the blue symbols are terminals and the black punctuation marks are EBNF metasymbols. The braces denote repetition 0 or more times and the brackets denote an optional element.

The grammar must be rewritten to eliminate the EBNF brace and bracket metasymbols and to incorporate the significance of parentheses, operator precedence and left associativity for all operators. Among arithmetic operators the multiplying operators have higher precedence than the adding operators. All relational operators have the same precedence. Among the binary logical operators, AND has higher precedence than OR. Of the categories of operators, the unary logical operator has highest precedence, the arithmetic operators have next highest precedence, followed by the relational operators and finally the binary logical operators. The directives to specify precedence and associativity, such as %prec and %left, may not be used.

The syntactic analyzer should be created using bison. It should detect and recover from syntax errors to the extent possible. The semicolon should be used as the primary synchronization token. The compiler should produce a listing of the program with error messages included after the line in which they occur. A count of the number of lexical and syntactic errors and the number of total errors should follow the compilation listing.

Your parser should, however, be able to correctly parse any syntactically correct program without any problem.

You will lose points from the design portion of your grade if your bison input produces any shift/reduce or reduce/reduce conflicts.

The 40 points that you will receive for the functionality portion of your grade on this project will be based on the criteria shown below:

|Criterion|Points|h
|Parses all syntactically correct programs|22|
|Detects and recovers from errors in the function header|3|
|Detects and recovers from errors in variable declarations|3|
|Detects and recovers from errors in conditional expressions|3|
|Detects and recovers from errors in arithmetic expressions|3|
|Detects and recovers from errors in the function body|3|
|Detects and recovers from multiple errors|3|

Test data will be provided to test each of the above cases.

The next two phases of the project do not require that you implement error checking.

** [C++ for Java Programmers] Inheritance in C++ [#o0ff25b1]

Like Java, C++ is an object-oriented language; however, there are some important differences. First, when one class extends another class, the syntax is a bit different, as C++ does not have extends as a reserved word. It is written as follows:

 class Derived: public Base ...

It is important to specify the base class as public to get the kind of inheritance that is familiar from Java, because public inheritance is not the default for a class. The syntax for calling the constructor of a superclass is also different. Unlike Java, C++ does not have the reserved word super; instead, the base-class constructor is called by name in the constructor's initializer list:

 class Derived: public Base
 {
 public:
      Derived(int value) : Base(value)
      { ...

C++ member functions (methods) are not bound dynamically by default like they are in Java. To ensure dynamic binding, a member function must be declared as virtual in the base class. Failing to declare it as such causes a member function in a derived class with the same prototype (signature) to be considered redefining the function, rather than overriding it.
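
The difference between overriding and redefining can be seen in a small complete example. This is a sketch under assumptions: the member names dynamicName, staticName, getValue and storedValue are invented for illustration, and the override keyword requires C++11 or later.

```cpp
#include <string>

class Base
{
public:
    explicit Base(int value) : storedValue(value) {}
    virtual ~Base() {}

    // Declared virtual: calls are bound at run time (can be overridden).
    virtual std::string dynamicName() const { return "Base"; }

    // Not virtual: calls are bound at compile time from the static type.
    std::string staticName() const { return "Base"; }

    int getValue() const { return storedValue; }

private:
    int storedValue;
};

class Derived : public Base
{
public:
    // The base-class constructor is called in the initializer list.
    explicit Derived(int value) : Base(value) {}

    // Overrides Base::dynamicName.
    std::string dynamicName() const override { return "Derived"; }

    // Only redefines (hides) Base::staticName.
    std::string staticName() const { return "Derived"; }
};
```

Given a Derived object d and a reference Base& ref = d, the call ref.dynamicName() yields "Derived" while ref.staticName() yields "Base", because only the virtual function is looked up through the object's actual type.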

C++ has the equivalent of abstract methods, but they are called pure virtual functions and are declared as follows:

 class Base
 {
 public:
       virtual void abstract_method() = 0;

Interfaces can be created in C++, although C++ has no reserved word interface. To create an interface, we simply create a class that contains only pure virtual functions. To implement an interface, we use ordinary public class inheritance, as C++ has no reserved word implements. Because C++ allows multiple inheritance of classes, it is possible to extend a class and implement an interface.
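
As an illustration, the following sketch extends one class and implements one interface at the same time. The class names Printable, Shape and Square are hypothetical, and the virtual destructor in the interface is an addition beyond "only pure virtual functions" that is commonly included so objects can be safely destroyed through an interface pointer.

```cpp
#include <string>

// An "interface": a class whose member functions are all pure virtual
// (plus a virtual destructor).
class Printable
{
public:
    virtual ~Printable() {}
    virtual std::string describe() const = 0;
};

// An ordinary base class with some implementation.
class Shape
{
public:
    explicit Shape(double sideLength) : side(sideLength) {}
    virtual ~Shape() {}
    virtual double area() const { return 0.0; }

protected:
    double side;
};

// Extends Shape and implements Printable through multiple inheritance.
class Square : public Shape, public Printable
{
public:
    explicit Square(double sideLength) : Shape(sideLength) {}
    double area() const override { return side * side; }
    std::string describe() const override { return "square"; }
};
```

A Square can then be used wherever a Shape& or a Printable& is expected, much like a Java class that both extends a superclass and implements an interface.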

** [C++ for Java Programmers] Separating the Interface from the Implementation [#i239ca1f]

Unlike Java, which allows the complete class definition to be written together, it is customary practice in C++ to put the class definition, which contains only the function prototypes (method signatures) and the definition of the member variables (instance variables), in a header file whose name is the name of the class with a .h extension. The bodies of the member functions are then placed in a source file, whose name is the name of the class with a .cpp extension. Because the member function bodies are defined outside the class definition, the function names must be prefixed with the class name followed by the scope resolution operator (::) as follows:

 void ClassName::functionName() ...

C++ does allow function bodies to be placed in the class definition in a fashion that resembles Java; however, the significance of doing so is different. When the body is included in the class definition, it is an implicit recommendation to the compiler to expand the function inline wherever it is called. Consequently, it is best to place only short, one- or two-line functions in the class definition.
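
A minimal sketch of this layout, using a hypothetical Counter class, is shown below. The two parts would normally be separate files; they are shown in one block here so the example compiles as a single unit.

```cpp
// ---- Counter.h: class definition with prototypes and member variables ----
class Counter
{
public:
    Counter();
    void increment();

    // Body inside the class definition: an implicit request to inline,
    // so it is kept to a single line.
    int value() const { return count; }

private:
    int count;
};

// ---- Counter.cpp: would begin with  #include "Counter.h"  ----

// Each name is qualified with the class name and the :: operator.
Counter::Counter() : count(0)
{
}

void Counter::increment()
{
    ++count;
}
```

Client code includes only Counter.h; the bodies in Counter.cpp are compiled once and linked in.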

** [C++ for Java Programmers] Memory Allocation of Objects [#xe21e92a]

Unlike Java, C++ does not allocate objects on the heap by default. Instead, they are allocated on the run-time stack, like ordinary variables of primitive types. In most cases this approach is adequate, and it is also the simplest because it does not require an instantiation step. A declaration of an object in C++ looks as follows:

 ClassName object;

The above declaration allocates space for that object on the stack. When calling the member functions (methods) of such objects, we use the dot operator just as we do in Java.

It is sometimes necessary or preferable to dynamically allocate objects on the heap. When that is the case, we must use a pointer and we must instantiate the object much like we do in Java. For example:

 ClassName* objectPointer = new ClassName();

When we call a member function (method) of an object that has been allocated on the heap, we must use the arrow (->) operator rather than the dot operator.
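
The two allocation styles can be compared side by side. In this sketch the Account class and the demoAllocation function are hypothetical. Note one further difference from Java that the example relies on: C++ has no garbage collector, so an object created with new must be released with delete.

```cpp
class Account
{
public:
    explicit Account(int opening) : balance(opening) {}
    void deposit(int amount) { balance += amount; }
    int getBalance() const { return balance; }

private:
    int balance;
};

// Hypothetical helper that exercises both allocation styles.
int demoAllocation()
{
    Account onStack(100);               // allocated on the run-time stack
    onStack.deposit(50);                // dot operator

    Account* onHeap = new Account(10);  // allocated on the heap
    onHeap->deposit(5);                 // arrow operator

    int total = onStack.getBalance() + onHeap->getBalance();

    delete onHeap;  // heap objects must be freed explicitly
    return total;   // onStack is destroyed automatically on return
}
```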

** [C++ for Java Programmers] Managing Include Preprocessor Commands [#af902b12]

One of the greatest difficulties that Java programmers have when writing C++ programs is dealing with the preprocessor include commands. Unlike Java, in C++ it is standard practice to place the class definition in a header (.h) file of the same name as the class and the bodies of the member functions (methods) in a corresponding source (.cpp) file. In order for one class to reference another class in any way, the client class must include the header file of the class that it is referencing. There are several approaches to managing include files.

The simplest approach is to place all the include commands in a single include file, which is commonly named stdafx.h, and have all the .cpp files include that header file. What is most important to understand is that the order in which the commands are listed is critically important. Every header file must be included after all other header files that it depends on. These dependencies are defined by whether a class corresponding to a header file references another class corresponding to another header file. Keep in mind that there are several ways that one class can reference another. It can be the result of calling a method in another class, containing an object whose type is another class, or finally because of inheritance, in which one class is derived from another class. If we view these dependencies as a directed graph, the order can be any topological order of the graph. Of course, such an order only exists when there are no circular dependencies. When a circular dependency does exist, as happens in project 2, an incomplete class declaration (a forward declaration of the form class ClassName;) is required to eliminate the inherent forward reference.
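
A circular dependency and its resolution with an incomplete class declaration might look like the following sketch. The Department and Employee classes are hypothetical and are not taken from project 2; the point is only that a pointer to a class can be declared before the class is fully defined, while code that uses its members must come after the full definition.

```cpp
#include <string>

// Incomplete (forward) declaration: enough to declare pointers or references.
class Employee;

class Department
{
public:
    explicit Department(const std::string& deptName)
        : name(deptName), manager(nullptr) {}
    void setManager(Employee* newManager) { manager = newManager; }
    std::string getName() const { return name; }

    // Only declared here: the body needs the full Employee definition.
    std::string managerName() const;

private:
    std::string name;
    Employee* manager;
};

class Employee
{
public:
    Employee(const std::string& employeeName, Department* department)
        : name(employeeName), dept(department) {}
    std::string getName() const { return name; }
    std::string departmentName() const { return dept->getName(); }

private:
    std::string name;
    Department* dept;
};

// Defined after Employee is complete, so its members may be used.
std::string Department::managerName() const
{
    return manager ? manager->getName() : "";
}
```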

Although this approach is the simplest, one drawback to this approach is that whenever any header file is modified, the entire program must be recompiled. In the early days of C and C++, such an approach would have been unthinkable for large programs because compilers were much slower. Given the speed of computers and compilers today, this concern is much less important unless the program is very large.

Another approach is to put include commands only in the source files and to include only those header files that are needed. In this case, order is still vitally important. The advantage of this approach is that it minimizes recompilation. This is the approach I have used in my examples.

The final approach is to allow include commands in both source files and other header files. Because a class definition can depend on another class, allowing header files to include other header files can simplify determining the dependencies. Like the second approach, this approach minimizes recompilation, but it introduces the possibility of a header file being included more than once. There are two ways to circumvent this problem. The traditional approach is to use conditional compilation as follows:

 #ifndef A_CLASS
 #define A_CLASS
 
   The class definition
 
 #endif

Many newer compilers provide a pragma for this purpose, which is written as follows:

 #pragma once

This pragma has the same effect as using conditional compilation.
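
To see why a guard is needed, the sketch below simulates a header being included twice by repeating its text in one file. The guard macro is tested with #ifndef first and defined immediately afterwards, so the second copy is skipped. The names POINT_H, Point and sumCoordinates are hypothetical.

```cpp
// First inclusion: POINT_H is not yet defined, so the body is compiled.
#ifndef POINT_H
#define POINT_H

struct Point
{
    int x;
    int y;
};

#endif

// Second inclusion of the same text: POINT_H is already defined, so the
// preprocessor skips the body and Point is not defined twice.
#ifndef POINT_H
#define POINT_H

struct Point
{
    int x;
    int y;
};

#endif

// Hypothetical helper that uses the single definition of Point.
int sumCoordinates(const Point& p)
{
    return p.x + p.y;
}
```

Without the guard, the second copy would produce a redefinition error for Point.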

You may choose whichever of these approaches you prefer.