Tool Vaci Manual

version 0.0.9

Author:Toshihiro Kamiya
Created:2008/June/2
Last-Modified:2008/July/18
Contact:info@ccfinder.net
Copyright:2008 ( C ) Toshihiro Kamiya. All rights reserved.

Introduction

Tool Vaci is a reverse engineering tool. It statically analyzes source code and classifies identifiers by the contexts in which they appear. The tool lets the user look for inadequate names and investigate name changes between versions of a software product.

Install

  1. Install the required tools and applications.

    1. Install CCFinderX. Vaci uses CCFinderX to analyze source code.
    2. Install GraphViz and add it to the path (append /Bin of the GraphViz install directory to the value of the environment variable PATH). Vaci uses GraphViz to generate graphs.
    3. Install a web browser that supports SVG (Scalable Vector Graphics). (Vaci was tested with Firefox3 RC2. Porting to IE is planned after the release of IE 8.) Vaci outputs each graph figure as an SVG file.
  2. Install Vacicmd

    Copy the executable files (*.exe in the case of Windows) of the Vaci distribution to /bin of the CCFinderX install directory.

  3. Install Vaci plug-in

    First, unzip vaciplugin.zip and copy the files ( net.ccfinder.plugin.vaci.*.jar ) to /plugin of Eclipse. Then start Eclipse. In the dialog opened by the menu [Window]-[Show View]-[Other...], select [Vaci]-[Vaci View] and click [OK].
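
A quick way to confirm that the installation steps above took effect is to check that the executables can be found on PATH. The following Python sketch is only an illustration and is not part of Vaci. It assumes that GraphViz provides its usual dot executable and that the Vaci executables are named vacicmd and visualizetransmap, as they are used later in this manual.

```python
import shutil


def missing_tools(names):
    """Return the subset of `names` that cannot be found on PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    # Assumed executable names; adjust them to your installation.
    required = ["dot", "vacicmd", "visualizetransmap"]
    missing = missing_tools(required)
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All required tools were found on PATH.")
```

If a tool is reported as missing, revisit the corresponding step above (for example, the PATH entry for the GraphViz /Bin directory).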

Usage

Tool Vaci has two functions: 1) applied to the source code of a product, it investigates variations of names within the product, and 2) applied to the source code of several versions of a product, it investigates changes of names between versions.

Usage 1) Investigating variations of names within a product

The command "c" of vacicmd detects sets of identifiers (hereafter, "translation classes") in which the identifiers have distinct names but appear in similar contexts. When a translation class includes identifiers that are not named consistently, such name variations may reveal inadequate naming of identifiers.

Vacicmd extracts translation classes and classifies them into nine types for convenience of analysis. The user can browse these translation classes and their classifications with the Vaci plug-in, and investigate each translation class by referring to the corresponding source code.

Step 1 Detection

We assume the target source files are stored in a directory c:\targetsrc . (If you would like to browse the result in Eclipse with the Vaci plug-in, select the root directory of an Eclipse project, that is, a directory that contains a file .project .) We also assume that the source code is written in the Java programming language. (If the target is C/C++ source code, replace "java" with "cpp" in the following explanation.)

Type the following command lines to run vacicmd:

c:\>pushd c:\targetsrc
c:\targetsrc>vacicmd c java .

When vacicmd finishes successfully, a file _transclasses.txt is generated. This is a text file, so you can check its content with the more command or a text editor:

c:\targetsrc>more _transclasses.txt

Step 2 Browsing

This step explains how to browse translation classes within the Eclipse IDE.

Import c:\targetsrc with the import menu of Eclipse in advance.

Push the button [Vaci search] on the Eclipse toolbar. A dialog will appear.

images/dialog_vaci_search.png

In this dialog, you can specify options for showing translation classes. The three types that are selected (checked) by default are the ones most likely to indicate variations of naming.

images/vaci_view_half.png

In the Vaci view, the translation classes are shown as trees.

  • Each root node of a tree (icon folder ) expresses a root directory of an Eclipse project.
  • Each node at the next level expresses a translation class. The icon by the node (such as abbreviated minority ) shows the type of the translation class. For details of each type, refer to Types of translation class .
  • Each node at the third level (icon identifier ) expresses an identifier, which is a member of the translation class.
  • Each node at the deepest level (icon file ) expresses a location where the identifier appears.

By double-clicking a node at the deepest level, you can make the Eclipse text editor show the code fragment where the identifier appears in a source file.

Step 3 Reading with text editor

This step explains how to read translation classes with a text editor. (This step will be helpful when developing a tool that processes the output of Vacicmd.)

A file of translation classes contains a series of sections. Each section starts with a line "type: ..." and ends with an empty line. A section expresses a translation class. The following is an example of a section:

type: minority
id|cf.endPos
      h:\kamiya\prog\smith2008\GemX\model\layeredgroup\CodeFragment.java:49
id|cf.beginPos
      h:\kamiya\prog\smith2008\GemX\model\layeredgroup\CodeFragment.java:46
--
id|right.end
      h:\kamiya\prog\smith2008\GemX\model\CodeFragment.java:29
id|right.begin
      h:\kamiya\prog\smith2008\GemX\model\CodeFragment.java:25
id|right.leftEnd
      h:\kamiya\prog\smith2008\GemX\model\ClonePair.java:54
id|right.rightEnd
      h:\kamiya\prog\smith2008\GemX\model\ClonePair.java:62
id|right.leftBegin
      h:\kamiya\prog\smith2008\GemX\model\ClonePair.java:50
id|right.rightFile
      h:\kamiya\prog\smith2008\GemX\model\ClonePair.java:46
id|right.rightBegin
      h:\kamiya\prog\smith2008\GemX\model\ClonePair.java:58

The first line shows the type of the translation class. (The type is determined by whether some rule holds among the names, which rule it is, and so on; refer to Types of translation class.) Among the remaining lines, a line "id|..." shows an identifier. A line starting with a tab shows a location where the identifier appears in the source code, given as a file name and a line number.

Moreover, when the type of the translation class is "minority", the section is divided into two parts by a line "--". The first part contains the identifiers that are regarded as minorities. The second part contains the identifiers that are regarded as majorities.
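
As a starting point for a tool that processes this output, the format described above can be read with a few lines of Python. This is a minimal sketch under the assumptions stated in this step (sections start with "type:", identifier lines start with "id|", location lines start with whitespace and end with ":line", and "--" separates minorities from majorities). The class and function names are hypothetical and not part of Vaci.

```python
from dataclasses import dataclass, field


@dataclass
class Identifier:
    name: str
    locations: list = field(default_factory=list)  # (file name, line number)
    minority: bool = False  # True only in the first part of a "minority" section


@dataclass
class TranslationClass:
    kind: str  # value of the "type:" line
    identifiers: list = field(default_factory=list)


def parse_translation_classes(text):
    """Parse the content of a translation class file into a list of sections."""
    classes = []
    current = None
    in_minority = False
    for raw in text.splitlines():
        line = raw.rstrip()
        if not line.strip():
            current = None  # an empty line ends the section
        elif line.startswith("type:"):
            kind = line[len("type:"):].strip()
            current = TranslationClass(kind)
            in_minority = kind == "minority"
            classes.append(current)
        elif current is None:
            continue  # ignore anything outside a section
        elif line.strip() == "--":
            in_minority = False  # the majorities follow
        elif line.startswith("id|"):
            current.identifiers.append(
                Identifier(line[len("id|"):], minority=in_minority))
        elif raw[0].isspace() and current.identifiers:
            # Split at the last colon so that drive letters stay in the path.
            path, _, lineno = line.strip().rpartition(":")
            current.identifiers[-1].locations.append((path, int(lineno)))
    return classes
```

For example, reading the file with open("_transclasses.txt").read() and passing the text to parse_translation_classes yields one TranslationClass object per section.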

Usage 2) Investigating changes in names between versions of a product

The "m" command of vacicmd extracts identifiers that share a common context between two versions but have distinct names in the two versions. That is, it extracts a set of identifiers in the older version (O) and a set of identifiers in the newer version (N), where an identifier in O shares a context with an identifier in N and the two identifiers have distinct names. Such a pair of sets O, N is called a "translation map".

With translation maps, the user can investigate the changes in names between versions.

Step 1 Detection

We assume that the source code of the older version and of the newer version are stored in the directories c:\oldver and c:\newver, respectively. The directory where the detection result will be stored is c:\analysis. The target source code is written in the Java programming language.

Type the following command lines:

c:\>pushd c:\analysis
c:\analysis>vacicmd m java c:\oldver c:\newver

When vacicmd finishes successfully, a file "_transmaps.txt" is generated:

c:\analsys>more _transmaps.txt

Step 2 Browsing

This step explains how to generate HTML files from translation maps and browse them.

Run visualizetransmap at the command line. It creates a subdirectory named browse and stores the generated HTML files in that directory:

c:\analysis>visualizetransmap _transmaps.txt -p browse

Open the generated file index.html with a web browser. A page like the following will be shown.

images/transmapbrowse1.png

The left pane contains a summary and the captions Type 1to1, Type 1toN, Type Nto1, and Type MtoN. The summary gives the number of detected translation maps of each type (in the initial state, the content of the summary is shown in the right pane). Below each type caption, the serial numbers (#number) of the translation maps of that type are listed. Clicking a serial number shows the content of that translation map in the right pane.

The right pane shows either the content of the summary or the content of the translation map selected in the left pane. The following figure shows a page with the content of a translation map in the right pane.

images/transmapbrowse2.png

The right pane contains the serial number of the translation map, a graph of the translation map, and the edges of the graph.

The graph is generated from the translation map (the graph is not visible in a browser without SVG support). Each node (box) in the graph expresses an identifier. The nodes on the left side are the identifiers in the older version. The nodes on the right side are the identifiers in the newer version. Each solid edge expresses a relation between identifiers, that is, the two identifiers at its ends share a context. Each dashed edge connects identifiers with the same name, that is, an identical name appears in both the older version and the newer version.

Below the caption "Edges", for each edge in the graph, the shared context that establishes the relation between the identifiers is given as the locations of the code fragments in the source code.
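
To make the structure of such a graph concrete, the sketch below builds GraphViz DOT text from a list of (older name, newer name) pairs: boxes for identifiers, solid edges for shared contexts, and dashed edges for names that appear in both versions. It only illustrates the layout described above. It is not how visualizetransmap is implemented, and the function name and node naming scheme are assumptions.

```python
def translation_map_to_dot(pairs):
    """Return DOT text for a translation map.

    `pairs` holds (older_name, newer_name) tuples, one per shared context.
    """
    pairs = list(pairs)
    old_names = sorted({old for old, _ in pairs})
    new_names = sorted({new for _, new in pairs})
    lines = ["graph transmap {", "  rankdir=LR;", "  node [shape=box];"]
    # Prefix node ids so that an older and a newer identifier with the
    # same name remain two separate boxes.
    for name in old_names:
        lines.append(f'  "old:{name}" [label="{name}"];')
    for name in new_names:
        lines.append(f'  "new:{name}" [label="{name}"];')
    # Solid edges: the two identifiers share a context.
    for old, new in pairs:
        lines.append(f'  "old:{old}" -- "new:{new}";')
    # Dashed edges: the same name appears in both versions.
    for name in sorted(set(old_names) & set(new_names)):
        lines.append(f'  "old:{name}" -- "new:{name}" [style=dashed];')
    lines.append("}")
    return "\n".join(lines)
```

Saving the returned text to a file such as map.dot and running "dot -Tsvg map.dot -o map.svg" would produce an SVG figure that can be viewed in the browser.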

Step 3 Reading with text editor

This step explains how to read translation maps with a text editor. (This step will be helpful when developing a tool that processes the output of Vacicmd.)

A file of translation maps ( _transmaps.txt ) contains a series of sections. Each section starts with a line "type: ..." and ends with an empty line. A section expresses a translation map. The following is an example of a section:

type: MtoN
id|addPreprocessorItem        id|addPreprocessScript
      c:\analysis\10.1.9\GemX\gemx\MainWindow.java:890        c:\analysis\10.2.3.5\GemX\gemx\MainWindow.java:577
id|add        id|addPreprocessorItem
      c:\analysis\10.1.9\GemX\gemx\MainWindow.java:470        c:\analysis\10.2.3.5\GemX\gemx\MainWindow.java:1040
id|add        id|addPreprocessScript
      c:\analysis\10.1.9\GemX\gemx\MainWindow.java:470        c:\analysis\10.2.3.5\GemX\gemx\MainWindow.java:577

The first line shows the type of the translation map (refer to Types of translation map). Among the remaining lines, each line "id|..." shows a pair of identifiers from the older version and the newer version; the identifier before the tab is the older one, and the identifier after the tab is the newer one. A line starting with a tab shows the locations of the older and newer identifiers, each given as a file name and a line number; the location before the tab is that of the older identifier, and the location after the tab is that of the newer one.
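
The following Python sketch reads this format. It assumes that the two fields on each line are separated by a tab character, as described above, and that every location ends with ":line". The names Edge and parse_translation_maps are hypothetical and not part of Vaci.

```python
from dataclasses import dataclass


@dataclass
class Edge:
    old_name: str
    new_name: str
    old_location: tuple = None  # (file name, line number) in the older version
    new_location: tuple = None  # (file name, line number) in the newer version


def _location(text):
    # Split at the last colon so that drive letters stay in the path.
    path, _, lineno = text.strip().rpartition(":")
    return (path, int(lineno))


def parse_translation_maps(text):
    """Parse the content of a translation map file into a list of dicts."""
    maps = []
    current = None
    for raw in text.splitlines():
        if not raw.strip():
            current = None  # an empty line ends the section
        elif raw.startswith("type:"):
            current = {"type": raw[len("type:"):].strip(), "edges": []}
            maps.append(current)
        elif current is None:
            continue  # ignore anything outside a section
        elif raw.startswith("id|"):
            old, _, new = raw.partition("\t")
            current["edges"].append(
                Edge(old[len("id|"):], new.strip()[len("id|"):]))
        elif raw.startswith("\t") and current["edges"]:
            old_loc, _, new_loc = raw.strip().partition("\t")
            edge = current["edges"][-1]
            edge.old_location = _location(old_loc)
            edge.new_location = _location(new_loc)
    return maps
```

Each returned dictionary holds the type of the translation map and one Edge per "id|..." line, with the locations taken from the line that follows it.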

Appendix

Types of translation class

A translation class, which Tool Vaci extracts from the source code of a product, is classified into one of the following types according to the characteristics of the names it includes.

Each entry gives the name of the type, its notation in the output file (in parentheses), and a description.

  • Identical except for name space (nearly-identical): All identifiers have the same name except for the name space (namespace in C++ or package in Java).
  • Including distinct numbers (distinct-numbers): All identifiers have the same name except for numbers in the names.
  • Short name (short-name): Some identifier in the translation class has a short name once name spaces have been removed from the names.
  • Same prefix (same-prefix): All identifiers have the same prefix.
  • Same postfix (same-postfix): All identifiers have the same postfix.
  • Distinct case (distinct-case): All identifiers have the same name once the names have been converted to lower case and the trailing "s"s (which may indicate a plural form) have been removed.
  • Abbreviated (abbreviated): All of the names can be generated by removing some characters from the name of one of the identifiers.
  • Including minority (minority): The identifiers are divided into minorities and majorities by the patterns of their names.
  • Others (others): The identifiers do not fit any of the above cases.
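
To illustrate how rules of this kind can be checked, the sketch below tests two of the descriptions above, distinct-numbers and distinct-case, on a list of names. It is a simplified reading of those descriptions under stated assumptions (for example, every run of digits is removed, and all trailing "s" characters are stripped). It is not Vaci's actual classification code, and the function names are hypothetical.

```python
import re


def is_distinct_numbers(names):
    """True if the names differ, but only in the numbers they contain."""
    stripped = {re.sub(r"\d+", "", name) for name in names}
    return len(set(names)) > 1 and len(stripped) == 1


def is_distinct_case(names):
    """True if the names differ, but only in case or in trailing "s" characters."""
    normalized = {name.lower().rstrip("s") for name in names}
    return len(set(names)) > 1 and len(normalized) == 1
```

For instance, is_distinct_numbers(["value1", "value2"]) and is_distinct_case(["Item", "items"]) both hold, while a pair such as "item" and "name" satisfies neither rule.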

Types of translation map

A translation map, which Tool Vaci extracts from the source code of two versions of a product, is classified into one of the following types according to the number of identifiers in the older version and the number of identifiers in the newer version.

  • Type 1to1: The translation map includes one identifier in the older version and one identifier in the newer version. This possibly means that the identifier has been renamed between versions.
  • Type 1toN: The translation map includes one identifier in the older version and multiple identifiers in the newer version. This possibly means that the concept described by the name in the older version has been split into multiple concepts in the newer version.
  • Type Nto1: The translation map includes multiple identifiers in the older version and one identifier in the newer version. This possibly means that multiple names were used to refer to one concept in the older version and that the names have been integrated into one name in the newer version.
  • Type MtoN: The translation map includes multiple identifiers in the older version and multiple identifiers in the newer version.
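
The classification above depends only on the two counts, which the following Python sketch makes explicit. The function name is hypothetical; the returned strings follow the notation used in the "type: ..." lines of the translation map file.

```python
def translation_map_type(old_names, new_names):
    """Return "1to1", "1toN", "Nto1" or "MtoN" for the given identifier sets."""
    n_old = len(set(old_names))
    n_new = len(set(new_names))
    if n_old == 0 or n_new == 0:
        raise ValueError("a translation map needs identifiers on both sides")
    if n_old == 1 and n_new == 1:
        return "1to1"
    if n_old == 1:
        return "1toN"
    if n_new == 1:
        return "Nto1"
    return "MtoN"
```

With the example section shown earlier, the older version contributes the names add and addPreprocessorItem and the newer version contributes addPreprocessScript and addPreprocessorItem, so the function returns "MtoN".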