I've been thinking for a long time about how to evaluate software tools empirically. Much of that effort was spent looking at techniques that could be adapted from other disciplines, such as the social sciences (see "Beg, borrow, or steal," "Evaluating...," and "Characterising..."). I am now looking at benchmarking, a "Made in Computer Science" approach to tool evaluation. Coming full circle has allowed me to look at this technique from a different perspective.
Benchmarking is a rich technique that can have a significant impact on the state of research within a field. Other evaluations typically occur within a single laboratory or a privileged industrial setting. In contrast, benchmarks are open and public, and they provide opportunities for the wider community to become involved in evaluating their own tools. This last point is important because, as they say, seeing is believing. Consequently, researchers become more aware of each other's contributions and are more likely to apply these lessons learned to their own tools. My dissertation explores these issues more fully, but it's not done yet. For now, you can read a paper that has been accepted for ICSE2003. As part of this work, I have developed two benchmarks.
Program Comprehension Tool Benchmark
The first is the xfig structured demonstration for program comprehension tools, developed in collaboration with Margaret-Anne Storey from the University of Victoria. We used xfig 3.2.1 as the common subject system, and the materials are available if you want to try it yourself. This benchmark has been used at CASCON99 and WCRE 2000, and to teach graduate classes on reverse engineering. Some term reports are available from Tarja Systä's class at the University of Tampere, Finland.
C++ Fact Extractor Benchmark
The second benchmark is CppETS (C++ Extractor Test Suite, pronounced see-pets). The benchmark consists of a collection of C++ programs that pose various problems commonly found in parsing and reverse engineering. Version 1.0 was used at CASCON2001 and version 1.1 was used at IWPC2002. The CppETS materials can be downloaded from the workshop web sites for you to use.
GXL (Graph eXchange Language) has been ratified as the standard exchange format in the reverse engineering and graph transformation communities. It was developed by Andreas Winter (University of Koblenz), Andy Schürr (Darmstadt University of Technology), Ric Holt, and me. We know of over forty groups in eight countries using GXL. Current work involves defining standard schemas for use with this carrier notation and promoting the use of the format. I am currently involved in defining a low-level schema for C++. We are in discussions with the graph drawing community and the IEEE to have the format adopted more widely.
This work is the culmination of many years of discussions at CSER (Consortium for Software Engineering Research, Canada), Waikiki Beach Club (part of the infrastructure initiative of the TCSE Committee on Reengineering), WoSEF (Workshop on Standard Exchange Format held at ICSE 2000), and a Dagstuhl workshop on "Interoperability of Reengineering Tools". A tutorial and workshop were held at CASCON2000.
My Master's work was the design and implementation of a prototype tool called grug, grep using GCL. (Don't you love these acronyms that expand to other acronyms? GCL stands for Generalized Concordance Lists or Gord and Charlie's Language; history is fuzzy on this point.) SWAG's (Software Architecture Group) Software Bookshelf focuses on presenting architectural views of a software system, and consequently the information about a system stored in the Books extends only to the file level. While this view is appropriate for architectural comprehension, lower-level details are necessary for coding and debugging. My tool was based on grep, but used the semantic information about variables, procedures, etc. that is already available in Software Bookshelf. This tool was presented in a paper at ICSM99. This work relied on a survey of the habits of programmers as they search source code. The survey helped identify the searches that the tool must support in order to be considered useful and usable. The results were presented at IWPC98. This work was supervised by Ric Holt and Charlie Clarke.
I undertook four case studies of software immigrants to a development team at IBM. A software immigrant is a newcomer to a project who needs to learn both technical and cultural aspects of the job. Examples of technical information that a software immigrant needs to learn are: new programming languages and tools; background information about the problem domain; and, last but not least, the software system. Examples of cultural information that a software immigrant needs to learn are: project jargon; coding standards; team dynamics; organizational structure; and business processes. The process of acquiring this information and adapting to the working environment is called naturalization. A paper on this work was presented at ICSE98. I am continuing this work by investigating the experiences of software immigrants at other companies.