<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' xmlns:gd='http://schemas.google.com/g/2005' xmlns:thr='http://purl.org/syndication/thread/1.0'><id>tag:blogger.com,1999:blog-7116486</id><updated>2011-04-21T16:52:56.152-07:00</updated><title type='text'>bobyjos</title><subtitle type='html'>Brief Profile:

Software Testing Professional</subtitle><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://bobyjos.blogspot.com/feeds/posts/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7116486/posts/default?max-results=100'/><link rel='alternate' type='text/html' href='http://bobyjos.blogspot.com/'/><link rel='hub' href='http://pubsubhubbub.appspot.com/'/><author><name>bobyjos</name><uri>http://www.blogger.com/profile/04704110968990110850</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><generator version='7.00' uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>40</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>100</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-7116486.post-115883529006982086</id><published>2006-09-21T03:39:00.000-07:00</published><updated>2006-09-21T03:41:30.426-07:00</updated><title type='text'>Accessibility Testing</title><content type='html'>Accessibility Testing Software Compared&lt;br /&gt;Steve Faulkner, Web Accessibility Consultant, Vision Australia Foundation [HREF1], 454 Glenferrie Rd, Kooyong 3144. steven.faulkner@visionaustralia.org.au&lt;br /&gt;&lt;br /&gt;Andrew Arch, Manager Online Accessibility Consulting, Vision Australia Foundation [HREF1], 454 Glenferrie Rd, Kooyong 3144. 
andrew.arch@visionaustralia.org.au&lt;br /&gt;&lt;br /&gt;Contents: introduction | conformance | investigations | conclusion | references | detailed results&lt;br /&gt;&lt;br /&gt;Abstract&lt;br /&gt;Web accessibility for people with disabilities and other disadvantaged groups is becoming increasingly important for government and educational institutions as they try to meet their obligations under the Disability Discrimination Act (DDA) and various policies and guidelines for online publishing. Businesses are also obliged under the DDA not to discriminate against people with disabilities through their online activities.&lt;br /&gt;&lt;br /&gt;A plethora of web accessibility testing tools has been released onto the market over the past eighteen months, at prices ranging from a few hundred to many thousands of dollars. This study examines the ability of four of these tools to accurately assess the accessibility issues on a web site against the Priority 1 checkpoints of the W3C Web Content Accessibility Guidelines 1.0. We conclude that each has strengths and weaknesses and that none of them can identify all the accessibility issues. At this stage these tools can only aid accessibility testing; they cannot provide a definitive assessment.&lt;br /&gt;&lt;br /&gt;Introduction&lt;br /&gt;With the increasing requirement for government, education and business to provide accessible online services to all citizens, including people with disabilities, many web managers are turning to "robotic" tools that spider through their sites and report the problems and accessibility 'hot spots'. 
This has led to a veritable industry of software developers trying to solve this problem for those requiring "accessible" web sites, spurred along by the recently enacted Section 508 [Section 508] law in the United States, which requires Federal Government agencies to build accessible web sites and generally to "buy accessible".&lt;br /&gt;&lt;br /&gt;However, as we shall demonstrate in this paper, not all accessibility assessment software tools are created equal. Some overstate the problems that exist on a web site while others understate them. Nor do they do this consistently across the sixty-five checkpoints of the Web Content Accessibility Guidelines 1.0 (WCAG) [Chisholm, et al].&lt;br /&gt;&lt;br /&gt;Web accessibility testing is a relatively new software area: most tools have been available for less than two years. Few comparisons of the efficacy of the tools have been undertaken as the market has matured. Graves [2001] in Government Computer News conducted one of the earliest, comparing InFocus 508 and PageScreamer 2.3 and noting the problems both products had with tables. Harrison [2002] and Harrison and O'Grady [2002] presented a more rigorous comparison of six analysis and repair tools, noting the difficulty of isolating errors identified by some of the tools and the emphasis on US accessibility standards rather than the international WCAG.&lt;br /&gt;&lt;br /&gt;Our aim has been to investigate the efficacy and accuracy of some of the available software applications used for web accessibility testing, in order to help assess the value of such tools in the broader process of ensuring the accessibility of web sites [Brewer &amp; Letourneau].&lt;br /&gt;&lt;br /&gt;How is conformance ascertained?&lt;br /&gt;What is accessibility?&lt;br /&gt;An accessible web is available to people with disabilities, including those with: &lt;br /&gt;&lt;br /&gt;Vision impairment (e.g. 
low vision or colour blindness) or vision loss affecting their ability to discern or see the screen &lt;br /&gt;Physical impairment affecting their ability to use a mouse or keyboard &lt;br /&gt;Hearing impairment or loss affecting their ability to discern or hear online audio &lt;br /&gt;Cognitive impairments (e.g. dyslexia, ADD, learning difficulties, memory impairment) affecting their ability to comprehend or understand your site &lt;br /&gt;Literacy impairments (e.g. low reading skills, or English not being their first language) possibly affecting their ability to fully understand your site and its messages &lt;br /&gt;The beneficiaries of an accessible web, however, are a much wider group than people with disabilities, and also include: &lt;br /&gt;&lt;br /&gt;People with poor communications infrastructure, especially rural Australians &lt;br /&gt;Older people and new users, often computer illiterate &lt;br /&gt;People with old equipment (not capable of running the latest software) &lt;br /&gt;People with "non-standard" equipment (e.g. WAP phones and PDAs) &lt;br /&gt;People with restricted access environments (e.g. locked-down corporate desktops) &lt;br /&gt;People with temporary impairments or who are coping with environmental distractions &lt;br /&gt;How do we check for an accessible web site?&lt;br /&gt;The Web Accessibility Initiative (WAI) outlines approaches for preliminary and conformance reviews of web sites [Brewer &amp; Letourneau]. Both approaches recommend the use of 'accessibility evaluation tools' to identify some of the issues that occur on a web site. The WAI web site includes a large list of software tools to assist with conformance evaluation [Chisholm &amp; Kasday]. These tools range from automated spidering tools such as the infamous Bobby [Watchfire, 2003], to tools that assist manual evaluation such as The WAVE [WebAIM], to tools that assist assessment of specific issues such as colour blindness. 
Some of the automated accessibility assessment software tools also have options for HTML repair.&lt;br /&gt;&lt;br /&gt;What role can automated tools play in assessing the accessibility of a web site?&lt;br /&gt;WCAG 1.0 comprises 65 checkpoints. Some of these are qualified with "Until user agents ...", and with the advances in browsers and assistive technology since 1999 some of these are no longer applicable, leaving 61 checkpoints. Of these, only 13 can clearly be tested definitively. Another 27 can be tested for the presence of a solution or a potential problem, but not for whether the issue has been resolved satisfactorily. With intelligent algorithms, many of the tools can narrow down the instances of potential issues that need manual checking, e.g. the use of "spacer" as the alt text for a spacer.gif used to position elements on the page.&lt;br /&gt;&lt;br /&gt;These automated tools are very good at identifying pages and lines of code that need to be manually checked for accessibility. Unfortunately, many people misuse these tools and place a "passed" (e.g.
XYZ Approved) graphic on their site when the tool cannot identify any specific accessibility issues, even though the site has not been competently assessed by hand for the issues that software cannot check.&lt;br /&gt;&lt;br /&gt;So, automated software tools can:&lt;br /&gt;&lt;br /&gt;check the syntax of the site's code &lt;br /&gt;identify some actual accessibility problems &lt;br /&gt;identify some potential problems &lt;br /&gt;identify pages containing elements that may cause problems &lt;br /&gt;search for known patterns that humans have listed &lt;br /&gt;However, automated software tools cannot:&lt;br /&gt;&lt;br /&gt;check for appropriate meaning &lt;br /&gt;check for appropriate rendering (auditory, variety of visual) &lt;br /&gt;Interpreting the results from the automated tools requires assessors trained in accessibility techniques who understand the technical and usability issues facing people with disabilities. A thorough understanding of accessibility is also required to competently assess the checkpoints that the automated tools cannot check, such as consistent navigation and appropriate writing and presentation style.&lt;br /&gt;&lt;br /&gt;Investigation of efficacy of accessibility software&lt;br /&gt;Choice of software&lt;br /&gt;The choice of tools to review was based on a number of factors:&lt;br /&gt;&lt;br /&gt;The software needed to be able to check WCAG 1.0 checkpoints (some tools check only for US Section 508 problems). This decision was based on the applicability of WCAG 1.0 to Australian disability regulations. &lt;br /&gt;The least expensive desktop software product from each software producer was chosen. This decision was based on the hypothesis that the more sophisticated versions of the testing software are built on the same 'testing engines' and algorithms as the entry level products. 
&lt;br /&gt;Potential users were considered more likely to purchase the less expensive products, as web accessibility testing would not be a major priority for many organisations. &lt;br /&gt;The availability of trial versions of the software products to be reviewed, and the resources available to conduct the review. &lt;br /&gt;We intend to expand the list of software products reviewed as software and resources become available.&lt;br /&gt;&lt;br /&gt;Table 1: Software reviewed&lt;br /&gt;Software | Vendor | URL | Cost &lt;br /&gt;AccVerify 4.9 | HiSoftware | http://www.hisoftware.com | US $495 &lt;br /&gt;Bobby 4.0.1¹ | Watchfire | http://www.watchfire.com | US $99 &lt;br /&gt;InFocus 4.2 | SSB Technologies | http://www.ssbtechnologies.com/ | US $1,795 &lt;br /&gt;PageScreamer 4.1 | Crunchy Technologies | http://www.crunchy.com/ | US $1,495 &lt;br /&gt;&lt;br /&gt;1. A new version of Bobby has been released since the testing was conducted.&lt;br /&gt;&lt;br /&gt;There is clearly a considerable difference in cost across vendors for their entry level products. This difference is partly reflected in the functionality of the software, as some tools automatically fix certain problems. However, the core functionality of testing and reporting on WCAG 1.0 issues is present in all the software reviewed.&lt;br /&gt;&lt;br /&gt;Investigation methodology&lt;br /&gt;Site used for testing&lt;br /&gt;The site used for the review, "The University of Antarctica", is a demonstration site developed by WebAIM, a non-profit organisation whose stated goal "is to improve accessibility to online learning opportunities for all people".&lt;br /&gt;&lt;br /&gt;The site contains examples of many potential barriers to accessibility. It is hosted by WebAIM at http://www.webaim.org/tutorials/uofa/. 
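&lt;br /&gt;&lt;br /&gt;The Python sketch below makes concrete the kind of rule-based check described under "What role can automated tools play in assessing the accessibility of a web site?". It is not the logic of any tool reviewed here. The names RuleChecker and check_page are hypothetical, and so is the file-name heuristic. The sketch treats an img without an alt attribute, or a frame without a title, as a definite failure (checkpoints 1.1 and 12.1). Every other alt text goes to manual review, and alt text that merely repeats the image file name, such as "spacer" on spacer.gif, is singled out.

```python
from html.parser import HTMLParser


class RuleChecker(HTMLParser):
    """Hypothetical rule-based checker for two WCAG 1.0 Priority 1 checkpoints.

    1.1  an img with no alt attribute is a definite failure; an img whose
         alt text just repeats its file name (e.g. "spacer" on spacer.gif)
         is flagged for manual review (an assumed heuristic).
    12.1 a frame or iframe with no title attribute is a definite failure.
    Any other alt text still needs a human to judge its meaning.
    """

    def __init__(self):
        super().__init__()
        self.failures = []  # (checkpoint, description): definite failures
        self.manual = []    # (checkpoint, description): need human judgment

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "img":
            src = attributes.get("src") or ""
            if "alt" not in attributes:
                self.failures.append(("1.1", "img missing alt: " + src))
                return
            alt = (attributes["alt"] or "").strip().lower()
            # File name without directory or extension, e.g. "spacer".
            stem = src.rsplit("/", 1)[-1].rsplit(".", 1)[0].lower()
            if stem and alt == stem:
                self.manual.append(("1.1", "alt repeats file name: " + src))
            else:
                self.manual.append(("1.1", "check alt text meaning: " + src))
        elif tag in ("frame", "iframe"):
            if not (attributes.get("title") or "").strip():
                self.failures.append(("12.1", tag + " missing title"))


def check_page(html_text):
    """Return (definite failures, instances needing manual checking)."""
    checker = RuleChecker()
    checker.feed(html_text)
    checker.close()
    return checker.failures, checker.manual
```

Only the first list returned by check_page can be reported as a definitive result. The second list corresponds to the "visual", "manual" or "user" checks that the tools in this study hand back to the assessor.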
&lt;br /&gt;&lt;br /&gt;The site consists of: &lt;br /&gt;&lt;br /&gt;28 HTML documents &lt;br /&gt;1 CSS file &lt;br /&gt;16 GIF files &lt;br /&gt;24 JPG files &lt;br /&gt;1 SWF file (Flash) &lt;br /&gt;1 Java file &lt;br /&gt;13 AU files (audio) &lt;br /&gt;1 MOV file (multimedia) &lt;br /&gt;3 MPEG files (multimedia) &lt;br /&gt;The site was chosen because it was built to demonstrate accessibility problems. The site is quite small, so instances of problems are easily quantified. Furthermore, the site was constructed from plain HTML files, with no pages generated "on the fly" from a database, which made manual checking and quantification of the site a manageable task.&lt;br /&gt;&lt;br /&gt;It was also reasoned that the site's content and structure are relatively stable, so further testing of the site at a later time will still produce accurate comparison results. In addition, Vision Australia Foundation will store a copy of the site at http://it-test.com.au/UOFA to ensure the site's continuing integrity for testing purposes.&lt;br /&gt;&lt;br /&gt;Process followed&lt;br /&gt;Each of the products in the review was set to produce reports detailing issues against the WCAG 1.0 Priority 1 checkpoints. Where reporting options were present, they were set to produce the standard reports. All reports were produced in HTML format.&lt;br /&gt;&lt;br /&gt;The AccVerify report comprised 60 HTML files (AccVerify report [zip file 1,128kb]): &lt;br /&gt;&lt;br /&gt;3 summary pages, comprising a listing of Priority 1 checkpoint errors and their instances, and of Priority 1 visual checkpoints (needing human judgment) and their instances. A graphical representation of this information is also presented, along with a page listing the files that failed any of the checkpoints, each with a link to the associated detailed report and checklist &lt;br /&gt;5 statistical pages listing some of the structural elements of the files tested, e.g. 
tables, forms, images, with links to the pages containing the elements &lt;br /&gt;26 detailed report pages (1 for each file tested): &lt;br /&gt;individual page summary and graphs of checkpoint errors and visual checkpoints &lt;br /&gt;details of instances of specific issues, with links to the associated checkpoints on the W3C web site &lt;br /&gt;divided into Priority 1, 2 and 3 issues &lt;br /&gt;26 checkpoint pages (1 for each file tested): &lt;br /&gt;listing of all WCAG 1.0 checkpoints, indicating for each checkpoint whether the page passed, failed, was not applicable, or needed visual checking &lt;br /&gt;short explanation of each checkpoint, paraphrased from the WCAG 1.0 guidelines &lt;br /&gt;divided into Priority 1, 2 and 3 issues &lt;br /&gt;The Bobby report comprised 72 HTML files (Bobby report [zip file 183kb]):&lt;br /&gt;&lt;br /&gt;1 summary page containing a short description of all (possible) issues found and links to the site files where an occurrence of each issue was found &lt;br /&gt;1 index page containing a list of all the files tested, with links to their corresponding detailed reports &lt;br /&gt;35 detailed report pages (1 for each file tested): &lt;br /&gt;detailing instances of specific issues, with links to (locally stored) explanations of the issues and to the associated checkpoints on the W3C web site &lt;br /&gt;divided into Priority 1, 2 and 3 issues &lt;br /&gt;further divided into issues either needing, or not needing, user checking to confirm their existence &lt;br /&gt;35 'text only' versions of the files tested (1 for each file tested) &lt;br /&gt;The InFocus report comprised 28 HTML files (InFocus report [zip file 74kb]):&lt;br /&gt;&lt;br /&gt;3 summary/index pages: &lt;br /&gt;a summary compliance report page, listing instances of each violation and explaining it, with links to the reports of the "top 5 pages containing this violation". 
&lt;br /&gt;a page with links to a detailed report for each page tested, also listing the number of checkpoint 'violations' found on each page &lt;br /&gt;a page with links to a detailed report for each page tested, ordered by the total number of 'violations' found on each page (descending) &lt;br /&gt;25 detailed 'compliance report' pages (1 for each file tested): &lt;br /&gt;listing each violation found, the associated WCAG 1.0 checkpoint and the offending elements within the HTML code of the page &lt;br /&gt;The PageScreamer report comprised 66 HTML files (PageScreamer report [zip file 256kb]):&lt;br /&gt;&lt;br /&gt;1 page containing a copy of the WCAG 1.0 guidelines from the W3C web site &lt;br /&gt;1 'detail' page listing every HTML element that triggered a checkpoint violation, with the element's URL and line number &lt;br /&gt;1 verification summary page for the site, containing: &lt;br /&gt;a graph showing the number of 'tags' in violation of each checkpoint &lt;br /&gt;a table listing all the checkpoints, with links to the related sections of the locally stored copy of the WCAG guidelines &lt;br /&gt;numbers of compliance violation instances, and whether the site passed or failed each checkpoint or the checkpoint needed further verification &lt;br /&gt;1 file containing a text description of the graph &lt;br /&gt;27 verification summary pages (one for each URL found) containing: &lt;br /&gt;a table listing all the checkpoints, with links to the related sections of the locally stored copy of the WCAG guidelines 
&lt;br /&gt;numbers of compliance violation instances, and whether the site passed or failed each checkpoint or the checkpoint needed further verification &lt;br /&gt;35 pages (1 per checkpoint) listing the URLs, with links to the associated report files, that contain instances of a violation of that checkpoint &lt;br /&gt;Overview of results&lt;br /&gt;A significant measure of a software tool's ability to report on the accessibility problems of a site is its ability to find all the URLs on the target site. Some of the apparent discrepancies between the tools' URL counts could be attributed to a tool listing only the URLs of files that contain errors. In this case, however, all 28 HTML files contained at least one instance of non-conformance with a WCAG 1.0 checkpoint.&lt;br /&gt;&lt;br /&gt;Table 2: HTML files found&lt;br /&gt;AccVerify | Bobby | InFocus | PageScreamer | Manual Check &lt;br /&gt;26 | 20 | 25 | 27 | 28 &lt;br /&gt;&lt;br /&gt;AccVerify &lt;br /&gt;Failed to find any instances of the MPEG (3) (multimedia) or AU (13) (audio) files &lt;br /&gt;Failed to find files (2) that were the targets of forms &lt;br /&gt;Bobby &lt;br /&gt;Identified instances of audio and multimedia files, although it failed to report some instances &lt;br /&gt;Failed to find a file linked via a meta-based redirect &lt;br /&gt;Tested and reported twice on the 'Home' page [index.html] &lt;br /&gt;Failed to find a file linked via an image map &lt;br /&gt;Failed to find files (2) that were the targets of forms &lt;br /&gt;InFocus &lt;br /&gt;Failed to find files (3) that were the targets of forms &lt;br /&gt;PageScreamer &lt;br /&gt;Tested and reported twice on the 'Home' page [index.html] &lt;br /&gt;Failed to find files (2) that were the targets of forms &lt;br /&gt;Found the MPEG (3) (multimedia) and AU (13) (audio) files, but passed the site on checkpoints relating to multimedia without verification. 
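&lt;br /&gt;&lt;br /&gt;Several of the files the tools missed were reachable only through a meta refresh redirect, an image map or a form action. The hypothetical Python sketch below shows one way a spider's link extractor could look beyond ordinary anchor links to find such targets. It is not the crawling logic of any tool reviewed, and LinkCollector and collect_links are illustrative names.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Hypothetical spider helper: collect every URL a page points to.

    Besides ordinary a href links it records image map areas, form
    actions and meta refresh redirects, plus frames and embedded media.
    """

    # Tag name mapped to the attribute that holds its target URL.
    URL_ATTRIBUTES = {
        "a": "href",
        "area": "href",
        "frame": "src",
        "iframe": "src",
        "form": "action",
        "embed": "src",
        "object": "data",
    }

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.found = []  # absolute URLs in discovery order, no duplicates

    def add(self, url):
        url = url.strip()
        if url:
            absolute = urljoin(self.base_url, url)
            if absolute not in self.found:
                self.found.append(absolute)

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        name = self.URL_ATTRIBUTES.get(tag)
        if name and attributes.get(name):
            self.add(attributes[name])
        # e.g. meta http-equiv="refresh" content="0; url=next.html"
        if tag == "meta" and (attributes.get("http-equiv") or "").lower() == "refresh":
            content = attributes.get("content") or ""
            for part in content.split(";"):
                key, _, value = part.partition("=")
                if key.strip().lower() == "url":
                    self.add(value.strip("'\" "))


def collect_links(html_text, base_url):
    """Return the absolute URLs referenced by one HTML page."""
    collector = LinkCollector(base_url)
    collector.feed(html_text)
    collector.close()
    return collector.found
```

A spider would call collect_links on each fetched page and queue any new URLs on the same site until none remain.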
&lt;br /&gt;Accuracy and definitive nature of results&lt;br /&gt;Another significant measure of a product's efficacy is its ability to produce results that are both accurate and definitive, without the need for further human interpretation.&lt;br /&gt;&lt;br /&gt;Table 3: Status of the 15 Priority 1 checkpoints tested&lt;br /&gt;Status | AccVerify | Bobby | InFocus | PageScreamer | Manual Check &lt;br /&gt;Failed | 2 | 2 | 5 | 3 | 11 &lt;br /&gt;Passed | 2 | n/a³ | - | 4⁴ | 4 &lt;br /&gt;Human intervention (%) | 11 (73%) | 11 (73%) | 8 (53%) | 6 (40%) | n/a &lt;br /&gt;Not reported | 0 | 2¹ | 2¹ | 2² | n/a &lt;br /&gt;&lt;br /&gt;1. Bobby and InFocus did not report on checkpoints that the software could check and found not applicable; e.g. they found no server-side image maps and therefore did not report on the image map checkpoints (1.2, 9.1). &lt;br /&gt;2. PageScreamer failed to provide any information in the report about 2 Priority 1 checkpoints (4.1, 14.1). &lt;br /&gt;3. Bobby only reported on those checkpoints that it found the site to be in breach of. &lt;br /&gt;4. PageScreamer incorrectly reported the site as having passed 2 checkpoints (1.3, 1.4) that the manual check revealed to be failures. 
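&lt;br /&gt;&lt;br /&gt;The counts in Tables 3 and 4 can be recomputed from the per-checkpoint results in the appendix. The Python sketch below is illustrative only. It transcribes the appendix into a dictionary (RESULTS, COLUMNS and tally are our own hypothetical names). It assumes that 'visual', 'user check', 'manual check' and 'verify' all count as requiring human intervention.

```python
# Per-checkpoint results transcribed from the appendix table.
# Column order: Manual Check, AccVerify, Bobby, InFocus, PageScreamer.
RESULTS = {
    "1.1": ("fail", "fail", "fail", "fail", "fail"),
    "2.1": ("fail", "visual", "user check", "manual check", "verify"),
    "4.1": ("fail", "visual", "user check", "manual check", "not reported"),
    "6.1": ("fail", "visual", "user check", "manual check", "verify"),
    "6.2": ("fail", "visual", "user check", "manual check", "verify"),
    "7.1": ("pass", "visual", "user check", "manual check", "verify"),
    "14.1": ("pass", "visual", "user check", "manual check", "not reported"),
    "1.2": ("pass", "pass", "not reported", "not reported", "pass"),
    "9.1": ("pass", "pass", "not reported", "not reported", "pass"),
    "5.1": ("fail", "visual", "user check", "fail", "verify"),
    "5.2": ("fail", "visual", "user check", "fail", "verify"),
    "12.1": ("fail", "fail", "fail", "fail", "fail"),
    "6.3": ("fail", "visual", "user check", "manual check", "fail"),
    "1.3": ("fail", "visual", "user check", "fail", "pass"),
    "1.4": ("fail", "visual", "user check", "manual check", "pass"),
}
COLUMNS = ("Manual Check", "AccVerify", "Bobby", "InFocus", "PageScreamer")
# Assumption: each of these labels means "a human must check this".
HUMAN = {"visual", "user check", "manual check", "verify"}


def tally(column):
    """Count failed / passed / human / not reported statuses for one column."""
    counts = {"failed": 0, "passed": 0, "human": 0, "not reported": 0}
    for statuses in RESULTS.values():
        status = statuses[column]
        if status == "fail":
            counts["failed"] += 1
        elif status == "pass":
            counts["passed"] += 1
        elif status in HUMAN:
            counts["human"] += 1
        else:
            counts["not reported"] += 1
    return counts


for index, name in enumerate(COLUMNS):
    c = tally(index)
    share = round(100 * c["human"] / len(RESULTS))
    print(f"{name}: failed {c['failed']}, passed {c['passed']}, "
          f"human {c['human']} ({share}%), not reported {c['not reported']}, "
          f"failed + potential {c['failed'] + c['human']}")
```

Running it reproduces the failed, human-intervention, not-reported and total figures in Tables 3 and 4, for example 11 checkpoints (73%) needing human intervention for AccVerify and Bobby. The one difference is that Bobby's passes come out as 0, where Table 3 shows n/a, because Bobby reported only breaches.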
&lt;br /&gt;Graphical representation of data from Table 3&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Table 4: Totals of reported potential/actual failures of the 15 Priority 1 checkpoints tested&lt;br /&gt;Status | AccVerify¹ | Bobby² | InFocus³ | PageScreamer⁴ | Manual Check &lt;br /&gt;Failed | 2 | 2 | 5 | 3 | 11 &lt;br /&gt;Potential failures (human intervention) | 11 | 11 | 8 | 6 | n/a &lt;br /&gt;Totals | 13 | 13 | 13 | 9 | 11 &lt;br /&gt;&lt;br /&gt;1. AccVerify overstated total potential failures in relation to actual failures &lt;br /&gt;2. Bobby overstated total potential failures in relation to actual failures &lt;br /&gt;3. InFocus overstated total potential failures in relation to actual failures &lt;br /&gt;4. PageScreamer defined the fewest checkpoints as 'potential failures', but understated total potential failures in relation to actual failures &lt;br /&gt;Graphical representation of data from Table 4&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Over-reporting&lt;br /&gt;Over-reporting of instances of potential checkpoint failures was a common feature of all the products reviewed.&lt;br /&gt;&lt;br /&gt;Table 5: Examples of reported instances of potential/actual checkpoint failures&lt;br /&gt;Checkpoint | AccVerify | Bobby | InFocus | PageScreamer | Manual Check &lt;br /&gt;2.1 - Ensure that all information conveyed with color is also available without color | 26 | 47 | 25 | 220 | 1 &lt;br /&gt;5.1 - For data tables, identify row and column headers. | 11 | 12 | 1 | 164 | 1 &lt;br /&gt;5.2 - For data tables that have two or more logical levels of row or column headers, use markup to associate data cells and header cells. | 11 | 13 | 4 | 464 | 1 &lt;br /&gt;&lt;br /&gt;Discussion&lt;br /&gt;While Bobby was the most successful tool in identifying multimedia and audio files, it had the most problems identifying HTML files linked via META and image map elements, as well as files that were the targets of FORM elements. 
The inability to identify some of the targets of forms was a defect common to all the tools.&lt;br /&gt;&lt;br /&gt;The products' ability to give a quantitative answer as to whether the site passed or failed the 15 WCAG 1.0 Priority 1 checkpoints varied from 60% for PageScreamer to 28% for AccVerify, though it should be noted that PageScreamer's quantitative results included 2 false passes. The results highlight that, for the majority of the Priority 1 checkpoints tested, the software tools either did not produce a quantitative report or could not produce an accurate report on the status of the site. For approximately 10 of the 15 Priority 1 checkpoints tested, the reports tell us that the checkpoints need a 'visual', 'manual' or 'user' check. Along with the instructions to do a 'manual' check, the reports detail potential instances of checkpoint violations (Table 5), which should help in tracking down issues. However, comparing the number of potential instances reported against the actual occurrences shows that none of the tools does a very good job of identifying potential errors. All of the products over-reported potential checkpoint errors/violations, and some tools' reports flagged a number of checkpoints as potential errors on every page. &lt;br /&gt;&lt;br /&gt;All of the products produce a report that at first sight may appear to be a detailed analysis of the accessibility problems found on the site tested. Closer examination reveals that the software tools fail at the first hurdle of correctly identifying the files to be tested. Furthermore, many of the accessibility issues that may occur on the site are beyond the reach of the 'mechanical', rules-based analysis that these products undertake. 
In an attempt to ensure that the software does not miss issues, the reports tend to overstate the occurrence of potential problems, to the point where a potential instance of a checkpoint violation is flagged for every file checked, thus undermining even the heuristic value of the report.&lt;br /&gt;&lt;br /&gt;Conclusion&lt;br /&gt;The research into automated accessibility tools conducted at Vision Australia Foundation and reported here indicates that the rule of "caveat emptor" applies as much to accessibility testing tools as it does to buying a used car.&lt;br /&gt;&lt;br /&gt;All of the tools tested have advantages over free online testing tools [Steven Faulkner], the main advantage being their ability to check a whole site rather than a limited number of pages. In addition, files do not have to be published to the web before they can be tested; the user has greater control over which rules are applied when testing and over the style and formatting of reports; and in some cases the software will automatically correct the problems found.&lt;br /&gt;&lt;br /&gt;As expected, none of the tools evaluated was able to identify all the HTML files and associated multimedia files on the site that needed to be tested for accessibility. &lt;br /&gt;&lt;br /&gt;All the software tools evaluated will assist the accessibility quality assurance process; however, none of them will replace evaluation by informed humans. 
The user needs to be aware of the tools' limitations and needs a strong understanding of accessibility issues and their implications for people with disabilities in order to interpret the reports and the accessibility issues or potential issues flagged by the software tools.&lt;br /&gt;&lt;br /&gt;We hope the analysis reported here will aid web site quality assurance managers in their choice of web accessibility testing software, and in their understanding of the limitations that automated tools have in the broader process of ensuring accessible web sites.&lt;br /&gt;&lt;br /&gt;References&lt;br /&gt;Chisholm, W. et al. (Eds) 1999, Web Content Accessibility Guidelines 1.0, World Wide Web Consortium. [HREF2]&lt;br /&gt;&lt;br /&gt;Brewer, J. &amp; Letourneau, C. (Eds) 2002, Evaluating Web Sites for Accessibility, World Wide Web Consortium. [HREF3]&lt;br /&gt;&lt;br /&gt;Chisholm, W. &amp; Kasday, L. (Eds) 2002, Evaluation, Repair, and Transformation Tools for Web Content Accessibility, World Wide Web Consortium. [HREF4]&lt;br /&gt;&lt;br /&gt;Watchfire 2003, Welcome to Bobby. [HREF5]&lt;br /&gt;&lt;br /&gt;WebAIM undated, WAVE 3.0 Accessibility Tool. [HREF6]&lt;br /&gt;&lt;br /&gt;Section 508: http://www.usdoj.gov/crt/508/508law.html [HREF7] &lt;br /&gt;&lt;br /&gt;Graves, S. 2001, Check sites for 508 with audit-edit tools, Government Computer News. [HREF8]&lt;br /&gt;&lt;br /&gt;Harrison, L. 2002, Web Accessibility Validation and Repair - Which Tool and Why? (introduction), Center On Disabilities Technology And Persons With Disabilities Conference 2002 (CSUN-2002). [HREF9]&lt;br /&gt;&lt;br /&gt;Harrison, L. &amp; O'Grady, L. 2002, Web Accessibility Validation and Repair: Which Tool and Why? (analysis), ATRC, University of Toronto. [HREF10]&lt;br /&gt;&lt;br /&gt;Faulkner, S. 2003, Free Web Development and Accessibility Tools, Vision Australia Foundation. 
[HREF11]&lt;br /&gt;&lt;br /&gt;Hypertext References&lt;br /&gt;HREF1: http://www.visionaustralia.org.au/webaccessibility/ &lt;br /&gt;HREF2: http://www.w3.org/TR/WCAG10/ &lt;br /&gt;HREF3: http://www.w3.org/WAI/eval/ &lt;br /&gt;HREF4: http://www.w3.org/WAI/ER/existingtools.html &lt;br /&gt;HREF5: http://bobby.watchfire.com/bobby/html/en/index.jsp &lt;br /&gt;HREF6: http://www.wave.webaim.org:8081/wave/index.jsp &lt;br /&gt;HREF7: http://www.usdoj.gov/crt/508/508law.html &lt;br /&gt;HREF8: http://www.gcn.com/20_23/reviews/16783-1.html &lt;br /&gt;HREF9: http://www.csun.edu/cod/conf/2002/proceedings/279.htm &lt;br /&gt;HREF10: http://snow.utoronto.ca/access/evaltoolreview/validation.html &lt;br /&gt;HREF11: http://www.visionaustralia.org.au/webaccessibility/workshops/references.html#acheck &lt;br /&gt;&lt;br /&gt;Appendix - Table of results&lt;br /&gt;Priority 1 checkpoints and the indication given by each software report for each checkpoint&lt;br /&gt;&lt;br /&gt;In General (Priority 1) | Manual Check | AccVerify | Bobby | InFocus | PageScreamer &lt;br /&gt;1.1 Provide a text equivalent for every non-text element (e.g., via "alt", "longdesc", or in element content). This includes: images, graphical representations of text (including symbols), image map regions, animations (e.g., animated GIF's), applets and programmatic objects, ascii art, frames, scripts, images used as list bullets, spacers, graphical buttons, sounds (played with or without user interaction), stand-alone audio files, audio tracks of video, and video. | fail | fail | fail | fail | fail &lt;br /&gt;2.1 Ensure that all information conveyed with color is also available without color, for example from context or markup. | fail | visual | user check | manual check | verify &lt;br /&gt;4.1 Clearly identify changes in the natural language of a document's text and any text equivalents (e.g., captions). | fail | visual | user check | manual check | not reported &lt;br /&gt;6.1 Organize documents so they may be read without style sheets. For example, when an HTML document is rendered without associated style sheets, it must still be possible to read the document. | fail | visual | user check | manual check | verify &lt;br /&gt;6.2 Ensure that equivalents for dynamic content are updated when the dynamic content changes. | fail | visual | user check | manual check | verify &lt;br /&gt;7.1 Until user agents allow users to control flickering, avoid causing the screen to flicker. | pass | visual | user check | manual check | verify &lt;br /&gt;14.1 Use the clearest and simplest language appropriate for a site's content. | pass | visual | user check | manual check | not reported &lt;br /&gt;&lt;br /&gt;And if you use images and image maps (Priority 1) | Manual Check | AccVerify | Bobby | InFocus | PageScreamer &lt;br /&gt;1.2 Provide redundant text links for each active region of a server-side image map. | pass | pass | not reported | not reported | pass &lt;br /&gt;9.1 Provide client-side image maps instead of server-side image maps except where the regions cannot be defined with an available geometric shape. | pass | pass | not reported | not reported | pass &lt;br /&gt;&lt;br /&gt;And if you use tables (Priority 1) | Manual Check | AccVerify | Bobby | InFocus | PageScreamer &lt;br /&gt;5.1 For data tables, identify row and column headers. | fail | visual | user check | fail | verify &lt;br /&gt;5.2 For data tables that have two or more logical levels of row or column headers, use markup to associate data cells and header cells. | fail | visual | user check | fail | verify &lt;br /&gt;&lt;br /&gt;And if you use frames (Priority 1) | Manual Check | AccVerify | Bobby | InFocus | PageScreamer &lt;br /&gt;12.1 Title each frame to facilitate frame identification and navigation. | fail | fail | fail | fail | fail &lt;br /&gt;&lt;br /&gt;And if you use applets and scripts (Priority 1) | Manual Check | AccVerify | Bobby | InFocus | PageScreamer &lt;br /&gt;6.3 Ensure that pages are usable when scripts, applets, or other programmatic objects are turned off or not supported. If this is not possible, provide equivalent information on an alternative accessible page. | fail | visual | user check | manual check | fail &lt;br /&gt;&lt;br /&gt;And if you use multimedia (Priority 1) | Manual Check | AccVerify | Bobby | InFocus | PageScreamer &lt;br /&gt;1.3 Until user agents can automatically read aloud the text equivalent of a visual track, provide an auditory description of the important information of the visual track of a multimedia presentation. | fail | visual | user check | fail | pass &lt;br /&gt;1.4 For any time-based multimedia presentation (e.g., a movie or animation), synchronize equivalent alternatives (e.g., captions or auditory descriptions of the visual track) with the presentation. | fail | visual | user check | manual check | pass &lt;br /&gt;&lt;br /&gt;And if all else fails (Priority 1) | Manual Check | AccVerify | Bobby | InFocus | PageScreamer &lt;br /&gt;11.4 If, after best efforts, you cannot create an accessible page, provide a link to an alternative page that uses W3C technologies, is accessible, has equivalent information (or functionality), and is updated as often as the inaccessible (original) page. | not tested &lt;br /&gt;&lt;br /&gt;Copyright&lt;br /&gt;Vision Australia Foundation, © 2000. The authors assign to Southern Cross University and other educational and non-profit institutions a non-exclusive licence to use this document for personal use and in courses of instruction provided that the article is used in full and this copyright statement is reproduced. 
The authors also grant a non-exclusive licence to Southern Cross University to publish this document in full on the World Wide Web and on CD-ROM and in printed form with the conference papers and for the document to be published on mirrors on the World Wide Web.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7116486-115883529006982086?l=bobyjos.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://bobyjos.blogspot.com/feeds/115883529006982086/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=7116486&amp;postID=115883529006982086' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7116486/posts/default/115883529006982086'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7116486/posts/default/115883529006982086'/><link rel='alternate' type='text/html' href='http://bobyjos.blogspot.com/2006/09/accessibility-testing.html' title='Accessibility Testing'/><author><name>bobyjos</name><uri>http://www.blogger.com/profile/04704110968990110850</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7116486.post-114310217677206354</id><published>2006-03-23T00:21:00.000-08:00</published><updated>2006-03-23T00:22:56.983-08:00</updated><title type='text'>Test 2</title><content type='html'>test&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7116486-114310217677206354?l=bobyjos.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' 
href='http://bobyjos.blogspot.com/feeds/114310217677206354/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=7116486&amp;postID=114310217677206354' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7116486/posts/default/114310217677206354'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7116486/posts/default/114310217677206354'/><link rel='alternate' type='text/html' href='http://bobyjos.blogspot.com/2006/03/test-2.html' title='Test 2'/><author><name>bobyjos</name><uri>http://www.blogger.com/profile/04704110968990110850</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7116486.post-111771534238415333</id><published>2005-06-02T05:28:00.000-07:00</published><updated>2005-06-02T05:29:02.386-07:00</updated><title type='text'>Test Posting</title><content type='html'>Test&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7116486-111771534238415333?l=bobyjos.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://bobyjos.blogspot.com/feeds/111771534238415333/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=7116486&amp;postID=111771534238415333' title='5 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7116486/posts/default/111771534238415333'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7116486/posts/default/111771534238415333'/><link rel='alternate' type='text/html' href='http://bobyjos.blogspot.com/2005/06/test-posting.html' title='Test 
Posting'/><author><name>bobyjos</name><uri>http://www.blogger.com/profile/04704110968990110850</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>5</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7116486.post-111751602522798806</id><published>2005-05-31T22:04:00.000-07:00</published><updated>2005-05-30T22:11:36.276-07:00</updated><title type='text'>A TESTING METHODOLOGY AND ARCHITECTURE FOR COMPUTER SUPPORTED COOPERATIVE WORK SOFTWARE - By Robert Francis Dugan Jr.</title><content type='html'>&lt;strong&gt;A TESTING METHODOLOGY AND ARCHITECTURE FOR COMPUTER SUPPORTED COOPERATIVE WORK SOFTWARE&lt;br /&gt;By Robert Francis Dugan Jr.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;A thesis submitted to the graduate faculty of&lt;br /&gt;Rensselaer Polytechnic Institute in partial fulfillment of&lt;br /&gt;the requirements for the degree of&lt;br /&gt;DOCTOR OF PHILOSOPHY&lt;/strong&gt;&lt;br /&gt;Major Subject: Computer Science&lt;br /&gt;May 26, 2000 (for Graduation August 2000)&lt;br /&gt;Approved by:&lt;br /&gt;Professor Ephraim P. Glinert, Computer Science&lt;br /&gt;Chairperson of Supervisory Committee&lt;br /&gt;Professor Edwin H. Rogers, Computer Science&lt;br /&gt;Member&lt;br /&gt;Professor Mark K. 
Goldberg, Computer Science&lt;br /&gt;Member&lt;br /&gt;Professor Mark Embrechts, Decision Sciences and Engineering Systems&lt;br /&gt;Member&lt;br /&gt;Rensselaer Polytechnic Institute&lt;br /&gt;Troy, New York&lt;br /&gt;&lt;br /&gt;Rensselaer Polytechnic Institute&lt;br /&gt;Abstract&lt;br /&gt;A TESTING METHODOLOGY AND ARCHITECTURE FOR COMPUTER SUPPORTED COOPERATIVE WORK SOFTWARE&lt;br /&gt;by Robert Francis Dugan Jr.&lt;br /&gt;&lt;br /&gt;Despite enormous potential, CSCW software is still immature. In particular, leading researchers in both the CSCW and testing fields have noted that CSCW testing tools are nonexistent.&lt;br /&gt;&lt;br /&gt;This thesis contributes a methodology and architecture for execution-based testing of CSCW software. The CSCW Application MEthodoLOgy for Testing (CAMELOT) provides an organized set of specific techniques that can be used for technological evaluation. The evaluation is organized into two phases: single-user and multi-user. Single-user evaluation is subdivided further into general computing and human-computer interaction. General computing examines the software components that provide basic application capabilities. Human-computer interaction focuses on the interface between the user and the software application. Multi-user evaluation examines distributed computing and human-human interaction. Distributed computing scrutinizes the components responsible for multitasking and multiprocessing in the application at the thread, process, processor, and machine levels. Human-human interaction focuses on how the software facilitates interaction between users during application use.&lt;br /&gt;&lt;br /&gt;Rebecca, our testing architecture, contributes to both general and multiuser testing systems. 
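Rebecca's multiuser support, listed below, includes deadlock detection for synchronizing virtual user scripts. The list of figures also mentions an algorithm for finding cycles in a resource graph (Figures 51 and 52). The following is a rough sketch of that general technique only. It is not Rebecca-J's code. It assumes a hypothetical wait-for graph, stored as a Python dict that maps each script name to the scripts it is blocked on, and uses depth-first search to find one cycle in O(V + E) time.

```python
# Hypothetical sketch (not Rebecca-J code): report a deadlock among virtual-user
# scripts by finding a cycle in a wait-for graph with depth-first search.
# Assumed input: dict mapping each script name to the names it is blocked on.
# Runs in O(V + E): each node is visited once and each edge is examined once.

WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current DFS path, finished


def find_deadlock_cycle(waits_for):
    """Return one cycle as a list of names, first name repeated at the end,
    or None when the graph is acyclic (no deadlock)."""
    color = {node: WHITE for node in waits_for}
    for targets in waits_for.values():
        for target in targets:
            color.setdefault(target, WHITE)  # nodes that only appear as targets
    parent = {}

    def visit(u):
        color[u] = GRAY
        for v in waits_for.get(u, ()):
            if color[v] == GRAY:
                # Edge from u back to v, which is still on the DFS path:
                # follow parent links from u up to v to recover the cycle.
                path = [u]
                node = u
                while node != v:
                    node = parent[node]
                    path.append(node)
                path.reverse()
                return path + [path[0]]
            if color[v] == WHITE:
                parent[v] = u
                found = visit(v)
                if found is not None:
                    return found
        color[u] = BLACK
        return None

    for node in list(color):
        if color[node] == WHITE:
            found = visit(node)
            if found is not None:
                return found
    return None


# Usage: three scripts that each wait on the next one form a deadlock.
scripts = {"UserA": ["UserB"], "UserB": ["UserC"], "UserC": ["UserA"], "UserD": ["UserA"]}
print(find_deadlock_cycle(scripts))  # ['UserA', 'UserB', 'UserC', 'UserA']
```

A recovery step would act on the returned cycle, for example by releasing one of its scripts. This sketch leaves open which script to release.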
In the area of general testing, Rebecca:&lt;br /&gt;- Provides an extensible component and event model that allows the record/playback of non-GUI events&lt;br /&gt;- Allows selective event recording through record filtration&lt;br /&gt;- Promotes the integration of the test system into the development environment&lt;br /&gt;- Outputs test scripts in the developer’s native language&lt;br /&gt;- Reduces re-recording using component-centric events and runtime component resolution&lt;br /&gt;- Simplifies the test process using a simple VCR-like interface&lt;br /&gt;&lt;br /&gt;In the area of multiuser testing, Rebecca:&lt;br /&gt;- Integrates live users into a test session with triggers that play back virtual user behavior based on user interface, state change, timer, or user-customized events&lt;br /&gt;- Provides runtime configuration of triggers via the threshold models&lt;br /&gt;- Simplifies virtual user synchronization with deadlock detection and recovery&lt;br /&gt;- Simplifies multiuser script editing via a global clipboard&lt;br /&gt;- Maintains IPC independence, but allows IPC to be recorded&lt;br /&gt;- Scales well with a resource-conserving architecture&lt;br /&gt;&lt;br /&gt;Our architecture was implemented in Java as a working system called Rebecca-J. The methodology, architecture, and working system were evaluated by testing a mature CSCW application. The evaluation uncovered several dozen problems with the CSCW system. In addition to validating our approach, the evaluation prompted immediate improvements to the architecture and implementation, and provided important ideas for future enhancements.&lt;br /&gt;&lt;br /&gt;TABLE OF CONTENTS&lt;br /&gt;1 Introduction&lt;br /&gt;1.1 Problem Overview and Motivation&lt;br /&gt;1.2 The Contributions of Our Research&lt;br /&gt;1.2.1 CSCW Application Methodology for Testing&lt;br /&gt;1.2.2 Rebecca: An Architecture for Execution Based Testing of CSCW Software&lt;br /&gt;1.2.3 Evaluation&lt;br /&gt;1.3 Overview of this Document&lt;br /&gt;2 A Survey of Computer Supported Cooperative Work&lt;br /&gt;2.1 Groupware Applications&lt;br /&gt;2.2 Groupware Toolkits&lt;br /&gt;3 A Preliminary Experiment&lt;br /&gt;3.1 Architecture&lt;br /&gt;3.2 Experimental Method&lt;br /&gt;3.3 Task Overview&lt;br /&gt;3.4 Evaluation, Results, and Analysis of Team Performance&lt;br /&gt;3.5 Lessons Learned from the Development of CollabBillboard
&lt;br /&gt;4 Survey of Prior Work in Testing Systems&lt;br /&gt;4.1 Goals of Testing&lt;br /&gt;4.2 Research Testing Systems&lt;br /&gt;4.2.1 Requirements&lt;br /&gt;4.2.2 Specification&lt;br /&gt;4.2.3 Design&lt;br /&gt;4.2.4 Implementation&lt;br /&gt;4.2.5 Integration&lt;br /&gt;4.2.6 System Testing&lt;br /&gt;4.3 Human Computer Interaction Testing&lt;br /&gt;4.3.1 Testing Architectures&lt;br /&gt;4.3.2 Usability Testing&lt;br /&gt;4.4 Commercial Test Systems&lt;br /&gt;4.4.1 Test 
Planning&lt;br /&gt;4.4.2 Test Management&lt;br /&gt;4.4.3 Test Development&lt;br /&gt;4.4.4 Test Execution&lt;br /&gt;4.4.5 Test Analysis&lt;br /&gt;4.4.6 Test Measurement&lt;br /&gt;4.4.7 Multiuser Testing&lt;br /&gt;5 A CSCW Application Methodology for Testing&lt;br /&gt;5.1 Related Work&lt;br /&gt;5.1.1 Taxonomy of Evaluation Methodologies&lt;br /&gt;5.1.2 CSCW Evaluation Methodologies&lt;br /&gt;5.2 A Technology Focused Methodology&lt;br /&gt;5.3 Single User Evaluation&lt;br /&gt;5.3.1 General Computing&lt;br /&gt;5.3.2 Human Computer 
Interaction&lt;br /&gt;5.4 Multi-user Evaluation&lt;br /&gt;5.4.1 Distributed Computing&lt;br /&gt;5.4.2 Human-Human Interaction&lt;br /&gt;5.5 Conclusion&lt;br /&gt;5.5.1 Ordering an Evaluation&lt;br /&gt;5.5.2 Comparison to Existing Methodologies&lt;br /&gt;5.5.3 Part of a Complete Evaluation&lt;br /&gt;6 Rebecca: An Architecture for Testing CSCW Applications&lt;br /&gt;6.1 General Architecture&lt;br /&gt;6.1.1 Registration Management&lt;br /&gt;6.1.2 Event List Management&lt;br /&gt;6.1.3 Component Management&lt;br /&gt;6.1.4 Playback Management&lt;br /&gt;6.1.5 State Management
&lt;br /&gt;6.1.6 Trigger Management&lt;br /&gt;6.2 General Infrastructure&lt;br /&gt;6.2.1 IDE Integration&lt;br /&gt;6.2.2 User Interface Independence&lt;br /&gt;6.2.3 Extensible Component and Event Models&lt;br /&gt;6.2.4 Record Filtration&lt;br /&gt;6.2.5 Script Simplification&lt;br /&gt;6.2.6 Playback Control and Feedback&lt;br /&gt;6.2.7 Native Language Recordings&lt;br /&gt;6.3 Multiuser Support&lt;br /&gt;6.3.1 Interprocess Communication Independence&lt;br /&gt;6.3.2 Playback Orchestration&lt;br /&gt;6.3.3 Triggers&lt;br /&gt;6.3.4 Threshold 
Model&lt;br /&gt;6.3.5 Global Clipboard&lt;br /&gt;6.3.6 Scalability&lt;br /&gt;6.3.7 Application Independence&lt;br /&gt;7 Evaluation&lt;br /&gt;7.1 The Reconfigurable Collaboration Network&lt;br /&gt;7.2 Evaluation Phase I: Converting Rebecca to Java 1.2&lt;br /&gt;7.3 Evaluation Phase II: Getting Rebecca to work with RCN&lt;br /&gt;7.3.1 Component Detection&lt;br /&gt;7.3.2 Component Naming&lt;br /&gt;7.3.3 Component Existence&lt;br /&gt;7.3.4 Modal Dialogs&lt;br /&gt;7.3.5 Menu Bars&lt;br /&gt;7.3.6 Synchronization Feedback&lt;br /&gt;7.4 Evaluation Phase III: Evaluating 
RCN&lt;br /&gt;7.4.1 Single User Tests&lt;br /&gt;7.4.2 Multiuser Tests&lt;br /&gt;7.5 Discussion&lt;br /&gt;8 Conclusion and Future Work&lt;br /&gt;8.1 CSCW Application Methodology for Testing&lt;br /&gt;8.2 Rebecca: An Architecture for Execution Based Testing of CSCW Applications&lt;br /&gt;8.3 Evaluation&lt;br /&gt;8.4 Future Work&lt;br /&gt;8.4.1 The Future of CSCW Evaluation&lt;br /&gt;8.4.2 Multiuser Recording&lt;br /&gt;8.4.3 User Swapping&lt;br /&gt;8.4.4 Remote Windowing&lt;br /&gt;A Appendix: RCN Bugs Discovered During 
Evaluation&lt;br /&gt;A.1 Error message displayed when starting up RCNPublicServer in Win95/98&lt;br /&gt;A.2 Configuration of PATH shell variable necessary for NativeLibrary.dll for RCNPublicServer in Win95/98&lt;br /&gt;A.3 ISServer does not always flush terminated RCNPublicServer&lt;br /&gt;A.4 Documentation Errors&lt;br /&gt;A.5 Inconsistent use of Quit, Exit, Leave, Cancel&lt;br /&gt;A.6 “Pick a IS” is grammatically incorrect&lt;br /&gt;A.7 No version number displayed in RCNPublicServer, rcnClient, ISServer&lt;br /&gt;A.8 User Preference Dialog Displays Invalid Colors&lt;br /&gt;A.9 Preference Dialog Displays Too Many Colors&lt;br /&gt;A.10 Preference Dialog Allows Same Color for Two Users in Same Session&lt;br /&gt;A.11 No lock mechanism for simultaneous edits of Team Information&lt;br /&gt;A.12 Race Condition Joining a Session&lt;br /&gt;A.13 Ghost Cursor Hidden By New Applications&lt;br /&gt;A.14 Sticky Mouse 
Buttons&lt;br /&gt;A.15 Multiple Client Control of Public Machine&lt;br /&gt;A.16 Incorrectly Translated Keys&lt;br /&gt;A.17 Sticky Shift, Alt, and Ctrl Keys&lt;br /&gt;A.18 Race Condition in rcnClient’s User Interface&lt;br /&gt;A.19 Race Conditions Joining Sessions, Users, Teams, Publics&lt;br /&gt;A.20 Inconsistent use of OK, Okay&lt;br /&gt;A.21 Flickering Ghost Cursor&lt;br /&gt;A.22 Confusing Display of Session Clients&lt;br /&gt;A.23 Memory Leaks in Public and Client When Ghosting&lt;br /&gt;A.24 Can’t play Indiana Jones from rcnClient&lt;br /&gt;A.25 Correspondence from RCN Development Team&lt;br /&gt;B Rebecca-J Information&lt;br /&gt;References&lt;br /&gt;&lt;br /&gt;LIST OF FIGURES&lt;br /&gt;Figure 1: Rapid prototyping model of the software life cycle 
[10] .................................................... 3&lt;br /&gt;Figure 2: Time/Space Taxonomy of Groupware [5] .........................................................................10&lt;br /&gt;Figure 3: CollabBillboard socket shadow.............................................................................................22&lt;br /&gt;Figure 4: Sketch of experimental design...............................................................................................23&lt;br /&gt;Figure 5: Selecting a billboard site in the city.......................................................................................24&lt;br /&gt;Figure 6: Control window for assembling billboard. Both users see the same&lt;br /&gt;window, view the entire billboard frame and move pieces.................................................24&lt;br /&gt;Figure 7: Assigned roles "view billboard" window. This user has a zoomed out&lt;br /&gt;view of the billboard frame but cannot move any pieces ...................................................25&lt;br /&gt;Figure 8: Assigned roles "place billboard" window. This user has a zoomed in&lt;br /&gt;view of the billboard frame and can move pieces. 
...............................................................26&lt;br /&gt;Figure 9: Z Language schema for CollabBillboard.............................................................................37&lt;br /&gt;Figure 10: GIL: Specification for queueRemotePieceUpdate$n......................................................38&lt;br /&gt;Figure 11: GIL specification for drawRemotePieceUpdate$n.........................................................39&lt;br /&gt;Figure 12: Control flow graph for loop with five possible logic paths...........................................45&lt;br /&gt;Figure 13: Code fragment from CollabBillboard................................................................................46&lt;br /&gt;Figure 14: Usability guidelines from [87]..............................................................................................49&lt;br /&gt;Figure 15: Final Exam C/S Test Multiuser Architecture ..................................................................59&lt;br /&gt;Figure 16: Taxonomy of Evaluation Methodologies [122]...............................................................65&lt;br /&gt;Figure 17: Intersecting Technologies of a CSCW Application ........................................................67&lt;br /&gt;Figure 18: CAMELOT’s Single/Multiuser Stages..............................................................................68&lt;br /&gt;Figure 19: Technology and Social Aspects of CSCW [122] .............................................................85&lt;br /&gt;Figure 20: General architecture diagram for Rebecca........................................................................89&lt;br /&gt;Figure 21: Registration management architecture diagram for Rebecca. .......................................90&lt;br /&gt;Figure 22: High level view of event list model/view/controller architecture ...............................93&lt;br /&gt;Figure 23: Detailed view of event list model/view/controller architecture. 
.................................94&lt;br /&gt;Figure 24: Component management architecture diagram for Rebecca ........................................95&lt;br /&gt;Figure 25: Playback management architecture diagram for Rebecca..............................................97&lt;br /&gt;Figure 26: Algorithm for event list replay. ...........................................................................................98&lt;br /&gt;Figure 27: Algorithm for native language replay.................................................................................98&lt;br /&gt;Figure 28: State management architecture diagram for Rebecca.....................................................99&lt;br /&gt;Figure 29: Trigger management architecture diagram for Rebecca...............................................101&lt;br /&gt;Figure 30: Connecting to Rebecca-J using IBM’s Visual Age Visual Composition&lt;br /&gt;Editor.........................................................................................................................................105&lt;br /&gt;Figure 31: Connecting to Rebecca-J using inline code ....................................................................106&lt;br /&gt;Figure 32: Recording is played back correctly event though UI components have&lt;br /&gt;moved. 
.......................................................................................................................................107&lt;br /&gt;Figure 33: UI Components translated to Rebecca’s Component Hierarchy...............................109&lt;br /&gt;Figure 34: Creation and initialization of PropertyChangeComponentInt in&lt;br /&gt;AgentTester ...............................................................................................................................110&lt;br /&gt;Figure 35: AgentTester’s modified setter for monitoring state change to integer&lt;br /&gt;count ..........................................................................................................................................111&lt;br /&gt;viii&lt;br /&gt;Figure 36: Implementation of dispatchEvent() for PropertyChangeEventRecord....................111&lt;br /&gt;Figure 37: Implementation of playbackEvent() for AgentTester...............................................112&lt;br /&gt;Figure 38: Selective recording with Rebecca-J...................................................................................114&lt;br /&gt;Figure 39: Recorder turned on and recording of plus push button press made.........................115&lt;br /&gt;Figure 40: Push button press events copied and pasted back into the event list........................116&lt;br /&gt;Figure 41: Result of replay.....................................................................................................................116&lt;br /&gt;Figure 42: Feedback for synchronization state in Rebecca-J..........................................................118&lt;br /&gt;Figure 43: Implementation of MouseEventRecord’s toJavaString() Method .............................120&lt;br /&gt;Figure 44: Sample output from MouseEventRecord’s toJavaString() Method .......................121&lt;br /&gt;Figure 45: Sample implementation of 
executeEventRecordList()&lt;br /&gt;Figure 46: Recording customized with a for loop&lt;br /&gt;Figure 47: Implementation of setCount()&lt;br /&gt;Figure 48: Implementation of remoteSetCount()&lt;br /&gt;Figure 49: Record filtration to remove redundant events while recording IPC&lt;br /&gt;Figure 50: Original Playback Orchestration Proposal&lt;br /&gt;Figure 51: An O(V+E) algorithm to determine cycles in a graph&lt;br /&gt;Figure 52: Resource graph (left) with deadlock cycle detected (right)&lt;br /&gt;Figure 53: Reworked playback orchestration&lt;br /&gt;Figure 54: Algorithm to Process Synchronization Events&lt;br /&gt;Figure 55: Algorithm for the removal of a synchronization event from a script&lt;br /&gt;Figure 56: Determining synchronization points for SecondWind’s recording
&lt;br /&gt;Figure 57: Synchronization Dialog for SecondWind’s Recording&lt;br /&gt;Figure 58: Synchronization event inserted just before mouse press on slider bar in SecondWind’s recording&lt;br /&gt;Figure 59: Timer trigger and virtual user script to support the metronome in Rebecca-J&lt;br /&gt;Figure 60: Deadlocked scripts&lt;br /&gt;Figure 61: Deadlock dialog&lt;br /&gt;Figure 62: User interface for triggers in Rebecca-J&lt;br /&gt;Figure 63: A threshold editor is necessary for a simple event type threshold model&lt;br /&gt;Figure 64: Rebecca-J’s editor for the propertyChangeInt threshold model&lt;br /&gt;Figure 65: The mouseRegion threshold model editor&lt;br /&gt;Figure 66: Timer browser in Rebecca-J&lt;br /&gt;Figure 67: Configuring a timer trigger for a single virtual user
&lt;br /&gt;Figure 68: Ordering recording players in Rebecca-J&lt;br /&gt;Figure 69: Adding a customized threshold model to ThresholdList’s initialize() method&lt;br /&gt;Figure 70: Implementation of the compare() method for low level key event threshold models&lt;br /&gt;Figure 71: Implementation of compare() method for keySequence threshold model&lt;br /&gt;Figure 72: Constructor for mouseRegion threshold model&lt;br /&gt;Figure 73: Implementation of event sequencing threshold model in Rebecca-J
&lt;br /&gt;Figure 74: A shared drawing/chat application&lt;br /&gt;Figure 75: An example of trigger chaining&lt;br /&gt;Figure 76: Trigger chaining extends shared drawing area test&lt;br /&gt;Figure 77: Trigger state chaining example&lt;br /&gt;Figure 78: Derivation of unique name from root component&lt;br /&gt;LIST OF TABLES&lt;br /&gt;Table 1: SCR Table for CollabBillboard&lt;br /&gt;Table 2: Equivalence classes for cos&lt;br /&gt;Table 3: Final Exam C/S-Test™ TML Script Commands for Multiuser Script Synchronization&lt;br /&gt;Table 4: Session Control window from SQA Suite™&lt;br /&gt;Table 5: General Computing Techniques from [14] and [10]&lt;br /&gt;Table 6: General Computing ∩ Human Computer Interaction Techniques from [125] and [40]&lt;br /&gt;Table 7: Usability Techniques from [40]&lt;br /&gt;Table 8: Distributed Computing
Techniques&lt;br /&gt;Table 9: General Computing ∩ Distributed Computing Techniques&lt;br /&gt;Table 10: Human Computer Interaction ∩ Distributed Computing Techniques&lt;br /&gt;Table 11: Human-Human Interaction Techniques&lt;br /&gt;Table 12: Human-Human Techniques Organized by CAMELOT Code&lt;br /&gt;Table 13: Rebecca’s Remote Objects&lt;br /&gt;Table 14: Threshold Models implemented in Rebecca-J&lt;br /&gt;Table 15: Bugs discovered in RCN using CAMELOT and Rebecca-J&lt;br /&gt;Table 16: RCN's shared objects classified by coupling and architecture&lt;br /&gt;Table 17: Results of RCN Ghost Scalability Testing&lt;br /&gt;ACKNOWLEDGMENTS&lt;br /&gt;My six-year doctoral journey was filled with detours. Some needed to be explored, some&lt;br /&gt;should have been left alone, and some, thank goodness, I managed to avoid. I suspect the&lt;br /&gt;twenty other students who entered the program with me in 1994 began a similar journey.&lt;br /&gt;Thirteen made it through the qualification exam. Four passed the candidacy exam. Three are&lt;br /&gt;completing the degree. I want to thank the people who helped me make this journey a&lt;br /&gt;success.&lt;br /&gt;Thank you, Mr.
Brown, my eighth-grade science teacher at East Lyme Junior High School.&lt;br /&gt;Your guidance and confidence in my abilities awakened a source of inner strength and drive&lt;br /&gt;that changed my life forever. You are a wonderful teacher.&lt;br /&gt;Thank you, Mom and Dad, for watching over me when I was young and being supportive&lt;br /&gt;while letting me find my own way as an adult. Dad, you taught me the value of hard work and&lt;br /&gt;persistence. Mom, you taught me an artist’s creativity.&lt;br /&gt;Thank you, Mike, Tim, Kathleen, and Grandmommie. During the frustrations and doubts that&lt;br /&gt;appeared along the way, your faith and confidence that I was doing the right thing reaffirmed&lt;br /&gt;my own.&lt;br /&gt;Thank you, cousins: Jenny, Katie, Lizzie, Byron, Brendan, and Aeron. You were my home&lt;br /&gt;away from home during my tenure here at Rensselaer. Your questions over the years about&lt;br /&gt;what grade I was in were fun to answer and a pressing reminder to finish.&lt;br /&gt;Thank you, friends in the Computer Science department: Jeff Neshewait, Patrick Fry, Amir&lt;br /&gt;and Amanda Sehic, Dr. Stephen Blythe, Rick Klein, Gregg Steuben, Louis Ziantl, Terry&lt;br /&gt;Hayden, Pam Paslow, Darren Lim, Quincy Stokes, and Lina Guzman. You made me feel&lt;br /&gt;welcome, gave me great advice, and showed me how to have fun in Troy. The time has passed&lt;br /&gt;too quickly.&lt;br /&gt;Thank you, friends in the Literature, Language, and Communications department: Lynne&lt;br /&gt;Cooke, Anne Navin, Dr. Joe Downing, Dr. Lee Honeycutt and Carolyn Honeycutt. You kept&lt;br /&gt;the other side of my brain from atrophying while I was immersed in geekdom.&lt;br /&gt;Thank you, Bill Oldfield and Paula Paul. You two had a profound impact on my professional&lt;br /&gt;career.
You gave me responsibility and challenging projects, and your confidence in me&lt;br /&gt;brought a dawning realization that I could succeed at anything I set my mind to.&lt;br /&gt;Thank you, Dr. Steven Howes. You’ve been my best friend for over twenty years. You blazed&lt;br /&gt;the Ph.D. path before me, showing me that it was attainable by mere mortals. Your advice and&lt;br /&gt;support as a veteran of the process were invaluable.&lt;br /&gt;Thank you, WPI professors Nabil Hachem, Matthew Ward, E. Malcom Parkinson, Michael&lt;br /&gt;Gennert, and Stanley Selkow. I clearly remember the look on Matt’s face one night as he&lt;br /&gt;described the sabbatical to Australia he was about to take. Your encouragement and&lt;br /&gt;enthusiasm when I was deciding whether to pursue a doctorate tipped the scale.&lt;br /&gt;Thank you, Professor Ephraim Glinert. You took me on as an advisee and gave me freedom&lt;br /&gt;to pursue my own curiosity. Your wise counsel kept me from straying down too many dead&lt;br /&gt;ends. Your willingness to fund my research on and off campus and to grant a brief leave of&lt;br /&gt;absence gave me the flexibility I needed to get the degree completed.&lt;br /&gt;Thank you, Professor Edwin Rogers. Your advice and involvement as a member of my&lt;br /&gt;committee were invaluable. You prodded me to produce a formal testing methodology that has&lt;br /&gt;become an important part of the thesis. Your research group’s application, RCN, was exactly&lt;br /&gt;the kind of collaborative system I needed for an evaluation of the thesis. Finally, our many&lt;br /&gt;conversations about sailing helped keep me sane.&lt;br /&gt;Thank you, Rensselaer computer science senior J.J. Johns. You lead the RCN development&lt;br /&gt;team and helped me a great deal during the evaluation phase of my thesis.
I appreciated your&lt;br /&gt;willingness to accommodate my needs while taking a full course load and continuing your&lt;br /&gt;regular RCN duties.&lt;br /&gt;A special thank you to my wife, Becky Dugan. I’m so grateful for your support this past year.&lt;br /&gt;You’ve taken care of a lot of the details of daily life so I could focus on finishing this&lt;br /&gt;dissertation. You’ve also been a great editor, therapist, and friend.&lt;br /&gt;“The journey is the reward,” to quote a Tao saying, and I couldn’t agree more. These past six&lt;br /&gt;years have been the most incredible of my adult life. I’ve honed my skills as a computer&lt;br /&gt;scientist, taught classes, worked for an Internet startup company, and conducted serious&lt;br /&gt;academic research. To top it all off, in a Wilderness First Aid class on campus I met the best thing&lt;br /&gt;that ever happened to me: my wife Becky. Thank you, Rensselaer!&lt;br /&gt;1 Introduction&lt;br /&gt;Human beings are social animals [1]. Many of the developments that are the hallmarks of&lt;br /&gt;human society can be traced to the need to interact and cooperate. Language allows more&lt;br /&gt;efficient and expressive communication. Money is used to acquire goods and services&lt;br /&gt;from others. Organizations such as the family, university, workplace, government, and law&lt;br /&gt;that preserve, protect, and advance humanity rely on complex interplay between&lt;br /&gt;individuals [2].&lt;br /&gt;Technology has also played an important role in the evolution of humanity. Tools of the&lt;br /&gt;mind (for gathering, processing, and distributing information) have had the greatest&lt;br /&gt;impact in the twentieth century [3]. Among these tools is the computer, arguably the most&lt;br /&gt;powerful tool ever developed.
This power comes from the computer’s ability to deal with&lt;br /&gt;information management in a generalized fashion [4].&lt;br /&gt;"A computer-based system that supports groups of people engaged in a common task (or&lt;br /&gt;goal) and provides an interface to a shared environment" is called groupware [5]. Douglas&lt;br /&gt;Engelbart published a visionary paper describing a groupware system called NLS in 1968.&lt;br /&gt;NLS contained many of the basic functions that can be found in modern groupware&lt;br /&gt;systems, including e-mail, shared annotations, shared screens, shared pointers, and&lt;br /&gt;audio/video conferencing [6]. During the 1970s, e-mail and threaded text conversations&lt;br /&gt;(e.g., conferencing systems and bulletin boards) became commonplace.&lt;br /&gt;The need to interact and socialize, combined with the technological progress of the past&lt;br /&gt;several decades, has led to the development of a branch of study known as Computer&lt;br /&gt;Supported Cooperative Work (CSCW). Research expertise in the CSCW field covers a&lt;br /&gt;wide range of disciplines, including computer science, psychology, anthropology, and&lt;br /&gt;education. Applications that fall under the CSCW umbrella are diverse: electronic mail,&lt;br /&gt;newsgroups, chat, multi-user editors, meeting support, videoconferencing, shared&lt;br /&gt;simulations, and workflow are some examples.&lt;br /&gt;Despite enormous potential, CSCW applications are still immature. Four software&lt;br /&gt;technology components must be successfully integrated in order to create a useful system:&lt;br /&gt;General computing provides basic application functionality found in any software&lt;br /&gt;system. Determining that a software program, even a simple one, functions&lt;br /&gt;correctly has been the subject of decades of research.&lt;br /&gt;Human-computer interaction technology supports the interaction between a user and&lt;br /&gt;the application.
All of the difficulties inherent in developing the interface for a&lt;br /&gt;single-user system apply, including iterative design, reactive programming,&lt;br /&gt;multithreading, undo/redo, and real-time programming.&lt;br /&gt;Distributed systems cover software that supports the execution of the application on&lt;br /&gt;multiple computers. Classic problems of multiprocessing that have to be&lt;br /&gt;confronted include inter-process communication, process synchronization,&lt;br /&gt;session management, and fault tolerance.&lt;br /&gt;Human-human interaction deals with functionality supporting interaction among&lt;br /&gt;several users. Issues include coordination, coupling, privacy, and user awareness.&lt;br /&gt;Creating, testing, and maintaining a program that uses any one of these software&lt;br /&gt;technologies is difficult. The effort involved in building a system that combines all these&lt;br /&gt;technologies is truly daunting.&lt;br /&gt;To attack these difficulties, the CSCW research community has tried to simplify the&lt;br /&gt;creation of groupware through the development of toolkits. These toolkits address four&lt;br /&gt;important areas: run-time architecture, programming abstractions, groupware widgets, and&lt;br /&gt;session management. The run-time architecture aids the programmer with process&lt;br /&gt;management and inter-process communication. Programming abstractions simplify&lt;br /&gt;synchronization of distributed events and data. Groupware widgets provide the&lt;br /&gt;programmer with GUI components for multiuser applications. Session management&lt;br /&gt;allows the programmer to customize how users create, join, leave, and manage a multiuser&lt;br /&gt;application.&lt;br /&gt;1.1 Problem Overview and Motivation&lt;br /&gt;Researchers concede that there is room for improvement of groupware toolkits [7]. For&lt;br /&gt;example, little work has been done to integrate audio and video into CSCW applications&lt;br /&gt;[8].
Examining the software life cycle reveals other areas for CSCW application improvement&lt;br /&gt;(see Figure 1).&lt;br /&gt;The difficulties encountered in creating CSCW applications also apply to their verification.&lt;br /&gt;For example, security and privacy need to be validated before users can feel confident that&lt;br /&gt;their private work is protected. The development process that user-interface-intensive&lt;br /&gt;CSCW applications go through requires constant reevaluation. Undo/redo scenarios can&lt;br /&gt;get extremely complicated in multiuser settings and require thorough verification.&lt;br /&gt;Distributed systems like CSCW software are “notoriously difficult to write, test, and debug”&lt;br /&gt;[9]. Leading researchers in both the CSCW and testing fields note “CSCW testing tools&lt;br /&gt;are non-existent” [8].&lt;br /&gt;To date, CSCW evaluation efforts have been broad-based, advocating the examination of&lt;br /&gt;both the social and technological aspects of an application. These broad-based approaches,&lt;br /&gt;combined with the research community’s preference for social evaluation, have created a&lt;br /&gt;lack of specific techniques for the technological evaluation of CSCW software. A testing&lt;br /&gt;methodology is sorely needed given the complexity of the task.&lt;br /&gt;Figure 1: Rapid prototyping model of the software life cycle [10]&lt;br /&gt;In addition to the lack of techniques for testing, there is also the logistical problem of&lt;br /&gt;finding “real” users to exercise the software [11]. A usability test requires users to exercise&lt;br /&gt;the application. Typically, the first user to exercise a compiled and linked program is the&lt;br /&gt;developer.
When the developer is satisfied that the application is operating properly, a&lt;br /&gt;second stage begins when real users are brought in for further study. It is relatively easy&lt;br /&gt;for a developer to play the role of a user in a single-user application, but in a CSCW setting&lt;br /&gt;this becomes more challenging. “Because we need at least two or more people for each&lt;br /&gt;observation scenario, we spend more time scheduling subjects and setting up equipment to&lt;br /&gt;observe each subject” [12]. It is hard enough to get one user to commit to a block of test&lt;br /&gt;time. It is even more difficult to get two or more users to agree to the same block of time.&lt;br /&gt;Higher costs in terms of time and money are incurred during CSCW testing because of this&lt;br /&gt;scheduling problem and the greater number of users needed for testing.&lt;br /&gt;A common-sense approach runs both users’ portions of the program on a single machine.&lt;br /&gt;Input and output are straightforward since both users have the same keyboard, mouse, and&lt;br /&gt;display. It is possible for the developer to see the immediate effect that one user’s action&lt;br /&gt;has on the other because all output goes to the same display. From a cost standpoint, this&lt;br /&gt;method is attractive because it only requires a single machine. There are, however,&lt;br /&gt;significant drawbacks to this approach. Concurrency is severely restricted because only&lt;br /&gt;one user can have the input focus on the machine.
Network performance is inaccurately&lt;br /&gt;represented because communication between users never leaves the local machine.&lt;br /&gt;General system performance is also misrepresented. In a heavily graphical application, for&lt;br /&gt;example, the performance when multiple users run on the same machine may be&lt;br /&gt;unacceptable due to intense image manipulation. Screen real estate can also be a problem.&lt;br /&gt;Since many CSCW applications are designed for one user per display, it may be difficult to&lt;br /&gt;view both users’ output simultaneously. Other one-per-machine resources may not be&lt;br /&gt;shared properly. For example, there is only one system cursor per machine. It may be&lt;br /&gt;impossible to test a system cursor remote control function until the application runs on&lt;br /&gt;two machines. Multiuser audio output is also difficult to test on a single machine.&lt;br /&gt;Distributing a two-user application across two machines eliminates most of the single-machine&lt;br /&gt;problems and more accurately represents how the system will behave. However, a single&lt;br /&gt;developer trying to exercise the application on two machines needs a great deal of&lt;br /&gt;dexterity and agility. Two displays provide an overwhelming amount of screen surface to&lt;br /&gt;observe during simultaneous visual updates. Multiple keyboards and mice allow&lt;br /&gt;concurrent input, but require dexterous skills for anything beyond a simple key-press or&lt;br /&gt;button-click. Imagine a single developer trying to type two sentences on two keyboards at&lt;br /&gt;the same time! Sophisticated simultaneous mouse manipulation is also difficult. Audio&lt;br /&gt;also presents a problem. It can be difficult, for example, to isolate which machine is&lt;br /&gt;producing audio output during execution. Headphones offer an option with multiple&lt;br /&gt;testers, but this isn’t possible with a single developer.
The difficulty of usability testing a&lt;br /&gt;CSCW application increases when three, four, or more users are added to the system.&lt;br /&gt;We acquired firsthand experience with the difficulties of developing and testing a CSCW&lt;br /&gt;application during the creation of CollabBillboard [13]. CollabBillboard is a multiuser&lt;br /&gt;simulation developed to test the theory that explicit user roles can induce greater&lt;br /&gt;collaboration. Although our evaluation of the application supported the hypothesis, we&lt;br /&gt;found the entire process frustrating. The biggest problem was how much we&lt;br /&gt;underestimated the amount of time needed to complete the application. It took almost&lt;br /&gt;three times as long as we expected! A major contributor to the delay was difficulty in&lt;br /&gt;finding subjects to help test the application. For the reasons discussed above, a single user,&lt;br /&gt;the developer, was not sufficient to thoroughly exercise the program. It was often&lt;br /&gt;necessary to comb the halls for volunteers, and as the months went by, they became&lt;br /&gt;increasingly reluctant.&lt;br /&gt;We began an investigation of testing systems to determine if any of them could have&lt;br /&gt;helped during the development of CollabBillboard. The research community has focused&lt;br /&gt;primarily on efforts that automate verification early in the software life cycle. The earlier a&lt;br /&gt;software error can be detected in the life cycle, the less costly it is to fix [14]. Even for&lt;br /&gt;black-box and white-box testing (see Section 4.2.4), which appear late in the cycle,&lt;br /&gt;automatic testing techniques are used. Work in early life-cycle testing has proven impractical for&lt;br /&gt;large, complex applications. Late-cycle techniques like black-box and white-box testing are&lt;br /&gt;intractable for all but the simplest of programs.
The research community has almost&lt;br /&gt;completely ignored the system test stage.&lt;br /&gt;The commercial world, on the other hand, takes a less formal, execution-based approach&lt;br /&gt;to verification. The tester is responsible for manually creating test cases to be executed&lt;br /&gt;against the system, with little guidance from the testing tool. The test cases are executed&lt;br /&gt;against the application during the implementation, integration, and system test phases.&lt;br /&gt;Fixed test cases are insufficient for the verification of a CSCW application. There is no&lt;br /&gt;opportunity for the CSCW tester to participate in the test. The tester cannot change the&lt;br /&gt;direction of a test case on the fly. As a passive observer, the tester cannot view the actions&lt;br /&gt;of a user effectively because the automated test executes too quickly. Finally, the tester&lt;br /&gt;lacks fine-grained control over the virtual users participating in the test.&lt;br /&gt;1.2 The Contributions of Our Research&lt;br /&gt;Our research has focused on improvements to execution-based testing of CSCW software.&lt;br /&gt;We have developed CAMELOT, a CSCW Application MEthodoLOgy for Testing.&lt;br /&gt;Developers and quality assurance personnel can use CAMELOT to evaluate the software&lt;br /&gt;technologies that make up a CSCW application. We devised Rebecca, an architecture for&lt;br /&gt;an execution-based test system, motivated by the desire to support live user participation in&lt;br /&gt;a CSCW test. In addition, the architecture makes important contributions to general&lt;br /&gt;execution-based testing systems. To determine the efficacy of our work, CAMELOT and&lt;br /&gt;a Java-based implementation of Rebecca were used to evaluate a mature CSCW&lt;br /&gt;application: Rensselaer Collaborative Network (RCN).
The evaluation uncovered over&lt;br /&gt;twenty bugs in RCN and flaws in both Rebecca and its implementation, and it provided valuable&lt;br /&gt;feedback for future work.&lt;br /&gt;1.2.1 CSCW Application Methodology for Testing&lt;br /&gt;Existing methodologies take a broad-based approach to the evaluation of a CSCW&lt;br /&gt;application. While acknowledging that technology plays a role in a CSCW system, these&lt;br /&gt;methods give few details on how its evaluation should proceed. The CSCW Application&lt;br /&gt;Methodology for Testing (CAMELOT) provides an organized set of specific techniques&lt;br /&gt;that can be used for technological evaluation. The methodology breaks the testing process&lt;br /&gt;into two stages: single-user and multiuser. In the single-user stage, General Computing&lt;br /&gt;and Human Computer Interaction features are examined. During the multiuser stage,&lt;br /&gt;Distributed Computing and Human-Human Interaction aspects are investigated.&lt;br /&gt;A unique code is associated with each technique. The code provides a classification&lt;br /&gt;scheme for the tests used and problems uncovered during application evaluation. We&lt;br /&gt;believe CAMELOT’s techniques cover most of the technology tests an evaluator&lt;br /&gt;would want to perform on a CSCW application.&lt;br /&gt;1.2.2 Rebecca: An Architecture for Execution Based Testing of CSCW Software&lt;br /&gt;Multiuser CSCW application development lacks a critical component that is&lt;br /&gt;taken for granted in single-user applications: support for live user testing. Any time someone&lt;br /&gt;wants to test a single-user application, they can pose as the user and run the application.&lt;br /&gt;As explained above, it is very difficult for a single person to perform a live user test when&lt;br /&gt;multiple users are required.
State-of-the-art commercial and research testing systems do not&lt;br /&gt;provide adequate guidance or support for a single person to perform live multiuser&lt;br /&gt;verification.&lt;br /&gt;Our approach to integrating a live user into an execution-based testing architecture focuses&lt;br /&gt;on the shortcomings of traditional execution-based test systems. Rebecca makes&lt;br /&gt;significant contributions to the general infrastructure of execution-based testing systems:&lt;br /&gt;The record/playback process is extended beyond the user interface with&lt;br /&gt;extensible component and event models. Any application activity can be replayed&lt;br /&gt;if its source is defined as a component and the activity is defined as an event.&lt;br /&gt;A record filtration system allows the user to filter events by selecting&lt;br /&gt;which components participate in a recording. In past systems, the only filtration&lt;br /&gt;options were labor-intensive: recording intermittently or editing the recording afterward.&lt;br /&gt;Unlike traditional testing systems that view testing as a separate task from&lt;br /&gt;development, the architecture integrates seamlessly into existing integrated&lt;br /&gt;development environments (IDEs) such as IBM's Visual Age.&lt;br /&gt;For sophisticated data structures and control flow in a test script, Rebecca&lt;br /&gt;describes a blueprint for exporting recordings in a familiar format: the IDE's native&lt;br /&gt;programming language. This contrasts with traditional test systems, which require&lt;br /&gt;the user to learn a proprietary scripting language.&lt;br /&gt;The need to re-record scripts after the application changes is reduced by&lt;br /&gt;runtime resolution of components and component-centric events.&lt;br /&gt;Recording script management is simplified with a VCR-like metaphor for creating,&lt;br /&gt;editing, and executing tests.
This allows the user to create and run a test in&lt;br /&gt;seconds.&lt;br /&gt;Rebecca also breaks new ground in the area of multiuser execution-based testing:&lt;br /&gt;The ability to incorporate live and virtual users into a single test session using&lt;br /&gt;distributed triggers. With triggers, virtual users react to events generated by other&lt;br /&gt;users (live or virtual). Existing test systems completely prescribe a test session,&lt;br /&gt;which precludes meaningful live user participation.&lt;br /&gt;Virtual users can react to four classes of events using triggers: user interface, state&lt;br /&gt;change, timer, and customized. This allows the virtual user to respond to virtually&lt;br /&gt;any application activity, much like a live user.&lt;br /&gt;Threshold models allow the tester to specify the characteristics of&lt;br /&gt;an event or sequence of events that will fire a trigger. A threshold model has a&lt;br /&gt;user interface component, which allows runtime specification of firing conditions.&lt;br /&gt;An extensible object-oriented framework for complete customization is also&lt;br /&gt;included.&lt;br /&gt;Improvements to synchronization during multiuser playback, including an&lt;br /&gt;orchestration metaphor, simplified synchronization mechanisms, deadlock&lt;br /&gt;detection, and deadlock recovery.&lt;br /&gt;A global recording clipboard, which simplifies the process of sharing some or all of&lt;br /&gt;a recording between virtual users.&lt;br /&gt;The ability to record, play back, and monitor application communication while&lt;br /&gt;maintaining independence from the communication mechanism. Existing test&lt;br /&gt;systems do not provide the ability to monitor application communication. The&lt;br /&gt;few academic systems that do provide this ability are mechanism-specific.&lt;br /&gt;A resource-conserving architecture.
This allows the system to run in tandem with an IDE and improves scalability as the number of users participating in a test increases.&lt;br /&gt;&lt;br /&gt;We expect Rebecca to influence the development of future execution-based testing systems and collaborative software. Rebecca promotes integrating testing early in the software life cycle. This is critical because studies have shown that the earlier a bug is discovered, the less expensive it is to correct. The architecture also provides guidance for the development of future multiuser testing systems. This guidance includes independence from the application's communication infrastructure, improvements to multiuser synchronization, triggers, and a scalable design. Finally, Rebecca-J, a Java-based implementation of the test system architecture, is available for immediate use in developing and testing Java-based collaborative software. In addition to improving multiuser testing, it should immediately benefit the research community by reducing the need for live users during a multiuser test.&lt;br /&gt;&lt;br /&gt;1.2.3 Evaluation&lt;br /&gt;We believe the evaluation of our methodology and testing architecture was a success. Unsolicited correspondence from the RCN team (see Section A.25) expressed gratitude for the problems uncovered by the CAMELOT and Rebecca approach. Two dozen bugs were discovered in this mature CSCW application. Some of the problems were cosmetic; however, some were serious and are being corrected to make RCN a robust application.&lt;br /&gt;&lt;br /&gt;Rebecca was also significantly improved. Flaws in the component management architecture were uncovered and corrected. 
Problems with modal dialogs were also fixed. Finally, several ideas for enhancements to Rebecca were formulated.&lt;br /&gt;&lt;br /&gt;1.3 Overview of this Document&lt;br /&gt;The rest of the document is divided into several chapters. Chapter 2 gives the reader an understanding of the scope of CSCW and describes some of the major groupware toolkits. Chapter 3 describes the CSCW application CollabBillboard and the lessons we learned from creating the software. One of the biggest problems we had developing CollabBillboard was testing the system between versions. Chapter 4 looks at state-of-the-art academic and commercial contributions to the field of software testing. Chapter 5 describes CAMELOT, the CSCW Application Methodology for Testing, a set of techniques we have developed specifically for testing collaborative software. Chapter 6 describes Rebecca, an architecture we have created for a collaborative software testing system. In Chapter 7, CAMELOT and Rebecca are evaluated by using them to test the Reconfigurable Collaboration Network. Chapter 8 concludes the thesis with some thoughts on future work.&lt;br /&gt;&lt;br /&gt;2 A Survey of Computer Supported Cooperative Work&lt;br /&gt;Groupware applications can be classified in several ways. One common method of categorization looks at how an application deals with issues of time and space. When multiple users work with a groupware application, when and where interaction occurs helps to define the system’s capabilities. Temporally, users can interact at the same time or at different times. 
Spatially, users can interact in the same place or from different places. Figure 2 illustrates these possibilities.&lt;br /&gt;&lt;br /&gt;Figure 2: Time/Space Taxonomy of Groupware [5] (same time/same place: face-to-face interaction; same time/different places: synchronous distributed interaction; different times/same place: asynchronous interaction; different times/different places: asynchronous distributed interaction)&lt;br /&gt;&lt;br /&gt;An example of same time/same place groupware is Rensselaer’s Design Conference Room Collaboration Network. This software was designed for face-to-face design meetings. Participants each have access to a private workstation and use a floor control policy to control access to a shared public workstation [15]. Chat programs like Internet Relay Chat (IRC) are examples of same time/different place groupware. Users communicate with each other via shared text windows where messages are typed and responses are viewed in real time [16]. Groupware that facilitates communication between users at the same time, regardless of spatial location, is known as synchronous groupware. E-mail is an example of different time/different place groupware. The final category, different time/same place groupware, has no known software applications, although a bulletin board where people leave messages for each other demonstrates this type of collaboration [5].&lt;br /&gt;&lt;br /&gt;2.1 Groupware Applications&lt;br /&gt;This section presents an overview of the major classes of CSCW applications and some of the important software systems that have been developed to implement them.&lt;br /&gt;&lt;br /&gt;Some Common CSCW Applications: A number of common computer applications fall under the domain of CSCW. 
Electronic mail consists of the asynchronous exchange of information between a sender and one or more recipients [17].&lt;br /&gt;&lt;br /&gt;Newsgroups operate in a manner similar to electronic mail. Information is exchanged asynchronously between a sender and a newsgroup covering a specific topic of interest through an activity known as “posting”. Users interested in a particular newsgroup then download and read postings using a newsreader. Newsgroups are a more public form of expression than electronic mail, which directs messages to a limited group of recipients [18].&lt;br /&gt;&lt;br /&gt;Chat allows two or more users to communicate synchronously using text. Users add messages to the shared text window by typing in a private compose area of the client chat program and selecting a “send text” option. Within seconds, the message appears in the shared text window of all room occupants. Because communication happens in real time, chat is more conversational than electronic mail or newsgroups [19].&lt;br /&gt;&lt;br /&gt;Videoconferencing is a method of synchronous distance communication between participants using live audio and video. There is strong interest in this technology because of the time and money involved in attending face-to-face meetings. Despite its obvious benefits, videoconferencing has not replaced face-to-face meetings because of issues such as lack of support for eye contact, difficulty integrating remote users from multiple sites, and insufficient network bandwidth [20].&lt;br /&gt;&lt;br /&gt;Workflow software is asynchronous groupware that helps improve the process of performing multi-person tasks in the workplace. 
Some examples of improvement include reduced lag time, because manual task routing is eliminated, and better feedback about the state of the tasks that make up the business process [21].&lt;br /&gt;&lt;br /&gt;Shared Windows: A shared window system allows synchronous collaboration through a logical window physically replicated on the screens of participating users. A user provides input and views output in this window in exactly the same manner as in other windows on the display; however, any action in a shared window is immediately reflected on the displays of the other participating users [22]. Single-user applications running inside the shared window become collaborative without modification. This straightforward approach to collaboration has drawbacks, however. WYSIWIS is the only viewing option, and conflict resolution is limited to a generalized floor control policy. Social awareness is supported to a limited degree in some systems through telepointers and a shared transparent layer where users can make graphical and text annotations [8]. Shared windows are used in areas including:&lt;br /&gt;classroom/meeting support, where users can share the same view of an application relevant to the discussion;&lt;br /&gt;technical support, where a technician can walk a user through a software problem.&lt;br /&gt;&lt;br /&gt;VConf was one of the earliest shared window systems [23]. Rensselaer’s Design Conference Room Collaboration Network takes the idea of shared windows to an extreme by sharing windows, the display, and an entire workstation between users [15]. Farallon’s Timbuktu and Traveling Software’s LapLink are examples of commercial systems.&lt;br /&gt;&lt;br /&gt;Multiuser Editing: Multiuser editing can be asynchronous or synchronous. 
Within these divisions, further specialization occurs based on the type of information (e.g., text, graphics). Multiuser asynchronous text editors allow multiple users to edit the same document over time; at any given moment, only one user can be editing the document. Synchronization of the document among users, distributed access, and control are essential requirements for this type of editing system. Commercial word processors like Microsoft Word provide primitive support through file locking, which prevents simultaneous editing, and file sharing, which provides multiuser access and control. Despite its limited capabilities, the basic asynchronous multiuser editor mirrors how collaborative documents are produced and has been readily adopted by the business community.&lt;br /&gt;&lt;br /&gt;More advanced asynchronous editors support a variety of collaboration styles. This is important because collaboration needs have been observed to change as a document evolves [24]. The PREP asynchronous editor breaks a document into layers called columns. The main text, co-authors’ notations, and comment requests/responses are examples of columns. A column is composed of chunks, each corresponding to a logical unit of information (e.g., a paragraph or a request/response pair). Each user receives a copy of the document, and updates to the local copy are received from other users periodically. Specialized software helps the user visualize and integrate remote updates into the local copy. How these updates are sent and received is controlled by user-configurable parameters of interaction. Grain size controls the size (column, chunk, keystroke) of a document update. Flow determines when an update occurs (automatically or upon request). 
Transmission speed controls how fast information must flow from one site to another via the network. PREP also helps users manage the task of multiuser editing: they can negotiate interaction parameters, set document access control, and make commitments to deadlines [25]. Quilt [26] is another example of an asynchronous editor.&lt;br /&gt;&lt;br /&gt;Multiuser synchronous text editors allow multiple users to edit the same text document simultaneously. Changes made to the document by one user are immediately seen by all other users. Users may be allowed separate, independent views of the document (What You See Is Not What I See, WYSINWIS), as in the GroupKit Fish-Eye editor [27], or the views may be linked (What You See Is What I See, WYSIWIS), as with XEROX PARC’s Cognoter [28]. Conflict invariably arises during multiuser editing sessions. Rapport [29] uses a floor control mechanism in which users request permission to modify sections of the document. GROVE [5] relies on simple voice communication to resolve differences. Cognoter [9] uses access control to prevent other users from modifying an area that is already being changed. Public and private access to sections of the document is a desirable capability; GROVE can limit a section’s read/write access to one or more users. The editing experience is enhanced through social awareness, which lets a user know who else is modifying the document and where the changes are being made. GroupKit’s Fish-Eye editor uses icons to represent each user in the editing session, and a graphical representation of the entire document with fish-eye lenses over the sections that users are currently editing. 
Many editors also support telepointers (also known as ghost cursors or remote pointers) to indicate the location of a remote user’s pointer.&lt;br /&gt;&lt;br /&gt;Multiuser synchronous editing of complex information requires functionality similar to that of the synchronous text editors mentioned above. Shared drawing systems are an example of this kind of editor. Users share a drawing area where text, two-dimensional graphics, and images can be manipulated [30]. Any change made to the drawing area by one user is immediately seen by all other users. Microsoft NetMeeting Whiteboard is a commercial example of such a system. NetMeeting Whiteboard supports WYSINWIS by dividing the shared drawing area into sheets, with a set of horizontal and vertical scrollbars for navigation within a single sheet. Users can lock sheets to prevent other users from making changes. There is no support for a private drawing area, and social awareness is limited to telepointers. One unique feature of the system is the ability to cut and paste any visible window or window portion onto the drawing surface [31].&lt;br /&gt;&lt;br /&gt;Meeting Support: Meeting support consists of technological and physical additions to a conference room. Interest in this type of technology is widespread because statistics have shown that workers spend 30 to 70 percent of their time in meetings [9]. Technologically, the major components are networked computers, whiteboards, shared views, and a Group Decision Support System (GDSS). Most CSCW meeting rooms allocate one computer per attendee, networked to the other devices in the room. The stand-alone whiteboard, an important focus of attention in the regular conference room, is integrated electronically. 
Rensselaer’s Design Conference Room (DCR) [15] includes a Softboard™ whose software records activity as the user writes with a marker on the whiteboard surface. In addition to saving the final board image, the software can play back the strokes that created it. GMD-IPSI’s DOLPHIN system [32] and XEROX PARC’s Colab system [9] take whiteboards to a new level with liveboards, which are essentially large touch-sensitive computer displays. Handwriting recognition, sketching, and gesturing capabilities facilitate interaction with the device.&lt;br /&gt;&lt;br /&gt;The issue of public and private information is an important one during meetings. Sometimes users may wish to share information shown on a private display, while at other times they want privacy. Colab allows a single window to be shared among meeting members. This window is usually displayed on the liveboard at all times, and anything a user wants to share with the rest of the group must be pasted into it. DOLPHIN uses a sophisticated shared hypermedia document model in which artifacts generated privately can be shared between users and the liveboard. The DCR allows sharing through a public computer and display, which users can access through their private computer’s keyboard and mouse.&lt;br /&gt;&lt;br /&gt;Many CSCW meeting rooms include special GDSS software to facilitate the meeting process. The DCR provides a set of flexible, unstructured tools, including floor control for the public display, anonymous chat for brainstorming, and private chat for side conversations. Colab provides two applications: Cognoter and Argnoter. Cognoter is used for group creation of presentations. 
The software guides participants through three stages: brainstorming, organizing, and evaluating. Argnoter is used for group decisions on competing proposals; it takes participants through three different stages: proposing, arguing, and evaluating. GroupSystems [33] provides applications to support brainstorming, commenting on a specific topic, and idea organization.&lt;br /&gt;&lt;br /&gt;The physical design of the conference room is very important. Colab and DOLPHIN accommodate six participants around a U-shaped table, with the liveboard placed at the top of the “U”. GroupSystems accommodates 24 participants in two concentric tiered rows of seats centered around a large shared display. Each participant has access to a computer and display, which is slightly recessed to allow greater visual contact with other users. The DCR uses a hexagonal table that seats six. Each participant has a private computer and public access to a shared computer and display. One unique property of the DCR is that all display devices are completely recessed within the table. This gives users full use of the table surface, removes visual obstructions completely, and helps make the technology less obtrusive.&lt;br /&gt;&lt;br /&gt;Simulations: Simulations involving multiple participants have become commonplace with the ubiquity of networked computers in diverse application domains, including defense, aeronautics, and entertainment. The U.S. Department of Defense has been actively developing networked simulators over the past decade. The result of this effort is Distributed Interactive Simulation (DIS), a set of protocols that allows network-connected simulators to participate in synchronous combat operations using a shared electronic terrain [34]. 
Advantages of DIS over single-user simulators include group rather than individual training, support for user participation anywhere on earth, time-sensitive challenges that demand immediate responses from the users, creation of new tasks based on the actions of the users, and rich interaction possibilities due to the large number of entities (user- and computer-controlled) supported simultaneously [35].&lt;br /&gt;&lt;br /&gt;In the entertainment arena, multiuser games enhance the recreational experience because they allow cooperation or competition with live users. Presumably, a live user will offer more interesting challenges than a computer-generated opponent. A synchronous simulated automobile race is much more interesting if the car being challenged belongs to a friend down the hall (or in the next state!) [36]. Communication between users, if supported at all, is limited to a shared chat window. Game servers are appearing on the Internet that allow users to join games with other users anywhere in the world, at any time of the day or night (e.g., Microsoft’s Internet Gaming Zone, Blizzard’s battle.net, Mplayer, Iron Wolf) [37]. Sample games include public domain systems like Xpilot and Netrek and commercial systems like Warcraft, Quake II, and Jedi Knight.&lt;br /&gt;&lt;br /&gt;Computer Supported Collaborative Learning (CSCL): CSCL applications occupy an entire sub-discipline within CSCW. Any application that facilitates both cooperation and learning falls under the CSCL umbrella. Some important areas of research include distance learning, teaching rooms, knowledge construction, and shared reality.&lt;br /&gt;&lt;br /&gt;Distance Learning: Distance learning is playing an increasingly important role at the college level. A distance learning student is usually a full-time professional taking classes part-time. 
Most courses are delivered as lectures broadcast live (or tape-delayed). Interaction with the lecturer and the on-campus class is limited to the telephone and asynchronous text exchanges (e-mail, newsgroups, or the web) [38]. A lack of real-time interaction inhibits the kind of exchange seen in the regular classroom and in face-to-face collaboration [39]. Desktop videoconferencing technology may help to solve this problem; however, it has its own challenges. It is difficult for an instructor to maintain an awareness of remote students (e.g., gestures, gaze direction, body language) simultaneously at multiple sites. Turn taking is also a problem [40].&lt;br /&gt;&lt;br /&gt;Teaching Rooms: Teaching rooms are classrooms that incorporate computing technology to facilitate synchronous, face-to-face cooperative learning. Each student usually has access to a networked computer. The computer display can be recessed to give the student a line of sight to the lecturer. The lecturer may be able to display information from his or her computer on a large screen visible to the entire class, and may also be able to project any student’s display onto the large screen [40].&lt;br /&gt;&lt;br /&gt;Rensselaer’s Collaborative Classroom (CC) [41] makes a number of improvements to the basic teaching room. The CC provides seating for teams of two to six students per table. Embedded in each table is a networked Windows workstation. Students share control of this workstation using specialized software that runs on their private laptops, or with shared keyboards and mice provided with the table. Any computer in the room can view the display of, or take control of, any other computer in the room. 
This allows a variety of interaction styles, including instructor demonstration, peer learning, team meetings, instructor consultation, client consultation, and class-wide presentation and critique.&lt;br /&gt;&lt;br /&gt;Research has shown that teaching rooms can create experiences that are more interesting for students than the traditional classroom. However, the teaching room is not a panacea and has drawn mixed responses from faculty. Some refuse to return to an ordinary classroom. Others apply newly discovered teaching techniques in the regular classroom. Still others find the changes in teaching style too radical and return to a more traditional lecture format [40].&lt;br /&gt;&lt;br /&gt;Knowledge Construction: Knowledge construction focuses on the collective building of domain understanding. A newsgroup is a basic form of group knowledge construction. The Computer Supported Intentional Learning Environment (CSILE) system, from the Ontario Institute for Studies in Education, is a community database created by students on networked computers on and off campus [42]. Students can create multimedia notes, comment on other students’ notes (with automatic notification to the original author), and organize notes into different informational structures. The Collaboratory Notebook provides students access to a shared multimedia document modeled after a scientific notebook [43]. A student can create eight kinds of pages, including questions, conjectures, evidence for, evidence against, plans, steps in plans, and commentaries. Hyperlinks make it possible to create non-sequential relationships between the pages. Other systems modeled after the Collaboratory Notebook include CaMILLE for engineering students [44] and CALE for medical students [44]. 
KMap is a web-based tool for creating and browsing concept maps [45]. A concept map is a visual representation of information and forms of argument. KMap represents pieces of knowledge as text-labeled nodes, with links between the nodes representing knowledge relationships. When the cursor is over a node, the user can select from a list of associated multimedia information. KMap can be used to generate concept maps individually or in a group and then to place them on the web for a wider audience to comment on and improve. Some of the advantages of knowledge construction are elimination of turn-taking problems, peer commentary, time for reflection, independent thought, and cumulative/progressive results [42].&lt;br /&gt;&lt;br /&gt;Shared Reality: Shared reality refers to computer-constructed worlds where students can explore, collaborate, and learn. Examples of shared realities include Multiuser Dungeons (MUDs), microworlds, and collaborative games. A MUD is a text-based shared reality that consists of rooms, exits, objects, and users. A server hosts the MUD, accepts user connections, allows users to manipulate and add to the shared reality, and supports interaction between users. Users communicate synchronously via a chat-like interface, which also reports the results of interactions with objects and rooms. Historically, MUDs have been a form of recreational activity; however, recent applications include MUDs for astrophysicists [46], system administrators [47], and students. For example, MOOSE Crossing is an educational system where children develop social and computer skills by programming rooms and objects for a MUD [48]. 
MUDs are effective learning communities because they provide motivation for learning, emotional support, technical support, and an appreciative audience [49].&lt;br /&gt;&lt;br /&gt;SharedARK is a system for creating synchronous, shared microworlds [50]. A SharedARK microworld is an infinite, shared, two-dimensional “flatland”, only a small portion of which is visible on any one computer display. Users manipulate objects using a hand-shaped pointer. The system can operate in both face-to-face and distance modes. When users encounter each other in SharedARK, they can set up audio/video links. A basic model of the physical world is built into the system, so users can experiment with and create objects that have mass, density, and momentum. Several applications have been created, including the Puckland [51] simulator for elastic collisions and ARKCola [52], a simulation of a soft drink bottling plant. Experiments with SharedARK systems have shown that students are more engaged and perform deeper evaluations of problem sets than they do when working with paper and pencil [50].&lt;br /&gt;&lt;br /&gt;Other examples of shared reality include MacCandy [53] and TurboTurtle [54]. MacCandy simulates a candy factory where candies are packed in rolls of ten and rolls are packed in boxes of ten. The system was designed to help second-grade students learn about estimation, symbology, and addition/subtraction. The microworld becomes the focus of classroom-wide discussion when displayed on the instructor’s screen at the front of the room. 
TurboTurtle is a system for exploring Newtonian physics, similar to SharedARK. A distinguishing feature of the system is its sophisticated support for awareness of other users, including user lists, telepointers, and shared widget controls.&lt;br /&gt;&lt;br /&gt;2.2 Groupware Toolkits&lt;br /&gt;With so many issues to consider, building a groupware application can be a daunting task. Researchers have attempted to reduce the development burden by producing groupware toolkits, most of them aimed at synchronous groupware. These toolkits contain generic building blocks that can be used to assemble a CSCW application faster than with conventional single-user development tools. Typical groupware toolkits address four important areas [8]:&lt;br /&gt;Run-time Architecture – aid the programmer with process management, process interconnection, and inter-process communication&lt;br /&gt;Programming Abstractions – make it easier for the programmer to synchronize distributed events and data&lt;br /&gt;Groupware Widgets – provide the programmer with a set of generic groupware GUI tools for synchronous multiuser applications&lt;br /&gt;Session Managers – allow the programmer to customize how users create, join, leave, and manage participation in a CSCW application.&lt;br /&gt;&lt;br /&gt;At last count, more than thirty groupware toolkits have been developed by the research community. Toolkits frequently cited as reference systems include GroupKit [55], Rendezvous [56], and Suite [57]. GroupKit is a Tcl/Tk-based toolkit available on Unix, Windows 95, and Macintosh platforms. It uses a replicated architecture, broadcasting events when local changes need to be sent to remote users. Remote events are processed in a manner similar to local events. 
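&lt;br /&gt;&lt;br /&gt;The replicated, broadcast-based event handling described for GroupKit can be illustrated with a minimal Python sketch. It assumes all replicas run in one process, with a direct method call standing in for network broadcast; the `Replica` class and its methods are hypothetical names, not GroupKit’s Tcl/Tk API. Each replica keeps its own copy of the shared state, applies a local event immediately, and sends it to its peers, which process it through the same handler used for local events.

```python
# Minimal sketch of a replicated groupware architecture with event broadcasting.
# Assumptions: all replicas live in one process and "broadcasting" is a direct
# method call standing in for the network. Class and method names are
# hypothetical and are not GroupKit's (Tcl/Tk) API.


class Replica:
    def __init__(self, user):
        self.user = user
        self.peers = []     # the other replicas in the session
        self.document = []  # every replica holds its own full copy of the shared state

    def join(self, other):
        # Connect two replicas in both directions.
        self.peers.append(other)
        other.peers.append(self)

    def local_event(self, text):
        # Apply a local change to the local copy, then broadcast it to every peer.
        self.handle_event(self.user, text)
        for peer in self.peers:
            peer.remote_event(self.user, text)

    def remote_event(self, origin, text):
        # Remote events are routed through the same handler as local events.
        self.handle_event(origin, text)

    def handle_event(self, origin, text):
        self.document.append((origin, text))


alice = Replica("alice")
bob = Replica("bob")
alice.join(bob)
alice.local_event("hello")
bob.local_event("hi")
print(alice.document == bob.document)  # True
```

Because the calls in this sketch happen one after another, both copies end up identical. With real network delays, concurrent events can reach replicas in different orders, which is why replicated toolkits also need some form of concurrency control.&lt;br /&gt;&lt;br /&gt;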
A large number of groupware widgets are provided, including social awareness widgets, multiuser toolbars and text widgets, telepointers, and transparent annotation windows. A programmer-configurable session manager is also furnished.&lt;br /&gt;&lt;br /&gt;Rendezvous is a Lisp/X Windows-based toolkit available on Unix platforms. It is a centralized system based on Smalltalk’s Model-View-Controller (MVC) architecture [58]. Much of the remote event handling and synchronization is abstracted into a programmable constraint system: once the programmer specifies constraints between user interface components and the data model, the constraint solver automatically keeps user views and their data synchronized. The toolkit is based on an object-oriented version of Lisp and provides over 350 reusable classes. These classes include support for telepointers, floor control, and multiuser text and graphics, as well as session management.&lt;br /&gt;&lt;br /&gt;Suite is a C-based, user-interface-independent toolkit available on Unix platforms. It is a centralized system designed around the concept of a multiuser text editor. Applications consist of editable objects, which are made up of publicly accessible shared variables. These shared variables are modified through calls issued from interaction variables associated with a specific local user interface. When an end user interacts with a widget, the widget modifies the interaction variable, which in turn modifies the corresponding shared (active) variable. Changes to shared variables trigger update callbacks for the interaction variables of other users. End-user coupling configuration is one unique feature of the system. 
Users are able to specify how frequently their user interface updates, and is updated by, the application’s shared objects. Because Suite is user-interface independent, it provides no groupware widgets. Session management is enabled at a high level by giving the end user the ability to create and modify user groups within an application session. The programmer can add further functionality, such as access control, using Suite primitives.&lt;br /&gt;&lt;br /&gt;3 A Preliminary Experiment&lt;br /&gt;&lt;br /&gt;We gained first-hand experience with the difficulties of developing a CSCW application during the creation of CollabBillboard. CollabBillboard grew out of ideas we had been developing about assigned roles in a team [13]. Instead of dividing a task into smaller independent subtasks to be completed in parallel, team members are assigned different but complementary roles for completing a shared task. Our hypothesis was that explicitly assigned roles could induce stronger collaboration among team members. To test the hypothesis, we developed CollabBillboard, a synchronous collaborative simulation.&lt;br /&gt;&lt;br /&gt;Although an evaluation of CollabBillboard supported the hypothesis, we found the entire process frustrating. The biggest problem was how badly we underestimated the time needed to complete the application: it took almost three times as long as we expected. One of the major contributors to the delay was finding physical users to help test the application. For reasons discussed in Section 1.1, a single user, the developer, was not sufficient to thoroughly exercise the program.
It was often necessary to comb the halls for volunteers, and as the months went on, they became increasingly reluctant.&lt;br /&gt;&lt;br /&gt;3.1 Architecture&lt;br /&gt;&lt;br /&gt;CollabBillboard is a synchronous, face-to-face, two-player simulation that attempts to address some shortcomings of previous multiuser simulations through explicitly assigned roles and group evaluation. Assigned roles require each user to take on a specific role during the simulation. These roles are complementary but non-overlapping, and both users must cooperate within their roles to achieve the simulation goal. Group evaluation uses team-based rather than individual-based performance criteria.&lt;br /&gt;&lt;br /&gt;The CollabBillboard application is designed for networked personal computers running Windows 95 or NT. The development environment, Microsoft Visual C++ (VC++), was augmented with Microsoft Foundation Classes (MFC) for GUI support, DirectX for high-performance graphics, and Winsock for communication.&lt;br /&gt;&lt;br /&gt;Applications developed with VC++ and MFC have a structure oriented around the user interface. Each dialog is associated with a C++ class, and events generated by widgets in the dialog are converted to messages that invoke class methods. To enable multiuser capabilities, CollabBillboard pairs each dialog with a socket shadow class. The socket shadow contains methods for communication setup and takedown, for sending special events, and for receiving special events. Send event methods report local events and data that are of interest to remote users. The receive event method converts remote user messages to the local event and data format. Figure 3 depicts the socket shadow class for the initial dialog panel. The member functions OnAccept and OnConnect are invoked during communication setup.
SendOK is called by the dialog class method ButtonOK, which runs when the user presses the OK button. OnReceive is invoked when a remote message arrives; for this dialog, it receives remote ButtonOK events and invokes the same local dialog method.&lt;br /&gt;&lt;br /&gt;class CcollabBillBoardDlgSocket : public CollabBillBoardSocket&lt;br /&gt;{&lt;br /&gt;private:&lt;br /&gt;// Invoked during communication setup&lt;br /&gt;void OnAccept(int theErrorCode);&lt;br /&gt;void OnConnect(int theErrorCode);&lt;br /&gt;// Invoked when a remote message arrives; replays remote ButtonOK events locally&lt;br /&gt;void OnReceive(int theErrorCode);&lt;br /&gt;public:&lt;br /&gt;BOOL InitializeSockets();&lt;br /&gt;// Called from the dialog's ButtonOK handler to report the local OK press&lt;br /&gt;BOOL SendOK();&lt;br /&gt;};&lt;br /&gt;Figure 3: CollabBillboard socket shadow&lt;br /&gt;&lt;br /&gt;The system requires one machine per user, and the complete simulation state is replicated on each machine. Participants can be situated at different physical locations. However, the game is designed with activities that require high-bandwidth communication between participants, so a face-to-face experimental setup was used.&lt;br /&gt;&lt;br /&gt;3.2 Experimental Method&lt;br /&gt;&lt;br /&gt;A study was conducted to evaluate the effect CollabBillboard might have on collaboration between pairs of users. The study used two versions of the program, one with and one without assigned roles. Time to completion, percentage of time spent conversing, and accuracy of billboard placement were among the performance criteria measured. Subjects were then given a paper-and-pencil collaborative exercise, the results of which were compared against a solution key. Finally, the subjects completed a survey that allowed them to express their subjective feelings about the simulation and about their collaborative experiences during the session.&lt;br /&gt;&lt;br /&gt;Figure 4: Sketch of experimental design.&lt;br /&gt;&lt;br /&gt;A long desk with monitors at opposite ends was set up in an office.
Users sat on opposite sides of the desk, each in front of a monitor. The monitors were set up so that each could be seen only by the user in front of it, and were angled so that the users sat across from each other, with a three-foot gap between the monitors on the table; this arrangement afforded line-of-sight viewing for non-verbal communication.&lt;br /&gt;&lt;br /&gt;3.3 Task Overview&lt;br /&gt;&lt;br /&gt;Research participants worked on one of two versions of the CollabBillboard simulation. One version used assigned roles, while the other (the control) did not. Participants were grouped into pairs, with each pair using one version of CollabBillboard. When the simulation was completed, participants worked through a classic paper-and-pencil collaborative exercise called Lost at Sea [59]. At the end of the experiment, the pair was asked to complete a survey about their experiences.&lt;br /&gt;&lt;br /&gt;Pairs of participants were scheduled for a one-hour session. When they arrived, they were introduced to each other, the tasks to be performed were explained, and they were asked to sign a consent form. A tape recorder was started to record the audio exchange during the CollabBillboard portion of the session. Participants started the CollabBillboard application on their respective machines. When network communication was established, one of the users pressed the OK button on the initial dialog window, and both users were presented with a task menu.&lt;br /&gt;&lt;br /&gt;Figure 5: Selecting a billboard site in the city.&lt;br /&gt;&lt;br /&gt;The session moderator explained that the participants were part of a fictitious advertising company that wanted to place a billboard in the city of Boston.
Two major tasks were needed to complete the simulation: selecting a site in the city for the billboard, and assembling the scrambled pieces of the billboard on the site’s billboard frame.&lt;br /&gt;&lt;br /&gt;Figure 6: Control window for assembling the billboard. Both users see the same window, view the entire billboard frame, and can move pieces.&lt;br /&gt;&lt;br /&gt;The first task, Site Selection, brought up a shared map of Boston, Massachusetts (see Figure 5). Telepointers were used to indicate the remote user’s focus on the map. As users moved over possible sites, an informational window appeared describing the site. When a site was selected, it was highlighted. These actions appeared on both participants' maps, with separate colors indicating a local or remote action. Once participants selected a site, they proceeded to the second task.&lt;br /&gt;&lt;br /&gt;The second task, Billboard Assembly, involved assembling randomly placed pieces of the billboard in the correct order and properly centering them on a billboard frame. At this point, the assigned roles and control versions of the program diverged. The control version brought up a shared billboard frame to which users could add billboard pieces. Each new piece appeared simultaneously in the same random location on both participants’ screens (see Figure 6). Participants could grab and move any piece of the billboard at any time. The frame contained a green box representing the local user’s position in the frame and a red box representing the remote user’s position. To move a billboard piece, a user placed the green box on the piece, selected the grab button, and then used the directional arrows. A zoom window was included for fine-grained piece movement.&lt;br /&gt;&lt;br /&gt;Figure 7: Assigned roles "view billboard" window.
This user has a zoomed-out view of the billboard frame but cannot move any pieces.&lt;br /&gt;&lt;br /&gt;The assigned roles version of the program split billboard assembly into two separate subtasks: View Placement and Place Billboard. The View Placement task presented the user with a zoomed-out view of the billboard frame. This user could see all billboard pieces and a green box representing the Place Billboard user’s view. The View Placement user could add new pieces to the frame and move the other user’s view. However, the View Placement user could not move a billboard piece, even when the Place Billboard user was grabbing one (see Figure 7).&lt;br /&gt;&lt;br /&gt;The Place Billboard task presented the user with a zoomed-in section of the billboard frame. The Place Billboard user could navigate around the billboard frame using the dialog’s arrow widget, and could also grab, move, and drop billboard pieces (see Figure 8).&lt;br /&gt;&lt;br /&gt;Figure 8: Assigned roles "place billboard" window. This user has a zoomed-in view of the billboard frame and can move pieces.&lt;br /&gt;&lt;br /&gt;Complications arose with assigned roles because neither user could complete the simulation goal independently. The Place Billboard subtask had a view that represented a small portion of the billboard frame (approximately 1/4 of a billboard piece), and this view could be very disorienting. The View Placement task had a good view of the frame, but did not allow the user to move billboard pieces.
Consequently, both users depended on each other to complete the billboard assembly.&lt;br /&gt;&lt;br /&gt;Once the billboard had been assembled in either the control or the assigned roles version of the program, the team received a score based on four factors: choice of billboard site, properly assembled billboard, properly centered billboard, and time to completion. A brief discussion of the score with the moderator followed, after which the tape recorder was turned off.&lt;br /&gt;&lt;br /&gt;The second part of the session involved a classic paper-and-pencil collaborative exercise called Lost at Sea. Participants read a brief scenario in which they imagined themselves on a sinking ship. Because the ship might sink at any moment, they had to rank 15 items in the order in which they would take them. After the task was completed, the moderator discussed the US Merchant Marine’s ranking of the same items.&lt;br /&gt;&lt;br /&gt;The final activity of the session was a survey covering three areas: subjective feelings about CollabBillboard, subjective feelings about collaboration during the session, and personal information. When the survey was completed, the moderator debriefed the participants.&lt;br /&gt;&lt;br /&gt;3.4 Evaluation, Results, and Analysis of Team Performance&lt;br /&gt;&lt;br /&gt;Team performance was measured differently at each stage of the session. For the CollabBillboard stage, five team measurements were used: choice of billboard site, properly assembled billboard, properly centered billboard, time to completion, and conversation as a percentage of task completion time.&lt;br /&gt;&lt;br /&gt;For the Lost at Sea stage, 17 team measurements were made.
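The Lost at Sea score described in the following sentences is simple arithmetic: one absolute rank difference per item, followed by the cumulative sum of those differences. The Python sketch below computes it for a hypothetical three-item excerpt. The function name, the item names, and the rankings are placeholders, not the study's data or the solution key. A full score would cover all 15 items.

```python
def lost_at_sea_score(correct, team):
    """Return per-item absolute rank differences and their cumulative sum.

    correct and team map each item name to its rank (1 = take first).
    A lower total means the team's ranking is closer to the key.
    """
    deltas = {item: abs(correct[item] - team[item]) for item in correct}
    return deltas, sum(deltas.values())


# Hypothetical three-item excerpt; the real exercise ranks 15 items.
key = {"item A": 1, "item B": 2, "item C": 3}
team = {"item A": 3, "item B": 2, "item C": 1}
deltas, total = lost_at_sea_score(key, team)
print(deltas)  # {'item A': 2, 'item B': 0, 'item C': 2}
print(total)   # 4
```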
The first 15 were the absolute values of the difference between the correct ranking for each item and the team’s ranking of that item. Next was the cumulative sum of these deltas. Finally, the time to complete the stage was measured.&lt;br /&gt;&lt;br /&gt;For the exit survey stage, 31 questions were asked to subjectively assess CollabBillboard and collaborative experiences during the session. Most of these questions used a rating scale from one to five, with lower numbers representing a more positive feeling about the question and higher numbers a more negative one. A “no opinion” option was available for each question.&lt;br /&gt;&lt;br /&gt;The complete details of the results and analysis of team performance are available in [13]. The results and analysis of our study support the hypothesis that assigned roles can improve collaboration both during the simulation and in subsequent group activities. Although the assigned roles group took longer to complete the simulation, they produced higher-quality results, indicating more effective collaboration. Conversation, another measure of collaboration, occurred during 85% of the assembly task for the assigned roles group and during only 44% of the assembly task for the control group. On the second collaborative activity, the assigned roles group completed the work in less time and with superior results. In every instance in which the exit survey showed statistically valid mean differences, the responses about collaboration were more positive in the assigned roles group.&lt;br /&gt;&lt;br /&gt;3.5 Lessons Learned from the Development of CollabBillboard&lt;br /&gt;&lt;br /&gt;The development of CollabBillboard took longer than we had anticipated.
Our original schedule called for three months of application development, but in actuality eight months were needed to complete the system. Reflecting on the experience taught us a number of lessons. The lack of development tools, in particular a VC++ groupware toolkit, contributed to the delay. Originally, we intended to build only the assigned user roles version of CollabBillboard, but a second, control version was necessary to evaluate the system.&lt;br /&gt;&lt;br /&gt;However, the majority of our time was spent developing, testing, and reworking the human-computer and human-human interfaces for the application. These interfaces account for a sizeable portion of the elements that make up a CSCW application, and testing them was a continual problem. Finding subjects to help test the application was also a challenge. It was often necessary to comb the halls for volunteers, and as the months went on, they became increasingly reluctant. The next several paragraphs present additional problems that we uncovered while developing CollabBillboard and that we believe could have been detected earlier and resolved more efficiently with a multiuser testing environment.&lt;br /&gt;&lt;br /&gt;Usability testing examines the program's human-factors issues [14]. General application issues include: Is the application appropriate to the user's background and experience? Are outputs meaningful and non-offensive? Are error diagnostics meaningful? Are the interfaces consistent throughout the application? Are there too many options? Is the system easy to use? There is no formula for constructing a CSCW application, because it is not always clear how some of the issues discussed in Chapter 1 should be addressed.
As with other GUI-intensive applications, the correct implementation can require many iterations of a prototype followed by usability testing. Other issues requiring iterative usability testing include user interaction coordination, user awareness, undo/redo, locking policy, and session management.&lt;br /&gt;&lt;br /&gt;In our implementation of CollabBillboard, we found that tight coupling of telepointers was visually distracting when users tried to select a site for the advertising billboard. After several iterations of the program, we implemented a looser coupling in which the local user was informed only when the remote user made a site selection [13].&lt;br /&gt;&lt;br /&gt;We ran into several shared-workspace synchronization problems because local user actions interfered with the processing of remote user actions. For example, in an early prototype of the system, a user could rotate the billboard picture. In tests with live users, it was relatively easy for them to create a scenario in which their pictures were rotationally unsynchronized. A later problem that gave us considerable difficulty was correctly reflecting the positions of billboard pieces moved by the remote user; it took about a week of test trials with live users to find and debug the error. A similar problem occurred with enforcing boundary conditions on pieces moved by remote users.&lt;br /&gt;&lt;br /&gt;Stress testing subjects the program to heavy loads or stresses. A stress test differs from a load test in that it focuses on data volume over time rather than on data volume alone [14]. Synchronous CSCW applications are particularly susceptible to stress problems because of their interactivity requirements. Event processing on both the network and the user machines is a common cause of interactivity loss.
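A stress test of this kind can be sketched as a deterministic simulation. In the hypothetical Python model below, an event loop handles one event per tick and, as a deliberately naive policy, always serves queued remote events before local ones. The test floods the loop with remote move events and counts how many ticks a single local mouse event waits. The function name, event names, tick model, and scheduling policies are all assumptions for illustration, not CollabBillboard code.

```python
from collections import deque


def local_event_wait(remote_flood, remote_first=True):
    """Count the ticks a single local event waits behind a burst of remote events.

    The hypothetical event loop handles one event per tick. With
    remote_first=True it always drains queued remote events before local
    ones, the naive policy this stress test is meant to expose.
    """
    remote = deque(["remote_move"] * remote_flood)  # flood of remote piece moves
    local = deque(["local_click"])                  # one local mouse event
    tick = 0
    while remote or local:
        if remote_first:
            source = remote if remote else local
        else:
            source = local if local else remote
        event = source.popleft()
        if event == "local_click":
            return tick  # number of events handled before the local one
        tick += 1
    raise RuntimeError("local event was never processed")


# Stress the loop with growing floods and compare the two scheduling policies.
for flood in (10, 100, 1000):
    print(flood, local_event_wait(flood), local_event_wait(flood, remote_first=False))
```

Under the naive policy, the local wait grows in step with the size of the flood. Serving local events first keeps the wait at zero in this model. A real stress test would drive the actual application with timed network traffic rather than a simulated queue.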
In CollabBillboard, for example, the control version of the simulation locked the local user out of local mouse events when a remote user flooded the system with billboard piece move events. Several days of investigation uncovered a flaw in the Windows 95 OS design that gave network and DirectX graphics events priority over local mouse events. To work around this behavior, we loosened the coupling by creating a temporal buffer that accumulated remote user draw events until a timer expired.&lt;br /&gt;&lt;br /&gt;Compatibility/conversion testing identifies problems between the new software and preexisting programs and data [14]. Conversion issues revolve around the ability of the new software to support persistent storage data formats from earlier versions or other programs. Conversion may also require the new software to output data in a format readable by preexisting software. The distributed nature of CSCW applications makes them particularly susceptible to compatibility problems when different machines run different versions of the executable. CollabBillboard suffered from several compatibility problems. Version 1.0 of CollabBillboard was made publicly available on the web in September 1997, and a second version in April 1998. The event data generated by these versions are incompatible because of a change from reporting relative coordinates to reporting absolute coordinates on the billboard frame. Version 2.0 of CollabBillboard provides two separate applications: user roles and control. Since the communication protocols for both forms are syntactically identical, it is possible to connect a client from one application to a server from the other.
This combination results in an unstable environment that causes the application to crash when the users begin the billboard assembly task.&lt;br /&gt;&lt;br /&gt;Recovery testing exercises the software's ability to handle programming, hardware, and data errors [14]. To test for programming errors, problems can be injected into the code (e.g. hard-coding an assertion that always fails). Simulation is a common technique for testing hardware errors (e.g. returning a network message with an incorrect number of bytes). Data errors can be created on purpose to analyze the system's reaction (e.g. a user types in "-1" as the number of participants in a CSCW session). In addition to these general kinds of recovery testing, CSCW applications should also test the effects of unpredictable or hostile remote user actions. Early testing of CollabBillboard uncovered a problem when one user quit the session while the other user remained. The remaining user could continue to use the application for several minutes until the system hung. The problem turned out to be that the network messaging API buffered messages for the departed remote user, and when the buffer overflowed, the system froze.&lt;br /&gt;&lt;br /&gt;4 Survey of Prior Work in Testing Systems&lt;br /&gt;&lt;br /&gt;As discussed in the previous chapter, our preliminary experiment with CollabBillboard gave us first-hand experience developing a synchronous CSCW application. One of the greatest difficulties we encountered was testing the software. Because it was a multiuser synchronous system, we needed several physical users exercising the application simultaneously.
Because it was a GUI application with human-human and human-computer interactions, we went through continual iterations to get the interface right. Most of the people we asked for testing assistance were willing to help a few times, but we began to try their patience around the fourth or fifth system build.&lt;br /&gt;&lt;br /&gt;This chapter presents a survey of the state of the art in testing. The first goal of the survey was to uncover the major contributions made by academia and industry to software testing. The second goal was to understand the shortcomings of current testing systems that prevent CSCW developers from effectively testing an application. The chapter is organized into four main sections. Section 4.1 lists the important goals of testing. Section 4.2 presents the research community's contributions to testing, organized by software life-cycle phase. Section 4.3 discusses academic contributions to GUI-based testing. Finally, Section 4.4 analyzes three commercial testing systems.&lt;br /&gt;&lt;br /&gt;4.1 Goals of Testing&lt;br /&gt;&lt;br /&gt;Testing during the software life cycle is a process by which the behavioral properties of the software are verified. These properties are correctness, utility, reliability, robustness, and performance.&lt;br /&gt;&lt;br /&gt;A program behaves correctly if it "satisfies its output specifications independent of its use of computing resources when operated under permitted conditions" [10]. Correctness is neither a necessary nor a sufficient condition for an acceptable program. Correctness is not necessary because some kinds of errors can be tolerated. For example, in a graphical editor, a "drag graphical object" command might cause artifacts to appear on the drawing surface along an object's path.
This kind of behavior might be considered a bug, but it is acceptable if the user is provided with some form of drawing surface refresh command that removes the artifacts. Correctness is also not a sufficient condition for an acceptable program: a program may satisfy its specifications, but the specifications may be incorrect.&lt;br /&gt;&lt;br /&gt;The utility of a program is determined by the extent to which it meets user needs. Utility testing addresses questions such as ease of use and cost-effectiveness. Typically, a program is tested for utility in a friendly environment with only valid input. Utility is extremely important, because if the product does not perform useful functions, then there is no point in further testing. Work done with Rensselaer's DCR illustrates this importance. A great deal of effort was expended developing a floor control policy for shared use of the system's public workstation. The policy was implemented in software as a FIFO queue. Meeting participants who wanted to take control of the public workstation had to make a request, which was added to the queue behind other requests. When the participant reached the head of the request queue, s/he was allowed to control the public workstation. Although the system was straightforward from a programming standpoint, analysis showed that users tended to ignore the floor control policy, opting instead for a simple control interrupt capability added later.&lt;br /&gt;&lt;br /&gt;Reliability refers to a program's mean time to failure. Ideally, the program and its supporting infrastructure should never fail, but the cost of verifying this level of reliability can be prohibitive. One area where high reliability is justified is life-critical applications such as aviation software.
The Federal Aviation Administration refuses to allow commercial off-the-shelf (COTS) software in any portion of the nation's aviation system, relying instead on thoroughly tested, but expensive, customized software. COTS software, like Windows 95, is notoriously unreliable, and while a simple reboot for a system hang is tolerated by most PC users, it could spell disaster for a busy air traffic control system [60]. For less critical applications, a return-on-investment analysis can determine how much testing is needed to reach a level of reliability that keeps customers satisfied.&lt;br /&gt;&lt;br /&gt;A program is considered robust if it is able to handle different, possibly hostile, operating conditions, input, and users. The application should tolerate a variety of operating environments in its supporting infrastructure, which includes the hardware and software associated with the network, CPU, disk, and graphics device. A robust CSCW application, for example, should gracefully handle heavy network loads when sending and receiving events between users. Handling invalid input is also important: if a user types "-1" for the number of participants in a collaborative session, the system should prompt for a correction. Hostile user actions should also be anticipated. If the user hosting a CSCW session exits before the rest of the team, the application should ensure either that the session artifacts are saved or that the session continues with a different host.&lt;br /&gt;&lt;br /&gt;Performance is another important criterion that must be verified before the CSCW application is released. Interactive feedback to a user action must arrive within approximately 16 milliseconds to avoid a feeling of sluggishness [11].
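A check against the 16 millisecond guideline can be automated in a test. The Python sketch below times an event handler using an injectable clock, so the test can run deterministically. The within_budget function, the FakeClock class, and the budget constant are hypothetical. A real harness would keep the default time.perf_counter clock and measure many repetitions rather than a single call.

```python
import time

FEEDBACK_BUDGET_S = 0.016  # roughly 16 ms, the guideline cited in the text


def within_budget(handler, event, clock=time.perf_counter, budget=FEEDBACK_BUDGET_S):
    """Run handler(event) once; return (met_budget, elapsed_seconds)."""
    start = clock()
    handler(event)
    elapsed = clock() - start
    return budget >= elapsed, elapsed


class FakeClock:
    """Deterministic clock for tests: each call advances time by a fixed step."""

    def __init__(self, step):
        self.now = 0.0
        self.step = step

    def __call__(self):
        value = self.now
        self.now += self.step
        return value


ok, elapsed = within_budget(lambda e: None, "click", clock=FakeClock(0.010))
print(ok, elapsed)  # True 0.01
slow, _ = within_budget(lambda e: None, "click", clock=FakeClock(0.050))
print(slow)         # False
```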
An additional rule of thumb is that local user performance should always take priority over processing remote user actions. This means the developer must be careful that tight coupling does not impede local activity. In the CollabBillboard application, movement of billboard pieces during the assembly task was tightly coupled. When live user testing began, we discovered that piece updates from the remote user created a feedback cycle that excluded local user actions until the stream of remote updates ended. Piece movement coupling had to be loosened to allow local user actions to be processed. The choice of centralized versus replicated architecture has a significant impact on performance. A replicated architecture will usually perform better, while a centralized architecture will have less complicated synchronization and locking mechanisms. One way to verify that the system performs acceptably for its chosen architecture is to observe how it behaves under a scalability test. Network performance is critical to the overall performance of the CSCW application. The application may consume too much bandwidth when sending messages between machines, which happens when messages are sent too frequently, contain too much information, or both. Acceptable bandwidth use can be verified by exercising the application over the network. The application also needs to be tested under various network conditions, including heavy traffic from sources outside the application, increased traffic from scaling the number of users, and message delay over a wide area network.&lt;br /&gt;&lt;br /&gt;4.2 Research Testing Systems&lt;br /&gt;&lt;br /&gt;This section discusses the contributions that the research community has made to the state of the art in testing.
It is organized around a modified version of the phases of the software life cycle: requirements, specification, design, implementation, integration, and maintenance.&lt;br /&gt;&lt;br /&gt;The software life-cycle model describes the process of creating and maintaining a software application. Competing life-cycle models have been developed over the past several decades. These models were created to combat the inefficient process of “build and fix,” in which developers built some software components, showed the results to the client, and fixed the software based on client feedback. Most popular models in the literature evolved from the Waterfall Model [61]. In the waterfall model, software production is broken down into six stages: requirements, specification, design, implementation, integration, and maintenance. Figure 1 depicts the rapid prototyping life-cycle model. The goal of this model is to quickly turn around versions of the software for client evaluation. Less inter-cycle feedback reduces the amount of time it takes to produce a prototype. Rapid prototyping is particularly useful for user interface development, because the client can be continually involved in the process of creating a friendly, useful user interface. Feedback from commercial software development has led to the creation of the incremental model, which develops a product as a series of progressive builds, with each build adding a new set of functions to the application. Each build creates a completely runnable system with increasingly powerful capabilities [10].&lt;br /&gt;&lt;br /&gt;4.2.1 Requirements&lt;br /&gt;&lt;br /&gt;The purpose of testing in the requirements phase is to determine whether the software team correctly understands the user's requirements.
Building a prototype and discussing the&lt;br /&gt;program with potential users is an effective way of accomplishing this goal [10].&lt;br /&gt;4.2.2 Specification&lt;br /&gt;The purpose of testing in the specification phase is to determine if the software team has&lt;br /&gt;correctly translated the functions required by the user into a software specification. The&lt;br /&gt;most common forms of specification testing are walkthroughs and inspections. A&lt;br /&gt;walkthrough consists of periodic meetings in which a small team (led by the author) reviews&lt;br /&gt;the specification document. The team size rarely exceeds five people, and the meetings&lt;br /&gt;last less than 2 hours. During a meeting, the goal is to discover, but not correct, problems.&lt;br /&gt;The author can correct the problems later. Individuals prepare for the meeting by&lt;br /&gt;reviewing the specification and requirements documents [10].&lt;br /&gt;An inspection is a more highly structured process consisting of five formalized steps:&lt;br /&gt;overview, preparation, inspection, rework, and follow-up. The Institute of&lt;br /&gt;Electrical and Electronics Engineers (IEEE) has published an international standard for the inspection&lt;br /&gt;process [62]. The overview step is a preliminary meeting where members of the inspection&lt;br /&gt;team are assigned roles and given specific tasks to prepare for the inspection. In the&lt;br /&gt;preparation step, team members examine the specification from the perspective of their&lt;br /&gt;assigned roles and prepare checklists for verification during the group inspection. A series&lt;br /&gt;of inspection meetings is then held, with the team measuring the specification against the&lt;br /&gt;checklist. Again, problems are only identified, not solved, during these sessions. The&lt;br /&gt;rework step corrects problems discovered in the specification. 
Follow-up ensures that the&lt;br /&gt;rework corrected the problems identified and didn't introduce any new ones. Although&lt;br /&gt;no formal studies have been done, inspections are thought to take more time but to be&lt;br /&gt;more effective than walkthroughs. IBM's cleanroom verification technique uses inspection&lt;br /&gt;as the main verification tool throughout the software life cycle [63].&lt;br /&gt;Specifications can be written in a variety of formats, from informal prose to a formal&lt;br /&gt;algebraic description. The testing research community is interested in formal specifications&lt;br /&gt;because of the potential for early, automated debugging, testing, and analysis [64]. Recall&lt;br /&gt;that the sooner a problem can be found in the development cycle, the less expensive it is&lt;br /&gt;to fix (see Section 4). A formal specification can also be useful in later stages of the&lt;br /&gt;software life cycle, such as design phase mathematical proofs of correctness (Section 4.2.3)&lt;br /&gt;and input selection/output analysis for functional testing (Section 4.2.4). There are two&lt;br /&gt;kinds of formal specification: process-based and model-based.&lt;br /&gt;A process-based specification views the program as being composed of subprograms. A&lt;br /&gt;critical part of the specification is to formally specify the interfaces between subprograms&lt;br /&gt;and abstract data types (ADT). The specification is developed using a top-down process&lt;br /&gt;where successive revisions of the specification result in smaller subprograms with greater&lt;br /&gt;interface and ADT detail. The finest level of detail is a formal algebraic notation.&lt;br /&gt;Reusable generic specifications are one advantage of this technique. For example, instead&lt;br /&gt;of describing an integer-specific sort routine, a generic sort routine could be specified.&lt;br /&gt;This routine is written at a high enough level that it could sort any data type (e.g. 
integer,&lt;br /&gt;real, programmer-defined). When a specific kind of sorting is needed, another refinement&lt;br /&gt;of the routine is performed with the data type needed [65]. Larch [66] is an example of a&lt;br /&gt;system that supports the process-based specification technique.&lt;br /&gt;A model-based specification is a formal mathematical model of the entire software system.&lt;br /&gt;The specification not only describes the interfaces and data structures of the software&lt;br /&gt;system, but also describes state behavior in a formal way. Z [67] and the Vienna Development&lt;br /&gt;Method [68] are examples of model-based specifications. Z uses a set/relation notation&lt;br /&gt;where components that make up the software system are represented as schemas. The&lt;br /&gt;following is an example of a Z schema for the CSCW application CollabBillboard:&lt;br /&gt;MoveBillboardPiece&lt;br /&gt;∆Billboard&lt;br /&gt;owner?: OWNER&lt;br /&gt;pieceID?: PIECEID&lt;br /&gt;x?: ℕ&lt;br /&gt;y?: ℕ&lt;br /&gt;owner ∈ (Billboard.pieceList(pieceID?)).owner&lt;br /&gt;0 ≤ x? ≤ XMAX&lt;br /&gt;0 ≤ y? ≤ YMAX&lt;br /&gt;Billboard.pieceList(pieceID?).x = x?&lt;br /&gt;Billboard.pieceList(pieceID?).y = y?&lt;br /&gt;Figure 9: Z Language schema for CollabBillboard&lt;br /&gt;The MoveBillboardPiece function is responsible for updating the x, y location of a&lt;br /&gt;billboard piece on CollabBillboard's shared workspace. ∆Billboard at the beginning&lt;br /&gt;indicates that the schema will change the system state by altering Billboard. The schema&lt;br /&gt;signature describes the input variables and their data types. For example, x?: ℕ indicates&lt;br /&gt;that the input variable x? can be any natural number. Schema predicates indicate&lt;br /&gt;conditions that must hold for system state and input variables. 
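&lt;br /&gt;The schema can also be read as a guarded update. The predicates act as preconditions, and the final two lines describe the state change. The Python sketch below is illustrative only. It assumes concrete XMAX and YMAX values, introduces hypothetical Piece and Billboard classes, and reads ".owner" as a set of owners. None of these choices come from the CollabBillboard design.

```python
from dataclasses import dataclass, field

# Assumed workspace size; the schema names XMAX and YMAX but gives no values.
XMAX = 800
YMAX = 600


@dataclass
class Piece:
    owners: set = field(default_factory=set)  # assumed reading of ".owner" as a set
    x: int = 0
    y: int = 0


@dataclass
class Billboard:
    # pieceList: total mapping from piece IDs (domain) to piece structures (range)
    piece_list: dict = field(default_factory=dict)


def move_billboard_piece(board, owner, piece_id, x, y):
    """Apply MoveBillboardPiece only if every schema predicate holds."""
    piece = board.piece_list[piece_id]                 # pieceList(pieceID?)
    if owner not in piece.owners:                      # owner ∈ ... .owner
        raise ValueError("owner does not own this piece")
    for value in (x, y):                               # x?, y? : ℕ
        if not (isinstance(value, int) and value >= 0):
            raise ValueError("coordinates must be natural numbers")
    if x > XMAX or y > YMAX:                           # x? ≤ XMAX, y? ≤ YMAX
        raise ValueError("coordinates outside the workspace")
    # ∆Billboard: the state change records the piece's new location.
    piece.x, piece.y = x, y
```

Raising an exception when a predicate fails is only one design choice. The schema itself says nothing about error handling, so an implementation could equally ignore the request.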
For example, x? must be a&lt;br /&gt;natural number no greater than the width of the workspace (XMAX) for&lt;br /&gt;the billboard piece to be displayed properly. Schema predicates can also contain set or&lt;br /&gt;relation operations. The expression pieceList(pieceID?) looks up a piece in pieceList, which&lt;br /&gt;represents a total mapping of piece IDs (domain) to actual piece structures (range). It returns the piece&lt;br /&gt;whose ID is given by the input variable pieceID?.&lt;br /&gt;The Test Template Framework [64] uses a Z specification to create test cases for&lt;br /&gt;implementation testing. An analysis of the schema signature is done to create an input&lt;br /&gt;space for each variable. A variable's input space is refined into a valid input space through&lt;br /&gt;schema predicate constraints. The valid input space is then grouped into categories using&lt;br /&gt;techniques based on the category partition testing method [69]. The result of this&lt;br /&gt;processing is a Z language specification for a set of generic test cases. Actual test cases are&lt;br /&gt;instantiated by executing the function derived from the specification with data that satisfies&lt;br /&gt;the Z specification for the input variables. The test case results are analyzed by&lt;br /&gt;comparing output against schema signature and predicate constraints. Other specification-based&lt;br /&gt;testing work includes Hayes's [70] techniques for constructing input/output&lt;br /&gt;constraints from a Z schema specification, and Stanford's Anna [71] system for runtime&lt;br /&gt;checking of Ada programs using specification-derived constraints.&lt;br /&gt;In addition to general specification systems like Larch and Z, specialized systems have&lt;br /&gt;been developed for concurrent and real-time programming. 
Specialized systems providing&lt;br /&gt;verification support include Concurrent Temporal Logic (CTL) [72], an SCR specification&lt;br /&gt;system for event-driven applications; Graphical Interval Logic (GIL) [73], a visual temporal&lt;br /&gt;specification system; and the constrained expression toolkit [74] for real-time programs.&lt;br /&gt;GIL allows the temporal properties of a concurrent system to be specified using an&lt;br /&gt;annotated graphical timing diagram. GIL developers claim that the graphical notation of&lt;br /&gt;timing diagrams is superior to a temporal logic text specification because visualization&lt;br /&gt;increases the understanding of relationships between the temporal properties of the&lt;br /&gt;system. The semantics underlying GIL allow the diagrams to be converted into&lt;br /&gt;propositional temporal logic, which can then be run through a proof checker. The proof&lt;br /&gt;checker is not automatic; it must be told which diagrams should be included in a&lt;br /&gt;particular proof. Figures 10 and 11 depict a GIL specification for a portion&lt;br /&gt;of the CollabBillboard application.&lt;br /&gt;Figure 10: GIL specification for queueRemotePieceUpdate$n, 0 ≤ n ≤ (Number of Billboard Pieces - 1). A remote update event arrives for a piece of the billboard, and the event is queued until a timer expires.&lt;br /&gt;Queuing a remote update keeps remote piece movement from interfering with local user&lt;br /&gt;performance. The remoteUpdate$n boolean is set to TRUE when a remote update event&lt;br /&gt;arrives from another user for Billboard piece n. timerExpired is set to TRUE every 200&lt;br /&gt;milliseconds. 
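&lt;br /&gt;The queue-then-redraw behaviour described here can be sketched as a hypothetical single-threaded Python model. It is not CollabBillboard code. The RemoteUpdateQueue class, the explicit now_ms clock argument, and the choice to keep only the newest position per piece are all assumptions made for illustration.

```python
class RemoteUpdateQueue:
    """Queue remote piece moves and release them only when the redraw timer expires."""

    def __init__(self, period_ms=200):
        self.period_ms = period_ms   # timerExpired becomes TRUE every period_ms
        self.pending = {}            # piece n -> newest remote (x, y); key present = remoteUpdate$n
        self.last_flush_ms = 0

    def on_remote_update(self, piece, x, y):
        """queueRemotePieceUpdate$n: hold the move instead of drawing it now."""
        self.pending[piece] = (x, y)  # a later move for the same piece replaces an earlier one

    def tick(self, now_ms):
        """drawRemotePieceUpdate$n: return the queued moves once the timer has expired."""
        if now_ms - self.last_flush_ms >= self.period_ms:
            batch, self.pending = self.pending, {}   # remoteUpdate$n reset to FALSE
            self.last_flush_ms = now_ms
            return batch
        return {}
```

Because only the most recent position per piece is kept, a burst of remote moves costs at most one redraw per timer period. This is one way to loosen the kind of coupling that caused the feedback cycle described earlier.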
The interval depicted by this timing diagram indicates that as long as remote&lt;br /&gt;piece updates are being received and the timer hasn't expired, the condition&lt;br /&gt;queueRemotePieceUpdate$n will be TRUE.&lt;br /&gt;Figure 11: GIL specification for drawRemotePieceUpdate$n, 0 ≤ n ≤ (Number of Billboard Pieces - 1). A billboard piece is redrawn if it has been queued as a remote update and the timer has expired.&lt;br /&gt;The conditions that identify this interval are that the piece has had a remoteUpdate$n&lt;br /&gt;event associated with it since the last time the timer expired. The implication arrow (→)&lt;br /&gt;indicates that if these conditions for the interval are met, then the remoteUpdate$n&lt;br /&gt;boolean will be set to FALSE for the billboard piece, and the piece will be considered&lt;br /&gt;queued for local redraw until the actual redraw occurs.&lt;br /&gt;Table 1: SCR Table for CollabBillboard&lt;br /&gt;Start Mode | In Site | Remote In Site | Button Down | Remote Button Down | End Mode&lt;br /&gt;Clear Map | F | F | F | F | Clear Map&lt;br /&gt;Clear Map | @T | - | F | F | Site Info&lt;br /&gt;Clear Map | - | @T | F | F | Remote Site Info&lt;br /&gt;Site Info | T | F | F | F | Site Info&lt;br /&gt;Site Info | @F | F | F | F | Clear Map&lt;br /&gt;Site Info | T | @T | F | F | Remote Site Info&lt;br /&gt;Site Info | T | @F | F | F | Site Info&lt;br /&gt;Site Info | T | - | @T | F | Site Selected&lt;br /&gt;Remote Site Info | F | T | F | F | Remote Site Info&lt;br /&gt;Remote Site Info | @T | - | F | F | Site Info&lt;br /&gt;Remote Site Info | F | @F | F | F | Clear Map&lt;br /&gt;Remote Site Info | F | T | F | @T | Remote Site Selected&lt;br /&gt;Site Selected | - | - | F | F | Site Selected&lt;br /&gt;Site Selected | F | - | @T | F | Clear Map&lt;br /&gt;Site Selected | - | F | F | @T | Clear Map&lt;br /&gt;Site Selected | - | T | F | @T | Remote Site Selected&lt;br /&gt;Remote Site Selected | - | - | F | F | Remote Site Selected&lt;br /&gt;Remote Site Selected | F | - | @T | F | Clear Map&lt;br /&gt;Remote Site Selected | - | F | F | @T | Clear Map&lt;br /&gt;Remote Site Selected | T | - | @T |  | Site Selected&lt;br /&gt;The Software Cost Reduction (SCR) method is a formal method for specifying the&lt;br /&gt;requirements of real-time systems. The SCR method has been used successfully in a&lt;br /&gt;variety of application domains, including aviation, telephony, and nuclear power. System&lt;br /&gt;behavior is modeled as a relationship between two types of variables: monitored variables,&lt;br /&gt;which denote environmental quantities monitored by the system, and controlled variables,&lt;br /&gt;which denote environmental quantities the system controls. Conditions, events, and tables&lt;br /&gt;provide details on how monitored variables affect controlled variables. A condition is a&lt;br /&gt;predicate defined on one or more variables in the specification. A change in the value of any&lt;br /&gt;variable is called an event. An SCR table specifies a variable's value based on&lt;br /&gt;conditions and events [75].&lt;br /&gt;Table 1 is a mode transition table for the CollabBillboard application's shared map&lt;br /&gt;task. The purpose of this type of SCR table is to show how the system state changes&lt;br /&gt;in response to new input conditions. The left-most and right-most columns represent the&lt;br /&gt;current mode and new mode, respectively. Clear Map is a mode where the cursor is not on&lt;br /&gt;any billboard site and no site has been selected. Site Info and Remote Site Info are modes&lt;br /&gt;where the cursor is over one of the billboard sites and an information box appears&lt;br /&gt;describing the site. Site Selected and Remote Site Selected are modes where one of the&lt;br /&gt;users has selected a site for billboard placement. In these modes, the selected site is&lt;br /&gt;highlighted with a special yellow (local) or gray (remote) box. 
The In Site, Remote In&lt;br /&gt;Site, Button Down, and Remote Button Down columns represent condition variables that are&lt;br /&gt;monitored by the system. An environmental condition can have four possible values: T&lt;br /&gt;(currently TRUE), @T (just turned TRUE), F (currently FALSE), and @F (just turned FALSE).&lt;br /&gt;A dash (-) marks a condition whose value does not matter for that transition.&lt;br /&gt;Several verification tools have been developed for SCR specifications. The SCR* system&lt;br /&gt;[75] provides a consistency checker to detect syntax errors, incomplete variable definitions,&lt;br /&gt;or circular variable definitions. The CTL system [72] converts an SCR specification into a&lt;br /&gt;finite state machine and a set of temporal logic propositions. The converted specification&lt;br /&gt;is then nondeterministically executed. The execution proceeds in discrete time units,&lt;br /&gt;which represent single state transitions. Since a transition is taken every time unit, at&lt;br /&gt;least one of the current state's transitions must be enabled at all times. As the machine is&lt;br /&gt;executing, the temporal properties that must hold are checked.&lt;br /&gt;The GIL and CTL systems represent two competing verification techniques for real-time/concurrent&lt;br /&gt;specification systems: theorem proving and state-based exploration. The problem&lt;br /&gt;with theorem proving is that it is difficult to automate. The GIL system, for example,&lt;br /&gt;requires the user to indicate by hand the specifications that will be used to verify a&lt;br /&gt;particular constraint. State-based systems like CTL suffer from an exponential explosion&lt;br /&gt;in the number of states that must be explored to completely verify a system. The&lt;br /&gt;constrained expression toolkit [74], [76] provides tractable automated verification using&lt;br /&gt;integer programming.&lt;br /&gt;The constrained expression toolkit is used for bounding the time between events in a&lt;br /&gt;concurrent real-time system. 
It converts an Ada-like specification into a set of finite&lt;br /&gt;automata, one for each process in the system. The alphabet of each automaton consists of&lt;br /&gt;symbols for computation within the process and for synchronous communication with&lt;br /&gt;other processes. A set of transition variables is assigned, one to each automaton edge. These&lt;br /&gt;variables count the number of times the edge is traversed during process execution. Start&lt;br /&gt;and halt variables are assigned to each node with an exiting start-event edge or an entering&lt;br /&gt;halt-event edge, respectively. If the process does not contain a start-event edge, then all nodes are&lt;br /&gt;labeled with a start variable. The same technique is used for processes without a halt-event&lt;br /&gt;edge. Equations are then derived by treating the automata as a network flow,&lt;br /&gt;where the number of times a state is entered equals the number of times it is exited.&lt;br /&gt;Additional equations are added by forcing each automaton to start and halt at exactly one&lt;br /&gt;place. A final set of equations can be added by recognizing that transition variables&lt;br /&gt;representing communication between processes must sum to the same value. Once the&lt;br /&gt;equations have been determined, an integer-programming objective is established, for&lt;br /&gt;example:&lt;br /&gt;$\sum_i t_i x_i$&lt;br /&gt;where $t_i$ is the time it takes to move along the edge labeled with transition variable $x_i$, and $x_i$&lt;br /&gt;is the number of times the edge has been traversed. The bounds can be determined by&lt;br /&gt;using integer-programming techniques to solve for the minimum and maximum values of&lt;br /&gt;the objective. Although integer programming is NP-complete in general, there are special&lt;br /&gt;cases that reduce to polynomial-time linear programming. 
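&lt;br /&gt;A toy Python sketch can make the flow equations and the objective concrete. The three-edge automaton, its edge times, and the cap MAX_COUNT on edge traversals are all hypothetical. Brute-force enumeration stands in for the integer-programming solver the toolkit would use. Because there is only one process, no inter-process communication equations appear.

```python
from itertools import product

# Hypothetical single-process automaton: (source node, target node, time per traversal).
EDGES = [
    ("s", "a", 2),   # x0: start node -> loop head
    ("a", "a", 5),   # x1: loop body
    ("a", "h", 1),   # x2: loop head -> halt node
]
START, HALT = "s", "h"
MAX_COUNT = 3  # assumed cap on traversals per edge so the search space is finite


def satisfies_flow(counts):
    """Flow equations: each node is entered as often as it is exited, except that
    execution leaves START once more than it enters and enters HALT once more than it leaves."""
    nodes = {node for src, dst, _ in EDGES for node in (src, dst)}
    for node in nodes:
        inflow = sum(c for (src, dst, _), c in zip(EDGES, counts) if dst == node)
        outflow = sum(c for (src, dst, _), c in zip(EDGES, counts) if src == node)
        expected = 1 if node == START else -1 if node == HALT else 0
        if outflow - inflow != expected:
            return False
    return True


def time_bounds():
    """Minimum and maximum of sum(t_i * x_i) over all feasible traversal counts."""
    totals = [
        sum(t * c for (_, _, t), c in zip(EDGES, counts))
        for counts in product(range(MAX_COUNT + 1), repeat=len(EDGES))
        if satisfies_flow(counts)
    ]
    return min(totals), max(totals)
```

For this automaton, the feasible solutions are x0 = 1, x2 = 1, and any loop count x1 up to the cap. The minimum bound therefore comes from running the loop zero times, and the maximum from running it MAX_COUNT times.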
The types of equations&lt;br /&gt;generated by the constrained expression toolkit generally reduce to one of these special&lt;br /&gt;cases.&lt;br /&gt;4.2.3 Design&lt;br /&gt;The essential difference between specification and design is that the specification states&lt;br /&gt;what the program is supposed to do, while the design shows how the program will do it.&lt;br /&gt;The purpose of testing in the design phase is to ensure that the design correctly implements the&lt;br /&gt;specification. Informal techniques like walkthroughs and inspections are commonly used&lt;br /&gt;in design verification (see Section 4.2.2). Formal techniques, such as proofs of correctness,&lt;br /&gt;are also used.&lt;br /&gt;One proof of correctness technique uses mathematical induction on loop invariants. The&lt;br /&gt;idea is to identify a set of variables, and properties of those variables, that do not change&lt;br /&gt;from one loop iteration to the next. A proof by induction is then performed on the variables&lt;br /&gt;and their properties [77]. An alternative proof technique is Hoare's axiomatic method,&lt;br /&gt;which uses deduction on axioms derived from program statements [78], [79]. Formal&lt;br /&gt;proofs of correctness have not found widespread acceptance in the software community as&lt;br /&gt;a verification tool. This is due to a number of factors, including the mathematical skill&lt;br /&gt;needed to manipulate predicate calculus and temporal logic, the immense effort needed to&lt;br /&gt;prove even the smallest of designs, and the difficulty of automation, since human&lt;br /&gt;intervention is needed to determine things like loop invariants. Despite these&lt;br /&gt;drawbacks, these formal techniques have been successful in a number of domains,&lt;br /&gt;particularly where the cost of verification is negligible compared to the cost of program&lt;br /&gt;failure (e.g. NASA space missions) [10]. 
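&lt;br /&gt;The loop-invariant idea can be illustrated with a hypothetical Python function that sums a list. It checks the invariant total == sum(values[:i]) at run time. These assertions only test the invariant on particular inputs. A proof would instead show by induction that the invariant holds before the first iteration and is preserved by every iteration, and then combine it with the loop's exit condition.

```python
def array_sum(values):
    """Sum a list while checking the invariant total == sum(values[:i]) at run time."""
    total, i = 0, 0
    assert total == sum(values[:i])        # base case: holds before the first iteration
    while i != len(values):
        total += values[i]
        i += 1
        assert total == sum(values[:i])    # inductive step: each iteration preserves it
    # On exit i == len(values), so the invariant gives total == sum(values).
    return total
```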
However, even if the cost of correctness proving&lt;br /&gt;could be ignored, it is not a panacea for software verification because "we can never be&lt;br /&gt;sure that the specification is correct" and "we can never be certain that the verification&lt;br /&gt;system is correct" [80].&lt;br /&gt;4.2.4 Implementation&lt;br /&gt;In the implementation phase, the actual code has been written and the process of verifying&lt;br /&gt;the physical program commences. Verification during the implementation phase is also&lt;br /&gt;known as unit, functional, or module level testing. Two kinds of testing are performed&lt;br /&gt;during this phase: black box and white box.&lt;br /&gt;Black box testing ignores the internals of a routine and uses the specification to determine&lt;br /&gt;the expected output given a specific input. Testing is performed by executing the routine&lt;br /&gt;with input and analyzing the output for correctness. This form of testing is attractive&lt;br /&gt;because the tester does not have to be concerned with the internals of the routine, which&lt;br /&gt;allows someone other than the routine author to perform the test. The problem with&lt;br /&gt;black box testing is that in order to thoroughly test a routine, all possible inputs must be&lt;br /&gt;tried. This results in a combinatorial explosion that makes the test of even a simple&lt;br /&gt;routine computationally infeasible [14]. Consider the following routine from&lt;br /&gt;CollabBillboard that returns the distance between a pair of two-dimensional points:&lt;br /&gt;float distance(int x1, int y1, int x2, int y2)&lt;br /&gt;In order to thoroughly black box test this routine, the function must be executed and&lt;br /&gt;verified for every possible combination of x1, y1, x2, and y2 values. 
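&lt;br /&gt;A black box test of distance needs only the specification, not the code. The Python sketch below uses math.hypot as a stand-in body, since the CollabBillboard implementation is not shown. It checks a handful of hand-picked cases. The case list is illustrative and far from exhaustive.

```python
import math


def distance(x1, y1, x2, y2):
    """Stand-in for the routine under test; the CollabBillboard body is not shown."""
    return math.hypot(x2 - x1, y2 - y1)


# Black box cases: each input paired with the output the specification demands.
CASES = [
    ((0, 0, 3, 4), 5.0),
    ((3, 4, 0, 0), 5.0),       # swapping the two points must not change the result
    ((-1, -1, -1, -1), 0.0),   # identical points
    ((0, 0, -6, 8), 10.0),
]


def run_black_box_tests(routine):
    """Return the failing cases as (inputs, expected, actual) tuples."""
    failures = []
    for args, expected in CASES:
        actual = routine(*args)
        if not math.isclose(actual, expected):
            failures.append((args, expected, actual))
    return failures
```

Passing the routine in as a parameter keeps the harness independent of any particular implementation, which fits the black box view.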
Assuming 32-bit integers, this would result in $2^{128}$&lt;br /&gt;test cases, requiring more than $10^{23}$ years to complete on an Intel Pentium 166/MMX&lt;br /&gt;machine.&lt;br /&gt;Equivalence partitioning is a method for reducing the number of black box test cases. The&lt;br /&gt;idea is to partition the input space into a set of equivalence classes where any input value in&lt;br /&gt;a class is equivalent to any other input value in the class. Equivalence partitioning&lt;br /&gt;eliminates the need for exhaustive testing because only one representative test needs to be&lt;br /&gt;performed for each equivalence class. For example, suppose a routine that calculates cos θ,&lt;br /&gt;where θ is the angle in degrees, is to be tested. By examining the behavior of the cosine&lt;br /&gt;curve between 0 and 360 degrees, a number of equivalence classes emerge (see Table 2). If&lt;br /&gt;the equivalence classes are chosen correctly, then a single test with any value from class I&lt;br /&gt;(e.g. 5 degrees) should be sufficient for testing the entire range of values from 0 to 89&lt;br /&gt;degrees. The actual input values tested can be selected by hand, or automatically by&lt;br /&gt;random sampling [81] from each equivalence class.&lt;br /&gt;Table 2: Equivalence classes for cos θ&lt;br /&gt;Equivalence Class | Cosine Behavior | Range (degrees)&lt;br /&gt;I | 1 → 0 | 0 → 89&lt;br /&gt;II | 0 → -1 | 90 → 179&lt;br /&gt;III | -1 → 0 | 180 → 269&lt;br /&gt;IV | 0 → 1 | 270 → 359&lt;br /&gt;V | 1 → 0 → -1 → 0 → 1 | Negative multiples of 360 - (0 → 359)&lt;br /&gt;VI | 1 → 0 → -1 → 0 → 1 | Positive multiples of 360 + (0 → 359)&lt;br /&gt;Despite some heuristics for performing equivalence partitioning [14], it is essentially a&lt;br /&gt;manual process. The process requires a deep understanding of both input parameters and&lt;br /&gt;the purpose of the routine to be tested. 
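&lt;br /&gt;The partitioning in Table 2 can be expressed as a classifier plus random sampling of one representative per class. In this Python sketch, the finite slices used to sample classes V and VI are assumptions, since those classes are unbounded. The cosine routine under test is left out because the text does not give it.

```python
import random

# Ranges from Table 2 (degrees). Classes V and VI are unbounded, so the finite
# slices used for sampling them here are assumptions.
CLASS_RANGES = {
    "I": (0, 89),
    "II": (90, 179),
    "III": (180, 269),
    "IV": (270, 359),
    "V": (-720, -1),
    "VI": (360, 1079),
}


def equivalence_class(theta):
    """Map a whole-degree angle to its equivalence class from Table 2."""
    if theta >= 360:
        return "VI"
    if theta >= 0:
        return ("I", "II", "III", "IV")[theta // 90]
    return "V"


def representatives(seed=0):
    """Pick one random angle from each class (random sampling within a class)."""
    rng = random.Random(seed)
    return {name: rng.randint(lo, hi) for name, (lo, hi) in CLASS_RANGES.items()}
```

Each sampled angle would then be passed to the cosine routine under test and compared with a reference value. If that routine used a lookup table, these classes might not reflect its internal behaviour.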
There is no way to guarantee that a partitioning&lt;br /&gt;scheme is correct, that is, that each value in an equivalence class will exercise the same code in the&lt;br /&gt;same manner in a routine. In the cos θ example above, if the developer decided to&lt;br /&gt;implement the function using a lookup table, then the equivalence classes in Table 2 would&lt;br /&gt;be insufficient. The technique is not foolproof, but it attempts to create a manageable&lt;br /&gt;number of test cases with maximum impact.&lt;br /&gt;Boundary value analysis is an enhancement to equivalence partitioning. The idea is that&lt;br /&gt;test cases that use input values near the boundaries of equivalence classes have greater&lt;br /&gt;impact. Input values are generated from below, on, and above the edges of an equivalence&lt;br /&gt;class. More formally [10]:&lt;br /&gt;For each range (R1, R2) of an equivalence class, five test cases should be created:&lt;br /&gt;(1) value &lt; R1&lt;br /&gt;(2) value = R1&lt;br /&gt;(3) R1 &lt; value &lt; R2&lt;br /&gt;(4) value = R2&lt;br /&gt;(5) value &gt; R2&lt;br /&gt;In the cos θ example above, the boundary conditions for class I would be the following&lt;br /&gt;angles: (-1, 0, 1, 45, 89, 90, 91).&lt;br /&gt;White box testing uses a routine's internal logic to create test cases. Test case output is&lt;br /&gt;compared against expected output given input values and the specification. The advantage&lt;br /&gt;of white box testing is precise control over the routine logic exercised by each test case.&lt;br /&gt;Unfortunately, testing every possible logic path results in a combinatorial explosion similar&lt;br /&gt;to black box testing. Consider the control flow graph in Figure 12, which shows a loop with five&lt;br /&gt;possible logic paths per iteration.&lt;br /&gt;Figure 12: Control flow graph for loop with five possible logic paths&lt;br /&gt;To thoroughly test every logic path for 20 iterations of the loop would require $5^{20} + 5^{19} + \cdots + 5^{1} \approx 10^{14}$ test cases. 
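&lt;br /&gt;The two counts quoted in this section can be checked with a few lines of Python. The only inputs are figures from the text: four 32-bit parameters for distance, five paths per iteration for 1 to 20 iterations, and the estimate of about 21 days to execute the white box cases on the Pentium 166/MMX. The script derives a test rate from that estimate and applies the same rate to the black box cases.

```python
# Black box: distance() takes four 32-bit integers, so 2**32 values per parameter.
black_box_cases = (2 ** 32) ** 4              # equals 2 ** 128

# White box: 5 paths per iteration, every loop length from 1 to 20 iterations.
white_box_paths = sum(5 ** k for k in range(1, 21))

# Rate implied by the text's estimate of about 21 days for the white box cases.
SECONDS_PER_DAY = 24 * 3600
rate = white_box_paths / (21 * SECONDS_PER_DAY)   # test cases per second

# Time for the black box cases at that same rate, in years.
years_black_box = black_box_cases / rate / (365.25 * SECONDS_PER_DAY)

print(f"white box paths: {white_box_paths:.2e}")
print(f"implied rate:    {rate:.2e} test cases per second")
print(f"black box time:  {years_black_box:.2e} years")
```

The implied rate is roughly 6.6 × 10^7 test cases per second. At that rate, the black box cases take about 1.6 × 10^23 years, which is consistent with the figure of more than 10^23 years.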
Assuming an Intel Pentium 166/MMX machine, it would take&lt;br /&gt;approximately 21 days simply to execute the test cases. This does not include time spent&lt;br /&gt;analyzing the results.&lt;br /&gt;Another problem with path coverage is that it doesn't guarantee that all states will be&lt;br /&gt;exercised in the implemented program. Different input values can cause the program to&lt;br /&gt;behave differently even if the same path is executed. Consider the statements in Figure 13.&lt;br /&gt;Setting theNumber = -1 and theNumber = 0 will cause the same path to be executed, but in&lt;br /&gt;one case the program will print out "MINUS -1" and in the other it will generate a divide-by-zero&lt;br /&gt;fault.&lt;br /&gt;Figure 13: Code fragment from CollabBillboard&lt;br /&gt;Statement coverage reduces the combinatorial explosion of test cases by ensuring every&lt;br /&gt;statement in the program is executed correctly at least once. One problem with this&lt;br /&gt;approach is that particular test data may give the illusion of statement correctness.&lt;br /&gt;Consider the following code sequence from CollabBillboard:&lt;br /&gt;if (point.x &lt; 0) point.x = 0; if (point.x &gt; YMAX) point.x = XMAX;&lt;br /&gt;if (point.y &lt; 0) point.y = 0; if (point.y &gt; YMAX) point.y = YMAX;&lt;br /&gt;Test Cases:&lt;br /&gt;1 - point.x = -1; point.y = -1;&lt;br /&gt;2 - point.x = XMAX + 1; point.y = YMAX + 1;&lt;br /&gt;This code fragment ensures the upper left corner of the billboard frame's movable view&lt;br /&gt;window stays within the bounds of the drawing surface. The problem with the statements&lt;br /&gt;above is that the upper bound for point.x should be XMAX, not YMAX. 
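&lt;br /&gt;A Python sketch shows why the listed test cases miss this bug and which input exposes it. XMAX = 800 and YMAX = 600 are assumed values, since the text only says the surface is wider than it is tall. clamp_buggy transcribes the fragment, and clamp_fixed is one possible correction.

```python
# Assumed surface size: the text only says it is wider than it is tall.
XMAX, YMAX = 800, 600


def clamp_buggy(x, y):
    """Transcription of the CollabBillboard fragment, typo included."""
    x = max(x, 0)            # same effect as the fragment's lower-bound check
    if x > YMAX:             # bug: x is compared against YMAX
        x = XMAX
    y = max(y, 0)
    if y > YMAX:
        y = YMAX
    return x, y


def clamp_fixed(x, y):
    """One possible correction: clamp each axis against its own bound."""
    return min(max(x, 0), XMAX), min(max(y, 0), YMAX)


# The two test cases from the text pass for both versions.
for x, y in [(-1, -1), (XMAX + 1, YMAX + 1)]:
    assert clamp_buggy(x, y) == clamp_fixed(x, y)

# An x strictly between YMAX and XMAX exposes the bug.
print(clamp_buggy(700, 0), clamp_fixed(700, 0))   # prints (800, 0) (700, 0)
```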
The error is difficult&lt;br /&gt;to detect because the drawing surface is wider than it is tall (XMAX &gt; YMAX). Any x-value greater than XMAX&lt;br /&gt;is therefore also greater than YMAX, so it triggers the upper-bound conditional and is clamped correctly.&lt;br /&gt;Branch coverage provides statement coverage and additionally ensures every conditional path&lt;br /&gt;is executed at least once. Branch coverage will test the conditionals in Figure 13 for both&lt;br /&gt;TRUE and FALSE conditions. Data from a test case that should trigger a FALSE path&lt;br /&gt;execution for the if (point.x &gt; YMAX)… statement might identify the conditional error.&lt;br /&gt;Although an improvement over statement coverage, branch coverage is still very sensitive&lt;br /&gt;to test case data selection.&lt;br /&gt;Numerous path coverage techniques have been devised that exercise paths through the&lt;br /&gt;code. Combinatorial explosion is avoided by executing paths through the code a non-zero&lt;br /&gt;minimum number of times. A common path coverage technique constructs a control flow&lt;br /&gt;graph to find paths through the code [82]. Path coverage performance can be improved by&lt;br /&gt;discovering the minimum number of paths that have to be traversed to cover all paths [83].&lt;br /&gt;One of the challenges of path coverage is to discover the input values that will cause a&lt;br /&gt;particular path to be executed. Using data flow graphs (DFGs) to create def-use paths for variables&lt;br /&gt;used in the program [84] makes it easier to discover how a particular input value affects&lt;br /&gt;program flow. DFG analysis cannot automatically select input values for test cases, but it&lt;br /&gt;can let the tester know which paths still need to be traversed for a particular variable. The&lt;br /&gt;DELLA PASTA [85] system extends the def-use technique to parallel programs. The core&lt;br /&gt;of the DELLA PASTA system is an algorithm that creates paths for variables defined in&lt;br /&gt;one thread and used in another. 
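&lt;br /&gt;The def-use idea can be sketched for the sequential case, which is the starting point that DELLA PASTA extends to threads. The control flow graph below is hypothetical. The search follows successors from each definition until the variable is redefined, and it records every node along the way that uses the variable.

```python
# Hypothetical control flow graph: node -> (variables defined, variables used, successors).
CFG = {
    1: ({"x"}, set(), [2]),        # x = read_input()
    2: (set(), {"x"}, [3, 4]),     # if x > 0:
    3: ({"x"}, {"x"}, [5]),        #     x = x - 1
    4: ({"y"}, {"x"}, [5]),        # else: y = 10 / x
    5: (set(), {"x"}, []),         # print(x)
}


def def_use_pairs(cfg):
    """Return (variable, def node, use node) triples joined by a def-clear path."""
    pairs = set()
    for def_node, (defs, _, succs) in cfg.items():
        for var in defs:
            stack, seen = list(succs), set()
            while stack:
                node = stack.pop()
                if node in seen:
                    continue
                seen.add(node)
                node_defs, node_uses, node_succs = cfg[node]
                if var in node_uses:           # a use reached without redefinition
                    pairs.add((var, def_node, node))
                if var not in node_defs:       # a redefinition ends the def-clear path
                    stack.extend(node_succs)
    return pairs
```

Each triple names a path that a test must cover for that variable. For example, ("x", 1, 4) calls for an input that takes the else branch, and in this made-up graph a zero input on that branch would also cause the division fault.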
The system is very limited in that it only works in a&lt;br /&gt;shared memory architecture and provides no control over the temporal aspects of&lt;br /&gt;execution, which can also influence path coverage.&lt;br /&gt;4.2.5 Integration&lt;br /&gt;When the program has been implemented and individual functions have been tested, it is&lt;br /&gt;time to test the program as a whole. Integration testing approaches revolve around how&lt;br /&gt;modules are assembled for verification. Separate integration verifies each module separately;&lt;br /&gt;the modules are then combined all at once and the entire program is tested. Top-down integration&lt;br /&gt;integrates and verifies the highest level modules first, with stubs for functions in lower level&lt;br /&gt;modules. This technique is excellent for identifying major design flaws early in the&lt;br /&gt;software life cycle, but does a poor job of detecting flaws in lower level modules. Bottom-up&lt;br /&gt;integration assembles and verifies the lower level modules first and tests the higher level&lt;br /&gt;modules later. This technique is excellent for identifying problems with lower level&lt;br /&gt;functions, but high-level design flaws are detected late in the life cycle. Sandwich integration&lt;br /&gt;divides modules into low level "utility" functions and high level "glue-like" logic functions.&lt;br /&gt;Bottom-up integration is performed on the utility modules, and top-down integration is&lt;br /&gt;performed on the logic modules [10].&lt;br /&gt;4.2.6 System Testing&lt;br /&gt;Even after the program has been implemented and its individual and combined&lt;br /&gt;functions have been tested against the specification, there is still verification to perform. System&lt;br /&gt;testing refers to verifying the program against the requirements, not the specification [14].&lt;br /&gt;Facility testing verifies that each objective discussed in the requirements is actually met by&lt;br /&gt;the program. 
Volume testing subjects the program to heavy volumes of data. Stress testing&lt;br /&gt;subjects the program to heavy loads or stresses. A stress test differs from a volume test in&lt;br /&gt;that it focuses on data volume over time rather than on data volume alone. Usability testing examines&lt;br /&gt;the program's human factors issues. Security testing tries to subvert the program's security&lt;br /&gt;mechanisms. Security testing is particularly important in CSCW, where issues of privacy&lt;br /&gt;and user roles arise. Performance testing ensures that the program meets requirements for&lt;br /&gt;response times and throughput under various workloads and configurations. Configuration&lt;br /&gt;testing examines how the program operates in a variety of hardware and software&lt;br /&gt;environments. Memory testing is a specific form of configuration testing that verifies the&lt;br /&gt;software's main and secondary storage needs. Compatibility/Conversion testing identifies&lt;br /&gt;problems between the new software and preexisting programs and data. Install testing&lt;br /&gt;exercises the procedures involved with getting the software installed and running.&lt;br /&gt;Reliability testing is performed implicitly throughout the software life cycle (see Section 4.1:&lt;br /&gt;Reliability). Recovery testing exercises the software's ability to handle situations when&lt;br /&gt;programming, hardware, and data errors occur. Serviceability testing investigates&lt;br /&gt;requirements for fixing and maintaining the program. Documentation testing verifies that the&lt;br /&gt;user documentation is correct. Verification techniques include document inspection&lt;br /&gt;and incorporating every example from the documentation into the test case suite. Procedure testing deals with the&lt;br /&gt;verification of procedures that users must follow. 
Acceptance testing is the final test before&lt;br /&gt;the software is formally delivered to the user community.&lt;br /&gt;4.3 Human Computer Interaction Testing&lt;br /&gt;Human Computer Interaction testing research has focused primarily on two areas: testing&lt;br /&gt;architectures and usability testing. Automated testing research has examined the problems&lt;br /&gt;encountered when a testing system is used to automate the evaluation of applications with&lt;br /&gt;graphical user interfaces. Usability testing research has attempted to provide design guidelines and&lt;br /&gt;evaluation techniques for an application’s user interface.&lt;br /&gt;4.3.1 Testing Architectures&lt;br /&gt;Script reusability has been the major focus of academic testing architectures. Because of&lt;br /&gt;the highly iterative nature of GUI application development, test scripts recorded with&lt;br /&gt;one version of the application quickly become invalid for later versions. The bitmap comparison techniques&lt;br /&gt;used in early systems were insufficient because they depended on the precise location and&lt;br /&gt;content of the GUI. 
Advocating a programmatic approach, early researchers argued that&lt;br /&gt;test scripts that drive the application by identifying the GUI components&lt;br /&gt;programmatically, rather than graphically, are less sensitive to the specific application state&lt;br /&gt;[86].&lt;br /&gt;Figure 14: Usability guidelines from [87]&lt;br /&gt;Use a simple and natural dialog&lt;br /&gt;Provide an intuitive visual layout&lt;br /&gt;Minimize a user’s memory load&lt;br /&gt;Be consistent&lt;br /&gt;Provide feedback&lt;br /&gt;Provide clearly marked exits&lt;br /&gt;Provide shortcuts&lt;br /&gt;Provide good help&lt;br /&gt;Allow user customization&lt;br /&gt;Minimize the use and effectiveness of modes&lt;br /&gt;Support input device continuity&lt;br /&gt;The Test Development Environment (TDE) addresses this issue with a visual test&lt;br /&gt;development system that abstracts low-level GUI events into higher-level operations on&lt;br /&gt;specific GUI components [88]. An organizational tool is provided to group operations&lt;br /&gt;into scripts and store them in a design library. To create a test case, the tester uses a visual&lt;br /&gt;programming environment to select a set of scripts from the library. The visual language&lt;br /&gt;includes provisions for if/then and looping control constructs. Data variance using form-based&lt;br /&gt;constraints is also included to increase script reusability. Low-level application&lt;br /&gt;events are regenerated from the high-level operations to exercise the application. When a&lt;br /&gt;new version of the application is developed, the TDE compares its GUI components&lt;br /&gt;with the components referenced by the scripts in the design library. Discrepancies&lt;br /&gt;are identified and can be corrected by the tester with the help of mapping wizards included&lt;br /&gt;in the TDE.&lt;br /&gt;Other techniques attack script reusability by generating test cases automatically. 
To thoroughly test the application, however, each GUI action has to be tried in combination&lt;br /&gt;with every other GUI action. As in black-box and white-box testing, this creates a combinatorial&lt;br /&gt;explosion of test cases. Several approaches have been investigated to reduce this growth.&lt;br /&gt;Pair-wise grouping restricts the length of an interaction chain to two. The creators of this&lt;br /&gt;approach found a significant reduction in test cases without a corresponding drop in&lt;br /&gt;detected bugs [89]. The Latin-squares approach arranges n distinct GUI interactions in an n × n grid&lt;br /&gt;in which every interaction occurs exactly once in each row and once in each column [90].&lt;br /&gt;Test case reduction without significant loss of bugs was also found using this approach.&lt;br /&gt;Artificial Intelligence (AI) planning techniques have also been used [91]. One system&lt;br /&gt;analyzes the application’s GUI to derive a set of user actions. The test designer manually&lt;br /&gt;encodes preconditions and postconditions for each interaction (e.g., to display panel X, the user must&lt;br /&gt;press button Y). The designer then defines start and goal states for the application. The&lt;br /&gt;system uses an AI planner to find a path from the start state to the goal state using the&lt;br /&gt;GUI interactions encoded by the designer. Test case reduction is achieved because only&lt;br /&gt;one path is generated for each goal state. Unfortunately, like the techniques in Section&lt;br /&gt;4.2.4, approaches that eliminate test cases cannot guarantee that all problems in an&lt;br /&gt;application will be found.&lt;br /&gt;In addition to script reusability, researchers have investigated visual programming, script&lt;br /&gt;analysis, and multi-modal scripting. A methodology and architecture have been created for&lt;br /&gt;testing visual programs such as spreadsheets [92]. 
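&lt;br /&gt;The Latin-squares arrangement can be made concrete with a short sketch. The Python below builds an n × n square by rotating a list of GUI actions one position per row (a standard cyclic construction, not necessarily the one used in [90]), checks that every action occurs once in each row and each column, and counts unrestricted action chains for comparison. The action names are hypothetical, and reading each row as one test sequence is an assumption made only for illustration.&lt;br /&gt;

```python
from itertools import product


def cyclic_latin_square(actions):
    """Build an n x n grid whose row r is the action list rotated left by r.

    Rotation guarantees that every action appears exactly once in each row
    and exactly once in each column, i.e. the grid is a Latin square.
    """
    n = len(actions)
    return [[actions[(r + c) % n] for c in range(n)] for r in range(n)]


def is_latin_square(grid):
    """Return True if every row and every column is a permutation of the symbols."""
    symbols = set(grid[0])
    if len(grid) != len(symbols):
        return False
    rows_ok = all(len(row) == len(symbols) and set(row) == symbols for row in grid)
    cols_ok = all(set(col) == symbols for col in zip(*grid))
    return rows_ok and cols_ok


def all_chains(actions, length):
    """Every ordered chain of GUI actions of the given length (n ** length of them)."""
    return list(product(actions, repeat=length))


# Hypothetical GUI actions; each row of the square is read as one test sequence.
actions = ["open", "edit", "save", "close"]
square = cyclic_latin_square(actions)
for row in square:
    print(row)

print(is_latin_square(square))          # True
print(len(all_chains(actions, 4)))      # 256 unrestricted chains of length 4
print(len(all_chains(actions, 2)))      # 16 chains under the pair-wise limit
```

&lt;br /&gt;Under that assumption, four actions yield four test sequences of length four, compared with 4^4 = 256 unrestricted chains of that length, or 4^2 = 16 chains when pair-wise grouping limits chains to two actions.&lt;br /&gt;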
The system builds cell relation graphs and&lt;br /&gt;constructs compiler-like “definition-use” links between cells that define values and cells&lt;br /&gt;that use those definitions. The testing system highlights dependent cells that have not been&lt;br /&gt;tested. To exercise a highlighted cell, the tester changes the value of one or more of the definition cells it depends on.&lt;br /&gt;Highlighting is removed once the code in a dependent cell is executed.&lt;br /&gt;GUITESTER uses script analysis to identify usability problems in an application’s user&lt;br /&gt;interface [93]. Scripts of different users performing the same application task are analyzed.&lt;br /&gt;The analysis extracts common interaction patterns, mean mouse movement distances,&lt;br /&gt;the mean interval between user actions, and the proportion of users who were unable to&lt;br /&gt;complete each sub-task. This information is used to identify clarity, safety, simplicity, and&lt;br /&gt;continuity problems. For example, a long mean distance between mouse clicks in a&lt;br /&gt;relatively short interval could mean the user interface suffers from a continuity problem.&lt;br /&gt;Multi-modal scripting integrates additional data into a script recording to enrich&lt;br /&gt;script playback. A script can be enhanced with synchronized videotape and voice&lt;br /&gt;captured at the time the user exercised the application. Observer text and voice&lt;br /&gt;annotations can be added later [94]. MITRE’s Multi-modal Logger allows multiple&lt;br /&gt;applications and simultaneous users to be recorded in a single script [95]. The rich&lt;br /&gt;information provided by these recording systems adds important context to the&lt;br /&gt;application during playback analysis.&lt;br /&gt;4.3.2 Usability Testing&lt;br /&gt;Academia and industry have produced numerous guidelines for user interface design and&lt;br /&gt;evaluation [40], [87], [96], [97], [98]. 
The guidelines range in size from a concise set of one-line&lt;br /&gt;statements, as in Figure 14, to a detailed breakdown and description of every aspect&lt;br /&gt;of a graphical user interface.&lt;br /&gt;Most researchers agree on the general principles for a good user interface. Research is very&lt;br /&gt;active, however, in determining whether an application violates these principles. Techniques&lt;br /&gt;include empirical evaluation, where users are observed using the application in a usability&lt;br /&gt;lab or in the field. Observers go to great lengths to avoid contact with subjects in order to&lt;br /&gt;preserve realistic application use. Empirical evaluation is an excellent tool for real-world&lt;br /&gt;observation; however, it can be very expensive and time-consuming [40]. Another&lt;br /&gt;technique, the walkthrough, uses deliberate attempts to expose usability problems in the&lt;br /&gt;application. Typically, quality assurance personnel or human factors experts perform the&lt;br /&gt;walkthrough, rather than regular users. Walkthroughs provide a cost- and time-efficient&lt;br /&gt;evaluation, but suffer because they lack a real-world setting. Karat provides an excellent&lt;br /&gt;survey of walkthrough techniques, including pluralistic walkthroughs, heuristic evaluations,&lt;br /&gt;cognitive walkthroughs, think-aloud evaluations, and scenario-based reviews [87]. More&lt;br /&gt;recently, advocates of participatory evaluation have argued that having&lt;br /&gt;evaluators and possibly developers in the same room with real users offers the benefits of&lt;br /&gt;both empirical and walkthrough techniques [99], [100].&lt;br /&gt;4.4 Commercial Test Systems&lt;br /&gt;A survey of testing would be incomplete without a review of modern commercial testing&lt;br /&gt;systems. Unfortunately, there appears to be little contact between academia and the&lt;br /&gt;commercial testing community. 
Statements from researchers, such as "testing tools for&lt;br /&gt;CSCW applications are non-existent" [8], are simply untrue. At ISSTA '98, the premier&lt;br /&gt;annual academic conference on testing, over a dozen well-known researchers were&lt;br /&gt;questioned about multi-user testing architectures. None of them, including an individual&lt;br /&gt;citing SQA Suite™ in a conference paper, was aware of any multi-user support.&lt;br /&gt;The lack of a rigorous review of commercial testing in the literature necessitated examining&lt;br /&gt;a variety of alternative information sources, including:&lt;br /&gt;USENET's comp.sys.testing, which provides a regularly updated list of over 200&lt;br /&gt;commercial and public domain testing tools.&lt;br /&gt;Reviews on the Software Testing Online Resources (STORM) web site, maintained&lt;br /&gt;by Roland Untch at Middle Tennessee State University [101].&lt;br /&gt;Software review articles from commercial magazines [102], [103], [104], [105].&lt;br /&gt;Several discussions with Dr. Anne Ferraro, who performed a review of commercial&lt;br /&gt;testing systems for Microstrategies, Inc. [106].&lt;br /&gt;Test software company web sites.&lt;br /&gt;Several criteria were used to determine a system's desirability. First, the system had to run&lt;br /&gt;on Windows95/NT platforms. This was necessary because software for the Collaborative&lt;br /&gt;Classroom, including CollabBillboard, was developed on these platforms. Second, the&lt;br /&gt;system had to support multi-user testing. Determining this capability was challenging&lt;br /&gt;because marketing literature uses terms like "stress testing", "load testing", and&lt;br /&gt;"client/server testing" inconsistently. Sometimes these terms meant that the product was capable&lt;br /&gt;of multi-user testing; other times they meant only that the product could simulate&lt;br /&gt;loads or client behavior in a single-user environment. 
Finally, the company had to provide&lt;br /&gt;an evaluation copy. Four systems were initially selected: Platinum Technology's Final&lt;br /&gt;Exam C/S-Test™ [107], Mercury Interactive's TestSuite™ [108], Rational Software's SQA&lt;br /&gt;Suite™ [109], and Segue's Silk Enterprise Edition™. Unfortunately, negotiations with&lt;br /&gt;Segue broke down before an evaluation copy of their system was obtained.&lt;br /&gt;The review of commercial systems is organized around a reference testing architecture. A&lt;br /&gt;software test environment (STE) can be broken down into six functional categories: test&lt;br /&gt;execution, test development, test failure analysis, test measurement, test management, and&lt;br /&gt;test planning [110].&lt;br /&gt;4.4.1 Test Planning&lt;br /&gt;Test planning provides the tools needed to manage the staff, schedules, and resources&lt;br /&gt;required for product testing. Areas covered by this function include features of software&lt;br /&gt;to be tested, detailed test plans, risk assessment, organization training needs, resource&lt;br /&gt;needs, staffing needs, staffing roles, staffing responsibilities, and schedule.&lt;br /&gt;SQA Suite™ and TestSuite™ provide extensive tools for test planning. SQA Suite™&lt;br /&gt;defines the testing process as a sequence of six steps: Test Planning → Test Development →&lt;br /&gt;Test Results → Defect Tracking → Summary Reporting and Analysis. SQA Manager is&lt;br /&gt;provided to define and organize test requirements. Test requirements are defined using a&lt;br /&gt;hierarchical folder/document tree. Folders describe testing objectives, with higher-level&lt;br /&gt;objectives appearing closer to the root. The leaves are documents, which describe the&lt;br /&gt;detailed low-level requirements for a specific test.&lt;br /&gt;TestSuite™ defines the testing process in three steps: Test Planning → Test Execution →&lt;br /&gt;Bug Tracking. 
TestSuite™ merges test planning and development into a single step&lt;br /&gt;following IEEE Standard 829 [111]: Define Goals (requirements) → Define Major&lt;br /&gt;Capabilities to Test (specification) → Define Tests (design) → Define Steps for each Test&lt;br /&gt;(implementation) → Automate Tests (automation). Testing is viewed as a life cycle that&lt;br /&gt;parallels software development. Wizards guide the tester through each&lt;br /&gt;step of the planning process.&lt;br /&gt;Final Exam C/S-Test™ does not provide any test planning facilities.&lt;br /&gt;4.4.2 Test Management&lt;br /&gt;Test management deals with the storage and maintenance of test artifacts and their&lt;br /&gt;interrelationships. A sophisticated storage mechanism, such as a database, is needed to&lt;br /&gt;maintain artifact relationships.&lt;br /&gt;SQA Suite™ manages the entire testing process through the SQA Manager [109] program.&lt;br /&gt;SQA Manager allows the tester to perform test planning, archive developed test cases,&lt;br /&gt;archive the results from test execution, and analyze the test case results. Email&lt;br /&gt;and bug tracking support are also provided to tie development, quality assurance, and&lt;br /&gt;management into the process. The artifacts of the test process are stored in either a&lt;br /&gt;Microsoft Access or Sybase relational database. SQA Manager provides a query&lt;br /&gt;mechanism for information in the test repository. Unfortunately, the data model for the&lt;br /&gt;repository is not exposed, so there is no way to link the test system into other development&lt;br /&gt;tools, such as the code library. Such a link would be useful for synchronizing bug fixes between&lt;br /&gt;the development and test sides. A graphing and report writer facility is also included for&lt;br /&gt;reviewing and analyzing software defect information (e.g., 
age and priority of outstanding&lt;br /&gt;defects, defect ownership, number of defects over time).&lt;br /&gt;TestSuite™ provides similar management through TestDirector [112]. The repository uses&lt;br /&gt;Microsoft Access and exposes some of the data model to external applications.&lt;br /&gt;Specifically, read-only views are available for test case results. This allows the tester to run&lt;br /&gt;standard report-writing tools against the results. TestDirector provides excellent support&lt;br /&gt;for testing during the iterative GUI development process. When a new build is brought&lt;br /&gt;into the test system, the widgets on each dialog are analyzed. If there are differences&lt;br /&gt;between the widgets in the new build and the previous build (e.g., a widget was deleted), and&lt;br /&gt;the archived test cases depend on the changed widgets, the system alerts&lt;br /&gt;the tester and provides a wizard to help modify the test cases. As with SQA Manager, TestDirector's bug-tracking&lt;br /&gt;facilities are not integrated with any code library systems. Another shortcoming&lt;br /&gt;of TestDirector is a lack of query tools for data archived in the repository. A graphing and&lt;br /&gt;report writer is also included for defect analysis.&lt;br /&gt;Final Exam C/S-Test™ does not provide any test management facilities.&lt;br /&gt;4.4.3 Test Development&lt;br /&gt;Test development adds the ability to specify test executions. A test suite is developed for&lt;br /&gt;the software under verification. The suite consists of individual test cases. Each test case&lt;br /&gt;includes the input required to run the case, adequacy criteria to determine whether the case&lt;br /&gt;passed or failed, and documentation.&lt;br /&gt;Final Exam C/S-Test™ records user actions performed on the application under test&lt;br /&gt;(AUT). Actions are written to a test script, which can then be played back. User actions&lt;br /&gt;are divided into two categories. 
High-level actions involve the manipulation of a GUI&lt;br /&gt;widget (e.g., pushing a button). Low-level actions involve device-level manipulation (e.g.,&lt;br /&gt;a mouse click or key press). The recorder interprets actions at a high level whenever&lt;br /&gt;possible. This gives the test script greater flexibility during execution. A test script that&lt;br /&gt;records an OK button press is more flexible than one that records the absolute screen&lt;br /&gt;coordinates of the mouse click that caused the button press. If the script is run with the&lt;br /&gt;AUT at a new position on the display, the high-level action will still be replayed correctly, while the&lt;br /&gt;low-level one will cause undesired behavior. The following test script action sets the keyboard&lt;br /&gt;and mouse input focus to the specified window:&lt;br /&gt;setwindow(titlename, internalId, dbKey, dbId, delay);&lt;br /&gt;titlename is the text string name of the GUI window. internalId is a special C/S-Test™&lt;br /&gt;internal identifier for the window. dbKey is used to look up information about the&lt;br /&gt;window in a special Windows95/NT repository. dbId identifies the name of the&lt;br /&gt;repository. delay specifies the maximum amount of time the replay system should wait&lt;br /&gt;before deciding that the window cannot be found.&lt;br /&gt;Once a set of user actions has been recorded, the script can be enhanced with constructs&lt;br /&gt;from the Test Manipulation Language (TML), a weakly typed, C-like language that&lt;br /&gt;includes conditionals, loops, subprograms, and four variable types: string, float, int, and list.&lt;br /&gt;User exit support is provided so the script writer can call pre-compiled&lt;br /&gt;subroutines developed in other languages, such as C and C++.&lt;br /&gt;TestSuite™ provides a similar recording tool and scripting language [108]. 
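&lt;br /&gt;The flexibility argument for high-level actions can be illustrated with a small simulation. In this Python sketch, a dialog moves between recording and playback; replaying the recorded absolute screen coordinates misses the OK button, while replaying the widget-level press still finds it. The Button and Window classes are hypothetical stand-ins for a recorder's view of the AUT; they are not part of Final Exam C/S-Test™ or TML.&lt;br /&gt;

```python
from dataclasses import dataclass, field


@dataclass
class Button:
    """A push button positioned relative to its window's top-left corner."""
    name: str
    x: int
    y: int
    width: int
    height: int


@dataclass
class Window:
    """Hypothetical model of an AUT dialog as seen by a test recorder."""
    title: str
    left: int                     # screen position of the window's top-left corner
    top: int
    buttons: list = field(default_factory=list)
    pressed: list = field(default_factory=list)   # names of buttons activated so far

    def click_at(self, screen_x, screen_y):
        """Low-level replay: hit-test an absolute screen coordinate."""
        wx, wy = screen_x - self.left, screen_y - self.top
        for b in self.buttons:
            if wx in range(b.x, b.x + b.width) and wy in range(b.y, b.y + b.height):
                self.pressed.append(b.name)
                return True
        return False              # the click landed on no button

    def press(self, name):
        """High-level replay: locate the widget by name, wherever it is drawn."""
        for b in self.buttons:
            if b.name == name:
                self.pressed.append(b.name)
                return True
        return False


# Recording: the dialog sits at (100, 100) and the tester clicks inside OK.
dialog = Window("Confirm", 100, 100, [Button("OK", 10, 40, 60, 20)])
low_level_action = (115, 145)     # absolute screen point of the recorded click
high_level_action = "OK"          # widget identity of the recorded press

# Playback: the same dialog now opens at (300, 250).
dialog.left, dialog.top = 300, 250
print(dialog.click_at(*low_level_action))   # False: the old coordinates miss OK
print(dialog.press(high_level_action))      # True: OK is found by name
```

&lt;br /&gt;A real recorder would identify widgets through the window system rather than by a name string, but the design point is the same: storing the widget's identity instead of its screen position keeps the script valid when layout or placement changes.&lt;br /&gt;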
In addition to&lt;br /&gt;delaying actions with a timer, the language provides a waitbitmap() function that pauses&lt;br /&gt;test script execution until a specified region of the AUT matches a given bitmap.&lt;br /&gt;SQA Suite™ provides a recording tool with an extremely small footprint on the screen.&lt;br /&gt;This is an important benefit over the other two test systems. One of the problems with&lt;br /&gt;recording test cases was that whenever there was a need to interact with the test-recording&lt;br /&gt;tool, the actions necessary to get to the recording tool were also recorded in the test script.&lt;br /&gt;The small footprint of SQA Suite™'s tool meant that the recording tool's interface&lt;br /&gt;could be placed next to, but not on top of or underneath, the application.&lt;br /&gt;SQA Suite™'s scripting language [113] is a powerful subset of Visual Basic. Support is also&lt;br /&gt;included for any program written in Microsoft's Visual Basic if the user doesn't want to&lt;br /&gt;perform multiuser tests.&lt;br /&gt;4.4.4 Test Execution&lt;br /&gt;Test execution exercises the software and records the results of the execution. The&lt;br /&gt;software exercised may have been specially instrumented for testing. The artifacts of&lt;br /&gt;test execution include test system and program output, execution traces, and bookkeeping&lt;br /&gt;data (e.g., when the test was run, against what build/configuration, with what test case data, by&lt;br /&gt;whom). Systems supporting only test execution were the first kind of STEs developed.&lt;br /&gt;Final Exam C/S-Test™ provides a single system window for test recording, playback, and&lt;br /&gt;analysis. To execute a test case, the tester opens a script file and issues the run command&lt;br /&gt;through the system window. Before the test begins, the AUT must be in the same state&lt;br /&gt;it was in when the test script was recorded. 
A text window displays the test script,&lt;br /&gt;highlighting the line currently being executed. A debugger allows the&lt;br /&gt;tester to single-step through the script, set breakpoints, and query the contents of any&lt;br /&gt;script variable. The playback command has two speed options: actual and fast. Actual&lt;br /&gt;replays the script actions at the speed at which they were recorded. Fast replays the script&lt;br /&gt;actions with smaller default delays. The results of the test case are saved in a log file for&lt;br /&gt;later analysis. Test scripts can be run in automatic batch mode by creating a script with a&lt;br /&gt;sequence of testExec(fileName) commands (where fileName is the name of a test script&lt;br /&gt;file).&lt;br /&gt;TestSuite™ views test execution as a more formal process consisting of test cycles,&lt;br /&gt;automated and manual tests, and test result analysis. Four test cycles are identified: sanity,&lt;br /&gt;normal, advanced, and regression. A sanity test cycle tests the breadth of the application&lt;br /&gt;and consists mostly of tests that should have positive results. Normal and advanced cycles&lt;br /&gt;increase the depth of application testing and contain cases that are more destructive. The&lt;br /&gt;regression cycle verifies that changes in the AUT did not cause failures in other areas of the&lt;br /&gt;application. In addition to batch-mode support for scripts, TestSuite™ supports manual&lt;br /&gt;testing within the system. During a manual test, a dialog box lets the&lt;br /&gt;tester indicate the pass/fail status of the test and add comments.&lt;br /&gt;TestSuite™'s debugger is comparable to Final Exam C/S-Test™'s. In addition, it provides&lt;br /&gt;a variable watch list that allows the user to view the values of variables and expressions as a&lt;br /&gt;test script is executing. Scripts can be played back in three modes: verify, debug, and&lt;br /&gt;update. 
The default mode, verify, executes the script and performs implicit and explicit&lt;br /&gt;verification. Debug mode allows the script to be played back with the debugger. Update&lt;br /&gt;mode replaces the reference data used in implicit and explicit verification with data from the current&lt;br /&gt;run.&lt;br /&gt;TestSuite™ allows the tester to set a number of execution options beyond the script's&lt;br /&gt;playback speed. The min_diff parameter defines the number of differing pixels that constitutes the&lt;br /&gt;mismatch threshold for bitmap verification. The delay parameter defines the sampling interval for window&lt;br /&gt;stability checks: a window is sampled at the rate specified by delay until two consecutive samples&lt;br /&gt;show the same display. This ensures the window is stable for verification or&lt;br /&gt;synchronization checks.&lt;br /&gt;SQA Suite™ views test execution in two phases: test development and regression testing.&lt;br /&gt;Test development is the process of creating, debugging, and baselining test cases for the&lt;br /&gt;AUT. Regression testing executes the developed test cases against the AUT's&lt;br /&gt;current build. The results of the execution are compared against each case's baseline. Any&lt;br /&gt;discrepancies are reported as potential errors. Although SQA Suite™ supports batch&lt;br /&gt;mode for scripts, it does not integrate manual testing into the process.&lt;br /&gt;The SQA Suite™ script debugger is comparable to TestSuite™'s. Because Visual Basic&lt;br /&gt;allows complex data types, the debugger also includes a data structure browser. SQA&lt;br /&gt;Suite™ supports only the verify and debug execution modes. The baseline for a test case&lt;br /&gt;must be collected during recording. Script execution options focus on script playback&lt;br /&gt;speed and window caption matching. Caption matching is a particular problem if an&lt;br /&gt;application is supported on different versions of the Windows operating system. 
For example, Windows 3.1 supports only 8-character filenames with 3-character extensions, so a window caption that&lt;br /&gt;includes a filename can differ from the caption shown under Windows95/NT.&lt;br /&gt;The tester can also set test log options before executing a test script. These options&lt;br /&gt;include the level of detail written to the log (all, pass/fail, fail) and whether the results of&lt;br /&gt;the test should be written to the test repository. Finally, error recovery options are&lt;br /&gt;available. The user can specify how playback should proceed if a script command fails,&lt;br /&gt;a test case fails, or the AUT crashes.&lt;br /&gt;4.4.5 Test Analysis&lt;br /&gt;Test analysis examines a test case, both during and after execution, to determine whether it passed or&lt;br /&gt;failed. Artifacts from failure analysis include each test case's pass/fail status and a report for each&lt;br /&gt;failure. Some STEs with failure analysis capability use a test oracle, a subsystem that&lt;br /&gt;automatically analyzes software behavior and output during test execution. All-purpose&lt;br /&gt;test oracles do not currently exist, but several domain-specific oracles have been&lt;br /&gt;developed. Poirot [114] analyzes the execution of parallel programs to determine and&lt;br /&gt;isolate performance problems. TAOS/GIL [115] compares a program's temporal&lt;br /&gt;specification against a trace of the implementation's execution. TAOS/Reactive [116]&lt;br /&gt;requires the tester to map each specification location where certain conditions must hold&lt;br /&gt;true to the corresponding location within the implementation.&lt;br /&gt;An oracle is then constructed by creating assertions on these conditions in the&lt;br /&gt;implementation. Final Exam C/S-Test™ TML includes six kinds of verification&lt;br /&gt;statements that the script recorder can select. Bitmap verification allows the user to&lt;br /&gt;identify a GUI widget or a geometric region for comparison. A graphical snapshot is taken&lt;br /&gt;of the area at recording time. 
When the test script is run, a pixel-by-pixel comparison is&lt;br /&gt;made between the snapshot and the same area of the AUT during playback. GUI object&lt;br /&gt;verification saves the state of one or more GUI widgets. During test script playback, a&lt;br /&gt;comparison is made between each widget's saved state and its actual state in the AUT. Text&lt;br /&gt;verification is a special verification tool used for applications that support complex fonts,&lt;br /&gt;such as a WYSIWYG editor. Snapshots of the text area are taken and processed using&lt;br /&gt;Optical Character Recognition techniques to extract the actual text. Comparisons are&lt;br /&gt;made between the text at record and playback time. File verification performs a byte-by-byte&lt;br /&gt;comparison of the files generated at record and playback time. A user exit is provided&lt;br /&gt;so that the tester can define application-specific verification routines. TestSuite™ and&lt;br /&gt;SQA Suite™ provide similar verification tools.&lt;br /&gt;Figure 15: Final Exam C/S Test Multiuser Architecture (diagram: a Monitor and three Workstations, each running a Server)&lt;br /&gt;In Final Exam C/S-Test™, the results of a test execution are written to a log file. The log&lt;br /&gt;file contains verification pass/fail statements, test script parse and runtime errors, and&lt;br /&gt;user-defined messages entered into the log file via the log() script command. A text browser is&lt;br /&gt;provided so the user can review the log. There are two viewing options: all and fail. All&lt;br /&gt;displays all log file output. Fail displays only test script failures. Both TestSuite™ and&lt;br /&gt;SQA Suite™ provide more sophisticated log file analysis tools.&lt;br /&gt;SQA Suite™ provides a special browser called the SQA Test Log Viewer [117]. 
The Log&lt;br /&gt;Viewer displays an abstraction of the log file that initially lists ten different kinds of log&lt;br /&gt;events, with the date and time each event occurred and a pass/fail status. Examples of events&lt;br /&gt;include the start of a test script, a call/return from a procedure, a general protection fault, and&lt;br /&gt;a script command failure. The tester can apply a filter to the event log to view only specific&lt;br /&gt;event types. The tester can get more detail about certain events in the log by selecting the&lt;br /&gt;event. For example, a test case event that has a failure status will display the script&lt;br /&gt;command that actually caused the failure. By double-clicking on the test case event, the&lt;br /&gt;user can jump to that command in the test script editor. SQA Suite™ also provides&lt;br /&gt;a special comparator application, which allows the tester to compare the results of a test&lt;br /&gt;with the original baseline to determine whether the recorded failure is actually a problem. There&lt;br /&gt;are comparators for images, GUI objects, and text. If a test failure has been determined to&lt;br /&gt;be a program defect, the tester can enter a defect into the SQA Repository. The defect&lt;br /&gt;number is automatically assigned to the test case results in the log file. TestSuite™&lt;br /&gt;provides log file capabilities similar to SQA Suite™'s, integrated into the WinRunner&lt;br /&gt;application.&lt;br /&gt;SQA Suite™ includes a graphing package specifically for performance analysis. The&lt;br /&gt;execution times for test scripts and for specific start/stop timer script commands are recorded&lt;br /&gt;in the log file. The tester can extract the results from the log file and display them&lt;br /&gt;on one- and two-dimensional graphs. Several types of graphs are supported.&lt;br /&gt;Elapsed Times - Summary: shows the average elapsed time of repeated executions of a series of test&lt;br /&gt;scripts. 
Elapsed Times - Chronology: shows changes in elapsed time over the series&lt;br /&gt;of test script runs. Elapsed Times - Avg Min Max: shows the average, minimum, and maximum values&lt;br /&gt;of repeated executions of a series of test scripts. Performance: plots a series of test script&lt;br /&gt;runs against the size of the data processed. Errors: plots error frequency by test script. Neither&lt;br /&gt;TestSuite™ nor Final Exam C/S-Test™ provides any performance graphing utilities.&lt;br /&gt;4.4.6 Test Measurement&lt;br /&gt;Test measurement includes test coverage measurement, analysis, and instrumentation for&lt;br /&gt;data collection during execution traces. Artifacts include test coverage measures. Section&lt;br /&gt;4.2.4 (White Box Testing) discussed test coverage issues. Instrumentation presents a testing&lt;br /&gt;challenge because code that has been instrumented behaves differently than the original&lt;br /&gt;code [118]. For single-process programs, standard profiling tools such as prof&lt;br /&gt;provide call graphs, statement and function counts, and timing statistics. For parallel&lt;br /&gt;programs, instrumented communication libraries that trace send/receive events and record&lt;br /&gt;communication statistics, such as the Portable Instrumented Communication Library (PICL), can be used [119]. One problem with massively parallel programs&lt;br /&gt;is that their size and lengthy execution times can result in extremely large execution traces.&lt;br /&gt;Selective instrumentation reduces the amount of data collected by allowing the tester to&lt;br /&gt;select which parts of the program are instrumented, and when. 
Paradyn, for example,&lt;br /&gt;allows code to be instrumented and de-instrumented on the fly [120].&lt;br /&gt;Apart from recording test script execution times and providing basic test script&lt;br /&gt;start/stop timer commands, none of the test systems has any sophisticated test&lt;br /&gt;measurement or instrumentation capabilities.&lt;br /&gt;4.4.7 Multiuser Testing&lt;br /&gt;Final Exam C/S-Test™ uses two kinds of specialized software to conduct multiuser&lt;br /&gt;testing. A single copy of the monitor program resides on one of the networked&lt;br /&gt;workstations. The monitor provides session control tools to schedule, and view the status of,&lt;br /&gt;test scripts executing on remote workstations. All workstations participating in a multiuser&lt;br /&gt;test are controlled by a local server program. The server program identifies the&lt;br /&gt;workstation to the monitor as available for testing and responds to requests from the&lt;br /&gt;monitor (e.g., to start executing a test script). A test script executing during a multiuser test is&lt;br /&gt;called a "virtual user". The log files from remote executions are written to a public&lt;br /&gt;directory accessible to all test machines. TestSuite™ and SQA Suite™ use a similar&lt;br /&gt;architecture.&lt;br /&gt;One area in which TestSuite™ and SQA Suite™ differ from Final Exam C/S-Test™ is the&lt;br /&gt;distinction between types of virtual users. In SQA Suite™, a GUI user executes a test&lt;br /&gt;script containing interactions with the application's user interface. Only one GUI user is&lt;br /&gt;allowed per workstation. The main goal of a GUI user is to perform correctness testing.&lt;br /&gt;A Virtual user issues HTTP commands against a web server, bypassing the user interface&lt;br /&gt;completely. Because the test system needs less processing for text&lt;br /&gt;commands, there can be many Virtual users on a single workstation. 
SQA Suite™ guidelines state that each GUI user requires 20 MB of RAM, while a Virtual user requires just 1.5 MB. The purpose of Virtual users is to perform load and stress testing. TestSuite™ GUI and dB Users perform roles similar to SQA Suite™'s GUI and Virtual users.&lt;br /&gt;&lt;br /&gt;Synchronization plays a vital part in coordinating the execution of test scripts on multiple networked machines. Final Exam C/S-Test™ supports both synchronous and asynchronous messaging for synchronization (see Table 3):&lt;br /&gt;&lt;br /&gt;Script Command: Description&lt;br /&gt;when ("msgId") enabled { stmts… }: Tells TML to look at each incoming message id. If it matches "msgId", the statements inside the code block are executed.&lt;br /&gt;enable "msgId" / disable "msgId": Enables/disables when blocks.&lt;br /&gt;sendMessage(): Sends a message to a remote host. Messages contain no information beyond the message ID. The message is acknowledged if received.&lt;br /&gt;multiMessage(): Sends a message to multiple remote hosts. No acknowledgement is made if received.&lt;br /&gt;waitMessage(): Waits for any message to enter the message queue.&lt;br /&gt;peekMessage(): Looks at the message on top of the message queue without removing it.&lt;br /&gt;sendMessageToTML(): C function that allows the AUT to send messages to the local test script.&lt;br /&gt;RemoteCallerName(): Returns the name of the remote host that caused the test script to be executed locally.&lt;br /&gt;Run "file" on "hostId": Runs the test script file on the remote host identified by host id.&lt;br /&gt;Table 3: Final Exam C/S-Test™ TML Script Commands for Multiuser Script Synchronization&lt;br /&gt;&lt;br /&gt;TestSuite™ coordinates virtual users with a synchronous messaging technique called "rendezvous".
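&lt;br /&gt;&lt;br /&gt;In thread terms, a rendezvous behaves like a barrier: every participant blocks at the same point until all participants have arrived. The following Python sketch uses the standard library's threading.Barrier as a stand-in; it is not the TestSuite™ or SQA Suite™ API, and the three virtual users and the five-second timeout are assumptions chosen for illustration.&lt;br /&gt;

```python
import threading

NUM_VIRTUAL_USERS = 3  # hypothetical number of virtual users in the session

# One barrier plays the role of a single rendezvous point such as "rzvId".
rendezvous = threading.Barrier(NUM_VIRTUAL_USERS)
log = []
log_lock = threading.Lock()


def virtual_user(name):
    with log_lock:
        log.append((name, "before"))
    # Block until every virtual user has reached the rendezvous point.
    rendezvous.wait(timeout=5)
    with log_lock:
        log.append((name, "after"))


threads = [threading.Thread(target=virtual_user, args=(f"vu{i}",))
           for i in range(NUM_VIRTUAL_USERS)]
for t in threads:
    t.start()
for t in threads:
    t.join()

phases = [phase for _, phase in log]
# No virtual user passes the rendezvous until all have logged "before".
print(phases)
```

If the timeout expires, threading.Barrier breaks and wait() raises threading.BrokenBarrierError; a tool that lets a late virtual user continue instead, as described for SQA Suite™, would have to catch that exception and proceed.&lt;br /&gt;&lt;br /&gt;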
Each virtual user declares a rendezvous using the declare_rendezvous("rzvId") statement. To synchronize across test scripts, all virtual users issue the command rendezvous("rzvId"). Execution will not continue until all virtual users have executed the rendezvous() command with the same id. SQA Suite™ has a similar command, SQAVuSyncAndResume(), which provides some additional capabilities. Through the monitor, the tester can specify a threshold for the number of virtual users that must reach the rendezvous point before execution can continue. The tester can also explicitly force a virtual user to continue. Finally, a timeout option allows a virtual user to continue if the rendezvous condition has not been met.&lt;br /&gt;&lt;br /&gt;The Final Exam C/S-Test™ session control monitor consists of a status window and a message window. The status window reports the status of each workstation participating in the test session:&lt;br /&gt;Connected indicates that the workstation is ready to run a test script.&lt;br /&gt;Running means that a test script is executing on the machine.&lt;br /&gt;Getline means that the remote test script is waiting for the tester to enter some text at a special monitor command prompt.&lt;br /&gt;Waiting indicates that the test script is waiting for a test script event (via the waitMessage() command).&lt;br /&gt;Error denotes that some kind of error (verification, general protection fault, or script command) occurred.&lt;br /&gt;
Stop is displayed when the script has successfully executed.&lt;br /&gt;Disconnected is displayed when the workstation has been dropped from the test session.&lt;br /&gt;The message window displays any messages transmitted between test scripts via the sendMessage() or multiMessage() commands.&lt;br /&gt;&lt;br /&gt;Both TestSuite™ and SQA Suite™ offer a more sophisticated session control interface. Besides remote workstation status, these interfaces provide scheduling and limited synchronization capabilities. Table 4 is a slightly modified version of the session control interface for SQA Suite™ [121]. The Label field associates a specific workstation and test script with an identifier. Test Station identifies the name of a workstation used in the test session. Test Entry contains a list of test scripts to be run in sequential order on the workstation specified by Test Station; the scripts run in the order they appear in the list unless a scheduling method overrides it. Status indicates the status of the workstation: editing, connected, not responding, running, or run completed. Editing indicates that the tester is modifying the entries for the workstation in the session control window; the other states are self-explanatory. Scheduling Method gives the user some synchronization control. Valid methods are None, Wait &amp;lt;time&amp;gt;, After &amp;lt;test station&amp;gt; &amp;lt;time&amp;gt;, and After &amp;lt;label&amp;gt;:&lt;br /&gt;None means that the test script appearing on the same row in the Test Entry column will be executed immediately after the script preceding it completes.&lt;br /&gt;Wait &amp;lt;time&amp;gt; delays the execution of the test script by &amp;lt;time&amp;gt; seconds.&lt;br /&gt;
After &amp;lt;test station&amp;gt; &amp;lt;time&amp;gt; delays execution of the test script until the first script on the workstation identified by &amp;lt;test station&amp;gt; begins running and &amp;lt;time&amp;gt; seconds have passed.&lt;br /&gt;After &amp;lt;label&amp;gt; waits until the test script associated with &amp;lt;label&amp;gt; has finished executing.&lt;br /&gt;&lt;br /&gt;Table 4 depicts a scenario in which two workstations (Hoover and Invicta) are used to test the CollabBillboard application. Each workstation has two scripts to run consecutively. xSITESELECTION deals with the multiuser effort to select a site in the city for the Billboard. SELECTVIEWBB causes the Invicta virtual user to select the View Billboard user role. The Hoover virtual user must wait until the Invicta virtual user has made the user role selection, so an After clause was added to its scheduling method.&lt;br /&gt;&lt;br /&gt;Label | Test Station | Test Entry | Status | Scheduling Method&lt;br /&gt;(none) | HOOVER | CSITESELECTION, SELECTPLACEBB | Running | After SELECT &amp;lt;20&amp;gt;&lt;br /&gt;SELECT | INVICTA | SSITESELECTION, SELECTVIEWBB | Running | (none)&lt;br /&gt;Table 4: Session Control window from SQA Suite™&lt;br /&gt;&lt;br /&gt;5 A CSCW Application Methodology for Testing&lt;br /&gt;The design, implementation, test, and maintenance of CSCW applications are much more difficult than for single user applications [7], [8], [9], [12], [40], [57]. Prior CSCW evaluation efforts were broad based, advocating the examination of both the social and technological aspects of an application. These broad based approaches, combined with the research community’s preference for social evaluation, have left a shortage of specific techniques for the technological evaluation of CSCW software.&lt;br /&gt;&lt;br /&gt;In this chapter, we present a novel methodology for evaluating collaborative software.
In contrast to existing techniques, our approach has a deliberate technological focus. It derives from our observation that the evaluation of a CSCW application can be divided into two stages: single user and multi-user. The single user stage is subdivided into general computing and human-computer interaction testing. The multi-user stage is decomposed into distributed computing and human-human interaction testing. The methodology provides a checklist and description of testing techniques for each stage.&lt;br /&gt;&lt;br /&gt;5.1 Related Work&lt;br /&gt;Early work in CSCW evaluation had a strong psychological component. Researchers focused on the group dynamics generated by the introduction of collaborative software into an organization. The techniques used to study CSCW software were taken primarily from the psychological and social sciences [122].&lt;br /&gt;&lt;br /&gt;5.1.1 Taxonomy of Evaluation Methodologies&lt;br /&gt;A number of methodologies have been applied to the evaluation of CSCW applications. These methodologies were developed independently of the CSCW domain and later applied to it. Ramage [122] organized them into the taxonomy shown in Figure 16. Ethnography is the study of an entire organization in its natural surroundings over a prolonged period of time. Other Qualitative methodologies ask people questions about their experiences and compare and contrast their answers with those of other people surveyed. Psychological methods use either lab experiments that isolate and analyze a very specific phenomenon or analytic approaches that attempt to describe human interaction using formal models. Systems Building focuses on the development of partial or complete systems with the goal of improving them based on the evaluation.
Taking Advice uses oral, video, and written information about an application as an evaluation mechanism.&lt;br /&gt;&lt;br /&gt;Ramage points out that his taxonomy is imperfect. The multi-disciplinary nature of CSCW makes it difficult to create a unifying taxonomy of evaluation methods. Also, given the breadth of his taxonomy, there is some overlap between the methodologies.&lt;br /&gt;&lt;br /&gt;Figure 16: Taxonomy of Evaluation Methodologies [122] (categories: Ethnography, Other Qualitative, Psychological, Systems Building, and Taking Advice; methods shown include ethnomethodology, conversational analysis, interaction analysis, distributed cognition, activity theory, structuration theory, breakdown analysis, interviews, questionnaires, group discussions, lab experiments, analytic approaches, the GOMS approach, iterative prototyping, participatory design, beta testing/customer feedback, heuristic evaluation, user testing, semi-situated ethnography, consumer reports, consultancy reports, and marketing literature)&lt;br /&gt;&lt;br /&gt;5.1.2 CSCW Evaluation Methodologies&lt;br /&gt;A handful of evaluation methodologies were designed specifically for the CSCW domain: the Soft Systems Methodology [123], the Participatory Evaluation Through Redesign and Analysis Methodology [99], the Systemic Evaluation for Stakeholder Learning Methodology [122], and MITRE’s Evaluation Working Group Methodology [95], [124].&lt;br /&gt;&lt;br /&gt;The Soft Systems Methodology, or SSM, is rooted in management and information sciences. The evaluation is conducted without preconceived notions or questions about the nature of the system; the questions and views formulated along the way are an important part of the evaluation process. According to Ramage, although SSM is difficult for novices to use, it is a powerful methodology in the hands of an expert.&lt;br /&gt;&lt;br /&gt;The Participatory Evaluation Through Redesign and Analysis Methodology, or PETRA, was Ramage’s first attempt at an inclusive CSCW evaluation methodology. Multiplicity, the use of multiple evaluation methodologies, was considered important for an effective evaluation. The evaluation should incorporate multiple perspectives: those of the evaluator and of the users. The evaluator will be interested in theoretical models of collaborative activity induced by the system. Users, on the other hand, will be preoccupied with the design of the system and how they feel as participants.&lt;br /&gt;&lt;br /&gt;The Systemic Evaluation for Stakeholder Learning Methodology, or SESL, improved upon PETRA. Like PETRA, SESL recognized the need for multiple evaluation methodologies, but it replaced the “perspective” concept with the “stakeholder” concept. Ramage recognized that many different types of stakeholders are involved in the evaluation of a CSCW system, and that each stakeholder’s view influences the evaluation. A software developer, for example, may consider the system good if it doesn’t have any detectable bugs. A psychologist may consider the same system bad because the floor control policies create tension between the participants.
A manager using the system might dislike it because the floor control policy makes it difficult to control employees. Employees might like the system because it allows them more freedom of expression. The idea of learning was also introduced into the evaluation. Ramage believed that the evaluator, as an active participant, could both learn alongside and learn from the other evaluation participants.&lt;br /&gt;&lt;br /&gt;MITRE’s Evaluation Working Group Methodology, or ECW, was developed to give the group a timely, low-cost technique for evaluating CSCW systems for DARPA. The methodology has two phases. In phase one, the CSCW system is classified according to a CSCW framework developed by ECW. The framework consists of four broad categories: requirements, capabilities, service, and technology. The requirements level specifies the tasks users will perform. Tasks include work tasks such as editing a document, transition tasks such as passing a document on to a reviewer for comments, and social protocols such as reaching a consensus about who controls the floor during a discussion. The capability level describes the functional components a CSCW system needs to support tasks from the requirements level. For example, two systems might both support shared editing, but one allows synchronous editing while the other is strictly asynchronous. The service level describes the general types of applications available to support collaborative activity, such as e-mail, audio, video, and remote windowing. The technology level describes a specific implementation of a service; for example, Eudora is an implementation of an e-mail program.&lt;br /&gt;&lt;br /&gt;The second phase uses pre-written scenarios to evaluate a CSCW application.
Scenarios are not application specific; rather, they are derived from the categories and sub-categories in ECW’s framework. The classification of the application in phase one identifies the scenarios to use in the second phase. Scenarios are supported at each level of the framework and come in two forms: unscripted and scripted. An unscripted scenario provides a general description of what a user is supposed to be doing. This freedom of action allows observers to determine how easy it is for users to complete a set of tasks. Scripted scenarios dictate exactly how a user will use the system to complete a set of tasks. This rigid structure allows observers to compare similar applications, measure the effectiveness of the same application with different groups, and reproduce user activity.&lt;br /&gt;&lt;br /&gt;5.2 A Technology Focused Methodology&lt;br /&gt;Figure 17: Intersecting Technologies of a CSCW Application&lt;br /&gt;Our CSCW Application MEthodoLOgy for Testing (CAMELOT) views the evaluation of a collaborative system from a technology perspective. The methodology decomposes a CSCW application into four intersecting software technologies (see Figure 17): General Computing, Human Computer Interaction, Distributed Computing, and Human-Human Interaction. Techniques derived from the literature are enumerated for each technology. Each technique has a unique label that can be used to classify tests and problems when using CAMELOT to evaluate an application.&lt;br /&gt;&lt;br /&gt;General Computing describes software components that provide general application capabilities.
In its most primitive form, this describes a Turing Machine that takes input, performs operations on the input, and produces output. All software technology falls under this broad category.&lt;br /&gt;Human-Computer Interaction describes components that deal with the interface between the user and the software system. These components include processing user input from voice, mouse, joystick, and keyboard; graphical interfaces like windows, menu bars, push buttons, and text fields; and processing application output like audio, video, and graphics.&lt;br /&gt;Distributed Computing describes components that are responsible for multitasking and multiprocessing in the application at the thread, process, processor, and machine levels. The main focus of distributed computing in the CSCW domain is the management of objects shared across users.&lt;br /&gt;Human-Human Interaction describes components that facilitate interaction between users during application use. Examples include floor control, session management, and shared windowing.&lt;br /&gt;&lt;br /&gt;The methodology is applied in two stages: single user followed by multi-user (see Figure 18). In the single user stage, the evaluation focuses on the single user problems in the application. For the most part, these are described by general computing and human-computer interaction techniques.
Distributed computing and human-human interaction techniques are used to uncover flaws in the multi-user stage.&lt;br /&gt;Figure 18: CAMELOT’s Single/Multiuser Stages (single user: General Computing and Human Computer Interaction; multi-user: Distributed Computing and Human-Human Interaction)&lt;br /&gt;The intersecting nature of single and multi-user technologies means that techniques from one may trigger the development of tests or the discovery of problems in another. In Chapter 7, for example, we shall see that the results of a single user general computing functional test of the client keyboard triggered the development of a multi-user distributed computing test that uncovered a race condition.&lt;br /&gt;&lt;br /&gt;A unique code is associated with each test category. The code provides a classification scheme for the tests used and the problems uncovered during application evaluation. We believe CAMELOT’s techniques cover most of the technology tests an evaluator would want to perform on a CSCW application. As new technologies are introduced, however, we expect the list to expand.&lt;br /&gt;&lt;br /&gt;CAMELOT provides a detailed set of techniques for detecting problems in CSCW software. Our approach is not algorithmic and cannot be fully automated. To guarantee that a program operates correctly, an automated test system would have to try every possible combination of input values or execution paths. Chapter 4’s survey of software testing has shown that researchers have been unable to identify a computationally feasible approach to such exhaustive automated testing.
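&lt;br /&gt;&lt;br /&gt;A quick calculation shows why exhaustive input testing is out of reach even for a tiny interface. The sketch assumes a hypothetical routine with two 32-bit integer arguments (64 bits of input) and an optimistic rate of one billion test cases per second; both figures are illustrative, not measurements.&lt;br /&gt;

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # ignoring leap years


def exhaustive_test_years(input_bits, tests_per_second):
    """Years needed to try every possible value of an input input_bits wide."""
    combinations = 2 ** input_bits
    return combinations / tests_per_second / SECONDS_PER_YEAR


# Hypothetical routine: two 32-bit integer arguments, i.e. 64 bits of input,
# tested at an assumed one billion test cases per second.
years = exhaustive_test_years(64, 1_000_000_000)
print(f"{years:.0f} years")  # about 585 years
```

Adding one more 32-bit argument multiplies the count by about four billion, which is why practical approaches fall back on heuristics.&lt;br /&gt;&lt;br /&gt;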
As with other intractable problems in computer science, practical testing approaches use heuristics to reduce the number of tests that must be performed. As with any heuristic, practical testing approaches like CAMELOT cannot guarantee that all application problems will be found.&lt;br /&gt;&lt;br /&gt;5.3 Single User Evaluation&lt;br /&gt;The first stage in CAMELOT’s evaluation process approaches the CSCW application from the perspective of a single user. There are two types of single user evaluation: General Computing and Human Computer Interaction. General Computing focuses on testing techniques that can be used with any kind of software application. Human Computer Interaction techniques concentrate on identifying problems with the application’s user interface. Single user tests are simpler to create, execute, and analyze than multi-user tests. The insights gained during this stage can be used later in the evaluation. For example, shared objects used in the application can be identified for subsequent race condition and synchronization tests. As another example, the single user performance of an application function can give an indication of how that function will scale.&lt;br /&gt;&lt;br /&gt;5.3.1 General Computing&lt;br /&gt;Decades of research have gone into the discipline of software testing. A survey of this work appears in Chapter 4. The survey includes a definition of software testing and goals that are summarized here. Testing during the software lifecycle is a process by which the behavioral properties of the software are verified.
These properties include:&lt;br /&gt;Correctness: disregarding system resource usage, meets specifications.&lt;br /&gt;Utility: meets the user’s needs.&lt;br /&gt;Reliability: a large mean time between failures.&lt;br /&gt;Robustness: the ability to handle different, hostile operating conditions.&lt;br /&gt;Performance: small response time, large throughput, and low resource utilization.&lt;br /&gt;&lt;br /&gt;There is little evidence that testing methodologies that verify the system at the requirements, specification, or design stages are used outside academia. The extraordinary amount of effort these methods require, even for small software systems, is unattractive to the commercial software community. Taking its cue from the difficulties with early life cycle testing, CAMELOT focuses on execution based testing of software. The structure of CAMELOT’s general testing methodology comes from Myers’s classic work “The Art of Software Testing” [14]. The book presents a common sense approach to the verification of software systems that has stood the test of time in both the commercial and academic communities. The techniques listed in Table 5 are used in later stages of the software life cycle.
Detailed descriptions of these tests can be found in Section 4.2.6.&lt;br /&gt;&lt;br /&gt;CAMELOT Code: Technique, by Development Cycle&lt;br /&gt;Implementation [14]&lt;br /&gt;GC.IM.1 Functional Test&lt;br /&gt;Integration [10]&lt;br /&gt;GC.IN.1 Bottom Up&lt;br /&gt;GC.IN.2 Top Down&lt;br /&gt;GC.IN.3 Sandwich&lt;br /&gt;System Test [14]&lt;br /&gt;GC.ST.1 Facility Test&lt;br /&gt;GC.ST.2 Volume Test&lt;br /&gt;GC.ST.3 Stress Test&lt;br /&gt;GC.ST.4 Security Test&lt;br /&gt;GC.ST.5 Performance Test&lt;br /&gt;GC.ST.6 Configuration Test&lt;br /&gt;GC.ST.7 Memory Test&lt;br /&gt;GC.ST.8 Compatibility/Conversion Test&lt;br /&gt;GC.ST.9 Install Test&lt;br /&gt;GC.ST.10 Recovery Test&lt;br /&gt;GC.ST.11 Documentation Test&lt;br /&gt;GC.ST.12 Procedure Test&lt;br /&gt;GC.ST.13 Acceptance Test&lt;br /&gt;Table 5: General Computing Techniques from [14] and [10]&lt;br /&gt;&lt;br /&gt;As discussed in Section 4.2.4, the two principal forms of functional testing are black box and white box. Though theoretically sound, both techniques suffer from an impractical number of test cases generated by the combinatorics of data sets and statement paths. Despite over twenty years of work, researchers have been unable to reduce the number of test cases without sacrificing the accuracy of the verification. CAMELOT advocates the use of black box and white box testing where it is practical and where it is likely to find the most errors. Which routines will be tested using these techniques is a subjective judgment made by the evaluator. For example, if the logic of a routine was difficult for a developer to encode, then it is a good candidate for white box testing. If a critical routine was designed to handle a wide range of input values, but only a small subset is typically provided, then it is a good candidate for a black box test.
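&lt;br /&gt;&lt;br /&gt;For the black box case, one common way to choose inputs is boundary value analysis: probe just outside, on, and just inside each edge of the valid range, plus a nominal value. The Python sketch below assumes a hypothetical routine accepts_quantity whose specification says quantities from 1 to 100 are valid, and shows how the probes expose an off-by-one bug in a second, deliberately broken version; none of these names come from CAMELOT itself.&lt;br /&gt;

```python
def boundary_values(lo, hi):
    """Boundary-value probes for an integer input range lo..hi (inclusive)."""
    nominal = (lo + hi) // 2
    return [lo - 1, lo, lo + 1, nominal, hi - 1, hi, hi + 1]


def accepts_quantity(qty, lo=1, hi=100):
    """Hypothetical routine under test: valid quantities are lo..hi inclusive."""
    return qty in range(lo, hi + 1)


def buggy_accepts_quantity(qty, lo=1, hi=100):
    """Same routine with an off-by-one bug: range() excludes its stop value."""
    return qty in range(lo, hi)


def find_failures(routine, lo=1, hi=100):
    """Return the probe values whose result disagrees with the specification."""
    failures = []
    for value in boundary_values(lo, hi):
        # Oracle from the specification: only the two out-of-range probes
        # should be rejected.
        expected = value not in (lo - 1, hi + 1)
        if routine(value) != expected:
            failures.append(value)
    return failures


print(find_failures(accepts_quantity))        # []
print(find_failures(buggy_accepts_quantity))  # [100]
```

The oracle is written from the specification rather than copied from the implementation, which is what lets the probe at 100 reveal the bug.&lt;br /&gt;&lt;br /&gt;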
If quality assurance personnel are performing the evaluation, interviews with the development team can help identify candidates for black and white box testing.&lt;br /&gt;&lt;br /&gt;Performance tests are an important mechanism for evaluating an application. There are two kinds of performance issues in CSCW systems: single user and multi-user. For general testing, the evaluator should focus on the response time, throughput, and resource utilization of single user scenarios. Multi-user performance will be discussed in detail in Section 5.4.1.&lt;br /&gt;&lt;br /&gt;Although the techniques listed in Table 5 are organized by life cycle stage, the tests can be performed at any point in the cycle. For example, a security test might be performed during the implementation phase to prototype an application’s security features.&lt;br /&gt;&lt;br /&gt;5.3.2 Human Computer Interaction&lt;br /&gt;As we have seen in Section 4.3, a great deal of work by the academic and commercial communities has focused on testing human computer interaction. These efforts are concentrated in two main areas: general computing and usability.&lt;br /&gt;&lt;br /&gt;5.3.2.1 General Computing ∩ Human Computer Interaction Techniques&lt;br /&gt;General computing intersects human computer interaction by defining the correctness of the user interface as “proper behavior of the graphical user interface and proper computation of the underlying application.” [88] A general computing approach to human computer interaction testing exercises the application using the techniques from Section 5.3.1. Yip [125] and Shneiderman [40] provide some additional techniques, listed in Table 6. Automated record/playback tools like those mentioned in Sections 4.3 and 0 are frequently used when taking a general computing test approach.
These tools allow the evaluator to create a suite of regression or smoke tests that can be run on the application to ensure the stability of a new code release [86]. These tests make sure that a new version of the software doesn’t corrupt the functionality of a previous version.&lt;br /&gt;&lt;br /&gt;CAMELOT Code: Technique&lt;br /&gt;GC/HCI.1 Missing, invisible, unreachable components [125]; derived from (GC.IN.1 ∩ HCI) → GC/HCI.1&lt;br /&gt;GC/HCI.2 Failure to respond to user inputs [125]; derived from (GC.IN.1 ∩ HCI) → GC/HCI.2&lt;br /&gt;GC/HCI.3 Cross-wired components (e.g., a button press displays the wrong component) [125]; derived from (GC.IN.1 ∩ HCI) → GC/HCI.3&lt;br /&gt;GC/HCI.4 Incompleteness (e.g., a close box present in some windows but not others) [125]; derived from (GC.ST.13 ∩ HCI) → GC/HCI.4&lt;br /&gt;GC/HCI.5 Response time [40]; derived from (GC.ST.5 ∩ HCI) → GC/HCI.5&lt;br /&gt;Table 6: General Computing ∩ Human Computer Interaction Techniques from [125] and [40]&lt;br /&gt;&lt;br /&gt;General computing tests of user interfaces suffer from a combinatorial explosion of test cases due to the number of different paths a tester can take to exercise the same application function [90]. Like black and white box tests, CAMELOT’s approach to UI path testing requires evaluator judgment. Path tests should be conducted where the evaluator feels they will be the most fruitful in uncovering application flaws.&lt;br /&gt;&lt;br /&gt;5.3.2.2 Usability Techniques&lt;br /&gt;Usability testing evaluates a software application from the user’s perspective.
The correctness of an application is measured in terms of the user’s effectiveness and feelings about the application, rather than by the general computing techniques from Section 5.3.2.1. Over the past two decades, Shneiderman has produced and revised a thorough survey of user interface development techniques [40]. CAMELOT’s usability techniques, shown in Table 7, are taken from this survey. Usability criteria represent a general set of questions the evaluator should ask about a user’s use of the application. The Golden Rules for Application Design are eight guidelines for the design of any application with a user interface. User Interface Technology Guidelines is a list of specific techniques organized by user interface technology. Rather than repeating the guideline specifics here, the reader is referred to the original text for more detail [40].&lt;br /&gt;&lt;br /&gt;CAMELOT Code: Technique&lt;br /&gt;Usability Criteria&lt;br /&gt;HCI.UC.1 Time to learn system: How long does it take for a typical user to learn to use the system?&lt;br /&gt;HCI.UC.2 Performance of tasks: How long does it take for a user to perform a typical set of tasks?&lt;br /&gt;HCI.UC.3 User errors: How many and what kind occur while performing a typical set of tasks?&lt;br /&gt;HCI.UC.4 Retention over time: Is it easy to remember how to use the system with infrequent use?&lt;br /&gt;HCI.UC.5 Subjective satisfaction: Do users like the system?&lt;br /&gt;Golden Rules for Application Design&lt;br /&gt;HCI.GR.1 Strive for consistency.&lt;br /&gt;HCI.GR.2 Enable frequent users to use shortcuts.&lt;br /&gt;HCI.GR.3 Offer informative feedback.&lt;br /&gt;HCI.GR.4 Design dialogs to yield closure.&lt;br /&gt;HCI.GR.5 Offer simple error handling.&lt;br /&gt;HCI.GR.6 Permit easy reversal of actions.&lt;br /&gt;HCI.GR.7 Support internal locus of control.&lt;br /&gt;HCI.GR.8 Reduce short-term memory load.&lt;br /&gt;User Interface Technology Guidelines&lt;br /&gt;HCI.UITG.1 Data Display&lt;br /&gt;HCI.UITG.2 Getting the User’s Attention&lt;br /&gt;HCI.UITG.3 Data Entry&lt;br /&gt;HCI.UITG.4 Menu Selection&lt;br /&gt;HCI.UITG.5 Form Fillin Design&lt;br /&gt;HCI.UITG.6 Command Languages&lt;br /&gt;HCI.UITG.7 Direct Manipulation&lt;br /&gt;HCI.UITG.8 Interaction Devices&lt;br /&gt;HCI.UITG.9 Error Messages&lt;br /&gt;HCI.UITG.10 Color&lt;br /&gt;Table 7: Usability Techniques from [40]&lt;br /&gt;&lt;br /&gt;5.4 Multi-user Evaluation&lt;br /&gt;The second stage in CAMELOT’s evaluation process approaches the CSCW application from a multi-user perspective. This stage builds on the single user stage: shared objects identified there become candidates for race condition and synchronization tests, and single user performance measurements give an indication of how application functions will scale.&lt;br /&gt;&lt;br /&gt;5.4.1 Distributed Computing&lt;br /&gt;Distributed computing encompasses software written for multithreaded, multitasking, multiprocessor, or multimachine architectures. The technology is concerned with communication between two or more routines executing in parallel. Communication consists primarily of requests for, and updates about, some form of shared data.
Distributed&lt;br /&gt;computing software suffers from four common problems: race conditions, deadlock,&lt;br /&gt;temporal consistency violations, and poor scalability.&lt;br /&gt;5.4.1.1 Race Condition&lt;br /&gt;A race condition occurs when two or more routines executing in parallel are allowed to manipulate&lt;br /&gt;the same data instance simultaneously without proper control. Lack of controlled&lt;br /&gt;access to shared data may result in data corruption. A classic illustration of this is the&lt;br /&gt;ATM withdrawal example from the database literature [126]. Bob and Becky share a joint&lt;br /&gt;checking account. At one ATM Bob views an account balance of $100.00 and makes a&lt;br /&gt;withdrawal of $50.00. Simultaneously, at another ATM, Becky views the account balance&lt;br /&gt;of $100.00 and makes a deposit of $50.00. The ending balance should read $100.00, but&lt;br /&gt;depending on the order in which the ATM actions are processed, the balance could also read $50.00 or&lt;br /&gt;$150.00: each ATM writes back a balance computed from the $100.00 it displayed, so whichever write is processed last overwrites the other. Race conditions are notoriously difficult to uncover and debug during testing&lt;br /&gt;because of subtle timing dependencies.&lt;br /&gt;5.4.1.2 Deadlock&lt;br /&gt;Synchronization eliminates race conditions by restricting access to shared data in a&lt;br /&gt;controlled manner using synchronization primitives such as mutual exclusion, semaphores,&lt;br /&gt;or message passing [127]. Consider the following scenario: Bob prepares to access the&lt;br /&gt;shared checking account. First, the synchronization primitive guarding the account is&lt;br /&gt;checked. The primitive indicates the account is not being used. Bob is granted permission&lt;br /&gt;to access the account. Simultaneously, Becky prepares to access the shared checking&lt;br /&gt;account. Again, the synchronization primitive is checked. The primitive indicates the&lt;br /&gt;checking account is being used. Becky waits for the account to be free. 
When Bob&lt;br /&gt;finishes, an indication is sent to the synchronization primitive that the account is now free,&lt;br /&gt;and Becky can then access the account.&lt;br /&gt;Synchronization introduces the potential for deadlock. Deadlock can occur when two or&lt;br /&gt;more parallel routines share two or more synchronization primitives. Extending the previous&lt;br /&gt;scenario, suppose Bob holds the checking account and Becky holds the savings account. If Bob cannot release access to the checking account until he has been granted&lt;br /&gt;access to the savings account, and Becky cannot release access to the savings account until&lt;br /&gt;she has been granted access to the checking account, then deadlock occurs. Both will wait&lt;br /&gt;indefinitely for the other to exit the critical section. Deadlock can be avoided through&lt;br /&gt;careful software design, for example by having every routine acquire shared primitives in the same fixed order. Like race conditions, deadlock is notoriously difficult to detect&lt;br /&gt;because of subtle timing dependencies. It is also difficult to debug because of complicated&lt;br /&gt;dependencies between parallel routines and synchronization primitives.&lt;br /&gt;5.4.1.3 Temporal Consistency&lt;br /&gt;Temporal consistency is the ability to correctly order messages within the CSCW&lt;br /&gt;application. If processA sends message0 at time t0 and processB later sends message1 at&lt;br /&gt;time t1, then processC should process message0 before message1, even if&lt;br /&gt;the messages arrive at processC out of order. Temporal consistency is especially&lt;br /&gt;important when providing communication, feedback for the manipulation of shared&lt;br /&gt;objects, and user awareness. For example, consider a chat system with three users. userC&lt;br /&gt;is observing a conversation between userA and userB. userA types “Thank you!” and&lt;br /&gt;userB types “You’re welcome” in response. Because of a network delay between userA&lt;br /&gt;and userC, userB’s message arrives first, so userC sees the reply before the message it answers. Another example uses three users, user&lt;br /&gt;awareness, and a shared editing system. 
userA types the word “dessertation”. userB&lt;br /&gt;corrects the word by moving the cursor after the first ‘e’ and changing it to ‘i’. Again,&lt;br /&gt;because of a network delay between userA and userC, userB’s corrections to the word&lt;br /&gt;arrive at userC before the actual word arrives. Testing for temporal consistency problems&lt;br /&gt;involves techniques similar to those used for race conditions and deadlock. Network delay&lt;br /&gt;can be introduced by artificially consuming bandwidth, or by instrumenting the application&lt;br /&gt;to introduce artificial message delays.&lt;br /&gt;5.4.1.4 Scalability&lt;br /&gt;Scalability is also an important consideration in distributed computing. A system’s ability to&lt;br /&gt;scale as the number of users increases is measured using performance evaluation&lt;br /&gt;techniques. Although these techniques can be described generally, the actual evaluation is&lt;br /&gt;application specific. Jain’s well-known text “The Art of Computer Systems Performance Analysis” presents&lt;br /&gt;a systematic approach that applies to any application [128].&lt;br /&gt;The key to the performance evaluation of CSCW applications is a thorough understanding&lt;br /&gt;of the application’s architecture and intended use. This understanding will reveal services&lt;br /&gt;that are candidates for scalability testing. Creating user scenarios that represent common&lt;br /&gt;user activity and then running these scenarios on the system using live or virtual users will&lt;br /&gt;place the system under a “typical” load. Measuring the system-wide resource utilization&lt;br /&gt;for this “typical” load can give the evaluator clues about how well the system will scale.&lt;br /&gt;Using a simulator to stress test a service will give the evaluator information about the&lt;br /&gt;service’s throughput, another measure of scalability. 
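The stress-testing step can be sketched in Python under stated assumptions: a hypothetical `CentralizedStateService` whose single lock stands in for a shared-state process, a fixed simulated cost per request (`work_seconds`), and scripted virtual users driven by `concurrent.futures.ThreadPoolExecutor`. None of these names come from CAMELOT or Rebecca; the harness simply reports completed requests and throughput for a given number of virtual users.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class CentralizedStateService:
    """Hypothetical stand-in for a single shared-state process."""

    def __init__(self, work_seconds=0.001):
        self._lock = threading.Lock()      # one lock guards all shared state
        self._work_seconds = work_seconds  # simulated cost of one request
        self.updates = 0

    def update(self):
        with self._lock:  # every request is serialized through this lock
            time.sleep(self._work_seconds)
            self.updates += 1


def virtual_user(service, requests):
    """One scripted virtual user issuing a fixed number of update requests."""
    for _ in range(requests):
        service.update()
    return requests


def stress_test(num_users, requests_per_user, work_seconds=0.001):
    """Run virtual users concurrently; return (completed requests, requests/second)."""
    service = CentralizedStateService(work_seconds)
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=num_users) as pool:
        futures = [pool.submit(virtual_user, service, requests_per_user)
                   for _ in range(num_users)]
        completed = sum(f.result() for f in futures)
    elapsed = time.perf_counter() - start
    assert completed == service.updates  # sanity check: no request was lost
    return completed, completed / elapsed


if __name__ == "__main__":
    for users in (1, 2, 4, 8):
        done, throughput = stress_test(users, requests_per_user=20)
        print(f"{users} virtual users: {done} requests, {throughput:.0f} requests/s")
```

Because every request is serialized through one lock, the reported throughput should stay roughly flat, near 1 / work_seconds requests per second, however many virtual users are added. Against a real service, `service.update()` would be replaced by an application call, and the evaluator would look for the user count at which throughput stops growing.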
Finally, the simulator can also be used&lt;br /&gt;to add users to the system until the performance of various services begins to&lt;br /&gt;degrade. This is a particularly powerful approach if used in conjunction with live users.&lt;br /&gt;The distributed architectures of CSCW systems fall between two extremes: centralized and&lt;br /&gt;decentralized. A centralized architecture concentrates the shared state in a single process&lt;br /&gt;on a single machine. When a process in the system manipulates shared data, it makes a&lt;br /&gt;request to the shared state process. Centralization simplifies access control for shared data&lt;br /&gt;by placing synchronization logic in a single process. Scalability problems can occur as an&lt;br /&gt;increasing number of users compete for the attention of the single state process.&lt;br /&gt;A decentralized architecture replicates shared state within each user process. A process&lt;br /&gt;manipulates shared data locally, and the results of the manipulation are broadcast to other&lt;br /&gt;processes. Decentralization has scalability advantages because the cost of data&lt;br /&gt;manipulation is distributed across many processes. Shared data access control, however, is&lt;br /&gt;more challenging because the synchronization primitives must also be decentralized.&lt;br /&gt;Tightly coupled systems provide near-instantaneous notification to all processes when&lt;br /&gt;shared data changes. Loosely coupled systems do not have strict temporal requirements.&lt;br /&gt;Tightly coupled systems have to be examined closely for scalability problems. The two&lt;br /&gt;areas to investigate are the frequency and size of the messages necessary to maintain the&lt;br /&gt;coupling. As the number of users in the system increases, the communication necessary&lt;br /&gt;for state change notification will also rise. 
At a certain point, this communication will&lt;br /&gt;consume all available network bandwidth.&lt;br /&gt;Another area to investigate is the impact of network delay on tightly coupled systems. In a&lt;br /&gt;typical development environment, there is almost no network delay because the equipment&lt;br /&gt;used to develop the system is on the same LAN. If the CSCW application is intended to&lt;br /&gt;be deployed on the Internet across LANs, WANs, and backbones, then the application should&lt;br /&gt;be tested with network delays. Network delays can create untested timing configurations&lt;br /&gt;that trigger race conditions and deadlock. Network delays can be inexpensively simulated&lt;br /&gt;on a LAN by reducing bandwidth (for example, by downloading a large file on the LAN during a test) or&lt;br /&gt;by instrumenting the application with built-in messaging delays.&lt;br /&gt;Loosely coupled systems can also suffer from race conditions. In a loosely coupled&lt;br /&gt;system, a shared object is manipulated locally. 
Updates to the object are sent to the rest of&lt;br /&gt;the system intermittently, perhaps as the result of a save, refresh, or update command.&lt;br /&gt;A race condition occurs when two users manipulate the same object simultaneously.&lt;br /&gt;Typically, the system view will reflect the last user update of the shared object. An&lt;br /&gt;example of this is loosely coupled editing of a text document. If userA and userB are&lt;br /&gt;editing the same document, then one user’s edits will be lost: the system will only&lt;br /&gt;retain the document state from the last save command, overwriting earlier&lt;br /&gt;saves.&lt;br /&gt;CAMELOT Code Technique&lt;br /&gt;DC.RC.1 Race Condition&lt;br /&gt;DC.RC.2 Centralized Architecture&lt;br /&gt;DC.RC.3 Decentralized Architecture&lt;br /&gt;DC.RC.4 Loosely Coupled&lt;br /&gt;DC.D.1 Deadlock&lt;br /&gt;DC.D.2 Centralized Architecture&lt;br /&gt;DC.D.3 Decentralized Architecture&lt;br /&gt;DC.TC.1 Temporal Consistency&lt;br /&gt;DC.TC.2 Network Delay&lt;br /&gt;DC.S.1 Scalability&lt;br /&gt;DC.S.2 User Scenario&lt;br /&gt;DC.S.3 Stress User Scenario&lt;br /&gt;DC.S.4 Centralized Architecture&lt;br /&gt;DC.S.5 Decentralized Architecture&lt;br /&gt;DC.S.6 Tightly Coupled&lt;br /&gt;DC.S.7 Tightly Coupled/Network Delay&lt;br /&gt;DC.S.8 Loosely Coupled&lt;br /&gt;DC.S.9 Synchronization&lt;br /&gt;Table 8: Distributed Computing Techniques&lt;br /&gt;5.4.1.5 Distributed Computing Techniques&lt;br /&gt;It is critical that the evaluator have a deep understanding of the system’s architecture to&lt;br /&gt;test for race condition, deadlock, and scalability problems. 
In particular, the evaluator&lt;br /&gt;should understand the types of shared data in the system, the architecture that maintains&lt;br /&gt;the data, and the user actions that trigger manipulation of the data.&lt;br /&gt;CAMELOT Code Technique&lt;br /&gt;GC/DC.1 Stress testing: multiple users joining/leaving the&lt;br /&gt;application simultaneously. There is a good chance this&lt;br /&gt;will happen during realistic application use. Derived from:&lt;br /&gt;(GC.ST.3 ∩DC) →GC/DC.1&lt;br /&gt;GC/DC.2 Stress testing: single-user stress tests on shared&lt;br /&gt;objects can give valuable information about scalability&lt;br /&gt;when single-user throughput is extrapolated to multiple users.&lt;br /&gt;Derived from: (GC.ST.3 ∩DC) →GC/DC.2&lt;br /&gt;GC/DC.3 Volume testing: object size can impact scalability. As the size of&lt;br /&gt;shared objects increases, so will network bandwidth and&lt;br /&gt;possibly CPU processing requirements. This can lower&lt;br /&gt;throughput. Derived from: (GC.ST.2 ∩DC) →GC/DC.3&lt;br /&gt;GC/DC.4 Compatibility testing: be careful of situations where&lt;br /&gt;different machines have incompatible versions of the&lt;br /&gt;components that comprise the application. The more&lt;br /&gt;portable the machine, the greater the likelihood it will&lt;br /&gt;get out of sync with the distributed application.&lt;br /&gt;Derived from: (GC.ST.8 ∩DC) →GC/DC.4&lt;br /&gt;GC/DC.5 Subclass of distributed compatibility testing: different&lt;br /&gt;versions of the application can have different versions&lt;br /&gt;of on-line documentation. This can lead to confusion&lt;br /&gt;between cooperating users about how to use the&lt;br /&gt;application. Derived from: (GC.ST.8 ∩GC.ST.11 ∩DC) →&lt;br /&gt;GC/DC.5&lt;br /&gt;GC/DC.6 Recovery testing: users joining and leaving the&lt;br /&gt;application at unanticipated points in the execution of&lt;br /&gt;the application. 
Derived from: (GC.ST.10 ∩DC) →GC/DC.6&lt;br /&gt;Table 9: General Computing ∩Distributed Computing Techniques&lt;br /&gt;Table 8 reduces this section’s distributed computing discussion to a set of techniques.&lt;br /&gt;In addition to pure distributed computing, Table 9 introduces techniques resulting from&lt;br /&gt;the intersection with General Computing.&lt;br /&gt;Table 10 presents techniques resulting from the intersection of Human Computer&lt;br /&gt;Interaction and Distributed Computing:&lt;br /&gt;CAMELOT Code Technique&lt;br /&gt;HCI/DC.1 Race condition testing: modern GUIs are multithreaded, so&lt;br /&gt;race conditions have to be tested. In particular, watch&lt;br /&gt;out for situations where the user makes a UI selection&lt;br /&gt;but the results of the selection may be delayed for some&lt;br /&gt;period due to the processing involved. During this&lt;br /&gt;delay the user may make the same selection again or&lt;br /&gt;other selections because of anxiety about the slow&lt;br /&gt;response time of the first selection. This can place&lt;br /&gt;the application in an unexpected state as it tries to&lt;br /&gt;execute functionality associated with both UI&lt;br /&gt;selections. Derived from: (HCI ∩DC.RC.1) →HCI/DC.1&lt;br /&gt;HCI/DC.2 Deadlock testing: modern GUIs are multithreaded, so&lt;br /&gt;deadlock has to be tested. Derived from: (HCI ∩&lt;br /&gt;DC.D.1) →HCI/DC.2&lt;br /&gt;GC/HCI/DC.1 Response time testing: The introduction of a network&lt;br /&gt;delay will quickly reveal response time problems with&lt;br /&gt;tightly coupled GUI components. Derived from:&lt;br /&gt;((GC.ST.5∩HCI).1 ∩DC.S.7) →GC/HCI/DC.1&lt;br /&gt;Table 10: Human Computer Interaction ∩Distributed Computing Techniques&lt;br /&gt;5.4.2 Human-Human Interaction&lt;br /&gt;Human-Human Interaction deals with functionality supporting interaction between&lt;br /&gt;application users. 
Much of this is social, and a great deal of research has focused on&lt;br /&gt;studying the social aspects of CSCW systems [122]. As mentioned earlier, CAMELOT&lt;br /&gt;does not focus on higher levels of social interaction. However, there are core CSCW&lt;br /&gt;technologies that support human-human interaction that CAMELOT can be used to&lt;br /&gt;evaluate. These technologies are the software components that facilitate communication,&lt;br /&gt;coordination, coupling, privacy, and user awareness.&lt;br /&gt;Communication allows one user to converse with one or more users in the application.&lt;br /&gt;Communication may be voice, visual, text, or gesture based. Unless users share the same&lt;br /&gt;location, the intersection between distributed computing and human-human interaction is&lt;br /&gt;critical. Some form of network will be responsible for transporting user&lt;br /&gt;communications. In the case of high-bandwidth communication such as voice or visual,&lt;br /&gt;the tester should ensure that there is enough network capacity. This is particularly&lt;br /&gt;important if the application was developed in a lab with a high-speed LAN but is to be&lt;br /&gt;deployed across multiple LANs, WANs, or the Internet. The impact of bandwidth&lt;br /&gt;consumption from user communication on the rest of the application should also be&lt;br /&gt;studied. Revealing tests will be ones that exercise tightly coupled functions (such as remote&lt;br /&gt;cursor movement) during user communication. Scalability testing is also important. More&lt;br /&gt;users mean more communication and greater bandwidth consumption.&lt;br /&gt;Tests that examine communication in combination with other technologies discussed in&lt;br /&gt;this section may also be necessary. Are coordination mechanisms available in the&lt;br /&gt;application to control communication? 
For example, can two users talk at the same time?&lt;br /&gt;How tightly coupled is the act of communication to its delivery? If users have an&lt;br /&gt;expectation of instantaneous communication, what is the impact of network delays? If the&lt;br /&gt;system supports private or anonymous communication, can it be subverted? When&lt;br /&gt;communication occurs, can the user determine whom it came from?&lt;br /&gt;Coordination of interaction focuses on how the software allows users to work together.&lt;br /&gt;Examples of coordination include floor control policies and social protocols. From a&lt;br /&gt;social perspective, poorly designed coordination will be ignored [129] or, in the worst case,&lt;br /&gt;may interfere with the collaborative process [32]. From a technology standpoint,&lt;br /&gt;coordination can be broken down into components that provide group control and&lt;br /&gt;feedback about that control within the application. Human-computer interaction&lt;br /&gt;evaluation of these components is necessary. Data associated with coordination can also&lt;br /&gt;be considered a form of shared object, so distributed computing evaluation is also&lt;br /&gt;necessary. For example, the “floor” can be considered a shared object. What happens if&lt;br /&gt;two users try to grab control of the floor at the same time?&lt;br /&gt;Coupling defines how users see changes that others make to the shared workspace. Tight&lt;br /&gt;coupling provides more frequent change updates; loose coupling provides less frequent&lt;br /&gt;updates. There is no single “correct” coupling for CSCW. What kind of coupling should&lt;br /&gt;be used varies from application to application, and even within a single application [57].&lt;br /&gt;Human computer interaction response time tests and distributed computing scalability&lt;br /&gt;tests are useful for evaluating coupling.&lt;br /&gt;Security, privacy and trust are important to cooperating users. 
Users should be able to work in&lt;br /&gt;a private area where they feel confident that their activities are protected from others.&lt;br /&gt;Access control for individual or group information should be available to users [130]. In&lt;br /&gt;situations where anonymous input is supported, users should feel assured of their&lt;br /&gt;anonymity [131]. General computing security tests help evaluate these issues.&lt;br /&gt;Awareness of other users provides a social context in which work is conducted. The&lt;br /&gt;realization that other users are participating in the application has been shown to have a&lt;br /&gt;powerful motivating effect [16]. Many kinds of user awareness capabilities have been&lt;br /&gt;added to CSCW applications, including activity graphs, telepointers and cursors, user lists,&lt;br /&gt;multi-user scrollbars, radar views, and fisheye views [27]. As with coupling, there is no&lt;br /&gt;single correct form of user awareness. The relevant testing issues span general computing,&lt;br /&gt;human computer interaction, and distributed computing. Given the information richness&lt;br /&gt;of some forms of user awareness, the evaluator should pay particular attention to general&lt;br /&gt;computing performance, human computer interaction response time, and distributed&lt;br /&gt;computing scalability problems as the number of users increases. Performance and&lt;br /&gt;response time problems can occur when the resources of the local user’s system are&lt;br /&gt;consumed by providing rich feedback about remote user activity. Scalability problems&lt;br /&gt;occur when feedback about user activity competes for network bandwidth with updates to&lt;br /&gt;shared data modified by user activity.&lt;br /&gt;Table 11 summarizes the human-human interaction techniques. 
For lookup convenience,&lt;br /&gt;Table 12 reorganizes Table 11 by CAMELOT code.&lt;br /&gt;CAMELOT Code Technique&lt;br /&gt;Communication&lt;br /&gt;HHI.CM.1 Network bandwidth sufficient to support user&lt;br /&gt;communication.&lt;br /&gt;HHI.CM.2 Impact of user communication on other communication in&lt;br /&gt;the application.&lt;br /&gt;HHI.CM.3 Impact of user communication on tightly coupled&lt;br /&gt;functions.&lt;br /&gt;DC/HHI.1 Distributed computing scalability tests. Derived&lt;br /&gt;from: (DC.S.1 ∩HHI.CM) →DC/HHI.1&lt;br /&gt;DC/HHI.2 Distributed computing temporal consistency tests.&lt;br /&gt;Derived from: (DC.TC.1 ∩HHI.CM) →DC/HHI.2&lt;br /&gt;HHI.1 User communication and coordination. Derived from:&lt;br /&gt;(HHI.CM ∩HHI.CD) →HHI.1&lt;br /&gt;HHI.2 User communication and coupling. Derived from:&lt;br /&gt;(HHI.CM ∩HHI.CP) →HHI.2&lt;br /&gt;HHI.3 User communication and security. Derived from:&lt;br /&gt;(HHI.CM ∩HHI.S) →HHI.3&lt;br /&gt;Coordination&lt;br /&gt;HCI/HHI.2 Human computer interaction issues related to group&lt;br /&gt;control. Derived from: (HCI ∩HHI.CD) →HCI/HHI.2&lt;br /&gt;DC/HHI.3 Distributed computing race condition and deadlock&lt;br /&gt;tests for coordination shared objects. Derived from:&lt;br /&gt;(DC.RC.1 ∩DC.D.1 ∩HHI.CD) →DC/HHI.3&lt;br /&gt;Coupling&lt;br /&gt;GC/HCI/HHI.1 Human computer interaction response time tests.&lt;br /&gt;Derived from: (GC/HCI.5 ∩HHI.CP) →GC/HCI/HHI.1&lt;br /&gt;DC/HHI.4 Distributed computing scalability tests. Derived&lt;br /&gt;from: (DC.S ∩HHI.CP) →DC/HHI.4&lt;br /&gt;DC/HHI.5 Distributed computing temporal consistency tests.&lt;br /&gt;Derived from: (DC.TC ∩HHI.CP) →DC/HHI.5&lt;br /&gt;Security&lt;br /&gt;GC/HHI.1 General computing security tests. Derived from:&lt;br /&gt;(GC.ST.4 ∩HHI.S) →GC/HHI.1&lt;br /&gt;Awareness&lt;br /&gt;GC/HHI.2 General computing performance tests. 
Derived from:&lt;br /&gt;(GC.ST.5 ∩GC/HCI.5 ∩HHI.A) →GC/HHI.2&lt;br /&gt;GC/HCI/HHI.2 Human computer interaction response time tests.&lt;br /&gt;Derived from: (GC/HCI.5 ∩HHI.A) →GC/HCI/HHI.2&lt;br /&gt;DC/HHI.6 Distributed computing scalability tests. Derived&lt;br /&gt;from: (DC.S ∩HHI.A) →DC/HHI.6&lt;br /&gt;DC/HHI.7 Distributed computing temporal consistency tests.&lt;br /&gt;Derived from: (DC.TC ∩HHI.A) →DC/HHI.7&lt;br /&gt;Table 11: Human-Human Interaction Techniques&lt;br /&gt;CAMELOT Code Category Technique&lt;br /&gt;HHI.1 Communication/Coordination User communication and&lt;br /&gt;coordination. Derived&lt;br /&gt;from: (HHI.CM ∩HHI.CD) →&lt;br /&gt;HHI.1&lt;br /&gt;HHI.2 Communication/Coupling User communication and&lt;br /&gt;coupling. Derived from:&lt;br /&gt;(HHI.CM ∩HHI.CP) →HHI.2&lt;br /&gt;HHI.3 Communication/Security User communication and&lt;br /&gt;security. Derived from:&lt;br /&gt;(HHI.CM ∩HHI.S) →HHI.3&lt;br /&gt;HHI.CM.1 Communication Network bandwidth&lt;br /&gt;sufficient to support user&lt;br /&gt;communication.&lt;br /&gt;HHI.CM.2 Communication Impact of user&lt;br /&gt;communication on other&lt;br /&gt;communication in the&lt;br /&gt;application.&lt;br /&gt;HHI.CM.3 Communication Impact of user&lt;br /&gt;communication on tightly&lt;br /&gt;coupled functions.&lt;br /&gt;DC/HHI.1 Communication Distributed computing&lt;br /&gt;scalability tests.&lt;br /&gt;Derived from: (DC.S.1 ∩&lt;br /&gt;HHI.CM) →DC/HHI.1&lt;br /&gt;DC/HHI.2 Communication Distributed computing&lt;br /&gt;temporal consistency&lt;br /&gt;tests. Derived from:&lt;br /&gt;(DC.TC.1 ∩HHI.CM) →&lt;br /&gt;DC/HHI.2&lt;br /&gt;DC/HHI.3 Coordination Distributed computing race&lt;br /&gt;condition and deadlock&lt;br /&gt;tests for coordination&lt;br /&gt;shared objects. 
Derived&lt;br /&gt;from: (DC.RC.1 ∩DC.D.1 ∩&lt;br /&gt;HHI.CD) →DC/HHI.3&lt;br /&gt;DC/HHI.4 Coupling Distributed computing&lt;br /&gt;scalability tests.&lt;br /&gt;Derived from: (DC.S ∩&lt;br /&gt;HHI.CP) →DC/HHI.4&lt;br /&gt;DC/HHI.5 Coupling Distributed computing&lt;br /&gt;temporal consistency&lt;br /&gt;tests. Derived from:&lt;br /&gt;(DC.TC ∩HHI.CP) →&lt;br /&gt;DC/HHI.5&lt;br /&gt;DC/HHI.6 Awareness Distributed computing&lt;br /&gt;scalability tests.&lt;br /&gt;Derived from: (DC.S ∩&lt;br /&gt;HHI.A) →DC/HHI.6&lt;br /&gt;DC/HHI.7 Awareness Distributed computing temporal consistency&lt;br /&gt;tests. Derived from: (DC.TC ∩HHI.A) →&lt;br /&gt;DC/HHI.7&lt;br /&gt;GC/HHI.1 Security General computing security tests. Derived&lt;br /&gt;from: (GC.ST.4 ∩HHI.S) →GC/HHI.1&lt;br /&gt;GC/HHI.2 Awareness General computing performance tests.&lt;br /&gt;Derived from: (GC.ST.5 ∩GC/HCI.5 ∩HHI.A)&lt;br /&gt;→GC/HHI.2&lt;br /&gt;HCI/HHI.2 Coordination Human computer interaction issues related&lt;br /&gt;to group control. Derived from: (HCI ∩&lt;br /&gt;HHI.CD) →HCI/HHI.2&lt;br /&gt;GC/HCI/HHI.1 Coupling Human computer interaction response time&lt;br /&gt;tests. Derived from: (GC/HCI.5 ∩HHI.CP)&lt;br /&gt;→GC/HCI/HHI.1&lt;br /&gt;GC/HCI/HHI.2 Awareness Human computer interaction response time&lt;br /&gt;tests. Derived from: (GC/HCI.5 ∩HHI.A) →&lt;br /&gt;GC/HCI/HHI.2&lt;br /&gt;Table 12: Human-Human Techniques Organized by CAMELOT Code&lt;br /&gt;5.5 Conclusion&lt;br /&gt;Despite acknowledging a technological aspect to CSCW, existing methodologies provide&lt;br /&gt;little guidance to developers and quality assurance personnel. CAMELOT provides this&lt;br /&gt;guidance by organizing a technical evaluation into two stages and four intersecting&lt;br /&gt;technologies, and by providing detailed techniques for each. 
In this section we discuss the&lt;br /&gt;steps involved in an evaluation using CAMELOT, compare our methodology to prior art,&lt;br /&gt;and look at the future of CSCW evaluation.&lt;br /&gt;5.5.1 Ordering an Evaluation&lt;br /&gt;Application evaluation using CAMELOT should proceed as follows.&lt;br /&gt;The order of the techniques within each technology category is not important, but the order&lt;br /&gt;in which the categories are applied in an evaluation is critical. The application should be examined&lt;br /&gt;from a single-user perspective first because multiuser problems are more difficult to detect.&lt;br /&gt;This will familiarize the evaluator with application function, architecture, and user interface&lt;br /&gt;before tackling the more complicated testing issues associated with distributed computing and&lt;br /&gt;human-human interaction. Within the single-user stage, general computing tests should be&lt;br /&gt;performed before investigating human computer interaction. This will familiarize the&lt;br /&gt;evaluator with application functionality and provide a context for the user interface. The&lt;br /&gt;stress test results and general performance of the application during the single-user tests will give&lt;br /&gt;the evaluator valuable insight into the application’s ability to scale with multiple users.&lt;br /&gt;During the multiuser testing stage, distributed computing problems should be investigated&lt;br /&gt;first to provide context for later human-human interaction testing.&lt;br /&gt;As mentioned earlier, the intersecting nature of single and multi-user technologies may&lt;br /&gt;cause techniques from one technology to trigger the development of tests, or the discovery of&lt;br /&gt;problems, in another. 
In Chapter 7, for example, we will see that the results of a single-user&lt;br /&gt;general computing functional test of the client keyboard triggered the development of a&lt;br /&gt;multi-user distributed computing test that uncovered a race condition.&lt;br /&gt;5.5.2 Comparison to Existing Methodologies&lt;br /&gt;Compared to the broadly scoped methodologies listed in Section 5.1, CAMELOT is more&lt;br /&gt;narrowly focused. Recall that the primary motivation for the thesis was the frustration&lt;br /&gt;encountered during our own CSCW development efforts (Chapter 3). The major cause of&lt;br /&gt;that frustration was the lack of techniques and tools for testing CSCW software.&lt;br /&gt;CAMELOT is a testing methodology that can be used by application developers during&lt;br /&gt;application implementation, and by quality assurance personnel in later stages of the software&lt;br /&gt;life cycle of a CSCW application. Application developers perform this type of evaluation&lt;br /&gt;continuously during the development of a software system: a design is implemented, and&lt;br /&gt;the implementation is exercised to see if it is acceptable. Quality assurance personnel also&lt;br /&gt;perform this type of evaluation during analysis of the application.&lt;br /&gt;CAMELOT complements ECW’s framework, specifically by adding new evaluation&lt;br /&gt;capabilities at the technology level. Rather than the specific software products that make up a&lt;br /&gt;CSCW system, CAMELOT is concerned with testing the underlying technologies from which&lt;br /&gt;those products are built. For example, ShrEdit [132] uses a single, centralized&lt;br /&gt;server to manage objects shared between users. 
From CAMELOT’s distributed&lt;br /&gt;computing perspective, this design decision gives an evaluator a clue that the system may&lt;br /&gt;suffer from scalability problems as users contend for the attention of a single server.&lt;br /&gt;5.5.3 Part of a Complete Evaluation&lt;br /&gt;Figure 19: Technology and Social Aspects of CSCW [122] (labels: Technical, Social)&lt;br /&gt;Ramage believed that CSCW applications had both social and technological components.&lt;br /&gt;He found that most prior work in CSCW evaluation focused exclusively on the social&lt;br /&gt;aspects of the system. He also warned that technical soundness alone does not guarantee success:&lt;br /&gt;It may well be the case that a computer system will be designed perfectly, with all of the&lt;br /&gt;right sort of software engineering procedures, requirements analysis and usability testing,&lt;br /&gt;but that the system is introduced insensitively, or it cuts across the way people have&lt;br /&gt;become used to working or it changes the power relationships between workers. [122]&lt;br /&gt;He argued that an evaluation approach should take both the social and the technological&lt;br /&gt;aspects of the system into account.&lt;br /&gt;Because of its deliberate technological focus, CAMELOT is not concerned with the higher-level social&lt;br /&gt;aspects of a CSCW system. CAMELOT’s main contribution is an organization and&lt;br /&gt;detailed description of the technologies that comprise CSCW software and of the problems&lt;br /&gt;that should be tested for in software built on these technologies. Following Ramage’s concept of&lt;br /&gt;multiplicity, CAMELOT should be used in conjunction with other methodologies for a&lt;br /&gt;complete evaluation of a CSCW system.&lt;br /&gt;6 Rebecca: An Architecture for Testing CSCW Applications&lt;br /&gt;CAMELOT, the methodology we presented in Chapter 5, can be used to evaluate CSCW&lt;br /&gt;applications with live users, testing software, or a combination of both. 
In addition to&lt;br /&gt;CAMELOT, we have developed Rebecca, an architecture for testing CSCW applications.&lt;br /&gt;Motivation for Rebecca came from frustrating experiences we had developing collaborative&lt;br /&gt;software. No tools were available for execution-based testing of the software we developed.&lt;br /&gt;Our only alternative was to use live users to exercise the application. Initially, volunteers were&lt;br /&gt;enthusiastic. As time went on, however, they became more reluctant as they realized how&lt;br /&gt;much time was needed to debug the system. It was especially difficult when we needed the&lt;br /&gt;constant presence of a group of volunteers to troubleshoot an insidious timing problem. Our&lt;br /&gt;investigation into state-of-the-art testing software revealed no research work in the area of&lt;br /&gt;multiuser testing. Commercial multiuser testing software existed, but allowed only rigid,&lt;br /&gt;prescribed evaluations, which precluded live user participation.&lt;br /&gt;Rebecca makes significant contributions to both the general and multiuser infrastructure of&lt;br /&gt;execution-based testing systems. The contributions are summarized informally in this&lt;br /&gt;introduction. The rest of the chapter is organized into three main sections. Section 6.1&lt;br /&gt;discusses the basic architecture, including agent, registration, event list, component, playback,&lt;br /&gt;state, and trigger management. Section 6.2 presents the architecture’s improvements to the&lt;br /&gt;general infrastructure of execution-based testing. Section 6.3 covers improvements to&lt;br /&gt;multiuser testing. 
Each subsection in 6.2 and 6.3 presents the architecture, followed by a&lt;br /&gt;discussion of the Java-based implementation in Rebecca-J.&lt;br /&gt;Rebecca’s contributions to the general infrastructure of execution-based testing systems include:&lt;br /&gt;- The record/playback process is improved beyond the user interface with extensible component and event models. Any application activity can be replayed if its source is defined as a component and the activity is defined as an event.&lt;br /&gt;- A record filtration system is defined that allows the user to filter events by selecting which components participate in a recording. In past systems, the only filtration options were labor-intensive intermittent recording or editing of the recording.&lt;br /&gt;- Unlike traditional testing systems, which view testing as a separate task from development, the architecture integrates seamlessly into existing integrated development tools such as IBM's Visual Age.&lt;br /&gt;- For sophisticated data structures and control flow in a test script, Rebecca describes a blueprint for exporting recordings in a familiar format: the IDE's native programming language. This contrasts with traditional test systems, which require the user to learn a proprietary scripting language.&lt;br /&gt;- Re-recording of scripts after application changes is reduced by runtime resolution of components and component-centric events.&lt;br /&gt;- Recording script management is simplified with a VCR-like metaphor for creating, editing, and executing tests. This allows the user to create and run a test in seconds.&lt;br /&gt;Rebecca also breaks new ground in multiuser execution-based testing, including:&lt;br /&gt;- The ability to incorporate live and virtual users into a single test session using distributed triggers. With triggers, virtual users react to events generated by other users (live or virtual). Existing test systems completely prescribe a test session, which precludes meaningful live user participation.&lt;br /&gt;- Virtual users can react to four classes of events using triggers: user interface, state change, timer, and customized. This allows a virtual user to respond to virtually any application activity, much like a live user.&lt;br /&gt;- Threshold models are provided that allow the tester to specify the characteristics of an event or sequence of events that will fire a trigger. A threshold model has a user interface component, which allows runtime specification of firing conditions. An extensible object-oriented framework for complete customization is also included.&lt;br /&gt;- Improvements to synchronization during multiuser playback, including an orchestration metaphor, simplified synchronization mechanisms, deadlock detection, and deadlock recovery.&lt;br /&gt;- A global recording clipboard, which simplifies sharing some or all of a recording between virtual users.&lt;br /&gt;- The ability to record, play back, and monitor application communication while remaining independent of the communication mechanism. Existing test systems do not provide the ability to monitor application communication; the few academic systems that do are mechanism-specific.&lt;br /&gt;- A resource-conserving architecture. This allows the system to run in tandem with an IDE and improves scalability as the number of users participating in a test increases.&lt;br /&gt;Figure 20: General architecture diagram for Rebecca&lt;br /&gt;6.1 General Architecture&lt;br /&gt;Figure 20 shows Rebecca’s general architecture. An agent is activated by the application under&lt;br /&gt;test. 
Once activated, the agent connects to the server. Each agent maintains a list of trigger&lt;br /&gt;listeners and recording players. A trigger listener fires when the application generates a specific&lt;br /&gt;component event. A recording player replays a recording when requested by the server, or&lt;br /&gt;when a trigger fires. In addition to replay and triggers, the agent is responsible for event list&lt;br /&gt;and component management. A recording is stored as an ordered list of events and associated&lt;br /&gt;with a recording player. All commands to insert, update, delete, replay, or change the current&lt;br /&gt;position in the recording are handled by the agent. The agent also acts as the monitoring&lt;br /&gt;system for components the application registers with Rebecca. Special logic is included in the&lt;br /&gt;component management system to automatically register GUI components.&lt;br /&gt;[Figure 20 diagram: machine0 through machineN each run an application with an agent holding trigger listeners and recording players; the agent handles trigger, event list, replay, and component management; the server handles synchronization, agent, and trigger management and provides the user interface.]&lt;br /&gt;The server provides Rebecca’s user interface. Manipulation of trigger listeners, recording&lt;br /&gt;players, and recordings is performed through this user interface. As agents connect to the&lt;br /&gt;server, they are added to trigger listener and recording player lists. 
These lists specify triggering&lt;br /&gt;agents, and agents with recordings. The user interface also includes a remote view of a&lt;br /&gt;triggering agent’s component hierarchy. This view is used to specify the component and event&lt;br /&gt;associated with a trigger.&lt;br /&gt;For remote manipulation of an agent’s recording, the server displays a VCR-like control panel&lt;br /&gt;and editing window. All editing and control commands are forwarded to the agent that owns&lt;br /&gt;the recording. Agent feedback about the state of the recording is sent to the server and&lt;br /&gt;displayed in the control panel.&lt;br /&gt;Synchronization events that occur during replay of an agent’s recording are forwarded to the&lt;br /&gt;server. The server is responsible for determining when to block or release agent players during&lt;br /&gt;synchronization.&lt;br /&gt;Figure 21: Registration management architecture diagram for Rebecca. [Diagram: the server (RebeccaServerImpl, RRecordStateListenerImpl, TriggerCounterBeanImpl) and agents Agent0 and Agent1 (RebeccaAgentImpl, ComponentMonitorImpl, ComponentTreeModelImpl, EventListImpl, TriggerPlayerImpl, RecordImpl, PlaybackThreadProxyImpl, and TriggerListenerImpl, grouped as agent, agent player, and agent listener objects) register with the naming service; each process reads ServerName= and NamingService= entries from its properties file.]&lt;br /&gt;6.1.1 Registration Management&lt;br /&gt;Figure 21 shows the registration management architecture for Rebecca. The server first&lt;br /&gt;locates the naming service using the URL specified by the properties file entry&lt;br /&gt;NamingService. 
The server registers itself with the naming service using the prefix from the&lt;br /&gt;properties file entry ServerName. The name looks something like a URL:&lt;br /&gt;//SecondWind/RebeccaServer. The first part of the string identifies the server’s machine.&lt;br /&gt;The second part is the name of the server. The naming service ensures the server has a unique&lt;br /&gt;name by looking up the id in a hashtable. If the string is found, a counter associated with it is&lt;br /&gt;incremented. If the string is not found, it is added to the hashtable, a counter is associated&lt;br /&gt;with it, and the counter is set to zero. The value of the counter is then appended to the id.&lt;br /&gt;Agents locate the naming service using the NamingService properties file entry. An agent is&lt;br /&gt;assigned a unique name by the naming service. The agent then locates the server using the&lt;br /&gt;properties file entry ServerName. The agent registers with the server, passing its unique name.&lt;br /&gt;This allows the server to identify the agent during a testing session.&lt;br /&gt;Rebecca uses a non-monolithic approach for communication between processes. Once a link&lt;br /&gt;between the server and an agent is established, objects are exported which can service remote&lt;br /&gt;requests on independent threads. Threading increases the multiprocessing capabilities of the&lt;br /&gt;testing system. Remote objects encapsulate important subsystems in independent blocks of&lt;br /&gt;application logic. This reduces the likelihood of threading problems.&lt;br /&gt;Table 13 lists the important objects exported by the server and agents.&lt;br /&gt;6.1.2 Event List Management&lt;br /&gt;Rebecca stores recorded application activity in two possible formats: ordered event list and&lt;br /&gt;native language. Figure 22 shows a high level view of Rebecca’s event list management&lt;br /&gt;subsystem. The subsystem uses a model/view/controller design pattern. 
The model, stored&lt;br /&gt;in an agent, is an ordered set of events that make up the recording. The view, presented on&lt;br /&gt;the server, displays a subset of the model in a scrollable window. The controller, also on the&lt;br /&gt;server, intercepts keyboard and mouse commands directed at the view and forwards them to&lt;br /&gt;the model.&lt;br /&gt;&lt;br /&gt;Object Name: Description&lt;br /&gt;&lt;br /&gt;RRecordStateListenerImpl: Used in combination with RecordProxy on the server side and RecordImpl on the agent side to forward state changes from a record/playback session. For example, if an agent’s replay finishes, the stop state is forwarded to the server. One per RecordImpl.&lt;br /&gt;&lt;br /&gt;TriggerCounterBeanImpl: An encapsulated object that follows the MVC paradigm. Allows the user to set a maximum number of trigger firings for a specific trigger listener. The agent responsible for the trigger listener updates the count each time the trigger fires, and the object displays the firing count on the server. One per TriggerListenerImpl.&lt;br /&gt;&lt;br /&gt;RebeccaAgentImpl: Main handle for an agent. Used by the server to access the agent’s other remote objects, and by the application to register with the test system. One per agent.&lt;br /&gt;&lt;br /&gt;ComponentMonitorImpl/ComponentTreeModelImpl: Contains all components the application registered with the test system.&lt;br /&gt;Used by agents to:&lt;br /&gt;- listen for events during recording&lt;br /&gt;- listen for trigger events&lt;br /&gt;- replay recorded events&lt;br /&gt;Used by the server to:&lt;br /&gt;- remotely browse/select trigger components&lt;br /&gt;- remotely browse/select record filtration components&lt;br /&gt;One per agent.&lt;br /&gt;&lt;br /&gt;TriggerPlayerImpl: Main handle for a recording player within an agent. Used by the server to access the player’s remote objects, and by a trigger listener to fire the trigger. There can be many of these objects per agent.&lt;br /&gt;&lt;br /&gt;RecordImpl: Used by the server to set/get the state of an agent’s recording player. Provides control of and feedback from the record/playback subsystem (e.g. start play, stop play). One per TriggerPlayerImpl.&lt;br /&gt;&lt;br /&gt;EventListImpl: Contains an ordered-list recording of application activity. PlaybackThreadProxyImpl uses it as the list of events to replay, and RecordImpl uses it to store events when recording. Also used as the model for editing recorded events on the server. Not available if a native language recording is used. One per TriggerPlayerImpl.&lt;br /&gt;&lt;br /&gt;PlaybackThreadProxyImpl: Used by the server to set playback parameters of an agent’s recording player (e.g. playback with no delay, replay cursor on/off). Contains one of two recording types: an ordered list of events or a native language recording. One per RecordImpl.&lt;br /&gt;&lt;br /&gt;TriggerListenerImpl: Main handle for a trigger listener in an agent. Used by the server to:&lt;br /&gt;- set the component to listen to for the trigger&lt;br /&gt;- set the event to listen to for the trigger&lt;br /&gt;- set the event threshold to apply to the event&lt;br /&gt;- set the recording player to activate when the trigger fires&lt;br /&gt;There can be many of these objects per agent.&lt;br /&gt;&lt;br /&gt;RebeccaServerImpl: Main handle for the server. Used by agents to:&lt;br /&gt;- access the server’s other remote objects&lt;br /&gt;- signal a synchronization event&lt;br /&gt;- register/unregister&lt;br /&gt;One per server.&lt;br /&gt;Table 13: Rebecca’s Remote Objects&lt;br /&gt;&lt;br /&gt;Figure 22: High-level view of event list model/view/controller architecture&lt;br /&gt;Figure 23 shows a detailed view of the event list management subsystem. The user&lt;br /&gt;manipulates the model with keyboard and mouse commands directed at the view. 
The&lt;br /&gt;controller intercepts these commands and translates them according to Figure 22. For&lt;br /&gt;example, pressing the up arrow key in the view translates into decrementing the current position in&lt;br /&gt;the event list. The controller sends the translated command to the model proxy. The model&lt;br /&gt;proxy forwards the command to the model. If the command results in a change to the model,&lt;br /&gt;the model proxy is notified. The model proxy forwards changes to the view. The user sees&lt;br /&gt;the model change reflected in the view.&lt;br /&gt;[Figure 22 controller bindings: Ctrl + C copies the current selection to the global clipboard; Ctrl + X cuts the current selection to the global clipboard; Ctrl + V pastes from the global clipboard into the model at the current position; other keys set the current position to the first or last element in the model, move it back or forward one element, or scroll the view without changing the current position; Shift + Click selects elements from the current position to the mouse pointer position; Click sets the current position to the mouse pointer position.]&lt;br /&gt;If the model change involves copying, cutting, or pasting events, then the global clipboard is&lt;br /&gt;involved. In addition to informing the model proxy of the editing command, the controller informs the global&lt;br /&gt;clipboard. The clipboard asks the model proxy for all elements in the ranges&lt;br /&gt;supplied by the controller for a copy/cut command. 
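&lt;br /&gt;&lt;br /&gt;The command round trip just described can be illustrated with a minimal, single-process Python sketch. The controller translates a key press, the model proxy forwards the resulting command to the model, and model changes flow back through the proxy to the view. This is not Rebecca-J code. The class names, the command strings, and every key binding other than the up arrow are hypothetical, and the remote call between server and agent is modelled as an ordinary method call.&lt;br /&gt;

```python
class EventListModel:
    """Agent-side model: an ordered list of recorded events plus a current position."""

    def __init__(self, events):
        self.events = list(events)
        self.position = 0
        self.listeners = []  # notified only when the model actually changes

    def add_listener(self, listener):
        self.listeners.append(listener)

    def execute(self, command):
        if not self.events:
            return
        old_position = self.position
        last = len(self.events) - 1
        if command == "previous":
            self.position = max(0, self.position - 1)
        elif command == "next":
            self.position = min(last, self.position + 1)
        elif command == "first":
            self.position = 0
        elif command == "last":
            self.position = last
        if self.position != old_position:
            for listener in self.listeners:
                listener.model_changed(self.position)


class ModelProxy:
    """Server-side stand-in for the remote model; the view never talks to the model."""

    def __init__(self, remote_model):
        self.remote_model = remote_model
        self.views = []
        remote_model.add_listener(self)  # the proxy, not the view, registers interest

    def add_view(self, view):
        self.views.append(view)

    def execute(self, command):
        self.remote_model.execute(command)  # forward the command to the model

    def model_changed(self, position):
        for view in self.views:  # forward model changes to local views
            view.model_changed(position)


class View:
    def __init__(self):
        self.shown_position = None

    def model_changed(self, position):
        self.shown_position = position


class Controller:
    # Only "Up" (decrement the position) comes from the text; the other keys are assumed.
    KEY_TO_COMMAND = {"Up": "previous", "Down": "next", "Home": "first", "End": "last"}

    def __init__(self, proxy):
        self.proxy = proxy

    def key_pressed(self, key):
        command = self.KEY_TO_COMMAND.get(key)
        if command is not None:
            self.proxy.execute(command)


model = EventListModel(["mousePress", "mouseRelease", "mouseClick"])
proxy = ModelProxy(model)
view = View()
proxy.add_view(view)
controller = Controller(proxy)
controller.key_pressed("End")
controller.key_pressed("Up")
print(model.position, view.shown_position)  # prints: 1 1
```

Because the view registers only with the proxy, it contains no remote-call code. That is the simplification this section attributes to the proxy arrangement.&lt;br /&gt;&lt;br /&gt;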
For a paste command, the clipboard tells&lt;br /&gt;the model proxy to insert clipboard elements after the current position.&lt;br /&gt;Figure 23: Detailed view of event list model/view/controller architecture. [Diagram: on the server, the controller turns keyboard/mouse events into model manipulation commands for the model proxy and sends copy, cut, and paste commands to the global clipboard; the model proxy forwards the commands to the model on the agent and forwards changes in the model to the view; the clipboard receives copies of elements for copy/cut commands and supplies copies of clipboard elements for paste commands.]&lt;br /&gt;Proxy Design Pattern&lt;br /&gt;The model proxy is a local proxy for a remote object. This design pattern is used throughout&lt;br /&gt;Rebecca to encapsulate subtle differences between local and remote objects. A proxy object&lt;br /&gt;allows local objects to remain unaware of the remote interaction that is occurring.&lt;br /&gt;In the case of the event list, the view registers interest in model changes with the model proxy,&lt;br /&gt;rather than the model. The model proxy registers interest with the remote model. Changes in&lt;br /&gt;the model are sent remotely to the model proxy. The proxy then forwards changes to the local&lt;br /&gt;view. This technique simplifies the coding of the view object. If the view object registered&lt;br /&gt;directly with the remote model, then special-case code would have to be written. This code&lt;br /&gt;would involve exporting the view as a remote object to the model.&lt;br /&gt;In Rebecca-J, the Java-based implementation of Rebecca, exporting the view as a remote&lt;br /&gt;object would have caused another problem. The view contains GUI components. 
GUI&lt;br /&gt;components are not compatible between versions of the JDK. This would have placed the&lt;br /&gt;additional requirement that the same JDK be used on all machines participating in a test&lt;br /&gt;session, an administrative nightmare.&lt;br /&gt;Figure 24: Component management architecture diagram&lt;br /&gt;for Rebecca. [Diagram: the application registers GUI components (e.g. Window0 containing Widget0 and Widget1), state change components, and customized components with ComponentMonitorImpl. RecordImpl’s recordInitialization() shuts off replay and triggers and listens to the filtered components, and its processEvent() adds each event to the EventListImpl model. PlaybackThreadProxyImpl’s replayInitialization() shuts off the recorder and stops listening to components, and its replayEvent() looks up the runtime component given its ID and dispatches the event.]&lt;br /&gt;6.1.3 Component Management&lt;br /&gt;Figure 24 shows the component management architecture diagram for Rebecca. To&lt;br /&gt;record and play back user actions, the application must register components with Rebecca’s&lt;br /&gt;ComponentMonitorImpl subsystem. For example, if the tester wants to record and play back&lt;br /&gt;events generated by a GUI push button, the widget must be registered as a component.&lt;br /&gt;Section 6.2.3 provides details of automatic registration of GUI components, and of how almost&lt;br /&gt;any application activity can be recorded using the component/event model. Components&lt;br /&gt;encapsulate parts of the application under test that generate activity. Events generated by&lt;br /&gt;these components encapsulate the activity.&lt;br /&gt;Most component management takes place in the agent. The server, however, can view and&lt;br /&gt;manipulate agent components remotely using MVC and proxy design patterns (see Section&lt;br /&gt;6.1.2). Server manipulation gives the tester the ability to select components for record&lt;br /&gt;filtration (see Section 6.2.4) and triggers (see Section 6.3.2).&lt;br /&gt;PlaybackThreadProxyImpl is responsible for the replay of recorded events. Before replay&lt;br /&gt;begins, agent recording is suspended. This simplifies the user’s understanding of the agent’s&lt;br /&gt;state: an agent is in either a record state or a playback state, but not both. Suspending recording also&lt;br /&gt;avoids re-recording application activity that has already been recorded. If the tester wants to&lt;br /&gt;duplicate a recording that already exists in an agent, it can be copied from one recording&lt;br /&gt;player to another using event list editing.&lt;br /&gt;ComponentMonitorImpl suspends recording by iterating through registered components and&lt;br /&gt;removing interest in the events they generate. Recording is resumed by repeating the iteration&lt;br /&gt;and adding interest in component events.&lt;br /&gt;As part of the playback process, the PlaybackThreadProxyImpl must determine the runtime&lt;br /&gt;component to send a recorded event to. This resolution is always done at runtime to improve&lt;br /&gt;the reusability of a recording (see Section 6.2.2). A recorded event contains a persistent store&lt;br /&gt;id identifying the component receiving the event. 
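&lt;br /&gt;&lt;br /&gt;The ComponentMonitorImpl bookkeeping described above can be sketched in Python as follows. It covers three operations: registering components, suspending recording by removing interest in every component and resuming it by adding that interest back, and finding a runtime component from a recorded persistent store id, which the next sentence describes as a hashtable lookup. The classes, the method names, and the "Window0/Widget0" id format are hypothetical illustrations, not Rebecca-J’s interfaces, and a Python dict stands in for the hashtable.&lt;br /&gt;

```python
class Component:
    """Hypothetical application component that notifies listeners of its events."""

    def __init__(self, persistent_id):
        self.persistent_id = persistent_id  # stable across runs, unlike the runtime object
        self.listeners = set()

    def add_listener(self, listener):
        self.listeners.add(listener)

    def remove_listener(self, listener):
        self.listeners.discard(listener)

    def fire(self, event):
        for listener in list(self.listeners):
            listener.process_event(self.persistent_id, event)

    def dispatch(self, event):
        return f"{self.persistent_id} replayed {event}"


class ComponentMonitor:
    def __init__(self):
        self.by_id = {}  # persistent store id -> runtime component

    def register(self, component):
        self.by_id[component.persistent_id] = component

    def suspend_recording(self, recorder):
        for component in self.by_id.values():  # remove interest in every component
            component.remove_listener(recorder)

    def resume_recording(self, recorder):
        for component in self.by_id.values():  # repeat the iteration, adding interest
            component.add_listener(recorder)

    def lookup_runtime_component(self, persistent_id):
        return self.by_id.get(persistent_id)  # resolved at replay time


class Recorder:
    def __init__(self):
        self.event_list = []

    def process_event(self, source_id, event):
        self.event_list.append((source_id, event))  # store the id, not the object


monitor = ComponentMonitor()
button = Component("Window0/Widget0")
monitor.register(button)
recorder = Recorder()
monitor.resume_recording(recorder)
button.fire("mousePress")
monitor.suspend_recording(recorder)  # e.g. before replay begins
button.fire("mouseRelease")  # not recorded
source_id, event = recorder.event_list[0]
print(len(recorder.event_list), monitor.lookup_runtime_component(source_id).dispatch(event))
# prints: 1 Window0/Widget0 replayed mousePress
```

Because the recording holds only persistent ids, a component re-created in a later run can be registered under the same id, and an old recording will still resolve to it.&lt;br /&gt;&lt;br /&gt;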
The runtime component is resolved&lt;br /&gt;using a hashtable lookup in the ComponentMonitorImpl with the persistent store id as the&lt;br /&gt;key.&lt;br /&gt;RecordImpl is responsible for control of the recording player’s state and the creation of&lt;br /&gt;recordings. To create a recording, RecordImpl first suspends replay activity and triggering.&lt;br /&gt;Next, interest in component events is registered through the ComponentMonitorImpl.&lt;br /&gt;Component events are then appended to the event list for the duration of the recording.&lt;br /&gt;Figure 25: Playback management architecture diagram for&lt;br /&gt;Rebecca. [Diagram: the server sends save and load commands for event list and native language recordings to the agent; PlaybackThreadProxyImpl uses either an EventListPlaybackThread, which gets the next event from the event list model, replays it, and tests for sleep, or a NativeLanguagePlaybackThread, whose executeEventRecords() calls executeEventRecord(...) once per event, each call replaying the event and testing for sleep.]&lt;br /&gt;6.1.4 Playback Management&lt;br /&gt;Figure 25 shows the playback management architecture diagram for Rebecca. Playback&lt;br /&gt;management can be broken down into several core areas: persistent store, playback, and player&lt;br /&gt;state.&lt;br /&gt;Persistent storage of a recording allows reuse across test sessions. Rebecca allows a recording&lt;br /&gt;to be stored in two formats: ordered event list or native language. For information on how&lt;br /&gt;these formats are created, see Section 6.2.3. 
If the event list option is chosen, the recording is&lt;br /&gt;converted to a flat file of event records. If the native language option is chosen, the recording&lt;br /&gt;is converted to the text of a program. The user is responsible for compiling the program and&lt;br /&gt;placing the object file in a location accessible to Rebecca at runtime.&lt;br /&gt;Rebecca allows a recording to be loaded in event list or native language format. The user&lt;br /&gt;specifies the event list’s file name from the server’s user interface. However, because the agent&lt;br /&gt;will load the file locally, the location of the file must be relative to the agent’s file system. A&lt;br /&gt;similar pattern is followed for native language format recordings. The location of the object&lt;br /&gt;file specified must be relative to the agent’s process, not the server’s.&lt;br /&gt;Replay of a recording is handled differently based on format. The algorithm for event list&lt;br /&gt;replay is shown in Figure 26. For more information on sleep state, see Section 6.1.5.&lt;br /&gt;Figure 26: Algorithm for event list replay.&lt;br /&gt;For a native language recording, events are stored sequentially as separate subroutine calls to&lt;br /&gt;executeEventRecord() within a subroutine called executeEventRecords(). The algorithm for&lt;br /&gt;executeEventRecord() is shown in Figure 27.&lt;br /&gt;Figure 27: Algorithm for native language replay&lt;br /&gt;Section 6.2.3 contains an example of the executeEventRecords() subroutine in Java.&lt;br /&gt;Unlike event list recordings, replay of native language recordings is sequential from start to&lt;br /&gt;finish. 
It is not possible to stop the replay, back up several events, and continue.&lt;br /&gt;// Figure 26: event list replay&lt;br /&gt;replayEventList() {&lt;br /&gt;while (eventList.hasMoreElements()) {&lt;br /&gt;event = eventList.nextElement();&lt;br /&gt;component = ComponentMonitorImpl&lt;br /&gt;.lookupRuntimeComponent(event.getComponentSourceId());&lt;br /&gt;dispatch event to component;&lt;br /&gt;if (sleep conditions satisfied) enter SLEEP STATE;&lt;br /&gt;}&lt;br /&gt;}&lt;br /&gt;// Figure 27: native language replay&lt;br /&gt;executeEventRecord(EventRecord event) {&lt;br /&gt;component = ComponentMonitorImpl&lt;br /&gt;.lookupRuntimeComponent(event.getComponentSourceId());&lt;br /&gt;dispatch event to component;&lt;br /&gt;if (sleep conditions satisfied) enter SLEEP STATE;&lt;br /&gt;}&lt;br /&gt;Figure 28: State management architecture diagram for&lt;br /&gt;Rebecca.&lt;br /&gt;6.1.5 State Management&lt;br /&gt;Figure 28 shows the state management architecture diagram for Rebecca. State management&lt;br /&gt;focuses on manipulation and feedback of recording player state. Although the diagram shows&lt;br /&gt;an agent with a single recording player, recall from Section 6.1 that there may be multiple&lt;br /&gt;players.&lt;br /&gt;The server is the primary mechanism for manipulating recording player state. The server&lt;br /&gt;assigns a VCR-like control panel to each agent player. The tester can create, replay,&lt;br /&gt;continuously replay, fast forward, rewind, or stop the replay of a recording using this panel.&lt;br /&gt;The MVC proxy design pattern discussed in Section 6.1.2 is used to send commands from a&lt;br /&gt;server proxy to the agent’s remote RecordImpl object. 
Changes to RecordImpl’s state are&lt;br /&gt;forwarded via the proxy object to the control panel. The panel is updated to reflect the state&lt;br /&gt;change.&lt;br /&gt;[Figure 28 diagram: the server’s RecorderPlayer panel sends state change commands to the agent’s RecordImpl, whose setState() accepts NO_STATE, RECORD_STATE, PLAY_STATE, STOP_STATE, FF_STATE, REW_STATE, CONTINUOUSPLAY_STATE, and SYNCHRONIZE_STATE and reports state changes back; PlaybackThreadProxyImpl’s setState() accepts PLAY_STATE, STOP_STATE, CONTINUOUSPLAY_STATE, and SYNCHRONIZE_STATE. The playback thread wakes up because the play button was pressed, a trigger fired, or a synchronization release occurred, and sleeps on a stop state, a synchronize state, or the end of the list. Synchronization events encountered during replay go to the server’s SynchronizationController, which determines whether the playback thread should sleep or wake up; TriggerListenerImpl fires the trigger if event and threshold conditions have been met and wakes up the playback thread.]&lt;br /&gt;The server also provides centralized processing of synchronization. When the server&lt;br /&gt;determines that a recording player should be synchronized, it changes the player’s state to&lt;br /&gt;SYNCHRONIZE_STATE. This suspends replay in the PlaybackThreadProxyImpl thread until&lt;br /&gt;it is awakened by a new state change. Normally, this state change would result from a&lt;br /&gt;setState(PLAY_STATE) command issued by the server when it has determined that the&lt;br /&gt;synchronization condition has been satisfied. 
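&lt;br /&gt;&lt;br /&gt;The state handling above can be sketched in Python. The server sets a recording player’s state, and the player reports every change back so the control panel can update. A synchronizing player is parked in SYNCHRONIZE_STATE until the server issues PLAY_STATE. The state names come from Figure 28. The classes, the listener callback, and the simple release_all() policy are assumptions; the real synchronization condition is the subject of Section 6.3.2.3. In-process calls stand in for the remote ones.&lt;br /&gt;

```python
from enum import Enum


class State(Enum):
    # States accepted by RecordImpl.setState() according to Figure 28.
    NO_STATE = 0
    RECORD_STATE = 1
    PLAY_STATE = 2
    STOP_STATE = 3
    FF_STATE = 4
    REW_STATE = 5
    CONTINUOUSPLAY_STATE = 6
    SYNCHRONIZE_STATE = 7


class RecordPlayer:
    """Hypothetical stand-in for an agent's RecordImpl."""

    def __init__(self):
        self.state = State.NO_STATE
        self.state_listeners = []  # e.g. a proxy feeding the server's control panel

    def set_state(self, new_state):
        if new_state is self.state:
            return
        self.state = new_state
        for listener in self.state_listeners:
            listener(new_state)  # report the change back to the server


class ControlPanel:
    """Server-side VCR-like panel: it only displays the state it is told about."""

    def __init__(self):
        self.displayed = None

    def show(self, state):
        self.displayed = state


class SynchronizationServer:
    def __init__(self):
        self.blocked = []

    def synchronize(self, player):
        player.set_state(State.SYNCHRONIZE_STATE)  # suspends that player's replay
        self.blocked.append(player)

    def release_all(self):
        # Called once the server decides the synchronization condition holds.
        for player in self.blocked:
            player.set_state(State.PLAY_STATE)
        self.blocked.clear()


player = RecordPlayer()
panel = ControlPanel()
player.state_listeners.append(panel.show)
player.set_state(State.PLAY_STATE)
server = SynchronizationServer()
server.synchronize(player)
print(panel.displayed.name)  # prints: SYNCHRONIZE_STATE
server.release_all()
print(panel.displayed.name)  # prints: PLAY_STATE
```

Keeping the panel a passive listener means it always shows the state the player actually reached, including changes made by synchronization rather than by the tester.&lt;br /&gt;&lt;br /&gt;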
For details on synchronization, including player&lt;br /&gt;synchronization/release and deadlock detection, see Section 6.3.2.3.&lt;br /&gt;A trigger listener changes the state of a recording player when firing. If the player is in a&lt;br /&gt;STOP_STATE, the firing trigger will set the player’s state to PLAY_STATE. If the player is&lt;br /&gt;already in a PLAY_STATE and queuing is turned on, the firing will be queued for replay&lt;br /&gt;later. If the player is in any other state, the firing will be ignored. For more details on trigger&lt;br /&gt;use and architecture, see Sections 6.3.2 and 6.1.6.&lt;br /&gt;PlaybackThreadImpl is directly responsible for the control of the replay of a recording.&lt;br /&gt;Replay executes in an independent thread that allows it to be manipulated during playback.&lt;br /&gt;There are two broad categories of state that the replay thread can be in: awake and suspended.&lt;br /&gt;The thread is awake while replaying a recording. The thread enters a suspended state when&lt;br /&gt;one of the following occurs:&lt;br /&gt;- STOP_STATE reported from RecordImpl (because the user pressed the stop button)&lt;br /&gt;- Event replayed and single step set to TRUE (reports STOP_STATE to RecordImpl)&lt;br /&gt;- Component referred to by the replayed event doesn’t exist or is not visible&lt;br /&gt;- SYNCHRONIZE_STATE reported from RecordImpl&lt;br /&gt;- End of event list encountered (reports STOP_STATE to RecordImpl)&lt;br /&gt;While suspended, the thread consumes no processor time. 
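&lt;br /&gt;&lt;br /&gt;The trigger rule above can be sketched in Python: a firing starts a stopped player, is queued when the player is already playing and queuing is on, and is otherwise ignored. The sketch also models the replay thread’s suspended state with a condition variable, so a suspended thread blocks instead of polling. The class and method names are hypothetical. threading.Condition is a stand-in chosen for the sketch, not necessarily the mechanism Rebecca-J uses.&lt;br /&gt;

```python
import threading

STOP, PLAY, CONTINUOUSPLAY, SYNCHRONIZE = "STOP", "PLAY", "CONTINUOUSPLAY", "SYNCHRONIZE"
AWAKE_STATES = {PLAY, CONTINUOUSPLAY}


class PlayerState:
    def __init__(self, queuing=False):
        self.state = STOP
        self.queuing = queuing
        self.queued_firings = 0
        self.changed = threading.Condition()

    def set_state(self, state):
        with self.changed:
            self.state = state
            self.changed.notify_all()  # wake a suspended replay thread, if any

    def fire_trigger(self):
        with self.changed:
            if self.state == STOP:
                self.state = PLAY
                self.changed.notify_all()
                return "played"
            if self.state == PLAY and self.queuing:
                self.queued_firings += 1  # replay again later
                return "queued"
            return "ignored"

    def wait_until_awake(self, timeout=None):
        """Block without polling until replay may continue; False on timeout."""
        with self.changed:
            return self.changed.wait_for(lambda: self.state in AWAKE_STATES, timeout)


player = PlayerState(queuing=True)
print(player.fire_trigger())  # prints: played
print(player.fire_trigger())  # prints: queued
player.set_state(SYNCHRONIZE)
print(player.fire_trigger())  # prints: ignored

woke = []
replay = threading.Thread(target=lambda: woke.append(player.wait_until_awake(timeout=5)))
replay.start()
player.set_state(PLAY)  # e.g. a synchronization release
replay.join()
print(woke)  # prints: [True]
```

Because wait_for() re-checks the condition while holding the lock, a release that arrives before the replay thread starts waiting is not lost.&lt;br /&gt;&lt;br /&gt;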
The thread is awakened when one of the&lt;br /&gt;following occurs:&lt;br /&gt;- Event component exists or becomes visible (applies to suspension because the component didn’t exist or wasn’t visible)&lt;br /&gt;- PLAY_STATE reported from RecordImpl because the user pressed the play button&lt;br /&gt;- CONTINUOUSPLAY_STATE reported from RecordImpl because the user pressed the continuous play button&lt;br /&gt;- PLAY_STATE reported from RecordImpl because a trigger was fired&lt;br /&gt;- PLAY_STATE reported from RecordImpl because of a synchronization release&lt;br /&gt;Figure 29: Trigger management architecture diagram for&lt;br /&gt;Rebecca. [Diagram, numbered steps: (1) select a trigger listener agent from the AgentListeners list (Agent0 to AgentN); (2) select a component to listen to from the ComponentProxy view of the agent’s ComponentMonitorImpl (GUI, state change, and customized components), whose component view lists the events the component can generate (mouseEvent, keyboardEvent, ...); (3) select a threshold model from the threshold list (mouseEvent, mouseMove, mouseDrag, mousePress, mouseRelease, mouseClick, mouseRegion, keyPress, keyRelease, keySequence, ...); (4) edit the threshold model in the threshold editor; (5) select an agent player from the AgentPlayers list; (6) apply the threshold test to the triggering event and fire the trigger if it passes; (7) activate the AgentPlayer and increment the TriggerCounterImpl, which disables the trigger if the maximum number of firings is exceeded.]&lt;br /&gt;6.1.6 Trigger Management&lt;br /&gt;Figure 29 shows the trigger management architecture diagram for Rebecca. The discussion of&lt;br /&gt;the architecture is structured around the steps taken to configure a trigger. For more&lt;br /&gt;information on trigger use, see Section 6.3.2.&lt;br /&gt;In step one, the trigger listener agent is selected from a list of registered agents. An agent is&lt;br /&gt;added to this list when it registers with the server. When the application under test terminates,&lt;br /&gt;the agent is removed from the list. The user’s selection creates a new trigger listener within the&lt;br /&gt;agent. The server is given a remote handle to communicate with the trigger listener.&lt;br /&gt;In step two, the user selects an application component to listen to. The server uses the MVC&lt;br /&gt;proxy design pattern presented in Section 6.1.2 to present a component browser. 
Using the browser, the user selects the agent component that will generate the triggering event. If the agent’s component model changes while the user is browsing, the server is notified through the proxy model. Because the agent and server execute in different process spaces, the server retrieves the persistent store id of the selected component rather than its runtime id. The server then passes the component id to the trigger listener using the listener’s remote handle.&lt;br /&gt;&lt;br /&gt;In step three, a threshold model is selected. Threshold models register themselves with the server at initialization time. If necessary, the user can add custom threshold models to the server; these must also register with the server at initialization time. For more information about customization, see Section 6.3.4.5. Both the server and the remote trigger listener are informed of the selection. On the trigger listener side, default parameters for the threshold are set; these defaults are used to test component events if the user chooses not to edit the threshold model. Additionally, the threshold model registers interest with the user-selected component in the events that will fire the trigger. On the server side, an editor for the threshold model is configured.&lt;br /&gt;&lt;br /&gt;In step four, the user edits the threshold model using the model’s editor, which allows the user to customize the threshold for a specific event. Editors are model-specific: an editor can be as simple as a set of editable fields describing an event or as sophisticated as a graphical subsystem that specifies the region of the application’s GUI in which the event must occur. The result of the editing process is a set of parameters that determine how the threshold tests component events. 
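Steps three through seven can be illustrated with a small, self-contained Java sketch. It assumes a hypothetical MouseEventRecord carrying coordinates relative to the selected component, a hypothetical MousePressThreshold whose editable parameter is a rectangular region (standing in for the parameters produced in step four), and a hypothetical Trigger that activates a player and counts firings. None of these names are Rebecca’s; the server/agent split, remote handles, and registration are omitted, and disabling the trigger as soon as the maximum is reached (so that it can never be exceeded) is an interpretation of step seven.

```java
// Hypothetical sketch: a mouse-press threshold model and a trigger that
// disables itself after a maximum number of firings. Names are illustrative.

/** Simplified mouse event; x and y are relative to the selected component. */
class MouseEventRecord {
    final String kind; // e.g. "mouseMove", "mousePress", "mouseRelease"
    final int x;
    final int y;

    MouseEventRecord(String kind, int x, int y) {
        this.kind = kind;
        this.x = x;
        this.y = y;
    }
}

/** Threshold model that passes only mouse presses inside an editable region. */
class MousePressThreshold {
    // Defaults used when the user does not edit the model (step three).
    private int left = 0;
    private int top = 0;
    private int width = 100;
    private int height = 30;

    /** Step four: parameters produced by the threshold editor. */
    void setRegion(int left, int top, int width, int height) {
        this.left = left;
        this.top = top;
        this.width = width;
        this.height = height;
    }

    /** Step six: test a candidate event against the threshold condition. */
    boolean passes(MouseEventRecord e) {
        if (!"mousePress".equals(e.kind)) {
            return false; // moves, drags, releases, ... are ignored
        }
        if (left > e.x || e.x >= left + width) {
            return false; // outside the region horizontally
        }
        if (top > e.y || e.y >= top + height) {
            return false; // outside the region vertically
        }
        return true;
    }
}

/** Step seven: activate the player, count the firing, disable at the limit. */
class Trigger {
    private final MousePressThreshold threshold;
    private final Runnable player; // stands in for setting RecordImpl to PLAY_STATE
    private final int maxFirings;
    private int firings = 0;
    private boolean enabled = true;

    Trigger(MousePressThreshold threshold, Runnable player, int maxFirings) {
        this.threshold = threshold;
        this.player = player;
        this.maxFirings = maxFirings;
    }

    /** Called for each event from the selected component; returns true if fired. */
    boolean onEvent(MouseEventRecord e) {
        if (!enabled || !threshold.passes(e)) {
            return false;
        }
        player.run();
        firings++;
        if (firings >= maxFirings) {
            enabled = false; // no further firings can occur
        }
        return true;
    }

    int getFirings() { return firings; }

    boolean isEnabled() { return enabled; }
}
```

For the push-button example in step six, a region matching the button’s bounds lets the mouse press through while the moves that precede it are ignored.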
These parameters are sent from the server to the threshold model residing in the agent’s trigger listener.&lt;br /&gt;&lt;br /&gt;In step five, the recording player agent is selected from a list of registered agents, the same list of registered agents used in step one. The user’s selection creates a new recording player within the agent, and the trigger listener is given a remote handle to communicate with the recording player.&lt;br /&gt;&lt;br /&gt;In step six, the trigger listener’s application begins generating events. The threshold model tests potential trigger events generated by the user-selected component. If an event meets the threshold condition, the trigger fires. Consider, for example, a trigger component set to a GUI push button and a threshold model set to mouse press events. While the cursor is inside the push button region, all mouse events are sent to the threshold model. The model ignores these events until the user presses the mouse button, generating a mouse press event, which causes the trigger to fire.&lt;br /&gt;&lt;br /&gt;In step seven, the firing of the trigger results in two actions. First, the trigger listener activates the recording player by setting the player’s RecordImpl state to PLAY_STATE, which causes the player to replay its recording. Second, the server-side trigger counter is incremented, and the change is displayed in a counter field associated with the trigger on the server. A check is then performed to see whether the maximum number of firings has been exceeded. 
If so, the trigger listener is disabled so that no more firings can occur.&lt;br /&gt;&lt;br /&gt;6.2 General Infrastructure&lt;br /&gt;6.2.1 IDE Integration&lt;br /&gt;A major goal of the architecture is seamless integration into existing integrated development environments (IDEs) such as Delphi, Microsoft’s Visual C++, and IBM’s Visual Age. Making Rebecca part of the developer’s daily tool set encourages exercising the application with Rebecca throughout the implementation phase. The architecture does not, however, restrict Rebecca to the IDE, so quality assurance personnel can also use it during integration and system testing.&lt;br /&gt;&lt;br /&gt;Rebecca’s IDE integration contrasts with traditional execution-based testing systems, which view testing as a task separate from development. These tools are generally resource-intensive, making it difficult to run them alongside an IDE and the application on the same machine. For multiuser testing, they may restrict the use of other applications on the same machine. Scripts produced from a recording session take the form of a proprietary C-like or BASIC-like language. Modifying a script requires editing, often with a proprietary editor, and the script must be compiled to a proprietary file format before a test can be executed. Execution requires invoking the script from within the test system. Some systems also provide a script debugger. The testing system becomes, in effect, a separate IDE for creating, editing, and executing tests.&lt;br /&gt;&lt;br /&gt;The artificial separation of testing and development creates a barrier to using the testing system during the implementation phase. 
In addition to the resource competition between the IDE and the testing system, the developer must also learn a completely different programming language and IDE.&lt;br /&gt;&lt;br /&gt;Rebecca facilitates IDE integration in several ways. RebeccaServer, which is responsible for user control of the test system, has a lightweight footprint; its low resource requirements allow it to execute on the same machine as the IDE. Rebecca distributes the resource burden of test execution to RebeccaAgents, which provides good scalability and allows RebeccaServer to continue sharing the machine with the IDE as the number of users in a test increases.&lt;br /&gt;&lt;br /&gt;Rebecca describes an easy-to-use record/playback subsystem for creating, editing, and executing tests. Events are recorded and played back using a VCR-like interface. A recorded script is modeled as an ordered list of events that can be edited and executed immediately, without compiling. A variety of playback options provides sophisticated script control without programming. The guiding principle of the subsystem is that if it is easy to use, it will be used. More details are provided in Sections 6.2.5 and 6.2.6.&lt;br /&gt;&lt;br /&gt;In many situations, executing a test script containing an ordered list of events for playback is sufficient. However, if sophisticated data structures and control flow are needed for test execution, the script must contain elements of a programming language. Rather than using a proprietary test script language, Rebecca requires that the script be exported in the IDE’s own programming language. This allows the script developer to use familiar tools to create sophisticated tests. More detail is provided in Section 6.2.7.&lt;br /&gt;&lt;br /&gt;Unlike traditional test systems, the architecture requires an application to explicitly register itself with Rebecca. 
The registration process consists of creating a RebeccaAgent and registering with it. The process is straightforward (a few lines of code) and is performed by the application at initialization time. Registration gives the IDE runtime control over Rebecca. This allows the developer, for example, to use the IDE’s debugger to control runtime execution of the test script and to trace the flow of events from the test script to the application. In addition, as Section 6.2.3 shows, the application’s peripheral awareness of Rebecca dramatically expands the ability to record and play back events.&lt;br /&gt;&lt;br /&gt;Rebecca-J, the Java implementation of Rebecca, provides two mechanisms for connecting to an application: visual editing and inline code. When a visual editor is used, the topUIComponent property of the Java bean RebeccaAgentImpl is connected to the application’s highest-level user interface (UI) component. Figure 30 shows the connection between the bean and the JFrame of a sample program in IBM’s Visual Age Visual Composition Editor.&lt;br /&gt;Figure 30: Connecting to Rebecca-J using IBM’s Visual Age Visual Composition Editor&lt;br /&gt;&lt;br /&gt;Rebecca-J can also be connected to a Java application with two lines of code (see Figure 31). The first line creates an instance of the RebeccaAgentImpl class. The second line tells Rebecca-J that the application’s highest-level UI component is the return value of the method getJFrame().&lt;br /&gt;RebeccaAgentImpl rebeccaAgentImpl = new RebeccaAgentImpl();&lt;br /&gt;rebeccaAgentImpl.setTopUIComponent(getJFrame());&lt;br /&gt;Figure 31: Connecting to Rebecca-J using inline code&lt;br /&gt;&lt;br /&gt;6.2.2 User Interface Independence&lt;br /&gt;Applications with a user interface component require constant revision as the developer fine-tunes human-computer interaction. Iterative development creates compatibility problems for recordings made with previous versions of the application. Rebecca reduces these problems with recordings that are independent of any specific version of the user interface.&lt;br /&gt;&lt;br /&gt;Once a runtime connection with the application is made, Rebecca traverses the UI component tree, registering all subcomponents, and the events they generate, with a ComponentMonitor. Registration also causes the RebeccaAgent to search for and connect to a RebeccaServer. When the VCR-like record button is pressed, Rebecca records events on components that have been registered with the ComponentMonitor. The recording is modeled as an ordered list of events. Each list element includes a non-volatile identifier of the component generating the event, a wait time (the time in milliseconds since the previous event was recorded), and event-specific information.&lt;br /&gt;&lt;br /&gt;The non-volatile component identifier is necessary for persistent storage and for multi-agent playback of recordings. The ComponentMonitor maintains a map from identifiers to runtime components. The non-volatile identifier is generated from a static label that is part of each component; in Rebecca-J, for example, it is the value returned by the method getName(). An error occurs if getName() returns an identifier that has already been mapped. If the component does not support getName(), the identifier is simply the class name combined with a unique sequence number. 
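The identifier rules above can be sketched as a small registry: use the component’s name when one exists, reject a name that is already mapped, and otherwise fall back to the class name plus a sequence number. SketchComponent and ComponentIdRegistry are hypothetical names, a null name stands in for a component that does not support getName(), and the collection is used in the pre-generics style of the Vector-based code in Figure 34.

```java
import java.util.Hashtable;

/** Stand-in for a recordable component; getName() returns null when no label exists. */
class SketchComponent {
    private final String name;

    SketchComponent(String name) {
        this.name = name;
    }

    String getName() {
        return name;
    }
}

/** Hypothetical identifier map in the spirit of the ComponentMonitor. */
class ComponentIdRegistry {
    // Pre-generics collection, matching the Vector-style code in the figures.
    private final Hashtable idToComponent = new Hashtable();
    private int sequence = 0;

    /** Registers a component and returns its non-volatile identifier. */
    String register(SketchComponent c) {
        String id = c.getName();
        if (id == null) {
            // No static label: class name plus a unique sequence number.
            id = c.getClass().getName() + "#" + sequence++;
        } else if (idToComponent.containsKey(id)) {
            throw new IllegalStateException("Identifier already mapped: " + id);
        }
        idToComponent.put(id, c);
        return id;
    }

    /** Playback: maps a recorded identifier back to the runtime component. */
    SketchComponent lookup(String id) {
        return (SketchComponent) idToComponent.get(id);
    }
}
```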
During playback, the component that receives each event is determined by using the ComponentMonitor to map the event’s component identifier to a runtime component.&lt;br /&gt;&lt;br /&gt;By default, Rebecca records only the lowest-level events (e.g., simple mouse and keyboard events) on UI components. Recording low-level events reduces dependencies between components, which makes it easier to add components to, and remove them from, the application without re-recording. If a recording contains events generated by components that no longer exist, however, some editing may be necessary. In some cases, a recording must be deleted entirely because its meaning is lost when a component is removed.&lt;br /&gt;&lt;br /&gt;To promote location independence, data is recorded relative to the component that generated the event, so recordings containing a component remain valid if the component’s location changes. Location can be spatial (e.g., a position on the display), hierarchical (e.g., being the child of another component), or both, as in the case of UI components. For example, if events are recorded from a push button UI component and the push button is later moved, the recording will still play back correctly (see Figure 32).&lt;br /&gt;Figure 32: Recording is played back correctly even though UI components have moved.&lt;br /&gt;&lt;br /&gt;6.2.3 Extensible Component and Event Models&lt;br /&gt;Rebecca extends traditional record/playback beyond the UI with extensible component and event models. Any application activity can be replayed if the source of the activity is defined to Rebecca as a component and the activity itself is defined to Rebecca as an event. 
Examples of activities that can be supported include alternative user input such as voice or gesture, device input such as a timer or temperature gauge, remote communication from another application, and changes in application state.&lt;br /&gt;&lt;br /&gt;Rebecca defines an object-oriented model for components. A component is the source of an event when recording and the receiver of an event during playback. A component also has a location within the hierarchy maintained by the ComponentMonitor, and is responsible for specifying that location.&lt;br /&gt;&lt;br /&gt;Rebecca also defines an object-oriented model for events. In addition to the data requirements discussed in Section 6.2.2, an event has several functional requirements. A constructor must be provided that creates a complete instance with a single call; it is invoked when activity that maps to the event is recorded. Alongside the constructor, a method must be implemented that outputs a string for creating an instance of the event in the IDE’s native language, which allows recordings to be exported in that language. A similar method outputs the event data as a human-readable string, which can be customized to display only the event data of interest; the VCR-like record/playback system uses this string when displaying a recording. Finally, a method is required that locates the receiving component and replays the event during playback.&lt;br /&gt;&lt;br /&gt;Figure 33: UI Components translated to Rebecca’s Component Hierarchy&lt;br /&gt;Many UI frameworks (e.g., AWT, JFC, X-Windows, Motif, MFC) lend themselves naturally to Rebecca’s component and event models. 
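The functional requirements on events described earlier in this section can be expressed as an abstract class. This is a sketch under assumptions: RecordedEvent, toSourceString(), toDisplayString(), KeyPressEvent, KeyReceiver, and ComponentLookup are hypothetical names (only dispatchEvent() appears in Rebecca-J’s own code), and the lookup interface stands in for the ComponentMonitor.

```java
/** Hypothetical lookup from identifier to runtime component (a ComponentMonitor role). */
interface ComponentLookup {
    Object find(String componentId);
}

/** Hypothetical component that can receive replayed key presses. */
interface KeyReceiver {
    void keyPressed(char key);
}

/** Common event data plus the functional requirements described in the text. */
abstract class RecordedEvent {
    final String componentId; // non-volatile identifier of the source component
    final long waitMillis;    // time since the previous recorded event

    RecordedEvent(String componentId, long waitMillis) {
        this.componentId = componentId;
        this.waitMillis = waitMillis;
    }

    /** Source code (here, Java) that recreates this event when a recording is exported. */
    abstract String toSourceString();

    /** Human-readable line for the VCR-like event list; may show only data of interest. */
    abstract String toDisplayString();

    /** Locates the receiving component and replays the event. */
    abstract void dispatchEvent(ComponentLookup monitor);
}

/** Example event type: a single key press. */
class KeyPressEvent extends RecordedEvent {
    final char key;

    // One call creates a complete instance, e.g. when the recorder sees a key press.
    KeyPressEvent(String componentId, long waitMillis, char key) {
        super(componentId, waitMillis);
        this.key = key;
    }

    @Override
    String toSourceString() {
        return "new KeyPressEvent(\"" + componentId + "\", " + waitMillis + "L, '" + key + "')";
    }

    @Override
    String toDisplayString() {
        return "keyPress '" + key + "' on " + componentId + " after " + waitMillis + " ms";
    }

    @Override
    void dispatchEvent(ComponentLookup monitor) {
        KeyReceiver target = (KeyReceiver) monitor.find(componentId);
        target.keyPressed(key);
    }
}
```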
A UI component from one of the frameworks listed above can accept events from sources other than raw input devices such as the mouse and keyboard, which allows Rebecca’s playback system to feed recorded UI events to an application at runtime. Likewise, a UI event generated by such a framework readily identifies the component for which it was intended, which allows Rebecca’s record system to identify where the event should be replayed. Figure 33 shows a base window acting as the root UI component of an application. The window is a container for three push buttons, a slider bar, and a text field, and this organization maps directly to the component hierarchy stored in Rebecca’s ComponentMonitor. Rebecca’s affinity for UI frameworks reduces the amount of code the developer must write to record and play back UI activity (see setTopUIComponent() in Section 6.2.1).&lt;br /&gt;&lt;br /&gt;In some situations, Rebecca’s component and event models do not map directly to application activity, and the developer must write special code to encapsulate the activity for Rebecca. Consider the case in which the developer wishes to record and play back changes in application state.&lt;br /&gt;&lt;br /&gt;Rebecca has five requirements for encapsulating state change. The first is the use of a pattern derived from Java’s PropertyChangeEvent class [133]. 
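Java’s standard java.beans package already combines the PropertyChangeEvent class cited above with listener registration, so it is a convenient reference point for the first two requirements. The sketch below uses only the JDK classes PropertyChangeSupport, PropertyChangeListener, and PropertyChangeEvent; Counter and ChangeRecorder are illustrative and are not part of Rebecca-J. Note that the JDK event carries the property name and the old and new values but no timestamp, so a recorder built this way would have to note the time of each change itself.

```java
import java.beans.PropertyChangeEvent;
import java.beans.PropertyChangeListener;
import java.beans.PropertyChangeSupport;

/** Illustrative state holder: count can only be changed through setCount(). */
class Counter {
    private final PropertyChangeSupport changes = new PropertyChangeSupport(this);
    private int count;

    void addPropertyChangeListener(PropertyChangeListener listener) {
        changes.addPropertyChangeListener(listener);
    }

    int getCount() {
        return count;
    }

    void setCount(int newValue) {
        int oldValue = count;
        count = newValue;
        // Sends a PropertyChangeEvent to every registered listener;
        // nothing is sent when the old and new values are equal.
        changes.firePropertyChange("count", oldValue, newValue);
    }
}

/** Illustrative recorder: logs each change it is notified about. */
class ChangeRecorder implements PropertyChangeListener {
    final StringBuilder log = new StringBuilder();

    @Override
    public void propertyChange(PropertyChangeEvent evt) {
        log.append(evt.getPropertyName()).append(": ")
           .append(evt.getOldValue()).append(" -> ")
           .append(evt.getNewValue()).append('\n');
    }
}
```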
A PropertyChangeEvent contains information about which property changed, when it changed, and the property’s old and new values.&lt;br /&gt;[Figure 33 content not reproduced: the Sample Application window (Button0, a slider, plus and minus push buttons, and a count field showing 0) and its place in the ComponentMonitor hierarchy: Application, GUI Components, Window: Sample Application, containing Button0, Slider, ButtonPlus, ButtonMinus, and TextFieldCount.]&lt;br /&gt;&lt;br /&gt;The second requirement is the observer-listener pattern [134]. A listener registers with the object being observed to indicate interest in a property change. Whenever the property changes, the observed object iterates through its listeners, sending each a PropertyChangeEvent message.&lt;br /&gt;&lt;br /&gt;The third requirement is a component wrapper for the data structure containing the state to be monitored. The wrapper is responsible for converting a state change into an event and forwarding the event to listeners. The wrapper also serves as a placeholder for the state in the ComponentMonitor’s hierarchy. Finally, during playback the wrapper forwards state change events from Rebecca’s player to the data structure that contains the actual state.&lt;br /&gt;&lt;br /&gt;The fourth requirement is for the data structure maintaining the state to provide a Smalltalk-like getter/setter pair for retrieving and altering the state. The rest of the application can access the monitored state only through this getter and setter. The getter simply returns the current state; the setter, in addition to altering the state, generates a state change message and sends it to the component wrapper.&lt;br /&gt;Figure 34: Creation and initialization of PropertyChangeComponentInt in AgentTester&lt;br /&gt;// Create an instance of PropertyChangeComponentInt. 
The constructor&lt;br /&gt;// takes a parameter that identifies the receiver for state changes&lt;br /&gt;// during playback.&lt;br /&gt;PropertyChangeComponentInt c = new PropertyChangeComponentInt(this);&lt;br /&gt;// Assign a unique name to the component (e.g. "count")&lt;br /&gt;c.setName("count");&lt;br /&gt;// Find the state change root node in the ComponentMonitor’s component&lt;br /&gt;// hierarchy&lt;br /&gt;ComponentNode stateRoot =&lt;br /&gt;ComponentMonitorImpl.getComponentMonitorImpl().getStateChangeRoot();&lt;br /&gt;// Register the component as a child of the state change root component&lt;br /&gt;stateRoot.getChildren().addElement(c);&lt;br /&gt;&lt;br /&gt;The last requirement is that a playbackEvent() routine be implemented. This routine converts the state change event into an actual modification of application state. If side effects of the state change also need to be replayed, the routine may perform other actions as well. For example, if the UI is normally updated to reflect the state change, playbackEvent() must perform that update. Placing the side-effect code in the state’s setter can streamline the implementation of playbackEvent(), but this is not always possible or desirable.&lt;br /&gt;&lt;br /&gt;Rebecca-J provides an example of state change encapsulation for integers, which can be adapted with minor modifications to support any application state change. In the example, the class AgentTester maintains the integer variable count, which the developer wishes to monitor. The first step in monitoring count is for AgentTester to create and initialize an instance of the component wrapper PropertyChangeComponentInt, which is placed around the AgentTester class (see Figure 34).&lt;br /&gt;&lt;br /&gt;AgentTester provides getter/setter access to the integer count. The getter, getCount(), simply returns the current value of count. In the setter, the method firePropertyChange() sends a PropertyChangeEvent from the component wrapper to interested listeners. In most situations, Rebecca-J’s recorder is the only listener; if the recorder is active, the event is appended to the recording. The code for the setter is shown in Figure 35.&lt;br /&gt;public void setCount(int newValue) {&lt;br /&gt;// Tell all listeners that the state changed&lt;br /&gt;getCountPropertyChangeComponentInt().firePropertyChange("count",&lt;br /&gt;Integer.valueOf(this.count),&lt;br /&gt;Integer.valueOf(newValue));&lt;br /&gt;// Change the state&lt;br /&gt;this.count = newValue;&lt;br /&gt;}&lt;br /&gt;Figure 35: AgentTester’s modified setter for monitoring state change to integer count&lt;br /&gt;&lt;br /&gt;During playback, Rebecca-J iterates through the recording’s ordered list and executes the abstract method dispatchEvent() on each event.&lt;br /&gt;public void dispatchEvent() {&lt;br /&gt;// Get the ComponentMonitor&lt;br /&gt;ComponentMonitor cm = ComponentMonitorImpl.getComponentMonitorImpl();&lt;br /&gt;// Find the PropertyChangeComponentInt associated with this event.&lt;br /&gt;// This isn’t the component that will RECEIVE the event for playback.&lt;br /&gt;// It will tell us who the receiver is.&lt;br /&gt;PropertyChangeComponentInt pci =&lt;br /&gt;(PropertyChangeComponentInt) cm.getComponent(getComponentId());&lt;br /&gt;// Get the component that is supposed to play back the event.&lt;br /&gt;PropertyChangePlaybackListener receiver = pci.getSource();&lt;br /&gt;// Play back the event.&lt;br /&gt;receiver.playbackEvent(this);&lt;br /&gt;}&lt;br /&gt;Figure 36: Implementation of dispatchEvent() for PropertyChangeEventRecord 
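The dispatch chain in Figure 36, from the recorded event to the wrapper found by its identifier and then to the receiver’s playbackEvent(), can be condensed into a runnable sketch. It is not Rebecca-J’s code: PlaybackReceiver, IntStateWrapper, IntChangeRecord, and SketchPlayer are hypothetical simplifications of the classes in Figures 34 to 37, and a Hashtable keyed by component identifier stands in for the ComponentMonitor (written pre-generics to match the figures).

```java
import java.util.Hashtable;
import java.util.Vector;

/** Receiver of replayed state changes (AgentTester plays this role in the text). */
interface PlaybackReceiver {
    void playbackEvent(IntChangeRecord e);
}

/** Component wrapper: placeholder for the state that knows who receives playback. */
class IntStateWrapper {
    final PlaybackReceiver source;

    IntStateWrapper(PlaybackReceiver source) {
        this.source = source;
    }
}

/** A recorded integer state change. */
class IntChangeRecord {
    final String componentId;
    final int newValue;

    IntChangeRecord(String componentId, int newValue) {
        this.componentId = componentId;
        this.newValue = newValue;
    }

    /** Finds the wrapper by identifier, then hands the event to the wrapper's receiver. */
    void dispatchEvent(Hashtable monitor) {
        // The wrapper is not the receiver; it only knows who the receiver is.
        IntStateWrapper wrapper = (IntStateWrapper) monitor.get(componentId);
        wrapper.source.playbackEvent(this);
    }
}

/** Replays a recording by dispatching each event in order. */
class SketchPlayer {
    static void play(Vector recording, Hashtable monitor) {
        for (Object event : recording) {
            ((IntChangeRecord) event).dispatchEvent(monitor);
        }
    }
}
```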
Although each event type has its own implementation of dispatchEvent(), the purpose is always to identify the component that will receive the event and to send the event to that component for replay.&lt;br /&gt;&lt;br /&gt;Figure 36 shows dispatchEvent() for PropertyChangeEventIntRecord. This implementation is actually inherited from the superclass PropertyChangeEventRecord; the developer did not need to override it to support state change for integers.&lt;br /&gt;&lt;br /&gt;The final requirement is for the receiver, AgentTester, to implement the abstract method playbackEvent(). In the count example, playing back a state change should update a UI component in addition to changing the value of count (see Figure 37).&lt;br /&gt;public void playbackEvent(PropertyChangeEventIntRecord e) {&lt;br /&gt;// Restore the recorded value of count&lt;br /&gt;setCount(((Integer) e.getNewValue()).intValue());&lt;br /&gt;// Replay the side effect: show the new value in the text field&lt;br /&gt;getJTextField1().setText(String.valueOf(getCount()));&lt;br /&gt;}&lt;br /&gt;Figure 37: Implementation of playbackEvent() for AgentTester&lt;br /&gt;&lt;br /&gt;Rebecca-J uses object-oriented models for components and events to expand record/playback beyond the traditional UI. An example included with the system shows how to record changes to application state using the integer variable count. The amount of code the developer must write to monitor a state change is small: four lines for initialization, a few lines each for the getter and setter, and a couple of lines for playbackEvent(). The example’s pattern can be applied to virtually any state change.&lt;br /&gt;&lt;br /&gt;6.2.4 Record Filtration&lt;br /&gt;Rebecca includes a filtration system that improves on traditional recording. Traditionally, all events generated by an application become part of the recording. 
This leaves the user with two options for filtering events: intermittent recording and manual editing. With intermittent recording, the recorder is turned on only during specific periods of application use. With manual editing, the user removes extraneous events from the recording after it has been made.&lt;br /&gt;&lt;br /&gt;Record filtration aids test specificity and the removal of duplicate events. A specific area of the application can be tested with a recording that drives the application with a specific set of events. For example, if the user wants to test only the behavior of the plus and minus push buttons of the application in Figure 33, then the events generated as the mouse pointer moves from the desktop through the main application window to the push buttons are extraneous.&lt;br /&gt;&lt;br /&gt;Duplication occurs when a replayed event causes the application to generate a runtime event that also appears in the recording. Consider a recording of the application shown in Figure 33. The application has been written so that the plus button increments an internal value, count, and the new value is displayed in a text field. Before any user actions are recorded, count is initialized to “0”. The recorder is turned on, the user presses the plus push button once, and the recorder is turned off. The value of count is now “1”. Because Rebecca records both the push button press and the change to the integer value, problems can arise during playback.&lt;br /&gt;&lt;br /&gt;When the recording is replayed (with count still at “1”), the button press event is executed first. This triggers an update of count, followed by the display of the value “2” in the text field. The recorded change to count is replayed next. 
This causes a second update of count, followed by the display of the value “1” in the text field.&lt;br /&gt;&lt;br /&gt;Intermittent recording cannot always correct event duplication, because the user may not be able to turn the recorder off soon enough. In the count example, it would be almost impossible, because the push button press event and the value change event occur within milliseconds of each other. Manual editing of the recording is the only alternative available in traditional test systems. Although this works, it can be a time-consuming process, particularly for large recordings.&lt;br /&gt;&lt;br /&gt;In addition to traditional record filtration, the architecture offers another alternative: selective recording. Selective recording lets the user filter events by selecting which components participate in a recording. The components are presented to the user in a UI that renders the ComponentMonitor’s hierarchy, and the user interacts with this UI to activate or deactivate components in the hierarchy. Only events from active components are included in a recording.&lt;br /&gt;&lt;br /&gt;Rebecca-J implements selective recording using a graphical tree to represent the ComponentMonitor’s component hierarchy. In Figure 38, the component that monitors the state change for the integer count is disabled to prevent event duplication.&lt;br /&gt;Figure 38: Selective recording with Rebecca-J&lt;br /&gt;&lt;br /&gt;6.2.5 Script Simplification&lt;br /&gt;Rebecca describes an easy-to-use record/playback subsystem for creating, editing, and executing tests. Recording starts when the user presses Rebecca’s VCR-like record button and stops when the user presses the stop button. 
Unlike traditional test systems, however, the recording is modeled as an ordered list of events rather than a text file of proprietary script statements.&lt;br /&gt;&lt;br /&gt;Rebecca specifies a UI for navigating through the recorded event list. The UI renders each event as a separate line in a scrolling list box, with a subset of the list visible at any one time. The REW and FF buttons jump to the beginning and end of the list, respectively. A scrollbar allows the user to move through the list at a finer granularity, and the up and down arrow keys move through the list line by line.&lt;br /&gt;&lt;br /&gt;Rebecca also defines editing capabilities on the event list. Using the UI, the user can select a region of the event list for cutting or copying. When the edit command is executed, the events in the selected region are copied to a global clipboard. The user can then use the UI navigation tools to select the position in the list at which to paste the clipboard events.&lt;br /&gt;Figure 39: Recorder turned on and recording of plus push button press made.&lt;br /&gt;&lt;br /&gt;The global event clipboard allows events recorded for one user to be copied to another user’s recording. Sharing events between recordings makes it easy to create new scripts with less actual recording, which reduces the amount of work needed to create a multiuser test.&lt;br /&gt;Figure 40: Push button press events copied and pasted back into the event list.&lt;br /&gt;&lt;br /&gt;Rebecca always maintains the current position in the event list. The clipboard uses this position as the starting point of a list selection for a cut or copy, and as the place to insert the clipboard contents for a paste. 
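Cut, copy, and paste anchored at the current position can be sketched with a Vector-based event list. EventListEditor and its methods are hypothetical; events are plain strings for brevity, the static clipboard models the global clipboard shared between users’ recordings, and leaving the position unchanged after a paste is an assumption the text does not settle.

```java
import java.util.Vector;

/** Hypothetical event-list editor built around a current position. */
class EventListEditor {
    // A single clipboard shared by all recordings, so events can be copied
    // from one user's recording into another's.
    static final Vector clipboard = new Vector();

    final Vector events = new Vector(); // the recording: an ordered list of events
    int position = 0;                   // current position in the list

    /** Copies count events, starting at the current position, to the clipboard. */
    void copy(int count) {
        clipboard.clear();
        clipboard.addAll(events.subList(position, position + count));
    }

    /** Cut: copy the region, then remove it from the list. */
    void cut(int count) {
        copy(count);
        events.subList(position, position + count).clear();
    }

    /** Inserts the clipboard contents at the current position. */
    void paste() {
        events.addAll(position, clipboard);
    }

    /** Splices a newly recorded event in at the current position and advances past it. */
    void record(Object event) {
        events.insertElementAt(event, position);
        position++;
    }
}
```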
Finally, if the recorder is reactivated, the current position determines where new events are spliced directly into the event list, rather than appended at the end.&lt;br /&gt;&lt;br /&gt;Rebecca-J’s implementation of the event list script architecture is shown in the series of storyboard snapshots in Figure 39 and Figure 40. The recorder is turned on and a recording of pressing the plus push button is made (Figure 39). The recorder is turned off, the event list is displayed, and the events that resulted from the push button press are selected and copied to the clipboard. The clipboard contents are then pasted into the event list (Figure 40). The script is rewound to the beginning and played back, and the results of the replay are displayed. Note that the count text field displays “3”: count, still “1” from the recording session, is incremented by the two plus push button press events the script now contains (Figure 41).&lt;br /&gt;Figure 41: Result of replay.&lt;br /&gt;&lt;br /&gt;6.2.6 Playback Control and Feedback&lt;br /&gt;Rebecca defines a VCR metaphor for playback control of recordings. Playback of a recorded script begins when the play button is pressed. Playback stops when one of the following conditions is met: a synchronization event occurs, a breakpoint is reached, single-step mode is active, the stop button is pressed, or the end of the script is reached. When the playback system encounters a synchronization event, replay of the script is paused. If the synchronization condition is met or the play button is pressed, playback continues with the event following the synchronization event. For more information on synchronization, see Section 6.3.2.&lt;br /&gt;&lt;br /&gt;When a breakpoint is encountered, replay of the script is stopped. If the play button is pressed again, playback continues with the event following the breakpoint. Activating single-step mode is equivalent to setting a breakpoint after every event in the script. 
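The breakpoint and single-step rules above, together with the end-of-script rewind described next, can be sketched as a play() method that replays from the current position until a stopping condition is met. Player and its fields are hypothetical, a breakpoint is modeled as “stop after event i”, and synchronization pauses and the stop button are omitted.

```java
/** Hypothetical player illustrating breakpoints, single-step mode, and rewind at the end. */
class Player {
    final String[] script;        // recorded events, in order
    final boolean[] breakpoints;  // breakpoints[i]: stop after replaying event i
    boolean singleStep = false;   // same as a breakpoint after every event
    int position = 0;             // index of the next event to replay
    final StringBuilder replayed = new StringBuilder();

    Player(String[] script) {
        this.script = script;
        this.breakpoints = new boolean[script.length];
    }

    /** Called when the play button is pressed. */
    void play() {
        if (position == script.length) {
            position = 0; // end was reached earlier: rewind to the first event
        }
        while (script.length > position) {
            int current = position++;
            replayed.append(script[current]).append(' '); // stand-in for replaying the event
            if (breakpoints[current] || singleStep) {
                return; // the next play() resumes with the following event
            }
        }
        // End of script: playback stops here.
    }
}
```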
Pressing the stop button during playback has the same effect as encountering a breakpoint. When the end of the recording is reached, playback stops; if the play button is pressed again, the script is rewound and playback begins with the first event in the script.&lt;br /&gt;&lt;br /&gt;Unlike traditional test script systems, Rebecca allows playback to start anywhere in a script. The starting position is specified using the navigational tools (FF button, REW button, scrollbar, and arrow keys) discussed in Section 6.2.5. An arbitrary starting position is useful when the user wants to execute part of a recording but is not committed to pruning the recording to a specific subset. This capability is particularly useful in the early stages of runtime error discovery, when the user knows that the recording triggers an application error but is not sure which subset of events actually causes the problem.&lt;br /&gt;&lt;br /&gt;Rebecca describes a continuous play control that disables script termination when the end of the recording is reached. Rather than stopping, the script is rewound and playback resumes with the first event in the recording. Repeatedly executing a recording places a constant load on the application, which helps assess its performance characteristics. When multiple recordings are executed simultaneously with continuous play, the scalability of a multiuser application can be assessed. Finally, continuous play can be used to flush out bugs that appear only when the application is under load.&lt;br /&gt;&lt;br /&gt;Rebecca allows the user to control how long the playback system waits before replaying the next event. 
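The three wait-time options described next (none, recorded, and a fixed value) reduce to choosing the delay before each event. WaitPolicy, delayBefore(), and totalWait() are hypothetical names; recordedMillis stands for the per-event wait time retained in the recording (Section 6.2.2).

```java
/** Hypothetical wait-time policies for the delay before replaying each event. */
enum WaitPolicy { NONE, RECORDED, FIXED }

class WaitTimes {
    /**
     * Milliseconds to wait before replaying one event.
     * recordedMillis: the wait time stored with the event when it was recorded.
     * fixedMillis: a user-chosen constant; a large value gives slow-motion playback.
     */
    static long delayBefore(WaitPolicy policy, long recordedMillis, long fixedMillis) {
        switch (policy) {
            case NONE:
                return 0;              // stress testing, maximum throughput
            case RECORDED:
                return recordedMillis; // faithful replay at the recorded pace
            case FIXED:
                return fixedMillis;    // controlled, constant event rate
            default:
                throw new IllegalArgumentException("Unknown policy: " + policy);
        }
    }

    /** Total time spent waiting when a whole recording is replayed under a policy. */
    static long totalWait(WaitPolicy policy, long[] recordedWaits, long fixedMillis) {
        long total = 0;
        for (long recorded : recordedWaits) {
            total += delayBefore(policy, recorded, fixedMillis);
        }
        return total;
    }
}
```

A player loop would call Thread.sleep(delayBefore(...)) before replaying each event; totalWait() only shows how the choice of policy changes the overall pacing.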
Rebecca specifies three types of wait time: none, recorded, and fixed value.&lt;br /&gt;Setting the wait time to none is useful for stress testing and for determining maximum&lt;br /&gt;throughput. At the time a recording is made, the wait time between successive events is retained.&lt;br /&gt;Using the recorded wait time gives an accurate playback of the recording. This helps assess&lt;br /&gt;how the application behaves under normal throughput levels.&lt;br /&gt;Traditional testing systems provide wait time options like none and recorded. Rebecca,&lt;br /&gt;however, also allows the wait time to be set to a fixed value. This lets the user control&lt;br /&gt;the rate at which events are passed to the application. Throughput control is useful for&lt;br /&gt;investigating performance characteristics and timing-dependent bugs. In addition, setting the&lt;br /&gt;wait time to a large value allows application behavior to be observed in slow motion. This is a&lt;br /&gt;powerful tool for investigating errors that happen too quickly at normal throughput rates.&lt;br /&gt;Rebecca’s VCR metaphor is defined for feedback as well as control. The push buttons that&lt;br /&gt;make up Rebecca’s VCR-like user interface indicate the current state of the record/playback&lt;br /&gt;system. During playback, for example, the play button will be depressed. If a synchronization&lt;br /&gt;event occurs, the play button will release and the synchronization button will be depressed.&lt;br /&gt;This feedback happens without the need for user intervention. Figure 42 shows the feedback&lt;br /&gt;for the transition from playback to synchronization in the Java implementation of Rebecca.&lt;br /&gt;For more information on synchronization, see Section 6.3.2.&lt;br /&gt;Figure 42: Feedback for synchronization state in Rebecca-J.&lt;br /&gt;Rebecca requires feedback for events in the recording. Rebecca maintains the current position&lt;br /&gt;in the event list during record/playback. 
Feedback about this position should be continually&lt;br /&gt;displayed to the user.&lt;br /&gt;Rebecca-J experimented with this requirement by implementing four different feedback&lt;br /&gt;mechanisms: progress bar, highlighted event, scrollbar, and current event field. The progress&lt;br /&gt;bar displays the current playback position as a shaded percentage of the bar. If the current&lt;br /&gt;position is ten percent of the way through the script, then the progress bar will indicate ten&lt;br /&gt;percent complete. Rebecca-J displays a subset of the event list around the current event in a&lt;br /&gt;scrolling window. The current event is highlighted in this window. The scrollbar associated&lt;br /&gt;with the scrolling window also indicates the position of the current event with respect to the&lt;br /&gt;event list. Finally, a separate text field displays the event at the current position.&lt;br /&gt;After several feedback mechanisms had been implemented, it was discovered that they placed an artificial load&lt;br /&gt;on the application. The feedback was helpful, but the artificial load made&lt;br /&gt;recordings and playbacks sluggish. Although the progress bar provided feedback&lt;br /&gt;with the smallest amount of load, it was impossible to completely eliminate the probe effect.&lt;br /&gt;Rebecca-J was therefore modified to provide the feedback mechanisms as options that are turned off by&lt;br /&gt;default.&lt;br /&gt;Rebecca requires additional feedback within the application during playback: a replay cursor.&lt;br /&gt;The replay cursor shows where UI activity is occurring during playback. The look of the&lt;br /&gt;replay cursor is configurable. By default it is several times larger than the default system&lt;br /&gt;cursor. An enlarged cursor was chosen after it was observed that users had difficulty seeing&lt;br /&gt;UI activity during single user playback. 
The larger cursor also makes it easier to see UI activity&lt;br /&gt;on multiple displays during a multiuser test.&lt;br /&gt;6.2.7 Native Language Recordings&lt;br /&gt;Executing a test script containing a simple list of runtime events for playback is sufficient in&lt;br /&gt;many situations. However, if sophisticated data structures and control flow for test execution&lt;br /&gt;are desired, the script must contain elements of a programming language. Traditional test&lt;br /&gt;systems satisfy this requirement with a proprietary C-like language. Instead of this proprietary&lt;br /&gt;approach, Rebecca describes a blueprint for exporting scripts in a familiar format: the IDE’s&lt;br /&gt;native programming language.&lt;br /&gt;Using the IDE’s programming language has several advantages. First, the test system’s&lt;br /&gt;learning curve is reduced. The developer uses the same language to produce both the&lt;br /&gt;application and test scripts. A familiar process is also followed for editing, compiling, and&lt;br /&gt;executing.&lt;br /&gt;Since the test script is a program, the IDE’s powerful debugging facilities are available.&lt;br /&gt;Modern debugger capabilities include line-by-line execution control, remote debugging,&lt;br /&gt;variable monitoring, and runtime code changes. Traditional test systems do not have a&lt;br /&gt;debugger.&lt;br /&gt;The architecture for native language recordings consists of three main components. First,&lt;br /&gt;each event in the event list must support the ability to export itself in the IDE’s native&lt;br /&gt;language. Second, during the export process, the test system provides wrapper code&lt;br /&gt;around the exported events. The wrapper code prepares the test system for script execution&lt;br /&gt;and performs cleanup afterward. Third, the same VCR-like control mechanisms available to the user when&lt;br /&gt;executing runtime event lists must be available for native language recordings. However, the&lt;br /&gt;FF button, REW button, and positional feedback described in Section 6.2.6 will not be available.&lt;br /&gt;This is because the events in a native recording can be non-sequential. For example, eventC&lt;br /&gt;may appear directly after eventB in a native language recording, but if eventA and eventB are inside&lt;br /&gt;a loop construct, the event executed after eventB may be eventA rather than eventC. Positional feedback is available,&lt;br /&gt;however, using the IDE’s debugger.&lt;br /&gt;Figure 43: Implementation of MouseEventRecord’s toJavaString() Method&lt;br /&gt;public String toJavaString() {&lt;br /&gt;// Emit a comment describing the event, then the statement that replays it.&lt;br /&gt;// The timestamp and getWhen() are long values, so both literals get an L suffix.&lt;br /&gt;String result = new String();&lt;br /&gt;result = result.concat("//" + toString() + "\n");&lt;br /&gt;result = result.concat("executeEventRecord(new MouseEventRecord(" +&lt;br /&gt;getTimestamp() +&lt;br /&gt;"L, \"" +&lt;br /&gt;getComponentId() +&lt;br /&gt;"\", " +&lt;br /&gt;getWaittime() +&lt;br /&gt;", " +&lt;br /&gt;getID() +&lt;br /&gt;", " +&lt;br /&gt;getWhen() +&lt;br /&gt;"L, " +&lt;br /&gt;getModifiers() +&lt;br /&gt;", " +&lt;br /&gt;getX() +&lt;br /&gt;", " +&lt;br /&gt;getY() +&lt;br /&gt;", " +&lt;br /&gt;getClickCount() +&lt;br /&gt;", " +&lt;br /&gt;getPopupTrigger() +&lt;br /&gt;"));\n\n");&lt;br /&gt;return result;&lt;br /&gt;}&lt;br /&gt;Figure 44: Sample output from MouseEventRecord’s toJavaString() Method&lt;br /&gt;Rebecca-J divides the process of creating a native language recording into two parts. First, all&lt;br /&gt;recorded events in the runtime event list must support the method toJavaString(). This&lt;br /&gt;method returns a string containing an invocation of the PlaybackAsClass method&lt;br /&gt;executeEventRecord() with a single parameter. The parameter is an event class&lt;br /&gt;constructor call that creates an event instance. Figure 43 shows the implementation of&lt;br /&gt;MouseEventRecord’s toJavaString() method. 
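The export described in this section has two parts. Each event renders itself as a Java statement, and the test system wraps the statements in a class. Both parts can be sketched in a few lines of Java. Everything here is hypothetical: ExportableEvent, ButtonPressRecord, and NativeExporter are invented for illustration. The generated class also assumes that PlaybackThreadAsClass declares a public abstract executeEventRecordList() method; this chapter names that method but does not show its signature. The sketch only produces source text; compiling and running the result would require the Rebecca-J classes.

```java
// Hypothetical sketch: exporting a runtime event list as Java source text.
interface ExportableEvent {
    String toJavaString(); // a comment line plus one Java statement per event
}

class ButtonPressRecord implements ExportableEvent {
    private final String componentId;
    private final long waitTime;

    ButtonPressRecord(String componentId, long waitTime) {
        this.componentId = componentId;
        this.waitTime = waitTime;
    }

    public String toJavaString() {
        return "        // " + waitTime + " " + componentId + " press\n"
             + "        executeEventRecord(new ButtonPressRecord(\""
             + componentId + "\", " + waitTime + "L));\n";
    }
}

class NativeExporter {
    // Wraps the per-event statements in a subclass of PlaybackThreadAsClass.
    static String export(String className, ExportableEvent[] events) {
        StringBuilder src = new StringBuilder();
        src.append("public class ").append(className)
           .append(" extends PlaybackThreadAsClass {\n")
           .append("    public void executeEventRecordList() {\n");
        for (ExportableEvent e : events) {
            src.append(e.toJavaString());
        }
        src.append("    }\n}\n");
        return src.toString();
    }
}
```

Each event class owns its rendering, so adding a new event type requires no change to the exporter, which supplies only the wrapper.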
Figure 44 gives an example of output from the&lt;br /&gt;toJavaString() method. The output includes a human-readable comment to describe the event.&lt;br /&gt;//50 JButton2 MouseEvent(MOUSE_MOVED,(16,5),popUp=false,clicks=0)&lt;br /&gt;executeEventRecord(new MouseEventRecord(947529350140L, "JButton2", 50, 503,&lt;br /&gt;947529350140L, 0, 16, 5, 0, false));&lt;br /&gt;Second, to export a recording as Java code, a class that inherits from the abstract class&lt;br /&gt;PlaybackThreadAsClass is created. The only code in this new class is the implementation&lt;br /&gt;of the abstract method executeEventRecordList(), which consists of the strings produced by&lt;br /&gt;invoking toJavaString() for each event in the runtime event list (see Figure 45).&lt;br /&gt;Figure 45: Sample implementation of executeEventRecordList()&lt;br /&gt;. . .&lt;br /&gt;//50 JButton2 MouseEvent(MOUSE_MOVED,(16,5),popUp=false,clicks=0)&lt;br /&gt;executeEventRecord(new MouseEventRecord(947529350140L, "JButton2", 50, 503, 947529350140L, 0, 16, 5, 0,&lt;br /&gt;false));&lt;br /&gt;//770 JButton2 MouseEvent(MOUSE_PRESSED,(16,5),popUp=false,clicks=1)&lt;br /&gt;executeEventRecord(new MouseEventRecord(947529350910L, "JButton2", 770, 501, 947529350910L, 16, 16, 5, 1,&lt;br /&gt;false));&lt;br /&gt;//60 JButton2 MouseEvent(MOUSE_RELEASED,(16,5),popUp=false,clicks=1)&lt;br /&gt;executeEventRecord(new MouseEventRecord(947529351410L, "JButton2", 60, 502, 947529351020L, 16, 16, 5, 1,&lt;br /&gt;false));&lt;br /&gt;//0 JButton2 MouseEvent(MOUSE_CLICKED,(16,5),popUp=false,clicks=1)&lt;br /&gt;executeEventRecord(new MouseEventRecord(947529351410L, "JButton2", 0, 500, 947529351020L, 16, 16, 5, 1,&lt;br /&gt;false));&lt;br /&gt;//390 JButton2 MouseEvent(MOUSE_MOVED,(16,4),popUp=false,clicks=0)&lt;br /&gt;Figure 46: Recording customized with a for loop&lt;br /&gt;. . .&lt;br /&gt;//50 JButton2 MouseEvent(MOUSE_MOVED,(16,5),popUp=false,clicks=0)&lt;br /&gt;executeEventRecord(new MouseEventRecord(947529350140L, "JButton2", 50, 503, 947529350140L, 0, 16, 5, 0,&lt;br /&gt;false));&lt;br /&gt;// Push the button 10 times&lt;br /&gt;for (int i = 0; i &lt; 10; i++) {&lt;br /&gt;//770 JButton2 MouseEvent(MOUSE_PRESSED,(16,5),popUp=false,clicks=1)&lt;br /&gt;executeEventRecord(new MouseEventRecord(947529350910L, "JButton2", 770, 501, 947529350910L, 16, 16, 5, 1,&lt;br /&gt;false));&lt;br /&gt;//60 JButton2 MouseEvent(MOUSE_RELEASED,(16,5),popUp=false,clicks=1)&lt;br /&gt;executeEventRecord(new MouseEventRecord(947529351410L, "JButton2", 60, 502, 947529351020L, 16, 16, 5, 1,&lt;br /&gt;false));&lt;br /&gt;//0 JButton2 MouseEvent(MOUSE_CLICKED,(16,5),popUp=false,clicks=1)&lt;br /&gt;executeEventRecord(new MouseEventRecord(947529351410L, "JButton2", 0, 500, 947529351020L, 16, 16, 5, 1,&lt;br /&gt;false));&lt;br /&gt;}&lt;br /&gt;//390 JButton2 MouseEvent(MOUSE_MOVED,(16,4),popUp=false,clicks=0)&lt;br /&gt;When the user presses Rebecca-J’s VCR-like play button, executeEventRecordList() will&lt;br /&gt;be executed from start to finish in a manner similar to the runtime event list discussed in&lt;br /&gt;previous sections. Unlike a runtime event list, however, a native language recording can be customized by the developer&lt;br /&gt;using Java data structures and control constructs. Figure 46 shows a modification of&lt;br /&gt;Figure 45 in which the push button events are executed ten times in a for loop.&lt;br /&gt;6.3 Multiuser Support&lt;br /&gt;6.3.1 Interprocess Communication Independence&lt;br /&gt;Independence from an application’s interprocess communication (IPC) mechanism is an&lt;br /&gt;important goal of the architecture. Rebecca’s specifications include IPC independence and the&lt;br /&gt;ability to record, play back, and monitor process-to-process communication. This combination&lt;br /&gt;is not available in other test systems.&lt;br /&gt;Multiuser applications require interprocess communication to share information between&lt;br /&gt;distributed components. A variety of IPC mechanisms are available, including shared&lt;br /&gt;memory, pipes, files, UDP, sockets, RPC, CORBA, DCOM, HTTP, EJB, and RMI. The list&lt;br /&gt;continues to grow as communication technology improves. Recently, for example, Sun&lt;br /&gt;announced the availability of the Java Shared Data Toolkit (JSDT) for building collaborative&lt;br /&gt;applications [135].&lt;br /&gt;Testing systems have different approaches to handling application IPC. Commercial products&lt;br /&gt;are usually IPC-independent. Independence is attractive because it increases the number of&lt;br /&gt;applications that can be tested. These systems treat distributed application communication as&lt;br /&gt;a black box. The disadvantage of the black-box approach is that it is not possible to monitor&lt;br /&gt;IPC. This restriction means that an application’s IPC subsystem cannot be tested directly.&lt;br /&gt;Several academic systems support the ability to record, play back, and monitor application IPC.&lt;br /&gt;Sridharan, for example, proposed a test system for applications that communicate using&lt;br /&gt;CORBA [136]. The system hooks into the IPC of any CORBA application by analyzing IDL&lt;br /&gt;stubs. This approach has several drawbacks. First, it is tied to a specific IPC: CORBA. 
This&lt;br /&gt;restricts the number of distributed applications usable with the test system. Second, it forces&lt;br /&gt;the test system to monitor all IPC traffic, rather than a targeted subset. This all-or-nothing&lt;br /&gt;approach could overwhelm the user with the amount of distributed communication&lt;br /&gt;occurring in the application. Additionally, the probe effect caused by monitoring could affect&lt;br /&gt;the behavior of the application while the test system is being used.&lt;br /&gt;Rebecca provides for the targeted monitoring of application IPC with extensible component&lt;br /&gt;and event models (see Section 6.2.3). Process-to-process communication is treated as a&lt;br /&gt;component event, similar to any other event in the application. The developer is responsible&lt;br /&gt;for using Rebecca’s component and event frameworks to define a receiving component and&lt;br /&gt;event for the IPC.&lt;br /&gt;The developer is also responsible for using Rebecca’s record filtration system to ensure that&lt;br /&gt;redundant events are not present in the recording. Redundant events can occur when an IPC&lt;br /&gt;causes the receiving process to generate a local event that is also recorded. For example, an&lt;br /&gt;IPC carrying a remote mouse event is received and generates a local mouse event. Both the IPC&lt;br /&gt;and the local mouse event will appear in the recording unless record filtration is used.&lt;br /&gt;Rebecca-J includes an example of IPC record/playback. The program shown in Figure 41 and&lt;br /&gt;described in previous sections was expanded into a multiuser application. When count is&lt;br /&gt;changed locally, the new value is broadcast via RMI to all remote instances of the application.&lt;br /&gt;Two methods, remoteSetCount() and setCount(), distinguish between remote and local&lt;br /&gt;changes to count. 
This distinction is necessary to keep remote receivers from re-sending the&lt;br /&gt;change.&lt;br /&gt;setCount(), shown in Figure 47, records a local change in the count value with the&lt;br /&gt;ComponentMonitor by invoking firePropertyChange(). The local change is broadcast to&lt;br /&gt;all remote instances of the application by invoking fireUpdateCountRemotely(). Finally,&lt;br /&gt;the actual value of count is updated.&lt;br /&gt;Figure 47: Implementation of setCount()&lt;br /&gt;Figure 48: Implementation of remoteSetCount()&lt;br /&gt;remoteSetCount(), shown in Figure 48, records the remote change in the count value&lt;br /&gt;with the ComponentMonitor by invoking firePropertyChange() on a component named&lt;br /&gt;“remoteCount”. The local change in the count value is also recorded by invoking the&lt;br /&gt;same method on the component named “count”. The local value of count is updated.&lt;br /&gt;Finally, the UI text field displaying count is updated with the new value.&lt;br /&gt;The code for remoteSetCount() will introduce redundant events into a recording unless&lt;br /&gt;record filtration is used (see Figure 49). One event appears in the recording for the remote&lt;br /&gt;change to count. 
Another event appears in the recording for the local change to count.&lt;br /&gt;public void setCount(int newValue) {&lt;br /&gt;// Record the local change with the ComponentMonitor&lt;br /&gt;getCountPropertyChangeComponentInt().firePropertyChange("count", Integer.valueOf(this.count),&lt;br /&gt;Integer.valueOf(newValue));&lt;br /&gt;// Broadcast the new value to all remote instances of the application&lt;br /&gt;fireUpdateCountRemotely(newValue);&lt;br /&gt;this.count = newValue;&lt;br /&gt;}&lt;br /&gt;public void remoteSetCount(int newValue) {&lt;br /&gt;// Record the remote change, then the resulting local change&lt;br /&gt;getRemoteCountPropertyChangeComponentInt().firePropertyChange("remoteCount",&lt;br /&gt;Integer.valueOf(this.count),&lt;br /&gt;Integer.valueOf(newValue));&lt;br /&gt;getCountPropertyChangeComponentInt().firePropertyChange("count",&lt;br /&gt;Integer.valueOf(this.count),&lt;br /&gt;Integer.valueOf(newValue));&lt;br /&gt;// Update the value without re-broadcasting it, then refresh the text field&lt;br /&gt;this.count = newValue;&lt;br /&gt;getJTextField1().setText(String.valueOf(this.count));&lt;br /&gt;}&lt;br /&gt;Figure 49: Record filtration to remove redundant events while&lt;br /&gt;recording IPC.&lt;br /&gt;6.3.2 Playback Orchestration&lt;br /&gt;The phrase "orchestration" conjures up the image of a concert hall filled with musicians led by&lt;br /&gt;a conductor. The musical score and conductor provide individual musicians with mechanisms&lt;br /&gt;to synchronize their instruments with the rest of the orchestra. Rebecca uses a musical score&lt;br /&gt;metaphor for orchestrating virtual users. The tester is the composer. Rebecca is the&lt;br /&gt;conductor. The virtual user is the instrument. The test script is the musical score. The score&lt;br /&gt;provides both notes and timing for the notes. Just as bar lines divide a musical score into measures, the&lt;br /&gt;script score has events that mark off intervals in the test script. These events are&lt;br /&gt;synchronization points. During a test session, a test script statement in the next interval can't&lt;br /&gt;proceed until all concurrently executing test scripts reach the end of the current interval. 
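The interval rule just stated is essentially a barrier. As a minimal sketch only, Java's standard java.util.concurrent.CyclicBarrier can model the basic rule: each thread stands in for a virtual user, and await() marks the end of an interval. The class IntervalBarrierDemo is hypothetical. Unlike Rebecca, the set of participants is fixed when the barrier is created, so scripts with unequal numbers of synchronization events are not handled.

```java
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;

// Sketch: virtual users (threads) replay their events interval by interval;
// CyclicBarrier.await() plays the role of the synchronization point.
class IntervalBarrierDemo {
    static String run(int users, int intervals) {
        StringBuffer log = new StringBuffer(); // StringBuffer appends are synchronized
        CyclicBarrier barrier = new CyclicBarrier(users);
        Thread[] threads = new Thread[users];
        for (int u = 0; users > u; u++) {
            threads[u] = new Thread(() -> {
                try {
                    for (int k = 0; intervals > k; k++) {
                        log.append(k);   // this user's events for interval k
                        barrier.await(); // wait until every user ends interval k
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } catch (BrokenBarrierException e) {
                    // another user was interrupted; abandon this run
                }
            });
            threads[u].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return log.toString();
    }
}
```

Each user appends its interval number before calling await(). For three users and three intervals, the log therefore always reads 000111222: no user can record interval 1 until all three have recorded interval 0.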
This&lt;br /&gt;synchronization technique is a modified version of the BSP algorithm [137].&lt;br /&gt;Orchestration is viewed as the process of scoring test scripts individually, and then tying the&lt;br /&gt;scripts together. A script contains a synchronization event wherever it synchronizes with one&lt;br /&gt;or more other virtual users. When scripts are linked, they may have unequal numbers of&lt;br /&gt;synchronization events. This can occur, for example, when two virtual users synchronize on&lt;br /&gt;an event that doesn't involve a third virtual user. Fortunately, this does not present a problem.&lt;br /&gt;During execution, the third virtual user would be unaware of the synchronization and move on&lt;br /&gt;to its next synchronization event. This scenario is the musical equivalent of a duet between&lt;br /&gt;two solo instruments while the rest of the orchestra plays in step in the background.&lt;br /&gt;Start CollabBillboard Server&lt;br /&gt;Wait for Start Server Dialogue&lt;br /&gt;Wait for OK button to activate&lt;br /&gt;Press OK button&lt;br /&gt;Wait for Task Menu Dialogue&lt;br /&gt;Press Site Selection button&lt;br /&gt;Wait for Site Selection Dialogue&lt;br /&gt;Move to x,y pos 100,100&lt;br /&gt;Press left mouse button&lt;br /&gt;Press OK button&lt;br /&gt;Wait for Task Menu Dialogue&lt;br /&gt;Press View Placement button&lt;br /&gt;Wait for View Placement Dialogue&lt;br /&gt;.&lt;br /&gt;.&lt;br /&gt;.&lt;br /&gt;Start CollabBillboard Client&lt;br /&gt;Wait for Start Client Dialogue&lt;br /&gt;Press Configure Button&lt;br /&gt;Wait for Configure Dialogue&lt;br /&gt;Enter server IP address in text field&lt;br /&gt;Press OK Button&lt;br /&gt;Wait for Task Menu Dialogue&lt;br /&gt;Press Site Selection button&lt;br /&gt;Wait for Site Selection Dialogue&lt;br /&gt;Move to x,y pos 100,200&lt;br /&gt;Press left mouse button&lt;br /&gt;Press OK button&lt;br /&gt;Wait for Task Menu Dialogue&lt;br /&gt;Press Place Billboard button&lt;br /&gt;Wait for Place Billboard Dialogue&lt;br /&gt;.&lt;br /&gt;.&lt;br /&gt;.&lt;br /&gt;Start CollabBillboard Client&lt;br /&gt;Wait for Start Client Dialogue&lt;br /&gt;Press Configure Button&lt;br /&gt;Wait for Configure Dialogue&lt;br /&gt;Enter server IP address in text field&lt;br /&gt;Press OK Button&lt;br /&gt;Wait for Task Menu Dialogue&lt;br /&gt;Press Site Selection button&lt;br /&gt;Wait for Site Selection Dialogue&lt;br /&gt;Move to x,y pos 100,200&lt;br /&gt;Press left mouse button&lt;br /&gt;Press OK button&lt;br /&gt;Wait for Task Menu Dialogue&lt;br /&gt;Press Place Billboard button&lt;br /&gt;Wait for Place Billboard Dialogue&lt;br /&gt;.&lt;br /&gt;.&lt;br /&gt;.&lt;br /&gt;Figure 50: Original Playback Orchestration Proposal&lt;br /&gt;Figure 50 demonstrates the original vision for playback orchestration. The goal of the test case&lt;br /&gt;is to determine whether the system behaves correctly when one or more users enter the application’s puzzle&lt;br /&gt;assembly task while another user is still performing the site selection task. The left and middle&lt;br /&gt;users enter the site selection task at the same time by synchronizing on the display of the Site&lt;br /&gt;Selection Task dialogue. The left user then continues, unhindered, to the puzzle assembly task.&lt;br /&gt;The middle user waits for the right user to begin the puzzle assembly before exiting site&lt;br /&gt;selection. Finally, the left and middle users resynchronize in the puzzle assembly task. The far&lt;br /&gt;right column shows the relative positioning of all synchronization points in the test script.&lt;br /&gt;6.3.2.1 Differences from Candidacy&lt;br /&gt;The original vision for playback orchestration was guided by the desire for a simple, powerful&lt;br /&gt;metaphor for playback orchestration between virtual users: the musical score. A musical score&lt;br /&gt;is basically an ordered list of notes. Notes are synchronized between instruments using&lt;br /&gt;measures. 
With the exception of the refrain, a simple control construct, musical notes in a&lt;br /&gt;score are always played sequentially.&lt;br /&gt;Rebecca’s playback orchestration design reflects a deeper understanding of the record,&lt;br /&gt;playback, and synchronization process. Using a separate graphical representation for&lt;br /&gt;synchronization outside of the script works well when no control constructs are present. This&lt;br /&gt;is compatible with Rebecca’s default recording format: the ordered list. However, Rebecca’s&lt;br /&gt;recordings can also be exported in the IDE’s native language. The native language version of&lt;br /&gt;the script can be modified with sophisticated control constructs.&lt;br /&gt;To represent control constructs using the musical metaphor, additional graphical&lt;br /&gt;elements would have to be added to the synchronization language. In fact, the entire flow of&lt;br /&gt;the script would have to be redisplayed in the synchronization language. Whether this is done&lt;br /&gt;automatically by a native language to synchronization language translator, or the developer is&lt;br /&gt;required to create the score manually, it is not an attractive solution because it requires the&lt;br /&gt;developer to invest time learning a new language.&lt;br /&gt;Rebecca already has two languages for representing a recording. First is the sequential&lt;br /&gt;language of the ordered list of events. Second is the native language representation of the&lt;br /&gt;events. Rather than creating a third language for synchronization, Rebecca uses the event&lt;br /&gt;metaphor that already exists for recordings. Synchronization is treated as an event in the&lt;br /&gt;recording, just like any other event. 
This allows the developer to continue to work with&lt;br /&gt;familiar tools, language, and metaphors.&lt;br /&gt;6.3.2.2 Improvements to Traditional Testing&lt;br /&gt;Rebecca improves traditional testing system synchronization in several areas. First, the process&lt;br /&gt;of creating a synchronization point in a recording is straightforward. It is simply a matter of&lt;br /&gt;inserting a single-line synchronization event in the recording. In many testing systems,&lt;br /&gt;however, the synchronization process is more complicated. Final Exam™ C/S Test, for&lt;br /&gt;example, uses a sophisticated messaging system with nine commands to deal with&lt;br /&gt;synchronization.&lt;br /&gt;Rebecca’s approach to synchronization is similar to TestSuite™’s rendezvous() and SQA&lt;br /&gt;Suite™’s SQAVuSyncAndResume() commands. In these systems, the script command is&lt;br /&gt;executed with a synchronization id passed as a parameter. The script blocks until the&lt;br /&gt;command is executed by other virtual users with the same synchronization id.&lt;br /&gt;Rebecca improves on TestSuite and SQA Suite as well. Both of these systems require the&lt;br /&gt;developer to declare the synchronization id before the synchronization command is used.&lt;br /&gt;This allows the test system to determine which virtual users are participating in a specific&lt;br /&gt;synchronization. Rebecca simplifies the process by not requiring this extra declaration. It&lt;br /&gt;parses the recording on the fly to determine synchronization participants automatically.&lt;br /&gt;Traditional testing systems allow only executing scripts to participate in synchronization.&lt;br /&gt;Rebecca relaxes this restriction to provide support for triggers (see Section 6.3.3). With&lt;br /&gt;triggers, a script that is not currently executing may execute in the future. In fact, any loaded&lt;br /&gt;script can be executed in the future by pressing Rebecca’s VCR-like play button. Rebecca&lt;br /&gt;simplifies candidate selection by viewing all loaded scripts as synchronization candidates. To&lt;br /&gt;remove a script from the list of synchronization candidates, the user must unload it or mark it as&lt;br /&gt;disabled.&lt;br /&gt;6.3.2.3 Deadlock Detection and Recovery&lt;br /&gt;Synchronization between virtual users always has the potential for deadlock. Traditional test&lt;br /&gt;systems place the problem of deadlock on the shoulders of the script developer. The script&lt;br /&gt;developer must be very careful about performing synchronization between scripts to prevent&lt;br /&gt;deadlock. Avoidance is the only technique available.&lt;br /&gt;Figure 51: An O(V+E) algorithm to determine cycles in a&lt;br /&gt;graph.&lt;br /&gt;detectCycle(Graph G) {&lt;br /&gt;G.setCycle(false)&lt;br /&gt;for each vertex in G do&lt;br /&gt;vertex.setColor(white)&lt;br /&gt;vertex.clearPredecessorList()&lt;br /&gt;endfor&lt;br /&gt;for each vertex in G do&lt;br /&gt;if (vertex.getColor() == white)&lt;br /&gt;dfsVisit(vertex, G)&lt;br /&gt;endif&lt;br /&gt;if (G.getCycle()) return true&lt;br /&gt;endfor&lt;br /&gt;return false&lt;br /&gt;}&lt;br /&gt;dfsVisit(Vertex vertex, Graph G) {&lt;br /&gt;vertex.setColor(gray)&lt;br /&gt;for each neighbor of vertex do&lt;br /&gt;// an edge back to a gray vertex (still on the DFS path) closes a cycle&lt;br /&gt;if (neighbor.getColor() == gray)&lt;br /&gt;G.setCycle(true)&lt;br /&gt;return&lt;br /&gt;endif&lt;br /&gt;if (neighbor.getColor() == white)&lt;br /&gt;neighbor.addPredecessorList(vertex)&lt;br /&gt;dfsVisit(neighbor, G)&lt;br /&gt;if (G.getCycle()) return&lt;br /&gt;endif&lt;br /&gt;endfor&lt;br /&gt;vertex.setColor(black)&lt;br /&gt;}&lt;br /&gt;Rebecca, in contrast, provides full support for deadlock detection and recovery. Deadlock is&lt;br /&gt;detected by searching for cycles in a resource graph created from synchronization events in&lt;br /&gt;each virtual user script. Vertices consist of scripts and synchronization ids. 
An edge is drawn&lt;br /&gt;from an id to a script if the script has not yet synchronized on the id but may do so in the&lt;br /&gt;future. An edge is drawn from a script to an id if the script is synchronizing on the id. An&lt;br /&gt;O(V+E) depth-first search (DFS) algorithm for directed graphs is used to determine whether there is a cycle in the graph (Figure&lt;br /&gt;51).&lt;br /&gt;A sample resource graph is shown in Figure 52. In this example, there are three scripts (S1, S2,&lt;br /&gt;S3) and two synchronization ids (I1, I2). Scripts S1 and S2 are synchronizing on id I1. In&lt;br /&gt;addition, S1 may synchronize on id I2 in the future. Script S3 is synchronizing on id I2, but may&lt;br /&gt;synchronize on I1 in the future. Deadlock occurs in the example because S3 will never unblock&lt;br /&gt;until S1 synchronizes on I2, and S1 will never unblock until S3 synchronizes on I1. The resulting&lt;br /&gt;cycle is S1 → I1 → S3 → I2 → S1. Figure 52 also&lt;br /&gt;shows the cycle detected in the graph.&lt;br /&gt;Figure 52: Resource graph (left) with deadlock cycle detected&lt;br /&gt;(right)&lt;br /&gt;Traditional testing systems have no deadlock recovery capabilities and require the user to&lt;br /&gt;terminate the deadlocked scripts. Rebecca alerts the user when deadlock is detected and&lt;br /&gt;provides several recovery options. The user can select a specific synchronization id to&lt;br /&gt;unblock. All scripts blocking on that id will then continue executing. The user can also use&lt;br /&gt;the VCR-like script controls to play or stop one or more blocking scripts. If the play button is&lt;br /&gt;pressed, the script is unblocked and continues executing. 
If the stop button is selected, the script is unblocked, the current line in the script is moved one past the synchronization event, and script execution terminates.&lt;br /&gt;Rebecca implements the original proposal’s concept of a measure by allowing repeated synchronization on a single id. Replay of a group of scripts can be coordinated by inserting the same number of synchronization points on the same id in each script. Figure 53 reworks the script coordination of the example in Figure 50. Traditional test systems do not support repeated synchronization on the same id within a single script.&lt;br /&gt;Figure 53: Reworked playback orchestration&lt;br /&gt;Start CollabBillboard Server&lt;br /&gt;Wait for Start Server Dialogue&lt;br /&gt;Wait for OK button to activate&lt;br /&gt;Press OK button&lt;br /&gt;Wait for Task Menu Dialogue&lt;br /&gt;Synchronize(measure0)&lt;br /&gt;Press Site Selection button&lt;br /&gt;Wait for Site Selection Dialogue&lt;br /&gt;Move to x,y pos 100,100&lt;br /&gt;Press left mouse button&lt;br /&gt;Press OK button&lt;br /&gt;Wait for Task Menu Dialogue&lt;br /&gt;Press View Placement button&lt;br /&gt;Wait for View Placement Dialogue&lt;br /&gt;Synchronize(measure0)&lt;br /&gt;...&lt;br /&gt;&lt;br /&gt;Start CollabBillboard Client&lt;br /&gt;Wait for Start Client Dialogue&lt;br /&gt;Press Configure Button&lt;br /&gt;Wait for Configure Dialogue&lt;br /&gt;Enter server IP address in text field&lt;br /&gt;Press OK Button&lt;br /&gt;Wait for Task Menu Dialogue&lt;br /&gt;Synchronize(measure0)&lt;br /&gt;Press Site Selection button&lt;br /&gt;Wait for Site Selection Dialogue&lt;br /&gt;Move to x,y pos 100,200&lt;br /&gt;Press left mouse button&lt;br /&gt;Synchronize(measure1)&lt;br /&gt;Press OK button&lt;br /&gt;Wait for Task Menu Dialogue&lt;br /&gt;Press Place Billboard button&lt;br /&gt;Wait for Place Billboard Dialogue&lt;br /&gt;Synchronize(measure0)&lt;br /&gt;...&lt;br /&gt;&lt;br /&gt;Start CollabBillboard Client&lt;br /&gt;Wait for Start Client Dialogue&lt;br /&gt;Press Configure Button&lt;br /&gt;Wait for Configure Dialogue&lt;br /&gt;Enter server IP address in text field&lt;br /&gt;Press OK Button&lt;br /&gt;Wait for Task Menu Dialogue&lt;br /&gt;Press Site Selection button&lt;br /&gt;Wait for Site Selection Dialogue&lt;br /&gt;Move to x,y pos 100,200&lt;br /&gt;Press left mouse button&lt;br /&gt;Press OK button&lt;br /&gt;Wait for Task Menu Dialogue&lt;br /&gt;Press Place Billboard button&lt;br /&gt;Wait for Place Billboard Dialogue&lt;br /&gt;Synchronize(measure1)&lt;br /&gt;...&lt;br /&gt;6.3.2.4 Algorithms&lt;br /&gt;Synchronization is controlled from a central process, RebeccaServer. When a RebeccaAgent encounters a synchronization event, it notifies RebeccaServer. For more information on the general architecture, see Section 6.1.&lt;br /&gt;Figure 54 shows how RebeccaServer processes a synchronization event in processSynchronizationEvent(). The synchronization id and the virtual user script are extracted from the event, and the script is marked as synchronizing. Two lists are then created. The first, alreadySynchronizingList, contains all virtual user scripts already synchronizing on the same id. The second, toBeSynchronizedList, contains all scripts that have a synchronization event on the id but have not synchronized on it yet. If toBeSynchronizedList is empty, no more scripts need to synchronize on this id, and all scripts in alreadySynchronizingList are released. Otherwise a deadlock check is performed. If no deadlock is detected, the algorithm terminates. If deadlock is detected, a list of recovery options is presented to the user.&lt;br /&gt;Figure 54: Algorithm to Process Synchronization Events&lt;br /&gt;processSynchronizationEvent(SynchronizationEvent event) {&lt;br /&gt;id = event.getSynchronizationId()&lt;br /&gt;script = event.getScript()&lt;br /&gt;mark script as synchronizing on id&lt;br /&gt;alreadySynchronizingList = getAlreadySynchronizingList(id)&lt;br /&gt;toBeSynchronizedList = getToBeSynchronizedList(id)&lt;br /&gt;if (toBeSynchronizedList is empty)&lt;br /&gt;// every script that uses id has arrived&lt;br /&gt;for each blockedScript in alreadySynchronizingList&lt;br /&gt;release synchronization on blockedScript&lt;br /&gt;endfor&lt;br /&gt;else&lt;br /&gt;// deadlock check: look for a cycle in the resource graph&lt;br /&gt;G = getGraph()&lt;br /&gt;if (getCycle(G))&lt;br /&gt;display deadlock recovery options&lt;br /&gt;endif&lt;br /&gt;endelse&lt;br /&gt;}&lt;br /&gt;In addition to processing the replay of synchronization events, RebeccaServer must also monitor script editing. Consider the situation where a group of virtual user scripts block on synchronization id I1, waiting for one final script to fire a synchronization event on I1. The user then deletes all synchronization events containing I1 from the final script. Deadlock will result unless the blocking scripts are released. The algorithm in Figure 55 shows how RebeccaServer handles this situation.&lt;br /&gt;Figure 55: Algorithm for the removal of a synchronization event from a script.&lt;br /&gt;processDeleteSynchronizationEvent(VirtualUserScript sourceScript,&lt;br /&gt;SynchronizationEvent event)&lt;br /&gt;{&lt;br /&gt;id = event.getSynchronizationId()&lt;br /&gt;remove event from sourceScript&lt;br /&gt;alreadySynchronizingList = getAlreadySynchronizingList(id)&lt;br /&gt;toBeSynchronizedList = getToBeSynchronizedList(id)&lt;br /&gt;if (toBeSynchronizedList is empty)&lt;br /&gt;for each blockedScript in alreadySynchronizingList&lt;br /&gt;release synchronization on blockedScript&lt;br /&gt;endfor&lt;br /&gt;endif&lt;br /&gt;}&lt;br /&gt;Modifications to virtual user scripts include insertion of additional synchronization events, modification of existing synchronization events, creation of new scripts, and erasure of loaded scripts. To simplify processing, inserting additional synchronization events into a script has no effect on other scripts already blocking on the same id. 
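&lt;br /&gt;The release rule of Figures 54 and 55 can be made concrete with a small, single-threaded Java sketch. It is not Rebecca-J’s code. The class SyncServerSketch and its methods arrive() and deleteSyncEvents() are hypothetical. Each virtual user script is reduced to the ordered list of synchronization ids it will reach, and the deadlock check is left out. A script that arrives at a synchronization event blocks. Blocked scripts are released once no other script still has an unreached event on the same id, which is what lets a single id be reused as a repeating measure.&lt;br /&gt;&lt;pre&gt;
```java
import java.util.Arrays;

// Hypothetical, single-threaded sketch of measure-style synchronization.
// Each script is reduced to the ordered synchronization ids it will reach.
class SyncServerSketch {
    private final String[][] scripts;  // all sync ids of each script, in replay order
    private final int[] next;          // index of each script's next sync event
    private final String[] waitingOn;  // id a script is blocked on, or null

    SyncServerSketch(String[][] scripts) {
        this.scripts = scripts.clone();
        this.next = new int[scripts.length];
        this.waitingOn = new String[scripts.length];
    }

    boolean isBlocked(int s) {
        return waitingOn[s] != null;
    }

    // Script s reaches its next synchronization event (cf. processSynchronizationEvent).
    // Returns the number of scripts released, or 0 if s must keep waiting.
    int arrive(int s) {
        if (isBlocked(s)) throw new IllegalStateException("script is already blocked");
        if (next[s] == scripts[s].length) throw new IllegalStateException("no more sync events");
        String id = scripts[s][next[s]];
        next[s]++;            // the script's position moves one past the event
        waitingOn[s] = id;    // s joins alreadySynchronizingList for id
        return releaseIfComplete(id);
    }

    // The user deletes every unreached sync event on id from script s
    // (cf. processDeleteSynchronizationEvent). Returns the number of scripts released.
    int deleteSyncEvents(int s, String id) {
        String[] old = scripts[s];
        String[] kept = new String[old.length];
        int count = 0;
        for (int k = 0; k != old.length; k++) {
            if (k >= next[s]) {
                if (old[k].equals(id)) continue;  // drop a future event on id
            }
            kept[count++] = old[k];
        }
        scripts[s] = Arrays.copyOf(kept, count);
        return releaseIfComplete(id);
    }

    // True if script t still has an unreached sync event on id.
    private boolean stillToReach(int t, String id) {
        for (int k = next[t]; k != scripts[t].length; k++) {
            if (scripts[t][k].equals(id)) return true;
        }
        return false;
    }

    // Releases every script blocked on id once toBeSynchronizedList is empty.
    private int releaseIfComplete(String id) {
        for (int t = 0; t != scripts.length; t++) {
            if (id.equals(waitingOn[t])) continue;  // already synchronizing on id
            if (stillToReach(t, id)) return 0;      // t is still to be synchronized
        }
        int released = 0;
        for (int t = 0; t != scripts.length; t++) {
            if (id.equals(waitingOn[t])) {
                waitingOn[t] = null;
                released++;
            }
        }
        return released;
    }
}
```
&lt;/pre&gt;As a usage example, take the id pattern of Figure 53 as the three scripts {m0, m0}, {m0, m1, m0} and {m1}. The server’s first arrive() returns 0 and leaves it blocked, because the first client has not reached m0 yet. The client’s arrival at m0 then releases both. Deleting a script’s remaining events on an id re-runs the same check, mirroring Figure 55.&lt;br /&gt;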
Scripts released from synchronization on the same id also ignore the insertion.&lt;br /&gt;Modification of an existing synchronization event is treated as an event removal followed by an insertion. Creation of a new script is treated as event insertion. Finally, erasure of a loaded script is treated as the removal of all synchronization events in the script.&lt;br /&gt;6.3.2.5 Implementation in Rebecca-J&lt;br /&gt;Rebecca-J implements the synchronization algorithms and architecture discussed in this section. Three scenarios are presented to demonstrate the implementation. In the first scenario, two virtual users are orchestrated using the measure technique to create synchronization points during the replay of their scripts. In the second scenario, a third virtual user is added that uses a timer to create a ten-second delay between measures. In the third scenario, two virtual users that deadlock during synchronization are examined.&lt;br /&gt;Figure 56: Determining synchronization points for SecondWind’s recording.&lt;br /&gt;#1: Mouse pointer enters here.&lt;br /&gt;#2: Press, hold, drag slider to right.&lt;br /&gt;#3: 1/4 way through drag.&lt;br /&gt;#4: 1/2 way through drag.&lt;br /&gt;#5: 3/4 way through drag.&lt;br /&gt;#6: Release mouse button, end drag.&lt;br /&gt;#7: Press mouse button.&lt;br /&gt;#8: Mouse pointer exits here.&lt;br /&gt;6.3.2.6 Scenario One: Using the Measure Technique to Orchestrate Virtual Users&lt;br /&gt;Two copies of the application from Figure 41 are run on different machines. One machine, SecondWind, has a Pentium III 450 MHz processor with 256 MB of RAM. The other machine, Invictus, has a Pentium MMX 166 MHz processor.&lt;br /&gt;The tester is trying to track down a bug that seems to occur when two users concurrently manipulate the slider bar or push buttons in the application. 
Recordings are made of user interaction with the application on both machines. On SecondWind, the user grabs the slider bar and moves it from left to right, then from right to left, then presses the push button labeled “JButton1”. On Invictus, the user mirrors the slider bar movement by sliding it from right to left and back to the right, then presses the same push button.&lt;br /&gt;Figure 57: Synchronization Dialog for SecondWind’s Recording&lt;br /&gt;Next, the tester replays both recordings simultaneously in the hope of reproducing the bug. The replay on SecondWind finishes well ahead of the replay on Invictus. The virtual user on SecondWind finishes with the slider bar in half the time taken by the virtual user on Invictus. It replays so quickly that it presses the push button and exits the application before Invictus’ user releases the slider bar.&lt;br /&gt;Using Rebecca-J’s measure approach to script orchestration, the tester inserts synchronization events at critical points in both recordings. Figure 56 shows the synchronization points that were determined for SecondWind’s recording. The first synchronization event appears at the beginning of both scripts, which guarantees that the scripts begin replay at the same time. The second through sixth synchronization events in Figure 56 are inserted at similar locations in the Invictus recording. These events ensure simultaneous manipulation of the slider bars. The seventh synchronization point ensures that the button press occurs simultaneously in both recordings. 
The final synchronization point ensures that both scripts end at the same time.&lt;br /&gt;Figure 57 shows the synchronization dialog used to insert the synchronization event with the id “measure” eight times into each recording. Figure 58 shows the result of inserting the second synchronization event just before the mouse press on the slider bar in SecondWind’s recording.&lt;br /&gt;Figure 58: Synchronization event inserted just before mouse press on slider bar in SecondWind’s recording&lt;br /&gt;#2: Second synchronization event inserted just before mouse press on slider bar.&lt;br /&gt;6.3.2.7 Scenario Two: Creating a Metronome Using the Measure Technique&lt;br /&gt;Rebecca-J can use a metronome to control the throughput, or load over time, that a set of virtual users places on the application. The metronome is created by coupling the measure technique described earlier in this section with a timer trigger. Synchronization events with the same synchronization id are placed in a set of virtual user scripts. A final virtual user is created with a script containing only the synchronization event with the measure’s id. A timer trigger (see Section 6.3.3) is attached to this virtual user, with its interval set to the desired pacing. When the scripts are executed, the timer trigger fires at regular intervals. 
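&lt;br /&gt;The metronome arrangement can be illustrated in-process with java.util.concurrent.CyclicBarrier standing in for a repeated synchronization id. Each virtual user thread waits at the barrier once per measure. One extra metronome thread joins the barrier once per interval, and its sleep loop plays the role of the timer trigger. This is only an analogy under stated assumptions, not how Rebecca-J distributes scripts across agents. The class MetronomeSketch and its run() method are hypothetical.&lt;br /&gt;&lt;pre&gt;
```java
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical in-process sketch of the metronome technique.
class MetronomeSketch {
    // Runs the given number of virtual-user threads through the given number of measures.
    // Returns {elapsed milliseconds, measures completed by all users}.
    static long[] run(int users, int measures, long intervalMs) {
        // One party per virtual user plus one for the metronome user,
        // whose script holds only the synchronization on the measure's id.
        CyclicBarrier measure = new CyclicBarrier(users + 1);
        AtomicInteger completed = new AtomicInteger();
        Thread[] threads = new Thread[users + 1];
        for (int u = 0; u != users; u++) {
            threads[u] = new Thread(() -> {
                for (int m = 0; m != measures; m++) {
                    // A real virtual user would replay one measure of its script here.
                    synchronize(measure);
                    completed.incrementAndGet();
                }
            });
        }
        // Stand-in for the timer trigger: each interval it replays the metronome script.
        threads[users] = new Thread(() -> {
            for (int m = 0; m != measures; m++) {
                pause(intervalMs);
                synchronize(measure);
            }
        });
        long start = System.nanoTime();
        for (Thread t : threads) t.start();
        for (Thread t : threads) join(t);
        return new long[] {(System.nanoTime() - start) / 1_000_000, completed.get()};
    }

    private static void synchronize(CyclicBarrier barrier) {
        try {
            barrier.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (BrokenBarrierException e) {
            throw new IllegalStateException(e);
        }
    }

    private static void pause(long ms) {
        try {
            Thread.sleep(ms);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    private static void join(Thread t) {
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```
&lt;/pre&gt;With two users, three measures and a 50 ms interval, run(2, 3, 50) cannot finish before the metronome has ticked three times. However fast the users replay, the elapsed time is therefore at least roughly 150 ms.&lt;br /&gt;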
The synchronization between virtual users is paced by this interval, much like a piece of music paced by a metronome.&lt;br /&gt;Figure 59: Timer trigger and virtual user script to support the metronome in Rebecca-J.&lt;br /&gt;Figure 59 shows the arrangement of the timer trigger and virtual user script that supports the metronome in Rebecca-J.&lt;br /&gt;6.3.2.8 Scenario Three: Deadlock Detection and Recovery&lt;br /&gt;Figure 60: Deadlocked scripts.&lt;br /&gt;Rebecca-J has built-in support for deadlock detection. Figure 60 shows two scripts that deadlock immediately. The first script synchronizes on resourceA but will synchronize on resourceB later. The second script synchronizes on resourceB but will synchronize on resourceA later. Neither script can proceed: the first waits for the second to reach resourceA, while the second, blocked on resourceB, waits for the first. Using the algorithms presented in Section 6.3.2.4, deadlock is detected and the user is notified by two visual cues: a change to the synchronization button and a deadlock dialog. The synchronization button in the VCR-like script control window turns red when a virtual user is deadlocked. The dialog shown in Figure 61 informs the user of the deadlock, the participating virtual users, and the recovery options.&lt;br /&gt;Figure 61: Deadlock dialog.&lt;br /&gt;Rebecca-J provides the tester with two recovery options: release all, or play/stop script. Release all releases every script synchronizing on the same id. The user invokes it by pressing the red synchronization button on any virtual user script. The user can also recover from deadlock by pressing the play or stop buttons of individual scripts. 
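&lt;br /&gt;The deadlock check in Figure 54 (getGraph() followed by getCycle()) can be read as cycle detection in a wait-for graph. The Java sketch below is a hypothetical illustration: DeadlockSketch, waitForGraph() and hasCycle() are not Rebecca-J names. It assumes that a blocked script u waits for script v whenever v still has an unreached synchronization event on the id u is blocked on. A depth-first search that meets a node already on its stack has found a cycle, which is the resourceA/resourceB situation of Figure 60.&lt;br /&gt;&lt;pre&gt;
```java
// Hypothetical sketch of deadlock detection on a wait-for graph.
class DeadlockSketch {
    // waitingOn[u]: id script u is blocked on, or null if u is running.
    // remaining[u]: sync ids script u has not reached yet.
    // edge[u][v] is true when blocked script u is waiting for script v.
    static boolean[][] waitForGraph(String[] waitingOn, String[][] remaining) {
        int n = waitingOn.length;
        boolean[][] edge = new boolean[n][n];
        for (int u = 0; u != n; u++) {
            if (waitingOn[u] == null) continue;                   // running scripts wait for nobody
            for (int v = 0; v != n; v++) {
                if (v == u) continue;
                if (waitingOn[u].equals(waitingOn[v])) continue;  // v has already arrived
                for (String id : remaining[v]) {
                    if (id.equals(waitingOn[u])) edge[u][v] = true;
                }
            }
        }
        return edge;
    }

    // True if the wait-for graph contains a cycle, i.e. some scripts are deadlocked.
    static boolean hasCycle(boolean[][] edge) {
        int[] state = new int[edge.length];  // 0 = unvisited, 1 = on the DFS stack, 2 = finished
        for (int u = 0; u != edge.length; u++) {
            if (state[u] == 0) {
                if (visit(u, edge, state)) return true;
            }
        }
        return false;
    }

    private static boolean visit(int u, boolean[][] edge, int[] state) {
        state[u] = 1;
        for (int v = 0; v != edge.length; v++) {
            if (!edge[u][v]) continue;
            if (state[v] == 1) return true;  // back edge closes a cycle
            if (state[v] == 0) {
                if (visit(v, edge, state)) return true;
            }
        }
        state[u] = 2;
        return false;
    }
}
```
&lt;/pre&gt;If the second script is still running rather than blocked, the first script merely waits for it. There is then no cycle, and no deadlock is reported.&lt;br /&gt;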
This automatically releases the script from synchronization and removes it from any cycle in the resource graph.&lt;br /&gt;Figure 62: User interface for triggers in Rebecca-J&lt;br /&gt;(Figure 62 labels: Trigger name; Enabled/Disabled state; Trigger Listener; Component Selected; Threshold Model; Recording Player; Triggers fired; Maximum Triggers fired.)&lt;br /&gt;6.3.3 Triggers&lt;br /&gt;In commercial testing systems, orchestration completely prescribes the test session from start to finish and precludes live user participation. Virtual user orchestration is specified using a combination of script synchronization primitives and session manager scheduling [138]. Introducing a live user into the test system creates two orchestration problems. First, the live user does not execute a test script, so the script synchronization primitives used by virtual users cannot coordinate with the live user. Second, it is impossible to completely prescribe the test session, because the live user’s actions are unpredictable. The testing system would be enhanced if virtual users changed their behavior based on live user actions.&lt;br /&gt;Triggers radically change the concept of a test session by giving virtual users the ability to react to live users. Instead of being a prescribed process, multiuser testing becomes a reactive one. Triggers operate on any application activity that can be encapsulated with Rebecca’s extensible component and event models. 
For more information on these extensible models, see Section 6.2.3.&lt;br /&gt;6.3.3.1 Trigger Definition&lt;br /&gt;Before a trigger can be used, the user must create and configure it from the server’s user interface. Figure 62 shows the user interface for defining a trigger in Rebecca-J. The process involves the following steps:&lt;br /&gt;Step One: The agent that will contain the trigger is selected. The user makes this selection from a list of all agents that have registered with the server.&lt;br /&gt;Step Two: The component that will generate the triggering event is selected. The server provides a remote component browser, similar to the record filtration system from Section 6.2.4, that allows the user to browse the application components that have been registered with the trigger agent.&lt;br /&gt;Step Three: The threshold model is selected. The threshold model tells the trigger which event or sequence of events the selected component must generate. For example, the threshold model may require the component to generate a sequence of keyboard events.&lt;br /&gt;Step Four: The threshold model editor is used to further configure event specifics. Editors are model-specific. For example, a keyboard event sequence editor could display a text box where the user types “Hello”, the exact sequence of keyboard events the selected component must generate.&lt;br /&gt;Step Five: The agent that contains the recording player is selected. Like the trigger agent, this agent is selected from a list of all agents registered with the server.&lt;br /&gt;Step Six: The user creates or loads a recording that will be played back in the recording agent selected in Step Five.&lt;br /&gt;Step Seven: The user configures the trigger queuing. 
If queuing is active, trigger firings that occur while the player is already replaying are queued. This happens when the trigger fires more often (e.g. once every five seconds) than the player can replay the recording (e.g. once every thirty seconds). If queuing is inactive, trigger firings are ignored during replay.&lt;br /&gt;Step Eight: The user configures a maximum firing count. If the maximum count is exceeded, triggering is disabled. By default, the maximum count is infinite.&lt;br /&gt;Step Nine: The application attached to the agent that contains the trigger is used. When the selected component generates an event that meets the threshold model’s criteria, the trigger fires.&lt;br /&gt;6.3.4 Threshold Model&lt;br /&gt;Selecting a threshold model is an important part of the trigger process. Events generated by the selected component are passed to the threshold model for testing. If the event or sequence of events passes the threshold test, the trigger fires. The exact nature of the test depends on the threshold model selected. 
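&lt;br /&gt;Steps Seven through Nine, together with the threshold test, determine what happens each time the selected component generates an event. The following is a minimal Java sketch under stated assumptions. It assumes an integer-valued event (as with propertyChangeInt) and uses a java.util.function.IntPredicate in place of a real threshold model. TriggerSketch and its methods are hypothetical. Counting a firing that is ignored during replay toward the maximum is also an assumption the text does not settle.&lt;br /&gt;&lt;pre&gt;
```java
import java.util.function.IntPredicate;

// Hypothetical sketch of one trigger's firing policy (Steps Seven to Nine).
class TriggerSketch {
    private final IntPredicate threshold;  // stands in for the selected threshold model
    private final boolean queueing;        // Step Seven: queue firings during replay?
    private final int maxFirings;          // Step Eight: Integer.MAX_VALUE for "infinite"
    private int firings = 0;
    private int queued = 0;
    private int replaysStarted = 0;
    private boolean playerBusy = false;

    TriggerSketch(IntPredicate threshold, boolean queueing, int maxFirings) {
        this.threshold = threshold;
        this.queueing = queueing;
        this.maxFirings = maxFirings;
    }

    // Step Nine: called for each value the selected component reports.
    void onEvent(int value) {
        if (firings >= maxFirings) return;   // triggering has been disabled
        if (!threshold.test(value)) return;  // threshold test failed
        firings++;
        if (!playerBusy) {
            startReplay();
        } else if (queueing) {
            queued++;                        // replay once the player is free
        }                                    // otherwise the firing is ignored
    }

    // Called by the recording player when a replay ends.
    void onReplayFinished() {
        playerBusy = false;
        if (queued > 0) {
            queued--;
            startReplay();
        }
    }

    private void startReplay() {
        playerBusy = true;
        replaysStarted++;
    }

    int getFirings() { return firings; }

    int getReplaysStarted() { return replaysStarted; }
}
```
&lt;/pre&gt;For example, a queuing trigger built with v -> v > 20 and a maximum of three firings accepts the events 25, 30 and 40. It replays them one after another as each replay finishes and ignores every event after the third firing.&lt;br /&gt;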
It can be as simple as ensuring that the event is of a specific type, or as complicated as a state machine that reaches its end state only after the correct sequence of events.&lt;br /&gt;Rebecca-J implements a number of threshold models, including those listed in Table 14:&lt;br /&gt;&lt;table&gt;&lt;tr&gt;&lt;th&gt;Threshold Model&lt;/th&gt;&lt;th&gt;Description&lt;/th&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;keyPressed&lt;/td&gt;&lt;td&gt;fire if the GUI component generates a KEY_PRESSED event&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;keyReleased&lt;/td&gt;&lt;td&gt;fire if the GUI component generates a KEY_RELEASED event&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;keyTyped&lt;/td&gt;&lt;td&gt;fire if the GUI component generates a KEY_TYPED event&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;mouseClicked&lt;/td&gt;&lt;td&gt;fire if the GUI component generates a MOUSE_CLICKED event&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;mouseEntered&lt;/td&gt;&lt;td&gt;fire if the GUI component generates a MOUSE_ENTERED event&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;mouseExited&lt;/td&gt;&lt;td&gt;fire if the GUI component generates a MOUSE_EXITED event&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;mousePressed&lt;/td&gt;&lt;td&gt;fire if the GUI component generates a MOUSE_PRESSED event&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;mouseReleased&lt;/td&gt;&lt;td&gt;fire if the GUI component generates a MOUSE_RELEASED event&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;mouseMoved&lt;/td&gt;&lt;td&gt;fire if the GUI component generates a MOUSE_MOVED event&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;mouseDragged&lt;/td&gt;&lt;td&gt;fire if the GUI component generates a MOUSE_DRAGGED event&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;propertyChange&lt;/td&gt;&lt;td&gt;fire if the state change component generates a PROPERTY_CHANGE event&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;keySequence&lt;/td&gt;&lt;td&gt;fire if the GUI component generates a sequence of KEY_PRESSED events specified through the model’s editor&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;mouseRegion&lt;/td&gt;&lt;td&gt;fire if the GUI component generates a mouse event in the area of the GUI specified by the model’s editor&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;propertyChangeInt&lt;/td&gt;&lt;td&gt;fire if the state change component generates a PROPERTY_CHANGE_INT event and the event satisfies the range conditions specified by the model’s editor&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;Table 14: Threshold Models implemented in Rebecca-J&lt;br /&gt;Depending on the threshold model, an editor may be available for runtime customization. The editor may be as simple as a set of form fields specifying characteristics of the event, or as sophisticated as a graphical editor indicating the area of the GUI in which the event must occur.&lt;br /&gt;6.3.4.1 mousePressed Threshold Model&lt;br /&gt;Rebecca-J’s mousePressed model typifies a simple event type threshold model. The threshold model examines all mouse events generated by the selected component. If a mouse event is of type MOUSE_PRESSED, the trigger fires. Because this threshold model is so simple, no editor is necessary. If the user tries to edit the threshold, the dialog in Figure 63 appears.&lt;br /&gt;Figure 63: A threshold editor is not necessary for a simple event type threshold model.&lt;br /&gt;6.3.4.2 propertyChangeInt Threshold Model&lt;br /&gt;Rebecca-J’s propertyChangeInt model is an example of a more complex threshold model. An editor, shown in Figure 64, lets the user set at runtime the properties of the PropertyChangeInt event that must be satisfied for the trigger to fire.&lt;br /&gt;Figure 64: Rebecca-J’s editor for the propertyChangeInt threshold model.&lt;br /&gt;The user specifies an integer value in the text field labeled “this” and then presses one of the seven conditional push buttons. The combination creates a boolean test that the PropertyChangeInt event must pass for the trigger to fire. For example, if “this” is set to “10” and the conditional button “&amp;lt;” is pressed, the trigger fires whenever the selected component generates a PropertyChangeInt event with a value less than ten.&lt;br /&gt;6.3.4.3 mouseRegion Threshold Model&lt;br /&gt;Rebecca-J’s mouseRegion model is an example of a sophisticated threshold model. 
An editor, shown in Figure 65, lets the user set at runtime the properties of the mouse event that must be satisfied for the trigger to fire.&lt;br /&gt;Figure 65: The mouseRegion threshold model editor.&lt;br /&gt;(Figure 65 callouts: Select a component. Map component selected to editor’s drawing area. Select shape to mark component region where mouse events will fire trigger. Rectangle shape marks a region of the component where mouse events will fire trigger. Select the specific mouse event that will fire the trigger.)&lt;br /&gt;After the user has used the remote component browser to select a GUI component, the mouseRegion threshold editor is activated. The editor queries the remote component for its dimensions and displays a facsimile in a graphical editing window. The user selects one or more geometric regions and draws them on top of the facsimile. Finally, a specific type of mouse event is selected from a pull-down list. Only events of the specified type that occur in the user-drawn regions will fire the trigger.&lt;br /&gt;Figure 65 gives an example of the mouseRegion editor in use with the application from Figure 41. The JButton1 component is selected using the remote component browser. The mouseRegion editor gets the dimensions of the button remotely and displays them in the graphical editing window. The user presses the Zoom In button several times to enlarge the push button facsimile, selects the rectangle geometric region, and draws a rectangle in the upper right quarter of the facsimile. The user then selects the mousePressed event from the pull-down list and presses the OK button to end the editing session. 
The threshold model is now configured to fire whenever the mouse button is pressed in the upper right corner of the JButton1 push button.&lt;br /&gt;6.3.4.4 Timers&lt;br /&gt;In some situations a virtual user script needs to be activated by a timer rather than by application activity. For example, the tester may want a virtual user to manipulate a widget or type some characters every few seconds. Periodic load like this is useful for observing the performance characteristics of the application under test. Rebecca’s trigger architecture includes provisions for such time-based triggers.&lt;br /&gt;Figure 66: Timer browser in Rebecca-J&lt;br /&gt;Unlike other triggers, timers are managed entirely by the server, so it is not necessary to select a remote agent component and threshold model. The tester configures a timer by double-clicking the timer widget that appears with every trigger panel in the server. A timer selection window appears, as in Figure 66. Once timers have been configured, this window is used to manage them. The tester can add a new timer, delete existing timers, edit an existing timer, or attach the selected timer to the trigger panel.&lt;br /&gt;Figure 67 gives an example of the timer editor and trigger panel in Rebecca-J. The tester has configured the timer to fire once every 2000 ms. TriggerPanel1, also shown in the figure, is the only trigger panel that will be affected by the timer. The user activates the timer by pressing the editor’s Start button. In the example, the timer is active and its countdown reads 400 ms. When the countdown reaches 0 ms, the timer will fire and activate a recording player managed by the remote agent RebeccaAgentImpl0. 
The panel’s trigger counter shows that the timer has already fired four times.&lt;br /&gt;Figure 67: Configuring a timer trigger for a single virtual user.&lt;br /&gt;(Figure 67 callouts: Timer fires once every 2000 ms. Timer countdown at 400 ms; timer fires when countdown reaches zero. Trigger counter indicates timer has fired four times. Only one trigger panel participating in this timer.)&lt;br /&gt;Timer triggers are managed by the server rather than by remote agents. This allows multiple recording players to be attached to the same timer. When the timer fires, all of the recording players are activated near-simultaneously. Near-simultaneous replay of several recordings creates realistic approximations of application use. For example, periodic dialog among a group of users in a chat application could be simulated.&lt;br /&gt;Truly simultaneous replay of recordings, implemented with networking techniques such as broadcasting, is not suitable for a testing system, because the tester can never be sure that replay started at exactly the same moment on each machine. Software and hardware layers along the command’s path from server to agent introduce delays that can keep it from being processed immediately. These delays can change during the test session, making the replay order different each time the trigger fires. Instead, when a timer trigger fires, Rebecca iterates through an ordered list of players, activating each in turn. The server provides the tester with a user interface to configure the ordered list. Figure 68 demonstrates how players attached to a timer trigger are ordered in Rebecca-J. In the example, two recording players are attached to the timer. 
In the initial ordering, the player associated with TriggerPanel0 will be activated first when the timer fires. The user selects TriggerPanel1 and presses the Move Up button, so TriggerPanel1 moves up one position in the ordering list, ahead of TriggerPanel0. The next time the timer fires, TriggerPanel1’s recording player will activate first.&lt;br /&gt;Figure 68: Ordering recording players in Rebecca-J&lt;br /&gt;(Figure 68 callouts: User selects a TriggerPanel to move. User presses Move Up button to move TriggerPanel1 up one in the firing order.)&lt;br /&gt;One Agent, Multiple Triggers&lt;br /&gt;In some situations, a single threshold model is not sufficient to express the conditions under which a trigger should fire. For example, consider a trigger for a PropertyChangeInt event that fires if the new state is less than ten or greater than twenty. The PropertyChangeInt threshold editor allows only one boolean condition to be specified on the event. Rebecca therefore allows the user to create multiple triggers on the same component and event in the same agent, one for each boolean condition. In the state trigger example, one trigger would be created for values less than ten and a second for values greater than twenty. Both triggers would exist in the same agent, listen for the same events from the same state component, and replay the same recording.&lt;br /&gt;Multiple triggers allow the user to OR a set of threshold model tests on an event, but Rebecca does not provide any other boolean operation. To provide a complete set of boolean operations, a boolean algebra would need to be adopted. 
The operators in this algebra would be AND, OR, and NOT, and the variables would be trigger names. Triggers named in a boolean expression would not fire on their own. Instead, the expression would be assigned its own trigger, which would fire whenever the expression was satisfied. Consider an example where two state triggers, TriggerGreaterThan10 and TriggerLessThan20, are set on the same PropertyChangeInt component. TriggerGreaterThan10 fires when the new integer value is greater than ten, and TriggerLessThan20 fires when the new value is less than twenty. A boolean expression trigger that fired for integer values between ten and twenty would be written as (TriggerGreaterThan10 AND TriggerLessThan20).&lt;br /&gt;6.3.4.5 Threshold Model Customization&lt;br /&gt;Like most of Rebecca’s subsystems, the threshold model subsystem is extensible. A tester can create new threshold models that were not anticipated when Rebecca was implemented. Threshold model customization requires three steps: model registration, comparator definition, and editor definition. The model must be registered with the system so the user can select it from the server’s trigger panel user interface.&lt;br /&gt;Figure 69: Adding a customized threshold model to ThresholdList’s initialize() method.&lt;br /&gt;private void initialize() {&lt;br /&gt;try {&lt;br /&gt;list.addElement(new KeySequenceThresholdModel());&lt;br /&gt;list.addElement(new MouseRegionThresholdModel());&lt;br /&gt;…&lt;br /&gt;// Add your own threshold models here&lt;br /&gt;list.addElement(new CustomizedThresholdModel());&lt;br /&gt;} catch (Exception e) {&lt;br /&gt;System.out.println("Exception: " + e.getMessage());&lt;br /&gt;e.printStackTrace();&lt;br /&gt;}&lt;br /&gt;}&lt;br /&gt;
The comparator determines whether an event received by the trigger listener from the selected component is equivalent to an event stored in the threshold model. The editor allows runtime configuration of the threshold model.&lt;br /&gt;In Rebecca-J, creating a custom threshold model consists of writing implementations of three abstract classes: ThresholdModel, ThresholdView, and ThresholdController. Once written, the new model is integrated into the application by adding a line to the initialize() method of the ThresholdList class (see Figure 69). The line adds an instance of the model to the list of threshold models on the server side. The server uses this list to provide the tester with a drop-down selection of threshold models.&lt;br /&gt;An implementation of the abstract class ThresholdModel involves writing code for the method compare(). compare() has two parameters: the event generated by the component, and an internal “model” event provided by the threshold model. The characteristics of the internal event may be hard coded into the model, or they can be configured using the model’s editor.&lt;br /&gt;compare() returns TRUE if the events are equivalent, and equivalence is implementation-specific. Figure 70 shows the implementation of the method for low-level keyboard events such as keyPressed and keyReleased. Here compare() returns TRUE if the component-generated event matches the event type of the internal “model” event. The characteristics of the internal event are hard coded into the threshold model, so no editor is needed.&lt;br /&gt;Figure 70: Implementation of the compare() method for low level key event threshold models.&lt;br /&gt;public boolean compare(java.util.EventObject o1, java.util.EventObject o2){&lt;br /&gt;// Is this a key event?&lt;br /&gt;if ((o1 instanceof KeyEvent) &amp;&amp; (o2 instanceof KeyEvent)) {&lt;br /&gt;// Do the key events have the same TYPE?&lt;br /&gt;return (((KeyEvent) o1).getID() == ((KeyEvent) o2).getID());&lt;br /&gt;} else {&lt;br /&gt;return false;&lt;br /&gt;}&lt;br /&gt;}&lt;br /&gt;Figure 71 shows the compare() method for the keySequence threshold model. Two events are equivalent in this implementation if they both have the event type KEY_PRESSED and they both have the same key code, character, modifiers, and action key type. The tester sets the characteristics of the internal “model” event at runtime using a threshold editor.&lt;br /&gt;Figure 71: Implementation of compare() method for keySequence threshold model.&lt;br /&gt;public boolean compare(java.util.EventObject o1, java.util.EventObject o2) {&lt;br /&gt;// We must be working with KeyEvents&lt;br /&gt;if (!(o1 instanceof KeyEvent) || !(o2 instanceof KeyEvent)) return false;&lt;br /&gt;// Typecast the event objects&lt;br /&gt;KeyEvent k1 = (KeyEvent) o1;&lt;br /&gt;KeyEvent k2 = (KeyEvent) o2;&lt;br /&gt;boolean result = true;&lt;br /&gt;// Test event characteristics of o1 against the internal model o2&lt;br /&gt;result = result &amp;&amp; (k1.getID() == k2.getID());&lt;br /&gt;result = result &amp;&amp; (k1.getKeyCode() == k2.getKeyCode());&lt;br /&gt;result = result &amp;&amp; (k1.getKeyChar() == k2.getKeyChar());&lt;br /&gt;result = result &amp;&amp; (k1.getModifiers() == k2.getModifiers());&lt;br /&gt;result = result &amp;&amp; (k1.isActionKey() == k2.isActionKey());&lt;br /&gt;return result;&lt;br /&gt;}&lt;br /&gt;In addition to the compare() method, a customized threshold model may also have a specialized editor. The purpose of the editor is to let the user modify the model at runtime. Editing commands are translated into parameters that configure the model. The abstract class ThresholdModel has a default editor, view, which does nothing (see Figure 70). When the user selects a threshold model, the trigger panel’s Edit Threshold button is connected to the display command of the threshold model’s view. The developer of a custom threshold model can reset the value of the view member in the model’s constructor. Figure 72 shows the constructor for the mouseRegion threshold model. If the user selects this model, the server displays the editor in Figure 65 when the Edit Threshold button is pressed.&lt;br /&gt;Figure 72: Constructor for mouseRegion threshold model.&lt;br /&gt;public MouseRegionThresholdModel() {&lt;br /&gt;super();&lt;br /&gt;// Special code to avoid infinite recursion&lt;br /&gt;// in the Visual Composition Editor&lt;br /&gt;if (java.beans.Beans.isDesignTime()) {&lt;br /&gt;System.out.println("Design time so not going to set the view");&lt;br /&gt;} else {&lt;br /&gt;// Reset the view to the mouseRegion editor.&lt;br /&gt;setView(new MouseRegionThresholdView(this));&lt;br /&gt;}&lt;br /&gt;}&lt;br /&gt;Rebecca uses the MVC design pattern for threshold model editors. The model is the threshold model itself. The threshold view displays a representation of the model and the model’s configuration parameters. The threshold controller converts user actions directed at the view into commands that manipulate the model. Like the model and view, the controller can be customized by the developer. Rebecca-J implements several controllers that the developer can use as examples. The mouseRegion controller converts the mouse events inside the graphical editor into geometric shape drawing commands. The keySequence controller converts characters typed into the sequence text field into KEY_PRESSED events that are stored in the model’s event list state machine.&lt;br /&gt;6.3.4.6 Event Sequence Triggers&lt;br /&gt;In addition to firing a trigger in response to a single event, Rebecca can fire a trigger after a sequence of events. For example, instead of firing whenever a keyboard event is generated, Rebecca can fire when a specific sequence of keyboard characters is typed (e.g. “Hello”). A runtime-configurable event state machine supports this capability. Events generated by the selected component are ignored until one equivalent to the first input condition event in the state machine is generated. The machine then progresses through states until it reaches a final state or until an event occurs that does not satisfy the input condition for the next state. When the machine reaches its final state, the trigger fires and the machine returns to the start state.&lt;br /&gt;Rebecca-J implements this state machine in the isThresholdMet() method of the abstract class ThresholdModel. The current position in an ordered list of internal model events, getStatePos(), keeps track of state. The machine has as many states as there are events in the list, plus one for the end state. When an event equivalent to the first event in the event list is encountered, the start state input condition is satisfied, and the position in the event list is incremented by one. This continues until the end of the event list is reached, at which point the machine has reached its final state. 
If an event occurs that is not part of the&lt;br /&gt;sequence, the current position in the event list is reset to the beginning and the state machine&lt;br /&gt;returns to the start state.&lt;br /&gt;public boolean isThresholdMet(java.util.EventObject e) {&lt;br /&gt;// Check whether the event object meets our event type criteria.&lt;br /&gt;// This checks the event type without altering the state&lt;br /&gt;// machine. It is a convenience method that, unless overridden,&lt;br /&gt;// always returns TRUE.&lt;br /&gt;if (!isEventTypeFilter(e)) {&lt;br /&gt;return false;&lt;br /&gt;}&lt;br /&gt;// Start of the event list?&lt;br /&gt;if (getStatePos() == null) {&lt;br /&gt;setStatePos(getEventList().elements());&lt;br /&gt;}&lt;br /&gt;// Get current state in the state machine&lt;br /&gt;java.util.Enumeration s = getStatePos();&lt;br /&gt;// Special case test if we have an empty state machine&lt;br /&gt;if (!s.hasMoreElements()) {&lt;br /&gt;return true;&lt;br /&gt;}&lt;br /&gt;// Compare event with statePos event; nextElement() also advances the position&lt;br /&gt;if (compare(e, (java.util.EventObject) s.nextElement())) {&lt;br /&gt;// Matched: stay at the next state unless this was the last one&lt;br /&gt;if (s.hasMoreElements()) {&lt;br /&gt;return false;&lt;br /&gt;}&lt;br /&gt;// We're at last state, reset to start&lt;br /&gt;// state and return TRUE&lt;br /&gt;else {&lt;br /&gt;setStatePos(getEventList().elements());&lt;br /&gt;return true;&lt;br /&gt;}&lt;br /&gt;}&lt;br /&gt;// Event was not next event in state machine&lt;br /&gt;// reset to start state and return FALSE&lt;br /&gt;setStatePos(getEventList().elements());&lt;br /&gt;return false;&lt;br /&gt;}&lt;br /&gt;Figure 73: Implementation of event sequencing threshold model in Rebecca-J.&lt;br /&gt;6.3.4.7 Agent-side Threshold Models&lt;br /&gt;Runtime configuration of the threshold model takes place in the server using a threshold&lt;br /&gt;model editor. Once configured, the model is sent to a remote trigger listener. 
Upon receipt,&lt;br /&gt;the trigger listener starts listening to component events generated by the selected component.&lt;br /&gt;The threshold model tests each event. If the test succeeds, the trigger is fired.&lt;br /&gt;Rebecca-J transmits the threshold model from the server to the agent via RMI by invoking the&lt;br /&gt;remote trigger listener method setThresholdModel() with the model as a parameter. The&lt;br /&gt;entire model is transmitted to the agent as an object, so the object must be serializable&lt;br /&gt;[133]. When creating a custom threshold model, care must be taken to ensure that all members are&lt;br /&gt;serializable. In the abstract class ThresholdModel, for example, the server-side threshold view&lt;br /&gt;and controller members are marked as transient. Transient members are not included when an&lt;br /&gt;object is serialized for transmission. This avoids JDK compatibility problems between the&lt;br /&gt;server and agent, because the view and controller members may contain JDK-specific GUI&lt;br /&gt;components.&lt;br /&gt;Once the threshold model is established in the agent’s trigger listener, the listener invokes the&lt;br /&gt;threshold model method startListening(). This method registers the listener’s interest in&lt;br /&gt;component events that may cause the trigger to fire. When an event is sent to the listener, the&lt;br /&gt;threshold model method isThresholdMet() is invoked with the event as a parameter. If the&lt;br /&gt;method returns TRUE, the trigger fires.&lt;br /&gt;Invoking isThresholdMet() rather than compare() hides whether the model is testing for&lt;br /&gt;a single event, a sequence of events, or some custom condition. isThresholdMet() tests&lt;br /&gt;for a single event by constructing a state machine with only one event necessary to reach the&lt;br /&gt;final state. The code for testing for a sequence of events is listed in Figure 73. 
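The behavior described above can be illustrated with a small, self-contained Java sketch. It is not Rebecca-J code. SketchKeyEvent, SketchThresholdModel, KeySequenceSketch and SketchTransport are hypothetical names. An array replaces the Enumeration-based event list, the isEventTypeFilter() step of Figure 73 is omitted, and a plain Java serialization round trip stands in for the RMI call to setThresholdModel(). The sketch shows three things: a single-event model treated as a one-element sequence, the reset-on-mismatch rule of Figure 73, and a transient view member that does not survive the transfer.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.EventObject;

// Hypothetical stand-in for a low-level key event (not java.awt.event.KeyEvent).
class SketchKeyEvent extends EventObject {
    static final int KEY_PRESSED = 1; // illustrative event-type code
    final int id;
    final char keyChar;

    SketchKeyEvent(Object source, int id, char keyChar) {
        super(source);
        this.id = id;
        this.keyChar = keyChar;
    }
}

abstract class SketchThresholdModel implements Serializable {
    // Stands in for the server-side editor view. java.lang.Object is not
    // Serializable, so without "transient" writeObject() would fail here.
    transient Object view = new Object();

    // Ordered internal "model" events; a single-event model has length 1.
    private final EventObject[] eventList;
    // Index of the next expected event, i.e. the current state.
    private int statePos;

    SketchThresholdModel(EventObject... eventList) {
        this.eventList = eventList;
    }

    abstract boolean compare(EventObject fromComponent, EventObject model);

    // Control flow modelled on Figure 73: advance on a match, fire and
    // return to the start state after the last event, reset on a mismatch.
    boolean isThresholdMet(EventObject e) {
        if (eventList.length == 0) {
            return true; // empty state machine
        }
        if (!compare(e, eventList[statePos])) {
            statePos = 0;
            return false;
        }
        statePos++;
        if (statePos == eventList.length) {
            statePos = 0;
            return true;
        }
        return false;
    }
}

class KeySequenceSketch extends SketchThresholdModel {
    KeySequenceSketch(String text) {
        super(toEvents(text));
    }

    // One KEY_PRESSED model event per character of the configured sequence.
    private static EventObject[] toEvents(String text) {
        EventObject[] events = new EventObject[text.length()];
        int i = 0;
        for (char c : text.toCharArray()) {
            events[i++] = new SketchKeyEvent(text, SketchKeyEvent.KEY_PRESSED, c);
        }
        return events;
    }

    @Override
    boolean compare(EventObject fromComponent, EventObject model) {
        if (!(fromComponent instanceof SketchKeyEvent) || !(model instanceof SketchKeyEvent)) {
            return false;
        }
        SketchKeyEvent k1 = (SketchKeyEvent) fromComponent;
        SketchKeyEvent k2 = (SketchKeyEvent) model;
        if (k1.id != k2.id) return false;
        return k1.keyChar == k2.keyChar;
    }
}

final class SketchTransport {
    // Stand-in for the RMI transfer: serialize the model, read back a copy.
    static SketchThresholdModel roundTrip(SketchThresholdModel model) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(model);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (SketchThresholdModel) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("model could not be transferred", e);
        }
    }
}
```

As in Figure 73, an event that breaks the sequence is not re-tested against the first state. With a “Hi” model, the key sequence H, H, i therefore does not fire. After the round trip, the copy's view member is null, while the server's original keeps its view.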
A customized&lt;br /&gt;threshold model can override isThresholdMet() to create special-case test logic. Because the listener&lt;br /&gt;simply checks the method’s return value, the nuances of the testing logic are hidden from it.&lt;br /&gt;6.3.4.8 Usage Case 1: Race Condition&lt;br /&gt;Triggers are a powerful mechanism for testing distributed applications. The principal&lt;br /&gt;advantage of triggers over traditional test systems is that a live user can be incorporated into a&lt;br /&gt;test session. Virtual users react to events generated by other users, live or virtual. This reactive&lt;br /&gt;approach creates test sessions that are dynamic, rather than completely prescribed. The next&lt;br /&gt;several sections present examples of trigger use.&lt;br /&gt;Race conditions are a common problem in distributed systems. A race condition occurs when&lt;br /&gt;system behavior depends on the ordering of instructions across threads of execution. One way&lt;br /&gt;to test for the presence of a race condition is to run parallel threads while varying&lt;br /&gt;their execution order.&lt;br /&gt;Consider a distributed form of the application in Figure 41. Whenever a +/- button is&lt;br /&gt;pressed, the counter is incremented/decremented by one and the new value is displayed in the&lt;br /&gt;text field of all distributed copies of the application. A potential race condition exists when&lt;br /&gt;users press the count buttons near-simultaneously.&lt;br /&gt;Triggers can help test for this race condition. A trigger could be set up between the live user&lt;br /&gt;and a virtual user. Whenever the live user presses the + button, the virtual user immediately&lt;br /&gt;reacts by pressing the - button. 
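The following minimal sketch shows why near-simultaneous presses matter. It rests on a hypothetical assumption, since the application's internals are not described here: each copy of the Figure 41 counter applies a press as a read-modify-write of the value it currently displays. CounterCopy and RaceDemo are illustrative names. The interleaving is spelled out by hand, so the result is deterministic rather than timing-dependent.

```java
// Hypothetical model of the value shown in one copy of the Figure 41 counter.
final class CounterCopy {
    private int value;

    int read() {
        return value;
    }

    void write(int newValue) {
        value = newValue;
    }
}

final class RaceDemo {
    // The live user's + completes before the virtual user's - begins.
    static int ordered() {
        CounterCopy c = new CounterCopy();
        c.write(c.read() + 1); // live user presses +
        c.write(c.read() - 1); // virtual user reacts with -
        return c.read();
    }

    // Near-simultaneous presses: both users read the old value before
    // either write lands, so the later write silently discards the earlier one.
    static int interleaved() {
        CounterCopy c = new CounterCopy();
        int seenByLiveUser = c.read();
        int seenByVirtualUser = c.read();
        c.write(seenByLiveUser + 1);
        c.write(seenByVirtualUser - 1);
        return c.read();
    }
}
```

ordered() returns 0. interleaved() returns -1 because the live user's increment is lost. A trigger-driven session that repeatedly provokes this interleaving would surface it as a count that fails to return to 0.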
If the count field starts at 0, then no matter how many times the&lt;br /&gt;live user presses the + button, it should return to 0 after each of the virtual user’s responses.&lt;br /&gt;The probe for the race condition can be coupled with a stress test by creating a second trigger.&lt;br /&gt;Stress tests help uncover application flaws that aren't exposed under normal use. The new&lt;br /&gt;trigger presses the + button whenever the original virtual user presses the - button. Now we&lt;br /&gt;have a situation where two virtual users are reacting to each other. The maximum trigger&lt;br /&gt;count field in the panel for the new trigger is set to 1000. The test begins with the live user&lt;br /&gt;pressing the + button. Assuming the count field begins at 0, at the end of the test it should&lt;br /&gt;read 1. In addition to using duration to stress test the system, stress can be increased by&lt;br /&gt;setting the playback delay on the trigger recordings to NO_DELAY.&lt;br /&gt;Testing for race conditions is possible with a traditional testing system. One advantage of&lt;br /&gt;prescribed testing over triggers is the ability to test with scripts executing simultaneously on&lt;br /&gt;separate machines. In contrast to the simplicity of triggers, however, some effort is required to&lt;br /&gt;produce the test case. For the test described in this section, the tester would be forced to&lt;br /&gt;create two scripts, one for each virtual user. Some form of synchronization primitive would&lt;br /&gt;have to be added so that both scripts began execution at the same time. A looping construct&lt;br /&gt;would be necessary so that the button presses could be repeated. Finally, the modified scripts&lt;br /&gt;would have to be saved, compiled, and loaded into the test system.&lt;br /&gt;6.3.4.9 Usage Case 2: Response Time Under Load&lt;br /&gt;Another vexing difficulty when developing synchronous multiuser applications is determining&lt;br /&gt;the responsiveness of the system under load. 
Response time is particularly important for user&lt;br /&gt;interface portions of the application. User anxiety rises when interactive components, such as&lt;br /&gt;a slider bar or pull-down list, do not respond within milliseconds [40].&lt;br /&gt;Consider another example using the application from Figure 41. The tester wants to&lt;br /&gt;investigate the performance effect that pushing JButton1 in one copy of the application has&lt;br /&gt;on moving the slider bar. Slider bars are a useful way to get a “feel” for the responsiveness&lt;br /&gt;of the application. If the cursor doesn’t track well with a slider bar, then the user will notice it&lt;br /&gt;immediately.&lt;br /&gt;The tester creates a trigger that activates a virtual user that presses push button JButton1.&lt;br /&gt;The trigger fires whenever the live user moves the slider bar. During testing, the slider bar&lt;br /&gt;does move sluggishly, indicating a performance problem. Curious about whether part of the&lt;br /&gt;problem is due to a backlog of queued push button events, the tester resets the trigger’s&lt;br /&gt;threshold model to use the mouseRegion model. The left half of the live user’s slider bar is&lt;br /&gt;marked as the triggering region for the virtual user. Now the tester can compare how the&lt;br /&gt;slider bar behaves inside and outside the triggering region. If the slider bar is sluggish&lt;br /&gt;inside the region, but immediately responsive outside it, then the tester knows that an event&lt;br /&gt;backlog is not the problem.&lt;br /&gt;Compared to triggers, traditional testing systems can only place a gross load on&lt;br /&gt;the system. This is accomplished by running one or more iterating scripts against the&lt;br /&gt;application. A live user can participate in such a test session, but only in an uncoordinated&lt;br /&gt;fashion with the scripts. 
Using the example from this section, the tester executes a script that&lt;br /&gt;repeatedly presses JButton1 in one copy of the application. A live user interacts with the&lt;br /&gt;slider bar in another copy of the application while this script executes.&lt;br /&gt;Since live user-based script control is unavailable with traditional testing, all interactions occur&lt;br /&gt;while the script executes. Script execution can be controlled from the test system, but it would&lt;br /&gt;be difficult or impossible to implement the test cases in this section. When the cursor nears&lt;br /&gt;the slider bar, interaction would stop while the tester configured and started the push button&lt;br /&gt;script. Interaction would then continue as the slider bar’s behavior was observed under&lt;br /&gt;load. When the cursor was about to leave the slider bar, interaction would stop while the&lt;br /&gt;tester terminated the push button script. The test cases would be impossible if the application&lt;br /&gt;and test system were on the same machine, because the cursor would have to leave and re-enter&lt;br /&gt;the application to control the test script. The mouseRegion trigger test case would be&lt;br /&gt;impossible to imitate because, by the time the tester had shut off the test script and returned to&lt;br /&gt;the application, the backlog of queued events would have been processed.&lt;br /&gt;6.3.4.10 Usage Case 3: Simulating User Behavior&lt;br /&gt;In some situations, the tester may want one or more virtual users to simulate “normal” user&lt;br /&gt;behavior. This allows the tester to observe how the application behaves under normal use.&lt;br /&gt;The definition of normal is application specific. Generally, the tester assigns a profile to each&lt;br /&gt;virtual user consisting of initiated and reactive behaviors. Initiated behavior consists of actions&lt;br /&gt;performed by the virtual user without regard to other activity in the application. 
For example,&lt;br /&gt;the virtual user might perform a series of mouse movements culminating in a button press&lt;br /&gt;every couple of minutes. Rebecca can model this behavior using a recording with the&lt;br /&gt;PLAY/CONTINUOUS PLAY button or a timer trigger.&lt;br /&gt;Reactive behavior consists of virtual user actions triggered by other user activity. For example,&lt;br /&gt;when testing a chat program, the virtual user might send the message “Hello, yourself!”&lt;br /&gt;in response to the string “Hello” being typed by another user. Rebecca models this behavior&lt;br /&gt;using triggers that fire based on the activity of other users.&lt;br /&gt;Consider the shared drawing/chat application in Figure 74. The tester wants to create a profile&lt;br /&gt;for a virtual user that will interact with the live user during testing. For initiated behavior, the&lt;br /&gt;virtual user will draw several shapes every sixty seconds. This behavior provides a periodic&lt;br /&gt;load on the system typical of normal use. Using Rebecca, the tester creates a recording that&lt;br /&gt;draws several shapes. A timer trigger that fires once every sixty seconds is attached to the&lt;br /&gt;recording.&lt;br /&gt;Figure 74: A shared drawing/chat application.&lt;br /&gt;The virtual user also has three reactive behaviors in its profile. The first behavior is a friendly&lt;br /&gt;attitude when the live user types “Hi”, “Hello”, or “Hey” in the chat window. The virtual&lt;br /&gt;user responds with “Welcome. I’m drawing in the upper right corner and using&lt;br /&gt;blue!” The tester sets up this behavior by recording the welcome message and copying the&lt;br /&gt;recording to keySequence triggers for each possible live user greeting.&lt;br /&gt;The second behavior is a protective attitude about the upper right hand quarter of the drawing&lt;br /&gt;area. 
Whenever the live user’s mouse enters this area and begins drawing, the virtual user&lt;br /&gt;sends the angry message “Hey! I’m drawing in this area, draw somewhere else!”&lt;br /&gt;This behavior is set up by recording the angry message and attaching a mouseRegion trigger&lt;br /&gt;to the recording. The mouseRegion’s editor is configured to fire on mousePressed events in&lt;br /&gt;the upper right quarter of the drawing area. The live user’s agent is selected as the trigger&lt;br /&gt;listener.&lt;br /&gt;The third behavior is another protective attitude, this time towards the drawing color used by the live user.&lt;br /&gt;The virtual user only draws with the color blue. If the live user chooses blue as well, the&lt;br /&gt;virtual user sends the angry message “Hey! Blue is my color. You’ve ruined the&lt;br /&gt;picture and we are starting over again.” and clears the drawing area. If the live&lt;br /&gt;user chooses a different color, the virtual user sends the message “Great choice! That&lt;br /&gt;will complement my blue very nicely.” In order to describe this behavior to Rebecca,&lt;br /&gt;a new component, event, and threshold model for propertyChangeColor must be written.&lt;br /&gt;The specification of propertyChangeColor is similar to that of propertyChangeInt (see&lt;br /&gt;Section 6.3.4.2). The component is a state change component. It contains the current value&lt;br /&gt;of the local user’s drawing color. The event is a property change event. It is generated by the&lt;br /&gt;component when the user’s drawing color changes. The value transmitted in the event is the&lt;br /&gt;new drawing color. The threshold model tests for changes to the drawing color component.&lt;br /&gt;An editor configures the model by allowing the user to select a drawing color from a list in&lt;br /&gt;combination with a “this” or “all-but-this” option. If “this” is chosen, then the trigger&lt;br /&gt;will fire when that drawing color is selected. 
If “all-but-this” is chosen, then the trigger will&lt;br /&gt;fire when any other color is selected.&lt;br /&gt;The virtual user’s color selection behavior is created with two triggers. The first trigger is&lt;br /&gt;attached to a recording that sends the angry message and erases the drawing area. A&lt;br /&gt;propertyChangeColor threshold model is configured to fire when the live user selects the&lt;br /&gt;color blue. The second recording sends the friendly message. It is attached to a&lt;br /&gt;propertyChangeColor trigger that fires when the live user selects any color except blue.&lt;br /&gt;Rebecca’s support for reactive behavior in virtual users greatly expands the possibilities for&lt;br /&gt;simulating user activity. Traditional testing systems do not have this capability. For initiated&lt;br /&gt;behavior, Rebecca’s PLAY/CONTINUOUS PLAY and trigger timer mechanisms offer a simple&lt;br /&gt;alternative to the labor-intensive process of adding wait commands and loop constructs to a&lt;br /&gt;traditional test script, then compiling, loading, and running it.&lt;br /&gt;6.3.4.11 Usage Case 4: A Flurry of Activity: Trigger Chaining&lt;br /&gt;The usage cases discussed in this section have dealt with virtual users reacting individually to&lt;br /&gt;live user actions. Rebecca’s trigger subsystem also supports trigger chaining. Trigger chaining&lt;br /&gt;allows virtual users to react to each other in a coordinated fashion. A trigger chain is created&lt;br /&gt;when a trigger is fired in reaction to an event generated by a virtual rather than a live user.&lt;br /&gt;Triggers can be strung together across many virtual users to create a chain reaction to a single&lt;br /&gt;event.&lt;br /&gt;Figure 75: An example of trigger chaining.&lt;br /&gt;Figure 75 gives a generic example of trigger chaining. Three triggers are registered with the&lt;br /&gt;live user’s agent. Any one of these triggers will cause the same recording in VirtualUser0’s&lt;br /&gt;agent to execute. 
Two triggers, TriggerD and TriggerE, registered with VirtualUser0 fire&lt;br /&gt;because of an event in this recording. TriggerE causes the replay of a recording in&lt;br /&gt;VirtualUser2, where a branch of the chain terminates. TriggerD activates a recording in&lt;br /&gt;VirtualUser1 that fires TriggerF. TriggerF replays a recording in VirtualUser3 where&lt;br /&gt;the last branch of the chain terminates.&lt;br /&gt;Figure 76 shows how trigger chaining can be used to extend the shared drawing area test from&lt;br /&gt;the previous use case. A second virtual user is added to the test. As in the original test, when&lt;br /&gt;the live user types a greeting in the chat window, the original virtual user, VirtualUser0,&lt;br /&gt;responds with “Hello”. This response, however, is chained to another trigger that causes&lt;br /&gt;VirtualUser1 to type “Don’t be fooled by the friendly greeting.&lt;br /&gt;VirtualUser0 is actually very testy. If you want I can prove it to you.&lt;br /&gt;Just say ‘show me’” in the chat window.&lt;br /&gt;Figure 76: Trigger chaining extends shared drawing area test.&lt;br /&gt;If the live user types any phrase containing “show me” in the chat window, VirtualUser1&lt;br /&gt;replays a recording of mouse movements in the upper right corner of the drawing area.&lt;br /&gt;Chained to these mouse events is VirtualUser0’s angry response “Hey! I’m drawing&lt;br /&gt;in this area, draw somewhere else!” This triggers a retort from VirtualUser1&lt;br /&gt;“See, I told you! I’ll tell you something else. Don’t even think about&lt;br /&gt;drawing in blue. 
VirtualUser0 really hates that.” VirtualUser0 gets in the&lt;br /&gt;last word with “I heard that!” triggered by the key sequence “hates that” from&lt;br /&gt;VirtualUser1.&lt;br /&gt;Although unimplemented in Rebecca-J, Rebecca has a second form of chaining: trigger state&lt;br /&gt;chaining. Trigger state chaining allows the tester to construct a non-deterministic finite&lt;br /&gt;automaton from a set of triggers. The input grammar for the automaton is a quadruple of&lt;br /&gt;component, event, threshold model, and trigger count for each trigger in the chain. State&lt;br /&gt;transitions occur when the input condition for the current state is satisfied (the trigger fires).&lt;br /&gt;Figure 77 shows a generic example of trigger state chaining. In the example, TriggerX&lt;br /&gt;represents the triple: component, event, and threshold model. countX represents the trigger&lt;br /&gt;counter for a specific agent/trigger combination. For example, countVU0 represents the&lt;br /&gt;number of times TriggerA has fired to activate a recording in VirtualUser0. The start&lt;br /&gt;state, S0, begins the automaton. When TriggerA fires and countVU0 is less than the&lt;br /&gt;maximum firing count, S1 is entered and a recording in VirtualUser0 is replayed. The machine&lt;br /&gt;stays in S1, replaying the recording for each successive firing of TriggerA until countVU0&lt;br /&gt;reaches its maximum. S2 is entered at this point. 
The paths S3,S4,S7 and S4,S6,S7 can be&lt;br /&gt;executed in parallel. The machine cannot proceed to the final state, S7, until the trigger&lt;br /&gt;counters for TriggerB and TriggerC for VirtualUser1 have both reached their maximum.&lt;br /&gt;Figure 77: Trigger state chaining example. States run from S0 (Start) to S7 (Finish); transition labels:&lt;br /&gt;A1 - TriggerA &amp;&amp; (countVU0 &lt; MAX)&lt;br /&gt;A2 - TriggerA &amp;&amp; (countVU0 == MAX)&lt;br /&gt;B1 - TriggerB &amp;&amp; (countVU1B &lt; MAX)&lt;br /&gt;B2 - TriggerB &amp;&amp; (countVU1B == MAX)&lt;br /&gt;C1 - TriggerC &amp;&amp; (countVU1C &lt; MAX)&lt;br /&gt;C2 - TriggerC &amp;&amp; (countVU1C == MAX)&lt;br /&gt;F1 - (countVU1B == MAX) &amp;&amp; (countVU1C == MAX)&lt;br /&gt;Trigger state chaining is useful in situations where the tester would like a trigger to replay&lt;br /&gt;different recordings as the test session progresses. Consider VirtualUser0’s response&lt;br /&gt;when the live user selects the color blue. Instead of sending the message and erasing the&lt;br /&gt;graphics in a single recording, the virtual user’s response could be broken up into a collection&lt;br /&gt;of smaller responses. 
The first time the live user selects the color blue, VirtualUser0 issues a&lt;br /&gt;warning: “Hey! Blue is my color, you’ll ruin the picture if you use blue&lt;br /&gt;too.” If the live user selects another color, then the state machine returns to the start state.&lt;br /&gt;If, however, the live user begins to draw with the color blue, VirtualUser0 issues another&lt;br /&gt;warning: “Stop drawing in blue or I’ll erase the whole picture!” If the live&lt;br /&gt;user selects another color, then the state machine returns to the start state. However, if the&lt;br /&gt;user continues to draw, VirtualUser0 sends the message “I’m going to keep erasing&lt;br /&gt;the drawing area until you change your drawing color from blue.” and clears&lt;br /&gt;the drawing area. This message/erase recording is triggered by live user drawing activity until&lt;br /&gt;another drawing color is selected.&lt;br /&gt;Although Rebecca-J has not implemented trigger state chaining, what would the&lt;br /&gt;implementation look like? Each trigger panel would include a state chaining button. When&lt;br /&gt;the button is pressed, the user would see a dialog box listing all registered triggers. The user&lt;br /&gt;selects one or more of the registered triggers and adds them to the next state list. All triggers&lt;br /&gt;in the next state list are disabled until the trigger being edited has fired the maximum number&lt;br /&gt;of times. The tester can check the “Reset Count” box for triggers in the next state&lt;br /&gt;list. This resets the trigger fired count for that trigger to zero when the trigger is enabled. The&lt;br /&gt;“Reset Count” box allows movement to a previous state in the trigger state chaining diagram.&lt;br /&gt;This section discussed trigger chaining, a multiuser testing methodology that coordinates&lt;br /&gt;virtual users based on live or virtual user actions. This reactive approach to coordinated replay&lt;br /&gt;is impossible with traditional testing systems. 
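Rebecca-J does not implement trigger state chaining. The following is therefore only a sketch of the bookkeeping the proposal above implies: a maximum firing count, a next state list whose members stay disabled until the current trigger reaches that maximum, and a “Reset Count” option. ChainedTrigger and its methods are hypothetical. The sketch also ignores the component, event, and threshold model part of each trigger.

```java
// Hypothetical sketch of the proposed trigger state chaining; not Rebecca-J code.
final class ChainedTrigger {
    final String name;
    private final int maxCount;       // maximum firing count set in the trigger panel
    private final boolean resetCount; // the proposed "Reset Count" box
    private boolean enabled;
    private int fired = 0;
    private ChainedTrigger[] nextStates = new ChainedTrigger[0];

    ChainedTrigger(String name, int maxCount, boolean startEnabled, boolean resetCount) {
        this.name = name;
        this.maxCount = maxCount;
        this.enabled = startEnabled;
        this.resetCount = resetCount;
    }

    // The triggers selected in the proposed state chaining dialog.
    void setNextStates(ChainedTrigger... next) {
        nextStates = next;
    }

    // Called when this trigger's threshold model is met; returns true if the
    // attached recording should be replayed.
    boolean fire() {
        if (!enabled) {
            return false; // a predecessor has not reached its maximum yet
        }
        if (fired >= maxCount) {
            return false; // this state is exhausted
        }
        fired++;
        if (fired == maxCount) {
            for (ChainedTrigger t : nextStates) {
                t.enable(); // move on to the next state(s)
            }
        }
        return true;
    }

    private void enable() {
        if (resetCount) {
            fired = 0; // permits a return to an earlier state
        }
        enabled = true;
    }
}
```

As an example, take TriggerA (maximum 2, initially enabled, Reset Count checked) with TriggerB (maximum 1, initially disabled) in its next state list, and TriggerB pointing back to TriggerA. TriggerB cannot fire until TriggerA has fired twice. TriggerB's single firing then re-enables TriggerA with its count reset, which models a move back to an earlier state.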
With trigger chaining, a triggered recording can&lt;br /&gt;trigger one or more additional replays. Trigger state chaining differs from simple trigger chaining in that&lt;br /&gt;triggers further down the chain are disabled until their predecessors have completed firing.&lt;br /&gt;6.3.5 Global Clipboard&lt;br /&gt;Rebecca’s global recording clipboard simplifies the process of sharing some or all of a&lt;br /&gt;recording between virtual users. The server maintains a single clipboard that contains the&lt;br /&gt;contents of the latest event list cut or copy operation. Details of the clipboard’s architecture&lt;br /&gt;are described in Section 6.1.2.&lt;br /&gt;The global clipboard is an important component of the server’s runtime event list editor. By&lt;br /&gt;pasting the entire contents of the clipboard into the event lists of several agents, the tester can quickly configure multiple virtual&lt;br /&gt;users with the same behavior. If several triggers should cause the same behavior, the clipboard&lt;br /&gt;can be used to paste the recording into different players within a single agent.&lt;br /&gt;The event list subsystem comes with error handling facilities to deal with the situation where&lt;br /&gt;an event from one agent’s event list is pasted into the event list of an incompatible agent. The&lt;br /&gt;primary cause of incompatibility is a non-existent receiving component for an event.&lt;br /&gt;The error is detected during replay of the event list. If the component does not&lt;br /&gt;exist in the application, Rebecca displays an error message.&lt;br /&gt;A more insidious error, which Rebecca cannot detect, is event data incompatibility. This&lt;br /&gt;occurs when the data associated with the event are incompatible with the receiving&lt;br /&gt;component. Consider a set of mouse drag events that move the slider bar from the far right to&lt;br /&gt;the far left. The events are copied into a new agent. 
In the new agent, however, the slider bar&lt;br /&gt;is already on the far left side. If the new agent’s recording is replayed, the mouse drag events&lt;br /&gt;will have no effect. It is the responsibility of the tester to make sure that the events pasted into&lt;br /&gt;an event list make sense given the state of the agent’s components.&lt;br /&gt;Traditional testing systems do not support a global clipboard because they do not support&lt;br /&gt;runtime editing of recordings. In a traditional testing system a recording must be created,&lt;br /&gt;edited, compiled, and loaded into a test session. In order to make a change, the virtual user&lt;br /&gt;executing the recording must be terminated and recreated, and an updated version of the&lt;br /&gt;recording must be compiled and reloaded.&lt;br /&gt;The global clipboard does not work with Rebecca’s native language recordings. Editing of a&lt;br /&gt;native language recording must be performed using an outside editor. Once editing is&lt;br /&gt;complete, the recording must be compiled using a native language compiler. Rebecca will then&lt;br /&gt;load the recording into an agent for replay. In contrast to traditional testing, however, Rebecca&lt;br /&gt;does not require the virtual user to terminate when reloading.&lt;br /&gt;6.3.6 Scalability&lt;br /&gt;Rebecca has a resource-conserving architecture. This allows the system to run in tandem with&lt;br /&gt;an IDE, and improves scalability as the number of users participating in a test increases. For a&lt;br /&gt;detailed description of the server/agent architecture, see Section 6.1.&lt;br /&gt;On the agent side, several resource-conserving design decisions were made. The agent has a&lt;br /&gt;small memory footprint (approximately 200K for Rebecca-J). Rebecca is able to achieve this&lt;br /&gt;by delegating all control and feedback logic, including the user interface, to the server. 
The&lt;br /&gt;agent’s limited memory requirements increase the amount of memory available to the&lt;br /&gt;application being tested.&lt;br /&gt;Rebecca’s agents make frequent use of sleep states to reduce CPU utilization. The playback thread is&lt;br /&gt;responsible for replaying an agent player’s recording. When the thread is not replaying events, it&lt;br /&gt;is in a resource-conserving wait state until awakened by a command from the server. The wait&lt;br /&gt;state is entered when the agent thread:&lt;br /&gt;Receives a stop or synchronize command from the server&lt;br /&gt;Finishes executing an event and is in single-step mode&lt;br /&gt;Reaches the end of a recording and continuous play is disabled&lt;br /&gt;Reaches a recorded or user-defined delay that the tester has activated for the recording. A special timed&lt;br /&gt;wait state puts the playback thread to sleep for a parameter-specified number of&lt;br /&gt;milliseconds.&lt;br /&gt;The listener/observer design pattern reduces the CPU impact of Rebecca’s trigger subsystem.&lt;br /&gt;The threshold model registers as a listener with the trigger’s selected component and&lt;br /&gt;listens only for the specific events that are capable of firing the trigger. The design pattern&lt;br /&gt;passively filters out events generated by other components, as well as events from the selected component&lt;br /&gt;that cannot fire the trigger. This significantly reduces the amount of event processing the&lt;br /&gt;threshold model performs to determine when a trigger should fire.&lt;br /&gt;Efficient data structures are used throughout Rebecca to reduce the CPU demands of the test&lt;br /&gt;system. The components the application registers with an agent are stored in two data&lt;br /&gt;structures: a tree and a hashtable. The tree maintains the hierarchical relationship&lt;br /&gt;between the registered components. The hashtable allows constant-time (expected) lookup of&lt;br /&gt;components given a persistent store id. 
Hashtables are also used to look up trigger listeners&lt;br /&gt;and recording players in agents. The keySequence threshold model also uses a hashtable to&lt;br /&gt;determine whether the key pressed by a user is alphanumeric.&lt;br /&gt;Recording players improve the performance of deadlock analysis on the agent by maintaining a&lt;br /&gt;distinct list of the synchronization events present in the recording. The synchronization events are&lt;br /&gt;stored in a HashMap, an unsynchronized counterpart of Hashtable; because map keys are unique, the list contains no duplicates. As&lt;br /&gt;synchronization events are added to or removed from the event list, the operation is reflected in&lt;br /&gt;the HashMap at O(1) cost. This preprocessed list of synchronization events is forwarded&lt;br /&gt;to the server during deadlock analysis. Without the list, an O(N) extraction of synchronization&lt;br /&gt;events from the player’s event list would be necessary.&lt;br /&gt;The agent’s resource-conserving architecture reduces Rebecca’s memory and CPU impact&lt;br /&gt;on the application. This is important for accurate recording and replay of&lt;br /&gt;actions because it reduces the test system’s probe effect. Additionally, a minimal agent-side load&lt;br /&gt;on the application can allow other resource-intensive applications, such as the IDE, to run&lt;br /&gt;on the same machine. This is helpful when the tester needs both Rebecca and&lt;br /&gt;the IDE’s facilities to debug the application. Finally, a minimal footprint opens the possibility&lt;br /&gt;that multiple copies of the application could run on the same machine, which would allow the&lt;br /&gt;tester to perform scalability tests with less hardware.&lt;br /&gt;Rebecca’s centralized control of distributed agents incurs minimal cost on the server side.&lt;br /&gt;The server provides user interface controls for each agent. When the tester manipulates a&lt;br /&gt;control, a command is sent to a remote agent. 
The major computational cost of&lt;br /&gt;record/playback operations is borne by the agent. As a result, the server scales well as the&lt;br /&gt;number of agents increases.&lt;br /&gt;One server performance bottleneck is the deadlock detection subsystem. The algorithm&lt;br /&gt;presented in Figure 51 incurs a server-side cost of O(V+E) to search for a cycle each time a&lt;br /&gt;synchronization event occurs. An additional O(V+E) cost is incurred because the resource&lt;br /&gt;graph is rebuilt from scratch each time. Rebecca-J, which implements the algorithm in Figure 51,&lt;br /&gt;can be improved in two principal ways. First, the resource graph can be maintained between&lt;br /&gt;synchronization events. The graph would be updated when a synchronization event was:&lt;br /&gt;added to a recording&lt;br /&gt;deleted from a recording&lt;br /&gt;replayed in a recording&lt;br /&gt;released in a recording&lt;br /&gt;A companion data structure to the graph, such as a hashtable, would provide O(1) lookup&lt;br /&gt;on the synchronization label to determine which vertices and edges to update when the&lt;br /&gt;relationship between a synchronization event and one or more recordings changed.&lt;br /&gt;A second improvement to the algorithm in Figure 51 is a heuristic. The search for a graph&lt;br /&gt;cycle should originate from the vertex whose edges were modified. For example,&lt;br /&gt;if a synchronization event is replayed by a recording player, then the cycle search&lt;br /&gt;should begin with the vertex v&lt;sub&gt;n&lt;/sub&gt; in G that represents that recording player. Although this does not&lt;br /&gt;improve on the O(V+E) worst case, it can improve performance by&lt;br /&gt;eliminating portions of the graph that are unreachable from the modified vertex. 
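&lt;br /&gt;As a minimal sketch of the second improvement, assuming the resource graph is stored as adjacency arrays over vertices numbered 0 to n-1, the Java fragment below runs a standard three-color depth-first search that starts at the modified vertex. The class CycleSearch and its methods are hypothetical; this is neither the Figure 51 algorithm nor Rebecca-J code. Vertices that cannot be reached from the starting vertex are never visited, although the worst case remains O(V+E).&lt;br /&gt;

```java
// Hypothetical sketch: search for a cycle (potential deadlock) starting at the
// vertex whose edges were just modified. Vertices are numbered 0 to n-1 and
// adjacency[u] lists the vertices that u points to.
final class CycleSearch {
    private static final int WHITE = 0; // not yet visited
    private static final int GRAY = 1;  // on the current depth-first path
    private static final int BLACK = 2; // fully explored

    private final int[][] adjacency;
    private final int[] color;

    private CycleSearch(int[][] adjacency) {
        this.adjacency = adjacency;
        this.color = new int[adjacency.length]; // every vertex starts WHITE
    }

    // Returns true if some cycle is reachable from modifiedVertex. Vertices that
    // cannot be reached from modifiedVertex are never examined.
    static boolean cycleReachableFrom(int[][] adjacency, int modifiedVertex) {
        return new CycleSearch(adjacency).visit(modifiedVertex);
    }

    private boolean visit(int u) {
        color[u] = GRAY;
        for (int v : adjacency[u]) {
            if (color[v] == GRAY) {
                return true; // back edge: v is already on the current path
            }
            if (color[v] == WHITE) {
                if (visit(v)) {
                    return true;
                }
            }
            // BLACK vertices were fully explored earlier without finding a cycle
        }
        color[u] = BLACK;
        return false;
    }
}
```

For example, with the adjacency arrays {{1}, {2}, {0}, {}}, a search from vertex 0 reports a cycle, while a search from vertex 3 does not, because the cycle among vertices 0, 1, and 2 is unreachable from vertex 3. Mapping recording players and synchronization labels to vertex numbers is left to the caller here; the companion hashtable keyed by synchronization label described above is one way to do that.&lt;br /&gt;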
The&lt;br /&gt;unreachable portions cannot contain an undiscovered cycle, provided the graph was acyclic before the modification, because any new cycle must pass through the modified vertex.&lt;br /&gt;6.3.7 Application Independence&lt;br /&gt;Distributed applications may not always have the same look and feel across users. In&lt;br /&gt;CollabBillboard, for example, some portions of the application had the same user interface,&lt;br /&gt;while others were radically different [13]. The view billboard user could view the&lt;br /&gt;entire billboard and move the place billboard user’s view, but could not actually move billboard&lt;br /&gt;pieces. The place billboard user, in contrast, could move about the billboard and move pieces,&lt;br /&gt;but could not see the entire billboard. Other areas of the application, such as site and task selection,&lt;br /&gt;displayed a similar user interface for both users.&lt;br /&gt;Rebecca, like traditional testing systems, was designed with application independence in mind.&lt;br /&gt;Application independence is critical if the testing system is to support many kinds of&lt;br /&gt;applications. Section 6.3 has shown that Rebecca provides novel support for multiuser testing.&lt;br /&gt;The multiuser coordination subsystems were designed with the recognition that a single&lt;br /&gt;distributed application can have several simultaneous users, each possibly with a different user&lt;br /&gt;interface. The design also recognizes, however, that sometimes the application will use the&lt;br /&gt;same user interface across users.&lt;br /&gt;Trigger definition maintains application independence. In phase one, the trigger source and&lt;br /&gt;characteristics are defined. The selection of the trigger listener’s agent, the component that&lt;br /&gt;will generate the event, and the editing of runtime characteristics of the threshold model that will&lt;br /&gt;fire the trigger are specific to the trigger source application. In phase two, the agent and&lt;br /&gt;recording that will be activated when the trigger fires are identified. 
The agent receiving trigger&lt;br /&gt;notification is not required to have a specific user interface or component configuration. The&lt;br /&gt;independence between trigger source and receiver is so complete that the two agents could&lt;br /&gt;actually be attached to different applications.&lt;br /&gt;Synchronization gives the tester fine-grained control over the coordination of virtual users’ activities. The&lt;br /&gt;synchronization subsystem depends only on the label and location of synchronization&lt;br /&gt;events in virtual user recordings. It does not depend on what a particular virtual user’s&lt;br /&gt;interface looks like or which application controls the user has. In CollabBillboard, for example,&lt;br /&gt;a recording of the view billboard user grabbing the place billboard view can be synchronized&lt;br /&gt;with the place billboard user’s view movement commands by inserting synchronization events at&lt;br /&gt;appropriate points in both scripts.&lt;br /&gt;The event list subsystem provides tools to edit events and copy them to other event lists. When&lt;br /&gt;virtual users share the same components, the global clipboard allows some or all of a recording&lt;br /&gt;to be shared. Laissez-faire error checking allows the tester to clean up events with unidentified&lt;br /&gt;components that have been copied to a virtual user.&lt;br /&gt;7 Evaluation&lt;br /&gt;This chapter presents an evaluation of CAMELOT and Rebecca-J, the Java-based&lt;br /&gt;implementation of Rebecca. Several approaches to the evaluation were considered,&lt;br /&gt;including application re-implementation, experimental analysis, and application testing.&lt;br /&gt;For application re-implementation, the goal would have been to re-implement an existing&lt;br /&gt;CSCW application using Rebecca and CAMELOT. 
A positive evaluation would show that&lt;br /&gt;it takes less effort to implement a CSCW system using our methodology and architecture.&lt;br /&gt;This approach was rejected for two reasons. First, it would be difficult to compare&lt;br /&gt;implementation efforts. A great deal of effort is expended in human factors rework when&lt;br /&gt;creating a CSCW system, and this rework time would not be present in the re-implementation.&lt;br /&gt;Second, the effort involved in implementing a real CSCW application is&lt;br /&gt;significant. An application that isn’t a toy system could take months to years to re-implement.&lt;br /&gt;Another approach we considered was an experimental evaluation. One of the programs&lt;br /&gt;used to test Rebecca-J was a small Java-based shared drawing/chat application. We&lt;br /&gt;considered creating a version of the application with a set of deliberately introduced bugs&lt;br /&gt;for experimental purposes. Two groups of testers would have been asked to find bugs in&lt;br /&gt;the application. The first group would use CAMELOT and Rebecca-J to uncover&lt;br /&gt;problems. The second group, a control, would try to find problems without any tools. A&lt;br /&gt;positive evaluation would show that the group using our methodology and architecture&lt;br /&gt;found a statistically significantly greater number of bugs. This approach had several&lt;br /&gt;disadvantages. First, the bugs introduced into the application might be biased towards&lt;br /&gt;CAMELOT or Rebecca, since we would be introducing them. Second, the application was a&lt;br /&gt;toy CSCW system. Third, volunteers would need to commit significant time&lt;br /&gt;to learn the methodology and Rebecca-J and to debug the application. During the&lt;br /&gt;CollabBillboard experiment, we had a great deal of difficulty finding unskilled volunteers&lt;br /&gt;willing to commit an hour. 
For this evaluation, the volunteers would have needed&lt;br /&gt;excellent Java programming skills and would have had to commit hours, possibly days, to&lt;br /&gt;the experiment.&lt;br /&gt;The approach we finally selected was to apply CAMELOT and Rebecca-J to a&lt;br /&gt;CSCW application in use or under development. A positive evaluation would be one that&lt;br /&gt;uncovered previously undetected bugs and improved the quality of the application. This&lt;br /&gt;approach also had several disadvantages. First, since Rebecca-J was a research&lt;br /&gt;implementation of a proposed architecture, some rework would probably be&lt;br /&gt;necessary to make it compatible with a real CSCW application. Second, this approach&lt;br /&gt;would be less quantitative than the previous methods. Despite these drawbacks, this kind&lt;br /&gt;of evaluation was more attractive than the other approaches because it could be conducted&lt;br /&gt;using a real CSCW application.&lt;br /&gt;After considering each of the approaches, we decided to conduct the evaluation using a&lt;br /&gt;real CSCW application: a remote windowing system called the&lt;br /&gt;Reconfigurable Collaboration Network (RCN). Section 7.1 examines the remote&lt;br /&gt;windowing application in detail. Section 7.2 discusses the effort involved in upgrading&lt;br /&gt;Rebecca-J to Java 1.2, an upgrade necessary for compatibility with RCN. Section&lt;br /&gt;7.3 reflects on architectural problems with Rebecca-J that were uncovered during the&lt;br /&gt;evaluation. Section 7.4 discusses the two dozen RCN bugs that were discovered using&lt;br /&gt;CAMELOT and Rebecca-J. 
Section 7.5 offers some final thoughts on the evaluation&lt;br /&gt;process.&lt;br /&gt;7.1 The Reconfigurable Collaboration Network&lt;br /&gt;The Reconfigurable Collaboration Network (RCN) was developed as part of the&lt;br /&gt;Collaborative Classroom research effort at Rensselaer Polytechnic Institute. The goal of&lt;br /&gt;the research was to develop a classroom where learning came from group participation&lt;br /&gt;rather than lectures. The classroom design consisted of a unique combination of hardware,&lt;br /&gt;software, and physical architecture to promote group activity [13].&lt;br /&gt;The roots of the Collaborative Classroom can be traced to the Design Conference Room&lt;br /&gt;(DCR) Project [15]. The DCR contained a specially designed table for six meeting&lt;br /&gt;participants. Each user has a private computer, keyboard, mouse, and display. The display&lt;br /&gt;is mounted beneath the table’s surface to facilitate eye contact, and a glass pane embedded in&lt;br /&gt;the table’s surface allows the display to be viewed. A public computer is also included.&lt;br /&gt;Although the public machine is physically located in a different room, users can view&lt;br /&gt;activity on the machine through several displays mounted beneath the conference room&lt;br /&gt;table. Users can also control activity on the public machine with their private machine’s&lt;br /&gt;keyboard and mouse using a special remote windowing system called the Collaboration&lt;br /&gt;Network (CN). CN allows full control of all system services of the public machine.&lt;br /&gt;The original CN software differs from traditional remote windowing systems in several&lt;br /&gt;respects. First, because users can see the public machine’s display, there is no need&lt;br /&gt;for remote viewing capability. Second, unlike other remote windowing systems, the DCR&lt;br /&gt;software views the public machine as a shared resource. 
Session and floor control&lt;br /&gt;functionality is included to manage the public machine during meetings. Finally,&lt;br /&gt;other meeting facilitation capabilities are provided, including a chat system.&lt;br /&gt;RCN represents the next generation of this DCR software system. Key changes&lt;br /&gt;include portability, enhanced session management, floor control modifications, and&lt;br /&gt;reduced meeting facilitation capability. The RCN system was implemented almost entirely&lt;br /&gt;in Java. Unlike the CN, which supported only MacOS, RCN supports Windows&lt;br /&gt;95/98/NT, MacOS, and Linux.&lt;br /&gt;Session management has been greatly enhanced. While the CN supports a group of users&lt;br /&gt;sharing a single public machine, RCN organizes users into teams. Each team has&lt;br /&gt;descriptive information and one or more administrators who control who is on the team.&lt;br /&gt;The first time a user selects a public machine, a session is created. Additional users&lt;br /&gt;(registered or guests) may join or leave a session at any time. Multiple public machines are&lt;br /&gt;supported, with one public machine per session. Finally, RCN supports super sessions,&lt;br /&gt;which allow users from different sessions to join together in a single meta-session&lt;br /&gt;for shared control of a public machine.&lt;br /&gt;The CN had a sophisticated floor control mechanism to control access to the public&lt;br /&gt;machine. Studies showed that users found the mechanism awkward and preferred to use a&lt;br /&gt;simple interrupt button and verbal consensus to take control of the public machine. Only this&lt;br /&gt;simple floor control mechanism was carried over to RCN.&lt;br /&gt;Finally, the chat system and other meeting facilitation capabilities were dropped from&lt;br /&gt;RCN. Many of these capabilities are available in other software systems, which can be run&lt;br /&gt;under RCN. 
For example, ICQ is an excellent chat system available for free on the&lt;br /&gt;Internet.&lt;br /&gt;The RCN architecture consists of three core components: ISServer, RCNPublicServer, and&lt;br /&gt;rcnClient. ISServer is responsible for session management. It keeps track of all active&lt;br /&gt;publics, sessions, teams, and users, and it maintains persistent information about&lt;br /&gt;teams and users. RCNPublicServers register with the ISServer to advertise their&lt;br /&gt;availability to users. rcnClients must locate an ISServer to register as an active user&lt;br /&gt;and to find publics, sessions, teams, and other users.&lt;br /&gt;An RCNPublicServer runs on each public machine. It is responsible for receiving&lt;br /&gt;remote mouse and keyboard events from rcnClients and translating them to local&lt;br /&gt;events. If a user selects ghosting, the software translates remote mouse events into move&lt;br /&gt;commands for a ghost icon associated with the user.&lt;br /&gt;An rcnClient runs on each user’s machine. The rcnClient presents the user with an&lt;br /&gt;array of session management functionality. Session management commands are sent from&lt;br /&gt;the rcnClient to a remote ISServer. When the user joins a session and selects the&lt;br /&gt;Interrupt button, his or her mouse and keyboard events are sent to a remote&lt;br /&gt;RCNPublicServer. Only one session member at a time can have this control. However,&lt;br /&gt;if any session member presses the Ghost button, his or her mouse events are sent to the&lt;br /&gt;remote public machine.&lt;br /&gt;RCN was fairly mature at the time of the CAMELOT and Rebecca-J evaluation. The&lt;br /&gt;application had been in development for over two and a half years, and there were plans to&lt;br /&gt;commercialize the system. During the school semester, the software was used daily by&lt;br /&gt;students in several courses. The development team was reasonably confident of the&lt;br /&gt;stability of the system. 
Presumably because of this confidence, one team member&lt;br /&gt;suggested it might be necessary to deliberately introduce bugs into the software.&lt;br /&gt;7.2 Evaluation Phase I: Converting Rebecca to Java 1.2&lt;br /&gt;RCN was implemented almost entirely in Java. At the time of the evaluation, RCN&lt;br /&gt;operated with JDK 1.2.2, while Rebecca-J used JDK 1.1.8. A major concern of the&lt;br /&gt;evaluation was the effort involved in converting Rebecca-J to the new JDK. Sun had&lt;br /&gt;made significant changes to the Swing, graphics, and thread classes that were used&lt;br /&gt;throughout Rebecca-J. When the implementation was first imported into a JDK 1.2-compatible&lt;br /&gt;version of IBM’s IDE, VisualAge for Java, over 7000 syntax errors were&lt;br /&gt;detected. Surprisingly, the conversion went smoothly, with the exception of package name&lt;br /&gt;conversion, text events, and some difficulties with the IDE’s visual editor.&lt;br /&gt;Swing consists of a rich set of Java classes that can be used to implement graphical user&lt;br /&gt;interfaces. Prior to JDK 1.2.0, Swing was separate from the JDK’s core classes and was stored under a&lt;br /&gt;set of packages that began with the prefix com.sun.java.swing. With JDK 1.2.0 and&lt;br /&gt;subsequent releases, Swing was integrated into the core classes under the prefix&lt;br /&gt;javax.swing.&lt;br /&gt;For Rebecca-J, this change in package and class names affected most of the 700+ classes&lt;br /&gt;comprising the implementation. IBM recommended WoodenChair’s Repackage+ utility,&lt;br /&gt;which was specifically designed to handle the conversion [139]. This tool worked but had&lt;br /&gt;the unfortunate side effect of corrupting the visual portion of any class created with the&lt;br /&gt;IDE’s visual composition editor. To preserve the visual editing work, a semi-automatic&lt;br /&gt;technique was developed. 
An automated search-and-replace conversion was&lt;br /&gt;invoked manually on each class, one class at a time.&lt;br /&gt;Using the IDE’s visual composition editor, some of Rebecca-J’s classes were created and&lt;br /&gt;edited visually. When the editing was complete, a regenerate option was selected and the&lt;br /&gt;IDE created Java source code for the class. In parallel with the Java source, the IDE&lt;br /&gt;stored the visual composition in a special internal format inaccessible to developers.&lt;br /&gt;Unfortunately, for classes imported from JDK 1.1.8, this internal format contained artifacts&lt;br /&gt;that generated Java source incompatible with JDK 1.2.0.&lt;br /&gt;Consider the Java class VCEImportTest, a JWindow containing a JTextField with&lt;br /&gt;right-aligned text, created using the visual composition editor under JDK 1.1.8. If the class is&lt;br /&gt;imported into the JDK 1.2.0 IDE and the regenerate option is selected to produce the Java&lt;br /&gt;source code, the following method call will be regenerated as part of the class initialization:&lt;br /&gt;setAlignment(com.sun.java.swing.JText.RIGHT_ALIGNMENT)&lt;br /&gt;This produces a syntax error because the JText class is located in the package&lt;br /&gt;javax.swing in JDK 1.2.0. The only way to correct this problem is to use the visual&lt;br /&gt;editor to reset the JTextField’s alignment to something other than right, set the&lt;br /&gt;alignment back to right, and then regenerate the Java class. Resetting the alignment causes the&lt;br /&gt;visual editor to pick up the JDK 1.2.0 constant for source code generation.&lt;br /&gt;A subtle change to Swing’s event subsystem also affected Rebecca-J’s record/playback&lt;br /&gt;system. Rebecca-J provides support for user interface recording by default. This support is&lt;br /&gt;implemented by recording low-level mouse and keyboard events on user interface&lt;br /&gt;components. The events are replayed when requested. 
The new JDK altered how text&lt;br /&gt;events were processed by classes that inherited from the JText class. As a result,&lt;br /&gt;when low-level text events were sent to a JText component&lt;br /&gt;programmatically, the current cursor position would not move, causing replayed text events&lt;br /&gt;to appear backwards in a text field.&lt;br /&gt;The fix for this problem used Rebecca’s extensible component/event model. A new&lt;br /&gt;event type called CaretEventRecord was added to Rebecca-J. Whenever a class&lt;br /&gt;inheriting from JText was encountered during the processing of user interface&lt;br /&gt;components, a listener was added for caret movement in addition to mouse and keyboard&lt;br /&gt;events.&lt;br /&gt;7.3 Evaluation Phase II: Getting Rebecca to work with RCN&lt;br /&gt;In addition to problems with RCN, the evaluation uncovered problems with Rebecca-J.&lt;br /&gt;The decision to use a real CSCW application was critical in detecting these implementation&lt;br /&gt;flaws, which would likely not have been discovered using the other evaluation&lt;br /&gt;methods under consideration. The major flaws uncovered in Rebecca-J involved component&lt;br /&gt;detection, component naming, component existence, modal dialogs, menu bars, and&lt;br /&gt;synchronization feedback.&lt;br /&gt;7.3.1 Component Detection&lt;br /&gt;In the original implementation, the application under test registered with a RebeccaAgent&lt;br /&gt;by invoking the method RebeccaAgent.registerTopLevelUIComponent(). This&lt;br /&gt;method gave Rebecca a hook into the application’s user interface. The hook was used for&lt;br /&gt;a one-time identification of all components that could produce and receive mouse and&lt;br /&gt;keyboard events. Testing with RCN uncovered two problems with this approach. 
First, it&lt;br /&gt;wasn’t possible to account for UI components added to the application after registration.&lt;br /&gt;For example, if the application created and displayed a new dialog, Rebecca-J would not be&lt;br /&gt;able to record or play back events on this dialog.&lt;br /&gt;One solution was to require the application to inform Rebecca when a&lt;br /&gt;new UI component was created after registration. This solution was rejected because the&lt;br /&gt;extra application instrumentation required might make Rebecca-J unattractive to&lt;br /&gt;developers.&lt;br /&gt;Another solution was to watch the UI event queue for mouse and keyboard&lt;br /&gt;events. Any time an event occurred in a new component, the component would be&lt;br /&gt;registered with Rebecca. This solution was rejected because of concern about the&lt;br /&gt;performance degradation it might cause in the application.&lt;br /&gt;A new event listening facility added in JDK 1.2.0 provided the final solution. The JDK allows an&lt;br /&gt;application to listen for window events at the virtual machine level, rather than at the level of&lt;br /&gt;individual class instances. When a window was displayed or hidden in the application, the listener&lt;br /&gt;would be notified. Rebecca’s listener determined whether the event involved a new window by&lt;br /&gt;querying the ComponentMonitor. If so, the window and all of its&lt;br /&gt;subcomponents were added to Rebecca’s component hierarchy.&lt;br /&gt;The second problem RCN uncovered was Rebecca’s inability to detect component&lt;br /&gt;replacement. Component replacement occurs when an application creates a component&lt;br /&gt;instance, uses the instance, and discards it. Later in the application’s execution, the&lt;br /&gt;process is repeated for the same logical component. For example, consider a dialog window&lt;br /&gt;that appears when saving a file to disk. 
The application may create a new instance of the&lt;br /&gt;dialog each time the user saves a file and discard the window when the operation is&lt;br /&gt;complete. From a high-level view, the component is the same, although the instance is&lt;br /&gt;not.&lt;br /&gt;Rebecca’s original implementation assumed component recycling: if the same component&lt;br /&gt;was used later in the application, it would always be the same component instance. The&lt;br /&gt;evaluation with RCN showed that component replacement and recycling can both occur in an&lt;br /&gt;application.&lt;br /&gt;The solution to this problem built on the enhancements made to support automatic&lt;br /&gt;registration of new components. Whenever Rebecca received a visible window event, two&lt;br /&gt;checks were performed. The first tested whether it was a new window. The second&lt;br /&gt;tested whether it was an existing window with a new instance. If so,&lt;br /&gt;the old instance was replaced in the component hierarchy. If application events were&lt;br /&gt;being recorded, care had to be taken to ensure that Rebecca listened to mouse and&lt;br /&gt;keyboard events on components from the new instance.&lt;br /&gt;7.3.2 Component Naming&lt;br /&gt;In the original version of Rebecca-J, all user interface components were required to have&lt;br /&gt;unique names. This requirement allowed Rebecca-J to maintain a unique, persistent&lt;br /&gt;identifier for each component. During playback, the identifier was used to determine which&lt;br /&gt;component instance should receive an event. Rebecca-J used the UI component methods&lt;br /&gt;setName()/getName() to determine the unique name. If getName() returned an empty&lt;br /&gt;string, then the class name of the component was used.&lt;br /&gt;For the small applications with which Rebecca-J was tested before RCN, this technique&lt;br /&gt;worked well. 
The IDE used to develop these toy applications assigned unique names to&lt;br /&gt;each UI component when the visual composition editor was used. However, when&lt;br /&gt;Rebecca-J was connected to RCN, problems developed immediately.&lt;br /&gt;The basic problem was that most of RCN’s UI components did not have unique names&lt;br /&gt;that could be determined from getName(). Rebecca’s fallback, the component’s class&lt;br /&gt;name, was unique only when a class had a single instance. Unfortunately, RCN used many UI component&lt;br /&gt;instances of the same class. A simple JDialog with two unnamed JButton push buttons&lt;br /&gt;would cause Rebecca’s component naming system to break down.&lt;br /&gt;Figure 78: Derivation of a unique name from the root&lt;br /&gt;component.&lt;br /&gt;One solution to the problem was to require the application developer to uniquely name each&lt;br /&gt;component. This solution was rejected because it was felt that the instrumentation&lt;br /&gt;requirement would make Rebecca less appealing to users.&lt;br /&gt;A less intrusive solution was developed that required unique names only for the highest-level UI&lt;br /&gt;components, that is, classes that inherit from JDialog, JWindow, and&lt;br /&gt;JFrame. An algorithm was developed to create a unique name for every subcomponent&lt;br /&gt;based on its position in the UI hierarchy with respect to the root UI component. The&lt;br /&gt;name was derived by tracing the component’s ancestry up the component hierarchy until the root&lt;br /&gt;was reached. Figure 78 shows a JDialog with a JLabel and a JPanel containing two&lt;br /&gt;JButton push buttons. The right-hand side of the figure shows the UI hierarchy for the&lt;br /&gt;JDialog and the names derived for each component. 
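&lt;br /&gt;The derivation can be sketched in Java as follows. This is a hypothetical illustration rather than Rebecca-J’s code: the Node class and its window(), of(), and uniqueName() methods are assumptions that stand in for Swing components, and the rule that each name is the class name plus the component’s index among earlier siblings of the same class is inferred from the names shown in Figure 78.&lt;br /&gt;

```java
// Hypothetical sketch of the Figure 78 naming scheme. Node stands in for a
// Swing component; it is not a Rebecca-J or Swing class.
final class Node {
    final String type;      // simple class name, e.g. "JButton"
    final String rootName;  // unique name; only top-level windows have one
    final Node[] children;
    Node parent;            // null for the root window

    private Node(String type, String rootName, Node[] children) {
        this.type = type;
        this.rootName = rootName;
        this.children = children;
        for (Node child : children) {
            child.parent = this;
        }
    }

    // A uniquely named top-level window such as a JDialog, JWindow, or JFrame.
    static Node window(String type, String uniqueName, Node... children) {
        return new Node(type, uniqueName, children);
    }

    // An unnamed subcomponent identified only by its class.
    static Node of(String type, Node... children) {
        return new Node(type, null, children);
    }

    // Position among earlier siblings of the same class, starting at 0.
    private int indexAmongSameClassSiblings() {
        int index = 0;
        for (Node sibling : parent.children) {
            if (sibling == this) {
                break;
            }
            if (sibling.type.equals(type)) {
                index++;
            }
        }
        return index;
    }

    // Trace the ancestry up to the root, then build the path back down.
    String uniqueName() {
        if (parent == null) {
            return "//" + rootName;
        }
        return parent.uniqueName() + "/" + type + indexAmongSameClassSiblings();
    }
}
```

Applied to the hierarchy in Figure 78, the sketch reproduces the derived names listed there. Because the index counts only siblings of the same class, adding a JLabel to the JPanel would leave the button names unchanged in this sketch, whereas inserting another JButton ahead of the OK button would change the OK button’s name.&lt;br /&gt;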
The fully qualified name for the OK&lt;br /&gt;JButton is: //Overwrite Dialog/JContentPane0/JPanel0/JButton1.&lt;br /&gt;[Figure 78 hierarchy: JDialog (//Overwrite Dialog) contains JContentPane (JContentPane0), which contains JLabel (JLabel0) and JPanel (JPanel0); the JPanel contains JButton (JButton0) and JButton (JButton1).]&lt;br /&gt;7.3.3 Component Existence&lt;br /&gt;The original version of Rebecca-J assumed that when a UI event was replayed, the receiving&lt;br /&gt;component was visible on the screen. During the evaluation with RCN, it became obvious&lt;br /&gt;that this was not always the case. A user action could trigger the display of a UI&lt;br /&gt;component, but because of an unpredictable delay (e.g., computational cost, competition&lt;br /&gt;for CPU or memory resources, or network delay), it might take some time for the component&lt;br /&gt;to become visible. Rebecca-J would replay events on a component without regard to its&lt;br /&gt;readiness to accept them. Another possibility was that a user action could trigger the&lt;br /&gt;delayed creation of a UI component, which would then be displayed. In this situation,&lt;br /&gt;Rebecca would generate an error when it tried to replay an event on a component that didn’t&lt;br /&gt;exist yet. The solution to these problems was to make Rebecca-J’s playback mechanism&lt;br /&gt;wait until the UI component identified by an event existed and was visible.&lt;br /&gt;7.3.4 Modal Dialogs&lt;br /&gt;Modal dialogs were used throughout the RCN application. The purpose of this UI&lt;br /&gt;component was to grab the user’s attention by restricting all user input to the dialog. All&lt;br /&gt;other UI input processing halted until the user interacted with and closed the dialog. Until&lt;br /&gt;the evaluation, Rebecca-J had not been tested with modal dialogs. 
Two serious problems were detected: out-of-sequence events and frozen playback.&lt;br /&gt;&lt;br /&gt;The first problem, out-of-sequence events, occurred when Rebecca-J recorded a sequence of user actions that opened a modal dialog, interacted with it, and then closed it. Analysis of the events from this scenario showed that the UI event that caused the modal dialog to appear was reported after the dialog was closed. During replay, Rebecca-J would wait indefinitely at the first modal dialog event for the window to appear. The window never appeared because the event that actually triggered the dialog was located later in the event list.&lt;br /&gt;&lt;br /&gt;To correct the problem, an algorithm was developed for detecting and correcting out-of-sequence modal dialog events. After a recording, the event list was scanned for modal dialog events, and the first occurrence of a modal dialog event was marked. The event list was then searched further for the following sequence: a MOUSE_RELEASED or KEY_RELEASED event in the modal dialog, followed by a MOUSE_RELEASED or KEY_RELEASED event in the same component as the event immediately preceding the first modal dialog event. This last event was the out-of-sequence event, and it was moved before the first modal dialog event. The algorithm handled the most common modal dialog scenarios, but not all of them. For example, if the application developer used a non-standard UI event such as MOUSE_PRESSED or KEY_PRESSED to trigger the display of the modal dialog, the algorithm would not work.&lt;br /&gt;&lt;br /&gt;The second problem encountered with modal dialogs was frozen playback. During playback of UI events, Rebecca-J invokes the AWT method dispatchEvent() on the UI component.
This method normally returns immediately because AWT handles event processing in a separate thread. However, when a UI event triggers the display of a modal dialog, the method does not return until the dialog is removed from the display. This behavior caused Rebecca-J’s playback mechanism to freeze: playback halted indefinitely because the event that would remove the dialog was located later in the event list. To correct the problem, a separate thread was created for each dispatchEvent() invocation.&lt;br /&gt;&lt;br /&gt;7.3.5 Menu Bars&lt;br /&gt;The evaluation with RCN uncovered a problem with record/playback of JMenuItem components. Unlike the JDK’s other UI components, JMenuItem required a combination of low-level mouse events and higher-level MenuDragMouseEvents. Rebecca’s extensible component/event model was used to define a special MenuDragMouseEvent for this UI component.&lt;br /&gt;&lt;br /&gt;7.3.6 Synchronization Feedback&lt;br /&gt;Rebecca-J’s synchronization subsystem was used extensively during the distributed computing tests conducted on RCN. When a synchronization event is encountered, the VCR-like user interface activates the synchronize button and displays the event in the event list window. For infrequent synchronizations, this provided useful feedback to the tester about the state of a script playback. However, when scripts were replayed with the no-delay option set in order to detect race conditions in the application under test, the feedback introduced an artificial pause in the script. Based on past experience, the cause was identified as the processing involved in rendering the event list window.
The event list window feedback for synchronization was removed from the implementation.&lt;br /&gt;&lt;br /&gt;7.4 Evaluation Phase III: Evaluating RCN&lt;br /&gt;Bug | Description | CAMELOT Code&lt;br /&gt;A.1 | Error message displayed when starting up RCNPublicServer in Win | GC.ST.9, HCI.GR.3&lt;br /&gt;A.2 | Configuration of PATH shell variable necessary for NativeLibrary.dll for RCNPublicServer in Win95/98 | GC.ST.9, HCI.GR.3&lt;br /&gt;A.3 | ISServer does not always flush terminated RCNPublicServer | GC.ST.9, DC.TC.1&lt;br /&gt;A.4 | Documentation Errors | GC.ST.11&lt;br /&gt;A.5 | Inconsistent use of Quit, Exit, Leave, Cancel | HCI.GR.1&lt;br /&gt;A.6 | “Pick a IS” is grammatically incorrect | HCI.GR.1&lt;br /&gt;A.7 | No version number displayed in RCNPublicServer, rcnClient, ISServer | GC.ST.9, GC/DC.4&lt;br /&gt;A.8 | Preference Dialog Displays Invalid Colors | HCI.GR.7&lt;br /&gt;A.9 | Preference Dialog Displays Too Many Colors | HCI.UITG.10&lt;br /&gt;A.10 | Preference Dialog Allows Same Color for Two Users in Same Session | HHI.A, HCI.UITG.10&lt;br /&gt;A.11 | No lock mechanism for simultaneous edits of Team Information | DC.RC.4&lt;br /&gt;A.12 | Race Condition Joining a Session | DC.RC.2, DC.RC.3&lt;br /&gt;A.13 | Ghost Cursor Hidden By New Applications | HCI.UITG.7&lt;br /&gt;A.14 | Sticky Mouse Buttons | GC.IM.1, DC.RC.2&lt;br /&gt;A.15 | Multiple Client Control of Public Machine | GC.IM.1&lt;br /&gt;A.16 | Incorrectly Translated Keys | GC.IM.1&lt;br /&gt;A.17 | Sticky Shift, Alt, and Ctrl Keys | GC.IM.1, DC.RC.2&lt;br /&gt;A.18 | Race Condition in rcnClient’s User Interface | HCI/DC.1&lt;br /&gt;A.19 | Race Conditions Joining Sessions, Users, Teams, Publics | GC/HCI/DC.1&lt;br /&gt;A.20 | Inconsistent use of OK, Okay | HCI.GR.1&lt;br /&gt;A.21 | Flickering Ghost Cursor | DC.S.2, HCI.UITG.7&lt;br /&gt;A.22 | Confusing Display of Session Clients | HCI.UITG.1, HHI.A&lt;br /&gt;A.23 | Memory Leaks in Public and Client When Ghosting | DC.S.2, GC.ST.7&lt;br /&gt;A.24 | Can’t play Indiana Jones from rcnClient | GC.IM.1&lt;br /&gt;Table 15: Bugs discovered in RCN using CAMELOT and Rebecca-J&lt;br /&gt;&lt;br /&gt;CAMELOT identifies the tests that should be performed on a CSCW application, and Rebecca-J provides a system for conducting them. Using these tools, two dozen problems were discovered in the RCN system (see Table 15). Some of the problems were serious enough to jeopardize the planned commercialization of the software. This section discusses how the problems were uncovered using CAMELOT and Rebecca-J.&lt;br /&gt;&lt;br /&gt;7.4.1 Single User Tests&lt;br /&gt;Single-user testing focused on the General Computing and Human-Computer Interaction aspects of RCN’s three main components: ISServer, RCNPublicServer, and rcnClient. These tests were not concerned with distributed or multiuser computing issues, although such issues were occasionally revealed.&lt;br /&gt;&lt;br /&gt;7.4.1.1 General Computing Tests&lt;br /&gt;The first test investigated problems users might encounter when installing and operating the RCN system for the first time. To conduct the test, we installed RCN on three computers in our lab. Difficulties with the installation process were recorded and submitted to the RCN team.&lt;br /&gt;&lt;br /&gt;The first problem encountered during the installation was a false error reported when initializing the RCNPublicServer. The error indicated that an operating-system-specific KeyMap class could not be located by the application. The application was supposed to use this class to map Java keyboard codes to OS-specific ones. The error looked ominous, but the RCN development team reported that it was harmless and should be ignored.
Although it was not a problem from an operational standpoint, the error feedback was misleading, and it was reported as: A.1 Error message displayed when starting up RCNPublicServer in Win. The bug failed two CAMELOT testing criteria: GC.ST.9 and HCI.GR.3.&lt;br /&gt;&lt;br /&gt;Another installation problem involved a non-Java NativeLibrary.dll file that the RCNPublicServer used to convert Java keyboard and mouse events to native OS events. Although the file was installed with the RCN application, it was also necessary to configure an OS shell variable to indicate the file’s location, and there was no documentation of this configuration step. Additionally, when the RCNPublicServer failed to find the .dll file, it printed a cryptic error with no explanation of how to correct the problem. This bug was reported as: A.2 Configuration of PATH shell variable necessary for NativeLibrary.dll for RCNPublicServer in Win95/98, a violation of CAMELOT codes GC.ST.9 and HCI.GR.3.&lt;br /&gt;&lt;br /&gt;In the process of discovering and correcting installation problems, the RCNPublicServer was started and terminated frequently. The server would terminate abnormally when it could not locate the NativeLibrary.dll file. After several abnormal terminations, it became impossible to restart the RCNPublicServer, and the following error message would appear:&lt;br /&gt;&lt;br /&gt;There is already RCN software running on this machine. Only one connection allowed per machine.&lt;br /&gt;&lt;br /&gt;The error was disturbing because no other RCN software was running on the same machine as the RCNPublicServer. Further investigation uncovered a problem in the ISServer process. The ISServer maintained a list of the currently active public servers.
During the abnormal termination caused by the missing .dll file, the message that the server was no longer active was never sent to the ISServer. As a result, the ISServer still considered the public server active and refused to let a new RCNPublicServer start on the same machine. This error was particularly frustrating because there was no way to remove the public server from the ISServer’s list short of restarting the ISServer. Restarting the ISServer was difficult because it ran on a machine in a room with restricted physical and login access. The bug was reported as: A.3 ISServer does not always flush terminated RCNPublicServer, a violation of CAMELOT codes GC.ST.9 and HCI.GR.3.&lt;br /&gt;&lt;br /&gt;Functional testing of RCN focused on validating the system’s core capabilities, in particular RCN’s ability to provide keyboard and mouse input to a public machine from a remote client. Rebecca-J had not been needed for the installation and human-computer interaction testing conducted so far. For functional testing, however, it was crucial.&lt;br /&gt;&lt;br /&gt;The first functional test determined whether all keyboard actions generated by a client were reported correctly to the public machine. For this test, a simple Java application was written for the public machine that reported the keyboard events it received. On the client machine, Rebecca-J generated all possible keyboard events while the local RCN client controlled the public. The events were generated sequentially by iterating over Java’s keyboard key codes and modifiers.
If an event was missing or reported out of sequence on the public machine, an error was flagged.&lt;br /&gt;&lt;br /&gt;Rebecca-J did not have a built-in ability to generate all keyboard event sequences. Instead, the record facility was used to record several key presses while the RCN client controlled the public machine, and the script was saved as the Java class rebecca.recordings.KeyboardTest. Using the recorded keyboard events as a prototype, looping constructs were added to the class to generate all key codes and modifiers. The modified class was then loaded and executed in Rebecca-J.&lt;br /&gt;&lt;br /&gt;The first time the script ran, the test failed. With the exception of the first few, most keyboard events reported by the public machine did not match those sent from the client machine. Closer examination revealed that the problem began after the VK_CAPS_LOCK event, representing a press of the CAPS_LOCK key, was sent to the public machine. Subsequent events were reported as if CAPS_LOCK were still active. Although this was correct from a functional standpoint, it prevented client events from being matched to public events. To correct the problem, code was added to the KeyboardTest class to resend special key events such as CAPS_LOCK and NUM_LOCK, restoring them to their original state. The modified script detected several errors with numeric keypad events, which were reported as A.16 Incorrectly Translated Keys, a violation of CAMELOT code GC.IM.1.&lt;br /&gt;&lt;br /&gt;Keys that could change keyboard state included CAPS_LOCK, NUM_LOCK, SCROLL_LOCK, SHIFT, CTRL, ALT, and INSERT. The problems encountered with the CAPS_LOCK event focused attention on the public machine’s keyboard state.
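&lt;br /&gt;&lt;br /&gt;The KeyboardTest loop can be sketched in Java as follows. This is an illustration under stated assumptions, not the original class. FakePublicMachine is a hypothetical stand-in for the public machine. It treats CAPS_LOCK as persistent toggle state and, as a simplification, reports letter keys with SHIFT added while CAPS_LOCK is on. The sweep over key codes and modifiers, the comparison of sent against received events, and the reset by resending toggle keys follow the description above, but all names and the modifier set are invented for the sketch.&lt;br /&gt;

```java
import java.awt.event.InputEvent;
import java.awt.event.KeyEvent;

// Hypothetical stand-in for the public machine: it logs every key event it
// receives and keeps CAPS_LOCK as persistent toggle state. While CAPS_LOCK is
// on, letter keys are reported with SHIFT added (a simplification for this sketch).
class FakePublicMachine {
    final int[] codes = new int[8192];
    final int[] mods = new int[8192];
    int count = 0;
    private boolean capsLock = false;

    static boolean isLetterKey(int keyCode) {
        if (keyCode >= KeyEvent.VK_A) {
            return KeyEvent.VK_Z >= keyCode;   // VK_A..VK_Z are contiguous codes
        }
        return false;
    }

    void receive(int keyCode, int modifiers) {
        if (keyCode == KeyEvent.VK_CAPS_LOCK) {
            capsLock = !capsLock;              // each press flips the toggle
        }
        int reported = modifiers;
        if (capsLock) {
            if (isLetterKey(keyCode)) {
                reported |= InputEvent.SHIFT_DOWN_MASK;
            }
        }
        codes[count] = keyCode;
        mods[count] = reported;
        count++;
    }
}

class KeyboardSweep {
    // Assumed modifier set; the original KeyboardTest's exact set is not given.
    static final int[] MODIFIERS = {0, InputEvent.SHIFT_DOWN_MASK, InputEvent.CTRL_DOWN_MASK};
    // Keys whose state persists; each is sent a second time to restore it.
    static final int[] TOGGLE_KEYS = {KeyEvent.VK_CAPS_LOCK, KeyEvent.VK_NUM_LOCK, KeyEvent.VK_SCROLL_LOCK};

    static boolean isToggle(int keyCode) {
        for (int t : TOGGLE_KEYS) {
            if (t == keyCode) {
                return true;
            }
        }
        return false;
    }

    /** Sends key codes first..last under every modifier; returns the number of mismatched events. */
    static int run(FakePublicMachine pub, int first, int last, boolean resetToggles) {
        int[] sentCodes = new int[pub.codes.length];
        int[] sentMods = new int[pub.codes.length];
        int sent = 0;
        for (int code = first; last >= code; code++) {
            for (int mod : MODIFIERS) {
                pub.receive(code, mod);
                sentCodes[sent] = code;
                sentMods[sent] = mod;
                sent++;
                if (resetToggles) {
                    if (isToggle(code)) {
                        // A second press puts the toggle back to its original state.
                        pub.receive(code, mod);
                        sentCodes[sent] = code;
                        sentMods[sent] = mod;
                        sent++;
                    }
                }
            }
        }
        // Missing or extra events count as mismatches; then compare position by position.
        int mismatches = Math.abs(sent - pub.count);
        int n = Math.min(sent, pub.count);
        for (int i = 0; n > i; i++) {
            if (sentCodes[i] != pub.codes[i] || sentMods[i] != pub.mods[i]) {
                mismatches++;
            }
        }
        return mismatches;
    }
}
```

In this model, a sweep from VK_CAPS_LOCK to VK_Z without the reset reports 52 mismatches, because every letter event sent without SHIFT arrives shifted. The same sweep with toggle keys resent reports none.&lt;br /&gt;&lt;br /&gt;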
What would happen if clientA triggered a keyboard state change using one of these keys, and clientB then took control of the public?&lt;br /&gt;&lt;br /&gt;Rebecca-J’s triggering facility helped answer this question. A recording was made of clientB taking control of the public and typing the characters “aaa”. A trigger was set up on clientA so that whenever a key was pressed, clientB’s recording was played back. When the test was executed, the characters “AAA” were reported on the public machine. The error was reported as: A.17 Sticky Shift, Alt, and Ctrl Keys, a violation of CAMELOT codes GC.IM.1 and DC.RC.2.&lt;br /&gt;&lt;br /&gt;Experience with keyboard functional testing was used to develop mouse event tests. The Java application that reported key events on the public machine was modified so that it also reported mouse events. A simple manual test verified that each type of mouse event was correctly reported from the client to the public machine.&lt;br /&gt;&lt;br /&gt;To test for mouse state problems, Rebecca-J was used to record an RCN client’s mouse movement with the left mouse button pressed (i.e. Java MOUSE_DRAGGED events) while it controlled a public machine. Using the event editor, all events except MOUSE_DRAGGED were removed from the script. When the recording was played back, the events were incorrectly reported as MOUSE_MOVED. This indicated that, like the keyboard, the public OS maintained state information about the mouse. Pasting a MOUSE_PRESSED event at the beginning of the recording confirmed this hypothesis, as MOUSE_DRAGGED events were then correctly reported.&lt;br /&gt;&lt;br /&gt;Based on the evidence of mouse state and the keyboard stickiness bug, a similar “sticky” experiment with mouse events was attempted. Again, Rebecca-J’s triggering facility was used.
A recording was made of clientB taking control of the public and moving the mouse (i.e. MOUSE_MOVED events). A trigger was set up on clientA so that when the left mouse button was pressed, clientB’s recording was played back. When the test was executed, clientB’s mouse events were incorrectly reported as MOUSE_DRAGGED on the public machine. The error was reported as: A.14 Sticky Mouse Buttons, a violation of CAMELOT codes GC.IM.1 and DC.RC.2.&lt;br /&gt;&lt;br /&gt;During the mouse “sticky” experiment, a major flaw in Rebecca-J was uncovered: mouse events from clientB were being fed to the client window responsible for forwarding events to the public machine before that window was visible. This flaw was discussed in detail in Section 7.3.3.&lt;br /&gt;&lt;br /&gt;Because of the component existence flaw, the “sticky” experiment was verified manually. In addition to confirming the state problem, this revealed another problem: as long as clientA’s mouse button stayed pressed, both clients could control the public machine. This was reported as: A.15 Multiple Client Control of Public Machine, a violation of CAMELOT code GC.IM.1.&lt;br /&gt;&lt;br /&gt;Additional manual functional tests were conducted to observe how well an RCN client could control a representative set of applications on the public machine. Two additional problems were detected using this technique. First, the CTRL-C key sequence was not detected by an Emacs editor running on the public machine. Second, the game Indiana Jones and the Infernal Machine ignored remote keyboard and mouse input when in graphics mode. These problems indicated that some applications executing on the public would not be usable from an RCN client.
They were reported under A.24 Can’t play Indiana Jones from rcnClient, a violation of CAMELOT code GC.IM.1.&lt;br /&gt;&lt;br /&gt;Stress testing played an important role in detecting race conditions in the application. Section 7.4.2 discusses these tests in detail. In addition to race conditions, this technique was applied to mouse and keyboard control of the public and to interaction with the RCN client’s user interface.&lt;br /&gt;&lt;br /&gt;Under normal load, the public machine appeared to process remote keyboard and mouse events instantaneously: any processing delay on the public was smaller than the delay introduced by the human user at the client. A stress test was created to see whether the public could handle these events when they arrived faster than a human could generate them. The test looked for two anomalies. First, would the public machine behave unusually? Second, would some of the events be lost or ignored by the public? Rebecca-J was used to record a sequence of mouse events while the client was in control of the public machine. The script was edited so that the recording contained only 20 mouse events. Playback delay was set to NONE, Rebecca-J’s trigger counter was set to 50, and the CONTINUOUS_PLAY button was pressed on the script’s VCR-like control panel. As a result, the script was executed 50 times with no delay, for a total of 1000 events. On the public machine, the Java program used for functional testing of keyboard and mouse events was modified to report the number of events received. The test was repeated for keyboard events and for a combination of mouse and keyboard events.
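&lt;br /&gt;&lt;br /&gt;The bookkeeping behind this test can be sketched in Java: a 20-event script is replayed 50 times with no delay against a receiver that only counts arrivals. The sketch is hypothetical. A single-thread executor stands in for the public machine’s event queue, and each submitted task stands in for one replayed mouse or keyboard event.&lt;br /&gt;

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of the stress test's bookkeeping. The single-thread executor is a
// hypothetical stand-in for the public machine's event queue; each task is one event.
class StressTest {
    static int replay(int eventsPerScript, int repetitions) {
        AtomicInteger received = new AtomicInteger();
        ExecutorService publicMachine = Executors.newSingleThreadExecutor();
        for (int run = 0; repetitions > run; run++) {        // trigger counter
            for (int e = 0; eventsPerScript > e; e++) {      // one script pass, no playback delay
                publicMachine.execute(received::incrementAndGet);
            }
        }
        publicMachine.shutdown();                             // no more events; let the queue drain
        try {
            if (!publicMachine.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("public machine did not drain its queue");
            }
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for the queue", ex);
        }
        return received.get();                                // events actually counted
    }
}
```

Calling StressTest.replay(20, 50) returns 1000. That is the count the public-side program should report if no event is lost.&lt;br /&gt;&lt;br /&gt;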
In each case, the public machine behaved normally and the correct number of events was reported.&lt;br /&gt;&lt;br /&gt;Another candidate for stress testing was interaction with the RCN client’s user interface. Multithreading is an important capability of modern user interfaces because it allows the user to perform several tasks simultaneously: an event generated by one UI component can be processed by the application while the user interface remains ready to accept and process new input. Unfortunately, UI multithreading is also a breeding ground for race conditions. The RCN client seemed a good candidate for UI race conditions because some UI component actions produced a noticeably delayed application response (one second or more). If the application response was meant to monopolize the user’s input, the developers may not have considered the possibility that a different UI component action might occur before the response arrived. Rebecca-J was used to record interactions with various client UI components, and the event editor was used to place several UI component action events back to back. For example, the event that triggered the display of the Preferences dialog was followed immediately by the event that triggered the display of the User Information dialog. Playback delay was set to NONE and the script was replayed. Rebecca-J was crucial to UI race condition testing because the time window for an additional UI component action was small, which made the tests difficult to perform manually.&lt;br /&gt;&lt;br /&gt;Generally, the application performed admirably. Most of the time, the application response produced modal dialogs that blocked other UI events until they were closed.
Blocking UI events in other dialogs and windows prevented possible race conditions. A problem did develop, however, during a test that produced both a modal dialog and a full-screen window. A UI event triggered the display of a modal dialog. Before the modal dialog appeared, a second UI event triggered the display of a full-screen window. The full-screen window blocked access to the modal dialog, while the modal dialog blocked all other UI events until it was closed, so the user could interact with neither. This bug was reported as: A.18 Race Condition in rcnClient’s User Interface, a violation of CAMELOT code HCI/DC.1.&lt;br /&gt;&lt;br /&gt;During the single-user analysis, several other CAMELOT tests were conducted, including documentation, compatibility, and volume testing. Documentation testing examined RCN’s online help systems. A large number of errors were uncovered and reported as: A.4 Documentation Errors, a violation of CAMELOT code GC.ST.11.&lt;br /&gt;&lt;br /&gt;Compatibility testing considered problems that might occur between different versions of RCN’s client, public, and ISServer. This type of test was particularly important given the distributed nature of RCN: with many different public machines, client machines, and ISServers, it seemed likely that versions could get out of sync. Although no tests were actually conducted using different versions of RCN, a basic compatibility problem was uncovered: there was no way for the user to determine the version of an RCN component. The bug was reported as: A.7 No version number displayed in RCNPublicServer, rcnClient, ISServer, a violation of CAMELOT codes GC.ST.9 and GC/DC.4.&lt;br /&gt;&lt;br /&gt;Volume testing investigated how the RCN application handled large data volumes. The client’s User and Team Information panels were candidates for this type of testing.
The User Information panel fields were filled with a large amount of text (approximately 1K). Using Rebecca-J, a recording was made of viewing and then closing the panel. Playback delay was set to NONE, and the CONTINUOUS_PLAY button was pressed on the VCR-like control panel. After several minutes the playback was stopped. No problems were observed with the application. The process was repeated for the Team Information panel, and again no problems were observed.&lt;br /&gt;&lt;br /&gt;7.4.1.2 Human-Computer Interaction Tests&lt;br /&gt;Once RCN was installed and running, a thorough examination of its human-computer interaction was conducted. User interaction with RCN takes place through a series of dialogs triggered from a central panel. Each dialog was exercised and, through CAMELOT, Shneiderman’s rules for dialog design were applied. This approach detected several inconsistencies and dialog errors, including A.5 Inconsistent use of Quit, Exit, Leave, Cancel; A.6 “Pick a IS” is grammatically incorrect; and A.20 Inconsistent use of OK, Okay, all violations of CAMELOT code HCI.GR.1.&lt;br /&gt;&lt;br /&gt;During dialog design testing, a problem with rcnClient’s user preferences dialog was detected. The dialog allowed each user to select a ghost color preference. This selection determined the color of text associated with the user (e.g. the user’s ID in a team or session panel) and of the icon displayed when the user was ghosting on the public machine. The color editor was very sophisticated: in addition to selecting from 256 color swatches, the user could create a custom color by choosing from millions of RGB or HSB values.&lt;br /&gt;&lt;br /&gt;RCN restricted users from choosing colors that were too close to white.
Presumably this was because user text was usually displayed on a white background, where such colors would be difficult to read. Unfortunately, the user was not told about the restriction until after completing a possibly lengthy color-editing process and closing the dialog with the OK button. Users could easily become frustrated, because the application let invalid colors be selected or constructed and then reported a “too close to white” error without giving any guidance on how to choose a better color. This bug was reported as: A.8 Preference Dialog Displays Invalid Colors and was considered a violation of CAMELOT code HCI.GR.7.&lt;br /&gt;&lt;br /&gt;Further thought about the color dialog led to the discovery of additional problems. First, there were too many color choices for a typical user; Shneiderman’s color design guidelines recommend using color conservatively. This resulted in bug A.9 Preference Dialog Displays Too Many Colors, considered a violation of CAMELOT code HCI.UITG.10.&lt;br /&gt;&lt;br /&gt;Second, the dialog prompted a human-human interaction test: can two users share the same or a similar color? In the test, a user selected from the preference dialog the same color as another user in the same session. RCN provided no warning that the color was already in use. Shared colors could be confusing, particularly during simultaneous ghosting. This resulted in bug A.10 Preference Dialog Allows Same Color for Two Users in Same Session, considered a violation of CAMELOT codes HHI.A and HCI.UITG.10.&lt;br /&gt;&lt;br /&gt;7.4.2 Multiuser Tests&lt;br /&gt;Multiuser testing focused on the Distributed Computing and Human-Human Interaction aspects of the RCN application.
These tests were not concerned with the general computing or human-computer interaction issues already covered during single-user testing.&lt;br /&gt;&lt;br /&gt;7.4.2.1 Distributed Computing Tests&lt;br /&gt;Multiuser race condition testing looked for problems with several users sharing access to the same data object. The first step in a race condition test was to identify a shared data object in the application. For RCN these objects were personal information, team information, and the lists of sessions, users, teams, and public machines. The second step was to create a scenario likely to trigger a race condition on the object; simultaneous read/write or write/write operations on a shared object provide fruitful scenarios. The third step was to use Rebecca-J to record and instrument the scenario. Finally, Rebecca-J was used to exercise the scenario repeatedly in an attempt to trigger a race condition.&lt;br /&gt;&lt;br /&gt;The first race condition test was constructed around the personal information object. Every user logged into RCN had a unique personal information object containing information such as the user’s name, address, phone number, and e-mail address. A user could update this object at any time, and other users could view it at any time. A user could make the personal information object visible to all users, only to users on the same team, or to no users.&lt;br /&gt;&lt;br /&gt;Since only its owner could modify a personal information object, a read/write race condition scenario was constructed: while one client modified its personal information, other clients would attempt to view it.
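&lt;br /&gt;&lt;br /&gt;A read/write scenario of this kind can be sketched in Java. One thread repeatedly writes a shared object while another repeatedly reads it and checks each read for consistency. Everything here is hypothetical rather than RCN or Rebecca-J code. SharedObjectServer stands in for the server holding the object, and PersonalInfo is cut down to two fields whose version tags let a reader detect a torn read.&lt;br /&gt;

```java
// Hypothetical model of one shared object: the record is immutable, the owner
// replaces it wholesale, and viewers fetch whichever record is current.
final class PersonalInfo {
    final String address;
    final String phone;

    PersonalInfo(String address, String phone) {
        this.address = address;
        this.phone = phone;
    }
}

class SharedObjectServer {
    // volatile reference to an immutable record: a reader sees the old or the new record, never a mix
    private volatile PersonalInfo current = new PersonalInfo("addr-0", "phone-0");

    void store(PersonalInfo info) { current = info; }   // write: replace the whole object
    PersonalInfo fetch() { return current; }             // read: fetch the whole object
}

class ReadWriteRace {
    /** Runs one writer and one reader concurrently; returns the number of inconsistent reads. */
    static int run(int iterations) {
        SharedObjectServer server = new SharedObjectServer();
        int[] inconsistent = {0};
        Thread writer = new Thread(() -> {
            for (int i = 1; iterations >= i; i++) {
                server.store(new PersonalInfo("addr-" + i, "phone-" + i));
            }
        });
        Thread reader = new Thread(() -> {
            for (int i = 0; iterations > i; i++) {
                PersonalInfo seen = server.fetch();
                // A torn read would pair an address and a phone from different versions.
                if (!seen.address.substring(5).equals(seen.phone.substring(6))) {
                    inconsistent[0]++;
                }
            }
        });
        writer.start();
        reader.start();
        awaitThread(writer);
        awaitThread(reader);   // join() makes the reader's count visible here
        return inconsistent[0];
    }

    private static void awaitThread(Thread t) {
        try {
            t.join();
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(ex);
        }
    }
}
```

Because each stored record is immutable and published through a single volatile reference, run() reports zero inconsistent reads. If the address and phone were kept in two separately updated fields instead, mismatched reads would become possible, though not guaranteed on any given run. This is why such scenarios are exercised repeatedly.&lt;br /&gt;&lt;br /&gt;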
Rebecca-J was used to record clientA selecting the Edit-&gt;Personal Information option, typing a few characters into the address field, and selecting OK to save the information. A second recording was made of clientB selecting View-&gt;User Information, selecting clientA from the user list, viewing clientA’s personal information, and pressing OK to exit the view. The playback delay option for both scripts was set to NONE. Finally, the CONTINUOUS_PLAY button was pressed on the VCR-like control panel for both scripts.&lt;br /&gt;&lt;br /&gt;This first test did not uncover a race condition; however, it did clarify how RCN handled changes to the personal information object. A client modified the object locally using the Personal Information dialog window. When the OK button was pressed, the object was sent from the client to the ISServer. Subsequent view requests from other clients returned the updated object from the ISServer. A potential race condition exis
