A.R. Pritchett, R.J. Hansman & E.N. Johnson
Department of Aeronautics & Astronautics
Massachusetts Institute of Technology
amyruth@mit.edu
The use of testable responses as a performance-based measure of situation awareness is a valuable technique for testing a wide range of systems. Unlike techniques that attempt to ascertain the subject's mental model of the situation at different times throughout an experiment, performance-based testing focuses solely on the subject's outputs. This quality makes it well suited to comparing the desired and achieved performance of a human-machine system, and to identifying weak points in the subject's situation awareness.
This paper will focus on the use of situations with testable responses during simulations. During the simulation runs, the subjects are presented with situations designed such that, if the subject has sufficient situation awareness, a specific, identifiable action is required. The presence or absence of that action provides an unambiguous accounting of the types of tasks for which the subjects had sufficient situation awareness.
First, this method of assessing situation awareness will be briefly compared with other methods. Next, the use of situations with testable responses in a representative flight simulator study will be detailed. Finally, because the subject's responses depend heavily on the precision with which the situations are generated, techniques for the robust generation of pre-determined situations will be discussed, along with the performance of a current implementation.
A COMPARISON OF PERFORMANCE-BASED MEASUREMENT WITH OTHER METHODS OF SITUATION AWARENESS ASSESSMENT
Performance-based measurement of situation awareness has taken several forms. Some techniques measure the overall final performance of the human-in-the-loop system in any or all of its tasks (Endsley, 1995). This paper focuses on the use of testable responses for evaluating situation awareness: during the simulation runs, the subjects are presented with realistic situations which, if they have sufficient situation awareness, require decisive and identifiable actions.
Several other methods of testing situation awareness have been documented (Endsley, 1995; Adams, Tenney & Pew, 1995). Some of these are complex techniques that attempt to determine or model the subject's knowledge of the situation at different times throughout the simulation runs. For example, the Situation Awareness Global Assessment Technique (SAGAT) freezes the simulator screens at random times during the runs and queries the subjects about their knowledge of the environment. This knowledge can be probed at several levels of cognition, from the most basic facts to complicated predictions of future states.
Several causal factors affect the actions of the subject, as shown in Figure 1. Knowledge-based and performance-based techniques of evaluating situation awareness take their measurements at different points in this process of user cognition, which illustrates the different purposes the two techniques serve.
For a detailed, theoretical assessment of the subject's situation awareness, knowledge-based techniques are more accurate, as they measure the subject's knowledge state directly. Performance-based measurement can only make inferences about which information the subject acted upon, and how that information was interpreted.
However, performance-based measurement can satisfy several goals that knowledge-based techniques cannot. The most apparent is its ability to ascertain the timing and substance of a user's reaction to realistic situations. When testing systems, final decisions must rest on whether the user will be given sufficient situation awareness to perform the correct actions, which is exactly what performance-based techniques measure. Knowledge-based techniques, on the other hand, can only make reasonable guesses about the user's likely actions given their knowledge state.
In addition, performance-based measurement provides measures of situation awareness that are not otherwise easily obtained. It can identify constraints on a user, arising from their training and standard procedures, that a strictly knowledge-based model of situation awareness would not anticipate. For example, in a flight simulator study by Midkiff and Hansman, ATC neglected to turn the subjects towards the landing runway, although the subjects could overhear the aircraft before and after them being given the proper instructions. Although the subjects' actions indicated they were aware of the situation, they did not react strongly because of their reluctance to assume the air traffic controller had made an error (Midkiff & Hansman, 1993). In this case, a knowledge-based measurement would also have captured the pilots' awareness of the problem; only performance-based measurement, however, could ascertain how the pilots would act upon this information within an established set of Air Traffic Control procedures.
Performance-based measurement is also able to determine the perceived reliability of the knowledge users gather from any of a multitude of sources. For example, the same study by Midkiff and Hansman found that pilots were often unwilling to act upon information only overheard on ATC voice frequencies because they did not have confidence in the mental model it provided (Midkiff & Hansman, 1993). The study was therefore able to ascertain whether pilots had sufficient confidence in their mental model to take action. A knowledge-based measurement in the same study might have concluded that the pilots had correct knowledge, without revealing that they would not act upon it as they would upon verified, correct knowledge.
Finally, performance-based measurement works well in time-critical situations because it captures the real-time response rather than a planned or considered one. Subtle variations in situation awareness or current conditions may cause the user to act differently, as shown in an autopilot mode-awareness simulation study, where the pilots' actual, real-time reactions often differed significantly from what they said they would do when questioned afterwards without time pressure (Johnson & Pritchett, 1995).
In summary, performance-based measurement complements knowledge-based measurement in the development of a human-in-the-loop system. Each is useful at different times, and for different purposes, throughout the design process. For final testing of a system, performance-based measurement is especially valuable because it can ascertain the resulting performance of the entire system and point to areas where situation awareness is deficient. Although performance-based measurement does not provide as pure a measurement of a user's knowledge as other techniques, it can illustrate the interrelationship between the user's knowledge and the manner in which they use it.
USE OF SITUATIONS WITH TESTABLE RESPONSES IN A REPRESENTATIVE FLIGHT SIMULATION STUDY
This section shall use a recent flight simulator study to demonstrate the use of testable responses in measuring situation awareness and overall system performance. Both the development and performance of the measurement techniques shall be discussed.
The flight simulator study by Midkiff & Hansman was conducted to evaluate pilots' use of the Party Line Information they can overhear on shared Air Traffic Control frequencies (Midkiff & Hansman, 1993). Two-pilot air transport flight crews, using the NASA Ames Man-Vehicle System Research Facility (MVSRF), flew a three-leg flight during which they were exposed to nine different situations.
The design and scripting of the situations are the most crucial aspects of the experiment design. The situations must have several traits. Most importantly, they must be designed such that, should the user have sufficient situation awareness, a clear and unambiguous response is mandated. As illustrated in Figure 2, the experimenter's task is to expose the user to situations which force a measurable action, without attempting to examine the specifics of the 'inner' workings of the subject, such as their knowledge state.
When expert users, such as airline pilots, serve as subjects, situations can be chosen for which standard operational criteria demand a certain response. For example, one situation in the Midkiff and Hansman simulator study allowed pilots to overhear communications suggesting that another aircraft had not yet departed the runway on which the subjects were about to land. In this case, action was required to avert a collision; a lack of action by the pilots could be considered to indicate a lack of situation awareness.
In addition, the situations should be chosen to cover the domain of important situations in which the system is expected to perform. For example, in the Midkiff and Hansman study, the nine situations tested were those testable situations that had received the highest importance ratings in a pilot survey on Party Line Information. Testing of a final prototype system may include situations covering all conditions given in the system design specifications.
Finally, the situations must represent believable and recognizable occurrences to which the subject can be expected to react as they would in the real, non-simulated environment. For example, in the Midkiff and Hansman study, the subjects were flying an air transport simulator and believed they were overhearing other air transport aircraft. Therefore, the 'Potential Collision' situations were staged to develop at a physically reasonable rate, and were carefully scripted to portray a believable scenario of pilot confusion and/or mechanical failure on the part of the intruding aircraft.
The testable responses should capture the full range of probable actions and inactions by the subject throughout the experiment. Care must be taken to look for actions which are different, less severe or incorrect, rather than only for the expected or desired result. For example, the expected response to the situation "Aircraft on Landing Runway" might be an immediate go-around. However, the subjects' actions were often less severe: pilots instead attempted to query ATC or each other to verify the knowledge they had gained from Party Line Information.
Strong reactions can be considered an indication of good situation awareness; correspondingly, the lack of any indication of awareness can be considered an indication of insufficient situation awareness. As discussed earlier in this paper, uncertain or weak responses are also valuable measurements. They may reveal problem areas such as a lack of pilot confidence in the information, a feeling among the subjects that the expected reaction would defy accepted procedures, or other unexpected impediments to action.
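One way to make this three-way interpretation explicit when coding observed behavior is a small classification table. The sketch below is a minimal illustration, assuming hypothetical action labels (`go_around`, `query_atc`, `query_crew`) logged for each situation window; it is not the coding scheme used by Midkiff and Hansman.

```python
from enum import Enum
from typing import Iterable


class Response(Enum):
    STRONG = "strong"  # decisive, identifiable action
    WEAK = "weak"      # uncertain response, e.g. seeking verification
    NONE = "none"      # no indication of awareness


# Hypothetical action labels for the "Aircraft on Landing Runway" situation.
STRONG_ACTIONS = {"go_around"}
WEAK_ACTIONS = {"query_atc", "query_crew"}

# How each response class is read, following the interpretation in the text.
INTERPRETATION = {
    Response.STRONG: "good situation awareness",
    Response.WEAK: "aware, but some impediment to action (confidence, procedures)",
    Response.NONE: "insufficient situation awareness",
}


def classify(actions: Iterable[str]) -> Response:
    """Classify the actions logged during one situation window.

    A strong action takes precedence, so a crew that first queries ATC
    and then goes around is coded as a strong response.
    """
    observed = set(actions)
    if observed & STRONG_ACTIONS:
        return Response.STRONG
    if observed & WEAK_ACTIONS:
        return Response.WEAK
    return Response.NONE


print(classify(["query_atc", "go_around"]))  # Response.STRONG
print(INTERPRETATION[classify(["query_crew"])])
```

Letting a strong action override earlier weak ones is a design choice for this sketch; an experimenter could instead record the full sequence, since the delay before a go-around is itself informative.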
Performance-based measurement does not preclude other concurrent methods of assessing situation awareness. For example, Midkiff and Hansman also debriefed their subjects to obtain the pilots' own opinions of their situation awareness during the experiment.
GENERATING REPEATABLE SITUATIONS
When the purpose of an experiment is to test subjects' responses to specific situations involving multiple agents, these situations must be generated repeatably across multiple trials. This is often complicated because subjects may not act consistently, or as expected, before the desired situation arises. As an illustration, consider the creation of an aircraft collision hazard. If the subject does not fly at exactly the expected speeds, the resulting situation can be completely different from that desired or, as in the example depicted in Figure 3, may not occur at all.
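The timing problem can be made concrete with simple arithmetic. The sketch below assumes straight-line, constant-speed flight toward a single crossing point and a pseudo-aircraft whose schedule is fixed in advance for an expected subject speed; the function names and numbers are hypothetical, chosen only for illustration.

```python
SECONDS_PER_HOUR = 3600.0


def arrival_time_s(distance_nmi: float, speed_kt: float) -> float:
    """Time in seconds to cover distance_nmi at a constant speed_kt."""
    return distance_nmi / speed_kt * SECONDS_PER_HOUR


def open_loop_offset_nmi(subject_dist_nmi: float, expected_speed_kt: float,
                         actual_speed_kt: float, pseudo_speed_kt: float) -> float:
    """Distance of the pseudo-aircraft from the crossing point when the subject
    reaches it, if the pseudo-aircraft was scheduled to arrive at the time
    implied by the *expected* subject speed."""
    timing_error_s = (arrival_time_s(subject_dist_nmi, actual_speed_kt)
                      - arrival_time_s(subject_dist_nmi, expected_speed_kt))
    return abs(timing_error_s) * pseudo_speed_kt / SECONDS_PER_HOUR


# Subject 10 nmi from the crossing point, expected at 180 kt but flying 160 kt:
# it arrives after 225 s instead of 200 s, by which time a 180 kt
# pseudo-aircraft is about 1.25 nmi past the crossing point.
print(open_loop_offset_nmi(10.0, 180.0, 160.0, 180.0))
```

Under these assumptions a 20 kt deviation by the subject turns an intended conflict into a separation of more than a mile, which is why some feedback of system state is needed.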
To make situations repeatable, some form of feedback of the system state must be used to control the pseudo-agents (agents other than the subject), continuously adjusting their actions to create the desired situations. Traditionally, this has been achieved by having experimenters control the pseudo-agents in real time during the simulation run. A Robust Situation Generation architecture, shown in Figure 4, has been developed (Johnson & Hansman, 1995) whereby system state information is used to automatically generate scripted situations for a human subject.
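A minimal sketch of the feedback idea, assuming a single speed-adjusting pseudo-aircraft and a subject flying at constant speed: at each time step the pseudo-aircraft re-plans from the subject's observed distance and speed so that both reach the crossing point together, within hypothetical speed limits. This simple guidance law is an illustration only, not the control law of the Johnson and Hansman architecture.

```python
def final_separation_nmi(subject_speed_kt: float, closed_loop: bool,
                         expected_subject_speed_kt: float = 180.0,
                         subject_dist_nmi: float = 10.0,
                         pseudo_dist_nmi: float = 12.0,
                         min_speed_kt: float = 120.0,   # hypothetical limits
                         max_speed_kt: float = 250.0,
                         dt_s: float = 1.0) -> float:
    """Distance of the pseudo-aircraft from the crossing point at the moment
    the subject reaches it (both fly straight toward the point)."""
    s, p = subject_dist_nmi, pseudo_dist_nmi
    # Open-loop schedule: speed fixed in advance from the *expected* subject speed.
    scripted_speed_kt = pseudo_dist_nmi / subject_dist_nmi * expected_subject_speed_kt
    while s > 0:
        if closed_loop:
            # Feedback: re-plan from the subject's current distance and speed.
            time_to_go_s = s / subject_speed_kt * 3600.0
            speed_kt = p / time_to_go_s * 3600.0
        else:
            speed_kt = scripted_speed_kt
        speed_kt = min(max(speed_kt, min_speed_kt), max_speed_kt)
        s -= subject_speed_kt * dt_s / 3600.0
        p -= speed_kt * dt_s / 3600.0
    return abs(p)


print(final_separation_nmi(160.0, closed_loop=False))  # roughly 1.5 nmi
print(final_separation_nmi(160.0, closed_loop=True))   # close to zero
```

With feedback, the pseudo-aircraft simply slows from 216 kt to 192 kt to match the slower subject; without it, the scripted conflict is missed.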
Pseudo-agents have plans that consist of a desired trajectory, specified by waypoints, and a discrete action plan. System state is used in three fundamental ways: pseudo-agent waypoints specified relative to the subject, discrete actions of pseudo-agents triggered by a cue, and cued amendments to pseudo-agent flight plans. Instances of these features are specified in a pre-determined script.
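The three uses of system state can be illustrated with a small data model. The sketch below is a hypothetical Python rendering (the class names, the `subject_range_nmi` state key and the "radio call 17" label are assumptions, not the ASL script format): waypoints may be given as offsets from the subject, and script entries fire once, when a cue on the current state first becomes true, either triggering a discrete action or amending the flight plan.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple, Union

State = Dict[str, float]      # hypothetical snapshot of system state
Point = Tuple[float, float]   # absolute position (nmi)


@dataclass
class RelativeWaypoint:
    """Waypoint given as an offset (nmi) from the subject's current position."""
    dx: float
    dy: float

    def resolve(self, state: State) -> Point:
        return (state["subject_x"] + self.dx, state["subject_y"] + self.dy)


Waypoint = Union[Point, RelativeWaypoint]


@dataclass
class CuedEvent:
    """A script entry that fires once, the first time its cue is true."""
    cue: Callable[[State], bool]
    kind: str        # "action" (discrete action) or "amend" (new flight plan)
    payload: Any     # e.g. a radio-call label, or a replacement waypoint list
    fired: bool = False


@dataclass
class PseudoAgent:
    name: str
    waypoints: List[Waypoint]
    events: List[CuedEvent] = field(default_factory=list)
    log: List[str] = field(default_factory=list)

    def update(self, state: State) -> None:
        """Check every unfired cue against the current system state."""
        for ev in self.events:
            if ev.fired or not ev.cue(state):
                continue
            ev.fired = True
            if ev.kind == "action":
                self.log.append(f"{self.name}: {ev.payload}")
            elif ev.kind == "amend":
                self.waypoints = list(ev.payload)
                self.log.append(f"{self.name}: flight plan amended")

    def targets(self, state: State) -> List[Point]:
        """Resolve the flight plan to absolute positions for this update."""
        return [wp.resolve(state) if isinstance(wp, RelativeWaypoint) else wp
                for wp in self.waypoints]


# Hypothetical script: a preceding pseudo-aircraft keeps station 3 nmi ahead of
# the subject, makes a radio call when the subject is within 4 nmi of the
# threshold, and inside 2 nmi is re-routed to stop on the runway (the origin).
intruder = PseudoAgent(
    name="pseudo-1",
    waypoints=[RelativeWaypoint(dx=0.0, dy=3.0)],
    events=[
        CuedEvent(cue=lambda s: s["subject_range_nmi"] < 4.0,
                  kind="action", payload="radio call 17"),
        CuedEvent(cue=lambda s: s["subject_range_nmi"] < 2.0,
                  kind="amend", payload=[(0.0, 0.0)]),
    ],
)

for rng in [6.0, 5.0, 4.0, 3.0, 2.0, 1.0]:
    state = {"subject_x": 0.0, "subject_y": -rng, "subject_range_nmi": rng}
    intruder.update(state)

print(intruder.log)
print(intruder.targets(state))
```

Because cues are evaluated against the live state rather than the clock, the same script produces the call and the runway obstruction at the same point in the subject's approach, however long the approach takes.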
A Robust Situation Generation system has been implemented as part of the MIT Aeronautical Systems Laboratory (ASL) Advanced Cockpit Simulator (ACS), illustrated in Figure 5. A single workstation, referred to as the experimenter's station, simulates the pseudo-agents, which consist primarily of aircraft and controllers. Pseudo-aircraft states and digitally pre-recorded radio transmissions are presented to a subject operating the cockpit simulator. The scripts can be designed interactively in preliminary simulation runs using the experimenter's station, and are then stored and reused as often as required.
The achieved robustness of the system, i.e. the maximum subject variation that can occur while still producing the scripted situations, has been tested by varying the subject aircraft's speed and position, as well as by testing blunders by the subject, such as missing a turn. Unless the subject operates at an extreme limit of performance, the situations were demonstrated to occur repeatably. The level of robustness depends on the forethought and detail put into the script, which can be made to accommodate an arbitrary amount of subject variation, as required by the simulation.
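The dependence of robustness on script detail can be sketched with the same simplified geometry as above, assuming a constant-speed subject and a pseudo-aircraft limited to a hypothetical 120-250 kt band. A subject speed counts as covered if at least one scripted route for the pseudo-aircraft (for instance a cued shortcut or an extended leg) can be flown within those limits. The route lengths and limits are illustrative assumptions, not values from the ASL tests.

```python
from typing import Iterable, Optional, Sequence, Tuple

MIN_SPEED_KT, MAX_SPEED_KT = 120.0, 250.0   # hypothetical pseudo-aircraft limits
SUBJECT_DIST_NMI = 10.0                      # subject's distance to the crossing point


def required_speed_kt(subject_speed_kt: float, route_nmi: float) -> float:
    """Pseudo-aircraft speed needed to reach the crossing point with the subject."""
    return route_nmi * subject_speed_kt / SUBJECT_DIST_NMI


def covered(subject_speed_kt: float, routes_nmi: Sequence[float]) -> bool:
    """True if any scripted route can be flown within the speed limits."""
    return any(MIN_SPEED_KT <= required_speed_kt(subject_speed_kt, r) <= MAX_SPEED_KT
               for r in routes_nmi)


def envelope(speeds_kt: Iterable[float],
             routes_nmi: Sequence[float]) -> Optional[Tuple[float, float]]:
    """Slowest and fastest tested subject speeds for which the situation occurs.
    Assumes the covered speeds form one contiguous band."""
    ok = [v for v in speeds_kt if covered(v, routes_nmi)]
    return (min(ok), max(ok)) if ok else None


test_speeds = range(60, 301, 10)
print(envelope(test_speeds, [12.0]))              # (100, 200)
print(envelope(test_speeds, [12.0, 9.0, 15.0]))   # (80, 270)
```

Under these assumptions, adding two cued alternatives to the script widens the band of subject speeds that still produce the situation from 100-200 kt to 80-270 kt, mirroring the point that robustness grows with the detail put into the script.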
Performance-based measurement of situation awareness is a powerful tool for measuring the performance of a human-in-the-loop system and for identifying areas of inadequate situation awareness. The use of situations with testable responses can provide valuable insight into the user's situation awareness and how the user will act upon it.
The development of automatic Robust Situation Generation has created a reliable mechanism for producing repeatable, consistent situations, making performance-based measurement more reliable and easier to implement. Although the current implementation was designed specifically for flight simulator experiments, Robust Situation Generation can also be implemented for any simulation involving multiple controllable agents.
Adams, M.J., Tenney, Y.J. & Pew, R.W. (1995) "Situation Awareness and the Cognitive Management of Complex Systems" Human Factors 37(1) 85-104
Endsley, M.R. (1995) "Measurement of Situation Awareness in Dynamic Systems" Human Factors 37(1) 65-84
Johnson, E.N. & Hansman, R.J. (1995) "Multi-Agent Flight Simulation with Robust Situation Generation" MIT Aeronautical Systems Laboratory Report ASL-95-2
Midkiff, A.H. & Hansman, R.J. (1993) "Identification of Important 'Party Line' Information Elements and Implications for Situational Awareness in the Datalink Environment" Air Traffic Control Quarterly 1