Automation Issues for Satellite Operations
James K. Kuchar (jkkuchar@mit.edu),
Daniel Hastings (hastings@mit.edu),
John Deyst (deyst@mit.edu),
Stephan Kolitz (skolitz@draper.com)
Problem Statement
Satellite operations have long been accomplished using a highly manual approach. As an example, the Air Force currently uses some 3200 personnel to monitor 80 satellites, or roughly 40 people per satellite. Although commercial operators are much more efficient, each satellite still requires a dozen or so human operators to maintain it. New global communications systems have been proposed that will contain constellations of many satellites, and current operations concepts will clearly not be practical for these large systems. Automating routine tasks can reduce the need for so many human operators, thus reducing the cost of satellite operations. However, automation may affect system reliability, and will certainly affect development costs. The key question is where to apply automation so that life-cycle costs are minimized while system reliability is maintained or improved.
Approach
During the initial stages of the project, work has focused on developing a reliability model for the overall system. The working assumption is that, because automation affects reliability, such a model is needed to compare different operations concepts. Markov models have been chosen to examine system reliability because they can accurately model very reliable systems with less computational time than most simulation models. The entire system is functionally decomposed into primitive functions, and each of these functions is represented by a separate Markov model. Each model represents the failure and recovery process, which is assumed to be the target for automation. Automation is used for routine tasks such as battery maintenance as well as for fault detection. To preserve the generality of the model, an EVENT is defined as a departure from the intended operating state of the system. This definition includes component failures, anomalous phenomena such as single event upsets (SEUs), and state values crossing a predefined threshold, such as a propellant tank temperature rising out of bounds.
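The computational advantage of Markov models for very reliable systems can be illustrated with the simplest possible case. The sketch below is not the project's model: it assumes a single primitive function with only two states (operational and failed) and constant, hypothetical failure and recovery rates `lam` and `mu`. For such a continuous-time Markov chain, the probability of being operational at time t, starting operational, has the closed form mu/(lam+mu) + lam/(lam+mu)·exp(-(lam+mu)·t). A Monte Carlo simulation, by contrast, needs roughly (1 - p)/(p·ε²) independent trials to estimate a small probability p to relative standard error ε, which grows quickly as the system becomes more reliable.

```python
import math

def two_state_availability(lam: float, mu: float, t: float) -> float:
    """Probability that a repairable function is operational at time t,
    starting operational, for a two-state continuous-time Markov chain
    with constant failure rate lam and recovery rate mu (both per hour)."""
    s = lam + mu
    return mu / s + (lam / s) * math.exp(-s * t)

def mc_samples_needed(p: float, rel_err: float) -> float:
    """Approximate number of independent Monte Carlo trials needed to
    estimate a probability p with the given relative standard error,
    derived from the binomial variance p(1 - p)/n."""
    return (1.0 - p) / (p * rel_err ** 2)

if __name__ == "__main__":
    # Hypothetical rates: about one event per 11 years, ~10 hour recovery.
    lam, mu = 1e-5, 0.1
    unavail = 1.0 - two_state_availability(lam, mu, t=8760.0)  # one year
    print(f"unavailability after one year: {unavail:.3e}")
    print(f"Monte Carlo trials for 10% relative error: "
          f"{mc_samples_needed(unavail, 0.1):.2e}")
```

With these assumed rates the unavailability is on the order of 1e-4, so a simulation would need on the order of a million trials for a 10% estimate, while the Markov result is a single formula evaluation.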
The Model
The model consists of the following basic states, each of which represents a step in the failure-recovery process.
1) Event - the occurrence of the event
2) Sensor Detection - the event has been detected
3) Communication - the occurrence of the event has been communicated to the appropriate decision maker
4) Solution Determined - a solution has been found to recover from the event
5) Solution Implemented - the solution has been implemented
6) Recovery - the system has recovered from the event
Additional states are included as needed.
7) Permanent Loss of Function - the primitive function is unrecoverable due to a permanent failure of a component, or the exhaustion of a finite resource (e.g., propellant)
8) Processor Stall - the primary decision-making processor has "stalled". In other words, the processor has failed to determine a solution to the event.
9) Stall Recovery - the secondary processor has recognized the stall, and has taken over the recovery process
10) Temporary Fix Performed - a temporary coarse solution has been initiated. This may be required for certain events which are extremely time critical such as thruster misfirings, or loss of attitude control. Safing of the spacecraft is a common temporary fix.
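The failure-recovery process above can be sketched as a discrete-time Markov chain for one primitive function. In this minimal sketch the state names follow the list, but the one-step transition probabilities in the hypothetical `TRANSITIONS` table are invented for illustration, the time step is left unspecified, and the particular arcs (for example, a temporary fix leading back into communication, or stall recovery leading to a determined solution) are assumptions rather than the project's actual model structure. Recovery and Permanent Loss of Function are treated as absorbing states.

```python
from typing import Dict

# Hypothetical one-step transition probabilities for one primitive function.
# Values are illustrative only; each row must sum to 1.
TRANSITIONS: Dict[str, Dict[str, float]] = {
    "event":                {"event": 0.05, "sensor_detection": 0.95},
    "sensor_detection":     {"sensor_detection": 0.05, "communication": 0.90,
                             "temporary_fix": 0.05},
    "temporary_fix":        {"temporary_fix": 0.5, "communication": 0.5},
    "communication":        {"communication": 0.15, "solution_determined": 0.80,
                             "processor_stall": 0.05},
    "processor_stall":      {"processor_stall": 0.8, "stall_recovery": 0.2},
    "stall_recovery":       {"stall_recovery": 0.5, "solution_determined": 0.5},
    "solution_determined":  {"solution_determined": 0.1,
                             "solution_implemented": 0.9},
    "solution_implemented": {"solution_implemented": 0.04, "recovery": 0.95,
                             "permanent_loss": 0.01},
    "recovery":             {"recovery": 1.0},        # absorbing
    "permanent_loss":       {"permanent_loss": 1.0},  # absorbing
}

def step(dist: Dict[str, float]) -> Dict[str, float]:
    """Propagate a state-probability distribution forward one time step."""
    nxt = {s: 0.0 for s in TRANSITIONS}
    for src, p in dist.items():
        for dst, q in TRANSITIONS[src].items():
            nxt[dst] += p * q
    return nxt

def propagate(n_steps: int) -> Dict[str, float]:
    """State distribution after n_steps, starting in the event state."""
    dist = {s: 0.0 for s in TRANSITIONS}
    dist["event"] = 1.0
    for _ in range(n_steps):
        dist = step(dist)
    return dist

if __name__ == "__main__":
    for n in (5, 20, 100):
        d = propagate(n)
        print(f"after {n:3d} steps: P(recovered)={d['recovery']:.4f} "
              f"P(permanent loss)={d['permanent_loss']:.4f}")
```

Because every path in this toy chain passes through Solution Implemented exactly once, the long-run probability of permanent loss tends to 0.01/0.96 ≈ 0.0104. The stall branch mainly affects how quickly recovery is reached, which is where the choice of secondary processor (computer or human) would enter.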
Levels of Automation
To simplify the problem, discrete levels of automation have been defined. Because processing may be done in space, by a ground computer, or by a human operator, each function has two associated levels of automation. The level of remote automation defines the interaction between the spacecraft and the ground segment. The level of ground automation defines the interaction between human operators and ground-based computers. The six levels of automation are listed below. The definitions are given for the remote level of automation; the levels are analogous for ground automation.
Level 1: Fully Automated: The spacecraft performs the primitive function with no interaction from the ground. Operations costs associated with this level of automation are nonexistent; however, there is no capability to recover from a processor stall, should one occur.
Level 2: Paging: The ground station is "paged" upon the occurrence of an event or a processor stall. If the secondary processor is a human, stall recovery may be slow.
Level 3: Supervision: Similar to paging, except that the secondary processor is continuously monitoring the actions taken by the primary processor. If the secondary processor notices that the primary processor is performing an unsafe action, it may intercede. Also, stall recovery is timely for both computers and humans.
Level 4: Cueing: The primary processor must request permission to perform any action. It suggests solutions and actions to the secondary processor, which must approve each action before the recovery process can proceed.
Level 5: Data Filtering: Here there is a shift of the primary processor from the satellite to the ground. The spacecraft preprocesses data, and sends the processed, filtered data to the ground. The ground segment is responsible for determining solutions to events.
Level 6: No Automation: Raw data is sent to the ground, and the ground must identify events as well as solve them.
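The two-dimensional scheme (one remote level and one ground level per function) can be captured as a small data structure. The sketch below is hypothetical: the names `AutomationLevel`, `FunctionAutomation`, `primary_processor`, `lacks_stall_recovery` and `requires_approval` are not taken from SOCRATES. It encodes only properties stated above: the primary processor moves from the automated side to the other side at Level 5, Level 1 cannot recover from a stall, and Level 4 requires approval for every action. Applying the same rules to the ground dimension follows the statement that the levels are analogous, which is an interpretation.

```python
from dataclasses import dataclass
from enum import IntEnum

class AutomationLevel(IntEnum):
    """The six discrete levels, numbered as in the text."""
    FULLY_AUTOMATED = 1
    PAGING = 2
    SUPERVISION = 3
    CUEING = 4
    DATA_FILTERING = 5
    NO_AUTOMATION = 6

# (automated side, other side) for each automation dimension (assumed mapping).
SIDES = {
    "remote": ("spacecraft", "ground segment"),
    "ground": ("ground computer", "human operator"),
}

@dataclass(frozen=True)
class FunctionAutomation:
    """Hypothetical record pairing a primitive function with its two levels."""
    name: str
    remote: AutomationLevel
    ground: AutomationLevel

def primary_processor(level: AutomationLevel, dimension: str) -> str:
    """Side that determines solutions: the automated side for Levels 1-4,
    the other side from Level 5 (Data Filtering) onward."""
    automated, other = SIDES[dimension]
    return automated if level <= AutomationLevel.CUEING else other

def lacks_stall_recovery(level: AutomationLevel) -> bool:
    """Level 1 has no capability to recover from a processor stall."""
    return level == AutomationLevel.FULLY_AUTOMATED

def requires_approval(level: AutomationLevel) -> bool:
    """At Level 4 the secondary processor must approve every action."""
    return level == AutomationLevel.CUEING

if __name__ == "__main__":
    battery = FunctionAutomation("battery maintenance",
                                 remote=AutomationLevel.SUPERVISION,
                                 ground=AutomationLevel.DATA_FILTERING)
    print(primary_processor(battery.remote, "remote"))  # spacecraft
    print(primary_processor(battery.ground, "ground"))  # human operator
```

Using `IntEnum` keeps the levels ordered, so threshold rules such as "Level 5 and above" become simple comparisons.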
SOCRATES
With the help of Aidan Low (aidan@mit.edu), the Markov model code has been linked to a graphical user interface (GUI). Through a series of menus and dialog boxes, the user defines a system architecture along with the functional breakdown of each satellite and ground station. The user also supplies component and processor reliabilities, the level of automation desired for each function, operations and development costs, and revenues, if applicable. The Markov engine is then run from the interface and returns the operational probability of each function over the lifespan of the project. Costs are also calculated and may be plotted for the overall system as well as for each individual function, with separate plots available for development costs, operations costs, revenues, and total costs. Cost and reliability results from several runs representing different levels of automation may be viewed together. At present, the software tool is limited to independent functions, and the underlying cost model is quite crude. Work is currently underway to improve the package.
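The kind of cost roll-up that SOCRATES plots can be sketched as follows. The actual SOCRATES cost model is not described here, so this is a hypothetical, undiscounted example: development cost is incurred up front, operations cost accrues every year, and revenue is earned in proportion to the yearly probability that the function is operational, as a Markov run would provide. The names `RunInputs` and `cumulative_net_cost` and all numbers are assumptions.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class RunInputs:
    """Inputs for one hypothetical run (one choice of automation levels)."""
    label: str
    development_cost: float          # one-time cost, incurred before year 1
    operations_cost_per_year: float
    revenue_per_year: float          # earned only while the function operates
    p_operational: List[float]       # per-year probability from a Markov run

def cumulative_net_cost(run: RunInputs) -> List[float]:
    """Cumulative development + operations cost minus expected revenue,
    evaluated at the end of each year (no discounting)."""
    total = run.development_cost
    history = []
    for p in run.p_operational:
        total += run.operations_cost_per_year - run.revenue_per_year * p
        history.append(total)
    return history

if __name__ == "__main__":
    # Illustrative numbers only (arbitrary currency units).
    runs = [
        RunInputs("Level 1 (fully automated)", 50.0, 1.0, 20.0,
                  [0.990, 0.985, 0.980, 0.975, 0.970]),
        RunInputs("Level 6 (no automation)", 20.0, 12.0, 20.0,
                  [0.995, 0.994, 0.993, 0.992, 0.991]),
    ]
    for run in runs:
        final = cumulative_net_cost(run)[-1]
        print(f"{run.label}: net cost after {len(run.p_operational)} years "
              f"= {final:.2f}")
```

Printing the runs side by side mirrors the idea of viewing several automation levels together: a higher level trades larger development cost for lower operations cost, and the reliability column shows what that trade does to expected revenue.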
Future Directions
Current and future work will focus on improving the model so that it is more easily used in practice. The underlying Markov engine will be adapted to handle functional interdependence, and more detailed software and human reliability models will be added to the SOCRATES package to aid in estimating these parameters. Through case studies and simple examples, it is hoped that the underlying cost drivers related to automation will be identified. Once this is accomplished, the existing model can be changed to focus on these drivers. This should result in a simpler, more efficient model that may be more easily applied to real-world problems.