Leave me comments so I know people are actually reading my blogs! Thanks!

Wednesday, March 18, 2009

Set Phasers on Stun -- Bad Designs Kill

Following my adviser's recommendation, I finally picked up the book Set Phasers on Stun by Steve Casey and read it with great interest. The book is a quick, easy read, but as I read on, my heart grew heavier and heavier.

The book contains 20 short, true stories of how design errors in various technologies led to terrible disasters, often resulting in the loss of many lives. Among them: the shut-off handle on a command module capsule that took too long to turn, causing the death of three Russian astronauts; the control lever for autopilot vs. manual control that a supertanker captain, in a panic, slipped into an unintended third control mode, so that the ship hit a rock and leaked millions of gallons of oil into the ocean; the Airbus A320 that crashed during an air show demo, killing many passengers, because the pilot was overconfident in the plane's autopilot; and a ferry that capsized because the captain didn't know the bow cargo doors were still open when the ship set off. The key message the author tries to get across is that designers of technology MUST take human factors into consideration, especially possible human errors and the limits of human capability in tense, nerve-racking situations. Learning from mistakes might be too costly because Bad Designs Kill.

The title of the book comes from the name of the first story in the collection. On March 23, 1986, Ray Cox, a patient in his 30s undergoing treatment to have a tumor removed from his back, was receiving his ninth regular treatment with the Therac 25 machine. The Therac 25 is a highly sophisticated machine that uses high-energy radiation to hit cancer cells at any point on or in a person's body with pinpoint accuracy. The machine can operate in two modes: the high-power "x-ray" mode and the lower-power "electron beam" mode. Ray was supposed to receive the lower-power "electron beam" mode. He would not feel a thing. When Mary Beth, the radiotherapy technician, started the procedure in the control room (a different room), she mistakenly typed "x", the command to use the "x-ray" mode. Noticing her mistake, she quickly moved the cursor back and used the "edit" function to change it to "e", the command to use the "electron beam" mode. She had no idea that her quick sequence of keystrokes within 8 seconds was a condition the machine had never been run under before. The machine retracted the thick metal plate used during "x-ray" mode but left the power setting on maximum. When Mary entered the command to initiate the treatment, Ray saw a blue flash and felt as if he had been hit by a lightning bolt. Back in the control room, an error message popped up on the monitor: "Malfunction 54, treatment not initiated." Puzzled, Mary re-entered the command to initiate the treatment. Ray was rolling and screaming in pain when he was struck the second time, and he began to call out loud for help. Soon the third shock struck, and Ray jumped from the table and ran to the door. Nobody at the hospital knew what was going on, and only after the same incident happened again to another patient did they realize something was seriously wrong with the machine. Instead of receiving 200 rads of radiation, Ray was shot with 25,000 rads.
Over the next few months, the tissue hit by the beams died, leaving massive lesions in Ray's upper body. "Captain Kirk forgot to put the machine on stun," said Ray Cox, trying to keep his sense of humor. Four months later, Ray Cox died.

At least three things went terribly wrong in this tragic incident:
  1. An unexpected key sequence within a short time window should never have been able to leave the power setting on maximum. The kind of operating mistake Mary made is a typical human error and should have been anticipated and tested for.
  2. The error message should have been clearer, at least warning the operator that something had gone wrong (whether or not it turned out to be serious) and that the beam had already been fired. This would have kept Mary from firing the beam again and again.
  3. A strict procedure should have been in place to make sure the patient undergoing treatment was being monitored in real time. This would also have spared Ray the two additional shots (whether or not that would have made the difference between life and death in Ray's case).
Here's an article from People covering this story.
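
To make the first two points concrete, here is a minimal sketch of the kind of check I have in mind. It is NOT the Therac 25's actual software; every name in it (Mode, MachineState, safe_to_fire, fire) is made up for illustration, and I am assuming the machine can read back whether the plate is in place and whether the power is on maximum. The idea is simply to refuse to fire unless the plate and the power setting both agree with the selected mode, and to tell the operator plainly whether a beam was fired.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    XRAY = "x"        # high power, thick metal plate in place
    ELECTRON = "e"    # lower power, plate retracted


@dataclass
class MachineState:
    # Hypothetical read-backs from the hardware, not real Therac 25 fields.
    selected_mode: Mode
    plate_in_place: bool
    power_at_maximum: bool


@dataclass
class FireResult:
    beam_fired: bool
    message: str


def safe_to_fire(state: MachineState) -> bool:
    """Plate position and power setting must both match the selected mode."""
    if state.selected_mode is Mode.XRAY:
        return state.plate_in_place and state.power_at_maximum
    return not state.plate_in_place and not state.power_at_maximum


def fire(state: MachineState) -> FireResult:
    """Fire only in a consistent state and always report what happened."""
    if not safe_to_fire(state):
        return FireResult(
            beam_fired=False,
            message=(
                "BLOCKED: plate/power do not match the selected mode. "
                "No beam was fired. Do not retry; check the patient."
            ),
        )
    return FireResult(
        beam_fired=True,
        message=f"Beam fired in {state.selected_mode.name} mode.",
    )


if __name__ == "__main__":
    # The state described in the story: plate retracted, power left on maximum.
    accident_state = MachineState(Mode.ELECTRON, plate_in_place=False,
                                  power_at_maximum=True)
    print(fire(accident_state).message)
```

The check looks at the state the machine is actually in rather than at the keystrokes that led there, so an untested editing sequence cannot slip past it. A real interlock would of course also need to be backed by hardware, not just software.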

As a researcher in AI and robotics, I will likely be designing advanced and complex systems to be used in real applications. While enjoying the thrill and fun of designing cool toys, it is also very important to keep in mind the responsibilities we hold, especially for those of us working with automation. We should always consider the kinds of errors humans might make and design our systems to handle them. As automation and robots emerge in many aspects of people's lives (I am talking about more direct interactions here, not the kind of secluded factory settings), we have to be extremely careful and make sure people don't get injured or killed.

UAV used in our field trial
I couldn't help but remember an incident that happened during one of our UAV field trials. Our research group works on using Unmanned Aerial Vehicles to support Wilderness Search and Rescue operations. Once we performed a field trial at Squaw Peak, Provo, Utah, a very mountainous area. The UAV is capable of maintaining a fixed height above ground, which relieves the operator of some workload. The control software also overlays the area with a color map, warning the operator if the UAV is too close to the side of the mountain. While the UAV was flying along the side of the mountain, the operator noticed from the color warning that it was too close, so he commanded the UAV to fly away from the mountain. As the ground dropped away beneath it, the autonomy maintaining a fixed height above ground kicked in and the UAV quickly descended. The operator noticed that the UAV was still too close to the side of the mountain and kept "pushing" it away. Eventually control was lost and the UAV crashed to the ground, because it had been descending rapidly the whole time and could not climb fast enough when it ran into a small hill. Both capabilities were supposed to help the human fly the UAV better, but in this specific situation their combination led directly to human error and the crash (luckily the plane was not badly damaged).
Squaw Peak, Provo, Utah
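
One way to picture the interaction between the two features is the rough sketch below. This is not our actual control software; the function names, the gain, and the speed limits are all made-up numbers for illustration. It shows a height-above-ground hold whose descent rate is capped, plus a warning for the operator when the hold is descending at its limit while he is still steering away from the slope.

```python
from typing import Optional


def altitude_hold_command(
    height_above_ground_m: float,
    target_height_m: float,
    gain: float = 0.5,             # hypothetical proportional gain
    max_descent_mps: float = 2.0,  # hypothetical descent limit
    max_climb_mps: float = 3.0,    # hypothetical climb limit
) -> float:
    """Return a vertical speed command (m/s, positive is up), clamped."""
    error = target_height_m - height_above_ground_m
    command = gain * error
    return max(-max_descent_mps, min(max_climb_mps, command))


def operator_warning(
    vertical_command_mps: float,
    operator_pushing_away: bool,
    max_descent_mps: float = 2.0,
) -> Optional[str]:
    """Warn when the height hold is saturated while the operator steers away."""
    if operator_pushing_away and vertical_command_mps <= -max_descent_mps:
        return ("Height hold is descending at its limit while you steer "
                "away from the slope. Check the terrain ahead.")
    return None


if __name__ == "__main__":
    # The ground drops away: the UAV is suddenly 100 m up, target is 60 m.
    cmd = altitude_hold_command(height_above_ground_m=100.0,
                                target_height_m=60.0)
    print(cmd, operator_warning(cmd, operator_pushing_away=True))
```

The point is less the specific numbers than the fact that the two features have to know about each other: the autonomy should bound how aggressively it reacts, and the interface should tell the operator what the autonomy is doing in response to his commands.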
So what should we do? There's obviously a need for extensive testing. The designer should also anticipate possible human errors and design the system both to minimize the chance of such errors and to deal with them when they do occur. Especially when a failure of the system might have catastrophic consequences, extensive safety checks must be built in. We'd rather have a machine fail than have it kill. Then, there's always the possibility of insurance policies as the last resort, as shown in today's Video of the Day.

Video of the Day:

For only $4 a month, you can achieve peace of mind in a world full of crime and robots, with Old Glory Insurance!