Chapter 3
Errors, faults and failures in software are common. Usually a failure causes inconvenience but no serious long-term damage. However, in some systems a failure can have very serious consequences, such as a large financial loss or even harm to people's health. Such a system is called a critical system. There are three main types of critical systems:
The most important emergent property of a critical system is dependability. Users often reject systems that are unreliable or unsafe. The possible cost of a failure may be so high that users refuse to use the system at all. A system that can easily lose information is very unsafe too, because data is often the most valuable asset of an organization.
To sum up: because failures of critical systems are so serious, only trusted methods and techniques should be used to develop them.
Dependability of a system is the main component in “calculating” its trustworthiness. Trustworthiness is the degree of user confidence that the system will operate exactly as it is supposed to. Of course, “calculating” is not the right word, because such a value cannot be expressed numerically; instead, informal terms such as “not dependable” or “very dependable” are used.
Trustworthiness and usefulness are not the same thing and are not even directly related. A program may be very useful and easy to work with in many areas, yet crash every time the user presses more than three keys at once. Conversely, a system may be rock solid, but all it does is print random numbers. The four principal dimensions of system dependability are availability, reliability, safety and security.
Each of these may be decomposed further; for example, security includes integrity (ensuring that data is not damaged) and confidentiality. Reliability includes correctness, precision and timeliness. All of them are interrelated.
Some other system properties may be considered under the heading of dependability:
System availability and reliability are closely related. Both can be expressed as numerical probabilities: availability is the probability that the system will be up and running and able to deliver services; reliability is the probability that the system will work correctly. More precise definitions are:
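To make the idea of "numerical probabilities" concrete, here is a minimal sketch of how both values could be estimated from observations. It is only an illustration under simple assumptions: availability is estimated as the fraction of an observation period during which the system was up, and reliability as the fraction of service requests that were handled correctly. The function names and the sample numbers are hypothetical and are not taken from the chapter.

```python
def estimate_availability(uptime_hours: float, downtime_hours: float) -> float:
    """Estimate availability as the fraction of observed time the system was up."""
    total = uptime_hours + downtime_hours
    if total <= 0:
        raise ValueError("observation period must be positive")
    return uptime_hours / total


def estimate_reliability(correct_requests: int, total_requests: int) -> float:
    """Estimate reliability as the fraction of requests that were handled correctly."""
    if total_requests <= 0:
        raise ValueError("at least one request must be observed")
    if not 0 <= correct_requests <= total_requests:
        raise ValueError("correct_requests must be between 0 and total_requests")
    return correct_requests / total_requests


if __name__ == "__main__":
    # Hypothetical measurements: 998 hours up, 2 hours down,
    # and 9990 correct responses out of 10000 requests.
    print("availability:", estimate_availability(998, 2))
    print("reliability:", estimate_reliability(9990, 10000))
```

The two estimates can differ a lot: a system may be running almost all the time (high availability) and still give wrong results for many requests (low reliability). Both numbers also hold only for the environment in which they were measured.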
By definition, the environment of a system is important and has to be taken into account. Measuring the system in one environment does not mean it will give the same results in a different environment. Three complementary approaches are used to improve the reliability of a system:
Safety-critical systems are systems where it is essential that system operation is always safe. The system should never harm people or its environment, even if it fails. There are two classes of safety-critical systems:
No system is 100% safe and reliable, so various methods are used to assure safety. What we can do is ensure that accidents do not occur or that the consequences of an accident are minimal. Three complementary ways of doing that are:
Security is a system attribute that reflects the ability of the system to protect itself from external attacks, which may be accidental or deliberate. Nowadays security is a very serious issue, especially for Internet-connected and other networked systems. Mistakes in designing a system's security can lead to system failures when it is attacked. Three types of damage that may be caused by an external attack are:
My thoughts:
We have all got used to software failures; it has become normal for us that programs are unstable and that you can never be sure of them. I think this is a bad thing, really bad. That is why, while working on this chapter, I kept thinking of some “perfect future” in which all software-based systems somehow work perfectly, although I don’t think this will ever happen. As you said in one lecture about mission-critical systems, the same software bug that appeared 20 years ago came up again and caused great damage. Even though computers are machines, they make mistakes, because they were created by humans, and humans do make mistakes.