http://www.unixreview.com/documents/s=2472/uni1034613148788/

Regular Expressions: Be Good to Your Objects
October 2002
By Cameron Laird and Kathryn Soraiz

Object orientation (OO) isn't the silver bullet it was advertised to be about a decade ago. In fact, there are plenty of programmers who say they're ready to abandon it for either a more procedural or functional style. Before you stop doing OO, though, consider a few ways you might do it better. That's the subject of this month's "Regular Expressions".

What does OO have to do with the high-level languages that are the column's usual beat? Plenty, of course. First, such scripting languages as Ruby are as fully OO as Smalltalk or Java, and arguably more so. Second, most high-level languages have at least partial support for object-oriented or object-based source idioms. Most crucially, the true central focus of "Regular Expressions" is computing language usages that accomplish "more with less". While scripting languages are generally the most fertile territory for such returns, the same attitudes can help make the best of your work in C, Java, or Assembler, as well. Let's see how OO looks from a scripter's perspective.

Patching OO

OO's promise was to manage complexity better than the previous fashion of procedural "structured programming" did. Instead of spaghetti-like control flow, OO's characteristic is brief manipulations of data structures that have natural correspondents in real-world objects. The reality of current OO programming practice, however, falls short of that ideal. Sometimes it falls far short.

One response is to patch languages in technical ways that make them better for OO. Java inspires a great deal of continued interest in this regard. Specific discussions anticipate such features as latent type checking, aspects, checked exceptions, and more, for Java 3.0.
Certainly the introductions of templates for C++, inner classes for Java, and other novel syntactic elements have been significant watersheds in those languages' histories.

Catalogue of Inheritance Errors

All these strike us as superficial, though, in the sense that inheritance is at the heart of OO, and inheritance confusions pervade programming as we see it done commercially. Inheritance, done right, gives OO its power. It's often done wrong, though.

OO theoreticians like to ponder the Liskov Substitution Principle (LSP; see http://okmij.org/ftp/Computation/Subtyping/References.html and http://www.objectmentor.com/publications/lsp.pdf) as a fundamental guide to construction of class hierarchies. Eiffel inventor Bertrand Meyer probably captured LSP best with his contract-oriented paraphrase that "A subtype must require no more and promise no less than its supertype." LSP formalizes the standard pedagogical device of talking about inheritance as an "is-a" relationship: a cat is a mammal, a requirement specification is a document, a subclass instance is also a class instance, and so on.

We're seeing, however, that some working programmers have such a flawed understanding of inheritance that LSP's subtleties are wasted on them. Here are examples we've come across in production programs. We've rewritten the class names slightly to make the examples easier to understand, and to spare individuals any needless embarrassment.

Rectangle subclasses Square. What seems to be happening here is that Rectangle has two canonical attributes, height and width, while Square has only one, side. A surprising number of programmers appear to conclude from this that Rectangle must necessarily subclass Square. They see Rectangles as having "more complicated" behavior, and believe that means they're members of a subclass.

Equilateral Triangle subclasses Square.
Equilateral Triangles and Squares share an interface: they both have a readable and writable side, and for both it's possible to compute a perimeter and enclosed area. If they have the same interface signatures, the reasoning goes, surely one is a subclass of the other.

Document subclasses Rectangle. Document displays fit in visual rectangles, and Documents are more complicated than Rectangles, so the latter must be the former's superclass. As extreme as that sounds, some programmers appear to have a sincere belief that it's good OO analysis. We run into quite a few designs that express that Document subclasses Window, or Protocol subclasses Document, or equally inscrutable concepts. As physicist Wolfgang Pauli is said to have judged one paper given him, that's "not even wrong!"

It's possible to program badly in any language, using any technology. The prevalence of these errors doesn't mean OO is doomed. It does suggest it's time for something to change, though. Apparently a significant number of programmers are learning OO by rote. They never have a chance to understand "is-a" theory properly; they're just cutting and pasting from others' work.

Perhaps baroque inheritance is a symptom of the same disease. We've come across inheritance hierarchies that are at least formally correct, but that conceal gross redundancies. One production system has several correct class hierarchies, each subclassed deeply enough from its inheritance root that each of several methods has at least seven implementations. What shocked us was that no more than two of the implementations of any method were ever distinct. The rest were verbatim copies of each other. Apparently, the formally correct class hierarchy is a front for complete confusion about what the objects in the real world do. Proper redesign of the class hierarchies could reduce each of them to a superclass and at most one subclass. No one was equipped to spot this, though, in design or coding.
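
The first catalogue entry, Rectangle subclasses Square, can be made concrete with a minimal Python sketch. The class bodies, the honors_square_contract helper, and the decision to map side onto a rectangle's width are our assumptions for illustration; none of this is code from the production programs described above. The sketch shows a client written against Square's promise (area equals side squared) breaking when handed the supposed subtype, which is the situation Meyer's "promise no less" rules out.

```python
class Square:
    """Supertype. Its implicit promise: area() == side ** 2."""

    def __init__(self, side):
        self.side = side

    def area(self):
        return self.side * self.side


class Rectangle(Square):
    """The flawed design: Rectangle treated as a 'more complicated' Square."""

    def __init__(self, width, height):
        # Assumption for this sketch: 'side' is forced to mean 'width'.
        super().__init__(width)
        self.height = height

    def area(self):
        return self.side * self.height


def honors_square_contract(shape):
    """Hypothetical client code that relies on Square's promise."""
    return shape.area() == shape.side ** 2


print(honors_square_contract(Square(3)))        # True
print(honors_square_contract(Rectangle(3, 4)))  # False
```

A Rectangle instance passes every type check that asks for a Square, yet it cannot stand in for one. The hierarchy compiles and runs; it simply does not express an "is-a" relationship.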

Conclusion

The OO implementations we've seen over the past year have reinforced our bias in favor of simple systems. Arguments about the advantages Java and C++ enjoy when they're done right fade when one realizes how rarely they're done right. Our strategy is to emphasize good basic OO education and use of highly expressive languages that express intent relatively concisely. More feature-full languages have no practical advantage if programmers are unable to remember correct usage that could realize that advantage.

Here's a rule of thumb: just as you should keep your function definitions to a single page, more or less, and generally forgo use of more than three distinct fonts on a page, make your inheritance simple. Learn about the relations other than inheritance that classes might enter, including association and aggregation. Look for ways to abstract and parametrize, more than subclass. Instead of worrying through the relations of Square, Rectangle, Triangle, and so on, write a Polygon superclass, and flatten out a layer or two of the class hierarchy.

In an upcoming issue of "Regular Expressions", we'll explain how the dynamic nature of scripting languages changes OO programming idioms from the C++ and Java style commonly taught. Until then, keep simplifying your life.
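
The Polygon suggestion in the rule of thumb above might look something like the following Python sketch. Everything in it is an assumption made for illustration: the vertex-list representation, the factory function names, and the use of the shoelace formula for area. It relies on math.dist, which requires Python 3.8 or later. The point is only that one parametrized class can cover shapes that would otherwise tempt a programmer into a tangled hierarchy.

```python
import math


class Polygon:
    """A single class parametrized by its vertices (hypothetical design)."""

    def __init__(self, vertices):
        self.vertices = list(vertices)

    def _edges(self):
        # Pair each vertex with the next one, wrapping around to the first.
        return zip(self.vertices, self.vertices[1:] + self.vertices[:1])

    def perimeter(self):
        return sum(math.dist(a, b) for a, b in self._edges())

    def area(self):
        # Shoelace formula; valid for simple (non-self-intersecting) polygons.
        twice_area = sum(x1 * y2 - x2 * y1
                         for (x1, y1), (x2, y2) in self._edges())
        return abs(twice_area) / 2


# Factory functions replace the Square/Rectangle/Triangle subclasses.
def rectangle(width, height):
    return Polygon([(0, 0), (width, 0), (width, height), (0, height)])


def square(side):
    return rectangle(side, side)


def equilateral_triangle(side):
    return Polygon([(0, 0), (side, 0), (side / 2, side * math.sqrt(3) / 2)])


print(rectangle(3, 4).area())       # 12.0
print(rectangle(3, 4).perimeter())  # 14.0
print(square(2).area())             # 4.0
```

Here the question of whether a square "is-a" rectangle never has to be encoded as inheritance. A square is just a rectangle built with equal arguments, and perimeter and area are written once.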