Showing posts with label ck. Show all posts

2007-08-17

Third normal form for classes

It's been wisely and wittily said, though I don't know who by, that a relation is in third normal form (3NF) when all its fields depend on "the key, the whole key, and nothing but the key". This is generally considered to be a Good Thing, though people do deviate from it for the sake of performance (or what they think performance will be -- but that's a whole different rant).

I'd like to introduce an analogous notion of 3NF for classes in object-oriented programming. A class is in 3NF when its public methods depend on the state, the whole state, and nothing but the state. By state here I mean all the private instance variables of the class, without regard to whether they are mutable or not. A public method depends on the state if it either directly refers to an instance variable, or else invokes a private method that depends on the state.

So what do I mean, "the state, the whole state, and nothing but the state"? Three things:

  • If a method doesn't depend on the state at all, it shouldn't be a method of the class. It should be placed in a utility class, or (in C++) outside the class altogether, or at the very least marked as a utility method. It's really a customer of the class, and leaving it out of the class improves encapsulation. (Scott Meyers makes this point in Item 23 of Effective C++, Third Edition.)

  • Furthermore, if the state can be partitioned into two non-overlapping sub-states such that no methods depend on both of them, then the class should be refactored into two classes with separate states. This also improves encapsulation, as the methods in one class can now be changed without regard to the internals of the other class.

  • Finally, if the behavior of a method depends on something outside the state, encapsulation is once again broken — from the other direction this time. Such a method is difficult to test, since you cannot know what parts of what classes it depends on except by close examination.
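To make the first two rules concrete, here is a minimal Python sketch. All the class and function names are invented for illustration, and Python's leading-underscore convention stands in for private instance variables. The third rule shows up only implicitly: none of the methods reads anything outside its own object.

```python
# Hypothetical example; none of these names come from a real library.

class Mixed:
    """Not in 3NF: two unrelated sub-states plus a stateless method."""

    def __init__(self):
        self._count = 0        # sub-state 1
        self._names = []       # sub-state 2

    def increment(self):       # depends only on _count
        self._count += 1
        return self._count

    def add_name(self, name):  # depends only on _names
        self._names.append(name)

    def names(self):           # depends only on _names
        return list(self._names)

    def shout(self, text):     # depends on no state at all
        return text.upper() + "!"


# Rule 1: the stateless method becomes a plain function outside any class.
def shout(text):
    return text.upper() + "!"


# Rule 2: each independent sub-state gets a class of its own.
class Counter:
    def __init__(self):
        self._count = 0

    def increment(self):
        self._count += 1
        return self._count


class NameList:
    def __init__(self):
        self._names = []

    def add_name(self, name):
        self._names.append(name)

    def names(self):
        return list(self._names)
```

After the split, Counter can change how it stores its count without any risk to NameList, and vice versa.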

At any rate, this is my current understanding. My Celebes Kalossi model grew out of considering how methods and state belong together, and this is the practical fruit of it.

Update: I didn't talk about protected methods, protected instance variables, or subclassing. The subclasses of a class are different from its customers, and need to be considered separately if any protected methods or state exist. I am a firm believer in "design for subclassing or forbid it": if you follow the rules above, then instead of subclassing a class, you can simply replace it with a work-alike that has different state while taking no risks of breaking it. (You probably need to make the original class implement some interface.)

Furthermore, the static methods of a class have static state, and the same analysis needs to be performed with respect to them.

Comments?

2007-02-18

The heart of Celebes Kalossi

I'm finally ready to explain the central ideas of my object-oriented programming model Celebes Kalossi (read the linked post first for the terminology). I've been hanging fire on it for a long time over a terminological issue, which I've decided to just punt on.

In CK, there are two kinds of relationships between classes: subtyping and incorporation. Subtyping is like the relationship between a Java interface and its superinterface; incorporation is something like C++ private inheritance. These two concepts are intertwingled in various ways in various programming languages, but CK completely separates them. Every class subtypes one or more classes except for the root of the class hierarchy (if there is one); classes can incorporate zero or more other classes.

When you declare that class A subtypes class B, you mean that A (the subclass) has all the methods that are declared or defined public in B (the superclass) and all of B's superclasses, plus those declared or defined public in A itself. You are not saying that any of the definitions provided in B or its superclasses are or are not available in A, so there is no problem if a given method is public in more than one superclass. You are also suggesting that an instance of A is Liskov substitutable for an instance of B, although it is impossible to check this property mechanically. We call the public methods of A and its superclasses the interface of A.
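As a rough sketch of how the interface of a class could be computed, assume the hierarchy is given as two plain dictionaries. That representation and the names below are assumptions made for this example, not part of CK. Only method names are collected, so it makes no difference if the same method is public in several superclasses.

```python
def interface(cls, supertypes, public):
    """Return the set of public method names in the interface of cls.

    supertypes: dict mapping class name -> list of direct superclass names
    public:     dict mapping class name -> set of method names that the
                class declares or defines as public
    """
    seen = set()
    stack = [cls]
    methods = set()
    while stack:
        current = stack.pop()
        if current in seen:      # never visit the same class twice
            continue
        seen.add(current)
        methods |= public.get(current, set())
        stack.extend(supertypes.get(current, []))
    return methods


# Hypothetical hierarchy: A subtypes B and C, which both subtype Root.
supertypes = {"A": ["B", "C"], "B": ["Root"], "C": ["Root"], "Root": []}
public = {"A": {"a"}, "B": {"m", "b"}, "C": {"m"}, "Root": {"describe"}}
iface_a = interface("A", supertypes, public)
```

Here iface_a contains a, b, m, and describe; the fact that m is public in both B and C causes no conflict because no definitions are involved.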

When you declare that class C incorporates class D, on the other hand, you mean that the non-private methods defined in class D (the incorporated class) are effectively given the same definition in class C (the incorporating class). Provided that the methods in D have been properly declared in class C, they can be invoked on objects of class C just as if they had been defined as standard or public methods of class C. It does not matter if a method is defined in class D and declared in class C (which is C++ private inheritance) or vice versa. However, the fact that a method is public in class D does not make it public in class C unless it is declared public in class C or one of its superclasses: incorporation affects behavior but not interface.

For convenience, we say that a class subtypes itself and incorporates itself. Both incorporation and subtyping are transitive: class A subtypes class B's superclasses as well as class B itself, and classes incorporated by class D are implicitly incorporated by class C as well. All the methods in the incorporated classes of C are placed on an equal footing: it does not matter how they were incorporated. A loop in the subtype hierarchy specifies that all the types in the loop have the same interface; a loop in the incorporation hierarchy is ignored, so even if C incorporates D, D can also incorporate C. Note that an implementation may provide types which are outside the model: typical examples would be numeric types, strings, and exception objects.

If a method is declared in a class X or in any of the classes incorporated (directly or indirectly) by class X, it must also be defined exactly once in one of those classes. (If there are no definitions of some method, the class is abstract and must be declared as such.) If incorporating class Z would cause conflicts, CK provides for renaming or hiding unwanted methods when incorporating a class: class L can incorporate class M including specified methods (in which case all others are hidden), excluding specified methods, or renaming a method to a new name. Classes that incorporate L see the changed view.
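The exactly-once rule and the include/exclude/rename options can be sketched at the level of method names. This is only an illustration under simplifying assumptions: classes are reduced to sets of method names, and the function names view and check_definitions are invented here.

```python
def view(methods, include=None, exclude=(), rename=None):
    """Return the method names an incorporating class sees.

    methods: set of non-private method names defined by the incorporated class
    include: if given, only these methods are kept (all others are hidden)
    exclude: methods to hide
    rename:  dict mapping old name -> new name
    """
    rename = rename or {}
    kept = set(methods) if include is None else set(methods) & set(include)
    kept -= set(exclude)
    return {rename.get(name, name) for name in kept}


def check_definitions(declared, views):
    """Check that every declared method is defined exactly once.

    declared: set of method names declared by the class or its incorporated classes
    views:    list of sets of defined method names, one per incorporated class
    Returns (undefined, conflicts): names with no definition, and names
    with more than one.
    """
    counts = {}
    for v in views:
        for name in v:
            counts[name] = counts.get(name, 0) + 1
    undefined = {n for n in declared if counts.get(n, 0) == 0}
    conflicts = {n for n, c in counts.items() if c > 1}
    return undefined, conflicts


# Hypothetical incorporated classes M and N, which both define close.
m_methods = {"open", "close", "size"}
n_methods = {"close", "flush"}
declared = {"open", "close", "flush"}

clash = check_definitions(declared, [view(m_methods), view(n_methods)])
fixed = check_definitions(
    declared, [view(m_methods), view(n_methods, rename={"close": "close_n"})]
)
```

In the first call close is reported as a conflict; renaming N's close on incorporation clears it. A non-empty undefined set would mean the class has to be declared abstract.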

There are no inheritance rules in CK, because there is no inheritance as such: if you want X to use the same implementation of method m as its superclass Y, then incorporate whichever class Z supplies Y's implementation of method m. You can do the same thing for classes that are not related to you in the superclass hierarchy, or even for your subclasses if you really want to -- the model is nothing if not flexible.

I'll have another post, hopefully soon, about how CK might be implemented on the JVM or CLR.

2006-03-29

Celebes Kalossi 2.0

I've decided it's time to post about my object-oriented programming model, Celebes Kalossi, again. All previous statements are inoperative, so you don't have to look back at my earlier postings. This posting will be mostly about terminology.

In CK, there are classes. A class contains declarations of state variables (aka instance variables, fields, data members) and both declarations and definitions of methods. State variables are only accessible from within the class: they are all private in Java terminology.

A declaration of a method specifies the method's name and its signature; that is, the type of its return value and the names and types of its arguments. In the model, no two methods in a class can have the same name; an actual implementation might provide Java-style method overloading, since overloading is resolved at compile time and is basically convenient syntactic sugar.

A definition of a method specifies everything the corresponding declaration does, but also includes the code of the method. If a class contains a definition of a method, it has no need to contain its declaration too.

A method may be public, private, or neither; the third type will be called standard methods here. A public method can be called from anywhere, and can be invoked on any object. A private method cannot be invoked outside the class in which it is defined, so there is no point in declaring one. Basically, it's just a subroutine. The difference between standard and private methods will be explained in another posting.

Standard and private methods can only be invoked on the self (this in Java) object, implicitly or explicitly. The most important rule of CK is that you cannot invoke a method on self that is not declared (not necessarily defined) in the current class.

Finally, by "Java" I mean "Java or C#". More later.

2005-08-22

Celebes Kalossi: Who knows best?

This post expands and enlarges on my previous post OOP Without Inheritance.

Celebes Kalossi (CK) is the name of a model of object-oriented programming I'm developing. It's also the name of a (currently hypothetical) programming language that implements it. Here I'm going to talk about how objects are constructed in CK. There will be later posts that discuss various other features. I'll say right off that CK is class-based (loosely speaking) rather than delegation-based like Javascript or Self.

Mainstream OO programming languages like Smalltalk, Java, C++, and C# natively support a Daughter Knows Best (DKB) model: a method in a subclass overrides the correspondingly named method in a superclass. All except Smalltalk also require that the static types of the arguments match in order for an override to be successful. I say Daughter Knows Best because the overridden method need not be invoked at all unless the overriding method decides to do so.

Simula and Beta, per contra, use a Father Knows Best (FKB) model: the superclass method is invoked first and foremost. In fact, subclass methods are not invoked at all unless the superclass method decides to do so. This is perfectly symmetrical with the "Daughter Knows Best" model.

There are advantages and disadvantages to both models. In DKB, the subclass can take complete control, which is sensible because it understands its own needs better than the superclass. However, when a superclass method invokes a subclass method under the guise of invoking its own method (because the self/this object belongs to the subclass), it has no guarantees that the subclass method will Do The Right Thing. In FKB, the subclass is stuck with the behavior that the superclass imposes, for good and bad: good, because the superclass can prevent the subclass from going off the rails; bad, because the superclass may do things the subclass does not want.
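The contrast can be shown in a few lines of Python. Python is natively a DKB language, so the FKB half is only simulated by convention: the superclass method calls a hook, named inner here after Beta's keyword, and the subclass is trusted not to override greet itself. All class names are invented for the example.

```python
# Daughter Knows Best: the subclass override wins and may ignore the superclass.
class DkbBase:
    def greet(self):
        return "hello"


class DkbChild(DkbBase):
    def greet(self):
        # The overridden method runs only because the override chooses to call it.
        return super().greet() + " world"


# Father Knows Best (simulated): the superclass method always runs first and
# hands control to the subclass only at the point where it calls inner().
class FkbBase:
    def greet(self):
        return "[" + self.inner() + "]"

    def inner(self):
        return ""          # default when no subclass supplies anything


class FkbChild(FkbBase):
    def inner(self):
        return "world"
```

DkbChild().greet() builds on the superclass result only because the subclass asked for it, whereas FkbChild().greet() is always wrapped in the brackets that FkbBase imposes.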

In CK, all classes are equal, and specify exactly which behavior is appropriate for them. This is achieved by partitioning the notion of class into two notions: type and behavior. To create an object, you specify its type: in statically typed Celebes Kalossi, variables specify their types. The methods you can invoke on a type are the methods it declares public, plus the methods declared public by all its supertypes. A type can have more than one supertype. Consequently, a type is like an interface in Java or C#, except that types can have instances, whereas interfaces can't.

A behavior is simply a named set of methods plus instance variables. Objects cannot be created nor can variables be typed using the name of a behavior. The instance variables are always private to the behavior alone, so if you want to make them accessible to the outside world, you must provide getter and/or setter methods for them within the behavior. The methods, however, can be specified as private, external, or standard (no keyword in the syntax). Private methods, like instance variables, are defined by the behavior and visible only within the behavior. External methods can only be declared, not defined, and indicate which methods this behavior depends on but does not itself define. Finally, standard methods are defined by the behavior and are available in types that use the behavior.

What do I mean by "use the behavior"? In the declaration of a type, the programmer can specify that it has no behaviors, in which case it is an abstract type, or the programmer can specify one or more behaviors. The behaviors must fit together like jigsaw puzzle pieces: if a method is declared as external in one or more behaviors of the type, it must be defined with a standard definition in exactly one behavior of the type. Furthermore, each of the public methods of the type (including the public methods of its supertypes) must correspond to a standard method defined in a single behavior. However, behaviors are never inherited from supertypes: a type that wishes to have the same behavior as a supertype must specify that behavior explicitly.
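The jigsaw-puzzle fit can be checked mechanically. The sketch below assumes behaviors are described by sets of method names and covers non-abstract types only; the function name check_type and the data layout are assumptions made for illustration.

```python
def check_type(public, behaviors):
    """Check the jigsaw rules for a non-abstract type.

    public:    set of public method names of the type and its supertypes
    behaviors: dict mapping behavior name -> {"external": set, "standard": set}
    Returns a list of problem descriptions; an empty list means the pieces fit.
    """
    # Record which behaviors define each standard method.
    definers = {}
    for name, b in behaviors.items():
        for method in b["standard"]:
            definers.setdefault(method, []).append(name)

    # Every public method and every external method needs a definition.
    needed = set(public)
    for b in behaviors.values():
        needed |= b["external"]

    problems = []
    for method in sorted(needed | set(definers)):
        count = len(definers.get(method, []))
        if count != 1:
            problems.append(f"{method}: defined in {count} behaviors, expected 1")
    return problems


# Hypothetical type built from two behaviors that fit together.
behaviors = {
    "Stack": {"external": {"is_empty"}, "standard": {"push", "pop"}},
    "Sized": {"external": set(), "standard": {"is_empty", "size"}},
}
problems = check_type({"push", "pop", "size"}, behaviors)
```

Dropping the Sized behavior from this type would leave is_empty as an unfilled hole, and the check would report it.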

The model also provides control of the visibility of individual method names. When a type declares that it uses a behavior B, it may mention standard methods to be suppressed or renamed. A suppressed method is not visible to the jigsaw-puzzle mechanism described above; this allows creating types whose behaviors have method names that accidentally clash. Renaming a method allows it to be invoked from other behaviors under a different name. This allows the DKB model to be simulated: the subtype uses its supertype's behavior, suppressing conflicting methods that are not going to be called, and renaming ones that are. Thus there is no mechanism in CK corresponding to super in DKB languages or inner in FKB languages.
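Here is one way the simulation might look, as a toy model rather than anything resembling real CK syntax. Behaviors are dictionaries of functions, an object is just the combined method table passed to each method as self, and use and compose are hypothetical helpers. In CK terms the subtype's behavior would declare base_greet as an external method.

```python
def use(behavior, suppress=(), rename=None):
    """Return the view of a behavior (dict of name -> function) after
    suppressing and renaming some of its standard methods."""
    rename = rename or {}
    return {rename.get(name, name): fn
            for name, fn in behavior.items() if name not in suppress}


def compose(*views):
    """Combine views into one method table, rejecting clashing names."""
    methods = {}
    for v in views:
        for name, fn in v.items():
            if name in methods:
                raise ValueError(f"method {name!r} is defined more than once")
            methods[name] = fn
    return methods


# Behavior used by the supertype.
def base_greet(self):
    return "hello"

def base_name(self):
    return "base"

base = {"greet": base_greet, "name": base_name}


# Behavior written for the subtype; it calls the renamed supertype method.
def child_greet(self):
    return self["base_greet"](self) + " world"

def child_name(self):
    return "child"

child = {"greet": child_greet, "name": child_name}

# The subtype suppresses the method it never calls and renames the one it does.
subtype = compose(
    use(base, suppress=["name"], rename={"greet": "base_greet"}),
    child,
)
```

Calling subtype["greet"](subtype) reaches the supertype's definition under its new name, which is the job super does in a DKB language. Composing base and child without the suppression and renaming fails with a name clash.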

There is some syntactic sugar that makes the model easier to use. In particular, a behavior A can declare that it uses another behavior B. This simply means that whenever a type uses behavior A, it also automatically uses behavior B; no special relationship between A and B is necessarily implied. Furthermore, types can define instance variables, private methods, and standard methods as well as declaring external methods. In terms of the model, these things are really declared in an anonymous behavior which the type automatically uses.

Behaviors are something between mixins and traits. Traits don't have instance variables, which in the traits model are defined within classes that incorporate the traits. Mixins have instance variables that are visible to all other mixins within the class, bringing in all the problems of unrestricted multiple inheritance. Behaviors share the orthogonality of traits: you can just combine them without worrying about the order in which they are to be combined, since identical instance variable names are irrelevant (instance variables being private), and identical method names are forbidden unless renamed or suppressed. Since each behavior carries its own state with it, and behaviors are plug-replaceable, one can create closely related types, or even equal types (loops in the subtype-supertype relationship are not forbidden) that implement variant behaviors.

2005-06-12

OOP without inheritance

Well, I've read Schärli's Traits thesis, which is getting uptake in Perl 6 and Fortress [PDF] among other places. And it makes me furiously to think.

Do we really need inheritance (or delegation, which is inheritance at the object level) in a traits-based world? Why bother with overriding methods when they can be just replaced selectively? Sending to super is a marginal feature, and can easily be simulated by selective including and renaming from traits.

Here's my current vision, best if eaten by <date> yada yada:

Code is encapsulated in methods, so we still have methods, but classes go away in favor of two sort-of-new concepts, behaviors and types. A behavior is just a set of methods and associated private state variables. The methods in the behavior can be local (in which case they are not visible outside the behavior, and are basically just subroutines), standard, or abstract. You cannot instantiate a behavior as such, nor can a variable be typed (in a statically typed language) to a behavior. It's just a pile of things that an object might be able to do. If a behavior wants to expose its private state, it does so with a getter and/or setter method; a smart language will make it easy to specify that you want these things.

Types are used to instantiate objects and declare variables. They are composed out of behaviors: the methods available on an object of type T are the non-local methods provided by the behaviors out of which T is composed. Unless the type is itself abstract, any abstract methods in one behavior must be supplied by a standard method in another behavior, a sort of peg-and-hole operation. You can bring in a single method from a behavior, suppress a method in a behavior, or rename a method in a behavior when constructing a type; that allows you to compose behaviors that don't quite fit together perfectly. All this is verified when the type is compiled; a clever IDE can notice discrepancies and warn the programmer about them.

We also need a notion of private vs. public methods, but it's not clear to me whether this should be declared directly on the method (i.e., where the behavior is) or at the type level. That's just notational, however. With that established, we can give an implementation-independent definition of subtypes and supertypes, declared at the type level. The compiler verifies that the methods of a declared {sub,super}type are a {super,sub}set of the type currently being specified, and that arguments are appropriately contravariant and results covariant (in a statically typed language) so as to provide minimum requirements for Liskov substitutability.
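A compiler's check for a declared subtype could look roughly like this. The sketch assumes each type is described by a dictionary of method signatures, with ordinary Python classes as argument and result types and issubclass standing in for the subtype relation; none of this is prescribed by the model.

```python
def is_valid_subtype(sub, sup):
    """Check the minimum requirements for sub to be a subtype of sup.

    sub, sup: dict mapping method name -> (tuple of argument types, return type)
    """
    for name, (sup_args, sup_ret) in sup.items():
        if name not in sub:
            return False          # the subtype needs a superset of the methods
        sub_args, sub_ret = sub[name]
        if len(sub_args) != len(sup_args):
            return False
        # Arguments are contravariant: the subtype's method must accept
        # everything the supertype's method accepts.
        if not all(issubclass(q, p) for p, q in zip(sub_args, sup_args)):
            return False
        # Results are covariant: the subtype's method must return something
        # usable wherever the supertype's result was expected.
        if not issubclass(sub_ret, sup_ret):
            return False
    return True


# Hypothetical argument and result types.
class Animal:
    pass

class Cat(Animal):
    pass

shelter = {"adopt": ((Cat,), Animal)}
better_shelter = {"adopt": ((Animal,), Cat), "count": ((), int)}
```

With these definitions, is_valid_subtype(better_shelter, shelter) holds, while the reverse direction fails on both the missing count method and the narrower argument type.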

I'm not sure yet what the constructor/destructor story might be: I like factory methods better than constructors anyhow, and perhaps that's the Right Thing.

Ideas? Comments? WAGs?

2005-04-11

Divided Classes: Having your subclassing and not eating fragility too

It's a maxim of OOP that "inheritance breaks encapsulation". The difficulty is that in order to subclass a class and override some of its methods, you have to make sure that the method you are overriding is actually being invoked by other methods that you are not overriding, and that they aren't just bypassing you.

The usual solution to this problem amounts to "Forget inheritance; use delegation of one kind or another instead" or else "Document the connections for subclassers and let them hope they can trust you to get it right."

There is, however, a general method for preventing this problem, which consists in dividing your classes into what I will call, unoriginally, divisions. A division of a class is made up of a part of the class's state variables and all the methods that refer to the instance variables in that part. (State variables are instance variables, except for ones that immutably refer to an immutable object; a final String variable in Java, for example, is not part of the state.)

In particular, to divide an existing class into divisions, start with any state variable, then include all the methods that refer to it, then include all the state variables referred to by those methods, and so on until there's nothing more to do. That's one division. Then start with any remaining state variable, do the same thing, and so on until there are no more state variables. Any remaining methods are convenience methods, and are put into divisions by themselves. We can ignore private methods in this process, since they aren't visible to subclasses.
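The procedure amounts to finding connected groups of state variables and methods. Below is a minimal sketch, assuming the class has already been analyzed into a dictionary that maps each non-private method to the state variables it refers to. The function name and the input format are invented for the example, and each convenience method is given a division of its own, which is one reading of the text.

```python
def divisions(refs):
    """Split a class into divisions.

    refs: dict mapping each non-private method name -> set of state
          variables that the method refers to
    Returns a list of (state_variables, methods) pairs, one per division.
    """
    remaining = set().union(*refs.values())
    assigned = set()
    result = []
    while remaining:
        start = min(remaining)    # any variable would do; min keeps it deterministic
        variables, methods = {start}, set()
        changed = True
        while changed:            # grow until there is nothing more to add
            changed = False
            for method, used in refs.items():
                if method not in methods and used & variables:
                    methods.add(method)
                    variables |= used
                    changed = True
        result.append((variables, methods))
        remaining -= variables
        assigned |= methods
    # Methods that touch no state are convenience methods.
    for method in sorted(refs):
        if method not in assigned:
            result.append((set(), {method}))
    return result


# Hypothetical class with two independent pieces of state.
refs = {
    "push": {"items", "size"},
    "pop": {"items", "size"},
    "log": {"history"},
    "version": set(),
}
divs = divisions(refs)
```

For this input the sketch finds one division around history, one around items and size, and a convenience division holding only version.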

Now the rule is: when subclassing, you must override all the methods in a division or none of them. With all the methods in a division overridden, all the state shared by those methods is irrelevant to the subclass, and other methods in the superclass don't refer to that state in any way. So encapsulation isn't broken by subclassing in this style.

Furthermore, you must not merge divisions in the subclass. That is, there must be no shared state between an overriding method in one division and an overriding method in another division. That keeps you safe from having one overridden method call another that corrupts its subclass state. You can add state variables to each overriding division, though, because you control everything.
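Both subclassing rules can be checked once the divisions are known. This sketch assumes the superclass divisions are given as sets of method names and the subclass as a dictionary from each overriding method to the subclass state variables it uses; check_subclass is a hypothetical name.

```python
def check_subclass(divs, sub_refs):
    """Check the two subclassing rules.

    divs:     list of sets of superclass method names, one set per division
    sub_refs: dict mapping each overriding method -> set of subclass state
              variables that the override refers to
    Returns (partial, merged): divisions that are only partly overridden,
    and pairs of divisions whose overrides share subclass state.
    """
    overridden = set(sub_refs)

    # Rule 1: override all the methods in a division or none of them.
    partial = [d for d in divs if d & overridden and not d <= overridden]

    # Rule 2: overrides in different divisions must not share state.
    merged = []
    for i, a in enumerate(divs):
        state_a = set().union(*(sub_refs[m] for m in a & overridden))
        for b in divs[i + 1:]:
            state_b = set().union(*(sub_refs[m] for m in b & overridden))
            if state_a & state_b:
                merged.append((a, b))
    return partial, merged


# Hypothetical superclass with two divisions.
divs = [{"push", "pop"}, {"log"}]
ok = check_subclass(divs, {"push": {"buf"}, "pop": {"buf"}})
```

Overriding push without pop would show up in the first list, and letting an overriding log share buf with push and pop would show up in the second.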

I can't claim credit for inventing this; it was written up in a paper (seemingly unavailable on line) called "Modular Reasoning in the Presence of Subtyping". I have reinvented the terminology as well. If anyone remembers the source, please tell me and I will credit it. Thanks.