Monday, July 18, 2011

CS 101 in plain English: Abstraction Explained

This is the first post among many to come. I plan on writing several posts on computer science concepts using non-computer science language. Do you know that moment when you are working in the yard or talking to your 17-month-old son, and you realize that you were just provided with an exceptional way to explain something that is normally somewhat complex? No? Well, it happens to me all of the time. Here are a few right off of the top of my head. Bad pointer aliasing can be explained using pronouns and antecedents (or the lack thereof), when and when not to use multiple threads can be explained with gardening, and how to use abstraction in software design can be explained with a conversation you had with your wife about the culture's overuse of the word "thing". OK, maybe that wasn't off the top of my head. Anyway, this post is about the last of the three. Unfortunately for someone who doesn't like to read my philosophical musings in the process of my getting to the point, well ... you get the point. So here we go.
"The beginning of wisdom is the ability to call things by their right names" -- Confucius
Have you ever noticed that "thing" is one of the most useless words in the majority of English sentences? No? Well, it is. And if you don't believe me, you should crash a sorority party or go to your average evangelical church, and try to make sense out of anything that is said. You would leave the sorority party with a firm conviction that thingies of a certain thingy make someone's thingy look big. Or you would learn that someone's thingy looks smaller than a really small thingy, and that pissed off some girl in their thingy. Likewise, you would leave the church with the firm assurance that if you believe some thing, then that thing will magically make your life better. Anyhow, at the end of the day, we tend to say that we disdain such statements because they are "abstract". Well, let me add a slight correction to that critique. The problem is not that these statements are "abstract"; the problem is that the statements are "poor abstractions". Allow me to explain.

Lemma 1: The fundamental function of language is to name things. 

There is a reason why the Judeo-Christian mythology begins with the fundamental role of mankind's gift of language being to name the animals. Imagine the usefulness of the ability to speak without names. You are back at the sorority party. Speaking may as well be hissing or some other clever-given-the-metaphor sound that all other creatures can only make one of. Imagine trying to communicate to your wife, "Don't eat that apple or we will both die." In this simple sentence we have named that delicious, bright, red-colored fruit that originated somewhere in Kazakhstan and was almost wiped out by the American temperance movement because it made excellent cider. We also named many other things such as the act of eating and death, but this will be sufficient to prove my point and to introduce Lemma 2.

Lemma 2: Naming things is the heart of abstraction.

When Adam should have been using the word apple in instructing his wife on how not to royally screw all of humanity, he would have been using a single word to describe all of the things that we know about apples. The word "apple" is shorthand for that sweet fruit that grows on a tree, can't be planted from seed or it will end up being completely different from the fruit that produced it, and was used to plunge all of humanity into wars, death, and calamity. But of course, the previous sentence only makes sense if you understand the definition of each of the terms used in that sentence and then the meaning of the terms in each of the sentences used to describe each of the terms and so on. You see, you can't communicate without names to point to collections and groups of ideas. In other words, the fundamental function of language is to name things. Q.E.D.

Also, notice that a name is a pointer to a collection of ideas. In other words, you name something because you have found an idea or collection of ideas that needs to be generalized for communication's sake. Using this name allows us to deal with a complex idea simply, without having to say or think about its complexity. We can throw around an idea with a simple word instead of a thousand. But there's more. We can actually do really cool things with these complex ideas once we start using names. We can link multiple complex ideas with other complex ideas and form even more complex ideas, giving them names as well. In fact, this is exactly what we did with the name "apple". "Apple" is the combination of other ideas represented by the names "fruit", "core", "red", and so on. This is what abstraction is all about: greater and greater generalization. This is what makes language work. Abstraction only works because of our ability to name things. Or, to put it another way, naming things is the heart of abstraction. Q.E.D.

So, back to the name "thing". "Thing" is the ultimate abstraction. "Thing" includes all other names which brings on another lemma.

Lemma 3: The more and more abstract the idea a name represents, the less meaning that name possesses.


This principle is sort of like the one mentioned in the video above. Here is my adaptation of this brilliant Disney indoctrination. When a name means everything, it means nothing at all. So, when I am riding in the car with my wife, and she says, "Watch out for that thing in the road!" I will most likely not watch out for the cat that I caught sleeping on top of my freshly washed and waxed 1994 Bronco which is now giving birth to kittens in the middle of the road--yeah right, like I would wash my Bronco. Or when I say, "Hey babe, bring me one of those cold things out of the fridge", I shouldn't be surprised when she chunks a rotten tomato at me (for multiple reasons). Q.E.D. OK, so I am tired of the whole Q.E.D. thing. I'm going to stop doing that. It is a name which basically points to the idea that I have amply demonstrated the point that I was trying to make.

So when is the word "thing" actually useful? Well, observe my wording for Lemma 1 and Lemma 2. Those sentences actually would lack their ability to communicate so generally if I didn't have "thing" a.k.a super abstraction. That is the actual cool thing about abstractions. You can use them to manipulate multiple complex ideas simultaneously. You simply must be able to apply the idea back down to lower abstractions. This is usually what people mean by "Bring that down to my level" or "That was over my head": they mean that you need to apply that general idea's properties down to lower abstractions.

A Case Study on Abstraction: Picasso

The artwork of Pablo Picasso is an excellent example of how abstraction is performed.

Olga Picasso - 1923

This is a "representational" painting. In other words, the artist tries very hard to represent --or for our purposes model-- something from the real world onto the canvas. Let's call this a real, concrete, or non-abstract painting.

Three Musicians
Notice that in this painting, Picasso has begun to see real figures in terms of more general and basic shapes. He has grouped colors, curves, and shapes into basic groupings and has diminished the individuality of each individual pixel of the painting into more generalized shapes. So, I like to suppose that his dialog went something like this, "Hmm, the shape of that face is basically rectangular, the shape of that hat is basically triangular." As a result, he diminished the nitty-gritty details of the real representation in order to abstract the particulars into greater and greater generalities. This is what abstraction is fundamentally about: either diminishing particulars or naming them so that we can forget about them for the sake of working with greater and greater complexities more easily.

Guernica

Now here is where abstraction becomes more useful to Picasso. How do you show the absolute horror of a Nazi Blitzkrieg on a defenseless Spanish town? Picasso chose to do so by taking his already abstract rendering of shapes and distorting them. I suppose that for him this communicated the unreal or anti-beauty of the situation. This is part of the usefulness of abstraction: it enables us to perform operations--such as distortions--upon representations of reality which would not be possible if we were using concrete representations.

A Case Study on Poor Abstraction: Pollock

Eyes in the Heat--Jackson Pollock

This painting is a classic example of pure dog shit. Though this painting is considered "abstract" by genre, it is a very poor abstraction. This is the art equivalent of the incorrect usage of "thing". It has no actual connection to reality, so there is actually nothing being abstracted to begin with. "Eyes in the Heat" is an example of a useless abstraction. I don't care what Peggy Guggenheim says. This is simply abstraction for the sake of abstraction. In order for an abstraction to be useful, it must be an actual generalization of the real world. Even then, in order for it to communicate in full or in part, the actual reality being represented must be communicated in part or in full.

Case Study: Modeling Trees
(Warning: I am about to gradually drift back into CS-specific stuff here)

Suppose you just got a new job working for the U.S. Forest Service. Your task is to write a program which will simulate all of the tree growth of the Sipsey Wilderness in Bankhead National Forest. By the way, if you haven't been to the Sipsey Wilderness, you need to go. It is the closest thing to a virgin forest that I have found in all of Alabama.

Of course, right away you have a problem. Each variety of tree will grow at different rates, multiply at different rates, respond to droughts differently, and some will even produce geriatric, mutant, ninja squirrels which will take out other competing tree varieties. It would be very easy, without a well designed abstraction, for this program to get out of hand, unreadable, and unmaintainable quickly. So here is how I would approach the problem, and here is the catch: you could design your entire program in plain English before writing the first stitch of code.

#1. I would write down a list of each variety of tree in this forest. So for example,

Pin Oak,
Water Oak,
Cypress,
Cedar,
Pine,
Geriatric Ninja Producing White Oak.

#2. Write down what each tree has in common.

They all grow,
They all have leaves (yeah, yeah, I know, we will deal with that in a minute),
They all multiply,
They all perform photosynthesis.

#3. Write down general ways that the trees can be different.

Some have needles while some have leaves (see, I told you I would get to it),
Some are angiosperms while others are gymnosperms.

#4. (Possibly number 1) Write down the aspects of the trees that you need to model.

Everything relevant to growth.
Everything relevant to reproduction.

So, as a result we end up with a lot of ways to form our abstraction tree (i.e. your inheritance hierarchy).

Every concrete unit (the element of smallest reduction) is a tree.
However a tree could be an angiosperm or a gymnosperm.
Also, some trees have needles while others don't.

We have three angiosperms, the Pin Oak, the Water Oak, and the ninja producing White Oak.
We have three gymnosperms, the Cypress, the Cedar, and the Pine.

Trees with needles (conifers): the Cypress, the Cedar, and the Pine.
Trees with leaves: The Pin Oak, the Water Oak, and the ninja producing White Oak.

So, as a result we get something like this:

                    Tree
                   /    \
         Gymnosperm      Angiosperm
             |               |
          Conifer        Oak Trees
         /   |   \       /   |   \
  Cypress  Cedar  Pine  Pin Water Ninja_Variety
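
For the programmers in the room, here is a rough sketch of how that diagram might translate into a C++ inheritance hierarchy. Every class name, every function name (Reproduce, HasNeedles, Describe), and the strings they return are made up for illustration; they are assumptions of mine and not part of any real forestry library. Behavior shared by a whole branch is written once at that branch, and only the odd one out (the ninja variety) overrides it.

```cpp
#include <string>
#include <type_traits>

// Hypothetical classes mirroring the diagram; all names are illustrative.
class Tree {
public:
    virtual ~Tree() = default;
    // Every tree reproduces and has foliage; the details live lower down.
    virtual std::string Reproduce() const = 0;
    virtual bool HasNeedles() const = 0;
};

// Reproduction style is decided once per branch of the diagram.
class Gymnosperm : public Tree {
public:
    std::string Reproduce() const override { return "seeds in cones"; }
};

class Angiosperm : public Tree {
public:
    std::string Reproduce() const override { return "seeds inside fruit"; }
};

// Foliage is decided one level further down.
class Conifer : public Gymnosperm {
public:
    bool HasNeedles() const override { return true; }
};

class OakTree : public Angiosperm {
public:
    bool HasNeedles() const override { return false; }
};

// The concrete varieties inherit everything they need.
class Cypress : public Conifer {};
class Cedar : public Conifer {};
class Pine : public Conifer {};
class PinOak : public OakTree {};
class WaterOak : public OakTree {};

// Only the unusual variety needs to say anything extra.
class NinjaWhiteOak : public OakTree {
public:
    std::string Reproduce() const override { return "acorns guarded by squirrels"; }
};

// The compiler can confirm the "is a" relationships from the diagram.
static_assert(std::is_base_of<Tree, Pine>::value, "a Pine is a Tree");
static_assert(!std::is_base_of<Angiosperm, Pine>::value, "a Pine is not an Angiosperm");

// Works on any tree without knowing which variety it was handed.
std::string Describe(const Tree& tree) {
    return std::string(tree.HasNeedles() ? "needles" : "leaves") +
           ", reproduces by " + tree.Reproduce();
}
```

Calling Describe(Pine()) and Describe(PinOak()) goes through the same function, yet each answer comes from a different level of the hierarchy.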

OK, so what is the usefulness of this diagram? Think of it this way. Suppose you are walking through a forest doing some research and you come across the Geriatric Mutant Ninja Squirrel producing Oak tree. After defeating four squirrels who attacked you upon approach, you need to talk about that particular tree. However, the name is really long, and each time you say the name, Geriatric Mutant Ninja Squirrel producing Oak tree, another squirrel attacks--and that just isn't cool. So, you simply call the thing a tree. You can talk about that tree (using the generic name of tree) growing and reproducing even though each tree does each of those tasks differently. So for instance, you can say that tree reproduces, which will refer to the angiosperm style of reproducing as opposed to the gymnosperm style of reproducing. Also, this particular type of tree has mutant squirrels that fend off predators of its acorns. All of this is implicitly understood when you say that that particular tree reproduces.

Meanwhile, you get chased off by another batch of squirrels, and you walk over to a majestic pine tree. You call this object a tree and talk about it reproducing. Except this time, you mean an entirely different process. A pine tree is a gymnosperm, which means that it doesn't produce any fruit-like structures like nuts. It produces its offspring in an entirely different way. However, so long as your usage of the term "tree" is directed at that particular pine tree, when you talk about it reproducing you are implying everything specific to that pine tree's reproductive system.

This is the usefulness of abstraction! If you understand this principle then you understand everything you need to know to use inheritance, dynamic binding, and even type casting. 

So, to show it in pseudo-code (for our friends here who aren't programmers):

    Tree t references a tree that is also a Pine tree.
    t ---- reproduce references the reproduce function of the Pine tree.

    Tree o references a tree that is also a Pin Oak tree.
    o ---- reproduce references the reproduce function of the Pin Oak tree.

Or for my C++ understanding friends:

    Tree* t = new PineTree();
    t->Reproduce(); // calls PineTree::Reproduce, provided Tree::Reproduce is virtual and PineTree overrides it; otherwise Tree::Reproduce

    Tree* o = new PinOakTree();
    o->Reproduce(); // likewise calls PinOakTree::Reproduce when it overrides the virtual Tree::Reproduce

This means that your main program can be completely ignorant of how the particular tree reproduces. It can just focus on counting the years, raining, and sun-shining. The individual implementations will handle the rest. Also, this means that should you ever need to add a new tree to your model, then you just have to plug it into the abstraction. The main application will need little to no modification.
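
To make that claim a little more concrete, here is a minimal sketch of such a "main program", assuming a Tree base class with a virtual Reproduce function like the one in the snippet above. SimulateYear, MakeForest, CedarTree, and the returned strings are hypothetical names I invented for the example. The thing to notice is that SimulateYear only ever mentions Tree, so plugging in CedarTree later required no change to it.

```cpp
#include <memory>
#include <string>
#include <vector>

// Base abstraction: the only type the simulation loop knows about.
class Tree {
public:
    virtual ~Tree() = default;
    virtual std::string Reproduce() const { return "generic tree reproduction"; }
};

class PineTree : public Tree {
public:
    std::string Reproduce() const override { return "pine drops cones"; }
};

class PinOakTree : public Tree {
public:
    std::string Reproduce() const override { return "pin oak drops acorns"; }
};

// A variety added later (hypothetical); nothing below had to change for it.
class CedarTree : public Tree {
public:
    std::string Reproduce() const override { return "cedar drops cones"; }
};

// The "main program": it counts a year and lets each tree handle the details.
std::vector<std::string> SimulateYear(const std::vector<std::unique_ptr<Tree>>& forest) {
    std::vector<std::string> log;
    for (const auto& tree : forest) {
        // Dynamic binding picks the right Reproduce for each tree at run time.
        log.push_back(tree->Reproduce());
    }
    return log;
}

// Builds a tiny example forest; unique_ptr frees each tree automatically.
std::vector<std::unique_ptr<Tree>> MakeForest() {
    std::vector<std::unique_ptr<Tree>> forest;
    forest.push_back(std::make_unique<PineTree>());
    forest.push_back(std::make_unique<PinOakTree>());
    forest.push_back(std::make_unique<CedarTree>());
    return forest;
}
```

Calling SimulateYear(MakeForest()) returns one log entry per tree, each produced by that tree's own Reproduce.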

This is what all programming languages do. Think about an int in a C program. An int is usually 4 bytes strung together. Some very kind man, I suppose Dennis Ritchie, decided that programmers did not need to worry every time they wanted to add two numbers together about allocating 4 bytes of memory, calculating the binary value of the 4 bytes together, performing binary addition, and storing the new value back into the 4 bytes of memory. An int is an abstraction because it generalizes this operation. Of course this is different from inheritance, but it is a similar idea. You always want to hide complexity from your main application, especially when you have other programmers on the project.
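
If you are curious what life without that abstraction might look like, here is a rough sketch that adds two 4-byte numbers one byte at a time, carrying into the next byte as it goes. It is only a picture of the idea, not a description of what your CPU or compiler actually does. The names Bytes, ToBytes, FromBytes, and AddBytes are my own inventions, and I use unsigned 32-bit values (rather than int) so that overflow simply wraps around.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Four raw bytes, least significant byte first (an assumption of this sketch).
using Bytes = std::array<std::uint8_t, 4>;

// Splits a 32-bit value into its four bytes.
Bytes ToBytes(std::uint32_t value) {
    Bytes bytes{};
    for (std::size_t i = 0; i < bytes.size(); ++i) {
        bytes[i] = static_cast<std::uint8_t>(value >> (8 * i));
    }
    return bytes;
}

// Reassembles four bytes into a 32-bit value.
std::uint32_t FromBytes(const Bytes& bytes) {
    std::uint32_t value = 0;
    for (std::size_t i = 0; i < bytes.size(); ++i) {
        value |= static_cast<std::uint32_t>(bytes[i]) << (8 * i);
    }
    return value;
}

// Adds byte by byte, carrying overflow from each byte into the next one.
Bytes AddBytes(const Bytes& a, const Bytes& b) {
    Bytes sum{};
    unsigned carry = 0;
    for (std::size_t i = 0; i < sum.size(); ++i) {
        unsigned total = static_cast<unsigned>(a[i]) + b[i] + carry;
        sum[i] = static_cast<std::uint8_t>(total & 0xFF);
        carry = total >> 8;
    }
    // A carry out of the top byte is dropped, so the result wraps around.
    return sum;
}
```

All of that bookkeeping is what the single expression a + b hides from you, which is the whole point.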

Conclusion

My goal is not to teach the syntax for implementing such abstractions--merely to try and teach how to begin thinking in the abstract. Ultimately see it this way. When you name things, you are grouping ideas together, and are therefore abstracting. Learn to do this in each of your software projects and you will find your code easier and easier to write, organize, and maintain. Think to yourself, "Hmm... can this code be generalized any?" "Is there some name I can apply to this section of code?" Eventually you will find yourself writing better and better code.

However, be sure to avoid abstraction merely for the sake of abstraction. Never create complexity, only hide it. Remember, for an abstraction to be useful, it must be derived from something that is a concrete representation of something real.

