Erik Engbrecht's Blog

Wednesday, March 05, 2008

Does Google Reader read GMail?

Up until recently, all my "Top Recommendations" in Google Reader were related to IT, software development, math, and finance. This makes perfect sense, because that covers all the topics in my subscriptions. However, today I noticed the Google Reader suggested an RSS feed for a blog about local (kind-of) restaurants, clubs, bars, etc. This means that:

Google Reader has a decent idea about where I live
Somehow Google Reader figured out that I enjoy dining out

The best I can figure is that Google Reader is basing my recommendations on the contents of my email. While I don't have any feeds about such things, I do exchange emails with my wife and friends about making dinner reservations, where to go out to eat, etc. The other possibility is that it figured it out using my search history, except that the contents of my search history look pretty much the same as my subscriptions (I know, I'm a nerd). This is because usually I'm not logged into Google when I am searching. Consequently, I think it is reading my GMail.

Anyone else have any ideas?

Sphere: Related Content

Monday, March 03, 2008

Floating Points Stike CNN!

Looks like developers at cnn.com need a lesson in floating point numbers:

Either that or the Democratic primary is truly too close to call...

Sphere: Related Content

Monday, February 18, 2008

Typesafe Ranged Integers in Scala

For the past few weeks I've been working on a basic numeric library for Scala for eventual contribution to the Scalax project. My goal is to create an extensible, type-safe library that provides numeric types that are easy to use and more closely resemble their mathematical foundations than the primitive operations on which they are built. My initial focus has been on building a working rational numbers implementation with unlimited precision integers. However, today on the Scala mailing lists, someone raised the issue that Scala's use of primitive numbers doesn't belong in such a high-level language. When I read this, I thought "Hmmmm, my library helps with that."

Integer Underflow and Overflow

First, let's consider integer underflow and overflow. The standard Java/Scala integer type is a 32-bit twos-complement signed integer. That means it can representing numbers between 2147483647 and -2147483648 inclusive. Often times this is plenty range, so what we really want is for exceptions to be thrown when underflow and overflow. So consider the following:


scala> import scalax.math.Int32Domain._

import scalax.math.Int32Domain._



scala> val a: Int32 = 5

a: scalax.math.Int32Domain.Int32 = 5



scala> val b: Int32 = 7

b: scalax.math.Int32Domain.Int32 = 7



scala> a + b

res0: scalax.math.Int32Domain.N = 12



scala> a * b

res1: scalax.math.Int32Domain.N = 35



scala> val i: Int32 = 2147483647

i: scalax.math.Int32Domain.Int32 = 2147483647



scala> i + 1

java.lang.ArithmeticException: result exceeds maximum value: 2147483647

 at scalax.math.Int32Domain$Int32.checkRange(int32.scala:55)

 at scalax.math.Int32Domain$Int32.$plus(int32.scala:60)

 at .(:8)

 at .()

 at RequestResult$.(:3)

 at RequestResult$.()

 at RequestResult$result()

 at sun.reflect.NativeMethodAccess...

Magic! Overflow has been detected. Well, not quite magic. The way it works (and I'm sure it could work much, much better, I just through this together tonight) is that it converts the 32-bit integers into 64-bit integers prior to calculations, performs the calculations, then checks the result to ensure that it is within range. If it is within range, then the result is converted back into a 32-bit integer and returned. Between the usage of an object instead of a primitive and all this extra work for range checking, this class is probably at least an order-of-magnitude slower that using primitive Ints.

Bounded Ranges

Now let's consider bounded ranges. In order to accomplish this, we simply create an object that extends Int32 and overrides its minimum and maximum value.

scala> object SmallDomain extends scalax.math.Int32Domain { | override val max: N = 10 | override val min: N = -10 | } defined module SmallDomain scala> val d = SmallDomain.Integer(5) d: SmallDomain.N = 5

Now we have a domain that is limited from -10 to 10 inclusive. Attempting to mix integers from this domain with integers from other domains will yield a type error:


scala> i + d

:9: error: type mismatch;

 found   : SmallDomain.N

 required: scalax.math.Int32Domain.N

       i + d

           ^



scala> d + i

:9: error: type mismatch;

 found   : scalax.math.Int32Domain.Int32

 required: SmallDomain.N

       i + a

           ^

Even if you had a Int32 within the range of SmallDomain, you would still receive the type error. Also, as you can see, Int32 will not allow itself to be mixed with an integer from SmallDomain. If you look at the code (which I'll publish soon, I promise) for Int32Domain and its superclasses, you will see a lot of complex stuff involving mixins and type constructors. However, the end class is easily extensible by the user and still provides good type safety.

Rationals

I mentioned rationals at the beginning of this blog, so I thought I would give a little sneak-peak:


scala> import scalax.math.BigRationalDomain._

import scalax.math.BigRationalDomain._



scala> val a = Integer(3)

a: scalax.math.BigRationalDomain.IS = 3



scala> val b = Integer(7)

b: scalax.math.BigRationalDomain.IS = 7



scala> a / b

res0: scalax.math.BigRationalDomain.N = 3/7



scala> a.inverse

res1: scalax.math.BigRationalDomain.N = 1/3



scala> a.inverse * a

res2: scalax.math.BigRationalDomain.N = 1



scala> (a / b)^2345

res3: scalax.math.BigRationalDomain.N = 70687450412160609527952431393691553815800419127671886519112946722799601077076407951140254943159198319665943578284183744314587093153151823879346566151398437630838022610858501398173872472014103905434680267558985498302895510754430260294086995134629615707980341285332639084519209821403179213183556756416404920769037158823806045214292550723098516538040...



scala> a * b

res4: scalax.math.BigRationalDomain.I = 21



scala> Integer(3) / Integer(6)

res5: scalax.math.BigRationalDomain.N = 1/2

Notice how an integer times an integer equals an integer, but an integer divided by an integer equals a rational. This is because integer is a subtype of rational, because in all integers are rationals, but not all rationals are integers.

More to come soon!

Sphere: Related Content

Friday, February 08, 2008

Linguistic Success

There's a new favorite pastime on the fringes of the Scala community. That pastime is blogging about aspects of the Scala language that will prevent it from being a "success" unless they are quickly addressed. "Success" is roughly defined as "widespread commercial use." A good metric might be: "How hard is it to find a job programming Scala?" The premise of the criticism usually revolves around one or more complicated, terse, or "foreign" (relative to the author) constructs that are common Scala, or at least favorites among frequent posters, and how these constructs will prevent "average" programmers from understanding Scala and thereby prevent commercial adoption. A recent favorite is symbolic function names (/: and :\ for folds).

The logic of this argument seems relatively sound. When choosing a technology for a project, it is very important to consider the availability of potential employees who know that technology. Forcing every new member to scale a potentially steep learning curve is a frightening prospect. I can imagine development managers lying awake at night fearing that their expert in some obscure technology will leave, and it will take months to replace him. It's a legitimate, although I think slightly exaggerated, fear.

That being said, I think it has little to do with the adoption of a new language. The majority of programmers simply do not spend their free time learning new languages. Many, maybe most, won't even use their free time to learn languages that they know have ready pre-existing demand. They learn when they are paid to learn, or when failing to learn will lead to immediate negative consequences for their career. I think the same can be said of most professions. Most people work because they have to, not because they want to.

Consequently, I think expanding beyond a core of enthusiasts is very difficult, if not impossible, simply be attracting people to the language. Right now some leading-edge Java people are taking a look at Scala, because they think it might be the next big thing in Java-land. These people are different than enthusiasts. Enthusiasts will learn a language for the sake of learning it. The leading-edge folks learn it as a high risk investment. If they can get a head start on the next big thing, it will be great for their careers (and businesses). These people constantly think "can I sell using this technology?" and "if I do sell it, while it come back to haunt me?" This is a very pragmatic perspective, and it is the perspective I take when I'm at work.

Confusing, odd-ball language features make the sell a lot harder. Pitching them as features increases personal risk.

But it doesn't matter.

Why? Because the vast majority of developers are not going to learn a new language because they want to, they are going to learn it because they have to. Not to mention that there are countless languages out there, so going after the enthusiasts and leading-edgers (who are mostly lookers anyway) is just fishing in an already over-fished pond.

So how does a language become a success?

Enough people use it for real projects. Let's say a consultant rapidly prototypes an application using the technology, and that application makes it into production. Now maintenance programmers have to learn that technology. Sure, it's complicated, but unlike the guy learning it on free weekends, the maintenance programmers have all-day every-day. It's hard to learn complex concepts a hour or two at a time, but these guys have all day, and the next day. It's hard to memorize something by looking at it a couple hours a week, but spend all day staring and it and it will click. Not to mention that their livelihoods depend on it. Any sort of "cowboy" development team can cause this to happen, and frankly such teams are pretty common.

So maybe one-in-five maintenance programmers actually like the technology, and admire the cowboys, so when they go get a chance to do new development, they use it, too.

The same thing can happen with products from startups. Let's say a startup builds a piece of enterprise software using Scala. They sell it to big, conservative companies by emphasizing the Java aspect. They sell customization services, too. And then it's back to the maintenance programmer, who has no choice.

Notice a pattern? The key to language success is making it powerful enough for a couple cowboys to do the work of an entire team in a shorter period of time. Selling fast and cheap is easy. If you have enough fast and cheap, the business people won't care if you are making it out of bubble-gum and duct-tape, because you are giving them what they want.

The key to success is making the reward justify the risk. Judging by what some people using Scala for real-world projects are saying, and my one hands-on experience, I think Scala offers it. It's just a matter of time before it sneaks its way into enterprises, just like Ruby has.

Sphere: Related Content

Monday, January 21, 2008

Programming Language Continuum

Introduction

Ever since I started getting into less-than-mainstream programming languages, I've pondered how to go about classifying them according to their attributes in the hopes that it would yield insight into what an ideal programming language would be. There is the genealogical way that traces roots and influences, but that doesn't break down attributes or make any qualitative judgements. Here I intend to mostly frame the context for debate, rather than draw significant conclusions. I believe bringing some more structure to the discussion is necessary before much more can be gleaned from it.

So what I've done is attempt to break the essence of a programming language down into two dimensions:

Enforced Structure / Dynamism
Engineered Foundations / Mathematical Foundations

Here's a loose attempt at classifying some languages:

Enforced Structure vs Dynamism

Recent debates have focused heavily on the difference between statically typed and dynamically typed languages. I believe that this is a specific case of the more general issue of "how much structure should the programming language force on the programmer?" Today we see Java versus Ruby, but it could just as easily be Ada versus Lisp. On you side of the debate you have people who believe that heavily structured languages are essential for scaling and helping ensure correctness. One of my college professors once said (paraphrased) "the compiler is your friend, help it catch your errors for you." Also, a well defined and compiler checked structure can help scale programming projects up to large teams by ensuring that all team members are working to the same program structure. Finally, they point out the sophisticated tools that available for many statically typed languages, particularly refactoring tools, but also code generators and static validators.

On the other side of the debate you have dynamic language advocates. They claim that statically typed languages are too constraining and require more up-front design than is practical, especially given the vague and changing requirements typical of software development projects. Furthermore, robust code an be achieved through automated testing and by significantly reducing the amount of code required to deliver the required functionality. Finally, they point out the quicker feedback cycles that dynamic languages enable by shortening the change->compile->deploy->test loop to change->test. There was a time when this loop was actually figured into software development cost and schedule estimates, but today outside of very large projects it is simply a cognitive disruption.

Engineered vs Mathematical Foundations

Most of the mainstream programming languages have been "engineered" or "designed" to better enable some group of programmers to better achieve some objective. For example, Ada was engineered to enable the development of very large, complicated and highly reliable systems by huge teams. SmallTalk was designed to enable the construction of modular software in a more natural or "human" manner. COBOL was designed to enable business analysts to program. The list goes on and on, but the common theme is that most of these languages were designed or engineered with very specific design goals, and those goals were largely disconnected from strong mathematical foundations.

On the other side of the spectrum you see languages that are very strongly influenced by computer science theory. Lisp started out as an executable implementation of the untyped lambda calculus and has stayed fairly true to that origin. Haskell combines the typed lambda calculus with category theory and focuses on functional purity. Prolog is based on logic, and SQL on relational theory. What these languages offer, especially strongly typed ones like Haskell, is that they enable computer programs to be much more easily reasoned about by theorem provers (and similar) and therefore can provide a much higher degree of safety. To the initiated, they also provide much more natural and elegant abstractions.

Hybrid Languages

The idea of dynamic languages with option type annotations is often raised as a potential means to bridge the divide between the advocates of static and dynamic languages. Common Lisp provides optional compile-time type checking, but in practice is seems to be used mostly as an optimization in special cases rather than as a means of ensuring correctness. Some touted StrongTalk as an answer, especially when Sun released it as open source, but it seems to have sputtered out. There was much talk about adding optional type annotations to Python, but I believe (please correct me if I am wrong) it is still absent from Python 3000. So while optional static typing is a favorite debate topic, its not popular in the languages that have it and attempts to add it to languages that don't have yet to reach fruition, despite considerable effort.

A recent phenomena, or perhaps simply recently receiving attention, are hybrid languages that attempt to blend concepts from strongly-typed functional languages such as Haskell with more mainstream object-oriented underpinnings. In my opinion, Scala is probably the best of these, as it offers innovative OO features in addition to functional ones, but others such as F# and OCaml certainly deserve mentioning, as do no doubt countless others that I am unfortunately going to neglect.

Conclusion

I think that hybrid static/dynamic languages will never really become popular - or more accurately that the capability will not be extensively used in the languages that offer it. There are a couple reasons for this. Primarily, I think that making static typing optional almost completely eliminates the benefits of static typing. Second, I think the primary advantage of dynamic languages is that they allow partially formed thoughts to be quickly expressed in "working" (meaning running, not correct) code, and that static typing is a major barrier to this. I personally find doing exploratory programming in dynamic languages to be much more pleasant and productive, but once ideas are concrete I prefer the compile-time checks and completeness of static typing.

I personally believe that languages that blend the engineered structure of OO with mathematical formalism represent the future of programming. Scala and F# are here, and both Java and C# are gradually acquiring features from strongly typed functional languages. What's going to be interesting is how it shakes out. If you peruse the Scala mailing list archives, you will notice that there is a marked tension between those from an object-oriented (engineered) perspective who enjoy the extra power that functional programming provides them, versus those from a more pure functional programming background (mathematical). Ultimately, at least from a popularity perspective, I think the more OO approach will win, as historically languages engineered to provide specific advantages have won out over languages with robust mathematical underpinnings.

Sphere: Related Content

Erik Engbrecht's Blog

Wednesday, March 05, 2008

Does Google Reader read GMail?

Monday, March 03, 2008

Floating Points Stike CNN!

Monday, February 18, 2008

Typesafe Ranged Integers in Scala

Integer Underflow and Overflow

Bounded Ranges

Rationals

Friday, February 08, 2008

Linguistic Success

Monday, January 21, 2008

Programming Language Continuum

Introduction

Enforced Structure vs Dynamism

Engineered vs Mathematical Foundations

Hybrid Languages

Conclusion

Blog Archive

Labels

About Me