iPhone. Single. Looking to make friends on any network.

I'm at SFO, connected to the public wifi, and in the span of 15 minutes have already denied my MacBook Pro (running Lion) permission to connect to over 40 iPhones and iPads. What's going on?

Being a geek, a security geek, and slightly paranoid about what's going on in my laptop, I use a wonderful little utility called Hands Off! This app lets me control network and file operations on a per-program basis. Since connecting to the SFO wifi, I've been bombarded with pop-ups like this one:

According to this site, usbmuxd is the "USB Multiplex Daemon. This bit of software is in charge of talking to your iPhone or iPod Touch over USB and coordinating access to its services by other applications."

Other posts link this to iTunes and iPhone/iPad synchronization. I don't own an iPhone (it's a nice device but I love my Nexus S), I do have an iPad, and I'm not currently running iTunes. Still, my laptop detects all sorts of devices on the network.

I wonder if the owners realize they're broadcasting their names loud and clear?

The next step is to connect to some of these devices to see what they say. Unfortunately I have a flight to catch!

My Hammer is Better than your iPhone

Since the iPhone 4S came out, I've heard that Steve Jobs wanted to destroy Android and that people are so much happier on their iPhones (even my friend Garry). But guess what? Though my primary computer is a MacBook Pro and I haven't been without an iPad since their launch, I really like my Android phone. Yes, an Android phone: a Nexus S.

Now I know a smartphone is just about the most personal piece of technology you can buy: We carry them everywhere, play with them constantly (or until the battery runs out), and fuss over them assiduously. In that light, this post isn't an attempt to prove to you that Android is better than iOS, just a desire to share some of its qualities I appreciate.

1. Keyboard. Yes, I know Siri is amazing (or not), but most of the time you'll still be typing on that tiny keyboard. On iOS, that keyboard has barely evolved in four years and it blows. On Android you can actually replace the default keyboard. My favorite is Swype: it's fast, fluid, and feels natural. It almost achieves (dare I say it) Apple-level elegance. If Swype isn't your thing, SwiftKey X most certainly will be.

2. Home screen. Android lets you do so much more with your home screen than iOS does. You can embed shortcuts to apps, documents, bookmarks, and even app-specific features. Widgets make your home screens even more useful by surfacing views into apps such as calendars, tickers, weather, etc. iOS 5 makes up for this a little with the updated notifications, but Android's options are far more powerful.

3. The buttons. Android has four buttons to iOS's one (which now has triple-click functionality; talk about overload). The Home button is there, as are Back, Menu, and Search. Back is the handiest IMO, esp. its ability to cross applications. Sharing something in one app? Go ahead, then hit Back and you're returned to your original flow. As an aside, one of my biggest beefs with Android apps is that they're not designed to take advantage of these buttons: Why include a magnifying glass on the screen when there's a search button available?

4. Long presses and sharing. Long presses, the ability to pull up a contextual menu by long-pressing an object on the screen, sound trivial, but used well they unclutter the UI and give users handy shortcuts to functions. Sharing, a feature almost all apps... share, lets you send data (text, URLs, tweets, pictures, etc.) from one program to another. Natural and powerful.

Android is by no means perfect and the iPhone has a lot going for it (it is, after all, a cathedral), but hopefully this post redressed the balance a little, at least until someone with a hammer comes along!

Belgians Love Android

OK, you may not think so, but when our waiter put this pot of steaming mussels in front of me the other day in Belgium, I couldn't help but think: "Boy! That really looks like the Android mascot!"

Come on! You can see the resemblance right?

No?! How about now? :-)

Taming Software Complexity

Complexity is everywhere in our world: in our ever-growing canon of laws, in the volatile & unpredictable nature of the stock markets (esp. now with the abundance of autonomous trading systems), in our tax code, and of course in the second law of thermodynamics. It's little wonder then, as programmers the world over know, that complexity pervades our software. Of all the long-term threats to applications, complexity is perhaps the second most critical (the first being no longer meeting user needs :-).

(Complexity can be beautiful too. Source)

Unfortunately, complexity goes hand in hand with success: the more popular an application, the more demands are placed on it, and the longer its "working life". Left to their own devices, both factors will increase complexity significantly. Life for a mildly successful app is even worse: low demand usually results in a never-ending "maintenance mode" where poor code stewardship often abounds.

Without ways to tame complexity, any evolving piece of software, no matter how successful, will eventually collapse under the load of simply maintaining it.

How is software complexity defined? Many techniques have been proposed, from simple approaches such as lines per method, statement counts, and methods per class, to more esoteric-sounding metrics like efferent coupling. One of the most prevalent metrics in use today is cyclomatic complexity, usually calculated at the method level and defined as the number of linearly independent paths through each method. Many tools exist to calculate it; at RelayHealth we've had good success with NDepend.
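
To make that concrete, here's a toy function (entirely made up, and sketched in Mathematica only because that's the language I use elsewhere on this blog; the metric itself is language-agnostic). Each decision point adds one more independent path:

     (* two decision points: the While test and the If,
        so the cyclomatic complexity is 2 + 1 = 3 *)
     ShrinkAndLabel[x_Integer] := Module[{n = x},
       While[n > 10, n = Quotient[n, 2]];   (* decision point 1 *)
       If[EvenQ[n], "even", "odd"]          (* decision point 2 *)
      ]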

Identifying areas of complexity in the code base is easy. The hard part is deciding what to do about them. Options abound...

Big Ball of Mud
The "Do Nothing" approach is always worth exploring and it typically results in Brian Foote's Big Ball of Mud. Foote wrote the paper as, however noble the aspirations of their developers, many systems eventually devolve into a big, complex mass. He also notes that such systems can sometimes have appealing properties as long as their working life is expected to be mercifully short. Fate often intervenes though and woe betide the programmers stuck maintaining a big ball of mud.

Creating a big ball of mud is easy: just add code and Mountain Dew :-)

Let's assume that you'd like to stay out of the mud. What other options are there?

Some simple process changes can help fight complexity:
  • Analyze code metrics upon checkin and reject the new code if the changed files don't pass complexity targets (this will initially slow down development if you impose it mid-flight, but it will improve your code quality).
  • Allocate bandwidth for complexity bashing: reserve capacity such as one sprint every release, or a percentage of total story points (e.g. 20% of all completed story points every month).
  • Temporal taming: Focus on different parts of the architecture over time, say a new area every month.
  • Something I've been wondering about: Are there processes that promote complexity? Or are some so time-consuming that they prevent developers from addressing complexity?
  • Automation is a powerful tool. You can easily add exceptions to a manual process ("Oh, well if it's an app server in cluster B, then we need to run this additional program") but an automated process is a lot harder to complexify, and if it needs additional steps at least you'll know their execution will be consistent.

Complexity has spawned many solutions at the architecture / software engineering levels, though even something as basic as ensuring developers all have a common understanding of the architecture and documenting its basic idioms can go far. Other solutions are very well covered in our industry:
  • Design patterns. Tried and true approaches to common problems.
  • Aspect Oriented Programming. AOP's focus on abstracting common behaviors from the code base can reduce its complexity.
  • Service Orientation. Ruthlessly breaking up your applications into disparate, independent services reduces the overall complexity of the system. This is an SOA approach but without the burdensome standards and machinery that armchair architects are prone to impose. One of my favorite examples of this approach, Amazon.com, has been using SOA since before anyone thought up the acronym. With loosely coupled services behind standard interfaces, it's much easier to update or completely replace a service than to do the same work in an inevitably intertwined monolithic application.

The most powerful weapon against the encroachment of complexity is culture: the shared conviction among developers that everyone needs to pitch in to reduce it.
  • Refactoring: developers should feel empowered to refactor code that's overly complex, not in line with the evolution of the architecture, or simply way too ugly. Two key enablers are required and both need a strong cultural buy-in:
      • A solid set of unit and other tests, so developers know if they've broken something
      • A fast build & test cycle. Most developers like to work in small increments: make a small change, test it. If it takes 15 minutes for a build & test cycle, very few developers are going to refactor anything that isn't directly in their path. I really like the work Etsy has done in this area, as well as on culture in general, by focusing on developer happiness.
  • Adopt this maxim: "Leave each class a little better than when you found it". Even if each change is small (adding a comment, reformatting a few lines of code), in aggregate these changes really add up over time.
  • Remove features. I heard one of Instagram's founders state that they spend a good deal of time removing features as well as adding them. That was probably a slight exaggeration, but removing features can be very powerful in terms of fighting complexity: both directly (fewer lines of code == lower complexity, except with perl ;-), and indirectly as a signal to the team and your customers.
  • What have I missed? I haven't written about complexity at the database level, though while we're on the topic, I suspect that however much I like NoSQL databases, their rise will increase data complexity in the long term. The leeway many of them give developers in storing information will make that data very hard to manage: data elements will be needlessly duplicated, inconsistencies within elements will abound, etc. Error recovery will be critical, as will a strong business layer to help provide consistency.

    (Another source of complexity! Source :-) 

    Happy simplifying!

    How eBay Scales its Platform

    In these days of discussing how Facebook, Twitter, Foursquare, Tumblr, and others scale, I don't often think of eBay anymore. Yet eBay, despite its age and ugly UI, is still one of the largest sites on the internet, esp. given its global nature. So I enjoyed this QCon talk by Randy Shoup, eBay Chief Engineer, about Best Practices for Large-Scale Websites.

    Here are a few lessons that caught my eye:
    • Partition functions as well as data: eBay has 220 clusters of servers running different functions like bidding, search, etc. This is the same model Amazon and others use
    • Asynchrony everywhere: The only way to scale is to allow events to flow asynchronously throughout the app
    • Consistency isn't a yes/no issue: A few datastores require immediate consistency (Bids), most can handle eventual consistency (Search), a few can have no consistency (Preferences)
    • Automate everything and embed machine learning in your automation loops so the system improves on its own
    • Master-detail storage is done detail first, then master. If a transaction fails in the middle, eBay prefers having unreachable details to a partial master record. Reconciliation processes clean up orphaned detail records
    • Schema-wise, eBay is moving away from strict schemas towards key/value pairs and object storage
    • Transitional states are the norm. Most of the time eBay is running multiple versions of its systems in parallel; it's rare that all parts of a system are in sync. This means that backwards compatibility is essential
    • "It is fundamentally the consumer's responsibility to manage unavailability and quality-of-service violations." In other words: expect and prepare for failure
    • You have only one authoritative source of truth for each piece of data but many secondary sources, which are often preferable to the system of record itself
    • There is never enough data: Collect everything; you never know what you'll need. eBay processes 50TB of new data / day and analyzes 50PB of data / day. Predictions in the long tail require massive amounts of data

    Platforms as a Service Revisited

    In September 2007, Marc Andreessen wrote a thought-provoking blog post describing a way to categorize different types of Platforms as a Service (PaaS). Over the years we’ve often made use of Andreessen’s levels within our Engineering team as a convenient way to discuss how we want our own platform to evolve.

    So I was surprised when I went looking for that article the other day and it was nowhere to be found on Andreessen’s blog! Fortunately I was able to rescue it from oblivion thanks to the Internet Archive. I’ve also uploaded a PDF of the article to Scribd.

    Andreessen’s premise is that there are three levels of internet platforms (the term Platform as a Service didn’t exist back then):
    1. At level 1 a platform is essentially a series of APIs in the cloud (another term that had yet to make its appearance) that app developers can leverage in their own apps.
    2. The prime example of a level 2 platform is Facebook. In addition to the APIs it makes available, Facebook also gives developers a way to embed their apps into its user interface.
    3. A level 3 platform achieves something the other two levels can’t: It runs developers’ apps for them. Examples here include Force.com (Salesforce.com’s PaaS) and Andreessen’s own Ning.

    As the levels go up they become easier for developers to program on and manage. A company working on a level 3 platform shouldn’t need to worry about hardware and operating systems. The categories aren’t perfect though: Amazon’s Web Services offerings are clearly level 3 (they run your code) while forcing you to still manage a virtual infrastructure.

    That said, many platforms do fit the model well. Thousands of companies offer APIs and therefore qualify as level 1. Platforms such as Google App Engine, Microsoft Azure, Heroku, EngineYard, etc. all offer flavors of level 3, some “purer” than others, i.e. with more or less hardware/OS abstraction.

    At RelayHealth we put a number of APIs at our partners’ and clients’ disposal. Some are web services, others rely on healthcare specific protocols such as HL7 or CCD riding on top of a variety of communication channels.

    Our approach to level 2 turns Andreessen’s definition inside out: instead of embedding third party apps into our UI, we make it easy for them to customize and embed our modules into their applications. This is important to many partners as building features like ePrescribing themselves is prohibitive. By providing these capabilities we enable our partners not only to deliver key features to their customers but also complete their EHR certifications (vital so their clients qualify for federal incentives).

    Whether you’re building a whole platform or just some simple APIs, Andreessen’s article is worth reading.

    Microsoft's Losing the Web Application War

    Netcraft's surveys of internet web servers have been a staple of the net since 1995. Eons in internet time! In those early days I fondly remember regularly checking Netcraft for updates and discussing the merits of the various web servers with friends and colleagues.

    I hadn't thought of Netcraft in years and when I suddenly remembered them the other day, I had to go check. How were the different web servers doing?

    I wasn't surprised by the rapid growth of Apache but I was surprised by the dramatic fall and subsequent slow rise of Microsoft's web server.  According to Netcraft the drop in Jan-Jun 2009 was caused by a reduction in activity in Microsoft's Live Sites.

    Looking at web server popularity in relative terms, the slow rise becomes a rapid decline in market share: from close to 40% to around 15% penetration in four years.

    It's useful to remember that there are lies, damn lies, and statistics. There could be many explanations for Microsoft IIS' relative decline. 

    One is that Netcraft is measuring incorrectly. Netcraft has been at this for a long time, so I'm going to assume they know how to count servers accurately. Part of the decline is due to Live Spaces moving to WordPress. Clearly Microsoft doesn't view blogging as strategic. Fair enough.

    Another point to keep in mind is that Netcraft's survey is internet-focused. If they could survey intranets, I'm sure the number of IIS servers would be significantly higher.

    Still, I can't help thinking that this is yet another front Microsoft is losing ground on. And the web server is just the tip of the iceberg. Internet sites aren't choosing Apache as much as they are choosing web application stacks that use it.

    Continued loss of web application stack market share will have significant repercussions in terms of revenue: hard costs such as server and software licenses, and soft costs such as losing popularity among developers. This isn't enough of a reason for established sites to ditch Microsoft. It is a reason to think twice before going with the Microsoft stack for new projects.

    It's a shame. Microsoft's web application stack has decent technology, and Microsoft has smart engineers. They are quite capable of innovating in this space. They're just not doing so.

    Learn the Zen of a Programming Language with Koans

    I love the idea behind Ruby Koans: write a set of failing unit tests that teach you about the essence of Ruby as you make every test turn green. It's a brilliant idea. The tests themselves are usually simple and illustrative. You even get encouragement (or enlightenment? :-) as you fix them.

    The good news is that this idea has spread beyond ruby. There are koans in many languages:

    While learning a programming language is best achieved by writing a useful application, these koans are a very welcome (and fun!) addition.

    Hashtables in Mathematica

    I fondly think of Mathematica as a "kitchen sink language": other than that proverbial kitchen sink, it has functions for pretty much anything you can think of.

    Why then does it not have a hashtable data type?

    It turns out that it doesn't need one. Hashtables are built into the language at a fundamental level. Just start typing:

     h[foo] = 1;
     h[bar] = 2;


    And you have a hashtable!
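
    Lookups behave just as you'd hope (a quick illustration of my own; baz is a made-up key we never assigned):

     h[foo]   (* returns 1 *)
     h[baz]   (* never assigned, so it just comes back unevaluated as h[baz] *)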

    It's not quite that simple though. What if you want to list all the keys used in the hashtable? This function (from a handy StackOverflow answer) takes care of that:

     keys = DownValues[#][[All, 1, 1, 1]] &;
     keys[h]
     { bar, foo }


    Recently I was playing in Mathematica with NOAA earthquake data provided in the form of a TSV (Tab Separated Values) file. Mathematica easily parses it into a list:

     ed = Import[NotebookDirectory[] <> "EarthquakeData", "TSV"]
     Take[ed, 5] // TableForm
     8204 Tsu 2009 1 3 ...
     8211 Tsu 2009 1 3 ...
     8210 2009 1 8 ...
     8250 Tsu 2009 1 15 ...


    This was a good start but the data wasn't in a very useful form. What I wanted was to be able to address the data by column name and row number, so I wrote this helper function:

     MakeHash[hash_, a_] := Module[
      {keys = First[a]},   (* the first row holds the column headers *)
      Do[
       hash[keys[[i]], j - 1] = a[[j, i]],   (* hash[column name, row number] = value *)
       {i, 1, Length[keys]}, {j, 2, Length[a]}];
      (* convenience entries *)
      hash[Dim] = {Length[a] - 1, Length[keys]};
      hash[Rows] = Length[a] - 1;
      hash[Cols] = Length[keys];
      hash[Keys] = keys;
      ]


    The first parameter is the name of the hash to create, the second is the array to parse (assuming the first row represents column headers). It's now easy to access the elements you want.

     MakeHash[ehash, ed]
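
    For example (the "Year" header below is hypothetical; the real column names are whatever the first row of the NOAA file contains):

     ehash[Keys]        (* the list of column headers from the file's first row *)
     ehash["Year", 3]   (* hypothetical header name: the "Year" value for data row 3 *)
     ehash[Rows]        (* number of data rows *)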


    You'll notice MakeHash adds some convenience entries in the hashtable. I even included one for keys, despite the function we defined earlier on. It ensures MakeHash is self contained and also deals with a limitation of the keys function as it stands. As we're dealing with a two dimensional hashtable, the keys function considers each key (i.e. ID,1 and ID,2 etc.) as distinct, so returns way too many of them.

     keys[ehash] // Length
     2074 (* woah! we expected 46 *)
     (* Let's fix this by eliminating dupes with Union *)
     keys = Union[DownValues[#][[All, 1, 1, 1]]] &;
     keys[ehash] // Length
     50

    Why 50 keys and not 46? Because MakeHash added four more: Dim, Rows, Cols, and Keys.

    Solving the Hyperpublic Coding Challenge in Mathematica

    I quite enjoy the various coding challenges that appear to be gaining in popularity. Greplin's was a lot of fun and yesterday Hyperpublic released one of their own. Since I just noticed that a number of people have posted their answers I thought I'd post mine.

    I like to use Mathematica for these challenges. Why? It's a really powerful environment, it's a lot of fun to use, I have a copy :-), and completing these challenges always teaches me more about Mathematica itself.

    Challenge 1

    In this challenge you're given a file representing which users have invited others; the challenge defines an influence metric and asks you to find the influences of the top three influencers.

    Read in the sample file
    l = ReadList[NotebookDirectory[] <> "Hyperpublic Q1.txt", String];

    Define a function that returns the positions of the Xs in a line
     CalcFriends[s_] := Map[(#[[1]] &), StringPosition[s, "X"]]
    This returns a list of the positions of the Xs in a String
    E.g. CalcFriends of the fifteenth line (which represents a user with four friends) generates the indices of those friends
    {12, 23, 84, 93}
    A line with all Os (no friends for this user) gives
    { }

    Map CalcFriends over the list of lines
    f = CalcFriends[#] & /@ l

    A recursive function to calculate a user's influence
    Influence[l_List] := Length[l] + Fold[Plus, 0, Influence[f[[#]]] & /@ l]
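
    As an aside, and not part of my original answer: mapping Influence over every user recomputes the influence of the same downstream users again and again, so on a larger data set you could memoize it in the usual Mathematica way (InfluenceMemo is just a name I made up):

     InfluenceMemo[l_List] := InfluenceMemo[l] =
       Length[l] + Total[InfluenceMemo[f[[#]]] & /@ l]
     (* Take[Reverse[Sort[InfluenceMemo[#] & /@ f]], 3] returns the same top three *)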

    Now we just map Influence over the output of CalcFriends and take the top 3

    Take[Reverse[Sort[Influence[#] & /@ f]], 3]

    Challenge 2

    Here we're (essentially) asked to find the minimum number of moves needed to achieve a target.

    For some reason, even though I knew this was a linear optimization problem, I started coding it. A mistake caused me to rethink my approach which, when you're using Mathematica, usually goes along the lines of "Why am I writing code?! I'm sure there's a function for this in here somewhere!" :-)

    Lo and behold, there was:
    FindMinimum[ {p1 + p2 + p3 + p4 + p5 + p6, 
    2 p1 + 3 p2 + 17 p3 + 23 p4 + 42 p5 + 98 p6 == 2349 && 
    p1 >= 0 && p2 >= 0 && p3 >= 0 && p4 >= 0 && p5 >= 0 && p6 >= 0 && 
    p6 ∈ Integers && p5 ∈ Integers && p4 ∈ Integers && p3 ∈ Integers && p2 ∈ Integers }, 
    {p1, p2, p3, p4, p5, p6}]

    But you should see Yaroslav Bulatov's solution for this problem; it's much more elegant.

    Fun stuff... Not only does Mathematica give you some great tools for solving problems, it also solves them fast.