11 August 2009

Microsoft: the bold, yet timid, giant

Contrary to popular belief, Microsoft is an innovator. I myself participated in ground-breaking technological efforts in my years as a program manager and planner in the Windows division, and the projects I worked on are just the tip of the iceberg. Unfortunately, many of those innovations are in areas that never matter in the grand scheme of things, or are undermined by a simultaneous streak of timidity that courses through the organization. Microsoft is bold and timid, all at the same time.

It is not my intention to smear Microsoft. To paraphrase Tolstoy, every family has its issues, and Microsoft is no exception. My overall experience at the company was a positive one. Much of the hostility, and consternation, towards Microsoft results from the simple fact that it is difficult for outsiders to comprehend the truly Herculean problems the company faces in creating and maintaining the most widely used software products in the world. So many people use Microsoft’s products, in so many ways, that almost any decision will wind up stepping on someone’s toes. Microsoft takes the commitment to prevent customer disruptions seriously (even if it doesn’t seem so from the outside), and moves heaven and earth in its efforts to do so.

That said, my purpose in this article is to outline some of the chronic problems I saw during my tenure at the firm. I can count on my friends still at the company to do their job highlighting the company’s strengths (which are significant). It should also be noted that Microsoft is in no way unique in having engineering and corporate political problems. But the examples which follow will at least illustrate that the company is not immune to such issues.

By the way, I use projects I was directly involved with for most of my examples simply because I know so much about them, not to imply that these are the most egregious missteps Microsoft has made. I have great respect for the people I worked with on these projects, and know that everyone was always doing what they honestly felt was best for the company, and its customers.


Brute force innovation

The centralized control over software development, and world-class engineering processes, allow Microsoft to pursue massive undertakings that are simply impossible, or unheard of, at its competitors. Senior managers can (and do) create mandates which set whole armies of engineers in motion to ensure that new initiatives, or technologies, are thoroughly supported in every single line of code.

In Windows Vista I led an effort to drive adoption of IPv6 throughout the entire operating system. We mounted a massive effort to get every engineering team in the Windows division to test, and qualify, their code to work properly with IPv6. I am proud to say that the end result was the world’s first operating system that is nearly 100% IPv6 compatible (with only a couple of unavoidable exceptions), and able to run on a network that has absolutely no IPv4 service.

This may be in the class of brute force innovation, but it’s innovation nonetheless. No other operating system has undergone the same degree of thoroughness, or dedication, to supporting IPv6. Linux hasn’t done it, OS X hasn’t done it. No one has. Having gone through the experience of managing this massive transition effort, I can attest that supporting IPv6 is NOT as trivial as it seems. Plenty of issues were uncovered in tests on IPv6-only networks, where we discovered strange (and unexpected) dependencies on IPv4.
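
The specific dependencies are not listed here, but one common kind of hidden IPv4 assumption can be sketched in a few lines. The Python example below is purely illustrative: the function names are hypothetical and it is not Windows code. It shows a parser that quietly assumes an address contains no colons, next to a version that also accepts the bracketed form conventionally used for IPv6 literals.

```python
def split_endpoint_ipv4_style(endpoint):
    """Hypothetical IPv4-era helper: assumes exactly one colon in 'host:port'."""
    host, port = endpoint.split(":")  # raises ValueError on any IPv6 literal
    return host, int(port)


def split_endpoint(endpoint):
    """Hypothetical protocol-agnostic helper.

    Accepts 'name:port', '192.0.2.1:port' and the bracketed '[2001:db8::1]:port'.
    """
    if endpoint.startswith("["):
        # Bracketed IPv6 literal: the host is everything up to the closing bracket.
        host, sep, rest = endpoint[1:].partition("]")
        if not sep or not rest.startswith(":"):
            raise ValueError(f"malformed endpoint: {endpoint!r}")
        return host, int(rest[1:])
    # Otherwise split on the last colon only.
    host, sep, port = endpoint.rpartition(":")
    if not sep:
        raise ValueError(f"malformed endpoint: {endpoint!r}")
    return host, int(port)
```

The first helper works on every IPv4 network and fails only once an IPv6 address reaches it, which is the kind of defect that tends to surface only when the IPv4 service is taken away entirely.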

On the other hand, as impressive as this achievement of reaching IPv6 purity is, one has to wonder why so much effort was expended in the first place. As far as I know, no one is running IPv6 pure networks in anything other than a test lab. Sure, the US Department of Defense mandated that all products it purchased would have to pass tests in an IPv6-only environment, but they eventually rescinded that requirement, and most software firms (and operating systems) were able to play along just enough to provide sufficient IPv6 support to avoid being banned from government procurement lists. After all, there are really only a handful of services that most people really use, and the fact that some functions of Linux or OS X don’t fully support IPv6 has never raised an eyebrow.

It can certainly be said that this unparalleled extensive support of IPv6 has meant absolutely nothing to end-users, who won’t be using IPv6 networks at home (or at work) for many years.

To be fair, Microsoft is in a somewhat unique situation in the marketplace, which forces its hand into unnatural acts that competitors can slough off. The government isn’t a big customer of OS X, and Linux is free, which allows it to slip customer requirements without any trouble. The fact that Linux can be customized to such an extensive degree also means that it is possible for a vendor to provide a genuinely IPv6 pure version of Linux merely by never installing the components that aren’t IPv6 compatible.

Over-engineering

Another of Microsoft’s chief vices is to over-engineer things. Engineers are always looking for ways to overhaul old sub-systems with monster functionality, and breath-taking designs. This is akin to building a rocket powered scooter with fuel cells when all a kid wants is a skateboard to get across the street.

For my case study of this phenomenon, I give you the Windows Filtering Platform (WFP) introduced in Vista. In conjunction with a completely rebuilt network stack (the value of which could also be debated), WFP represented the first time a comprehensive set of APIs was purpose-built in Windows to allow software developers to manipulate, and control, packets. This was a laudable effort indeed. The network stack in Windows XP (which was based on the one in Windows 2000) was never designed to allow third parties to examine, manipulate, and control, packets. It was simply never conceived that software developers would have a need for low-level packet manipulation. This was before anyone considered that the security threats on the Internet would lead to the wide-spread adoption of host firewalls, all of which depend on low-level access to packets.

A few crude APIs for working with packets were tacked on top of the Windows XP stack as an after-thought, but software developers needing this kind of functionality were largely forced to adopt strange and unnatural strategies to achieve their goals. Firewall developers would often create fake network drivers, pretending to be network hardware. It was easy to make mistakes with this kind of hack. For example, how do you ensure that your fake network interface card driver properly sends the power-up and power-down commands the operating system expects, even though no network card exists? Consequently, host firewalls quickly became one of the primary causes of crashes in Windows XP. No wonder the engineers wanted to remedy this in Vista by finally creating some APIs that were designed from the ground up to allow low-level access to the network stack.

So far so good, but this is where things run off the rails. Instead of creating some simple-to-use APIs for working with packets, the Microsoft engineers (including myself) decided to take the bull by the horns and build an all-encompassing filtering engine that would keep track of, and manage, all the various requests for working with packets that might be sent from numerous applications. Instead of leaving users struggling in a world of chaos, with different security systems sending conflicting orders to the stack to drop, or allow, network traffic, the all-knowing Windows Filtering Platform would intelligently determine which of all the various instructions should win to give the user what they want (i.e. a pleasant, yet secure, experience).

The reality, however, is that no vendor of security software would ever want to leave the decisions of what to allow (and disallow) up to someone else. Well into the beta cycle of Vista it became glaringly apparent that few, if any, firewall vendors were willing to adopt WFP since it would mean giving up control over network activity to the operating system. Microsoft engineers reluctantly conceded the point and created a “veto” flag for filters, which would ensure that nothing could override it when the option was set.

In the end, WFP can legitimately be called a success. Virtually every firewall in existence now uses WFP, and crashes due to firewalls have dropped significantly. But the dirty secret is that most filters placed into WFP use the veto flag, which pretty much renders the beautifully engineered filtering platform useless.
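
To see why the veto flag hollows out the arbitration, consider a toy model of the idea. This is a sketch only, written in Python with hypothetical names (`Filter`, `arbitrate`, `weight`); it is not the real WFP API, whose actual arbitration rules are not reproduced here. The assumption is simply that the engine normally lets the highest-weight filter decide, while any filter marked as a veto cannot be overridden.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Filter:
    owner: str          # who installed the filter (hypothetical field)
    action: str         # "allow" or "block"
    weight: int         # higher weight is consulted first
    veto: bool = False  # if set, this filter's decision is final


def arbitrate(filters, default="allow"):
    """Return (action, owner) for one packet under the toy rules above."""
    ordered = sorted(filters, key=lambda f: f.weight, reverse=True)
    # A veto filter short-circuits everything else, whatever its weight.
    for f in ordered:
        if f.veto:
            return f.action, f.owner
    # With no veto present, the engine's own ranking picks the winner.
    if ordered:
        return ordered[0].action, ordered[0].owner
    return default, None
```

In this model, a low-weight vendor filter with `veto=True` beats a high-weight policy filter every time. Once most filters set the flag, the ranking logic is never reached and the outcome depends only on who vetoed, which mirrors the situation described above.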

If Microsoft had only set out to create some easier, comprehensive, APIs for packet manipulation, they could have achieved the same result with a significantly lower expenditure in resources. Further, the WFP APIs are needlessly complex for what they are actually being used for, thereby creating more expense for third party developers.

This is a classic example (amongst many) where Microsoft could have achieved the same goal by scaling down its ambitions and actually doing a little less innovation. There was nothing devious afoot with WFP. There was no grand scheme to undermine security, or put any partners out of business. Microsoft’s engineers were diligently trying to solve a real problem (i.e. the chaos of conflicting security systems messing with packets), but lost sight of the market realities.

Unfortunately, I saw similar situations occur in Windows 7, and have heard talk of other such ambitious schemes for future OS releases.

Fields of dreams

Luckily, the Windows Filtering Platform did not fall into the category of ignominy reserved for ambitious technologies that never quite take flight. At least WFP has become widely adopted. By contrast, there are many significant technologies, and platforms, that are built into Microsoft’s products that are only ever used at the margins, and never see wide adoption. The reasons these failures occur are numerous, but the result is the same: a technology gets built that just collects dust but never gets removed, and still requires constant maintenance. This is what I call the “field of dreams” phenomenon: a belief that if Microsoft creates something, then people will use it.

Picking on networking technologies again (merely because I know them so well, having worked in the network engineering team), Peer to Peer (P2P) networking and IPSec figure prominently here. Yes, you can point to examples of where some organizations have adopted IPSec to secure their network traffic, but the reality is that these significant engineering efforts have never really seen wide-spread usage.

P2P and IPSec are particularly good examples of how good innovations go awry. P2P is a whole suite of technologies, built on top of IPv6, which are designed to make peer-to-peer networking a tour de force in Vista, allowing 3rd party developers to easily incorporate peer-to-peer features. On paper this all sounds great, but constructing this on top of IPv6 (which no one uses), and the lack of key functionality make this unappealing for most developers to consider. The fact that only limited P2P functionality was introduced on Windows XP was also the kiss of death. Which software developer wants to adopt a technology that can only be used on the latest operating system?

IPSec has been suffering similar problems since Windows 2000. Over the years Microsoft has put a huge investment into creating innovations to make the management of fully authenticated, and encrypted, networks a reality. IPSec is an old IETF standard (supported by every OS in existence), but it is rarely used due to the sheer complexity of managing certificates, and creating policies. When IPSec is used, it is generally only for specific functions like VPN access. If you have a network with nothing but Windows Vista and Windows Server 2003 machines, it is quite possible to ensure that all network traffic is fully secured, at all times. The edge firewall is obsolete.

Despite many years of effort, wide use of end-to-end IPSec has yet to get off the ground. There are some very noteworthy case studies where organizations have fully deployed IPSec throughout their network (including Microsoft), but these are the exceptions. Poor cross-platform support has plagued Microsoft’s IPSec efforts from the very start. This is exacerbated by compatibility lags between each release of Windows. New functionality released in Vista (to overcome some IPSec deployment issues) was never brought back to Windows XP.

More importantly, Microsoft’s entire IPSec strategy ignored the needs and wishes of key players in the networking space (has anyone heard of Cisco?). Understandably, most network technology providers have always been luke-warm, at best, to Microsoft’s IPSec vision. What value is there for all manner of network security, and traffic management, products when all traffic becomes encrypted? When all traffic is gobbledygook, automated management tools can’t differentiate between traffic being used for Skype, accessing e-mail servers, or video games. So much for the ability of IT managers to give higher priority to particular types of traffic, or block others altogether.

Of course, Microsoft’s engineers have answers for most of these things (e.g. some 3rd parties build Linux plug-ins for Microsoft’s IPSec policy management system, and there are strategies for allowing IPSec to be deployed without encryption that allow traffic to be managed), but the end result is the same: IPSec is still only used on the margins. Most of these problems were foreseeable early on, but institutional momentum, and a grand vision that is too compelling to die, has kept Microsoft plugging away at it for a decade.

IPSec and P2P are yet more examples of Microsoft’s significant achievements in fruitless innovation. Who knows, maybe the Direct Access VPN feature in Windows 7 will be the technology that finally pulls IPSec out of obscurity, but I have my doubts. VPN usage is in terminal decline, as more and more corporate hosts get put directly on the Internet. Most organizations already allow access to e-mail directly from the Internet (which is the most common reason people need VPN access to corporate networks), and many other key services are being put directly on the Internet as well (e.g. CRM with Salesforce.com, etc). Direct Access also continues to suffer from the perpetual Windows curse of poor down-level and cross-platform support.

Little orphans

At least IPSec manages to sustain enough momentum to see continual improvements across multiple operating system releases. There are many other technologies that are not nearly so privileged.

There are numerous grand initiatives, with the best of intentions, which get built, but become orphans almost from the day they are released. Quite often these ambitious projects fail because the feature list is cut back so drastically to allow them to ship on schedule, that they lack critical capabilities that would make them popular. I know this is difficult for outsiders to understand, but many Microsoft engineering groups run on shoe-strings. I was a program manager for one technology in the networking group that had just one developer, when a small competitor with only $50 million in revenue had 10 engineers working on this. Is there any wonder that our spunky competition was so easily able to run circles around us, adding features, and capabilities, that we could only dream of?

These resourcing problems are particularly acute in Microsoft’s big products (like Windows and Office), since so much is bundled together. This makes it very difficult to know just how many resources should be devoted to any one area since there is no way to tell which features are most responsible for generating OS sales. When product teams make their cases to executives for resources, they are hard pressed to show how much doing a particular feature will contribute to increasing overall revenue.

Any technology that doesn’t get widely adopted after its initial release is liable to find itself abandoned, and never improved upon later down the road. ClickOnce is an example of just such a technology. It was initially envisioned as a replacement for the Windows Installer (a.k.a. MSI) that would be free from the myriad headaches faced by application installation packages. However, ClickOnce functionality was pruned so much to allow it to ship on time that it would only work for the most basic types of applications that had no need of using any kind of operating system extensions (e.g. the ability to automatically open an application when clicking on a designated file type).

It is far easier for developers to create ClickOnce installation packages than to do so with the Windows Installer, and ClickOnce applications are easily updated, and offer few nasty side effects for end users to worry about. Unfortunately, the limitations of ClickOnce render it unusable for most software.

The paltry uptake of ClickOnce, after its initial release, resulted in a virtual stop to the original vision of creating a grand new replacement for the Windows Installer. It was too difficult to make a compelling argument that more money should be spent on ClickOnce to actually realize the original intent since usage was so low. This is a circular problem. You will never get sufficient adoption without additional investment, but you can’t justify the investment without getting the adoption.

An unfortunate side-effect of all the prevarication over ClickOnce is that the Windows Installer itself (the veritable workhorse for installing the majority of software written for Windows) has been put on ice as well. Microsoft has made only the bare minimum of investments in both ClickOnce and the Windows Installer for many years. Yes, there are some minor improvements to the Windows Installer and ClickOnce in Windows 7, but “minor” is the operative word.

When Microsoft’s engineers decide that a given technology is too antiquated, and needs to be put to rest, they put it on life-support, even if they don’t have any viable alternatives in the immediate future (there is ALWAYS talk of building some amazing new technology that will replace all the old ones, most of which never sees the light of day). The Graphics Device Interface (GDI) is yet another example of this phenomenon.

Sadly, this often means that key technologies (like the Windows Installer and GDI) can go through multiple OS releases, spanning a decade or more, with no real investments to speak of, when some very minor improvements could solve a lot of pain that developers (and users) are facing. Once the product teams reach a point where they feel that a given technology is antiquated, it is harder than pulling teeth from a rodent to get them to touch the code and make additional investments. Instead, energies will be spent spinning up proposals on the NEXT big technology that would replace everything.

Having seen enough of these grand schemes come and go over the years, I can honestly say that precious few of them ever amount to anything, and most wind up as instant orphans if they are lucky enough to get built in the first place.

Which brings me to a counter-intuitive conclusion: in many cases Microsoft would be far better served if it were less innovative, and really dug into the hard work of incremental improvements. I am not against taking risks, and making bold investments in new technologies. But if you know your new creation won’t be able to achieve a critical mass of functionality out of the gate, then you would be better off not even trying, and putting your resources into the tedious effort of improving what already exists.


10 comments:

  1. Thanks for the M/S information, you have specified what I have assumed true all along. Much of the "improvements" since Windows 98 are moot, users don't need them. I had the wonderful experience the last couple months to upgrade to Excel 2007 from 2003.....what a nightmare to update my monthly charts without a training class...LOL

    The bottom line, I use the Excel 2003 and 2007 the same, but the method of source data for charts was totally different and you know the joke....Help doesn't Help....LOL

    Remember when PCs first came out and they said we'd go paperless? LOL...ever since PCs there's been twice as much paper used. When I do a Word doc I always make a paper copy to check it....why is it checking on a computer screen always overlooks errors? LOL

  2. Michael,

    There are two areas where your posting fails the sniff test. The first is that you are attributing a series of decisions and policies mandated by one misguided networking VP in one corner of a single product, Windows, to all of Microsoft. Not all of Windows works that way, and many other product groups at Microsoft are much more customer focused.

    The second is the distinction between innovation and invention. Arguably, if you take Buxton's view, http://billbuxton.com/innovationInvention.pdf
    innovation focused on the extraction of value rather than invention. By the argument you present here, no value was extracted, hence you can not claim innovation. Again, other groups at Microsoft definitely drive both invention and innovation (Sharepoint comes to mind as a clear example).

    Food for thought.

  3. you are attributing a series of decisions and policies mandated by one misguided networking VP in one corner of a single product

    You will notice that I mentioned GDI and installation technologies which were clearly NOT under the purview of networking executive management. I know of many other examples outside of networking I can cite (even in IE), but I stuck to the ones I knew best for this post.

    By the way, I should make it clear that not EVERYTHING the product groups do is a disaster. In fact, I believe the good outweighs the bad. And even the glitches I outlined in my article aren't complete debacles. WFP, for example, actually does work, and is serving an important function even today.

  4. Oh, God, Mikhail...

    "[..]a completely rebuilt network stack (the value of which could also be debated)[...]"

    Please, just keep that Vista/Windows 7 thing out of too much "innovation" at TCP/IP level. It keeps randomly disconnecting and reconnecting to a (perfectly functional) IPv4 network, based on Linux routers, making the usage of real-time applications impossible...

  5. You provide an interesting perspective on Microsoft's design and engineering decisions. Yet, although you allude to it, you fail to mention the core problem: the brute force innovation you discuss is required, because many decisions at Microsoft lead to inelegant and overly complicated designs. Microsoft's application programming and user interfaces are full of inconsistencies. See, for instance, how many UI elements offer a right-click help or how many ways there are in Win32 to find a text string's width. This happens, because designs are apparently produced with more thought to scratching today's itch and conquering a market than to architectural and technical elegance. Later on, as you write, these inconsistencies come back to bite with a vengeance as Microsoft tries to evolve the product while staying compatible with legacy code. The world is full of timeless elegant designs: the Unix API and its shell, Ethernet, OpenGL, the C programming language, TCP/IP, Postscript, the web. It's no coincidence that none of them was developed by Microsoft.

  6. Overall - a great post, and many of us REALLY appreciate the hard work in getting IPv6 integrated into new products.

    I do disagree with "It can certainly be said that this unparalleled extensive support of IPv6 has meant absolutely nothing to end-users, who won’t be using IPv6 networks at home (or at work) for many years." ... the IPv4 exhaustion 'brick wall' leads me to believe it will be in the 'couple' to 'few' years, not 'many' :).


    /TJ

  7. A great post - thanks Mikhail

    In addition to the insights into process and decision making, I think it also demonstrates that Microsoft inculcates, and/or hires for, a tendency towards self-criticism that verges on self-flagellation

  8. I think it also demonstrates that Microsoft inculcates, and/or hires for, a tendency towards self-criticism that verges on self-flagellation

    This isn't the impression I had. The people who seemed to have the most successful careers at Microsoft were those who were willing to step up and execute whatever the goals of their managers, and organizations, with a minimum of questioning.

    Let's not forget that I am not working at Microsoft anymore.

  9. While some critique the fact that this is about a small segment of the whole, everything is a fractal.

    food for thought.

  10. I'm going to have to agree with MSFTie on this one.
    I spent 3 years in the Networking group under a certain VP, buried to the neck in the same things you were, and while I was there, I knew it was dysfunctional, but I never understood why. I thought all groups worked that way -- just like you do.

    It really wasn't until I left and experienced other groups both in and outside of Windows that I understood that it was *not* always a Microsoft thing.

    The VP in charge was the source of the dysfunction. He was (is) a smart guy, with grand ambitions and the best of intentions, but he should never have been a VP. His visions were never backed up by a clear plan to bring those inventions to market, and he had so many things going at once that he starved everything. He never saw a technology or project that he didn't want to participate in.

    He also built one of the most dysfunctional organizational structures I've ever seen or had the sadness to be a part of: an endless series of PUMs and GMs (a GM with one report who is a PUM? WTF?) that fought amongst themselves for increased charter or headcount. Even worse, he trained a mini-generation of leaders whose primary skill was how to make their groups look world-changing so they get more headcount -- those leaders remained in charge of Networking even after he left.

    The dysfunction of the Networking group was an extreme case, but some of the same problems were echoed in the rest of the org-formerly-known-as-COSD -- and ultimately comes from a focus on core technology without particularly caring about the reality of users and developers.

    As for your other examples (ClickOnce and GDI), your perspective is a Windows-centric one. For the past 8-10 years, devdiv has been building their own abstraction (.Net), with the intention of replacing the Win32 API. ClickOnce is not a replacement for Windows Installer -- it's a .Net app deployer. Similarly, GDI+ was replaced by WPF (Avalon) and all of the innovation was there. Of course, neither of these technologies has much takeup -- because our customers' needs changed. Devdiv has been following our customers who are no longer interested in client development, and want to build everything on the web.

    Looking at devdiv's numbers in the last several years, clearly, it has been a very profitable strategy. But, if you worked in Windows (or were trying to develop non-.Net Windows client software), the clear impression is that we have APIs that were abandoned. Yup -- they were. But they were replaced by .Net APIs.

    If you look *just* at devdiv and ignore Windows, Microsoft made all of the right decisions. We've built a profitable business helping our customers do exactly what they want to do -- build web apps/sites. In fact, we took on the JavaEE juggernaut -- backed by *every* single one of our major competitors -- and have successfully created an ecosystem that matches or exceeds it. That is MS being bold, *and* delivering for our customers and partners.

    Now, you could argue that we have ignored Windows client development, and ClickOnce & GDI are examples of that. That's true. But, sometimes you have to spend your time doing one thing right, and letting other things drop, even if it’s painful. And that's what devdiv did and it paid off for them.

    I'm sorry Michael, but you spent too much time in Networking (and Windows), and you've come away with very skewed viewpoints on Microsoft development. You're not wrong -- we have made many missteps, particularly in Windows and in the consumer space in general -- but you have seen some of the more extreme cases -- and inappropriately applied them as general rules across the company.

    We have been bold and successful in a number of areas – Exchange, SQL Server, CRM, SharePoint, DevDiv, and in the enterprise space in general.

    Happily, most of those leaders are gone from Windows (and Mobile, and Online), and the new leadership are smart *and* sensible with the right strategy in place to be both bold *and* successful.
