I want to take a moment to address a weakness that I think exists in Microsoft's tool offerings. Microsoft does not offer tools for building ultra-high-performance, extremely fault-tolerant, widely distributed applications. While some may consider this a niche market, I do not. Furthermore, the applications, platforms, and tools that require such technology will become integral to a much wider array of applications through the services they offer. A concrete example is a search engine. Search engines have massive computing needs and must scale almost without limit. Or consider an application like an Electronic Medical Record (EMR) run as a hosted service by an application service provider (ASP). To handle the plethora of data from thousands of sources, and to keep the application always available, new computing approaches need to be developed.
Forget everything that you have learned about traditional applications. N-tier applications have difficulty scaling to the size required by a large-scale web application. Traditional components like databases will continue to exist and be important, but the features that matter will change. For instance, startups and large companies building web-scale applications have increasingly turned to MySQL. MySQL lends itself to horizontal scaling: to increase capacity, you add servers. This is very different from something like Microsoft SQL Server, where you typically scale up onto larger machines. Don’t think that MySQL is ready for prime time? Yahoo, Amazon, Nokia, and Google disagree.
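To make "add servers to add capacity" concrete, here is a minimal sketch of application-level sharding, one common way to spread rows across several database servers. It is only an illustration: the server names and the `shard_for` helper are hypothetical, and the routing happens in the application, not inside MySQL itself.

```python
import hashlib
from typing import Sequence

def shard_for(key: str, servers: Sequence[str]) -> str:
    """Pick a server for a key by hashing the key (hypothetical helper)."""
    # A stable hash keeps the same key on the same server across runs.
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return servers[int(digest, 16) % len(servers)]

# Hypothetical server names, used only for this example.
servers = ["db1.example.com", "db2.example.com", "db3.example.com"]

# Each user's rows live on exactly one server.
placement = {user: shard_for(user, servers)
             for user in ["alice", "bob", "carol", "dave"]}
```

Note that with plain modulo hashing, adding a server moves most keys to a different shard. Schemes such as consistent hashing exist to reduce that reshuffling.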
However, that is not a significant departure from traditional applications. Enter Hadoop. Hadoop is a distributed computing platform with many features similar to Google’s base technology. It implements MapReduce, and it has a distributed file system. This is an advanced, fault-tolerant file system that can copy and replicate data on demand for frequently accessed objects. Hadoop provides a simple infrastructure that developers can use to build applications that deal with enormous volumes of data. Why would companies that are not search engines be interested? Simple: Hadoop can be used to solve very demanding problems very cheaply. Want to build an EMR with advanced data mining functionality? Hadoop would enable the data to be analyzed quickly and inexpensively.
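For readers who have not seen MapReduce, the following is a toy, single-process sketch of the programming model in Python. It is not Hadoop's API (Hadoop jobs are normally written in Java), and the function names `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are made up for illustration.

```python
from collections import defaultdict

def map_phase(document):
    """Emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield (word, 1)

def shuffle(pairs):
    """Group emitted values by key, as a framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)

def word_count(documents):
    """Run map, shuffle, and reduce in sequence on a list of documents."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

counts = word_count(["the quick brown fox", "the lazy dog", "the fox"])
```

In a real cluster the map calls would run in parallel on many machines, close to where the data is stored, and the framework would handle the grouping step and any machine failures. The developer writes only the map and reduce functions, which is what makes the model attractive.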
Hadoop is built in Java and runs natively on Linux. For Microsoft, that is a problem. Already many web applications, especially Web 2.0 ones, run on LAMP. Microsoft has competitors to the LAMP components that are good, even if more expensive. The argument is that Microsoft tools save developer time, which may make up for the more expensive software, but that is a separate issue for another time. However, deploying applications that run on a large number of machines can lead to huge licensing costs. Each computer needs its own copy of Windows, even though these machines will be used only for computing and storage. Few Windows services will ever be used. So Linux makes sense in this situation.
Microsoft should do three things. First, Microsoft should sponsor a port of Hadoop to .NET. That will give Microsoft developers the same tools as their open source counterparts. Microsoft will need to hire a number of full-time employees to make this happen. Second, SQL Server needs to scale horizontally across many machines easily and cheaply. Third, there needs to be a version of Windows to support this. Windows Compute Cluster is not what we are looking for. It does not support .NET natively (although it can use P/Invoke) and is targeted, I think, at people with legacy applications. Information on the website is short on details. If someone wants to correct this, please do.
Microsoft needs to do something to address this niche but rapidly growing market segment.