Comparison of the use of Amazon's EC2 and Windows Azure: cloud computing and the implementation of Map-Reduce
Juan G Diaz
The City University of New York
Cloud computing has been widely accepted as a one-stop shop for dynamically scalable, virtualized resources delivered as services over the Internet. Customers can virtually "outsource" their infrastructure and concentrate on development, while third-party providers take care of these services and supply computing power as needed. Key companies have been exploiting this model, turning a profit while at the same time creating new tools for programmers to work with.
These companies, however, provide different sets of tools for programmers. One takes the open source route; the other takes a closed source approach, with very limited support for non-proprietary software.
Amazon provides Amazon Elastic Compute Cloud (Amazon EC2), which presents a virtual computing environment. Users can launch instances of a variety of operating systems, access them through a web service, load them with custom application environments, and create as many images of an instance with the same configuration as needed.
Amazon EC2 is one of a large number of web services marketed as Amazon Web Services (AWS). The family also includes Alexa Web Services, Amazon Associates Web Services, Amazon AWS Authentication, Amazon CloudFront, Amazon DevPay, Amazon Elastic Block Store, Amazon SimpleDB, and Amazon Elastic Map Reduce, among others.
EC2 offers a flexible environment: using it is as simple as creating your own virtual machine with your own configuration and uploading it to Amazon's platform. You can also use Amazon's predefined virtual machines, or pick from a community-contributed selection of images with many different configurations. As a result, EC2 can run virtually any programming language on any OS flavor. The use of Amazon Elastic Map Reduce, however, is restricted to several programming languages, which will be discussed later on.
Microsoft offers a similar product called the Windows Azure Platform, which is a form of virtual computing environment that also offers development services. It uses the Windows Azure operating system, which means that applications run only under a Windows Server-like operating system, with no support for Unix/Linux operating systems. This operating system is enhanced to take advantage of being on a cloud and is designed for high availability and dynamic scaling to match users' needs.
This platform includes five services: Live Services, SQL Services, .NET Services, SharePoint Services, and Dynamics CRM Services. These are the tools developers use to build applications on the Azure cloud. The services are made available by adding to Visual Studio a library that manages the use of the Azure platform.
Azure currently runs on the Microsoft Visual Studio environment and the .NET Framework, supporting ASP.NET applications and providing the associated methods to deploy them to the cloud. Support for third-party and open source software is very limited at the moment, but some of the most popular tools and languages are allowed, such as Eclipse, Ruby, PHP, and Python.
Cloud Services Comparison (Map-Reduce)
When comparing cloud services, we need to look at the most important tools for distributed computing on large data sets across a cluster of computers. One of these tools is Map-Reduce, a software framework introduced by Google. It is built on the map and reduce functions commonly used in functional programming, tailored for use on large data sets in cluster systems.
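As a rough illustration of the functional-programming roots mentioned above, the following Python sketch applies the built-in map and functools.reduce to a small list. The helper name sum_of_squares and the sample data are made up for this example; it shows only the two primitives the framework is named after, not Google's framework itself.

```python
from functools import reduce


def sum_of_squares(numbers):
    """Square every element (map), then fold the results into one total (reduce)."""
    # map: apply the same function independently to each element
    squared = map(lambda x: x * x, numbers)
    # reduce: combine the mapped elements pairwise, starting from 0
    return reduce(lambda acc, x: acc + x, squared, 0)


if __name__ == "__main__":
    # Hypothetical sample input: 1 + 4 + 9 + 16
    print(sum_of_squares([1, 2, 3, 4]))
```

Because each element is squared independently, the map step is the part that can be spread over many machines; only the reduce step has to bring results together.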
Map-Reduce functions work on (key, value) pairs:
Map takes one pair of data with a type in one data domain and returns a list of pairs in a different domain.
Map (k1, v1) → list (k2, v2)
The list (k2, v2) pairs produced by all Map calls are grouped together by key, creating one group for each distinct generated key.
Reduce is applied after Map and processes, for each key, the collection of values in the same domain.
Reduce (k2, list (v2)) → list (v3)
Each Reduce call produces either one value v3 or an empty return; the returns of all calls are then collected as the result list.
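The Map, grouping, and Reduce steps described above can be sketched on a single machine with a word count, the customary introductory example. This is a minimal sketch, not Hadoop or Google code: the names map_fn, reduce_fn, and map_reduce and the sample documents are assumptions made for illustration.

```python
from collections import defaultdict


def map_fn(doc_id, text):
    """Map(k1, v1) -> list(k2, v2): emit (word, 1) for every word in one document."""
    return [(word.lower(), 1) for word in text.split()]


def reduce_fn(word, counts):
    """Reduce(k2, list(v2)) -> list(v3): sum the counts collected for one word."""
    return [sum(counts)]


def map_reduce(documents):
    """Run Map on every input pair, group the output by key, then Reduce each group."""
    groups = defaultdict(list)
    for doc_id, text in documents.items():
        for key, value in map_fn(doc_id, text):
            groups[key].append(value)  # grouping step: one list per distinct key
    return {key: reduce_fn(key, values) for key, values in groups.items()}


if __name__ == "__main__":
    # Hypothetical input: document id -> document text
    docs = {"d1": "the cloud scales", "d2": "the cloud is elastic"}
    print(map_reduce(docs))
```

Here k1 is a document id, v1 its text, k2 a word, v2 a count of 1, and v3 the total for that word. In a real cluster the Map calls and the per-key Reduce calls would run on different machines.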
Apache Hadoop, which was inspired by Google's Map-Reduce, is a Java-based framework that supports data-intensive applications over a cluster system. Amazon presents Amazon Elastic Map Reduce as a web service that offers businesses, researchers, data analysts, and developers a cost-effective and easy-to-use way to process vast amounts of data. Amazon adopted this open source project instead of creating its own version of Map-Reduce. It was a wise decision, since Hadoop is Java based and widely used, and it combines well with Amazon's infrastructure of Amazon Elastic Compute Cloud (EC2) and Amazon Simple Storage Service (Amazon S3). Amazon uses the term Elastic for a service centered on the concept of a job flow. Each job flow can take data from Amazon S3 and distribute it to a specified number of EC2 instances running Hadoop (as many virtual instances as needed).
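To give a feel for what a job flow's map and reduce steps might look like outside Java, here is a sketch in the style of Hadoop's streaming interface, where a mapper and a reducer exchange tab-separated "key, value" text lines and the framework sorts the mapper output by key before the reducer sees it. It assumes the job flow is run through that streaming interface; the function names and sample lines are hypothetical, and the local sort merely stands in for the cluster's shuffle. In an actual streaming job the lines would arrive on standard input rather than from a list.

```python
from itertools import groupby


def mapper(lines):
    """Emit one 'word<TAB>1' line per word, as a streaming-style mapper would."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reducer(sorted_lines):
    """Sum counts per word; the input must already be sorted by key."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda pair: pair[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


if __name__ == "__main__":
    # Local stand-in for the cluster: map, sort by key, then reduce.
    sample = ["the cloud", "The elastic cloud"]  # hypothetical input lines
    for out in reducer(sorted(mapper(sample))):
        print(out)
```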
On the other hand, Microsoft Azure doesn't have any real capabilities to control virtual clusters programmatically. There have been some attempts to create an answer to Map-Reduce over a cloud. Moreover, Microsoft's cloud is still in beta, and the company has been expanding and changing its technologies.
One of them is DryadLINQ, a Microsoft Research project. As stated on its website, it is a simple, powerful, and elegant programming environment for writing large-scale data-parallel applications running on large PC clusters. Dryad is an infrastructure that allows a programmer to use the resources of a computer cluster or a data center to run data-parallel programs. A Dryad programmer can use thousands of machines, each with multiple processors or cores, without knowing anything about concurrent programming. DryadLINQ combines the Dryad distributed execution engine with Language Integrated Query (LINQ), making it easier to distribute parallel applications across thousands of clustered computers. Although this is a great tool created by Microsoft, DryadLINQ targets only cluster environments, which differ from a cloud computing environment, and this makes it difficult to port to Microsoft Azure.
There is also another interesting attempt to reproduce Map-Reduce: an open source application called MapSharp, hosted on Microsoft's CodePlex open source website. It started only a couple of months ago and is still in its infancy, but it looks very promising. Unfortunately, there is not enough information at this time to say more about this project.
Comparing Amazon and Microsoft, Amazon has the advantage of having been in the market for a longer period of time. By taking advantage of open source, Amazon found it easy and cost effective to include Hadoop in its array of web services for the cloud. Microsoft, conversely, still has a long road ahead, but its platform is very promising for the .NET world: its tools give easy access to the cloud by just adding a toolkit to the Visual Studio IDE, along with some limited access to other programming languages. However, it will take some trial and error before Map-Reduce functionality can be delivered on Windows Azure.
1. Amazon Web Services (AWS)
2. Amazon Elastic Map Reduce
3. Microsoft Azure
4. Microsoft Azure (What is Azure)
5. Map-Reduce: Simplified Data Processing on Large Clusters
6. Map Reduce
7. Apache Hadoop
8. Microsoft Research on Map-Reduce: DryadLINQ
9. CodePlex - Map-Reduce using MapSharp