CHOReOS was an FP7 project and is now fully completed. This website is kept online for information purposes only and is no longer updated. Please visit CHOReVOLUTION, the project that takes over from CHOReOS.
choreos.eu: Grid middleware component documentation (Documentation.grid_doc)

Grid middleware component documentation

Grid Middleware

For choreography-level applications that require the processing of large quantities of data or CPU-intensive tasks, CHOReOS offers a gateway to two different computational infrastructures: Apache Hadoop and InteGrade.

Features

  • Ability to process large datasets
  • APIs to support MapReduce, bag-of-tasks, MPI and BSP applications
  • Scalable middleware platform

How it works

The Grid middleware component implements RESTful APIs that enable the execution of CPU-intensive tasks in a grid computing environment by interacting with the open source InteGrade middleware or the Apache Hadoop framework. The former was designed for paradigms such as MPI and BSP, while the latter was designed to support applications that use the MapReduce programming paradigm. Through the available web services, users can start different types of applications (for instance, sequential or parametric applications) by providing the binary file and the execution information (e.g., program arguments, execution constraints, and preferences), keep track of their progress, and receive the results.
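As an illustration of the submission step described above, the following minimal sketch assembles the information a user would provide (binary, arguments, constraints, preferences) and selects the back end from the paradigm. The class `JobRequest`, its field names, and the JSON layout are hypothetical and are not the actual CHOReOS Grid API; only the paradigm-to-platform mapping comes from the text. Sequential and parametric applications are left out because the text does not state which platform runs them.

```python
import json
from dataclasses import dataclass, field, asdict

# Mapping taken from the text: InteGrade targets MPI and BSP,
# Apache Hadoop targets MapReduce.
PLATFORM_BY_PARADIGM = {
    "mpi": "integrade",
    "bsp": "integrade",
    "mapreduce": "hadoop",
}


@dataclass
class JobRequest:
    """Hypothetical job description; all field names are illustrative."""
    binary: str                                       # program to execute
    paradigm: str                                     # "mpi", "bsp" or "mapreduce"
    arguments: list = field(default_factory=list)     # program arguments
    constraints: dict = field(default_factory=dict)   # execution constraints
    preferences: dict = field(default_factory=dict)   # execution preferences

    def platform(self) -> str:
        """Return the back end that supports the requested paradigm."""
        try:
            return PLATFORM_BY_PARADIGM[self.paradigm.lower()]
        except KeyError:
            raise ValueError(f"unsupported paradigm: {self.paradigm}")

    def to_json(self) -> str:
        """Serialize the request as a JSON body for a REST call."""
        body = asdict(self)
        body["platform"] = self.platform()
        return json.dumps(body, sort_keys=True)
```

A request built as `JobRequest(binary="wordcount.jar", paradigm="mapreduce", arguments=["in", "out"])` would be routed to Hadoop, while a BSP or MPI request would be routed to InteGrade. How the body is actually sent, and to which URL, depends on the deployed service and is not shown here.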

The use of grid computing as a service may be particularly interesting for end-user applications or middleware components that require a high degree of parallelism for computationally intensive tasks. For example, the CHOReOS Grid could be applied in the WP7 use case. As specified in the Adaptive Customer Relationship Booster (ACRB) use case, users should obtain personalized discounts and promotions on the fly and in a context-aware manner through their own smartphones. Personalized recommendations are demanded by companies and sales managers who need highly personalized, short-term marketing campaigns run in a flexible and distributed way. They also expect to gather fine-grained feedback on customer expectations and needs in near real time. This feedback is essential for producing such recommendations. Delivering them in near real time requires high processing and storage capacity, which the Grid middleware can provide using InteGrade and Apache Hadoop.

The Grid middleware component depends on an Infrastructure as a Service (IaaS) provider, whose role is to provide virtual machines (VMs). Once the VMs are instantiated, users can deploy the appropriate platform (Hadoop or InteGrade) to perform the computations. AWS can provide pre-configured instances of Apache Hadoop VMs through Amazon EMR (Amazon Elastic MapReduce), which scales according to the users' needs. Besides cloud-based collections of VMs, the Grid middleware component could use a federation of clusters with dedicated physical machines. In this case, the clusters would provide all the infrastructure and platforms needed to accomplish the computation.

Download the Grid middleware API.

Targeted audience

This component targets deployers who would like to use service composition with large-dataset processing. It is especially useful for datasets that offer a high degree of parallelism and whose processing time on a single machine is prohibitive. In such cases, platforms that support MapReduce, MPI, and BSP may provide near-real-time results when processing these data, possibly yielding major improvements in service composition performance.
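To make the kind of parallelism mentioned above concrete, the following minimal sketch shows the MapReduce pattern on a word-count problem in plain Python. It runs on a single machine and does not use Hadoop; the function names are illustrative only. It is meant to show why such workloads distribute well: every map call is independent of the others, and every key can be reduced independently.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values of one key into a single result."""
    return key, sum(values)


def word_count(lines):
    # Each map call depends only on its own line, so a framework is free
    # to run the calls on different machines; here they run in a loop.
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
```

For example, `word_count(["a b a", "b c"])` returns `{"a": 2, "b": 2, "c": 1}`. In a real deployment the map and reduce functions would be handed to the framework, which takes care of splitting the input, shuffling, and scheduling.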

Requirements

The main requirements to run the Grid component are an IaaS provider account and VMs instantiated with the chosen platform.

The recommended IaaS provider is AWS, especially Amazon EMR. Please note that, when using Amazon as the IaaS provider, you are responsible for the costs.

Supported technologies

The Grid middleware component supports several parallel and distributed computing paradigms, such as MapReduce, MPI, and BSP. Most user programs will be written in Java, although the platforms also allow the use of Python or C/C++. For Apache Hadoop, a set of RESTful APIs gives access to the file system, the resource manager, and the application manager. These APIs allow more complete access to the framework, making it easier to use as a service.
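As a hedged illustration of the Hadoop REST access mentioned above, the sketch below builds request URLs for two of Hadoop's own HTTP interfaces: WebHDFS for the file system and the YARN ResourceManager API for listing applications. It is an assumption that these are the interfaces the text refers to. The helper names and host names are hypothetical, and the ports shown (9870 for the NameNode in Hadoop 3, 8088 for the ResourceManager) are common defaults that a given cluster may override.

```python
from urllib.parse import quote, urlencode


def webhdfs_url(host, port, path, op, **params):
    """Build a WebHDFS URL: http://<host>:<port>/webhdfs/v1/<path>?op=<OP>."""
    query = urlencode({"op": op, **params})
    return f"http://{host}:{port}/webhdfs/v1/{quote(path.lstrip('/'))}?{query}"


def yarn_apps_url(host, port=8088, **filters):
    """Build a ResourceManager URL that lists cluster applications."""
    base = f"http://{host}:{port}/ws/v1/cluster/apps"
    return f"{base}?{urlencode(filters)}" if filters else base
```

With these helpers, `webhdfs_url("namenode.example.org", 9870, "/user/alice/input", "LISTSTATUS")` gives the URL for listing a directory, and `yarn_apps_url("rm.example.org", states="RUNNING")` gives the URL for listing running applications. Both endpoints answer HTTP GET requests with JSON; authentication and error handling depend on the cluster configuration and are omitted here.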

License

Apache Hadoop is licensed under the Apache License, Version 2.0, which is considered a permissive license.

InteGrade is licensed under the LGPL license, and its source code and documentation are also available as free software.


This wiki is licensed under a Creative Commons 2.0 license - Legal Notice