Open Cirrus Singapore
Thursday, 6 May 2010

Session Chair: Donald Tan (National Grid Office)

10:30am - 11:10am Open Cirrus & HP Labs Update
by Jason Tan (HP)
11:10am - 12:30pm Open Cirrus Sites Update
by Napat Chalakornkosol (IDA), Thillai Raj (MIMOS), Han Namgoong (ETRI), Dr. Marcel Kunze (KIT)

Session Chair: Napat Chalakornkosol (National Grid Office)

2:20pm - 2:40pm Optimizing Model Training for Speech Recognition
by Tan Yu Shyang (NTU)
2:40pm - 3:00pm Improving MapReduce Fault Tolerance in The Public Cloud
by Dr. Qin Zheng (IHPC)
3:00pm - 3:20pm A Performance Study of MapReduce Framework
by Shi Lei (NUS)
3:20pm - 3:40pm Using Hadoop for Taxi Data Analytics
by Koh Jun Yong Alvin (SMU), Nguyen Xuan Khoa (SMU), Dr. C. Jason Woodard (SMU)

Session Chair: Napat Chalakornkosol (National Grid Office)

4:00pm - 4:20pm Cloud Computing: Multiplier for Societal Impact
by Thillai Raj (MIMOS)
4:20pm - 4:40pm On The Design of Efficient Resource Allocation & Workload Management Strategies for Large-Scale Compute Cloud Environments
by Dr. Bharadwaj Veeravalli (NUS)
4:40pm - 5:20pm Discussion with HP Labs
by Chris Whitney (HP)


10:30am - 11:10am
"Open Cirrus & HP Labs Update"
Speaker: Jason Tan (Director, Operations, HP Labs Singapore)

Abstract
Open Cirrus was started as a joint partnership between HP, Intel and Yahoo in July 2008. The platform provides an open cloud-computing research testbed designed to support research into the design, provisioning, and management of services at a global, multi-datacenter scale. The open nature of the testbed is designed to encourage research into all aspects of service and datacenter management. In addition, we hope to foster a collaborative community around the testbed, providing ways to share tools, lessons and best practices, and ways to benchmark and compare alternative approaches to service management at datacenter scale.

In this talk, we will share some of the updates on Open Cirrus at a worldwide level and also provide information on some of the research work done on this platform by HP Labs.

Biodata
Jason Tan is Director of Operations for HP Labs Singapore and holds a joint appointment as Program Director for Strategic Research and Infrastructure at Hewlett-Packard Singapore. In his capacity as Program Director, he worked with local country management to drive new growth initiatives for HP Singapore, including Cloud Computing and software as a service. As Director of Operations, he drives research activities between Singapore and HP Labs Singapore. Prior to this, he was Business Development Manager for High Performance Computing (HPC) and online gaming at HP South East Asia. His key responsibility was to develop the grid computing, life science, digital media and online gaming industries in the region. Jason has extensive industry and technical experience in HPC, especially in Life Science, Grid Computing and online gaming. He worked as country strategic business manager in Singapore and drove several key initiatives there, such as HP's participation in the National Grid program, utility computing, research collaboration, digital media and online gaming. Jason holds a bachelor's degree in Electrical and Electronics Engineering from Nanyang Technological University, Singapore.

. . .

11:10am - 12:30pm
"Open Cirrus Sites Update"
Speakers: Napat Chalakornkosol (Assistant Manager, IDA), Thillai Raj (CTO, MIMOS Berhad), Han Namgoong (Director, Cloud Computing Research Department, ETRI), Dr. Marcel Kunze (Head, Cloud Computing Research Group, KIT)

Biodata of Napat Chalakornkosol
Napat is an Assistant Manager at the Infocomm Development Authority of Singapore (IDA). Before joining IDA, Napat was involved in various projects on High Performance Computing, Cluster Computing, Grid Computing and Cloud Computing. He conducted Grid-related training courses and provided his expertise in grid-enabling applications under the National Grid Competency Centre. At IDA, Napat manages the Open Cirrus Cloud Computing projects and conducts the Hadoop for Users Training courses.

Napat graduated with a Bachelor's Degree in Computer Engineering from Kasetsart University, Bangkok, Thailand, where he was a member of the High Performance Computing and Networking Center. Prior to his graduation, his final-year project on a Cluster & Grid monitoring platform was nominated and presented at IEEE CCGrid 2003, Tokyo.

Biodata of Thillai Raj
Thillai Raj holds a Bachelor of Technology in Engineering from the Indian Institute of Technology in Chennai, India and joined MIMOS with more than 20 years of experience in Electronic Design and Manufacturing.

As CTO and Head of Software Development & Central Engineering in MIMOS, he is the driver behind the National Grid Computing Initiative, also known as KnowledgeGRID Malaysia. Mr. Thillai Raj is also running Advance Software Development consisting of:

  • Information Security Software
  • Image Processing
  • Wireless Software Development
  • Semantic Software Lab
  • Cloud Computing

He was previously with Motorola as a Director for the Systems Engineering Group and with Flextronics as a Senior Director of Global Engineering. Later, he was instrumental in setting up the Motorola Software Design Centre in Cyberjaya. The design centre achieved SEI CMM & CMMI Level 5 under his guidance.

Thillai Raj has 7 patents to his name and holds a Six Sigma Blackbelt.

Biodata of Han Namgoong
Han Namgoong is the Director of the Cloud Computing Research Department of the Software Research Laboratory at the Electronics and Telecommunications Research Institute (ETRI), South Korea.

Biodata of Dr. Marcel Kunze
Heading the research group “Cloud Computing” at the Karlsruhe Institute of Technology, Dr. Marcel Kunze is a technical lead in the Open Cirrus HP-Intel-Yahoo Cloud Computing testbed.

After physics and informatics studies at Karlsruhe University, Bochum University and CERN, he joined the BABAR collaboration at SLAC / Stanford University to investigate and further develop the Grid Computing paradigm for distributed processing of particle physics data. In 2002 Dr. Kunze joined Research Centre Karlsruhe as a department leader for “Grid Computing and e-Science” to work on the establishment of the LHC Computing Grid. He was spiritus rector of the German D-Grid initiative and served for many years on the management board of the EGEE project, representing Germany and Switzerland. At the recently founded Karlsruhe Institute of Technology he is now committed to R&D in the field of service-oriented architectures, virtualization techniques and system development for Cloud Computing. His teams are active in many national and international projects, the most important being D-Grid, EGEE, EUFORIA, g-Eclipse and Open Cirrus.

. . .

2:20pm - 2:40pm
"Optimizing Model Training for Speech Recognition"
Speaker: Tan Yu Shyang (Project Officer, School of Computer Engineering, NTU)

Abstract
Modern speech recognition systems are generally based on statistical models which output a sequence of symbols or quantities. These models can be trained automatically and are simple and computationally feasible to use. However, the accuracy of the model degrades significantly when speech signals are corrupted by noise. The noise distortion usually causes a difference between the statistics of training and testing speech features.

Recently, to address the issue of noise distortion, margin-based training methods like soft-margin estimation (SME) have been applied to the training of HMM-based acoustic models. The method is shown to be effective in improving the resulting model’s robustness against noise distortion.

While a single training pass of an HMM-based acoustic model does not require much computation time, the model must be trained multiple times to ensure its accuracy. This multi-iteration training takes up a significant amount of CPU time in a dedicated HPC setting.

In this project, we make use of Apache Hadoop to parallelize the model training phase of a Speech Recognition System for a distributed environment. We also show some issues that arise when using Hadoop and how its performance can be further fine-tuned with the aid of visualization tools.

Biodata
Tan Yu Shyang Alan is currently working as a Project Officer in the Parallel and Distributed Computing Center at the School of Computer Engineering, Nanyang Technological University. He is also pursuing a part-time Masters in Engineering under the same faculty. He previously worked on projects in the Grid and Cloud Virtualization area and was involved in setting up a Cloud virtualization test bed. His current research interest is in the use of the Hadoop framework in a heterogeneous architecture cluster consisting of GPUs and CPUs. Alan has also been involved in several projects which utilize Hadoop. These projects range from using Hadoop to process log files from the Internet root DNS servers, in order to analyze their availability and effectiveness, to the performance evaluation and fine-tuning of Hadoop from the user’s perspective.

. . .

2:40pm - 3:00pm
"Improving MapReduce Fault Tolerance in The Public Cloud"
Speaker: Dr. Qin Zheng (Research Engineer, IHPC)

Abstract
MapReduce has been used at Google, Yahoo, Facebook and others, even for their production jobs. However, according to a recent study, a single failure in a Hadoop job could cause a 50% increase in completion time. MapReduce has also been used in the public cloud; for example, Amazon Elastic MapReduce helps users perform data-intensive tasks for their applications. These applications may have high reliability and/or tight SLA requirements. However, it is more challenging to provide reliability for MapReduce jobs in the public cloud, where topology control and rack locality are currently not possible. In this paper, we investigate how to use redundant copies of map tasks to improve MapReduce fault tolerance in the public cloud while reducing completion time.

Biodata
Qin Zheng received the B. Eng. degree in Information Engineering from Xi’an Jiaotong University, Xi’an, China in July 2001 and the Ph.D. degree in Electrical and Computer Engineering from National University of Singapore, Singapore in January 2006. He then worked as a Research Fellow in the same department until November 2007, when he joined the Institute of High Performance Computing (IHPC), Agency for Science, Technology and Research (A*STAR), Singapore. His research interests in Cloud/Grid/HPC include fault tolerance, pricing, load balancing, MapReduce/Hadoop, data management, and robust scheduling. He is a member of ACM, IEEE and the IEEE Computer Society, and Vice Program Chair for HCW 2010, to be held in conjunction with IEEE IPDPS 2010.

. . .

3:00pm - 3:20pm
"A Performance Study of MapReduce Framework"
Speaker: Shi Lei (Ph.D Candidate, School of Computing, NUS)

Abstract
MapReduce-based systems have been widely used for large-scale data analysis. Although these systems achieve storage-system independence, high scalability, and fine-grained fault tolerance, their performance is not satisfactory. According to a recent study, MapReduce-based systems are significantly slower than Parallel Database systems in performing a variety of analytic tasks. Some attribute the performance gap between MapReduce-based and Parallel Database systems to architectural design. This speculation yields an interesting question: Must a system sacrifice performance to achieve flexibility and scalability?

Inspired by the question above, we conducted an in-depth performance study of MapReduce in its open source implementation, Hadoop. We identify four factors that have a significant effect on the performance of MapReduce, and investigate alternative strategies for each factor. Finally, we evaluate the performance of MapReduce on representative yet tractable combinations of the four factors. The results show that with proper implementation, the performance of MapReduce can be improved by a factor of 2.5 to 3.5 and approaches that of Parallel Databases. Our results show that it is possible to build a MapReduce-based system that is not only flexible and scalable, but also efficient.

Biodata
Shi Lei received his bachelor's degree from the College of Information Science and Engineering, Northeastern University, China in 2008. He is currently a Ph.D candidate in the School of Computing, National University of Singapore.

Shi Lei’s research interests include cloud computing infrastructure, and parallel and distributed database theories and applications.

Shi Lei has been using Hadoop since 2008. He is now a member of the Singapore Special Interest Group on Hadoop (Hadoop SIG).

. . .

3:20pm - 3:40pm
"Using Hadoop for Taxi Data Analytics"
Speakers: Koh Jun Yong Alvin (Student, SMU), Nguyen Xuan Khoa (Research Engineer, SMU), Dr. C. Jason Woodard (Asst. Prof. Information Systems, SMU)

Abstract
Singaporeans take over half a million taxi trips a day, yet Singapore’s nearly 25,000 taxis spend up to 40% of their time unoccupied. Even a 1% boost in fleet efficiency could yield millions of dollars of economic benefits through higher driver earnings, lower passenger waiting times, and less wasted fuel. Achieving these gains is difficult, however, because driver behaviour is complex and highly decentralized. Only about 10% of trips are dispatched through a central booking system; the remainder originate at taxi queues or by street pickups. Efficient matching of supply and demand relies on drivers’ experience, intuition, eyesight, and luck.

The project studies the feasibility of using cloud computing to provide both offline and online analytical support for taxi fleet operations. In this preliminary study, we benchmark the performance gains from distributing the analysis of GPS location traces over multiple virtual machines using the Hadoop framework. Our dataset consists of about 200 gigabytes of GPS location traces (over 2 billion observations) provided by a Singapore taxi company.

At CloudAsia 2010, we will present preliminary results from our benchmarking study and comment on their implications for potential applications of the Open Cirrus platform in the transportation domain.

Biodata
All three researchers are affiliated with the SMU School of Information Systems, which generously supported this work. Alvin Koh is a second-year undergraduate student whose interests lie at the intersection of technology, business and social work. Khoa Nguyen is a research engineer interested in researching large-scale distributed systems. Jason Woodard is an assistant professor of information systems. His current research focuses on the evolution of new software platforms.

. . .

4:00pm - 4:20pm
"Cloud Computing: Multiplier for Societal Impact"
Speaker: Thillai Raj (CTO, MIMOS)

Abstract
MIMOS has a mission to pioneer innovative information and communication technologies to improve the nation. By harnessing distributed computing resources, we look towards building a platform to elevate local research and industry capabilities. To address the masses, we look towards building a cloud delivery platform for government information and services. And finally, to bridge the connectivity gap, it is about connecting the unconnected through cost-effective wireless technologies.

Biodata
Thillai Raj holds a Bachelor of Technology in Engineering from the Indian Institute of Technology in Chennai, India and joined MIMOS with more than 20 years of experience in Electronic Design and Manufacturing.

As CTO and Head of Software Development & Central Engineering in MIMOS, he is the driver behind the National Grid Computing Initiative, also known as KnowledgeGRID Malaysia. Mr. Thillai Raj is also running Advance Software Development consisting of:

  • Information Security Software
  • Image Processing
  • Wireless Software Development
  • Semantic Software Lab
  • Cloud Computing

He was previously with Motorola as a Director for the Systems Engineering Group and with Flextronics as a Senior Director of Global Engineering. Later, he was instrumental in setting up the Motorola Software Design Centre in Cyberjaya. The design centre achieved SEI CMM & CMMI Level 5 under his guidance.

Thillai Raj has 7 patents to his name and holds a Six Sigma Blackbelt.

. . .

4:20pm - 4:40pm
"On The Design of Efficient Resource Allocation & Workload Management Strategies for Large-Scale Compute Cloud Environments"
Speaker: Dr. Bharadwaj Veeravalli (Associate Professor, Department of Electrical & Computer Engineering, NUS)

Abstract
Cloud computing is currently emerging as a powerful way to transform how the IT industry builds and deploys custom applications. In this research, we devise resource allocation and pricing strategies for Cloud environments. We apply axiomatic bargaining approaches to derive an optimal solution for allocating virtual CPU instances (VCIs) in a homogeneous Compute Cloud (CC). We demonstrate that our approaches give flexibility to the Cloud Service Providers (CSPs) in choosing various task parameters to arrive at the optimal solution. We introduce the concept of an asymmetric pricing scheme in which a user can specify his budget constraints and CSPs can attempt to maximize revenue without sacrificing performance. Then we propose a scheme for resource allocation and load balancing in a heterogeneous Cloud environment, where task requirements include the CPU, memory and bandwidth capacity required. The proposed solution attempts to increase resource utilization by packing VCIs together in server nodes based on their demand for resources, thereby achieving a “green” solution. Through simulations, we show that our schemes are economical, energy efficient and effectively capture Cloud characteristics. Finally, we investigate the use of the Divisible Load paradigm to design efficient strategies that minimize the overall processing time for large-scale polynomial product computations in CC environments.

Biodata
Bharadwaj Veeravalli, Member, IEEE & IEEE-CS, received his BSc in Physics from Madurai-Kamaraj University, India in 1987, his Master's in Electrical Communication Engineering from the Indian Institute of Science (IISc), Bangalore, India in 1991 and his PhD from the Department of Aerospace Engineering, IISc, Bangalore, India in 1994. He received Gold Medals for his Bachelor's Degree overall performance and for an outstanding PhD thesis (Sabitha Chowdhary Gold Medal, IISc, Bangalore, India) in the years 1987 and 1994, respectively. He did his post-doctoral research in the Department of Computer Science, Concordia University, Montreal, Canada, in 1996. He is currently with the Department of Electrical and Computer Engineering, Communications and Information Engineering (CIE) division, at The National University of Singapore, Singapore, as a tenured Associate Professor. His mainstream research interests include Cloud/Grid/Cluster Computing, Scheduling in Parallel and Distributed Systems, Bioinformatics & Computational Biology, and Multimedia Computing. He is one of the earliest researchers in the field of Divisible Load Theory (DLT). He has successfully secured several externally funded projects. He has published over 100 papers in high-quality International Journals and Conferences. He has co-authored three research monographs in the areas of PDS, Distributed Databases (Competitive Algorithms), and Networked Multimedia Systems, in the years 1996, 2003, and 2005, respectively. He guest-edited a special issue on Cluster/Grid Computing for the IJCA, USA journal in 2004. He is currently serving on the Editorial Boards of IEEE Transactions on Computers, IEEE Transactions on SMC-A, Multimedia Tools & Applications (MTAP) and Cluster Computing, as an Associate Editor. He has been a Visiting Professor with HUST, Wuhan, China, since June 2007. He has served as a program committee member and as a Session Chair in several International Conferences. Visit his page: http://cnds.ece.nus.edu.sg/elebv.

. . .

4:40pm - 5:20pm
"Discussion with HP Labs"
Speaker: Chris Whitney (Managing Director, HP Labs Singapore)

Abstract
HP Labs Singapore was announced in February 2010. The current research focus is on Cloud Computing. We hope to use this forum to interact with cloud research communities and explore areas of collaboration and partnership. Chris Whitney, the founding Managing Director of HP Labs Singapore, will share some of his thoughts and is keen to hear ideas for partnership from the research communities.

Biodata
Chris Whitney is managing director of HP Labs Singapore. He has been involved in service automation and integration since joining BT Labs in 1985 where he was responsible for developing algorithms and prototypes of intelligent service and network management systems for use in public and private communications networks.

Previously, Whitney managed HP Services' research and innovation group, focused on developing new services and solutions for HP Consulting, Integration, Outsourcing and Support organizations.

From 2000 to 2005 he held vice president roles at Exodus Communications and at Cable & Wireless, where he developed lights-out, virtualized, automated data center solutions. He also founded ActiveReasoning, a California-based startup company developing data center compliance and management software.

Earlier roles included work in HP’s Customer Relationship Management Operation, where he led the research and development organization in developing automated call center and customer management software.

Whitney first joined HP Labs in 1994, developing automated service-management capabilities for HP’s OpenView and other products. He received his bachelor of science in computer science from Teesside University and a master of science from South Bank University, London.


Platinum Sponsors: Alatum IBM      
Silver Sponsor: IGEL        
Bronze Sponsors: 1 Degree North Microsoft NCS
Venue Sponsor: SMU SIS   Lanyard & Bag Sponsor: IBM  
Organized By: IDA ngo SCS SITF
A*Star NTU NUS SMU SIS  
