Parallel Distributed Infrastructure for Minimization of Energy


ParaDIME’s innovations offer significant energy savings for data centres

Wed, 2015-10-28

Researchers from the ParaDIME research project, coordinated by Barcelona Supercomputing Center, have successfully developed a number of methodologies which enable savings in data centre energy consumption ranging from 30% to 60%. These methodologies tackle several critical power-related research challenges, from exploiting current and future devices to inter-datacentre virtual machine (VM) scheduling.

At the programming model level, the focus has been on shifting from the shared memory model to an actor-based, message-passing programming model which enables programmers to achieve greater energy efficiency and become more energy aware. There are two illustrative approaches:

• Tailored programming solutions for heterogeneous GPU/CPU architectures. By directly managing the GPU through optimised code, energy consumption can be reduced by roughly 80%. However, this requires in-depth specialist knowledge, which reduces programmability. ParaDIME has developed techniques based on domain-specific languages (DSLs) which generate code for both the CPU and GPU, resulting in energy savings of up to 40% while, crucially, empowering a far greater number of programmers to utilise these innovative architectures.

• Tools for power and cost awareness that estimate the power requirements of a single process running in a virtualised environment. These tools can also be used for user-based pricing models, energy-aware task scheduling and as an indicator of how many heterogeneous resources an application consumes.
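As a rough illustration of what such a tool does, the Python sketch below attributes the dynamic part of a host's power draw to a single process or VM using a simple linear utilisation model. The model and all constants are illustrative assumptions, not the ParaDIME tools themselves.

```python
# Minimal sketch of utilisation-based per-process power attribution.
# The linear host model and the constants are illustrative assumptions.

def host_power(cpu_utilisation, p_idle=70.0, p_max=250.0):
    """Estimate host power draw (watts) from CPU utilisation in [0, 1]."""
    return p_idle + (p_max - p_idle) * cpu_utilisation

def process_power(process_share, cpu_utilisation, p_idle=70.0, p_max=250.0):
    """Attribute the dynamic part of host power to one process or VM.

    process_share: fraction of total CPU time consumed by this process.
    """
    dynamic = (p_max - p_idle) * cpu_utilisation
    return dynamic * process_share

if __name__ == "__main__":
    util = 0.6    # host is 60% utilised
    share = 0.25  # the VM of interest accounts for 25% of that work
    print(f"Host power: {host_power(util):.1f} W")
    print(f"Estimated VM power: {process_power(share, util):.1f} W")
```

An estimate of this kind can feed directly into the user-based pricing and energy-aware scheduling use cases mentioned above.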

At the runtime level, ParaDIME has focused on a large, decentralised infrastructure of small data centres that also provide heating and hot water, motivated by the efficiency gains demonstrated by the project’s industrial partner Cloud&Heat. ParaDIME researchers have developed:

• The multi-datacentre scheduler: this schedules jobs across different data centres, striking a balance between data centre workloads and heating/cooling needs, and reducing CO2 emissions and energy consumption by up to 50% (a simplified placement sketch follows this list).

• An intra-datacentre scheduler: technologies have been developed to reduce the time needed to reactivate virtual machines and their migration costs. Parts of this work are under review by the community behind QEMU, an open-source machine emulator and virtualiser. Institutions using QEMU to virtualise their workloads will be able to benefit from ParaDIME-optimised virtual machine migration code. Furthermore, ParaDIME has contributed a feature to track changes to block devices that has already been incorporated into the latest Linux kernel.
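The placement sketch referenced above is a minimal, illustrative Python model, not the actual ParaDIME scheduler: a job is steered towards the data centre that best balances spare capacity, local heat demand and grid carbon intensity. All site names, fields and figures are assumptions.

```python
# Illustrative multi-datacentre job placement: prefer low-carbon sites with
# spare capacity whose waste heat is actually wanted (heating/hot water).

from dataclasses import dataclass

@dataclass
class Site:
    name: str
    utilisation: float       # current load, 0..1
    heat_demand: float       # how much waste heat the site can absorb, 0..1
    carbon_intensity: float  # kg CO2 per kWh of the local grid

def placement_score(site: Site) -> float:
    """Lower is better; a fully loaded site is never chosen."""
    spare = 1.0 - site.utilisation
    if spare <= 0.0:
        return float("inf")
    return site.carbon_intensity * (1.0 - 0.5 * site.heat_demand) / spare

def choose_site(sites):
    return min(sites, key=placement_score)

if __name__ == "__main__":
    sites = [
        Site("Dresden",   utilisation=0.55, heat_demand=0.9, carbon_intensity=0.45),
        Site("Barcelona", utilisation=0.30, heat_demand=0.2, carbon_intensity=0.30),
        Site("Neuchatel", utilisation=0.85, heat_demand=0.6, carbon_intensity=0.10),
    ]
    print("Place next job at:", choose_site(sites).name)
```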

At the hardware level, ParaDIME researchers have proposed and simulated several methodologies for improving the energy efficiency of future computing nodes, including:

• Scheduling of tasks to heterogeneous cores (e.g. big.LITTLE processors, or systems that combine FPGA, GPU and CPU cores). On average, a 40% saving in energy consumption can be achieved by combining FPGA, GPU and CPU cores as opposed to a multicore processor. ParaDIME scheduling also reduces power by 20% and energy by 35% on average across different types of heterogeneous platform (a toy mapping sketch follows this list). ParaDIME has also researched power estimation tools for a variety of core types.

• Aggressively lowering the supply voltage, combined with low-overhead error detection and correction techniques to keep computation reliable. ParaDIME researchers have also explored this methodology for circuits built with future devices. The ParaDIME methodology saves up to 60% of the energy consumed by the L1 data cache.
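The toy mapping sketch referenced in the scheduling item above is a minimal Python illustration, not ParaDIME's scheduler: each task is placed on the device with the lowest estimated energy, i.e. power multiplied by predicted runtime. All power and runtime figures are assumed.

```python
# Energy-aware task-to-device mapping: pick the device minimising
# estimated energy = active power x predicted runtime (illustrative figures).

DEVICES = {
    "cpu": 65.0,    # typical active power in watts (assumed)
    "gpu": 150.0,
    "fpga": 25.0,
}

# Predicted runtime of each task on each device, in seconds (assumed).
TASKS = {
    "video_decode": {"cpu": 12.0, "gpu": 2.0, "fpga": 4.0},
    "crypto":       {"cpu": 8.0,  "gpu": 6.0, "fpga": 1.5},
    "control_loop": {"cpu": 0.5,  "gpu": 0.4, "fpga": 0.6},
}

def best_device(runtimes):
    """Return (device, energy in joules) minimising power x runtime."""
    return min(((dev, DEVICES[dev] * t) for dev, t in runtimes.items()),
               key=lambda pair: pair[1])

if __name__ == "__main__":
    for task, runtimes in TASKS.items():
        dev, energy = best_device(runtimes)
        print(f"{task}: run on {dev.upper()} (~{energy:.0f} J)")
```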

Figure 1: ParaDIME infrastructure

About the ParaDIME project

ParaDIME ("Parallel Distributed Infrastructure for Minimization of Energy") was a three-year research project launched in September 2012 with a total budget of €3.2M, including €2.5M funding from the European Commission's Seventh Framework Programme. The project was coordinated by Barcelona Supercomputing Center (BSC) and partners were IMEC (Belgium), Technische Universität Dresden (Germany), Université de Neuchâtel (Switzerland) and Cloud&Heat (Germany).

The objective of this European project was to attack the power-wall problem through radical software-hardware techniques: driven on the hardware side by future circuit and device characteristics, and on the software side by a message-passing programming model and smart scheduling of data centre workloads.

For further information, visit www.paradime-project.eu

Ten minutes with…Santhosh Rethinagiri, Barcelona Supercomputing Center

Fri, 2015-03-13

Santhosh Rethinagiri

Santhosh Rethinagiri is a senior researcher with the Microsoft research group at Barcelona Supercomputing Center (BSC). His research interests include minimisation of energy for data centres, power reduction for supercomputers built with mobile computing chips, and FPGA-based acceleration for databases. Fellow BSC researcher Oscar Palomar also participates in the ParaDIME project, while the principal investigators are BSC’s Adrián Cristal and Osman Ünsal.

 

What’s your research background? How did you come to be working in this field?

I did my B.Eng in electronics and instrumentation before specialising in embedded systems for my MS in electrical engineering. After that, I spent time working for the electronic system-level (ESL) company Synopsys, where I got to know about system research, platform architecture in general and the products which they offer in particular. My PhD, which I completed at Inria in France, focused on developing tools to estimate power for applications and systems used by various companies, including Thales, Inpixel, Inria and STMicroelectronics.

What are your current research interests?

My work is mostly in power estimation and optimisation across every step of the computing system, from hardware to applications. I undertake heterogeneous prototyping with field-programmable gate arrays (FPGAs), central processing units (CPUs) and graphics processing units (GPUs). This involves running real applications and trying to optimise them for different devices, aiming to use the advantages of the three different devices to improve energy efficiency and accelerate applications. I am also working at the device level on hybrid architectures using complementary metal-oxide-semiconductor (CMOS) technology.

In terms of software, I am working on annotated power saving. A piece of software will typically specify that an application should be executed at a specific frequency, but some applications don’t require that much power, so you can use automatic workload specification to distribute power intelligently.
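As a rough sketch of what such annotations might look like, the hypothetical Python example below tags functions with a performance-criticality level and maps that hint to a frequency before running them. The decorator, the levels and the frequency table are all assumptions, not an existing ParaDIME API.

```python
# Hypothetical power-hint annotations: code sections declare how
# performance-critical they are, and a runtime maps the hint to a frequency.

import functools

# Assumed mapping from annotation level to a DVFS operating point (GHz).
FREQ_FOR_LEVEL = {"critical": 3.0, "normal": 2.0, "background": 1.2}

def power_hint(level):
    """Annotate a function with how much performance it really needs."""
    def decorate(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            freq = FREQ_FOR_LEVEL[level]
            # A real runtime would program DVFS here (e.g. via cpufreq);
            # this sketch only reports the decision.
            print(f"[{func.__name__}] level '{level}' -> {freq} GHz")
            return func(*args, **kwargs)
        return wrapper
    return decorate

@power_hint("critical")
def solve_frame():
    return sum(i * i for i in range(10_000))

@power_hint("background")
def index_logs():
    return "indexed"

if __name__ == "__main__":
    solve_frame()
    index_logs()
```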

Why should we be working towards more energy-efficient computing systems? What can we do to make them more efficient?

Currently, data centres are not operating fully: up to 90% of their processors may be idle at any one time. This leads to enormous energy costs.

One way to reduce power consumption is to replace powerful processors with embedded ones, gaining a favourable trade-off between energy and performance. The Mont-Blanc project at BSC, for example, is creating supercomputer prototypes where high-performance processors are replaced with ARM-based chips. Where software is concerned, annotating the sections which are critical and therefore need high performance, and those which need less power, can help reduce power consumption, as mentioned above.

What, for you, are the main technical challenges which should be tackled in order to achieve these?

For me the key thing is that every part of the design of computing systems needs to be addressed: improved architecture should be complemented by software and device design. As no one person can master every element, this means that engineers from different areas need to work together, and this is what the ParaDIME project is all about.

On what areas are you concentrating in the ParaDIME project?

Within the project I concentrate on architecture. My first task was to develop power models for different kinds of processor and to evaluate these models automatically. Since then, I’ve been building heterogeneous platforms consisting of the three different processor types listed above, covering both high-performance and embedded systems. These will help define how future architectures are designed.

We are also working on a proposal for a hybrid device combining two different types of processor, such as CMOS and tunnel field-effect transistors. We’ve been modelling designs with the help of a Belgian research institute, the Interuniversitair Micro-Electronica Centrum (IMEC).

We’re lucky to have a cooperative working environment in the ParaDIME project, as well as strong project management, which means that everything is well coordinated.

What do you think BSC’s unique contribution to the ParaDIME project is?

As the coordinator, BSC is the heart of the project, both in terms of project coordination and the new research it is producing on different topics. The fact that there is a supercomputer on site at BSC means that researchers can experiment with their own prototypes. There is also expertise in every field at the centre and researchers are very approachable. We swap advice and share code with researchers from the Mont-Blanc project – we’ve asked for their input about their prototypes, for example, which has influenced our development of a data centre prototype. We also work with the programming models group: their programming models can be accessed for FPGA and GPU communication.

Why is the ParaDIME project important? What do you think the most important results will be for society in general?

ParaDIME will provide new roadmaps for data-centre systems, covering programming models, runtime models and new devices (both in the near and distant future). Testing with prototypes gives us the assurance that these will work. The project won’t change the face of data centres, but at least it will provide one of the solutions contributing to energy-aware computing.

Why is it important for ParaDIME to participate in the ICT-Energy project?

ICT-Energy is a consortium of energy-related projects which start from basic physics and go all the way up to data centres and modern supercomputers. ParaDIME is the only one working on all aspects from the device to the data-centre level, so it acts as a kind of middleman for the other projects, which in turn give us their input on how to utilise small architectures.

Exploring energy-efficient transactional memory at HiPEAC 2015

Thu, 2015-02-19

Osman Ünsal presenting a keynote on energy-efficient transactional memory during HiPEAC 2015

 

Energy-efficient transactional memory was the topic of a presentation by ParaDIME researcher Osman Ünsal (Barcelona Supercomputing Center - BSC) at the final Euro-TM workshop, co-located with the 2015 HiPEAC conference in Amsterdam on 19 January 2015.

Transactional memories offer an alternative programming model which may simplify the development and testing of concurrent programs, enhance code reliability and boost productivity. Academic research in the field has recently been followed by interest in the commercial sector: processors with transactional memory support are now available (IBM BGQ, IBM Z-series, IBM Power8, Intel Haswell), making it possible to measure and quantify energy-related aspects.

The presentation provided the research background before exploring energy-efficient hardware transactional memory and research on error detection and transactional memory for energy-efficient computing below safe operation margins.  

In addition, along with BSC researchers Adrián Cristal and Gülay Yalçın and other experts in the field from across Europe, Dr Ünsal has also contributed to the final Euro-TM publication, Transactional Memory. Foundations, Algorithms, Tools, and Applications, on the topic of reliability and transactional memory.

The keynote is available for download below.

For further information on the Transactional Memory publication, visit the Euro-TM website.

Ten minutes with…Christof Fetzer, Technische Universität Dresden

Wed, 2015-01-14

Christof Fetzer

Christof Fetzer holds an endowed chair (Heinz-Nixdorf endowment) in Systems Engineering in the Computer Science Department at Technische Universität Dresden (TUD), as well as being chair of the Distributed Systems Engineering International Master’s Programme. His PhD students Thomas Knauth and Lenar Yazdanov also work on the ParaDIME project.

 

Can you tell me a bit about your main research interests? What led you to work in this field?

My research interests include cloud computing, dependability, security and energy efficiency, partly because I supervise several PhD students doing different things. I take on new research problems which interest me personally – so a few years ago I thought there were interesting problems in cloud computing. For example, as a cloud customer, how can I trust that confidentiality is ensured for my data and its computation? How do we ensure the integrity of the data and its availability?

Cloud computing has some great advantages with regard to energy efficiency, because infrastructure provisioned for peak load is shared across customers. Computers are more fully utilised and are therefore more energy efficient.

There are varying daily patterns, with some periods experiencing high loads and others low loads. Most data centres don’t switch off their machines; as a result, a 2012 study by the New York Times found that data centres can waste 90% of the electricity they take from the grid, as they use only a small percentage of the electricity powering their servers to perform computations and the rest to keep servers idling – and servers are idle for about 90% of the time. You could therefore potentially achieve energy savings in the region of an order of magnitude if the servers were utilised 80% of the time. One way of doing this would be to consolidate the load, moving all the computation onto a few machines and switching off the other machines. This presents significant challenges, as moving applications too often should be avoided.
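As a toy illustration of the consolidation idea, the Python sketch below packs a set of VM loads onto as few hosts as possible using first-fit decreasing bin packing, so the remaining hosts could be switched off. The load figures are made up; a real system would also weigh migration costs and how often applications are moved.

```python
# Toy consolidation: pack current VM loads onto as few hosts as possible
# (first-fit decreasing) so the rest of the machines can be switched off.

def consolidate(vm_loads, host_capacity=1.0):
    """Return a list of hosts, each a list of VM loads fitting the capacity."""
    hosts = []
    for load in sorted(vm_loads, reverse=True):
        for host in hosts:
            if sum(host) + load <= host_capacity:
                host.append(load)
                break
        else:
            hosts.append([load])  # no existing host has room: open a new one
    return hosts

if __name__ == "__main__":
    vm_loads = [0.10, 0.25, 0.05, 0.40, 0.15, 0.30, 0.20, 0.05]
    total_hosts = 8  # one VM per host before consolidation
    hosts = consolidate(vm_loads)
    print(f"Hosts needed after consolidation: {len(hosts)} of {total_hosts}")
    print(f"Hosts that could be switched off: {total_hosts - len(hosts)}")
```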

 

What does Dresden bring in particular to the ParaDIME project?

TUD has two large projects which are complementary to ParaDIME. The first is a cluster of excellence within the German Excellence Initiative, the Centre for Advancing Electronics. In this cluster we look at resilient computing, and one of the topics we consider is how we can lower the energy consumption of computers to a point where there might be errors in the computation, so that we can try to detect them, correct them and, from there, aim not to introduce them in the first place. We are also investigating new technologies, materials and devices, such as carbon nanotubes for integrated circuits, which might provide better energy efficiency.

The other major project is 5G Lab Germany, which researches the next generation of wireless networks that will replace long-term evolution (LTE) networks. In this project, we are researching edge clouds to see how we can distribute computing so as to reduce latency and communicate and compute within less than a millisecond. Applying this in the area of the tactile internet, for example – providing a low response time and giving users immediate feedback so they would not notice any latency – would allow the creation of a range of new applications, such as in the domains of health or music.

 

What, for you, are the most compelling reasons why we should create more energy-efficient computing systems?

One reason is the cost of computing, as energy consumption contributes to the total cost of ownership. Another reason is that if computers are more energy efficient you could pack them closer together, thereby achieving a higher compute density in the data centre and reducing the space required for the data centre.

Ecological reasons are also important: the total electricity consumption by the ICT infrastructure is greater than that of India. It therefore makes a lot of sense from an ecological perspective to increase energy efficiency. 

 

What, for you, are the key technical challenges which need to be tackled in order to achieve more energy-efficient computing systems?

One of the ways to save energy in data centres is to switch off some of the machines when they are not needed. However, this poses technical problems, as the machines might not come back up when they are switched on again, so technicians would theoretically need to be available in case of any issues.

Storage is usually attached directly to the computer, as there is higher throughput when you attach solid-state drives directly to the computing nodes. If you switch off the machine, access to the storage attached to it is lost. We need to find a solution which would allow us to support directly attached storage and still be able to switch off machines.

 

Is it possible to deliver genuine energy savings while achieving optimum performance?

It depends what you mean by optimum performance. As computers running at maximum capacity are more energy efficient, we need to keep them at a high level of utilisation. The most efficient algorithms should also be the most energy efficient, but in parallel computing you often want to reduce the runtime of an application, which you do by parallelisation. However, you almost never get linear increases in speed. This means you pay a price in terms of energy in order to achieve shorter runtimes, so the throughput per server is lower than in the case of sequential programming.

So what we would have to do is to maximise the throughput per server of an application and not minimise the runtime of the application. In so doing I think we can achieve genuine energy savings for batch jobs, but this might not be the case for interactive jobs, where you need to get a response quickly.
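A back-of-the-envelope calculation makes this trade-off concrete. In the hypothetical Python example below, running a job on eight servers with only a five-fold speedup finishes sooner but consumes more total energy and delivers less throughput per server than the sequential run; all numbers are assumptions.

```python
# Illustration of the runtime-versus-energy trade-off with sublinear speedup.

def energy_and_throughput(servers, speedup, t_seq=100.0, power_per_server=200.0):
    runtime = t_seq / speedup
    energy = servers * power_per_server * runtime      # joules
    jobs_per_server_hour = 3600.0 / runtime / servers  # throughput per server
    return runtime, energy, jobs_per_server_hour

if __name__ == "__main__":
    # 1 server sequentially, versus 8 servers achieving only a 5x speedup.
    for servers, speedup in [(1, 1.0), (8, 5.0)]:
        runtime, energy, tput = energy_and_throughput(servers, speedup)
        print(f"{servers} server(s): runtime {runtime:5.1f} s, "
              f"energy {energy / 1000:6.1f} kJ, "
              f"throughput/server {tput:5.2f} jobs/h")
```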

 

What are the main improvements which you would like to see in the computing systems of the future?

What I really want to see is decentralised computing infrastructure that will compute at the edges of the internet. That would have a positive impact on energy efficiency as less data would have to be transferred and therefore the energy consumption of the network would be reduced. If you keep computation local you can achieve much higher energy efficiency in comparison to a centralised system where you have to transfer data across Europe, for example, and back again.

 

What will the lasting impact of the ParaDIME project be?

Edge computing will allow devices to become more ‘intelligent’: with traffic systems, for example, you could offload some of the intelligence of controlling the cars and routing the traffic. To optimise the energy consumption of cars and schedule routes you would need intelligent computational infrastructure which we don’t have today but could have in the future. Cars could therefore be more connected and autonomous and could interact with the system around them, which could lead to more energy-efficient transportation.

Ten minutes with…Pascal Felber, University of Neuchâtel

Tue, 2014-12-02

Pascal Felber

Pascal Felber leads the research group on Complex Systems in the Computer Science department at the University of Neuchâtel. The department focuses on large-scale distributed, concurrent and dependable systems. Along with Pascal, University of Neuchâtel researchers Anita Sobe, Yarco Hayduk and Mascha Kurpicz are also involved in the project.

 

Can you tell me a bit about your main research interests? What led you to work in this field?

My main research interests are distributed and concurrent systems. The same problems that you find in distributed systems are also encountered in concurrent systems; these include synchronisation problems, access to shared data, fault tolerance and various kinds of agreement problems.

After completing my PhD on fault-tolerant distributed computing, which was one of the topics of interest at EPFL (École Polytechnique Fédérale de Lausanne), the university where I studied, I went to work in the United States and spent some time at Oracle and at Bell Labs.

 

What, for you, are the most compelling reasons why we should create more energy-efficient computing systems?

The evolution of cloud computing, with the development of huge data centres where you mutualise resources, means that there is a greater incentive to reduce the energy footprint.

Neuchâtel, with its long-established watchmaking industry, is a hub of microelectronics research. It has a history of energy research, partly due to the requirement for low energy consumption in microelectronics. It also specialises in renewable energy: for instance, it is where the first ‘white’ solar panels were developed, and there is a whole research culture based on this field in Neuchâtel.

 

What, for you, are the key technical challenges which need to be tackled in order to achieve more energy-efficient computing systems?

First we need more efficient hardware, with more efficient chips; this needs to be combined with more intelligent software which can maximise the use of the hardware. Most of the energy loss in ICT currently comes from idle computers, so greater energy efficiency can be achieved by creating low-energy computers used 100% of the time.

Another aspect is making better use of computers: people often browse the web or watch videos on YouTube, for example, without realising how much energy this takes up. I think a cultural change is necessary. Even clicking on ‘search’ in Google generates much more activity globally in the system than users might expect.

 

Is it possible to deliver genuine energy savings while achieving optimum performance?

There is always a balance between performance and energy. If you want to increase performance slightly, you often have to greatly increase the energy. For me, it’s about making more reasonable use of resources and prioritisation, i.e., distributing resources by also taking into account what is critical and what is not.

 

What are the main improvements which you would like to see in the computing systems of the future?

I would like to see systems which are more intelligent in the way they operate – ones which adapt to ensure better energy consumption based on need. For example, I love embedded systems and would love to buy a smart watch, but a watch with a battery life of only one day is a no go. Smart products should operate at the level at which you need them, so a watch could be on standby all day without any of the ‘smart’ features enabled.

 

What will the lasting impact of the ParaDIME project be?

This is a great project for facilitating future research focusing on how to use resources more effectively. It has given us the opportunity to collaborate with researchers on concrete problems that need to be solved.

As for the wider impact, this project has shown that energy efficiency is not a simple problem, but rather one involving a lot of different aspects and parties, both at the level of hardware and software. There are many variables and many trade-offs to be investigated. The most likely solution will be to provide a mix of different hardware components and software systems that can be leveraged according to the problem to be solved.

When a greener cloud = warm homes: IEEE reports on Cloud&Heat

Thu, 2014-11-20

ParaDIME partner Cloud&Heat has been garnering plenty of positive press recently, including a headline spot in the IEEE (Institute of Electrical and Electronics Engineers) Spectrum Energywise newsletter. The article, which features Cloud&Heat’s video “The Green Cloud”, explains the German company’s win-win concept: installing mini data centres in homes and offices, which provide free heating for the host.

The company’s “distributed cloud heaters”, the article continues, would cost about the same as a conventional heating system. For their part, Cloud&Heat avoid many data centre infrastructure costs and gain access to a well-distributed network.

You can read the full article on the IEEE Spectrum website.

René Marcel Schretzmann, Dr Jens Struckmeier and Prof Christof Fetzer from Cloud&Heat

ParaDIME presents: A summer of energy-efficient computing

Wed, 2014-07-02

Just in time for summer, we’re taking ParaDIME on the road, with a series of project-related presentations at events over the next few months. Come along to meet our researchers and find out how ParaDIME is reworking everything from hardware to applications to make computing more energy efficient.

First up is ISVLSI 2014, the IEEE Computer Society Annual Symposium on Very-Large-Scale Integration (VLSI), in sunny Tampa, Florida, USA from 9-11 July 2014. Researchers from Barcelona Supercomputing Center will be discussing a system-level power and energy-estimation methodology for open multimedia applications on 10 July, in the 4.30-5.30pm session.

Next, along with a clutch of projects focusing on delivering energy-efficient computing, ParaDIME will be represented at the ICT-Energy summer school taking place in Perugia, Italy from 14-18 July 2014. Romantic Verona will be the backdrop for our talk at DSD 2014, the Euromicro Conference on Digital System Design, from 27-29 August 2014.

To round off the summer, researchers will discuss the DESSERT (DESign Space ExploRation Tool based on Power and Energy at System-Level) tool at IEEE SOCC 2014, the IEEE International System-on-Chip Conference, in Las Vegas from 2-5 September 2014.

We look forward to seeing you there – and don’t forget that in the meantime you can contact us with any project queries via the ParaDIME website contact form.

Map with markers showing ParaDIME events in Tampa, Perugia, Verona and Las Vegas

ParaDIME researcher Oscar Palomar speaking at a conference

ParaDIME researchers Yarco Hayduk, Santhosh Rethinagiri and Oscar Palomar at an event