Tuesday, December 21, 2010

AutoProfiler - Detecting parallelization potential

As I wrote in my last post, we focus on rather coarse-grained parallelization schemes and patterns. In our research group we developed AutoProfiler, a tool that detects parallelization potential by analyzing the runtime profile of a sequential application.

Our results also tell us that we detect most of the methods that an experienced developer would parallelize in a manual parallelization process.

Today, I want to write something about how AutoProfiler achieves this.

In my post on parallelism of the future, I already stated that a combination of static and dynamic approaches is the most promising way. So AutoProfiler starts with the creation of a dynamic runtime profile. Specifically, AutoProfiler records indicators such as the number of times a method is called, the runtime share of a method, and so on. After that, a metric tries to map these indicators, together with their actual values, to known parallel software design patterns.
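To give you an idea, here is a minimal C# sketch of the kind of per-method indicators such a profile could contain, together with one conceivable mapping heuristic. The field names and the 80% threshold are my illustration here, not AutoProfiler's actual API:

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical per-method profile record; the shape is an assumption
// for illustration, not AutoProfiler's actual data model.
public class MethodProfile
{
    public string Name;                  // fully qualified method name
    public long CallCount;               // how often the method was called
    public double RuntimeShare;          // fraction of total runtime spent here
    public List<MethodProfile> Callees = new List<MethodProfile>();
}

public static class PatternHeuristics
{
    // One conceivable heuristic: if at least two callees together account
    // for most of a method's runtime, the method is a candidate master
    // in a master/worker pattern (threshold chosen arbitrarily).
    public static bool IsMasterWorkerCandidate(MethodProfile m)
    {
        double calleeShare = m.Callees.Sum(c => c.RuntimeShare);
        return m.Callees.Count >= 2 && calleeShare > 0.8 * m.RuntimeShare;
    }
}
```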

One example: let's say we have a method foo() that calls some other methods in its body, such as bar1(), bar2() and bar3(), and these three methods account for most of the runtime of foo(). For this specific example, AutoProfiler would come to the conclusion that foo() is the master in a master/worker pattern. The calls to bar1() through bar3() should be made in parallel, as they are the workers.
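To make that concrete, here is a minimal C# sketch of what such a transformation could look like, assuming bar1() to bar3() are independent of each other:

```csharp
using System.Threading.Tasks;

class MasterWorkerExample
{
    // Hypothetical workers standing in for bar1() to bar3().
    static void Bar1() { /* ... */ }
    static void Bar2() { /* ... */ }
    static void Bar3() { /* ... */ }

    // Sequential foo() as AutoProfiler sees it in the runtime profile.
    static void FooSequential()
    {
        Bar1();
        Bar2();
        Bar3();
    }

    // Proposed parallel version: foo() becomes the master and the three
    // calls become workers that execute concurrently (.NET 4 TPL).
    static void FooParallel()
    {
        Parallel.Invoke(Bar1, Bar2, Bar3);
    }
}
```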


Of course, it's not that simple, as the control flow has to be seen in combination with the data flow. The method foo() might even be a whole pipeline with different pipeline stages, but I think the main idea behind AutoProfiler should be clear by now.
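For the pipeline case, a sketch could look like this, assuming two stages that are independent enough to overlap (the stage bodies are made up):

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

class PipelineSketch
{
    static void Main()
    {
        // Bounded buffer connecting the two pipeline stages.
        var buffer = new BlockingCollection<int>(boundedCapacity: 8);

        // Stage 1: produce items (stands in for the first part of foo()).
        var stage1 = Task.Run(() =>
        {
            for (int i = 0; i < 100; i++)
                buffer.Add(i * i);
            buffer.CompleteAdding();
        });

        // Stage 2: consume items (the second part of foo()).
        var stage2 = Task.Run(() =>
        {
            foreach (var item in buffer.GetConsumingEnumerable())
                Console.WriteLine(item);
        });

        Task.WaitAll(stage1, stage2);
    }
}
```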

If you know my blog, you know that this is just one piece of the puzzle - the big picture is still to bring all of this together in one IDE. It's crucial to bring this information closer together and closer to the developer.

Greetings,
K!

Friday, December 3, 2010

The .NET-Multicore Group

Hey there,

as our research project grows, we initiated the ".NET Multicore Group" at the Karlsruhe Institute of Technology (KIT). Our project website can be found here.


Currently, we have 3 scientific assistants and 5 students working together. If you know my blog a bit, it might not come as a big surprise that our research interest is the preparation and conversion of sequential applications to parallel ones, and the runtime defects that arise in parallel applications.

Have a nice weekend.

K!

Tuesday, November 23, 2010

AutoProfiler - Measuring method performance

Hello everybody,

explicit parallel constructs were great at the beginning of the multicore era, but now that we're facing dozens of cores per processor, they are no longer well suited for general-purpose applications.

We conducted studies in parallelizing sequential applications, ranging from fairly simple algorithms to general-purpose applications, and learned that the benefit of parallel computing depends on the granularity of the parallelization scheme used when converting sequential to parallel code.

Of course, there is still a need for an experienced developer to dive deep into the code, analyze it, and transform it into a parallel version - in other words: fine-grained explicit parallelization. Algorithmic parallelizations, for example, often profit from this scheme.

For general-purpose applications, though, it's a completely different story: here, the focus lies on rather coarse-grained structures such as methods. Combining this aspect with object-oriented design, my fellow researchers Georgios Tournavitis et al. (University of Edinburgh) and I come to the same conclusion: a holistic approach to parallelizing general-purpose applications begins with the detection of methods or blocks of coherent code.

Currently, our research group focuses on developing an automated approach to analyzing sequential code. We developed a tool named "AutoProfiler", which extracts potentially interesting regions of code from a runtime profile that is created and instrumented completely automatically. We see this tool as a first approach to detecting software design patterns. In our evaluation, we used a benchmark set of over 20 fairly small algorithms and programs, together with manually parallelized real-world applications such as a desktop search. We compared the results we got from AutoProfiler with the methods that were touched in the manual parallelization process and found that in over 80% of the cases AutoProfiler proposes the correct methods. In other words: AutoProfiler is able to detect over 80% of the methods that a programmer would want to parallelize - good news!
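Conceptually, the comparison itself boils down to a simple set intersection, as in this small sketch (the method names are placeholders, not our benchmark data):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class EvaluationSketch
{
    static void Main()
    {
        // Methods AutoProfiler proposed vs. methods actually touched
        // in the manual parallelization (placeholder names).
        var proposed = new HashSet<string> { "Search", "Index", "Render" };
        var manual   = new HashSet<string> { "Search", "Index", "Parse" };

        int hits = proposed.Intersect(manual).Count();
        double detectionRate = (double)hits / manual.Count;

        // Fraction of manually parallelized methods the tool detected.
        Console.WriteLine("Detection rate: {0:P0}", detectionRate);
    }
}
```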

So, this confirms our approach - now we'll take the next step forward.

K!

Wednesday, October 20, 2010

Enhancing Unit Tests for Performance Monitoring #2

Welcome back,

as I wrote in my last posting on unit tests, they can be used to get a runtime profile for a certain region that you are interested in. One general shortcoming of unit tests when testing the inner structure of a method or region is that they focus on behavior that can be observed externally. We propose an alternative usage of the assertion mechanism in order to measure the performance profile of a method within its body. Without our approach you would have to restructure your code to get this kind of information, which would also change the timing behavior - and when dealing with concurrent programming, that is exactly what you don't want.

What you would do today to monitor performance is run your application through a profiler that captures everything it can get. This ends up in a huge pile of runtime information that might not even be relevant for the region you are interested in. As I've mentioned before: a parallel application is a sequence of sequential and parallel code. The pearl necklace. Remember?!
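To illustrate the idea, here is a minimal NUnit-style C# sketch that asserts on a timing property instead of a return value. The method under test and the 50 ms budget are made up for the example:

```csharp
using System.Diagnostics;
using NUnit.Framework;

[TestFixture]
public class PerformanceTests
{
    [Test]
    public void ProcessChunk_StaysWithinTimeBudget()
    {
        var stopwatch = Stopwatch.StartNew();
        ProcessChunk();                       // the region of interest
        stopwatch.Stop();

        // Assert on a performance property rather than external behavior.
        Assert.That(stopwatch.ElapsedMilliseconds, Is.LessThan(50));
    }

    private void ProcessChunk() { /* hypothetical method under test */ }
}
```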

Throughout an application there are many different parameters that affect the runtime behavior, ranging from general aspects such as the design pattern used to describe the architecture, to more specific ones such as the number of threads executing an application or the size of global data structures. The concrete values of each parameter can be manifold and are orthogonal to all other parameters. Thus, each parameter defines one dimension in a hypercube depicting the complexity space of an application. It is easy to see that even a very small application with, let's say, 5 parameters results in a 5-dimensional cube - with only 10 possible values per parameter, that is already 10^5 = 100,000 configurations.

Now, what has all this got to do with performance optimization and the pearl necklace, you might ask? Calm down and relax, I'll get to it: when capturing the whole application and deducing parameters along the way, you easily end up lost in this exploding complexity space. Instead, our approach enables you to focus on certain regions of interest. You can first concentrate on those areas of sequential code that use most of the runtime, and then keep monitoring those parallel regions throughout the whole development phase. Also, we keep a history table to be able to capture and analyze the effects your changes had on your "pearls".
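A minimal sketch of that history idea, assuming measurements are keyed by region name (the class shape is an illustration, not our actual implementation):

```csharp
using System;
using System.Collections.Generic;

public class PerformanceHistory
{
    // Per region: the measurements of all past test runs, in order.
    private readonly Dictionary<string, List<long>> _runsMillis
        = new Dictionary<string, List<long>>();

    public void Record(string region, long elapsedMillis)
    {
        List<long> entries;
        if (!_runsMillis.TryGetValue(region, out entries))
            _runsMillis[region] = entries = new List<long>();
        entries.Add(elapsedMillis);
    }

    // Relative change of the latest run against the previous one,
    // e.g. +0.25 means the region got 25% slower after a change.
    public double? LatestChange(string region)
    {
        List<long> e;
        if (!_runsMillis.TryGetValue(region, out e) || e.Count < 2)
            return null;
        return (double)(e[e.Count - 1] - e[e.Count - 2]) / e[e.Count - 2];
    }
}
```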

K!

Wednesday, October 6, 2010

Enhancing Unit Tests for Performance Monitoring #1

Hey folks,

in the past several weeks I've eagerly tested Intel's Parallel Studio. I got my hands on the beta release, tested it against some common parallel problems, and continued working on the parallelization into September. Since then, though, I have been developing an approach combining software patterns with unit tests that I'd like to share with you. In the coming weeks I'll follow up with further performance aspects.


Our research in this field shows that parallel applications can basically be seen as sequential code with parallel regions embedded in between. I might use the metaphor of a "pearl necklace" now and then, because I like the idea that the pearls are arranged one after another and that they are the precious spots that harbor parallel potential.

Recently, some quality work has been done around unit tests that attracted my attention. I'm currently working on a test-based way to detect parallelization potential in sequential applications. Now, everybody familiar with the topic knows that unit test frameworks don't support this out of the box, so we're just about to enhance them ourselves. The new thing about our approach is that unit tests can then not only assert Boolean conditions on fields or return values; assertions can also account for memory consumption, cache evictions, or other counters that are quite interesting to those dealing with parallelization. As the test runs execute repeatedly alongside the development process, our approach can also be used to monitor performance changes that occur in the course of code refactoring.
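As a taste of what such an enhanced assertion could look like, here is a hedged C# sketch asserting on memory consumption; BuildIndex and the 10 MB budget are invented for the example:

```csharp
using System;
using NUnit.Framework;

[TestFixture]
public class MemoryAssertionTests
{
    [Test]
    public void BuildIndex_MemoryFootprintStaysBounded()
    {
        long before = GC.GetTotalMemory(true);   // force a full collection first
        var index = BuildIndex();
        long after = GC.GetTotalMemory(true);
        GC.KeepAlive(index);                     // keep the result reachable

        // Assert on memory consumption instead of a Boolean field condition.
        Assert.That(after - before, Is.LessThan(10 * 1024 * 1024));
    }

    private int[] BuildIndex() { return new int[1024]; } // hypothetical workload
}
```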

So now you can more easily focus on the essentials - the pearls - and leave the rest out.

K!

Friday, June 4, 2010

Software Design Patterns for Parallelization

Welcome back,

it's been some time, but it's been some quality time:

In the last few weeks I've further investigated how to cope with the multicore era as a software engineer.
For this purpose I've collected software patterns that can be used for code refactorings. For now I can say that I see a huge benefit in having well-structured sequential source code when preparing to go parallel.

So I thought it would be nice to say something about those patterns:
In our lectures on software engineering we tell our students that using design patterns together with architectural patterns relatively early in the design phase has some huge advantages within the software development cycle. The benefits of these patterns - "solutions to recurring problems" - are:

- With every developer having a sound understanding of patterns, the code is much easier to read and will thus be understood much better.
- Patterns improve team communication by defining a clear terminology for this specific context.
- Patterns improve both the quality and the structure of source code.
- With patterns, code refactorings like minor or even major revision updates can be achieved much faster.
- Patterns support state-of-the-art programming.

But patterns also have one tiny, itsy-bitsy drawback:
- Nobody uses them. Ever.

Snap! OK, well... that's a bit too gloomy and pessimistic. But still: design patterns [2] are used far less often in software than they should be, given the benefits you get. Now, why is that so? One reason for sure is that not many developers around the globe know about the full potential of software patterns. Within the last few years, though, MVC (Model-View-Controller) [1] was one pattern that finally got some attention due to the strong increase in web applications. The second reason is that when you don't know what you're doing, you can run into problems with patterns that you wouldn't have had if you hadn't used them. Of course, it's not specific to patterns that you can mess things up when you don't know what you're doing; nonetheless, this keeps people away from patterns... The third reason - maybe the easiest to understand and the hardest to fight against: people do things the way they always did. Design patterns came up in the 1980s and gained some reputation in object-oriented design in the 1990s. Only since the turn of the millennium have patterns been a topic of research in software engineering. So, to many developers this is still relatively new. They didn't grow up with that paradigm and they clearly didn't learn how to program that way.

Now, I'm not here today to tell you that the points of criticism against patterns are invalid. But when you take the time to read about software patterns and then use them in one of your next projects, you will definitely come to appreciate the benefits mentioned above, because they will suddenly make sense to you.

When I find the time, I'll come up with a post on Intel's "Parallel Studio 2011", which is currently in beta. It's supposed to help developers by guiding them through the whole process of parallelizing software.
I'll sum up the results we got out of the first two real-world projects with a quote from Albert Einstein, who said:
"Any intelligent fool can make things bigger, more complex, and more violent. It takes a touch of genius -- and a lot of courage -- to move in the opposite direction."
K!

[1] - Model-View-Controller pattern (MVC): http://en.wikipedia.org/wiki/Model%E2%80%93view%E2%80%93controller
[2] - Design Patterns: http://en.wikipedia.org/wiki/Design_pattern_(computer_science)

Tuesday, February 9, 2010

Parallelism of the future

Hey folks,

as I'm currently working on some form of automatic parallelism in software engineering, I'd like to share my perspective on solving this problem on the machines of tomorrow. As I've already mentioned in the post "Multicore programming - but what comes next", the multicore era of today already belongs to the past, and the future called "manycore" is already the present.
We as software engineers have to think about this today in order to close the gap that currently exists between the hardware potential and the software engineering expertise. In my studies I find that we need to offer the software developer a development environment that takes care of certain parallel situations itself. To my mind it's the wrong decision to burden every developer with the specifics of parallelism. Instead, we as parallelism researchers need to figure out a handy way to achieve a certain form of parallelism semi-automatically or even automatically. This is what I'm focusing on.

In the same way virtual machines have taken over a large part of business software development, AutoParallelism has to establish itself. And just as native code is faster than IL code, an automatically parallelized application won't reach the speedup of an application developed by a parallel specialist - which is totally OK. If you walk the extra mile, you get the extra speed.

Currently, the parallelization frameworks are very fine-grained: they offer statement-based parallelization together with statement-based locks. Also, the compiler experts work on automatic compilers that are able to detect parallelizable loops. I'm heading in another direction: don't detect small jobs and roll them out on multiple processors, but detect large jobs and schedule those. Intel's TechParallel Talk supports my thesis.
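To contrast the two directions, here is a small C# sketch; the job methods are placeholders I made up:

```csharp
using System;
using System.Threading.Tasks;

class GranularitySketch
{
    // Fine-grained: statement/loop-level parallelism, as today's
    // frameworks and auto-parallelizing compilers target it.
    static void FineGrained(double[] data)
    {
        Parallel.For(0, data.Length, i => data[i] = Math.Sqrt(data[i]));
    }

    // Coarse-grained: detect whole, method-sized jobs and schedule those.
    // LoadData and BuildIndex are hypothetical large jobs.
    static void CoarseGrained()
    {
        Task load  = Task.Run(() => LoadData());
        Task index = Task.Run(() => BuildIndex());
        Task.WaitAll(load, index);
    }

    static void LoadData()   { /* ... */ }
    static void BuildIndex() { /* ... */ }
}
```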

Let's wait and see what I can tell you next time.