Euro-Par 2006 in Dresden, Germany (picture of Dresden: DWT/Dittrich)

Distinguished Papers

The Program Committee selected 5 distinguished papers from the 292 papers submitted. These were presented in 4 topics:
    1. Support Tools and Environments
    2. Performance Prediction and Evaluation
    11. Distributed and High-Performance Multimedia
    17. High-Performance Bioinformatics

  • Andrej Kuehnal, Marc-Andre Hermanns, Bernd Mohr, Felix Wolf
    Specification of Inefficiency Patterns for MPI-2 One-sided Communication
    Topic: 1. Support Tools and Environments

    Abstract: Automatic performance analysis of parallel programs can be accomplished by scanning event traces of program execution for patterns representing inefficient behavior. The temporal and spatial relationships between individual runtime events recorded in the event trace allow the recognition of wait states as a result of suboptimal parallel interaction. In our earlier work, we have shown how patterns related to MPI point-to-point and collective communication can be easily specified using common abstractions that represent execution-state information and links between related events. In this article, we present new abstractions targeting remote memory access (aka one-sided communication) as defined in the MPI-2 standard. We also describe how the general structure of these abstractions differs from our earlier work to accommodate the more complicated sequence of data-transfer and synchronization operations required for this type of communication. To demonstrate the benefits of our methodology, we specify typical performance properties related to one-sided communication.

  • Sadaf R Alam and Jeffrey S Vetter
    Hierarchical Model Validation of Symbolic Performance Models of Scientific Kernels
    Topic: 2. Performance Prediction and Evaluation

    Abstract: Multi-resolution modeling and validation of performance models of scientific applications is critical primarily for two reasons. First, the step-by-step validation determines the correctness of all essential components or phases in a science simulation. Second, a model that is validated at multiple resolution levels is the very first step to generate predictive performance models, not only for existing systems but also for future systems and problem sizes. We present the design and validation of performance models of two scientific kernels using a new technique called modeling assertions (MA). Our MA prototype framework generates symbolic performance models that can be evaluated efficiently by generating the equivalent model representations in Octave and MATLAB. The multi-resolution modeling and validation is conducted on two contemporary, massively-parallel systems, XT3 and Blue Gene/L system, which provides 512 MBytes and 256 MBytes physical memory in its two unique execution modes. The experimental results validate the hierarchical MA models' predictions in terms of the workload distribution and the growth rates with respect to the key input parameters. In addition, the memory requirements and limits that are identified by the MA models are also verified by the runtime values.

  • Maik Nijhuis, Herbert Bos, Henri E. Bal
    Supporting Reconfigurable Parallel Multimedia Applications
    Topic: 11. Distributed and High-Performance Multimedia

    Abstract: Programming multimedia applications for System-on-Chip (SoC) architectures is difficult because streaming communication, user event handling, reconfiguration, and parallelism have to be dealt with. We present Hinch, a runtime system for multimedia applications, that efficiently exploits parallelism by running the application in a dataflow style. The application has to be implemented as components that communicate using streams. Reconfigurability is supported by a generic component interface. Measurements have been performed on a SpaceCake SoC architecture simulator. Hinch can easily be ported to other shared-memory architectures.

  • Xiaoyuan Yang, Porfidio Hernández, Fernando Cores, Ana Ripoll, Remo Suppi, Emilio Luque
    Providing VCR in a Distributed Client Collaborative Multicast Video Delivery Scheme
    Topic: 11. Distributed and High-Performance Multimedia

    Abstract: In order to design a highly scalable video delivery technology for VoD systems, two representative solutions have been developed: multicast and P2P. Each of them has limitations when it has to implement VCR interactions to offer true-VoD services. With multicast delivery schemes, part of the system resources has to be exclusively allocated in order to implement VCR operations; therefore, the initial VoD system performance is considerably reduced. The P2P technology is able to decentralize the video delivery process among all the clients. However, P2P solutions are designed for video streaming systems on the Internet and do not implement VCR interactivity. Therefore, P2P solutions are not suitable for true-VoD systems. In this paper, we propose the design of VCR mechanisms for a P2P multicast delivery scheme. The new mechanisms coordinate all the clients to implement the VCR operations using multicast communications. We compared our design with previous schemes and the results show that our approach is able to support VCR operations without increasing the system resource requirements; furthermore, the existence of VCR operations even reduces the resource requirements by up to 24%.

  • Shingo Masuno, Tsutomu Maruyama, Yoshiki Yamaguchi, Akihiko Konagaya
    Multidimensional Dynamic Programming for Homology Search on Distributed Systems
    Topic: 17. High-Performance Bioinformatics

    Abstract: Alignment problems in computational biology have attracted attention recently because of the rapid growth of sequence databases. By computing an alignment, we can understand the similarity among the sequences. Dynamic programming is a technique to find an optimal alignment, but it requires very long computation time. We have shown that dynamic programming for more than two sequences can be efficiently processed on a compact system which consists of an off-the-shelf FPGA board and its host computer (node). The performance is, however, not enough for comparing long sequences. In this paper, we describe a computation method for multidimensional dynamic programming on distributed systems. The method is now being tested using two nodes connected by Ethernet. The data transfer speed of Ethernet (100 Mbps) is not enough, but according to our experiments, it is possible to achieve a 5.1 times speedup with 16 nodes, and more speedup can be expected for comparing longer sequences using a larger number of nodes. The performance is affected only a little by the data transfer delay when comparing long sequences. Therefore, our method can be mapped onto any kind of network with large delays.
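
For readers unfamiliar with alignment by dynamic programming, the two-sequence case can be sketched as follows. This is a minimal illustration only and not the authors' FPGA or distributed method: the function name alignment_score and the scoring values (match +1, mismatch -1, gap -1) are assumptions chosen for the example. The multidimensional case mentioned in the abstract replaces the two-dimensional table with one dimension per sequence, so the number of table cells grows as the product of the sequence lengths, which is why long computation times arise.

```python
def alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Return the optimal global alignment score of sequences a and b.

    Classic pairwise dynamic programming (Needleman-Wunsch style).
    The scoring parameters are illustrative assumptions.
    """
    # Row 0: aligning the empty prefix of a against prefixes of b costs only gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        # Column 0: aligning a[:i] against the empty prefix of b.
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            # Align a[i-1] with b[j-1] (match or mismatch).
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Align a[i-1] with a gap.
            up = prev[j] + gap
            # Align b[j-1] with a gap.
            left = curr[j - 1] + gap
            curr.append(max(diag, up, left))
        prev = curr
    return prev[len(b)]


if __name__ == "__main__":
    print(alignment_score("GATTACA", "GATTACA"))
```

Only two rows of the table are kept at a time, which is sufficient when the score alone is needed; recovering the alignment itself would require keeping the full table or traceback pointers.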