Research Projects
Andrea Marongiu leads a small group of young researchers and graduate students focused on parallel programming for multi- and many-processor systems on a chip. The work covers the programming model, the compiler (based on GCC), and the runtime support needed to address the performance and energy issues of heterogeneous multi-processor systems on a chip (MPSoCs). A particular focus is the efficient use of explicitly managed (i.e. scratchpad-based) memory hierarchies that adhere to the Partitioned Global Address Space (PGAS) paradigm.
OpenMP extensions for NUMA MPSoC programming
OpenMP was chosen as a convenient starting point for developing a programming framework. The implementation challenges of supporting the SIMD/OpenMP parallel execution model on an embedded PGAS MPSoC were addressed first, in order to create a scalable execution layer. A set of extensions to the OpenMP API was then defined and implemented; these augment the standard interface with features that expose the memory system at the application level. The extensions can be summarized as follows:
Features to trigger data distribution and data movement (additional #pragma directives)
Compiler support (based on GCC 4.3) for instrumenting distributed array accesses (software address translation) and for DMA-based data transfers
A lightweight lookup mechanism, based on compiler-generated metadata, for low-cost distributed array references
Allocation compiler passes that use profile information on array access counts to determine a data distribution scheme that captures data locality at each parallel region
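To make the last three items more concrete, the following is a minimal sketch in C of how profile-driven tile placement, per-tile metadata, and software address translation could fit together. It is an illustration under stated assumptions, not the group's implementation: the names (tile_meta_t, place_tiles, translate, roundtrip_errors), the tile and cluster sizes, and the "most frequent accessor owns the tile" policy are all hypothetical, and the scratchpads are simulated with an ordinary array.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_CLUSTERS 4   /* hypothetical: clusters, each with one scratchpad */
#define NUM_TILES    4   /* hypothetical: tiles the array is split into */
#define TILE_ELEMS   8   /* hypothetical: elements per tile */

/* Metadata a compiler could emit per tile: owning cluster and local base. */
typedef struct {
    int    owner;  /* cluster whose scratchpad holds the tile */
    size_t base;   /* first element of the tile inside that scratchpad */
} tile_meta_t;

/* Simulated scratchpads, sized for the worst case (all tiles in one cluster). */
static int scratchpad[NUM_CLUSTERS][NUM_TILES * TILE_ELEMS];

/* Profile-driven placement (assumed policy): each tile goes to the cluster
 * that accessed it most often; ties go to the lowest cluster id. */
void place_tiles(unsigned counts[NUM_TILES][NUM_CLUSTERS],
                 tile_meta_t meta[NUM_TILES])
{
    size_t next_free[NUM_CLUSTERS] = {0};
    for (int t = 0; t < NUM_TILES; t++) {
        int best = 0;
        for (int c = 1; c < NUM_CLUSTERS; c++)
            if (counts[t][c] > counts[t][best])
                best = c;
        meta[t].owner = best;
        meta[t].base  = next_free[best];   /* pack tiles back to back */
        next_free[best] += TILE_ELEMS;
    }
}

/* Software address translation: global index -> location in a scratchpad.
 * One metadata lookup plus a division and a remainder per reference. */
int *translate(const tile_meta_t meta[NUM_TILES], size_t i)
{
    assert(i < (size_t)NUM_TILES * TILE_ELEMS);
    const tile_meta_t *m = &meta[i / TILE_ELEMS];
    return &scratchpad[m->owner][m->base + i % TILE_ELEMS];
}

/* Round-trip check: write every element through the translation, then read
 * it back. Returns the number of mismatches (0 if the mapping is consistent). */
int roundtrip_errors(const tile_meta_t meta[NUM_TILES])
{
    int errors = 0;
    for (size_t i = 0; i < (size_t)NUM_TILES * TILE_ELEMS; i++)
        *translate(meta, i) = (int)i * 3;
    for (size_t i = 0; i < (size_t)NUM_TILES * TILE_ELEMS; i++)
        errors += (*translate(meta, i) != (int)i * 3);
    return errors;
}
```

With a profile in which cluster 0 dominates accesses to tiles 0 and 3, place_tiles packs both tiles into cluster 0's scratchpad, and translate(meta, 25) resolves to element 9 of that scratchpad (tile 3, offset 1). In a real toolchain the translation would be inserted by the compiler at each distributed array reference rather than called by hand.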
SIMinG-1000: Fast and scalable simulation of thousand-core architectures on GPGPUs
Recently, the scope of our research has broadened to GPGPU programming, with an emphasis on fast and scalable parallel simulation of many-core systems on GPU hardware. Simulators are the primary tool for application development and performance evaluation of future massively parallel (thousand-core) architectures. However, the complexity of such architectures exposes the limitations of current simulation technologies: today's virtual platforms suffer from performance and scalability problems that prevent them from handling 1000-core scenarios.
Our research aims to develop a fast and accurate simulation framework, SIMinG-1000, that targets extremely large parallel systems by exploiting the processing parallelism available in modern GPGPUs.
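The text does not describe how SIMinG-1000 is structured internally, so the following C sketch only illustrates the general idea that makes many-core simulation a good fit for GPUs: if advancing one simulated core by one cycle touches only that core's state, all cores can be stepped independently. The toy core model and the names core_state_t, step_core, and simulate are assumptions made for this example; the sequential inner loop stands in for what would be one GPU thread per simulated core.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_CORES 1024  /* hypothetical size of the simulated system */

/* Hypothetical toy core: a program counter and one accumulator register. */
typedef struct {
    uint32_t pc;
    uint32_t acc;
} core_state_t;

/* Advance one core by one cycle. The step reads and writes only this core's
 * state, so steps of different cores are independent of each other. */
void step_core(core_state_t *core, uint32_t core_id)
{
    core->acc += core_id;  /* stand-in for executing one instruction */
    core->pc  += 4;        /* fixed-width instruction fetch */
}

/* Simulate `cycles` cycles in lockstep. The inner loop has no dependencies
 * between iterations; on a GPU it would map to one thread per simulated core,
 * with a barrier between consecutive cycles. */
void simulate(core_state_t cores[NUM_CORES], uint32_t cycles)
{
    assert(cores != NULL);
    for (uint32_t t = 0; t < cycles; t++)
        for (uint32_t id = 0; id < NUM_CORES; id++)
            step_core(&cores[id], id);
}
```

A realistic simulator also has to model shared resources such as interconnect and memory, which introduce dependencies between cores that this sketch deliberately leaves out.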