As in 2017, Polly Labs participates as a mentoring organization in Google Summer of Code 2018. Even before 2017 we participated indirectly, through the LLVM Foundation and Julia.

This page is meant as a starting point for students interested in submitting a project proposal to our organization.

Applying

First and foremost, please contact us before applying. You can reach us with enquiries about any of our projects through the Polly Labs mailing list. We can help you decide which project to work on and review your proposal. In our experience, the proposals of students who contacted us first are vastly better than those of students who did not.

The detailed guide is available at the How-to-apply page.

In summary:

  • We will ask you to make a small contribution to your project of choice, for instance fixing a minor bug. This helps you get familiar with our patch workflow and gives us an idea of your skills.

  • Choose one of the project suggestions below or propose your own.

  • Write a proposal using the template found at the How-to-apply page.

  • Let us review your proposal before submitting the final version.

Mentors

These people have agreed to potentially mentor a project:

  • Tobias Grosser
  • Michael Kruse
  • Sven Verdoolaege
  • Oleksandr Zinenko
  • Chandan Reddy
  • Michael Ferguson
  • Siddharth Bhat
  • Armin Größlinger
  • Stefan Ganser
  • Philip Pfaffe

Project Ideas

These are some projects students can apply for. The list is not exhaustive; we are open to further project suggestions.

PRL for CUDA

PRL is a runtime library for PPCG-generated code. Its main advantages over the ocl_utilities.c used by default by PPCG are that it reuses OpenCL objects (context, command queue, etc.) as well as the programs compiled by the OpenCL driver, so they do not have to be recompiled every time a PPCG-optimized region starts execution. It also allows reusing memory buffers allocated on the GPU, with fine-grained control over when they have to be transferred between host and accelerator.

This library currently exists only for OpenCL. With CUDA, program compilation and persistent objects are managed by CUDA and its runtime; however, there is currently no comparable control over data transfers. The goal of this GSoC project is to add CUDA support to PRL.
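The payoff of reusing compiled programs can be illustrated with a small toy model (plain Python; all names here are invented for illustration and are not the actual PRL API):

```python
import hashlib

class KernelCache:
    """Toy model of PRL-style program reuse: compile once, reuse thereafter."""
    def __init__(self, compile_fn):
        self._compile = compile_fn  # stands in for the OpenCL/CUDA driver compiler
        self._programs = {}         # source hash -> compiled program
        self.compilations = 0       # counts how often we actually compiled

    def get(self, source):
        key = hashlib.sha256(source.encode()).hexdigest()
        if key not in self._programs:
            self._programs[key] = self._compile(source)
            self.compilations += 1
        return self._programs[key]

cache = KernelCache(compile_fn=lambda src: ("binary-for", src))
for _ in range(3):                  # the same SCoP region executed three times
    prog = cache.get("__kernel void k() {}")
assert cache.compilations == 1      # compiled only on the first execution
```

Without such caching, each entry into the optimized region would pay the driver-compilation cost again, which is exactly what PRL avoids for OpenCL and what the CUDA runtime largely handles by itself.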

Things to do in this project:

  • Factor out the parts shared between the OpenCL and CUDA versions of PRL.
  • Implement the PRL memory tagging API (e.g. prl_mem_manage) for CUDA.
  • Change the PPCG code generator to optionally generate CUDA code for PRL.
  • Add support to pencilcc, e.g. pass the required libraries to the linking pass.
  • Measure the speedup gained through memory tagging.
  • Maybe standardize the memory tagging API in PENCIL (e.g. if there are additional flags only useful for CUDA).

Required knowledge: C, Low-level CUDA

Estimated difficulty: Moderate

Possible mentor(s): Michael Kruse and/or Chandan Reddy

Fuzzing Polly and isl

Fuzz testing could be a useful tool for finding bugs in the existing implementations. Unfortunately, for both projects it is not as simple as pointing libFuzzer or American Fuzzy Lop (AFL) at them.

In the case of Polly, naive fuzzing merely exercises the bitcode parser, which has not itself been hardened against fuzzing yet. Moreover, the probability of randomly generating something that even remotely looks like a SCoP is rather low. Using llvm-stress does not raise that probability much, and it can only generate certain categories of nested loop structures. A tool that can determine whether some condition can occur, and that produces reduced test cases which would otherwise be hard to find, would be very valuable.

For isl, simple fuzzing would likewise only test the parser. More interesting would be to exercise the library’s other public functions as well; for instance, some error conditions are only hit under unusual circumstances.

This project is more about developing tools for fuzzing and finding bugs than about fixing them; the community would try to fix the bugs in the majority of cases. Possible strategies are:

  • Try to increase the likelihood of AFL or libFuzzer generating SCoPs.
  • Modify llvm-stress to make it more likely to generate SCoP-like IR.
  • Write a tool that combines and mutates tests from Polly’s test suite into new test cases. (This is essentially what AFL and libFuzzer do as well, but at the bit level.)
  • Fuzz minimal programs that read a polyhedron as input, process it using isl functions, and check the plausibility of the result.
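As a rough sketch of the combine-and-mutate strategy, a textual crossover over existing test cases might look like the following toy model (Python, with invented mini-loop syntax standing in for the .ll files of Polly's test suite):

```python
import random

def splice(parent_a, parent_b, rng):
    """Combine two test cases: a prefix of one followed by a suffix of the other."""
    a, b = parent_a.splitlines(), parent_b.splitlines()
    cut_a = rng.randrange(1, len(a))   # keep at least the first line of parent_a
    cut_b = rng.randrange(0, len(b))
    return "\n".join(a[:cut_a] + b[cut_b:])

def mutate_constants(test_case, rng):
    """Perturb integer literals, a structure-preserving mutation."""
    out = []
    for tok in test_case.split(" "):
        if tok.lstrip("-").isdigit():
            tok = str(int(tok) + rng.choice([-1, 1, 16]))
        out.append(tok)
    return " ".join(out)

rng = random.Random(0)                 # fixed seed for reproducibility
a = "for i = 0 to 100\n  A[i] = B[i] + 1\nend"
b = "for j = 0 to 50\n  C[j] = C[j] * 2\nend"
child = mutate_constants(splice(a, b, rng), rng)
```

Unlike bit-level mutation, starting from known-valid SCoP inputs keeps most offspring inside the interesting input space, which is the whole point of this strategy.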

Required knowledge: LLVM bitcode, Fuzzing

Estimated difficulty: Easy

Possible mentor: Michael Kruse

Implement a faster SCoP detection in Polly

Last year, Sebastian Pop and Aditya Kumar published the article "SCoP Detection: A Fast Algorithm for Industrial Compilers". It presents a more compile-time-efficient algorithm for finding SCoPs at the IR level, which they implemented in GCC Graphite.

This project’s idea is to either implement a similar algorithm in Polly or to improve the detection in other ways; for example, the same IR is currently checked multiple times. Polly’s ScopDetection pass is based on LLVM’s RegionInfo, which in turn uses LLVM’s DominanceFrontier analysis. The latter is considered deprecated and has quadratic worst-case behavior. Not least, the current ScopDetection performs a lot of redundant computation.
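For background, the classic iterative dominator computation that analyses like DominanceFrontier build on can be sketched as follows. Note that this naive fixpoint iteration itself has quadratic worst-case behavior, which is part of the motivation for faster approaches (toy Python, not LLVM's implementation):

```python
def dominators(cfg, entry):
    """Iterative data-flow computation of dominator sets on a CFG given as
    {node: [successor, ...]}: dom(n) = {n} united with the intersection of
    dom(p) over all predecessors p of n."""
    preds = {n: [] for n in cfg}
    for n, succs in cfg.items():
        for s in succs:
            preds[s].append(n)
    nodes = set(cfg)
    dom = {n: set(nodes) for n in nodes}   # start from "everything dominates"
    dom[entry] = {entry}
    changed = True
    while changed:                          # iterate to a fixpoint
        changed = False
        for n in nodes - {entry}:
            if not preds[n]:
                continue
            new = {n} | set.intersection(*(dom[p] for p in preds[n]))
            if new != dom[n]:
                dom[n], changed = new, True
    return dom

# A diamond-shaped CFG: entry branches to b and c, which rejoin at exit.
cfg = {"entry": ["b", "c"], "b": ["exit"], "c": ["exit"], "exit": []}
dom = dominators(cfg, "entry")
assert dom["b"] == {"entry", "b"}
assert dom["exit"] == {"entry", "exit"}   # neither branch block dominates exit
```

RegionInfo then uses dominance (and post-dominance) information to carve the CFG into single-entry, single-exit regions, which is where ScopDetection starts looking for SCoPs.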

Required knowledge: Polyhedral model

Estimated difficulty: Very difficult

Possible mentors: Tobias Grosser, Andreas Simbürger

Correctly preserve LLVM analyses after Polly’s CodeGen

Transformation passes in LLVM can choose to either preserve an analysis’ result, by adapting it to the changes they made, or let the pass manager throw the analysis away and recompute it when it is needed again. Due to Polly’s current pass structure, Polly must preserve all analyses it uses, or the pass manager will also throw away Polly’s own SCoP analyses.

Currently, Polly does not preserve all analyses as well as it could. Most of the time it preserves just enough information that other SCoPs in the same function can still be processed correctly, and there is an ugly workaround to ensure that the analyses are recomputed for all other passes.

The following analyses must be preserved:

  • DominatorTree: Fully preserved.
  • ScalarEvolution: At the moment Polly just invalidates everything in the SCoP.
  • LoopInfo: Polly ensures that already found loops stay consistent, but does not add new loops in the generated code.
  • RegionInfo: Like loops, existing regions are preserved, but no new regions in generated code are found.
  • ScopDetection: Code-generated SCoPs should be invalidated.
  • DependenceInfo: Code-generated SCoPs should be invalidated.
  • Alias analyses: No effort made; probably not necessary either.

The main goal of this project is to ensure that all analyses are preserved correctly, such that they pass their verifyAnalysis function and no miscompilation occurs with the barrier workaround disabled. A stretch goal is to improve the quality of analysis preservation, for instance by adding loops in the generated code to LoopInfo.
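The preserve-or-invalidate contract can be illustrated with a toy analysis manager (plain Python; this mimics the idea, not LLVM's actual pass-manager API):

```python
class AnalysisManager:
    """Toy model of pass-manager caching: a transformation declares which
    analyses it preserved; everything else is dropped and recomputed on demand."""
    def __init__(self, analyses):
        self._compute = analyses     # name -> function computing the result
        self._cache = {}
        self.recomputations = 0

    def get(self, name):
        if name not in self._cache:
            self._cache[name] = self._compute[name]()
            self.recomputations += 1
        return self._cache[name]

    def run_transform(self, preserved):
        for name in list(self._cache):
            if name not in preserved:
                del self._cache[name]  # stale result: force recomputation

am = AnalysisManager({"DominatorTree": lambda: "domtree",
                      "LoopInfo": lambda: "loops"})
am.get("DominatorTree"); am.get("LoopInfo")
am.run_transform(preserved={"DominatorTree"})  # e.g. CodeGen kept the tree updated
am.get("DominatorTree"); am.get("LoopInfo")
assert am.recomputations == 3                  # only LoopInfo was recomputed
```

Declaring an analysis preserved without actually keeping it consistent is the dangerous case: the cache then hands out stale results, which is why the verifyAnalysis checks matter.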

Required knowledge: LLVM pass manager (old and new)

Difficulty: Moderate

Possible mentors: Michael Kruse

Profile Guided Optimization in Polly

Even though Polly’s compile time today is not much higher than that of other non-trivial IR passes, the need to version code in many situations and the lack of static knowledge about loop iteration counts, hotness of functions, and parameter values force Polly to be significantly more conservative than it would need to be. The goal of this project is to connect Polly with the LLVM profiling infrastructure and exploit profiling information to decide: 1) when to run Polly, 2) how aggressively to version the code, 3) which code version to emit, and 4) which assumptions to make. As a result, in profile-guided builds Polly can become more aggressive while still having a lower compile-time and code-size impact.
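A hypothetical sketch of how such decisions might be gated on profile data (all names, return values, and thresholds here are invented for illustration):

```python
def plan_polly(profile_count, static_trip_count, hot_threshold=1000):
    """Toy policy combining the four decisions above into one gate."""
    if profile_count < hot_threshold:
        return "skip"                  # cold code: save compile time and code size
    if static_trip_count is None:
        return "version"               # unknown trip counts: emit guarded fast/slow versions
    return "optimize-unconditionally"  # hot and statically known: no runtime checks needed

assert plan_polly(10, None) == "skip"
assert plan_polly(50_000, None) == "version"
assert plan_polly(50_000, 256) == "optimize-unconditionally"
```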

Required knowledge: PGO in LLVM

Possible mentors: Tobias Grosser

Compile Chapel with LLVM/Polly

The Chapel compiler supports an LLVM backend, and early experiments with Polly integration indicated that Polly was not identifying loops from Chapel kernels as analyzable. The goal of this project is to figure out what is going wrong and address the issue. Doing so might require patching Polly or Chapel, and the student will be expected to work with the appropriate developers to make that happen. Once Polly recognizes loops from Chapel, the next step is to demonstrate some simple Chapel loops running on a GPU through the Polly integration.

Wanted knowledge: Polly, Chapel

Estimated difficulty: Moderate

Possible mentors: Michael Kruse, Michael Ferguson

Improving the Polyhedral Playground

The Polyhedral Playground is an interactive Jupyter worksheet for trying out the Integer Set Library and polyhedral compilation.

Here are some ideas on what can be worked on:

  • Use WebAssembly for better performance.
  • Export more functions from isl.
  • More interactive JavaScript-based features.
  • Improve graphical representation.
  • More tutorials and examples.

Required knowledge: Jupyter, JavaScript, Emscripten

Estimated difficulty: Moderate (easy if you have prior experience with these technologies)

Possible mentors: Tobias Grosser, Oleksandr Zinenko

Evaluate Spatial Locality Optimizers

We have two (competing) implementations that optimize schedules for spatial locality; Polly’s rescheduler currently only optimizes for temporal locality.

The task is to evaluate these spatial optimizers and implement at least one of them in Polly. Individual features of both implementations might even be combined. In addition, Tensor Comprehensions has functionality we might also be interested in.
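To illustrate what a spatial-locality optimizer buys: for a row-major array, the loop order chosen by the scheduler determines how many cache lines are touched. A toy model (assuming a single-line cache, purely for illustration):

```python
def cache_lines_touched(n, line_elems, order):
    """Count cache-line loads for traversing an n*n row-major array, with a
    toy one-line cache (the worst case for reuse across lines)."""
    loads, last_line = 0, None
    for i in range(n):
        for j in range(n):
            r, c = (i, j) if order == "ij" else (j, i)
            line = (r * n + c) // line_elems   # cache line holding A[r][c]
            if line != last_line:
                loads += 1
                last_line = line
    return loads

n, line_elems = 64, 8
assert cache_lines_touched(n, line_elems, "ij") == n * n // line_elems  # stride-1
assert cache_lines_touched(n, line_elems, "ji") == n * n                # stride-n
```

The stride-1 order reuses each fetched line for all of its elements, an 8x reduction in loads here; a spatial-locality-aware scheduler tries to arrange innermost dimensions so that accesses get this stride-1 pattern.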

Required knowledge: C, C++, understanding of the underlying polyhedral theory is a plus

Estimated difficulty: Hard

Possible mentors: Sven Verdoolaege, Oleksandr Zinenko

Collecting Polyhedral Benchmarks

Our ability to optimize programs depends on how we measure them. At the moment we mostly use Polybench. This project is to assemble more benchmarks that a polyhedral optimizer should be able to transform. The benchmark programs should contain only the performance-relevant kernel, with no obstacles that could make transformations illegal (e.g., indirect array accesses, aliasing, overflows, function calls). The goal is to add these benchmarks to LLVM’s test-suite.

Fields which we think are underrepresented in our current benchmark set:

Required knowledge: C, C++

Estimated difficulty: Easy