Thursday, 14 July 2016

PUMPS 2016 - Fourth day

Today's lectures and practical session concluded the CUDA part of the summer school. The lectures covered a case study, heterogeneous computing algorithms, and architecture trends and their implications. On the fifth day the participants will get their hands dirty with the OmpSs programming model.
Lecture slides:


The winner of yesterday's poster session was also announced. The winning poster was "Chunking SVM Training Implementation in CUDA", and the presenter was awarded a Tesla K40 graphics card.
The winner of poster presentations
Between the lectures I was able to visit the supercomputer center near our summer school lecture hall.
Supercomputer specs:
  • Name: MareNostrum III (2013)
  • Ranking (in the top 500 supercomputers list): 106
  • Peak performance: 1.1 Petaflops
  • Main memory: 115.5 TB
  • Disk storage: 2 PB
  • Operating system: Linux (SUSE distribution)
MareNostrum has 52 racks and takes up a space of 120 m².

Wednesday, 13 July 2016

PUMPS 2016 - Third day

The third day of PUMPS 2016 consisted of three lectures, one practical session and some poster presentations, and ended with a social event where snacks and drinks were served. The topics discussed in the lectures were more advanced than on the previous days but were still presented in an engaging way (students stayed awake all day and interacted with the lecturers). The lectures covered privatization, graphs, dynamic parallelism and CUDA 8 features.
Third day lecture slides:
David Kirk talking about CUDA 8 features
Posters before the poster presentation session

Tuesday, 12 July 2016

PUMPS 2016 - Second day

Today there were three lectures, mostly covering sparse data, computational thinking, performance tools and higher-level programming (Numba, PyCUDA etc.). After the lectures the second practical task had to be completed.
Second day lecture slides:

The purpose of the practical part was to help students understand sparse matrix storage formats and their impact on performance, using sparse matrix-vector multiplication as an example. The formats used were Compressed Sparse Row (CSR) and Jagged Diagonal Storage (JDS).
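To make the two formats more concrete, here is a minimal sequential sketch in plain Python. It is not the code from the practical (that was written in CUDA, and I am not reproducing it here); the function names and the small test matrix are my own assumptions, chosen only to show how CSR and JDS lay out the same nonzero elements and how each layout is traversed in a sparse matrix-vector product.

```python
# Illustrative sketch only: function names and data are hypothetical,
# not taken from the PUMPS practical.

def dense_to_csr(matrix):
    """Convert a dense list-of-lists matrix to CSR arrays."""
    values, col_idx, row_ptr = [], [], [0]
    for row in matrix:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)
                col_idx.append(j)
        row_ptr.append(len(values))  # marks where the next row starts
    return values, col_idx, row_ptr


def spmv_csr(values, col_idx, row_ptr, x):
    """y = A * x with A in CSR form (on a GPU: typically one thread per row)."""
    y = [0] * (len(row_ptr) - 1)
    for row in range(len(row_ptr) - 1):
        acc = 0
        for k in range(row_ptr[row], row_ptr[row + 1]):
            acc += values[k] * x[col_idx[k]]
        y[row] = acc
    return y


def csr_to_jds(values, col_idx, row_ptr):
    """Convert CSR to JDS: rows sorted by length, stored diagonal by diagonal."""
    n_rows = len(row_ptr) - 1

    def row_len(r):
        return row_ptr[r + 1] - row_ptr[r]

    # perm[i] is the original index of the i-th longest row.
    perm = sorted(range(n_rows), key=row_len, reverse=True)
    max_len = row_len(perm[0]) if n_rows else 0
    jds_values, jds_cols, jd_ptr = [], [], [0]
    for d in range(max_len):
        for r in perm:
            if row_len(r) <= d:
                break  # rows are sorted by length, so the rest are shorter too
            jds_values.append(values[row_ptr[r] + d])
            jds_cols.append(col_idx[row_ptr[r] + d])
        jd_ptr.append(len(jds_values))
    return jds_values, jds_cols, jd_ptr, perm


def spmv_jds(jds_values, jds_cols, jd_ptr, perm, x):
    """y = A * x with A in JDS form; results are written back in original row order."""
    y = [0] * len(perm)
    for d in range(len(jd_ptr) - 1):
        start = jd_ptr[d]
        for i in range(jd_ptr[d + 1] - start):
            y[perm[i]] += jds_values[start + i] * x[jds_cols[start + i]]
    return y


A = [[3, 0, 1, 0],
     [0, 0, 0, 0],
     [0, 2, 4, 1],
     [1, 0, 0, 1]]
x = [1, 2, 3, 4]
csr = dense_to_csr(A)
jds = csr_to_jds(*csr)
y_csr = spmv_csr(*csr, x)
y_jds = spmv_jds(*jds, x)
```

The usual motivation for JDS on a GPU is that neighbouring rows, once sorted, have similar lengths and their elements sit next to each other in memory, which tends to help with load balance and memory coalescing. In this sequential sketch only the data layout differs; both functions return the same vector.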
Illustrative schema from the instructions of the practical part
After the official part was over I took time to roam around Barcelona (PUMPS 2016 takes place at the UPC in Barcelona). People here seem friendly and I have only good things to say about the local food (I try to stay away from Burger King and McDonald's).
Picture taken on the roof of Arenas de Barcelona

Monday, 11 July 2016

PUMPS 2016 - First day

The PUMPS (Programming and Tuning Massively Parallel Systems) summer school is the first big stop on my journey. It is a five-day intensive course where the basics of CUDA programming are taught. As I have not done any programming on CUDA cards before, I feel it is a necessary step that will make implementing my encoding cluster easier.
On the first day there were three lectures and one practical session that, among other things, covered scatter parallelization and gather parallelization.
First day lecture slides:

In the practical session scatter parallelization and gather parallelization had to be implemented in CUDA code. For this there was an online, browser-based environment that automatically checked the submitted code. This environment is going to be used throughout the course.
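For readers who have not met the two terms, here is a small sequential sketch in plain Python of the difference. The actual exercise was in CUDA and is not reproduced here; the problem (each input value contributes to its neighbours within a fixed radius) and the function names are my own assumptions, picked to illustrate the idea. In the scatter version the work is organised per input element, in the gather version per output element.

```python
# Illustrative sketch only: the problem and names are hypothetical,
# not the actual PUMPS exercise.

def scatter(inp, radius):
    """Input-centric: each input element adds its value to nearby outputs."""
    n = len(inp)
    out = [0] * n
    for i, v in enumerate(inp):  # on a GPU: one thread per input element
        for j in range(max(0, i - radius), min(n, i + radius + 1)):
            # Several "threads" update the same out[j], so a real CUDA
            # kernel would need an atomic add here.
            out[j] += v
    return out


def gather(inp, radius):
    """Output-centric: each output element collects the nearby inputs."""
    n = len(inp)
    out = [0] * n
    for j in range(n):  # on a GPU: one thread per output element
        acc = 0
        for i in range(max(0, j - radius), min(n, j + radius + 1)):
            acc += inp[i]
        out[j] = acc  # a single writer per output, no atomics needed
    return out


data = [1, 2, 3, 4]
scattered = scatter(data, 1)
gathered = gather(data, 1)
```

Both functions compute the same result. The difference that matters on a GPU is who writes where: scatter has many writers per output location, while gather gives each output a single owner.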

Wen-mei Hwu giving a lecture

Introduction

My name is Hendrik and I am a 24-year-old Estonian. I can speak Estonian, English and some programming languages (I have also studied Russian for many years, but programming languages are closer to my heart).
Computers have fascinated me since my early childhood. I used to play Commander Keen on my mother's work computer, often wondering how the character in the game obeyed all my orders, especially when it meant the death of the hero.
The first PC in our household had the Windows 95 operating system installed on it. There I learned that if movies do not play, there is typically some sort of mystical codec missing, and that without the right drivers the computer is just a box of electronics.
Now I am older and have a lot of experience in electronics, personal computers, robotics and much more. My Bachelor's thesis was about an AES implementation on an FPGA development board. On FPGA boards I first encountered extensive parallel programming. Encouraged by this, I became certain that I wanted to do something with parallel computing as part of my Master's thesis as well.
Another interest and passion of mine is videography. With my friends I have made various short films, fun clips and other media for over five years. Creating video media also brings the need for video encoding. This is usually a tedious and time-consuming process, and while it runs the computer is more or less unusable.
When I found out that my university has some Nvidia CUDA cards and available nodes, I thought it would be a wonderful idea to combine my two passions and create a cluster with Nvidia cards that would relieve my computer of the tedious encoding load. Using it for my own videos is probably not going to happen, as it is the school's hardware, but universities have even more videos to encode, since lectures and events are often recorded.
So for my Master's thesis I am designing a CUDA cluster that is capable of video encoding. Future posts will cover what I learn during this project.
A memory from my childhood

The big plan

In this blog I will post about the journey of writing a Master's thesis in the Robotics and Computer Engineering curriculum at the University of Tartu. I have just finished the first year of the Master's program and I am preparing for the second year.

The outline (plan for the content of this blog):

  1. Introduction posts;
  2. PUMPS 2016 summer school posts;
  3. Third semester post;
  4. Fourth semester posts;
  5. Conclusion post.

This is my first blog