Class UAB Notes

My class and lab notes from UAB


These are my class notes from the High Performance Computing (HPC) course at the Universitat Autònoma de Barcelona (UAB). These posts aren't written with academic rigor, not yet :D.

Class Parallel Computer Architecture


Class Date: 01/10/2012

Keywords: KILL Rule, Little's Law, Performance, Multi-core's Architecture


When we talk about performance there are two approaches: hardware and software. In the first case, we can improve performance by increasing the clock frequency. This solution has problems such as power consumption and heat dissipation. If you double the clock frequency, the power consumption increases about six times, which means you need an expensive cooling system to keep the machine working without problems.

The KILL Rule ("Kill If Less than Linear") is associated with clock frequency. The rule says that if you increase the chip area devoted to a resource, the performance gain must increase by at least the same percentage; otherwise the resource shouldn't be increased. In other words: the performance increase must be LINEAR in the resource's area increase.

Algorithmically:

   if less than linear:
       kill

Multi-core architecture is a simple answer to the KILL rule, because it just joins many cores on the same chip. Hey!!! Then don't we hit the KILL rule in multi-core too? Maybe; I'll write about it in a future post. But if you can't wait, see [1].
Another hardware performance problem is leakage current. Leakage is a loss of electrical current or electrons. [2] says: "as semiconductor manufacturers continue to make transistors smaller to squeeze more onto a chip, leakage current problems increase. Smaller transistors have thinner insulating layers, causing more leakage current." This is a problem for the growth of processor performance, because semiconductor manufacturers can't keep increasing the number of transistors inside the chip.
In addition, we saw Little's Law in class. In [3] they say: "Little's Law says that, under steady state conditions, the average number of items in a queuing system equals the average rate at which items arrive multiplied by the average time that an item spends in the system. Letting

L = average number of items in the queuing system,

W = average waiting time in the system for an item, and

A = average number of items arriving per unit time, the law is

L = A × W"
For example, how many cars are inside a factory at any moment? If each car takes 100 days to build (W) and the factory finishes 100 cars per day (A), then on average there are 10,000 cars in production at any time:

100 cars/day × 100 days = 10,000 cars

"The interpretation (of items) will depend on the application and the goals of the modeler" [3]. More examples in [4].


We can apply Little's Law to parallel performance this way:
Parallelism = Latency × Bandwidth
Latency means: time each operation spends in flight
Bandwidth means: volume of work completed per unit of time
Parallelism means: threads (operations) that must be available in flight
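A tiny numeric sketch (the numbers are my own made-up assumptions, not from class):

```python
def little(rate, time_in_system):
    """Little's Law: average items in the system = arrival rate x time per item."""
    return rate * time_in_system

# The car factory: 100 cars/day finished, 100 days per car
# -> 10,000 cars in production at any time.
print(little(100, 100))  # 10000

# Applied to parallelism: hypothetical memory latency of 100 ns and a target
# of 10 operations completed per ns -> 1000 operations must be in flight.
print(little(10, 100))   # 1000
```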

Other important concepts cited in class were: data throughput, thread context switch, multi-threading. I think I'll talk about them more in a future post.
GPU -> 1000 threads ready to be executed.


[1] Agarwal, Anant; Levy, Markus. The KILL Rule for Multicore. In 44th ACM/IEEE Design Automation Conference (DAC '07), pp. 750-753, June 2007.

[2] http://www.wisegeek.com/what-is-leakage-current.htm

[3] http://web.mit.edu/sgraves/www/papers/Little's%20Law-Published.pdf

[4] http://www.factoryphysics.com/Principle/LittlesLaw.htm


Class Information Theory


Class Date: 04/10/2012

Posted: 11/10/2012

Keywords: Entropy, Math Demonstration




Entropy:

Basic concepts. Part of this note was originally written in Portuguese because I was very busy; here is the English translation. The entropy equations are available at the link below, along with two math demonstrations. I think they are useful.

Information theory is a branch of probability theory and statistics that deals with communication systems, data transmission, noise, and error correction.

Information (or self-information) is defined as: -log P(x).
The total information is the sum of the individual informations.

Entropy measures the amount of information that can be obtained from a source.

Entropy is a measure used to design better systems. It makes it possible to calculate the probability distribution and the capacity to transmit information. Entropy, also called average information, is the weighted average of the information, where the weight of each term is the probability that the corresponding symbol occurs.

Mutual entropy (mutual information) is the correlation between two sets of events (x, y).

Conditional entropy is the entropy of one variable conditioned on the occurrence of another.

This link has many entropy equations and math demonstrations: http://www.4shared.com/office/Fue1sLeY/EntropyQ3.html
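As a quick illustration of the definitions above (my own sketch, not from the class material), Shannon entropy of a discrete distribution:

```python
import math

def entropy(probs):
    """Shannon entropy H = -sum(p * log2(p)), measured in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# A fair coin carries 1 bit of information per toss...
print(entropy([0.5, 0.5]))  # 1.0
# ...while a two-headed coin carries none: the outcome is certain.
print(entropy([1.0]))       # 0.0
```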


The following example was solved in class:

Example 1 We have two coins, one unbiased and the other two-headed. A coin
is selected at random and tossed twice, and the number of heads is recorded.
How much information is conveyed about the identity of the coin by the number
of heads obtained?

The solution is in the pictures below. Click to enlarge.






These pictures are available at:

http://www.4shared.com/photo/coMPem_E/IMG_0120.html
http://www.4shared.com/photo/KW67wXtZ/IMG_0121.html
http://www.4shared.com/photo/MkJsIFzw/IMG_0123.html
http://www.4shared.com/photo/L6pB4M7l/IMG_0125.html
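In case the picture links rot, here is my own sketch of the computation: the coin C is fair or two-headed with probability 1/2 each, H is the number of heads in two tosses, and the answer is the mutual information I(C; H).

```python
import math

def h(probs):
    """Shannon entropy (bits) of a distribution given as probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Joint distribution P(coin, heads):
# fair coin -> heads ~ Binomial(2, 1/2); two-headed coin -> always 2 heads.
joint = {("fair", 0): 1/8, ("fair", 1): 1/4, ("fair", 2): 1/8, ("biased", 2): 1/2}

# Marginal distribution of the number of heads.
p_heads = {}
for (coin, heads), p in joint.items():
    p_heads[heads] = p_heads.get(heads, 0) + p

# I(C; H) = H(C) + H(H) - H(C, H)
i_ch = h([1/2, 1/2]) + h(list(p_heads.values())) - h(list(joint.values()))
print(round(i_ch, 4))  # ~0.5488 bits conveyed about the coin's identity
```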





[1] http://arquivoescolar.org/bitstream/arquivo-e/132/5/cap1.pdf (this file is very good, but it is written in Portuguese)

[2] http://en.wikipedia.org/wiki/Information_theory

[3] http://en.wikipedia.org/wiki/Conditional_entropy

[4] http://en.wikipedia.org/wiki/Entropy_(information_theory)

[5] http://pt.wikipedia.org/wiki/Teoria_da_informa%C3%A7%C3%A3o




Class Parallel Computer Architecture


Class Date: 08/10/2012

Posted: 15/10/2012

Keywords: KILL Rule, Little's Law, Instruction Level Parallelism, Data Level Parallelism, Thread Level Parallelism, SIMD, MIMD


This class started with a review and showed the advantages and disadvantages of parallelism.


The main problems are programmability (it's very hard):
  - synchronization, dependencies, communication, shared memory, resources, load imbalance

and reliability (failures become more frequent as the system grows):
    1 processor fails every 100,000 hours
    2 processors fail every 50,000 hours
                 .
                 .
                 .
    100,000 processors fail every 1 hour
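The scaling above can be sketched numerically (assuming independent failures and the class's 100,000-hour per-processor figure):

```python
def system_mttf(n_processors, single_mttf_hours=100_000):
    """With n independent processors, some processor fails n times as often,
    so the mean time between failures of the whole system drops by a factor of n."""
    return single_mttf_hours / n_processors

for n in (1, 2, 100_000):
    print(f"{n} processors -> a failure every {system_mttf(n):g} hours")
```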





We reviewed the KILL Rule: if you replicate processors (multi-core), you can get a linear performance increase.

We also reviewed Little's Law applied to parallelism.


Techniques to handle long memory latency: 
  - Multithreading to tolerate latency
  - Cache memory to reduce latency

"New" concepts were introduced: 

ILP - Instruction Level Parallelism. Instruction-level parallelism is a technique that overlaps instruction processing. Some definitions:

"ILP is the principle that there are many instructions in code that don't depend on each other. That means it's possible to execute those instructions in parallel." [1]

"Instruction-level parallelism (ILP) is a measure of how many of the operations in a computer program can be performed simultaneously. The potential overlap among instructions is called instruction level parallelism." [2]

"Instruction-level Parallelism (ILP) is a family of processor and compiler design techniques that speed up execution by causing individual machine operations, such as memory loads and stores, integer additions and floating point multiplications, to execute in parallel. The operations involved are normal RISC-style operations, and the system is handed a single program written with a sequential processor in mind."[3]

This paper [7] talks about converting thread-level parallelism to instruction-level parallelism using simultaneous multithreading.

DLP - Data Level Parallelism is the capacity to execute the same operation on many data items. Examples of applications: media-oriented (audio and video) and matrix-oriented (scientific computing).
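A minimal sketch of the DLP pattern in plain Python (my own example): one operation applied element-wise over many data items, which is exactly what SIMD hardware exploits.

```python
# Data-level parallelism: one operation, many data items.
# A SIMD unit would perform these multiplications simultaneously;
# the loop below just expresses the same "one op over all elements" pattern.
samples = [0.1, 0.5, -0.3, 0.8]   # e.g. audio samples (a media-oriented workload)
gain = 2.0
scaled = [gain * x for x in samples]
print(scaled)
```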


There is an interesting paper about DLP, ILP and TLP in [5]. There are many concepts and comparisons in [4], along with a great cook analogy for ILP, TLP and DLP. In [6] you can find more concepts explained.

The SIMD architecture is more energy-efficient than MIMD, since a single instruction fetch drives many data operations. SIMD can be found in GPUs, CPUs, and vector processors (which are about 30 years old).




[1] http://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=2&cad=rja&sqi=2&ved=0CCQQFjAB&url=http%3A%2F%2Fwww.csd.uoc.gr%2F~hy590-25%2FChapter04-Pipelining2.ppt&ei=I8h7UOjZGs-0hAeUpYGABg&usg=AFQjCNEIp2y5ijl4tRxJzDmnmTcUq_cYew

[2] http://en.wikipedia.org/wiki/Instruction-level_parallelism

[3] http://www.hpl.hp.com/techreports/92/HPL-92-132.pdf

[4] https://smartsite.ucdavis.edu/access/content/group/1707812c-4009-4d91-a80e-271bde5c8fac/dlp1.pdf

[5] http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.161.5549

[6] http://indigo.ece.neu.edu/~dschaa/docs/parallelism.html

[7] http://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&cad=rja&sqi=2&ved=0CB4QFjAA&url=http%3A%2F%2Fciteseerx.ist.psu.edu%2Fviewdoc%2Fdownload%3Fdoi%3D10.1.1.48.6251%26rep%3Drep1%26type%3Dps&ei=JMx7UOPkM4aIhQfmw4CwDg&usg=AFQjCNFxibMFqitWeujtdFhp-wio_AX_nA



Class Research Methodology


Class Date: 08/10/2012

Posted: 22/10/2012

Keywords: CAOS, Research Area, HPC, UAB


This presentation:
http://www.4shared.com/office/fettqfYT/AboutCAOS.html

talks about the research areas of CAOS, the research group in HPC at UAB.

If you want more information about CAOS, visit: http://caos.uab.cat/


.
.
Sorry. No time to update.
.
.

Lab Sun Grid Engine (SGE)

Lab Date: 21/01/2012

Posted: 22/01/2012

Keywords: SGE, Job, HPC, Cluster


If you are a new SGE user, I think the following page is a good starting point. It is simple and gives you useful information on how to execute, delete, and get information about jobs in SGE.

Enjoy: http://star.mit.edu/cluster/docs/0.93.3/guides/sge.html
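As a minimal sketch (the job name and file names are illustrative; check your cluster's conventions), a typical SGE job script and the basic commands look like:

```shell
#!/bin/bash
# myjob.sh -- a minimal SGE job script (names here are hypothetical examples)
#$ -N myjob          # job name
#$ -cwd              # run from the current working directory
#$ -o myjob.out      # file for standard output
#$ -e myjob.err      # file for standard error

echo "Hello from $HOSTNAME"

# Submit, inspect, and delete jobs from the login node:
#   qsub myjob.sh    # submit the job
#   qstat            # list your queued/running jobs
#   qdel <job_id>    # delete a job
```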


