<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' xmlns:gd='http://schemas.google.com/g/2005' xmlns:thr='http://purl.org/syndication/thread/1.0'><id>tag:blogger.com,1999:blog-19694363</id><updated>2011-12-05T12:40:24.246-06:00</updated><category term='exascale programming'/><title type='text'>High Performance Computing</title><subtitle type='html'>Notes on high performance computing, compiler and programming model. Also all methodology of parallel computation, numerical analysis and its application.</subtitle><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://laksono.blogspot.com/feeds/posts/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/19694363/posts/default?max-results=100'/><link rel='alternate' type='text/html' href='http://laksono.blogspot.com/'/><link rel='hub' href='http://pubsubhubbub.appspot.com/'/><author><name>Laksono Adhianto</name><uri>http://www.blogger.com/profile/15297127601614301975</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><generator version='7.00' uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>17</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>100</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-19694363.post-8617260152807423671</id><published>2011-12-05T12:39:00.001-06:00</published><updated>2011-12-05T12:40:24.257-06:00</updated><title type='text'>Coarray Fortran 2.0 in Supercomputing 2011</title><content type='html'>Taken from Rice University news http://compsci.rice.edu/news.cfm?doc_id=14042 : &lt;br /&gt;
&lt;blockquote class="tr_bq"&gt;
In a competition held at SC11—the International Conference for High 
Performance Computing, Networking, Storage, and Analysis—in Seattle, 
Washington in November 2011, Rice University's Coarray Fortran (CAF) 2.0
 language, compiler, and runtime system received an award as one of 
three finalists in the HPC Challenge Class II Awards Competition. The 
CAF 2.0 project is led by Professor John Mellor-Crummey.
&lt;br /&gt;
&lt;br /&gt;
At SC11, Dr. Laksono Adhianto delivered a presentation about CAF 2.0, a 
set of language extensions that support scientific programming on 
scalable parallel systems in a session devoted to the HPC Challenge 
Awards Competition.&amp;nbsp; The presentation described the implementation and 
performance of several CAF 2.0 programs that solve problems in the HPC 
Challenge Benchmark suite. These programs were then run on up to 8,192 
cores of several Department of Energy supercomputers. Other members of 
the Rice Coarray Fortran 2.0 project team include Guohua Jin, Mark 
Krentel, Karthik Murthy, Dung Nguyen, William Scherer III, Scott Warren,
 and Chaoran Yang&lt;/blockquote&gt;&lt;div class="blogger-post-footer"&gt;Current information and news on high perfomance computer and the tools&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/19694363-8617260152807423671?l=laksono.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://laksono.blogspot.com/feeds/8617260152807423671/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=19694363&amp;postID=8617260152807423671' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/19694363/posts/default/8617260152807423671'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/19694363/posts/default/8617260152807423671'/><link rel='alternate' type='text/html' href='http://laksono.blogspot.com/2011/12/coarray-fortran-20-in-supercomputing.html' title='Coarray Fortran 2.0 in Supercomputing 2011'/><author><name>Laksono Adhianto</name><uri>http://www.blogger.com/profile/15297127601614301975</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-19694363.post-4004734947632655293</id><published>2011-04-15T11:36:00.003-05:00</published><updated>2011-04-15T11:53:24.360-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='exascale programming'/><title type='text'>Exascale programming: Won't be and will be</title><content type='html'>There is a very interesting article in &lt;a href="http://hpcwire.com"&gt;hpcwire&lt;/a&gt; written by Michael Wolfe (PGI) on the "emerging" of exascale programming: &lt;a 
href="http://www.hpcwire.com/features/Compilers-and-More-Programming-at-Exascale-117593783.html?viewAll=y"&gt;http://www.hpcwire.com/features/Compilers-and-More-Programming-at-Exascale-117593783.html?viewAll=y&lt;/a&gt;

There are some interesting points we can learn:
&lt;span style="font-weight: bold;"&gt;
Exascale Programming: What It Won't Be&lt;/span&gt;
&lt;ul&gt;&lt;li&gt;It's not a library: a library hides the algorithm, data structures, and performance aspects&lt;/li&gt;&lt;li&gt;It's not a C++ class hierarchy or template library: too much abstraction&lt;/li&gt;&lt;li&gt;It's not a domain-specific language: DSLs don't have a large enough user community or enough support.
&lt;/li&gt;&lt;li&gt;It's not OpenCL: very low level, too complicated.
&lt;/li&gt;&lt;li&gt;It's not a whole new language: most programmers avoid adopting a new language for fear that it will die&lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: bold; color: rgb(255, 0, 0);"&gt;It's not easy&lt;/span&gt;. The idea of making parallel programming easy is silly (?)
&lt;/li&gt;&lt;li&gt;It's not just parallelism: the key isn't parallelism, it's performance&lt;/li&gt;&lt;/ul&gt;
&lt;span style="font-weight: bold;"&gt;Exascale Programming: What It Is&lt;/span&gt;
&lt;ul&gt;&lt;li&gt;It supports all levels of parallelism, from node parallelism down to vector and pipeline parallelism, effectively.&lt;/li&gt;&lt;li&gt;It can map an expression of program parallelism (a parallel loop, say) to different levels of hardware parallelism (across nodes, or to a vector unit) depending on the target.&lt;/li&gt;&lt;li&gt;It supports the programmer with lots of feedback.&lt;/li&gt;&lt;li&gt;It supports dynamic parallelism, creating parallel tasks and threads when needed.&lt;/li&gt;&lt;li&gt;It efficiently composes abstract operations&lt;/li&gt;&lt;li&gt;It is self-balancing and self-tuning.&lt;/li&gt;&lt;li&gt;It must be resilient.&lt;/li&gt;&lt;/ul&gt;I agree with most of the above points, except the claim that &lt;span style="font-style: italic;"&gt;exascale programming won't be easy&lt;/span&gt;. Yes, effective parallel programming is not easy, and easy parallel programming is not effective. But this does not mean there is no way to be both easy and effective. I think the main key is &lt;span style="font-weight: bold;"&gt;productivity&lt;/span&gt;. 
It is acceptable to be less efficient as long as we can tolerate the performance.&lt;div class="blogger-post-footer"&gt;Current information and news on high perfomance computer and the tools&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/19694363-4004734947632655293?l=laksono.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://laksono.blogspot.com/feeds/4004734947632655293/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=19694363&amp;postID=4004734947632655293' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/19694363/posts/default/4004734947632655293'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/19694363/posts/default/4004734947632655293'/><link rel='alternate' type='text/html' href='http://laksono.blogspot.com/2011/04/exascale-programming-wont-be-and-will.html' title='Exascale programming: Won&apos;t be and will be'/><author><name>Laksono Adhianto</name><uri>http://www.blogger.com/profile/15297127601614301975</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-19694363.post-1374716542261638376</id><published>2007-11-15T09:13:00.000-06:00</published><updated>2007-11-15T09:38:37.817-06:00</updated><title type='text'>Creating customized toolbar in Eclipse view</title><content type='html'>By default, an Eclipse view's toolbar appears on the right side of the view, as shown in the image below:

&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_4qhUJxn4D5E/Rzxj_vZHszI/AAAAAAAAABQ/Px_QNJTCHZA/s1600-h/toolbar-original.PNG"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;" src="http://4.bp.blogspot.com/_4qhUJxn4D5E/Rzxj_vZHszI/AAAAAAAAABQ/Px_QNJTCHZA/s320/toolbar-original.PNG" border="0" alt=""id="BLOGGER_PHOTO_ID_5133087622000259890" /&gt;&lt;/a&gt;

This is fine when the actions are not visually associated with any column. However, once we need to associate a button or tool item with a table column or a column header, we need to place the toolbar as close as possible to the table column. Here is the trick: design a layout, preferably a GridLayout, then place the toolbar on top of the tree view or table view.

&lt;blockquote&gt;
 &lt;pre&gt;
    private Composite createCoolBar(Composite aParent) {
        // give the parent a single-column grid layout
        GridLayout grid = new GridLayout(1, false);
        aParent.setLayout(grid);
        CoolBar coolBar = new CoolBar(aParent, SWT.FLAT);
        GridData data = new GridData(GridData.FILL_HORIZONTAL);
        coolBar.setLayoutData(data);
        // prepare the toolbar
        ToolBar toolbar = new ToolBar(coolBar, SWT.FLAT);

        // ------------- prepare the items
        // flatten
        ToolItem tiFlatten = new ToolItem(toolbar, SWT.PUSH);
        tiFlatten.setToolTipText("Flatten the node");
        ....
        return aParent; // no changes, reuse the parent
    }

    public void createPartControl(Composite aParent) {
        Composite parent = this.createCoolBar(aParent);
        treeViewer = new TreeViewer(parent,
                SWT.SINGLE | SWT.FULL_SELECTION | SWT.BORDER);
        ....
        // tricky: needed so the tree expands to fill the remaining space
        GridData data = new GridData(GridData.FILL_BOTH);
        treeViewer.getTree().setLayoutData(data);
    }
 &lt;/pre&gt;
&lt;/blockquote&gt;

The result is shown below:

&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_4qhUJxn4D5E/Rzxnu_ZHs0I/AAAAAAAAABY/TmIjpa1Fja8/s1600-h/toolbar-new.PNG"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;" src="http://1.bp.blogspot.com/_4qhUJxn4D5E/Rzxnu_ZHs0I/AAAAAAAAABY/TmIjpa1Fja8/s320/toolbar-new.PNG" border="0" alt=""id="BLOGGER_PHOTO_ID_5133091732283962178" /&gt;&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;Current information and news on high perfomance computer and the tools&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/19694363-1374716542261638376?l=laksono.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://laksono.blogspot.com/feeds/1374716542261638376/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=19694363&amp;postID=1374716542261638376' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/19694363/posts/default/1374716542261638376'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/19694363/posts/default/1374716542261638376'/><link rel='alternate' type='text/html' href='http://laksono.blogspot.com/2007/11/creating-customized-toolbar-in-eclipse.html' title='Creating customized toolbar in Eclipse view'/><author><name>Laksono Adhianto</name><uri>http://www.blogger.com/profile/15297127601614301975</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://4.bp.blogspot.com/_4qhUJxn4D5E/Rzxj_vZHszI/AAAAAAAAABQ/Px_QNJTCHZA/s72-c/toolbar-original.PNG' height='72' 
width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-19694363.post-8500065524128447498</id><published>2007-11-05T10:21:00.000-06:00</published><updated>2007-11-05T10:49:59.484-06:00</updated><title type='text'>Eclipse RCP: Is it really easy ?</title><content type='html'>&lt;p&gt;Eclipse RCP is supposed to be easy and to help programmers avoid reinventing the wheel. Unfortunately, its design goal of being as general as possible makes it hard to understand, because it is so feature-rich and complex.
For a simple RCP application, such as the samples and snippets (which can be checked out from Eclipse's repository), Eclipse is fun and easy. This is true at the beginning, but as time goes on, programmers will face very complex and unintuitive ways to build a rich application.
&lt;/p&gt;&lt;p&gt;
However, after searching the Internet for a complete tutorial, I found a very interesting page by Lars Vogel that describes step-by-step programming with Eclipse RCP from a feature perspective. The article is accessible at this URL: http://www.vogella.de/articles/RichClientPlatform/article.html
&lt;/p&gt;&lt;p&gt;
Although this article is not very deep, it covers the functionality that most programmers need. It is a highly recommended read for those who have no time to read books.
&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;Current information and news on high perfomance computer and the tools&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/19694363-8500065524128447498?l=laksono.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://laksono.blogspot.com/feeds/8500065524128447498/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=19694363&amp;postID=8500065524128447498' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/19694363/posts/default/8500065524128447498'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/19694363/posts/default/8500065524128447498'/><link rel='alternate' type='text/html' href='http://laksono.blogspot.com/2007/11/eclipse-rcp-is-it-really-easy.html' title='Eclipse RCP: Is it really easy ?'/><author><name>Laksono Adhianto</name><uri>http://www.blogger.com/profile/15297127601614301975</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-19694363.post-970581709282733028</id><published>2007-09-26T11:25:00.000-05:00</published><updated>2007-09-26T11:29:45.411-05:00</updated><title type='text'>cafc: Co-array Fortran Compiler from Rice</title><content type='html'>&lt;p&gt;Rice has their own Co-array Fortran source-to-source compiler named cafc. The source code can be downloaded from http://www.hipersoft.rice.edu/caf/download.html together with Open64/SL. It also needs third party libraries: &lt;a href="http://www.emsl.pnl.gov/docs/parsoft/armci"&gt;ARMCI&lt;/a&gt;+MPI or &lt;a href="http://gasnet.cs.berkeley.edu/"&gt;Gasnet&lt;/a&gt;.
&lt;/p&gt;
&lt;p&gt;
The libf90caf source code files are:
&lt;blockquote&gt;&lt;pre&gt;
#----------------------------------------------------------------------
# List of Source files
#----------------------------------------------------------------------
COMMON_COM_TARG_SRC =     \
  c_a_to_q.c        \
  config_host.c     \
  config_platform.c 

CFILES=  \
  $(COMMON_COM_SRC) \
  $(COMMON_COM_TARG_SRC)

COMMON_COM_CXX_SRC =  \
  config.cxx  \
  config_elf_targ.cxx \
  const.cxx \
  cxx_memory.cxx \
  dwarf_DST.cxx \
  dwarf_DST_dump.cxx  \
  dwarf_DST_mem.cxx \
  glob.cxx  \
  ir_bcom.cxx \
  ir_bread.cxx  \
  ir_bwrite.cxx \
  ir_reader.cxx \
  irbdata.cxx \
  mtypes.cxx  \
  opcode.cxx  \
  opcode_core.cxx \
  pu_info.cxx \
  quadop.cxx      \
  strtab.cxx  \
  symtab.cxx  \
  symtab_verify.cxx \
  wn.cxx    \
  wn_map.cxx  \
  wn_pragmas.cxx  \
  wn_simp.cxx \
  wn_util.cxx \
  wutil.cxx \
  xstats.cxx  \
  upc_symtab_utils.cxx  \
  upc_wn_util.cxx \
  intrn_info.cxx



COMMON_COM_TARG_CXX_SRCS =  \
  config_targ.cxx \
  targ_const.cxx  \
  targ_sim.cxx

CXXFILES =                 \
  dummy-defines.cxx            \
  $(COMMON_COM_CXX_SRC)        \
  $(COMMON_COM_TARG_CXX_SRCS)
&lt;/pre&gt;&lt;/blockquote&gt;
Pretty weird, isn't it?
&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;Current information and news on high perfomance computer and the tools&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/19694363-970581709282733028?l=laksono.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://laksono.blogspot.com/feeds/970581709282733028/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=19694363&amp;postID=970581709282733028' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/19694363/posts/default/970581709282733028'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/19694363/posts/default/970581709282733028'/><link rel='alternate' type='text/html' href='http://laksono.blogspot.com/2007/09/cafc-co-array-fortran-compiler-from.html' title='cafc: Co-array Fortran Compiler from Rice'/><author><name>Laksono Adhianto</name><uri>http://www.blogger.com/profile/15297127601614301975</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-19694363.post-3387790662866405191</id><published>2007-09-25T11:41:00.002-05:00</published><updated>2007-09-25T12:31:54.340-05:00</updated><title type='text'>Co-Array Fortran (CAF)</title><content type='html'>&lt;p&gt;&lt;a href="http://www.co-array.org/"&gt;CAF&lt;/a&gt; is one of the three &lt;a href="http://crd.lbl.gov/UPC/images/b/b5/PGAS_Tutorial_sc2003.pdf"&gt;partitioned global address space (PGAS)&lt;/a&gt; languages. The other two are &lt;a href="http://upc.gwu.edu/"&gt;Unified Parallel C (UPC)&lt;/a&gt; and &lt;a href="http://titanium.cs.berkeley.edu/"&gt;Titanium&lt;/a&gt;.
&lt;/p&gt;&lt;p&gt;

CAF is an SPMD-based programming model created by Robert Numrich and John Reid (researchers from Cray, I suppose?). 
The advantage of CAF is its simplicity compared to MPI, while it provides almost the same flexibility and control to users. Furthermore, CAF is based on Fortran 90, which makes it easier for Fortran users to learn. 
However, it does not seem possible in CAF to have dynamic processes, where the user can spawn or remove processes at run time. Another disadvantage is that it is based on "message" communication and therefore, in theory, gains no performance on multicore processors.
&lt;/p&gt;&lt;p&gt;

&lt;h3&gt;Rules:&lt;/h3&gt;
– Normal rounded brackets ( ) to point to data in local memory.
– Square brackets [ ] to point to data in remote memory.

&lt;h3&gt;Examples&lt;/h3&gt; (taken from [Numrich03]):
&lt;h4&gt;Variable declarations:&lt;/h4&gt;

&lt;blockquote&gt;&lt;pre&gt;real :: s[*]
real :: a(n)[*]
complex :: z[*]
integer :: index(n)[*]
real :: b(n)[p, *]
real :: c(n,m)[0:p, -7:q, 11:*]
real, allocatable :: w(:)[:]
type(field) :: maxwell[p,*]&lt;/pre&gt;&lt;/blockquote&gt;

&lt;h4&gt;Communication&lt;/h4&gt;
&lt;blockquote&gt;x(:)[q] = x(:) + x(:)[p]&lt;/blockquote&gt;
This means that x on processor q receives the sum of the local x and the x from processor p.
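
To make the data movement of that statement concrete, here is a rough simulation in plain Java. It is only a sketch under my own assumptions: the class CoarraySketch, the method assign, and the 0-based image numbers are hypothetical, and a single 2-D array stands in for the copies of x that really live in separate memories.

```java
public class CoarraySketch {
    // x[image] stands in for the co-array x(:)[image]; images are 0-based here
    // (an assumption of this sketch), and "me" is the image executing the statement.
    static void assign(double[][] x, int me, int p, int q) {
        int n = x[me].length;
        double[] sum = new double[n];
        // Evaluate the right-hand side first: local x plus the copy of x on image p.
        for (int i = 0; i != n; i++) {
            sum[i] = x[me][i] + x[p][i];
        }
        // Then store the result into the copy of x that lives on image q.
        System.arraycopy(sum, 0, x[q], 0, n);
    }

    public static void main(String[] args) {
        double[][] x = { {1.0, 2.0}, {10.0, 20.0}, {0.0, 0.0} };
        assign(x, 0, 1, 2);   // image 0 executes x(:)[2] = x(:) + x(:)[1]
        System.out.println(java.util.Arrays.toString(x[2]));   // prints [11.0, 22.0]
    }
}
```

The temporary array mirrors the rule that the right-hand side is evaluated before the assignment, so the result is still well defined when q is the same image as p or as the executing image.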


&lt;h4&gt;Real examples&lt;/h4&gt;
&lt;blockquote&gt;&lt;pre&gt;real,dimension(n,n)[p,*] :: a,b,c
do k=1,n
  do q=1,p
    c(i,j)[myP,myQ] = c(i,j)[myP,myQ] &amp;
                    + a(i,k)[myP, q]*b(k,j)[q,myQ]
  enddo
enddo&lt;/pre&gt;&lt;/blockquote&gt;
&lt;h3&gt;References&lt;/h3&gt;
&lt;ul&gt;
 &lt;li&gt;[Numrich03] &lt;a href="http://www.llnl.gov/asci/platforms/bluegene/papers/29numrich.pdf"&gt;Co-Array Fortran What Is It? Why should you put it on BlueGene/L?&lt;/a&gt;
Robert W. Numrich, Minnesota Supercomputing Institute University of Minnesota&lt;/li&gt;
&lt;/ul&gt;&lt;div class="blogger-post-footer"&gt;Current information and news on high perfomance computer and the tools&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/19694363-3387790662866405191?l=laksono.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://laksono.blogspot.com/feeds/3387790662866405191/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=19694363&amp;postID=3387790662866405191' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/19694363/posts/default/3387790662866405191'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/19694363/posts/default/3387790662866405191'/><link rel='alternate' type='text/html' href='http://laksono.blogspot.com/2007/09/co-array-fortran-caf.html' title='Co-Array Fortran (CAF)'/><author><name>Laksono Adhianto</name><uri>http://www.blogger.com/profile/15297127601614301975</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-19694363.post-4850054829748473564</id><published>2007-09-21T10:43:00.001-05:00</published><updated>2007-09-21T13:40:06.285-05:00</updated><title type='text'>Cilk</title><content type='html'>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_4qhUJxn4D5E/RvQP6pNdOrI/AAAAAAAAAAo/ZPceFLXO3Cg/s1600-h/CilkExample.png"&gt;&lt;img style="float:left; margin:0 10px 10px 0;cursor:pointer; cursor:hand;" src="http://2.bp.blogspot.com/_4qhUJxn4D5E/RvQP6pNdOrI/AAAAAAAAAAo/ZPceFLXO3Cg/s320/CilkExample.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5112728977141480114" /&gt;&lt;/a&gt;
&lt;p&gt;Cilk [Blumofe95] is a coarse-grained parallel programming model based on C. It is a faithful extension of C for multithreading that uses asynchronous parallelism and an efficient work-stealing scheduler. Cilk is currently limited to shared-memory architectures, although some efforts for distributed memory have been published [Nikhil95]. &lt;/p&gt;
&lt;p&gt;
Despite its simplicity, Cilk provides several advantages, as mentioned in [Kuszmaul07]: it is simple and small, can use the best serial code, can express cache-efficient code, is efficient (productivity), performs well on one processor, and is portable.
&lt;/p&gt;&lt;p&gt;
However, it also suffers from some limitations: it has no direct support for fine-grained parallelism, flexible synchronization, or thread groups (teams), and it lacks support for other languages such as Fortran and Java.
&lt;/p&gt;
One famous Cilk example is as follows:
&lt;table&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Original&lt;/th&gt;&lt;th&gt;Cilk version&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;
 &lt;tbody&gt;&lt;tr&gt;&lt;td&gt;
&lt;pre&gt;int fib(int n) {
 if (n &lt; 2) return n; 
 else {
  int n1, n2;
  n1 = fib(n-1);
  n2 = fib(n-2);
  return (n1 + n2);
 }
}&lt;/pre&gt;&lt;/td&gt;&lt;td&gt;
&lt;pre&gt;
cilk int fib(int n) {
 if (n &lt; 2) return n;
 else {
  int n1, n2;
  n1 = spawn fib(n-1);
  n2 = spawn fib(n-2);
  sync;
  return (n1 + n2);
 }
}
&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;
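
For readers without a Cilk compiler, the spawn/sync structure of the Cilk version can be imitated, very roughly, with plain Java threads. This is only a sketch under my own assumptions: the class FibSpawn, the depth cutoff, and the use of one thread per spawn are hypothetical and are not how the Cilk runtime works (Cilk schedules spawned work by work stealing instead of creating a thread per spawn).

```java
public class FibSpawn {
    // Serial version (assumes n is not negative); used below the cutoff.
    static int fibSerial(int n) {
        if (n == 0 || n == 1) return n;
        return fibSerial(n - 1) + fibSerial(n - 2);
    }

    // "spawn" is modelled by starting a thread and "sync" by joining it.
    // depth limits how many levels of the recursion create new threads.
    static int fibSpawn(final int n, final int depth) throws InterruptedException {
        if (n == 0 || n == 1) return n;
        if (depth == 0) return fibSerial(n);      // stop creating threads
        final int[] n1 = new int[1];              // slot for the child's result
        Thread child = new Thread() {
            public void run() {
                try {
                    n1[0] = fibSpawn(n - 1, depth - 1);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        };
        child.start();                            // spawn fib(n-1)
        int n2 = fibSpawn(n - 2, depth - 1);      // the parent computes fib(n-2) meanwhile
        child.join();                             // sync: wait for the spawned call
        return n1[0] + n2;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(fibSpawn(20, 3));      // prints 6765
    }
}
```

The depth cutoff is there because a Java thread is far more expensive than a Cilk spawn; without it, this version would create a thread for almost every call.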


The question is whether this language is good enough for multicore processors. According to &lt;a href="http://people.csail.mit.edu/cel/"&gt;Charles Leiserson&lt;/a&gt; from MIT, the answer is yes (see &lt;a href="http://supertech.csail.mit.edu/cilk/2006%20Cilk%20for%20Oregon%20--%20Lecture%201.ppt"&gt;Multithreaded programming in Cilk&lt;/a&gt;). But not for the general case, I presume: multicore processor architectures tend to encourage fine-grained rather than coarse-grained parallelism.


&lt;h3&gt;References&lt;/h3&gt;

[Blumofe95] Blumofe, R. D., Joerg, C. F., Kuszmaul, B. C., Leiserson, C. E., Randall, K. H., and Zhou, Y. 1995. Cilk: an efficient multithreaded runtime system. In Proceedings of the Fifth ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (Santa Barbara, California, United States, July 19 - 21, 1995). R. L. Wexelblat, Ed. PPOPP '95. ACM Press, New York, NY, 207-216. &lt;br/&gt;

[Nikhil95] Nikhil, R. S. 1995. Cid: A Parallel, "Shared-Memory" C for Distributed-Memory Machines. In Proceedings of the 7th international Workshop on Languages and Compilers For Parallel Computing (August 08 - 10, 1994). K. Pingali, U. Banerjee, D. Gelernter, A. Nicolau, and D. A. Padua, Eds. Lecture Notes In Computer Science, vol. 892. Springer-Verlag, London, 376-390.
&lt;br/&gt;
[Kuszmaul07] Kuszmaul, B. C. 2007. Cilk provides the "best overall productivity" for high performance computing: (and won the HPC challenge award to prove it). In Proceedings of the Nineteenth Annual ACM Symposium on Parallel Algorithms and Architectures (San Diego, California, USA, June 09 - 11, 2007). SPAA '07. ACM Press, New York, NY, 299-300.&lt;div class="blogger-post-footer"&gt;Current information and news on high perfomance computer and the tools&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/19694363-4850054829748473564?l=laksono.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='related' href='http://supertech.csail.mit.edu/cilk/' title='Cilk'/><link rel='replies' type='application/atom+xml' href='http://laksono.blogspot.com/feeds/4850054829748473564/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=19694363&amp;postID=4850054829748473564' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/19694363/posts/default/4850054829748473564'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/19694363/posts/default/4850054829748473564'/><link rel='alternate' type='text/html' href='http://laksono.blogspot.com/2007/09/cilk.html' title='Cilk'/><author><name>Laksono Adhianto</name><uri>http://www.blogger.com/profile/15297127601614301975</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/_4qhUJxn4D5E/RvQP6pNdOrI/AAAAAAAAAAo/ZPceFLXO3Cg/s72-c/CilkExample.png' height='72' 
width='72'/><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-19694363.post-6431417827756437843</id><published>2007-04-02T07:51:00.000-05:00</published><updated>2007-04-02T08:01:09.295-05:00</updated><title type='text'>Building CDT parser</title><content type='html'>&lt;a href="http://wiki.eclipse.org/index.php/CDT"&gt;Eclipse&lt;/a&gt; and &lt;a href="http://www-128.ibm.com/developerworks/views/opensource/libraryview.jsp?search_by=CDT+based+editor"&gt;IBM &lt;/a&gt;website provide some very interesting and useful articles to extend CDT parser (or at least to build a new parser). The &lt;a href="http://wiki.eclipse.org/index.php/CDT/User/FAQ"&gt;CDT FAQ&lt;/a&gt; gives a preliminary hint on how to develop a plugin. 

IBM has more comprehensive and detailed information on extending CDT, written by Matthew Scarpino. The articles are separated into 5 parts:
- &lt;a href="http://www.ibm.com/developerworks/library/os-ecl-cdt1/index.html"&gt;The C/C++ Development Tooling model&lt;/a&gt;
- &lt;a href="http://www.ibm.com/developerworks/library/os-ecl-cdt2/index.html"&gt;Presenting text in the CDT&lt;/a&gt;
- &lt;a href="http://www.ibm.com/developerworks/library/os-ecl-cdt3/index.html"&gt;Basic CDT parsing&lt;/a&gt;
- &lt;a href="http://www.ibm.com/developerworks/library/os-ecl-cdt4/index.html"&gt;Advanced CDT parsing and the Persisted Document Object Model&lt;/a&gt;
- &lt;a href="http://www.ibm.com/developerworks/library/os-ecl-cdt5/index.html"&gt;Using the PDOM for code completion&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;Current information and news on high perfomance computer and the tools&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/19694363-6431417827756437843?l=laksono.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://laksono.blogspot.com/feeds/6431417827756437843/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=19694363&amp;postID=6431417827756437843' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/19694363/posts/default/6431417827756437843'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/19694363/posts/default/6431417827756437843'/><link rel='alternate' type='text/html' href='http://laksono.blogspot.com/2007/04/building-cdt-parser.html' title='Building CDT parser'/><author><name>Laksono Adhianto</name><uri>http://www.blogger.com/profile/15297127601614301975</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-19694363.post-3968275481347372027</id><published>2007-03-13T11:35:00.000-05:00</published><updated>2007-03-13T11:41:33.227-05:00</updated><title type='text'>International Workshop on Performance Modeling and Evaluation 2007</title><content type='html'>Currently I am a program committee member for PMECT 2007: http://nets-www.lboro.ac.uk/pmect07/
&lt;h3&gt;Important dates&lt;/h3&gt;&lt;table&gt;&lt;tbody&gt; &lt;tr&gt;&lt;td&gt;Submission Deadline:    &lt;/td&gt;&lt;td&gt;   April 1, 2007&lt;/td&gt;&lt;/tr&gt; &lt;tr&gt;&lt;td&gt;Acceptance Notification:  &lt;/td&gt;&lt;td&gt;   May 11, 2007&lt;/td&gt;&lt;/tr&gt; &lt;tr&gt;&lt;td&gt;Camera Ready Copy:    &lt;/td&gt;&lt;td&gt;   June 1, 2007&lt;/td&gt;&lt;/tr&gt; &lt;tr&gt;&lt;td&gt;Author Registration due:  &lt;/td&gt;&lt;td&gt;   June 1, 2007&lt;/td&gt;&lt;/tr&gt; &lt;tr&gt;&lt;td&gt;Workshop Day:   &lt;/td&gt;&lt;td&gt;  August 16, 2007&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;

&lt;h3&gt;Submission&lt;/h3&gt;
Authors are invited to submit manuscripts reporting original unpublished research and recent developments in the topics related to the workshop. The proceedings of the workshop program will be published, as the ICCCN 2007 main conference, by IEEE Communications Society and IEEE Digital Library. An accepted paper must be registered and presented at the conference venue and must be limited to 6 pages (with an allowance of up to two extra pages at additional cost) in standard IEEE camera-ready format (double-column, 10-pt font).  Extended versions of selected high quality papers will appear in a Special Issue of Elsevier Journal of Simulation, Modeling, Practice and Theory.   Any update will be posted on the ICCCN-07 Web site.  Please contact the Workshop Co-Chair   L.Guan@lboro.ac.uk   with any questions.

Paper Submission:  EDAS (http://edas.info/)  Deadline: April 1, 2007&lt;div class="blogger-post-footer"&gt;Current information and news on high perfomance computer and the tools&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/19694363-3968275481347372027?l=laksono.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='related' href='http://nets-www.lboro.ac.uk/pmect07/' title='International Workshop on Performance Modeling and Evaluation 2007'/><link rel='replies' type='application/atom+xml' href='http://laksono.blogspot.com/feeds/3968275481347372027/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=19694363&amp;postID=3968275481347372027' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/19694363/posts/default/3968275481347372027'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/19694363/posts/default/3968275481347372027'/><link rel='alternate' type='text/html' href='http://laksono.blogspot.com/2007/03/international-workshop-on-performance.html' title='International Workshop on Performance Modeling and Evaluation 2007'/><author><name>Laksono Adhianto</name><uri>http://www.blogger.com/profile/15297127601614301975</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-19694363.post-113648819673659611</id><published>2006-01-05T13:03:00.000-06:00</published><updated>2006-01-05T14:01:43.280-06:00</updated><title type='text'>How to time manually</title><content type='html'>Although there is some profilers such as gprof and perfsuite, we can write our own time measurement. 
The &lt;a href="http://www.hpcx.ac.uk/support/documentation/UserGuide/HPCxuser/HPCxuser.html"&gt;HPCx user guide&lt;/a&gt; from HPCx website and &lt;a href="http://www.ncsa.uiuc.edu/UserInfo/Resources/Hardware/IA64LinuxCluster/Doc/timing.html"&gt;Timing and Profiling&lt;/a&gt; NCSA website a very interesting way to write portable timers. Some of them:

In Fortran 9x:
&lt;pre&gt;  integer :: clock0, clock1, clockmax, clockrate, ticks
  real    :: secs
  call system_clock(count_max=clockmax, count_rate=clockrate)
  call system_clock(clock0)

  ! code to be timed

  call system_clock(clock1)

  ticks = clock1-clock0
  if (ticks &lt; 0) ticks = ticks + clockmax   ! counter wrapped around; adding clockmax unconditionally could overflow
  secs = real(ticks)/real(clockrate)
  write(*,*) 'Code took ', secs, ' seconds'
&lt;/pre&gt;
Using MPI_Wtime:
&lt;pre&gt;
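  ! assumes the MPI module is available (use mpi, or include 'mpif.h')
  ! and that MPI_Init has already been called
  DOUBLE PRECISION :: start, finish
  start = MPI_Wtime()
 
  ! code to be timed
 
  finish = MPI_Wtime()
  print*,'That took ',finish-start,' seconds'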
&lt;/pre&gt;
In C:
&lt;pre&gt;
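#include &lt;stdio.h&gt;
#include &lt;stdlib.h&gt;
#include &lt;sys/time.h&gt;

int main(void)
{
    struct timeval *Tps, *Tpf;   /* start and finish timestamps */
    void *Tzp = 0;               /* time zone argument, unused */
    Tps = (struct timeval*) malloc(sizeof(struct timeval));
    Tpf = (struct timeval*) malloc(sizeof(struct timeval));
    gettimeofday (Tps, Tzp);
    /* code to be timed */
    gettimeofday (Tpf, Tzp);
    /* elapsed wall-clock time in microseconds */
    printf("Total Time (usec): %ld\n",
           (long) (Tpf-&gt;tv_sec - Tps-&gt;tv_sec) * 1000000L
                 + (long) (Tpf-&gt;tv_usec - Tps-&gt;tv_usec));
    free(Tps);
    free(Tpf);
    return 0;
}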
&lt;/pre&gt;&lt;div class="blogger-post-footer"&gt;Current information and news on high perfomance computer and the tools&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/19694363-113648819673659611?l=laksono.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://laksono.blogspot.com/feeds/113648819673659611/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=19694363&amp;postID=113648819673659611' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/19694363/posts/default/113648819673659611'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/19694363/posts/default/113648819673659611'/><link rel='alternate' type='text/html' href='http://laksono.blogspot.com/2006/01/how-to-time-manually.html' title='How to time manually'/><author><name>Laksono Adhianto</name><uri>http://www.blogger.com/profile/15297127601614301975</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-19694363.post-113502290254701841</id><published>2005-12-19T14:07:00.000-06:00</published><updated>2005-12-19T14:19:31.353-06:00</updated><title type='text'>Week 50th 2005 (12/11-12/17)</title><content type='html'>&lt;ul&gt;   &lt;li&gt;Investigate on MPI and hybrid MPI+OpenMP measurements on Sphinx. There are mainly two issues: measurements implemented by Bronis, and the original implementation from SKaMPI. The former is more reasonable and make sense, While the later is somewhat "strange" and need further investigation to understand. 
In summary, measuring MPI is not very different from measuring OpenMP: we still need auxiliary tests.&lt;/li&gt;   &lt;li&gt;Develop a program to retrieve call graph information from &lt;a href="http://www.gnu.org/software/binutils/manual/gprof-2.9.1/html_mono/gprof.html"&gt;gprof&lt;/a&gt;. The tool is developed using Tcl/Tk for the user interface, and VCG to display a &lt;a href="http://rw4.cs.uni-sb.de/users/sander/html/gsvcg1.html"&gt;graph&lt;/a&gt;. The tool can filter out some routines so that we display only the part of the call graph we are interested in. This tool helps me to understand how Sphinx works, and which routines consume the most time in a program (in this case I used &lt;a href="http://www.llnl.gov/asci_benchmarks/asci/limited/sweep3d/"&gt;Sweep3D&lt;/a&gt;).&lt;/li&gt;   &lt;li&gt;Write the DOE report.&lt;/li&gt;   &lt;li&gt;Retouch the thesis. Make some corrections.
 &lt;/li&gt; &lt;/ul&gt;&lt;div class="blogger-post-footer"&gt;Current information and news on high perfomance computer and the tools&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/19694363-113502290254701841?l=laksono.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://laksono.blogspot.com/feeds/113502290254701841/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=19694363&amp;postID=113502290254701841' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/19694363/posts/default/113502290254701841'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/19694363/posts/default/113502290254701841'/><link rel='alternate' type='text/html' href='http://laksono.blogspot.com/2005/12/week-50th-2005-1211-1217.html' title='Week 50th 2005 (12/11-12/17)'/><author><name>Laksono Adhianto</name><uri>http://www.blogger.com/profile/15297127601614301975</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-19694363.post-113440497346904688</id><published>2005-12-12T10:29:00.000-06:00</published><updated>2005-12-19T14:07:31.693-06:00</updated><title type='text'>Computer and personal stuff</title><content type='html'>I found an interesting blog about computer and stuff: http://adhianto.blogspot.com/
This blog apparently discuss more on tips and tricks using linux, latex and networking.&lt;div class="blogger-post-footer"&gt;Current information and news on high perfomance computer and the tools&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/19694363-113440497346904688?l=laksono.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://laksono.blogspot.com/feeds/113440497346904688/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=19694363&amp;postID=113440497346904688' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/19694363/posts/default/113440497346904688'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/19694363/posts/default/113440497346904688'/><link rel='alternate' type='text/html' href='http://laksono.blogspot.com/2005/12/computer-and-personal-stuff.html' title='Computer and personal stuff'/><author><name>Laksono Adhianto</name><uri>http://www.blogger.com/profile/15297127601614301975</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-19694363.post-113438700150640459</id><published>2005-12-12T05:23:00.000-06:00</published><updated>2005-12-12T05:47:16.650-06:00</updated><title type='text'>Week 49th 2005 (12/5-12/11)</title><content type='html'>&lt;ul&gt;
&lt;li&gt;Goal: to be able to work with perfsuite
   &lt;ul&gt;&lt;li&gt;Able to use psinv and psrun&lt;/li&gt;   &lt;li&gt;Unable to run psprocess. Problem: unable to integrate with tDOM&lt;/li&gt;  &lt;/ul&gt; &lt;/li&gt; &lt;li&gt;Investigate MPI measurement in Sphinx&lt;/li&gt; &lt;li&gt;Investigate matrix multiplication and &lt;a href="http://www.llnl.gov/asci_benchmarks/asci/limited/sweep3d/"&gt;SWEEP3D&lt;/a&gt; with MPI+OpenMP
   &lt;ul&gt;&lt;li&gt; communication cost is extremely small (0.0xxxx) compared to computation (42)&lt;/li&gt;   &lt;li&gt; static vs dynamic vs guided is comparable and no many differences&lt;/li&gt;  &lt;/ul&gt; &lt;/li&gt; &lt;li&gt; Need to investigate critical measurements. The result is always negative&lt;/li&gt; &lt;li&gt; Discussed with rick kuffrin about perfsuite problem: &lt;b&gt;Perfsuite does not support linux kernel 2.6&lt;/b&gt;. A new release to support this kernel will be available this month&lt;/li&gt;&lt;/ul&gt;&lt;div class="blogger-post-footer"&gt;Current information and news on high perfomance computer and the tools&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/19694363-113438700150640459?l=laksono.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://laksono.blogspot.com/feeds/113438700150640459/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=19694363&amp;postID=113438700150640459' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/19694363/posts/default/113438700150640459'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/19694363/posts/default/113438700150640459'/><link rel='alternate' type='text/html' href='http://laksono.blogspot.com/2005/12/week-49th-2005-125-1211.html' title='Week 49th 2005 (12/5-12/11)'/><author><name>Laksono Adhianto</name><uri>http://www.blogger.com/profile/15297127601614301975</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-19694363.post-113414895604906360</id><published>2005-12-09T11:21:00.000-06:00</published><updated>2005-12-09T13:05:42.596-06:00</updated><title 
type='text'>Pointers to some tutorials</title><content type='html'>&lt;p&gt;
&lt;a href="http://www.funet.fi/~magi/opinnot/mpi/"&gt;High Performance Computing course&lt;/a&gt; at &lt;a href="http://www.tucs.fi/about/"&gt;Turku Centre for Computer Science&lt;/a&gt; (TUCS) has interesting information on parallel computing. They provide examples for parallel heat transfer simulation using a 2-dimensional finite difference Gauss-Seidel successive over-relaxation (SOR) scheme, One-dimensional finite difference calculation and Mandebrot set visualizer and N-body simulation.
&lt;/p&gt;
&lt;p&gt;&lt;a href="http://scv.bu.edu/"&gt;Scientific Computing and Visualization&lt;/a&gt; from &lt;a href="http://www.bu.edu/"&gt;Boston University&lt;/a&gt; has an excellent tutorial on MPI in &lt;a href="http://scv.bu.edu/SCV/Tutorials/MPI/"&gt;Course 3085: Multiprocessing by Message Passing MPI&lt;/a&gt;. This course includes examples of &lt;a href="http://scv.bu.edu/SCV/Tutorials/MPI/MPIcodes.zip"&gt;MPI Numerical Integration&lt;/a&gt; both in C and Fortran, &lt;a href="http://scv.bu.edu/SCV/Tutorials/MPI/alliance/apply/solvers/"&gt;iterative solvers&lt;/a&gt; using &lt;a href="http://scv.bu.edu/SCV/Tutorials/MPI/alliance/apply/solvers/relax.zip"&gt;SOR relaxation&lt;/a&gt; and &lt;a href="http://scv.bu.edu/SCV/Tutorials/MPI/alliance/apply/transpose/"&gt;matrix transpose&lt;/a&gt;
&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;Current information and news on high perfomance computer and the tools&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/19694363-113414895604906360?l=laksono.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://laksono.blogspot.com/feeds/113414895604906360/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=19694363&amp;postID=113414895604906360' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/19694363/posts/default/113414895604906360'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/19694363/posts/default/113414895604906360'/><link rel='alternate' type='text/html' href='http://laksono.blogspot.com/2005/12/pointers-to-some-tutorials.html' title='Pointers to some tutorials'/><author><name>Laksono Adhianto</name><uri>http://www.blogger.com/profile/15297127601614301975</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-19694363.post-113407178225006228</id><published>2005-12-08T13:55:00.000-06:00</published><updated>2005-12-19T14:17:39.740-06:00</updated><title type='text'>OpenMP for cluster computers</title><content type='html'>&lt;h3&gt;Introduction&lt;/h3&gt;OpenMP is a easy, simple but powerful programming model for shared memory model such as SMP and cc-NUMA machines. OpenMP program can be run on cluster computers if and only if on top of software distributed shared memory such as Treadmarks and SCASH. However using SDSM can be very costly due to high overhead. 
Translating to a distributed memory programming model such as the Message Passing Interface (MPI) is then more suitable and advantageous. This article describes some work on this issue.
&lt;h3&gt;Translating OpenMP to MPI&lt;/h3&gt;To the best of my knowledge, the only work on automatic translation from OpenMP to MPI is [Basumallik2005]. They use a concept called &lt;i&gt;partial replication&lt;/i&gt;. Here is the key idea:
&lt;blockquote&gt;HPF-like approaches emphasize data distribution: data partitioning, computation partitioning and message insertion. In this case, computation is performed by the owner of the data (owner-computes rule).
SDSM takes a different approach. It replicates all shared program data and also allocates management data structures (e.g. shadow copies of shared data) on all CPUs.
The [Basumallik2005] scheme replicates shared data on all processors, but allocates no management structures. In the future, arrays with fully regular accesses will be distributed among processors.
&lt;/blockquote&gt;The translation technique also considers irregular data accesses, overlapping communication with computation, and recognition of transformed reduction idioms.
&lt;p&gt;&lt;/p&gt;[Basumallik2005] experiments with seven OpenMP programs, the NAS (CG, IS, EP, FT and LU) and SPEC OMP (ART and EQUAKE) benchmarks, on Pentium III clusters and an IBM SP. The result is very impressive: although the partial replication technique is less data-scalable than a data distribution scheme, it still allows much larger data sets. The paper also mentions that their translation technique outperforms both HPF programs and SDSM programs.
&lt;h3&gt;Application&lt;/h3&gt; &lt;p&gt;[Berger2005] proposes a different strategy to implement a hybrid MPI+OpenMP application. Instead of introducing OpenMP directives into an MPI application, it converts an incremental OpenMP parallelization into MPI.
The application is a flow solver for the inviscid steady-state Euler equations on multi-level Cartesian grids with embedded boundaries. It uses space-filling curves to provide a one-dimensional ordering of a three-dimensional mesh. Using a general-purpose partitioner such as METIS in this case would be too expensive.
Machine used: SGI Origin 3600
Result:
- The OpenMP implementation is marginally faster than MPI thanks to memory placement and locality.
- MPI overhead becomes greater for more than 320 CPUs
&lt;/p&gt;&lt;h3&gt;References&lt;/h3&gt;&lt;ul&gt;&lt;li&gt;[Basumallik2005] Basumallik, A. and Eigenmann, R. 2005. Towards automatic translation of OpenMP to MPI. In Proceedings of the 19th Annual international Conference on Supercomputing (Cambridge, Massachusetts, June 20 - 22, 2005). ICS '05. ACM Press, New York, NY, 189-198. DOI= http://doi.acm.org/10.1145/1088149.1088174
&lt;/li&gt;&lt;li&gt;[Berger2005] Berger, M. J., Aftosmis, M. J., Marshall, D. D., and Murman, S. M. 2005. Performance of a new CFD flow solver using a hybrid programming paradigm. J. Parallel Distrib. Comput. 65, 4 (Apr. 2005), 414-423. DOI= http://dx.doi.org/10.1016/j.jpdc.2004.11.010&lt;/li&gt;&lt;/ul&gt;&lt;div class="blogger-post-footer"&gt;Current information and news on high perfomance computer and the tools&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/19694363-113407178225006228?l=laksono.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://laksono.blogspot.com/feeds/113407178225006228/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=19694363&amp;postID=113407178225006228' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/19694363/posts/default/113407178225006228'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/19694363/posts/default/113407178225006228'/><link rel='alternate' type='text/html' href='http://laksono.blogspot.com/2005/12/openmp-for-cluster-computers.html' title='OpenMP for cluster computers'/><author><name>Laksono Adhianto</name><uri>http://www.blogger.com/profile/15297127601614301975</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-19694363.post-113406443030483893</id><published>2005-12-08T11:25:00.000-06:00</published><updated>2006-01-06T07:58:42.323-06:00</updated><title type='text'>Some experiences on hybrid MPI+OpenMP</title><content type='html'>&lt;p&gt;Current high performance computers are multiple node architecture where each node is a shared memory system (or also known symmetric 
multi-processor or SMP). This hierarchical architecture encourages a hierarchical parallel programming model such as hybrid MPI and OpenMP.
&lt;/p&gt; &lt;h3&gt;Coarse grain MPI + fine grain OpenMP&lt;/h3&gt; &lt;p&gt;[Majumdar2000] compares 3 different programming models: Tera Multi-Threaded Architecture (MTA), MPI and hybrid MPI+OpenMP. The hybrid implementation adopts a simple hybrid strategy. That is, it inserts OpenMP loop-level parallelization on the outermost loop, and an MPI reduction outside the parallel region to maintain consistency among nodes. From this experiment, we see nearly linear speedup for all three programming paradigms.&lt;/p&gt;[Tai2005] works on the parallelization of a multigrid solver by a multigrid domain decomposition approach (MG-DD), using the SPMD and MIMD programming paradigms (although it appears to be just a simple data decomposition instead of "real SPMD"). It proposes two strategies: one is a one-level parallelization using geometric domain decomposition; the other is a two-level parallelization that combines geometric domain decomposition using MPI with data decomposition using OpenMP. METIS is employed to produce the grid partitions.
&lt;p&gt;Result: the hybrid model scales more poorly, because less and less time is spent on computation and more and more on communication.
&lt;/p&gt; &lt;p&gt;A similar result is also reported by [Mahinthakumar2002]. Although the authors use different optimization strategies, especially at the OpenMP loop level (such as loop interchange, fusion, loop restructuring and thread decomposition), the pure MPI version still outperforms the hybrid version. One cause of the poor hybrid performance is OpenMP scalability: the OpenMP implementation is based on loop-level parallelism and can cause false sharing. Another cause is poor MPI-OpenMP interaction, where only the master thread performs communication while the others are idle.
An attempted improvement pushes the parallelism to the outer loop. This is reported to have improved both OpenMP and MPI performance.
&lt;/p&gt; &lt;h3&gt;Coarse grain MPI + coarse grain OpenMP&lt;/h3&gt;[Oliker2002] investigates the effects of various ordering and partitioning strategies on the performance of CG (Aztec) and PCG (ILU BlockSolve95) with three different partitioners (METIS, SAW and RCM) under four different programming models: MPI, OpenMP, hybrid MPI+OpenMP and MTA. The machines used are the Cray T3E, SGI Origin 2000, IBM SP (clustered SMPs) and Cray MTA.
&lt;p&gt;Model:
&lt;/p&gt;&lt;blockquote&gt;T = Tf + Tm + Tc
Where:
T is the total time
Tf time to perform floating-point operations
Tm time to service the cache misses
Tc time to communicate the x vector
&lt;/blockquote&gt;Remark: this model may not be accurate since we cannot explicitly separate cache misses from arithmetic operations. But for a rough estimate, it is acceptable.
&lt;p&gt;&lt;/p&gt;What we can learn:
&lt;ul&gt;&lt;li&gt; METIS transfers the least amount of data, whereas RCM has the fewest messages&lt;/li&gt;&lt;li&gt; Intelligent ordering schemes are extremely important for efficient sparse matrix computations regardless of whether the programming paradigm is OpenMP, MPI or hybrid.&lt;/li&gt;   &lt;li&gt; The hybrid implementation offers no noticeable advantage, and pure MPI is a more effective strategy&lt;/li&gt; &lt;li&gt; RCM and SAW are faster than METIS due to better cache reuse&lt;/li&gt;&lt;/ul&gt;
&lt;h3&gt;Translation into hybrid MPI+OpenMP&lt;/h3&gt;&lt;p&gt;An interesting approach is taken by [Benkner2003]: translating from HPF (with additional extensions) into hybrid MPI and OpenMP. The proposed extensions are necessary to control data distribution across nodes (inter-node) and within a node (intra-node). They experimented with 2 applications: a kernel from a crash simulation and a kernel from a numerical pricing module. Both kernels are based on an iterative computational scheme with an outer time-step loop. The result is somewhat mixed: the hybrid version of the financial optimization kernel has much better speedup than the pure MPI version, while the car crash simulation shows similar scalability for both MPI and hybrid. The difference comes from the number of messages and the memory required by array replication. In the finance kernel, pure MPI sends 4 times as many messages and has more replicated arrays than the hybrid version.
&lt;/p&gt;
&lt;h3&gt;References&lt;/h3&gt;&lt;ul&gt;&lt;li&gt;[Benkner2003] Siegfried Benkner, Viera Sipková: Exploiting Distributed-Memory and Shared-Memory Parallelism on Clusters of SMPs with Data Parallel Programs. International Journal of Parallel Programming 31(1): 3-19 (2003)&lt;/li&gt;   &lt;li&gt;[Mahinthakumar2002] Mahinthakumar, G, Saied, F. A Hybrid MPI-OpenMP Implementation of an Implicit Finite-Element Code on Parallel Architectures. International Journal of High Performance Computing Applications 2002 16: 371-393
  &lt;/li&gt; &lt;li&gt;[Majumdar2000] Amitava Majumdar: Parallel Performance Study of Monte Carlo Photon Transport Code on Shared-, Distributed-, and Distributed-Shared-Memory Architectures. IPDPS 2000: 93-
&lt;/li&gt;&lt;li&gt;[Oliker2002] Oliker, L., Li, X., Husbands, P., and Biswas, R. 2002. Effects of Ordering Strategies and Programming Paradigms on Sparse Matrix Computations. SIAM Rev. 44, 3 (Mar. 2002), 373-393. DOI= http://dx.doi.org/10.1137/S00361445003820
&lt;/li&gt;&lt;li&gt;[Tai2005] C.H. Tai, Y. Zhao and K.M. Liew, Parallel-multigrid computation of unsteady incompressible viscous flows using a matrix-free implicit method and high-resolution characteristics-based scheme, Computer Methods in Applied Mechanics and Engineering, Volume 194, Issues 36-38, 23 September 2005, Pages 3949-3983.
(http://www.sciencedirect.com/science/article/B6V29-4DW3937-3/2/fb22d5fa52e6791f816d90a770f9e5da)
&lt;/li&gt;&lt;/ul&gt;&lt;div class="blogger-post-footer"&gt;Current information and news on high perfomance computer and the tools&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/19694363-113406443030483893?l=laksono.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://laksono.blogspot.com/feeds/113406443030483893/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=19694363&amp;postID=113406443030483893' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/19694363/posts/default/113406443030483893'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/19694363/posts/default/113406443030483893'/><link rel='alternate' type='text/html' href='http://laksono.blogspot.com/2005/12/some-experiences-on-hybrid-mpiopenmp.html' title='Some experiences on hybrid MPI+OpenMP'/><author><name>Laksono Adhianto</name><uri>http://www.blogger.com/profile/15297127601614301975</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-19694363.post-113406120562999701</id><published>2005-12-08T10:36:00.000-06:00</published><updated>2005-12-22T04:25:51.986-06:00</updated><title type='text'>Software for HPC</title><content type='html'>&lt;h3&gt;Integrated Development Environment (IDE)&lt;/h3&gt;We can cite &lt;a href="http://www.eclipse.org/"&gt;Eclipse&lt;/a&gt; as a powerful and extensible multi-platform IDE. Currently, there is a project to support parallel programming named &lt;a href="http://www.eclipse.org/ptp/index.html"&gt;Parallel Tools Platform&lt;/a&gt; (or PTP). Originally PTP more focus on MPI. 
In the latest news from Supercomputing, Beth Tibbitts reported that IBM is also developing a plugin for OpenMP, and it appears that it will be integrated into the PTP plugin.
&lt;h3&gt;Performance Tools&lt;/h3&gt;PerfSuite [&lt;a href="http://perfsuite.sourceforge.net/"&gt;1&lt;/a&gt;,&lt;a href="http://perfsuite.ncsa.uiuc.edu/"&gt;2&lt;/a&gt;] is a collection of performance analysis software that can assist with a variety of methods and techniques useful for application software optimization.

The current release provides support for analysis with hardware performance counters on Linux with Intel and AMD-based systems (x86, x86-64, and ia64).

Support for the 2.6 kernel is not yet available in any released version of PerfSuite. According to Rick Kuffrin (via email conversation), this kernel should be supported in the next release in December 2005.

Perhaps one of the best-known profilers is &lt;a href="http://www.gnu.org/software/binutils/manual/gprof-2.9.1/"&gt;gprof&lt;/a&gt;. gprof is a statistical profiler; it does not produce an event trace. As a statistical profiler, gprof is suitable for measuring frequently called routines. Another similar tool is &lt;a href="http://oprofile.sourceforge.net/"&gt;oprofile&lt;/a&gt;. In contrast to gprof, oprofile does not require recompiling the source code, nor patching the kernel (as perfsuite does). One disadvantage of oprofile is that it acts as a monitor, so you need to run it like a "daemon", and on most machines normal users cannot start or shut down a daemon. This means oprofile is more suitable for root users.

&lt;a href="http://sourceware.org/systemtap/"&gt;Systemtap&lt;/a&gt; has very similar approach to oprofile. It is intended to spy the kernel events and report it as does Sun's dprobes.&lt;div class="blogger-post-footer"&gt;Current information and news on high perfomance computer and the tools&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/19694363-113406120562999701?l=laksono.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://laksono.blogspot.com/feeds/113406120562999701/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=19694363&amp;postID=113406120562999701' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/19694363/posts/default/113406120562999701'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/19694363/posts/default/113406120562999701'/><link rel='alternate' type='text/html' href='http://laksono.blogspot.com/2005/12/software-for-hpc.html' title='Software for HPC'/><author><name>Laksono Adhianto</name><uri>http://www.blogger.com/profile/15297127601614301975</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry></feed>
