Monday, December 05, 2011

Coarray Fortran 2.0 at Supercomputing 2011

Taken from Rice University news (http://compsci.rice.edu/news.cfm?doc_id=14042):
In a competition held at SC11 (the International Conference for High Performance Computing, Networking, Storage, and Analysis) in Seattle, Washington, in November 2011, Rice University's Coarray Fortran (CAF) 2.0 language, compiler, and runtime system received an award as one of three finalists in the HPC Challenge Class II Awards Competition. The CAF 2.0 project is led by Professor John Mellor-Crummey.

At SC11, Dr. Laksono Adhianto delivered a presentation about CAF 2.0, a set of language extensions that support scientific programming on scalable parallel systems, in a session devoted to the HPC Challenge Awards Competition. The presentation described the implementation and performance of several CAF 2.0 programs that solve problems in the HPC Challenge Benchmark suite. These programs were run on up to 8,192 cores of several Department of Energy supercomputers. Other members of the Rice Coarray Fortran 2.0 project team include Guohua Jin, Mark Krentel, Karthik Murthy, Dung Nguyen, William Scherer III, Scott Warren, and Chaoran Yang.

Friday, April 15, 2011

Exascale programming: Won't be and will be

There is a very interesting article in HPCwire written by Michael Wolfe (PGI) on the emergence of exascale programming: http://www.hpcwire.com/features/Compilers-and-More-Programming-at-Exascale-117593783.html?viewAll=y

Here are some interesting points we can learn from it:

Exascale Programming: What It Won't Be
  • It's not a library: it hides the algorithm, data structures, and performance aspects
  • It's not a C++ class hierarchy or template library: too much abstraction.
  • It's not a domain-specific language: DSLs don't have a large enough user community or enough support.
  • It's not OpenCL: very low level, too complicated.
  • It's not a whole new language: most programmers avoid adopting a new language for fear that it will die.
  • It's not easy: the idea of making parallel programming easy is silly (?)
  • It's not just parallelism: the key isn't parallelism, it's performance
Exascale Programming: What It Is
  • It effectively supports all levels of parallelism, from node parallelism down to vector and pipeline parallelism.
  • It can map an expression of program parallelism (a parallel loop, say) to different levels of hardware parallelism (across nodes, or to a vector unit) depending on the target.
  • It supports the programmer with lots of feedback.
  • It supports dynamic parallelism, creating parallel tasks and threads when needed.
  • It efficiently composes abstract operations.
  • It is self-balancing and self-tuning.
  • It must be resilient.
I agree with most of the above points, except the claim that exascale programming won't be easy. Yes, effective parallel programming is not easy, and easy parallel programming is not effective. But this does not mean there is no way to be both easy and effective. I think the key is productivity: it is okay to be less efficient as long as the performance is tolerable.

Thursday, November 15, 2007

Creating a customized toolbar in an Eclipse view

By default, an Eclipse view's toolbar appears on the right side of the view, as shown in the image below. This is fine when the actions are not visually associated with any column. However, once we need to associate a button or tool item with a table column or column header, we need to place the toolbar as close as possible to that column. Here is the trick: design a layout, preferably a GridLayout, and place the toolbar above the tree view or table view.
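    import org.eclipse.jface.viewers.TreeViewer;
    import org.eclipse.swt.SWT;
    import org.eclipse.swt.graphics.Point;
    import org.eclipse.swt.layout.GridData;
    import org.eclipse.swt.layout.GridLayout;
    import org.eclipse.swt.widgets.Composite;
    import org.eclipse.swt.widgets.CoolBar;
    import org.eclipse.swt.widgets.CoolItem;
    import org.eclipse.swt.widgets.ToolBar;
    import org.eclipse.swt.widgets.ToolItem;

    private Composite createCoolBar(Composite aParent) {
        // one-column grid: the cool bar takes row 1, the tree (added later) row 2
        GridLayout grid = new GridLayout(1, false);
        aParent.setLayout(grid);
        CoolBar coolBar = new CoolBar(aParent, SWT.FLAT);
        GridData data = new GridData(GridData.FILL_HORIZONTAL);
        coolBar.setLayoutData(data);
        // prepare the toolbar
        ToolBar toolbar = new ToolBar(coolBar, SWT.FLAT);

        // ------------- prepare the items
        // flatten
        ToolItem tiFlatten = new ToolItem(toolbar, SWT.PUSH);
        tiFlatten.setToolTipText("Flatten the node");
        ....
        // a CoolBar only displays controls hosted by a CoolItem,
        // so wrap the toolbar in one and size it to fit its items
        CoolItem coolItem = new CoolItem(coolBar, SWT.NONE);
        coolItem.setControl(toolbar);
        Point size = toolbar.computeSize(SWT.DEFAULT, SWT.DEFAULT);
        coolItem.setSize(coolItem.computeSize(size.x, size.y));
        return aParent; // no new composite: reuse the parent
    }

    public void createPartControl(Composite aParent) {
        Composite parent = this.createCoolBar(aParent);
        treeViewer = new TreeViewer(parent,
                SWT.SINGLE | SWT.FULL_SELECTION | SWT.BORDER);
        ....
        // tricky: FILL_BOTH is needed so the tree expands to fill the view
        GridData data = new GridData(GridData.FILL_BOTH);
        treeViewer.getTree().setLayoutData(data);
    }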
 
The result is shown below:

Monday, November 05, 2007

Eclipse RCP: Is it really easy?

Eclipse RCP is supposed to be easy and to help programmers avoid reinventing the wheel. Unfortunately, its design, which aims to be as general as possible, makes it hard to understand because of its rich feature set and complexity. For simple RCP applications, such as the samples and snippets that can be checked out from Eclipse's repository, Eclipse is fun and easy. This holds at the beginning, but as time goes on, programmers face a very complex and unintuitive way of building a rich application.

However, after searching the Internet for a complete tutorial, I found a very interesting page by Lars Vogel that describes step-by-step programming with Eclipse RCP from a feature perspective. The article is accessible at this URL: http://www.vogella.de/articles/RichClientPlatform/article.html

This article, although not very deep, covers the functionality that most programmers need. A highly recommended read for those who have no time to read books.

Wednesday, September 26, 2007

cafc: Co-array Fortran Compiler from Rice

Rice has its own Co-array Fortran source-to-source compiler named cafc. The source code can be downloaded from http://www.hipersoft.rice.edu/caf/download.html together with Open64/SL. It also needs third-party libraries: ARMCI+MPI or GASNet.

The libf90caf source code files are:

#----------------------------------------------------------------------
# List of Source files
#----------------------------------------------------------------------
COMMON_COM_TARG_SRC =     \
  c_a_to_q.c        \
  config_host.c     \
  config_platform.c 

CFILES=  \
  $(COMMON_COM_SRC) \
  $(COMMON_COM_TARG_SRC)

COMMON_COM_CXX_SRC =  \
  config.cxx  \
  config_elf_targ.cxx \
  const.cxx \
  cxx_memory.cxx \
  dwarf_DST.cxx \
  dwarf_DST_dump.cxx  \
  dwarf_DST_mem.cxx \
  glob.cxx  \
  ir_bcom.cxx \
  ir_bread.cxx  \
  ir_bwrite.cxx \
  ir_reader.cxx \
  irbdata.cxx \
  mtypes.cxx  \
  opcode.cxx  \
  opcode_core.cxx \
  pu_info.cxx \
  quadop.cxx      \
  strtab.cxx  \
  symtab.cxx  \
  symtab_verify.cxx \
  wn.cxx    \
  wn_map.cxx  \
  wn_pragmas.cxx  \
  wn_simp.cxx \
  wn_util.cxx \
  wutil.cxx \
  xstats.cxx  \
  upc_symtab_utils.cxx  \
  upc_wn_util.cxx \
  intrn_info.cxx



COMMON_COM_TARG_CXX_SRCS =  \
  config_targ.cxx \
  targ_const.cxx  \
  targ_sim.cxx

CXXFILES =                 \
  dummy-defines.cxx            \
  $(COMMON_COM_CXX_SRC)        \
  $(COMMON_COM_TARG_CXX_SRCS)
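Pretty weird, isn't it? The list is mostly generic Open64 compiler infrastructure (symbol tables, WHIRL IR readers and writers, even UPC utilities) rather than anything co-array specific.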

Tuesday, September 25, 2007

Co-Array Fortran (CAF)

CAF is one of the three partitioned global address space (PGAS) languages. The other two are Unified Parallel C (UPC) and Titanium.

CAF is an SPMD-based programming model created by Robert Numrich and John Reid (Numrich was at Cray Research; Reid was at the Rutherford Appleton Laboratory in the UK). The advantage of CAF is its simplicity compared to MPI, while it still gives users almost the same flexibility and control. Furthermore, CAF is based on Fortran 90, which makes it easy for Fortran users to learn. However, CAF does not seem to support dynamic processes: the number of images is fixed when the program starts, so users cannot spawn new processes or shrink the set. Another disadvantage is that remote accesses are still explicit copies between images (a form of "message" communication), so in theory it offers no performance gain on multicore processors.

Rules:

– Normal round brackets ( ) refer to data in local memory.
– Square brackets [ ] refer to data in remote memory (the memory of another image).

Examples (taken from [Numrich03]):

Variable declarations:

real :: s[*]
real :: a(n)[*]
complex :: z[*]
integer :: index(n)[*]
real :: b(n)[p, *]
real :: c(n,m)[0:p, -7:q, 11:*]
real, allocatable :: w(:)[:]
type(field) :: maxwell[p,*]

Communication

x(:)[q] = x(:) + x(:)[p]
This means that the array x on image q receives the sum of the local x (on the image executing the statement) and the x on image p.
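Because p and q may both differ from the executing image, that one line can involve a remote read and a remote write. The C sketch below models the semantics serially, under stated assumptions: each image's copy of x is one row of an ordinary 2D array, images are numbered from 0 (CAF numbers them from 1), and the names `NIMAGES`, `N`, `coarray_x` and `caf_statement` are hypothetical. It illustrates the meaning of the statement only; it is not how cafc or any CAF runtime implements it.

```c
#include <assert.h>

/* Hypothetical sizes for this model: 4 images, each holding x(1:8). */
#define NIMAGES 4
#define N 8

/* Row i models the copy of co-array x in image i's memory,
   so coarray_x[i][k] plays the role of x(k+1)[i+1]. */
static double coarray_x[NIMAGES][N];

/* Models x(:)[q] = x(:) + x(:)[p] executed by image `me`:
   x(:) is the local row `me`, x(:)[p] is a remote read of row p,
   and the assignment is a remote write into row q. */
void caf_statement(int me, int p, int q)
{
    double tmp[N];

    /* an out-of-range co-index is an error in CAF */
    assert(me >= 0 && me < NIMAGES);
    assert(p >= 0 && p < NIMAGES);
    assert(q >= 0 && q < NIMAGES);

    /* evaluate the whole right-hand side first, as Fortran requires */
    for (int k = 0; k < N; k++)
        tmp[k] = coarray_x[me][k] + coarray_x[p][k];

    /* then store the result into image q's copy */
    for (int k = 0; k < N; k++)
        coarray_x[q][k] = tmp[k];
}
```

The temporary buffer matters when q is the executing image or image p: Fortran array assignment evaluates the entire right-hand side before any element is stored.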

A real example (matrix multiplication)

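real, dimension(n,n)[p,*] :: a, b, c
! myP, myQ: this image's co-indices (assumes a p x p image grid and c zeroed)
myP = this_image(c, 1)
myQ = this_image(c, 2)
do j = 1, n
  do i = 1, n
    do k = 1, n
      do q = 1, p
        c(i,j)[myP,myQ] = c(i,j)[myP,myQ] &
                        + a(i,k)[myP,q] * b(k,j)[q,myQ]
      enddo
    enddo
  enddo
enddo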

References

Friday, September 21, 2007

Cilk

Cilk [Blumofe95] is a coarse-grain parallel programming model based on C. It is a faithful extension of C for multithreading that uses asynchronous parallelism and an efficient work-stealing scheduler. Cilk is currently limited to shared-memory architectures, although some work on distributed memory has been published [Nikhil95].

Cilk provides several advantages, as listed in [Kuszmaul07]: it is simple and small, it can reuse the best serial code, it can express cache-efficient code, it is efficient (productive), it performs well on one processor, and it is portable.

However, it also has some limitations, such as no direct support for fine-grained parallelism, limited synchronization flexibility, no support for groups (teams) of threads, and no support for other languages such as Fortran and Java.

One famous Cilk example is the Fibonacci function. The original C version:
int fib(int n) {
    if (n < 2) return n;
    else {
        int n1, n2;
        n1 = fib(n-1);
        n2 = fib(n-2);
        return (n1 + n2);
    }
}
The Cilk version:
cilk int fib(int n) {
    if (n < 2) return n;
    else {
        int n1, n2;
        n1 = spawn fib(n-1);
        n2 = spawn fib(n-2);
        sync;   /* wait for both spawned children before using n1 and n2 */
        return (n1 + n2);
    }
}
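The claim that Cilk is a faithful extension of C can be checked directly: deleting the Cilk keywords from a Cilk program (its serial elision, also called the C elision) leaves an ordinary C program with the same meaning. The sketch below emulates the elision with empty preprocessor macros; this macro trick is our own illustration, not something shipped with Cilk, and the `assert` precondition is an addition to the original fib.

```c
#include <assert.h>

/* Serial elision: with these empty macros the Cilk keywords vanish,
   so a plain C compiler accepts the Cilk source below. */
#define cilk
#define spawn
#define sync

/* The Cilk fib with an added precondition check. After preprocessing,
   each `spawn` call is an ordinary call and `sync;` is an empty statement. */
cilk int fib(int n) {
    assert(n >= 0);          /* fib is only defined here for n >= 0 */
    if (n < 2) return n;
    else {
        int n1, n2;
        n1 = spawn fib(n-1);
        n2 = spawn fib(n-2);
        sync;
        return (n1 + n2);
    }
}

/* drop the macros so they cannot clash with later code (e.g. POSIX sync()) */
#undef cilk
#undef spawn
#undef sync
```

Compiled this way, the program runs serially on one processor; the Cilk promise is that the real `cilk`/`spawn`/`sync` version computes the same result while the work-stealing scheduler runs the spawned calls in parallel.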
The question is whether this language is good enough for multicore processors. According to Charles Leiserson of MIT, the answer is yes (see "Multithreaded Programming in Cilk"). I presume that does not hold in the general case, though: multicore processor architectures tend to encourage fine-grained rather than coarse-grained parallelism.

References

[Blumofe95] Blumofe, R. D., Joerg, C. F., Kuszmaul, B. C., Leiserson, C. E., Randall, K. H., and Zhou, Y. 1995. Cilk: an efficient multithreaded runtime system. In Proceedings of the Fifth ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (Santa Barbara, California, United States, July 19 - 21, 1995). R. L. Wexelblat, Ed. PPOPP '95. ACM Press, New York, NY, 207-216.
[Nikhil95] Nikhil, R. S. 1995. Cid: A Parallel, "Shared-Memory" C for Distributed-Memory Machines. In Proceedings of the 7th international Workshop on Languages and Compilers For Parallel Computing (August 08 - 10, 1994). K. Pingali, U. Banerjee, D. Gelernter, A. Nicolau, and D. A. Padua, Eds. Lecture Notes In Computer Science, vol. 892. Springer-Verlag, London, 376-390.
[Kuszmaul07] Kuszmaul, B. C. 2007. Cilk provides the "best overall productivity" for high performance computing: (and won the HPC challenge award to prove it). In Proceedings of the Nineteenth Annual ACM Symposium on Parallel Algorithms and Architectures (San Diego, California, USA, June 09 - 11, 2007). SPAA '07. ACM Press, New York, NY, 299-300.