In a competition held at SC11 (the International Conference for High Performance Computing, Networking, Storage, and Analysis) in Seattle, Washington, in November 2011, Rice University's Coarray Fortran (CAF) 2.0 language, compiler, and runtime system received an award as one of three finalists in the HPC Challenge Class II Awards Competition. The CAF 2.0 project is led by Professor John Mellor-Crummey.
At SC11, in a session devoted to the HPC Challenge Awards Competition, Dr. Laksono Adhianto delivered a presentation about CAF 2.0, a set of language extensions that support scientific programming on scalable parallel systems. The presentation described the implementation and performance of several CAF 2.0 programs that solve problems in the HPC Challenge Benchmark suite. These programs were run on up to 8,192 cores of several Department of Energy supercomputers. Other members of the Rice Coarray Fortran 2.0 project team include Guohua Jin, Mark Krentel, Karthik Murthy, Dung Nguyen, William Scherer III, Scott Warren, and Chaoran Yang.
High Performance Computing
Notes on high performance computing, compilers, and programming models, as well as the methodology of parallel computation, numerical analysis, and their applications.
Monday, December 05, 2011
Coarray Fortran 2.0 in Supercomputing 2011
Friday, April 15, 2011
Exascale programming: Won't be and will be
- It's not a library: it hides the algorithm, data structures, and performance aspects
- It's not a C++ class hierarchy or template library: Too much abstraction
- It's not a domain-specific language: DSLs don't have a large enough user community or enough support.
- It's not OpenCL: very low level, too complicated.
- It's not a whole new language: most programmers avoid adopting a new language for fear that it will die
- It's not easy. The idea of making parallel programming easy is silly (?)
- It's not just parallelism: the key isn't parallelism, it's performance
- It supports all levels of parallelism, from node parallelism down to vector and pipeline parallelism, effectively.
- It can map an expression of program parallelism (a parallel loop, say) to different levels of hardware parallelism (across nodes, or to a vector unit) depending on the target.
- It supports the programmer with lots of feedback.
- It supports dynamic parallelism, creating parallel tasks and threads when needed.
- It efficiently composes abstract operations
- It is self-balancing and self-tuning.
- It must be resilient.
Thursday, November 15, 2007
Creating customized toolbar in Eclipse view
private Composite createCoolBar(Composite aParent) {
    // make the parent with grid layout
    GridLayout grid = new GridLayout(1, false);
    aParent.setLayout(grid);
    CoolBar coolBar = new CoolBar(aParent, SWT.FLAT);
    GridData data = new GridData(GridData.FILL_HORIZONTAL);
    coolBar.setLayoutData(data);
    // prepare the toolbar
    ToolBar toolbar = new ToolBar(coolBar, SWT.FLAT);
    // ------------- prepare the items
    // flatten
    ToolItem tiFlatten = new ToolItem(toolbar, SWT.PUSH);
    tiFlatten.setToolTipText("Flatten the node");
    ....
    return aParent; // no changes, reuse the parent
}

public void createPartControl(Composite aParent) {
    Composite parent = this.createCoolBar(aParent);
    treeViewer = new TreeViewer(parent,
            SWT.SINGLE | SWT.FULL_SELECTION | SWT.BORDER);
    ....
    // tricky: needed to expand the tree
    GridData data = new GridData(GridData.FILL_BOTH);
    treeViewer.getTree().setLayoutData(data);
}
The result is shown below:
Monday, November 05, 2007
Eclipse RCP: Is it really easy ?
Eclipse RCP is supposed to be easy and to help programmers avoid reinventing the wheel. Unfortunately, its goal of being as general as possible makes it hard to understand: it is feature-rich and complex. For a simple RCP application, such as the samples and snippets that can be checked out from Eclipse's repository, Eclipse is fun and easy. That holds at the beginning, but as time goes on, programmers face very complex and unintuitive ways of building a rich application.
However, after searching the Internet for a complete tutorial, I found a very interesting page by Lars Vogel that describes step-by-step programming with Eclipse RCP from a feature perspective. The article is available at this URL: http://www.vogella.de/articles/RichClientPlatform/article.html
Although the article is not very deep, it covers the functionality that most programmers need. It is highly recommended for those who have no time to read books.
Wednesday, September 26, 2007
cafc: Co-array Fortran Compiler from Rice
Rice has its own Co-array Fortran source-to-source compiler, named cafc. The source code can be downloaded from http://www.hipersoft.rice.edu/caf/download.html together with Open64/SL. It also needs third-party libraries: ARMCI+MPI or GASNet.
The libf90caf source code files are:
#----------------------------------------------------------------------
# List of Source files
#----------------------------------------------------------------------
COMMON_COM_TARG_SRC = \
  c_a_to_q.c \
  config_host.c \
  config_platform.c

CFILES= \
  $(COMMON_COM_SRC) \
  $(COMMON_COM_TARG_SRC)

COMMON_COM_CXX_SRC = \
  config.cxx \
  config_elf_targ.cxx \
  const.cxx \
  cxx_memory.cxx \
  dwarf_DST.cxx \
  dwarf_DST_dump.cxx \
  dwarf_DST_mem.cxx \
  glob.cxx \
  ir_bcom.cxx \
  ir_bread.cxx \
  ir_bwrite.cxx \
  ir_reader.cxx \
  irbdata.cxx \
  mtypes.cxx \
  opcode.cxx \
  opcode_core.cxx \
  pu_info.cxx \
  quadop.cxx \
  strtab.cxx \
  symtab.cxx \
  symtab_verify.cxx \
  wn.cxx \
  wn_map.cxx \
  wn_pragmas.cxx \
  wn_simp.cxx \
  wn_util.cxx \
  wutil.cxx \
  xstats.cxx \
  upc_symtab_utils.cxx \
  upc_wn_util.cxx \
  intrn_info.cxx

COMMON_COM_TARG_CXX_SRCS = \
  config_targ.cxx \
  targ_const.cxx \
  targ_sim.cxx

CXXFILES = \
  dummy-defines.cxx \
  $(COMMON_COM_CXX_SRC) \
  $(COMMON_COM_TARG_CXX_SRCS)

Pretty weird, isn't it?
Tuesday, September 25, 2007
Co-Array Fortran (CAF)
CAF is one of the three partitioned global address space (PGAS) languages. The other two are Unified Parallel C (UPC) and Titanium.
CAF is an SPMD-based programming model created by Robert Numrich and John Reid (researchers from Cray, I suppose?). Its advantage is that it is simpler than MPI while providing almost the same flexibility and control to users. Furthermore, CAF is based on Fortran 90, which makes it easy for Fortran users to learn. However, it does not seem possible in CAF to have dynamic processes, where the user can spawn or remove processes. Another disadvantage is that it is based on "message" communication and therefore theoretically gains no performance on multicore processors.
Rules:
– Normal rounded brackets ( ) to point to data in local memory.
– Square brackets [ ] to point to data in remote memory.
Examples (taken from [Numrich03]):
Variable declarations:
real :: s[*]
real :: a(n)[*]
complex :: z[*]
integer :: index(n)[*]
real :: b(n)[p, *]
real :: c(n,m)[0:p, -7:q, 11:*]
real, allocatable :: w(:)[:]
type(field) :: maxwell[p,*]
Communication
x(:)[q] = x(:) + x(:)[p]
This means that x on processor q will receive the sum of the local x and the x from processor p.
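To make the bracket rule concrete, here is a minimal Java sketch that only simulates the semantics of x(:)[q] = x(:) + x(:)[p] inside a single process. The class CoarraySketch and the method sumInto are hypothetical names made up for this illustration, and the "images" are just rows of a 2D array, so nothing here is real CAF or a real communication library.

```java
import java.util.Arrays;

// Hypothetical stand-in for a coarray: one private copy of x per image.
// Images are numbered from 1, as in CAF, so slot 0 is left unused.
class CoarraySketch {
    final double[][] x;

    CoarraySketch(int numImages, int n) {
        x = new double[numImages + 1][n];
    }

    // Emulates the statement x(:)[q] = x(:) + x(:)[p] as executed by image `me`.
    void sumInto(int me, int p, int q) {
        double[] local = x[me];   // x(:)    : round brackets only, local data
        double[] remote = x[p];   // x(:)[p] : square brackets, a remote read
        // Evaluate the whole right-hand side before storing anything,
        // so the result is still correct when q equals me or p.
        double[] result = new double[local.length];
        for (int i = 0; i < result.length; i++) {
            result[i] = local[i] + remote[i];
        }
        // x(:)[q] = ... : square brackets on the left, a remote write
        System.arraycopy(result, 0, x[q], 0, result.length);
    }

    public static void main(String[] args) {
        CoarraySketch c = new CoarraySketch(3, 2);
        Arrays.fill(c.x[1], 1.0);   // image 1 holds 1.0 in every element
        Arrays.fill(c.x[2], 10.0);  // image 2 holds 10.0 in every element
        c.sumInto(1, 2, 3);         // image 1 runs x(:)[3] = x(:) + x(:)[2]
        System.out.println(Arrays.toString(c.x[3])); // prints [11.0, 11.0]
    }
}
```

In real CAF the read from p and the write to q would be one-sided communication handled by the runtime; the sketch only shows which copy of x each bracket refers to.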
Real examples
real,dimension(n,n)[p,*] :: a,b,c
do k=1,n
  do q=1,p
    c(i,j)[myP,myQ] = c(i,j)[myP,myQ] + a(i,k)[myP, q]*b(k,j)[q,myQ]
  enddo
enddo
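The loop above is a block matrix multiply: image (myP, myQ) owns one n x n block of c and pulls the blocks of a from its row of images and the blocks of b from its column. A rough Java simulation of that index pattern follows. It assumes a square grid of images, adds the loops over i and j that the snippet leaves out, and uses 0-based indices; BlockMatMulSketch and multiply are hypothetical names, and everything runs sequentially in one process.

```java
// Hypothetical single-process simulation of the CAF block matrix multiply.
// blocks[p][q] is the n x n block owned by image (p, q), 0-based here.
class BlockMatMulSketch {
    static double[][][][] multiply(double[][][][] a, double[][][][] b,
                                   int grid, int n) {
        double[][][][] c = new double[grid][grid][n][n];
        // In CAF every image runs the inner loops at the same time;
        // here the two outer loops visit the images one after another.
        for (int myP = 0; myP < grid; myP++) {
            for (int myQ = 0; myQ < grid; myQ++) {
                for (int i = 0; i < n; i++) {
                    for (int j = 0; j < n; j++) {
                        for (int k = 0; k < n; k++) {
                            for (int q = 0; q < grid; q++) {
                                // c(i,j)[myP,myQ] += a(i,k)[myP,q] * b(k,j)[q,myQ]
                                c[myP][myQ][i][j] +=
                                        a[myP][q][i][k] * b[q][myQ][k][j];
                            }
                        }
                    }
                }
            }
        }
        return c;
    }
}
```

With a 2 x 2 grid of 1 x 1 blocks this reduces to an ordinary 2 x 2 matrix product, which is an easy way to check the indices.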
References
- [Numrich03] Robert W. Numrich. Co-Array Fortran: What Is It? Why Should You Put It on BlueGene/L? Minnesota Supercomputing Institute, University of Minnesota.
Friday, September 21, 2007
Cilk
Cilk [Blumofe95] is a coarse-grained parallel programming model based on C. It is a faithful extension of C for multithreading that uses asynchronous parallelism and an efficient work-stealing scheduler. Cilk is currently limited to shared-memory architectures, although some efforts toward distributed memory have been published [Nikhil95].
Despite its simplicity, Cilk provides several advantages, as mentioned in [Kuszmaul07]: it is simple and small, can use the best serial code, can express cache-efficient code, is efficient (productive), performs well on one processor, and is portable.
However, it also has some limitations, such as no direct support for fine-grained parallelism, flexible synchronization, or groups/teams, and a lack of support for other languages such as Fortran and Java.
One famous Cilk example is as follows:

Original:
int fib(int n) {
    if (n < 2) return n;
    else {
        int n1, n2;
        n1 = fib(n-1);
        n2 = fib(n-2);
        return (n1 + n2);
    }
}

Cilk version:
cilk int fib(int n) {
    if (n < 2) return n;
    else {
        int n1, n2;
        n1 = spawn fib(n-1);
        n2 = spawn fib(n-2);
        sync;
        return (n1 + n2);
    }
}
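Since Cilk has no Java support, the closest analogue I know of is Java's fork/join framework, which also schedules tasks by work stealing. The sketch below is not Cilk and not a translation the Cilk authors provide; FibTask is a name made up for this example. It maps spawn to fork() and sync to join(), and it forks only one of the two recursive calls while running the other in the current worker, which is the usual fork/join idiom.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Hypothetical fork/join counterpart of the Cilk fib example.
class FibTask extends RecursiveTask<Integer> {
    private final int n;

    FibTask(int n) {
        this.n = n;
    }

    @Override
    protected Integer compute() {
        if (n < 2) {
            return n;
        }
        FibTask t1 = new FibTask(n - 1);
        FibTask t2 = new FibTask(n - 2);
        t1.fork();              // like "spawn": an idle worker may steal this task
        int n2 = t2.compute();  // run the second call directly in this worker
        int n1 = t1.join();     // like "sync": wait for the forked child
        return n1 + n2;
    }

    // Convenience entry point that runs the task on the common pool.
    static int fib(int n) {
        return ForkJoinPool.commonPool().invoke(new FibTask(n));
    }
}
```

Calling FibTask.fib(10) gives 55. As with the Cilk version, creating a task per call is far too fine-grained to be fast for fib; the point is only the shape of the spawn/sync pattern.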
References
[Blumofe95] Blumofe, R. D., Joerg, C. F., Kuszmaul, B. C., Leiserson, C. E., Randall, K. H., and Zhou, Y. 1995. Cilk: an efficient multithreaded runtime system. In Proceedings of the Fifth ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (Santa Barbara, California, United States, July 19 - 21, 1995). R. L. Wexelblat, Ed. PPOPP '95. ACM Press, New York, NY, 207-216.
[Nikhil95] Nikhil, R. S. 1995. Cid: A Parallel, "Shared-Memory" C for Distributed-Memory Machines. In Proceedings of the 7th International Workshop on Languages and Compilers for Parallel Computing (August 08 - 10, 1994). K. Pingali, U. Banerjee, D. Gelernter, A. Nicolau, and D. A. Padua, Eds. Lecture Notes in Computer Science, vol. 892. Springer-Verlag, London, 376-390.
[Kuszmaul07] Kuszmaul, B. C. 2007. Cilk provides the "best overall productivity" for high performance computing: (and won the HPC challenge award to prove it). In Proceedings of the Nineteenth Annual ACM Symposium on Parallel Algorithms and Architectures (San Diego, California, USA, June 09 - 11, 2007). SPAA '07. ACM Press, New York, NY, 299-300.