Saturday, July 10, 2010

Some news!

News:
*Source code for GPU Computing Gems 1 (or GPU Gems 4) is already available at gpucomputing.net.
The book is due in November.
Available right now:

Title (date)

A Programmable Graphics Pipeline in CUDA for Order Independent Transparency (07-10-2010)
High Performance Iterated Function Systems (07-02-2010)
CUDA Implementation of the Tree-based Barnes Hut n-Body Algorithm (07-01-2010)
Connected Component Labeling in CUDA - demo+code (06-30-2010)
A Practical Guide to Massively Parallel Monte Carlo Simulations: The Ising Model (06-30-2010)
Parallel LDPC Decoding using CUDA (06-30-2010)
Path Regeneration for Random Walks (06-30-2010)
GPU Gems 4: Deformable Volumetric Registration using B-splines Source Code (06-30-2010)
Monte Carlo Photon Transport on the GPU (06-30-2010)
Lattice-Boltzmann Lighting Models - Source Code (06-30-2010)
RNA folding GPU (06-30-2010)
Haar Classifiers for Object Detection with CUDA: Pixel-parallel processing kernel (06-29-2010)
Multiclass Support Vector Machine (06-29-2010)
Parallelization of the x264 encoder using OpenCL (06-21-2010)
Cone-Beam CT image reconstruction using the Katsevich Algorithm (06-21-2010)
Line forward projection on CUDA (06-11-2010)

It seems MareNostrum is getting a rack of Fermis, perhaps paired with IBM Power7.

Would Nvidia then have to publish a CUDA driver for the PowerPC architecture?

Or they could use PathScale with a fully open-source computing stack,
available here, branched from nouveau:

http://github.com/pathscale/pscnv/commits/master
It seems an Nvidia driver with TCC support for Fermi, version 197.81, is on the IBM web site.

A Catalyst 10.8 beta seems to be available; 10.7 is coming on 21/7.


PhysX 3.0 is coming with CPU improvements:
*automatic threading
*SSE enabled by default
Mafia has a new runtime, NVIDIA PhysX driver 10.04.02_9.10.0522.
Mueller has posted a paper on the Fermi launch demo, which uses water height fields plus particles.
Two other interesting papers from Nvidia Research are:

HLBVH: Hierarchical LBVH Construction for Real-Time Ray Tracing
PantaRay: Fast Ray-traced Occlusion Caching of Massive Scenes

A Hwu-based course from Stanford:
http://code.google.com/p/stanford-cs193g-sp2010/wiki/ClassSchedule

Two interesting conference programs are available:

PACT
has the Intel GPU paper demystifying ..
and also Revisiting Sorting for GPGPU Stream Architectures,
which achieves nearly 500 Mkeys/s on GT200.



There is also a workshop on GPUs:
http://informatik.technikum-wien.at/gpusca/
but the web site doesn't work.

The Nineteenth International Conference on
Parallel Architectures and Compilation Techniques (PACT)
Vienna, Austria, September 11-15, 2010
Interesting papers:
Scalable Thread Scheduling and Global Power Management for Heterogeneous Many-Core Architectures
Dynamically Managed Multithreaded Reconfigurable Architectures for Chip Multiprocessors
WAYPOINT: Scaling Coherence to Thousand-core Architectures
Scalable Hardware Support for Conditional Parallelization
Less is More: Trading off Work-Efficiency for Scalability in Irregular Programs
Revisiting Sorting for GPGPU Stream Architectures
D. Merrill, A. Grimshaw
An Integer Programming Framework for Optimizing Shared Memory Use on GPUs
W. Ma, G. Agrawal
DMATiler: Revisiting Loop Tiling for Direct Memory Access
A Software-SVM-based Transactional Memory for Multicore Accelerator Architectures with Local Memory
Automatic Vector Instruction Selection for Dynamic Compilation
An OpenCL Framework for Heterogeneous Multicores with Local Memory

SC10

I would like to review these papers:
Scalable Tile Communication-Avoiding QR Factorization on Multicore Cluster Systems
Parallel Fast Gauss Transform
Overlapping Methods of All-to-All Communication and FFT Algorithms for Torus-Connected Massively Parallel Supercomputers
The Multi-Scale Heart Simulation on Massively Parallel Computers
Using 3.5-D Blocking Optimization for Stencil Computations on Modern CPUs and GPUs
An 80-Fold Speedup, 15.0 TFlops, Full GPU Acceleration of Non-Hydrostatic Weather Model ASUCA Production Code
Exploiting 162-Nanosecond End-to-End Communication Latency on Anton
Strider: Runtime Support for Optimizing Strided Data Accesses on Multi-Cores with Explicitly Managed Memories
Multithreaded Asynchronous Graph Traversal for In-Memory and Semi-External Memory
OpenMPC: Extended OpenMP Programming and Tuning for GPUs
Scalable Graph Exploration on Multicore Processors
The 48-core SCC processor: the programmer’s view
Exploring a Novel Gathering Method for Finite Element Codes on the Cell/B.E. Architecture
Reducing Multicore Bandwidth Requirements for Combinatorial Multigrid
Diagnosis, Tuning and Redesign for Multicore Performance: A Case Study of the Fast Multipole Method
Scaling Hierarchical N-Body Simulations on GPU Clusters
Size Matters: Space/Time Tradeoffs to Improve GPGPU Applications Performance
The Sharing Tracker: Using Ideas from Cache Coherence Hardware to Reduce Off-Chip Memory Traffic with Non-Coherent Caches

8 comments:

  1. http://www.vizworld.com/2010/07/sc10-technical-program/?utm_source=footer&utm_medium=relatedlinks&utm_campaign=layout
    Adapting Partial Cache Coherence Hardware to Reduce Off-Chip Memory Traffic with Non-Coherent Caches in GPUs
