Saturday, July 3, 2010

ATI Stream SDK roadmap

I have found a roadmap of the ATI Stream SDK through the end of the year.
DISCLAIMER: It's on the Internet and I found it with some luck; no NDA was broken.

Let's talk about it.
Currently AMD OpenCL lacks:
*solid OpenGL interop: image interop has issues (for example, copying a buffer to an image doesn't work when the image is an acquired OpenGL texture)
*multiple component images (anything other than RGBA)
*DX interop
*access to all graphics mem (currently only 128-256 MB is exposed)
*Catalyst integration

Stream SDK 2.2 adds:
*OCL 1.1 (3-component vectors are part of it, and OCL 1.1 image support covers multiple component images (r, rg, rgb))
*DX10 interop (only that, it seems; no DX9 or DX11 interop as Nvidia has)
*mem fences that no longer generate unneeded barrier ISA instructions
*append buffers (what about the GDS extension as well?)
*OCL 1.1 atomics seem to be nothing new(?), offline compilation goes from preview to final, and DPFP adds FMA, since the other operations are supported now(?)
DPFP FMA should let peak test kernels in benchmarks show high numbers, near 400-500 GFLOP/s.
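As a rough sanity check on that range, theoretical peak is just units x flops per unit per clock x clock. Below is a minimal sketch; the hardware figures in the comments (a Radeon HD 5870 with 320 five-wide units at 850 MHz, an HD 5850 with 288 at 725 MHz, one DP FMA per unit per clock) are my assumptions and do not come from the roadmap, and `peak_gflops` is a hypothetical helper name.

```cpp
#include <cassert>

// Theoretical peak in GFLOP/s = units * flops per unit per clock * clock in GHz.
// Assumed inputs (not from the roadmap):
//   HD 5870: 320 units, 0.850 GHz; HD 5850: 288 units, 0.725 GHz.
//   One DP FMA per unit per clock counts as 2 flops.
double peak_gflops(int units, int flops_per_unit_per_clock, double clock_ghz) {
    assert(units > 0 && flops_per_unit_per_clock > 0 && clock_ghz > 0.0);
    return units * flops_per_unit_per_clock * clock_ghz;
}
```

Under those assumptions `peak_gflops(320, 2, 0.85)` comes out at about 544 and `peak_gflops(288, 2, 0.725)` at about 418, so a well-tuned FMA test kernel landing somewhere near 400-500 GFLOP/s looks plausible.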

A lot more interesting is 2.3:
*In-process compilation of OpenCL kernels, which means no more shipping of the LLVM compilers (llc, etc.) and hopefully means the compiler will be integrated into atiocl.dll, so OpenCL can ship built into Catalyst 10.12.
*Library models
*C++ template support in kernels (I hope this means you can at least make kernel args depend on a template argument, for example to support double and float kernels with one code base, similar to what CUDA supports)
*Adds trig DPFP routines (but still no complete DPFP support, it seems, which is horrible given that Nvidia has been shipping it since October 2009 and AMD has been saying since the end of 2009 that support is coming gradually)
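To picture what the template item would buy us, here is a minimal host-side sketch in ordinary C++, not OpenCL C. The function name `axpy` and its signature are hypothetical, and nothing here says what syntax the SDK will actually accept; it only shows one body serving both float and double.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-side stand-in for a templated kernel body: y[i] = a * x[i] + y[i].
// In a real kernel the loop would disappear and i would come from
// get_global_id(0); here a plain loop plays the role of the work-items.
template <typename T>
void axpy(T a, const std::vector<T>& x, std::vector<T>& y) {
    assert(x.size() == y.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

Calling `axpy(2.0f, xf, yf)` with float vectors and `axpy(2.0, xd, yd)` with double vectors instantiates the same code twice, which is the "one code for double and float" case I am hoping for.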
The most interesting are the last three:
*FFT library: why not a BLAS lib as well? I suspect it is OCL based, as DirectCompute has its own FFT lib.
Is it also going to be part of ACML? Currently matmul in ACML-GPU is CAL based.
I assume it will be a binary-only library for Win and Lin, so for Mac I hope we can somehow extract the OpenCL kernels, or create a wrapper around it and use Wine or something like that, to test whether perf on Mac on AMD boards is correct.
*OpenPhysics: well, at least something to play with. I expect cloth, soft body and SPH particle support in OpenCL and/or DirectCompute. The Bullet site has a preliminary executable with a cloth demo, and an AMD employee talks there about the state of soft body support (http://code.google.com/p/bullet/issues/detail?id=390#c3). It seems that since last week trunk also has DirectCompute and OpenCL code for both cloth and soft body.
Also, by September we will have DMM 2.0, as said at GDC, which has some OpenCL love for this rigid body + fracture simulator.
*OpenDecode UVD: well, a CUVID/VDPAU-style library for AMD boards. Nvidia has put a lot of love into GPU video decoding and its interop with CUDA/OpenGL, with CUVID for Win and Mac and VDPAU for Linux.
Since the 256 drivers, VDPAU has efficient OpenGL and CUDA interop. CUVID has by def efficient CUDA interop, plus fast OpenGL/DX interop on Windows. CUVID for Mac only seems good for feeding data to CUDA, as OpenGL interop on Mac is slow right now (and always has been).
I expect this to bring fast interop with OpenCL on Win and Lin, adding to DXVA's DX interop on Win and to AMD XvBA on Linux, whose VAAPI wrapper seems to provide fast OGL interop.
So Mac seems left out, but I hope the recent video acceleration API in 10.6.3 supports AMD 5xxx cards when released, and also that VC1 support is added in addition to H.264. I think that API provides a fast path to OpenGL textures, so since OpenCL/OpenGL interop is fast on Apple, it would also provide OpenCL interop on that platform.
Another question is whether dual-stream acceleration will be exposed and supported. On Nvidia I think DXVA, CUVID and VDPAU all expose it, at least with a GTX 470.
Also related: Catalyst 10.7 has improved support for VLC 1.1.1 DXVA decoding on AMD cards, which I presume means the fast path for sending frames from GPU to CPU works.
Remember also that last month Nvidia released an ION driver (257.29) improving DXVA perf on ION with PCIe x1, as Flash requires (GPU->CPU->GPU roundtrip).

What's left after OCL 1.1 and Stream SDK 2.3:
Well, I expect Global Data Share and shared registers extensions, 3D image writes, true complete DPFP support (cl_khr_fp64), complete BLAS and FFT libs (like CUBLAS and CUFFT in CUDA), working pinned mem, an extension for host mem accessible from the GPU, gather4 instructions for image support in OpenCL, and working concurrent kernel execution and mem transfers (i.e. concurrency >= 20% in the oclCopyCompute CUDA 3.1 example)
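
To make that last ">= 20%" concrete, here is one way to put a number on copy/compute overlap. This is a minimal sketch under my own assumption that overlap is measured as the share of the serial (copy, then compute) time saved when both run concurrently; the sample may define it differently, and `overlap_percent` is a hypothetical helper.

```cpp
#include <cassert>

// Hypothetical overlap metric (an assumption, not taken from the sample):
// percentage of the serial copy+compute time saved by running concurrently.
double overlap_percent(double serial_ms, double overlapped_ms) {
    assert(serial_ms > 0.0 && overlapped_ms >= 0.0);
    return 100.0 * (serial_ms - overlapped_ms) / serial_ms;
}
```

With this definition, a run that takes 10 ms serially and 8 ms with transfers and kernels overlapped scores 20%, which is the bar I would like to see AMD clear.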


3 comments:

  1. "OpenDecode UVD: Well a cuvid/vdpau library for AMD boards.."

    hmm, perhaps you can explain this AMD non sequitur logic.... I'm at a total loss here ;)

    if you check the very latest Christmas SDK v2.3 you will notice that AMD in their wisdom have decided to include both the headers and the vital OVDecode.lib in their Windows SDK...

    However, in the equivalent Linux SDK they include the headers BUT NOT the OVDecode.so, crazy..

    it seems that Someone Inside AMD DOES NOT WANT You actually Using Linux in any form to decode HD video on their UVD or this new OpenCL OVDecode code base.... can you explain this Linux incompetence within AMD ?

  2. for easy reference I'll mention Here that gbeauche
    is the only 3rd party Developer that has actually committed to Signing an AMD/ATI NDA for access to the closed UVD ASIC documentation.

    and the guy solely responsible for even trying to bring real Linux AMD UVD HD Video decoding to the Linux masses to date with his http://www.splitted-desktop.com/~gbeauchesne/

    you will also note that, given the total lack of actual fixes to the AMD driver for the problems gbeauche has been reporting for a very long time now, he has even considered porting his codebase to OVDecode
    http://phoronix.com/forums/showpost.php?p=160895&postcount=1067

    in his latest 0.8.x-series
    http://phoronix.com/forums/showpost.php?p=160896&postcount=1068
    but then he discovers that latest AMD OVDecode.so road block LOL
    http://phoronix.com/forums/showpost.php?p=161163&postcount=1083

    no matter what options AMD offers that seem like a reasonable path to take, we are No further along the Linux HW HD Video Decode path than we were before even UVD1 appeared all these years ago, Crazy....
