Friday, April 2, 2010

Megapost!

Today fools{
*GTX 485 has 512 cores, 3 GB GDDR5 and 850/1750 shaders..
*ATI 5990 has 4 GPUs on board..
*bulldozer benchmarks
}end fools..

ATI has released:
*5870 2gb 6 outputs
*GL 3.3/4.0 drivers (linux &win)
*GPU perfstudio 2.2
*AMD ADL SDK 3.0 (aka eyefinity sdk)
Two Stream documents:
*OpenCL Programming Guide
*GPU Computing: Past, Present and Future with ATI Stream Technology (Michael Chu)
Lame to see a backup slide on CUDA vs OpenCL..

*VAAPI with H.264 decode on Westmere CPUs is in git
So we now have H.264 GPU decode on Linux via VAAPI for Intel, NVIDIA and AMD cards..
though AMD 5xxx is not working correctly and Intel G45 will have to wait until Q3 2010..
Also, what about VC-1? ATI and NVIDIA support is there, even on the 8800 GT via the latest VDPAU..
Will Intel catch up?
And what about dual HD decode: is it working with every API/implementation on the latest GPUs? Intel HD Graphics (2010), AMD 5xxx, GT 240 and Fermi all have hardware support for it..
What about H.264 MVC: does VAAPI expose it, i.e. does the API allow it? And what about XvBA, DXVA and VDPAU?
Also, we now have CUVID for Mac, possibly even in x64, so does (or will) CUVID allow MVC?
Also, Gnash VAAPI support is now integrated in trunk and seems to compile on Mac and Windows, so could we
port VAAPI to Mac and Windows and even implement a CUVID VAAPI wrapper?
That would allow MPlayer and Gnash to support GPU video decode on Mac with NVIDIA cards, for both HD video and Flash video..

Nvidia has released:
*Nexus March beta (same as shown at GDC'10, so it should allow D3D10 and D3D11 shader debugging on Fermi..)
*Optix 2.0b3
*CUDA 3.0
*OGL 3.3 drivers

Still lacking
*Cg 3.0?
*OGL 4 drivers with ext_image_load_store and ext_image_atomic_counters support
*Linux Fermi drivers (win has 197.17)
*3d vision surround sdk
*3dtv hdmi 1.4 drivers
*256 drivers
*NV D3D11 SDK, which presumably has:
hair tess and water tess demos
*PhysX SDK 3.0 with rigid body on GPU and height field water as Fermi launch demo?
*voltage tweaking software and max OC with it for the GTX 480 (900 MHz?) and 470 (750/800 MHz possible), plus benchmarks
*OptiX 2 and Nexus 1.0 final
*test voxilla demos and fp64 perf in CUDA and OpenCL with CUDA-Z:
is it 1/4 of the Tesla products? Can it be hacked? See PTX code and cubin code..
gpu computing:
*cudart x64 for mac
*cuda-gdb for mac
*cuda-gdb support for ocl binaries
*the promised official NV tools for disassembly and assembly of Fermi binaries (new cubins; for old ones use decuda, or will they also support sm_1x binaries?), promised "soon" in the SIGGRAPH Asia CUDA perf optimization course..
*mac cuda-opengl efficient interop?

official perf
*tessellation 6-8x
*raytracing 3.5x
*sli near 2x on d3d11 games
*3d vision near 2x (see 3d vision blog)
OK, but ROP and texture power are very low, and it seems the tex units are capped at half,
as the GF104 info that has surfaced shows it also has 64 tex units..
NVIDIA admits it has GDDR5 controller problems, so it doesn't run the 5000 MHz GDDR5 chips at their rated 1250 MHz..
The 470 seems to use 4000 MHz chips..
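
The 5000 MHz and 1250 MHz figures above describe the same chips: GDDR5 moves four data transfers per pin for every command-clock cycle. A minimal sketch of that arithmetic follows; the function names are made up for illustration, and the 384-bit bus width used in the usage note is my assumption for a GTX 480-class board, not something stated in the reviews.

```cpp
#include <cassert>

// Effective GDDR5 data rate (MT/s per pin) for a given command clock (MHz).
// GDDR5 transfers four bits per pin per command-clock cycle.
double gddr5DataRateMTs(double commandClockMHz) {
    assert(commandClockMHz > 0.0);
    return 4.0 * commandClockMHz;
}

// Peak memory bandwidth in GB/s for a data rate (MT/s) and a bus width (bits).
double peakBandwidthGBs(double dataRateMTs, int busWidthBits) {
    assert(busWidthBits > 0);
    const double bytesPerTransfer = busWidthBits / 8.0;
    return dataRateMTs * 1e6 * bytesPerTransfer / 1e9;
}
```

For example, `peakBandwidthGBs(gddr5DataRateMTs(1250.0), 384)` gives 240 GB/s, the peak a 384-bit board would reach if the chips ran at their rated clock; any lower memory clock scales that figure down linearly.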

reviews notes:
*noticias3d has slides and perf vs the 5870 with the launch 8.66 drivers, which is a good baseline for overall perf improvement since that was the perf six months ago.. Cat 10.2/10.3 have a 10% perf improvement..
*ixbt uses RightMark geometry shader perf..
*anandtech has chen nqueen OpenCL perf and the new Folding@Home client, but another site claims only a 50% perf gain vs the 2-4x improvement Anand reports
*reviews have a new D3D11 benchmark by a Swedish company
*Sandra 2010 GPGPU benchmarks, but double precision is bad..
*d3d11 games metro, heaven 2.0, dx 11 sdk tess demos, just cause2 benches..
*luxrays perf on beyond3d forums..

Apple has released 10.6.3 without the AMD CAL libs (see PGI 10.2, whose CAL info mentions aticalrt.dylib)
It also seems to have almost OGL 3.0 for AMD; NVIDIA is missing a few more extensions and the CPU driver lacks 3/4 extensions..
I have found Fermi references in the OGL binary driver, but no real support..
Phoronix found the OGL drivers have more than 50% perf degradation on the 9400 (bad),
but it should still allow Steam to run well on Mac..
Regarding OpenCL, there are still no new headers for the CUDA SDK 3.0 issues, and seemingly no big improvements, as none are mentioned in the release.
I have to test 10.6.3 with a CUVID x64 executable I have and the OptiX 2.0b3 SDK, run the Apple FFT OpenCL and ocean demos on both NVIDIA and ATI GPUs, and run the NVIDIA OCL ft3d sample, which says it has issues with Apple OpenCL, to see if they are fixed..
Also, are there OCL headers in the iPad 3.2 SDK golden master?..
It still seems there is no double support in OpenCL for NVIDIA and no image support for ATI GPUs on Apple..
Add to that no fix for double precision in compute shaders..

double in ogl 4.0 (ext_gpu_shader_fp64):
NVIDIA has released OGL 3.3 drivers, but will the 4.0 drivers support fp64 on the gt275?
Also, is double support available on 4850 cards with the ATI 4.0 drivers?

Also, will NVIDIA release the wgl_nvx_dx_interop spec and the ext_image_load_store extension with the GL 4.0 drivers?
Any more extensions?

With that, at least DirectCompute and OGL will allow 3D image writes.. OpenCL allows 2D image writes by default, and CUDA is the least good? It only gets there through textures on pitch linear mem..
Still lacking are an OpenCL 3D image writes extension and the CUDA surface functions, which were removed from the CUDA 3.0 beta.. I think they didn't work..
There is also an interesting post in the NVIDIA forums saying that OpenCL using a writable texture now seems to not
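
For readers who haven't met pitch linear memory: each row of a 2D buffer is padded to a fixed byte stride (the pitch), and element addresses are computed from that stride rather than from the image width. The sketch below shows only the address arithmetic on the host; the helper names are hypothetical, and the 256-byte alignment in the usage note is an illustrative assumption (in CUDA the real pitch is whatever `cudaMallocPitch` returns).

```cpp
#include <cassert>
#include <cstddef>

// Round a row size in bytes up to the next multiple of `alignment`.
std::size_t alignedPitch(std::size_t rowBytes, std::size_t alignment) {
    assert(alignment != 0);
    return (rowBytes + alignment - 1) / alignment * alignment;
}

// Byte offset of element (x, y) inside a pitch-linear 2D buffer.
// Rows are `pitchBytes` apart, elements within a row are `elemSize` apart.
std::size_t byteOffset(std::size_t pitchBytes, std::size_t x, std::size_t y,
                       std::size_t elemSize) {
    assert(x * elemSize < pitchBytes);  // x must stay inside the padded row
    return y * pitchBytes + x * elemSize;
}
```

With 100 floats per row and an assumed 256-byte alignment, `alignedPitch(100 * 4, 256)` is 512, so element (3, 2) lives at `byteOffset(512, 3, 2, 4)`, which is byte 1036. A kernel writes to that address and a texture bound to the same memory reads it back, which is the roundabout "image write" path referred to above.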

iZ3D 1.11 has been released with shutter support (I can't test it on the Samsung 120 Hz because I have activation issues),
but I have tried anaglyph, which shows the algorithm works well with D3D9, 10 and 11 in the DirectX SDK samples..
Lame: the ATI D3D11 Mecha and Ladybug demos don't work OK..
Mecha crashes and in Ladybug the view isn't affected..
The NVIDIA compute shader ocean demo doesn't look good, while 3D Vision works with that demo on 197.13!
Also, some tessellation doesn't work.
In brief:
*32-bit OK, 64-bit examples crash (is it my system's fault?)
*YouTube 1080p 3D HD works in Internet Explorer with Flash 10.0 (not 10.1) and with YouTube in English mode!
*Windowed stereo mode works.
So NVIDIA has to add YouTube 3D and windowed stereo mode support (for non-Quadro) in the magical 256 drivers.. Even better if they also add NVIDIA 3DTV and HDMI 1.4 output for OpenGL QB on Quadros..

Also, the diagnostic utility reports on the ATI AQBS D3D surface format, which must be the AMD Catalyst 10.3 3D support, and shows it as not supported although I am using Catalyst 10.3 WHQL, so it seems I must have the LCD set to 120 Hz, or does it look for an HDMI projector? Anyway, I can't set the refresh rate in Catalyst CC right now..

I would love to have a CUDA hook that enables graphics interop through the host for the Tesla compute driver on Windows, and when running only the kernel module on Linux, to run nbody for example..
It's a shame: in earlier versions OGL interop went through the host if not run on the same GPU, now it returns an error..
The same goes for OpenCL, which reports OGL interop..
and for both D3D and OGL interop..
I would also add on-the-fly cubin to PTX conversion for running the nufft or fastest matmul cubin codes on Fermi..
Also test enabling cu-force-ptx-jit

It would be good to test D3D-OCL interop with DXVA 2.0 (D3D9 tex?) interop to build an open source Badaboom..
I would love to see, on an 8800 GT or GT200 with VP2 (VC-1 VLD not supported), which gives lower CPU usage: CUVID, DXVA or VDPAU.. assuming all of them handle it..
The same for dual-stream HD and MVC when it comes out..

Currently I found lacking on AMD 5xxx:
*OCL image support
*OGL-OCL tex interop
*xvba 5xxx incorrect decoding

I would love to have a simple OGL QB driver with anaglyph output for testing ports of Gnash, MPlayer etc. that support 3D stereo rendering and YouTube 3D on Mac and Linux..
Then port 3D Vision to these OSes..

Note: I have learnt with Unigine Heaven 2.0 that iZ3D doesn't work from the launcher, but the demo has .bat files for launching it and with those iZ3D works in D3D9; in D3D10 it crashes as soon as it is activated, and in D3D11 it depends, but it doesn't look good..
Note: it seems the Windows demo, compiled on 7 March, has no support for the old AMD tessellator GL extension, so editing haven.cfg doesn't work; it also doesn't work with the AMD OGL 4.0 drivers..
On Linux you can use Heaven 2.0 with ATI tessellation, as the Linux build is newer..

I would like an atioc utility on Linux to overclock much further than officially supported, as MSI Afterburner does..
I have to hook ATI ADL and see..

angle google code project is improving fast:
*now has ogl samples included with esut.h and support for loops in shaders etc..
*64-bit requires this patch:
--- src/libEGL/Display.cpp (revision 49)
+++ src/libEGL/Display.cpp (working copy)
@@ -63,8 +63,8 @@
         }
         else
         {
-            EGLint minSwapInterval = 4;
-            EGLint maxSwapInterval = 0;
+            int minSwapInterval = 4;
+            int maxSwapInterval = 0;
Index: src/libGLESv2/geometry/vertexconversion.h
===================================================================
--- src/libGLESv2/geometry/vertexconversion.h (revision 49)
+++ src/libGLESv2/geometry/vertexconversion.h (working copy)
@@ -122,7 +122,7 @@
     static const std::size_t finalWidth = N+(N&1);
 };
-template
+template
 struct WidenToFour

samples require more changes also..

I have been trying to port 

Crazy drivers:
amd:

cat 10.2  B_95228  3/2
cat 10.3b B_95437  5/2
cat 10.3  B_96537  3/3
10.3a     B_97263 14/3
10.3 ogl4 B_97624 24/3
10.3b     B_97763 25/3
10.4 shipping for ubuntu 10.4

nvidia

196.75 required for Nexus support
197 or higher -> OCL-D3D interop
197.13 official CUDA 3.0 ones, WHQL
197.15 ogl 3.3 driver
197.16 notebook verde driver with 3d vision external support
197.17 fermi launch press drivers
197.25 starcraft dx8 issues

geforce 256 in april with 3d vision surround


about ogl 3.3/4.0 drivers

ogl 3.3 samples released..
ogl 4.0
openglext and extensions viewer show ogl 3.3/4.0 extensions
google code gle,gloader load 4.0 extensions.. glew?


info released about http://developer.download.nvidia.com/opengl/specs/GL_EXT_gpu_memory_info.txt


Fermi post launch analysis:

lacks
http://forum.beyond3d.com/showpost.php?p=1414824&postcount=283
latest gpgpu releases:


*thrust 1.2
*jacket 1.3
*Folding@Home fermi with openmm? gpu3 client
*cudpp 1.1.1?

released:
*nvidia Design Garage
*supersonic sled


demos not public:
*Raging Rapids tech demo
*hair demo
*water tessellation demo
*D3D11 demo by a Swedish company

Testing cuFFT, I have found that since 2.3 it includes the nufft improvements, as cubin only (nufft paper, SC09).
nufft has test bench code for a 256^3 FFT transform.
cuFFT at SC09 had perf over 160 GFLOPS for 256x144x192.
cuFFT 3.0 is only superfast if every dimension is a power of two, although they can differ..
if not, 20-30 GFLOPS
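
To compare FFT numbers like these across sizes, the usual benchmarking convention is to count 5 N log2(N) flops for an N-point complex transform, whatever the library really executes. A minimal sketch under that assumption follows; the helper names are made up for illustration and the timing in the usage note is a hypothetical value, not a measurement.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// True when n is a power of two (1, 2, 4, ...).
bool isPowerOfTwo(std::size_t n) {
    return n != 0 && (n & (n - 1)) == 0;
}

// True when every dimension is a power of two (the fast case for cuFFT 3.0).
bool allPowerOfTwo(std::size_t nx, std::size_t ny, std::size_t nz) {
    return isPowerOfTwo(nx) && isPowerOfTwo(ny) && isPowerOfTwo(nz);
}

// Nominal flop count of a 3D complex FFT: 5 * N * log2(N), N = nx * ny * nz.
// This is a benchmarking convention, not the real instruction count.
double fftNominalFlops(std::size_t nx, std::size_t ny, std::size_t nz) {
    const double n = static_cast<double>(nx) * ny * nz;
    return 5.0 * n * std::log2(n);
}

// GFLOPS for one transform that took `seconds` of wall time.
double fftGflops(std::size_t nx, std::size_t ny, std::size_t nz, double seconds) {
    assert(seconds > 0.0);
    return fftNominalFlops(nx, ny, nz) / seconds / 1e9;
}
```

A 256^3 transform counts as about 2.01 GFLOP under this convention, so a hypothetical run time of 12.5 ms would correspond to roughly 161 GFLOPS. `allPowerOfTwo(256, 144, 192)` is false, which puts the SC09 size in the slow category for cuFFT 3.0.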

have to test fft dx compute shader microsoft library..




amd 5850 in glext shows


*doesn't have:

GL_EXT_stencil_two_side?

GL_ARB_compatibility (3.1)-> seems present, so is it present only if 3.1 is queried?

GL_EXT_shader_image_load_store->present in dll!
accessorStore UAV_STORE
imageLoad imageStore

GL_ARB_shading_language_include->seems to have basic include support!

has:

GL_EXT_vertex_attrib_64bit (no published spec)
GL_ARB_texture_compression_bptc->has ext
GL_EXT_shader_atomic_counters (no published spec)
imageAtomicAdd imageAtomicSub imageAtomicMin imageAtomicMax
GL_ARB_texture_swizzle->has ext_texture_swizzle
GL_ARB_texture_buffer_object_rgb32->has ext

proprietary
add  GL_AMDX_debug_output
amdx->GL_AMD_name_gen_delete
GL_AMD_conservative_depth



-------------------------------------------
Not implemented extensions in OpenGL 2.0:
GL_EXT_stencil_two_side


-------------------------------------------
Not implemented extensions in OpenGL 3.0:
GL_NV_depth_buffer_float->has arb_depth

-------------------------------------------
Not implemented extensions in OpenGL 3.1:
GL_ARB_compatibility

-------------------------------------------
Not implemented extensions in OpenGL 3.3:
GL_ARB_shading_language_include->no need
GL_ARB_texture_swizzle->has ext_texture_swizzle

-------------------------------------------
Not implemented extensions in OpenGL 4.0:
GL_ARB_texture_buffer_object_rgb32->has ext

-------------------------------------------
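
Checks like the ones above are easy to get wrong with a plain substring search, because one extension name can be a prefix of another. A minimal sketch of an exact-token check follows, assuming the space-separated extension string has already been fetched with `glGetString(GL_EXTENSIONS)` (a 3.x core context would enumerate with `glGetStringi` instead); `hasExtension` is a hypothetical helper, not part of any GL loader.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Returns true only when `name` appears as a whole token in the
// space-separated `extensionList`, so "GL_EXT_texture" does not
// falsely match "GL_EXT_texture_swizzle".
bool hasExtension(const std::string& extensionList, const std::string& name) {
    assert(!name.empty());
    std::istringstream tokens(extensionList);
    std::string token;
    while (tokens >> token) {
        if (token == name) {
            return true;
        }
    }
    return false;
}
```

This matches the situation in the lists: a driver string containing only `GL_EXT_texture_swizzle` makes `hasExtension(list, "GL_ARB_texture_swizzle")` return false even though the EXT variant is there, which is why a viewer reports the ARB name as not implemented.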

*I can't talk about them, but I have betas of both:
*OpenRL 1.0b2
Has Windows (x32, x64) libraries and Mac x32-only libraries
Still lacking Linux and Mac x64 binaries..
Remember OptiX also has Mac support, but only x32..
No OpenCL bits found anywhere, so for now it is a CPU-only release,
but it uses all my 8 cores..
It would be nice to port the OptiX samples and tutorial to OpenRL and vice versa..
or better, make an OpenRL wrapper over OptiX 2.0b3 with Fermi support..

*Intel Compiler 12 (Composer 2011)
cilk,#pragma vector size(4,8) etc..
vs2010 support 
aes-ni for crc32 and better avx overall
and more..
ipp 7.0 beta
intel compiler 12 beta
tbb old version?

*I now have the libecuda and libptx code of the PFC PTX emulator from UPC..
trying to build it on Windows and update it to PTX ISA 2.0

*Still no gdebuggerCL

ptx 2.0 isa released:
It also includes PTX 1.5 info (the LLVM-based NVIDIA OpenCL compiler emits this code)
-> mainly adds separate tex and sampler setup, plus the same __param stuff for function arguments
-> also shows OpenCL has no name mangling for kernels.. I'm now testing whether PTX with addc can be inserted into OpenCL, i.e. a converter from CUDA 3.0 PTX kernels to OpenCL PTX kernels would be good..
Also, is cubin to PTX possible? It would allow me to run the fastest matmul to date on Fermi, as Fermi doesn't run those cubins..
Are there limitations? Ask Barra's creator, he has a Tesla cubin simulator..
So I could theoretically go from cubin to OCL-compatible PTX code..
ptx 2.0 shows for fermi
HAS (also implemented):
*d3d11 cs 5.0 integer instructions
*ldu
*unified address space ld loads 
*surface functions (load and store)-> 3d image writes
There are loads with and without format; loads with format are not implemented, and stores with format are not implemented either, except for a b32 format
EXPOSES:
*recursion via..
*function calls with a stack (so recursion is possible) without defining an ABI
*calloc function
*variable args to functions
note this is not implemented in 2.0
lacking still are:
*jump to register/pointer or call to register/pointer (virtual functions?)
*host system calls malloc,printf..etc..
also cuda book shows:
*fermi predication based on
"A Comparison of Full and Partial Predicated Execution Support
for ILP Processors"
*fermi supports terminating kernels when you want (driver stability improvements?)
also for load balancing..
*CUDA Fermi implementation priorities.. a virtual unified space can take years..
*virtual address space is good together with the GMAC approach for a single unified address for CPU and GPU mem, now that the GPU address space is unified

