*TCC support for GF100 products will be out next week also this drivers will add support for simultaneously running this drivers with normal graphics drivers (that support OGL,DX,DXVA,etc..) I suspect graphics and TCC driver will have to have same version as both write dll's in windows system..
I hope still inf trick works so I can enable on Geforce Fermi and also that this works with Nsight also.. anyway is not severe as 25x drivers seems to add support for CUDA cards (Geforces even) without extending desktop on it so kernels exec time needn't be time limited for TDR.. before it required to use two Nvidia cards and one can be not desktop extended but if you used say a ATI card and a Nvidia card without desktop extended on Nvidia so to use Nsight for example (which requires no desktop extended) it will fail since CUDA will not find a CUDA card..
*There is support for Fermi on MacOs right now on Nvidia 19.5.8f03 drivers released month before but wuthout reposting so have NVDAGF100HAL.kext..
Anyway it only works OGL support as both CUDA and OCL don't use it..
I have to use NVloader injector which anyway doesn't work with Fermi on 64 bit kernel mode.. note gf 275 works in 64 bit with this injector also..
note i wanted to fix and all I found was a cuGetExportTable and something like MacCompatibiltyTID used by a checkcompatibility executable perhaps fixing it will work..
One in Nvidia forums assumed OCL broken fixed creating a OGL context beforce searching for OCL devices (oclgetdevice) but this trick didn't work..
*Storing ELF binaries instead of CUBIN deletes use of decuda hopefully one very interesting solution is..
*Seeing MAGMA webinar seems big release for SC2010 with some big features check magma presentation for what to expect..
*Physx 3.0 nearing to launch as Physx Visual Debugger includes support for it in release note says..
I hope still inf trick works so I can enable on Geforce Fermi and also that this works with Nsight also.. anyway is not severe as 25x drivers seems to add support for CUDA cards (Geforces even) without extending desktop on it so kernels exec time needn't be time limited for TDR.. before it required to use two Nvidia cards and one can be not desktop extended but if you used say a ATI card and a Nvidia card without desktop extended on Nvidia so to use Nsight for example (which requires no desktop extended) it will fail since CUDA will not find a CUDA card..
*There is support for Fermi on MacOs right now on Nvidia 19.5.8f03 drivers released month before but wuthout reposting so have NVDAGF100HAL.kext..
Anyway it only works OGL support as both CUDA and OCL don't use it..
I have to use NVloader injector which anyway doesn't work with Fermi on 64 bit kernel mode.. note gf 275 works in 64 bit with this injector also..
note i wanted to fix and all I found was a cuGetExportTable and something like MacCompatibiltyTID used by a checkcompatibility executable perhaps fixing it will work..
One in Nvidia forums assumed OCL broken fixed creating a OGL context beforce searching for OCL devices (oclgetdevice) but this trick didn't work..
*Storing ELF binaries instead of CUBIN deletes use of decuda hopefully one very interesting solution is..
*Seeing MAGMA webinar seems big release for SC2010 with some big features check magma presentation for what to expect..
*Physx 3.0 nearing to launch as Physx Visual Debugger includes support for it in release note says..
Note this brings concurrent kernels support for Fermi for improved perf on physics simulations.. hopefully also includes wrinkle meshes feature studied by Mueller.
Note also GPU AI notes once Function pointers supported on CUDA will use it so expect a new release sometime optimized even more for Fermi too..
Probably anuonced at Siggraph.. even launching later..
Hope too see also APEX shipping for other than Big AAA games i.e. downloadable for everyone..
Lastly I expect Optix 2.0 and Cg 3.0 final for Siggraph and let's see also in time OpenRL with OpenCL support for GPUs would be interesting for ATI.. Note also Luxrender GPU 1.6 brings Stocasthic Photon Mapping and uses OCL on ATI GPUs also..
*Nsight also is moving fast from beta in early June now is RC state.. launching at siggraph?
*ATI Doubles on DirectCompute are broken.. altough feature flag is supported..
now we can test it with June DX compiler before it was broken for doubles inside control flow (loops, if,etc..)
Mainly compiling works but rendering shows issues vs Fermi which supports nicely..
Download my code.. (coming soon..)
*ATI GLSL driver is somewhat broken at least seems to geometry shaders as I fixed Nvidia Physx fluid demo to use non Cg code on GLSL code and some other fix related to point rendering and now seems to work but not without instabilities present as noise in screen even outside the window it fills..
Download ant test.. (coming soon..)
Also GLSL driver don't implement fetching integer textures with integer coordinates (texel2Dfetch( itex))
*CUDA 3.1 ships with three interesting examples: one is oclTridiagonal a fast tridiagonal solver.. interesting for a DoF cinematic renderer as in Metro using OCL/OGL..
other one is oclCopyComputeOverlap shows two things one is that concurrent kernel and exec is possible in OCL.. via command queues also shows there is an issue in 25x drivers that prevent full scaling I think good is 30% faster code and I obtain 20% on 25x drivers.. on 197 drivers I obtain 30%..
note that on both ATI and Apple platforms even with Nvidia GPUs exhibit no scaling and even negative scaling (-15%)
Good is that is fixed issue in 258.19 OCL 1.1 preview drivers with report CUDA 3.2 so I obtain back 30% overlap.. Note that other 258 drivers don't work (as they report older CUDA code 3.1 and OCL 1.0)..
One more interesting thing is that supposedly even dual dma engine is suposed to work on ocl so overlap would be 50%.. seems restricted to Tesla but Nvidia has been less detailed than double capping on Geforce..
Luckily I have a trick for you 197.44 driver seem to support Dual DMA engine on Geforce Fermi too!
This is OGL 4.0 driver so all you lost to current 256 drivers is CUDA 3.1 features only.. Linux also use OGL 4.0 driver on developer.nvidia.com and you have it...
Note also 197.75 etc don't work only work with this..
*So seems DUAL DMA engine is broken/disabled on Geforce Fermi without any reason other than economical..
*CUDA simpleStream seems to show broken streams on Fermi but it's due to not sending enough work.. a simple fix..
*Matmul by Lschien is one of the fastest ones for CUDA but it fails currently on fermi due to using cubins with obtained modifing tesla asm via decuda cudaasm.. thanks god seems related to volatile keyword don't working correctly pre cuda 3.0.. author suggest a fix assuming this works that uses cuda variant 6.. I have tested and it works so it's fixed I obtain near 850Gflops on Fermi 470 at 1650Mhz..
*Lot of soft updated to CUDA 3.x even 3.1 right now: NPP 3.1,CULA 2.0, JACKET 1.4,OpenMM 2.0 on Zephyr SVN, Gromcas 4.5 beta,GMAC, etc..
More news:
Mainly compiling works but rendering shows issues vs Fermi which supports nicely..
Download my code.. (coming soon..)
*ATI GLSL driver is somewhat broken at least seems to geometry shaders as I fixed Nvidia Physx fluid demo to use non Cg code on GLSL code and some other fix related to point rendering and now seems to work but not without instabilities present as noise in screen even outside the window it fills..
Download ant test.. (coming soon..)
Also GLSL driver don't implement fetching integer textures with integer coordinates (texel2Dfetch( itex))
*CUDA 3.1 ships with three interesting examples: one is oclTridiagonal a fast tridiagonal solver.. interesting for a DoF cinematic renderer as in Metro using OCL/OGL..
other one is oclCopyComputeOverlap shows two things one is that concurrent kernel and exec is possible in OCL.. via command queues also shows there is an issue in 25x drivers that prevent full scaling I think good is 30% faster code and I obtain 20% on 25x drivers.. on 197 drivers I obtain 30%..
note that on both ATI and Apple platforms even with Nvidia GPUs exhibit no scaling and even negative scaling (-15%)
Good is that is fixed issue in 258.19 OCL 1.1 preview drivers with report CUDA 3.2 so I obtain back 30% overlap.. Note that other 258 drivers don't work (as they report older CUDA code 3.1 and OCL 1.0)..
One more interesting thing is that supposedly even dual dma engine is suposed to work on ocl so overlap would be 50%.. seems restricted to Tesla but Nvidia has been less detailed than double capping on Geforce..
Luckily I have a trick for you 197.44 driver seem to support Dual DMA engine on Geforce Fermi too!
This is OGL 4.0 driver so all you lost to current 256 drivers is CUDA 3.1 features only.. Linux also use OGL 4.0 driver on developer.nvidia.com and you have it...
Note also 197.75 etc don't work only work with this..
*So seems DUAL DMA engine is broken/disabled on Geforce Fermi without any reason other than economical..
*CUDA simpleStream seems to show broken streams on Fermi but it's due to not sending enough work.. a simple fix..
*Matmul by Lschien is one of the fastest ones for CUDA but it fails currently on fermi due to using cubins with obtained modifing tesla asm via decuda cudaasm.. thanks god seems related to volatile keyword don't working correctly pre cuda 3.0.. author suggest a fix assuming this works that uses cuda variant 6.. I have tested and it works so it's fixed I obtain near 850Gflops on Fermi 470 at 1650Mhz..
*Lot of soft updated to CUDA 3.x even 3.1 right now: NPP 3.1,CULA 2.0, JACKET 1.4,OpenMM 2.0 on Zephyr SVN, Gromcas 4.5 beta,GMAC, etc..
More news:
Also Nvidia has released a lot of drivers on 256 brach lets see rough differences/progression: 197.44 first OGL 4.0 driver and also unique supporting Dual DMA engine on Fermi on on Tesla/Quadro boards.. also has no issues in single dma.. 256 add cuda 3.1 currently all has issues in concurrent kernel and exec on Fermi at least on OCL 257.15 bluray3d 257.19 nsight june beta drive 257.21 whql (supports nsight) 257.29 ion support accelerated dxva flash with pciex 1x devices 258.18 ocl 1.1 beta (says cuda 3.2!) fixes oclCopyCompute issues (but single DMA on Fermi) 258.48 first supporting Quadro Fermis..
258.69 shipping with 3d vision surround (Nvidia ntersect says youtube 3d support coming soon.. also I hope they add windows DX 3d vision support soon..)
Some other striking news :-) are: *OpenCurrent 1.1 ships with CUDA 3.0 and multigpu code.. well I have been testing with CUDA 3.1 because I have Ubuntu 9.10 and with CUDA 3.1 GCC 4.4 works ok (so Ubuntu 10.4 is right also..) and has some issue related to now supporting true functions I think I must add some static to a function as cuda 3.1 release notes porting guide says.. with CUDA 3.0 GCC 4.4 doesn't work so I have to check with a Ubuntu 9.04 if I don't fix.. *OpenMP to CUDA compiler is avaiable in Cetus 1.2. *PGI 10.6 is avaiable integer support in kernels and VS 2010 support at least. I have tested GATLAS and is good at least 260 gflops on a gtx 275.. and I tested on MAC so at least works in Lin and Mac without much work and says author with 5870 and stream 2.1 achieves some image kernels 1,3 tflops so similar to cal++ matmul in OpenCL! have to test or modify code(?) for double testing.. Some tricks and work to do:
RAW DATA: I know its lame but at least you can emulate 3d image writes on cuda with surfaces using ptx 3d tricks (post later). I have to put a sample of CUVID on MAC. SimpleStreams in cuda seems fermi bad in forums says increase work to 500. matmul chien says put volatile and check (works!) bsgp fermi support checking mail with author.. sparse matrix ati code test on fermi.. See fermi benchmarks: nvidia benchmarks in blog openvidia benchmarks.. cula blog jacket blog same papers of hpg2010 presentations billeter scattering and aov mcguire.. seems also code of rasterization and color stocastic shadow map coming soon..
0 comentarios:
Post a Comment