AMD is progressing well: with 2.1 we now have an OpenCL stack with a lot of features and optional extensions published, including some AMD proprietary ones.
Regarding supported extensions:
*Image support: only on 5xxx GPUs. I would also have expected CPU support, as in Apple's CPU implementation. On 4xxx I don't expect it, although it should be possible, since CAL supports images/textures on 4xxx.
Right now only RGBA formats are supported: just the 10/11 mandatory formats (Nvidia has 7x). All of them are 4-channel RGBA, so some Nvidia examples won't work.
In 2.01 you could already get it on 5xxx by exporting/setting GPU_IMAGES_SUPPORT.
There is no image support on the CPU either.
2.1 really has 3D texture support (it didn't work with the 2.01 hack).
You can test the Nvidia OpenCL samples oclVolumeRender and oclsimpletexture3d if you change them to load the volume into a 4-channel texture.
Basically, in initCLvolume (or the equivalent code in oclsimpletexture3d), expand h_volume to 4 channels:
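// 4-channel, 8-bit normalized format (RGBA is the only channel order supported)
volume_format.image_channel_order = CL_RGBA;
volume_format.image_channel_data_type = CL_UNORM_INT8;
// Expand the 1-channel volume to RGBA; calloc zeroes the unused G, B, A channels
size_t nVoxels = volumeSize[0] * volumeSize[1] * volumeSize[2];
uchar *h_volume2 = (uchar *)calloc(nVoxels * 4, sizeof(uchar));
for (size_t i = 0; i < nVoxels; i++)
    h_volume2[4 * i] = h_volume[i];   // density goes into the R channel
d_volumeArray = clCreateImage3D(cxGPUContext, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR, &volume_format,
                                volumeSize[0], volumeSize[1], volumeSize[2],
                                volumeSize[0] * 4,                  // row pitch in bytes
                                volumeSize[0] * volumeSize[1] * 4,  // slice pitch in bytes
                                h_volume2, &ciErrNum);
free(h_volume2);  // CL_MEM_COPY_HOST_PTR already copied the data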
Also, a bug mentioned in the developer notes: linear filtering doesn't work if the sampler is set as a constant via
constant sampler_t volumeSampler = CLK_NORMALIZED_COORDS_TRUE | CLK_ADDRESS_CLAMP | CLK_FILTER_LINEAR;
(Also note that CUDA 3.0 final has a bug with linear filtering in the 3D texture samples. Besides, some Nvidia and AMD OpenCL samples don't work on the other vendor's OpenCL, because some need constant or __constant samplers and others don't work with them; I don't remember which.)
So I had to comment out this sampler in the volume render sample's CL kernel (simpletex3d does the right thing) and pass it in by adding a parameter:
__kernel void
d_render(__global uint *d_output,
uint imageW, uint imageH,
float density, float brightness,
float transferOffset, float transferScale,
__constant float* invViewMatrix
#ifdef IMAGE_SUPPORT
,__read_only image3d_t volume,
__read_only image2d_t transferFunc,
sampler_t volumeSampler
#endif
)
Then you can copy the filtering toggle from simpletex:
case 'f':
linearFiltering = !linearFiltering;
ciErrNum = clSetKernelArg(ckKernel, 10, sizeof(cl_sampler), linearFiltering ? &volumeSamplerLinear : &volumeSamplerNearest);
shrLog("\nLinear Filtering Toggled %s...\n", linearFiltering ? "ON" : "OFF");
oclCheckErrorEX(ciErrNum, CL_SUCCESS, pCleanup);
break;
into the GL keyboard handler.
I also checked using images and OpenGL interop simultaneously, and it worked:
http://dl.dropbox.com/u/1416327/clinterop2.c
Define USEGL or not to test image support alone, or images together with OpenGL interop (creating the CL image from a GL texture).
Note that with GL interop the image comes out upside down and somewhat greener, but Nvidia's OpenCL gives similar output, so I have to revise my code; for now it works.
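My guess (an assumption, not verified) is that the upside-down output comes from OpenGL's bottom-left image origin versus the top-left row order of the host data. If so, flipping the rows before upload or after readback fixes it. A minimal C sketch; flip_rows is a hypothetical helper, not part of the sample code:

```c
#include <assert.h>
#include <stddef.h>

/* Flip an image vertically in place by swapping row `top` with row `bottom`.
 * rowBytes = width * bytes per pixel (e.g. width * 4 for RGBA8). */
static void flip_rows(unsigned char *pixels, size_t rowBytes, size_t height)
{
    if (height < 2)
        return;                      /* nothing to swap */
    assert(pixels != NULL);
    for (size_t top = 0, bottom = height - 1; top < bottom; top++, bottom--) {
        unsigned char *a = pixels + top * rowBytes;
        unsigned char *b = pixels + bottom * rowBytes;
        for (size_t i = 0; i < rowBytes; i++) {
            unsigned char t = a[i];  /* byte-wise swap, no scratch row needed */
            a[i] = b[i];
            b[i] = t;
        }
    }
}
```

Swapping byte by byte avoids allocating a scratch row; for large images a memcpy through a temporary row would be faster.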
Lastly, what's missing is 3D image write support, but you can test it with a sample: AMD's new SimpleImage sample has a more or less disabled 3D texture write test, so by changing a few lines you can try it.
The OpenCL C compiler even seems to know the 3D write_image signature already: if you pass a 3D image object with int2 coordinates, it says it needs an int4 argument. After changing the code, the error becomes "I can't find builtin function #xyz", so most of the pieces seem to be in place. Including the #pragma to enable 3D image writes (cl_khr_3d_image_writes) fails with an unknown-extension error, but perhaps the next version will have this support, with an implementation more advanced than Nvidia's?
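Until cl_khr_3d_image_writes is available, one workaround is to have the kernel write into a linear buffer laid out like the 3D image and then copy it into the image with clEnqueueCopyBufferToImage. The sketch below shows only the addressing, in plain host C: a write_imagef-style store at an int4 coordinate whose w component is not used. The types and helper names are hypothetical stand-ins, not OpenCL API:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical host-side stand-ins for OpenCL's int4 and float4. */
typedef struct { int x, y, z, w; } int4_h;
typedef struct { float x, y, z, w; } float4_h;

/* A 3D "image" stored as a tightly packed buffer of float4 texels:
 * row pitch = width texels, slice pitch = width * height texels. */
typedef struct {
    float4_h *texels;
    size_t width, height, depth;
} volume_buffer;

/* Linear index of texel (x, y, z); c.w is ignored, as for 3D image coords. */
static size_t texel_index(const volume_buffer *v, int4_h c)
{
    assert(c.x >= 0 && (size_t)c.x < v->width);
    assert(c.y >= 0 && (size_t)c.y < v->height);
    assert(c.z >= 0 && (size_t)c.z < v->depth);
    return ((size_t)c.z * v->height + (size_t)c.y) * v->width + (size_t)c.x;
}

/* Buffer-based replacement for write_imagef(image3d_t, int4, float4). */
static void write_texel(volume_buffer *v, int4_h c, float4_h value)
{
    v->texels[texel_index(v, c)] = value;
}
```

In a real kernel the same index expression would use get_global_id(0..2), and the buffer's packed layout must match the image's row and slice pitch for the copy to line up.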
GL Interop:
AMD's SDK has a VBO example and it works; oclPostprocessGL, as a PBO example, also works.
Even changing these two demos to create the VBO and PBO GL objects before the CL context works, which according to the stated limitation shouldn't.
So the only real limitation seems to be creating the GL context before the CL context, which is per spec anyway, since clCreateContext needs the GL context.
Also, as said before, image support with GL interop works.
Byte-addressable stores work, but at the IL level they seem to be implemented with AND and OR masks, so the hardware has no native byte addressing. The IL also shows UAVs, a DX concept that needs 32-bit-aligned accesses, which again suggests it isn't native: even a DX byte-address UAV buffer allows byte addressing only at 32-bit alignment. A general UAV is addressed like an int array, so element a[1] is byte offset a[4] through a byte pointer.
I have to see how AMD avoids race conditions, if this isn't native, when multiple threads write different bytes of the same word: if it's done as a read-modify-write, it must use atomics, with the overhead that implies.
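To make the race concrete: when only 32-bit stores exist, a byte store becomes a masked read-modify-write of the containing word, and two threads updating different bytes of the same word can lose one update. The C11 sketch below shows the plain masked RMW and a compare-and-swap loop that makes it safe at the cost of retries. It illustrates the technique only, not what AMD's compiler actually emits; little-endian byte numbering inside the word and the helper names are assumptions:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Unsafe: plain read-modify-write with AND/OR masks. If another thread
 * writes a different byte of the same word in between, one write is lost. */
static void store_byte_racy(uint32_t *words, size_t byteIndex, uint8_t value)
{
    assert(words != NULL);
    uint32_t *w = &words[byteIndex / 4];
    unsigned shift = (unsigned)(byteIndex % 4) * 8;  /* byte 0 = low byte */
    uint32_t mask = (uint32_t)0xFFu << shift;
    *w = (*w & ~mask) | ((uint32_t)value << shift);
}

/* Safe: retry the same masked update with compare-and-swap until no other
 * thread changed the word in between. Contended words cost extra atomics. */
static void store_byte_atomic(_Atomic uint32_t *words, size_t byteIndex, uint8_t value)
{
    assert(words != NULL);
    _Atomic uint32_t *w = &words[byteIndex / 4];
    unsigned shift = (unsigned)(byteIndex % 4) * 8;
    uint32_t mask = (uint32_t)0xFFu << shift;
    uint32_t expected = atomic_load(w);
    uint32_t desired;
    do {
        desired = (expected & ~mask) | ((uint32_t)value << shift);
    } while (!atomic_compare_exchange_weak(w, &expected, desired));
}
```

On failure, atomic_compare_exchange_weak reloads `expected` with the current word, so each retry rebuilds `desired` from fresh data.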
What troubles me most is that the Apple demos using GL interop fail, but those demos also use image support
and some copy image-to-buffer or buffer-to-image calls, so I have to find out whether it's a GL interop, image support or copy problem. Then I will release it.
Regarding samples, there is a new boxgl sample that isn't mentioned anywhere.
AMD is publishing a lot of extensions (some very simple):
*amd_printf: I checked, and it now works with Visual Studio too (in 2.01 only on Linux?).
If you don't enable it explicitly (#pragma OPENCL EXTENSION cl_amd_printf : enable), the compiler fails; previously there was no way to disable it.
*amd_fp64: on the GPU, no changes since 2.01, so only + - * /; on the CPU, a wealth of features but no conformance or strictness. There's no double mad, I think, so I don't know how GPCbenchmark gets such high double-precision GFLOPS without mad. I also don't know whether mads are now generated for integers: it seems so, but when I last checked 2.00 in January it didn't use them. And what about mad24?
*amd_media_ops: could make Pyrit's CAL++ implementation unnecessary; on trunk/SVN it has a 2x-3x improvement over OpenCL due to bitalign, which you can now use from OpenCL too. It would be interesting to see whether CPyrit gets support for it, now that the trunk code also uses the OpenCL rotate built-in, possibly for better performance on ISAs that have a native rotate. It also has the SAD support that AMD announced it would add to OpenCL at the 5xxx launch.
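Why bitalign helps hash kernels like Pyrit's SHA-1: as I read the cl_amd_media_ops spec, amd_bitalign(a, b, c) concatenates a (high) and b (low) into 64 bits, shifts right by c & 31 and returns the low 32 bits. Passing the same value in both halves turns it into a single-instruction rotate. The plain C emulation below only demonstrates that identity on the host; the function names are hypothetical, and on the GPU you would call amd_bitalign directly:

```c
#include <assert.h>
#include <stdint.h>

/* Host emulation of amd_bitalign's scalar behaviour (per my reading of the
 * spec): ((hi:lo as 64 bits) >> (shift & 31)), truncated to 32 bits. */
static uint32_t bitalign_emul(uint32_t hi, uint32_t lo, uint32_t shift)
{
    return (uint32_t)((((uint64_t)hi << 32) | lo) >> (shift & 31));
}

/* Rotate left by n via bitalign: x in both halves, shift right by 32 - n.
 * For n == 0 the shift is 32 & 31 == 0, which returns x unchanged. */
static uint32_t rotl_via_bitalign(uint32_t x, uint32_t n)
{
    return bitalign_emul(x, x, 32u - (n & 31));
}

/* Reference rotate with two shifts and an OR, as portable code would do it
 * (OpenCL's built-in rotate() also rotates left). */
static uint32_t rotl_shifts(uint32_t x, uint32_t n)
{
    n &= 31;
    return n ? (x << n) | (x >> (32 - n)) : x;
}
```

The shift-based version needs three ALU operations where bitalign needs one, which is consistent with the speedup reported for the CAL++ code.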
In the binaries there are hints of:
*amd_vector3: I assume it defines float3 or similar; I think Nvidia doesn't have it, even unofficially, so it's good to have.
Some Apple demos #define float3 themselves, so it would be good to be able to disable it, as with printf, so that this code works on AMD without any modification.
*amd_atomic_counters: similar to the unpublished GLSL atomic counters?
Also, ext_device_fission currently lacks extension documentation and is CPU-only, but it seems a way to expose concurrent kernels on Fermi GPUs, so I hope Nvidia supports it. In any case, it's a shame that using two or more command queues doesn't extract more performance on Nvidia, since Nvidia supports this in CUDA via streams, a similar concept. I have to post the code I wrote to check it.
I have also found a trick to get fully working GLSL sprites in the Nvidia OpenCL samples and the Particles demos: simply change gl_TexCoord[0] to gl_PointCoord in the fragment shaders
(thanks to pboudier on the AMD forums). Before, you had to show the particles with point rendering, using the 'p' key or the menu option.
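For reference, a sphere-impostor fragment shader with that change applied, kept as a C string as the samples keep their shaders. The shader body is illustrative, not the exact Nvidia source; it assumes point sprites are enabled and GLSL 1.20 or later, where gl_PointCoord gives the 0..1 position inside the sprite:

```c
#include <assert.h>
#include <string.h>

/* Before the fix, the N.xy line read gl_TexCoord[0].xy instead of gl_PointCoord. */
static const char *sphereFragmentShader =
    "#version 120\n"
    "void main()\n"
    "{\n"
    "    const vec3 lightDir = vec3(0.577, 0.577, 0.577);\n"
    "    vec3 N;\n"
    "    N.xy = gl_PointCoord * vec2(2.0, -2.0) + vec2(-1.0, 1.0);\n"
    "    float mag = dot(N.xy, N.xy);\n"
    "    if (mag > 1.0) discard;   // outside the sphere's disc\n"
    "    N.z = sqrt(1.0 - mag);\n"
    "    float diffuse = max(0.0, dot(lightDir, N));\n"
    "    gl_FragColor = gl_Color * diffuse;\n"
    "}\n";
```

The string is passed to glShaderSource/glCompileShader as usual; only the source of the sprite coordinate changes.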
Regarding running each vendor's samples on the other's OpenCL, many bugs are fixed; the remaining ones are due to architectural differences:
mainly warp-size assumptions, shared memory size, work-group size and other out-of-resources limits (register stack?).
Particles and Sort still need a fix on AMD, which I posted some time ago.
My biggest complaints/suggestions and bugs/limitations are:
*Is byte addressing native in hardware? If not: race conditions when multiple threads write different bytes of the same word, or performance issues from using atomics?
*Support for more image formats (at least R and RG with half float, float and int8/16)
*3d image writes
*D3D9 and D3D10 interop: disabled in 2.1 (did it work in 2.01?), supposedly coming in the next version in Q3. Meanwhile, a new KHR D3D10 extension has been published by Khronos; it is similar to Nvidia's but adds a shared-handle parameter and a device-info flag saying whether interop with a shared handle gets improved performance.
KHR D3D9 and D3D11 extensions, supported by both Nvidia and AMD, would be good too.
For example, DXVA->OpenCL via these extensions should enable multi-vendor Badaboom-style transcoders, at least in the decoding part and perhaps fully, using MFT GPU encoders.
*Doubles still lacking on GPU (only + - * /) and not conformant on GPU
*No device fission on GPU, although AMD has said its stream processors support it in hardware, at least at the level of the 80-shader blocks, so 20 concurrent kernels on 5xxx are theoretically possible. But I think it is mostly a CAL API or AMD IL limitation, so it may take a while to fix.
The biggest remaining Nvidia limitations now are:
*3d image writes
*Concurrent kernels on Fermi: not exposed, neither via device fission on GPU nor via multiple command queues.
Also, is dual DMA usable?