https://dl.dropbox.com/u/1416327/cld3d.rar
First, the OCL-D3D interop headers and spec for Nvidia and AMD, and a tool for checking the current status:
The headers are in h
and cover D3D 9, 10 and 11 for NV, and D3D 9 and 10 for AMD.
#include
d3d stuff..
If you #define INCAMD, the AMD functions are included as well, so you can avoid the AMD headers.
With these I have compiled four exes named cl_xx_interop, which check D3D 9, 9Ex, 10 and 11.
They check extension reporting, try to create a shared context in several ways, then associate a D3D object and textures with OCL and acquire and release them prior to use.
The cl_d3d10_interop build also shows the image formats available to OpenCL images; see the next post.
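For anyone wanting to reproduce the "extension reporting" part of the check on a saved extension string, here is a minimal sketch. It is not taken from the cl_xx_interop sources: the helper name `has_extension` is hypothetical, and the string is the NVIDIA platform extension list from the CLinfo output further down in this post. The point is to match whole tokens rather than substrings, so that a short name does not match by accident inside a longer one.

```python
# Hypothetical helper, not part of the cl_xx_interop tools.
def has_extension(extension_string: str, name: str) -> bool:
    """Return True if `name` is a whole token in a space-separated extension list."""
    return name in extension_string.split()

# NVIDIA platform extension string, copied from the CLinfo output in this post.
nv_platform_extensions = (
    "cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing "
    "cl_nv_d3d9_sharing cl_nv_d3d10_sharing cl_nv_d3d11_sharing "
    "cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll"
)

for api in ("d3d9", "d3d10", "d3d11"):
    name = f"cl_nv_{api}_sharing"
    status = "Found." if has_extension(nv_platform_extensions, name) else "Not found."
    print(f"{name}: {status}")
```

A plain substring test would also report a made-up name such as `cl_khr_gl` as present, because it is a prefix of `cl_khr_gl_sharing`; splitting on whitespace avoids that.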
Testing OCL-D3D11 interop
Checking D3D interop extensions support for platform: NVIDIA Corporation
nv D3D 9 interop extension: Found.
nv D3D 10 interop extension: Found.
nv D3D 11 interop extension: Found.
Using device: GeForce GTX 275
Enabling texture interop checks: image support is supported.
clGetDeviceIDsFromD3D11NV pointer: Found
and it works! (returns d3d associated ocl device)
clCreateFromD3D11BufferNV pointer: Found
clCreateFromD3D11Texture2DNV pointer: Found
clCreateFromD3D11Texture3DNV pointer: Found
clEnqueueAcquireD3D11ObjectsNV pointer: Found
clEnqueueReleaseD3D11ObjectsNV pointer: Found
Testing context creation with
no dev (clCreateContextFromType): OK.
dev info (getdeviceids): OK.
dev info (clGetDeviceIDsFromD3DNV CL_PREFERRED_DEVICES_FOR_D3D9_NV): OK.
Testing clCreateFromD3D11BufferNV: OK.
Testing aquire release stuff: Ok.. releasing it: Ok.
Testing clCreateFromD3D11Texture2DNV: OK.
Testing aquire release stuff: Ok.. releasing it: Ok.
Testing clCreateFromD3D11Texture3DNV: OK.
Testing aquire release stuff: Ok.. releasing it: Ok.
The archive also contains optd3d, which displays the four optional D3D11 features (cap bits).
On my GTX 200 card it displays:
multithreaded comand lists: 0
multithreaded Concurrent Creates: 1
Double precision: 0
Compute Shader: 1
On an ATI 5850 it displays:
multithreaded comand lists: 0
multithreaded Concurrent Creates: 1
Double precision: 1
Compute Shader: 1
Anyway, double precision is not working with loops.
This shows that multithreaded command lists are still not supported by ATI (is this supposed to be an implementation issue or a hardware limitation?).
The same goes for Nvidia and the upcoming Fermi.
I include a CLinfo (not mine) for checking CL info.
report.bat creates a report.txt with the output of all these executables.
I also include 2dbench for checking the GDI performance issues in Windows 7, which AMD will fix in Catalyst 10.4.
There is also a highly efficient matmul for CUDA and AMD cards, and peakflops for AMD cards.
%
% compute C = A*B, A:mxk, B:kxn, C:mxn
%
% cubin file = ../method1/decuda_ldsb32_cudasm.cubin
% kernel function = method1_variant_sgemmNN
% use device: GeForce GTX 275
% m=n=k gpu_time (ms) flops (Gflops/s)
32 0.044 1.391
128 0.120 32.451
224 0.194 107.870
320 0.302 201.802
416 0.445 301.033
512 0.619 403.979
608 1.277 327.914
704 1.582 410.719
800 2.618 364.210
896 3.135 427.439
992 4.401 413.123
1088 6.014 398.868
1184 6.981 442.860
1280 8.751 446.365
1376 10.911 444.746
1472 13.403 443.262
1568 16.377 438.470
1664 18.901 454.051
1760 22.437 452.594
1856 25.820 461.218
1952 31.233 443.566
2048 33.317 480.229
2144 39.834 460.841
2240 44.989 465.337
2336 51.643 459.765
2432 56.514 474.095
2528 64.183 468.859
2624 72.540 463.923
2720 79.686 470.387
2816 85.826 484.626
2912 96.003 479.094
3008 108.801 465.942
3104 121.579 458.181
3200 126.446 482.699
3296 138.522 481.473
3392 153.544 473.440
3488 168.797 468.268
3584 177.873 482.085
3680 193.298 480.227
3776 212.160 472.675
3872 229.596 470.947
3968 246.403 472.280
4064 260.086 480.699
Same test with clock 1620 (overclocked; the stock shader clock is 1404 MHz):
% m=n=k gpu_time (ms) flops (Gflops/s)
32 0.040 1.516
128 0.108 36.044
224 0.173 120.900
320 0.265 229.925
416 0.393 341.338
512 0.535 467.090
608 1.107 378.021
704 1.371 474.163
800 2.270 420.030
896 2.751 486.983
992 3.804 477.992
1088 5.205 460.925
1184 6.003 514.983
1280 7.609 513.393
1376 9.396 516.463
1472 11.555 514.134
1568 14.145 507.666
1664 16.427 522.442
1760 19.387 523.784
1856 22.182 536.854
1952 26.860 515.777
2048 28.642 558.623
2144 34.530 531.627
2240 39.585 528.868
2336 44.440 534.292
2432 49.141 545.226
2528 55.274 544.429
2624 63.241 532.134
2720 68.451 547.592
2816 74.160 560.865
2912 82.945 554.516
3008 94.150 538.449
3104 104.581 532.653
3200 108.907 560.436
3296 119.277 559.158
3392 131.982 550.785
3488 146.003 541.376
3584 154.088 556.502
3680 166.307 558.166
3776 184.523 543.469
3872 198.692 544.196
3968 214.158 543.390
4064 223.720 558.838
It's a cubin, so it will not work on Fermi.
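A note on how the Gflops column seems to be computed. The benchmark source is not shown here, so this is an inference from the numbers and not the tool's documented formula: the rows are consistent with 2·n³ floating-point operations (one multiply and one add per inner-product term) divided by the GPU time and by 2^30, not by 10^9. The function name `sgemm_gflops` below is my own.

```python
def sgemm_gflops(n: int, gpu_time_ms: float, giga: float = 2.0**30) -> float:
    """Gflops for an n x n x n matrix multiply, assuming 2*n^3 operations.

    `giga` defaults to 2^30 because that is what reproduces the table;
    pass 1e9 for the decimal convention.
    """
    flop_count = 2.0 * n**3
    return flop_count / (gpu_time_ms * 1e-3) / giga

# (m=n=k, gpu_time in ms, reported Gflops) taken from the stock-clock table.
rows = [(512, 0.619, 403.979), (2048, 33.317, 480.229), (4064, 260.086, 480.699)]

for n, ms, reported in rows:
    print(f"n={n}: computed {sgemm_gflops(n, ms):.1f}, reported {reported}")
```

With the decimal convention (`giga=1e9`) the n=2048 row would come out at about 516 instead of 480, so keep the 2^30 divisor in mind when comparing these figures with the AMD numbers.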
5850 at stock clocks:
flopspeak.exe
Device 0
target 8
localRAM 1024 MB
uncachedRemoteRAM 2047 MB
cachedRemoteRAM 2047 MB
engineClock 725 MHz
memoryClock 1000 MHz
wavefrontSize 64
numberOfSIMD 18
doublePrecision 1
localDataShare 1
globalDataShare 1
globalGPR 1
computeShader 1
memExport 1
pitch_alignment 256
surface_alignment 4096
Device 0: execution time 7913.45 ms, achieved 2041.80 gflops
Overclocked to 950 MHz:
flopspeak.exe
engineClock 950 MHz
memoryClock 1000 MHz
Device 0: execution time 6039.35 ms, achieved 2675.40 gflops
matmul.exe 2048 2048 100
Device 0: execution time 1415.08 ms, achieved 1214.06 gflops
Overclocked to 950 MHz:
Device 0: execution time 1114.06 ms, achieved 1542.09 gflops
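To put the 5850 numbers in context, here is a rough comparison against the theoretical peak. Only numberOfSIMD and engineClock come from the flopspeak output above; the 16 x 5 ALUs per SIMD and the 2 flops per ALU per clock (one multiply-add) are assumptions taken from the public Cypress specs, and `peak_gflops` is my own helper name.

```python
# Assumptions from public Cypress specs, not from the flopspeak output:
ALUS_PER_SIMD = 16 * 5        # 16 thread processors, each a 5-wide VLIW unit
FLOPS_PER_ALU_PER_CLOCK = 2   # one single-precision multiply-add per clock

def peak_gflops(num_simd: int, engine_clock_mhz: float) -> float:
    """Theoretical single-precision peak in gflops (decimal giga)."""
    return num_simd * ALUS_PER_SIMD * FLOPS_PER_ALU_PER_CLOCK * engine_clock_mhz / 1000.0

# (label, engineClock in MHz, flopspeak gflops, matmul gflops) from the runs above.
runs = [("stock", 725, 2041.80, 1214.06), ("oc", 950, 2675.40, 1542.09)]

for label, clock, flopspeak, matmul in runs:
    peak = peak_gflops(18, clock)
    print(f"{label}: peak {peak:.0f} gflops, "
          f"flopspeak {100 * flopspeak / peak:.1f}%, matmul {100 * matmul / peak:.1f}%")
```

Under these assumptions flopspeak reaches roughly 98% of peak at both clocks, and its result scales with the engine clock almost exactly (950/725), while matmul scales a bit less, which is what you would expect since the memory clock stayed at 1000 MHz.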
UPDATE 1:
Nvidia and ATI working together!
This is with the opencl.dll from the ATI SDK 2.01:
Found 2 platform(s).
platform[01104BA0]: profile: FULL_PROFILE
platform[01104BA0]: version: OpenCL 1.0 CUDA 3.0.1
platform[01104BA0]: name: NVIDIA CUDA
platform[01104BA0]: vendor: NVIDIA Corporation
platform[01104BA0]: extensions: cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_d3d9_sharing cl_nv_d3d10_sharing cl_nv_d3d11_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll
platform[01104BA0]: Found 1 device(s).
device[01104C08]: NAME: GeForce GTX 275
device[01104C08]: VENDOR: NVIDIA Corporation
device[01104C08]: PROFILE: FULL_PROFILE
device[01104C08]: VERSION: OpenCL 1.0 CUDA
device[01104C08]: EXTENSIONS: cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_d3d9_sharing cl_nv_d3d10_sharing cl_nv_d3d11_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64
device[01104C08]: DRIVER_VERSION: 196.75
device[01104C08]: Type: GPU
device[01104C08]: EXECUTION_CAPABILITIES: Kernel
device[01104C08]: GLOBAL_MEM_CACHE_TYPE: None (0)
device[01104C08]: CL_DEVICE_LOCAL_MEM_TYPE: Local (1)
device[01104C08]: SINGLE_FP_CONFIG: 0x3e
device[01104C08]: QUEUE_PROPERTIES: 0x3
device[01104C08]: VENDOR_ID: 4318
device[01104C08]: MAX_COMPUTE_UNITS: 30
device[01104C08]: MAX_WORK_ITEM_DIMENSIONS: 3
device[01104C08]: MAX_WORK_GROUP_SIZE: 512
device[01104C08]: PREFERRED_VECTOR_WIDTH_CHAR: 1
device[01104C08]: PREFERRED_VECTOR_WIDTH_SHORT: 1
device[01104C08]: PREFERRED_VECTOR_WIDTH_INT: 1
device[01104C08]: PREFERRED_VECTOR_WIDTH_LONG: 1
device[01104C08]: PREFERRED_VECTOR_WIDTH_FLOAT: 1
device[01104C08]: PREFERRED_VECTOR_WIDTH_DOUBLE: 1
device[01104C08]: MAX_CLOCK_FREQUENCY: 1404
device[01104C08]: ADDRESS_BITS: 32
device[01104C08]: MAX_MEM_ALLOC_SIZE: 229998592
device[01104C08]: IMAGE_SUPPORT: 1
device[01104C08]: MAX_READ_IMAGE_ARGS: 128
device[01104C08]: MAX_WRITE_IMAGE_ARGS: 8
device[01104C08]: IMAGE2D_MAX_WIDTH: 8192
device[01104C08]: IMAGE2D_MAX_HEIGHT: 8192
device[01104C08]: IMAGE3D_MAX_WIDTH: 2048
device[01104C08]: IMAGE3D_MAX_HEIGHT: 2048
device[01104C08]: IMAGE3D_MAX_DEPTH: 2048
device[01104C08]: MAX_SAMPLERS: 16
device[01104C08]: MAX_PARAMETER_SIZE: 4352
device[01104C08]: MEM_BASE_ADDR_ALIGN: 256
device[01104C08]: MIN_DATA_TYPE_ALIGN_SIZE: 16
device[01104C08]: GLOBAL_MEM_CACHELINE_SIZE: 0
device[01104C08]: GLOBAL_MEM_CACHE_SIZE: 0
device[01104C08]: GLOBAL_MEM_SIZE: 919994368
device[01104C08]: MAX_CONSTANT_BUFFER_SIZE: 65536
device[01104C08]: MAX_CONSTANT_ARGS: 9
device[01104C08]: LOCAL_MEM_SIZE: 16384
device[01104C08]: ERROR_CORRECTION_SUPPORT: 0
device[01104C08]: PROFILING_TIMER_RESOLUTION: 1000
device[01104C08]: ENDIAN_LITTLE: 1
device[01104C08]: AVAILABLE: 1
device[01104C08]: COMPILER_AVAILABLE: 1
platform[0313A434]: profile: FULL_PROFILE
platform[0313A434]: version: OpenCL 1.0 ATI-Stream-v2.0.1
platform[0313A434]: name: ATI Stream
platform[0313A434]: vendor: Advanced Micro Devices, Inc.
platform[0313A434]: extensions: cl_khr_icd
platform[0313A434]: Found 2 device(s).
device[0338CA70]: NAME: Intel(R) Core(TM) i7 CPU 920 @ 2.67GHz
device[0338CA70]: VENDOR: GenuineIntel
device[0338CA70]: PROFILE: FULL_PROFILE
device[0338CA70]: VERSION: OpenCL 1.0 ATI-Stream-v2.0.1
device[0338CA70]: EXTENSIONS: cl_khr_icd cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store
device[0338CA70]: DRIVER_VERSION: 1.0
device[0338CA70]: Type: CPU
device[0338CA70]: EXECUTION_CAPABILITIES: Kernel
device[0338CA70]: GLOBAL_MEM_CACHE_TYPE: Read-Write (2)
device[0338CA70]: CL_DEVICE_LOCAL_MEM_TYPE: Global (2)
device[0338CA70]: SINGLE_FP_CONFIG: 0x7
device[0338CA70]: QUEUE_PROPERTIES: 0x2
device[0338CA70]: VENDOR_ID: 4098
device[0338CA70]: MAX_COMPUTE_UNITS: 8
device[0338CA70]: MAX_WORK_ITEM_DIMENSIONS: 3
device[0338CA70]: MAX_WORK_GROUP_SIZE: 1024
device[0338CA70]: PREFERRED_VECTOR_WIDTH_CHAR: 16
device[0338CA70]: PREFERRED_VECTOR_WIDTH_SHORT: 8
device[0338CA70]: PREFERRED_VECTOR_WIDTH_INT: 4
device[0338CA70]: PREFERRED_VECTOR_WIDTH_LONG: 2
device[0338CA70]: PREFERRED_VECTOR_WIDTH_FLOAT: 4
device[0338CA70]: PREFERRED_VECTOR_WIDTH_DOUBLE: 0
device[0338CA70]: MAX_CLOCK_FREQUENCY: 2698
device[0338CA70]: ADDRESS_BITS: 32
device[0338CA70]: MAX_MEM_ALLOC_SIZE: 536870912
device[0338CA70]: IMAGE_SUPPORT: 0
device[0338CA70]: MAX_READ_IMAGE_ARGS: 0
device[0338CA70]: MAX_WRITE_IMAGE_ARGS: 0
device[0338CA70]: IMAGE2D_MAX_WIDTH: 0
device[0338CA70]: IMAGE2D_MAX_HEIGHT: 0
device[0338CA70]: IMAGE3D_MAX_WIDTH: 0
device[0338CA70]: IMAGE3D_MAX_HEIGHT: 0
device[0338CA70]: IMAGE3D_MAX_DEPTH: 0
device[0338CA70]: MAX_SAMPLERS: 0
device[0338CA70]: MAX_PARAMETER_SIZE: 4096
device[0338CA70]: MEM_BASE_ADDR_ALIGN: 32768
device[0338CA70]: MIN_DATA_TYPE_ALIGN_SIZE: 128
device[0338CA70]: GLOBAL_MEM_CACHELINE_SIZE: 64
device[0338CA70]: GLOBAL_MEM_CACHE_SIZE: 65536
device[0338CA70]: GLOBAL_MEM_SIZE: 1073741824
device[0338CA70]: MAX_CONSTANT_BUFFER_SIZE: 65536
device[0338CA70]: MAX_CONSTANT_ARGS: 8
device[0338CA70]: LOCAL_MEM_SIZE: 32768
device[0338CA70]: ERROR_CORRECTION_SUPPORT: 0
device[0338CA70]: PROFILING_TIMER_RESOLUTION: 1
device[0338CA70]: ENDIAN_LITTLE: 1
device[0338CA70]: AVAILABLE: 1
device[0338CA70]: COMPILER_AVAILABLE: 1
device[04A30050]: NAME: Cypress
device[04A30050]: VENDOR: Advanced Micro Devices, Inc.
device[04A30050]: PROFILE: FULL_PROFILE
device[04A30050]: VERSION: OpenCL 1.0 ATI-Stream-v2.0.1
device[04A30050]: EXTENSIONS: cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics
device[04A30050]: DRIVER_VERSION: CAL 1.4.556
device[04A30050]: Type: GPU
device[04A30050]: EXECUTION_CAPABILITIES: Kernel
device[04A30050]: GLOBAL_MEM_CACHE_TYPE: None (0)
device[04A30050]: CL_DEVICE_LOCAL_MEM_TYPE: Local (1)
device[04A30050]: SINGLE_FP_CONFIG: 0x6
device[04A30050]: QUEUE_PROPERTIES: 0x2
device[04A30050]: VENDOR_ID: 4098
device[04A30050]: MAX_COMPUTE_UNITS: 18
device[04A30050]: MAX_WORK_ITEM_DIMENSIONS: 3
device[04A30050]: MAX_WORK_GROUP_SIZE: 256
device[04A30050]: PREFERRED_VECTOR_WIDTH_CHAR: 16
device[04A30050]: PREFERRED_VECTOR_WIDTH_SHORT: 8
device[04A30050]: PREFERRED_VECTOR_WIDTH_INT: 4
device[04A30050]: PREFERRED_VECTOR_WIDTH_LONG: 2
device[04A30050]: PREFERRED_VECTOR_WIDTH_FLOAT: 4
device[04A30050]: PREFERRED_VECTOR_WIDTH_DOUBLE: 0
device[04A30050]: MAX_CLOCK_FREQUENCY: 725
device[04A30050]: ADDRESS_BITS: 32
device[04A30050]: MAX_MEM_ALLOC_SIZE: 268435456
device[04A30050]: IMAGE_SUPPORT: 0
device[04A30050]: MAX_READ_IMAGE_ARGS: 0
device[04A30050]: MAX_WRITE_IMAGE_ARGS: 0
device[04A30050]: IMAGE2D_MAX_WIDTH: 0
device[04A30050]: IMAGE2D_MAX_HEIGHT: 0
device[04A30050]: IMAGE3D_MAX_WIDTH: 0
device[04A30050]: IMAGE3D_MAX_HEIGHT: 0
device[04A30050]: IMAGE3D_MAX_DEPTH: 0
device[04A30050]: MAX_SAMPLERS: 0
device[04A30050]: MAX_PARAMETER_SIZE: 1024
device[04A30050]: MEM_BASE_ADDR_ALIGN: 4096
device[04A30050]: MIN_DATA_TYPE_ALIGN_SIZE: 128
device[04A30050]: GLOBAL_MEM_CACHELINE_SIZE: 0
device[04A30050]: GLOBAL_MEM_CACHE_SIZE: 0
device[04A30050]: GLOBAL_MEM_SIZE: 268435456
device[04A30050]: MAX_CONSTANT_BUFFER_SIZE: 65536
device[04A30050]: MAX_CONSTANT_ARGS: 8
device[04A30050]: LOCAL_MEM_SIZE: 32768
device[04A30050]: ERROR_CORRECTION_SUPPORT: 0
device[04A30050]: PROFILING_TIMER_RESOLUTION: 1
device[04A30050]: ENDIAN_LITTLE: 1
device[04A30050]: AVAILABLE: 1
device[04A30050]: COMPILER_AVAILABLE: 1
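The SINGLE_FP_CONFIG and QUEUE_PROPERTIES lines in the dump are raw bitfields, so here is a small sketch for decoding them. The bit values are the ones defined in the OpenCL 1.0 cl.h header; the `decode` helper and the dictionary names are mine and are not part of CLinfo.

```python
# Bit values as defined in the OpenCL 1.0 cl.h header.
FP_CONFIG_BITS = {
    0x01: "CL_FP_DENORM",
    0x02: "CL_FP_INF_NAN",
    0x04: "CL_FP_ROUND_TO_NEAREST",
    0x08: "CL_FP_ROUND_TO_ZERO",
    0x10: "CL_FP_ROUND_TO_INF",
    0x20: "CL_FP_FMA",
}
QUEUE_PROPERTY_BITS = {
    0x1: "CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE",
    0x2: "CL_QUEUE_PROFILING_ENABLE",
}

def decode(value: int, names: dict) -> list:
    """Return the names of all bits set in `value`, lowest bit first."""
    return [name for bit, name in sorted(names.items()) if value & bit]

# (device, SINGLE_FP_CONFIG, QUEUE_PROPERTIES) as printed in the dump above.
devices = [("GeForce GTX 275", 0x3E, 0x3), ("Core i7 920", 0x7, 0x2), ("Cypress", 0x6, 0x2)]

for name, fp_config, queue_props in devices:
    print(name)
    print("  fp:   ", ", ".join(decode(fp_config, FP_CONFIG_BITS)))
    print("  queue:", ", ".join(decode(queue_props, QUEUE_PROPERTY_BITS)))
```

Read this way, only the GTX 275 reports out-of-order queues and FMA, only the CPU device reports denormal support, and Cypress reports just INF/NaN and round-to-nearest.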
UPDATE 2:
DX formats are now included in optd3d.
Thanks for sharing, it is very important.  ̄︿ ̄