Monday, March 1, 2010

New findings and questions

Regarding DX IL:
As far as I can tell, I can only generate it with fxc, right? It also seems I can't feed DX IL back to DX via fxc, D3DCompile, or CreateComputeShader. If that's a no, then what is it for, apart from giving IHVs a base to write their drivers against? So there is no way to modify the IL and compile from that? ATI SKA also gets it but doesn't generate from it.
Also, is the DX IL spec public or available anywhere?

Regarding OGL-DX interop through OCL:
With the new DX extensions for OCL (Nvidia has only published them, AMD is shipping them), is it possible to
use them for OGL-DX interop, by creating the context with cl_context_properties that carry both the OGL context and the D3D context entries?
Will it work someday, for one vendor at least? The OGL extension says it may be possible.
Also, what about wgl_dx_interop: is it going to be supported on Vista/7 and with D3D 9, 10, and 11?
Is it going to be introduced (at least the spec text) with the Fermi GL extensions this month?

Regarding OCL binaries:
I found that AMD OpenCL 2.01 supports binaries (for both CPU and GPU targets), both getting them and building from them, although AMD's release notes list that as a missing feature.
Perhaps it has been there since 2.0.
The CPU target binary should be cross-CPU, i.e. work on all CPUs (AMD, Intel) across generations, even Atoms.
There is a flag that requires only SSE2 instead of the current SSE3. Will it then generate only SSE2 code and run even on a P4?
GPU support is good but worse than Nvidia's: the first characters of the binary are CLBC (CL byte code? similar to DXBC) and it contains device assembly code, so a binary built on my 5xxx will not work on a 4xxx. AMD IL would be better, as it would work on all supported GPUs.
At least it seems that OCL generates AMD IL v2 on my 5xxx, and I don't know whether that works on a 4xxx.
It also seems to be an ELF binary that holds more than just code, so you can't simply modify the code, because some headers record the code size, etc.
How do OCL GPU binaries compare to the ELF CAL binaries made with Calclassemble?
Are the formats going to be published, as was done for the CAL ELF binaries? At least those were published some time ago, but I don't know whether they are up to date, or whether publishing is still possible now that device assembly seems not possible, or at least not officially supported, on 5xxx.
Also remember that Nvidia binaries contain PTX, so current OCL binaries should work on Fermi according to the Fermi compatibility guide.
Plain PTX also allows modifying the code. That is possible, but the 1.5 spec is still not published (this month?).
Anyway, I didn't mention it last time, but with decuda git now covering most GT200 architecture instructions (SM 1.3), you could theoretically write a CUDA wrapper that intercepts the cubin, uses decuda to get PTX from it, and feeds that PTX to the CUDA stack. I don't know why Nvidia doesn't do that. They must have their reasons: precision,
mul24 not being a native instruction, etc.
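
The two observations above about AMD binaries (a CLBC prefix, and an ELF container) can be checked mechanically on any dumped blob. This is only a minimal sketch: `classify_binary` is a hypothetical helper name, the "CLBC" prefix is just what I saw on my 5xxx and not a documented format, and the blob is assumed to have been fetched already (for example with clGetProgramInfo and CL_PROGRAM_BINARIES).

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Classify a program binary by its first four bytes.
// "CLBC" is the prefix observed in this post (assumption, not a spec);
// 0x7F 'E' 'L' 'F' is the standard ELF magic number.
std::string classify_binary(const std::vector<unsigned char>& blob) {
    static const unsigned char kElf[4]  = {0x7F, 'E', 'L', 'F'};
    static const unsigned char kClbc[4] = {'C', 'L', 'B', 'C'};
    if (blob.size() < 4) return "unknown";            // too short to carry a magic
    if (std::memcmp(blob.data(), kElf, 4) == 0)  return "elf";
    if (std::memcmp(blob.data(), kClbc, 4) == 0) return "clbc";
    return "unknown";
}
```

Running this over binaries dumped for CPU and GPU targets would at least tell which container each one uses before trying to patch anything inside it.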

I have also ported/fixed swan for Windows and added better OpenCL translation of CUDA kernels.
I'm trying to get the CAL++ fixes for Windows done as well.

Today's news:
*CeBIT: GeForce 480 boxes show 1.5 GB of RAM and 8-pin + 6-pin power connectors.
ATI's competition will be a 950 MHz / 5000 MHz 5870 and a 5970 with 4 GB at 850 MHz.
A dual-Fermi card from Asus also seems possible at Computex.
*http://www.geosenseforwindows.com/ supplies a sensor driver for Windows so you can use the location APIs.
It comes with a Google Maps-enabled demo and works with the weather gadget.
So I hope the Qt Location API in the Mobility pack gets Win7 location API support.
*CeBIT: Gigabyte shows a laptop with a docking station that houses an Nvidia GTX 2xx for laptops, and a multitouch netbook that converts into a tablet.
*Hardware-accelerated graphics and text in Firefox: DirectWrite and Direct2D in the Firefox nightly for Windows 7.
*glu3 soon.
Old news:
*Flash 10.1 beta 3 supports GPU decoding for fluid HD YouTube on netbooks with GMA500 (720p) and Broadcom CrystalHD (1080p), given the new GMA500 and CrystalHD drivers.
Since it's based on DXVA, it seems the drivers now have proper DXVA. Is it DXVA 1 or DXVA 2? I suppose 1, as it also works on XP, but maybe on Vista it uses DXVA 2.0?
*C3DL 2.0: now WebGL and beyond.
*OpenSceneGraph 1.96 supports OGL ES 1.x and 2.0 and GL 3.x, with iPhone coming soon.

OCL tip:
Images on today's hardware have caches, so you get most of the benefits of local memory without the difficulty. The caches are small (~32kB L1, ~768kB L2) so you need a lot of locality to make it work.
Writing to images is very slow. Avoid it if you can.
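
To get a feel for how much locality those cache sizes demand, here is a rough back-of-the-envelope sketch. `max_square_tile_side` is a hypothetical helper, and the assumptions are loud ones: 1 kB = 1024 bytes, the cache holds nothing but raw texels, and cache lines, associativity, and the hardware's image tiling are ignored, so real numbers will be lower.

```cpp
#include <cstddef>

// Side length (in texels) of the largest square tile that fits in a cache of
// cache_bytes, assuming texel_bytes per texel and no other overhead.
std::size_t max_square_tile_side(std::size_t cache_bytes, std::size_t texel_bytes) {
    if (texel_bytes == 0) return 0;                    // avoid division by zero
    std::size_t texels = cache_bytes / texel_bytes;    // texels that fit at most
    std::size_t side = 0;
    while ((side + 1) * (side + 1) <= texels) ++side;  // integer square root
    return side;
}
```

Under these assumptions a 32 kB L1 holds about a 90x90 tile of 4-byte RGBA8 texels, or about 45x45 for 16-byte float4 texels, so work-items that wander outside a neighbourhood of that size will miss in L1.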

2 comments:

  1. AMD OpenCL running on a Pentium 4
    using SET CPU_ENABLE_ALL=1

    http://moozoo.dyndns.org/misc/OpenCLonP4.jpg

    There are no special instructions for getting it to work.
    I have Windows XP SP3

    This is what I did

    With Visual Studio Professional 2008 installed:
    Install ati-stream-sdk-v2.01-xp32.exe

    Add CPU_ENABLE_ALL with value 1 via My Computer -> Properties -> Advanced -> Environment Variables.

    As per SDK instructions copy
    My Documents\ATI Stream\samples\opencl\bin\x86\BoxFilter_Input.bmp
    to
    My Documents\ATI Stream\samples\opencl\cl\app\BoxFilter

    Open
    My Documents\ATI Stream\samples\opencl\OpenCLSamples.sln

    Modify SDKApplication.cpp so that
    SDKSample::SDKSample(std::string sampleName)
    and
    SDKSample::SDKSample(const char* sampleName)
    have
    deviceType = "cpu";
    instead of
    deviceType = "gpu";

    Rebuild Solution.

  2. D3D IL documentation:
    http://msdn.microsoft.com/en-gb/library/ms800355.aspx
