Other things:
*OGL 4.0 quick reference card
http://www.khronos.org/files/opengl4-quick-reference-card.pdf
*glext.h and gl3.h have been updated
*glloader, GLEW (on SVN) and OpenGL Extensions Viewer already support 3.3/4.0..
still waiting for SDL, SFML..
*Waiting for Fermi drivers on launch day..
Remember that all the new extensions are ARB, none are vendor or EXT..
The ARB extensions not included in the 3.3/4.0 core spec are:
GL_ARB_shading_language_include
GL_ARB_texture_compression_bptc
so the D3D11 HDR texture formats are not required for OpenGL 4.0..
#include in shaders is also left out..
The 5xxx series supports OpenGL 4.0: is it emulating doubles on the CPU? Double-float emulation would be better..
Latest NVIDIA extensions found:
GL_EXT_shader_image_load_store
GL_EXT_vertex_attrib_64bit
and amd:
GL_EXT_shader_atomic_counters
are not found..
AMD Catalyst 10.3 also includes the first of these extensions, blend_func_extended..
GL_EXT_vertex_attrib_64bit adds fp64 vertex attribs:
so for now fp64 is only for uniforms and values passed between stages, not vertex attribs
remember there are no double render target or texture formats, similar to D3D11..
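On the double-float emulation idea mentioned above: a minimal sketch, written in Python for readability rather than GLSL, of storing one higher-precision value as an unevaluated sum of two single-precision floats. The helper names (f32, two_sum, df_add and so on) are made up for illustration, and float32 rounding is simulated with the struct module. This is only a sketch of the general technique, not how any driver is known to implement fp64.

```python
import struct


def f32(x):
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def two_sum(a, b):
    """Error-free addition of two float32 values.

    Returns (s, e) where s is the rounded float32 sum and e is the
    rounding error, so that s + e equals a + b (barring overflow).
    """
    s = f32(a + b)
    bb = f32(s - a)
    e = f32(f32(a - f32(s - bb)) + f32(b - bb))
    return s, e


def df_from_double(x):
    """Split a double into a (hi, lo) pair of float32 values."""
    hi = f32(x)
    lo = f32(x - hi)
    return hi, lo


def df_add(x, y):
    """Add two double-float numbers given as (hi, lo) pairs."""
    s, e = two_sum(x[0], y[0])
    e = f32(e + f32(x[1] + y[1]))
    # Renormalise so that hi carries the leading bits and lo the remainder.
    hi = f32(s + e)
    lo = f32(e - f32(hi - s))
    return hi, lo


def df_to_double(x):
    """Collapse a (hi, lo) pair back into one Python float."""
    return x[0] + x[1]
```

With x = 1 + 2**-30, plain float32 addition of x + x rounds to 2.0, while df_add keeps the 2**-29 term in the low word.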
GL_EXT_shader_image_load_store allows random-access reads and writes to textures (RWTexture3D in D3D11)
AMD has amdx_random_access_target
ARB_blend_func_extended is called dual source blending in DX10, but got dropped in DX11..
We have tessellation shaders, dynamic shader linkage and compute interop with OpenCL..
Still lacking vs D3D11:
*multi-threaded rendering:
remember that current drivers only support multi-threaded resource creation.. no parallel command list creation
is it a driver or a hardware issue?
*random-access load/store/atomics on textures (RWTexture3D) -> GL_EXT_shader_image_load_store, amdx_random_access_target + GL_EXT_shader_atomic_counters
*atomic access to textures and memory barriers in fragment shaders: DeviceMemoryBarrier in D3D11
*GL_AMD_conservative_depth adds:
Conservative oDepth - This algorithm allows a pixel shader to compare the per-pixel depth value of the pixel shader with that in the rasterizer. The result enables early depth culling operations while maintaining the ability to output oDepth from a pixel shader.
So people on the OpenGL forums are criticizing the lack of:
*multi-threaded rendering
*shader binaries to avoid compilation, preferably cross-vendor and cross-platform like the DX IL, DXBC (which is almost 100% compatible with ATI's IL)
*direct state access
*Epic fail for GL_ARB_sampler_objects, as there is no GLSL support..
I miss:
*EXT_separate_shader_objects
The ability to separate program objects is only going to become increasingly more relevant.
*NV_texture_barrier
cross-process texture sharing?
Support for programmable offsets in gather is there; see the 2x speedup in the Fermi whitepaper, and tessellation
a Fermi test would be good
Fermi:
196.78 drivers support Fermi..
full support for OpenGL 4.0 at Fermi launch..
the stochastic transparency paper (I3D 2010) has Fermi perf numbers for the algorithm via the OpenGL sample_shading (10.1) extension
GLwgl_dx_interop
GL_NVX_gpu_memory_info
GL_NV_gpu_program4_1
published then?
try OpenRL with OpenCL on Fermi..
OpenCL drivers at Fermi launch will have:
1. CUDA 3.0 final:
*Fermi Direct3D 11 interoperability
*Fermi HW Profiler support in OpenCL Visual Profiler
*Complete BLAS lib, now with complex routines
*cuda-gdb support for JIT compiled kernels
adding:
*C++ Class Inheritance
*C++ Template Inheritance
*Unified interoperability API for Direct3D and OpenGL
*OpenGL texture interoperability
2. a new OpenCL driver with support for:
*pragma unroll
*local atomics
*final ICD
*D3D9/10/11 support
fxc has interface support, but how are the functions inside an interface called?
see "CUDA_Developer_Guide_for_Optimus_Platforms"
http://www.stumblingahead.com/blog/?p=66 talking about tessellation soon..
2010 conferences GPU papers:
*PPoPP
*GDC 2010
*I3D 2010
*GPGPU-3
*ASPLOS
MacroSS: Macro-SIMDization of Streaming Applications,
COMPASS: A Programmable Data Prefetcher Using Idle GPU Shaders,
"Investigating the Impact of Code Generation on Performance Characteristics of Integer Programs."
*EUROGRAPHICS 2010
*SIGGRAPH 2010
Interesting new/coming books:
*Game Programming Gems 8
*GPU Computing Gems (2010?)
*Game Engine Gems 1, Volume One
*Programming Massively Parallel Processors: A Hands-
*GPU Pro: Advanced Rendering Techniques
*Multigrid Methods on GPUs
*Game Coding Complete, Third Edition
*Video Game Optimization
*Game Engine Architecture
*Real-Time Cameras
*Programming Game AI by Example
Comments:
GL_ARB_shading_language_include -> GLSL accepts #include, and CompileShaderIncludeARB sets the search paths for <> includes
GL_ARB_texture_compression_bptc
the D3D11 texture formats -> a compressor is included, but compressing offline is better
GL_ARB_blend_func_extended
lets you use two fragment shader outputs, one as the color and one as the blend factors
see the example of a tinted reflective window done in one pass using the ROPs
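As a rough illustration of the one-pass tinted window idea: with dual-source blending the fragment shader writes two colors, and the second one can be used as the blend factor. The sketch below models in plain Python what the blend stage would compute per channel, assuming a blend function of (ONE, SRC1_COLOR) and a clamped fixed-point framebuffer. The function name and the sample colors are invented for the example.

```python
def dual_source_blend(src0, src1, dst):
    """Per-channel result of blending with factors (ONE, SRC1_COLOR).

    src0: first fragment output, e.g. the reflection added on top
    src1: second fragment output, e.g. the glass tint used as dst factor
    dst:  color already in the framebuffer (the scene behind the window)
    """
    return tuple(
        min(1.0, s0 * 1.0 + d * s1)  # clamp as a fixed-point target would
        for s0, s1, d in zip(src0, src1, dst)
    )


reflection = (0.25, 0.0, 0.0)
tint = (0.5, 1.0, 0.5)
background = (0.5, 0.5, 0.5)
window_pixel = dual_source_blend(reflection, tint, background)
```

Without the second output, the tint (multiply) and the reflection (add) would need two passes or a read of the framebuffer.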
GL_ARB_explicit_attrib_location ->
explicitly sets in GLSL how variables are passed between shaders
GL_ARB_occlusion_query2
provides a boolean result for whether anything passed or not
GL_ARB_sampler_objects
BindSampler( uint unit, uint sampler );
When a sampler object is bound to a texture unit, its state supersedes that
of the texture object bound to that texture unit. If the sampler name zero
is bound to a texture unit, the currently bound texture's sampler state
becomes active. A single sampler object may be bound to multiple texture
units simultaneously.
it does not change GLSL towards the HLSL style with tex.sampler
GL_ARB_shader_bit_encoding
with this I can use the fast float-to-int from the SPAP paper (Kun Zhou), which takes the bits
of a float and manipulates them to get abs, float2int of the value, etc..
To obtain signed or unsigned integer values holding the encoding of a
floating-point value, use:
genIType floatBitsToInt(genType value);
genUType floatBitsToUint(genType value);
Conversions are done on a component-by-component basis.
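To make the bit reinterpretation concrete, here is a small CPU-side sketch in Python that mimics floatBitsToInt / floatBitsToUint for a single 32-bit float using the struct module. The Python function names are my own, and the abs trick shown (clearing the sign bit) is just one common example of this kind of bit manipulation, not necessarily the method used in the paper mentioned above.

```python
import struct


def float_bits_to_int(value):
    """Signed 32-bit integer holding the IEEE 754 encoding of value."""
    return struct.unpack("<i", struct.pack("<f", value))[0]


def float_bits_to_uint(value):
    """Unsigned 32-bit integer holding the IEEE 754 encoding of value."""
    return struct.unpack("<I", struct.pack("<f", value))[0]


def uint_bits_to_float(bits):
    """Inverse operation: reinterpret 32 unsigned bits as a float."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def abs_via_bits(value):
    """abs() implemented by clearing the sign bit (bit 31)."""
    return uint_bits_to_float(float_bits_to_uint(value) & 0x7FFFFFFF)
```

For example, float_bits_to_uint(1.0) is 0x3F800000 (sign 0, biased exponent 127, mantissa 0), and abs_via_bits(-3.5) is 3.5.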
GL_ARB_texture_rgb10_a2ui
GL_ARB_texture_swizzle
GL_ARB_timer_query
GL_ARB_vertex_type_2_10_10_10_rev
GL_ARB_draw_indirect
compute interop
void DrawArraysIndirect(enum mode, const void *indirect);
new buffer binding point
DRAW_INDIRECT_BUFFER
if a buffer is bound there,
it is used as the data for the number of elements etc..
if not,
is the indirect pointer itself used then?..
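A sketch of what the data in the indirect buffer could look like, assuming the four-field DrawArraysIndirectCommand layout (count, primCount, first, reservedMustBeZero; each a 32-bit unsigned integer) as I recall it from the ARB_draw_indirect spec. The Python helper name is invented. The bytes it returns are what one would upload to a buffer bound to DRAW_INDIRECT_BUFFER, or what a compute kernel would write there.

```python
import struct

# Assumed layout: four tightly packed 32-bit unsigned integers.
DRAW_ARRAYS_INDIRECT_FORMAT = "<4I"


def pack_draw_arrays_indirect(count, prim_count, first):
    """Build the bytes of one indirect draw command.

    count:      number of vertices to draw
    prim_count: number of instances
    first:      index of the first vertex
    The fourth field is reserved and written as zero.
    """
    return struct.pack(DRAW_ARRAYS_INDIRECT_FORMAT, count, prim_count, first, 0)


# One triangle, one instance, starting at vertex 0.
command = pack_draw_arrays_indirect(count=3, prim_count=1, first=0)
```

The point for compute interop is that the GPU itself can fill in count and prim_count, so the CPU never has to read them back before issuing the draw.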
GL_ARB_gpu_shader5
GL_ARB_gpu_shader_fp64
Should double-precision fragment shader outputs be supported?
RESOLVED: Not in this extension. Note that we don't have
double-precision framebuffer formats to accept such values.
GL_ARB_shader_subroutine
GL_ARB_tessellation_shader
GL_ARB_texture_buffer_object_rgb32
GL_ARB_transform_feedback2
1.transform feedback objects
2.pause and resume transform feedback
3.ability to draw primitives captured in transform feedback mode without querying the captured
primitive count
DrawTransformFeedback()
GL_ARB_transform_feedback3
unreal 3 news:
*palm webos and iphone support (on mac?)
*3d vision support
http://www.chw.net/2010/02/29-incomodas-preguntas-para-nvidia-sobre-gf100/
AMD Open Physics Initiative Expands Ecosystem with Free DMM for Game Production and Updated version of Bullet Physics
Apple adopts DirectX 11 GPUs, buys AMD Radeon HD 5750
apple news:
*99 dev program
*Valve games coming to Mac next month, and Monkey Island 2 SE..
*6-core Mac Pro next week (12 cores?): Mac Pro 'hexacore' Xeon Core i7-980X coming Tuesday
reviews of the 980 Gulftown with AES on AnandTech today..
*AMD 5750 iMac in June? It adds OpenGL 4.0 and full OpenCL support for Mac..
so 10.6.4 will support AMD 5xxx
*iPhone 4.0 multitasking support
*10.6.3 this month?
CUDA: cuda-gdb GPU support and visual profilers, 64-bit and efficient GL interop soon?
http://pasco2010.imag.fr/images/poster_pasco2010.pdf
http://unlimiteddetailtechnology.com/
Roxio CinePlayer 3D
CLyther = Python + OpenCL
amd open physics (free dmm 2.0 with ocl) and open stereo(qbf stereo for radeon?)
also eyefinity sdk coming soon..
ticker tape available
pgi insider feb 2010 volume
http://www.pgroup.com/lit/articles/insider/v2n1a3.htm
says new fermi support and data region things..
XNA 4.0, Windows Phone 7, Tegra 2 soon..
Yellow Dog Enterprise Linux for CUDA
http://ydl.net/cuda/iso/YDELforCUDA-6.2-20100302-DVD.iso download free for students
Jenkins Software Announces Data Mining Tool for Game Developers
As a further enhancement, AMD has developed new parallel GPU accelerated implementations of Bullet Physics’ Smoothed Particle Hydrodynamics (SPH) Fluids and Soft Bodies/Cloth. The new code written in OpenCL and Direct Compute will be contributed as open source.
OpenGL usage from an ISV perspective
intel gpa 3.0
Unity Announces 3.0 Platform, Support For PS3, iPad, And Android
Valve Confirms Mac Versions Of Steam, Valve Game
http://www.raknet.net/echochamber
Erwin Coumans - SONY - Porting existing code to OpenCL
Ben Gaster AMD and Avi Shapira - Graphic Remedy - Debugging fluid dynamics on OpenCL
Greg Smith - NVIDIA - FFT and OpenCL Profiling
http://www.arm.com/community/software-enablement/google/solution-center-android.php
http://realworldtech.com/forums/index.cfm?action=detail&id=108017&threadid=108017&roomid=2
I can only say that at the CAL level (and obviously in OpenCL, which is built upon CAL) there are numerous problems with multiple GPUs.
You definitely need one thread and one context per GPU to make it work. But that isn't enough, because almost every CAL function is not thread safe, so calling calResMap() (which is the only way to get access to local GPU memory) in one thread blocks all other threads/contexts.
And (as I've already written on these forums), OpenCL uses the calCtxWaitForEvent() function instead of a CPU-burning loop
while (calCtxIsEventDone(calCtx, e) == CAL_RESULT_PENDING);
to wait for GPU kernel completion.
But calCtxWaitForEvent() also blocks every context currently running. This is especially noticeable when there are different devices in the system (like 5770+4770). So basically it's simply impossible to work asynchronously with multiple GPUs within a single process.
All of the above applies to the Windows version of CAL; I never tried the Linux one.
Yup, and I use 1 thread per GPU too. So 1 thread, 1 context, 1 queue for each GPU. I tried other configurations but they weren't working (i.e. not running in parallel).
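The "1 thread, 1 context, 1 queue per GPU" arrangement described in the posts above can be sketched in plain Python. Everything here is a stand-in: the devices are simulated, the "context" is just a dictionary, the job is squaring a number, and no CAL or OpenCL calls are made. It only shows the shape of the pattern, under those assumptions.

```python
import queue
import threading


def gpu_worker(device_id, work_queue, results):
    """Worker owning one simulated device: its own context, its own queue."""
    context = {"device": device_id}  # stand-in for a per-GPU context
    while True:
        job = work_queue.get()
        if job is None:  # sentinel: no more work for this device
            break
        # Stand-in for launching a kernel and waiting for its completion.
        results.append((context["device"], job * job))


def run_on_devices(device_ids, jobs):
    """Distribute jobs round-robin, one thread and one queue per device."""
    queues = {d: queue.Queue() for d in device_ids}
    results = {d: [] for d in device_ids}
    threads = [
        threading.Thread(target=gpu_worker, args=(d, queues[d], results[d]))
        for d in device_ids
    ]
    for t in threads:
        t.start()
    for i, job in enumerate(jobs):
        queues[device_ids[i % len(device_ids)]].put(job)
    for d in device_ids:
        queues[d].put(None)
    for t in threads:
        t.join()
    return results
```

Each worker only ever touches its own context and queue, which is the property the posters were trying to get. Whether the threads really run in parallel on real hardware depends on the blocking behaviour of the runtime described above.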
Why on HD4870 with 512 MB onboard RAM only 128 available to OpenCL ???
http://forums.amd.com/devforum/messageview.cfm?catid=390&threadid=128846&enterthread=y
http://ctk-dev.sourceforge.net/
gmac
http://code.google.com/p/fluidic/
http://otoy.com/
http://www.gameenginegems.com/
We're excited to announce a new addition to the Palm® webOS™ development platform: the webOS Plug-in Development Kit (PDK) lets developers extend their webOS applications by writing plug-ins in C or C++. The webOS PDK makes it easy for developers to leverage existing code and exposes new capabilities — including high-performance 3D graphics.
http://code.google.com/p/gyp/source/checkout