<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' xmlns:gd='http://schemas.google.com/g/2005' xmlns:thr='http://purl.org/syndication/thread/1.0'><id>tag:blogger.com,1999:blog-8553786559872430029</id><updated>2011-11-28T00:35:24.061+01:00</updated><title type='text'>GPU computing</title><subtitle type='html'>Stay up to date in OpenCL, DirectCompute, CUDA, CAL and OpenGL information</subtitle><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://oscarbg.blogspot.com/feeds/posts/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8553786559872430029/posts/default?max-results=100'/><link rel='alternate' type='text/html' href='http://oscarbg.blogspot.com/'/><link rel='hub' href='http://pubsubhubbub.appspot.com/'/><link rel='next' type='application/atom+xml' href='http://www.blogger.com/feeds/8553786559872430029/posts/default?start-index=101&amp;max-results=100'/><author><name>oscarbg</name><uri>http://www.blogger.com/profile/10172785372134961220</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><generator version='7.00' uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>171</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>100</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-8553786559872430029.post-8335020641159591646</id><published>2010-07-10T20:16:00.001+02:00</published><updated>2010-07-11T04:40:54.171+02:00</updated><title type='text'>Some news!</title><content type='html'>News:&lt;br /&gt;*Source code for GPU Computing Gems 1 (a.k.a. GPU Gems 4) is already available on 
gpucomputing.net. The book itself is due in November.&lt;br /&gt;Currently listed:&lt;br /&gt;&lt;br /&gt;&lt;table class="views-table" style="border-collapse: collapse; color: #333333; font-family: Arial, Verdana, sans-serif; font-size: 1em; line-height: 18px;"&gt;&lt;thead&gt;&lt;tr&gt;&lt;th class="views-field views-field-title" style="border-bottom-color: rgb(204, 204, 204); border-bottom-style: solid; border-bottom-width: 3px; padding-right: 1em; text-align: left; width: 280px;"&gt;&lt;a class="active" href="http://www.gpucomputing.net/?q=node/1006&amp;amp;order=title&amp;amp;sort=asc" style="color: #6c9270; text-decoration: none;" title="sort by Title"&gt;Title&lt;/a&gt;&lt;/th&gt;&lt;th class="views-field views-field-timestamp" style="border-bottom-color: rgb(204, 204, 204); border-bottom-style: solid; border-bottom-width: 3px; padding-right: 1em; text-align: left; width: 50px;"&gt;&lt;br /&gt;&lt;/th&gt;&lt;th class="views-field views-field-comment-count" style="border-bottom-color: rgb(204, 204, 204); border-bottom-style: solid; border-bottom-width: 3px; padding-right: 1em; text-align: left; width: 50px;"&gt;Comments&lt;/th&gt;&lt;th class="views-field views-field-last-comment-timestamp active" style="border-bottom-color: rgb(204, 204, 204); border-bottom-style: solid; border-bottom-width: 3px; padding-right: 1em; text-align: left; width: 200px;"&gt;Date&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody style="border-bottom-style: none; border-color: initial; border-left-style: none; border-right-style: none; border-top-color: rgb(204, 204, 204); border-top-style: none; border-top-width: 1px; border-width: initial;"&gt;&lt;tr class="even" style="background-attachment: initial; background-clip: initial; background-color: #eef9f4; background-image: initial; background-origin: initial; background-position: initial initial; background-repeat: initial initial; border-bottom-color: rgb(204, 204, 204); border-bottom-style: solid; border-bottom-width: 1px; 
padding-bottom: 0.1em; padding-left: 0.6em; padding-right: 0.6em; padding-top: 0.1em;"&gt;&lt;td class="views-field views-field-title" style="padding-bottom: 0.3em; padding-left: 0.3em; padding-right: 0.3em; padding-top: 0.3em; width: 280px;"&gt;&lt;a alt="A Programmable Graphics Pipeline in CUDA for Order Independent Transparency" href="http://www.gpucomputing.net/?q=node/1280" style="color: #6c9270; text-decoration: none;" title="A Programmable Graphics Pipeline in CUDA for Order Independent Transparency"&gt;A Programmable Graphics Pipeline in CUDA for Order Independent Transparency&lt;/a&gt;&lt;/td&gt;&lt;td class="views-field views-field-name" style="padding-bottom: 0.3em; padding-left: 0.3em; padding-right: 0.3em; padding-top: 0.3em; width: 50px;"&gt;&lt;span style="color: red;"&gt;&lt;/span&gt;&lt;/td&gt;&lt;td class="views-field views-field-comment-count" style="padding-bottom: 0.3em; padding-left: 0.3em; padding-right: 0.3em; padding-top: 0.3em; width: 50px;"&gt;&lt;a href="http://www.gpucomputing.net/?q=node/1280#new" style="color: #6c9270; text-decoration: none;"&gt;1 new&lt;/a&gt;&lt;/td&gt;&lt;td class="views-field views-last-comment-timestamp active" style="background-color: inherit; padding-bottom: 0.3em; padding-left: 0.3em; padding-right: 0.3em; padding-top: 0.3em; width: 200px;"&gt;&lt;a alt="07-10-2010" href="http://www.gpucomputing.net/?q=node/1280" style="color: #6c9270; text-decoration: none;" title="07-10-2010"&gt;07-10-2010&lt;/a&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr class="odd" style="background-attachment: initial; background-clip: initial; background-color: #f5f5e9; background-image: initial; background-origin: initial; background-position: initial initial; background-repeat: initial initial; border-bottom-color: rgb(204, 204, 204); border-bottom-style: solid; border-bottom-width: 1px; padding-bottom: 0.1em; padding-left: 0.6em; padding-right: 0.6em; padding-top: 0.1em;"&gt;&lt;td class="views-field views-field-title" style="padding-bottom: 
0.3em; padding-left: 0.3em; padding-right: 0.3em; padding-top: 0.3em; width: 280px;"&gt;&lt;a alt="High Performance Iterated Function Systems" href="http://www.gpucomputing.net/?q=node/1327" style="color: #6c9270; text-decoration: none;" title="High Performance Iterated Function Systems"&gt;High Performance Iterated Function Systems&lt;/a&gt;&lt;/td&gt;&lt;td class="views-field views-field-name" style="padding-bottom: 0.3em; padding-left: 0.3em; padding-right: 0.3em; padding-top: 0.3em; width: 50px;"&gt;&lt;span style="color: red;"&gt;&lt;/span&gt;&lt;/td&gt;&lt;td class="views-field views-field-comment-count" style="padding-bottom: 0.3em; padding-left: 0.3em; padding-right: 0.3em; padding-top: 0.3em; width: 50px;"&gt;&lt;a href="http://www.gpucomputing.net/?q=node/1327#new" style="color: #6c9270; text-decoration: none;"&gt;0 new&lt;/a&gt;&lt;/td&gt;&lt;td class="views-field views-last-comment-timestamp active" style="background-color: inherit; padding-bottom: 0.3em; padding-left: 0.3em; padding-right: 0.3em; padding-top: 0.3em; width: 200px;"&gt;&lt;a alt="07-02-2010" href="http://www.gpucomputing.net/?q=node/1327" style="color: #6c9270; text-decoration: none;" title="07-02-2010"&gt;07-02-2010&lt;/a&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr class="even" style="background-attachment: initial; background-clip: initial; background-color: #eef9f4; background-image: initial; background-origin: initial; background-position: initial initial; background-repeat: initial initial; border-bottom-color: rgb(204, 204, 204); border-bottom-style: solid; border-bottom-width: 1px; padding-bottom: 0.1em; padding-left: 0.6em; padding-right: 0.6em; padding-top: 0.1em;"&gt;&lt;td class="views-field views-field-title" style="padding-bottom: 0.3em; padding-left: 0.3em; padding-right: 0.3em; padding-top: 0.3em; width: 280px;"&gt;&lt;a alt="CUDA Implementation of the Tree-based Barnes Hut n-Body Algorithm" href="http://www.gpucomputing.net/?q=node/1314" style="color: #6c9270; text-decoration: none;" 
title="CUDA Implementation of the Tree-based Barnes Hut n-Body Algorithm"&gt;CUDA Implementation of the Tree-based Barnes Hut n-Body Algorithm&lt;/a&gt;&lt;/td&gt;&lt;td class="views-field views-field-name" style="padding-bottom: 0.3em; padding-left: 0.3em; padding-right: 0.3em; padding-top: 0.3em; width: 50px;"&gt;&lt;span style="color: red;"&gt;&lt;/span&gt;&lt;/td&gt;&lt;td class="views-field views-field-comment-count" style="padding-bottom: 0.3em; padding-left: 0.3em; padding-right: 0.3em; padding-top: 0.3em; width: 50px;"&gt;&lt;a href="http://www.gpucomputing.net/?q=node/1314#new" style="color: #6c9270; text-decoration: none;"&gt;0 new&lt;/a&gt;&lt;/td&gt;&lt;td class="views-field views-last-comment-timestamp active" style="background-color: inherit; padding-bottom: 0.3em; padding-left: 0.3em; padding-right: 0.3em; padding-top: 0.3em; width: 200px;"&gt;&lt;a alt="07-01-2010" href="http://www.gpucomputing.net/?q=node/1314" style="color: #6c9270; text-decoration: none;" title="07-01-2010"&gt;07-01-2010&lt;/a&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr class="odd" style="background-attachment: initial; background-clip: initial; background-color: #f5f5e9; background-image: initial; background-origin: initial; background-position: initial initial; background-repeat: initial initial; border-bottom-color: rgb(204, 204, 204); border-bottom-style: solid; border-bottom-width: 1px; padding-bottom: 0.1em; padding-left: 0.6em; padding-right: 0.6em; padding-top: 0.1em;"&gt;&lt;td class="views-field views-field-title" style="padding-bottom: 0.3em; padding-left: 0.3em; padding-right: 0.3em; padding-top: 0.3em; width: 280px;"&gt;&lt;a alt="Connected Component Labeling in CUDA - demo+code" href="http://www.gpucomputing.net/?q=node/1312" style="color: #6c9270; text-decoration: none;" title="Connected Component Labeling in CUDA - demo+code"&gt;Connected Component Labeling in CUDA - demo+code&lt;/a&gt;&lt;/td&gt;&lt;td class="views-field views-field-name" style="padding-bottom: 0.3em; 
padding-left: 0.3em; padding-right: 0.3em; padding-top: 0.3em; width: 50px;"&gt;&lt;span style="color: red;"&gt;&lt;/span&gt;&lt;/td&gt;&lt;td class="views-field views-field-comment-count" style="padding-bottom: 0.3em; padding-left: 0.3em; padding-right: 0.3em; padding-top: 0.3em; width: 50px;"&gt;&lt;a href="http://www.gpucomputing.net/?q=node/1312#new" style="color: #6c9270; text-decoration: none;"&gt;0 new&lt;/a&gt;&lt;/td&gt;&lt;td class="views-field views-last-comment-timestamp active" style="background-color: inherit; padding-bottom: 0.3em; padding-left: 0.3em; padding-right: 0.3em; padding-top: 0.3em; width: 200px;"&gt;&lt;a alt="06-30-2010" href="http://www.gpucomputing.net/?q=node/1312" style="color: #6c9270; text-decoration: none;" title="06-30-2010"&gt;06-30-2010&lt;/a&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr class="even" style="background-attachment: initial; background-clip: initial; background-color: #eef9f4; background-image: initial; background-origin: initial; background-position: initial initial; background-repeat: initial initial; border-bottom-color: rgb(204, 204, 204); border-bottom-style: solid; border-bottom-width: 1px; padding-bottom: 0.1em; padding-left: 0.6em; padding-right: 0.6em; padding-top: 0.1em;"&gt;&lt;td class="views-field views-field-title" style="padding-bottom: 0.3em; padding-left: 0.3em; padding-right: 0.3em; padding-top: 0.3em; width: 280px;"&gt;&lt;a alt="A Practical Guide toMassively ParallelMonte Carlo Simulations: The Ising Model" href="http://www.gpucomputing.net/?q=node/1310" style="color: #6c9270; text-decoration: none;" title="A Practical Guide toMassively ParallelMonte Carlo Simulations: The Ising Model"&gt;A Practical Guide toMassively ParallelMonte Carlo Simulations: The Ising Model&lt;/a&gt;&lt;/td&gt;&lt;td class="views-field views-field-name" style="padding-bottom: 0.3em; padding-left: 0.3em; padding-right: 0.3em; padding-top: 0.3em; width: 50px;"&gt;&lt;span style="color: red;"&gt;&lt;/span&gt;&lt;/td&gt;&lt;td 
class="views-field views-field-comment-count" style="padding-bottom: 0.3em; padding-left: 0.3em; padding-right: 0.3em; padding-top: 0.3em; width: 50px;"&gt;&lt;a href="http://www.gpucomputing.net/?q=node/1310#new" style="color: #6c9270; text-decoration: none;"&gt;0 new&lt;/a&gt;&lt;/td&gt;&lt;td class="views-field views-last-comment-timestamp active" style="background-color: inherit; padding-bottom: 0.3em; padding-left: 0.3em; padding-right: 0.3em; padding-top: 0.3em; width: 200px;"&gt;&lt;a alt="06-30-2010" href="http://www.gpucomputing.net/?q=node/1310" style="color: #6c9270; text-decoration: none;" title="06-30-2010"&gt;06-30-2010&lt;/a&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr class="odd" style="background-attachment: initial; background-clip: initial; background-color: #f5f5e9; background-image: initial; background-origin: initial; background-position: initial initial; background-repeat: initial initial; border-bottom-color: rgb(204, 204, 204); border-bottom-style: solid; border-bottom-width: 1px; padding-bottom: 0.1em; padding-left: 0.6em; padding-right: 0.6em; padding-top: 0.1em;"&gt;&lt;td class="views-field views-field-title" style="padding-bottom: 0.3em; padding-left: 0.3em; padding-right: 0.3em; padding-top: 0.3em; width: 280px;"&gt;&lt;a alt="Parallel LDPC Decoding using CUDA" href="http://www.gpucomputing.net/?q=node/1309" style="color: #6c9270; text-decoration: none;" title="Parallel LDPC Decoding using CUDA"&gt;Parallel LDPC Decoding using CUDA&lt;/a&gt;&lt;/td&gt;&lt;td class="views-field views-field-name" style="padding-bottom: 0.3em; padding-left: 0.3em; padding-right: 0.3em; padding-top: 0.3em; width: 50px;"&gt;&lt;span style="color: red;"&gt;&lt;/span&gt;&lt;/td&gt;&lt;td class="views-field views-field-comment-count" style="padding-bottom: 0.3em; padding-left: 0.3em; padding-right: 0.3em; padding-top: 0.3em; width: 50px;"&gt;&lt;a href="http://www.gpucomputing.net/?q=node/1309#new" style="color: #6c9270; text-decoration: none;"&gt;0 
new&lt;/a&gt;&lt;/td&gt;&lt;td class="views-field views-last-comment-timestamp active" style="background-color: inherit; padding-bottom: 0.3em; padding-left: 0.3em; padding-right: 0.3em; padding-top: 0.3em; width: 200px;"&gt;&lt;a alt="06-30-2010" href="http://www.gpucomputing.net/?q=node/1309" style="color: #6c9270; text-decoration: none;" title="06-30-2010"&gt;06-30-2010&lt;/a&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr class="even" style="background-attachment: initial; background-clip: initial; background-color: #eef9f4; background-image: initial; background-origin: initial; background-position: initial initial; background-repeat: initial initial; border-bottom-color: rgb(204, 204, 204); border-bottom-style: solid; border-bottom-width: 1px; padding-bottom: 0.1em; padding-left: 0.6em; padding-right: 0.6em; padding-top: 0.1em;"&gt;&lt;td class="views-field views-field-title" style="padding-bottom: 0.3em; padding-left: 0.3em; padding-right: 0.3em; padding-top: 0.3em; width: 280px;"&gt;&lt;a alt="Path Regeneration for Random Walks" href="http://www.gpucomputing.net/?q=node/1308" style="color: #6c9270; text-decoration: none;" title="Path Regeneration for Random Walks"&gt;Path Regeneration for Random Walks&lt;/a&gt;&lt;/td&gt;&lt;td class="views-field views-field-name" style="padding-bottom: 0.3em; padding-left: 0.3em; padding-right: 0.3em; padding-top: 0.3em; width: 50px;"&gt;&lt;span style="color: red;"&gt;&lt;/span&gt;&lt;/td&gt;&lt;td class="views-field views-field-comment-count" style="padding-bottom: 0.3em; padding-left: 0.3em; padding-right: 0.3em; padding-top: 0.3em; width: 50px;"&gt;&lt;a href="http://www.gpucomputing.net/?q=node/1308#new" style="color: #6c9270; text-decoration: none;"&gt;0 new&lt;/a&gt;&lt;/td&gt;&lt;td class="views-field views-last-comment-timestamp active" style="background-color: inherit; padding-bottom: 0.3em; padding-left: 0.3em; padding-right: 0.3em; padding-top: 0.3em; width: 200px;"&gt;&lt;a alt="06-30-2010" 
href="http://www.gpucomputing.net/?q=node/1308" style="color: #6c9270; text-decoration: none;" title="06-30-2010"&gt;06-30-2010&lt;/a&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr class="odd" style="background-attachment: initial; background-clip: initial; background-color: #f5f5e9; background-image: initial; background-origin: initial; background-position: initial initial; background-repeat: initial initial; border-bottom-color: rgb(204, 204, 204); border-bottom-style: solid; border-bottom-width: 1px; padding-bottom: 0.1em; padding-left: 0.6em; padding-right: 0.6em; padding-top: 0.1em;"&gt;&lt;td class="views-field views-field-title" style="padding-bottom: 0.3em; padding-left: 0.3em; padding-right: 0.3em; padding-top: 0.3em; width: 280px;"&gt;&lt;a alt="GPU Gems 4: Deformable Volumetric Registration using B-splines Source Code" href="http://www.gpucomputing.net/?q=node/1307" style="color: #6c9270; text-decoration: none;" title="GPU Gems 4: Deformable Volumetric Registration using B-splines Source Code"&gt;GPU Gems 4: Deformable Volumetric Registration using B-splines Source Code&lt;/a&gt;&lt;/td&gt;&lt;td class="views-field views-field-name" style="padding-bottom: 0.3em; padding-left: 0.3em; padding-right: 0.3em; padding-top: 0.3em; width: 50px;"&gt;&lt;span style="color: red;"&gt;&lt;/span&gt;&lt;/td&gt;&lt;td class="views-field views-field-comment-count" style="padding-bottom: 0.3em; padding-left: 0.3em; padding-right: 0.3em; padding-top: 0.3em; width: 50px;"&gt;&lt;a href="http://www.gpucomputing.net/?q=node/1307#new" style="color: #6c9270; text-decoration: none;"&gt;0 new&lt;/a&gt;&lt;/td&gt;&lt;td class="views-field views-last-comment-timestamp active" style="background-color: inherit; padding-bottom: 0.3em; padding-left: 0.3em; padding-right: 0.3em; padding-top: 0.3em; width: 200px;"&gt;&lt;a alt="06-30-2010" href="http://www.gpucomputing.net/?q=node/1307" style="color: #6c9270; text-decoration: none;" title="06-30-2010"&gt;06-30-2010&lt;/a&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr 
class="even" style="background-attachment: initial; background-clip: initial; background-color: #eef9f4; background-image: initial; background-origin: initial; background-position: initial initial; background-repeat: initial initial; border-bottom-color: rgb(204, 204, 204); border-bottom-style: solid; border-bottom-width: 1px; padding-bottom: 0.1em; padding-left: 0.6em; padding-right: 0.6em; padding-top: 0.1em;"&gt;&lt;td class="views-field views-field-title" style="padding-bottom: 0.3em; padding-left: 0.3em; padding-right: 0.3em; padding-top: 0.3em; width: 280px;"&gt;&lt;a alt="Monte Carlo Photon Transport on the GPU" href="http://www.gpucomputing.net/?q=node/1306" style="color: #6c9270; text-decoration: none;" title="Monte Carlo Photon Transport on the GPU"&gt;Monte Carlo Photon Transport on the GPU&lt;/a&gt;&lt;/td&gt;&lt;td class="views-field views-field-name" style="padding-bottom: 0.3em; padding-left: 0.3em; padding-right: 0.3em; padding-top: 0.3em; width: 50px;"&gt;&lt;span style="color: red;"&gt;&lt;/span&gt;&lt;/td&gt;&lt;td class="views-field views-field-comment-count" style="padding-bottom: 0.3em; padding-left: 0.3em; padding-right: 0.3em; padding-top: 0.3em; width: 50px;"&gt;&lt;a href="http://www.gpucomputing.net/?q=node/1306#new" style="color: #6c9270; text-decoration: none;"&gt;0 new&lt;/a&gt;&lt;/td&gt;&lt;td class="views-field views-last-comment-timestamp active" style="background-color: inherit; padding-bottom: 0.3em; padding-left: 0.3em; padding-right: 0.3em; padding-top: 0.3em; width: 200px;"&gt;&lt;a alt="06-30-2010" href="http://www.gpucomputing.net/?q=node/1306" style="color: #6c9270; text-decoration: none;" title="06-30-2010"&gt;06-30-2010&lt;/a&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr class="odd" style="background-attachment: initial; background-clip: initial; background-color: #f5f5e9; background-image: initial; background-origin: initial; background-position: initial initial; background-repeat: initial initial; border-bottom-color: rgb(204, 204, 
204); border-bottom-style: solid; border-bottom-width: 1px; padding-bottom: 0.1em; padding-left: 0.6em; padding-right: 0.6em; padding-top: 0.1em;"&gt;&lt;td class="views-field views-field-title" style="padding-bottom: 0.3em; padding-left: 0.3em; padding-right: 0.3em; padding-top: 0.3em; width: 280px;"&gt;&lt;a alt="Lattice-Boltzmann Lighting Models - Source Code" href="http://www.gpucomputing.net/?q=node/1305" style="color: #6c9270; text-decoration: none;" title="Lattice-Boltzmann Lighting Models - Source Code"&gt;Lattice-Boltzmann Lighting Models - Source Code&lt;/a&gt;&lt;/td&gt;&lt;td class="views-field views-field-name" style="padding-bottom: 0.3em; padding-left: 0.3em; padding-right: 0.3em; padding-top: 0.3em; width: 50px;"&gt;&lt;span style="color: red;"&gt;&lt;/span&gt;&lt;/td&gt;&lt;td class="views-field views-field-comment-count" style="padding-bottom: 0.3em; padding-left: 0.3em; padding-right: 0.3em; padding-top: 0.3em; width: 50px;"&gt;&lt;a href="http://www.gpucomputing.net/?q=node/1305#new" style="color: #6c9270; text-decoration: none;"&gt;0 new&lt;/a&gt;&lt;/td&gt;&lt;td class="views-field views-last-comment-timestamp active" style="background-color: inherit; padding-bottom: 0.3em; padding-left: 0.3em; padding-right: 0.3em; padding-top: 0.3em; width: 200px;"&gt;&lt;a alt="06-30-2010" href="http://www.gpucomputing.net/?q=node/1305" style="color: #6c9270; text-decoration: none;" title="06-30-2010"&gt;06-30-2010&lt;/a&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr class="even" style="background-attachment: initial; background-clip: initial; background-color: #eef9f4; background-image: initial; background-origin: initial; background-position: initial initial; background-repeat: initial initial; border-bottom-color: rgb(204, 204, 204); border-bottom-style: solid; border-bottom-width: 1px; padding-bottom: 0.1em; padding-left: 0.6em; padding-right: 0.6em; padding-top: 0.1em;"&gt;&lt;td class="views-field views-field-title" style="padding-bottom: 0.3em; padding-left: 0.3em; 
padding-right: 0.3em; padding-top: 0.3em; width: 280px;"&gt;&lt;a alt="RNA folding  GPU" href="http://www.gpucomputing.net/?q=node/1304" style="color: #6c9270; text-decoration: none;" title="RNA folding  GPU"&gt;RNA folding GPU&lt;/a&gt;&lt;/td&gt;&lt;td class="views-field views-field-name" style="padding-bottom: 0.3em; padding-left: 0.3em; padding-right: 0.3em; padding-top: 0.3em; width: 50px;"&gt;&lt;span style="color: red;"&gt;&lt;/span&gt;&lt;/td&gt;&lt;td class="views-field views-field-comment-count" style="padding-bottom: 0.3em; padding-left: 0.3em; padding-right: 0.3em; padding-top: 0.3em; width: 50px;"&gt;&lt;a href="http://www.gpucomputing.net/?q=node/1304#new" style="color: #6c9270; text-decoration: none;"&gt;0 new&lt;/a&gt;&lt;/td&gt;&lt;td class="views-field views-last-comment-timestamp active" style="background-color: inherit; padding-bottom: 0.3em; padding-left: 0.3em; padding-right: 0.3em; padding-top: 0.3em; width: 200px;"&gt;&lt;a alt="06-30-2010" href="http://www.gpucomputing.net/?q=node/1304" style="color: #6c9270; text-decoration: none;" title="06-30-2010"&gt;06-30-2010&lt;/a&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr class="odd" style="background-attachment: initial; background-clip: initial; background-color: #f5f5e9; background-image: initial; background-origin: initial; background-position: initial initial; background-repeat: initial initial; border-bottom-color: rgb(204, 204, 204); border-bottom-style: solid; border-bottom-width: 1px; padding-bottom: 0.1em; padding-left: 0.6em; padding-right: 0.6em; padding-top: 0.1em;"&gt;&lt;td class="views-field views-field-title" style="padding-bottom: 0.3em; padding-left: 0.3em; padding-right: 0.3em; padding-top: 0.3em; width: 280px;"&gt;&lt;a alt="Haar Classifiers for Object Detection with CUDA: Pixel-parallel processing kernel" href="http://www.gpucomputing.net/?q=node/1287" style="color: #6c9270; text-decoration: none;" title="Haar Classifiers for Object Detection with CUDA: Pixel-parallel processing 
kernel"&gt;Haar Classifiers for Object Detection with CUDA: Pixel-parallel processing kernel&lt;/a&gt;&lt;/td&gt;&lt;td class="views-field views-field-name" style="padding-bottom: 0.3em; padding-left: 0.3em; padding-right: 0.3em; padding-top: 0.3em; width: 50px;"&gt;&lt;span style="color: red;"&gt;&lt;/span&gt;&lt;/td&gt;&lt;td class="views-field views-field-comment-count" style="padding-bottom: 0.3em; padding-left: 0.3em; padding-right: 0.3em; padding-top: 0.3em; width: 50px;"&gt;&lt;a href="http://www.gpucomputing.net/?q=node/1287#new" style="color: #6c9270; text-decoration: none;"&gt;0 new&lt;/a&gt;&lt;/td&gt;&lt;td class="views-field views-last-comment-timestamp active" style="background-color: inherit; padding-bottom: 0.3em; padding-left: 0.3em; padding-right: 0.3em; padding-top: 0.3em; width: 200px;"&gt;&lt;a alt="06-29-2010" href="http://www.gpucomputing.net/?q=node/1287" style="color: #6c9270; text-decoration: none;" title="06-29-2010"&gt;06-29-2010&lt;/a&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr class="even" style="background-attachment: initial; background-clip: initial; background-color: #eef9f4; background-image: initial; background-origin: initial; background-position: initial initial; background-repeat: initial initial; border-bottom-color: rgb(204, 204, 204); border-bottom-style: solid; border-bottom-width: 1px; padding-bottom: 0.1em; padding-left: 0.6em; padding-right: 0.6em; padding-top: 0.1em;"&gt;&lt;td class="views-field views-field-title" style="padding-bottom: 0.3em; padding-left: 0.3em; padding-right: 0.3em; padding-top: 0.3em; width: 280px;"&gt;&lt;a alt="Multiclass Support Vector Machine " href="http://www.gpucomputing.net/?q=node/1281" style="color: #6c9270; text-decoration: none;" title="Multiclass Support Vector Machine "&gt;Multiclass Support Vector Machine&lt;/a&gt;&lt;/td&gt;&lt;td class="views-field views-field-name" style="padding-bottom: 0.3em; padding-left: 0.3em; padding-right: 0.3em; padding-top: 0.3em; width: 50px;"&gt;&lt;span 
style="color: red;"&gt;&lt;/span&gt;&lt;/td&gt;&lt;td class="views-field views-field-comment-count" style="padding-bottom: 0.3em; padding-left: 0.3em; padding-right: 0.3em; padding-top: 0.3em; width: 50px;"&gt;&lt;a href="http://www.gpucomputing.net/?q=node/1281#new" style="color: #6c9270; text-decoration: none;"&gt;0 new&lt;/a&gt;&lt;/td&gt;&lt;td class="views-field views-last-comment-timestamp active" style="background-color: inherit; padding-bottom: 0.3em; padding-left: 0.3em; padding-right: 0.3em; padding-top: 0.3em; width: 200px;"&gt;&lt;a alt="06-29-2010" href="http://www.gpucomputing.net/?q=node/1281" style="color: #6c9270; text-decoration: none;" title="06-29-2010"&gt;06-29-2010&lt;/a&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr class="odd" style="background-attachment: initial; background-clip: initial; background-color: #f5f5e9; background-image: initial; background-origin: initial; background-position: initial initial; background-repeat: initial initial; border-bottom-color: rgb(204, 204, 204); border-bottom-style: solid; border-bottom-width: 1px; padding-bottom: 0.1em; padding-left: 0.6em; padding-right: 0.6em; padding-top: 0.1em;"&gt;&lt;td class="views-field views-field-title" style="padding-bottom: 0.3em; padding-left: 0.3em; padding-right: 0.3em; padding-top: 0.3em; width: 280px;"&gt;&lt;a alt="Parallelization of the x264 encoder using OpenCL" href="http://www.gpucomputing.net/?q=node/1143" style="color: #6c9270; text-decoration: none;" title="Parallelization of the x264 encoder using OpenCL"&gt;Parallelization of the x264 encoder using OpenCL&lt;/a&gt;&lt;/td&gt;&lt;td class="views-field views-field-name" style="padding-bottom: 0.3em; padding-left: 0.3em; padding-right: 0.3em; padding-top: 0.3em; width: 50px;"&gt;&lt;span style="color: red;"&gt;&lt;/span&gt;&lt;/td&gt;&lt;td class="views-field views-field-comment-count" style="padding-bottom: 0.3em; padding-left: 0.3em; padding-right: 0.3em; padding-top: 0.3em; width: 50px;"&gt;&lt;a 
href="http://www.gpucomputing.net/?q=node/1143#new" style="color: #6c9270; text-decoration: none;"&gt;0 new&lt;/a&gt;&lt;/td&gt;&lt;td class="views-field views-last-comment-timestamp active" style="background-color: inherit; padding-bottom: 0.3em; padding-left: 0.3em; padding-right: 0.3em; padding-top: 0.3em; width: 200px;"&gt;&lt;a alt="06-21-2010" href="http://www.gpucomputing.net/?q=node/1143" style="color: #6c9270; text-decoration: none;" title="06-21-2010"&gt;06-21-2010&lt;/a&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr class="even" style="background-attachment: initial; background-clip: initial; background-color: #eef9f4; background-image: initial; background-origin: initial; background-position: initial initial; background-repeat: initial initial; border-bottom-color: rgb(204, 204, 204); border-bottom-style: solid; border-bottom-width: 1px; padding-bottom: 0.1em; padding-left: 0.6em; padding-right: 0.6em; padding-top: 0.1em;"&gt;&lt;td class="views-field views-field-title" style="padding-bottom: 0.3em; padding-left: 0.3em; padding-right: 0.3em; padding-top: 0.3em; width: 280px;"&gt;&lt;a alt="Cone-Beam CT image reconstruction using the Katsevich Algorithm" href="http://www.gpucomputing.net/?q=node/1142" style="color: #6c9270; text-decoration: none;" title="Cone-Beam CT image reconstruction using the Katsevich Algorithm"&gt;Cone-Beam CT image reconstruction using the Katsevich Algorithm&lt;/a&gt;&lt;/td&gt;&lt;td class="views-field views-field-name" style="padding-bottom: 0.3em; padding-left: 0.3em; padding-right: 0.3em; padding-top: 0.3em; width: 50px;"&gt;&lt;span style="color: red;"&gt;&lt;/span&gt;&lt;/td&gt;&lt;td class="views-field views-field-comment-count" style="padding-bottom: 0.3em; padding-left: 0.3em; padding-right: 0.3em; padding-top: 0.3em; width: 50px;"&gt;&lt;a href="http://www.gpucomputing.net/?q=node/1142#new" style="color: #6c9270; text-decoration: none;"&gt;0 new&lt;/a&gt;&lt;/td&gt;&lt;td class="views-field views-last-comment-timestamp active" 
style="background-color: inherit; padding-bottom: 0.3em; padding-left: 0.3em; padding-right: 0.3em; padding-top: 0.3em; width: 200px;"&gt;&lt;a alt="06-21-2010" href="http://www.gpucomputing.net/?q=node/1142" style="color: #6c9270; text-decoration: none;" title="06-21-2010"&gt;06-21-2010&lt;/a&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr class="odd" style="background-attachment: initial; background-clip: initial; background-color: #f5f5e9; background-image: initial; background-origin: initial; background-position: initial initial; background-repeat: initial initial; border-bottom-color: rgb(204, 204, 204); border-bottom-style: solid; border-bottom-width: 1px; padding-bottom: 0.1em; padding-left: 0.6em; padding-right: 0.6em; padding-top: 0.1em;"&gt;&lt;td class="views-field views-field-title" style="padding-bottom: 0.3em; padding-left: 0.3em; padding-right: 0.3em; padding-top: 0.3em; width: 280px;"&gt;&lt;a alt="Line forward projection on CUDA" href="http://www.gpucomputing.net/?q=node/1031" style="color: #6c9270; text-decoration: none;" title="Line forward projection on CUDA"&gt;Line forward projection on CUDA&lt;/a&gt;&lt;/td&gt;&lt;td class="views-field views-field-name" style="padding-bottom: 0.3em; padding-left: 0.3em; padding-right: 0.3em; padding-top: 0.3em; width: 50px;"&gt;&lt;span style="color: red;"&gt;&lt;/span&gt;&lt;/td&gt;&lt;td class="views-field views-field-comment-count" style="padding-bottom: 0.3em; padding-left: 0.3em; padding-right: 0.3em; padding-top: 0.3em; width: 50px;"&gt;&lt;a href="http://www.gpucomputing.net/?q=node/1031#new" style="color: #6c9270; text-decoration: none;"&gt;0 new&lt;/a&gt;&lt;/td&gt;&lt;td class="views-field views-last-comment-timestamp active" style="background-color: inherit; padding-bottom: 0.3em; padding-left: 0.3em; padding-right: 0.3em; padding-top: 0.3em; width: 200px;"&gt;&lt;a alt="06-11-2010" href="http://www.gpucomputing.net/?q=node/1031" style="color: #6c9270; text-decoration: none;" 
title="06-11-2010"&gt;06-11-2010&lt;/a&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;br /&gt;seems MareNostrum getting a rack of Fermis&amp;nbsp;perhaps with IBM Power7&lt;br /&gt;&lt;br /&gt;&lt;pre style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; white-space: pre-wrap; word-wrap: break-word;"&gt;see now Nvidia would have to publish a PowerPC arch CUDA driver?&lt;/pre&gt;&lt;br /&gt;Or using PathScale with full open source based computing stack..&lt;br /&gt;avaiable here branch from noveau:&lt;br /&gt;&lt;br /&gt;&lt;pre style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; white-space: pre-wrap; word-wrap: break-word;"&gt;http://github.com/pathscale/pscnv/commits/master&lt;/pre&gt;&lt;pre style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; white-space: pre-wrap; word-wrap: break-word;"&gt;&lt;/pre&gt;&lt;div&gt;Seems&amp;nbsp;Nvidia TCC supporting driver Fermi in IBM web site version 197.81&lt;/div&gt;&lt;br /&gt;Catalyst 10.8 beta seems avaiable 10.7 coming 21/7..&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Physx 3.0 coming with CPU improvements:&lt;br /&gt;*auto threading&lt;br /&gt;*sse enabled by default&lt;br /&gt;Mafia has new runtimes NVIDIA PhysX driver: 10.04.02_9.10.0522.&lt;br /&gt;Mueller has post paper of Fermi launch demo using water heigh fields plus particles..&lt;br /&gt;Two other papers interesting from Nvidia research are:&lt;br /&gt;&lt;br /&gt;HLBVH: Hierarchical LBVH Construction for Real-Time Ray Tracing&lt;br /&gt;PantaRay: Fast Ray-traced Occlusion Caching of Massive Scenes&lt;br /&gt;&lt;br /&gt;Hwu based course from Stanford:&lt;br /&gt;http://code.google.com/p/stanford-cs193g-sp2010/wiki/ClassSchedule&lt;br /&gt;&lt;br /&gt;Two interesting conferences program avaiable:&lt;br /&gt;&lt;br /&gt;PACT&lt;br /&gt;has intel gpu paper demystifying ..&lt;br /&gt;also Revisiting Sorting for GPGPU Stream Architectures&lt;br /&gt;which achieves near 
500 Mkeys/s on GT200..&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;There is also a workshop on GPUs:&lt;br /&gt;&lt;a href="http://informatik.technikum-wien.at/gpusca/"&gt;http://informatik.technikum-wien.at/gpusca/&lt;/a&gt;&lt;br /&gt;but its web site doesn't work.&lt;br /&gt;&lt;br /&gt;&lt;pre style="white-space: pre-wrap; word-wrap: break-word;"&gt;The Nineteenth International Conference on&lt;br /&gt;Parallel Architectures and Compilation Techniques (PACT)&lt;br /&gt;Vienna, Austria, September 11-15, 2010&lt;/pre&gt;&lt;pre style="white-space: pre-wrap; word-wrap: break-word;"&gt;Interesting papers:&lt;br /&gt;Scalable Thread Scheduling and Global Power Management for Heterogeneous Many-Core Architectures&lt;br /&gt;Dynamically Managed Multithreaded Reconfigurable Architectures for Chip Multiprocessors&lt;br /&gt;WAYPOINT: Scaling Coherence to Thousand-core Architectures&lt;br /&gt;Scalable Hardware Support for Conditional Parallelization&lt;br /&gt;Less is More: Trading off Work-Efficiency for Scalability in Irregular Programs&lt;br /&gt;Revisiting Sorting for GPGPU Stream Architectures&lt;br /&gt;D. Merrill, A. Grimshaw&lt;br /&gt;An Integer Programming Framework for Optimizing Shared Memory Use on GPUs&lt;br /&gt;W. Ma, G. Agrawal&lt;br /&gt;DMATiler: Revisiting Loop Tiling for Direct Memory Access&lt;br /&gt;A Software-SVM-based Transactional Memory for Multicore Accelerator Architectures with Local Memory&lt;br /&gt;Automatic Vector Instruction Selection for Dynamic Compilation&lt;br /&gt;An OpenCL Framework for Heterogeneous Multicores with Local Memory&lt;/pre&gt;
&lt;br /&gt;SC10&lt;br /&gt;&lt;br /&gt;&lt;pre style="white-space: pre-wrap; word-wrap: break-word;"&gt;I would like to review these papers:&lt;br /&gt;Scalable Tile Communication-Avoiding QR Factorization on Multicore Cluster Systems&lt;br /&gt;Parallel Fast Gauss Transform&lt;br /&gt;Overlapping Methods of All-to-All Communication and FFT Algorithms for Torus-Connected Massively Parallel Supercomputers&lt;br /&gt;The Multi-Scale Heart Simulation on Massively Parallel Computers&lt;br /&gt;Using 3.5-D Blocking Optimization for Stencil Computations on Modern CPUs and GPUs&lt;br /&gt;An 80-Fold Speedup, 15.0 TFlops, Full GPU Acceleration of Non-Hydrostatic Weather Model ASUCA Production Code&lt;br /&gt;Exploiting 162-Nanosecond End-to-End Communication Latency on Anton&lt;br /&gt;Strider: Runtime Support for Optimizing Strided Data Accesses on Multi-Cores with Explicitly Managed Memories&lt;br /&gt;Multithreaded Asynchronous Graph Traversal for In-Memory and Semi-External Memory&lt;br /&gt;OpenMPC: Extended OpenMP Programming and Tuning for GPUs&lt;br /&gt;Scalable Graph Exploration on Multicore Processors&lt;br /&gt;The 48-core SCC processor: the programmer’s view&lt;br /&gt;Exploring a Novel Gathering Method for Finite Element Codes on the Cell/B.E. Architecture&lt;br /&gt;Reducing Multicore Bandwidth Requirements for Combinatorial Multigrid&lt;br /&gt;Diagnosis, Tuning and Redesign for Multicore Performance: A Case Study of the Fast Multipole Method&lt;br /&gt;Scaling Hierarchical N-Body Simulations on GPU Clusters&lt;br /&gt;Size Matters: Space/Time Tradeoffs to Improve GPGPU Applications Performance&lt;br /&gt;The Sharing Tracker: Using Ideas from Cache Coherence Hardware to Reduce Off-Chip Memory Traffic with Non-Coherent Caches&lt;/pre&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8553786559872430029-8335020641159591646?l=oscarbg.blogspot.com' alt='' /&gt;&lt;/div&gt;</content>
<link rel='replies' type='application/atom+xml' href='http://oscarbg.blogspot.com/feeds/8335020641159591646/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://oscarbg.blogspot.com/2010/07/some-news.html#comment-form' title='8 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8553786559872430029/posts/default/8335020641159591646'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8553786559872430029/posts/default/8335020641159591646'/><link rel='alternate' type='text/html' href='http://oscarbg.blogspot.com/2010/07/some-news.html' title='Some news!'/><author><name>oscarbg</name><uri>http://www.blogger.com/profile/10172785372134961220</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>8</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8553786559872430029.post-4503570452015264066</id><published>2010-07-05T04:41:00.002+02:00</published><updated>2010-07-05T04:45:54.174+02:00</updated><title type='text'>DirectCompute Double precision Mandelbrot demo and more..</title><content type='html'>In addition to my first 
demo using double precision on &lt;a href="http://oscarbg.blogspot.com/2010/04/mandelbrot-using-ogl-40-features-double.html"&gt;GL 4.0 here&lt;/a&gt;, now on DirectCompute:&lt;br /&gt;&lt;b&gt;THIS DEMO NEEDS THE DX JUNE 2010 RUNTIMES&lt;/b&gt;&lt;br /&gt;so update if needed&lt;br /&gt;&lt;br /&gt;&lt;div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"&gt;This test on AMD&amp;nbsp;shows an ATI DirectCompute DPFP bug.. it renders incorrectly..&lt;/div&gt;Also note I learned DirectCompute doesn't allow division with doubles, so I had to replace /2 with *0.5 (the result is bit-identical, since 0.5 is a power of two).&lt;br /&gt;Nvidia Fermi works OK!&lt;br /&gt;&lt;a href="http://dl.dropbox.com/u/1416327/MandelDX11double.rar"&gt;DirectCompute Double precision Mandelbrot&lt;/a&gt;&amp;nbsp;(includes source, based almost 100% on the Voxilla demo):&lt;br /&gt;&lt;br /&gt;&lt;div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"&gt;Use test.bat: the app starts at a high zoom so it shows DP in action.. if you exit with Esc, it then shows the same rendering in SPFP.. note you can zoom in and out with the mouse..&amp;nbsp;&lt;/div&gt;&lt;div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"&gt;The .bat calls mandel.exe 0 for SP or mandel.exe 1 for DP..&lt;/div&gt;&lt;div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"&gt;Also note I expected better perf on AMD than on Nvidia, but both run slowly: Nvidia runs at its full (capped) speed, i.e. GeForce DP is capped at 1/8 of the SP rate, unlike Teslas, while AMD has perf issues, as it should run at least 3-4x faster than Nvidia Fermi..&lt;br /&gt;Vector mode also runs somewhat faster than the scalar shader (but not by much: it could be up to 4x faster if the compiler didn't already extract perf from the scalar code, yet it gains less than in SP, where vector code outperforms scalar code by a larger margin).. 
Fermi perf is unaffected by using vector code..&lt;/div&gt;&lt;div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"&gt;Correct behavior:&lt;/div&gt;&lt;div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://3.bp.blogspot.com/_I4-UBBtkAT4/TDFFTiYJWwI/AAAAAAAAAHA/G_g8tjUvNGw/s1600/dx11m.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="260" src="http://3.bp.blogspot.com/_I4-UBBtkAT4/TDFFTiYJWwI/AAAAAAAAAHA/G_g8tjUvNGw/s400/dx11m.jpg" width="400" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;Double precision (on GTX 470)&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://dl.dropbox.com/u/1416327/dx11.jpg"&gt;See full window&lt;/a&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://1.bp.blogspot.com/_I4-UBBtkAT4/TDFFssDrv-I/AAAAAAAAAHI/oFXoFgA9Ou4/s1600/dx11fp.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="251" src="http://1.bp.blogspot.com/_I4-UBBtkAT4/TDFFssDrv-I/AAAAAAAAAHI/oFXoFgA9Ou4/s400/dx11fp.jpg" width="400" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; text-align: center;"&gt;Single precision&lt;/div&gt;&lt;div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; text-align: center;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; text-align: center;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; text-align: left;"&gt;On an AMD 5850, DP renders as follows (I will post the image soon):&lt;/div&gt;&lt;div style="margin-bottom: 
0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; text-align: left;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"&gt;Also related: I patched the&amp;nbsp;Nvidia PhysX demo to work on AMD by changing GLSL code that used non-standard Cg functions.. it exhibits some OpenGL&amp;nbsp;bugs.&lt;/div&gt;&lt;div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"&gt;Instructions:&lt;/div&gt;&lt;div&gt;Download the Nvidia PhysX demo&lt;a href="http://www.nvidia.com/content/forcewithin2/us/download.asp"&gt; here &lt;/a&gt;(select&amp;nbsp;&lt;span class="Apple-style-span" style="font-family: 'Trebuchet MS', sans-serif; font-size: 17px; line-height: 20px;"&gt;&lt;b style="font-size: 14px; font-weight: bold; text-transform: uppercase;"&gt;FLUIDS: TECHNOLOGY DEMO)&lt;/b&gt;&lt;/span&gt;&lt;/div&gt;&lt;br /&gt;and use &lt;a href="http://dl.dropbox.com/u/1416327/ParticleFluidDemoatifix.rar"&gt;this executable&lt;/a&gt; to run it on AMD cards (extract it into the demo dir).&lt;br /&gt;On AMD cards it shows artifacts, not in the rendering itself but on the desktop outside the program window..&lt;br /&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8553786559872430029-4503570452015264066?l=oscarbg.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://oscarbg.blogspot.com/feeds/4503570452015264066/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://oscarbg.blogspot.com/2010/07/directcompute-double-precision.html#comment-form' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8553786559872430029/posts/default/4503570452015264066'/><link rel='self' type='application/atom+xml' 
href='http://www.blogger.com/feeds/8553786559872430029/posts/default/4503570452015264066'/><link rel='alternate' type='text/html' href='http://oscarbg.blogspot.com/2010/07/directcompute-double-precision.html' title='DirectCompute Double precision Mandelbrot demo and more..'/><author><name>oscarbg</name><uri>http://www.blogger.com/profile/10172785372134961220</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/_I4-UBBtkAT4/TDFFTiYJWwI/AAAAAAAAAHA/G_g8tjUvNGw/s72-c/dx11m.jpg' height='72' width='72'/><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8553786559872430029.post-4694691590495437356</id><published>2010-07-04T21:06:00.003+02:00</published><updated>2010-07-05T00:49:13.241+02:00</updated><title type='text'>A lot of things you probably don't know.. and a worth it..</title><content type='html'>&lt;div&gt;*&lt;span class="Apple-style-span" style="font-family: monospace; white-space: pre-wrap;"&gt;TCC support for GF100 products will be out next week; these drivers will also support running simultaneously with the normal graphics drivers (that support OGL, DX, DXVA, etc..). I suspect the graphics and TCC drivers will have to be the same version, as both write DLLs into the Windows system folder..&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: monospace; white-space: pre-wrap;"&gt;I hope the inf trick still works so I can enable it on GeForce Fermi, and also that this works with Nsight.. anyway it's not severe, as the 25x drivers seem to add support for CUDA cards (even GeForces) without the desktop extended onto them, so kernel exec time needn't be limited by TDR.. 
Before, it required two Nvidia cards, one of them without the desktop extended onto it; if you used, say, an ATI card plus an Nvidia card without the desktop extended onto it, e.g. to use Nsight (which requires no desktop on the card), it failed, since CUDA would not find a CUDA card..&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: monospace; white-space: pre-wrap;"&gt;*There is support for Fermi on Mac OS right now in the Nvidia 19.5.8f03 drivers released a month ago, although it went unannounced: they include NVDAGF100HAL.kext..&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: monospace; white-space: pre-wrap;"&gt;Anyway, only OGL support works, as neither CUDA nor OCL use it..&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: monospace; white-space: pre-wrap;"&gt;I have to use the NVloader injector, which doesn't work with Fermi in 64-bit kernel mode anyway.. note a GTX 275 also works in 64-bit with this injector..&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: monospace; white-space: pre-wrap;"&gt;I wanted to fix it, and all I found was cuGetExportTable and something like MacCompatibiltyTID used by a checkcompatibility executable; perhaps patching that would make it work..&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: monospace; white-space: pre-wrap;"&gt;Someone on the Nvidia forums claimed the broken OCL was fixed by creating an OGL context before searching for OCL devices (oclgetdevice), but this trick didn't work..&lt;/span&gt;&lt;br /&gt;*Storing ELF binaries instead of CUBIN rules out decuda; hopefully &lt;a href="http://forums.nvidia.com/index.php?showtopic=172577"&gt;this is a very interesting solution&lt;/a&gt;..&lt;br /&gt;&lt;br /&gt;*From the MAGMA webinar, it seems there will be a big release for SC2010 with some big features; check the MAGMA presentation for what to expect..&lt;br /&gt;*PhysX 3.0 is nearing launch: the PhysX Visual Debugger includes support for it, as its release note 
says..&lt;/div&gt;&lt;div&gt;Note this brings concurrent kernel support on Fermi for improved perf in physics simulations.. hopefully it also includes the wrinkle meshes feature studied by Mueller.&lt;/div&gt;&lt;div&gt;Note also that the GPU AI notes say it will use function pointers once they are supported in CUDA, so expect a new release sometime, optimized even more for Fermi too..&lt;/div&gt;&lt;div&gt;Probably announced at Siggraph.. even if it launches later..&lt;/div&gt;&lt;div&gt;I also hope to see APEX shipping beyond big AAA games, i.e. downloadable for everyone..&lt;/div&gt;&lt;div&gt;Lastly, I expect OptiX 2.0 and Cg 3.0 final&amp;nbsp;for Siggraph; let's also see OpenRL in time: OpenCL support for GPUs would make it interesting for ATI.. Note also that Luxrender GPU 1.6 brings Stochastic Photon Mapping and also uses OCL on ATI GPUs..&lt;/div&gt;&lt;div&gt;&lt;div&gt;*Nsight is also moving fast: from beta in early June, it is now in RC state.. launching at Siggraph?&lt;/div&gt;&lt;/div&gt;&lt;div&gt;*ATI doubles in DirectCompute are broken.. although the feature flag is supported..&lt;/div&gt;&lt;div&gt;Now we can test it with the June DX compiler; before, it was broken for doubles inside control flow (loops, ifs, etc..)&lt;br /&gt;Compiling mostly works, but rendering shows issues, unlike Fermi, which handles it nicely..&lt;br /&gt;Download my code.. (coming soon..)&lt;br /&gt;*The ATI GLSL driver is somewhat broken, at least for geometry shaders it seems: I fixed the Nvidia PhysX fluid demo to use non-Cg GLSL code, plus some other fixes related to point rendering, and now it seems to work, but not without instabilities that show up as noise on screen, even outside the window it fills..&lt;br /&gt;Download and test..&amp;nbsp;(coming soon..)&lt;br /&gt;Also, the GLSL driver doesn't implement fetching from integer textures with integer coordinates (texel2Dfetch( itex))&lt;br /&gt;*CUDA 3.1 ships with three interesting examples: one is oclTridiagonal, a fast tridiagonal solver.. 
interesting for building a cinematic DoF renderer as in Metro using OCL/OGL..&lt;br /&gt;Another one is oclCopyComputeOverlap, which shows two things: first, that concurrent copy and kernel exec is possible in OCL.. via command queues; second, that there is an issue in the 25x drivers that prevents full scaling: I think the right figure is 30% faster code, and I get 20% on 25x drivers.. on 197 drivers I get 30%..&lt;br /&gt;Note that both the ATI and Apple platforms (the latter even with Nvidia GPUs) exhibit no scaling, or even negative scaling (-15%)&lt;br /&gt;The good news is that the issue is fixed in the 258.19 OCL 1.1 preview drivers, which report CUDA 3.2, so I get the 30% overlap back.. Note that other 258 drivers don't work (as they report the older CUDA 3.1 and OCL 1.0)..&lt;br /&gt;One more interesting thing: supposedly even the dual DMA engine should work in OCL, so overlap would be 50%.. it seems restricted to Tesla, but Nvidia has been less explicit about this than about the double-precision capping on GeForce..&lt;br /&gt;Luckily I have a trick for you: the 197.44 driver seems to support the dual DMA engine on GeForce Fermi too!&lt;br /&gt;This is the OGL 4.0 driver, so all you lose versus the current 256 drivers is the CUDA 3.1 features.. on Linux, also use the OGL 4.0 driver from developer.nvidia.com and you have it...&lt;br /&gt;Note also that 197.75 etc. don't work; it only works with this one..&lt;br /&gt;*So it seems the dual DMA engine is broken/disabled on GeForce Fermi for no reason other than an economic one..&lt;br /&gt;*The CUDA simpleStreams sample seems to show broken streams on Fermi, but that's due to not sending enough work.. an easy fix..&lt;br /&gt;*The matmul by Lschien is one of the fastest for CUDA, but it currently fails on Fermi because it uses cubins obtained by modifying Tesla asm via decuda/cudaasm.. thank God it seems related to the volatile keyword not working correctly before CUDA 3.0.. the author suggests a fix, assuming this works, that uses CUDA variant 6.. 
I have tested it and it works, so it's fixed: I get nearly 850 GFLOPS on a Fermi GTX 470 at 1650 MHz..&lt;br /&gt;*Lots of software has been updated to CUDA 3.x, some even to 3.1 already: NPP 3.1, CULA 2.0, JACKET 1.4, OpenMM 2.0 in the Zephyr SVN, Gromacs 4.5 beta, GMAC, etc..&lt;br /&gt;&lt;br /&gt;More news:&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;pre style="white-space: pre-wrap; word-wrap: break-word;"&gt;Also, Nvidia has released a lot of drivers on the 256 branch; let's see the rough differences/progression:&lt;br /&gt;197.44: first OGL 4.0 driver, and the only one supporting the dual DMA engine on Fermi outside Tesla/Quadro boards.. also has no issues with single DMA..&lt;br /&gt;256: adds CUDA 3.1; currently all of these have issues with concurrent copy and kernel exec on Fermi, at least in OCL&lt;br /&gt;257.15: Blu-ray 3D&lt;br /&gt;257.19: Nsight June beta driver&lt;br /&gt;257.21: WHQL (supports Nsight)&lt;br /&gt;257.29: ION support for accelerated DXVA Flash on PCIe x1 devices&lt;br /&gt;258.18: OCL 1.1 beta (says CUDA 3.2!), fixes the oclCopyCompute issues (but single DMA on Fermi)&lt;br /&gt;258.48: first to support Quadro Fermis..&lt;/pre&gt;&lt;pre style="white-space: pre-wrap; word-wrap: break-word;"&gt;258.69: ships with 3D Vision Surround (Nvidia nTersect says YouTube 3D support is coming soon.. I also hope they add windowed DX 3D Vision support soon..)&lt;/pre&gt;&lt;pre style="white-space: pre-wrap; word-wrap: break-word;"&gt;Some other striking news :-)&lt;br /&gt;*OpenCurrent 1.1 ships with CUDA 3.0 and multi-GPU code..&lt;br /&gt;Well, I have been testing with CUDA 3.1 because I have Ubuntu 9.10, and with CUDA 3.1 GCC 4.4 works OK (so Ubuntu 10.04 is fine too..); there is some issue related to CUDA now supporting true function calls: I think I must add static to a function, as the CUDA 3.1 release notes porting guide says.. with CUDA 3.0, GCC 4.4 doesn't work, so if I can't fix it I'll have to check with Ubuntu 9.04..&lt;br /&gt;*An OpenMP-to-CUDA compiler is available in Cetus 1.2.&lt;br /&gt;*PGI 10.6 is available, with at least integer support in kernels and VS 2010 support.&lt;br /&gt;&lt;br /&gt;I have tested GATLAS and it is good: at least 260 GFLOPS on a GTX 275.. and I tested it on Mac, so it works on Linux and Mac without much work; the author says that with a 5870 and Stream 2.1 some image kernels achieve 1.3 TFLOPS, similar to the CAL++ matmul, but in OpenCL! I have to test, or modify the code(?), for doubles..&lt;br /&gt;&lt;br /&gt;Some tricks and work to do:&lt;/pre&gt;&lt;pre style="white-space: pre-wrap; word-wrap: break-word;"&gt;RAW DATA:&lt;br /&gt;I know it's lame, but at least you can emulate 3D image writes in CUDA with surfaces using PTX 3D tricks (post later).&lt;br /&gt;I have to post a CUVID sample for Mac.&lt;br /&gt;simpleStreams in CUDA seems bad on Fermi; the forums say to increase the work to 500.&lt;br /&gt;matmul (Chien): the author says to add volatile and check (works!)&lt;br /&gt;BSGP Fermi support: checking by mail with the author..&lt;br /&gt;Test the ATI sparse matrix code on Fermi..&lt;br /&gt;&lt;br /&gt;See Fermi benchmarks:&lt;br /&gt;Nvidia benchmarks on its blog&lt;br /&gt;OpenVIDIA benchmarks..&lt;br /&gt;CULA blog&lt;br /&gt;Jacket blog&lt;br /&gt;some papers from the HPG 2010 presentations: Billeter (scattering) and McGuire (AOV)..&lt;br /&gt;it also seems code for rasterization and colored stochastic shadow maps is coming soon..&lt;/pre&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8553786559872430029-4694691590495437356?l=oscarbg.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://oscarbg.blogspot.com/feeds/4694691590495437356/comments/default' title='Post Comments'/><link rel='replies' type='text/html' 
href='http://oscarbg.blogspot.com/2010/07/lot-of-things-you-probably-dont-know.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8553786559872430029/posts/default/4694691590495437356'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8553786559872430029/posts/default/4694691590495437356'/><link rel='alternate' type='text/html' href='http://oscarbg.blogspot.com/2010/07/lot-of-things-you-probably-dont-know.html' title='A lot of things you probably don&apos;t know.. and a worth it..'/><author><name>oscarbg</name><uri>http://www.blogger.com/profile/10172785372134961220</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8553786559872430029.post-7392297602439838525</id><published>2010-07-03T15:40:00.004+02:00</published><updated>2010-07-03T15:43:35.808+02:00</updated><title type='text'>ATI Stream SDK roadmap</title><content type='html'>I have found a roadmap of the ATI Stream SDK through the end of the year:&lt;br /&gt;&lt;div&gt;DISCLAIMER: It's on the Internet and I found it with some luck.. 
no NDA was broken&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://3.bp.blogspot.com/_I4-UBBtkAT4/TC8q1Bhp0aI/AAAAAAAAAG4/5h6f3gV3CTs/s1600/roadmap.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="297" src="http://3.bp.blogspot.com/_I4-UBBtkAT4/TC8q1Bhp0aI/AAAAAAAAAG4/5h6f3gV3CTs/s400/roadmap.jpg" width="400" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;Let's talk about it..&lt;br /&gt;Currently AMD OpenCL lacks:&lt;br /&gt;*working OpenGL interop for images (for example, copying a buffer to an image that is an acquired OpenGL texture doesn't work)&lt;br /&gt;*images with channel orders other than RGBA&lt;br /&gt;*DX interop&lt;br /&gt;*exposing all graphics memory (currently 128-256 MB)&lt;br /&gt;*Catalyst integration&lt;br /&gt;&lt;br /&gt;Stream SDK 2.2&amp;nbsp;adds:&lt;br /&gt;*OCL 1.1 (3-component vectors are part of it, and OCL 1.1 image support includes images with fewer components (R, RG, RGB))&lt;br /&gt;*DX10 interop (it seems only that, no DX9 or DX11 as Nvidia has)&lt;br /&gt;*mem fences no longer generate unneeded barrier ISA instructions&lt;br /&gt;*append buffers (what about a GDS extension too?)&lt;br /&gt;*OCL 1.1 atomics seem to be nothing new? Offline compilation goes final from preview, and DPFP adds FMA, as the other ops are supported now(?)&lt;br /&gt;DPFP FMA should let peak test kernels in benchmarks show high numbers.. near 400-500 GFLOP/s..&lt;br /&gt;&lt;br /&gt;Much more interesting is 2.3:&lt;br /&gt;*In-process compilation of OpenCL kernels means no shipping of LLVM compilers (llc, etc..) 
and hopefully means it will be integrated in atiocl.dll, so OpenCL can ship built into Catalyst 10.12..&lt;br /&gt;*Library models&lt;br /&gt;*C++ template support in kernels (I hope this means you can at least make kernel args depend on a template argument, e.g. to support double and float kernels with one code base, similar to CUDA)&lt;br /&gt;*Adds trig DPFP routines (but still no complete DPFP support, which seems horrible, as Nvidia has shipped it since October 2009 and AMD has said support is coming gradually since the end of 2009)&lt;br /&gt;The most interesting are the last three:&lt;br /&gt;*FFT library: why not a BLAS lib too? I suspect it is OCL-based, as DirectCompute has its own FFT lib&lt;br /&gt;Is it also going to be part of ACML? Currently matmul in ACML-GPU is CAL-based..&lt;br /&gt;At least I hope it will be a binary-only library for Win and Lin; for Mac, I hope we can somehow extract the&amp;nbsp;OpenCL kernels, or create a wrapper around it and use Wine or something similar, to test whether perf on Mac with AMD boards is correct..&lt;br /&gt;*OpenPhysics: at least something to play with; I expect cloth, soft body and SPH particle support in OpenCL and/or DirectCompute.. on the Bullet site there is a preliminary executable with a cloth demo, and an AMD employee talking about the state of soft body support (&lt;a href="http://code.google.com/p/bullet/issues/detail?id=390#c3"&gt;http://code.google.com/p/bullet/issues/detail?id=390#c3&lt;/a&gt;); it seems that since last week we also have DirectCompute and OpenCL code for both cloth and soft body in trunk..&lt;br /&gt;Also, by September we will have DMM 2.0, as announced at GDC, with some OpenCL love for this rigid body + fracture simulator..&lt;br /&gt;*OpenDecode UVD: basically a CUVID/VDPAU-like library for AMD boards.. Nvidia has put a lot of love into GPU video decoding and interop with CUDA/OpenGL, with CUVID for Win and Mac and VDPAU for Linux..&lt;br /&gt;Since the 256 drivers, VDPAU has efficient OpenGL and CUDA interop.. 
CUVID has efficient CUDA interop by definition and fast OpenGL/DX interop on Windows.. CUVID for Mac only seems good for feeding data to CUDA, as OpenGL interop on Mac is slow right now (and always has been)..&lt;br /&gt;I expect this brings fast interop to OpenCL on Win and Lin, adding to DXVA/DX interop on Win and AMD XvBA on Linux, where the VA-API wrapper seems to provide fast OGL interop..&lt;br /&gt;So Mac seems left out, but I hope the recent video acceleration API in 10.6.3 will support AMD 5xxx cards when they are released, and also that VC-1 support is added in addition to H.264.. I think this API provides a fast path to OpenGL textures, so since OpenCL/OpenGL interop is fast on Apple, it would also provide OpenCL interop on that platform..&lt;br /&gt;Another question is whether dual-stream acceleration will be exposed and supported.. on Nvidia I think DXVA, CUVID and VDPAU all expose it, at least with a GTX 470..&lt;br /&gt;Also related: Catalyst 10.7 has improved support for VLC 1.1.1 DXVA decoding on AMD cards, which I presume means fast-path GPU/CPU sending of frames works..&lt;br /&gt;Remember also that last month Nvidia released an ION driver (257.29) improving DXVA perf on ION with PCIe x1, as Flash requires (GPU-&amp;gt;CPU-&amp;gt;GPU round trip)..&lt;br /&gt;&lt;br /&gt;What's left after OCL 1.1 and Stream SDK 2.3:&lt;br /&gt;Well, I expect Global Data Share and shared registers extensions, 3D image writes, true complete DPFP support (cl_khr_fp64), complete BLAS and FFT libs (like CUBLAS and CUFFT in CUDA), &amp;nbsp;working pinned memory, an extension for GPU access to host memory, gather4 instructions for image support in OpenCL, and working concurrent kernel execution and memory transfers (i.e. 
concurrency in the oclCopyCompute CUDA 3.1 example &amp;gt;=20%)&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8553786559872430029-7392297602439838525?l=oscarbg.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://oscarbg.blogspot.com/feeds/7392297602439838525/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://oscarbg.blogspot.com/2010/07/ati-stream-sdk-roadmap.html#comment-form' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8553786559872430029/posts/default/7392297602439838525'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8553786559872430029/posts/default/7392297602439838525'/><link rel='alternate' type='text/html' href='http://oscarbg.blogspot.com/2010/07/ati-stream-sdk-roadmap.html' title='ATI Stream SDK roadmap'/><author><name>oscarbg</name><uri>http://www.blogger.com/profile/10172785372134961220</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/_I4-UBBtkAT4/TC8q1Bhp0aI/AAAAAAAAAG4/5h6f3gV3CTs/s72-c/roadmap.jpg' height='72' width='72'/><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8553786559872430029.post-3439419674534438394</id><published>2010-05-05T20:57:00.022+02:00</published><updated>2010-05-05T22:13:09.793+02:00</updated><title type='text'>About AMD OpenCL 2.1!</title><content type='html'>AMD is progressing well, and now we have an OpenCL stack with a lot of published features/optional extensions, even AMD proprietary ones:&lt;br /&gt;Regarding supported extensions:&lt;br /&gt;*Image support: only on 5xxx GPUs (I 
don't know, but I expect CPU support too, as in Apple's CPU implementation? I don't expect it on 4xxx, but it should be possible (CAL supports images/textures on 4xxx))&lt;br /&gt;Right now only RGBA formats: it supports only the 10/11 formats that are mandatory (Nvidia has 7x as many).. all of them 4-channel RGBA, so some Nvidia examples won't work..&lt;br /&gt;In 2.01 you can export/set GPU_IMAGES_SUPPORT to get it on 5xxx..&lt;br /&gt;No support on the CPU either..&lt;br /&gt;2.1 really has 3D texture support (it didn't work with the 2.01 hack)..&lt;br /&gt;You can test the Nvidia OCL samples oclVolumeRender and oclsimpletexture3d if you change them to load into a 4-channel texture:&lt;br /&gt;basically, change h_volume in initCLvolume (or in oclsimpletexture3d) to use 4 channels:&lt;br /&gt;volume_format.image_channel_order = CL_RGBA;&lt;br /&gt;volume_format.image_channel_data_type = CL_UNORM_INT8;&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; // calloc zero-fills G, B and A; only R is written below&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; uchar * h_volume2=(uchar *)calloc(volumeSize[0] * volumeSize[1]*volumeSize[2], 4);&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; for(size_t i=0; i&amp;lt;(volumeSize[0] * volumeSize[1]*volumeSize[2]); i++)&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; h_volume2[4*i]=h_volume[i];&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; d_volumeArray = clCreateImage3D(cxGPUContext, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR, &amp;amp;volume_format, &lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; volumeSize[0],volumeSize[1], volumeSize[2],&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; 
&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; volumeSize[0]*4,volumeSize[0] * volumeSize[1]*4,&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; h_volume2, &amp;amp;ciErrNum);&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; free(h_volume2); // safe: CL_MEM_COPY_HOST_PTR already copied the data&lt;br /&gt;Also, a bug mentioned in the developer notes is that linear filtering doesn't work if the sampler is declared as a constant via&lt;br /&gt;constant sampler_t volumeSampler = CLK_NORMALIZED_COORDS_TRUE | CLK_ADDRESS_CLAMP | CLK_FILTER_LINEAR;&lt;br /&gt;(Also note that CUDA 3.0 final has bugs regarding linear filtering in the 3D texture samples, and some Nvidia and AMD OpenCL samples don't work on the other vendor's OpenCL, because some need constant or __const samplers and others don't work with them; I don't remember which.)&lt;br /&gt;So I had to comment out this sampler in the volume render sample's CL kernel (simpletex3d does the right thing) and set it instead by adding a kernel parameter:&lt;br /&gt;__kernel void&lt;br /&gt;d_render(__global uint *d_output, &lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; uint imageW, uint imageH,&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; float density, float brightness,&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; float transferOffset, float transferScale,&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; __constant float* invViewMatrix&lt;br /&gt;&amp;nbsp;#ifdef IMAGE_SUPPORT&lt;br 
/&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; ,__read_only image3d_t volume,&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; __read_only image2d_t transferFunc,&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp; sampler_t volumeSampler&lt;br /&gt;&lt;br /&gt;&amp;nbsp;#endif&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; )&lt;br /&gt;Then you can add this from simpletex&lt;br /&gt;case 'f':&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; linearFiltering = !linearFiltering;&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; ciErrNum = clSetKernelArg(ckKernel, 10, sizeof(cl_sampler), linearFiltering ? &amp;amp;volumeSamplerLinear : &amp;amp;volumeSamplerNearest);&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; shrLog("\nLinear Filtering Toggled %s...\n", linearFiltering ? "ON" : "OFF");&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; oclCheckErrorEX(ciErrNum, CL_SUCCESS, pCleanup);&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; break;&lt;br /&gt;to the GL keyboard handler.&lt;br /&gt;I also checked simultaneous image support and OpenGL interop, and it worked:&lt;br /&gt;http://dl.dropbox.com/u/1416327/clinterop2.c&lt;br /&gt;Define USEGL or not to test image support alone or simultaneous image support and OpenGL interop 
(clCreateFromGLTexture2D/3D).&lt;br /&gt;Note: with GL interop the image is upside down and somewhat greener, but Nvidia OpenCL gives similar output, so I have to revise the code; for now it works.&lt;br /&gt;Lastly, what's lacking is 3D image write support, but you can test it with a sample (AMD's new SimpleImage sample has a more or less disabled 3D texture write test, so you can test it by changing a few simple lines).&lt;br /&gt;Currently I see that the CL kernel compiler even has the 3D write_image signature: if you pass a 3D image object and use int2 coordinates, it says it needs an int4 argument. After changing the code, the error you get is "I can't find builtin function #xyz", so it seems everything is mostly in place, although the #pragma to enable 3D image writes fails, saying the extension is unknown. Anyway, perhaps the next version will have this support, making the implementation more advanced than Nvidia's?&lt;br /&gt;GL Interop:&lt;br /&gt;AMD's SDK has a VBO example, and it works; using oclPostprocessGL as a PBO example also works.&lt;br /&gt;Even changing the code in these two demos to create the VBO and PBO GL objects before CL context creation works, and that shouldn't work, since it is documented as a limitation.&lt;br /&gt;So it seems the only current limitation is that the GL context must be created before the CL context, which is per spec, since CL context creation needs the GL context.&lt;br /&gt;&lt;br /&gt;Also, as said before, we have image support and GL interop working.&lt;br /&gt;Byte-addressable stores work, but at the IL level they appear as AND and OR masks, so the hardware seems to have no native byte addressing. The IL also uses UAVs, a DX concept that requires 32-bit aligned accesses, so I think it's not native. A DX byte-address UAV buffer does allow byte addressing, but, as said, with 32-bit alignment; general UAVs behave like int vectors, so a[1] is like a[4] through a byte pointer.&lt;br /&gt;I have to see how AMD avoids race conditions, if byte addressing isn't native, when multiple threads write bytes in the same word: if it does a read-modify-write, it must use atomics 
and pay their overhead?&lt;br /&gt;&lt;br /&gt;What troubles me most is that the Apple demos fail with GL interop, but their GL interop path also uses image support&lt;br /&gt;and some copy-image-to-buffer or copy-buffer-to-image calls, so I have to find out whether it's a GL interop problem, an image support problem or a copy problem. Then I will release it.&lt;br /&gt;&lt;br /&gt;Regarding samples, there is a new BoxGL sample that isn't mentioned.&lt;br /&gt;&lt;br /&gt;AMD is publishing a lot of extensions (some very simple):&lt;br /&gt;*amd_printf: I have checked, and it now works with Visual Studio (in 2.01 only on Linux?).&lt;br /&gt;If you don't enable it explicitly, the compiler fails; previously there was no way to disable it.&lt;br /&gt;*amd_fp64: on the GPU nothing has changed since 2.01, so only +, -, * and /; on the CPU there is a wealth of features, but without conformance or strictness. So there is no double MAD, I think, and I don't know how GPCbenchmark gets such high double-precision GFLOPS without MAD. I also don't know whether MADs are now generated for integers; it seems so, but the last time I checked (2.00, in January) they weren't. And what about mad24?&lt;br /&gt;*amd_media_ops: could make the Pyrit CAL++ implementation unnecessary; on trunk (SVN) it has a 2x-3x improvement over OpenCL thanks to bitalign, which you can now use from OpenCL. It would be interesting to see if CPyrit gets support, now that the trunk code also uses the OpenCL rotate built-in for possibly better performance on ISAs that have it. It also has the SAD support that AMD announced for OpenCL at the 5xxx launch.&lt;br /&gt;In the binary there are hints of:&lt;br /&gt;&lt;br /&gt;amd_vector3: I assume it defines float3, or not? I think Nvidia doesn't have it, even unofficially, so it's good to have.&lt;br /&gt;Also, some Apple demos #define float3, so it would be good to be able to disable it, as with printf, so that this code works on AMD without any modification.&lt;br /&gt;amd_atomic_counters: similar to the unpublished GLSL atomic_counters?&lt;br /&gt;Also, ext_device_fission currently lacks extension documentation 
and is CPU-only, but it seems able to expose concurrent kernels on Fermi GPUs, so I hope Nvidia supports it. Anyway, it's a shame that using two or more command queues doesn't extract extra performance on Nvidia, since Nvidia supports that in CUDA via streams, which are a similar concept. I have to post the code I wrote to check it.&lt;br /&gt;&lt;br /&gt;Also, I have now found a trick to get fully working GLSL point sprites in the Nvidia OpenCL samples and Particles demos, simply by replacing gl_TexCoord[0] with gl_PointCoord in the fragment shaders&lt;br /&gt;(thanks to pboudier on the AMD forums). Previously you could only show the particles with point rendering, selected with the 'p' key or the menu option.&lt;br /&gt;&lt;br /&gt;Regarding running samples across vendors, many bugs are fixed, and only those due to architectural differences remain:&lt;br /&gt;mainly warp-size related, shared memory size, work-group size and other out-of-resources limitations (register stack?), etc.&lt;br /&gt;The Particles and Sort examples on AMD need a fix I posted some time ago.&lt;br /&gt;&lt;br /&gt;The biggest complaints/suggestions and bugs/limitations are:&lt;br /&gt;*Is byte addressing native in HW? Race conditions when multiple threads write different bytes of the same word, or performance issues due to atomics usage?&lt;br /&gt;*Support for more image formats (at least R and RG with half float, float and int8/16)&lt;br /&gt;*3D image writes&lt;br /&gt;*D3D9 and D3D10 interop: disabled in 2.1 (it worked in 2.01?), 
supposedly coming in the next version in Q3. Anyway, a new KHR D3D10 extension has been published by Khronos; it is similar to Nvidia's but differs in supporting a shared-handle parameter and a device-info flag that says whether interop gets improved performance with a shared handle.&lt;br /&gt;It would be good to have KHR D3D9 and D3D11 extensions supported by both Nvidia and AMD.&lt;br /&gt;For example, DXVA-&amp;gt;OpenCL via these extensions should enable multi-IHV Badaboom-style transcoders, at least in the decoding part, and perhaps fully by using MFT GPU encoders.&lt;br /&gt;*Doubles are still lacking on the GPU (only +, -, *, /) and are not conformant on the GPU&lt;br /&gt;*No device fission on the GPU, although AMD has said its stream processors support it in HW, at least in blocks of 80 shaders, so 20 concurrent kernels on 5xxx are theoretically possible. But I think it's mostly a limitation of the CAL API or AMD IL, so it may take a while to fix?&lt;br /&gt;&lt;br /&gt;So the biggest remaining Nvidia limitations now are:&lt;br /&gt;*3D image writes&lt;br /&gt;*Concurrent kernels on Fermi: not available via device fission on the GPU or via multiple command queues&lt;br /&gt;Also, is dual DMA usable?&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8553786559872430029-3439419674534438394?l=oscarbg.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://oscarbg.blogspot.com/feeds/3439419674534438394/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://oscarbg.blogspot.com/2010/05/about-amd-opencl-21.html#comment-form' title='13 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8553786559872430029/posts/default/3439419674534438394'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8553786559872430029/posts/default/3439419674534438394'/><link rel='alternate' type='text/html' href='http://oscarbg.blogspot.com/2010/05/about-amd-opencl-21.html' title='About 
AMD OpenCL 2.1!'/><author><name>oscarbg</name><uri>http://www.blogger.com/profile/10172785372134961220</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>13</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8553786559872430029.post-8927352102360884076</id><published>2010-04-07T04:57:00.016+02:00</published><updated>2010-04-07T05:14:19.314+02:00</updated><title type='text'>Mandelbrot using OGL 4.0 features (double precision and precise keyword)</title><content type='html'>&lt;span class="Apple-style-span" style="font-family: Arial; font-size: small;"&gt;&lt;span class="Apple-style-span" style="font-size: 13px;"&gt;&lt;span class="Apple-style-span" style="color: #333333; font-family: 'Trebuchet MS', Verdana, Arial, Helvetica, sans-serif; font-size: 19px;"&gt;&lt;a href="http://dl.dropbox.com/u/1416327/mandeldouble.rar" rel="nofollow" style="background-attachment: initial; background-clip: initial; background-color: initial; background-image: none; background-origin: initial; background-position: initial initial; background-repeat: initial initial; color: #336699;" target="_blank"&gt;http://dl.dropbox.com/u/1416327/mandeldouble.rar&lt;/a&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;The executable above contains:&lt;br /&gt;*a float-float implementation using GL_ARB_gpu_shader5, with the precise keyword to work around the aggressive Nvidia compiler&lt;br /&gt;*a double implementation using GL_ARB_gpu_shader_fp64 
with a fallback to doublepAMD on Catalyst drivers without OGL 4.0&lt;br /&gt;*a normal Mandelbrot implementation&lt;br /&gt;&lt;br /&gt;On an AMD 5850 at 1920x1080 with the ATI GL 4.0 drivers&lt;br /&gt;I obtain:&lt;br /&gt;*15 fps using the float-float approach&lt;br /&gt;*50 fps using doubles&lt;br /&gt;*130 fps using single precision&lt;br /&gt;Note: pre-GL 4.0 drivers using doublepAMD reach 36 fps in double precision; with the GL 4.0 drivers, either doublepAMD or double reaches 50 fps.&lt;br /&gt;You can work out the GFLOP/s from the GLSL code; it's very high.&lt;br /&gt;&lt;br /&gt;I use #if 1 instead of #ifdef GL_ARB_gpu_shader5 or GL_ARB_gpu_shader_fp64; with that, the shaders work on Nvidia GL 3.3 drivers, although without doubles (single precision instead of double) and without the precise keyword, so float-float is still broken!&lt;br /&gt;i.e. I force the #extension ... : enable directives.&lt;br /&gt;&lt;br /&gt;Sorry for the big executable: it's linked to Cg, although Cg isn't used now. It was used to correctly disable optimizations on Nvidia, but that's not working now.&lt;br /&gt;The program arguments are: first, the starting horizontal pixel offset for multi-monitor setups; second, fullscreen or not; then the fragment and vertex shaders; and then the zoom and the x and y offsets in the Mandelbrot set.&lt;br /&gt;They are used to zoom in far enough to see the difference between single precision and higher precision, either double precision or float-float. 
The last argument selects the GLSL or Cg backend,&lt;br /&gt;but, as said, Cg is broken.&lt;br /&gt;&lt;br /&gt;It seems AMD doesn't optimize as aggressively, since float-float without precise works OK!&lt;br /&gt;&lt;br /&gt;AMD 5850 with OGL 4.0 drivers on Windows 7 (with fps):&lt;br /&gt;http://dl.dropbox.com/u/1416327/float-float.jpg&lt;br /&gt;http://dl.dropbox.com/u/1416327/fp32.jpg&lt;br /&gt;http://dl.dropbox.com/u/1416327/fp64.jpg&lt;br /&gt;NVIDIA:&lt;br /&gt;the broken float-float output looks similar to the AMD fp32 screenshot&lt;br /&gt;&lt;br /&gt;Fix for float-float: use precise.&lt;br /&gt;I hope this goes well with the Fermi OGL 4.0 drivers, and that Nvidia also enables the precise keyword for GL 3.0 hardware.&lt;br /&gt;Cg has a trick for disabling optimizations, so it's not needed there.&lt;br /&gt;Search the blog for more info.&lt;br /&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="color: #333333; font-family: 'Trebuchet MS', Verdana, Arial, Helvetica, sans-serif; font-size: 19px;"&gt;vec2 dblsgl_add (vec2 x, vec2 y)&lt;br /&gt;{&lt;br /&gt;precise vec2 z;&lt;br /&gt;float t1, t2, e;&lt;br /&gt;&lt;br /&gt;t1 = x.y + y.y;&lt;br /&gt;e = t1 - x.y;&lt;br /&gt;t2 = ((y.y - e) + (x.y - (t1 - e))) + x.x + y.x;&lt;br /&gt;z.y = e = t1 + t2;&lt;br /&gt;z.x = t2 - (e - t1);&lt;br /&gt;return z;&lt;br /&gt;}&lt;br /&gt;&lt;br /&gt;vec2 dblsgl_mul (vec2 x, vec2 y)&lt;br /&gt;{&lt;br /&gt;precise vec2 z;&lt;br /&gt;float up, vp, u1, u2, v1, v2, mh, ml;&lt;br /&gt;&lt;br /&gt;up = x.y * 4097.0;&lt;br /&gt;u1 = (x.y - up) + up;&lt;br /&gt;u2 = x.y - u1;&lt;br /&gt;vp = y.y * 4097.0;&lt;br /&gt;v1 = (y.y - vp) + vp;&lt;br /&gt;v2 = y.y - v1;&lt;br /&gt;//mh = __fmul_rn(x.y,y.y);&lt;br /&gt;mh = x.y*y.y;&lt;br /&gt;ml = (((u1 * v1 - mh) + u1 * v2) + u2 * v1) + u2 * v2;&lt;br /&gt;//ml = (__fmul_rn(x.y,y.x) + __fmul_rn(x.x,y.y)) + ml;&lt;br /&gt;&lt;br /&gt;ml = (x.y*y.x + x.x*y.y) + ml;&lt;br /&gt;&lt;br /&gt;z.y = up = mh + ml;&lt;br /&gt;z.x = (mh - up) + ml;&lt;br /&gt;return z;&lt;br 
/&gt;}&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8553786559872430029-8927352102360884076?l=oscarbg.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://oscarbg.blogspot.com/feeds/8927352102360884076/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://oscarbg.blogspot.com/2010/04/mandelbrot-using-ogl-40-features-double.html#comment-form' title='4 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8553786559872430029/posts/default/8927352102360884076'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8553786559872430029/posts/default/8927352102360884076'/><link rel='alternate' type='text/html' href='http://oscarbg.blogspot.com/2010/04/mandelbrot-using-ogl-40-features-double.html' title='Mandelbrot using OGL 4.0 features (double precision and precise keyword)'/><author><name>oscarbg</name><uri>http://www.blogger.com/profile/10172785372134961220</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8553786559872430029.post-3295202523331652253</id><published>2010-04-02T04:40:00.001+02:00</published><updated>2010-04-02T04:41:08.541+02:00</updated><title type='text'>Some things I forgot..</title><content type='html'>First is directcompute blog&lt;br /&gt;&lt;a href="http://www.yakiimo3d.com/"&gt;http://www.yakiimo3d.com/&lt;/a&gt;&lt;br /&gt;with nebularot code&lt;br /&gt;also&amp;nbsp;seems rigid body on gpu is starting&lt;br /&gt;before physx sdk 3.0 and batman with a gtx480 will use it &amp;nbsp;I have found on nvidia ftp:&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: Arial; font-size: 13px;"&gt;&lt;a 
class="name" href="file:///C:/Users/oscar/Documents/Downloads/BatmanAA_GTX480and470_PhysX_Patch.zip" style="display: inline; max-width: 450px; padding-right: 16px; word-break: break-all;"&gt;BatmanAA_GTX480and470_PhysX_Patch.zip&lt;/a&gt;&lt;/span&gt;&lt;br /&gt;dated 30 March 2010,&lt;br /&gt;which contains rrb.dll, that is,&lt;br /&gt;GPU-accelerated rigid body dynamics v1.0.0.1, dated 11 January 2010.&lt;br /&gt;It depends on cudart 3.0 patch 9&lt;br /&gt;and exposes:&lt;br /&gt;&lt;br /&gt;AgPmDestroySourceConnection&lt;br /&gt;AgPmEventEnabled&lt;br /&gt;AgPmEventLoggingEnabled&lt;br /&gt;AgPmSubmitEvent&lt;br /&gt;PrbCreatePhysicsSDK&lt;br /&gt;PrbFree&lt;br /&gt;PrbGetPhysicsSDK&lt;br /&gt;PrbMalloc&lt;br /&gt;PrbMallocDEBUG&lt;br /&gt;PrbReleasePhysicsSDK&lt;br /&gt;&lt;br /&gt;&lt;div&gt;similar to PhysXCore:&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;AgPmDestroySourceConnection&lt;/div&gt;&lt;div&gt;AgPmEventEnabled&lt;/div&gt;&lt;div&gt;AgPmEventLoggingEnabled&lt;/div&gt;&lt;div&gt;AgPmSubmitEvent&lt;/div&gt;&lt;div&gt;NgCreateCoreSDK&lt;/div&gt;&lt;div&gt;NpCreatePhysicsSDK&lt;/div&gt;&lt;div&gt;NpGetFoundationSDK&lt;/div&gt;&lt;div&gt;NpGetPhysicsSDK&lt;/div&gt;&lt;div&gt;NpGetPhysicsSDKAllocator&lt;/div&gt;&lt;div&gt;NpGetUtilLib&lt;/div&gt;&lt;div&gt;NpReleasePhysicsSDK&lt;/div&gt;&lt;div&gt;NxCreateCoreSDK&lt;/div&gt;&lt;div&gt;NxGetValue&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;/div&gt;&lt;br /&gt;Did the original Batman release have this?&lt;br /&gt;&lt;br /&gt;Also note that cudart 3.0 patch 9 is found in the PhysX runtime of 22 Feb 2010.&lt;br /&gt;&lt;br /&gt;Note that the CUDA 3.0 runtime DLL exists in these patch versions:&lt;br /&gt;8: beta&lt;br /&gt;9: PhysX&lt;br /&gt;11: OptiX 2.0b3&lt;br /&gt;14: final&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8553786559872430029-3295202523331652253?l=oscarbg.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' 
type='application/atom+xml' href='http://oscarbg.blogspot.com/feeds/3295202523331652253/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://oscarbg.blogspot.com/2010/04/somethings-i-forgot.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8553786559872430029/posts/default/3295202523331652253'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8553786559872430029/posts/default/3295202523331652253'/><link rel='alternate' type='text/html' href='http://oscarbg.blogspot.com/2010/04/somethings-i-forgot.html' title='Some things I forgot..'/><author><name>oscarbg</name><uri>http://www.blogger.com/profile/10172785372134961220</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8553786559872430029.post-6968854761550258454</id><published>2010-04-02T02:13:00.000+02:00</published><updated>2010-04-02T02:13:45.893+02:00</updated><title type='text'>Megapost!</title><content type='html'>&lt;div&gt;Today's April Fools{&lt;/div&gt;&lt;div&gt;*GTX 485 with 512 cores, 3 GB GDDR5 and 850/1750 shaders&lt;/div&gt;&lt;div&gt;*ATI 5990 with 4 GPUs on the board&lt;/div&gt;&lt;div&gt;*Bulldozer benchmarks&lt;/div&gt;&lt;div&gt;}end April Fools.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;ATI has released:&lt;/div&gt;&lt;div&gt;*5870 2 GB with 6 outputs&lt;/div&gt;&lt;div&gt;*GL 3.3/4.0 drivers (Linux &amp;amp; Win)&lt;/div&gt;&lt;div&gt;*GPU PerfStudio 2.2&lt;/div&gt;&lt;div&gt;*AMD ADL SDK 3.0 (aka Eyefinity SDK)&lt;/div&gt;&lt;div&gt;two Stream documents:&lt;/div&gt;&lt;div&gt;*OpenCL Programming Guide&lt;/div&gt;&lt;div&gt;*GPU Computing: Past, Present and Future with ATI Stream Technology (Michael Chu)&lt;/div&gt;&lt;div&gt;It's lame to see the backup slide on CUDA vs 
OpenCL.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;*VA-API with H.264 decode on Westmere CPUs (in git)&lt;/div&gt;&lt;div&gt;So now we have H.264 GPU decoding on Linux via VA-API for Intel, Nvidia and AMD cards,&lt;/div&gt;&lt;div&gt;although AMD 5xxx isn't OK yet, and Intel G45 will have to wait until Q3 2010.&lt;/div&gt;&lt;div&gt;Also, what about VC-1? ATI and Nvidia support is there, even on the 8800 GT via the latest VDPAU.&lt;/div&gt;&lt;div&gt;Will Intel catch up?&lt;/div&gt;&lt;div&gt;And is dual HD decode working with every API/implementation on the latest GPUs? All Intel HD 2010 graphics, AMD 5xxx, GT240 and Fermi have hardware support for it.&lt;/div&gt;&lt;div&gt;What about H.264 MVC: does VA-API expose it, i.e. does the API allow it? And what about XvBA, DXVA and VDPAU?&lt;/div&gt;&lt;div&gt;Also, we now have CUVID for Mac, possibly even in x64, so will CUVID allow MVC, or does it already?&lt;/div&gt;&lt;div&gt;Also, Gnash VA-API support is now integrated in trunk and seems compilable on Mac and Windows, so could we&lt;/div&gt;&lt;div&gt;port VA-API to Mac and Windows, and even implement a CUVID VA-API wrapper?&lt;/div&gt;&lt;div&gt;This would allow MPlayer and Gnash to support GPU decoding of HD video and Flash video on Mac with Nvidia cards.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Nvidia has released:&lt;/div&gt;&lt;div&gt;*Nexus March beta (the same as shown at GDC'10, so it should allow D3D10 and D3D11 shader debugging on Fermi)&lt;/div&gt;&lt;div&gt;*OptiX 2.0b3&lt;/div&gt;&lt;div&gt;*CUDA 3.0&lt;/div&gt;&lt;div&gt;*OGL 3.3 drivers&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Still lacking:&lt;/div&gt;&lt;div&gt;*Cg 3.0?&lt;/div&gt;&lt;div&gt;*OGL 4 drivers with ext_image_load_store and ext_image_atomic_counters support&lt;/div&gt;&lt;div&gt;*Linux Fermi drivers (Windows has 197.17)&lt;/div&gt;&lt;div&gt;*3D Vision Surround SDK&lt;/div&gt;&lt;div&gt;*3DTV HDMI 1.4 drivers&lt;/div&gt;&lt;div&gt;*256 drivers&lt;/div&gt;&lt;div&gt;*Nvidia D3D11 SDK, which presumably 
has:&lt;/div&gt;&lt;div&gt;hair tessellation and water tessellation demos&lt;/div&gt;&lt;div&gt;*PhysX SDK 3.0 with GPU rigid bodies and height-field water, as in the Fermi launch demo?&lt;/div&gt;&lt;div&gt;*voltage tweaking software, with maximum overclocks for the GTX 480 (900 MHz?) and 470 (750/800 MHz possible), and benchmarks&lt;/div&gt;&lt;div&gt;*OptiX 2 and Nexus 1.0 final&lt;/div&gt;&lt;div&gt;*test the voxilla demos and fp64 performance in CUDA and OpenCL (CUDA-Z):&lt;/div&gt;&lt;div&gt;is it 1/4 of the Tesla products? Can it be hacked? Look at the PTX and cubin code.&lt;/div&gt;&lt;div&gt;GPU computing:&lt;/div&gt;&lt;div&gt;*cudart x64 for Mac&lt;/div&gt;&lt;div&gt;*cuda-gdb for Mac&lt;/div&gt;&lt;div&gt;*cuda-gdb support for OCL binaries&lt;/div&gt;&lt;div&gt;*the promised official Nvidia tools for disassembly and assembly of Fermi binaries (new cubins; for old ones use decuda, or will they also support sm_1x binaries?), promised "soon" in the SIGGRAPH Asia CUDA performance optimization course&lt;/div&gt;&lt;div&gt;*efficient CUDA-OpenGL interop on Mac?&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;Official performance claims:&lt;/div&gt;&lt;div&gt;*tessellation 6-8x&lt;/div&gt;&lt;div&gt;*ray tracing 3.5x&lt;/div&gt;&lt;div&gt;*SLI near 2x in D3D11 games&lt;/div&gt;&lt;div&gt;*3D Vision near 2x (see the 3D Vision blog)&lt;/div&gt;&lt;div&gt;OK, but ROP and texture power are very low, and it seems the texture units are capped at half,&lt;/div&gt;&lt;div&gt;since the GF104 info that has surfaced also shows 64 texture units.&lt;/div&gt;&lt;div&gt;Nvidia admits it has GDDR5 controller problems, so it doesn't run the 5000 MHz GDDR5 chips at their rated 1250 MHz.&lt;/div&gt;&lt;div&gt;The 470 seems to use 4000 MHz chips.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;Review notes:&lt;/div&gt;&lt;div&gt;*noticias3d has slides and performance vs. the 5870 with the launch 8.66 drivers, which is good for gauging the overall driver improvement, since this would be the performance of six months ago; 
Catalyst 10.2/10.3 bring a 10% performance improvement.&lt;/div&gt;&lt;div&gt;*iXBT uses RightMark geometry shader performance tests.&lt;/div&gt;&lt;div&gt;*AnandTech has Chen's N-Queen OpenCL performance and the new Folding@home client, but another site claims only a 50% improvement vs. the 2-4x AnandTech reports&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;&lt;div&gt;*a review has a new D3D11 benchmark by a Swedish company&lt;/div&gt;&lt;div&gt;*Sandra 2010 GPGPU benchmarks, but double precision is bad&lt;/div&gt;&lt;div&gt;*D3D11 game benchmarks: Metro, Heaven 2.0, DX11 SDK tessellation demos, Just Cause 2&lt;/div&gt;&lt;div&gt;*LuxRays performance on the Beyond3D forums&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Apple has released 10.6.3 without AMD CAL libs (see PGI 10.2, whose CAL info mentions aticalrt.dylib).&lt;/div&gt;&lt;div&gt;It also seems to have almost OGL 3.0 for AMD; Nvidia lacks a few extensions, and the CPU driver lacks 3 or 4.&lt;/div&gt;&lt;div&gt;I have found Fermi in the OGL binary driver, but no real support.&lt;/div&gt;&lt;div&gt;Phoronix found that the OGL drivers have more than 50% performance degradation on the 9400 (bad),&lt;/div&gt;&lt;div&gt;but they should allow Steam to run well on Mac.&lt;/div&gt;&lt;div&gt;Regarding OpenCL, there are still no new headers fixing the CUDA SDK 3.0 issues, and there seem to be no big improvements, since none are mentioned in the release notes.&lt;/div&gt;&lt;div&gt;I have to test 10.6.3 with a CUVID x64 executable I have and the OptiX 2.0b3 SDK, and run the OpenCL FFT and Ocean Apple demos on both Nvidia and ATI GPUs, 
and run the Nvidia OpenCL ft3d sample, which reportedly has issues with Apple OpenCL, to see if it's fixed.&lt;/div&gt;&lt;div&gt;Also, are there OCL headers in the iPad 3.2 SDK golden master?&lt;/div&gt;&lt;div&gt;It still seems there is no double support in OpenCL for Nvidia and no image support for ATI GPUs on Apple.&lt;/div&gt;&lt;div&gt;Add to that the still unfixed double precision in compute shaders.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Doubles in OGL 4.0 (ARB_gpu_shader_fp64):&lt;/div&gt;&lt;div&gt;Nvidia has released OGL 3.3, but will the 4.0 drivers support fp64 on&lt;/div&gt;&lt;div&gt;gt275?&lt;/div&gt;&lt;div&gt;Also, is there double support on 4850 cards with the ATI 4.0 drivers?&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Also, will Nvidia release the wgl_nvx_dx_interop spec and the ext_image_load_store extension with the GL 4.0 drivers?&lt;/div&gt;&lt;div&gt;Any more extensions?&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;With that, at least DirectCompute and OGL will allow 3D image writes. OpenCL allows 2D image writes by default, and CUDA is the least capable? (it only writes via pitch-linear memory)&lt;/div&gt;&lt;div&gt;What's lacking is an OpenCL 3D image write extension, and CUDA surface functions were removed from CUDA 3.0 beta; 
I think they didn't work.&lt;/div&gt;&lt;div&gt;Also interesting is a post on the Nvidia forums saying that now OpenCL using a writable texture seems to not&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;iZ3D 1.11 has been released with shutter support (I can't test it on my Samsung 120 Hz display because I have activation issues),&lt;br /&gt;&lt;div&gt;but I have found that anaglyph mode shows the algorithm works well for D3D9, 10 and 11 in the DirectX SDK samples.&lt;/div&gt;&lt;div&gt;Lame: the ATI D3D11 Mecha and Ladybug demos don't work OK:&lt;/div&gt;&lt;div&gt;Mecha crashes, and in Ladybug the view isn't affected.&lt;/div&gt;&lt;div&gt;The Nvidia compute shader Ocean demo doesn't look good, yet 3D Vision with 197.13 works with that demo!&lt;/div&gt;&lt;div&gt;Also, some tessellation doesn't work.&lt;/div&gt;&lt;div&gt;In brief:&lt;/div&gt;&lt;div&gt;*32-bit examples are OK; 64-bit examples crash (is it my system's fault?)&lt;/div&gt;&lt;div&gt;*YouTube 1080p 3D HD works with Internet Explorer with Flash 10.0 (not 10.1) and with YouTube in English mode!&lt;/div&gt;&lt;div&gt;*Windowed stereo mode works.&lt;/div&gt;&lt;div&gt;So Nvidia has to add YouTube 3D and windowed stereo mode support (for non-Quadro cards) in the magical 256 drivers; even better if they also add Nvidia 3DTV and HDMI 1.4 output for OpenGL quad-buffered stereo on Quadros.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Also, the diagnostic utility reports that the ATI AQBS D3D surface format, which must be the AMD Catalyst 10.3 3D support, is not supported, even though I'm using Catalyst 10.3 WHQL, so it seems I must set the LCD to 120 Hz or find an HDMI projector? 
Anyway, I can't set the refresh rate in Catalyst Control Center now.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;I would love to have a CUDA hook that enables graphics interop through the host for the Tesla compute driver on Windows, and when running only the kernel module on Linux, to run nbody, for example.&lt;/div&gt;&lt;div&gt;It's a shame: in earlier versions OGL interop went through the host when not running on the same GPU; now it returns an error.&lt;/div&gt;&lt;div&gt;The same goes for OpenCL, which reports OGL interop,&lt;/div&gt;&lt;div&gt;for both D3D and OGL interop.&lt;/div&gt;&lt;div&gt;It would also be good to add on-the-fly cubin-to-PTX conversion for running the NUFFT or fastest matmul cubin codes on Fermi.&lt;/div&gt;&lt;div&gt;Also test enabling cu-force-ptx-jit.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;It would be good to test D3D-OCL interop with DXVA 2.0 (D3D9 texture?) interop to build an open-source Badaboom.&lt;/div&gt;&lt;div&gt;I would love to see, on an 8800 GT or GT200 with VP2 (VC-1 VLD not supported), which of CUVID, DXVA or VDPAU gives lower CPU usage, assuming all of them handle it;&lt;/div&gt;&lt;div&gt;the same for dual-stream HD and MVC when it comes out.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;What I currently find lacking on AMD 5xxx:&lt;/div&gt;&lt;div&gt;*OCL image support&lt;/div&gt;&lt;div&gt;*OGL-OCL texture interop&lt;/div&gt;&lt;div&gt;*incorrect XvBA decoding on 5xxx&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;I would love to have a simple OGL quad-buffered driver with anaglyph output for testing ports of Gnash, MPlayer, etc. 
to support 3D stereo rendering and YouTube 3D on Mac and Linux.&lt;/div&gt;&lt;div&gt;Then port 3D Vision to these OSes.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Note: I have learned from Unigine Heaven 2.0 that iZ3D doesn't work from the launcher, but the demo has .bat files for launching it, and with those iZ3D works in D3D9; in D3D10 it crashes as soon as it's activated, and in D3D11 it depends, but it doesn't look good.&lt;/div&gt;&lt;div&gt;Note: it seems the Windows demo compiled on 7 March has no support for AMD's old tessellator GL extension (even when enabled by editing haven.cfg), so it doesn't work; it also doesn't work with the AMD OGL 4.0 drivers.&lt;/div&gt;&lt;div&gt;On Linux you can use Heaven 2.0 with ATI tessellation, since the Linux build is newer.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;I would like the atioc utility on Linux to overclock beyond the officially supported limits, as MSI Afterburner does.&lt;/div&gt;&lt;div&gt;I have to hook ATI ADL and see.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;The ANGLE Google Code project is improving fast:&lt;/div&gt;&lt;div&gt;*it now includes OGL samples with esut.h, support for loops in shaders, etc.&lt;/div&gt;&lt;div&gt;*64-bit builds require:&lt;/div&gt;&lt;div&gt;&lt;div&gt;--- src/libEGL/Display.cpp&lt;span class="Apple-tab-span" style="white-space: pre;"&gt; &lt;/span&gt;(revision 49)&lt;/div&gt;&lt;div&gt;+++ src/libEGL/Display.cpp&lt;span class="Apple-tab-span" style="white-space: pre;"&gt; &lt;/span&gt;(working copy)&lt;/div&gt;&lt;div&gt;@@ -63,8 +63,8 @@&lt;/div&gt;&lt;div&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; }&lt;/div&gt;&lt;div&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; else&lt;/div&gt;&lt;div&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; {&lt;/div&gt;&lt;div&gt;- &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;EGLint minSwapInterval = 4;&lt;/div&gt;&lt;div&gt;- &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;EGLint maxSwapInterval = 
0;&lt;/div&gt;&lt;div&gt;+ &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;int minSwapInterval = 4;&lt;/div&gt;&lt;div&gt;+ &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;int maxSwapInterval = 0;&lt;/div&gt;&lt;div&gt;&lt;div&gt;Index: src/libGLESv2/geometry/vertexconversion.h&lt;/div&gt;&lt;div&gt;===================================================================&lt;/div&gt;&lt;div&gt;--- src/libGLESv2/geometry/vertexconversion.h&lt;span class="Apple-tab-span" style="white-space: pre;"&gt; &lt;/span&gt;(revision 49)&lt;/div&gt;&lt;div&gt;+++ src/libGLESv2/geometry/vertexconversion.h&lt;span class="Apple-tab-span" style="white-space: pre;"&gt; &lt;/span&gt;(working copy)&lt;/div&gt;&lt;div&gt;@@ -122,7 +122,7 @@&lt;/div&gt;&lt;div&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; static const std::size_t finalWidth = N+(N&amp;amp;1);&lt;/div&gt;&lt;div&gt;&amp;nbsp;};&lt;/div&gt;&lt;div&gt;&lt;/div&gt;&lt;div&gt;-template &amp;lt;unsigned int N&amp;gt;&lt;/div&gt;&lt;div&gt;+template &amp;lt;std::size_t N&amp;gt;&lt;/div&gt;&lt;div&gt;&amp;nbsp;struct WidenToFour&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;the samples also require more changes..&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;&lt;div&gt;I have been trying to port&amp;nbsp;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;Crazy drivers:&lt;br /&gt;AMD:&lt;br /&gt;&lt;br /&gt;cat 10.2 &amp;nbsp;B_95228 &amp;nbsp;3/2&lt;br /&gt;cat 10.3b B_95437 &amp;nbsp;5/2&lt;br /&gt;cat 10.3 &amp;nbsp;B_96537 &amp;nbsp;3/3&lt;br /&gt;10.3a &amp;nbsp; &amp;nbsp; B_97263 14/3&lt;br /&gt;10.3 ogl4 B_97624 24/3&lt;br /&gt;10.3b &amp;nbsp; &amp;nbsp; B_97763 25/3&lt;br /&gt;10.4 shipping for Ubuntu 10.04&lt;br /&gt;&lt;br /&gt;NVIDIA:&lt;br /&gt;&lt;br /&gt;196.75: required for Nexus support&lt;br /&gt;197 or higher -&amp;gt; OCL D3D interop&lt;br /&gt;197.13: official CUDA 3.0 driver and&amp;nbsp;WHQL&lt;br /&gt;197.15: OGL 3.3 driver&lt;br /&gt;197.16: 
notebook Verde driver with external 3D Vision support&lt;br /&gt;197.17: Fermi launch press drivers&lt;br /&gt;197.25: StarCraft DX8 issues&lt;br /&gt;&lt;br /&gt;geforce 256 in april&amp;nbsp;with 3d vision surround&lt;br /&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;about OGL 3.3/4.0 drivers&lt;br /&gt;&lt;br /&gt;&lt;div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"&gt;OGL 3.3 samples released..&lt;/div&gt;&lt;div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"&gt;OGL 4.0&lt;/div&gt;&lt;div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"&gt;openglext and Extensions Viewer show OGL 3.3/4.0 extensions&lt;/div&gt;&lt;div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"&gt;Google Code gle, gloader load 4.0 extensions.. GLEW?&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;info released about&amp;nbsp;http://developer.download.nvidia.com/opengl/specs/GL_EXT_gpu_memory_info.txt&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Fermi post-launch analysis:&lt;br /&gt;&lt;br /&gt;lacks&lt;br /&gt;http://forum.beyond3d.com/showpost.php?p=1414824&amp;amp;postcount=283&lt;br /&gt;latest GPGPU releases:&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;*Thrust 1.2&lt;br /&gt;*Jacket 1.3&lt;br /&gt;*Folding@Home Fermi with OpenMM? 
gpu3 client&lt;br /&gt;*CUDPP 1.1.1?&lt;br /&gt;&lt;br /&gt;released:&lt;br /&gt;*NVIDIA Design Garage&lt;br /&gt;*Supersonic Sled&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;demos not public:&lt;br /&gt;*Raging Rapids tech demo&lt;br /&gt;*hair demo&lt;br /&gt;*water tessellation demo&lt;br /&gt;*D3D11 demo by a Swedish company&lt;br /&gt;&lt;br /&gt;testing CUFFT, I have found that since 2.3 it includes the NUFFT improvements, in cubin form only&amp;nbsp;(NUFFT paper, SC09)&lt;br /&gt;NUFFT has test bench code for a 256^3 FFT transform.&lt;br /&gt;CUFFT in SC09 has perf over 160 GFLOPS for 256x144x192&lt;br /&gt;CUFFT 3.0 is only superfast if every dimension is a power of two (the dimensions can differ)..&lt;br /&gt;otherwise 20-30 GFLOPS&lt;br /&gt;&lt;br /&gt;I have to test the Microsoft DX compute shader FFT library..&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;AMD 5850 in glext shows&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;*doesn't have:&lt;br /&gt;&lt;br /&gt;GL_EXT_stencil_two_side?&lt;br /&gt;&lt;br /&gt;GL_ARB_compatibility (3.1)-&amp;gt; seems present, so is it present if 3.1 is queried?&lt;br /&gt;&lt;br /&gt;GL_EXT_shader_image_load_store-&amp;gt;present in the DLL!&lt;br /&gt;accessorStore UAV_STORE&lt;br /&gt;imageLoad imageStore&lt;br /&gt;&lt;br /&gt;GL_ARB_shading_language_include-&amp;gt;seems to have basic include support!&lt;br /&gt;&lt;br /&gt;has:&lt;br /&gt;&lt;br /&gt;GL_EXT_vertex_attrib_64bit (no published spec)&lt;br /&gt;GL_ARB_texture_compression_bptc-&amp;gt;has the EXT version&lt;br /&gt;GL_EXT_shader_atomic_counters&amp;nbsp;(no published spec)&lt;br /&gt;imageAtomicAdd imageAtomicSub imageAtomicMin imageAtomicMax&lt;br /&gt;GL_ARB_texture_swizzle-&amp;gt;has EXT_texture_swizzle&lt;br /&gt;GL_ARB_texture_buffer_object_rgb32-&amp;gt;has the EXT version&lt;br /&gt;&lt;br /&gt;proprietary:&lt;br /&gt;adds &amp;nbsp;GL_AMDX_debug_output&lt;br /&gt;amdx-&amp;gt;GL_AMD_name_gen_delete&lt;br /&gt;GL_AMD_conservative_depth&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br 
/&gt;-------------------------------------------&lt;br /&gt;Not implemented extensions in OpenGL 2.0:&lt;br /&gt;GL_EXT_stencil_two_side&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;-------------------------------------------&lt;br /&gt;Not implemented extensions in OpenGL 3.0:&lt;br /&gt;GL_NV_depth_buffer_float-&amp;gt;has arb_depth&lt;br /&gt;&lt;br /&gt;-------------------------------------------&lt;br /&gt;Not implemented extensions in OpenGL 3.1:&lt;br /&gt;GL_ARB_compatibility&lt;br /&gt;&lt;br /&gt;-------------------------------------------&lt;br /&gt;Not implemented extensions in OpenGL 3.3:&lt;br /&gt;GL_ARB_shading_language_include-&amp;gt;not needed&lt;br /&gt;GL_ARB_texture_swizzle-&amp;gt;has EXT_texture_swizzle&lt;br /&gt;&lt;br /&gt;-------------------------------------------&lt;br /&gt;Not implemented extensions in OpenGL 4.0:&lt;br /&gt;GL_ARB_texture_buffer_object_rgb32-&amp;gt;has the EXT version&lt;br /&gt;&lt;br /&gt;-------------------------------------------&lt;br /&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;*I can't say much, but I have betas of both:&lt;/div&gt;&lt;div&gt;*OpenRL 1.0b2&lt;/div&gt;&lt;div&gt;Has Windows (x32, x64) libraries and Mac x32-only libraries&lt;/div&gt;&lt;div&gt;still lacking Linux and Mac x64 binaries..&lt;/div&gt;&lt;div&gt;remember OptiX also has Mac support, but only x32..&lt;/div&gt;&lt;div&gt;no OpenCL bits found anywhere; for now the only release is CPU-only&lt;/div&gt;&lt;div&gt;but it uses all my 8 cores..&lt;/div&gt;&lt;div&gt;it would be nice to port the OptiX samples and tutorial to OpenRL and vice versa..&lt;/div&gt;&lt;div&gt;or better, make an OpenRL wrapper over OptiX 2.0b3 with Fermi support..&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;*Intel Compiler 12 (Composer 2011)&lt;/div&gt;&lt;div&gt;Cilk, #pragma vector size(4,8) etc..&lt;/div&gt;&lt;div&gt;VS2010 support&amp;nbsp;&lt;/div&gt;&lt;div&gt;AES-NI for CRC32 and better AVX overall&lt;/div&gt;&lt;div&gt;and more..&lt;/div&gt;&lt;div&gt;IPP 7.0 
beta&lt;/div&gt;&lt;div&gt;Intel Compiler 12 beta&lt;/div&gt;&lt;div&gt;TBB old version?&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;*I now have the code of libecuda and libptx, the PFC PTX emulator from UPC..&lt;/div&gt;&lt;div&gt;trying to build it for Windows and update it to PTX ISA 2.0&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;*Still no gdebuggerCL&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;PTX ISA 2.0 released:&lt;/div&gt;&lt;div&gt;it also includes PTX 1.5 info (the code NVIDIA's LLVM-based OpenCL compiler emits)&lt;/div&gt;&lt;div&gt;-&amp;gt; mainly adds separate texture and sampler setup, plus the same __param handling for function arguments&lt;/div&gt;&lt;div&gt;-&amp;gt; it also shows OpenCL kernels have no name mangling.. now testing whether PTX using addc can be inserted into OpenCL, i.e. a converter from CUDA 3.0 PTX kernels to OpenCL PTX kernels would be good..&lt;/div&gt;&lt;div&gt;also, is cubin-to-PTX conversion possible? It would let me run the fastest-to-date matmul on Fermi, as Fermi doesn't run those (pre-Fermi) cubins..&lt;/div&gt;&lt;div&gt;are there some limitations? 
ask the Barra creator, he has a Tesla cubin simulator..&lt;/div&gt;&lt;div&gt;so in theory I could go from cubin to OCL-compatible PTX code..&lt;/div&gt;&lt;div&gt;PTX 2.0 shows, for Fermi:&lt;/div&gt;&lt;div&gt;HAS (also implemented):&lt;/div&gt;&lt;div&gt;*D3D11 CS 5.0 integer instructions&lt;/div&gt;&lt;div&gt;*ldu&lt;/div&gt;&lt;div&gt;*unified address space ld loads&amp;nbsp;&lt;/div&gt;&lt;div&gt;*surface functions (load and store)-&amp;gt; 3D image writes&lt;/div&gt;&lt;div&gt;there are formatted and unformatted loads; formatted loads are not implemented, and formatted stores are not implemented either, except for a b32 format&lt;/div&gt;&lt;div&gt;EXPOSES:&lt;/div&gt;&lt;div&gt;*recursion via..&lt;/div&gt;&lt;div&gt;*function calls with a stack (so recursion is possible) without defining an ABI&lt;/div&gt;&lt;div&gt;*calloc function&lt;/div&gt;&lt;div&gt;*variable args to functions&lt;/div&gt;&lt;div&gt;note: this is not implemented in 2.0&lt;/div&gt;&lt;div&gt;still lacking:&lt;/div&gt;&lt;div&gt;*jump to register/pointer or&amp;nbsp;call to register/pointer (virtual functions?)&lt;/div&gt;&lt;div&gt;*host system calls (malloc, printf.. etc.)&lt;/div&gt;&lt;div&gt;also the CUDA book shows:&lt;/div&gt;&lt;div&gt;*Fermi predication is based on&lt;/div&gt;&lt;div&gt;&lt;div&gt;"A Comparison of Full and Partial Predicated Execution Support&lt;/div&gt;&lt;div&gt;for ILP Processors"&lt;/div&gt;&lt;/div&gt;&lt;div&gt;*Fermi supports terminating kernels whenever you want (driver stability improvements?)&lt;/div&gt;&lt;div&gt;also for load balancing..&lt;/div&gt;&lt;div&gt;*CUDA Fermi implementation priorities.. 
the unified virtual address space can take years..&lt;/div&gt;&lt;div&gt;*the virtual address space fits well with the GMAC approach of a single unified address for CPU and GPU memory; for now only the GPU address space is unified&amp;nbsp;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8553786559872430029-6968854761550258454?l=oscarbg.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://oscarbg.blogspot.com/feeds/6968854761550258454/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://oscarbg.blogspot.com/2010/04/megapost.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8553786559872430029/posts/default/6968854761550258454'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8553786559872430029/posts/default/6968854761550258454'/><link rel='alternate' type='text/html' href='http://oscarbg.blogspot.com/2010/04/megapost.html' title='Megapost!'/><author><name>oscarbg</name><uri>http://www.blogger.com/profile/10172785372134961220</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8553786559872430029.post-781745449464866206</id><published>2010-03-21T20:37:00.000+01:00</published><updated>2010-03-21T20:37:53.619+01:00</updated><title type='text'>What's for CUDA 3.1 and OpenGL 3.3/4.1!</title><content type='html'>Let's see CUDA 3.0 vs beta:&lt;br /&gt;&lt;br /&gt;*adds full BLAS support&lt;br /&gt;*OpenCL local atomics&lt;br /&gt;*OCL and CUDA D3D9-11 interop..&lt;br /&gt;*updated guides since beta..&lt;br /&gt;still no PTX 1.5/2.0 specs..&lt;br /&gt;also the nv-cl extensions are published 
now:&amp;nbsp;http://developer.download.nvidia.com/compute/cuda/3_0/toolkit/docs/opencl_extensions/cl_nv_compiler_options.txt&lt;br /&gt;&lt;br /&gt;&lt;div&gt;Interesting notes:&lt;/div&gt;&lt;div&gt;&lt;blockquote&gt;*Float16 (half) textures are supported in the runtime&lt;/blockquote&gt;&lt;blockquote&gt;*CUBLAS complete and IEEE 754 compliant on Fermi&lt;/blockquote&gt;&lt;blockquote&gt;&amp;nbsp;*SGEMM performance on Fermi-based GPU is 30% lower than expected.&amp;nbsp;&lt;/blockquote&gt;&lt;blockquote&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;It will be fixed in 3.1.&lt;/blockquote&gt;&lt;blockquote&gt;*The stability of the large-prime FFT transform (signals with a length&lt;/blockquote&gt;&lt;blockquote&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;that is prime and &amp;gt;64k samples) is extremely variable, giving single-&lt;/blockquote&gt;&lt;blockquote&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;precision accuracy in the range 0.005-&amp;gt;0.025. In general, smaller signals&lt;/blockquote&gt;&lt;blockquote&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;experience greater accuracy.&lt;/blockquote&gt;&lt;blockquote&gt;*This package will work MAC OSX running 32/64-bit. &amp;nbsp;&lt;/blockquote&gt;&lt;blockquote&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp;* &amp;nbsp; &amp;nbsp; CUDA applications built in 32/64-bit (CUDA Driver API) is supported.&lt;/blockquote&gt;&lt;blockquote&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; * &amp;nbsp; &amp;nbsp;CUDA applications built as 32-bit (CUDA Runtime API) is supported.&lt;/blockquote&gt;&lt;blockquote&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; (10.5.x Leopard and 10.6 SnowLeopard)&lt;/blockquote&gt;&lt;blockquote&gt;Note: x86_64 is not currently working for Leopoard or SnowLeopard&lt;/blockquote&gt;&lt;blockquote&gt;*CUDA applications built with the CUDA driver API can run as either 32/64-bit applications. 
&amp;nbsp;&lt;/blockquote&gt;&lt;blockquote&gt;&amp;nbsp;* &amp;nbsp;CUDA applications using CUDA Runtime APIs can only be built on 32-bit applications.&lt;/blockquote&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;SDK Release 3.0 Final:&lt;/div&gt;&lt;blockquote&gt;* Replaced 3dfd sample with FDTD3d (Finite Difference sample has been updated)&lt;/blockquote&gt;&lt;blockquote&gt;* Added support for Fermi Architecture (Compute 2.0 profile) to the SDK samples&lt;/blockquote&gt;&lt;blockquote&gt;* Updated Graphics/CUDA samples to use the new unified graphics interop&lt;/blockquote&gt;&lt;blockquote&gt;* Several samples with Device Emulation have been removed. &amp;nbsp;Device Emulation is&amp;nbsp;&lt;/blockquote&gt;&lt;blockquote&gt;&amp;nbsp;&amp;nbsp;deprecated for CUDA 3.0, and will be removed with CUDA 3.1.&lt;/blockquote&gt;&lt;blockquote&gt;* Added new samples:&lt;/blockquote&gt;&lt;blockquote&gt;&amp;nbsp;&amp;nbsp; concurrentKernels (Fermi Capability)&lt;/blockquote&gt;&lt;blockquote&gt;* Bug Fixes&lt;/blockquote&gt;they have also added simpleMPI..&lt;br /&gt;I have to test it with Intel MPI 4.0&lt;br /&gt;&lt;br /&gt;Mac notes:&lt;br /&gt;cuda.dylib is 64-bit and has the 195 API, plus 195 and 185 dylibs versioned as 195_96 or 185_55..&lt;br /&gt;*has cuda-memcheck but no cuda-gdb&lt;br /&gt;*the CUDA kext is a fat binary with 64-bit code, and so is cuda.dylib, so CUDA driver API applications can be compiled and run as 64-bit..&lt;br /&gt;note: you can also boot the 64-bit kernel thanks to the kext..&lt;br /&gt;cudart is 32-bit&lt;br /&gt;so in theory we can write a cudart wrapper over the CUDA driver API and compile it as 64-bit, even more so&lt;br /&gt;now that cudart is stateless and interoperates with CUDA driver memory allocations..&lt;br /&gt;&lt;br /&gt;&lt;div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"&gt;all that's needed is 64-bit CUBLAS and CUFFT builds to compile against..&lt;/div&gt;&lt;div&gt;we have the code for CUDPP, Thrust and CUSP, and meanwhile Volkov's matmul, FFT and LAPACK 
codes&lt;/div&gt;&lt;br /&gt;so all of these could be compiled as 64-bit if we had a 64-bit cudart; let's see what happens..&lt;br /&gt;well, I have compiled cudadevicedrv and matmuldrv&lt;br /&gt;(am I the first in the world to have 64-bit CUDA Apple binaries, apart from NVIDIA..?)&lt;br /&gt;I have got rid of cutil, though compiling it to 64 bits would be no problem. Some notes:&lt;br /&gt;nvcc on Mac defaults to 32 bits, whereas gcc defaults to 64 bits on Snow Leopard..&lt;br /&gt;so to get 64 bits you must pass -m64 to nvcc..&lt;br /&gt;but for CUDA driver API projects you don't need nvcc for the host code: you can use g++ for the driver API code and compile the CUDA&lt;br /&gt;files to PTX with nvcc -ptx&lt;br /&gt;&lt;div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"&gt;if you use nvcc with -m64 you get 64-bit CPU code, and with -ptx you also get PTX code&lt;/div&gt;&lt;div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"&gt;using 64-bit pointers, for Fermi?&lt;/div&gt;so can you use 32-bit pointers on Fermi? Is it better to use 32-bit pointers?..&lt;br /&gt;so for matrixmuldrv I use nvcc -ptx for 32-bit pointers and g++ (-m64) for the host code,&lt;br /&gt;but in cuModuleLoadDataEx I get the error&lt;br /&gt;CUDA_ERROR_POINTER_IS_64BIT &amp;nbsp; &amp;nbsp; = 800, &amp;nbsp; &amp;nbsp; &amp;nbsp;///&amp;lt; Attempted to retrieve 64-bit pointer via 32-bit API function&lt;br /&gt;I get this error loading the PTX whether I use nvcc -m64 or plain nvcc (both with -ptx)..&lt;br /&gt;so PTX with 32- or 64-bit pointers doesn't change that..&lt;br /&gt;I have to compare the files with 32- and 64-bit pointers to see the differences, also with sm_20..&lt;br /&gt;also note that for nvcc -m64 to work, /usr/local/cuda/lib64 must exist even if it isn't needed..&lt;br /&gt;so I copied lib to lib64 (a symlink also works)..&lt;br /&gt;and now you can run it..&lt;br /&gt;I have to write a tutorial on using CUDA and nvcc to build Mac OS fat binaries (i386 and 64-bit)&lt;br /&gt;*I see an nvcuvid 
library for Mac in the GPU Computing SDK.. only 32 bits..&lt;br /&gt;/C/common/lib&lt;br /&gt;and /C/common/inc/cuvid&lt;br /&gt;&lt;br /&gt;&lt;div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"&gt;Anyway, I have a 64-bit libcuvid (vs libnvcuvid) in /usr/local/cuda (where did I get it from?)&lt;/div&gt;&lt;div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"&gt;*also a preference pane (control panel) with auto-update that shows the GPU driver version and the CUDA driver version..&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;&lt;div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"&gt;&lt;/div&gt;&lt;div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"&gt;note: the OpenCL samples on Mac don't work until 10.6.3..&lt;/div&gt;&lt;div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"&gt;&lt;/div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;good: NVIDIA's notes on OpenCL undefined (implementation-specific) behavior:&lt;/div&gt;&lt;div&gt;http://developer.download.nvidia.com/compute/cuda/3_0/toolkit/docs/NVIDIA_OpenCL_ImplementationNotes_3.0.txt&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;issues with Mac..&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;the perfect OpenGL 4.1/3.3 release:&lt;/div&gt;&lt;div&gt;*ext_direct_state_access&lt;/div&gt;&lt;div&gt;*ext_separate_shader_objects&lt;/div&gt;&lt;div&gt;*RW textures (3D also) ext_image_load_store&lt;/div&gt;&lt;div&gt;*binary shaders (GL ES 2.0 API)&lt;/div&gt;&lt;div&gt;in theory you could use some IR from the 3Dlabs front-end compiler source..&lt;/div&gt;&lt;div&gt;or translate to HLSL via some translator (AMD hlsl2glsl?) 
and then use a binary HLSL shader..&lt;/div&gt;&lt;div&gt;also a good translator:&lt;/div&gt;&lt;div&gt;http://code.google.com/p/angleproject/&lt;/div&gt;&lt;div&gt;has a flex/bison GLSL parser and also a GLSL-to-HLSL translator (ES 2.0)..&lt;/div&gt;&lt;div&gt;going from binary to DX IL via:&lt;/div&gt;&lt;div&gt;fxc /dumpbin&lt;/div&gt;&lt;div&gt;but DX IL to binary? Also, how to go from DX IL to HLSL or GLSL directly?..&lt;/div&gt;&lt;div&gt;I have also found that Wine more or less handles/parses DXBC files..&lt;/div&gt;&lt;div&gt;/dlls/d3d10/effect.c&lt;/div&gt;&lt;div&gt;static HRESULT parse_shade&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;NV OGL extensions:&lt;/div&gt;&lt;div&gt;*Fermi function pointers and recursion for GLSL?&lt;/div&gt;&lt;div&gt;it would be a good addition to the bindless extensions and shader buffer load..&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;CUDA 3.1:&lt;br /&gt;*cuda-gdb OpenCL HW debugging support..&lt;br /&gt;*pinned GPU memory interop with MPI over InfiniBand.. (spring 2010, announced at SC09)&lt;br /&gt;&lt;br /&gt;*template&amp;nbsp;for a DirectCompute project&lt;br /&gt;Currently there is no template for a DirectCompute project, but NVIDIA will be&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;providing one soon.&lt;br /&gt;*Fix CUBLAS SGEMM performance on Fermi (30% faster)&lt;br /&gt;*Fix the CUFFT performance regression vs 3.0 beta (from 180-190 GFLOPS down to 150 GFLOPS)&lt;br /&gt;*provide an official cudaasm/decuda, or documentation about the cubin/ELF format for SM_20 devices? also for sm_10?&lt;br /&gt;*PTX 1.5, 2.0 docs?&lt;br /&gt;*Updated OpenCL best practices guide for Fermi? The CUDA Best Practices Guide is updated, but for Fermi?&lt;br /&gt;&lt;br /&gt;*Surface functions: RW textures with x,w addressing etc.. also 3D image writes.. 
headers and exported functions were in the beta but removed in the final..&lt;br /&gt;&lt;br /&gt;Also a CUDA-to-CPU compiler: or is gpuocelot mature enough, and are Mac and Windows ports available?..&lt;br /&gt;a direct PTX-to-CPU code converter, using the gpuocelot library as cudart and CUDA API, would be good..&lt;br /&gt;&lt;br /&gt;Mac&lt;br /&gt;&lt;br /&gt;&lt;div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"&gt;*add cuda-gdb (with OCL support too) and the OpenCL Visual Profiler&lt;/div&gt;&lt;br /&gt;on Mac, 2 OpenCL samples don't run&lt;br /&gt;&lt;br /&gt;&lt;div&gt;CUDA-OpenGL slow on Mac&lt;/div&gt;&lt;div&gt;ship&lt;/div&gt;&lt;div&gt;&lt;div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"&gt;is going to work with Fermi cuda.kext&lt;/div&gt;&lt;div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"&gt;&lt;/div&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;*Related: the first 195-series (197) WHQL driver for Quadros, enabling OpenCL on these devices..&lt;br /&gt;&lt;blockquote&gt;&lt;blockquote&gt;Adds support for CUDA 3.0 for improved performance in GPU Computing applications. See CUDA for more details.&amp;nbsp;&lt;/blockquote&gt;&lt;blockquote&gt;This driver resolves fan speed issues reported with version 196.75 drivers.&lt;/blockquote&gt;&lt;blockquote&gt;Adds support for the Open Computing Language (OpenCL) 1.0 in Quadro FX Series x700 and newer as well as the FX4600 and FX5600.&lt;/blockquote&gt;&lt;/blockquote&gt;*NVIDIA mentions the compute cluster driver, but it is 196.28, not updated since early Feb.. 
anyway, the D3D interop finally added isn't needed here..&lt;br /&gt;*according to Pierre Boudier, we will see OGL 4.0 drivers soon, and also an image write and random access extension, like the D3D11 RWTexture..&lt;br /&gt;Ubuntu 10.04 fglrx 8.72&lt;br /&gt;&lt;blockquote&gt;&lt;blockquote&gt;fglrx-installer (2:8.721-0ubuntu1) lucid; urgency=low&amp;nbsp;&lt;/blockquote&gt;&lt;blockquote&gt;&lt;br /&gt;&lt;/blockquote&gt;&lt;blockquote&gt;* New upstream release:&amp;nbsp;&lt;/blockquote&gt;&lt;blockquote&gt;- Restore compatibility with kernel 2.6.32 and xserver 1.7 (LP: #494699).&amp;nbsp;&lt;/blockquote&gt;&lt;blockquote&gt;- Add Passive Stereo support on workstation (FireGL/FirePro) hardware.&amp;nbsp;&lt;/blockquote&gt;&lt;blockquote&gt;- Add Eyefinity support (more than 2 monitors on Radeon HD 5xxx hardware).&amp;nbsp;&lt;/blockquote&gt;&lt;blockquote&gt;Officially WS-only but should work on consumer boards as well.&amp;nbsp;&lt;/blockquote&gt;&lt;/blockquote&gt;GL_EXT_shader_subroutine GL_EXT_timer_query&lt;br /&gt;&lt;br /&gt;Also, what about 3D stereo on Linux:&lt;br /&gt;*3D Vision for OpenGL quad-buffer on Quadro with a stereo connector is here..&lt;br /&gt;*3DTV support for Linux, so OpenGL QB can be output to HDMI 1.4 on Linux? This could also work on low-profile Quadros, as the stereo connector isn't needed (it isn't needed for 3D Vision either; it's NVIDIA's way of artificially limiting it to super high-end Quadros, except perhaps for better sync..)&lt;br /&gt;also, if they add VDPAU H.264 MVC and you decrypt Blu-ray 3D with AnyDVD HD, in theory you will be able to watch it on Linux with GPU-accelerated decoding, sending it to TVs via HDMI 1.4..&lt;br /&gt;let's also see how Windows is handled, as DXVA 2.0 doesn't support MVC? Neither does cuvid, so let's see if they add it to cuvid too..&lt;br /&gt;so it seems CyberLink will get some library from NVIDIA, or what?&lt;br /&gt;*ATI has hooks for D3D9, 10? D3D11? in 10.3; also fglrx 8.72 adds passive stereo for OGL QB (active stereo is here, right?.. 
but also for 120 Hz LCDs?)&lt;br /&gt;let's also see how ATI manages output to HDMI 1.4 TVs: via the iZ3D partnership, or what? In fact I expect iZ3D only hooks D3D stereo and AMD will add some HDMI 1.4 stereo output on top of these hooks, so an SDK or documentation for these hooks would be good..&lt;br /&gt;Also, it would be good if NVIDIA published the stereo SDK (promised at GDC 2010); hopefully these hooks (D3D9-11) will also work with 3DTVs and output to HDMI 1.4 TVs.. presumably yes, as Avatar and 3D Vision use these hooks..&lt;br /&gt;Mac is out of scope here..&lt;br /&gt;&lt;br /&gt;also, NVIDIA may be late with Fermi, but not with the software supporting it..&lt;br /&gt;D3D11 with CS 5.0 is here now, and we also have D3D11 interop for CUDA in 3.0, D3D11 interop via an OpenCL extension, and also OptiX D3D11 interop..&lt;br /&gt;We have D3D11 interop with:&lt;br /&gt;*CUDA 3.0&lt;br /&gt;*OpenCL&lt;br /&gt;*OptiX&lt;br /&gt;HW debugging:&lt;br /&gt;Nsight.&lt;br /&gt;All that remains to be released is Nsight, which will also bring D3D11 support (HW debugging and profiling); it will be good to HW-debug CUDA, D3D11 CS, and CUDA with D3D11 interop, and to trace OpenCL and OpenGL (will 4.0 be traced?)..&lt;br /&gt;&lt;br /&gt;also, will Cg 3.0 have support for D3D11? And SM 5.0 / OpenGL 4.0 support, i.e. 
tessellation shaders with GLSL output?&lt;br /&gt;note: cgc 3.0 is shipping in the Tegra SDK and also as part of the OpenGL compiler in the NVIDIA 195 drivers..&lt;br /&gt;I have seen CgFX working with OptiX and CUDA on a blog, so I hope they ship an example soon..&lt;br /&gt;http://lorachnroll.blogspot.com/2010/03/mixing-nvidia-technologies-thanks-to.html&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;GeForce GTX 480:&lt;br /&gt;- GPU: GF100 @ 700MHz&lt;br /&gt;- CUDA cores: 480 @ 1401MHz&lt;br /&gt;- Memory: 1536MB GDDR5 @ 1848MHz 384-bit&lt;br /&gt;- TDP: 250W&lt;br /&gt;GeForce GTX 470:&lt;br /&gt;- GPU: GF100 @ 607MHz&lt;br /&gt;- CUDA cores: 448 @ 1215MHz&lt;br /&gt;- Memory: 1280MB GDDR5 @ 1674MHz 320-bit&lt;br /&gt;- TDP: 225W&lt;br /&gt;- Price: $349US&lt;br /&gt;&lt;br /&gt;- 3D APIs: OpenGL 4.0 and Direct3D 11&lt;br /&gt;- GPU Computing: OpenCL, CUDA and DirectCompute&lt;br /&gt;- 3-way SLI support&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;GeForce GTX 480 : 480 SP, 700/1401/1848MHz core/shader/mem, 384-bit, 1536MB, 250W TDP, US$499&lt;br /&gt;&lt;br /&gt;GeForce GTX 470 : 448 SP, 607/1215/1674MHz core/shader/mem, 320-bit, 1280MB, 225W TDP, US$349&lt;br /&gt;&lt;br /&gt;Note also that we have GLSL/OCL-like vec4 and other C++ libraries:&lt;br /&gt;*GLM has strict GLSL compliance..&lt;br /&gt;even with the GMX experimental extensions we have SIMD implementations..&lt;br /&gt;*DX SDK Feb 2010 has the XNAMATH 2.02 SIMD math library&lt;br /&gt;also read:&lt;br /&gt;http://www.gamasutra.com/view/feature/4248/designing_fast_crossplatform_simd_.php&lt;br /&gt;&lt;br /&gt;Good HDR maps:&lt;br /&gt;&lt;br /&gt;&lt;div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"&gt;http://www.hdrlabs.com/sibl/archive.html&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;NVIDIA employees' blogs:&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;http://timothylottes.blogspot.com/&lt;br /&gt;http://jamesdolan.blogspot.com/&lt;br /&gt;http://industrialarithmetic.blogspot.com/&lt;br 
/&gt;http://castano.ludicon.com/blog/&lt;br /&gt;&lt;br /&gt;http://twitter.com/castano&lt;br /&gt;&lt;br /&gt;&lt;div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"&gt;http://twitter.com/tmurray_cmpxchg&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;showing max CUDA mem:&lt;/div&gt;&lt;div&gt;&lt;div&gt;http://forums.nvidia.com/index.php?showtopic=102682 cuda maxmem&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;caustics patents:&lt;/div&gt;&lt;div&gt;US patent applications: 20090096788, 20090096789, and especially 20090128562,&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;The LLVM 2.7 binaries are available for testing:&lt;/div&gt;&lt;div&gt;http://llvm.org/pre-releases/2.7/pre-release1/&lt;/div&gt;&lt;/div&gt;&lt;div&gt;http://amnoid.de/tmp/clangtut/tut.html&lt;/div&gt;&lt;div&gt;http://lists.cs.uiuc.edu/pipermail/cfe-dev/2009-May/005167.html&lt;/div&gt;&lt;div&gt;http://synopsis.fresco.org/&lt;/div&gt;&lt;div&gt;Performance inconsistencies when testing various bit-counting methods&amp;nbsp;&lt;/div&gt;&lt;div&gt;ubuntu cheat cube:119834-cheat-cube-ub&lt;/div&gt;&lt;div&gt;IE9 VML to SVG Migration Guide&lt;/div&gt;&lt;div&gt;Windows Phone 7:&lt;/div&gt;&lt;div&gt;*XNA 4.0 CTP available; works on PC, but only with the Reach profile, not HiDef..&lt;/div&gt;&lt;div&gt;*unlocked image with all apps; instructions on a blog..&lt;/div&gt;&lt;div&gt;*Petzold samples and book excerpt available..&lt;/div&gt;&lt;div&gt;*also a SQLite port -&amp;gt;csharp-sqlite.wp&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;Windows 7 XP Mode now supports CPUs without hardware virtualization (VT-x/AMD-V) support..&lt;/div&gt;&lt;div&gt;&lt;/div&gt;&lt;/div&gt;&lt;div&gt;Windows 7 SP1 virtualization news:&lt;/div&gt;&lt;div&gt;&lt;blockquote&gt;With Microsoft RemoteFX, users will be able to work remotely in a Windows Aero desktop environment, watch full-motion 
video, enjoy Silverlight animations, and run 3D applications," Microsoft's Max Herrmann writes, "All with the fidelity of a local-like performance when connecting over the LAN."&lt;/blockquote&gt;&lt;blockquote&gt;will CUDA work with it? i.e. no need for the compute cluster driver, plus OGL, DX and interop support..&lt;/blockquote&gt;&lt;/div&gt;&lt;blockquote&gt;&lt;blockquote&gt;Q: Will RemoteFx support also OpenGL hardware acceleration which is the 3D high level API used by professional applications like CAD systems or medical applications ?&lt;/blockquote&gt;&lt;blockquote&gt;&lt;br /&gt;&lt;/blockquote&gt;&lt;blockquote&gt;A: RemoteFX will support certain OpenGL applications. However, as the development of RemoteFX is still ongoing, it is too early to provide any specifics at this point.&lt;/blockquote&gt;&lt;blockquote&gt;Q: Are you plan to introduce RemoteFX also for Windows 7 because their are many scenarios where the remote system is not a server but a high end workstation ?&lt;/blockquote&gt;&lt;blockquote&gt;A: RemoteFX has been designed as a Windows Server capability to support the growing demand for multi-user, media-rich centralized desktop environments. Windows 7 will be supported as a virtual guest OS under Hyper-V.&lt;/blockquote&gt;&lt;/blockquote&gt;&lt;br /&gt;&lt;div&gt;&lt;blockquote&gt;Dynamic Memory is an improvement to Hyper-V which allows users to pool all available physical host memory together, and dynamically allocate it to virtual machines. 
In other words, if the workload changes, VMs can get access to extra memory without having to shut them down.&lt;/blockquote&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;XNA forums:&lt;/div&gt;&lt;blockquote&gt;Updated list of D3D12 suggestions&lt;/blockquote&gt;&lt;blockquote&gt;Unable to perform a recursive call with DirectCompute?&amp;nbsp;&lt;/blockquote&gt;&lt;blockquote&gt;How to AttachBuffersAndPrecompute to ID3DX11FFT&lt;/blockquote&gt;&lt;blockquote&gt;RWStructuredBuffer counter&lt;/blockquote&gt;&lt;blockquote&gt;The IncrementCounter is faster than IterlockedAdd(Buffer[0], 1) in 4 times.&lt;/blockquote&gt;&lt;blockquote&gt;Gamefest 2010 presentations?&lt;/blockquote&gt;&lt;blockquote&gt;D3D11 / D2D Interoperativity&lt;/blockquote&gt;&lt;blockquote&gt;329M pairs/sec radix sort performance, 408M keys/sec - crushes CUDPP numbers&lt;/blockquote&gt;&lt;blockquote&gt;AppendStructuredBuffer driver bug?&lt;/blockquote&gt;&lt;blockquote&gt;How to debug DirectX 11 Compute Shaders?&lt;/blockquote&gt;&lt;blockquote&gt;Creating a Shared Surface with DXGI&lt;/blockquote&gt;&lt;br /&gt;&lt;br /&gt;atomic&lt;br /&gt;I have some questions about RWStructuredBuffer:&lt;br /&gt;1. How to copy hidden counter to system memory? CopyStructureCount&lt;br /&gt;2. How to reset the counter to zero? last argument of OMSetRenderTargetsAndUnorderedAccessViews&lt;br /&gt;3. Why the performance of this counter is much more than the performance of InterlockedAdd at the element buffer? (HD 5670)&lt;br /&gt;The IncrementCounter is faster than IterlockedAdd(Buffer[0], 1) in 4 times.&lt;br /&gt;How to AttachBuffersAndPrecompute to ID3DX11FFT?&lt;br /&gt;&lt;br /&gt;http://gephi.org/&lt;br /&gt;&lt;br /&gt;http://forums.xna.com/forums/t/49607.aspx&lt;br /&gt;Thank you. I forgot about debug version of the D3DCSX. Debug message proved to be helpful. For the record: 1. The number of buffers attached must be exactly the same as in D3DX11_FFT_BUFFER_INFO. 2. 
The views MUST be created with the D3D11_BUFFER_UAV_FLAG_RAW flag (although it wasn't mentioned in documentation).&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;div&gt;&lt;div&gt;The Chrome dev channel release has support for an Open GL ES 2.0 interface&amp;nbsp;&lt;/div&gt;&lt;div&gt;for Native Client. This is something we said we would do sometime last year.&amp;nbsp;&lt;/div&gt;&lt;div&gt;When we consider it stable, documented etc. we will do more of an&amp;nbsp;&lt;/div&gt;&lt;div&gt;announcement.&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;Google are announcing that NaCl now also supports x86-64 and ARM.&lt;/div&gt;&lt;div&gt;http://www.osnews.com/story/23021/Native_Client_Portability_Almost_Native_Graphics_Layer_Engine&lt;/div&gt;&lt;/div&gt;&lt;div&gt;NaCl SFI: Adapting Software Fault Isolation to Contemporary CPU Architectures&lt;/div&gt;&lt;div&gt;PNaCl: Portable Native Client Executables&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;From GDC:&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;These are also graphics API translation layers:&lt;/div&gt;&lt;div&gt;Cider &amp;amp; Cedega: Direct3D on OpenGL&lt;/div&gt;&lt;div&gt;GameTree.tv: Direct3D on OpenGL ES&lt;/div&gt;&lt;div&gt;SwiftShader: DX Software Rendering (also WARP)&lt;/div&gt;&lt;div&gt;ANGLE Project: WebGL (OGL ES 2.0) on Direct3D&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Now we need the same for GPGPU APIs:&lt;/div&gt;&lt;div&gt;&lt;div&gt;CUDA on OpenCL?&lt;/div&gt;&lt;div&gt;CUDA on CAL?&lt;/div&gt;&lt;div&gt;DirectCompute on OpenCL?&lt;/div&gt;&lt;/div&gt;&lt;div&gt;OpenCL on DirectCompute?&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Posted on the OpenGL and CUDA forums:&lt;/div&gt;&lt;div&gt;&lt;div&gt;Questions for NVIDIA:&lt;/div&gt;&lt;div&gt;*Is NVIDIA going to expose ext_gpu_shader_fp64 on double-precision-capable GT2xx hardware, or is it only for D3D11 hardware?&lt;/div&gt;&lt;div&gt;For 
example, GTX 275&lt;/div&gt;&lt;div&gt;AMD seems to support double precision in GLSL via doublepAMD even on HD 4850 cards..&lt;/div&gt;&lt;div&gt;Also, will NVIDIA's initial GL 4.0 drivers finally document wgl_nv_dx_interop and expose the texture writing and random access support shown at GTC?&lt;/div&gt;&lt;div&gt;via ext_image_load_store?&lt;/div&gt;&lt;div&gt;Please post the PTX 1.5 and 2.0 documents..&lt;/div&gt;&lt;div&gt;Also, here I'm summing up things NVIDIA has promised for soon, so let's see how long it takes before we get:&lt;/div&gt;&lt;div&gt;*cuda-gdb support for hardware debugging of OpenCL kernels&lt;/div&gt;&lt;div&gt;*cuda-gdb GPU debugger for Mac (with OpenCL support also)&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Mac-related:&lt;/div&gt;&lt;div&gt;Is 64-bit Mac supported?&lt;/div&gt;&lt;div&gt;This package will work MAC OSX running 32/64-bit. &amp;nbsp;&lt;/div&gt;&lt;div&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; CUDA applications built in 32/64-bit (CUDA Driver API) is supported.&lt;/div&gt;&lt;div&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; CUDA applications built as 32-bit (CUDA Runtime API) is supported.&lt;/div&gt;&lt;div&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; (10.5.x Leopard and 10.6 SnowLeopard)&lt;/div&gt;&lt;div&gt;Note: x86_64 is not currently working for Leopard or SnowLeopard&lt;/div&gt;&lt;div&gt;CUDA applications built with the CUDA driver API can run as either 32/64-bit applications. 
&amp;nbsp;&lt;/div&gt;&lt;div&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;CUDA applications using CUDA Runtime APIs can only be built on 32-bit applications.&lt;/div&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;My mac notes:&lt;br /&gt;&lt;br /&gt;&lt;div class="MsoNormal" style="line-height: normal; margin-bottom: .0001pt; margin-bottom: 0cm; mso-layout-grid-align: none; mso-pagination: none; tab-stops: 28.3pt 56.65pt 85.0pt 113.35pt 141.7pt 170.05pt 198.4pt 226.75pt 255.1pt 283.45pt 311.8pt 340.15pt; text-autospace: none;"&gt;&lt;span style="font-family: &amp;quot;Helvetica&amp;quot;,&amp;quot;sans-serif&amp;quot;; font-size: 12.0pt;"&gt;nvcc matrixMul_kernel.cu matrixMulDrv.cpp&lt;span style="mso-spacerun: yes;"&gt;&amp;nbsp; &lt;/span&gt;-I../../common/inc/&lt;span style="mso-spacerun: yes;"&gt;&amp;nbsp; &lt;/span&gt;../../lib/libcutil_i386.a matrixMul_gold.cpp -Xlinker /usr/local/cuda/lib/libcuda.dylib &lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal" style="line-height: normal; margin-bottom: .0001pt; margin-bottom: 0cm; mso-layout-grid-align: none; mso-pagination: none; tab-stops: 28.3pt 56.65pt 85.0pt 113.35pt 141.7pt 170.05pt 198.4pt 226.75pt 255.1pt 283.45pt 311.8pt 340.15pt; text-autospace: none;"&gt;&lt;span style="font-family: &amp;quot;Helvetica&amp;quot;,&amp;quot;sans-serif&amp;quot;; font-size: 12.0pt;"&gt;nvcc matrixMul_kernel.cu -c -m64 &lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal" style="line-height: normal; margin-bottom: .0001pt; margin-bottom: 0cm; mso-layout-grid-align: none; mso-pagination: none; tab-stops: 28.3pt 56.65pt 85.0pt 113.35pt 141.7pt 170.05pt 198.4pt 226.75pt 255.1pt 283.45pt 311.8pt 340.15pt; text-autospace: none;"&gt;&lt;span style="font-family: &amp;quot;Helvetica&amp;quot;,&amp;quot;sans-serif&amp;quot;; font-size: 12.0pt;"&gt;g++ matrixMul_gold.cpp matrixMulDrv.cpp&lt;span style="mso-spacerun: yes;"&gt;&amp;nbsp; &lt;/span&gt;-I../../common/inc/ -I$CUDA_INC_PATH 
-L$CUDA_LIB_PATH /usr/local/cuda/lib/libcuda.dylib ../../lib/libcutil_i386.a &lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal" style="line-height: normal; margin-bottom: .0001pt; margin-bottom: 0cm; mso-layout-grid-align: none; mso-pagination: none; tab-stops: 28.3pt 56.65pt 85.0pt 113.35pt 141.7pt 170.05pt 198.4pt 226.75pt 255.1pt 283.45pt 311.8pt 340.15pt; text-autospace: none;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoNormal" style="line-height: normal; margin-bottom: .0001pt; margin-bottom: 0cm; mso-layout-grid-align: none; mso-pagination: none; tab-stops: 28.3pt 56.65pt 85.0pt 113.35pt 141.7pt 170.05pt 198.4pt 226.75pt 255.1pt 283.45pt 311.8pt 340.15pt; text-autospace: none;"&gt;&lt;span style="font-family: &amp;quot;Helvetica&amp;quot;,&amp;quot;sans-serif&amp;quot;; font-size: 12.0pt;"&gt;for nvcc -m64, create lib64 as a copy of lib&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal" style="line-height: normal; margin-bottom: .0001pt; margin-bottom: 0cm; mso-layout-grid-align: none; mso-pagination: none; tab-stops: 28.3pt 56.65pt 85.0pt 113.35pt 141.7pt 170.05pt 198.4pt 226.75pt 255.1pt 283.45pt 311.8pt 340.15pt; text-autospace: none;"&gt;&lt;span style="font-family: &amp;quot;Helvetica&amp;quot;,&amp;quot;sans-serif&amp;quot;; font-size: 12.0pt;"&gt;nvcc -m64 deviceQueryDrv.cpp&lt;span style="mso-spacerun: yes;"&gt;&amp;nbsp; &lt;/span&gt;-I../../common/inc/ -I../../../shared/inc -Xlinker /usr/local/cuda/lib/libcuda.dylib &lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal" style="line-height: normal; margin-bottom: .0001pt; margin-bottom: 0cm; mso-layout-grid-align: none; mso-pagination: none; tab-stops: 28.3pt 56.65pt 85.0pt 113.35pt 141.7pt 170.05pt 198.4pt 226.75pt 255.1pt 283.45pt 311.8pt 340.15pt; text-autospace: none;"&gt;&lt;span style="font-family: &amp;quot;Helvetica&amp;quot;,&amp;quot;sans-serif&amp;quot;; font-size: 12.0pt;"&gt;remove cutil&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal" style="line-height: normal; margin-bottom: .0001pt; margin-bottom: 0cm; mso-layout-grid-align: none; mso-pagination: none; tab-stops: 28.0pt; text-autospace: none;"&gt;&lt;span style="color: #007400; font-family: Menlo-Regular; mso-bidi-font-family: Menlo-Regular;"&gt;nvcc defaults to 32-bit&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal" style="line-height: normal; margin-bottom: .0001pt; margin-bottom: 0cm; mso-layout-grid-align: none; mso-pagination: none; tab-stops: 28.0pt; text-autospace: none;"&gt;&lt;span style="color: #007400; font-family: Menlo-Regular; mso-bidi-font-family: Menlo-Regular;"&gt;gcc defaults to 64-bit&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal" style="line-height: normal; margin-bottom: .0001pt; margin-bottom: 0cm; mso-layout-grid-align: none; mso-pagination: none; tab-stops: 28.0pt; text-autospace: none;"&gt;&lt;span style="color: #007400; font-family: Menlo-Regular; mso-bidi-font-family: Menlo-Regular;"&gt;g++&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal" style="line-height: normal; margin-bottom: .0001pt; margin-bottom: 0cm; mso-layout-grid-align: none; mso-pagination: none; tab-stops: 28.0pt; text-autospace: none;"&gt;&lt;span style="color: #007400; font-family: Menlo-Regular; mso-bidi-font-family: Menlo-Regular;"&gt;g++&lt;span style="mso-spacerun: yes;"&gt;&amp;nbsp; &lt;/span&gt;deviceQueryDrv.cpp&lt;span style="mso-spacerun: yes;"&gt;&amp;nbsp; &lt;/span&gt;-I../../common/inc/ -I../../../shared/inc&lt;span style="mso-spacerun: yes;"&gt;&amp;nbsp; &lt;/span&gt;/usr/local/cuda/lib/libcuda.dylib -I$CUDA_INC_PATH &lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal" style="line-height: normal; margin-bottom: .0001pt; margin-bottom: 0cm; mso-layout-grid-align: none; mso-pagination: none; tab-stops: 28.0pt; text-autospace: none;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div 
class="MsoNormal" style="line-height: normal; margin-bottom: .0001pt; margin-bottom: 0cm; mso-layout-grid-align: none; mso-pagination: none; tab-stops: 28.0pt; text-autospace: none;"&gt;&lt;span style="color: #007400; font-family: Menlo-Regular; mso-bidi-font-family: Menlo-Regular;"&gt;//#include &amp;lt;cutil.h&amp;gt;&lt;/span&gt;&lt;span style="font-family: Menlo-Regular; mso-bidi-font-family: Menlo-Regular;"&gt;&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal" style="line-height: normal; margin-bottom: .0001pt; margin-bottom: 0cm; mso-layout-grid-align: none; mso-pagination: none; tab-stops: 28.0pt; text-autospace: none;"&gt;&lt;span style="color: #643820; font-family: Menlo-Regular; mso-bidi-font-family: Menlo-Regular;"&gt;#define CU_SAFE_CALL_NO_SYNC(a) a&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal" style="line-height: normal; margin-bottom: .0001pt; margin-bottom: 0cm; mso-layout-grid-align: none; mso-pagination: none; tab-stops: 28.0pt; text-autospace: none;"&gt;&lt;span style="color: #007400; font-family: Menlo-Regular; mso-bidi-font-family: Menlo-Regular;"&gt;//CUT_EXIT(argc, argv);&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal" style="line-height: normal; margin-bottom: .0001pt; margin-bottom: 0cm; mso-layout-grid-align: none; mso-pagination: none; tab-stops: 28.0pt; text-autospace: none;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoNormal" style="line-height: normal; margin-bottom: .0001pt; margin-bottom: 0cm; mso-layout-grid-align: none; mso-pagination: none; tab-stops: 28.0pt; text-autospace: none;"&gt;&lt;span style="color: #007400; font-family: Menlo-Regular; mso-bidi-font-family: Menlo-Regular;"&gt;export CUDA_BIN_PATH=/usr/local/cuda/bin&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/div&gt;
&lt;div class="MsoNormal" style="line-height: normal; margin-bottom: .0001pt; margin-bottom: 0cm; mso-layout-grid-align: none; mso-pagination: none; tab-stops: 28.0pt; text-autospace: none;"&gt;&lt;span style="color: #007400; font-family: Menlo-Regular; mso-bidi-font-family: Menlo-Regular;"&gt;export CUDA_LIB_PATH=/usr/local/cuda/lib&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal" style="line-height: normal; margin-bottom: .0001pt; margin-bottom: 0cm; mso-layout-grid-align: none; mso-pagination: none; tab-stops: 28.0pt; text-autospace: none;"&gt;&lt;span style="color: #007400; font-family: Menlo-Regular; mso-bidi-font-family: Menlo-Regular;"&gt;export CUDA_INC_PATH=/usr/local/cuda/include&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal" style="line-height: normal; margin-bottom: .0001pt; margin-bottom: 0cm; mso-layout-grid-align: none; mso-pagination: none; tab-stops: 28.0pt; text-autospace: none;"&gt;&lt;span style="color: #007400; font-family: Menlo-Regular; mso-bidi-font-family: Menlo-Regular;"&gt;export PATH=$PATH:/usr/local/cuda/bin&lt;o:p&gt;&lt;/o:p&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal" style="line-height: normal; margin-bottom: .0001pt; margin-bottom: 0cm; mso-layout-grid-align: none; mso-pagination: none; tab-stops: 28.0pt; text-autospace: none;"&gt;&lt;span style="color: #007400; font-family: Menlo-Regular; mso-bidi-font-family: Menlo-Regular;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal" style="line-height: normal; margin-bottom: .0001pt; margin-bottom: 0cm; mso-layout-grid-align: none; mso-pagination: none; tab-stops: 28.0pt; text-autospace: none;"&gt;&lt;span style="color: #007400; font-family: Menlo-Regular; mso-bidi-font-family: Menlo-Regular;"&gt;&lt;br 
/&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="MsoNormal" style="line-height: normal; margin-bottom: .0001pt; margin-bottom: 0cm; mso-layout-grid-align: none; mso-pagination: none; tab-stops: 28.0pt; text-autospace: none;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8553786559872430029-781745449464866206?l=oscarbg.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://oscarbg.blogspot.com/feeds/781745449464866206/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://oscarbg.blogspot.com/2010/03/whats-for-cuda-31-and-opengl-3341.html#comment-form' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8553786559872430029/posts/default/781745449464866206'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8553786559872430029/posts/default/781745449464866206'/><link rel='alternate' type='text/html' href='http://oscarbg.blogspot.com/2010/03/whats-for-cuda-31-and-opengl-3341.html' title='What&apos;s for CUDA 3.1 and OpenGL 3.3/4.1!'/><author><name>oscarbg</name><uri>http://www.blogger.com/profile/10172785372134961220</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8553786559872430029.post-1942838912929466691</id><published>2010-03-18T19:32:00.000+01:00</published><updated>2010-03-18T19:32:07.927+01:00</updated><title type='text'>raw data..</title><content type='html'>games:&lt;br /&gt;*Metro 2033 and Just Cause 2 demos available! 
(Fermi launch titles?)&lt;br /&gt;*Assassin's Creed 2 and Bad Company 2 also this month..&lt;br /&gt;*Command &amp;amp; Conquer 4: Tiberian Twilight &lt;b&gt;&lt;/b&gt;&lt;br /&gt;*3D Vision CD 1.23 has Direct3D 11 support! (it lists support for the D3D11 Fermi Supersonic Sled demo)&lt;b&gt;&lt;/b&gt;&lt;br /&gt;&lt;b&gt;&lt;br /&gt;&lt;/b&gt;&lt;br /&gt;Internet Explorer 9 preview with Direct2D/DirectWrite support&lt;b&gt;&lt;br /&gt;&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;&lt;b&gt;*&lt;/b&gt;3D texture based separable convolution, extension of SDK example&lt;br /&gt;code:&lt;br /&gt;http://forums.nvidia.com/index.php?showtopic=163382&lt;br /&gt;&lt;br /&gt;*The Fermi binary format is similar to PTX: post by Luebke on the gpgpu-sim mailing list&lt;br /&gt;Someone from PathScale says he has all the info on this and other low-level details, presumably the PTX 1.5/2.0 specs (binary format spec?), and also info for an open-source CUDA driver for BSD etc..? &lt;br /&gt;*GPGPU-3 papers available!&lt;br /&gt;http://www.ece.neu.edu/groups/nucar/GPGPU/GPGPU-FinalProgram.pdf&lt;br /&gt;*CULA 1.2 available with some eigenvector/eigenvalue routines..&lt;br /&gt;&lt;br /&gt;*"GPU Sample Sort" paper for the upcoming IPDPS 2010 conference? &lt;br /&gt;&lt;blockquote&gt;It is possible to achieve much higher sorting rates for NV devices than with the Satish/CUDPP methods. You might be interested in our radix CUDA sorting results here at UVA. We demonstrate 480M pairs/sec, and 550M keys/sec on our GTX285 (with other devices evaluated as well). Interestingly enough, our keys-only results on the NV GT200 architecture are superior to the cycle-accurate sorting results from the (defunct) 32-core Larrabee.&lt;/blockquote&gt;Where is the source?&lt;br /&gt;&lt;br /&gt;http://www.cs.virginia.edu/~dgm4d/papers/RadixSortTR.pdf&lt;br /&gt;&lt;br /&gt;Other new sorting papers: &lt;br /&gt;*Revisiting Sorting for GPGPU Stream Architectures&lt;br /&gt;&amp;nbsp;"GPU Sample Sort" paper for the upcoming IPDPS 2010&lt;br /&gt;N. Leischner, V. Osipov, and P. 
Sanders. GPU sample sort. In &lt;i&gt;Proc.  Int'l Parallel and Distributed Processing Symposium (IPDPS), to appear&lt;/i&gt;,  2010 (currently available at http://arxiv1.library.cornell.edu/abs/&lt;b&gt;&lt;a href="http://eprintweb.org/S/article/arxiv/0909.5649" style="color: #cc0000; text-decoration: none;"&gt;0909.5649&lt;/a&gt;&lt;/b&gt;). &lt;br /&gt;&lt;br /&gt;*CUFFT does support streams... and seems to include the 3D FFT performance improvements from the SC08 paper, so&lt;br /&gt;Apple's FFT code now seems to work on NVIDIA OpenCL, but it is 2x-3x slower than CUFFT..&lt;br /&gt;&lt;br /&gt;2D to 3D video conversion:&lt;br /&gt;we have RealD and other DirectShow plugins..&lt;br /&gt;now:&lt;br /&gt;ArcSoft Sim 3D plus HD coming Q2..&lt;br /&gt;and PowerDVD 10..&lt;br /&gt;*TrueTheater™ Stabilizer &lt;br /&gt;*TrueTheater™ 3D&lt;br /&gt;*TrueTheater™ Noise Reduction&lt;br /&gt;&lt;blockquote&gt;PowerDVD 10 Mark II: Consumers who purchase PowerDVD 10 Ultra 3D will receive a FREE UPGRADE that enables support of the Blu-ray 3D format and 2D to 3D conversion of video files. Available this summer.&lt;br /&gt;Blu-ray 3D playback requires FREE "Mark II" upgrade which will be available soon.&lt;/blockquote&gt;Lots of betas coming:&lt;br /&gt;Qt 4.7&lt;br /&gt;Intel compiler 12&lt;br /&gt;VMware Workstation 7.1&lt;br /&gt;Also in March:&lt;br /&gt;OpenRL&lt;br /&gt;Heaven 2.0&lt;br /&gt;&lt;br /&gt;http://www.cs.utk.edu/~dongarra/WEB-PAGES/cscads-libtune-09/&lt;br /&gt;&lt;b&gt;&lt;b&gt;&lt;span&gt;1st CUDA Developers' Conference&lt;/span&gt;&lt;/b&gt;&lt;/b&gt; &lt;br /&gt;http://www.smithinst.ac.uk/Events/CUDA2009&lt;br /&gt;see&lt;br /&gt;"Looking after the 7 dwarfs: numerical libraries / frameworks for  GPUs" Mike Giles&lt;br /&gt;&amp;nbsp;also&lt;br /&gt;"The Art of Performance Tuning for CUDA and Manycore Architectures"&lt;br /&gt;David  Tarjan (NVIDIA)&lt;br /&gt;Kevin Skadron (U. 
Virginia)&lt;br /&gt;Paulius  Micikevicius (NVIDIA)&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;CUDPP 1.1.1 SVN has Fermi support&lt;br /&gt;Cusp has AMG (algebraic multigrid).. &lt;br /&gt;http://forums.nvidia.com/index.php?showtopic=163382&amp;amp;st=0&amp;amp;#entry1022104&lt;br /&gt;&lt;br /&gt;See DirectX 9.0 on OpenGL ES 2.0 -&amp;gt; http://www.gametree.tv/ Linux SDK&lt;br /&gt;&lt;br /&gt;&lt;blockquote&gt;Coming in Spring 2010, the GameTree.tv Publishing SDK for Intel CE  hardware will include the tools you need to optimize and debug your game  for the GameTree.tv Gaming Platform, plus the ability to order Intel CE  hardware.Developer Tools &amp;amp; Documentation&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; available&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp;  available&lt;br /&gt;OpenGL ES 1.1 and 2.0&lt;br /&gt;&amp;nbsp;- Windows Game Development and Emulation&lt;br /&gt;&amp;nbsp;- Linux Desktop Runtime SDK &amp;nbsp;&amp;nbsp;&amp;nbsp; available &amp;nbsp;&amp;nbsp;&amp;nbsp; available&lt;br /&gt;Direct3D® support&lt;br /&gt;&amp;nbsp;- Fixed-Function&lt;br /&gt;&amp;nbsp;- Shader Model 1.0 and 2.0 API&lt;br /&gt;&amp;nbsp;- Linux Desktop Emulation SDK &amp;nbsp;&amp;nbsp;&amp;nbsp; available &amp;nbsp;&amp;nbsp;&amp;nbsp; available&lt;br /&gt;Debugging With Visual Studio &amp;nbsp;&amp;nbsp;&amp;nbsp; Coming March 2010 &amp;nbsp;&amp;nbsp;&amp;nbsp; available&lt;br /&gt;GameTree.tv Developer Forums &amp;nbsp;&amp;nbsp;&amp;nbsp; Coming Soon &amp;nbsp;&amp;nbsp;&amp;nbsp; available&lt;br /&gt;Publish Games For Commercial Sale &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp; &lt;br /&gt;Detailed Hardware Setup Documentation &amp;nbsp;&amp;nbsp; &lt;br /&gt;Hardware Order Process &amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp; &lt;br /&gt;Developer Relations Support &amp;nbsp;&amp;nbsp; &lt;/blockquote&gt;fglrx 8.72.5 has Ubuntu 10.04 support and OpenGL 3.2.97xx (OpenGL 3.3/4.0 partial 
support?)&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;NVIDIA theater GDC notes:&lt;br /&gt;DMM2&lt;br /&gt;DMM2 free: 1500 objects max (Star Wars: The Force Unleashed doesn't use more), has interop with PhysX and Bullet, and adds&lt;br /&gt;DirectCompute and OpenCL simulation too&lt;br /&gt;&lt;br /&gt;beta shipping September/October&lt;br /&gt;plastic simulation and fracture mode not ready yet.. it calculates stress over the volume, so breaking is physically based..&lt;br /&gt;uses FP32 for GPU support, and SSE..&lt;br /&gt;&lt;br /&gt;3D Vision on Unreal Engine 3 shipping in April..&lt;br /&gt;3D Vision SDK soon: code samples, developer tricks for Surround etc.&lt;br /&gt;Surround recommends GTX 400 cards in SLI and the "Release 256" driver&lt;br /&gt;&lt;br /&gt;Khronos GDC session slides published:&lt;br /&gt;AMD OpenCL physics info covers SPH and soft bodies, no rigid bodies (this is Bullet work)..&lt;br /&gt;also the FEM simulation is DMM2 work..&lt;br /&gt;no slides for the more interesting talk?: FFT profiling for OpenCL by an NVIDIA employee &lt;br /&gt;&lt;br /&gt;PhysXLab with (precalculated) destruction is now in beta with Unreal Engine 3 integration&lt;br /&gt;&lt;br /&gt;new Unigine 2.0 this month on the 26th: Linux support? 
and Windows OpenGL tessellation support with Fermi/5xxx cards?&lt;br /&gt;Nsight, GTX 480: release on March 8&lt;br /&gt;Nexus 1.0: OpenGL and OpenCL analyzer, not a hardware debugger but like gDEBugger GL+CL&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Fermi games: Just Cause 2 (D3D10 only), Metro 2033 (D3D11 optional)&lt;br /&gt;http://nvidia.fullviewmedia.com/gdc2010/agenda.html &lt;br /&gt;OpenGL 4.0 support in OpenGL Extensions Viewer and GLEW trunk!&lt;br /&gt;Assassin's Creed 2, Bad Company 2&lt;br /&gt;ATI Open 3D&lt;br /&gt;NVIDIA 3DTV&lt;br /&gt;&lt;br /&gt;CUDA and Visual Studio:&lt;br /&gt;&lt;br /&gt;&lt;div class="quotetop"&gt;QUOTE &lt;/div&gt;&lt;div class="quotemain"&gt;-  create empty cuda projects trough "project.."&lt;/div&gt;&lt;br /&gt;You  can just create an ordinary console project and then add .cu files to  this project (see next point).&lt;br /&gt;&lt;br /&gt;&lt;div class="quotetop"&gt;QUOTE  &lt;/div&gt;&lt;div class="quotemain"&gt;- add new .cu files through  "add new item" (renaming c++ or txt in .cu files causes build errors)&lt;/div&gt;&lt;br /&gt;If  you add the CUDA build rules (Cuda.rules, distributed with the SDK)  then VS will automatically detect the .cu files and pass them to nvcc to  compile these to standard .obj files, the standard linker (link.exe)  will then link these with the rest of your application's .obj files.&lt;br /&gt;&lt;br /&gt;&lt;div class="quotetop"&gt;QUOTE &lt;/div&gt;&lt;div class="quotemain"&gt;-  doesn't highlight code in .cu files&lt;/div&gt;&lt;br /&gt;See  the instructions in  (SDK_INSTALL_DIR)\C\doc\syntax_highlighting\visual_studio_8&lt;br /&gt;&lt;br /&gt;&lt;div class="quotetop"&gt;QUOTE &lt;/div&gt;&lt;div class="quotemain"&gt;- must  copy a thousand times cutil64.dll around till it releases the program  ...&lt;/div&gt;&lt;br /&gt;Cutil is used to minimise  code replication between the SDK samples. I'd advise understanding what  you actually need and implementing it yourself. 
For example, most people  only want the cuda safe call macros and you would be better off  handling the error in a manner suitable for your app rather than just  calling exit().&lt;br /&gt;&lt;br /&gt;&lt;div class="quotetop"&gt;QUOTE &lt;/div&gt;&lt;div class="quotemain"&gt;- must add a "thousand" new libraries  not to cause build errors&lt;/div&gt;&lt;br /&gt;By  "thousand" do you mean one (cudart.lib)?! Ok, so you're using cutil so  you need cutil64.lib too. But by definition using any library (and the  CUDA API is provided through a library) you have to link with libraries.&lt;br /&gt;&lt;br /&gt;&lt;div class="quotetop"&gt;QUOTE &lt;/div&gt;&lt;div class="quotemain"&gt;- and  even then its not sure if it runs&lt;/div&gt;&lt;br /&gt;Can't  help with that one (without more info).&lt;br /&gt;&lt;br /&gt;I would advise the  following.&lt;br /&gt;&lt;br /&gt;Preparation:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Set up syntax highlighting&lt;/li&gt;&lt;li&gt;Set  up Intellisense&lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;Development:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Create a new,  empty, console project (or you can use an existing project if you have  one&lt;/li&gt;&lt;li&gt;Add your .c, .cpp and .cu files&lt;/li&gt;&lt;li&gt;Add the Cuda.rules&lt;/li&gt;&lt;li&gt;Modify  C/C++ code generation to use /MT in release, /MTd in debug&lt;/li&gt;&lt;li&gt;Do  the same for the Cuda code generation&lt;/li&gt;&lt;li&gt;Add cudart.lib to all  configurations (i.e. 
release and debug)&lt;/li&gt;&lt;li&gt;Build, run, debug etc.&lt;/li&gt;&lt;/ul&gt;&amp;nbsp;Proceedings of 24th IEEE International Parallel and Distributed Processing Symposium&lt;br /&gt;&lt;br /&gt;GPU papers:&lt;br /&gt;&lt;br /&gt;Session 2: Scientific Computing with GPUs&lt;br /&gt;Improving Numerical Reproducibility and Stability in Large-Scale Numerical Simulations on GPUs&lt;br /&gt;Implementing the Himeno Benchmark with CUDA on GPU Clusters&lt;br /&gt;Direct Self-Consistent Field Computations on GPU Clusters&lt;br /&gt;Parallelization of Tau-Leap Coarse-Grained Monte Carlo Simulations on GPUs&lt;br /&gt;&lt;br /&gt;A High-Performance Fault-Tolerant Software Framework for Memory on Commodity GPUs&lt;br /&gt;&lt;br /&gt;Sort&lt;br /&gt;High Performance Comparison-Based Sorting Algorithm on Many-Core GPUs&lt;br /&gt;GPU Sample Sort &lt;br /&gt;Highly Scalable Parallel Sorting&lt;br /&gt;&lt;br /&gt;Session 9: Software Support for Using GPUs&lt;br /&gt;Object-Oriented Stream Programming using Aspects&lt;br /&gt;Optimal Loop Unrolling For GPGPU Programs&lt;br /&gt;Speculative Execution on Multi-GPU Systems&lt;br /&gt;Dynamic Load Balancing on Single- and Multi-GPU Systems&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Fisheye Lens Distortion Correction on Multicore and Hardware Accelerator Platforms
&lt;br /&gt;Large-Scale Multi-Dimensional Document Clustering on GPU Clusters&lt;br /&gt;&lt;br /&gt;Dynamically Tuned Push-Relabel Algorithm for the Maximum Flow Problem on CPU-GPU-Hybrid Platforms&lt;br /&gt;&lt;br /&gt;Optimization of Linked List Prefix Computations on Multithreaded GPUs Using CUDA&lt;br /&gt;&lt;br /&gt;Inter-Block GPU Communication via Fast Barrier Synchronization&lt;br /&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8553786559872430029-1942838912929466691?l=oscarbg.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://oscarbg.blogspot.com/feeds/1942838912929466691/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://oscarbg.blogspot.com/2010/03/raw-data.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8553786559872430029/posts/default/1942838912929466691'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8553786559872430029/posts/default/1942838912929466691'/><link rel='alternate' type='text/html' href='http://oscarbg.blogspot.com/2010/03/raw-data.html' title='raw data..'/><author><name>oscarbg</name><uri>http://www.blogger.com/profile/10172785372134961220</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8553786559872430029.post-4867527878798533852</id><published>2010-03-18T19:01:00.000+01:00</published><updated>2010-03-18T19:01:48.303+01:00</updated><title type='text'>What's left in OpenGL 4.0? 
and more raw info..</title><content type='html'>A few days ago the OpenGL 3.3 and 4.0 specs were published along with GLSL 3.3 and 4.0, and a set of equivalent&amp;nbsp;ARB extensions was put in the registry.. The OpenGL 4.0 compatibility spec is now 600+ pages long; the core spec is 420 pages..&lt;br /&gt;&lt;br /&gt;Other things:&lt;br /&gt;&lt;div style="margin: 0px;"&gt;*OGL 4.0 quick reference card&lt;/div&gt;&lt;div style="margin: 0px;"&gt;&lt;span class="Apple-style-span" style="color: #008800; font-family: Arial; font-size: 13px; white-space: nowrap;"&gt;http://www.khronos.org/files/opengl4-quick-reference-card.pdf&lt;/span&gt;&lt;/div&gt;&lt;br /&gt;*new glext.h and gl3.h updated &lt;br /&gt;*glloader, GLEW (on SVN) and OpenGL Extensions Viewer already support 3.3/4.0..&lt;br /&gt;waiting for SDL, SFML.. &lt;br /&gt;*Waiting for Fermi drivers on launch day..&lt;br /&gt;&lt;br /&gt;Remember: all are&amp;nbsp;ARB extensions, no vendor or EXT ones..&lt;br /&gt;&lt;br /&gt;&lt;div style="margin: 0px;"&gt;ARB extensions not included in the 3.3/4.0 specs:&lt;/div&gt;&lt;br /&gt;GL_ARB_shading_language_include&lt;br /&gt;GL_ARB_texture_compression_bptc&lt;br /&gt;&lt;div&gt;so the D3D11 HDR texture format is not required for OpenGL 4.0..&lt;/div&gt;&lt;div style="margin: 0px;"&gt;#include in shaders is also left out..&lt;/div&gt;Will the 5xxx series support OpenGL 4.0 by emulating doubles on the CPU? Double-float emulation would be better..&lt;br /&gt;&lt;br /&gt;Lastly, NVIDIA's:&lt;br /&gt;&lt;div style="margin: 0px;"&gt;&lt;/div&gt;&lt;div style="margin: 0px;"&gt;GL_EXT_shader_image_load_store&lt;/div&gt;&lt;div style="margin: 0px;"&gt;GL_EXT_vertex_attrib_64bit&lt;/div&gt;&lt;div style="margin: 0px;"&gt;and AMD's:&lt;/div&gt;&lt;div style="margin: 0px;"&gt;GL_EXT_shader_atomic_counters&lt;/div&gt;&lt;div style="margin: 0px;"&gt;are not found yet..&lt;br /&gt;&lt;br /&gt;AMD 10.3 also includes the first of these extensions, blend_func_extended.. 
&lt;/div&gt;&lt;div style="margin: 0px;"&gt;GL_EXT_vertex_attrib_64bit adds vertex attribs:&lt;/div&gt;&lt;div style="margin: 0px;"&gt;so now fp64 is only for uniforms and passing not vertex attribs&amp;nbsp;&lt;/div&gt;&lt;div style="margin: 0px;"&gt;remember no double rendertargets tex formats simlar to d3d11..&lt;/div&gt;&lt;div style="margin: 0px;"&gt;GL_EXT_shader_image_load_store allow write to random access to texes RWtexture3d &lt;br /&gt;amd has amdx_random_access_target &lt;/div&gt;&lt;br /&gt;&lt;div style="margin: 0px;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="margin: 0px;"&gt;&lt;/div&gt;&lt;div style="margin: 0px;"&gt;ARB_blend_func_extended is called dual source blending in DX10, but got dropped in DX11..&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;div style="margin: 0px;"&gt;We have tesselation shaders, dynamic shader linkage and compute interop with OCL..&lt;/div&gt;still lacking vs d3d11 is:&lt;br /&gt;*multi-threaded rendering:&lt;br /&gt;remember only creation of resources in current drivers.. no parralel command list creation&lt;br /&gt;is driver or hardware issue?&lt;br /&gt;&lt;br /&gt;*random access load/store/atomic to texes-&amp;gt;GL_EXT_shader_image_load_store amdx_random_access_target+GL_EXT_shader_atomic_counters&amp;nbsp;RWtexture3d&lt;br /&gt;*lacking atomic access to texs and mem barriers in fragment shaders: DeviceMemoryBarrier in d3d11&lt;br /&gt;*GL_AMD_conservative_depth adds:&lt;br /&gt;&lt;br /&gt;Conservative oDepth - This algorithm allows a pixel shader to compare the per-pixel depth value of the pixel shader with that in the rasterizer. 
The result enables early depth culling operations while maintaining the ability to output oDepth from a pixel shader.&lt;br /&gt;&lt;br /&gt;People on the OpenGL forums are criticizing the lack of:&lt;br /&gt;*multi-threaded rendering&lt;br /&gt;*shader binaries to avoid recompilation, preferably cross-vendor and cross-platform like&amp;nbsp;D3D's DXBC bytecode (which is almost 100% compatible with ATI's IL)&lt;br /&gt;&lt;br /&gt;*direct state access&lt;br /&gt;*GL_ARB_sampler_objects is an epic fail, since it has no GLSL support..&lt;br /&gt;&lt;div&gt;What I miss:&lt;/div&gt;&lt;div&gt;*EXT_separate_shader_objects&lt;/div&gt;&lt;div&gt;The ability to separate program objects is only going to become more relevant.&lt;/div&gt;&lt;div&gt;*NV_texture_barrier&amp;nbsp;&lt;/div&gt;&lt;div&gt;cross-process texture sharing?&amp;nbsp;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Programmable offsets in gather are supported (see the 2x speedup in the Fermi whitepaper), and so is tessellation&lt;/div&gt;&lt;div&gt;a Fermi test would be good&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Fermi:&lt;/div&gt;&lt;div&gt;the 196.78 drivers support Fermi..&lt;/div&gt;&lt;div&gt;full OpenGL 4.0 support at the Fermi launch..&lt;/div&gt;&lt;div&gt;the Stochastic Transparency paper (I3D 2010) has Fermi performance numbers for this algorithm via the OpenGL sample_shading extension (a D3D10.1-class feature)&lt;/div&gt;&lt;div&gt;GLwgl_dx_interop&lt;/div&gt;&lt;div&gt;GL_NVX_gpu_memory_info&lt;/div&gt;&lt;div&gt;GL_NV_gpu_program4_1&lt;/div&gt;&lt;div&gt;to be published then?&lt;/div&gt;&lt;div&gt;try OpenRL with OpenCL on Fermi..&lt;/div&gt;&lt;div&gt;Drivers at the Fermi launch will bring:&lt;/div&gt;&lt;div&gt;1. CUDA 3.0 final&lt;/div&gt;&lt;div&gt;&lt;div&gt;Fermi Direct3D 11 interoperability&lt;/div&gt;&lt;div&gt;Fermi HW Profiler support in OpenCL Visual Profiler&lt;/div&gt;&lt;div&gt;Complete BLAS lib, now with complex routines&lt;/div&gt;&lt;div&gt;cuda-gdb support for JIT compiled 
kernels&lt;/div&gt;&lt;/div&gt;&lt;div&gt;and adds:&lt;/div&gt;&lt;div&gt;&lt;div&gt;C++ Class Inheritance&lt;/div&gt;&lt;div&gt;C++ Template Inheritance&lt;/div&gt;&lt;div&gt;Unified interoperability API for Direct3D and OpenGL&lt;/div&gt;&lt;div&gt;OpenGL texture interoperability&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;2. a new OpenCL driver with support for:&lt;/div&gt;&lt;div&gt;*pragma unroll&amp;nbsp;&lt;/div&gt;&lt;div&gt;*local atomics&lt;/div&gt;&lt;div&gt;*the final ICD&lt;/div&gt;&lt;div&gt;*D3D9/10/11 support&lt;/div&gt;&lt;div&gt;&lt;div&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;fxc supports HLSL interfaces, but how are the functions inside them called?&amp;nbsp;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;see "CUDA_Developer_Guide_for_Optimus_Platforms"&lt;/div&gt;&lt;/div&gt;&lt;div&gt;http://www.stumblingahead.com/blog/?p=66 will talk about tessellation soon..&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;GPU papers at 2010 conferences:&lt;/div&gt;&lt;div&gt;*PPoPP&lt;/div&gt;&lt;div&gt;*GDC 2010&lt;/div&gt;&lt;div&gt;*I3D 2010&lt;/div&gt;&lt;div&gt;*GPGPU-3&lt;/div&gt;&lt;div&gt;*ASPLOS&lt;/div&gt;&lt;div&gt;&lt;div&gt;MacroSS: Macro-SIMDization of Streaming Applications,&lt;/div&gt;&lt;div&gt;COMPASS: A Programmable Data Prefetcher Using Idle GPU Shaders, &amp;nbsp;&lt;/div&gt;&lt;div&gt;"Investigating the Impact of Code Generation on Performance Characteristics of Integer Programs."&lt;/div&gt;&lt;/div&gt;&lt;div&gt;*EUROGRAPHICS 2010&lt;/div&gt;&lt;div&gt;*SIGGRAPH 2010&lt;/div&gt;&lt;div&gt;Interesting new/upcoming books:&lt;/div&gt;&lt;div&gt;&lt;div&gt;*Game Programming Gems 8&lt;/div&gt;&lt;div&gt;*GPU Computing Gems (2010?)&lt;/div&gt;&lt;div&gt;*Game Engine Gems 1, Volume One&lt;/div&gt;&lt;div&gt;*Programming Massively Parallel Processors: A Hands-&lt;/div&gt;&lt;div&gt;*GPU Pro: Advanced Rendering Techniques&amp;nbsp;&lt;/div&gt;&lt;div&gt;*Multigrid Methods on GPUs&lt;/div&gt;&lt;div&gt;*Game 
Coding Complete, Third Edition&lt;/div&gt;&lt;div&gt;*Video Game Optimization&lt;/div&gt;&lt;div&gt;*Game Engine Architecture&lt;/div&gt;&lt;div&gt;*Real-Time Cameras&lt;/div&gt;&lt;div&gt;*Programming Game AI by Example&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Comments:&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;GL_ARB_shading_language_include-&amp;gt; GLSL accepts #include, and CompileShaderIncludeARB sets the &amp;lt;&amp;gt; search paths&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;GL_ARB_texture_compression_bptc&lt;/div&gt;&lt;div&gt;the D3D11 texture formats -&amp;gt; a compressor is included, but compressing offline is better&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;GL_ARB_blend_func_extended&lt;/div&gt;&lt;div&gt;allows using two fragment shader outputs, one as the input color and the other as blend factors&lt;/div&gt;&lt;div&gt;see the example of a colored reflective window drawn in one pass using the ROPs&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;GL_ARB_explicit_attrib_location-&amp;gt;&lt;/div&gt;&lt;div&gt;sets explicitly in GLSL how variables are passed to and from shaders (locations of vertex inputs and fragment outputs)&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;GL_ARB_occlusion_query2&lt;/div&gt;&lt;div&gt;gives a boolean result telling whether anything passed or not (ANY_SAMPLES_PASSED)&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;GL_ARB_sampler_objects&lt;/div&gt;&lt;div&gt;BindSampler( uint unit, uint sampler );&lt;/div&gt;&lt;div&gt;&amp;nbsp;When a sampler object is bound to a texture unit, its state supersedes that&lt;/div&gt;&lt;div&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;of the texture object bound to that texture unit. If the sampler name zero&lt;/div&gt;&lt;div&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;is bound to a texture unit, the currently bound texture's sampler state&lt;/div&gt;&lt;div&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;becomes active. 
A single sampler object may be bound to multiple texture&lt;/div&gt;&lt;div&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;units simultaneously.&lt;/div&gt;&lt;div&gt;GLSL is not changed to the HLSL style of tex.sampler&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;GL_ARB_shader_bit_encoding&lt;/div&gt;&lt;div&gt;with this I can use the fast float-to-int trick from Kun Zhou's SPAP paper, which takes the bits&lt;/div&gt;&lt;div&gt;of a float and, by manipulating them, gets abs, the integer value of the float, etc..&lt;/div&gt;&lt;div&gt;To obtain signed or unsigned integer values holding the encoding of a&lt;/div&gt;&lt;div&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;floating-point value, use:&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp;genIType floatBitsToInt(genType value);&lt;/div&gt;&lt;div&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp;genUType floatBitsToUint(genType value);&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;Conversions are done on a component-by-component basis.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;/div&gt;&lt;div&gt;GL_ARB_texture_rgb10_a2ui&lt;/div&gt;&lt;div&gt;GL_ARB_texture_swizzle&lt;/div&gt;&lt;div&gt;GL_ARB_timer_query&lt;/div&gt;&lt;div&gt;GL_ARB_vertex_type_2_10_10_10_rev&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;GL_ARB_draw_indirect&lt;/div&gt;&lt;div&gt;compute interop&lt;/div&gt;&lt;div&gt;void DrawArraysIndirect(enum mode, const void *indirect);&lt;/div&gt;&lt;div&gt;new buffer object binding&lt;/div&gt;&lt;div&gt;DRAW_INDIRECT_BUFFER &amp;nbsp;&amp;nbsp;&lt;/div&gt;&lt;div&gt;whatever is bound there&lt;/div&gt;&lt;div&gt;is used as the data for the element count, etc..&lt;/div&gt;&lt;div&gt;and if nothing is bound?&lt;/div&gt;&lt;div&gt;is the indirect pointer used then?..&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;GL_ARB_gpu_shader5&lt;/div&gt;&lt;div&gt;GL_ARB_gpu_shader_fp64&lt;/div&gt;&lt;div&gt;Should double-precision fragment shader outputs be 
supported?&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp;RESOLVED: &amp;nbsp;Not in this extension. &amp;nbsp;Note that we don't have&lt;/div&gt;&lt;div&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp;double-precision framebuffer formats to accept such values.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;GL_ARB_shader_subroutine&lt;/div&gt;&lt;div&gt;GL_ARB_tessellation_shader&lt;/div&gt;&lt;div&gt;GL_ARB_texture_buffer_object_rgb32&lt;/div&gt;&lt;div&gt;GL_ARB_transform_feedback2&lt;/div&gt;&lt;div&gt;1. transform feedback objects&amp;nbsp;&lt;/div&gt;&lt;div&gt;2. pause and resume transform feedback&lt;/div&gt;&lt;div&gt;3. ability to draw primitives captured in transform feedback mode without querying the captured&lt;/div&gt;&lt;div&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;primitive count&lt;/div&gt;&lt;div&gt;DrawTransformFeedback()&lt;/div&gt;&lt;div&gt;GL_ARB_transform_feedback3&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Unreal Engine 3 news:&lt;/div&gt;&lt;div&gt;*Palm webOS and iPhone support (on Mac?)&lt;/div&gt;&lt;div&gt;*3D Vision support&lt;/div&gt;&lt;div&gt;http://www.chw.net/2010/02/29-incomodas-preguntas-para-nvidia-sobre-gf100/&lt;/div&gt;&lt;div&gt;AMD Open Physics Initiative Expands Ecosystem with Free DMM for Game Production and Updated version of Bullet Physics&amp;nbsp;&lt;/div&gt;&lt;div&gt;Apple adopts DirectX 11 GPUs, buys AMD Radeon HD 5750&lt;/div&gt;&lt;div&gt;Apple news:&lt;/div&gt;&lt;div&gt;*$99 developer program&lt;/div&gt;&lt;div&gt;*Valve games coming to the Mac next month, and&amp;nbsp;Monkey Island 2 SE..&lt;/div&gt;&lt;div&gt;*6-core Mac Pro next week (12-core?): Mac Pro 'hexacore' Xeon, Core i7-980X coming Tuesday&lt;/div&gt;&lt;div&gt;AnandTech reviews of the Core i7-980X (Gulftown, with AES-NI) today..&lt;/div&gt;&lt;div&gt;*AMD 5750 iMac in June?&amp;nbsp;That would bring OpenGL 4.0 and full OpenCL support to the Mac..&lt;/div&gt;&lt;div&gt;so 10.6.4 will support the AMD 5xxx series&lt;/div&gt;&lt;div&gt;*iPhone OS 4.0 
multitasking support&lt;/div&gt;&lt;div&gt;*10.6.3 this month?&lt;/div&gt;&lt;div&gt;CUDA: cuda-gdb GPU support and visual profilers, 64-bit and efficient GL interop soon?&lt;/div&gt;&lt;div&gt;&lt;div&gt;http://pasco2010.imag.fr/images/poster_pasco2010.pdf&lt;/div&gt;&lt;div&gt;http://unlimiteddetailtechnology.com/&lt;/div&gt;&lt;div&gt;Roxio CinePlayer 3D&lt;/div&gt;&lt;/div&gt;&lt;div&gt;CLyther = Python + OpenCL&lt;/div&gt;&lt;div&gt;AMD Open Physics (free DMM 2.0 with OpenCL) and Open Stereo (QBF stereo for Radeon?)&lt;/div&gt;&lt;div&gt;an Eyefinity SDK is also coming soon..&lt;/div&gt;&lt;div&gt;Ticker Tape available&lt;/div&gt;&lt;div&gt;PGI Insider, February 2010 issue&lt;/div&gt;&lt;div&gt;http://www.pgroup.com/lit/articles/insider/v2n1a3.htm&lt;/div&gt;&lt;div&gt;covers the new Fermi support and data regions..&lt;/div&gt;&lt;div&gt;XNA 4.0, Windows Phone 7 and Tegra 2 soon..&lt;/div&gt;&lt;div&gt;Yellow Dog Enterprise Linux for CUDA&lt;/div&gt;&lt;div&gt;http://ydl.net/cuda/iso/YDELforCUDA-6.2-20100302-DVD.iso (free download for students)&lt;/div&gt;&lt;div&gt;Jenkins Software Announces Data Mining Tool for Game Developers&lt;/div&gt;&lt;div&gt;&lt;div&gt;As a further enhancement, AMD has developed new parallel GPU accelerated implementations of Bullet Physics’ Smoothed Particle Hydrodynamics (SPH)&amp;nbsp;Fluids and Soft Bodies/Cloth. 
The new code written in OpenCL and Direct Compute will be contributed as open source.&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;&lt;div&gt;&amp;nbsp;OpenGL usage from an ISV perspective&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;Intel GPA 3.0&lt;/div&gt;&lt;div&gt;Unity Announces 3.0 Platform, Support For PS3, iPad, And Android&lt;/div&gt;&lt;div&gt;Valve Confirms Mac Versions Of Steam, Valve Game&lt;/div&gt;&lt;/div&gt;&lt;div&gt;http://www.raknet.net/echochamber&lt;/div&gt;&lt;/div&gt;&lt;div&gt;Erwin Coumans - SONY - Porting existing code to OpenCL&lt;/div&gt;&lt;div&gt;Ben Gaster AMD and Avi Shapira - Graphic Remedy - Debugging fluid dynamics on OpenCL&lt;/div&gt;&lt;div&gt;Greg Smith - NVIDIA - FFT and OpenCL Profiling&lt;/div&gt;&lt;div&gt;http://www.arm.com/community/software-enablement/google/solution-center-android.php&lt;/div&gt;&lt;div&gt;http://realworldtech.com/forums/index.cfm?action=detail&amp;amp;id=108017&amp;amp;threadid=108017&amp;amp;roomid=2&lt;/div&gt;&lt;div&gt;&lt;div&gt;I can only say that at CAL level (and obviously OpenCL built upon CAL) there are numerous problems with multiple GPUs.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Definitely you're need one thread and one context per each GPU to make it working. 
But it itsn't enough because almost every CAL function isn't thread safe, thus calling calResMap() (which is the only to get access to local GPU memory) in one thread blocks all other threads/contexts.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;And (as I've already wrote at these forums), OpenCL using calCtxWaitForEvent() function instead of CPU burning loop&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;while (calCtxIsEventDone(calCtx, e) == CAL_RESULT_PENDING);&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;to wait for GPU kernel completion.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;But this calCtxWaitForEvent() also blocks every context currently running. This especially noticeable when there are different devices at system (like 5770+4770). So basically it's simply impossible to asynchronously work with multiple GPUs within single process.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;All above things applies to windows version of CAL, never tried linux one.&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;&lt;div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Yup, and I use 1 thread per GPU too. So 1 thread, 1 context, 1 queue for each GPU. I tried other configurations but they weren't working (i.e. 
not running in parallel).&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Why on HD4870 with 512 MB onboard RAM only 128 available to OpenCL ???&amp;nbsp;&lt;/div&gt;&lt;div&gt;http://forums.amd.com/devforum/messageview.cfm?catid=390&amp;amp;threadid=128846&amp;amp;enterthread=y&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;gmac&lt;/div&gt;&lt;div&gt;http://ctk-dev.sourceforge.net&lt;/div&gt;&lt;div&gt;&lt;div&gt;http://code.google.com/p/fluidic/&lt;/div&gt;&lt;div&gt;http://otoy.com/&lt;/div&gt;&lt;div&gt;http://www.gameenginegems.com/&lt;/div&gt;&lt;div&gt;We're excited to announce a new addition to the Palm® webOS™ development platform: the webOS Plug-in Development Kit (PDK) lets developers extend their webOS applications by writing plug-ins in C or C++. 
The webOS PDK makes it easy for developers to leverage existing code and exposes new capabilities — including high-performance 3D graphics.&lt;/div&gt;&lt;div&gt;http://code.google.com/p/gyp/source/checkout&lt;/div&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8553786559872430029-4867527878798533852?l=oscarbg.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://oscarbg.blogspot.com/feeds/4867527878798533852/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://oscarbg.blogspot.com/2010/03/whats-left-in-opengl-40-and-more-raw.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8553786559872430029/posts/default/4867527878798533852'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8553786559872430029/posts/default/4867527878798533852'/><link rel='alternate' type='text/html' href='http://oscarbg.blogspot.com/2010/03/whats-left-in-opengl-40-and-more-raw.html' title='What&apos;s left in OpenGL 4.0? 
and more raw info..'/><author><name>oscarbg</name><uri>http://www.blogger.com/profile/10172785372134961220</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8553786559872430029.post-8450121695166184000</id><published>2010-03-07T18:12:00.016+01:00</published><updated>2010-03-07T19:50:22.439+01:00</updated><title type='text'>GPU computing toys!</title><content type='html'>Hi, I would like to release some lame but hopefully useful tools:&lt;br /&gt;&lt;a href="https://dl.dropbox.com/u/1416327/cld3d.rar"&gt;https://dl.dropbox.com/u/1416327/cld3d.rar&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;First, OpenCL-D3D interop headers and specs for NVIDIA and AMD, plus a tool for checking their current status:&lt;br /&gt;the headers are in the h folder&lt;br /&gt;and cover D3D9/10/11 for NVIDIA and D3D9/10 for AMD..&lt;br /&gt;#include &amp;lt;cl\cl_nv_d3d.h&amp;gt; for every D3D version, call&amp;nbsp;initcld3d() in your code, and voila, you have the&lt;br /&gt;D3D interop functions..&lt;br /&gt;If you #define&amp;nbsp;INCAMD, the AMD functions are included too, so you can skip the AMD headers..&lt;br /&gt;&lt;br /&gt;With these I have compiled&amp;nbsp;four executables named cl_xx_interop, which check D3D 9, 9Ex, 10 and 11..&lt;br /&gt;They check extension reporting, try to create a shared context in several ways, then associate a D3D buffer and textures with OpenCL, and acquire and release them before use..&lt;br /&gt;&lt;br /&gt;The cl_d3d10_interop build also shows the image formats available to OpenCL images (see next post)..&lt;br /&gt;&lt;br /&gt;Testing OCL-D3D11 interop&lt;br /&gt;Checking D3D interop extensions support for platform: NVIDIA Corporation&lt;br /&gt;&amp;nbsp;nv D3D &amp;nbsp;9 interop extension: &amp;nbsp;Found.&lt;br /&gt;&amp;nbsp;nv D3D 10 interop extension: &amp;nbsp;Found.&lt;br /&gt;&amp;nbsp;nv D3D 11 
interop extension: &amp;nbsp;Found.&lt;br /&gt;&lt;br /&gt;Using device: GeForce GTX 275&lt;br /&gt;Enabling texture interop checks: image support is supported.&lt;br /&gt;clGetDeviceIDsFromD3D11NV pointer: Found&lt;br /&gt;&amp;nbsp;and it works! (returns d3d associated ocl device)&lt;br /&gt;clCreateFromD3D11BufferNV pointer: Found&lt;br /&gt;clCreateFromD3D11Texture2DNV pointer: Found&lt;br /&gt;clCreateFromD3D11Texture3DNV pointer: Found&lt;br /&gt;clEnqueueAcquireD3D11ObjectsNV pointer: Found&lt;br /&gt;clEnqueueReleaseD3D11ObjectsNV pointer: Found&lt;br /&gt;Testing context creation with&lt;br /&gt;&amp;nbsp;no dev (clCreateContextFromType): OK.&lt;br /&gt;dev info (getdeviceids): OK.&lt;br /&gt;dev info (clGetDeviceIDsFromD3DNV CL_PREFERRED_DEVICES_FOR_D3D9_NV): OK.&lt;br /&gt;Testing clCreateFromD3D11BufferNV: OK.&lt;br /&gt;Testing aquire release stuff: Ok.. releasing it: Ok.&lt;br /&gt;Testing clCreateFromD3D11Texture2DNV: OK.&lt;br /&gt;Testing aquire release stuff: Ok.. releasing it: Ok.&lt;br /&gt;Testing clCreateFromD3D11Texture3DNV: OK.&lt;br /&gt;Testing aquire release stuff: Ok.. 
releasing it: Ok.&lt;br /&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;It also contains optd3d, which displays the four optional&amp;nbsp;D3D11 features (cap bits):&lt;br /&gt;&lt;br /&gt;On my GTX 200 it displays:&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;multithreaded comand lists: 0&lt;br /&gt;multithreaded Concurrent Creates: 1&lt;br /&gt;Double precision: 0&lt;br /&gt;Compute Shader: 1&lt;br /&gt;&lt;br /&gt;On an ATI 5850 it displays:&lt;br /&gt;&lt;br /&gt;&lt;div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"&gt;multithreaded comand lists: 0&lt;/div&gt;&lt;div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"&gt;multithreaded Concurrent Creates: 1&lt;/div&gt;&lt;div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"&gt;Double precision: 1&lt;/div&gt;&lt;div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"&gt;Compute Shader: 1&lt;/div&gt;&lt;div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"&gt;Anyway, double precision does not work with loops..&lt;/div&gt;&lt;div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"&gt;This shows that multithreaded command lists are still not supported by ATI (is this an implementation issue or a hardware limitation?)&lt;/div&gt;&lt;div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"&gt;The same holds for NVIDIA and the upcoming Fermi..&lt;/div&gt;&lt;div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"&gt;I 
include a CLinfo (not mine) for checking OpenCL info..&lt;/div&gt;&lt;div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"&gt;report.bat creates a report.txt with the output of all these executables..&lt;/div&gt;&lt;div&gt;I also include 2dbench for checking the GDI performance issues in Windows 7.. AMD will fix them in Catalyst 10.4..&lt;/div&gt;&lt;div&gt;&lt;br /&gt;There is also a highly efficient matmul for CUDA and AMD cards, and peakflops for AMD cards..&lt;/div&gt;&lt;div&gt;&lt;br /&gt;%&lt;br /&gt;% &amp;nbsp;compute C = A*B, A:mxk, B:kxn, C:mxn&lt;br /&gt;%&lt;br /&gt;% &amp;nbsp;cubin file = ../method1/decuda_ldsb32_cudasm.cubin&lt;br /&gt;% &amp;nbsp;kernel function = method1_variant_sgemmNN&lt;br /&gt;% &amp;nbsp;use device: GeForce GTX 275&lt;br /&gt;% &amp;nbsp;m=n=k &amp;nbsp; &amp;nbsp;gpu_time (ms) &amp;nbsp; flops (Gflops/s)&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; 32 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 0.044 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 1.391&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;128 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 0.120 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;32.451&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;224 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 0.194 &amp;nbsp; &amp;nbsp; &amp;nbsp; 107.870&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;320 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 0.302 &amp;nbsp; &amp;nbsp; &amp;nbsp; 201.802&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;416 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 0.445 &amp;nbsp; &amp;nbsp; &amp;nbsp; 301.033&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;512 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 0.619 &amp;nbsp; &amp;nbsp; &amp;nbsp; 403.979&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;608 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 1.277 &amp;nbsp; &amp;nbsp; &amp;nbsp; 327.914&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;704 
&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 1.582 &amp;nbsp; &amp;nbsp; &amp;nbsp; 410.719&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;800 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 2.618 &amp;nbsp; &amp;nbsp; &amp;nbsp; 364.210&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;896 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 3.135 &amp;nbsp; &amp;nbsp; &amp;nbsp; 427.439&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;992 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 4.401 &amp;nbsp; &amp;nbsp; &amp;nbsp; 413.123&lt;br /&gt;&amp;nbsp;&amp;nbsp; 1088 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 6.014 &amp;nbsp; &amp;nbsp; &amp;nbsp; 398.868&lt;br /&gt;&amp;nbsp;&amp;nbsp; 1184 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 6.981 &amp;nbsp; &amp;nbsp; &amp;nbsp; 442.860&lt;br /&gt;&amp;nbsp;&amp;nbsp; 1280 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 8.751 &amp;nbsp; &amp;nbsp; &amp;nbsp; 446.365&lt;br /&gt;&amp;nbsp;&amp;nbsp; 1376 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;10.911 &amp;nbsp; &amp;nbsp; &amp;nbsp; 444.746&lt;br /&gt;&amp;nbsp;&amp;nbsp; 1472 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;13.403 &amp;nbsp; &amp;nbsp; &amp;nbsp; 443.262&lt;br /&gt;&amp;nbsp;&amp;nbsp; 1568 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;16.377 &amp;nbsp; &amp;nbsp; &amp;nbsp; 438.470&lt;br /&gt;&amp;nbsp;&amp;nbsp; 1664 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;18.901 &amp;nbsp; &amp;nbsp; &amp;nbsp; 454.051&lt;br /&gt;&amp;nbsp;&amp;nbsp; 1760 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;22.437 &amp;nbsp; &amp;nbsp; &amp;nbsp; 452.594&lt;br /&gt;&amp;nbsp;&amp;nbsp; 1856 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;25.820 &amp;nbsp; &amp;nbsp; &amp;nbsp; 461.218&lt;br /&gt;&amp;nbsp;&amp;nbsp; 1952 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;31.233 &amp;nbsp; &amp;nbsp; &amp;nbsp; 443.566&lt;br /&gt;&amp;nbsp;&amp;nbsp; 2048 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;33.317 &amp;nbsp; &amp;nbsp; &amp;nbsp; 480.229&lt;br /&gt;&amp;nbsp;&amp;nbsp; 2144 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;39.834 
&amp;nbsp; &amp;nbsp; &amp;nbsp; 460.841&lt;br /&gt;&amp;nbsp;&amp;nbsp; 2240 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;44.989 &amp;nbsp; &amp;nbsp; &amp;nbsp; 465.337&lt;br /&gt;&amp;nbsp;&amp;nbsp; 2336 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;51.643 &amp;nbsp; &amp;nbsp; &amp;nbsp; 459.765&lt;br /&gt;&amp;nbsp;&amp;nbsp; 2432 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;56.514 &amp;nbsp; &amp;nbsp; &amp;nbsp; 474.095&lt;br /&gt;&amp;nbsp;&amp;nbsp; 2528 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;64.183 &amp;nbsp; &amp;nbsp; &amp;nbsp; 468.859&lt;br /&gt;&amp;nbsp;&amp;nbsp; 2624 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;72.540 &amp;nbsp; &amp;nbsp; &amp;nbsp; 463.923&lt;br /&gt;&amp;nbsp;&amp;nbsp; 2720 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;79.686 &amp;nbsp; &amp;nbsp; &amp;nbsp; 470.387&lt;br /&gt;&amp;nbsp;&amp;nbsp; 2816 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;85.826 &amp;nbsp; &amp;nbsp; &amp;nbsp; 484.626&lt;br /&gt;&amp;nbsp;&amp;nbsp; 2912 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;96.003 &amp;nbsp; &amp;nbsp; &amp;nbsp; 479.094&lt;br /&gt;&amp;nbsp;&amp;nbsp; 3008 &amp;nbsp; &amp;nbsp; &amp;nbsp; 108.801 &amp;nbsp; &amp;nbsp; &amp;nbsp; 465.942&lt;br /&gt;&amp;nbsp;&amp;nbsp; 3104 &amp;nbsp; &amp;nbsp; &amp;nbsp; 121.579 &amp;nbsp; &amp;nbsp; &amp;nbsp; 458.181&lt;br /&gt;&amp;nbsp;&amp;nbsp; 3200 &amp;nbsp; &amp;nbsp; &amp;nbsp; 126.446 &amp;nbsp; &amp;nbsp; &amp;nbsp; 482.699&lt;br /&gt;&amp;nbsp;&amp;nbsp; 3296 &amp;nbsp; &amp;nbsp; &amp;nbsp; 138.522 &amp;nbsp; &amp;nbsp; &amp;nbsp; 481.473&lt;br /&gt;&amp;nbsp;&amp;nbsp; 3392 &amp;nbsp; &amp;nbsp; &amp;nbsp; 153.544 &amp;nbsp; &amp;nbsp; &amp;nbsp; 473.440&lt;br /&gt;&amp;nbsp;&amp;nbsp; 3488 &amp;nbsp; &amp;nbsp; &amp;nbsp; 168.797 &amp;nbsp; &amp;nbsp; &amp;nbsp; 468.268&lt;br /&gt;&amp;nbsp;&amp;nbsp; 3584 &amp;nbsp; &amp;nbsp; &amp;nbsp; 177.873 &amp;nbsp; &amp;nbsp; &amp;nbsp; 482.085&lt;br /&gt;&amp;nbsp;&amp;nbsp; 3680 &amp;nbsp; &amp;nbsp; &amp;nbsp; 193.298 &amp;nbsp; &amp;nbsp; 
&amp;nbsp; 480.227&lt;br /&gt;&amp;nbsp;&amp;nbsp; 3776 &amp;nbsp; &amp;nbsp; &amp;nbsp; 212.160 &amp;nbsp; &amp;nbsp; &amp;nbsp; 472.675&lt;br /&gt;&amp;nbsp;&amp;nbsp; 3872 &amp;nbsp; &amp;nbsp; &amp;nbsp; 229.596 &amp;nbsp; &amp;nbsp; &amp;nbsp; 470.947&lt;br /&gt;&amp;nbsp;&amp;nbsp; 3968 &amp;nbsp; &amp;nbsp; &amp;nbsp; 246.403 &amp;nbsp; &amp;nbsp; &amp;nbsp; 472.280&lt;br /&gt;&amp;nbsp;&amp;nbsp; 4064 &amp;nbsp; &amp;nbsp; &amp;nbsp; 260.086 &amp;nbsp; &amp;nbsp; &amp;nbsp; 480.699&lt;br /&gt;&lt;div&gt;clock 1620&lt;/div&gt;&lt;div&gt;&lt;div&gt;% &amp;nbsp;m=n=k &amp;nbsp; &amp;nbsp;gpu_time (ms) &amp;nbsp; flops (Gflops/s)&lt;/div&gt;&lt;div&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; 32 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 0.040 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 1.516&lt;/div&gt;&lt;div&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;128 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 0.108 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;36.044&lt;/div&gt;&lt;div&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;224 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 0.173 &amp;nbsp; &amp;nbsp; &amp;nbsp; 120.900&lt;/div&gt;&lt;div&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;320 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 0.265 &amp;nbsp; &amp;nbsp; &amp;nbsp; 229.925&lt;/div&gt;&lt;div&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;416 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 0.393 &amp;nbsp; &amp;nbsp; &amp;nbsp; 341.338&lt;/div&gt;&lt;div&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;512 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 0.535 &amp;nbsp; &amp;nbsp; &amp;nbsp; 467.090&lt;/div&gt;&lt;div&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;608 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 1.107 &amp;nbsp; &amp;nbsp; &amp;nbsp; 378.021&lt;/div&gt;&lt;div&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;704 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 1.371 &amp;nbsp; &amp;nbsp; &amp;nbsp; 474.163&lt;/div&gt;&lt;div&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;800 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 2.270 &amp;nbsp; &amp;nbsp; &amp;nbsp; 
420.030&lt;/div&gt;&lt;div&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;896 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 2.751 &amp;nbsp; &amp;nbsp; &amp;nbsp; 486.983&lt;/div&gt;&lt;div&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;992 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 3.804 &amp;nbsp; &amp;nbsp; &amp;nbsp; 477.992&lt;/div&gt;&lt;div&gt;&amp;nbsp;&amp;nbsp; 1088 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 5.205 &amp;nbsp; &amp;nbsp; &amp;nbsp; 460.925&lt;/div&gt;&lt;div&gt;&amp;nbsp;&amp;nbsp; 1184 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 6.003 &amp;nbsp; &amp;nbsp; &amp;nbsp; 514.983&lt;/div&gt;&lt;div&gt;&amp;nbsp;&amp;nbsp; 1280 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 7.609 &amp;nbsp; &amp;nbsp; &amp;nbsp; 513.393&lt;/div&gt;&lt;div&gt;&amp;nbsp;&amp;nbsp; 1376 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 9.396 &amp;nbsp; &amp;nbsp; &amp;nbsp; 516.463&lt;/div&gt;&lt;div&gt;&amp;nbsp;&amp;nbsp; 1472 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;11.555 &amp;nbsp; &amp;nbsp; &amp;nbsp; 514.134&lt;/div&gt;&lt;div&gt;&amp;nbsp;&amp;nbsp; 1568 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;14.145 &amp;nbsp; &amp;nbsp; &amp;nbsp; 507.666&lt;/div&gt;&lt;div&gt;&amp;nbsp;&amp;nbsp; 1664 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;16.427 &amp;nbsp; &amp;nbsp; &amp;nbsp; 522.442&lt;/div&gt;&lt;div&gt;&amp;nbsp;&amp;nbsp; 1760 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;19.387 &amp;nbsp; &amp;nbsp; &amp;nbsp; 523.784&lt;/div&gt;&lt;div&gt;&amp;nbsp;&amp;nbsp; 1856 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;22.182 &amp;nbsp; &amp;nbsp; &amp;nbsp; 536.854&lt;/div&gt;&lt;div&gt;&amp;nbsp;&amp;nbsp; 1952 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;26.860 &amp;nbsp; &amp;nbsp; &amp;nbsp; 515.777&lt;/div&gt;&lt;div&gt;&amp;nbsp;&amp;nbsp; 2048 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;28.642 &amp;nbsp; &amp;nbsp; &amp;nbsp; 558.623&lt;/div&gt;&lt;div&gt;&amp;nbsp;&amp;nbsp; 2144 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;34.530 &amp;nbsp; &amp;nbsp; &amp;nbsp; 
531.627&lt;/div&gt;&lt;div&gt;&amp;nbsp;&amp;nbsp; 2240 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;39.585 &amp;nbsp; &amp;nbsp; &amp;nbsp; 528.868&lt;/div&gt;&lt;div&gt;&amp;nbsp;&amp;nbsp; 2336 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;44.440 &amp;nbsp; &amp;nbsp; &amp;nbsp; 534.292&lt;/div&gt;&lt;div&gt;&amp;nbsp;&amp;nbsp; 2432 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;49.141 &amp;nbsp; &amp;nbsp; &amp;nbsp; 545.226&lt;/div&gt;&lt;div&gt;&amp;nbsp;&amp;nbsp; 2528 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;55.274 &amp;nbsp; &amp;nbsp; &amp;nbsp; 544.429&lt;/div&gt;&lt;div&gt;&amp;nbsp;&amp;nbsp; 2624 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;63.241 &amp;nbsp; &amp;nbsp; &amp;nbsp; 532.134&lt;/div&gt;&lt;div&gt;&amp;nbsp;&amp;nbsp; 2720 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;68.451 &amp;nbsp; &amp;nbsp; &amp;nbsp; 547.592&lt;/div&gt;&lt;div&gt;&amp;nbsp;&amp;nbsp; 2816 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;74.160 &amp;nbsp; &amp;nbsp; &amp;nbsp; 560.865&lt;/div&gt;&lt;div&gt;&amp;nbsp;&amp;nbsp; 2912 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;82.945 &amp;nbsp; &amp;nbsp; &amp;nbsp; 554.516&lt;/div&gt;&lt;div&gt;&amp;nbsp;&amp;nbsp; 3008 &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;94.150 &amp;nbsp; &amp;nbsp; &amp;nbsp; 538.449&lt;/div&gt;&lt;div&gt;&amp;nbsp;&amp;nbsp; 3104 &amp;nbsp; &amp;nbsp; &amp;nbsp; 104.581 &amp;nbsp; &amp;nbsp; &amp;nbsp; 532.653&lt;/div&gt;&lt;div&gt;&amp;nbsp;&amp;nbsp; 3200 &amp;nbsp; &amp;nbsp; &amp;nbsp; 108.907 &amp;nbsp; &amp;nbsp; &amp;nbsp; 560.436&lt;/div&gt;&lt;div&gt;&amp;nbsp;&amp;nbsp; 3296 &amp;nbsp; &amp;nbsp; &amp;nbsp; 119.277 &amp;nbsp; &amp;nbsp; &amp;nbsp; 559.158&lt;/div&gt;&lt;div&gt;&amp;nbsp;&amp;nbsp; 3392 &amp;nbsp; &amp;nbsp; &amp;nbsp; 131.982 &amp;nbsp; &amp;nbsp; &amp;nbsp; 550.785&lt;/div&gt;&lt;div&gt;&amp;nbsp;&amp;nbsp; 3488 &amp;nbsp; &amp;nbsp; &amp;nbsp; 146.003 &amp;nbsp; &amp;nbsp; &amp;nbsp; 541.376&lt;/div&gt;&lt;div&gt;&amp;nbsp;&amp;nbsp; 3584 &amp;nbsp; &amp;nbsp; &amp;nbsp; 154.088 
&amp;nbsp; &amp;nbsp; &amp;nbsp; 556.502&lt;/div&gt;&lt;div&gt;&amp;nbsp;&amp;nbsp; 3680 &amp;nbsp; &amp;nbsp; &amp;nbsp; 166.307 &amp;nbsp; &amp;nbsp; &amp;nbsp; 558.166&lt;/div&gt;&lt;div&gt;&amp;nbsp;&amp;nbsp; 3776 &amp;nbsp; &amp;nbsp; &amp;nbsp; 184.523 &amp;nbsp; &amp;nbsp; &amp;nbsp; 543.469&lt;/div&gt;&lt;div&gt;&amp;nbsp;&amp;nbsp; 3872 &amp;nbsp; &amp;nbsp; &amp;nbsp; 198.692 &amp;nbsp; &amp;nbsp; &amp;nbsp; 544.196&lt;/div&gt;&lt;div&gt;&amp;nbsp;&amp;nbsp; 3968 &amp;nbsp; &amp;nbsp; &amp;nbsp; 214.158 &amp;nbsp; &amp;nbsp; &amp;nbsp; 543.390&lt;/div&gt;&lt;div&gt;&amp;nbsp;&amp;nbsp; 4064 &amp;nbsp; &amp;nbsp; &amp;nbsp; 223.720 &amp;nbsp; &amp;nbsp; &amp;nbsp; 558.838&lt;/div&gt;&lt;div&gt;&lt;br /&gt;It's a cubin, so it will not work on Fermi.&lt;br /&gt;Radeon HD 5850 at stock clocks:&lt;br /&gt;&lt;br /&gt;flopspeak.exe&lt;br /&gt;Device &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;0&lt;br /&gt;target &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;8&lt;br /&gt;localRAM &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;1024 MB&lt;br /&gt;uncachedRemoteRAM 2047 MB&lt;br /&gt;cachedRemoteRAM &amp;nbsp; 2047 MB&lt;br /&gt;engineClock &amp;nbsp; &amp;nbsp; &amp;nbsp; 725 MHz&lt;br /&gt;memoryClock &amp;nbsp; &amp;nbsp; &amp;nbsp; 1000 MHz&lt;br /&gt;wavefrontSize &amp;nbsp; &amp;nbsp; 64&lt;br /&gt;numberOfSIMD &amp;nbsp; &amp;nbsp; &amp;nbsp;18&lt;br /&gt;doublePrecision &amp;nbsp; 1&lt;br /&gt;localDataShare &amp;nbsp; &amp;nbsp;1&lt;br /&gt;globalDataShare &amp;nbsp; 1&lt;br /&gt;globalGPR &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 1&lt;br /&gt;computeShader &amp;nbsp; &amp;nbsp; 1&lt;br /&gt;memExport &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 1&lt;br /&gt;pitch_alignment &amp;nbsp; 256&lt;br /&gt;surface_alignment 4096&lt;br /&gt;Device 0: execution time 7913.45 ms, achieved 2041.80 gflops&lt;br /&gt;Overclocked to 950 MHz:&lt;br /&gt;&lt;br /&gt;flopspeak.exe&lt;br /&gt;&lt;br /&gt;engineClock &amp;nbsp; &amp;nbsp; &amp;nbsp; 950 MHz&lt;br 
/&gt;memoryClock &amp;nbsp; &amp;nbsp; &amp;nbsp; 1000 MHz&lt;br /&gt;&lt;br /&gt;Device 0: execution time 6039.35 ms, achieved 2675.40 gflops&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;matmul.exe 2048 2048 100&lt;br /&gt;&lt;br /&gt;Device 0: execution time 1415.08 ms, achieved 1214.06 gflops&lt;br /&gt;Overclocked to 950 MHz:&lt;br /&gt;Device 0: execution time 1114.06 ms, achieved 1542.09 gflops&lt;br /&gt;&lt;br /&gt;UPDATE 1:&lt;br /&gt;NVIDIA and ATI working together! Using opencl.dll from the ATI Stream SDK 2.01, both platforms show up:&lt;br /&gt;&lt;br /&gt;Found 2 platform(s).&lt;br /&gt;platform[01104BA0]: profile: FULL_PROFILE&lt;br /&gt;platform[01104BA0]: version: OpenCL 1.0 CUDA 3.0.1&lt;br /&gt;platform[01104BA0]: name: NVIDIA CUDA&lt;br /&gt;platform[01104BA0]: vendor: NVIDIA Corporation&lt;br /&gt;platform[01104BA0]: extensions: cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_d3d9_sharing cl_nv_d3d10_sharing cl_nv_d3d11_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll&lt;br /&gt;platform[01104BA0]: Found 1 device(s).&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;device[01104C08]: NAME: GeForce GTX 275&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;device[01104C08]: VENDOR: NVIDIA Corporation&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;device[01104C08]: PROFILE: FULL_PROFILE&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;device[01104C08]: VERSION: OpenCL 1.0 CUDA&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;device[01104C08]: EXTENSIONS: cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_d3d9_sharing cl_nv_d3d10_sharing cl_nv_d3d11_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll &amp;nbsp;cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics 
cl_khr_fp64&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;device[01104C08]: DRIVER_VERSION: 196.75&lt;br /&gt;&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;device[01104C08]: Type: GPU&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;device[01104C08]: EXECUTION_CAPABILITIES: Kernel&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;device[01104C08]: GLOBAL_MEM_CACHE_TYPE: None (0)&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;device[01104C08]: CL_DEVICE_LOCAL_MEM_TYPE: Local (1)&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;device[01104C08]: SINGLE_FP_CONFIG: 0x3e&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;device[01104C08]: QUEUE_PROPERTIES: 0x3&lt;br /&gt;&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;device[01104C08]: VENDOR_ID: 4318&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;device[01104C08]: MAX_COMPUTE_UNITS: 30&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;device[01104C08]: MAX_WORK_ITEM_DIMENSIONS: 3&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;device[01104C08]: MAX_WORK_GROUP_SIZE: 512&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;device[01104C08]: PREFERRED_VECTOR_WIDTH_CHAR: 1&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;device[01104C08]: PREFERRED_VECTOR_WIDTH_SHORT: 1&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;device[01104C08]: PREFERRED_VECTOR_WIDTH_INT: 1&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;device[01104C08]: PREFERRED_VECTOR_WIDTH_LONG: 1&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;device[01104C08]: PREFERRED_VECTOR_WIDTH_FLOAT: 1&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;device[01104C08]: PREFERRED_VECTOR_WIDTH_DOUBLE: 1&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;device[01104C08]: MAX_CLOCK_FREQUENCY: 
1404&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;device[01104C08]: ADDRESS_BITS: 32&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;device[01104C08]: MAX_MEM_ALLOC_SIZE: 229998592&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;device[01104C08]: IMAGE_SUPPORT: 1&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;device[01104C08]: MAX_READ_IMAGE_ARGS: 128&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;device[01104C08]: MAX_WRITE_IMAGE_ARGS: 8&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;device[01104C08]: IMAGE2D_MAX_WIDTH: 8192&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;device[01104C08]: IMAGE2D_MAX_HEIGHT: 8192&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;device[01104C08]: IMAGE3D_MAX_WIDTH: 2048&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;device[01104C08]: IMAGE3D_MAX_HEIGHT: 2048&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;device[01104C08]: IMAGE3D_MAX_DEPTH: 2048&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;device[01104C08]: MAX_SAMPLERS: 16&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;device[01104C08]: MAX_PARAMETER_SIZE: 4352&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;device[01104C08]: MEM_BASE_ADDR_ALIGN: 256&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;device[01104C08]: MIN_DATA_TYPE_ALIGN_SIZE: 16&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;device[01104C08]: GLOBAL_MEM_CACHELINE_SIZE: 0&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;device[01104C08]: GLOBAL_MEM_CACHE_SIZE: 0&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;device[01104C08]: GLOBAL_MEM_SIZE: 919994368&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;device[01104C08]: MAX_CONSTANT_BUFFER_SIZE: 65536&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; 
&amp;nbsp;device[01104C08]: MAX_CONSTANT_ARGS: 9&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;device[01104C08]: LOCAL_MEM_SIZE: 16384&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;device[01104C08]: ERROR_CORRECTION_SUPPORT: 0&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;device[01104C08]: PROFILING_TIMER_RESOLUTION: 1000&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;device[01104C08]: ENDIAN_LITTLE: 1&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;device[01104C08]: AVAILABLE: 1&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;device[01104C08]: COMPILER_AVAILABLE: 1&lt;br /&gt;platform[0313A434]: profile: FULL_PROFILE&lt;br /&gt;platform[0313A434]: version: OpenCL 1.0 ATI-Stream-v2.0.1&lt;br /&gt;platform[0313A434]: name: ATI Stream&lt;br /&gt;platform[0313A434]: vendor: Advanced Micro Devices, Inc.&lt;br /&gt;platform[0313A434]: extensions: cl_khr_icd&lt;br /&gt;platform[0313A434]: Found 2 device(s).&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;device[0338CA70]: NAME: Intel(R) Core(TM) i7 CPU &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 920 &amp;nbsp;@ 2.67GHz&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;device[0338CA70]: VENDOR: GenuineIntel&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;device[0338CA70]: PROFILE: FULL_PROFILE&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;device[0338CA70]: VERSION: OpenCL 1.0 ATI-Stream-v2.0.1&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;device[0338CA70]: EXTENSIONS: cl_khr_icd cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;device[0338CA70]: DRIVER_VERSION: 1.0&lt;br /&gt;&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; 
&amp;nbsp;device[0338CA70]: Type: CPU&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;device[0338CA70]: EXECUTION_CAPABILITIES: Kernel&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;device[0338CA70]: GLOBAL_MEM_CACHE_TYPE: Read-Write (2)&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;device[0338CA70]: CL_DEVICE_LOCAL_MEM_TYPE: Global (2)&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;device[0338CA70]: SINGLE_FP_CONFIG: 0x7&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;device[0338CA70]: QUEUE_PROPERTIES: 0x2&lt;br /&gt;&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;device[0338CA70]: VENDOR_ID: 4098&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;device[0338CA70]: MAX_COMPUTE_UNITS: 8&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;device[0338CA70]: MAX_WORK_ITEM_DIMENSIONS: 3&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;device[0338CA70]: MAX_WORK_GROUP_SIZE: 1024&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;device[0338CA70]: PREFERRED_VECTOR_WIDTH_CHAR: 16&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;device[0338CA70]: PREFERRED_VECTOR_WIDTH_SHORT: 8&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;device[0338CA70]: PREFERRED_VECTOR_WIDTH_INT: 4&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;device[0338CA70]: PREFERRED_VECTOR_WIDTH_LONG: 2&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;device[0338CA70]: PREFERRED_VECTOR_WIDTH_FLOAT: 4&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;device[0338CA70]: PREFERRED_VECTOR_WIDTH_DOUBLE: 0&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;device[0338CA70]: MAX_CLOCK_FREQUENCY: 2698&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;device[0338CA70]: ADDRESS_BITS: 32&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;device[0338CA70]: 
MAX_MEM_ALLOC_SIZE: 536870912&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;device[0338CA70]: IMAGE_SUPPORT: 0&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;device[0338CA70]: MAX_READ_IMAGE_ARGS: 0&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;device[0338CA70]: MAX_WRITE_IMAGE_ARGS: 0&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;device[0338CA70]: IMAGE2D_MAX_WIDTH: 0&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;device[0338CA70]: IMAGE2D_MAX_HEIGHT: 0&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;device[0338CA70]: IMAGE3D_MAX_WIDTH: 0&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;device[0338CA70]: IMAGE3D_MAX_HEIGHT: 0&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;device[0338CA70]: IMAGE3D_MAX_DEPTH: 0&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;device[0338CA70]: MAX_SAMPLERS: 0&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;device[0338CA70]: MAX_PARAMETER_SIZE: 4096&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;device[0338CA70]: MEM_BASE_ADDR_ALIGN: 32768&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;device[0338CA70]: MIN_DATA_TYPE_ALIGN_SIZE: 128&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;device[0338CA70]: GLOBAL_MEM_CACHELINE_SIZE: 64&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;device[0338CA70]: GLOBAL_MEM_CACHE_SIZE: 65536&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;device[0338CA70]: GLOBAL_MEM_SIZE: 1073741824&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;device[0338CA70]: MAX_CONSTANT_BUFFER_SIZE: 65536&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;device[0338CA70]: MAX_CONSTANT_ARGS: 8&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;device[0338CA70]: LOCAL_MEM_SIZE: 32768&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; 
&amp;nbsp;device[0338CA70]: ERROR_CORRECTION_SUPPORT: 0&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;device[0338CA70]: PROFILING_TIMER_RESOLUTION: 1&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;device[0338CA70]: ENDIAN_LITTLE: 1&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;device[0338CA70]: AVAILABLE: 1&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;device[0338CA70]: COMPILER_AVAILABLE: 1&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;device[04A30050]: NAME: Cypress&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;device[04A30050]: VENDOR: Advanced Micro Devices, Inc.&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;device[04A30050]: PROFILE: FULL_PROFILE&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;device[04A30050]: VERSION: OpenCL 1.0 ATI-Stream-v2.0.1&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;device[04A30050]: EXTENSIONS: cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;device[04A30050]: DRIVER_VERSION: CAL 1.4.556&lt;br /&gt;&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;device[04A30050]: Type: GPU&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;device[04A30050]: EXECUTION_CAPABILITIES: Kernel&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;device[04A30050]: GLOBAL_MEM_CACHE_TYPE: None (0)&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;device[04A30050]: CL_DEVICE_LOCAL_MEM_TYPE: Local (1)&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;device[04A30050]: SINGLE_FP_CONFIG: 0x6&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;device[04A30050]: QUEUE_PROPERTIES: 0x2&lt;br /&gt;&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; 
&amp;nbsp;device[04A30050]: VENDOR_ID: 4098&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;device[04A30050]: MAX_COMPUTE_UNITS: 18&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;device[04A30050]: MAX_WORK_ITEM_DIMENSIONS: 3&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;device[04A30050]: MAX_WORK_GROUP_SIZE: 256&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;device[04A30050]: PREFERRED_VECTOR_WIDTH_CHAR: 16&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;device[04A30050]: PREFERRED_VECTOR_WIDTH_SHORT: 8&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;device[04A30050]: PREFERRED_VECTOR_WIDTH_INT: 4&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;device[04A30050]: PREFERRED_VECTOR_WIDTH_LONG: 2&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;device[04A30050]: PREFERRED_VECTOR_WIDTH_FLOAT: 4&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;device[04A30050]: PREFERRED_VECTOR_WIDTH_DOUBLE: 0&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;device[04A30050]: MAX_CLOCK_FREQUENCY: 725&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;device[04A30050]: ADDRESS_BITS: 32&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;device[04A30050]: MAX_MEM_ALLOC_SIZE: 268435456&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;device[04A30050]: IMAGE_SUPPORT: 0&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;device[04A30050]: MAX_READ_IMAGE_ARGS: 0&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;device[04A30050]: MAX_WRITE_IMAGE_ARGS: 0&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;device[04A30050]: IMAGE2D_MAX_WIDTH: 0&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;device[04A30050]: IMAGE2D_MAX_HEIGHT: 0&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;device[04A30050]: IMAGE3D_MAX_WIDTH: 0&lt;br 
/&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;device[04A30050]: IMAGE3D_MAX_HEIGHT: 0&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;device[04A30050]: IMAGE3D_MAX_DEPTH: 0&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;device[04A30050]: MAX_SAMPLERS: 0&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;device[04A30050]: MAX_PARAMETER_SIZE: 1024&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;device[04A30050]: MEM_BASE_ADDR_ALIGN: 4096&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;device[04A30050]: MIN_DATA_TYPE_ALIGN_SIZE: 128&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;device[04A30050]: GLOBAL_MEM_CACHELINE_SIZE: 0&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;device[04A30050]: GLOBAL_MEM_CACHE_SIZE: 0&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;device[04A30050]: GLOBAL_MEM_SIZE: 268435456&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;device[04A30050]: MAX_CONSTANT_BUFFER_SIZE: 65536&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;device[04A30050]: MAX_CONSTANT_ARGS: 8&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;device[04A30050]: LOCAL_MEM_SIZE: 32768&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;device[04A30050]: ERROR_CORRECTION_SUPPORT: 0&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;device[04A30050]: PROFILING_TIMER_RESOLUTION: 1&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;device[04A30050]: ENDIAN_LITTLE: 1&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;device[04A30050]: AVAILABLE: 1&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;device[04A30050]: COMPILER_AVAILABLE: 1&lt;br /&gt;UPDATE 2:&lt;br /&gt;DX formats included in optd3d&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' 
src='https://blogger.googleusercontent.com/tracker/8553786559872430029-8450121695166184000?l=oscarbg.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://oscarbg.blogspot.com/feeds/8450121695166184000/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://oscarbg.blogspot.com/2010/03/gpu-computing-toys.html#comment-form' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8553786559872430029/posts/default/8450121695166184000'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8553786559872430029/posts/default/8450121695166184000'/><link rel='alternate' type='text/html' href='http://oscarbg.blogspot.com/2010/03/gpu-computing-toys.html' title='GPU computing toys!'/><author><name>oscarbg</name><uri>http://www.blogger.com/profile/10172785372134961220</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8553786559872430029.post-7241786228336684121</id><published>2010-03-07T17:44:00.003+01:00</published><updated>2010-03-07T18:02:47.743+01:00</updated><title type='text'>GPGPU Image support!</title><content type='html'>1. 
D3D&lt;br /&gt;The documentation contains a table, "Hardware Support for Direct3D 11 Formats":&lt;br /&gt;&lt;br /&gt;&lt;table&gt;&lt;tbody&gt;&lt;tr&gt; &lt;th rowspan="2"&gt;Format (DXGI_FORMAT_*)&lt;/th&gt; &lt;th rowspan="2"&gt;# Bits&lt;/th&gt; &lt;th colspan="38" style="text-align: center;"&gt;Format Target&lt;/th&gt;&lt;/tr&gt;&lt;tr&gt; &lt;th&gt;1&lt;/th&gt; &lt;th&gt;2&lt;/th&gt; &lt;th&gt;3&lt;/th&gt; &lt;th&gt;4&lt;/th&gt; &lt;th&gt;5&lt;/th&gt; &lt;th&gt;6&lt;/th&gt; &lt;th&gt;7&lt;/th&gt; &lt;th&gt;8&lt;/th&gt; &lt;th&gt;9&lt;/th&gt; &lt;th&gt;10&lt;/th&gt; &lt;th&gt;11&lt;/th&gt; &lt;th&gt;12&lt;/th&gt; &lt;th&gt;13&lt;/th&gt; &lt;th&gt;14&lt;/th&gt; &lt;th&gt;15&lt;/th&gt; &lt;th&gt;16&lt;/th&gt; &lt;th&gt;17&lt;/th&gt; &lt;th&gt;18&lt;/th&gt; &lt;th&gt;19&lt;/th&gt; &lt;th&gt;20&lt;/th&gt; &lt;th&gt;21&lt;/th&gt; &lt;th&gt;22&lt;/th&gt; &lt;th&gt;23&lt;/th&gt; &lt;th&gt;24&lt;/th&gt; &lt;th&gt;25&lt;/th&gt; &lt;th&gt;26&lt;/th&gt; &lt;th&gt;27&lt;/th&gt; &lt;th&gt;28&lt;/th&gt; &lt;th&gt;29&lt;/th&gt; &lt;th&gt;30&lt;/th&gt; &lt;th&gt;31&lt;/th&gt; &lt;th&gt;32&lt;/th&gt; &lt;th&gt;33&lt;/th&gt; &lt;th&gt;34&lt;/th&gt; &lt;th&gt;35&lt;/th&gt; &lt;th&gt;36&lt;/th&gt; &lt;th&gt;37&lt;/th&gt; &lt;th&gt;38&lt;/th&gt;&lt;/tr&gt;&lt;tr&gt; &lt;td&gt;UNKNOWN&lt;/td&gt; &lt;td&gt;0&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; 
&lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt; &lt;td&gt;&amp;nbsp;&amp;nbsp;R32G32B32A32_TYPELESS&lt;/td&gt; &lt;td&gt;128&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt; &lt;td&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;R32G32B32A32_FLOAT&lt;/td&gt; &lt;td&gt;128&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; 
&lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;o&lt;/td&gt; &lt;td&gt;o&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt; &lt;td&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;R32G32B32A32_UINT&lt;/td&gt; &lt;td&gt;128&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;o&lt;/td&gt; &lt;td&gt;o&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt; &lt;td&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;R32G32B32A32_SINT&lt;/td&gt; &lt;td&gt;128&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; 
&lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;o&lt;/td&gt; &lt;td&gt;o&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt; &lt;td&gt;&amp;nbsp;&amp;nbsp;R32G32B32_TYPELESS&lt;/td&gt; &lt;td&gt;96&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt; &lt;td&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;R32G32B32_FLOAT&lt;/td&gt; &lt;td&gt;96&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;o&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;o&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;o&lt;/td&gt; &lt;td&gt;o&lt;/td&gt; &lt;td&gt;o&lt;sup&gt;1&lt;/sup&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; 
&lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;o&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt; &lt;td&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;R32G32B32_UINT&lt;/td&gt; &lt;td&gt;96&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;o&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;o&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt; &lt;td&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;R32G32B32_SINT&lt;/td&gt; &lt;td&gt;96&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;o&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; 
&lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;o&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt; &lt;td&gt;&amp;nbsp;&amp;nbsp;R16G16B16A16_TYPELESS&lt;/td&gt; &lt;td&gt;64&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt; &lt;td&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;R16G16B16A16_FLOAT&lt;/td&gt; &lt;td&gt;64&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; 
&lt;td&gt;o&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt; &lt;td&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;R16G16B16A16_UNORM&lt;/td&gt; &lt;td&gt;64&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;o&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt; &lt;td&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;R16G16B16A16_UINT&lt;/td&gt; &lt;td&gt;64&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;o&lt;/td&gt; 
&lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt; &lt;td&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;R16G16B16A16_SNORM&lt;/td&gt; &lt;td&gt;64&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;o&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt; &lt;td&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;R16G16B16A16_SINT&lt;/td&gt; &lt;td&gt;64&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;o&lt;/td&gt; &lt;td&gt;&lt;/td&gt; 
&lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt; &lt;td&gt;&amp;nbsp;&amp;nbsp;R32G32_TYPELESS&lt;/td&gt; &lt;td&gt;64&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt; &lt;td&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;R32G32_FLOAT&lt;/td&gt; &lt;td&gt;64&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;o&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; 
&lt;td&gt;X&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt; &lt;td&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;R32G32_UINT&lt;/td&gt; &lt;td&gt;64&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;o&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt; &lt;td&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;R32G32_SINT&lt;/td&gt; &lt;td&gt;64&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;o&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt; 
&lt;td&gt;&amp;nbsp;&amp;nbsp;R32G8X24_TYPELESS&lt;/td&gt; &lt;td&gt;64&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt; &lt;td&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;D32_FLOAT_S8X24_UINT&lt;/td&gt; &lt;td&gt;64&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;o&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt; 
&lt;td&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;R32_FLOAT_X8X24_TYPELESS&lt;/td&gt; &lt;td&gt;64&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt; &lt;td&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;X32_TYPELESS_G8X24_UINT&lt;/td&gt; &lt;td&gt;64&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt; 
&lt;td&gt;&amp;nbsp;&amp;nbsp;R10G10B10A2_TYPELESS&lt;/td&gt; &lt;td&gt;32&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt; &lt;td&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;R10G10B10A2_UNORM&lt;/td&gt; &lt;td&gt;32&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;o&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt; 
&lt;td&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;R10G10B10A2_UINT&lt;/td&gt; &lt;td&gt;32&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;o&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt; &lt;td&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;R10G10B10_XR_BIAS_A2_UNORM&lt;/td&gt; &lt;td&gt;32&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt; 
&lt;td&gt;&amp;nbsp;&amp;nbsp;R11G11B10_FLOAT&lt;/td&gt; &lt;td&gt;32&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;o&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt; &lt;td&gt;&amp;nbsp;&amp;nbsp;R8G8B8A8_TYPELESS&lt;/td&gt; &lt;td&gt;32&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt; &lt;td&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;R8G8B8A8_UNORM&lt;/td&gt; 
&lt;td&gt;32&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;o&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt; &lt;td&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;R8G8B8A8_UNORM_SRGB&lt;/td&gt; &lt;td&gt;32&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;o&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt; &lt;td&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;R8G8B8A8_UINT&lt;/td&gt; &lt;td&gt;32&lt;/td&gt; 
&lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;o&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt; &lt;td&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;R8G8B8A8_SNORM&lt;/td&gt; &lt;td&gt;32&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;o&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt; &lt;td&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;R8G8B8A8_SINT&lt;/td&gt; &lt;td&gt;32&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; 
&lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;o&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt; &lt;td&gt;&amp;nbsp;&amp;nbsp;R16G16_TYPELESS&lt;/td&gt; &lt;td&gt;32&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt; &lt;td&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;R16G16_FLOAT&lt;/td&gt; &lt;td&gt;32&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; 
&lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;o&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt; &lt;td&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;R16G16_UNORM&lt;/td&gt; &lt;td&gt;32&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;o&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt; &lt;td&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;R16G16_UINT&lt;/td&gt; &lt;td&gt;32&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; 
&lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;o&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt; &lt;td&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;R16G16_SNORM&lt;/td&gt; &lt;td&gt;32&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;o&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt; &lt;td&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;R16G16_SINT&lt;/td&gt; &lt;td&gt;32&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; 
&lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;o&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt; &lt;td&gt;&amp;nbsp;&amp;nbsp;R32_TYPELESS&lt;/td&gt; &lt;td&gt;32&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt; &lt;td&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;D32_FLOAT&lt;/td&gt; &lt;td&gt;32&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; 
&lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;o&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt; &lt;td&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;R32_FLOAT&lt;/td&gt; &lt;td&gt;32&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;o&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt; &lt;td&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;R32_UINT&lt;/td&gt; &lt;td&gt;32&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; 
&lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;o&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt; &lt;td&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;R32_SINT&lt;/td&gt; &lt;td&gt;32&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;o&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt; &lt;td&gt;&amp;nbsp;&amp;nbsp;R24G8_TYPELESS&lt;/td&gt; &lt;td&gt;32&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; 
&lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt; &lt;td&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;D24_UNORM_S8_UINT&lt;/td&gt; &lt;td&gt;32&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;o&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt; &lt;td&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;R24_UNORM_X8_TYPELESS&lt;/td&gt; &lt;td&gt;32&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; 
&lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt; &lt;td&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;X24_TYPELESS_G8_UINT&lt;/td&gt; &lt;td&gt;32&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt; &lt;td&gt;&amp;nbsp;&amp;nbsp;R8G8_TYPELESS&lt;/td&gt; &lt;td&gt;16&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; 
&lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt; &lt;td&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;R8G8_UNORM&lt;/td&gt; &lt;td&gt;16&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;o&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt; &lt;td&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;R8G8_UINT&lt;/td&gt; &lt;td&gt;16&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; 
&lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;o&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt; &lt;td&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;R8G8_SNORM&lt;/td&gt; &lt;td&gt;16&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;o&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt; &lt;td&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;R8G8_SINT&lt;/td&gt; &lt;td&gt;16&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; 
&lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;o&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt; &lt;td&gt;&amp;nbsp;&amp;nbsp;R16_TYPELESS&lt;/td&gt; &lt;td&gt;16&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt; &lt;td&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;R16_FLOAT&lt;/td&gt; &lt;td&gt;16&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; 
&lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;o&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt; &lt;td&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;D16_UNORM&lt;/td&gt; &lt;td&gt;16&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;o&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt; &lt;td&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;R16_UNORM&lt;/td&gt; &lt;td&gt;16&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; 
&lt;td&gt;X&lt;/td&gt; &lt;td&gt;o&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt; &lt;td&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;R16_UINT&lt;/td&gt; &lt;td&gt;16&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;o&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt; &lt;td&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;R16_SNORM&lt;/td&gt; &lt;td&gt;16&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;o&lt;/td&gt; 
&lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt; &lt;td&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;R16_SINT&lt;/td&gt; &lt;td&gt;16&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;o&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt; &lt;td&gt;&amp;nbsp;&amp;nbsp;R8_TYPELESS&lt;/td&gt; &lt;td&gt;8&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; 
&lt;td&gt;X&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt; &lt;td&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;R8_UNORM&lt;/td&gt; &lt;td&gt;8&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;o&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt; &lt;td&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;R8_UINT&lt;/td&gt; &lt;td&gt;8&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;o&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt; 
&lt;td&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;R8_SNORM&lt;/td&gt; &lt;td&gt;8&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;o&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt; &lt;td&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;R8_SINT&lt;/td&gt; &lt;td&gt;8&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;o&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt; &lt;td&gt;&amp;nbsp;&amp;nbsp;A8_UNORM&lt;/td&gt; 
&lt;td&gt;8&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;o&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt; &lt;td&gt;&amp;nbsp;&amp;nbsp;R9G9B9E5_SHAREDEXP&lt;/td&gt; &lt;td&gt;32&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt; &lt;td&gt;&amp;nbsp;&amp;nbsp;R8G8_B8G8_UNORM&lt;/td&gt; &lt;td&gt;16&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; 
&lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt; &lt;td&gt;&amp;nbsp;&amp;nbsp;G8R8_G8B8_UNORM&lt;/td&gt; &lt;td&gt;16&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt; &lt;td&gt;&amp;nbsp;&amp;nbsp;BC1_TYPELESS&lt;/td&gt; &lt;td&gt;4&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; 
&lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt; &lt;td&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;BC1_UNORM&lt;/td&gt; &lt;td&gt;4&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt; &lt;td&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;BC1_UNORM_SRGB&lt;/td&gt; &lt;td&gt;4&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; 
&lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt; &lt;td&gt;&amp;nbsp;&amp;nbsp;BC2_TYPELESS&lt;/td&gt; &lt;td&gt;8&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt; &lt;td&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;BC2_UNORM&lt;/td&gt; &lt;td&gt;8&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; 
&lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt; &lt;td&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;BC2_UNORM_SRGB&lt;/td&gt; &lt;td&gt;8&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt; &lt;td&gt;&amp;nbsp;&amp;nbsp;BC3_TYPELESS&lt;/td&gt; &lt;td&gt;8&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; 
&lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt; &lt;td&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;BC3_UNORM&lt;/td&gt; &lt;td&gt;8&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt; &lt;td&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;BC3_UNORM_SRGB&lt;/td&gt; &lt;td&gt;8&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; 
&lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt; &lt;td&gt;&amp;nbsp;&amp;nbsp;BC4_TYPELESS&lt;/td&gt; &lt;td&gt;4&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt; &lt;td&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;BC4_UNORM&lt;/td&gt; &lt;td&gt;4&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; 
&lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt; &lt;td&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;BC4_SNORM&lt;/td&gt; &lt;td&gt;4&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt; &lt;td&gt;&amp;nbsp;&amp;nbsp;BC5_TYPELESS&lt;/td&gt; &lt;td&gt;8&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; 
&lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt; &lt;td&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;BC5_UNORM&lt;/td&gt; &lt;td&gt;8&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt; &lt;td&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;BC5_SNORM&lt;/td&gt; &lt;td&gt;8&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; 
&lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt; &lt;td&gt;&amp;nbsp;&amp;nbsp;B8G8R8A8_TYPELESS&lt;/td&gt; &lt;td&gt;32&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt; &lt;td&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;B8G8R8A8_UNORM&lt;/td&gt; &lt;td&gt;32&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;o&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; 
&lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt; &lt;td&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;B8G8R8A8_UNORM_SRGB&lt;/td&gt; &lt;td&gt;32&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;o&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt; &lt;td&gt;&amp;nbsp;&amp;nbsp;B8G8R8X8_TYPELESS&lt;/td&gt; &lt;td&gt;32&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; 
&lt;td&gt;X&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt; &lt;td&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;B8G8R8X8_UNORM&lt;/td&gt; &lt;td&gt;32&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;o&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt; &lt;td&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;B8G8R8X8_UNORM_SRGB&lt;/td&gt; &lt;td&gt;32&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;o&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; 
&lt;td&gt;X&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt; &lt;td&gt;&amp;nbsp;&amp;nbsp;BC6H_TYPELESS&lt;/td&gt; &lt;td&gt;8&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt; &lt;td&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;BC6H_UF16&lt;/td&gt; &lt;td&gt;8&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt; 
&lt;td&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;BC6H_SF16&lt;/td&gt; &lt;td&gt;8&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt; &lt;td&gt;&amp;nbsp;&amp;nbsp;BC7_TYPELESS&lt;/td&gt; &lt;td&gt;8&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt; &lt;td&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;BC7_UNORM&lt;/td&gt; &lt;td&gt;8&lt;/td&gt; 
&lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt; &lt;td&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;BC7_UNORM_SRGB&lt;/td&gt; &lt;td&gt;8&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;&lt;/td&gt; &lt;td&gt;X&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;br /&gt;&lt;br /&gt;&lt;ol&gt;&lt;li&gt;Buffer  &lt;/li&gt;&lt;li&gt;Input Assembler Vertex Buffer  &lt;/li&gt;&lt;li&gt;Input Assembler Index 
Buffer  &lt;/li&gt;&lt;li&gt;Stream Output Buffer  &lt;/li&gt;&lt;li&gt;Texture1D  &lt;/li&gt;&lt;li&gt;Texture2D  &lt;/li&gt;&lt;li&gt;Texture3D  &lt;/li&gt;&lt;li&gt;TextureCube  &lt;/li&gt;&lt;li&gt;Shader ld  &lt;/li&gt;&lt;li&gt;Shader sample (any filter)  &lt;/li&gt;&lt;li&gt;Shader sample_c (comparison filter)  &lt;/li&gt;&lt;li&gt;Shader sample (mono 1-bit filter)  &lt;/li&gt;&lt;li&gt;Shader gather4  &lt;/li&gt;&lt;li&gt;Shader gather4_c  &lt;/li&gt;&lt;li&gt;Mipmap  &lt;/li&gt;&lt;li&gt;Mipmap Auto-Generation  &lt;/li&gt;&lt;li&gt;RenderTarget  &lt;/li&gt;&lt;li&gt;Blendable RenderTarget  &lt;/li&gt;&lt;li&gt;Depth/Stencil Target  &lt;/li&gt;&lt;li&gt;Raw UAV and SRV  &lt;/li&gt;&lt;li&gt;Structured UAV and SRV  &lt;/li&gt;&lt;li&gt;Typed UAV  &lt;/li&gt;&lt;li&gt;UAV Typed Store  &lt;/li&gt;&lt;li&gt;UAV Typed Load  &lt;/li&gt;&lt;li&gt;UAV Atomic Add  &lt;/li&gt;&lt;li&gt;UAV Atomic Bitwise Ops  &lt;/li&gt;&lt;li&gt;UAV Atomic Cmp Store or Cmp Exch  &lt;/li&gt;&lt;li&gt;UAV Atomic Exchange  &lt;/li&gt;&lt;li&gt;UAV Atomic Signed Min or Max  &lt;/li&gt;&lt;li&gt;UAV Atomic Unsigned Min or Max  &lt;/li&gt;&lt;li&gt;CPU Lockable  &lt;/li&gt;&lt;li&gt;4x Multisample RenderTarget  &lt;/li&gt;&lt;li&gt;8x Multisample RenderTarget  &lt;/li&gt;&lt;li&gt;Other Multisample Count RT  &lt;/li&gt;&lt;li&gt;Multisample Resolve  &lt;/li&gt;&lt;li&gt;Multisample Load  &lt;/li&gt;&lt;li&gt;Display Scan-Out  &lt;/li&gt;&lt;li&gt;Cast Within Bit Layout&amp;nbsp;&lt;/li&gt;&lt;/ol&gt;The API for querying which of these capabilities a format supports is ID3D11Device::CheckFormatSupport.&lt;br /&gt;&lt;br /&gt;It would be good to write a program that checks which formats AMD and Nvidia GPUs support.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;In CUDA 3.0 you have:&lt;br /&gt;&lt;br /&gt;1, 2 or 4 components of:&lt;br /&gt;*Signed or unsigned 8-, 16- or 32-bit integers (18 formats)&lt;br /&gt;*16-bit floats (currently only supported through the driver API) or 32-bit floats (6 formats)&lt;br /&gt;i.e. 24 texture formats in total.
&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;For CUDA-GL interop (from the forums):&lt;br /&gt;&lt;br /&gt;&lt;div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 1px; -webkit-border-vertical-spacing: 1px; color: #465584; font-family: Courier, 'Courier New', Verdana, Arial; font-size: 17px; line-height: 27px; white-space: pre;"&gt;Works for float textures (internal format GL_XXXXYYF):&lt;br /&gt;XXXX = R, RG, RGB or RGBA&lt;br /&gt;YY      = 16 or 32&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 1px; -webkit-border-vertical-spacing: 1px; color: #465584; font-family: Courier, 'Courier New', Verdana, Arial; font-size: 17px; line-height: 27px; white-space: pre;"&gt;i.e. 8 float formats&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 1px; -webkit-border-vertical-spacing: 1px; color: #465584; font-family: Courier, 'Courier New', Verdana, Arial; font-size: 17px; line-height: 27px; white-space: pre;"&gt;&lt;span class="Apple-style-span" style="font-size: 17px; line-height: 27px;"&gt;Works for integer textures (internal format GL_XXXXYYZZ):&lt;br /&gt;XXXX = R, RG, RGB or RGBA&lt;br /&gt;YY      = 8, 16 or 32&lt;br /&gt;ZZ      = I or UI&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 1px; -webkit-border-vertical-spacing: 1px; color: #465584; font-family: Courier, 'Courier New', Verdana, Arial; font-size: 17px; line-height: 27px; white-space: pre;"&gt;&lt;span class="Apple-style-span" style="font-size: 17px; line-height: 27px;"&gt;i.e. 
24 integer formats&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 1px; -webkit-border-vertical-spacing: 1px; color: #465584; font-family: Courier, 'Courier New', Verdana, Arial; font-size: 17px; line-height: 27px; white-space: pre;"&gt;&lt;span class="Apple-style-span" style="font-size: 17px; line-height: 27px;"&gt;Depth renderbuffers don't work. I don't know whether color renderbuffers work; I assume they do, at least in CUDA 3.0 final.&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;br /&gt;Usage:&lt;br /&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 1px; -webkit-border-vertical-spacing: 1px; color: #465584; font-family: Courier, 'Courier New', Verdana, Arial; font-size: 17px; line-height: 27px; white-space: pre;"&gt;GLuint tex;&lt;br /&gt;struct cudaGraphicsResource *resource;&lt;br /&gt;glGenTextures(1,&amp;amp;tex);&lt;br /&gt;glBindTexture(GL_TEXTURE_2D , tex);&lt;br /&gt;glTexImage2D(GL_TEXTURE_2D , 0 , GL_XXXXYYF , width , height , 0 , GL_RGBA , GL_FLOAT , 0);&lt;br /&gt;glTexParameteri(GL_TEXTURE_2D , GL_TEXTURE_MIN_FILTER , GL_NEAREST);&lt;br /&gt;cudaGraphicsGLRegisterImage (&amp;amp;resource , tex , GL_TEXTURE_2D , cudaGraphicsMapFlagsNone);&lt;br /&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 0px; -webkit-border-vertical-spacing: 0px; color: black; font-family: 'Times New Roman'; font-size: medium; line-height: normal; white-space: normal;"&gt;For integer textures, change the glTexImage2D call to:&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 1px; -webkit-border-vertical-spacing: 1px; color: #465584; font-family: Courier, 'Courier New', Verdana, Arial; font-size: 17px; line-height: 27px; white-space: pre;"&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 0px; -webkit-border-vertical-spacing: 0px; color: black; font-family: 'Times New Roman'; font-size: medium; line-height: normal; white-space: normal;"&gt;&lt;span 
class="Apple-style-span" style="-webkit-border-horizontal-spacing: 1px; -webkit-border-vertical-spacing: 1px; color: #465584; font-family: Courier, 'Courier New', Verdana, Arial; font-size: 20px; line-height: 32px; white-space: pre;"&gt;glTexImage2D(GL_TEXTURE_2D , 0 , GL_XXXXYYZZ , width , height , 0 , GL_RGBA_INTEGER , GL_UNSIGNED_BYTE , 0);&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 1px; -webkit-border-vertical-spacing: 1px; color: #465584; font-family: Courier, 'Courier New', Verdana, Arial; font-size: 17px; line-height: 27px; white-space: pre;"&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 0px; -webkit-border-vertical-spacing: 0px; color: black; font-family: 'Times New Roman'; font-size: medium; line-height: normal; white-space: normal;"&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 1px; -webkit-border-vertical-spacing: 1px; color: #465584; font-family: Courier, 'Courier New', Verdana, Arial; font-size: 20px; line-height: 32px; white-space: pre;"&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 0px; -webkit-border-vertical-spacing: 0px; color: black; font-family: 'Times New Roman'; font-size: medium; line-height: normal; white-space: normal;"&gt;Notes:&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 1px; -webkit-border-vertical-spacing: 1px; color: #465584; font-family: Courier, 'Courier New', Verdana, Arial; font-size: 17px; line-height: 27px; white-space: pre;"&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 0px; -webkit-border-vertical-spacing: 0px; color: black; font-family: 'Times New Roman'; font-size: medium; line-height: normal; white-space: normal;"&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 1px; 
-webkit-border-vertical-spacing: 1px; color: #465584; font-family: Courier, 'Courier New', Verdana, Arial; font-size: 20px; line-height: 32px; white-space: pre;"&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 0px; -webkit-border-vertical-spacing: 0px; color: black; font-family: 'Times New Roman'; font-size: medium; line-height: normal; white-space: normal;"&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 1px; -webkit-border-vertical-spacing: 1px; color: #222222; font-family: Verdana, Tahoma, Arial, 'Trebuchet MS', sans-serif, Georgia, Courier, 'Times New Roman', serif; font-size: 17px; line-height: 27px;"&gt;Notice that it is important to set the minification filter to GL_NEAREST (presumably because the default, GL_NEAREST_MIPMAP_LINEAR, leaves a texture without mipmaps incomplete).&lt;br /&gt;In conclusion, the cudaGraphicsGL interface appears to work for most formats, except normalized internal formats such as the commonly used GL_RGBA8.&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 1px; -webkit-border-vertical-spacing: 1px; color: #465584; font-family: Courier, 'Courier New', Verdana, Arial; font-size: 17px; line-height: 27px; white-space: pre;"&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 0px; -webkit-border-vertical-spacing: 0px; color: black; font-family: 'Times New Roman'; font-size: medium; line-height: normal; white-space: normal;"&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 1px; -webkit-border-vertical-spacing: 1px; color: #465584; font-family: Courier, 'Courier New', Verdana, Arial; font-size: 20px; line-height: 32px; white-space: pre;"&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 0px; -webkit-border-vertical-spacing: 0px; color: black; font-family: 'Times New Roman'; font-size: medium; line-height: normal; white-space: normal;"&gt;&lt;span 
class="Apple-style-span" style="-webkit-border-horizontal-spacing: 1px; -webkit-border-vertical-spacing: 1px; color: #222222; font-family: Verdana, Tahoma, Arial, 'Trebuchet MS', sans-serif, Georgia, Courier, 'Times New Roman', serif; font-size: 17px; line-height: 27px;"&gt;cuda GL allows to use RGB texes altough CUDA seems not from DOC!&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 1px; -webkit-border-vertical-spacing: 1px; color: #465584; font-family: Courier, 'Courier New', Verdana, Arial; font-size: 17px; line-height: 27px; white-space: pre;"&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 0px; -webkit-border-vertical-spacing: 0px; color: black; font-family: 'Times New Roman'; font-size: medium; line-height: normal; white-space: normal;"&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 1px; -webkit-border-vertical-spacing: 1px; color: #465584; font-family: Courier, 'Courier New', Verdana, Arial; font-size: 20px; line-height: 32px; white-space: pre;"&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 0px; -webkit-border-vertical-spacing: 0px; color: black; font-family: 'Times New Roman'; font-size: medium; line-height: normal; white-space: normal;"&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 1px; -webkit-border-vertical-spacing: 1px; color: #222222; font-family: Verdana, Tahoma, Arial, 'Trebuchet MS', sans-serif, Georgia, Courier, 'Times New Roman', serif; font-size: 17px; line-height: 27px;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;OCL DX interop for Nvidia:&lt;br /&gt;&lt;br /&gt;------------------------------------------------------------------&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;DXGI Format &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 
&amp;nbsp; &amp;nbsp; &amp;nbsp;cl_channel_order &amp;nbsp;cl_channel_type &amp;nbsp; &lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;------------------------------ &amp;nbsp; ---------------- &amp;nbsp;---------------&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;DXGI_FORMAT_R32G32B32A32_FLOAT &amp;nbsp; CL_RGBA &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; CL_FLOAT&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;DXGI_FORMAT_R32G32B32A32_UINT &amp;nbsp; &amp;nbsp;CL_RGBA &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; CL_UNSIGNED_INT32&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;DXGI_FORMAT_R32G32B32A32_SINT &amp;nbsp; &amp;nbsp;CL_RGBA &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; CL_SIGNED_INT32&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;DXGI_FORMAT_R16G16B16A16_FLOAT &amp;nbsp; CL_RGBA &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; CL_HALF_FLOAT&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;DXGI_FORMAT_R16G16B16A16_UNORM &amp;nbsp; CL_RGBA &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; CL_UNORM_INT16&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;DXGI_FORMAT_R16G16B16A16_UINT &amp;nbsp; &amp;nbsp;CL_RGBA &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; CL_UNSIGNED_INT16&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;DXGI_FORMAT_R16G16B16A16_SNORM &amp;nbsp; CL_RGBA &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; CL_SNORM_INT16&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;DXGI_FORMAT_R16G16B16A16_SINT &amp;nbsp; &amp;nbsp;CL_RGBA &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; CL_SIGNED_INT16&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;DXGI_FORMAT_R8G8B8A8_UNORM &amp;nbsp; &amp;nbsp; &amp;nbsp; CL_RGBA &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; CL_UNORM_INT8&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;DXGI_FORMAT_R8G8B8A8_UINT &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;CL_RGBA &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; CL_UNSIGNED_INT8&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;DXGI_FORMAT_R8G8B8A8_SNORM &amp;nbsp; 
&amp;nbsp; &amp;nbsp; CL_RGBA &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; CL_SNORM_INT8&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;DXGI_FORMAT_R8G8B8A8_SINT &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;CL_RGBA &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; CL_SIGNED_INT8&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;DXGI_FORMAT_R32G32_FLOAT &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; CL_RG &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; CL_FLOAT&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;DXGI_FORMAT_R32G32_UINT &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;CL_RG &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; CL_UNSIGNED_INT32&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;DXGI_FORMAT_R32G32_SINT &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;CL_RG &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; CL_SIGNED_INT32&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;DXGI_FORMAT_R16G16_FLOAT &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; CL_RG &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; CL_HALF_FLOAT&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;DXGI_FORMAT_R16G16_UNORM &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; CL_RG &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; CL_UNORM_INT16&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;DXGI_FORMAT_R16G16_UINT &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;CL_RG &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; CL_UNSIGNED_INT16&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;DXGI_FORMAT_R16G16_SNORM &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; CL_RG &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; CL_SNORM_INT16&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;DXGI_FORMAT_R16G16_SINT &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;CL_RG &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; CL_SIGNED_INT16&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;DXGI_FORMAT_R8G8_UNORM &amp;nbsp; &amp;nbsp; 
&amp;nbsp; &amp;nbsp; &amp;nbsp; CL_RG &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; CL_UNORM_INT8&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;DXGI_FORMAT_R8G8_UINT &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;CL_RG &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; CL_UNSIGNED_INT8&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;DXGI_FORMAT_R8G8_SNORM &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; CL_RG &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; CL_SNORM_INT8&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;DXGI_FORMAT_R8G8_SINT &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;CL_RG &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; CL_SIGNED_INT8&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;DXGI_FORMAT_R32_FLOAT &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;CL_R &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;CL_FLOAT&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;DXGI_FORMAT_R32_UINT &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; CL_R &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;CL_UNSIGNED_INT32&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;DXGI_FORMAT_R32_SINT &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; CL_R &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;CL_SIGNED_INT32&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;DXGI_FORMAT_R16_FLOAT &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;CL_R &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;CL_HALF_FLOAT&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;DXGI_FORMAT_R16_UNORM &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;CL_R &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;CL_UNORM_INT16&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;DXGI_FORMAT_R16_UINT &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 
&amp;nbsp; CL_R &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;CL_UNSIGNED_INT16&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;DXGI_FORMAT_R16_SNORM &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;CL_R &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;CL_SNORM_INT16&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;DXGI_FORMAT_R16_SINT &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; CL_R &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;CL_SIGNED_INT16&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;DXGI_FORMAT_R8_UNORM &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; CL_R &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;CL_UNORM_INT8&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;DXGI_FORMAT_R8_UINT &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;CL_R &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;CL_UNSIGNED_INT8&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;DXGI_FORMAT_R8_SNORM &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; CL_R &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;CL_SNORM_INT8&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;DXGI_FORMAT_R8_SINT &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;CL_R &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;CL_SIGNED_INT8&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;To see which image formats an OpenCL implementation supports, I wrote this small program using clGetSupportedImageFormats:&lt;br /&gt;&lt;br /&gt;&lt;blockquote&gt;&lt;pre&gt;#include &amp;lt;stdio.h&amp;gt;
#include &amp;lt;stdlib.h&amp;gt;
#ifdef __APPLE__
#include &amp;lt;OpenCL/opencl.h&amp;gt;
#else
#include &amp;lt;CL/cl.h&amp;gt;
#endif

/* Print every image format the context supports for the given flags and image type. */
void getimageinfo(cl_context context, cl_mem_flags m, cl_mem_object_type te)
{
    cl_uint num_entries = 0;          /* the API returns the count as a cl_uint, not a size_t */
    cl_image_format *image_formats;
    cl_int status = clGetSupportedImageFormats(context, m, te, 0, NULL, &amp;amp;num_entries);
    if (status != CL_SUCCESS || num_entries == 0)
        return;

    image_formats = (cl_image_format *)malloc(num_entries * sizeof(cl_image_format));
    if (image_formats == NULL)
        return;
    status = clGetSupportedImageFormats(context, m, te, num_entries, image_formats, NULL);
    if (status == CL_SUCCESS)
    {
        static const cl_channel_order orders[] = {CL_R, CL_A, CL_INTENSITY, CL_LUMINANCE, CL_RG,
                                                  CL_RA, CL_RGB, CL_RGBA, CL_ARGB, CL_BGRA};
        static const char *order_names[] = {"CL_R", "CL_A", "CL_INTENSITY", "CL_LUMINANCE", "CL_RG",
                                            "CL_RA", "CL_RGB", "CL_RGBA", "CL_ARGB", "CL_BGRA"};
        static const cl_channel_type types[] = {
            CL_SNORM_INT8, CL_SNORM_INT16, CL_UNORM_INT8, CL_UNORM_INT16, CL_UNORM_SHORT_565,
            CL_UNORM_SHORT_555, CL_UNORM_INT_101010, CL_SIGNED_INT8, CL_SIGNED_INT16, CL_SIGNED_INT32,
            CL_UNSIGNED_INT8, CL_UNSIGNED_INT16, CL_UNSIGNED_INT32, CL_HALF_FLOAT, CL_FLOAT};
        static const char *type_names[] = {
            "CL_SNORM_INT8", "CL_SNORM_INT16", "CL_UNORM_INT8", "CL_UNORM_INT16", "CL_UNORM_SHORT_565",
            "CL_UNORM_SHORT_555", "CL_UNORM_INT_101010", "CL_SIGNED_INT8", "CL_SIGNED_INT16", "CL_SIGNED_INT32",
            "CL_UNSIGNED_INT8", "CL_UNSIGNED_INT16", "CL_UNSIGNED_INT32", "CL_HALF_FLOAT", "CL_FLOAT"};
        cl_uint i;
        size_t j;

        for (i = 0; i &amp;lt; num_entries; i++)
        {
            /* fallbacks, so an order/type missing from the tables never prints garbage */
            const char *o = "(unknown order)";
            const char *t = "(unknown type)";
            for (j = 0; j &amp;lt; sizeof(orders) / sizeof(orders[0]); j++)
                if (image_formats[i].image_channel_order == orders[j])
                    o = order_names[j];
            for (j = 0; j &amp;lt; sizeof(types) / sizeof(types[0]); j++)
                if (image_formats[i].image_channel_data_type == types[j])
                    t = type_names[j];
            printf("Format %u: %s, %s\n", (unsigned)i, o, t);
        }
    }
    free(image_formats);
}
&lt;/pre&gt;&lt;/blockquote&gt;&lt;br /&gt;Both AMD and Nvidia return the same list for all the arguments I tried: cl_mem_flags set to read-only or write-only (CL_MEM_READ_ONLY / CL_MEM_WRITE_ONLY), and cl_mem_object_type set to 2D or 3D (CL_MEM_OBJECT_IMAGE2D / CL_MEM_OBJECT_IMAGE3D). Perhaps 3D write-only should report 0 formats, since writing to 3D images requires the optional cl_khr_3d_image_writes extension?&lt;br /&gt;&lt;br /&gt;Nvidia:&lt;br /&gt;&lt;br /&gt;Format 0: CL_R, CL_FLOAT&lt;br /&gt;Format 1: CL_R, CL_HALF_FLOAT&lt;br /&gt;Format 2: CL_R, CL_UNORM_INT8&lt;br /&gt;Format 3: CL_R, CL_UNORM_INT16&lt;br /&gt;Format 4: CL_R, CL_SNORM_INT16&lt;br /&gt;Format 5: CL_R, CL_SIGNED_INT8&lt;br /&gt;Format 6: CL_R, CL_SIGNED_INT16&lt;br /&gt;Format 7: CL_R, CL_SIGNED_INT32&lt;br /&gt;Format 8: CL_R, CL_UNSIGNED_INT8&lt;br /&gt;Format 9: CL_R, CL_UNSIGNED_INT16&lt;br /&gt;Format 10: CL_R, CL_UNSIGNED_INT32&lt;br /&gt;Format 11: CL_A, CL_FLOAT&lt;br /&gt;Format 12: CL_A, CL_HALF_FLOAT&lt;br /&gt;Format 13: CL_A, CL_UNORM_INT8&lt;br /&gt;Format 14: CL_A, CL_UNORM_INT16&lt;br /&gt;Format 15: CL_A, CL_SNORM_INT16&lt;br /&gt;Format 16: CL_A, CL_SIGNED_INT8&lt;br /&gt;Format 17: CL_A, CL_SIGNED_INT16&lt;br /&gt;Format 18: CL_A, CL_SIGNED_INT32&lt;br /&gt;Format 19: CL_A, CL_UNSIGNED_INT8&lt;br /&gt;Format 20: CL_A, CL_UNSIGNED_INT16&lt;br /&gt;Format 21: CL_A, CL_UNSIGNED_INT32&lt;br /&gt;Format 22: CL_RG, CL_FLOAT&lt;br /&gt;Format 23: CL_RG, CL_HALF_FLOAT&lt;br /&gt;Format 24: CL_RG, CL_UNORM_INT8&lt;br /&gt;Format 25: CL_RG, CL_UNORM_INT16&lt;br /&gt;Format 26: CL_RG, CL_SNORM_INT16&lt;br /&gt;Format 27: CL_RG, CL_SIGNED_INT8&lt;br /&gt;Format 28: CL_RG, CL_SIGNED_INT16&lt;br /&gt;Format 29: CL_RG, CL_SIGNED_INT32&lt;br /&gt;Format 30: CL_RG, CL_UNSIGNED_INT8&lt;br /&gt;Format 31: CL_RG, CL_UNSIGNED_INT16&lt;br /&gt;Format 32: CL_RG, CL_UNSIGNED_INT32&lt;br /&gt;Format 33: CL_RA, CL_FLOAT&lt;br /&gt;Format 34: CL_RA, CL_HALF_FLOAT&lt;br /&gt;Format 35: CL_RA, CL_UNORM_INT8&lt;br /&gt;Format 36: CL_RA, CL_UNORM_INT16&lt;br /&gt;Format 37: CL_RA, CL_SNORM_INT16&lt;br /&gt;Format 38: CL_RA, CL_SIGNED_INT8&lt;br /&gt;Format 39: CL_RA, CL_SIGNED_INT16&lt;br /&gt;Format 40: CL_RA, CL_SIGNED_INT32&lt;br /&gt;Format 41: CL_RA, CL_UNSIGNED_INT8&lt;br /&gt;Format 42: CL_RA, CL_UNSIGNED_INT16&lt;br /&gt;Format 43: CL_RA, CL_UNSIGNED_INT32&lt;br
/&gt;Format 44: CL_RGBA, CL_FLOAT&lt;br /&gt;Format 45: CL_RGBA, CL_HALF_FLOAT&lt;br /&gt;Format 46: CL_RGBA, CL_UNORM_INT8&lt;br /&gt;Format 47: CL_RGBA, CL_UNORM_INT16&lt;br /&gt;Format 48: CL_RGBA, CL_SNORM_INT16&lt;br /&gt;Format 49: CL_RGBA, CL_SIGNED_INT8&lt;br /&gt;Format 50: CL_RGBA, CL_SIGNED_INT16&lt;br /&gt;Format 51: CL_RGBA, CL_SIGNED_INT32&lt;br /&gt;Format 52: CL_RGBA, CL_UNSIGNED_INT8&lt;br /&gt;Format 53: CL_RGBA, CL_UNSIGNED_INT16&lt;br /&gt;Format 54: CL_RGBA, CL_UNSIGNED_INT32&lt;br /&gt;Format 55: CL_BGRA, CL_UNORM_INT8&lt;br /&gt;Format 56: CL_BGRA, CL_SIGNED_INT8&lt;br /&gt;Format 57: CL_BGRA, CL_UNSIGNED_INT8&lt;br /&gt;Format 58: CL_ARGB, CL_UNORM_INT8&lt;br /&gt;Format 59: CL_ARGB, CL_SIGNED_INT8&lt;br /&gt;Format 60: CL_ARGB, CL_UNSIGNED_INT8&lt;br /&gt;Format 61: CL_INTENSITY, CL_FLOAT&lt;br /&gt;Format 62: CL_INTENSITY, CL_HALF_FLOAT&lt;br /&gt;Format 63: CL_INTENSITY, CL_UNORM_INT8&lt;br /&gt;Format 64: CL_INTENSITY, CL_UNORM_INT16&lt;br /&gt;Format 65: CL_INTENSITY, CL_SNORM_INT16&lt;br /&gt;Format 66: CL_LUMINANCE, CL_FLOAT&lt;br /&gt;Format 67: CL_LUMINANCE, CL_HALF_FLOAT&lt;br /&gt;Format 68: CL_LUMINANCE, CL_UNORM_INT8&lt;br /&gt;Format 69: CL_LUMINANCE, CL_UNORM_INT16&lt;br /&gt;Format 70: CL_LUMINANCE, CL_SNORM_INT16&lt;br /&gt;&lt;br /&gt;AMD:&lt;br /&gt;&lt;br /&gt;Format 0: CL_RGBA, CL_UNORM_INT8&lt;br /&gt;Format 1: CL_RGBA, CL_UNORM_INT16&lt;br /&gt;Format 2: CL_RGBA, CL_SIGNED_INT8&lt;br /&gt;Format 3: CL_RGBA, CL_SIGNED_INT16&lt;br /&gt;Format 4: CL_RGBA, CL_SIGNED_INT32&lt;br /&gt;Format 5: CL_RGBA, CL_UNSIGNED_INT8&lt;br /&gt;Format 6: CL_RGBA, CL_UNSIGNED_INT16&lt;br /&gt;Format 7: CL_RGBA, CL_UNSIGNED_INT32&lt;br /&gt;Format 8: CL_RGBA, CL_HALF_FLOAT&lt;br /&gt;Format 9: CL_RGBA, CL_FLOAT&lt;br /&gt;Format 10: CL_BGRA, CL_UNORM_INT8&lt;br /&gt;&lt;br /&gt;For OpenCL-GL interop I don't know which formats are supported:&lt;br /&gt;for Nvidia it is either the 71 formats above, the CUDA-GL supported formats, or the GL equivalent of the CUDA interop formats; I suspect the CL_RGB ones are supported too..&lt;br /&gt;for AMD it is either the CL image formats above or the CAL DX interop ones; again I suspect RGB formats are supported.&lt;br /&gt;AMD CAL:&lt;br /&gt;&lt;br /&gt;CAL exposes textures, and CAL DX interop would be worth exploring..&lt;br /&gt; &lt;br /&gt;&lt;iframe src="https://dl.dropbox.com/u/1416327/image.html" class="source_code" style="width: 100%; height: 20em;"&gt;&lt;/iframe&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8553786559872430029-7241786228336684121?l=oscarbg.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://oscarbg.blogspot.com/feeds/7241786228336684121/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://oscarbg.blogspot.com/2010/03/gpgpu-image-support.html#comment-form' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8553786559872430029/posts/default/7241786228336684121'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8553786559872430029/posts/default/7241786228336684121'/><link rel='alternate' type='text/html' href='http://oscarbg.blogspot.com/2010/03/gpgpu-image-support.html' title='GPGPU Image support!'/><author><name>oscarbg</name><uri>http://www.blogger.com/profile/10172785372134961220</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8553786559872430029.post-426459346814898952</id><published>2010-03-05T17:54:00.001+01:00</published><updated>2010-03-05T18:04:59.868+01:00</updated><title type='text'>CUDA 3.0 and Nexus in VS 2010, CUDA on FreeBSD 8.0 and much more!</title><content type='html'>Interesting Nvidia threads:&lt;br /&gt;1. Nexus: Unofficial Nexus / Visual 
Studio 2010 integration&lt;br /&gt;&lt;a href="http://forums.nvidia.com/index.php?showtopic=161096"&gt;http://forums.nvidia.com/index.php?showtopic=161096&lt;/a&gt;&lt;br /&gt;-&amp;gt; it also enables compiling CUDA 3.x code with VS 2010!&lt;br /&gt;This is awesome: it brings together VS2010 RC + CUDA 3.0 + Nexus, and also adds project templates for CUDA and Nexus apps!&lt;br /&gt;&lt;br /&gt;&lt;blockquote&gt;It patches the vsvars32.bat file to read "Setting environment for using Microsoft Visual Studio 2008 x86 tools" instead of "Setting environment for using Microsoft Visual Studio 2010 x86 tools" to get around nvcc's Visual C++ version detection; otherwise it fails with this message: "nvcc fatal : nvcc cannot find a supported cl version. Only MSVC 8.0 and MSVC 9.0 are supported". It also creates the vcvarsamd64.bat file to make 64-bit builds work, or otherwise nvcc fails with "nvcc fatal : Visual Studio configuration file '(null)' could not be found" (see this thread).&lt;/blockquote&gt;Nexus news: no DX9 or OpenGL support in the initial release:&lt;br /&gt;&lt;br /&gt;&lt;blockquote&gt;DX10 is currently supported and DX11 will be available in the Beta 2 release which is scheduled for early March. DX10 and DX11 are the graphics APIs of choice for 1.0.&amp;nbsp;Interesting that you feel OpenGL is favored, as full support for OpenGL won't be in the 1.0 release. The Beta 1 was focused on Compute - The Beta 2 will be released just before GDC and will bring full DX10, DX11 debugging and profiling into Visual Studio. This is a dream for game and graphics developers - perfhud on steroids. OpenGL support will come out *sometime* in late 2010. Pro version: (paid version)&amp;nbsp;In addition to premium support, platform analysis (cpu+gpu correlated timeline) and advanced debugging capabilities will be available only in the pro version. 
&lt;/blockquote&gt;2. Feature Request: Support simultaneous native and CUDA debugging&lt;br /&gt;&lt;blockquote&gt;I have noticed that it is not possible to debug CUDA and native code simultaneously on the same Visual Studio instance. I tried starting debugging through the Start CUDA Debugging option and then attaching the native debugger to the running process (inserting a 10 second sleep at the start of the program helped make this easier), but as soon as a breakpoint on a CUDA kernel is hit Visual Studio freezes.&lt;/blockquote&gt;&lt;blockquote&gt;I've been able to debug native and CUDA code on the same process simultaneously by having two Visual Studio instances open, and it works very well. I think it would be very valuable to be able to step through native code and device code on the same session, much like mixed debugging works with .NET.&lt;/blockquote&gt;&lt;br /&gt;3. Eclipse Plugin for CUDA and Qt development&lt;br /&gt;&lt;br /&gt;http://forums.nvidia.com/index.php?showtopic=160564&lt;br /&gt;&lt;br /&gt;&lt;blockquote&gt;we developed a plugin for Eclipse, which comfortably allows CUDA and Qt development. It provides three toolchains, which can be used to compile CUDA and/or Qt sources.&lt;br /&gt;&lt;br /&gt;Features include:&lt;br /&gt;&lt;br /&gt;- Error Parsing&lt;br /&gt;- Dependency Calculation&lt;br /&gt;- Automatic invocation of all tools&lt;br /&gt;- ...&lt;br /&gt;&lt;br /&gt;http://www.ai3.uni-bayreuth.de/software/eclipsecudaqt/index.php&lt;/blockquote&gt;Fastest CUDA&amp;nbsp;reduction code to date! 
(following last week's news of the fastest matmul for GT200, and for AMD in a C-like language)&lt;br /&gt;http://forums.nvidia.com/index.php?showtopic=160196&lt;br /&gt;&lt;br /&gt;&lt;blockquote&gt;My simple but speedy reduction code (runs 106.4GB/s on GTX 295), 106.4/111.9=95.1% of the peak bandwidth; good reduction code: 5ms for 150M integers.&lt;br /&gt;Testing with different input sizes I can see that your code is significantly slower if the size is less than 16M, about the same speed with 32M and faster with more than 32M on the GTX 260.&lt;br /&gt;It seems my code can beat the SDK reduction on every input size, provided that the parameters M and K are properly chosen. Here is a detailed result for different M and K on different input sizes; the performance of the SDK reduction with the same sizes is also listed.&lt;br /&gt;&amp;nbsp;GTX 295, 1 core.&lt;br /&gt;[size=~512K] &lt;br /&gt;My code (M=240, N=64, K=34): 60.3GB/s (23.3% faster)&lt;br /&gt;SDK reduction (size=1&amp;lt;&amp;lt;19): 48.9GB/s&lt;br /&gt;[size=~1M]&lt;br /&gt;My code (M=240, N=64, K=69): 76.0GB/s (15.8% faster)&lt;br /&gt;SDK reduction (size=1&amp;lt;&amp;lt;20): 65.6GB/s&lt;br /&gt;[size=~2M]&lt;br /&gt;My Code (M=240, N=64, K=137): 86.6GB/s (9.3% faster)&lt;br /&gt;SDK reduction (size=1&amp;lt;&amp;lt;21): 79.2GB/s&lt;br /&gt;[size=~4M]&lt;br /&gt;My Code (M=240, N=64, K=273): 94.2GB/s (5.8% faster)&lt;br /&gt;SDK reduction (size=1&amp;lt;&amp;lt;22): 89.0GB/s&lt;br /&gt;[size=~8M]&lt;br /&gt;My Code (M=240, N=64, K=546): 99.5GB/s (5.0% faster)&lt;br /&gt;SDK reduction (size=1&amp;lt;&amp;lt;23): 94.8GB/s&lt;br /&gt;[size=~16M]&lt;br /&gt;My Code (M=240, N=64, K=1092): 103.1GB/s (5.5% faster)&lt;br /&gt;SDK reduction (size=1&amp;lt;&amp;lt;24): 97.7GB/s&lt;br /&gt;[size=~32M]&lt;br /&gt;My Code (M=240, N=64, K=2184): 104.9GB/s (6.3% faster)&lt;br /&gt;SDK reduction (size=1&amp;lt;&amp;lt;25): 98.7GB/s&lt;br /&gt;[size=~64M]&lt;br /&gt;My Code (M=480, N=64, K=2184): 105.8GB/s (7.3% faster)&lt;br /&gt;SDK 
reduction (size=1&amp;lt;&amp;lt;26): 98.6GB/s&lt;br /&gt;[size=~128M]&lt;br /&gt;My Code (M=720, N=64, K=2912): 106.4GB/s (9.1% faster)&lt;br /&gt;SDK reduction (size=1&amp;lt;&amp;lt;27): 97.5GB/s&lt;/blockquote&gt;*cuda_wrapper&lt;br /&gt;The CUDA wrapper library provides means for efficient resource sharing and resource protection on multi-user GPU clusters. It implements the following functionality:&lt;br /&gt;1) Virtualization of the physical GPU devices&lt;br /&gt;2) Ensuring NUMA affinity for GPUs&lt;br /&gt;&lt;a href="http://sourceforge.net/projects/cudawrapper/"&gt;http://sourceforge.net/projects/cudawrapper/&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;It is supposed to show that when you allocate resources, free them and allocate new ones, the memory is left intact as the previous owner left it, so there is no privacy in this sense!&lt;br /&gt;&lt;br /&gt;*It seems depth buffers/renderbuffers are not supported by GL interop in CUDA 3.1,&lt;br /&gt;&lt;br /&gt;so a texture whose format is GL_DEPTH_COMPONENT32 can't be passed to&lt;br /&gt;cudaGraphicsGLRegisterImage(&amp;amp;resource, tex, GL_TEXTURE_2D, cudaGraphicsMapFlagsNone);&lt;br /&gt;Also, remember there is a post showing the currently supported color formats: they seem to be R, RG, RGB and RGBA in float, float16 and uint8, more or less similar to the published OpenCL DX interop formats. It would be good to write a tool that prints the currently supported formats in OpenCL (there is a function for that) to see whether DX interop disables some formats, at least on Nvidia hardware.&lt;br /&gt;&lt;br /&gt;*CUDA on FreeBSD 8.0!&lt;br /&gt;Inter-kernel communication is not supported under pain of me glaring at you really hard. 
&lt;br /&gt;The recipe is:&lt;br /&gt;FreeBSD 8.0 + NVidia driver 195.22 + CUDA 3.0&lt;br /&gt;Also, linprocfs and linsysfs should be mounted:&lt;br /&gt;&lt;br /&gt;uname -a&lt;br /&gt;FreeBSD av429635.oops 8.0-RELEASE FreeBSD 8.0-RELEASE #0: Sun Feb 7 17:30:12 MSK 2010 root@av429635.oops:/usr/src/sys/i386/compile/ALECN80 i386&lt;br /&gt;&lt;br /&gt;mount /compat/linux/proc/&lt;br /&gt;mount /compat/linux/sys/&lt;br /&gt;&lt;br /&gt;Well, it's not exactly "CUDA working on FreeBSD": what works is a Linux program using Linux libs under the Linuxulator on FreeBSD.&lt;br /&gt;Also, I didn't try to compile CUDA programs yet; I've just launched programs pre-compiled on Linux (Debian)&lt;br /&gt;&lt;br /&gt;Also, it seems the 190 drivers showed device info with cudadeviceinfodrv but didn't create a context.&lt;br /&gt;&lt;br /&gt;*iPad: the CPU is a 1 GHz single-core Cortex-A8 (same core as the 3GS), but stripped down&lt;br /&gt;The GPU is a PowerVR SGX variant.. but slow for the pixel resolution (perhaps an SGX535, or a weaker 530; a 540 I doubt)&lt;br /&gt;So Tegra 2 seems a lot better on both the CPU and GPU side&lt;br /&gt;Perhaps Flash doesn't work because of the custom GPU, although it uses PowerVR IP.. since OMAP3 and OMAP4 have been shown with Flash 10.1 video acceleration at MWC..&lt;br /&gt;&lt;br /&gt;*"Optimus Works Perfectly With Intel Wireless Display (WiDi)"&lt;br /&gt;&lt;br /&gt;A perfect&amp;nbsp;notebook must have it!&lt;br /&gt;&amp;nbsp;Still, in WiDi mode you lose&amp;nbsp;3D 120 Hz via HDMI, and it has no HDCP, so no Blu-ray..&lt;br /&gt;I hope the next WiDi has HDCP and also HDMI 1.4 so 3D works too, but that will require double the bandwidth, which seems to stress current Wi-Fi..&lt;br /&gt;My question: with Optimus, where the Nvidia GPU sends frames to the Intel IGP, would Nvidia 3D Vision work if the notebook theoretically had a built-in 120 Hz 3D screen? And what if it had a dual-link DVI output and I connected a 120 Hz 3D display? I suspect the answer is the same, at least the technical hurdles seem to be.. 
and I think it is hard, since it is a PCI Express transfer and about 1 GB/s seems to be used currently for 60 Hz? So this would at least put more stress on the bus, but it is entirely doable if the Intel IGP recognizes the special 120 Hz LCD modes and acts accordingly..&lt;br /&gt;Also, all of this requires Windows 7 (Optimus requires it because it runs two graphics drivers from different IHVs at the same time, and WiDi seems to require Windows 7 x64)&lt;br /&gt;Also, will WiDi work with MacBook Pro Optimus laptops? It would require support from Intel, as it uses Intel's My WiFi technology, so we will see.. Perhaps Apple is waiting for Light Peak optical video outputs rather than a wireless technology..&lt;br /&gt;A dream notebook for graphics must have a D3D11 GPU with 3D (so Fermi), standard 3D outputs (so HDMI 1.4), a built-in 120 Hz 3D screen, Optimus, and possibly WiDi, ideally with HDCP support.. Let's see how long it takes to arrive; I hope less than a year.. &lt;br /&gt;*Optimus has an nvgpustateviewer tool that shows whether the Nvidia GPU is active or not. Where can it be downloaded?&lt;br /&gt;&lt;br /&gt;Intel WiDi has no HDCP, so no Blu-ray viewing and of course no 3D, but it is Optimus-compatible now&lt;br /&gt;Similarly, a PERFECT 3D PROJECTOR needs:&lt;br /&gt;*720p at least&lt;br /&gt;*Broad 3D support:&amp;nbsp;3D via HDMI 1.4, DLP Link, 3D Vision compatible&lt;br /&gt;*HDCP support &lt;br /&gt;so it can display 3D Vision, PS3 3D games (HDMI 1.4) and Blu-ray 3D (HDMI 1.4 + HDCP)&lt;br /&gt;Currently Acer and ViewSonic projectors support everything but HDMI 1.4.. so no PS3 or Blu-ray 3D support..&lt;br /&gt;&lt;br /&gt;Current projectors have HDCP, so Blu-ray works, and 3D at 120 Hz via dual-link DVI or HDMI, but there is no HDMI 1.4 3D spec support for projecting PS3 games, Blu-ray player output, etc.. &lt;br /&gt;&lt;br /&gt;iZ3D 1.11 is coming soon, using the Catalyst 10.3 3D hooks for better multi-monitor support (like 3D Vision Surround?) and possibly CrossFire, and also bringing D3D10 support for games&lt;br /&gt;&lt;br /&gt;The Hydra demos on an AMD chipset with a GTX 275 + HD 5870 use the improved Hydra 1.5 driver with a better mixed mode.. 
It would be interesting to see how performance and compatibility improve over time (i.e. to see the hardware's potential once all software issues are solved/tuned..)&lt;br /&gt;&lt;br /&gt;Regarding WiDi:&lt;br /&gt;"The software drivers that work with Intel® Wireless Display only apply to Microsoft Windows 7 64-bit*. &lt;br /&gt;Intel® PROSet/Wireless WiFi Connection Utility for Windows 7 64-Bit for Intel Wireless Display&lt;br /&gt;Requires special Proset driver:&lt;br /&gt;Wireless Driver: &lt;br /&gt;Drivers and management software for Microsoft Windows 7 64-bit OS*.&lt;br /&gt;&amp;nbsp;NOTES: &lt;br /&gt;http://www.intel.com/support/wireless/wtech/iwd/sb/CS-031109.htm&lt;br /&gt;&lt;br /&gt;-The ZIP file is provided with Intel® My WiFi Technology enabled.&lt;br /&gt;-Intel® My WiFi Technology has the following requirements:&lt;br /&gt;-Intel® Centrino® Ultimate-N 6300, Intel® Centrino® Advanced-N 6200, Intel® Centrino® Advanced-N + WiMAX 6250, Intel® WiFi Link 1000, Intel® WiFi Link 5300, or Intel® WiFi Link 5100&lt;br /&gt;-Minimum of Intel® PROSet/Wireless WiFi Connection Utility 13.0.0.0 on Microsoft Windows 7*&lt;br /&gt;NOTE: Intel® Wireless Display requires one of the following products:&lt;br /&gt;-Intel® Centrino® Ultimate-N 6300&lt;br /&gt;-Intel® Centrino® Advanced-N 6200&lt;br /&gt;-Intel® Centrino® Advanced-N+WiMAX 6250&lt;br /&gt;NOTE: Features removed from this version:&lt;br /&gt;Wake on Wireless LAN is not present in this version of the application. &lt;br /&gt;the Intel® My WiFi Technology application is not supported for Windows Vista. This feature is available on Windows 7 only.&lt;br /&gt;For the latest driver for the Intel® PROSet/Wireless WiFi Connection Utility (for Intel® Centrino® Advanced-N 6200). 
Intel recommends that you use the latest drivers for best performance.&lt;br /&gt;&lt;br /&gt;Intel Media SDK 1.5 RC&lt;br /&gt;&lt;br /&gt;See http://software.intel.com/en-us/articles/intel-media-software-development-kit-intel-media-sdk/&lt;br /&gt;&lt;br /&gt;It's going to support the H.264 MVC codec used by 3D Blu-ray, either via the GPU video processors or, where those don't support it, via optimized multithreaded SSE code..&lt;br /&gt;Also, similar to the CPU H.264 encoding support, is there going to be a 3D MVC encoder?&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;*Shader Model 5 vs OpenCL kernels:&lt;br /&gt;&lt;br /&gt;Common (more or less):&lt;br /&gt;Doubles with denorms &lt;br /&gt;Reduced-precision reciprocal &lt;br /&gt;Shader conversion instructions - fp16 to fp32 and vice versa&lt;br /&gt;Structured buffer, which is a new type of buffer containing structured elements. &lt;br /&gt;Some things not present in OpenCL kernels:&lt;br /&gt;Resinfo on buffers &lt;br /&gt;Count bits set instruction &lt;br /&gt;Find first bit set instruction &lt;br /&gt;Carry/Overflow handling &lt;br /&gt;Bit reversal instructions for FFTs &lt;br /&gt;Conditional Swap intrinsic &lt;br /&gt;Also DispatchIndirect:&lt;br /&gt;remember it reads the grid size to launch from a GPU buffer; the CPU is still required to launch the kernel..&lt;br /&gt;but it doesn't need to read back the ~3 integers of the grid size, which, small as they are, would still cost a PCIe transaction of about 1 KB? adding a lot of latency and a CPU-GPU sync point.. Remember there is still no runtime block size:&lt;br /&gt;the kernel must be compiled for a fixed block size.&lt;br /&gt;It's an evolution(?)&amp;nbsp;of&amp;nbsp;Draw Indirect - Direct3D 10 implements DrawAuto, which takes content (generated by the GPU) and renders it (on the GPU). 
Direct3D 11 generalizes DrawAuto so that it can be called by a Compute Shader using DrawInstancedIndirect and DrawIndexedInstancedIndirect.&lt;br /&gt;&lt;br /&gt;*&amp;nbsp;gDEBugger CL is a new and exciting product; it brings all of gDEBugger's Debugging and Profiling capabilities to the OpenCL developer's world. gDEBugger CL, now in beta testing, supports all OpenCL implementations on Windows, Mac OS X and Linux. The upcoming gDEBugger iPhone version includes on-device debugging and profiling abilities, running in real-time and letting developers optimize their game on the actual iPhone device. gDEBugger iPhone displays invaluable inside information such as iPhone's GPU, CPU, graphic driver and operating system performance counters.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8553786559872430029-426459346814898952?l=oscarbg.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://oscarbg.blogspot.com/feeds/426459346814898952/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://oscarbg.blogspot.com/2010/03/cuda-30-and-nexus-in-vs-2010-cuda-on.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8553786559872430029/posts/default/426459346814898952'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8553786559872430029/posts/default/426459346814898952'/><link rel='alternate' type='text/html' href='http://oscarbg.blogspot.com/2010/03/cuda-30-and-nexus-in-vs-2010-cuda-on.html' title='CUDA 3.0 and Nexus in VS 2010, CUDA on FreeBSD 8.0 and much more!'/><author><name>oscarbg</name><uri>http://www.blogger.com/profile/10172785372134961220</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8553786559872430029.post-1700485379452750914</id><published>2010-03-03T03:09:00.001+01:00</published><updated>2010-03-03T03:10:52.872+01:00</updated><title type='text'>New in Nvidia 196.75 drivers!</title><content type='html'>OpenCL now has D3D interop:&lt;br /&gt;cl_nv_d3d9_sharing cl_nv_d3d10_sharing cl_nv_d3d11_sharing&lt;br /&gt;clGetGLContextInfoKHR now?&lt;br /&gt;&lt;br /&gt;New OpenGL extensions (DX10.1- and DX11-class features)&lt;br /&gt;&lt;br /&gt;Nvidia-specific ones:&lt;br /&gt;GL_NV_vertex_attrib_64bit&lt;br /&gt;GL_NVX_gpu_memory_info&lt;br /&gt;GL_NV_gpu_shader5&lt;br /&gt;GL_NV_gpu_program5&lt;br /&gt;GL_NV_gpu_program_fp64&lt;br /&gt;GL_NV_gpu_program4_1&lt;br /&gt;&lt;br /&gt;Also present in Catalyst 10.3:&lt;br /&gt;GL_EXT_gpu_shader_fp64&lt;br /&gt;GL_EXT_texture_buffer_object_rgb32&lt;br /&gt;GL_EXT_tessellation_shader&lt;br /&gt;GL_EXT_shader_subroutine&lt;br /&gt;GL_EXT_gpu_shader5&lt;br /&gt;GL_EXT_texture_compression_bptc&lt;br /&gt;&lt;br /&gt;Present only on Nvidia:&lt;br /&gt;GL_EXT_draw_indirect&lt;br /&gt;GL_EXT_shader_image_load_store&lt;br /&gt;GL_EXT_vertex_attrib_64bit&lt;br /&gt;GL_EXT_transform_feedback3&lt;br /&gt;GL_EXT_transform_feedback2&lt;br /&gt;&lt;br /&gt;Present only on AMD:&lt;br /&gt;GL_EXT_shader_atomic_counters &lt;br /&gt;GL_AMD_conservative_depth&lt;br /&gt;&lt;br /&gt;Good: all the NV_ extensions from the 195 drivers&lt;br /&gt;texture_buffer_object_rgb32&lt;br /&gt;tessellation_shader&lt;br /&gt;shader_subroutine&lt;br /&gt;&lt;br /&gt;have migrated to EXT, so now all the functionality is EXT_ stuff!&lt;br /&gt;Nvidia lacks EXT_shader_atomic_counters,&lt;br /&gt;and AMD lacks the most important ones:&lt;br /&gt;GL_EXT_draw_indirect&lt;br /&gt;GL_EXT_shader_image_load_store&lt;br /&gt;and perhaps&lt;br /&gt;GL_EXT_vertex_attrib_64bit&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' 
src='https://blogger.googleusercontent.com/tracker/8553786559872430029-1700485379452750914?l=oscarbg.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://oscarbg.blogspot.com/feeds/1700485379452750914/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://oscarbg.blogspot.com/2010/03/new-in-nvidia-19675-drivers.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8553786559872430029/posts/default/1700485379452750914'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8553786559872430029/posts/default/1700485379452750914'/><link rel='alternate' type='text/html' href='http://oscarbg.blogspot.com/2010/03/new-in-nvidia-19675-drivers.html' title='New in Nvidia 196.75 drivers!'/><author><name>oscarbg</name><uri>http://www.blogger.com/profile/10172785372134961220</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8553786559872430029.post-1393544936784313713</id><published>2010-03-03T03:08:00.002+01:00</published><updated>2010-03-04T23:20:46.691+01:00</updated><title type='text'>GPU computing in a browser, and other news..</title><content type='html'>It seems that just as WebGL brings a GPU graphics API (OpenGL ES) to browsers, people are&lt;br /&gt;asking about WebCL, bringing something similar to OpenCL to the browser..&lt;br /&gt;&amp;nbsp; &lt;br /&gt;1. One way is via WebGL, using the old GPGPU-through-graphics-API tricks..&lt;br /&gt;&lt;br /&gt;See here for a matmul running on the GPU -&amp;gt; learningwebgl.com&lt;br /&gt;&lt;br /&gt;2. In the meantime, it seems a DirectX 11 plugin is coming to browsers:&lt;br /&gt;"DirectX 11 3D Games coming to Browsers with Vision Engine 8"&lt;br /&gt;this should allow compute shaders in a 
browser! although that isn't mentioned; only tessellation is..&lt;br /&gt;&lt;br /&gt;3. Search for the Jetpack CUDA plugin..&lt;br /&gt;http://mozillalabs.com/jetpack/2010/01/25/elevating-javascript-performance-through-gpu-power/&lt;br /&gt;&lt;br /&gt;4. NaCl GPU computing: it should be possible with Google NaCl for Chrome via OpenCL, CUDA, etc..?&lt;br /&gt;&lt;br /&gt;WebGL support for QtWebKit:&lt;br /&gt;&lt;blockquote&gt;Related: WebKit in Qt trunk has WebGL support working on the N900.&lt;br /&gt;Also a new backend for audio/video elements using the Qt Multimedia framework and initial work on &lt;br /&gt;WebGL support.&lt;br /&gt;&lt;br /&gt;NOTES:&lt;br /&gt;&lt;br /&gt;* Works only when accelerated composition is not enabled in compilation&lt;br /&gt;* Added --webgl command line switch to QGVLauncher, added toggle button to&lt;br /&gt;QtLauncher&lt;br /&gt;Why GraphicsLayer (accelerated composition layer) doesn't handle WebGL? Missing&lt;br /&gt;methods:&lt;br /&gt;&lt;br /&gt;* setContentsToGraphicsContext3D&lt;br /&gt;* setGraphicsContext3DNeedsDisplay&lt;br /&gt;&lt;br /&gt;I think adding support for content caching is a subtask for this. 
WebGL support&lt;br /&gt;can be tested by compiling with WTF_USE_ACCELERATED_COMPOSITING=0.&lt;/blockquote&gt;WebGL and Khronos stuff:&lt;br /&gt;&lt;a href="https://cvs.khronos.org/svn/repos/registry/trunk/public/webgl/sdk/tests/webgl-conformance-tests.html"&gt;https://cvs.khronos.org/svn/repos/registry/trunk/public/webgl/sdk/tests/webgl-conformance-tests.html&lt;/a&gt;&lt;br /&gt;&lt;a href="http://www.khronos.org/webgl/wiki/Debugging"&gt;http://www.khronos.org/webgl/wiki/Debugging&lt;/a&gt;&lt;br /&gt;&lt;a href="https://cvs.khronos.org/svn/repos/registry/trunk/public/index.php"&gt;https://cvs.khronos.org/svn/repos/registry/trunk/public/index.php&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;&amp;nbsp; &lt;br /&gt;Opera 10.5 released: it has the fastest JavaScript to date, until Minefield picks up the new *Monkey engine.&lt;br /&gt;Now they should concentrate on&amp;nbsp;WebGL..&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;blockquote&gt;Related: I forgot to blog about another amazing development, the DirectWrite &amp;amp; Direct2D landing in the Minefield nightly builds (though pref'd off.)&lt;br /&gt;&lt;br /&gt;Enabling:&lt;br /&gt;1. Enter 'about:config' &lt;br /&gt;2. Click through the warning, if necessary &lt;br /&gt;3. Enter gfx.font in the 'Filter' box &lt;br /&gt;4. Double-click on 'gfx.font_rendering.directwrite.enabled' to set it to true &lt;br /&gt;5. Below this, right click and select New &amp;gt; Integer to add a pref setting &lt;br /&gt;6. Enter 'mozilla.widget.render-mode' for the preference name, 6 for the value &lt;br /&gt;7. Restart&lt;br /&gt;(To disable, set gfx.font_rendering.directwrite.enabled to false, delete mozilla.widget.render-mode, then restart.)&lt;/blockquote&gt;Old news:&lt;br /&gt;"OpenSceneGraph-2.9.6 released, introduces OpenGL ES + OpenGL 3.x support!"&lt;br /&gt;&lt;br /&gt;In March, expect:&lt;br /&gt;The launch of Fermi, along with which I would want PhysX 3.0, Cg 3.0, CUDA 3.0 final, Fermi/D3D11-level OpenGL extensions, and 200-series drivers with CUDA 3.1 beta, the new OCL D3D extensions, 3D image writes and 3D Vision 
windowed mode, YouTube 3D browser support and VDPAU/GLX interop enhancements.&lt;br /&gt;Unigine Heaven Linux demo&lt;br /&gt;iZ3D 1.11 Direct3D 10 drivers&lt;br /&gt;Events: CeBIT in the first week, GDC in the second week, and the 3rd GPGPU workshop on 15 March&lt;br /&gt;Snow Leopard 10.6.3&lt;br /&gt;OpenRL public release&lt;br /&gt;Ubuntu 10.04 beta with fglrx 8.72 beta, which seems to allow GPUs from multiple vendors (ATI and Nvidia) to work together (PCI arbitration), similar to Windows and Mac..&lt;br /&gt;Combine that with an improved vga_switcheroo patch for kernel 2.6.34 and that could possibly bring something similar to Optimus to Linux.. Reportedly Apple is working on Optimus-like tech for Mac OS in the next MacBooks, possibly this spring.. that implies some GT2xx chip in Apple laptops.. I would hope for some laptops with AMD 5xxx Mobility, as mobile Fermi seems far off (well, perhaps an announcement this summer), which would bring D3D11 features, but Mac OS X doesn't exploit them; currently it barely has OGL 3.x.. I hope OGL support picks up fast on Mac OS and that 10.6.4/5 has OGL 3.2 with ARB extensions, bringing parity with D3D 10.1. That would add 5xxx drivers for Hackintoshes, but those could also come with the dual 6-core Xeon Westmere Mac Pro, perhaps also arriving this month..&lt;br /&gt;&lt;br /&gt;Also raw info:&lt;br /&gt;&lt;br /&gt;Try less frequently than every Draw, e.g. 
after every important Release(), calling ClearState(), then Flush().&lt;br /&gt;I assume the D3D runtime checks reference counters when Flush() or Present() is invoked.&lt;br /&gt;yield&lt;br /&gt;I am just dipping into Direct3D and I have created an app which is able to display UYVY video using the D3DXLoadSurfaceFromMemory(), StretchRect() and Present() APIs.&lt;br /&gt;&lt;br /&gt;I'd like to capture and scale screenshots directly in video memory without having to rely on the CPU too much.&lt;br /&gt;&lt;br /&gt;I've looked at GetFrontBufferData, but this copies data into system memory.&lt;br /&gt;&lt;br /&gt;Any suggestions on how to go about doing this?&lt;br /&gt;&lt;br /&gt;FYI I'm brand new to the Direct3D API, so nothing is too obvious to mention.&lt;br /&gt;&lt;br /&gt;Thanks!&lt;br /&gt;&lt;br /&gt;I have found that the 3Dlabs GLSL front-end compiler and other utilities are on the Orange Book web page..&lt;br /&gt;&lt;br /&gt;cudaGetDeviceCount doesn't create a context. The first cudaMalloc will create a context. If you want to force context creation before a cudaMalloc, use cudaFree(0).&lt;br /&gt;Look at the DC 5.0 code for mul24 and at its IL in GPU ShaderAnalyzer to see what happens, also for&lt;br /&gt;MADs with float, integer and double&lt;br /&gt;&lt;br /&gt;Cg 2.2's gs_simple uses a GLSL geometry shader and works on ATI! So it seems Cg would work on Mac OS X on both Nvidia and ATI (remember it didn't work on Nvidia on the Mac)..&lt;br /&gt;this seems to be an example of how to use that GLSL geometry shader support in other programs (only this one works).. 
also this would be the final Cg release, and now I'm waiting for Cg 3.0 with domain and hull shaders and Fermi instructions..&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8553786559872430029-1393544936784313713?l=oscarbg.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://oscarbg.blogspot.com/feeds/1393544936784313713/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://oscarbg.blogspot.com/2010/03/gpu-computing-in-browser-and-other-news.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8553786559872430029/posts/default/1393544936784313713'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8553786559872430029/posts/default/1393544936784313713'/><link rel='alternate' type='text/html' href='http://oscarbg.blogspot.com/2010/03/gpu-computing-in-browser-and-other-news.html' title='GPU computing in a browser, and other news..'/><author><name>oscarbg</name><uri>http://www.blogger.com/profile/10172785372134961220</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8553786559872430029.post-5306176980567683020</id><published>2010-03-01T20:52:00.003+01:00</published><updated>2010-03-01T23:10:25.097+01:00</updated><title type='text'>New findings and questions..</title><content type='html'>Regarding DX IL:&lt;br /&gt;Well, I can only generate it with fxc, right?.. Also, it seems I can't feed DX IL back to DX via fxc, D3DCompile or CreateComputeShader? Seems not.. then what is it for, apart from IHVs using it as the base for their drivers.. 
so no IL modification and compiling from that? ATI SKA also reads it but doesn't generate code from it..&lt;br /&gt;Also, is the DX IL spec public or available anywhere?&lt;br /&gt;&lt;br /&gt;Regarding OGL-DX interop through OCL:&lt;br /&gt;with the new DX extensions for OCL (Nvidia has only published them, while AMD is shipping them), is it possible to&lt;br /&gt;use them for OGL-DX interop? (using clCreateContext with cl_context_properties holding both the OGL context and the D3D device)&lt;br /&gt;Will it work someday? With one vendor at least? The OGL extension says it could be possible..&lt;br /&gt;Also, what about wgl_dx_interop: is it going to be supported on Vista/7 with D3D9, 10 and 11..&lt;br /&gt;and introduced (at least as a spec txt) with the&amp;nbsp;Fermi GL extensions this month? &lt;br /&gt;&lt;br /&gt;Regarding OCL binaries:&lt;br /&gt;I found that AMD OpenCL (Stream SDK 2.01) supports binaries (both CPU and GPU targets), both getting them and building from them, although the AMD release notes list that as a missing feature..&lt;br /&gt;perhaps since 2.0.. &lt;br /&gt;A CPU-target binary should be cross-CPU, i.e. work with all CPUs (AMD, Intel) across generations.. even Atoms..&lt;br /&gt;Is there a flag for an SSE2-only requirement, dropping the current SSE3 one, so it generates only SSE2 code and runs even on a P4?..&lt;br /&gt;GPU support is good but worse than Nvidia's: the first binary chars are CLBC (CL byte code? similar to DXBC) and it contains device assembly code, so a binary built on my 5xxx will not work on a 4xxx; embedding AMD IL would be better, so it would work on all supported GPUs..&lt;br /&gt;Well, at least it seems that OCL generates AMD IL v2 on my 5xxx, and I don't know if that works on 4xxx..&lt;br /&gt;Also, it seems to be an ELF binary that holds other info besides the code, so you can't modify the code, as some headers record the code size etc..&lt;br /&gt;How do OCL GPU binaries compare to the ELF CAL binaries from calclAssembleObject?..&lt;br /&gt;Are the formats&amp;nbsp; going to be published, similar to the CAL ELF binaries.. 
well, at least they were some time ago, but I don't know if they are up to date, or even possible now that device assembly seems not to be possible, or at least not officially supported, on 5xxx..&lt;br /&gt;Also remember Nvidia's binaries hold PTX, so current OCL binaries should work on Fermi according to the Fermi compatibility guide..&lt;br /&gt;Also, straight PTX allows modifying the code.. possible, but the 1.5 spec is still not published (this month?) &lt;br /&gt;Anyway, I didn't mention it last time, but with decuda git now covering most GT200-arch instructions (SM 1.3), you could theoretically write a CUDA wrapper that intercepts the cubin and uses decuda to get PTX, which you then feed to the CUDA stack.. I don't know why Nvidia doesn't do that.. well, they must have their reasons regarding precision,&lt;br /&gt;mul24 not being a native instruction, etc..&lt;br /&gt;&lt;br /&gt;I have also ported/fixed Swan for Windows and added better OpenCL translation of CUDA kernels..&lt;br /&gt;Also trying to get CAL++&amp;nbsp; fixes for Windows..&lt;br /&gt;&lt;br /&gt;Today's news:&lt;br /&gt;*CeBIT: GeForce GTX 480 boxes show 1.5GB RAM and an 8-pin+6-pin power connector..&lt;br /&gt;ATI's competition will be a 5870 at 950MHz (5000MHz memory) and a 5970 with 4GB at 850MHz &lt;br /&gt;Also, a dual-Fermi card from Asus seems possible at Computex..&lt;br /&gt;*http://www.geosenseforwindows.com/ supplies a sensor driver for Windows for using the location APIs&lt;br /&gt;It comes with a Google Maps-enabled demo.. and works with the weather gadget..&lt;br /&gt;So I hope the Qt Location API in the Mobility&amp;nbsp;pack has Win7 Location API support..&lt;br /&gt;*CeBIT: Gigabyte shows a laptop with a docking station holding an Nvidia GTX2xx for laptops, and a multitouch netbook that converts into a tablet&lt;br /&gt;*&lt;a class="rsswidget" href="http://www.geeks3d.com/forums/index.php/topic,950.0.html" target="_blank" title="DirectWrite and Direct2D are Windows Vista and Windows 7 APIs for text and 2-D graphics that can be hardware accelerated. This is brand new code and there are sure to be bugs. 
If you'd like to help us test these changes and you're on a supported platfo... […]"&gt;Hardware accelerated graphics and text in Firefox&lt;/a&gt;: DirectWrite and Direct2D in Firefox nightlies for Windows 7&lt;br /&gt;*GLU3 soon.&lt;br /&gt;Old news:&lt;br /&gt;*Flash 10.1 beta 3 supports GPU decoding for smooth HD YouTube on netbooks with GMA500 (720p) and Broadcom Crystal HD (1080p), using the new GMA500 and Crystal HD drivers..&lt;br /&gt;As it's based on DXVA, it seems they now have proper DXVA support in the drivers.. Is it DXVA 1 or DXVA 2? I suppose 1, as it also works on XP, but maybe on Vista it uses DXVA 2.0?..&lt;br /&gt;*C3DL 2.0: now WebGL and beyond&lt;br /&gt;*OpenSceneGraph 2.9.6 supports OGL ES 1.x and 2.0 and GL 3.x, and iPhone support is coming soon..&lt;br /&gt;&lt;br /&gt;OCL tip:&lt;br /&gt;Images on today's hardware have caches, so you get most of the benefits of local memory without the difficulty. The caches are small (~32kB L1, ~768kB L2) so you need a lot of locality to make it work.&lt;br /&gt;Writing to images is very slow. 
Avoid it if you can.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8553786559872430029-5306176980567683020?l=oscarbg.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://oscarbg.blogspot.com/feeds/5306176980567683020/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://oscarbg.blogspot.com/2010/03/new-findings-and-questions.html#comment-form' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8553786559872430029/posts/default/5306176980567683020'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8553786559872430029/posts/default/5306176980567683020'/><link rel='alternate' type='text/html' href='http://oscarbg.blogspot.com/2010/03/new-findings-and-questions.html' title='New findings and questions..'/><author><name>oscarbg</name><uri>http://www.blogger.com/profile/10172785372134961220</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8553786559872430029.post-1494748099479910569</id><published>2010-02-26T20:30:00.005+01:00</published><updated>2010-02-26T21:15:47.835+01:00</updated><title type='text'>Reading Fermi CUDA stuff!</title><content type='html'>Fermi compatibility guide: &lt;br /&gt;CUBINs are only forward-compatible within the same major revision, so 1.x cubins only run on the Tesla arch, not on Fermi.. &lt;br /&gt;Nvidia 195 drivers and up support forcing JIT compilation of the PTX kernels embedded in an executable by setting CUDA_FORCE_PTX_JIT, which&amp;nbsp; is a way of checking a CUDA program for Fermi support. I.e. 
if the executable doesn't contain PTX code, it will fail..&lt;br /&gt;For optimizing for the Fermi arch (since CUDA 3.0), it is better to explicitly add code=compute_20 alongside code=compute_10 so it generates better code (?), and also add sm_10 and sm_20 for cubins.&lt;br /&gt;It seems that JIT-compiled cubins are cached by the driver, so they are generated once and survive reboots, crashes, etc.. (where are they stored?)..&lt;br /&gt;For the CUDA driver API, use nvcc -ptx and load the result using cuModuleLoadDataEx..&lt;br /&gt;Since CUDA 2.1, compiling with arch=sm_xx (the default (?))&amp;nbsp; expands to code=sm_xx (cubin) and code=compute_xx (PTX), so PTX code is embedded..&lt;br /&gt;&lt;br /&gt;Fermi Tuning Guide &amp;amp; Programming Guide 3.0:&lt;br /&gt;*New graphics interop API, with texture interop and DX11 support (page 37 for CUDART, page 63 for the driver API)&lt;br /&gt;*CUDART/driver API interop (page 72):&lt;br /&gt;-&amp;gt;allocate memory with either API &lt;br /&gt;-&amp;gt;if the context was created with the driver API, the first CUDA runtime call doesn't create a new one -&amp;gt; CUBLAS/CUFFT work from the CUDA driver API&lt;br /&gt;-&amp;gt;doesn't work in emulation mode or with the cuCtx{Push,Pop}Current functions..&lt;br /&gt;*Use concurrent kernels:&lt;br /&gt;check the concurrentKernels field returned by cudaGetDeviceProperties()&lt;br /&gt;and use multiple streams..&lt;br /&gt;&lt;br /&gt;(Up to 4 ONLY, and only from one context? I supposed up to 16, as that is the number of SMs, so it seems to be one kernel per Graphics Processing Cluster and not per SM.) Being limited to one context also rules out running multiple CUDA executables in parallel to extract more perf (unlike using CPU cores by running multiple single-threaded apps). 
This is a shame, as the hardware has fast context switching, but with badly coded CUDA programs running in parallel it only alleviates the switching overhead; they still don't run in parallel..&lt;br /&gt;I suppose these are software implementation issues that will be fixed in CUDA 3.x, or if not, in Fermi 2, so we can run as many kernels as there are SMs and any number of contexts in parallel, although possibly every SM can run only one context..&lt;br /&gt;*Arithmetic instruction throughput table (page 90):&lt;br /&gt;remember Tesla has 8 cores per SM, and good ops execute one warp in 4 clocks, so 8 instructions per clock per SM.&lt;br /&gt;Fermi has 32 cores per SM, and 16 SMs.&lt;br /&gt;Note that 32-bit integer on Fermi is as fast as floating point, so imad = mad in perf.. must verify.&lt;br /&gt;&amp;nbsp;*All global and shared memory accesses are done per warp, not per half-warp as before, so check that everything still goes well.&lt;br /&gt;Shared memory is expanded to 32 banks.&lt;br /&gt;Now, with the cache, global memory coalescing seems less of a requirement, and shared memory is also much better, as the only bank conflicts occur "when 2 or more threads request data in different words of the same bank", i.e. it can broadcast multiple words, etc.. So 8-bit, 16-bit, 32-bit (always fast), 64-bit (doubles) and even 96-bit (which was always fine)&amp;nbsp; sequential access is now as good as 32-bit, and so is 32-bit access with an 8-bit offset, for example.. well, I don't know whether an 8-bit offset for 32-bit words is bad or not, as that would require splitting every word across two banks, and I don't know if that is served jointly or not, but I presume it runs fast without bank conflicts!&lt;br /&gt;*Similar to DC, which requires the thread-group size at compile time via [numthreads(X, Y, Z)] (DispatchIndirect only covers the grid size?),&lt;br /&gt;and OpenCL having __attribute__((reqd_work_group_size(X, Y, Z)))&lt;br /&gt;&lt;blockquote&gt;The workgroup size that must be used as the local_work_size argument to clEnqueueNDRangeKernel. 
This allows the compiler to optimize the generated code appropriately for this kernel.&lt;/blockquote&gt;CUDA introduces __launch_bounds__,&amp;nbsp;appended to kernels to specify the max block size and the desired occupancy (min blocks per SM), so the compiler can optimize register usage (spilling)..&lt;br /&gt;&lt;br /&gt;*By default (i.e. compiling source programs without changes) the L1 cache size will be 16KB, so shared memory is increased 3x per SM (to 48KB).. The function for setting it is &lt;br /&gt;*We know global memory is cached by a hardware cache (L2), and that there is an L1 cache of at least 16KB.. I presumed this was used for caching global memory, but it turns out L1 caching of global memory can be disabled (at compile time; runtime would be better).. So what is L1 used for? Local memory for register spilling, for example, which can't be disabled..&lt;br /&gt;*A read-only location in global memory (like const variables in C++) accessed by all threads in a kernel is cached through the constant cache (it doesn't require the __constant__ address space)..&lt;br /&gt;*Don't use 24-bit integer ops, they are slow on Fermi; check at compile time with __CUDA_ARCH__ (device code only)&lt;br /&gt;but the guide says..&lt;br /&gt;*FP ops are higher precision, so results can differ from Tesla&lt;br /&gt;*As Fermi supports a 64-bit address space, passing -m64 to nvcc compiles both the host code and the device code to 64-bit, and 64-bit device code is slower than 32-bit.. So if you don't need a 64-bit address space but compile 64-bit host code (i.e. 
the GPUs your program will run on have less than 4GB, or the program needs less than that anyway), compile the kernel code separately from the host code..&lt;br /&gt;*CUDA C++:&lt;br /&gt;function overloading: f(int a) f(double a)&lt;br /&gt;default parameters: f(a,b=0);&lt;br /&gt;namespaces: namespace nv{ int a;} namespace ati{ int a;} nv::a=2; ati::a; using namespace nv; a=3; (nv::a)&lt;br /&gt;operator overloading: uchar4 operator+()&amp;nbsp; uchar4 a,b,c; c=a+b;&lt;br /&gt;implicit, explicit and specialized templates: f&amp;lt;int&amp;gt;(x) or int x; f(x) and f&amp;lt;int&amp;gt;(x){return(2);} f&amp;lt;double&amp;gt; ret(3);&lt;br /&gt;Fermi stuff:&lt;br /&gt;classes and functors.&lt;br /&gt;&lt;br /&gt;It seems support for virtual functions and function pointers is still missing.. but coming..&lt;br /&gt;Recursion and memory allocation inside kernels are still lacking and coming much later (?)..&lt;br /&gt;Remember all of that is supported in hardware..&lt;br /&gt;&lt;br /&gt;Look for the new Fermi instructions in B.5, B.6 and B.11 (pages 103, 104)&lt;br /&gt;I don't know if B.12, __prof_trigger, is new; it exposes 8 counters that are incremented per warp each time and can be queried by the profiler.. it would be good if you could read them with another instruction in the kernel? Must think..&lt;br /&gt;B.14 documents launch bounds (page 112).&lt;br /&gt;Appendix G has the architecture feature chart (G.1)&lt;br /&gt;LACKING documentation in the guide:&lt;br /&gt;*Launching of 3D grids! 
(well, on page 102, B.4, you find gridDim is of type dim3, but on page 8, section 2.2, you see grids of blocks are 1D or 2D, and in B.13 on page 111 it says the grid is dim3 but with .z=1) (DC 5.0 has them, and the OpenCL model (the API) supports that too)&lt;br /&gt;*Surface functions (I hope they are not left for CUDA 3.1 or later, as Fermi supports them, and even Tesla does, since they are used for RWTexture in DC and for image writes in the OpenCL driver)&lt;br /&gt;*Info that Fermi allows simultaneous D2H and H2D transfers via the async functions (check concurrent bandwidth test 1.1)&lt;br /&gt;&lt;br /&gt;Also some things I was unaware of:&lt;br /&gt;the use of __restrict__ on CUDA pointers, and some info about CUDA, SLI and D3D graphics interop..&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8553786559872430029-1494748099479910569?l=oscarbg.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://oscarbg.blogspot.com/feeds/1494748099479910569/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://oscarbg.blogspot.com/2010/02/reading-fermi-cuda-stuff.html#comment-form' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8553786559872430029/posts/default/1494748099479910569'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8553786559872430029/posts/default/1494748099479910569'/><link rel='alternate' type='text/html' href='http://oscarbg.blogspot.com/2010/02/reading-fermi-cuda-stuff.html' title='Reading Fermi CUDA stuff!'/><author><name>oscarbg</name><uri>http://www.blogger.com/profile/10172785372134961220</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8553786559872430029.post-4560666633177283913</id><published>2010-02-25T16:47:00.000+01:00</published><updated>2010-02-25T16:47:47.310+01:00</updated><title type='text'>Questions about OpenCL AMD d3d9 interop!</title><content type='html'>Which is the correct way? Is the API stable? &lt;br /&gt;See http://forums.amd.com/devforum/messageview.cfm?catid=390&amp;amp;threadid=128467&amp;amp;enterthread=y&lt;br /&gt;&lt;div class="MessageText_Container"&gt;&lt;blockquote&gt;Hi, I have coded an example trying to see how D3D interop works..&lt;br /&gt;I see up to three APIs. One is&lt;br /&gt;&lt;span id="ctl00_PlaceHolderMain_KBDisplayForm1_lblTitleID"&gt;KB91 - Additional Header File Required For Preview Feature: ATI Stream SDK v2.01 Support for OpenCL™ / Microsoft® DirectX® 9 &amp;amp; 10 Interoperability&lt;/span&gt;&lt;br /&gt;which shows clEnqueueReleaseExternalObjects in cl_amd.hpp, similar to clEnqueueReleaseGLObjects for GL interop..&lt;br /&gt;this seems to be the correct way according to the KB, but I can't find clEnqueueReleaseExternalObjects with clGetExtensionFunctionAddress, so I also looked&lt;br /&gt;in cl_d3d9.h, where I see&lt;br /&gt;clEnqueueAcquireD3D9ObjectsKHR, which is another way..&lt;br /&gt;this one is found by clGetExtensionFunctionAddress&lt;br /&gt;Then, for buffer interop, we similarly found two functions:&lt;br /&gt;clCreateFromD3D9BufferKHR&lt;br /&gt;and, further below,&lt;br /&gt;"//&lt;br /&gt;// Legacy AMD CL-D3D9 interop extension&lt;br /&gt;//"&lt;br /&gt;with the&lt;br /&gt;clCreateFromD3D9Buffer&lt;br /&gt;function.&lt;br /&gt;With clGetExtensionFunctionAddress I found clCreateFromD3D9BufferKHR, which I think is the correct one.&lt;br /&gt;I'm also asking about the correct texture interop (yes, I know image support is required, but I'm talking about having correct/stable source code here, not testing)&lt;br /&gt;Must I use this 
clCreateFromD3D9TextureKHR?&lt;br /&gt;For testing, I create the context with&lt;br /&gt;&amp;nbsp;cl_context_properties cps[6] = &lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; {&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;  CL_CONTEXT_PLATFORM, &lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; (cl_context_properties)platform, &lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;  &amp;nbsp;&amp;nbsp;&amp;nbsp; CL_CONTEXT_D3D9_DEVICE,&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp;  (cl_context_properties)g_pd3dDevice,&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 0&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; };&lt;br /&gt;with a D3D device already created, and then use&lt;br /&gt;&amp;nbsp;CL_API_ENTRY cl_mem (CL_API_CALL&lt;br /&gt;*myclCreateFromD3D9TextureKHR)(&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; cl_context&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; /* context */,&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; cl_mem_flags&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; /*  flags */,&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; IDirect3DTexture9 * /* texture */,&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;  HANDLE&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; /* shared_handle */,&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; UINT&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; /*  miplevel */,&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; cl_int *&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; /* errcode_ret */);&lt;br /&gt;myclCreateFromD3D9TextureKHR=(P2)clGetExtensionFunctionAddress("clCreateFromD3D9TextureKHR");&lt;br /&gt;since clCreateFromD3D9TextureKHR is 
defined already in the header.&lt;br /&gt;Then I call&lt;br /&gt;myclCreateFromD3D9TextureKHR(context,CL_MEM_READ_WRITE,g_inputTex,g_handle,0,&amp;amp;status);&lt;br /&gt;where&amp;nbsp;g_inputTex has been created and g_handle is the last parameter (pSharedHandle) returned by CreateTexture. Is this correct?&lt;br /&gt;This way I get a runtime error after enabling images with "set GPU_IMAGES_SUPPORT=1"&lt;br /&gt;Setting the shared handle parameter to NULL,&lt;br /&gt;myclCreateFromD3D9TextureKHR(context,CL_MEM_READ_WRITE,g_inputTex,NULL,0,&amp;amp;status);&lt;br /&gt;returns CL_INVALID_D3D_OBJECT&lt;br /&gt;What's the correct API? Is it stable?&lt;br /&gt;Is there any sample showing interop?&lt;br /&gt;Is D3D11 interop coming, given that Nvidia has a D3D11 extension published in the Khronos registry?..&lt;/blockquote&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8553786559872430029-4560666633177283913?l=oscarbg.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://oscarbg.blogspot.com/feeds/4560666633177283913/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://oscarbg.blogspot.com/2010/02/questions-about-opencl-amd-d3d9-interop.html#comment-form' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8553786559872430029/posts/default/4560666633177283913'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8553786559872430029/posts/default/4560666633177283913'/><link rel='alternate' type='text/html' href='http://oscarbg.blogspot.com/2010/02/questions-about-opencl-amd-d3d9-interop.html' title='Questions about OpenCL AMD d3d9 interop!'/><author><name>oscarbg</name><uri>http://www.blogger.com/profile/10172785372134961220</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' 
height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8553786559872430029.post-798451424547454547</id><published>2010-02-25T16:46:00.000+01:00</published><updated>2010-02-25T16:46:16.058+01:00</updated><title type='text'>News 25/2!</title><content type='html'>*GPU-Z 0.3.9 fixes OpenCL reporting on ATI!&lt;br /&gt;*The Wind Top desktop has a 24-inch 3D 120Hz Full HD multitouch monitor! It seems to be the first!&lt;br /&gt;Together with the Dell U2711, you have all the things I want from monitors in just two monitors&lt;br /&gt;(well, the Dell only adds 10-bit color, 27 inches and 2560x1440 resolution)!&lt;br /&gt;&lt;br /&gt;&lt;blockquote&gt;The Wind Top desktop HD (1080p resolution) displays range from 19- to 24-inches. At the top of the list are the Wind Tops AE2420 and AE2280, 22- and 24-inch multi-touch displays respectively, equipped with processors up to Intel Core i7. The 24-inch model features a 120Hz LED display that pairs with 3D shutter glasses. (That 3D trend isn’t dying off so fast.)&lt;/blockquote&gt;&lt;br /&gt;Nexus and C#&lt;br /&gt;&lt;blockquote&gt;Yes, you can use Parallel Nsight/Nexus to debug CUDA C kernels written in C# or other CPU languages, but Nsight doesn't directly support the C# project type yet.&lt;br /&gt;So to use CUDA.NET with Nsight, you'll need to create a dummy C++ project whose 'command' in your Nexus User Properties points to your C# executable.&lt;br /&gt;Then do Nexus Menu &amp;gt; Start CUDA Debugging in Visual Studio, and you should be off and running. AFAIK, you'll still need to program the actual GPU code in CUDA C.&lt;/blockquote&gt;&lt;br /&gt;Pages with GPU computing stuff!&lt;br /&gt;See the new one? http://developer.nvidia.com/object/gpucomputing.html&lt;br /&gt;It has 3 guides with Fermi stuff!&lt;br /&gt;&lt;blockquote&gt;The programming guide didn't mention that GF100 is capable of simultaneous transfers of cuMemcpyDtoHAsync and cuMemcpyHtoDAsync. 
I've added this to my good ol' concurrent bandwidth  test and will be updating that in the near future. &lt;/blockquote&gt;&lt;blockquote&gt;Search for concurrent bandwidth test 1.1 for Fermi!&lt;br /&gt;&lt;br /&gt;Still missing is the CUDA Developer Guide for Optimus Platforms.&lt;br /&gt;&lt;br /&gt;__global__ function  parameters are passed to the device:&lt;br /&gt;* via shared memory and are  limited to 256 bytes on devices of compute&lt;br /&gt;capability 1.x,&lt;br /&gt;*  via constant memory and are limited to 4 KB on devices of compute  capability&lt;br /&gt;2.0.&lt;/blockquote&gt;&lt;br /&gt;Others:&lt;br /&gt;http://www.directx11tutorials.com/&lt;br /&gt;[JumpToDX11-11] DirectCompute&lt;br /&gt;http://vsts2010.net/220&lt;br /&gt;http://www.opengpu.org/bbs/archiver/&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Ivan Golubev's blog is the one to follow for crypto and integer ops on GPUs!&lt;br /&gt;http://www.golubev.com/blog/ &lt;br /&gt;He says he has added bitalign (AMD IL v2) for MD5 and SHA1 cracking on 5xxx GPUs, and he has a post estimating the performance of even Fermi GPUs..&lt;br /&gt;Search for ighashgpu 0.70, which has this support. To test MD5 and SHA1 performance:&lt;br /&gt;ighashgpu.exe /h:96b13dbbc9f3bc569ddad9745f64b9cdb43ea9ae /t:sha1 /c:sd /max:7&lt;br /&gt;ighashgpu.exe /h:cbe1d6d5800ec1e03a5f2a64882a0d41 /t:md5 /c:sd /max:7 &lt;br /&gt;In a post from around the end of January you can also find the SSE code used in his program..&lt;br /&gt;VS CUDA:&lt;br /&gt;&lt;blockquote&gt;You should be able to implement bit rotations using the bit-align  instruction introduced with Direct3D 11 and supported on both Fermi and  Cypress (computes ((a:b) &amp;gt;&amp;gt; c) &amp;amp; 0xffffffff, where a:b is the  concatenation of two 32-bit operands).&lt;br /&gt;This adds nothing to the  "NVIDIA vs. 
AMD" debate, but should provide a nice further improvement  compared to the previous generation.&lt;br /&gt;&lt;br /&gt;Maybe some other  tricks are possible...&lt;br /&gt;For instance both G80 and Fermi support  free binary negation of operands to logic instructions (allowing NOR,  NAND, NXOR, ANDN...), and Fermi supports a left shift followed by an  addition as a single instruction.&lt;/blockquote&gt;&lt;br /&gt;Edit: also, there is  always the MAD24 instruction for computations such as 5*i+1 (much faster  than adds). &lt;br /&gt;&lt;br /&gt;Benchmark Reviews has NVIDIA nTeresting: 22 February 2010!&lt;br /&gt;&lt;br /&gt;Limitations in OpenCL&lt;br /&gt;1. Can I include C inline assembly code in my OpenCL code?&lt;br /&gt;2. Does OpenCL support addition and subtraction with carry?&lt;br /&gt;Current limitations on AMD as well:&lt;br /&gt;No pinned memory!&lt;br /&gt;It uses one UAV for all allocations, so at most 256 MB can be used!&lt;br /&gt;&lt;br /&gt;Nvidia has neither of these two limitations, not even through DirectCompute!&lt;br /&gt;Regarding the two OpenCL limitations: the first is on the CAL++ author's TODO list, and the second is an assembly instruction on 5xxx, so once it is exposed in AMD IL the author can add it!&lt;br /&gt;Also, for Nvidia there is an ADDC-enabled CUDA compiler referenced in previous posts, and&lt;br /&gt;inline assembly is unofficially supported in CUDA!&lt;br /&gt;In Nvidia OpenCL you can modify the PTX code on the fly, add addc and feed it back!&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;How to wait for kernel completion without CPU usage (from Golubev's blog):&lt;br /&gt;CUDA: create the context with CU_CTX_BLOCKING_SYNC.&lt;br /&gt;CAL: specifically, there is an undocumented feature, calCtxWaitForEvent.&lt;br /&gt;ATI has played a dirty trick again: GPU kernels compiled with Catalyst 9.12 are 10% slower on RV8x0 and somewhere around 2-3 times slower on RV7x0. 
This happens because the ATI CAL compiler now aggressively unrolls absolutely everything, so the kernel grows to a few hundred KB, no longer fits in the cache... and performance goes down the drain.&lt;br /&gt;&lt;br /&gt;OpenCL for FreeBASIC: http://shiny3d.de/libs/fbOpenCL.zip&lt;br /&gt;Remember there are also bindings for FreePascal and Delphi! &lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;5 Questions -- Implementing a bunch of OpenCL tools&lt;br /&gt;&lt;br /&gt;Texture sharing&lt;br /&gt;I think this is what you must use for OpenCL D3D interop..&lt;br /&gt;&lt;br /&gt;http://msdn.microsoft.com/en-us/library/ee418929%28VS.85%29.aspx&lt;br /&gt;&lt;br /&gt;&lt;blockquote&gt;ID3D10Device::OpenSharedResource&lt;br /&gt;To share a resource between a Direct3D 9 device and a Direct3D 10 device the texture must have been created using the pSharedHandle argument of CreateTexture. The shared Direct3D 9 handle is then passed to OpenSharedResource in the hResource argument.&lt;br /&gt;&lt;br /&gt;The following code illustrates the method calls involved.&lt;br /&gt;&lt;br /&gt;sharedHandle = NULL; // must be set to NULL to create, can use a valid handle here to open in D3D9 &lt;br /&gt;pDevice9-&amp;gt;CreateTexture(..., pTex2D_9, &amp;amp;sharedHandle); &lt;br /&gt;... 
&lt;br /&gt;pDevice10-&amp;gt;OpenSharedResource(sharedHandle, __uuidof(ID3D10Resource), (void**)(&amp;amp;tempResource10)); &lt;br /&gt;tempResource10-&amp;gt;QueryInterface(__uuidof(ID3D10Texture2D), (void**)(&amp;amp;pTex2D_10)); &lt;br /&gt;tempResource10-&amp;gt;Release(); &lt;br /&gt;// now use pTex2D_10 with pDevice10&amp;nbsp;&amp;nbsp; &lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;br /&gt;&lt;br /&gt;Textures being shared from D3D9 to D3D10 have the following restrictions.&lt;br /&gt;&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; * Textures must be 2D&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; * Only 1 mip level is allowed&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; * Texture must have default usage&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; * Texture must be write only&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; * MSAA textures are not allowed&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; * Bind flags must have SHADER_RESOURCE and RENDER_TARGET set&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; * Only R10G10B10A2_UNORM, R16G16B16A16_FLOAT and R8G8B8A8_UNORM formats are allowed&lt;/blockquote&gt;&lt;br /&gt;Interesting post: http://software.intel.com/en-us/articles/copying-accelerated-video-decode-frame-buffers/ I think VLC 1.1 is using that approach, and it seems MPC Home Cinema does as well!&lt;br /&gt;&lt;br /&gt;Final round of Tesla Compute Cluster driver testing:&lt;br /&gt;*CUDA H264 GPU video encoding works through MediaCoder&lt;br /&gt;*vreveal works (clean video, sharpness)&lt;br /&gt;issues:&lt;br /&gt;stabilization: uniform gray colors&lt;br /&gt;contrast: I get pink colors&lt;br /&gt;*Badaboom fails with:&lt;br /&gt;.GPU 0: ATI Radeon HD 5800 Series&lt;br /&gt;FATAL:There is no GPU device supporting CUDA.&lt;br /&gt;(Although TCC CUDA is supported there)&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Currently the global memory available is the value returned by CL_DEVICE_GLOBAL_MEM_SIZE in the device query. 
Full physical memory is  expected to be available in one of the upcoming releases.&lt;br /&gt;Global  buffer is 128bit aligned addresses, UAV's are byte aligned and on 5XXX  series of cards you can have up to 9 UAV's per kernel. Also through  UAV's you can do byte addressable writes with the UAV arena and also  atomic operations. None of these can be done on the global buffer path.&lt;br /&gt;&lt;br /&gt;it is easier to burst using global memory as it is an implicit 128 bit write versus an implicit 32bit write on UAV.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8553786559872430029-798451424547454547?l=oscarbg.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://oscarbg.blogspot.com/feeds/798451424547454547/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://oscarbg.blogspot.com/2010/02/news-252.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8553786559872430029/posts/default/798451424547454547'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8553786559872430029/posts/default/798451424547454547'/><link rel='alternate' type='text/html' href='http://oscarbg.blogspot.com/2010/02/news-252.html' title='News 25/2!'/><author><name>oscarbg</name><uri>http://www.blogger.com/profile/10172785372134961220</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8553786559872430029.post-1121247492534756594</id><published>2010-02-25T16:39:00.002+01:00</published><updated>2010-02-25T16:41:28.364+01:00</updated><title type='text'>3 new tools!</title><content type='html'>3 New GPU tools!&lt;br /&gt;&lt;br /&gt;&lt;a href="http://www.multiscalelab.org/swan"&gt;Swan: A simple tool for porting CUDA kernels to OpenCL&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;&lt;blockquote&gt;&lt;h3 id="head-68a89c8b8e36d3572e980377dc30a649d0639618"&gt;What is it?&lt;/h3&gt;&lt;span class="anchor" id="line-16"&gt;&lt;/span&gt;&lt;div class="line874"&gt;Swan is a  small tool that aids the reversible conversion of existing CUDA  codebases to OpenCL. It does several useful things: &lt;span class="anchor" id="line-17"&gt;&lt;/span&gt;&lt;/div&gt;&lt;ul&gt;&lt;li&gt;Translates CUDA kernel source-code to  OpenCL. &lt;span class="anchor" id="line-18"&gt;&lt;/span&gt;&lt;/li&gt;&lt;li&gt;Provides a  common API that abstracts both CUDA and OpenCL runtimes. &lt;span class="anchor" id="line-19"&gt;&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;div class="line862"&gt;Preserves  the convenience of the CUDA &lt;em&gt;&amp;lt;&amp;lt;&amp;lt; grid, block &amp;gt;&amp;gt;&amp;gt;&lt;/em&gt;  kernel launch syntax by generating C source-code for kernel entry-point  functions. &lt;span class="anchor" id="line-20"&gt;&lt;/span&gt;&lt;span class="anchor" id="line-21"&gt;&lt;/span&gt;&lt;/div&gt;&lt;/li&gt;&lt;/ul&gt;&lt;div class="line867"&gt; &lt;/div&gt;&lt;h3 id="head-a38e2f29cdf0e2a4ebad17a6aca0fbcde62eadfe"&gt;Why might you  want it?&lt;/h3&gt;&lt;span class="anchor" id="line-22"&gt;&lt;/span&gt;&lt;span class="anchor" id="line-23"&gt;&lt;/span&gt;&lt;div class="line874"&gt;Possible uses include: &lt;span class="anchor" id="line-24"&gt;&lt;/span&gt;&lt;/div&gt;&lt;ul&gt;&lt;li&gt;Evaluating OpenCL  performance of an existing CUDA code. 
&lt;span class="anchor" id="line-25"&gt;&lt;/span&gt;&lt;/li&gt;&lt;li&gt;Maintaining  a dual-target OpenCL and CUDA code. &lt;span class="anchor" id="line-26"&gt;&lt;/span&gt;&lt;/li&gt;&lt;li&gt;Reducing  dependence on NVCC when compiling host code. &lt;span class="anchor" id="line-27"&gt;&lt;/span&gt;&lt;/li&gt;&lt;li&gt;Support multiple CUDA compute capabilities  in a single binary &lt;span class="anchor" id="line-28"&gt;&lt;/span&gt;&lt;span class="anchor" id="line-29"&gt;&lt;/span&gt;&lt;/li&gt;&lt;/ul&gt;&lt;div class="line867"&gt; &lt;/div&gt;&lt;h3 id="head-4f149047ea592c788e4d5008d4ed4e4e890d736a"&gt;Limitations&lt;/h3&gt;&lt;span class="anchor" id="line-30"&gt;&lt;/span&gt;&lt;span class="anchor" id="line-31"&gt;&lt;/span&gt;&lt;div class="line862"&gt;It's not a drop-in replacement  for &lt;strong&gt;nvcc&lt;/strong&gt;. Host code needs to have all kernel  invocations and CUDA API calls re-written. &lt;span class="anchor" id="line-32"&gt;&lt;/span&gt;&lt;span class="anchor" id="line-33"&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="line874"&gt;Swan does not support a few things. In particular: &lt;span class="anchor" id="line-34"&gt;&lt;/span&gt;&lt;/div&gt;&lt;ul&gt;&lt;li&gt;CUDA C++ templating in  kernel code. &lt;span class="anchor" id="line-35"&gt;&lt;/span&gt;&lt;/li&gt;&lt;li&gt;OpenCL  Images/Samplers (analogous to Textures). &lt;span class="anchor" id="line-36"&gt;&lt;/span&gt;&lt;/li&gt;&lt;li&gt;Multiple device management in a single  process. &lt;span class="anchor" id="line-37"&gt;&lt;/span&gt;&lt;/li&gt;&lt;li&gt;Compiling  kernels for the CPU. &lt;span class="anchor" id="line-38"&gt;&lt;/span&gt;&lt;/li&gt;&lt;li&gt;CUDA  device-emulation mode. &lt;span class="anchor" id="line-39"&gt;&lt;/span&gt;&lt;span class="anchor" id="line-40"&gt;&lt;/span&gt;&lt;/li&gt;&lt;/ul&gt;&lt;/blockquote&gt;Furthermore,  it's a work in progress. 
It works for our code, but no promises it will for yours.&lt;br /&gt;&lt;br /&gt;Cloo 0.6.2&lt;br /&gt;&lt;blockquote&gt;A new version of Cloo is out.&lt;br /&gt;It  introduces a tracking mechanism for kernel arguments (sampler or memory  objects) which prevents them from being claimed by the GC in case the  user application doesn't refer to them in later code. This behaviour has  been backported to the existing Set*Argument methods since it is safer.  You can override auto-tracking using the newly added overloads.&lt;br /&gt;A  critical bug affecting image read operations together with some other  minor glitches were fixed.&lt;br /&gt;As for breaking changes rename any  ComputeImage.PixelSize to ElementSize and you're good to go.&lt;br /&gt;&lt;br /&gt;Clootils    have been improved, too. Now, you can take advantage of some bells and  whistles which control the program building behavior.&lt;br /&gt;&lt;br /&gt;&lt;/blockquote&gt;&lt;a href="http://sourceforge.net/projects/calpp/"&gt;CAL++ v. 0.8 release&lt;/a&gt;&lt;br /&gt;&lt;a href="http://forums.amd.com/devforum/messageview.cfm?catid=390&amp;amp;threadid=127963&amp;amp;enterthread=y"&gt;announcement&lt;/a&gt;&lt;br /&gt;&lt;blockquote&gt;C++ to IL  generator/compiler with C++ bindings for CAL&lt;br /&gt;http://sourceforge.net/projects/calpp/&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;The  CAL++ library has been just released. Project homepage is located here  http://sourceforge.net/projects/calpp/ .&lt;br /&gt;&lt;br /&gt;The project  consist of two main components. One is C++ binding for CAL ( it's really  much easier to develop new CAL applications using bindings ) and second  is C++ to IL generator/compiler.&lt;br /&gt;&lt;br /&gt;The C++  generator/compiler has syntax very similar to OpenCL ( with few  necessary exceptions ). Also it supports all devices which can run CAL  kernels ( finally OpenCL like language for 3xxx ).&lt;br /&gt;&lt;br /&gt;It  has some advantages over OpenCL compiler. 
To name a few&lt;br /&gt;&lt;br /&gt;-  it's much closer to CAL - it allows to write code which is almost as  good ( or as good ) as handwritten IL. Look at the matrix multiplication  example - it has almost the same ISA as prunedtree original code ( it  differs only where I've added some changes ).&lt;br /&gt;&lt;br /&gt;-  Advantage of using C++. I really wouldn't like to use double-double ( or  quad float ) technique without C++.&lt;br /&gt;&lt;br /&gt;- Powerful  control over loop unrolling and code selection ( at IL compilation time  ). The C++ language acts like a preprocessor.&lt;br /&gt;&lt;br /&gt;- It has LDS  support for 4xxx, doubles, etc. And if something is missing it can be  added really easy.&lt;br /&gt;&lt;br /&gt;But as always there are some  pitfalls to this approach&lt;br /&gt;&lt;br /&gt;- it isn't OpenCL . Having a standard is always useful.&lt;br /&gt;&lt;br /&gt;- Only partial support for  structs ( it can be much improved but never as good as OpenCL ).&lt;br /&gt;&lt;br /&gt;-  CAL++ is much closer to IL and some more knowledge about IL is required  to achieve full potential ( hmmm I think this is also the case with  OpenCL ).&lt;br /&gt;&lt;br /&gt;- optimization is only performed by CAL IL  compiler ( which isn't that good ).&lt;br /&gt;&lt;br /&gt;With the library  there are some examples included. I think the fastest matrix  multiplication might be a small gem here .&lt;br /&gt;&lt;br /&gt;I hope that  CAL++ will be useful to someone .&lt;/blockquote&gt;&lt;br /&gt;Doesn't compile under Windows MSVC 2008!&lt;br /&gt;Use 0.8a for GCC 4.4!&lt;br /&gt;QA:&lt;br /&gt;&lt;blockquote&gt;1.  Have you tested on Windows?&lt;br /&gt;&lt;br /&gt;No. But with the exception  to C++ compiler problems it should work ( there is nothing platform  specific in the code ).&lt;br /&gt;&lt;br /&gt;2. Also, have you added 24-bit integer instructions? They are useful for getting the thread id fast, for example..&lt;br /&gt;&lt;br /&gt;CAL++ is converting code to IL. 
So 24-bit operations need to be available in CAL IL. And unfortunately it isn't  the case.&lt;br /&gt;&lt;br /&gt;I'm wondering how hard it would be to add GDS as well?&lt;br /&gt;&lt;br /&gt;Using  anything that isn't available in IL is really hard ( or close to  impossible ).&lt;br /&gt;&lt;br /&gt;When CAL supported ISA assembler  compilation ( 3xxx family ) you could generate ISA ASM. I would call it  really, really hard as you need to be aware of many architecture limits (  and those informations simply aren't available ).&lt;br /&gt;&lt;br /&gt;But  for 4xxx, 5xxx family to use ISA requires to write your own driver stack  ( as CAL doesn't support asm any more ) - I think it's simply  impossible at the moment.&lt;br /&gt;&lt;br /&gt;" It cannot be compiled at  the time as it depends on some CAL Vector/Matrix classes which aren't  available for public use." Is this AMD NDA code or your own code?&lt;br /&gt;&lt;br /&gt;It's  my own code, but it's far from being ready. For the vector quantization example it can be easily replaced by Image2D with simple functions to  fill data.&lt;br /&gt;&lt;br /&gt;Are you using any magic in it, or can I code some wrappers?..&lt;br /&gt;&lt;br /&gt;The Matrix/Vector code is using a  little bit of magic . Any vector/matrix expression ( like vec_a =  3*vec_b + vec_c + log(vec_d) ) is converted to proper kernel ( trick  with using templates for delayed execution ) and executed on gpu. It  saves a lot of time with writing custom kernels .&lt;/blockquote&gt;&lt;br /&gt;From the TODO:&lt;br /&gt;&lt;blockquote&gt;1. Add UAVs support,logical operations and more  double math functions and as_typen conversion&lt;br /&gt;&lt;br /&gt;2. Add  il_asm function ( usage example: il_asm("mov %1,%2", v1, v2); would  generate "mov r1,r2" )&lt;br /&gt;&lt;br /&gt;3. Add documentation and more  examples&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;4. 
Easier to use local cal arrays,  and more user friendly code for IL creation functions&lt;/blockquote&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8553786559872430029-1121247492534756594?l=oscarbg.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://oscarbg.blogspot.com/feeds/1121247492534756594/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://oscarbg.blogspot.com/2010/02/3-new-tools.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8553786559872430029/posts/default/1121247492534756594'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8553786559872430029/posts/default/1121247492534756594'/><link rel='alternate' type='text/html' href='http://oscarbg.blogspot.com/2010/02/3-new-tools.html' title='3 new tools!'/><author><name>oscarbg</name><uri>http://www.blogger.com/profile/10172785372134961220</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8553786559872430029.post-1634037226863076308</id><published>2010-02-25T16:09:00.000+01:00</published><updated>2010-02-25T16:09:28.968+01:00</updated><title type='text'>Ideas for porting algos to GPU:AVX SSE and MMX ports!</title><content type='html'>Hi! This may seem crazy, but some research from 1996 can be useful for seeing which algorithms Intel thought&lt;br /&gt;would benefit most from SSE, MMX and AVX!&lt;br /&gt;&lt;br /&gt;For AVX there is an AVX site with a lot of posts:&lt;br /&gt;some new ones from January show general CRC speedups using PCLMULQDQ on Westmere!&lt;br /&gt;There are also some reported AVX numbers using Sandy 
Bridge silicon!&lt;br /&gt;For SSE see:&lt;br /&gt;http://www.datasheetarchive.com/datasheet-pdf/1070.html&lt;br /&gt;especially Intel reports 802-833, where you can find&lt;br /&gt;"Increasing the Accuracy of the Results from the Reciprocal and Reciprocal Square Root Instructions using the Newton-Raphson Method",&lt;br /&gt;which in fact is referenced in the GPU Gems 3 n-body chapter.&lt;br /&gt;&lt;br /&gt;MMX manuals here:&lt;br /&gt;http://www.tommesani.com/IntelAppNotes.html&lt;br /&gt;http://software.intel.com/en-us/articles/mmxt-technology-manuals-and-application-notes/&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8553786559872430029-1634037226863076308?l=oscarbg.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://oscarbg.blogspot.com/feeds/1634037226863076308/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://oscarbg.blogspot.com/2010/02/ideas-for-porting-algos-to-gpuavx-sse.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8553786559872430029/posts/default/1634037226863076308'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8553786559872430029/posts/default/1634037226863076308'/><link rel='alternate' type='text/html' href='http://oscarbg.blogspot.com/2010/02/ideas-for-porting-algos-to-gpuavx-sse.html' title='Ideas for porting algos to GPU:AVX SSE and MMX ports!'/><author><name>oscarbg</name><uri>http://www.blogger.com/profile/10172785372134961220</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8553786559872430029.post-164554294122829072</id><published>2010-02-25T16:08:00.001+01:00</published><updated>2010-02-25T16:08:56.118+01:00</updated><title type='text'>About ATI and Nvidia drivers (OCL included)!</title><content type='html'>Hi!&lt;br /&gt;I have been investigating AMD and Nvidia drivers..&lt;br /&gt;&lt;br /&gt;&lt;div style="margin: 0px;"&gt;In 10.3 there is 3D hook support for 120 Hz monitors, but is it enabled for D3D9, D3D10 or D3D11? Or all of them? What's the API? And what about fullscreen-only support, like 3D Vision's D3D support, versus windowed support? 3D Vision windowed is coming soon..&lt;/div&gt;&lt;div style="margin: 0px;"&gt;I have also checked 10.3 and it has bugs in gDEBugger 5.5 (8.68 no perf counters found), GDI (still slow on Aero in W7) and Heaven OGL (half-screen issues as with early drivers, but I think hotfix 9.12 worked fine..)&lt;/div&gt;&lt;div&gt;Should NVAPI with GPU usage APIs be coming with the 200 series?&lt;/div&gt;&lt;br /&gt;First, for AMD you can use some components without installing the complete driver, for example&lt;br /&gt;the OpenGL drivers and the ATI CAL drivers, I think..&lt;br /&gt;Note this is a no-go on Nvidia, where every component has to be from the same version..&lt;br /&gt;Well, I don't remember whether OpenCL is the same, since the Tesla Compute Cluster driver doesn't include nvcompiler.dll,&lt;br /&gt;but I have found you can enable OpenCL with the Tesla Compute Cluster driver: just use opencl.dll from AMD SDK 2.01. It didn't work at first because I had opencl.dll from an older Nvidia driver; installing the TCC driver over it didn't remove it, and the AMD installer doesn't overwrite it (but it should!)..&lt;br /&gt;To fix it, delete opencl.dll and reinstall, or:&lt;br /&gt;go to C:\ATI\Support\streamsdk_2-0-1_win764\Packages\Apps, drop the msi in the dev dir onto&lt;br /&gt;http://dl.dropbox.com/u/1416327/extractmsi.bat and look for opencl.dll in temp3..&lt;br /&gt;Now I have 
learned some things: OpenCL works with TCC, and it is clever enough to disable cl_khr_gl_sharing (or is it some weird issue that just looks superintelligent?); it also introduces D3D9 interop, reports the ICD extension, and the extension list reports some unroll extensions I didn't know were in the first 195 drivers.. Also&lt;br /&gt;note that D3D9 interop won't work on TCC, so if it were really intelligent it would be disabled on TCC too..&lt;br /&gt;Finally, I have found that the AMD AES sample now works, as do some other demos that didn't work at first (Mandelbrot, I think).. I don't know if it's due to 2.01 source improvements, Nvidia improvements or both!&lt;br /&gt;Note that Nvidia OpenCL is super good now: some weird issues are fixed (AMD AES sample, volume 3D demo fast on W7.. still to check: are functions with no parameters, as in HelloCL, and the Apple FFT (and ocean) demo working?), it is ICD compliant (now AMD+Nvidia works with the 2.01 opencl.dll), and it has D3D9 interop (I have not checked whether it works). I think almost all of these are 196 OpenCL improvements.. What's left for OpenCL is D3D10 and half support, and for Fermi 3d_image_writes, D3D11 interop and better perf?&lt;br /&gt;Also, does OpenCL D3D interop enable GL-DX interop?&lt;br /&gt;I have also tested whether CUVID works, and I can't get the OGL CUVID example working. I selected preferCUDA (not the default preferVP (video processor), as that returns errors on init, same as preferDXVA (but that didn't work with normal drivers)); with that I had to change cuGLCtxCreate to cuCtxCreate and remove all cuGL functions.. it should work.. but it returns an invalid-context error in the decode-picture handler.. theoretically it should work if only CUDA cores are used..&lt;br /&gt;What about W7 MFT and CUVENC? I think these should work, as they are CUDA kernels..&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Overclocking doesn't work on TCC, nor does clock reporting, but with Nvidia CUDA you can at least get the stream processor clocks.. 
anyway, fan speed works..&lt;br /&gt;Lastly, going by the ICD spec, I think all that a non-Khronos member needs to implement it is the ordered list of OpenCL functions in the dispatch struct..&lt;br /&gt;&lt;br /&gt;Also, the versions reported in CCC for the 2D, D3D and OGL drivers come from these files:&lt;br /&gt;2d-&amp;gt;atikmdag.sys 2d version 8.01.01.1010&lt;br /&gt;3d-&amp;gt;atiumdag.dll atiumd64 d3d 735&lt;br /&gt;ogl-&amp;gt;atiogl.. 9606&lt;br /&gt;&lt;br /&gt;All files found in the 10.3 beta ("new" marks post-9.12 files, i.e. files not found in 9.12, almost certainly all due to the CrossFire restructuring):&lt;br /&gt;ati2edxx.dl_-&amp;gt; ati external device utility syswow64&lt;br /&gt;ati2erec.dl_&lt;br /&gt;&lt;br /&gt;AtiEDUGetThermalApiVersion&lt;br /&gt;AtiEDUEnumApiSupportedDevices&lt;br /&gt;AtiEDUEnumSupportedExternalDevices&lt;br /&gt;AtiEDUGetExtDeviceInfo&lt;br /&gt;AtiEDUOpenAdapterHandle&lt;br /&gt;AtiEDUCloseAdapterHandle&lt;br /&gt;AtiEDUInitializeThermal&lt;br /&gt;AtiEDUSetThermalRemoteTemperatureOffset&lt;br /&gt;AtiEDUSetThermalRemoteTemperatureHighSetPoint&lt;br /&gt;AtiEDUSetThermalRemoteTemperatureLowSetPoint&lt;br /&gt;AtiEDUSetThermalRemoteTemperatureCriticalSetPoint&lt;br /&gt;AtiEDUGetThermalRemoteTemperatureOffset&lt;br /&gt;AtiEDUGetThermalRemoteTemperatureHighSetPoint&lt;br /&gt;AtiEDUGetThermalRemoteTemperatureLowSetPoint&lt;br /&gt;AtiEDUGetThermalRemoteTemperatureCriticalSetPoint&lt;br /&gt;AtiEDUGetThermalRemoteTemperature&lt;br /&gt;AtiEDUThermalEnableInterrupt&lt;br /&gt;AtiEDUThermalDisableInterrupt&lt;br /&gt;AtiEDUGetAdapterTemperatureOffset&lt;br /&gt;AtiEDUGetThermalRemoteTemperatureFP&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;ati adl sdk&lt;br /&gt;atiadlxx.dl_ system32&lt;br /&gt;atiadlxy.dl_ syswow64&lt;br /&gt;ADL_Workstation_Stereo_Get&lt;br /&gt;ADL_Workstation_Stereo_Set&lt;br /&gt;&lt;br /&gt;ADL_Workstation_AdapterNumOfGLSyncConnectors_Get&lt;br /&gt;ADL_Workstation_Caps&lt;br /&gt;ADL_Workstation_DisplayGLSyncMode_Get&lt;br 
/&gt;ADL_Workstation_DisplayGLSyncMode_Set&lt;br /&gt;ADL_Workstation_DisplayGenlockCapable_Get&lt;br /&gt;ADL_Workstation_GLSyncCounters_Get&lt;br /&gt;ADL_Workstation_GLSyncGenlockConfiguration_Get&lt;br /&gt;ADL_Workstation_GLSyncGenlockConfiguration_Set&lt;br /&gt;ADL_Workstation_GLSyncModuleDetect_Get&lt;br /&gt;ADL_Workstation_GLSyncModuleInfo_Get&lt;br /&gt;ADL_Workstation_GLSyncPortState_Get&lt;br /&gt;ADL_Workstation_GLSyncPortState_Set&lt;br /&gt;ADL_Workstation_LoadBalancing_Caps&lt;br /&gt;ADL_Workstation_LoadBalancing_Get&lt;br /&gt;ADL_Workstation_LoadBalancing_Set&lt;br /&gt;ADL_Workstation_Stereo_Get&lt;br /&gt;ADL_Workstation_Stereo_Set&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;application profiles&lt;br /&gt;atiapfxx.blb system32&lt;br /&gt;atiapfxx.ex_ system32&lt;br /&gt;&lt;br /&gt;unknown (old)&lt;br /&gt;atibtmon.ex_ system32 ati brightness monitor&lt;br /&gt;&lt;br /&gt;ati cal&lt;br /&gt;aticalcl.dl_&lt;br /&gt;aticalcl64.dl_&lt;br /&gt;aticaldd.dl_        OK&lt;br /&gt;aticaldd64.dl_&lt;br /&gt;aticalrt.dl_&lt;br /&gt;aticalrt64.dl_&lt;br /&gt;&lt;br /&gt;crossfire (new)&lt;br /&gt;aticfx32.dl_ ati radeon d3d11 driver syswow64&lt;br /&gt;aticfx64.dl_ ati radeon d3d11 driver system32&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;old&lt;br /&gt;atidemgx.dll graphics dem system32 (catalyst control center)&lt;br /&gt;atidxx32.dl_ d3d 11 driver syswow64&lt;br /&gt;atidxx64.dl_ d3d 11 driver system32&lt;br /&gt;atieclxx.ex_ AMD external events client module (ccc) system32&lt;br /&gt;atiedu64.dl_ ati external device utility system32 ati2edxx.dl&lt;br /&gt;atiesrxx.ex_ AMD external events client module (ccc) system32&lt;br /&gt;&lt;br /&gt;unknown (new)&lt;br /&gt;atig6pxx.dl_ powerxpress vista ogl (thunk) driver system32&lt;br /&gt;atig6txx.dl_ powerxpress vista ogl driver syswow64&lt;br /&gt;atigktxx.dl_ powerxpress vista ogl (thunk) driver syswow64&lt;br /&gt;atiglpxx.dl_ powerxpress vista ogl driver system32&lt;br /&gt;&lt;br /&gt;old&lt;br 
/&gt;atiicdxx.da_&lt;br /&gt;atikmdag.sy_&lt;br /&gt;atikmpag.sy_&lt;br /&gt;atimpc32.dl_ radeon pcom universal driver syswow64&lt;br /&gt;atimpc64.dl_ radeon pcom universal driver sys32&lt;br /&gt;atimuixx.dl_ multilanguage dppe dll&lt;br /&gt;atio6axx.dl_ ati opengl driver system32&lt;br /&gt;atiodcli.ex_ unknown&lt;br /&gt;atiode.ex_ unknown&lt;br /&gt;&lt;br /&gt;ogl driver syswow64&lt;br /&gt;atiogl.xml&lt;br /&gt;atioglxx.dl_&lt;br /&gt;&lt;br /&gt;new&lt;br /&gt;atipblag.dat contains list (3DMark06*.exe 3DMark2001.exe 3DMark2001SE.exe 3DMark03.exe 3DMark05.exe ..)&lt;br /&gt;&lt;br /&gt;atipdl64.dl_ --&lt;br /&gt;atipdlxx.dl_ ati desktop cwddedi  syswow64 old adl lib&lt;br /&gt;atitmm64.dl_ tmm clone control module&lt;br /&gt;&lt;br /&gt;new&lt;br /&gt;atitmp64.dl_&lt;br /&gt;atiu9p64.dl_ -&lt;br /&gt;atiu9pag.dl_ powerxpress vista user mode driver (d3d9?) syswow64&lt;br /&gt;&lt;br /&gt;old&lt;br /&gt;atiumd64.dl_ radeon directx universal driver system32&lt;br /&gt;atiumd6a.ca_ dat64&lt;br /&gt;atiumd6a.dl_ video acceleration universal driver&lt;br /&gt;atiumdag.dl_ radeon directx universal driver syswow64&lt;br /&gt;atiumdva.ca_ dat32&lt;br /&gt;atiumdva.dl_ video acceleration universal driver syswow64&lt;br /&gt;&lt;br /&gt;new&lt;br /&gt;atiuxp64.dl_ -&lt;br /&gt;atiuxpag.dl_ powerxpress vista user mode driver (d3d10?) 
syswow64&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;I see amdpcom32  radeon pcom universal driver syswow64&lt;br /&gt;&lt;br /&gt;ati2erec.dl_ atitmp64.dl_ &lt;br /&gt;atikmdag.sys ati radeon kernel mode driver&lt;br /&gt;atipmdag.sys ati radeon kernel mode driver&lt;br /&gt;atikmpag.sys miniport driver&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Also, some functions of the GL_EXT_direct_state_access extension are present (is this useful?)&lt;br /&gt;This is a good extension, so it's good to know..&lt;br /&gt;It seems GL_ARB_compatibility is not found..&lt;br /&gt;&lt;br /&gt;GL_EXT_direct_state_access:                                    MISSING &lt;br /&gt;---------------------------&lt;br /&gt;glBindMultiTextureEXT:                                       OK&lt;br /&gt;glCheckNamedFramebufferStatusEXT:                            OK&lt;br /&gt;glClientAttribDefaultEXT:                                    OK&lt;br /&gt;glCompressedMultiTexImage1DEXT:                              OK&lt;br /&gt;glCompressedMultiTexImage2DEXT:                              OK&lt;br /&gt;glCompressedMultiTexImage3DEXT:                              OK&lt;br /&gt;glCompressedMultiTexSubImage1DEXT:                           OK&lt;br /&gt;glCompressedMultiTexSubImage2DEXT:                           OK&lt;br /&gt;glCompressedMultiTexSubImage3DEXT:                           OK&lt;br /&gt;glCompressedTextureImage1DEXT:                               OK&lt;br /&gt;glCompressedTextureImage2DEXT:                               OK&lt;br /&gt;glCompressedTextureImage3DEXT:                               OK&lt;br /&gt;glCompressedTextureSubImage1DEXT:                            OK&lt;br /&gt;glCompressedTextureSubImage2DEXT:                            OK&lt;br /&gt;glCompressedTextureSubImage3DEXT:                            OK&lt;br /&gt;glCopyMultiTexImage1DEXT:                                    OK&lt;br /&gt;glCopyMultiTexImage2DEXT:                                    OK&lt;br 
/&gt;glCopyMultiTexSubImage1DEXT:                                 OK&lt;br /&gt;glCopyMultiTexSubImage2DEXT:                                 OK&lt;br /&gt;glCopyMultiTexSubImage3DEXT:                                 OK&lt;br /&gt;glCopyTextureImage1DEXT:                                     OK&lt;br /&gt;glCopyTextureImage2DEXT:                                     OK&lt;br /&gt;glCopyTextureSubImage1DEXT:                                  OK&lt;br /&gt;glCopyTextureSubImage2DEXT:                                  OK&lt;br /&gt;glCopyTextureSubImage3DEXT:                                  OK&lt;br /&gt;glDisableClientStateIndexedEXT:                              OK&lt;br /&gt;glDisableClientStateiEXT:                                    MISSING&lt;br /&gt;glDisableVertexArrayAttribEXT:                               MISSING&lt;br /&gt;glDisableVertexArrayEXT:                                     MISSING&lt;br /&gt;glEnableClientStateIndexedEXT:                               OK&lt;br /&gt;glEnableClientStateiEXT:                                     MISSING&lt;br /&gt;glEnableVertexArrayAttribEXT:                                MISSING&lt;br /&gt;glEnableVertexArrayEXT:                                      MISSING&lt;br /&gt;glFlushMappedNamedBufferRangeEXT:                            MISSING&lt;br /&gt;glFramebufferDrawBufferEXT:                                  OK&lt;br /&gt;glFramebufferDrawBuffersEXT:                                 OK&lt;br /&gt;glFramebufferReadBufferEXT:                                  OK&lt;br /&gt;glGenerateMultiTexMipmapEXT:                                 OK&lt;br /&gt;glGenerateTextureMipmapEXT:                                  OK&lt;br /&gt;glGetCompressedMultiTexImageEXT:                             OK&lt;br /&gt;glGetCompressedTextureImageEXT:                              OK&lt;br /&gt;glGetDoubleIndexedvEXT:                                      OK&lt;br /&gt;glGetDoublei_vEXT:                                           MISSING&lt;br 
/&gt;glGetFloatIndexedvEXT:                                       OK&lt;br /&gt;glGetFloati_vEXT:                                            MISSING&lt;br /&gt;glGetFramebufferParameterivEXT:                              OK&lt;br /&gt;glGetMultiTexEnvfvEXT:                                       OK&lt;br /&gt;glGetMultiTexEnvivEXT:                                       OK&lt;br /&gt;glGetMultiTexGendvEXT:                                       OK&lt;br /&gt;glGetMultiTexGenfvEXT:                                       OK&lt;br /&gt;glGetMultiTexGenivEXT:                                       OK&lt;br /&gt;glGetMultiTexImageEXT:                                       OK&lt;br /&gt;glGetMultiTexLevelParameterfvEXT:                            OK&lt;br /&gt;glGetMultiTexLevelParameterivEXT:                            OK&lt;br /&gt;glGetMultiTexParameterIivEXT:                                OK&lt;br /&gt;glGetMultiTexParameterIuivEXT:                               OK&lt;br /&gt;glGetMultiTexParameterfvEXT:                                 OK&lt;br /&gt;glGetMultiTexParameterivEXT:                                 OK&lt;br /&gt;glGetNamedBufferParameterivEXT:                              OK&lt;br /&gt;glGetNamedBufferPointervEXT:                                 OK&lt;br /&gt;glGetNamedBufferSubDataEXT:                                  OK&lt;br /&gt;glGetNamedFramebufferAttachmentParameterivEXT:               OK&lt;br /&gt;glGetNamedProgramLocalParameterIivEXT:                       MISSING&lt;br /&gt;glGetNamedProgramLocalParameterIuivEXT:                      MISSING&lt;br /&gt;glGetNamedProgramLocalParameterdvEXT:                        OK&lt;br /&gt;glGetNamedProgramLocalParameterfvEXT:                        OK&lt;br /&gt;glGetNamedProgramStringEXT:                                  OK&lt;br /&gt;glGetNamedProgramivEXT:                                      OK&lt;br /&gt;glGetNamedRenderbufferParameterivEXT:                        OK&lt;br /&gt;glGetPointerIndexedvEXT:      
                               OK&lt;br /&gt;glGetPointeri_vEXT:                                          MISSING&lt;br /&gt;glGetTextureImageEXT:                                        OK&lt;br /&gt;glGetTextureLevelParameterfvEXT:                             OK&lt;br /&gt;glGetTextureLevelParameterivEXT:                             OK&lt;br /&gt;glGetTextureParameterIivEXT:                                 OK&lt;br /&gt;glGetTextureParameterIuivEXT:                                OK&lt;br /&gt;glGetTextureParameterfvEXT:                                  OK&lt;br /&gt;glGetTextureParameterivEXT:                                  OK&lt;br /&gt;glGetVertexArrayIntegeri_vEXT:                               MISSING&lt;br /&gt;glGetVertexArrayIntegervEXT:                                 MISSING&lt;br /&gt;glGetVertexArrayPointeri_vEXT:                               MISSING&lt;br /&gt;glGetVertexArrayPointervEXT:                                 MISSING&lt;br /&gt;glMapNamedBufferEXT:                                         OK&lt;br /&gt;glMapNamedBufferRangeEXT:                                    MISSING&lt;br /&gt;glMatrixFrustumEXT:                                          OK&lt;br /&gt;glMatrixLoadIdentityEXT:                                     OK&lt;br /&gt;glMatrixLoadTransposedEXT:                                   OK&lt;br /&gt;glMatrixLoadTransposefEXT:                                   OK&lt;br /&gt;glMatrixLoaddEXT:                                            OK&lt;br /&gt;glMatrixLoadfEXT:                                            OK&lt;br /&gt;glMatrixMultTransposedEXT:                                   OK&lt;br /&gt;glMatrixMultTransposefEXT:                                   OK&lt;br /&gt;glMatrixMultdEXT:                                            OK&lt;br /&gt;glMatrixMultfEXT:                                            OK&lt;br /&gt;glMatrixOrthoEXT:                                            OK&lt;br /&gt;glMatrixPopEXT:                                   
           OK&lt;br /&gt;glMatrixPushEXT:                                             OK&lt;br /&gt;glMatrixRotatedEXT:                                          OK&lt;br /&gt;glMatrixRotatefEXT:                                          OK&lt;br /&gt;glMatrixScaledEXT:                                           OK&lt;br /&gt;glMatrixScalefEXT:                                           OK&lt;br /&gt;glMatrixTranslatedEXT:                                       OK&lt;br /&gt;glMatrixTranslatefEXT:                                       OK&lt;br /&gt;glMultiTexBufferEXT:                                         OK&lt;br /&gt;glMultiTexCoordPointerEXT:                                   OK&lt;br /&gt;glMultiTexEnvfEXT:                                           OK&lt;br /&gt;glMultiTexEnvfvEXT:                                          OK&lt;br /&gt;glMultiTexEnviEXT:                                           OK&lt;br /&gt;glMultiTexEnvivEXT:                                          OK&lt;br /&gt;glMultiTexGendEXT:                                           OK&lt;br /&gt;glMultiTexGendvEXT:                                          OK&lt;br /&gt;glMultiTexGenfEXT:                                           OK&lt;br /&gt;glMultiTexGenfvEXT:                                          OK&lt;br /&gt;glMultiTexGeniEXT:                                           OK&lt;br /&gt;glMultiTexGenivEXT:                                          OK&lt;br /&gt;glMultiTexImage1DEXT:                                        OK&lt;br /&gt;glMultiTexImage2DEXT:                                        OK&lt;br /&gt;glMultiTexImage3DEXT:                                        OK&lt;br /&gt;glMultiTexParameterIivEXT:                                   OK&lt;br /&gt;glMultiTexParameterIuivEXT:                                  OK&lt;br /&gt;glMultiTexParameterfEXT:                                     OK&lt;br /&gt;glMultiTexParameterfvEXT:                                    OK&lt;br /&gt;glMultiTexParameteriEXT: 
                                    OK&lt;br /&gt;glMultiTexParameterivEXT:                                    OK&lt;br /&gt;glMultiTexRenderbufferEXT:                                   OK&lt;br /&gt;glMultiTexSubImage1DEXT:                                     OK&lt;br /&gt;glMultiTexSubImage2DEXT:                                     OK&lt;br /&gt;glMultiTexSubImage3DEXT:                                     OK&lt;br /&gt;glNamedBufferDataEXT:                                        OK&lt;br /&gt;glNamedBufferSubDataEXT:                                     OK&lt;br /&gt;glNamedCopyBufferSubDataEXT:                                 MISSING&lt;br /&gt;glNamedFramebufferRenderbufferEXT:                           OK&lt;br /&gt;glNamedFramebufferTexture1DEXT:                              OK&lt;br /&gt;glNamedFramebufferTexture2DEXT:                              OK&lt;br /&gt;glNamedFramebufferTexture3DEXT:                              OK&lt;br /&gt;glNamedFramebufferTextureEXT:                                OK&lt;br /&gt;glNamedFramebufferTextureFaceEXT:                            OK&lt;br /&gt;glNamedFramebufferTextureLayerEXT:                           OK&lt;br /&gt;glNamedProgramLocalParameter4dEXT:                           OK&lt;br /&gt;glNamedProgramLocalParameter4dvEXT:                          OK&lt;br /&gt;glNamedProgramLocalParameter4fEXT:                           OK&lt;br /&gt;glNamedProgramLocalParameter4fvEXT:                          OK&lt;br /&gt;glNamedProgramLocalParameterI4iEXT:                          MISSING&lt;br /&gt;glNamedProgramLocalParameterI4ivEXT:                         MISSING&lt;br /&gt;glNamedProgramLocalParameterI4uiEXT:                         MISSING&lt;br /&gt;glNamedProgramLocalParameterI4uivEXT:                        MISSING&lt;br /&gt;glNamedProgramLocalParameters4fvEXT:                         OK&lt;br /&gt;glNamedProgramLocalParametersI4ivEXT:                        MISSING&lt;br /&gt;glNamedProgramLocalParametersI4uivEXT:       
                MISSING&lt;br /&gt;glNamedProgramStringEXT:                                     OK&lt;br /&gt;glNamedRenderbufferStorageEXT:                               OK&lt;br /&gt;glNamedRenderbufferStorageMultisampleCoverageEXT:            MISSING&lt;br /&gt;glNamedRenderbufferStorageMultisampleEXT:                    OK&lt;br /&gt;glProgramUniform1fEXT:                                       OK&lt;br /&gt;glProgramUniform1fvEXT:                                      OK&lt;br /&gt;glProgramUniform1iEXT:                                       OK&lt;br /&gt;glProgramUniform1ivEXT:                                      OK&lt;br /&gt;glProgramUniform1uiEXT:                                      OK&lt;br /&gt;glProgramUniform1uivEXT:                                     OK&lt;br /&gt;glProgramUniform2fEXT:                                       OK&lt;br /&gt;glProgramUniform2fvEXT:                                      OK&lt;br /&gt;glProgramUniform2iEXT:                                       OK&lt;br /&gt;glProgramUniform2ivEXT:                                      OK&lt;br /&gt;glProgramUniform2uiEXT:                                      OK&lt;br /&gt;glProgramUniform2uivEXT:                                     OK&lt;br /&gt;glProgramUniform3fEXT:                                       OK&lt;br /&gt;glProgramUniform3fvEXT:                                      OK&lt;br /&gt;glProgramUniform3iEXT:                                       OK&lt;br /&gt;glProgramUniform3ivEXT:                                      OK&lt;br /&gt;glProgramUniform3uiEXT:                                      OK&lt;br /&gt;glProgramUniform3uivEXT:                                     OK&lt;br /&gt;glProgramUniform4fEXT:                                       OK&lt;br /&gt;glProgramUniform4fvEXT:                                      OK&lt;br /&gt;glProgramUniform4iEXT:                                       OK&lt;br /&gt;glProgramUniform4ivEXT:                                      OK&lt;br 
/&gt;glProgramUniform4uiEXT:                                      OK&lt;br /&gt;glProgramUniform4uivEXT:                                     OK&lt;br /&gt;glProgramUniformMatrix2fvEXT:                                OK&lt;br /&gt;glProgramUniformMatrix2x3fvEXT:                              OK&lt;br /&gt;glProgramUniformMatrix2x4fvEXT:                              OK&lt;br /&gt;glProgramUniformMatrix3fvEXT:                                OK&lt;br /&gt;glProgramUniformMatrix3x2fvEXT:                              OK&lt;br /&gt;glProgramUniformMatrix3x4fvEXT:                              OK&lt;br /&gt;glProgramUniformMatrix4fvEXT:                                OK&lt;br /&gt;glProgramUniformMatrix4x2fvEXT:                              OK&lt;br /&gt;glProgramUniformMatrix4x3fvEXT:                              OK&lt;br /&gt;glPushClientAttribDefaultEXT:                                OK&lt;br /&gt;glTextureBufferEXT:                                          OK&lt;br /&gt;glTextureImage1DEXT:                                         OK&lt;br /&gt;glTextureImage2DEXT:                                         OK&lt;br /&gt;glTextureImage3DEXT:                                         OK&lt;br /&gt;glTextureParameterIivEXT:                                    OK&lt;br /&gt;glTextureParameterIuivEXT:                                   OK&lt;br /&gt;glTextureParameterfEXT:                                      OK&lt;br /&gt;glTextureParameterfvEXT:                                     OK&lt;br /&gt;glTextureParameteriEXT:                                      OK&lt;br /&gt;glTextureParameterivEXT:                                     OK&lt;br /&gt;glTextureRenderbufferEXT:                                    OK&lt;br /&gt;glTextureSubImage1DEXT:                                      OK&lt;br /&gt;glTextureSubImage2DEXT:                                      OK&lt;br /&gt;glTextureSubImage3DEXT:                                      OK&lt;br /&gt;glUnmapNamedBufferEXT:                       
                OK&lt;br /&gt;glVertexArrayColorOffsetEXT:                                 MISSING&lt;br /&gt;glVertexArrayEdgeFlagOffsetEXT:                              MISSING&lt;br /&gt;glVertexArrayFogCoordOffsetEXT:                              MISSING&lt;br /&gt;glVertexArrayIndexOffsetEXT:                                 MISSING&lt;br /&gt;glVertexArrayMultiTexCoordOffsetEXT:                         MISSING&lt;br /&gt;glVertexArrayNormalOffsetEXT:                                MISSING&lt;br /&gt;glVertexArraySecondaryColorOffsetEXT:                        MISSING&lt;br /&gt;glVertexArrayTexCoordOffsetEXT:                              MISSING&lt;br /&gt;glVertexArrayVertexAttribIOffsetEXT:                         MISSING&lt;br /&gt;glVertexArrayVertexAttribOffsetEXT:                          MISSING&lt;br /&gt;glVertexArrayVertexOffsetEXT:                                MISSING&lt;br /&gt;&lt;br /&gt;I don't know whether this has been posted before, but the NVIDIA SDK has a helper that turns OpenCL error codes into strings, in oclUtils.cpp:&lt;br /&gt;// Helper function to get error string&lt;br /&gt;// *********************************************************************&lt;br /&gt;const char* oclErrorString(cl_int error)&lt;br /&gt;{&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;static char errorString[][64] = {&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;"CL_SUCCESS",&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;"CL_DEVICE_NOT_FOUND",&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;"CL_DEVICE_NOT_AVAILABLE",&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;"CL_COMPILER_NOT_AVAILABLE",&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;"CL_MEM_OBJECT_ALLOCATION_FAILURE",&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;"CL_OUT_OF_RESOURCES",&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;"CL_OUT_OF_HOST_MEMORY",&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; 
&amp;nbsp;"CL_PROFILING_INFO_NOT_AVAILABLE",&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;"CL_MEM_COPY_OVERLAP",&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;"CL_IMAGE_FORMAT_MISMATCH",&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;"CL_IMAGE_FORMAT_NOT_SUPPORTED",&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;"CL_BUILD_PROGRAM_FAILURE",&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;"CL_MAP_FAILURE",&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;"",&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;"",&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;"",&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;"",&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;"",&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;"",&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;"",&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;"",&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;"",&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;"",&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;"",&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;"",&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;"",&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;"",&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;"",&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;"",&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;"",&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;"CL_INVALID_VALUE",&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;"CL_INVALID_DEVICE_TYPE",&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;"CL_INVALID_PLATFORM",&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; 
&amp;nbsp;"CL_INVALID_DEVICE",&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;"CL_INVALID_CONTEXT",&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;"CL_INVALID_QUEUE_PROPERTIES",&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;"CL_INVALID_COMMAND_QUEUE",&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;"CL_INVALID_HOST_PTR",&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;"CL_INVALID_MEM_OBJECT",&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;"CL_INVALID_IMAGE_FORMAT_DESCRIPTOR",&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;"CL_INVALID_IMAGE_SIZE",&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;"CL_INVALID_SAMPLER",&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;"CL_INVALID_BINARY",&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;"CL_INVALID_BUILD_OPTIONS",&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;"CL_INVALID_PROGRAM",&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;"CL_INVALID_PROGRAM_EXECUTABLE",&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;"CL_INVALID_KERNEL_NAME",&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;"CL_INVALID_KERNEL_DEFINITION",&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;"CL_INVALID_KERNEL",&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;"CL_INVALID_ARG_INDEX",&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;"CL_INVALID_ARG_VALUE",&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;"CL_INVALID_ARG_SIZE",&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;"CL_INVALID_KERNEL_ARGS",&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;"CL_INVALID_WORK_DIMENSION",&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;"CL_INVALID_WORK_GROUP_SIZE",&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; 
&amp;nbsp;"CL_INVALID_WORK_ITEM_SIZE",&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;"CL_INVALID_GLOBAL_OFFSET",&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;"CL_INVALID_EVENT_WAIT_LIST",&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;"CL_INVALID_EVENT",&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;"CL_INVALID_OPERATION",&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;"CL_INVALID_GL_OBJECT",&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;"CL_INVALID_BUFFER_SIZE",&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;"CL_INVALID_MIP_LEVEL",&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;"CL_INVALID_GLOBAL_WORK_SIZE",&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;};&lt;br /&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;return errorString[-error];&lt;br /&gt;}&lt;br /&gt;also ogl qbf stereo is not enabled seeing glwinfo&lt;br /&gt;seeing ogl driver depends on adl but not workstatinon_setstereo get stereo o caps functions used..&lt;br /&gt;also&lt;br /&gt;&lt;br /&gt;set OGL_FORCE_ASIC_ID=37956&lt;br /&gt;set OGL_FORCE_ASIC_ID=68BE&lt;br /&gt;set OGL_FORCE_ASIC_ID=0x68BE&lt;br /&gt;set OGL_FORCE_ASIC_ID=26814&lt;br /&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;not seen&lt;br /&gt;set OGL_FORCE_ASIC_ID=9444&lt;br /&gt;"ATI FirePro V8750 (FireGL)" = ati2mtag_R7XGL, PCI\VEN_1002&amp;amp;DEV_9444&lt;br /&gt;mine is&lt;br /&gt;"ATI Radeon HD 5800 Series " = ati2mtag_Evergreen, PCI\VEN_1002&amp;amp;DEV_6899&lt;br /&gt;set OGL_ENABLE_FORCE_ASIC_ID=1&lt;br /&gt;&lt;br /&gt;tested&lt;br /&gt;glewinfo for gpu name and&lt;br /&gt;visualinfo.exe&lt;br /&gt;&lt;br /&gt;installing 8.68.3 firepro driver has 30bit support and stereo:&lt;br /&gt;seen&lt;br /&gt;&lt;br /&gt;HKR,, DisableOGL10BitPixelFormats, &amp;nbsp; &amp;nbsp; &amp;nbsp;%REG_DWORD%, &amp;nbsp; &amp;nbsp;0&lt;br /&gt;HKR,, Gxo30BppPanels, %REG_BINARY%,&lt;span class="Apple-tab-span" 
style="white-space: pre;"&gt; &lt;/span&gt;15,C3,76,17,15,C3,78,17&lt;br /&gt;&lt;div&gt;in installation diff versus cataluyst&lt;/div&gt;&lt;br /&gt;&lt;div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;HKR,, DALNonStandardModesBCD1, %REG_BINARY%,12,80,07,68,00,00,00,00,12,80,09,60,00,00,00,00,16,00,12,00,00,00,00,70,17,92,13,44,00,00,00,00,18,00,14,40,00,00,00,00,18,56,13,92,00,00,00,00&lt;/div&gt;&lt;div&gt;HKR,, DALRULE_AllowNativeModeAsDefaultModes, &amp;nbsp; &amp;nbsp; &amp;nbsp;%REG_DWORD%, &amp;nbsp; &amp;nbsp;1&lt;/div&gt;&lt;div&gt;GCORULE_ExtTMDSReduceBlankTiming, &amp;nbsp; &amp;nbsp;%REG_DWORD%, &amp;nbsp; &amp;nbsp;1&lt;/div&gt;&lt;/div&gt;&lt;div&gt;"ATI FireGL V3600" = ati2mtag_RV630GL, PCI\VEN_1002&amp;amp;DEV_958D&lt;/div&gt;&lt;div&gt;"ATI FireGL V5600" = ati2mtag_RV630GL, PCI\VEN_1002&amp;amp;DEV_958C&lt;/div&gt;&lt;div&gt;"ATI FireGL V7600" = ati2mtag_R600GL, PCI\VEN_1002&amp;amp;DEV_940F&lt;/div&gt;&lt;div&gt;"ATI FireGL V7700" = ati2mtag_RV630GL, PCI\VEN_1002&amp;amp;DEV_9511&lt;/div&gt;&lt;div&gt;"ATI FireGL V8600" = ati2mtag_R600GL, PCI\VEN_1002&amp;amp;DEV_940B&lt;/div&gt;&lt;div&gt;"ATI FireGL V8650" = ati2mtag_R600GL, PCI\VEN_1002&amp;amp;DEV_940A&lt;/div&gt;&lt;div&gt;"ATI FirePro 2260" = ati2mtag_RV610, PCI\VEN_1002&amp;amp;DEV_95CF&lt;/div&gt;&lt;div&gt;"ATI FirePro 2260 " = ati2mtag_RV610, PCI\VEN_1002&amp;amp;DEV_95CE&lt;/div&gt;&lt;div&gt;"ATI FirePro 2450" = ati2mtag_RV610, PCI\VEN_1002&amp;amp;DEV_95CD&lt;/div&gt;&lt;div&gt;"ATI FirePro V3700 (FireGL)" = ati2mtag_RV620GL, PCI\VEN_1002&amp;amp;DEV_95CC&lt;/div&gt;&lt;div&gt;"ATI FirePro V3750 (FireGL)" = ati2mtag_R7XGL, PCI\VEN_1002&amp;amp;DEV_949F&lt;/div&gt;&lt;div&gt;"ATI FirePro V5700 (FireGL)" = ati2mtag_R7XGL, PCI\VEN_1002&amp;amp;DEV_949E&lt;/div&gt;&lt;div&gt;"ATI FirePro V7750 (FireGL)" = ati2mtag_R7XGL, PCI\VEN_1002&amp;amp;DEV_949C&lt;/div&gt;&lt;div&gt;"ATI FirePro V8700 (FireGL)" = ati2mtag_R7XGL, 
PCI\VEN_1002&amp;amp;DEV_9456&lt;/div&gt;&lt;div&gt;"ATI FirePro V8750 (FireGL)" = ati2mtag_R7XGL, PCI\VEN_1002&amp;amp;DEV_9444&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;[ati2mtag_R6xGL_SoftwareDeviceSettings]&lt;/div&gt;&lt;div&gt;HKR,, OGL_Specific_NA, %REG_SZ%, 1&lt;/div&gt;&lt;div&gt;HKR,, CatalystAI_NA, %REG_SZ%, 1&lt;/div&gt;&lt;div&gt;HKR,, APISpecific_NA, %REG_SZ%, 1&lt;/div&gt;&lt;div&gt;HKR,, TemporalAAMultiplier_NA, %REG_SZ%, 1&lt;/div&gt;&lt;div&gt;HKR,, Main3D_NA, %REG_SZ%, 1&lt;/div&gt;&lt;div&gt;HKR,, VPURecover_NA, %REG_SZ%, 1&lt;/div&gt;&lt;div&gt;HKR,, SmartGart_NA, %REG_SZ%, 1&lt;/div&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8553786559872430029-164554294122829072?l=oscarbg.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://oscarbg.blogspot.com/feeds/164554294122829072/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://oscarbg.blogspot.com/2010/02/about-ati-and-nvidia-drivers-ocl.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8553786559872430029/posts/default/164554294122829072'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8553786559872430029/posts/default/164554294122829072'/><link rel='alternate' type='text/html' href='http://oscarbg.blogspot.com/2010/02/about-ati-and-nvidia-drivers-ocl.html' title='About ATI and Nvidia drivers (OCL included)!'/><author><name>oscarbg</name><uri>http://www.blogger.com/profile/10172785372134961220</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8553786559872430029.post-5809807617323931036</id><published>2010-02-25T16:07:00.005+01:00</published><updated>2010-02-25T16:55:23.794+01:00</updated><title type='text'>Shaders: measuring perf, source translation and parsing different languages!</title><content type='html'>Hi,&lt;br /&gt;Here I try to cover, as exhaustively as I can, the options for parsing and translating between graphics and compute shader languages (some of them open source).&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;For DX shaders:&lt;br /&gt;*GPU ShaderAnalyzer (AMD only; gets DX IL, AMD IL and GPU assembly from DirectCompute or graphics shaders): it is now DX11-compatible, supports the 5xxx series, and handles compute, domain and hull shaders.&lt;br /&gt;*fxc and the D3DCompile API/library: get DX IL and bytecode from DX shaders (multi-vendor).&lt;br /&gt;Note: I don't know how to build from DX IL: D3DCompile doesn't accept it, CreateComputeShader doesn't either, and GPU SA won't take it.&lt;br /&gt;Is there an fxc option or API for going from DX IL to DX bytecode, so I could optimize the DX IL, compile it to DX bytecode and feed that to a compute shader?&lt;br /&gt;At least, if you have the source, you can see the AMD IL, DX IL and R800 assembly.&lt;br /&gt;&lt;br /&gt;Theoretically, you could get AMD IL from compute shaders using GPU SA and feed it into OpenCL once it supports getting binaries and building from them (or, for now, by intercepting llc or something like that), 
so in OpenCL you could soon modify the generated assembly (on NVIDIA you already can)&lt;br /&gt;and compare the quality of the generated code.&lt;br /&gt;You can also feed a DX shader to GPU SA and an equivalent OpenCL kernel to SKA, and compare the AMD IL, the assembly and even all the statistics (ALU/tex counts, kernels/s, etc.).&lt;br /&gt;I have tested a simple vectoradd: the quality is the same (kernels/s), although the AMD IL generated from OpenCL is much longer.&lt;br /&gt;Parsing HLSL: the NVIDIA Cg compiler source is available, and Cg is about 99% identical to HLSL, so you get parser and front-end compiler code (I think it uses flex/bison).&lt;br /&gt;&lt;br /&gt;There was an AMD HLSL, an extension to HLSL with scatter, doubles, etc.:&lt;br /&gt;http://coachk.cs.ucf.edu/courses/CDA6938/s08/UCF-2008-02-01b.pdf&lt;br /&gt;http://coachk.cs.ucf.edu/courses/CDA6938/s08/AMD_IL.pdf&lt;br /&gt;&lt;br /&gt;All of that functionality is now included in DX11 compute shader 5.0 and pixel shader 5.0, and I presume also in the upcoming GLSL ext_gpu_shader5 (I can't find the AMD HLSL compiler anywhere, so I think the effort migrated to Brook+ and AMD IL).&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Note: for NVIDIA there is a tool similar to GPU SA (ShaderPerf), but I think it is paid; I don't think PerfKit can do this.&lt;br /&gt;I have to check whether Nexus will show PTX code for shaders or anything like that.&lt;br /&gt;Naturally, DX11 support is still missing in all these tools (PerfKit, ShaderPerf, etc.).&lt;br /&gt;&lt;br /&gt;HLSL&amp;lt;-&amp;gt;GLSL source-to-source translation:&lt;br /&gt;hlsl2glsl-v0.9 (source code available; also covers OpenGL ES)&lt;br /&gt;babelshader&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Both handle only pixel and vertex shaders.&lt;br /&gt;&lt;br /&gt;GLSL:&lt;br /&gt;GPU ShaderAnalyzer (gets AMD IL from DirectCompute or graphics shaders)&lt;br /&gt;You can translate HLSL-&amp;gt;GLSL and then use GPU SKA to compare the quality of the AMD IL or assembly generated from the GLSL versus the HLSL.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Parsing GLSL:&lt;br /&gt;you can build flex and bison input almost directly from the spec 
(tokens and grammar).&lt;br /&gt;&lt;br /&gt;The 3Dlabs GLSL validator and front-end compiler are open source&lt;br /&gt;(but I can't find them).&lt;br /&gt;&lt;br /&gt;hlsl2glsl-v0.9&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Brook+ is open source:&lt;br /&gt;it has AMD IL code-generation source and Brook parsing (ctool-based).&lt;br /&gt;&lt;br /&gt;Another topic is measuring the performance of the whole application, not just the shaders, with gDEBugger for OpenGL, the Mac OpenGL performance libraries, AMD GPU 2.1, NVIDIA Nexus, and the GL performance API and libraries (PerfKit SDK).&lt;br /&gt;For a CAL library with an OpenCL-like interface, see CAL++.&lt;br /&gt;For porting CUDA to OpenCL there are two guides, from NVIDIA and AMD, and also:&lt;br /&gt;&lt;a class="http" href="http://www.cse.scitech.ac.uk/disco/mew20/presentations/GPU_MattHarvey.pdf"&gt;Experiences porting from CUDA to OpenCL&lt;/a&gt;&lt;br /&gt;&amp;nbsp;Presentation at the Daresbury Machine Evaluation Workshop, 2009&lt;br /&gt;There is also a tool:&lt;br /&gt;&lt;a href="http://www.multiscalelab.org/swan"&gt;Swan: A simple tool for porting CUDA kernels to OpenCL&lt;/a&gt;&lt;br /&gt;An OpenCL-to-DirectCompute driver would be great!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8553786559872430029-5809807617323931036?l=oscarbg.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://oscarbg.blogspot.com/feeds/5809807617323931036/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://oscarbg.blogspot.com/2010/02/gpu-computing-source-translation-and.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8553786559872430029/posts/default/5809807617323931036'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8553786559872430029/posts/default/5809807617323931036'/><link rel='alternate' type='text/html' 
href='http://oscarbg.blogspot.com/2010/02/gpu-computing-source-translation-and.html' title='Shaders: measuring perf, source translation and parsing different languages!'/><author><name>oscarbg</name><uri>http://www.blogger.com/profile/10172785372134961220</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8553786559872430029.post-3616851091663499206</id><published>2010-02-19T21:20:00.003+01:00</published><updated>2010-02-20T00:00:20.500+01:00</updated><title type='text'>Enabling OpenCL Image support on AMD GPUs!</title><content type='html'>Well, I have been keeping this trick in my head for over a month now.&lt;br /&gt;More info on my blog coming soon: oscarbg.blogspot.com&lt;br /&gt;You can actually enable image support by setting:&lt;br /&gt;set GPU_IMAGES_SUPPORT=1&lt;br /&gt;or, on Linux, export GPU_IMAGES_SUPPORT=1&lt;br /&gt;Tested on an HD 5870 with AMD Stream 2.0 and the 9.12 hotfix driver; it only works for&lt;br /&gt;2D images. 
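&lt;br /&gt;&lt;br /&gt;A minimal sketch, assuming (as the trick above suggests) that the AMD runtime reads GPU_IMAGES_SUPPORT from the environment of the process it runs in: instead of setting the variable for the whole shell session, you can set it for a single OpenCL program by launching that program as a child process with an extended environment. The helper run_with_env and the AMD_FLAGS dictionary are hypothetical names used only for illustration.&lt;pre&gt;
```python
import os
import subprocess
import sys


def run_with_env(cmd, extra_env):
    """Run cmd (a list of arguments) with extra environment variables.

    The variables are added on top of a copy of the current environment,
    so they only affect the child process, not the caller.
    """
    env = dict(os.environ)
    env.update(extra_env)
    return subprocess.run(cmd, env=env, capture_output=True, text=True)


# Flag name taken from the post; whether a given driver honours it is an assumption.
AMD_FLAGS = {"GPU_IMAGES_SUPPORT": "1"}

if __name__ == "__main__":
    # Demo: a child Python process that just echoes the flag back.
    # In practice you would pass the path of your own OpenCL program here.
    result = run_with_env(
        [sys.executable, "-c", "import os; print(os.environ.get('GPU_IMAGES_SUPPORT'))"],
        AMD_FLAGS,
    )
    print(result.stdout.strip())
```
&lt;/pre&gt;Copying the environment keeps PATH and the driver-related variables intact while scoping the flag to one run; the same helper would work for the other flags mentioned in this post, such as GPU_BYTE_ADDRESSABLE_STORE.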
&lt;br /&gt;Similarly, you can enable byte-addressable store support with GPU_BYTE_ADDRESSABLE_STORE (although it doesn't seem to use raw UAVs), and then some Nvidia samples (histogram64) work.&lt;br /&gt;Double-precision extension reporting can be enabled with&lt;br /&gt;GPU_DOUBLE_PRECISION&lt;br /&gt;and GL sharing with&lt;br /&gt;CL_KHR_GL_SHARING&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8553786559872430029-3616851091663499206?l=oscarbg.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://oscarbg.blogspot.com/feeds/3616851091663499206/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://oscarbg.blogspot.com/2010/02/enabling-image-support-on-gpus.html#comment-form' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8553786559872430029/posts/default/3616851091663499206'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8553786559872430029/posts/default/3616851091663499206'/><link rel='alternate' type='text/html' href='http://oscarbg.blogspot.com/2010/02/enabling-image-support-on-gpus.html' title='Enabling OpenCL Image support on AMD GPUs!'/><author><name>oscarbg</name><uri>http://www.blogger.com/profile/10172785372134961220</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8553786559872430029.post-5909246349703389851</id><published>2010-02-19T21:13:00.000+01:00</published><updated>2010-02-19T21:13:51.586+01:00</updated><title type='text'>Running QT everywhere!</title><content type='html'>TODO: post links for everything..&lt;br /&gt;I have just found lots of platforms running Qt!&lt;br /&gt;The latest Qt 4.6.2 ships with binaries for Win32, Mac (32/64-bit) and Linux (32/64-bit)!&lt;br /&gt;You can build 
for Win64 yourself, but it takes a long time; Qt 4.6 Win64 binaries are available on Google Code as of today!&lt;br /&gt;If you use Visual Studio, install the latest Qt VS IDE 1.1.4;&lt;br /&gt;you can also build with Qt Creator 1.3.1&lt;br /&gt;For mobile:&lt;br /&gt;you have Symbian and Maemo, and now MeeGo (Moblin + Maemo)&lt;br /&gt;I also found a Qt blog post with a Tegra 2 board running Qt! (Android? Windows CE? Linux?)&lt;br /&gt;There is also a Google Native Client (NaCl) port (for the Chrome browser, or IE via Chrome Frame) on the Qt Labs blog!&lt;br /&gt;A port to the Amazon Kindle is also online!&lt;br /&gt;And at MWC it was shown running on the remaining mobile GPUs:&lt;br /&gt;*OMAP4 (SGX 540)&lt;br /&gt;*ST U8500 (Mali GPU)&lt;br /&gt;which, together with&lt;br /&gt;Tegra 2 (Nvidia GPU),&lt;br /&gt;shows Qt is everywhere..&lt;br /&gt;For Android there is a Qt port too:&lt;br /&gt;you need a custom NDK with STLport included if you want to use it..&lt;br /&gt;&lt;blockquote&gt;Why I'll choose Qt GUI and not Android one? &lt;br /&gt;1. The speed, Qt is more powerful and it's much more faster. &lt;br /&gt;2. The features, just look at http://doc.trolltech.com/4.6/qtgui.html. &lt;br /&gt;3. Declarative UI. &lt;br /&gt;4. The API is very robust and stable. &lt;br /&gt;5. IMHO Qt is written in a superior language. I don't like java :P. I &lt;br /&gt;think if you'll ask java about me it will give you the same answer :P. &lt;br /&gt;(Ok here I'm jocking). &lt;br /&gt;6. 
etc.&lt;/blockquote&gt;&lt;br /&gt;The only platforms left are iPod/iPad, but a port is in progress:&lt;br /&gt;http://www.qt-iphone.com/Roadmap.html&lt;br /&gt;currently QtCore is mostly done; QtGui is hard since Cocoa Touch != Cocoa&lt;br /&gt;It would also be good to have all of Qt's multitouch support and the just-announced Mobility APIs (Location + Sensors + Camera)..&lt;br /&gt;but this is easier said than done&lt;br /&gt;then I could program in Qt for everything..&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8553786559872430029-5909246349703389851?l=oscarbg.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://oscarbg.blogspot.com/feeds/5909246349703389851/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://oscarbg.blogspot.com/2010/02/running-qt-everywhere.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8553786559872430029/posts/default/5909246349703389851'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8553786559872430029/posts/default/5909246349703389851'/><link rel='alternate' type='text/html' href='http://oscarbg.blogspot.com/2010/02/running-qt-everywhere.html' title='Running QT everywhere!'/><author><name>oscarbg</name><uri>http://www.blogger.com/profile/10172785372134961220</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8553786559872430029.post-3815274440506015186</id><published>2010-02-19T17:37:00.002+01:00</published><updated>2010-02-19T17:37:56.289+01:00</updated><title type='text'>Parallel algorithms avaiable on CUDA,OCL,DC,CAL: status update</title><content type='html'>Linear algebra status update:&lt;br /&gt;Matmul:&lt;br /&gt;CUDA: 
CUBLAS (no source), Volkov (source), and yesterday's post (assembly, fastest to date at 480 GFLOPS)&lt;br /&gt;CAL: the Beyond3D post on 1 TFLOPS CAL matmul&lt;br /&gt;OCL: hazeman's post above ports the CAL code to a proprietary language that is similar to OpenCL C..&lt;br /&gt;DC: BernacleJunior's testing with doubles didn't work (XNA forums)&lt;br /&gt;Matvec:&lt;br /&gt;CUDA: CUBLAS (closed source), and some papers use custom code that is 20-50% faster (MAGMA; a mid-2008 paper)&lt;br /&gt;OCL: Bealto's post above (highly efficient on AMD and Nvidia) should be easy to port to DC&lt;br /&gt;Sparse matvec:&lt;br /&gt;CUDA: CNC, CUSP, etc..&lt;br /&gt;OCL, DC: BernacleJunior's posts on the AMD and XNA forums (work in progress)..&lt;br /&gt;&lt;br /&gt;FFT:&lt;br /&gt;CUDA: CUFFT; two SC08 papers with higher-performance 3D FFTs, and a 2D paper -&gt;d3dCx&lt;br /&gt;DC: has a library&lt;br /&gt;OCL:&lt;br /&gt;Apple's code seems 2x-3x slower than CUFFT (on Nvidia, Linux) (it is also slow on 10.6.2; see 10.6.3..)&lt;br /&gt;on AMD it doesn't work for sizes &gt;512^2 in 2.0 or 2.01; it seems to be fixed internally..&lt;br /&gt;The AMD 2.01 sample is hard-coded to 1024; what is its performance?&lt;br /&gt;&lt;br /&gt;Sort:&lt;br /&gt;CUDA: CUDPP, CUDA SDK sample (source)&lt;br /&gt;OCL, DC: BernacleJunior's posts on the AMD and XNA forums. He claims nearly 400 Mkeys/s, versus under 200 Mkeys/s for state-of-the-art Nvidia sorting on a GTX 285.&lt;br /&gt;Lee Howes also reportedly has fast code working!&lt;br /&gt;&lt;br /&gt;CUDPP also has triangular solvers, with graph algorithms and hashing coming soon..&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8553786559872430029-3815274440506015186?l=oscarbg.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://oscarbg.blogspot.com/feeds/3815274440506015186/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://oscarbg.blogspot.com/2010/02/parallel-algorithms-avaiable-on.html#comment-form' title='0 Comments'/><link rel='edit' 
type='application/atom+xml' href='http://www.blogger.com/feeds/8553786559872430029/posts/default/3815274440506015186'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8553786559872430029/posts/default/3815274440506015186'/><link rel='alternate' type='text/html' href='http://oscarbg.blogspot.com/2010/02/parallel-algorithms-avaiable-on.html' title='Parallel algorithms avaiable on CUDA,OCL,DC,CAL: status update'/><author><name>oscarbg</name><uri>http://www.blogger.com/profile/10172785372134961220</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8553786559872430029.post-810718209630106189</id><published>2010-02-19T17:08:00.015+01:00</published><updated>2010-02-19T21:33:16.255+01:00</updated><title type='text'>More news!</title><content type='html'>Here is some leftover news, plus some new news:&lt;br /&gt;UPDATE:&lt;br /&gt;&lt;blockquote&gt;1. AMD SKA lets you get AMD IL without having an AMD GPU, and also shows the tex:ALU ratio and other info for all AMD GPUs at the same time&lt;br /&gt;2. The AMD SDK ships the utils source, so now the Nvidia and AMD OCL SDKs can be compiled in VS2010!&lt;br /&gt;3. gDEBugger 5.5 doesn't detect AMD perf counters with 10.2 (I think it worked with the 9.12 hotfix);&lt;br /&gt;it is not working with the 10.3 beta either&lt;br /&gt;&lt;/blockquote&gt;*See next post &lt;a href="http://oscarbg.blogspot.com/2010/02/parallel-algorithms-avaiable-on.html"&gt;http://oscarbg.blogspot.com/2010/02/parallel-algorithms-avaiable-on.html&lt;/a&gt;&lt;br /&gt;*Fermi X2 is on track; possible launch date is May!&lt;br /&gt;*In 2-3 weeks we get the 5830 (high-performance, low-budget card), and a 2 GB 5870 with 6 mini-DisplayPort outputs on March 11!&lt;br /&gt;*A leaked Catalyst 10.3 beta is available, go search for it! 
(8.71.3 CAL 556)&lt;br /&gt;*GPU Computing Gems call&lt;br /&gt;*&lt;a href="http://forums.amd.com/devforum/messageview.cfm?catid=390&amp;amp;threadid=127963&amp;amp;enterthread=y"&gt;matmul by hazeman&lt;/a&gt;:&lt;br /&gt;it's an assembly-&amp;gt;C port similar to the 1 TFLOPS matmul CAL example. The downside is that it uses hazeman's own C-&amp;gt;IR compiler, but is it easy to port to OCL? And what about performance?&lt;br /&gt;*BernacleJunior is doing a good job on sort and sparse matvec in OCL and DC..&lt;br /&gt;He claims nearly 400 Mkeys/s, versus under 200 Mkeys/s for state-of-the-art Nvidia sorting on a GTX 285.&lt;br /&gt;Lee Howes also reportedly has fast code working!&lt;br /&gt;Some intermediate code has been posted on the XNA and AMD forums, but it is not the best version yet..&lt;br /&gt;*High-performance OCL matvec code from Bealto (tested on AMD and Nvidia).&lt;br /&gt;*I have tested the cubin-optimized matmul code and I get 480 GFLOP/s, not bad compared to 380 GFLOP/s&lt;br /&gt;I have also seen that the Tesla computing driver doesn't support overclocking in EVGA Precision..&lt;br /&gt;Also, GPU-Z and EVGA Precision don't read the core and memory clocks, GPU usage, or memory info;&lt;br /&gt;temperature and fan speed are fine..&lt;br /&gt;The full run takes very long, so (tested on VC2010 RC1)&lt;br /&gt;change this in autoprofile:&lt;br /&gt;&lt;blockquote&gt;profile_sgemm_square("../method1/decuda_ldsb32_cudasm.cubin", "method1_variant_sgemmNN", &amp;amp;method1_DrvWrapper, cat(OUTPUT_DIR,"method1/variant_threads320.txt") );&lt;br /&gt;profile_general_sgemm_square("../method6/decuda_ldsb32_cudasm.cubin", "method6_variant_sgemmNN", &amp;amp;method6_DrvWrapper, cat(OUTPUT_DIR,"method6/variant_threads320.txt") );&lt;br /&gt;profile_general_sgemm_square("../method7/decuda_ldsb32_cudasm.cubin", "method7_variant_sgemmNN", &lt;br /&gt;&amp;amp;method7_DrvWrapper, cat(OUTPUT_DIR,"method7/variant_threads320.txt") );&lt;br /&gt;profile_sgemm_square("../method8/decuda_ldsb32_cudasm.cubin", "method8_variant_sgemmNN", &lt;br /&gt;&amp;amp;method8_DrvWrapper, 
cat(OUTPUT_DIR,"method8/variant_threads256.txt") );&lt;/blockquote&gt;The variants are the fastest, and method 1 is the best (480 GFLOP/s). Also set&lt;br /&gt;&lt;blockquote&gt;for( n1 = 32 ; n1 &amp;lt;=  4096 ; n1+=96)&lt;/blockquote&gt;instead of&lt;br /&gt;&lt;blockquote&gt;for( n1 = 5 ; n1 &amp;lt;=  4096 ; n1++)&lt;/blockquote&gt;&lt;br /&gt;This makes the tests roughly 100x faster in&lt;br /&gt;&lt;blockquote&gt;-&amp;gt;profile_general_sgemm_square(profile_general_sgemm_suqare.cpp,profile_sgemm_suqare.cpp)&lt;br /&gt;-&amp;gt;profile_CUBLAS_overN&lt;/blockquote&gt;&lt;br /&gt;* I have also tested the sparse voxel demo and fixed it for TCC, but building the 1 GB samples crashes on the ball example (out of memory) with the x32 release exe, and the x64 build crashes anyway; I have to fix that..&lt;br /&gt;I also found the Sibenik and Fairy scenes, but I don't know how to build the sibenik-d example (displacement-mapped using bump map textures?)&lt;br /&gt;I have to test it..&lt;br /&gt;&lt;br /&gt;&lt;a href="http://www.opengl.org/discussion_boards/ubbthreads.php?ubb=showflat&amp;amp;Main=52473&amp;amp;Number=271470#Post271470"&gt;Antialiasing in Deferred shading GL code&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;&lt;a name='more'&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;New multivendor GL SM5.0 info found in 10.3:&lt;br /&gt;*GL_EXT_tessellation_shader   &lt;br /&gt;&lt;blockquote&gt;gl_TessCoord    gl_TessLevelOuter   gl_TessLevelInner&lt;/blockquote&gt;*GL_EXT_shader_subroutine&lt;br /&gt;*GL_EXT_gpu_shader5&lt;br /&gt;&lt;blockquote&gt;memoryBarrier   bitCount    findLSB findMSB bitfieldReverse bitfieldInsert  bitfieldExtract floatBitsToInt  floatBitsToUint intBitsToFloat  uintBitsToFloat&lt;/blockquote&gt;*GL_EXT_gpu_shader_fp64&lt;br /&gt;new in 10.3:&lt;br /&gt;*GL_EXT_shader_atomic_counters &lt;br /&gt;GL_MAX_ATOMIC_COUNTERS_EXT&lt;br /&gt;&lt;blockquote&gt;glResetAtomicCounter&lt;br /&gt;check fail: index must be a constant in atomic counter functions&lt;br /&gt;gl_MaxAtomicCountersEXT&lt;br /&gt;atomicCounterIncrementEXT   atomicCounterDecrementEXT   
atomicCounterEXT&lt;br /&gt;imageAtomicAdd  imageAtomicSub  imageAtomicMin  imageAtomicMax  imageAtomicIncWrap  imageAtomicDecWrap  imageAtomicAnd  imageAtomicOr   imageAtomicXor  imageAtomicExchange imageAtomicCompSwap&lt;/blockquote&gt;GL_EXT_texture_compression_bptc (replaces the AMD extensions)&lt;br /&gt;GL_AMD_conservative_depth&lt;br /&gt;&lt;br /&gt;OpenCL:&lt;br /&gt;1. Pinned memory is enabled on Nvidia via:&lt;br /&gt;&lt;blockquote&gt;I use&lt;br /&gt;&lt;br /&gt;host_mem = clCreateBuffer(context,&lt;br /&gt;CL_MEM_ALLOC_HOST_PTR | CL_MEM_READ_WRITE,&lt;br /&gt;size,NULL,&amp;amp;ocl_err);&lt;br /&gt;*ptr = (void*)clEnqueueMapBuffer(cmd_queue,host_mem,&lt;br /&gt;CL_TRUE,CL_MAP_READ|CL_MAP_WRITE,&lt;br /&gt;0,size,0,NULL,&amp;amp;evt,&amp;amp;ocl_err);&lt;br /&gt;&lt;br /&gt;to create page locked memory using the NVIDIA driver, where it works fine. However on my AMD card this makes no difference to malloced memory.&lt;/blockquote&gt;AMD staff confirm it still doesn't work on AMD.&lt;br /&gt;2. The CVS with the ICD code is for Khronos members only; the spec has been updated&lt;br /&gt;3. The new CL headers at Khronos list the funcaddress functions used by the ICD, and include cl_ext.h and cl_gl_ext.h&lt;br /&gt;Headers: http://www.khronos.org/registry/cl/&lt;br /&gt;4. OpenCL 2.01 on Ubuntu 9.10:&lt;br /&gt;&lt;blockquote&gt;You indeed have to boot with "nopat" or use Catalyst 10.2, when it becomes available. 
CAL version &amp;gt;= 1.4.553 to get this working without "nopat" option.&lt;/blockquote&gt;&lt;br /&gt;XvBA and other Linux video decoding updates:&lt;br /&gt;*Is the VA-API developer working on Crystal HD support?&lt;br /&gt;For Crystal HD demos:&lt;br /&gt;Crystal HD SDK from Git, as of 2010/02/15.&lt;br /&gt;*It is at least supported in the basic samples&lt;br /&gt;*XvBA now works with VLC 1.1 (git) and Gnash, via updates to xvba-vaapi and Gnash&lt;br /&gt;&lt;br /&gt;Status of XvBA:&lt;br /&gt;Works with MPlayer (ASS subtitles included), VLC and Gnash!&lt;br /&gt;Issues:&lt;br /&gt;1. First, broken decoding on 5xxx..&lt;br /&gt;2. Deinterlacing is broken in XvBA. It's the second most critical bug that has to be fixed by the end of April.&lt;br /&gt;(Only bob deinterlacing at this time. More elaborated deinterlacers are not, and won't be, exposed to the public builds of xvba-video.)&lt;br /&gt;&lt;br /&gt;Changelog:&lt;br /&gt;&lt;blockquote&gt;Version 0.6.5 - 08.Feb.2010&lt;br /&gt;* Add brightness/contrast/hue/saturation display attributes&lt;br /&gt;* Fix vaPutSurface() window resize. e.g. when switching to full-screen mode&lt;br /&gt;* Allow vaPutSurface() to render to multiple drawables from a single surface&lt;br /&gt;&lt;br /&gt;Notes:&lt;br /&gt;- My ProcAmp adjustments are probably not fully correct. e.g. hue doesn't preserve luminance yet. Besides, this uses an extra FBO.&lt;br /&gt;- The last change workarounds a bug in the driver and now makes it possible to use VA-API acceleration with Gnash with the the AGG renderer. However, this exhausts another performance problem (flickering in windowed mode) of the driver. You can workaround that with XVBA_VIDEO_PUTSURFACE_FAST set to "yes" or "1". The semantics are not fully equivalent and can cause problems, hence it's disabled by default though it's designed to work with Gnash and MPlayer.&lt;br /&gt;There is already native VA-API support for G45. 
At this time, it only does MPEG-2 VLD, i.e. full video decode. Intel is working on H.264 support and this should be available by Q2. I don't think there is any H.264 video decoding at Gallium3D level yet, so VDPAU / VA-API support would be useless at this time.&lt;/blockquote&gt;&lt;br /&gt;&lt;blockquote&gt;Version 0.6.6 - 11.Feb.2010&lt;br /&gt;* Fix XvBA objects destruction for fglrx &amp;gt;= 8.70.3&lt;br /&gt;* Fix vaPutImage() to a surface used for decoding&lt;br /&gt;* Fix vaGetImage()/vaPutSurface() with surface dimensions not a multiple of 16&lt;br /&gt;* Fix rendering of VA subpictures that were previously deassociated&lt;br /&gt;&lt;br /&gt;The third change is actually two different workarounds for a single and major flaw in XvBA. I have not fully regression tested but this looks OK for MPlayer, Gnash and VLC. This should fix Kano problems.&lt;br /&gt;The fourth change is a fix for MPlayer/VA-API with ASS support, and that I will probably upload tomorrow. I have to check against the latest Intel drivers first. NVIDIA is already fine.&lt;/blockquote&gt;With this mplayer-vaapi snapshot and xvba-video 0.6.6, ASS subtitles work!&lt;br /&gt;&lt;blockquote&gt;Version 0.6.7 - 18.Feb.2010&lt;br /&gt;* Use fail-safe values for H.264 videos encoded over HP@L4.1&lt;br /&gt;* Fix hue rotation to preserve luminance&lt;br /&gt;* Fix internal contrast range to [ 0.0f .. 10.0f ]&lt;br /&gt;* Fix rendering of multiple subpictures per surface&lt;br /&gt;* Fix vaCopySurfaceGLX() for surfaces with dimensions not a multiple of 16&lt;br /&gt;&lt;br /&gt;- The first change ensures that we don't crash or do weird things if we throw unsupported H.264 contents to the decoder. Wel, it&lt;br /&gt;tries to get things on a safer side, without really fixing it.&lt;br /&gt;&lt;br /&gt;- The ProcAmp changes are probably still not correct but this looks better for contrast and hue rotation.&lt;br /&gt;&lt;br /&gt;- The fourth change fixes rendering of multiple subpictures per surface. 
In particular, you can now have OSD + EOSD + ProcAmp&lt;br /&gt;adjustment bars (3 subpictures) in MPlayer without crashing the application.&lt;br /&gt;&lt;br /&gt;- The last change is a workaround for a serious XvBA flaw, now implemented in vaCopySurfaceGLX(). e.g. for mplayer -vo vaapi:gl -va&lt;br /&gt;vaapi. As a side effect, this would also workaround another limitation in the future iteration (0.6.8) whereby only GL_BGRA textures&lt;br /&gt;are supported at this time.&lt;/blockquote&gt;&lt;br /&gt;Mplayer vaapi&lt;br /&gt;&lt;blockquote&gt;Version 2010.02.12&lt;br /&gt;* Fix YV12 rendering for SW codecs&lt;br /&gt;* Add EOSD support (ASS subtitles)&lt;br /&gt;* Add compatibility with original VA-API 0.29&lt;br /&gt;* Add support for -geometry +xxx+yyy (Adam Strzelecki)&lt;br /&gt;&lt;br /&gt;For EOSD &amp;amp; AMD, you need xvba-video &amp;gt;= 0.6.6.&lt;/blockquote&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8553786559872430029-810718209630106189?l=oscarbg.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://oscarbg.blogspot.com/feeds/810718209630106189/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://oscarbg.blogspot.com/2010/02/more-news.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8553786559872430029/posts/default/810718209630106189'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8553786559872430029/posts/default/810718209630106189'/><link rel='alternate' type='text/html' href='http://oscarbg.blogspot.com/2010/02/more-news.html' title='More news!'/><author><name>oscarbg</name><uri>http://www.blogger.com/profile/10172785372134961220</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8553786559872430029.post-985438326660395631</id><published>2010-02-18T21:21:00.001+01:00</published><updated>2010-02-18T21:23:11.280+01:00</updated><title type='text'>Learned from voxel rendering demo code: CUDA 3.0 how to change cache size (for Fermi) function found!</title><content type='html'>It's in the voxel code:&lt;br /&gt;\efficient-sparse-voxel-octrees\src\framework\base\dllimport.inl&lt;br /&gt;cuFuncSetCacheConfig:&lt;br /&gt;cuFuncSetCacheConfig(CUfunction hfunc, CUfunc_cache config)&lt;br /&gt;There are also some other functions I didn't know about:&lt;br /&gt;cuGraphicsSubResourceGetMappedArray&lt;br /&gt;cuGetExportTable&lt;br /&gt;&lt;br /&gt;Also, they don't use GLEW and initialize..&lt;br /&gt;Other tricks:&lt;br /&gt;CPU trick:&lt;br /&gt;&lt;blockquote&gt;// Force the main thread to run on a single core.&lt;br /&gt;SetThreadAffinityMask(GetCurrentThread(), 1);&lt;/blockquote&gt;GPU trick:&lt;br /&gt;&lt;blockquote&gt;flags |= CU_CTX_SCHED_SPIN;         // use sync() if you want to yield&lt;br /&gt;#if (CUDA_VERSION &gt;= 2030)&lt;br /&gt;flags |= CU_CTX_LMEM_RESIZE_TO_MAX; // reduce launch overhead with large localmem&lt;br /&gt;#endif&lt;br /&gt;what about CU_CTX_LMEM_RESIZE_TO_MAX?&lt;/blockquote&gt;&lt;br /&gt;The voxel raycasting demo also has good code: GUI controls, and stereo OpenGL rendering 
for Quadros!&lt;br /&gt;It also has good multisampling code..&lt;br /&gt;&lt;br /&gt;You can also see the functions added since 2.1:&lt;br /&gt;&lt;blockquote&gt;#if (CUDA_VERSION &gt;= 2020)&lt;br /&gt;FW_DLL_IMPORT_RETV( CUresult,   CUDAAPI,    cuDriverGetVersion,                     (int *driverVersion), (driverVersion))&lt;br /&gt;FW_DLL_IMPORT_RETV( CUresult,   CUDAAPI,    cuMemHostAlloc,                         (void **pp, size_t bytesize, unsigned int Flags), (pp, bytesize, Flags))&lt;br /&gt;FW_DLL_IMPORT_RETV( CUresult,   CUDAAPI,    cuMemHostGetDevicePointer,              (CUdeviceptr *pdptr, void *p, unsigned int Flags), (pdptr, p, Flags))&lt;br /&gt;FW_DLL_IMPORT_RETV( CUresult,   CUDAAPI,    cuFuncGetAttribute,                     (int *pi, CUfunction_attribute attrib, CUfunction hfunc), (pi, attrib, hfunc))&lt;br /&gt;FW_DLL_IMPORT_RETV( CUresult,   CUDAAPI,    cuTexRefSetAddress2D,                   (CUtexref hTexRef, const CUDA_ARRAY_DESCRIPTOR *desc, CUdeviceptr dptr, unsigned int Pitch), (hTexRef, desc, dptr, Pitch))&lt;br /&gt;FW_DLL_IMPORT_RETV( CUresult,   CUDAAPI,    cuWGLGetDevice,                         (CUdevice *pDevice, HGPUNV hGpu), (pDevice, hGpu))&lt;br /&gt;#endif&lt;br /&gt;&lt;br /&gt;#if (CUDA_VERSION &gt;= 2030)&lt;br /&gt;FW_DLL_IMPORT_RETV( CUresult,   CUDAAPI,    cuMemHostGetFlags,                      (unsigned int *pFlags, void *p), (pFlags, p))&lt;br /&gt;FW_DLL_IMPORT_RETV( CUresult,   CUDAAPI,    cuGLSetBufferObjectMapFlags,            (GLuint buffer, unsigned int Flags), (buffer, Flags))&lt;br /&gt;FW_DLL_IMPORT_RETV( CUresult,   CUDAAPI,    cuGLMapBufferObjectAsync,               (CUdeviceptr *dptr, unsigned int *size,  GLuint buffer, CUstream hStream), (dptr, size, buffer, hStream))&lt;br /&gt;FW_DLL_IMPORT_RETV( CUresult,   CUDAAPI,    cuGLUnmapBufferObjectAsync,             (GLuint buffer, CUstream hStream), (buffer, hStream))&lt;br /&gt;#endif&lt;br /&gt;&lt;br /&gt;#if (CUDA_VERSION &gt;= 3000)&lt;br 
/&gt;FW_DLL_IMPORT_RETV( CUresult,   CUDAAPI,    cuMemcpyDtoDAsync,                      (CUdeviceptr dstDevice, CUdeviceptr srcDevice, unsigned int ByteCount, CUstream hStream), (dstDevice, srcDevice, ByteCount, hStream))&lt;br /&gt;FW_DLL_IMPORT_RETV( CUresult,   CUDAAPI,    cuFuncSetCacheConfig,                   (CUfunction hfunc, CUfunc_cache config), (hfunc, config))&lt;br /&gt;FW_DLL_IMPORT_RETV( CUresult,   CUDAAPI,    cuGraphicsUnregisterResource,           (CUgraphicsResource resource), (resource))&lt;br /&gt;FW_DLL_IMPORT_RETV( CUresult,   CUDAAPI,    cuGraphicsSubResourceGetMappedArray,    (CUarray *pArray, CUgraphicsResource resource, unsigned int arrayIndex, unsigned int mipLevel), (pArray, resource, arrayIndex, mipLevel))&lt;br /&gt;FW_DLL_IMPORT_RETV( CUresult,   CUDAAPI,    cuGraphicsResourceGetMappedPointer,     (CUdeviceptr *pDevPtr, unsigned int *pSize, CUgraphicsResource resource), (pDevPtr, pSize, resource))&lt;br /&gt;FW_DLL_IMPORT_RETV( CUresult,   CUDAAPI,    cuGraphicsResourceSetMapFlags,          (CUgraphicsResource resource, unsigned int flags), (resource, flags))&lt;br /&gt;FW_DLL_IMPORT_RETV( CUresult,   CUDAAPI,    cuGraphicsMapResources,                 (CUgraphicsResource *resources, CUstream hStream), (resources, hStream))&lt;br /&gt;FW_DLL_IMPORT_RETV( CUresult,   CUDAAPI,    cuGraphicsUnmapResources,               (unsigned int count, CUgraphicsResource *resources, CUstream hStream), (count, resources, hStream))&lt;br /&gt;FW_DLL_IMPORT_RETV( CUresult,   CUDAAPI,    cuGetExportTable,                       (const void **ppExportTable, const CUuuid *pExportTableId), (ppExportTable, pExportTableId))&lt;br /&gt;FW_DLL_IMPORT_RETV( CUresult,   CUDAAPI,    cuGraphicsGLRegisterBuffer,             (CUgraphicsResource *pCudaResource, GLuint buffer, unsigned int Flags), (pCudaResource, buffer, Flags))&lt;br /&gt;FW_DLL_IMPORT_RETV( CUresult,   CUDAAPI,    cuGraphicsGLRegisterImage,              (CUgraphicsResource *pCudaResource, GLuint 
image, GLenum target, unsigned int Flags), (pCudaResource, image, target, Flags))&lt;br /&gt;#endif&lt;/blockquote&gt;&lt;br /&gt;It currently fails with the Tesla Compute Cluster (TCC) driver.&lt;br /&gt;In CudaModule::staticInit(void), replace&lt;br /&gt;checkError("cuGLCtxCreate", cuGLCtxCreate(&amp;s_context, flags, s_device));&lt;br /&gt;with&lt;br /&gt;if(tcc)&lt;br /&gt;{&lt;br /&gt;checkError("cuCtxCreate", cuCtxCreate(&amp;s_context, flags, s_device));&lt;br /&gt;//res = cuGLInit();&lt;br /&gt;}&lt;br /&gt;else&lt;br /&gt;checkError("cuGLCtxCreate", cuGLCtxCreate(&amp;s_context, flags, s_device));&lt;br /&gt;cuGLInit() may be needed, but it is deprecated anyway;&lt;br /&gt;should it be called right after cuInit(0), or after cuCtxCreate?&lt;br /&gt;&lt;br /&gt;Also, if TCC were smarter this would just work, falling back&lt;br /&gt;to host-side interop as CUDA already does; instead, I think&lt;br /&gt;all CUDA GL functions simply return an error under TCC..&lt;br /&gt;&lt;br /&gt;Anyway, thanks to the good code, one more change is enough:&lt;br /&gt;change Buffer::Hint_CudaGL in CudaRenderer::CudaRenderer(void) to Buffer::Hint_None,&lt;br /&gt;so:&lt;br /&gt;:   m_frameBuffer       (NULL, 0, Buffer::Hint_None),//Buffer::Hint_CudaGL),&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8553786559872430029-985438326660395631?l=oscarbg.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://oscarbg.blogspot.com/feeds/985438326660395631/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://oscarbg.blogspot.com/2010/02/learned-from-voxel-rendering-demo-code.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8553786559872430029/posts/default/985438326660395631'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8553786559872430029/posts/default/985438326660395631'/><link rel='alternate' type='text/html' 
href='http://oscarbg.blogspot.com/2010/02/learned-from-voxel-rendering-demo-code.html' title='Learned from voxel rendering demo code: CUDA 3.0 how to change cache size (for Fermi) function found!'/><author><name>oscarbg</name><uri>http://www.blogger.com/profile/10172785372134961220</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8553786559872430029.post-6567644056846847830</id><published>2010-02-18T19:49:00.003+01:00</published><updated>2010-02-18T19:49:41.981+01:00</updated><title type='text'>A month of news!</title><content type='html'>So here goes all the random news I found interesting this past month:&lt;br /&gt;* AMD CAL libs coming to Mac? In PGI 10.2, pgaccelinfo includes -ati and -amd switches to report ATI accelerator info. This is in the Mac release too, where it says libamdcalcl.dylib is not found, so it seems it is not working yet?&lt;br /&gt;This would close the gap: standard OpenCL on 3 OSes, and also CUDA and CAL on all three OSes..&lt;br /&gt;Remember the related news that PGI was interested in using the Nouveau stack as the basis of a GPU computing stack for OpenSolaris and FreeBSD, after Nvidia talked about Solaris and demonstrated(?) it at GTC08.. but that is now dead..&lt;br /&gt;On Windows, to get pgaccelinfo working, copy aticalcl.dll to libamdcalcl.dll in its directory (and the same for calrt), and it works.. So it really seems PGI has AMD CAL for Mac.. 
since it links against the dylib, no?&lt;br /&gt;I hope they don't spend too much energy on it, since OpenCL is a better target for the PGI Accelerator model..&lt;br /&gt;Perhaps it's worth mailing streamdeveloper amd dot com to ask..&lt;br /&gt;* Par4All does automatic parallelization for CUDA, etc..&lt;br /&gt;* After the AMD assembly matmul kernel achieved 1 TFLOPS on 48xx hardware (58xx should reach 2 TFLOPS),&lt;br /&gt;we now have an assembly-optimized matmul for Nvidia with 10-20% better performance.&lt;br /&gt;Search for "Hand-Tuned SGEMM on GT200 GPU, 10% ~ 20% improvement of SGEMM" &lt;br /&gt;It reaches 512 GFLOPS on a GTX 285 -&gt; 1 TFLOPS matmul on Fermi? (like Larrabee? see the ACM video)&lt;br /&gt;It comes with code and a report..&lt;br /&gt;It also includes a trick: using asm("") with inline PTX in a .cu kernel works via nvcc, thanks to Open64 features..&lt;br /&gt;*Nvidia has released new demo videos on YouTube: the "fluid demo" for the Fermi launch, Parallel Nsight (Nexus), and one about the Sled demo with an engineer talking about it..&lt;br /&gt;* I still don't know whether any opencl.dll from Khronos works with Nvidia and AMD cards simultaneously..&lt;br /&gt;someone says the 2.01 opencl.dll works with both at once..&lt;br /&gt;I'm not sure, but AMD seems to work with the Nvidia opencl.dll if I have the Tesla computing driver&lt;br /&gt;Relatedly, the Khronos ICD has been released. Things to remember: you can now write a compatible OpenCL ICD using the documentation, and functions that cannot be resolved to a concrete platform through the ICD, such as clUnloadCompiler, are no-ops..&lt;br /&gt;Another thing: the 2.01 DLL seems to have D3D10/D3D9 interop functions(?), or these are obtained via&lt;br /&gt;the ICD, which supports functions not exported through it; I must check..&lt;br /&gt;Also, Nvidia has D3D11 interop; what about AMD?..&lt;br /&gt;Also, the spec has some CVS links at Khronos for getting code (the ICD loader code?), so someone could mail Jon Leech at Khronos, for example, 
for the Khronos CVS ICD password..&lt;br /&gt;*CUDPP now has the triangular solvers from the 2010 paper..&lt;br /&gt;still waiting for the SA2009 paper's hash functions to be added..&lt;br /&gt;A survey has also been released about where to devote more effort: double support, graph functions, etc..&lt;br /&gt;*A negative article by Demerjian about Fermi:&lt;br /&gt;http://www.semiaccurate.com/2010/02/17/nvidias-fermigtx480-broken-and-unfixable/&lt;br /&gt;but Nvidia seems confident: it set the Fermi clocks this week, and the mid-range and other cards apparently taped out some time ago..&lt;br /&gt;*CUSP is progressing towards dense math(?): it seems to have dense matmul and an LU solver.&lt;br /&gt;* clGetGLContextInfoKHR is still not usable, although it is present in the 2.01 header (was it there in 2.0?)&lt;br /&gt;Its name also appears as a string in the Khronos ICD DLL, but it is not really in the lib and DLLs..&lt;br /&gt;*Linux news:&lt;br /&gt;Catalyst 10.2 has Direct2D-based acceleration (search Phoronix)&lt;br /&gt;Nouveau now also has Gallium3D support in Fedora 13 (working OpenGL ES 2.0 and OpenVG state trackers?)&lt;br /&gt;Heaven benchmark for Linux coming in March, at GDC? 
A new version for sure (it seems to support Fermi too, as Catalyst 10.2 now exposes all the big SM 5.0 features through EXTs: double support (ext_fp64), shader model 5.0 (ext_shader5), tessellation (ext_tessellation_shader))&lt;br /&gt;still no standard EXTs for the new HDR texture compression (shipping, but undocumented), and similarly for random access targets..&lt;br /&gt;*OpenGL and OpenVG demos:&lt;br /&gt;Some nice code and tutorials found on the web:&lt;br /&gt;-&gt;OpenGL geometry shader single-pass cubemap rendering (3 ways)&lt;br /&gt;-&gt;OpenGL geometry culling: from 2.1 billion to 2 million; works on ATI and Nvidia; it's OpenGL 3.2 code..&lt;br /&gt;-&gt;Complex OpenVG demo from the SA 2009 Khronos presentation (animation)&lt;br /&gt;-&gt;OpenGL uniforms vs texture objects.&lt;br /&gt;-&gt;Hardware Tessellation on Radeon in OpenGL (geeks3d):&lt;br /&gt;says there are two tessellators in the 5xxx extensions&lt;br /&gt;-&gt;Mali SDK UI 2.3, Tegra Khronos SDK..&lt;br /&gt;-&gt;Code from the Stanford iPhone OpenGL ES course&lt;br /&gt;http://www.khronos.org/news/multimedia/optimizing-opengl-for-iphone-stanford-university &lt;br /&gt;-&gt; OpenGL 3.2 samples:&lt;br /&gt;http://nopper.tv/opengl_3_2.html&lt;br /&gt;g-truc -&gt;OpenGL 3 Samples Pack 1.2.1 released&lt;br /&gt;*Also it seems WebGL released its spec at GDC09, as there were some talks from Khronos.. Firefox 3.7 will also have it; the roadmap plans it for mid-year, and it's now at alpha 2.&lt;br /&gt;*I have found an ACM video of Rattner at SC09 showing a Larrabee demo of matmul and sparse math..&lt;br /&gt;More videos are from the AMD OpenCL PhD guy..&lt;br /&gt;*It would be nice if OptiX got upgraded with:&lt;br /&gt;-&gt;the breadth-first packet traversal and ray compression via sorting paper (EG 2010).. does it improve on the Timo Aila kernels used in OptiX?&lt;br /&gt;it improves ray tracing 2x-4x for shadow kernels &lt;br /&gt;-&gt;the Sparse Voxel Raycasting paper from I3D 2010&lt;br /&gt;-&gt;OpenRL compatibility.. 
the differences seem small..&lt;br /&gt;Regarding I3D 2010, for me it only remains to look at Stochastic Transparency by Enderton..&lt;br /&gt;Also OpenRL is going to Khronos, similar to how OpenCL was submitted by Apple.. I must check its similarities to OpenRT (a previous standard)&lt;br /&gt;* It seems AMD drivers for Windows 7 have a bug in GDI mode:&lt;br /&gt;The same article has some info on GDI acceleration on XP and 7 but not Vista..&lt;br /&gt;Also, in 7 it's in Aero only..&lt;br /&gt;GDI bug on the 5xxx series:&lt;br /&gt;http://www.tomshardware.com/reviews/2d-windows-gdi,2547-15.html&lt;br /&gt;AMD has supplied a hotfix, and it seems 10.2 WHQL doesn't contain it, so perhaps 10.3? or 10.4&lt;br /&gt;good theory about GDI on Windows.. disabled in Vista..&lt;br /&gt;download 2DBench from Tom's Hardware to check performance..&lt;br /&gt;* OpenNL 3.0 released with CUDA numerical libraries (are CNC and CUSP similar?)&lt;br /&gt;* The sparse voxel octree I3D 2010 paper is available, along with an extended NV tech report ('10 #1) with more photos and GTX 285 performance.. 
the video and code are also available in the Google Code CUDA voxel raycasting project..&lt;br /&gt;see the Real-Time Rendering blog post..&lt;br /&gt;* The full Tegra 2 SDK &lt;br /&gt;now has Android 2.1 images and the full Khronos SDK (Tegra Khronos SDK)..&lt;br /&gt;also it seems video compression via OpenMAX on Linux and Android is already there?..&lt;br /&gt;* The current Catalyst is 10.2 (8.70.2) WHQL, and 8.70.3 is also available (it only changes the OpenGL version; no CAL or D3D changes)&lt;br /&gt;the 10.3 beta given to the press is ATI 8.71.3..&lt;br /&gt;Now, about it:&lt;br /&gt;info about the 3D hooks is needed, and it would be good if they enabled OpenGL quad-buffered stereo on Radeon..&lt;br /&gt;even better would be an SDK with a sample of the D3D driver hooks, similar to how 3D Vision is used in Avatar..&lt;br /&gt;*There is a GPU-Z version with ATI OpenCL detection; I don't know if it checks correctly or just reports it as enabled..&lt;br /&gt;*There is now GDC 2010 info from Nvidia on developer.nvidia.com and from Intel&lt;br /&gt;From Intel expect:&lt;br /&gt;-&gt;GPA 3.0&lt;br /&gt;You'll see in-depth, real-time demos of GPA 3.0, including the much anticipated advanced thread/task timeline that helps optimize task-based threading. New features such as automated summarization of your game engine’s performance on multi-core CPUs, the DirectX API, and the GPU will have you breathing a sigh of relief. Platform performance analysis has finally arrived.&lt;br /&gt;-&gt;Intel C++ Compiler version 12 info&lt;br /&gt;This session includes a review of the new automatic vectorization features in the upcoming Intel C++ Compiler version 12.&lt;br /&gt;-&gt;Tickertape&lt;br /&gt;Shows a highly-threaded particle system with orientable quads — like paper in a parade. Particles are affected not only by gravity, but also by air resistance and wind.&lt;br /&gt;*The book Programming ... 
by Kirk has been released; it's a CUDA book..&lt;br /&gt;Materials are here:&lt;br /&gt;http://www.elsevierdirect.com/companion.jsp?ISBN=9780123814722&lt;br /&gt;There is also a 3-chapter sample..&lt;br /&gt;*On Khronos I have found an NVIDIA OpenCL build dated 2010-02-03 &lt;br /&gt;Released soon?&lt;br /&gt;Also an ARM Cortex-A9 one:&lt;br /&gt;Samsung Electronics   2010-02-03   OpenCL_1_0&lt;br /&gt;Embedded Linux System with SAMSUNG OpenCL Library with OpenCL running on a ARM Cortex-A9 MPCore CPU.&lt;br /&gt;* Realistic demo from Crymod: Widet2_Benchmark_alpha.7z&lt;br /&gt;*From Caustic Graphics:&lt;br /&gt;"due to be released in March"&lt;br /&gt;&lt;br /&gt;&lt;blockquote&gt;OpenRL™ SDK Public BETA Registration&lt;br /&gt;&lt;br /&gt;Caustic Graphics is about to achieve our next major milestone in bringing cinema quality graphics to every display. We are introducing our OpenRL SDK V1.0 restricted BETA release this week, which is the first implementation of our Open Ray Tracing Language (OpenRL) specification. The OpenRL SDK also includes our new OpenRL shading language (RLSL), which is based on GLSL and provides run-time compiled programmable shaders for ray tracing. &lt;br /&gt;&lt;br /&gt;Similar to OpenGL for rasterization, the OpenRL specification is a framework for writing ray tracing applications that execute across heterogeneous compute platforms. Today there is no open standard, cross-platform API for ray tracing. Consequently developers must program their ray tracing applications "to the metal" or accept “vendor lock-in” by using a proprietary closed standard that is limited to a specific subset of hardware. &lt;br /&gt;&lt;br /&gt;Later this year, we will be proposing the OpenRL specification as an open standard to the non-profit technology consortium, the Khronos Group. Moreover, we will actively solicit and support the introduction of third-party implementations of OpenRL. 
In the meantime, we are pleased to introduce the first implementation of the OpenRL specification, which we are calling the OpenRL SDK. &lt;br /&gt;&lt;br /&gt;Some quick facts and features slated for the OpenRL SDK:&lt;br /&gt;OS support for Windows, Mac OS X, and Linux;&lt;br /&gt;Uses all OpenCL-based GPUs (e.g., AMD, nVidia, S3) and x86 CPUs (AMD, Intel) simultaneously;&lt;br /&gt;Adding more compute delivers an immediate and nearly linear performance boost;&lt;br /&gt;Plugging in one or more CausticOne or CausticTwo cards delivers the ultimate in ray tracing acceleration.&lt;br /&gt;Target markets include but are not limited to, Film, Video, Games, Transportation, Education, Consumer Products, Architecture, Engineering, and Construction. &lt;br /&gt;&lt;br /&gt;We would like to invite you to participate in our OpenRL BETA public program, slated for release this quarter. The OpenRL SDK Public BETA program will include free access to our developer forum where you can post your questions and answers to the OpenRL SDK, RLSL, CausticOne and CausticTwo. &lt;br /&gt;&lt;br /&gt;Fill out the form below. Upon release we will send you an email with instructions to download the OpenRL SDK. &lt;br /&gt;&lt;br /&gt;P.S. - For those of you who signed up for the CausticRT Emulator, well don't fret. 
The OpenRL SDK name supersedes CausticRT and CausticGL, whose names will be retired upon release of the production version of the OpenRL SDK.&lt;br /&gt;&lt;/blockquote&gt;So it's OpenCL-based and being submitted to Khronos..&lt;br /&gt;S3 support intrigues me, as no S3 driver seems to support OpenCL yet?&lt;br /&gt;&lt;br /&gt;*gDEBugger 5.5 adds support for AMD performance counters (Catalyst 9.12 and up)&lt;br /&gt;Also gDEBugger CL is in beta soon..&lt;br /&gt;*ATI OpenCL 2.01 released&lt;br /&gt;it at least fixes pcchen's 8 - knights demo..&lt;br /&gt;The bugs with Apple's FFT code are still not fixed, but they are reportedly fixed internally at AMD..&lt;br /&gt;I still don't know whether OpenCL OpenMM is fixed, nor about the early Pyrit builds that now have countermeasures..&lt;br /&gt;&lt;br /&gt;*10.6.3: check OpenGL 3.2, Nvidia doubles, OpenCL ATI image support and ATI CAL&lt;br /&gt;&lt;br /&gt;RAW:&lt;br /&gt;Catalyst 10.2 and 10.3 news (8.71.3): quad-buffered 3D for D3D (can quad-buffered 3D be enabled in OpenGL via OpenCL/DX/OpenGL interop?)&lt;br /&gt;XvBA doesn't work on 58xx, but a patch similar to the earlier 4xxx card bug fix will fix it&lt;br /&gt;fglrx 10.4 Ubuntu driver fixed by then..&lt;br /&gt;PGI 10.2 pgaccelinfo has CAL info and reports libamdcal.dylib not found (does AMD have CAL for Mac?)&lt;br /&gt;GDC: Eyefinity SDK?&lt;br /&gt;&lt;br /&gt;&lt;blockquote&gt;Catalyst 10.2 has 181 GL extensions!&lt;br /&gt;&lt;br /&gt;3 new, 1 EXT, 2 ARB:&lt;br /&gt;GL_ARB_blend_func_extended - more enhancements to blending? what's left in DX10/11 that OGL doesn't have?&lt;br /&gt;&lt;br /&gt;GL_ARB_fragment_coord_conventions - DX9 compatibility (wasn't this in OpenGL 3.2?!? 
still missing transform_feedback2) [not in the 9.12 hotfix]&lt;br /&gt;&lt;br /&gt;GL_EXT_texture_buffer_object_rgb32 - this one is interesting as GL_ARB_texture_buffer_object already lists all the RGBA32 F, I, and UI.&lt;br /&gt;[note: I saw this in the Fermi 195 drivers]&lt;br /&gt;&lt;br /&gt;Also I note that 2 amd extensions have been documented:&lt;br /&gt;http://www.opengl.org/registry/specs/AMD/seamless_cubemap_per_texture.txt - when did this get added?&lt;br /&gt;http://www.opengl.org/registry/specs/AMD/shader_stencil_export.txt - from 10.1&lt;br /&gt;&lt;br /&gt;Wonder how far away we are from GL 3.3. Still haven't seen DX11 stuff yet, but they must be working on it!&lt;br /&gt;&lt;br /&gt;Can't see any sign of the rumored (or under NDA) per-game application profile support yet in CCC. Supposed to be in 10.2...&lt;/blockquote&gt;&lt;br /&gt;Tesla Computing driver released: 196.28, 64-bit, Windows Server 2008 R2: OpenCL support? Nexus with ATI? compute-exclusive mode, timeout&lt;br /&gt;*Still no compiler fixes and no doubles in the February 2010 DirectX SDK &lt;br /&gt;*Fermi: 4x slowdown for doubles&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8553786559872430029-6567644056846847830?l=oscarbg.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://oscarbg.blogspot.com/feeds/6567644056846847830/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://oscarbg.blogspot.com/2010/02/month-of-news.html#comment-form' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8553786559872430029/posts/default/6567644056846847830'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8553786559872430029/posts/default/6567644056846847830'/><link rel='alternate' type='text/html' href='http://oscarbg.blogspot.com/2010/02/month-of-news.html' title='A month of 
news!'/><author><name>oscarbg</name><uri>http://www.blogger.com/profile/10172785372134961220</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8553786559872430029.post-5664756594054799049</id><published>2010-02-18T15:39:00.005+01:00</published><updated>2010-02-18T21:24:03.122+01:00</updated><title type='text'>About Tesla computing driver!</title><content type='html'>Hi boys,&lt;br /&gt;I'm becoming increasingly lazy about publishing stories.. sorry for that..&lt;br /&gt;A good megacompilation is coming this week..&lt;br /&gt;Anyway, today's topic is &lt;a href="http://forums.nvidia.com/index.php?showtopic=159208"&gt;old news&lt;/a&gt;: installing the &lt;a href="http://www.nvidia.com/object/winserver_2008R2_64bit_Tesla_196.28_whql.html"&gt;Tesla Computing driver (196.28)&lt;/a&gt; (&lt;a href="http://www.nvidia.com/content/webinar/Tesla_Fermi_Webinar_Dec16_09_v1_0.pdf"&gt;see info (slide 35)&lt;/a&gt;) &lt;br /&gt;&lt;blockquote&gt;Tesla Compute Cluster (TCC) Driver&lt;br /&gt;Enables Windows HPC on Tesla&lt;br /&gt;Enables Tesla without a NVIDIA graphics card with Windows 7, Server 2008 R2, Windows Vista, Server 2008&lt;br /&gt;Only Tesla 8-series, 10-series and 20-series supported&lt;br /&gt;Only works with CUDA&lt;br /&gt;Does not support OpenGL and DirectX&lt;br /&gt;Available in beta now, release in Jan 2010&lt;br /&gt;Enables the following features under Windows with CUDA:&lt;br /&gt;RDP (Remote Desktop)&lt;br /&gt;Launch CUDA applications via Windows Services&lt;br /&gt;No Windows Timeout issues&lt;br /&gt;No penalty on launch overhead&lt;br /&gt;KVM-over-IP enabled (CPU Server on-board graphics chipset enabled)&lt;/blockquote&gt;&lt;br /&gt;Theoretically it is only for Tesla and 64-bit Windows 2008 R2, but I have successfully installed it on Windows 7 with a GTX 275..&lt;br /&gt;so installing the compute 
driver on GeForce is possible: &lt;br /&gt;just locate NVWD.inf and add&lt;br /&gt;under [NVIDIA_SetA_Devices.NTamd64.6.0]&lt;br /&gt;%NVIDIA_DEV.05E6.01% = Section001, PCI\VEN_10DE&amp;DEV_05E6 &lt;br /&gt;and under [NVIDIA_SetA_Devices.NTamd64.6.1]&lt;br /&gt;%NVIDIA_DEV.05E6.01% = Section002, PCI\VEN_10DE&amp;DEV_05E6 &lt;br /&gt;this is for the GTX 275; for other cards, look up your card's entry in the INF..&lt;br /&gt;&lt;br /&gt;It only supports CUDA at the moment (CUDA C/C++).. I have also checked that PGI Fortran detects it, so CUDA Fortran should work too..&lt;br /&gt;OpenCL doesn't work currently, but I think they will add support for it in the 200 series..&lt;br /&gt;For DirectCompute I don't have much hope.., as device detection is for graphics devices and the driver doesn't expose DirectX..&lt;br /&gt;&lt;br /&gt;I have to test Badaboom to see if CUVID works, and also whether CUDA video encoding via kernels (or CUVENC..) works&lt;br /&gt;I have to test whether OptiX works (I hope so, but not the graphics demos; I would save the render to a file and check)&lt;br /&gt;And what about PhysX.. Theoretically it should work, as no graphics interop is currently enabled; at least this would make ATI rendering + PhysX work..&lt;br /&gt;Have to test..&lt;br /&gt;At least it has no graphics API dependencies &lt;br /&gt;&lt;br /&gt;Also, what about other CUDA-heavy programs such as Vreveal.. &lt;br /&gt;&lt;br /&gt;Lastly, it would be good if it supported Nexus: I have ATI+Nvidia, but the normal Nvidia driver only enables CUDA if you extend the desktop to *at least ONE Nvidia GPU*..&lt;br /&gt;i.e. with 2 Nvidia cards you can use Nexus, but ATI+Nvidia doesn't work: extending the desktop crashes the GPU debugger, and not extending it leaves the Nvidia driver disabled (a Windows 7 limitation?)..&lt;br /&gt;&lt;br /&gt;Official response:&lt;br /&gt;The Nexus Beta currently only officially supports the 195.62 driver from nvidia.com. 
Regarding support of the TCC driver, it is not currently supported, but Nexus debugging support using TCC is something we are considering for a future build.&lt;br /&gt;&lt;br /&gt;What I posted in the thread linked above:&lt;br /&gt;&lt;blockquote&gt;Hi,&lt;br /&gt;some questions:&lt;br /&gt;I have tested it on a GTX 275 (adding the device ID to the driver INF) with an AMD 5850 as the display card, and it works well..&lt;br /&gt;(hmm, having said that, please don't block this possibility, the way you are going to limit double-precision potential by slowing it down 4x on GeForce Fermi cards..)&lt;br /&gt;By "Only CUDA is supported in this release", do you mean OpenCL is not supported right now?&lt;br /&gt;I have tested the OpenCL oclDeviceQuery sample and it fails to find a platform ID; clearly the driver includes neither opencl.dll nor nvcompiler.dll..&lt;br /&gt;Using those DLLs from 196.34 doesn't work, although nvcuda.dll seems to have Khronos ICD entry points..&lt;br /&gt;Are you going to support OpenCL on the Tesla driver soon?&lt;br /&gt;I have not tested it, but are CUDA programs using textures meant to work? I assume yes, although textures are a graphics feature..&lt;br /&gt;Also, what about DirectCompute? i.e. are you going to support DirectCompute apps that don't use graphics?&lt;br /&gt;And finally Nexus: I have access to the Nexus beta, which supports single-machine debugging using two Nvidia cards in one PC.. 
Is this supported when using an ATI card as the display device, now that the Nvidia card doesn't need an extended desktop?&lt;br /&gt;I hope you add Nexus support to the Tesla Computing driver if it isn't supported right now..&lt;br /&gt;&lt;br /&gt;&lt;/blockquote&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/8553786559872430029-5664756594054799049?l=oscarbg.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://oscarbg.blogspot.com/feeds/5664756594054799049/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://oscarbg.blogspot.com/2010/02/about-tesla-computing-driver.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/8553786559872430029/posts/default/5664756594054799049'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/8553786559872430029/posts/default/5664756594054799049'/><link rel='alternate' type='text/html' href='http://oscarbg.blogspot.com/2010/02/about-tesla-computing-driver.html' title='About Tesla computing driver!'/><author><name>oscarbg</name><uri>http://www.blogger.com/profile/10172785372134961220</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-8553786559872430029.post-8828015878808923832</id><published>2010-02-05T16:29:00.352+01:00</published><updated>2010-02-18T19:50:25.256+01:00</updated><title type='text'>A long report of the silence before the storm: AKA a month before Fermi..</title><content type='html'>Sorry, this is a raw dump of my ideas:&lt;br /&gt;&lt;br /&gt;Although we are a month away from a complete storm, if we listen carefully we can hear some thunder from that storm known as Fermi and the new software 
updates:&lt;br /&gt;&lt;br /&gt;First, the basic reading: the graphics architecture (Nvidia GF100) and the compute architecture (Fermi)..&lt;br /&gt;&lt;br /&gt;also see the Deep Dive presentation, which has more performance charts than the PDF, on noticias3d.com or ..&lt;br /&gt;Also, although not widely known, there were two more Deep Dive sessions that were not much talked about: the developer relations program, with info about the Sled demo, and Nexus graphics debugging (the first video I have seen of HLSL debugging, as so far only a CUDA debugging video had been posted).&lt;br /&gt;Search on the cz page..&lt;br /&gt;&lt;br /&gt;Tesla computing driver&lt;br /&gt;GFX cards:&lt;br /&gt;4x slower doubles?&lt;br /&gt;&lt;br /&gt;As you will know, the graphics architecture reveal showed revamped geometry power via parallel rasterizers (4 of them, so 4x performance) and 16x geometry power by replicating the geometry hardware 16 times..&lt;br /&gt;also, geometry buffers and stream-out buffers now use the L1/L2 caches (and atomics?), so they are much faster and more general (removing fixed-function hardware)..&lt;br /&gt;this can be seen as at least a removal of fixed functions and a generalization that lets the rasterizer work in parallel..&lt;br /&gt;This makes a geometry-heavy game like Crysis 60% faster; not bad, and shader power is also expected to increase by nearly 2x..&lt;br /&gt;and I think of GF100 as 4 GPUs in one chip, one per GPC.., as each GPC has all it needs..&lt;br /&gt;&lt;br /&gt;right now it's the GTX 480 and 470, with H.264 MVC support (Blu-ray 3D; by the way, the HDMI 1.4 3D spec is open) (will it be exposed in DXVA or what? 
also in CUVID, VDPAU and/or CUVENC?..)&lt;br /&gt;as you know, on the Mac, GPU video encoding is supported by Elemental, and video decoding by a lousy API (QTKit) that doesn't expose decoded frames as OpenGL textures or OpenCL image objects..&lt;br /&gt;Elemental ships its GPU decoding in 2.2, so we have to see whether it uses CUVID, Snow Leopard APIs, or shaders..&lt;br /&gt;&lt;br /&gt;Also, I have seen HDMI 1.4 outputs on Fermi, and this would be marvelous for feeding 3D Vision output to Sony 3D monitors (but which glasses would I use?)&lt;br /&gt;&lt;br /&gt;Lastly, 3D Vision now has tri-SLI and quad-SLI support and supports all the new 24-inch monitors (3 or 4 right now); I have seen a 27-inch monitor from ASUS for early June, and I think panels with 3D Vision and touch support are being sampled.. but remember:&lt;br /&gt;YouTube 3D Vision support, windowed support and browser integration are promised soon..&lt;br /&gt;&lt;br /&gt;There are reports that claim&lt;br /&gt;&lt;br /&gt;SA 2009 courses, things learned:&lt;br /&gt;SC 2009 courses, things learned:&lt;br /&gt;I3D 2010, things learned:&lt;br /&gt;&lt;br /&gt;it would be perfect for Fraps capturing of 3D Vision&lt;br /&gt;One thing I'm sad about is that this will not be of use for not halting the OS, and also in&lt;br /&gt;I hope Nvidia is working on it, at least for the near future this year..&lt;br /&gt;I can't understand why that would not be the case..&lt;br /&gt;&lt;br /&gt;1. 
Although this is not strictly Fermi-related, I expect the much-needed updates of OpenCL on Mac OS X and DirectCompute on Windows to arrive within a month..&lt;br /&gt;&lt;br /&gt;Direct3D SDK updates are much needed: it has been some 5 months since the last update (released 1.5 months before the Windows 7 launch), something like prehistory in this rapidly changing world :-)&lt;br /&gt;I hope for a GDC 2010 release (so 6 months later) with at least the important fixes for all known issues: double support, the CS library (FFT, scan), and other fixes reported on the XNA forums..&lt;br /&gt;&lt;br /&gt;It would also be good if some samples shown in the Fermi Deep Dive session at CES were released, as they seem to be DirectX samples, such as the hair demo or the tessellated water demo.. AMD did the same with its 5xxx code (search for "contributed by AMD" in the Direct3D SDK)..&lt;br /&gt;&lt;br /&gt;Good ocean demos have also been shown, by Nvidia (an OpenCL port of the DirectCompute code) and by AMD in the SA 2009 OpenCL session.. it would be good to have these..&lt;br /&gt;&lt;br /&gt;I also hope Nvidia ships more DirectCompute demos in GPU Computing SDK 3.0 final or beta 2, which I hope will be released by Fermi time..&lt;br /&gt;&lt;br /&gt;I also hope cuPrintf, released two months ago, gets integrated into the CUDA Toolkit or SDK, and is hopefully ported to OpenCL for GPU printf debugging support (as mentioned, AMD supports printf on the CPU under Linux, with MSVC support coming).. Anyway, I expect OpenCL support to be somewhat restricted due to the lack of template support, etc..&lt;br /&gt;I would port it to OpenCL myself, but it's confidential stuff right now anyway..&lt;br /&gt;&lt;br /&gt;More on debugging later..&lt;br /&gt;&lt;br /&gt;I also want to talk a lot more about CUDA SDK 3.0, e.g. ELF, cuda-memcheck, CUDA driver/runtime interop, etc.. 
but I will wait until the final PTX 2.0 and 1.5 (OCL) specs and the docs are updated..&lt;br /&gt;&lt;br /&gt;As a checkpoint, it would be good to know how ECC and the L1/shared memory split are configured and enabled..&lt;br /&gt;I remember seeing something about ECC in the Control Panel of some released Quadro 195 driver..&lt;br /&gt;but I don't know how the L1/shared memory split is going to be selected (a parameter to nvcc?, a CUDA API function, etc..)&lt;br /&gt;&lt;br /&gt;10.6.3 is coming this month and has OpenGL 3.x support (well, 3.0 it seems) (although Netkas claims it is not complete, as OpenGL Extensions Viewer doesn't report the required GLSL 1.50 support; I think this is because no info on GL 3.x context creation has been published, so the viewer isn't creating an advanced context, but the extensions are there.. also, compared to 10.6.2, I see two more 3.2 extensions supported; not bad.. I only hope they are two interesting ones and not DirectX helper extensions.. give me those plus uniform buffer objects and TBOs from 3.1 and I would be more than happy..)&lt;br /&gt;So I hope these are at least supported as extensions in the Nvidia driver or the AMD 5xxx driver..&lt;br /&gt;at Netkas, it seems they are reporting the software renderer's extensions..&lt;br /&gt;oh boy, if Apple cared less about a stable platform and shipped GPU extensions as fast as they appear on Windows and Linux, it would be perfect.. I don't care about OpenGL 3.x being implemented in software; that seems as mad as Microsoft caring about the DirectX reference rasterizer for running actual games (ahem, it has WARP..)&lt;br /&gt;If not, at least expect 3.1 complete by summer (=10.6.4 or 10.6.5) and perhaps 3.2 by the end of this year.. 
so it seems 3.2 could be complete this year..&lt;br /&gt;By that time I hope to also have these optional 3.2 extensions:&lt;br /&gt;GL_ARB_draw_buffers_blend&lt;br /&gt;GL_ARB_sample_shading&lt;br /&gt;GL_ARB_texture_cube_map_array&lt;br /&gt;GL_ARB_texture_gather&lt;br /&gt;GL_ARB_texture_query_lod&lt;br /&gt;or at least&lt;br /&gt;GL_ARB_sample_shading&lt;br /&gt;GL_ARB_texture_cube_map_array&lt;br /&gt;GL_ARB_texture_gather&lt;br /&gt;would be good for me.&lt;br /&gt;&lt;br /&gt;The news is that WWDC will show 10.7.0, and if you remember, in 2008 the WWDC seed had GT200 support, so perhaps complete 3.2 and Fermi support will come with the 10.7.0 WWDC seed..&lt;br /&gt;&lt;br /&gt;Also, although a bit premature, it would be good if the initial 5xxx support, and the Fermi support hopefully coming this year, also added the new shader model 5.0 extensions (more later)&lt;br /&gt;for me it would be perfect, similar to how Leopard gained a lot of new G80 extensions for Nvidia in 10.5.2 (geometry shaders, transform feedback, etc..)..&lt;br /&gt;&lt;br /&gt;OpenCL for Mac OS: FFT library performance fixes; also expect some improvements such as double support for Nvidia on GT2xx cards and ATI image support; at least this is where I would put my effort if I were Apple.. Still, the bad thing is that Apple has no 5xxx support, and AMD 4xxx cards don't have true local memory, but this could change fast if the rumors are true about a Mac Pro expected to ship this or next month with 24 hardware threads (2x 6-core 32nm Westmere) and hopefully a 5xxx card as an option, so perhaps good..&lt;br /&gt;&lt;br /&gt;Before leaving Mac OS, I also expect CUDA updates for 3.x:&lt;br /&gt;Talking CUDA on Mac OS:&lt;br /&gt;you have cuda-memcheck&lt;br /&gt;cuda-gdb is coming soon.. will it add OpenCL at that time too?&lt;br /&gt;CUDA 64-bit support (for 3.x)&lt;br /&gt;efficient CUDA/OpenGL interop support (not expected, but possible)&lt;br /&gt;also it would be good if hackintosh users could use Fermi with CUDA 3.0 on Mac OS..&lt;br /&gt;i.e. 
if cuda.kext exposes access to it..&lt;br /&gt;&lt;br /&gt;Also remember that Fermi support will not be complete in the 3.0 release, well, at least unless a beta 2 is released in March and 3.0 final is delayed until June/summer..&lt;br /&gt;so expect a lot more in 3.1 and perhaps some minor things in 3.2&lt;br /&gt;if you didn't follow the GT200 introduction: 2.0 had double support and shared memory atomics, but mapped pinned host memory, a GT200 feature, didn't arrive until 2.2..&lt;br /&gt;Among the things said not to be present at first are support for recursion and, I think, also virtual function calls and function pointers, but I could be proven wrong..&lt;br /&gt;&lt;br /&gt;Of course this hardware fe
