<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' xmlns:gd='http://schemas.google.com/g/2005' xmlns:thr='http://purl.org/syndication/thread/1.0'><id>tag:blogger.com,1999:blog-5021319479788698734</id><updated>2012-01-12T11:45:11.842+08:00</updated><category term='data parallelism'/><category term='GPU'/><category term='High Performance Computing'/><category term='Segmentation fault'/><category term='Win32 Disassembly'/><category term='CUDA 4.0'/><category term='pfiles'/><category term='CUDA 2.3'/><category term='plimit'/><category term='junit'/><category term='concurrent programming'/><category term='Bug'/><category term='HPCSIG'/><category term='Major GC'/><category term='dbx'/><category term='Parallel Algorithms'/><category term='loop unrolling'/><category term='BMT'/><category term='Endian'/><category term='BEA WebLogic'/><category term='Administration'/><category term='MongoDB'/><category term='er_src'/><category term='euclidean distance'/><category term='JNI Reference'/><category term='Debugger'/><category term='Parallel programming'/><category term='volatile'/><category term='Solaris'/><category term='load testing'/><category term='JMS'/><category term='process stack'/><category term='Dtrace'/><category term='prun'/><category term='IronPython'/><category term='eventlet'/><category term='Geekcampsg'/><category term='collective intelligence'/><category term='Automation'/><category term='MPI'/><category term='lvalue'/><category term='CUDA 2.1'/><category term='SecondLife'/><category term='Parallel paradigm'/><category term='CMT'/><category term='Sun Studio 12'/><category term='PyCUDA'/><category term='mac os'/><category term='Coulomb&apos;s Law'/><category term='HPC'/><category term='Dynamic Language Runtime'/><category term='Memory Leaks'/><category 
term='Java Native Interface'/><category term='bash'/><category term='Geek camp'/><category term='MxCsr'/><category term='OpenSolaris'/><category term='parallelization'/><category term='JVMTI'/><category term='CUDA 2.2'/><category term='dynamic languages'/><category term='xautopar'/><category term='FP'/><category term='JVMPI'/><category term='Unix Inode'/><category term='performance testing'/><category term='OOP'/><category term='unit testing'/><category term='Garbage Collection'/><category term='Pydoc'/><category term='J2EE'/><category term='Intel'/><category term='File Descriptor'/><category term='task parallelism'/><category term='CUFFT'/><category term='MDB'/><category term='Python'/><category term='python yield'/><category term='DLR'/><category term='boost coroutines'/><category term='string matching algorithms'/><category term='collaborative filtering'/><category term='debugging'/><category term='SNL'/><category term='MapReduce'/><category term='free software foundation'/><category term='SIGUSR'/><category term='.Net'/><category term='CUDA 2.0'/><category term='greenlet'/><category term='floating point control word'/><category term='coroutines'/><category term='pointer aliasing'/><category term='OpenMP'/><category term='Linden Lab'/><category term='problem solving'/><category term='parallel string matching algorithms'/><category term='Artificial Intelligence'/><category term='Virtual Functions in C++'/><category term='libumem'/><category term='ctypes'/><category term='Scala'/><category term='Nvidia'/><category term='Type Erasure'/><category term='Java Thread'/><category term='CPU'/><category term='unit test'/><category term='truss'/><category term='CUDA'/><category term='pstack'/><category term='TUT'/><category term='OpenCL'/><category term='Cilk'/><category term='thread parallelism'/><category term='CUBLAS'/><category term='pearson coefficient'/><category term='ulimit'/><category term='ftrap'/><category term='SEGV'/><category term='instrument'/><category 
term='math'/><category term='JVM'/><category term='implicits'/><category term='gdb'/><category term='cygwin-x'/><category term='btrace'/><category term='bss'/><category term='C/C++'/><category term='FSF'/><category term='web crawling'/><category term='Scala implicit resolution'/><category term='core dump'/><category term='Minor GC'/><category term='multicore'/><category term='Tsung'/><category term='VB'/><category term='Java'/><category term='NoSQL'/><category term='PGI'/><category term='C#'/><category term='PythonWin'/><category term='compiler pragma'/><category term='Microsoft Windows'/><category term='gcore'/><category term='GPGPU'/><category term='Ruby'/><category term='Linux'/><category term='Java Byte Codes'/><category term='cpp unit tests'/><category term='Scala functions'/><category term='CUDA 1.1'/><category term='gcc'/><category term='IronRuby'/><category term='Catch exception'/><category term='jconsole'/><category term='Ruby Conference'/><category term='profiling'/><category term='OpenJDK'/><category term='event system'/><category term='Erlang'/><category term='MaxFDLimit'/><title type='text'>Raymond Tay</title><subtitle type='html'>My ramblings on technologies that interest me</subtitle><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://raymondtay.blogspot.com/feeds/posts/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5021319479788698734/posts/default?max-results=100'/><link rel='alternate' type='text/html' href='http://raymondtay.blogspot.com/'/><link rel='hub' href='http://pubsubhubbub.appspot.com/'/><link rel='next' type='application/atom+xml' href='http://www.blogger.com/feeds/5021319479788698734/posts/default?start-index=101&amp;max-results=100'/><author><name>Raymond Tay</name><uri>https://profiles.google.com/110897033209560123249</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' 
src='//lh4.googleusercontent.com/-4ZP-rPmLNSw/AAAAAAAAAAI/AAAAAAAABao/8f4AuUXNrno/s512-c/photo.jpg'/></author><generator version='7.00' uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>124</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>100</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-5021319479788698734.post-524070000347649875</id><published>2011-12-29T11:54:00.002+08:00</published><updated>2012-01-09T22:51:08.957+08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Scala'/><category scheme='http://www.blogger.com/atom/ns#' term='Scala implicit resolution'/><category scheme='http://www.blogger.com/atom/ns#' term='implicits'/><title type='text'>Scala's implicit need to be handled with care</title><content type='html'>A quick post on using Scala's &lt;i&gt;implicits&lt;/i&gt;. These constructs are really useful because they let you build rather complex expressions in type design and bridge frameworks from the Java world. 
However, I did run into an oddity while playing with them, and found it interesting.&lt;br /&gt;&lt;br /&gt;Let's begin with a simple example.&lt;br /&gt;&lt;br /&gt;&lt;span style="color: blue;"&gt;scala&amp;gt; object holder {&lt;/span&gt;&lt;br /&gt;&lt;span style="color: blue;"&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp;| trait Foo {&lt;/span&gt;&lt;br /&gt;&lt;span style="color: blue;"&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp;| implicit val x = new Foo { override def toString = "trait foo" } }&lt;/span&gt;&lt;br /&gt;&lt;span style="color: blue;"&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp;| object Foo {&lt;/span&gt;&lt;br /&gt;&lt;span style="color: blue;"&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp;| implicit val y = new Foo { override def toString = "object foo" } }&lt;/span&gt;&lt;br /&gt;&lt;span style="color: blue;"&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp;| }&lt;/span&gt;&lt;br /&gt;&lt;span style="color: blue;"&gt;defined module holder&lt;/span&gt;&lt;br /&gt;&lt;span style="color: blue;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="color: blue;"&gt;scala&amp;gt; import holder.Foo&lt;/span&gt;&lt;br /&gt;&lt;span style="color: blue;"&gt;import holder.Foo&lt;/span&gt;&lt;br /&gt;&lt;span style="color: blue;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="color: blue;"&gt;scala&amp;gt; def method(implicit foo: Foo) = println(foo)&lt;/span&gt;&lt;br /&gt;&lt;span style="color: blue;"&gt;method: (implicit foo: holder.Foo)Unit&lt;/span&gt;&lt;br /&gt;&lt;span style="color: blue;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="color: blue;"&gt;scala&amp;gt; method&lt;/span&gt;&lt;br /&gt;&lt;span style="color: blue;"&gt;java.lang.StackOverflowError&lt;/span&gt;&lt;br /&gt;&lt;span style="color: blue;"&gt;&lt;span class="Apple-tab-span" style="white-space: pre;"&gt; &lt;/span&gt;at holder$Foo$$anon$1.&amp;lt;init&amp;gt;(&amp;lt;console&amp;gt;:18)&lt;/span&gt;&lt;br /&gt;&lt;span style="color: blue;"&gt;&lt;span class="Apple-tab-span" 
style="white-space: pre;"&gt; &lt;/span&gt;at holder$Foo$class.$init$(&amp;lt;console&amp;gt;:18)&lt;/span&gt;&lt;br /&gt;&lt;span style="color: blue;"&gt;&lt;span class="Apple-tab-span" style="white-space: pre;"&gt; &lt;/span&gt;at holder$Foo$$anon$1.&amp;lt;init&amp;gt;(&amp;lt;console&amp;gt;:18)&lt;/span&gt;&lt;br /&gt;&lt;span style="color: blue;"&gt;&lt;span class="Apple-tab-span" style="white-space: pre;"&gt; &lt;/span&gt;at holder$Foo$class.$init$(&amp;lt;console&amp;gt;:18)&lt;/span&gt;&lt;br /&gt;&lt;span style="color: blue;"&gt;&lt;span class="Apple-tab-span" style="white-space: pre;"&gt; &lt;/span&gt;at holder$Foo$$anon$1.&amp;lt;init&amp;gt;(&amp;lt;console&amp;gt;:18)&lt;/span&gt;&lt;br /&gt;&lt;span style="color: blue;"&gt;&lt;span class="Apple-tab-span" style="white-space: pre;"&gt; &lt;/span&gt;at holder$Foo$class.$init$(&amp;lt;console&amp;gt;:18)&lt;/span&gt;&lt;br /&gt;&lt;span style="color: blue;"&gt;&lt;span class="Apple-tab-span" style="white-space: pre;"&gt; &lt;/span&gt;at holder$Foo$$anon$1.&amp;lt;init&amp;gt;(&amp;lt;console&amp;gt;:18)&lt;/span&gt;&lt;br /&gt;&lt;span style="color: blue;"&gt;&lt;span class="Apple-tab-span" style="white-space: pre;"&gt; &lt;/span&gt;at holder$Foo$class.$init$(&amp;lt;console&amp;gt;:18)&lt;/span&gt;&lt;br /&gt;&lt;span style="color: blue;"&gt;&lt;span class="Apple-tab-span" style="white-space: pre;"&gt; &lt;/span&gt;at holder$Foo$$anon$1.&amp;lt;init&amp;gt;(&amp;lt;console&amp;gt;:18)&lt;/span&gt;&lt;br /&gt;&lt;span style="color: blue;"&gt;&lt;span class="Apple-tab-span" style="white-space: pre;"&gt; &lt;/span&gt;at holder$Foo$class.$init$(&amp;lt;console&amp;gt;:18)&lt;/span&gt;&lt;br /&gt;&lt;div&gt;&lt;br /&gt;The interesting thing is that this is not the compiler failing to choose between "x" and "y" when the implicit is looked up in the call to "method". The lookup finds "y" in the companion object, but "x" is an eagerly initialized member of the trait itself, so constructing any Foo constructs another Foo for its "x", which constructs yet another, and so on. The alternating $init$ and &amp;lt;init&amp;gt; frames in the trace show that recursion running until the stack overflows.&lt;br /&gt;&lt;br /&gt;Another 
example involves Scala's package objects. To see how a problem can manifest itself, take a file named package.scala, following Scala's convention, with these contents:&lt;br /&gt;&lt;br /&gt;&lt;span style="color: blue;"&gt;package object foo {&lt;/span&gt;&lt;br /&gt;&lt;span style="color: blue;"&gt;&amp;nbsp; implicit def foo = new Foo&lt;/span&gt;&lt;br /&gt;&lt;span style="color: blue;"&gt;&amp;nbsp; implicit def foo2 = new Foo&lt;/span&gt;&lt;br /&gt;&lt;span style="color: blue;"&gt;}&lt;/span&gt;&lt;br /&gt;&lt;span style="color: blue;"&gt;package foo {&lt;/span&gt;&lt;br /&gt;&lt;span style="color: blue;"&gt;&amp;nbsp; class Foo {&lt;/span&gt;&lt;br /&gt;&lt;span style="color: blue;"&gt;&amp;nbsp; &amp;nbsp; override def toString = "FOO!"&lt;/span&gt;&lt;br /&gt;&lt;span style="color: blue;"&gt;&amp;nbsp; }&lt;/span&gt;&lt;br /&gt;&lt;span style="color: blue;"&gt;}&lt;/span&gt;&lt;br /&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;When you compile it with scalac, it generates a directory foo and dumps the class files there. Nothing to it. 
Now run scala, use the implicitly function, and you'll see what I mean.&lt;/div&gt;&lt;br /&gt;&lt;span style="color: blue;"&gt;Welcome to Scala version 2.9.1.final (Java HotSpot(TM) 64-Bit Server VM, Java 1.6.0_29).&lt;/span&gt;&lt;br /&gt;&lt;span style="color: blue;"&gt;Type in expressions to have them evaluated.&lt;/span&gt;&lt;br /&gt;&lt;span style="color: blue;"&gt;Type :help for more information.&lt;/span&gt;&lt;br /&gt;&lt;span style="color: blue;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="color: blue;"&gt;scala&amp;gt; implicitly[foo.Foo]&lt;/span&gt;&lt;br /&gt;&lt;span style="color: blue;"&gt;&amp;lt;console&amp;gt;:8: error: ambiguous implicit values:&lt;/span&gt;&lt;br /&gt;&lt;span style="color: blue;"&gt;&amp;nbsp;both method foo in package foo of type =&amp;gt; foo.Foo&lt;/span&gt;&lt;br /&gt;&lt;span style="color: blue;"&gt;&amp;nbsp;and method foo2 in package foo of type =&amp;gt; foo.Foo&lt;/span&gt;&lt;br /&gt;&lt;span style="color: blue;"&gt;&amp;nbsp;match expected type foo.Foo&lt;/span&gt;&lt;br /&gt;&lt;span style="color: blue;"&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; implicitly[foo.Foo]&lt;/span&gt;&lt;br /&gt;&lt;div&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;/div&gt;This isn't Scala's fault, really: the compiler can't possibly know which of the two you intended. It certainly is something you should be aware of, and I do think reporting the ambiguity is a good thing (rather than having the compiler quietly pick one).&lt;br /&gt;&lt;br /&gt;The idea is to limit the scope in which the &lt;i&gt;implicits&lt;/i&gt; exist, so as to reduce the amount of pain you go through when debugging.&lt;br /&gt;&lt;br /&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5021319479788698734-524070000347649875?l=raymondtay.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' 
type='application/atom+xml' href='http://raymondtay.blogspot.com/feeds/524070000347649875/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5021319479788698734&amp;postID=524070000347649875&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5021319479788698734/posts/default/524070000347649875'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5021319479788698734/posts/default/524070000347649875'/><link rel='alternate' type='text/html' href='http://raymondtay.blogspot.com/2011/12/scalas-implicit-need-to-be-handled-with.html' title='Scala&apos;s implicit need to be handled with care'/><author><name>Raymond Tay</name><uri>https://profiles.google.com/110897033209560123249</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh4.googleusercontent.com/-4ZP-rPmLNSw/AAAAAAAAAAI/AAAAAAAABao/8f4AuUXNrno/s512-c/photo.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5021319479788698734.post-4486197719074929828</id><published>2011-12-18T11:28:00.000+08:00</published><updated>2011-12-18T11:28:43.132+08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Scala'/><category scheme='http://www.blogger.com/atom/ns#' term='Scala functions'/><category scheme='http://www.blogger.com/atom/ns#' term='Type Erasure'/><title type='text'>Scala's type erasure in writing functions</title><content type='html'>I wanted to write a quick post for anyone out there who's interested. The topic of this post is type erasure in Scala. 
I won't dwell on history, but to get an idea of what it is (other than googling or consulting Wikipedia) you should read this well-written article&amp;nbsp;&lt;a href="http://lamp.epfl.ch/~emir/bqbase/2006/10/16/erasure.html"&gt;here&lt;/a&gt;.&lt;br /&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;I'd like to use an example to illustrate something here. Whilst playing with Scala, I found myself writing code like this:&amp;nbsp;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;pre&gt;&lt;/pre&gt;&lt;/div&gt;&lt;div&gt;class MyTest {&lt;/div&gt;&lt;div&gt;&amp;nbsp;def printSomething = println("blah blah blah")&lt;/div&gt;&lt;div&gt;&amp;nbsp;def compute1Value = 41 * 41&amp;nbsp;&lt;/div&gt;&lt;div&gt;&amp;nbsp;def compute2Value = (41*41*2, 1)&lt;/div&gt;&lt;div&gt;}&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;object SimpleTest extends App {&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;def test1(a : {def printSomething: Unit ; def compute1Value: Int; def compute2Value: Tuple2[Int, Int]} ) {&lt;/div&gt;&lt;div&gt;.... // do something&amp;nbsp;&lt;/div&gt;&lt;div&gt;}&lt;/div&gt;&lt;div&gt;&lt;/div&gt;&lt;div&gt;Nothing special here, I thought. I compiled it and it worked. But I thought I should be able to write something more generic than that, because the structural subtyping technique here is REALLY useful in real-world applications, though there are caveats because of type erasure (this post isn't about that).&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;So I proceeded to modify the type of 'a' in 'test1' to look like this:&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;pre&gt;&lt;/pre&gt;&lt;/div&gt;&lt;div&gt;...&lt;/div&gt;&lt;div&gt;def test1(a : {def print1Line: Unit; def fn1:Function0[Int]; def fntuple2: Function0[Tuple2[Int,Int]]}) {&lt;/div&gt;&lt;div&gt;.. // do something&lt;/div&gt;&lt;div&gt;}&lt;/div&gt;&lt;div&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;But this didn't compile. 
Hmm... (honestly, WTF was the word that came to my mind) what happened? AFAIK, it should work, because I have a valid function definition on both the LHS &amp;amp; RHS of the equation. However, after examining the Java bytecode output I realized what had happened, and the following definition works:&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;pre&gt;&lt;/pre&gt;&lt;/div&gt;&lt;div&gt;...&lt;/div&gt;&lt;div&gt;def test1(a : {def print1Line: Unit; def compute1Value:Int; def compute1Value2: Tuple2[Int,Int]} ) {&lt;/div&gt;&lt;div&gt;... // do something&lt;/div&gt;&lt;div&gt;}&lt;/div&gt;&lt;div&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Mystery solved. Don't get me wrong: you could probably find the answer in the Scala Language Specification, but I chose to disassemble the code instead. From the output below you can see that the result type of each method (int, scala.Tuple2) was determined and embedded directly in the method's signature; no Function0 is involved.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;pre&gt;&lt;/pre&gt;&lt;/div&gt;&lt;div&gt;...&lt;/div&gt;&lt;div&gt;&lt;div&gt;public int compute1Value();&lt;/div&gt;&lt;div&gt;&amp;nbsp; Code:&lt;/div&gt;&lt;div&gt;&amp;nbsp; &amp;nbsp;0:&lt;span class="Apple-tab-span" style="white-space: pre;"&gt; &lt;/span&gt;sipush&lt;span class="Apple-tab-span" style="white-space: pre;"&gt; &lt;/span&gt;1681&lt;/div&gt;&lt;div&gt;&amp;nbsp; &amp;nbsp;3:&lt;span class="Apple-tab-span" style="white-space: pre;"&gt; &lt;/span&gt;ireturn&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;public scala.Tuple2 compute1Value2();&lt;/div&gt;&lt;div&gt;&amp;nbsp; Code:&lt;/div&gt;&lt;div&gt;&amp;nbsp; &amp;nbsp;0:&lt;span class="Apple-tab-span" style="white-space: pre;"&gt; &lt;/span&gt;new&lt;span class="Apple-tab-span" style="white-space: pre;"&gt; &lt;/span&gt;#24; //class scala/Tuple2$mcII$sp&lt;/div&gt;&lt;div&gt;&amp;nbsp; &amp;nbsp;3:&lt;span 
class="Apple-tab-span" style="white-space: pre;"&gt; &lt;/span&gt;dup&lt;/div&gt;&lt;/div&gt;&lt;div&gt;...&lt;/div&gt;&lt;div&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Alternatively, the following code works too:&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;pre&gt;&lt;/pre&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;class MyAnonFnClass {&lt;/div&gt;&lt;div&gt;&amp;nbsp; def print1Line = println("blah blah blah")&lt;/div&gt;&lt;div&gt;&amp;nbsp; def fn1 = new Function0[Int] {&lt;/div&gt;&lt;div&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; def apply() = 41 * 41}&amp;nbsp;&lt;/div&gt;&lt;div&gt;&amp;nbsp; def fntuple2 = new Function0[Tuple2[Int,Int]] {&lt;/div&gt;&lt;div&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; def apply() = (41 * 41 * 2,1) }&lt;/div&gt;&lt;div&gt;}&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;object SimpleTest extends App {&lt;/div&gt;&lt;div&gt;def test1(a : {def print1Line: Unit; def fn1:Function0[Int]; def fntuple2: Function0[Tuple2[Int,Int]]}) {&lt;/div&gt;&lt;div&gt;... // do something&lt;/div&gt;&lt;div&gt;}&lt;/div&gt;&lt;div&gt;&lt;/div&gt;&lt;div&gt;This is the definition of "test1" that I like better, but the definitions of "fn1" and "fntuple2" are somewhat awkward. Still, some say it doesn't look right ... 
there're better ways of writing this and an example is that of James Iry at &lt;a href="http://stackoverflow.com/questions/5086769/function-type-definition-and-type-erasure-in-scala"&gt;stackoverflow&lt;/a&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5021319479788698734-4486197719074929828?l=raymondtay.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://raymondtay.blogspot.com/feeds/4486197719074929828/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5021319479788698734&amp;postID=4486197719074929828&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5021319479788698734/posts/default/4486197719074929828'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5021319479788698734/posts/default/4486197719074929828'/><link rel='alternate' type='text/html' href='http://raymondtay.blogspot.com/2011/12/scalas-type-erasure-in-writing.html' title='Scala&apos;s type erasure in writing functions'/><author><name>Raymond Tay</name><uri>https://profiles.google.com/110897033209560123249</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh4.googleusercontent.com/-4ZP-rPmLNSw/AAAAAAAAAAI/AAAAAAAABao/8f4AuUXNrno/s512-c/photo.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5021319479788698734.post-2856021653743117151</id><published>2011-10-12T09:29:00.002+08:00</published><updated>2011-10-12T09:30:15.495+08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='CUDA'/><category scheme='http://www.blogger.com/atom/ns#' term='Geekcampsg'/><category scheme='http://www.blogger.com/atom/ns#' term='Geek camp'/><title type='text'>GPU development needs a 
push in Singapore</title><content type='html'>Recently, I got to know of an event where geeks gather for one day every year. It turns out that &lt;a href="http://geekcamp.pbworks.com/w/page/8900309/FrontPage"&gt;Geekcamp&lt;/a&gt; is held once a year, with some companies providing sponsorship in the form of a venue, meals, refreshments, collateral, etc., and I thought that this year CUDA should get some of the spotlight. So I went and gave an introductory talk on the subject; the slides are &lt;a href="http://www.slideshare.net/RaymondTay1/introduction-to-cuda-geek-camp-singapore-2011"&gt;&lt;span id="goog_82349167"&gt;&lt;/span&gt;here&lt;span id="goog_82349168"&gt;&lt;/span&gt;&lt;/a&gt;. I was glad to see many enthusiasts and heartened by their questions.&lt;br /&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Lastly, I noticed that one of the organizers took photos of the event; they're on &lt;a href="http://www.flickr.com/photos/elfgoh/sets/72157627676375741/with/6203360083/"&gt;flickr&lt;/a&gt;.&amp;nbsp;&lt;/div&gt;&lt;script type="text/javascript"&gt;  var _gaq = _gaq || [];  _gaq.push(['_setAccount', 'UA-5323400-1']);  _gaq.push(['_trackPageview']);  (function() {    var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true;    ga.src = ('https:' == document.location.protocol ? 
'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';    var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s);  })();&lt;/script&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5021319479788698734-2856021653743117151?l=raymondtay.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://raymondtay.blogspot.com/feeds/2856021653743117151/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5021319479788698734&amp;postID=2856021653743117151&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5021319479788698734/posts/default/2856021653743117151'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5021319479788698734/posts/default/2856021653743117151'/><link rel='alternate' type='text/html' href='http://raymondtay.blogspot.com/2011/10/gpu-development-needs-push-in-singapore.html' title='GPU development needs a push in Singapore'/><author><name>Raymond Tay</name><uri>https://profiles.google.com/110897033209560123249</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh4.googleusercontent.com/-4ZP-rPmLNSw/AAAAAAAAAAI/AAAAAAAABao/8f4AuUXNrno/s512-c/photo.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5021319479788698734.post-4190948324817062524</id><published>2011-09-18T10:05:00.000+08:00</published><updated>2011-09-18T10:14:05.361+08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='CUDA'/><category scheme='http://www.blogger.com/atom/ns#' term='OpenCL'/><category scheme='http://www.blogger.com/atom/ns#' term='thread parallelism'/><category scheme='http://www.blogger.com/atom/ns#' term='data parallelism'/><category 
scheme='http://www.blogger.com/atom/ns#' term='task parallelism'/><title type='text'>OpenCL Programming Guide - A book review</title><content type='html'>Hi everyone,&lt;br /&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;This post is about a book I helped review over the past month. The book is called OpenCL Programming Guide, published by Addison Wesley, and one of its co-authors is Dr. Tim G. Mattson, one of the main people behind OpenCL.&amp;nbsp;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;First, let me begin with how I feel about the book, and then I'll proceed to give a more detailed review. When I first saw the book, I thought it was rather large (approx 600 pages) for a technology as young as OpenCL. The reason is that the book is split into 3 major sections: Section 1: All about OpenCL, Section 2: Case studies, Section 3: OpenCL API guide. My next impression was that the overall cover design is lacklustre, but the layout of the contents is good, clear and concise. It fits well in the hand, and I reckon it can be carried around on the bus, to the library or the shopping mall, wherever you feel like reading. I thoroughly enjoyed the case studies since those are real-world applications of OpenCL, and where more detail is desired, the appropriate references are provided.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Overall, I think this is a good guide for OpenCL and CUDA programmers, and a general audience would benefit, too. Thanks&amp;nbsp;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;This book covers the OpenCL 1.1 specification in its entirety, which is a good thing. 
I don't think I'll cover all the chapters, but I'll point out the chapters that stood out for me.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Chapter 1:&lt;/div&gt;&lt;div&gt;---------------&lt;/div&gt;&lt;div&gt;What I liked about this chapter was the cautionary note to readers that the implementation of work-groups (including execution concurrency) in OpenCL is largely vendor dependent. This is a significant difference from the equivalent in NVIDIA's CUDA model. I also liked that the authors reminded readers to be aware of the IEEE 754 floating-point support issues; they did make it known that the support would suffice for current heterogeneous platforms. The authors took pains, I believe, to lay the groundwork for OpenCL by going through the core concepts with clear explanations. I thought it was a pleasure to read :)&amp;nbsp;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;What I didn't like about this chapter was that there was too much write-up about the API; perhaps they could illustrate a few calls and point readers to the specs for details?&amp;nbsp;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Chapter 2:&lt;/div&gt;&lt;div&gt;---------------&lt;/div&gt;&lt;div&gt;This chapter begins with a sample OpenCL program, the classic "Hello World" application, and shows how to configure 3 IDEs to use OpenCL. I think this is helpful for folks who come to OpenCL having used Eclipse, CodeBlocks and Microsoft Visual Studio. I didn't quite appreciate the lengthy explanation of that example, but the book assumes you have no experience with OpenCL, so I think it still has tremendous value.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Chapter 3:&lt;/div&gt;&lt;div&gt;---------------&lt;/div&gt;&lt;div&gt;This section also suffered, imo, from elaborate explanation of the APIs ... in other words, kind of boring but nonetheless important. 
It will serve you well when you hit those bugs that come from not reading the API docs carefully. That said, this OpenCL 1.1 reference card is handy (http://www.khronos.org/files/opencl-1-1-quick-reference-card.pdf)&amp;nbsp;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Chapters 4, 5, 6:&lt;/div&gt;&lt;div&gt;---------------&lt;/div&gt;&lt;div&gt;These are the chapters I liked. They go into great detail on OpenCL data types such as "vector" and "half", and illustrate how to manipulate them through code snippets. They talk about implicit conversions among types, IEEE 754 rounding modes and the usual arithmetic operations you can apply to the new data types in OpenCL. Something that caught my eye was the fact that there isn't short-circuit evaluation for "vector" data types. The authors also introduce the kernel attribute qualifiers necessary for compiler optimization. They talk about the kinds of memory spaces allowed in OpenCL, and I liked that they pointed out that pointer types cannot be re-interpreted across different memory address spaces. Cool.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Coming back to vector data types, the authors didn't show whether binary operations apply, but they did show what happens when scalars are applied to vectors.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Atomic/synchronization functions are introduced in chapter 5, and they look more or less similar to CUDA's synchronization primitives.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;In Chapter 6, I liked that the authors mentioned that you can load NVIDIA's PTX into OpenCL. That's interoperability, but I suspect it's only pertinent to the OpenCL driver provided by NVIDIA.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Case studies&lt;/div&gt;&lt;div&gt;---------------&lt;/div&gt;&lt;div&gt;I totally love this section of the book, where many examples of real-world, working OpenCL applications are presented. 
This is where the money's at, i think.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;/div&gt;&lt;script type="text/javascript"&gt;  var _gaq = _gaq || [];  _gaq.push(['_setAccount', 'UA-5323400-1']);  _gaq.push(['_trackPageview']);  (function() {    var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true;    ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';    var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s);  })();&lt;/script&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5021319479788698734-4190948324817062524?l=raymondtay.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://raymondtay.blogspot.com/feeds/4190948324817062524/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5021319479788698734&amp;postID=4190948324817062524&amp;isPopup=true' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5021319479788698734/posts/default/4190948324817062524'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5021319479788698734/posts/default/4190948324817062524'/><link rel='alternate' type='text/html' href='http://raymondtay.blogspot.com/2011/09/opencl-programming-guide-book-review.html' title='OpenCL Programming Guide - A book review'/><author><name>Raymond Tay</name><uri>https://profiles.google.com/110897033209560123249</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh4.googleusercontent.com/-4ZP-rPmLNSw/AAAAAAAAAAI/AAAAAAAABao/8f4AuUXNrno/s512-c/photo.jpg'/></author><thr:total>3</thr:total><georss:featurename>Delirfance, Holland Village, 261 Holland Ave, Singapore 
278986</georss:featurename><georss:point>1.311636 103.79569</georss:point><georss:box>1.2957615 103.775949 1.3275105 103.81543099999999</georss:box></entry><entry><id>tag:blogger.com,1999:blog-5021319479788698734.post-3928069511183774783</id><published>2011-09-04T16:21:00.000+08:00</published><updated>2011-09-04T16:21:43.193+08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Debugger'/><category scheme='http://www.blogger.com/atom/ns#' term='debugging'/><category scheme='http://www.blogger.com/atom/ns#' term='C/C++'/><title type='text'>Unwind A Program's Stack</title><content type='html'>&lt;span style="font-size: large;"&gt;Memories&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Recalling my fond days taking CS101 more than a decade ago, one of the critical things i had to learn was how to debug a program. As time went on, i became interested in compilers and the technologies behind them and took up a Compiler Writing course. What a journey; i'd do it all over again, and this time i'd do better ;)&lt;br /&gt;&lt;br /&gt;In this post, i want to share with you how you can unwind the program stack using a C library, libunwind. Really cool stuff - wish i knew this earlier. Anyways, here goes...&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Question&lt;/b&gt;&amp;nbsp;Why would i ever need to unwind the program stack programmatically?&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Answer&lt;/b&gt;&amp;nbsp;This shouldn't come as a surprise: in many programming languages you would have added a&amp;nbsp;&lt;i&gt;catch exception clause&lt;/i&gt;, which basically unwinds the stack (when it encounters an error). What this library lets you do is unwind the stack under more favourable circumstances than receiving an exception ;)&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;a href="http://www.nongnu.org/libunwind"&gt;libunwind&lt;/a&gt;&amp;nbsp;- is exactly the library you need to build this sort of support into your programs. 
Imagine how cool it'll be if your friends can debug your program without GDB - they'll love you!&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size: large;"&gt;Building libunwind &amp;amp; issues&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Download libunwind or clone it through Git.&lt;br /&gt;&lt;br /&gt;When you first build/install the software, you'll likely encounter two problems&lt;br /&gt;&lt;br /&gt;&lt;blockquote&gt;/usr/include/bits/setjmp2.h:26:13: error: ‘longjmp’ aliased to undefined symbol ‘_longjmp’&lt;br /&gt;&lt;br /&gt;libtool: install: (cd /home/tayboonl/libunwind/src; /bin/bash /home/tayboonl/libunwind/libtool&amp;nbsp; --tag CC --mode=relink gcc -U_FORTIFY_SOURCE -fexceptions -Wall -Wsign-compare -XCClinker -nostartfiles -version-info 0:0:0 -o libunwind-setjmp.la -rpath /home/tayboonl/ray-opt/lib setjmp/longjmp.lo setjmp/siglongjmp.lo x86_64/longjmp.lo x86_64/siglongjmp.lo libunwind-elf64.la libunwind-x86_64.la libunwind.la -lc )&lt;br /&gt;mv: cannot stat `libunwind-setjmp.so.0.0.0': No such file or directory&lt;br /&gt;libtool: install: error: relink `libunwind-setjmp.la' with the above command before installing it&lt;br /&gt;make[3]: *** [install-libLTLIBRARIES] Error 1&lt;/blockquote&gt;&lt;br /&gt;At first i was stumped. After browsing the net, i discovered the main cause: on Ubuntu 11, GCC turns on the _FORTIFY_SOURCE security checks by default whenever optimization (-O2) is applied, and those checks think libunwind has flouted the rules. Admittedly, i'm not too concerned about security at this point, since what i wanted was to get this library built ASAP and use it. Following the Ubuntu wiki on compiler flags, the workaround is to pass&amp;nbsp;&lt;i&gt;-U_FORTIFY_SOURCE&lt;/i&gt;&amp;nbsp;(the -U option undefines the _FORTIFY_SOURCE macro, which switches those checks off) to the build.&lt;br /&gt;&lt;br /&gt;One thing you'll immediately notice is that the optimization level (-O2) is no longer applied, because setting CFLAGS yourself replaces configure's default flags. 
Just be aware of this.&lt;br /&gt;&lt;br /&gt;The second thing i discovered is that a critical library cannot be linked; the&amp;nbsp;&lt;i&gt;LDFLAGS&lt;/i&gt;&amp;nbsp;hack below works around the shared library creation error.&lt;br /&gt;&lt;b&gt;&lt;span style="font-size: small;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/b&gt;&lt;br /&gt;&lt;span style="font-size: small;"&gt;&lt;i&gt;./configure CFLAGS=-U_FORTIFY_SOURCE LDFLAGS=-L&amp;lt;full pathname&amp;gt;/libunwind/src/.libs --prefix=&amp;lt;prefix directory&amp;gt;&lt;/i&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;The last thing is that building the program requires a config.h header file to be present. I then realized that this file has been removed from the latest Ubuntu 11 kernel and, after checking a couple of forum posts and blogs, that this can be hacked around by simply creating an empty file, i.e.&amp;nbsp;&lt;span style="color: blue;"&gt;touch /usr/include/config.h&lt;/span&gt;&amp;nbsp;- the reason for double checking is that config.h has been used in various Linux releases to define variables (i guess that's why it's called a configuration header file). Building any simple program that uses the library looks something like this&lt;br /&gt;&lt;i&gt;&lt;br /&gt;&lt;/i&gt;&lt;br /&gt;&lt;span style="font-size: small;"&gt;&lt;i&gt;&lt;b&gt;gcc&lt;/b&gt;&amp;nbsp;(prog name.c) -&lt;b&gt;I&lt;/b&gt;(install dir of libunwind) -&lt;b&gt;L&lt;/b&gt;(location of libraries of libunwind) -&lt;b&gt;lunwind&lt;/b&gt;&amp;nbsp;-&lt;b&gt;o&lt;/b&gt;&amp;nbsp;(prog name)&lt;/i&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;A note of caution here: you really shouldn't leave the&amp;nbsp;&lt;i&gt;config.h&amp;nbsp;&lt;/i&gt;file you just created lying around like that.&lt;br /&gt;&lt;br /&gt;Here's a simple program that shows you how to create a&amp;nbsp;&lt;i&gt;show backtrace&lt;/i&gt;&amp;nbsp;command in C.&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;#define UNW_LOCAL_ONLY&lt;br /&gt;#include 
&amp;lt;libunwind.h&amp;gt;&lt;br /&gt;#include &amp;lt;stdio.h&amp;gt;&lt;br /&gt;&lt;br /&gt;void show_backtrace(void) {&lt;br /&gt;&lt;span class="Apple-tab-span" style="white-space: pre;"&gt;	&lt;/span&gt;unw_cursor_t cursor; unw_context_t uc;&lt;br /&gt;&lt;span class="Apple-tab-span" style="white-space: pre;"&gt;	&lt;/span&gt;unw_word_t ip, sp;&lt;br /&gt;&lt;br /&gt;&lt;span class="Apple-tab-span" style="white-space: pre;"&gt;	&lt;/span&gt;unw_getcontext(&amp;amp;uc);&lt;br /&gt;&lt;span class="Apple-tab-span" style="white-space: pre;"&gt;	&lt;/span&gt;unw_init_local(&amp;amp;cursor, &amp;amp;uc);&lt;br /&gt;&lt;span class="Apple-tab-span" style="white-space: pre;"&gt;	&lt;/span&gt;while( unw_step(&amp;amp;cursor) &amp;gt; 0) {&lt;br /&gt;&lt;span class="Apple-tab-span" style="white-space: pre;"&gt;		&lt;/span&gt;unw_get_reg(&amp;amp;cursor, UNW_REG_IP, &amp;amp;ip);&lt;br /&gt;&lt;span class="Apple-tab-span" style="white-space: pre;"&gt;		&lt;/span&gt;unw_get_reg(&amp;amp;cursor, UNW_REG_SP, &amp;amp;sp);&lt;br /&gt;&lt;span class="Apple-tab-span" style="white-space: pre;"&gt;		&lt;/span&gt;printf("ip = %lx, sp = %lx\n", (long) ip, (long) sp);&lt;br /&gt;&lt;span class="Apple-tab-span" style="white-space: pre;"&gt;	&lt;/span&gt;}&lt;br /&gt;}&lt;br /&gt;&lt;br /&gt;void simple_func_call_no_args() {&lt;br /&gt;&lt;span class="Apple-tab-span" style="white-space: pre;"&gt;	&lt;/span&gt;int i = 0, j = 0;&lt;br /&gt;&lt;span class="Apple-tab-span" style="white-space: pre;"&gt;	&lt;/span&gt;printf("value of i=%d, j=%d\n", i, j);&lt;br /&gt;&lt;span class="Apple-tab-span" style="white-space: pre;"&gt;	&lt;/span&gt;show_backtrace();&lt;br /&gt;}&lt;br /&gt;&lt;br /&gt;int main(int argc, char** argv) {&lt;br /&gt;&lt;span class="Apple-tab-span" style="white-space: pre;"&gt;	&lt;/span&gt;simple_func_call_no_args();&lt;br /&gt;&lt;span class="Apple-tab-span" style="white-space: pre;"&gt;	&lt;/span&gt;return 0;&lt;br /&gt;}&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;div 
class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5021319479788698734-3928069511183774783?l=raymondtay.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://raymondtay.blogspot.com/feeds/3928069511183774783/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5021319479788698734&amp;postID=3928069511183774783&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5021319479788698734/posts/default/3928069511183774783'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5021319479788698734/posts/default/3928069511183774783'/><link rel='alternate' type='text/html' href='http://raymondtay.blogspot.com/2011/09/unwind-programs-stack.html' title='Unwind A Program&apos;s Stack'/><author><name>Raymond Tay</name><uri>https://profiles.google.com/110897033209560123249</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh4.googleusercontent.com/-4ZP-rPmLNSw/AAAAAAAAAAI/AAAAAAAABao/8f4AuUXNrno/s512-c/photo.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5021319479788698734.post-8704489385966585190</id><published>2011-08-23T14:37:00.000+08:00</published><updated>2011-08-23T14:37:25.415+08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Parallel programming'/><category scheme='http://www.blogger.com/atom/ns#' term='Cilk'/><category scheme='http://www.blogger.com/atom/ns#' term='Parallel paradigm'/><title type='text'>Cilk</title><content type='html'>&lt;span style="font-size: x-large;"&gt;Cilk. 
Not Silk.&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;This is another parallel programming paradigm from MIT, where the design goal was to let the programmer concentrate on expressing parallel constructs while the Cilk runtime takes care of the scheduling etc. Cilk has a commercial edition, which was acquired by Intel in 2009. Recently, Intel &lt;a href="http://software.intel.com/en-us/articles/intel-cilk-plus/"&gt;open sourced Cilk Plus&lt;/a&gt;, and it is available in the "cilkplus" branch in GCC 4.7, so i'm expecting a lot of activity in the next 12 months.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size: large;"&gt;Issues getting Cilk to build on your operating system&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size: large;"&gt;Ubuntu 11&lt;/span&gt;&lt;br /&gt;This OS comes with version 4.5 of the GNU compiler suite, but there are issues when building the Cilk compiler, otherwise known as cilkc. The issue is that the Cilk-C compiler doesn't like &lt;i&gt;unnamed bit fields&lt;/i&gt;, though they are perfectly legal for typical C compilers on Mac OS X and Ubuntu 11/10.&lt;br /&gt;&lt;br /&gt;Solving the unnamed bit field problem is actually easy: give the field a name and make sure there's no name collision. There is 1 file you need to take care of; give its unnamed bit field a unique name and re-build your project.&lt;br /&gt;&lt;ul&gt;&lt;li&gt;/usr/include/bits/waitstatus.h&lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;&lt;span style="font-size: large;"&gt;Mac OS X&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;This OS comes with a pre-installed GNU compiler suite, version 4.2.1, customized for Apple. Again it faces problems similar to those encountered on Ubuntu 11. 
This time round the files are different, because although both are Unix-like systems (Mac OS X is not a Linux), their system header files are arranged differently.&lt;br /&gt;&lt;br /&gt;Fix the following files by providing a dummy name that's unique&lt;br /&gt;&lt;ul&gt;&lt;li&gt; /usr/include/stdlib.h&lt;/li&gt;&lt;li&gt;/usr/include/mac/i386/_structs.h&lt;/li&gt;&lt;/ul&gt;&lt;div&gt;Another issue concerns "&lt;i&gt;restrict&lt;/i&gt;" pointers and how the Cilk-C compiler fails to handle them with GCC 4.2.1 on Mac OS X. You can read up on them on Wikipedia or in your favourite C books, but the observation here is that Cilk doesn't understand the restricted pointers in the C API &lt;i&gt;&lt;span class="Apple-style-span" style="color: blue;"&gt;memcpy(...)&lt;/span&gt;&lt;/i&gt; when it's attempting to compile its example Cilk programs. It looks to me like a self-test of the compiler.&amp;nbsp;This obviously can be a TODO for an enthusiast. The simplest thing i did was to skip the build of those example programs, but that turns out to be a bad idea as the final binary "cilkc" cannot be generated.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Drats.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;The simplest solution is to replace the "offending" functions with their equivalents in "standard" C, e.g. 
coding a for-loop to replace &lt;i&gt;&lt;span class="Apple-style-span" style="color: blue;"&gt;memcpy(...)&lt;/span&gt;&lt;/i&gt; &amp;amp; &lt;span class="Apple-style-span" style="color: blue;"&gt;&lt;i&gt;strcpy(...)&lt;/i&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Having that done, cilkc was generated and its static/dynamic libraries are generated.&lt;br /&gt;&lt;br /&gt;Hopefully this helps and the next couple of posts, i'll write something up about using Cilk&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5021319479788698734-8704489385966585190?l=raymondtay.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://raymondtay.blogspot.com/feeds/8704489385966585190/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5021319479788698734&amp;postID=8704489385966585190&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5021319479788698734/posts/default/8704489385966585190'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5021319479788698734/posts/default/8704489385966585190'/><link rel='alternate' type='text/html' href='http://raymondtay.blogspot.com/2011/08/cilk.html' title='Cilk'/><author><name>Raymond Tay</name><uri>https://profiles.google.com/110897033209560123249</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh4.googleusercontent.com/-4ZP-rPmLNSw/AAAAAAAAAAI/AAAAAAAABao/8f4AuUXNrno/s512-c/photo.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5021319479788698734.post-7855281117326938716</id><published>2011-07-18T21:27:00.005+08:00</published><updated>2011-07-19T12:26:56.101+08:00</updated><category 
scheme='http://www.blogger.com/atom/ns#' term='MongoDB'/><category scheme='http://www.blogger.com/atom/ns#' term='MapReduce'/><category scheme='http://www.blogger.com/atom/ns#' term='NoSQL'/><title type='text'>MongoDB and JOINs</title><content type='html'>&lt;span class="Apple-style-span" style="font-size: large;"&gt;MongoDB's Great!&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;a href="http://www.mongodb.org/"&gt;MongoDB's&lt;/a&gt; a popular NoSQL database that i've been hearing about for the last 4 months or so, but i never had time to try it until last week. I guess the lack of enthusiasm has a lot to do with the fact that 'traditional' databases like Oracle can be quite a RPITA to work with but they've withstood the test of time - though i admit things are changing pretty fast these days. MongoDB is really a good database tool but it's got a catch!&lt;br /&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-size: large;"&gt;MongoDB's Great BUT ...&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;This catch was a direct result of my mistaken assumption that MongoDB supports multiple table queries, a.k.a. SQL JOIN, as the SQL expression below illustrates&lt;br /&gt;&lt;br /&gt;'&lt;span class="Apple-style-span" style="color: blue;"&gt;SELECT CAR.plate, CAR.owner, Fines.owed from CAR, Fines where CAR.owner = Fines.doneBy&lt;/span&gt;'&lt;br /&gt;&lt;br /&gt;which would retrieve the plate, owner and amount owed for every car owner who has incurred a traffic fine. Contrived really, but i quickly realized it couldn't be done with MongoDB, at least not now anyway. I was kind of annoyed, since i'd come to expect the same capabilities i get from Oracle databases, but forgiving once i realized MongoDB != OracleDB.&lt;br /&gt;&lt;br /&gt;MongoDB is different. Accept that. MongoDB is simpler. 
Accept that too.&lt;br /&gt;&lt;br /&gt;Anyone who's played with MongoDB will realize that its operations are built on the concepts of &lt;i&gt;&lt;span class="Apple-style-span" style="color: blue;"&gt;collection&lt;/span&gt;&lt;/i&gt; &amp;amp; &lt;i&gt;&lt;span class="Apple-style-span" style="color: blue;"&gt;document&lt;/span&gt;&lt;/i&gt;; in SQL parlance these are &lt;i&gt;&lt;span class="Apple-style-span" style="color: blue;"&gt;table&lt;/span&gt;&lt;/i&gt; and &lt;i style="color: blue;"&gt;row&lt;/i&gt;&amp;nbsp;respectively. You thought this is the only difference it has? You're wrong. There are others i've discovered, including &lt;i&gt;&lt;span class="Apple-style-span" style="color: blue;"&gt;Stored Javascript&lt;/span&gt;&lt;/i&gt; (akin to &lt;i&gt;&lt;span class="Apple-style-span" style="color: blue;"&gt;Stored Procedures&lt;/span&gt;&lt;/i&gt;), &lt;i&gt;&lt;span class="Apple-style-span" style="color: blue;"&gt;Indexing&lt;/span&gt;&lt;/i&gt; etc, which are freaking cool, and the data structures that represent all the data you can store in or retrieve from it are based on &lt;a href="http://www.bsonspec.org/"&gt;BSON&lt;/a&gt;. For web folks, you know what this means don't ya? ;) For everyone else, it simply means you can get your apps into market at a faster rate.&lt;br /&gt;&lt;span class="Apple-style-span" style="font-size: large;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-size: large;"&gt;Back to what you were saying about JOIN and MongoDB ...&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;JOINs are pretty important in a lot of applications since everything we know is relational, in SQL parlance. SQL databases have taught us that all data is relational and we need to think in terms of tables and rows. If a relationship needs to be established, then we need to select data from multiple tables and derive it.&lt;br /&gt;&lt;br /&gt;The question is whether we can get around it. 
Spending 2 hrs mucking around in MongoDB's documentation revealed two options: (1) MongoDB's MapReduce (2) write one yourself.&lt;br /&gt;&lt;br /&gt;Obviously i attempted the easier solution first, and a member on &lt;a href="http://stackoverflow.com/questions/3837394/mongodb-map-reduce-over-multiple-collections"&gt;stackoverflow&lt;/a&gt; seems to think it's possible. But i ran aground. To understand why: the primary purpose of MapReduce's &lt;u&gt;reducer function&lt;/u&gt; is to reduce the number of key value pairs the system has to handle at any point in time, and hence it prohibits code like&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;// aggregating values to the array: the values come back unreduced&lt;br /&gt;function(k,v) {&lt;br /&gt;&amp;nbsp; var ret = v;&lt;br /&gt;&amp;nbsp; return ret;&lt;br /&gt;}&lt;br /&gt;&lt;/pre&gt;or&lt;br /&gt;&lt;pre&gt;// a.k.a bypass: the reducer returns nothing&lt;br /&gt;function(k, v) {&lt;br /&gt;}&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;MongoDB will complain when your reducer does nothing, and also when it merely aggregates values into an array, since that does absolutely nothing to reduce the key value pairs. There might be other ways to do this but at the time of this writing, i'm not aware of any (Please drop me a mail if you are). Turns out somebody has filed a feature &lt;a href="https://jira.mongodb.org/browse/SERVER-142"&gt;support ticket&lt;/a&gt; to sort out this JOIN. Yay!&lt;br /&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-size: large;"&gt;Parser for MongoDB multi-collection query&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;I had no choice but to attempt option (2), which was fun cos i got to build a simple SQL parser that translates typical SQL syntax into a multi-collection/table query. 
The approach i took is to use a SQL-like syntax that supports multiple tables/collections (depending on your parlance) and allows me/the user to write something like&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;&lt;span class="Apple-style-span" style="color: blue;"&gt;select a.name, a.data, b.name, b.data, c.name, c.data from a, b, c where a.name = b.name and b.name = c.name&lt;/span&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;while underneath, the machinery conducts the query against MongoDB. The implementation is pretty simple: i parse the expression, isolate all the expressions belonging to each table and keep them in a hashtable-like data structure. Next, each table's query is sent to Mongo and the retrieved records are stored in RAM.&lt;br /&gt;&lt;br /&gt;As Mongo can't do multi-joins yet, the approach is a nested-loop JOIN, roughly O(R^N) where N is the number of tables and R the number of records retrieved per table: the JOIN condition is checked against each known bunch of records in turn, i.e. &lt;span class="Apple-style-span" style="color: blue;"&gt;for a for b for c&lt;/span&gt;, and finally the SQL LIMIT expression is applied to return the final collection of records.&lt;br /&gt;&lt;br /&gt;There are obvious shortcomings to this, and i don't claim to have the answers to them. One shortcoming is that it doesn't scale very well, as it's limited by the RAM/DRAM available and it's a compute intensive job. If you're interested please visit &lt;a href="https://github.com/raygit/m2query"&gt;https://github.com/raygit/m2query&lt;/a&gt; All feedback is welcome!&lt;br /&gt;&lt;script type="text/javascript"&gt;  var _gaq = _gaq || [];  _gaq.push(['_setAccount', 'UA-5323400-1']);  _gaq.push(['_trackPageview']);  (function() {    var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true;    ga.src = ('https:' == document.location.protocol ? 
'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';    var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s);  })();&lt;/script&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5021319479788698734-7855281117326938716?l=raymondtay.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://raymondtay.blogspot.com/feeds/7855281117326938716/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5021319479788698734&amp;postID=7855281117326938716&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5021319479788698734/posts/default/7855281117326938716'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5021319479788698734/posts/default/7855281117326938716'/><link rel='alternate' type='text/html' href='http://raymondtay.blogspot.com/2011/07/mongodb-and-multi-join.html' title='MongoDB and JOINs'/><author><name>Raymond Tay</name><uri>https://profiles.google.com/110897033209560123249</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh4.googleusercontent.com/-4ZP-rPmLNSw/AAAAAAAAAAI/AAAAAAAABao/8f4AuUXNrno/s512-c/photo.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5021319479788698734.post-3951916162614112434</id><published>2011-06-17T09:23:00.001+08:00</published><updated>2011-06-17T09:25:59.307+08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='CUDA'/><category scheme='http://www.blogger.com/atom/ns#' term='OpenMP'/><category scheme='http://www.blogger.com/atom/ns#' term='HPCSIG'/><category scheme='http://www.blogger.com/atom/ns#' term='MPI'/><category scheme='http://www.blogger.com/atom/ns#' term='PGI'/><title 
type='text'>Announcing the launch of HPC SIG Group in Singapore</title><content type='html'>&lt;span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif;"&gt;Hello everyone,&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif;"&gt;&lt;u&gt;Birth of a Special Interest Group in Singapore&amp;nbsp;&lt;/u&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif;"&gt;I just wanted to take this post to announce that a fellow CUDA enthusiast and i have started a Special Interest Group (SIG) in Singapore that focuses on High Performance Computing, which we affectionately call &lt;span class="Apple-style-span" style="color: blue;"&gt;HPCSIG Singapore&lt;/span&gt;.&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif;"&gt;&lt;u&gt;Group's Focus&lt;/u&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif;"&gt;This group focuses on gathering administrators, researchers, developers or anyone who's interested in HPC to come together to share and learn new knowledge. 
This includes technology topics in Fortran, C/C++, OpenMP, MPI, CUDA, OpenCL, Python, Ruby etc.&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif;"&gt;&lt;u&gt;Meetup&lt;/u&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif;"&gt;Our first meetup was held on &lt;u&gt;8 June 2011&lt;/u&gt; at the Institute of High Performance Computing at&amp;nbsp;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif;"&gt;Resonance Room, Level 13, Connexis North Tower, 1 Fusionopolis Way, Singapore 138632, and we had about 20 attendees and 2 talks.&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-size: 12px;"&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif;"&gt;&lt;u&gt;Group's Information/Forum&lt;/u&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif;"&gt;https://groups.google.com/group/hpcsig-singapore?hl=en &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;(Google Group)&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif;"&gt;http://www.facebook.com/home.php?sk=group_210520412314382 (Facebook)&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: Arial, 
Helvetica, sans-serif;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif;"&gt;&lt;u&gt;Speakers for First Meetup&lt;/u&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif;"&gt;Simulations for Maritime Industry in SimPlus and the Need for HPC&lt;br /&gt;Mr. Ye Rong&lt;br /&gt;CTO&lt;br /&gt;SimPlus Pte Ltd&amp;nbsp;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif;"&gt;&lt;a href="http://slidesha.re/iodpaI"&gt;Introduction to CUDA Programming&lt;/a&gt;&lt;br /&gt;Mr. Raymond Tay&lt;br /&gt;Software Engineer&lt;br /&gt;HP Labs, Singapore&amp;nbsp;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif;"&gt;&lt;u&gt;Participants&lt;/u&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif;"&gt;We had the pleasure of having Dr. Simon See (&lt;i&gt;Chief Solution Architect of Nvidia&lt;/i&gt;) and Dr. Rich Goh (&lt;i&gt;Director in Institute of High Performance Computing&lt;/i&gt;) grace our first meetup. 
Also in attendance were an HPC engineer who helped engineer the special effects for the &lt;i&gt;Lord of the Rings Trilogy&lt;/i&gt; and other HPC engineers who are currently residing in Singapore and helping to bring forth this movement.&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif;"&gt;&lt;u&gt;What's next?&lt;/u&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif;"&gt;The meetup for July is likely to be canceled, as many of the participants are busy with their respective commitments and it would not be in the interest of the group to meet, but August is on!&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif;"&gt;&lt;u&gt;Founder &amp;amp; Co-Founder&lt;/u&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif;"&gt;Mr. &lt;a href="http://www.speedgocomputing.com/"&gt;Chung Shin Yee&lt;/a&gt; (&lt;a href="http://groups.google.com/group/sgc-ruby-cuda/topics?hl=en-GB"&gt;Creator of SGC Ruby CUDA&lt;/a&gt;)&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif;"&gt;Mr. 
&lt;a href="http://raymondtay.blogspot.com/"&gt;Raymond Tay&lt;/a&gt; (Cloud &amp;amp; GPU Developer)&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: Arial, Helvetica, sans-serif;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;script type="text/javascript"&gt;  var _gaq = _gaq || [];  _gaq.push(['_setAccount', 'UA-5323400-1']);  _gaq.push(['_trackPageview']);  (function() {    var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true;    ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';    var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s);  })();&lt;/script&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5021319479788698734-3951916162614112434?l=raymondtay.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://raymondtay.blogspot.com/feeds/3951916162614112434/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5021319479788698734&amp;postID=3951916162614112434&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5021319479788698734/posts/default/3951916162614112434'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5021319479788698734/posts/default/3951916162614112434'/><link rel='alternate' type='text/html' href='http://raymondtay.blogspot.com/2011/06/announcing-launch-of-hpc-sig-group-in.html' title='Announcing the launch of HPC SIG Group in Singapore'/><author><name>Raymond Tay</name><uri>https://profiles.google.com/110897033209560123249</uri><email>noreply@blogger.com</email><gd:image 
rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh4.googleusercontent.com/-4ZP-rPmLNSw/AAAAAAAAAAI/AAAAAAAABao/8f4AuUXNrno/s512-c/photo.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5021319479788698734.post-6990925952442468337</id><published>2011-06-11T10:35:00.002+08:00</published><updated>2011-06-13T09:31:53.331+08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='CUDA'/><category scheme='http://www.blogger.com/atom/ns#' term='Parallel Algorithms'/><category scheme='http://www.blogger.com/atom/ns#' term='High Performance Computing'/><category scheme='http://www.blogger.com/atom/ns#' term='HPC'/><category scheme='http://www.blogger.com/atom/ns#' term='web crawling'/><category scheme='http://www.blogger.com/atom/ns#' term='GPGPU'/><category scheme='http://www.blogger.com/atom/ns#' term='Python'/><category scheme='http://www.blogger.com/atom/ns#' term='GPU'/><title type='text'>Exploring CUDA through Google's PageRank (Part 1)</title><content type='html'>&lt;span class="Apple-style-span" style="font-family: inherit; font-size: large;"&gt;Page Ranking&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: inherit;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: inherit;"&gt;PageRank is new to me. In fact, i've only known about it since some time in mid-2010; talk about coming to the scene pretty late. For those of you who don't know what PageRank is about, i'll let Wikipedia tell you all about it or, as Google tells it, "... a technology that determined the 'importance' of a webpage by looking at what other pages link to it, as well as other data."&amp;nbsp;Talk about 'peer' appraisal on the exascale. 
I do apologize for overloading the terms to the point of abuse.&lt;/span&gt;&lt;br /&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: inherit;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: inherit;"&gt;What i do like about PageRank is that it takes into account two aspects of human beings when it comes to sharing info:&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: inherit;"&gt;1) People just love to share information through the internet, thus enabling others to receive that information rapidly;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: inherit;"&gt;2) People will eventually tire of sharing, probably at the end of a day, before lunch, before a nap etc.&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: inherit;"&gt;I just love the sort of things people do with math; here's another one where mathematicians can describe a &lt;a href="http://www.neatorama.com/2010/12/22/cities-as-math-equations/"&gt;city using equations&lt;/a&gt;. Ok, that should be enough digression.&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: inherit;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-size: large;"&gt;Why did i do this?&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: inherit;"&gt;&lt;u&gt;The reason is that i asked myself&amp;nbsp;&lt;/u&gt;: Can i model the PageRank algorithm onto the GPU so that i can use the supercomputer under my desk to compute how popular a web page really is?&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;First of all, take some time to understand the mathematics. 
Frankly, i'm no math major but it seems pretty alright since the equation is&amp;nbsp;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: inherit;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://1.bp.blogspot.com/-MSVD0VUdmaM/TcCuwtKZcPI/AAAAAAAABLU/QuPAysAfGzE/s1600/Screen+shot+2011-05-04+at+AM+09.40.38.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="63" src="http://1.bp.blogspot.com/-MSVD0VUdmaM/TcCuwtKZcPI/AAAAAAAABLU/QuPAysAfGzE/s400/Screen+shot+2011-05-04+at+AM+09.40.38.png" width="400" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: inherit;"&gt;Trusting Wikipedia *makes the sign of the cross* and confirming that equation with my favourite algorithm texts, i was assured the math is correct. Whew. From the documentation, i understood that this equation needs to be iterated a number of times so that the results become more reliable. Those folks in HPC shouldn't have much of a problem, i imagine, but how would a novice like myself go about doing this?&amp;nbsp;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: inherit;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: inherit; font-size: large;"&gt;Modeling the problem ... 
some ideas ...&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: inherit;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: inherit;"&gt;A few considerations i can think of off the top of my head:&lt;/span&gt;&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Mine the Internet (that's crawling the web, really, and for ease of programming i've used Python)&lt;/li&gt;&lt;li&gt;Represent the mined data as a graph (i use the adjacency matrix for starters)&lt;/li&gt;&lt;li&gt;Decide on a damping factor (that's the d in the equation; i found that 0.85 seems to be the accepted value from &lt;a href="http://en.wikipedia.org/wiki/Pagerank#Damping_factor"&gt;here&lt;/a&gt;&amp;nbsp;and what 0.85 says is that the random surfer has an 85% chance of following a link from the current page and a 15% chance of jumping to a random page; you are free to test different models by adjusting the damping factor)&lt;/li&gt;&lt;li&gt;Decide on the number of iterations you are going to use to compute the probability distribution. From this fella's &lt;a href="http://pr.efactory.de/e-pagerank-algorithm.shtml"&gt;description&lt;/a&gt; in his/her post it appears that 100 iterations should suffice (Did i mention there would be hundreds of millions of pages?? That'll give your CPU a good run for the money you've paid)&lt;/li&gt;&lt;/ul&gt;&lt;div&gt;Those are just some of the things i've thought about; the list is definitely not exhaustive.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;I'm assuming you know how to crawl the web in multi-threaded/multi-process form (&lt;b&gt;note&lt;/b&gt;&amp;nbsp;that the liberal use of '&lt;i&gt;thread&lt;/i&gt;' and '&lt;i&gt;process&lt;/i&gt;' differs from one programming language to another)&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Now, to the implementation and benchmarks!!! 
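&lt;br /&gt;&lt;br /&gt;To make the considerations above concrete, here is a minimal CPU-only sketch in Python of the iteration. It is not the CUDA kernel that was benchmarked; it assumes the normalized form of the equation, PR(p) = (1 - d)/N + d * sum(PR(q)/L(q)) over the pages q that link to p, and the names pagerank, links, d and iterations are made up for illustration. Pages with no outgoing links are simply left alone here, so their rank is not redistributed.&lt;br /&gt;
&lt;pre&gt;
```python
def pagerank(links, d=0.85, iterations=100):
    """Iterate the PageRank equation over a dense adjacency matrix.

    links[i][j] == 1 means page i links to page j (assumed layout).
    """
    n = len(links)
    out_degree = [sum(row) for row in links]
    rank = [1.0 / n] * n  # start from a uniform distribution
    for _ in range(iterations):
        new_rank = []
        for j in range(n):
            # Rank flowing into page j from every page i that links to it;
            # a page that links to j has out_degree of at least 1.
            incoming = sum(rank[i] / out_degree[i]
                           for i in range(n) if links[i][j])
            new_rank.append((1.0 - d) / n + d * incoming)
        rank = new_rank
    return rank


# Toy graph of three pages: 0 links to 1 and 2, 1 links to 2, 2 links to 0.
links = [[0, 1, 1],
         [0, 0, 1],
         [1, 0, 0]]
print(pagerank(links))
```
&lt;/pre&gt;
The three-page graph is only a toy; with hundreds of millions of links a dense matrix like this would not fit in memory, so a sparse layout would be the more likely choice.&lt;br /&gt;&lt;br /&gt;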
Btw, the simulations i have are on the order of hundreds of millions of web links (some cyclic, some not), and running them on my MacBook Pro didn't work well: it crashed and burned because i didn't have enough RAM. For best results i've used my GTX 480 (Fermi, compute capability 2.0).&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-size: large;"&gt;Results of performance tests and optimizations applied&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;There are a couple of scenarios in the tests i've run. To give you an idea of what the following sub-sections are going to be about, i'd like to show you the benchmark results off the GT 330M Nvidia processor and the GTX 480. One key thing i wanted to point out is that two devices of different compute capabilities DO make a difference.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Results of test runs on GT 330M Nvidia processor (Compute Capability 1.2)&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;I had to cut back on the size of my data as it crashed, much to my &lt;b&gt;horror&lt;/b&gt;.&lt;br /&gt;&lt;br /&gt;Moving forward, the tests are modelled on 4 million web links; the average run times are taken from 10 test runs, discarding 1 or 2 runs depending on whether the deviation is too wide.&amp;nbsp;The CPU code was compiled at &lt;i&gt;level 2&lt;/i&gt; optimization and the GPU code was compiled using the production release of the CUDA 4.0 SDK.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Version 1 (straightforward):&lt;/b&gt;&lt;br /&gt;The average GPU run time (over ten runs discarding 1) = 7850.5 ms&lt;br /&gt;The average CPU run time (over ten runs discarding 1) = 22187.89 ms&lt;br /&gt;The speed up over the CPU version is = 2.83 times.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Version 2 (using shared memory, loop optimization etc):&lt;/b&gt;&lt;br /&gt;The average GPU run time (over ten runs discarding 1) = 3033.8 ms&lt;br /&gt;The average CPU run time (over ten runs discarding 1) = 20412.2 ms&lt;br /&gt;&lt;div&gt;The speed up over the CPU version is = 6.7 
times&lt;/div&gt;&lt;div&gt;&lt;br /&gt;The speed up between the two GPU versions is approximately 2.6 times (7850.5 ms / 3033.8 ms). The key improvement was to make use of the device's &lt;b&gt;shared memory&lt;/b&gt; so that the data can be as close as possible to the processing cores, considering the latency of global memory access is between 400 and 600 cycles.&lt;br /&gt;&lt;br /&gt;The discrepancy in the CPU run times is probably due to a Mac OS X system process running in the background and also i was listening to &lt;i&gt;coding music&lt;/i&gt;.&lt;br /&gt;&lt;br /&gt;Here's a glimpse of the device configuration i used, as calculated by CUDA's occupancy calculator (an invaluable tool in your arsenal); see below for the diagram for GTX 480 (CC 2.0)&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://1.bp.blogspot.com/-W78Zy707O6c/TfLY_FREq5I/AAAAAAAABLs/ql_agTASVZ4/s1600/Screen+shot+2011-06-11+at+AM+10.53.01.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="425" src="http://1.bp.blogspot.com/-W78Zy707O6c/TfLY_FREq5I/AAAAAAAABLs/ql_agTASVZ4/s640/Screen+shot+2011-06-11+at+AM+10.53.01.png" width="640" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;&lt;b&gt;&lt;span style="font-size: small;"&gt;Results of test runs on GTX 480 Nvidia GPU (Compute Capability 2.0)&lt;/span&gt;&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;&lt;u&gt;Consume all of shared memory on device (no L1), compiled for sm_20, use fast math&lt;/u&gt;&lt;br /&gt;&lt;br /&gt;The same code used in Version 2 previously is tested here, compiled with &lt;span class="Apple-style-span" style="color: blue;"&gt;-Xptxas -dlcm=cg -use_fast_math -arch=sm_20&lt;/span&gt;, and the run times shown here were fine-tuned with the help of the visual profiler.&lt;br /&gt;&lt;br /&gt;The average GPU run time (over ten runs discarding 1) = 1176.9 ms&lt;br /&gt;The average CPU run time (over ten runs discarding 1) = 20362.6 ms&lt;br /&gt;The speed up over the CPU version is = 17 times&lt;br 
/&gt;&lt;br /&gt;&lt;br /&gt;&lt;div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"&gt;&lt;u&gt;Consume all of shared memory on device (no L1), compiled for sm_20, use fast math, texture memory&lt;/u&gt;&lt;/div&gt;&lt;div&gt;&lt;u&gt;&lt;br /&gt;&lt;/u&gt;&lt;/div&gt;&lt;br /&gt;One last-ditch effort came in the form of using texture memory, specifically linear texture memory, on the GTX 480. The reason for the poor run is that i've only managed to achieve a 3% hit rate in the texture cache. That's really because of &lt;b&gt;scattered reads and writes;&lt;/b&gt;&amp;nbsp;hence the &lt;b&gt;&lt;span class="Apple-style-span" style="color: blue;"&gt;REAL&lt;/span&gt;&lt;/b&gt; problem is that the kernel does a significant amount of scattered reads / writes, so it doesn't really matter whether i use texture memory or not because the problem's still there.&lt;br /&gt;&lt;br /&gt;The average GPU run time (over ten runs discarding 1) = 1142.1 ms&lt;br /&gt;The average CPU run time (over ten runs discarding 1) = 20092.3 ms&lt;br /&gt;The speed up over the CPU version is = 17 times&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://2.bp.blogspot.com/-YLeTNC0UqFE/TfLZe9CIK-I/AAAAAAAABLw/oJDAeTXEIxI/s1600/Screen+shot+2011-06-11+at+AM+10.55.34.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="428" src="http://2.bp.blogspot.com/-YLeTNC0UqFE/TfLZe9CIK-I/AAAAAAAABLw/oJDAeTXEIxI/s640/Screen+shot+2011-06-11+at+AM+10.55.34.png" width="640" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;&lt;span style="font-size: large;"&gt;Conclusion&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;I can't really conclude at this point in time since the kernel hasn't reached its maximum potential, but a couple of things are useful to take note of and possibly worth sharing:&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;ol&gt;&lt;li&gt;Choosing a graphics card that supports Fermi or later 
should be a good bet i.e. compute capability 2.x&lt;/li&gt;&lt;li&gt;Attempt to reduce global memory traffic by placing data as close as possible to the processing units e.g. shared memory, L1/L2 caches, texture memory, constant memory.&lt;/li&gt;&lt;li&gt;Remember to compile your CUDA kernel code for the architecture that best matches your graphics card e.g. -arch=sm_12, -arch=sm_13 or -arch=sm_20, since the default would be sm_10&lt;/li&gt;&lt;li&gt;Experiment with different ways to implement your kernel. Always think about improving memory access.&lt;/li&gt;&lt;li&gt;If you are determined to get the best performing code, then i'd recommend you get really familiar with the visual profiler and understand the metrics that are being measured and how well each strategy you applied worked.&lt;/li&gt;&lt;/ol&gt;&lt;br /&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-size: large;"&gt;Roadmap ahead&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;This kernel definitely isn't optimal (see the diagrams above, which imply low occupancy). I've plans to rewrite the kernel (after hours) and run some more tests; i'll write it up in a separate post when i can.&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;&lt;script type="text/javascript"&gt;  var _gaq = _gaq || [];  _gaq.push(['_setAccount', 'UA-5323400-1']);  _gaq.push(['_trackPageview']);  (function() {    var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true;    ga.src = ('https:' == document.location.protocol ? 
'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';    var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s);  })();&lt;/script&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5021319479788698734-6990925952442468337?l=raymondtay.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://raymondtay.blogspot.com/feeds/6990925952442468337/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5021319479788698734&amp;postID=6990925952442468337&amp;isPopup=true' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5021319479788698734/posts/default/6990925952442468337'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5021319479788698734/posts/default/6990925952442468337'/><link rel='alternate' type='text/html' href='http://raymondtay.blogspot.com/2011/06/exploring-cuda-through-googles-pagerank.html' title='Exploring CUDA through Google&apos;s PageRank (Part 1)'/><author><name>Raymond Tay</name><uri>https://profiles.google.com/110897033209560123249</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh4.googleusercontent.com/-4ZP-rPmLNSw/AAAAAAAAAAI/AAAAAAAABao/8f4AuUXNrno/s512-c/photo.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://1.bp.blogspot.com/-MSVD0VUdmaM/TcCuwtKZcPI/AAAAAAAABLU/QuPAysAfGzE/s72-c/Screen+shot+2011-05-04+at+AM+09.40.38.png' height='72' width='72'/><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5021319479788698734.post-8062829743448277050</id><published>2011-04-27T09:48:00.000+08:00</published><updated>2011-04-27T09:48:06.018+08:00</updated><category scheme='http://www.blogger.com/atom/ns#' 
term='CUDA'/><category scheme='http://www.blogger.com/atom/ns#' term='Parallel Algorithms'/><category scheme='http://www.blogger.com/atom/ns#' term='HPC'/><category scheme='http://www.blogger.com/atom/ns#' term='GPGPU'/><category scheme='http://www.blogger.com/atom/ns#' term='GPU'/><category scheme='http://www.blogger.com/atom/ns#' term='Coulomb&apos;s Law'/><title type='text'>Exploring CUDA through computing Coulomb's Law</title><content type='html'>This is more like a physics post than anything else. I loved physics from my secondary school days (that's high school in some countries) till my pre-university days, and at one point in time i decided i was going to be a physicist! But i chose Computer Science. Well, that's a short bit of history for you anyway.&lt;br /&gt;&lt;br /&gt;So, this post isn't about how i DIDN'T make it as a physicist but rather about an interesting application (admittedly, contrived) of CUDA to physics, namely the computation of force vectors between sets of charged particles.&lt;br /&gt;&lt;br /&gt;Some basics in Physics are in order ...&lt;br /&gt;&lt;br /&gt;&lt;u&gt;Coulomb's Law&lt;/u&gt; &lt;br /&gt;Atoms and molecules have the same number of protons as electrons and are neutral (without overall charge). Electrons can be transferred from one object to another. When this happens, there is an excess of electrons in one place and a deficiency of electrons in another. Charge is the result of an excess or deficiency of electrons. Where an object has excess electrons, the object is negatively &lt;b&gt;charged&lt;/b&gt;. Where there is a deficiency of electrons, the object is positively charged. Electric charge is usually represented in equations by the letter q or Q.&lt;br /&gt;&lt;br /&gt;So, the next question is whether 2 charged particles next to one another would "feel" a force of attraction or repulsion? 
(FYI: 2 charged particles would attract one another if they had opposite charges and repel one another if they had same charge.)&lt;br /&gt;&lt;br /&gt;In fact, they do and this felt force is computed via Coulomb's Law. &lt;span class="style1"&gt;Two charges, Q and q, separated by a distance, r, each experience a force of magnitude&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;blockquote&gt;&lt;span class="style1"&gt; &lt;span style="color: blue;"&gt;F = k|Qq|/r&lt;/span&gt;&lt;sup style="color: blue;"&gt;2&lt;/sup&gt;&lt;span style="color: blue;"&gt; &lt;/span&gt;where k = 9 x 10&lt;sup&gt;9&lt;/sup&gt; and |Qq| is the positive value of the product of Q and q&lt;/span&gt;&lt;br /&gt;&lt;span class="style1"&gt;&lt;/span&gt;&lt;/blockquote&gt;&lt;span class="style1"&gt; &lt;/span&gt;The above formulation is what is called the &lt;i&gt;scalar form&lt;/i&gt;. The &lt;i&gt;k&lt;/i&gt; is known as the &lt;i&gt;Coulomb Constant&lt;/i&gt;. Here's an image i dragged off Wikipedia to illustrate how charged particles can be thought of in their &lt;i&gt;scalar&lt;/i&gt; &amp;amp; &lt;i&gt;vector&lt;/i&gt; forms: &lt;br /&gt;&lt;table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td style="text-align: center;"&gt;&lt;img border="0" src="http://4.bp.blogspot.com/-afUZfOzXXI4/TbT8rtl6XLI/AAAAAAAABLI/J4Om9G_G0Tg/s1600/300px-CoulombsLaw.svg.png" style="margin-left: auto; margin-right: auto;" /&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="tr-caption" style="text-align: center;"&gt;Force Vectors&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="" style="clear: both; text-align: left;"&gt;&amp;nbsp;The takeaway from this picture is that (1) like charges repel 
(&lt;i&gt;top row&lt;/i&gt;) (2) opposite charges attract (&lt;i&gt;bottom row&lt;/i&gt;) and (3) each repulsion/attraction can be expressed as a &lt;i&gt;vector&lt;/i&gt; in 2D (You can extend this to 3D if you like). This is something i can imagine quite well but when i attempt to extrapolate it to say 6 x 10&lt;sup&gt;23&lt;/sup&gt; or some other big number, it becomes a bit mind-boggling. The question i have is whether i can find some way to compute the electrostatic force from this imaginary large number of charged particles?&lt;/div&gt;&lt;div class="" style="clear: both; text-align: left;"&gt;Turns out you can, by applying the &lt;a href="http://en.wikipedia.org/wiki/Superposition_principle"&gt;Law of Superposition&lt;/a&gt; to Coulomb's Law over large numbers of such particles; the net effect is that the net force on any particle, &lt;i&gt;p&lt;/i&gt;, is the &lt;i&gt;vector sum&lt;/i&gt; of the forces exerted on &lt;i&gt;p&lt;/i&gt; by all the other charged particles.&lt;/div&gt;&lt;div class="" style="clear: both; text-align: left;"&gt;&lt;u&gt;&lt;br /&gt;&lt;/u&gt;&lt;/div&gt;&lt;div class="" style="clear: both; text-align: left;"&gt;&lt;u&gt;Modelling the problem&lt;/u&gt;&lt;/div&gt;&lt;div class="" style="clear: both; text-align: left;"&gt;To model it, i need to create 2 or more sets of charged particles and have CUDA compute their net force by applying both laws. 
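&lt;br /&gt;&lt;br /&gt;As a rough CPU-side illustration of that superposition step (a plain Python sketch, not the CUDA kernel), the function here sums the Coulomb force vectors acting on one particle. The tuple layout (x, y, z, q), the name net_force and the constant K are assumptions made for this example.&lt;br /&gt;
&lt;pre&gt;
```python
import math

K = 8.9875e9  # Coulomb constant in N m^2 / C^2 (commonly rounded to 9 x 10^9)


def net_force(p, others):
    """Vector sum of the Coulomb forces that 'others' exert on particle p.

    Each particle is a tuple (x, y, z, q): position in metres, charge in coulombs.
    """
    px, py, pz, pq = p
    fx = fy = fz = 0.0
    for (x, y, z, q) in others:
        # Displacement pointing from the other particle towards p.
        dx, dy, dz = px - x, py - y, pz - z
        r = math.sqrt(dx * dx + dy * dy + dz * dz)
        if r == 0.0:
            continue  # skip coincident particles to avoid dividing by zero
        # Signed magnitude: positive (repulsive) for like charges,
        # negative (attractive) for opposite charges.
        f = K * pq * q / (r * r)
        fx += f * dx / r
        fy += f * dy / r
        fz += f * dz / r
    return (fx, fy, fz)


# Two like charges of 1 microcoulomb, 1 metre apart along the x axis.
print(net_force((1.0, 0.0, 0.0, 1e-6), [(0.0, 0.0, 0.0, 1e-6)]))
```
&lt;/pre&gt;
A single-threaded version like this can also double as a reference to check parallel results against.&lt;br /&gt;&lt;br /&gt;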
Each particle can be represented by a float4 type (basically, an element that has 4 components &lt;i&gt;x,y,z,w&lt;/i&gt;); each particle is given a set of 3D coordinates stored in &lt;i&gt;x, y, z&lt;/i&gt;, and &lt;i&gt;w&lt;/i&gt; holds the &lt;i&gt;charge&lt;/i&gt;.&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: left;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="" style="clear: both; text-align: left;"&gt;The parallel portion is shown below and in case you are wondering about force_calc, don't worry about it because it's just the computation of Coulomb's Law, samples of which you can find anywhere.&lt;/div&gt;&lt;div class="" style="clear: both; text-align: left;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="" style="clear: both; text-align: left;"&gt;&lt;a href="http://1.bp.blogspot.com/-f8ZJVSMM5p8/TbdzozBaM0I/AAAAAAAABLM/bWoZ00su-DA/s1600/Screen+shot+2011-04-27+at+AM+09.38.11.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="268" src="http://1.bp.blogspot.com/-f8ZJVSMM5p8/TbdzozBaM0I/AAAAAAAABLM/bWoZ00su-DA/s640/Screen+shot+2011-04-27+at+AM+09.38.11.png" width="640" /&gt;&lt;/a&gt;&lt;u&gt;&lt;br /&gt;&lt;/u&gt;&lt;/div&gt;&lt;div class="" style="clear: both; text-align: left;"&gt;&lt;u&gt;Side Note:&lt;/u&gt;&lt;/div&gt;&lt;div class="" style="clear: both; text-align: left;"&gt;Another important thing you should remember (or probably know) is that we should always create the same algorithm to run on a CPU (single core preferably). 
This serves the purpose of creating a reference implementation which you can debug your parallel stuff against, and which also helps determine what kind of optimizations you need when you compare the run times.&lt;br /&gt;&lt;br /&gt;&lt;/div&gt;&lt;pre&gt;done with host force eval kernel&lt;br /&gt;host force eval took 157.8 ms&lt;br /&gt;done with gpu force eval kernel&lt;br /&gt;gpu force eval took 33.3 ms&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;div class="" style="clear: both; text-align: left;"&gt;&lt;u&gt;Summary:&lt;/u&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: left;"&gt;Think you get the idea now :) From the output, you can see that the run time dropped 4.7-fold :) and that's w/o any kind of optimization (you can tell, right?). There are a couple of optimizations you can still apply, but that's one simple application of CUDA to high performance computing.&lt;/div&gt;&lt;br /&gt;&lt;script type="text/javascript"&gt;  var _gaq = _gaq || [];  _gaq.push(['_setAccount', 'UA-5323400-1']);  _gaq.push(['_trackPageview']);  (function() {    var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true;    ga.src = ('https:' == document.location.protocol ? 
'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';    var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s);  })();&lt;/script&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5021319479788698734-8062829743448277050?l=raymondtay.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://raymondtay.blogspot.com/feeds/8062829743448277050/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5021319479788698734&amp;postID=8062829743448277050&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5021319479788698734/posts/default/8062829743448277050'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5021319479788698734/posts/default/8062829743448277050'/><link rel='alternate' type='text/html' href='http://raymondtay.blogspot.com/2011/04/exploring-cuda-through-computing.html' title='Exploring CUDA through computing Coulomb&apos;s Law'/><author><name>Raymond Tay</name><uri>https://profiles.google.com/110897033209560123249</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh4.googleusercontent.com/-4ZP-rPmLNSw/AAAAAAAAAAI/AAAAAAAABao/8f4AuUXNrno/s512-c/photo.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://4.bp.blogspot.com/-afUZfOzXXI4/TbT8rtl6XLI/AAAAAAAABLI/J4Om9G_G0Tg/s72-c/300px-CoulombsLaw.svg.png' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5021319479788698734.post-2474643250534117698</id><published>2011-03-16T22:29:00.001+08:00</published><updated>2011-03-16T22:29:49.907+08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Automation'/><category 
scheme='http://www.blogger.com/atom/ns#' term='Linden Lab'/><category scheme='http://www.blogger.com/atom/ns#' term='SecondLife'/><title type='text'>Automation - My thoughts ...</title><content type='html'>I'd like to use this post to share my thoughts on Software Automation.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size: large;"&gt;Why the need for Automation?&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;This topic has been explored by many authors in books and blogs. Automation exists in many industries so don't think that it's an "&lt;i&gt;IT&lt;/i&gt;" thing. The cookies you are enjoying right now, the milk you're drinking and the foods you are eating all come from highly automated processes. Automation, in my opinion, arose from a human trait: wanting to be more effective and productive.&lt;br /&gt;&lt;br /&gt;I'm not going to write about industrial automation since it's not something i did from day to day back then, though i suspect they share similar traits; so i'm going to focus on automating software, or software automation, and there's an area called "&lt;span style="color: blue;"&gt;automated software testing&lt;/span&gt;" which is a subset of software automation, imo.&lt;br /&gt;&lt;br /&gt;One reality of Automation, i realized, is that it'll take time to set up the necessary processes and applications and it would draw some of your best QA talent from their daily work, but the benefits you will reap will far outweigh its initial drawbacks, for the simple reason that you will have freed up the resources you previously tied up and hence have more QA resources to work on the really big problems that need attention.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size: large;"&gt;What made me decide to do Automation from QA?&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Quite some time back, a company i used to work for recognized that over the long run QA couldn't effectively run its business when you consider a server solution that supported tens of thousands of 3D-clients connected to a back-bone of 10,000 
servers.&lt;br /&gt;&lt;br /&gt;It was an exciting opportunity for me, personally, since white-box testing was a largely manual process but there were many other QA tasks that could be automated to raise overall efficiency, and i wanted to explore that.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size: large;"&gt;What do you automate?&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;This is tricky stuff. Some folks would like to automate everything but that is not practical. Focus on the problem that is not easily solvable but that, if you pull it off, lets you solve all the other problems.&lt;br /&gt;&lt;br /&gt;Many of the things you can list as potential candidates often manifest themselves as problems you've been solving repeatedly, so a couple of examples could be:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Application startup, shutdown&lt;/li&gt;&lt;li&gt;Application deployment (I know some orgs actually still rely on human efforts)&lt;/li&gt;&lt;li&gt;Application build and release&lt;/li&gt;&lt;li&gt;Application build, test and release &lt;/li&gt;&lt;li&gt;...many others&lt;/li&gt;&lt;/ul&gt;Here's what we did on determining that:&lt;br /&gt;&lt;br /&gt;Back then, QA implemented many processes w.r.t testing, coding standards, code reviews and the like and it was a rigorous effort. But bugs were still reported by the users and that meant that our test coverage was inadequate. The answer to this problem is not simply to add more QA resources but to solve the problem in a smarter manner...&lt;br /&gt;&lt;br /&gt;Don't get me wrong here, there was automated testing implemented but neither QA nor the Devs had any visibility into how the tests relate to one another or how they vary from release to release, and it's a tough thing to keep track of considering that you have tens of thousands of test cases. 
QA couldn't effectively exhaust all possible testing scenarios.&lt;br /&gt;&lt;br /&gt;In effect, what QA and Dev wanted was something like this:&lt;br /&gt;1) Build the app and run automated tests per platform type, e.g. Win32, Win64, Linux32 etc.; report successes/failures&lt;br /&gt;2) Run pre-release (automated) testing in a controlled environment; report successes/failures&lt;br /&gt;3) Collect &amp;amp; archive test metrics for trending purposes (AFAIK, this wasn't implemented by the time I left that company)&lt;br /&gt;&lt;br /&gt;A series of brainstorming sessions revealed that we needed a new way of doing things! Continuous Integration and SCM tools helped realize 1) and 2) :)&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size: large;"&gt;Source Control Management - How does that come into the picture?&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;A big part of CI is setting up several gateways where the quality of the code can be checked before the actual build happens. As devs, we all know that when your code base expands rapidly, the time it takes to build the application grows (not to mention source code locking), so it's best not to check in code that doesn't compile or work, or else you'll incur the wrath of your co-workers.&lt;br /&gt;&lt;br /&gt;Hence, the first gateway you should consider is scrutinizing the code that's being checked in. Modern SCM tools like Mercurial have a simple regression test framework, with Python &amp;amp; shell-script driven testing, that runs automated tests for your code. 
Another example would be SVN; read about it &lt;a href="http://subversion.apache.org/docs/community-guide/building.html#automated-tests"&gt;here&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;&lt;span style="color: blue;"&gt;Check-in code&lt;/span&gt; &lt;b&gt;-&amp;gt;&lt;/b&gt; &lt;span style="color: blue;"&gt;SCM runs first-line defense&lt;/span&gt; &lt;b&gt;-&amp;gt;&lt;/b&gt; &lt;span style="color: blue;"&gt;rejects code if tests fail, accepts code otherwise&lt;/span&gt;.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size: large;"&gt;SCM is not enough ...&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Of course, this scrutiny is somewhat limited in its effectiveness, so you need to ensure that devs check in tests that accompany the code. This is TDD, or Test Driven Development. Back then, the product was a mix of C++, Python and Perl, so you can imagine that we used lightweight regression test frameworks to incorporate tests.&lt;br /&gt;&lt;br /&gt;Alright, you say, now you seem to have all the tests incorporated, but how does that solve the problem? Who will run the tests and report the results? Before that, how does anyone know that code's been checked in and builds need to happen?&lt;br /&gt;&lt;br /&gt;You need Continuous Integration.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size: large;"&gt;Continuous Integration - How does that help?&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Martin Fowler said it best. Click &lt;a href="http://martinfowler.com/articles/continuousIntegration.html"&gt;this&lt;/a&gt; to read about it. The key thing to remember about CI is that &lt;span style="color: blue;"&gt;frequent code check-ins trigger automated builds&lt;/span&gt;, and that surfaces errors faster than the usual build-test-release cycle you are used to. 
&lt;br /&gt;&lt;br /&gt;A typical CI tool has the capability to check the source code out into its own workspace, compile it and run the automated tests you've built in. Of course, test results are published to the tool's dashboard, which alerts QA/Dev when a build or test fails and requires attention.&lt;br /&gt;&lt;br /&gt;Another thing that a CI tool lets you do is plug in modules that measure and report the coverage of your tests!&lt;br /&gt;&lt;br /&gt;That is a really cool thing to have :) because a tool like this allows both QA and Dev to see the same problem in the same language.&lt;br /&gt;&lt;br /&gt;&lt;span style="color: blue;"&gt;CI Tool detects code checked-in&lt;/span&gt; &lt;b&gt;-&amp;gt;&lt;/b&gt; &lt;span style="color: blue;"&gt;runs build&lt;/span&gt; &lt;b&gt;-&amp;gt;&lt;/b&gt; &lt;span style="color: blue;"&gt;runs automated tests&lt;/span&gt; &lt;b&gt;-&amp;gt;&lt;/b&gt; &lt;span style="color: blue;"&gt;runs coverage tests&lt;/span&gt; &lt;b&gt;-&amp;gt;&lt;/b&gt; &lt;span style="color: blue;"&gt;reports build pass/fail &amp;amp; publishes test artifacts to dashboard&lt;/span&gt;.&lt;br /&gt;&lt;br /&gt;But is that all it can do? Fortunately, no. Read on.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size: large;"&gt;Extended Automation Framework via Python&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Previously, I mentioned that we had a CI tool that builds and tests the app. In addition, the tool archived/published test results so that Devs/QA could determine which code check-ins caused a particular test/build to fail.&lt;br /&gt;&lt;br /&gt;Now, I want to share with you something else we did. 
Part of the problem we had was that black-box QA was heavily involved in manual testing of the 3D client because there was no automated way to interact with such an application, let alone report errors in a concise manner.&lt;br /&gt;&lt;br /&gt;In a nutshell, we built a Python framework that allows any Dev/QA to control the 3D client on Linux, Windows &amp;amp; Mac OS X. The idea was to let our QA/Devs write automated test scripts in Python so that the manual testing burden could be relieved. The framework came with a small set of APIs back then, but I'm sure it has grown much bigger these days. &lt;br /&gt;&lt;br /&gt;Can you imagine how our test coverage shot sky high?!&lt;br /&gt;&lt;br /&gt;That was a tough problem to crack but we did it :) and that's because we had a bunch of dedicated and passionate engineers working on it.&lt;br /&gt;&lt;br /&gt;But that's not the end of the story ;)&lt;br /&gt;&lt;br /&gt;With such a framework in place, QA now had the capability to run automated tests against a freshly built app (from the CI tool) in an environment of their choice. So, it looks something like this:&lt;br /&gt;&lt;br /&gt;&lt;span style="color: blue;"&gt;CI Tool&lt;/span&gt; &lt;b&gt;-&amp;gt;&lt;/b&gt; &lt;span style="color: blue;"&gt;builds app&lt;/span&gt; &lt;b&gt;-&amp;gt;&lt;/b&gt; &lt;span style="color: blue;"&gt;tests the (native) app through automated test scripts in Python&lt;/span&gt; &lt;b&gt;-&amp;gt;&lt;/b&gt; &lt;span style="color: blue;"&gt;publishes test results back to CI Tool's dashboard&lt;/span&gt;.&lt;br /&gt;&lt;br /&gt;So that's it.&lt;br /&gt;
</content><link rel='replies' type='application/atom+xml' href='http://raymondtay.blogspot.com/feeds/2474643250534117698/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5021319479788698734&amp;postID=2474643250534117698&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5021319479788698734/posts/default/2474643250534117698'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5021319479788698734/posts/default/2474643250534117698'/><link rel='alternate' type='text/html' href='http://raymondtay.blogspot.com/2011/03/automation-my-thoughts.html' title='Automation - My thoughts ...'/><author><name>Raymond Tay</name><uri>https://profiles.google.com/110897033209560123249</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh4.googleusercontent.com/-4ZP-rPmLNSw/AAAAAAAAAAI/AAAAAAAABao/8f4AuUXNrno/s512-c/photo.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5021319479788698734.post-3049245894444478978</id><published>2011-03-10T15:31:00.000+08:00</published><updated>2011-03-10T15:31:42.239+08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='CUDA'/><category scheme='http://www.blogger.com/atom/ns#' term='CUDA 4.0'/><title type='text'>Installing CUDA 4.0 on Ubuntu 10.10 (watch out for a minor detail)</title><content type='html'>Just a quick gotcha when installing CUDA 4.0 onto your Ubuntu 10.10.&lt;br 
/&gt;&lt;br /&gt;If you recently ran into the situation I did, where you downloaded the latest CUDA 4.0 Toolkit from Nvidia together with an Ubuntu image upgrade, rebooted your machine and discovered that your lovely X server doesn't start and complains of an API mismatch, then this is for you.&lt;br /&gt;&lt;br /&gt;Re-install the downloaded device driver (i.e. devdriver_XXXX.run), then issue "&lt;span style="color: blue;"&gt;sudo service gdm restart&lt;/span&gt;" or "&lt;span style="color: blue;"&gt;sudo service gdm start&lt;/span&gt;"; X should work again without rebooting.&lt;br /&gt;&lt;br /&gt;The technical reason is simple: I installed the device driver against the existing image (while the newly upgraded image hadn't been loaded yet), so after the restart the machine naturally booted into the upgraded image, and the X server's like WTH? and you're like WTH?&lt;br /&gt;&lt;br /&gt;
</content><link rel='replies' type='application/atom+xml' href='http://raymondtay.blogspot.com/feeds/3049245894444478978/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5021319479788698734&amp;postID=3049245894444478978&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5021319479788698734/posts/default/3049245894444478978'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5021319479788698734/posts/default/3049245894444478978'/><link rel='alternate' type='text/html' href='http://raymondtay.blogspot.com/2011/03/installing-cuda-40-on-ubuntu-1010-watch.html' title='Installing CUDA 4.0 on Ubuntu 10.10 (watch out for a minor detail)'/><author><name>Raymond Tay</name><uri>https://profiles.google.com/110897033209560123249</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh4.googleusercontent.com/-4ZP-rPmLNSw/AAAAAAAAAAI/AAAAAAAABao/8f4AuUXNrno/s512-c/photo.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5021319479788698734.post-1342019860179340397</id><published>2011-02-02T10:25:00.002+08:00</published><updated>2011-02-10T22:54:14.059+08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='CUDA'/><category scheme='http://www.blogger.com/atom/ns#' term='Parallel Algorithms'/><category scheme='http://www.blogger.com/atom/ns#' term='collective intelligence'/><category 
scheme='http://www.blogger.com/atom/ns#' term='pearson coefficient'/><category scheme='http://www.blogger.com/atom/ns#' term='C/C++'/><category scheme='http://www.blogger.com/atom/ns#' term='GPU'/><category scheme='http://www.blogger.com/atom/ns#' term='euclidean distance'/><category scheme='http://www.blogger.com/atom/ns#' term='collaborative filtering'/><title type='text'>Exploring CUDA via the Euclidean Distance</title><content type='html'>So this post was really inspired by reading Chapter 1 of Toby Segaran's book "Programming Collective Intelligence". In that chapter, Toby talked about making recommendations based on the ideas behind the &lt;a href="http://xw2k.nist.gov/dads/html/euclidndstnc.html"&gt;Euclidean Distance&lt;/a&gt; and &lt;a href="http://en.wikipedia.org/wiki/Pearson_coefficient"&gt;Pearson Coefficient&lt;/a&gt;. &lt;br /&gt;&lt;br /&gt;I thought I'd do a simple exercise and see how things work out in CUDA. Here's a sample implementation of the Euclidean Distance, which I benchmarked on both an Nvidia GTX 480 and a GT 330M.&lt;br /&gt;&lt;br /&gt;The implementation is pretty straightforward since the part that is suitable for massive parallelism is easy to identify, and here's how. 
Using Wikipedia's &lt;a href="http://en.wikipedia.org/wiki/Euclidean_distance"&gt;article on Euclidean distance&lt;/a&gt;, we focus our attention on this equation:&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://2.bp.blogspot.com/_n2HkB0XD3Kw/TSEliqeA1bI/AAAAAAAABKU/xHtaEakP10k/s1600/Screen%2Bshot%2B2011-01-03%2Bat%2BAM%2B09.24.03.png" imageanchor="1"&gt;&lt;img border="0" height="38" src="http://2.bp.blogspot.com/_n2HkB0XD3Kw/TSEliqeA1bI/AAAAAAAABKU/xHtaEakP10k/s320/Screen%2Bshot%2B2011-01-03%2Bat%2BAM%2B09.24.03.png" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;Reading the equation from left to right or from right to left, you can tell that it's basically performing computations on two 1-dimensional arrays, namely P and Q, which hold the coordinates of two points in Euclidean n-space. From there, you can easily tell that CUDA is made for this form of computation and, best of all, you'll probably never encounter the problem of divergent computation, and you can tap into the raw computing power of your GPU since it is constantly busy executing, with occupancy close to 1! That's pretty cool in my dictionary :)&lt;br /&gt;&lt;br /&gt;One straightforward implementation is to model these 2 points as 2 different input arrays which I've named P and Q, coincidentally. If you have read the documentation or books on CUDA, you will know that the next few steps are to allocate memory on the device and design your GPU kernel to run these computations. The pseudo algorithm for the GPU kernel is something like this:&lt;br /&gt;&lt;pre&gt;1. Load input arrays, 'P' &amp;amp; 'Q', into shared memory&lt;br /&gt;2. For each pair of elements of 'P' &amp;amp; 'Q' in the shared memory, take their  &lt;br /&gt;   difference and raise the result to the power of 2&lt;br /&gt;3. Store the previous result to the device memory of the GPU&lt;br /&gt;&lt;/pre&gt;
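As a rough host-side sketch of the three steps above (plain Python rather than CUDA, with function names and sample values made up for illustration), each list element stands in for the work one GPU thread would do:&lt;br /&gt;&lt;pre&gt;

```python
import math


def squared_differences(p, q):
    """Step 2 of the pseudo algorithm: (p_i - q_i) ** 2 for every pair."""
    if len(p) != len(q):
        raise ValueError("P and Q must have the same length")
    return [(a - b) ** 2 for a, b in zip(p, q)]


def euclidean_distance(p, q):
    """Sum the per-element results, then take one square root at the end."""
    return math.sqrt(sum(squared_differences(p, q)))


# Hypothetical sample points, chosen so the answer is easy to check by hand.
P = [1.0, 2.0, 3.0]
Q = [4.0, 6.0, 3.0]
print(squared_differences(P, Q))  # [9.0, 16.0, 0.0]
print(euclidean_distance(P, Q))   # 5.0
```

&lt;/pre&gt;The per-element squaring is the part that maps onto GPU threads; the square root is applied only once, to the total.&lt;br /&gt;&lt;br /&gt;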
You'll notice that this GPU kernel doesn't take the square root of the result - that's done in the typical C setting, i.e. on the &lt;i&gt;host&lt;/i&gt; and not on the GPU. The reason is mathematical: the square root has to be applied once to the total sum and not to each partial result, since &lt;i style="color: blue;"&gt;sqrt( a + b ) != sqrt(a) + sqrt(b)&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;One gotcha you need to be aware of is that on devices with compute capability 1.2 and lower (e.g. yours truly's MacBook Pro running a GT 330M), double-precision FP computation is demoted to single-precision FP. You'll notice a warning message like "&lt;i style="color: blue;"&gt;warning : Double is not supported. Demoting to float&lt;/i&gt;"&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Next question: How do I/you do all this with &lt;a href="http://code.google.com/p/thrust/"&gt;Thrust&lt;/a&gt;?&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;Thrust is the library you should probably use when it comes to developing stuff with CUDA. It just makes things easier. 
Here's one implementation I've made with CUDA Thrust. One thing to take note of: if you run this test on a machine with &lt;i&gt;compute capability&lt;/i&gt; 1.x then you'll get the warning message I've highlighted above; otherwise it should work well on devices with capability 2.x.&lt;br /&gt;&lt;pre&gt;#include &amp;lt;thrust/host_vector.h&amp;gt;&lt;br /&gt;#include &amp;lt;thrust/device_vector.h&amp;gt;&lt;br /&gt;#include &amp;lt;thrust/device_ptr.h&amp;gt;&lt;br /&gt;#include &amp;lt;thrust/copy.h&amp;gt;&lt;br /&gt;#include &amp;lt;thrust/generate.h&amp;gt;&lt;br /&gt;#include &amp;lt;thrust/reduce.h&amp;gt;&lt;br /&gt;#include &amp;lt;thrust/transform.h&amp;gt;&lt;br /&gt;#include &amp;lt;thrust/functional.h&amp;gt;&lt;br /&gt;#include &amp;lt;cmath&amp;gt;&lt;br /&gt;#include &amp;lt;cstdlib&amp;gt;&lt;br /&gt;#include &amp;lt;iostream&amp;gt;&lt;br /&gt;&lt;br /&gt;// (p - q)^2 for one pair of coordinates&lt;br /&gt;struct powfunctor&lt;br /&gt;{&lt;br /&gt;    __host__ __device__&lt;br /&gt;    float operator()(const float&amp;amp; p, const float&amp;amp; q) const {&lt;br /&gt;        return pow( p - q, 2);&lt;br /&gt;    }&lt;br /&gt;};&lt;br /&gt;&lt;br /&gt;// reference solution computed on the host&lt;br /&gt;template &amp;lt;typename InputIterator1, typename InputIterator2&amp;gt;&lt;br /&gt;double computeGold(InputIterator1 first1, InputIterator1 last1, InputIterator2 first2, InputIterator2 last2)&lt;br /&gt;{&lt;br /&gt;    double sum = 0.0;&lt;br /&gt;    for(; (first1 != last1) &amp;amp;&amp;amp; (first2 != last2); ++first1, ++first2)&lt;br /&gt;        sum += pow(*first1 - *first2, 2);&lt;br /&gt;&lt;br /&gt;    double s1 = sqrt(sum);&lt;br /&gt;    std::cout &amp;lt;&amp;lt; "Gold=" &amp;lt;&amp;lt; s1 &amp;lt;&amp;lt; std::endl;&lt;br /&gt;&lt;br /&gt;    return s1;&lt;br /&gt;}&lt;br /&gt;&lt;br /&gt;int main(void)&lt;br /&gt;{&lt;br /&gt;    const unsigned int N = 1 &amp;lt;&amp;lt; 20;&lt;br /&gt;&lt;br /&gt;    // fill the inputs on the host: 'rand' is a host function and cannot be&lt;br /&gt;    // executed in the device i.e. GPU&lt;br /&gt;    thrust::host_vector&amp;lt;float&amp;gt; p_vec(N);&lt;br /&gt;    thrust::host_vector&amp;lt;float&amp;gt; q_vec(N);&lt;br /&gt;    srand(0);&lt;br /&gt;    thrust::generate(p_vec.begin(), p_vec.end(), rand);&lt;br /&gt;    thrust::generate(q_vec.begin(), q_vec.end(), rand);&lt;br /&gt;&lt;br /&gt;    double referenceSoln = computeGold(p_vec.begin(), p_vec.end(), q_vec.begin(), q_vec.end());&lt;br /&gt;&lt;br /&gt;    // __CUDA_ARCH__ is only defined while device code is being compiled, so it&lt;br /&gt;    // cannot pick a code path inside main(). USE_DEVICE_VECTOR is a made-up&lt;br /&gt;    // switch for this listing: build with 'nvcc -DUSE_DEVICE_VECTOR' to take&lt;br /&gt;    // the device_vector path instead of the raw-pointer path&lt;br /&gt;#ifdef USE_DEVICE_VECTOR&lt;br /&gt;    thrust::device_vector&amp;lt;float&amp;gt; d_p = p_vec; // copies host to device&lt;br /&gt;    thrust::device_vector&amp;lt;float&amp;gt; d_q = q_vec;&lt;br /&gt;    thrust::device_vector&amp;lt;float&amp;gt; d_r(N);&lt;br /&gt;    // Thrust's transform supports 2 input ranges, so we use it&lt;br /&gt;    thrust::transform(d_p.begin(), d_p.end(), d_q.begin(), d_r.begin(), powfunctor());&lt;br /&gt;&lt;br /&gt;    float sum = thrust::reduce(d_r.begin(), d_r.end(), 0.0f, thrust::plus&amp;lt;float&amp;gt;());&lt;br /&gt;#else&lt;br /&gt;    // device memory 'raw' pointers&lt;br /&gt;    float* raw_ptr_P;&lt;br /&gt;    float* raw_ptr_Q;&lt;br /&gt;    float* raw_ptr_R;&lt;br /&gt;&lt;br /&gt;    cudaMalloc( (void**)&amp;amp;raw_ptr_P, (N)*sizeof(float));&lt;br /&gt;    cudaMalloc( (void**)&amp;amp;raw_ptr_Q, (N)*sizeof(float));&lt;br /&gt;    cudaMalloc( (void**)&amp;amp;raw_ptr_R, (N)*sizeof(float));&lt;br /&gt;&lt;br /&gt;    thrust::device_ptr&amp;lt;float&amp;gt; dev_ptr_P(raw_ptr_P);&lt;br /&gt;    thrust::device_ptr&amp;lt;float&amp;gt; dev_ptr_Q(raw_ptr_Q);&lt;br /&gt;    thrust::device_ptr&amp;lt;float&amp;gt; dev_ptr_R(raw_ptr_R);&lt;br /&gt;&lt;br /&gt;    thrust::copy(p_vec.begin(), p_vec.end(), dev_ptr_P);&lt;br /&gt;    thrust::copy(q_vec.begin(), q_vec.end(), dev_ptr_Q);&lt;br /&gt;&lt;br /&gt;    // uncommenting the following will produce errors for 1.x devices&lt;br /&gt;    // complaining that CUDA doesn't support function pointers and function&lt;br /&gt;    // templates. reason is because a host function like 'rand' cannot be&lt;br /&gt;    // executed in the device i.e. GPU&lt;br /&gt;    //thrust::generate(dev_ptr_P, dev_ptr_P + N, rand);&lt;br /&gt;    //thrust::generate(dev_ptr_Q, dev_ptr_Q + N, rand);&lt;br /&gt;&lt;br /&gt;    thrust::transform(dev_ptr_P, dev_ptr_P + N, dev_ptr_Q, dev_ptr_R, powfunctor());&lt;br /&gt;&lt;br /&gt;    float sum = thrust::reduce(dev_ptr_R, dev_ptr_R + N, 0.0f, thrust::plus&amp;lt;float&amp;gt;());&lt;br /&gt;&lt;br /&gt;    // release the raw device allocations&lt;br /&gt;    cudaFree(raw_ptr_P);&lt;br /&gt;    cudaFree(raw_ptr_Q);&lt;br /&gt;    cudaFree(raw_ptr_R);&lt;br /&gt;#endif&lt;br /&gt;&lt;br /&gt;    std::cout &amp;lt;&amp;lt; "1. GPU " &amp;lt;&amp;lt; sqrt(sum) &amp;lt;&amp;lt; std::endl;&lt;br /&gt;    std::cout &amp;lt;&amp;lt; "2. CPU " &amp;lt;&amp;lt; referenceSoln &amp;lt;&amp;lt; std::endl;&lt;br /&gt;    std::cout &amp;lt;&amp;lt; "END" &amp;lt;&amp;lt; std::endl;&lt;br /&gt;    return 0;&lt;br /&gt;}&lt;br /&gt;&lt;/pre&gt;Before I go off, I'll leave you with Rich Hickey's latest talk on Hammock-Driven Development. It's a really cool video and I strongly recommend that you catch all 40 mins of it. It's a great video to start thinking about improving your own thinking in 2011.&lt;br /&gt;Would this make computing, say, recommendation algorithms faster? 
By the pure fact that you're leveraging the GPU, the quick answer is "Yes", but what I actually wanted to show in this post is that there is always more than 1 way to get something done, which is simply "freedom".&lt;br /&gt;&lt;br /&gt;Have fun!&lt;br /&gt;</content><link rel='replies' type='application/atom+xml' href='http://raymondtay.blogspot.com/feeds/1342019860179340397/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5021319479788698734&amp;postID=1342019860179340397&amp;isPopup=true' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5021319479788698734/posts/default/1342019860179340397'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5021319479788698734/posts/default/1342019860179340397'/><link rel='alternate' type='text/html' href='http://raymondtay.blogspot.com/2011/02/exploring-cuda-via-euclidean-distance.html' title='Exploring CUDA via the Euclidean 
Distance'/><author><name>Raymond Tay</name><uri>https://profiles.google.com/110897033209560123249</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh4.googleusercontent.com/-4ZP-rPmLNSw/AAAAAAAAAAI/AAAAAAAABao/8f4AuUXNrno/s512-c/photo.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/_n2HkB0XD3Kw/TSEliqeA1bI/AAAAAAAABKU/xHtaEakP10k/s72-c/Screen%2Bshot%2B2011-01-03%2Bat%2BAM%2B09.24.03.png' height='72' width='72'/><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5021319479788698734.post-7885622512047204893</id><published>2010-12-29T09:52:00.001+08:00</published><updated>2010-12-31T09:34:30.964+08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='CUDA'/><category scheme='http://www.blogger.com/atom/ns#' term='parallel string matching algorithms'/><category scheme='http://www.blogger.com/atom/ns#' term='string matching algorithms'/><title type='text'>My humble google code project</title><content type='html'>Just wanted to inform everyone that i've released some sample implementations of exact string matching algorithms in CUDA into my first google code project &lt;a href="http://code.google.com/p/exactstrmatchgpu/"&gt;exactstrmatchgpu&lt;/a&gt;. 
As a first release ever, I've included the Bruteforce, Horspool and QuickSearch equivalents in CUDA - I'd surely like your comments and feedback :).&lt;br /&gt;&lt;br /&gt;I'd like to leave you with this screen shot from Rich Hickey's recent talk in October 2010, which pretty much sums up the motivation behind this stuff I've just put out.&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://3.bp.blogspot.com/_n2HkB0XD3Kw/TR0ytggbfCI/AAAAAAAABKQ/NLF9UKBAjKE/s1600/Screen+shot+2010-12-31+at+AM+09.28.33.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="203" src="http://3.bp.blogspot.com/_n2HkB0XD3Kw/TR0ytggbfCI/AAAAAAAABKQ/NLF9UKBAjKE/s320/Screen+shot+2010-12-31+at+AM+09.28.33.png" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;The code is messy but I'm working on cleaning it up. Thank you for your understanding and patience - meanwhile send me feedback!&lt;br /&gt;&lt;br /&gt;I plan to make more implementations available in the coming weeks before my classes start.&lt;br /&gt;
</content><link rel='replies' type='application/atom+xml' href='http://raymondtay.blogspot.com/feeds/7885622512047204893/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5021319479788698734&amp;postID=7885622512047204893&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5021319479788698734/posts/default/7885622512047204893'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5021319479788698734/posts/default/7885622512047204893'/><link rel='alternate' type='text/html' href='http://raymondtay.blogspot.com/2010/12/my-humble-google-code-project.html' title='My humble google code project'/><author><name>Raymond Tay</name><uri>https://profiles.google.com/110897033209560123249</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh4.googleusercontent.com/-4ZP-rPmLNSw/AAAAAAAAAAI/AAAAAAAABao/8f4AuUXNrno/s512-c/photo.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/_n2HkB0XD3Kw/TR0ytggbfCI/AAAAAAAABKQ/NLF9UKBAjKE/s72-c/Screen+shot+2010-12-31+at+AM+09.28.33.png' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5021319479788698734.post-8671201492272464936</id><published>2010-11-30T09:27:00.001+08:00</published><updated>2010-11-30T09:30:23.234+08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='CUDA'/><category 
scheme='http://www.blogger.com/atom/ns#' term='OpenCL'/><category scheme='http://www.blogger.com/atom/ns#' term='parallel string matching algorithms'/><category scheme='http://www.blogger.com/atom/ns#' term='string matching algorithms'/><title type='text'>CUDA - An demonstration w.r.t Exact String Matching</title><content type='html'>I haven't posted about CUDA stuff for a while, so I thought: why not do something?&lt;br /&gt;&lt;br /&gt;The stuff I'm going to show you is what I submitted for the &lt;a href="http://www.elsevier.com/wps/find/bookdescription.cws_home/724275/description#description"&gt;GPU Computing Gems&lt;/a&gt; by &lt;a href="http://www.elsevier.com/"&gt;Elsevier&lt;/a&gt; - though my work was recently turned down by the editorial committee at the final presentation. I did a post-mortem and realized that I had put in too few hours (typically you should spend 3 months per kernel on implementation, testing &amp;amp; optimization, and I had worked on 3 CUDA algorithms). I'm personally disappointed, but for someone who's a first-time writer when it comes to submitting papers, I'm thankful for being given the&amp;nbsp; opportunity - I'd like to thank Prof Hwu for his generous comments and review.&lt;br /&gt;&lt;br /&gt;So...&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Motivation&lt;/b&gt;&lt;br /&gt;The idea is really simple: &lt;br /&gt;&lt;blockquote&gt;How can I turn algorithms that were designed to run on CPUs into ones that run on graphics card(s) with possibly hundreds of cores, e.g. GTX2XX, GTX480, GTX580?&amp;nbsp;&lt;/blockquote&gt;&lt;br /&gt;CUDA is possibly the answer to this (possibly for the next 10 years or so). &lt;br /&gt;&lt;br /&gt;CUDA is an extension of the C language (makes sense, doesn't it, when it comes to controlling the hardware?) 
and it's not that hard to learn.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Getting to it&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;After reading &lt;a href="http://www.amazon.com/Programming-Massively-Parallel-Processors-Hands/dp/0123814723/ref=pd_bxgy_b_img_b"&gt;David Kirk &amp;amp; Prof Hwu's&lt;/a&gt; book, I felt that I just had to try it out! Next, I spent some time thinking about the sort of problems I could apply the techniques I'd learnt to, and one subject stood out: Exact String Matching. It seemed perfect since the problem has been well understood for decades and serial implementations are available via online texts or classic textbooks.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;How do I parallelize a serial algorithm?&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;This is the first hurdle you've got to deal with ...&lt;br /&gt;&lt;br /&gt;IMO, it helps to start with something simple (think about the technology for the implementation at a later time).&lt;br /&gt;&lt;br /&gt;Take a pen/pencil and paper and start drawing what you think the algorithm should look like. I use squigglies to represent threads, draw blocks to house the squigglies etc. Plot a data-flow graph of how your implementation would look. Check your algorithm for potentially embarrassingly data-parallel code (for-loops are good candidates), identify those parts and exploit them. Depending on your experience and exposure to this stuff, it could be easy or extremely excruciating.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Which algorithms did I apply parallelism to?&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;Embarrassingly, one of them is brute-force; the other two are Horspool and QuickSearch. I picked the latter two because by solving one of them, I get the solution to the other for free :)&lt;br /&gt;&lt;br /&gt;The implementation I have is not optimized. That I know, unfortunately, since I was rushing to use it to submit an initial draft and thought I should be able to fix it in the in-between hours of off-work and bedtime (I was SOooooo wrong). 
Anyway that's history.&lt;br /&gt;&lt;br /&gt;I don't have a multi-GPU solution at the moment since one GTX 480 set me back by 1000 singapore dollars and i wasn't ready to replace my rig, which has 2 PCIe slots but they are physically TOO DAMN CLOSE to house 2 GTX 480s.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;I won't present the code here since i'm also preparing to host it in a repository - i'll let you know where it is if you are interested. Drop me a mail.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Initial results&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;Benchmarking is very important, so starting with a relatively small data set and string i went about testing the algorithm with and without shared memory. Through Nvidia's documentation, i realized that shared memory offers an aggregate bandwidth of 1 TB/s. Woot! To get an idea of how the algorithm performed, i used the Visual Profiler that Nvidia provides, which is very useful in helping me to understand where the latencies are.&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://2.bp.blogspot.com/_n2HkB0XD3Kw/TNjOYem8p3I/AAAAAAAABJ4/JpNJY43Byu4/s1600/Screen+shot+2010-11-09+at+PM+12.27.07.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="250" src="http://2.bp.blogspot.com/_n2HkB0XD3Kw/TNjOYem8p3I/AAAAAAAABJ4/JpNJY43Byu4/s400/Screen+shot+2010-11-09+at+PM+12.27.07.png" width="400" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;Initially, you will notice 2 long bars which respectively represent the actual kernel function &lt;b&gt;strstr2&lt;/b&gt; and the memory-set function, &lt;b&gt;memset32_aligned1D&lt;/b&gt;, used internally by CUDA. &lt;br /&gt;&lt;br /&gt;&lt;b&gt;That's not good at all, how to improve it?&lt;/b&gt;&lt;br /&gt;A couple of things came to mind: removing dependencies on the global memory and improving memory coalescing. Register spilling doesn't appear to be occurring, whew. 
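&lt;br /&gt;&lt;br /&gt;To make the first idea concrete, here is a minimal sketch of a brute-force matcher that stages the pattern in shared memory, so the inner comparison loop no longer re-reads it from global memory. It is an illustration only, not the submitted code: the kernel name bruteForceMatch, the MAX_PATTERN limit, the one-thread-per-start-position layout and the sample strings are all assumptions, and error checking is left out for brevity.&lt;br /&gt;
&lt;pre&gt;#include &amp;lt;cstdio&amp;gt;
#include &amp;lt;cstring&amp;gt;
#include &amp;lt;cuda_runtime.h&amp;gt;

#define MAX_PATTERN 64  // assumed upper bound on the pattern length

// Hypothetical kernel: each thread tests one candidate start position.
__global__ void bruteForceMatch(const char *text, int n,
                                const char *pattern, int m, int *hits)
{
    __shared__ char pat[MAX_PATTERN];

    // The threads of a block cooperatively copy the pattern into shared memory.
    for (int k = threadIdx.x; k &amp;lt; m; k += blockDim.x)
        pat[k] = pattern[k];
    __syncthreads();

    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i + m &amp;gt; n)
        return;  // no room left for a full match at this position

    int j = 0;
    while (j &amp;lt; m &amp;amp;&amp;amp; text[i + j] == pat[j])
        ++j;
    hits[i] = (j == m);  // 1 if the whole pattern matched at position i
}

int main()
{
    const char text[] = "abracadabra";
    const char pattern[] = "abra";
    const int n = (int)strlen(text);
    const int m = (int)strlen(pattern);
    if (m &amp;gt; MAX_PATTERN)
        return 1;

    char *dText, *dPattern;
    int *dHits;
    cudaMalloc((void **)&amp;amp;dText, n);
    cudaMalloc((void **)&amp;amp;dPattern, m);
    cudaMalloc((void **)&amp;amp;dHits, n * sizeof(int));
    cudaMemcpy(dText, text, n, cudaMemcpyHostToDevice);
    cudaMemcpy(dPattern, pattern, m, cudaMemcpyHostToDevice);
    cudaMemset(dHits, 0, n * sizeof(int));  // start from a known state

    const int threads = 128;
    const int blocks = (n + threads - 1) / threads;
    bruteForceMatch&amp;lt;&amp;lt;&amp;lt;blocks, threads&amp;gt;&amp;gt;&amp;gt;(dText, n, dPattern, m, dHits);

    int hits[sizeof(text)];
    cudaMemcpy(hits, dHits, n * sizeof(int), cudaMemcpyDeviceToHost);
    for (int i = 0; i &amp;lt; n; ++i)
        if (hits[i])
            printf("match at offset %d\n", i);

    cudaFree(dText);
    cudaFree(dPattern);
    cudaFree(dHits);
    return 0;
}
&lt;/pre&gt;
In this layout neighbouring threads read neighbouring bytes of the text, which is the kind of access pattern that coalescing favours; whether it actually pays off depends on the device and should be checked in the profiler.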
&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Use Constant Memory&lt;/b&gt;&lt;br /&gt;This sort of memory is very useful in other applications (10% reduction in run times) but not in my implementation. It's effective when all the threads in a warp read the same variable (declared with the &lt;span style="color: blue;"&gt;__constant__&lt;/span&gt; qualifier), because the value can then be broadcast to them in one go. Conversely, it's not effective in my implementation because the memory access pattern of the search in block-1 differs from that in, say, block-99; but if you have a situation where all threads are executing an instruction which accesses the same memory location, then it's good.&lt;br /&gt;&lt;b&gt;&lt;br /&gt;&lt;/b&gt;&lt;br /&gt;&lt;b&gt;Respect data structure alignment&lt;/b&gt;&lt;br /&gt;This is important because in the current architecture, global memory supports reads/writes of 1, 2, 4, 8 &amp;amp; 16 bytes, and the data has to be aligned to its size. If it is not, the compiler needs to generate multiple instructions to support the non-aligned access.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Use Shared Memory&lt;/b&gt; &lt;br /&gt;The kernel function, &lt;i&gt;&lt;b&gt;strstr2&lt;/b&gt;&lt;/i&gt;, is the actual workhorse conducting the search for the string and you can see that just by using shared memory on the device the kernel improved by 997%! woot! (Strictly speaking, a 997% improvement is roughly an order of magnitude faster, not a thousand times faster.)&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Improve Memory Coalescing&lt;/b&gt;&lt;br /&gt;Coalescing is a concept that's applicable to a warp of CUDA threads but originated from computer science literature. It all comes down to optimal memory access patterns. A poor or random memory access pattern is sure to hit your application's effective bandwidth while a good memory access pattern (that is, one that respects the hardware) will surely send your application's effective bandwidth through the roof. 
&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Warp divergence&lt;/b&gt;&lt;br /&gt;To spot this right away, examine your kernel code for conditional statements like "if"; if there are any, then there's a good chance you'll encounter warp divergence. The net effect is lower throughput: every time the threads of a warp disagree on a branch, the hardware executes the two paths one after the other, so the threads taking the &lt;b&gt;else&lt;/b&gt; portion sit idle while the &lt;b&gt;if&lt;/b&gt; portion runs, and vice versa.&lt;br /&gt;&lt;br /&gt;In my limited experience, i've found it tough to eliminate warp divergence and it appears to me that re-designing the kernels is necessary... *puts on his thinking hat*&lt;br /&gt;&lt;br /&gt;&lt;div style="color: red;"&gt;&lt;b&gt;Gotchas&lt;/b&gt;&lt;/div&gt;If you are developing on a mac book pro, especially the newer models like the one i got, the laptop will drop the performance of your graphics chip when you're unplugged from AC; just remember to plug in when running CUDA else you'll suffer a performance hit. Not entirely sure what the cause is: my suspects are CPU down-stepping, GT 330M graphics card switching on my mac book pro etc.&lt;br /&gt;&lt;br /&gt;I've observed that when developing CUDA apps on Mac OS X it's always good practice to use &lt;i style="color: blue;"&gt;cudaMemset(...)&lt;/i&gt;&lt;span style="color: black;"&gt; to initialize your data structures to a default state.&lt;/span&gt; &lt;br /&gt;&lt;br /&gt;Double-precision operations are supported only on devices of compute capability 1.3 and above, so my GT 330M (compute capability 1.2) cannot perform such operations without losing accuracy.&lt;br /&gt;&lt;br /&gt;So....&lt;br /&gt;&lt;br /&gt;That's the end of my sharing session and i hope i've been of some help to you, or at least got you excited about CUDA. 
I know i still am :) &lt;br /&gt;&lt;script type="text/javascript"&gt;  var _gaq = _gaq || [];  _gaq.push(['_setAccount', 'UA-5323400-1']);  _gaq.push(['_trackPageview']);  (function() {    var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true;    ga.src = ('https:' == document.location.protocol ? 
'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';    var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s);  })();&lt;/script&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5021319479788698734-8671201492272464936?l=raymondtay.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://raymondtay.blogspot.com/feeds/8671201492272464936/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5021319479788698734&amp;postID=8671201492272464936&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5021319479788698734/posts/default/8671201492272464936'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5021319479788698734/posts/default/8671201492272464936'/><link rel='alternate' type='text/html' href='http://raymondtay.blogspot.com/2010/11/cuda-demonstration-wrt-exact-string.html' title='CUDA - An demonstration w.r.t Exact String Matching'/><author><name>Raymond Tay</name><uri>https://profiles.google.com/110897033209560123249</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh4.googleusercontent.com/-4ZP-rPmLNSw/AAAAAAAAAAI/AAAAAAAABao/8f4AuUXNrno/s512-c/photo.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/_n2HkB0XD3Kw/TNjOYem8p3I/AAAAAAAABJ4/JpNJY43Byu4/s72-c/Screen+shot+2010-11-09+at+PM+12.27.07.png' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5021319479788698734.post-8995682262699592643</id><published>2010-11-04T09:34:00.000+08:00</published><updated>2010-11-04T09:34:34.702+08:00</updated><title type='text'>Clojure - I'm excited about functional programming 
!</title><content type='html'>Oh man, these past few days have been quite an experience. I've just picked up a book &lt;a href="http://www.pragprog.com/titles/shcloj/programming-clojure"&gt;Programming Clojure&lt;/a&gt; by Stuart Halloway and started to inject the functional blood into my system again.&lt;br /&gt;&lt;br /&gt;It's very exciting i must say! There are new concepts to be learned, old concepts to be re-mapped. I'm planning to write a couple of posts on Clojure when i finish the book. There are so many ideas inside Clojure and two of the things i'm interested in are Memoization &amp;amp; Software Transactional Memory.&lt;br /&gt;&lt;br /&gt;Wish i could camp out at a corner of the coffeeshop for 8 hrs a day doing nothing but Clojure ; that reminds me of the time when i took time off just to learn Erlang ... these days i can't do that though i strongly believe that it helps anyone who's in the business of software development.&lt;br /&gt;&lt;br /&gt;Stay tuned!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5021319479788698734-8995682262699592643?l=raymondtay.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://raymondtay.blogspot.com/feeds/8995682262699592643/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5021319479788698734&amp;postID=8995682262699592643&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5021319479788698734/posts/default/8995682262699592643'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5021319479788698734/posts/default/8995682262699592643'/><link rel='alternate' type='text/html' href='http://raymondtay.blogspot.com/2010/11/clojure-im-excited-about-functional.html' title='Clojure - I&apos;m excited about 
functional programming !'/><author><name>Raymond Tay</name><uri>https://profiles.google.com/110897033209560123249</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh4.googleusercontent.com/-4ZP-rPmLNSw/AAAAAAAAAAI/AAAAAAAABao/8f4AuUXNrno/s512-c/photo.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5021319479788698734.post-7578983486941368420</id><published>2010-10-29T09:28:00.000+08:00</published><updated>2010-10-29T09:28:08.575+08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Ruby'/><category scheme='http://www.blogger.com/atom/ns#' term='Ruby Conference'/><title type='text'>Ruby conference in Singapore</title><content type='html'>Hi again everyone,&lt;br /&gt;&lt;br /&gt;I'd like to take the chance to make an announcement. A couple of weeks ago, i raised a question in the &lt;a href="http://groups.google.com/group/singapore-rb"&gt;Singapore Ruby Brigade&lt;/a&gt; about the possibility of bringing a &lt;a href="http://groups.google.com/group/singapore-rb/browse_thread/thread/a393a3f32a5657a8"&gt;Ruby Conference to Singapore&lt;/a&gt; (This follows the synergy in the recent PyCON Asia Conf 2010) ; i have to admit that i have no prior experience in this and i'm not the organizer of this event but just trying to lend a helping hand.&lt;br /&gt;&lt;br /&gt;So having said that,&lt;br /&gt;&lt;br /&gt;I'm calling out to &lt;b&gt;ALL&lt;/b&gt; Ruby enthusiasts out there to pitch in ideas at the Singapore Ruby Brigade group w.r.t the following:&lt;br /&gt;&lt;br /&gt;1) What tracks would you like the conference to have? e.g. education, research, application of ruby to real-world applications etc.&lt;br /&gt;&lt;br /&gt;2) Speakers list. Currently, a fellow member of the SRB has contacted several Ruby heavyweights and thus far many are interested. 
But if you know of someone we don't who is doing GREAT work, do not hesitate to submit their name.&lt;br /&gt;&lt;br /&gt;3) We are also trying to gather information about some practical issues like costs of running this conference so if anyone has expertise in this area, volunteering your expertise would be greatly appreciated.&lt;br /&gt;&lt;br /&gt;4) We are also trying to get an idea of how much people like yourselves are willing to pay to attend the conference. Please send a note to @sausheong on twitter.&lt;br /&gt;&lt;br /&gt;Alright, that's all i have for now. Would be great if we can come together to promote Ruby, share best practices and BE INSPIRED!&lt;br /&gt;&lt;br /&gt;Thank you for your attention!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5021319479788698734-7578983486941368420?l=raymondtay.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://raymondtay.blogspot.com/feeds/7578983486941368420/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5021319479788698734&amp;postID=7578983486941368420&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5021319479788698734/posts/default/7578983486941368420'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5021319479788698734/posts/default/7578983486941368420'/><link rel='alternate' type='text/html' href='http://raymondtay.blogspot.com/2010/10/ruby-conference-in-singapore.html' title='Ruby conference in Singapore'/><author><name>Raymond Tay</name><uri>https://profiles.google.com/110897033209560123249</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' 
src='//lh4.googleusercontent.com/-4ZP-rPmLNSw/AAAAAAAAAAI/AAAAAAAABao/8f4AuUXNrno/s512-c/photo.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5021319479788698734.post-2235150264395358096</id><published>2010-10-22T09:28:00.001+08:00</published><updated>2010-10-22T12:12:45.096+08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='IronRuby'/><category scheme='http://www.blogger.com/atom/ns#' term='IronPython'/><category scheme='http://www.blogger.com/atom/ns#' term='DLR'/><title type='text'>Shocked, Sad about DLR's IronPython IronRuby</title><content type='html'>Just a day after my previous post on IronPython and IronRuby, i've read the news that Jim Hugunin of Jython, IronPython and DLR fame has left Microsoft for Google. Read his blog &lt;a href="http://hugunin.net/microsoft_farewell.html"&gt;here&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://2.bp.blogspot.com/_n2HkB0XD3Kw/TMDoOBG50RI/AAAAAAAABJY/3Qt1a4zgOsY/s1600/Screen+shot+2010-10-22+at+AM+09.25.32.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="109" src="http://2.bp.blogspot.com/_n2HkB0XD3Kw/TMDoOBG50RI/AAAAAAAABJY/3Qt1a4zgOsY/s320/Screen+shot+2010-10-22+at+AM+09.25.32.png" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;This news came right after Microsoft announced that it'll stop investing in IronPython. Read it &lt;a href="http://blogs.msdn.com/b/jasonz/archive/2010/10/21/new-components-and-contributors-for-ironpython-and-ironruby.aspx"&gt;here&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;I'm shocked to say the least ...&lt;br /&gt;&lt;br /&gt;I've got questions ranging from the whys to what's going to happen to IronPython; yesterday i talked about the countless possibilities that DLR allowed developers like myself but now ...&lt;br /&gt;&lt;br /&gt;Life goes on i suppose ... 
back to the drawing board ... &lt;br /&gt;&lt;script type="text/javascript"&gt;  var _gaq = _gaq || [];  _gaq.push(['_setAccount', 'UA-5323400-1']);  _gaq.push(['_trackPageview']);  (function() {    var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true;    ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';    var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s);  })();&lt;/script&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5021319479788698734-2235150264395358096?l=raymondtay.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://raymondtay.blogspot.com/feeds/2235150264395358096/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5021319479788698734&amp;postID=2235150264395358096&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5021319479788698734/posts/default/2235150264395358096'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5021319479788698734/posts/default/2235150264395358096'/><link rel='alternate' type='text/html' href='http://raymondtay.blogspot.com/2010/10/shocked-sad-about-dlrs-ironpython.html' title='Shocked, Sad about DLR&apos;s IronPython IronRuby'/><author><name>Raymond Tay</name><uri>https://profiles.google.com/110897033209560123249</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh4.googleusercontent.com/-4ZP-rPmLNSw/AAAAAAAAAAI/AAAAAAAABao/8f4AuUXNrno/s512-c/photo.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' 
url='http://2.bp.blogspot.com/_n2HkB0XD3Kw/TMDoOBG50RI/AAAAAAAABJY/3Qt1a4zgOsY/s72-c/Screen+shot+2010-10-22+at+AM+09.25.32.png' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5021319479788698734.post-7193750255945219700</id><published>2010-10-21T21:11:00.001+08:00</published><updated>2010-10-21T21:12:18.239+08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Automation'/><category scheme='http://www.blogger.com/atom/ns#' term='Dynamic Language Runtime'/><category scheme='http://www.blogger.com/atom/ns#' term='Ruby'/><category scheme='http://www.blogger.com/atom/ns#' term='dynamic languages'/><category scheme='http://www.blogger.com/atom/ns#' term='IronRuby'/><category scheme='http://www.blogger.com/atom/ns#' term='IronPython'/><category scheme='http://www.blogger.com/atom/ns#' term='DLR'/><category scheme='http://www.blogger.com/atom/ns#' term='Python'/><title type='text'>Automation</title><content type='html'>&lt;b&gt;What's been happening&lt;/b&gt; &lt;br /&gt;It's been quite a while since i last posted anything as i've been busy learning new stuff, which includes new programming languages like C#, VB, Ruby and also building automation into a product that specializes in 3D modeling and visualization.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Laying down my perceived flaws&lt;/b&gt;&lt;br /&gt;Before we come to exploring Automation per se, one of the challenges i had to face was to actually learn the tools i needed to hit the ground running. Wasn't exactly tough but it required me to put down my previous reservations about learning and using the Microsoft Technology suite "stack" if i can call it that. Once that was done, it was kind of cool and i believe people who have an extensive coding background in open source technologies would make the leap of faith to Microsoft more easily than the other way round. 
That's what i saw at least.&lt;br /&gt;&lt;br /&gt;The key thing that helped me cross that barrier was recognizing that i should be looking at other technologies with an objective mind; focus on the problem and not the technology.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Automation&lt;/b&gt;&lt;br /&gt;Having said that, the next thing i want to talk about is Automation. After spending time working on it, i'm happy to say that the company has shipped version 1.5 to our customers. Now we're waiting for feedback. Yep, feedback's important to the whole process - no point building something that people don't like to use. Building automation is kind of a tough subject to begin with since it doesn't really fall into the standard SDLC; but that's what they said about Test Driven Development. TDD should be part of the SDLC, period.&lt;br /&gt;&lt;br /&gt;The usual questions are "How does automation help me?", "How much would it cost to build this automation?", "What should we automate?", "Can this be automated?". There are many more. The typical use cases of automation seem to lie more in complementing QA i.e. Automated Testing.&lt;br /&gt;&lt;br /&gt;We've built the usual stuff that comes along with Automated Testing Frameworks using C#, VB - Test Management Tools, Test Execution Tools (local, distributed), Test Reporting/Archival/Trending Tools...what did i miss? Hmm...whatever&lt;br /&gt;&lt;br /&gt;But i see a problem. We have to get someone who's skilled in the Microsoft Technology stack to be able to write tests (Well, there are a lot of Microsoft professionals aren't there?...yeah sure...but what if you could also draw on the people who practice Python, Ruby, Scala? Wouldn't that open you to a greater pool of talent?). Those professionals may or may not need to re-write libraries not found in the C# arsenal etc.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Where is this leading to?&lt;/b&gt;&lt;br /&gt;Think &lt;b&gt;IronPython&lt;/b&gt;, &lt;b&gt;IronRuby&lt;/b&gt;. 
These two ports originated from Python and Ruby respectively and they both have a great number of followers but most important of all, the integration of the Microsoft suite with open source is a very powerful notion. I've been working on a DLR integration with the Automation framework lately and progress has been promising. This is exciting for a number of reasons: i can still program in Python/Ruby, i can still re-use libraries that were previously developed (re-invention of the wheel ain't gonna happen), integration with Rails/Django to pull/publish data, you get the idea? The possibilities are endless.&lt;br /&gt;&lt;script type="text/javascript"&gt;  var _gaq = _gaq || [];  _gaq.push(['_setAccount', 'UA-5323400-1']);  _gaq.push(['_trackPageview']);  (function() {    var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true;    ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';    var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s);  })();&lt;/script&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5021319479788698734-7193750255945219700?l=raymondtay.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://raymondtay.blogspot.com/feeds/7193750255945219700/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5021319479788698734&amp;postID=7193750255945219700&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5021319479788698734/posts/default/7193750255945219700'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5021319479788698734/posts/default/7193750255945219700'/><link rel='alternate' type='text/html' 
href='http://raymondtay.blogspot.com/2010/10/automation.html' title='Automation'/><author><name>Raymond Tay</name><uri>https://profiles.google.com/110897033209560123249</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh4.googleusercontent.com/-4ZP-rPmLNSw/AAAAAAAAAAI/AAAAAAAABao/8f4AuUXNrno/s512-c/photo.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5021319479788698734.post-3073236129803682758</id><published>2010-09-18T05:23:00.000+08:00</published><updated>2010-09-18T05:23:37.754+08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='CUDA'/><category scheme='http://www.blogger.com/atom/ns#' term='OpenCL'/><category scheme='http://www.blogger.com/atom/ns#' term='GPU'/><title type='text'>Book cover for GPU GEMs Volume 2</title><content type='html'>Here's a quick peek at one possible book cover for GPU GEMs Volume 2 circa 2011 :)&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://3.bp.blogspot.com/_n2HkB0XD3Kw/TJPb0efoyOI/AAAAAAAABIw/Dwd0TAlRkUU/s1600/Screen+shot+2010-09-18+at+AM+05.16.39.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="640" src="http://3.bp.blogspot.com/_n2HkB0XD3Kw/TJPb0efoyOI/AAAAAAAABIw/Dwd0TAlRkUU/s640/Screen+shot+2010-09-18+at+AM+05.16.39.png" width="523" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;&lt;script type="text/javascript"&gt;  var _gaq = _gaq || [];  _gaq.push(['_setAccount', 'UA-5323400-1']);  _gaq.push(['_trackPageview']);  (function() {    var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true;    ga.src = ('https:' == document.location.protocol ? 
'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';    var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s);  })();&lt;/script&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5021319479788698734-3073236129803682758?l=raymondtay.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://raymondtay.blogspot.com/feeds/3073236129803682758/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5021319479788698734&amp;postID=3073236129803682758&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5021319479788698734/posts/default/3073236129803682758'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5021319479788698734/posts/default/3073236129803682758'/><link rel='alternate' type='text/html' href='http://raymondtay.blogspot.com/2010/09/book-cover-for-gpu-gems-volume-2.html' title='Book cover for GPU GEMs Volume 2'/><author><name>Raymond Tay</name><uri>https://profiles.google.com/110897033209560123249</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh4.googleusercontent.com/-4ZP-rPmLNSw/AAAAAAAAAAI/AAAAAAAABao/8f4AuUXNrno/s512-c/photo.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/_n2HkB0XD3Kw/TJPb0efoyOI/AAAAAAAABIw/Dwd0TAlRkUU/s72-c/Screen+shot+2010-09-18+at+AM+05.16.39.png' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5021319479788698734.post-5399189334741948897</id><published>2010-08-31T22:08:00.000+08:00</published><updated>2010-08-31T22:08:13.132+08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Catch exception'/><category 
scheme='http://www.blogger.com/atom/ns#' term='VB'/><category scheme='http://www.blogger.com/atom/ns#' term='C#'/><title type='text'>How to catch unhandled exceptions and generate diagnostic data</title><content type='html'>Recently, i got involved in the C# world and came across an interesting problem where one of our applications would crash at times and nobody would know why and how. The idea was to find a way to detect the crash, break into it and obtain its dump file &amp;amp; a nice stack trace. That always helps developers.&lt;br /&gt;&lt;br /&gt;There are a number of solutions but i needed something that was relatively easy for a C# newbie (like yours truly), and it had to be a runtime artifact that the code could reference (think dynamic loading), among other requirements. Voila, found a nice little library at Microsoft's Open Source website - yeah i know, as a microsoft newbie i had NO IDEA microsoft was INTO open source.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;And so the story goes, there is this neat library called &lt;a href="http://appmanagement.codeplex.com/"&gt;Application Management&lt;/a&gt;. Download it and check it out yourself. Might save you the time you needed. I know ;)&lt;br /&gt;&lt;br /&gt;Let me give you a quick rundown of what it's about. Basically, it's a DLL and supports the major microsoft programming languages like C#, VB etc. That's what made it nice: i created a sample application in C# and since it works, i know the VB.NET code i'm working on is likely to work. Cool.&lt;br /&gt;&lt;br /&gt;So what do you do?&lt;br /&gt;&lt;ol&gt;&lt;li&gt;Create a C# project in Visual Studio 2010 (You can try VS2008, i didn't try it)&lt;/li&gt;&lt;li&gt;Reference the DLL by adding it to your project (it's ApplicationManagement.dll)&lt;/li&gt;&lt;li&gt;You need to initialize it in your application. 
In this case, it's Program.cs&lt;/li&gt;&lt;/ol&gt;&lt;pre&gt;using System;&lt;br /&gt;using System.Windows.Forms;&lt;br /&gt;using ApplicationManagement;&lt;br /&gt;&lt;br /&gt;namespace MyTestApp&lt;br /&gt;{&lt;br /&gt;    static class Program&lt;br /&gt;    {&lt;br /&gt;        // some other stuff&lt;br /&gt;&lt;br /&gt;        // Handler from ApplicationManagement.dll that produces the dump and stack trace.&lt;br /&gt;        public static ExceptionHandler exceptionHandler;&lt;br /&gt;&lt;br /&gt;        static void Main()&lt;br /&gt;        {&lt;br /&gt;            exceptionHandler = new ExceptionHandler();&lt;br /&gt;            Application.EnableVisualStyles();&lt;br /&gt;            Application.SetCompatibleTextRenderingDefault(false);&lt;br /&gt;            // Route every unhandled exception in this AppDomain to our callback.&lt;br /&gt;            AppDomain.CurrentDomain.UnhandledException += new UnhandledExceptionEventHandler(CurrentDomain_UnhandledException);&lt;br /&gt;            Application.Run(new Form1());&lt;br /&gt;        }&lt;br /&gt;&lt;br /&gt;        static void CurrentDomain_UnhandledException(object sender, UnhandledExceptionEventArgs e)&lt;br /&gt;        {&lt;br /&gt;            exceptionHandler.UnhandledException(sender, e);&lt;br /&gt;        }&lt;br /&gt;    }&lt;br /&gt;}&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;So if your application code were to throw some form of exception, a user dump + stack trace would be created in your project's working directory. On my setup, it's found at &amp;lt;project&amp;gt;\bin\Debug\Logs\AppErrors\&lt;br /&gt;&lt;br /&gt;Well that's it. Have fun!&lt;br /&gt;&lt;script type="text/javascript"&gt;  var _gaq = _gaq || [];  _gaq.push(['_setAccount', 'UA-5323400-1']);  _gaq.push(['_trackPageview']);  (function() {    var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true;    ga.src = ('https:' == document.location.protocol ? 
'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';    var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s);  })();&lt;/script&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5021319479788698734-5399189334741948897?l=raymondtay.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://raymondtay.blogspot.com/feeds/5399189334741948897/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5021319479788698734&amp;postID=5399189334741948897&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5021319479788698734/posts/default/5399189334741948897'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5021319479788698734/posts/default/5399189334741948897'/><link rel='alternate' type='text/html' href='http://raymondtay.blogspot.com/2010/08/how-to-catch-unhandled-exceptions-and.html' title='How to catch unhandled exceptions and generate diagnostic data'/><author><name>Raymond Tay</name><uri>https://profiles.google.com/110897033209560123249</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh4.googleusercontent.com/-4ZP-rPmLNSw/AAAAAAAAAAI/AAAAAAAABao/8f4AuUXNrno/s512-c/photo.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5021319479788698734.post-5706581358724672616</id><published>2010-08-24T22:35:00.000+08:00</published><updated>2010-08-24T22:35:19.950+08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Microsoft Windows'/><category scheme='http://www.blogger.com/atom/ns#' term='VB'/><category scheme='http://www.blogger.com/atom/ns#' term='C#'/><category scheme='http://www.blogger.com/atom/ns#' term='.Net'/><title 
type='text'>The World of Windows (C#, F#, VB, .NET)</title><content type='html'>I haven't been blogging much lately because I've found a new job that requires me to learn new stuff - and this time it's not exactly the LAMP stack ;)&lt;br /&gt;&lt;br /&gt;So what is it exactly? Well, it's the mysterious world of Windows. Like all things, it's mysterious quite simply because I have not worked with it previously. But these past few weeks opened my eyes to another platform that I had never considered before. My new employer accepts that I lack experience in Windows development; that didn't deter me, so I guess that's probably why it doesn't deter them :) yay&lt;br /&gt;&lt;br /&gt;One of the questions I asked myself is: how do I get up to speed with C# and the like? Kachink! I read the ebook &lt;a href="http://www.amazon.com/Head-First-Learners-Real-World-Programming/dp/1449380344/ref=sr_1_1?ie=UTF8&amp;amp;s=books&amp;amp;qid=1282658951&amp;amp;sr=8-1"&gt;Head First C#&lt;/a&gt; (I was used to their layout and presentation format, so it works much better for me) and that got me where I wanted to be pretty quickly. I also thought &lt;a href="http://www.amazon.com/Essential-4-0-Microsoft-NET-Development/dp/0321694694/ref=sr_1_1?s=books&amp;amp;ie=UTF8&amp;amp;qid=1282659395&amp;amp;sr=1-1"&gt;Essential C# 4.0&lt;/a&gt; was a very well written book on the subject. That was C# done. How about Visual Basic? I looked to Microsoft's MSDN (which is a great resource btw)&lt;br /&gt;&lt;br /&gt;If you've done Java and J2EE, you'll probably find C# pretty easy to follow since it was *supposedly* created to be Java's rival :) and Visual Studio 2010 is packed with lots of goodies that the developer can take advantage of. 
One cool thing about VS2010 is that it reduces the amount of boilerplate code you have to type for accessor/mutator methods, which makes things so much easier.&lt;br /&gt;&lt;pre&gt;Here's an example in VB&lt;br /&gt;Public Class SimpleTest&lt;br /&gt;    ' --- replace the string "newPropertyValue" consistently&lt;br /&gt;    Private newPropertyValue As String&lt;br /&gt;    Public Property NewProperty() As String&lt;br /&gt;        Get&lt;br /&gt;            Return newPropertyValue&lt;br /&gt;        End Get&lt;br /&gt;        Set(ByVal value As String)&lt;br /&gt;            newPropertyValue = value&lt;br /&gt;        End Set&lt;br /&gt;    End Property&lt;br /&gt;&lt;br /&gt;End Class&lt;br /&gt;&lt;/pre&gt;&lt;pre&gt;Another example in C#&lt;br /&gt;using System;&lt;br /&gt;using System.Windows.Forms;&lt;br /&gt;&lt;br /&gt;namespace SimpleHelloWorld&lt;br /&gt;{&lt;br /&gt;    public partial class Form1 : Form&lt;br /&gt;    {&lt;br /&gt;        public Form1()&lt;br /&gt;        {&lt;br /&gt;            InitializeComponent();&lt;br /&gt;        }&lt;br /&gt;&lt;br /&gt;        private void button1_Click(object sender, EventArgs e)&lt;br /&gt;        {&lt;br /&gt;&lt;br /&gt;        }&lt;br /&gt;    }&lt;br /&gt;    //&lt;br /&gt;    // Accessor/Mutator methods to hide 'backing' value&lt;br /&gt;    class SimpleTest&lt;br /&gt;    {&lt;br /&gt;        private int state;&lt;br /&gt;        // a property with both get and set, so it is named State rather than GetState&lt;br /&gt;        public int State&lt;br /&gt;        {&lt;br /&gt;            get { return this.state; }&lt;br /&gt;            set { this.state = value; }&lt;br /&gt;        }&lt;br /&gt;    }&lt;br /&gt;}&lt;/pre&gt;From a learner's point of view, the syntax does help make a lot of concepts stick much better and faster, since I can quickly identify the commonalities these 2 languages share. Btw, I thought the idea of a 'partial' class was quite nice :) and the idea of namespaces pretty much matches what I understood of C++'s concept of 'namespace'. 
Of course this is not exhaustive.&lt;br /&gt;&lt;br /&gt;As a guy who's done Erlang, Python and Ruby, I wonder what F# holds for me. I guess I'll find out soon...&lt;br /&gt;&lt;br /&gt;I cannot possibly end this post without mentioning two Scotts: &lt;a href="http://weblogs.asp.net/scottgu/"&gt;Scott Gu's Blog&lt;/a&gt; and &lt;a href="http://scottcate.com/"&gt;Scott Cate's blog&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;My bias is possibly gone as I embrace these technologies (new to me, at least)&lt;br /&gt;</content><link rel='replies' type='application/atom+xml' href='http://raymondtay.blogspot.com/feeds/5706581358724672616/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5021319479788698734&amp;postID=5706581358724672616&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5021319479788698734/posts/default/5706581358724672616'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5021319479788698734/posts/default/5706581358724672616'/><link rel='alternate' type='text/html' href='http://raymondtay.blogspot.com/2010/08/world-of-windows-c-f-vb-net.html' title='The World of Windows (C#, F#, VB, .NET)'/><author><name>Raymond 
Tay</name><uri>https://profiles.google.com/110897033209560123249</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh4.googleusercontent.com/-4ZP-rPmLNSw/AAAAAAAAAAI/AAAAAAAABao/8f4AuUXNrno/s512-c/photo.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5021319479788698734.post-562083318540459858</id><published>2010-07-16T16:26:00.000+08:00</published><updated>2010-07-16T16:26:04.979+08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='CUDA'/><category scheme='http://www.blogger.com/atom/ns#' term='OpenCL'/><category scheme='http://www.blogger.com/atom/ns#' term='Nvidia'/><category scheme='http://www.blogger.com/atom/ns#' term='GPGPU'/><category scheme='http://www.blogger.com/atom/ns#' term='GPU'/><title type='text'>Author for 4th Edition of GPU GEMs</title><content type='html'>Let me start by saying: Yayyy!!! It was a whole lot of fun! I'll do it again!&lt;br /&gt;&lt;br /&gt;The reason for my joy is that my article has been accepted for GPU GEMs 4th edition, and I'm now finalizing it before it's released and all that stuff. That happened rather recently: I started working on the article around April 2010 at a feverish pace and made it before the deadline; now I'm cleaning up the article and I've agreed to release the source code (hope it won't be too contrived for you experts out there) - details will be finalized over the course of time.&lt;br /&gt;&lt;br /&gt;I'm not trying to be secretive about the subject of my work, but the topic relates to the application of CUDA (Nvidia's Compute Unified Device Architecture) and so far, I've had tons of fun with it. As part of the finalization process, I've got new hardware, namely a GTX480 (this hardware is OSSM!), 
running benchmarks, looking at the possibility of embedding PTX (the assembly language for CUDA), learning illustrations etc.&lt;br /&gt;&lt;br /&gt;Btw, if you are interested in CUDA, you should check out NVIDIA's website on CUDA, download it from &lt;a href="http://developer.nvidia.com/object/cuda_3_1_downloads.html"&gt;here&lt;/a&gt; and get yourself started. What's interesting about the latest CUDA is the support for C/C++ printf() and function pointers in the CUDA &lt;i&gt;kernel&lt;/i&gt;, which opens up a HUGE treasure trove of possibilities for applications.&lt;br /&gt;</content><link rel='replies' type='application/atom+xml' href='http://raymondtay.blogspot.com/feeds/562083318540459858/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5021319479788698734&amp;postID=562083318540459858&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5021319479788698734/posts/default/562083318540459858'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5021319479788698734/posts/default/562083318540459858'/><link rel='alternate' type='text/html' 
href='http://raymondtay.blogspot.com/2010/07/author-for-4th-edition-of-gpu-gems.html' title='Author for 4th Edition of GPU GEMs'/><author><name>Raymond Tay</name><uri>https://profiles.google.com/110897033209560123249</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh4.googleusercontent.com/-4ZP-rPmLNSw/AAAAAAAAAAI/AAAAAAAABao/8f4AuUXNrno/s512-c/photo.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5021319479788698734.post-8838922066797874972</id><published>2010-07-08T09:58:00.000+08:00</published><updated>2010-07-08T09:58:58.527+08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='CUDA'/><category scheme='http://www.blogger.com/atom/ns#' term='Nvidia'/><category scheme='http://www.blogger.com/atom/ns#' term='CPU'/><category scheme='http://www.blogger.com/atom/ns#' term='GPU'/><category scheme='http://www.blogger.com/atom/ns#' term='Intel'/><title type='text'>Debunking the 100X GPU vs. CPU myth: An evaluation of throughput computing on CPU and GPU</title><content type='html'>I just read a very interesting article on the continuing debate about the relative performance of GPUs and CPUs. It's between Intel and Nvidia; not surprisingly, it's not AMD vs. ATI or Intel vs. ATI - that would REALLY be &lt;i&gt;interesting&lt;/i&gt;. Here's the abstract&lt;br /&gt;&lt;br /&gt;Abstract:&lt;br /&gt;&lt;blockquote&gt;Recent advances in computing have led to an explosion in the amount of data being generated. Processing the ever-growing data in a timely manner has made throughput computing an important aspect for emerging applications. Our analysis of a set of important throughput computing kernels shows that there is an ample amount of parallelism in these kernels which makes them suitable for today’s multi-core CPUs and GPUs. 
In the past few years there have been many studies claiming GPUs deliver substantial speedups (between 10X and 1000X) over multi-core CPUs on these kernels. To understand where such large performance difference comes from, we perform a rigorous performance analysis and find that after applying optimizations appropriate for both CPUs and GPUs the performance gap between an NVIDIA GTX280 processor and the Intel Core i7-960 processor narrows to only 2.5x on average. In this paper, we discuss optimization techniques for both CPU and GPU, analyze what architecture features contributed to performance differences between the two architectures, and recommend a set of architectural features which provide significant improvement in architectural efficiency for throughput kernels.&lt;/blockquote&gt;I'm a big fan of GPUs and recognize the value of CPUs, but one thing I'd like to keep in mind is that GPUs, though not a new technology (in 2010), are still young compared to the CPU as we know it today. Go &lt;a href="http://doi.acm.org/10.1145/1816038.1816021"&gt;read it&lt;/a&gt; and form your own ideas about it.&lt;br /&gt;&lt;br /&gt;My thoughts...&lt;br /&gt;&lt;br /&gt;The design and purpose of CPUs and GPUs are entirely different, and for decades the GPU has played a complementary role to the CPU, but things are shifting as we speak, with massively-parallel antivirus engines and the like. Like any IT product in the market today, the GPU will go through the usual adoption and evolution phases before I can even see it attempt to supplant the CPU. How far away are we? I'd like to know too.&lt;br /&gt;&lt;br /&gt;So...&lt;br /&gt;&lt;br /&gt;If I were a bystander reading the article, my first reaction would be: has the GPU finally caught up with the CPU enough to prompt an intensive study of this nature by the largest CPU producer in the world? 
Perhaps I should REALLY pay attention to it.&lt;br /&gt;</content><link rel='related' href='http://doi.acm.org/10.1145/1816038.1816021' title='Debunking the 100X GPU vs. CPU myth: An evaluation of throughput computing on CPU and GPU'/><link rel='replies' type='application/atom+xml' href='http://raymondtay.blogspot.com/feeds/8838922066797874972/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5021319479788698734&amp;postID=8838922066797874972&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5021319479788698734/posts/default/8838922066797874972'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5021319479788698734/posts/default/8838922066797874972'/><link rel='alternate' type='text/html' href='http://raymondtay.blogspot.com/2010/07/debunking-100x-gpu-vs-cpu-myth.html' title='Debunking the 100X GPU vs. 
CPU myth: An evaluation of throughput computing on CPU and GPU'/><author><name>Raymond Tay</name><uri>https://profiles.google.com/110897033209560123249</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh4.googleusercontent.com/-4ZP-rPmLNSw/AAAAAAAAAAI/AAAAAAAABao/8f4AuUXNrno/s512-c/photo.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5021319479788698734.post-7655463100650298756</id><published>2010-06-29T22:58:00.000+08:00</published><updated>2010-06-29T22:58:03.050+08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Linden Lab'/><category scheme='http://www.blogger.com/atom/ns#' term='SecondLife'/><title type='text'>Another adventure begins!</title><content type='html'>Hello everyone,&lt;br /&gt;&lt;br /&gt;To start, I'd like to say that I'm leaving Linden Lab. If you caught the news on the internet, you'll understand why; if you didn't, here are two articles I found recently by googling: http://www.vizworld.com/2010/06/linden-lab-laying-staff-closing-singapore-office/ and http://digitalmedia.strategyeye.com/article/70432b60e0/2010/06/10/Linden_Lab_cuts_30_of_Second_Life_staff/&lt;br /&gt;&lt;br /&gt;To be clear about the motivation for this post, I'm only sharing my thoughts on the entire thing. If there is one takeaway I'd like you to have from this article, it would be " when one door closes, another door opens "&lt;br /&gt;&lt;br /&gt;Obviously, being laid off is not a pleasant experience, but it's something I have to deal with. Having support from family and co-workers during this period really made it easier to cope. Support from co-workers in the U.S. also helped make things easier; all in all, it wasn't too bad an experience. Lindens are just a warm and fuzzy bunch ;)&lt;br /&gt;&lt;br /&gt;Our lab in Singapore is "tight" - yeah, I mean it. 
Having smart and enthusiastic people come to the office every day, hearing their ideas and seeing how feverishly they work to accomplish them is what it's all about. Period. Work hard, play hard. You should see some of the stuff we did every quarter - we went cycling at Pulau Ubin (an offshore island of Singapore), had great lunches, went bowling etc. Our farewell lunch was quite a different experience, at least for me, as I quickly realized that I would no longer be working, having lunches or having discussions/debates with these people. The bantering, ohhhh how I will miss that.&lt;br /&gt;&lt;br /&gt;Someone once said " &lt;i&gt;what makes a person want to stay with a company is always the people&lt;/i&gt; ". Was it Mitch Kapor? Well, I don't know.&lt;br /&gt;&lt;br /&gt;Moving on, I realized that I'll soon be out of a job. Jobless in Singapore! Hehe, I'll admit these are uncertain times and I was wondering what I should do next. Thanks to the great support from co-workers and family, I quickly found myself doing interviews, and I'm really surprised how many people are doing amazing work in startups and established companies. I had a blast doing these interviews! These companies are recognizing the power of open source and filling their leadership roles with people who have the right mindset :D&lt;br /&gt;&lt;br /&gt;I'm still looking for my next adventure :D&lt;br /&gt;&lt;br /&gt;Dennis Hopper in "Crash" said "&lt;i&gt; it's a beautiful thing man&lt;/i&gt; ... "&lt;br /&gt;</content><link rel='replies' type='application/atom+xml' href='http://raymondtay.blogspot.com/feeds/7655463100650298756/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5021319479788698734&amp;postID=7655463100650298756&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5021319479788698734/posts/default/7655463100650298756'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5021319479788698734/posts/default/7655463100650298756'/><link rel='alternate' type='text/html' href='http://raymondtay.blogspot.com/2010/06/another-adventure-begins.html' title='Another adventure begins!'/><author><name>Raymond Tay</name><uri>https://profiles.google.com/110897033209560123249</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh4.googleusercontent.com/-4ZP-rPmLNSw/AAAAAAAAAAI/AAAAAAAABao/8f4AuUXNrno/s512-c/photo.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5021319479788698734.post-2474031696688235187</id><published>2010-06-05T00:10:00.000+08:00</published><updated>2010-06-05T00:10:49.393+08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='boost coroutines'/><category scheme='http://www.blogger.com/atom/ns#' term='python yield'/><category scheme='http://www.blogger.com/atom/ns#' term='coroutines'/><category scheme='http://www.blogger.com/atom/ns#' term='C/C++'/><category 
scheme='http://www.blogger.com/atom/ns#' term='Python'/><title type='text'>Touring C++ Coroutines</title><content type='html'>In the last couple of months, I have been exposed to &lt;b&gt;C++ coroutines&lt;/b&gt; and I wanted to write about what I learned. The library I'm referring to is &lt;b&gt;Boost.Coroutine&lt;/b&gt;, which has a family of class templates that wrap function objects in coroutines; these coroutines can be re-entered and can return more than once without causing the destruction of automatic objects. Basically, it allows state to be stored and retrieved across function calls. I've had the immense pleasure of figuring this stuff out because another favourite scripting language of mine, Python, uses this concept extensively in its 'yield' statement.&lt;br /&gt;&lt;br /&gt;There are two important concepts you'll need to comprehend before you can use coroutines effectively in C++. Number 1, the subroutines we're so used to programming (e.g. functions, procedures) lose their state when they return. (You might object that &lt;a href="http://www.sgi.com/tech/stl/generate.html"&gt;&lt;i&gt;function objects&lt;/i&gt;&lt;/a&gt; also save state, but there's a difference: function objects store their state in class member variables, while coroutines keep their state on the stack as automatic objects - these automatic objects are not destroyed btw, which is COOL!) 
Number 2, coroutines preserve the point of execution and are able to resume from where they left off, unlike &lt;i&gt;function objects&lt;/i&gt;.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Simple use of Coroutines&lt;/b&gt;&lt;br /&gt;One example is the generator function (yep, Python has it too), which is really a function that returns a sequence of values one at a time instead of returning all of them at once; this programming concept is very useful in practice, for example as a prime number generator that computes primes on demand.&lt;br /&gt;&lt;br /&gt;I love function objects because they allow me to indulge in the world of functional programming to an extent, even though I know I'm still working in C++. You can pretty much recognize a function object in C++ by looking for &lt;i style="color: blue;"&gt;operator()()&lt;/i&gt; in the source code, possibly alongside a couple of other overloaded operators that suit the logic.&lt;br /&gt;&lt;br /&gt;A simple example is a generator that produces all the integers between lower and upper inclusive, e.g. with a lower bound of 1 and an upper bound of 4.&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;#include &amp;lt;iostream&amp;gt;&lt;br /&gt;&lt;br /&gt;class generator {&lt;br /&gt;public:&lt;br /&gt;&amp;nbsp; generator(int lower, int upper): lb(lower), ub(upper) {}&lt;br /&gt;&amp;nbsp; // returns the current value and advances&lt;br /&gt;&amp;nbsp; int operator()() { return lb++; }&lt;br /&gt;&amp;nbsp; // true while values are left; &amp;lt;= keeps the upper bound inclusive&lt;br /&gt;&amp;nbsp; operator bool() const { return lb &amp;lt;= ub; }&lt;br /&gt;private:&lt;br /&gt;&amp;nbsp; int lb, ub;&lt;br /&gt;};&lt;br /&gt;int main(int argc, char** argv) {&lt;br /&gt;&amp;nbsp; generator gen(1,4);&lt;br /&gt;&amp;nbsp; // invokes the operator bool()&lt;br /&gt;&amp;nbsp; while(gen)&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; // invokes the operator()() - function object!&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; std::cout &amp;lt;&amp;lt; gen() &amp;lt;&amp;lt; "\n";&lt;br /&gt;}&lt;/pre&gt;There's another way, and that is using &lt;a href="http://www.sgi.com/tech/stl/InputIterator.html"&gt;input iterators&lt;/a&gt; as generators, but IMO it's not as straightforward and simple as using function objects, which I consider more elegant. Then again, I'll reiterate a philosophy of mine when it comes to choosing paradigms: "whatever works simply".&lt;br /&gt;&lt;br /&gt;So I know I can write generators using function objects; can I do that using coroutines? Sure I can, and here's what I found out&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;// In a .h file, there's the contents&lt;br /&gt;#ifndef GEN_TEMPLATE_H&lt;br /&gt;#define GEN_TEMPLATE_H&lt;br /&gt;&lt;br /&gt;// assumed header for coro::generator; adjust to your copy of Boost.Coroutine&lt;br /&gt;#include &amp;lt;boost/coroutine/generator.hpp&amp;gt;&lt;br /&gt;#include &amp;lt;boost/bind.hpp&amp;gt;&lt;br /&gt;namespace coro = boost::coroutines;&lt;br /&gt;typedef coro::generator&amp;lt;int&amp;gt; gen_type;&lt;br /&gt;/*&lt;br /&gt;Notice that the range_generator body is entered for the first time when the generator is constructed&lt;br /&gt;(from the main entry point), then at every iteration range_generator is reentered from&lt;br /&gt;yield(). In this case range_generator is reentered when generator::operator++ is invoked.&lt;br /&gt;*/&lt;br /&gt;int range_generator(gen_type::self&amp;amp; self, int min, int max) {&lt;br /&gt;    while ( min &amp;lt; max - 1)&lt;br /&gt;        self.yield(min++);&lt;br /&gt;    self.exit(); // This is used instead of returning a final value&lt;br /&gt;}&lt;br /&gt;#endif&lt;br /&gt;...&lt;br /&gt;...&lt;br /&gt;// In another .cpp file&lt;br /&gt;#ifdef GEN_TEMPLATE_H&lt;br /&gt;    gen_type generator(boost::bind(range_generator, _1, 100, 200) );&lt;br /&gt;    while( generator != gen_type() )&lt;br /&gt;        std::cout &amp;lt;&amp;lt; *generator++ &amp;lt;&amp;lt; "\n";&lt;br /&gt;#endif &lt;/pre&gt;Now this example creates the generator function in the input iterator fashion. The generator is created in the statement boost::bind(range_generator, _1, 100, 200), where _1 is the placeholder for the boost::coroutines::generator&amp;lt;int&amp;gt;::self argument that is passed to range_generator; you can also see the generator's "yield" and "exit" functions invoked through the "self" object. The "self" object is used to identify the different instances of a generator.&lt;br /&gt;&lt;br /&gt;So I asked myself: "Great, it works on built-ins, how about user defined types?" 
Yep, it can be done too, and here's what I did; it's not great but it'll do for this example&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;// in some .h file &lt;/pre&gt;&lt;pre&gt;#ifndef GEN_TEMPLATE_H&lt;br /&gt;#define GEN_TEMPLATE_H&lt;br /&gt;class test {&lt;br /&gt;public:&lt;br /&gt;    typedef int value_type;&lt;br /&gt;    // max starts out equal to the value itself; operator&amp;lt; below copies the real bound in&lt;br /&gt;    test(int min) : m_current(min), max(min) {}&lt;br /&gt;    test() : m_current(-1), max(-1) {}&lt;br /&gt;    test&amp;amp; operator++() {&lt;br /&gt;        m_current++;&lt;br /&gt;        if ( m_current &amp;gt; max ) m_current = -1;&lt;br /&gt;        return *this;&lt;br /&gt;    }&lt;br /&gt;    // post-increment returns the old value by copy (a reference to t would dangle)&lt;br /&gt;    test operator++(int) {&lt;br /&gt;        test t(*this);&lt;br /&gt;        ++*this;&lt;br /&gt;        return t;&lt;br /&gt;    }&lt;br /&gt;    friend bool operator&amp;lt;(test&amp;amp; rhs, const test&amp;amp; lhs) {&lt;br /&gt;        if ( rhs.max != lhs.max ) rhs.max = lhs.max; // only invoked the first time&lt;br /&gt;        return rhs.m_current &amp;lt; lhs.m_current;&lt;br /&gt;    }&lt;br /&gt;    friend bool operator==(const test&amp;amp; rhs, const test&amp;amp; lhs) {&lt;br /&gt;        return rhs.m_current == lhs.m_current;&lt;br /&gt;    }&lt;br /&gt;    friend bool operator!=(const test&amp;amp; rhs, const test&amp;amp; lhs) {&lt;br /&gt;        return !(rhs == lhs);&lt;br /&gt;    }&lt;br /&gt;    operator int() { return m_current;}&lt;br /&gt;    int operator* () { return m_current; }&lt;br /&gt;private:&lt;br /&gt;    int m_current;&lt;br /&gt;    int max;&lt;br /&gt;};&lt;br /&gt;// assumed header for coro::generator; adjust to your copy of Boost.Coroutine&lt;br /&gt;#include &amp;lt;boost/coroutine/generator.hpp&amp;gt;&lt;br /&gt;#include &amp;lt;boost/bind.hpp&amp;gt;&lt;br /&gt;namespace coro = boost::coroutines;&lt;br /&gt;&lt;br /&gt;typedef coro::generator&amp;lt;int&amp;gt; gen_type;&lt;br /&gt;/*&lt;br /&gt;Notice that the range_generator body is entered for the first time when the generator is constructed&lt;br /&gt;(from the main entry point), then at every iteration range_generator is reentered from&lt;br /&gt;yield(). In this case range_generator is reentered when generator::operator++ is invoked.&lt;br /&gt;*/&lt;br /&gt;int range_generator(gen_type::self&amp;amp; self, test min, test max) {&lt;br /&gt;    while ( min &amp;lt; max)&lt;br /&gt;        self.yield(min++);&lt;br /&gt;    self.exit(); // This is used instead of returning a final value&lt;br /&gt;}&lt;br /&gt;#endif&lt;/pre&gt;&lt;pre&gt;...&lt;/pre&gt;&lt;pre&gt;...&lt;/pre&gt;&lt;pre&gt;// in some .cpp file&lt;/pre&gt;&lt;pre&gt;#ifdef GEN_TEMPLATE_H&lt;br /&gt;    gen_type generator(boost::bind(range_generator, _1, test(100), test(200)) );&lt;br /&gt;    while( generator != gen_type() )&lt;br /&gt;        std::cout &amp;lt;&amp;lt; *generator++ &amp;lt;&amp;lt; "\n";&lt;br /&gt;#endif &lt;/pre&gt;Pretty basic stuff there, so it can be done.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Another concept: the producer/consumer pattern&lt;/b&gt;&lt;br /&gt;I love this concept because it's basically a messaging system, and I have Nat Goodspeed &amp;amp; Brad Kittenbrink to thank ;) thanks guys!&lt;br /&gt;&lt;br /&gt;Here's how it works (it's not perfect code but it serves to illustrate my example)&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;// In some .h file&lt;br /&gt;&lt;br /&gt;#ifndef MESSAGE_H&lt;br /&gt;#define MESSAGE_H&lt;br /&gt;#include &amp;lt;vector&amp;gt;&lt;br /&gt;#include &amp;lt;iostream&amp;gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;class Message {&lt;br /&gt;public:&lt;br /&gt;    Message(int msg) : m(msg) { std::cout &amp;lt;&amp;lt; "Created:" &amp;lt;&amp;lt; m &amp;lt;&amp;lt; std::endl; }&lt;br /&gt;    int operator *() { return m; }&lt;br /&gt;    friend std::ostream&amp;amp; operator&amp;lt;&amp;lt;(std::ostream&amp;amp; os, const Message&amp;amp; m);&lt;br /&gt;private:&lt;br /&gt;    int m;&lt;br /&gt;};&lt;br /&gt;&lt;br /&gt;// inline because it is defined in a header&lt;br /&gt;inline std::ostream&amp;amp; operator&amp;lt;&amp;lt;(std::ostream&amp;amp; os, const Message&amp;amp; m) {&lt;br /&gt;    os &amp;lt;&amp;lt; m.m &amp;lt;&amp;lt; std::endl;&lt;br /&gt;    return os;&lt;br /&gt;}&lt;br /&gt;&lt;br /&gt;template &amp;lt;typename M&amp;gt;&lt;br /&gt;class MessageQ {&lt;br /&gt;public:&lt;br /&gt;    MessageQ(int capacity = 10) : capacity(capacity) {&lt;br /&gt;        for( int i = 0; i &amp;lt; capacity; ++i) q.push_back(Message(i));&lt;br /&gt;    }&lt;br /&gt;    operator bool() { return q.size() &amp;gt; 0; }&lt;br /&gt;    // hands out the last message and removes it from the queue&lt;br /&gt;    Message operator *() { Message lastEle = q.back(); q.pop_back(); return lastEle; }&lt;br /&gt;    MessageQ&amp;amp; operator++() {&lt;br /&gt;        if ( !q.empty() ) q.pop_back();&lt;br /&gt;        return *this;&lt;br /&gt;    }&lt;br /&gt;private:&lt;br /&gt;    std::vector&amp;lt;M&amp;gt; q;&lt;br /&gt;    int capacity;&lt;br /&gt;};&lt;br /&gt;&lt;br /&gt;#endif&lt;br /&gt;&lt;br /&gt;#ifdef MESSAGE_H&lt;br /&gt;// assumed header for coro::generator; adjust to your copy of Boost.Coroutine&lt;br /&gt;#include &amp;lt;boost/coroutine/generator.hpp&amp;gt;&lt;br /&gt;#include &amp;lt;boost/bind.hpp&amp;gt;&lt;br /&gt;namespace coro = boost::coroutines;&lt;br /&gt;&lt;br /&gt;typedef coro::generator&amp;lt;Message&amp;gt; gen;&lt;br /&gt;&lt;br /&gt;const Message&amp;amp; producer(gen::self&amp;amp; self, MessageQ&amp;lt;Message&amp;gt; msgq) {&lt;br /&gt;    while (msgq) {&lt;br /&gt;        self.yield(*msgq);&lt;br /&gt;    }&lt;br /&gt;    self.exit();&lt;br /&gt;}&lt;br /&gt;&lt;br /&gt;template&amp;lt;typename Producer&amp;gt;&lt;br /&gt;void consumer(Producer prod) {&lt;br /&gt;    do {&lt;br /&gt;        std::cout &amp;lt;&amp;lt; *prod &amp;lt;&amp;lt; std::endl;&lt;br /&gt;    } while( ++prod );&lt;br /&gt;}&lt;br /&gt;&lt;br /&gt;#endif&lt;br /&gt;...&lt;br /&gt;...&lt;br /&gt;// and in some other .cpp file&lt;br /&gt;#ifdef MESSAGE_H&lt;br /&gt;    consumer(gen(boost::bind(producer, _1, MessageQ&amp;lt;Message&amp;gt;())));&lt;br /&gt;#endif&lt;br /&gt;&lt;/pre&gt;So what I've basically done is create a message queue of sorts and populate it with 10 simple messages; the producer, wrapped in a generator, yields them one at a time and the consumer iterates through them.&lt;br /&gt;</content><link rel='replies' type='application/atom+xml' href='http://raymondtay.blogspot.com/feeds/2474031696688235187/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5021319479788698734&amp;postID=2474031696688235187&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5021319479788698734/posts/default/2474031696688235187'/><link rel='self' type='application/atom+xml' 
href='http://www.blogger.com/feeds/5021319479788698734/posts/default/2474031696688235187'/><link rel='alternate' type='text/html' href='http://raymondtay.blogspot.com/2010/06/touring-c-coroutines.html' title='Touring C++ Coroutines'/><author><name>Raymond Tay</name><uri>https://profiles.google.com/110897033209560123249</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh4.googleusercontent.com/-4ZP-rPmLNSw/AAAAAAAAAAI/AAAAAAAABao/8f4AuUXNrno/s512-c/photo.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5021319479788698734.post-2401689576903278973</id><published>2010-05-15T10:41:00.002+08:00</published><updated>2010-05-15T10:41:38.446+08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Java'/><category scheme='http://www.blogger.com/atom/ns#' term='unit testing'/><category scheme='http://www.blogger.com/atom/ns#' term='junit'/><category scheme='http://www.blogger.com/atom/ns#' term='unit test'/><title type='text'>JUnit Cookbook</title><content type='html'>This is simply excellent writing :)&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;http://junit.sourceforge.net/doc/cookbook/cookbook.htm&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5021319479788698734-2401689576903278973?l=raymondtay.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://raymondtay.blogspot.com/feeds/2401689576903278973/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5021319479788698734&amp;postID=2401689576903278973&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5021319479788698734/posts/default/2401689576903278973'/><link rel='self' type='application/atom+xml' 
href='http://www.blogger.com/feeds/5021319479788698734/posts/default/2401689576903278973'/><link rel='alternate' type='text/html' href='http://raymondtay.blogspot.com/2010/05/junit-cookbook.html' title='JUnit Cookbook'/><author><name>Raymond Tay</name><uri>https://profiles.google.com/110897033209560123249</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh4.googleusercontent.com/-4ZP-rPmLNSw/AAAAAAAAAAI/AAAAAAAABao/8f4AuUXNrno/s512-c/photo.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5021319479788698734.post-487438156067258118</id><published>2010-05-10T23:23:00.000+08:00</published><updated>2010-05-10T23:23:23.513+08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='pointer aliasing'/><category scheme='http://www.blogger.com/atom/ns#' term='C/C++'/><title type='text'>Mudflap - Instrumentation of pointers</title><content type='html'>Heard of Mudflap? This GCC technology is exciting because it instruments the C/C++ pointer operations that are error-prone and likely to crash an application, and it's got potential. Read the &lt;a href="http://gcc.gnu.org/wiki/Mudflap_Pointer_Debugging"&gt;wiki&lt;/a&gt; page and download the original &lt;a href="http://gcc.fyxm.net/summit/2003/mudflap.pdf"&gt;paper&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;It is enabled by passing -fmudflap to the compiler. For front-ends that support it (C and very simple C++ programs), it instruments all risky pointer/array dereferencing operations, some standard library string/heap functions, and some other associated constructs with range/validity tests. Modules so instrumented should be immune to buffer overflows, invalid heap use, and some other classes of C/C++ programming errors. 
The instrumentation relies on a  separate runtime library (libmudflap), which will be linked into a  program if -fmudflap -lmudflap is given at link time. Run-time behavior  of the instrumented program is controlled by the MUDFLAP_OPTIONS  environment variable. &lt;br /&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5021319479788698734-487438156067258118?l=raymondtay.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://raymondtay.blogspot.com/feeds/487438156067258118/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5021319479788698734&amp;postID=487438156067258118&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5021319479788698734/posts/default/487438156067258118'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5021319479788698734/posts/default/487438156067258118'/><link rel='alternate' type='text/html' href='http://raymondtay.blogspot.com/2010/05/mudflap-instrumentation-of-pointers.html' title='Mudflap - Instrumentation of pointers'/><author><name>Raymond Tay</name><uri>https://profiles.google.com/110897033209560123249</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' 
src='//lh4.googleusercontent.com/-4ZP-rPmLNSw/AAAAAAAAAAI/AAAAAAAABao/8f4AuUXNrno/s512-c/photo.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5021319479788698734.post-7879610526550773870</id><published>2010-02-17T16:48:00.003+08:00</published><updated>2010-05-08T23:28:33.425+08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='cpp unit tests'/><category scheme='http://www.blogger.com/atom/ns#' term='unit testing'/><category scheme='http://www.blogger.com/atom/ns#' term='TUT'/><category scheme='http://www.blogger.com/atom/ns#' term='C/C++'/><category scheme='http://www.blogger.com/atom/ns#' term='unit test'/><title type='text'>Customizing TUT framework for TeamCity 5.x Reporting Tests</title><content type='html'>Recently, I had the pleasure of figuring out how to get C++ unit tests to "report" results to a distributed build system like TeamCity 5.x. Btw, I'm not accepting any sort of monies from JetBrains, nor was I invited to do this post; it's purely to share information.&lt;br /&gt;&lt;br /&gt;The first thing to understand is how the &lt;a href="http://tut-framework.sourceforge.net/"&gt;TUT framework&lt;/a&gt; (a lightweight C++ unit testing framework - I like it :) ) works and how TeamCity 5.x service messages work. There isn't a shortcut to understanding TUT's mechanism except to read the code, so download it and study it. 
For TeamCity 5.x service messages, refer to this article &lt;a href="http://confluence.jetbrains.net/display/TCD5/Build+Script+Interaction+with+TeamCity"&gt;here&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;Once you've done that, you'll see that TeamCity 5.x needs you to output messages of the form&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;##teamcity[testSuiteStarted name='suite.name']&lt;br /&gt;...&lt;br /&gt;##teamcity[testStarted name='XXX']&lt;br /&gt;...&lt;br /&gt;##teamcity[testIgnored name='ccc']&lt;br /&gt;...&lt;br /&gt;##teamcity[testFinished name='XXX']&lt;br /&gt;...&lt;br /&gt;##teamcity[testIgnored name='ddd']&lt;br /&gt;...&lt;br /&gt;##teamcity[testSuiteFinished name='suite.name']&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;during a build process. Note that &lt;i style="color: #6aa84f;"&gt;testStarted&lt;/i&gt; and &lt;i style="color: #6aa84f;"&gt;testFinished&lt;/i&gt; must come in matching pairs, and so must the &lt;i style="color: #6aa84f;"&gt;testSuiteStarted&lt;/i&gt; and &lt;i style="color: #6aa84f;"&gt;testSuiteFinished&lt;/i&gt; messages. &lt;br /&gt;&lt;br /&gt;The next thing to figure out is how and where to insert print statements in your test code. The short answer is to study the C++ class callback and all the classes that inherit from it. Currently, &lt;i style="color: blue;"&gt;tut_console_reporter.hpp&lt;/i&gt; and &lt;i style="color: blue;"&gt;tut_cppunit_reporter.hpp&lt;/i&gt; are the ones I know about.&lt;br /&gt;&lt;br /&gt;What you need to realize is that the TUT runner, as the framework calls it, invokes whatever class inherits from&lt;i&gt;&lt;span style="color: blue;"&gt; tut::callback&lt;/span&gt;&lt;/i&gt; (e.g. &lt;i style="color: blue;"&gt;class console_reporter&lt;/i&gt;), so you need to override the virtual C++ member functions of &lt;i style="color: blue;"&gt;tut::callback&lt;/i&gt; and emit the appropriate TeamCity 5.x service messages from them. 
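&lt;br /&gt;&lt;br /&gt;To make the message flow concrete, here is a minimal sketch in Python (it is not the TUT reporter, only an illustration of the output a reporter has to produce). It assumes TeamCity's documented layout ##teamcity[messageName name='value'] and its vertical-bar escaping; the helper names escape, service_message and report_suite are made up for this example.&lt;br /&gt;
&lt;pre&gt;
```python
def escape(value):
    """Escape a value for use inside a TeamCity service message."""
    # The vertical bar is the escape character, so it has to be handled first.
    for raw, escaped in (("|", "||"), ("'", "|'"), ("\n", "|n"),
                         ("\r", "|r"), ("[", "|["), ("]", "|]")):
        value = value.replace(raw, escaped)
    return value


def service_message(kind, **attrs):
    """Build one service message line, e.g. ##teamcity[testStarted name='t']."""
    parts = " ".join("%s='%s'" % (key, escape(str(val)))
                     for key, val in sorted(attrs.items()))
    return "##teamcity[%s %s]" % (kind, parts)


def report_suite(suite, results):
    """Return the message lines for one suite.

    results is a list of (test_name, failure_text_or_None) tuples.
    Every testStarted is closed by a matching testFinished, and the
    whole run is wrapped in testSuiteStarted / testSuiteFinished.
    """
    lines = [service_message("testSuiteStarted", name=suite)]
    for test, failure in results:
        lines.append(service_message("testStarted", name=test))
        if failure is not None:
            lines.append(service_message("testFailed", name=test, message=failure))
        lines.append(service_message("testFinished", name=test))
    lines.append(service_message("testSuiteFinished", name=suite))
    return lines


if __name__ == "__main__":
    # Hypothetical test names, purely for illustration.
    for line in report_suite("suite.name", [("test_ok", None),
                                            ("test_bad", "expected 1, got 2")]):
        print(line)
```
&lt;/pre&gt;
In a TUT reporter, the same lines would be printed from the overridden tut::callback member functions rather than built up in a list.&lt;br /&gt;&lt;br /&gt;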
I won't place my reporter code here because it's pretty easy :) and there is more than one way to do it; it depends on how your unit tests are crafted.&lt;br /&gt;&lt;br /&gt;Once you get it to work, you should see statistics on your TeamCity Projects dashboard, and you can drill down into how long each test took, its details, etc.&lt;br /&gt;&lt;br /&gt;Quick update on 8 May 2010: I've since released these modifications on &lt;a href="https://sourceforge.net/projects/tutforteamcity/"&gt;SourceForge&lt;/a&gt;, and here's a &lt;a href="http://confluence.jetbrains.net/display/TW/TeamCity+Plugins"&gt;URL&lt;/a&gt; to JetBrains' list of available plugins.&lt;br /&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5021319479788698734-7879610526550773870?l=raymondtay.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://raymondtay.blogspot.com/feeds/7879610526550773870/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5021319479788698734&amp;postID=7879610526550773870&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5021319479788698734/posts/default/7879610526550773870'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5021319479788698734/posts/default/7879610526550773870'/><link rel='alternate' type='text/html' 
href='http://raymondtay.blogspot.com/2010/02/customizing-tut-framework-for-teamcity.html' title='Customizing TUT framework for TeamCity 5.x Reporting Tests'/><author><name>Raymond Tay</name><uri>https://profiles.google.com/110897033209560123249</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh4.googleusercontent.com/-4ZP-rPmLNSw/AAAAAAAAAAI/AAAAAAAABao/8f4AuUXNrno/s512-c/photo.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5021319479788698734.post-800053563555793601</id><published>2009-12-23T21:47:00.003+08:00</published><updated>2009-12-31T10:24:08.693+08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='CUDA'/><category scheme='http://www.blogger.com/atom/ns#' term='PyCUDA'/><category scheme='http://www.blogger.com/atom/ns#' term='Python'/><category scheme='http://www.blogger.com/atom/ns#' term='CUDA 2.3'/><title type='text'>PyCUDA on Mac OS X</title><content type='html'>&lt;a href="http://mathema.tician.de/software/pycuda"&gt;PyCUDA&lt;/a&gt; came to my attention some time back, and I've finally had a chance to install it and start playing with it. The package is essentially a Python binding to CUDA (Compute Unified Device Architecture) by Nvidia (download it &lt;a href="http://pypi.python.org/pypi/pycuda"&gt;here&lt;/a&gt;), and it provides an easier way to program your CUDA graphics card via Python.&lt;br /&gt;&lt;br /&gt;Btw, I installed PyCUDA 0.93, Boost 1.41.0 and CUDA 2.3 on Mac OS X 10.5.8.&lt;br /&gt;&lt;br /&gt;In this post I'd like to share some of the problems I faced, but before I do that, it's important that you read through the following wiki pages:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;&lt;a href="http://wiki.tiker.net/PyCuda/Installation/Mac"&gt;PyCUDA installation on Mac OS X&lt;/a&gt;&lt;/li&gt;&lt;li&gt;&lt;a href="http://wiki.tiker.net/BoostInstallationHowto"&gt;Boost installation on Mac OS 
X&lt;/a&gt;&lt;/li&gt;&lt;/ul&gt;Try the PyCUDA installation described on those pages; if you run into a problem, I might have a solution for you. I'll run through a couple of Q&amp;amp;As here and hope they help.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Q:&lt;/b&gt; After I installed PyCUDA as instructed, I ran test_driver.py and got the error message &lt;i style="color: blue;"&gt;Fatal Python error: Interpreter not initialized (version mismatch?)&lt;/i&gt;. Why is this happening?&lt;br /&gt;&lt;b&gt;A:&lt;/b&gt; The Python version used is inconsistent. You need to rebuild Boost and PyCUDA using a consistent version of Python, and you need to check the following files:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;&lt;i&gt;&lt;span style="color: blue;"&gt;siteconf.py&lt;/span&gt;&lt;/i&gt; (&amp;lt;pycuda installation directory&amp;gt;; this file is produced after launching configure.py)&lt;/li&gt;&lt;li&gt;&lt;i style="color: blue;"&gt;project-config.jam&lt;/i&gt; (&amp;lt;boost_1_41_0 directory&amp;gt;; this file is produced after launching bootstrap.sh)&lt;/li&gt;&lt;/ul&gt;I changed the following lines in project-config.jam to look like this:&lt;br /&gt;&lt;blockquote style="color: #38761d;"&gt;&lt;i&gt;# Python configuration&lt;br /&gt;# using python : 2.5 : /Library/Frameworks/Python.framework/Versions/2.5 ;&lt;br /&gt;using python : 2.5 : /System/Library/Frameworks/Python.framework/Versions/2.5/ ;&lt;br /&gt;&lt;/i&gt;&lt;br /&gt;&lt;/blockquote&gt;Notice that I changed the Python interpreter to another location. This same location must be used to build PyCUDA subsequently (i.e. 
&lt;i style="color: blue;"&gt;/System/Library/Frameworks/Python.framework/Versions/2.5/bin/python ./setup.py build&lt;/i&gt; etc)&lt;br /&gt;&lt;br /&gt;Should be alright now...&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Q:&lt;/b&gt; When I ran test_driver.py, I got an error message like this:&lt;br /&gt;&lt;blockquote&gt;&lt;i style="color: blue;"&gt;Traceback (most recent call last):&lt;/i&gt;&lt;br /&gt;&lt;i style="color: blue;"&gt;&amp;nbsp; File "test_driver.py", line 15, in &amp;lt;module&amp;gt;&lt;/i&gt;&lt;br /&gt;&lt;i style="color: blue;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; gen = pycuda.autoinit.device.get_attributes()&lt;/i&gt;&lt;br /&gt;&lt;i style="color: blue;"&gt;&amp;nbsp; File "/Library/Python/2.5/site-packages/pycuda-0.93-py2.5-macosx-10.5-i386.egg/pycuda/driver.py", line 50, in device_get_attributes&lt;/i&gt;&lt;br /&gt;&lt;i style="color: blue;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; print getattr(device_attribute,i), dev.get_attribute(getattr(device_attribute, i))&lt;/i&gt;&lt;br /&gt;&lt;i style="color: blue;"&gt;pycuda._driver.LogicError: cuDeviceGetAttribute failed: not found&lt;/i&gt;&lt;br /&gt;&lt;/blockquote&gt;Why is this happening?&lt;br /&gt;&lt;b&gt;A:&lt;/b&gt; Frankly, I don't know at the moment. The other tests in the package that I've run are fine, but something odd is happening here (I'm sure it's got something to do with my lack of understanding of how libraries are loaded/unloaded).&lt;br /&gt;&lt;br /&gt;What I know at this point is that when I reworked the package a little to give me debug output, I found an interesting scenario, which you can see below:&lt;br /&gt;&lt;blockquote&gt;&lt;div style="color: blue;"&gt;&lt;i&gt;cuDeviceGetAttribute&lt;br /&gt;CAN_MAP_HOST_MEMORY 1&lt;br /&gt;cuDeviceGetAttribute&lt;br /&gt;CLOCK_RATE&lt;br /&gt;Traceback (most recent call last):&lt;br /&gt;&amp;nbsp; File "test_driver.py", line 15, in &amp;lt;module&amp;gt;&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; gen = 
pycuda.autoinit.device.get_attributes()&lt;br /&gt;&amp;nbsp; File "/Library/Python/2.5/site-packages/pycuda-0.93-py2.5-macosx-10.5-i386.egg/pycuda/driver.py", line 50, in device_get_attributes&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; print getattr(device_attribute,i), dev.get_attribute(getattr(device_attribute, i))&lt;br /&gt;pycuda._driver.LogicError: cuDeviceGetAttribute failed: not found&lt;br /&gt;&lt;/i&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="color: blue;"&gt;&lt;i&gt;&lt;br /&gt;&lt;/i&gt;&lt;br /&gt;&lt;/div&gt;&lt;/blockquote&gt;You will see output that resembles this:&lt;br /&gt;&lt;b style="color: blue;"&gt;&lt;i&gt;cuDeviceGetAttribute&lt;/i&gt;&lt;/b&gt; &amp;lt;--- this was enabled by setting &lt;i style="color: purple;"&gt;CUDA_TRACE=True&lt;/i&gt; (False is the default)&lt;br /&gt;&lt;i&gt;&lt;b style="color: blue;"&gt;CAN_MAP_HOST_MEMORY&lt;/b&gt;, 1&lt;/i&gt; &amp;lt;--- the CUDA device attribute as mapped into PyCUDA; the number 1 is the result of calling the actual CUDA API&lt;br /&gt;From some simple deduction based on the stack trace, you can see that for some reason the program reported that it could not locate the API. I will investigate further when I have the bandwidth again, but meanwhile feel free to email me if you have an idea why it's failing.&lt;br /&gt;&lt;br /&gt;---- &lt;span style="color: purple;"&gt;update on 25 dec 2009&lt;/span&gt; -----&lt;br /&gt;A small, quick update: after removing the attribute &lt;i style="color: blue;"&gt;CLOCK_RATE&lt;/i&gt; in &lt;i style="color: blue;"&gt;driver.py&lt;/i&gt;, it ran fine and the verification passed. 
&lt;br /&gt;---- &lt;span style="color: purple;"&gt;end of update&lt;/span&gt; ------&lt;br /&gt;&lt;br /&gt;---- &lt;span style="color: purple;"&gt;update on 30 dec 2009&lt;/span&gt; ----&lt;br /&gt;Finally, I've had spare time to look at this, and it turns out to be a problem with the GeForce graphics cards present on my machine (there are two: &lt;i style="color: blue;"&gt;9400m&lt;/i&gt; and &lt;i style="color: blue;"&gt;9600m GT&lt;/i&gt;).&lt;br /&gt;In a nutshell, I had been using the low-powered one, i.e. &lt;i style="color: blue;"&gt;9400m&lt;/i&gt;, and when I switched to the &lt;i style="color: blue;"&gt;9600m GT&lt;/i&gt; (the high-performance chip) the problem was resolved.&lt;br /&gt;This is odd because this problem was not present in CUDA 2.0 - Apple / Nvidia bug? Hrmm...&lt;br /&gt;---- &lt;span style="color: purple;"&gt;end of update&lt;/span&gt; ----&lt;br /&gt;&lt;br /&gt;But other than that, so far so good :) Have fun!&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5021319479788698734-800053563555793601?l=raymondtay.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://raymondtay.blogspot.com/feeds/800053563555793601/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5021319479788698734&amp;postID=800053563555793601&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5021319479788698734/posts/default/800053563555793601'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5021319479788698734/posts/default/800053563555793601'/><link rel='alternate' type='text/html' href='http://raymondtay.blogspot.com/2009/12/pycuda-on-mac-os-x.html' title='PyCUDA on Mac OS X'/><author><name>Raymond Tay</name><uri>https://profiles.google.com/110897033209560123249</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh4.googleusercontent.com/-4ZP-rPmLNSw/AAAAAAAAAAI/AAAAAAAABao/8f4AuUXNrno/s512-c/photo.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5021319479788698734.post-1278882846675818947</id><published>2009-12-17T17:55:00.002+08:00</published><updated>2009-12-17T20:06:49.216+08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='ctypes'/><category scheme='http://www.blogger.com/atom/ns#' term='PythonWin'/><category scheme='http://www.blogger.com/atom/ns#' 
term='Python'/><title type='text'>PythonWin and Python Ctypes</title><content type='html'>I ran into a situation when using &lt;a href="https://sourceforge.net/projects/pywin32/"&gt;PythonWin &lt;/a&gt;by Mark Hammond - btw it's a really nice IDE with syntax highlighting and debugging capabilities :) Kudos! What happened is this:&lt;br /&gt;When I ran this simple piece of code in PythonWin and in a normal Python session from the Windows command prompt, I got two different results.&lt;br /&gt;&lt;pre&gt;&amp;gt;&amp;gt;&amp;gt; from ctypes import *&lt;br /&gt;&amp;gt;&amp;gt;&amp;gt; cdll.msvcrt.printf(c_char_p('hello world'))&lt;br /&gt;&lt;/pre&gt;In &lt;i&gt;PythonWin&lt;/i&gt; I only saw the string '&lt;i&gt;11&lt;/i&gt;' returned, while in the other case I saw '&lt;i&gt;hello world11&lt;/i&gt;', which is what I expected. For some reason unknown to me at this point, PythonWin captures the return value of the call to printf() but hides the I/O performed by printf(). I think it is a bug and I've &lt;a href="https://sourceforge.net/tracker/?func=detail&amp;amp;aid=2916140&amp;amp;group_id=78018&amp;amp;atid=551954"&gt;filed it&lt;/a&gt; &amp;lt;- in case you're interested - but in a nutshell it comes down to the group of I/O objects (&lt;i&gt;sys.stdout&lt;/i&gt; etc.) being overridden.&lt;br /&gt;&lt;br /&gt;Of course I'm not going to dismiss PythonWin (I think it's a great piece of work, thanks Mark), but I thought you might like to know in case you bump into this.&lt;br /&gt;&lt;br /&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5021319479788698734-1278882846675818947?l=raymondtay.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://raymondtay.blogspot.com/feeds/1278882846675818947/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5021319479788698734&amp;postID=1278882846675818947&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5021319479788698734/posts/default/1278882846675818947'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5021319479788698734/posts/default/1278882846675818947'/><link rel='alternate' type='text/html' href='http://raymondtay.blogspot.com/2009/12/pythonwin-and-python-ctypes.html' title='PythonWin and Python Ctypes'/><author><name>Raymond Tay</name><uri>https://profiles.google.com/110897033209560123249</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh4.googleusercontent.com/-4ZP-rPmLNSw/AAAAAAAAAAI/AAAAAAAABao/8f4AuUXNrno/s512-c/photo.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5021319479788698734.post-7638298190681779296</id><published>2009-11-02T16:17:00.005+08:00</published><updated>2010-01-15T21:16:44.936+08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='load testing'/><category scheme='http://www.blogger.com/atom/ns#' term='Tsung'/><category 
scheme='http://www.blogger.com/atom/ns#' term='performance testing'/><category scheme='http://www.blogger.com/atom/ns#' term='Erlang'/><title type='text'>Tsung - Load Testing Tool in Erlang</title><content type='html'>You must have heard of Tsung - a load testing tool. I've had the opportunity to learn and use it in my current job, and it's really versatile for conducting load tests of web-based applications. It has many features that I've come to appreciate after spending a couple of years with Mercury Interactive (it's defunct now after being bought over by HP in 2006). Admittedly Mercury's solutions were more versatile than what Tsung can offer, but considering that the web is the common platform on which many applications are hosted, I think it's a safe bet :)&lt;br /&gt;&lt;br /&gt;So, IMO:&lt;br /&gt;&lt;ol&gt;&lt;li&gt;Tsung is good for building load testing scenarios and executing them in a local or distributed manner.&lt;/li&gt;&lt;li&gt;Load testing can be relatively lightweight (Erlang processes are what hit the target app), so the cost of hardware is likely to drop: more concurrent users can be simulated on one machine with Tsung, so there is less need for relatively heavyweight machines. (I'll see whether this assumption is correct.)&lt;/li&gt;&lt;/ol&gt;Good starting points (URLs of interest):&lt;br /&gt;&lt;ol&gt;&lt;li&gt;&lt;a href="http://tsung.erlang-projects.org/"&gt;http://tsung.erlang-projects.org/&lt;/a&gt;&lt;/li&gt;&lt;li&gt;&lt;a href="http://www.erlang.org/"&gt;http://www.erlang.org&lt;/a&gt;&lt;/li&gt;&lt;/ol&gt;A load testing tool would be useless if it didn't know how to capture server response data (e.g. checking whether an expected string is returned), reuse it later (e.g. 
session ids returned from servers, which you can use subsequently in an HTTP POST/GET in a URL-rewrite type of string), generate dynamic data, and read data from an external file (commonly used for storing &amp;lt;&lt;i&gt;username&lt;/i&gt;, &lt;i&gt;password&lt;/i&gt;&amp;gt; pairs).&lt;br /&gt;My example illustrates logging in to a website (e.g. &lt;a href="http://projecteuler.net/"&gt;http://projecteuler.net&lt;/a&gt;) and logging out from it - simple enough to illustrate my point.&lt;br /&gt;Next, I recorded the series of events that mimic my user logging in to and out of the website; this is captured in tsung_recorder_timestamp.xml.&lt;br /&gt;I then took that XML file and added some other stuff so that it looks like what I have for you below (this is a basic load test scenario):&lt;br /&gt;&lt;pre&gt;&amp;lt;?xml version="1.0"?&amp;gt;&lt;br /&gt;&amp;lt;!DOCTYPE tsung SYSTEM "/usr/local/share/tsung/tsung-1.0.dtd"&amp;gt;&lt;br /&gt;&lt;br /&gt;&amp;lt;tsung loglevel="info" dumptraffic="false" version="1.0"&amp;gt;&lt;br /&gt;&lt;br /&gt;&amp;lt;clients&amp;gt;&lt;br /&gt;&amp;lt;client host="localhost" use_controller_vm="true"/&amp;gt;&lt;br /&gt;&amp;lt;/clients&amp;gt;&lt;br /&gt;&amp;lt;servers&amp;gt;&lt;br /&gt;&amp;lt;server host="78.110.165.8" port="80" type="tcp"&amp;gt;&amp;lt;/server&amp;gt;&lt;br /&gt;&amp;lt;/servers&amp;gt;&lt;br /&gt;&lt;br /&gt;&amp;lt;load&amp;gt;&lt;br /&gt;&amp;lt;arrivalphase phase="1" duration="2" unit="minute"&amp;gt;&lt;br /&gt;&amp;lt;users interarrival="1" unit="minute"&amp;gt;&amp;lt;/users&amp;gt;&lt;br /&gt;&amp;lt;/arrivalphase&amp;gt;&lt;br /&gt;&amp;lt;user session="rec20091102-01:30" start_time="0" unit="second"&amp;gt;&amp;lt;/user&amp;gt;&lt;br /&gt;&amp;lt;/load&amp;gt;&lt;br /&gt;&lt;br /&gt;&amp;lt;options&amp;gt;&lt;br /&gt;&amp;lt;option name="file_server" value="/tmp/userlist.csv"&amp;gt;&amp;lt;/option&amp;gt;&lt;br 
/&gt;&amp;lt;/options&amp;gt;&lt;br /&gt;&lt;br /&gt;&amp;lt;sessions&amp;gt;&lt;br /&gt;&amp;lt;session name='rec20091102-01:30' probability='100'  type='ts_http'&amp;gt;&lt;br /&gt;&amp;lt;request&amp;gt;&amp;lt;http url='http://projecteuler.net/' version='1.1' method='GET'&amp;gt;&amp;lt;/http&amp;gt;&amp;lt;/request&amp;gt;&lt;br /&gt;&amp;lt;request&amp;gt;&amp;lt;http url='/style_main.css' version='1.1' if_modified_since='Sat, 29 Nov 2008 20:04:00 GMT' method='GET'&amp;gt;&amp;lt;/http&amp;gt;&amp;lt;/request&amp;gt;&lt;br /&gt;&amp;lt;request&amp;gt;&amp;lt;http url='/images/logo.jpg' version='1.1' if_modified_since='Thu, 28 Dec 2006 14:12:43 GMT' method='GET'&amp;gt;&amp;lt;/http&amp;gt;&amp;lt;/request&amp;gt;&lt;br /&gt;&amp;lt;request&amp;gt;&amp;lt;http url='/images/icon_register.png' version='1.1' if_modified_since='Fri, 31 Dec 2004 12:48:16 GMT' method='GET'&amp;gt;&amp;lt;/http&amp;gt;&amp;lt;/request&amp;gt;&lt;br /&gt;&amp;lt;request&amp;gt;&amp;lt;http url='/images/icon_about.png' version='1.1' if_modified_since='Fri, 31 Dec 2004 12:48:28 GMT' method='GET'&amp;gt;&amp;lt;/http&amp;gt;&amp;lt;/request&amp;gt;&lt;br /&gt;&amp;lt;request&amp;gt;&amp;lt;http url='/images/icon_problems.png' version='1.1' if_modified_since='Fri, 31 Dec 2004 12:48:26 GMT' method='GET'&amp;gt;&amp;lt;/http&amp;gt;&amp;lt;/request&amp;gt;&lt;br /&gt;&amp;lt;request&amp;gt;&amp;lt;http url='/images/icon_login.png' version='1.1' if_modified_since='Fri, 31 Dec 2004 12:48:32 GMT' method='GET'&amp;gt;&amp;lt;/http&amp;gt;&amp;lt;/request&amp;gt;&lt;br /&gt;&amp;lt;request&amp;gt;&amp;lt;http url='http://projecteuler.net/images/corner_tr.gif' version='1.1' if_modified_since='Thu, 10 Apr 2008 19:35:02 GMT' method='GET'&amp;gt;&amp;lt;/http&amp;gt;&amp;lt;/request&amp;gt;&lt;br /&gt;&amp;lt;request&amp;gt;&amp;lt;http url='/images/corner_tl.gif' version='1.1' if_modified_since='Thu, 10 Apr 2008 19:34:41 GMT' method='GET'&amp;gt;&amp;lt;/http&amp;gt;&amp;lt;/request&amp;gt;&lt;br 
/&gt;&amp;lt;request&amp;gt;&amp;lt;http url='/images/corner_br.gif' version='1.1' if_modified_since='Thu, 10 Apr 2008 19:35:10 GMT' method='GET'&amp;gt;&amp;lt;/http&amp;gt;&amp;lt;/request&amp;gt;&lt;br /&gt;&amp;lt;request&amp;gt;&amp;lt;http url='/images/corner_bl.gif' version='1.1' if_modified_since='Thu, 10 Apr 2008 19:34:55 GMT' method='GET'&amp;gt;&amp;lt;/http&amp;gt;&amp;lt;/request&amp;gt;&lt;br /&gt;&amp;lt;request&amp;gt;&amp;lt;http url='/images/euler_main.jpg' version='1.1' if_modified_since='Mon, 21 Jan 2002 19:18:20 GMT' method='GET'&amp;gt;&amp;lt;/http&amp;gt;&amp;lt;/request&amp;gt;&lt;br /&gt;&lt;br /&gt;&amp;lt;thinktime random='true' value='2'/&amp;gt;&lt;br /&gt;&lt;br /&gt;&amp;lt;request&amp;gt;&amp;lt;http url='http://projecteuler.net/index.php?section=login' version='1.1' method='GET'&amp;gt;&amp;lt;/http&amp;gt;&amp;lt;/request&amp;gt;&lt;br /&gt;&lt;br /&gt;&amp;lt;thinktime random='true' value='6'/&amp;gt;&lt;br /&gt;&lt;br /&gt;&amp;lt;request subst="true"&amp;gt;&lt;br /&gt;&amp;lt;match do="continue" when="match"&amp;gt;Logged in as %%readcsv:getUsername%%&amp;lt;/match&amp;gt;&lt;br /&gt;&amp;lt;http url='/index.php' version='1.1'  contents='%%readcsv:getUserString%%' content_type='application/x-www-form-urlencoded' method='POST'&amp;gt;&amp;lt;/http&amp;gt;&lt;br /&gt;&amp;lt;/request&amp;gt;&lt;br /&gt;&lt;br /&gt;&amp;lt;thinktime random='true' value='4'/&amp;gt;&lt;br /&gt;&lt;br /&gt;&amp;lt;request&amp;gt;&amp;lt;http url='/images/icon_tick.png' version='1.1' method='GET'&amp;gt;&amp;lt;/http&amp;gt;&amp;lt;/request&amp;gt;&lt;br /&gt;&lt;br /&gt;&amp;lt;thinktime random='true' value='4'/&amp;gt;&lt;br /&gt;&lt;br /&gt;&amp;lt;thinktime random='true' value='3'/&amp;gt;&lt;br /&gt;&lt;br /&gt;&amp;lt;request&amp;gt;&amp;lt;http url='http://projecteuler.net/index.php?section=logout' version='1.1' method='GET'&amp;gt;&amp;lt;/http&amp;gt;&amp;lt;/request&amp;gt;&lt;br /&gt;&amp;lt;/session&amp;gt;&lt;br 
/&gt;&amp;lt;/sessions&amp;gt;&lt;br /&gt;&lt;br /&gt;&amp;lt;/tsung&amp;gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;The main thing to note is the use of dynamic substitution (e.g. %%readcsv:getUsername%%): I wrote a simple Erlang module that reads a username and password from a file (see the &lt;i&gt;option&lt;/i&gt; XML tag above), so that each simulated user logs in with a valid user id and password. &lt;br /&gt;Next, I checked that the server response contains the string &lt;i&gt;Logged in as XXX&lt;/i&gt;, where XXX is generated dynamically by the function (check out the Erlang code for the function, simple stuff).&lt;br /&gt;The Erlang program is shown below. &lt;br /&gt;&lt;div class="code panel" style="border-width: 1px;"&gt;&lt;div class="codeContent panelContent"&gt;&lt;pre class="code-none"&gt;-module(readcsv).&lt;br /&gt;-export([getUserString/1, getUsername/1]).&lt;br /&gt;&lt;br /&gt;% Each call reads the next line of the file named by the file_server option.&lt;br /&gt;% Build the POST body for the login form from one "username,password" line.&lt;br /&gt;getUserString({_Pid, _DynVar}) -&amp;gt;&lt;br /&gt; {ok, Line} = ts_file_server:get_next_line(),&lt;br /&gt; [Uid,Pwd] = string:tokens(Line, ","),&lt;br /&gt; "username=" ++ Uid ++ "&amp;amp;password=" ++ Pwd ++ "&amp;amp;login=Login".&lt;br /&gt;&lt;br /&gt;% Return only the username, used in the "Logged in as" match.&lt;br /&gt;getUsername({_Pid, _DynVar}) -&amp;gt;&lt;br /&gt; {ok, Line} = ts_file_server:get_next_line(),&lt;br /&gt; [Uid,_] = string:tokens(Line, ","),&lt;br /&gt; Uid.&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;The above Erlang code must be compiled with &lt;i&gt;erlc&lt;/i&gt;, and the resulting &lt;i&gt;.beam&lt;/i&gt; file placed in Tsung's &lt;i&gt;ebin&lt;/i&gt; directory with a command like this &lt;br /&gt;&lt;pre&gt;sudo mv readcsv.beam /usr/local/lib/erlang/lib/tsung-1.3.1/ebin/&lt;/pre&gt;&lt;br /&gt;Now, running the load test should be alright.&lt;br /&gt;&lt;b&gt;Note:&lt;/b&gt; In this load test, I defined a duration of 2 minutes with 2 users, since load testing someone else's site with 800 gazillion users is considered a chargeable offense, so DON'T DO IT. 
&lt;br /&gt;&lt;br /&gt;Have fun!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5021319479788698734-7638298190681779296?l=raymondtay.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://raymondtay.blogspot.com/feeds/7638298190681779296/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5021319479788698734&amp;postID=7638298190681779296&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5021319479788698734/posts/default/7638298190681779296'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5021319479788698734/posts/default/7638298190681779296'/><link rel='alternate' type='text/html' href='http://raymondtay.blogspot.com/2009/11/tsung-load-testing-tool-in-erlang.html' title='Tsung - Load Testing Tool in Erlang'/><author><name>Raymond Tay</name><uri>https://profiles.google.com/110897033209560123249</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' 
src='//lh4.googleusercontent.com/-4ZP-rPmLNSw/AAAAAAAAAAI/AAAAAAAABao/8f4AuUXNrno/s512-c/photo.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5021319479788698734.post-5736419851704191129</id><published>2009-10-24T10:21:00.008+08:00</published><updated>2009-12-17T17:56:42.434+08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Scala'/><category scheme='http://www.blogger.com/atom/ns#' term='problem solving'/><category scheme='http://www.blogger.com/atom/ns#' term='math'/><title type='text'>Simple problem solving using Scala</title><content type='html'>If you like solving mathematical problems, I would suggest that you sign up at &lt;a href="http://projecteuler.net/"&gt;Project Euler&lt;/a&gt; (thanks to my good friend Chi Hung for getting me hooked!) and use your favourite programming language or languages to solve them (there is more than one way to accomplish a goal).&lt;br /&gt;&lt;br /&gt;So here's my take on it using Scala and, just to provide an example, I'm using problem 4:&lt;br /&gt;&lt;pre&gt;scala&gt; var largest = 1&lt;br /&gt;largest: Int = 1&lt;br /&gt;&lt;br /&gt;scala&gt; for(i &lt;- 1 until 1000) for( j &lt;- i until 1000) {&lt;br /&gt;|  val v = i*j                                            &lt;br /&gt;|  if ((v.toString.reverse.mkString) == (v.toString)) {     &lt;br /&gt;|   if (v &gt; largest) largest = v               &lt;br /&gt;|  }&lt;br /&gt;| }&lt;br /&gt;&lt;br /&gt;scala&gt; largest&lt;br /&gt;res5: Int = 906609&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;This is not efficient: I'm looking for the largest palindrome made from the product of two 3-digit numbers, so I should search in reverse (from the largest factors down) and possibly limit the ranges to perhaps half.  
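&lt;br /&gt;&lt;br /&gt;A minimal sketch of that narrower search, assuming Scala 2.8 or later (where reverse on a String gives back a String and collections have max); the Int-only isPalindrome and the name products are mine, just for illustration:&lt;br /&gt;&lt;pre&gt;def isPalindrome(n: Int): Boolean = {&lt;br /&gt;  val s = n.toString&lt;br /&gt;  s == s.reverse&lt;br /&gt;}&lt;br /&gt;&lt;br /&gt;// 3-digit factors only; j starts at i so each pair is tried once&lt;br /&gt;val products = (100 to 999).flatMap(i =&gt; (i to 999).map(j =&gt; i * j))&lt;br /&gt;val largest = products.filter(isPalindrome).max&lt;br /&gt;println(largest) // 906609, same as the loop above&lt;br /&gt;&lt;/pre&gt;This builds roughly 400,000 products in memory, which is fine at this size; walking the factors from 999 downwards and stopping early would avoid even that.&lt;br /&gt;&lt;br /&gt;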
Anyway, I'd like to show you how this simple function can be refactored into a more Scala-like style.&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;scala&gt; var largest = 1&lt;br /&gt;largest:Int = 1&lt;br /&gt;&lt;br /&gt;scala&gt; def isPalindrome(s:Any):(Boolean,Any) = {&lt;br /&gt;|  if (s.toString.reverse.mkString == s.toString) (true,s)&lt;br /&gt;|  else (false,s)&lt;br /&gt;| }&lt;br /&gt;isPalindrome: (Any)(Boolean, Any)&lt;br /&gt;scala&gt; (1 to 100).map(i =&gt; (1 to 100).map(j =&gt; if ( isPalindrome(i*j)._1 &amp;amp;&amp;amp; (i*j) &gt; largest) largest = (i*j)))&lt;br /&gt;scala&gt; largest&lt;br /&gt;res12: Int = 9009&lt;br /&gt;scala&gt; (1 to 1000).map(i =&gt; (1 to 1000).map(j =&gt; if ( isPalindrome(i*j)._1 &amp;amp;&amp;amp; (i&lt;br /&gt;*j) &gt; largest) largest = (i*j)))&lt;br /&gt;res13: RandomAccessSeq.Projection[RandomAccessSeq.Projection[Unit]] = RangeM(Ran&lt;br /&gt;geM((), (), (), (), (), (), (), (), (), (), (), (), (), (), (), (), (), (), (),&lt;br /&gt;(), (), (), (), (), (), (), (), (), (), (), (), (), (), (), (), (), (), (), (),&lt;br /&gt;(), (), (), (), (), (), (), (), (), (), (), (), (), (), (), (), (), (), (), (),&lt;br /&gt;(), (), (), (), (), (), (), (), (), (), (), (), (), (), (), (), (),...&lt;br /&gt;scala&gt; largest&lt;br /&gt;res14: Int = 906609&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Ignore the value of res13: map is used here only for its side effect on largest, so the result is just a nested sequence of Unit values. This version ran pretty fast for me too, and I like it much better than the former. It can be made faster, but for now I just want to show how to solve the problem.&lt;br /&gt;&lt;br /&gt;Feedback is appreciated!&lt;br /&gt;&lt;br /&gt;&lt;script type="text/javascript"&gt;var gaJsHost = (("https:" == document.location.protocol) ? "https://ssl." 
: "http://www.");document.write(unescape("%3Cscript src='" + gaJsHost + "google-analytics.com/ga.js' type='text/javascript'%3E%3C/script%3E"));&lt;/script&gt;&lt;br /&gt;&lt;script type="text/javascript"&gt;try {var pageTracker = _gat._getTracker("UA-5323400-1");pageTracker._trackPageview();} catch(err) {}&lt;/script&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5021319479788698734-5736419851704191129?l=raymondtay.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://raymondtay.blogspot.com/feeds/5736419851704191129/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5021319479788698734&amp;postID=5736419851704191129&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5021319479788698734/posts/default/5736419851704191129'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5021319479788698734/posts/default/5736419851704191129'/><link rel='alternate' type='text/html' href='http://raymondtay.blogspot.com/2009/10/simple-problem-solving-using-scala.html' title='Simple problem solving using Scala'/><author><name>Raymond Tay</name><uri>https://profiles.google.com/110897033209560123249</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh4.googleusercontent.com/-4ZP-rPmLNSw/AAAAAAAAAAI/AAAAAAAABao/8f4AuUXNrno/s512-c/photo.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5021319479788698734.post-8846398048105134879</id><published>2009-09-17T12:54:00.013+08:00</published><updated>2009-12-17T17:56:26.186+08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Scala'/><category scheme='http://www.blogger.com/atom/ns#' term='multicore'/><category 
scheme='http://www.blogger.com/atom/ns#' term='Java'/><category scheme='http://www.blogger.com/atom/ns#' term='OOP'/><category scheme='http://www.blogger.com/atom/ns#' term='FP'/><title type='text'>Scala - I like it!</title><content type='html'>&lt;a href="http://4.bp.blogspot.com/_n2HkB0XD3Kw/SrHDZzS5gSI/AAAAAAAABD0/7INmwn6RvFI/s1600-h/Picture+2.png" onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}"&gt;&lt;img alt="" border="0" id="BLOGGER_PHOTO_ID_5382297877718073634" src="http://4.bp.blogspot.com/_n2HkB0XD3Kw/SrHDZzS5gSI/AAAAAAAABD0/7INmwn6RvFI/s320/Picture+2.png" style="cursor: pointer; float: left; height: 83px; margin: 0pt 10px 10px 0pt; width: 320px;" /&gt;&lt;/a&gt;&lt;br /&gt;Have you heard of &lt;a href="http://www.scala-lang.org/"&gt;Scala&lt;/a&gt;? It's a new programming language designed by &lt;a href="http://lamp.epfl.ch/%7Eodersky/"&gt;Martin Odersky&lt;/a&gt; and I'm exploring it as I write this.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;The motivation is this:&lt;br /&gt;&lt;blockquote&gt;I'm working in an environment where Java is not the mainstream language, and I don't want to lose my accumulated Java programming experience. I want to find a new way (think functional + imperative) to use it and create apps that scale efficiently on multi-core.&lt;br /&gt;&lt;/blockquote&gt;&lt;br /&gt;Those who have followed my other blog on &lt;a href="http://erlangraymondtay.blogspot.com/"&gt;Erlang&lt;/a&gt; can pretty much tell I like functional programming, but here's a new language that combines both OOP + FP. It's a relatively young language when compared to Ruby, Python, Erlang ... but I think it's got great potential. 
Here are two organizations that are using it *drum rolls*&lt;br /&gt;&lt;a href="http://3.bp.blogspot.com/_n2HkB0XD3Kw/SrHEG0dYg2I/AAAAAAAABD8/w-p73jAITh0/s1600-h/Picture+5.png" onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}"&gt;&lt;img alt="" border="0" id="BLOGGER_PHOTO_ID_5382298651124597602" src="http://3.bp.blogspot.com/_n2HkB0XD3Kw/SrHEG0dYg2I/AAAAAAAABD8/w-p73jAITh0/s320/Picture+5.png" style="cursor: pointer; float: left; height: 122px; margin: 0pt 10px 10px 0pt; width: 280px;" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;a href="http://2.bp.blogspot.com/_n2HkB0XD3Kw/SrHEj6NT7AI/AAAAAAAABEE/VhwhqFq1kkk/s1600-h/Picture+6.png" onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}"&gt;&lt;img alt="" border="0" id="BLOGGER_PHOTO_ID_5382299150884006914" src="http://2.bp.blogspot.com/_n2HkB0XD3Kw/SrHEj6NT7AI/AAAAAAAABEE/VhwhqFq1kkk/s320/Picture+6.png" style="cursor: pointer; float: left; height: 71px; margin: 0pt 10px 10px 0pt; width: 176px;" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;a href="http://3.bp.blogspot.com/_n2HkB0XD3Kw/SrHE08jYpVI/AAAAAAAABEM/aEltX7su-sk/s1600-h/Picture+7.png" onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}"&gt;&lt;img alt="" border="0" id="BLOGGER_PHOTO_ID_5382299443571238226" src="http://3.bp.blogspot.com/_n2HkB0XD3Kw/SrHE08jYpVI/AAAAAAAABEM/aEltX7su-sk/s320/Picture+7.png" style="cursor: pointer; float: left; height: 66px; margin: 0pt 10px 10px 0pt; width: 166px;" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Here's a class hierarchy diagram in Scala:&lt;br /&gt;&lt;a href="http://4.bp.blogspot.com/_n2HkB0XD3Kw/Ssw_9jZb_nI/AAAAAAAABEU/tjEVvRnpS4s/s1600-h/Picture+3.png" onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}"&gt;&lt;img alt="" border="0" id="BLOGGER_PHOTO_ID_5389753180762144370" src="http://4.bp.blogspot.com/_n2HkB0XD3Kw/Ssw_9jZb_nI/AAAAAAAABEU/tjEVvRnpS4s/s320/Picture+3.png" style="cursor: pointer; height: 228px; width: 320px;" 
/&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;&lt;script type="text/javascript"&gt;var gaJsHost = (("https:" == document.location.protocol) ? "https://ssl." : "http://www.");document.write(unescape("%3Cscript src='" + gaJsHost + "google-analytics.com/ga.js' type='text/javascript'%3E%3C/script%3E"));&lt;/script&gt;&lt;br /&gt;&lt;script type="text/javascript"&gt;try {var pageTracker = _gat._getTracker("UA-5323400-1");pageTracker._trackPageview();} catch(err) {}&lt;/script&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5021319479788698734-8846398048105134879?l=raymondtay.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://raymondtay.blogspot.com/feeds/8846398048105134879/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5021319479788698734&amp;postID=8846398048105134879&amp;isPopup=true' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5021319479788698734/posts/default/8846398048105134879'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5021319479788698734/posts/default/8846398048105134879'/><link rel='alternate' type='text/html' href='http://raymondtay.blogspot.com/2009/09/scala-i-like-it.html' title='Scala - I like it!'/><author><name>Raymond Tay</name><uri>https://profiles.google.com/110897033209560123249</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh4.googleusercontent.com/-4ZP-rPmLNSw/AAAAAAAAAAI/AAAAAAAABao/8f4AuUXNrno/s512-c/photo.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://4.bp.blogspot.com/_n2HkB0XD3Kw/SrHDZzS5gSI/AAAAAAAABD0/7INmwn6RvFI/s72-c/Picture+2.png' height='72' 
width='72'/><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5021319479788698734.post-11214419502805813</id><published>2009-08-03T18:11:00.000+08:00</published><updated>2009-08-03T18:13:40.545+08:00</updated><title type='text'>Job openings at Linden Lab</title><content type='html'>The company I'm working for (Linden Lab) has a great list of open positions, and I would like to invite individuals who are interested in and passionate about Virtual Worlds and Second Life to go check it out :D ---&gt; http://lindenlab.com/employment&lt;br /&gt;&lt;br /&gt;If you have software development (that means programming) and/or QA experience, feel free to drop me a mail at tay_boon_leong@yahoo.com.sg&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5021319479788698734-11214419502805813?l=raymondtay.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://raymondtay.blogspot.com/feeds/11214419502805813/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5021319479788698734&amp;postID=11214419502805813&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5021319479788698734/posts/default/11214419502805813'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5021319479788698734/posts/default/11214419502805813'/><link rel='alternate' type='text/html' href='http://raymondtay.blogspot.com/2009/08/job-openings-at-linden-lab.html' title='Job openings at Linden Lab'/><author><name>Raymond Tay</name><uri>https://profiles.google.com/110897033209560123249</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' 
src='//lh4.googleusercontent.com/-4ZP-rPmLNSw/AAAAAAAAAAI/AAAAAAAABao/8f4AuUXNrno/s512-c/photo.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5021319479788698734.post-525256980487790311</id><published>2009-07-15T11:09:00.003+08:00</published><updated>2009-07-15T11:16:03.560+08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='FSF'/><category scheme='http://www.blogger.com/atom/ns#' term='free software foundation'/><title type='text'>Should free software depend on Mono or C#?</title><content type='html'>Here's a fantastic read on why free software should not depend on Mono or C#, written by Mr. Richard Stallman, founder of the Free Software Foundation and the GNU project. Click &lt;a href="http://www.fsf.org/news/dont-depend-on-mono"&gt;here&lt;/a&gt; for the full story.&lt;br /&gt;&lt;br /&gt;&lt;blockquote&gt;Debian's decision to include Mono in its principal way of installing GNOME, for the sake of Tomboy which is an application written in C#, leads the community in a risky direction.  It is dangerous to depend on C#, so we need to discourage its use.&lt;br /&gt;...&lt;br /&gt;We should systematically arrange to depend on the free C# implementations as little as possible. In other words, we should discourage people from writing programs in C#. Therefore, we should not include C# implementations in the default installation of GNU/Linux distributions or in their principal ways of installing GNOME, and we should distribute and recommend non-C# applications rather than comparable C# applications whenever possible.&lt;br /&gt;&lt;/blockquote&gt;&lt;br /&gt;Draw your own conclusions on this, but personally I've never been a fan of C#... not now and, to be quite frank, possibly never.&lt;br /&gt;&lt;script type="text/javascript"&gt;&lt;br /&gt;var gaJsHost = (("https:" == document.location.protocol) ? "https://ssl." 
: "http://www.");&lt;br /&gt;document.write(unescape("%3Cscript src='" + gaJsHost + "google-analytics.com/ga.js' type='text/javascript'%3E%3C/script%3E"));&lt;br /&gt;&lt;/script&gt;&lt;br /&gt;&lt;script type="text/javascript"&gt;&lt;br /&gt;try {&lt;br /&gt;var pageTracker = _gat._getTracker("UA-5323400-1");&lt;br /&gt;pageTracker._trackPageview();&lt;br /&gt;} catch(err) {}&lt;/script&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5021319479788698734-525256980487790311?l=raymondtay.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://raymondtay.blogspot.com/feeds/525256980487790311/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5021319479788698734&amp;postID=525256980487790311&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5021319479788698734/posts/default/525256980487790311'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5021319479788698734/posts/default/525256980487790311'/><link rel='alternate' type='text/html' href='http://raymondtay.blogspot.com/2009/07/should-free-software-depend-on-mono-or.html' title='Should free software depend on Mono or C#?'/><author><name>Raymond Tay</name><uri>https://profiles.google.com/110897033209560123249</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh4.googleusercontent.com/-4ZP-rPmLNSw/AAAAAAAAAAI/AAAAAAAABao/8f4AuUXNrno/s512-c/photo.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5021319479788698734.post-2279333339263564307</id><published>2009-07-12T10:20:00.000+08:00</published><updated>2009-07-12T10:25:52.384+08:00</updated><category scheme='http://www.blogger.com/atom/ns#' 
term='multicore'/><category scheme='http://www.blogger.com/atom/ns#' term='OpenMP'/><category scheme='http://www.blogger.com/atom/ns#' term='compiler pragma'/><title type='text'>OpenMP - 'firstprivate' and 'lastprivate' caveat</title><content type='html'>I've just started experimenting with &lt;a href="http://www.openmp.org/"&gt;OpenMP&lt;/a&gt; and it's another super cool multicore development paradigm. Compared to &lt;a href="http://www.nvidia.com/"&gt;NVIDIA&lt;/a&gt;'s CUDA, OpenMP concentrates on utilizing the actual CPU cores, while CUDA uses the vast number of cores on NVIDIA's GPUs.&lt;br /&gt;&lt;br /&gt;To begin, let me demonstrate what '&lt;span style="color: rgb(51, 102, 255);"&gt;firstprivate&lt;/span&gt;' and '&lt;span style="color: rgb(51, 102, 255);"&gt;lastprivate&lt;/span&gt;' do.&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;#include &amp;lt;stdio.h&amp;gt;&lt;br /&gt;#include &amp;lt;omp.h&amp;gt;&lt;br /&gt;&lt;br /&gt;extern int n; // defined in another source file&lt;br /&gt;void demo_firstprivate(void) {&lt;br /&gt; int i, indx, TID;&lt;br /&gt; int a[n];&lt;br /&gt; for(i = 0; i &amp;lt; n; i++ )&lt;br /&gt;  a[i] = -i-1;&lt;br /&gt; indx = 4;&lt;br /&gt; int n1 = 1;&lt;br /&gt; // each thread starts with its own copy of indx, initialized to 4;&lt;br /&gt; // thread TID writes a[4 + TID], so n must be at least 4 + the number of threads&lt;br /&gt;#pragma omp parallel default(none) firstprivate(indx) private(i, TID) shared(n1,a)&lt;br /&gt; {&lt;br /&gt;  TID = omp_get_thread_num();&lt;br /&gt;  indx += n1*TID;&lt;br /&gt;  for( i = indx; i &amp;lt; indx + n1; i++)&lt;br /&gt;   a[i] = TID + 1;&lt;br /&gt; }// end of parallel region&lt;br /&gt; &lt;br /&gt; printf("After the parallel region:\n");&lt;br /&gt; for( i = 0; i &amp;lt; n; i++ )&lt;br /&gt;  printf("a[%d] = %d\n", i, a[i]);&lt;br /&gt;}&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;The output (here with n = 10 and two threads) is&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;Demo 'firstprivate' clause begin..&lt;br /&gt;After the parallel region:&lt;br /&gt;a[0] = -1&lt;br /&gt;a[1] = -2&lt;br /&gt;a[2] = -3&lt;br /&gt;a[3] = -4&lt;br /&gt;a[4] = 1&lt;br /&gt;a[5] = 2&lt;br /&gt;a[6] = -7&lt;br /&gt;a[7] = -8&lt;br /&gt;a[8] = -9&lt;br 
/&gt;a[9] = -10&lt;br /&gt;Demo 'firstprivate' clause end..&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;There are two things you need to realize when using firstprivate&lt;br /&gt;(1) The &lt;span style="color: rgb(51, 102, 255);"&gt;firstprivate&lt;/span&gt; variable is initialized once per thread&lt;br /&gt;(2) In C++, the &lt;span style="color: rgb(51, 102, 255);"&gt;firstprivate&lt;/span&gt; object is constructed by calling its copy constructor with the master thread's copy of the variable as its argument.&lt;br /&gt;&lt;br /&gt;Demo of &lt;span style="color: rgb(51, 102, 255);"&gt;lastprivate&lt;/span&gt; is the following code&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;extern int n;&lt;br /&gt;void demo_lastprivate(void) {&lt;br /&gt; int a = 0;&lt;br /&gt; int i;&lt;br /&gt;#pragma omp parallel for private(i) lastprivate(a)&lt;br /&gt;  for(i = 0; i &amp;lt; n; i++) {&lt;br /&gt;   a = i + 1;&lt;br /&gt;   printf("Thread:%d got value: %d, iteration: %d\n", omp_get_thread_num(), a, i);&lt;br /&gt;  }&lt;br /&gt;&lt;br /&gt; // End of parallel region&lt;br /&gt; printf("value of 'lastprivate' variable 'a' is %d\n", a);&lt;br /&gt;}&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;The output is&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;Demo 'lastprivate' clause begin..&lt;br /&gt;Thread:1 got value: 6, iteration: 5&lt;br /&gt;Thread:0 got value: 1, iteration: 0&lt;br /&gt;Thread:1 got value: 7, iteration: 6&lt;br /&gt;Thread:0 got value: 2, iteration: 1&lt;br /&gt;Thread:1 got value: 8, iteration: 7&lt;br /&gt;Thread:0 got value: 3, iteration: 2&lt;br /&gt;Thread:1 got value: 9, iteration: 8&lt;br /&gt;Thread:1 got value: 10, iteration: 9&lt;br /&gt;Thread:0 got value: 4, iteration: 3&lt;br /&gt;Thread:0 got value: 5, iteration: 4&lt;br /&gt;value of 'lastprivate' variable 'a' is 10&lt;br /&gt;Demo 'lastprivate' clause end..&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;The caveat in using the &lt;span style="color: rgb(51, 102, 255);"&gt;lastprivate&lt;/span&gt; clause is&lt;br /&gt;(1) If the lastprivate 
variable is an array or structure and only some elements or fields are assigned in the last iteration, then after the parallel region the elements or fields that were not assigned in the final iteration are undefined.&lt;br /&gt;(2) In C++, the object's copy assignment operator is invoked on the master thread's copy, with the sequentially last value of the variable as the argument.&lt;br /&gt;&lt;br /&gt;That is, both the copy assignment operator and the copy constructor must be publicly accessible, otherwise you'll find yourself in quite a fix - basically you'll get a compiler error, and depending on your compiler (e.g. Xcode, VS, gcc) you might be able to figure out why the error is there in the first place.&lt;br /&gt;&lt;br /&gt;The last thing I wanted to share is that under OpenMP, heap storage is shared by all threads in the program, so a firstprivate copy of a pointer still refers to the same allocation. You need to be careful when dealing with memory allocation, otherwise you will get runtime errors like&lt;pre&gt;Non-aligned pointer being freed ... or double free&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;&lt;script type="text/javascript"&gt;&lt;br /&gt;var gaJsHost = (("https:" == document.location.protocol) ? "https://ssl." 
: "http://www.");&lt;br /&gt;document.write(unescape("%3Cscript src='" + gaJsHost + "google-analytics.com/ga.js' type='text/javascript'%3E%3C/script%3E"));&lt;br /&gt;&lt;/script&gt;&lt;br /&gt;&lt;script type="text/javascript"&gt;&lt;br /&gt;try {&lt;br /&gt;var pageTracker = _gat._getTracker("UA-5323400-1");&lt;br /&gt;pageTracker._trackPageview();&lt;br /&gt;} catch(err) {}&lt;/script&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5021319479788698734-2279333339263564307?l=raymondtay.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://raymondtay.blogspot.com/feeds/2279333339263564307/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5021319479788698734&amp;postID=2279333339263564307&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5021319479788698734/posts/default/2279333339263564307'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5021319479788698734/posts/default/2279333339263564307'/><link rel='alternate' type='text/html' href='http://raymondtay.blogspot.com/2009/07/openmp-firstprivate-and-lastprivate.html' title='OpenMP - &apos;firstprivate&apos; and &apos;lastprivate&apos; caveat'/><author><name>Raymond Tay</name><uri>https://profiles.google.com/110897033209560123249</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh4.googleusercontent.com/-4ZP-rPmLNSw/AAAAAAAAAAI/AAAAAAAABao/8f4AuUXNrno/s512-c/photo.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5021319479788698734.post-7887379460155803508</id><published>2009-06-28T21:37:00.003+08:00</published><updated>2009-06-28T21:42:55.224+08:00</updated><title type='text'>Netflix has a winner! 
Well not if someone beats them to it in 30 days</title><content type='html'>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_n2HkB0XD3Kw/SkdyQ8OhzXI/AAAAAAAABDs/IwhbVfcbcbQ/s1600-h/netflix_probable_winner.jpg"&gt;&lt;img style="margin: 0pt 10px 10px 0pt; float: left; cursor: pointer; width: 320px; height: 194px;" src="http://2.bp.blogspot.com/_n2HkB0XD3Kw/SkdyQ8OhzXI/AAAAAAAABDs/IwhbVfcbcbQ/s320/netflix_probable_winner.jpg" alt="" id="BLOGGER_PHOTO_ID_5352372317523660146" border="0" /&gt;&lt;/a&gt;Caught this in the New York Times while trying to catch up with the rest of the world. See the link &lt;a href="http://bits.blogs.nytimes.com/2009/06/26/and-the-winner-of-the-1-million-netflix-prize-probably-is/"&gt;here&lt;/a&gt; for more details. I've got a snapshot of that story here in case you just want to get the gist of it.&lt;br /&gt;&lt;br /&gt;Turns out that they're not revealing the secret of their success yet (since they hope there are no challengers within the next 30 days), which is understandable, but I sure would like to find out how that team engineered the solution.&lt;br /&gt;&lt;br /&gt;&lt;script type="text/javascript"&gt;&lt;br /&gt;var gaJsHost = (("https:" == document.location.protocol) ? "https://ssl." 
: "http://www.");&lt;br /&gt;document.write(unescape("%3Cscript src='" + gaJsHost + "google-analytics.com/ga.js' type='text/javascript'%3E%3C/script%3E"));&lt;br /&gt;&lt;/script&gt;&lt;br /&gt;&lt;script type="text/javascript"&gt;&lt;br /&gt;try {&lt;br /&gt;var pageTracker = _gat._getTracker("UA-5323400-1");&lt;br /&gt;pageTracker._trackPageview();&lt;br /&gt;} catch(err) {}&lt;/script&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5021319479788698734-7887379460155803508?l=raymondtay.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://raymondtay.blogspot.com/feeds/7887379460155803508/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5021319479788698734&amp;postID=7887379460155803508&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5021319479788698734/posts/default/7887379460155803508'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5021319479788698734/posts/default/7887379460155803508'/><link rel='alternate' type='text/html' href='http://raymondtay.blogspot.com/2009/06/netflix-has-winner-well-not-if-someone.html' title='Netflix has a winner! 
Well not if someone beats them to it in 30 days'/><author><name>Raymond Tay</name><uri>https://profiles.google.com/110897033209560123249</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh4.googleusercontent.com/-4ZP-rPmLNSw/AAAAAAAAAAI/AAAAAAAABao/8f4AuUXNrno/s512-c/photo.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/_n2HkB0XD3Kw/SkdyQ8OhzXI/AAAAAAAABDs/IwhbVfcbcbQ/s72-c/netflix_probable_winner.jpg' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5021319479788698734.post-8392422914275082489</id><published>2009-05-31T19:23:00.008+08:00</published><updated>2009-05-31T20:51:08.194+08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='CUDA'/><category scheme='http://www.blogger.com/atom/ns#' term='CUDA 2.2'/><title type='text'>CUDA and libcurl - a simple demo</title><content type='html'>Recently, I got to know &lt;a href="http://curl.haxx.se/"&gt;cURL&lt;/a&gt; and &lt;a href="http://curl.haxx.se/libcurl"&gt;libcurl&lt;/a&gt; and wondered whether I could get the sample apps to run under CUDA. However, I could only get them to run in CUDA's &lt;span class="Apple-style-span" style="font-style: italic;"&gt;device emulation&lt;/span&gt; mode, for the simple reason that libcurl is not "CUDA-enabled", if I may use the term: it is host code and cannot be called from a kernel running on the GPU. &lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;So here are the sample runs with their run times, followed by the source code. 
&lt;/div&gt;&lt;div&gt;The run below uses the plain multi-threaded (pthreads) implementation:&lt;/div&gt;&lt;div&gt;&lt;pre&gt;ray:src ray$ time ./multiDownload 1&gt;/dev/null 2&gt;&amp;amp;1&lt;br /&gt;&lt;br /&gt;real 0m3.261s&lt;br /&gt;user 0m0.014s&lt;br /&gt;sys 0m0.028s&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;This one is the same download run under CUDA device emulation mode:&lt;/div&gt;&lt;div&gt;&lt;pre&gt;ray:src ray$ time ./cudaDownload 1&gt;/dev/null 2&gt;&amp;amp;1&lt;br /&gt;&lt;br /&gt;real 0m31.893s&lt;br /&gt;user 0m0.022s&lt;br /&gt;sys 0m0.038s&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Performance under device emulation mode was disappointing (about 32 seconds of wall-clock time against about 3 seconds for the pthreads version), so I looked at the running process to find an explanation. Here are the 2 notes I made ...&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;1. Under device emulation mode, CUDA still launched as many threads as the source code asks for, which is cool stuff: the kernel is launched with NUMT = 32 threads and gdb lists 33 threads in the process. See the gdb output of the running process below. 
From the output below, you can see that the threads are in the semaphore_wait_trap() which probably explains why the runtimes suck that much.&lt;/div&gt;&lt;pre&gt;&lt;br /&gt;(gdb) info threads&lt;br /&gt;33 process 28855 thread 0x5303  0x945b32c2 in semaphore_wait_trap ()&lt;br /&gt;32 process 28855 thread 0x5103  0x945b32c2 in semaphore_wait_trap ()&lt;br /&gt;31 process 28855 thread 0x4f03  0x945b32c2 in semaphore_wait_trap ()&lt;br /&gt;30 process 28855 thread 0x4d03  0x945b32c2 in semaphore_wait_trap ()&lt;br /&gt;29 process 28855 thread 0x4b03  0x945b32c2 in semaphore_wait_trap ()&lt;br /&gt;28 process 28855 thread 0x4903  0x945b32c2 in semaphore_wait_trap ()&lt;br /&gt;27 process 28855 thread 0x4703  0x945b32c2 in semaphore_wait_trap ()&lt;br /&gt;26 process 28855 thread 0x4503  0x945b32c2 in semaphore_wait_trap ()&lt;br /&gt;25 process 28855 thread 0x4303  0x945b32c2 in semaphore_wait_trap ()&lt;br /&gt;24 process 28855 thread 0x4103  0x945b32c2 in semaphore_wait_trap ()&lt;br /&gt;23 process 28855 thread 0x3f03  0x945b32c2 in semaphore_wait_trap ()&lt;br /&gt;22 process 28855 thread 0x3d03  0x945b32c2 in semaphore_wait_trap ()&lt;br /&gt;21 process 28855 thread 0x3b03  0x945b32c2 in semaphore_wait_trap ()&lt;br /&gt;20 process 28855 thread 0x3903  0x945b32c2 in semaphore_wait_trap ()&lt;br /&gt;19 process 28855 thread 0x3703  0x945b32c2 in semaphore_wait_trap ()&lt;br /&gt;18 process 28855 thread 0x3503  0x945b32c2 in semaphore_wait_trap ()&lt;br /&gt;17 process 28855 thread 0x3303  0x945b32c2 in semaphore_wait_trap ()&lt;br /&gt;16 process 28855 thread 0x3103  0x945b32c2 in semaphore_wait_trap ()&lt;br /&gt;15 process 28855 thread 0x2f03  0x945b32c2 in semaphore_wait_trap ()&lt;br /&gt;14 process 28855 thread 0x2d03  0x945b32c2 in semaphore_wait_trap ()&lt;br /&gt;13 process 28855 thread 0x2b03  0x945b32c2 in semaphore_wait_trap ()&lt;br /&gt;12 process 28855 thread 0x2903  0x945b32c2 in semaphore_wait_trap ()&lt;br /&gt;11 process 28855 thread 0x2703  
0x945b32c2 in semaphore_wait_trap ()&lt;br /&gt;10 process 28855 thread 0x2503  0x945b32c2 in semaphore_wait_trap ()&lt;br /&gt;9 process 28855 thread 0x2303  0x945b32c2 in semaphore_wait_trap ()&lt;br /&gt;8 process 28855 thread 0x2103  0x945b32c2 in semaphore_wait_trap ()&lt;br /&gt;7 process 28855 thread 0x1f03  0x945b32c2 in semaphore_wait_trap ()&lt;br /&gt;6 process 28855 thread 0x1d03  0x945b32c2 in semaphore_wait_trap ()&lt;br /&gt;5 process 28855 thread 0x1b03  0x945b32c2 in semaphore_wait_trap ()&lt;br /&gt;4 process 28855 thread 0x1903  0x945b32c2 in semaphore_wait_trap ()&lt;br /&gt;3 process 28855 thread 0x1703  0x946026fa in select$DARWIN_EXTSN ()&lt;br /&gt;2 process 28855 thread 0x1503  0x945b32c2 in semaphore_wait_trap ()&lt;br /&gt;* 1 process 28855 local thread 0x2d03  0x945ba46e in __semwait_signal ()&lt;br /&gt;(gdb) disassemble semaphore_wait_trap&lt;br /&gt;Dump of assembler code for function semaphore_wait_trap:&lt;br /&gt;0x945b32b8 &amp;lt;semaphore_wait_trap+0&amp;gt;: mov    $0xffffffdc,%eax&lt;br /&gt;0x945b32bd &amp;lt;semaphore_wait_trap+5&amp;gt;: call   0x945b3ad4 &amp;lt;_sysenter_trap&amp;gt;&lt;br /&gt;0x945b32c2 &amp;lt;semaphore_wait_trap+10&amp;gt;: ret&lt;br /&gt;0x945b32c3 &amp;lt;semaphore_wait_trap+11&amp;gt;: nop&lt;br /&gt;End of assembler dump.&lt;/pre&gt;2. 
The &lt;span class="Apple-style-span" style="color: rgb(51, 102, 255);"&gt;semaphore_wait_trap()&lt;/span&gt; call resolves into a system call into the Mac OS X kernel. That is not surprising: in device emulation mode CUDA runs each kernel thread as a host thread on the CPU rather than on the GPU, so the threads block on semaphores in the OS, and that is where most of the latency comes from.&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;It was to be expected that I could not "CUDA" my sample application, since it's not possible to call a &lt;span class="Apple-style-span" style="font-style: italic;"&gt;host&lt;/span&gt; function from within a &lt;span class="Apple-style-span" style="font-style: italic;"&gt;kernel&lt;/span&gt; function (to borrow CUDA's terminology), so I had to compile and build it under device emulation. What this experiment demonstrated is that CUDA's device emulation mode may not be the answer I was looking for, but it does raise a question: "Wouldn't it be great if Nvidia could provide a software library that allows &lt;span class="Apple-style-span" style="font-style: italic;"&gt;kernel&lt;/span&gt; functions to call &lt;span class="Apple-style-span" style="font-style: italic;"&gt;host&lt;/span&gt; functions in the CUDA manner? " Perhaps it's a work in progress.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;A likely candidate for this sort of computing could be &lt;a href="http://www.khronos.org"&gt;OpenCL&lt;/a&gt; (Open Computing Language), and it'll be in the next Mac OS (Snow Leopard). Yay! 
Read the press release &lt;a href="http://www.apple.com/pr/library/2008/06/09snowleopard.html"&gt;here&lt;/a&gt;.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Here is the source code I used (the multi-threaded program was lifted from the example code on the libcurl website; I merely modified a few things to fit my experiment):&lt;/div&gt;&lt;pre&gt;#include &amp;lt;stdio.h&amp;gt;&lt;br /&gt;#include &amp;lt;pthread.h&amp;gt;&lt;br /&gt;#include &amp;lt;curl/curl.h&amp;gt;&lt;br /&gt;&lt;br /&gt;#define NUMT 32&lt;br /&gt;&lt;br /&gt;const char* const urls[NUMT] = {&lt;br /&gt;"http://www.yahoo.com",&lt;br /&gt;"http://www.cnn.com",&lt;br /&gt;"http://www.hotmail.com",&lt;br /&gt;"http://www.gmail.com",&lt;br /&gt;"http://www.hp.com",&lt;br /&gt;"http://www.microsoft.com",&lt;br /&gt;"http://www.sun.com",&lt;br /&gt;"http://blogs.sun.com/",&lt;br /&gt;"http://www.acm.org",&lt;br /&gt;"http://blogs.sun.com/d/",&lt;br /&gt;"http://blogs.sun.com/jonathan",&lt;br /&gt;"http://blogs.sun.com/jimgris",&lt;br /&gt;"http://blogs.sun.com/theaquarium",&lt;br /&gt;"http://blogs.sun.com/arungupta",&lt;br /&gt;"http://blogs.sun.com/katakai",&lt;br /&gt;"http://blogs.sun.com/webmink",&lt;br /&gt;"http://blogs.sun.com/startups",&lt;br /&gt;"http://blogs.sun.com/geertjan",&lt;br /&gt;"http://blogs.sun.com/eclectic",&lt;br /&gt;"http://blogs.sun.com/theplanetarium",&lt;br /&gt;"http://blogs.sun.com/SDNProgramNews",&lt;br /&gt;"http://blogs.sun.com/GullFOSS",&lt;br /&gt;"http://blogs.sun.com/richb",&lt;br /&gt;"http://blogs.sun.com/chrisg",&lt;br /&gt;"http://blogs.sun.com/ontherecord",&lt;br /&gt;"http://blogs.sun.com/HPC",&lt;br /&gt;"http://blogs.sun.com/bblfish",&lt;br /&gt;"http://blogs.sun.com/enterprisetechtips",&lt;br /&gt;"http://blogs.sun.com/ahl",&lt;br /&gt;"http://blogs.sun.com/jag",&lt;br /&gt;"http://blogs.sun.com/bigadmin",&lt;br /&gt;"http://blogs.sun.com/brendan"&lt;br /&gt;};&lt;br /&gt;&lt;br /&gt;static void *pull_one_url(void* url) {&lt;br /&gt; CURL* 
curl;&lt;br /&gt; curl = curl_easy_init();&lt;br /&gt; curl_easy_setopt(curl, CURLOPT_URL, url);&lt;br /&gt; curl_easy_perform(curl); /* fetch the page; the body goes to stdout by default */&lt;br /&gt; curl_easy_cleanup(curl);&lt;br /&gt;&lt;br /&gt; return NULL;&lt;br /&gt;}&lt;br /&gt;&lt;br /&gt;int main(int argc, char** argv) {&lt;br /&gt; pthread_t tid[NUMT];&lt;br /&gt; int i;&lt;br /&gt; &lt;br /&gt; /* must be called once, before any thread uses libcurl */&lt;br /&gt; curl_global_init(CURL_GLOBAL_ALL);&lt;br /&gt; /* one thread per URL */&lt;br /&gt; for( i = 0; i &amp;lt; NUMT; i++)&lt;br /&gt;  pthread_create(&amp;amp;tid[i], NULL, pull_one_url, (void*)urls[i]);&lt;br /&gt;&lt;br /&gt; /* wait for every download to finish */&lt;br /&gt; for( i = 0; i &amp;lt; NUMT; i++)&lt;br /&gt;  pthread_join(tid[i], NULL);&lt;br /&gt;&lt;br /&gt; return 0;&lt;br /&gt;}&lt;/pre&gt;&lt;br /&gt;Here's the portion in CUDA style (I've shown only the part where it differs):&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;__global__ void pull_one_url(char** url) {&lt;br /&gt; int tid = threadIdx.x; // one CUDA thread per URL&lt;br /&gt; CURL* curl;&lt;br /&gt; // libcurl calls are host code, hence the device emulation build&lt;br /&gt; curl = curl_easy_init();&lt;br /&gt; curl_easy_setopt(curl, CURLOPT_URL, url[tid]);&lt;br /&gt; curl_easy_perform(curl);&lt;br /&gt; curl_easy_cleanup(curl);&lt;br /&gt; printf("%d finished@%s\n", tid, url[tid]);&lt;br /&gt; return;&lt;br /&gt;}&lt;br /&gt;&lt;br /&gt;int main(int argc, char** argv) {&lt;br /&gt; char** d_a;&lt;br /&gt; // d_a receives the array of NUMT string pointers, not the characters themselves&lt;br /&gt; int memSize = NUMT * (int)sizeof(char*);&lt;br /&gt; printf("size=%d\n", memSize);&lt;br /&gt; cudaMalloc((void**)&amp;amp;d_a, memSize);&lt;br /&gt; cudaMemcpy( d_a, urls, memSize, cudaMemcpyHostToDevice );&lt;br /&gt; // reading d_a on the host (and the host string pointers it holds) only works under device emulation&lt;br /&gt; for( int i = 0; i &amp;lt; NUMT; i++)&lt;br /&gt;  printf("%s\n", d_a[i]);&lt;br /&gt;&lt;br /&gt; pull_one_url&amp;lt;&amp;lt;&amp;lt;1, NUMT&amp;gt;&amp;gt;&amp;gt;(d_a);&lt;br /&gt;&lt;br /&gt; cudaFree(d_a);&lt;br /&gt; return 0;&lt;br /&gt;}&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;script type="text/javascript"&gt;&lt;br /&gt;var gaJsHost = (("https:" == document.location.protocol) ? "https://ssl." 
: "http://www.");&lt;br /&gt;document.write(unescape("%3Cscript src='" + gaJsHost + "google-analytics.com/ga.js' type='text/javascript'%3E%3C/script%3E"));&lt;br /&gt;&lt;/script&gt;&lt;br /&gt;&lt;script type="text/javascript"&gt;&lt;br /&gt;try {&lt;br /&gt;var pageTracker = _gat._getTracker("UA-5323400-1");&lt;br /&gt;pageTracker._trackPageview();&lt;br /&gt;} catch(err) {}&lt;/script&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5021319479788698734-8392422914275082489?l=raymondtay.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://raymondtay.blogspot.com/feeds/8392422914275082489/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5021319479788698734&amp;postID=8392422914275082489&amp;isPopup=true' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5021319479788698734/posts/default/8392422914275082489'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5021319479788698734/posts/default/8392422914275082489'/><link rel='alternate' type='text/html' href='http://raymondtay.blogspot.com/2009/05/cuda-and-libcurl-simple-demo.html' title='CUDA and libcurl - a simple demo'/><author><name>Raymond Tay</name><uri>https://profiles.google.com/110897033209560123249</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh4.googleusercontent.com/-4ZP-rPmLNSw/AAAAAAAAAAI/AAAAAAAABao/8f4AuUXNrno/s512-c/photo.jpg'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5021319479788698734.post-2529116444426359683</id><published>2009-05-09T16:33:00.011+08:00</published><updated>2009-05-15T08:15:16.406+08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='profiling'/><category 
scheme='http://www.blogger.com/atom/ns#' term='Python'/><title type='text'>RunSnakeRun - View performance profile in Python</title><content type='html'>In my previous post on &lt;a href="http://raymondtay.blogspot.com/2009/05/pythons-greenletlinden-labs-eventlet.html"&gt;Greenlet/Eventlet&lt;/a&gt;, I tested and profiled the apps I wrote just to get a feel for them, but admittedly I find reading the statistical data gathered through profiling laborious and, at times, quite tiring.&lt;br /&gt;&lt;br /&gt;So I've found a nice application that can read a Python application's profiling data. It's called &lt;a href="http://www.vrplumber.com/programming/runsnakerun/"&gt;RunSnakeRun&lt;/a&gt;, and it is pretty cool because it helps to visualize the data in a much more interesting and readable manner. The caveat is that you will need to download the appropriate &lt;a href="http://www.wxpython.org/download.php#binaries"&gt;wxPython&lt;/a&gt; binaries, or the source code to build them, for the platform you're using. 
In this case, I'm using Ubuntu Linux to run the profiling.&lt;br /&gt;&lt;div&gt;&lt;br /&gt;Besides the proper wxPython, you will need to follow the instructions on the site to begin profiling and viewing your profiled data.&lt;br /&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;On a side note, if you are curious about profiling applications in Python but have little idea what it's all about, here's a &lt;a href="http://us.pycon.org/2009/conference/schedule/event/15/"&gt;link&lt;/a&gt; that gives you an idea of why profiling is important in Python code.&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_n2HkB0XD3Kw/SgbwQlFuERI/AAAAAAAABDM/tDp88LSk6sE/s1600-h/RunSnakeRun.png"&gt;&lt;img style="margin: 0pt 10px 10px 0pt; float: left; cursor: pointer; width: 320px; height: 256px;" src="http://1.bp.blogspot.com/_n2HkB0XD3Kw/SgbwQlFuERI/AAAAAAAABDM/tDp88LSk6sE/s320/RunSnakeRun.png" alt="" id="BLOGGER_PHOTO_ID_5334214976291606802" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;The screenshot I have for you is a caller-callee graph with run times (accumulated and exclusive), which allows a Python developer to quickly isolate poorly performing code and also gives you an idea of how the code is being executed. E.g. this screenshot illustrates the concurrent execution of the Greenlets.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;The application lets a developer quickly isolate problematic code using a &lt;a href="https://launchpad.net/squaremap"&gt;square map&lt;/a&gt;: the largest consumer of time is shown as the largest square, and the squares are color-highlighted as you hover your mouse over them. Neat!&lt;br /&gt;&lt;br /&gt;&lt;/div&gt;&lt;script type="text/javascript"&gt;&lt;br /&gt;var gaJsHost = (("https:" == document.location.protocol) ? "https://ssl." 
: "http://www.");&lt;br /&gt;document.write(unescape("%3Cscript src='" + gaJsHost + "google-analytics.com/ga.js' type='text/javascript'%3E%3C/script%3E"));&lt;br /&gt;&lt;/script&gt;&lt;br /&gt;&lt;script type="text/javascript"&gt;&lt;br /&gt;try {&lt;br /&gt;var pageTracker = _gat._getTracker("UA-5323400-1");&lt;br /&gt;pageTracker._trackPageview();&lt;br /&gt;} catch(err) {}&lt;/script&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5021319479788698734-2529116444426359683?l=raymondtay.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://raymondtay.blogspot.com/feeds/2529116444426359683/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5021319479788698734&amp;postID=2529116444426359683&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5021319479788698734/posts/default/2529116444426359683'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5021319479788698734/posts/default/2529116444426359683'/><link rel='alternate' type='text/html' href='http://raymondtay.blogspot.com/2009/05/ui-to-read-application-profiling-data.html' title='RunSnakeRun - View performance profile in Python'/><author><name>Raymond Tay</name><uri>https://profiles.google.com/110897033209560123249</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh4.googleusercontent.com/-4ZP-rPmLNSw/AAAAAAAAAAI/AAAAAAAABao/8f4AuUXNrno/s512-c/photo.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://1.bp.blogspot.com/_n2HkB0XD3Kw/SgbwQlFuERI/AAAAAAAABDM/tDp88LSk6sE/s72-c/RunSnakeRun.png' height='72' 
width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5021319479788698734.post-654585931818053543</id><published>2009-05-09T09:52:00.019+08:00</published><updated>2009-05-21T08:18:38.995+08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='greenlet'/><category scheme='http://www.blogger.com/atom/ns#' term='eventlet'/><category scheme='http://www.blogger.com/atom/ns#' term='event system'/><category scheme='http://www.blogger.com/atom/ns#' term='concurrent programming'/><category scheme='http://www.blogger.com/atom/ns#' term='Python'/><title type='text'>Python's Greenlet/Linden Lab's Eventlet</title><content type='html'>Have you guys used Python's &lt;a href="http://pypi.python.org/pypi/greenlet"&gt;greenlet&lt;/a&gt; or &lt;a href="http://wiki.secondlife.com/wiki/Eventlet"&gt;eventlet&lt;/a&gt; before? They're pretty cool and I thought I'd write something about them. So here's a little experiment I did using a couple of Python modules, i.e. &lt;span class="Apple-style-span" style="color: rgb(51, 102, 255);"&gt;optparse&lt;/span&gt;, &lt;span class="Apple-style-span" style="color: rgb(51, 102, 255);"&gt;greenlet&lt;/span&gt;, &lt;span class="Apple-style-span" style="color: rgb(51, 102, 255);"&gt;eventlet&lt;/span&gt;. I have also done some simple measuring to see what was causing latencies etc.&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Basically, I collected statistics on a simple, highly parallel computation in the spirit of matrix multiplication: multiply two vectors element by element and sum the products. I created 3 programs: one using the normal iterative approach (i.e. a for-loop), one using greenlet and lastly one using eventlet.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Overall, I find greenlet preferable to eventlet (w.r.t. time) and to the iterative approach (w.r.t. flexibility and elegance). 
I include my scripts at the end of this post.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Here's how I executed and profiled my scripts:&lt;/div&gt;&lt;pre&gt;&lt;br /&gt;ray:~ ray$ python -m cProfile -o eventletprof_10 ./myEventletDemo.py&lt;br /&gt;Sum of matrix multiplication: 285&lt;br /&gt;&lt;br /&gt;ray:~ ray$ python -m cProfile -o eventletprof_100 ./myEventletDemo.py --i 100&lt;br /&gt;Sum of matrix multiplication: 328350&lt;br /&gt;&lt;br /&gt;ray:~ ray$ python -m cProfile -o eventletprof_1000 ./myEventletDemo.py --i 1000&lt;br /&gt;Sum of matrix multiplication: 332833500&lt;br /&gt;&lt;br /&gt;ray:~ ray$ python -m cProfile -o eventletprof_10000 ./myEventletDemo.py --i 10000&lt;br /&gt;Sum of matrix multiplication: 333283335000&lt;br /&gt;&lt;br /&gt;ray:~ ray$ python -m cProfile -o eventletprof_100000 ./myEventletDemo.py --i 100000&lt;br /&gt;Sum of matrix multiplication: 333328333350000&lt;br /&gt;...&lt;br /&gt;...&lt;br /&gt;ray:~ ray$ python -m cProfile -o noneventletprof_10 ./myNoneventletDemo.py&lt;br /&gt;iteration version of matrix add: 285&lt;br /&gt;&lt;br /&gt;ray:~ ray$ python -m cProfile -o noneventletprof_100 ./myNoneventletDemo.py --i 100&lt;br /&gt;iteration version of matrix add: 328350&lt;br /&gt;&lt;br /&gt;ray:~ ray$ python -m cProfile -o noneventletprof_1000 ./myNoneventletDemo.py --i 1000&lt;br /&gt;iteration version of matrix add: 332833500&lt;br /&gt;&lt;br /&gt;ray:~ ray$ python -m cProfile -o noneventletprof_10000 ./myNoneventletDemo.py --i 10000&lt;br /&gt;iteration version of matrix add: 333283335000&lt;br /&gt;&lt;br /&gt;ray:~ ray$ python -m cProfile -o noneventletprof_100000 ./myNoneventletDemo.py --i 100000&lt;br /&gt;iteration version of matrix add: 333328333350000&lt;br /&gt;...&lt;br /&gt;...&lt;br /&gt;ray:~ ray$ python -m cProfile -o greenletprof_10 ./myGreenletDemo.py&lt;br /&gt;iteration version of matrix add: 285&lt;br /&gt;&lt;br /&gt;ray:~ ray$ python -m cProfile -o greenletprof_100 ./myGreenletDemo.py --i 100&lt;br /&gt;iteration version of matrix add: 328350&lt;br /&gt;&lt;br /&gt;ray:~ ray$ python -m cProfile -o greenletprof_1000 ./myGreenletDemo.py --i 1000&lt;br /&gt;iteration version of matrix add: 332833500&lt;br /&gt;&lt;br /&gt;ray:~ ray$ python -m cProfile -o greenletprof_10000 ./myGreenletDemo.py --i 10000&lt;br /&gt;iteration version of matrix add: 333283335000&lt;br /&gt;&lt;br /&gt;ray:~ ray$ python -m cProfile -o greenletprof_100000 ./myGreenletDemo.py --i 100000&lt;br /&gt;iteration version of matrix add: 333328333350000&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;
With the -o option, cProfile writes its results to a file (e.g. greenletprof_100000), and you can view the statistics in the Python interpreter as follows:&lt;div&gt;&lt;pre&gt;&lt;br /&gt;Python 2.5.1 (r251:54863, Jan 13 2009, 10:26:13)&lt;br /&gt;[GCC 4.0.1 (Apple Inc. build 5465)] on darwin&lt;br /&gt;Type "help", "copyright", "credits" or "license" for more information.&lt;br /&gt;&gt;&gt;&gt; import pstats&lt;br /&gt;&gt;&gt;&gt; profiledData = pstats.Stats('greenletprof_100000')&lt;br /&gt;&gt;&gt;&gt; profiledData.print_stats()&lt;br /&gt;&lt;/pre&gt;And you'll see the statistics printed in great detail. However, this is rather tedious, so I would suggest a graphical UI for viewing the profiled data instead; one such option is &lt;a href="http://www.vrplumber.com/programming/runsnakerun/"&gt;RunSnakeRun&lt;/a&gt;.&lt;br /&gt;&lt;div&gt;&lt;br /&gt;Comparing the detailed statistics of the different runs reveals that the iterative approach is the most efficient, followed by Greenlet, with Eventlet last (0.069, 1.511 and 34.901 CPU seconds respectively for 100000 items). 
Here are the statistics in detail:&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&gt;&gt;&gt; import pstats&lt;br /&gt;&gt;&gt;&gt; stats10 = pstats.Stats('nongreenleteventlet100000.profile')&lt;br /&gt;&gt;&gt;&gt; stats10.print_stats()&lt;br /&gt;Sun May 10 22:30:16 2009    nongreenleteventlet100000.profile&lt;br /&gt;&lt;br /&gt;      6 function calls in 0.069 CPU seconds&lt;br /&gt;&lt;br /&gt;Random listing order was used&lt;br /&gt;&lt;br /&gt;ncalls  tottime  percall  cumtime  percall filename:lineno(function)&lt;br /&gt;     1    0.060    0.060    0.069    0.069 /home/tayboonl/Desktop/Greenlet_Eventlet/nongreenleteventletdemo.py:8(execMatrixAdd_Iter)&lt;br /&gt;     1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}&lt;br /&gt;     1    0.000    0.000    0.069    0.069 &amp;lt;string&amp;gt;:1(&amp;lt;module&amp;gt;)&lt;br /&gt;     3    0.009    0.003    0.009    0.003 {range}&lt;br /&gt;&lt;br /&gt;...&lt;br /&gt;&gt;&gt;&gt; stats10 = pstats.Stats('greenlet100000.profile')&lt;br /&gt;&gt;&gt;&gt; stats10.print_stats()&lt;br /&gt;Sun May 10 22:16:45 2009    greenlet100000.profile&lt;br /&gt;&lt;br /&gt;      500011 function calls in 1.511 CPU seconds&lt;br /&gt;&lt;br /&gt;....&lt;br /&gt;&gt;&gt;&gt; stats10 = pstats.Stats('eventlet100000.profile')&lt;br /&gt;&gt;&gt;&gt; stats10.print_stats()&lt;br /&gt;Sun May 10 22:20:17 2009    eventlet100000.profile&lt;br /&gt;&lt;br /&gt;      6500027 function calls (5800031 primitive calls) in 34.901 CPU seconds&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;You might have noticed the remarkable number of function calls made in the scenario using Eventlet: about 6.5 million, against about 500,000 for Greenlet and 6 for the plain loop. 
It suffices at this point to say that, compared with Greenlet, Eventlet was not designed for high-performance computing, since it was designed to be an asynchronous networking library. To me, it makes a lot of sense to always profile an application or framework before using it widely.&lt;br /&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;In case you are interested, here are the scripts I used. &lt;/div&gt;There are 3 in total, the first being the one implemented using Greenlet (cool stuff!):&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;#!/usr/bin/python&lt;br /&gt;&lt;br /&gt;import greenlet&lt;br /&gt;import optparse&lt;br /&gt;&lt;br /&gt;tasks = []&lt;br /&gt;accum = list()&lt;br /&gt;sum = 0&lt;br /&gt;vecA = vecB = None&lt;br /&gt;&lt;br /&gt;def mulNAccum(idx, val1, val2) :&lt;br /&gt;    global accum&lt;br /&gt;    accum.insert(idx, val1*val2)&lt;br /&gt;    #print idx,val1*val2,accum&lt;br /&gt;&lt;br /&gt;def createTasks(HowMany):&lt;br /&gt;    global vecA, vecB&lt;br /&gt;    vecA = range(0, HowMany)&lt;br /&gt;    vecB = range(0, HowMany)&lt;br /&gt;    for i in range(0,HowMany):&lt;br /&gt;        tasks.append(greenlet.greenlet(run=mulNAccum,parent=greenlet.getcurrent()))&lt;br /&gt;&lt;br /&gt;def executeTasks(HowMany):&lt;br /&gt;    global tasks&lt;br /&gt;    global accum&lt;br /&gt;    global sum&lt;br /&gt;&lt;br /&gt;    for i in range(0,HowMany):&lt;br /&gt;        tasks[i].switch(i, vecA[i],vecB[i])&lt;br /&gt;    #print "tasks finished executing ...\n"&lt;br /&gt;    # print accum&lt;br /&gt;    for i in range(0, len(accum)):&lt;br /&gt;        sum = sum + accum[i]&lt;br /&gt;&lt;br /&gt;    print "Sum of matrix multiplication: %d\n" % sum&lt;br /&gt;&lt;br /&gt;if __name__ == "__main__":&lt;br /&gt;    parser = optparse.OptionParser()&lt;br /&gt;    parser.add_option("--items", "-i", default=10, action="store", type="int", dest="HowMany", help="number of elements in matrix array to process")&lt;br /&gt;    (options, args) = parser.parse_args()&lt;br /&gt;&lt;br /&gt;    createTasks(options.HowMany)&lt;br 
/&gt;    executeTasks(options.HowMany)&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;A similar program using a for-loop, a.k.a. the iterative approach:&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;#!/usr/bin/python&lt;br /&gt;&lt;br /&gt;import optparse&lt;br /&gt;&lt;br /&gt;sum = 0&lt;br /&gt;vecA = vecB = None&lt;br /&gt;&lt;br /&gt;def execMatrixAdd_Iter(HowMany):&lt;br /&gt;    global vecA, vecB, sum&lt;br /&gt;    vecA = range(0, HowMany)&lt;br /&gt;    vecB = range(0, HowMany)&lt;br /&gt;    for i in range(0, HowMany):&lt;br /&gt;        temp = vecA[i] * vecB[i]&lt;br /&gt;        sum = temp + sum&lt;br /&gt;    print "iteration version of matrix add: %d\n" % sum&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;if __name__ == "__main__":&lt;br /&gt;    parser = optparse.OptionParser()&lt;br /&gt;    parser.add_option("--items", "-i", default=10, action="store", type="int", dest="HowMany", help="number of elements in matrix array to process")&lt;br /&gt;    (options, args) = parser.parse_args()&lt;br /&gt;    execMatrixAdd_Iter(options.HowMany)&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;The last script, using Linden Lab's eventlet:&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;#!/usr/bin/python&lt;br /&gt;&lt;br /&gt;import eventlet&lt;br /&gt;import eventlet.api&lt;br /&gt;import optparse&lt;br /&gt;&lt;br /&gt;tasks = []&lt;br /&gt;accum = list()&lt;br /&gt;sum = 0&lt;br /&gt;vecA = vecB = None&lt;br /&gt;&lt;br /&gt;def mulNAccum(idx, val1, val2) :&lt;br /&gt;    global accum&lt;br /&gt;    accum.insert(idx, val1*val2)&lt;br /&gt;    #print idx,val1*val2,accum&lt;br /&gt;&lt;br /&gt;def createTasks(HowMany):&lt;br /&gt;    global vecA, vecB&lt;br /&gt;    vecA = range(0, HowMany)&lt;br /&gt;    vecB = range(0, HowMany)&lt;br /&gt;    for i in range(0,HowMany):&lt;br /&gt;        tasks.append(eventlet.api.spawn(mulNAccum, i, vecA[i-1], vecB[i-1]))&lt;br /&gt;&lt;br /&gt;def executeTasks(HowMany):&lt;br /&gt;    global tasks&lt;br /&gt;    global accum&lt;br /&gt;    global sum&lt;br /&gt;&lt;br /&gt;    for i in range(0,HowMany):&lt;br /&gt;        tasks[i].switch()&lt;br 
/&gt;    #print "tasks finished executing ...\n"&lt;br /&gt;    # print accum&lt;br /&gt;    for i in range(0, len(accum)):&lt;br /&gt;        sum = sum + accum[i]&lt;br /&gt;&lt;br /&gt;    print "Sum of matrix multiplication: %d\n" % sum&lt;br /&gt;&lt;br /&gt;if __name__ == "__main__":&lt;br /&gt;    parser = optparse.OptionParser()&lt;br /&gt;    parser.add_option("--items", "-i", default=10, action="store", type="int", dest="HowMany", help="number of elements in matrix array to process")&lt;br /&gt;    (options, args) = parser.parse_args()&lt;br /&gt;&lt;br /&gt;    createTasks(options.HowMany)&lt;br /&gt;    executeTasks(options.HowMany)&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;&lt;script type="text/javascript"&gt;&lt;br /&gt;var gaJsHost = (("https:" == document.location.protocol) ? "https://ssl." : "http://www.");&lt;br /&gt;document.write(unescape("%3Cscript src='" + gaJsHost + "google-analytics.com/ga.js' type='text/javascript'%3E%3C/script%3E"));&lt;br /&gt;&lt;/script&gt;&lt;br /&gt;&lt;script type="text/javascript"&gt;&lt;br /&gt;try {&lt;br /&gt;var pageTracker = _gat._getTracker("UA-5323400-1");&lt;br /&gt;pageTracker._trackPageview();&lt;br /&gt;} catch(err) {}&lt;/script&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5021319479788698734-654585931818053543?l=raymondtay.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://raymondtay.blogspot.com/feeds/654585931818053543/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5021319479788698734&amp;postID=654585931818053543&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5021319479788698734/posts/default/654585931818053543'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5021319479788698734/posts/default/654585931818053543'/><link 
rel='alternate' type='text/html' href='http://raymondtay.blogspot.com/2009/05/pythons-greenletlinden-labs-eventlet.html' title='Python&apos;s Greenlet/Linden Lab&apos;s Eventlet'/><author><name>Raymond Tay</name><uri>https://profiles.google.com/110897033209560123249</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh4.googleusercontent.com/-4ZP-rPmLNSw/AAAAAAAAAAI/AAAAAAAABao/8f4AuUXNrno/s512-c/photo.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5021319479788698734.post-567404867808701887</id><published>2009-04-19T11:08:00.004+08:00</published><updated>2009-04-26T09:06:54.294+08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='CUDA'/><category scheme='http://www.blogger.com/atom/ns#' term='CUDA 2.1'/><title type='text'>Designing efficient sorting algorithms for manycore GPUs</title><content type='html'>I recommend reading "&lt;a href="http://sites.google.com/site/cudaiap2009/materials-1/research-papers-1/designingefficientsortingalgorithmsformanycoregpus"&gt;Designing efficient sorting algorithms for manycore GPUs&lt;/a&gt;" to understand and think about how CUDA can help implement fast sorting algorithms. I benefited from it and thought I would share it with everyone.&lt;br /&gt;&lt;div&gt;------------ Updated 26 Apr'09 ----------&lt;/div&gt;&lt;div&gt;On a similar note, here are two illustrations from a recent report on Nvidia's CUDA that I read; you can download it by clicking &lt;a href="http://www.nvidia.com/docs/IO/55972/220401_Reprint.pdf"&gt;here&lt;/a&gt;. I hope these two pictures pique your curiosity :)&lt;/div&gt;&lt;div&gt;&lt;img style="float:right; margin:0 0 10px 10px;cursor:pointer; cursor:hand;width: 320px; height: 202px;" src="http://1.bp.blogspot.com/_n2HkB0XD3Kw/SfOzN2yozMI/AAAAAAAABDE/Cvx9epv3yWo/s320/parallel_processing_with_cuda_2.png" border="0" alt="" id="BLOGGER_PHOTO_ID_5328799834737003714" /&gt;&lt;img style="float:right; margin:0 0 10px 10px;cursor:pointer; cursor:hand;width: 320px; height: 255px;" src="http://2.bp.blogspot.com/_n2HkB0XD3Kw/SfOzNtK-3oI/AAAAAAAABC8/ImSFc6RHYbE/s320/parallel_processing_with_cuda.png" border="0" alt="" id="BLOGGER_PHOTO_ID_5328799832154758786" /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://raymondtay.blogspot.com/feeds/567404867808701887/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5021319479788698734&amp;postID=567404867808701887&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' 
href='http://www.blogger.com/feeds/5021319479788698734/posts/default/567404867808701887'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5021319479788698734/posts/default/567404867808701887'/><link rel='alternate' type='text/html' href='http://raymondtay.blogspot.com/2009/04/designing-efficient-sorting-algorithms.html' title='Designing efficient sorting algorithms for manycore GPUs'/><author><name>Raymond Tay</name><uri>https://profiles.google.com/110897033209560123249</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh4.googleusercontent.com/-4ZP-rPmLNSw/AAAAAAAAAAI/AAAAAAAABao/8f4AuUXNrno/s512-c/photo.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://1.bp.blogspot.com/_n2HkB0XD3Kw/SfOzN2yozMI/AAAAAAAABDE/Cvx9epv3yWo/s72-c/parallel_processing_with_cuda_2.png' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5021319479788698734.post-4693790776671476064</id><published>2009-03-26T11:25:00.010+08:00</published><updated>2009-03-26T12:28:37.716+08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='CUDA'/><category scheme='http://www.blogger.com/atom/ns#' term='CUDA 2.1'/><title type='text'>CUDA 2.1 Nbody linkage issue under Mac OS X</title><content type='html'>Just hours ago, I downloaded &lt;a href="http://www.nvidia.com/object/cuda_home.html#"&gt;CUDA 2.1&lt;/a&gt; from NVIDIA's &lt;a href="http://www.nvidia.com/object/cuda_home.html#"&gt;website&lt;/a&gt; because a fellow CUDA enthusiast ran into a problem while compiling the samples, N-body in particular. 
&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;The problem was a duplicate definition of the symbol &lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span" style="color: rgb(51, 102, 255);"&gt;beginWinCoords&lt;/span&gt;&lt;/span&gt;: one static library contained two definitions of it, which violates C++'s one-definition rule, so the linker rejects it. The resolution was to remove the directories under &lt;span class="Apple-style-span" style="color: rgb(51, 102, 255);"&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;/Developer/CUDA&lt;/span&gt;&lt;/span&gt; and reinstall the CUDA 2.1 SDK entirely.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;In detail, here's what happened.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;When I installed CUDA 2.1 on top of CUDA 2.0 and then tried to rebuild the sample projects, Nbody in particular, I hit the following problem:&lt;/div&gt;&lt;pre&gt;&lt;br /&gt;ray:nbody ray$ make&lt;br /&gt;ld: duplicate symbol beginWinCoords()     in ../../lib/libparamgl.a(paramgl.cpp.o) and ../../lib/libparamgl.a(paramgl.cpp_o)&lt;br /&gt;collect2: ld returned 1 exit status&lt;br /&gt;make: *** [../../bin/darwin/release/nbody] Error 1&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;What happened is that there are two definitions of the same function, i.e. 
&lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span" style="color: rgb(51, 102, 255);"&gt;beginWinCoords()&lt;/span&gt;&lt;/span&gt; in the static archive &lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span" style="color: rgb(51, 102, 255);"&gt;libparamgl.a &lt;/span&gt;&lt;/span&gt;(.a is the file extension for static archive libraries, as opposed to .dylib, .so or .dll for dynamic libraries).&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Next, I compared the contents of the two static libraries and found that the 2.1 version had one definition (great) while the 2.0 version had two. See the comparison below (if you are unfamiliar with &lt;span class="Apple-style-span" style="font-weight: bold;"&gt;nm&lt;/span&gt;, consult the man page for your platform).&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;ray:Developer ray$ nm CUDA/lib/libparamgl.a | grep beginWinCoords&lt;br /&gt;000016b0 s __GLOBAL__I__Z14beginWinCoordsv&lt;br /&gt;00000000 a __GLOBAL__I__Z14beginWinCoordsv.eh&lt;br /&gt;00000000 T __Z14beginWinCoordsv&lt;br /&gt;000019c4 S __Z14beginWinCoordsv.eh&lt;/pre&gt;&lt;pre&gt;&lt;br /&gt;ray:Developer ray$ nm CUDA2.0/lib/libparamgl.a | grep beginWinCoords&lt;/pre&gt;&lt;pre&gt;000017e0 s __GLOBAL__I__Z14beginWinCoordsv&lt;br /&gt;00000000 a __GLOBAL__I__Z14beginWinCoordsv.eh&lt;br /&gt;00000000 T __Z14beginWinCoordsv&lt;br /&gt;00001af8 S __Z14beginWinCoordsv.eh&lt;br /&gt;000017e0 s __GLOBAL__I__Z14beginWinCoordsv&lt;br /&gt;00000000 a __GLOBAL__I__Z14beginWinCoordsv.eh&lt;br /&gt;00000000 T __Z14beginWinCoordsv&lt;br /&gt;00001af8 S __Z14beginWinCoordsv.eh&lt;br /&gt;000017e0 s __GLOBAL__I__Z14beginWinCoordsv&lt;br /&gt;00000000 a __GLOBAL__I__Z14beginWinCoordsv.eh&lt;br /&gt;00000000 T __Z14beginWinCoordsv&lt;br /&gt;00001af8 S __Z14beginWinCoordsv.eh&lt;br /&gt;000016b0 s __GLOBAL__I__Z14beginWinCoordsv&lt;br /&gt;00000000 a __GLOBAL__I__Z14beginWinCoordsv.eh&lt;br 
/&gt;00000000 T __Z14beginWinCoordsv&lt;br /&gt;000019c4 S __Z14beginWinCoordsv.eh&lt;br /&gt;000016b0 s __GLOBAL__I__Z14beginWinCoordsv&lt;br /&gt;00000000 a __GLOBAL__I__Z14beginWinCoordsv.eh&lt;br /&gt;00000000 T __Z14beginWinCoordsv&lt;br /&gt;000019c4 S __Z14beginWinCoordsv.eh&lt;br /&gt;000016b0 s __GLOBAL__I__Z14beginWinCoordsv&lt;br /&gt;00000000 a __GLOBAL__I__Z14beginWinCoordsv.eh&lt;br /&gt;00000000 T __Z14beginWinCoordsv&lt;br /&gt;000019c4 S __Z14beginWinCoordsv.eh&lt;br /&gt;ray:Developer ray$&lt;br /&gt;&lt;/pre&gt;&lt;/div&gt;The latter output shows the two object files within &lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span" style="color: rgb(51, 102, 255);"&gt;libparamgl.a&lt;/span&gt;&lt;/span&gt; that caused the duplicate symbol definition issue. Okie dokie, have fun!</content><link rel='replies' type='application/atom+xml' href='http://raymondtay.blogspot.com/feeds/4693790776671476064/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5021319479788698734&amp;postID=4693790776671476064&amp;isPopup=true' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5021319479788698734/posts/default/4693790776671476064'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5021319479788698734/posts/default/4693790776671476064'/><link rel='alternate' type='text/html' href='http://raymondtay.blogspot.com/2009/03/cuda-21-nbody-linkage-issue-under-mac.html' title='CUDA 2.1 Nbody linkage issue under Mac OS X'/><author><name>Raymond Tay</name><uri>https://profiles.google.com/110897033209560123249</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh4.googleusercontent.com/-4ZP-rPmLNSw/AAAAAAAAAAI/AAAAAAAABao/8f4AuUXNrno/s512-c/photo.jpg'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5021319479788698734.post-5206839960317916464</id><published>2009-03-17T08:24:00.004+08:00</published><updated>2009-03-17T08:34:39.688+08:00</updated><category scheme='http://www.blogger.com/atom/ns#' 
term='SecondLife'/><title type='text'>Second Life featured in Channel News Asia</title><content type='html'>Guess what?! &lt;a href="http://www.secondlife.com/"&gt;Second Life&lt;/a&gt; is featured on &lt;a href="http://www.channelnewsasia.com/"&gt;Channel News Asia&lt;/a&gt; :). Read the entire article &lt;a href="http://www.channelnewsasia.com/stories/technologyfeatures/view/415583/1/.html"&gt;here&lt;/a&gt;.&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Here's an excerpt from that article, which quotes Linden Lab CEO Mark Kingdon&lt;/div&gt;&lt;div&gt;&lt;blockquote&gt;&lt;span class="Apple-style-span" style="color: rgb(51, 102, 255);"&gt;More than 1.3 million US dollars worth of transactions reportedly take place daily in Second Life, where the currency is the Linden dollar.   There are more than 15,000 merchants in Second Life selling snippets of computer code that become clothing, hair, art work or other items for avatars.   People spent 360 million dollars (US) in Second Life last year, according to Linden.   Schools continue to use Second Life for online classrooms and bands perform on in-world stages, albeit to sometimes meager audiences.   Linden is making avatar tools easier and "reworking the user experience," according to Kingdon.   "We have hired a world-class-team to lead the changes," Kingdon said. "You ain't seen nothing yet. A lot of work is going to be done in the next 9 to 12 months."&lt;/span&gt;&lt;/blockquote&gt;It's very, very exciting to be working for this revolutionary company and with so many talented people :D It makes me want to jump out of bed every day and say "What shall we achieve today?"&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://raymondtay.blogspot.com/feeds/5206839960317916464/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5021319479788698734&amp;postID=5206839960317916464&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5021319479788698734/posts/default/5206839960317916464'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5021319479788698734/posts/default/5206839960317916464'/><link rel='alternate' type='text/html' href='http://raymondtay.blogspot.com/2009/03/second-life-featured-in-channel-new.html' title='Second Life featured in Channel News Asia'/><author><name>Raymond Tay</name><uri>https://profiles.google.com/110897033209560123249</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh4.googleusercontent.com/-4ZP-rPmLNSw/AAAAAAAAAAI/AAAAAAAABao/8f4AuUXNrno/s512-c/photo.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5021319479788698734.post-994460824341490476</id><published>2009-02-26T10:24:00.013+08:00</published><updated>2009-02-26T16:52:33.555+08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Sun Studio 
12'/><category scheme='http://www.blogger.com/atom/ns#' term='profiling'/><category scheme='http://www.blogger.com/atom/ns#' term='C/C++'/><title type='text'>Maximizing application performance with Sun Studio 12</title><content type='html'>&lt;a href="http://blogs.sun.com/d"&gt;Darryl Gove&lt;/a&gt;'s blog has good notes on improving application performance on x86/SPARC platforms. I've pasted some material from his WebEx session below, but I do encourage you to listen to the session itself; click &lt;a href="https://sun-developersondemand.webex.com/ec0600l/eventcenter/recording/recordAction.do;jsessionid=9g4nJlRQMGtTdTHVn2LD66W15DpvTywbyctjtkLJksyQkJ17QQCB!-1110377682?theAction=poprecord&amp;amp;actname=%2Feventcenter%2Fframe%2Fg.do&amp;amp;renewticket=0&amp;amp;renewticket=0&amp;amp;apiname=lsr.php&amp;amp;actappname=ec0600l&amp;amp;entappname=url0106l&amp;amp;needFilter=false&amp;amp;&amp;amp;CID=2347&amp;amp;isurlact=true&amp;amp;rID=27631982&amp;amp;entactname=%2FnbrRecordingURL.do&amp;amp;rKey=C15CE510D30BE48C&amp;amp;recordID=27631982&amp;amp;siteurl=sun-developersondemand&amp;amp;rnd=2361120517&amp;amp;SP=EC&amp;amp;AT=pb&amp;amp;format=short"&gt;here&lt;/a&gt;. 
Otherwise, i've taken some screen shots from the &lt;a href="https://sun-developersondemand.webex.com/ec0600l/eventcenter/recording/recordAction.do;jsessionid=9g4nJlRQMGtTdTHVn2LD66W15DpvTywbyctjtkLJksyQkJ17QQCB!-1110377682?theAction=poprecord&amp;amp;actname=%2Feventcenter%2Fframe%2Fg.do&amp;amp;renewticket=0&amp;amp;renewticket=0&amp;amp;apiname=lsr.php&amp;amp;actappname=ec0600l&amp;amp;entappname=url0106l&amp;amp;needFilter=false&amp;amp;&amp;amp;CID=2347&amp;amp;isurlact=true&amp;amp;rID=27631982&amp;amp;entactname=%2FnbrRecordingURL.do&amp;amp;rKey=C15CE510D30BE48C&amp;amp;recordID=27631982&amp;amp;siteurl=sun-developersondemand&amp;amp;rnd=2361120517&amp;amp;SP=EC&amp;amp;AT=pb&amp;amp;format=short"&gt;webex session&lt;/a&gt; and paste them here for your convenience.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_n2HkB0XD3Kw/SaZXOFEabII/AAAAAAAABCc/xTQS7i1zhz4/s1600-h/target_hardware.png"&gt;&lt;img style="cursor:pointer; cursor:hand;width: 320px; height: 250px;" src="http://3.bp.blogspot.com/_n2HkB0XD3Kw/SaZXOFEabII/AAAAAAAABCc/xTQS7i1zhz4/s320/target_hardware.png" border="0" alt="" id="BLOGGER_PHOTO_ID_5307025110293179522" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_n2HkB0XD3Kw/SaZXN6l7ROI/AAAAAAAABCU/jWVgsktnblk/s1600-h/Target+hardware+32-bit+or+64-bit.png"&gt;&lt;img style="cursor:pointer; cursor:hand;width: 320px; height: 250px;" src="http://3.bp.blogspot.com/_n2HkB0XD3Kw/SaZXN6l7ROI/AAAAAAAABCU/jWVgsktnblk/s320/Target+hardware+32-bit+or+64-bit.png" border="0" alt="" id="BLOGGER_PHOTO_ID_5307025107480954082" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_n2HkB0XD3Kw/SaZXNhOfCKI/AAAAAAAABCM/YtqmvNMjZyM/s1600-h/Profile_feedback.png"&gt;&lt;img style="cursor:pointer; cursor:hand;width: 320px; height: 
250px;" src="http://2.bp.blogspot.com/_n2HkB0XD3Kw/SaZXNhOfCKI/AAAAAAAABCM/YtqmvNMjZyM/s320/Profile_feedback.png" border="0" alt="" id="BLOGGER_PHOTO_ID_5307025100671748258" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_n2HkB0XD3Kw/SaZXNsBc7WI/AAAAAAAABCE/YTLTA_9PlJA/s1600-h/optimization.png"&gt;&lt;img style="cursor:pointer; cursor:hand;width: 320px; height: 250px;" src="http://3.bp.blogspot.com/_n2HkB0XD3Kw/SaZXNsBc7WI/AAAAAAAABCE/YTLTA_9PlJA/s320/optimization.png" border="0" alt="" id="BLOGGER_PHOTO_ID_5307025103569874274" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_n2HkB0XD3Kw/SaZXNriw9HI/AAAAAAAABB8/qLVnAzrQ1iE/s1600-h/Leveraging_libraries.png"&gt;&lt;img style="cursor:pointer; cursor:hand;width: 320px; height: 250px;" src="http://3.bp.blogspot.com/_n2HkB0XD3Kw/SaZXNriw9HI/AAAAAAAABB8/qLVnAzrQ1iE/s320/Leveraging_libraries.png" border="0" alt="" id="BLOGGER_PHOTO_ID_5307025103441163378" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_n2HkB0XD3Kw/SaZW7Ehu_ZI/AAAAAAAABB0/mkbBy_OSBo8/s1600-h/Instruction_set_extensions.png"&gt;&lt;img style="cursor:pointer; cursor:hand;width: 320px; height: 250px;" src="http://3.bp.blogspot.com/_n2HkB0XD3Kw/SaZW7Ehu_ZI/AAAAAAAABB0/mkbBy_OSBo8/s320/Instruction_set_extensions.png" border="0" alt="" id="BLOGGER_PHOTO_ID_5307024783730212242" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_n2HkB0XD3Kw/SaZW65JHvtI/AAAAAAAABBs/pkoaGJBuvhs/s1600-h/Instruction+set+extensions+2.png"&gt;&lt;img style="cursor:pointer; cursor:hand;width: 320px; height: 250px;" src="http://3.bp.blogspot.com/_n2HkB0XD3Kw/SaZW65JHvtI/AAAAAAAABBs/pkoaGJBuvhs/s320/Instruction+set+extensions+2.png" border="0" alt="" 
id="BLOGGER_PHOTO_ID_5307024780674186962" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_n2HkB0XD3Kw/SaZW6sYFXUI/AAAAAAAABBk/Bv0uN1cV0Oc/s1600-h/Inlining_cross-file_optimization.png"&gt;&lt;img style="cursor:pointer; cursor:hand;width: 320px; height: 250px;" src="http://4.bp.blogspot.com/_n2HkB0XD3Kw/SaZW6sYFXUI/AAAAAAAABBk/Bv0uN1cV0Oc/s320/Inlining_cross-file_optimization.png" border="0" alt="" id="BLOGGER_PHOTO_ID_5307024777247284546" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_n2HkB0XD3Kw/SaZW6nnNSDI/AAAAAAAABBc/4L2XHTPh2IU/s1600-h/Increasing+optimization.png"&gt;&lt;img style="cursor:pointer; cursor:hand;width: 320px; height: 250px;" src="http://4.bp.blogspot.com/_n2HkB0XD3Kw/SaZW6nnNSDI/AAAAAAAABBc/4L2XHTPh2IU/s320/Increasing+optimization.png" border="0" alt="" id="BLOGGER_PHOTO_ID_5307024775968540722" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_n2HkB0XD3Kw/SaZW6ikWoaI/AAAAAAAABBU/CMprrD-jw7I/s1600-h/Good_practices.png"&gt;&lt;img style="cursor:pointer; cursor:hand;width: 320px; height: 250px;" src="http://1.bp.blogspot.com/_n2HkB0XD3Kw/SaZW6ikWoaI/AAAAAAAABBU/CMprrD-jw7I/s320/Good_practices.png" border="0" alt="" id="BLOGGER_PHOTO_ID_5307024774614393250" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_n2HkB0XD3Kw/SaZWlLm-IgI/AAAAAAAABBM/9hpKwDj7Hn0/s1600-h/Gathering_profiles.png"&gt;&lt;img style="cursor:pointer; cursor:hand;width: 320px; height: 250px;" src="http://3.bp.blogspot.com/_n2HkB0XD3Kw/SaZWlLm-IgI/AAAAAAAABBM/9hpKwDj7Hn0/s320/Gathering_profiles.png" border="0" alt="" id="BLOGGER_PHOTO_ID_5307024407674102274" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) 
{}" href="http://4.bp.blogspot.com/_n2HkB0XD3Kw/SaZWk3fYSvI/AAAAAAAABBE/N8wSUGP9Fc4/s1600-h/Exploring_-fast.png"&gt;&lt;img style="cursor:pointer; cursor:hand;width: 320px; height: 250px;" src="http://4.bp.blogspot.com/_n2HkB0XD3Kw/SaZWk3fYSvI/AAAAAAAABBE/N8wSUGP9Fc4/s320/Exploring_-fast.png" border="0" alt="" id="BLOGGER_PHOTO_ID_5307024402273553138" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_n2HkB0XD3Kw/SaZWksRaVvI/AAAAAAAABA8/Y0ADh5iHK50/s1600-h/Debug+information.png"&gt;&lt;img style="cursor:pointer; cursor:hand;width: 320px; height: 250px;" src="http://1.bp.blogspot.com/_n2HkB0XD3Kw/SaZWksRaVvI/AAAAAAAABA8/Y0ADh5iHK50/s320/Debug+information.png" border="0" alt="" id="BLOGGER_PHOTO_ID_5307024399262177010" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_n2HkB0XD3Kw/SaZWkocZOiI/AAAAAAAABA0/GEQ-l665yrY/s1600-h/Checklist.png"&gt;&lt;img style="cursor:pointer; cursor:hand;width: 320px; height: 250px;" src="http://4.bp.blogspot.com/_n2HkB0XD3Kw/SaZWkocZOiI/AAAAAAAABA0/GEQ-l665yrY/s320/Checklist.png" border="0" alt="" id="BLOGGER_PHOTO_ID_5307024398234499618" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_n2HkB0XD3Kw/SaZWkUvjF0I/AAAAAAAABAs/i-w66fZNk90/s1600-h/Algorithmic+complexity.png"&gt;&lt;img style="cursor:pointer; cursor:hand;width: 320px; height: 250px;" src="http://1.bp.blogspot.com/_n2HkB0XD3Kw/SaZWkUvjF0I/AAAAAAAABAs/i-w66fZNk90/s320/Algorithmic+complexity.png" border="0" alt="" id="BLOGGER_PHOTO_ID_5307024392946128706" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;script type="text/javascript"&gt;&lt;br /&gt;var gaJsHost = (("https:" == document.location.protocol) ? "https://ssl." 
: "http://www.");&lt;br /&gt;document.write(unescape("%3Cscript src='" + gaJsHost + "google-analytics.com/ga.js' type='text/javascript'%3E%3C/script%3E"));&lt;br /&gt;&lt;/script&gt;I do apologize that i've not sorted them out, i'll get to them soon.&lt;script type="text/javascript"&gt;&lt;br /&gt;try {&lt;br /&gt;var pageTracker = _gat._getTracker("UA-5323400-1");&lt;br /&gt;pageTracker._trackPageview();&lt;br /&gt;} catch(err) {}&lt;/script&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5021319479788698734-994460824341490476?l=raymondtay.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://raymondtay.blogspot.com/feeds/994460824341490476/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5021319479788698734&amp;postID=994460824341490476&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5021319479788698734/posts/default/994460824341490476'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5021319479788698734/posts/default/994460824341490476'/><link rel='alternate' type='text/html' href='http://raymondtay.blogspot.com/2009/02/maximizing-application-performance-with.html' title='Maximizing application performance with Sun Studio 12'/><author><name>Raymond Tay</name><uri>https://profiles.google.com/110897033209560123249</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh4.googleusercontent.com/-4ZP-rPmLNSw/AAAAAAAAAAI/AAAAAAAABao/8f4AuUXNrno/s512-c/photo.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/_n2HkB0XD3Kw/SaZXOFEabII/AAAAAAAABCc/xTQS7i1zhz4/s72-c/target_hardware.png' height='72' 
width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5021319479788698734.post-8055372062144969137</id><published>2009-02-22T10:40:00.019+08:00</published><updated>2009-02-22T11:20:46.860+08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='CUDA'/><category scheme='http://www.blogger.com/atom/ns#' term='CUBLAS'/><category scheme='http://www.blogger.com/atom/ns#' term='CUFFT'/><title type='text'>Notes on CUBLAS &amp; CUFFT</title><content type='html'>Some things you need to be aware of while using &lt;span class="Apple-style-span" style="font-weight: bold;"&gt;CUBLAS&lt;/span&gt;&lt;div&gt;&lt;ul&gt;&lt;li&gt;CUBLAS uses column-major storage and 1-based indexing to maintain compatibility with Fortran (I'm no Fortran expert, but I know at least this much about it); contrast this with C and C++, which use row-major storage and 0-based indexing.&lt;/li&gt;&lt;/ul&gt;&lt;span class="Apple-tab-span" style="white-space:pre"&gt;  &lt;/span&gt;To overcome this while using CUBLAS in your C and C++ code, you can use macros or&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-tab-span" style="white-space:pre"&gt;  &lt;/span&gt;inline functions to implement matrices on top of 1-dimensional arrays. 
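&lt;/div&gt;&lt;div&gt;A minimal sketch of that idea in Python, assuming 0-based indices and a leading dimension (ld) equal to the number of rows. The names col_major_index and make_col_major are hypothetical helpers used only for illustration; they are not part of CUBLAS:&lt;/div&gt;&lt;pre&gt;

```python
def col_major_index(i, j, ld):
    """Offset of element (i, j), 0-based, in a column-major 1-D array.

    ld is the leading dimension, i.e. the number of rows stored per column.
    """
    return j * ld + i


def make_col_major(rows, cols, fill):
    """Build a flat list holding a rows x cols matrix in column-major order."""
    flat = [0] * (rows * cols)
    for j in range(cols):
        for i in range(rows):
            flat[col_major_index(i, j, rows)] = fill(i, j)
    return flat


# A 2 x 3 matrix whose element (i, j) is 10*i + j.
flat = make_col_major(2, 3, lambda i, j: 10 * i + j)
print(flat)  # prints [0, 10, 1, 11, 2, 12]
```

&lt;/pre&gt;&lt;div&gt;Walking the flat list in order visits the matrix one column at a time, which is the column-major layout described above. 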
As I code in C&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-tab-span" style="white-space:pre"&gt;  &lt;/span&gt;and C++ primarily and don't understand Fortran, the macro I use, as recommended&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-tab-span" style="white-space:pre"&gt;  &lt;/span&gt;in Nvidia's CUDA documentation, is&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-tab-span" style="white-space:pre"&gt;   &lt;/span&gt;&lt;span class="Apple-style-span" style="color: rgb(51, 102, 255);"&gt;#define IDX2C(i, j, ld) (((j)*(ld))+(i))&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-tab-span" style="white-space:pre"&gt;  &lt;/span&gt;Here's how I use the above macro to compute the index of a 1-D array&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-tab-span" style="white-space:pre"&gt;  &lt;/span&gt;element for what looks like a 2-D matrix:&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-tab-span" style="white-space:pre"&gt;   &lt;/span&gt;&lt;span class="Apple-style-span" style="color: rgb(51, 102, 255);"&gt;for( int j = 0; j &amp;lt; N; ++j) {&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-tab-span" style="white-space:pre"&gt;&lt;span class="Apple-style-span" style="color: rgb(51, 102, 255);"&gt;    &lt;/span&gt;&lt;/span&gt;&lt;span class="Apple-style-span" style="color: rgb(51, 102, 255);"&gt;for( int i = 0; i &amp;lt; M; ++i) {&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-tab-span" style="white-space:pre"&gt;&lt;span class="Apple-style-span" style="color: rgb(51, 102, 255);"&gt;     &lt;/span&gt;&lt;/span&gt;&lt;span class="Apple-style-span" style="color: rgb(51, 102, 255);"&gt;a[ IDX2C(i, j, M) ] = value;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-tab-span" style="white-space:pre"&gt;&lt;span class="Apple-style-span" style="color: rgb(51, 102, 255);"&gt;    &lt;/span&gt;&lt;/span&gt;&lt;span 
class="Apple-style-span" style="color: rgb(51, 102, 255);"&gt;}&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-tab-span" style="white-space:pre"&gt;&lt;span class="Apple-style-span" style="color: rgb(51, 102, 255);"&gt;   &lt;/span&gt;&lt;/span&gt;&lt;span class="Apple-style-span" style="color: rgb(51, 102, 255);"&gt;}&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="color: rgb(51, 102, 255);"&gt;&lt;span class="Apple-tab-span" style="white-space:pre"&gt;   &lt;/span&gt;...&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="color: rgb(51, 102, 255);"&gt;&lt;span class="Apple-tab-span" style="white-space:pre"&gt;   &lt;/span&gt;// call a CUBLAS function (e.g. cublasSscal), using IDX2C to compute the array index&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="color: rgb(51, 102, 255);"&gt;&lt;span class="Apple-tab-span" style="white-space:pre"&gt;   &lt;/span&gt;cublasSscal(...,..., &amp;amp;a[IDX2C(p,q,ldm)], ldm);&lt;br /&gt;&lt;/span&gt;&lt;ul&gt;&lt;li&gt;Remember to include the header file &lt;span class="Apple-style-span" style="color: rgb(51, 102, 255);"&gt;cublas.h&lt;/span&gt; in your programs. 
Normally, I write &lt;span class="Apple-style-span" style="color: rgb(51, 102, 255);"&gt;#include "cublas.h"&lt;/span&gt; rather than &lt;span class="Apple-style-span" style="color: rgb(51, 102, 255);"&gt;#include &amp;lt;cublas.h&amp;gt;&lt;/span&gt;; the quoted form typically makes the compiler search the including file's directory before the system include paths.&lt;/li&gt;&lt;li&gt;Remember to link your apps with the dynamic library provided: &lt;span class="Apple-style-span" style="font-style: italic;"&gt;libcublas.so&lt;/span&gt; (Linux), &lt;span class="Apple-style-span" style="font-style: italic;"&gt;cublas.dll&lt;/span&gt; (Windows) or &lt;span class="Apple-style-span" style="font-style: italic;"&gt;libcublas.dylib&lt;/span&gt; (Mac OS X). Emulation builds of the dynamic libraries are provided too, with names like &lt;span class="Apple-style-span" style="font-style: italic;"&gt;libcublas&lt;span class="Apple-style-span" style="font-weight: bold;"&gt;emu&lt;/span&gt;.so&lt;/span&gt; etc.&lt;/li&gt;&lt;/ul&gt;&lt;div&gt;Notes on using &lt;span class="Apple-style-span" style="font-weight: bold;"&gt;CUFFT&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;ul&gt;&lt;li&gt;NVIDIA modeled the CUFFT API after &lt;a href="http://www.fftw.org/"&gt;FFTW&lt;/a&gt;. It uses what's known as a &lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span" style="font-weight: bold;"&gt;plan&lt;/span&gt;&lt;/span&gt;, which specifies the optimal (that is, minimum-flop) way to execute an FFT of a particular size and data type.&lt;/li&gt;&lt;li&gt;Depending on your graphics card's shared memory configuration, it's best to fit your computation entirely in CUDA's shared memory to minimize use of global memory. 
&lt;/li&gt;&lt;li&gt;Refer to the CUFFT documentation for more details, as the library is continually evolving while the hardware and software mature.&lt;/li&gt;&lt;/ul&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5021319479788698734-8055372062144969137?l=raymondtay.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://raymondtay.blogspot.com/feeds/8055372062144969137/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5021319479788698734&amp;postID=8055372062144969137&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5021319479788698734/posts/default/8055372062144969137'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5021319479788698734/posts/default/8055372062144969137'/><link rel='alternate' type='text/html' href='http://raymondtay.blogspot.com/2009/02/notes-on-cublas-cufft.html' title='Notes on CUBLAS &amp; CUFFT'/><author><name>Raymond Tay</name><uri>https://profiles.google.com/110897033209560123249</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' 
src='//lh4.googleusercontent.com/-4ZP-rPmLNSw/AAAAAAAAAAI/AAAAAAAABao/8f4AuUXNrno/s512-c/photo.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5021319479788698734.post-8999257539627935107</id><published>2009-02-22T10:07:00.005+08:00</published><updated>2009-03-19T07:34:28.059+08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='CUDA'/><category scheme='http://www.blogger.com/atom/ns#' term='CUDA 2.0'/><category scheme='http://www.blogger.com/atom/ns#' term='CUDA 1.1'/><title type='text'>CUDA resources</title><content type='html'>Here are links that illustrate how CUDA works and how it can help you make the most of the parallelism inherent in your Nvidia graphics cards.&lt;br /&gt;&lt;br /&gt;&lt;div&gt;&lt;a href="http://www.ddj.com/hpc-high-performance-computing/207200659"&gt;Part1 - Introduction to CUDA&lt;/a&gt;&lt;/div&gt;&lt;div&gt;&lt;a href="http://www.ddj.com/hpc-high-performance-computing/207402986"&gt;Part2 - Explanation of CUDA programming constructs&lt;/a&gt;&lt;/div&gt;&lt;div&gt;&lt;a href="http://www.ddj.com/hpc-high-performance-computing/207603131"&gt;Part3 - Error handling and CUDA global memory limitations&lt;/a&gt;&lt;/div&gt;&lt;div&gt;&lt;a href="http://www.ddj.com/architect/208401741"&gt;Part4 - Understanding CUDA shared memory&lt;/a&gt;&lt;/div&gt;&lt;div&gt;&lt;a href="http://www.ddj.com/hpc-high-performance-computing/208801731"&gt;Part5 - How to use CUDA shared memory effectively&lt;/a&gt;&lt;/div&gt;&lt;div&gt;&lt;a href="http://www.ddj.com/architect/209601096"&gt;Part6 - Profiling CUDA code&lt;/a&gt;&lt;/div&gt;&lt;div&gt;&lt;a href="http://www.ddj.com/hpc-high-performance-computing/210102115"&gt;Part7 - Current and future implementations&lt;/a&gt;&lt;/div&gt;&lt;div&gt;&lt;a href="http://www.ddj.com/hpc-high-performance-computing/210602684"&gt;Part8 - Using CUBLAS and CUFFT (CUDA versions of Fast Fourier Transformation &amp;amp; Basic Linear Algebra 
Subprograms)&lt;/a&gt;&lt;/div&gt;&lt;div&gt;&lt;a href="http://www.ddj.com/architect/211800683"&gt;Part9 - Extending CUDA to support other programming languages&lt;/a&gt;&lt;/div&gt;&lt;div&gt;&lt;a href="http://www.ddj.com/cpp/212903437"&gt;Part10 - CUDPP&lt;/a&gt;&lt;/div&gt;&lt;div&gt;&lt;a href="http://www.ddj.com/hpc-high-performance-computing/215900921"&gt;Part11 - Revisiting CUDA memory spaces&lt;/a&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Personally, I'd like to thank &lt;a href="http://emslbios.pnl.gov/id/farber_r"&gt;Rob Farber&lt;/a&gt; for this series of &lt;span class="Apple-style-span" style="font-weight: bold;"&gt;REALLY&lt;/span&gt; awesome articles published in &lt;a href="http://www.ddj.com/"&gt;Dr. Dobb's Journal&lt;/a&gt;, which is another great platform for developers world-wide. Yay! &lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;While using CUDA to implement your test or commercial applications, it's likely you'll create your own arrays, matrices or vectors. That is fine, but I would recommend re-implementing the parts of your program that use these constructs with the CUBLAS library. Remember to try out the CUFFT library as well.&lt;/div&gt;
&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5021319479788698734-8999257539627935107?l=raymondtay.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://raymondtay.blogspot.com/feeds/8999257539627935107/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5021319479788698734&amp;postID=8999257539627935107&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5021319479788698734/posts/default/8999257539627935107'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5021319479788698734/posts/default/8999257539627935107'/><link rel='alternate' type='text/html' href='http://raymondtay.blogspot.com/2009/02/cuda-resources.html' title='CUDA resources'/><author><name>Raymond Tay</name><uri>https://profiles.google.com/110897033209560123249</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh4.googleusercontent.com/-4ZP-rPmLNSw/AAAAAAAAAAI/AAAAAAAABao/8f4AuUXNrno/s512-c/photo.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5021319479788698734.post-449537482826291755</id><published>2009-02-16T09:47:00.010+08:00</published><updated>2009-02-18T15:03:47.775+08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Linux'/><category 
scheme='http://www.blogger.com/atom/ns#' term='SecondLife'/><title type='text'>Second Life on Linux is out</title><content type='html'>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_n2HkB0XD3Kw/SZjMEQkKw8I/AAAAAAAAA_U/oLc3FR6NTQo/s1600-h/sl_lj_frontpage.jpg"&gt;&lt;img style="cursor:pointer; cursor:hand;width: 240px; height: 320px;" src="http://2.bp.blogspot.com/_n2HkB0XD3Kw/SZjMEQkKw8I/AAAAAAAAA_U/oLc3FR6NTQo/s320/sl_lj_frontpage.jpg" border="0" alt="" id="BLOGGER_PHOTO_ID_5303212934766969794" /&gt;&lt;/a&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_n2HkB0XD3Kw/SZjMDl5o5DI/AAAAAAAAA-0/R1e-BozNZDw/s1600-h/sl_lj_1.jpeg"&gt;&lt;img style="cursor:pointer; cursor:hand;width: 238px; height: 320px;" src="http://1.bp.blogspot.com/_n2HkB0XD3Kw/SZjMDl5o5DI/AAAAAAAAA-0/R1e-BozNZDw/s320/sl_lj_1.jpeg" border="0" alt="" id="BLOGGER_PHOTO_ID_5303212923314299954" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_n2HkB0XD3Kw/SZjMD2az6MI/AAAAAAAAA-8/MNJyOsqudNM/s1600-h/sl_lj_2.jpeg"&gt;&lt;img style="cursor:pointer; cursor:hand;width: 226px; height: 320px;" src="http://1.bp.blogspot.com/_n2HkB0XD3Kw/SZjMD2az6MI/AAAAAAAAA-8/MNJyOsqudNM/s320/sl_lj_2.jpeg" border="0" alt="" id="BLOGGER_PHOTO_ID_5303212927748401346" /&gt;&lt;/a&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_n2HkB0XD3Kw/SZjMD8ADPQI/AAAAAAAAA_E/z4xgkxDwb5w/s1600-h/sl_lj_3.jpeg"&gt;&lt;img style="cursor:pointer; cursor:hand;width: 238px; height: 320px;" src="http://1.bp.blogspot.com/_n2HkB0XD3Kw/SZjMD8ADPQI/AAAAAAAAA_E/z4xgkxDwb5w/s320/sl_lj_3.jpeg" border="0" alt="" id="BLOGGER_PHOTO_ID_5303212929246772482" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" 
href="http://4.bp.blogspot.com/_n2HkB0XD3Kw/SZjMEYG4ycI/AAAAAAAAA_M/HiwO384XD0c/s1600-h/sl_lj_4.jpeg"&gt;&lt;img style="cursor:pointer; cursor:hand;width: 234px; height: 320px;" src="http://4.bp.blogspot.com/_n2HkB0XD3Kw/SZjMEYG4ycI/AAAAAAAAA_M/HiwO384XD0c/s320/sl_lj_4.jpeg" border="0" alt="" id="BLOGGER_PHOTO_ID_5303212936791640514" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Second Life has been able to run on Linux for some months now. There were glitches with voice chat, but it looks good now. Have fun and try it out.&lt;/div&gt;&lt;div&gt;If anyone knows whether I've infringed any copyrights, do let me know and I'll take this down.&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5021319479788698734-449537482826291755?l=raymondtay.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://raymondtay.blogspot.com/feeds/449537482826291755/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5021319479788698734&amp;postID=449537482826291755&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5021319479788698734/posts/default/449537482826291755'/><link rel='self' type='application/atom+xml' 
href='http://www.blogger.com/feeds/5021319479788698734/posts/default/449537482826291755'/><link rel='alternate' type='text/html' href='http://raymondtay.blogspot.com/2009/02/second-life-on-linux-is-out.html' title='Second Life on Linux is out'/><author><name>Raymond Tay</name><uri>https://profiles.google.com/110897033209560123249</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh4.googleusercontent.com/-4ZP-rPmLNSw/AAAAAAAAAAI/AAAAAAAABao/8f4AuUXNrno/s512-c/photo.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/_n2HkB0XD3Kw/SZjMEQkKw8I/AAAAAAAAA_U/oLc3FR6NTQo/s72-c/sl_lj_frontpage.jpg' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5021319479788698734.post-2162862968845538368</id><published>2009-02-01T10:29:00.023+08:00</published><updated>2009-02-01T17:01:00.223+08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='CUDA 2.0'/><category scheme='http://www.blogger.com/atom/ns#' term='CUDA 1.1'/><category scheme='http://www.blogger.com/atom/ns#' term='Nvidia'/><title type='text'>Getting started with CUDA 2.0</title><content type='html'>From this article onwards, I'm going to try out a few things using &lt;a href="http://www.nvidia.com/"&gt;&lt;span class="Apple-style-span" style="color: rgb(51, 102, 255);"&gt;Nvidia&lt;/span&gt;&lt;/a&gt;&lt;span class="Apple-style-span" style="color: rgb(51, 102, 255);"&gt;'s &lt;/span&gt;&lt;a href="http://www.nvidia.com/object/cuda_home.html"&gt;&lt;span class="Apple-style-span" style="color: rgb(51, 102, 255);"&gt;CUDA&lt;/span&gt;&lt;/a&gt; technology - really cool stuff. 
But there are some glitches in the installation: if you install &lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span" style="color: rgb(51, 102, 255);"&gt;CUDA 2.0&lt;/span&gt;&lt;/span&gt; straight away and run the example code, you'll notice that it runs in emulation mode - i.e. it's not running on the GPU (&lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span" style="color: rgb(102, 51, 255);"&gt;Graphics Processing Unit&lt;/span&gt;&lt;/span&gt;) which your machine has. &lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;In my case, I have two Nvidia graphics cards in my &lt;a href="http://www.apple.com/"&gt;MBP&lt;/a&gt; (Mac OS X 10.5.6). When you first install CUDA 2.0, you will not be able to run the examples or any of your own code on the GPU itself, because the CUDA driver was never installed! Nvidia's website doesn't tell you straight that you need to install the CUDA 1.1 Toolkit prior to installing the CUDA 2.0 Toolkit, so go ahead and do it. &lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Make sure the CUDA driver is installed; the way to check is to go to &lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span" style="color: rgb(51, 102, 255);"&gt;/System/Library/Extensions&lt;/span&gt;&lt;/span&gt;/ and look for the file "&lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span" style="color: rgb(51, 102, 255);"&gt;CUDA.kext&lt;/span&gt;&lt;/span&gt;". It should be there by now, updated to the CUDA 2.0 driver. 
Yay!&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Once you are done with that, build the examples provided like this:&lt;/div&gt;&lt;div&gt;&lt;ol&gt;&lt;li&gt;Go to &lt;span class="Apple-style-span" style="color: rgb(51, 102, 255);"&gt;/Developer/CUDA&lt;/span&gt;&lt;/li&gt;&lt;li&gt;Build the examples using "make", "make dbg=1", "make emu=1" or "make emu=1 dbg=1", which will create the directories "release", "debug", "emu" etc.&lt;/li&gt;&lt;li&gt;Run the example "&lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span" style="color: rgb(51, 102, 255);"&gt;deviceQuery&lt;/span&gt;&lt;/span&gt;" from the directory "&lt;span class="Apple-style-span" style="font-style: italic;"&gt;release&lt;/span&gt;", "&lt;span class="Apple-style-span" style="font-style: italic;"&gt;debug&lt;/span&gt;" etc. Which one you pick doesn't matter, because they should all give the same result once the CUDA driver is installed properly&lt;/li&gt;&lt;/ol&gt;Here's the sample output after all these steps are done:&lt;/div&gt;&lt;pre&gt;ray:release ray$ ./deviceQuery&lt;br /&gt;There are 2 devices supporting CUDA&lt;br /&gt;&lt;br /&gt;Device 0: "GeForce 9600M GT"&lt;br /&gt;Major revision number:                         1&lt;br /&gt;Minor revision number:                         337500&lt;br /&gt;Total amount of global memory:                 536543232 bytes&lt;br /&gt;Number of multiprocessors:                     1&lt;br /&gt;Number of cores:                               8&lt;br /&gt;Total amount of constant memory:               1 bytes&lt;br /&gt;Total amount of shared memory per block:       16384 bytes&lt;br /&gt;Total number of registers available per block: 8192&lt;br /&gt;Warp size:                                     32&lt;br /&gt;Maximum number of threads per block:           512&lt;br /&gt;Maximum sizes of each dimension of a block:    512 x 512 x 64&lt;br /&gt;Maximum sizes of each dimension of a grid:     
65535 x 65535 x 1&lt;br /&gt;Maximum memory pitch:                          262144 bytes&lt;br /&gt;Texture alignment:                             256 bytes&lt;br /&gt;Clock rate:                                    0.07 GHz&lt;br /&gt;Concurrent copy and execution:                 No&lt;br /&gt;&lt;br /&gt;Device 1: "GeForce 9400M"&lt;br /&gt;Major revision number:                         3&lt;br /&gt;Minor revision number:                         250000&lt;br /&gt;Total amount of global memory:                 266010624 bytes&lt;br /&gt;Number of multiprocessors:                     1&lt;br /&gt;Number of cores:                               8&lt;br /&gt;Total amount of constant memory:               1 bytes&lt;br /&gt;Total amount of shared memory per block:       16384 bytes&lt;br /&gt;Total number of registers available per block: 8192&lt;br /&gt;Warp size:                                     32&lt;br /&gt;Maximum number of threads per block:           512&lt;br /&gt;Maximum sizes of each dimension of a block:    512 x 512 x 64&lt;br /&gt;Maximum sizes of each dimension of a grid:     65535 x 65535 x 1&lt;br /&gt;Maximum memory pitch:                          262144 bytes&lt;br /&gt;Texture alignment:                             256 bytes&lt;br /&gt;Clock rate:                                    0.07 GHz&lt;br /&gt;Concurrent copy and execution:                 No&lt;br /&gt;&lt;br /&gt;Test PASSED&lt;br /&gt;&lt;br /&gt;Press ENTER to exit...&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;If the driver is not present, all you'll get is the message "&lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span" style="color: rgb(51, 102, 255);"&gt;There is no device supporting CUDA&lt;/span&gt;&lt;/span&gt;"&lt;br /&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Also make sure that the shell variables &lt;span class="Apple-style-span" style="color: rgb(51, 102, 255);"&gt;PATH&lt;/span&gt; and &lt;span 
class="Apple-style-span" style="color: rgb(51, 102, 255);"&gt;DYLD_LIBRARY_PATH&lt;/span&gt; are set properly, to something like:&lt;/div&gt;&lt;div&gt;&lt;ul&gt;&lt;li&gt;&lt;span class="Apple-style-span" style="color: rgb(51, 204, 0);"&gt;export&lt;/span&gt; &lt;span class="Apple-style-span" style="color: rgb(51, 102, 255);"&gt;PATH&lt;/span&gt;=&lt;span class="Apple-style-span" style="color: rgb(51, 102, 255);"&gt;/usr/local/cuda/bin:$PATH&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;span class="Apple-style-span" style="color: rgb(51, 204, 0);"&gt;export&lt;/span&gt; &lt;span class="Apple-style-span" style="color: rgb(51, 102, 255);"&gt;DYLD_LIBRARY_PATH&lt;/span&gt;=&lt;span class="Apple-style-span" style="color: rgb(51, 102, 255);"&gt;/usr/local/cuda/lib:$DYLD_LIBRARY_PATH&lt;/span&gt;&lt;/li&gt;&lt;/ul&gt;Note: The above is for the &lt;a href="http://www.gnu.org/software/bash"&gt;Bash&lt;/a&gt; shell.&lt;/div&gt;&lt;div&gt;If you don't do the above, you will NOT be able to run any of the examples; they fail with an error message like "&lt;span class="Apple-style-span" style="color: rgb(255, 0, 0);"&gt;dyld: Library not loaded: @rpath/libcudart.dylib&lt;/span&gt;", which refers to CUDA's runtime library.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;After all that was done, I ran the "&lt;span class="Apple-style-span" style="color: rgb(51, 102, 255);"&gt;oceanFFT&lt;/span&gt;" example. If everything runs A-OK on your machine, you should see something like what I have below:&lt;/div&gt;&lt;div&gt;&lt;img src="http://1.bp.blogspot.com/_n2HkB0XD3Kw/SYUg_23iyuI/AAAAAAAAA-E/WqZ1_eiKwgU/s320/oceanFFT.png" border="0" alt="" id="BLOGGER_PHOTO_ID_5297676818103585506" style="float: left; margin-top: 0px; margin-right: 10px; margin-bottom: 10px; margin-left: 0px; cursor: pointer; width: 320px; height: 200px; " /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br 
/&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;If you check the output on the terminal screen, you will notice that CUDA is using my slightly more powerful graphics card, i.e. the &lt;span class="Apple-style-span" style="color: rgb(51, 204, 0);"&gt;GeForce 9600M GT&lt;/span&gt; instead of the &lt;span class="Apple-style-span" style="color: rgb(51, 204, 0);"&gt;GeForce 9400M&lt;/span&gt;. The reason is that I've configured my MBP to do that: go to &lt;span class="Apple-style-span" style="color: rgb(51, 102, 255);"&gt;System Preferences -&gt; Energy Saver&lt;/span&gt;, select "&lt;span class="Apple-style-span" style="color: rgb(51, 102, 255);"&gt;Higher Performance-&gt;Power Adapter&lt;/span&gt;" and set "&lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span" style="color: rgb(51, 102, 255);"&gt;Computer Sleep&lt;/span&gt;&lt;/span&gt;" and "&lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span" style="color: rgb(51, 102, 255);"&gt;Display Sleep&lt;/span&gt;&lt;/span&gt;" to "&lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span" style="color: rgb(51, 102, 255);"&gt;Never&lt;/span&gt;&lt;/span&gt;", so that the Mac will automatically pick your more powerful graphics card :) A neat trick I learnt from my colleagues at work! Go Mac! &lt;/div&gt;
&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5021319479788698734-2162862968845538368?l=raymondtay.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://raymondtay.blogspot.com/feeds/2162862968845538368/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5021319479788698734&amp;postID=2162862968845538368&amp;isPopup=true' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5021319479788698734/posts/default/2162862968845538368'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5021319479788698734/posts/default/2162862968845538368'/><link rel='alternate' type='text/html' href='http://raymondtay.blogspot.com/2009/02/getting-started-with-cuda-20.html' title='Getting started with CUDA 2.0'/><author><name>Raymond Tay</name><uri>https://profiles.google.com/110897033209560123249</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh4.googleusercontent.com/-4ZP-rPmLNSw/AAAAAAAAAAI/AAAAAAAABao/8f4AuUXNrno/s512-c/photo.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://1.bp.blogspot.com/_n2HkB0XD3Kw/SYUg_23iyuI/AAAAAAAAA-E/WqZ1_eiKwgU/s72-c/oceanFFT.png' height='72' 
width='72'/><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5021319479788698734.post-288807835801741201</id><published>2009-01-22T21:28:00.008+08:00</published><updated>2009-01-22T21:49:23.720+08:00</updated><title type='text'>Connect to Internet using Nokia E71 and Macbook Pro</title><content type='html'>This has already been written up by someone else; click &lt;a href="http://www.2nrds.com/using-the-nokia-e71-as-a-modem-on-a-mac"&gt;here&lt;/a&gt; to read the original article. I've adapted it for people who live in Singapore, hold a &lt;a href="http://www.ideas.singnet.com/"&gt;Singnet&lt;/a&gt; account and wish to use their Nokia E71 to connect to the internet via the mobile phone.&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Step-by-step:&lt;/div&gt;&lt;div&gt;============&lt;/div&gt;&lt;div&gt;1) Read &lt;a href="http://mroth.info/blog/2008/01/26/nokia-hsdpa-macosx-leopard/"&gt;Matthew Rothenberg's blog&lt;/a&gt; and get the script for the Nokia E71. Follow the instructions.&lt;/div&gt;&lt;div&gt;2) Read the blog &lt;a href="http://www.2nrds.com/using-the-nokia-e71-as-a-modem-on-a-mac"&gt;here&lt;/a&gt; to get a general understanding. 
For techies this should be easy, and for non-techies it's pretty easy going too :D&lt;/div&gt;&lt;div&gt;3) I'm using a USB cable to connect the mobile phone to a MacBook Pro running OS X 10.5.6&lt;/div&gt;&lt;div&gt;4) For Singnet E-Ideas, you will need the following configuration:&lt;/div&gt;&lt;div&gt;&lt;ul&gt;&lt;li&gt;Telephone Number: e-ideas&lt;/li&gt;&lt;li&gt;User name: 65IDEAS&lt;/li&gt;&lt;li&gt;Password: &lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span" style="color: rgb(51, 102, 255);"&gt;&amp;lt;your ideas password&amp;gt;&lt;/span&gt;&lt;/span&gt;&lt;/li&gt;&lt;/ul&gt;&lt;img style="float:left; margin:0 10px 10px 0;cursor:pointer; cursor:hand;width: 320px; height: 258px;" src="http://2.bp.blogspot.com/_n2HkB0XD3Kw/SXh3GO-PnJI/AAAAAAAAA9M/7w6nOXpi0MA/s320/Nokia+E71+to+Internet.png" border="0" alt="" id="BLOGGER_PHOTO_ID_5294112310956104850" /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;5) Click on the Advanced tab and make sure that you select HSDPA and CID=1 (note that your CID could be different)&lt;/div&gt;&lt;div&gt;&lt;img src="http://1.bp.blogspot.com/_n2HkB0XD3Kw/SXh4Kf_N0aI/AAAAAAAAA9U/rvSbmEiMEk8/s320/Advanced+tab+info.png" border="0" alt="" id="BLOGGER_PHOTO_ID_5294113483754688930" style="float: left; margin-top: 0px; margin-right: 10px; margin-bottom: 10px; margin-left: 0px; cursor: pointer; width: 320px; height: 230px; " /&gt;&lt;/div&gt;&lt;div&gt;&lt;br 
/&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;6) Click on "Connect" and if all goes well, then you should be connected.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Hope this helps&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5021319479788698734-288807835801741201?l=raymondtay.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://raymondtay.blogspot.com/feeds/288807835801741201/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5021319479788698734&amp;postID=288807835801741201&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5021319479788698734/posts/default/288807835801741201'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5021319479788698734/posts/default/288807835801741201'/><link rel='alternate' type='text/html' 
href='http://raymondtay.blogspot.com/2009/01/connect-to-internet-using-nokia-e71-and.html' title='Connect to Internet using Nokia E71 and Macbook Pro'/><author><name>Raymond Tay</name><uri>https://profiles.google.com/110897033209560123249</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh4.googleusercontent.com/-4ZP-rPmLNSw/AAAAAAAAAAI/AAAAAAAABao/8f4AuUXNrno/s512-c/photo.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/_n2HkB0XD3Kw/SXh3GO-PnJI/AAAAAAAAA9M/7w6nOXpi0MA/s72-c/Nokia+E71+to+Internet.png' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5021319479788698734.post-3550559001439762363</id><published>2009-01-18T10:45:00.008+08:00</published><updated>2009-01-18T11:15:03.354+08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='mac os'/><category scheme='http://www.blogger.com/atom/ns#' term='instrument'/><category scheme='http://www.blogger.com/atom/ns#' term='profiling'/><title type='text'>Mac OS X "Instruments"</title><content type='html'>I've been using Mac OS for a little over a month now, and I would say that it really offers developers a plethora of tools. For hardcore Mac fans I don't think that's a surprise, but as a newbie user myself, there is stuff which I am beginning to appreciate.&lt;br /&gt;&lt;br /&gt;So the one thing I found was "&lt;a href="http://www.apple.com/macosx/developertools/instruments.html"&gt;Instruments&lt;/a&gt;", which is a cool tool to use as it allows me to profile the &lt;a href="http://www.secondlife.com/"&gt;Second Life&lt;/a&gt; viewer when it starts up etc. 
Using the user-friendly UI, you can select the stuff you want to see/profile via the "&lt;span class="Apple-style-span" style="color: rgb(51, 102, 255);"&gt;Library&lt;/span&gt;" and simply click on "&lt;span class="Apple-style-span" style="color: rgb(51, 102, 255);"&gt;Record&lt;/span&gt;" to start; the same button allows you to "&lt;span class="Apple-style-span" style="color: rgb(51, 102, 255);"&gt;Stop&lt;/span&gt;" the profiling as well. &lt;div&gt;&lt;br /&gt;&lt;div&gt;&lt;img style="float:left; margin:0 10px 10px 0;cursor:pointer; cursor:hand;width: 118px; height: 320px;" src="http://1.bp.blogspot.com/_n2HkB0XD3Kw/SXKaxkBW5NI/AAAAAAAAA88/nEBxzaPDvmM/s320/Library+panel.png" border="0" alt="" id="BLOGGER_PHOTO_ID_5292462688387392722" /&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;For starters, I opened up the Library, as in the screenshot above, to select what I would like to profile. The next thing is to simply select the application and begin :) Really easy, and after that, voila! 
the results come out as shown below &lt;/div&gt;&lt;img style="float:left; margin:0 10px 10px 0;cursor:pointer; cursor:hand;width: 320px; height: 260px;" src="http://2.bp.blogspot.com/_n2HkB0XD3Kw/SXKcQHIlczI/AAAAAAAAA9E/qtObxV__LJ8/s320/Main+profile+panel.png" border="0" alt="" id="BLOGGER_PHOTO_ID_5292464312720651058" /&gt;&lt;div&gt;The caveat is that my CPU was running close to 100% throughout a 5-minute session, and that was with the default profiling... you can imagine what would happen if I selected some other stuff as well. &lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;/div&gt;&lt;div&gt;In this example I was interested in object allocation, so I selected "&lt;span class="Apple-style-span" style="color: rgb(51, 102, 255);"&gt;ObjectAlloc&lt;/span&gt;". One nice thing I discovered is that you can see how much memory is allocated by "&lt;span class="Apple-style-span" style="color: rgb(51, 51, 255);"&gt;drilling&lt;/span&gt;" down into any object in the "&lt;span class="Apple-style-span" style="color: rgb(51, 102, 255);"&gt;Object Address&lt;/span&gt;" column, and the profiler will show you that object's memory allocations and de-allocations. 
Cool!&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5021319479788698734-3550559001439762363?l=raymondtay.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://raymondtay.blogspot.com/feeds/3550559001439762363/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5021319479788698734&amp;postID=3550559001439762363&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5021319479788698734/posts/default/3550559001439762363'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5021319479788698734/posts/default/3550559001439762363'/><link rel='alternate' type='text/html' href='http://raymondtay.blogspot.com/2009/01/mac-os-x-instruments.html' title='Mac OS X &quot;Instruments&quot;'/><author><name>Raymond Tay</name><uri>https://profiles.google.com/110897033209560123249</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh4.googleusercontent.com/-4ZP-rPmLNSw/AAAAAAAAAAI/AAAAAAAABao/8f4AuUXNrno/s512-c/photo.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://1.bp.blogspot.com/_n2HkB0XD3Kw/SXKaxkBW5NI/AAAAAAAAA88/nEBxzaPDvmM/s72-c/Library+panel.png' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5021319479788698734.post-4467928893243828950</id><published>2008-12-22T20:13:00.004+08:00</published><updated>2008-12-22T20:24:04.458+08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Java'/><category scheme='http://www.blogger.com/atom/ns#' term='JVM'/><title type='text'>The age old question of whether PermGen is part of JVM memory</title><content type='html'>This article of mine is lifted from the OpenJDK user 
group discussion on this topic&lt;br /&gt;&lt;br /&gt;This is &lt;a href="http://blogs.sun.com/jonthecollector/"&gt;Jon Masamitsu&lt;/a&gt;'s reply to the question (This has not been edited in any way and represents him 100%)&lt;br /&gt;&lt;blockquote&gt;In the young generation there are 3 spaces - eden plus 2 survivor&lt;br /&gt;spaces.  This organization is to enable the collection of the young generation.  See&lt;br /&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span" style="color: rgb(0, 153, 0);"&gt;http://java.sun.com/javase/technologies/hotspot/gc/gc_tuning_6.html&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;for more details.  In calculating the total size of the heap, only 1 on the survivor spaces is being counted.  That's because only&lt;br /&gt;eden + 1 survivor space is available to the application for allocations.&lt;br /&gt;That may be where the rest of the space is.  If you add &lt;span class="Apple-style-span" style="color: rgb(0, 153, 0);"&gt;&lt;span class="Apple-style-span" style="font-style: italic;"&gt;-XX:+PrintHeapAtGC&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;you will see something like &lt;/blockquote&gt;&lt;blockquote&gt;&lt;span class="Apple-style-span" style="color: rgb(51, 51, 255);"&gt;Heap PSYoungGen      total 10752K, used 368K [0xf1000000,0xf1c00000, 0xfbc00000)&lt;br /&gt;eden space 9216K, 4% used [0xf1000000,0xf105c308,0xf1900000)&lt;br /&gt;from space 1536K, 0% used [0xf1a80000,0xf1a80000,0xf1c00000)&lt;br /&gt;to   space 1536K, 0% used [0xf1900000,0xf1900000,0xf1a80000)&lt;br /&gt;PSOldGen        total 24576K, used 0K [0xdb800000, 0xdd000000, 0xf1000000)&lt;br /&gt;object space 24576K, 0% used [0xdb800000,0xdb800000,0xdd000000)&lt;br /&gt;PSPermGen       total 16384K, used 1489K [0xd7800000, 0xd8800000,&lt;br /&gt;0xdb800000)&lt;br /&gt;object space 16384K, 9% used [0xd7800000,0xd7974540,0xd8800000)&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Which will tell you the sizes of 
the survivor spaces.  By the way, the&lt;br /&gt;perm generation is separate from the Java heap.&lt;br /&gt;&lt;br /&gt;&lt;/blockquote&gt;&lt;br /&gt;Notice his last statement: "&lt;span class="Apple-style-span" style="font-style: italic;"&gt;&lt;span class="Apple-style-span" style="color: rgb(51, 102, 255);"&gt;the perm generation is separate from the Java heap&lt;/span&gt;&lt;/span&gt;". I hope this more or less settles a question I have been hearing for what feels like 10,000 years. (You can verify this by downloading and browsing the source code for it ... it will take you some time but the benefits are tremendous :D )&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5021319479788698734-4467928893243828950?l=raymondtay.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://raymondtay.blogspot.com/feeds/4467928893243828950/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5021319479788698734&amp;postID=4467928893243828950&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5021319479788698734/posts/default/4467928893243828950'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5021319479788698734/posts/default/4467928893243828950'/><link 
rel='alternate' type='text/html' href='http://raymondtay.blogspot.com/2008/12/age-old-question-of-what-constitutes.html' title='The age old question of whether PermGen is part of JVM memory'/><author><name>Raymond Tay</name><uri>https://profiles.google.com/110897033209560123249</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh4.googleusercontent.com/-4ZP-rPmLNSw/AAAAAAAAAAI/AAAAAAAABao/8f4AuUXNrno/s512-c/photo.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5021319479788698734.post-2846401401233732152</id><published>2008-12-18T08:16:00.008+08:00</published><updated>2008-12-22T20:23:29.223+08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='SecondLife'/><title type='text'>Homegrown store with Second Life</title><content type='html'>Today I saw a piece of news in the newspapers which pleased me. The Singapore store "Tangs" is going to set up a 3-D replica in &lt;a href="http://www.secondlife.com/"&gt;Second Life&lt;/a&gt;. Read about the news by clicking &lt;a href="http://www.straitstimes.com/Breaking%2BNews/Singapore/Story/STIStory_315550.html"&gt;here&lt;/a&gt;. It's heartwarming to see that malls and stores in Singapore are beginning to appreciate 3-D virtual reality. Having &lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_n2HkB0XD3Kw/SUmXfTtHhkI/AAAAAAAAArU/TBTEtdCYdwQ/s1600-h/ln-tangs.jpg"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 300px; height: 209px;" src="http://4.bp.blogspot.com/_n2HkB0XD3Kw/SUmXfTtHhkI/AAAAAAAAArU/TBTEtdCYdwQ/s320/ln-tangs.jpg" border="0" alt="" id="BLOGGER_PHOTO_ID_5280918602189866562" /&gt;&lt;/a&gt; worked at Linden Lab for a couple of weeks, I find it a place that really excites me, because it's a place where people make it happen. 
The variety of technologies and the "&lt;span class="Apple-style-span" style="font-style: italic;"&gt;lindens&lt;/span&gt;" (that's what we are called within the company, special isn't it? I thought so too ... the name has a familial ring to it) here are special and dedicated to their craft. Awesome :) &lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Having said that, it's time to go back to my Second Life (yeah, it's corny, I know)&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5021319479788698734-2846401401233732152?l=raymondtay.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://raymondtay.blogspot.com/feeds/2846401401233732152/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5021319479788698734&amp;postID=2846401401233732152&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5021319479788698734/posts/default/2846401401233732152'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5021319479788698734/posts/default/2846401401233732152'/><link rel='alternate' type='text/html' href='http://raymondtay.blogspot.com/2008/12/homegrown-store-with-second-life.html' title='Homegrown store with Second Life'/><author><name>Raymond 
Tay</name><uri>https://profiles.google.com/110897033209560123249</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh4.googleusercontent.com/-4ZP-rPmLNSw/AAAAAAAAAAI/AAAAAAAABao/8f4AuUXNrno/s512-c/photo.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://4.bp.blogspot.com/_n2HkB0XD3Kw/SUmXfTtHhkI/AAAAAAAAArU/TBTEtdCYdwQ/s72-c/ln-tangs.jpg' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5021319479788698734.post-7254151371963262923</id><published>2008-11-20T10:30:00.002+08:00</published><updated>2008-12-18T08:57:43.105+08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Python'/><title type='text'>Python Performance and the GIL</title><content type='html'>I recently developed a simple multi-threaded &lt;a href="http://www.python.org/"&gt;Python&lt;/a&gt; program (CPU-bound) and measured its performance on my &lt;a href="http://www.amd.com/us-en/Processors/ProductInformation/0,,30_118_15331_15332%5E15334,00.html"&gt;AMD Phenom X4&lt;/a&gt; quad-core system. The results were not what I expected. After some investigation and R&amp;amp;D, one possible cause is the way I wrote my application (I always suspect myself first) and the second is the infamous GIL (&lt;a href="http://www.python.org/doc/2.4/api/threads.html"&gt;Global Interpreter Lock&lt;/a&gt;).&lt;br /&gt;&lt;br /&gt;Based on &lt;a href="http://cs.wlu.edu/%7Ewhaleyt/classes/parallel/topics/amdahl.html"&gt;Amdahl's law&lt;/a&gt;, I should obtain at best a roughly four-fold improvement in runtime, since the system-under-test (SUT) has 4 processing cores. But because of the GIL, only one thread can execute Python bytecode at a time, so there isn't a way to really push the application's runtime performance with threads.&lt;br /&gt;&lt;br /&gt;Why do I say that? 
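A quick experiment makes the point. The sketch below is only an illustration and not the original program: the names count_down, run_sequential and run_threaded and the loop count are made up for this example. It runs the same CPU-bound loop four times in a row and then in four threads; on a standard CPython build with the GIL, expect the threaded run to be no faster than the sequential one.&lt;br /&gt;
&lt;pre&gt;
```python
import threading
import time


def count_down(n):
    # Pure-Python CPU-bound loop; the running thread holds the GIL.
    while n != 0:
        n -= 1


def run_sequential(n, workers):
    # Run the loop `workers` times back to back in the calling thread.
    start = time.time()
    for _ in range(workers):
        count_down(n)
    return time.time() - start


def run_threaded(n, workers):
    # Run the same loop in `workers` threads and wait for all of them.
    threads = [threading.Thread(target=count_down, args=(n,))
               for _ in range(workers)]
    start = time.time()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.time() - start


if __name__ == "__main__":
    N = 2000000  # hypothetical workload size, tune for your machine
    print("sequential: %.2fs" % run_sequential(N, 4))
    print("threaded:   %.2fs" % run_threaded(N, 4))
```
&lt;/pre&gt;
The absolute timings depend on the machine and the Python version, so treat the numbers as indicative only.&lt;br /&gt;&lt;br /&gt;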
To my understanding, the GIL has to be held for almost every operation you do in Python, and according to the documentation it is possible to control how often the interpreter releases and reacquires the GIL via &lt;span style="color: rgb(51, 102, 255); font-style: italic;"&gt;sys.setcheckinterval()&lt;/span&gt;. The side effect is that it changes the frequency of thread context switches, which means Python checks for periodic events such as thread switches and signal handlers at a higher or lower rate (depending on the passed-in value). That might be a good or a bad thing depending on the nature of your application, i.e. I/O-bound or CPU-bound. For those &lt;a href="http://www.sun.com/"&gt;Solaris&lt;/a&gt; buffs, it's something like "&lt;span style="font-style: italic; color: rgb(51, 102, 255);"&gt;set hires_tick ...&lt;/span&gt;"; not exactly, but thinking along that line doesn't hurt.&lt;br /&gt;&lt;br /&gt;There's a book called "&lt;a href="http://books.google.com/books?id=Q0s6Vgb98CQC&amp;amp;pg=PA356&amp;amp;lpg=PA356&amp;amp;dq=global+interpreter+lock+performance&amp;amp;source=web&amp;amp;ots=hc-145Qiwy&amp;amp;sig=Z-MCTZSx0-A8dVer-3UAS4U1-EA&amp;amp;hl=en&amp;amp;sa=X&amp;amp;oi=book_result&amp;amp;resnum=6&amp;amp;ct=result"&gt;Python Cookbook&lt;/a&gt;" that illustrates the issue, and I quote a paragraph from it.&lt;br /&gt;&lt;blockquote style="color: rgb(0, 153, 0); font-style: italic;"&gt;To make life easier for programmers, the interpreter releases and reacquires the lock every 100 bytecode instructions ( a value that can be changed using sys.setcheckinterval).&lt;br /&gt;...&lt;br /&gt;However, effective performance-boosting exploitation of multiple processors from multiple Python threads of the same process is just not in the cards.&lt;br /&gt;...&lt;br /&gt;you will not observe substantial performance increases by moving your multithreaded application to a multiprocessor machine&lt;/blockquote&gt;&lt;br /&gt;If I had read this book earlier, I probably would not have 
done this exercise, but it's interesting to understand why it happens. As part of this "validation" exercise, I browsed the Python 2.6 sources, and here's what I think the author meant.&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;file: ------ ceval.c ------&lt;br /&gt;/* for manipulating the thread switch and periodic "stuff" - used to be&lt;br /&gt;per thread, now just a pair o' globals */&lt;br /&gt;int _Py_CheckInterval = 100;&lt;br /&gt;volatile int _Py_Ticker = 100;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;These internal C variables define the "interval" which I think the author meant, and there's a C function (in sysmodule.c) that serves as the implementation of &lt;span style="font-style: italic; color: rgb(51, 102, 255);"&gt;sys.setcheckinterval()&lt;/span&gt; in Python, which happens to be&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;static PyObject *&lt;br /&gt;sys_setcheckinterval(PyObject *self, PyObject *args)&lt;br /&gt;{&lt;br /&gt;      if (!PyArg_ParseTuple(args, "i:setcheckinterval", &amp;amp;_Py_CheckInterval))&lt;br /&gt;              return NULL;&lt;br /&gt;      Py_INCREF(Py_None);&lt;br /&gt;      return Py_None;&lt;br /&gt;}&lt;br /&gt;&lt;/pre&gt;There are also a couple of C functions dealing with signals that use the values of &lt;span style="font-style: italic; color: rgb(51, 102, 255);"&gt;_Py_CheckInterval&lt;/span&gt; &amp;amp; &lt;span style="color: rgb(51, 102, 255); font-style: italic;"&gt;_Py_Ticker&lt;/span&gt; to perform what the author has described, but I omit them here.&lt;br /&gt;&lt;br /&gt;However, there is a caveat to increasing the frequency of this cycle: it will likely increase the occurrences of &lt;span style="font-style: italic; font-weight: bold;"&gt;race conditions&lt;/span&gt; within the signal handlers, which affects your program's correctness. You can find plenty of such issues by googling.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5021319479788698734-7254151371963262923?l=raymondtay.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://raymondtay.blogspot.com/feeds/7254151371963262923/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5021319479788698734&amp;postID=7254151371963262923&amp;isPopup=true' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5021319479788698734/posts/default/7254151371963262923'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5021319479788698734/posts/default/7254151371963262923'/><link rel='alternate' type='text/html' href='http://raymondtay.blogspot.com/2008/11/python-performance-and-gil.html' title='Python Performance and the GIL'/><author><name>Raymond Tay</name><uri>https://profiles.google.com/110897033209560123249</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh4.googleusercontent.com/-4ZP-rPmLNSw/AAAAAAAAAAI/AAAAAAAABao/8f4AuUXNrno/s512-c/photo.jpg'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5021319479788698734.post-5284242909644980994</id><published>2008-11-12T12:36:00.007+08:00</published><updated>2008-11-20T15:48:34.732+08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Sun Studio 12'/><category 
scheme='http://www.blogger.com/atom/ns#' term='C/C++'/><category scheme='http://www.blogger.com/atom/ns#' term='Python'/><title type='text'>Building Python for Performance with Sun Studio</title><content type='html'>This article is lifted from Sun's Wiki page on Python performance and you can view the entire article by clicking &lt;a href="http://wikis.sun.com/display/AppPerfTuning/Python"&gt;here&lt;/a&gt;. Thanks to &lt;a href="http://chihungchan.blogspot.com"&gt;Chi Hung&lt;/a&gt; for sending me this link.&lt;br /&gt;&lt;br /&gt;In a nutshell, the page reports performance improvements from using Sun Studio to compile Python from source on 3 processors, namely the UltraSPARC IV+, Woodcrest (an Intel Xeon) &amp;amp; Niagara2, using compiler flags like "&lt;span style="font-style: italic; color: rgb(51, 102, 255);"&gt;-xO4&lt;/span&gt;" and "&lt;span style="font-style: italic; color: rgb(51, 102, 255);"&gt;-xipo&lt;/span&gt;", which enable optimizations like inlining and cross-source-file optimization, among others.&lt;br /&gt;&lt;br /&gt;Check out the &lt;a href="http://wikis.sun.com/display/AppPerfTuning/Python"&gt;link&lt;/a&gt; for the benchmarks. One interesting thing was that Sun did not publish any benchmarks for Sun Studio on Linux-based systems. The second interesting thing was that there's a piece of software called "&lt;span style="font-style: italic; color: rgb(51, 102, 255);"&gt;parrot benchmark&lt;/span&gt;" with an iteration-count option (in the link, the author suggests running the profiling 20 times instead of the original 2 times). It is used during the compilation process to let the compiler profile Python (I assume by running the binary against some benchmarking software and collecting the results in a temp place) and optimize portions of it.&lt;br /&gt;&lt;br /&gt;I haven't used this parrot benchmarking software much, so I am not sure what simulation procedure it uses to measure which parts of Python need to be optimized. Interesting.    
&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5021319479788698734-5284242909644980994?l=raymondtay.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://raymondtay.blogspot.com/feeds/5284242909644980994/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5021319479788698734&amp;postID=5284242909644980994&amp;isPopup=true' title='4 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5021319479788698734/posts/default/5284242909644980994'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5021319479788698734/posts/default/5284242909644980994'/><link rel='alternate' type='text/html' href='http://raymondtay.blogspot.com/2008/11/building-python-for-performance-with.html' title='Building Python for Performance with Sun Studio'/><author><name>Raymond Tay</name><uri>https://profiles.google.com/110897033209560123249</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' 
src='//lh4.googleusercontent.com/-4ZP-rPmLNSw/AAAAAAAAAAI/AAAAAAAABao/8f4AuUXNrno/s512-c/photo.jpg'/></author><thr:total>4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5021319479788698734.post-8515693289659600999</id><published>2008-10-29T01:01:00.007+08:00</published><updated>2008-11-20T15:48:13.744+08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='CMT'/><category scheme='http://www.blogger.com/atom/ns#' term='Sun Studio 12'/><category scheme='http://www.blogger.com/atom/ns#' term='SecondLife'/><title type='text'>CMT Presentation in SecondLife</title><content type='html'>&lt;a href="http://3.bp.blogspot.com/_n2HkB0XD3Kw/SQdGYxueg2I/AAAAAAAAAqo/2_z9r8YKF0U/s1600-h/CMT+on+SecondLife+2_001.png"&gt;&lt;img id="BLOGGER_PHOTO_ID_5262252081084793698" style="WIDTH: 320px; CURSOR: hand; HEIGHT: 236px" alt="" src="http://3.bp.blogspot.com/_n2HkB0XD3Kw/SQdGYxueg2I/AAAAAAAAAqo/2_z9r8YKF0U/s320/CMT+on+SecondLife+2_001.png" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;a href="http://2.bp.blogspot.com/_n2HkB0XD3Kw/SQdF4fP7qnI/AAAAAAAAAqg/6CwvKrRf5f8/s1600-h/CMT+on+SecondLife_001.png"&gt;&lt;img id="BLOGGER_PHOTO_ID_5262251526369028722" style="WIDTH: 320px; CURSOR: hand; HEIGHT: 236px" alt="" src="http://2.bp.blogspot.com/_n2HkB0XD3Kw/SQdF4fP7qnI/AAAAAAAAAqg/6CwvKrRf5f8/s320/CMT+on+SecondLife_001.png" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Today at 9am PST (1AM Singapore time), &lt;a href="http://blogs.sun.com/d"&gt;Darryl Gove &lt;/a&gt;from &lt;a href="http://www.sun.com/"&gt;Sun Microsystems &lt;/a&gt;presented on &lt;a href="http://developers.sun.com/solaris/articles/chip_multi_thread.html"&gt;Chip Multi Threadi&lt;/a&gt;ng in &lt;a href="http://secondlife.com/"&gt;Second Life&lt;/a&gt;, and it was great to hear him live (even though there were some issues with the sound) speak about the latest compiler (Sun Studio 12) / software technologies like POSIX, OpenMP, 
Autoparallelization are helping developers write fuss-free, if I may, multi-threaded code. Here's the link to the presentation &lt;a href="http://blogs.sun.com/d/resource/utilising_CMT.pdf"&gt;slides&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;Above are snapshots captured during the live session with him and a bunch of other people from all around the world.&lt;br /&gt;&lt;br /&gt;Psst... in case you didn't know, SecondLife is capable of much more ... download it (it's free) .. run it .. enjoy!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5021319479788698734-8515693289659600999?l=raymondtay.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://raymondtay.blogspot.com/feeds/8515693289659600999/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5021319479788698734&amp;postID=8515693289659600999&amp;isPopup=true' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5021319479788698734/posts/default/8515693289659600999'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5021319479788698734/posts/default/8515693289659600999'/><link rel='alternate' type='text/html' href='http://raymondtay.blogspot.com/2008/10/cmt-presentation-in-secondlife.html' title='CMT Presentation in 
SecondLife'/><author><name>Raymond Tay</name><uri>https://profiles.google.com/110897033209560123249</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh4.googleusercontent.com/-4ZP-rPmLNSw/AAAAAAAAAAI/AAAAAAAABao/8f4AuUXNrno/s512-c/photo.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/_n2HkB0XD3Kw/SQdGYxueg2I/AAAAAAAAAqo/2_z9r8YKF0U/s72-c/CMT+on+SecondLife+2_001.png' height='72' width='72'/><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5021319479788698734.post-8053861263234248732</id><published>2008-10-24T21:25:00.005+08:00</published><updated>2008-11-20T15:47:57.845+08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='SNL'/><title type='text'>Saturday Night Live</title><content type='html'>You've got to see this Saturday Night Live video, and it's a full episode :). Depending on the political sensitivity of your geography, you might or might not have access to it. This episode has Will Ferrell, Tina Fey and others playing out the goings-on of the US Presidential Elections. FYI, I am not for any US party, but I do find this video extremely entertaining.&lt;br /&gt;&lt;br /&gt;&lt;a href="http://2.bp.blogspot.com/_n2HkB0XD3Kw/SQHQRs7khdI/AAAAAAAAAqY/0yzrsY-zvTc/s1600-h/SNL.png"&gt;&lt;img id="BLOGGER_PHOTO_ID_5260714842282821074" style="WIDTH: 320px; CURSOR: hand; HEIGHT: 223px" alt="" src="http://2.bp.blogspot.com/_n2HkB0XD3Kw/SQHQRs7khdI/AAAAAAAAAqY/0yzrsY-zvTc/s320/SNL.png" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Here's the URL: &lt;a href="http://www.nbc.com/Saturday_Night_Live/video/episodes/#vid=784241"&gt;http://www.nbc.com/Saturday_Night_Live/video/episodes/#vid=784241&lt;/a&gt;&lt;br /&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5021319479788698734-8053861263234248732?l=raymondtay.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://raymondtay.blogspot.com/feeds/8053861263234248732/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5021319479788698734&amp;postID=8053861263234248732&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5021319479788698734/posts/default/8053861263234248732'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5021319479788698734/posts/default/8053861263234248732'/><link rel='alternate' type='text/html' href='http://raymondtay.blogspot.com/2008/10/saturday-night-live_24.html' title='Saturday Night Live'/><author><name>Raymond Tay</name><uri>https://profiles.google.com/110897033209560123249</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh4.googleusercontent.com/-4ZP-rPmLNSw/AAAAAAAAAAI/AAAAAAAABao/8f4AuUXNrno/s512-c/photo.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/_n2HkB0XD3Kw/SQHQRs7khdI/AAAAAAAAAqY/0yzrsY-zvTc/s72-c/SNL.png' height='72' 
width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5021319479788698734.post-8586760284222170678</id><published>2008-10-12T11:57:00.018+08:00</published><updated>2008-11-20T15:53:10.644+08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='J2EE'/><category scheme='http://www.blogger.com/atom/ns#' term='BEA WebLogic'/><category scheme='http://www.blogger.com/atom/ns#' term='JVM'/><title type='text'>J2EE/WebLogic Tuning project ... continued (Part 3)</title><content type='html'>Finally, the tuning project came to a close (at least for now) and there were many takeaways from this experience with my customer which I'd like to share. Overall, I felt that the &lt;span style="font-style: italic;"&gt;root cause &lt;/span&gt;of these issues was typical of many software development projects in J2EE: not understanding the technology well enough (not to mention project costing and time schedules which compromised software quality...after all these years, you would think that people got it...but apparently they don't...)&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Understanding EJBs, Transactions and SQL&lt;/span&gt;&lt;br /&gt;That application had more than 3,000 bean instances actively transacting, but the average transaction rate was less than 5 per second, which highlighted an important fact: it was heavily clogged. The real clogs were poorly performing SQL statements, injudicious use of CMT and BMT (i.e. Container/Bean Managed Transactions), mixing CMT &amp;amp; BMT in the same bean, and "chatty" EJBs. The EJB specifications clearly say that it's best not to mix CMT &amp;amp; BMT, as it makes transaction management complicated. Also, never EVER place global variables of the kind that represent database connections, user preferences etc. in stateless beans (stateful beans are, of course, the exception). 
Since concurrency is pretty low right now, concurrency-related issues won't surface, but you can be sure they will once performance improves, and you don't want that: these kinds of bugs are the HARDEST to locate.&lt;br /&gt;&lt;br /&gt;It was so slow that WebLogic actually emitted a "&lt;span style="font-style: italic; color: rgb(51, 102, 255);"&gt;STUCK THREAD&lt;/span&gt;" message, which occurs when execution exceeds 10 minutes.&lt;br /&gt;&lt;br /&gt;The next thing to do is to revisit the design of those beans, remove the CMT &amp;amp; BMT mixtures and convert globals into locals.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Understanding Clustering Technology&lt;/span&gt;&lt;br /&gt;Clustering is easily abused by developers, and a lack of design control can worsen the entire situation. This project's implementation created a single cluster group of 12 machines, and it was observed that Minor + Major GCs were happening so frequently, with worsening pause times, that they totally wrecked the concurrency of the application. They then changed it to 3 cluster groups, which significantly reduced GC (however, it still wasn't good enough) ... I suggested splitting it further into several cluster groups of 2 servers each, which should significantly reduce overall heap usage and the frequency of GCs (and GC pause times), since less replication activity puts less pressure on the Java memory system.&lt;br /&gt;&lt;br /&gt;Developers placed huge objects (&gt; 500KB) into the &lt;span style="font-style: italic; color: rgb(51, 102, 255);"&gt;session&lt;/span&gt; via &lt;span style="font-style: italic; color: rgb(51, 102, 255);"&gt;attributes&lt;/span&gt;; combined with the fact that session timeouts ran into hours (nothing can be done about that since it's a business requirement), this meant that the memory couldn't be freed during GCs. 
But I suspect it's probably an application design issue, since these objects tend to reflect database data displayed on the GUI and all those nifty controls that give users flexibility in the business functions.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Understanding Garbage Collection&lt;/span&gt;&lt;br /&gt;GC is quite a killer of typical J2EE applications, and it's important to keep pause times as low as possible. One major improvement came from using the &lt;span style="color: rgb(51, 102, 255); font-style: italic;"&gt;Concurrent Low Pause Collector&lt;/span&gt; i.e. &lt;span style="font-style: italic; color: rgb(51, 102, 255);"&gt;-XX:+UseConcMarkSweepGC&lt;/span&gt;, which lets most of the collection work in the &lt;span style="font-style: italic; color: rgb(51, 102, 255);"&gt;Tenured Generation&lt;/span&gt; run concurrently with the application (the &lt;span style="font-style: italic; color: rgb(51, 102, 255);"&gt;Young&lt;/span&gt; generation is still collected in short stop-the-world pauses). Before this change, a &lt;span style="font-style: italic;"&gt;Major GC&lt;/span&gt; took 10 - 17 secs on average and occurred once every 3 hours on a typical day; on a busy day it took 15 - 20 secs on average and occurred once every 30 - 45 minutes. 
However, I am skeptical about whether this will work, because the average object creation rate is about 800 MB per minute (a Minor GC occurs twice per minute and frees approximately 400 MB each time)...there's still a chance that a Stop-The-World GC will happen; to understand why, do read Jon Masamitsu's blog &lt;a href="http://blogs.sun.com/jonthecollector/entry/what_the_heck_s_a"&gt;here&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Class Loading&lt;/span&gt;&lt;br /&gt;The developers of the application did something very commonly seen in development: dumping their classes into the application server's classpath. This not only prevents the classes from being unloaded but, worse, leads to a type of memory leak a.k.a. Class Loader Leaks, and the GC doesn't reclaim this memory. I admit I used to do this, but I don't now. The effect was &lt;span style="font-style: italic; color: rgb(51, 102, 255);"&gt;OutOfMemoryError: PermGen space&lt;/span&gt;, and of course increasing the Perm Gen size would effectively stamp out this error.&lt;br /&gt;&lt;br /&gt;Of course, my customer is now pursuing the Queue OverFlow exception with BEA Systems, so let's see what happens next ... perhaps a patch, or BEA/Oracle will say "Upgrade to WebLogic 10, it solves that problem!" Psst, let me tell you a secret: the problem is not resolved there either ... &lt;script type="text/javascript"&gt;&lt;br /&gt;var gaJsHost = (("https:" == document.location.protocol) ? "https://ssl." 
: "http://www.");&lt;br /&gt;document.write(unescape("%3Cscript src='" + gaJsHost + "google-analytics.com/ga.js' type='text/javascript'%3E%3C/script%3E"));&lt;br /&gt;&lt;/script&gt;&lt;br /&gt;&lt;script type="text/javascript"&gt;&lt;br /&gt;try {&lt;br /&gt;var pageTracker = _gat._getTracker("UA-5323400-1");&lt;br /&gt;pageTracker._trackPageview();&lt;br /&gt;} catch(err) {}&lt;/script&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5021319479788698734-8586760284222170678?l=raymondtay.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://raymondtay.blogspot.com/feeds/8586760284222170678/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5021319479788698734&amp;postID=8586760284222170678&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5021319479788698734/posts/default/8586760284222170678'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5021319479788698734/posts/default/8586760284222170678'/><link rel='alternate' type='text/html' href='http://raymondtay.blogspot.com/2008/10/j2eeweblogic-tuning-project-continued.html' title='J2EE/WebLogic Tuning project ... 
continued (Part 3)'/><author><name>Raymond Tay</name><uri>https://profiles.google.com/110897033209560123249</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh4.googleusercontent.com/-4ZP-rPmLNSw/AAAAAAAAAAI/AAAAAAAABao/8f4AuUXNrno/s512-c/photo.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5021319479788698734.post-2219443103584195519</id><published>2008-10-02T16:09:00.007+08:00</published><updated>2008-10-02T18:06:48.920+08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Java'/><category scheme='http://www.blogger.com/atom/ns#' term='J2EE'/><category scheme='http://www.blogger.com/atom/ns#' term='BEA WebLogic'/><title type='text'>J2EE/WebLogic Performance Tuning Project...continued</title><content type='html'>In my previous article, my tuning project with a customer ran into some trouble with WebLogic's Work Manager, in particular with the Java exception &lt;span style="font-style: italic; color: rgb(51, 102, 255);"&gt;weblogic.utils.UnsyncCircularQueue$FullQueueException&lt;/span&gt;, with which the WebLogic server indicated that the queue holding the submitted requests it works on was full. 
I checked the WebLogic &lt;a href="http://e-docs.bea.com/wls/docs92/perform/appb_queues.html"&gt;docs &lt;/a&gt;and, according to them, the server automatically resizes the queue to fit the requests. What the docs didn't mention is that the resize fails once the size of the queue reaches the maximum capacity, which happens to be 65536, and that's why it threw the error message "&lt;span style="font-style: italic; color: rgb(51, 102, 255);"&gt;Queue exceeded the maximum capacity of '&lt;span style="font-weight: bold;"&gt;65536&lt;/span&gt;' elements&lt;/span&gt;".&lt;br /&gt;&lt;br /&gt;However, checking the code reveals something quite peculiar: the constructor suggests that only queues of sizes exceeding &lt;span style="font-style: italic;"&gt;1 GB&lt;/span&gt; will throw this error, yet the default capacity is always 256, growing to a maximum of 65536 elements (btw, it's WebLogic 9.2); so my guess is that the source code needs to be cleaned up? If anybody has any idea why, do drop me a comment. Thanks in advance.&lt;script type="text/javascript"&gt;&lt;br /&gt;var gaJsHost = (("https:" == document.location.protocol) ? "https://ssl." 
: "http://www.");&lt;br /&gt;document.write(unescape("%3Cscript src='" + gaJsHost + "google-analytics.com/ga.js' type='text/javascript'%3E%3C/script%3E"));&lt;br /&gt;&lt;/script&gt;&lt;br /&gt;&lt;script type="text/javascript"&gt;&lt;br /&gt;var pageTracker = _gat._getTracker("UA-5323400-1");&lt;br /&gt;pageTracker._trackPageview();&lt;br /&gt;&lt;/script&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5021319479788698734-2219443103584195519?l=raymondtay.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://raymondtay.blogspot.com/feeds/2219443103584195519/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5021319479788698734&amp;postID=2219443103584195519&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5021319479788698734/posts/default/2219443103584195519'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5021319479788698734/posts/default/2219443103584195519'/><link rel='alternate' type='text/html' href='http://raymondtay.blogspot.com/2008/10/j2eeweblogic-performance-tuning.html' title='J2EE/WebLogic Performance Tuning Project...continued'/><author><name>Raymond Tay</name><uri>https://profiles.google.com/110897033209560123249</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh4.googleusercontent.com/-4ZP-rPmLNSw/AAAAAAAAAAI/AAAAAAAABao/8f4AuUXNrno/s512-c/photo.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5021319479788698734.post-3602360124356609801</id><published>2008-09-30T09:25:00.025+08:00</published><updated>2008-10-02T16:39:44.954+08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Java'/><category 
scheme='http://www.blogger.com/atom/ns#' term='J2EE'/><category scheme='http://www.blogger.com/atom/ns#' term='BEA WebLogic'/><category scheme='http://www.blogger.com/atom/ns#' term='JVM'/><title type='text'>J2EE/WebLogic Performance Tuning Project</title><content type='html'>This article covers a series of investigations for a customer of mine whose environment runs a &lt;a href="http://www.bea.com/"&gt;WebLogic&lt;/a&gt; cluster of 20 machines in &lt;span style="font-style: italic; color: rgb(51, 102, 255);"&gt;round-robin&lt;/span&gt; on &lt;a href="http://www.blogger.com/www.hp.com"&gt;HP-UX&lt;/a&gt; to service a global J2EE application, which performed slowly during peak periods and occasionally hung. The application had a typical 3-tier architecture whereby the web tier delegates requests to the middle tier (EJBs, MQs, MDBs) and this middleware goes to the database tier (SQL inserts, updates, deletes, stored procedures etc). The application was found to be experiencing heavy load during peak periods every day.&lt;br /&gt;&lt;br /&gt;There were a couple of issues related to poorly performing SQL, poorly designed middleware apps, WebLogic cluster design and runtime issues, JVM memory consumption and frequent garbage collections. Let me try to detail them a bit without giving away too much customer information. Hopefully, it can help you with investigations in your own environment.&lt;br /&gt;&lt;br /&gt;During the peak period, the major contributing factors to the apps' slowness were:&lt;br /&gt;&lt;br /&gt;The heap size was 1.5GB (min,max), with 512MB for &lt;span style="font-style: italic;"&gt;Eden&lt;/span&gt;, and the &lt;span style="font-style: italic;"&gt;PermGen&lt;/span&gt; was 192MB. The minor GC kicked in frequently, releasing approximately 60MB on average; the major GC kicked in twice every minute (avg. 
3-5s, max 40s), releasing 400 - 500 MB each time; reverse engineering the figures reveals that the object creation rate was roughly 800 MB - 1.0 GB per minute. GC is primarily a CPU-intensive operation (saving state, freeing memory, compacting the heap etc). The large object creation rate combined with the relatively long GC pauses suggests that the applications were creating objects in an inefficient manner, and that created problems with the cluster's session replication mechanism: users of the system would see stale data because, due to the long GC pauses, the data in the session was not replicated *properly* across to the other servers.&lt;br /&gt;&lt;br /&gt;Applications were attached to the WebLogic system classpath, which meant that the Java classes were never unloaded from memory. Combined with the fact that there are ALWAYS classloader leaks, this meant that whenever the operations team redeployed the apps (a.k.a. "&lt;span style="font-style: italic;"&gt;hot&lt;/span&gt;" redeployment), the memory footprint worsened, since the previously used memory was never released due to this leakage. If you keep hot-deploying this stuff you will almost certainly get an &lt;span style="font-style: italic;"&gt;OutOfMemoryError: PermGen space&lt;/span&gt;.&lt;br /&gt;&lt;br /&gt;EJBs (4,000+ deployed, in my opinion too many) were using &lt;span style="font-style: italic; color: rgb(51, 102, 255);"&gt;remote interfaces&lt;/span&gt; when there was no need, as those apps were not doing cross-VM operations. Based on my previous experimentation, you can get a 3-fold runtime improvement when you convert the EJBs to &lt;span style="font-style: italic; color: rgb(51, 102, 255);"&gt;local interfaces&lt;/span&gt;. 
The improvement comes from avoiding object marshalling/unmarshalling via RMI, since everything is on the same JVM heap; a local interface implies a local/normal Java call, which also consumes fewer system resources such as file descriptors/sockets &amp;amp; memory.&lt;br /&gt;&lt;br /&gt;As I mentioned previously, the apps were deployed in the cluster, and that meant that all persistent objects (e.g. session data, user preferences etc) must be &lt;span style="font-style: italic;"&gt;Serializable&lt;/span&gt; (i.e. they need to implement &lt;span style="font-style: italic; color: rgb(51, 102, 255);"&gt;java.io.Serializable&lt;/span&gt;), since session data is replicated across the servers in the cluster. This further degraded performance, as the cluster needs to maintain state across all 20 machines. Source code analysis found that the application code was keeping the results of database fetches in session data! You can imagine the pressure faced by the JVM memory subsystem + WebLogic cluster replication.&lt;br /&gt;&lt;br /&gt;The WebLogic cluster was also malfunctioning during peak periods, throwing an exception message like&lt;span style="font-style: italic; color: rgb(0, 0, 0);"&gt; &lt;/span&gt;&lt;span style="color: rgb(51, 102, 255); font-style: italic;"&gt;&amp;lt;WorkManager&amp;gt; &amp;lt;BEA-002911&amp;gt; &amp;lt;WorkManager weblogic.kernel.System failed to schedule a request due to weblogic.utils.UnsyncCircularQueue$FullQueueException: Queue exceed maximum capacity of: '65536' elements&lt;/span&gt;. This is a critical error thrown from the Work Manager, which replaced the traditional BEA thread pool. What this meant was that the WebLogic cluster could no longer handle users' requests and hung. 
*I plan to unravel this mystery in a while to understand why this is happening*&lt;br /&gt;&lt;br /&gt;The hardware load balancer was in "&lt;span style="color: rgb(51, 102, 255);"&gt;sticky&lt;/span&gt;" mode even though the WebLogic cluster was in&lt;span&gt;&lt;span&gt; round-robin&lt;/span&gt;&lt;/span&gt; mode, which negated the round-robin-ness and resulted in certain servers encountering more stress than others. This was made worse by the long session timeout of 20+ hrs. That's the cost of doing business....&lt;br /&gt;&lt;br /&gt;After tracing the execution times of the SQL statements, it was found that they were causing a lot of problems: missing indexes, a lack of functional indexes, improper SQL statements causing large database table joins, and many "select count(*)..." statements over large table joins, all of which contributed to this object creation rate.&lt;br /&gt;&lt;br /&gt;When I looked at these issues, the first few things I advised my customer to do were the following:&lt;br /&gt;(1) Convert the EJBs to use local interfaces i.e. &lt;span style="color: rgb(51, 102, 255);"&gt;call-by-reference&lt;/span&gt;&lt;br /&gt;(2) Tune the SQL statements via SQL reordering, indexes etc&lt;br /&gt;(3) Tune the JVM heap to use more aggressive + parallel heap collectors via &lt;span style="color: rgb(51, 102, 255);"&gt;-XX:+UseParallelGC&lt;/span&gt; or &lt;span style="color: rgb(51, 102, 255);"&gt;-XX:+UseConcMarkSweepGC&lt;/span&gt; (the two flags select different collectors and cannot be combined; we are still experimenting with this portion)&lt;br /&gt;(4) Do not use the system classpath to load application classes&lt;br /&gt;(5) Review the source code to remove known classloader leaks&lt;br /&gt;&lt;br /&gt;The customer and I are still in the process of implementing/reviewing this, so I hope to have an update for you in a couple of weeks' time. 
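&lt;br /&gt;&lt;br /&gt;As an aside, the object creation rate quoted earlier in this article can be reproduced with a few lines of arithmetic. This is only a rough sketch in Python: it assumes a steady state in which whatever is allocated between two collections is reclaimed by the next one, and the function name is made up for illustration.&lt;br /&gt;&lt;pre&gt;
```python
def allocation_rate_mb_per_min(collections_per_min, mb_freed_per_collection):
    """Rough allocation rate in MB per minute.

    Assumption (not from any JVM API): memory freed by each collection is
    approximately what the application allocated since the previous one.
    """
    return collections_per_min * mb_freed_per_collection


# Figures taken from the GC observations in this article:
# 2 collections per minute, each releasing 400 - 500 MB.
low = allocation_rate_mb_per_min(2, 400)
high = allocation_rate_mb_per_min(2, 500)
print(low, high)
```
&lt;/pre&gt;With 2 collections per minute each releasing 400 - 500 MB, this works out to 800 - 1000 MB per minute, which is where the estimate comes from.&lt;br /&gt;&lt;br /&gt;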
Meanwhile, do visit &lt;a href="http://blogs.sun.com/jonthecollector/"&gt;Jon Masamitsu's blog&lt;/a&gt; for an understanding of some JVM tuning parameters.&lt;script type="text/javascript"&gt;&lt;br /&gt;var gaJsHost = (("https:" == document.location.protocol) ? "https://ssl." : "http://www.");&lt;br /&gt;document.write(unescape("%3Cscript src='" + gaJsHost + "google-analytics.com/ga.js' type='text/javascript'%3E%3C/script%3E"));&lt;br /&gt;&lt;/script&gt;&lt;br /&gt;&lt;script type="text/javascript"&gt;&lt;br /&gt;var pageTracker = _gat._getTracker("UA-5323400-1");&lt;br /&gt;pageTracker._trackPageview();&lt;br /&gt;&lt;/script&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5021319479788698734-3602360124356609801?l=raymondtay.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://raymondtay.blogspot.com/feeds/3602360124356609801/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5021319479788698734&amp;postID=3602360124356609801&amp;isPopup=true' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5021319479788698734/posts/default/3602360124356609801'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5021319479788698734/posts/default/3602360124356609801'/><link rel='alternate' type='text/html' href='http://raymondtay.blogspot.com/2008/09/j2eeweblogic-performance-tuning-with.html' title='J2EE/WebLogic Performance Tuning Project'/><author><name>Raymond Tay</name><uri>https://profiles.google.com/110897033209560123249</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' 
src='//lh4.googleusercontent.com/-4ZP-rPmLNSw/AAAAAAAAAAI/AAAAAAAABao/8f4AuUXNrno/s512-c/photo.jpg'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5021319479788698734.post-8148947921386564606</id><published>2008-09-30T09:24:00.000+08:00</published><updated>2008-09-30T09:21:56.794+08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Bug'/><category scheme='http://www.blogger.com/atom/ns#' term='C/C++'/><category scheme='http://www.blogger.com/atom/ns#' term='Solaris'/><title type='text'>"man" crashes on my Lenovo T61 OpenSolaris snv_95</title><content type='html'>I didn't expect "&lt;a href="http://en.wikipedia.org/wiki/Manual_page_%28Unix%29"&gt;man&lt;/a&gt;" to crash on me today. But it's an interesting problem, because when I tried the command that crashed on my &lt;span style="font-style: italic; color: rgb(51, 102, 255);"&gt;OpenSolaris snv_95&lt;/span&gt; on another machine running Solaris (&lt;span style="font-style: italic; color: rgb(51, 102, 255);"&gt;SunOS raymond 5.10 Generic_118855-36 i86pc i386 i86pc&lt;/span&gt;), it didn't crash but instead gave me an error message "&lt;span style="font-style: italic; color: rgb(51, 102, 255);"&gt;No manual entry for make&lt;/span&gt;", which is good since it's more friendly to the user, and I found similar behavior on &lt;a href="http://www.ubuntu.com/"&gt;Ubuntu&lt;/a&gt; Linux.&lt;br /&gt;&lt;br /&gt;After investigating for a while, I believe it's a bug. The command I attempted was&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;man -M /usr/bin/man make&lt;br /&gt;&lt;/pre&gt;You will notice that I made a mistake, and it's intentional. 
"&lt;span style="font-style: italic; color: rgb(51, 102, 255);"&gt;-M&lt;/span&gt;" is supposed to take a directory path, but in this case I gave the absolute path to a file (which is executable, btw), and when I ran it on my OpenSolaris it generated a coredump.&lt;br /&gt;&lt;br /&gt;After examining the code + coredump, it appears to me that the crash happens because the program attempted to release memory via "free" through a pointer that had not been assigned previously. I filed a bug report with OpenSolaris and will keep you updated when I have news.&lt;script type="text/javascript"&gt;&lt;br /&gt;var gaJsHost = (("https:" == document.location.protocol) ? "https://ssl." : "http://www.");&lt;br /&gt;document.write(unescape("%3Cscript src='" + gaJsHost + "google-analytics.com/ga.js' type='text/javascript'%3E%3C/script%3E"));&lt;br /&gt;&lt;/script&gt;&lt;br /&gt;&lt;br /&gt;This is an update on this post. A bug id has been issued; click &lt;a href="http://bugs.opensolaris.org/view_bug.do?bug_id=6750055"&gt;here&lt;/a&gt; for more details.&lt;br /&gt;&lt;script type="text/javascript"&gt;&lt;br /&gt;var pageTracker = _gat._getTracker("UA-5323400-1");&lt;br /&gt;pageTracker._trackPageview();&lt;br /&gt;&lt;/script&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5021319479788698734-8148947921386564606?l=raymondtay.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://raymondtay.blogspot.com/feeds/8148947921386564606/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5021319479788698734&amp;postID=8148947921386564606&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5021319479788698734/posts/default/8148947921386564606'/><link rel='self' type='application/atom+xml' 
href='http://www.blogger.com/feeds/5021319479788698734/posts/default/8148947921386564606'/><link rel='alternate' type='text/html' href='http://raymondtay.blogspot.com/2008/09/man-crashes-on-my-lenovo-t61.html' title='&quot;man&quot; crashes on my Lenovo T61 OpenSolaris snv_95'/><author><name>Raymond Tay</name><uri>https://profiles.google.com/110897033209560123249</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh4.googleusercontent.com/-4ZP-rPmLNSw/AAAAAAAAAAI/AAAAAAAABao/8f4AuUXNrno/s512-c/photo.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5021319479788698734.post-1739839129550776172</id><published>2008-09-12T11:20:00.012+08:00</published><updated>2008-09-15T09:20:03.436+08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Java'/><category scheme='http://www.blogger.com/atom/ns#' term='OpenJDK'/><title type='text'>Building OpenJDK 7</title><content type='html'>I guess a lot of people already know that &lt;a href="http://www.sun.com/"&gt;Sun Microsystems&lt;/a&gt; has released the source code for &lt;a href="http://www.java.sun.com/"&gt;Java&lt;/a&gt;. Well, I tried my hand at compiling &lt;a href="http://download.java.net/openjdk/jdk7/"&gt;OpenJDK 7&lt;/a&gt;, and it's important that you read the installation manual before you attempt to compile it. I did it because I wanted to try out the new features of the JVM and also to understand the build-release process. Of course, I am still learning, and this post is about my experience of compiling it; I hope it provides useful information on building your own, customizing it, fixing bugs if you find any etc.&lt;br /&gt;&lt;br /&gt;What I did first was to read the instructions found &lt;a href="http://hg.openjdk.java.net/jdk7/jdk7/raw-file/tip/README-builds.html"&gt;here&lt;/a&gt;. Read them a couple of times to understand exactly what you need to do. 
In my case, here's what I did&lt;br /&gt;&lt;br /&gt;(1) Alter my .bashrc file to include the environment variables needed&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&lt;span style="font-family: Georgia,serif;"&gt;ALT_BINARY_PLUGS_PATH=/export/home/tayboonl/build_jdk/openjdk-binary-plugs&lt;br /&gt;ANT_HOME=/export/home/tayboonl/apache-ant-1.7.1&lt;br /&gt;ALT_COMPILER_PATH=/opt/SunStudioExpress/bin&lt;br /&gt;ALT_GCC_COMPILER_PATH=/usr/bin/&lt;br /&gt;ALT_CUPS_HEADERS_PATH=/usr/include&lt;br /&gt;ALT_JDK_IMPORT_PATH=/usr/jdk/jdk1.6.0_06&lt;br /&gt;LANG=C&lt;br /&gt;&lt;br /&gt;export ALT_BINARY_PLUGS_PATH&lt;br /&gt;export ALT_CUPS_HEADERS_PATH&lt;br /&gt;export ALT_COMPILER_PATH&lt;br /&gt;export ALT_GCC_COMPILER_PATH&lt;br /&gt;export ALT_JDK_IMPORT_PATH&lt;br /&gt;export ANT_HOME&lt;br /&gt;export LANG&lt;br /&gt;&lt;br /&gt;&lt;/span&gt;&lt;/pre&gt;&lt;br /&gt;(2) Invoke the sanity check as defined in the build instructions&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;gmake sanity ARCH_DATA_MODEL=32&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Note&lt;/span&gt;: It's important to fix all the warnings and errors before proceeding. If in doubt, check the openjdk forums. Remember to set the ALT_* variables; they are pretty critical to the success of the build. 
Also remember to install &lt;a href="http://findbugs.sourceforge.net/"&gt;findbugs&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;(3) Re-run the sanity checks till everything is fixed, then start the build (from the command below, you can tell I am building the JDK for 32-bit instead of 64-bit)&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;gmake ARCH_DATA_MODEL=32&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;(4) Wait a while...grab a coffee...grab a bagel...sleep a little&lt;br /&gt;If you have reached this stage, you will probably find that your terminal screen is scrolling away, building, compiling etc, and you might hit this error&lt;br /&gt;&lt;br /&gt;"&lt;span&gt;Error: ia_nice is not a member of iaparms."&lt;br /&gt;&lt;br /&gt;This is bug &lt;a href="http://bugs.opensolaris.org/view_bug.do?bug_id=6712505"&gt;6712505&lt;/a&gt;, and you can comment out the offending line. I wonder why the folks at Sun didn't take this away...&lt;br /&gt;&lt;br /&gt;The next thing I encountered was a build error complaining that it cannot find the files "&lt;span style="font-style: italic;"&gt;sys/audio.h&lt;/span&gt;", "&lt;span style="font-style: italic;"&gt;sys/audioio.h&lt;/span&gt;" and "&lt;span style="font-style: italic;"&gt;sys/mixer.h&lt;/span&gt;", so I fetched the 3 files from &lt;a href="http://src.opensolaris.org"&gt;src.opensolaris.org&lt;/a&gt; and placed them into their proper directories.&lt;br /&gt;&lt;br /&gt;After that, re-running the build took another hit: it failed to locate the file "X11/Intrinsic.h" and some other header files from the X11 package, which on OpenSolaris is known as&lt;span style="font-style: italic; color: rgb(51, 102, 255);"&gt; SUNWxwinc&lt;/span&gt;. 
Well, here's the irritating part: the OpenSolaris package manager tells you that the files are installed, but when I visited the folder(s) I simply couldn't find them, so I had to re-install the package &lt;span style="font-style: italic; color: rgb(51, 102, 255);"&gt;SUNWxwinc&lt;/span&gt;, after which the header files were there.&lt;br /&gt;&lt;br /&gt;(5) Finally, the entire build process ran without any further disappointments and I saw IT, the message&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;gmake[2]: Leaving directory `/export/home/tayboonl/build_jdk/openjdk/jdk/make'&lt;br /&gt;gmake[1]: Leaving directory `/export/home/tayboonl/build_jdk/openjdk'&lt;br /&gt;Control solaris i586 1.7.0-internal build_product_image build finished:&lt;br /&gt;Control solaris i586 1.7.0-internal all_product_build build finished:&lt;br /&gt;Control solaris i586 1.7.0-internal all build finished:&lt;br /&gt;tayboonl@opensolaris:~/build_jdk/openjdk$&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Next, run the freshly built java from the command line, and when you see something like&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;tayboonl@opensolaris:~/build_jdk/openjdk/build/solaris-i586$ ./j2sdk-image/bin/java -version&lt;br /&gt;openjdk version "1.7.0-internal"&lt;br /&gt;OpenJDK Runtime Environment (build 1.7.0-internal-tayboonl_2008_09_12_11_16-b00)&lt;br /&gt;OpenJDK Server VM (build 14.0-b04, mixed mode)&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;you know you are DONE. Woo Hoo! What a relief. Now I hope the next build is not going to make me weep even further, but it's best to test out the build with the demo apps found in your latest build. 
Here's a snapshot of the ArcTest demo found after completing the build&lt;br /&gt;&lt;/span&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_n2HkB0XD3Kw/SMoFRCtZIAI/AAAAAAAAApI/JWalWQ87sxw/s1600-h/OpenJDK.png"&gt;&lt;img style="cursor: pointer;" src="http://2.bp.blogspot.com/_n2HkB0XD3Kw/SMoFRCtZIAI/AAAAAAAAApI/JWalWQ87sxw/s320/OpenJDK.png" alt="" id="BLOGGER_PHOTO_ID_5245010506369540098" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;The last thing I did was to take a thread dump of the JVM, and from the output it appears that it's fine&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;2008-09-12 14:00:35&lt;br /&gt;Full thread dump OpenJDK Server VM (14.0-b04 mixed mode):&lt;br /&gt;&lt;br /&gt;"TimerQueue" daemon prio=3 tid=0x08320c00 nid=0x11 waiting on condition [0xb6d7e000..0xb6d7ebf0]&lt;br /&gt; java.lang.Thread.State: WAITING (parking)&lt;br /&gt; at sun.misc.Unsafe.park(Native Method)&lt;br /&gt; - parking to wait for  &amp;lt;0xf3d65360&amp;gt; (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)&lt;br /&gt; at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)&lt;br /&gt; at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1974)&lt;br /&gt; at java.util.concurrent.DelayQueue.take(DelayQueue.java:209)&lt;br /&gt; at javax.swing.TimerQueue.run(TimerQueue.java:170)&lt;br /&gt; at java.lang.Thread.run(Thread.java:674)&lt;br /&gt;&lt;br /&gt;...&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;Hopefully, what I have done goes to show that you can do it too. Now it's time for lunch.&lt;script type="text/javascript"&gt;&lt;br /&gt;var gaJsHost = (("https:" == document.location.protocol) ? "https://ssl." 
: "http://www.");&lt;br /&gt;document.write(unescape("%3Cscript src='" + gaJsHost + "google-analytics.com/ga.js' type='text/javascript'%3E%3C/script%3E"));&lt;br /&gt;&lt;/script&gt;&lt;br /&gt;&lt;script type="text/javascript"&gt;&lt;br /&gt;var pageTracker = _gat._getTracker("UA-5323400-1");&lt;br /&gt;pageTracker._trackPageview();&lt;br /&gt;&lt;/script&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5021319479788698734-1739839129550776172?l=raymondtay.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://raymondtay.blogspot.com/feeds/1739839129550776172/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5021319479788698734&amp;postID=1739839129550776172&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5021319479788698734/posts/default/1739839129550776172'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5021319479788698734/posts/default/1739839129550776172'/><link rel='alternate' type='text/html' href='http://raymondtay.blogspot.com/2008/09/building-openjdk-7.html' title='Building OpenJDK 7'/><author><name>Raymond Tay</name><uri>https://profiles.google.com/110897033209560123249</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh4.googleusercontent.com/-4ZP-rPmLNSw/AAAAAAAAAAI/AAAAAAAABao/8f4AuUXNrno/s512-c/photo.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/_n2HkB0XD3Kw/SMoFRCtZIAI/AAAAAAAAApI/JWalWQ87sxw/s72-c/OpenJDK.png' height='72' 
width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5021319479788698734.post-7494825234368742641</id><published>2008-09-02T21:23:00.005+08:00</published><updated>2008-09-15T09:19:34.143+08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='C/C++'/><category scheme='http://www.blogger.com/atom/ns#' term='Solaris'/><title type='text'>Rules for Writing 64-bit Clean Code</title><content type='html'>I lifted this from the book "&lt;a href="http://www.amazon.com/Solaris-Systems-Programming-paperback-Rich/dp/0768682231/ref=pd_bbs_sr_1?ie=UTF8&amp;amp;s=books&amp;amp;qid=1220361922&amp;amp;sr=8-1"&gt;Solaris Systems Programming&lt;/a&gt;" by &lt;a href="http://www.rite-group.com/rich/"&gt;Rich Teer&lt;/a&gt;. It is good advice, so no surprises there. If you know that Rich Teer would be uncomfortable with this, let me know and I will gladly take it down.&lt;br /&gt;&lt;br /&gt;The following rules should be observed to ensure the 64-bit cleanliness of our code. Following these rules will also make it easier to port 32-bit code to the 64-bit model.&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Do not assume a pointer and an "int" are the same size. Unfortunately, a lot of code relies on this assumption, because it is true in the ILP32 model. Pointers are sometimes cast to "int"s or "unsigned int"s to perform address arithmetic. Instead, they should be cast to "long" (or "unsigned long") because pointers and "long"s are the same size in both the ILP32 and LP64 data type models. Even better is to cast pointers to "uintptr_t"s, because it expresses the intent more clearly, and makes the code more portable.&lt;/li&gt;&lt;li&gt;Do not make assumptions about the relative sizes of variable types. A classic example of this is to assume that the size of an "int" is the same as the size of a "long" and use them indiscriminately while implicitly or explicitly assuming they are interchangeable. 
Although this is typically true for 32-bit processes, it is not true for 64-bit ones.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Be wary of sign extension problems. This is quite a common problem when converting code to 64 bits, and is hard to detect before it actually occurs because "lint" doesn't warn us about it. Also, the type conversion and promotion rules are quite obscure. Hence, we should use explicit casting to fix sign extension problems.&lt;/li&gt;&lt;li&gt;Use pointer arithmetic rather than address arithmetic. As well as leading to cleaner code, pointer arithmetic is independent of the data model...&lt;/li&gt;&lt;li&gt;By default, the compiler assumes that external variables and functions are, or return, an "int" unless we declare them otherwise.... In the ILP32 model this is OK, but in the LP64 model we will lose information, which can cause the program to crash because of an illegal memory reference.&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;There's a lot more where that came from, so I suggest you buy a copy (new or used, it doesn't matter)&lt;script type="text/javascript"&gt;&lt;br /&gt;var gaJsHost = (("https:" == document.location.protocol) ? "https://ssl." 
: "http://www.");&lt;br /&gt;document.write(unescape("%3Cscript src='" + gaJsHost + "google-analytics.com/ga.js' type='text/javascript'%3E%3C/script%3E"));&lt;br /&gt;&lt;/script&gt;&lt;br /&gt;&lt;script type="text/javascript"&gt;&lt;br /&gt;var pageTracker = _gat._getTracker("UA-5323400-1");&lt;br /&gt;pageTracker._trackPageview();&lt;br /&gt;&lt;/script&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5021319479788698734-7494825234368742641?l=raymondtay.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://raymondtay.blogspot.com/feeds/7494825234368742641/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5021319479788698734&amp;postID=7494825234368742641&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5021319479788698734/posts/default/7494825234368742641'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5021319479788698734/posts/default/7494825234368742641'/><link rel='alternate' type='text/html' href='http://raymondtay.blogspot.com/2008/09/rules-for-writing-64-bit-clean-code.html' title='Rules for Writing 64-bit Clean Code'/><author><name>Raymond Tay</name><uri>https://profiles.google.com/110897033209560123249</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh4.googleusercontent.com/-4ZP-rPmLNSw/AAAAAAAAAAI/AAAAAAAABao/8f4AuUXNrno/s512-c/photo.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5021319479788698734.post-594823502441594562</id><published>2008-08-30T20:03:00.005+08:00</published><updated>2008-09-15T09:19:09.775+08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Administration'/><category 
scheme='http://www.blogger.com/atom/ns#' term='OpenSolaris'/><category scheme='http://www.blogger.com/atom/ns#' term='Solaris'/><title type='text'>Update OpenSolaris using Boot Environment Administration tool</title><content type='html'>I was updating my &lt;a href="http://www.opensolaris.org"&gt;OpenSolaris&lt;/a&gt; from snv_93 to snv_95 when all of a sudden my wireless connection dropped, leaving the update incomplete, so the next thing to do was to remove the half-built boot environment. I removed it by first unmounting its file systems and then invoking &lt;span style="font-style: italic; color: rgb(51, 102, 255);"&gt;beadm&lt;/span&gt; as follows; otherwise it would be pretty irritating to see it every time I reboot my laptop.&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;-bash-3.2# umount rpool/ROOT/opensolaris-5/opt&lt;br /&gt;-bash-3.2# umount rpool/ROOT/opensolaris-5&lt;br /&gt;....&lt;br /&gt;-bash-3.2# beadm destroy opensolaris-5&lt;br /&gt;Are you sure you want to destroy opensolaris-5? This action cannot be undone (y/[n]):&lt;br /&gt;y&lt;br /&gt;The BE that was just destroyed was the 'active on boot' BE. opensolaris-4 is now the 'active on boot' BE. Use 'beadm activate' to change it.&lt;br /&gt;...&lt;br /&gt;-bash-3.2# beadm list&lt;br /&gt;&lt;br /&gt;BE            Active Active on Mountpoint Space&lt;br /&gt;Name                 reboot               Used&lt;br /&gt;----          ------ --------- ---------- -----&lt;br /&gt;opensolaris   no     no        -          66.89M&lt;br /&gt;opensolaris-4 yes    yes       /          12.54G&lt;br /&gt;...&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;If you don't unmount it first, you are going to get error messages like "&lt;span style="font-style: italic; color: rgb(51, 102, 255);"&gt;Unable to destroy XXX&lt;/span&gt;"&lt;script type="text/javascript"&gt;&lt;br /&gt;var gaJsHost = (("https:" == document.location.protocol) ? "https://ssl." 
: "http://www.");&lt;br /&gt;document.write(unescape("%3Cscript src='" + gaJsHost + "google-analytics.com/ga.js' type='text/javascript'%3E%3C/script%3E"));&lt;br /&gt;&lt;/script&gt;&lt;br /&gt;&lt;script type="text/javascript"&gt;&lt;br /&gt;var pageTracker = _gat._getTracker("UA-5323400-1");&lt;br /&gt;pageTracker._trackPageview();&lt;br /&gt;&lt;/script&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5021319479788698734-594823502441594562?l=raymondtay.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://raymondtay.blogspot.com/feeds/594823502441594562/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5021319479788698734&amp;postID=594823502441594562&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5021319479788698734/posts/default/594823502441594562'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5021319479788698734/posts/default/594823502441594562'/><link rel='alternate' type='text/html' href='http://raymondtay.blogspot.com/2008/08/update-opensolaris-using-boot.html' title='Update OpenSolaris using Boot Environment Administration tool'/><author><name>Raymond Tay</name><uri>https://profiles.google.com/110897033209560123249</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh4.googleusercontent.com/-4ZP-rPmLNSw/AAAAAAAAAAI/AAAAAAAABao/8f4AuUXNrno/s512-c/photo.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5021319479788698734.post-4167710032600860137</id><published>2008-08-22T20:12:00.005+08:00</published><updated>2008-09-15T09:18:44.936+08:00</updated><title type='text'>Quitting my day job</title><content type='html'>&lt;div&gt;I was hanging around 
a &lt;a href="http://www.starbucks.com.sg/"&gt;Starbucks &lt;/a&gt;cafe today doing my usual research, browsing the internet and reading my "Learning Python" O'Reilly book, when I was approached by a Caucasian male and something really funny happened&lt;br /&gt;&lt;br /&gt;Caucasian: I am sorry to bother you but I need to ask you a favour&lt;br /&gt;Me: Errr...sure, what can I do for you?&lt;br /&gt;Caucasian: I bought a USB hard drive from "&lt;a href="http://www.funan.com.sg/"&gt;funan centre&lt;/a&gt;" (here's a picture of the IT super mall a.k.a "Funan Centre") &lt;/div&gt;&lt;div&gt;&lt;a href="http://3.bp.blogspot.com/_n2HkB0XD3Kw/SK6yrEb-gJI/AAAAAAAAApA/z5cZVD0oI5Q/s1600-h/funan_centre.jpg"&gt;&lt;img id="BLOGGER_PHOTO_ID_5237319869673668754" style="CURSOR: hand" alt="" src="http://3.bp.blogspot.com/_n2HkB0XD3Kw/SK6yrEb-gJI/AAAAAAAAApA/z5cZVD0oI5Q/s320/funan_centre.jpg" border="0" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div&gt; &lt;/div&gt;&lt;div&gt;and wanted to test it out&lt;br /&gt;Me: Errr...*puzzled look* (which translates to: didn't the store test it out for you?)&lt;br /&gt;...&lt;br /&gt;Subsequently, we tested it out and chatted about his travels and what he liked/disliked about Singapore (my country)&lt;br /&gt;...&lt;br /&gt;... 5 mins later (yeah no doubt! all that happened in 5 mins)&lt;br /&gt;Caucasian: Ahhh...yes I can see the drive and there's a song&lt;br /&gt;Me: (I looked at the song and recognized it to be Rihanna's "Umbrella") and immediately volunteered to hum a couple of lines&lt;br /&gt;Caucasian: *&lt;span style="color:#3366ff;"&gt;giggle&lt;/span&gt;* and finally broke out into a ROAR!&lt;br /&gt;Caucasian: Please...Please...&lt;strong&gt;DON'T QUIT YOUR DAY JOB&lt;/strong&gt;&lt;br /&gt;...&lt;br /&gt;&lt;br /&gt;And we both broke out into a good laugh!&lt;/div&gt;&lt;script type="text/javascript"&gt;&lt;br /&gt;var gaJsHost = (("https:" == document.location.protocol) ? "https://ssl." 
: "http://www.");&lt;br /&gt;document.write(unescape("%3Cscript src='" + gaJsHost + "google-analytics.com/ga.js' type='text/javascript'%3E%3C/script%3E"));&lt;br /&gt;&lt;/script&gt;&lt;br /&gt;&lt;script type="text/javascript"&gt;&lt;br /&gt;var pageTracker = _gat._getTracker("UA-5323400-1");&lt;br /&gt;pageTracker._trackPageview();&lt;br /&gt;&lt;/script&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5021319479788698734-4167710032600860137?l=raymondtay.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://raymondtay.blogspot.com/feeds/4167710032600860137/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5021319479788698734&amp;postID=4167710032600860137&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5021319479788698734/posts/default/4167710032600860137'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5021319479788698734/posts/default/4167710032600860137'/><link rel='alternate' type='text/html' href='http://raymondtay.blogspot.com/2008/08/quitting-my-day-job.html' title='Quitting my day job'/><author><name>Raymond Tay</name><uri>https://profiles.google.com/110897033209560123249</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh4.googleusercontent.com/-4ZP-rPmLNSw/AAAAAAAAAAI/AAAAAAAABao/8f4AuUXNrno/s512-c/photo.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/_n2HkB0XD3Kw/SK6yrEb-gJI/AAAAAAAAApA/z5cZVD0oI5Q/s72-c/funan_centre.jpg' height='72' 
width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5021319479788698734.post-7345596464853390221</id><published>2008-08-16T09:59:00.005+08:00</published><updated>2008-08-19T12:29:33.992+08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Pydoc'/><category scheme='http://www.blogger.com/atom/ns#' term='Python'/><title type='text'>OpenSolaris Pydoc browser cannot be launched</title><content type='html'>I got inspired by my &lt;a href="http://chihungchan.blogspot.com"&gt;friend&lt;/a&gt; to learn &lt;a href="http://www.python.org"&gt;Python&lt;/a&gt;, and I am beginning to appreciate its power and flexibility; it also has elements from another favourite of mine, &lt;a href="http://www.erlang.org"&gt;Erlang&lt;/a&gt;. So, following his example, I got a book called "&lt;a href="http://www.amazon.com/Learning-Python-3rd-Mark-Lutz/dp/0596513984/ref=pd_bbs_sr_1?ie=UTF8&amp;amp;s=books&amp;amp;qid=1218852680&amp;amp;sr=8-1"&gt;Learning Python - 3rd Edition&lt;/a&gt;" and began exploring the language. When I tried to launch pydoc (FYI, it's a tool for displaying the available APIs), the script told me that it couldn't find a browser.&lt;br /&gt;&lt;br /&gt;Here is the stack trace. In a Python traceback the most recent call comes last, so the root cause is found at the bottom (look at the lines in bold)&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;tayboonl@opensolaris:~/Desktop$ Exception in Tkinter callback&lt;br /&gt;Traceback (most recent call last):&lt;br /&gt;File "/usr/lib/python2.4/lib-tk/Tkinter.py", line 1345, in __call__&lt;br /&gt;  return self.func(*args)&lt;br /&gt;File "/usr/lib/python2.4/pydoc.py", line 2086, in open&lt;br /&gt;  webbrowser.open(url)&lt;br /&gt;File "/usr/lib/python2.4/webbrowser.py", line 43, in open&lt;br /&gt;  get().open(url, new, autoraise)&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;File "/usr/lib/python2.4/webbrowser.py", line 38, in get&lt;/span&gt;&lt;br /&gt;&lt;span style="font-weight: 
bold;"&gt;   raise Error("could not locate runnable browser")&lt;/span&gt;&lt;br /&gt;Error: could not locate runnable browser&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;So I looked at the file &lt;span style="font-style: italic; font-weight: bold;"&gt;/usr/lib/python2.4/webbrowser.py&lt;/span&gt; and discovered that it tries a list of known browser commands in turn. In my case I simply added the string "firefox" to the list and it worked. *A quick hack*. Here's the code snippet that I edited in the file.&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;   # X browsers have more in the way of options&lt;br /&gt;   if os.environ.get("DISPLAY"):&lt;br /&gt;       _tryorder = ["galeon", "skipstone", &lt;span style="font-weight: bold;"&gt;"firefox",&lt;/span&gt;&lt;br /&gt;                    "mozilla-firefox", "mozilla-firebird", "mozilla", "netscape",&lt;br /&gt;                    "kfm", "grail"] + _tryorder&lt;br /&gt;&lt;/pre&gt;&lt;script type="text/javascript"&gt;&lt;br /&gt;var gaJsHost = (("https:" == document.location.protocol) ? "https://ssl." 
: "http://www.");&lt;br /&gt;document.write(unescape("%3Cscript src='" + gaJsHost + "google-analytics.com/ga.js' type='text/javascript'%3E%3C/script%3E"));&lt;br /&gt;&lt;/script&gt;&lt;br /&gt;&lt;script type="text/javascript"&gt;&lt;br /&gt;var pageTracker = _gat._getTracker("UA-5323400-1");&lt;br /&gt;pageTracker._trackPageview();&lt;br /&gt;&lt;/script&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5021319479788698734-7345596464853390221?l=raymondtay.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://raymondtay.blogspot.com/feeds/7345596464853390221/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5021319479788698734&amp;postID=7345596464853390221&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5021319479788698734/posts/default/7345596464853390221'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5021319479788698734/posts/default/7345596464853390221'/><link rel='alternate' type='text/html' href='http://raymondtay.blogspot.com/2008/08/opensolaris-pydoc-browser-cannot-be.html' title='OpenSolaris Pydoc browser cannot be launched'/><author><name>Raymond Tay</name><uri>https://profiles.google.com/110897033209560123249</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh4.googleusercontent.com/-4ZP-rPmLNSw/AAAAAAAAAAI/AAAAAAAABao/8f4AuUXNrno/s512-c/photo.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5021319479788698734.post-3306022829243587489</id><published>2008-08-08T10:53:00.004+08:00</published><updated>2008-08-19T12:30:00.126+08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='MDB'/><category 
scheme='http://www.blogger.com/atom/ns#' term='Dtrace'/><title type='text'>DTrace and Tail-Call Optimization</title><content type='html'>Many C compilers today perform an optimization known as &lt;span style="font-style: italic; color: rgb(51, 102, 255);"&gt;tail-call optimization&lt;/span&gt;, and you need to pay careful attention to it when using DTrace's &lt;span style="font-style: italic; color: rgb(51, 102, 255);"&gt;Function Boundary Tracing&lt;/span&gt; (fbt) provider. A common application of it is &lt;a href="http://en.wikipedia.org/wiki/Tail_call"&gt;tail-recursion&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;Having said that, you should be aware that it occurs more often on SPARC systems than on x86 Solaris systems. How can you tell whether you have run into a tail-call optimized function?&lt;br /&gt;&lt;br /&gt;In an fbt return probe, the variable &lt;span style="font-style: italic; color: rgb(51, 102, 255);"&gt;arg0&lt;/span&gt; contains the offset, within the function, of the returning instruction. So in your scripts you need an extra statement that traces this variable, and you then check whether the instruction at that offset is a plain &lt;span style="color: rgb(51, 102, 255); font-style: italic;"&gt;return&lt;/span&gt; instruction or something else. Otherwise you might get a nasty shock when you evaluate a value thinking it was the function's result, and in some cases you will produce misleading information.&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;# dtrace -n fbt::squeue*:return'{printf("%s, 0x%x\n", probefunc, arg0);}'&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;and you should see something like&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;squeue_enter_chain,0x3a5&lt;br /&gt;squeue_enter,0x47a&lt;br /&gt;squeue_enter,0x47a&lt;br /&gt;squeue_fire,0x85&lt;br /&gt;^C&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Next, use mdb or your favourite disassembler to look at the instruction at that offset in the function. 
In my case, I didn't find any tail-call optimized functions since I saw the &lt;span style="color: rgb(51, 102, 255); font-style: italic;"&gt;ret&lt;/span&gt; instruction, but in your case you might. Take care over this as well when you are tracing Tcl/Python code, since there are providers for them now.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5021319479788698734-3306022829243587489?l=raymondtay.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://raymondtay.blogspot.com/feeds/3306022829243587489/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5021319479788698734&amp;postID=3306022829243587489&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5021319479788698734/posts/default/3306022829243587489'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5021319479788698734/posts/default/3306022829243587489'/><link rel='alternate' type='text/html' href='http://raymondtay.blogspot.com/2008/08/dtrace-and-tail-call-optimization.html' title='DTrace and Tail-Call Optimization'/><author><name>Raymond Tay</name><uri>https://profiles.google.com/110897033209560123249</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' 
src='//lh4.googleusercontent.com/-4ZP-rPmLNSw/AAAAAAAAAAI/AAAAAAAABao/8f4AuUXNrno/s512-c/photo.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5021319479788698734.post-6007994125598957816</id><published>2008-08-06T14:00:00.003+08:00</published><updated>2008-09-15T09:17:59.088+08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Solaris'/><category scheme='http://www.blogger.com/atom/ns#' term='Dtrace'/><title type='text'>Solaris 10 Network Stack</title><content type='html'>The new implementation in the latest Solaris 10 and OpenSolaris is different and faster. Click &lt;a href="http://www.sun.com/bigadmin/features/articles/solaris_networking.jsp"&gt;here&lt;/a&gt; for details. Alternatively, download a pdf version of it &lt;a href="http://onesearch.sun.com/search/clickthru?qt=%2Bsolaris+%2Bnetwork+%2Bstack&amp;amp;col=community-bigadmin&amp;amp;cksum=eac5a4fb0959d66a8e7754d2e12980ea&amp;amp;url=http%3A%2F%2Fwww.sun.com%2Fbigadmin%2Fcontent%2Fnetworkperf%2FFireEngine_WP.pdf&amp;amp;path=%2Fsearch%2Fonesearch%2Findex.jsp&amp;amp;hit=3"&gt;here&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;Here's a picture of the latest architecture&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_n2HkB0XD3Kw/SJqNUfz0RUI/AAAAAAAAAoU/8atOyc7ouqg/s1600-h/network_magic_fig1.gif"&gt;&lt;img style="margin: 0pt 10px 10px 0pt; float: left; cursor: pointer;" src="http://2.bp.blogspot.com/_n2HkB0XD3Kw/SJqNUfz0RUI/AAAAAAAAAoU/8atOyc7ouqg/s320/network_magic_fig1.gif" alt="" id="BLOGGER_PHOTO_ID_5231649300419986754" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;script type="text/javascript"&gt;&lt;br /&gt;var gaJsHost = (("https:" == document.location.protocol) ? "https://ssl." 
: "http://www.");&lt;br /&gt;document.write(unescape("%3Cscript src='" + gaJsHost + "google-analytics.com/ga.js' type='text/javascript'%3E%3C/script%3E"));&lt;br /&gt;&lt;/script&gt;&lt;br /&gt;&lt;script type="text/javascript"&gt;&lt;br /&gt;var pageTracker = _gat._getTracker("UA-5323400-1");&lt;br /&gt;pageTracker._trackPageview();&lt;br /&gt;&lt;/script&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;The new implementation is built on re-using the &lt;a href="http://docs.sun.com/app/docs/doc/816-4855/6mb1p1r1r?l=en&amp;amp;a=view&amp;amp;q=STREAMS"&gt;Solaris STREAMS&lt;/a&gt; framework found in the pre-Solaris 10 OS. Using the picture below, you can see how its evolved through the versions and after reading the PDF, i believe you would have a better idea of this network implementation.&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_n2HkB0XD3Kw/SJqRExXyxtI/AAAAAAAAAoc/jbALZiIUz64/s1600-h/FireEngineDesign.png"&gt;&lt;img style="cursor: pointer;" src="http://2.bp.blogspot.com/_n2HkB0XD3Kw/SJqRExXyxtI/AAAAAAAAAoc/jbALZiIUz64/s320/FireEngineDesign.png" alt="" id="BLOGGER_PHOTO_ID_5231653428302890706" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;And DTrace currently supports the tracing of this and here's a sample output of tracing it on my OpenSolaris 2-CPU machine&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;tayboonl@opensolaris:~/SunStudioProjects/TimeServer# dtrace -s ./trace.d -c ./dist/Debug/Sun12-Solaris-x86/timeserver&lt;br /&gt;dtrace: script './trace.d' matched 2 probes&lt;br /&gt;CPU FUNCTION                              &lt;br /&gt;1  -&amp;gt; squeue_getprivate                     Bound to CPU(1),Squeue Name:ip_squeue_cpu_1/1/0, Thread ID:380&lt;br /&gt;&lt;br /&gt;           
ip`tcp_get_conn+0x22&lt;br /&gt;           ip`tcp_open+0x1c6&lt;br /&gt;           ip`tcp_openv4+0x24&lt;br /&gt;           genunix`qattach+0x160&lt;br /&gt;           genunix`stropen+0x490&lt;br /&gt;           sockfs`socktpi_open+0xac&lt;br /&gt;           sockfs`sotpi_create+0x11e&lt;br /&gt;           sockfs`so_socket+0x146&lt;br /&gt;           unix`_sys_sysenter_post_swapgs+0x14b&lt;br /&gt;OS Thread ID:63619&lt;br /&gt;&lt;br /&gt;0  -&amp;gt; squeue_getprivate                     Bound to CPU(0),Squeue Name:ip_squeue_cpu_0/0/0, Thread ID:290&lt;br /&gt;&lt;br /&gt;           ip`tcp_time_wait_collector+0x20&lt;br /&gt;           genunix`callout_execute+0xbf&lt;br /&gt;           genunix`taskq_thread+0x1a7&lt;br /&gt;           unix`thread_start+0x8&lt;br /&gt;OS Thread ID:34&lt;br /&gt;&lt;br /&gt;1  -&amp;gt; squeue_getprivate                     Bound to CPU(1),Squeue Name:ip_squeue_cpu_1/1/0, Thread ID:380&lt;br /&gt;&lt;br /&gt;           ip`tcp_time_wait_collector+0x20&lt;br /&gt;           genunix`callout_execute+0xbf&lt;br /&gt;           genunix`taskq_thread+0x1a7&lt;br /&gt;           unix`thread_start+0x8&lt;br /&gt;OS Thread ID:35&lt;br /&gt;...&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;What you are seeing is a 1:1:1 ratio of thread:CPU:squeue. For example, the squeue named ip_squeue_cpu_1/1/0 is bound to CPU 1 and its bound thread has ID 380. You can also see that the threads with IDs 34, 35 and 63619 interact with the two squeues.&lt;br /&gt;&lt;br /&gt;Here's another method you can try, using mdb and attaching to the process in question&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;gt; ::squeue&lt;br /&gt;          ADDR STATE CPU            FIRST             LAST           WORKER&lt;br /&gt;ffffff01ca044dc0 02060   1 0000000000000000 0000000000000000 ffffff0007ec2c80&lt;br /&gt;ffffff01ca044e80 02060   0 0000000000000000 0000000000000000 ffffff0007c25c80&lt;br /&gt;&amp;gt; ffffff0007ec2c80::thread&lt;br /&gt;       
   ADDR    STATE  FLG PFLG SFLG   PRI  EPRI PIL             INTR&lt;br /&gt;ffffff0007ec2c80 sleep       8    0    3    60     0   0              n/a&lt;br /&gt;&amp;gt; ffffff0007c25c80::thread&lt;br /&gt;          ADDR    STATE  FLG PFLG SFLG   PRI  EPRI PIL             INTR&lt;br /&gt;ffffff0007c25c80 sleep       8    0    3    60     0   0              n/a&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;Visit the &lt;a href="http://www.opensolaris.org/os"&gt;opensolaris&lt;/a&gt; website and browse through its &lt;a href="http://src.opensolaris.org/source/"&gt;source code&lt;/a&gt;, but most importantly have fun!&lt;script type="text/javascript"&gt;&lt;br /&gt;var gaJsHost = (("https:" == document.location.protocol) ? "https://ssl." : "http://www.");&lt;br /&gt;document.write(unescape("%3Cscript src='" + gaJsHost + "google-analytics.com/ga.js' type='text/javascript'%3E%3C/script%3E"));&lt;br /&gt;&lt;/script&gt;&lt;br /&gt;&lt;script type="text/javascript"&gt;&lt;br /&gt;var pageTracker = _gat._getTracker("UA-5323400-1");&lt;br /&gt;pageTracker._trackPageview();&lt;br /&gt;&lt;/script&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5021319479788698734-6007994125598957816?l=raymondtay.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://raymondtay.blogspot.com/feeds/6007994125598957816/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5021319479788698734&amp;postID=6007994125598957816&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5021319479788698734/posts/default/6007994125598957816'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5021319479788698734/posts/default/6007994125598957816'/><link rel='alternate' type='text/html' 
href='http://raymondtay.blogspot.com/2008/08/solaris-10-network-stack.html' title='Solaris 10 Network Stack'/><author><name>Raymond Tay</name><uri>https://profiles.google.com/110897033209560123249</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh4.googleusercontent.com/-4ZP-rPmLNSw/AAAAAAAAAAI/AAAAAAAABao/8f4AuUXNrno/s512-c/photo.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/_n2HkB0XD3Kw/SJqNUfz0RUI/AAAAAAAAAoU/8atOyc7ouqg/s72-c/network_magic_fig1.gif' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5021319479788698734.post-4448209364719526970</id><published>2008-07-13T12:50:00.012+08:00</published><updated>2008-09-15T09:12:45.394+08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Java'/><title type='text'>Java Generics - exploration ....</title><content type='html'>Coding with&lt;a href="http://java.sun.com/j2se/1.5.0/docs/guide/language/generics.html"&gt; Java generics &lt;/a&gt;(download this &lt;a href="http://java.sun.com/j2se/1.5/pdf/generics-tutorial.pdf"&gt;tutorial&lt;/a&gt; to gain an understanding) takes some time and experimentation to get right, and I discovered that one of the pitfalls is how easily the type parameters of a generic method and those of its enclosing class are mixed up. 
Here's what I did, and it took a Java bytecode viewer like &lt;a href="http://www.ej-technologies.com/products/jclasslib/overview.html"&gt;jclasslib&lt;/a&gt; to understand what was happening.&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&lt;br /&gt;----------- WeirdBox.java --------------------------&lt;br /&gt;&lt;br /&gt;package generics.box;&lt;br /&gt;&lt;br /&gt;public class WeirdBox&amp;lt;T, X, Y , E&amp;gt; extends PaperBox&amp;lt;T, X, Y &amp;gt; implements WeirdBoxProp {&lt;br /&gt;&lt;br /&gt;public E weird;&lt;br /&gt;&lt;br /&gt;WeirdBox(T id, X name, Y manu, E weird) {&lt;br /&gt;    super.changeId(id);&lt;br /&gt;    super.setManufacturer(manu);&lt;br /&gt;    super.setName(name);&lt;br /&gt;    this.weird = weird;&lt;br /&gt;}&lt;br /&gt;&lt;br /&gt;// isGlob(E t) &amp;amp; isTwistable(E t) are declared in the interface WeirdBoxProp&lt;br /&gt;public &amp;lt;E extends Boolean&amp;gt; E isGlob(E t) {&lt;br /&gt;// this.weird = t;&lt;br /&gt;// return this.weird;&lt;br /&gt;    return t;&lt;br /&gt;}&lt;br /&gt;&lt;br /&gt;public &amp;lt;E extends Boolean&amp;gt; E isTwistable(E t) {&lt;br /&gt;// this.weird = t;&lt;br /&gt;// return this.weird;&lt;br /&gt;    return t;&lt;br /&gt;}&lt;br /&gt;}&lt;br /&gt;&lt;br /&gt;----------- WeirdBoxProp.java --------------------------&lt;br /&gt;package generics.box;&lt;br /&gt;&lt;br /&gt;public interface WeirdBoxProp {&lt;br /&gt;&amp;lt;E extends Boolean&amp;gt; E isGlob(E t); // supposed to be a glob, so the implementing class must return true&lt;br /&gt;&amp;lt;E extends Boolean&amp;gt; E isTwistable(E t); // supposed to be twistable, so the implementing class must return true&lt;br /&gt;}&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;The thing to realize from this experiment is that the type parameter E used in the field declaration "&lt;span style="font-style: italic; color: rgb(51, 102, 255);"&gt;public E weird&lt;/span&gt;" in &lt;span style="font-style: italic; color: rgb(51, 102, 255);"&gt;WeirdBox.java&lt;/span&gt; is not related to the E declared in 
the methods isGlob(...) and isTwistable(...): each generic method declares its own E, which shadows the class-level one. This is evident when you decompile the Java classes; here's what I saw:&lt;br /&gt;&lt;br /&gt;&lt;span style="font-style: italic; color: rgb(51, 102, 255);"&gt;public E weird&lt;/span&gt; is translated to a field of type "&lt;span style="font-style: italic; color: rgb(51, 102, 255);"&gt;java.lang.Object&lt;/span&gt;"&lt;br /&gt;&lt;span style="font-style: italic; color: rgb(51, 102, 255);"&gt;isGlob(...)&lt;/span&gt; is translated to a method with the signature "&lt;span style="color: rgb(51, 102, 255); font-style: italic;"&gt;&amp;lt;(Ljava/lang/Boolean;)Ljava/lang/Boolean;&amp;gt;&lt;/span&gt;", and similarly for &lt;span style="font-style: italic; color: rgb(51, 102, 255);"&gt;isTwistable(...)&lt;/span&gt;. That explains why I had to comment out the statements "&lt;span style="font-style: italic; color: rgb(51, 102, 255);"&gt;this.weird = t; return this.weird;&lt;/span&gt;": the method's E and the field's E are different types, so the assignment isn't type-compatible.&lt;br /&gt;&lt;br /&gt;One way I found (please drop me an email if you find another) was to make the following changes in &lt;span style="font-style: italic; color: rgb(51, 102, 255);"&gt;WeirdBox.java&lt;/span&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;package generics.box;&lt;br /&gt;&lt;br /&gt;public class WeirdBox&amp;lt;T, X, Y, E1 extends Boolean&amp;gt; extends PaperBox&amp;lt;T, X, Y&amp;gt; implements WeirdBoxProp {&lt;br /&gt;&lt;br /&gt; public E1 weird;&lt;br /&gt;&lt;br /&gt; WeirdBox(T id, X name, Y manu, E1 weird) {&lt;br /&gt;     super.changeId(id);&lt;br /&gt;     super.setManufacturer(manu);&lt;br /&gt;     super.setName(name);&lt;br /&gt;     this.weird = weird;&lt;br /&gt; }&lt;br /&gt;&lt;br /&gt; public &amp;lt;E extends Boolean&amp;gt; E isGlob(E t) {&lt;br /&gt;     this.weird = (E1)t;&lt;br /&gt;     return (E)this.weird;&lt;br /&gt; }&lt;br /&gt;&lt;br /&gt; public &amp;lt;E extends Boolean&amp;gt; E isTwistable(E t) {&lt;br /&gt;     this.weird = (E1)t;&lt;br /&gt;   
  return (E)this.weird;&lt;br /&gt; }&lt;br /&gt;}&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Giving the class-level type parameter its own name (E1) makes it clear which parameter each declaration refers to; with the same name, the method-level E simply shadows it. You can see that I had to add some type casts to get it going, and if you compile it with "-Xlint" you will find that the compiler emits the following warning messages:&lt;br /&gt;&lt;br /&gt;&lt;blockquote&gt;found   : E&lt;br /&gt;required: E1&lt;br /&gt;      this.weird = (E1)t;&lt;br /&gt;found   : E1&lt;br /&gt;required: E&lt;br /&gt;      return (E)this.weird;&lt;/blockquote&gt;&lt;br /&gt;&lt;br /&gt;which is again weird, because I assumed that &lt;span style="font-style: italic; color: rgb(51, 102, 255);"&gt;E&lt;/span&gt; and &lt;span style="font-style: italic; color: rgb(51, 102, 255);"&gt;E1&lt;/span&gt; are now both of type java.lang.Boolean. They are only bounded by Boolean, though, and because of type erasure a cast between the two cannot be checked at run time, hence the unchecked-cast warnings. They still make me a little jittery, and perhaps I used generics in the wrong way.&lt;br /&gt;&lt;br /&gt;--- An update on this post ---&lt;br /&gt;&lt;br /&gt;I found an alternative site on &lt;a href="http://www.angelikalanger.com/GenericsFAQ/FAQSections/TypeParameters.html"&gt;Java Generics&lt;/a&gt; (here's the link to the &lt;a href="http://www.angelikalanger.com/GenericsFAQ/JavaGenericsFAQ.pdf"&gt;PDF version&lt;/a&gt;); it's maintained by &lt;a href="http://www.angelikalanger.com/index.html"&gt;Angelika Langer&lt;/a&gt;.&lt;script type="text/javascript"&gt;&lt;br /&gt;var gaJsHost = (("https:" == document.location.protocol) ? "https://ssl." 
: "http://www.");&lt;br /&gt;document.write(unescape("%3Cscript src='" + gaJsHost + "google-analytics.com/ga.js' type='text/javascript'%3E%3C/script%3E"));&lt;br /&gt;&lt;/script&gt;&lt;br /&gt;&lt;script type="text/javascript"&gt;&lt;br /&gt;var pageTracker = _gat._getTracker("UA-5323400-1");&lt;br /&gt;pageTracker._trackPageview();&lt;br /&gt;&lt;/script&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5021319479788698734-4448209364719526970?l=raymondtay.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://raymondtay.blogspot.com/feeds/4448209364719526970/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5021319479788698734&amp;postID=4448209364719526970&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5021319479788698734/posts/default/4448209364719526970'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5021319479788698734/posts/default/4448209364719526970'/><link rel='alternate' type='text/html' href='http://raymondtay.blogspot.com/2008/07/java-generics-exploration.html' title='Java Generics - exploration ....'/><author><name>Raymond Tay</name><uri>https://profiles.google.com/110897033209560123249</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh4.googleusercontent.com/-4ZP-rPmLNSw/AAAAAAAAAAI/AAAAAAAABao/8f4AuUXNrno/s512-c/photo.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5021319479788698734.post-2048734144537325051</id><published>2008-07-08T09:50:00.001+08:00</published><updated>2008-08-19T12:31:39.178+08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Java'/><category scheme='http://www.blogger.com/atom/ns#' 
term='Endian'/><title type='text'>Detecting Endian-ness in Java</title><content type='html'>Someone asked me about this, and the exact question was "how to check &lt;a href="http://en.wikipedia.org/wiki/Endianness"&gt;Endian&lt;/a&gt;ness in Java coding".&lt;br /&gt;&lt;br /&gt;One reason &lt;a href="http://www.java.sun.com"&gt;Java&lt;/a&gt; is so prevalent in the industry is that you rarely have to worry about such things. The Java Virtual Machine implementation hides the byte order of the host CPU, and the formats Java defines itself (class files and the DataInput/DataOutput streams) are always big-endian. There are no methods named "&lt;span style="color: rgb(51, 102, 255);"&gt;isEndian&lt;/span&gt;()" or "&lt;span style="color: rgb(51, 102, 255);"&gt;isBigEndian&lt;/span&gt;()", but there is an API for the rare case where you need one: &lt;span style="color: rgb(51, 102, 255);"&gt;java.nio.ByteOrder.nativeOrder()&lt;/span&gt;, available since Java 1.4, returns the byte order of the underlying platform.&lt;br /&gt;&lt;br /&gt;In general, the only situation I can think of at the moment where this matters is exchanging data between two machines of different endianness, for example little-endian x86 and big-endian SPARC. As long as both sides go through Java's own I/O classes, the JVM implementation takes care of the byte-order conversion in both directions. One good resource I can think of is the &lt;a href="https://openjdk.dev.java.net"&gt;OpenJDK&lt;/a&gt; project, where you can &lt;a href="https://openjdk.dev.java.net/source/browse/openjdk/jdk/trunk/"&gt;browse&lt;/a&gt; through the source code to discover the mechanism.&lt;br /&gt;&lt;br /&gt;Hope this helps clear the air.&lt;script type="text/javascript"&gt;&lt;br /&gt;var gaJsHost = (("https:" == document.location.protocol) ? "https://ssl." 
: "http://www.");&lt;br /&gt;document.write(unescape("%3Cscript src='" + gaJsHost + "google-analytics.com/ga.js' type='text/javascript'%3E%3C/script%3E"));&lt;br /&gt;&lt;/script&gt;&lt;br /&gt;&lt;script type="text/javascript"&gt;&lt;br /&gt;var pageTracker = _gat._getTracker("UA-5323400-1");&lt;br /&gt;pageTracker._trackPageview();&lt;br /&gt;&lt;/script&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5021319479788698734-2048734144537325051?l=raymondtay.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://raymondtay.blogspot.com/feeds/2048734144537325051/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5021319479788698734&amp;postID=2048734144537325051&amp;isPopup=true' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5021319479788698734/posts/default/2048734144537325051'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5021319479788698734/posts/default/2048734144537325051'/><link rel='alternate' type='text/html' href='http://raymondtay.blogspot.com/2008/07/detecting-endian-ness-in-java.html' title='Detecting Endian-ness in Java'/><author><name>Raymond Tay</name><uri>https://profiles.google.com/110897033209560123249</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh4.googleusercontent.com/-4ZP-rPmLNSw/AAAAAAAAAAI/AAAAAAAABao/8f4AuUXNrno/s512-c/photo.jpg'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5021319479788698734.post-6395831537300339467</id><published>2008-07-08T08:54:00.003+08:00</published><updated>2008-08-19T12:31:55.672+08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Java Byte Codes'/><category scheme='http://www.blogger.com/atom/ns#' 
term='btrace'/><category scheme='http://www.blogger.com/atom/ns#' term='Java'/><category scheme='http://www.blogger.com/atom/ns#' term='JVM'/><title type='text'>BTrace glitches ?</title><content type='html'>I ran into what looked like glitches when tracing Kind.NEW. FYI, it's a feature that is used to track object creation.&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;// UDO == User Defined Object&lt;br /&gt;class UDO {&lt;br /&gt;UDO() {&lt;br /&gt;    System.out.println("UDO");&lt;br /&gt;}&lt;br /&gt;}&lt;br /&gt;&lt;br /&gt;public class ObjAlloc {&lt;br /&gt;    Object id;&lt;br /&gt;&lt;br /&gt;    ObjAlloc() {&lt;br /&gt;     // Does not appear to work for Kind.NEW&lt;br /&gt;     //id = new String("ME");&lt;br /&gt;     //id = new String();&lt;br /&gt;     id = new Object();&lt;br /&gt;     //id = new UDO();&lt;br /&gt;&lt;br /&gt;     // Works for Kind.NEWARRAY&lt;br /&gt;     //id = new int[10];&lt;br /&gt;    }&lt;br /&gt;&lt;br /&gt;    private void A(String... str) {&lt;br /&gt;              // Works for Kind.NEW &amp;amp; Kind.NEWARRAY&lt;br /&gt;              //id = new int[10];&lt;br /&gt;              //id = new Object();&lt;br /&gt;    }&lt;br /&gt;&lt;br /&gt;    public static void main(String[] args) throws Exception {&lt;br /&gt;            System.out.println("Here");&lt;br /&gt;            ObjAlloc oa = new ObjAlloc();&lt;br /&gt;            while(true) {&lt;br /&gt;              new ObjAlloc().A();&lt;br /&gt;              Thread.sleep(500); // sleep() is static, no need for currentThread()&lt;br /&gt;              System.out.println(".");&lt;br /&gt;            }&lt;br /&gt;    }&lt;br /&gt;&lt;br /&gt;}&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Tracing didn't work for &lt;span style="font-style: italic; color: rgb(51, 102, 255);"&gt;Kind.NEW&lt;/span&gt; but worked for &lt;span style="font-style: italic; color: rgb(51, 102, 255);"&gt;Kind.NEWARRAY&lt;/span&gt;, so I filed an issue on BTrace's &lt;a href="https://btrace.dev.java.net/"&gt;website&lt;/a&gt;. 
Read about it by clicking &lt;a href="https://btrace.dev.java.net/issues/show_bug.cgi?id=9"&gt;here&lt;/a&gt;; I'll post an update when I have one. This is not bashing BTrace: I still think it's a cool tool with the potential to challenge the commercial implementations out there, and like all software it will have glitches here and there, so give them a break, right?&lt;br /&gt;&lt;br /&gt;--- This is an update on this article ---&lt;br /&gt;&lt;br /&gt;It turns out that this wasn't a glitch at all but a result of my lack of understanding of the subject matter, and I decided not to remove this post so that it serves as a reminder. Having said that, I should proceed to show you how to do it correctly.&lt;br /&gt;&lt;br /&gt;Below is the correct way to do it in a BTrace script, matching the creation of objects of any class:&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;import com.sun.btrace.annotations.*;&lt;br /&gt;import static com.sun.btrace.BTraceUtils.*;&lt;br /&gt;&lt;br /&gt;@BTrace&lt;br /&gt;public class test {&lt;br /&gt;@OnMethod(clazz="Main", method="/.+/",&lt;br /&gt;&lt;br /&gt;location=@Location(value=Kind.NEW,clazz="/.+/"))&lt;br /&gt;public static void p() {&lt;br /&gt; println("Object Created");&lt;br /&gt;}&lt;br /&gt;}&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Or, to match only java.lang.Object:&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;import com.sun.btrace.annotations.*;&lt;br /&gt;import static com.sun.btrace.BTraceUtils.*;&lt;br /&gt;&lt;br /&gt;@BTrace&lt;br /&gt;public class test {&lt;br /&gt;@OnMethod(clazz="Main", method="/.+/",&lt;br /&gt;&lt;br /&gt;location=@Location(value=Kind.NEW,clazz="java.lang.Object"))&lt;br /&gt;public static void p() {&lt;br /&gt; println("Object Created");&lt;br /&gt;}&lt;br /&gt;}&lt;br /&gt;&lt;/pre&gt;What I failed to do previously was to investigate the "clazz" attribute of "Location"; without it, I wasn't able to instrument my code and detect object creation.&lt;script type="text/javascript"&gt;&lt;br /&gt;var gaJsHost = (("https:" == document.location.protocol) ? "https://ssl." 
: "http://www.");&lt;br /&gt;document.write(unescape("%3Cscript src='" + gaJsHost + "google-analytics.com/ga.js' type='text/javascript'%3E%3C/script%3E"));&lt;br /&gt;&lt;/script&gt;&lt;br /&gt;&lt;script type="text/javascript"&gt;&lt;br /&gt;var pageTracker = _gat._getTracker("UA-5323400-1");&lt;br /&gt;pageTracker._trackPageview();&lt;br /&gt;&lt;/script&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5021319479788698734-6395831537300339467?l=raymondtay.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://raymondtay.blogspot.com/feeds/6395831537300339467/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5021319479788698734&amp;postID=6395831537300339467&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5021319479788698734/posts/default/6395831537300339467'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5021319479788698734/posts/default/6395831537300339467'/><link rel='alternate' type='text/html' href='http://raymondtay.blogspot.com/2008/07/btrace-glitches.html' title='BTrace glitches ?'/><author><name>Raymond Tay</name><uri>https://profiles.google.com/110897033209560123249</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh4.googleusercontent.com/-4ZP-rPmLNSw/AAAAAAAAAAI/AAAAAAAABao/8f4AuUXNrno/s512-c/photo.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5021319479788698734.post-4974844740015457411</id><published>2008-06-29T22:10:00.003+08:00</published><updated>2008-08-19T12:32:11.457+08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Java Byte Codes'/><category scheme='http://www.blogger.com/atom/ns#' term='btrace'/><category 
scheme='http://www.blogger.com/atom/ns#' term='Java'/><category scheme='http://www.blogger.com/atom/ns#' term='JVM'/><title type='text'>Using BTrace in Java 6</title><content type='html'>I got up to speed with &lt;a href="https://btrace.dev.java.net"&gt;BTrace&lt;/a&gt; recently, thanks to &lt;a href="http://blogs.sun.com/sundararajan"&gt;Sundararajan&lt;/a&gt;'s clarifications on syntax. This tool is simply TOO cool to give up since, in my opinion, anyone who has developed in Java and/or J2EE would pick this technology up in no time :-)&lt;br /&gt;&lt;br /&gt;However, beware of some caveats, as I found out while attempting to monitor certain Java classes in &lt;a href="http://java.sun.com"&gt;Java 6&lt;/a&gt;; as it happens, I was trying to monitor the EJB lifecycle.&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;btrace DEBUG: java.lang.RuntimeException: JSR/RET are not supported with computeFrames option&lt;br /&gt;java.lang.RuntimeException: JSR/RET are not supported with computeFrames option&lt;br /&gt;       at org.objectweb.asm.Frame.a(Unknown Source)&lt;br /&gt;       at org.objectweb.asm.MethodWriter.visitJumpInsn(Unknown Source)&lt;br /&gt;       at org.objectweb.asm.MethodAdapter.visitJumpInsn(Unknown Source)&lt;br /&gt;       at org.objectweb.asm.ClassReader.accept(Unknown Source)&lt;br /&gt;       at org.objectweb.asm.ClassReader.accept(Unknown Source)&lt;br /&gt;       at com.sun.btrace.runtime.InstrumentUtils.accept(InstrumentUtils.java:66)&lt;br /&gt;       at com.sun.btrace.runtime.InstrumentUtils.accept(InstrumentUtils.java:62)&lt;br /&gt;       at com.sun.btrace.agent.Client.instrument(Client.java:261)&lt;br /&gt;       at com.sun.btrace.agent.Client.transform(Client.java:101)&lt;br /&gt;       at sun.instrument.TransformerManager.transform(TransformerManager.java:169)&lt;br /&gt;       at sun.instrument.InstrumentationImpl.transform(InstrumentationImpl.java:365)&lt;br /&gt;       at sun.instrument.InstrumentationImpl.retransformClasses0(Native 
Method)&lt;br /&gt;       at sun.instrument.InstrumentationImpl.retransformClasses(InstrumentationImpl.java:124)&lt;br /&gt;       at com.sun.btrace.agent.Main.handleNewClient(Main.java:278)&lt;br /&gt;       at com.sun.btrace.agent.Main.startServer(Main.java:245)&lt;br /&gt;       at com.sun.btrace.agent.Main.access$000(Main.java:53)&lt;br /&gt;       at com.sun.btrace.agent.Main$1.run(Main.java:127)&lt;br /&gt;       at java.lang.Thread.run(Thread.java:619)&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;If you attempt to instrument certain code in your EJBs, such as the &lt;span style="font-style: italic; color: rgb(51, 102, 255);"&gt;ejbXXX()&lt;/span&gt; methods, you may well see something similar to the above. That could easily kill your interest, but I would advise you not to be too hasty, as it still works on the J2SE 1.5.x platform.&lt;br /&gt;&lt;br /&gt;This certainly sparked my interest in the algorithm behind the problem, so I went browsing through the technology on which &lt;span style="font-style: italic; color: rgb(51, 102, 255);"&gt;BTrace&lt;/span&gt; is built: &lt;a href="http://asm.objectweb.org"&gt;ASM&lt;/a&gt;. JSR and RET are the bytecode subroutine instructions that older compilers emitted for finally blocks. According to the ASM 3.0 documentation &amp;amp; the &lt;a href="http://java.sun.com/docs/books/jvms/second_edition/html/VMSpecTOC.doc.html"&gt;Java Virtual Machine Specification, 2nd Edition&lt;/a&gt;, ASM has a hard time running its &lt;span style="font-style: italic; color: rgb(51, 102, 255);"&gt;"execute()"&lt;/span&gt; algorithm, which essentially simulates each Java bytecode instruction on the &lt;span style="font-style: italic;"&gt;output stack frame&lt;/span&gt;, over such code. So the current version of the software throws an unchecked exception, java.lang.RuntimeException, to tell the user of the tool (in this case, me) that it's not supported .... yet.&lt;script type="text/javascript"&gt;&lt;br /&gt;var gaJsHost = (("https:" == document.location.protocol) ? 
"https://ssl." : "http://www.");&lt;br /&gt;document.write(unescape("%3Cscript src='" + gaJsHost + "google-analytics.com/ga.js' type='text/javascript'%3E%3C/script%3E"));&lt;br /&gt;&lt;/script&gt;&lt;br /&gt;&lt;script type="text/javascript"&gt;&lt;br /&gt;var pageTracker = _gat._getTracker("UA-5323400-1");&lt;br /&gt;pageTracker._trackPageview();&lt;br /&gt;&lt;/script&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5021319479788698734-4974844740015457411?l=raymondtay.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://raymondtay.blogspot.com/feeds/4974844740015457411/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5021319479788698734&amp;postID=4974844740015457411&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5021319479788698734/posts/default/4974844740015457411'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5021319479788698734/posts/default/4974844740015457411'/><link rel='alternate' type='text/html' href='http://raymondtay.blogspot.com/2008/06/using-btrace-in-java-6.html' title='Using BTrace in Java 6'/><author><name>Raymond Tay</name><uri>https://profiles.google.com/110897033209560123249</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh4.googleusercontent.com/-4ZP-rPmLNSw/AAAAAAAAAAI/AAAAAAAABao/8f4AuUXNrno/s512-c/photo.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5021319479788698734.post-8424064783672558388</id><published>2008-06-22T14:10:00.002+08:00</published><updated>2008-08-19T12:32:46.874+08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='btrace'/><category scheme='http://www.blogger.com/atom/ns#' 
term='Java'/><category scheme='http://www.blogger.com/atom/ns#' term='JVM'/><title type='text'>BTrace - BCI "cool" tool</title><content type='html'>Recently, I got to know about &lt;a href="https://btrace.dev.java.net/"&gt;BTrace&lt;/a&gt;, another really cool technology by &lt;a href="http://blogs.sun.com/sundararajan/"&gt;A. Sundararajan&lt;/a&gt; for monitoring JVM behavior, which it does by instrumenting &lt;a href="http://www.java.sun.com/"&gt;Java&lt;/a&gt; bytecode.&lt;br /&gt;&lt;br /&gt;For a real-world demonstration, you can read a good article by &lt;a href="http://weblogs.java.net/blog/binod/archive/2008/06/sailfin_work_an.html"&gt;Binod&lt;/a&gt; describing his adventure in using this technology to solve a memory leak problem in &lt;a href="https://sailfin.dev.java.net/"&gt;SailFin&lt;/a&gt;, which is, in a nutshell, a &lt;a href="http://jcp.org/en/jsr/detail?id=289"&gt;SIP&lt;/a&gt; container based largely on &lt;a href="http://jcp.org/en/jsr/detail?id=289"&gt;JSR-289&lt;/a&gt;.&lt;script type="text/javascript"&gt;&lt;br /&gt;var gaJsHost = (("https:" == document.location.protocol) ? "https://ssl." 
: "http://www.");&lt;br /&gt;document.write(unescape("%3Cscript src='" + gaJsHost + "google-analytics.com/ga.js' type='text/javascript'%3E%3C/script%3E"));&lt;br /&gt;&lt;/script&gt;&lt;br /&gt;&lt;script type="text/javascript"&gt;&lt;br /&gt;var pageTracker = _gat._getTracker("UA-5323400-1");&lt;br /&gt;pageTracker._trackPageview();&lt;br /&gt;&lt;/script&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5021319479788698734-8424064783672558388?l=raymondtay.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://raymondtay.blogspot.com/feeds/8424064783672558388/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5021319479788698734&amp;postID=8424064783672558388&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5021319479788698734/posts/default/8424064783672558388'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5021319479788698734/posts/default/8424064783672558388'/><link rel='alternate' type='text/html' href='http://raymondtay.blogspot.com/2008/06/btrace-bci-cool-tool.html' title='BTrace - BCI &quot;cool&quot; tool'/><author><name>Raymond Tay</name><uri>https://profiles.google.com/110897033209560123249</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh4.googleusercontent.com/-4ZP-rPmLNSw/AAAAAAAAAAI/AAAAAAAABao/8f4AuUXNrno/s512-c/photo.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5021319479788698734.post-6554656195365514340</id><published>2008-06-15T14:32:00.001+08:00</published><updated>2008-12-09T10:09:17.944+08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Java'/><category scheme='http://www.blogger.com/atom/ns#' 
term='JVM'/><category scheme='http://www.blogger.com/atom/ns#' term='Dtrace'/><category scheme='http://www.blogger.com/atom/ns#' term='jconsole'/><title type='text'>Detecting deadlocks in JVM using DTrace</title><content type='html'>In this article, I am going to demonstrate using DTrace to detect deadlocks in application code. Before I do that, here's how you can do it using &lt;span style="font-style: italic; color: rgb(0, 204, 204);"&gt;jconsole&lt;/span&gt;.&lt;br /&gt;&lt;br /&gt;Simply start up &lt;span style="font-style: italic; color: rgb(0, 204, 204);"&gt;jconsole&lt;/span&gt; and connect to the JVM, either locally or remotely, using the jconsole GUI. Once you are connected to the Java virtual machine, navigate to the "Threads" tab, look for the "Detect Deadlock" button and click it. If your code is deadlocked, jconsole will pick it up. See the screenshots below.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_n2HkB0XD3Kw/SFSrvfbbQEI/AAAAAAAAAoE/4PC9Hvh4gwk/s1600-h/Screenshot-Java+Monitoring+%26+Management+Console.png"&gt;&lt;img style="margin: 0pt 10px 10px 0pt; float: left; cursor: pointer;" src="http://3.bp.blogspot.com/_n2HkB0XD3Kw/SFSrvfbbQEI/AAAAAAAAAoE/4PC9Hvh4gwk/s320/Screenshot-Java+Monitoring+%26+Management+Console.png" alt="" id="BLOGGER_PHOTO_ID_5211979501153239106" border="0" /&gt;&lt;/a&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_n2HkB0XD3Kw/SFSr3YU8kUI/AAAAAAAAAoM/xylcjyWGlGE/s1600-h/Screenshot-Java+Monitoring+%26+Management+Console-1.png"&gt;&lt;img style="margin: 0pt 10px 10px 0pt; float: left; cursor: pointer;" src="http://4.bp.blogspot.com/_n2HkB0XD3Kw/SFSr3YU8kUI/AAAAAAAAAoM/xylcjyWGlGE/s320/Screenshot-Java+Monitoring+%26+Management+Console-1.png" alt="" id="BLOGGER_PHOTO_ID_5211979636685967682" border="0" /&gt;&lt;/a&gt;From the screenshots, you can see 
that deadlock detection is pretty easy. You can achieve this using DTrace as well. Here's what I did with a sample D script, along with its output; of course it needs further refining (JDK SE 6 currently supports limited functionality for this).&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;/*&lt;br /&gt;Detect deadlocks in the JVM&lt;br /&gt;&lt;br /&gt;@author: Raymond Tay&lt;br /&gt;@date: 15 June 2008&lt;br /&gt;@version: 1.0&lt;br /&gt;&lt;br /&gt;*/&lt;br /&gt;profile-100msec&lt;br /&gt;{&lt;br /&gt;}&lt;br /&gt;&lt;br /&gt;/* arg0 = Java thread id, arg1 = monitor id, arg2/arg3 = monitor class name and its length */&lt;br /&gt;monitor-contended-enter&lt;br /&gt;{&lt;br /&gt;this-&amp;gt;threadid = arg0;&lt;br /&gt;this-&amp;gt;monitorid = arg1;&lt;br /&gt;this-&amp;gt;monitorclass = (string)copyin(arg2, arg3+1);&lt;br /&gt;printf("Attempt to enter! tid=%d, monitor id=%d, monitor class=%s\n", this-&amp;gt;threadid, this-&amp;gt;monitorid, this-&amp;gt;monitorclass);&lt;br /&gt;}&lt;br /&gt;&lt;br /&gt;monitor-contended-entered&lt;br /&gt;{&lt;br /&gt;this-&amp;gt;threadid = arg0;&lt;br /&gt;this-&amp;gt;monitorid = arg1;&lt;br /&gt;this-&amp;gt;monitorclass = (string)copyin(arg2, arg3+1);&lt;br /&gt;printf("Entered ! tid=%d, monitor id=%d, monitor class=%s\n", this-&amp;gt;threadid, this-&amp;gt;monitorid, this-&amp;gt;monitorclass);&lt;br /&gt;}&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;And the output from a typical run:&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;bash-3.2# dtrace -q -s ./detectdl.d -c "java &lt;span style="font-weight: bold;"&gt;-XX:+ExtendedDTraceProbes&lt;/span&gt; -jar /export/home/tayboonl/NetBeansProjects/DeadlockApp/dist/DeadlockApp.jar "&lt;br /&gt;MyThread1 got mon1&lt;br /&gt;MyThread1 trying to get monitor2...&lt;br /&gt;MyThread2 got mon2&lt;br /&gt;MyThread2 trying to get monitor1...&lt;br /&gt;Attempt to enter! tid=8, monitor id=135675668, monitor class=sun/misc/Launcher$AppClassLoader&lt;br /&gt;Entered ! tid=8, monitor id=135675668, monitor class=sun/misc/Launcher$AppClassLoader&lt;br /&gt;Attempt to enter! 
tid=8, monitor id=135675868, monitor class=&lt;span style="font-weight: bold;"&gt;java/lang/Object&lt;/span&gt;&lt;br /&gt;Attempt to enter! tid=9, monitor id=135675968, monitor class=&lt;span style="font-weight: bold;"&gt;java/lang/Object&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;From this output and my code below, I know that my threads are contending for monitors of type "&lt;span style="font-style: italic; color: rgb(51, 102, 255);"&gt;java.lang.Object&lt;/span&gt;" and that the two threads contending for them are tid=8 and tid=9: each attempts to enter a monitor and never reports "Entered". Of course, the information offered by jconsole is more detailed than what I offer here, partly because of the level of support provided by SE 6 at the time of this writing. I will probably refine the script later to include the same kind of information as provided by jconsole.&lt;br /&gt;&lt;br /&gt;One last thing to note about the output above: you need to pass the parameter "&lt;span style="font-style: italic; color: rgb(51, 102, 255);"&gt;-XX:+ExtendedDTraceProbes&lt;/span&gt;" to the JVM in order to enable these probes; by default, they are turned off since there's a performance penalty. 
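&lt;br /&gt;&lt;br /&gt;As an aside, a deadlock can also be detected from inside the JVM with the standard &lt;span style="font-style: italic; color: rgb(51, 102, 255);"&gt;java.lang.management.ThreadMXBean&lt;/span&gt;, which reports the same kind of per-thread lock information that jconsole displays. The following is only a minimal sketch and was not part of my original experiment: the class name DeadlockProbe and the helper describeDeadlocks() are made up for illustration, and it assumes a Java 6 or later JVM that supports findDeadlockedThreads().&lt;br /&gt;&lt;pre&gt;
```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical helper, not part of the original DeadlockApp.
final class DeadlockProbe {

    // Returns one description per deadlocked thread, or an empty array if there is none.
    static String[] describeDeadlocks() {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        long[] ids = bean.findDeadlockedThreads(); // null when nothing is deadlocked
        if (ids == null) {
            return new String[0];
        }
        ThreadInfo[] infos = bean.getThreadInfo(ids);
        String[] report = new String[infos.length];
        int i = 0;
        for (ThreadInfo info : infos) {
            // An entry is null if the thread ended between the two calls.
            report[i++] = (info == null)
                    ? "thread no longer alive"
                    : info.getThreadName() + " (id=" + info.getThreadId() + ") waits for "
                            + info.getLockName() + " held by " + info.getLockOwnerName();
        }
        return report;
    }

    public static void main(String[] args) {
        String[] report = describeDeadlocks();
        System.out.println(report.length + " deadlocked thread(s)");
        for (String line : report) {
            System.out.println(line);
        }
    }
}
```
&lt;/pre&gt;Calling describeDeadlocks() from a separate watchdog thread in the sample application below should, once both threads are stuck, return two entries that each name a java.lang.Object monitor. This check costs something only when it is called, whereas the extended probes carry the performance penalty mentioned above.&lt;br /&gt;&lt;br /&gt;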
As far as the documentation goes, the JDK team is currently working on that overhead and will probably enable the probes by default in the next major release of the JDK.&lt;br /&gt;&lt;br /&gt;Here's the sample Java code:&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;/*&lt;br /&gt; * To change this template, choose Tools | Templates&lt;br /&gt; * and open the template in the editor.&lt;br /&gt; */&lt;br /&gt;package deadlockapp;&lt;br /&gt;&lt;br /&gt;/**&lt;br /&gt; *&lt;br /&gt; * @author tayboonl&lt;br /&gt; */&lt;br /&gt;public class Main {&lt;br /&gt;&lt;br /&gt;    Object monitor1 = new Object();&lt;br /&gt;    Object monitor2 = new Object();&lt;br /&gt;&lt;br /&gt;    class MyThread1 extends Thread {&lt;br /&gt;&lt;br /&gt;        public void run() {&lt;br /&gt;            synchronized (monitor1) {&lt;br /&gt;                System.out.println("MyThread1 got mon1");&lt;br /&gt;                System.out.println("MyThread1 trying to get monitor2...");&lt;br /&gt;                synchronized (monitor2) {&lt;br /&gt;                    System.out.println("MyThread1 got mon2");&lt;br /&gt;                }&lt;br /&gt;            }&lt;br /&gt;        }&lt;br /&gt;    }&lt;br /&gt;&lt;br /&gt;    class MyThread2 extends Thread {&lt;br /&gt;&lt;br /&gt;        public void run() {&lt;br /&gt;            synchronized (monitor2) {&lt;br /&gt;                System.out.println("MyThread2 got mon2");&lt;br /&gt;                System.out.println("MyThread2 trying to get monitor1...");&lt;br /&gt;                synchronized (monitor1) {&lt;br /&gt;                    System.out.println("MyThread2 got mon1");&lt;br /&gt;                }&lt;br /&gt;            }&lt;br /&gt;        }&lt;br /&gt;    }&lt;br /&gt;&lt;br /&gt;    /**&lt;br /&gt;     * @param args the command line arguments&lt;br /&gt;     */&lt;br /&gt;    public static void main(String[] args) {&lt;br /&gt;        Main m = new Main();&lt;br /&gt;        m.new MyThread1().start();&lt;br /&gt;        m.new MyThread2().start();&lt;br /&gt;    // 
TODO code application logic here&lt;br /&gt;    }&lt;br /&gt;}&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Have fun!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5021319479788698734-6554656195365514340?l=raymondtay.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://raymondtay.blogspot.com/feeds/6554656195365514340/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5021319479788698734&amp;postID=6554656195365514340&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5021319479788698734/posts/default/6554656195365514340'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5021319479788698734/posts/default/6554656195365514340'/><link rel='alternate' type='text/html' href='http://raymondtay.blogspot.com/2008/06/detecting-deadlocks-in-jvm-using-dtrace.html' title='Detecting deadlocks in JVM using DTrace'/><author><name>Raymond Tay</name><uri>https://profiles.google.com/110897033209560123249</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh4.googleusercontent.com/-4ZP-rPmLNSw/AAAAAAAAAAI/AAAAAAAABao/8f4AuUXNrno/s512-c/photo.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' 
url='http://3.bp.blogspot.com/_n2HkB0XD3Kw/SFSrvfbbQEI/AAAAAAAAAoE/4PC9Hvh4gwk/s72-c/Screenshot-Java+Monitoring+%26+Management+Console.png' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5021319479788698734.post-8515484677403168475</id><published>2008-06-15T12:45:00.003+08:00</published><updated>2008-08-19T12:34:01.253+08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Java'/><category scheme='http://www.blogger.com/atom/ns#' term='JVM'/><category scheme='http://www.blogger.com/atom/ns#' term='Dtrace'/><title type='text'>Monitoring JVM (JavaSE6) activity using DTrace</title><content type='html'>DTrace has been around for roughly 3 years now, I think, and plenty of bloggers like myself have written about using it to look inside the JVM and to solve some real-world problems. In this article, I'd like to point out that DTrace probes ship with the latest version of the &lt;a href="http://java.sun.com/javase/6"&gt;Java Development Kit version 6.x&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;Here's a simple D script (that is, written in the D programming language) which monitors JVM activity, garbage collection in particular.&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;/*&lt;br /&gt;&lt;br /&gt;Tracks the JVM via the "hotspot" provider&lt;br /&gt;@author: Raymond Tay&lt;br /&gt;@date:  15 June 2008&lt;br /&gt;@version: 1.0&lt;br /&gt;&lt;br /&gt;*/&lt;br /&gt;&lt;br /&gt;struct mem_pool {&lt;br /&gt;string name;&lt;br /&gt;long long init_size;&lt;br /&gt;long long mem_inuse;&lt;br /&gt;long long max_size;&lt;br /&gt;long long gc_start_time;&lt;br /&gt;};&lt;br /&gt;&lt;br /&gt;/*&lt;br /&gt;Declare pool as an associative array of JVM memory pools, keyed by pool name&lt;br /&gt;*/&lt;br /&gt;struct mem_pool pool[string];&lt;br /&gt;this string poolname;&lt;br /&gt;&lt;br /&gt;BEGIN&lt;br /&gt;{&lt;br /&gt;printf("Tracing has begun\n");&lt;br /&gt;}&lt;br 
/&gt;mem-pool-gc-begin&lt;br /&gt;/execname == "java" &amp;amp;&amp;amp; pid == $1/&lt;br /&gt;{&lt;br /&gt;this-&amp;gt;poolname = (string)copyin(arg2, arg3+1);&lt;br /&gt;pool[this-&amp;gt;poolname].name = this-&amp;gt;poolname;&lt;br /&gt;pool[this-&amp;gt;poolname].init_size = arg4;&lt;br /&gt;pool[this-&amp;gt;poolname].mem_inuse = arg5;&lt;br /&gt;pool[this-&amp;gt;poolname].max_size = arg7;&lt;br /&gt;pool[this-&amp;gt;poolname].gc_start_time = vtimestamp;&lt;br /&gt;printf("\tGC about to begin for name=%s-&amp;gt;(init=%d,inuse=%d,max=%d)\n",this-&amp;gt;poolname,arg4,arg5,arg7);&lt;br /&gt;}&lt;br /&gt;mem-pool-gc-end&lt;br /&gt;/execname == "java" &amp;amp;&amp;amp; pid == $1/&lt;br /&gt;{&lt;br /&gt;this-&amp;gt;poolname = (string)copyin(arg2, arg3+1);&lt;br /&gt;this-&amp;gt;FreedMem = pool[this-&amp;gt;poolname].mem_inuse - arg5;&lt;br /&gt;this-&amp;gt;FreeMem = pool[this-&amp;gt;poolname].max_size - arg5;&lt;br /&gt;printf("\tGC ended for name=%s-&amp;gt;(init=%d,inuse=%d,freed=%d,max=%d)\n",this-&amp;gt;poolname,arg4,arg5,this-&amp;gt;FreedMem,arg7);&lt;br /&gt;pool[this-&amp;gt;poolname].mem_inuse = arg5;&lt;br /&gt;}&lt;br /&gt;&lt;br /&gt;END&lt;br /&gt;{&lt;br /&gt;printf("Tracing has ended, thanks\n");&lt;br /&gt;}&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;If you run this on Solaris 10, you should see output like the following:&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;...&lt;br /&gt;    GC about to begin for name=Code Cache-&amp;gt;(init=163840,inuse=4853312,max=33554432)&lt;br /&gt;    GC about to begin for name=Eden Space-&amp;gt;(init=16580608,inuse=16646144,max=33030144)&lt;br /&gt;    GC about to begin for name=Survivor Space-&amp;gt;(init=2031616,inuse=307648,max=4128768)&lt;br /&gt;    GC about to begin for name=Tenured Gen-&amp;gt;(init=247791616,inuse=99175912,max=495583232)&lt;br /&gt;    GC about to begin for name=Perm Gen-&amp;gt;(init=50331648,inuse=93384752,max=134217728)&lt;br /&gt;    GC about to begin for name=Perm Gen 
[shared-ro]-&amp;gt;(init=8388608,inuse=6633584,max=8388608)&lt;br /&gt;    GC about to begin for name=Perm Gen [shared-rw]-&amp;gt;(init=12582912,inuse=7532648,max=12582912)&lt;br /&gt;    GC ended for name=Code Cache-&amp;gt;(init=163840,inuse=4853312,freed=0,max=33554432)&lt;br /&gt;    GC ended for name=Eden Space-&amp;gt;(init=16580608,inuse=0,freed=16646144,max=33030144)&lt;br /&gt;    GC ended for name=Survivor Space-&amp;gt;(init=2031616,inuse=240352,freed=67296,max=4128768)&lt;br /&gt;    GC ended for name=Tenured Gen-&amp;gt;(init=247791616,inuse=99175912,freed=0,max=495583232)&lt;br /&gt;    GC ended for name=Perm Gen-&amp;gt;(init=50331648,inuse=93384752,freed=0,max=134217728)&lt;br /&gt;    GC ended for name=Perm Gen [shared-ro]-&amp;gt;(init=8388608,inuse=6633584,freed=0,max=8388608)&lt;br /&gt;    GC ended for name=Perm Gen [shared-rw]-&amp;gt;(init=12582912,inuse=7532648,freed=0,max=12582912)&lt;br /&gt;...&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Obviously, you can customize the script to include calculations for GC times, number of GC invocations, etc. All of this is made possible by the &lt;a href="http://java.sun.com/javase/6/docs/technotes/guides/vm/dtrace.html"&gt;HotSpot DTrace Providers&lt;/a&gt; in the latest JDK 6. 
Below is a fuller sample script that also reports thread starts and stops, Java class loading and unloading, GC times, etc.&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;/*&lt;br /&gt;&lt;br /&gt;Tracks the JVM via the "hotspot" provider&lt;br /&gt;@author: Raymond Tay&lt;br /&gt;@date:  15 June 2008&lt;br /&gt;@version: 1.0&lt;br /&gt;&lt;br /&gt;*/&lt;br /&gt;&lt;br /&gt;struct mem_pool {&lt;br /&gt;string name;&lt;br /&gt;long long init_size;&lt;br /&gt;long long mem_inuse;&lt;br /&gt;long long max_size;&lt;br /&gt;long long gc_start_time;&lt;br /&gt;};&lt;br /&gt;&lt;br /&gt;/*&lt;br /&gt;Declare pool as an associative array of JVM memory pools, keyed by pool name&lt;br /&gt;*/&lt;br /&gt;struct mem_pool pool[string];&lt;br /&gt;this string poolname;&lt;br /&gt;&lt;br /&gt;BEGIN&lt;br /&gt;{&lt;br /&gt;printf("Tracing has begun\n");&lt;br /&gt;}&lt;br /&gt;/* ---------------------- */&lt;br /&gt;vm-init-begin&lt;br /&gt;/execname == "java" &amp;amp;&amp;amp; pid == $1/&lt;br /&gt;{&lt;br /&gt;printf("JVM is initializing ...\n");&lt;br /&gt;}&lt;br /&gt;vm-init-end&lt;br /&gt;/execname == "java" &amp;amp;&amp;amp; pid == $1/&lt;br /&gt;{&lt;br /&gt;printf("JVM is initialized completely and going to run app. code\n");&lt;br /&gt;}&lt;br /&gt;&lt;br /&gt;vm-shutdown&lt;br /&gt;/execname == "java" &amp;amp;&amp;amp; pid == $1/&lt;br /&gt;{&lt;br /&gt;printf("JVM is shutting down now\n");&lt;br /&gt;}&lt;br /&gt;&lt;br /&gt;thread-start&lt;br /&gt;/execname == "java" &amp;amp;&amp;amp; pid == $1/&lt;br /&gt;{&lt;br /&gt;printf("Thread=(%s,%d),OS tid=(%d) (daemon:Y/N? %s) is starting\n", stringof(copyin(arg0,arg1+1)), arg2, arg3, arg4 == 0?"N":"Y");&lt;br /&gt;}&lt;br /&gt;&lt;br /&gt;thread-stop&lt;br /&gt;/execname == "java" &amp;amp;&amp;amp; pid == $1/&lt;br /&gt;{&lt;br /&gt;printf("Thread=(%s,%d),OS tid=(%d) (daemon:Y/N? 
%s) is stopping\n", stringof(copyin(arg0,arg1+1)), arg2, arg3, arg4 == 0?"N":"Y");&lt;br /&gt;}&lt;br /&gt;&lt;br /&gt;class-loaded&lt;br /&gt;/execname == "java" &amp;amp;&amp;amp; pid == $1/&lt;br /&gt;{&lt;br /&gt;printf("\t\t%s loaded\n", stringof(copyin(arg0,arg1+1)));&lt;br /&gt;}&lt;br /&gt;class-unloaded&lt;br /&gt;/execname == "java" &amp;amp;&amp;amp; pid == $1/&lt;br /&gt;{&lt;br /&gt;printf("\t\t%s un-loaded\n", stringof(copyin(arg0,arg1+1)));&lt;br /&gt;}&lt;br /&gt;&lt;br /&gt;gc-begin&lt;br /&gt;/execname == "java" &amp;amp;&amp;amp; pid == $1/&lt;br /&gt;{&lt;br /&gt;/* arg0 is a boolean: non-zero means a Full GC */&lt;br /&gt;printf("\t%s is starting..\n", arg0 != 0 ? "Full-GC" : "GC");&lt;br /&gt;}&lt;br /&gt;&lt;br /&gt;gc-end&lt;br /&gt;/execname == "java" &amp;amp;&amp;amp; pid == $1/&lt;br /&gt;{&lt;br /&gt;printf("\tGC has completed\n");&lt;br /&gt;}&lt;br /&gt;&lt;br /&gt;mem-pool-gc-begin&lt;br /&gt;/execname == "java" &amp;amp;&amp;amp; pid == $1/&lt;br /&gt;{&lt;br /&gt;this-&amp;gt;poolname = (string)copyin(arg2, arg3+1);&lt;br /&gt;pool[this-&amp;gt;poolname].name = this-&amp;gt;poolname;&lt;br /&gt;pool[this-&amp;gt;poolname].init_size = arg4;&lt;br /&gt;pool[this-&amp;gt;poolname].mem_inuse = arg5;&lt;br /&gt;pool[this-&amp;gt;poolname].max_size = arg7;&lt;br /&gt;pool[this-&amp;gt;poolname].gc_start_time = vtimestamp;&lt;br /&gt;printf("\tGC about to begin for name=%s-&amp;gt;(init=%d,inuse=%d,max=%d)\n",this-&amp;gt;poolname,arg4,arg5,arg7);&lt;br /&gt;}&lt;br /&gt;mem-pool-gc-end&lt;br /&gt;/execname == "java" &amp;amp;&amp;amp; pid == $1/&lt;br /&gt;{&lt;br /&gt;this-&amp;gt;poolname = (string)copyin(arg2, arg3+1);&lt;br /&gt;this-&amp;gt;FreedMem = pool[this-&amp;gt;poolname].mem_inuse - arg5;&lt;br /&gt;this-&amp;gt;FreeMem = pool[this-&amp;gt;poolname].max_size - arg5;&lt;br /&gt;printf("\tGC ended for name=%s-&amp;gt;(init=%d,inuse=%d,freed=%d,max=%d) took %d msec.\n",this-&amp;gt;poolname,arg4,arg5,this-&amp;gt;FreedMem,arg7, (vtimestamp - 
pool[this-&amp;gt;poolname].gc_start_time)/1000000);&lt;br /&gt;pool[this-&amp;gt;poolname].mem_inuse = arg5;&lt;br /&gt;}&lt;br /&gt;&lt;br /&gt;/* ---------------------- */&lt;br /&gt;&lt;br /&gt;END&lt;br /&gt;{&lt;br /&gt;printf("Tracing has ended, thanks\n");&lt;br /&gt;}&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;Depending on your preference, you might find the information in &lt;span style="font-style: italic; color: rgb(51, 102, 255);"&gt;jconsole&lt;/span&gt; equally appealing. &lt;a href="http://weblogs.java.net/blog/mandychung/"&gt;Mandy Chung&lt;/a&gt; from Sun Microsystems Inc. wrote a good article on this some years back and you can read it &lt;a href="http://java.sun.com/developer/technicalArticles/J2SE/jconsole.html"&gt;here&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;One last thing: if you run the D script against a JVM and closely monitor the garbage collection data, you should notice something peculiar about the "Tenured Generation" pool. Its usage tends to grow from one GC to the next, because objects that survive collections in the young generation get promoted into the tenured generation. For more details on this, see this article: &lt;a href="http://java.sun.com/javase/technologies/hotspot/gc/gc_tuning_6.html"&gt;&lt;span style="color: rgb(51, 102, 255); font-style: italic;"&gt;Java SE6 HotSpot VM Tuning&lt;/span&gt;&lt;/a&gt;.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5021319479788698734-8515484677403168475?l=raymondtay.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://raymondtay.blogspot.com/feeds/8515484677403168475/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5021319479788698734&amp;postID=8515484677403168475&amp;isPopup=true' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5021319479788698734/posts/default/8515484677403168475'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5021319479788698734/posts/default/8515484677403168475'/><link rel='alternate' type='text/html' href='http://raymondtay.blogspot.com/2008/06/monitoring-jvm-javase6-activity-using.html' title='Monitoring JVM (JavaSE6) activity using DTrace'/><author><name>Raymond Tay</name><uri>https://profiles.google.com/110897033209560123249</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh4.googleusercontent.com/-4ZP-rPmLNSw/AAAAAAAAAAI/AAAAAAAABao/8f4AuUXNrno/s512-c/photo.jpg'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5021319479788698734.post-260735170088865138</id><published>2008-06-03T22:32:00.001+08:00</published><updated>2008-08-19T12:34:16.949+08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Solaris'/><title type='text'>Oracle 10g 
installation on Solaris 10 x86-64 is annoying ...</title><content type='html'>This one is irritating, as far as I can tell. I was trying to install &lt;a href="http://www.oracle.com/technology/software/index.html"&gt;Oracle 10g &lt;/a&gt;on &lt;a href="http://www.sun.com/"&gt;Solaris 10 x86&lt;/a&gt;. According to the Oracle website the platform is supported, so I happily downloaded everything and launched the installer. So far so good, until I hit a wall with the following error message:&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;bash-3.2$ ./runInstaller&lt;br /&gt;Starting Oracle Universal Installer...&lt;br /&gt;&lt;br /&gt;Checking installer requirements...&lt;br /&gt;&lt;br /&gt;Checking operating system version: must be 5.10.    Actual 5.11&lt;br /&gt;                                    Failed &lt;&lt;&lt;&lt;  &lt;/pre&gt;&lt;br /&gt;Now that's obviously annoying since my OS is &lt;a href="http://www.sun.com/"&gt;Solaris 10&lt;/a&gt;, so what's going on? Well, it turns out that this problem is widely known; search for it on &lt;a href="http://www.google.com/"&gt;Google&lt;/a&gt; and you will see tons of complaints about it.&lt;br /&gt;&lt;br /&gt;What happens is that the Oracle 10g installer invokes a hidden program (name: &lt;span style="font-style: italic;"&gt;.oui&lt;/span&gt;). I guess the name stands for &lt;span style="font-weight: bold;"&gt;O&lt;/span&gt;racle &lt;span style="font-weight: bold;"&gt;U&lt;/span&gt;niversal &lt;span style="font-weight: bold;"&gt;I&lt;/span&gt;nstaller, hence &lt;span style="font-weight: bold;"&gt;oui&lt;/span&gt;. That program invokes "&lt;span style="font-style: italic; color: rgb(51, 102, 255);"&gt;uname -r&lt;/span&gt;", which on my system returns 5.11!&lt;br /&gt;&lt;br /&gt;For some reason unknown to me, my native version of &lt;span style="font-style: italic; color: rgb(51, 102, 255);"&gt;uname&lt;/span&gt; contains the hard-coded string 5.11, which of course is used to determine the platform I am running on. By the way, you can use the Solaris/Unix utility &lt;a href="http://docs.sun.com/app/docs/doc/816-0210/6m6nb7mm2?a=view"&gt;&lt;span style="font-style: italic;"&gt;strings&lt;/span&gt;&lt;/a&gt; to discover the strings embedded in a program; this works in general for compiled programs. But there's a caveat: the same hard-coded string is also found in &lt;span style="font-style: italic; color: rgb(51, 102, 255);"&gt;/lib/libc.so.1&lt;/span&gt;. Yes, that's right, a shared library... and that's the end of the road for me.&lt;br /&gt;&lt;br /&gt;My advice is not to touch the shared library, as doing so might affect the other applications running on your machine. Unless you know EXACTLY what you are doing, you might as well wait for a patch or the next release; perhaps Sun will fix this.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5021319479788698734-260735170088865138?l=raymondtay.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://raymondtay.blogspot.com/feeds/260735170088865138/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5021319479788698734&amp;postID=260735170088865138&amp;isPopup=true' title='3 Comments'/><link rel='edit' type='application/atom+xml' 
href='http://www.blogger.com/feeds/5021319479788698734/posts/default/260735170088865138'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5021319479788698734/posts/default/260735170088865138'/><link rel='alternate' type='text/html' href='http://raymondtay.blogspot.com/2008/06/oracle-10g-installation-on-solaris-10.html' title='Oracle 10g installation on Solaris 10 x86-64 is annoying ...'/><author><name>Raymond Tay</name><uri>https://profiles.google.com/110897033209560123249</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh4.googleusercontent.com/-4ZP-rPmLNSw/AAAAAAAAAAI/AAAAAAAABao/8f4AuUXNrno/s512-c/photo.jpg'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5021319479788698734.post-3008092027590060742</id><published>2008-05-26T19:03:00.005+08:00</published><updated>2008-08-19T12:34:54.055+08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Bug'/><category scheme='http://www.blogger.com/atom/ns#' term='C/C++'/><title type='text'>Solaris utility "cputrack" glitches</title><content type='html'>In my previous article I demonstrated that you can use "cputrack" to monitor CPU hardware counters, but it turns out that there was a glitch. I subsequently filed a bug report with Sun, and they assigned it the id &lt;a href="http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6705264"&gt;6705264&lt;/a&gt;.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5021319479788698734-3008092027590060742?l=raymondtay.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://raymondtay.blogspot.com/feeds/3008092027590060742/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5021319479788698734&amp;postID=3008092027590060742&amp;isPopup=true' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5021319479788698734/posts/default/3008092027590060742'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5021319479788698734/posts/default/3008092027590060742'/><link rel='alternate' type='text/html' href='http://raymondtay.blogspot.com/2008/05/solaris-utility-cputrack-glitches.html' title='Solaris utility &quot;cputrack&quot; glitches'/><author><name>Raymond Tay</name><uri>https://profiles.google.com/110897033209560123249</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh4.googleusercontent.com/-4ZP-rPmLNSw/AAAAAAAAAAI/AAAAAAAABao/8f4AuUXNrno/s512-c/photo.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5021319479788698734.post-5815338328864450977</id><published>2008-05-24T13:00:00.009+08:00</published><updated>2008-12-09T10:09:18.636+08:00</updated><title type='text'>IT salary survey 2008</title><content type='html'>Found an interesting comparison 
survey conducted by ZDNet Asia on the latest IT salaries in the island state of Singapore. Here are the snapshots from the report. You can draw your own conclusions :-)&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_n2HkB0XD3Kw/SDej--PfXMI/AAAAAAAAAnE/KlodwC3BuoU/s1600-h/IT_Salary_1.png"&gt;&lt;img style="margin: 0pt 10px 10px 0pt; float: left; cursor: pointer;" src="http://4.bp.blogspot.com/_n2HkB0XD3Kw/SDej--PfXMI/AAAAAAAAAnE/KlodwC3BuoU/s320/IT_Salary_1.png" alt="" id="BLOGGER_PHOTO_ID_5203808196705082562" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_n2HkB0XD3Kw/SDekHePfXNI/AAAAAAAAAnM/GtHcfRMxyo8/s1600-h/IT_Salary_2.png"&gt;&lt;img style="margin: 0pt 10px 10px 0pt; float: left; cursor: pointer;" src="http://2.bp.blogspot.com/_n2HkB0XD3Kw/SDekHePfXNI/AAAAAAAAAnM/GtHcfRMxyo8/s320/IT_Salary_2.png" alt="" id="BLOGGER_PHOTO_ID_5203808342733970642" border="0" /&gt;&lt;/a&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_n2HkB0XD3Kw/SDekOuPfXOI/AAAAAAAAAnU/RaSNvtekgAg/s1600-h/IT_Salary_3.png"&gt;&lt;img style="margin: 0pt 10px 10px 0pt; float: left; cursor: pointer;" src="http://3.bp.blogspot.com/_n2HkB0XD3Kw/SDekOuPfXOI/AAAAAAAAAnU/RaSNvtekgAg/s320/IT_Salary_3.png" alt="" id="BLOGGER_PHOTO_ID_5203808467288022242" border="0" /&gt;&lt;/a&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_n2HkB0XD3Kw/SDekU-PfXPI/AAAAAAAAAnc/3rI11FfkGgU/s1600-h/IT_Salary_4.png"&gt;&lt;img style="margin: 0pt 10px 10px 0pt; float: left; cursor: pointer;" src="http://4.bp.blogspot.com/_n2HkB0XD3Kw/SDekU-PfXPI/AAAAAAAAAnc/3rI11FfkGgU/s320/IT_Salary_4.png" alt="" id="BLOGGER_PHOTO_ID_5203808574662204658" border="0" /&gt;&lt;/a&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" 
href="http://2.bp.blogspot.com/_n2HkB0XD3Kw/SDekiePfXQI/AAAAAAAAAnk/emAebUpRud8/s1600-h/IT_Salary_5.png"&gt;&lt;img style="margin: 0pt 10px 10px 0pt; float: left; cursor: pointer;" src="http://2.bp.blogspot.com/_n2HkB0XD3Kw/SDekiePfXQI/AAAAAAAAAnk/emAebUpRud8/s320/IT_Salary_5.png" alt="" id="BLOGGER_PHOTO_ID_5203808806590438658" border="0" /&gt;&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5021319479788698734-5815338328864450977?l=raymondtay.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://raymondtay.blogspot.com/feeds/5815338328864450977/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5021319479788698734&amp;postID=5815338328864450977&amp;isPopup=true' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5021319479788698734/posts/default/5815338328864450977'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5021319479788698734/posts/default/5815338328864450977'/><link rel='alternate' type='text/html' href='http://raymondtay.blogspot.com/2008/05/it-salary-survey-2008.html' title='IT salary survey 2008'/><author><name>Raymond Tay</name><uri>https://profiles.google.com/110897033209560123249</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' 
height='32' src='//lh4.googleusercontent.com/-4ZP-rPmLNSw/AAAAAAAAAAI/AAAAAAAABao/8f4AuUXNrno/s512-c/photo.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://4.bp.blogspot.com/_n2HkB0XD3Kw/SDej--PfXMI/AAAAAAAAAnE/KlodwC3BuoU/s72-c/IT_Salary_1.png' height='72' width='72'/><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5021319479788698734.post-3647658548704958587</id><published>2008-05-13T11:34:00.001+08:00</published><updated>2008-12-09T10:09:19.149+08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Sun Studio 12'/><category scheme='http://www.blogger.com/atom/ns#' term='C/C++'/><title type='text'>Profiling Intel Core CPU Data Cache Unit with Sun Studio 12</title><content type='html'>On one of my idle days, I wondered how I could monitor the hardware counters of the data cache of a typical CPU. I think Sun Studio is capable of doing this to a large extent since it supports &lt;a href="http://developers.sun.com/solaris/articles/sparcv9.html"&gt;SPARCV9&lt;/a&gt;, &lt;a href="http://www.intel.com/"&gt;Intel&lt;/a&gt; and &lt;a href="http://www.amd.com/"&gt;AMD&lt;/a&gt; CPUs; at least that's what the documentation says.&lt;br /&gt;&lt;br /&gt;I know that a modern CPU core has both an instruction cache (L1/L2) and a data cache (L1/L2), and using the Studio I was able to measure data cache activity. 
I highly recommend downloading the Intel manual and getting the specifics from that document.&lt;br /&gt;&lt;br /&gt;For the purpose of this experiment, here are my specs:&lt;br /&gt;L1 Instruction Cache = 32 KB, 8-way set associative;&lt;br /&gt;L1 Data Cache = 32 KB, 8-way set associative, 64-byte cache line size;&lt;br /&gt;L2 Unified Cache = 4 MB, 16-way set associative, 64-byte cache line size.&lt;br /&gt;&lt;br /&gt;FYI, &lt;a href="http://www.slcentral.com/articles/00/10/cache/page2.php"&gt;&lt;span style="font-style: italic;"&gt;Unified Cache&lt;/span&gt;&lt;/a&gt; here is just a glorified term for a single cache that serves as both &lt;span style="font-style: italic;"&gt;Instruction Cache&lt;/span&gt; and &lt;span style="font-style: italic;"&gt;Data Cache&lt;/span&gt;. So the next question is: how do we go about measuring the data?&lt;br /&gt;&lt;br /&gt;As a matter of fact, there are a few ways to do this: (1) cputrack, (2) collect, (3) Sun's API. Let's go through them one by one.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size:130%;"&gt;&lt;span style="font-weight: bold;"&gt;Cputrack(1)&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Read the man page on it; you will probably discover many counters, but the ones I am interested in are prefixed with &lt;span style="font-style: italic;"&gt;dcu&lt;/span&gt;_*, since these counters measure the DCU (Data Cache Unit). 
A typical example for measuring this would be:&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;cputrack -v -c pic0=dcu_lines_in,sys,&lt;span style="font-weight: bold; color: rgb(102, 204, 204);"&gt;pic1=dcu_m_lines_in&lt;/span&gt;,sys ./dist/Debug/Sun12-Solaris-x86/c-programs-testing&lt;br /&gt;time lwp      event      pic0      &lt;span style="font-weight: bold; color: rgb(102, 204, 204);"&gt;pic1&lt;/span&gt;&lt;br /&gt;0.003   1   init_lwp         0         0&lt;br /&gt;1.018   1       tick  28291512         0&lt;br /&gt;2.008   1       tick  32125120         0&lt;br /&gt;3.018   1       tick  34223995         0&lt;br /&gt;4.018   1       tick  31042292         0&lt;br /&gt;4.192   1   fini_lwp 131022626         0&lt;br /&gt;4.192   1       exit 131022626         0&lt;br /&gt;bash-3.2$&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;But there's a problem with the above command on my machine: it wasn't able to capture the counter "dcu_m_lines_in", and the pic1 column stays at 0. Now if I re-arrange the counters as below, so that dcu_m_lines_in is measured on pic0, I get the data.&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;bash-3.2$ cputrack -v -c &lt;span style="font-weight: bold; color: rgb(102, 204, 204);"&gt;pic0=dcu_m_lines_in&lt;/span&gt;,sys,pic1=dcu_m_lines_in,sys ./dist/Debug/Sun12-Solaris-x86/c-programs-testing&lt;br /&gt;time lwp      event      &lt;span style="font-weight: bold; color: rgb(102, 204, 204);"&gt;pic0&lt;/span&gt;      pic1&lt;br /&gt;0.003   1   init_lwp         0         0&lt;br /&gt;1.020   1       tick   4035601         0&lt;br /&gt;2.020   1       tick   4641422         0&lt;br /&gt;3.020   1       tick   4283672         0&lt;br /&gt;4.020   1       tick   4235567         0&lt;br /&gt;4.157   1   fini_lwp  17793332         0&lt;br /&gt;4.157   1       exit  17793332         0&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Interesting.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size:130%;"&gt;&lt;span style="font-weight: bold;"&gt;Collect&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;This tool is included in Sun Studio 12 and works similarly to &lt;span style="font-style: italic;"&gt;cputrack&lt;/span&gt;; in fact, the counters available in cputrack are the ones used by &lt;span style="font-style: italic;"&gt;collect&lt;/span&gt;, so the same limitations apply. I am not sure if it's a new problem, but after searching Sun's bug database I don't think it's an existing one.&lt;br /&gt;&lt;br /&gt;Below are a couple of screenshots illustrating how easy it is to collect my CPU's hardware counter data.&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_n2HkB0XD3Kw/SCfiLQLQD8I/AAAAAAAAAms/0wjrM6STDq8/s1600-h/collect_1.png"&gt;&lt;img style="margin: 0pt 10px 10px 0pt; float: left; cursor: pointer;" src="http://3.bp.blogspot.com/_n2HkB0XD3Kw/SCfiLQLQD8I/AAAAAAAAAms/0wjrM6STDq8/s320/collect_1.png" alt="" id="BLOGGER_PHOTO_ID_5199372977771122626" border="0" /&gt;&lt;/a&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_n2HkB0XD3Kw/SCfiVQLQD9I/AAAAAAAAAm0/o-zIgAlI2Lc/s1600-h/collect_2.png"&gt;&lt;img style="margin: 0pt 10px 10px 0pt; float: left; cursor: pointer;" src="http://3.bp.blogspot.com/_n2HkB0XD3Kw/SCfiVQLQD9I/AAAAAAAAAm0/o-zIgAlI2Lc/s320/collect_2.png" alt="" id="BLOGGER_PHOTO_ID_5199373149569814482" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;After that, simply open the Studio experiment and examine the data. Again, the limitation I described for cputrack exists here as well, at least for me. 
I would appreciate it if you could drop me a mail if you find it works fine for you.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size:130%;"&gt;&lt;span style="font-weight: bold;"&gt;Sun's API&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;I wrote a test program to illustrate the collection of hardware counters for profiling. Thanks to the team who provided the skeleton code in the man page of cpc_bind_curlwp.&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;newmain.c&lt;br /&gt;----------&lt;br /&gt;/*&lt;br /&gt;* File:   newmain.c&lt;br /&gt;* Author: tayboonl&lt;br /&gt;*&lt;br /&gt;* Created on April 22, 2008, 1:26 PM&lt;br /&gt;*/&lt;br /&gt;#include &amp;lt;stdlib.h&amp;gt;&lt;br /&gt;#include &amp;lt;stdio.h&amp;gt;&lt;br /&gt;#include &amp;lt;string.h&amp;gt;&lt;br /&gt;#include &amp;lt;time.h&amp;gt;&lt;br /&gt;&lt;br /&gt;#define NUM_OF_ITEMS 100000&lt;br /&gt;&lt;br /&gt;/* function prototype */&lt;br /&gt;void hwc_init(void);&lt;br /&gt;void hwc_finit(void);&lt;br /&gt;&lt;br /&gt;struct test {&lt;br /&gt;  char
