Sunday, 1 December 2013

OpenCL on the Samsung Chromebook ARM - Benchmarks

This is a second post on using OpenCL on the Chromebook ARM. The previous one gives instructions to install the OpenCL drivers and SDK on the Samsung Chromebook ARM using crouton, without having to boot a separate Ubuntu installation. This post compares the OpenMP and OpenCL performance of the Chromebook ARM with that of a 4-year-old laptop.

Test setup

In these tests, I compare my 4-year-old Dell laptop with the Samsung ARM Chromebook. It's obviously not a very fair comparison: the laptop is quite obsolete now (and will actually be replaced soon). On the other hand, the Samsung ARM is a budget device with a ridiculously low power consumption.

          Dell Latitude E6400                Samsung Chromebook ARM
CPU       Intel Core2 Duo T9950 @ 2.66GHz    Samsung Exynos 5 Dual (5250)
          (no hyperthreading)                (Cortex A15; 1.7GHz dual core CPU)
GPU       NVIDIA Corporation G98M            ARM Mali T604
          (Quadro NVS 160M)                  (quad core)
          256MB dedicated RAM
Software  (gcc 4.8.2)                        Ubuntu 12.04 in crouton
                                             (gcc 4.6.3)

As a benchmark test suite, we use Rodinia, a CUDA/OpenMP/OpenCL test suite from the University of Virginia.

The test suite does not compile unmodified, and for some OpenCL tests, the number of threads needs to be reduced to fit in the limited memory of both computers. Complete instructions and patches can be found in my github repository.

Test results - OpenMP

First we show comparisons using OpenMP, which only makes use of the CPU. We expect the Intel laptop to be far superior, and this is what we get:
The ARM CPU alone is between 1.75 and 4.5 times slower (average: 2.87 times slower). Not a big surprise considering the lower frequency and simpler architecture. On the other hand, the Chromebook stays cool (no fan!), while the Dell laptop blows out hot air on the side.

Test results - OpenCL

We can then compare the GPUs, using OpenCL tests in Rodinia:
And this is where we get a nice surprise: the ARM GPU is very close in performance in most tests. Excluding ParticleFilter, which is 8 times slower, the GPU is, in the worst case, 1.94 times slower, and it is even 1.8 times faster in the Kmeans test (average: 1.25 times slower).

For some reason, I could not get the OpenCL code to compile on the Samsung ARM for LavaMD (it fails with CL_INVALID_KERNEL_ARGS), but I didn't try very hard. Let me know if you find a way! It would also be interesting to figure out why ParticleFilter is so slow.

Test results correctness

The benchmark timings need to be taken with a bit of caution, in case the results are garbled. Some tests do not produce any output, so it's hard to tell if the computation is correct. On the other hand, HotSpot produces some output that can be plotted:

As you can see, the results look identical in all cases.

To analyse the differences more precisely, we can measure some average numerical error between data results x and y of 2 different implementations, as follows:

And a summary of these errors, for 2 of the tests, where some output is created:

[Table: average numerical error for each of the 2 tests, compared against the x86 OpenMP and x86 OpenCL results]

As you can see, running the OpenMP code on ARM and x86 gives identical results. The OpenCL results are also very close. That basically means the comparisons above between the 2 laptops are fair.

When it comes to differences between OpenMP and OpenCL code, the HotSpot test shows good agreement between the 2 versions. On the other hand, the CFD test outputs very different results between the 2 implementations. This is worrisome, as CFD is one of the tests that shows the most improvement using OpenCL compared to OpenMP...

Test results - overall

The next graph shows all the results aggregated.
Assuming OpenCL and OpenMP implementations give similar results (which is actually doubtful in some cases, see the previous section), running OpenCL code on the Samsung Chromebook ARM can help a lot in terms of performance: on the Dell laptop, using OpenCL improves performance by a factor of 2.25 on average. On the Chromebook ARM, the ratio is 4.1! And this is without any attempt at optimizing the code for the Mali architecture, which is quite different from a normal GPU (in particular, it has no local memory, so data does not need to be copied back and forth).

I'd also like to try some real applications: my next project is to get darktable running (a RAW photo developer). Do let me know if you have some real applications using OpenCL! I'll follow up with another post if I get them to work.

Saturday, 23 November 2013

OpenCL on the Samsung Chromebook ARM, under crouton

In this post we’re going to look into OpenCL development on the Samsung Chromebook ARM, using crouton, and turn your Chromebook into a tiny supercomputer for CFD or bioinformatics.

ARM recently posted some instructions about OpenCL development on the Chromebook ARM, but these require creating a separate Ubuntu installation that you boot off a USB drive, and are quite lengthy. The instructions I provide here are very simple: they make use of crouton, which allows you to run Chrome OS and Ubuntu in parallel, so that you can develop OpenCL applications without rebooting.

Download required files

Download the following files, and put them in the Chrome OS Downloads folder:
  • crouton: Read the information there if you do not know (yet) what crouton is all about. One important note: crouton requires that you switch your Chromebook to developer mode, which wipes all the data on the Chromebook (but that's required by the ARM approach as well).
  • Chrome OS does not provide an OpenCL driver, but you can get it from this page: Mali Binary User Space Driver, Linux r3p0-02rel0 (22nd October 2013), X11 version. You can use this direct link to get the tarball.
    Luckily, the newer userspace drivers work with the current kernel drivers in Chrome OS.
  • ARM Mali OpenCL SDK, so you can test some pretty examples.

Create crouton chroot

We are going to create a precise chroot (Ubuntu 12.04, but other releases should also work). You need at least the x11 target to be installed, as Mali libraries depend on X11 libraries. This is actually an artificial dependency: OpenCL itself does not require X11, but the ARM Mali library is a single blob that provides OpenCL and OpenGL ES, and the latter depends on X11 libraries.

We’re going to install xfce, as it pulls in the x11 target, and gives you a nice and light desktop environment.

Open a crosh shell, and create the chroot:
sudo sh -e ~/Downloads/crouton -r precise -t xfce
This will take a while, depending on your Internet connection speed.

Then start XFCE:
sudo startxfce4 -n precise
Alternatively, if you’d rather stay in the command line, you can simply type:
sudo enter-chroot -n precise

Install OpenCL userspace drivers

Then, inside the chroot (if you have started XFCE, open a terminal), extract the Mali libraries:
mkdir -p ~/mali/lib
tar xvf ~/Downloads/linux-x11-hf-r3p0-02rel0.tgz -C ~/mali/lib/
You also need to install 2 extra dependencies:
sudo apt-get install libxcb-dri2-0 libxcb-render0
The libraries that we need to run OpenCL code, in particular libOpenCL.so, are now located in ~/mali/lib. We could install these in /usr/lib or /usr/local/lib, so that the dynamic linker could find them automatically. However, in this case, we prefer setting an environment variable, in order to avoid touching the original Ubuntu filesystem:
export LD_LIBRARY_PATH=$HOME/mali/lib
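Note that the export above replaces any value LD_LIBRARY_PATH already had. If you rely on that variable for something else, a small helper can prepend the Mali directory instead. This is only a sketch: prepend_lib_path is a hypothetical name of mine, not something provided by crouton or by the Mali driver package.

```shell
# Hypothetical helper: prepend a directory to LD_LIBRARY_PATH,
# keeping whatever was already there and avoiding duplicates.
prepend_lib_path() {
    dir=$1
    case ":${LD_LIBRARY_PATH:-}:" in
        *":$dir:"*) ;;  # already listed, nothing to do
        *) LD_LIBRARY_PATH="$dir${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}" ;;
    esac
    export LD_LIBRARY_PATH
}

prepend_lib_path "$HOME/mali/lib"
```

Calling it twice with the same directory leaves the variable unchanged, so it should be safe to put in a shell startup file.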

Try out an example from Mali OpenCL SDK

Let's try an example from the SDK. First, you need to install some essential tools:
sudo apt-get install make g++
Then, extract the SDK:
cd ~/mali
tar xvf ~/Downloads/Mali_OpenCL_SDK_v1.1.0.0a36a7_Linux.tar.gz
cd Mali_OpenCL_SDK_v1.1.0
Now modify platform.mk. We are not doing any cross-compilation here, so CC and AR can simply be set to the native tools (g++ and ar) instead of a cross-compilation toolchain.
You can pick any example in the samples directory, but mandelbrot is probably the most "spectacular". First compile it:
cd samples/mandelbrot
make
Then run it:
./mandelbrot
The output is quite verbose. I have yet to figure out how to make use of the instrumentation and debugging features of the Mali OpenCL library:
[PLUGIN INFO] Plugin initializing
[PLUGIN DEBUG]  './override.instr_config' not found, trying to open the process config file
[PLUGIN DEBUG]  './mandelbrot.instr_config' not found, trying to open the default config file
[PLUGIN ERROR] Couldn't open default config file './default.instr_config'.
[PLUGIN INFO] No configuration file found, attempting to use environment
[PLUGIN INFO] CINSTR GENERAL: Output directory set to: .
[PLUGIN INFO] No instrumentation features requested.
Profiling information:
Queued time:    0.115ms
Wait time:      0.605292ms
Run time:       2218.38ms
In exchange for 2 seconds of GPU time, you get a nice Mandelbrot fractal in output.bmp.

That’s it! The next post will focus on some benchmarks, and a comparison with an x86 laptop GPU and OpenMP implementations.

Sunday, 6 October 2013

Running Chromium OS in QEMU

If you'd like to have a taste of Chrome/Chromium OS before buying an actual Chromebook, the recommended way used to be to download the Hexxeh images, and run them in a virtual machine (Virtualbox). However, these images are now quite outdated.

The steps I mention here use development Chromium OS builds, made available by the Chromium developers. The resulting "experience" is still quite different from running Chrome OS on a proper device: there is no graphics acceleration, there are no applications installed by default, and the user interface is painfully slow. However, this is still good enough to get a general idea (or do some development).

Fetch the image

First, go to the Chromium OS builders page, then click the architecture you want. The following assumes you are picking amd64 generic full (only use full images, don't even try the other ones), but other architectures should also work. On the next page, choose one of the builds, for example, #9624. Then, under step 16. Report, click on Artifacts. You are presented with a list of files: download chromiumos_qemu_image.tar.xz.

Extract the image

Now, let's extract the image. It is an 8GB file, but it contains mostly zeros. We use dd to recreate a sparse file, to save a significant amount of disk space:
$ tar xvfO chromiumos_qemu_image.tar.xz chromiumos_qemu_image.bin |
dd of=chromiumos_qemu_image.bin conv=sparse
If you have a lot of hard drive space, you can ignore that, and simply run:
$ tar xvf chromiumos_qemu_image.tar.xz
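If you are curious whether the sparse extraction actually saves space, you can compare the apparent size of a file with the space it really occupies on disk. The sketch below does this on a small throwaway file of zeros rather than on the real image. show_sizes is a hypothetical helper name, and the example assumes GNU dd (for conv=sparse) and a filesystem that supports sparse files.

```shell
# Hypothetical helper: print the apparent size and the allocated size, in KiB.
show_sizes() {
    apparent_kb=$(( $(wc -c < "$1") / 1024 ))
    allocated_kb=$(du -k "$1" | cut -f1)
    echo "apparent: ${apparent_kb} KiB, allocated: ${allocated_kb} KiB"
}

tmp=$(mktemp)
# 8 MiB of zeros, piped through dd so that runs of zeros become holes.
dd if=/dev/zero bs=1024 count=8192 2>/dev/null | dd of="$tmp" conv=sparse 2>/dev/null
show_sizes "$tmp"
rm -f "$tmp"
```

On a sparse file the allocated figure should be much smaller than the apparent one. Running show_sizes on chromiumos_qemu_image.bin gives the same comparison for the real image.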

Start QEMU

Now, let's start QEMU:
$ qemu-system-x86_64 -enable-kvm -m 1024 -net nic -net user -redir tcp:9922::22 -hda chromiumos_qemu_image.bin
If permissions are set up properly, you do not need to run this as root.
The parameters serve the following purpose:
  • -enable-kvm: Makes sure we use hardware virtualisation extensions.
  • -m 1024: Gives 1GB of memory to Chromium OS. If you don't specify this, the default is 128MB, which is far from enough to run Chromium OS.
  • -net nic -net user: Creates a virtual ethernet interface.
  • -redir tcp:9922::22: Since those are development builds, they come with an SSH server. This redirects local port 9922 to the virtual machine's port 22 (more on this later).
  • -hda chromiumos_qemu_image.bin: Use the image as hard drive.
And now you should see Chromium OS booting. In some cases, you may be unlucky, and the build you have picked may be broken (remember, those are development builds). In this case, pick an older build (2-3 days older at least), and retry.
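A note if you are on a more recent QEMU: as far as I know, the -redir option has since been removed, and the same port forwarding is expressed with hostfwd on the user network backend. The sketch below only builds and prints the command line, so that you can check it before running it. qemu_cmd is a hypothetical helper name, and whether your QEMU version needs the hostfwd form is an assumption you should verify against its documentation.

```shell
# Hypothetical helper: print a QEMU command line that forwards
# host port $2 (default 9922) to port 22 of the guest, using hostfwd.
qemu_cmd() {
    image=$1
    port=${2:-9922}
    echo "qemu-system-x86_64 -enable-kvm -m 1024" \
         "-net nic -net user,hostfwd=tcp::${port}-:22" \
         "-hda ${image}"
}

qemu_cmd chromiumos_qemu_image.bin
```

To actually start the virtual machine, run the printed command.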

Access the virtual machine via SSH

This is mostly useful for developers. The SSH server only allows login via public keys (no password authentication). To set up the public key, follow these steps:
  • Set up an SSH private/public key pair on the host machine, if you don't have one already.
  • Copy the content of ~/.ssh/id_rsa.pub into a pastebin, Google Doc, or whatever web page you can access from Chromium OS.
  • In Chromium OS, browse to the pastebin/Google Doc, and copy the key information.
  • Then press Ctrl-Alt-T: this opens a new crosh shell, and type:
> shell
$ cd
$ mkdir .ssh
$ cat > .ssh/authorized_keys
  • Then paste the content of the key, using Ctrl-Shift-V, and press Ctrl-D to terminate.
Now you can access your QEMU "Chromiumbook" from your host, using:
ssh chronos@localhost -p 9922
And you can even mount a shared directory on the virtual machine with something like:
sshfs chronos@localhost: ssh.qemu -p 9922
If you later restart QEMU, you will need to log in to Chromium OS before the steps above work again.
That's it! I find this particularly useful when I develop for crouton/chroagh, as I only have a Samsung ARM Chromebook. With this setup, I can test my code on x86 and x86_64 architectures.

Monday, 30 September 2013

Tracking your DHL package in conky

(or converting the DHL tracking page from HTML to plain text, using XSLT)

So, let me explain the problem: You have that package shipped by DHL, and its tracking number. And you're so eager to receive it, that you end up checking the package tracking page every 5 minutes. Your productivity falls to zero.

But, worry no more! Let's put the package status inside your conky, so that you can just have a quick look on the side of your screen, and continue working.

Just in case you don't know (but really, you should), conky is the information bar on the right of the screenshot below:
conky is the information bar on the right of the display: CPU/RAM usage, disk free space, network status, temperature, a calendar, my DHL tracking thing, then weather in a few places.
In case you wonder, the background is somewhere in Hue, Vietnam.
And don't worry about the gimp error, really.
The idea is to write the code that can generate this:
Yes, they spelled my name wrong...

From something as ugly looking as this:

I don't care how it works, I just want to get it running

OK! After all, the point of this was to increase your productivity, right? You can fetch the script from my github.

Then call it with:
./dhl <AWB>
Where <AWB> is the Waybill number (tracking number). It produces text-only tracking information for your package.

You can integrate it in your conky with something like:
${font Monospace:size=6}${execi 60 ~/.conky/dhl <AWB> | head -n 3 | fold -w 16}$font

Replace ~/.conky/dhl with the path to where you copied the script. Change the head parameter if you want more lines. fold inserts a new line every 16 characters (change that depending on your conky width).
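To see what head and fold do to the script output before wiring them into conky, you can feed them a few sample lines. The sample text below is partly made up for illustration (it mimics a date line followed by "time: event" lines), and trim_for_conky is a hypothetical helper name.

```shell
# Hypothetical helper: keep the first $1 lines, then wrap them at $2 columns.
trim_for_conky() {
    head -n "$1" | fold -w "$2"
}

# Made-up sample of what the dhl script might print.
printf '%s\n' \
    'Thursday, September 19, 2013' \
    '7:27 PM: With delivery courier' \
    '9:02 AM: Arrived at facility' \
    '6:15 AM: Departed facility' |
    trim_for_conky 3 16
```

Note that head runs before fold, so the 3 counts lines of the original output, not wrapped lines: with a narrow width, 3 input lines can take up noticeably more than 3 rows in conky.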

Now, if you want to know how it works, so you can fix it if it breaks, or update the code for other shipping companies, continue reading.

Inspecting the HTML source

The tracking URL looks like this (where <AWB> is your tracking number):
Looking at the HTML source, we notice that the interesting stuff is enclosed in a table:
<table border="0" summary="Summary of table content">
Then, you have a succession of thead/tbody tags. The first thead contains general information about the package, that we are not interested in. It starts like this (notice it has class "tophead"):
<thead class="tophead">
The next thead shows the date valid for the following entries. We are only interested in the first column here (the one that contains the date).
        <td colspan="5" class="emptyRow"></td>
        <th scope="col" colspan="2" axis="length"
            style="width: 40% ;text-align:left">Thursday, September 19, 2013         </th>
        <th scope="col" axis="length"
            style="width: 30% ;text-align:left ">Location</th>
        <th scope="col" axis="length"
            style="width: 9%;text-align:left">Time</th>
        <th scope="col" axis="length" class="lastChild"
            style="width: 25% ;text-align:left">&nbsp;</th>
Finally, the bulk of the events are enclosed in tbody. The first column is an incrementing number, the second one is a description of what happened (passed customs, arrived at destination, etc.), the third one tells you the location (but this is often repeated in the description), and the fourth one is the time.
        <td class="" style="width: 5% ;text-align:left">18</td>
        <td class="" style="text-align:left">With delivery courier</td>
        <td class="" style="text-align:left">SINGAPORE - SINGAPORE</td>
        <td class="">7:27 PM</td>
        <td class="lastChild "><!--start contentteaser -->
        <div class="dhl">
        <div><div class="clearAll">&nbsp;</div></div>
        </div><!--end contentteaser --></td>
OK, now that we have an idea of the structure, let's parse that!

Parse HTML with XSLT

OK, so let's say you have the DHL tracking page downloaded to /tmp/dhl.tmp, and an XSLT file in dhl.xslt. You can then parse the page with:
xsltproc --html dhl.xslt /tmp/dhl.tmp
The XSLT file looks like this:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
    <xsl:output method="text" encoding="utf-8" />
    <xsl:template match="/">
        <xsl:for-each select="//table[@summary='Summary of table content']/*[self::thead|self::tbody][not(@class)]">
            <xsl:choose>
                <xsl:when test="name(.) = 'thead'">
                    <xsl:value-of select="tr/th[1]"/>
                    <xsl:text>&#10;</xsl:text>
                </xsl:when>
                <xsl:otherwise>
                    <xsl:if test="floor(tr/td[1]) = tr/td[1]">
                        <xsl:value-of select="normalize-space(tr/td[4])"/>
                        <xsl:text>: </xsl:text>
                        <xsl:value-of select="normalize-space(tr/td[2])"/>
                        <xsl:text>&#10;</xsl:text>
                    </xsl:if>
                </xsl:otherwise>
            </xsl:choose>
        </xsl:for-each>
    </xsl:template>
</xsl:stylesheet>
Let's take it step by step. It starts like this:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
    <xsl:output method="text" encoding="utf-8" />
    <xsl:template match="/">
Nothing special here, apart from the text output mode, so that xsltproc outputs a text file (and not another XML file...).

Now begins the fun. We look for a table with summary attribute 'Summary of table content'. Inside that table, we look for thead and tbody elements, that do not have a class attribute set, so we can exclude the 'tophead' row, that we are not interested in.
<xsl:for-each select="//table[@summary='Summary of table content']/*[self::thead|self::tbody][not(@class)]">
Now, thead (containing only the date of the following events) and tbody (containing events) need to be parsed differently. This is done with xsl:choose:
<xsl:choose>
    <xsl:when test="name(.) = 'thead'">
For thead, we just want to show the date, that is the first th inside a tr (tr/th[1]). Then we print a new line with xsl:text.
<xsl:when test="name(.) = 'thead'">
    <xsl:value-of select="tr/th[1]"/>
    <xsl:text>&#10;</xsl:text>
</xsl:when>
For tbody, it is slightly more complicated. First, we check that the first column is indeed a number (this removes the last row in the table, which is another type of summary): this is done with a "trick" (floor(tr/td[1]) = tr/td[1]). Then we print the time (4th column), followed by a colon, and the event description (2nd column).
<xsl:otherwise>
    <xsl:if test="floor(tr/td[1]) = tr/td[1]">
        <xsl:value-of select="normalize-space(tr/td[4])"/>
        <xsl:text>: </xsl:text>
        <xsl:value-of select="normalize-space(tr/td[2])"/>
        <xsl:text>&#10;</xsl:text>
    </xsl:if>
</xsl:otherwise>
That's it! Then you can put everything in a shell script, see the complete code on github for details.
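A word on why the floor() trick works: XPath converts a non-numeric string to NaN, and NaN is never equal to anything, including itself, so only cells holding a whole number pass the test. Here is a small self-contained check, assuming xsltproc is installed. The rows/td sample document and the floor_demo name are made up for illustration and are not part of the dhl script.

```shell
# Hypothetical demo: only the cell holding an integer passes floor(.) = .
floor_demo() {
    xml=$(mktemp) && xslt=$(mktemp) || return 1
    # Made-up input: an integer, a word, and a non-integer number.
    cat > "$xml" <<'EOF'
<rows><td>18</td><td>Total</td><td>7.5</td></rows>
EOF
    cat > "$xslt" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
    <xsl:output method="text" encoding="utf-8" />
    <xsl:template match="/">
        <xsl:for-each select="rows/td">
            <xsl:if test="floor(.) = .">
                <xsl:value-of select="."/>
                <xsl:text>&#10;</xsl:text>
            </xsl:if>
        </xsl:for-each>
    </xsl:template>
</xsl:stylesheet>
EOF
    xsltproc "$xslt" "$xml"
    rm -f "$xml" "$xslt"
}

# Only run the demo when xsltproc is available.
if command -v xsltproc >/dev/null 2>&1; then
    floor_demo   # prints 18
fi
```

"Total" becomes NaN and is rejected, and 7.5 is rejected because floor(7.5) is 7, which leaves only 18.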

It's so cool, I want more!

I get it. I, too, have become a fan of parsing XML/HTML from scripts. See this post for another example.

Thursday, 25 July 2013

Bokeh-fixing: Opening and cleaning an Olympus OM 50mm f/1.8

In the previous post, I talked about the Olympus OM 50mm f/1.8, and how I find it interesting to use on my Micro Four Thirds camera (Panasonic DMC-GX1), along with some sample shots.

When taking some night shots, with an object close in focus, I could see obvious defects in the bokeh, that is, the round circle of light coming from a distant, out-of-focus, light.

The next image is taken by pointing at a spot light about 200m away, but setting the focus at its closest position (0.45m). This gives a large bokeh:
Spot light is about 200m away, focus set at 0.45m, aperture f/1.8. ISO 1600, 1/40s. The bokeh dimension is about 1000x1000 pixels, that is a little more than a fifth of the width of the image.
Clearly, something is wrong here: there are some black dots and strange reflections on the left side of the bokeh circle.

By looking inside the lens, I can see something that looks like oil drops, apparently not far from the back, maybe behind the outermost lens. I'm wondering if it comes from the aperture mechanism, since it's slow and has obvious oil marks on it, but I can't tell for sure.

I looked online, and some people on dpreview forums advise that it may not be worth the fuss trying to open it up, and that it would be easier to buy a new one, considering the price. On the other hand, it is such a cheap lens (~25USD with shipping) that it would not be a disaster if I broke it. Looking further, I found some diagrams on Olympus Dementia, but even if you can figure out which exact model of lens you have (Olympus made multiple fairly different versions over the years), it still does not tell you how to open it.

Anyway, since my problem looked to be at the back, and since there are 3 obvious screws there, I decided to start on that side:

It comes out easily. The lever to unlock the lens from the mount falls down (left on the picture below), but it isn't very tricky to find out how to put it back:

Then, a big part of the aperture mechanism comes out easily. This mechanism contains a spring that opens the aperture to the maximum. When the aperture lever is pressed (right of the picture), the spring is extended, and the lens stops down to the desired setting on the aperture ring. I took out the whole thing, taking care of keeping all the elements together. The lever falls out, but it's easy to figure out how to put it in again:

Then I'm left with this, and nothing obvious to remove. I want to remove the metal ring at the top, as it looks like there is oil right behind the glass that it is holding. It is screwed to the bottom part, but hard to remove. I notice some glue near the joint, so I scratch it off with a box cutter:

And after this, I managed to open it up, using a soft cloth to give me more grip and avoid damaging the lens (you can see some scratches on the screw thread, that's where the glue was):

The top glass is now free, and the easiest way to remove it is by gravity: invert the lens, hold it in a soft cloth so that the glass does not fall down too hard, and shake it a bit.

No oil on that lens, but, luckily, I could spot it on the lens just below. I did not want to introduce any liquid in the lens, so I removed it the best I could, possibly smudging around instead of properly removing it, actually. A more proper way would have been to find the way to take out that glass, but, well, that would have required significantly more work.

After getting convinced that most of it was removed (or at least evened out...), I reassembled it, and took the same picture. Notice the improvement!
Left: before, Right: after. There is still a slight smudge on the right, but it is noticeably better.
And my 13.5 USD 50mm f/1.8 recovers its original beautiful bokeh!

Monday, 22 July 2013

Olympus OM 50mm f/1.8 on Micro Four Thirds

One of the strong points of the Micro Four Thirds (MFT) system is that, thanks to its short flange focal distance, you can mount lenses designed for almost any other camera system.

I believe you can get the best deals by buying Olympus OM lenses: no current camera supports those lenses anymore, but they were mass-produced in the 80s and 90s. These are the ingredients for high supply and low demand, and therefore low prices on auction websites.

This is especially true of the Olympus OM 50mm f/1.8, which used to be a kit lens with many Olympus film cameras. Almost a year ago, I bought one on eBay, for 13.50 USD (+ 11 USD shipping). I mounted it on my Panasonic DMC-GX1, using an OM to MFT adapter (less than 10 USD).

I originally bought this lens to use it as part of a custom tilt-shift adapter, but realised that the 50mm focal length is usually too narrow, and purchased a Promaster 28mm f/2.8 for that purpose (OM mount as well).

This lens is really amazing (especially considering its price): it becomes a short telephoto lens on the MFT system (100mm full-frame equivalent), which gives you interesting constraints: you have to focus on details, or put some distance between you and your subject. The large aperture makes it particularly interesting in low-light conditions (museums, night markets, etc.). On the other hand, it does require ND filters in bright daylight, as you are hitting the maximum shutter speed of the camera (1/4000s for the GX1): a 3-stop ND filter, that is ND8 or 0.9 optical density, works perfectly for these situations. I actually never stop the aperture down: I would rather switch to another lens if I want more depth of field.

Focusing is not easy, especially without a viewfinder. MFT cameras provide a magnified view to help you focus, but, with a bit of practice, I'm able to get a reasonably good focus without using that mode, by moving the ring back and forth until I have a good idea of the best position.

The lens I got was in good condition, except for the aperture, that is a bit sluggish: you need to jiggle the aperture ring to get it back to f/1.8 if you stop it down. I could also see some oil on the aperture blades: probably the reason why the mechanism is not working as well as expected. But again, since I only use it at maximum aperture, this is not really a concern for me.

I used that lens for a number of night shots, and realised that the bokeh is not exactly as round and nice as it should be: there is some "dirt" on the left side of the disk (when held in landscape orientation). This does not show up clearly in most shots, but it looks quite silly when the same pattern repeats in different locations on the frame:
Each of the bokeh rings shows some black spots at the bottom: looking through the lens, I can see some oil marks.
The next post will show you how I managed to fix the problem, by opening up the lens.

In the meantime, I uploaded to Flickr a collection of photos taken with that lens:

Wednesday, 19 June 2013

Good morning haze!

Yesterday morning Singapore woke up under thick haze due to forest fires in nearby Sumatra. Not healthy: You can feel it in your throat, and some corridors smell like Scamorza (some delicious Italian smoked cheese).

It smells just like that... (Image from Necrophorus@Wikipedia, GFDL)
Anyway... it gives some "interesting" light when the sun is low.

Good morning purée... (slightly underexposed)

Red sun, still high above the horizon (~1h before sunset).