So, let me explain the problem: You have that package shipped by DHL, and its tracking number. And you're so eager to receive it, that you end up checking the package tracking page every 5 minutes. Your productivity falls to zero.
But, worry no more! Let's put the package status inside your conky, so that you can just have a quick look on the side of your screen, and continue working.
Just in case you don't know (but really, you should), conky is the information bar on the right of the screenshot below:
Yes, they spelled my name wrong...
From something as ugly looking as this:
I don't care how it works, I just want to get it running
Ok! After all, the point of this was to increase your productivity, right? You can fetch the script from my GitHub. Then call it with:
./dhl <AWB>
Where <AWB> is the waybill number (tracking number). It prints plain-text tracking information for your package.
You can integrate it in your conky with something like:
${font Monospace:size=6}${execi 60 ~/.conky/dhl <AWB> | head -n 3 | fold -w 16}$font
Replace ~/.conky/dhl with the path where you copied the script, and <AWB> with your tracking number. execi 60 reruns the command every 60 seconds. Raise the head -n value if you want more lines; fold -w 16 wraps the output at 16 characters (adjust that to your conky width).
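If you want to see what head and fold do to the text before touching your conky config, here is a small offline sketch. The sample_output function is a made-up stand-in for ./dhl <AWB>: the first two lines reuse the date and event shown later in this post, the other two are placeholders.

```shell
# Stand-in for the output of ./dhl <AWB> (hypothetical helper, no network needed).
sample_output() {
    printf '%s\n' \
        'Thursday, September 19, 2013' \
        '7:27 PM: With delivery courier' \
        '(placeholder: older event)' \
        '(placeholder: dropped by head)'
}

# Same post-processing as in the conky line:
# keep the first 3 lines, then wrap each of them at 16 columns.
sample_output | head -n 3 | fold -w 16
```

Note that head runs before fold, so "3" counts lines of script output, not lines on screen: each long line is then split into several short ones.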
Now, if you want to know how it works, so you can fix it if it breaks, or update the code for other shipping companies, continue reading.
Inspecting the HTML source
The tracking URL looks like this (where <AWB> is your tracking number):
http://www.dhl-usa.com/content/us/en/express/tracking.shtml?brand=DHL&AWB=<AWB>
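A minimal sketch of building that URL in a shell function. The name tracking_url and the AWB value 1234567890 are made up for illustration, and the URL pattern is simply the one quoted above, so it may not match what DHL serves today.

```shell
# Hypothetical helper: print the tracking URL for the AWB given as $1.
tracking_url() {
    printf 'http://www.dhl-usa.com/content/us/en/express/tracking.shtml?brand=DHL&AWB=%s\n' "$1"
}

# Example with a placeholder tracking number.
tracking_url 1234567890
```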
Looking at the HTML source, we notice that the interesting stuff is enclosed in a table:
<table border="0" summary="Summary of table content">
Then, you have a succession of thead/tbody tags. The first thead contains general information about the package, that we are not interested in. It starts like this (notice it has class "tophead"):
<thead class="tophead">
The next thead shows the date valid for the following entries. We are only interested in the first column here (the one that contains the date).
<thead>
<tr>
<td colspan="5" class="emptyRow"></td>
</tr>
<tr>
<th scope="col" colspan="2" axis="length"
style="width: 40% ;text-align:left">Thursday, September 19, 2013 </th>
<th scope="col" axis="length"
style="width: 30% ;text-align:left ">Location</th>
<th scope="col" axis="length"
style="width: 9%;text-align:left">Time</th>
<th scope="col" axis="length" class="lastChild"
style="width: 25% ;text-align:left"> </th>
</tr>
</thead>
Finally, the bulk of the events are enclosed in tbody. The first column is an incrementing number, the second is a description of what happened (passed customs, arrived at destination, etc.), the third tells you the location (but this is often repeated in the description), and the fourth is the time.
<tbody>
<tr>
<td class="" style="width: 5% ;text-align:left">18</td>
<td class="" style="text-align:left">With delivery courier</td>
<td class="" style="text-align:left">SINGAPORE - SINGAPORE</td>
<td class="">7:27 PM</td>
<td class="lastChild "><!--start contentteaser -->
<div class="dhl">
<div><div class="clearAll"> </div></div>
</div><!--end contentteaser --></td>
</tr>
</tbody>
Ok, now that we have an idea of the structure, let's parse that!
Parse HTML with XSLT
Ok, so let's say you have the DHL tracking page downloaded to /tmp/dhl.tmp, and an XSLT file named dhl.xslt. You can then parse the page with:
xsltproc --html dhl.xslt /tmp/dhl.tmp
The XSLT file looks like this:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="text" encoding="utf-8" />
  <xsl:template match="/">
    <!-- every thead/tbody of the tracking table that has no class attribute -->
    <xsl:for-each select="//table[@summary='Summary of table content']/*[self::thead|self::tbody][not(@class)]">
      <xsl:choose>
        <xsl:when test="name(.) = 'thead'">
          <!-- date header -->
          <xsl:value-of select="tr/th[1]"/>
          <!-- the closing tag stays unindented so that only a newline is emitted -->
          <xsl:text>
</xsl:text>
        </xsl:when>
        <xsl:otherwise>
          <!-- event row: only when the first column is a whole number -->
          <xsl:if test="floor(tr/td[1]) = tr/td[1]">
            <xsl:value-of select="normalize-space(tr/td[4])"/>
            <xsl:text>: </xsl:text>
            <xsl:value-of select="normalize-space(tr/td[2])"/>
            <xsl:text>
</xsl:text>
          </xsl:if>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
Let's take it step by step. It starts like this:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output method="text" encoding="utf-8" />
<xsl:template match="/">
Nothing special here, apart from the text output method, so that xsltproc outputs plain text (and not another XML file...).
Now the fun begins. We look for a table whose summary attribute is 'Summary of table content'. Inside that table, we select the thead and tbody elements that have no class attribute, which excludes the 'tophead' header that we are not interested in.
<xsl:for-each select="//table[@summary='Summary of table content']/*[self::thead|self::tbody][not(@class)]">
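To check what this expression selects, one option is to run it against a cut-down table. The sketch below is only an illustration: select_demo and the sample markup are made up (the real page is much larger), and it assumes xsltproc is installed. The stylesheet just prints the name of each selected element.

```shell
# Hypothetical demo: list the elements matched by the XPath expression.
select_demo() {
    dir=$(mktemp -d) || return 1

    # Cut-down stand-in for the tracking table.
    cat > "$dir/sample.html" <<'EOF'
<table border="0" summary="Summary of table content">
<thead class="tophead"><tr><th>Package summary</th></tr></thead>
<thead><tr><th>Thursday, September 19, 2013</th></tr></thead>
<tbody><tr><td>18</td><td>With delivery courier</td></tr></tbody>
</table>
EOF

    # Same select expression as in the article; the body only prints name(.).
    cat > "$dir/select.xslt" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output method="text" encoding="utf-8" />
<xsl:template match="/">
<xsl:for-each select="//table[@summary='Summary of table content']/*[self::thead|self::tbody][not(@class)]">
<xsl:value-of select="name(.)"/>
<xsl:text>
</xsl:text>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>
EOF

    xsltproc --html "$dir/select.xslt" "$dir/sample.html"
    status=$?
    rm -rf "$dir"
    return "$status"
}
```

Calling select_demo should list one thead and one tbody; the thead with class "tophead" is filtered out by [not(@class)].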
Now, thead (containing only the date of the following events) and tbody (containing events) need to be parsed differently. This is done with xsl:choose:
<xsl:choose>
<xsl:when test="name(.) = 'thead'">
...
</xsl:when>
<xsl:otherwise>
...
</xsl:otherwise>
</xsl:choose>
For thead, we just want to show the date, which is the first th inside a tr (tr/th[1]). Then we print a new line with xsl:text.
<xsl:when test="name(.) = 'thead'">
<xsl:value-of select="tr/th[1]"/>
<xsl:text>
</xsl:text>
</xsl:when>
For tbody, it is slightly more complicated. First, we check that the first column is indeed a number (this removes the last row in the table, which is another type of summary). This is done with a "trick": floor(tr/td[1]) = tr/td[1] is true only when the cell holds a whole number, because non-numeric text converts to NaN, and NaN is never equal to anything. Then we print the time (4th column), followed by a colon, and the event description (2nd column).
<xsl:otherwise>
<xsl:if test="floor(tr/td[1]) = tr/td[1]">
<xsl:value-of select="normalize-space(tr/td[4])"/>
<xsl:text>: </xsl:text>
<xsl:value-of select="normalize-space(tr/td[2])"/>
<xsl:text>
</xsl:text>
</xsl:if>
</xsl:otherwise>
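If you want to convince yourself that the floor comparison really filters out non-numeric cells, here is a small sketch. The function floor_trick_demo and the sample rows are made up, it assumes xsltproc is installed, and it reads plain XML (no --html) to keep things short. It prints only the cells that pass the test.

```shell
# Hypothetical demo of the floor(x) = x test on a few sample cells.
floor_trick_demo() {
    dir=$(mktemp -d) || return 1

    # One whole number, one decimal, one text cell, one empty cell.
    cat > "$dir/rows.xml" <<'EOF'
<rows>
<tr><td>18</td></tr>
<tr><td>1.5</td></tr>
<tr><td>Summary</td></tr>
<tr><td></td></tr>
</rows>
EOF

    # Print the first cell of each row, but only when it is a whole number.
    cat > "$dir/floor.xslt" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output method="text" encoding="utf-8" />
<xsl:template match="/">
<xsl:for-each select="//tr">
<xsl:if test="floor(td[1]) = td[1]">
<xsl:value-of select="normalize-space(td[1])"/>
<xsl:text>
</xsl:text>
</xsl:if>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>
EOF

    xsltproc "$dir/floor.xslt" "$dir/rows.xml"
    status=$?
    rm -rf "$dir"
    return "$status"
}
```

Only the row containing 18 should come out: 1.5 fails because floor(1.5) is 1, and the text and empty cells fail because they convert to NaN.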
That's it! Then you can put everything in a shell script; see the complete code on GitHub for details.
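For reference, a wrapper could look roughly like the sketch below. This is not the script from the repository, just a guess at the general shape: the function name dhl_track is made up, and it assumes curl and xsltproc are installed, that dhl.xslt is in the current directory, and that the tracking URL quoted earlier still works. It uses mktemp rather than a fixed /tmp/dhl.tmp.

```shell
#!/bin/sh
# Hypothetical wrapper: fetch the tracking page for one AWB and
# run it through dhl.xslt (expected in the current directory).
dhl_track() {
    if [ "$#" -ne 1 ]; then
        echo "usage: dhl <AWB>" >&2
        return 2
    fi

    page=$(mktemp) || return 1
    url="http://www.dhl-usa.com/content/us/en/express/tracking.shtml?brand=DHL&AWB=$1"

    # Download quietly (-s), fail on HTTP errors (-f), then transform.
    # The HTML parser can be noisy on real-world pages, so its warnings are dropped.
    curl -fs -o "$page" "$url" && xsltproc --html dhl.xslt "$page" 2>/dev/null
    status=$?

    rm -f "$page"
    return "$status"
}
```

To use it as a script, you would add a final line calling dhl_track "$@" and make the file executable.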