BirkAndMe: Armory grinding [Part 1]: Getting started

First off, this post is mainly about programming; its only connection to WoW is that it shows how to get data off the armory.

There are 2 ways (actually there are several, but 2 major ways - actually maybe 3 major ways, but the 3rd one being less ... legal) of getting data from the WoW Armory using PHP.

1. Web scraping: here we simply get the whole HTML page, start scraping the tags, and pull out the information we want.
2. Simply getting the XML sheet and using the already structured data.

I'll be using the 2nd method, because it's obviously the fastest: the actual number of bytes sent is far lower, and the script performs better because there's no need for a big script scraping the data.
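To illustrate why structured XML is easier to work with than scraped HTML, here is a minimal sketch using PHP's SimpleXML extension. The XML fragment, its element and attribute names, and the values are made up for the example; they are not the armory's actual format.

```php
<?php
// Hypothetical XML fragment; the real armory sheet is larger and may use
// different element and attribute names.
$xml = '<page><character name="Example" level="80" class="Priest"/></page>';

// Parse the string into an object tree.
$page = simplexml_load_string($xml);

// Attributes are read with array syntax and cast to the type we need.
$name  = (string) $page->character['name'];
$level = (int) $page->character['level'];

echo $name . ' is level ' . $level . "\n";
```

With scraping, the same two values would have to be dug out of the page markup with string functions or regular expressions.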

Getting the data
There are several methods of getting the actual data. I've decided to write a (very) simple HTTP GET function, and then use the more common PHP functions.
I've looked around the net at how other people are fetching the data. The most popular way is using cURL (and honestly this is the way I would use if I were ever to make something serious); some also spoof the User-Agent (explained later) by setting it in the php.ini. Unfortunately not everyone has access to these options (I for one don't have access to either of these 2 methods on my cheap cheap web host).
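For comparison, the two approaches mentioned above could look roughly like this. It is only a sketch: fetchWithCurl is a hypothetical helper name, the cURL extension has to be enabled on the host, and the timeout value is an arbitrary choice.

```php
<?php
// Sketch of the cURL approach; needs the cURL extension to be enabled.
// fetchWithCurl is a hypothetical helper name.
function fetchWithCurl($url, $userAgent = 'Firefox/3.0.5') {
  $ch = curl_init($url);
  curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // return the body as a string
  curl_setopt($ch, CURLOPT_USERAGENT, $userAgent); // spoof the User-Agent
  curl_setopt($ch, CURLOPT_TIMEOUT, 10);           // give up after 10 seconds
  return curl_exec($ch);                           // false on failure
}

// The php.ini alternative: the user_agent directive is used by PHP's own
// HTTP wrapper (file_get_contents and friends) and can also be set at runtime.
ini_set('user_agent', 'Firefox/3.0.5');
```

cURL also decodes chunked responses on its own, so none of the chunk handling further down would be needed with it.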

To my surprise the armory is sending chunked data (Transfer-Encoding: chunked), so my (very) simple HTTP GET just got upgraded to a plain simple function :).
To open the actual connection I'm using fsockopen; after this I'll use the standard file functions available to PHP.
I will not be handling errors in the code examples, since it serves no purpose for what I want to show in these posts. Meaning I won't check if I got the connection, I'll just assume it worked. This is of course very very bad coding practice, but like I said it's not what I'll be going over in these posts.
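For anyone who does want the check, a minimal sketch of it could look like this. openConnection is a hypothetical helper name, and throwing an exception is just one way of reporting the failure.

```php
<?php
// Hypothetical helper: opens the socket and fails loudly instead of
// assuming the connection worked.
function openConnection($host, $port = 80, $timeout = 10) {
  // @ silences the PHP warning; the failure is reported through the
  // return value and the $errNo / $errStr output arguments instead.
  $handle = @fsockopen($host, $port, $errNo, $errStr, $timeout);
  if ($handle === false) {
    throw new RuntimeException("Could not connect to $host:$port ($errNo: $errStr)");
  }
  return $handle;
}
```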

Right, enough chit chat, let me get started.
First off we'll have to set up the GET request. We need to spoof our User-Agent to get the XML: if the armory server doesn't recognize the user-agent as a browser that supports the styling of their XML, it'll send it as HTML (and we would have to scrape the page). So we'll make the armory think we're using Firefox, giving us a request that needs to follow this setup:

GET [ABSOLUTE_PATH]?[QUERY] HTTP/1.1
Host: [ARMORY_URL]
User-Agent: Firefox/3.0.5

We could also use an actual user-agent string such as Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.9.0.5) Gecko/2008120122 Firefox/3.0.5 (.NET CLR 3.5.30729) but my simplified version will give the same result.
The rest of the code is pretty self-explanatory, except for how I get the chunks, which I'll explain below the code:

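function httpGetRequest($host, $absPath, $query, $port = 80) {
  $newLine = "\r\n";
  // Build the request; the empty line (double CRLF) ends the headers.
  $request = 'GET /'. $absPath .'?'. $query .' HTTP/1.1'. $newLine;
  $request .= 'Host: '. $host . $newLine;
  $request .= 'User-Agent: Firefox/3.0.5'. $newLine . $newLine;

  // No error handling: we assume the connection succeeds.
  $handle = fsockopen($host, $port, $errNo, $errStr, 10);
  fwrite($handle, $request);

  $response = "";
  $nextChunk = 1;
  while ($nextChunk > 0 && !feof($handle)) {
     // A bare CRLF line ends the headers and also follows every chunk.
     if (fgets($handle) == $newLine) {
        // The next line holds the size of the coming chunk, in hex.
        // trim() strips the trailing CRLF before the conversion.
        $nextChunk = hexdec(trim(fgets($handle)));
        if ($nextChunk > 0) {
           // Read exactly that many bytes of chunk data.
           for ($i = 0; $i < $nextChunk; $i++) {
              $response .= fgetc($handle);
           }
        }
     }
  }
  fclose($handle);
  return $response;
}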

(oh, if this is totally freakish to some of you, go read up on the HTTP protocol; it's a pretty simple protocol and very easy to understand)
Now, I read the chunked data by looking for a "\r\n" line followed by a number (in hex). As an added bonus this gets me through the headers and on to the body, since the headers end with an empty line. Whenever we encounter a "\r\n" line, the number that follows is the number of bytes in the next chunk of data; a size of 0 means there are no more chunks. So basically just read the next X bytes and ask for the number of bytes in the next chunk.
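To make the chunk format a bit more concrete, here is a small sketch that decodes a chunked body that is already sitting in a string, so it can be tried without a network connection. decodeChunked is a hypothetical helper, not part of the function above; it assumes well-formed input and ignores trailers.

```php
<?php
// Hypothetical helper: decodes a chunked body that is already in memory.
// Assumes well-formed input and ignores trailers.
function decodeChunked($body) {
  $decoded = '';
  $pos = 0;
  while ($pos < strlen($body)) {
    // The size line runs up to the next CRLF.
    $lineEnd = strpos($body, "\r\n", $pos);
    if ($lineEnd === false) {
      break; // malformed or truncated input
    }
    // The size is hexadecimal; drop any chunk extension after a ";".
    $sizeLine = substr($body, $pos, $lineEnd - $pos);
    $sizeText = explode(';', $sizeLine, 2)[0];
    $size = hexdec(trim($sizeText));
    if ($size === 0) {
      break; // a zero-sized chunk marks the end of the body
    }
    $decoded .= substr($body, $lineEnd + 2, $size);
    // Skip the chunk data and the CRLF that follows it.
    $pos = $lineEnd + 2 + $size + 2;
  }
  return $decoded;
}

// "5" and "6" are the chunk sizes, "0" ends the body.
$raw = "5\r\nHello\r\n6\r\n World\r\n0\r\n\r\n";
echo decodeChunked($raw); // prints "Hello World"
```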
Then to use httpGetRequest I'll simply call it as follows, with $url holding the armory host name (this will give my character XML sheet):

var_dump(httpGetRequest($url, 'character-sheet.xml', 'r=Trollbane&n=Apollonaris'));

I'm not going to post the XML here, because it is actually rather big and would do no good, since you can just copy-paste the above code and give it a try.

Anyway, hope this helps someone. I know I had some trouble getting started with this, because all the examples were using cURL. Having been busy busy at work lately I haven't gotten around to fooling around with this, but when a project at work needed a custom HTTP GET function it was easy to port :)
(hmm, I must try to make this field a bit wider, so my code actually is readable without scrolling)

Tuesday, March 03, 2009
