Parsing a website used to be a rather straightforward affair: you download the webpages and then you poke through the HTML with regular expressions until you find what you're looking for. You could get a pretty good data extractor going in Perl rather quickly. Later we replaced the regular expressions with proper HTML parsers, but on the whole the process remained rather simple.

These days, however, it's rarely so simple. AJAX, which is essentially the practice of having the webpage communicate with the server through JavaScript, has become so common that we don't even call it AJAX anymore. Now we just call it 'The way things are done.'((Calling a website 'dynamic' is another alternative.)) This is nice in a way because it allows web applications to be much more responsive and interactive. A modern website feels more like a client program than a downloaded document. On the other hand, this makes it *much* harder to parse webpages:summary_end:.

When you download a fully dynamic website, what you typically get is just a contentless HTML shell with a large JavaScript application built in. What you don't get is the actual content you were looking for. You need to execute the JavaScript code in order to load the actual data, and that's just not going to happen with a Perl script.((Sometimes websites still offer a static version for those without JavaScript, but that is becoming less and less common.))

Generally, with a well-made website of this sort, there is a well-defined, usually RESTful interface between the client webpage and the server backend. Typically JSON is used as the interchange format, and this interface can be made into a public API. A reasonably complete public API will generally obviate the need to parse and scrape a website, but not every dynamic website will have a publicly available API.
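
When such a JSON interface does exist and you are allowed to use it, talking to it directly might look roughly like the sketch below. The endpoint URL, the `articles` key, and the `title`/`author` fields are all made up for illustration; the real ones would have to be read off the site's documentation or the browser's network tab. It also assumes a runtime with a global `fetch` (a modern browser or Node 18+).

```javascript
// Hypothetical endpoint; substitute whatever the site actually exposes.
var API_URL = 'https://example.com/api/articles?page=1';

// Reduce the decoded JSON payload to just the fields we care about.
// The payload shape ({ articles: [...] }) is an assumption.
function summarize(payload) {
  return payload.articles.map(function (a) {
    return { title: a.title, author: a.author };
  });
}

// Fetch the JSON directly, skipping the HTML shell entirely.
function fetchArticles(url) {
  return fetch(url, { headers: { Accept: 'application/json' } })
    .then(function (res) {
      if (!res.ok) throw new Error('HTTP ' + res.status);
      return res.json();
    })
    .then(summarize);
}
```

Calling `fetchArticles(API_URL).then(console.log)` would then print the reduced list, with no HTML parsing involved at all.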

So on the occasion that one needs to parse a dynamic, 'AJAXy' website, what is one to do? At one point I thought of this as a hard problem, but over the years it has become much easier. What one needs to do is simply write the parser in JavaScript within a browser that has loaded the webpage. It's as simple as scripting the webpage from the development console found in most web browsers. Doing this lets the website's own JavaScript run and load the desired content into the page. Further, one can trigger click events and form submissions if interaction with the webpage is needed before it produces the right content. Load something like jQuery into the browser instance and it becomes very easy to pick out the values that you need.
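
As a rough sketch of what that console scripting can look like, the two helpers below use only the standard DOM calls `querySelector` and `querySelectorAll`, so they work without jQuery. The helper names and the selectors in the usage note are hypothetical and would need to match the page being scraped.

```javascript
// Collect the trimmed text of every element matching a CSS selector.
// `root` is normally the global `document` of the loaded page.
function collectText(root, selector) {
  var nodes = root.querySelectorAll(selector);
  return Array.prototype.map.call(nodes, function (el) {
    return el.textContent.trim();
  });
}

// Click a control (for example a "load more" button) if it exists.
// Returns true when something was clicked, false otherwise.
function clickIfPresent(root, selector) {
  var el = root.querySelector(selector);
  if (el) el.click();
  return Boolean(el);
}
```

Pasted into the console, one might run `clickIfPresent(document, '#load-more')`, wait for the new content to appear, and then call `collectText(document, '.result .title')` to pull out the values.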

Better yet, instead of using a normal web browser like Firefox or Chrome, which makes it relatively difficult to run a parser automatically and fetch content periodically, use [PhantomJS](http://phantomjs.org/), a headless, scriptable web browser that can be made to download a website and run JavaScript in it automatically, according to a script written ahead of time. You can improve the experience by using [pjscrape](http://nrabinowitz.github.io/pjscrape/), which wraps PhantomJS and makes the process of parsing dynamic webpages almost as simple as that of parsing static ones.
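
A minimal PhantomJS script along these lines might look like the sketch below. The URL, the `h2` selector, and the fixed two-second wait are placeholders, not recommendations; a real script would wait for whatever signals that the page has finished loading its data. The `typeof phantom` guard is only there so the file does nothing when loaded outside PhantomJS.

```javascript
// Runs inside the loaded page. PhantomJS serializes this function and
// executes it in the page context, so it must not use outside variables.
function extractHeadings() {
  var nodes = document.querySelectorAll('h2'); // hypothetical selector
  var out = [];
  for (var i = 0; i < nodes.length; i++) {
    out.push(nodes[i].textContent.trim());
  }
  return out;
}

if (typeof phantom !== 'undefined') {
  var page = require('webpage').create();
  page.open('http://example.com/', function (status) { // placeholder URL
    if (status !== 'success') {
      console.log('Failed to load page');
      phantom.exit(1);
      return;
    }
    // Give the page's own JavaScript a moment to fetch and render content.
    setTimeout(function () {
      var headings = page.evaluate(extractHeadings);
      console.log(JSON.stringify(headings));
      phantom.exit(0);
    }, 2000);
  });
}
```

Saved as, say, `scrape.js`, this would be run with `phantomjs scrape.js`, and could be scheduled from cron to fetch content periodically.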