Wrapping Manure

Our company has a common time registration system. Every month we are mandated to fill out our time sheets, using its Netscape-era UI.

It’s named Kaba and it looks roughly like this:

Filling out your time sheet is not the most popular of chores, especially since you are just mindlessly entering information that is already available elsewhere, such as your primary project code and any vacation days. This information is already maintained in a separate proprietary (and rather nice) application named the Absence Manager.

If you prefer an architectural drawing with impeccably aligned boxes, here you go:

[Figure: The time registration system and the absence manager system. Caption: A tale of two web apps]

I should mention that tax deductions are involved at the corporate level. In other words, this is not a process that is likely to go away anytime soon.

It’s not the biggest of deals but still … could there be an easier way?

Trying Selenium

Our first attempt at automating the process involved Selenium, a technology that we had just started to explore at that point. After some work we got it to run. After half a year, for some reason, bugs started creeping in, some of which were really odd. We tried to debug them but hit a dead end; it seemed like a bug in the web driver or maybe even in Chrome. The HTML and JavaScript that it was dealing with were pretty weird, so maybe we had stumbled upon some corner case. After a year or so we decided to abandon the Selenium strategy. This decision was also guided by the experience that we had gathered using Selenium in other projects. To us, at least, it seemed surprisingly brittle, perhaps because every browser update or web driver update was a potential source of jitter.

Either way, Selenium was off the table.

Wrapping the UI

While the Selenium-based solution was slowly derailing itself, a new idea had formed. If we could avoid the browser and instead communicate directly over HTTP, maybe there would be fewer moving parts? It would require us to reverse engineer how to get information in and out (i.e. how the UI communicates with the server). But Chrome makes it easy to snoop on web traffic. Also, there were really just three requests that we needed to reverse engineer:

  • Logging in to the time registration web UI with a given user
  • Getting the timesheet data for a given week for the logged in user
  • Posting timesheet data for a given week for the logged in user

In terms of architecture, I decided to create a simple, NodeJS-based server which would be able to communicate with the time registration system and which would in turn expose its own REST API. I picked NodeJS to get my feet wet with the technology and to see if it would be an easy and painless way of creating a server (it most certainly was!).

The basic idea was to make the NodeJS server an adapter around the time registration system as well as around the absence manager. For instance, it would expose a REST method to obtain the user list (from the absence manager) and another REST method to obtain the time sheet data for a given week (from the time registration system). The REST API should be convenient and should not reveal the origin of the data.
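To make the adapter idea a bit more concrete, here is a minimal sketch of a route table sitting in front of two backends. Everything named here is an assumption made for illustration: the `absenceManager` and `timeRegistration` client objects, their `listUsers` and `getWeek` methods, and the URL layout are hypothetical, not the actual code.

```javascript
// Sketch of the adapter idea: one route table in front of two backends.
// `absenceManager` and `timeRegistration` are hypothetical client objects,
// and the URL layout is an assumption as well.
function createAdapter({ absenceManager, timeRegistration }) {
  const routes = [
    {
      method: "GET",
      pattern: /^\/users$/,
      handler: () => absenceManager.listUsers(),
    },
    {
      method: "GET",
      pattern: /^\/users\/([^/]+)\/timesheets\/(\d{4})\/(\d{1,2})$/,
      handler: (user, year, week) =>
        timeRegistration.getWeek(decodeURIComponent(user), Number(year), Number(week)),
    },
  ];

  // Resolves to { status, body }. The caller never learns which backend answered.
  return async function handle(method, path) {
    for (const route of routes) {
      const match = method === route.method && route.pattern.exec(path);
      if (match) {
        return { status: 200, body: await route.handler(...match.slice(1)) };
      }
    }
    return { status: 404, body: { error: "Not found" } };
  };
}
```

A function like `handle` could be called from the request callback of an HTTP server, which then only has to serialize `body` as JSON and set the status code.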

After a bit of thinking I decided to also make it the responsibility of the NodeJS server to transparently log in. The login mechanism of the time registration system is a pretty standard deal in which it sets a session cookie on the response and uses that for subsequent request validation. To store sessions I used a plain JavaScript object mapping from user name to cookie. As far as the time registration system was concerned, it could only assume that a bunch of people had decided to log in simultaneously, forming a bizarre kind of time registration flash mob.
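The session handling described above can be sketched roughly as follows. The `login` function is a hypothetical placeholder for the real login request; it is assumed to resolve to the session cookie string. Caching the promise rather than the cookie is a choice made in this sketch so that two simultaneous requests for the same user trigger only one login.

```javascript
// Sketch of a per-user session cache. `login` is a hypothetical function
// that performs the real login request and resolves to the session cookie.
function createSessionStore(login) {
  // Plain object mapping user name -> promise of a session cookie.
  // Object.create(null) avoids clashes with inherited keys.
  const cookiesByUser = Object.create(null);

  return {
    // Returns the cached cookie, logging in first if there is none yet.
    getCookie(userName, password) {
      if (!(userName in cookiesByUser)) {
        cookiesByUser[userName] = login(userName, password).catch((err) => {
          delete cookiesByUser[userName]; // do not cache failed logins
          throw err;
        });
      }
      return cookiesByUser[userName];
    },

    // Forget a cookie, e.g. when the remote system reports an expired session.
    invalidate(userName) {
      delete cookiesByUser[userName];
    },
  };
}
```

A request helper would then call `getCookie` before each call to the time registration system and send the result in the `Cookie` header.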

Implementing the method for getting time sheet data involved HTML scraping (no AJAX was used here). As it turned out, the HTML returned by the time sheet application included a large inlined data structure (a variable declaration within a script tag) and a bunch of JavaScript to render it to the DOM. Thus, it was refreshingly easy to get my hands on the data and to transform it into something useful.
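As an illustration of this kind of scraping, the sketch below pulls an inlined object literal out of a page. It rests on two assumptions that may not hold for the real application: that the variable is declared as `var <name> = { ... }`, and that the literal happens to be valid JSON. The variable name `timesheetData` used in the example call is made up.

```javascript
// Sketch: extract `var <variableName> = { ... }` from an HTML page.
// Assumes the object literal is valid JSON, which the real page may not be.
function extractInlineData(html, variableName) {
  // Escape regex metacharacters in the name (identifiers may contain `$`).
  const escaped = variableName.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  const match = new RegExp("var\\s+" + escaped + "\\s*=\\s*").exec(html);
  if (!match) return null;

  const start = match.index + match[0].length;
  if (html[start] !== "{") return null;

  // Walk forward to the matching closing brace, skipping string contents.
  let depth = 0;
  let inString = false;
  for (let i = start; i < html.length; i++) {
    const ch = html[i];
    if (inString) {
      if (ch === "\\") i++; // skip the escaped character
      else if (ch === '"') inString = false;
    } else if (ch === '"') {
      inString = true;
    } else if (ch === "{") {
      depth++;
    } else if (ch === "}") {
      depth--;
      if (depth === 0) return JSON.parse(html.slice(start, i + 1));
    }
  }
  return null; // unbalanced braces
}
```

For example, `extractInlineData(html, "timesheetData")` would return the parsed object, or `null` if no such declaration is found. If the literal is not strict JSON (unquoted keys, single quotes), a more forgiving parser would be needed.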

Posting data, on the other hand, was a pain, and I almost gave up on reverse engineering it since nothing seemed to work. As a last resort I challenged a very smart colleague of mine to crack it, offering a desirable bounty composed of beer and bragging rights. Within days he had not only managed to reverse engineer the protocol, he had also written the first take on an AngularJS-based UI on top of the NodeJS server. These were valuable contributions.

All of this brought us pretty much to the place where we wanted to be. A place where we didn’t have to interact with the hideous UX and UI presented by the time sheet application. Where we could write a modern UI to present the data and where we could write business rules on the server to make our lives easier. It looked like this:

[Figure: A new shiny UI]

The remaining part consisted of writing the business logic for autofilling timesheets based on the data that the system could pull from various places (users, project codes, vacation days, and public holidays). Also, I took the opportunity to play around with fun stuff like AngularJS animations, CSS, and the like, although none of this was really required.
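The autofill rules themselves are not spelled out here, so the following is only a sketch of what such logic could look like. The entry shape (`date`, `type`, `code`, `hours`), the 7.5 hour default workday, and the decision to register zero hours on public holidays are all assumptions made for illustration.

```javascript
// Returns the ISO date (YYYY-MM-DD) that lies `days` after `isoDate`.
function addDays(isoDate, days) {
  const date = new Date(isoDate + "T00:00:00Z");
  date.setUTCDate(date.getUTCDate() + days);
  return date.toISOString().slice(0, 10);
}

// Sketch: build default entries for one work week (Monday to Friday).
// Dates are ISO strings; all field names and defaults are hypothetical.
function autofillWeek(monday, { projectCode, vacationDays = [], publicHolidays = [], hoursPerDay = 7.5 }) {
  const vacation = new Set(vacationDays);
  const holidays = new Set(publicHolidays);
  const entries = [];

  for (let offset = 0; offset < 5; offset++) {
    const date = addDays(monday, offset);
    if (holidays.has(date)) {
      // Public holidays win over vacation and project work.
      entries.push({ date, type: "holiday", hours: 0 });
    } else if (vacation.has(date)) {
      entries.push({ date, type: "vacation", hours: hoursPerDay });
    } else {
      entries.push({ date, type: "project", code: projectCode, hours: hoursPerDay });
    }
  }
  return entries;
}
```

Keeping this as a pure function of already-fetched data makes the rules easy to test without touching either backend.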

The performance also deserves a mention. Once a user has logged in (which is sluggish in itself), manually filling out a timesheet for one week takes at least 30 seconds. As a side note, the time sheet data has historically been wildly inaccurate, since a lot of people didn’t really know what to fill in or didn’t bother to check the list of holidays.
The Selenium implementation took around 90 seconds to log in a user and fill out a week; this added up to about an hour for our department, as the users were processed in sequence.
The NodeJS-based implementation, running users in parallel and steering clear of the sluggish UI, filled out the weekly timesheet for all users in about one minute!
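At 90 seconds per user, an hour of sequential processing corresponds to roughly 40 users (3600 / 90 = 40). Running the users in parallel means the total time is closer to that of the slowest single user. A minimal sketch of that pattern is shown below; `fillOutWeek` is a hypothetical function that logs in one user and posts their week, returning a promise.

```javascript
// Sketch: fill out the week for all users in parallel.
// `fillOutWeek(user)` is a hypothetical function returning a promise.
// One user's failure is recorded instead of aborting the whole run.
async function fillOutAllUsers(users, fillOutWeek) {
  return Promise.all(
    users.map((user) =>
      Promise.resolve()
        .then(() => fillOutWeek(user)) // also catches synchronous throws
        .then(
          () => ({ user, ok: true }),
          (error) => ({ user, ok: false, error: String(error) })
        )
    )
  );
}
```

The results come back in the same order as `users`, so a caller can report exactly which timesheets still need attention. Whether the remote system tolerates this many simultaneous sessions is something to check before relying on it.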

Conclusions

  • Small side projects are a great way to explore new technologies. It’s much like training, only without pricey consultants and oxygen-depleted conference rooms. Also, it’s highly motivating.
  • NodeJS is both fun and productive for creating all kinds of stuff, including web servers and REST APIs. I don’t miss the verbose Java way where you would define value classes at both the back and the front of your system and then create yet more classes with long sequences of dest.setFoo(src.getFoo()) type statements. In fact I seem to be developing an allergy to just that kind of thing these days.
  • Selenium is ingenious but it’s also brittle. So use it as a last resort, whether you are writing automation scripts or automatic tests.
  • If you don’t like the UI of a system (or a public web site for that matter), and if you are able to work out how it communicates with its backend, then you can create a better UI. It’s not hard at all and you don’t have to ask permission from anyone.