Wednesday, October 14, 2009

What is jsHub?

Some time ago I blogged about a new open-source project I'm involved in called jsHub. Since then there's been a little bit of confusion about what jsHub is all about.

Hopefully, I can clear this up in this blog post with an example.



The Problem

The home page of World Wrestling Entertainment has a total of 11 pieces of JavaScript for tracking and ad-serving. If you have Ghostery installed in your browser it will tell you that that page contains the following:



Using our own internal tool we see that page contains DoubleClick, Google Analytics (which they include three times), LeadBack, Microsoft Atlas, Omniture, OpenX, Quantcast, Quigo AdSonar, Revenue Science, Tacoda and comScore Beacon.

The problems with having so many different pieces of tracking JavaScript are many:

1. They add to the page weight. In the case of WWE the HTML of the page is 54687 bytes (the total non-graphic content downloaded is 433211 bytes).

The JavaScript for tracking and ad-serving totals 125454 bytes, i.e. 29% of the non-graphic content of that page is JavaScript code used to track usage and serve ads.
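As a quick check of that percentage, here is the arithmetic using the two byte counts quoted above (the variable names are only for illustration):

```javascript
// Byte counts quoted above for the WWE home page.
const totalNonGraphicBytes = 433211;
const trackingScriptBytes = 125454;

// Share of the non-graphic download taken up by tracking and ad-serving code.
const share = trackingScriptBytes / totalNonGraphicBytes;
const sharePercent = Math.round(share * 100);

console.log(sharePercent + "% of the non-graphic content is tracking JavaScript");
```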

2. They create a risk of data integrity problems.

A typical problem occurs when one piece of JavaScript works and sends tracking information back and another doesn't. This creates a discrepancy between products, which becomes a problem when trying to reconcile page counts between, say, an analytics product and an advertising system.

This is not a theoretical problem. It occurs easily: a user may hit stop while the page is still loading, so a piece of JavaScript near the top of the page may have executed while a piece near the bottom has not.

Indeed, the page code for the WWE site contains the comment: <!-- Add Google anlytics after omniture --> indicating how important placement of JavaScript code is.

3. They add unnecessary processing time.

Just take a look at this shot of downloading all the JavaScript for the WWE page. This was taken using Firebug and shows how wasteful all that extra code is in terms of download time, and execution time.



4. They are next to impossible to check for security problems.

The only option the web master of WWE.com has is to run all the JavaScript he receives from third parties through something like Google Caja to ensure that it's safe, or to insist that it is ADsafe.

Here, for example, is a section of code from Omniture's tracker used on the WWE page:



But the web master actually doesn't have that luxury because typically the JavaScript is being loaded remotely from the analytics vendor's or ad-server's web site and the end web master has no control at all over what's being loaded. Just look at what happened to the New York Times when a malicious ad turned into malware.

In the case of WWE there are 24 includes of JavaScript code from web sites that WWE do not control. And because of the browser security model all these pieces of code are getting equal access to the page.

5. End-users have no way to understand what they are doing.

Although programs like Ghostery are excellent they can't tell you what's actually happening inside that JavaScript. For example, there's no easy way for an end-user to determine what information is being gathered, or where it's being sent.

There is a tool called WASP but it's aimed at people debugging web site tagging problems, not at the privacy-aware consumer. Here's what WASP says about Tacoda on the WWE web site:



6. They represent duplicated effort as vendors are forced to write and maintain their own JavaScript code.

For example, all those tags have to find a way to send data back to their respective servers meaning there's duplicated code that has to be tested on a wide range of browsers to ensure that it all works.
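To give a sense of the kind of code that every vendor ends up rewriting, here is a minimal sketch of a helper that encodes tracking data into a request URL. The function name, the endpoint and the field names are all hypothetical; this is not taken from any vendor's tag.

```javascript
// Hypothetical shared helper: encode a flat object of tracking data as a
// query string and append it to a vendor's collection URL.
function buildBeaconUrl(endpoint, data) {
  const pairs = Object.keys(data).map(function (key) {
    return encodeURIComponent(key) + "=" + encodeURIComponent(String(data[key]));
  });
  const separator = endpoint.indexOf("?") === -1 ? "?" : "&";
  return endpoint + separator + pairs.join("&");
}

// In a browser the URL would typically be requested by assigning it to the
// src of an Image object; here we only build and print it.
const beaconUrl = buildBeaconUrl("https://stats.example.com/collect", {
  page: "/home",
  title: "Home page"
});
console.log(beaconUrl);
```

Even something this small has to be tested against every browser a site's visitors use, and today each vendor does that work separately.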

7. Their inner workings are often obscure.

See above!

Enter jsHub

jsHub is designed to eliminate these problems. It's a single piece of JavaScript (a "tag") that can read different sorts of page information and then send it to many different vendors' products. One piece of code to send to Google Analytics, Omniture SiteCatalyst, WebTrends and Mixpanel.

Instead of one piece of JavaScript per vendor, jsHub has a single piece of code (the "hub") and plugins that know how to translate into the required wire protocol for each vendor. Vendors only maintain the plugin for their product.
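The hub-and-plugin split can be sketched roughly as follows. This is only an illustration of the idea, not the actual jsHub API: `createHub`, `register`, `trigger` and both plugins are hypothetical names, and the payload formats are made up rather than real vendor wire protocols.

```javascript
// A minimal hub: it receives each page event once and hands it to every
// registered plugin. All names here are hypothetical, not the jsHub API.
function createHub() {
  const plugins = [];
  return {
    register: function (plugin) {
      plugins.push(plugin);
    },
    // Dispatch one event to all plugins and return what each vendor would be sent.
    trigger: function (event) {
      return plugins.map(function (plugin) {
        return { vendor: plugin.vendor, payload: plugin.translate(event) };
      });
    }
  };
}

// Each plugin only knows how to translate a hub event into its vendor's format.
// The formats below are invented for illustration.
const vendorA = {
  vendor: "vendor-a",
  translate: function (event) {
    return "pageName=" + encodeURIComponent(event.name);
  }
};
const vendorB = {
  vendor: "vendor-b",
  translate: function (event) {
    return JSON.stringify({ type: event.type, page: event.name });
  }
};

const hub = createHub();
hub.register(vendorA);
hub.register(vendorB);

const sent = hub.trigger({ type: "page-view", name: "Home page" });
console.log(sent);
```

Because every plugin is fed from the same `trigger` call, either all vendors see a page view or none do.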

With one piece of code the page weight is lower, there's no danger of one product getting a page view and another not, and processing time is reduced.

Then to make the entire thing debuggable and easy for an end-user to understand, there's the tag inspector. It's a user interface that talks to the jsHub tag and interrogates its operation. That way a user can see what's being gathered on a page, and who is receiving it.
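One way to picture how an inspector could interrogate a tag is for the hub to keep a record of everything it hands to each vendor. The sketch below assumes exactly that; `createInspectableHub`, `send`, `getLog` and `summarize` are hypothetical names and do not describe how the real tag inspector is implemented.

```javascript
// Hypothetical hub that remembers every piece of data it hands to a vendor.
function createInspectableHub() {
  const log = [];
  return {
    send: function (vendor, data) {
      log.push({ vendor: vendor, data: data });
      // A real tag would also transmit the data to the vendor here.
    },
    // Return a copy of the log array so callers cannot add or remove entries.
    getLog: function () {
      return log.slice();
    }
  };
}

// Inspector view: for each vendor, list the field names it has received.
function summarize(hub) {
  const byVendor = {};
  hub.getLog().forEach(function (entry) {
    const fields = byVendor[entry.vendor] || (byVendor[entry.vendor] = []);
    Object.keys(entry.data).forEach(function (field) {
      if (fields.indexOf(field) === -1) {
        fields.push(field);
      }
    });
  });
  return byVendor;
}

const inspectedHub = createInspectableHub();
inspectedHub.send("vendor-a", { page: "/home", referrer: "https://example.com/" });
inspectedHub.send("vendor-b", { page: "/home" });
inspectedHub.send("vendor-a", { page: "/news" });

const summary = summarize(inspectedHub);
console.log(summary);
```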



Since the entire project is open-source it's possible to inspect the code to ensure that it is well written and secure. And it's licensed under a BSD license so that it's open and includeable everywhere.

To further ensure that the code is of high-quality (and can handle all the different types of browsers that it might be executed in), there's a complete test suite and cross-browser testing system.

To make it clear exactly what data is being gathered, we are also proposing (public domain) standards for marking up page metadata using microformats. Our proposed standard is called hPage.

We're just getting started with jsHub. It's running on a small number of sites and we're working to build vendor interest. We strongly believe that a shared, open-source tag is the best solution for the entire web world.

If you want to get involved, contact the team.


3 Comments:

Blogger Aaron said...

This is a good initiative. I like it. Solves several problems.
Good luck on getting the vendors to work with you!

- Aaron

2:30 PM  
Blogger Duane Johnson said...

Many of the problems you cited were also mentioned at the AJAX Experience (conference in Boston). Particularly the aspect of one script interfering with another script. In a worst-case scenario, it's quite possible that one vendor's javascript will raise an error and bring down the rest of the javascript on the page. Other problem cases included the situation last month when the NY Times had an advertiser that hijacked the page and sent users to an anti-virus installation page (which was a trojan, of course).

Anyway, to summarize my thoughts: your initiative looks like a great start to the puzzle of figuring out how to make all of this stuff work together! I look forward to solving more of the underlying problems as well.

6:09 PM  
Blogger Alex said...

Great idea.

12:42 PM  
