Monday, June 10, 2013

Using Metadata to Find Paul Revere

Absolute brilliance from Kieran Healy.

London, 1772.

I have been asked by my superiors to give a brief demonstration of the surprising effectiveness of even the simplest techniques of the new-fangled Social Networke Analysis in the pursuit of those who would seek to undermine the liberty enjoyed by His Majesty’s subjects. This is in connection with the discussion of the role of “metadata” in certain recent events and the assurances of various respectable parties that the government was merely “sifting through this so-called metadata” and that the “information acquired does not include the content of any communications”. I will show how we can use this “metadata” to find key persons involved in terrorist groups operating within the Colonies at the present time. I shall also endeavour to show how these methods work in what might be called a relational manner.


The analysis in this report is based on information gathered by our field agent Mr David Hackett Fischer and published in an Appendix to his lengthy report to the government. As you may be aware, Mr Fischer is an expert and respected field Agent with a broad and deep knowledge of the colonies. I, on the other hand, have made my way from Ireland with just a little quantitative training—I placed several hundred rungs below the Senior Wrangler during my time at Cambridge—and I am presently employed as a junior analytical scribe at ye olde National Security Administration. Sorry, I mean the Royal Security Administration. And I should emphasize again that I know nothing of current affairs in the colonies. However, our current Eighteenth Century beta of PRISM has been used to collect and analyze information on more than two hundred and sixty persons (of varying degrees of suspicion) belonging variously to seven different organizations in the Boston area.

Rest assured that we only collected metadata on these people, and no actual conversations were recorded or meetings transcribed. All I know is whether someone was a member of an organization or not. Surely this is but a small encroachment on the freedom of the Crown’s subjects. I have been asked, on the basis of this poor information, to present some names for our field agents in the Colonies to work with. It seems an unlikely task...

If you want to follow along yourself, there is a secret repository containing the data and the appropriate commands for your portable analytical engine... Here is what the data look like.


...I cannot show you the whole Person by Person matrix, because I would have to kill you. I jest, I jest! It is just because it is rather large. But here is a little snippet of it. At this point in the eighteenth century, a 254x254 matrix is what we call “Bigge Data”. I have an upcoming EDWARDx talk about it. You should come. Anyway:
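The snippet itself is not reproduced in this excerpt. For readers wondering how a Person by Person matrix can come out of nothing but membership lists, here is a minimal Python sketch of one standard way to do it: build a binary person-by-organization matrix and multiply it by its own transpose. The names and clubs below are invented purely for illustration; they are not Mr Fischer's data, and the function name `co_membership` is my own.

```python
# Hypothetical toy data: invented names and clubs, NOT the real membership lists.
memberships = {
    "Alice": {"ClubA", "ClubB"},
    "Bob": {"ClubB"},
    "Carol": {"ClubB", "ClubC"},
    "Dave": {"ClubC"},
}


def co_membership(memberships):
    """Return (people, matrix) where matrix[i][j] counts shared organizations."""
    people = sorted(memberships)
    orgs = sorted(set().union(*memberships.values()))
    # Binary person-by-organization matrix A: 1 if the person belongs, else 0.
    a = [[1 if org in memberships[p] else 0 for org in orgs] for p in people]
    # Person-by-person matrix A * A^T: dot product of every pair of rows.
    matrix = [[sum(x * y for x, y in zip(row_i, row_j)) for row_j in a]
              for row_i in a]
    return people, matrix


people, matrix = co_membership(memberships)
for name, row in zip(people, matrix):
    print(f"{name:6}", row)
```

Each off-diagonal cell counts the organizations two people have in common, and the diagonal counts each person's own memberships. No conversation is needed to fill in a single cell.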


...The analytical engine has arranged everyone neatly, picking out clusters of individuals and also showing both peripheral individuals and—more intriguingly—people who seem to bridge various groups in ways that might perhaps be relevant to national security...


...Look at that person right in the middle there. Zoom in if you wish. He seems to bridge several groups in an unusual (though perhaps not unique) way. His name is Paul Revere.


Once again, I remind you that I know nothing of Mr Revere, or his conversations, or his habits or beliefs, his writings (if he has any) or his personal life. All I know is this bit of metadata, based on membership in some organizations. And yet my analytical engine, on the basis of absolutely the most elementary of operations in Social Networke Analysis, seems to have picked him out of our 254 names as being of unusual interest...
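The excerpt does not say which measure picks out the person "in the middle". One common way to score that kind of bridging is betweenness centrality, which counts how many shortest paths between other people pass through a given person. The sketch below is my own assumption about how one might compute it, using Brandes' algorithm on a made-up two-club network. The names, clubs, and helper functions are hypothetical and have nothing to do with the real data.

```python
from collections import deque
from itertools import combinations

# Hypothetical toy data: two invented clubs that share exactly one member.
clubs = {
    "ClubA": ["Ann", "Ben", "Cat"],
    "ClubB": ["Cat", "Dan", "Eve"],
}


def build_graph(clubs):
    """Link every pair of people who share at least one club."""
    adj = {}
    for members in clubs.values():
        for person in members:
            adj.setdefault(person, set())
        for p, q in combinations(members, 2):
            adj[p].add(q)
            adj[q].add(p)
    return adj


def betweenness(adj):
    """Unnormalized betweenness for an unweighted, undirected graph (Brandes)."""
    score = {v: 0.0 for v in adj}
    for source in adj:
        order = []                       # nodes in order of increasing distance
        preds = {v: [] for v in adj}     # predecessors on shortest paths
        paths = {v: 0 for v in adj}      # number of shortest paths from source
        dist = {v: -1 for v in adj}
        paths[source], dist[source] = 1, 0
        queue = deque([source])
        while queue:                     # breadth-first search from source
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    paths[w] += paths[v]
                    preds[w].append(v)
        dependency = {v: 0.0 for v in adj}
        for w in reversed(order):        # accumulate from the farthest nodes back
            for v in preds[w]:
                dependency[v] += paths[v] / paths[w] * (1 + dependency[w])
            if w != source:
                score[w] += dependency[w]
    # Every pair was counted from both endpoints, so halve the totals.
    return {v: s / 2 for v, s in score.items()}


scores = betweenness(build_graph(clubs))
most_central = max(scores, key=scores.get)
print(most_central, scores[most_central])
```

In this toy network the one person who belongs to both clubs sits on every shortest path between them, so the score singles that person out from membership lists alone.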

Read the whole thing.


Hat tips: V and Postcardiness's Blog.

3 comments:

Reliapundit said...

GRRRRRREAT!

I've been saying this for days:

THERE'S A SIMPLE REASON WHY THE GOVERNMENT HAS THE NSA COLLECT DATA ON EVERY SINGLE AMERICAN:

AMERICA IS CURRENTLY TOO DANG POLITICALLY CORRECT TO PROFILE AND TARGET THE REAL ENEMY, THE ISLAMISTS.

WE'D HAVE A LOT MORE FOCUSED AND EFFECTIVE COUNTER-ATTACK ON THE ENEMY IF INSTEAD OF HAVING THE NSA LISTEN TO EVERY DANG PHONE CALL AND READ EVERY DANG EMAIL AND ANALYZE EVERY DANG GOOGLE WORD-SEARCH, THEY WENT AFTER ISLAMISTS.

THE NSA DATA-MINING ALL THE CRAP THEY'RE DATA-MINING IS LIKE THE TSA FEELING-UP AND PATTING DOWN EVERY DANG PERSON WHO BOARDS A PLANE.

Data-mining everything from everyone wastes resources and time.

K-Bob said...

This exercise would have been far closer to the truth if it turned out that the full employment of the model missed the very existence of Paul Revere entirely.