<?xml version="1.0" encoding="UTF-8"?>
<feed xml:lang="en-US" xmlns="http://www.w3.org/2005/Atom">
  <title>Nimble Code: Tag advise</title>
  <subtitle type="html">Jacob Harris' Weblog</subtitle>
  <id>tag:www.nimblecode.com,2005:Typo</id>
  <generator version="4.0" uri="http://typo.leetsoft.com">Typo</generator>
  <link href="http://www.nimblecode.com/xml/atom10/tag/advise/feed.xml" rel="self" type="application/xml+atom"/>
  <link href="http://www.nimblecode.com/articles/tag?tag=advise" rel="alternate" type="text/html"/>
  <updated>2008-11-19T19:57:03-08:00</updated>
  <entry>
    <author>
      <name>Jacob Harris</name>
      <email>harrisj@nimblecode.com</email>
    </author>
    <id>urn:uuid:be46661e-f58c-44ec-908a-f91f7384c61f</id>
    <published>2006-03-01T15:21:00-08:00</published>
    <updated>2008-11-19T19:57:03-08:00</updated>
    <title>The Myth of Total Information Awareness</title>
    <link href="http://www.nimblecode.com/articles/2006/03/01/the-myth-of-total-information-awareness" rel="alternate" type="text/html"/>
    <category term="miscellaneous" scheme="http://www.nimblecode.com/articles/category/advise" label="Miscellaneous"/>
    <category term="politics" scheme="http://www.nimblecode.com/articles/tag"/>
    <category term="surveillance" scheme="http://www.nimblecode.com/articles/tag"/>
    <category term="tia" scheme="http://www.nimblecode.com/articles/tag"/>
    <category term="topsail" scheme="http://www.nimblecode.com/articles/tag"/>
    <category term="advise" scheme="http://www.nimblecode.com/articles/tag"/>
    <category term="nsa" scheme="http://www.nimblecode.com/articles/tag"/>
    <category term="spooks" scheme="http://www.nimblecode.com/articles/tag"/>
    <content type="html">&lt;p&gt;Lost over the last few weeks in the incessant coverage of Dick Cheney&amp;#8217;s decision to &lt;a href="http://www.theonion.com/content/node/45572"&gt;hunt the ultimate prey&lt;/a&gt; were some interesting revelations about the &lt;span class="caps"&gt;NSA&lt;/span&gt;&amp;#8217;s new mechanisms for spying on electronic communications. The Christian Science Monitor broke this story first, reporting on &lt;a href="http://www.csmonitor.com/2006/0209/p01s02-uspo.html?s=hns"&gt;a system named &lt;span class="caps"&gt;ADVISE&lt;/span&gt; that would spider blogs, wishlists and other relics of online presence to build up dossiers on people&lt;/a&gt;. Some people thought that &lt;span class="caps"&gt;ADVISE&lt;/span&gt; was &lt;a href="http://battellemedia.com/archives/002342.php"&gt;simply the rejected Total Information Awareness (TIA) program in new clothing&lt;/a&gt;, but Newsweek in a &lt;a href="http://www.msnbc.msn.com/id/11238800/site/newsweek/print/1/displaymode/1098/"&gt;truly excellent work of reporting&lt;/a&gt; broke the news that &lt;a href="http://thinkprogress.org/2006/02/11/tia-lives/"&gt;the core of &lt;span class="caps"&gt;TIA&lt;/span&gt; was actually renamed to a project called Topsail&lt;/a&gt; and &lt;span class="caps"&gt;ADVISE&lt;/span&gt; was something else. &lt;span class="caps"&gt;TIA&lt;/span&gt;&amp;#8217;s core motivation was to simply scan communications for &amp;#8220;suspicious activities&amp;#8221; and then notify human analysts of potential problems. &lt;span class="caps"&gt;ADVISE&lt;/span&gt; seems to have a much grander scope of building up dossiers of people&amp;#8217;s interests and intents to identify &amp;#8220;suspicious people&amp;#8221; instead. Yikes.&lt;/p&gt;


	&lt;p&gt;In some sense, this is nothing new. Way back in 1993, the &lt;span class="caps"&gt;NSA&lt;/span&gt; made waves by trying to push through a mandatory encryption standard called &lt;a href="http://www.epic.org/crypto/clipper/"&gt;the Clipper chip&lt;/a&gt; that would enable the government to decrypt any encrypted communications, and the &lt;a href="http://fly.hiwaay.net/~pspoole/echres.html"&gt;Echelon&lt;/a&gt; project has been steadily accumulating intercepted electronic communications under the &lt;span class="caps"&gt;NSA&lt;/span&gt;&amp;#8217;s purview. But the &lt;span class="caps"&gt;NSA&lt;/span&gt; has always had issues analyzing the volume of messages they grab, and very, very little of the data they retrieve ever makes its way in front of an analyst. Making light of this, some Emacs developer even added a feature, &lt;strong&gt;M-x spook&lt;/strong&gt;, that would spit out a series of suspicious words suitable for activists to add to their email and overwhelm the already overstrained capabilities of the government, like so:&lt;/p&gt;


	&lt;p&gt;&lt;em&gt;top secret &lt;span class="caps"&gt;SAFE&lt;/span&gt; terrorist &lt;span class="caps"&gt;ANZUS &lt;/span&gt;New World Order enforcers radar &lt;span class="caps"&gt;TELINT &lt;/span&gt;Serbian advisors &lt;span class="caps"&gt;FIPS140 INSCOM&lt;/span&gt; government &lt;span class="caps"&gt;CNCIS&lt;/span&gt; secure&lt;/em&gt;&lt;/p&gt;


	&lt;p&gt;But here we are again, with the government claiming a need to spy on us and the media leading a fight against it. If it continues as it has, though, those of us fighting this are destined to lose. The problem is that most of the ensuing discussion of the government&amp;#8217;s data mining operations has been like that of the wiretapping scandal: criticism is focused on the political and ethical problems of the systems and &lt;a href="http://mediamatters.org/items/200512240002"&gt;lies are exposed&lt;/a&gt;, but the underlying technical problems are glossed over by tech-averse journalists. Indeed, most discussions on the legality of wiretapping implicitly assume that the technology is completely effective but should be avoided strictly out of moral concerns. To see what I mean, recall how the debate over torture that started last year played out in newspapers and TV shows. The &amp;#8220;anti-torture&amp;#8221; side would start the argument by positing the moral horror of torture. The &amp;#8220;pro-torture&amp;#8221; side would almost always then counter with a hypothetical situation of capturing a mastermind who we are unquestionably certain knows about a master plot to destroy an American city and from whom torture is the &lt;em&gt;only&lt;/em&gt; way to get such information. The torture advocate thus sidesteps the moral horrors of the situation by claiming there is no viable alternative. Of course, &lt;a href="http://www.washingtonpost.com/wp-dyn/articles/A2302-2005Jan11.html"&gt;such situations never happen and torture rarely ever yields true information&lt;/a&gt;, but that&amp;#8217;s beside the point. The argument is thus glibly reduced to &amp;#8220;idealists vs. pragmatists&amp;#8221;, and in these times the pragmatists always win the debate for public opinion.&lt;/p&gt;


	&lt;p&gt;This process will likely happen again with the debate over data mining. &lt;a href="http://www.msnbc.msn.com/id/11238800/site/newsweek/print/1/displaymode/1098/"&gt;That Newsweek article&lt;/a&gt; does an exemplary job of exploring the technical issues, but it&amp;#8217;s an exception to the rule. The pragmatists will advance the argument that nobody really likes spying on Americans, but it&amp;#8217;s the only way to catch the bad guys. And this is what makes me really upset. It&amp;#8217;s bad enough that data mining is likely illegal and invasive, but even more galling that the system most likely will never work in the first place.&lt;/p&gt;


	&lt;p&gt;&lt;em&gt;eavesdropping emc &lt;span class="caps"&gt;ARPA HAMASMOIS &lt;/span&gt;Aldergrove &lt;span class="caps"&gt;AGT&lt;/span&gt;. AMME Freeh White House jihad csystems &lt;span class="caps"&gt;MIT&lt;/span&gt;-LL 22nd &lt;span class="caps"&gt;SAS NWO&lt;/span&gt; pink noise mania&lt;/em&gt;&lt;/p&gt;


	&lt;p&gt;So, what are the technical problems that make such a system infeasible? For starters, this isn&amp;#8217;t actually data mining. I used to work at a data mining software developer (&lt;a href="http://www.dimins.com/"&gt;Dimensional Insight&lt;/a&gt;) and the goal of those products was to organize complicated data in easily traversable ways so analysts could drill down for connections. In a typical case, you might want to look over your sales data for the last year to see how products sold in particular parts of the country, which sales divisions did best, or similar queries. The process is human-driven and its sole purpose is to represent complex multi-dimensional data (i.e., price, product, salesperson, city, region, state, time, correlation to other product sales, etc.) in an easily viewable and usable manner. In addition, data mining involves looking backwards from the present to gain insight into past purchasing patterns to drive future sales (the classic success story is &lt;a href="http://www.dmreview.com/article_sub.cfm?articleId=1006133"&gt;a supermarket finding a correlation between diaper and beer sales&lt;/a&gt;).&lt;/p&gt;
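
	&lt;p&gt;To make the drill-down idea concrete, here is a minimal Python sketch of that kind of human-driven roll-up. The sales records, field names, and the roll_up helper are all invented for illustration; this is not how Dimensional Insight or any other product actually works, just the general shape of the query an analyst would run.&lt;/p&gt;

&lt;pre&gt;
```python
from collections import defaultdict

# Invented sales records for illustration: (region, state, product, amount).
SALES = [
    ("Northeast", "MA", "diapers", 120.0),
    ("Northeast", "MA", "beer", 90.0),
    ("Northeast", "NY", "diapers", 200.0),
    ("South", "TX", "beer", 310.0),
    ("South", "TX", "diapers", 150.0),
    ("South", "FL", "beer", 75.0),
]
FIELDS = ("region", "state", "product", "amount")


def roll_up(records, dimension, **filters):
    """Total the sales amount per value of one dimension, after filtering."""
    totals = defaultdict(float)
    for record in records:
        row = dict(zip(FIELDS, record))
        # Keep only the rows that match every filter the analyst chose.
        if all(row[key] == value for key, value in filters.items()):
            totals[row[dimension]] += row["amount"]
    return dict(totals)


if __name__ == "__main__":
    print(roll_up(SALES, "region"))                  # top-level view
    print(roll_up(SALES, "state", region="South"))   # drill into one region
    print(roll_up(SALES, "product", state="TX"))     # drill into one state
```
&lt;/pre&gt;

	&lt;p&gt;Each call is a question a person decided to ask about data that already exists. Nothing in it predicts anything.&lt;/p&gt;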


	&lt;p&gt;Instead, the &lt;span class="caps"&gt;NSA&lt;/span&gt; largely seems to be interested in predicting novel future behavior and retrieving warnings when suspicious activities occur. This is actually more in line with Artificial Intelligence research on classifiers. Essentially what the &lt;span class="caps"&gt;NSA&lt;/span&gt; seems to be striving for is some sort of theoretical Threat Box which can be fed a steady stream of events and spit out a warning for human analysts to follow up in certain cases. Whether the backend classifier is a neural net, support vector machine, or other sort of technology, the process of training classifiers usually includes the same parts. First, a &lt;strong&gt;training set&lt;/strong&gt; needs to be assembled out of a mixture of scored positive and negative events. So, if you were creating a &amp;#8220;terrorist email classifier&amp;#8221;, the positive events might be emails from Osama bin Laden, and the negative events would be emails from your Aunt Sue (there are usually many more negative than positive selections). When the training is done, a similarly pre-classified &lt;strong&gt;testing set&lt;/strong&gt; is used to evaluate how good the classifier actually is. The goal of a well-trained classifier is to make the correct correlations between certain inputs (so that the presence of &amp;#8220;bomb&amp;#8221; and &amp;#8220;white&amp;#8221; and &amp;#8220;house&amp;#8221; triggers an alarm), but a constant risk with such systems is the danger of erroneously assuming certain correlations are meaningful (e.g., all of Mohammed Atta&amp;#8217;s emails included the word &amp;#8220;the&amp;#8221;, so the system concludes that &amp;#8220;the&amp;#8221; indicates the email is dangerous). To minimize such problems, both the training and test sets usually go through a process of &lt;strong&gt;feature selection&lt;/strong&gt;, where meaningless information is filtered out so it doesn&amp;#8217;t affect the classifier.
If this sounds like more of an art than a science, it is, and there are several ways in which errors can manifest:&lt;/p&gt;


	&lt;ol&gt;
	&lt;li&gt;Human Bias &amp;#8211; people are needed to select and classify the training and test sets and to perform feature selection. This can create biases in the system that reflect the assumptions of the creators. As Malcolm Gladwell&amp;#8217;s &lt;a href="http://www.newyorker.com/fact/content/articles/060206fa_fact"&gt;excellent article on profiling&lt;/a&gt; explored, such biases create systems that solve the wrong problems and leave vulnerabilities that intelligent attackers can exploit.&lt;/li&gt;
		&lt;li&gt;Too Little Human Bias &amp;#8211; on the opposite end of the spectrum, it&amp;#8217;s possible to have too much faith in the effectiveness of a classifier. The difficulty here is that the judgement of the computer will be accepted as absolute. One problem is that it generally is impossible to extract any clear explanation of the classifier&amp;#8217;s reasoning (any explanation is like teasing out thought at the neuron level, making it too low-level to be sensible). Furthermore, even if such explanations were available, the experience with the &lt;span class="caps"&gt;TSA&lt;/span&gt;&amp;#8217;s No Fly Lists suggests they would not be made available to agents acting on the classifier output. At best, this would only mean that the &lt;span class="caps"&gt;NSA&lt;/span&gt; is inundated with erroneous data. At worst, it could lead to extensive spying, internment, and misguided strategic directions. Best not to contemplate this further.&lt;/li&gt;
		&lt;li&gt;False Positives and Negatives &amp;#8211; any classification system usually produces some mix of false positives and false negatives. In this system, the political pressures seem to mandate that the false negatives (i.e., missed threats) should be 0 if at all possible. Unfortunately, driving down the false negatives generally drives up the false positives, meaning that more alarms will be erroneously triggered. For the dangers this presents, look at item #2.&lt;/li&gt;
		&lt;li&gt;The Wrong Tool For the Job &amp;#8211; Even if the classifier were able to achieve a remarkable level of correctness and accuracy, it&amp;#8217;s still possible it would be the wrong tool for the job. As one blog has observed, &lt;a href="http://alexandertheaverage.blogspot.com/2006/02/nsa-scandal-got-you-worried-then-i.html"&gt;Osama bin Laden probably isn’t clicking around on Amazon&lt;/a&gt;, meaning that these tools for signal intelligence won&amp;#8217;t be very useful if the enemy is not creating any signals for them to detect. How likely are terrorist cells to use email when they can FedEx or hand-deliver documents anywhere around the globe in a few days? How hard is it to exploit steganography or misdirection to thwart any tracking systems? In this case, this elaborate &lt;span class="caps"&gt;NSA&lt;/span&gt; system just becomes another example of America&amp;#8217;s heavy reliance on technology (smart bombs, sigint, spy satellites) over the crude and dirty human-based methods of gathering information and waging war. How many gaps in our knowledge will this system ever fill? Or are there other ways to more effectively gather information for the cost of building this surveillance infrastructure?&lt;/li&gt;
	&lt;/ol&gt;
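
	&lt;p&gt;The trade-off between false negatives and false positives is easy to see on a toy example. The Python sketch below scores messages by summing weights for flagged words and evaluates the result against a small pre-classified test set at several alarm thresholds. Everything in it is invented for illustration (the messages, the labels, the keyword weights, the thresholds); it is a deliberately crude stand-in for a classifier, not a description of any real system.&lt;/p&gt;

&lt;pre&gt;
```python
# Toy illustration only: the messages, labels, and keyword weights below are
# invented for this sketch and do not describe any real system.
SUSPICIOUS_WEIGHTS = {"bomb": 2.0, "white": 0.5, "house": 0.5, "jihad": 2.0}

# Pre-classified test set: (text, is_threat). Negatives outnumber positives.
TEST_SET = [
    ("meet at the white house to plant the bomb", True),
    ("jihad plans attached", True),
    ("aunt sue painted her house white", False),
    ("that movie was a bomb", False),
    ("see you at the open house on sunday", False),
    ("white sale at the mall", False),
    ("happy birthday from aunt sue", False),
    ("the band was the bomb at the white house party", False),
]


def score(text):
    """Sum the weights of the flagged words present in the message."""
    words = set(text.split())
    return sum(w for word, w in SUSPICIOUS_WEIGHTS.items() if word in words)


def evaluate(threshold):
    """Return (false_negatives, false_positives) on the test set."""
    false_negatives = 0
    false_positives = 0
    for text, is_threat in TEST_SET:
        flagged = score(text) >= threshold
        if is_threat and not flagged:
            false_negatives += 1   # a real threat slipped through
        elif flagged and not is_threat:
            false_positives += 1   # an innocent message raised an alarm
    return false_negatives, false_positives


if __name__ == "__main__":
    # Lowering the threshold catches more threats but raises more false alarms.
    for threshold in (3.5, 3.0, 2.0, 0.5):
        fn, fp = evaluate(threshold)
        print(f"threshold={threshold}: missed threats={fn}, false alarms={fp}")
```
&lt;/pre&gt;

	&lt;p&gt;On this made-up data, the strictest threshold misses both threats and raises no alarms, while the loosest misses nothing but flags five of the six innocent messages, Aunt Sue included. Scale the innocent traffic up by a few orders of magnitude and the analysts drown.&lt;/p&gt;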


	&lt;p&gt;&lt;em&gt;freedom Kennedy chameleon man mindwar &lt;span class="caps"&gt;BROMURE &lt;/span&gt;Echelon &lt;span class="caps"&gt;TELINT &lt;/span&gt;Armani Marxist Bletchley Park &lt;span class="caps"&gt;FIPS140&lt;/span&gt; nuclear supercomputer mania &lt;span class="caps"&gt;USDOJ&lt;/span&gt;&lt;/em&gt;&lt;/p&gt;


	&lt;p&gt;Of course, the &lt;span class="caps"&gt;NSA&lt;/span&gt; will claim their systems are effective and already performing vital tasks in the war on terror. Indeed, the article about &lt;span class="caps"&gt;ADVISE&lt;/span&gt; reports &amp;#8220;the system &amp;#8211; parts of which are operational, parts of which are still under development &amp;#8211; is already credited with helping to foil some plots.&amp;#8221; But what is the nature of these plots? Even if I give the &lt;span class="caps"&gt;NSA&lt;/span&gt; credit and assume they aren&amp;#8217;t being duplicitous about current problems in the hopes they can fix them later (a condition known as &lt;em&gt;hope creep&lt;/em&gt;), I wonder if the same circular logic presented in the case of Guantanamo Bay detainees will also be applied here: &lt;a href="http://news.bbc.co.uk/1/hi/world/middle_east/4708946.stm"&gt;Guantanamo Bay is only for Al Qaeda terrorists, therefore everybody interned there must be a dangerous terrorist&lt;/a&gt;. Again, what are these thwarted attacks? Are they real, orchestrated and viable, or just some bored teens talking smack on MySpace? Such information will never be made public because the &lt;span class="caps"&gt;NSA&lt;/span&gt; needs foiled plots to justify the system, and thus any vaguely plausible detected messages will become foiled terrorist plots. Meanwhile we lurch closer to national insolvency, confident in our abilities to detect the next 9/11 if only those darn terrorists would play by our expected rules. And we lay the groundwork of a national surveillance state, just ready to be exploited by some avaricious future leaders.&lt;/p&gt;


	&lt;p&gt;So, what is to be done? I wish there were something. If the stakes were not so high, I&amp;#8217;d suggest that it might be worthwhile to update M-x spook with a whole new lexicon to sabotage the surveillance mechanisms now before more money is thrown into the hole. But these are no times for pranksters, and the only real solution is for Congress to exercise the oversight they have. They killed &lt;span class="caps"&gt;TIA&lt;/span&gt; once already and they could do it again. The oversight is theirs; if only they had the courage and the wisdom to use it.&lt;/p&gt;</content>
  </entry>
</feed>
