Anonymity: Good and Bad

Reading Jen’s blog entry about anonymity really makes one think about all of the problems that stem from the difficulty of identifying someone on the internet.  A backdoor to a computer here, a remote login from a server there, and a dedicated person seeking to hide their true identity becomes very hard for anyone to track down.  You are already somewhat anonymous by merit of the internet’s structure, and it is quite easy to become even more so.

In some cases, this leads to the sort of unfortunate harassment seen in the case of Nikki Catsouras; this is a perverse misuse of anonymity.  Sadly, it is by no means a unique event.  Internet anonymity fairly recently gave us a social tragedy in the form of JuicyCampus, a site for cruel gossip.  People use the Internet for crime: for fraud, for distributing child pornography.  They can use it for baseless slander and defamation, for threats, and even for finding victims, as with the Craigslist killer.

Can we legislate in any way that makes it easier for law enforcement to track people down?  I think so.  We could at least make it more difficult to hide.  For example, if we are dealing with e-mail or something posted on a message board or blog, the IP address from which the connection was made is likely logged.  Of course, that need not be linked to the actual user if they use an anonymizing server.  However, we could require the operators of such servers to log the connections made through them, holding them liable if they did not.  We could then begin to unravel the mystery of identity when necessary.
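
To make the proposal concrete, here is a minimal sketch (with an invented filename and function name) of the logging obligation such a law might impose on an anonymizing relay: record who connected to what, and when, so the chain could later be unwound under proper legal process.

```python
import csv
import time

# Hypothetical sketch of the logging requirement proposed above: an
# anonymizing relay appends (timestamp, client IP, destination) for
# each connection it forwards.  The log filename is invented.
def log_relay_connection(client_ip, destination, logfile="relay_log.csv"):
    """Append one connection record to the relay's log."""
    with open(logfile, "a", newline="") as f:
        csv.writer(f).writerow([time.time(), client_ip, destination])
```

Investigators with a warrant could then join such logs against a message board’s own IP logs to follow a connection back through the relay.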

Nonetheless, I do not really want that sort of answer to this problem.  Yes, relative anonymity on the internet leads to some awful things, but it is not without redeeming qualities.  Most importantly, it allows people to speak out in legitimate ways.  That ability is something our country has valued dating all the way back to the days leading up to the American Revolution.  The famous early example of using anonymity in criticizing government is Paine’s Common Sense.  Today, people living under oppressive regimes around the world can use internet anonymity to speak out without revealing their identity.  This anonymity is very important when such a revelation could lead to the government punishing them for their views.  Similarly, people in the United States could speak out against the practices of their company or a client of their company without concern for losing their livelihood.  In these cases, speaking out may facilitate some greater good.

In general, anonymity simply provides a layer of protection for someone who wants to express an unpopular view.  I believe these reasons are amongst the strongest motivations for preserving anonymity, but I think there are other legitimate and inoffensive reasons for it as well.  For example, one might wish to have a hobby or carry out discussions that, while perfectly legal and safe, might be embarrassing in their social and professional circles.  If they are a well-known person or a member of a well-known group, they may seek feedback untainted by the many preconceived ideas people have of their work.

I believe each of these situations is a legitimate case for preserving internet anonymity.  It seems to me that most of our current options for easing the identification of lawbreakers make legitimate uses of anonymity more difficult.  Perhaps someone out there has a better solution, but if this is the compromise we have to make, it is no real solution.  The unfortunate consequences, while tragic, are not sufficient cause for trying to rid the internet of anonymity, though it really does make you wish people could just be more responsible.

Rights to Virtual Life and Liberty

Out of concerns for fairness and equal opportunity, I think it’s time to declare that people have a right to the Internet.  More specifically, people should have both affordable Internet options for their home and free, public Internet access through a library.  The latter option is to ensure that people will still be able to function in society as more and more activities go completely online.  Additionally, as some of my classmates have discussed previously in this blog, the people need to have the basic education to handle the technology.  By recognizing a right to understand and utilize the Internet, society can progress at a faster rate, becoming more efficient and making more activities possible.

The Internet offers many benefits, allowing people to start and manage businesses, access an unparalleled amount of information, participate in politics and government, and connect with other people.  These benefits can only be completely realized if people have certain liberties when it comes to the Internet.  The Association for Progressive Communications details some Internet freedoms people should have, including rights to speech, information, collaboration, privacy, Internet governance, and knowledge of Internet rights.

While the United States has naturally cultivated an Internet that provides many of those aforementioned liberties, not every country is as lucky.  Both to defend the current freedom in our country and to provide an example abroad, the United States should explicitly codify digital rights in one document, even if already covered by other laws.  After all, one comprehensive document is easier to promulgate than a hodgepodge of laws.  Furthermore, the more liberty everyone enjoys in regards to the Internet, the better off everyone is, as restrictions on one set of people lower the quality of the whole network by limiting the number of people who can provide, interact with, and consume content.

More importantly, the United States needs universal Internet access (which it doesn’t have) to help pave the way for new technology to replace the old.  Estonia, which “passed a law declaring Internet access a fundamental human right of its citizenry,” has done very well with its Internet initiatives.  (Granted, Estonia covers an area about 200 times smaller and has a population about 250 times smaller than those of the United States.)  Instead of pouring so much energy into outdated technologies, the government should focus more on how to ensure that everyone has access to the present and future of communication and information, i.e., the Internet.  By outlining rights to the Internet, a technological future can be imagined and fulfilled.

In conclusion, to protect and further current liberties and to enable a better future, the basic rights people have in regards to the Internet should be clearly stated in a single document.

Grief from Griefers

We have talked about anonymity in class and how it doesn’t necessarily exist online.  Generally it’s true that you aren’t as anonymous as you might think, but anonymity is a hallmark of the Internet–people interact with others they’ll never meet in real life, hiding behind usernames.  But given enough time and effort, anonymity can be breached.  For instance, it is possible for an organization like the RIAA to “deanonymize” a filesharer through subpoenas, or for an investigative netizen to track down someone’s identity by using usernames repeated across websites to gradually accumulate details.  What I’m curious about isn’t so much strict anonymity (not knowing someone’s real-life identity), but its consequences.

I’m going to warn you in advance not to Google anything until you’ve read this entire section.  Her name is Nikki Catsouras, but perhaps you have heard of her as the “Porsche girl”–it is likely that you have if you frequent social media websites like Digg or Reddit, or shock sites like rotten.com.  “Porsche girl” refers to photos of the aftermath of a crash in which a young woman, speeding at nearly 100 mph on a California highway in a Porsche 911 Carrera, flipped her car over the median and hit a tollbooth.  The pictures are bloody and the girl is an unrecognizable mess.  The way the pictures got onto the Internet is blood-boiling: two California Highway Patrol officers leaked the photos (taken as part of routine fatal-accident response procedure) by emailing them to friends as a “cautionary tale.”  The family filed suit against the CHP, and a judge dismissed the case, writing, “No duty exists between the surviving family and defendant.”  Essentially, the dead don’t have a right to privacy.  The Catsouras family is appealing, and has a chance, since there is at least one precedent saying that the family of the deceased has a right to privacy.  The continuing tragedy lies in the behavior of individuals not related to the case.  Griefers took it upon themselves to harass the family, several emailing Nikki’s father pictures from the accident.  The family lives in fear of coming across the photos through innocent Google searches.

When one thinks of Griefers, one usually thinks of online games or “virtual world” communities–places Griefers are notorious for harassing.  Griefers thrive on the Internet because it is, for their purposes, anonymous.  They don’t know their victims on a personal level, and likely don’t care, but the Internet adds another degree of separation.  From the Newsweek article:

“It’s like having a mask,” says John Suler, a cyber-psychologist at Rider University. That mask can cause us to behave in ways we normally wouldn’t—fueled by a kind of mob mentality. “The people looking at these photos don’t have to face this family, and it disconnects them from the victims they’re hurting,” says Solove.

Anonymity has a hidden cost: people feel like they can be bigger jerks and the structure of the Internet allows them to get away with it.  Griefers on Second Life can have their accounts (or maybe even IP addresses) banned, but it’s easy to create a new account or use proxy services to get around the blocks.

Griefers will likely continue to be a problem, and one can hope that Nikki’s story never repeats itself, but the only place I can imagine stopping it is the moment the CHP officers emailed their friends.  Once something is on the Internet, it is there to stay, and the Griefers will play.  I have no solution for the Griefer problem–as long as there is a computer screen between a Griefer and a victim, Griefers will continue to work.  One answer is a consistent Internet identity carried across several sites.  This would require people to be accountable for their behavior over several communities.  A Griefer might be proud of his accomplishments on one message board, but may not be so proud on a job-hunting website.  But there are legitimate reasons a person might have multiple identities–for example, a college student separating his social life from his professional one by making his drunken Facebook exploits friends-only.  Facebook is interesting in that it has been cleaned up voluntarily by users as more employers check students’ Facebook accounts.  This shows that when people are held responsible for their behavior across different “communities” (social and professional), they change their behavior.  But this Facebook effect may not translate to the entire Internet, and it might not be something we want.

I don’t know if policy has a solution for them.  I have no idea how to get rid of Griefers.  They seem like a sad, inevitable part of the Internet, and they might be here to stay.

Search and the Swine Flu

This week, much to Jon Stewart’s dismay, instead of discussing Obama’s first 100 days, we spent much of our time talking about the swine flu.  From a public policy standpoint, it would appear that there are some lessons to be learned from how we use the Internet to search for information.  Additionally, there was some discussion last night about: 1) documentation of our cultural history (in this case, via a carefully worded court ruling) and 2) the differences between actively and passively searching for problems.

So what does search tell us about our past, present, and future?

The past.  It appears that when Google looked at their search data history, they realized that there were some signs of the swine flu’s emergence.  The problem wasn’t discovered earlier because these searches didn’t raise any red flags.  I wouldn’t say that Google’s Flu Trends team (who knew they had one?) necessarily dropped the ball on this, but it is a great example of the problem that emerged in the wiretapping discussion – a computer still needs to be told what to find before it can know to go look for it.  In the swine flu scenario, this can be corrected in the future by telling the computer to localize its search and alert someone when there are changes in flu trends that don’t jibe with historical trends.  Secondly, as the article mentions, you can’t actually predict that a disease will spread until at least one person has the disease.
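
The localized-alert idea can be sketched in a few lines: compare each region’s current volume of flu-related queries against that region’s own historical baseline and flag large deviations.  Everything here (region names, counts, and the threshold) is invented for illustration.

```python
# Hypothetical sketch: flag regions whose flu-related query volume
# deviates sharply from that region's historical weekly baseline.
from statistics import mean, stdev

def flag_anomalies(history, current, threshold=3.0):
    """history: {region: [weekly query counts]}; current: {region: count}.
    Flags regions whose current count exceeds mean + threshold * stdev."""
    flagged = []
    for region, counts in history.items():
        mu, sigma = mean(counts), stdev(counts)
        if sigma > 0 and (current.get(region, 0) - mu) / sigma > threshold:
            flagged.append(region)
    return flagged

history = {"veracruz": [10, 12, 11, 9, 13, 10], "oslo": [8, 7, 9, 8, 7, 9]}
current = {"veracruz": 85, "oslo": 8}
print(flag_anomalies(history, current))  # ['veracruz']
```

The point of the localization is visible in the data: 85 flu queries is an alarming number for one small region even though it would be invisible noise in a worldwide total.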

The present.  One of the strengths of the Internet (and of information technology in general) is its ability to spread information quickly.  The implications of viral information can be tremendous – think of what could have happened on September 11, 2001, if folks in the surrounding towers had gotten a text message telling them to exit the building after the first tower was hit.  Yet this also means that incorrect information can spread.  When people started to find out about the swine flu, Twitter apparently became ablaze with people looking for updates and spreading the word about the latest developments.  Bottom line – just because technology makes information easier to spread doesn’t automatically make public health concerns easier to manage.

The future.  So what are the implications for smart information technology policy?  How do we go about creating policy that will actually bridge the gap between what people should do and what they will do?  The biggest advantage of search is that it is a public demonstration of future intentions.  If it appears that a bunch of people in a small geographic area are searching the words “flu, cough, fever, sore throat” in the middle of the summer, it’s probably a sign that public health officials should take a closer look.  However, the trickiest part about tracking search inquiries is that once people start to find out about a health issue, they begin to turn to the Internet to learn more about it.  The challenge is figuring out how to separate people’s concerned inquiries from legitimate signs of a potential public health risk.

How does this link back to items #1 and #2 above?  There is now a very different way of recording pieces of our history – we have the ability to track the development and spread of a potential epidemic in ways that were impossible even 20 years ago.  (Can you imagine if there had been Internet and search technology back during the Black Plague?)  Secondly, the biggest task at hand is to figure out how to balance passive and active search when it comes to identifying potential public health risks – how do you know when the search data is telling you there’s a legitimate problem?  For now, we’ll have to keep ourselves occupied with the swine flu.  And Jon Stewart.

Surveillance of Digital Communications

“In executing the Office of the President, Barack Obama is the bomb.”

This is not necessarily my opinion, but I write it to demonstrate something (the sentence above is inspired by a reddit.com comment I saw, but I cannot find it now).  There is a good chance that a message containing some variation of that sentence – an email or other transmission over the packet-switched communications network we call the internet – could, after being filtered using something called deep packet inspection (where a message is dissected and parts of its contents examined by a computer), be flagged and sent to, say, the NSA for a real human being to have a second look.  Although this scenario may not be what would happen today, the technological capability is now a reality, and there is a good deal of debate about how to approach wiretapping and surveillance of digital communications and infrastructure.  Traditional law regarding surveillance of telephone and postal communications is not well-suited for application here.
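
The filtering step can be illustrated with a minimal sketch (the watch list and messages are invented), which also shows why the opening sentence is a trap: a literal pattern match cannot tell slang from a threat.

```python
import re

# Hypothetical sketch of the content-filtering step in deep packet
# inspection: scan a reconstructed payload for watch-list patterns
# and flag it for human review.  Patterns and payloads are invented.
WATCHLIST = [re.compile(r"\bbomb\b", re.IGNORECASE)]

def flag_for_review(payload):
    """Return True if any watch-list pattern matches the payload."""
    return any(p.search(payload) for p in WATCHLIST)

print(flag_for_review("Barack Obama is the bomb."))  # True
print(flag_for_review("See you at lunch."))          # False
```

The first result is exactly the false positive the opening sentence is designed to produce, which is why a human second look sits downstream of the filter.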

In our readings for this week, Orin S. Kerr provided a general framework of network surveillance law within a discussion of the USA Patriot Act; Charles Savage reported on wireless eavesdropping abroad; and K. A. Taipale argues that FISA is inadequate and that it should be permissible to programmatically select suspects.  Taipale writes that “FISA did not anticipate” the internet or “advanced technical methods for intelligence gathering.”  He rightly notes that applying FISA strictly now would mean that “no automated monitoring of any kind could occur” in the vast majority of cases.

Taipale suggests that the law be modified to permit applying computationally enabled analysis techniques to communications previously off-limits under the law.  He proposes content filtering and artificial intelligence techniques to “identify potential threats,” and traffic analysis to “identify organizations or groups and the key people in them.”

While these capabilities are attractive, their implementation would raise a host of issues regarding legal privacy protections (even where the Fourth Amendment protection against unreasonable search and seizure does not apply) and limited government.  Practically speaking, having a judge or congressional committee sign off on every “internet wiretap” affecting American citizens would be a tremendous challenge because of the technical savvy likely required to understand a (hypothetical) proposed set of criteria.  From a more general policy perspective, having the technical infrastructure and institutional procedures to touch, examine, and data-mine any and all digital communications made by any American is itself a huge concern; the potential for abuse, whether by rogue individuals, by small groups within an institution, or by the institutions themselves, is only exacerbated by giving an institution more capabilities, no matter the built-in protections.

Digital eavesdropping capability is an important tool for intelligence gathering and law enforcement, but should be implemented with caution and foresight.  As I see it now, a suitable capability would be, at least when an American is the subject, as specific as possible (e.g., target only one email address) and as simple as possible–a judge should be able to understand and approve or disapprove of its use on a case-by-case basis.

Internet Surveillance Oversight

This week our course readings deal with telecommunications, VoIP, and wiretapping, and they concentrate on government surveillance of citizens through various means.  I have some concerns when it comes to government (or private companies, for that matter) collecting a lot of information about citizens.  I do, however, agree with K. A. Taipale (author of one of the readings, “Whispering Wires and Warrantless Wiretaps: Data Mining and Foreign Intelligence Surveillance”) that someone must gather the intelligence.  I think it is important, then, that the gathering of intelligence be well controlled and that there be a way to track how the different agencies used the data in their investigations.

As most communication has moved to the internet, much of intelligence gathering is moving to analyzing the internet packets that carry the data, and the information contained in those packets.  For example, if an e-mail message was broken into multiple packets, the packets must be reassembled in order and their contents combined before the message can be reconstructed and finally analyzed.
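
That reassembly step can be sketched simply: order the captured fragments by sequence number and concatenate their payloads.  Real TCP reassembly must also handle retransmissions and overlapping segments; this is only the core idea, with invented data.

```python
# Hypothetical sketch: reconstruct a message that was split across
# packets by ordering the fragments by sequence number.
def reassemble(packets):
    """packets: list of (sequence_number, payload_bytes) in arrival order."""
    return b"".join(payload for _, payload in sorted(packets))

packets = [(2, b"ket-switched "), (1, b"Sent over a pac"), (3, b"network.")]
print(reassemble(packets).decode())  # Sent over a packet-switched network.
```

Only after this step does the surveillance system have a complete message whose contents can be filtered or analyzed at all.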

As the agencies sift through internet traffic, they necessarily collect a lot of information to be analyzed.  One of the main uses of surveillance is to find new threats in the data.  This is where I think it is important to ensure a clear separation of power between those who decide what methods can be used to look for threats and those who actually search the data.

The articles point out that the current system of issuing warrants is cumbersome and slow compared to how quickly things change on the internet.  This has led to agencies being given increased power to decide for themselves what is acceptable.  By allowing the agencies more control over defining what is suspicious and what is not, I feel that the checks and balances are lost.

Government agencies are necessarily a bit paranoid and overzealous when looking for threats.  I feel that if they are given too much power to decide for themselves what is suspicious, and they don’t find enough suspicious things, they will quite likely keep broadening the scope of their searches until they find enough things they don’t like.  I think this is especially likely when a lot of the information is collected and stored and can be searched and re-searched with different criteria with little oversight.

I think three things could be done to help create good oversight of internet surveillance efforts.  As the court system for wiretap warrants seems inadequate, a lightweight system could be created.  Perhaps a judge would still give out the warrants, but he or she could be dedicated to that job, with reduced paperwork overhead, so that he or she could meet with law enforcement agents, quickly review the documents they present, and issue or deny a warrant.  The judges could be rotated so they don’t have to do this busy work all the time, and also so there wouldn’t be a favorite, reliably permissive judge whom the agencies could go to specifically.

Also, I think there should be clear data-retention limits, just as with phone-line wiretapping, where evidence is collected only while the wiretap is in place.  For internet data, the agents would have a limited time in which to perform the desired search.  This would force the agencies to have a clear action plan and prevent them from poking around until they finally find something.  Also, since search time depends on dataset size, it would force them to narrow down the data they want to search.

Lastly, although I’m not sure how practically feasible this would be, the collected data could be stored by one agency responsible for storage, and other agencies would make specific requests to that agency for the subsets of the data they want to search.  This way, it would be easier to keep track of what data was analyzed, and when, for each investigation.

I think the above suggestions could create a more trustworthy and better system for internet surveillance.  The separation of power would ensure that the different agencies’ access to the data is not overused.  The data-retention limits and the storage of data by a separate agency would hopefully improve privacy by further controlling access to the data.  As Taipale pointed out, the surveillance is needed, so we should work on finding good ways of controlling how it is performed.

Note: I’m not exactly clear on what a wiretap warrant would look like for internet surveillance, but perhaps something that defines what search terms can be used, or that only traffic that appears to originate from a certain IP address can be analyzed.

Re-identification: Not as Easy as it Looks

In a stimulating talk here at Princeton yesterday, Colorado law professor Paul Ohm discussed the future of privacy in an age of re-identification. Re-identification is a computer science term for identifying the subjects of data that has been “anonymized.” Probably the most famous case, which Ohm discusses in his talk, is the re-identification of users in a dataset that had been released by Netflix. It turns out that the dates and ratings of six movies viewed by a particular user were usually enough information to uniquely identify an individual in the Netflix database. In this case, researchers cross-referenced the Netflix database with information culled from IMDB; they analyzed 50 IMDB users and were able to re-identify two of them.
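
The cross-referencing idea can be sketched as a simple linkage score: count how many (movie, rating) pairs an “anonymous” record shares with a public profile and take the best match.  The real attack also used viewing dates and statistical confidence measures; the data below is invented.

```python
# Hypothetical sketch of record linkage for re-identification: score
# each "anonymous" record by its overlap with a public profile and
# return the best candidate.  All names and ratings are invented.
def best_match(public_profile, anonymized_records):
    """public_profile: set of (movie, rating); anonymized_records:
    {record_id: set of (movie, rating)}.  Returns (record_id, overlap)."""
    scores = {rid: len(public_profile & ratings)
              for rid, ratings in anonymized_records.items()}
    rid = max(scores, key=scores.get)
    return rid, scores[rid]

imdb_user = {("Brazil", 5), ("Vertigo", 4), ("Alien", 3)}
netflix = {"user_001": {("Brazil", 5), ("Vertigo", 4), ("Alien", 3)},
           "user_002": {("Brazil", 2), ("Heat", 5)}}
print(best_match(imdb_user, netflix))  # ('user_001', 3)
```

The attack works because a handful of (movie, rating) pairs is already nearly unique in a database of hundreds of thousands of people.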

Ohm is in a unique position: he’s a law professor with extensive experience in computer science. He brings badly needed technical savvy to his discipline, helping to alert his law professor colleagues to computer science developments that are relevant to contemporary legal debates. And the key insight of his talk—that re-identification techniques make the notion of “anonymous data” more complex than it used to be—is clearly correct and should be incorporated into legal scholarship.

But Ohm takes this worthwhile insight and pushes it farther than it warrants. His talk focused on the gradual erosion of privacy that will occur as people are able to link more and more databases together and re-identify their subjects. In the stylized example he presents in his slides, the Netflix database can be linked to real-life Facebook users by comparing the movies a user has mentioned in his Facebook profile. And at the same time, a user’s IMDB profile might be linked to some other database containing embarrassing and potentially career-wrecking secrets. Ohm imagines a world in which, suddenly, everyone’s darkest secrets are on display for everyone else to see.

I think this story is implausible for a couple of key reasons. First, it downplays the technical difficulty of performing re-identification techniques. The Netflix incident demonstrated that re-identification was possible with publicly available information and commodity hardware. But it required weeks of effort by two of the world’s smartest computer scientists. This doesn’t get us anywhere close to demonstrating that anyone can perform re-identification on any pair of databases.

Second, Ohm glosses over issues of access to data. The Netflix result depended on Netflix having previously released a dataset containing the ratings of hundreds of thousands of its customers. One of the likely consequences of the Netflix story (and others like the AOL incident) is that companies will be much more cautious about releasing large data sets in the future. So future re-identifiers will have to work harder to get access to the large data sets required to carry out these techniques. Facebook is a case in point. Data on Facebook is not publicly available. Rather, by default, any given user’s information is only made available to users who have some connection to the user. This can be a large number of people: in my case, everyone with a princeton.edu email address can get access to my information. But having one’s information available to a few thousand or tens of thousands of people is different from having it freely available to anyone.

Even public websites can take steps to make re-identification attacks more difficult. For example, a site like IMDB can prevent “screen scraping” by imposing limits on the number of pages that can be accessed from any given IP address.
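
A per-IP limit of the sort described can be sketched with a sliding window: remember recent request timestamps for each IP and refuse further requests once the window is full.  The class name and limits below are invented.

```python
import time
from collections import defaultdict

# Hypothetical sketch of a per-IP rate limit that makes bulk screen
# scraping harder: allow at most `max_requests` page fetches per IP
# within any `window`-second span.
class RateLimiter:
    def __init__(self, max_requests=100, window=3600):
        self.max_requests, self.window = max_requests, window
        self.hits = defaultdict(list)  # ip -> recent request timestamps

    def allow(self, ip, now=None):
        now = time.time() if now is None else now
        recent = [t for t in self.hits[ip] if now - t < self.window]
        self.hits[ip] = recent
        if len(recent) >= self.max_requests:
            return False  # over the limit: likely a scraper
        recent.append(now)
        return True

limiter = RateLimiter(max_requests=3, window=60)
print([limiter.allow("10.0.0.1", now=t) for t in (0, 1, 2, 3)])
# [True, True, True, False]
```

A human browsing at normal speed never hits the limit, while a scraper pulling thousands of profile pages from one address is cut off quickly.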

Of course, such tactics, in turn, have counter-measures. For example, a bad guy might get around a per-IP access limit by renting a botnet to scrape the site simultaneously from many computers at once. But this brings us to a final objection to Ohm’s argument, which is that costs matter. A re-identification tactic that takes a month of hacking and $10,000 in botnet time is very different from a technique that can be implemented in a few minutes and executed on a single PC. Ohm is likely right that the re-identifiers will always win, if by “win” we mean demonstrating that any given anonymization technique can be defeated given sufficient resources. But the magnitude of the resources required matters quite a bit.

It’s important to remember that it has always been possible to hire a private investigator to dig through someone’s trash, and that this technique often succeeds in unearthing embarrassing information. Ohm’s argument depends on the assumption that re-identification is not only possible, but that it can be done routinely at a cost that’s significantly cheaper than traditional methods. The examples he cites don’t come close to establishing that. Two of the three cases involved computer security researchers with weeks of time to plan their attacks, and all three involved bone-headed (in retrospect) decisions to release enormous data sets to the general public for “research” purposes. These incidents certainly hold important lessons, for both policymakers and researchers. But they’re not evidence that re-identification techniques are about to make privacy a thing of the past, or that radical changes to our privacy laws are needed to deal with the problem.

Workplace Internet

Matthew Lasar at Ars Technica writes about a new report from Palo Alto Networks on the state of internet usage in the corporate setting.  The results are pretty bleak from the point of view of corporations.  Some highlights:

  • P2P usage was found 92% of the time.
  • Web-based file sharing was found 76% of the time.
  • Most of the large amounts of bandwidth consumed go to uses like media, social networking, file sharing, and web browsing.
  • Enterprises spend over $6 billion per year in total protecting their networks.
    • 100% had firewalls.
    • 87% had one or more firewall helpers.
    • In spite of these, “they were unable to exercise control over the application traffic traversing the network.”
  • 57% of applications used have built-in features to bypass network constraints, such as opening different ports.

So what does all of this add up to?  Sounds like a network admin’s nightmare.  That’s a lot of resources that companies are pouring into technology, and the majority of it appears to be used for non-work, personal things.

What motivations could an employee have for using or not using company resources for something like file-sharing?   One obvious reason — bandwidth.  Connection speed might be better at work than at home.  Let’s assume for a second that the file-sharing is part of some kind of illegal activity (like downloading illegitimate copies of copyrighted music).  The employee might have motivation not to engage in this activity, for risk of getting fired if her employer finds out.  But she might also have some incentives, or perceived incentives, for doing so.  She might assume that if she downloads music at work instead of at home, then if she gets caught by the authorities, her company will get in trouble instead of her.  (If nothing else, the company is potentially a more valuable entity to sue.)

Employers want to increase productivity, but have to do so while keeping in mind issues like security and liability.  From a productivity and policy standpoint, it might seem draconian to prevent employees from listening to music while working (like Pandora or Last.fm), or to ban them from watching YouTube videos on their lunch break.  But employers need to worry about a disgruntled employee stealing company secrets or client lists.  P2P applications that share files by default also pose a threat in the form of unintentional data leaks.

Applications have built-in methods for circumventing traffic controls, but in other cases, employees take a more active approach to workarounds.  This adds up to lots of undesirable content.  Over-regulation, or too much oversight of what your employees are doing, is likely to create unhappy employees.  But the security implications can’t be ignored either.  Tightly closed systems can decrease productivity, or even increase security problems.  For example, having a LAN that is not connected to the internet, plus a wireless network for non-secure web use, might work well, or it might encourage employees to just find ways to combine the two, meaning the wireless network is now the weakest link in the security chain.  Think of an employee who goes to a meeting with his laptop and has to plug into the LAN to access his important documents.  More likely, he’ll use wireless and maybe personal email or filesharing in order to have more convenient access to the files.

Training will probably be important in whatever policy is enacted.  Make employees aware of what the policies are, what proper and improper use look like, and what the penalties are.  Over-regulation should be avoided for reasons beyond just keeping employees happy.  If visiting YouTube, checking your personal email, or browsing news stories on the web results in termination of employment, you'll end up with few employees left, lots of employees operating under a "don't ask don't tell" policy, or just outright apathy toward the regulations.

Perhaps limiting internet bandwidth is an option.  Unusually high bandwidth usage from individual computers should probably be flagged anyway, for security reasons (a computer that suddenly starts scanning the network for open ports might be infected with a virus).  It might not be a bad thing to discover that huge bandwidth usage has resulted from an employee watching YouTube all day.  And for those pesky applications that make it easy to do bad things, well, maybe the workplace should switch to using Linux.  Force employees to at least work a little harder to do bad things.
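The flagging idea above can be sketched very simply.  This is a hypothetical illustration, not a real monitoring tool: the host names, byte counts, and the 3x-the-median threshold are all made-up assumptions, and a real deployment would pull per-machine totals from router or proxy logs.

```python
import statistics

def flag_heavy_users(usage_bytes, threshold_factor=3.0):
    """Return hosts whose daily usage exceeds threshold_factor times the median.

    The median is used as the baseline rather than the mean so that a single
    heavy user doesn't drag the baseline up and thereby hide themselves.
    """
    if not usage_bytes:
        return []
    baseline = statistics.median(usage_bytes.values())
    return sorted(host for host, used in usage_bytes.items()
                  if used > threshold_factor * baseline)

# Made-up daily totals for illustration.
daily_usage = {
    "alice-pc": 200_000_000,    # ~200 MB: normal browsing and email
    "bob-pc":   150_000_000,    # ~150 MB
    "carol-pc": 9_000_000_000,  # ~9 GB: YouTube all day, or a compromised machine
}
print(flag_heavy_users(daily_usage))  # ['carol-pc']
```

A report like this wouldn't say *why* carol-pc used 9 GB; a human would still have to decide whether it was streaming video, file-sharing, or malware.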

The FCC and Profanity

Note: This post was written before the Supreme Court announced its ruling on FCC v. Fox.  As mentioned in class, the ruling has now been announced.

The FCC has been granted the power to regulate obscenity, indecency, and profanity.  While the rules regarding obscenity and indecency are problematic, I will focus on the profanity rule as it is the most general and least defensible of the three.  The FCC doesn’t really define profanity, declaring instead that:

“Profane language” includes those words that are so highly offensive that their mere utterance in the context presented may, in legal terms, amount to a “nuisance.”

In a Memorandum Opinion and Order about the F-word, the FCC cites a circuit court decision that declares profanity to be:

construable as denoting certain of those personally reviling epithets naturally tending to provoke violent resentment or denoting language so grossly offensive to members of the public who actually hear it as to amount to a nuisance.

Not surprisingly, this ambiguity in the definition of profanity raises a lot of questions.  Beyond the FCC's definition of profanity being hazy, I believe it violates the First Amendment.

I think it’s fair to say that the only reason freedom of speech is protected is because speech is sometimes considered offensive (“causing displeasure or resentment”) and thus at risk of being stifled, even though the speech might have public value.  Under this rationale of protecting the value of communication, the fact that some speech is “highly offensive” seems to warrant more protection, not less.  This is not to say that speech should never be restricted (e.g. “clear and present danger” restrictions make sense), but only that utilizing expression to its fullest potential sometimes necessitates making a large portion of the population uncomfortable with the content.

It is especially worrisome that the description given by the FCC for profanity does not make explicit exceptions, as in the test for obscenity, that would exonerate works of “serious literary, artistic, political, or scientific value.”  Instead the FCC makes vague references to the importance of context, claiming that no word is always profane and that legality depends on a “case-by-case basis.”  Considering how riled up people get over topics such as abortion, homosexuality, and evolution, I wonder how one could distinguish their inflaming effects from that of curse words, hate speech, and other vulgar language.  A very loose interpretation of the FCC’s profanity definition could result in a topic being banned from being broadcast from 6 am to 10 pm if enough people complained about hearing about that topic.

The Supreme Court is expected to release a ruling in June on whether the FCC can regulate “fleeting expletives.”  Hopefully that ruling will clear up some of the ambiguity around the FCC’s power over profanity.  Of course, probably the best outcome would be for the Supreme Court to strike down the FCC’s power to regulate profanity entirely because of how that power threatens free speech (but I don’t think that will happen).

The Digital Fourth Estate

The Internet is drastically changing the face of journalism. But unlike many other industries that are undergoing painful transformations, journalism is more than a business — it’s a vital part of our system of government. Thus, when institutions like the New York Times that have served as important checks on our government are struggling to stay afloat, it’s more than just their investors who get nervous. Nevertheless, there is strong evidence that the very same technologies that are making print journalism obsolete will also enrich and enhance our democracy.

The Internet isn’t simply replacing newspapers with their digital equivalents. Due to the historically high costs of distribution, newspapers are highly integrated, both vertically and horizontally. A single newsroom would be responsible for tasks at all levels of the journalism stack: gathering information, reporting on it, and providing in-depth analysis. In the digital world, there are often different destinations for each of these. When I’m looking for the latest happenings in the world of tech policy, I look to Ars Technica’s Law and Disorder blog. If I need help making sense of it, I read sites like the Technology Liberation Front and Freedom to Tinker. And of course, I don’t expect these sites to tell me the latest baseball scores or help me sell my old couch; there’s mlb.com and Craigslist for that. Because these websites specialize, they provide much richer content than any integrated portal ever could (imagine trying to get all your tech policy news from the Wall Street Journal). And, thanks to RSS readers, I can still get all this information in one convenient place.

But there is still one of these roles that the digital world is lagging behind in filling: gathering news from primary sources. Because of this, some have claimed (including me, oops) that digital journalism will suddenly find itself without a leg to stand on when the traditional reporters aren’t around to gather the news. It’s not hard to see why this would be a point of concern. Unlike distributing content, where both the creators and consumers are online, most news still happens in the “real world.” It still requires old-fashioned legwork to get that information into a computer. This, however, is beginning to change.

Increasingly, ordinary citizens are digitizing information and making it publicly available. At the CITP’s recent conference on “Studying Society in a Digital World,” evidence of this abounded. Professor Samuel Madden of MIT demoed a system that streamed data from cars’ on-board computers into the cloud, where it could be used to map out exact locations of potholes and provide accurate traffic reports. Another researcher, Professor Jon Kleinberg of Cornell, presented an amazing method [pdf] for making sense of the 50–100 million geotagged photos that are freely available on Flickr. Kleinberg was able to identify the most photographed landmarks in the world, what their names were based on tags (e.g. eiffeltower), and what a canonical picture of each landmark actually looks like (i.e. the Eiffel Tower is the big metal structure, not the photographer’s mother standing in front of the ticket stand), all with a completely automated algorithm.  As this trend of social information sharing continues, it’s not hard to imagine an independent blogger making a simple query to find a CC-licensed image of a toxic waste dump to appear alongside his critique of the federal government’s Superfund policy, for example.

The second way real-world information will begin to make its way into digital form is by newsmakers providing it that way in the first place. As we discussed in class a few weeks ago, one of the largest sources of news, the federal government, is beginning the process of providing its data in machine-readable formats through its as-yet unlaunched data.gov site. When sites like MAPlight.org and Govtrack.us are built around other forms of government data, citizen journalists will find it much easier to provide insightful commentary on what happens inside the Beltway, even if they’re a thousand miles away. A similar transformation will hopefully occur for state governments through the efforts of the Sunlight Foundation’s 50 States project. And there’s no reason this trend has to be limited to governments, either. Right now, Major League Baseball carefully licenses its data, but I wouldn’t be surprised if in a few years they realized they could drive much more interest in baseball by providing an official API that allowed fans to create exciting new apps and visualizations based on their data.

The Wall Street Journal recently said that “If journalists were the Fourth Estate, bloggers are becoming the Fifth Estate.” They’ve got it backwards. Bloggers aren’t creating a new Estate, they are making the Fourth Estate indistinguishable from the Third.