Category Archives: Proposals

Ideas to support the Tor Project: Wikipedia IdeaLab proposal

Special thanks to my open-access comrade-in-arms Lane Rasberry.

Lane emailed me this morning asking for my input on a current proposal that’s on Jimmy Wales’s very own Wikipedia talk page.

After CC’ing Runa Sandvik from the Tor Project to verify the factuality of my feedback for the Wikipedia community, I posted my comments.

The ongoing issue, which Jacob Appelbaum repeatedly raises, is that Tor users, Jacob included, are not able to protect their identities while contributing to the knowledge base that exists on Wikipedia.

Political activists and dissidents create a critical feedback loop into the controversial dialogue that is only made possible through the Internet and social media. Not only are these people self-empowering, they are the ones most likely to seek out the truth.

From Lane:

If you would be willing to write a brief set of proposals about what Wikipedia should do with Tor, then [Lane] would format those with you in the IdeaLab. This is a space where ideas are stored on Wikipedia so that they would always be found if anyone ever wanted them. I think it would be a good idea just to establish the conversation.

https://meta.wikimedia.org/wiki/Grants:IdeaLab

[If] it is of interest to you, I would help you start a proposal, format it properly, publicize it, and if you know anyone in the Tor community that might want to make a grant proposal for funding to establish and document the relationship between Tor and Wikipedia, then I might be able to advise on how to do that also.

This conversation is happening now live and it does have Jimbo Wales’ attention. It would be awesome to get input from established Tor supporters.

If you would like to create a proposal and have the support of a Wikipedia veteran, please contact Lane directly, and ask for other people’s input! I’m also extremely interested in supporting this, but I don’t know what an ideal proposal would look like, and I don’t want to speak on behalf of the Tor Project.

Thank you!

Developing an Open Educational Resource on Encryption

Encryption works. Properly implemented strong crypto systems are one of the few things that you can rely on. Unfortunately, endpoint security is so terrifically weak that NSA can frequently find ways around it.

— Edward Snowden, answering questions live on the Guardian’s website

Society needs an educational resource, covering the complex topics involved with information encryption, that is modular, openly accessible, and freely remixable. This is my proposal to create such a resource.

Open Educational Resources (OER) are freely accessible, openly licensed documents and media that are useful for teaching, learning, assessment, and research purposes. The development and promotion of open educational resources is often motivated by a desire to curb the commodification of knowledge[1] and provide an alternate or enhanced educational paradigm.

Utilizing Creative Commons licensing, an OER can be created on oercommons.org, where it will be maintained by a single authority, yet anyone in the world will be able to adapt and create their own work from ours. Oercommons.org provides a long-term support platform for maintaining these resources.

I started publicly asking for help in June of 2013, and I received a very warm welcome. You don’t have to look far to see why.

June 2013: [image: 2013-06-24]

August 2013: [images: 2013-08-23, 2013-08-23-2]

October 2013: KEYNOTE: Journalism in the Age of Surveillance, Threat Modeling: Determining Digital Security for You, [For Journalism] Keeping Under the Security Radar, Improving Your Digital Hygiene

December 2013: United We Stand — and Encrypt by Josh Stearns [image: 2013-12-21]

December 2013: Arab journalists need training for civil unrest and wars — referencing the CPJ’s Journalist Security Guide

January 2014: A Modest Proposal for Encrypting the Work of Activists by Kate Krauss

[image: 2014-01-20]

It is clear that a diversity of educational resources is needed. While my original proposal was going to be supported by the United States Open Knowledge Foundation, OKFNUS has since backpedaled due to lack of support from central-OKF. I am hoping that the many people behind Crypto.is are interested in spearheading the development of this OER. If they are not, and no other organization is, I will shortly be registering my own domain name to create a project launch page.

The initial launch of the OER can be built on Encryption Works: How to Protect Your Privacy (And Your Sources) in the Age of NSA Surveillance, written by Micah Lee of the Freedom of the Press Foundation. Micah and the Freedom of the Press Foundation graciously licensed this work as CC-BY, allowing us, and even Wikipedia, to reuse the work with attribution. I am hoping that Micah himself will want to be included in this project.

The target audience, initially, will be journalists, whistleblowers, activists, and dissidents. While these groups are the extreme cases, their example proves useful for the rest of society.

Please comment on this post, or tweet me, or email me your feedback.

A local initiative for the people’s right to privacy

“Gentlemen do not read each other’s mail.”

This is attributed to Henry L. Stimson, in support of the US State Department’s 1929 defunding of the Black Chamber program that was used to decipher the communications of foreign ambassadors. At that time, Stimson was the Secretary of State under President Herbert Hoover. Stimson’s opinion, however, is said to have changed while he served as the Secretary of War under President Franklin D. Roosevelt, when the United States government relied heavily on the enemy’s decrypted communications during wartime.

Mass surveillance is a crime against people, not just the American people. The people did not ask for it, not even the special interests behind the development of the Patriot Act. Secret mass surveillance and secret laws are instituted and accepted by people in power to gain and maintain power. These acts are illegitimate in a developing democracy, and they are illegitimate for a country that developed the Internet.

Civilly speaking, cryptographically encrypting information before transmission is the same as licking and sealing a letter before mailing it. It is the same as closing a clear glass door on a telephone booth before having a private conversation. It is the same as putting on clothes to protect things expected to remain private.

I expect that only entities that privately sign digital certificates that create the foundation for private chats, private socializing, and secure transactions on the internet can decrypt my information. It should be illegal for entities beyond the original signer of public key infrastructure certificates to have a copy of the private key in such a way that allows said entity to view or record the decrypted content that is expected to remain private between two specific parties. It should also be illegal for any entity to attempt to break or subvert encryption mechanisms on common-carrier infrastructure as long as that data is being transmitted or being stored on American soil, no matter the nationality of the person transmitting their encrypted internet content. It is time for the United States to learn from its mistakes and emerge as a civil liberties leader.

What I would like to do is identify other leaders throughout the United States who want to pass a shared city law that makes the above acts illegal. We should all vote for and approve these laws in tandem to reduce the risk of federal or state legal threats. Cities need to come together to protect local internet infrastructure.

Government representatives are failing to protect the nature of our constitutional protections in law and debate. They are failing to understand the importance of the Internet. Federal representatives are literally working backwards at times, with the Patriot Act, CISPA, PIPA, and the TPP as perfect examples. It is time to work from the ground up and enact local laws that affect local internet infrastructure.

We cannot let special interest groups that bribe our representatives write our laws for us. The interest of the people needs to be voiced through local law. Let us tell state and federal government that it is not okay to subvert public law with secret law, and that mass surveillance cannot be tolerated, period. Law enforcement has worked, successfully, for hundreds of years without mass surveillance. The city laws that I am proposing do not inhibit the normal procedure by which law enforcement acquires a warrant, through justified evidence, to obtain private information about specific individuals to prevent or punish crime.

In addition to hosting DNS root servers and the Seattle Internet Exchange, the Westin datacenter connects us to billions of non-Americans on the other side of the Pacific Ocean. Many other cities throughout the United States host similar infrastructure. These communication points are ideal for the placement of unethical surveillance equipment, and we must make this act illegal in our cities. Let us put pressure on our state by protecting local resources, the technology that ensures the security of our online communications, and the integrity of our local businesses.

From https://www.aclu.org/sites/default/files/assets/lavabit_brief_of_us.pdf, it is clear that sometimes our founding legal frameworks are not explicit.

THE FOURTH AMENDMENT DOES NOT PROHIBIT OBTAINING ENCRYPTION KEYS FOR THE PURPOSE OF DECRYPTING COMMUNICATIONS THAT THE GOVERNMENT IS LAWFULLY AUTHORIZED TO COLLECT

Let us build our own laws for our expectations of privacy. For example, as described in the book, Toward an Information Bill of Rights & Responsibilities (http://yawnbox.com/?p=283):

Preamble

Information privacy is the claim of individuals to determine what information about them is disclosed to others and encompasses the collection, maintenance, and use of identifiable information. Privacy is an important value in a democratic society. For individuals, it enhances their sense of autonomy and dignity by permitting them to influence what others know about them. For associations, privacy enhances the ability of individuals to function collectively by permitting the association to keep deliberations and membership and other activities confidential. For society, privacy fosters individual and associational contributions to society, promotes diversity, and limits undesirable conduct and abuse of authority by government and other institutions.

Privacy is not an absolute right. It must be balanced with competing values and interests, including First Amendment rights, law enforcement interests, and business or economic interests in information. The following Code of Information Rights and Responsibilities attempts to strike an appropriate balance between privacy and competing interests, in an environment shaped by technological breakthroughs in the ability of organizations to collect and disseminate personal information.

A number of characteristics of the new information environment make it imperative to adopt a Code of Information Rights and Responsibilities. These include:

  • Technological enhancements in the ability to capture, store, aggregate, exchange, and synthesize large quantities of information about individuals, their transactions, and their behavior;
  • Proliferation of powerful computing capacity to the desktop;
  • Creation of worldwide networks through which information about individuals can easily, cheaply, and quickly flow;
  • Increasing use of target marketing, modeling, and profiling;
  • New technological abilities that permit individuals to access personal data maintained by others;
  • Decreasing cost of computing technology used to manipulate data;
  • New social and cultural values and developments regarding personal information.

Two general principles apply to all of the provisions of the Code of Information Rights and Responsibilities. First, an individual is entitled to greater protection and due process when information is used to make determinations about his or her rights, benefits or opportunities. Second, the protection of privacy must be interpreted consistently with First Amendment principles. Resolving the inherent tensions between the values of privacy and the First Amendment must take place on a case-by-case basis.

The scope of the Code of Information Rights and Responsibilities is limited to individual and associational privacy as defined above, and does not cover government and corporate interests in secrecy. It addresses how activities of information keepers and processors involving the collection, maintenance, and use of personal information should be evaluated when privacy interests overlap or conflict with other interests, values, or significant community needs.

First Principles

A. Collection
There should be limits on the ability of information keepers and processors to collect personal information. Information should only be collected when relevant, necessary, and socially acceptable.

A-1.
Information should be collected directly from the individual whenever possible.

A-2.
When not collecting information directly from the individual, notice, access, correction, and other rights should be provided if the information is used to determine rights, benefits, and opportunities.

B. Notice/Transparency
Individuals providing information to an information keeper and processor have the right to receive, at the time that information is provided, a notice of information practices describing how the information will be used, maintained, and disclosed. Information keepers and processors must provide a copy of notice of information practices upon request. There should be no secret systems containing personal information. Individuals have a responsibility to make informed choices about how information about them is to be used.

C. Access and Correction
Individuals have the right to see and have a copy of any information about themselves maintained by others, consistent with the First Amendment and with other important public and private policy interests. Individuals have the right to seek correction of information that is in error. When a correction is made, the individual may require that copies of the corrected information be provided to all previous recipients. Where there is a disagreement about the accuracy of information, the individual may include along with the disputed information a statement of disagreement.

D. Use
Information may only be used for a purpose that is identified and described at the time that the information is collected. Other uses may be permitted only if they are not inconsistent with the original understanding.

E. Disclosure
Disclosures other than those described at the time of collection may be made to third parties only with the consent of the individual or where required by law. Explicit consent by the data subject shall be required for personal information of the highest sensitivity and may be implied for less sensitive personal information. (Whether consent must be express [opt-in] or may be implied [opt-out] is an open question.)

F. Accuracy
Information keepers and processors must take appropriate steps to assure the accuracy, completeness, timeliness, and security of the information. Information keepers and processors must devote adequate resources to these functions.

G. Enforcement
Rules about the collection, maintenance, use, and disclosure of information should be enforced through suitable mechanisms, such as administrative processes, professional standards, civil actions, criminal penalties, government or private ombudsmen, and other means.

H. Oversight
There is a need for an independent federal entity to conduct privacy oversight and policy-making activities.

  • Information keepers and processors and others should be encouraged to explore technical means to protect privacy.
  • There should be an exploration of other means to promote self-determination in the use of personal information, including proprietary rights and dual control mechanisms.
  • The creation of information trustees who maintain personal data on behalf of diverse information keepers and processors should be considered.
  • There is a need to explore the rights and responsibilities of individuals and information keepers and processors when changes in the use and disclosure of information are developed after the time of collection.

Together we must begin drafting a law that can be shared by the people, city governance, and our local businesses. Together we must approve these measures and begin putting a stop to mass surveillance on any and all people, not just Americans, while also demonstrating our right to privacy.

Information strings and their use in understanding digital journalism

Oxford Internet Institute MSc in Social Science of the Internet research proposal by Christopher Sheats

Introduction. In his book, Information: A Very Short Introduction, Dr. Luciano Floridi defines the differences between five types of semantic information (5-TSI)—primary, secondary, meta, operational, and derivative. Floridi then tells a story and describes specific pieces as being “primary” in nature, or “operational” in nature, etcetera. I have adapted Floridi’s 5-TSI to create a framework that goes beyond focusing on independent pieces of information. My research links these independent pieces together to more effectively trace the focus of how the main topic or topics of a news article are being described to the reader. I propose that this linking of information-types becomes an information string.

Identifying information strings allows one to analyze semantic information by qualifying and categorizing information to determine what is and is not present in any given article. My area of interest concerns the quality of information of politically motivated online news articles. A diverse and relevant range of information strings makes up a news article’s informativeness, which is a metric that can describe how high or low the quality of information is. My objective is to determine the quality of any given set of information, which may or may not indicate aspects of informativeness, misinformation, and disinformation.

Hypothesis. In order for my hypothesis to work, I had to invent an information-string system. The notion of “primary” information is the smallest, least complicated “tier” of information which I call a “first-tier” information string. Operational information concerning a news article topic is “primary-operational” information, or, a second-tier information string, because it is operationally describing the primary information. Each sub-tier is always a complication of its respective higher-tier information string, be it secondary, meta, operational, or derivative. I propose that every information string in a stand-alone news article should start as primary-“something”, because information in any news article should focus on or support a main topic.
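As a minimal sketch of the tier idea described above, an information string can be modeled as an ordered sequence of information types whose first element is always primary. The names `InfoType`, `make_string`, `tier`, and `label` are hypothetical and chosen only for illustration; they are not part of Floridi’s work or of any existing library.

```python
from enum import Enum
from typing import Tuple


class InfoType(Enum):
    """The five types of semantic information (5-TSI) named in the proposal."""
    PRIMARY = "primary"
    SECONDARY = "secondary"
    META = "meta"
    OPERATIONAL = "operational"
    DERIVATIVE = "derivative"


# An information string is an ordered tuple of types (illustrative alias).
InformationString = Tuple[InfoType, ...]


def make_string(*types: InfoType) -> InformationString:
    """Build an information string, enforcing that it starts as primary."""
    if not types or types[0] is not InfoType.PRIMARY:
        raise ValueError("an information string should start as primary")
    return tuple(types)


def tier(string: InformationString) -> int:
    """A lone primary is first-tier; each added type goes one tier deeper."""
    return len(string)


def label(string: InformationString) -> str:
    """Render the string in hyphenated form, e.g. 'primary-operational'."""
    return "-".join(t.value for t in string)


second_tier = make_string(InfoType.PRIMARY, InfoType.OPERATIONAL)
print(label(second_tier), "is tier", tier(second_tier))
```

Representing a string as a tuple keeps the order of types explicit, which matters because each sub-tier is described as a complication of the tier above it.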

I hypothesize that information strings can be logically and visually mapped in such a way that will enhance news aggregation websites.

With any given news article topic or topics (the primary information), there should be a substantial amount of related information already available online. With the release of new information by journalists, politicians, whistle-blower release sites, encyclopedia developments, and social media participants, the nature of that primary information will evolve. Over time, primary information strings change depending on a multitude of factors that affect primary’s sub-tier information. Through analyzing the nature of the information string change, trends should emerge to help identify those factors.

What I expect my research to support is the premise that information string evolution will dramatically shift based on specific sources, including individual journalists, individual political speakers, entire news agencies, or entire political organizations. Tracking information that has a high probability of being deceptive via the application of information strings should allow me to visually represent the change of information over time to better understand the consequences of using low-quality information.

Methodology. With the help of my research mentor, Dr. Floridi, I will select a topic in news media to analyze. For example, in a public blog post, I looked at an NBC News article alleging that US officials claimed that the Iranian government was responsible for cyber attacks against the US government. Through the application of information strings, I was able to provide evidence that its low-quality information was later being referenced in follow-up articles by the same and other news agencies, leading to systemic low-quality information and probable deception. Information strings can become dependent on false information that allows the generation of all kinds of information strings in other, stand-alone, online news articles.

Surveys will need to be developed and administered to a wide range of participants to gauge the informativeness of the use of specific information string diversity and order. Survey questions will depend on the development and application of my information strings framework. The following questions were developed to better shape survey questions:

 

  1. Is it feasible to track the behavior of information in semantic media using the 5-TSI?
  2. How persuasive is one information type, of the 5-TSI, over another?
  3. Which types of the 5-TSI affect the trend of semantic media the most? The least?
  4. Is objective information composed more of one of the 5-TSI over any of the others?
  5. In semantic media, can the 5-TSI be broken down into percentages and graphed?
  6. How do subjective information and objective information affect the 5-TSI? Vice-versa?
  7. Is it possible to identify the gaps between data and information in semantic media, depending on the type of information, either biological information or the 5-TSI?
  8. Is it possible to automate the detection of the 5-TSI present in a piece of semantic media?
  9. To what degree does biological information affect the 5-TSI?
  10. What types of the 5-TSI persuade a user of that information to ask more questions rather than make more assumptions? Vice-versa?
  11. Is it possible to use one or many types of information to strategically develop information warfare operations?
  12. Do the 5-TSI change in perception by a biological entity that is limited to biological information?
  13. Does understanding the information type affect one’s ability to understand information in a more objective sense?
  14. How do we extract wanted information from all perceived information, of the 5-TSI?
  15. How do we extract primary information from secondary information? Or vice-versa?
  16. What percentage of the 5-TSI create more perceived information entropy, information, and/or contradictory information? Can the 5-TSI be broken down into these categories?
  17. Do various types of the 5-TSI create any more or less information entropy?
  18. Does the diversity or order of the 5-TSI affect information entropy?
  19. Does information entropy shift as one learns more?
  20. How does information entropy change and how is it affected by biological information and the 5-TSI?
  21. Does semantic information have strong relationships with biological information? Can it be understood using complex adaptive systems analysis?
  22. Is there a dualism to Floridi’s 5-TSI?
  23. Is it feasible to minimize or maximize the use of meta information, except when in support of primary information, to better produce disinformation? Or any of the other 5-TSI?
  24. Is it possible to systematically or systemically organize meta information as primary information, or secondary information as primary information, etc.?
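Question 5 above asks whether the 5-TSI in a piece of media can be broken down into percentages. As a rough sketch, and assuming each passage of an article has already been hand-labeled with one of the five types, the breakdown is a simple tally. The function name `type_percentages` and the sample labels are hypothetical; how passages would actually be labeled is exactly what the research would need to establish.

```python
from collections import Counter
from typing import Dict, Iterable

# The five types of semantic information named in the proposal.
FIVE_TSI = ("primary", "secondary", "meta", "operational", "derivative")


def type_percentages(labels: Iterable[str]) -> Dict[str, float]:
    """Return the percentage share of each 5-TSI type among the labels."""
    counts = Counter(labels)
    unknown = set(counts) - set(FIVE_TSI)
    if unknown:
        raise ValueError(f"unknown information types: {sorted(unknown)}")
    total = sum(counts.values())
    if total == 0:
        # No labeled passages: every share is zero.
        return {t: 0.0 for t in FIVE_TSI}
    return {t: 100.0 * counts[t] / total for t in FIVE_TSI}


# Hypothetical labels for four passages of one article.
article_labels = ["primary", "primary", "operational", "meta"]
for info_type, share in type_percentages(article_labels).items():
    print(f"{info_type}: {share:.1f}%")
```

The resulting dictionary can be passed to any charting tool to graph the shares per article or per source over time.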

 

Conclusion. Depending on the probability of informativeness and the ever present risk of deception in political news articles, a news aggregator such as Google News could eventually achieve two things. One, it could make targeted suggestions to information consumers that present the least amount of content to consume while achieving the greatest amount of informativeness based on open sources. Two, because there will eventually be a database of historical trends based on information string change, a news aggregator could strategically suggest information that will best support probable information changes.

This research will allow for the development of automated systems that support the actions of an information consumer with high-quality information, rather than leaving the consumer to wallow in unstructured, seemingly random news with no qualified risks of misinformation or disinformation. If my research is successful, I have every intention of continuing this research in the OII’s DPhil in Information, Communication and the Social Sciences program.

Henry Markram of the Blue Brain project, founded in 2005 to attempt to create a synthetic brain, was quoted in an interview from 2008 as saying, “So much of what we do in science isn’t actually science. I say let robots do the mindless work so that we can spend more time thinking about our questions.” The internet has extraordinary capacity to meaningfully inform its users. We need better information management systems to help us ask the right questions when it comes to consuming information online.

 

Spearheading a Wikisource repository for political speeches

How did President Obama think about a politically-sensitive topic that concerns you a year before his presidency? How about 5 years before his presidency? 10 years? How far back in his public service does his opinion matter?

Politicians talk a lot. Every day. Their public speeches should showcase their absolute and relative opinions about how they think Government should affect you. Where can you go to see what they said? How compassionate were they about the issues that matter to you? Did they lie? Did their opinion change? Why did it change? We can’t even begin to answer these questions unless we document them.

This project aims to have citizens use their cell phone’s video recorder to document the speeches of local, state, and national representatives. These videos will be uploaded to Wikisource.org, openly licensed using the Creative Commons, and transcribed so that search engines can index these important words.

The goals of phase one:

  • Develop a standard Wikipedia-modeled framework for properly documenting public political speeches
  • Spread the word to everyone so people know to record their representatives’ public speeches
  • Spread the word to netizens who wish to transcribe and verify the transcriptions
  • Spread the word to journalists and researchers to constructively use this data
  • Wiki 1,000 political speeches within a one-year time span

Example: Remarks by the President on Osama bin Laden

 

To the EFF: a Tor Challenge proposal

Hello Electronic Frontier Foundation,

In mid-2011, the EFF started a “Tor Challenge” which encouraged more than 500 people to run their own Tor relays.

It was a brilliant way to bring awareness to the project and expand the Tor network. A year later, it seems that 90% of those relays are no longer operational. The Tor Challenge does not seem to have been designed for long-term Tor support, which would be ideal. I am writing to you in hopes of re-initiating the Tor Challenge and adding some new functionality. I believe that an EFF-sponsored program such as the Tor Challenge can be highly successful for two reasons. First, the EFF is a not-for-profit with the ability to collect tax-deductible donations. Second, it is a legal and rights-oriented organization that can help alleviate the worry people may feel about running Tor nodes. With the EFF putting its name on this program, it reduces that anxiety while simultaneously promoting a willingness to contribute to the Tor Project. My proposal has three parts:

  1. Lead by example
  2. Create a community
  3. Award the community

# Lead by example

Looking at Torstatus.blutmagie.de, I see two EFF-run Tor relays. I am really happy to see them, but I’m disappointed by how “slow” they are, and by the fact that neither of them is a Tor exit-router.

  • observatory5.eff.org [173.236.34.122]
  • tor1.eff.org [64.147.188.11]

In order to make maintaining EFF-run Tor nodes more sustainable, the EFF should make the Tor Challenge into a dedicated program. Not knowing the internals of the EFF, here are some suggestions:

  1. Make the Tor Challenge a formal program within the EFF, even if it is solely supported by new volunteers (like me!).
  2. Re-initiate your social-media and outreach for the program, but also give the program its own home page, as an example, Torchallenge.eff.org.
  3. Expand the bandwidth of your two current Tor nodes (100 Mbps+), but turn at least one of them into a Tor exit-router.
  4. Rename them for self-branding (for example: Exit01.torchallenge.eff.org and Relay01.torchallenge.eff.org)
  5. Allow volunteers of the Tor Challenge to ask for EFF donations, specifically for funding EFF maintained Tor nodes.
  6. The Tor Project currently has a wiki page of Tor-friendly ISPs and hosting companies. Expand their work and actively engage with US-based companies to educate and identify them. This has the added benefit of looking for companies to donate hosting/bandwidth to EFF for the expansion of EFF maintained Tor nodes.

On the web page of one of my Tor exit-routers, Tor.anon.is, I specify how much traffic the router has processed since its inception. I do this because it enhances my interest in keeping a node online. It is simply amazing to realize, once the numbers are in front of me, how many people I am actually helping. I would encourage the EFF to devise a real-time tool for displaying the same type of information on your relays’ web pages, and to make those tools available to the Tor Challenge community. You might take the opportunity to perform research (simple surveys) to identify why people run Tor nodes. That might also allow you to devise new ways of enhancing the Tor Challenge community for long-term engagement.
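As an illustration of the kind of display described above, the sketch below turns cumulative byte counters into a short, readable summary line for a relay’s web page. It assumes the operator already has running totals of bytes read and written from some source of their own; how those totals are collected is left out, and the function names and the sample numbers are hypothetical.

```python
def human_bytes(n: int) -> str:
    """Format a byte count using binary units (KiB, MiB, ...)."""
    if n < 0:
        raise ValueError("byte count cannot be negative")
    units = ["B", "KiB", "MiB", "GiB", "TiB", "PiB"]
    value = float(n)
    for unit in units:
        # Stop at the first unit that keeps the value below 1024,
        # or at the largest unit available.
        if value < 1024 or unit == units[-1]:
            if unit == "B":
                return f"{int(value)} B"
            return f"{value:.1f} {unit}"
        value /= 1024


def traffic_summary(bytes_read: int, bytes_written: int) -> str:
    """Build the one-line summary shown on the relay's status page."""
    total = bytes_read + bytes_written
    return (
        f"This relay has processed {human_bytes(total)} of traffic "
        f"({human_bytes(bytes_read)} in, {human_bytes(bytes_written)} out)."
    )


# Hypothetical totals, for illustration only.
print(traffic_summary(bytes_read=3 * 1024**4, bytes_written=3 * 1024**4))
```

Regenerating this line on a schedule and writing it into a static page would be enough for a basic version of the display.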

# Create a community

Torchallenge.eff.org (example) should be a one-two punch for educating and highlighting the contributions made by the numerous individuals and organizations that run long-term Tor nodes. It might make people feel as though they are part of a greater community. As a Tor exit-router operator, I would feel very alone if not for hanging out in the #Tor IRC channel. What finally pushed me to run my own Tor exit-router was the University of Washington hackathon. For me, it was a sense of wanting to engage with these many amazing people. I strongly feel that encouraging in-person meet-ups, even if sponsored by related organizations, would enhance one’s sense of community. Without that sensation of connection, there is certainly a steeper learning curve to becoming at ease with the risk of running a long-term Tor exit-router.

The Tor Challenge home page should be social (to some extent) so that people can share their own achievements and see the successes of others. Torstatus.blutmagie.de does have a fair number of metrics available, as does Atlas.torproject.org, but what is missing is the long-term documentation of who has done what, including the amount of traffic and uptime that people and organizations have contributed. It is also limited by its focus on the Tor node, not on the people and organizations behind it.

  1. The Tor Project currently has a fair amount of material for educating people about Tor and about how they might use and/or support it. Certainly expand on these ideas, but also find specific ways to engage people who want to run their own Tor nodes.
  2. Devise metrics for contributors so that people can identify with their own contributions and, through that shared connection, with the contributions of others.
  3. Create a blog so that people can tell their stories: those who use Tor as well as those who contribute to Tor.
  4. Create hash-tags and other ways for people to share via popular online social networks.
  5. The social aspects of the Tor Challenge home page should not be limited to people and their contributions. Let people create their own “guilds” or TorChallenge clubs that bring awareness to hacker spaces as well as university clubs and/or organizations.

# Award the community

The amazing people who maintain their own Tor relay likely already have a strong understanding of why they support the Tor Project. However, some people are still learning, want to learn more, or want other ways of making connections. An award system might be a good way to provide needed feedback loops. Mozilla has initiated an “Open Badges” program, and it seems ideal for this type of knowledge development and community building.

  1. Create a Tor Challenge OpenBadges authority, and provide direct feedback to the individuals and organizations who have earned achievements.
  2. Research and develop new metrics and new ways to award badges.
  3. Create ways for people to share their badges on social networks as well as blogs/personal pages.
  4. Automate the delivery of awarded badges, detailing the next steps and/or additional ways to get involved with either the Tor Project or the Tor Challenge.
  5. Send out monthly newsletters to the Tor Challenge community alerting everyone to Tor updates, issues, news stories, and of course, the new achievements awarded to community members.

I hope that the ideas I present above are useful to you. I understand that these ideas may already have been implemented to some degree, and I hope that you understand that I do not want to step on anyone’s toes, especially those of the amazing people at the Tor Project. Feel free to reuse or republish any of the above verbiage, and please contact me if you have any questions or concerns. Thank you for your time.

Thesis proposal: An Information Systems Theory Approach to Semantic Non-Information Identification

Will process semantic media for PhD candidacy!

Here are some snapshots of my developing thesis project proposal. I’m looking for professors who care about these issues as much as I do.

At the heart of societal problem solving is information and the mediums that carry it. Information is shared through a wide array of channels: the leaders of nations receive succinct reports crafted from a range of open-source and classified information, while Internet users read news articles and Wikipedia entries to attempt to understand very complex issues.

It takes a lot of time and energy to understand complex issues without succinct reports to rely on. Ideally, I believe that if people could spend less time gathering high-quality information, they would spend more time developing follow-up strategies, and society would progress at an ever-accelerating pace. Information consumption has been a societal issue for as long as our species has been able to communicate, but now we have computers to help process large quantities of it. Understanding a problem (knowledge development) well enough to develop a solution requires high-quality information. Misinformation and disinformation are roadblocks to scientific progress, governance, and plain human development.

Not-information needs to be rooted out. This matters not only for post-processing; there are also key areas that information producers need to be made aware of in order to minimize information entropy and information noise.

There are several different goals along the way. Some tools need to be developed, and some surveys need to be developed and administered for qualitative analysis.

1. In 2010 (http://yawnbox.com/?p=119) I thought up an interesting way to visualize information entropy. In that post’s simple three-word example, “radical life extension”, it is clear that the word “life” has the greatest complexity. When reading the word “life” in a sentence, one should mitigate its entropy by reading the rest of the sentence. But by itself, it’s still a high-entropy word. If a news article or Wikipedia article has a lot of information entropy, I presume that it’s going to be a less informative article, suspected of containing not-information and possibly identifiable when juxtaposed with other sources covering the same information.

1A- I need to come up with an open-source dictionary to work with.
1B- The data needs to be structured in preparation for the next step.
1C- Presuming Python and something else, I’ll need to develop a software application that will process a hyperlink, strip it down to the article, and process the article with the dictionary.
1D- Following, I want to develop a stand-alone application (modular?) that will visualize dictionary quantification in real-time.
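
As a rough illustration of steps 1A through 1C, here is a minimal Python sketch. Everything in it is an assumption made for the example: the `SENSE_COUNTS` table stands in for the open-source dictionary (with made-up counts), the number of dictionary senses is used as a crude proxy for a word’s entropy, and the function names are hypothetical. Fetching the hyperlink is left out; the sketch starts from HTML that has already been downloaded.

```python
import re
from html.parser import HTMLParser

# Hypothetical stand-in for the open-source dictionary of step 1A:
# word -> number of distinct senses. The counts are invented for illustration.
SENSE_COUNTS = {"radical": 4, "life": 14, "extension": 6}


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def html_to_text(html):
    """Strip markup from an HTML string and return the remaining text."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def score_words(text, senses=SENSE_COUNTS):
    """Pair each word with its sense count (0 if the word is not in the dictionary)."""
    words = re.findall(r"[a-z']+", text.lower())
    return [(word, senses.get(word, 0)) for word in words]


def most_ambiguous(text, senses=SENSE_COUNTS):
    """Return the (word, sense_count) pair with the highest count, or None if empty."""
    scored = score_words(text, senses)
    return max(scored, key=lambda pair: pair[1]) if scored else None


if __name__ == "__main__":
    article = html_to_text("<p>radical <b>life</b> extension</p>")
    for word, count in score_words(article):
        print(word, count)
```

With the invented counts above, `most_ambiguous` picks out “life” for the three-word example. The list of scored pairs is the kind of structured output that a visualizer like the one in 1D could consume.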

Later advances will include thematic relation (wikipedia.org) processing.

2. There have been two (http://yawnbox.com/?p=736) occasions (http://yawnbox.com/?p=) where I’ve tried to put some of my ideas into practice. Specifically, I’ve rationalized some of Dr. Luciano Floridi’s work concerning data classification. I think the work is adaptable not only to information (he touches on that) but modifiable to work for an aspect of natural language processing.

2A- Get some articles with common primary information (like in http://yawnbox.com/?p=837).
2B- Use the tool created in part 1C to organize the information in the desired format.
2C- Make the articles presentable for a controlled study. Issue the raw articles to a “control group” and the formatted articles to a second group. Ask questions like these, for example:

[x] Specific article content is presumed factual and informative
[-] Specific article content is presumed truthful and informative
[-] Specific article content is presumed untruthful and disinformative
[-] Specific article content is presumed nonfactual and disinformative

2D- Compare and contrast how the information was processed: how the articles were perceived as they “normally” appear versus the same content color-coded to denote probabilities of entropy, not-information, etc.

Other goals not described in this post include:

Thesis objective: In order to understand the nature of misinformation and disinformation, I need to understand the nature of information using Dr. Floridi’s information classification scheme. I hope to accomplish this while developing several software applications and theoretical frameworks along the way.

Using the applied notion of grounded theory method (wikipedia.org), my over-arching objective will be two-fold:

1. Contribute to the Wikipedia project with F/OSS (wikipedia.org) tools that I’ll develop and education to support the use of the tools, and

2. develop a start-up for-profit company geared towards processing web and user-generated content. Think Google News on steroids.

And, of course, earn a doctorate diploma so I can guest lecture in my spare time.