Thesis proposal: An Information Systems Theory Approach to Semantic Non-Information Identification

Will process semantic media for PhD candidacy!

Here are some snapshots of my developing thesis project proposal. I’m looking for professors who care about these issues as much as I do.

At the heart of societal problem solving is information and the media that carry it. Information is disseminated through a wide array of channels: the leaders of nations receive succinct reports crafted from a mix of open-source and classified information, while Internet users read news articles and Wikipedia entries to try to understand very complex issues.

It takes a lot of time and energy to understand complex issues without succinct reports to rely on. I believe that if people could spend less time gathering high-quality information, they would spend more time developing follow-up strategies, and society would progress at an ever-accelerating pace. Information consumption has been a societal issue for as long as our species has been able to communicate, but now we have computers to help process large quantities of it. Understanding a problem well enough to develop a solution (knowledge development) requires high-quality information. Misinformation and disinformation are roadblocks to scientific, governance, and plain human development.

Not-information needs to be rooted out, and not only in post-processing: there are key areas that information producers need to be made aware of in order to minimize information entropy and information noise.

There are several different goals along the way. Some tools need to be developed, and some surveys need to be developed and administered for qualitative analysis.

1. In 2010 (http://yawnbox.com/?p=119) I thought up an interesting way to visualize information entropy. In that post’s simple three-word example, “radical life extension”, it is clear that the word “life” has the greatest complexity. When reading the word “life” in a sentence, one mitigates its entropy by reading the rest of the sentence, but by itself it is still a high-entropy word. If a news article or Wikipedia article has a lot of information entropy, I presume that it is going to be a less informative article, suspect of containing not-information and possibly identifiable as such when juxtaposed with other sources covering the same information.

1A- I need to come up with an open-source dictionary to work with.
1B- The data needs to be structured in preparation for the next step.
1C- Presuming Python and something else, I’ll need to develop a software application that will fetch a hyperlink, strip the page down to the article, and process the article with the dictionary.
1D- Following that, I want to develop a stand-alone (modular?) application that will visualize dictionary quantification in real time.
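A minimal sketch of how steps 1B and 1C might fit together, using only the Python standard library. Everything specific here is an assumption for illustration: the toy dictionary and its sense counts are made up, "entropy" is approximated as log2 of the number of senses (all senses treated as equally likely), and the article is taken to be whatever text sits inside <p> elements. Fetching the hyperlink itself is left out so the sketch runs offline.

```python
import math
import re
from html.parser import HTMLParser

# Hypothetical toy dictionary: word -> number of distinct senses.
# The counts are illustrative placeholders, not taken from a real dictionary.
SENSE_COUNTS = {"radical": 4, "life": 16, "extension": 8}


class ParagraphText(HTMLParser):
    """Collect the text found inside <p> elements and ignore everything else."""

    def __init__(self):
        super().__init__()
        self.depth = 0    # number of <p> elements currently open
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "p" and self.depth > 0:
            self.depth -= 1
            self.chunks.append(" ")  # keep adjacent paragraphs separated

    def handle_data(self, data):
        if self.depth > 0:
            self.chunks.append(data)


def article_text(html):
    """Strip a page down to its paragraph text (a crude stand-in for step 1C)."""
    parser = ParagraphText()
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks)


def word_entropy(word, senses=SENSE_COUNTS):
    """Bits needed to pick one sense, assuming all senses are equally likely."""
    n = senses.get(word.lower(), 1)  # unknown words are treated as unambiguous
    return math.log2(n)


def score_article(html, senses=SENSE_COUNTS):
    """Return per-word (word, bits) pairs and the mean bits per word."""
    words = re.findall(r"[A-Za-z']+", article_text(html))
    scores = [(w, word_entropy(w, senses)) for w in words]
    mean = sum(bits for _, bits in scores) / len(scores) if scores else 0.0
    return scores, mean
```

With these made-up counts, “life” scores highest in “radical life extension”, which matches the intuition from the 2010 post. A real dictionary would replace SENSE_COUNTS, and uniform sense probabilities are only a starting point.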

Later advances will include thematic relation (wikipedia.org) processing.

2. There have been two (http://yawnbox.com/?p=736) occasions (http://yawnbox.com/?p=) where I’ve tried to put some of my ideas into practice. Specifically, I’ve rationalized some of Dr. Luciano Floridi’s work concerning data classification. I think the work is adaptable not only to information (he touches on that) but modifiable to work for an aspect of natural language processing.

2A- Get some articles with common primary information (like in http://yawnbox.com/?p=837).
2B- Use the tool created in part 1C to organize the information in the desired format.
2C- Make the articles presentable for a controlled study. Issue the raw articles to a “control group” and the formatted articles to a second group. Ask questions like these, for example:

[x] Specific article content is presumed factual and informative
[-] Specific article content is presumed truthful and informative
[-] Specific article content is presumed untruthful and disinformative
[-] Specific article content is presumed nonfactual and disinformative

2D- Compare and contrast how the information was processed: how the articles were perceived as they “normally” appear, versus the same content color-coded to denote probabilities of entropy, not-information, etc.
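The color-coding for the second group could be sketched roughly like this. The band thresholds, the colors, and the choice of HTML spans as the output format are all hypothetical; the proposal does not fix any of them, and the input is assumed to be a list of (word, score) pairs from whatever scoring tool comes out of part 1.

```python
from html import escape

# Hypothetical score bands (upper bound in bits, color); placeholders only.
BANDS = [(1.0, "green"), (3.0, "orange")]
HIGH_COLOR = "red"  # anything above the last band


def color_for(score):
    """Pick the color of the first band whose upper bound covers the score."""
    for upper, color in BANDS:
        if score <= upper:
            return color
    return HIGH_COLOR


def colorize(scored_words):
    """Render (word, score) pairs as HTML spans colored by score band."""
    spans = [
        f'<span style="color:{color_for(score)}">{escape(word)}</span>'
        for word, score in scored_words
    ]
    return " ".join(spans)
```

The control group would get the plain text and the second group the output of colorize, so the only difference between the two conditions is the coloring.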

Other goals not described in this post include:

Thesis objective: In order to understand the nature of misinformation and disinformation, I need to understand the nature of information using Dr. Floridi’s information classification scheme. I hope to accomplish this while developing several software applications and theoretical frameworks along the way.

Using the applied notion of grounded theory method (wikipedia.org), my overarching objective will be two-fold:

1. Contribute to the Wikipedia project with F/OSS (wikipedia.org) tools that I’ll develop and education to support the use of the tools, and

2. develop a start-up for-profit company geared towards processing web and user-generated content. Think, Google News on steroids.

And, of course, earn a doctorate diploma so I can guest lecture in my spare time.

Toward an Open Privacy Specification

Information privacy is the claim of individuals to determine what information about them is disclosed to others and encompasses the collection, maintenance, and use of identifiable information. Privacy is an important value in a democratic society. For individuals, it enhances their sense of autonomy and dignity by permitting them to influence what others know about them. For associations, privacy enhances the ability of individuals to function collectively by permitting the association to keep deliberations and membership and other activities confidential. For society, privacy fosters individual and associational contributions to society, promotes diversity, and limits undesirable conduct and abuse of authority by government and other institutions.

Toward an Information Bill of Rights & Responsibilities

This post is a brain dump of my ideas for designing a new, community-driven (#open) certification. I’d like to eventually make this a program that is actively maintained by the Open Knowledge Foundation and licensed under a Creative Commons license.

First, check out this 5-minute video presentation from the Chaos Communication Congress, where this idea spawned:

28c3 LT Day 2: Securing the Servers: Privacy Policy for Providers

The PCP is a policy for communication service providers who seek to respect the privacy of their user-base. It includes a set of modules that cover various aspects of the server configuration and three levels in each module which provide more and more privacy.

I’d like to adapt their work to create an open framework made up of a spectrum of policies and procedures for auditing and implementing privacy-centric services for information hosting providers. So, if you’re a blogger or an internet service provider, you would use this specification to audit yourself, make specific changes to your network or hosting infrastructure, then precisely outline such capabilities, publicly. This would be a voluntary and trust-based process, since service providers will be their own auditors.

Their existing work:

Open Privacy Specification

Mission

Collaboratively build an open framework for a broad range of internet-based information service providers with the objective of creating and maintaining specific policies, procedures, and certifications for objectively controlling personal information.

Purpose

Fundamentally, maintaining individual privacy requires the ability to control the confidentiality, integrity, and availability of specific information. Information that cannot be controlled by a service’s user must be defined and made publicly available, in detail, without compromising the security of the information hosting provider.

The purpose of the Open Privacy Specification is to:

  • define the relative privacy expectations between the information hosting provider’s service and the service’s users;
  • design and implement services that safeguard the service’s users whenever possible against voluntary and involuntary compromise;
  • provide the service’s users with meaningful information about their ability to maintain their privacy while using said services;
  • implement routine processes and secure controls via standardized policies and procedures;
  • implement a standardized public disclosure document outlining the information service provider’s metric-based capabilities and limitations.
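To make the last point a bit more concrete, here is a rough sketch of what a machine-readable public disclosure document might look like. This is purely illustrative: the field names, the module names, and the idea of reusing the PCP’s three levels per module are my assumptions, not part of any existing specification.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class ModuleDisclosure:
    module: str                  # hypothetical module name, e.g. "logging"
    level: int                   # 1..3; assumed to mirror the PCP's three levels
    limitations: list = field(default_factory=list)  # free-text caveats

    def __post_init__(self):
        if self.level not in (1, 2, 3):
            raise ValueError("level must be 1, 2, or 3")


@dataclass
class Disclosure:
    provider: str                # who is self-auditing
    modules: list = field(default_factory=list)

    def to_json(self):
        """Serialize the self-audit so it can be published as-is."""
        return json.dumps(asdict(self), indent=2, sort_keys=True)
```

A provider would fill in one ModuleDisclosure per audited area and publish the JSON. Since the process is trust-based, nothing here verifies the claims; it only standardizes how they are stated.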

Certifications

Certifications will be built around service capabilities and information management infrastructure.

Service capability examples include:

  1. Pertinent regional laws (when available)
  2. Organizational management (as permissible)
  3. Automated and manual processes (as permissible)

Information management infrastructure examples, standardized around the OSI model, may include:

  1. Physical, data link, network, transport, and session layers:
    1. Upstream provider capabilities and limitations
    2. Hardware configuration, capabilities, and limitations
    3. Network configuration, capabilities, and limitations
  2. Presentation and application layers:
    1. Operating system configuration, capabilities, and limitations
    2. Software application configuration, capabilities, and limitations

Future revisions of specific policies or procedures should be adaptable to existing information assurance frameworks, such as PCI-DSS, COBIT, NIST, or ISO/IEC 27002. At the moment, I’m thinking about sponsoring a hack-day event with the University of Washington to launch the initial draft. I think it would be a solid start. As always, feel free to share any commentary.

Layered email security

There are two takeaways from CloudFlare’s (wikipedia.org) recent security breach that are outstanding pieces of actionable information.

Reference: http://blog.cloudflare.com/post-mortem-todays-attack-apparent-google-app

Ensure your password on your email account is extremely strong and not used on any other services…

and

Reference: http://blog.cloudflare.com/the-four-critical-security-flaws-that-resulte

…using an out-of-band authentication that doesn’t rely on the phone company’s network (e.g., Google Authenticator App, not SMS or voice verification).

If you already have two-factor authentication (wikipedia.org) turned on for your Gmail or Google Apps account, you likely have a cell phone number or a landline number in use. It’s really easy to remove the number once you are using the Google Authenticator (wikipedia.org) app. If you have a rooted phone like me, and enjoy reflashing your phone to try out new ROMs or mods, be sure to deactivate two-factor authentication before you purge your apps!
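For the curious, here is why an authenticator app doesn’t need the phone company’s network: the code is computed locally from a shared secret and the current time. Below is a minimal sketch of the open HOTP/TOTP algorithms (RFC 4226 and RFC 6238) that apps of this kind implement, using only the Python standard library. It is an illustration, not Google’s code; the 30-second step and 6 digits are the common defaults, and I’m assuming HMAC-SHA1.

```python
import hashlib
import hmac
import struct
import time


def hotp(secret, counter, digits=6):
    """HMAC-based one-time password (RFC 4226) using HMAC-SHA1."""
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F  # dynamic truncation: low nibble of the last byte
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def totp(secret, now=None, step=30, digits=6):
    """Time-based one-time password (RFC 6238): HOTP over a time counter."""
    if now is None:
        now = time.time()
    return hotp(secret, int(now) // step, digits)
```

Calling totp(b"12345678901234567890", now=59, digits=8) returns "94287082", which matches the SHA-1 test vector published in RFC 6238. Real apps usually store the secret base32-encoded, so it would need decoding to bytes first.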

If you aren’t using two-factor authentication… may the internet gods be with you XD

Password Reset

Hello Company,

Can you please assist me with resetting my account password for the company customer portal? I don’t know how I answered my “security questions”. I never reuse answers, since answering the same question at multiple places (like my bank, etc.) is no different from using a password twice, except that an attacker could actually figure these answers out just by finding the right information.

If security is important to you, you should look into multi-factor authentication, and not simply increase the number of passwords a person has to type in. Please forward this suggestion to Jane Doe, your CIO, who apparently designed the company customer portal.

By the way, when you prevent web browsers from remembering my randomly generated passwords, it gets in the way of my workflow. I must have saved the password in clear text somewhere, but instead I’m now spending my employer’s time emailing you for help.

Cheers

[changed for privacy]