Will process semantic media for PhD candidacy!
Here are some snapshots of my developing thesis project proposal. I’m looking for professors who care about these issues as much as I do.
At the heart of societal problem solving is information and the media that carry it. Information is disseminated in a wide array of forms: the leaders of nations receive succinct reports crafted from open-source and classified information, while Internet users read news articles and Wikipedia entries to try to understand very complex issues.
Without succinct reports to rely on, understanding complex issues takes a great deal of time and energy. I believe that if people could spend less time gathering high-quality information, they would spend more time developing follow-up strategies, and society would progress at an ever-accelerating pace. Information consumption has been a societal issue for as long as our species has been able to communicate, but now we have computers to help process large quantities of it. Understanding a problem well enough to develop a solution (knowledge development) requires high-quality information. Misinformation and disinformation are roadblocks to scientific, governmental, and plain human development.
Not-information needs to be rooted out, and not only in post-processing: information producers also need to be made aware of the key points where it enters, so that information entropy and information noise are minimized.
There are several goals along the way. Some tools need to be developed, and some surveys need to be designed and administered for qualitative analysis.
1. In 2010 (http://yawnbox.com/?p=119) I thought up an interesting way to visualize information entropy. In that post’s simple three-word example, “radical life extension”, it is clear that the word “life” has the greatest complexity. When reading “life” in a sentence, one mitigates its entropy with the rest of the sentence, but by itself it is still a high-entropy word. If a news article or Wikipedia article has a lot of information entropy, I presume it will be a less informative article, suspect of containing not-information and possibly identifiable as such when juxtaposed with other sources carrying the same information.
1A- I need to come up with an open-source dictionary to work with.
1B- The data needs to be structured in preparation for the next step.
1C- Presuming Python and something else, I’ll need to develop a software application that takes a hyperlink, strips the page down to the article, and processes the article with the dictionary.
1D- Next, I want to develop a stand-alone (modular?) application that will visualize dictionary quantification in real time.
Later advances will include thematic relation (wikipedia.org) processing.
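To make the idea in step 1 concrete, here is a minimal sketch of per-word entropy scoring. The sense counts are purely illustrative stand-ins for whatever the open-source dictionary of step 1A would supply (the names SENSE_COUNTS, word_entropy, and phrase_entropy, and the numbers themselves, are all my assumptions, not a finished design):

```python
import math

# Toy sense inventory. In practice the open-source dictionary of step 1A
# would supply real sense counts; these numbers are purely illustrative.
SENSE_COUNTS = {"radical": 5, "life": 14, "extension": 6}

def word_entropy(word, senses):
    """Entropy in bits, assuming each listed sense is equally likely:
    H = log2(n). More senses means more ambiguity, hence higher entropy."""
    n = senses.get(word.lower(), 1)
    return math.log2(n)

def phrase_entropy(phrase, senses):
    """Per-word entropy scores for a whitespace-separated phrase."""
    return {w: word_entropy(w, senses) for w in phrase.split()}

scores = phrase_entropy("radical life extension", SENSE_COUNTS)
# "life" scores highest here, matching the intuition in the 2010 post.
```

The article-level processor of step 1C could average these per-word scores over the stripped article text to get a single entropy figure per document.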
2. There have been two (http://yawnbox.com/?p=736) occasions (http://yawnbox.com/?p=) where I’ve tried to put some of my ideas into practice. Specifically, I’ve rationalized some of Dr. Luciano Floridi’s work concerning data classification. I think the work is adaptable not only to information (he touches on that) but also modifiable to serve as an aspect of natural language processing.
2A- Get some article with common primary information (like in http://yawnbox.com/?p=837).
2B- Use the tool created in part 1C to organize the information in the desired format.
2C- Make the articles presentable for a controlled study. Issue the raw articles to a “control” group and the formatted articles to a second group. Ask questions like these, for example:
[x] Specific article content are presumed factual and informative
[-] Specific article content are presumed truthful and informative
[-] Specific article content are presumed untruthful and disinformative
[-] Specific article content are presumed nonfactual and disinformative
2D- Compare and contrast how the information was processed: the articles as they are “normally” perceived versus the same content color-coded to denote probabilities of entropy, not-information, etc.
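One way the color-coded articles of 2B–2D might be produced is by wrapping each word in an HTML span keyed to its entropy band. Everything here, the function name, the thresholds, and the CSS class names, is a hypothetical sketch of the formatting step, not a settled design:

```python
def colorize(words_with_scores):
    """Wrap each (word, entropy_score) pair in an HTML span whose CSS
    class encodes its entropy band, for the formatted study articles."""
    def band(score):
        # Thresholds (in bits) are arbitrary placeholders for this sketch.
        if score >= 3.0:
            return "high-entropy"
        if score >= 1.5:
            return "mid-entropy"
        return "low-entropy"
    return " ".join(
        f'<span class="{band(s)}">{w}</span>' for w, s in words_with_scores
    )

html = colorize([("radical", 2.3), ("life", 3.8), ("extension", 2.6)])
```

A stylesheet mapping the three classes to colors would then let study participants see at a glance which words the tool flags as high-entropy.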
Other goals not described in this post include:
- Creating software applications to generate Wikipedia or news articles (https://www.nytimes.com/2011/09/11/business/computer-generated-articles-are-gaining-traction.html?_r=1&pagewanted=all)
- Gathering primary-, meta-, operational-, and derivative-information to support knowledge development in online communities (http://www.rand.org/news/press/2011/06/14.html)
- Developing frameworks for understanding the systemic effects of networked information, as information versus as knowledge (http://www.sciencedaily.com/releases/2010/09/100923142448.htm)
- Developing frameworks for understanding the effects of high-quality information, misinformation, and disinformation in politics and governance (https://www.nber.org/papers/w17395)
- Developing frameworks for understanding “bias” in news media (http://www.nature.com/news/beware-the-creeping-cracks-of-bias-1.10600)
- Et cetera
Thesis objective: In order to understand the nature of misinformation and disinformation, I need to understand the nature of information using Dr. Floridi’s information classification scheme. I hope to accomplish this while developing several software applications and theoretical frameworks along the way.
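As a concrete illustration of how a Floridi-style classification might be applied to article text, here is a small sketch that tags article segments with the categories mentioned above (primary, meta, operational, derivative). The category set is adapted from the post’s reading of Floridi; the example segments and their labels are my own illustrative assumptions:

```python
from dataclasses import dataclass
from enum import Enum

class InfoClass(Enum):
    """Categories adapted from Floridi's data classification as used in
    this proposal; applying them to news text is an assumption here."""
    PRIMARY = "primary"          # the facts the article reports
    META = "meta"                # data about the data (byline, dateline)
    OPERATIONAL = "operational"  # data about how the article was produced
    DERIVATIVE = "derivative"    # inferences drawn from other data

@dataclass
class Segment:
    text: str
    label: InfoClass

# A hypothetical article broken into labeled segments.
article = [
    Segment("Published 2011-09-11 by a staff writer.", InfoClass.META),
    Segment("The senate passed the bill 62-38.", InfoClass.PRIMARY),
    Segment("This suggests the president will sign it.", InfoClass.DERIVATIVE),
]

# The tool of step 1C could then extract, say, only the primary information.
primary = [s.text for s in article if s.label is InfoClass.PRIMARY]
```

Labeling segments this way is what would let the study in part 2 separate presumed-factual primary content from derivative inference when participants rate the articles.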
Using the applied notion of the grounded theory method (wikipedia.org), my overarching objective will be twofold:
1. Contribute to the Wikipedia project with F/OSS (wikipedia.org) tools that I’ll develop, along with education to support the use of those tools; and
2. develop a for-profit start-up geared toward processing web and user-generated content. Think Google News on steroids.
And, of course, earn a doctoral degree so I can guest lecture in my spare time.