cat brain.log | less

Getting it down on `paper`

The Human Brain as a Parallel Dynamic Problem Solver

In solving the “semantic web” problem, I like to start from scratch with just our alphabet: sequences of characters in {0,1}. For a computer to think, it must first understand nouns and verbs. To understand a noun, it must be able to identify letters. To identify letters, it must be able to comprehend symbols.

The bit is the symbol. A sequence of bits is a character in the seemingly infinite alphabet. A sequence of characters makes a word, and meaning goes from there. The most fundamental action of a human is to test hypotheses with tri-state output (true, false, undetermined).

Input: stove-top burner is on.
Hypothesis: burner is hot.
Output: true, true, true, true, true, false, true, true.

Input: I fell asleep.
Hypothesis: When I wake up, the sun will be out.
Output: true, true, false, true, false, false.

Input: I jump through a worm-hole.
Hypothesis: I will travel through space and time.
Output: Undetermined.

The conflicting outputs lead us to question our hypothesis. How often are we right and wrong? We assign probabilities and weights to certain outcomes. The probability is the rate at which you were right. The weight is the number of times you tested your hypothesis.
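As a minimal sketch, the probability and weight for the three examples above could be computed like this. The function name `summarize` and the choice to skip undetermined results (represented as `None`) rather than count them are my assumptions for illustration.

```python
from typing import Optional, Sequence, Tuple


def summarize(outcomes: Sequence[Optional[bool]]) -> Tuple[Optional[float], int]:
    """Return (probability, weight) for one hypothesis.

    True/False are test results; None stands for "undetermined".
    Assumption: undetermined results are skipped, not counted.
    """
    decided = [o for o in outcomes if o is not None]
    weight = len(decided)  # number of times the hypothesis was tested
    if weight == 0:
        return None, 0  # nothing known yet
    probability = sum(decided) / weight  # rate at which we were right
    return probability, weight


burner = [True, True, True, True, True, False, True, True]
sunrise = [True, True, False, True, False, False]
wormhole = [None]

print(summarize(burner))    # (0.875, 8)
print(summarize(sunrise))   # (0.5, 6)
print(summarize(wormhole))  # (None, 0)
```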

Solution: Allow the input to be anything machine-readable (any sequence of bits), together with our catalog of previous hypotheses and their results. The output will be the result of a hyper-massive, parallel, asynchronous dynamic program over an infinite-dimensional matrix, where all previous hypotheses are tested for all possible outcomes, and the answer is the output with the highest score.

For a human, time is finite, and our computationally sensitive minds need to prioritize tests. For us, the first test is the categorization of input. One attempts to identify the most probable set of hypotheses to test next. We run through these tests in parallel. When the result appears to stabilize, we reduce our emphasis on executing that test (we nice the process, so to speak) and begin processing the next most probable hypothesis. Not all tests will run to completion; most will stabilize at some improbable result or as a clear outlier. The human mind needs to make decisions after various amounts of time. The longer that time, the more thorough the testing of hypotheses and inputs. We must, however, insist on testing both the positive and negative hypotheses equally, to aid in identifying mis-categorizations. A wise strategy would be to test both the most and least probable outcomes. If there’s surprise, we should consider reclassifying our input, reducing the process priority of our current categorical input, and starting with the next most probable (again, testing the least probable in parallel with the most probable).
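The "nice the process" idea could be sketched with a priority queue. This covers only the deprioritization step, run sequentially rather than in parallel. The class `Hypothesis`, the function `run`, and the parameters `epsilon` and `nice_factor` are hypothetical names, and halving the priority when an estimate stops moving is my assumption, not a rule from the text.

```python
import heapq
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Hypothesis:
    name: str
    test: Callable[[], bool]  # runs one trial and reports the outcome
    priority: float = 1.0
    successes: int = 0
    trials: int = 0

    @property
    def estimate(self) -> float:
        # Untested hypotheses start at a neutral 0.5.
        return self.successes / self.trials if self.trials else 0.5


def run(hyps: List[Hypothesis], budget: int,
        epsilon: float = 0.05, nice_factor: float = 0.5) -> Dict[str, int]:
    """Spend `budget` trials, always testing the highest-priority hypothesis."""
    # heapq is a min-heap, so priorities are negated; the index breaks ties.
    heap = [(-h.priority, i, h) for i, h in enumerate(hyps)]
    heapq.heapify(heap)
    for _ in range(budget):
        _, i, h = heapq.heappop(heap)
        before = h.estimate
        h.trials += 1
        h.successes += h.test()
        if abs(h.estimate - before) < epsilon:
            h.priority *= nice_factor  # result looks stable: "nice" it
        heapq.heappush(heap, (-h.priority, i, h))
    return {h.name: h.trials for h in hyps}


a = Hypothesis("burner is hot", lambda: True)
b = Hypothesis("burner is cold", lambda: False)
print(run([a, b], budget=10))
```

Because a stabilized hypothesis drops in priority, the remaining budget flows to whichever hypothesis is still uncertain.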

The notion of a score is essentially the probability of the output being correct. When the effect of an input on a hypothesis is unknown, a neutral weight should be applied. That is: the dynamic score should not increase or decrease from the acceptance or avoidance of the branch. Upon selection, the branch will receive a weighting and probability that it didn’t previously have.

Since the world is constantly changing, there should be emphasis added to recent results. This could be considered a hypothesis branch as well, and should be tested as one among all the others. The database will consist of weights and probabilities for every possible permutation of bits. The size of this database is infinite, roughly calculated as 2^<number of bits>^<number of hypotheses>. The number of bits will easily reach 2^64. The probability is a float in the open interval (0, 1); note that 0 and 1 are not reachable. The weight is an unbounded non-negative integer, which increases every time the branch is tested.
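One way to keep the probability strictly inside (0, 1) while emphasizing recent results is sketched below. Both techniques are my substitutions, not something the text specifies: Laplace's rule of succession, (successes + 1) / (total + 2), for the estimate, and exponentially decayed counts for recency. The class name `Branch` and the `decay` parameter are hypothetical.

```python
class Branch:
    """Probability and weight for one hypothesis branch."""

    def __init__(self, decay: float = 1.0) -> None:
        self.decay = decay      # 1.0 = no recency emphasis; < 1.0 favors recent results
        self.successes = 0.0    # decayed count of true results
        self.total = 0.0        # decayed count of all results
        self.weight = 0         # raw number of tests, never decayed

    def record(self, outcome: bool) -> None:
        # Older results are scaled down before the new one is added.
        self.successes = self.successes * self.decay + (1.0 if outcome else 0.0)
        self.total = self.total * self.decay + 1.0
        self.weight += 1

    @property
    def probability(self) -> float:
        # Laplace smoothing (an assumption): never exactly 0 or 1,
        # and a neutral 0.5 for an untested branch.
        return (self.successes + 1.0) / (self.total + 2.0)


plain = Branch()
recent = Branch(decay=0.5)
for outcome in [False, False, True, True]:
    plain.record(outcome)
    recent.record(outcome)

print(plain.probability, plain.weight)    # 0.5 4
print(recent.probability > plain.probability)  # True: the recent successes count more
```

An untested branch reports 0.5 with weight 0, which matches the neutral starting point described earlier.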

Due to physical constraints, the databases should be specialized for their inputs where possible. That is, there could be a centralized object classifier, and infinite silos of specialized data for each classification of input. Given: a JPEG image. Our first classifier would be image, the second JPEG. We consult our highly specialized JPEG database (Flickr) to analyze and classify the JPEG. It should be noted that the classification of our input is only one of multiple classification processes. Further, the classification of the input as a JPEG image is not 100% guaranteed. We may need to test alternative classification silos. It’s one thing to know that the input is an image. The next question becomes, of what?

The specialized silos must also have their own classification systems. The silos must also address the unaddressable, or unexpected. An image is an image of something, but of what: noise, computer output, a drawing, a picture of a picture, a picture of a page of text, an empty image, an altered image? The silos should know how to transform and bubble up. Say the image silo thinks that the JPEG image is of a page of text. It will, in parallel, extract what it interprets as text, and perform analysis on it as if it were a picture. It might alternately pass the data off to an OCR-specialized database, one which can handle all forms of characters and encodings.
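The very first classification step (is this input even a JPEG?) could be sketched as a function that returns several candidate silos instead of one answer. The leading bytes checked are the standard JPEG and PNG file signatures. The scores are illustrative placeholders rather than measured probabilities, and `classify` is a hypothetical name.

```python
from typing import List, Tuple

JPEG_MAGIC = b"\xff\xd8\xff"        # JPEG files begin with these bytes
PNG_MAGIC = b"\x89PNG\r\n\x1a\n"    # PNG files begin with these bytes


def classify(data: bytes) -> List[Tuple[str, float]]:
    """Return candidate classifications, most probable first.

    Scores are made-up placeholders (an assumption for illustration).
    """
    candidates: List[Tuple[str, float]] = []
    if data.startswith(JPEG_MAGIC):
        candidates.append(("image/jpeg", 0.9))
    if data.startswith(PNG_MAGIC):
        candidates.append(("image/png", 0.9))
    try:
        data.decode("utf-8")
        candidates.append(("text", 0.6))
    except UnicodeDecodeError:
        pass  # not valid UTF-8, so the text silo is not a candidate
    # Always keep a fallback silo: no classification is 100% guaranteed.
    candidates.append(("unknown", 0.1))
    return sorted(candidates, key=lambda c: c[1], reverse=True)


print(classify(b"\xff\xd8\xff\xe0" + b"\x00" * 16))
print(classify(b"hello world"))
```

Each candidate would then be handed to its own silo, which runs its own, finer classification (photo, drawing, page of text, and so on).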

The solution depends on context. For the human, it’s the decision that has to be made. For the machine, the context is the set of most recent activities it’s performed, or “objects” it’s come “in contact” with. The “objects” are the result of categorization, and the “in contact” is the set of bits it’s seen recently. The context is another hypothesis to test. The machine trying to solve the picture puzzle will consult its history of categorizations to determine a probability that the image was a photo or text. Remember, when nothing is known a priori, no weight will be applied and both contexts tested as equal.

The data silos should be able to perform group-think and user-specific historic testing. The result is N! tests, where N is the number of users of the system. The idea is that one or more users may have similar data histories. These histories could help or hurt the solution. The machine will almost certainly know that a particular user only takes pictures of bridges, except for that one time when his wife submitted a picture of their kids. Are there other people who only take pictures of bridges? If so, we should assign their data set a higher weight and an altered probability. This is the process of learning. When the machine isn’t answering a problem, it should be reviewing the facts, looking for patterns, and identifying trends. Each of these sub-processing tasks will alter the weight and probability of a branch. It will also introduce new hypotheses to be tested on the next round.
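Finding users with similar histories could be sketched with a simple set-overlap measure. Jaccard similarity is my choice here, not something the text prescribes, and the user names and categories are invented for illustration. This sketch compares pairs of users only.

```python
from itertools import combinations
from typing import Dict, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Share of categories two users have in common (0.0 to 1.0)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Hypothetical categorization histories per user.
histories: Dict[str, Set[str]] = {
    "alice": {"bridge"},
    "bob": {"bridge", "kids"},
    "carol": {"cat", "food"},
}

similarity: Dict[Tuple[str, str], float] = {
    (u, v): jaccard(histories[u], histories[v])
    for u, v in combinations(sorted(histories), 2)
}

for pair, score in similarity.items():
    print(pair, score)
# ('alice', 'bob') 0.5
# ('alice', 'carol') 0.0
# ('bob', 'carol') 0.0
```

A high score would be the cue to give the other user's data set more weight when testing hypotheses about this user's input.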

Finally, we have the decision. We read the output with the highest score from our dynamic program once the maximum allotted decision time has elapsed. We can read the output at any time, but we want the best result, and it’s entirely possible that the low-probability result is the actual result. It could take some time for that hypothesis to be tested with limited infrastructure. The output is the result of our input and our question (the original hypothesis).
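Reading the best output once the allotted time is up could look like the sketch below. The function `decide` and its parameters are hypothetical, and the stream of (output, score) pairs stands in for whatever the hypothesis tests produce.

```python
import time
from typing import Iterable, Optional, Tuple


def decide(scored_outputs: Iterable[Tuple[str, float]],
           time_limit_s: float) -> Optional[str]:
    """Return the highest-scoring output seen before the deadline."""
    deadline = time.monotonic() + time_limit_s
    best_output: Optional[str] = None
    best_score = float("-inf")
    for output, score in scored_outputs:
        if score > best_score:
            best_output, best_score = output, score
        if time.monotonic() >= deadline:
            break  # out of time: answer with the best result so far
    return best_output


results = [("photo", 0.2), ("page of text", 0.9), ("noise", 0.5)]
print(decide(results, time_limit_s=1.0))  # page of text
print(decide(results, time_limit_s=0.0))  # photo
```

With no time allowed, only the first result is read, which shows how deciding early can miss a better answer that had not been tested yet.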

What do I know? Wow! Nothing.

Serendipity! Not four hours after writing this, I found this link: Efficient Learning, Large-scale Inference, and Optimisation Toolkit (elefant). It’s like the computer knows what I’m thinking.

Just two days later, I come across a professor’s blog outlining an essential course encompassing machine learning methods. Data Mining Course Material.


