Yellow Light Database
Here’s an idea for an Android/iPhone App, or some other in-dash car after-market add-on.
There is a camera attached to your car facing forward. It has computer vision to detect stop-lights. There is a GPS device on-board to geo-tag these stop-lights. Associated with the stop-light location is its yellow-light duration. Ideally, whenever possible, the app would synchronize with other apps, sharing location and duration information as well. Using this information and GPS data, an app would be able to calculate how long it will take the driver to reach the light. If the light turns yellow, the driver will know whether he/she must stop or may continue through the yellow without running a red light.
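The stop-or-go decision can be sketched in a few lines. This is only a rough illustration, assuming the car holds a constant speed and that the app already knows how much yellow time remains; the function name and parameters are made up for the example.

```python
def can_clear_light(distance_m, speed_mps, yellow_remaining_s):
    """Return True if the car reaches the light before the yellow ends.

    Assumes constant speed (a simplification). All names are hypothetical.
    """
    if speed_mps <= 0:
        # A stopped car never reaches the light.
        return False
    time_to_light_s = distance_m / speed_mps
    return time_to_light_s <= yellow_remaining_s


# 40 m away at 15 m/s with 3 s of yellow left: about 2.67 s to the light.
print(can_clear_light(40.0, 15.0, 3.0))
```

A real app would also need a safety margin for GPS error and reaction time, which this sketch ignores.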
An added feature would be a local database of start/stop acceleration data. This data could be used to calculate how much faster or slower the driver could travel and still make the light before it turns red.
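One way to frame the "how much faster or slower" question is to solve the constant-acceleration equation d = v*t + a*t^2/2 for a. The sketch below assumes constant acceleration over the remaining time; the function name is hypothetical, and the result would still have to be compared against what the car's recorded acceleration history says is achievable.

```python
def required_acceleration(distance_m, speed_mps, time_left_s):
    """Constant acceleration (m/s^2) needed to cover distance_m in time_left_s.

    Derived from d = v*t + 0.5*a*t^2. A negative result means the driver
    could slow down and still arrive in time. Hypothetical helper.
    """
    if time_left_s <= 0:
        raise ValueError("time_left_s must be positive")
    return 2.0 * (distance_m - speed_mps * time_left_s) / time_left_s ** 2


# 50 m to go at 15 m/s with 3 s left: the car is 5 m short at constant speed.
print(required_acceleration(50.0, 15.0, 3.0))
```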
Finally, a moisture sensor, or some other traction sensor, would be needed to determine road conditions and stopping distance. Correlation between historic acceleration data and moisture could be used to statistically compute stopping distance.
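Until there is enough historic data for a statistical model, a textbook physics estimate could stand in for it. The sketch below uses the standard formula (reaction distance plus v^2 / (2*mu*g)) rather than the correlation approach described above; the friction coefficients are illustrative assumptions, not measured values.

```python
G = 9.81  # gravitational acceleration, m/s^2


def stopping_distance(speed_mps, friction_coeff, reaction_time_s=1.0):
    """Estimate stopping distance in metres on a level road.

    reaction distance = v * t_reaction
    braking distance  = v^2 / (2 * mu * g)
    Hypothetical helper; friction_coeff would come from the traction sensor.
    """
    if friction_coeff <= 0:
        raise ValueError("friction_coeff must be positive")
    reaction = speed_mps * reaction_time_s
    braking = speed_mps ** 2 / (2.0 * friction_coeff * G)
    return reaction + braking


# Assumed coefficients for illustration only: 0.7 for dry, 0.4 for wet.
dry = stopping_distance(20.0, 0.7)
wet = stopping_distance(20.0, 0.4)
print(dry, wet)
```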
Future modifications might include a car detection system indicating how much space is needed between cars to stop in the event of a sudden incident.
Synchronization of traffic speed data is also achievable using this same architecture, thereby providing an alternative to in-ground traffic sensors.
Disclaimer: I know of at least one company using cell phone GPS data to measure traffic speed.
Edit: Waze – Real-time maps and traffic information based on the wisdom of the crowd
Edit: INRIX
1 Week of Python
Aaaahghhhh! I’m frustrated. Why are all the Python books on Amazon rated as nearly 5 stars when, after reading or skimming them, they don’t deserve more than 3? None of the books I’ve encountered are even half-decent for a professional programmer. They gloss over the implementation details and get into niche topics with poor programming form and typographic format.
[Humble edit] To the authors of these books (or any book for that matter), I realize there is a ton of effort that goes into writing any book, especially those of a technical nature, and I sincerely appreciate your efforts. However, if you are the author of a book that simply copies the Python documentation and adds a few half-baked examples, then you are who I’m talking to. I’m also talking to the authors of so-called “advanced” Python books who have very poor programming style.
Ok, end rant 1.
What’s with the syntax? Everything looks like a GOTO: statement. Rule #1: No GOTO. It’s ugly and feels dangerous. Let’s not forget to mention the (one_element_tuple,) language artifact. I get it, but it’s ugly. /rant2
NumPy: Excellent and essential tool for data arrays. The strength of interpreted languages is rapid development. NumPy saves Python from the greatest weakness of interpreted languages: numeric operations on large arrays. This is the only reason to use Python IMO.
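To make the point concrete, here is a small comparison of a pure-Python loop against the equivalent NumPy call. It is only a sketch: the array size is arbitrary, and the actual speed difference depends on the machine, so no timings are claimed here.

```python
import numpy as np

# An arbitrary array of integers for illustration.
values = np.arange(1000, dtype=np.int64)

# Pure-Python loop: every multiply and add goes through the interpreter.
loop_total = 0
for v in values.tolist():
    loop_total += v * v

# NumPy: the same sum of squares runs in compiled code in one call.
numpy_total = int(np.dot(values, values))

print(loop_total == numpy_total)
```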
SciPy: I don’t know where this fits in yet, but I’m sure they’ve created highly specialized modules for various scientific fields, just like MATLAB has created … what’s the word… toolboxes? for various industrial sectors.
Python 2 or 3? 2. Until NumPy supports v3, I wouldn’t even think about it. Python is just too slow on its own… for dealing with arrays of numbers, that is.
EDIT: Yay! Python 3 support
Please, if there is one good Python book for programmers that involves numeric processing of millions to billions of data points, please speak up.
I wish to retract my unbridled like for the language. Please replace that with an appreciation for NumPy and interpreted languages.
On “Dreamlining” (a la Timothy Ferriss)
Definition by citation: If I had 100 million dollars in the bank right now what would I want to do, be or have over the next 12 months?
Timothy Ferriss
Why? You don’t want to be rich and bored. You want rich and happy. Boredom is the antonym of happiness. If there were no hold-backs, what would you want to do, have, or learn in the next 3 months, 6 months or 1 year? If you don’t know what you want to do, determine what you want to be. We can rewrite “be” as actionable steps towards doing.
You must name your enemy before you can defend against it. You must define your future before you can take action. Only after this identification process can you move towards your goals. If you so much as complete one dream, that’s a dream that wouldn’t have come true without taking action. Do it. It’s doable. Think Nike®.
The Human Brain as a Parallel Dynamic Problem Solver
In solving the “semantic web” problem, I like to start from scratch with just our alphabet: sequences of characters in {0,1}. To teach a computer to think, it must first understand nouns and verbs. To understand a noun, we must be able to identify letters. To identify letters, we must be able to comprehend symbols.
The bit is the symbol. A sequence of bits is a character in the seemingly infinite alphabet. A sequence of characters makes a word, and meaning goes from there. The most fundamental action of a human is to test hypotheses with tri-state output (true,false,undetermined).
Input: stove-top burner is on.
Hypothesis: burner is hot.
Output: true, true, true, true, true, false, true, true.
Input: I fell asleep.
Hypothesis: When I wake up, the sun will be out.
Output: true, true, false, true, false, false.
Input: I jump through a worm-hole.
Hypothesis: I will travel through space and time.
Output: Undetermined.
The conflicting outputs lead us to question our hypothesis. How often are we right and wrong? We assign probabilities and weights to certain outcomes. The probability is the rate at which you were right. The weight is the number of times you tested your hypothesis.
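The bookkeeping for the three examples above could look something like this. It is a minimal sketch: the class and method names are invented, and leaving undetermined results (None) out of both the probability and the weight is my assumption.

```python
from dataclasses import dataclass


@dataclass
class HypothesisRecord:
    """Tracks how often a hypothesis held (hypothetical structure)."""
    successes: int = 0
    weight: int = 0  # number of times the hypothesis was actually tested

    def observe(self, outcome):
        """Record one outcome: True, False, or None for undetermined."""
        if outcome is None:
            return  # assumption: undetermined results are not counted
        self.weight += 1
        if outcome:
            self.successes += 1

    @property
    def probability(self):
        """Rate of being right, or None if the hypothesis was never tested."""
        if self.weight == 0:
            return None
        return self.successes / self.weight


burner = HypothesisRecord()
for result in [True, True, True, True, True, False, True, True]:
    burner.observe(result)
print(burner.probability, burner.weight)
```

The burner hypothesis ends up right 7 times out of 8 tests, while the worm-hole hypothesis would stay at weight 0 with no probability at all.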
Solution: Allow the input to be anything machine readable (any sequence of bits), together with our catalog of previous hypotheses and their results. The output will be the result of a hyper-massive, parallel, asynchronous, dynamic programming infinite-dimension matrix, where all previous hypotheses are tested for all possible outcomes, and the answer is the output with the highest score.
For a human, time is finite, and our computationally constrained minds need to prioritize tests. For us, the first test is the categorization of input. One attempts to identify the most probable set of hypotheses to test next. We run through these tests in parallel. When the result appears to stabilize, we reduce our emphasis on executing that test (we nice the process, so to speak) and begin processing the next most probable hypothesis. Not all tests will run to completion; most will stabilize at some improbable result or as a clear outlier. The human mind needs to make decisions after various amounts of time. The longer that time, the more thorough the testing of hypotheses and inputs. We must, however, insist on testing both the positive and negative hypotheses equally, to aid in identifying mis-categorizations. A wise strategy would be to test both the most and least probable outcomes. If there’s a surprise, we should consider reclassifying our input, reducing the process priority of our current categorical input, and starting with the next most probable (again, testing the least probable in parallel with the most probable).
The notion of a score is essentially the probability of the output being correct. When the effect of an input on a hypothesis is unknown, a neutral weight should be applied. That is: the dynamic score should not increase or decrease from the acceptance or avoidance of the branch. Upon selection, the branch will receive a weighting and probability that it didn’t previously have.
Since the world is constantly changing, there should be emphasis added to recent results. This could be considered a hypothesis branch as well, and should be tested as one among all the others. The database will consist of weights and probabilities for every possible permutation of bits. The size of this database is effectively infinite: roughly 2^<number of bits>^<number of hypotheses>, and the number of bits will easily reach 2^64. The probability is a float in the open interval (0, 1); note that 0 and 1 themselves are not reachable. The weight is an unbounded non-negative integer, which increases every time the branch is tested.
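One simple way to emphasize recent results is an exponential moving average. That technique is my choice for illustration, not something the idea above depends on; the function name, the neutral starting value of 0.5, and the smoothing factor alpha are all assumptions. Mathematically, with 0 < alpha < 1 and a start inside (0, 1), the estimate never reaches exactly 0 or 1, though floating-point rounding could eventually get there after a very long streak.

```python
def update_probability(current, outcome, alpha=0.5):
    """Blend the newest outcome into the running probability.

    Larger alpha gives more emphasis to recent results.
    Hypothetical helper; alpha must be strictly between 0 and 1.
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must be in (0, 1)")
    target = 1.0 if outcome else 0.0
    return (1.0 - alpha) * current + alpha * target


# Same two outcomes in a different order give different estimates.
p = 0.5  # neutral starting point when nothing is known
for outcome in [False, True]:
    p = update_probability(p, outcome)
print(p)
```

Ending on a success leaves the estimate above 0.5, while the same outcomes in the opposite order would leave it below.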
Due to physical constraints, the databases should be specialized for their inputs where possible. That is, there could be a centralized object classifier, and infinite silos of specialized data for each classification of input. Given: Jpeg image. Our first classifier would be image, second jpeg. We consult our highly specialized jpeg database (Flickr) to analyze and classify the jpeg. It should be noted that the classification of our input is only one of multiple classification processes. Further, knowing that the input is a jpeg image is not 100% guaranteed. We may need to test alternative classification silos. It’s one thing to know that the input is an image. The next question becomes, of what?
The specialized silos must also have their own classification systems. The silos must also address the unaddressable, or unexpected. An image is an image of something, but of what: noise, computer output, a drawing, a picture of a picture, a picture of a page of text, an empty image, an altered image? The silos should know how to transform and bubble up. Say the image silo thinks that the jpeg image is of a page of text. It will, in parallel, extract what it interprets as text, and perform analysis on it as if it were a picture. It might alternately pass the data off to an OCR-specialized database, one which can handle all forms of characters and encodings.
The solution depends on context. For the human, it’s the decision that has to be made. For the machine, the context is the set of most recent activities it’s performed, or “objects” it’s come “in contact” with. The “objects” are the result of categorization, and the “in contact” is the set of bits it’s seen recently. The context is another hypothesis to test. The machine trying to solve the picture puzzle will consult its history of categorizations to determine a probability that the image was a photo or text. Remember, when nothing is known a priori, no weight will be applied and both contexts will be tested as equal.
The data silos should be able to perform group-think and user-specific historic testing. The result is N! tests, where N is the number of users of the system. The idea is that one or more users may have similar data histories. These histories could help or hurt the solution. The machine will almost certainly know that a particular user only takes pictures of bridges, except for that one time when his wife submitted a picture of their kids. Are there other people who only take pictures of bridges? If so, we should assign their data set a higher weight and an altered probability. This is the process of learning. When the machine isn’t answering a problem, it should be reviewing the facts, looking for patterns, and identifying trends. Each of these sub-processing tasks will alter the weight and probability of a branch. It will also introduce new hypotheses to be tested on the next round.
Finally, we have the decision. We read the output with the highest score from our dynamic program once the maximum allotted decision time has elapsed. We can read the output at any time, but we want the best result, and it’s entirely possible that the low-probability result is the actual result. It could take some time for that hypothesis to be tested with limited infrastructure. The output is the result of our input and our question (the original hypothesis).
What do I know? Wow! Nothing.
Serendipity! Not 4 hours have passed since writing this and I find this link: Efficient Learning, Large-scale Inference, and Optimisation Toolkit (elefant). It’s like the computer knows what I’m thinking.
Just two days later, I come across a professor’s blog outlining an essential course encompassing machine learning methods. Data Mining Course Material.
Form Follows Function
What do you picture when you hear the phrase “form follows function”? Do you envision an architectural structure? An artist’s painting? A toothbrush?
Clearly, we mean that form is the after-thought and function is the primary objective. Or do we mean that the form of an object is driven by its function?
When looking for ideas, switch between the two mantras, “form follows function” and “form vs. function.” Does your product change? Try to answer the question: Does the product have to change?
WordPress is the function, the theme is its form. The form can sometimes be a separate product entirely.