Why Your Search Engine Resisted The Best Technology For So Long
These questions originally appeared on Quora – the knowledge sharing network where compelling questions are answered by people with unique insights.
Answers by Thorsten Joachims, Professor at Cornell University working on Machine Learning from Human Behavior, on Quora.
Q: What are the most interesting applications of ML to education today?
A: Higher education has seen an interesting split in recent years. Alongside the traditional model of education, which involves a human teacher and large costs, online education has emerged as a counterpoint that is orders of magnitude cheaper. On the one hand, it is so much cheaper that it can reach whole new audiences for learning (e.g., developing countries, lifelong learning) for which the traditional model is currently out of reach. On the other hand, online education also lacks many of the qualities that traditional education has, and this is where machine learning, mechanism design, gamification, and other artificial intelligence techniques will have the biggest impact. Let me mention a few examples.
The data that students produce while learning helps teachers debug and further improve online course material. Which paths do students take through the material? How do they fare on assessments? How does this affect their overall learning outcomes? In online systems, all this data is available to data-mine for where certain types of students struggle, and how to guide them on a more personalized basis. For example, we have been working with data from online courses of Knewton Inc. on learning embedding representations of students and content similar to what recommender systems do for movies.
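As a rough illustration of the embedding idea (not the method actually used with the Knewton data, which the answer does not describe), the sketch below learns a small vector for each student and each content item so that their dot product predicts whether the student answers the item correctly. This mirrors how recommender systems factorize user-movie ratings. The toy data, the function names, and all hyperparameters are assumptions made for the example.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def train_embeddings(observations, n_students, n_items, dim=2,
                     lr=0.1, reg=0.01, epochs=200, seed=0):
    """Fit student and item vectors so sigmoid(dot) approximates P(correct).

    observations: list of (student_index, item_index, correct) with correct in {0, 1}.
    Hypothetical helper for illustration; trained by plain stochastic gradient steps.
    """
    rng = random.Random(seed)
    students = [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(n_students)]
    items = [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(n_items)]
    for _ in range(epochs):
        for s, i, correct in observations:
            u, v = students[s], items[i]
            err = correct - sigmoid(dot(u, v))  # gradient of the log-likelihood
            for k in range(dim):
                uk, vk = u[k], v[k]
                u[k] += lr * (err * vk - reg * uk)
                v[k] += lr * (err * uk - reg * vk)
    return students, items

def mean_log_loss(observations, students, items):
    total = 0.0
    for s, i, correct in observations:
        p = sigmoid(dot(students[s], items[i]))
        total -= math.log(p if correct else 1.0 - p)
    return total / len(observations)

# Toy data (invented): students 0-1 get items 0-1 right and item 2 wrong;
# students 2-3 show the opposite pattern.
observations = [(s, i, 1 if (s < 2) == (i < 2) else 0)
                for s in range(4) for i in range(3)]

students, items = train_embeddings(observations, n_students=4, n_items=3)
print("mean log loss after training:", mean_log_loss(observations, students, items))
```

After training, students with similar response patterns end up with similar vectors, which is the property one would exploit to find where certain types of students struggle.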
Machine learning can also play an important role in addressing one of the key problems in online learning: keeping the student engaged and motivated. In a campus setting, going to lectures and doing your homework is the path of least resistance from a social perspective: everybody around you is doing it and you are just following the crowd. In an online setting, maintaining motivation is much harder. Creating incentive systems that keep students engaged requires creative new ideas, ranging from breaking long-term rewards into short-term rewards (e.g., badges, gamification) to creating a social structure online (e.g., matchmaking). All of these have interesting ML components in making them work well.
In addition to the goal of building better courses, machine learning from the data of online learning systems also has the potential to provide us with new insights into how humans learn. This will be a more macroscopic perspective on learning than controlled lab experiments can provide, opening up a whole new range of research possibilities for studying human learning. In this way, education is in a position similar to that of social science 10 years ago, when online social networks allowed us to ask new types of research questions.
Finally, I don’t think the impact of AI and ML will be limited to online learning. In the traditional university setting, we will see new opportunities as well. For example, when it comes to reviewing the research-project reports in the machine learning classes I teach, I am using peer-reviewing methods (and Geoff Webb and I used the same methods when we were program chairs for the KDD 2015 conference). Peer reviewing, where students comment on each other’s work, has many educational advantages (e.g., students get to see each other’s work and learn to articulate constructive feedback), and machine learning algorithms that aggregate all this data provide a way to spot disagreement and focus the human teacher’s attention.
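To make the idea of spotting disagreement concrete, here is a minimal sketch that aggregates peer-review scores per report and flags the reports whose reviewers disagree most. It is only a simple stand-in (mean and standard deviation), not the aggregation method used in the courses or at KDD 2015; the report names, scores, and threshold are invented for the example.

```python
from statistics import mean, pstdev

def flag_disagreements(reviews, threshold=1.5):
    """Summarize peer scores and flag reports with a large score spread.

    reviews: dict mapping a report name to the list of scores it received.
    Returns (summary, flagged), where summary maps each report to
    (mean score, population standard deviation) and flagged lists the
    reports whose spread exceeds the threshold, most contested first.
    """
    summary = {name: (mean(scores), pstdev(scores))
               for name, scores in reviews.items()}
    flagged = sorted(
        (name for name, (_, spread) in summary.items() if spread > threshold),
        key=lambda name: -summary[name][1],
    )
    return summary, flagged

# Hypothetical peer scores on a 1-10 scale.
reviews = {
    "report_a": [8, 8, 7],
    "report_b": [3, 9, 6],
    "report_c": [5, 5, 5],
}

summary, flagged = flag_disagreements(reviews)
print(flagged)  # reports the teacher may want to read personally
```

A teacher could then spend their own reviewing time on the flagged reports and rely on the aggregated peer scores for the rest.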
A: There are many ways. Most computer science majors will offer you a sequence of machine learning courses you can take. We at Cornell offer 3 undergraduate ML courses and many graduate ML courses, and you can augment these with relevant courses from other fields like Statistics, Information Science, and Operations Research. The same applies to the master's program, where a good undergraduate education in CS would put you in a position to take these courses.
But beyond this traditional path, there are many less traditional paths that one can take through MOOCs and self-study. While somewhat outdated, Tom Mitchell’s book is still one of the best textbooks for developing a first intuitive understanding of the basics of machine learning for people with a computer science background. When doing such self-study, my advice is to take small steps and learn by doing. Start from wherever you already have deep knowledge, which may be a particular application problem that you have already worked on. The most rewarding problems for ML are those where prediction accuracy matters the most, which are difficult to hand-code, and where it is easy to get data. Then take small steps by applying existing machine learning methods with well-controllable behavior (e.g., regularized linear models like logistic regression or SVMs). There is easy-to-use software available for these methods, as well as whole packages like scikit-learn or WEKA. While doing this, increase your depth of understanding by learning more about the underlying theory and by branching out.
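For readers following the learn-by-doing advice, the sketch below implements L2-regularized logistic regression from scratch with batch gradient descent, using only the Python standard library. In practice one would more likely call a package such as scikit-learn; this version is written out by hand only to show what such a well-controllable linear model does internally. The toy data and hyperparameters are assumptions chosen for the example.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, l2=0.1, lr=0.5, epochs=500):
    """Fit weights w and bias b by gradient descent on the mean log loss
    plus an L2 penalty (l2 / 2) * ||w||^2. The bias is not regularized."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * d
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        # Average the data gradient, then add the gradient of the L2 penalty.
        w = [wj - lr * (gj / n + l2 * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict(w, b, x):
    """Return the predicted class (0 or 1) for one feature vector."""
    z = b + sum(wj * xj for wj, xj in zip(w, x))
    return 1 if sigmoid(z) >= 0.5 else 0

# Toy one-feature data (invented): small values are class 0, large values class 1.
X = [[0.0], [1.0], [2.0], [3.0]]
y = [0, 0, 1, 1]

w, b = fit_logistic(X, y)
print([predict(w, b, x) for x in X])
```

The `l2` parameter is the knob that makes the model's behavior easy to control: larger values shrink the weights toward zero and make the predictions less confident.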
A: What happens in commercial products is always subject to competitive pressures and the optimization of short-term rewards. While it was still possible to make fast progress in search by hand-tuning systems, this was the easiest way to make progress. But I wouldn’t say that search engines really held out very long against ML. Ranking and search were among the key drivers of machine learning research starting in the late 90s. They demonstrated the commercial impact that machine learning can have, and the data and new machine learning tasks that search engines have created still drive a lot of the machine learning and information retrieval research today.
In fact, if there is a way toward creating agents of broader and broader intelligence, I think that way will come through search engines. Search engines are the key component in a self-reinforcing cycle of research on artificial intelligence. Where else do we have more data about what language means that is grounded in action, paired with huge economic incentives to better understand what users need? Twenty-five years ago, the concept of a computer system understanding any information need you may have and getting you the answer was pure science fiction. Imagine what another twenty-five years will do.