While curious people have long since realized the potential of smart filters to tweak, tune, and prune the information they receive, by and large these tools still go unused. Yahoo Pipes, sadly still relegated to the fringes of the tech crowd, is one example of how useful such tools can be.
With Yahoo Pipes alone, the number of interesting use cases is vast. For instance, with some minor filtering you could feed in an RSS feed of new movies and show only those rated above a certain threshold on a service such as IMDB. Another example is enriching a feed with geographical information, perhaps even plotting it on a map.
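The movie example boils down to a lookup followed by a threshold. The sketch below assumes the feed has already been parsed into simple items and that ratings come from a hypothetical `ratings` dictionary standing in for an external service such as IMDB; it does not call any real rating API.

```python
from dataclasses import dataclass


@dataclass
class FeedItem:
    title: str
    link: str


def filter_by_rating(items, ratings, min_rating):
    """Keep only items whose looked-up rating is at least min_rating.

    `ratings` is a hypothetical stand-in for an external rating service;
    items with no known rating are dropped rather than guessed.
    """
    kept = []
    for item in items:
        rating = ratings.get(item.title)
        if rating is not None and rating >= min_rating:
            kept.append(item)
    return kept


feed = [
    FeedItem("Movie A", "http://example.com/a"),
    FeedItem("Movie B", "http://example.com/b"),
    FeedItem("Movie C", "http://example.com/c"),
]
ratings = {"Movie A": 8.1, "Movie B": 5.4}  # hypothetical ratings

for item in filter_by_rating(feed, ratings, 7.0):
    print(item.title)
```

Dropping unrated items is a design choice: a stricter filter errs on the side of showing less, which suits a feed meant to surface only the good stuff.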
A more recent example of such a filter is Gmail's widely celebrated Priority Inbox feature, which uses certain heuristics to suggest that some emails are more important than others.
Tools from text analytics fit very nicely into this category: they can be an integral part of piecing together related contexts or bits of information while pushing unrelated ones further apart.
Our own experiment in this area was (and still is, although development has been rather slow lately) Saplo Stream, a small Twitter-integrated web application. It crawled the links posted by the people you followed and, depending on the content you had shared through your own tweets, displayed different types of links to you inside the application.
The most interesting part, however, was the customizable filters for incoming articles. We have all seen the “related articles” or “related content” sections at the bottom of news articles. Here, the filters could be tweaked and tuned to give high relevance to certain types of content, meaning you could, for instance, filter out all incoming links that had to do with foreign policy or mobile technology.
The filtering itself was done using our own text analysis API, which lets you group texts together in different ways and infer how similar they are to one another.
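The Saplo API itself is not shown here. As a rough illustration of the same idea, the sketch below swaps in plain bag-of-words cosine similarity: a handful of example texts define a topic profile, and any incoming article whose similarity to that profile reaches a threshold is filtered out. The function names, the example texts, and the 0.3 threshold are all hypothetical choices for illustration.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Lower-cased word counts; a deliberately simple bag-of-words model."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two term-count vectors (0.0 if either is empty)."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def exclude_topic(articles, topic_examples, threshold):
    """Drop articles whose similarity to the topic profile reaches threshold."""
    profile = Counter()
    for text in topic_examples:
        profile.update(term_vector(text))
    return [
        article for article in articles
        if cosine_similarity(term_vector(article), profile) < threshold
    ]


# Hypothetical texts describing the topic we want to filter away.
topic_examples = [
    "Foreign policy talks at the embassy",
    "New sanctions shape foreign policy",
]
articles = [
    "Embassy announces new foreign policy",
    "Phone maker releases faster mobile chip",
]

for article in exclude_topic(articles, topic_examples, 0.3):
    print(article)
```

A real text analysis service would typically use richer representations than raw word counts (stemming, stop-word removal, weighting), but the filtering step, comparing each article against a topic profile and cutting at a threshold, stays the same.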