The Next Big Thing: Personalised Data-Mining

Recently, I’ve been creating a number of web crawlers that gather various things together and do nice things with the information collected, and I had one of those “aha” moments…

The “Next Big Thing” on the internet will be Personalized Data-Mining. “What’s that?” you ask. “It’s really very simple…” I tease … “Go on then” you prompt… “OK, calm down, calm down… here we go…” I witter…

Personalized Data-Mining is just doing what I’ve been doing: creating web crawlers that get information and do nice things with it. You see, the trouble with Google is that you can’t ask it difficult questions, like “Get me every TV that is less than ten inches wide”… Now the silly part is that all the information you need is probably sitting on Amazon and John Lewis and Googlebase but, frustratingly, you can’t quite get at it.

So, to pin it down, Personalized Data-Mining is about empowerment. The data is there but you can’t get at it; all Personalized Data-Mining does is turn it from something-you-can-read into something-you-can-use.
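To make the “something-you-can-read into something-you-can-use” idea a bit more concrete, here is a minimal sketch of the TV question from above. Everything in it is an assumption for illustration: the sample HTML, the `product` class and the `data-width-in` attribute are made up, and a real shop page would need its own parsing rules.

```python
from html.parser import HTMLParser

# Hypothetical product listing; real shop pages will use different markup.
SAMPLE_HTML = """
<ul>
  <li class="product" data-width-in="9.5">Pocket TV</li>
  <li class="product" data-width-in="32">Living Room TV</li>
  <li class="product" data-width-in="7">Kitchen TV</li>
</ul>
"""


class ProductParser(HTMLParser):
    """Collects (name, width in inches) pairs from the assumed markup."""

    def __init__(self):
        super().__init__()
        self.products = []
        self._width = None  # width of the product currently being read

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "li" and attrs.get("class") == "product":
            self._width = float(attrs["data-width-in"])

    def handle_data(self, data):
        # The first non-blank text after a product tag is taken as its name.
        if self._width is not None and data.strip():
            self.products.append((data.strip(), self._width))
            self._width = None


def tvs_narrower_than(html, max_width_in):
    """Answer the question Google can't: every TV under a given width."""
    parser = ProductParser()
    parser.feed(html)
    return [name for name, width in parser.products if width < max_width_in]


print(tvs_narrower_than(SAMPLE_HTML, 10))
```

Once the page has been turned into a list of names and numbers, the “difficult question” is a one-line filter. The hard part, of course, is the parsing, which is different for every site.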

And here’s the important point: Personalized Data-Mining is NOT SEMANTIC WEB… it’s the interim (and necessary) stage that needs to happen before the semantic web can happen. You see, the problems that dog the semantic web are many (and blogged about before) but the biggest problems are…

  1. Not everyone will semanticize their information
  2. Those that do semanticize will do it badly (trust me) meaning you still can’t use it the way you want to anyway

The solution to these problems is Personalized Data-Mining, which has the benefits of…

  1. Being distributed
  2. Being small
  3. Being creatively in the hands of the individual, rather than people who decide what a “shoe” or “garden plant” is made up of. Being personalized means real people get to bend it, shape it and invent new applications.
  4. Being able to “get at” ANY online information, whether it’s in html, images or even PDF.

Currently there are a number of sites and services that almost get there, that automate data-extraction, that help make mashups, but for me… none quite do the thing required, which is to pull the data into a context where you can manipulate your “world”… they quickly turn “data” back into “information”. I don’t want a data-mining report… I want, well, data and ways of dealing with that data.

“What the hell are you on about?” you rightly ask, sighing…. “Well… ” I dream on…

Imagine an internet where all the sites out there are in fact all working for you. You can have them take this site and mix it with that database, and they’d do it willingly. You can request features and have them implemented by this afternoon. You can ask questions and get them answered, no matter how crazy. You can create new businesses based on other businesses and not be sued whilst doing so.

Or to put it in simpler terms… you can do stuff. At times the internet seems to be all about having stuff done for you. And if you want something done well…

If Data Is Money You’re Probably Burning It!


It really does seem to me that data is where all the Web2.0 action is.

Because there has been so much talk about mashups it can be easy for us to be tired of hearing about them. We are all a bit Web2.0 weary already. Once you’ve seen one GoogleMap it’s hard to get excited about the next.

I’ve been refining my data-mining tools over the weekend, adding in some semantic possibilities (trying to understand what the data is about) and visualisation (trying to communicate what large amounts of data actually tell you). These two technologies, from my perspective, have really matured over the last 5 years to the point where idiots like me can use them.

Being data-driven is a key element for any online business these days. Even for the smaller business, there’s real value hiding in their data. Or to put it another way, there’s money in data, money in data about data, money in data about data about data, and most companies are squandering their opportunity and missing the point.

The Four Types Of Data You Are Probably Burning

Of course, your company isn’t burning the data in a destructive way, but it is lying there, unwashed and unwanted. As a creative exercise, all you have to do is pick two of the types of data from the list below and “breed” them. I guarantee you will have a fantastic idea for your company.

1. Data That’s Out There (Wild, Free Range Data)

Out there on the internet is data just sitting there. You can take that data and do something with it; Google did exactly that and they haven’t done so badly. Of course you might have to find two or three interesting sources of data and be creative with them, but it IS possible.
And lurking in all the data that’s out there are things that people want to find and can’t. Whether or not you think Google is a good search engine, if you examine what your customers really want, the chances are you can help them to achieve it.

What are people saying about your company? Reputation-based search engines are starting to appear but do they work for your company?

The online landscape is rich with wild and free data; all you have to do is first find it, and then do something interesting with it. “Interesting” doesn’t even have to be difficult to be interesting.

2. Your Internal Data (Captive Data)

Most companies are sitting on databases of products, people, places, messages and information flow. If you can find a way to blend your captive data with the wild stuff then you have a good chance of producing a healthy hybrid.

A trivial example might be to use data-mined data alongside customer lists. I can already imagine a “discovered” Flickr feed for a CRM system. I said the example was trivial, I meant scary.

Another tactic is to experiment with letting your captive data out into the wild with RSS feeds and seeing what happens.
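The blending of captive data with the wild stuff described in this section can be sketched in a few lines. This is only an illustration under loud assumptions: the product table, the mined prices and every field name here are invented, and matching on product name is far cruder than anything you would want in real life.

```python
# Captive data: a made-up internal product table.
captive_products = [
    {"sku": "TV-07", "name": "Kitchen TV", "our_price": 89.0},
    {"sku": "TV-32", "name": "Living Room TV", "our_price": 299.0},
]

# Wild data: made-up prices mined from public pages, keyed by product name.
wild_prices = {
    "Kitchen TV": [95.0, 92.5],
    "Living Room TV": [279.0, 310.0],
}


def blend(products, prices_in_the_wild):
    """Attach the lowest price seen elsewhere to each internal product."""
    hybrid = []
    for product in products:
        seen = prices_in_the_wild.get(product["name"], [])
        lowest = min(seen) if seen else None
        hybrid.append({
            **product,
            "lowest_wild_price": lowest,
            # With no wild prices at all, nobody is undercutting us.
            "we_are_cheapest": lowest is None or product["our_price"] <= lowest,
        })
    return hybrid


for row in blend(captive_products, wild_prices):
    print(row["sku"], row["lowest_wild_price"], row["we_are_cheapest"])
```

Neither dataset says much on its own; the hybrid row is the one that tells you where you are being undercut.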

3. Your Customers’ Data (Very Sensitive Data)

Calm down. I don’t mean customers’ data per se, I mean data about data. I still find it fascinating that I would be loath to use a bookshop other than Amazon because they’ve not only got my address, but they’ve got the addresses of people I send stuff to. I can’t imagine how anyone could ever replace Last.fm because my listening chart has taken years to build up.

Your site’s log files, in fact all the stuff you know about your customers, is data that becomes more valuable than the original transaction. It’s the data, my usage of the system (perhaps aggregated with other customers), that keeps me coming back.

4. Your Competitors’ Data (Almost Secret Data)

I say competitor, but by that I mean any other organisation that is letting their data run free. There are opportunities to mix a customer’s profile data with some of your data.

The first thing that comes to mind is a terrible example… What if a call centre played me songs I like rather than ones I don’t? I said it was terrible, but it’s doable right now. A company has my phone number, which is tied to my email address, which is enough to find my Last.fm account and fish out a Nine Inch Nails track. If I’m phoning to complain, though, it would probably be better for them to look for songs tagged “relaxing” on Delicious and send me one of those whilst my call is being valued.

My point is that many companies are throwing away their crown jewels: their usage data. There exist huge opportunities for companies willing to blur their data boundaries, finding genuinely useful data where there once was none, combining external free-range data with internal data and letting internal data let its hair down a bit.

This data revolution isn’t just about Google Maps.