Big ambitions for big data: Launch of the Data Science Institute at Imperial College London
Last night I had the pleasure of being invited over to Imperial College London for the launch of their new Data Science Institute (DSI). This is a brave launch out into the murky (and massive) waters of big data and big data management. As Prof. David Hand, Chairman of the research board, put it, the role of the institute will be “…sharing knowledge and expertise in how best to collect and analyse data.”
For anyone still unclear about the whole idea of big data, I’ll paraphrase it as: “there’s an awful lot of this data ‘stuff’ out there, and by the time you’ve finished reading this there’ll be a lot more of it. The important questions are: how do we store it, index it, share it and, perhaps most importantly, get the information and answers we need from it?”
Founding Director of the DSI, Prof. Yike Guo, kicked things off with a summary of the institute’s objectives:
- To act as a hub co-ordinating data science research at Imperial College
- To develop data management and analysis technologies
- To train and educate data scientists
- To advise on data strategy and policy
- To collaborate with industry and promote innovation
What I found particularly interesting was the idea that future data specialists will need a background in and understanding of hardware (i.e. the machines the data is stored on), in addition to knowledge of data engineering and informatics.
He was followed by the Chair for the evening, Larry Hirst, who discussed the future of the internet. He outlined how this is going to be the internet of things, not just the internet of devices: the advent and adoption of wearable technology means that we’re not just going to be connected via our smartphones and TVs any more (if you’re not convinced, check out the research into neural interfacing devices, which can monitor brain activity remotely).
Larry also raised the point that with big data come big security and big information governance issues, and any cross-sector collaboration involving data sharing is going to have to find a way to address these.
After the Q&A session we were escorted over the road to the Sherfield Building and allowed to run wild, talking to a selection of researchers about their projects and the big data challenges they’re facing. (I highly recommend having a poke around the Imperial website to get a flavour of some of these projects; there’s some really interesting work being done – but maybe that’s just me.)
I’m going to sign off with a fact that might help you wrap your brain around just how big, big data really is. At the moment ICL has 1,000 machine cores with which to process data, backed by 50 terabytes of memory and a storage capacity of 3 petabytes (a petabyte is 10^15 bytes – see this Wikipedia table of data magnitudes), and it’s not enough….
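To put those capacities in perspective, here’s a quick back-of-envelope sketch in Python. The figures for storage and memory come from the post above; the ~3 MB average photo size is purely my own illustrative assumption.

```python
# Back-of-envelope scale of the capacities mentioned above (SI units).
PETABYTE = 10**15  # 1 petabyte = 10^15 bytes
TERABYTE = 10**12  # 1 terabyte = 10^12 bytes

storage = 3 * PETABYTE    # ICL's stated storage capacity
memory = 50 * TERABYTE    # ICL's stated memory capacity

# Assumed average photo size of 3 MB, for illustration only.
photo = 3 * 10**6

print(storage // photo)   # -> 1000000000: roughly a billion photos
print(storage // memory)  # -> 60: storage is 60x the memory pool
```

In other words, even "not enough" here means room for something on the order of a billion holiday snaps.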