I’ve been thinking lately about what is—for want of a better term—sometimes called Data 2.0. My thoughts have been triggered by internal discussions at my workplace Unico about the direction of our Designer Analytics™ solutions. Caveat: I’m not a content, data, or even database specialist or architect.
Thinking about what could be termed tactical analytics leads to a bunch of follow-on thinking about where that data comes from, what it is, how useful it is, how much there is, how to present it, how to trust it, and so on. Broadly, the data passes through these stages:
- Capture from data sources such as sensors, logs, feeds, and events
- Aggregation, involving filtering, transformation, and compression, often lossy
- Storage and indexing into large repositories such as data warehouses, relational databases, key-value stores, and content management systems
- Query and retrieval
- Analysis, perhaps with statistics or clustering
- Presentation of the output: sorting, categorising, summarising, filtering
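
To make the list above a little more concrete, here is a minimal sketch of those stages as a chain of small Python functions. Everything in it is an assumption for illustration: the sample log lines, the function names, and the choice of a toy inverted index standing in for a real repository. It is not how any particular product does it.

```python
from collections import Counter

# Hypothetical raw events, standing in for captured sensor or log data.
RAW_EVENTS = [
    "2024-01-01 web01 ERROR timeout talking to billing",
    "2024-01-01 web02 INFO request served",
    "2024-01-02 web01 ERROR disk nearly full",
    "2024-01-02 web01 DEBUG cache warm",
]

def capture(lines):
    """Capture: yield events from a source (here, an in-memory list)."""
    yield from lines

def aggregate(events):
    """Aggregation: a deliberately lossy filter that drops DEBUG noise."""
    return [e for e in events if " DEBUG " not in e]

def store(events):
    """Storage and indexing: build a word -> event-id inverted index."""
    index = {}
    for i, event in enumerate(events):
        for word in event.lower().split():
            index.setdefault(word, set()).add(i)
    return events, index

def query(repo, term):
    """Query and retrieval: look up events containing a single term."""
    events, index = repo
    return [events[i] for i in sorted(index.get(term.lower(), ()))]

def analyse(events):
    """Analysis: count events per host (the second field of each line)."""
    return Counter(e.split()[1] for e in events)

def present(counts):
    """Presentation: sort by frequency and format for display."""
    return [f"{host}: {n}" for host, n in counts.most_common()]

repo = store(aggregate(capture(RAW_EVENTS)))
report = present(analyse(query(repo, "error")))
print(report)
```

Note that the lossy step sits early in the chain: anything `aggregate` discards is gone for every later question, which is exactly the trade-off the rest of this post is circling.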
It occurs to me that tactical is a key word here. As one of my colleagues puts it, our analytics solutions are about “late binding” of the captured un- or semi-structured data, as opposed to the very early binding of structured data in traditional (Data 1.0?) BI, data warehouse, ETL-type solutions, where hundreds of pre-written, pre-ordained management reports are the norm.
By comparison, Data 2.0 concerns sets, often large, of unstructured or semi-structured data. Late binding requires that as little data as possible be thrown away or interpreted at capture time, and the downstream activity of query and retrieval is dominated by (often text-based) search, as a more agile approach to extracting sense and meaning from all the data. And because it’s tactical, the analytics solution can be a framework for measuring ROI for a particular change project: baseline at the start, monitor along the way, measure the final improvement, then focus attention elsewhere.
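
As a rough illustration of what late binding might look like in practice, the sketch below keeps the raw records untouched and only imposes structure at query time, once the question is known. The records, the field names, and the `search` and `bind` helpers are all hypothetical; a real solution would sit on top of a proper search index rather than a Python list.

```python
import re

# Hypothetical raw records, stored exactly as captured: nothing parsed or discarded.
RAW_STORE = [
    "order=1001 status=shipped latency_ms=120",
    "user alice logged in from 10.0.0.7",
    "order=1002 status=failed latency_ms=950 reason=card declined",
    "order=1003 status=shipped latency_ms=80",
]

def search(store, text):
    """Text-based retrieval over the untouched records."""
    return [r for r in store if text.lower() in r.lower()]

def bind(record, pattern):
    """Late binding: apply a schema (a regex with named groups) at query time."""
    match = re.search(pattern, record)
    return match.groupdict() if match else None

# The question, and hence the schema, is decided only now.
ORDER_LATENCY = r"order=(?P<order>\d+).*latency_ms=(?P<latency_ms>\d+)"

rows = [bind(r, ORDER_LATENCY) for r in search(RAW_STORE, "status=shipped")]
latencies = [int(row["latency_ms"]) for row in rows if row]
mean_latency = sum(latencies) / len(latencies)
print(rows, mean_latency)
```

An early-binding design would have fixed the order and latency columns at load time and probably dropped the login line altogether. Here a different question tomorrow only needs a different pattern, at the cost of doing the parsing work on every query.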
Late binding of data implies loose coupling of systems. SOA is already about looser coupling than pre-existing point-to-point approaches, but there is scope for looser coupling still in things like mashups, using published or enterprise APIs, as tactical responses to getting coherent meaning from disparate data sources. This area is being opened up by approaches like REST, standards such as the Open Data Protocol (OData), nifty products such as IFTTT, and ultimately, the Semantic Web.
There’s a lot more to think about here.