Search This Blog

Monday, August 31, 2015

The Seven C's of Analysis.

Feeling adrift in the topic of analytics and data science?

Earth with Seven Seas Oceans

There are two types of people in the world - Those who categorize and those who do not. There are many categorizations around analytics. Model based, Predictive, etc...

Five cases or levels of analysis - Decriptive, Diagnostic, Discovery, Predictive and Prescriptive - are often listed. Analyses are done on Big Data, Small Data, Rich Data and so on. Data gets classified by Velocity, Volume, Variety, Veracity and Value (5V).

What is the definition of analysis?
:Examination of structure as a basis for interpretation: is a fair working definition. The process or action definition is tantamount to taking inputs and ideas and turning them into decision and deliverables.

So how does analysis work? In the real and theory world? Past, present and future? Note that even asking the question involves a bit of analysis - deciding that time, space, quality and value are important parameters in our discussion.

Back to the seven C's. (three Chi's and four K's). Three are old:

Cheat: Know the answer for certain apriori. Like the Sting.  Or like doing the Kaggle Titanic competition from a download of the historical list of survivors. Or edX or Coursera students who use a shadow account to get the answers.

Chance: Guess the answer randomly... Not based on any "knowledge".

Chestnuts: Use rules of thumb, experts and one's gut. Experts can be consultants, practitioners, or highest paid person opinions (HIPPO). 

The next four are actually what most would call based in data science.

Coordinations: Statistics and regressions against well known dependent variables like space, time and value. f(t,x,y,z).. Note the nuance versus correlation below.

Correlations: f(v) for any vector v. - emergence - and not just related to time and space connections. Anything can be connected to anything else... in any way.

Crowd: Gather wisdom from many actors and models. Maybe using rule of thumb, or gut or any of the others, but getting it right(er) by law of averaging. Boosting might even be clumped into this crowd - so to speak.

Continuous: Iterate and apply the above in real world. Not quite like the others, this is about how the analytics get generated and acted upon. Most of the above answer the call when the need arises. They do not have "instinct" building (aka a built-in instinct) nor a process per se.

No comments:

Post a Comment