Thursday, August 6, 2015

Data Science. Feature Engineering. Sustainable Value. Seeking.

Had some exposure to data science long ago in undergraduate and graduate school:
Mapping spatial spin dynamics in helium fluids with NMR, studying positron annihilation in CuTi alloys, and coming up with good deposition recipes for making smooth gold substrates (as flat backgrounds against which to do scanning tunnelling microscopy of biologics and large chain molecules) are three that come immediately to mind.

Got some sense of the theoretical underpinnings of data science from mentors in the math department at Dalhousie University. See that Dalhousie now has a data science department and will be hosting the international conference KDD-2017. The Physics Department engendered a fondness for a hands-on approach. Got exposure to a wide variety of data and modeling techniques at the Condensed Matter Physics department at Cornell. Was away from data science for a long time on pathways of sensors and instrumentation as things in and of themselves, and on business issues around production, logistics, and customer support (of what amounts to instrumentation software). But analytics of large streams of data (sensors and tags in the Internet of Things) has brought me back to data science.

Have been taking the MIT edX course on data science. Looked at many things that this pointed to. Have become a fan of the materials at kdnuggets. And have been thinking hard about feature engineering. Note that the feature engineering article in Wikipedia is very lean, which belies the topic's importance.

The referenced Machine Learning Mastery article on discovering feature engineering by Jason Brownlee is well worth reading (well written, and the math does not overwhelm).

There are many articles about how one can excel at data science (and win Kaggle competitions *grin*) by using some core principles: collecting and understanding the data, feature engineering, applying standard or non-standard data science techniques, and boosting (or model combinations).
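Of those core principles, feature engineering is the least mechanical. A minimal sketch of what it means in practice, on a hypothetical sensor record (the field names and derived features here are illustrative assumptions, not from any real dataset):

```python
from datetime import datetime

# Hypothetical raw sensor record; field names are illustrative only.
raw = {"timestamp": "2015-08-06T14:30:00", "flow_lpm": 12.0, "temp_c": 48.0}

def engineer_features(record):
    """Derive new columns from raw fields -- a typical feature-engineering move."""
    ts = datetime.fromisoformat(record["timestamp"])
    return {
        **record,
        "hour_of_day": ts.hour,                                   # time-of-day feature
        "is_weekend": ts.weekday() >= 5,                          # categorical flag
        "temp_per_flow": record["temp_c"] / record["flow_lpm"],   # ratio feature
    }

features = engineer_features(raw)
print(features["hour_of_day"], features["temp_per_flow"])  # 14 4.0
```

None of the derived columns add information in a strict sense, but they present existing information in forms a model can actually use, which is the whole point.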

There are very interesting new companies and business models which leverage the application of data science to seek "treasure". Some have new and wonderful techniques (like Ayasdi), and many have great tools to make a data scientist's life easier, like database technology from Deep Information Sciences, and automatic model selection and combination as in Azure ML and IBM offerings.

Cannot help but think that all this only gets one so far. The value of a model has to be harvested by deploying that model into the real world with real-world constituents, and few have ventured there (or have simply taken it for granted). Further, a good model often begs for more data.

Those are things we have thought hard about in Analytika. One can construct a cycle:
  • A. get and understand data
  • B. feature engineer
  • C. model and transcend
  • D. deploy (get ongoing data, get ongoing analytics results)
  • E. harvest business value
  • F. goto A.
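The cycle above can be sketched as a loop; every function here is a hypothetical placeholder standing in for one stage, not a real implementation:

```python
# Hypothetical stubs, one per stage of the cycle (illustrative only).
def get_and_understand_data(i): return [i, i + 1, i + 2]
def feature_engineer(data): return [x * x for x in data]        # derive a new feature
def model_and_transcend(feats): return sum(feats) / len(feats)  # toy "model"
def deploy(model): return {"score": model}                      # ongoing results
def harvest_business_value(results): return results["score"]

def analytics_cycle(iterations=2):
    """One pass per iteration through stages A..E; the loop itself is F (goto A)."""
    values = []
    for i in range(iterations):
        data = get_and_understand_data(i)    # A
        feats = feature_engineer(data)       # B
        model = model_and_transcend(feats)   # C
        results = deploy(model)              # D
        values.append(harvest_business_value(results))  # E
    return values                            # F: loop repeats with fresh data

print(analytics_cycle())
```

The point of writing it as a loop rather than a pipeline is stage F: deployment feeds new data and new questions back into stage A, so the cycle never terminates.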
Thinking about how, anywhere along the line, we might realize a new feature in existing data or ask for....
"Can we get a new temperature sensor?" "Do you have data on turbidity?" "Do you have a flow sensor on x?" The state of the art is a long way from a computer or AI asking, "Do you have any data on how bright the clothing was for each Titanic survivor? How tall was each passenger?" Human inquiry and framing seem sustainably key.
