
Showing posts with label science. Show all posts

Thursday, August 6, 2015

Data Science. Feature Engineering. Sustainable Value. Seeking.

Had some exposure to data science long ago in undergraduate and graduate school:
Mapping spatial spin dynamics in helium fluids with NMR, studying positron annihilation in CuTi alloys, and coming up with good deposition recipes for making smooth gold substrates (as flat backgrounds against which to do scanning tunnelling microscopy of biologics and large chain molecules) were three that come immediately to mind.

Got some sense of the theoretical underpinnings of data science from mentors at the math department at Dalhousie University. See that Dalhousie now has a data science department and will be hosting an international conference, KDD-2017. The Physics Department engendered a fondness for a hands-on approach. Got exposure to a wide variety of data and modeling techniques at the Condensed Matter Physics department at Cornell. Was away from data science for a long time on pathways of sensors and instrumentation as things in and of themselves. And business issues around production, logistics and customer support (of what amounts to instrumentation software). But analytics of large streams of data (sensors and tags in the Internet of Things) has brought me back to data science.

Have been taking the MIT edX course on data science. Looked at many things that this pointed to. Have become a fan of the materials at kdnuggets. And have been thinking hard about feature engineering. Note the feature engineering article in Wikipedia is very lean, which belies its importance.

The Machine Learning Mastery article on discovering feature engineering by Jason Brownlee (referenced there) is well worth reading (well written, and the math does not overwhelm).

There are many articles about how one can excel at data science (and win Kaggle competitions *grin*) by using some core principles: collecting and understanding the data, feature engineering, applying standard or non-standard data science techniques, and boosting (or model combinations).

There are very interesting new companies and business models which leverage the application of data science to seek "treasure". Some have new and wonderful techniques (like Ayasdi), and many have great tools to make a data scientist's life easier, like database tooling from Deep Information Sciences and automatic model selection and combination as in the Azure ML and IBM offerings.

Cannot help but think that this all only gets one so far. The value of a model has to be harvested by deploying that model into the real world with real world constituents, and few have ventured there (or they have simply taken it for granted). Further, a good model often begs for more data.

Those are things we have thought hard about in Analytika. One can construct a cycle:
  • A. get and understand data
  • B. feature engineer
  • C. model and transcend
  • D. deploy (get ongoing data, get ongoing analytics results)
  • E. harvest business value
  • F. goto A.
Thinking about how, anywhere along the line, we might realize a new feature in existing data or ask for more:
"Can we get a new temperature sensor?" "Do you have data on turbidity?" "Do you have a flow sensor on x?" State of the art is a long way from a computer or AI asking, "Do you have any data on how bright the clothing was for the Titanic survivor? How tall was each passenger?" Human inquiry and framing seem sustainably key.
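
One way to picture the A to F cycle is as a small loop in code. This is only a sketch under made-up assumptions: the field names (`temp_in`, `temp_out`, `flow`, `turbidity`), the derived feature `rise_per_flow`, and the trivial mean-threshold "model" are all hypothetical illustrations, not anything from Analytika.

```python
from dataclasses import dataclass, field


@dataclass
class CycleState:
    """Hypothetical bookkeeping for one pass through steps A to F."""
    readings: list                                   # raw sensor rows (dicts)
    data_requests: list = field(default_factory=list)


def engineer_features(row):
    # B. Derive a new feature from existing columns:
    #    temperature rise per unit of flow (an assumed, illustrative feature).
    rise = row["temp_out"] - row["temp_in"]
    return {**row, "rise_per_flow": rise / row["flow"]}


def fit_threshold(rows):
    # C. A deliberately trivial "model": the mean of the engineered feature.
    values = [r["rise_per_flow"] for r in rows]
    return sum(values) / len(values)


def run_cycle(state):
    # A. Get and understand data: note what is missing and ask for it.
    if not all("turbidity" in r for r in state.readings):
        state.data_requests.append("Do you have data on turbidity?")
    rows = [engineer_features(r) for r in state.readings]           # B
    threshold = fit_threshold(rows)                                 # C
    flagged = [r for r in rows if r["rise_per_flow"] > threshold]   # D. deploy
    # E. Harvesting value and F. going back to A are left to the caller.
    return threshold, flagged


# Made-up readings, purely for illustration.
state = CycleState(readings=[
    {"temp_in": 10, "temp_out": 20, "flow": 5},
    {"temp_in": 10, "temp_out": 40, "flow": 5},
])
threshold, flagged = run_cycle(state)
print(threshold, flagged, state.data_requests)
```

Note that the question in `data_requests` is written by a human ahead of time; the code only notices that a field is absent. That is the point about human inquiry and framing.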

Thursday, March 26, 2015

Anna and Charlie - Talking about Snowflakes.

Anna: Why are all snowflakes different? [Scientist]

Snowflake Bentley 11 details

Charlie: No, they are not. Snowflakes are all made of water, white, small and hexagonal (or clumps of such). [Engineer]
Snowflake icon simple representation


Thursday, February 12, 2015

Staying on Track Blogging.

Stay on Track Blogging 2015 by thoughtlight, LLC, Thursday, February 5, 2015 from 6 to 8 PM.


Staying On Track - Double Track Railroad

Great group, presentation and ideas. At 50 Milk Street at new(ish) CIC Boston. Visit to the space worthwhile. Takeaways:
- schedule regular blogging
- images always worthwhile (even a little relevance is fine).
- okay to curate or review.

So as a part of the scheduling "exercise", an editorial calendar (and maybe this has some tongue-in-cheek spirit *grin*):

Thousand blogs by end of main career (next two decades). Smoothed average: four to five blogs a month.

Current feature: IoT. Ongoing emphasis: Energy and metrology. With a mix of hobbies, book reviews, sociology, robotics, manufacturing, design, toys, transportation, travel, science, engineering and so on.

Tuesday, April 30, 2013

Science and Discovery - Build It Bigger - Danny Forster

When not watching Mythbusters http://en.wikipedia.org/wiki/MythBusters ...

Many interesting things on Science and Discovery... Like "Build It Bigger" aka "Extreme Engineering"
http://en.wikipedia.org/wiki/Extreme_Engineering

Host Danny Forster http://www.dannyforster.com/tv

Enjoyed NYC transportation upgrade episode.