I just finished reading the book “What is Data Science?”
It is a small book (25 pages) and one of the many good starting points for learning about Data Science. This is not a review, just a few quotes from the book:
- According to Mike Driscoll (@dataspora), statistics is the “grammar of data science.”
- According to Martin Wattenberg (@wattenberg, founder of Flowing Media), visualization is key to data conditioning: if you want to find out just how bad your data is, try plotting it.
- Making data tell its story isn’t just a matter of presenting results; it involves making connections, then going back to other data sources to verify them.
- Data science requires skills ranging from traditional computer science to mathematics to art.
- According to DJ Patil (@dpatil), the best data scientists tend to be “hard scientists,” particularly physicists, rather than computer science majors. Physicists have a strong mathematical background, computing skills, and come from a discipline in which survival depends on getting the most from the data. They have to think about the big picture, the big problem. When you’ve just spent a lot of grant money generating data, you can’t just throw the data out if it isn’t as clean as you’d like. You have to make it tell its story. You need some creativity for when the story the data is telling isn’t what you think it’s telling.
- What Patil calls “data jiujitsu”: using smaller auxiliary problems to solve a large, difficult problem that appears intractable. (He has a book on the subject, Data Jujitsu.)
- Patil’s first flippant answer to “what kind of person are you looking for when you hire a data scientist?” was “someone you would start a company with.”
- Data scientists combine entrepreneurship with patience, the willingness to build data products incrementally, the ability to explore, and the ability to iterate over a solution. They are inherently interdisciplinary. They can tackle all aspects of a problem, from initial data collection and data conditioning to drawing conclusions. They can think outside the box to come up with new ways to view the problem, or to work with very broadly defined problems: “here’s a lot of data, what can you make from it?”
- The future belongs to the companies who figure out how to collect and use data successfully. Google, Amazon, Facebook, and LinkedIn have all tapped into their datastreams and made that the core of their success. They were the vanguard, but newer companies like bit.ly are following their path.
- The part of Hal Varian’s quote that nobody remembers says it all: “The ability to take data—to be able to understand it, to process it, to extract value from it, to visualize it, to communicate it—that’s going to be a hugely important skill in the next decades.”
This graphic from BigData Startups shows that lots of organizations still do not understand Big Data, and it predicts a shortage of 140k-190k big data scientists and 1.5M big data managers in the USA alone by 2018.
I am reading a bunch of books and will probably do more of these posts. BTW, Big Data is not always about large volumes of data. It is an umbrella term that covers different areas dealing with deriving value from data.