2.a) Explain, with a neat diagram, the current landscape of the data science process.
Answer:
The Current Landscape:
- Data science, as it’s practiced, is a blend of Red-Bull-fueled hacking and espresso-inspired statistics. But data science is not merely hacking, because when hackers finish debugging their Bash one-liners and Pig scripts, few of them care about non-Euclidean distance metrics. And data science is not merely statistics, because when statisticians finish theorizing the perfect model, few could read a tab-delimited file into R if their job depended on it. Data science is the civil engineering of data.
- Its acolytes possess a practical knowledge of tools and materials, coupled with a theoretical understanding of what’s possible.
- In other words, hackers may be proficient at writing code and solving technical problems, but they may lack the depth of knowledge of, or interest in, the mathematical and statistical concepts that are crucial to data science.
- While statisticians may excel in theoretical aspects of data analysis, they may lack the programming skills necessary to handle real-world data effectively.
- In summary, while statistics is an important component of data science, data science encompasses a broader set of skills and activities beyond statistical analysis, including programming, data manipulation, and machine learning.