1.a) What is Big Data? Explain the evolution of big data and its characteristics.
Answer:
Big Data definition:
- Big Data is high-volume, high-velocity and/or high-variety information assets that require new forms of processing for enhanced decision making, insight discovery and process optimization.
- “A collection of data sets so large or complex that traditional data processing applications are inadequate.”
- “Data of a very large size, typically to the extent that its manipulation and management present significant logistical challenges.”
- “Big Data refers to data sets whose size is beyond the ability of typical database software tools to capture, store, manage and analyze.”
Evolution of Big Data
- Early Stages – Traditional Data Storage: Initially, data was structured and stored in Relational Database Management Systems (RDBMS), using SQL for querying and managing structured data.
- The Emergence of Big Data (3Vs): With increased data volume, velocity, and variety, traditional systems were insufficient. Big Data emerged, characterized by:
- Volume: Large amounts of data (measured in petabytes).
- Velocity: Rapid data generation.
- Variety: Diverse data types, including structured, semi-structured, and unstructured data.
- NoSQL and Distributed Systems: To handle Big Data, NoSQL databases (e.g., MongoDB, Cassandra) and distributed systems like Hadoop were developed, enabling scalable storage and processing.
- Cloud Computing: The growth of data required scalable infrastructure, and cloud platforms (e.g., AWS, Google Cloud) provided on-demand storage and compute resources.
- The Addition of Veracity: With more data, the 4th V – Veracity – was introduced, focusing on data quality and accuracy.
- Real-Time Analytics and Machine Learning: The need for real-time insights led to frameworks like Apache Kafka for real-time data processing, and machine learning became key for deriving insights from Big Data.
- Edge Computing and IoT: Edge computing emerged with the rise of IoT devices, allowing local data processing to reduce latency and bandwidth usage.
- Future Trends: The future of Big Data focuses on data integration, advanced analytics, and automation, allowing for deeper insights and real-time decision-making.
Figure 1.1 shows data usage and growth. As size and complexity increase, the proportion of unstructured data types also increases.
![Evolution of Big Data and their characteristics](https://i0.wp.com/vtuupdates.com/wp-content/uploads/2025/01/image.png?resize=618%2C391&ssl=1)
Big Data Characteristics
Volume, velocity and/or variety are the key “data management challenges” for enterprises. Analysts also describe the ‘4Vs’, i.e. volume, velocity, variety and veracity:
- Volume: The phrase ‘Big Data’ contains the term ‘big’, which relates to the size of the data and hence this characteristic. Size defines the amount or quantity of data generated by an application or applications.
- Velocity: The term velocity refers to the speed at which data is generated, and it is a measure of how fast data is generated and processed. To meet the demands and challenges of processing Big Data, the velocity of data generation plays a crucial role.
- Variety: Big Data comprises a variety of data. Data is generated from multiple sources in a system, which introduces variety and therefore complexity. Data comes in various forms and formats, owing to the large number of heterogeneous platforms in the industry. This characteristic helps in the effective use of data according to its format.
- Veracity: Veracity is also considered an important characteristic; it accounts for the quality of the data captured, which can vary greatly and affects the accuracy of analysis.
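The three data forms listed under Variety (structured, semi-structured and unstructured) can be illustrated with a minimal Python sketch. All sample data below is hypothetical, invented only to show that each form needs a different parsing strategy:

```python
import csv
import io
import json

# Structured data: fixed schema, e.g. an RDBMS export or CSV file.
structured = "id,name,amount\n1,Alice,40\n2,Bob,35\n"

# Semi-structured data: flexible, self-describing schema, e.g. JSON
# as stored in NoSQL databases like MongoDB.
semi_structured = '{"id": 3, "name": "Carol", "tags": ["vip"]}'

# Unstructured data: free text with no schema at all.
unstructured = "Carol emailed support about a delayed order."

# Each form is handled differently:
rows = list(csv.DictReader(io.StringIO(structured)))  # tabular rows
record = json.loads(semi_structured)                  # nested document
words = unstructured.split()                          # raw tokens

print(rows[0]["name"])   # Alice
print(record["tags"])    # ['vip']
print(len(words))        # 7
```

The point of the sketch is that no single tool fits all three forms, which is why variety introduces complexity and why NoSQL and distributed systems emerged alongside traditional RDBMS.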