6.A) Explain the shared-nothing architecture for big data tasks. – 10 Marks

Answer:

Shared-Nothing Architecture for Big Data Tasks

  • Shared-Nothing (SN) is a cluster architecture designed to optimize distributed data processing.
  • Each node operates independently, sharing no data, memory, or disk with any other node, which makes every node computationally self-sufficient.
  • This contrasts with a traditional RDBMS setup, where two or more SQL tables share keys and the columns of two tables are related through a relational-algebra expression.

Key Features of SN Architecture

  1. Independence:
    • Each node operates independently with no shared memory, providing computational self-sufficiency.
  2. Self-Healing:
    • In case of a link failure, new links are created to maintain operations.
  3. Sharding:
    • Each node stores a shard (a partition of a large database) and processes queries independently.
  4. No Network Contention:
    • Nodes operate without competing for network resources.
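The independence and sharding features above can be sketched in a few lines. The sketch below is illustrative (the `Node` and `Cluster` names are hypothetical, not from any real system): each node holds only its own shard, and a hash function routes every key to exactly one node.

```python
# Minimal sketch of hash-based sharding in an SN cluster.
# Each node owns one shard and answers queries for its own keys only.

class Node:
    """An independent node holding one shard of a key-value dataset."""
    def __init__(self, node_id):
        self.node_id = node_id
        self.shard = {}          # local data only; nothing is shared

    def put(self, key, value):
        self.shard[key] = value

    def get(self, key):
        return self.shard.get(key)

class Cluster:
    """Routes each key to exactly one node via a hash function."""
    def __init__(self, num_nodes):
        self.nodes = [Node(i) for i in range(num_nodes)]

    def _node_for(self, key):
        return self.nodes[hash(key) % len(self.nodes)]

    def put(self, key, value):
        self._node_for(key).put(key, value)

    def get(self, key):
        return self._node_for(key).get(key)

cluster = Cluster(num_nodes=4)
cluster.put("user:42", {"name": "Ada"})
result = cluster.get("user:42")
```

Because routing is deterministic, no node ever needs to consult another node to answer a query, which is the "no network contention" property in practice.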

Advantages of SN Architecture

  • Parallel Processing:
    • Nodes can handle different queries simultaneously.
    • Ideal for large-scale data processing tasks.
  • Horizontal Scalability:
    • Easily adds nodes to handle increasing data or workload.
  • Fault Tolerance:
    • Ensures system availability despite node or link failures.

Examples of Tools Using SN Architecture:

  • Hadoop
  • Flink
  • Spark

These tools use partitioning and parallel processing, distributing data across nodes for efficiency.
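The partition-then-process-in-parallel pattern these tools rely on can be imitated in plain Python. This is a conceptual sketch only (a thread pool stands in for cluster nodes; it is not how Hadoop, Flink, or Spark are actually invoked): data is split into partitions, each partition is counted independently, and the partial results are merged.

```python
# Illustrative map-reduce over partitions: split data, count each
# partition independently ("map"), then merge the counts ("reduce").
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def count_words(partition):
    """Map step: one worker counts words in its own partition."""
    counts = Counter()
    for line in partition:
        counts.update(line.split())
    return counts

def word_count(lines, num_partitions=4):
    # Round-robin split of the input into independent partitions.
    partitions = [lines[i::num_partitions] for i in range(num_partitions)]
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        partials = list(pool.map(count_words, partitions))
    total = Counter()            # Reduce step: merge per-partition counts
    for c in partials:
        total.update(c)
    return total

counts = word_count(["to be or not to be", "be quick"])
```

The key point is that the map step touches only local data, so adding partitions (nodes) scales the work horizontally.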


Choosing Distribution Models

To effectively use SN architecture, appropriate distribution models must be selected based on the application requirements.


1. Single Server (SS) Model

  • Description:
    • Simplest distribution model.
    • A single server processes data sequentially.
  • Use Cases:
    • Suitable for graph databases or key-value stores requiring sequential processing.
  • Limitations:
    • Limited scalability.
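A minimal sketch of the single-server model (names are illustrative): one process owns the whole store and handles every request in order, which is simple but bounded by that one machine's capacity.

```python
# Single-server model: one process, one local store, requests
# handled strictly one after another (no distribution).
store = {}

def handle(op, key, value=None):
    """Process a single request against the one local store."""
    if op == "put":
        store[key] = value
        return "ok"
    return store.get(key)        # "get"

requests = [("put", "k1", "v1"), ("get", "k1", None)]
results = [handle(*r) for r in requests]   # strictly sequential
```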

2. Sharding Model

  • Description:
    • Large datasets are divided into shards, distributed across multiple servers in a cluster.
    • Each shard operates independently.
  • Advantages:
    • Provides horizontal scalability and improves performance.
  • Example:
    • MongoDB supports auto-sharding to distribute data.
  • Limitation:
    • Link failure may require shard migration to another node.
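The shard-migration limitation can be sketched as follows. This is a hypothetical helper (not a MongoDB API): when a node's link fails, its shard is handed to the least-loaded surviving node.

```python
# Sketch of shard migration after a link/node failure: the failed
# node's shard is merged into the least-loaded surviving node.

def migrate_shard(shards, failed_node, survivors):
    """Move the failed node's shard to the survivor holding the least data."""
    orphan = shards.pop(failed_node)
    target = min(survivors, key=lambda n: len(shards[n]))
    shards[target].update(orphan)
    return target

shards = {
    "node1": {"a": 1, "b": 2},
    "node2": {"c": 3},
    "node3": {"d": 4, "e": 5, "f": 6},
}
target = migrate_shard(shards, "node1", ["node2", "node3"])
```

In a real system such as MongoDB, rebalancing is automated, but the cost of moving shard data over the network is exactly why link failures are listed as a limitation.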

Figure: (a) Single server model; (b) shards distributed on four servers in a cluster.


3. Master-Slave Distribution Model

  • Description:
    • One master node handles writes and directs slave nodes.
    • Slave nodes replicate data and handle read operations.
  • Advantages:
    • High resilience for read operations.
  • Limitations:
    • Replication can reduce performance for write-heavy tasks.
    • Complexity increases in large-scale applications.
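The write/read split in this model can be sketched as below (the `Master` and `Slave` classes are illustrative, with synchronous replication for simplicity): only the master accepts writes and copies them to the slaves, while any slave can serve reads.

```python
# Sketch of master-slave replication: the master applies each write
# locally and pushes a copy to every slave; slaves serve reads.

class Slave:
    def __init__(self):
        self.data = {}

    def replicate(self, key, value):
        self.data[key] = value

    def read(self, key):
        return self.data.get(key)

class Master:
    def __init__(self, slaves):
        self.data = {}
        self.slaves = slaves

    def write(self, key, value):
        self.data[key] = value
        for s in self.slaves:        # synchronous replication (simplified)
            s.replicate(key, value)

slaves = [Slave(), Slave()]
master = Master(slaves)
master.write("x", 1)
```

The replication loop is also where the listed limitation shows up: every write pays the cost of updating all slaves, which hurts write-heavy workloads.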

4. Peer-to-Peer Distribution Model (PPD)

  • Description:
    • All nodes are equal and handle both read and write operations.
    • Data is distributed among all nodes in a cluster.
  • Advantages:
    • High availability with tunable consistency.
    • Since every node accepts writes, individual node failures do not block write operations.
  • Example:
    • Cassandra uses this model for distributed data processing.
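A heavily simplified, Cassandra-inspired sketch of the peer-to-peer model (the `Peer` class and quorum helpers are hypothetical): any subset of peers can accept a write, and a read consults a quorum and keeps the newest version it sees.

```python
# Peer-to-peer sketch: all peers are equal; writes and reads each
# touch a quorum of peers, and the newest version wins on read.

class Peer:
    def __init__(self):
        self.data = {}                       # key -> (version, value)

    def write(self, key, version, value):
        self.data[key] = (version, value)

    def read(self, key):
        return self.data.get(key)

def quorum_write(peers, key, version, value, quorum):
    for p in peers[:quorum]:                 # write to a quorum of peers
        p.write(key, version, value)

def quorum_read(peers, key, quorum):
    replies = [p.read(key) for p in peers[:quorum]]
    replies = [r for r in replies if r is not None]
    return max(replies)[1] if replies else None   # newest version wins

peers = [Peer() for _ in range(5)]
quorum_write(peers, "k", version=1, value="v1", quorum=3)
quorum_write(peers, "k", version=2, value="v2", quorum=3)
latest = quorum_read(peers, "k", quorum=3)
```

Because a write quorum and a read quorum of 3 out of 5 peers must overlap, a read is guaranteed to see at least one copy of the latest write even if some peers are stale.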
