6.A) Explain the shared-nothing architecture for big data tasks. – 10 Marks

Answer:

Shared-Nothing Architecture for Big Data Tasks

  • Shared-Nothing (SN) is a cluster architecture designed to optimize distributed data processing.
  • Each node operates independently, sharing no data, memory, or disk with any other node, which makes every node computationally self-sufficient.
  • This contrasts with a traditional RDBMS setup, where two or more SQL tables share keys and the columns of two tables are related through a relational-algebra expression.

Key Features of SN Architecture

  1. Independence:
    • Each node operates independently with no shared memory, providing computational self-sufficiency.
  2. Self-Healing:
    • In case of a link failure, new links are created to maintain operations.
  3. Sharding:
    • Each node stores a shard (a partition of a large database) and processes queries independently.
  4. No Network Contention:
    • Nodes operate without competing for network resources.
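The independence and sharding features above can be sketched in a few lines. The sketch below is illustrative (the `Node` and `Cluster` names are hypothetical, not from any real system): each node holds only its own shard, and a hash function routes every key to exactly one node.

```python
# Minimal sketch of hash-based sharding in an SN cluster.
# Each node owns one shard and answers queries for its own keys only.

class Node:
    """An independent node holding one shard of a key-value dataset."""
    def __init__(self, node_id):
        self.node_id = node_id
        self.shard = {}          # local data only; nothing is shared

    def put(self, key, value):
        self.shard[key] = value

    def get(self, key):
        return self.shard.get(key)

class Cluster:
    """Routes each key to exactly one node via a hash function."""
    def __init__(self, num_nodes):
        self.nodes = [Node(i) for i in range(num_nodes)]

    def _node_for(self, key):
        return self.nodes[hash(key) % len(self.nodes)]

    def put(self, key, value):
        self._node_for(key).put(key, value)

    def get(self, key):
        return self._node_for(key).get(key)

cluster = Cluster(num_nodes=4)
cluster.put("user:42", {"name": "Ada"})
result = cluster.get("user:42")
```

Because routing is deterministic, no node ever needs to consult another node to answer a query, which is the "no network contention" property in practice.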

Advantages of SN Architecture

  • Parallel Processing:
    • Nodes can handle different queries simultaneously.
    • Ideal for large-scale data processing tasks.
  • Horizontal Scalability:
    • Easily adds nodes to handle increasing data or workload.
  • Fault Tolerance:
    • Ensures system availability despite node or link failures.

Examples of Tools Using SN Architecture:

  • Hadoop
  • Flink
  • Spark

These tools use partitioning and parallel processing, distributing data across nodes for efficiency.
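The partition-then-process-in-parallel pattern these tools rely on can be imitated in plain Python. This is a conceptual sketch only (a thread pool stands in for cluster nodes; it is not how Hadoop, Flink, or Spark are actually invoked): data is split into partitions, each partition is counted independently, and the partial results are merged.

```python
# Illustrative map-reduce over partitions: split data, count each
# partition independently ("map"), then merge the counts ("reduce").
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def count_words(partition):
    """Map step: one worker counts words in its own partition."""
    counts = Counter()
    for line in partition:
        counts.update(line.split())
    return counts

def word_count(lines, num_partitions=4):
    # Round-robin split of the input into independent partitions.
    partitions = [lines[i::num_partitions] for i in range(num_partitions)]
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        partials = list(pool.map(count_words, partitions))
    total = Counter()            # Reduce step: merge per-partition counts
    for c in partials:
        total.update(c)
    return total

counts = word_count(["to be or not to be", "be quick"])
```

The key point is that the map step touches only local data, so adding partitions (nodes) scales the work horizontally.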


Choosing Distribution Models

To effectively use SN architecture, appropriate distribution models must be selected based on the application requirements.


1. Single Server (SS) Model

  • Description:
    • Simplest distribution model.
    • A single server processes data sequentially.
  • Use Cases:
    • Suitable for graph databases or key-value stores requiring sequential processing.
  • Limitations:
    • Limited scalability.
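A minimal sketch of the single-server model (names are illustrative): one process owns the whole store and handles every request in order, which is simple but bounded by that one machine's capacity.

```python
# Single-server model: one process, one local store, requests
# handled strictly one after another (no distribution).
store = {}

def handle(op, key, value=None):
    """Process a single request against the one local store."""
    if op == "put":
        store[key] = value
        return "ok"
    return store.get(key)        # "get"

requests = [("put", "k1", "v1"), ("get", "k1", None)]
results = [handle(*r) for r in requests]   # strictly sequential
```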

2. Sharding Model

  • Description:
    • Large datasets are divided into shards, distributed across multiple servers in a cluster.
    • Each shard operates independently.
  • Advantages:
    • Provides horizontal scalability and improves performance.
  • Example:
    • MongoDB supports auto-sharding to distribute data.
  • Limitation:
    • Link failure may require shard migration to another node.
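The shard-migration limitation can be sketched as follows. This is a hypothetical helper (not a MongoDB API): when a node's link fails, its shard is handed to the least-loaded surviving node.

```python
# Sketch of shard migration after a link/node failure: the failed
# node's shard is merged into the least-loaded surviving node.

def migrate_shard(shards, failed_node, survivors):
    """Move the failed node's shard to the survivor holding the least data."""
    orphan = shards.pop(failed_node)
    target = min(survivors, key=lambda n: len(shards[n]))
    shards[target].update(orphan)
    return target

shards = {
    "node1": {"a": 1, "b": 2},
    "node2": {"c": 3},
    "node3": {"d": 4, "e": 5, "f": 6},
}
target = migrate_shard(shards, "node1", ["node2", "node3"])
```

In a real system such as MongoDB, rebalancing is automated, but the cost of moving shard data over the network is exactly why link failures are listed as a limitation.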

Figure: (a) Single server model; (b) shards distributed on four servers in a cluster.


3. Master-Slave Distribution Model

  • Description:
    • One master node handles writes and directs slave nodes.
    • Slave nodes replicate data and handle read operations.
  • Advantages:
    • High resilience for read operations.
  • Limitations:
    • Replication can reduce performance for write-heavy tasks.
    • Complexity increases in large-scale applications.
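The write/read split in this model can be sketched as below (the `Master` and `Slave` classes are illustrative, with synchronous replication for simplicity): only the master accepts writes and copies them to the slaves, while any slave can serve reads.

```python
# Sketch of master-slave replication: the master applies each write
# locally and pushes a copy to every slave; slaves serve reads.

class Slave:
    def __init__(self):
        self.data = {}

    def replicate(self, key, value):
        self.data[key] = value

    def read(self, key):
        return self.data.get(key)

class Master:
    def __init__(self, slaves):
        self.data = {}
        self.slaves = slaves

    def write(self, key, value):
        self.data[key] = value
        for s in self.slaves:        # synchronous replication (simplified)
            s.replicate(key, value)

slaves = [Slave(), Slave()]
master = Master(slaves)
master.write("x", 1)
```

The replication loop is also where the listed limitation shows up: every write pays the cost of updating all slaves, which hurts write-heavy workloads.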

4. Peer-to-Peer Distribution Model (PPD)

  • Description:
    • All nodes are equal and handle both read and write operations.
    • Data is distributed among all nodes in a cluster.
  • Advantages:
    • High availability with tunable consistency.
    • Since every node accepts writes, individual node failures do not block write operations.
  • Example:
    • Cassandra uses this model for distributed data processing.
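A heavily simplified, Cassandra-inspired sketch of the peer-to-peer model (the `Peer` class and quorum helpers are hypothetical): any subset of peers can accept a write, and a read consults a quorum and keeps the newest version it sees.

```python
# Peer-to-peer sketch: all peers are equal; writes and reads each
# touch a quorum of peers, and the newest version wins on read.

class Peer:
    def __init__(self):
        self.data = {}                       # key -> (version, value)

    def write(self, key, version, value):
        self.data[key] = (version, value)

    def read(self, key):
        return self.data.get(key)

def quorum_write(peers, key, version, value, quorum):
    for p in peers[:quorum]:                 # write to a quorum of peers
        p.write(key, version, value)

def quorum_read(peers, key, quorum):
    replies = [p.read(key) for p in peers[:quorum]]
    replies = [r for r in replies if r is not None]
    return max(replies)[1] if replies else None   # newest version wins

peers = [Peer() for _ in range(5)]
quorum_write(peers, "k", version=1, value="v1", quorum=3)
quorum_write(peers, "k", version=2, value="v2", quorum=3)
latest = quorum_read(peers, "k", quorum=3)
```

Because a write quorum and a read quorum of 3 out of 5 peers must overlap, a read is guaranteed to see at least one copy of the latest write even if some peers are stale.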
