1.b) Explain the following terms: scalability and parallel processing, grid, and cluster computing. – 10 Marks

Answer:

Scalability and Parallel Processing

Big Data needs processing of large data volumes and therefore needs intensive computation. Processing complex applications with large datasets (terabyte to petabyte) needs hundreds of computing nodes.

Convergence of Data Environments and Analytics
  • Big Data processing and analytics require scaling up and scaling out, i.e., both vertical and horizontal computing resources.
  • Computing and storage systems, when run in parallel, enable scaling out and increase system capacity. Scalability is the capability of a system to handle the workload as per the magnitude of the work; it enables increasing or decreasing the capacity of data storage, processing and analytics.
  • System capability needs to grow with increasing workloads. When the workload and complexity exceed the system capacity, scale it up and scale it out.
Analytics Scalability to Big Data
  • Vertical scalability means scaling up the given system’s resources and increasing the system’s analytics, reporting and visualization capabilities. Scaling up means designing the algorithm according to the architecture so that it uses the available resources efficiently.
  • Horizontal scalability means increasing the number of systems working in coherence and scaling out the workload. Processing the different subsets of a large dataset deploys horizontal scalability. Scaling out means using more resources and distributing the processing and storage tasks in parallel.
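Horizontal scaling as described above begins by splitting a large dataset into subsets, one per worker node. A minimal sketch in Python, assuming a simple round-robin partitioning scheme (the worker count and the stand-in dataset are illustrative):

```python
# A minimal sketch of scaling out: a large dataset is split into
# partitions, and each partition would be handed to a different worker.

def partition(dataset, num_workers):
    """Split the dataset into num_workers roughly equal chunks."""
    chunks = [[] for _ in range(num_workers)]
    for i, record in enumerate(dataset):
        chunks[i % num_workers].append(record)  # round-robin assignment
    return chunks

records = list(range(10))          # stand-in for a large dataset
parts = partition(records, 3)      # scale out across 3 workers
print(parts)                       # [[0, 3, 6, 9], [1, 4, 7], [2, 5, 8]]
```

Each chunk can then be processed independently, which is what makes the workload distributable in the first place.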

The easiest way to scale up the execution of analytics software is to run it on a bigger machine with more CPUs, to handle greater volume, velocity, variety and complexity of data. The software may perform better on a bigger machine, but only up to a point.

However, buying faster CPUs, bigger and faster RAM modules, hard disks and motherboards is expensive compared to the extra performance achieved by efficient design of algorithms. Moreover, if more CPUs are added to a computer but the software does not exploit them, the additional CPUs yield no increase in performance.

Massively Parallel Processing Platforms

When making software, draw the advantage of multiple computers (or even multiple CPUs within the same computer). Scaling uses parallel processing systems. Many programs are so large and complex that they require either enhancing (scaling up) the computer system or using massively parallel processing (MPP) platforms.

Parallelization of tasks can be done at several levels:

  • distributing separate tasks onto separate threads on the same CPU
  • distributing separate tasks onto separate CPUs on the same computer
  • distributing separate tasks onto separate computers

Multiple compute resources are used in parallel processing systems. The computational problem is broken into discrete sub-tasks that can be processed simultaneously. The system executes multiple program instructions or sub-tasks at any moment in time. The total time taken is much less than with a single compute resource.
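The decomposition described above can be sketched in Python. This is an illustrative example, not a framework: a sum-of-squares problem is broken into four slices that run simultaneously in a thread pool (swapping in `ProcessPoolExecutor` would distribute the slices across CPUs, the second level in the list above):

```python
from concurrent.futures import ThreadPoolExecutor

def sum_squares(lo, hi):
    """One discrete sub-task: sum of squares over [lo, hi)."""
    return sum(n * n for n in range(lo, hi))

# Break the problem into sub-tasks that can be processed simultaneously.
slices = [(0, 250), (250, 500), (500, 750), (750, 1000)]

with ThreadPoolExecutor(max_workers=4) as pool:
    partials = pool.map(lambda s: sum_squares(*s), slices)

# Combining the partial results gives the same answer as a single
# compute resource computing sum(n * n for n in range(1000)).
total = sum(partials)
print(total)
```

The result is identical to the serial computation; only the elapsed time changes when the sub-tasks truly run on separate compute resources.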

Distributed Computing Model
  • A distributed computing model uses cloud, grid or cluster computing, which process and analyze big and large datasets on distributed computing nodes connected by high-speed networks.
  • It meets the requirements of processing and analyzing big, large and small-to-medium datasets on distributed computing nodes.
  • Big Data processing uses a parallel, scalable, shared-nothing programming model, such as MapReduce, for computations on distributed data.
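The shared-nothing MapReduce model named above can be illustrated with a single-machine word-count sketch (real frameworks such as Hadoop run the same three phases across many nodes; everything here runs locally for illustration):

```python
from collections import defaultdict

def map_phase(document):
    """Map: emit a (key, value) pair for every word."""
    return [(word, 1) for word in document.split()]

def shuffle(pairs):
    """Shuffle: group all values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: aggregate each key's values independently."""
    return {key: sum(values) for key, values in groups.items()}

docs = ["big data needs big systems", "big clusters process data"]
pairs = [p for d in docs for p in map_phase(d)]   # map runs per document
counts = reduce_phase(shuffle(pairs))
print(counts["big"], counts["data"])              # 3 2
```

Because no phase shares mutable state with another, each map and reduce call could run on a different node, which is exactly what makes the model scalable.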

Grid and Cluster Computing

Grid Computing
  • Grid Computing refers to distributed computing, in which a group of computers from several locations are connected with each other to achieve a common task.
  • A group of computers that might spread over remotely comprise a grid. A grid is used for a variety of purposes.
  • A single grid, of course, is dedicated to only one particular application at an instance.
  • Grid computing provides large-scale resource sharing which is flexible, coordinated and secure among its users. The users consist of individuals, organizations and resources.
  • Grid computing suits data-intensive storage better than the storage of small objects of a few million bytes.
  • To achieve the maximum benefit from data grids, they should be used for large amounts of data that can be distributed over the grid nodes. Besides the data grid, the computational grid is the other variation of the grid.
  • Grid computing is scalable. Grid computing also forms a distributed network for resource integration.
  • Drawbacks of Grid Computing: grid computing suffers from single-point failure, where underperformance or failure of any of the participating nodes affects the whole grid.
  • A system’s storage capacity varies with the number of users, instances and the amount of data transferred at a given time.
  • Sharing resources among a large number of users helps in reducing infrastructure costs and raising load capacities.
Cluster Computing
  • A cluster is a group of computers connected by a network. The group works together to accomplish the same task. Clusters are used mainly for load balancing. They shift processes between nodes to keep an even load on the group of connected computers. Hadoop architecture uses similar methods.
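The load balancing described above can be sketched as a scheduler that places each new process on the least-loaded node. This is a toy model, assuming hypothetical node names and task costs, not any real cluster scheduler:

```python
# A minimal sketch of cluster load balancing: each incoming task is
# assigned to the node with the smallest current load, keeping the
# load even across the group of connected computers.

def assign(loads, cost):
    """Place a task of the given cost on the least-loaded node."""
    node = min(loads, key=loads.get)   # ties go to the first node listed
    loads[node] += cost
    return node

loads = {"node-a": 0, "node-b": 0, "node-c": 0}
placements = [assign(loads, cost) for cost in (5, 3, 4, 2)]
print(placements)   # ['node-a', 'node-b', 'node-c', 'node-b']
print(loads)        # {'node-a': 5, 'node-b': 5, 'node-c': 4}
```

A real cluster would also migrate running processes when loads drift apart; the sketch only covers initial placement.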
