REINFORCEMENT LEARNING AS MACHINE LEARNING

  • In supervised learning, there is a supervisor that provides labelled data.
  • The model learns to map inputs to fixed, predefined outputs using this labelled dataset.
  • In reinforcement learning (RL), there is no labelled dataset initially.
  • Instead, the agent (which can be a human, robot, or computer program like a chatbot) learns by interacting with the environment and receives feedback in the form of rewards or penalties.
  • For complex games like chess and Go, supervised learning alone isn't practical:
    • The number of possible positions is astronomically large, so a labelled dataset giving the best move for every position cannot be built.
    • RL offers a way out: the agent plays, observes the outcome, and improves by trial and error.
  • In RL, the agent often has to make a sequence of decisions, and the effect of a decision may become visible only several moves later. This delayed feedback makes it hard to tell which action deserves the credit, which is one reason RL is harder than supervised learning.
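
The trial-and-error loop described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not a reference implementation: the corridor environment, the `step` and `train` functions, and the values of `alpha`, `gamma` and `epsilon` are all hypothetical choices made for this example. The learning rule used is tabular Q-learning, one common RL method among many. The reward arrives only when the agent reaches the last cell, so feedback is delayed by several moves.

```python
import random

# Hypothetical toy environment: a corridor of 5 cells (states 0..4).
# The agent starts in cell 0; reaching cell 4 gives reward +1 and ends the episode.
# Every other step gives reward 0, so feedback arrives only after several moves.
N_STATES = 5
ACTIONS = (-1, +1)  # index 0 = move left, index 1 = move right


def step(state, action):
    """Apply an action and return (next_state, reward, done)."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration (trial and error)."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # q[state][action_index]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Explore with probability epsilon (or when both estimates tie),
            # otherwise exploit the action with the higher estimate.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                a = rng.randrange(2)
            else:
                a = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = step(state, ACTIONS[a])
            # Move the estimate toward reward + discounted best future value.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
    return q


q_table = train()
# Greedy action for each non-terminal state after training.
policy = ["left" if row[0] > row[1] else "right" for row in q_table[:-1]]
print(policy)  # ['right', 'right', 'right', 'right']
```

No labelled "correct move" is ever given to the agent here. It discovers that moving right is best purely from the reward it receives at the end of each episode.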

Differences between Reinforcement Learning & Supervised Learning

| Reinforcement Learning | Supervised Learning |
|---|---|
| No supervisor; no labelled dataset initially | Has a supervisor; labelled dataset available |
| Data points are dependent; each move/action affects the next | Data points are independent; model maps input to output directly |
| Learns by interaction with environment and trial-error | Learns from labelled examples provided by supervisor |
| No predefined target class; learning is goal-oriented | Target classes are predefined by the problem |
| Example: Chess, Go, Robotics | Example: Classification tasks like spam detection, cancer diagnosis |
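
For contrast, here is a minimal sketch of the supervised side, where every training example already carries its label. Everything in it is an assumption made for illustration: the two features (number of links, number of "!" characters), the six data points, and the choice of a 1-nearest-neighbour rule as the classifier.

```python
# Hypothetical labelled dataset: (number of links, number of "!" characters) -> label.
TRAINING_DATA = [
    ((0, 0), "ham"),
    ((1, 0), "ham"),
    ((0, 1), "ham"),
    ((7, 5), "spam"),
    ((9, 3), "spam"),
    ((6, 8), "spam"),
]


def predict(features):
    """Return the label of the closest training example (1-nearest neighbour)."""
    def squared_distance(example):
        point, _label = example
        return sum((a - b) ** 2 for a, b in zip(point, features))

    _point, label = min(TRAINING_DATA, key=squared_distance)
    return label


print(predict((8, 4)))  # spam
print(predict((1, 1)))  # ham
```

Each prediction depends only on the input and the labelled examples. Unlike the RL setting, earlier predictions have no effect on later ones.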

Differences between Reinforcement Learning & Unsupervised Learning

| Reinforcement Learning | Unsupervised Learning |
|---|---|
| Learns a mapping from situations (states) to actions | No mapping to an output; finds hidden patterns |
| Gets feedback from the environment (reward/punishment), sometimes delayed | No feedback from environment |
| Goal is to maximize cumulative reward | Goal is to find structure, clusters, or patterns |
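
"Cumulative reward" can be made precise with a short formula. The notation below is an assumption introduced here, not something the notes above define: r_t is the reward received at step t, T is the last step of the episode, and γ is a discount factor between 0 and 1 that weights later rewards less (with γ = 1 it is a plain sum).

```latex
G_t = \sum_{k=0}^{T-t-1} \gamma^{k}\, r_{t+k+1}, \qquad 0 \le \gamma \le 1
```

As a worked example, if the agent receives rewards 0, 0, 1 over three steps and γ = 0.9, then G_0 = 0 + 0.9 × 0 + 0.81 × 1 = 0.81.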

Summary

| Aspect | Reinforcement Learning | Supervised Learning | Unsupervised Learning |
|---|---|---|---|
| Initial data | No labelled dataset | Labelled dataset | Unlabelled dataset |
| Supervisor | No | Yes | No |
| Feedback | From environment (reward/punishment) | From supervisor | None |
| Goal | Learn best actions to maximize cumulative reward | Learn mapping from input to output | Discover hidden patterns or clusters |
| Example | Chess, Go, robotics | Email spam classifier | Customer segmentation using clustering |
