- In supervised learning, there is a supervisor that provides labelled data.
- The model learns to map inputs to fixed, predefined outputs using this labelled dataset.
- In reinforcement learning (RL), there is no labelled dataset initially.
- Instead, the agent (which can be a human, robot, or computer program like a chatbot) learns by interacting with the environment and receives feedback in the form of rewards or penalties.
- For complex games like Chess and Go, supervised learning isn’t practical:
  - It is hard to build a complete labelled dataset: the number of possible positions is astronomically large, and the best move is unknown for most of them.
- So RL becomes the solution: the agent learns by trial and error.
- In RL, the agent must often make a sequence of decisions, and the effect of a decision may become visible only several moves later (delayed reward). This makes RL harder than supervised learning, where every prediction is compared with its correct label immediately.
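
The trial-and-error loop and the delayed-reward problem described above can be made concrete with a minimal sketch of tabular Q-learning, one standard RL algorithm (these notes do not name a specific one). The corridor environment, its six states, the reward of 1 given only at the goal, and the values of `ALPHA`, `GAMMA` and `EPSILON` are hypothetical choices for illustration. The agent is never told the correct move; it only sees rewards, and the discount factor `GAMMA` lets the final reward flow back to the earlier moves that led to it.

```python
# Minimal sketch: tabular Q-learning on a hypothetical corridor (states 0..5).
# The only reward (+1) arrives at the goal state, so earlier moves get credit only indirectly.
import random

N_STATES = 6                           # state 5 is the goal
ACTIONS = [-1, 1]                      # move left or right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1  # learning rate, discount factor, exploration rate

def step(state, action):
    """Environment: apply the move (walls at both ends) and return (next_state, reward, done)."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done

def greedy(q, state, rng):
    """Pick the action with the highest estimated value, breaking ties at random."""
    best = max(q[(state, a)] for a in ACTIONS)
    return rng.choice([a for a in ACTIONS if q[(state, a)] == best])

def q_learning(episodes=500, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Trial and error: usually exploit current estimates, sometimes explore at random.
            action = rng.choice(ACTIONS) if rng.random() < EPSILON else greedy(q, state, rng)
            next_state, reward, done = step(state, action)
            # Target = immediate reward + discounted value of the best action from the next state.
            best_next = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] += ALPHA * (reward + GAMMA * best_next - q[(state, action)])
            state = next_state
    return q

q = q_learning()
policy = [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)]
print(policy)  # → [1, 1, 1, 1, 1]
```

After training, the learned policy is "always move right", even though only the very last move was ever rewarded directly: the value of reaching the goal has propagated back, discounted by `GAMMA` per step, to the start of the corridor.
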
Differences between Reinforcement Learning & Supervised Learning
Reinforcement Learning | Supervised Learning |
---|---|
No supervisor; no labelled dataset initially | Has a supervisor; labelled dataset available |
Data points are dependent; each move/action affects the next | Data points are independent; model maps input to output directly |
Learns by interacting with the environment through trial and error | Learns from labelled examples provided by the supervisor |
No predefined target class; learning is goal-oriented | Target classes are predefined by the problem |
Example: Chess, Go, Robotics | Example: Classification tasks like spam detection, cancer diagnosis |
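
For contrast with RL, the supervised column can be sketched as learning directly from labelled examples. This is a minimal nearest-centroid classifier, a deliberately simple stand-in for a real spam filter; the two features (links and exclamation marks per email) and the six labelled emails are hypothetical. Each example is used independently, and the labels tell the model the correct output outright.

```python
# Minimal sketch of supervised learning: a nearest-centroid classifier.
# The toy dataset and its labels are hypothetical, for illustration only.
from collections import defaultdict
import math

def fit(X, y):
    """Learn one centroid (mean feature vector) per label from the labelled data."""
    sums = defaultdict(lambda: [0.0] * len(X[0]))
    counts = defaultdict(int)
    for features, label in zip(X, y):
        for i, value in enumerate(features):
            sums[label][i] += value
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}

def predict(centroids, features):
    """Map an input to the label whose centroid is closest (Euclidean distance)."""
    return min(centroids, key=lambda label: math.dist(features, centroids[label]))

# Hypothetical labelled data: [number of links, number of exclamation marks] per email.
X = [[8, 5], [7, 6], [9, 4], [1, 0], [0, 1], [2, 0]]
y = ["spam", "spam", "spam", "ham", "ham", "ham"]

model = fit(X, y)
print(predict(model, [6, 5]))  # → spam
print(predict(model, [1, 1]))  # → ham
```
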
Differences between Reinforcement Learning & Unsupervised Learning
Reinforcement Learning | Unsupervised Learning |
---|---|
Learns a mapping from states (inputs) to actions (outputs), known as a policy | No mapping to output; finds hidden patterns |
Receives feedback from the environment (reward/punishment) after its actions; this feedback can be sparse or delayed | No feedback from the environment |
Goal is to maximize cumulative reward | Goal is to find structure, clusters, or patterns |
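
"Maximize cumulative reward" is commonly made precise with the discounted return, G = r1 + γ·r2 + γ²·r3 + ..., where the discount factor γ (between 0 and 1) weights near rewards more than distant ones. Discounting is a standard convention in RL rather than something these notes state; the reward sequences below are made up for illustration.

```python
# Minimal sketch: the discounted return that an RL agent tries to maximize.
def discounted_return(rewards, gamma=0.9):
    """G = r1 + gamma*r2 + gamma^2*r3 + ..., accumulated backwards from the last reward."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g

# Hypothetical episodes: a reward of 1 at the end of a short path versus a longer one.
print(discounted_return([0, 0, 1]))        # ≈ 0.81 (= 0.9 ** 2)
print(discounted_return([0, 0, 0, 0, 1]))  # ≈ 0.6561 (= 0.9 ** 4)
```

With γ < 1, the same final reward is worth more when it arrives sooner, so an agent maximizing this quantity prefers shorter paths to the goal.
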
Summary
Aspect | Reinforcement Learning | Supervised Learning | Unsupervised Learning |
---|---|---|---|
Initial data | No labelled dataset | Labelled dataset | Unlabelled dataset |
Supervisor | No | Yes | No |
Feedback | From environment (reward/punishment) | From supervisor | None |
Goal | Learn best actions to maximize cumulative reward | Learn mapping from input to output | Discover hidden patterns or clusters |
Example | Chess, Go, robotics | Email spam classifier | Customer segmentation using clustering |
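
The unsupervised example in the summary, customer segmentation by clustering, can be sketched with k-means, one common clustering algorithm (the notes do not specify which). The six customers described by two numeric features are hypothetical. No labels and no rewards are involved: the algorithm only groups points that lie close together.

```python
# Minimal sketch of unsupervised learning: k-means clustering on unlabelled points.
# The 2-D points (e.g. customers described by two features) are hypothetical.
import math
import random

def k_means(points, k, iterations=20, seed=0):
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster (keep it if the cluster is empty).
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters

points = [(1, 1), (1.5, 2), (2, 1), (8, 8), (8.5, 9), (9, 8)]
centroids, clusters = k_means(points, k=2)
print(sorted(centroids))
```

On this well-separated toy data the two clusters found are the three "low" customers and the three "high" ones; on real data the result can depend on the starting centroids.
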