Here is an example: https://builtin.com/data-science/reinforcement-learning-python
Let's assume we are trying to train a cat. The points below are adapted from the article above.
- The cat will be the “agent” that is exposed to the “environment.”
- The environment is a house/play-area depending on what you're teaching.
- The situation the agent encounters is called the “state.” For example, your cat crawling under the bed or running across the room are both states.
- The agent reacts by performing actions to move from one “state” to another.
- After the change in state, we give the agent either a “reward” or a “penalty” depending on the action performed.
- The “policy” is the strategy the agent uses to choose actions in pursuit of better outcomes.
- States: The state is a complete description of the world; no information about the world is hidden from the agent. A state can be static or dynamic, and we usually record states in arrays, matrices or higher-order tensors.
- Action: Actions usually depend on the environment; different environments make different actions available to the agent. The set of valid actions for an agent is called the action space, and it is usually finite.
- Environment: The world in which the agent lives and interacts. Different types of environments call for different rewards, policies, etc.
- Reward and return: The reward function R must be tracked at all times in reinforcement learning. It plays a vital role in tuning and optimizing the algorithm and in deciding when to stop training. The reward depends on the current state of the world, the action just taken, and the next state of the world.
- Policies: A policy is the rule an agent uses to choose its next action. Policies are sometimes called the agent's brain.
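The agent/environment/state/action/reward/policy loop described above can be sketched in code. The sketch below is a minimal tabular Q-learning example built around the cat analogy; the states, actions, transitions and reward values are all invented for illustration and are not from the article.

```python
import random

random.seed(0)  # for reproducibility

# Hypothetical states and actions for the "cat" agent (illustrative only).
STATES = ["under_bed", "on_floor", "scratch_post"]
ACTIONS = ["crawl", "run", "scratch"]

def step(state, action):
    """Environment dynamics: return (next_state, reward).

    The transitions are made up: scratching the post earns a reward,
    crawling under the bed earns a penalty, running is neutral.
    """
    if action == "scratch":
        return "scratch_post", 1.0   # reward for the desired behavior
    if action == "crawl":
        return "under_bed", -0.5     # penalty
    return "on_floor", 0.0           # neutral

# Tabular Q-values, one per (state, action) pair -- what the policy reads.
q = {(s, a): 0.0 for s in STATES for a in ACTIONS}

def policy(state, epsilon=0.1):
    """Epsilon-greedy policy: usually pick the best-known action,
    occasionally explore a random one."""
    if random.random() < epsilon:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q[(state, a)])

alpha, gamma = 0.5, 0.9  # learning rate and discount factor
state = "on_floor"
for _ in range(500):
    action = policy(state)
    next_state, reward = step(state, action)
    # Q-learning update: move Q(s, a) toward reward + discounted best
    # future value from the next state.
    best_next = max(q[(next_state, a)] for a in ACTIONS)
    q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
    state = next_state

print(max(ACTIONS, key=lambda a: q[("on_floor", a)]))  # learned best action
```

After training, the greedy action from any state is "scratch", since it is the only action the toy reward function favors.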
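The "reward and return" bullet above distinguishes the per-step reward from the return, which is the (usually discounted) sum of future rewards. A minimal sketch, with an invented reward sequence for illustration:

```python
def discounted_return(rewards, gamma=0.9):
    """Discounted return: sum rewards, weighting the reward k steps
    into the future by gamma**k."""
    return sum(gamma ** k * r for k, r in enumerate(rewards))

# Hypothetical reward sequence: 1 now, 0 next step, 1 two steps ahead.
g = discounted_return([1.0, 0.0, 1.0])  # 1 + 0.9**2 * 1, i.e. about 1.81
print(g)
```

The discount factor gamma (between 0 and 1) makes near-term rewards count more than distant ones, which keeps the return finite over long horizons.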