Of several responses made to the same situation, those which are accompanied or closely followed by satisfaction to the animal…will, other things being equal, be more firmly connected with the situation…; those which are accompanied or closely followed by discomfort…will have their connections with the situation weakened…The greater the satisfaction or discomfort, the greater the strengthening or weakening of the bond.
(Thorndike, 1911) Thorndike soon gave up work with animals and became an influential educator at Columbia Teachers College. But the Law of Effect, which is a compact statement of the principle of operant reinforcement, was taken up by what became the dominant movement in American psychology in the first half of the twentieth century: behaviorism. The founder of behaviorism was John B. Watson, at Johns Hopkins University.
Clark Hull and his followers sought mathematical laws for learned behavior. Soon, B. F. Skinner, at Harvard, reacted against Hullian experimental methods (group designs and statistical analysis) and theoretical emphasis, proposing instead his radically atheoretical behaviorism. The best account of Skinner's method, approach, and early findings can be found in a readable article -- "A case history in scientific method" -- that he contributed to an otherwise almost forgotten multi-volume project, Psychology: A Study of a Science, organized on positivist principles by editor Sigmund Koch.
A third major behaviorist figure, Edward Chace Tolman, on the West Coast, was close to what would now be called a cognitive psychologist and stood rather above the fray. Skinner opposed Hullian theory and devised experimental methods that allowed learning animals to be treated much like physiological preparations. His atheoretical approach was nevertheless valuable because it introduced an important distinction between reflexive behavior, which Skinner termed elicited by a stimulus, and operant behavior, which he called emitted because, when it first occurs, it is not tied to any identifiable eliciting stimulus.
Operant behavior, emitted at first and only then selected by its consequences, is strikingly parallel to Darwinian variation and selection. Skinner and several others noted this connection, and it has become the dominant view of operant conditioning. Reinforcement is the selective agent, acting via temporal contiguity (the sooner the reinforcer follows the response, the greater its effect), frequency (the more often these pairings occur, the better), and contingency (how well the target response predicts the reinforcer).
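The first two of these factors lend themselves to a simple toy model. The sketch below is only illustrative (the exponential decay and the time constant `tau` are assumptions of ours, not claims from the text): each response-reinforcer pairing contributes a credit that shrinks with delay and adds up with repetition.

```python
import math

# Toy model (assumed for illustration, not from the source): each
# response-reinforcer pairing contributes a credit that decays
# exponentially with the delay, and credits accumulate over pairings.

def bond_strength(delays_s, tau=2.0):
    """Sum of discounted credits; delays_s lists the seconds between
    each response and the reinforcer that followed it."""
    return sum(math.exp(-d / tau) for d in delays_s)

print(bond_strength([0.5]))            # one prompt pairing: large credit (~0.78)
print(bond_strength([8.0]))            # one delayed pairing: small credit (~0.02)
print(bond_strength([0.5, 0.5, 0.5]))  # frequent prompt pairings: larger still (~2.3)
```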
It is also true that some reinforcers are innately more effective with some responses - flight is more easily conditioned as an escape response in pigeons than pecking, for example.
Contingency is easiest to describe by example. Suppose we reinforce with a food pellet every 5th occurrence of some arbitrary response, such as lever pressing by a hungry lab rat. The rat presses at a certain rate, say 10 presses per minute, on average getting a food pellet twice a minute. Now suppose we also deliver "free" food pellets at random times, independently of the rat's behavior. Will he press more, or less? The answer is less. Lever pressing is less predictive of food than it was before, because food sometimes occurs at other times. Exactly how all this works is still not understood in full theoretical detail, but the empirical space — the effects on response strength (rate, probability, vigor) of reinforcement delay, rate, and contingency — is well mapped.
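The degradation of contingency in this example can be made concrete with a small calculation. The sketch below is a hypothetical illustration (this simple share-of-pellets measure is one crude choice among many, not a standard definition from the text):

```python
# Numbers follow the example in the text: an FR-5 schedule with 10
# presses/min yields 2 earned pellets/min; we then add 2 free pellets/min.

earned_per_min = 2.0   # pellets produced by lever pressing
free_per_min = 2.0     # response-independent "free" pellets

def predictiveness(earned, free):
    """Share of all pellets that lever pressing accounts for (a crude
    stand-in for how well the response predicts food)."""
    return earned / (earned + free)

print(predictiveness(earned_per_min, 0.0))           # 1.0: food only after pressing
print(predictiveness(earned_per_min, free_per_min))  # 0.5: pressing predicts less
```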
What happens during operant conditioning? In Thorndike's puzzle-box experiments, a hungry cat was confined in a box whose door could be opened by some arbitrary act, such as pulling a loop of string. The experimenter intervened no further, allowing the animal to do what it would until, by chance, it made the correct response. The result was that, according to what has sometimes been called the principle of postremity, the tendency to perform the act closest in time to the reinforcement — the opening of the door — was increased.
Notice that this account emphasizes the selective aspect of operant conditioning: the way the effective activity, which occurs at first "by chance," is strengthened or selected until, within a few trials, it becomes dominant. How learning is shaped by its consequences also remains a focus of current research. Omitted is any discussion of where the successful response comes from in the first place.
It is something of a historical curiosity that almost all operant-conditioning research has been focused on the strengthening effect of reinforcement and almost none on the question of origins: where the behavior comes from in the first place (the problem of behavioral variation, to pursue the Darwinian analogy).
Some light is shed on the problem of origins by Pavlovian conditioning, a procedure that has been studied experimentally even more extensively than operant conditioning. In the present context, perhaps the best example is something called autoshaping, which works like this: a hungry, experimentally naive pigeon (Figure 2) that has learned to eat from the food hopper (H) is placed in a Skinner box.
Every 60 seconds or so, on average, the response key (K) lights up for 7 s. As soon as it goes off, the food hopper comes up for a second or two, allowing the bird to eat. No other behavior is required and nothing the bird does can make the food come any sooner. Nevertheless, after a few trials, the pigeon begins to show vigorous, stereotyped key-pecking behavior when the key light (called the conditioned stimulus, or CS) comes on.
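For concreteness, here is a minimal sketch of the trial cycle just described. The timings follow the text; the exponential distribution of intertrial intervals and all identifier names are assumptions of ours.

```python
import random

def autoshaping_trial():
    iti = random.expovariate(1 / 60.0)       # dark key; ~60 s on average
    print(f"intertrial interval: {iti:5.1f} s (key dark)")
    print("keylight (CS) on for 7.0 s")      # no response is required
    print("food hopper up for 2.0 s")        # food arrives regardless

for _ in range(3):
    autoshaping_trial()
# Nothing the bird does can hasten the food, yet after a few such
# trials pigeons begin pecking the lit key vigorously.
```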
Eventually, the pigeon is likely to peck the key even if a contingency is set up such that key-pecking causes the removal of the food. This conditioned response (CR) is an example of classical conditioning: behavior that emerges as a consequence of a contingent relationship between a stimulus, the CS, and a reinforcer — in this context termed the unconditioned stimulus (US).
Autoshaping, and a related phenomenon called superstitious behavior, have played an important role in the evolution of our understanding of operant conditioning. In the present context, autoshaping illustrates one of the mechanisms of behavioral variation that generate behavior in advance of operant (i.e., consequential) reinforcement.
A stimulus (like the CS) that predicts food generates, via built-in mechanisms, a repertoire that is biased toward food-getting behaviors — behaviors that in the evolution of the species have been appropriate in the neighborhood (both spatial and temporal) of food.
The usual conditioned response in classical conditioning experiments is what Skinner called a respondent, a reflexive response such as salivation, eyeblink, or the galvanic skin response (GSR). The general principle that emerges from these experiments is that the predictive properties of the situation determine the repertoire, the set of activities from which consequential (operant) reinforcement can select.
Moreover, the more predictive the situation, the more limited the repertoire may be, so that in the limit the subject may behave in a persistently maladaptive way — just so long as it gets a few reinforcers.
Many of the behaviors termed instinctive drift are like this. There is also an optimal level of arousal for a given learning task: when arousal becomes too high, performance decreases. This bitonic relation seems to be the result of two opposed effects: on the one hand, the more predictive the situation, the more vigorously the subject will behave (good); on the other, the more limited the resulting repertoire is likely to be (bad). Autoshaping was so named because it is often used instead of manual shaping by successive approximations, which is one of the ways to train an animal to perform a complex operant task.
Shaping is a highly intuitive procedure that shows the limitations of our understanding of behavioral variation. The trainer begins by reinforcing the animal for something that approximates the target behavior. If we want the pigeon to turn around, we first reinforce any movement; then any movement to the left, say; then we wait for a more complete turn before giving food, and so on.
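To make the successive-approximations logic explicit, here is a toy simulation. It is entirely hypothetical: the "animal," its behavior rule, and the criterion schedule are stand-ins we invented, not a model from the text.

```python
import random

class ToyAnimal:
    """Emits turn angles; reinforcement makes the reinforced angle the
    new baseline around which behavior varies."""
    def __init__(self):
        self.baseline = 0.0

    def emit(self):
        return self.baseline + random.uniform(0, 120)  # degrees turned

    def reinforce(self, angle):
        self.baseline = angle          # reinforced acts tend to recur

animal = ToyAnimal()
# Criteria tighten from "any movement" toward a full 360-degree turn.
for criterion in [30, 90, 180, 270, 360]:
    trials = 0
    while True:
        trials += 1
        angle = animal.emit()
        if angle >= criterion:         # a close-enough approximation?
            animal.reinforce(angle)    # deliver "food"
            break
    print(f"criterion {criterion:3d} deg met after {trials} trial(s)")
```

Each pass through the loop waits for a behavior that meets the current criterion, reinforces it, and then tightens the criterion, exactly the wait-reinforce-tighten cycle described above.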
But if the task is more complex than turning — if it is teaching a child to do algebra, for example — then the intermediate tasks that must be reinforced before the child masters the end goal are much less well defined. Should he do problems by rote in the hope that understanding eventually arrives? And, if it does, why?
Or should we let the pupil flounder, and learn from his mistakes? A few behaviorists deny that there even is such a thing as understanding. Operant conditioning, sometimes referred to as instrumental conditioning, is a method of learning that employs rewards and punishments for behavior. Through operant conditioning, an association is made between a behavior and a consequence (whether negative or positive) for that behavior.
For example, when lab rats press a lever when a green light is on, they receive a food pellet as a reward. When they press the lever when a red light is on, they receive a mild electric shock. As a result, they learn to press the lever when the green light is on and avoid the red light. But operant conditioning is not just something that takes place in experimental settings while training lab animals.
It also plays a powerful role in everyday learning. Reinforcement and punishment take place in natural settings all the time, as well as in more structured settings such as classrooms or therapy sessions.
Operant conditioning was first described by behaviorist B. F. Skinner, which is why you may occasionally hear it referred to as Skinnerian conditioning. As a behaviorist, Skinner believed that it was not really necessary to look at internal thoughts and motivations in order to explain behavior. Instead, he suggested, we should look only at the external, observable causes of human behavior. Through the first part of the 20th century, behaviorism became a major force within psychology.
The ideas of John B. Watson dominated this school of thought early on. Watson focused on the principles of classical conditioning , once famously suggesting that he could take any person regardless of their background and train them to be anything he chose.
Early behaviorists focused their interests on associative learning. Skinner was more interested in how the consequences of people's actions influenced their behavior. Skinner used the term operant to refer to any "active behavior that operates upon the environment to generate consequences."
His theory was heavily influenced by the work of psychologist Edward Thorndike , who had proposed what he called the law of effect. Operant conditioning relies on a fairly simple premise: Actions that are followed by reinforcement will be strengthened and more likely to occur again in the future. If you tell a funny story in class and everybody laughs, you will probably be more likely to tell that story again in the future. If you raise your hand to ask a question and your teacher praises your polite behavior, you will be more likely to raise your hand the next time you have a question or comment.
Because the behavior was followed by reinforcement, or a desirable outcome, the preceding action is strengthened. Conversely, actions that result in punishment or undesirable consequences will be weakened and less likely to occur again in the future.
If you tell the same story again in another class but nobody laughs this time, you will be less likely to repeat the story again in the future. If you shout out an answer in class and your teacher scolds you, then you might be less likely to interrupt the class again.
Skinner distinguished between two different types of behaviors: respondent behaviors and operant behaviors. While classical conditioning could account for respondent behaviors, Skinner realized that it could not account for a great deal of learning. Instead, Skinner suggested that operant conditioning held far greater importance. Skinner had invented various devices during his boyhood, and he put these skills to work during his studies on operant conditioning. He created a device known as an operant conditioning chamber, often referred to today as a Skinner box.
The chamber could hold a small animal, such as a rat or pigeon. The box also contained a bar or key that the animal could press in order to receive a reward. In order to track responses, Skinner also developed a device known as a cumulative recorder. In the early 20th century, psychologists had grown very interested in the study of learning, and the concept of classical conditioning had already been proposed. Many psychologists of the period believed that learning was a mental and emotional process, and that the best way of studying behavior and learning was by looking at the internal thoughts and motivations of an individual. While Skinner did not deny that internal thoughts and motivations can influence behavior, he thought that viewing them as the key drivers was too simplistic to explain complex human behavior. Operant conditioning is instead based on a simple premise: actions that are reinforced are strengthened and more likely to be repeated in the future, while actions that lead to unfavorable outcomes are weakened and less likely to be repeated. For example, if you take some risks at work and your boss praises you for your courage, you are more likely to take another risk in the future.
If you purchase from a particular store and they give you a discount, you are likely to shop at the same store again in the future. In this case, receiving praise from your boss and receiving a discount from the store are positive reinforcements that encourage your behavior. The outcomes of your actions were desirable, thus strengthening the preceding actions.
Conversely, actions that lead to punishment are weakened and less likely to be repeated. If you took a risk at work and your boss scolded you for acting without running things by him first, you would be less likely to take another risk at work. Similarly, if you shop at a particular store and later realize they sold you a low-quality product, you are less likely to shop there in the future.
In this case, the scolding from your boss and the poor-quality product are undesired outcomes, or punishments. To test his theory, Skinner invented the operant conditioning chamber, also known as the Skinner box, which he used to conduct experiments with animals.
The operant conditioning chamber allowed Skinner to isolate small animals, such as rats and pigeons, and then expose them to carefully controlled stimuli. Skinner also came up with another invention known as the cumulative recorder, which allowed him to keep a record of response rates (the number of times an animal pressed a key or bar inside the Skinner box over time).
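As a small illustration of what the cumulative recorder captures (the press times below are invented for demonstration), the record simply steps up at each response, so steep stretches of the curve correspond to high response rates:

```python
# Hypothetical lever-press times, in seconds since the session began.
press_times = [2.1, 3.5, 9.8, 10.2, 10.9, 11.5, 30.0]

# The cumulative record pairs each press with the running total so far;
# the recorder's pen steps up by one at each response.
record = [(t, n) for n, t in enumerate(press_times, start=1)]

for t, n in record:
    print(f"t = {t:5.1f} s   cumulative presses = {n}")
# The cluster around t = 10 s would appear as a steep (high-rate)
# segment of the cumulative curve; the gap before t = 30 s as flat.
```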
Skinner stated that individuals (both humans and animals) display two key types of behaviors. The first type is known as respondent behaviors: actions performed automatically and reflexively. A good example of respondent behavior occurs when you touch something hot. Without thinking about it, you immediately draw your hand back from the hot surface.
Similarly, dogs automatically and involuntarily salivate at the presentation of food. By ringing a bell every time before presenting food to his dogs, Pavlov formed an association between the ringing of a bell and the presentation of food, and his dogs learned to salivate when they simply heard a bell, even if no food was presented. Skinner noted that classical conditioning was good at explaining how respondent behaviors affected learning. However, not all learning is based on respondent behaviors.
According to Skinner, the greatest learning came from voluntary actions and their consequences. The second type of behavior that Skinner identified is known as operant behavior. Skinner defined operant behaviors as voluntary behaviors that act upon the environment, resulting in consequences. Unlike respondent behaviors, operant behaviors are under our conscious control and can be learned voluntarily. According to Skinner, the outcomes of our actions have a major impact on the process of learning operant behaviors.
We noted earlier that operant conditioning is based on two major factors: reinforcement and punishment. Let us take a look at these two factors. Reinforcement refers to any environmental consequence of an action that increases the likelihood of the action being repeated; reinforcement strengthens behavior. There are two types of reinforcement. Positive reinforcement: This refers to consequences in which a favorable event or outcome is added following a certain behavior, leading to the strengthening of the behavior.
For example, when you go the extra mile and receive praise from your boss, that is an example of positive reinforcement. To show how positive reinforcement works, Skinner placed a hungry rat in the operant conditioning chamber. On one side of the chamber was a lever that dropped food pellets into the chamber when pressed. As the rat moved around the box, at some point it would accidentally press the lever, resulting in a pellet of food being dropped into the chamber immediately.
Over time, the rat would learn that pressing the lever led to food being released, and it quickly learned to go directly to the lever whenever it was placed in the chamber. Receiving food every time it pressed the lever acted as positive reinforcement, ensuring that the rat would keep pressing the lever again and again. Negative reinforcement: This refers to consequences in which an unfavorable event or outcome is removed following a certain behavior.
In this case, the behavior is strengthened not by the desire to get something good, but rather by the desire to get out of an unpleasant condition. A good example of negative reinforcement is a teacher promising to exempt students who have perfect attendance from the final test.
This encourages them to attend all classes. Such consequences are referred to as negative reinforcement because the removal of the unfavorable event or outcome is rewarding to the individual.
While they have not actually received anything, not sitting a test can still be seen as a reward. To show how negative reinforcement works, Skinner placed a rat in the operant conditioning chamber and then delivered an unpleasant electric current through the floor of the chamber. As the rat moved about in discomfort, it would accidentally knock the lever, switching off the electric current immediately.
Over time, the rat learns that it can escape the unpleasant electric current by pressing the lever, and it starts going directly to the lever every time the current is switched on. Punishment refers to any adverse or unwanted environmental consequence of an action that reduces the probability of the action being repeated. In other words, punishment weakens behavior. There are two types of punishment. Positive punishment: This refers to consequences in which an unfavorable or unpleasant event or outcome is presented or applied following a certain behavior in order to discourage the behavior.
For instance, when you get fined for a traffic infraction, that is an example of positive punishment. An unfavorable outcome (payment of the fine) is applied to discourage you from committing the infraction again.
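The standard scheme pairs two types of punishment with the two types of reinforcement, and the full two-by-two can be summarized in a few lines of code. This is only a mnemonic sketch; the function and its examples are ours, not the article's:

```python
def classify(stimulus_added: bool, behavior_increases: bool) -> str:
    """'Positive' means a stimulus is added, 'negative' that one is
    removed; reinforcement strengthens behavior, punishment weakens it."""
    sign = "positive" if stimulus_added else "negative"
    kind = "reinforcement" if behavior_increases else "punishment"
    return f"{sign} {kind}"

print(classify(True,  True))    # praise given, behavior up     -> positive reinforcement
print(classify(False, True))    # shock removed, behavior up    -> negative reinforcement
print(classify(True,  False))   # fine imposed, behavior down   -> positive punishment
print(classify(False, False))   # privilege lost, behavior down -> negative punishment
```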