Should We Reinforce the Effort or the Results?

Dog using its nose to search a target scent (photo from http://www.houndcrazy.com).

If you ask, “Should we reinforce the effort or the results?” you are liable to get as many answers supporting one idea as the other. Supporters of reinforcing effort argue that reinforcing results creates emotional problems when one doesn’t succeed and decreases the rate of even trying. Supporters of reinforcing results maintain that reinforcing effort encourages sloppiness and cheating.

I shall proceed to argue for and against both theories and prove that it is not a question of either/or, but rather one of defining our criteria, processes and goals clearly.

I shall compare the learning of some skills in dogs and humans because the principles are the same. The difference between them and us is one “of degree, not of kind,” in the words of Darwin.

I will use SMAF to accurately describe some of the processes whenever I consider it advantageous. If you are not proficient in SMAF, you can read the free SMAF manual at http://wp.me/p1J7GF-8Y.

The main difficulty in some learning processes is reinforcing the right behavior at the right time, which bad teachers, bad parents and bad trainers do not master (here bad means inefficient; it is not a moral judgment).

Much of my personal work with dogs (and rats) is and has been detection work, mainly of narcotics and explosives, but also of people, tobacco and other items. One of the first signals I teach the animals is a disguised reinforcer.

With dogs, I use the sound ‘Yes’ (the English word) and with rats a ‘bip….bip….bip’ sound produced on their backpacks and triggered by me. The signal part of this signal/reinforcer means “continue what you’re doing” and the reinforcer part “we’re OK, mate, doing well, keep up.” This is a signal that becomes a reinforcer: Continue,sound(yes) that becomes a “!+sound”(yes).

The difference between the most used “!±sound”(good-job) and “!+sound”(yes) is that the former is associated and maintained with “!-treat”(small food treat) and “!-body”(friendly body language) and the latter with a behavior that will eventually produce “!-treat”. The searching behavior does not produce a treat, but continuing searching does, eventually (find or no find). This is why “!+sound”(yes) is a disguised Continue,sound(yes), or the other way around.

Why do I need this interbreeding between a signal and a reinforcer?

Because the signal ‘Search’ (Search,sound) does not mean ‘Find the thing.’ Sometimes (most of the time) there’s nothing to find, which is a relief for all of us (airports and the like are not that full of drugs and explosives).

So, what does Search,sound mean? What am I reinforcing? The effort?

No, I’m not. We have to be careful because if we focus on reinforcing the effort, we may end up reinforcing the behavior of the animal just strolling around, or any other accidental and/or coincidental behavior.

I am still reinforcing the result. ‘Search’ means, “Go and find out whether there is a thing out there.” ‘Thing’ is everything that I have taught the dog to search and locate for me, e.g. cocaine, hash, TNT, C4.

“Go and find out whether there is a thing out there” leaves us with two options that are equally successful: ‘here’ and ‘clear.’ When there is a thing present, the dog answers ‘here’ by sitting as close to it as possible, or pointing to it (I have taught it those behaviors). When there is no thing, I want the animal to tell me exactly that: the dog answers ‘clear’ by coming back to me (again because I have taught it to do that). We have two signals and two behaviors:

Thing,scent => dog sits (‘here’ behavior).

∅Thing,scent => dog comes back to me (‘clear’ behavior).

The signals are part of the environment; they are not given by me, but that does not matter: a signal (SD) is a signal*. An SD is a stimulus associated with a particular behavior and a particular consequence or class of consequences. When we have two of them, we expect two different behaviors and when there is none, we expect no behavior. What fools us here is that in detection work we always have one and only one SD, either a scent or the absence of one. It is not possible to have none. Either we have a scent or we don’t, which means that either we have Thing,scent or we have ∅Thing,scent, each requiring its own behavior, as usual. The one SD is the absence of the other.

Traditionally, we don’t reinforce a search that doesn’t produce a find. To avoid extinguishing the behavior, we use ‘controlled finds’ (a drug or an explosive that we know is there because we have placed it there to give the animal a chance to obtain a reinforcer).

This solution is correct, except that it teaches the dog that the criterion for success is ‘to find’ and not ‘not to find,’ which is not true. ‘Not to find’ (because there is no thing out there) is as good as ‘to find.’ The tricky part is, therefore, to reinforce the ‘clear’ and how to do it to avoid sloppiness (strolling around) and cheating.

Let us analyze the problem systematically.

The following process does not give us any problems:

{Search,sound ⇒ b1(dog searches) => “!+sound”(yes) or Continue,sound(yes) ⇒ b1(dog searches) ⇒ dog finds thing (Thing,scent) ⇒ b2(dog sits=’here’ behavior) => “!+sound” + “!-treat”};

No problem, but what about when there is no thing (∅Thing,scent)? If I don’t reinforce the searching behavior, I might extinguish it. Then, I reinforce the searching with “!+sound”(yes):

{“Search,sound” ⇒ b1(dog searches) => “!+sound”(yes) ⇒ b1(dog searches) => ∅Thing,scent ⇒ b3(dog comes back to me=’clear’ behavior) => “!+sound” + “!-treat”};

It all looks good, but it poses us some compelling questions:

How do I know the dog is searching versus strolling around (sloppiness)?

How do I know I am reinforcing the searching behavior?

If I reinforce the dog coming back to me, then next time I risk the dog having a quick sniff round and coming straight back to me. That’s the problem. I want the dog to come back to me only when it finds nothing (as in it didn’t find anything).

Problems:

Reinforcing the searching behavior.

Identifying the searching behavior versus strolling around (sloppiness). How can I make sure that the dog always searches and never just strolls around?

Solution:

Reinforcing the searching behavior with “!+sound”(yes) works. OK.

Remaining problem:

I have to reinforce the ‘clear’ behavior (coming back to me), but how can I ensure the dog always searches and never just strolls around (avoid sloppiness)?

How can I make sure the dog has no interest in being sloppy or cheating me?

Solution:

To teach the dog that reinforcers are available if and only if:

1. the dog finds the thing. {Thing,scent ⇒ b2(dog sits) => “!+sound” + “!-treat”};

2. the dog does not ever miss a thing. {∅Thing,scent ⇒ b3(dog comes back to me) => “!+sound” + “!-treat”};

Training:

I gradually teach the dog to find things until I reach a predetermined low concentration of scent (my goal). In this phase of training, there is always one thing to find. After 10 consecutive successful finds (my criterion and quality control measure), all producing reinforcers for both the searching (“!+sound”(yes)) and the finding (“!+sound” + “!-treat”), I set up a situation with no thing present (∅Thing,scent). The dog searches and doesn’t find anything. I reinforce the searching and the finding (no-thing) as previously. Next set-up: I make sure there is a thing to find and I reinforce both searching and finding.
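
The criterion of 10 consecutive successful finds can be written down as a simple check. This is only a sketch in Python, not part of SMAF; the function name `ready_for_blank_trial` and the idea of logging each set-up as True (successful find) or False (miss) are assumptions made for illustration.

```python
from typing import Sequence


def ready_for_blank_trial(finds: Sequence[bool], criterion: int = 10) -> bool:
    """Return True once the most recent `criterion` set-ups were all successful finds.

    `finds` is a hypothetical training log: True for a successful find,
    False for a miss. The default of 10 is the criterion named in the text.
    """
    if len(finds) < criterion:
        return False  # not enough set-ups yet
    return all(finds[-criterion:])  # a single miss in the window breaks the streak
```

Only when this check passes would the next set-up be the one with no thing present (∅Thing,scent).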

I never reinforce not-finding a thing that is there, nor finding a thing that is not there.
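
This rule amounts to a two-by-two table: thing present or absent, crossed with the dog answering ‘here’ or ‘clear.’ A minimal sketch of that table in Python follows; the names `Behavior` and `consequence` and the returned labels are invented for illustration and are not SMAF syntax.

```python
from enum import Enum


class Behavior(Enum):
    HERE = "dog sits ('here' behavior)"
    CLEAR = "dog comes back to me ('clear' behavior)"


def consequence(thing_present: bool, behavior: Behavior) -> str:
    """Return the consequence of one search under the rule in the text.

    thing_present=True stands for Thing,scent and False for ∅Thing,scent.
    The strings returned are illustrative labels only.
    """
    # The correct answer depends only on whether a thing is actually there.
    correct = Behavior.HERE if thing_present else Behavior.CLEAR
    if behavior is correct:
        return "reinforce"  # sound and treat
    return "non-reinforce"  # sound and treat are withheld
```

Two of the four cells are reinforced, and a correct ‘clear’ is worth exactly as much as a correct ‘here.’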

Consequence: the only undesirable situations for a dog are

(1) not-finding a thing that is there (the dog did not indicate Thing,scent), or (2) indicating a thing that is not there (the dog indicates ∅Thing,scent).

(1) {Thing,scent ⇒ b3(dog comes back to me=‘clear’ behavior) => [?+sound] + [?-treat]};

Or:

(2) {∅Thing,scent ⇒ b2(dog sits=‘here’ behavior) => [?+sound] + [?-treat]};

This is (negatively) inhibiting negligence, but since it proves to increase the intensity of the searching, we cannot qualify it as an inhibitor (earlier punisher). Therefore, we call it a non-reinforcer: “∅+sound”, “∅-treat”.

In the first case: Thing,scent => Dog comes back to me => [?+sound] + [?-treat].

Becomes:

Thing,scent => Dog comes back to me => “∅+sound”, “∅-treat”.

Then:

{Thing,scent ⇒ b3(dog comes back to me) => “∅+sound”, “∅-treat” ⇒ b1(dog searches–more intensively) => Thing,scent ⇒ b2(dog sits=‘here’ behavior) => “!+sound” + “!-treat”};

In the second case, I have to be absolutely sure that there is indeed no thing. The training area must be free from any scent remotely similar to the scent we are training (Thing,scent). This is an imperative, especially in the first phases of the training process, and the trainer that misses this point is committing major negligence.

Nevertheless, should the dog show ‘here’ for ∅Thing,scent, then we can use the same procedure as above:

{∅Thing,scent ⇒ b2(dog shows ‘here’ behavior) => “∅+sound”, “∅-treat” ⇒ b1(dog searches–more intensively) => ∅Thing,scent ⇒ b3(dog comes back to me=‘clear’ behavior) => “!+sound” + “!-treat”}.

What if later the dog doesn’t find a thing that is there in a lower concentration than the one I used for training, or is masked by other scents?

That’s no problem; it’s not the dog’s fault. I didn’t train for it. The dog doesn’t know that it is making a mistake by giving me a wrong ‘clear.’ As far as the dog is concerned, the room is clear: {∅Thing,scent ⇒ b3(dog comes back to me) => “!+sound” + “!-treat”}; The dog was not strolling around and is not cheating me.

A human example:

I reinforce the child trying to solve a math problem. “Well done, but you got it wrong because…” The solution is wrong, but the method was correct. Therefore, it is all a question of training. The ‘wrong’ will be eliminated with more or better training, or maybe it was caused by an excessive increase in the difficulty curve of the problem (the teacher’s problem). We are not reinforcing trying; we are reinforcing the correct use of a method.

Why reinforce the process?

We must reinforce the process because of its emotional and motivational consequences. The dog and the child must accept the challenge, must want to be challenged, and be able to give their best in solving the problem. The exercise in itself will eventually end up being self-reinforcing.

Are we reinforcing the effort rather than the success?

No, we are not. Reinforcing the effort rather than the result can even lead to false positives. The animal indicates something that is not there because it associates the reinforcer with the behavior, not the thing. Children give us three or four quick, consecutive, wrong answers if we reinforce the trying, not the process (thinking before answering).

We reinforce result (success) only.

When the dog doesn’t find because there’s nothing to find, that is success. When the dog doesn’t find because the concentration was too low, that is also success because ‘too low’ is here equal to ‘no thing.’ When the child gets it wrong, it is because the exercise exceeded the capacity of the child (he or she has not been taught to that level).

We are still reinforcing success and exactly what we trained the dog and the child to do. We don’t say to the child, “Well, you tried hard enough, good.”

We say, “Well done, you did everything correctly, you just didn’t get it right because you didn’t know that x=2y-z and there was no way of you knowing.”

Next time, the child gets it right because she now knows it; and if not, it is because x=2y-z exceeds the capacity of that particular child in which case there’s nothing you can do about it.

The same goes for the dog: the dog doesn’t indicate 0.01g of cocaine because I trained it to indicate as low as 0.1g. When I reinforce the dog’s ‘clear,’ I say, “Well done, you did everything correctly, you just didn’t get it right because you didn’t know that 0.01g cocaine is still the thing.”

Now, I train the dog that ‘thing’ means ‘as low as 0.01g cocaine’ and either the dog can do it or it cannot. If it can, good; if it cannot, there’s nothing you or I, the dog or the child can do about it.
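
The point that ‘too low’ equals ‘no thing’ can be put as a threshold rule. This is a hedged sketch in Python; the function name `correct_answer` and its parameters are hypothetical, and the amounts are the ones used in the example above.

```python
def correct_answer(amount_g: float, trained_threshold_g: float) -> str:
    """Return the answer that counts as success for a dog trained down to a threshold.

    Any amount below the trained threshold is treated as 'no thing',
    so 'clear' is the correct (and reinforced) answer for it.
    """
    if amount_g >= trained_threshold_g:
        return "here"
    return "clear"


# A dog trained to 0.1g that meets 0.01g of cocaine:
print(correct_answer(0.01, 0.1))
# The same amount after retraining down to 0.01g:
print(correct_answer(0.01, 0.01))
```

What changes between the two calls is the training, not the dog’s effort: retraining moves the threshold, and with it the definition of success.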

Conclusion:

We reinforce result, success, not the effort, not the trying. We must identify success, have clear criteria for success, plan a successive approach to our goal and gradually increase difficulty. We must be able to recognize limits and limitations in ourselves, in the species we work with, in the individuals we tutor, in the particular skill we teach. We must know when we cannot improve a skill any further and when an individual cannot give us more than what we are getting; and be satisfied with that.

Have a great day!

R-

Footnote: * Strictly speaking, the scent that the detection dog searches is not a signal, but a cue, because it is not intentional. In this context, however, it is an SD because we have conditioned it to be so and, therefore, we can call it a signal. Please, see “Signal and Cue—What is the Difference?” at http://wp.me/1J7GF.