
Psi Experiments website announces the launch of its third psi experiment

Mind-Energy.net is proud to announce the launch of its third Psi Experiment. Several measures were taken to make the results of this experiment more credible, after the design issues of the first two experiments were taken into account.


In this experiment you’ll be shown a photograph of a random playing card on a table, and you’ll need to answer whether the card’s suit color is red or black. You can see it as a simple game.

Each time you play this game, the computer reshuffles the cards and you’re presented with another photograph of another card. So, you can play as long as you want. After every 20 trials, you’ll immediately be able to see the statistics of your trials.

So, don’t wait: go to the Third Psi Experiment.

62 Comments

  1. Sorry, I misread your statement. Oddly enough, I just assumed that you were talking about a central point rather than a minor one. I had not originally said that I had only looked at the protocol for five minutes. I did say repeatedly later on that I had only looked at it briefly. My statement should have been “I also said after looking at the problem for about 5 minutes that it looked” rather than “I also said that after looking at the problem for about 5 minutes that it looked”. Notice the ungrammatical repeated “that”, indicating an error in revising a sentence. Normally one relies on people to interpret small errors in the most reasonable ways rather than to assume a misstatement.

    The central point — an initial evaluation, later stated to be quick, revealed the problem. Contrary to your implication I was not reluctantly brought to that opinion by the brilliant Skeptic. A bit more evaluation on my part revealed the problem to be more severe than I had at first thought it to be in practice. You claimed that it was even worse, and I leaned to the opinion that it probably wasn’t. It was all academic at that point, of course. Eventually you proved your point — it didn’t mean a whole lot in practice, but it seemed important to you. Whatever makes you feel good.

    Your representation that “It took some convincing, but eventually the psi-believers here accepted that this experiment is vulnerable to cheating” has zero bearing on the facts, and your arguing about whether or not I had previously said that I agreed with the conclusion you later stated after 5 minutes of effort is meaningless. The flaw is obvious and was never under dispute. You found a clever way to exploit it; good work.

  2. Topher Cooper for some odd reason claimed:
    “I said from the beginning that this was not a secure protocol and that it could not be relied on for evidence of psi. I also said that after looking at the problem for about 5 minutes that it looked likely to me that it would take enough effort to take significant advantage of the weakness that the results might be interesting though unreliable.”

    No Topher, you didn’t say “after looking at the problem for about 5 minutes.” You may now wish you had said that, but you did not.

    “paraphrasing: ‘It’s useless trash because I say so! It clearly is flawed and anything flawed is worthless.’”

    No Topher; I quoted Dean Radin on a “lesson learned” from his on-line tests: “/Any weakness in a freely accessible, Web-based experiment will be exploited, no matter how apparently inconsequential that weakness may be./” I cited a clear statement of a noted Parapsychologist on just this point, and you paraphrase it as me saying “because I said so.” Why act like that, Topher?

    And check it out: your assessment that such explanations of a positive result would be of fairly low probability was *after* I cited Radin’s warning. On the other hand, your conclusion there was that if psi exists and we get a positive result, then psi *might* be in play. True, but what is the experiment for, when one can conclude as much from omphaloskepsis?

    Correct me if I’m wrong: you also thought you could detect cheating in post-analysis. I don’t think so. In my attack, there are two kinds of runs: the first are truly random; after those, the cheater knows every card and can induce any outcome he wants. What detection can you do that a cheater cannot beat?

    Ultimately, I don’t think you grasp the depth of your naivete. All along, you seemed to think that the flaws and exploits you saw at the time were the limits of how one could cheat. You don’t seem to have learned much beyond how well one could cheat in this particular case.

    -Bryan

    • It is true that, because of lack of time, I had not completed nor sent the comments on the weakness that I had intended to send to Jacob. My first comment on the issue *was* that the experiment was not evidentially valid. At that point you had presented no particular reason to believe that other than your usual “I have spoken!” stuff. Frankly, that doesn’t count as any reason to revise my opinions — what I said was what I believed all along.

      And I was quite aware of Dean’s statement, and I agree with it — under the conditions he ran his test. Hacking can never be completely ruled out (disrupting psi experiments is something of particular interest to Skeptics, and there is, unfortunately, a minority who seem to believe that ethics does not apply as long as its violation is in the interests of *TRUTH* as they see it), but a one-month experiment not widely publicized and not run by a best-selling author with multiple television appearances is a lot less tempting than Dean’s on-line experiment.

      So, I thought that it was less of a problem. But you seem to have difficulty with words like “less” — you always seem to be arguing in black and white — as if when I say that something is less likely I somehow really mean that it is no problem.

      And yes, I did think that we might be able to find some statistical trace (depending on what information is available about source URLs etc.) of some forms of hacking and cheating. I did not ever think nor say that we would be able to detect any form of cheating (more black and white interpretation here). If good records were kept of usage and forensic analysis showed no sign of faking, then that would logically raise confidence *some*, though never fully. After all, if there is one lesson to be learned from decades of computer security, it’s that while some hackers are very clever, most hackers are complete idiots.

      The real world is filled with uncertainty. I am not “naive” because I refuse to deal with that. Part of normal science is that even poor, inconclusive experiments can be analyzed to suggest hypotheses that might be examined in a more rigorous setting.

  3. It took some convincing, but eventually the psi-believers here accepted that this experiment is vulnerable to cheating. In private e-mail, one asked me how to do the undetectable cheating I claimed to be possible. I had stated, “a straightforward solution will usually find all the cards after 35 random runs.” A “run” is the sequence of 20 guesses that the web site scores. Since writing that, I’ve extended the method, and my tests show that I usually get a unique solution after 12 runs. First I’ll describe the straightforward solution, which is to solve for 36 unknowns with 36 simultaneous linear equations.

    Define an analogous experiment: instead of black and red, we use -1 and 1. Instead of the number right out of 20, our score is the sum of (guess * card) over the 20 cards in the run, which is the same as adding 1 for a right guess and subtracting 1 for a wrong guess. Importantly, from the score the web site provides, call it ‘x’, we can compute the score I’ve defined here, call it ‘s’, as s = (2 * x) – 20.

    There are 36 cards and thus 36 unknowns, call them c1, c2, …, c36. Each run gives us a linear equation, for example:

    1*c1 + 0*c2 + -1*c3 + … + -2*c36 = -4

    In this example I guessed 1 (red) for card c1. Card c2 did not come up in the run. I guessed -1 (black) for card c3, and so on through card c36, which came up twice and I guessed black both times. The web site said I got 8 right out of 20, which is four more wrong guesses than right guesses, so the sum is -4.

    We get one equation for free: there are as many black cards as red cards, so the sum of all the c’s is zero. I need 35 more equations to solve for all the unknowns. There is a chance that they will not all be linearly independent, but it’s rare. The equations are linear, so solution is easy; see:

    http://en.wikipedia.org/wiki/System_of_linear_equations

    I can get by with fewer runs by guessing at some of the cards. Guessing one card gives me one equation. I have to try all the possibilities, so guessing at 5 cards means solving a system of 36 linear equations 2**5 = 32 times. I can distinguish wrong guesses when something other than plus or minus one appears in the solution vector.

    I was surprised how well card-guessing worked. After 11 runs, there were usually multiple possible solutions; after 12, there was usually just one. The computation gets slow, and were I starting anew, I think I would use a more brute-force-like method from the start.
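
    (Here is a minimal sketch in Python of the straightforward 36-equation solution described above: a reconstruction for illustration, assuming numpy is available, that simulates the web site’s scoring rather than querying it.)

        import numpy as np

        # Simulated illustration of the linear-equation attack: 36 unknown card
        # colours, one "free" equation (equal numbers of red and black cards),
        # plus 35 random probing runs of 20 guesses each.
        rng = np.random.default_rng(0)
        true_cards = rng.permutation([1] * 18 + [-1] * 18)   # +1 = red, -1 = black

        A = [np.ones(36)]   # the free equation: the card values sum to zero
        b = [0.0]
        for _ in range(35):
            coeffs = np.zeros(36)
            for card in rng.integers(0, 36, size=20):    # 20 showings per run, repeats allowed
                coeffs[card] += rng.choice([-1, 1])       # a random guess for each showing
            # In a real attack the score s comes from the reported hits x via s = 2*x - 20;
            # here it is computed directly from the simulated deck.
            A.append(coeffs)
            b.append(coeffs @ true_cards)

        # Solve the 36x36 system; lstsq also copes with the rare rank-deficient case.
        solution = np.linalg.lstsq(np.array(A), np.array(b), rcond=None)[0]
        recovered = np.sign(np.round(solution))
        print("recovered every card:", np.array_equal(recovered, true_cards))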

    Many alleged demonstrations of psychic ability depend on information being unavailable by normal means. Here we have an example of how naive psychic believers tend to be in thinking they have controlled normal channels.

    • I realize that creating your own version of events makes it so much easier to declare a victory, and declare everyone who disagrees with you as stupid.

      I said from the beginning that this was not a secure protocol and that it could not be relied on for evidence of psi. I also said that after looking at the problem for about 5 minutes that it looked likely to me that it would take enough effort to take significant advantage of the weakness that the results might be interesting though unreliable.

      At about that point you came up with your logical argument — paraphrasing: “It’s useless trash because I say so! It clearly is flawed and anything flawed is worthless.” I pointed out then that I agreed it was flawed. I also spent about 10 more minutes of analysis and noted that the flaw was easier to exploit than I had thought, so that I no longer believed that a positive outcome was of any particular interest.

      At that point I still felt that I had reason — not convincing evidence, but reason — to believe that it would probably be difficult to completely bust the experiment wide open.

      That’s the point when you stopped making unsupported pronouncements and actually presented some evidence. As I said then: “Well done!” You obviously worked very hard on this. I did a quick check of your solution and my first try didn’t come out the same as yours, so I checked some stuff with you. When I had confirmed that you *had* done it as it should have been done, I went back to my version and found my error (we call this process “replication”).

      If you mean by “how naive psychic believers tend to be in thinking they have controlled normal channels” that you are speaking about Jacob up to the point when I let him know that this was not secure, then your statement is accurate, though unnecessarily hostile and not backed up at all by the rest of your posting (though it provided some details of already established information). If you are referring to me, then it is completely contrary to the facts.

      I could say that “here we have an example of how naive so-called Skeptics are only interested in facts and logic when they support their faithful beliefs.”

  4. I just realized that Bryan believes firmly and totally in psi — he just doesn’t realize it. You see, he believes that independent recording of targets separately from calls (where the targets and calls are recorded by different people who have no contact with each other) and automatic recording of targets and calls are insufficient to avoid bias if the people conducting the experiment have a positive intent about the outcome. No proposed mechanism. But if the targets are supplied by someone hostile to a positive result, this effect disappears.

    Sounds like psi to me.

    And Bryan — I don’t have the time nor the interest in being your research assistant. By the way, I used a cryptographic protocol to allow multiple observers to control the target presentation in an on-line experiment about 10 years ago. Didn’t get the turnout necessary to reach my predesignated criteria though, so it isn’t published. Since it is non-peer reviewed, it doesn’t count in one way, but it does show that it isn’t exactly radical.

    Shoot away, I’m way behind at work.

  5. “So you are saying that your continued use of the latter as if it were the former is what? … deliberate deception?”

    I cannot tell whether this is a result of you making up more positions for me, or another result of you not knowing what you are talking about. Where did I use one for the other?

    “You claimed a ‘consistent pattern’ but instead of citing a survey you cite two isolated cases”

    Two cases show it happens, and the lack of counter-examples shows consistency. Where are the psi demonstrations under the controls of respected skeptics? Randi’s group has tested claims of psi many times. The results are entirely consistent.

    You can go to YouTube and watch video of Geller fail to perform on national T.V. You might not like Project Alpha, but it was something of a turning point; Parapsychology made much less of the abilities of exceptional individuals after the project.

    “Particle physics certainly has consistent results.”

    Great. Whatever you were on about, it has nothing to do with my case against Parapsychology.

    “I’ve shown you the ‘sun’ over and over again. You have replied that you will only believe it if it shows up when and where you want it to.”

    Please stop fabricating replies for me.

    “I’m sure that you are sincere in your current belief that the demo would convince you. But I find it hard to believe […]”

    Let’s find out. What can you demo?

    “without ever looking at them”

    You don’t know what I’ve looked at.

    “Furthermore, if you did become convinced then that would be just one more person who was no longer a skeptic. By your logic that should have no conceivable influence on anyone else.”

    That’s why laboratory science requires repeatable results.

    The controls I propose are beyond what I’ve seen in Parapsychology papers, in that they are adversarial. They avoid the situation where the people applying the controls desire a result that could be produced by compromise of the controls.

    For a large class of Parapsychology experiments, I think I can describe even stronger measures where multiple parties could certify the result. That’s still not as good as the repeatable experiments other sciences have, but it would be much more convincing than the laboratory anecdotes Parapsychology now offers. My own theory is that psi would simply fail to appear under controls multiple parties can enforce.

    • “So you are saying that your continued use of the latter as if it were the former is what? … deliberate deception?”

      I cannot tell whether this is a result of you making up more positions for me, or another result of you not knowing what you are talking about. Where did I use one for the other?”

      OK, the answer is that you really don’t understand the difference. “Repeatability” is a term for a continuum that has nothing to do with evidence. “Replicability” is a term for a dichotomy that is relevant to experimental evidence. You keep using the former as if it were the latter. Parapsychological experiments have been massively demonstrated to be replicable — by being replicated. They have, however, been only moderately repeatable — an annoyance, but one that in no way weakens the evidence. That is common in science. I have already explained this.

      “Great. Whatever you were on about, it has nothing to do with my case against Parapsychology.”

      It has everything to do with it. You are still saying that because some highly repeatable experiments are labeled, for our convenience, “particle physics”, that label magically exempts the majority of experiments that we choose to label “particle physics” from the “flaw” you identify in what we label parapsychology. So since the PEAR lab labeled its results as “engineering” rather than parapsychology, those results are indisputable? Does the fact that the highly repeatable “EEG” was invented for what was later to be called parapsychology mean that we can credit its results to make completely different experiments valid?

      “Please stop fabricating replies for me.”

      I wasn’t — I was metaphorically describing the reply you have already made repeatedly. It’s pretty accurate, too (though as a metaphor it is, of course, grossly imprecise).

      ——-

      Geller is a frequent, likely consistent, trickster. He has failed to perform some tricks when prevented from fraud. No one has found, however, any flaws in the controls on the single scientific experiment you cited. It is not an example of what you claimed.

      I would date the falling away from extraordinary subjects as having largely happened a decade before that. In many ways it dates to Rhine’s founding of the laboratory science of parapsychology. Before that, exceptional subjects were the norm; after that, they became the exception. Some experiments continue to be done. The temptation of possible large effect sizes, giving us the opportunity to learn something that might then be tested under conditions less difficult to control, is high. Project Alpha is a big deal only to Skeptics. It was just a publicity stunt by Randi. And as I pointed out, what he led people to think happened simply didn’t.

      In any case, it was not an example of an experiment for which skeptics required higher controls, at which point the effect disappeared.

      You haven’t shown any examples of it. A consistent pattern is not established by zero examples, even if we accept the rather bizarre statement that two observations prove consistency. (Note that thousands of consistent parapsychology experiments do not prove consistency, but two irrelevant examples back up your belief.)

      ——-

      “You don’t know what I’ve looked at.”

      I have strong evidence.

      “The controls I propose are beyond what I’ve seen in Parapsychology papers, in that they are adversarial.”

      And there’s some more.

      It’s been done. Also it isn’t particularly adversarial; it’s simply a rather common externally monitored design. The only thing that makes it adversarial is the open hostility of the monitor. The same criticism, that the experimenter might cheat, could be leveled against any field — with anyone who gets the wrong results being identified as the potential cheat. Why single out parapsychology? That’s the special pleading you accused me of.

      If you are down to accusing scientists of cheating with your only evidence that they get results grossly inconsistent with your beliefs, then you have admitted defeat.

      “That’s why laboratory science requires repeatable results.”

      It does not. It requires replicability — which has been met. Your emphasis on the word “laboratory” seems to indicate that you are still making an arbitrary absolute distinction between laboratory and observational science. It’s a continuum. All science is ultimately about observations. In laboratory experiments we have some better, but not necessarily perfect, control over the conditions under which the phenomenon will manifest. There is no logical reason to say that there are intrinsically different meanings to what constitutes valid evidence. We set up conditions that we hope will encourage a phenomenon (say a cosmic ray shower entering our lab, or a salt lick to draw animals to our blind) to occur and that will help us to distinguish it from other phenomena. That is what non-theoretical science — all of it — is about. Your fantasy that laboratory work is automatically routinely reliable bears no relation to reality.

      Take the last shot — I’ve been spending too much time arguing this. I don’t promise not to say something if you say something too outrageous.

  6. “In the language of engineering and measurement theory, it was completely accurate, which was what was needed, though it was not very *precise*.”

    For 8 heads in ten flips you had p=0.058. Accurate figures are 0.055 single-tailed, 0.11 two-tailed. Precise figures are 0.0546875 single-tailed, 0.109375 two-tailed. I agree that the two-tailed figure is the one to use, but your figure was 0.058 when the two-tailed value is 0.11.

    Are you actually now arguing that 0.058 is an accurate expression for 0.109375?

    “If the hypothesis is that percipients will score at above chance levels then a one-tailed procedure is, as you said, the appropriate choice.”

    That was your assumption. “So if we take this [.2144] as a typical hit rate,” you wrote.

    Here’s a more reasonable table. Assume each trial has a 0.2144 chance of success, where without psi the chance is 0.2.

    Trials Chance of hit-count reaching p <= 0.01, single tailed
    2000 0.23
    5000 0.58
    10000 0.89
    20000 0.996
    50000 0.99999999
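
    (One way to compute a table like this, sketched in Python and assuming scipy is available; this is an illustrative reconstruction rather than the exact method behind the figures above, so the numbers land in the same ballpark rather than matching exactly.)

        from scipy.stats import binom

        P_CHANCE, P_PSI, ALPHA = 0.2, 0.2144, 0.01   # chance rate, assumed hit rate, one-tailed level

        for n in (2000, 5000, 10000, 20000, 50000):
            # Smallest hit count whose one-tailed p-value under chance is <= ALPHA.
            k = int(n * P_CHANCE)
            while binom.sf(k - 1, n, P_CHANCE) > ALPHA:   # sf(k-1) = P(X >= k)
                k += 1
            # Chance of reaching that count if the true hit probability is P_PSI.
            print(n, k, round(binom.sf(k - 1, n, P_PSI), 3))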

    • Oops, cut myself off with a less-than again.

      Trials Chance of hit-count reaching p = 0.01 or better, single tailed
      2000 0.23
      5000 0.58
      10000 0.89
      20000 0.996
      50000 0.99999999

    • Have it your way, although you continue to disagree about accuracy but argue about precision. What conclusion is changed by the difference? Perhaps since it happens to push above the .01 threshold (i.e., it is not, as I said “highly significant”) that changes the conclusion and therefore can be said to be inaccurate. I should at least have said in the original what procedure I used but I was trying not to be too technical.

      How come you started your table at a number of trials larger than we have actually managed to generate here? I was using figures comparable to the experiments already run. And of course, your table is simply wrong if you are correcting mine. Yours is for a different, though valid, experiment. If you are trying to imply that a two-tailed test is invalid or inappropriate, then you are quite clearly mistaken. It simply answers a different question than does a one-tailed test. I am unsure which I would use in an actual experiment, but for this design, these are the correct figures — and yours are incorrect for the design given.

      • “Have it your way, although you continue to disagree about accuracy but argue about precision.”

        0.058 is not an accurate measurement for 0.109375. It is not merely imprecise. If you want to lecture about “the language of engineering and measurement theory” you should find out what significant figures are.

        “How come you started your table at a number of trials larger than we have actually managed to generate here?”

        I state numbers of trials easily obtained, then far surpassed, by other on-line psi experiments. The first two experiments here could measure only bias, and I expect the audience is more interested in psi. The first experiments had only one target to guess, where an interesting experiment could offer runs of targets.

        Was my note on Radin’s experience not clear? You can check it out in the paper I linked in my post #1 above. Another quote from the paper reads, “I failed to take into account the very high statistical power afforded by collecting trials not in the thousands, but in the millions.”

        “If you are trying to imply that a two-tailed test is invalid or inappropriate, then you are quite clearly mistaken.”

        I’m implying it is a stupid choice under the stated assumptions. Yes, it is statistically valid; so is choosing the one-tailed test in the opposite direction. Given a psi hypothesis, the interesting question is how to maximize the probability of demonstrating it. If the table of probabilities is not showing the best we can do, then it is arbitrary and pointless.

        • “0.058 is not an accurate measurement for 0.109375. It is not merely imprecise. If you want to lecture about “the language of engineering and measurement theory” you should find out what significant figures are.”

          I know what significant figures are; they are a measurement of precision useful under some circumstances. It’s handy because it ties into our notational system, but it’s not a fundamental measure of precision.

          “I state numbers of trials easily obtained, then far surpassed, by other on-line psi experiments.”

          By other on-line psi experiments I presume you mean Dean’s. He had exceptional turnout, partly because he is rather well known, and partially because he expended resources to publicize it. Most on-line experiments have been less successful in recruiting. I won’t say your speculation about reasons for lower earlier turnout is impossible, but you were “correcting” my table, which was based on conducting an experiment something like the previous ones, but with better design. No objection to going higher; I just thought it odd that you left out what we have evidence we might actually get.

          “I’m implying it is a stupid choice under the stated assumptions. Yes, it is statistically valid; so is choosing the one-tailed test in the opposite direction. Given a psi hypothesis, the interesting question is how to maximize the probability of demonstrating it. If the table of probabilities is not showing the best we can do, then it is arbitrary and pointless.”

          If you assume that we will get psi hitting, then a one-tailed test is the most appropriate. If you assume that we might get psi-missing as has happened frequently in the past then you might choose a two-tailed design. In real life we don’t design experiments on the basis of *knowing* what the underlying reality will be. The only assumption was about that underlying reality (i.e. that psi would manifest positively).

          This is one of those surreal discussions in which a Skeptic vehemently insists that they know the characteristics of the phenomenon which they do not believe in.

        • “OK, the answer is that you really don’t understand the difference. ‘Repeatability’ is a term for a continuum that has nothing to do with evidence. ‘Replicability’ is a term for a dichotomy that is relevant to experimental evidence.”

          You said I used repeatable where I meant replicable. I asked you to say where, not to bloviate on what you think is good evidence. I used the word I meant, and that you disagree with what I say does not excuse making up stuff about what I say.

          “It has everything to do with it. You are still saying that because some highly repeatable experiments are labeled, for our convenience, ‘particle physics’, that label magically exempts the majority of experiments that we choose to label ‘particle physics’ from the ‘flaw’ you identify in what we label parapsychology.”

          I never said anything like that. Again, please stop fabricating positions for me.

          “Geller is a frequent, likely consistent, trickster. He has failed to perform some tricks when prevented from fraud. No one has found, however, any flaws in the controls on the single scientific experiment you cited. It is not an example of what you claimed.”

          It’s an example of a demonstration that failed under skeptical controls, which is exactly what I claimed.

          “It’s been done. Also it isn’t particularly adversarial; it’s simply a rather common externally monitored design.”

          So cite something from the Parapsychology literature that uses cryptographic commitments as I described.

          “Take the last shot — I’ve been spending too much time arguing this. I don’t promise not to say something if you say something too outrageous.”

          Yes. Funny that in your long-winded lectures on subjects you do not understand, you also complain about wasting your own time. Now who do you think needs this announcement of whether or not you will continue?

        • Oops, mis-threaded the above. It’s a response to #10.1

        • “I know what significant figures are; they are a measurement of precision useful under some circumstances. It’s handy because it ties into our notational system, but it’s not a fundamental measure of precision.”

          So do I have this right: You understand significant figures; you put 0.058 where the actual value is 0.109375; and you call your numbers “completely accurate” just imprecise?

          “If you assume that we will get psi hitting, then a one-tailed test is the most appropriate.”

          No need for the ‘if'; that is what you did assume. You stated your figures under your assumption, “So if we take this as a typical hit rate and we get the same rate in our on-line experiment, how likely are we to get a ‘significant’ result.”

          “This is one of those surreal discussions in which a Skeptic vehemently insists that they know the characteristics of the phenomenon which they do not believe in.”

          I thought I used your stated assumption about the psi phenomenon — oh look, I did.

          Actually, I did make another inference: where you had “hit rate”, I assumed you meant hit probability.

  7. Design Note for on-line experiments: Results

    In the modern form of traditional statistics the result of a single experiment in isolation is evaluated by computing its “p-value”. The p-value is the probability that you would get a result at least as extreme as that by chance alone. For example, if I want to test if a particular coin is biased I could flip it 10 times. If it came up heads 8 times could I say it was biased? Well, I can never say for sure, a fair coin *might* come up heads 8 times out of 10. What I can say is how unlikely that is and then decide if the coincidence is enough to make me believe that it probably is biased.

    In this case, the probability that one side or the other would come up 8 or more times by chance on a fair coin (never mind why it’s “or more”: it’s subtle and technical but important — just take it as the convention) is .058, so that is the p-value for this outcome of this experiment.

    The convention in most sciences is that if the p value is less than .05 (i.e., that there is less than 1 in 20 chance of it coming out that way just by chance) then it is called “significant” which means that we can conclude that there is probably something of interest happening.

    If the p-value is not less than .05 then that doesn’t mean that nothing is going on — it just means that we don’t have a good reason to believe that something is going on. Maybe nothing is going on, or maybe the number of trials (10) just wasn’t enough to produce a big enough difference from chance. For example, if we flipped the coin 2*10 (20) times and got 2*8 (16) heads the p-value is .0073 which is not only “significant” but “highly significant” (less than .01 — 1 chance out of 100).
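
    (For anyone who wants to check figures like these, here is a minimal sketch in Python, assuming scipy is available. It uses an exact binomial test rather than the approximation used for the numbers above, so the values differ somewhat, a point that comes up further down this thread.)

        from scipy.stats import binom

        # Exact two-tailed binomial p-value for getting `heads` or more heads
        # (or the mirror-image count of tails) in `flips` flips of a fair coin.
        def two_tailed_p(heads, flips):
            return 2 * binom.sf(heads - 1, flips, 0.5)

        print(two_tailed_p(8, 10))    # about 0.109
        print(two_tailed_p(16, 20))   # about 0.012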

    So what is the likelihood that an on-line experiment would produce a significant p-value given the amount of ESP typically seen in ESP experiments?

    For a ball park figure I grabbed a reference that happened to be handy: “Extra Sensory Perception After 60 Years” by Rhine, Pratt, Stuart and Smith, which summarizes all the parapsychology experiments done up to its publication in 1940. I checked a table of experiments that used the Zener cards (5 “ESP” symbols on the cards) and that were judged to provide good sensory isolation of the cards from the percipients. Over several dozen experiments, constituting just under 1,000,000 trials, the hit rate was .2144 instead of the .2 (1 in 5) expected by chance. Over 1,000,000 trials that results in a fantastically small p-value: .000…00084, where there are 284 zeros between the “.” and the “8”. So we can be pretty damn certain that something other than chance is causing the deviation (though these statistics don’t tell us whether it’s psi or something conventional like badly mixed decks).

    So if we take this as a typical hit rate and we get the same rate in our on-line experiment, how likely are we to get a “significant” result. Obviously, it depends on the number of participants. Here are the figures:

    trials prob. of getting signf outcome
    500 13%
    1000 22%
    1500 29%
    2000 36%
    2500 43%
    3000 51%

    So even if we get 3000 trials and psi really is operating at typical levels we only have one chance in 2 of getting a positive outcome. Remember that there is a 5% chance of getting a positive outcome even if nothing is going on.

    Highly significant is even harder:

    500 04%
    1000 08%
    1500 13%
    2000 17%
    2500 23%
    3000 27%

    The lesson is that we can do the best that we can but we shouldn’t get too discouraged if we don’t prove anything. The chances are not so small as to be ignored — especially if we get a good turnout and if we can improve that hit rate to above what we’ve labeled as typical — so this wouldn’t be a waste of time.
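
    (A rough sketch, in Python and assuming scipy is available, of how figures like those above can be computed. The replies below explain that a chi-square criterion and a simulation were actually used, so this analytic two-tailed version gives close, but not identical, numbers.)

        from math import sqrt
        from scipy.stats import binom, norm

        P_CHANCE, P_PSI = 0.2, 0.2144   # chance hit rate vs the "typical" hit rate assumed above

        def chance_of_significant_outcome(trials, alpha):
            z = norm.isf(alpha / 2)                            # two-tailed cutoff
            sd0 = sqrt(trials * P_CHANCE * (1 - P_CHANCE))
            hi = int(trials * P_CHANCE + z * sd0)              # upper critical hit count
            lo = int(trials * P_CHANCE - z * sd0)              # lower critical hit count
            # Probability of landing in either tail if the true hit rate is P_PSI.
            return binom.sf(hi, trials, P_PSI) + binom.cdf(lo, trials, P_PSI)

        for n in (500, 1000, 1500, 2000, 2500, 3000):
            print(n, round(chance_of_significant_outcome(n, 0.05), 2),
                     round(chance_of_significant_outcome(n, 0.01), 2))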

    • “For example, if we flipped the coin 2*10 (20) times and got 2*8 (16) heads the p-value is .0073″

      I get 0.0059.

      Likewise, I think your “prob. of getting signf outcome” figures are off. Not vastly, but significantly. Assuming a hit probability of 0.2 per trial, 500 independent trials reach significance with 116 hits, at p=0.043. If the actual hit probability is 0.2144, we should get 116 or more hits with probability 18%, not 13%.

      -Bryan

      • Not off, really, just different and perhaps slightly less precise.

        I used chi-square in both sets of calculations; it is a fast, easy, standard approximation (essentially it’s using the normal approximation to the binomial distribution). I don’t use it where a precise value is important, but it seemed easy enough for these purposes.
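
        (For concreteness, a minimal Python sketch of that standard chi-square approximation, assuming scipy is available; it reproduces the .058 and .0073 figures from the design note, where the exact binomial test gives the different values under discussion here.)

            from scipy.stats import chi2

            # Chi-square (normal-approximation) p-value for `heads` heads in `flips` fair flips.
            def chi_square_p(heads, flips):
                expected = flips / 2
                stat = ((heads - expected) ** 2 + ((flips - heads) - expected) ** 2) / expected
                return chi2.sf(stat, df=1)

            print(chi_square_p(8, 10))    # about 0.058
            print(chi_square_p(16, 20))   # about 0.0073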

        You used the more precise “exact binomial test” but used it incorrectly. Instead of 0.0059 you should have gotten twice that. My problem statement was clearly two tailed (you certainly would not have wanted to declare the coin “fair” if it came up “tails” all 20 times).

        Similarly, I used a chi-square criterion for the outcome estimates — which is intrinsically two-tailed (i.e., it counts guessing no Zener trials out of 25 the same as guessing 10). This is a legitimate design decision (as long as it is made in advance), though I’m not sure that it is what I would actually use. In this case (since we are assuming psi hitting) it makes the calculations conservative. I did not bother to state that I was using a two-tailed test because it wasn’t really that important for the point I was making. Again, a chi-square criterion is an approximation, but not a bad one. Rather than doing the calculations I plugged the details into a simulation.

        It’s good, though, to have someone with the time and the interest to double check and correct my quick “back of the envelope” calculations.

        • “I used chi-square in both sets of calculations”

          Don’t do that. Chi-square is a terrible approximation with 10 or 20 trials. The reason I cited the single-tailed p=0.0059 is that your figure is even farther off for two-tailed.

          If one’s theory is that subjects will score above chance, a two-tailed test is ludicrous.

          What turnout can an on-line experiment get? Radin reported that his web experiments at gotpsi.org took more data in one year than Rhine and his colleagues collected in 60. Radin noted the problem of too *much* statistical power, not too little.

        • Beware the “fallacy of false precision.” I would not use chi-square under these conditions for an actual experimental analysis. Chi-square is a standard approximation which was quite sufficient for these purposes. In the language of engineering and measurement theory, it was completely *accurate,* which was what was needed, though it was not very *precise*.

          For the coin flipping example, two-tailed was unambiguously the correct statistical procedure. Using a one-tailed procedure was neither precise nor accurate — it was just wrong.

          For the ESP experiment example: chi-square will give only small differences from an exact two-tailed procedure, for these purposes it was completely valid.

          If the hypothesis is that percipients will score at above chance levels then a one-tailed procedure is, as you said, the appropriate choice. If your hypothesis is that percipients will score at non-chance levels thereby making fewer assumptions beyond the essential hypothesis of an unexplained communication channel, then a two-tailed test is appropriate.

          A one-tailed test is more sensitive to positive deviations but is completely insensitive to negative ones. That puts us in the position of having to declare that there was no sign of an effect even if a percipient guessed no Zener cards out of 100 (probability of that occurring without some kind of information “leakage”: 0.8^100, or roughly 2 × 10^-10).

          Either procedure is statistically valid as long as the procedure is chosen as part of the design before the experiment is conducted. Both choices are routinely criticized by Skeptics even when selected beforehand.

          Since it was observed in early experiments that sometimes the experiments showed significant negative results (for example, in sheep-goat tests) a common choice is the two-tailed procedure. For a while it was the standard, though much less so now. For a number of reasons — wanting my figures to be conservative, my use of classic experiments (many of which used two-tailed tests) for my base estimate, and the simplicity of using chi-square in my simulations — I chose 2-tailed.

  8. Design Note for on-line experiments: Rewards

    I’m taking this opportunity to try to do a design from scratch — just trying to think of what would work best based on successes and failures plus specific differential results that have been found in the past. This first note will be on reward.

    Experiments in the old Zener card days showed a pretty consistent response curve for success vs “external motivation” (i.e., explicit rewards for good scoring) — an “inverted U”. Small rewards improved performance but larger rewards suppressed it. We can only speculate but it seems reasonable that larger rewards suppressed results through one or both of two mechanisms:

    1) Our culture has a negative attitude about money — “filthy lucre” — it’s necessary and desirable but “tainted”. When rewards get large enough to be meaningful they stop being simply a concrete token of good performance — a “real pat on the head” — and become a morally suspect “pay-off”. In the percipients’ minds, of course, which is what counts.

    2) As the monetary reward increases and the motivation to “get it right” increases, the relaxed, playful attitude that seems to be associated with good ESP results disappears, and with it the ability to score well. In other words, when the pressure increases, the percipient chokes.

    I’m going to assume that resources are available to give a reward, and see how they might be used.

    Lots of small rewards for small successes would be a logistic nightmare under these conditions. Instead I would have a successful trial rewarded with a “ticket” in a lottery. The lottery would contain one or more moderate prizes, say one or two hundred dollars. Furthermore, to remove more of the moral taint, when people register they can designate any one of a supplied list of charities, or a charity that they choose, or themselves to receive the prize if they win it. Designating a charity also means that they can maintain their anonymity.

    I’m not sure that I can justify a cash prize over goods. Goods seem like a good encouragement for participating — much better than “we’ll pay you to take part”, but it feels like a more concrete “you won” acknowledgement to give cash — and a lot less problematical than giving some DVDs to a charity.

    • Topher,

      do you think (or know) that giving prizes for a *successful* trial gives better ESP performance?

      In the experiments like the one I run now, half the trials are successful: of 20 trials, 10 are (on average).

      How would you reward people in such a setup?
      That’s why I chose to reward for participation (sorry for the DVD and book prizes; that’s what I could muster for this small project that I do).

      • Yes, classic experiments showed that small or moderate rewards for successful trials led to an improved hit rate, but large rewards led to no improvement or even a decrease from no reward (this pattern is one of many that are difficult for Skeptics to account for — so it is simply ignored). There is always a question of how such results will generalize to different situations (e.g., the isolation of the subject from the experimenter that an on-line experiment creates is going to create a different psycho-social environment, but it is anyone’s guess as to what that does to this effect), but the existing evidence indicates that properly done rewards for performance should boost it.

        The difficulty of dealing with rewarding many small, individually likely successes in an on-line environment is significant. You can’t just slide a coin across the table with each successful trial, or hand the percipient a check when you shake hands with them at the end of their runs. That was among my reasons for proposing a lottery with a bit more substantial reward. Each successful trial wins not money but a “ticket” representing a chance for money. It’s more manageable, less of an aura of monetary taint, and more fun, I think.

      • I forgot to reiterate — I think that “goods” rewards, such as DVDs or books of likely interest to your participants are an *excellent* reward for *participation*. Psychologically it seems to me to be saying “thank you” in a substantive way while cash, even cash awarded by a drawing, feels much more like a fee and therefore makes participation more of a job and less of a co-operative game.

        A prize for success, however, shouldn’t say “thank you” but more of “You won! Here’s the prize”. To me, in this context, monetary reward says the latter better. Also, as I said, if the experimenter is trying to encourage, for psychological reasons, having the prize go to a designated charity, a DVD or book is not what they are likely to need.

        This isn’t based on any experimental evidence or special knowledge, just my intuition — what it feels like to me. Your opinion is as good as mine here.

  9. [Topher Cooper:]
    Bryan, I “pretended to” nothing, and that you have decided to stoop to a personal attack on my integrity,

    O.K. I regret any connotation that you were knowingly attempting to deceive readers when you claimed expertise in statistical analysis.

    [Topher:]
    My claim was that particle physics, specifically the measurement of the fundamental properties of particles (e.g., mass of the muon, the speed of the photon — until that got set as a fixed quantity against which other quantities are measured, or the mass of the electron neutrino) […]

    I see where you had “main-stream particle physics”, and this “specifically…” part simply was not there. When comparing its experimental consistency to that of Parapsychology, why specifically exclude its consistent demonstrations?

    [continued:]
    shows a high degree of inconsistency as determined by the particle physics community itself. That’s a fact — but it is a fact that conflicts with your own simplistic, sanitized view of how REAL SCIENCE works.

    Real science works on the frontier, where experimental results are likely to be inconsistent. Once researchers learn to control the variables and repeat the demonstration, further replication soon becomes uninteresting and unpublishable. Particle Physics could reach replication rates close to 1 by showing the same established results over and over.

    [Topher:]
    As for the other, there is a term outside of logic for the rhetorical fallacy you committed there; it’s called “bait and switch” (in logic the fallacy is called “equivocation”).

    So you misunderstood my question, “Did I not ‘actually identify the flaws’ right here?” The experiments right here are the only ones we’ve looked at specifically.

    [continued:]
    We were talking about the corpus of parapsychological evidence and I pointed out that you were attacking it as flawed without any attempt of showing how it was flawed — just making a broad unsubstantiated claim. You came back with a statement that you had specifically identified flaws. When I pointed out that you had not, you came back with “Oh, I wasn’t talking about that, I was talking about something else entirely — these informal, amateur experiments that we all agree were flawed. How foolish you are to think that I was talking about what we had been discussing.”

    I’ve stated the problem with “the corpus of parapsychological evidence” over and over: It lacks a single consistent demonstration that the phenomenon under study even exists.

    Want to see if skeptics can find specific experimental flaws in specific experiments? We’re up for that. What, specifically, can you demo?

    [Topher:]
    You claim to be competent in statistics, yet you make a terrific offer “I’ll offer up to $5000 in prize money as long as I don’t actually have to risk more than $20 of it” (If you were competent, you would know that that is what the .05 level means in connection with your requirement that your expected loss under the null hypothesis could not exceed $1).

    Here’s an example prize scheme: $5000 for p=.0001 or better, $10 for .0001 < p <= .05. The expected payout is $0.999. It meets my three conditions, so it is acceptable to me.

    [continued:]
    So which is it: 1) You don’t even meet Intro to Stats levels of competence in statistics; 2) This was deliberate Flim-Flam; 3) Sometimes when one shoots from the hip one says something that was wrong.

    No, you inferred something incorrectly. My conditions limit my risk to $5000, not $20.

    • Oops — my message got cut off when I used a less-than sign.

      [Topher:]
      You claim to be competent in statistics, yet you make a terrific offer “I’ll offer up to $5000 in prize money as long as I don’t actually have to risk more than $20 of it” (If you were competent, you would know that that is what the .05 level means in connection with your requirement that your expected loss under the null hypothesis could not exceed $1).

      Here’s an example prize scheme: $5000 for p=.0001 or better, failing that, $10 for p=.05 or better. The expected payout is $0.999 (.0001 * $5000 + (.05 - .0001) * $10). It meets my three conditions, so it is acceptable to me.

      [continued:]
      So which is it: 1) You don’t even meet Intro to Stats levels of competence in statistics; 2) This was deliberate Flim-Flam; 3) Sometimes when one shoots from the hip one says something that was wrong.

      No, you inferred something incorrectly. My conditions limit my risk to $5000, not $20.

    • I really, really don’t have time for this. As a result I’ve responded quickly to stuff with my first thoughts without thinking about it. I’m going to try to be brief.

      Your withdrawal of your ill-mannered, personal attack could best be described as half-hearted and inadequate. But I’ll accept it.

      Last point first. You are right, I was shooting from the hip again and did not consider graduated payments. Some of us admit to our errors, which has the unfortunate side-effect of making some hostile people think that this indicates weakness. Principle comes first though.

      Yes I misunderstood your statement — because you failed to say what you are now claiming you meant. If Jacob and I were talking a bit about Attila the Hun and the topic shifted to Bryan Olson, and in reply to Jacob saying “Bryan seems to be honestly dedicated to setting up a good experiment” I replied “He is one of the most evil people ever to have lived”, it would be my fault, not Jacob’s, when he thought I was speaking about you and not Attila.

      As for your statement about what is wrong with the corpus of parapsychological evidence — you cannot simply make up a rule that is to apply to parapsychology and claim it as logically necessary. By the nature of its subject matter, and the extreme poverty of resources it has to work with, parapsychology has more problems with creating consistent conditions than most areas of study. Nevertheless, its results are as consistent as those of other areas of science such as psychology and even particle physics. You appear to have firm metaphysical faith that the universe must be such a simple place that any phenomenon under study must be completely controlled and understood within a short time span. That ain’t so. Replication is a requirement of good science; cookie-cutter turn-the-crank-and-get-a-result “repeatability” is not — particularly when we are talking about falsification of existing beliefs rather than support for a theory. That’s a good thing, since a large percentage of cutting-edge scientific research cannot and would not be expected to meet that requirement.

      Furthermore, your claimed problem with the corpus of data is contradicted by your claim that you would be swayed by the results of a single experiment. According to your reasoning we would have to repeat the experiment many times and get the same result every time for the experiment to be valid.

      So no, you have not stated what was wrong with the previous experiments — only that you were convinced on the basis of non-rational reasons that there was *something* wrong with them. Go and find those flaws — skeptics have been trying for close to 100 years with little success except in their own minds (e.g., by setting requirements that are logically unjustified and which are not applied in any other area of scientific study).

      And yes, I did not state at that point the specific area of particle physics (i.e., most of the experimental work in particle physics) that I was referring to. I had stated it repeatedly elsewhere.

      • “I was shooting from the hip again and did not consider graduated payments.”

        Ah, so the answer to your “Which is it:” was choice (3).

        “Furthermore, your claimed problem with the corpus of data is contradicted by your claim that you would be swayed by the results of a single experiment.”

        You’re so busy lecturing on what skeptics think that you will not hear what we say. It’s the *demo* that could sway me, not the report.

        As I wrote before, my theory is that psi does not exist, and that theory makes specific falsifiable predictions that should hold every time. I also expect Parapsychologists to continue to report that they produced anomalous results in ways that I cannot check.

        “Go and find those flaws — skeptics have been trying for close to 100 years with little success except in their own minds”

        The skeptics that I know realize there’s little chance of identifying most of the specific errors, when given only what believers decide to report. A repeatable experiment would be different; anyone could do it and see. Failing that, a demo under skeptical controls would carry some weight. Alas, we never get those, just the stories.

        • Translation: The normal scientific process of critical examination of the evidence and replication comes up with *the wrong answer*. The experimental evidence against our view of the world is air tight and we are totally unable to find any flaws in the reporting, oversight, controls, randomization or logic of the experiments, but we have utter faith that there *must* be something wrong with it. We therefore demand that parapsychology meet requirements that are required of no other field and met by very few of them — requirements without any logical foundation related to the real world.

          One requirement is that anyone who has not stated beforehand that they are absolutely certain that psi does not exist must take part — no other qualifications are necessary. Those who have done so in the past and gotten positive results and accepted the logical consequences of that were clearly not actually Skeptics (i.e., truth believers) in the first place — their faith was weak. Those who took part in successful experiments and came out of it with “I can’t explain it, but I still don’t believe in it” show the right immunity to evidence and are to be kept in the fold (though we’ll try not to talk about that little embarrassing incident).

          Bottom line — you are uninterested in scientific evidence. You are interested in a personal experience. I see no reason that I should spend my time on resources for no other purpose but your amusement — even if you kick in some cash.

        • “As I wrote before, my theory is that psi does not exist, and that theory makes specific falsifiable predictions that should hold every time.”

          Once again, these are not the facts. *Your* theory makes specific falsifiable predictions that should hold every time. Psi says that those predictions can be falsified — that sometimes predictions made on the basis of your theories are wrong. This has proven to be the case not just occasionally (all that is required) but frequently, when enough care and attention are applied.

          You rejoin that the mere falsification of your beliefs should not count as a reason to set them aside. That there might be a vast conspiracy of parapsychologists so clumsy and stupid that they keep churning out flawed experiments but so skilled and bright that they have managed to hide these flaws from hundreds of rabid reviewers and observers over a century and more.

          You are welcome to your belief — but it ain’t science. Doesn’t it bother you that you use exactly the same arguments as those who claim no connection between smoking and lung cancer, no connection between HIV and AIDS, no connection between human activity and climate change, etc.? Anyone who wants to deny clear-cut scientific evidence has to use the same tired, invalid arguments.

        • “Translation: The normal scientific process of critical examination of the evidence and replication comes up with *the wrong answer*. The experimental evidence against our view of the world is air tight”

          What we get from Parapsychology is special pleading. Other lab sciences offer repeatable results where Parapsychology offers excuses. The reason psi goes away when we look closely is because it’s “actively evasive”. The effect only shows up for the “sheep” who believe in it, and not the “goats” who don’t.

          Actually I kind of like the “sheep” term they use.

          “Doesn’t it bother you that you use exactly the same arguments as those who claim no connection between smoking and lung cancer, no connection between HIV and AIDS”

          It doesn’t bother me because you just make this stuff up. The connections between smoking and lung cancer, HIV and AIDS, are demonstrable. Anyone can look in the same places and see the same things. They meet the standard; Parapsychology does not.

        • Yet of the two of us, you are the one applying special pleading in your case against parapsychology. Sometimes you bother to justify the special pleading as not special at all, sometimes you don’t. Real science in *any* field is not the simple, open and shut case you make it out to be.

          And yes, those connections are demonstrable, just as the claims of parapsychology are. Those who oppose acceptance of those connections use precisely the same arguments you do: a statistical deviation from chance doesn’t prove anything, some experiments that have been performed have not stood on their own as unambiguous proof and a few have appeared to contradict the rest, those qualified to conduct the tests tend to concur with the “wrong” opinion and are therefore clearly biased and their results should be ignored, if it was true we would be in complete control of it (specifically used by opponents of the HIV/AIDS connection).

          Of course, you know the difference: you are *right* and support *truth* while those other people are *wrong* and use those arguments to support lies.

        • “Yet of the two of us, you are the one applying special pleading in your case against parapsychology.”

          In six decades or so of research, Parapsychology has not found a single consistent demonstration that what they are studying even exists. What other laboratory science has failed so abysmally and still expects to be taken seriously?

          “And yes, those connections are demonstrable, just as the claims of parapsychology are.”

          But all we get are stories and excuses. Repeatability isn’t some new condition invented by skeptics to keep Parapsychologists down; it is the standard for establishing laboratory results. We’re not going to lower the standard just because the Parapsychologists make up excuses, such as that their phenomenon is “actively evasive”, or that it only shows up for the “sheep” who believe in it.

          Wherever you got the idea that the HIV-AIDS connection is no more reliable than psi results, it’s just another of those ludicrous notions of which you somehow convince yourself.

        • If you are using “repeatability” in a loose, non-technical sense, then we are in complete agreement. If you are attempting to use it in the technical sense, then you are flat out wrong — perfect repeatability is not now, nor has it ever been, an evidential requirement. When available it is a powerful experimental tool, but it is *not* a requirement. What is a requirement is *replicability*, not repeatability.

          Replicability means that experiments can be independently replicated with enough consistency to exclude the results being a fluke. When one experiment replicates another it produces results in agreement with the first. Experiments are said to be “conceptual replications” when they produce consistent results using substantially different experimental procedures.

          In the real world, experiments sometimes contain some subtle flaw in their conduct or circumstances, whatever care is taken. For that reason experiments should be replicated before much reliance is put in them. Actually most experiments in most fields are never precisely replicated, and surprisingly many uncontroversial experiments enter into the “canon” without ever having been even conceptually replicated. That is undesirable, but it is an unfortunate consequence of the fact that simply doing the same thing as someone else and getting the result that everyone expects is not easy to get funding for and almost impossible to get published. But that doesn’t apply to parapsychology, where replications are routinely done and generally published.

          There is no logical or practical requirement that every attempted replication succeed. It is the fact that well conducted experiments sometimes fail that creates the requirement for replication in the first place. What percentage of experiments fail to replicate the overall results naturally varies depending on what you are doing. A rate of about 30% is common — for example in particle physics. Parapsychology generally does a bit better than that.
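          To make the phrase “enough consistency to exclude a fluke” concrete, here is a rough, purely illustrative calculation (my numbers, not anyone’s survey): if each replication had only a 5% chance of “succeeding” by luck, the chance of roughly 70% of 30 attempts succeeding anyway is astronomically small.

          ```python
          # Illustrative only: probability that 21 or more of 30 replications
          # "succeed" at the 5% level purely by chance (a ~30% failure rate).
          from math import comb

          def binom_tail(n, k, p):
              """P(X >= k) for X ~ Binomial(n, p)."""
              return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

          print(binom_tail(30, 21, 0.05))   # about 4e-21 -- not plausibly a run of flukes
          ```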

          Repeatability, on the other hand, is the quality of being able to routinely produce the same results every time. It adds no evidential power. It is nevertheless immensely desirable — the holy grail of experimental science. Why? Because it makes experiments that tell you about the phenomenon so much easier. Parapsychology, along with many other fields, has not succeeded in that — although the auto-ganzfeld showed signs that it was close to being developed to the point that it qualified.

          So you have confused “repeatability”, which parapsychology doesn’t have (nor does the active part of, e.g., particle physics), with replicability, which parapsychology does have. I am not engaging in special pleading — when you demand, with no logical justification, that parapsychology show perfect repeatability unlike any other field, you are.

          Just check some real sources in the practical philosophy or sociology of science, or check out the Particle Data Group’s publications, or just talk to some real experimental physicists. However much you loudly assert otherwise, the half page in your high school physics text on “the philosophy of science” is not all there is. The real world of experimental science is much more complex and interesting. Your ignorance of it does not constitute an effective argument against parapsychology.

        • “If you are using “repeatability” in a loose, non-technical sense, then we are in complete agreement. If you are attempting to use it in the technical sense, then you are flat out wrong”

          Real, technical repeatability is not limited to experiments by the sheep who already believe in the phenomenon. The excuse that the phenomenon is “actively evasive” is not a substitute for a repeatable result.

          “Replicability means that experiments can be independently replicated with enough consistency to exclude it being a fluke.”

          N-rays had that.

          “There is no logical or practical requirement that every attempted replication succeed.”

          Nor do I expect any such thing. But when the demo consistently fails under skeptical controls, one might take a clue.

          “So you have confused “repeatability” which parapsychology doesn’t have (nor does the active part of, e.g., particle physics)”

          You’re still on that? Particle Physics has repeatable results; you just insist on ignoring them and looking only at what is currently under examination in the field.

          And you can spare us this “the half page in your high school physics text” rhetoric. Those “sheep/goat” and “actively evasive” excuses are actually from the peer-reviewed literature of Parapsychology.

          Parapsychology is unfalsifiable. It makes no predictions specific enough that any test could refute it. It’s not science.

        • You continue to confuse replicability and repeatability. But ignoring that, you confuse how some experimenters, out of frustration, have used the phrase “actively evasive” for psi. It has nothing to do with whether the experimenters are open to the possibility that psi exists or not (“sheep” vs “goats” is a distinction made about groups of subjects in an experiment; it is not applicable to individuals — including experimenters — outside that context; your use of it as a term of disparagement amounts to ad hominem).

          N-Rays: precisely — N-Ray experiments met the criteria of replicability, though weakly. There was a disagreement about the reasons for those results. Critics didn’t just say “there must be something wrong with those results — the only experiments that count are the ones that are consistent with my beliefs”. They identified a plausible explanation for those results other than literal rays and showed that the differentiation between successful and unsuccessful experiments was due to that explanation. That’s called science as opposed to just saying “there must be something wrong with psi (or smoking/cancer or CO2/climate change or species variability/evolution etc)” which is faith.

          Please cite these experiments that “consistently fail under skeptical controls.” The history of parapsychology is pretty much taking even pretty extreme controls suggested by skeptics and showing that they had no effect on the results. Generally, of course, the controls on parapsychology experiments are tighter than some pontificating Skeptic says would be required.

          I’m “still on that” because up to now you haven’t said anything to even attempt to counter it except to deny the clear-cut facts. Now, on the other hand, you are claiming that the label of the discipline that we choose to apply to the experimental field is somehow important to the validity of the evidence. Apparently measurements of the mass of the muon are valid despite having the same or lower repeatability (repeatability is a continuum, by the way, not a dichotomy) as psi experiments, because *other* experimenters who experiment with the existence of beta rays have the same label (particle physicist). That is a far more “paranormal” view of the universe than I’m willing to swallow, myself.

          Re: High school text book — I’m not arguing with your misapplication of the sheep/goat effect (the interesting but not terribly surprising experimental result that *subjects* score in conformance with how their beliefs motivate them) nor the *speculation* on the part of a few parapsychologists that psi is “actively evasive” (which has nothing to do with the belief of the experimenter). I was using that to describe the level of sophistication (very naive) of the philosophy of science you espouse so assertively.

          As to falsifiability — you continue to ignore my previous statements on that. Your beliefs (my old beliefs) have been thoroughly and completely falsified. Your insistence that *evidence* (not theory) contrary to your beliefs be “falsifiable” (whatever that means about evidence) puts your beliefs outside the range of falsifiable (i.e., scientific) hypotheses.

          If you say, “There is no such thing as the sun,” and I point up on a clear day and say, “there it is, unmistakably,” you cannot reply, “sometimes it is night and sometimes it is overcast and you can’t point to the sun; since you don’t accept those failures your hypothesis is unfalsifiable, and the times there is this big, intensely bright glowing thing in the sky must be due to some unidentified error, and my belief that there is no such thing as the sun or anything like it is beyond question.”

          I’m not going to repeat this again. Unless you have something new to say about falsifiability I’m going to ignore your “proofs by vehement assertion.”

        • “You continue to confuse replicability and repeatability”

          Years ago I did, but not here.

          “Please cite these experiments that ‘consistently fail under skeptical controls.’ The history of parapsychology is pretty much taking even pretty extreme controls suggested by skeptics and showing that they had no effect on the results.”

          Geller fooled the Stanford Research Institute, but under the watch of knowledgeable skeptics, he failed. Look up “Project Alpha”, where skeptics showed just how gullible some Parapsychologists were.

          “I’m ‘still on that’ because up to now you haven’t said anything to even attempt to counter it”

          Pure rubbish. Particle Physics has many reliable, repeatable demonstrations. If the mass of a muon isn’t one of them, no problem; my belief in the field is based on their consistent demonstrations, and theories that are falsifiable but unfalsified.

          Parapsychology, on the other hand, has not one single consistent demonstration that what they are studying even exists. Nor have they arrived at a falsifiable theory in the several decades they’ve been studying the matter.

          “Re: High school text book — I’m not arguing with your misapplication of the sheep/goat effect (the interesting but not terribly surprising experimental result that subjects score in conformance with how their beliefs motivate them)”

          It’s not just the subjects; look up the experimenter effect. Why did you write about Susan Blackmore, “She was sure that she was supposed to be an experimenter who could elicit psi,” if it’s just about the subjects? Are there experimenters who are not supposed to be able to elicit psi?

          “If you say, ‘There is no such thing as the sun,’ and I point up on a clear day and say, ‘there it is, unmistakably,’ you cannot reply, ‘sometimes it is night and sometimes it is overcast and you can’t point to the sun,’”

          So you can demonstrate that the Sun exists. On the other hand, when I ask you what psi phenomenon you can demo, you make excuses or change the subject.

          Can you not see the difference between showing that the Sun exists by pointing to the Sun, and saying that psi exists and pointing to reports of inconsistent observations, some of which would be very unlikely if the controls were perfect?

          “I’m not going to repeat this again.”

          I’ll be convinced by the demo.

        • “‘You continue to confuse replicability and repeatability’

          Years ago I did, but not here.”

          So you are saying that your continued use of the latter as if it were the former is what? … deliberate deception?

          ————-

          You claimed a “consistent pattern” but instead of citing a survey you cite two isolated cases — which don’t even support your case. I’m not so stupid as to think that parapsychology is unique in the sciences in having no flawed experiments — but you haven’t even identified them.

          Geller was investigated at SRI. Puthoff and Targ allowed him to do some demonstrations, most of which they identified as clearly faked. While unconvinced by the remainder, they felt that it was worth the effort to conduct an experiment under tight controls. They conducted the experiment and got positive results. They published the results, including a caveat about the likelihood that Geller would stoop to cheating, but they saw no way that their controls would allow that.

          Persi Diaconis responded that he had attended an informal demonstration by Geller (with neither P nor T present) and that, since the controls were inadequate then and he thought magicians’ tricks were being used in that demo, this somehow proved the inadequacy of P & T’s controls. Bizarre logic, but widely cited by organized Skepticism. One criticism of the P & T controls was published — based on the assumption that since they did not explicitly say that they had done something obviously necessary, they must have done something obviously wrong. P & T demonstrated from their records (including film) that they had done the obviously correct thing.

          I would not put this high on my list of strong evidence for psi (the evidential strength of any results with “extraordinary subjects” goes way down in my book, and I think most parapsychologists agree with me), but it does not illustrate your point. There were no controls put forward by critics that their single experiment lacked, and there was certainly no attempt by them or anyone else to repeat the experiment with stronger controls under which the effect disappeared.

          Project Alpha — no experiment at all, and it didn’t involve parapsychologists except as critics. Two physicists without any training in parapsychology were given a wad of money to study psi. The Amazing Randi sent in some young magicians to fool them. His “Alpha Boys” did some demonstrations for the physicists, who thought that there might be something worth testing. Randi, unsolicited, made some suggestions about controls at about the same time that the physicists brought film and descriptions to the annual Parapsychological Association meeting asking for confirmation that this was something of interest. The response was really negative — I’m sorry to say even rude — but it was made clear that there really was no evidence that anything real was taking place, and they were given some controls to include in the further demonstrations. About the same time Randi put forward some (but not all) of the same suggestions. When the parapsychologists’ controls were used the “Alpha Boys” stopped being able to perform, and so no experiment was ever conducted.

          When it was clear that the “Alpha Boys” were no longer accomplishing anything, Randi waited until the experimenters were out of the country at a physics conference and called a press conference where he revealed “all.” He later gave a reward to the experimenters for having the intelligence to follow “his” suggestions.

          Randi’s account never says anything that contradicts this but leaves people a very different impression.

          You’re 0 for 2 in proving the existence of a consistent pattern.

          I might be able to dig up a few cases of skeptics finding problems in parapsychological experiments that then proved to be the source of the results. The vast majority of criticisms, though, as in any field, come from parapsychologists. Most Skeptics’ criticism has, however, turned out to be either completely invalid or to have failed to explain the results.

          ———–

          I think you misunderstand, though I’ve said it clearly enough. Particle physics certainly has consistent results. For the vast majority of contemporary work on the properties of particles — take your pick, the mass of any of the particles, their lifetime, etc. — the consistency is gotten by removing the “outliers”, amounting to about 30% of the studies. Parapsychology also needs to remove a (somewhat smaller) percentage of outliers to get a consistent measurement. The difference is that in particle physics there is not a vehement group of people who insist that the only acceptable value is one outside the range demonstrated by the consistent majority of experiments.

          Oh, there is another difference. Parapsychological journals have a policy of accepting articles for publication on the basis of the experimental design before the experiment is conducted so that even “unsuccessful” experiments get published. This is not true in particle physics.

          If you know about this major group of current research in particle physics with virtually no failed experiments please post the reference here. Also inform the Particle Data Group — they would be thrilled.

          —————–

          Are you claiming that parapsychology needs to be the only experimental field that requires no skills and talent on the part of its researchers? It was Blackmore’s foolishness that she was convinced that the only skill needed was a positive belief.

          When we are talking about encouraging people to perform well at what appears to be a delicate task, a positive attitude does seem to be a reasonable requirement. At least it’s felt that way in psychology (note — no “para”). Clearly someone who manages to communicate that they would be rather displeased if there were a positive result can’t expect the same response as those who are able to be honestly encouraging. The “experimenter effect” has been well demonstrated in psychology — subjects tend to give experimenters what they want, whether or not the experimenter has consciously communicated that to the subjects. Dealing with human subjects — in any field — requires certain hard to define skills in human interaction.

          But this is not confined to psychology, sociology, anthropology, etc.

          In physics, it was Pauli who was rather famous for screwing up any experiment he was involved with — to the extent that physicists used to joke, whenever they started getting weird results, that “Pauli must have entered the building” (the so-called Pauli effect).

          —————–

          I’ve shown you the “sun” over and over again. You have replied that you will only believe it if it shows up when and where you want it to. That’s not science.

          I’m sure that you are sincere in your current belief that the demo would convince you. But I find it hard to believe that when “push comes to shove” you, who have so easily dismissed thousands of large, well controlled experiments without ever looking at them, would look at this outcome and not just shrug and say “son-of-a-gun, a 1 in 20 chance” or “I musta missed something.”

          Furthermore, if you did become convinced, then that would be just one more person who was no longer a skeptic. By your logic that should have no conceivable influence on anyone else. All it would establish is that the experiment you personally designed convinced you. Science is about collective evidence, not personal revelation. Why should I waste my precious time (and doing this properly would take a great deal of time and effort) for your personal edification?

          And no, you don’t have to repeat yourself. I got the message — you want personal proof not scientific evidence.

  10. “That’s a devastating counter-argument to a really stupid claim that I never made.”

    That was you pretending that Particle Physics doesn’t get results any more consistent than Parapsychology, wasn’t it? Particle physics has contributed greatly to our understanding of the natural world, with repeatable demonstrations that the predictions of testable theories are correct. Parapsychology, not so much. You’re not going to get anywhere trying to bring Physics down to your level.

    >> Did I not ‘actually identify the flaws’ right here?
    > No, you did not. Using slightly more specific terms for “experimental flaws” does not identify the flaws.

    The trials are not independent and the protocol allows cheating. Flaws actually identified, right here. I also explained how the first two experiments here lacked independent trials. (But I missed the sensory leakage in the first one.)

    In the analysis of the first experiment, I identified the flaws in your statistics: they do not mean what you thought they meant. On another page you had claimed, “My specific area of expertise is statistical analysis of psi experiments and statistical computing.” Well, I’m no expert in statistics; I’m competent, and that’s enough to bust you as the pretender you are. And you have the nerve to say I don’t ‘actually identify the flaws’.

    • Bryan, I “pretended to” nothing, and given that you have decided to stoop to a personal attack on my integrity, it’s pretty clear that you are getting desperate. This is a classic pattern in debates between Skeptics and parapsychologists. The Skeptic keeps presenting their heartfelt beliefs bolstered by cherry-picked quotes, while the parapsychologist keeps presenting facts and logic, and eventually the Skeptic ends up making unsubstantiated claims of lying, fraud and stupidity against the parapsychologist. Did I say unsubstantiated — not unsubstantiated at all; it’s all substantiated by the fact that the alternative is that the Skeptic would otherwise be simply wrong about something he/she has utter and complete faith in.

      My claim was that particle physics, specifically the measurement of the fundamental properties of particles (e.g., the mass of the muon, the speed of the photon — until that got set as a fixed quantity against which other quantities are measured — or the mass of the electron neutrino), shows a high degree of inconsistency as determined by the particle physics community itself. That’s a fact — but it is a fact that conflicts with your own simplistic, sanitized view of how REAL SCIENCE works. Instead of checking the facts you attack me — now there is rational, critical thinking for you.

      As for the other, there is a term outside of logic for the rhetorical fallacy you committed there; it’s called “bait and switch” (in logic the fallacy is called “equivocation”). We were talking about the corpus of parapsychological evidence, and I pointed out that you were attacking it as flawed without any attempt at showing how it was flawed — just making a broad unsubstantiated claim. You came back with a statement that you had specifically identified flaws. When I pointed out that you had not, you came back with “Oh, I wasn’t talking about that, I was talking about something else entirely — these informal, amateur experiments that we all agree were flawed. How foolish you are to think that I was talking about what we had been discussing.”

      You have not shown that I am a “pretender.” What you have shown is that I should have known better than to treat someone who is interested in scoring points as if he were someone who, however much prior bias he has, is interested in a discussion that might lead to discovering some truth. My mistake was to treat this as a cooperative discussion rather than an adversarial one, and to share my thoughts with you after the few spurts of 10 or 15 minutes of work that I could devote to this.

      You claim to be competent in statistics, yet you make a terrific offer: “I’ll offer up to $5000 in prize money as long as I don’t actually have to risk more than $20 of it” (if you were competent, you would know that that is what the .05 level means in connection with your requirement that your expected loss under the null hypothesis could not exceed $1). So which is it: 1) You don’t even meet Intro to Stats levels of competence in statistics; 2) This was deliberate Flim-Flam; 3) Sometimes when one shoots from the hip one says something that was wrong.

      I’m not interested in sparring — come back when you can show some manners and I may continue the discussion.

  11. “No comment yet on my design for a practical experiment.”

    As a way of playing with parapsychology it’s a start. As a serious experiment that could be included in the corpus of parapsychological experiments, it doesn’t come close to the minimum required for publication in a parapsychology journal. At best you have described an idea for a design but haven’t presented a design yet at all.

    I was sort of waiting till you fleshed things out a bit more.

    Besides, what is the point — I don’t think that you would find positive results the least bit convincing. Do you claim otherwise?

    In any case, although sometimes there are surprises, this design is unlikely to produce a strong enough effect to have much chance of producing a significant outcome. Beyond the many negatives (from the viewpoint of what is thought to be psi-conducive) of an on-line experiment (e.g., no opportunity for a lot of personal attention directed at the percipient or agent), forced-choice experiments produce much weaker results than free-response, and numbers are among the worst targets I can think of (except, perhaps, for that tiny fraction of the population, mostly “lightning calculators” and many autistic people, who associate a distinct personality with each and every number).

    • “As a way of playing with parapsychology it’s a start.”

      Great. I’m one start ahead of all other ideas presented here.

      “this design is unlikely to produce a strong enough effect […] numbers are among the worst targets I can think of”

      No disagreement there; that’s why I suggested words or figures as alternatives. As I explained: “I can fill in stuff like how to assign numbers randomly and independently, use each number in just one trial, and analyze the significance of various outcomes. On the other hand, as a skeptic, I’m not in a position to say what kinds of targets psi ought to be able to identify. Would dowsers rather have half the numbers printed in red and half in blue? Would words, or maybe figures, be more appealing identifiers for remote viewers? That kind of thing is all the same to me.”

      If my experiment is unlikely to show psi, well, what experiment can we do that *is* likely to show psi? My design is “forced choice” because I did not see anyone object to the forced-choice aspect of Jacob’s first three experiments, and it supports objective scoring. An objective scoring method is a requirement for valid analysis; forced-choice is merely a means to that end. If you have an objective test based on free-response, please state it.

      “I don’t think that you would find positive results the least bit convincing. Do you claim otherwise.”

      It’s like I already said: Let me impose secure controls, and I’ll not only find positive results a bit convincing, I’ll pay money for them.

      Jacob proved that he did not change the targets in his first two experiments by showing, in advance, the SHA-1 hashes of the correct answers. Good idea. I offer to take it to the next level, if you and Jacob are game. I’ll generate the targets, and provide cryptographic commitments so I cannot change them. (SHA-1 is now on shaky ground; I’ll provide it, but also SHA-256 and SHA-512.)

      You, meaning the pro-psi side, run the test, collect responses from subjects, and cryptographically commit to some statistic of record. It can be any stat you choose, so long as we can objectively compute a p-value against the target. We commit to how we will mechanically score the result, then we each reveal our data.
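      To make the commit-then-reveal step concrete, here is a minimal sketch of one way it could be done (the function names and the nonce detail are my own illustration, not a fixed protocol): hash the secret data together with a random nonce, publish the hash, and reveal the data and nonce afterwards so anyone can verify nothing was changed.

      ```python
      # Minimal commit-then-reveal sketch (illustrative; names are invented).
      import hashlib
      import secrets

      def commit(data):
          """Return (commitment, nonce). Publish the commitment; keep data and nonce secret."""
          nonce = secrets.token_bytes(32)          # keeps short lists from being brute-forced
          digest = hashlib.sha256(nonce + data).hexdigest()
          return digest, nonce

      def verify(commitment, nonce, data):
          """At reveal time, anyone can recompute the hash and check it."""
          return hashlib.sha256(nonce + data).hexdigest() == commitment

      targets = b"card001=red;card002=black;..."   # hypothetical target list
      c, nonce = commit(targets)
      print(c)                                     # publish before the experiment starts
      print(verify(c, nonce, targets))             # True once data and nonce are revealed
      ```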

      I’ll offer a prize for a positive result, but I have to impose three more rules on the money: First, you have to hit at least statistical significance, meaning I pay nothing unless the test of record reaches at least the 5% significance level. Second, I pay at most $5,000, an amount chosen because I can afford it. Third, my statistically expected payout given chance results has to be at most one dollar.
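      To make the third rule concrete, here is a toy calculation (assuming, for simplicity, a flat prize paid whenever the result clears a single significance threshold): the expected payout under the null hypothesis is just the threshold times the prize, so the full $5,000 is only on the table at odds of roughly 1 in 5,000.

      ```python
      # Toy illustration of the expected-payout rule (flat prize, single threshold).
      def expected_payout(prize, alpha):
          """Expected payout under the null if the prize is paid whenever p <= alpha."""
          return alpha * prize

      print(expected_payout(5000, 0.05))      # $250  -- violates the $1 cap
      print(expected_payout(20, 0.05))        # $1    -- the most a 5% test allows
      print(expected_payout(5000, 1 / 5000))  # $1    -- $5000 needs ~1-in-5000 odds
      ```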

      If I’ve been unclear, please ask specific questions. I may have glossed over things, but the offer is in good faith. What psi phenomenon can you actually demo? I’ll pay money for a real demo of the real thing.

  12. “The thing is, in six or seven decades of fairly serious research, Parapsychologists have not found a single repeatable demonstration that what they are studying even exists.”

    We have different definitions of “repeatable.” I use the (I suppose unorthodox) definition “demonstrably capable of being repeated”, while you apparently mean “so well understood and controlled that a clear cut demonstration can be guaranteed every time”.

    There is apparently a law of physics (perhaps something to do with thermodynamics) that I am unaware of but that you are familiar with, which states that “any real phenomenon can be understood and controlled, however small the available resources and however complex and inaccessible the associated systems, in X years of study”, where X is unspecified but is much less than 60. Perhaps you would enlighten us by presenting the evidence for this broad-reaching and obviously very fundamental law.

    “I mean, come on Topher, you are claiming that I’m armoring myself against falsification of my beliefs, while I’m saying ‘bring it on.'”

    Now *that’s* a phrase with unfortunate echoes these days. In any case, all that demonstrates is absolute confidence in your armor — as well you should have. There are thousands of meticulous experiments out there that are much more rigorous than could possibly be set up under these conditions, and you feel confident they can be dismissed since they just must have a flaw that nobody has spotted yet. All that a successful experiment could demonstrate to you would be “gee, there’s another one — and I can’t spot that flaw either”.

    • “When those same exact techniques are applied to parapsychological experiments (by viewing, for example, ESP experiments as, essentially, measuring the rate of guessing) the consistency measures are found to be slightly better than in particle physics.”

      Before 16 July 1945, no one had seen a fission explosion. The first test of an atomic bomb worked. Seems those Physicists with their particles were on to something.

      “Of course this is completely unlike the virtually identical arguments made against the tobacco/lung-cancer connection […]”.

      Why did research funded by tobacco companies find no connection between smoking and lung cancer, contrary to the findings of others? The paranormal sheep-goat effect suggests a theory: perhaps the interaction of carcinogens with lung tissue depends on the beliefs of any scientist who happens to be observing. My theory is different: one side got it wrong. Likewise, I think Parapsychologists get both right results and wrong results; sadly, they believe the wrong ones.

      “It must be “folly” and we must be “gullible” because the alternative would be that you are mistaken (gasp!). It is therefore completely unnecessary to actually identify the flaws that those gullible fools overlooked, we can all be quite sure that they are there because the alternative is quite unthinkable.”

      Did I not “actually identify the flaws” right here? If I don’t conform to your ideas of what a skeptic is and what skeptics do, well, we’ll both just live with that.

      No comment yet on my design for a practical experiment.

      • “Before 16 July 1945, no one had seen a fission explosion. The first test of an atomic bomb worked. Seems those Physicists with their particles were on to something.”

        That’s a devastating counter-argument to a really stupid claim that I never made. That’s called a “straw-man” argument and it is one of the classical rhetorical fallacies.

        Most scientific experiments deal with subtle effects. Some do not — they just deal with the highly deterministic effects of novel but fairly flexible conditions. The results of those experiments are reproduced with fair reliability and are easily adapted to educational laboratory demonstrations and engineering applications.

        The production of the first fission bombs was such a case. After some rather simple experiments (called something like “tweaking the tiger’s tail”) that produced what might be called suppressed fission “events” (“explosion” in either case is more a metaphor than an accurate description), they learned enough to think that there was a chance that a “bomb” might be possible. The follow-on was a monumental engineering effort (I would guesstimate that the effort in a week of the Manhattan Project far exceeded all the resources available in parapsychology’s entire history) to make it a reality. That work was almost entirely an engineering effort — most of it was engaged in trying to figure out how to refine sufficient quantities of uranium or plutonium.

        I don’t have the figures, but I suspect that the measurements of the neutron cross-section of uranium (the tweaking-the-tiger’s-tail work, or whatever it was called, that I mentioned above) showed the normal variability (as well as one fatal accident that led to the first experiment in exactly how very high doses of radiation kill). They would have thrown out the results that didn’t fit and combined the ones that did. Why do I suspect that? Because that is how most experiments work. But maybe this was an unusual exception — I won’t argue it with you because neither of us knows.

        When the famous test was made at Alamogordo, there was no certainty among any of those involved whether or not it would work, and only the vaguest ideas about what its yield would be if it did (Fermi famously was making book before the test on whether it would work and, mostly humorously, whether, if it did work, it would produce enough heat to cause a fusion reaction in the atmospheric nitrogen and destroy the world). After the test the scientists begged the politicians not to use the bombs on Japan because, besides any humanitarian concerns, there was no assurance that they would work.

        Some effects are large. The history of the end of WWII would have been very different if development of the atomic bomb had depended on completely consistent experiments on, say, the precise lifetime of a free neutron.

        Most scientific experiments are not repeatable with near perfect confidence, though *some* are. The Particle Data Group throws out about 40% of the relevant experiments as “outliers”. Some surveys indicate that that is fairly typical in all scientific disciplines. It is only the critics of parapsychology (along with the critics of smoking-cancer, anthropogenic climate change, HIV-AIDS, etc.) that argue that the observation of a cloudless day *proves* that there are no such things as clouds.

      • “Did I not ‘actually identify the flaws’ right here?”

        No, you did not. Using slightly more specific terms for “experimental flaws” does not identify the flaws. If you want to blame, for example, “inadequate controls” you need to identify in what way the controls are inadequate, that is, you need to show how the controls actually used do not rule out conventional explanations. Simply stating with only “proof by vehement assertion” that there must be something wrong with the controls is worthless and meaningless.

        Watch: “Anthropogenic climate change is unproven because there is something wrong with the models it uses. That’s proven since the details of the different models’ predictions differ, even if they all are wildly inconsistent with there being no meaningful effect of human activity on the climate.” Ta da, I have just eliminated all the evidence for “Global Warming” in a stroke, without having to bother doing the work of examining the evidence and finding anything wrong with it. Never mind that the variation seen in the models is precisely what is expected given our current state of knowledge. I just need to bang the table and say that if the climatologists knew what they were doing all the models would agree completely — the existence of any uncertainty proves that everything is completely uncertain and nothing is known. Anthropogenic climate change is a *myth* (I trust no one will quote me out of context on that nonsense).

        “If I don’t conform to your ideas of what a skeptic is and what skeptics do, well, we’ll both just live with that.”

        I suppose so. You do conform to my idea of what, for lack of a better term, I call “organized Skepticism”: a social group who have adopted the label and positive reputation of philosophical skepticism (and generally firmly believe that it’s justified) but see no reason to adopt its requirements. They tend to sit around in magazines like SI and groups like sci.skeptic and assure each other that since they are Skeptics, whatever they believe must be the product of critical thinking, so they don’t have to bother actually doing any.

        Terms that others have used are “pseudo-skepticism”, scientism, True Disbelievers, and simply Skepticism with a capital S. None of these capture the group dynamics, which I think are essential to understanding the phenomenon.

  13. I was kind of smug with that, “what bounds can you put on the probability you identify?” I should stop doing that. People hate smug.

    I dug up that Radin quote thinking it would sway you and Jacob more than it did. It’s so on-point. A security weakness gets exploited “no matter how apparently inconsequential that weakness may be”. He called it a “lesson learned”. He put it in italics.

    As a skeptic, my a-priori estimate of the likelihood of psi significantly affecting an experiment is low. Still, there are lower chances. No powerful outside organization is covertly altering results to fool the world’s Parapsychology labs.

    The possibility I see as realistic is far more ordinary. Parapsychology as a whole has no predictive theory; any significant deviation from what a perfect experiment would produce by chance counts as a positive result. I suspect Parapsychologists are fooled by ordinary consequences of defects in their methods, as have been many other scientists in many other branches of science.

    On the “Bryan has already stated…” thing: I was explicitly not suggesting any dishonesty. You were right to say that I am unconvinced by the experiments you note, but the way you phrased things, the labeling of those experiments as highly rigorous was of ambiguous authorship. Likewise, I think there is more to my skepticism than that the experiments sometimes fail. Perhaps I made too big a deal of it; readers know that each side might not describe the other’s position as the other understands it.

    We seem to agree that here, the three experiments so far are too flawed to make a good case for psi. We both have day jobs, but it seems plausible that we and Jacob, and anyone else who is serious, could work through designing a reasonable experiment. I responded to Jacob on sci.skeptic with a design I think to be valid, while meeting Jacob’s requirement for a physical target. I’m open to alternatives. If you let me build the targets and keep them secret until the results are in, I could even put up a bit of prize money based on the odds against the results. Not a million dollars though, sorry.

    • Bryan, this is a quote from your answer on sci.skeptic:

      > Given photos of the backs of the cards, I would not expect people
      > to be able to detect what is on the front, but I think they could
      > tell, if they tried, when they are being shown the same photo
      > versus a different photo. That prevents the trials from being
      > truly independent.

      Ok, I can agree with that, especially since I give away some statistics, which Topher showed can significantly alter the results.

      > How about creating a print-out in advance, where each number from
      > 1 to 10,000 or so appears independently with probability 1/2.
      > Seal the printout. Use each number in just one trial, where the
      > subject guesses whether the given number appears in the printout.
      > You got about 1500 trials for your first experiment. You could be
      > ready for ten times that many using just a dozen sheets of paper.

      I didn’t quite understand what you meant about the printout. What should be printed and what question will be presented to the participants?

      I’d be glad to use your help as well, since we all have day jobs, as you noted.

      • The print-out is just a list of numbers. An example question would be, “is 3491 among the numbers printed on the list in the envelope?”

        Let me describe a variation: You show a picture of a red envelope and a blue envelope. Each holds and hides a printout, a list of numbers. Each number from 1 to 10000 is either on the red printout or the blue printout, never both. In a trial, the subject is given a number, and asked to tell which envelope holds that number.

        I’ve glossed over some things, but that’s the basic straw-man design. My three major goals were to support statistical analysis with many independent trials, to supply a physical target, and to be inexpensive to implement.

        I can fill in stuff like how to assign numbers randomly and independently, use each number in just one trial, and analyze the significance of various outcomes. On the other hand, as a skeptic, I’m not in a position to say what kinds of targets psi ought to be able to identify. Would dowsers rather have half the numbers printed in red and half in blue? Would words, or maybe figures, be more appealing identifiers for remote viewers? That kind of thing is all the same to me.
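        Here is a rough sketch of that straw-man design in code, purely to pin down the mechanics (the seed handling, list format, and scoring choice are illustrative assumptions, not part of any agreed protocol): each number gets exactly one colour, each number is used in at most one trial, and the run is scored with a simple one-sided binomial test.

        ```python
        # Straw-man red/blue printout design (illustrative sketch only).
        import random
        from math import comb

        N = 10000
        rng = random.Random()          # in a real run, record the seed or use a hardware RNG

        # Assign each number 1..N independently to the red or the blue printout.
        assignment = {n: rng.choice(["red", "blue"]) for n in range(1, N + 1)}

        def score(responses):
            """responses: list of (number, guessed_colour); each number used in one trial only."""
            hits = sum(1 for n, guess in responses if assignment[n] == guess)
            trials = len(responses)
            # One-sided binomial p-value: chance of >= hits correct guesses at p = 1/2.
            # (For thousands of trials a normal approximation would be quicker.)
            p_value = sum(comb(trials, k) for k in range(hits, trials + 1)) / 2 ** trials
            return hits, trials, p_value

        print(score([(3491, "red"), (77, "blue"), (9000, "red")]))
        ```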

    • “As a skeptic, my a-priori estimate of the likelihood of psi significantly affecting an experiment is low.”

      That doesn’t make you a skeptic, that makes you a doubter. Being a skeptic doesn’t mean that you have strong a-priori belief one way or another. It means in the absence of evidence you remain neutral.

      I’ll try one more time.

      There are two kinds of scientific hypotheses. One kind is a statement of a particular model about the way things work. The other, more fundamental, kind is a “falsification hypothesis”. That is simply the hypothesis that another theory is false. Falsification is fundamental to science and every falsification experiment is rooted in a falsification hypothesis.

      Psi is a falsification hypothesis about some very long held and rarely questioned assumptions in science. Broadly speaking, it’s that information (or, equivalently, effect) cannot flow between physically isolated systems.

      In an ideal world, falsification requires only a single instance of events contrary to the hypothesis being falsified. In the real world, there are always uncontrolled factors that may cause a single experiment to be deceptive, so replication is required.

      Experiments have shown the isolatability assumption to be false. Those experiments have been replicated, not once, not occasionally, but thousands of times. Not all attempts at replication of the falsification have succeeded — we do not understand exactly why and under what conditions the isolatability assumption fails, and we are dealing with some of the most complex systems known in the universe (groups of human beings).

      High-school textbooks and simplistic science popularizations would lead you to believe that after (perhaps) a short period of getting an experiment “right” replication is routine but that is not the way real science works. In fact (and this is a fact) parapsychology experiments have been shown to have, despite the greater difficulties they face, roughly the same inter-experimental consistency (i.e., replicability) as main-stream particle physics (by roughly, I mean that parapsychology experiments are actually slightly but not quite significantly more consistent than particle physics).

      When you say that you have confidence that the experiments that contradict your prior beliefs must have some errors in them, without identifying the nature of those errors or showing the experimental flaws that allow those errors, you are not being a skeptic. You are simply bolting 18″ steel anti-falsification armor around your beliefs. I know that you are sure that there *must* be an error — just as the climate change denier is sure that there *must* be an error in the evidence for climate change; just as the creationist knows that there *must* be an error in the evidence for evolution; just as the HIV-AIDS rejector knows that there must be an error in the evidence for a causative connection between them; just as the Holocaust denier is sure that there *must* be an error in the evidence for that tragedy.

      I did not believe in psi. I looked at the evidence. Changing my beliefs was emotionally difficult — I’m a pretty traditional, pro-science type — but I hold to the value that in science evidence takes precedence over long held, untested, beliefs.

      • “That doesn’t make you a skeptic, that…” Hey, we feel the same about you guys. Psychic believers falling for each other’s folly and calling it “peer review” doesn’t make you scientists, just gullible.

        “Psi is a falsification hypothesis…” and skeptics offer a million dollar prize if you can actually show it just once — oops, no, my mistake, actually twice; there’s a preliminary challenge followed by a formal challenge. “Armor around your beliefs”? We’ll pay you money to show those beliefs wrong when we’re actually able to check.

        “High-school textbooks and simplistic science popularizations would lead you to believe that after (perhaps) a short period of getting an experiment ‘right’ replication is routine but that is not the way real science works.” Hmmm… if it’s so haphazard, how could engineers use scientific discoveries to build stuff? Want to transfer a thought from your mind to another? The Internet works for that, and we both know it. ESP, not so much. Real science gets reliable results.

        I’m not saying that reliable results are easy and quick. The thing is, in six or seven decades of fairly serious research, Parapsychologists have not found a single repeatable demonstration that what they are studying even exists.

        “Parapsychology experiments have been shown to have, despite the greater difficulties they face, roughly the same inter-experimental consistency (i.e., replicability) as main-stream particle physics”. Been shown has it? Gee, was it shown in a journal of “main-stream particle physics”?

        We disagree about what Parapsychology has shown. No problem. Jacob wants to do experiments. I proposed one, designed to be valid, practical, and to meet Jacob’s requirement for a physical target. Critique it. Propose another. If you’ll let me impose adversarial controls, I’ll put up a bit of prize money for long-odds results. There are conditions, but try me.

        I mean, come on Topher, you are claiming that I’m armoring myself against falsification of my beliefs, while I’m saying “bring it on.”

        You’ve written about how psi results have been replicated. Great. What should we do to actually see one? You have described the sheep-goat effect as “moderately consistent”. Sounds great; let’s test it. You wrote of multiple experiments finding psi results correlated with the local geomagnetic field. I’m game. Name your measure. Predict the sheep can outscore the goats as some function of the planet’s magnetic field — whatever. I predict that if you let me impose controls that ensure no one can cheat, the score will be as chance and non-paranormal science expect.

        Bring it on.

        • “‘That doesn’t make you a skeptic, that…’ Hey, we feel the same about you guys. Psychic believers falling for each other’s folly and calling it ‘peer review’ doesn’t make you scientists, just gullible.”

          This isn’t about what I feel; it’s about what the philosophy whose followers are called “skeptics” actually is. It simply does not mean assigning a high prior probability to any belief, positive or negative, conventional or unconventional, reasonable or unreasonable. In fact, it means the opposite. I’m not saying here (here at least) that your stance is unreasonable or irrational or even incorrect — I’m just saying that it doesn’t make you a skeptic.

          It must be “folly” and we must be “gullible” because the alternative would be that you are mistaken (gasp!). It is therefore completely unnecessary to actually identify the flaws that those gullible fools overlooked, we can all be quite sure that they are there because the alternative is quite unthinkable.

          Of course this is completely unlike the virtually identical arguments made against the tobacco/lung-cancer connection and against anthropogenic climate change — it’s all a big club where the proponents of an effect are busy peer reviewing each other’s obviously flawed results so we can comfortably ignore them. Why is it different? Well, because you are arguing for what’s *true* while the tobacco and oil companies and their many (mostly lay) supporters are arguing for what is *false*.

        • “‘Parapsychology experiments have been shown to have, despite the greater difficulties they face, roughly the same inter-experimental consistency (i.e., replicability) as main-stream particle physics’. Been shown has it? Gee, was it shown in a journal of ‘main-stream particle physics'”?

          Yes and no. There is no more reason to demand that parapsychology results be published in main-stream particle physics journals than that biology results be. It’s easy to dismiss a scientific result when your stand is “a failure to reject the results is sufficient evidence of incompetence (gullibility).” It makes things so much easier — you don’t have to bother actually doing any critical examination of the evidence.

          Anyway, the Particle Data Group, an international collaboration of high energy physicists, each year publishes “The Review of Particle Physics” (previously the “Review of Particle Properties”), in which they summarize the current state of knowledge of particles, most especially their measured properties. It is published in collaboration with major physics journals. It is *very* mainstream; in fact it is the most heavily cited source in high energy physics.

          In order to derive their statistical summary of the best estimates of the properties of particles they must evaluate and reconcile the many studies that bear on each property. To do so, they have developed procedures that describe the consistency of the body of literature in question so that they can reject those experiments that are inconsistent with the overall body of work. The results are surprisingly inconsistent and a fair amount of data is simply thrown out.
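          To give a feel for the kind of consistency measure involved, here is a rough sketch of an error-weighted average with a chi-square “scale factor” of the sort the PDG reports (the numbers are invented for illustration; see the Review of Particle Physics for the actual averaging procedure):

          ```python
          # Sketch of a PDG-style consistency check: weighted mean plus scale factor.
          import math

          def weighted_average(values, errors):
              weights = [1.0 / e ** 2 for e in errors]
              mean = sum(w * v for w, v in zip(weights, values)) / sum(weights)
              err = math.sqrt(1.0 / sum(weights))
              chi2 = sum(((v - mean) / e) ** 2 for v, e in zip(values, errors))
              scale = math.sqrt(chi2 / (len(values) - 1))   # > 1 flags an inconsistent set
              return mean, err, chi2, scale

          # Hypothetical measurements of one quantity (arbitrary units).
          print(weighted_average([1.02, 0.97, 1.10, 0.88, 1.05],
                                 [0.03, 0.04, 0.05, 0.03, 0.06]))
          ```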

          When those same exact techniques are applied to parapsychological experiments (by viewing, for example, ESP experiments as, essentially, measuring the rate of guessing) the consistency measures are found to be slightly better than in particle physics.

          That is just a fact — your opinion, or for that matter, mine is irrelevant. The foundation of all science is observed data. You can stand there all you want and say “The Earth is no warmer than it ever was” and the facts remain the same.

          Of course, we can have disagreements about how to interpret the facts — you could show, for example, why the analysis of consistency in parapsychology is deceptively high or that the apparent level of inconsistency in particle physics is somehow different from that in parapsychology. But unless you deal with the facts, and actually demonstrate a real difficulty in applying the analysis you are not making a rational argument — just stating your prior belief.

  14. I don’t know if you saw my reply on sci.skeptic, but here’s what I said:

    “Given photos of the backs of the cards, I would not expect people to be able to detect what is on the front, but I think they could tell, if they tried, when they are being shown the same photo versus a different photo. That prevents the trials from being truly independent.”

    Your sci.skeptic question left out the part about subjects being told how they scored after 20 trials. With that additional information, I expect a clever cheater could, in a number of 20-card runs, figure out the colors of the cards. I have not tested this, and would need your permission to do so. (A rough sketch of the idea follows.)
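    To be concrete about the kind of exploit I mean, here is a toy simulation (entirely hypothetical; I have not run anything against the real site, and the card counts, run counts, and strategy are my own assumptions): if each card photo can be recognized when it reappears, and each 20-trial run reports only the number of hits, a cheater who always guesses one color turns every run into a linear equation in the unknown card colors, and a few dozen runs are enough to solve for all of them.

    ```python
    # Toy simulation of the feedback exploit described above (hypothetical).
    import numpy as np

    rng = np.random.default_rng(0)
    N_CARDS, RUN_LEN, N_RUNS = 36, 20, 60
    true_red = rng.integers(0, 2, N_CARDS)         # 1 = red, 0 = black (unknown to the cheater)

    A, b = [], []
    for _ in range(N_RUNS):
        shown = rng.integers(0, N_CARDS, RUN_LEN)   # the site reshuffles before every trial
        counts = np.bincount(shown, minlength=N_CARDS)
        hits = int(true_red[shown].sum())           # run score if the cheater guesses "red" every time
        A.append(counts)                            # which cards appeared, and how often
        b.append(hits)                              # the only feedback the site gives per run

    # Solve the (exact, overdetermined) linear system counts @ colors = hits.
    colors, *_ = np.linalg.lstsq(np.array(A, dtype=float), np.array(b, dtype=float), rcond=None)
    recovered = (colors > 0.5).astype(int)
    print("cards recovered:", int((recovered == true_red).sum()), "of", N_CARDS)
    ```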

    Is this possibility a serious problem? One of the “Lessons learned” in Dean Radin’s preliminary analysis of his experiments at GotPsi.com reads “/Any weakness in a freely accessible, Web-based experiment will be exploited, no matter how apparently inconsequential that weakness may be./ [emphasis in original]” http://www.boundary.org/articles/GotPsi-public.pdf

    Dean Radin is actually wrong to make that claim; nevertheless, the defensive posture it suggests is well justified. Many times, attackers have gone far beyond what system designers thought they could or would do.

    • Bryan, I sent you and Topher an email about this experiment’s setup about a week ago and have not received a reply from either of you. So I went with how I saw it.

      I know that you could possibly remember 36 cards if they look a bit different. If you do 200 trials and pay close attention you might be able to do that.

      I’ve also allowed seeing the stats only after every 20 trials precisely to prevent fraudulent attempts to calculate the colors based on the stats. It would seem to me that it would require a large number of trials to calculate which card is which from 20-trial statistics, with 36 cards and only 2 choices.

      There’s still one way that I know of where the system can be fooled using technical knowledge. I would be able to detect it in the post-analysis, but then again, it’s not 100 percent foolproof, and someone with a good technical background and some spare time could do it.

      But being an Internet study, it’s already not controlled laboratory conditions, and some of the information people present, like their age, gender, etc., cannot be verified by me at all.

      Apart from that, I believe this experiment is pretty well designed with regard to the problems that plagued the first two. I definitely learned a lot from your comments, Topher’s, and others’ about the design issues of parapsychology experiments.

      Do you or Topher want to help me with the analysis of experiment 2? You two seem to be better acquainted with statistical analysis than I am.

      • I feel bad about never finishing and sending you my discussion of your experimental design. I’ve been completely snowed under at work. The problem, I think, lies not so much in conscious efforts to memorize as in unconscious recognition leading to some success in the task. Things like the precise registration of the edge of the card with the pixel positions can influence the look of the image. Furthermore, anyone determined to break this protocol can do so pretty easily — the same pixels in the image mean the same card.

        Because feedback is the “statistics” over 20 trials — not trial by trial feedback — it will take a lot of runs to accumulate enough information, whether consciously or unconsciously, to significantly bias the results of a run.

        This experiment is not rigorous — but unlike the first experiment it is, I think, interesting. You cannot eliminate these effects — and so cannot say psi occurred — but the alternative explanations are of low enough probability (not tiny, by any means, but fairly low) that it’s reasonable, and rational, to speculate that *if* psi exists and has the characteristics found in other, more rigorous, experiments then there is some reason to suppose that a positive result *might* be psi operating.

        Nothing to convince a skeptic (after all Bryan has already stated elsewhere that thousands of successful, highly rigorous experiments are irrelevant since some experiments have failed to show an effect; if those experiments won’t convince a fairly reasonable skeptic like Bryan, there is little chance that anything you do would convince the kind of “skeptic” I most frequently run into), but something interesting.

        • About your assessment: “the alternative explanations are of low enough probability (not tiny, by any means, but fairly low)”: What bounds can you put on the probability you identify, and what is the evidence that justifies those bounds?

          Is this the only case where such flaws were dismissed for being of low probability? How many defects were as real, but not as evident in the reporting of the experiment?

          On this business, “Bryan has already stated […]”: I can see how you might honestly believe your description of what I stated. Nevertheless, I like to think my writing is at least clear enough to justify quoting or linking what I did write.

        • We are in the area of Bayesian statistics or “subjective probability”. The statement “…explanations are of low enough probability” is a totally meaningless one in traditional (frequentist) probability theory. You can attach whatever probability you want to the alternatives (deliberate or unconscious use of information about repeated targets) given the *assumption* that psi exists (which is what I was speaking of). That assumption is necessary since this experiment cannot be treated as evidential of psi.

          My thinking was that, given the subtlety of the information transmission and the apparent difficulty of deliberately decoding it, there was a reasonable chance that effects might be due to psi if we assume that psi occurs and has the characteristics previously demonstrated. The question is how likely it is (a “Bayesian prior”) that someone would put in the large effort necessary to significantly distort the overall results by cheating. Even if that probability ends up fairly high, say .75 cheat, .25 psi (remember, this is contingent on an *assumption* that psi exists and is, though small, not uncommon under these circumstances), we can still say things like “If this *is* psi then it is interesting that …”. In the first experiment this was not possible, because the probability of call bias overwhelming any psi effects (under the same assumption) was so high as to make contingent statements about psi effects meaningless.
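
          As a purely illustrative aside (only the .75/.25 split above comes from this thread; every other number below is an invented placeholder), the force of a contingent “if this *is* psi …” statement can be sketched in a few lines of arithmetic: roughly, its weight is the prior probability that psi exists at all, multiplied by the probability that the observed deviation reflects psi rather than cheating given that it exists, ignoring for simplicity any updating of the existence prior by the result itself.

              # Illustrative arithmetic only; the priors on psi's existence are hypothetical.
              p_psi_given_result = 0.25   # the .25 (vs .75 cheat) split from the comment above,
                                          # conditional on psi existing at all

              for p_psi_exists in (0.5, 0.1, 0.01):  # from generous to strongly skeptical prior
                  weight = p_psi_exists * p_psi_given_result
                  print(f"prior on psi existing = {p_psi_exists}: weight of 'this is psi' ~ {weight:.3f}")

          With a skeptical prior the weight collapses, which is essentially the point made below that, given reasonable prior doubt about psi, any deviation here reads as contamination rather than evidence.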

          However, this is no longer, I’m afraid, my stand on this experiment. I have found an easy method of cheating on this test that does not require very many runs but produces a significantly higher hit rate than chance.

          As for your question “Is this the only case where flaws were dismissed for being of low probability…?”, I’m not sure what you mean. There is always the possibility, in any field whatsoever, that an experiment might have been marred by very-low-probability flaws. For example, any experiment’s results (positive or negative) can be attributed to an elaborate conspiracy by a powerful outside organization intent on producing false results, which breaks into the lab or the equipment supply room and tampers with the equipment, samples, or whatever. Those low-probability, unknown, special reasons are why we do replications (when replications are actually done, which is much rarer in real science than in textbooks).

          Here though, the probability is much too high (even in my original evaluation) to admit the results as evidence for the existence of psi. Given reasonable prior doubt about the existence of psi, any deviation from chance expectation in this experiment could only be seen as evidence of conscious or unconscious sensory contamination.

          When I talk about the experimental evidence for psi I am speaking of a body of highly rigorous experiments that are consistent with the psi hypothesis — i.e., that falsify the conventional assumptions about the physical isolatability of information.

          As to quoting you: sorry. I thought you *were* very clear, and on multiple occasions. I didn’t think my summary would be the least bit controversial, even to you. When I have time (I’m way behind in my day job) I will find one of the statements, with its context, and quote it.

      • Jacob, I see I overlooked your e-mail. Sorry. I get tons of spam, and it got marked as probably junk.

        I generally prefer to discuss these things on a public forum, and we had a thread going on sci.skeptic.

        Cheating well in experiment 3 would take several hours’ work building scripts. Minimizing the number of runs required might be an interesting math problem, but after writing the scripts the runs are quick, and a straightforward solution will usually find all the cards after 35 random runs.
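
        To show what such a script might look like, here is a minimal simulation (the names and run counts are my own assumptions, not Bryan’s actual approach) of one straightforward attack: each 20-trial run, together with its hit count, yields one linear equation in the 36 unknown card colors, so a little more than 36 random runs is usually enough to solve for all of them.

            import numpy as np

            rng = np.random.default_rng(0)

            N_CARDS = 36   # distinct card photographs (assumed)
            RUN_LEN = 20   # trials per run before the statistics are shown
            N_RUNS = 40    # a little more than the 36 equations needed for full rank

            truth = rng.integers(0, 2, size=N_CARDS)   # hidden colors: 1 = red, 0 = black

            A = np.zeros((N_RUNS, N_CARDS))   # one linear equation per run
            b = np.zeros(N_RUNS)

            for r in range(N_RUNS):
                cards = rng.integers(0, N_CARDS, size=RUN_LEN)   # which photo appeared each trial
                guesses = rng.integers(0, 2, size=RUN_LEN)       # 1 = guessed red, 0 = guessed black
                hits = int(np.sum(guesses == truth[cards]))      # the only feedback the site reveals
                # hits = (# black guesses) + sum_c a_c * x_c, where x_c is card c's color and
                # a_c = (# times c appeared with a red guess) - (# times with a black guess)
                for c, g in zip(cards, guesses):
                    A[r, c] += 1.0 if g == 1 else -1.0
                b[r] = hits - np.sum(guesses == 0)

            x, *_ = np.linalg.lstsq(A, b, rcond=None)   # solve the overdetermined system
            recovered = np.rint(x).astype(int)
            print("all 36 colors recovered:", np.array_equal(recovered, truth))

        Forty runs of 20 trials is 800 ordinary-looking guesses, and as the next paragraph notes, nothing stops those runs from being spread across several apparent users.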

        Could we tell if someone cheated enough to fake the results? In this experiment, I do not see how to distinguish clever cheating from psi. Net identities are virtual. The ~35 runs can appear to come from multiple users with different addresses, and they can consist of truly random guesses. After those runs the attacker knows all the answers, and can then fabricate other users that score any way he chooses.

        Faking user interaction is a reasonably well-studied problem because of the money scammed in web click-fraud. If you really have a robust method for detecting it, the Internet advertising industry will pay you a hundred million dollars.

        On analysis of your second experiment, I do not know how to do a really worthwhile “analysis”. Were it up to me, I would choose the most direct possible reporting of the data. I’m happy to help, but remember that I am a skeptic who criticized this experiment from the start. Whether my submissions are “help” would be a subjective matter, on which we might disagree.


        –Bryan
