Parapsychology articles and news

Results of Psi Experiment 1 published

I’m glad to announce that, after a lot of work with the data from the first Psi Experiment, I’ve now posted them at results of the first psi experiment. Despite some bad experiment-design decisions, there are still some interesting things to learn about demographics, psychological biases, and which methods work better.

This is also an opportunity to remind you to take part in the second psi experiment if you have not already done so. Tell your friends to participate as well.

Discuss the results in the comments.

  1. It appears I am psychic. I’m going to buy that lottery ticket tonight.

  2. I predict most people will select “B”, the scissors.

  3. Not sure what the memo thing is about. Was it that you never heard that laboratories are supposed to produce repeatable results, and thought observations in the wild and controlled laboratory tests were the same? Or were you on a tangent unrelated to the paragraph to which you were supposedly responding?

    Particle physicists get repeatable results. Whatever rate one gets, if others do the same experiment, they should get that too. They accept no theories of sheepons and goatons reacting only to the particle physicists who believe in them.

    “Oh yes. And what about the other memo? The one to Nature that instructs it to make sure that all phenomena […].” No, no, no. The memo you missed went the other way: it came *from* nature. A world that works on physical principle rather than magical fiat was not what people expected, but it is what we found. It’s why real science works.

    The idea that positive results in Parapsychology have no alternate explanation is ludicrous. All they show is a deviation from chance, and ensuring chance-level results requires controls to be essentially perfect.

    N-rays had a testable theory, and that theory proved false. Multiple labs reported finding evidence of N-rays, but they turned out to be fooling themselves. Parapsychology as a whole cannot go the way of N-rays simply because there are no specific predictions to falsify. Parapsychologists have shown that some labs sometimes find anomalous results even when they think they are being really careful. Skeptics are not surprised.

    “If your assertion were…” I stand by what I did assert, as both true and significant: after six or seven decades of research, Parapsychologists have failed to find a single consistent demonstration that what they are studying even exists.

  4. [Comment on second paragraph]

    No, “the idea that PSI [sic] results fade away under continued careful examination” is not simply a broad claim made by skeptics. Careful, though: that statement is subject to very different interpretations. I myself have repeatedly said during this discussion that psi phenomena are erratic. Therefore, if you continue to observe over a period of time, you will get unambiguous evidence for their existence at some times and not at others. The times they are not observed do not invalidate the times they are — and statistically they do not come anywhere close to canceling them out to chance levels (that’s a statistical fact, not a guess). The skeptics make a different claim, which is not supported by the facts: that when you examine the results of psi experiments closely, they disappear — a very different statement.

    The erratic behavior of psi actually produces a bit of a problem for skeptics. You see, if the same methods produce results sometimes but not others, and this cannot be explained as statistical chance (as it can’t), then there must be some reason for the difference. This is why there are so many confident statements that the fading is due to “improved controls” being imposed. They *know* this because otherwise the controls that were adequate to eliminate the “error” in the later experiments would have been adequate to eliminate the “error” in the earlier ones, and that leaves the earlier successes unexplained (I am, by the way, pretending that this actually is a simple matter of earlier vs. later, which it is not). But that would imply that, horrors!, they are mistaken and there is something going on that they don’t understand.

    Also keep in mind that parapsychological experiments are, in fact, much more consistent than is generally understood, including by some old-line parapsychologists. If all you count is the “significance” of an experiment, without taking into account its statistical power, then you will see a pattern from a completely consistent source of some experiments being successful and some being unsuccessful. There are statistical tests that measure consistency independent of this bias, and those show much more consistency than most parapsychologists expected.

    • “The skeptics make a different claim, which is not supported by the facts, that when you examine closely the results of psi experiments they disappear — a very different statement.”

      The trouble is, in most cases we cannot really examine how experimenters obtained their results. When we skeptics look for ourselves, we never find psi. Whatever one thinks of Randi’s challenge, many psychics have negotiated conditions and been tested, and the paranormal phenomena have failed to appear every single time.

      My theory is that psi does not exist. It makes specific falsifiable predictions that should hold every time. I also expect Parapsychologists to continue to report that they produced anomalous results in ways that I cannot check.

  5. [Comment on first paragraph]

    Sorry, *I* thought that the point of laboratory science was the same as that of all science: to produce evidence about the existence and characteristics of phenomena, including replication of previously produced results. I never got the memo that made it not just a convenient tool but *the* primary goal to be able to completely, unambiguously and easily produce and control the phenomena in question. What was the specific time limit specified in the memo? Did it make allowances for the resources available? What about intrinsic complexity, and the difficulty of observing primary characteristics of parts of the system within which the phenomena occur (such as the human mind — one of the most complex and inaccessible systems known to us)?

    Have you forwarded a copy of this memo to the particle physicists? You know, the ones who work on particles and interactions that only rarely show up, once or twice out of thousands of trials, and — get this — not even on the one or two trials that they require. Be sure to send it to the folks who are spending hundreds of millions on the search for the so-called “God Particle” (more properly called the “Higgs Boson”). Despite decades of work, they have yet to see it.

    Oh yes. And what about the *other* memo? The one to Nature that instructs it to make sure that all phenomena work in a simple, easily comprehensible and controllable way, with all the relevant factors obvious and easily observable. I don’t think that Nature received her copy.

    First off, neither you, nor I, nor anyone else *knows* that N-rays don’t exist. We just don’t have any good reason to believe that they do. In particular, the few experiments that seemed to show that they did had a clear alternative explanation — that human perception at its faintest limits is subject to error. It’s not just that the effect is not reproducible; it is that there never was a single experiment that did more than suggest that the assertion “There are no such things as N-rays” had been falsified. That is a very different case from psi, where there are thousands of replications that have no reasonable counter-explanation. That it doesn’t show up today doesn’t invalidate its appearance yesterday.

    If your assertion were, “Parapsychology does not yet have an adequate understanding or control of the phenomena it studies,” your arguments would hold weight. In fact I agree completely with that statement. The issue is whether the proper scientific response to a phenomenon which has been proven to exist but is not understood or controllable is to a) ignore it or b) study it more.

  6. So you are saying, “It is irrelevant that thousands of black swans have been observed by trained observers under good conditions. If they don’t show up on demand and behave the way I think they should (they should act like canaries), they don’t exist. All swans are white and it is irrational to think otherwise.”

    Psi phenomena are unquestionably erratic. I personally think that parapsychologists who see an “avoidance” pattern in that are engaging in the all-too-human tendency to impose patterns on random data, along with attributing to psi the effects of such quite conventional things as fatigue and loss of enthusiasm.

    Parapsychology is best understood — at least at its present level of development — as a “laboratory observational science,” like laboratory experiments to detect exotic particles from high-energy cosmic-ray showers. We set up situations in which psi can be distinguished from conventional phenomena and under which we believe it is more likely to occur (we scatter some fish around, chase away the dogs, and hope that the black swans will show up). Frequently, but not always, this works.

    Keep in mind that resources are limited. I would guess that the entire budget for parapsychology over its entire history could be matched by what is spent on, say, high temperature superconductors in a month. Yet it was several years before high temperature superconductors could be reliably produced.

    As for Susan Blackmore, the skeptical community’s embrace of her is as big an indictment of them as anything else I can think of. In her autobiography Susan says that she would slap together an experimental design, generally while driving in to work with a hangover from wild partying the night before (don’t take my word for it — it’s in her autobiography). She was sure that she was supposed to be an experimenter who could elicit psi simply because she had become convinced by a personal experience (which most parapsychologists would dismiss as having no paranormal element) that “it’s real.” Apparently to her that meant she wouldn’t have to put much work into it. Anyway, she’d perform some sloppy little experiment. Sometimes she’d get positive results. When she did, she would *then* do the critical examination for flaws in the rigor of the design and discover that there was nothing surprising in her results.

    She finally got tired of this and announced that, while she accepted that other experimenters clearly were producing positive evidence of psi phenomena, she seemed unable to. Eventually she was invited to a lab to try to learn effective methods. She was thrown out when she broke into one of the experimenters’ offices to collect evidence of fraud (successful only in her mind — she found a Zener deck; imagine, in a parapsychologist’s office). Any phenomenon that *she* couldn’t produce at will was of no interest to her and (again, don’t take my word for it) shouldn’t be of interest to anyone else either. Long before that point the parapsychological community had gotten tired of her — including her rejection of every attempt to suggest more effective ways to conduct experiments. The skeptical community was willing to kill the fatted calf for her — to lavish on her the attention she craved. She stayed active with them as long as she continued to receive attention from them — then basically dropped out.

    I would say Susan’s opposition to parapsychology is as meaningful as her previous acceptance.

    • Using your example: personally, I have never seen a black swan, but if they do exist in nature and I did come across one, I would not attribute it to something supernatural but to normal probability, since they would be by definition rare. I would not expect them to show up on demand.

      How would a believer in PSI distinguish their ability from normal probability when they “predict” anything? Are all correct answers distinguishable to that person? Would they know that the selection was a result of the PSI gift, or would they know they were guessing and got lucky? The same question would apply with wrong answers and being unlucky.

      I would think that, to be more credible, the gifted person would have to be able to distinguish this ability in some way from normal probability. The reasoning that this is difficult to test due to people “reacting strongly against the strictures” is hard for me to buy in on. If I understand you correctly, you are saying they are either consciously or subconsciously sabotaging themselves? What would be the difference between taking a controlled test and demonstrating their ability to anyone?

      • Who was talking about supernatural? Paranormal, however some people may use the term for a variety of reasons, refers to as yet unknown natural processes. That’s why we do experiments: to try to understand those processes.

        When I see evidence of psi phenomena, I see evidence of something that we don’t understand yet and which can and should be approached in a scientific manner.

        Complex factors of motivation are among the problems that make results erratic. Human beings are not simple entities like the electrons in physics experiments. Some self-proclaimed skeptics seem to think that we should be able to simply ignore human psychology, and impose a view that if psychic phenomena are real they should be simple.

        Despite these complexities — this understandable erraticness, our inability to directly observe the factors that we do know about, and the apparent existence of as yet unknown factors (strong correlations have been found in multiple experiments with both local sidereal time and the local geomagnetic field — correlations that are on their own completely inconsistent with any skeptical explanation) — there is a strong residue of positive results. In the terminology of engineering: there is a lot of noise on the channel, but there is an unquestionable signal that comes through.

    • “So you are saying ‘It is irrelevant that thousands of black swans’….” No, I didn’t say anything like that; that was all you. This is laboratory science. There are certainly many reports of paranormal phenomena in the wild, but investigating those calls for quite different standards and methods. Laboratories are supposed to produce repeatable results. Inconsistent replication is common for preliminary results, but continued examination should separate things like quantum interference, which exists, from things like N-rays, which do not.

      Whatever your opinion of the “avoidance” matter, that paper is in Parapsychology’s most prestigious peer-reviewed journal. Is the idea that PSI results fade away under continued careful examination merely a broad claim by skeptics that has been examined and found wanting? It is hard to argue that only skeptics believe something that The Journal of Parapsychology reports.

      I tend to think that continued careful examination ought to yield a better view of the phenomena, maybe just because that’s how all the other branches of science work. Perhaps only skeptics see the comic irony of a paranormal explanation for failure to find paranormal results.

      Using the personal tell-all aspects of Blackmore’s autobiography to indict her science is simply garbage. Where did you get the idea she didn’t put much work into it? If you think an experiment is flawed, point out the flaw, not her wild night-life. When someone who reports statistics without knowing what they mean calls another’s work “sloppy,” how much weight should that carry?

      “Sometimes she’d get positive results. When she did, she would then do the critical examination for flaws in the rigor of the design and discover that there was nothing surprising in her results.” Of course one should critically examine such results. That’s how science typically works: look broadly for initial evidence of a phenomenon, and upon finding it examine more and more narrowly and closely.

      “She was sure that she was supposed to be an experimenter who could elicit psi simply because she had become convinced by a personal experience.” She believed in PSI from personal experience, but lab science is not about the special powers of the experimenter.

  7. The posting form deleted the link to the article on “actively evasive” PSI. Here’s a citation, and it’s easy to find on-line.

    J. E. Kennedy, “The capricious, actively evasive, unsustainable nature of psi: A summary and hypotheses,” /The Journal of Parapsychology/, Vol. 67, Spring 2003, pp. 53–74.

  8. Yes, other fields use meta-analysis, but it is notoriously unreliable. Parapsychology has gone a long time without finding a single consistent demonstration that what they are studying even exists.

    Astronomer Stephen Walton summed it up on Penn and Teller’s A&E show: “We’re now at a point where 60 or 70 years of fairly serious research has been done on psychic phenomena, and nothing has been found, and so I think it’s fair bet that nothing will be found.”

    • A meta-analysis is the name given to any set of procedures that looks at a body of independent pieces of evidence and derives a single conclusion. There is a branch of statistics devoted to it, whose purpose is to develop methods that can do this without bias and with maximum power. A meta-analysis can also be done using weak or biased informal methods. One of the most extreme is for someone to just make a bald assertion of opinion without any specific evidence backing it up — such as Dr. Walton (He must be qualified! He’s an Astronomer!).

      Dr. Walton’s statement, similar to many other “skeptics’” statements I’ve seen over the years, is obviously ludicrous. He can’t just make thousands of experiments, including many replications, disappear. If he had said “…and the accumulated evidence that I have seen is insufficient to convince me of the reality of the phenomena in question,” then it would have been a reasonable assertion. But that would have made it too obvious that he was stating a subjective opinion, not an objective fact.

      Like any statistical technique, conclusions reached on the basis of a formal meta-analysis have a chance of being incorrect. There have been a few studies that compared selected meta-analyses of several small studies with the “gold standard” of a single, large, unreplicated study and found that sometimes they don’t agree. The authors chose to interpret that as a criticism of meta-analysis rather than of the importance of replication and the problems with relying too much, in medicine, on that questionable “gold standard.” Maybe they were right, maybe they were wrong.

      Psi true unbelievers tend to be critical of formal meta-analysis because it so frequently contradicts their informal, provably weak or highly subjective meta-analyses. They make broad claims (e.g., “not replicated,” “file-drawer problem,” or “results disappear as rigorous controls are applied”) and then object to those claims being objectively examined and found wanting.

  9. There have been many protocols with many variations. Dean Radin’s book “The Conscious Universe” contains reviews of several designs and “meta-analyses” (statistical analyses of many experiments of similar design to produce a single set of conclusions — used, for example, to prove the dangers of second-hand smoke and the benefits of aspirin in preventing heart disease) of the results. I don’t agree with all of his theoretical interpretations, but the experimental reviews are first-rate. Dean has a new book out, but I haven’t read it yet so I can’t recommend it myself (Jacob is reading it, I think, and plans on reviewing it).

  10. “What can be seen from the tables. That people who reported that they know that they have strong abilities were about 40% better than average. People who selected that the believe that they have strong abilities were also less subjective to psychological bias but answered 1 much more than 5. So, although no final conclusion can be made, it seems that people who believe or know that they have some psi abilities are less prone to psychological biases and can actually answer better than the average masses.”

    Using chi-square as a measure of bias (departure from an even distribution) we get:

    No abilities: 147.65
    Believe some: 204.62
    Believe strong: 18.21
    Know strong: 31.80

    The bigger the number, the stronger the bias; so overall the “Believe strong” group was considerably less biased than the “Know strong” group. As with dowsing in the method analysis, most of the bias in the “Believe strong” category is found in avoiding the last alternative. Chi-squares for just the first four targets are:

    Believe strong: 8.27
    Know strong: 26.70

    (These are not directly comparable to the other chi-squares.) The pattern here makes me wonder if there is not an interaction effect going on (i.e., a large overlap between the “dowsers” and the “Believe strong” groups).

    If we were measuring psi effects rather than bias (which we are not), this would be called the “Sheep-Goat Effect,” discovered by Dr. Gertrude Schmeidler. There is a moderately consistent pattern of strong believers (sheep) scoring more positively than other groups in experiments. Naive strong disbelievers frequently score even more strongly in the negative direction. More sophisticated strong disbelievers score unexceptionally (there are some indications of them tending to score closer to the expected average than would be likely statistically, but as far as I can remember that has only been seen as a post hoc result and therefore cannot be relied on).

    As I remember, the best question for eliciting the sheep-goat effect is to ask something like, “Do you agree or disagree with the following statement: I believe that I will do well on this specific experiment: 1) Strongly agree; 2) Agree somewhat; 3) Don’t know; 4) Disagree somewhat; 5) Strongly disagree.” The key is focusing on how they feel they personally will do on this particular experiment.
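The chi-square computation used in this thread can be sketched in a few lines. The per-choice counts below are invented for illustration (the comments report only the resulting statistics); under no bias, the expected count is one fifth of the total for each of the five choices.

```python
# Pearson's chi-squared statistic against a uniform expectation of 1/5
# per choice, as used above to quantify departure from an even
# distribution. The counts are hypothetical, not the experiment's data.

def chi_square_uniform(counts):
    """Sum of (observed - expected)^2 / expected, with expected = n / k."""
    n = sum(counts)
    expected = n / len(counts)
    return sum((obs - expected) ** 2 / expected for obs in counts)

# A hypothetical group of 100 answers that avoids the last alternative.
counts = [28, 24, 26, 20, 2]
print(round(chi_square_uniform(counts), 2))  # 22.0
```

A perfectly even distribution gives a statistic of zero; the further the answers drift from an even split, the larger it grows.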

    • Those numbers I understand. That’s Pearson’s chi-squared statistic, with probability 1/5 for each of the 5 choices.

      But it is not a measure of bias, at least not directly. The more trials with a certain bias, the higher the statistic. You could normalize for sample size to get a plausible measure of bias.

      • You’re right: the chi-square is a measure of the *evidence of bias* rather than of bias per se. A larger sample is capable of providing evidence of a smaller bias. I’ll have to think about this a bit (as if I didn’t have enough to think about) — I’ve been embarrassing myself enough shooting from the hip. I’m tempted to say that the scale factor should be the square root of the sample size rather than the sample size itself, but I’ll have to think about it to be sure.

      • Nope, you were right and I was wrong: the chi-square scales with the sample size rather than (like Z-scores) with the square root of the sample size. That gives us estimates of the bias:

        No abilities: .224
        Believe some: .307
        Believe strong: .240
        Know strong: .346

        These are *estimates* of the bias factor you would get if you tested a very large number of people, of course, and the smaller groups give less accurate estimates, but it looks like those claiming “No abilities” have the smallest bias, followed by “Believe strong,” followed (big step) by “Believe some,” with (by an even bigger step) “Know strong” having the biggest bias (assuming I didn’t screw up anywhere).

        The “No abilities” and “Believe some” categories are large samples and therefore likely to be pretty accurate — their contrast can be taken as probably representative (though keep in mind that you cannot assume cause and effect — both the choices and the bias might be the effect of some other factors, such as cultural ones). The sample sizes for the other two are much smaller, but there is a strong enough contrast between them to probably indicate a meaningful difference.
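The correction worked out in this exchange (chi-square grows linearly with the number of trials, so dividing by the sample size gives a size-independent bias estimate) can be sketched as follows. The counts are hypothetical, since the comments give only the resulting statistics.

```python
# Chi-squared per trial as a bias estimate: because Pearson's statistic
# scales linearly with sample size, dividing by the number of trials
# yields a figure comparable across groups of different sizes.
# All counts here are invented for illustration.

def chi_square_uniform(counts):
    n = sum(counts)
    expected = n / len(counts)
    return sum((obs - expected) ** 2 / expected for obs in counts)

def bias_estimate(counts):
    """Size-independent bias: chi-squared divided by the trial count."""
    return chi_square_uniform(counts) / sum(counts)

small = [12, 9, 11, 10, 8]           # 50 trials
large = [c * 10 for c in small]      # same proportions, 500 trials
print(bias_estimate(small), bias_estimate(large))  # both 0.02
```

Two groups with the same answer proportions but different sizes get very different chi-squares yet identical bias estimates, which is the point of the normalization.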

    • I would think any proof would have to provide a measurable distinction between these results and what would be expected by normal probability, regardless of how people came by their choice. Are you trying to prove there are better results between methodologies?

      How would you distinguish the naive strong believers from the sophisticated strong believers? Would there also be a bias if there are naive and sophisticated strong disbelievers?

      I personally do not believe at this point that any methodology thought to be used by any participant has any weight on the results. I believe the results are basically only a guess, most likely influenced by more mundane issues of selection position, or attachment to a particular number or object color.

      I would think the audience for your tests is most likely made up of believers and skeptics, with some curious. Maybe these groups are represented: the “No ability” group being the skeptics, the curious being the “Believe some” group, with the balance being the strong believers.

      I don’t know this, only wondering out loud.

      • This experiment is incapable of showing anomalous information transfer (psi). It therefore cannot prove anything about psi.

        We can show, however, that different methodologies are associated with different degrees and patterns of irrelevant bias. It is reasonable to suppose that strong biases are likely to interfere with any subtle influence psi might have (although one analysis I did indicated that this is not always the case). Therefore, though the data is not directly relevant to psi performance, it is suggestive of factors that might be taken into account in future experiments.

        Actually, it was statistically sophisticated and statistically naive strong *dis*believers that I was distinguishing. The effect is generally seen as a reasonable hypothesis based on different experiments, with different overall subject populations, getting different results from strong disbelievers (psi-missing vs. chance levels). It has been, I believe, tested within experiments by including appropriate questions in the questionnaire.

        You are welcome to your beliefs. Keep in mind that they *are* beliefs. There is no evidence here to either support or refute them. Rationally, no one should change their a priori beliefs about the likelihood that psi phenomena occurred during the course of this experiment on the basis of these results — the experimental design precludes that.

        I would certainly think that your speculation about audience is correct, but keep in mind that categories can be slippery. Some “true believers” in psychic phenomena react strongly against the strictures — or even just the idea — of forced-choice style tests such as this one.

        • If you think that is the case, how would you propose to test psychic phenomena?

        • The same way it’s been done previously: by carefully designed experiments that include proper controls, randomization, and analysis and, just as importantly, conditions that experience shows are more likely to elicit detectable phenomena.

        • Makes sense. Have any specific examples of this occurring?

        • There’s nothing repeatable to the point that doubters can do the experiment and expect to see the phenomenon for themselves. The closest thing Parapsychology has is the “Ganzfeld” set-up, and it routinely fails. It’s also so laborious that multiple PSI labs have run out of funding before completing their planned runs.

        • Of course, very few skeptics have ever made any serious attempt to perform replications. In no other field is “replication” defined to mean “replicable by anyone without regard to qualifications.” Logically, experimenter attitude is a part of the experimental conditions (as it is in any area revolving around subject motivation). And of course, the experiments have been replicated over and over again. There are dozens of successful replications of the Ganzfeld, for example.

          Replication is about independent replications being done. They have been, over and over. Few fields have as much explicit replication. Being able to replicate at will, with minimum effort and easily acquired skills, would make life easier, but it is not a logical requirement.

          Conventional science makes certain strong predictions about a lack of correlation between properly isolated systems. Such claims, to be meaningful, need to be falsifiable. In principle, a single counterexample (a single “black swan,” so to speak) is all that is necessary to falsify such a claim. It is not logically necessary that “black swans” be findable anywhere, at any time, by anyone. In practice, replication is needed to give confidence that the falsifying observation took place. After thousands of falsifications have been done, it is no longer a question of reasonable doubt; continued doubt is strong, unfalsifiable belief.

        • Sporadic replication is expected in the early phase of establishing a result, but those days are long past. In real science, decades of research isolates variables and refines experiments to produce stronger and stronger results. In Parapsychology, it works the other way.

          “Few fields have as much explicit replication.” Did you take any science course with “lab” sections? What did you think that was all about?

  11. First, great work and glad to see that negative feedback is not stopping the continued experimentation.

    Right, here are some thoughts.
    If Psi exists, then the full spectrum of Psi will exist, which should mean that there is a distribution of talents from strongly negative (or reverse) to strongly positive, with the most likely being in the maybe, maybe-not category. This in itself, given a large random group, will cancel out the experiment, BUT since you included the question about people’s perception of their psi ability, this would skew the results towards the ‘believe have some / strongly believe’ categories.
    Now that I have said all that, I am going to undo some of it.

    I think it’s been said already, but the RV and Intuition methods seem significant; this could be down to the fact that these methods are less prone to positional bias, which the second experiment might eliminate. The random guessers will almost certainly contain a higher proportion of negative or reverse psi people, which would again skew this away from the average. The only section that might be significant is ‘Strongly believe’, because these people presumably have had lots of ‘incidents’ in their history to support that view (people lying notwithstanding, of course).

    I hope in your second experiment you will produce results of who picked which photo (1, 2, 3, 4) to see what the bias is there.
    Finally, maybe pure thought targets should be used rather than pictures, to eliminate any sort of bias.

    Pick an object, put it in a box, and ask people to tell you what it is; you can then add another category to the psi: mind reading, or remote mind reading. This might also show up the negative or reverse psi, if you can work out what the opposite of the object is.

    Finally, I would like to see more analysis of the people in the experiments along the lines of job type, IQ, geographical location, when they did the experiment, etc. — with the ability to skip that section, of course, so as not to put people off.

    Again good luck.

  12. The analysis is silly.

    “I’ve manually cleaned several answers.” Where was this manual cleaning step in the experimental design? It is totally invalid.

    Yes, people pointed out the experiment is poorly designed, but contrary to your reporting, the major problem pointed out to you is that the trials are not independent, because you have everyone guessing at the same random choice. The number, color, and position of the boxes is trivia; changing that would not make the experiment valid.

    “As you can see the strong biases really disturb the ability to easily answer if there was any psi effect.” We could see that from the experimental design, and said so.

“The distribution of the answers to this question showed that 54% of men believed that had some level of ability against 63% of women.” This is a self-selected sample, so the result is nonsense.

    “What interesting can be seen from these tables?” Basically nothing. There is no evidence that any of the methods has a better or worse chance on an individual guess than any other.

    Suppose the skeptics are right, and the actual position of the ring had no effect on people’s guesses. With 1500 trials at hitting a 1/5 shot, a 32.33% success rate sounds vastly different from the obtained rate of 9.30%. Really, it was all up to a single roll of a single die.
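Since everyone guessed at the same single target, the whole experiment’s hit rate is decided by that one die roll. A quick sketch of that point (not the experiment’s actual code; the per-position guess shares below are made-up illustrative numbers):

```python
import random

random.seed(1)
# Hypothetical shares of the 1500 guesses falling on each of the 5 positions.
shares = [0.20, 0.25, 0.30, 0.16, 0.09]

rates = []
for _ in range(10000):                # re-run the whole "experiment" 10000 times
    target = random.randrange(5)      # one roll decides the target for everyone
    rates.append(shares[target])      # group hit rate = share that chose the target

# Only five distinct hit rates are possible, one per die face:
print(sorted(set(round(r, 2) for r in rates)))
```

So a 9.30% or a 32.33% hit rate says nothing about psi; it only says which face happened to come up.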

• Well, nowhere in the report did I say that I can conclude anything about psi from this experiment. I also stated that it was badly designed.

      I’m learning from my mistakes (probably not fast enough for the second experiment).

      Actually, psychological biases are very interesting to me, too. And having an experiment that showed how strong they can really be is a great lesson for me and other (beginning) researchers.

I’m not a scientist, so I really appreciate your input, Bryan, Topher and the others who wrote. I try to do the best I can with the limited time and resources that I have for this project. And I don’t care if 2 or 3 experiments are badly designed in the learning process, as long as I end up with 2 or 3 well-designed ones from which some conclusions can be drawn.

    • Not nonsense at all — just quite restricted sense.

  13. “I believe that dowsing is very strongly influenced by the subconscious. From personal experience I know that it’s very hard to disconnect from your thoughts while dowsing for answers”

    Subconscious is good — psi seems to operate subconsciously. What you seem to argue is that dowsing tends to be too contaminated by *conscious* thoughts.

Here are the standard deviations for each of the methods. In this case, the standard deviation can be used as a measure of bias: if the number of guesses on each of the alternatives were equal (completely non-biased), the standard deviation would be 0.0.

    Guess: 33.01
    Intuition: 83.01
    Visualization: 29.38
    RV: 5.45
    Dowsing: 5.46
    Other: 3.42

    There is going to be some effect from small samples and “discretization” (counts cannot be arbitrary, continuous values but only integers), but I think that it is pretty clear that there was less bias shown by “dowsers” than by “guessers”, “intuiters” or by “visualizers”.

• Whoops, left out most of the figures above. The standard deviation depends on the sample size, so those figures are not comparable as is. What I did was run some Monte Carlo simulations to get the expected value of the standard deviation for an unbiased process, and the standard deviation of the standard deviation, for each sample. (I could probably have worked out or found a formula, but it was faster to simulate.) Then I divided the difference between the observed standard deviation and the expected standard deviation by the standard deviation of the standard deviation to get a measure of departure from unbiased. Here are the results:

      Guess: 11.62
      Intuition: 18.37
      Visualization: 10.78
      RV: 3.53
      Dowsing: 2.54
      Other: 1.56

Again, these are crude and approximate, not a formal statistical investigation. The point is clear, though: dowsing shows less bias than the first three techniques.
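A rough reconstruction of that Monte Carlo procedure (my sketch of what was described, not the original code):

```python
import random
import statistics

def sd_of_counts(n, sims=2000, seed=42):
    """Simulate an unbiased 5-way choice n times, sims times over, and return
    the mean and spread of the standard deviation of the five counts."""
    rng = random.Random(seed)
    sds = []
    for _ in range(sims):
        counts = [0] * 5
        for _ in range(n):
            counts[rng.randrange(5)] += 1
        sds.append(statistics.pstdev(counts))
    return statistics.mean(sds), statistics.pstdev(sds)

def bias_score(counts):
    """Departure from unbiased: (observed SD - expected SD) / SD of the SD."""
    expected_sd, sd_of_sd = sd_of_counts(sum(counts))
    return (statistics.pstdev(counts) - expected_sd) / sd_of_sd

# Perfectly even counts score below zero; heavily skewed counts score high.
print(bias_score([20, 20, 20, 20, 20]), bias_score([60, 20, 10, 5, 5]))
```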

• But the answers from dowsing were worse than even guessing, so what does this say about the method?

        • This experiment is about bias, and in its present form, only about bias. In a properly designed experiment we would be able to detect a bias due to one choice on each trial being the target, but this is not a properly designed experiment. We can’t say anything about psi here.

          Overall the distribution from “dowsing” is quite even *except* that there is a tendency to avoid the final position. That avoidance results in a low score.

One possible interpretation of that avoidance is “psi missing”: a tendency for some percipients under some conditions to appear to avoid the target. We don’t know, and can’t know, that that is what is happening here, though. All we know is that there is a positional bias, whatever the cause, against the last position, and not much other positional (or color, or whatever) bias.

          There is still bias here, but overall less bias than with other methods. What bias remains though is in a form that results in a particularly low score.

• Just look it up. The standard deviation of the binomial distribution, with n trials of probability p, is sqrt(n * p * (1 - p)). The normal score, or z-score, is (hits - expected_hits) / standard_deviation. That formula gives the score for the normal approximation to the binomial. The samples are small enough here that we could use the exact binomial.

        I do not think analysis of the z-scores or p-values of the individual categories is useful here. We could, and did, tell in advance that the experiment was broken and such numbers would show nothing interesting.
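In code, those formulas look like this (the function name is mine; and since the trials here are not independent, for this experiment the resulting number is not meaningful, which is the point):

```python
import math

def binomial_z(hits, n, p=0.2):
    """z-score under the normal approximation to the binomial."""
    expected = n * p
    sd = math.sqrt(n * p * (1 - p))   # sqrt(n * p * (1 - p))
    return (hits - expected) / sd

# 1500 guesses at a 1/5 chance with the reported 9.30% hit rate:
print(round(binomial_z(0.093 * 1500, 1500), 2))
```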

        • Except that this is multinomial not binomial and I’m talking about what might be considered second order statistics under circumstances where assumptions about the normal distribution might not apply.

          On the other hand, I should have spent 2 minutes thinking about it rather than 5 minutes simulating. After throwing in some scale factors this (for you stat geeks) would have a chi-square distribution with 4 degrees of freedom, and in fact, a standard chi-square test would have been appropriate and gotten me the same answers.

• I don’t get what your numbers mean. How did you get a single standard deviation for a multinomial distribution?

I can see a way to use the chi-squared distribution, but I don’t see how to get your numbers.

        • Embarrassing case of plunging ahead with something and missing the obvious. I thought of the five entries for each method as five samples against a common mean and computed the standard deviation. I was using standard deviation here just as a measure of deviation from the mean. I then used a simulation to measure the expected standard deviation and the standard deviation of the standard deviation and used them to derive what would roughly be called a “Z-score” (although I did not particularly expect a standard normal distribution from the result).

          Use of the chi-square was the obvious thing to do — and in fact the square of my “standard deviation” (i.e., the variance) scaled by 25/N (N being the number of trials in the category) is the chi-square, so they are equivalent.

          (Apologies to the non math/stat geeks)
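That equivalence is easy to check numerically (the counts below are made up for illustration):

```python
def chi_square(counts):
    """Pearson chi-square against a uniform expectation of N/5 per cell."""
    n = sum(counts)
    expected = n / 5
    return sum((c - expected) ** 2 / expected for c in counts)

def scaled_variance(counts):
    """Population variance of the five counts, scaled by 25/N."""
    n = sum(counts)
    mean = n / 5
    variance = sum((c - mean) ** 2 for c in counts) / 5
    return 25 * variance / n

counts = [34, 22, 18, 15, 11]   # hypothetical per-position counts
print(chi_square(counts), scaled_variance(counts))  # both 15.5
```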

        • “I do not think analysis of the z-scores or p-values of the individual categories is useful here. We could, and did, tell in advance that the experiment was broken and such numbers would show nothing interesting.”

Depends on what you find interesting. It doesn’t tell us anything about psi, but it might tell us something about techniques meant to elicit psi. Jacob seemed to feel that “dowsing” was a technique that might increase bias (hence rigidity, hence, presumably, less susceptibility to any subtle psi influences). I was pointing out that the data here indicate otherwise. Z-scores are meaningful (a normalized measure of difference); p-values (i.e., a measure of the degree of evidence of the aforementioned) probably are not.

    • Yes, I probably mixed conscious and subconscious up in this description. I’m going to change it now to say conscious.

    • Further down the page in connection with the break-down by belief we discussed the right way to measure bias. Applying that method here we get:

      [Mathphobes can close their eyes]
      Guess: .282
      Intuition: .284
      Visualization: .265
      RV: .353
      Dowsing: .177
      Other: .202
      [Mathphobes can open their eyes again]

So Dowsing showed the least bias by a good bit, with RV showing far and away the highest level of bias. Other was somewhat higher than Dowsing, but with the small sample sizes in these two categories we can’t really say that there was a big difference. Visualization was almost as biased as Guessing and Intuition, which showed (unsurprisingly) indistinguishable levels of bias.
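The exact normalization behind those figures is the one discussed in the break-down by belief; as a hypothetical stand-in (an assumption on my part, not necessarily the formula used above), one sample-size-independent bias measure is the total variation distance between the observed shares and the uniform 1/5 split:

```python
def tv_bias(counts):
    """Hypothetical measure (not necessarily the one used for the figures
    above): total variation distance from the uniform 1/5 distribution."""
    n = sum(counts)
    return sum(abs(c / n - 0.2) for c in counts) / 2

# 0.0 for perfectly even counts, approaching 0.8 for a single-cell pile-up.
print(tv_bias([20, 20, 20, 20, 20]), tv_bias([34, 22, 18, 15, 11]))
```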

  14. The biases in the “Guess” column indicate pretty strongly that people did not universally interpret that to mean “use an arbitrary, non-biased random generator (like a die)”. My guess is that for most people your “guess” and “intuition” categories were not well distinguished.

• Yes, guess and intuition could be seen as roughly the same, yet people chose one or the other about equally. I believed that providing both answers would make the choice easier for some people.

I didn’t mean by “guess” that people would roll a die (although at least one person wrote that he did exactly that).

      • I wrote that I rolled a die, but I’m pretty sure I checked “other” as the method, not “guess”. Only the “other” choice asks for text describing the method.

  15. “I’ve manually cleaned several answers where people stated that they used some very analytical skill, like which box was lower or by using reverse psychological bias…”

Can’t do that. If you had gotten positive results it might have invalidated them, if you had any hint as to what the people you eliminated had guessed. Your judgment as to whether or not to throw out a particular response might have been unconsciously biased by what they guessed (even if you didn’t know what the correct answer was, it would add to the lack of independence).

    If you are going to throw out the results you need to 1) make that an open part of the design before you start the experiment; 2) to the extent possible make the decision about whether and which responses to throw out purely mechanical; and 3) to the extent judgment needs to be applied it needs to be made by someone completely blind as to the actual response made or even the overall results (a poor overall hit rate might encourage more aggressive pruning in subconscious hopes that a big change in the population — even a random change — might push things into a positive result).

    Besides, as far as I can see, these responses are perfectly “legit”, just not using a strategy that you feel is likely to create positive results. You encouraged people to use their own strategy — these respondents did so.

Your correct analytic response should have been to include them and then do a post hoc analysis to see if there was a significant effect (the answer would have been that the sample was too small to cause any detectable effect). Post hoc analyses cannot be used to reach any conclusions, only to suggest hypotheses that might be incorporated into later experiments.

• Actually, many of the results I’ve purged (there were fewer than 10 of them, I think) were correct answers. But since the people wrote that they saw in the picture that box 5 was lower because of the weight of the ring, and that’s why they chose 5, I decided to remove these answers: the method is not one of the possible psi methods, and their inclusion would shift the statistics (even if towards the positive) and could result in different conclusions regarding psi effects.

If someone wrote that he prayed to God and the angels told him the answer, I would accept this, since it’s not an analytical answer; I would think of it as psi.

But maybe someone less involved should have done this. I’m mostly a one-man shop, though, so volunteers are welcome.

      • Doesn’t change my comments at all. That the excluded results tended to be “hits” just illustrates the problem of non-independence. *No* analysis of this experiment can reach any conclusion, for or against psi effects.

Your directions did not restrict your participants to using only “one of the possible psi methods”; prayer and angels was not one of the methods you listed, and you gave people the opportunity to describe “other” methods. Furthermore, you are simply assuming that people who report using “analytic methods” are not simply using an analytic rationalization for a subconscious, psi-derived guess (my quick measurement with a straight-edge actually places 3 as the “lowest” box).
        They were legit, just not the way you wanted them to do it.

      • “the people wrote they saw in the picture that box 5 was lower because of the weight of the ring and that’s why they chose 5, […]”

        Ah, clever.

        So if there’s anything we can take from this experiment, it is that observation and reasoning worked, where intuition, visualization, remote viewing and dowsing failed.

        • “..So if there’s anything we can take from this experiment, it is that observation and reasoning worked, where intuition, visualization, remote viewing and dowsing failed.”

Definitely, we would agree that measuring is not one of the psi methods, and is, usually, a more exact one.

Yet one of the men who measured selected 3 as his answer, since it looked lower to him.

  16. I have a number of comments. I’ll post them individually as time permits.

    Commenting on the phrase: “So, in general women are more likely to believe that they possess some kind of psi abilities.”

    You can’t conclude that. You have what is known in statistics as “a sample of opportunity” — and one with strong biases. All that you can speak of here is the characteristics of the people who chose to participate in your experiment.

Obviously, your age distribution is not characteristic of the population as a whole, so it might be only that “women in the 20-40 year age bracket are more likely to believe in psi than men in the same group.” But even that is way, way too strong a conclusion.

Who comes to the site, and, among those, who chooses to take the trouble of doing the test, severely biases your sample.

What if (I’m not saying I believe it, just “what if?”) women in your demographic are more likely to see your test as a way of supporting their positive beliefs, but think it a waste of time to bother with a test that they “know” can’t show anything, while men in your demographic are more motivated to “prove” that nothing will come of it (when that is their belief) than to prove that positive results can be gotten (when that is their belief)? That could produce a strong bias in who takes the test, one that might reverse an actual higher proportion of believers among males in the population.

    You wouldn’t want to claim, I’m sure, that 71% of all people are men, but that is what your figures show — if they are taken to apply “in general”.

    Your statistics (this one and throughout) tell you only about the people who chose to participate. That might suggest some interesting, broader hypotheses, but it doesn’t demonstrate anything beyond that specific set of people.

• Well, of course all the data and my conclusions refer only to the women and men who participated in the experiment. I never meant to generalize that to the whole population of the world.

      But thank you for pointing out to me that I shouldn’t be writing like I did.

      • Another error:

        “So, although no final conclusion can be made, it seems that people who believe or know that they have some psi abilities are less prone to psychological biases and can actually answer better than the average masses.”

        First, you didn’t test the masses. This is a self-selected sample.

        Second, even just considering the test participants, your statistics contradict you on that. The answers of the people who claim to know they have strong psychic abilities show *more* bias than other groups. In the “know strong” group, just two of the five choices (both wrong) got 68.48% of the guesses.

There is no evidence here that any of the groups can “answer better” than any of the others.

  17. There must be a problem with the methodology. 🙂 Pure chance would have given you ~ 20% correct; and the results show 9.3%. Must be a negative energy thing.

I haven’t kept up with my dowsing exercises. I knew I should have ordered the other guy’s DVDs.

    • I’ve written about the methodology problem in the results. People are biased based on the numbers (positions) of the boxes.

• The second test appears to present the same problem when selecting a choice. Most people will not choose a number at the beginning or the end; the results of the first test showed that, too. Colors will also lead people. Maybe you could display the choices in a circle, with no beginnings or ends.

• Yes, but the position of the objects is random, so even if people are biased on position, that won’t mean that the answers (objects) will be biased.

          In experiment 2, if all people choose the same position, then the 4 objects will be divided equally.
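That argument can be sketched with a quick simulation (my assumption of the setup: the four objects are shuffled into positions independently for each trial):

```python
import random

random.seed(7)
objects = ["A", "B", "C", "D"]
picks = {o: 0 for o in objects}

for _ in range(40000):
    layout = objects[:]
    random.shuffle(layout)       # random object placement each trial
    picks[layout[0]] += 1        # everyone always chooses the first position

# Positional bias alone spreads the picks roughly evenly across the objects.
shares = {o: picks[o] / 40000 for o in objects}
print(shares)
```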