Parapsychology articles and news

Results for Psi Experiment 3

The third psi experiment was the best experiment so far, judging by its design. The results for the data gathered so far, though, show no signs of psi effect from a statistical significance point of view. There were a total of 6417 trials (card guesses). The simple tables below summarize the results. The first one shows the number of correct and incorrect guesses for each card color:

Card Color     Guessed    Not Guessed    Grand Total
Black          1570       1651           3221
Red            1648       1548           3196
Grand Total    3218       3199           6417

The second table shows correct and wrong guesses by gender.

Gender         Correct    Wrong    Grand Total
Female         1517       1475     2992
Male           1701       1724     3425
Grand Total    3218       3199     6417

Since the data gathered so far did not show any evidence for psi, from a statistical point of view, further analysis was not performed. The experiment is still running, so you can still take part in it. If you have ideas for additional experiments that can be implemented over the web, I’ll be glad to hear your suggestions.
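As a rough check of the "no statistical evidence" conclusion, the sketch below runs a normal-approximation binomial test on the counts from the tables. It assumes each guess has a 1/2 chance of being correct (two equally likely card colors), which the post does not state explicitly. The function name and the choice of test are illustrative, not the analysis actually used for the experiment.

```python
import math

def two_sided_binomial_z(correct, trials, p0=0.5):
    """Normal-approximation test of `correct` hits in `trials` against chance p0.

    p0 = 0.5 is an assumption (two equally likely card colors).
    """
    expected = trials * p0
    sd = math.sqrt(trials * p0 * (1.0 - p0))
    z = (correct - expected) / sd
    # Two-sided tail probability of a standard normal beyond |z|.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value

# Counts copied from the tables above: (correct guesses, total trials).
results = {
    "All": (3218, 6417),
    "Female": (1517, 2992),
    "Male": (1701, 3425),
}

for label, (correct, trials) in results.items():
    z, p = two_sided_binomial_z(correct, trials)
    print(f"{label:8s} hit rate={correct / trials:.4f}  z={z:+.2f}  p={p:.3f}")
```

Under these assumptions none of the three rows comes close to the conventional 0.05 threshold, which is consistent with the summary above.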





  1. Oh, there’s also a possibly confusing typo: The last paragraph contradicts the first by stating:

    “Since the results show statistically significant evidence for psi […]”

    The correct statement is the first paragraph. “The results for the data gathered so far, though, show no signs of psi effect from a statistical significance point of view.”


    • Oops, my bad. Going to fix now.

  2. Jacob had reported these numbers to Topher Cooper and me in a private e-mail. His message also noted a technical glitch in running the experiment. I won’t quote private e-mail without permission, but I believe that proper reporting must describe that glitch.

    As I stated from the start, my preference is to keep all technical correspondence on a public and archived forum. Looking back at historical psi experiments, wouldn’t it be great to have a record like that?


    • Ok, maybe I should have reported the glitch, so here it is:

      On May 15, the protocol the site uses to receive random numbers changed slightly. Those numbers were used to select which card would be shown. I only noticed this when looking at the results of the experiment around June 15. So all the trials between those two dates are not valid for the experiment’s results. There were about 4000 trials in this period, out of a total of over 10000 trials. The number of trials up to that event was 6417, as shown in the tables above.

      After I noticed the problem on June 15, I modified the code on the server to work again, so the new trials after that should be legit, but they were not included in the calculations above.

      Since the test is still running and does not depend on the results above (correct me, Brian, if I’m wrong), you can still take part in it, and maybe in a month or so I’ll collect the data again to see if there was any change.

      Best regards and thank you for your support,

      • To be statistically analyzable, an experiment must have a clear-cut criterion for termination (usually either “so many trials” or “such-and-such a date”). Lacking one is a statistical flaw called “arbitrary stopping.” (Of course, not all planned stopping criteria are valid. You could, for example, plan on stopping when the results reach statistical significance, which random walk theory says would eventually happen by chance.)

        So, you can continue to run the experiment, but it must be counted not as a “continuation” but as a new experiment (or “trial” which is what an independent but closely related experiment is called). And you must set a stopping date or some other termination criterion when the analysis is to be done — something outside your control (“when I’m ready to run the next experiment”, for example, won’t do).

        And of course, the design, though better than the previous ones, still suffers from flaws that leave it open to deliberate or unconscious non-psi influence. Running the experiment longer will give conscious tamperers more time to find the experiment and/or figure out how to “fix” it.

        That doesn’t mean that you shouldn’t continue it. It just means that it is something “interesting” or “fun”, not a serious research contribution.
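The point about stopping on significance can be illustrated with a small simulation. This is only a sketch, assuming pure chance guessing (hit probability 1/2), a look at the data every 50 trials, and a |z| > 1.96 threshold. All of these numbers and function names are illustrative choices, not taken from the experiment.

```python
import math
import random

def run_experiment(rng, max_trials, check_every, z_crit=1.96, peek=True):
    """Simulate chance-level guessing; return True if 'significance' is declared.

    With peek=True the running z-score is checked every `check_every` trials
    and the run stops at the first "significant" look. With peek=False the
    z-score is checked only once, at the planned end of the run.
    """
    hits = 0
    for n in range(1, max_trials + 1):
        hits += rng.random() < 0.5  # each guess is correct with probability 1/2
        at_checkpoint = peek and n % check_every == 0
        if at_checkpoint or n == max_trials:
            z = (hits - n / 2) / math.sqrt(n / 4)
            if abs(z) > z_crit:
                return True
    return False

def false_positive_rate(peek, experiments=1000, max_trials=1000,
                        check_every=50, seed=1):
    """Fraction of simulated no-psi experiments that report 'significance'."""
    rng = random.Random(seed)
    declared = sum(
        run_experiment(rng, max_trials, check_every, peek=peek)
        for _ in range(experiments)
    )
    return declared / experiments

fixed = false_positive_rate(peek=False)
peeking = false_positive_rate(peek=True)
print(f"fixed stopping rule:  {fixed:.3f}")
print(f"stop on significance: {peeking:.3f}")
```

With these settings the fixed-length rate should land near 5%, while the stop-on-significance rate is typically several times higher, even though no effect is present in the simulated data.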

      • For the record, the e-mail states May 12 as the day the interface changed, not May 15.

        There are a few issues around continuing the tests. First, there is that known channel of sensory leakage, sufficient to expose all the card colors. The at-chance results suggest (but do not prove) that no one exploited it either deliberately or subconsciously, but it is still a fatal defect. If continued trials still come out at chance, that’s the result you already had. If there’s a deviation, the defect is a good reason to reject the new results. Can’t win.

        Could the random-number problem influence results after the fix? Possibly. It combines with the design defect to expose that one card more quickly and easily. It also tends to mis-train participants. While the RNG was bad, the more a participant called that one color, the better he or she did on the run. The mis-training aspect does not affect statistical scoring against chance; rather, if psi is in play, mis-training could plausibly hurt performance.

        As Topher noted, you should specify when you will take the next results of record. You have a lot of flexibility; the one rule is that it must be independent of the outcome of the trials. My own preference is for fully automated reporting like the RPKP project does. The RPKP project is exemplary; you’ve actually done well on this part so far.


    • I agree, it would be interesting. Correspondence, however, technical or not, is simply conversation done in another medium. It is not part of the formal experimental “record”. It hardly seems practical to follow all scientists on Earth around with a camera and recording crew recording every conversation they have (not to mention requiring the scientists to constantly verbalize for recording their own otherwise internal self-conversations).

      What is required in science is that an accurate record of all details of an experimental design, execution, outcome and analysis be permanently recorded (the traditional “laboratory notebook” done in ink on prenumbered pages), and that the formal report (there is no formal report here, but what is reported in this blog somewhat stands in) accurately reflects any deviations from the planned design in execution (e.g., technical failure of the random number source) or analysis (e.g., “the results showed a strong deviation from a normal distribution, so a non-parametric test was substituted before the results were computed”).

      Jacob failed to report an important detail (which, however, did not compromise the accuracy of his published analysis, just its completeness); that was a mistake. Not making public an email conversation (“correspondence”) that happened to alert you to this error was not a technical flaw.

      • Yeah, let’s nix that “follow all scientists on Earth around with a camera” idea as impractical for now. Fortunately, resources already in place make the cost of using a public forum instead of e-mail indistinguishable from zero.

        Times change. In years past, lab logs were unavailable because disseminating them was impractical, not because of any benefit from withholding them. What’s the reason today? Do we want to conceal something, or is it just the inertia of ink-and-paper thinking in an electronic world?


        • Bryan, nobody is talking about trying to “conceal” anything. You sound like one of those people who respond to any talk about privacy protections with questions like “what are you trying to hide?” and implications about criminal activity.

          If you are talking about the desirability of keeping a record, when convenient, of the process of scientific creativity, because it is of general interest, then I agree with you. If you are talking about an evidential requirement, then I’ll continue to support the rules of *scientific* evidence, thank you.

          It is a strong principle of science that it is the end result that constitutes evidence, not the wrong turns, associations, personalities, or meandering creative paths by which it got there. The rest is not only irrelevant, but its inclusion in an evaluation of the evidence constitutes a severe flaw in that evaluation. What is required is the complete plan (as determined in advance), a published fair summary of the results, the complete data available, and the analysis. That’s it; nothing else is relevant.

          In particular, having discussions, via email or in person, that are casual and tentative (whether or not in a public forum) or are held privately is not a weakness in the evidence. It’s the final outcome that constitutes the evidence, period; anything else is bad science.