Fun with Numbers, Key Stats Reconsidered - March 2
I told you last night that I was getting material on statistics from folks far more gifted in that area than I am. I have permission to run the following thought-provoking piece from Jack Poirier. I need to thank the Jerusalem Perspective for permission to run this, as they are publishing it as well. And, of course, big kudos to Jack for his work. Jack is a blog brother on Crosstalk, a discussion group on Jesus I am a part of. His degree is in "ancient Judaism" from the Jewish Theological Seminary in NYC (in 2005). But as they say in the sports world, "he's got game" when it comes to stats. Here is his piece:

Fun with Numbers: The Statistics behind “The Tomb”
by Jack Poirier

Most scholars paid little attention a few years ago when the so-called “Bible Code” was all the rage. After all, the idea that there were hidden words in the Hebrew text of the Bible at given frequencies of letters was hardly worth the trouble to discredit, and the odds of those words appearing merely by accident weighed overwhelmingly against the view that they were hidden there by divine intelligence. The fact that the proponents of the Bible Code produced their own grossly inflated statistics about these words didn’t make it any more worth the trouble to speak up.

But now, with the premiere of “The Tomb” less than a week away, a strikingly parallel abuse of statistics has caught the attention of a number of scholars. “The Tomb” is a documentary on the supposed discovery of Jesus’ family tomb. It argues that a tomb containing 10 ossuaries, including one inscribed with a form of “Mary”, one with a form of “Joseph”, and one with “Jesus son of Joseph”, is none other than the family tomb of the Jesus of the gospels. Almost all scholars think the idea of this being Jesus’ family is ridiculous, but the possibility that some viewers may be taken in by it has been a cause for alarm.
As one of the few who actually paid attention to the Bible Code, I’m probably more fascinated than most scholars at the way in which statistics have been padded in support of the documentary’s claims. Several years ago, Bible Code proponents found a number of ways to misrepresent the true odds of finding an embedded message. One way was to allow themselves the luxury of letting the message appear anywhere it wanted (so long as it appeared), while tying its specific location to a statistical itemization (as if its exact location were an object of statistical prediction). Equally often they failed to differentiate between the finding of precisely the message that was encoded and the phenomenon of encoded messages in general. Any honest and competent student of high school-level statistics would instantly recognize these as cheating moves, yet both figure conspicuously in the statistics behind “The Tomb”.

I have seen a number of confusing statistics associated with the upcoming show (the most extreme naming odds of 42 million to one), but like everyone else, I have to wait until the show airs to know exactly what is being claimed. Although I have an idea of how these statistical sets were delimited, I will, for safety’s sake, just give a brief explanation of what the statistics should say, that is, of how the statistical sets may justifiably be defined and what the true numbers look like. Hopefully, after the show airs, I will be able to append a more specific comment about what was actually claimed.

The tomb originally contained ten ossuaries, but one is presently missing, so we effectively have nine ossuaries. The field, however, is defined not by the number of ossuaries but by the number of inscribed names. Six of the nine ossuaries are inscribed, two with patronymics, giving a total of eight names (excluding the listing of a short form of one name along with its longer form). Six of the names are of males.
For the sake of getting a handle on the statistics surrounding the names, I’ll handle the masculine and feminine names as two discrete sets: feminine names appear so much less frequently than male names that it distorts things if we don’t treat the appearance of two feminine names as a given. The six ossuaries are inscribed as follows: “Yehudah son of Yeshua”, “Yeshua son of Yoseph”, “Maria”, “Matia”, “Yose”, and “Mariamene [who is also called] Mara”. The inscriptions that are statistically interesting are “Yeshua son of Yoseph”, “Maria”, and “Yose”, which the documentary claims represent the Jesus, Mary, and Joseph of the gospels.

What are the odds of each of those names separately appearing, and what are the odds of their appearing together, and in the form of the appropriate patronymic? Based on Richard Bauckham’s figures (from Jesus and the Eyewitnesses: The Gospels as Eyewitness Testimony [Grand Rapids: Eerdmans, 2006], drawn in turn from Tal Ilan’s Lexicon of Jewish Names in Late Antiquity, part 1: Palestine 330 BCE – 200 CE [Tübingen: Mohr-Siebeck, 2002]), the single-shot odds of a given male Jew in our period being named “Yoseph” (in any form) are about 8.30% and those of “Yeshua” (“Jesus”) are about 3.77%, while the single-shot odds of a given female Jew being named “Mary” (in any form) are about 21.34%.

Having more than single-shot chances (viz. having six chances for the male names [considered singly] and two chances for “Mary”), we find that the odds of at least one of the six being named “Yoseph” are about 40.56%, the odds of at least one of the six being named “Yeshua” are about 20.60%, and the odds of at least one of the two female names being “Mary” are about 38.13%. (The formula is 1 - (1 - x)^y, where x equals the odds of one occurrence in one shot and y equals the number of shots.)
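The "at least one in y shots" formula above can be checked with a few lines of Python. This is only a sketch: the function and variable names are invented for illustration, the inputs are the rounded single-shot percentages quoted in the text, and each inscribed name is treated as an independent draw.

```python
def at_least_one(p_single: float, shots: int) -> float:
    """Chance that a name with single-shot odds p_single turns up
    at least once in `shots` independent tries: 1 - (1 - x)^y."""
    return 1.0 - (1.0 - p_single) ** shots

# Single-shot odds as quoted in the text (rounded figures).
P_YOSEPH = 0.0830
P_YESHUA = 0.0377
P_MARY = 0.2134

yoseph_in_six = at_least_one(P_YOSEPH, 6)   # six male names
yeshua_in_six = at_least_one(P_YESHUA, 6)   # six male names
mary_in_two = at_least_one(P_MARY, 2)       # two female names

print(f"Yoseph, six shots: {yoseph_in_six:.2%}")
print(f"Yeshua, six shots: {yeshua_in_six:.2%}")
print(f"Mary, two shots:   {mary_in_two:.2%}")
```

Because the inputs are themselves rounded, the printed values land within a few hundredths of a percentage point of the 40.56%, 20.60%, and 38.13% given above.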
Calculating the odds of these all obtaining together involves multiplying their respective percentages, after changing the calculation for one of the male names from a sixfold shot to a fivefold shot (as one of the shots will already be taken by the granted occurrence of the first male name). Depending on which male name is taken first, the odds of all three occurring together within a field of six masculine and two feminine names are either 2.70% or 2.76%. Since a story on the ABC News website (“Bones of Contention”, dated Feb 26, 2007) cites archaeologist Amos Kloner as saying that there “are more than 900 buried tombs just like the ‘Jesus’ tomb within a 2-mile radius of Talpiyot” (quoting ABC News, not Kloner), we should not be surprised if several (if not dozens) of tombs should contain that same combination of names.

But what about the fact that “Yoseph” and “Yeshua” appear specifically in the patronymic relation of “Yeshua bar Yoseph”? Surely the odds of that happening are slighter. Of course they are, but they’re hardly as infinitesimal as the producers of this show would like us to think. The odds of one “Yeshua son of Yoseph” appearing, given two shots at it (there being two ossuaries with patronymics in this tomb), are found either by multiplying the odds of finding one “Yoseph” in two shots by the odds of finding one “Yeshua” in one shot, or by multiplying the odds of finding one “Yeshua” in two shots by the odds of finding one “Yoseph” in one shot. In other words, the odds are 0.60% (or 0.61%, depending on which order the names are taken), or one chance in 166.6 (or one in 162.7). Again, given the number of tombs in the area, that’s hardly a significant figure.

In reality, the odds are slightly better, since the Jewish practice of papponymy (of naming a child after his grandfather) excludes the possibility of the same name appearing in both positions in the same patronymic. Adjusting for papponymy gives odds of 0.65% (or 0.72%), or one chance in 152.7 (or one in 139.5).
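For readers who want to retrace the unadjusted figures in this passage, here is a rough Python sketch. The names are invented for illustration, and the tomb count rests on a loud assumption: it takes the "more than 900" in the ABC News quotation as exactly 900 and pretends every one of those tombs carries the same six male and two female inscribed names. The papponymy adjustment is not reproduced here.

```python
def at_least_one(p_single: float, shots: int) -> float:
    """1 - (1 - x)^y: at least one occurrence in `shots` independent tries."""
    return 1.0 - (1.0 - p_single) ** shots

# Rounded single-shot odds quoted in the text.
P_YOSEPH, P_YESHUA, P_MARY = 0.0830, 0.0377, 0.2134
TOMBS = 900  # ASSUMPTION: "more than 900" treated as 900 comparable tombs

mary = at_least_one(P_MARY, 2)

# All three names somewhere in the tomb; the second male name gets five shots.
yoseph_first = at_least_one(P_YOSEPH, 6) * at_least_one(P_YESHUA, 5) * mary
yeshua_first = at_least_one(P_YESHUA, 6) * at_least_one(P_YOSEPH, 5) * mary

# "Yeshua son of Yoseph" with two patronymic inscriptions, before any
# papponymy adjustment.
patronymic_a = at_least_one(P_YOSEPH, 2) * P_YESHUA
patronymic_b = at_least_one(P_YESHUA, 2) * P_YOSEPH

print(f"all three names: {yoseph_first:.2%} or {yeshua_first:.2%}")
print(f"patronymic:      {patronymic_a:.2%} or {patronymic_b:.2%}")
print(f"expected tombs with all three names: {TOMBS * yoseph_first:.1f}")
print(f"expected tombs with the patronymic:  {TOMBS * patronymic_a:.1f}")
```

On these assumptions the expected counts come out at roughly two dozen tombs for the three names and about five for the patronymic, which is the sense in which neither figure is remarkable.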
Now what if we combine the odds of finding the appropriate patronymic (“Yeshua b. Yoseph”) with the odds of finding “Yoseph” and “Mary” (or forms of those names) on the remaining ossuaries? Multiplying 0.65% (or 0.72%) by 22.90% (the odds of “Yoseph” appearing as the occupant of one of three remaining male ossuaries) and then by 38.13% (the odds of “Mary” appearing on one of the two female ossuaries), we obtain odds of just less than 0.06% (more precisely, of about one in 1711.5 or one in 1748.9). As small as that number appears, however, it is hardly telltale. This is readily visible from the great number of tombs there are: it’s hardly surprising that one of them should contain this combination of names.

But even the numbers crunched above stand in desperate need of qualification. A few considerations about how the statistical set was delimited “after the fact” will explain what I mean. By “after the fact”, I mean that what obtains in this tomb’s sampling is being treated as the only combination of names that could trigger the suspicion that this is Jesus’ family tomb, when in fact a number of other combinations of names could have done so with equal or more statistical impressiveness.

Leaving aside James Tabor’s theory that the so-called James ossuary was stolen from this tomb (see below), one could imagine a scenario in which one ossuary read “Ya’akov” (= “James”) rather than “Yoseph”. Undoubtedly the same media sleuths that are on the case now would have been on the case in that alternative scenario, asking why we have those names and not some others. So in asking about the odds of “Yoseph”, “Mary”, and “Yeshua bar Yoseph” showing up, we (or rather they) are really asking a statistically less meaningful question. The better question is “What are the odds of any suggestive combination of names obtaining?”, and that question, with the numerous counterfactual scenarios that it maps, puts the combination of names in the Talpiot tomb in a more realistic light.
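The combined figure quoted earlier in this passage, and the remark about the great number of tombs, can be put in rough numbers. The sketch below uses the first of the two papponymy-adjusted patronymic odds (one chance in 152.7) as given in the text. The names are invented for illustration, and the tomb count is an assumption: it treats the 900 tombs from the ABC News quotation as each having the same field of inscribed names, which is certainly a simplification.

```python
def at_least_one(p_single: float, shots: int) -> float:
    """1 - (1 - x)^y: at least one occurrence in `shots` independent tries."""
    return 1.0 - (1.0 - p_single) ** shots

P_YOSEPH = 0.0830        # rounded single-shot odds quoted in the text
P_MARY = 0.2134
PATRONYMIC = 1 / 152.7   # the text's adjusted odds of "Yeshua son of Yoseph"
TOMBS = 900              # ASSUMPTION: 900 comparable tombs, same field of names

yoseph_in_three = at_least_one(P_YOSEPH, 3)  # three remaining male names
mary_in_two = at_least_one(P_MARY, 2)        # two female names

combined = PATRONYMIC * yoseph_in_three * mary_in_two
one_in = 1 / combined

# Chance that at least one of the assumed tombs shows the whole combination.
any_tomb = at_least_one(combined, TOMBS)

print(f"Yoseph in three shots: {yoseph_in_three:.2%}")
print(f"combined odds: {combined:.3%} (about one in {one_in:.0f})")
print(f"at least one of {TOMBS} tombs: {any_tomb:.0%}")
```

Under these assumptions the chance that at least one of 900 comparable tombs shows the full combination comes out at roughly 40%, and that is for this exact combination only, before any other suggestive combination is counted.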
I wanted to keep this piece as short as possible, but mention should certainly be made of the fact that one of the two Marys in the tomb is being spoken of in the media as the Magdalene, although neither “Mary” ossuary identifies her that way, and there remains (in spite of Dan Brown et al.) no hint of a suggestion in the early sources (that is, until the Gnostic Gospel of Philip) that the Magdalene ever became part of Jesus’ family.

Mention should also be made of the fact that, although some names in the Talpiot tomb cannot be accounted for on the grounds of Christian tradition, they are being treated neither as a problem nor as a statistically neutral piece of information, but rather as pieces of positive evidence for identifying the tomb with Jesus, even to the point of construing these historically stray names as a statistical debt that naysayers must explain! (That this is so is shown by something that Tabor posted on Jim West’s weblog on Feb 28, 2007: there he states, “What we have to ask is what are the probabilities of these six names occurring together in a 1st century Jewish family tomb, namely: Mary, a second Mary, Jesus son of Joseph, Jude son of Jesus, Joseph, and Matthew” [!]. He doesn’t see that three of these names are “after the fact” and thus should be laid aside as having no probative value whatsoever.)

And, finally, mention should be made of the way in which Tabor is so insistent on identifying the James ossuary (which he believes is authentic) with the missing ossuary from Talpiot that he even incorporates its inscription into the statistical burden. And I’m willing to bet money that the documentary doesn’t mention that the official report on the Talpiot tomb describes the missing ossuary as “plain” (meaning either that it is not the James ossuary or that the inscription on the James ossuary is more recent than the report).
Neither (I’m willing to bet) does it mention that the length of the missing ossuary differs from that of the James ossuary by more than 9 centimeters.

All things considered, there is no way, in my opinion, that the Talpiot tomb should lead to a rewriting of the history of Jesus’ family.

_______

Thanks for this again, Jack. One more thought that someone else communicated to me that I'd like to present for your thinking. Remember that the least common name in the tomb ossuary list is Jose, which pumps up the numbers in the statistical figuring, because on its own it is so rare. However, that name is merely a variation of Joseph, the second most common male name of the period. Now in a family, the way one distinguishes between senior and junior is to give junior a nickname or shortened name, that is, a name of endearment. So in my own family we have three Joes (one is Joe, the next generation is Joe Junior, and the third is now little Joe, even though now he is physically bigger than the other Joes! If we get to a great-grandson I am not sure what will happen to this efficient system). This renaming phenomenon (with variations) affects the calculation of the odds. It is often the case that the same name appears in the same family as the name is passed on through the generations and that a nickname will also appear. That means that Jose is not as rare or surprising in a family that already has a Joseph. What I do not know how to do is turn that point in a precise statistical direction. But what it means is that the largest factor contributing to the proposed 600 to 1 figure is severely compromised by this observation. Ah! I knew I took math and logic for a reason years ago. Aren't numbers fun?