An important topic in professional psychology (my day job ;-) is inter-rater reliability. If different experts provide very similar ratings, then the rating process, or method, has high inter-rater reliability. Consequently, the ratings are dependable: you can count on them not being so subjective as to become meaningless. So I don't go too far afield, see this site for more info on inter-rater reliability:
http://www.socialresearchmethods.ne...reltypes.php
So, I wondered if inter-rater reliability studies have been conducted regarding coin grading in general, and among the TPG services specifically.
My search of this forum and some Google searches found two studies:
1) A study conducted by a coin collector, StuJoe, who might have been a member of this forum and who at one time had a couple of websites, e.g., TheStuJoeCollection.com, which are no longer functional. The results of his study are posted at:
http://www.rickbassett.com/pace/dis...0results.htm
His results show significant variation amongst the grades assigned by his participants, i.e., low inter-rater reliability.
2) A computer science PhD student (now a professor), Rick Bassett, conducted research for a dissertation, Machine Assisted Grading of Rare Collectibles through the COINS framework, which I found quite interesting, since he approached the topic in a meticulous, scientific manner, and because he constructed a fairly accurate machine-grading system (optics, software, etc.).
His dissertation is available online:
http://www.richardbassett.com/pace/...on%204.0.pdf
Detailed info about his research process is available too:
http://www.rickbassett.com/pace/dissertation/
In terms of the inter-rater reliability of experienced ("expert") coin graders, the results revealed a very wide range of grades assigned to digital photographs of various Lincoln cents. In statistical terms, the grades given to the same coin had a high standard deviation, which means the graders disagreed substantially and the inter-rater reliability was low. The results are on pages 90-91 of his dissertation.
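For anyone curious how "high standard deviation" connects to "low inter-rater reliability", here is a minimal Python sketch. The grades are made up for illustration (they are NOT taken from the dissertation or from StuJoe's study), the function names are my own, and I am assuming a simple one-way intraclass correlation, ICC(1,1), as the reliability statistic; the dissertation may well have used a different measure.

```python
from statistics import mean, stdev


def per_coin_spread(ratings):
    """Sample standard deviation of the grades each coin received.

    `ratings` is a list of rows, one row per coin, one grade per grader.
    """
    return [stdev(row) for row in ratings]


def icc_one_way(ratings):
    """One-way random-effects intraclass correlation, ICC(1,1).

    Compares variation BETWEEN coins with variation WITHIN a coin
    (i.e., disagreement among graders). Values near 1 mean graders agree;
    values near 0 (or below) mean the disagreement swamps real differences.
    Assumes every coin was graded by the same number of graders.
    """
    n = len(ratings)        # number of coins
    k = len(ratings[0])     # number of graders per coin
    grand = mean(x for row in ratings for x in row)
    row_means = [mean(row) for row in ratings]
    # Mean square between coins
    msb = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    # Mean square within coins (grader disagreement)
    msw = sum(
        (x - m) ** 2 for row, m in zip(ratings, row_means) for x in row
    ) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)


# Hypothetical Sheldon-scale grades: 3 coins, 3 graders each.
graders_agree = [[60, 60, 60], [63, 63, 63], [65, 65, 65]]
graders_disagree = [[58, 63, 65], [60, 64, 66], [55, 62, 64]]

for label, data in (("agree", graders_agree), ("disagree", graders_disagree)):
    print(label, per_coin_spread(data), icc_one_way(data))
```

When the graders agree, the per-coin standard deviations are zero and the ICC is 1. When each coin gets grades several points apart, the within-coin spread is larger than the differences between coins, and the ICC drops sharply.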
These results support what many of you have said on this forum:
* Do not become over-focused on TPG numbers to the exclusion of your own assessment of a coin's grade, appearance, and desirability.
* Understand that different 'experts' will assign different grades.
* Judgments regarding the 'best' TPG often involve factors other than actual accuracy, such as how the TPG markets itself and social conformity [ https://en.wikipedia.org/wiki/Conformity ].
* 'Machine-grading', using a combination of computer analysis and human judgment, is probably the wave of the future, and will no doubt be marketed as "even better", for which TPGs will most likely charge a premium.
QUESTIONS
a) Do you know of any other research studies, particularly comparing the TPGs?
b) What other conclusions would you draw from the available research?
c) To what extent do you believe the TPGs would welcome a rigorous, unbiased, independent, audited study of the inter-rater reliability of their grading compared to themselves and other TPGs? (My guess is that it's the last thing they would want to see as it might put them behind the

.)
Thanks!
Mark