Minor League Baseball box scores
Building on my earlier work with basketball box scores, here is a new project: minor league box scores for some of the greatest Major League Baseball players of all time.
The box score data was entered and partially proofed using custom Python scripts that I wrote, and it is stored in CSV files based on Retrosheet’s .EBA format. Additional tools convert each .EBA file into a box score text file, game log text files, and so on. What does “partially proofed” mean? Each box score is checked for inconsistencies within that box score (such as a player with more hits than at-bats, an incomplete batting order, a missing defensive position, or a pitcher charged with more runs than the other team scored). However, I have not yet checked the cumulative season statistics against the season stats listed on Baseball-Reference and other sources.
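To make the idea of a within-box-score check concrete, here is a minimal Python sketch of the four checks listed above. It is not the script used for this project: the `BatterLine` and `PitcherLine` records, the `check_team` function, and the position abbreviations are all hypothetical names chosen for illustration, and the sketch assumes one line per batting-order slot with a single position each (no substitutes or position changes).

```python
from dataclasses import dataclass

# Hypothetical record types; the real .EBA-based files may be laid out differently.
@dataclass
class BatterLine:
    name: str
    order: int      # batting-order slot, 1 through 9
    position: str   # e.g. "lf", "ss", "p"
    at_bats: int
    hits: int

@dataclass
class PitcherLine:
    name: str
    runs_allowed: int

# Assumed abbreviations for the nine defensive positions.
POSITIONS = {"p", "c", "1b", "2b", "3b", "ss", "lf", "cf", "rf"}

def check_team(batters, pitchers, opponent_runs):
    """Return a list of human-readable problems found in one team's half of a box score."""
    problems = []

    # A player cannot have more hits than at-bats.
    for b in batters:
        if b.hits > b.at_bats:
            problems.append(f"{b.name}: {b.hits} hits in {b.at_bats} at-bats")

    # Every batting-order slot from 1 to 9 should be filled.
    missing_slots = set(range(1, 10)) - {b.order for b in batters}
    if missing_slots:
        problems.append(f"batting order missing slots {sorted(missing_slots)}")

    # Every defensive position should be covered by someone.
    missing_positions = POSITIONS - {b.position for b in batters}
    if missing_positions:
        problems.append(f"no player listed at {sorted(missing_positions)}")

    # No pitcher can be charged with more runs than the opponent scored.
    for p in pitchers:
        if p.runs_allowed > opponent_runs:
            problems.append(
                f"{p.name}: {p.runs_allowed} runs allowed, opponent scored {opponent_runs}"
            )

    return problems
```

A clean box score yields an empty list, and each inconsistency adds one message, so a batch run can print only the games that need a second look. Checks like these catch typing errors inside a single game; they say nothing about whether season totals match published sources.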
TBD: I will be sharing the Python scripts used to digitize these box scores via GitHub in the near future, and a README on GitHub will explain how the scripts work. The Ted Williams box scores were diverse enough to test an initial prototype, but the box scores I am currently working on are surfacing new scenarios (and script bugs) that I want to fix before sharing the code.
1938 Minneapolis Millers featuring Ted Williams