We study heuristics for a class of complex multi-armed bandit problems: the period-by-period choice of a set of objects, or "toolkit," in which the decision maker learns about the values of the tools within the chosen toolkit. This paper studies heuristics in which the decision maker employs Bayesian inference. Analytical results are combined with simulations to gain insight into the relative performance of these heuristics. We depart from the extensive bandit-learning literature in computer science and operations research by employing a discounted-expected-reward formulation, which stresses the importance of the classic exploration-exploitation tradeoff. A companion paper, Francetich and Kreps (2019), studies a variety of prior-free heuristics.
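To fix ideas, the discounted-expected-reward objective can be written, in illustrative notation, as

\[
  \max_{\{S_t\}_{t \ge 0}} \; \mathbb{E}\!\left[\sum_{t=0}^{\infty} \delta^{t}\, r(S_t)\right], \qquad 0 < \delta < 1,
\]

where $S_t$ denotes the toolkit chosen in period $t$, $r(S_t)$ its (random) one-period reward, and $\delta$ the discount factor. Because the decision maker learns the values only of tools within the chosen toolkit, the choice of $S_t$ affects both the current reward and the information available for future choices, which is the source of the exploration-exploitation tradeoff.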