Benchmarking LLMs via Uncertainty Quantification