Towards Robust QA Evaluation via Open LLMs