AI Reasoning Engineer
Turing
Contributed to the training and evaluation of large language models, focusing on reasoning, in-context learning, and software engineering problem-solving.
Designed complex coding and algorithmic tasks with structured reasoning paths to improve model accuracy and interpretability.
Contributed to CL-Bench, a context-learning benchmark with 1,899 tasks and 31,000+ evaluation rubrics.
Built an internal response-extraction and defect-analysis tool to identify inconsistencies in LLM outputs and improve evaluation reliability.