Junheng Hao 郝浚珩
Bio
Junheng Hao is currently a researcher/member of technical staff at Microsoft, working on LLM (Phi-3, GPT, etc.) training and customization. Prior to this, he obtained his Ph.D. from the University of California, Los Angeles (UCLA), advised by Yizhou Sun and Wei Wang at the Scalable Analytics Institute (ScAi) and the UCLA Data Mining Lab in the Department of Computer Science. Before coming to UCLA, he graduated in 2017 from the Department of Automation, School of Information Science and Technology, Tsinghua University.
Education
- Ph.D. in Computer Science, University of California, Los Angeles (UCLA), 2022
- B.Eng. in Automation, School of Information Science and Technology, Tsinghua University, 2017
Experiences
- Researcher, Microsoft GenAI, Oct 2022 - Present
- Research Intern, Microsoft Research (MSR), Jun 2021 - Sep 2021
- PhD Research Intern, IBM Research AI, Jun 2020 - Sep 2020
- Applied Scientist Intern, Amazon, Jun 2019 - Dec 2019
- Research Intern, NEC Lab America, Jun 2018 - Sep 2018
Research Interests
- Large language model (Phi, GPT-4o) training: pre-training, post-training, RLHF, etc.
- Systematic data strategies for LLMs: data selection and synthetic data generation
- Customized LLM development: domain-specific LLMs, reasoning/coding
- LLM + Knowledge Graph (KG)
- LLM benchmarking and evaluation
Selected Publications
See the full list at Publications and Google Scholar.
- Microsoft GenAI Team. “Phi-4-mini Technical Report: Compact yet Powerful Multimodal Language Models via Mixture-of-LoRAs”, 2025. Read more: Phi-4-Mini Model Release
- Microsoft GenAI Team. “Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone”, 2024. Read more: “Tiny but mighty: The Phi-3 small language models with big potential” (blog); Phi-3 Model Release
- Yubo Ma, Junheng Hao, Ruochen Xu, Shuohang Wang, Zhibin Gou, Liangming Pan, Yujiu Yang, Yixin Cao, Aixin Sun, Hany Hassan Awadalla, Weizhu Chen. “SciAgent: A Tool-augmented LLM for Scientific Reasoning”. In the 2024 Conference on Empirical Methods in Natural Language Processing (EMNLP 2024).
- Jiazhan Feng, Ruochen Xu, Junheng Hao, Hiteshi Sharma, Dongyan Zhao. “Language Models can be Logical Solvers”. In the 2024 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL 2024).
- Junheng Hao, Tong Zhao, Jin Li, Xin Luna Dong, Christos Faloutsos, Yizhou Sun and Wei Wang. “P-Companion: A Principled Framework for Diversified Complementary Product Recommendation”. In the 29th ACM International Conference on Information and Knowledge Management (CIKM 2020), Applied Research Track. [Amazon Blog]
- Junheng Hao, Chelsea J.-T. Ju, Muhao Chen, Yizhou Sun, Carlo Zaniolo and Wei Wang. “Bio-JOIE: Joint Representation Learning of Biological Knowledge Bases”. In the 11th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics (ACM BCB 2020, Best Student Paper Award). [UCLA CS News]
- Junheng Hao, Muhao Chen, Wenchao Yu, Yizhou Sun, Wei Wang. “Universal Representation Learning of Knowledge Bases by Jointly Embedding Instances and Ontological Concepts”. In the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2019).
Contact
LinkedIn / Email: haojh.ucla@gmail.com