Article summary
Original text: This repo contains the instruction-tuned 0.5B Qwen2.5 model, which has the following features:
Type: Causal Language Models
Training Stage: Pretraining & Post-training
Architecture: transformers with RoPE, SwiGLU, RMSNorm, Attention QKV bias and t…
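The architecture line names RMSNorm and SwiGLU without saying what they compute. Below is a minimal pure-Python sketch of the generic textbook form of both. It is not taken from the Qwen2.5 source code. The function names, the toy weights, and `eps=1e-6` are illustrative assumptions. RoPE and the attention QKV bias are not covered.

```python
import math

def rms_norm(x, weight, eps=1e-6):
    # Generic RMSNorm (eps value is an assumption): divide by the root-mean-square
    # of the vector; unlike LayerNorm, there is no mean subtraction
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    # Scale each normalized element by its learned gain
    return [w * v / rms for v, w in zip(x, weight)]

def silu(z):
    # SiLU (Swish with beta = 1): z * sigmoid(z); fine for small toy inputs
    return z / (1.0 + math.exp(-z))

def matvec(m, v):
    # Multiply matrix m (a list of rows) by vector v
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def swiglu_ffn(x, w_gate, w_up, w_down):
    # Generic SwiGLU feed-forward block (hypothetical weight names):
    # the gate branch goes through SiLU, the up branch stays linear
    gate = [silu(g) for g in matvec(w_gate, x)]
    up = matvec(w_up, x)
    # Element-wise product of the two branches, then project back down
    hidden = [g * u for g, u in zip(gate, up)]
    return matvec(w_down, hidden)

# Toy usage with identity weight matrices (illustrative values only)
eye = [[1.0, 0.0], [0.0, 1.0]]
normed = rms_norm([1.0, 2.0, 3.0, 4.0], [1.0, 1.0, 1.0, 1.0])
ffn_out = swiglu_ffn([0.0, 1.0], eye, eye, eye)
```

After `rms_norm` with unit gains, the output has a root-mean-square of about 1. With identity weights, the SwiGLU output for input `[0.0, 1.0]` reduces to `[0.0, silu(1.0)]`, because the up branch just multiplies by the input.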