Accelerating Training of Transformer-Based Language Models with Progressive Layer Dropping
Paper: https://proceedings.neurips.cc/paper_files/paper/2020/file/a1140a3d0df1c81e24ae954d935e8926-Paper.pdf
Flow
Q. How to train huge transformer networks efficiently?
Prior work:
Extremely high-performance hardware
Mixed-precision training (Quantization): the forward pass and backward pass are computed in half precision, and the parameter update is in single precision
-) Requires Tensor Cores, but hardware that supports them ..
2024.10.15