At the invitation of yl7703永利官網, Professor Jinshan Zeng of the School of Computer Science, Jiangxi Normal University, will give an academic talk on May 23, 2021 in Conference Room 506, Student Activity Center, West Zone, Chengguan Campus.
Title: On ADMM in Deep Learning: Convergence and Saturation-Avoidance
Time: 11:10–11:50 a.m., Sunday, May 23, 2021
Venue: Conference Room 506, Student Activity Center, West Zone, Chengguan Campus, Lanzhou University
Abstract:
In this talk, we develop an alternating direction method of multipliers (ADMM) for the training of deep neural networks with sigmoid-type activation functions (called the sigmoid-ADMM pair), mainly motivated by the gradient-free nature of ADMM in avoiding the saturation of sigmoid-type activations and the advantages of deep neural networks with sigmoid-type activations (called deep sigmoid nets) over their rectified linear unit (ReLU) counterparts (called deep ReLU nets) in terms of approximation. In particular, we prove that the approximation capability of deep sigmoid nets is no worse than that of deep ReLU nets by showing that the ReLU activation function can be well approximated by deep sigmoid nets with two hidden layers and finitely many free parameters, but not vice versa. We also establish the global convergence of the proposed ADMM for the nonlinearly constrained formulation of deep sigmoid net training from arbitrary initial points to a Karush-Kuhn-Tucker (KKT) point at a rate of order O(1/k). Besides the sigmoid activation, such a convergence theorem holds for a general class of smooth activations. Compared with the widely used stochastic gradient descent (SGD) algorithm for deep ReLU net training (called the ReLU-SGD pair), the proposed sigmoid-ADMM pair is practically stable with respect to the algorithmic hyperparameters, including the learning rate, initialization schemes, and the pre-processing of the input data. Moreover, we find that in approximating and learning simple but important functions, the proposed sigmoid-ADMM pair numerically outperforms the ReLU-SGD pair.
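The saturation phenomenon that motivates the gradient-free ADMM above is easy to see numerically: the derivative of the sigmoid, σ'(x) = σ(x)(1 − σ(x)), peaks at 0.25 at the origin and decays to nearly zero for inputs of moderate magnitude, so gradient-based training stalls once pre-activations drift into the flat regions. A minimal illustration (not the speaker's algorithm, just the standard sigmoid-derivative computation):

```python
import numpy as np

def sigmoid(x):
    """Standard logistic sigmoid."""
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    """Derivative sigma'(x) = sigma(x) * (1 - sigma(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)

# Gradient is largest at the origin and vanishes in the saturated regions:
print(sigmoid_grad(0.0))    # 0.25 (maximum)
print(sigmoid_grad(10.0))   # ~4.5e-05, essentially no gradient signal
print(sigmoid_grad(-10.0))  # ~4.5e-05, saturation on both sides
```

A gradient-free method such as ADMM sidesteps this vanishing-gradient bottleneck because its block updates do not backpropagate through the flat tails of σ.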
About the Speaker
Jinshan Zeng joined the School of Computer and Information Engineering at Jiangxi Normal University in 2015 as a distinguished returnee Ph.D. He twice received the Best Paper Award of the International Congress of Chinese Mathematicians (ICCM), in 2018 and 2020, and was invited to give a 45-minute talk there. His research has long focused on the theory of optimization algorithms in artificial intelligence applications. He has published more than 30 high-quality papers in leading journals and conferences in related fields, including more than 20 SCI-indexed papers, 10 papers in IEEE Transactions series journals, 1 ESI hot paper, and 3 CCF Rank-A papers. His papers have been cited nearly 1,000 times in the past five years, with the most-cited single paper receiving over 450 citations.
甘肅省高校應(yīng)用數(shù)學(xué)與復(fù)雜系統(tǒng)省級(jí)重點(diǎn)實(shí)驗(yàn)室
yl7703永利官網(wǎng)
Cuiying Honors College, Lanzhou University
May 20, 2021