[DSC 2016] Series Talk: Hung-yi Lee (李宏毅) / Understanding Deep Learning in One Day (一天搞懂深度學習) - SlideShare
台灣資料科學年會 · May 21, 2016
Deep Learning is a branch of Machine Learning that has received a great deal of attention in recent years. It has its roots in Artificial Neural Network models, but today's deep learning techniques are very different from their predecessors. The best speech recognition and image recognition systems today are built with deep learning, and you have probably heard about impressive deep learning applications in many settings (for example AlphaGo, which has recently been everywhere). If that left you eager to use this powerful technology but unsure where to start, this course is what you need.
Professor Hung-yi Lee of the NTU Department of Electrical Engineering will introduce deep learning in a single-day program.
Course outline:
What is deep learning
Deep learning techniques may look bewilderingly varied on the surface, but they come down to three steps: set up the neural network architecture, define the learning target, and start learning. This session introduces Keras, a deep learning tool that can help you finish a deep learning program within ten minutes.
Some people praise deep learning and hype it up, while others dismiss it as a gimmick. How does deep learning actually differ from other machine learning methods? This session analyzes the potential advantages of deep learning over other machine learning approaches.
Practical tips for deep learning
Deep learning toolkits are everywhere now and writing a deep learning program takes little effort, but getting good results is not easy; the small tricks applied during training are the key to success. This course shares practical techniques and hands-on experience with deep learning.
Deep learning models with memory
Machines need memory to do more. This session explains Recurrent Neural Networks and shows how deep learning models can have memory.
Applications and outlook for deep learning
What can deep learning be used for? How is it used for speech recognition, or for question answering? What problems do deep learning researchers care about next?
This course aims to help you not only understand deep learning but also get started with it efficiently and apply it to the problems at hand.
Whether you are a newcomer who has never tried deep learning or already have some experience and want to go deeper, you will get something out of this course.
[DSC2016]系列活動:李宏毅/一天搞懂深度學習
1.
Deep Learning Tutorial
李宏毅 Hung-yi Lee
2.
Deep learning attracts lots of attention.
• I believe you have seen lots of exciting results before.
This talk focuses on the basic techniques.
Deep learning trends at Google. Source: SIGMOD / Jeff Dean
3.
Outline
Lecture IV: Next Wave
Lecture III: Variants of Neural Network
Lecture II: Tips for Training Deep Neural Network
Lecture I: Introduction of Deep Learning
4.
Lecture I: Introduction of Deep Learning
5.
Outline of Lecture I
Introduction of Deep Learning
Why Deep?
"Hello World" for Deep Learning
Let's start with general machine learning.
6.
Machine Learning ≈ Looking for a Function
• Speech Recognition: f(audio) = "How are you"
• Image Recognition: f(image) = "Cat"
• Playing Go: f(board position) = "5-5" (next move)
• Dialogue System: f("Hello", what the user said) = "Hi" (system response)
7.
Framework
A set of functions f1, f2, … (the Model)
Image Recognition: f(image) = "cat"
Example: f1(image of a cat) = "cat", f1(image of a dog) = "dog"; a bad f2 might give "money" or "snake".
8.
Framework
A set of functions f1, f2, … (the Model)
Training Data: function inputs (images) and function outputs ("monkey", "cat", "dog")
Goodness of a function f: which function is better?
Supervised Learning
9.
Framework
A set of functions f1, f2, … (the Model)
Training Data: "monkey", "cat", "dog"
Goodness of a function f
Pick the "best" function f*
Training, then Testing: use f* on new input, e.g. f*(image) = "cat"
Step 1, Step 2, Step 3
10.
Step1:
defineaset
offunction
Step2:
goodnessof
function
Step3:pick
thebest
function
ThreeStepsforDeepLearning
DeepLearningissosimple……
11.
Step1:
defineaset
offunction
Step2:
goodnessof
function
Step3:pick
thebest
function
ThreeStepsforDeepLearning
DeepLearningissosimple……
Neural
Network
12.
HumanBrains
13.
Neural Network
A neuron is a simple function: z = a1·w1 + … + ak·wk + … + aK·wK + b, where the w's are the weights, b is the bias, and an activation function applied to z gives the output a.
14.
Neural Network
[Worked example of a single neuron: with the weights, bias and inputs shown on the slide, z = 4, and the sigmoid activation function σ(z) = 1 / (1 + e^(−z)) gives output 0.98.]
15.
Neural Network
Different connections lead to different network structures.
Weights and biases are the network parameters θ.
Each neuron can have different values of weights and biases.
16.
Fully Connect Feedforward Network
[Diagram: a small network of sigmoid neurons, σ(z) = 1 / (1 + e^(−z)), with the weights and biases written on the edges; on input (1, −1) the first layer outputs 0.98 and 0.12.]
17.
Fully Connect Feedforward Network
[The same network evaluated layer by layer on input (1, −1); the final output is (0.62, 0.83).]
18.
Fully Connect Feedforward Network
Given parameters θ, the network defines a function, e.g. f([1, −1]) = [0.62, 0.83] and f([0, 0]) = [0.51, 0.85].
This is a function: input vector in, output vector out.
Given a network structure, we define a function set.
19.
Fully Connect Feedforward Network
Input Layer (x1, x2, …, xN) → Hidden Layers (Layer 1, Layer 2, …, Layer L) → Output Layer (y1, y2, …, yM)
Each node is a neuron. "Deep" means many hidden layers.
20.
Output Layer (Option)
• Softmax layer as the output layer
Ordinary layer: yi = σ(zi)
In general, the output of the network can be any value; it may not be easy to interpret.
21.
Output Layer (Option)
• Softmax layer as the output layer
Softmax layer: y_j = e^(z_j) / Σ_{j=1..3} e^(z_j)
Example: z = (3, 1, −3) → e^z ≈ (20, 2.7, 0.05) → y ≈ (0.88, 0.12, ≈0)
The outputs can be interpreted as probabilities: 1 > y_i > 0 and Σ_i y_i = 1.
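As an aside, the softmax computation above fits in a few lines of NumPy; this is an illustrative sketch, not code from the slides:

import numpy as np

def softmax(z):
    # exponentiate, then normalize so the outputs sum to 1
    e = np.exp(z - np.max(z))   # subtracting the max is for numerical stability
    return e / e.sum()

z = np.array([3.0, 1.0, -3.0])
y = softmax(z)
print(y)          # approximately [0.88, 0.12, 0.00]
print(y.sum())    # 1.0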
22.
Example Application
Input: a 16×16 = 256-pixel image; each pixel x1 … x256 is 1 for ink and 0 for no ink.
Output: y1 … y10, where each dimension represents the confidence of a digit (is 1, is 2, …, is 0).
Example: output (0.1, 0.7, 0.2, …) → the image is "2".
23.
ExampleApplication
•HandwritingDigitRecognition
Machine“2”
1x
2x
256x
……
……
y1
y2
y10
is1
is2
is0
……
Whatisneededisa
function……
Input:
256-dimvector
output:
10-dimvector
Neural
Network
24.
Output
LayerHiddenLayers
Input
Layer
ExampleApplication
InputOutput
1x
2x
Layer1
……
Nx
……
Layer2
……
LayerL
……
……
……
……
“2”
……
y1
y2
y10
is1
is2
is0
……
Afunctionsetcontainingthe
candidatesfor
HandwritingDigitRecognition
Youneedtodecidethenetworkstructureto
letagoodfunctioninyourfunctionset.
25.
FAQ
•Q:Howmanylayers?Howmanyneuronsforeach
layer?
•Q:Canthestructurebeautomaticallydetermined?
TrialandErrorIntuition+
26.
Step1:
defineaset
offunction
Step2:
goodnessof
function
Step3:pick
thebest
function
ThreeStepsforDeepLearning
DeepLearningissosimple……
Neural
Network
27.
TrainingData
•Preparingtrainingdata:imagesandtheirlabels
Thelearningtargetisdefinedon
thetrainingdata.
“5”“0”“4”“1”
“3”“1”“2”“9”
28.
LearningTarget
16x16=256
1x
2x
……256x
……
……
……
……
Ink→1
Noink→0
……
y1
y2
y10
y1hasthemaximumvalue
Thelearningtargetis……
Input:
y2hasthemaximumvalueInput:
is1
is2
is0
Softmax
29.
Loss
1x
2x
……
Nx
……
……
……
……
……
y1
y2
y10
Loss
𝑙
“1”
……
1
0
0……
Losscanbethedistancebetweenthe
networkoutputandtarget
target
Ascloseas
possible
Agoodfunctionshouldmaketheloss
ofallexamplesassmallaspossible.
Givenasetof
parameters
30.
Total Loss
For all training data x1 … xR with targets ŷ1 … ŷR, each example r contributes a loss l_r between the network output y_r and the target ŷ_r.
Total loss: L = Σ_{r=1}^{R} l_r, which should be as small as possible.
Find the network parameters θ* that minimize the total loss L, i.e. find the function in the function set that minimizes L.
31.
Step1:
defineaset
offunction
Step2:
goodnessof
function
Step3:pick
thebest
function
ThreeStepsforDeepLearning
DeepLearningissosimple……
Neural
Network
32.
Howtopickthebestfunction
Findnetworkparameters𝜽∗thatminimizetotallossL
Networkparameters𝜃=
𝑤1,𝑤2,𝑤3,⋯,𝑏1,𝑏2,𝑏3,⋯
Enumerateallpossiblevalues
Layerl
……
Layerl+1
……
E.g.speechrecognition:8layersand
1000neuronseachlayer
1000
neurons
1000
neurons
106
weights
Millionsofparameters
33.
Gradient Descent
Find the network parameters θ* = {w1, w2, …, b1, b2, …} that minimize the total loss L.
Pick an initial value for w (random, or RBM pre-training; random is usually good enough).
34.
Gradient Descent
Compute ∂L/∂w at the current w: if it is positive, decrease w; if it is negative, increase w.
http://chico386.pixnet.net/album/photo/171572850
35.
Gradient Descent
Update: w ← w − η ∂L/∂w, where η is called the "learning rate". Repeat.
36.
Gradient Descent
Repeat until ∂L/∂w is approximately zero (i.e. when the update becomes very small).
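The whole update loop fits in a few lines of NumPy; this is a toy sketch with a single parameter and a made-up loss (the networks in this lecture have millions of parameters, but the loop is the same):

import numpy as np

def loss(w):
    return (w - 3.0) ** 2            # toy loss with its minimum at w = 3

def grad(w):
    return 2.0 * (w - 3.0)           # dL/dw for the toy loss

w = np.random.randn()                # pick an initial value for w
eta = 0.1                            # learning rate
for _ in range(100):
    w = w - eta * grad(w)            # w <- w - eta * dL/dw
print(w, loss(w))                    # w converges towards 3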
37.
Gradient Descent
The gradient ∇L collects all the partial derivatives: ∇L = (∂L/∂w1, ∂L/∂w2, …, ∂L/∂b1, …). Each parameter gets its own update, e.g. w1 ← w1 − η ∂L/∂w1, w2 ← w2 − η ∂L/∂w2, b1 ← b1 − η ∂L/∂b1, …
38.
Gradient Descent
[The same updates are applied again and again: at every step, recompute the partial derivatives at the current parameter values (e.g. starting from θ = (0.2, −0.1, 0.3, …)) and subtract η times them.]
39.
𝑤1
𝑤2
GradientDescent
Color:Valueof
TotalLossL
Randomlypickastartingpoint
40.
𝑤1
𝑤2
GradientDescentHopfully,wewouldreach
aminima…..
Compute𝜕𝐿𝜕𝑤1,𝜕𝐿𝜕𝑤2
(−𝜂𝜕𝐿𝜕𝑤1,−𝜂𝜕𝐿𝜕𝑤2)
Color:Valueof
TotalLossL
41.
GradientDescent-Difficulty
•Gradientdescentneverguaranteeglobalminima
𝐿
𝑤1𝑤2
Differentinitialpoint
Reachdifferentminima,
sodifferentresults
Therearesometipsto
helpyouavoidlocal
minima,noguarantee.
42.
GradientDescent
𝑤1𝑤2
YouareplayingAgeofEmpires…
Compute𝜕𝐿𝜕𝑤1,𝜕𝐿𝜕𝑤2
(−𝜂𝜕𝐿𝜕𝑤1,−𝜂𝜕𝐿𝜕𝑤2)
Youcannotseethewholemap.
43.
GradientDescent
Thisisthe“learning”ofmachinesindeep
learning……
Evenalphagousingthisapproach.
Ihopeyouarenottoodisappointed:p
Peopleimage……Actually…..
44.
Backpropagation
• Backpropagation: an efficient way to compute ∂L/∂w
• Ref: http://speech.ee.ntu.edu.tw/~tlkagk/courses/MLDS_2015_2/Lecture/DNN%20backprop.ecm.mp4/index.html
Don't worry about ∂L/∂w; the toolkits will handle it. (Toolkit developed by NTU student 周伯威.)
45.
Step1:
defineaset
offunction
Step2:
goodnessof
function
Step3:pick
thebest
function
ConcludingRemarks
DeepLearningissosimple……
46.
OutlineofLectureI
IntroductionofDeepLearning
WhyDeep?
“HelloWorld”forDeepLearning
47.
Deeper is Better?
Layers × Size → Word Error Rate (%):
1 × 2k → 24.2; 2 × 2k → 20.4; 3 × 2k → 18.4; 4 × 2k → 17.8; 5 × 2k → 17.2; 7 × 2k → 17.1
Shallow networks with a comparable number of parameters: 1 × 3772 → 22.5; 1 × 4634 → 22.6; 1 × 16k → 22.1
Not surprising: more parameters, better performance.
Seide, Frank, Gang Li, and Dong Yu. "Conversational Speech Transcription Using Context-Dependent Deep Neural Networks." Interspeech 2011.
48.
Universality Theorem
Any continuous function f: R^N → R^M can be realized by a network with one hidden layer (given enough hidden neurons).
Reference for the reason: http://neuralnetworksanddeeplearning.com/chap4.html
Why a "deep" neural network and not a "fat" neural network?
49.
Fat+Shortv.s.Thin+Tall
1x2x……Nx
Deep
1x2x……Nx
……
Shallow
Whichoneisbetter?
Thesamenumber
ofparameters
50.
Fat+Shortv.s.Thin+Tall
Seide,Frank,GangLi,andDongYu."ConversationalSpeechTranscription
UsingContext-DependentDeepNeuralNetworks."Interspeech.2011.
LayerXSize
WordError
Rate(%)
LayerXSize
WordError
Rate(%)
1X2k24.2
2X2k20.4
3X2k18.4
4X2k17.8
5X2k17.21X377222.5
7X2k17.11X463422.6
1X16k22.1
Why?
51.
Analogy
•Logiccircuitsconsistsof
gates
•Atwolayersoflogicgates
canrepresentanyBoolean
function.
•Usingmultiplelayersof
logicgatestobuildsome
functionsaremuchsimpler
•Neuralnetworkconsistsof
neurons
•Ahiddenlayernetworkcan
representanycontinuous
function.
•Usingmultiplelayersof
neuronstorepresentsome
functionsaremuchsimpler
ThispageisforEEbackground.
lessgatesneeded
LogiccircuitsNeuralnetwork
less
parameters
less
data?
52.
長髮
男
Modularization
•Deep→Modularization
Girlswith
longhair
Boyswith
shorthair
Boyswith
longhair
Image
Classifier
1
Classifier
2
Classifier
3
長髮
女
長髮
女
長髮
女
長髮
女
Girlswith
shorthair
短髮
女
短髮
男
短髮
男
短髮
男
短髮
男
短髮
女
短髮
女
短髮
女
Classifier
4
Littleexamplesweak
53.
Modularization
•Deep→Modularization
Image
Longor
short?
BoyorGirl?
Classifiersforthe
attributes
長髮
男
長髮
女
長髮
女
長髮
女
長髮
女
短髮
女短髮
男
短髮
男
短髮
男
短髮
男
短髮
女
短髮
女
短髮
女
v.s.
長髮
男
長髮
女
長髮
女
長髮
女
長髮
女
短髮
女
短髮
男
短髮
男
短髮
男
短髮
男
短髮
女
短髮
女
短髮
女
v.s.
Eachbasicclassifiercanhave
sufficienttrainingexamples.
Basic
Classifier
54.
Modularization
•Deep→Modularization
Image
Longor
short?
BoyorGirl?
Sharingbythe
followingclassifiers
asmodule
canbetrainedbylittledata
Girlswith
longhair
Boyswith
shorthair
Boyswith
longhair
Classifier
1
Classifier
2
Classifier
3
Girlswith
shorthair
Classifier
4
LittledatafineBasic
Classifier
55.
Modularization
•Deep→Modularization
1x
2x
……
Nx
……
……
……
……
……
……
Themostbasic
classifiers
Use1stlayerasmodule
tobuildclassifiers
Use2ndlayeras
module……
Themodularizationis
automaticallylearnedfromdata.
→Lesstrainingdata?
56.
Modularization
•Deep→Modularization
1x
2x
……
Nx
……
……
……
……
……
……
Themostbasic
classifiers
Use1stlayerasmodule
tobuildclassifiers
Use2ndlayeras
module……
Reference:Zeiler,M.D.,&Fergus,R.
(2014).Visualizingandunderstanding
convolutionalnetworks.InComputer
Vision–ECCV2014(pp.818-833)
57.
OutlineofLectureI
IntroductionofDeepLearning
WhyDeep?
“HelloWorld”forDeepLearning
58.
Keras
TensorFlow or Theano: very flexible, but they need some effort to learn.
Keras: an interface of TensorFlow or Theano; easy to learn and use (and still has some flexibility). You can modify it if you can write TensorFlow or Theano.
If you want to learn Theano:
http://speech.ee.ntu.edu.tw/~tlkagk/courses/MLDS_2015_2/Lecture/Theano%20DNN.ecm.mp4/index.html
http://speech.ee.ntu.edu.tw/~tlkagk/courses/MLDS_2015_2/Lecture/RNN%20training%20(v6).ecm.mp4/index.html
59.
Keras
• François Chollet is the author of Keras. He currently works for Google as a deep learning engineer and researcher.
• Keras means horn in Greek.
• Documentation: http://keras.io/
• Examples: https://github.com/fchollet/keras/tree/master/examples
60.
Experience of using Keras
(Figure courtesy of classmate 沈昇勳)
61.
Example Application
• Handwriting Digit Recognition: a 28×28 image goes in, the machine outputs the digit, e.g. "1".
This is the "hello world" of deep learning.
MNIST data: http://yann.lecun.com/exdb/mnist/
Keras provides a dataset loading function: http://keras.io/datasets/
62.
Keras
[Network used in the example: input 28×28 = 784, two fully connected hidden layers with 500 neurons each, and a softmax output layer y1 … y10.]
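A sketch of this 784-500-500-10 network in Keras (the layer names follow the Keras 1.x style that was current at the time of the talk; newer versions rename a few arguments):

from keras.models import Sequential
from keras.layers import Dense, Activation

model = Sequential()
model.add(Dense(500, input_dim=28 * 28))   # hidden layer 1: 500 neurons
model.add(Activation('sigmoid'))
model.add(Dense(500))                      # hidden layer 2: 500 neurons
model.add(Activation('sigmoid'))
model.add(Dense(10))                       # output layer: 10 classes
model.add(Activation('softmax'))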
63.
Keras
64.
Keras
Step 3.1: Configuration. Choose the loss and the optimizer; the update rule is w ← w − η ∂L/∂w (e.g. learning rate 0.1; details in the next lecture).
Step 3.2: Find the optimal network parameters, given the training data (images) and labels (digits).
65.
Keras
Step 3.2: Find the optimal network parameters.
The training images form a numpy array of shape (number of training examples, 28 × 28 = 784); the labels form a numpy array of shape (number of training examples, 10).
https://www.tensorflow.org/versions/r0.8/tutorials/mnist/beginners/index.html
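Continuing the sketch above: loading MNIST with Keras's dataset helper, then Step 3.1 (configuration) and Step 3.2 (fitting). The reshaping and one-hot conversion below are assumptions about the data layout described on the slide, and the `epochs` argument is spelled `nb_epoch` in Keras 1.x:

from keras.datasets import mnist
from keras.utils.np_utils import to_categorical

(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train = x_train.reshape(-1, 784).astype('float32') / 255
x_test = x_test.reshape(-1, 784).astype('float32') / 255
y_train = to_categorical(y_train, 10)       # digit labels -> 10-dim one-hot vectors
y_test = to_categorical(y_test, 10)

# Step 3.1: configuration (loss, optimizer)
model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])

# Step 3.2: find the optimal network parameters
model.fit(x_train, y_train, batch_size=100, epochs=20)
score = model.evaluate(x_test, y_test)
print(score)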
66.
Keras
How to use the neural network (testing): case 1 and case 2 (shown as code on the slide).
Save and load models: http://keras.io/getting-started/faq/#how-can-i-save-a-keras-model
67.
Keras
• Using a GPU to speed up training
• Way 1: THEANO_FLAGS=device=gpu0 python YourCode.py
• Way 2 (in your code): import os; os.environ["THEANO_FLAGS"] = "device=gpu0"
68.
LiveDemo
69.
LectureII:
TipsforTrainingDNN
70.
Neural
Network
GoodResultson
TestingData?
GoodResultson
TrainingData?
Step3:pickthe
bestfunction
Step2:goodness
offunction
Step1:definea
setoffunction
YES
YES
NO
NO
Overfitting!
RecipeofDeepLearning
71.
DonotalwaysblameOverfitting
TestingData
Overfitting?
TrainingData
Notwelltrained
72.
Neural
Network
GoodResultson
TestingData?
GoodResultson
TrainingData?
YES
YES
RecipeofDeepLearning
Differentapproachesfor
differentproblems.
e.g.dropoutforgoodresults
ontestingdata
73.
GoodResultson
TestingData?
GoodResultson
TrainingData?
YES
YES
RecipeofDeepLearning
Choosingproperloss
Mini-batch
Newactivationfunction
AdaptiveLearningRate
Momentum
74.
Choosing Proper Loss
For the 256-input, 10-output softmax network with target "1" (one-hot: 1, 0, 0, …), the loss between the output y and the target ŷ can be:
Square Error: Σ_{i=1..10} (y_i − ŷ_i)²
Cross Entropy: −Σ_{i=1..10} ŷ_i ln y_i
Which one is better?
75.
Let’stryit
SquareError
CrossEntropy
76.
Let’stryit
Accuracy
SquareError0.11
CrossEntropy0.84
Training
Testing:
Cross
Entropy
Square
Error
77.
ChoosingProperLoss
Total
Loss
w1
w2
Cross
Entropy
Square
Error
Whenusingsoftmaxoutputlayer,
choosecrossentropy
http://jmlr.org/procee
dings/papers/v9/gloro
t10a/glorot10a.pdf
78.
GoodResultson
TestingData?
GoodResultson
TrainingData?
YES
YES
RecipeofDeepLearning
Choosingproperloss
Mini-batch
Newactivationfunction
AdaptiveLearningRate
Momentum
79.
Mini-batch
x1
NN
……
y1
𝑦1
𝑙1
x31NNy31
𝑦31
𝑙31
x2
NN
……
y2
𝑦2
𝑙2
x16NNy16
𝑦16
𝑙16
Pickthe1stbatch
Randomlyinitialize
networkparameters
Pickthe2ndbatch
Mini-batchMini-batch
𝐿′=𝑙1+𝑙31+⋯
𝐿′′=𝑙2+𝑙16+⋯
Updateparametersonce
Updateparametersonce
Untilallmini-batches
havebeenpicked
…
oneepoch
Repeattheaboveprocess
Wedonotreallyminimizetotalloss!
80.
Mini-batch
x1
NN
……
y1
𝑦1
𝑙1
x31NNy31
𝑦31
𝑙31
Mini-batch
Pickthe1stbatch
Pickthe2ndbatch
𝐿′=𝑙1
+𝑙31
+⋯
𝐿′′=𝑙2
+𝑙16
+⋯
Updateparametersonce
Updateparametersonce
Untilallmini-batches
havebeenpicked
…oneepoch
100examplesinamini-batch
Repeat20times
81.
Mini-batch
x1
NN
……
y1
𝑦1
𝑙1
x31NNy31
𝑦31
𝑙31
x2
NN
……
y2
𝑦2
𝑙2
x16NNy16
𝑦16
𝑙16
Pickthe1stbatch
Randomlyinitialize
networkparameters
Pickthe2ndbatch
Mini-batchMini-batch
𝐿′=𝑙1+𝑙31+⋯
𝐿′′=𝑙2+𝑙16+⋯
Updateparametersonce
Updateparametersonce
…
Lisdifferenteachtime
whenweupdate
parameters!
Wedonotreallyminimizetotalloss!
82.
Mini-batch
OriginalGradientDescentWithMini-batch
Unstable!!!
Thecolorsrepresentthetotalloss.
83.
Mini-batchisFaster
1epoch
Seeall
examples
Seeonlyone
batch
Updateafterseeingall
examples
Ifthereare20batches,update
20timesinoneepoch.
OriginalGradientDescentWithMini-batch
Notalwaystruewith
parallelcomputing.
Canhavethesamespeed
(notsuperlargedataset)
Mini-batchhasbetterperformance!
84.
Mini-batchisBetter!Accuracy
Mini-batch0.84
Nobatch0.12
Testing:
Epoch
Accuracy
Mini-batch
Nobatch
Training
85.
Shuffle the training examples for each epoch, so the mini-batches are formed differently in epoch 1 and epoch 2.
Don't worry: this is the default of Keras.
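In Keras the mini-batch size is simply an argument of fit; a sketch matching the 100-example mini-batches and 20 epochs mentioned a few slides back (in Keras 1.x the argument is spelled nb_epoch):

# 100 examples per mini-batch, repeated for 20 epochs; Keras shuffles
# the training examples before each epoch by default (shuffle=True).
model.fit(x_train, y_train, batch_size=100, epochs=20, shuffle=True)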
86.
GoodResultson
TestingData?
GoodResultson
TrainingData?
YES
YES
RecipeofDeepLearning
Choosingproperloss
Mini-batch
Newactivationfunction
AdaptiveLearningRate
Momentum
87.
HardtogetthepowerofDeep…
Deeperusuallydoesnotimplybetter.
ResultsonTrainingData
88.
Let’stryit
Accuracy
3layers0.84
9layers0.11
Testing:
9layers
3layers
Training
89.
VanishingGradientProblem
Largergradients
AlmostrandomAlreadyconverge
basedonrandom!?
LearnveryslowLearnveryfast
1x
2x
……
Nx
……
……
……
……
……
……
……
y1
y2
yM
Smallergradients
90.
VanishingGradientProblem
1x
2x
……
Nx
……
……
……
……
……
……
……
𝑦1
𝑦2
𝑦𝑀
……
𝑦1
𝑦2
𝑦𝑀
𝑙
Intuitivewaytocomputethederivatives…
𝜕𝑙
𝜕𝑤
=?
+∆𝑤
+∆𝑙
∆𝑙
∆𝑤
Smallergradients
Large
input
Small
output
91.
HardtogetthepowerofDeep…
In2006,peopleusedRBMpre-training.
In2015,peopleuseReLU.
92.
ReLU
• Rectified Linear Unit (ReLU): a = z for z > 0, a = 0 for z ≤ 0.
Reasons: 1. Fast to compute. 2. Biological reason. 3. Equivalent to an infinite number of sigmoids with different biases. 4. Addresses the vanishing gradient problem.
[Xavier Glorot, AISTATS'11] [Andrew L. Maas, ICML'13] [Kaiming He, arXiv'15]
93.
ReLU
1x
2x
1y
2y
0
0
0
0
𝑧
𝑎
𝑎=𝑧
𝑎=0
94.
ReLU
1x
2x
1y
2y
AThinnerlinearnetwork
Donothave
smallergradients
𝑧
𝑎
𝑎=𝑧
𝑎=0
95.
Let’stryit
96.
Let’stryit
•9layers
9layersAccuracy
Sigmoid0.11
ReLU0.96
Training
Testing:
ReLU
Sigmoid
97.
ReLU - variants
Leaky ReLU: a = z for z > 0, a = 0.01·z for z ≤ 0.
Parametric ReLU: a = z for z > 0, a = α·z for z ≤ 0, where α is also learned by gradient descent.
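These activation functions are one-liners; a NumPy sketch (learning the parametric version's α by gradient descent is not shown here):

import numpy as np

def relu(z):
    return np.maximum(0.0, z)             # a = z if z > 0 else 0

def leaky_relu(z, alpha=0.01):
    return np.where(z > 0, z, alpha * z)  # small slope instead of 0 for z <= 0

z = np.array([-2.0, -0.5, 0.0, 1.5])
print(relu(z))        # [0.    0.    0.    1.5]
print(leaky_relu(z))  # [-0.02  -0.005  0.    1.5]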
98.
Maxout
•Learnableactivationfunction[IanJ.Goodfellow,ICML’13]
Max
1x
2x
Input
Max
+5
+7
+−1
+1
7
1
Max
Max
+1
+2
+4
+3
2
4
ReLUisaspecialcasesofMaxout
Youcanhavemorethan2elementsinagroup.
neuron
99.
Maxout
•Learnableactivationfunction[IanJ.Goodfellow,ICML’13]
•Activationfunctioninmaxoutnetworkcanbe
anypiecewiselinearconvexfunction
•Howmanypiecesdependingonhowmany
elementsinagroup
ReLUisaspecialcasesofMaxout
2elementsinagroup3elementsinagroup
100.
GoodResultson
TestingData?
GoodResultson
TrainingData?
YES
YES
RecipeofDeepLearning
Choosingproperloss
Mini-batch
Newactivationfunction
AdaptiveLearningRate
Momentum
101.
𝑤1
𝑤2
LearningRates
Iflearningrateistoolarge
Totallossmaynotdecrease
aftereachupdate
Setthelearning
rateηcarefully
102.
𝑤1
𝑤2
LearningRates
Iflearningrateistoolarge
Setthelearning
rateηcarefully
Iflearningrateistoosmall
Trainingwouldbetooslow
Totallossmaynotdecrease
aftereachupdate
103.
LearningRates
•Popular&SimpleIdea:Reducethelearningrateby
somefactoreveryfewepochs.
•Atthebeginning,wearefarfromthedestination,sowe
uselargerlearningrate
•Afterseveralepochs,weareclosetothedestination,so
wereducethelearningrate
•E.g.1/tdecay:𝜂𝑡=𝜂𝑡+1
•Learningratecannotbeone-size-fits-all
•Givingdifferentparametersdifferentlearning
rates
104.
Adagrad
Original: w ← w − η ∂L/∂w
Adagrad: w ← w − η_w ∂L/∂w, with a parameter-dependent learning rate η_w = η / sqrt(Σ_{i=0..t} g_i²), where η is a constant and g_i is the ∂L/∂w obtained at the i-th update (the denominator is the summation of the squares of the previous derivatives).
105.
Adagrad
Example: for w1 the derivatives are g0 = 0.1, g1 = 0.2, …, giving learning rates η/√(0.1²) = η/0.1, then η/√(0.1² + 0.2²) ≈ η/0.22, …; for w2 the derivatives are g0 = 20, g1 = 10, …, giving η/20, then η/√(20² + 10²) ≈ η/22, …
Observation: 1. The learning rate becomes smaller and smaller for all parameters. 2. Smaller derivatives give a larger learning rate, and vice versa. Why?
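A per-parameter Adagrad update sketched in NumPy, following the formula above (the small constant added inside the square root to avoid division by zero is an implementation detail the slide omits):

import numpy as np

eta = 0.1
w = np.zeros(2)                 # two parameters, e.g. (w1, w2)
sum_g2 = np.zeros(2)            # running sum of squared derivatives

def grad(w):                    # gradient of some toy loss
    return 2.0 * (w - np.array([1.0, -1.0]))

for _ in range(100):
    g = grad(w)
    sum_g2 += g ** 2
    w -= eta / (np.sqrt(sum_g2) + 1e-8) * g   # parameter-dependent learning rate
print(w)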
106.
SmallerDerivatives
LargerLearningRate
2.Smallerderivatives,larger
learningrate,andviceversa
Why?
Smaller
LearningRate
Larger
derivatives
107.
Notthewholestory……
•Adagrad[JohnDuchi,JMLR’11]
•RMSprop
•https://www.youtube.com/watch?v=O3sxAc4hxZU
•Adadelta[MatthewD.Zeiler,arXiv’12]
•“Nomorepeskylearningrates”[TomSchaul,arXiv’12]
•AdaSecant[CaglarGulcehre,arXiv’14]
•Adam[DiederikP.Kingma,ICLR’15]
•Nadam
•http://cs229.stanford.edu/proj2015/054_report.pdf
108.
GoodResultson
TestingData?
GoodResultson
TrainingData?
YES
YES
RecipeofDeepLearning
Choosingproperloss
Mini-batch
Newactivationfunction
AdaptiveLearningRate
Momentum
109.
Hardtofind
optimalnetworkparameters
Total
Loss
Thevalueofanetworkparameterw
Veryslowatthe
plateau
Stuckatlocalminima
𝜕𝐿∕𝜕𝑤
=0
Stuckatsaddlepoint
𝜕𝐿∕𝜕𝑤
=0
𝜕𝐿∕𝜕𝑤
≈0
110.
Inphysicalworld……
•Momentum
Howaboutputthisphenomenon
ingradientdescent?
111.
Movement=
Negativeof𝜕𝐿∕𝜕𝑤+Momentum
Momentum
cost
𝜕𝐿∕𝜕𝑤=0
Stillnotguaranteereaching
globalminima,butgivesome
hope……
Negativeof𝜕𝐿∕𝜕𝑤
Momentum
RealMovement
112.
AdamRMSProp(AdvancedAdagrad)+Momentum
113.
Let’stryit
•ReLU,3layer
Accuracy
Original0.96
Adam0.97
Training
Testing:
Adam
Original
114.
GoodResultson
TestingData?
GoodResultson
TrainingData?
YES
YES
RecipeofDeepLearning
EarlyStopping
Regularization
Dropout
NetworkStructure
115.
WhyOverfitting?
•Trainingdataandtestingdatacanbedifferent.
TrainingData:TestingData:
Theparametersachievingthelearningtargetdonot
necessaryhavegoodresultsonthetestingdata.
Learningtargetisdefinedbythetrainingdata.
116.
PanaceaforOverfitting
•Havemoretrainingdata
•Createmoretrainingdata(?)
Original
TrainingData:
Created
TrainingData:
Shift15。
Handwritingrecognition:
117.
WhyOverfitting?
•Forexperiments,weaddedsomenoisestothe
testingdata
118.
WhyOverfitting?
•Forexperiments,weaddedsomenoisestothe
testingdata
Trainingisnotinfluenced.
Accuracy
Clean0.97
Noisy0.50
Testing:
119.
GoodResultson
TestingData?
GoodResultson
TrainingData?
YES
YES
RecipeofDeepLearning
EarlyStopping
WeightDecay
Dropout
NetworkStructure
120.
Early Stopping
As the epochs go on, the total loss on the training set keeps decreasing, but the loss on the testing set eventually starts to rise; stop at the point where the validation set loss is lowest.
Keras: http://keras.io/getting-started/faq/#how-can-i-interrupt-training-when-the-validation-loss-isnt-decreasing-anymore
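In Keras this is a callback; a sketch assuming part of the training data is held out as a validation set:

from keras.callbacks import EarlyStopping

early_stop = EarlyStopping(monitor='val_loss', patience=2)  # stop once validation loss stops improving
model.fit(x_train, y_train,
          validation_split=0.1,        # hold out 10% of training data as the validation set
          epochs=100, batch_size=100,
          callbacks=[early_stop])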
121.
GoodResultson
TestingData?
GoodResultson
TrainingData?
YES
YES
RecipeofDeepLearning
EarlyStopping
WeightDecay
Dropout
NetworkStructure
122.
WeightDecay
•Ourbrainprunesouttheuselesslinkbetween
neurons.
Doingthesamethingtomachine’sbrainimproves
theperformance.
123.
WeightDecay
Useless
Closetozero(萎縮了)
Weightdecayisone
kindofregularization
124.
Weight Decay
• Implementation
Original: w ← w − η ∂L/∂w
Weight decay: w ← (1 − λ)·w − η ∂L/∂w, e.g. λ = 0.01, so the weight is multiplied by 0.99 at every update and becomes smaller and smaller.
Keras: http://keras.io/regularizers/
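Weight decay corresponds to an L2 penalty on the weights; in Keras it is attached per layer. A sketch (the argument is called W_regularizer in Keras 1.x and kernel_regularizer in Keras 2, a version-dependent detail):

from keras.layers import Dense
from keras.regularizers import l2

# every update shrinks this layer's weights slightly towards zero (here lambda = 0.01)
model.add(Dense(500, activation='relu', kernel_regularizer=l2(0.01)))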
125.
GoodResultson
TestingData?
GoodResultson
TrainingData?
YES
YES
RecipeofDeepLearning
EarlyStopping
WeightDecay
Dropout
NetworkStructure
126.
Dropout
Training:
Eachtimebeforeupdatingtheparameters
Eachneuronhasp%todropout
127.
Dropout
Training:
Eachtimebeforeupdatingtheparameters
Eachneuronhasp%todropout
Usingthenewnetworkfortraining
Thestructureofthenetworkischanged.
Thinner!
Foreachmini-batch,weresamplethedropoutneurons
128.
Dropout
Testing:
Nodropout
Ifthedropoutrateattrainingisp%,
alltheweightstimes(1-p)%
Assumethatthedropoutrateis50%.
Ifaweightw=1bytraining,set𝑤=0.5fortesting.
129.
Dropout-IntuitiveReason
When teaming up, if everyone expects the partner to do the work, nothing gets done in the end.
However, if you know your partner will drop out, you will do better. ("My partner is going to slack off, so I have to work hard.")
When testing, no one actually drops out, so good results are obtained eventually.
130.
Dropout-IntuitiveReason
•Whytheweightsshouldmultiply(1-p)%(dropout
rate)whentesting?
TrainingofDropoutTestingofDropout
𝑤1
𝑤2
𝑤3
𝑤4
𝑧
𝑤1
𝑤2
𝑤3
𝑤4
𝑧′
Assumedropoutrateis50%
0.5×
0.5×
0.5×
0.5×
Nodropout
Weightsfromtraining
𝑧′≈2𝑧
𝑧′≈𝑧
Weightsmultiply(1-p)%
131.
Dropoutisakindofensemble.
Ensemble
Network
1
Network
2
Network
3
Network
4
Trainabunchofnetworkswithdifferentstructures
Training
Set
Set1Set2Set3Set4
132.
Dropoutisakindofensemble.
Ensemble
y1
Network
1
Network
2
Network
3
Network
4
Testingdatax
y2y3y4
average
133.
Dropoutisakindofensemble.
Trainingof
Dropout
minibatch
1
……
Usingonemini-batchtotrainonenetwork
Someparametersinthenetworkareshared
minibatch
2
minibatch
3
minibatch
4
Mneurons
2Mpossible
networks
134.
Dropoutisakindofensemble.
testingdatax
TestingofDropout
……
average
y1y2y3
Allthe
weights
multiply
(1-p)%
≈y
?????
135.
Moreaboutdropout
•Morereferencefordropout[NitishSrivastava,JMLR’14][PierreBaldi,
NIPS’13][GeoffreyE.Hinton,arXiv’12]
•DropoutworksbetterwithMaxout[IanJ.Goodfellow,ICML’13]
•Dropconnect[LiWan,ICML’13]
•Dropoutdeleteneurons
•Dropconnectdeletestheconnectionbetweenneurons
•Annealeddropout[S.J.Rennie,SLT’14]
•Dropoutratedecreasesbyepochs
•Standout[J.Ba,NISP’13]
•Eachneuralhasdifferentdropoutrate
136.
Let's try it
[Same 784-500-500-10 softmax network as before, with a dropout layer added after each of the two 500-neuron hidden layers:]
model.add(Dropout(0.8))
model.add(Dropout(0.8))
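The full network with dropout, sketched in Keras: the Dropout argument is the fraction of neurons dropped during training, and Keras handles the train/test rescaling for you (its implementation rescales at training time, which is equivalent to the (1 − p) multiplication described on the earlier slide):

from keras.models import Sequential
from keras.layers import Dense, Dropout

model = Sequential()
model.add(Dense(500, input_dim=784, activation='relu'))
model.add(Dropout(0.8))            # drop 80% of this layer's outputs during training
model.add(Dense(500, activation='relu'))
model.add(Dropout(0.8))
model.add(Dense(10, activation='softmax'))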
137.
Let’stryit
Training
Dropout
NoDropout
Epoch
Accuracy
Accuracy
Noisy0.50
+dropout0.63
Testing:
138.
GoodResultson
TestingData?
GoodResultson
TrainingData?
YES
YES
RecipeofDeepLearning
EarlyStopping
Regularization
Dropout
NetworkStructure
CNNisaverygoodexample!
(nextlecture)
139.
ConcludingRemarks
ofLectureII
140.
RecipeofDeepLearning
Neural
Network
GoodResultson
TestingData?
GoodResultson
TrainingData?
Step3:pickthe
bestfunction
Step2:goodness
offunction
Step1:definea
setoffunction
YES
YES
NO
NO
141.
Let’stryanothertask
142.
DocumentClassification
http://top-breaking-news.com/
Machine
政治
體育
經濟
“president”indocument
“stock”indocument
體育政治財經
143.
Data
144.
MSE
145.
ReLU
146.
AdaptiveLearningRate
Accuracy
MSE0.36
CE0.55
+ReLU0.75
+Adam0.77
147.
Dropout
Accuracy
Adam0.77
+dropout0.79
148.
LectureIII:
VariantsofNeural
Networks
149.
VariantsofNeuralNetworks
ConvolutionalNeural
Network(CNN)
RecurrentNeuralNetwork
(RNN)
Widelyusedin
imageprocessing
150.
WhyCNNforImage?
•Whenprocessingimage,thefirstlayeroffully
connectednetworkwouldbeverylarge
100
……
……
……
……
……
Softmax
100
100x100x31000
3x107
Canthefullyconnectednetworkbesimplifiedby
consideringthepropertiesofimagerecognition?
151.
WhyCNNforImage
•Somepatternsaremuchsmallerthanthewhole
image
Aneurondoesnothavetoseethewholeimage
todiscoverthepattern.
“beak”detector
Connectingtosmallregionwithlessparameters
152.
WhyCNNforImage
•Thesamepatternsappearindifferentregions.
“upper-left
beak”detector
“middlebeak”
detector
Theycanusethesame
setofparameters.
Doalmostthesamething
153.
WhyCNNforImage
•Subsamplingthepixelswillnotchangetheobject
subsampling
bird
bird
Wecansubsamplethepixelstomakeimagesmaller
Lessparametersforthenetworktoprocesstheimage
154.
Step1:
defineaset
offunction
Step2:
goodnessof
function
Step3:pick
thebest
function
ThreeStepsforDeepLearning
DeepLearningissosimple……
Convolutional
NeuralNetwork
155.
ThewholeCNN
FullyConnected
Feedforwardnetwork
catdog……
Convolution
MaxPooling
Convolution
MaxPooling
Flatten
Canrepeat
manytimes
156.
ThewholeCNN
Convolution
MaxPooling
Convolution
MaxPooling
Flatten
Canrepeat
manytimes
Somepatternsaremuch
smallerthanthewholeimage
Thesamepatternsappearin
differentregions.
Subsamplingthepixelswill
notchangetheobject
Property1
Property2
Property3
157.
ThewholeCNN
FullyConnected
Feedforwardnetwork
catdog……
Convolution
MaxPooling
Convolution
MaxPooling
Flatten
Canrepeat
manytimes
158.
CNN – Convolution
A 6×6 binary image:
1 0 0 0 0 1
0 1 0 0 1 0
0 0 1 1 0 0
1 0 0 0 1 0
0 1 0 0 1 0
0 0 1 0 1 0
Filter 1 (3×3):
 1 -1 -1
-1  1 -1
-1 -1  1
Filter 2 (3×3):
-1  1 -1
-1  1 -1
-1  1 -1
Those are the network parameters to be learned. Each filter detects a small pattern (3×3). (Property 1)
CNN–Convolution
CNN – Convolution
Slide the 3×3 filter over the 6×6 image with stride = 1; at each position take the inner product of the filter with the 3×3 patch. For Filter 1 at the top-left corner the result is 3, and one step to the right it is −1.
160.
CNN – Convolution
With stride = 2 the filter moves two pixels at a time (top-left gives 3, the next position gives −3). We set stride = 1 below.
161.
CNN – Convolution
With stride = 1, Filter 1 produces a 4×4 map:
 3 -1 -3 -1
-3  1  0 -3
-3 -3  0  1
 3 -2 -2 -1
The same pattern is detected wherever it appears in the image. (Property 2)
162.
CNN – Convolution
Do the same process for every filter; each filter gives one 4×4 image, and together they form the Feature Map. Filter 2 gives:
-1 -1 -1 -1
-1 -1 -2  1
-1 -1 -2  1
-1  0 -4  3
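The convolution on these slides can be reproduced directly in NumPy; a sketch using the same 6×6 image and Filter 1 with stride 1, with no deep learning library involved:

import numpy as np

image = np.array([[1,0,0,0,0,1],
                  [0,1,0,0,1,0],
                  [0,0,1,1,0,0],
                  [1,0,0,0,1,0],
                  [0,1,0,0,1,0],
                  [0,0,1,0,1,0]])

filter1 = np.array([[ 1,-1,-1],
                    [-1, 1,-1],
                    [-1,-1, 1]])

stride = 1
out = np.zeros((4, 4))
for i in range(0, 4, stride):
    for j in range(0, 4, stride):
        out[i, j] = np.sum(image[i:i+3, j:j+3] * filter1)  # inner product with the 3x3 patch
print(out)   # 4x4 feature map; the top-left value is 3, as on the slide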
163.
CNN–ZeroPadding
100001
010010
001100
100010
010010
001010
6x6image
1-1-1
-11-1
-1-11
Filter1
Youwillgetanother6x6
imagesinthisway
0
Zeropadding
00
0
0
0
0
000
164.
CNN–Colorfulimage
100001
010010
001100
100010
010010
001010
100001
010010
001100
100010
010010
001010
100001
010010
001100
100010
010010
001010
1-1-1
-11-1
-1-11
Filter1
-11-1
-11-1
-11-1
Filter2
1-1-1
-11-1
-1-11
1-1-1
-11-1
-1-11
-11-1
-11-1
-11-1
-11-1
-11-1
-11-1
Colorfulimage
165.
ThewholeCNN
FullyConnected
Feedforwardnetwork
catdog……
Convolution
MaxPooling
Convolution
MaxPooling
Flatten
Canrepeat
manytimes
166.
CNN – Max Pooling
Start from the two 4×4 feature maps produced by Filter 1 and Filter 2 in the convolution step.
167.
CNN – Max Pooling
Group each 4×4 map into 2×2 regions and keep the maximum of each region. Convolution followed by max pooling turns the 6×6 image into a new, smaller 2×2 image, where each filter is one channel.
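Max pooling over Filter 1's 4×4 feature map from the previous slides, again as a plain NumPy sketch (2×2 regions, keep the maximum of each):

import numpy as np

fmap = np.array([[ 3, -1, -3, -1],
                 [-3,  1,  0, -3],
                 [-3, -3,  0,  1],
                 [ 3, -2, -2, -1]])

pooled = np.zeros((2, 2))
for i in range(2):
    for j in range(2):
        pooled[i, j] = fmap[2*i:2*i+2, 2*j:2*j+2].max()  # max of each 2x2 block
print(pooled)   # [[3., 0.], [3., 1.]] -- a smaller 2x2 image for this filter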
168.
The whole CNN
Convolution and max pooling can be repeated many times. The result is a new image, smaller than the original, whose number of channels is the number of filters.
169.
The whole CNN
Convolution → Max Pooling → Convolution → Max Pooling → Flatten → Fully Connected Feedforward network → "cat", "dog", …
170.
Flatten
The final small image (one channel per filter) is flattened into a single vector (e.g. 3, 0, 1, 3, −1, 1, 0, 3) and fed into the fully connected feedforward network.
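The whole pipeline as a hedged Keras sketch; the filter counts and layer sizes here are illustrative, not the ones used in the lecture, and the layer was spelled Convolution2D in the 2016-era Keras API (Conv2D in Keras 2):

from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

model = Sequential()
# convolution + max pooling, repeated (here twice)
model.add(Conv2D(25, (3, 3), input_shape=(28, 28, 1), activation='relu'))
model.add(MaxPooling2D((2, 2)))
model.add(Conv2D(50, (3, 3), activation='relu'))
model.add(MaxPooling2D((2, 2)))
# flatten, then an ordinary fully connected feedforward network
model.add(Flatten())
model.add(Dense(100, activation='relu'))
model.add(Dense(10, activation='softmax'))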
171.
ThewholeCNN
Convolution
MaxPooling
Convolution
MaxPooling
Canrepeat
manytimes
172.
Max
1x
2x
Input
Max
+5
+7
+−1
+1
7
1
100001
010010
001100
100010
010010
001010
image
convolutionMax
pooling
-11-1
-11-1
-11-1
1-1-1
-11-1
-1-11
(Ignoringthenon-linearactivationfunctionaftertheconvolution.)
173.
100001
010010
001100
100010
010010
001010
6x6image
1-1-1
-11-1
-1-11
Filter1
1:
2:
3:
…
7:
8:
9:
…
13:
14:
15:…Onlyconnectto9
input,notfully
connected
4:
10:
16:
1
0
0
0
0
1
0
0
0
0
1
1
3
Lessparameters!
174.
100001
010010
001100
100010
010010
001010
1-1-1
-11-1
-1-11
Filter1
1:
2:
3:
…
7:
8:
9:
…
13:
14:
15:…
4:
10:
16:
1
0
0
0
0
1
0
0
0
0
1
1
3
-1
Sharedweights
6x6image
Lessparameters!
Evenlessparameters!
175.
Max
1x
2x
Input
Max
+5
+7
+−1
+1
7
1
100001
010010
001100
100010
010010
001010
image
convolutionMax
pooling
-11-1
-11-1
-11-1
1-1-1
-11-1
-1-11
(Ignoringthenon-linearactivationfunctionaftertheconvolution.)
176.
3-1-3-1
-310-3
-3-301
3-2-2-1
30
13
Max
1x
1x
Input
Max
+5
+7
+−1
+1
7
1
177.
Max
1x
2x
Input
Max
+5
+7
+−1
+1
7
1
100001
010010
001100
100010
010010
001010
image
convolution
Max
pooling
-11-1
-11-1
-11-1
1-1-1
-11-1
-1-11
Only9x2=18
parameters
Dim=6x6=36
Dim=4x4x2
=32
parameters=
36x32=1152
178.
ConvolutionalNeuralNetwork
Learning:Nothingspecial,justgradientdescent……
CNN
“monkey”
“cat”
“dog”
Convolution,Max
Pooling,fullyconnected
1
0
0
……
target
Step1:
defineaset
offunction
Step2:
goodnessof
function
Step3:pick
thebest
function
Convolutional
NeuralNetwork
179.
PlayingGo
Network(19x19
positions)
Nextmove
19x19vector
Black:1
white:-1
none:0
19x19vector
Fully-connectedfeedword
networkcanbeused
ButCNNperformsmuchbetter.
19x19matrix
(image)
180.
PlayingGo
Network
Network
recordofpreviousplays
Target:
“天元”=1
else=0
Target:
“五之5”=1
else=0
Training:
進藤光v.s.社清春
黑:5之五
白:天元
黑:五之5
181.
WhyCNNforplayingGo?
•Somepatternsaremuchsmallerthanthewhole
image
•Thesamepatternsappearindifferentregions.
AlphaGouses5x5forfirstlayer
182.
WhyCNNforplayingGo?
•Subsamplingthepixelswillnotchangetheobject
AlphaGodoesnotuseMaxPooling……
MaxPoolingHowtoexplainthis???
183.
VariantsofNeuralNetworks
ConvolutionalNeural
Network(CNN)
RecurrentNeuralNetwork
(RNN)NeuralNetworkwithMemory
184.
ExampleApplication
•SlotFilling
IwouldliketoarriveTaipeionNovember2nd.
ticketbookingsystem
Destination:
timeofarrival:
Taipei
November2nd
Slot
185.
ExampleApplication
1x2x
2y1y
Taipei
Input:aword
(Eachwordisrepresented
asavector)
Solvingslotfillingby
Feedforwardnetwork?
186.
1-of-Nencoding
Eachdimensioncorresponds
toawordinthelexicon
Thedimensionfortheword
is1,andothersare0
lexicon={apple,bag,cat,dog,elephant}
apple=[10000]
bag=[01000]
cat=[00100]
dog=[00010]
elephant=[00001]
Thevectorislexiconsize.
1-of-NEncoding
Howtorepresenteachwordasavector?
187.
Beyond1-of-Nencoding
w=“apple”
a-a-a
a-a-b
p-p-l
26X26X26
……
a-p-p
…
p-l-e
…
…………
1
1
1
0
0
WordhashingDimensionfor“Other”
w=“Sauron”
…
apple
bag
cat
dog
elephant
“other”
0
0
0
0
0
1
w=“Gandalf”
187
188.
ExampleApplication
1x2x
2y1y
Taipei
dest
timeof
departure
Input:aword
(Eachwordisrepresented
asavector)
Output:
Probabilitydistributionthat
theinputwordbelongingto
theslots
Solvingslotfillingby
Feedforwardnetwork?
189.
ExampleApplication
1x2x
2y1y
Taipei
arriveTaipeionNovember2nd
otherotherdesttimetime
leaveTaipeionNovember2nd
placeofdeparture
Neuralnetwork
needsmemory!
dest
timeof
departure
Problem?
190.
Step1:
defineaset
offunction
Step2:
goodnessof
function
Step3:pick
thebest
function
ThreeStepsforDeepLearning
DeepLearningissosimple……
Recurrent
NeuralNetwork
191.
Recurrent Neural Network (RNN)
The outputs of the hidden layer (a1, a2) are stored in the memory; the memory can be considered as another input alongside x1, x2 when computing y1, y2.
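A minimal NumPy sketch of one step of such a network: the hidden activations are stored and fed back in as an extra input at the next time step (toy dimensions, random weights):

import numpy as np

rng = np.random.RandomState(0)
Wx = rng.randn(4, 3) * 0.1   # input  -> hidden weights
Wa = rng.randn(4, 4) * 0.1   # memory -> hidden weights (the stored hidden values)
Wy = rng.randn(2, 4) * 0.1   # hidden -> output weights

def rnn_step(x, a_prev):
    a = np.tanh(Wx @ x + Wa @ a_prev)  # new hidden values, stored as the memory
    y = Wy @ a                         # output for this time step
    return y, a

a = np.zeros(4)                               # memory starts empty (zeros)
for x in [rng.randn(3) for _ in range(5)]:    # the same network is used at every step
    y, a = rnn_step(x, a)
    print(y)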
192.
RNN
storestore
x1
x2x3
y1y2
y3
a1
a1
a2
a2a3
Thesamenetworkisusedagainandagain.
arriveTaipeionNovember2nd
Probabilityof
“arrive”ineachslot
Probabilityof
“Taipei”ineachslot
Probabilityof
“on”ineachslot
193.
RNN
store
x1x2
y1y2
a1
a1
a2
……
……
……
store
x1x2
y1y2
a1
a1
a2
……
……
……
leaveTaipei
Probof“leave”
ineachslot
Probof“Taipei”
ineachslot
Probof“arrive”
ineachslot
Probof“Taipei”
ineachslot
arriveTaipei
Different
Thevaluesstoredinthememoryisdifferent.
194.
Ofcourseitcanbedeep…
…………
xt
xt+1xt+2
……
……yt
……
……
yt+1
……
yt+2
……
……
195.
BidirectionalRNN
yt+1
…………
…………
yt+2yt
xtxt+1xt+2
xt
xt+1xt+2
196.
Memory
Cell
LongShort-termMemory(LSTM)
InputGate
OutputGate
Signalcontrol
theinputgate
Signalcontrol
theoutputgate
Forget
Gate
Signalcontrol
theforgetgate
Otherpartofthenetwork
Otherpartofthenetwork
(Otherpartof
thenetwork)
(Otherpartof
thenetwork)
(Otherpartof
thenetwork)
LSTM
SpecialNeuron:
4inputs,
1output
197.
𝑧
𝑧𝑖
𝑧𝑓
𝑧𝑜
𝑔𝑧
𝑓𝑧𝑖
multiply
multiply
Activationfunctionfis
usuallyasigmoidfunction
Between0and1
Mimicopenandclosegate
c
𝑐′=𝑔𝑧𝑓𝑧𝑖+𝑐𝑓𝑧𝑓
ℎ𝑐′𝑓𝑧𝑜
𝑎=ℎ𝑐′
𝑓𝑧𝑜
𝑔𝑧𝑓𝑧𝑖
𝑐′
𝑓𝑧𝑓
𝑐𝑓𝑧𝑓
𝑐
198.
7
3
10
-10
10
3
≈13
≈1
10
10
≈0
0
199.
7
-3
10
10
-10
≈1
≈0
10
≈1
-3
-3
-3
-3
-3
200.
LSTM
ct-1
……
vector
xt
zzizfzo4vectors
201.
LSTM
xt
zzi
×
zfzo
×+×
yt
ct-1
z
zi
zf
zo
202.
LSTM
xt
zzi
×
zfzo
×+×
yt
xt+1
zzi
×
zfzo
×+×
yt+1
ht
Extension:“peephole”
ht-1ctct-1
ct-1ct
ct+1
203.
Multiple-layer LSTM
This is quite standard now.
Don't worry if you cannot understand this; Keras can handle it. Keras supports "LSTM", "GRU", and "SimpleRNN" layers.
https://img.komicolle.org/2015-09-20/src/14426967627131.gif
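A sketch of a stacked (multi-layer) recurrent model in Keras for a sequence-labelling task like the slot filling example; the vocabulary size, embedding size and number of slots below are placeholders, and LSTM can be swapped for GRU or SimpleRNN as the slide notes:

from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dense, TimeDistributed

model = Sequential()
model.add(Embedding(input_dim=10000, output_dim=128))       # word index -> word vector
model.add(LSTM(128, return_sequences=True))                  # first recurrent layer
model.add(LSTM(128, return_sequences=True))                  # stacked second recurrent layer
model.add(TimeDistributed(Dense(5, activation='softmax')))   # slot probabilities for every word
model.compile(loss='categorical_crossentropy', optimizer='adam')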
204.
Step1:
defineaset
offunction
Step2:
goodnessof
function
Step3:pick
thebest
function
ThreeStepsforDeepLearning
DeepLearningissosimple……
205.
copycopy
x1
x2x3
y1y2
y3
Wi
a1
a1
a2
a2a3
arriveTaipeionNovember2nd
Training
Sentences:
LearningTarget
otherotherdest
100100100
otherdestother
………………
timetime
206.
Step1:
defineaset
offunction
Step2:
goodnessof
function
Step3:pick
thebest
function
ThreeStepsforDeepLearning
DeepLearningissosimple……
207.
Learning
RNNLearningisverydifficultinpractice.
Backpropagation
throughtime(BPTT)
𝑤←𝑤−𝜂𝜕𝐿∕𝜕𝑤1x2x
2y1y
1a2a
copy
𝑤
208.
Unfortunately……
•RNN-basednetworkisnotalwayseasytolearn
感謝曾柏翔同學
提供實驗結果
RealexperimentsonLanguagemodeling
Lucky
sometimes
TotalLoss
Epoch
209.
Theerrorsurfaceisrough.
w1
w2
Cost
Theerrorsurfaceiseither
veryflatorverysteep.
Clipping
[RazvanPascanu,ICML’13]
TotalLoss
210.
Why?
1
1
y1
0
1
w
y2
0
1
w
y3
0
1
w
y1000
……
𝑤=1
𝑤=1.01
𝑦1000
=1
𝑦1000≈20000
𝑤=0.99
𝑤=0.01
𝑦1000≈0
𝑦1000≈0
1111
Large
𝜕𝐿𝜕𝑤
Small
Learningrate?
small
𝜕𝐿𝜕𝑤
Large
Learningrate?
ToyExample
=w999
211.
add
•LongShort-termMemory(LSTM)
•Candealwithgradientvanishing(notgradient
explode)
HelpfulTechniques
Memoryandinputare
added
Theinfluenceneverdisappears
unlessforgetgateisclosed
NoGradientvanishing
(Ifforgetgateisopened.)
[Cho,EMNLP’14]
GatedRecurrentUnit(GRU):
simplerthanLSTM
212.
HelpfulTechniques
VanillaRNNInitializedwithIdentitymatrix+ReLUactivation
function[QuocV.Le,arXiv’15]
OutperformorbecomparablewithLSTMin4differenttasks
[JanKoutnik,JMLR’14]
ClockwiseRNN
[TomasMikolov,ICLR’15]
StructurallyConstrained
RecurrentNetwork(SCRN)
213.
MoreApplications……
storestore
x1
x2x3
y1y2
y3
a1
a1
a2
a2a3
arriveTaipeionNovember2nd
Probabilityof
“arrive”ineachslot
Probabilityof
“Taipei”ineachslot
Probabilityof
“on”ineachslot
Inputandoutputarebothsequences
withthesamelength
RNNcandomorethanthat!
214.
Manytoone
•Inputisavectorsequence,butoutputisonlyonevector
SentimentAnalysis
……
我覺太得糟了
超好雷
好雷
普雷
負雷
超負雷
看了這部電影覺
得很高興…….
這部電影太糟了
…….
這部電影很
棒…….
Positive(正雷)Negative(負雷)Positive(正雷)
……
KerasExample:
https://github.com/fchollet/keras/blob
/master/examples/imdb_lstm.py
215.
ManytoMany(Outputisshorter)
•Bothinputandoutputarebothsequences,buttheoutput
isshorter.
•E.g.SpeechRecognition
好好好
Trimming
棒棒棒棒棒
“好棒”
Whycan’titbe
“好棒棒”
Input:
Output:(charactersequence)
(vector
sequence)
Problem?
216.
ManytoMany(Outputisshorter)
•Bothinputandoutputarebothsequences,buttheoutput
isshorter.
•ConnectionistTemporalClassification(CTC)[AlexGraves,
ICML’06][AlexGraves,ICML’14][HaşimSak,Interspeech’15][JieLi,
Interspeech’15][AndrewSenior,ASRU’15]
好φφ棒φφφφ好φφ棒φ棒φφ
“好棒”“好棒棒”Addanextrasymbol“φ”
representing“null”
217.
ManytoMany(NoLimitation)
•Bothinputandoutputarebothsequenceswithdifferent
lengths.→Sequencetosequencelearning
•E.g.MachineTranslation(machinelearning→機器學習)
Containingall
informationabout
inputsequence
learning
machine
218.
learning
ManytoMany(NoLimitation)
•Bothinputandoutputarebothsequenceswithdifferent
lengths.→Sequencetosequencelearning
•E.g.MachineTranslation(machinelearning→機器學習)
machine
機習器學
……
……
Don’tknowwhentostop
慣性
219.
ManytoMany(NoLimitation)
推tlkagk:=========斷==========
Ref:http://zh.pttpedia.wikia.com/wiki/%E6%8E%A5%E9%BE%8D%
E6%8E%A8%E6%96%87(鄉民百科)
220.
learning
ManytoMany(NoLimitation)
•Bothinputandoutputarebothsequenceswithdifferent
lengths.→Sequencetosequencelearning
•E.g.MachineTranslation(machinelearning→機器學習)
machine
機習器學
Addasymbol“===“(斷)
[IlyaSutskever,NIPS’14][DzmitryBahdanau,arXiv’15]
===
221.
OnetoMany
•Inputanimage,butoutputasequenceofwords
Input
image
awomanis
……
===
CNN
Avector
forwhole
image
[KelvinXu,arXiv’15][LiYao,ICCV’15]
CaptionGeneration
222.
Application:
VideoCaptionGeneration
Video
Agirlisrunning.
Agroupofpeopleis
walkingintheforest.
Agroupofpeopleis
knockedbyatree.
223.
VideoCaptionGeneration
•Canmachinedescribewhatitseefromvideo?
•Demo:曾柏翔、吳柏瑜、盧宏宗
224.
ConcludingRemarks
ConvolutionalNeural
Network(CNN)
RecurrentNeuralNetwork
(RNN)
225.
LectureIV:
NextWave
226.
Outline
SupervisedLearning
•UltraDeepNetwork
•AttentionModel
ReinforcementLearning
UnsupervisedLearning
•Image:RealizingwhattheWorldLooksLike
•Text:UnderstandingtheMeaningofWords
•Audio:Learninghumanlanguagewithoutsupervision
Newnetworkstructure
227.
Skyscraper
https://zh.wikipedia.org/wiki/%E9%9B%99%E5%B3%B0%E5%A1%94#/me
dia/File:BurjDubaiHeight.svg
228.
UltraDeepNetwork
8layers
19layers
22layers
AlexNet(2012)VGG(2014)GoogleNet(2014)
16.4%
7.3%
6.7%
http://cs231n.stanford.e
du/slides/winter1516_le
cture8.pdf
229.
UltraDeepNetwork
AlexNet
(2012)
VGG
(2014)
GoogleNet
(2014)
152layers
3.57%
ResidualNet
(2015)
Taipei
101
101layers
16.4%
7.3%6.7%
230.
UltraDeepNetwork
AlexNet
(2012)
VGG
(2014)
GoogleNet
(2014)
152layers
3.57%
ResidualNet
(2015)
16.4%
7.3%6.7%
Thisultradeepnetwork
havespecialstructure.
Worryaboutoverfitting?
Worryabouttraining
first!
231.
UltraDeepNetwork
•Ultradeepnetworkisthe
ensembleofmanynetworks
withdifferentdepth.
6layers
4layers
2layers
Ensemble
232.
UltraDeepNetwork
•FractalNet
ResnetinResnet
GoodInitialization?
233.
UltraDeepNetwork
••
+
copy
copy
Gate
controller
234.
Inputlayer
outputlayer
Inputlayer
outputlayer
Inputlayer
outputlayer
HighwayNetworkautomatically
determinesthelayersneeded!
235.
Outline
SupervisedLearning
•UltraDeepNetwork
•AttentionModel
ReinforcementLearning
UnsupervisedLearning
•Image:RealizingwhattheWorldLooksLike
•Text:UnderstandingtheMeaningofWords
•Audio:Learninghumanlanguagewithoutsupervision
Newnetworkstructure
236.
Organize
Attention-basedModel
http://henrylo1605.blogspot.tw/2015/05/blog-post_56.html
LunchtodayWhatyoulearned
intheselectures
summer
vacation10
yearsago
Whatisdeep
learning?
Answer
237.
Attention-basedModel
ReadingHead
Controller
Input
ReadingHead
output
…………
Machine’sMemory
DNN/RNN
Ref:
http://speech.ee.ntu.edu.tw/~tlkagk/courses/MLDS_2015_2/Lecture/Attain%20(v3).e
cm.mp4/index.html
238.
Attention-basedModelv2
ReadingHead
Controller
Input
ReadingHead
output
…………
Machine’sMemory
DNN/RNN
NeuralTuringMachine
WritingHead
Controller
WritingHead
239.
ReadingComprehension
Query
Eachsentencebecomesavector.
……
DNN/RNN
ReadingHead
Controller
……
answer
Semantic
Analysis
240.
ReadingComprehension
•End-To-EndMemoryNetworks.S.Sukhbaatar,A.Szlam,J.
Weston,R.Fergus.NIPS,2015.
Thepositionofreadinghead:
Kerashasexample:
https://github.com/fchollet/keras/blob/master/examples/ba
bi_memnn.py
241.
VisualQuestionAnswering
source:http://visualqa.org/
242.
VisualQuestionAnswering
QueryDNN/RNN
ReadingHead
Controller
answer
CNNAvectorfor
eachregion
243.
VisualQuestionAnswering
•HuijuanXu,KateSaenko.Ask,AttendandAnswer:Exploring
Question-GuidedSpatialAttentionforVisualQuestion
Answering.arXivPre-Print,2015
244.
SpeechQuestionAnswering
•TOEFLListeningComprehensionTestbyMachine
•Example:
Question:“WhatisapossibleoriginofVenus’clouds?”
AudioStory:
Choices:
(A)gasesreleasedasaresultofvolcanicactivity
(B)chemicalreactionscausedbyhighsurfacetemperatures
(C)burstsofradioenergyfromtheplane'ssurface
(D)strongwindsthatblowdustintotheatmosphere
(Theoriginalstoryis5minlong.)
245.
SimpleBaselines
Accuracy(%)
(1)(2)(3)(4)(5)(6)(7)
NaiveApproaches
random
(4)thechoicewithsemantic
mostsimilartoothers
(2)selecttheshortest
choiceasanswer
Experimentalsetup:
717fortraining,
124forvalidation,122fortesting
246.
ModelArchitecture
“whatisapossible
originofVenus‘clouds?"
Question:
Question
Semantics
……Itbequitepossiblethatthisbe
duetovolcaniceruptionbecause
volcaniceruptionoftenemitgas.If
thatbethecasevolcanismcouldvery
wellbetherootcauseofVenus'sthick
cloudcover.Andalsowehaveobserve
burstofradioenergyfromtheplanet
'ssurface.Theseburstbesimilarto
whatweseewhenvolcanoerupton
earth……
AudioStory:
Speech
Recognition
Semantic
Analysis
Semantic
Analysis
Attention
Answer
Selectthechoicemost
similartotheanswer
Attention
Everythingislearned
fromtrainingexamples
247.
ModelArchitecture
Word-basedAttention
248.
ModelArchitecture
Sentence-basedAttention
249.
(A)
(A)(A)(A)(A)
(B)(B)(B)
250.
SupervisedLearning
Accuracy(%)
(1)(2)(3)(4)(5)(6)(7)
MemoryNetwork:39.2%
NaiveApproaches
(proposedbyFBAIgroup)
251.
SupervisedLearning
Accuracy(%)
(1)(2)(3)(4)(5)(6)(7)
MemoryNetwork:39.2%
NaiveApproaches
Word-basedAttention:48.8%
(proposedbyFBAIgroup)
[Fang&Hsu&Lee,SLT16]
[Tseng&Lee,Interspeech16]
252.
Outline
SupervisedLearning
•UltraDeepNetwork
•AttentionModel
ReinforcementLearning
UnsupervisedLearning
•Image:RealizingwhattheWorldLooksLike
•Text:UnderstandingtheMeaningofWords
•Audio:Learninghumanlanguagewithoutsupervision
Newnetworkstructure
253.
ScenarioofReinforcement
Learning
Agent
Environment
ObservationAction
RewardDon’tdo
that
254.
ScenarioofReinforcement
Learning
Agent
Environment
ObservationAction
RewardThankyou.
Agentlearnstotakeactionsto
maximizeexpectedreward.
http://www.sznews.com/news/conte
nt/2013-11/26/content_8800180.htm
255.
Supervisedv.s.Reinforcement
•Supervised
•Reinforcement
Hello
Agent
……
Agent
…….…….……
Bad
“Hello”Say“Hi”
“Byebye”Say“Goodbye”
Learningfrom
teacher
Learningfrom
critics
256.
ScenarioofReinforcement
Learning
Environment
ObservationAction
RewardNextMove
Ifwin,reward=1
Ifloss,reward=-1
Otherwise,reward=0
Agentlearnstotakeactionsto
maximizeexpectedreward.
257.
Supervisedv.s.Reinforcement
•Supervised:
•ReinforcementLearning
Nextmove:
“5-5”
Nextmove:
“3-3”
Firstmove……manymoves……Win!
AlphaGoissupervisedlearning+reinforcementlearning.
258.
DifficultiesofReinforcement
Learning
•Itmaybebettertosacrificeimmediaterewardto
gainmorelong-termreward
•E.g.PlayingGo
•Agent’sactionsaffectthesubsequentdatait
receives
•E.g.Exploration
259.
DeepReinforcementLearning
Environment
ObservationAction
Reward
Function
Input
Function
Output
Usedtopickthe
bestfunction
…
……
DNN
260.
Application:InteractiveRetrieval
•Interactiveretrievalishelpful.
user
“DeepLearning”
“DeepLearning”relatedtoMachineLearning?
“DeepLearning”relatedtoEducation?
[Wu&Lee,INTERSPEECH16]
261.
DeepReinforcementLearning
•Differentnetworkdepth
Betterretrieval
performance,
Lessuserlabor
Thetaskcannotbeaddressed
bylinearmodel.
Somedepthisneeded.
MoreInteraction
262.
Moreapplications
•AlphaGo,PlayingVideoGames,Dialogue
•FlyingHelicopter
•https://www.youtube.com/watch?v=0JL04JJjocc
•Driving
•https://www.youtube.com/watch?v=0xo1Ldx3L
5Q
•GoogleCutsItsGiantElectricityBillWith
DeepMind-PoweredAI
•http://www.bloomberg.com/news/articles/2016-07-
19/google-cuts-its-giant-electricity-bill-with-deepmind-
powered-ai
263.
Tolearndeepreinforcement
learning……
•LecturesofDavidSilver
•http://www0.cs.ucl.ac.uk/staff/D.Silver/web/Te
aching.html
•10lectures(1:30each)
•DeepReinforcementLearning
•http://videolectures.net/rldm2015_silver_reinfo
rcement_learning/
264.
Outline
SupervisedLearning
•UltraDeepNetwork
•AttentionModel
ReinforcementLearning
UnsupervisedLearning
•Image:RealizingwhattheWorldLooksLike
•Text:UnderstandingtheMeaningofWords
•Audio:Learninghumanlanguagewithoutsupervision
Newnetworkstructure
265.
Doesmachineknowwhatthe
worldlooklike?
Drawsomething!
Ref:https://openai.com/blog/generative-models/
266.
DeepDream
•Givenaphoto,machineaddswhatitsees……
http://deepdreamgenerator.com/
267.
DeepDream
•Givenaphoto,machineaddswhatitsees……
http://deepdreamgenerator.com/
268.
DeepStyle
•Givenaphoto,makeitsstylelikefamouspaintings
https://dreamscopeapp.com/
269.
DeepStyle
•Givenaphoto,makeitsstylelikefamouspaintings
https://dreamscopeapp.com/
270.
DeepStyle
CNNCNN
contentstyle
CNN
?
271.
GeneratingImagesbyRNN
colorof
1stpixel
colorof
2ndpixel
colorof
2ndpixel
colorof
3rdpixel
colorof
3rdpixel
colorof
4thpixel
272.
GeneratingImagesbyRNN
•PixelRecurrentNeuralNetworks
•https://arxiv.org/abs/1601.06759
Real
World
273.
GeneratingImages
•Trainingadecodertogenerateimagesis
unsupervised
NeuralNetwork
?Trainingdataisalotofimagescode
274.
Auto-encoder
NN
Encoder
NN
Decoder
code
code
LearntogetherInputLayer
bottle
OutputLayer
Layer
Layer
……
Code
Ascloseaspossible
Layer
Layer
EncoderDecoder
Notstate-of-
the-art
approach
275.
GeneratingImages
•Trainingadecodertogenerateimagesis
unsupervised
•VariationAuto-encoder(VAE)
•Ref:Auto-EncodingVariationalBayes,
https://arxiv.org/abs/1312.6114
•GenerativeAdversarialNetwork(GAN)
•Ref:GenerativeAdversarialNetworks,
http://arxiv.org/abs/1406.2661
NN
Decoder
code
276.
Whichoneismachine-generated?
Ref:https://openai.com/blog/generative-models/
277.
畫漫畫!!!https://github.com/mattya/chainer-DCGAN
278.
Outline
SupervisedLearning
•UltraDeepNetwork
•AttentionModel
ReinforcementLearning
UnsupervisedLearning
•Image:RealizingwhattheWorldLooksLike
•Text:UnderstandingtheMeaningofWords
•Audio:Learninghumanlanguagewithoutsupervision
Newnetworkstructure
279.
http://top-breaking-news.com/
MachineReading
•Machinelearnthemeaningofwordsfromreading
alotofdocumentswithoutsupervision
280.
MachineReading
•Machinelearnthemeaningofwordsfromreading
alotofdocumentswithoutsupervision
dog
cat
rabbit
jump
run
flower
tree
WordVector/Embedding
281.
MachineReading
•GeneratingWordVector/Embeddingis
unsupervised
NeuralNetwork
Apple
https://garavato.files.wordpress.com/2011/11/stacksdocuments.jpg?w=490
Trainingdataisalotoftext
?
282.
MachineReading
•Machinelearnthemeaningofwordsfromreading
alotofdocumentswithoutsupervision
•Awordcanbeunderstoodbyitscontext
蔡英文520宣誓就職
馬英九520宣誓就職
蔡英文、馬英九are
somethingverysimilar
Youshallknowaword
bythecompanyitkeeps
283.
WordVector
Source:http://www.slideshare.net/hustwj/cikm-keynotenov2014
283
284.
Word Vector
• Characteristics: V(hotter) − V(hot) ≈ V(bigger) − V(big); V(Rome) − V(Italy) ≈ V(Berlin) − V(Germany); V(king) − V(queen) ≈ V(uncle) − V(aunt)
• Solving analogies: Rome : Italy = Berlin : ?  Since V(Germany) ≈ V(Berlin) − V(Rome) + V(Italy), compute V(Berlin) − V(Rome) + V(Italy) and find the word w with the closest V(w).
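The analogy computation sketched in NumPy; the word vectors below are made-up toy values purely for illustration (real ones come from training on a large corpus), not the vectors used in the demo:

import numpy as np

# toy vectors purely for illustration
V = {
    'Rome':    np.array([1.0, 0.0, 1.0]),
    'Italy':   np.array([1.0, 1.0, 1.0]),
    'Berlin':  np.array([0.0, 0.0, 1.0]),
    'Germany': np.array([0.0, 1.0, 1.0]),
}

def closest_word(vec, V, exclude=()):
    # return the word whose vector has the highest cosine similarity with vec
    best, best_sim = None, -np.inf
    for word, v in V.items():
        if word in exclude:
            continue
        sim = np.dot(vec, v) / (np.linalg.norm(vec) * np.linalg.norm(v))
        if sim > best_sim:
            best, best_sim = word, sim
    return best

# Rome : Italy = Berlin : ?
target = V['Berlin'] - V['Rome'] + V['Italy']
print(closest_word(target, V, exclude={'Berlin', 'Rome', 'Italy'}))  # 'Germany'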
285.
MachineReading
•Machinelearnthemeaningofwordsfromreading
alotofdocumentswithoutsupervision
286.
Demo
•Modelusedindemoisprovidedby陳仰德
•Partoftheprojectdoneby陳仰德、林資偉
•TA:劉元銘
•TrainingdataisfromPTT(collectedby葉青峰)
286
287.
Outline
SupervisedLearning
•UltraDeepNetwork
•AttentionModel
ReinforcementLearning
UnsupervisedLearning
•Image:RealizingwhattheWorldLooksLike
•Text:UnderstandingtheMeaningofWords
•Audio:Learninghumanlanguagewithoutsupervision
Newnetworkstructure
288.
LearningfromAudioBook
Machinelistenstolotsof
audiobook
[Chung,Interspeech16)
Machinedoesnothave
anypriorknowledge
Likeaninfant
289.
AudioWordtoVector
•Audiosegmentcorrespondingtoanunknownword
Fixed-lengthvector
290.
AudioWordtoVector
•Theaudiosegmentscorrespondingtowordswith
similarpronunciationsareclosetoeachother.
everever
never
never
never
dog
dog
dogs
291.
Sequence-to-sequence
Auto-encoder
audiosegment
acousticfeatures
Thevaluesinthememory
representthewholeaudio
segment
x1x2x3x4
RNNEncoder
audiosegment
vector
Thevectorwewant
HowtotrainRNNEncoder?
292.
Sequence-to-sequence
Auto-encoder
RNNDecoder
x1x2x3x4
y1y2y3y4
x1x2x3
x4
RNNEncoder
audiosegment
acousticfeatures
TheRNNencoderand
decoderarejointlytrained.
Inputacousticfeatures
293.
AudioWordtoVector
-Results
•Visualizingembeddingvectorsofthewords
fear
nearname
fame
294.
WaveNet(DeepMind)
https://deepmind.com/blog/wavenet-generative-model-raw-audio/
295.
ConcludingRemarks
296.
ConcludingRemarks
LectureIV:NextWave
LectureIII:VariantsofNeuralNetwork
LectureII:TipsforTrainingDeepNeuralNetwork
LectureI:IntroductionofDeepLearning
297.
Will AI soon replace most jobs?
• A new job in the AI age: AI trainer (machine learning expert, data scientist)
http://www.express.co.uk/news/science/651202/First-step-towards-The-Terminator-becoming-reality-AI-beats-champ-of-world-s-oldest-game
298.
AI Trainer
Don't machines learn by themselves? Why do we need AI trainers?
The Pokémon do the fighting, so why do we need Pokémon trainers?
299.
AI Trainer
Pokémon trainer:
• A Pokémon trainer has to pick suitable Pokémon for the battle; Pokémon have different attributes.
• The summoned Pokémon cannot always be controlled (e.g. Ash's Charizard).
• Enough experience is needed.
AI trainer:
• In Step 1, the AI trainer has to pick a suitable model; different models suit different problems.
• Step 3 does not always find the best function (e.g. in deep learning).
• Enough experience is needed.
300.
AI Trainer
• Behind a powerful AI, the AI trainer deserves much of the credit.
• Let's set out together on the road to becoming AI trainers.
http://www.gvm.com.tw/webonly_content_10787.html