[DSC 2016] 系列活動:李宏毅 / 一天搞懂深度學習 - SlideShare

Article recommendation score: 80% (10 votes)

Published by 台灣資料科學年會 (Taiwan Data Science Conference) on May 21, 2016. 301 slides; 2,765 likes; 493,622 views.

Deep Learning is a branch of Machine Learning that has attracted a great deal of attention in recent years. It has its roots in Artificial Neural Network models, but today's deep learning techniques are quite different from their predecessors. The best speech recognition and image recognition systems today are all built with deep learning. You have probably heard about impressive deep-learning applications on many occasions (for example AlphaGo, which has recently become a household name), felt eager to use this powerful technology yourself, but did not know where to start. If so, this course is what you need.

In this course, Professor Hung-yi Lee (李宏毅) of National Taiwan University's Department of Electrical Engineering introduces deep learning in a single-day program.

Course outline: What is deep learning? — Deep learning techniques look bewilderingly varied on the surface, but they boil down to three steps: define the neural network architecture, set the learning target, and start learning. This session also introduces Keras, a deep learning tool that can help you finish a deep learning program within ten minutes.

In addition, some people hype deep learning as amazing while others dismiss it as a gimmick. How does deep learning actually differ from other machine learning methods? This session analyzes the potential advantages of deep learning compared with other machine learning approaches.

Tips for deep learning — Deep learning toolkits are everywhere now, and writing a deep learning program takes little effort, but getting good results is far from simple: the many small tricks used during training are the key to success.

This course shares practical implementation techniques and hands-on experience for deep learning.

Deep learning models with memory — Machines need memory to do more. This session explains Recurrent Neural Networks and shows how deep learning models can be given memory.

Applications and outlook — What can deep learning be used for? How is deep learning used for speech recognition? For question answering? What problems are deep learning researchers focusing on next? The course aims to help you not only understand deep learning but also get up to speed with it efficiently and apply it to the problems at hand.

Whether you are a newcomer who has never tried deep learning or already have some experience and want to go deeper, you will get something out of this course.

Slide transcript (301 slides): [DSC 2016] 系列活動:李宏毅/一天搞懂深度學習

1. Deep Learning Tutorial — 李宏毅 Hung-yi Lee
2. Deep learning attracts lots of attention. I believe you have seen lots of exciting results before. This talk focuses on the basic techniques. (Figure: deep learning trends at Google; source: SIGMOD / Jeff Dean)
3. Outline — Lecture I: Introduction of Deep Learning; Lecture II: Tips for Training Deep Neural Network; Lecture III: Variants of Neural Network; Lecture IV: Next Wave.
4. Lecture I: Introduction of Deep Learning.
5. Outline of Lecture I — Introduction of Deep Learning / Why Deep? / "Hello World" for Deep Learning. Let's start with general machine learning.
6. Machine learning ≈ looking for a function. Speech recognition: f(audio) = "How are you". Image recognition: f(image) = "cat". Playing Go: f(board) = "5-5" (next move). Dialogue system: f("Hi", what the user said) = "Hello" (system response).
7. Framework — a set of functions f1, f2, … is a model. Image recognition: f(image) = "cat".
8. Framework — training data (function input: images; function output: labels such as "monkey", "cat", "dog") defines the goodness of a function f. This is supervised learning.
9. Framework — Step 1: a set of functions (model); Step 2: goodness of a function, measured on the training data; Step 3: pick the "best" function f*. Then use f* at testing time.
10. Three steps for deep learning — Step 1: define a set of functions; Step 2: goodness of function; Step 3: pick the best function. Deep learning is so simple……
11. In deep learning, the set of functions in Step 1 is a neural network.
12. Human brains (figure of biological neurons).
13. Neural network — a neuron is a simple function: z = a1·w1 + … + ak·wk + … + aK·wK + b, output a = σ(z), where the w's are the weights, b is the bias, and σ is the activation function.
14. Example neuron with sigmoid activation σ(z) = 1 / (1 + e^(−z)); for the example weights, bias and inputs on the slide, z = 4 and the output is σ(4) ≈ 0.98.
15. Neural network — different connections lead to different network structures. The weights and biases are the network parameters θ; each neuron can have different values of weights and biases.
16–18. Fully connected feedforward network — worked example: with the example weights and biases, the network maps input vector (1, −1) to output (0.62, 0.83) and input (0, 0) to (0.51, 0.85). Given parameters θ, the network defines a function from input vectors to output vectors; given only the network structure, it defines a function set.
19. Fully connected feedforward network — input layer, hidden layers (Layer 1 … Layer L), output layer (y1 … yM). "Deep" means many hidden layers.
20. Output layer (option) — with an ordinary output layer, the outputs can be any value and may not be easy to interpret.
21. Output layer (option) — softmax layer as the output layer: y_i = e^{z_i} / Σ_j e^{z_j}, so 1 > y_i > 0 and Σ_i y_i = 1, and the outputs can be interpreted as probabilities. Example: z = (3, 1, −3) gives e^z ≈ (20, 2.7, 0.05) and y ≈ (0.88, 0.12, ≈0).
22. Example application — handwriting digit recognition: the input is a 16×16 image (256 dimensions, ink → 1, no ink → 0); the output has 10 dimensions, each representing the confidence of a digit. If y2 = 0.7 is the maximum, the image is "2".
23. Example application — what is needed is a function with a 256-dimensional input vector and a 10-dimensional output vector: a neural network.
24. The network structure defines a function set containing the candidates for handwriting digit recognition. You need to decide the network structure so that a good function is in your function set.
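The forward computation described on these slides (weighted sum plus bias, sigmoid activation, softmax output) can be written in a few lines of NumPy. This is a minimal illustrative sketch, not code from the slides; the layer sizes and random weights are assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - z.max())          # subtract max for numerical stability
    return e / e.sum()

# Assumed sizes: 256-dim input (16x16 image), two hidden layers of 500, 10 outputs.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(500, 256)), np.zeros(500)
W2, b2 = rng.normal(size=(500, 500)), np.zeros(500)
W3, b3 = rng.normal(size=(10, 500)), np.zeros(10)

def forward(x):
    a1 = sigmoid(W1 @ x + b1)        # hidden layer 1
    a2 = sigmoid(W2 @ a1 + b2)       # hidden layer 2
    return softmax(W3 @ a2 + b3)     # 10 values, one per digit

y = forward(rng.random(256))         # e.g. a flattened 16x16 image
print(y.sum())                       # ≈ 1.0, as guaranteed by softmax
```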
25. FAQ — Q: How many layers? How many neurons for each layer? A: Trial and error + intuition. Q: Can the structure be automatically determined?
26. Three steps for deep learning — Step 2: goodness of function.
27. Training data — prepare training data: images and their labels (e.g. "5", "0", "4", "1", "3", "1", "2", "9"). The learning target is defined on the training data.
28. Learning target — for a 16×16 input image, the softmax outputs y1 … y10 should have their maximum at y1 when the input is "1", at y2 when the input is "2", and so on.
29. Loss — given a set of parameters, the loss ℓ can be the distance between the network output and the target (e.g. target (1, 0, 0, …) for "1"); the output should be as close as possible to the target. A good function should make the loss of all examples as small as possible.
30. Total loss — over all R training examples, L = Σ_{r=1}^{R} ℓ_r. Find the network parameters θ* (the function in the function set) that minimize the total loss L.
31. Three steps for deep learning — Step 3: pick the best function.
32. How to pick the best function — find the network parameters θ* = {w1, w2, w3, …, b1, b2, b3, …} that minimize the total loss L. Enumerating all possible values is hopeless: e.g. a speech recognition network with 8 layers of 1000 neurons has 10^6 weights between consecutive layers — millions of parameters.
33. Gradient descent — pick an initial value for a parameter w (random or RBM pre-training; usually good enough).
34. Gradient descent — compute ∂L/∂w: if it is negative, increase w; if positive, decrease w.
35. Gradient descent — update w ← w − η·∂L/∂w, where η is called the "learning rate"; repeat.
36. Gradient descent — repeat until ∂L/∂w is approximately zero (when the update is small).
37–38. Gradient descent on all parameters — compute the gradient ∇L = (∂L/∂w1, ∂L/∂w2, …, ∂L/∂b1, …) and update every parameter w_i ← w_i − η·∂L/∂w_i, repeatedly.
39–40. Gradient descent in two parameter dimensions (w1, w2) — randomly pick a starting point, compute (∂L/∂w1, ∂L/∂w2), move by (−η·∂L/∂w1, −η·∂L/∂w2); hopefully we reach a minimum. (Color in the figure: value of the total loss L.)
41. Gradient descent, difficulty — gradient descent never guarantees global minima; different initial points can reach different minima and thus different results. There are some tips to help you avoid local minima, but no guarantee.
42. Gradient descent is like playing Age of Empires — you cannot see the whole map (loss surface), only the local gradient around your position.
43. This is the "learning" of machines in deep learning. Even AlphaGo uses this approach. People imagine something grander; actually it is gradient descent. I hope you are not too disappointed :p
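A minimal sketch of the update rule w ← w − η·∂L/∂w from the slides, applied to a toy one-parameter loss. The loss function, learning rate and step count here are illustrative assumptions, not taken from the slides.

```python
import numpy as np

def loss(w):
    return (w - 3.0) ** 2           # toy loss with a single minimum at w = 3

def grad(w):
    return 2.0 * (w - 3.0)          # dL/dw

w = np.random.rand()                # pick an initial value for w
eta = 0.1                           # learning rate
for step in range(100):
    w = w - eta * grad(w)           # w <- w - eta * dL/dw
print(w, loss(w))                   # w is close to 3, loss close to 0
```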
44. Backpropagation — an efficient way to compute ∂L/∂w. Ref: http://speech.ee.ntu.edu.tw/~tlkagk/courses/MLDS_2015_2/Lecture/DNN%20backprop.ecm.mp4/index.html. Don't worry about ∂L/∂w — the toolkits will handle it. (台大周伯威同學開發 / toolkit developed by NTU student 周伯威.)
45. Concluding remarks — deep learning is so simple: Step 1: define a set of functions; Step 2: goodness of function; Step 3: pick the best function.
46. Outline of Lecture I — Introduction of Deep Learning / Why Deep? / "Hello World" for Deep Learning.
47. Deeper is better? Word error rate (%) versus network size (layers × neurons per layer) [Seide, Li & Yu, "Conversational Speech Transcription Using Context-Dependent Deep Neural Networks", Interspeech 2011]: 1×2k: 24.2; 2×2k: 20.4; 3×2k: 18.4; 4×2k: 17.8; 5×2k: 17.2; 7×2k: 17.1; shallow networks: 1×3772: 22.5; 1×4634: 22.6; 1×16k: 22.1. Not surprising: more parameters, better performance.
48. Universality theorem — any continuous function f: R^N → R^M can be realized by a network with one hidden layer (given enough hidden neurons). Reference for the reason: http://neuralnetworksanddeeplearning.com/chap4.html. So why "deep" neural networks rather than "fat" ones?
49. Fat + short vs thin + tall — with the same number of parameters, which is better: a shallow wide network or a deep narrow one?
50. Fat + short vs thin + tall — the table from [Seide et al., Interspeech 2011] above shows the deep networks win (e.g. 7×2k: 17.1% vs 1×16k: 22.1%). Why?
51. Analogy — logic circuits consist of gates; two layers of logic gates can represent any Boolean function, but using multiple layers of gates makes some functions much simpler (fewer gates needed). Similarly, a one-hidden-layer network can represent any continuous function, but using multiple layers of neurons makes some functions much simpler: fewer parameters, possibly less data. (This page is for EE background.)
52. Modularization — deep → modularization. Example task: classify images into girls with long hair, boys with long hair, girls with short hair, boys with short hair. Training one classifier per class directly is hard because some classes (e.g. boys with long hair) have few examples, so those classifiers are weak.
53–54. Modularization — instead, first train basic classifiers for attributes ("long or short hair?", "boy or girl?"); each basic classifier has sufficient training examples. The four final classifiers then share these basic classifiers as modules and can be trained with little data and still do fine.
55–56. Modularization — in a deep network, the first layer learns the most basic classifiers; the second layer uses the first layer as modules to build more complex classifiers, and so on. The modularization is learned automatically from data → less training data may be needed. Reference: Zeiler, M. D., & Fergus, R. (2014). Visualizing and understanding convolutional networks. In Computer Vision – ECCV 2014 (pp. 818–833).
57. Outline of Lecture I — Introduction of Deep Learning / Why Deep? / "Hello World" for Deep Learning.
58. Keras — an interface to TensorFlow or Theano: easy to learn and use (and still has some flexibility); you can modify it if you can write TensorFlow or Theano. TensorFlow and Theano themselves are very flexible but need some effort to learn. If you want to learn Theano: http://speech.ee.ntu.edu.tw/~tlkagk/courses/MLDS_2015_2/Lecture/Theano%20DNN.ecm.mp4/index.html and http://speech.ee.ntu.edu.tw/~tlkagk/courses/MLDS_2015_2/Lecture/RNN%20training%20(v6).ecm.mp4/index.html
59. Keras — François Chollet is the author of Keras; he currently works for Google as a deep learning engineer and researcher. Keras means horn in Greek. Documentation: http://keras.io/. Examples: https://github.com/fchollet/keras/tree/master/examples
60. 使用 Keras 心得 (experience using Keras). 感謝沈昇勳同學提供圖檔 (images courtesy of 沈昇勳).
61. Example application — handwriting digit recognition, the "Hello world" for deep learning: input a 28×28 image, output the digit. MNIST data: http://yann.lecun.com/exdb/mnist/. Keras provides a dataset loading function: http://keras.io/datasets/
62. Keras — the network used in the demo: 28×28 input, two fully connected hidden layers of 500 neurons, softmax outputs y1 … y10.
63–64. Keras — Step 3.1: configuration (e.g. the update rule w ← w − η·∂L/∂w with learning rate 0.1); Step 3.2: find the optimal network parameters from the training data (images) and labels (digits).
65. Keras — training data format: a NumPy array of shape (number of training examples, 28×28 = 784) for the images and a NumPy array of shape (number of training examples, 10) for the labels. See https://www.tensorflow.org/versions/r0.8/tutorials/mnist/beginners/index.html
66. Keras — how to use the neural network (testing): case 1: predict classes; case 2: predict probabilities. Save and load models: http://keras.io/getting-started/faq/#how-can-i-save-a-keras-model
67. Keras — using a GPU to speed up training. Way 1: THEANO_FLAGS=device=gpu0 python YourCode.py. Way 2 (in your code): import os; os.environ["THEANO_FLAGS"] = "device=gpu0"
68. Live demo.
69. Lecture II: Tips for Training DNN.
70. Recipe of deep learning — after the three steps, ask: do you get good results on the training data? If NO, go back and fix training. If YES, do you get good results on the testing data? If NO, that is overfitting.
71. Do not always blame overfitting — if results are bad on the testing data but also bad on the training data, the network is simply not well trained, not overfitting.
72. Recipe of deep learning — different approaches for different problems, e.g. dropout is for good results on testing data.
73. For good results on training data: choosing a proper loss, mini-batch, a new activation function, adaptive learning rate, momentum.
74. Choosing a proper loss — with target ŷ = (1, 0, 0, …) for "1" and softmax outputs y, the loss can be square error Σ_{i=1}^{10} (y_i − ŷ_i)^2 or cross entropy −Σ_{i=1}^{10} ŷ_i ln y_i. Which one is better?
75–76. Let's try it — training with square error gets stuck while cross entropy trains well; testing accuracy: square error 0.11, cross entropy 0.84.
77. Choosing a proper loss — the total loss surface over the parameters (w1, w2) is much flatter far from the minimum for square error than for cross entropy. When using a softmax output layer, choose cross entropy. (http://jmlr.org/proceedings/papers/v9/glorot10a/glorot10a.pdf)
78. For good results on training data: choosing a proper loss, mini-batch, a new activation function, adaptive learning rate, momentum.
79. Mini-batch — randomly initialize the network parameters; pick the 1st mini-batch (e.g. x1, x31, …), compute its loss L' = ℓ1 + ℓ31 + …, and update the parameters once; pick the 2nd mini-batch (x2, x16, …), compute L'' = ℓ2 + ℓ16 + …, and update once; continue until all mini-batches have been picked — that is one epoch. Repeat the whole process. Note that we do not really minimize the total loss this way.
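To make the Keras workflow on these slides concrete, here is a minimal sketch of the 784–500–500–10 network with a softmax output and cross-entropy loss. It uses today's tf.keras API rather than the Keras 1.x code shown in the original demo, so treat the exact calls as assumptions rather than the slides' code.

```python
import numpy as np
from tensorflow import keras

# Load MNIST and flatten the 28x28 images into 784-dim vectors.
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
x_train = x_train.reshape(-1, 784).astype("float32") / 255.0
x_test = x_test.reshape(-1, 784).astype("float32") / 255.0
y_train = keras.utils.to_categorical(y_train, 10)   # 10-dim one-hot targets
y_test = keras.utils.to_categorical(y_test, 10)

# Step 1: define a set of functions (the network structure).
model = keras.Sequential([
    keras.Input(shape=(784,)),
    keras.layers.Dense(500, activation="sigmoid"),
    keras.layers.Dense(500, activation="sigmoid"),
    keras.layers.Dense(10, activation="softmax"),
])

# Step 2: goodness of function (cross entropy, as recommended for softmax outputs).
model.compile(loss="categorical_crossentropy", optimizer="sgd", metrics=["accuracy"])

# Step 3: pick the best function (gradient descent on mini-batches).
model.fit(x_train, y_train, batch_size=100, epochs=20)
print(model.evaluate(x_test, y_test))
```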
80. Mini-batch — with 100 examples in a mini-batch, repeat for 20 epochs.
81. Mini-batch — the loss L is different each time we update the parameters, because each update uses a different mini-batch; we do not really minimize the total loss.
82. Mini-batch — compared with original gradient descent, the mini-batch trajectory is unstable (noisy). (The colors in the figure represent the total loss.)
83. Mini-batch is faster — original gradient descent updates once after seeing all examples; with mini-batches, if there are 20 batches we update 20 times in one epoch. With parallel computing, one epoch can take about the same time in both cases (for not-super-large datasets), so mini-batch gives better performance — although the speed comparison is not always true with parallel computing.
84. Mini-batch is better! Testing accuracy: mini-batch 0.84, no batch 0.12. Training accuracy also improves much faster with mini-batches.
85. Shuffle the training examples for each epoch, so the mini-batches differ between epochs. Don't worry — this is the default in Keras.
86. For good results on training data: choosing a proper loss, mini-batch, a new activation function, adaptive learning rate, momentum.
87. Hard to get the power of deep — deeper does not always imply better: results on the training data degrade as sigmoid layers are added.
88. Let's try it — testing accuracy: 3 layers 0.84, 9 layers 0.11.
89. Vanishing gradient problem — with sigmoid activations, the layers near the input have much smaller gradients and learn very slowly; they are still almost random when the layers near the output (larger gradients, learn fast) have already converged — converged based on essentially random features.
90. Vanishing gradient problem — an intuitive way to see the derivative ∂ℓ/∂w ≈ Δℓ/Δw: a large change Δw in an early layer is squashed by each sigmoid (large input → small output), so it causes only a small change in the output and a small Δℓ, hence a small gradient.
91. Hard to get the power of deep — in 2006 people used RBM pre-training; in 2015 people use ReLU.
92. ReLU — Rectified Linear Unit: a = z if z > 0, a = 0 otherwise. Reasons: 1. fast to compute; 2. biological reason; 3. equivalent to infinitely many sigmoids with different biases; 4. avoids the vanishing gradient problem. [Xavier Glorot, AISTATS'11] [Andrew L. Maas, ICML'13] [Kaiming He, arXiv'15]
93–94. ReLU — neurons with zero output can be removed, leaving a thinner linear network for the active region, which does not have smaller gradients in earlier layers.
95–96. Let's try it — with 9 layers, testing accuracy: sigmoid 0.11, ReLU 0.96.
97. ReLU variants — Leaky ReLU: a = z if z > 0, a = 0.01z otherwise; Parametric ReLU: a = z if z > 0, a = αz otherwise, where α is also learned by gradient descent.
98. Maxout — a learnable activation function [Ian J. Goodfellow, ICML'13]: group the pre-activations (e.g. 2 per group) and output the max of each group. ReLU is a special case of Maxout. You can have more than 2 elements in a group.
99. Maxout — the activation function in a Maxout network can be any piecewise linear convex function; the number of pieces depends on how many elements are in a group.
100. For good results on training data: choosing a proper loss, mini-batch, a new activation function, adaptive learning rate, momentum.
101. Learning rates — if the learning rate is too large, the total loss may not decrease after each update; set the learning rate η carefully.
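Switching from sigmoid to ReLU, as the slides do for the 9-layer experiment, is a one-word change per layer in Keras. A minimal sketch (the layer count follows the slides; the layer width and the tf.keras API spelling are the same assumptions as above):

```python
from tensorflow import keras

# Assumed sketch: a deeper network where every hidden layer uses ReLU
# instead of sigmoid, which avoids the vanishing-gradient problem above.
model = keras.Sequential()
model.add(keras.Input(shape=(784,)))
for _ in range(9):                       # 9 hidden layers, as in the slides
    model.add(keras.layers.Dense(500, activation="relu"))
model.add(keras.layers.Dense(10, activation="softmax"))

model.compile(loss="categorical_crossentropy", optimizer="sgd",
              metrics=["accuracy"])
# model.fit(x_train, y_train, batch_size=100, epochs=20)  # as before
```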
102. Learning rates — if the learning rate is too large, the total loss may not decrease after each update; if it is too small, training is too slow. Set the learning rate η carefully.
103. Learning rates — popular & simple idea: reduce the learning rate by some factor every few epochs. At the beginning we are far from the destination, so we use a larger learning rate; after several epochs we are close, so we reduce it. E.g. 1/t decay: η_t = η / (t + 1). But the learning rate cannot be one-size-fits-all: give different parameters different learning rates.
104. Adagrad — original update: w ← w − η·∂L/∂w. Adagrad uses a parameter-dependent learning rate: w ← w − η_w·∂L/∂w, with η_w = η / sqrt(Σ_{i=0}^{t} g_i²), where g_i is the ∂L/∂w obtained at the i-th update (the denominator is the root of the summed squares of the previous derivatives).
105. Adagrad — example: a parameter with past gradients 0.1, 0.2 gets learning rate η/sqrt(0.1² + 0.2²) ≈ η/0.22, while one with gradients 20, 10 gets η/sqrt(20² + 10²) ≈ η/22. Observations: 1. the learning rate becomes smaller and smaller for all parameters; 2. smaller derivatives give a larger learning rate, and vice versa. Why?
106. Smaller derivatives, larger learning rate — intuition: a parameter with consistently small derivatives (a flat direction) should take larger steps, and one with large derivatives (a steep direction) smaller steps.
107. Not the whole story — Adagrad [John Duchi, JMLR'11]; RMSprop (https://www.youtube.com/watch?v=O3sxAc4hxZU); Adadelta [Matthew D. Zeiler, arXiv'12]; "No more pesky learning rates" [Tom Schaul, arXiv'12]; AdaSecant [Caglar Gulcehre, arXiv'14]; Adam [Diederik P. Kingma, ICLR'15]; Nadam (http://cs229.stanford.edu/proj2015/054_report.pdf).
108. For good results on training data: choosing a proper loss, mini-batch, a new activation function, adaptive learning rate, momentum.
109. Hard to find the optimal network parameters — along a parameter w, the total loss can be very slow at a plateau (∂L/∂w ≈ 0), stuck at a saddle point (∂L/∂w = 0), or stuck at a local minimum (∂L/∂w = 0).
110. In the physical world, momentum carries a ball past small dips; how about putting this phenomenon into gradient descent?
111. Momentum — real movement = negative of ∂L/∂w + momentum (the accumulated previous movement). This still does not guarantee reaching global minima, but it gives some hope: a point where ∂L/∂w = 0 can be passed thanks to momentum.
112. Adam = RMSProp (advanced Adagrad) + momentum.
113. Let's try it — ReLU, 3 layers; testing accuracy: original (SGD) 0.96, Adam 0.97; training also converges faster with Adam.
114. For good results on testing data: early stopping, regularization, dropout, network structure.
115. Why overfitting? Training data and testing data can be different. The learning target is defined by the training data, so the parameters achieving that target do not necessarily give good results on the testing data.
116. Panacea for overfitting — have more training data, or create more training data(?), e.g. in handwriting recognition, create new training images by shifting/rotating the originals (shift 15°).
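A minimal NumPy sketch of the Adagrad rule η_w = η / sqrt(Σ g_i²) described above, applied to the same toy loss as before. The toy loss, initial value and constants are illustrative assumptions, not from the slides.

```python
import numpy as np

def grad(w):
    return 2.0 * (w - 3.0)              # dL/dw for the toy loss (w - 3)^2

w, eta = 0.0, 1.0
sum_sq = 0.0                            # running sum of squared gradients
for step in range(100):
    g = grad(w)
    sum_sq += g ** 2
    w -= eta / np.sqrt(sum_sq) * g      # parameter-dependent learning rate
print(w)                                # close to 3; the effective step size shrinks over time
```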

117–118. Why overfitting? For the following experiments, we added some noise to the testing data; training is not influenced, but testing accuracy drops from 0.97 (clean) to 0.50 (noisy).
119. For good results on testing data: early stopping, weight decay, dropout, network structure.
120. Early stopping — as the epochs go on, the total loss on the training set keeps decreasing, but the loss on the testing set eventually rises; stop at the point where the validation set loss is lowest. Keras: http://keras.io/getting-started/faq/#how-can-i-interrupt-training-when-the-validation-loss-isnt-decreasing-anymore
121. For good results on testing data: early stopping, weight decay, dropout, network structure.
122. Weight decay — our brain prunes out the useless links between neurons; doing the same thing to the machine's brain improves performance.
123. Weight decay — useless weights shrink close to zero (萎縮了); weight decay is one kind of regularization.
124. Weight decay — implementation: the original update is w ← w − η·∂L/∂w; with weight decay it becomes w ← (1 − λ)·w − η·∂L/∂w, e.g. λ = 0.01, so each weight is multiplied by 0.99 before the gradient step and gets smaller and smaller. Keras: http://keras.io/regularizers/
125. For good results on testing data: early stopping, weight decay, dropout, network structure.
126–127. Dropout (training) — each time before updating the parameters, each neuron has p% chance to drop out; the structure of the network changes and becomes thinner, and this new thinner network is used for training. For each mini-batch, we resample the dropout neurons.
128. Dropout (testing) — no dropout at test time. If the dropout rate at training is p%, all the weights are multiplied by (1 − p)%. E.g. with a dropout rate of 50%, a weight w = 1 obtained by training is set to w = 0.5 for testing.
129. Dropout, intuitive reason — when a team works together, if everyone expects the partner to do the work, nothing gets done; if you know your partner will drop out, you do better (我的 partner 會擺爛,所以我要好好做). When testing, no one actually drops out, so good results are obtained eventually.
130. Dropout — why multiply the weights by (1 − p)% when testing? With a 50% dropout rate, the pre-activation during training is roughly half of its no-dropout value (z' ≈ 2z without scaling); multiplying the trained weights by (1 − p)% keeps z' ≈ z at test time.
131–132. Dropout is a kind of ensemble — an ensemble trains a bunch of networks with different structures on different subsets of the training set, then averages the outputs y1, y2, y3, y4 of all networks on the testing data.
133. Dropout as ensemble — each mini-batch trains one of the up to 2^M possible thinned networks (M = number of neurons), and some parameters are shared among them.
134. Dropout as ensemble — at test time, multiplying all the weights by (1 − p)% gives an output approximately equal to the average of the ensemble of thinned networks.
135. More about dropout — further references: [Nitish Srivastava, JMLR'14] [Pierre Baldi, NIPS'13] [Geoffrey E. Hinton, arXiv'12]. Dropout works better with Maxout [Ian J. Goodfellow, ICML'13]. Dropconnect [Li Wan, ICML'13]: dropout deletes neurons, dropconnect deletes the connections between neurons. Annealed dropout [S. J. Rennie, SLT'14]: the dropout rate decreases by epochs. Standout [J. Ba, NIPS'13]: each neuron has a different dropout rate.
136. Let's try it — the 500–500 softmax network with model.add(dropout(0.8)) after each hidden layer.
137. Let's try it — with dropout, training accuracy is slightly lower than without, but testing accuracy on the noisy test set improves from 0.50 to 0.63.
138. For good results on testing data: early stopping, regularization, dropout, network structure — CNN is a very good example!
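The dropout experiment on the slides adds a dropout layer after each hidden layer (the slides write model.add(dropout(0.8))). A minimal sketch with today's tf.keras API — the API spelling is an assumption; note that in Keras the Dropout rate is the probability of dropping a unit, and the library takes care of the train/test rescaling described above:

```python
from tensorflow import keras

model = keras.Sequential([
    keras.Input(shape=(784,)),
    keras.layers.Dense(500, activation="relu"),
    keras.layers.Dropout(0.8),             # drop 80% of the units during training
    keras.layers.Dense(500, activation="relu"),
    keras.layers.Dropout(0.8),
    keras.layers.Dense(10, activation="softmax"),
])
model.compile(loss="categorical_crossentropy", optimizer="adam",
              metrics=["accuracy"])
# Dropout is only active in training; model.evaluate()/predict() run without it.
```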
(Network structure — CNN — is covered in the next lecture.)
139. Concluding remarks of Lecture II.
140. Recipe of deep learning — the three steps, then check results on training data and on testing data.
141. Let's try another task.
142. Document classification — classify news documents (http://top-breaking-news.com/) into categories such as 政治 (politics), 體育 (sports), 財經 (finance), using the words appearing in each document (e.g. "president", "stock") as input.
143. Data (description of the dataset).
144–147. Results with the tips applied — accuracy: MSE 0.36; cross entropy 0.55; + ReLU 0.75; + Adam 0.77; + dropout 0.79.
148. Lecture III: Variants of Neural Networks.
149. Variants of Neural Networks — Convolutional Neural Network (CNN), widely used in image processing, and Recurrent Neural Network (RNN).
150. Why CNN for images? When processing an image, the first layer of a fully connected network would be very large: for a 100×100×3 image and 1000 hidden neurons there are already 3×10^7 weights. Can the fully connected network be simplified by considering the properties of image recognition?
151. Why CNN for images — Property 1: some patterns are much smaller than the whole image; a neuron does not have to see the whole image to discover a pattern (e.g. a "beak" detector). Connecting to a small region needs fewer parameters.
152. Why CNN for images — Property 2: the same patterns appear in different regions; an "upper-left beak" detector and a "middle beak" detector do almost the same thing, so they can use the same set of parameters.
153. Why CNN for images — Property 3: subsampling the pixels will not change the object (a subsampled bird is still a bird); we can subsample to make the image smaller, so the network has fewer parameters to process.
154. Three steps for deep learning — in Step 1 the function set is now a convolutional neural network.
155. The whole CNN — (Convolution → Max Pooling) repeated many times, then Flatten, then a fully connected feedforward network, then the output (cat, dog, …).
156. The whole CNN — convolution exploits Properties 1 and 2; max pooling exploits Property 3.
157. The whole CNN — the same pipeline as above.
158. CNN – Convolution: a 6×6 binary image is convolved with 3×3 filters, e.g. Filter 1 = [[1,−1,−1],[−1,1,−1],[−1,−1,1]] and Filter 2 = [[−1,1,−1],[−1,1,−1],[−1,1,−1]]. The filter values are network parameters to be learned; each filter detects a small 3×3 pattern (Property 1).
159. CNN – Convolution: with stride = 1, slide the filter over the image; the top-left position gives 3, the next gives −1, and so on.
160. CNN – Convolution: with stride = 2, positions are skipped (the first two values are 3 and −3); we set stride = 1 below.
161. CNN – Convolution: Filter 1 applied over the whole image with stride 1 gives the 4×4 map [[3,−1,−3,−1],[−3,1,0,−3],[−3,−3,0,1],[3,−2,−2,−1]]. The same filter is used at every position (Property 2).
162. CNN – Convolution: do the same process for every filter; Filter 2 gives another 4×4 map ([[−1,−1,−1,−1],[−1,−1,−2,1],[−1,−1,−2,1],[−1,0,−4,3]]). The stack of 4×4 maps is the feature map.
163. CNN – Zero padding: pad the border with zeros so the output is another 6×6 image.
164. CNN – Colorful image: for an RGB image, each filter is a 3×3×3 cube covering all three channels.
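A minimal NumPy sketch of the convolution on slides 158–161, using the 6×6 image and Filter 1 exactly as shown (stride 1, no padding):

```python
import numpy as np

image = np.array([[1,0,0,0,0,1],
                  [0,1,0,0,1,0],
                  [0,0,1,1,0,0],
                  [1,0,0,0,1,0],
                  [0,1,0,0,1,0],
                  [0,0,1,0,1,0]])
filter1 = np.array([[ 1,-1,-1],
                    [-1, 1,-1],
                    [-1,-1, 1]])

def convolve(img, flt, stride=1):
    k = flt.shape[0]
    out_size = (img.shape[0] - k) // stride + 1
    out = np.zeros((out_size, out_size), dtype=int)
    for i in range(out_size):
        for j in range(out_size):
            patch = img[i*stride:i*stride+k, j*stride:j*stride+k]
            out[i, j] = np.sum(patch * flt)   # elementwise product, then sum
    return out

print(convolve(image, filter1))   # 4x4 feature map; its top-left value is 3, as on the slides
```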
165. The whole CNN — (Convolution → Max Pooling) repeated, Flatten, fully connected feedforward network, output.
166. CNN – Max Pooling: start from the two 4×4 feature maps produced by Filter 1 and Filter 2.
167. CNN – Max Pooling: take the maximum in each 2×2 region; from the 6×6 image, convolution plus max pooling gives a new but smaller 2×2 image, with each filter as a channel.
168. The whole CNN — convolution + max pooling can be repeated many times; each round produces a new, smaller image whose number of channels is the number of filters.
169–170. Flatten — the final feature maps are flattened into a vector and fed into the fully connected feedforward network.
171–172. The convolution layer can itself be viewed as a sparsely connected layer of neurons (ignoring the non-linear activation function after the convolution).
173. Compared with a fully connected layer, each convolutional neuron connects to only 9 inputs instead of all 36 — fewer parameters.
174. In addition, the neurons detecting the same pattern at different positions share weights — even fewer parameters.
175–177. Parameter count on the 6×6 example: the input dimension is 6×6 = 36 and the convolution output dimension is 4×4×2 = 32; a fully connected layer would need 36×32 = 1152 parameters, while the two 3×3 filters need only 9×2 = 18.
178. Convolutional neural network — learning: nothing special, just gradient descent. The whole CNN (convolution, max pooling, fully connected) is trained end-to-end with targets such as (1, 0, 0, …) for "monkey".
179. Playing Go — the network takes the 19×19 board (a 19×19 vector or image; black: 1, white: −1, none: 0) and outputs the next move (19×19 positions). A fully connected feedforward network can be used, but CNN performs much better.
180. Playing Go — training: from records of previous plays (e.g. 進藤光 v.s. 社清春), learn to map each board position to the move that was played (target "五之5" = 1, else 0; target "天元" = 1, else 0).
181. Why CNN for playing Go? Some patterns are much smaller than the whole board (AlphaGo uses 5×5 filters for its first layer), and the same patterns appear in different regions.
182. Why CNN for playing Go? Subsampling the pixels should not change the object — yet AlphaGo does not use max pooling. How to explain this???
183. Variants of Neural Networks — Recurrent Neural Network (RNN): a neural network with memory.
184. Example application — slot filling: "I would like to arrive Taipei on November 2nd." A ticket booking system should fill the slots Destination: Taipei and Time of arrival: November 2nd.
185. Example application — can slot filling be solved by a feedforward network? Input: a word (each word is represented as a vector).
186. 1-of-N encoding — each dimension corresponds to a word in the lexicon; the dimension for the word is 1 and the others are 0. E.g. lexicon = {apple, bag, cat, dog, elephant}: apple = [1 0 0 0 0], bag = [0 1 0 0 0], cat = [0 0 1 0 0], dog = [0 0 0 1 0], elephant = [0 0 0 0 1]. The vector is lexicon size.
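Pulling the convolution → max-pooling → flatten → fully-connected pipeline from the CNN section above into one Keras model, here is a minimal sketch for 28×28 grayscale digits. The filter counts and layer sizes are chosen for illustration; they are assumptions, not the slides' exact architecture.

```python
from tensorflow import keras

model = keras.Sequential([
    keras.Input(shape=(28, 28, 1)),
    keras.layers.Conv2D(25, (3, 3), activation="relu"),  # convolution: 25 3x3 filters
    keras.layers.MaxPooling2D((2, 2)),                   # max pooling over 2x2 regions
    keras.layers.Conv2D(50, (3, 3), activation="relu"),
    keras.layers.MaxPooling2D((2, 2)),                   # repeat convolution + pooling
    keras.layers.Flatten(),                              # flatten feature maps to a vector
    keras.layers.Dense(100, activation="relu"),          # fully connected part
    keras.layers.Dense(10, activation="softmax"),
])
model.compile(loss="categorical_crossentropy", optimizer="adam",
              metrics=["accuracy"])
model.summary()
```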
186–187. How to represent each word as a vector? Beyond 1-of-N encoding: add a dimension for "other" words not in the lexicon (so w = "Gandalf" or w = "Sauron" maps to "other"), or use word hashing (e.g. 26×26×26 dimensions for letter trigrams such as a-p-p, p-p-l, p-l-e for "apple").
188. Example application — slot filling with a feedforward network: input a word vector (e.g. Taipei); output the probability distribution that the input word belongs to each slot (dest, time of departure, …).
189. Problem — in "arrive Taipei on November 2nd" the slots are other, dest, other, time, time, but in "leave Taipei on November 2nd" Taipei is the place of departure. The same input word gives the same output in a feedforward network, so the neural network needs memory!
190. Three steps for deep learning — in Step 1 the function set is now a recurrent neural network.
191. Recurrent Neural Network (RNN) — the outputs of the hidden layer are stored in the memory, and the memory can be considered as another input.
192. RNN — the same network is used again and again over "arrive Taipei on November 2nd", producing for each word the probability of each slot; the hidden values a1 are stored and fed back when processing the next word, and so on (a2, a3, …).
193. RNN — for "leave Taipei" versus "arrive Taipei", the word Taipei sees different memory contents, so its slot probabilities can differ (place of departure vs destination).
194. Of course the RNN can be deep — several stacked hidden layers over the time steps xt, xt+1, xt+2.
195. Bidirectional RNN — one RNN reads the sequence forwards and another backwards; the output at each time step uses both.
196. Long Short-term Memory (LSTM) — a memory cell with an input gate, an output gate and a forget gate, each controlled by signals from the other parts of the network; a special neuron with 4 inputs and 1 output.
197. LSTM cell — with cell input z, input-gate signal z_i, forget-gate signal z_f, output-gate signal z_o, and gate activation f (usually a sigmoid, between 0 and 1, to mimic an open or closed gate): the new cell value is c' = g(z)·f(z_i) + c·f(z_f) and the output is a = h(c')·f(z_o).
198–199. LSTM — numerical examples of how the gates let values in, keep them, forget them, and let them out.
200–202. LSTM in a network — the cell-state vector c_{t−1} is carried over time; the input x_t is transformed into the four vectors z, z_i, z_f, z_o that drive each LSTM cell; the "peephole" extension also feeds c_{t−1} and h_{t−1} into the gates.
203. Multiple-layer LSTM — this is quite standard now. Don't worry if you cannot understand the details: Keras can handle it, and Keras supports the "LSTM", "GRU" and "SimpleRNN" layers.
204–205. Three steps for deep learning — Step 2: goodness of function. The training sentences provide the learning target for each word, e.g. "arrive Taipei on November 2nd" → other, dest, other, time, time.
206–207. Learning — Step 3: the parameters are still updated with w ← w − η·∂L/∂w, computed by backpropagation through time (BPTT). Unfortunately, RNN learning is very difficult in practice.
208. RNN-based networks are not always easy to learn — real experiments on language modeling show the total loss jumping around wildly across epochs; sometimes you are just lucky. (感謝曾柏翔同學提供實驗結果 / experimental results courtesy of 曾柏翔.)
209. The error surface is rough — the error surface of an RNN is either very flat or very steep; clipping the gradient helps. [Razvan Pascanu, ICML'13]
210. Why? A toy example: a single recurrent weight w applied 999 times, so y1000 depends on w^999. With w = 1, y1000 = 1; with w = 1.01, y1000 ≈ 20000, a huge ∂L/∂w, which calls for a small learning rate; with w = 0.99 or w = 0.01, y1000 ≈ 0, a tiny ∂L/∂w, which calls for a large learning rate. No single learning rate works.
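Since the slides note that Keras provides "LSTM", "GRU" and "SimpleRNN" layers, here is a minimal sketch of an LSTM tagger in the spirit of the slot-filling example: one slot distribution per input word. The vocabulary size, embedding size and number of slot labels are illustrative assumptions.

```python
from tensorflow import keras

vocab_size, num_slots = 5000, 3               # assumed lexicon size and slot count

inputs = keras.Input(shape=(None,), dtype="int32")       # a sequence of word indices
x = keras.layers.Embedding(vocab_size, 64)(inputs)       # 1-of-N word -> dense vector
x = keras.layers.LSTM(128, return_sequences=True)(x)     # memory carried across words
outputs = keras.layers.Dense(num_slots, activation="softmax")(x)  # slot per word

model = keras.Model(inputs, outputs)
model.compile(loss="sparse_categorical_crossentropy", optimizer="adam")
model.summary()
```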
211. Helpful techniques — Long Short-term Memory (LSTM) can deal with gradient vanishing (not gradient explosion): memory and input are added, so the influence never disappears unless the forget gate is closed (no gradient vanishing if the forget gate is opened). Gated Recurrent Unit (GRU) [Cho, EMNLP'14] is simpler than LSTM.
212. Helpful techniques — a vanilla RNN initialized with the identity matrix plus ReLU activation [Quoc V. Le, arXiv'15] outperforms or is comparable with LSTM on 4 different tasks; Clockwise RNN [Jan Koutnik, JMLR'14]; Structurally Constrained Recurrent Network (SCRN) [Tomas Mikolov, ICLR'15].
213. More applications — in slot filling, the input and output are both sequences of the same length, but RNN can do more than that.
214. Many to one — the input is a vector sequence but the output is a single vector, e.g. sentiment analysis of movie reviews (看了這部電影覺得很高興 → positive 正雷; 這部電影太糟了 → negative 負雷; 這部電影很棒 → positive). Keras example: https://github.com/fchollet/keras/blob/master/examples/imdb_lstm.py
215. Many to many (output is shorter) — both input and output are sequences, but the output is shorter, e.g. speech recognition: the frame sequence 好好好棒棒棒棒棒 should become the character sequence "好棒". Simply trimming repeated symbols cannot distinguish "好棒" from "好棒棒" — a problem.
216. Many to many (output is shorter) — Connectionist Temporal Classification (CTC) adds an extra symbol "φ" representing "null": 好 φ φ 棒 φ φ φ φ decodes to "好棒" while 好 φ φ 棒 φ 棒 φ φ decodes to "好棒棒". [Alex Graves, ICML'06] [Alex Graves, ICML'14] [Haşim Sak, Interspeech'15] [Jie Li, Interspeech'15] [Andrew Senior, ASRU'15]
217. Many to many (no limitation) — input and output are both sequences with different lengths: sequence-to-sequence learning, e.g. machine translation (machine learning → 機器學習). The encoder's final state contains all the information about the input sequence.
218. Many to many (no limitation) — if the decoder just keeps generating (機 器 學 習 慣 性 …), it does not know when to stop.
219. (推文接龍 joke — 推 tlkagk: ========= 斷 ==========; ref: http://zh.pttpedia.wikia.com/wiki/%E6%8E%A5%E9%BE%8D%E6%8E%A8%E6%96%87 (鄉民百科))
220. Many to many (no limitation) — add a symbol "===" (斷) so the decoder can stop. [Ilya Sutskever, NIPS'14] [Dzmitry Bahdanau, arXiv'15]
221. One to many — input an image, output a sequence of words (caption generation): a CNN encodes the whole image into a vector, and the decoder generates "a woman is …… ===". [Kelvin Xu, arXiv'15] [Li Yao, ICCV'15]
222. Application: video caption generation — e.g. "A girl is running.", "A group of people is walking in the forest.", "A group of people is knocked by a tree."
223. Video caption generation — can a machine describe what it sees from video? Demo: 曾柏翔、吳柏瑜、盧宏宗.
224. Concluding remarks of Lecture III — Convolutional Neural Network (CNN) and Recurrent Neural Network (RNN).
225. Lecture IV: Next Wave.
226. Outline — Supervised learning: ultra deep network, attention model. Reinforcement learning. Unsupervised learning: image (realizing what the world looks like), text (understanding the meaning of words), audio (learning human language without supervision). New network structures.
227. Skyscraper (figure: building height comparison; https://zh.wikipedia.org/wiki/%E9%9B%99%E5%B3%B0%E5%A1%94#/media/File:BurjDubaiHeight.svg)
228. Ultra deep network — ImageNet error rates: AlexNet (2012), 8 layers, 16.4%; VGG (2014), 19 layers, 7.3%; GoogleNet (2014), 22 layers, 6.7%. (http://cs231n.stanford.edu/slides/winter1516_lecture8.pdf)
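The many-to-one setting on slide 214 (a word sequence in, a single sentiment label out) is what the cited Keras imdb_lstm.py example does. A minimal sketch in the same spirit — the vocabulary size and layer widths are assumptions, not the example's exact values:

```python
from tensorflow import keras

vocab_size = 20000                                  # assumed vocabulary size

model = keras.Sequential([
    keras.Input(shape=(None,), dtype="int32"),
    keras.layers.Embedding(vocab_size, 128),
    keras.layers.LSTM(128),                         # only the final state is kept (many to one)
    keras.layers.Dense(1, activation="sigmoid"),    # positive vs negative review
])
model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])
model.summary()
```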
229. Ultra deep network — Residual Net (2015): 152 layers, 3.57% error (versus AlexNet 16.4%, VGG 7.3%, GoogleNet 6.7%; compare Taipei 101's 101 floors).
230. Ultra deep network — this ultra deep network has a special structure. Worried about overfitting? Worry about training first!
231. Ultra deep network — an ultra deep network behaves like the ensemble of many networks with different depths (e.g. 2, 4, 6 layers).
232. Ultra deep network — FractalNet; ResNet in ResNet; good initialization?
233. Ultra deep network — Highway Network: a gate controller decides, at each layer, how much of the input to copy through and how much to transform.
234. Highway Network automatically determines the number of layers needed (figures of input layer → output layer with different effective depths).
235. Outline — supervised learning: ultra deep network, attention model.
236. Attention-based model — like human memory: given "What is deep learning?", we do not scan everything we know (lunch today, summer vacation 10 years ago, what you learned in these lectures); we attend to the relevant part and organize an answer. (http://henrylo1605.blogspot.tw/2015/05/blog-post_56.html)
237. Attention-based model — a DNN/RNN reads the input, a reading-head controller decides where to place the reading head over the machine's memory, and the output is produced from what is read. Ref: http://speech.ee.ntu.edu.tw/~tlkagk/courses/MLDS_2015_2/Lecture/Attain%20(v3).ecm.mp4/index.html
238. Attention-based model v2 — with a writing-head controller as well: the Neural Turing Machine.
239. Reading comprehension — each sentence of a document becomes a vector (semantic analysis); given a query, the reading-head controller attends to the relevant sentences and the DNN/RNN produces the answer.
240. Reading comprehension — End-To-End Memory Networks. S. Sukhbaatar, A. Szlam, J. Weston, R. Fergus. NIPS, 2015. (Figure: the position of the reading head across hops.) Keras has an example: https://github.com/fchollet/keras/blob/master/examples/babi_memnn.py
241. Visual question answering (source: http://visualqa.org/).
242. Visual question answering — a CNN produces a vector for each region of the image; the reading-head controller attends over the regions given the query, and the network outputs the answer.
243. Visual question answering — Huijuan Xu, Kate Saenko. Ask, Attend and Answer: Exploring Question-Guided Spatial Attention for Visual Question Answering. arXiv pre-print, 2015.
244. Speech question answering — TOEFL listening comprehension test by machine. Example question: "What is a possible origin of Venus' clouds?" Audio story (the original story is 5 min long). Choices: (A) gases released as a result of volcanic activity; (B) chemical reactions caused by high surface temperatures; (C) bursts of radio energy from the planet's surface; (D) strong winds that blow dust into the atmosphere.
245. Simple baselines — experimental setup: 717 examples for training, 124 for validation, 122 for testing. Naive approaches include random guessing, (2) selecting the shortest choice as the answer, and (4) selecting the choice semantically most similar to the others.
246. Model architecture — the question goes through semantic analysis; the audio story goes through speech recognition and semantic analysis; attention selects the relevant parts; the answer is the choice most similar to the produced answer. Everything is learned from training examples.
247. Model architecture — word-based attention.
248. Model architecture — sentence-based attention.
249. (Figure: attention over the story for choices (A) and (B).)
250. Supervised learning — Memory Network (proposed by the FB AI group): 39.2% accuracy, above the naive approaches.
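The residual / highway idea on slides 229–234 is easy to express with the Keras functional API: each block adds (or gates) a copy of its input into its output. A minimal residual-style sketch (the layer sizes and block count are illustrative assumptions, not ResNet's actual architecture):

```python
from tensorflow import keras

def residual_block(x, units=64):
    # y = x + F(x): the block can fall back to simply copying its input.
    h = keras.layers.Dense(units, activation="relu")(x)
    h = keras.layers.Dense(units)(h)
    return keras.layers.Activation("relu")(keras.layers.Add()([x, h]))

inputs = keras.Input(shape=(64,))
x = inputs
for _ in range(10):                  # stacking many such blocks stays trainable
    x = residual_block(x)
outputs = keras.layers.Dense(10, activation="softmax")(x)
model = keras.Model(inputs, outputs)
model.summary()
```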
251. Supervised learning — word-based attention reaches 48.8% accuracy, versus 39.2% for the Memory Network (proposed by the FB AI group) and the naive approaches. [Fang & Hsu & Lee, SLT 16] [Tseng & Lee, Interspeech 16]
252. Outline — reinforcement learning.
253. Scenario of reinforcement learning — an agent observes the environment, takes an action that changes the environment, and receives a reward (e.g. "Don't do that").
254. Scenario of reinforcement learning — the agent learns to take actions that maximize expected reward (e.g. the reward becomes "Thank you."). (http://www.sznews.com/news/content/2013-11/26/content_8800180.htm)
255. Supervised vs reinforcement — supervised: learning from a teacher ("Hello" → say "Hi"; "Bye bye" → say "Goodbye"). Reinforcement: learning from critics — after a whole dialogue, the agent is only told the result was bad.
256. Scenario of reinforcement learning for Go — the observation is the board, the action is the next move; reward = 1 if win, −1 if lose, 0 otherwise. The agent learns to take actions that maximize expected reward.
257. Supervised vs reinforcement for Go — supervised: learn from a teacher which move to make at each board ("next move: 5-5", "next move: 3-3"). Reinforcement learning: play a first move, then many moves, then win (or lose). AlphaGo is supervised learning + reinforcement learning.
258. Difficulties of reinforcement learning — it may be better to sacrifice immediate reward to gain more long-term reward (e.g. playing Go); the agent's actions affect the subsequent data it receives (e.g. exploration).
259. Deep reinforcement learning — the function's input is the observation and its output is the action; the reward is used to pick the best function; the function is a DNN.
260. Application: interactive retrieval — interactive retrieval is helpful: the user asks for "Deep Learning", and the system can ask back whether "Deep Learning" is related to Machine Learning, or related to Education. [Wu & Lee, INTERSPEECH 16]
261. Deep reinforcement learning — with different network depths: deeper networks give better retrieval performance with less user labor, at the cost of more interaction; the task cannot be addressed by a linear model — some depth is needed.
262. More applications — AlphaGo, playing video games, dialogue; flying helicopter (https://www.youtube.com/watch?v=0JL04JJjocc); driving (https://www.youtube.com/watch?v=0xo1Ldx3L5Q); "Google Cuts Its Giant Electricity Bill With DeepMind-Powered AI" (http://www.bloomberg.com/news/articles/2016-07-19/google-cuts-its-giant-electricity-bill-with-deepmind-powered-ai).
263. To learn deep reinforcement learning — lectures of David Silver (http://www0.cs.ucl.ac.uk/staff/D.Silver/web/Teaching.html; 10 lectures, 1:30 each); Deep Reinforcement Learning (http://videolectures.net/rldm2015_silver_reinforcement_learning/).
264. Outline — unsupervised learning: image, text, audio.
265. Does the machine know what the world looks like? Draw something! Ref: https://openai.com/blog/generative-models/
266–267. Deep Dream — given a photo, the machine adds what it sees. (http://deepdreamgenerator.com/)
268–269. Deep Style — given a photo, make its style like famous paintings. (https://dreamscopeapp.com/)
270. Deep Style — one CNN captures the content of the photo, another the style of the painting, and a new image is found that matches both.
271. Generating images by RNN — the color of each pixel is predicted from the colors of the previously generated pixels.
272. Generating images by RNN — Pixel Recurrent Neural Networks (https://arxiv.org/abs/1601.06759); generated samples compared with the real world.
273. Generating images — training a decoder to generate images is unsupervised: the training data is a lot of images, with no given code for each image.
274. Auto-encoder — an NN encoder compresses the input into a code and an NN decoder reconstructs the input from the code; the two are learned together so that the output layer is as close as possible to the input layer, with the bottleneck layer in the middle as the code. (This alone is not the state-of-the-art approach.)
275. Generating images — training a decoder to generate images is unsupervised. Variational Auto-encoder (VAE): Auto-Encoding Variational Bayes, https://arxiv.org/abs/1312.6114. Generative Adversarial Network (GAN): Generative Adversarial Networks, http://arxiv.org/abs/1406.2661
276. Which one is machine-generated? (Ref: https://openai.com/blog/generative-models/)
277. 畫漫畫!!! (drawing comics) — https://github.com/mattya/chainer-DCGAN
278. Outline — unsupervised learning: text — understanding the meaning of words.
279–280. Machine reading — the machine learns the meaning of words from reading a lot of documents without supervision, producing word vectors / embeddings in which related words (dog, cat, rabbit; jump, run; flower, tree) are close to each other. (http://top-breaking-news.com/)
281. Machine reading — generating the word vector / embedding is unsupervised: the training data is just a lot of text; there is no given target vector for "Apple".
282. Machine reading — a word can be understood by its context: 蔡英文 and 馬英九 both appear before "520宣誓就職" (sworn into office on May 20), so they are something very similar. "You shall know a word by the company it keeps."
283. Word vector — visualization of an embedding space. (Source: http://www.slideshare.net/hustwj/cikm-keynotenov2014)
284. Word vector — characteristics: V(hotter) − V(hot) ≈ V(bigger) − V(big); V(Rome) − V(Italy) ≈ V(Berlin) − V(Germany); V(king) − V(queen) ≈ V(uncle) − V(aunt). Solving analogies: Rome : Italy = Berlin : ? → compute V(Berlin) − V(Rome) + V(Italy) and find the word w with the closest V(w); the answer is Germany.
285. Machine reading — the machine learns the meaning of words from reading a lot of documents without supervision.
286. Demo — the model used in the demo is provided by 陳仰德; part of the project was done by 陳仰德、林資偉; TA: 劉元銘; the training data is from PTT (collected by 葉青峰).
287. Outline — unsupervised learning: audio — learning human language without supervision.
288. Learning from audio books — the machine listens to lots of audio books without any prior knowledge, like an infant. [Chung, Interspeech 16]
289. Audio word to vector — an audio segment corresponding to an unknown word is represented as a fixed-length vector.
290. Audio word to vector — the audio segments corresponding to words with similar pronunciations are close to each other (e.g. ever / never, dog / dogs).
291. Sequence-to-sequence auto-encoder — an RNN encoder reads the acoustic features x1 x2 x3 x4 of an audio segment; the values in its memory at the end represent the whole segment — that is the vector we want. How to train the RNN encoder?
292. Sequence-to-sequence auto-encoder — an RNN decoder is trained to reconstruct the input acoustic features (y1 y2 y3 y4 ≈ x1 x2 x3 x4); the RNN encoder and decoder are jointly trained.
293. Audio word to vector, results — visualizing the embedding vectors of words such as fear / near and fame / name.
294. WaveNet (DeepMind) — https://deepmind.com/blog/wavenet-generative-model-raw-audio/
295. Concluding remarks.
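A minimal Keras sketch of the auto-encoder idea on slide 274: an encoder and a decoder trained jointly so that the output is as close as possible to the input. The layer sizes and the reconstruction loss are illustrative assumptions.

```python
from tensorflow import keras

inputs = keras.Input(shape=(784,))
h = keras.layers.Dense(256, activation="relu")(inputs)
code = keras.layers.Dense(32, activation="relu")(h)          # bottleneck "code"
h = keras.layers.Dense(256, activation="relu")(code)
outputs = keras.layers.Dense(784, activation="sigmoid")(h)   # reconstruction

autoencoder = keras.Model(inputs, outputs)
autoencoder.compile(loss="mse", optimizer="adam")
# Unsupervised: the input itself is the target.
# autoencoder.fit(x_train, x_train, batch_size=100, epochs=20)
```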
296. Concluding remarks — Lecture I: Introduction of Deep Learning; Lecture II: Tips for Training Deep Neural Network; Lecture III: Variants of Neural Network; Lecture IV: Next Wave.
297. Will AI soon replace most jobs? A new job in the AI age: AI 訓練師 (AI trainer — machine learning expert, data scientist). (http://www.express.co.uk/news/science/651202/First-step-towards-The-Terminator-becoming-reality-AI-beats-champ-of-world-s-oldest-game)
298. AI trainer — doesn't the machine learn by itself? Why do we need AI trainers? The Pokémon do the fighting, so why do we need Pokémon trainers?
299. AI trainer — a Pokémon trainer has to pick suitable Pokémon for a battle (Pokémon have different attributes), cannot always control the Pokémon that is summoned (e.g. 小智's Charizard), and needs enough experience. Likewise, in Step 1 an AI trainer has to pick a suitable model (different models suit different problems), cannot be sure Step 3 finds the best function (e.g. in deep learning), and needs enough experience.
300. AI trainer — behind every powerful AI, the AI trainer deserves the credit; let's set out together on the road to becoming AI trainers. (http://www.gvm.com.tw/webonly_content_10787.html)


