浩爾筆記Howard's note

Posts

Supervised Hebbian Learning

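Hebbian Learning

Hebbian theory is a neuroscience theory that explains the changes that take place in the brain's neurons during learning. It describes the basic principle of synaptic plasticity: sustained, repeated stimulation of a postsynaptic neuron by a presynaptic neuron can increase the efficacy of synaptic transmission. It is also known as Hebb's rule, Hebb's postulate, and cell assembly theory.

Hebb's rule is one of the earliest neural network learning laws. Donald Hebb proposed it in 1949 as a possible mechanism for synaptic modification in the brain, and it has been used to train artificial neural networks ever since. In Hebb's words:

When an axon of cell A is near enough to excite a cell B and repeatedly or persistently takes part in firing it, some growth process or metabolic change takes place in one or both cells such that A’s efficiency, as one of the cells firing B, is increased.

Linear Associator

The output vector of the linear associator is a = Wp, where W is the weight matrix and p is the input vector.

Formula summary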
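To make a = Wp concrete, here is a minimal sketch of a linear associator trained with the supervised Hebb rule. The excerpt is cut off before the formulas, so the training rule used here, W = sum over q of t_q p_q^T, is the usual textbook form and an assumption about what the full post covers. The function names (hebb_weights, linear_associator) and the example patterns are made up for illustration. The input patterns are chosen to be orthonormal, which is the case where this rule recalls every target exactly.

```python
def hebb_weights(patterns, targets):
    """Supervised Hebb rule (assumed form): W = sum_q t_q * p_q^T."""
    rows, cols = len(targets[0]), len(patterns[0])
    W = [[0.0] * cols for _ in range(rows)]
    for p, t in zip(patterns, targets):
        for i in range(rows):
            for j in range(cols):
                # Strengthen the weight when input j and target i agree in sign.
                W[i][j] += t[i] * p[j]
    return W


def linear_associator(W, p):
    """Linear associator output a = W p."""
    return [sum(w * x for w, x in zip(row, p)) for row in W]


# Hypothetical orthonormal input patterns and their targets.
patterns = [[0.5, -0.5, 0.5, -0.5], [0.5, 0.5, -0.5, -0.5]]
targets = [[1.0, -1.0], [1.0, 1.0]]

W = hebb_weights(patterns, targets)
recalled = [linear_associator(W, p) for p in patterns]
print(recalled)  # [[1.0, -1.0], [1.0, 1.0]]
```

If the input patterns are not orthonormal, the recalled outputs generally pick up cross-talk from the other stored pairs, so this sketch should not be read as a general-purpose training method.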


Chapter 4 - Dynamic Programming

Dynamic Programming

Introduction

Dynamic programming (DP) refers to a collection of algorithms that can be used to compute optimal policies given a perfect model of the environment as a Markov decision process (MDP).

Consider the environment to be a finite MDP whose dynamics are given by a set of probabilities p(s', r | s, a) for all s ∈ 𝒮, a ∈ 𝒜(s), r ∈ ℛ, and s' ∈ 𝒮⁺ (the state set including the terminal state).

The key idea of DP is the use of value functions to organize and structure the search for good policies. Once we have found the optimal value functions, which satisfy the Bellman optimality equations, we can easily obtain optimal policies.

4.1 Policy Evaluation (Prediction)
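As a rough illustration of policy evaluation with a known model p(s', r | s, a), the sketch below repeatedly applies the Bellman expectation backup until the values stop changing. Everything concrete in it is an assumption made for the example: the tiny two-state MDP, the state and action names, the equiprobable policy, and the policy_evaluation function itself. The excerpt ends before the algorithm is given, so this is one common in-place variant, not necessarily the one in the full post.

```python
def policy_evaluation(states, policy, dynamics, gamma=1.0, theta=1e-10):
    """Iteratively estimate v_pi for a finite MDP with known dynamics.

    policy[s][a]      -> probability of taking action a in state s
    dynamics[(s, a)]  -> list of (probability, next_state, reward)
    Terminal states are left out of `states` and have value 0.
    """
    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            new_v = 0.0
            for a, pi_a in policy[s].items():
                for prob, s_next, reward in dynamics[(s, a)]:
                    # Expected one-step reward plus discounted value of the successor.
                    new_v += pi_a * prob * (reward + gamma * V.get(s_next, 0.0))
            delta = max(delta, abs(new_v - V[s]))
            V[s] = new_v  # in-place update: later states see the new value
        if delta < theta:
            return V


# Hypothetical MDP: every step costs -1; "end" is the terminal state.
states = ["s0", "s1"]
dynamics = {
    ("s0", "stay"): [(1.0, "s0", -1.0)],
    ("s0", "go"): [(1.0, "s1", -1.0)],
    ("s1", "back"): [(1.0, "s0", -1.0)],
    ("s1", "exit"): [(1.0, "end", -1.0)],
}
policy = {
    "s0": {"stay": 0.5, "go": 0.5},
    "s1": {"back": 0.5, "exit": 0.5},
}

V = policy_evaluation(states, policy, dynamics)
print(V)  # values close to {"s0": -6.0, "s1": -4.0}
```

Solving the two Bellman equations by hand, v(s0) = -1 + 0.5 v(s0) + 0.5 v(s1) and v(s1) = -1 + 0.5 v(s0), gives v(s0) = -6 and v(s1) = -4, which is what the iteration approaches.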


Featured posts

What Is Reinforcement Learning? A Few Things You Must Know - Chapter 1 - Introduction

Supervised Hebbian Learning

Chapter 4 - Dynamic Programming