Mobile Robots Navigation 2008 Part 15

Reference document: 'Mobile Robots Navigation 2008 Part 15' (engineering and technology, mechanical engineering), for study, research, and professional work.

The effective reinforcement is used to update the connection weights between PCL and PU in AC (the reward expectation associated with a place) and also between Actor units and map nodes (the reward expectations associated with different orientations). In the first case we use

$W_{PCL}(t+1) = W_{PCL}(t) + \rho \, r(t) \, E_{PCL}(t)$,   (5)

where $\rho$ is the learning rate and $E_{PCL}$ is the $1 \times n$ matrix of eligibility traces corresponding to the connections between PCL and PU in AC. In the second case we use

$W_{N_k}(t+1) = W_{N_k}(t) + \rho \, r(t) \, E_{N_k}(t)$, $\forall$ map node $k$,   (6)

where $W_{N_k}$ is the vector of connection weights between map node $k$ and a maximum of eight Actor units, and $E_{N_k}$ is the vector of eligibility traces corresponding to a maximum of eight Actor units. As shown in (5) and (6), both learning rules depend on the eligibility of the connections. At the beginning of every trial in a given experiment, the eligibility traces in AC and in the Actor units are initialized to 0. At each time step $t$ in a trial, the eligibility traces in AC are increased on the connections between PU and the most active neurons within PCL, but only when the action executed by the animat at time $t-1$ allowed it to perceive the goal:

$E_{PCL}(t) = E_{PCL}(t-1) + \chi \, PC(t)$,   (7)

where $\chi$ is the increment parameter and $PC$ stores the activity pattern registered by the collection of neurons in PCL. Also at time step $t$, the eligibility trace $e$ of the connection between the active map node $n_a$ and the Actor unit corresponding to the current animat orientation $dir$ is increased by $\tau$, as described by (8):

$e^{dir}_{n_a}(t) = e^{dir}_{n_a}(t-1) + \tau$.   (8)

Finally, after updating the connection weights between PCL and AC and between Actor units and map nodes at any time step $t$ in the trial, all eligibilities decay at rates $\lambda$ and $\sigma$, respectively, as shown in (9):

$E_{PCL}(t) = \lambda \, E_{PCL}(t-1)$,   $E_{N_k}(t) = \sigma \, E_{N_k}(t-1)$, $\forall$ map node $k$.   (9)

The use of the Actor-Critic architecture enables the estimation of reward expectation values for different locations in the environment, where maximum expectations correspond to locations from which the goal is perceived.
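To make the update cycle concrete, the following Python sketch strings equations (5) through (9) together for one time step. The layer sizes, parameter values, and names (n_place_cells, rho, chi, tau, lam, sigma, r_eff, learning_step, etc.) are illustrative assumptions rather than values or code taken from the model; the sketch only shows the order and form of the updates described above.

```python
# A minimal sketch of the learning step defined by Eqs. (5)-(9).
# All sizes and parameter values below are assumed for illustration only.
import numpy as np

n_place_cells = 100    # number of neurons in the place-cell layer (PCL) -- assumed
n_map_nodes   = 20     # number of topological map nodes -- assumed
n_actions     = 8      # at most eight Actor units (orientations) per map node

rho   = 0.1            # learning rate (Eqs. 5-6) -- assumed value
chi   = 1.0            # eligibility increment for PCL-PU connections (Eq. 7) -- assumed
tau   = 1.0            # eligibility increment for Actor connections (Eq. 8) -- assumed
lam   = 0.9            # decay rate of E_PCL (Eq. 9) -- assumed
sigma = 0.9            # decay rate of E_Nk (Eq. 9) -- assumed

W_pcl = np.zeros(n_place_cells)             # PCL -> PU weights (reward expectation of a place)
E_pcl = np.zeros(n_place_cells)             # eligibility traces of PCL -> PU connections
W_N   = np.zeros((n_map_nodes, n_actions))  # map node -> Actor unit weights
E_N   = np.zeros((n_map_nodes, n_actions))  # eligibility traces of those connections

def learning_step(r_eff, pc_activity, goal_perceived, active_node, orientation):
    """One time step t of learning.

    r_eff          -- effective reinforcement r(t)
    pc_activity    -- PCL activity pattern PC(t) (zero except for the most active neurons)
    goal_perceived -- True if the action at t-1 let the animat perceive the goal
    active_node    -- index of the active map node n_a
    orientation    -- index (0..7) of the current animat orientation dir
    """
    global W_pcl, E_pcl, W_N, E_N

    # Eqs. (5) and (6): weight updates driven by the effective reinforcement
    W_pcl += rho * r_eff * E_pcl
    W_N   += rho * r_eff * E_N

    # Eq. (7): increase PCL eligibilities only when the goal was perceived
    if goal_perceived:
        E_pcl += chi * pc_activity

    # Eq. (8): increase the trace of the active map node / current orientation
    E_N[active_node, orientation] += tau

    # Eq. (9): after the weight updates, all eligibilities decay
    E_pcl *= lam
    E_N   *= sigma
```

At the beginning of each trial the two trace arrays would simply be reset to zero, matching the initialization of the eligibility traces described above.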
