An algorithm for learning without external supervision and its application to learning control systems

An algorithm is proposed for the design of ``on-line'' learning controllers to control a discrete stochastic plant. The subjective probabilities of applying control actions from a finite set of allowable actions using random strategy, after any plant-environment situation (called an ``even...

Bibliographic details
Published in:	IEEE transactions on pattern analysis and machine intelligence. - 1979. - 8(1986), 3, 1 March, pages 304-12
Main author:	Nikolic, Z J (author)
Other authors:	Fu, K S
Format:	Article
Language:	English
Published:	1986
Access to the parent work:	IEEE transactions on pattern analysis and machine intelligence
Subjects:	Journal Article


LEADER	01000naa a22002652 4500
001	NLM211016071
003	DE-627
005	20231224012722.0
007	tu
008	231224s1986 xx \|\|\|\|\| 00\| \|\|eng c
028	5	2	\|a pubmed24n0703.xml
035			\|a (DE-627)NLM211016071
035			\|a (NLM)21869349
040			\|a DE-627 \|b ger \|c DE-627 \|e rakwb
041			\|a eng
100	1		\|a Nikolic, Z J \|e verfasserin \|4 aut
245	1	3	\|a An algorithm for learning without external supervision and its application to learning control systems
264		1	\|c 1986
336			\|a Text \|b txt \|2 rdacontent
337			\|a ohne Hilfsmittel zu benutzen \|b n \|2 rdamedia
338			\|a Band \|b nc \|2 rdacarrier
500			\|a Date Completed 02.10.2012
500			\|a Date Revised 12.11.2019
500			\|a published: Print
500			\|a Citation Status PubMed-not-MEDLINE
520			\|a An algorithm is proposed for the design of ``on-line'' learning controllers to control a discrete stochastic plant. The subjective probabilities of applying control actions from a finite set of allowable actions using random strategy, after any plant-environment situation (called an ``event'') is observed, are modified through the algorithm. The subjective probability for the optimal action is proved to approach one with probability one for any observed event. The optimized performance index is the conditional expectation of the instantaneous performance evaluations with respect to the observed events and the allowable actions. The algorithm is described through two transformations, T1 and T2. After the ``ordering transformation'' T1 is applied on the estimates of the performance indexes of the allowable actions, the ``learning transformation'' T2 modifies the subjective probabilities. The cases of discrete and continuous features are considered. In the latter, the Potential Function Method is employed. The algorithm is compared with a linear reinforcement scheme and computer simulation results are presented.
650		4	\|a Journal Article
700	1		\|a Fu, K S \|e verfasserin \|4 aut
773	0	8	\|i Enthalten in \|t IEEE transactions on pattern analysis and machine intelligence \|d 1979 \|g 8(1986), 3 vom: 01. März, Seite 304-12 \|w (DE-627)NLM098212257 \|x 1939-3539 \|7 nnns
773	1	8	\|g volume:8 \|g year:1986 \|g number:3 \|g day:01 \|g month:03 \|g pages:304-12
912			\|a GBV_USEFLAG_A
912			\|a SYSFLAG_A
912			\|a GBV_NLM
912			\|a GBV_ILN_350
951			\|a AR
952			\|d 8 \|j 1986 \|e 3 \|b 01 \|c 03 \|h 304-12