
The policy network pre-training can be performed in the manner described within the modified K2 algorithm; the pseudocode for the reinforcement learning is described below.

Algorithm 5 Modified K2 (CNN-based Reinforcement Learning)

1: procedure K2ModRL(a set E of tuples (edge, image) representing the edges within the BN; a count, episodes, of episodes; a limit, u, for the maximum number of edges a node may have; a database, D, containing the node probability distributions)
2:   initialise Q-values Q(s, a) for all state-action pairs, e.g. through CNN supervised pre-training, where the state s is the probability distributions and the action a is the proposed edgeRank search sequence
3:   for x = 1 to episodes do
4:     edgeRank = CNNProbRank(E)              ▷ propose action
5:     ScoreK2, Gnew = K2Mod(edgeRank, D, u)  ▷ calculate reward
6:     update Q(s, a)                         ▷ updates CNNProbRank, see update function below
7:   end for
8:   return Gnew, ScoreK2
9: end procedure

The Q-value update follows the standard Q-learning rule:

Q(s_t, a_t) \leftarrow (1 - \alpha) \, Q(s_t, a_t) + \alpha \left[ r_t + \gamma \max_a Q(s_{t+1}, a) \right]

where r_t is the reward derived from the current state s_t, \gamma is the discount factor and \alpha is the learning rate (Watkins & Dayan 1992).

This approach resembles AlphaGo's (Silver & Dieleman 2016) combination of a policy network's proposal of a search strategy with a Monte Carlo tree search which, in turn, selects optimal Go moves (albeit without the use of a value network to mediate the tree search depth). An illustrative sketch of this training loop is given at the end of the chapter.

6.3 Lessons Learned

6.3.1 Literature

The broad objective of the research, namely to define BN structures through neural networks, remained constant. However, the approach differed substantially from that initially envisioned in the proposal; as a testament to the academic, indeed scientific, process, the thorough literature review served both to inform and to inspire the final approach. Given the substantial overlap between the subject area of the initial proposal and the final thesis' scope this was, perhaps, unsurprising. Nevertheless, without the context that it provided, the delivery component of this research would inevitably have stalled. Furthermore, while not strictly relevant to the final thesis research, the additional reading into reinforcement learning approaches informed the further work proposals.

6.3.2 Methods

Based on the success of the endeavour, the methods applied to the problem are considered appropriate. However, the essentially infinite volume of permutations that can be conceived for a BN structure (when distributions and graph structure are considered) may render a supervised learning approach to BN structure learning impractical. This depends, of course, on the degree of homogeneity in the BNs observed in the real world. A potential countermeasure could be the creation of domain-specific CNNs, capable of the required specificity.
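As a companion to Algorithm 5, the following is a minimal, hypothetical Python sketch of the K2ModRL loop referenced earlier in the chapter. The functions cnn_prob_rank and k2_mod are placeholders standing in for the thesis' CNNProbRank and K2Mod; their signatures, the single-state tabular Q-table, and the dummy reward are assumptions made purely for illustration, not the actual implementation.

import random
from collections import defaultdict

def cnn_prob_rank(edges, q_values, epsilon=0.2):
    # Stand-in for CNNProbRank: propose an edge ordering (the action).
    # With probability epsilon explore a random ordering; otherwise
    # exploit the highest-valued ordering seen so far.
    if random.random() < epsilon or not q_values:
        ordering = list(edges)
        random.shuffle(ordering)
        return tuple(ordering)
    return max(q_values, key=q_values.get)

def k2_mod(edge_rank, database, u):
    # Stand-in for K2Mod: would build a graph from the proposed ordering
    # (at most u parents per node, using database D) and return its K2
    # score and the graph. A random score is used here as a dummy reward.
    graph = {"ordering": edge_rank, "max_parents": u}
    score = random.uniform(-100.0, 0.0)
    return score, graph

def k2_mod_rl(edges, database, episodes=100, u=3, alpha=0.1, gamma=0.9):
    # Tabular Q-learning over proposed orderings; the state is collapsed
    # to a single implicit state for this simplified sketch.
    q = defaultdict(float)                             # Q(s, a)
    best_score, best_graph = float("-inf"), None
    for _ in range(episodes):
        action = cnn_prob_rank(edges, q)               # propose action
        reward, graph = k2_mod(action, database, u)    # calculate reward
        # Q(s,a) <- (1 - alpha)*Q(s,a) + alpha*(r + gamma * max_a' Q(s', a'))
        future = max(q.values()) if q else 0.0
        q[action] = (1 - alpha) * q[action] + alpha * (reward + gamma * future)
        if reward > best_score:
            best_score, best_graph = reward, graph
    return best_graph, best_score

if __name__ == "__main__":
    edges = ["A->B", "B->C", "A->C"]
    graph, score = k2_mod_rl(edges, database=None)
    print(round(score, 2), graph["ordering"])

In this simplified setting an epsilon-greedy choice over previously scored orderings stands in for the CNN's ranking; in the approach described above the CNN itself would map the node probability distributions to the proposed edgeRank, and the K2 score of the resulting graph would supply the reward.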