What if the World Were Different? Gradient-Based Exploration for New Optimal Policies

10.29007/6jsv ◽  
2018 ◽  
Author(s):  
Rui Silva ◽  
Francisco S. Melo ◽  
Manuela Veloso

Planning under uncertainty assumes a model of the world that specifies the probabilistic effects of an agent’s actions in terms of changes of the state. Given such a model, planning proceeds to determine a policy that defines, for each state, the action the agent should take in order to maximize a reward function. In this work, we observe that the world can be changed in more ways than those possible by executing the agent’s repertoire of actions. These additional configurations of the world may allow new policies that let the agent accumulate even more reward than is possible by following the optimal policy of the original world. We introduce and formalize the problem of planning while considering these additional possible worlds. We then present an approach that models feasible changes to the world as modifications to the probability transition function, and show that the problem of computing the configuration of the world that allows the most rewarding optimal policy can be formulated as a constrained optimization problem. Finally, we contribute a gradient-based algorithm for solving this optimization problem. Experimental evaluation shows the effectiveness of our approach in multiple problems of practical interest.
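To make the idea in this abstract concrete, here is a minimal sketch, not the authors' algorithm. It assumes a hypothetical two-state world in which a single parameter `theta` (the success probability of a "move" action) stands in for a feasible modification of the transition function, constrained to the interval [0.1, 0.9]. The gradient of the optimal value with respect to `theta` is estimated by finite differences and followed with projected gradient ascent; the names `transitions`, `optimal_value`, and `improve_world` are illustrative only.

```python
GAMMA = 0.9  # discount factor (assumed for this toy example)

# Reward collected on arriving in a state (hypothetical toy world).
REWARD = {0: 0.0, 1: 1.0}


def transitions(theta):
    """Return P[state][action] -> list of (next_state, probability).

    theta is the world parameter: the chance that action 1 ("move")
    takes the agent from state 0 to the rewarding state 1.
    """
    return {
        0: {0: [(0, 1.0)], 1: [(1, theta), (0, 1.0 - theta)]},
        1: {0: [(1, 1.0)], 1: [(1, 1.0)]},
    }


def optimal_value(theta, iters=500):
    """Value iteration: optimal state values for the world given by theta."""
    P = transitions(theta)
    V = {s: 0.0 for s in P}
    for _ in range(iters):
        V = {
            s: max(
                sum(p * (REWARD[s2] + GAMMA * V[s2]) for s2, p in P[s][a])
                for a in P[s]
            )
            for s in P
        }
    return V


def improve_world(theta=0.2, lo=0.1, hi=0.9, step=0.05, eps=1e-4, rounds=50):
    """Projected gradient ascent on theta to maximize the optimal value of state 0."""
    for _ in range(rounds):
        # Central finite-difference estimate of dV*(0)/dtheta.
        grad = (optimal_value(theta + eps)[0] - optimal_value(theta - eps)[0]) / (2 * eps)
        # Ascent step, then projection back onto the feasible interval [lo, hi].
        theta = min(hi, max(lo, theta + step * grad))
    return theta


best_theta = improve_world()
print(best_theta, optimal_value(best_theta)[0])
```

In this toy world the optimal value grows with `theta`, so the search ends at the upper bound of the feasible interval. A finite-difference gradient is used here only for brevity; it is a stand-in, and the abstract does not say how the paper computes its gradients.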

Author(s):  
Alessandro Ronca ◽  
Giuseppe De Giacomo

Recently, regular decision processes have been proposed as a well-behaved form of non-Markov decision process. Regular decision processes are characterised by a transition function and a reward function that depend on the whole history, though regularly (as in regular languages). In practice, both the transition and the reward functions can be seen as finite transducers. We study reinforcement learning in regular decision processes. Our main contribution is to show that a near-optimal policy can be PAC-learned in polynomial time in a set of parameters that describe the underlying decision process. We argue that the identified set of parameters is minimal and that it reasonably captures the difficulty of a regular decision process.
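As a rough illustration of a history-dependent reward that is still "regular", the sketch below encodes a reward function as a small finite transducer (Mealy-style). The states, actions, and reward values are hypothetical and chosen only for illustration; they do not come from the paper.

```python
# Hypothetical transducer: (state, action) -> (next_state, reward).
# Opening the door pays off only if the key was picked up earlier in the
# history, so the reward depends on the past through a finite memory.
DELTA = {
    ("locked", "key"): ("unlocked", 0.0),
    ("locked", "door"): ("locked", 0.0),
    ("unlocked", "key"): ("unlocked", 0.0),
    ("unlocked", "door"): ("locked", 1.0),
}


def rewards(history, start="locked"):
    """Run the transducer over a sequence of actions and return the reward sequence."""
    state, out = start, []
    for action in history:
        state, reward = DELTA[(state, action)]
        out.append(reward)
    return out


print(rewards(["door", "key", "door", "door"]))
```

The same action, `"door"`, earns different rewards at different points in the history. A memoryless (Markov) reward over actions alone cannot express this, while a two-state transducer can.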


2017 ◽  
Author(s):  
Nicholas Franklin ◽  
Michael J. Frank

Abstract: Humans are remarkably adept at generalizing knowledge between experiences in a way that can be difficult for computers. Often, this entails generalizing constituent pieces of experiences that do not fully overlap with, but nonetheless share useful similarities with, previously acquired knowledge. However, it is often unclear how knowledge gained in one context should generalize to another. Previous computational models and data suggest that rather than learning about each individual context, humans build latent abstract structures and learn to link these structures to arbitrary contexts, facilitating generalization. In these models, task structures that are more popular across contexts are more likely to be revisited in new contexts. However, these models can only re-use policies as a whole and are unable to transfer knowledge about the transition structure of the environment even if only the goal has changed (or vice versa). This contrasts with ecological settings, where some aspects of task structure, such as the transition function, will be shared between contexts separately from other aspects, such as the reward function. Here, we develop a novel non-parametric Bayesian agent that forms independent latent clusters for transition and reward functions, affording separable transfer of their constituent parts across contexts. We show that the relative performance of this agent compared to an agent that jointly clusters reward and transition functions depends on environmental task statistics: the mutual information between transition and reward functions and the stochasticity of the observations. We formalize our analysis through an information theoretic account of the priors, and propose a meta learning agent that dynamically arbitrates between strategies across task domains to optimize a statistical tradeoff.
Author summary: A musician may learn to generalize behaviors across instruments for different purposes, for example, reusing hand motions used when playing classical on the flute to play jazz on the saxophone. Conversely, she may learn to play a single song across many instruments that require completely distinct physical motions, but nonetheless transfer knowledge between them. This degree of compositionality is often absent from computational frameworks of learning, forcing agents either to generalize entire learned policies or to learn new policies from scratch. Here, we propose a solution to this problem that allows an agent to generalize components of a policy independently and compare it to an agent that generalizes components as a whole. We show that the degree to which one form of generalization is favored over the other depends on the features of the task domain, with independent generalization of task components favored in environments with weak relationships between components or high degrees of noise, and joint generalization of task components favored when there is a clear, discoverable relationship between task components. Furthermore, we show that the overall meta structure of the environment can be learned and leveraged by an agent that dynamically arbitrates between these forms of structure learning.


Author(s):  
Vasyl Karpo ◽  
Nataliia Nechaieva-Yuriichuk

From ancient times to the present day, information has played a key role in political processes. The beginning of the XXI century saw global security shift from military to informational, social and other aspects. The spread of the pandemic exposed the weaknesses of contemporary authoritarian states and the strength of human-oriented states. During World War I, theoretical and practical interest in political manipulation and political propaganda grew markedly. After 1918 the situation developed very quickly, and political propaganda became a part of political influence. The XX century entered political history as the century of propaganda. The collapse of the USSR and the socialist system brought new political actors to power. The global architecture of the world changed. Former Soviet republics gained independence and tried to separate from Russia, Ukraine among them. The Revolution of Dignity in Ukraine was the starting point for a number of processes in world politics. Most importantly, the role and place of information as a challenge to world security was reevaluated. The subsequent annexation of Crimea, and the attempt to legitimize it by comparison with the referendums in Scotland and Catalonia, demonstrated the willingness of the Russian Federation to maintain its dominance in the world. The main difference between the referendums in Scotland and in Catalonia was the form of Russian interference. In 2014 (Scotland), Russia tried to delegitimise the results of the Scottish referendum because they were unacceptable to it. In 2017, by contrast, we witnessed extensive Russian interference in Spain's internal affairs, above all in spreading pro-independence sentiment in Catalonia. The main conclusion is that the world has to learn lessons from the Scottish and Catalan cases and be ready for new challenges in world politics in the form of information threats.


2021 ◽  
Author(s):  
Stav Belogolovsky ◽  
Philip Korsunsky ◽  
Shie Mannor ◽  
Chen Tessler ◽  
Tom Zahavy

Abstract: We consider the task of Inverse Reinforcement Learning in Contextual Markov Decision Processes (MDPs). In this setting, contexts, which define the reward and transition kernel, are sampled from a distribution. In addition, although the reward is a function of the context, it is not provided to the agent. Instead, the agent observes demonstrations from an optimal policy. The goal is to learn the reward mapping, such that the agent will act optimally even when encountering previously unseen contexts, also known as zero-shot transfer. We formulate this problem as a non-differential convex optimization problem and propose a novel algorithm to compute its subgradients. Based on this scheme, we analyze several methods both theoretically, where we compare the sample complexity and scalability, and empirically. Most importantly, we show both theoretically and empirically that our algorithms perform zero-shot transfer (generalize to new and unseen contexts). Specifically, we present empirical experiments in a dynamic treatment regime, where the goal is to learn a reward function which explains the behavior of expert physicians based on recorded data of them treating patients diagnosed with sepsis.


10.6036/10099 ◽  
2021 ◽  
Vol DYNA-ACELERADO (0) ◽  
pp. [ 8 pp.]-[ 8 pp.]
Author(s):  
SALAH KAMAL ◽  
ATTIA EL-FERGANY ◽  
EHAB EHAB ELSAYED ELATTAR ◽  
AHMED AGWA

The accuracy of fuel cell (FC) models is important for further numerical simulation and analysis under various conditions. The electrical (I-V) characteristic of polymer exchange membrane fuel cells (PEMFCs) is highly nonlinear and involves seven uncertain parameters that are not given in manufacturers' datasheets. These seven parameters must be obtained to complete the PEMFC model. This research presents an up-to-date application of the gradient-based optimizer (GBO) to generate the best estimates of these uncertain parameters. The estimation is cast as an optimization problem with a cost function (CF) subject to a set of self-constrained limits. Three test cases of widely used PEMFC units, namely the SR-12, the 250-W module and the NedStack PS6, are demonstrated and analyzed to appraise the performance of the GBO. The best values of the CF are 0.000142, 0.33598, and 2.10025 V² for the SR-12, 250-W module and NedStack PS6, respectively. Furthermore, the GBO-based model is assessed by comparing its results with the experimental results of these typical PEMFCs and with other methods. At a later stage, several scenarios involving operating variations in inlet regulation pressures and unit temperatures are performed. The reported results of the studied scenarios indicate the effectiveness of the GBO in establishing an accurate PEMFC model.


Author(s):  
Vu Huy Thang

The paper studies the state of policies for developing science and technology (S&T) information systems around the world, a necessary discussion before S&T development itself. The author gives an in-depth assessment of the macroeconomic, policy-oriented goals for developing S&T information in the maritime sector and for developing Vietnamese S&T information sources. A number of typical policies are analyzed with a view to practical application as Vietnam actively integrates with the world. Orientations for the development of science and technology and of maritime training and coaching to 2025, with a vision to 2030, are examined through a case study of Vietnam Maritime University. The article draws on research, surveys, and questionnaire-based interviews on the expected demand for S&T information among information users and managers in the maritime field in the near future. On that basis, the paper proposes a policy framework for developing the S&T information system in Vietnam's maritime sector and analyzes its advantages and disadvantages compared with current policies. The author conducts a SWOT analysis to assess the strengths, weaknesses, opportunities and challenges of the Maritime Science and Technology Information System. The paper proposes new supplementary policies and amendments to existing ones, and assesses the possible impacts of applying them in practice. The article confirms the important role of the proposed policies, in the current context and practical situation, in contributing to the successful implementation of the national maritime strategy.


2021 ◽  
Vol 5 (1) ◽  
pp. 111-120
Author(s):  
Yunanta Arief Rusmana

Title: Merdeka Belajar on the Changes in the Form of STARS YKPN Yogyakarta in the Covid 19 Pandemic Era
All fields in the world have now entered the era of revolution 4.0. With the advancement of science and technology, and of information technology in particular, universities are required to welcome this era and prepare themselves to face it. The current adjustment to the "Merdeka Belajar Kampus Merdeka" curriculum has resulted from reviews of the curriculum, especially for higher education in architecture. This coincided with the Covid 19 pandemic, which hit Indonesia and affected all fields, including education and, in particular, the Sekolah Tinggi Arsitektur (STARS) YKPN Yogyakarta. STARS YKPN, which is the new form of the Akademi Teknik (AT) YKPN, cannot be separated from this situation in carrying out its role as an educational institution. In addition, the Architects Law and many new policies and regulations from the Ministry of Education and Culture require institutions to change form. AT YKPN initially had only a 3-year Diploma Architectural Drawing Study Program; after the change of form to STARS YKPN, it has 2 study programs, namely the 3-year Diploma in Architectural Drawing and the Bachelor of Architecture. These changes make it necessary to change the curriculum and learning methods in both the 3-year Diploma Architectural Drawing Study Program and the Bachelor of Architecture Study Program. This article aims to provide an overview of how these various changes occur, especially in the curriculum and learning methods.


2019 ◽  
Vol 6 (1) ◽  
pp. 15-22
Author(s):  
Adriano De Oliveira Sampaio ◽  
Inés Martins

Abstract: This article analyzes the meanings that a group of young Catalans produced after seeing advertising pieces from Brazil’s self-promotion campaign abroad (2011-2014) and after the presentation of posters from the advertising campaign “Brazil. The world is here”. Discourse analysis and focus groups were used as techniques to collect and analyze the interviews. The results show that the similarity / difference binomial - essential for building a brand positioning - is practically non-existent in those campaigns, which makes the campaigns of different countries homogeneous.
Resumen (translated from Spanish): The article focuses on analyzing the meanings that a group of young Catalans construct from Brazil's tourism self-promotion campaigns abroad (2011-2014), before and after the presentation of the posters of the advertising campaign “Brasil. O mundo se encontra aquí”. Discourse analysis and focus groups were used as techniques for collecting the interviews. The results confirm that the similarity/difference binomial, essential for building brand positioning, is practically non-existent in the campaigns analyzed, which homogenizes the tourism campaigns of different countries.


2013 ◽  
Vol 1 (2) ◽  
pp. 161-188 ◽  
Author(s):  
Véra Ehrenstein ◽  
Fabian Muniesa

This paper examines counterfactual display in the valuation of carbon offsetting projects. Considered a legitimate way to encourage climate change mitigation, such projects rely on the establishment of procedures for the prospective assessment of their capacity to become carbon sinks. This requires imagining possible worlds and assessing their plausibility. The world inhabited by the project is articulated through conditional formulation and subjected to what we call “counterfactual display”: the production and circulation of documents that demonstrate and configure the counterfactual valuation. We present a case study on one carbon offsetting reforestation project in the Democratic Republic of Congo. We analyse the construction of the scene that allows the “What would have happened” question to make sense and become actionable. We highlight the operations of calculative framing that this requires, the reality constraints it relies upon, and the entrepreneurial conduct it stimulates.


Dialogue ◽  
1990 ◽  
Vol 29 (2) ◽  
pp. 205-218 ◽  
Author(s):  
John Bigelow

Recently, Brian Ellis came up with a neat and novel idea about laws of nature, which at first I misunderstood. Then I participated, with Brian Ellis and Caroline Lierse, in writing a joint paper, “The World as One of a Kind: Natural Necessity and Laws of Nature” (Ellis, Bigelow and Lierse, forthcoming). In this paper, the Ellis idea was formulated in a different way from that in which I had originally interpreted it. Little weight was placed on possible worlds or individual essences. Much weight rested on natural kinds. I thought Ellis to be suggesting that laws of nature attribute essential properties to one grand individual, The World. In fact, Ellis is hostile towards individual essences for any individuals at all, including The World. He is comfortable only with essential properties of kinds, rather than individuals. The Ellis conjecture was that laws of nature attribute essential properties to the natural kind of which the actual world is one (and presumably the only) member.

