Open questions in machine intelligence

This is a running post aggregating some key open questions in Machine Intelligence. If you want to suggest an update, comment below.

Building predictive models of the world at different temporal resolutions

Existing agents do not understand the environment we inhabit. To plan efficiently with minimal environmental interaction, an agent must learn knowledge at different spatio-temporal abstractions. For example, even if you have never been in a car crash, you can predict the sequence of high-level events following a crash, or even imagine rough visual imagery of the scene. Current AI systems are nowhere close to achieving such a feat.
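One way to make the idea of multiple temporal resolutions concrete is to fit separate predictive models at different time strides. Below is a minimal sketch: on a toy linear system (a damped rotation chosen purely for illustration, not from this post), one least-squares model predicts the state one step ahead while a second predicts k steps ahead directly, so coarse planning can skip over fine-grained dynamics.

```python
import numpy as np

# Illustrative toy dynamics: a slightly damped 2-D rotation.
# The system, stride k, and all parameters are assumptions for the sketch.
A = 0.99 * np.array([[np.cos(0.1), -np.sin(0.1)],
                     [np.sin(0.1),  np.cos(0.1)]])
k = 10  # stride of the coarse-timescale predictor

# Roll out a trajectory from a random initial state.
rng = np.random.default_rng(0)
states = [rng.normal(size=2)]
for _ in range(500):
    states.append(A @ states[-1])
S = np.array(states)

# Fit two linear predictors by least squares:
#   W1 maps s_t -> s_{t+1}  (fine temporal resolution)
#   Wk maps s_t -> s_{t+k}  (coarse temporal resolution)
X, Y1, Yk = S[:-k], S[1:-k + 1], S[k:]
W1, *_ = np.linalg.lstsq(X, Y1, rcond=None)
Wk, *_ = np.linalg.lstsq(X, Yk, rcond=None)
```

On this linear system the coarse model simply recovers the k-step transition, but the same recipe applies when each predictor is a learned nonlinear model over abstract state features.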

The nature and origin of goals and intrinsic motivation

Let us start with the notion of value functions $V(s)$, where $s$ denotes the state of the world. $V(s)$ caches the utility of $s$ towards achieving any desired goal $g$. In typical reinforcement learning settings, the agent observes rewards and states from the environment and takes actions in order to estimate the value function:

$$V(s) = \max_{\pi} \, \mathbb{E}\!\left[ \sum_{t'=t}^{\infty} \gamma^{t'-t} r_{t'} \;\middle|\; s_t = s, \pi \right]$$

Here $\gamma \in [0, 1)$ denotes a discount factor, which ensures the series converges even over an arbitrarily long (or infinite) horizon. $\pi$ denotes the goal-directed policy that maps states to actions.
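To ground the definition, here is a minimal sketch of computing $V(s)$ by iterating the Bellman optimality backup on a tiny toy MDP. The MDP itself (a four-state chain where moving right eventually reaches a rewarding state) and all constants are illustrative assumptions, not from this post.

```python
import numpy as np

n_states, gamma = 4, 0.9
# Deterministic transitions: action 0 stays put, action 1 moves right.
P = {s: {0: s, 1: min(s + 1, n_states - 1)} for s in range(n_states)}
# Reward observed on entering each state; only the last state pays off.
R = np.array([0.0, 0.0, 0.0, 1.0])

# Value iteration: repeatedly apply the Bellman optimality backup
#   V(s) <- max_a [ R(s') + gamma * V(s') ]
# until V converges to the fixed point (geometric convergence in gamma).
V = np.zeros(n_states)
for _ in range(100):
    V = np.array([max(R[P[s][a]] + gamma * V[P[s][a]] for a in (0, 1))
                  for s in range(n_states)])
```

The resulting values decay geometrically with distance from the reward, exactly the $\gamma^{t'-t}$ weighting in the definition above: states nearer the rewarding state cache higher expected future reward.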

Value functions are more important than rewards. Rewards are observed only in certain states. Value functions provide an estimate of expected future rewards from an arbitrary state in the environment, provided the agent explores the environment sufficiently. This feature is crucial for generalization. But what happens when the agent cannot sufficiently explore high-dimensional state spaces, or when the reward is too sparse? In such common situations, the agent could benefit from being intrinsically motivated to come up with 'useful' goals to solve. The most basic form of intrinsic motivation could be stated as "learning feels good". Can we say something more tangible? I have listed some landmark papers on this topic:
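One concrete instantiation of "learning feels good" is count-based intrinsic motivation: the agent augments sparse extrinsic rewards with a novelty bonus that decays as states become familiar. The sketch below assumes a discrete state space; the $1/\sqrt{N(s)}$ bonus form and the `beta` coefficient are illustrative choices, not something this post prescribes.

```python
from collections import defaultdict

visit_counts = defaultdict(int)
beta = 0.1  # weight of the intrinsic bonus relative to extrinsic reward

def shaped_reward(state, extrinsic_reward):
    """Add a novelty bonus that shrinks as the state is revisited."""
    visit_counts[state] += 1
    bonus = 1.0 / (visit_counts[state] ** 0.5)  # 1 / sqrt(N(s))
    return extrinsic_reward + beta * bonus
```

Under this shaping, rarely visited states yield extra reward even when the environment pays nothing, which pushes the agent to keep exploring until genuine extrinsic reward is found.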

Efficiently solving Human Goals

To build AI that solves human-like goals such as navigation, scene understanding, motor control, etc., how should we think about intermediate representations for efficient goal-directed behavior? Evolution has spent billions of years tuning biological agents to adapt in stochastic environments. A compact way to transfer information between agents is to encode useful goals into the genome, so that the agent can learn a policy to accomplish those goals during its lifetime. The niche of the universe we find ourselves in dictates the regularities in our environment. Due to this regularity, it is conceivable for evolution to preserve basic goals such as discovering visual concepts, which may turn out to be useful for learning many higher-order behaviors. This is evident in the baby gazelle, which learns to walk and do basic navigation within about ten minutes of birth.

This suggests an evolutionary account of why the brain creates sub-modules for different computations such as face processing. Early recognition of faces is critical for social intelligence and could also be leveraged to bootstrap learning about the visual world itself. What are the other mid-level representations that evolution could have preserved, that we as humans leverage in order to solve all the goals in our environmental niche? I believe understanding and building such representations will enable us to create data-efficient agents for solving many human goals.

Control over the Reward Channel in Reinforcement Learning and Value Alignment

How do we ensure that an AI agent does not take control of its own reward channel? In a typical reinforcement learning setup, the agent tries to maximize expected future rewards to find optimal policies. To keep maximizing its reward channel, the agent can alter its environment in arbitrary ways, and while doing so it can disregard any human goals along its path. How do we stop this? Nobody really knows, but people have proposed several arguments.

The most reasonable one I have come across is from Daniel Dewey. He argues that instead of maximizing expected future reward, the agent should optimize expected utility over a pool of possible utility functions. These utility functions could be designed so that the agent's values are consistent with human values. Since future trajectories are scored under the originally defined set of utility functions, even if the agent modifies its future utility functions, that change will not register as an improvement according to the original utilities, and therefore the agent will not choose such self-modification.
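A minimal sketch of this pool-of-utilities scoring, with all utilities, belief weights, and outcomes invented for illustration: each candidate outcome is evaluated under every utility function the designer considers plausible, weighted by the agent's current belief in each. An outcome that seizes the reward channel scores highly under raw reward but poorly under a utility that penalizes side effects, so it loses in expectation.

```python
def expected_utility(outcome, utilities, beliefs):
    # Score an outcome under every candidate utility, weighted by belief.
    return sum(p * u(outcome) for u, p in zip(utilities, beliefs))

# Two candidate utility functions the designer considers plausible
# (illustrative assumptions, not Dewey's actual formulation).
utilities = [lambda o: o["task_reward"],
             lambda o: o["task_reward"] - o["side_effects"]]
beliefs = [0.5, 0.5]

# "hack": seize the reward channel -- high raw reward, large side effects.
hack = {"task_reward": 10.0, "side_effects": 100.0}
# "honest": pursue the task as intended.
honest = {"task_reward": 8.0, "side_effects": 0.0}

best = max([hack, honest],
           key=lambda o: expected_utility(o, utilities, beliefs))
```

The crucial property is that the pool of utilities used for scoring is fixed at design time: an action whose only merit is changing which utilities get used later cannot look good under the original pool.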

Tejas Kulkarni, Cambridge, MA
