Compositional Learning Based Planning for Vision Pomdps

Sampada Deglurkar, Michael Lim, Johnathan Tucker, Zachary Sunberg, Aleksandra Faust, Claire Tomlin.

Bottom Line Up Front

More, smaller, models trained to their own goals end up being more efficient and robust. This is… close to what I was attempting, but I think my “produce an image” thing was insane. Next step: go find this thing and see how it is proposing particles. I think I may be able to adapt this so that I’m keeping a bunch of unrelated particles alive and just deciding which ones to kill.

Summary

Uncertainty under both sensing and planning. Finding an optimal POMDP policy is computationally demanding, often intractable, due to the uncertainty introduced by imperfect observations. “Continuous POMDPs” denote continuous state, action, and observation spaces. GANs are better than Autoencoders. In our MCTS planner, GANs

𝐺

propose

𝑃

state particles given current observation.

Ground truth state information is available to the planner during training, but not during testing.

POMCPOW and friends use weighted collections of particles to represent complex beliefs.

Differentiable Particle Filtering

Learn two neural network based models: 1) Observation Density, and 2) Particle Proposer. Assume transition model is known (could be a physical system), could also learn

𝑇

Filtering Procedure

MCTS

For online planning, Particle Filter Trees-Double progressive Widening algorithm. Easy to implement. Vectorizes particle filtering. D8 action space, however “could work with continuous action space in principle”. (Press X to doubt)

Learn a deep generative model that generates an observation given a state? They use Conditional Variational Autoencoder (CVAE [SLY15]) after some experiments.

Compositional Training of Visual Tree Search

State, Action, Observations are pre-collected. Belief can be sampled on the fly.

𝑍_{𝜃}

and

𝑃_{𝜑}

are trained with adversarial objectives.

𝑍_{𝜃}

is discriminator that gives higher likelihood to states that are more likely for a given observation.

𝑃_{𝜑}

proposes plausible state particles for a given observation. Transform it from a reinforcement learning problem to a compositional learning problem. Training each model is framed as an unsupervised learning problem.

Experiments

First: Measure a wall and move to the corner task, Visual-Tree-Search requires less than half the time of other planners. Good balance of offline training and online planning.

Second: Light-Dark variant using 32×32×3 RGB photos with some pixels ‘Salt and Pepper’d out in the ‘dark’ region. Visual-Tree-Search is again the best.

They then move around traps, I guess because it has

𝑇

baked in the Visual-Tree-Search doesn’t need anything to cope with this? They swap noise models and find that Visual-Tree-Search is fine with this too.

Bibliography

[SLY15] K. Sohn, H. Lee, and X. Yan, “Learning Structured Output Representation using Deep Conditional Generative Models,” in Advances in Neural Information Processing Systems, Curran Associates, Inc., 2015.