2023-12-03
Sampada Deglurkar, Michael Lim, Johnathan Tucker, Zachary Sunberg, Aleksandra Faust, Claire Tomlin.
I rate this 8.1
More, smaller models, each trained toward its own goal, end up being more efficient and robust. This is… close to what I was attempting, but I think my “produce an image” thing was insane. Next step: find the code for this and see how it proposes particles. I think I may be able to adapt this so that I’m keeping a bunch of unrelated particles alive and just deciding which ones to kill.
Uncertainty in both sensing and planning. Finding an optimal POMDP policy is computationally demanding, often intractable, because of the uncertainty introduced by imperfect observations. “Continuous POMDPs” have continuous state, action, and observation spaces. GANs are better than autoencoders for this. In their MCTS planner, GANs propose state particles given the current observation.
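Why the continuous case is hard: the exact belief update is the standard Bayes filter (my notation, not the paper's: b is the belief, T the transition density, Z(o | s') the observation density, η a normalizing constant). With continuous states the integral generally has no closed form, which is why planners approximate b with samples.

```latex
b'(s') = \eta \, Z(o \mid s') \int_{\mathcal{S}} T(s' \mid s, a) \, b(s) \, ds
```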
Ground truth state information is available to the planner during training, but not during testing.
POMCPOW and friends use weighted collections of particles to represent complex beliefs.
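To make “weighted collection of particles” concrete, here is a minimal stdlib sketch of the generic machinery (not code from POMCPOW or the paper): a belief is a list of states plus weights, the effective sample size measures how degenerate the weights are, and systematic resampling is the “deciding which ones to kill” step, duplicating heavy particles and dropping light ones. All function names are my own.

```python
import random
from typing import List, Sequence


def normalize(weights: Sequence[float]) -> List[float]:
    """Scale non-negative weights so they sum to 1."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return [w / total for w in weights]


def effective_sample_size(weights: Sequence[float]) -> float:
    """1 / sum(w_i^2) over normalized weights: equals N for uniform weights,
    drops toward 1 when a single particle carries all the weight."""
    w = normalize(weights)
    return 1.0 / sum(x * x for x in w)


def systematic_resample(particles: Sequence[float], weights: Sequence[float],
                        rng: random.Random) -> List[float]:
    """Draw len(particles) survivors using one random offset and evenly spaced
    pointers: heavy particles get duplicated, light ones tend to be dropped."""
    n = len(particles)
    w = normalize(weights)
    start = rng.random() / n
    survivors: List[float] = []
    cumulative, j = w[0], 0
    for i in range(n):
        pointer = start + i / n
        # Advance to the particle whose cumulative-weight interval contains pointer.
        while pointer >= cumulative and j < n - 1:
            j += 1
            cumulative += w[j]
        survivors.append(particles[j])
    return survivors
```

Systematic resampling is one common choice because it uses a single random number and has low variance; multinomial resampling would work too.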
Instead of training end-to-end, use a basket of separately trained model components.
Learn two neural-network-based models: 1) an Observation Density model, and 2) a Particle Proposer. The transition model is assumed known (it could be a physical system), though it could also be learned.
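A toy 1-D sketch of how these pieces could fit together in one belief update, assuming a DPF-style mixture: most particles go through the known transition, a fraction is replaced by proposer samples, and everything is weighted by the observation density. The functions `transition`, `obs_density`, `proposer` and the parameter `proposer_frac` are hypothetical stand-ins for the real known dynamics and learned networks, not the paper's code.

```python
import math
import random
from typing import List, Tuple

Particle = Tuple[float, float]  # (state, normalized weight)


def gaussian_pdf(x: float, mean: float, std: float) -> float:
    """Density of a 1-D normal distribution at x."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2.0 * math.pi))


# Hypothetical stand-ins on a 1-D toy problem; in the paper these would be the
# known dynamics and the two learned networks.
def transition(s: float, a: float, rng: random.Random) -> float:
    """Known dynamics: move by a, plus small process noise."""
    return s + a + rng.gauss(0.0, 0.1)


def obs_density(o: float, s: float) -> float:
    """Stand-in for the observation density model: how plausible o is given s."""
    return gaussian_pdf(o, s, 0.5)


def proposer(o: float, rng: random.Random) -> float:
    """Stand-in for the particle proposer: a state that plausibly produced o."""
    return o + rng.gauss(0.0, 0.5)


def update_belief(particles: List[float], a: float, o: float,
                  rng: random.Random, proposer_frac: float = 0.2) -> List[Particle]:
    """Propagate most particles through the known transition, replace a fraction
    with fresh proposer samples, then weight everything by the observation density.
    Weighting proposer samples by obs_density alone is a heuristic, not an exact
    importance weight."""
    n = len(particles)
    n_new = int(round(proposer_frac * n))
    survivors = rng.sample(particles, n - n_new)
    states = [transition(s, a, rng) for s in survivors]
    states += [proposer(o, rng) for _ in range(n_new)]
    weights = [obs_density(o, s) for s in states]
    total = sum(weights)
    return [(s, w / total) for s, w in zip(states, weights)]
```

For example, starting from 100 samples of N(0, 1), `update_belief(particles, 1.0, 1.0, rng)` returns 100 weighted particles concentrated near 1. The proposer samples are what let the filter recover when none of the propagated particles are near the truth.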
For online planning, they use the Particle Filter Trees with Double Progressive Widening (PFT-DPW) algorithm. Easy to implement. They vectorize the particle filtering. D8 action space, however it “could work with continuous action space in principle”. (Press X to doubt)
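The “progressive widening” in PFT-DPW caps how many children a tree node gets: a new child is added only while the child count is at most k·N^α, where N is the node's visit count. A minimal sketch of just that test; `should_widen`, `simulate_widening`, and the k and α values used are my own illustration, not the paper's settings.

```python
def should_widen(num_children: int, visits: int, k: float, alpha: float) -> bool:
    """Progressive widening test: allow a new child while
    num_children <= k * visits**alpha, so branching grows sublinearly in visits."""
    return num_children <= k * visits ** alpha


def simulate_widening(total_visits: int, k: float, alpha: float) -> int:
    """Count the children one node ends up with after total_visits visits,
    adding a child on every visit where should_widen allows it."""
    children = 0
    for visits in range(1, total_visits + 1):
        if should_widen(children, visits, k, alpha):
            children += 1
    return children
```

With k = 1 and α = 0.5, 100 visits yield only around ten children instead of 100. “Double” means the test is applied twice, once for action children and once for belief children; with a small discrete action space like D8, presumably only the belief-side widening matters much.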
They also learn a deep generative model that generates an observation given a state; after some experiments they settled on a Conditional Variational Autoencoder (CVAE [SLY15]).
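For reference, a CVAE in the style of [SLY15] maximizes a conditional evidence lower bound. Written with the state s as the conditioning variable, the observation o as the output, and a latent z (my notation):

```latex
\log p_\theta(o \mid s) \;\ge\;
\mathbb{E}_{q_\phi(z \mid o, s)}\big[\log p_\theta(o \mid z, s)\big]
- D_{\mathrm{KL}}\big(q_\phi(z \mid o, s) \,\|\, p_\theta(z \mid s)\big)
```

At planning time, generating an observation for a state means sampling z from the prior p_θ(z | s) and decoding with p_θ(o | z, s). Whether the paper uses a learned prior or a fixed standard normal is not in these notes.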
State, Action, Observation data are pre-collected. Belief can be sampled on the fly. The observation density model and the particle proposer are trained with adversarial objectives. The observation density model is a discriminator that gives higher likelihood to states that are more likely for a given observation. The particle proposer proposes plausible state particles for a given observation. This transforms the task from a reinforcement learning problem into a compositional learning problem. Training each model is framed as an unsupervised learning problem.
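One way to write the adversarial setup, assuming a standard conditional-GAN objective (my notation; the paper's actual losses may differ): the observation density model Z_θ(o, s) ∈ (0, 1) acts as the discriminator, scoring true (state, observation) pairs from the pre-collected dataset 𝒟 high and proposer samples low, while the particle proposer P_φ(s | o) is trained to fool it.

```latex
\begin{aligned}
\max_{\theta}\;& \mathbb{E}_{(s,o)\sim\mathcal{D}}\big[\log Z_\theta(o, s)\big]
  + \mathbb{E}_{o\sim\mathcal{D},\,\hat{s}\sim P_\phi(\cdot\mid o)}\big[\log\big(1 - Z_\theta(o, \hat{s})\big)\big] \\
\min_{\phi}\;& \mathbb{E}_{o\sim\mathcal{D},\,\hat{s}\sim P_\phi(\cdot\mid o)}\big[\log\big(1 - Z_\theta(o, \hat{s})\big)\big]
\end{aligned}
```

Neither model needs the planner or a reward signal to train, which is the sense in which each piece is an unsupervised learning problem.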
First: on the “measure a wall and move to the corner” task, Visual-Tree-Search requires less than half the time of the other planners. Good balance of offline training and online planning.
Second: a Light-Dark variant using 32×32×3 RGB images, with some pixels in the ‘dark’ region replaced by salt-and-pepper noise. Visual-Tree-Search is again the best.
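A sketch of what salt-and-pepper corruption restricted to a dark region of an image might look like. The region predicate `in_dark`, the noise probability, and the 50/50 salt-versus-pepper split are my assumptions for illustration, not the paper's settings.

```python
import random
from typing import Callable, List

Image = List[List[List[int]]]  # image[row][col] is a list of channel values in 0..255


def salt_and_pepper(image: Image, in_dark: Callable[[int, int], bool],
                    prob: float, rng: random.Random) -> Image:
    """Return a noisy copy of image: each pixel with in_dark(row, col) true is,
    with probability prob, replaced by pure white (salt) or pure black (pepper).
    Pixels outside the dark region are copied unchanged."""
    noisy = [[list(pixel) for pixel in row] for row in image]
    for r, row in enumerate(noisy):
        for c, pixel in enumerate(row):
            if in_dark(r, c) and rng.random() < prob:
                value = 255 if rng.random() < 0.5 else 0
                row[c] = [value] * len(pixel)
    return noisy
```

For example, `salt_and_pepper(img, lambda r, c: c < 16, 0.3, random.Random(0))` corrupts only the left half of a 32×32×3 image and leaves the input untouched.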
They then move the traps around; I guess because it has the … baked in, Visual-Tree-Search doesn’t need anything extra to cope with this? They swap in different noise models and find that Visual-Tree-Search is fine with this too.