| Mainstream Methods of Multimodal Imitation Learning for Embodied Intelligent Robots |
| Multimodal Output Decoding: Main Methods and Core Challenges in Large-Scale Multimodal Models |
| Multimodal Reasoning: Main Classifications and Mainstream Techniques in Large-Scale Multimodal Models |
| Cross-Modal Fusion: Main Classifications and Techniques in Large-Scale Multimodal Models |
| Training Strategies for Large-Scale Multimodal Models in Embodied Intelligent Robots |
| Methods for Obtaining Multimodal Data |
| Multimodal Input Encoding |