Towards Cross-Modal Point Cloud Retrieval for Indoor Scenes