The increasing popularity of head-mounted displays and 360° video cameras has encouraged content providers to offer virtual reality video streaming over the Internet, using HTTP adaptive streaming to deliver a two-dimensional representation of the immersive content. However, since the user watches only a limited part of the video (i.e., the viewport), the available bandwidth is not used optimally. Recently, adaptive tile-based video streaming has been proposed: rather than sending the whole 360° video at once, the video is cut into temporal segments and spatial tiles. Each tile can be requested at a different quality level, giving priority to content within the viewport. This results in higher video quality and more efficient use of the available bandwidth. In this paper, we address three open research questions concerning viewport prediction, tile-based rate adaptation, and application-layer optimizations. First, we present a content-agnostic viewport prediction scheme based on spherical walks. Second, we introduce a new rate adaptation heuristic for tile-based video, which prioritizes tiles according to the great-circle distance between the center of the viewport and the center of each tile. Third, we investigate the advantages of using HTTP/2 server push. We show that the proposed optimizations result in significant improvements in terms of viewport prediction error and video quality, compared to state-of-the-art solutions.
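To make the idea of distance-based tile prioritization concrete, the following is a minimal sketch and not the heuristic evaluated in this paper. It assumes an equirectangular frame split into a uniform grid, computes the great-circle distance between the viewport center and each tile center with the haversine formula, and then upgrades tiles in order of increasing distance under a bitrate budget. The function names, the grid layout, the per-tile bitrate ladder, and the greedy upgrade rule are all illustrative assumptions.

```python
import math


def great_circle_distance(lon1, lat1, lon2, lat2):
    """Central angle (radians) between two points on the unit sphere (haversine)."""
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * math.asin(min(1.0, math.sqrt(a)))


def tile_centers(cols, rows):
    """Assumed layout: uniform cols x rows grid over an equirectangular frame.

    Returns {(row, col): (lon, lat)} with lon in [-pi, pi], lat in [-pi/2, pi/2].
    """
    centers = {}
    for r in range(rows):
        for c in range(cols):
            lon = -math.pi + (c + 0.5) * 2 * math.pi / cols
            lat = math.pi / 2 - (r + 0.5) * math.pi / rows
            centers[(r, c)] = (lon, lat)
    return centers


def rank_tiles(viewport_center, centers):
    """Tile ids sorted by great-circle distance to the viewport center."""
    vlon, vlat = viewport_center
    return sorted(
        centers,
        key=lambda t: great_circle_distance(vlon, vlat, *centers[t]),
    )


def allocate_qualities(ranked, bitrates, budget):
    """Hypothetical greedy rule: every tile starts at the lowest quality, then
    each tile, nearest first, is raised as far as the remaining budget allows.

    bitrates[q] is the assumed per-tile bitrate of quality level q (ascending).
    """
    quality = {t: 0 for t in ranked}
    used = bitrates[0] * len(ranked)
    for t in ranked:
        for q in range(1, len(bitrates)):
            extra = bitrates[q] - bitrates[q - 1]
            if used + extra > budget:
                break
            quality[t] = q
            used += extra
    return quality


if __name__ == "__main__":
    centers = tile_centers(cols=4, rows=2)
    ranked = rank_tiles((math.pi / 4, math.pi / 4), centers)
    print(ranked)
    print(allocate_qualities(ranked, bitrates=[1, 2, 4], budget=12))
```

In this sketch the tile whose center coincides with the viewport center is upgraded first, and tiles on the far side of the sphere stay at the lowest quality once the budget is exhausted. A deployed client would use the predicted viewport rather than the current one and per-tile, per-segment bitrates rather than a single ladder.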