Most existing digital content recommendation systems use explicit feedback data such as user ratings. Such systems depend on user input and do not take the usage context into account. In this paper, we propose a framework for digital content recommendation that uses only implicit feedback data (i.e., information collected from session usage without any direct feedback from the user). The framework considers not only interactions between users and content items but also the other implicit information available during a video session. To capture interactions among these attributes, we choose Higher-Order Factorization Machines (HoFM) as our predictor and test our approach on real-world video usage data. In our experiments, we explore factors that may affect the performance of the HoFM predictor. We observe that increasing the number of user sessions used to build the predictor significantly improves prediction accuracy, whereas increasing the order (depth) of interactions may not. We also present an application of our work to a video recommendation system.