The explosion of social video sharing sites poses new challenges for video search and indexing techniques. Because of the diversity of concepts in social videos, it is very hard to build a well-annotated dataset that covers the full meaning of each concept. However, the abundance of social videos on the internet also makes it easy to obtain huge numbers of videos, which offers an opportunity to mine semantic content from a virtually unlimited pool of video entities. In this paper, we focus on improving the performance of concept detectors and propose a refinement framework based on a semi-supervised learning technique. In our framework, the self-training algorithm is employed to expand the training dataset with automatically labeled data. The contribution of this paper is to demonstrate how visual features and text metadata can be utilized to enhance the performance of a concept classifier with a large number of unlabeled videos. Experiments on a social video dataset with 21,000 entities show that, after expanding the training set with automatically labeled shots, the performance of the concept detectors can be significantly improved.
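The self-training loop mentioned above can be sketched as follows. This is a minimal illustration with hypothetical synthetic features and a logistic-regression classifier, not the paper's actual detector (which combines visual features and text metadata): the model is trained on a small labeled seed set, its most confident predictions on the unlabeled pool are taken as pseudo-labels, and the expanded set is used to retrain.

```python
# Minimal self-training sketch: hypothetical synthetic features stand in
# for the visual/text features used by real concept detectors.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

def self_train(X_lab, y_lab, X_unlab, threshold=0.9, max_rounds=5):
    """Iteratively expand the labeled set with confident pseudo-labels."""
    X_lab, y_lab = X_lab.copy(), y_lab.copy()
    clf = LogisticRegression(max_iter=1000)
    for _ in range(max_rounds):
        clf.fit(X_lab, y_lab)
        if len(X_unlab) == 0:
            break
        proba = clf.predict_proba(X_unlab)
        conf = proba.max(axis=1)
        keep = conf >= threshold        # only trust confident predictions
        if not keep.any():
            break
        pseudo = proba[keep].argmax(axis=1)
        X_lab = np.vstack([X_lab, X_unlab[keep]])
        y_lab = np.concatenate([y_lab, pseudo])
        X_unlab = X_unlab[~keep]        # shrink the unlabeled pool
    return clf

# Small labeled seed set plus a large unlabeled pool of "shots".
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_lab, y_lab = X[:50], y[:50]
X_unlab = X[50:]
clf = self_train(X_lab, y_lab, X_unlab)
acc = clf.score(X, y)
print(f"accuracy after self-training: {acc:.2f}")
```

The confidence threshold is the key knob: setting it too low pollutes the training set with mislabeled shots, while setting it too high leaves the unlabeled pool unused.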