content based filtering | MyJourneyAsaDataScientist

Here is an overview of the features we wanted to use to determine the “score” of each paper, which we would then use to rank the papers and output them to the user as recommendations.

We also decided that, since we had created these “topics” and were running the LDA inferencer on all the new papers every day to classify them into topics, we would provide topic-based recommendations as well, so that a new user browsing the topics could see the top papers in each one. Of course, in addition to having a high topic probability, these papers were ranked by recency of publication, the impact factor of their host journal and tweet counts (if any)!
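To make the topic ranking concrete, here is a minimal sketch of how the four signals (topic probability, recency, impact factor, tweet counts) could be folded into one score. The function names, the weights, the 30-day half-life and the dictionary layout are all assumptions made up for illustration; this is not the formula we actually used.

```python
import math
from datetime import date


def paper_score(topic_prob, published, impact_factor, tweets, today,
                half_life_days=30.0):
    """Fold the four signals into one number. All weights are hypothetical."""
    age_days = max((today - published).days, 0)
    # Recency decays from 1.0 (published today) to 0.5 after one half-life.
    recency = 0.5 ** (age_days / half_life_days)
    # log1p damps very large impact factors and tweet counts.
    boost = (1.0 + recency
             + 0.5 * math.log1p(impact_factor)
             + 0.25 * math.log1p(tweets))
    # Multiplying by the topic probability keeps off-topic papers at zero.
    return topic_prob * boost


def top_papers_for_topic(papers, topic, today, n=10):
    """papers: dicts with keys id, topic_probs, published, impact_factor, tweets."""
    scored = []
    for p in papers:
        prob = p["topic_probs"].get(topic, 0.0)
        score = paper_score(prob, p["published"], p["impact_factor"],
                            p["tweets"], today)
        scored.append((score, p["id"]))
    scored.sort(reverse=True)  # highest score first
    return [paper_id for score, paper_id in scored[:n] if score > 0]


if __name__ == "__main__":
    today = date(2014, 9, 1)
    papers = [
        {"id": "a", "topic_probs": {3: 0.8}, "published": date(2014, 9, 1),
         "impact_factor": 5.0, "tweets": 10},
        {"id": "b", "topic_probs": {3: 0.8}, "published": date(2014, 6, 1),
         "impact_factor": 5.0, "tweets": 10},
    ]
    print(top_papers_for_topic(papers, 3, today))
```

With two papers that differ only in publication date, the newer one ranks first, and a paper with zero probability for the topic never appears in the list.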

For personalized recommendations, we decided we would first use topic similarity between the user's papers (or library) and the corpus of all recent papers to shortlist candidate papers to recommend, and then use word similarity to further refine the selection. The final ranking would use our special ‘sauce’ based on tweet counts, date of publication, author quality, etc. to order these papers and present them to the user!
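The two-stage filtering described above can be sketched roughly as follows. I am assuming cosine similarity for both stages and a plain bag-of-words for the word step; the post does not pin down the actual measures, and the names `recommend`, `shortlist_size` and the dictionary keys are hypothetical. The final ‘sauce’ re-ranking is left out.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mean_vector(vectors):
    """Average a list of sparse vectors, e.g. per-paper topic distributions."""
    total = {}
    for vec in vectors:
        for key, value in vec.items():
            total[key] = total.get(key, 0.0) + value
    return {key: value / len(vectors) for key, value in total.items()}


def bag_of_words(text):
    """Very crude tokenizer: lowercase and split on whitespace."""
    return Counter(text.lower().split())


def recommend(library, candidates, shortlist_size=50, n=10):
    """library and candidates: dicts with keys id, topics (dict), text (str)."""
    user_topics = mean_vector([p["topics"] for p in library])
    user_words = bag_of_words(" ".join(p["text"] for p in library))

    # Stage 1: keep the candidates whose topic mix is closest to the library.
    shortlist = sorted(candidates,
                       key=lambda p: cosine(user_topics, p["topics"]),
                       reverse=True)[:shortlist_size]

    # Stage 2: refine the shortlist by word overlap with the library.
    refined = sorted(shortlist,
                     key=lambda p: cosine(user_words, bag_of_words(p["text"])),
                     reverse=True)
    return [p["id"] for p in refined[:n]]
```

Doing the cheap topic comparison first means the more expensive word comparison only runs on a small shortlist, which matters when the corpus of recent papers is large.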

This involved connecting various pieces of the pipeline, and by September 2014 we had a working pipeline that generated and displayed topic recommendations and library recommendations (if a user had uploaded a personal library) on the website!!

YaY!

Here is a list of books/talks I found useful: Introduction to Recommender Systems (Coursera); Intro to Recommender Systems, a four-hour lecture by Xavier Amatriain; and the Coursera Machine Learning class, section on recommender systems.


About eighteen months ago I decided to leave astronomy, change my career trajectory and follow the Data Science Bandwagon- this is a blog about that ongoing journey…


Generating Topic and Personal recommendations