recommender system | MyJourneyAsaDataScientist

Here is an overview of the features we wanted to use to determine the “score” of each paper, which we would then use to rank the papers and output them to the user as recommendations.

We also decided that, since we had created these “topics” and were running the LDA inferencer on all the new papers every day to classify them into topics, we would provide topic-based recommendations as well. So if a new user came in and was browsing the topics, they could see the top papers in each topic. Of course, in addition to having high topic probability, these papers were ranked by recency of publication, impact factor of their host journal and tweet counts (if any)!
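To make the topic ranking a bit more concrete, here is a minimal sketch of how the four signals above could be combined into one score. The exact formula and weights are not given in this post, so everything here (the `Paper` fields, the `topic_score` function, the 30-day half-life, the log scaling) is an illustrative assumption and not the formula we actually used.

```python
from dataclasses import dataclass
from datetime import date
import math


@dataclass
class Paper:
    title: str
    topic_prob: float     # LDA probability of this paper for the browsed topic
    published: date       # publication date
    impact_factor: float  # impact factor of the host journal
    tweet_count: int = 0  # number of tweets mentioning the paper, if any


def topic_score(paper, today, half_life_days=30.0):
    """Hypothetical score: topic probability scaled by recency and popularity."""
    age_days = max((today - paper.published).days, 0)
    # Assumed exponential decay: the recency factor halves every half_life_days.
    recency = 0.5 ** (age_days / half_life_days)
    # log1p damps very large impact factors and tweet counts (an assumption).
    journal = math.log1p(paper.impact_factor)
    buzz = math.log1p(paper.tweet_count)
    return paper.topic_prob * recency * (1.0 + journal + buzz)


def top_papers(papers, today, n=10):
    """Return the n highest scoring papers for a topic, best first."""
    return sorted(papers, key=lambda p: topic_score(p, today), reverse=True)[:n]
```

Multiplying by the topic probability means a paper that barely belongs to the topic cannot rank highly on tweets alone, which matches the idea that high topic probability comes first and the other signals only reorder the relevant papers.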

For personalized recommendations, we decided we would first use topic similarity between the user's papers (or library) and the corpus of all recent papers to shortlist candidate papers to recommend, and then use word similarity to further refine the selection. The final ranking would use our special ‘sauce’, based on tweet counts, date of publication, author quality etc., to order these papers and present them to the user!
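The two-stage idea (shortlist by topic similarity, then refine by word similarity) can be sketched roughly as follows. This is only an illustration under stated assumptions: the post does not say which similarity measure was used, so cosine similarity is swapped in for both stages, and the `recommend` function, the dictionary layout of a paper and the `topic_cutoff` value are all hypothetical. The final ‘sauce’ ranking is left out.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def bag_of_words(text):
    """Very naive word counts: lowercase and split on whitespace."""
    counts = {}
    for word in text.lower().split():
        counts[word] = counts.get(word, 0) + 1
    return counts


def recommend(library, candidates, topic_cutoff=0.5, n=5):
    """Papers are dicts with 'title', 'topics' (topic id -> prob) and 'text'."""
    # Build the user profile by summing topic vectors and word counts
    # over every paper in the user's library.
    user_topics, user_words = {}, {}
    for paper in library:
        for topic, prob in paper["topics"].items():
            user_topics[topic] = user_topics.get(topic, 0.0) + prob
        for word, count in bag_of_words(paper["text"]).items():
            user_words[word] = user_words.get(word, 0) + count

    # Stage 1: keep only candidates whose topic mix resembles the library.
    shortlist = [c for c in candidates
                 if cosine(user_topics, c["topics"]) >= topic_cutoff]

    # Stage 2: order the shortlist by word-level similarity to the library.
    ranked = sorted(shortlist,
                    key=lambda c: cosine(user_words, bag_of_words(c["text"])),
                    reverse=True)
    return [c["title"] for c in ranked[:n]]
```

The topic filter is cheap because topic vectors are short, so it can run over the whole corpus of recent papers, while the more expensive word comparison only runs on the shortlist.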

This involved connecting various pieces of the pipeline, and by September 2014 we had a working pipeline that generated and displayed topic recommendations and library recommendations (if a user had uploaded a personal library) on the website!!

YaY!

Here is a list of books/talks I found useful:
Introduction to Recommender Systems (Coursera)
Intro to Recommender Systems, a four-hour lecture by Xavier Amatriain
Coursera Machine Learning class: section on recommender systems

About eighteen months ago I decided to leave astronomy and follow the Data Science Bandwagon; this is a blog about that journey. I spent a few months studying data science courses on Coursera and Udacity and was fortunate enough to become part of a project to build a “recommender system for Biomedical Literature”.

Some background: It turns out that the biomedical field is growing so rapidly that it is getting really difficult to keep up with the literature. For newcomers to the field, it's hard to figure out which research papers to read and where to start, as a few thousand articles are published daily and new open-source journals are popping up regularly. For veterans, it's hard to keep up, and there are not enough hours in the day to scan through the articles to figure out what is relevant, new and exciting in their area of research. This is true not just for academic researchers but also for those in the related fields of medicine and bioinformatics. Here is a recent plot I made of the number of papers per month uploaded to PubMed (a popular biomedical research literature repository). As you can see, there are about 92,000 new publications a month…

I have been working mainly on the algorithm design and development for this project and my intention with this blog is to focus on that and my growth as a data scientist.


Generating Topic and Personal recommendations

The making of a recommendation system-