A hands-on guide to building item-based collaborative filtering in Python.

Photo by CardMapr.nl on Unsplash

Item-based collaborative filtering is a recommendation method that measures the similarity between items from the ratings users have given them. In this article, I explain the basic concept and then walk through building an item-based collaborative filter in Python.

Basic Concept

The fundamental assumption for this method is that a user gives similar ratings to similar movies. Consider the following table for movie ratings.
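To make that assumption concrete, here is a minimal sketch in plain Python. The ratings, user names, and movie titles are hypothetical and are not the article's table; the helper names (`item_vector`, `cosine_similarity`, `predict_rating`) are also made up for illustration. The sketch measures item-to-item cosine similarity over the users who rated both movies, then predicts a missing rating as a similarity-weighted average of the ratings that user already gave.

```python
from math import sqrt

# Hypothetical ratings (user -> {movie: rating}); NOT the article's data.
ratings = {
    "user_a": {"Movie 1": 5, "Movie 2": 4, "Movie 3": 1},
    "user_b": {"Movie 1": 4, "Movie 2": 5, "Movie 3": 2},
    "user_c": {"Movie 1": 1, "Movie 2": 2, "Movie 3": 5},
    "user_d": {"Movie 1": 5, "Movie 3": 1},  # has not rated Movie 2
}


def item_vector(item):
    """Return {user: rating} for every user who rated the item."""
    return {user: r[item] for user, r in ratings.items() if item in r}


def cosine_similarity(item_x, item_y):
    """Cosine similarity between two items over their common raters."""
    x, y = item_vector(item_x), item_vector(item_y)
    common = set(x) & set(y)
    if not common:
        return 0.0
    dot = sum(x[u] * y[u] for u in common)
    norm_x = sqrt(sum(x[u] ** 2 for u in common))
    norm_y = sqrt(sum(y[u] ** 2 for u in common))
    return dot / (norm_x * norm_y)


def predict_rating(user, target_item):
    """Similarity-weighted average of the user's ratings on other items."""
    numerator = denominator = 0.0
    for item, rating in ratings[user].items():
        if item == target_item:
            continue
        sim = cosine_similarity(item, target_item)
        numerator += sim * rating
        denominator += abs(sim)
    return numerator / denominator if denominator else None


predicted = predict_rating("user_d", "Movie 2")
```

In this toy data, Movie 1 is rated much more like Movie 2 than Movie 3 is, so the prediction for `user_d` leans toward that user's rating of Movie 1. Real implementations often mean-center the ratings first (adjusted cosine), which this sketch leaves out.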


Photo by Glenn Carstens-Peters on Unsplash

A content-based recommender relies on the similarity between items when it makes recommendations. For example, when a user likes a movie, the system finds and recommends the movies whose features are most similar to those of that movie. (Feature 1)
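The idea can be sketched in a few lines of plain Python. The movie titles, the four binary features, and the `recommend` helper below are hypothetical and only illustrate the principle; they are not taken from the article. Each movie is described by a feature vector, and the movies closest to the liked one by cosine similarity are recommended.

```python
from math import sqrt

# Hypothetical binary features per movie, e.g. [action, comedy, drama, horror].
movie_features = {
    "Movie A": [1, 1, 0, 0],
    "Movie B": [1, 1, 1, 0],
    "Movie C": [0, 0, 1, 1],
    "Movie D": [0, 1, 0, 0],
}


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def recommend(liked_movie, top_n=2):
    """Return the top_n movies most similar to the one the user liked."""
    liked = movie_features[liked_movie]
    scored = [
        (cosine(liked, features), title)
        for title, features in movie_features.items()
        if title != liked_movie
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [title for _, title in scored[:top_n]]


suggestions = recommend("Movie A")
```

Unlike collaborative filtering, nothing here depends on other users' ratings; only the item features matter.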


Part 2: Creating a database and tables in MySQL, and Importing files into tables

Photo by Caspar Camille Rubin on Unsplash

A Relational Database Management System (RDBMS) is a program that allows us to create, update, and manage a relational database. Structured Query Language (SQL) is a programming language used to communicate with data stored in an RDBMS. SQL skills for working with an RDBMS are required for many data-related positions these days. On social media forums like Quora or Reddit, many people look for a public database on which to practice their SQL querying skills. However, although there are many public data sets in…


Part 1: Creating an Entity Relational Diagram (ERD)

Photo by Caspar Camille Rubin on Unsplash

A Relational Database Management System (RDBMS) is a program that allows us to create, update, and manage a relational database. Structured Query Language (SQL) is a programming language used to communicate with data stored in an RDBMS. SQL skills for working with an RDBMS are required for many data-related positions these days. On social media forums like Quora or Reddit, many people look for a public database on which to practice their SQL querying skills. However, although there are many public data sets in a single spreadsheet, there are not…


Analysis and prediction for the housing market prices using Cross Validation and Grid Search in several regression models

Photo by SGC on Unsplash

In this article, I analyze the factors related to housing prices in Melbourne and predict those prices using several machine learning techniques: Linear Regression, Ridge Regression, K-Nearest Neighbors (hereafter, KNN), and Decision Tree. Using Cross Validation and Grid Search, I find the optimal hyperparameter values for each model and compare the results to find the best model for predicting housing prices in Melbourne.
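The combination of cross validation and grid search can be sketched without any library. The code below is not the article's code and does not use the Melbourne data: it runs on a synthetic one-feature data set, and the helper names (`knn_predict`, `cv_mse`, `grid_search`) are hypothetical. For each candidate number of neighbors in a KNN regressor, it averages the mean squared error over the folds and keeps the value with the lowest score.

```python
def knn_predict(train_x, train_y, x, k):
    """Predict y at x as the mean target of the k nearest training points."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    return sum(train_y[i] for i in nearest) / k


def cv_mse(xs, ys, k, n_folds=5):
    """Mean squared error of KNN regression averaged over n_folds folds."""
    n = len(xs)
    fold_errors = []
    for fold in range(n_folds):
        # Every n_folds-th observation is held out as the validation fold.
        test_idx = [i for i in range(n) if i % n_folds == fold]
        train_idx = [i for i in range(n) if i % n_folds != fold]
        train_x = [xs[i] for i in train_idx]
        train_y = [ys[i] for i in train_idx]
        errors = [
            (knn_predict(train_x, train_y, xs[i], k) - ys[i]) ** 2
            for i in test_idx
        ]
        fold_errors.append(sum(errors) / len(errors))
    return sum(fold_errors) / n_folds


def grid_search(xs, ys, k_grid, n_folds=5):
    """Score every candidate k and return the best one with all scores."""
    scores = {k: cv_mse(xs, ys, k, n_folds) for k in k_grid}
    best_k = min(scores, key=scores.get)
    return best_k, scores


# Synthetic data (an assumption for illustration): a line plus alternating noise.
xs = [float(i) for i in range(30)]
ys = [2.0 * x + (1.0 if i % 2 == 0 else -1.0) for i, x in enumerate(xs)]

best_k, scores = grid_search(xs, ys, k_grid=[1, 2, 3, 5, 15])
```

A very large number of neighbors averages over distant points and scores poorly here, which is exactly the kind of trade-off the grid search is meant to expose. Libraries such as scikit-learn package the same loop for many model types.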

The entire code for this…


Balance between Concentration and Efficiency

Photo by ben o'bro on Unsplash

It is well known that countries become urbanized as their economies grow. Here, I explore how the world has become urbanized over time and how urbanization differs across income levels.

Reading the Data

import numpy as np 
import pandas as pd
import geopandas as gpd
import seaborn as sns
import matplotlib.pyplot as plt
from mpl_toolkits.axes_grid1 import make_axes_locatable
# Read the 'Indicators' file; the raw string keeps the backslash in the path literal
data = pd.read_csv(r'...\Indicators.csv')

The data source for this analysis is the ‘World Development Indicators’ data set on Kaggle. Of the several files in that data set, ‘Indicators.csv’ is used here.

data.head()

This is an analysis of what determines the length of time between a movie's trailer release date and its actual release date.

Let’s define the Promotion Period as the period between the official movie trailer release date and the movie release date. The picture above visually shows what the promotion period used in this analysis means. The next picture displays the distribution of this promotion period for the movies released in the US between 2017 and 2019.
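As a minimal sketch of this definition, the promotion period can be computed as the number of days between the two dates. The function name `promotion_period` and the example dates are hypothetical and do not come from the article's data.

```python
from datetime import date


def promotion_period(trailer_release, movie_release):
    """Days between the official trailer release and the movie release."""
    return (movie_release - trailer_release).days


# Hypothetical dates, for illustration only.
days = promotion_period(date(2018, 1, 15), date(2018, 5, 4))
```

A negative result would mean the trailer came out after the movie, which is a useful sanity check when cleaning real release-date data.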


Scrapy is a Python tool for scraping web pages. This article covers how to scrape web pages with Scrapy, using Box Office Mojo as the target website. Specifically, I extract information such as revenues, distributors, and genres.

Photo by Addy Ire on Unsplash

The Web Page for Scraping

The goal of this project is to go through all the movies released in the US during certain periods and extract useful information about each one. If you look up any movie on Box Office Mojo, you can see all the related information about it in the following format:

Yohan Jeong

Business Analyst at Samsung Electronics America. PhD in Economics from University of California, Davis.
