Datathon 2023 - “Canvas Cupid”, an ML&AI painting recommender system

8 min read · Feb 24, 2023

Welcome everyone!

This blog post showcases the work of “The Four Emergenteers”, who won ❤ BEST MASTERS ❤ in Datathon 2023.

Before we dive deep into our story, we would like to send huge thanks to all the organizers of the event.

In Datathon 2023, we were asked to play “FREELY” with artwork-related data; no particular assignment was given. Please find more information about the data on the official website.

Okay now, let’s find out what we came up with!

“Canvas Cupid”- The Four Emergenteers

Have you ever looked for another painting to decorate your house? Imagine that you already have some pieces on your wall, but you are not really sure how to find a PERFECT match that could get along nicely with the paintings shining on the wall. What can you do? Would there be any interesting service dedicated to providing a solution to this inquiry?

Ta-da, we are happy to present “Canvas Cupid”, a painting recommender system inspired by multiple data science skills! Alright, what a cool name, but what is inside?

Tell us whether you like it or not. We will take care of the rest ❤

On “Canvas Cupid”, all you have to do is super simple! Please just indicate whether you like the painting on the screen or not. As on Tinder, you give a thumbs up to the ones you like!

Once you have liked FIVE images in total, our recommender systems will display FOUR different paintings (recommendations) based on the five images! What a cool and easy solution. But, what do the systems look like?

The recommender systems consist of the following:

Image-based
Content-based
NLP
Collaborative Filtering

In this blog post, we do not intend to describe each system in detail but rather to give a compact summary of what each one does. For further details and the steps taken, please send us a request to check the code in our private Git repository. Alright, let’s move on!

Image-based system

The VGG 19 CNN architecture to extract features from an input image

The first recommender system is based on image feature extraction. To do so, we used the pre-trained VGG 19 convolutional neural network. The features extracted by the network are then used for “unsupervised learning”, specifically K-means clustering.

Note that images suffer from the curse of dimensionality, so it is interesting (or highly necessary) to apply dimension reduction techniques. Hence, we added principal component analysis to the pipeline before the K-means clustering so that “better” clusters could be formed.

Next, we tuned the parameters to find the optimal number of clusters “K” and the optimal number of principal components “N”. The criteria we used are the elbow method and the silhouette coefficient! With the optimal values, you define the clusters and then label each painting according to the cluster it belongs to.
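
To make the silhouette criterion concrete, here is a minimal sketch in plain Python. It is not our actual tuning code: the 2D points and the two candidate labelings are made up for illustration, and the labelings are hard-coded where K-means would normally supply them. The idea is simply that the candidate with the higher mean silhouette coefficient forms tighter, better separated clusters.

```python
import math

def silhouette_score(points, labels):
    """Mean silhouette coefficient of a clustering (needs at least 2 clusters)."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)

    total = 0.0
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            continue  # a singleton cluster contributes 0 by convention
        # a: mean distance to the other points of the same cluster
        a = sum(math.dist(point, q) for q in own) / (len(own) - 1)
        # b: mean distance to the points of the nearest other cluster
        b = min(
            sum(math.dist(point, q) for q in other) / len(other)
            for other_label, other in clusters.items()
            if other_label != label
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)

# Made-up toy data: three well separated groups of three points each
points = [(0, 0), (0, 1), (1, 0),
          (10, 10), (10, 11), (11, 10),
          (20, 0), (20, 1), (21, 0)]

# Hypothetical candidate labelings for K = 2 and K = 3
candidates = {
    2: [0, 0, 0, 1, 1, 1, 1, 1, 1],
    3: [0, 0, 0, 1, 1, 1, 2, 2, 2],
}

scores = {k: silhouette_score(points, labels) for k, labels in candidates.items()}
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

In a real pipeline the same comparison would run over a grid of “K” and “N” values, with the labels coming from K-means fitted on the PCA-reduced features.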

Please find some paintings in each cluster. Clearly, you can find some distinctive features from the clusters at least in terms of color scheme.

Here is the pseudo-code of this recommender system:

Save IDs of the images liked by a user
Check the cluster of the images
Determine the most common cluster
Show a random painting that belongs to that cluster
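
The steps above can be sketched in a few lines of Python. This is an illustration under assumptions, not our production code: the function name, the painting IDs and the `cluster_of` mapping are hypothetical, and skipping paintings the user has already liked is an extra choice that the pseudo-code does not spell out.

```python
import random
from collections import Counter

def recommend_from_clusters(liked_ids, cluster_of, rng=None):
    """Return a random painting from the most common cluster among the likes."""
    rng = rng or random.Random()
    # Check the cluster of each liked image
    liked_clusters = [cluster_of[pid] for pid in liked_ids]
    # Determine the most common cluster
    top_cluster = Counter(liked_clusters).most_common(1)[0][0]
    # Assumption: do not recommend something the user has already liked
    liked = set(liked_ids)
    candidates = sorted(
        pid for pid, cluster in cluster_of.items()
        if cluster == top_cluster and pid not in liked
    )
    return rng.choice(candidates) if candidates else None

# Hypothetical cluster labels, as produced by the K-means step
cluster_of = {"p1": 0, "p2": 0, "p3": 1, "p4": 0,
              "p5": 2, "p6": 0, "p7": 1, "p8": 0}
liked_ids = ["p1", "p2", "p3", "p4", "p5"]

recommendation = recommend_from_clusters(liked_ids, cluster_of, random.Random(0))
print(recommendation)  # either "p6" or "p8", both in cluster 0
```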

Content-based

The content-based recommender system is based on the summary of each painting, which is provided in the original dataset. More specifically, we apply text vectorization to convert a text (summary) into a vector. Dimension reduction is also required in order to project the vectors into a 2D space, as shown in the graph above.

source: https://towardsdatascience.com/cosine-similarity-how-does-it-measure-the-similarity-maths-behind-and-usage-in-python-50ad30aad7db

After the vectorization, we can measure the cosine similarity of the vectors. Here is a simple example: “Have a great day” and “Have a nice day” are highly similar to each other, so the angle between their vectors is small and their cosine similarity is expected to be very high. Alright. Let’s check how it works for the paintings.
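
The two example sentences can be checked by hand. The sketch below assumes a plain bag-of-words vectorization (word counts), which is only one possible choice and not necessarily the vectorizer used in our pipeline; the helper names are made up for this example.

```python
import math
from collections import Counter

def bag_of_words(text):
    """Vectorize a text as word counts (a deliberately simple assumption)."""
    return Counter(text.lower().split())

def cosine_similarity(a, b):
    """Cosine of the angle between two sparse word-count vectors."""
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

sim = cosine_similarity(bag_of_words("Have a great day"),
                        bag_of_words("Have a nice day"))
print(sim)  # 0.75
```

The sentences share three of their four words, so the dot product is 3 and each vector has length 2, which gives 3 / (2 × 2) = 0.75.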

The example is shown above. You can see that the description (summary) of the output is exactly the same as the input, which indicates that our model works properly (100% similarity).

At the same time, it is bizarre that the two paintings share exactly the same description. Everything was based on the original data, so we can reasonably assume that there was an error in scraping the data during the preparation stage. We found this discovery valuable because it raises awareness that “Collecting GOOD AND READY-TO-GO data” could be more important (and/or difficult) than building a fancy model.

Here is the pseudo-code of this recommender system:

Save the TITLEs of the images liked by a user
Find one generating the largest cosine similarity according to the matrix
Show the selected painting
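
As a rough sketch of these steps, assume the cosine similarities have already been computed into a square matrix whose rows and columns follow the order of `titles`. The titles and the numbers below are invented for the example, and excluding the liked paintings themselves from the candidates is an assumption that the pseudo-code leaves open.

```python
def recommend_most_similar(liked_titles, titles, similarity):
    """Return the (title, score) pair with the largest similarity to any liked title."""
    index = {title: i for i, title in enumerate(titles)}
    liked = set(liked_titles)
    best_title, best_score = None, -1.0
    for liked_title in liked_titles:
        row = similarity[index[liked_title]]
        for j, score in enumerate(row):
            if titles[j] in liked:
                continue  # assumption: never recommend an already liked painting
            if score > best_score:
                best_title, best_score = titles[j], score
    return best_title, best_score

# Hypothetical titles and a made-up, symmetric similarity matrix
titles = ["Winter Landscape", "Snowy Village", "Sunflowers", "Harbour at Dusk"]
similarity = [
    [1.0, 0.8, 0.1, 0.3],
    [0.8, 1.0, 0.2, 0.4],
    [0.1, 0.2, 1.0, 0.5],
    [0.3, 0.4, 0.5, 1.0],
]

best = recommend_most_similar(["Winter Landscape", "Snowy Village"], titles, similarity)
print(best)  # ('Harbour at Dusk', 0.4)
```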

Natural Language Processing (NLP)

This recommender system is inspired by few-shot learning. Simply put, few-shot learning helps a machine learning model make predictions from ONLY a couple of examples. Famous OpenAI models such as the GPT series have shown the effectiveness of the method. Let’s check how we used this to recommend a painting.

We save the TITLEs of the liked images and put them into GPT-2 to generate a new title. In the example above, you can see that the titles contain “snow” and such. As a result, “A Winter Soldier at the Gates of Vienna” is generated. What a cool prompt!

Next, we request StableDiffusion to generate a new artwork with the suggested prompt. The artwork seems amazing!

Here is the pseudo-code of this recommender system:

Save the TITLEs of the images liked by a user
Generate a new prompt by GPT-2 with the titles
Generate a new artwork by StableDiffusion with the prompt
Show the artwork
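
To illustrate the few-shot idea only, the sketch below turns liked titles into a prompt that ends on an open bullet, so that a text generation model would continue the list with a new title. The prompt layout, the function name and the titles are all hypothetical, and the actual calls to GPT-2 and StableDiffusion are deliberately left out.

```python
def build_few_shot_prompt(liked_titles):
    """Format liked titles as a list that a language model can continue."""
    lines = ["Titles of paintings I like:"]
    lines.extend(f"- {title}" for title in liked_titles)
    # Open bullet: the model is expected to complete it with a new title
    lines.append("-")
    return "\n".join(lines)

# Hypothetical liked titles
liked_titles = ["Snow at Dusk", "Winter Road", "Soldiers Resting"]
prompt = build_few_shot_prompt(liked_titles)
print(prompt)
```

The generated continuation would then serve as the text prompt for the image generation step.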

Collaborative Filtering

We share the taste! If you like it, I would like it too!

This recommender system is motivated by collaborative filtering, which uses similarities between users and paintings simultaneously to provide recommendations. Let us give you a simple example.

Person A: “I like Painting 1 and 2”

Person B: “I like Painting 1”

System: ‘Person B may like Painting 2 too!’

To implement this, we distributed a survey asking respondents to indicate “like” or “dislike” for 100 randomly picked paintings. In total, we collected 22 responses (thanks to our EMERGENT community).

Next, we encode “like” as 1 and “dislike” as 0. Hence, the responses of the 22 users form a single table of binary values.

The selected paintings, on “Canvas Cupid”, are also encoded the same way (like = 1, dislike = 0). What’s next? How does the system recommend?

Some mathematics comes first, but there is no need to explain the equations here. Simply put, we are interested in “User-User similarity” and used a modified Jaccard index for the task. In other words, we check the similarity between the 22 reference users and the current (particular) user. Look at the example above; the current user is most similar to “Rater 5”!

Next, we move on to “User-Item Prediction” according to the formula described above, and we see that “Unseen Painting 3” could be the best recommendation!

Here is the pseudo-code of this recommender system:

Save the IDs of the images liked by a user
Compute the “User-User similarity”
Determine the most similar rater
Compute the “User-item prediction”
Show the one with the highest prediction score
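
A small sketch of these steps follows, under clearly stated assumptions: it uses the standard Jaccard index rather than our modified version, and it scores each unseen painting with a similarity-weighted vote over the raters, which is one common choice and not necessarily the formula shown in the figure. Rater names and painting IDs are made up.

```python
def jaccard(a, b):
    """Standard Jaccard index between two sets of liked paintings."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def predict_scores(user_likes, user_seen, rater_likes, all_items):
    """Return user-user similarities and a score for every unseen painting."""
    sims = {rater: jaccard(user_likes, likes) for rater, likes in rater_likes.items()}
    total = sum(sims.values())
    scores = {}
    for item in sorted(all_items):
        if item in user_seen:
            continue  # only score paintings the user has not seen yet
        # Similarity-weighted share of raters who liked this painting
        vote = sum(sims[rater] for rater, likes in rater_likes.items() if item in likes)
        scores[item] = vote / total if total else 0.0
    return sims, scores

# Hypothetical survey answers, stored as the set of paintings each rater liked
rater_likes = {
    "Rater 1": {"A", "B"},
    "Rater 2": {"C", "D"},
    "Rater 3": {"A", "B", "E"},
}
all_items = {"A", "B", "C", "D", "E"}
user_likes = {"A", "B"}      # paintings the current user liked
user_seen = {"A", "B", "C"}  # paintings shown so far ("C" was disliked)

sims, scores = predict_scores(user_likes, user_seen, rater_likes, all_items)
most_similar = max(sims, key=sims.get)
best_item = max(scores, key=scores.get)
print(most_similar, best_item)  # Rater 1 E
```

Storing likes as sets is equivalent to the binary table: a painting is in the set exactly when its entry is 1.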

Simple Demonstration

The video demonstrates how the systems operate. Again, all you have to do is like or dislike the painting on the screen. Once you have accumulated 5 likes, we recommend 4 different paintings, one from each recommender system. Cool! Let’s check out the compact summary.

Idea

“Canvas Cupid” recommends a painting that matches your taste.

Recommender Design

Image-based

VGG 19 for feature extraction
PCA + K-means clustering
The elbow method + silhouette coefficient for parameter tuning

Content-based

Text Vectorization
Cosine Similarity

NLP

Few-shot learning
GPT-2 & StableDiffusion

Collaborative Filtering

Binary encoding
User-User similarity (modified Jaccard Index)
User-Item prediction

Our aim for the hackathon was very simple: “What can Data Science do?”. We wanted to exhibit multiple aspects of Data Science and AI and to show people that there are so many fascinating techniques that may improve their lives!

So, shall we ask you one question?

“Can Canvas Cupid help you find your PERFECT MATCH?”

Would you like to find out more? You may send us a request to check our code. We will process your request and give you access to the private Git repository.

Would you like to see the full slide deck? Check this out (Anja, your hard work deserves to be recognized):

Canvas Cupid Slide Deck

Kudos to all members of “The Four Emergenteers”! Would you like to network? Please find our LinkedIn profiles:

Abel Kempynck | Anja Deric | Gunho Lee | Senne Colsen

The Four Emergenteers (check the beautiful Emergent team sweater)

Last but not least, we would like to thank Emergent Leuven for all the support behind the scenes! We are recruiting new members to continue the BEST (yes, you just found out how good we are!) Data Science & AI student organization in Leuven ❤ Do not hesitate to reach out to us!

Written by Gunho Lee
