Datathon 2023 - “Canvas Cupid”, an ML&AI painting recommender system

Gunho Lee
Feb 24, 2023

Welcome everyone!

This blog post has been written to showcase the work of “The Four Emergenteers”, who won ❤ BEST MASTERS ❤ at Datathon 2023.

Before we dive deep into our story, we would like to send a huge thanks to all the organizers of the event.

In Datathon 2023, we were asked to play “FREELY” with artwork-related data; no particular assignment was given. More information about the data is available on the official website.

Okay now, let’s find out what we came up with!

“Canvas Cupid” - The Four Emergenteers

Have you ever looked for another painting to decorate your house? Imagine that you already have some pieces on your wall, but you are not really sure how to find a PERFECT match that would go nicely with the paintings already shining on it. What can you do? Is there an interesting service dedicated to exactly this problem?

Canvas Cupid! What an amazing name ❤

Ta-da! We are happy to present “Canvas Cupid”, a painting recommender system that combines multiple data science techniques! Alright, what a cool name, but what is inside?

Say whether you like it or not. We will take care of the rest ❤

On “Canvas Cupid”, what you have to do is super simple: just indicate whether you like the painting on the screen or not. Like on Tinder, you give a thumbs up to the ones you like!

Once you have liked FIVE images in total, our recommender systems will display FOUR different paintings (recommendations) based on those five images! What a cool and easy solution. But what do the systems look like?

The FOUR recommenders

The recommender systems consist of the following:

  1. Image-based
  2. Content-based
  3. NLP
  4. Collaborative Filtering

In this blog post, we do not intend to describe each system in detail. Instead, we give a compact summary of what each one does. For further details and the steps taken, please send us a request to access the code in our private Git repository. Alright, let’s move on!

Image-based system

The VGG 19 CNN architecture used to extract features from an input image

The first recommender system is based on image feature extraction. For this, we used the pre-trained VGG 19 convolutional neural network. The features extracted by the network are then used for unsupervised learning, specifically K-means clustering.


Note that images suffer from the curse of dimensionality, so it is useful (or even necessary) to apply dimension reduction techniques. Hence, we added principal component analysis (PCA) to the pipeline before the K-means clustering so that “better” clusters could be formed.
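Here is a minimal sketch of the PCA step. It assumes the VGG 19 features have already been extracted and flattened into one row per painting. The sketch uses plain NumPy, taking an SVD of the centered data. In practice, scikit-learn's `sklearn.decomposition.PCA` does the same job. The random matrix is a hypothetical stand-in for real features. Its width of 4096 only matches the size of VGG 19's fully connected layers; the post does not say which layer the team used.

```python
import numpy as np


def pca_reduce(features: np.ndarray, n_components: int):
    """Project rows onto the top principal components.

    features: (n_images, n_features) matrix, one row of CNN features per painting.
    Returns (projected data, explained-variance ratio of each kept component).
    """
    # Center every feature so the components describe variance around the mean.
    centered = features - features.mean(axis=0)
    # Rows of vt are the principal directions, sorted by singular value.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    projected = centered @ vt[:n_components].T
    variance = singular_values ** 2
    explained_ratio = variance[:n_components] / variance.sum()
    return projected, explained_ratio


# Hypothetical stand-in for VGG 19 features: 20 paintings x 4096 features.
rng = np.random.default_rng(0)
fake_features = rng.normal(size=(20, 4096))
reduced, ratio = pca_reduce(fake_features, n_components=5)
print(reduced.shape, ratio.round(3))
```

The number of kept components “N” is exactly the parameter the team tuned together with “K”.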

Next, we tuned the parameters to find the optimal number of clusters “K” and of principal components “N”. We used the elbow method and the silhouette coefficient as criteria. With the optimal values, we define the clusters and then label each painting with the cluster it belongs to.
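Here is a sketch of how “K” could be chosen, under a few assumptions. It implements K-means (Lloyd's algorithm with k-means++ seeding) and the mean silhouette coefficient s(i) = (b - a) / max(a, b) in NumPy. For each candidate K it records the inertia, which is the curve the elbow method inspects, and the silhouette score. The 2-D data is a hypothetical stand-in for PCA output. The team's actual search ranges are not given in the post, and scikit-learn's `KMeans` and `silhouette_score` are the usual off-the-shelf equivalents.

```python
import numpy as np


def _sq_dists(x, centroids):
    # Squared Euclidean distance from every point to every centroid.
    return ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)


def kmeans(x, k, n_init=10, n_iter=100, seed=0):
    """Lloyd's K-means with k-means++ seeding; returns (labels, inertia) of the best run."""
    rng = np.random.default_rng(seed)
    best = None
    for _run in range(n_init):
        # k-means++: each new centroid is drawn with probability proportional to D^2.
        centroids = x[[rng.integers(len(x))]]
        while len(centroids) < k:
            d2 = _sq_dists(x, centroids).min(axis=1)
            centroids = np.vstack([centroids, x[rng.choice(len(x), p=d2 / d2.sum())]])
        for _step in range(n_iter):
            labels = _sq_dists(x, centroids).argmin(axis=1)
            # Move each centroid to the mean of its points (keep it if the cluster is empty).
            updated = np.array([x[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                                for j in range(k)])
            if np.allclose(updated, centroids):
                break
            centroids = updated
        labels = _sq_dists(x, centroids).argmin(axis=1)
        inertia = float(((x - centroids[labels]) ** 2).sum())
        if best is None or inertia < best[1]:
            best = (labels, inertia)
    return best


def silhouette(x, labels):
    """Mean silhouette coefficient; needs at least two clusters."""
    dist = np.sqrt(_sq_dists(x, x))
    scores = []
    for i in range(len(x)):
        own = labels == labels[i]
        if own.sum() == 1:
            scores.append(0.0)  # usual convention for a singleton cluster
            continue
        a = dist[i, own].sum() / (own.sum() - 1)  # mean distance to the rest of its cluster
        b = min(dist[i, labels == c].mean() for c in set(labels.tolist()) if c != labels[i])
        scores.append((b - a) / max(a, b))
    return float(np.mean(scores))


# Hypothetical PCA output: three well-separated groups of 30 paintings in 2-D.
rng = np.random.default_rng(1)
centers = np.array([[0.0, 0.0], [6.0, 0.0], [0.0, 6.0]])
x = np.vstack([c + rng.normal(scale=0.5, size=(30, 2)) for c in centers])

results = {}
for k in range(2, 7):
    labels, inertia = kmeans(x, k)
    results[k] = (inertia, silhouette(x, labels))  # inertia feeds the elbow plot

best_k = max(results, key=lambda k: results[k][1])
print(best_k, results)
```

The elbow method is a visual judgment on the inertia curve. The silhouette score gives a single number to maximize, which is why the two are often used together.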

Examples from each cluster

Above are some paintings from each cluster. Clearly, the clusters show distinctive features, at least in terms of color scheme.

Here is the pseudo-code of this recommender system:

  • Save IDs of the images liked by a user
  • Check the cluster of the images
  • Determine the most common cluster
  • Show a random painting that belongs to that cluster
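The four steps above translate almost line by line into Python. This sketch assumes a hypothetical `cluster_of` dictionary that maps each painting ID to its K-means label. It also skips paintings the user has already liked, which is our assumption and not part of the original pseudo-code.

```python
import random
from collections import Counter


def recommend_from_cluster(liked_ids, cluster_of, rng=None):
    """Pick a random painting from the cluster most represented among the likes.

    liked_ids:  IDs of the paintings the user liked
    cluster_of: hypothetical dict mapping painting ID -> K-means cluster label
    """
    rng = rng or random.Random()
    liked = set(liked_ids)
    # Steps 1-3: look up each liked painting's cluster and take the most common one
    # (Counter breaks ties by order of first occurrence).
    top_cluster, _ = Counter(cluster_of[pid] for pid in liked_ids).most_common(1)[0]
    # Step 4: random painting from that cluster; excluding liked ones is our assumption.
    candidates = [pid for pid, c in cluster_of.items() if c == top_cluster and pid not in liked]
    return rng.choice(candidates) if candidates else None


cluster_of = {"p1": 0, "p2": 0, "p3": 1, "p4": 0, "p5": 1, "p6": 2, "p7": 0}
pick = recommend_from_cluster(["p1", "p2", "p3", "p5", "p4"], cluster_of, random.Random(42))
print(pick)  # p7: the only cluster-0 painting not yet liked
```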

Content-based

source: https://towardsdatascience.com/hands-on-content-based-recommender-system-using-python-1d643bf314e4

The content-based recommender system relies on the painting summaries provided in the original dataset. More specifically, we apply text vectorization to convert each summary into a vector. Dimension reduction is then used to project the vectors into a 2D space, as shown in the graph above.
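The post does not name the vectorizer. This sketch therefore assumes TF-IDF, a common choice, implemented by hand with the same smoothed IDF formula that scikit-learn's `TfidfVectorizer` uses by default, ln((1 + n) / (1 + df)) + 1. A PCA projection to two dimensions is added for plotting. The summaries are hypothetical.

```python
import math
import re
from collections import Counter

import numpy as np


def tokenize(text):
    # Lower-case and keep alphabetic words only.
    return re.findall(r"[a-z]+", text.lower())


def tfidf_matrix(docs):
    """Return (vocabulary, matrix) with one L2-normalised TF-IDF row per summary."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({w for toks in tokenized for w in toks})
    index = {w: j for j, w in enumerate(vocab)}
    # Document frequency: in how many summaries each word appears.
    df = Counter(w for toks in tokenized for w in set(toks))
    n = len(docs)
    idf = np.array([math.log((1 + n) / (1 + df[w])) + 1 for w in vocab])
    mat = np.zeros((n, len(vocab)))
    for i, toks in enumerate(tokenized):
        for w, count in Counter(toks).items():
            mat[i, index[w]] = count * idf[index[w]]
    norms = np.linalg.norm(mat, axis=1, keepdims=True)
    return vocab, mat / np.where(norms == 0, 1.0, norms)


def project_2d(mat):
    """PCA to two dimensions (via SVD), e.g. for a scatter plot of the summaries."""
    centered = mat - mat.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :2] * s[:2]


summaries = [  # hypothetical painting summaries
    "Snowy winter landscape with hunters and dogs",
    "Hunters and dogs return through the snow",
    "Portrait of a young woman wearing a pearl earring",
]
vocab, tfidf = tfidf_matrix(summaries)
coords = project_2d(tfidf)
print(coords.round(3))
```

Because each row is L2-normalised, `tfidf @ tfidf.T` directly gives the matrix of pairwise cosine similarities.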

source: https://towardsdatascience.com/cosine-similarity-how-does-it-measure-the-similarity-maths-behind-and-usage-in-python-50ad30aad7db

After the vectorization, we can measure the cosine similarity between vectors. Here is a simple example: “Have a great day” and “Have a nice day” are highly similar, so the cosine of the angle between their vectors is expected to be high. Alright, let’s check how it works on the paintings.
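Here is the sentence example above worked out with plain word counts, a simplification of the vectorizer actually used. Over the vocabulary (have, a, great, nice, day), the vectors are (1, 1, 1, 0, 1) and (1, 1, 0, 1, 1). That gives cos θ = (v1 · v2) / (|v1| |v2|) = 3 / (2 · 2) = 0.75.

```python
import math
from collections import Counter


def cosine_similarity(a: str, b: str) -> float:
    """Cosine of the angle between two bag-of-words count vectors."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)  # Counter returns 0 for missing words
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (norm_a * norm_b)


print(cosine_similarity("Have a great day", "Have a nice day"))   # 0.75
print(cosine_similarity("Have a great day", "Have a great day"))  # 1.0
```

Identical texts score 1.0, which is the “100% similarity” case discussed next.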

An example is shown above. The description (summary) of the recommended painting is exactly the same as that of the input (100% similarity), which shows that the similarity computation works as intended.

At the same time, it is odd that two different paintings share exactly the same description. Since everything came from the original data, we can reasonably assume that an error occurred while scraping the data during the preparation stage. We found this discovery valuable because it highlights that collecting GOOD AND READY-TO-GO data can be more important (and/or more difficult) than building a fancy model.

Here is the pseudo-code of this recommender system:

  • Save the TITLEs of the images liked by a user
  • Find the painting with the largest cosine similarity in the similarity matrix
  • Show the selected painting
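The steps above can be sketched as follows, assuming a precomputed cosine-similarity matrix indexed by title. When several paintings are liked, each candidate is scored by its highest similarity to any liked painting, and already-liked paintings are excluded. Both rules are our assumptions, since the post does not specify how multiple likes are combined. The titles and matrix values are hypothetical.

```python
import numpy as np


def recommend_most_similar(titles, sim, liked_titles):
    """Return the title with the largest cosine similarity to any liked painting."""
    idx = {t: i for i, t in enumerate(titles)}
    liked = [idx[t] for t in liked_titles]
    # For every painting, its best similarity to any liked painting (assumed aggregation).
    scores = sim[liked].max(axis=0)
    scores[liked] = -np.inf  # never recommend a painting the user already liked
    return titles[int(scores.argmax())]


titles = ["Winter Landscape", "Harbour at Dawn", "Skaters on the Ice", "Still Life with Lemons"]
sim = np.array([  # hypothetical symmetric cosine-similarity matrix
    [1.0, 0.2, 0.9, 0.1],
    [0.2, 1.0, 0.3, 0.4],
    [0.9, 0.3, 1.0, 0.2],
    [0.1, 0.4, 0.2, 1.0],
])
print(recommend_most_similar(titles, sim, ["Winter Landscape", "Harbour at Dawn"]))
```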

Natural Language Processing (NLP)

Few-shot learning example

This recommender system is inspired by few-shot learning. Simply put, few-shot learning lets a machine learning model make predictions from ONLY a handful of examples. Famous OpenAI models such as the GPT series have shown the effectiveness of this approach. Let’s check how we used it to recommend a painting.

Example of the recommender system

We save the TITLEs of the liked images and feed them to GPT-2 to generate a new title. In the example above, the titles contain words such as “snow”, and as a result, “A Winter Soldier at the Gates of Vienna” is generated. What a cool prompt!
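The post does not show the exact prompt given to GPT-2. This sketch assumes a simple hypothetical format that lists the liked titles as examples and leaves the next slot open, so the model continues with a new title. A second helper keeps only the first line of the continuation. The commented lines show how the prompt could be passed to GPT-2 with the Hugging Face `transformers` text-generation pipeline. They are not run here because they download the model.

```python
def build_few_shot_prompt(liked_titles):
    """Hypothetical few-shot prompt: liked titles as examples, then an open slot."""
    lines = ["Painting titles:"]
    lines += [f"- {t}" for t in liked_titles]
    lines.append("-")  # the model continues from here with a new title
    return "\n".join(lines)


def extract_title(prompt, generated_text):
    """Keep only the first line the model writes after the prompt."""
    continuation = generated_text[len(prompt):].strip()
    return continuation.splitlines()[0].strip() if continuation else ""


# Possible GPT-2 call (not executed in this sketch):
# from transformers import pipeline
# generator = pipeline("text-generation", model="gpt2")
# generated = generator(prompt, max_new_tokens=20)[0]["generated_text"]

prompt = build_few_shot_prompt(["Snowy Morning", "Frozen River"])
fake_output = prompt + " A Winter Soldier at the Gates of Vienna\n- Another title"
print(extract_title(prompt, fake_output))
```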

New artwork

Next, we ask StableDiffusion to generate a new artwork from the suggested prompt. The result looks amazing!

Here is the pseudo-code of this recommender system:

  • Save the TITLEs of the images liked by a user
  • Generate a new prompt by GPT-2 with the titles
  • Generate a new artwork by StableDiffusion with the prompt
  • Show the artwork
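The four steps above reduce to a small orchestration function. In this sketch the text model and the image model are passed in as plain callables, a design choice of ours that keeps the logic testable without downloading either model. The comments show how Hugging Face `diffusers` could supply the image step. `MODEL_ID` is a placeholder for whichever Stable Diffusion checkpoint is used, not the team's setting.

```python
from typing import Callable, List, Tuple


def recommend_new_artwork(
    liked_titles: List[str],
    generate_title: Callable[[List[str]], str],
    generate_image: Callable[[str], object],
) -> Tuple[str, object]:
    """Liked titles -> new prompt (e.g. GPT-2) -> new artwork (e.g. Stable Diffusion)."""
    if not liked_titles:
        raise ValueError("need at least one liked title")
    prompt = generate_title(liked_titles)
    artwork = generate_image(prompt)
    return prompt, artwork


# Possible image step (not executed in this sketch; MODEL_ID is a placeholder):
# from diffusers import StableDiffusionPipeline
# pipe = StableDiffusionPipeline.from_pretrained(MODEL_ID)
# generate_image = lambda prompt: pipe(prompt).images[0]

# Stub models stand in for GPT-2 and Stable Diffusion.
prompt, artwork = recommend_new_artwork(
    ["Snowy Morning", "Frozen River"],
    generate_title=lambda titles: "A Winter Soldier at the Gates of Vienna",
    generate_image=lambda p: {"image_for": p},
)
print(prompt, artwork)
```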

Collaborative Filtering

We share the taste! If you like it, I would like it too!

This recommender system is based on collaborative filtering, which uses similarities between users and paintings simultaneously to provide recommendations. Let us give you a simple example.

Person A: “I like Painting 1 and 2”

Person B: “I like Painting 1”

System: ‘Person B may like Painting 2 too!’

Thanks, Emergent!

To implement this, we distributed a survey asking people to indicate “like” or “dislike” for 100 randomly picked paintings. In total, we collected 22 responses (thanks to our EMERGENT community).

Binary encoding

Next, we encode “like” as 1 and “dislike” as 0. Hence, the 22 users’ responses form a single binary table (22 raters × 100 paintings).
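Here is a minimal sketch of the binary encoding. It assumes each survey response is stored as a hypothetical dictionary from painting ID to "like" or "dislike". With the real survey, the result would be a 22 × 100 matrix.

```python
import numpy as np


def encode(responses, paintings):
    """Build a raters x paintings matrix with like = 1 and dislike = 0."""
    return np.array(
        [[1 if answer[p] == "like" else 0 for p in paintings] for answer in responses],
        dtype=int,
    )


# Hypothetical survey answers from two raters on three paintings.
responses = [
    {"p1": "like", "p2": "dislike", "p3": "like"},
    {"p1": "dislike", "p2": "like", "p3": "like"},
]
ratings = encode(responses, ["p1", "p2", "p3"])
print(ratings)
```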

Binary encoding for the current user

The current user’s choices on “Canvas Cupid” are encoded the same way (like = 1, dislike = 0). What’s next? How does the system recommend a painting?

User-User similarity

Some mathematics comes up here, but there is no need to go through the equations. Simply put, we are interested in “User-User similarity” and used a modified Jaccard Index for this task. In other words, we measure the similarity between each of the 22 reference users and the current user. In the example above, the current user is most similar to “Rater 5”!
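The post does not spell out how the Jaccard Index was modified. This sketch therefore uses the plain Jaccard index on liked paintings, J(u, v) = |L_u ∩ L_v| / |L_u ∪ L_v|. It only compares paintings the current user has already rated, and that restriction is our own assumption. The small rating matrix is hypothetical.

```python
import numpy as np


def jaccard_similarity(user_likes, raters, seen):
    """Jaccard index between the current user's likes and each rater's likes.

    user_likes: 0/1 vector over all paintings (current user; 0 where unseen)
    raters:     raters x paintings 0/1 matrix from the survey
    seen:       boolean mask of paintings the current user has rated (assumed restriction)
    """
    u = user_likes[seen].astype(bool)
    r = raters[:, seen].astype(bool)
    intersection = (r & u).sum(axis=1)
    union = (r | u).sum(axis=1)
    # Similarity is 0 when neither side liked any of the compared paintings.
    return np.divide(intersection, union, out=np.zeros(len(raters)), where=union > 0)


user = np.array([1, 0, 1, 1, 0])
raters = np.array([[1, 0, 0, 0, 1],
                   [1, 0, 1, 1, 0],
                   [0, 1, 0, 0, 1]])
seen = np.array([True, True, True, True, False])
sims = jaccard_similarity(user, raters, seen)
print(sims, "most similar rater:", int(sims.argmax()))
```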

User-Item Prediction

Next, we move on to “User-Item Prediction” using the formula shown above, and we see that “Unseen Painting 3” would be the best recommendation!
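The prediction formula only appears in the original figure. This sketch therefore substitutes the standard similarity-weighted average from user-based collaborative filtering, pred(u, i) = Σ_v sim(u, v) · r(v, i) / Σ_v sim(u, v). It may differ from the team's exact formula. The similarities and ratings are hypothetical.

```python
import numpy as np


def predict_scores(similarities, raters, seen):
    """Similarity-weighted average of the raters' 0/1 answers for every painting.

    similarities: User-User similarity between the current user and each rater
    raters:       raters x paintings 0/1 matrix
    seen:         boolean mask of paintings the current user already rated
    """
    total = similarities.sum()
    # Normalise the weights; fall back to a plain average if every similarity is 0.
    weights = similarities / total if total > 0 else np.full(len(similarities), 1 / len(similarities))
    scores = weights @ raters  # one score in [0, 1] per painting
    scores[seen] = -np.inf     # only unseen paintings can be recommended
    return scores


raters = np.array([[1, 0, 1, 0, 1, 0],
                   [1, 1, 1, 1, 0, 0],
                   [0, 1, 0, 1, 0, 1]])
seen = np.array([True, True, True, False, False, False])
scores = predict_scores(np.array([1.0, 0.5, 0.0]), raters, seen)
print(scores, "best unseen painting:", int(scores.argmax()))
```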

Here is the pseudo-code of this recommender system:

  • Save the IDs of the images liked by a user
  • Compute the “User-User similarity”
  • Determine the most similar rater
  • Compute the “User-item prediction”
  • Show the one with the highest prediction score
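The five steps above can be combined into one function. This sketch makes the same assumptions as before: a plain Jaccard index over the paintings the user has seen stands in for the team's modified version, and a similarity-weighted average stands in for the prediction formula. The painting IDs and survey matrix are hypothetical.

```python
import numpy as np


def recommend_cf(liked_ids, seen_ids, painting_ids, raters):
    """User-based collaborative filtering; returns (most similar rater, recommended ID)."""
    col = {pid: j for j, pid in enumerate(painting_ids)}
    seen = np.zeros(len(painting_ids), dtype=bool)
    seen[[col[p] for p in seen_ids]] = True
    user = np.zeros(len(painting_ids), dtype=bool)
    user[[col[p] for p in liked_ids]] = True
    # User-User similarity: Jaccard index on the paintings the current user has rated.
    r = raters[:, seen].astype(bool)
    inter = (r & user[seen]).sum(axis=1)
    union = (r | user[seen]).sum(axis=1)
    sims = np.divide(inter, union, out=np.zeros(len(raters)), where=union > 0)
    most_similar = int(sims.argmax())
    # User-Item prediction: similarity-weighted average of the raters' answers.
    weights = sims / sims.sum() if sims.sum() > 0 else np.full(len(sims), 1 / len(sims))
    scores = weights @ raters
    scores[seen] = -np.inf
    if not np.isfinite(scores).any():
        return most_similar, None  # nothing left to recommend
    return most_similar, painting_ids[int(scores.argmax())]


painting_ids = ["p1", "p2", "p3", "p4", "p5", "p6"]
raters = np.array([[1, 0, 1, 0, 1, 0],
                   [1, 1, 1, 1, 0, 0],
                   [0, 1, 0, 1, 0, 1]])
print(recommend_cf(["p1", "p3"], ["p1", "p2", "p3"], painting_ids, raters))
```

In the app, the liked IDs would be the five paintings the user gave a thumbs up, and `raters` would be the 22 × 100 survey table.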
Simple Demonstration

The video demonstrates how the systems operate. Again, all you have to do is like or dislike the painting on the screen. Once you have accumulated 5 likes, we recommend 4 different paintings using the four recommender systems. Cool! Let’s check out the compact summary.

Idea

  • “Canvas Cupid” is to recommend a painting that meets your taste.

Recommender Design

Image-based

  • VGG 19 for feature extraction
  • PCA + K-means clustering
  • The elbow method + silhouette coefficient for parameter tuning

Content-based

  • Text Vectorization
  • Cosine Similarity

NLP

  • Few-shot learning
  • GPT-2 & StableDiffusion

Collaborative Filtering

  • Binary encoding
  • User-User similarity (modified Jaccard Index)
  • User-Item prediction
Would you trust “Canvas Cupid”?

Our aim for the hackathon was very simple: “What can Data Science do?” We wanted to showcase multiple aspects of Data Science and AI and show people that there are so many fascinating techniques that can help in their lives!

So, may we ask you one question?

“Can Canvas Cupid help you find your PERFECT MATCH?”

Would you like to find out more? You may send us a request to check our code. We will process your request and give you access to the private Git repository.

Would you like to see the full slide deck? Check this out (Anja, your hard work deserves to be recognized):

Canvas Cupid Slide Deck

Kudos to all members of “The Four Emergenteers”! Would you like to network? Please find our LinkedIn profiles:

Abel Kempynck | Anja Deric | Gunho Lee | Senne Colsen

The Four Emergenteers (check the beautiful Emergent team sweater)

Last but not least, we would like to thank Emergent Leuven for all the support behind us! We are recruiting new members to keep running the BEST (yes, you just found out how good we are!) Data Science & AI student organization in Leuven ❤ Do not hesitate to reach out to us!

❤ Emergent Leuven ❤
