If you are bored this weekend, here’s an interesting small project.
Hear me out: what if, from the JSON file, you iterate over each project and scrape its webpage (please add a sensible delay between calls so you don’t stress Google’s servers), take the title and the description from there, and use the YouTube API to get each video’s duration as well (I think @ehsa_293 already did this).
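A rough sketch of the polite scraping loop, in Python. The JSON layout (a list of objects with a `url` key), the 2-second delay, and the helper names are all assumptions for illustration; adjust them to whatever the real file looks like.

```python
import json
import time
from typing import Callable, Iterable, Iterator
from urllib.request import urlopen


def load_project_urls(json_text: str, url_key: str = "url") -> list[str]:
    """Return page URLs from the projects JSON (assumed: a list of objects)."""
    projects = json.loads(json_text)
    return [p[url_key] for p in projects if url_key in p]


def fetch_html(url: str) -> str:
    """Download one page as text. Swap in any HTTP client you prefer."""
    with urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")


def fetch_all(
    urls: Iterable[str],
    fetch: Callable[[str], str] = fetch_html,
    delay_seconds: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[tuple[str, str]]:
    """Yield (url, page) pairs, pausing between consecutive requests."""
    for index, url in enumerate(urls):
        if index > 0:
            sleep(delay_seconds)  # wait before every request except the first
        yield url, fetch(url)
```

`fetch` and `sleep` are parameters so you can dry-run the loop with fake functions before pointing it at the real site.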
From that data, scrap the videos shorter than 30 seconds (those longer than 3 minutes are not technically disqualified though, BUT I would also scrap those over 5 minutes).
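The duration cut could look something like this. The 30-second and 5-minute limits are the ones mentioned above; the `durationSeconds` key is a made-up name for wherever you stored the scraped duration.

```python
MIN_SECONDS = 30        # drop anything shorter than this
MAX_SECONDS = 5 * 60    # drop anything longer than this


def keep_video(duration_seconds: float) -> bool:
    """True if the video is neither shorter than 30 s nor longer than 5 min."""
    return MIN_SECONDS <= duration_seconds <= MAX_SECONDS


def filter_projects(projects: list[dict], duration_key: str = "durationSeconds") -> list[dict]:
    """Keep only projects whose video duration passes keep_video()."""
    return [p for p in projects if keep_video(p[duration_key])]
```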
Now here’s an interesting extra step you could also do: from each video, extract the audio and pass it through a speech-to-text AI to get that data too. At the end you should have a CSV file with:
title | shortDescription | description | category | videoTranscription
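Writing those columns out with Python’s standard `csv` module might look like this. It is only a sketch: the column names come from the list above, while the function name and the choice to leave missing fields empty are my own assumptions.

```python
import csv
import io

COLUMNS = ["title", "shortDescription", "description", "category", "videoTranscription"]


def rows_to_csv_text(rows: list[dict]) -> str:
    """Serialize project rows to CSV text with a header line.

    Missing fields are written as empty strings; unknown keys are ignored.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, restval="", extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

When writing straight to disk instead, open the file with `newline=""` so the `csv` module controls the line endings.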
Now comes the fun part: take the competition’s rules (the whole thing) and put them into a PDF/doc file, then feed that and the CSV to the 3 main LLMs (Gemini, ChatGPT and Claude) with a clever prompt asking them to act as judges and choose the winners based on the rules file.
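One possible way to assemble that prompt as plain text, so the exact same input can be pasted into each of the three models. The wording and section markers are just a starting point I made up, and no model API is called here.

```python
def build_judge_prompt(rules_text: str, entries_csv: str) -> str:
    """Combine the rules and the entries CSV into a single judging prompt."""
    return (
        "You are acting as a judge for this competition. "
        "Apply only the official rules given below.\n\n"
        f"=== RULES ===\n{rules_text.strip()}\n\n"
        f"=== ENTRIES (CSV) ===\n{entries_csv.strip()}\n\n"
        "For each category, name the winning entry by title and justify the "
        "choice by pointing to the relevant rule."
    )
```

Using one identical prompt for all three models keeps the comparison between them fair.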
I think it will be super interesting for us to discover cool projects among the huge number on the list, and also to compare the results between the LLMs, and between them and the actual official results!
I wouldn’t recommend this, as it’d introduce a sort of bias. A validator is definitely useful; however, going towards direct judging, even if generated, will introduce unwanted bias, which is against the randomization Google is going for.
Yeah, that’s why I added it as an extra step. It would be around 150 USD according to my estimates using Google’s speech-to-text API, BUT there are very good local open source speech-to-text projects you could use to make this essentially free.
I was going to do that a few days ago. You also have to consider a rating system and manage the ratings and eliminations properly (mostly some math), but then I realized I’m already 500 dollars in debt to the Gemini API lmao. Although, I was going to make it choose three potential winners for each category and then compare them to the judges’ choices. Also, I didn’t use any YouTube API to get the durations. I showed the code for how I did it: I found the specific class name responsible for the length of the video in the HTML file, and the rest was done with Selenium.
I believe there is a set of CLI tools that will parse the transcript from a YouTube video. Might be Fabric, but I’m on vacation right now and don’t have the mind for it.
Just noticed now that I can do it for free without using any APIs, only Tampermonkey. It would just take about 5 hrs for it to finish (5 secs per entry).
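For a quick sanity check on that estimate: about 5 hours at 5 seconds per entry works out to roughly 3,600 entries. That count is only implied by the two figures above, not an official number.

```python
def total_hours(entries: int, seconds_per_entry: float) -> float:
    """Total run time in hours when entries are processed one after another."""
    return entries * seconds_per_entry / 3600


def entries_in(hours: float, seconds_per_entry: float) -> int:
    """How many entries fit into the given number of hours."""
    return int(hours * 3600 / seconds_per_entry)


print(entries_in(5, 5))      # 3600
print(total_hours(3600, 5))  # 5.0
```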
It’s useless, since most videos (as Google itself recommended) shouldn’t have a lot of speech in them; you should show rather than describe your product.