How to run a hackathon for data in the age of the modern data stack
February 15, 2022
Photo courtesy of NASA
A Datathon, or hackathon for data, is a time-constrained event where people form teams to collaborate intensively on data-related projects. Datathons are a great way to inject creativity and fun into a data team’s work, explore unusual ideas, and identify promising avenues for further work or investigation. They are an R&D function— an ideation tool with team bonding and fun built in.
This is Hex’s heartbreak-free guide to running a great datathon in the age of the Modern Data Stack. To be completely honest, the Modern Data Stack doesn't have that much to do with this; I just like to say 'Modern Data Stack'.
Up front plug: We are writing this because we think Hex is just about the best tool for running a datathon that you could hope for. With Hex, teams can easily access data, collaborate in real time in Python and SQL, and effortlessly share their work.
Even if you read no further, if you are interested in using Hex to host a datathon, please reach out and we’ll give you free access.
About this guide
Hackathons and datathons are very interesting and rewarding events to run. They're a bit chaotic and seat-of-the-pants by design, so planning a great event that feels organic yet planned in all the right ways is easier said than done.
This guide provides high-level advice built for any datathon, but assumes you’re running an internal datathon for the organization or team you’re a part of. Much of this advice still holds for external customer-facing events, but not all of it.
This guide is perhaps a bit longer than you anticipated at first click! There are just too many short SEO-grab posts out there that say the word "hackathon" a bunch but don't actually provide much useful information. I hope that you find this a bit more substantive— I have organized many datathons and hackathons over the years, and made plenty of mistakes so that you don’t have to.
Keep reading and you will find clear and specific do's & don'ts for organizing a datathon, not vague thought leadership about their value.
As organizer of a datathon, you have 5 key responsibilities:
This is about walking the line and finding the just-right amount of structure. You need to make your datathon enticing, exciting, and inclusive— and crucially, make it clear that attendees won’t be expected to keep up with normal work.
Market your event! Just because it’s your team doesn’t mean everyone will automatically show up. Make flyers, hype it up in Slack channels, show off or tease prizes.
Make sure everyone knows what technical expertise is expected. Also consider hosting some learning sessions as part of the event to make it more approachable and valuable.
Clear normal work responsibilities for the duration. Well, as much as reasonably possible.
Don’t make it too long. I’m a fan of a 2-day event— any longer and the outside world of work stress starts building up too much.
Don't over-plan every tiny little detail. The sparkle of chaos and creativity that makes hackathon-type events what they are happens when there’s room to flex and when things don’t go precisely according to plan.
Don't make it too competitive. More on this later, but you want to find a balance of fun & inspiring prizes that don't make everything feel stressful.
Over-communicate expectations and information
You should be checking in with clear status updates starting 2 weeks before and until 2 weeks after your datathon. Be obnoxiously over-communicative and clear about deadlines, schedules, and how excited you are.
Set up a Slack channel for all communications. Or the equivalent, if you don't use Slack. But really, you need some real-time messaging. Don't rely on email.
Give your datathon a theme or overarching goal. This helps narrow focus and make projects productive. Bonus points if there are specific categories, or even specific questions that you’d like to see solved.
Give a great kickoff speech. This is really your last chance to set the energy of the event before everyone starts scrambling about. It should be informative, inspiring, and fun.
Before, be very clear about: Timing, schedule, goals/theme, project evaluation, team structure, and prizes.
During, be very clear about: What people should be doing at any given moment, how much time is left, where to get help/pizza.
Don’t overload people with too much information. Over-communicating in this context means being relentlessly clear about expectations and timelines, and repeating information— not sending around a 10-page event document with three appendices.
Don’t freak out if things seem quiet. If everything is going well, everyone will start completely ignoring you after the kickoff session. There’s always a moment (especially during a remote event when you can’t see teams working) where you think “oh s*#% nobody’s doing anything”. This will pass!
Don’t bother teams too much while they’re working. In my experience, while it’s tempting to do little games or polls in the middle of “hacking time”, you’ll never get the response you’re looking for and ultimately, you only want to do them to calm your nerves. People are now doing cool things by themselves— let them work. Circulate, help out, obtain food and drink for hackers, and keep communication to tactical things & the occasional fun status update.
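If you want every announcement to carry the time remaining, a tiny helper can build the message for you. This is only a sketch: the function name `time_left_message`, the wording of the message, and the event times are all made up for illustration, and you would still paste the result into Slack (or wherever) yourself.

```python
from datetime import datetime, timedelta


def time_left_message(now: datetime, deadline: datetime, note: str = "") -> str:
    """Build a short status line that always states the time remaining."""
    remaining = deadline - now
    if remaining <= timedelta(0):
        return "Time is up! Submissions are closed."
    # Round down to whole minutes so the message stays short.
    total_minutes = int(remaining.total_seconds() // 60)
    hours, minutes = divmod(total_minutes, 60)
    prefix = f"{note} " if note else ""
    return f"{prefix}{hours}h {minutes:02d}m left until submissions close."


# Hypothetical event times, for illustration only.
deadline = datetime(2022, 3, 4, 16, 0)
print(time_left_message(datetime(2022, 3, 4, 12, 30), deadline, "Pizza's here!"))
```

Passing `now` in explicitly, rather than calling `datetime.now()` inside the function, keeps the helper easy to test and sidesteps time zone surprises for remote teams.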
Provide access to relevant data and tools
Data-wise, the gold standard here is to prepare & clean relevant data and index it in some kind of data dictionary.
Tool-wise, if you’re a data team then you probably have your preferred tools already— Use them! That said, datathons can also be a fun way to explore using new tools.
Hex (you knew this was coming!) is an outstanding tool for datathons with technical audiences. Even if you don’t already use Hex, it makes it easy to load in datasets and give everyone access, collaborate on team projects, and share final results. It also makes it easy for SQL-first folks to work on projects with R or Python-heavy data scientists.
Also, pizza. Or something pizza-like (see: everyone feels welcome). Even if you’re working remotely you should order your hackers some ‘za, or give them a gift card / reimbursement. It’s basically a required tool.
Prepare & clean relevant data as much as possible.
Create a data dictionary and example projects. For some, this might just mean a Google doc or Notion page. If folks will be working in Hex, it’s easy to make a quick data dictionary app in Hex that indexes the relevant data.
Collect an example library of “seed questions” and project ideas. Teams rarely take these ideas outright, but they are often great catalysts for tangential ideas and help guide projects toward the event's goals.
Make sure everyone has used the recommended tools before. If using new tools, or opening up to a wider audience, make sure everyone is aware of what they’re getting into.
Double-check that any API keys, invitations, and group permissions are properly configured. Because there's no chance you did this right the first time.
Don’t encourage people to dig up their own dirty data and clean it. Some will still do this, but in my experience the time constraints of a datathon mean that projects that spend all their time on ETL and data cleaning ultimately have less impressive demos than those focused on analyzing existing datasets.
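A data dictionary doesn't need to be fancy. As a rough sketch of the idea, the snippet below builds one from a sample row per table and a handful of hand-written descriptions. Everything in it is an assumption for illustration: the `build_data_dictionary` helper, the `orders` table, and its columns are hypothetical, and in practice you would pull sample rows from your own warehouse.

```python
def build_data_dictionary(tables, descriptions=None):
    """Return one entry per column, using the first row of each table as a sample.

    tables: dict mapping table name -> list of row dicts
    descriptions: optional dict mapping (table, column) -> human-written text
    """
    descriptions = descriptions or {}
    entries = []
    for table_name, rows in tables.items():
        if not rows:
            continue  # nothing to sample from an empty table
        for column, value in rows[0].items():
            entries.append({
                "table": table_name,
                "column": column,
                "type": type(value).__name__,
                "example": value,
                # Flag undocumented columns so they are easy to spot.
                "description": descriptions.get((table_name, column), "TODO: describe"),
            })
    return entries


# Hypothetical sample data, for illustration only.
tables = {"orders": [{"order_id": 1, "amount": 19.99, "country": "US"}]}
descriptions = {("orders", "amount"): "Order total in USD"}

for entry in build_data_dictionary(tables, descriptions):
    print(f"{entry['table']}.{entry['column']} ({entry['type']}): {entry['description']}")
```

The "TODO: describe" placeholder is deliberate: it gives you a checklist of columns that still need a human explanation before the event starts.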
Create a clear process for submitting and judging project artifacts
Communication around project submission & judging should start long before the event and continue throughout. Some eschew prizes for being unnecessarily competitive, but I personally appreciate a couple of light prizes as part of the friendly competition that makes a datathon a “thon”. Consider non-monetary prizes or some “everyone’s a winner” swag to commemorate the event, too.
Keep deadlines top-of-mind. Every time you announce anything (pizza’s here!) note how much time teams have left, any reminders about demo recordings or presentations, and link to the submission form.
Be extremely clear about the submission, judging, and awards process. Announce who the judges are ahead of time. Pick a diverse cast of fun judges! This means people who do not look like you or do the same job as you.
Have teams do short 3-5 minute demos for everyone, if possible. This is the most fun part of the event: you get to see what everyone worked on. Make sure they get the hype they deserve!
If you have too many teams, you may want to have them all submit videos instead and let preliminary judges select a final cohort to do live demos.
Invite the entire organization to watch demos! Make sure the CEO or some execs attend and ask questions. Better yet, invite them to be judges.
Don’t make any prizes too lavish. These should be fun tokens of a job well done and a fun time had, not something that makes non-winners feel bad.
Don't make judging a black box. Publish your criteria or rubric, and how points are weighted.
Don’t make winning everything. Even though having winners adds some healthy competition and energy to the event, everyone should feel like their work was valuable and worthwhile.
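To make "publish your rubric and how points are weighted" concrete, here is a minimal sketch of a weighted score. The criteria names, the weights, and the 1-5 scale are invented for this example, not a recommended rubric; swap in whatever your judges actually agree on.

```python
# Hypothetical rubric: criterion -> weight. These happen to sum to 100,
# but the function normalizes by the total, so any positive weights work.
RUBRIC_WEIGHTS = {"insight": 40, "creativity": 30, "presentation": 20, "fun": 10}


def weighted_score(scores, weights=RUBRIC_WEIGHTS):
    """Combine per-criterion scores (e.g. 1-5) into one weighted average."""
    missing = set(weights) - set(scores)
    if missing:
        raise ValueError(f"Missing scores for: {sorted(missing)}")
    total_weight = sum(weights.values())
    return sum(scores[name] * weight for name, weight in weights.items()) / total_weight


# One judge's hypothetical scores for one team.
team_scores = {"insight": 4, "creativity": 5, "presentation": 3, "fun": 5}
print(round(weighted_score(team_scores), 2))
```

Sharing something this simple ahead of time lets teams see exactly how a great presentation trades off against a deep insight, which goes a long way toward keeping judging from feeling like a black box.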
Follow through on promising projects
Remember, a datathon is basically just an over-the-top brainstorming session. Make sure to treat it as such and follow through with interesting avenues of further exploration. This is also, in many ways, the ultimate prize— if a team did a great project in 2 days, giving them a month to dive deeper is a pretty cool reward.
Send a recap email after the event. Highlight projects, relive some of the fun, and mention what work will be prioritized as a result of the datathon.
Prioritize time for deeper investigation on promising projects.
Make sure all projects are collected and archived somewhere safe. They’ll be useful as examples for next time, if nothing else.
Don’t feel you have to follow through with everything. Pick just a few projects to really invest time in.
Don't force it. Didn't wind up with anything mind-blowing or worth investing more cycles in? Rather than waste time forcing further investigation on something not worthwhile, think about what you could have done differently to have more success next time.
If you actually read all this, then you're well on your way to hosting a great datathon. There are still lots of fine details to figure out and planning to be done, but if you keep those 5 basic responsibilities in mind, you'll have a fun and worthwhile event.
Oh, and remember to have fun! Not every project needs to focus on core business data for the event to go well. For every earth-shatteringly genius demo that will change your organization's bottom line, there's a comic relief project that's just as beneficial, if not more so, to team morale and the overall success of your datathon.
Hex is an outstanding tool for datathons with technical audiences. Let us know you want to host a datathon and we’ll give you free access, no strings attached.