A software engineer specializing in security and architecture spent two weeks evaluating open-source tools built for refugees, students, and small businesses — and found that the gap between a polished demo and a production-ready system is almost always the same gap, repeated in different vocabulary across every project.
There is a particular kind of vulnerability that almost never appears in security audits, because security audits are usually commissioned after a product has reached the threshold where commissioning one makes economic sense. The vulnerability is what happens to software in the years before that threshold: when a project is built by volunteers, hosted on a free tier, used by people who cannot afford to be careful, and maintained — if at all — by whoever has the time that week. Civic technology lives almost entirely inside this gap. So does open-source software for social good.
Sofia Kalinina spent two weeks reviewing nine projects built during sudo make world 2026, an international 72-hour hackathon organized by Hackathon Raptors. The brief was simple: build open-source tools that help. The submissions ranged from a refugee onboarding platform to a small-business loss-prevention dashboard, from a YouTube lecture-to-study-notes converter to a 3D driving simulator that taught Indian road law as a karma system. What unified them, in Kalinina’s reading, wasn’t the social mission. It was a recurring set of architectural decisions that quietly determined whether each tool could survive the journey from working demo to deployed system without harming the people it was built to help.
“There is a moment in almost every civic-tech repository I look at,” Kalinina explains, “where you can see exactly when the team stopped thinking about security as an architectural concern and started thinking about it as a feature they would add later. The problem is, if you build that way, later never comes, because the project is already in production by then.”
When the API Key Lives in the Browser
The strongest entry in Kalinina’s batch was Refugee Ready, a multilingual onboarding platform built by Team Dua for newly arrived refugees navigating their first 72 hours in a host country. The project had real depth: clean UI, in-browser OCR for translating documents, a curated directory of Wi-Fi access and shelter information, and a deliberate accessibility-first design. Kalinina scored it 3.95/5.00, the highest weighted score in her batch.
But the architectural weakness she flagged was structural: AI calls were being made directly from the browser. “The in-browser OCR flow is a strong choice for accessibility and cost,” Kalinina wrote in her review. “But the AI calls should be moved to a backend so API keys and sensitive text aren’t exposed in the client.”
For most engineers, this reads as a routine optimization. For Kalinina — and for the population the tool was built to serve — it is something more serious. A refugee uploading a passport, a court summons, or a medical record is not generating data that should pass through a third-party AI inference endpoint with credentials embedded in the page source. The text being scanned may contain identifying information about family members still in danger. The API key being exposed may also be the key that handles other tenants’ traffic. In the worst case, both problems happen simultaneously.
“Civic tech designed for vulnerable users has to make decisions about backend architecture that consumer software gets to defer,” she observes. “When your user is a journalist or a refugee or a domestic abuse survivor, the threat model is not ‘a competitor steals your dataset.’ It is ‘the worst person in your user’s life finds out where they are.’ That changes the math on every component.”
Her recommendation — move the AI inference behind a backend, validate incoming POST requests, add basic authentication — is not exotic. It is the kind of advice a senior engineer would give a junior teammate during a code review. What makes it consequential is that civic tech projects almost never get that code review, because there is rarely a senior engineer in the room to give it.
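The shape of that recommendation can be sketched in a few lines. This is a hypothetical illustration, not code from Refugee Ready: the endpoint URL, the `AI_API_KEY` variable name, and the payload shape are all assumptions. The point is structural — the key lives in a server-side environment variable, and the client only ever talks to the proxy.

```python
import os

# Hypothetical sketch of a server-side proxy for AI inference calls.
# The API key stays in an environment variable on the server; the
# browser never sees it. URL and field names are illustrative.

MAX_TEXT_BYTES = 50_000  # reject oversized payloads before forwarding


def validate_request(payload: dict) -> str:
    """Basic input validation before anything is forwarded upstream."""
    text = payload.get("text")
    if not isinstance(text, str) or not text.strip():
        raise ValueError("missing or empty 'text' field")
    if len(text.encode("utf-8")) > MAX_TEXT_BYTES:
        raise ValueError("payload too large")
    return text


def build_upstream_request(payload: dict) -> dict:
    """Assemble the server-side request; the key never reaches the client."""
    api_key = os.environ.get("AI_API_KEY", "")
    if not api_key:
        raise RuntimeError("AI_API_KEY not configured on the server")
    return {
        "url": "https://api.example-ai-provider.com/v1/ocr",  # placeholder
        "headers": {"Authorization": f"Bearer {api_key}"},
        "json": {"text": validate_request(payload)},
    }
```

Wrapped in any web framework's route handler, this keeps both the credential and the scanned text out of the page source.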
When Documentation Becomes a Security Surface
The most striking finding in Kalinina’s evaluation wasn’t a missing test or an exposed credential. It was a date in a README file.
The Nurture Minds team submitted a polished mental health platform with thoughtful navigation, clear onboarding, and accessibility-first design touches. Kalinina rated its Impact & Vision 5/5 — the highest possible score. Then she rated its Innovation 1/5, and explained why in her review.
“Your README lists ‘v1.0.0 – Hackathon Release (January 2024)’,” she wrote, “which predates the February 27 to March 2, 2026 sudo make world window. Since the event allows libraries but disallows submitting old projects, this date creates avoidable ambiguity. Also, several headline ‘AI’ features in this repo behave like simulated/mock outputs rather than a fully implemented backend pipeline, so it would help to clearly label what is demo vs production-ready.”
This is a category of problem that very few engineers are trained to think about. A README file is not, in the conventional sense, a security artifact. It is a marketing document, a quickstart guide, a place to put badges. But Kalinina’s framing makes the case that documentation is an integrity surface. If a README claims AI features that are actually mock outputs, the gap between claim and reality becomes a place where users are silently misled about what is happening to their data. If a project claims a release date that predates the hackathon window, the integrity of the entire submission becomes ambiguous — not because the project is necessarily fraudulent, but because the ambiguity itself erodes trust in everything else the team says.
“Honesty in documentation is a security control,” she observes. “If your README says you have backend AI processing and your code uses static mocks, the user has no way to know what is real. That gap is exactly where harm happens — because the user makes decisions based on the documented behavior, and the actual behavior diverges in ways the user can’t see.”
Her recommendation to Nurture Minds was concrete: add a “hackathon disclosure” section explicitly stating what was built during the 72 hours and what was reused or forked. Label every AI feature as demo or production-ready. Align README messaging with current shipped behavior. The cost of these changes is minutes. The integrity dividend is permanent.
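A disclosure section of the kind she describes can be very short. The following is an illustrative sketch, not text from the Nurture Minds repository; the feature names are invented for the example:

```markdown
## Hackathon Disclosure

- Built during sudo make world 2026 (Feb 27 – Mar 2): onboarding flow,
  navigation, journaling UI.
- Reused/forked: UI component library (MIT), base auth scaffold.
- Feature status:
  - Mood tracking — production-ready
  - "AI insights" — demo only (static mock outputs, no backend inference)
```

The section takes minutes to write and removes every ambiguity Kalinina flagged.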
Privacy as an Architectural Decision, Not a Toggle
The most ethically charged project in Kalinina’s batch was RetailGuard-AI, submitted by Big dawgs — a pose-based loss-prevention dashboard for small retail businesses. The technical approach was deliberately privacy-aware: instead of using face recognition, the system extracted skeletal pose keypoints from video and ran a classifier on the resulting motion patterns. Pose-based detection is structurally less invasive than face identification. It doesn’t bind a behavioral signal to a specific person’s identity. It generalizes across stores without retraining on local faces. For a small business that cannot afford a dedicated security team, it was a thoughtful, defensible architectural choice.
Kalinina scored it 3.15/5.00 and used her comment to push the team further into the privacy-aware design space. “Pose-based signals can be more privacy-friendly than face identity and may generalize across stores,” she wrote. “To improve safety and reduce harm from false positives, blur faces by default in saved captures and reword the UI away from ‘shoplifter certainty’ (e.g., ‘suspicious event’). Adding a few smoke tests would significantly increase trust and reliability.”
The face-blur recommendation is a privacy-engineering pattern. The UI rewording recommendation is something rarer: it is a recognition that the language of a system shapes the actions its operators take. A dashboard that displays “shoplifter certainty: 87%” invites an operator to treat a probabilistic signal as a verdict. A dashboard that displays “suspicious event: 87% confidence” invites the operator to investigate. The difference between those two phrasings, in a small store with no legal team to draft policies, is the difference between an operator who confronts a customer and an operator who watches more carefully.
“In civic tech and small-business tech, you don’t get to assume there will be a layer of legal review between your model and its consequences,” Kalinina observes. “The UI is the policy. The wording on the screen is the rule the operator follows. You have to design it like you know that, because you do.”
Her smoke test recommendation — model load, single-frame inference, prediction output — is a 30-line addition. The blur-by-default change is a single function. The UI rewording is a string change. None of these would have moved her score significantly. But all of them, combined, would have changed the trajectory of the project’s first month in the wild.
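A smoke-test suite of the shape she describes fits in one short file. The sketch below is hypothetical: `PoseClassifier` is a stub standing in for RetailGuard-AI's real model, whose interface does not appear in the source. What matters is the three checks — model load, single-frame inference, sane prediction output.

```python
import random


class PoseClassifier:
    """Stand-in for the real pose classifier (hypothetical interface)."""

    def __init__(self, seed: int = 0):
        self._rng = random.Random(seed)
        self.loaded = True  # a real model would load weights here

    def predict(self, keypoints):
        """Score one frame of pose keypoints; reject empty input."""
        if not keypoints:
            raise ValueError("empty keypoint frame")
        return {"label": "suspicious_event", "confidence": self._rng.random()}


def run_smoke_tests():
    # 1. Model load
    model = PoseClassifier()
    assert model.loaded, "model failed to load"

    # 2. Single-frame inference on one 17-keypoint pose frame
    frame = [(0.1 * i, 0.2 * i) for i in range(17)]
    out = model.predict(frame)

    # 3. Prediction output is well-formed
    assert out["label"] in {"suspicious_event", "normal"}, "unexpected label"
    assert 0.0 <= out["confidence"] <= 1.0, "confidence out of range"
    return "ok"
```

Run on every commit, a check like this catches the failure modes that actually strand a volunteer maintainer: a model file that no longer loads, an inference path that silently changed shape.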
The Engineering Maturity Multiplier
A pattern that ran through Kalinina’s reviews — across DevGuard, Fixit, LetsStud, DriveWise, and others — was a request for what she called “engineering maturity.” This was not a demand for enterprise-grade infrastructure. It was a request for the small set of practices that determine whether a project survives contact with new contributors: a .gitignore file that doesn’t accidentally exclude the trained model artifact, a committed .env.example with placeholder values instead of a live .env full of secrets, a top-level LICENSE/NOTICE for projects that ported components from upstream sources with incompatible licenses, a single documented “happy path” for installation.
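The .env.example pattern costs a few lines. A hypothetical illustration of the pair — a committed placeholder file plus the ignore rule that keeps the real secrets out of version history (variable names are invented for the example):

```
# .env.example — committed to the repo, placeholder values only
AI_API_KEY=replace-with-your-key
DATABASE_URL=postgres://localhost/dev

# .gitignore — the live .env never reaches version control
.env
```

A new contributor copies .env.example to .env, fills in real values, and git never sees them.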
For DriveWise — the 3D driving simulator that ported components from DrivingSchool (MIT) and SUMO (EPL/GPL) — her recommendation was specific: “Tighten licensing and attribution. The repo doesn’t yet include a clear top-level LICENSE/NOTICE file. Also, there are no tests, which increases the risk of regressions as the game logic expands.”
This is the kind of advice that sounds like bureaucratic hygiene but is actually existential. Open-source projects that mix licenses without notice files generate legal exposure that can prevent the project from being adopted by any organization with a legal review process. Projects without tests cannot be refactored without breaking. Projects without .gitignore discipline accidentally publish secrets to GitHub, and bots scrape them within minutes of the commit.
“The reason engineering maturity matters in civic tech specifically,” Kalinina explains, “is that the people who will actually adopt your project are not your peer hackathon team. They are some volunteer at a refugee center who has never written Python. They are a small-business owner with a Chromebook. They cannot debug your install script, resolve your dependency conflicts, or rebuild your model from scratch. The discipline you put into reproducibility is the only thing standing between your project and a closed tab.”
Her DevGuard review made the same case in different words: “Make installation and environment setup fully reproducible and document it as a single ‘happy path.’ Some higher-level outputs are primarily derived from import-graph heuristics, so clarifying their scope would improve accuracy and trust. Adding a small suite of smoke tests plus stricter error handling would quickly raise the project’s engineering maturity.”
The phrase “raise the project’s engineering maturity” is what makes Kalinina’s reviews load-bearing. Maturity, in her usage, is not a level on a scale. It is a behavior: the willingness to do the small unglamorous things that allow other people to use your work without your help.
What the Batch Looked Like in the Aggregate
Across nine projects, Kalinina’s average score on Impact & Vision was 4.3 out of 5. Her average on Technical Execution was 2.8. Her average on Innovation was 2.9. Her average on Usability was 3.1. The shape of those averages is a portrait of an entire batch of open-source civic tech: high vision, modest execution, modest novelty, modest practical adoption readiness.
This is not a criticism of the teams. It is a description of the ceiling that hackathon civic tech runs into when it tries to ship — a ceiling defined less by the technical ambition of the participants than by the absence of the senior-engineer time it would take to bridge the gap between “it works on my machine” and “a refugee can install this and trust their data to it.”
“The teams that scored highest in my batch all had something specific in common,” Kalinina notes. “They had thought, before they started writing code, about who the user actually was. Refugee Ready knew that the user was scared and possibly under surveillance. DriveWise knew that the user was learning the rules of a road they had never driven on. The teams that scored lower had usually thought hard about the technology and lightly about the user. That gap is the entire game in civic tech.”
The takeaway from her two weeks of reviews is not a prescription for better hackathon code. It is a recognition that the most consequential work in open-source civic technology is the work that happens in the margins: the line in the README that admits what is real and what is mocked, the function that moves an API key out of the client, the string change that turns “shoplifter certainty” into “suspicious event,” the LICENSE file that lets a non-profit’s lawyers approve adoption. None of these things are visible in a demo. All of them determine whether the demo becomes a product, and whether the product helps anyone.
sudo make world 2026 — Open-Source Tools for Social Good was an international 72-hour hackathon organized by Hackathon Raptors from February 27 to March 2, 2026, with the official evaluation period running March 3–14. The competition attracted over 300 registrants and resulted in 26 valid submissions across six tracks: Education, Climate, Health, Civic, Tools, and Wildcard. Submissions were independently reviewed by a panel of 38 judges across three evaluation batches. Projects were assessed against five weighted criteria: Impact & Vision (35%), Technical Execution (25%), Innovation (20%), Usability (15%), and Presentation (5%). Hackathon Raptors is a United Kingdom Community Interest Company (CIC No. 15557917) that curates technically rigorous international hackathons and engineering initiatives focused on meaningful innovation in software systems.

