It’s been a while since I’ve written something for Technically Speaking. My apologies; I’ve kept busy with a dozen or so projects, two of which I’d like to present today (I’ll save the others for later posts, once I kick off my Bots series).
First off, Technically Speaking now offers a free Hastebin instance, available at https://hastebin.technicallyspeaking.tech/. For those who don’t know, Hastebin is an open-source version of Pastebin, a quick and easy way to share text or code with others. Simply visit the link above, paste your text into the web page, and click “Save.” Your file will be encrypted and stored securely on my personal server for 30 days (I’m not snooping, I promise). A random link is also generated, which can be used to access the file. Any time the file is modified and saved, a new link is generated, making it extremely difficult for someone to brute-force their way into finding a particular file.
In addition to the encryption enforced by the Hastebin software, the sub-domain is also secured with an SSL certificate from Let’s Encrypt (noted by the HTTPS in the link and the padlock icon in your URL bar).
Feel free to use the Hastebin as you see fit.
Secondly, I’d like to formally announce another publicly available sub-domain (this one from my personal website), https://gdot.nathandharris.com/, for archived photos from Georgia Department of Transportation traffic cameras.
This one requires a bit of a backstory and explanation. Last week, I witnessed a hit-and-run crash right in front of me along the highway. I pulled over to help the driver and give witness testimony to law enforcement. Since the other vehicle that struck the driver continued on after the accident, there is likely little investigators can do to identify and charge the other driver for the incident. A member of the driver’s family pointed across the highway to a handful of traffic cameras, saying they could get the footage from them to identify the vehicle. Unfortunately, as I told them and law enforcement confirmed, those cameras only took photos every 20 minutes or so, rather than recording video, and it was unlikely the moment of the crash was captured on film. Even worse, any new photo from the camera replaces the old one, meaning there’s no historical record to dig through.
That got me thinking: I understand that the Georgia Department of Transportation uses the cameras mainly for live or near-live monitoring, but the fact that there is no way to revisit a moment in time seems ridiculous.
So, I built one.
I won’t go line-by-line through my code (the entire downloading and processing system only took approximately 40 lines of Python scripting), but I’ll give a basic overview of how the system works. If you’re interested in the full code, it’s up on my GitHub page here.
Images from the GDOT cameras are publicly available through the department’s website. My Python script visits the link for each camera’s photo and downloads the image, organizing the photos by date, highway and time. Because of the naming structure for each file, multiple photos from the same camera should appear next to each other chronologically in the directory listing.
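To make the naming idea concrete, here is a minimal sketch of one layout that would sort that way. It is not the actual script: the folder structure, the `photo_path` and `download` names, and the camera ID format are all assumptions for illustration, and it uses the standard library's `urllib` for the download step rather than the wget library.

```python
from datetime import datetime
from pathlib import Path
from urllib.request import urlopen


def photo_path(root: Path, highway: str, camera_id: str, taken: datetime) -> Path:
    """Build root/date/highway/camera_time.jpg (a hypothetical layout)."""
    day_dir = root / taken.strftime("%Y-%m-%d") / highway
    # Camera ID first, then a zero-padded 24-hour time, so an alphabetical
    # directory listing is also chronological for each camera.
    return day_dir / f"{camera_id}_{taken.strftime('%H%M')}.jpg"


def download(url: str, dest: Path) -> None:
    """Fetch one camera image and save it, creating folders as needed."""
    dest.parent.mkdir(parents=True, exist_ok=True)
    with urlopen(url, timeout=30) as response:
        dest.write_bytes(response.read())
```

With this layout, two photos from the same camera taken at 9:05 and 9:20 would be named `cam042_0905.jpg` and `cam042_0920.jpg`, and would sit side by side in the listing for that day and highway.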
Now, there are close to 500 cameras spread out across the three major interstates, I-75, I-85 and I-285, in the Atlanta area, plus GA 400, a popular state route, and a stretch where I-75 and I-85 merge in the heart of Atlanta. As I said earlier, the cameras usually take a new photo every 15 to 20 minutes; assuming I’m catching every single new photo by scanning every 15 minutes, that’s four photos per camera, or approximately 2,000 photos every hour.
Not that I’m crunched for space (this sub-domain runs on my beefy home server, rather than the Raspberry Pi where most of my web sites, including this blog, are hosted), but that’s a lot of photos, and a lot of potentially wasted space. So, there’s a bit of post-download processing that happens for each new photo.
First off, each photo is compared pixel-by-pixel to the previous one from the same camera, to see if the new photo truly is new. There’s no sense in keeping the same photo twice, so the duplicate is immediately deleted.
Secondly, a large number of the cameras are unavailable; rather than a picture of the highway, the link to the camera’s image returns a placeholder image saying a snapshot from the camera is not currently available. This is also useless for my purposes, so any new photo is compared to the placeholder photo and deleted if it matches.
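The two checks above can be sketched in a few lines. Note the swap: this sketch compares the files byte for byte with the standard library's `filecmp`, which is a stricter stand-in for the pixel-by-pixel comparison described here (two images with identical pixels but different encoding would not match). The function name and arguments are hypothetical.

```python
import filecmp
from pathlib import Path
from typing import Optional


def keep_or_delete(new_photo: Path, previous: Optional[Path], placeholder: Path) -> bool:
    """Delete new_photo if it repeats the previous photo or the placeholder.

    Returns True when the photo was kept, False when it was deleted.
    """
    for reference in (previous, placeholder):
        if reference is None or not reference.exists():
            continue  # first photo from this camera, or no placeholder on disk
        # shallow=False forces a comparison of the actual file contents
        if filecmp.cmp(new_photo, reference, shallow=False):
            new_photo.unlink()
            return False
    return True
```

Running both checks in one pass means each downloaded file is either kept as a genuinely new photo or removed right away, so nothing useless lingers in the archive.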
In the end, while perhaps only half of the photos downloaded each round are kept, those that are kept are unique, actual photos of the highways rather than duplicates or placeholders. After twelve hours of downloading every 15 minutes, I’m looking at just under 1 GB of images. At that rate, I could store an entire year’s worth of images in less than 1 TB of space.
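For anyone who wants to check the math, here is the back-of-the-envelope version. The inputs are the rough figures from this post (about 500 cameras, a scan every 15 minutes, and "just under 1 GB" per twelve hours rounded up to a full 1 GB), so the result is an upper-end estimate rather than a measurement.

```python
CAMERAS = 500               # "close to 500", rounded up
SCAN_INTERVAL_MINUTES = 15
GB_PER_TWELVE_HOURS = 1.0   # "just under 1 GB", rounded up

scans_per_hour = 60 // SCAN_INTERVAL_MINUTES
downloads_per_hour = CAMERAS * scans_per_hour

gb_per_day = GB_PER_TWELVE_HOURS * 2
gb_per_year = gb_per_day * 365

print(f"{downloads_per_hour} downloads per hour")
print(f"{gb_per_year:.0f} GB per year")
```

That works out to 2,000 downloads an hour and roughly 730 GB a year, which leaves some headroom under the 1 TB mark.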
The downloading-processing code was rather simple to write, thanks to the diffimg Python library, which makes comparing images simple, as well as the wget Python library, which helps with downloading the photos from their links. In fact, getting the links to each individual camera photo was perhaps the toughest part. At first, I was going to manually right-click and copy the URL for each photo from GDOT’s website, but eventually I decided that I could probably write a program to grab the links in one fell swoop in less time.
I’m not sure if I ended up saving time (I’d like to think that I wrote the program quickly enough that I did), but after a bit of exploring the GDOT website’s source code, I was able to cobble together a web scraper using BeautifulSoup that visits each page and prints out the link to each image (pre-formatted with the quotation marks and commas I needed for the Python script) into the console. Then, it was just a matter of copying and pasting the list into dictionary items in the main Python script. The entire script is nearly 600 lines, but approximately 550 of those lines are the dictionaries with all the camera links in them.
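The link-gathering step can be sketched roughly as follows. This is not the BeautifulSoup scraper itself: to stay dependency-free it uses the standard library's `html.parser` instead, and it assumes the camera photos appear as ordinary `<img>` tags, which may not match how GDOT's pages are actually built. The class and function names are made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImageLinkParser(HTMLParser):
    """Collect the src attribute of every <img> tag on a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                # Resolve relative paths against the page's own address
                self.links.append(urljoin(self.base_url, src))


def format_links(html, base_url):
    """Return one quoted, comma-terminated link per line, ready to paste."""
    parser = ImageLinkParser(base_url)
    parser.feed(html)
    return "\n".join(f'"{link}",' for link in parser.links)
```

Feeding a page's HTML to `format_links` and printing the result gives a block of lines that can be pasted straight between the braces or brackets of a Python collection.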
After learning how to set up an Nginx web server in a Docker container on unRAID, I soon had the website up and running, and the Python script set to run every 15 minutes through a cron job. The result: A chronological, historical list of photos from Georgia highway traffic cameras.
But what’s the point? Well, there’s not much of one, at least for me. I have no real use for the images; the code was a fun challenge and proof-of-concept for me, but I have no need for the end result. Nevertheless, I’ll keep it running and publicly available, in case someone does.
In what situation would this be necessary? Well, one hypothetical I used to explain it to a friend is news reporting. Two years ago, a tiger got loose in Henry County, just south of Atlanta, late at night and wandered around on I-75. Now, there were plenty of photos of the incident from eyewitnesses and law enforcement officials, but say there weren’t. Say the only photographic evidence of the incident is a photo taken by the traffic camera. Or, to bring it back to what inspired this project, imagine a traffic accident like a hit-and-run, where photographic evidence can help an investigation. Now imagine that that photo is deleted, replaced with a new one, that data lost forever. If only there was something out there keeping a historical record of those photos.
Well, now there is.