Yesterday on the Datasette Discord, Simon teased a new version of llm with multimodal capabilities. With one tiny command line tool, you can throw images at GPT-4o, Llama, Claude, and Gemini and ask for interpretations or details.

My mind snapped to one use case: fog forecasts. Using ALERTCalifornia’s Sutro Tower webcam, we can pipe a current image of Downtown San Francisco into GPT-4o and ask, “Is it foggy?” With zero tuning, this worked perfectly:

llm prompt -m gpt-4o "tell me if it's foggy in this image, reply on a scale from 1-10 with 10 being so foggy you can't see anything and 1 being clear enough to see the hills in the distance. Only respond with a single number." -a https://cameras.alertcalifornia.org/public-camera-data/Axis-SutroTower2/panogrid/latest-pg-2.jpg
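
If you want to do something with that number in a script, the reply has to be read back as an integer. Here is a rough Python sketch, assuming the llm CLI is installed and already has an OpenAI key configured. The function names (`build_fog_command`, `parse_fog_score`, `fog_score`) are made up for illustration, and the command uses only the `prompt`, `-m`, and `-a` arguments shown above.

```python
import re
import subprocess

CAMERA_URL = (
    "https://cameras.alertcalifornia.org/public-camera-data/"
    "Axis-SutroTower2/panogrid/latest-pg-2.jpg"
)

FOG_PROMPT = (
    "tell me if it's foggy in this image, reply on a scale from 1-10 with 10 "
    "being so foggy you can't see anything and 1 being clear enough to see the "
    "hills in the distance. Only respond with a single number."
)


def build_fog_command(image_url: str = CAMERA_URL, model: str = "gpt-4o") -> list[str]:
    """Return the same llm invocation as above, as an argument list."""
    return ["llm", "prompt", "-m", model, FOG_PROMPT, "-a", image_url]


def parse_fog_score(reply: str) -> int:
    """Pull the first integer out of the reply and check it is on the 1-10 scale."""
    match = re.search(r"\d+", reply)
    if match is None:
        raise ValueError(f"no number in model reply: {reply!r}")
    score = int(match.group())
    if not 1 <= score <= 10:
        raise ValueError(f"score out of range: {score}")
    return score


def fog_score() -> int:
    """Run the CLI (needs network access and an API key) and parse its output."""
    result = subprocess.run(
        build_fog_command(), capture_output=True, text=True, check=True
    )
    return parse_fog_score(result.stdout)
```

The parsing is deliberately strict: models do not always honor "only respond with a single number", so anything without a usable 1-10 value raises an error instead of silently becoming a forecast.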

But we can go further…

Why not grab the forecast from Weather.gov, too? Let’s feed that into the prompt as well and make the whole thing a bit more evocative:

Below is the weather forecast for Downtown San Francisco: 

 - Today: Sunny. High near 61, with temperatures falling to around 59 in the afternoon. West wind 5 to 14 mph, with gusts as high as 20 mph.
 - Tonight: Partly cloudy, with a low around 51. West wind 5 to 13 mph, with gusts as high as 18 mph.

Current local time: 2024-10-29 11:32:09

Review these two images and assess the weather, specifically looking for where any fog is, the clarity of the day, and more. The first image is a view of the city, looking North from Sutro Tower, towards the Golden Gate Bridge. The second image is a view of the city, looking Northeast from Sutro Tower, towards Downtown.

Considering the weather forecast and the images, please write a weather report for Downtown San Francisco capturing the current conditions; the expected weather for the day; how pleasant or unpleasant it looks; how foggy it is and/or where the marine layer is; how one might best dress for the weather; and what one might do given the conditions, day, and time. Remember: you will generate this report many times a day, your recommended activities should be relatively mundane and not too cliche or stereotypical.

Do not use headers or other formatting in your response. Just write one to two single paragraphs that are elegant, don't use bullet points or exclamation marks, don't mention the images as input, and use emotive words more often than numbers and figures – but don't be flowery. You write like a novelist describing the scene, producing a work suitable for someone calmly reading it on a classical radio station between songs. With a style somewhere between Jack Kerouac and J. Peterman.

Remember to keep the response under 500 words.
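
The forecast and timestamp at the top of that prompt can be filled in by a script. The sketch below is one way to do it, not necessarily how the real project does it: it assumes the public api.weather.gov JSON endpoints (a `points` lookup that returns a forecast URL), approximate coordinates for Downtown San Francisco, and a placeholder User-Agent. The helper names are hypothetical.

```python
import json
from datetime import datetime
from urllib.request import Request, urlopen

# Assumption: approximate coordinates for Downtown San Francisco.
LAT, LON = 37.7749, -122.4194
# api.weather.gov asks clients to identify themselves; this value is a placeholder.
HEADERS = {"User-Agent": "fog-report-sketch (you@example.com)"}


def get_json(url: str) -> dict:
    """Fetch a URL and decode the JSON body."""
    with urlopen(Request(url, headers=HEADERS), timeout=30) as response:
        return json.load(response)


def fetch_periods(lat: float = LAT, lon: float = LON) -> list[dict]:
    """Look up the forecast URL for a point, then return its forecast periods."""
    point = get_json(f"https://api.weather.gov/points/{lat},{lon}")
    forecast = get_json(point["properties"]["forecast"])
    return forecast["properties"]["periods"]


def forecast_block(periods: list[dict], now: datetime, count: int = 2) -> str:
    """Format the first few periods and the local time like the prompt's opening."""
    lines = ["Below is the weather forecast for Downtown San Francisco:", ""]
    for period in periods[:count]:
        lines.append(f" - {period['name']}: {period['detailedForecast']}")
    lines += ["", f"Current local time: {now:%Y-%m-%d %H:%M:%S}"]
    return "\n".join(lines)
```

Calling `forecast_block(fetch_periods(), datetime.now())` gives the opening of the prompt; the instructions that follow it are static text and can simply be appended.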

J. Peterman is perfect prompt fodder for something silly like this. He’s an imitation of Hemingway, parodied by a sitcom writers’ room. LLM tone, no matter how much prompting or fine-tuning you layer in, always feels a degree or two removed from what you’re going for. It’s a blurry JPEG, an average of an average of what most people think Kerouac or Hemingway sounded like. So we’ll lean into JP.

Finally, we’ll append one more line to the prompt as a cheap hack to save an API call (and to avoid wrangling proper JSON output):

After the weather report, please put an HTML color code that best represents the weather forecast, time of day, and the images.

We’ll strip that out with some regex and use it to color some horizontal dividers in a basic HTML page. We’ll write a quick GitHub Action to run it every hour and host it all with GitHub Pages…
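
The regex step might look something like this. It is a sketch under a few assumptions: the model puts a hex code such as `#A3B5C7` on its own line after the report, a neutral gray is an acceptable fallback when it doesn't, and the page markup is just a guess at "basic". `split_report` and `render_page` are illustrative names.

```python
import re
from html import escape

# Matches #RRGGBB or #RGB; the trailing \b rejects longer runs of hex digits.
HEX_COLOR = re.compile(r"#(?:[0-9a-fA-F]{6}|[0-9a-fA-F]{3})\b")
DEFAULT_COLOR = "#888888"  # Assumption: fallback when the model omits a color.


def split_report(response: str) -> tuple[str, str]:
    """Return (report_text, color), dropping the line that holds the last color."""
    matches = list(HEX_COLOR.finditer(response))
    if not matches:
        return response.strip(), DEFAULT_COLOR
    last = matches[-1]
    # Cut from the start of the color's line so a label like "Color:" goes too.
    line_start = response.rfind("\n", 0, last.start()) + 1
    report = response[:line_start].strip() or response[: last.start()].strip()
    return report, last.group()


def render_page(report: str, color: str) -> str:
    """Wrap the report paragraphs between two dividers tinted with the color."""
    paragraphs = "\n".join(
        f"<p>{escape(p.strip())}</p>" for p in report.split("\n\n") if p.strip()
    )
    rule = f'<hr style="border: 0; border-top: 4px solid {color};">'
    return (
        "<!doctype html>\n<html>\n<body>\n"
        f"{rule}\n{paragraphs}\n{rule}\n"
        "</body>\n</html>\n"
    )
```

Taking the last match rather than the first is a small hedge against the report itself containing something that looks like a hex code.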

And voilà: your descriptive Downtown San Francisco weather report.

Check out the code and build your own!

