======================================================================
=

on gopher recipes

=
======================================================================

I like cooking and, by extension, I like recipes. I also like Gopher and simple formats. I also hope to be able to preserve some old cookbooks from grandparents while also making them accessible online. I'll be talking about some of the things I've looked into as far as preserving recipes goes.

In an ideal world, these old cookbooks would just be scanned, maybe even run through OCR to preserve their content. This would then be indexed and all work automatically. In practice, I don't expect that option to pan out because it requires a lot of luck at every step (will the scan be clear enough? will the OCR fail and be subtly wrong in the hundreds of pages? will I be comfortable possibly bending pages to get it scanned? will there be any degree of semantic information possible to pull out?).

Another alternative is that I simply take good care of the books. This I can and will do, but it still doesn't solve the problem of keeping the information available online or accessible from a phone.

More to the point, it doesn't scratch the itch of "wanting to play with file formats and protocols," which is the real motivation for this little idea. Of course, if I just wanted to get it done, I would scan the books and be content with PDFs, but that doesn't really help me grow.

Let's talk about what the success criteria are.

  1. Easy to edit. These files will likely be hand-typed by me, or otherwise generated by a tool, and it's best if they are easy to modify. Some examples of formats that aren't immediately easy to edit: binary formats (I haven't found one yet, but I'm sure one exists), XML/JSON (getting quoting right can be challenging with free-form text, plus having to quote all keys), YAML (it's possible to get right, but the level of nesting is complex).

  2. Easy to process. I don't intend to spend much time coding something up for this, so a very minimal format will make things easy. This also precludes using XML, JSON or YAML because I would have to use a library to parse the files. Additionally, these are all nested formats, which encode more structure than I'd like (I would prefer a flat format).

  3. Easy to repurpose. I don't expect a format like this to last long. Just looking at all the existing formats (detailed in the references section) should show that most formats are "write once, use once." Whatever format, I should be able to turn it into another one with relative ease.

  4. Easy to "web." My main target is to have this in a browser (a secondary is to have it accessible from Gopher). This means taking full advantage of browser-centric technology, or in short, using h-recipe. This is a format that search engines support which means that pretty much everything else supports it. Furthermore, h-recipe already provides a pretty good vocabulary for recipes that others have thought through, so if I support h-recipe, then I support most of what is needed for recipes.



----------------------------------------------------------------------
-                            

how it works

-
----------------------------------------------------------------------

With these in mind, here are a few example recipes I've already translated, followed by a look at the format and parser.

Cheddar Cheese Bread
Blackberry Wine
Bread Pudding

Immediately apparent in the format is the use of a gophermap-esque format:

  Gopher: TYPE NAME <tab> SELECTOR <tab> HOST <tab> PORT
  Recipe: TYPE NAME <tab> QUANTITY <tab> MODIFIER

Where Gopher uses types like "0: link to raw text file," "1: link to gophermap," "h: link to HTML page," and "i: informational text," this format has:

  n: Name of recipe (exactly 1).
  a: Author of recipe (1 or more).
  i: Ingredient (uses quantity and modifier, 1 or more).
  I: Instructions (1 or more).
  y: Yields (uses quantity and modifier, exactly 1).

For ingredients, the quantity and modifier are expected to be used like:

  isugar
  => sugar
 
  isalt<tab>1/2 tsp
  => 1/2 tsp salt
 
  iapple<tab>1<tab>cored
  => 1 apple, cored
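
As a rough illustration of the rule above, here is a minimal Python sketch. It is not the awk code linked later, and the function names parse_line and render_ingredient are made up for this example. It assumes the first character of a line is the type and that the remaining fields are separated by real tab characters.

```python
def parse_line(line):
    """Split a recipe line into (type, name, quantity, modifier)."""
    kind, rest = line[0], line[1:].rstrip("\n")
    fields = rest.split("\t")
    # Pad missing trailing fields so callers always get three values.
    name, quantity, modifier = (fields + ["", ""])[:3]
    return kind, name, quantity, modifier


def render_ingredient(name, quantity="", modifier=""):
    """Format an ingredient as '<quantity> <name>, <modifier>'."""
    text = f"{quantity} {name}" if quantity else name
    return f"{text}, {modifier}" if modifier else text


for line in ["isugar", "isalt\t1/2 tsp", "iapple\t1\tcored"]:
    kind, name, quantity, modifier = parse_line(line)
    print(render_ingredient(name, quantity, modifier))
```

Run as-is, this prints the three "=>" lines from the examples above.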

The second thing that's apparent is that there's some HTML embedded in the format. This is unfortunate, but it's the easiest way to support all of the complex things that h-recipe's h-card supports. My blog already supports Gopher and HTML together, so I leveraged this for my recipe format. The key thing is that an HTML tag is prefixed by an exclamation point (!), and in Gopher mode, the tag gets removed, while in HTML mode, it stays included. This allows me to use the same source code for both the HTML and Gopher versions.

  !<span class=h-card>John Smith!</span>
 
  => HTML: <span class=h-card>John Smith</span>
 
  => Gopher: John Smith
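
The two modes can be sketched with a single regular expression. This is only an approximation in Python, not the actual publish.awk logic; it assumes a marked tag always looks like "!<...>" with no ">" inside the tag, and the names to_html and to_gopher are hypothetical.

```python
import re

# Assumption: a marked tag is "!" followed by "<...>" with no ">" inside.
MARKED_TAG = re.compile(r"!(<[^>]*>)")


def to_html(text):
    """Keep each marked tag but drop the leading '!'."""
    return MARKED_TAG.sub(r"\1", text)


def to_gopher(text):
    """Remove each marked tag entirely, leaving only the text."""
    return MARKED_TAG.sub("", text)


source = "!<span class=h-card>John Smith!</span>"
print(to_html(source))    # <span class=h-card>John Smith</span>
print(to_gopher(source))  # John Smith
```

An exclamation point that is not directly followed by a tag is left alone in both modes, so ordinary punctuation survives.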


----------------------------------------------------------------------
-                               

my code

-
----------------------------------------------------------------------

The code is barely in a format that can be considered reusable, but here it is anyways.

publish.sh
publish.awk
convert.awk

These expect to be called with a directory structure like:

  ./src  -- source "fancy" gophermap files
  ./src/gophermap  -- must be called gophermap
  ./src/test.h-r  -- name of recipe file
  ./publish.awk  -- convert "fancy" gophermap files to real ones
  ./convert.awk  -- convert recipes to "fancy" gophermap files
  ./publish.sh  -- wrap publish.awk to be easier to use

The gophermap should look like:

  iMy cool recipe
  i
  <<src/test.h-r ./convert.awk

The recipe should look like:

  nMy Cool Recipe
  aJohn Smith
  iapple<tab>1<tab>cored
  ICook the apple however you like.
  ycooked apple<tab>1
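
For anyone who wants to check a recipe file before publishing it, here is a small Python sketch that groups the lines by type and checks the counts from the list of types given earlier. It is separate from convert.awk and only an illustration: the names LIMITS, parse_recipe and check_recipe are invented here, and ignoring blank lines is an assumption.

```python
from collections import defaultdict

# Allowed counts per type, from the list of types earlier in the post:
# (minimum, maximum), where None means "no upper limit".
LIMITS = {
    "n": (1, 1),     # name
    "a": (1, None),  # author
    "i": (1, None),  # ingredient
    "I": (1, None),  # instructions
    "y": (1, 1),     # yield
}


def parse_recipe(text):
    """Group the lines of a recipe by their one-character type."""
    recipe = defaultdict(list)
    for line in text.splitlines():
        if not line:
            continue  # assumption: blank lines are ignored
        recipe[line[0]].append(line[1:].split("\t"))
    return recipe


def check_recipe(recipe):
    """Return a list of problems; an empty list means the counts are fine."""
    problems = [f"unknown type {kind!r}" for kind in recipe if kind not in LIMITS]
    for kind, (low, high) in LIMITS.items():
        count = len(recipe.get(kind, []))
        if count < low or (high is not None and count > high):
            problems.append(f"type {kind!r} appears {count} time(s)")
    return problems


example = "\n".join([
    "nMy Cool Recipe",
    "aJohn Smith",
    "iapple\t1\tcored",
    "ICook the apple however you like.",
    "ycooked apple\t1",
])
recipe = parse_recipe(example)
print(check_recipe(recipe))  # []
print(recipe["i"])           # [['apple', '1', 'cored']]
```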

Then call it by:

  $ ./publish.sh

Now the directories dist/html and dist/gopher are generated. The dist/html directory can be served with:

  $ (cd dist/html && python3 -m http.server 8080)

The gopher files can be served by any static gopher server.


----------------------------------------------------------------------
-                             

did it work

-
----------------------------------------------------------------------

I think the idea is working well enough. At least, enough to try this out on a few more recipes once I get the books I want to save. Likely, I'll end up using some sort of OCR to get most of the text in this format but with lots of hand-tuning and fixing up.

The code to process the recipes was a little more complex than I had wanted. I was really hoping for an easy 1-1 mapping of h-recipe/h-card to a tab-delimited format, but h-card is very non-trivial to encode in a normal way. One idea I considered was using something like the vCard format, but I didn't see how reusable the idea was, so I scrapped it early on.

Once I've played with this more, I'll do a real retrospective on it.


----------------------------------------------------------------------
-                             

references

-
----------------------------------------------------------------------

I found lots of resources while working on this, so here they are with a brief summary of each.

[1]: http://microformats.org/wiki/h-recipe

This is the canonical source for the h-recipe format, which is part of the microformats collection. The goal for these formats is to be easily embeddable in semantic HTML so that tools can easily parse recipes (and other data) and, ideally, they will stay up to date with the HTML document describing them.

One thing I still don't like much about the format is that there isn't a good answer to "how do I encode name, count, and modifier of each ingredient?" I think these are important qualities to maintain because it would enable building something that says "I have 4 apples, what can I make with them?" whereas now we have to settle for just searching for recipes with "apple" in their ingredient list. The older format (hRecipe) had support for this, but it hasn't been included in the new format, at least not on the main page describing the format.


[2]: https://threadreaderapp.com/thread/1105845169138647040.html

This is a discussion on the difficulties behind making cooking recipes and how people shouldn't discard all the discussion before the recipe. I'm guilty of this and have, on many occasions, balked at the fact that I have to scroll through 10 pages of text to get to the 1 page of recipe. Reading this gave me a new perspective and appreciation for the text before the recipe.

In particular, this quote:

  "If you've ever built something iteratively, you know this feeling: All you want to do is talk about the context and backstory for the end result. We started here at A! Then this other thing happened so we moved to B! I was listening to this music and that made me think of C!"


I mean, that's the real impetus behind writing this post, so it'd be pretty contradictory to discount other people wanting to do the same.

However, there is one major distinction to make: many of these recipe sites take ages to load and constantly jump around, making it almost impossible to actually read the text before the recipe. I might read it more often if I didn't feel like I was fighting every asynchronously loaded video or the font changing for the entire site or whatever new annoyance web designers throw my way.


[3]: https://indieweb.org/recipe

A collection of other recipe formats. Most of them are just about using h-recipe manually to include recipes, which supports my idea of doing the same.


[4]: https://twitter.com/cookbook

This is a super neat concept that was linked to in [3]. The idea is to shorten recipes into a tweetable format by abusing abbreviations and using some syntax to get rid of wordy transitions between steps. The main trouble is that the online documentation of what the abbreviations mean is seemingly lost: the website hosting the glossary of terms is insecure, and it links to a Google Spreadsheet that doesn't seem to exist any more.

I think that if there hadn't been just one online glossary that everyone linked to, this problem wouldn't exist. Because one source was good enough at the time, no one took the time to save or duplicate the information.

Thankfully, the Amazon preview for @cookbook's book includes a substantial bit of the glossary, so that could be scraped for most of the abbreviations (and buying the book could give the rest).

It would be interesting to see how this concept could perhaps extend to save existing recipes, or otherwise have a shortened view of them.


[5]: http://pin13.net/mf2/

A website where you can paste your HTML or give it a URL to fetch, and then see the h-recipe (and other microformats v2) data encoded in it. I used this to verify that I was understanding the format correctly, and it saved a lot of time and headache.


[6]: http://microformats.org/wiki/recipe-formats

Some more micro recipe formats. Kind of drives the point home that many people have tackled this particular problem.


[7]: http://the-eye.eu/public/Books/campdivision.com/PDF/Computers%20General/Papers%3A%20Biotech-Programming/CompCook.html

This one is a really good read on recipes. It introduces many different styles of writing recipes and pushes against the idea of splitting a recipe into its ingredients and cooking instructions, which I hadn't realized was a big problem until it was pointed out. Ultimately, it uses all of this to push for its format, RxOL, and its software, Cocina.

Here is an example of a recipe in RxOL format.

  <* Chicken Chasseur (Poulet Sauté Chasseur).
  *chicken, 1 kg =cut up
  { *oil *butter /and } /sauté in, till three-quarters cooked
  *mushrooms, 125 g =slice /add =cook
  *:chicken =remove
  *white wine, 100 ml /add
  *shallot, 10 g /add =reduce
  *thickened rich veal gravy, 150 ml /add
  *tomato sauce, 50 ml /add =simmer, 30 s
  *brandy, 15 ml /add
  *parsley, 10 g =mince /add
  *chervil, 10 g =mince /add
  *tarragon, 10 g =mince /add
  *:chicken /pour over
  *parsley =mince /sprinkle with>

Notably, we have a recipe title, and then a series of ingredients and operations with those ingredients. The ingredients are prefixed with a '*', unary operations with an '=', and binary operations with a '/'.
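
To make the three prefixes concrete, here is a rough Python tokenizer for a single line. It is based only on the example above, not on the RxOL specification or on Cocina, so treat it as a guess at the surface syntax: it assumes a token starts with a prefix at the start of the line or after whitespace, and it does not handle the "{ }" grouping, the "*:" back-references, or the "<* ... >" recipe delimiters. The names SIGILS and tokenize are made up.

```python
import re

SIGILS = {"*": "ingredient", "=": "unary", "/": "binary"}

# A token starts with a sigil at the start of the line or after whitespace,
# and runs until the next whitespace-plus-sigil or the end of the line.
TOKEN = re.compile(r"(?:^|\s)([*=/])(.*?)(?=\s[*=/]|$)")


def tokenize(line):
    """Split one RxOL-style line into (kind, text) pairs."""
    return [(SIGILS[sigil], text.strip()) for sigil, text in TOKEN.findall(line)]


for kind, text in tokenize("*mushrooms, 125 g =slice /add =cook"):
    print(kind, "->", text)
```

For the mushroom line this yields an ingredient ("mushrooms, 125 g") followed by the operations slice, add, and cook, in that order.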

Any of these can be followed by a comma and some extra information (RxOL also supports a semicolon and more information). It's unclear how smart their software is about parsing this extra info, such as whether it understands "till three-quarters cooked," which seems like a counterintuitive thing to parse compared to "till 3/4 cooked."

RxOL has a sister format that it can produce (and maybe parse as well?) that looks like:

  Chicken with Rice (Pollo con Arroz).
 
  Ingredients:
       1       chicken.
       1 clove garlic.
       0.5     onion.
       4       peppercorns.
               salt.
               cumin.
               fat.
       250 g   rice. Soak, 15 min.
       2       tomatoes. Chop.
       1 L     water.
       2       sweet peppers, green. Slice.
       1       chicken liver, from chicken. Grind.
 
  Preparation:
 
  [A]  Add onion, peppercorn, salt, and cumin to garlic. Grind.
  [B]  Spread chicken with [A]. Brown in fat, lightly. Add rice.
       Brown, lightly. Add tomatoes. Sauté, 5 min. Add water and
       sweet peppers. Cover. Simmer, till done. Add chicken liver.

Annoyingly, this isn't the same recipe as the previous one, so we can't compare them one-to-one. But we can see how everything would be structured and how this might facilitate writing new recipes down.

Of note: each ingredient is listed with its count/measurement, followed by ingredient name, followed by some operations to do with the ingredient. I'm unclear why there is the preparation section, at least with this recipe.

One reason to add the section is that, with how RxOL is formulated, cooking a recipe involves following a tree from its leaves to its root. So when two large subtrees come together, it's good to be able to refer to them in a meaningful way. (Otherwise, the last step might read "take what you did before, with what you just did, and combine them," whereas now it's the less ambiguous "take part A and part B and combine them.")

However, that isn't the case with this recipe. This one could just as easily say "Add these things to garlic. Grind. Add to chicken. Brown in fat. etc." and it would be just as clear as this one.


[8]: https://open-recipe-format.readthedocs.io/en/latest/

This is another modern recipe format, designed with the goal of being able to encode the alterations to recipes that happen when they are scaled up for a big kitchen. This one is built on top of YAML, and looks like this:

  recipe_name: Basic Fruit Salad
  yields:
      - servings: 6
      - servings: 18
  ingredients:
      - apple:
          amounts:
              - amount: 1
                unit: each
              - amount: 3
                unit: each
      - banana:
          amounts:
              - amount: 1
                unit: each
              - amount: 3
                unit: each

Here we have a recipe that targets 2 sizes, and has 2 different ingredients. Each ingredient has a variation on the count for each number of servings. In this case, they are both straight multiples of their original values (i.e. going from 6 to 6x3=18, they increase apples from 1 to 1x3=3), which means that this variation could have been derived implicitly. However, duplicating this information means that, if you find that larger batches need less banana, you have the ability to change this.
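
The trade-off between implicit and explicit scaling can be shown with a little arithmetic. The Python sketch below is not how Open Recipe Format or its tooling works; it is a hypothetical alternative where amounts scale linearly by default and a small override table (invented for this example, as are the names scale and amounts_for) records the exceptions.

```python
from fractions import Fraction


def scale(amount, from_servings, to_servings):
    """Scale an amount linearly from one serving count to another."""
    return Fraction(amount) * Fraction(to_servings, from_servings)


# Base recipe from the example above: 6 servings, 1 apple and 1 banana.
base_servings = 6
base_amounts = {"apple": 1, "banana": 1}

# Hypothetical override table: amounts that should not scale linearly,
# e.g. if an 18-serving batch turned out to need only 2 bananas.
overrides = {18: {"banana": 2}}


def amounts_for(servings):
    """Scale every ingredient linearly, then apply any overrides."""
    scaled = {name: scale(amount, base_servings, servings)
              for name, amount in base_amounts.items()}
    scaled.update(overrides.get(servings, {}))
    return scaled


for name, amount in amounts_for(18).items():
    print(name, amount)
```

With the override in place, 18 servings gives 3 apples (1 x 18/6) but only 2 bananas; without it, both would be 3.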

This kind of change isn't as interesting to me, as I intend to focus solely on the home cook rather than a large-scale kitchen. Additionally, being built on YAML puts me off because it's very hierarchical.


[9]: https://6xq.net/pesto/

This is another modern recipe format that has similar goals to mine. A recipe for this might look like:

  So let’s start by introducing Pesto by example. This text does not belong
  to the recipe and is ignored by any software. The following line starts the
  recipe:
 
  %pesto
 
  &pot
  +1 l water
  +salt
  [boil]
 
  +100 g penne
  &10 min
  [cook]
 
  >1 serving pasta
  (language: en)

This appears to take a similar "graph-based" approach; however, it's unclear how it's supposed to work. It doesn't help that the main form of documentation appears to be a literate Haskell program that does the parsing. Another challenge in understanding this project is that all the example recipes are in German (except for the one above).