Synthesizing and dating phylogenies using the Open Tree of Life, SSB UNAM 2023

Jan 13, 2023

10 am - 2 pm

Instructors: Mark Holder, Emily Jane McTavish, Luna Luisa Sanchez Reyes, Ben Redelings

We have borrowed the Carpentries website template and installation instructions. This is not a Carpentries workshop (although we highly recommend them!). Learn more about the Carpentries at https://carpentries.org/

See https://opentreeoflife.github.io/SSBworkshop2023/ for the schedule of the full workshop. A video demo following these instructions is available at https://youtu.be/7LAGjowolmU.

The Open Tree of Life "synthetic tree"

The Open Tree of Life project produces a supertree estimate of the full tree of life. This tree has been called the Open Tree “synthetic tree”, but “supertree” would be a more specific name.

A supertree is a phylogenetic estimate that uses input trees as the data source for the estimation. Typically the input trees are not taxonomically complete: each contains only a subset of the full collection of species. However, the output tree contains all of the species mentioned in any of the input trees.

See https://peerj.com/articles/3058/ (and subsequent papers by Redelings and Holder) for the details of how the supertree is constructed. The most important features to be aware of are:

  1. The algorithm builds a synthetic tree by adding as many groupings from the input trees as it can.
  2. The order of the input trees matters. Groupings are added in the order in which their trees appear in the input list, so a grouping from an earlier tree takes priority over a conflicting grouping from a later tree.
  3. The Open Tree Taxonomy is used as the last input tree - so the output will be as comprehensive as the Open Tree Taxonomy.
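
The effect of input order can be seen in a toy model. The sketch below is not the Open Tree pipeline: it assumes every grouping is drawn from the same set of taxa, treats two groupings as compatible when they are nested or disjoint, and keeps each grouping that fits with everything kept so far. The function names `compatible` and `greedy_add` are made up for this illustration.

```shell
# Toy model: a grouping is a space-separated list of taxon names (no spaces
# inside a name), and all groupings are assumed to share one set of taxa.

# compatible G1 G2: succeed if the two groupings are nested or disjoint,
# i.e. both could appear as clades of a single rooted tree.
compatible() {
  shared=0; only_first=0; only_second=0
  for taxon in $1; do
    case " $2 " in
      *" $taxon "*) shared=1 ;;
      *) only_first=1 ;;
    esac
  done
  for taxon in $2; do
    case " $1 " in
      *" $taxon "*) ;;
      *) only_second=1 ;;
    esac
  done
  # A conflict needs an overlap plus a private member on each side.
  [ "$shared" -eq 0 ] || [ "$only_first" -eq 0 ] || [ "$only_second" -eq 0 ]
}

# greedy_add G1 G2 ...: visit groupings in the order given and keep each one
# that is compatible with every grouping kept so far.
# Prints one kept grouping per line.
greedy_add() {
  kept_list=""
  for candidate in "$@"; do
    ok=1
    while IFS= read -r kept; do
      [ -n "$kept" ] || continue
      compatible "$candidate" "$kept" || ok=0
    done <<EOF
$kept_list
EOF
    if [ "$ok" -eq 1 ]; then
      kept_list="${kept_list}${candidate}
"
    fi
  done
  printf '%s' "$kept_list"
}
```

In this toy model, `greedy_add "A B" "B C" "A B C"` keeps `A B` and `A B C`, while `greedy_add "B C" "A B" "A B C"` keeps `B C` and `A B C`: the same groupings give a different result when the order changes.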

"Custom synthesis"

In this workshop, we will use a new (and relatively untested) interface that allows you to apply the Open Tree of Life synthesis pipeline to a set of trees of your choosing.

The fundamental steps are:

  1. Store an (ordered) list of input trees in a "collection"
  2. Choose the root taxon for the analysis (Please do not conduct analyses of over 10,000 species during the workshop, because we will all be sharing computational resources).
  3. Submit those inputs to the server
  4. Download the results

#1 Creating an input collection

The input trees have to be curated in the Open Tree of Life’s corpus of published trees. Fortunately, you just had a tutorial on how to add trees to that database.

1A. Find input trees

If you know what trees you want to include, just make sure that you know the “Study ID” and “Tree Name” for each tree. Or you can conduct a search for trees that include a taxon.

To search you can use the search box on https://tree.opentreeoflife.org/curator/, though that will only return studies tagged with the taxon name in some crucial fields.

To do a more thorough search, you can:

  1. Navigate to https://tree.opentreeoflife.org/taxonomy/browse and use the search box to find the taxon you want. Write down its OTT ID (this ID is in the "Taxon details" section of the page).
  2. Use the Open Tree web-service API to search for trees that include that taxon. For instance, I found that the taxon Primates has the OTT ID 913935, so (from a terminal) I can run the command:
    curl -XPOST https://api.opentreeoflife.org/v3/studies/find_trees -H "content-type:application/json" \
      -d '{"property":"ot:ottId", "value":913935, "verbose":true}'
    
    to find trees that include primates.

For this exercise: Make sure you know the (Study ID, Tree Name) pairs for at least 2 trees. Note: if you use the API search, the “Tree Name” is the “@label” field of the response.
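
The API response can be long, so it may help to filter it down to the fields you need. The sketch below is a plain text filter rather than a JSON parser: it assumes study IDs appear in the response under a key named `ot:studyId` (an assumption; only `@label` is confirmed above) and that neither value contains an escaped double quote. The function name `list_study_and_tree_labels` is hypothetical.

```shell
# Read a find_trees JSON response on stdin and print every study ID and
# tree label it mentions, one "key": "value" pair per line.
list_study_and_tree_labels() {
  grep -oE '"(ot:studyId|@label)"[[:space:]]*:[[:space:]]*"[^"]*"'
}

# Example (needs network access):
#   curl -s -X POST https://api.opentreeoflife.org/v3/studies/find_trees \
#     -H "content-type:application/json" \
#     -d '{"property":"ot:ottId", "value":913935, "verbose":true}' \
#     | list_study_and_tree_labels
```

Because the filter ignores nesting, read its output top to bottom: each label belongs to the study ID most recently printed before it only if the response lists a study's ID ahead of its trees, so check any pair you plan to use against the curator app.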

1B. Create a new collection

Click on your user name at the top-right of https://tree.opentreeoflife.org/curator to reveal the menu and choose “My collections”.

[Screenshot: the top banner of the curator app, showing the drop-down menu under the user name]

This will take you to your curator’s profile, which has a URL similar to: https://tree.opentreeoflife.org/curator/profile/mtholder#collections except with your username in place of mtholder.

Click the “Create new tree collection” button to create the list of trees to use. Choose a simple name (just letters and numbers) for the collection.

Save the collection. This should redirect you to a full-screen page for adding trees to the collection.

Add trees by repeatedly:

  1. Clicking on the "Add a tree to this collection" button
  2. Clicking on the magnifying glass button near the "by matching a study" prompt.
  3. Typing in the study ID (starting with pg_ or ot_) in the box
  4. Clicking on the study citation when it appears by the box
  5. Choosing the tree name from the drop down menu
  6. Clicking on the "+" button

Reorder your trees using the rank column, if you added them in an order that you don’t like.

Save the collection (“Save Collection” button near the top).

#2 Choose your root taxon

Use https://tree.opentreeoflife.org/taxonomy/browse to find the Open Tree of Life Taxonomy’s name for the taxon that is of interest to you.

Presumably all of the trees that you put in the collection would contain some members of this taxon, but the taxon need not include the entirety of the input trees. The synthesis algorithm will trim the input trees down so that only portions relevant to this taxon will be used.

Please do not conduct analyses of over 10,000 species during the workshop, because we will all be sharing computational resources.

Note the OTT ID of the taxon, too.
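
If you prefer the terminal, the OTT ID can also be looked up through the Open Tree name-matching service. The sketch below assumes the `tnrs/match_names` endpoint of the v3 API and assumes that its response reports IDs in fields named `ott_id`; the helper names `match_names_payload` and `list_ott_ids` are hypothetical.

```shell
# Build the JSON request body for a name-matching query.
# Assumes the name contains no double quotes or backslashes.
match_names_payload() {
  printf '{"names":["%s"]}' "$1"
}

# Print every "ott_id" number found in a JSON response read from stdin.
list_ott_ids() {
  grep -oE '"ott_id"[[:space:]]*:[[:space:]]*[0-9]+' | grep -oE '[0-9]+$'
}

# Example (needs network access):
#   curl -s -X POST https://api.opentreeoflife.org/v3/tnrs/match_names \
#     -H "content-type:application/json" \
#     -d "$(match_names_payload Primates)" | list_ott_ids
```

A name can match more than one taxon, so if several IDs are printed, confirm the right one in the taxonomy browser before using it.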

#3 Launch a custom synthesis run

We have not fully tested the custom synthesis procedure yet, so it is on a testing server at the moment.

Direct your web browser to https://ot38.opentreeoflife.org/v3/tree_of_life/launch_custom

  1. Select your user name after the "Collection to use:" label
  2. Select the name of the collection that you want to use. If you don't see your collection: make sure that you saved it, and use the "Refresh list of collections" button
  3. Start to type the root taxon name in "Search for name" and pause after the first few letters to allow it to autocomplete the name. You may have to add more letters to narrow down the options.
  4. Select the name of the taxon when you have found it in the autocomplete list. This should fill in the OTT ID field.
  5. If everything looks correct, click the "Request build for..." button only once

That should redirect you to a page (with a URL that starts with https://ot38.opentreeoflife.org/v3/tree_of_life/browse_custom) which highlights the synthesis run you just requested.

You can use the “Refresh” button to update the status. Hopefully, it will eventually go to “COMPLETED.”

If the status changes to “REDIRECTED” it means that the server detected the same root taxon and the same set of input trees in another run, so it will give you a link to that previous run (rather than recomputing the synthesis).

This is still a testing service. If the status changes to “FAILED”, please let us know by opening a new issue in this GitHub issue tracker and including the synth_id of the run that failed.

#4 Understanding the results

If the status of your run reached “COMPLETED” or “REDIRECTED”, then you should see a “download” link appear.

Download the archive.

On some platforms, you should be able to double-click to extract the downloaded archive. The archive has a tar.gz extension, because it is a directory that was collected into a single archive using tar and then compressed using gzip.

If double-clicking doesn’t work for you, then open a terminal and use the cd command to navigate to the directory containing the archive (usually your Downloads directory).

Then run tar xfvz followed by the filename to unpack the archive. For example, I used (from a new terminal session):

  cd Downloads
  tar xfvz mtholder_prim2_913935_tmpsr44rdvv.tar.gz

to unpack my results.

The final tree is in the labelled_supertree subdirectory.
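
To keep the unpacked files separate from everything else in Downloads, you can extract into a directory of its own and let the shell locate the supertree directory. This is a minimal sketch; the function name `unpack_results` and the destination directory `results` are made up for illustration.

```shell
# unpack_results ARCHIVE DEST: extract ARCHIVE into the directory DEST and
# print the path of every labelled_supertree directory found inside it.
unpack_results() {
  mkdir -p "$2" &&
  tar -xzf "$1" -C "$2" &&
  find "$2" -type d -name labelled_supertree
}

# Example, using the archive name from above:
#   tar -tzf mtholder_prim2_913935_tmpsr44rdvv.tar.gz | head   # preview contents
#   unpack_results mtholder_prim2_913935_tmpsr44rdvv.tar.gz results
```

Listing the archive with `tar -tzf` first shows what will be written before anything is extracted.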

You can open the index.html file at the top of each directory of the archive in a browser to see explanations of the contents of that directory.