Note
Familiarize yourself with the use case scenario.
You have a choice for Use Case II between two scenarios:
-
Invasive species checklist
-
Lepidoptera sampling
Your choice for Use Case II will be graded.
Sampling of Lepidoptera across Countries
This narrative was developed as a basis for practical exercises in the biodiversity data mobilization course. The exercise concept and content were developed by Alberto González-Talaván, based on previous work by Alberto González-Talaván, Danny Vélez, Larissa Smirnova, Laura Russell, Mélianie Raymond and Nicolas Noé. It is a fictionalized scenario and is meant only for instructional purposes.
The International Butterfly Amateur Network (IBAN) has been providing a framework for national amateur observational groups to capture data about the occurrence of butterflies (Lepidoptera) since 2009. An extensive network of amateur observers uses a standard protocol based on Pollard walks to capture this information on paper sheets that they send to their national office. Some of these offices digitize this information into spreadsheets, but others do not have the human resources to do this and send the paper logs to the IBAN for processing. IBAN produces an annual report based on the sightings provided by these national members, with updated distribution maps and analysis of population trends for some key species.
The IBAN headquarters is mainly staffed with volunteers. With the increasing popularity of citizen science and the general interest in butterflies as a charismatic group of organisms, more and more data are received every year and the paper data sheets quickly pile up undigitized. The IBAN steering committee is trying to identify a more efficient and agile workflow for the creation of digital data because they would like to start publishing these data online regularly. They would also like to start processing digital pictures that their volunteers are already capturing with mobile devices like phones and tablets. Their ultimate objective is to raise the profile of the network and strengthen collaborations with local and regional governments to influence conservation policies for Lepidoptera in the countries involved.
There is currently no formal agreement between IBAN and the amateurs capturing data that covers, for example, the ways in which the data can be used. The steering committee is concerned that they will have to formalize this arrangement once they start publishing the data online.
The recommended protocol, Pollard walks, is based on transects between 300 and 600 m in length, divided into 50 m sections. Each transect should cover a single habitat type.
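As a quick check of the arithmetic, a transect of 300 to 600 m split into 50 m sections has between 6 and 12 sections. The following is a minimal Python sketch; the function name and the treatment of a partial final section are assumptions for illustration, not part of the protocol described here.

```python
import math

SECTION_LENGTH_M = 50                       # section length from the protocol
MIN_TRANSECT_M, MAX_TRANSECT_M = 300, 600   # recommended transect range


def section_count(transect_length_m: float) -> int:
    """Return the number of 50 m sections in a transect.

    Assumption: a partial final section counts as its own section.
    """
    if not MIN_TRANSECT_M <= transect_length_m <= MAX_TRANSECT_M:
        raise ValueError(
            f"transect length {transect_length_m} m is outside the "
            f"recommended {MIN_TRANSECT_M}-{MAX_TRANSECT_M} m range"
        )
    return math.ceil(transect_length_m / SECTION_LENGTH_M)


print(section_count(300))  # 6
print(section_count(600))  # 12
```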
On each visit, transect walkers count the Lepidoptera of every species seen within 5 m of the transect line. Special behaviours (egg laying or nectaring), as well as developmental stage (e.g., larvae or eggs), should be recorded as well.
For most countries, these sampling efforts happen once every two weeks from the beginning of October to the end of June.
There are quality control measures in place: every reported record is flagged "Pending approval". Record status is only changed to "Approved" after verification by a designated taxonomic expert. Species spotted out of their regular season or distribution area are flagged for additional verification.
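The approval workflow can be sketched in Python as follows. This is only an illustration: the class and function names, the use of months to represent a regular season, and the placeholder country and month values are all assumptions, not IBAN's actual system.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    PENDING = "Pending approval"
    APPROVED = "Approved"


@dataclass
class Record:
    species: str
    month: int                        # 1-12, month of the observation
    country: str
    status: Status = Status.PENDING   # every reported record starts as pending
    needs_extra_verification: bool = False


def flag_unusual(record, usual_months, usual_countries):
    """Flag a record seen outside its regular season or distribution area."""
    record.needs_extra_verification = (
        record.month not in usual_months
        or record.country not in usual_countries
    )
    return record


def approve(record, verified_by_expert):
    """Only verification by the designated expert moves a record to Approved."""
    if verified_by_expert:
        record.status = Status.APPROVED
    return record


# Placeholder values for illustration only.
sighting = Record("Vanessa atalanta", month=12, country="Country A")
flag_unusual(sighting, usual_months={4, 5, 6}, usual_countries={"Country A"})
print(sighting.status.value, sighting.needs_extra_verification)
```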
Time of day and weather conditions are recorded at the beginning of the transect. Along the transect, the number of individuals of every species seen is counted. Unidentified species are counted and recorded either by family or as a predefined complex of two or three similar species. Butterflies seen outside the 5 m range are recorded as “Extra” plus the number of the nearest section (e.g., 5-extra). The end time of the transect is also recorded.
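To illustrate how such counts might be tallied once digitized, here is a small Python sketch. It assumes that in-range sightings are logged with a plain section number ("5") and out-of-range ones in the "5-extra" form shown in the example; the function names and sample rows are hypothetical.

```python
from collections import Counter


def parse_section(label):
    """Split a section label into (section_number, is_extra).

    Assumption: labels are either a plain number ("5") or "5-extra".
    """
    text = label.strip().lower()
    is_extra = text.endswith("-extra")
    if is_extra:
        text = text[: -len("-extra")]
    return int(text), is_extra


def tally(sightings):
    """Sum individuals per taxon, keeping 'extra' sightings separate."""
    in_range, extra = Counter(), Counter()
    for taxon, label, count in sightings:
        _, is_extra = parse_section(label)
        (extra if is_extra else in_range)[taxon] += count
    return in_range, extra


# Hypothetical rows: (taxon, section label, individuals).
# "Pieridae" stands for a sighting identified only to family.
rows = [
    ("Pieris rapae", "1", 3),
    ("Pieris rapae", "5-extra", 1),
    ("Pieridae", "2", 2),
]
in_range, extra = tally(rows)
print(dict(in_range), dict(extra))
```

Keeping the two tallies separate means the out-of-range sightings stay available without being mixed into the standardized counts.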
Some national offices use groups of volunteers to digitize the paper logs and produce digital spreadsheets. The spreadsheets are very simple and include three datasheets. One captures the information linked to the sampling efforts, the second the weather conditions and the third the species encountered and the number of individuals observed by the amateur.
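One way to picture the three datasheets is as three record types linked by a shared identifier. The Python sketch below is an assumption-laden illustration: the field names, the `event_id` key and the sample values are all hypothetical, and the actual spreadsheets may be organized differently.

```python
from dataclasses import dataclass
from datetime import date, time


@dataclass
class SamplingEvent:          # sheet 1: the sampling effort
    event_id: str
    transect_id: str
    visit_date: date
    start_time: time
    end_time: time
    observer: str


@dataclass
class Weather:                # sheet 2: weather conditions
    event_id: str             # links back to SamplingEvent
    conditions: str


@dataclass
class SpeciesCount:           # sheet 3: taxa seen and individuals counted
    event_id: str             # links back to SamplingEvent
    section: str
    taxon: str
    individuals: int


def orphan_rows(events, rows):
    """Return rows whose event_id matches no sampling event."""
    known = {event.event_id for event in events}
    return [row for row in rows if row.event_id not in known]


# Hypothetical sample data.
events = [SamplingEvent("E1", "T1", date(2020, 10, 3), time(10, 0), time(10, 45), "Observer A")]
counts = [SpeciesCount("E1", "1", "Pieris rapae", 3), SpeciesCount("E9", "2", "Aglais io", 1)]
print(orphan_rows(events, counts))
```

A check like `orphan_rows` is useful because rows in the weather and species sheets only make sense when they point to an existing sampling event.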
Download the exercise sheet. (MS Word, 342 KB)
Planning
The volume of analogue data (paper logs) arriving at the IBAN headquarters will soon exceed their capacity to digitize, so the steering committee has decided to reconsider the current approach to this area of their work, which has grown unmanaged for the last few years. To date, this is how work has been organized:
-
The paper logs arrive via post. The secretary opens the packages and collates the logs.
-
There are five volunteers with basic computer skills using two shared computers to digitize the paper logs. These volunteers are also citizen scientists themselves, so they are familiar with the taxonomy of the order Lepidoptera, and with the species occurring in the country where the headquarters of IBAN are located.
-
The digitizers come and go whenever they have time so they usually check for computer availability via phone. Sometimes there are time clashes and some have to go home as the two computers are busy, and sometimes the two computers are unused.
-
When they digitize, they usually pick one paper log at a time from the pile, and digitize it (if they can). Common problems that occur are that:
-
the digitizer does not know the species observed (misspellings occur),
-
the digitizer does not know the area where the sampling has occurred,
-
the digitizer cannot read the handwriting or the language in which some of the comments are written.
-
A single taxonomic expert gets all the digitized tables and produces the report and distribution maps based on them. Normally she needs to discard around 15% of the digitized data because of inconsistencies, misspellings or other errors that she does not have the time to check.
Analyze the financial component of their new digitization plan
The steering committee is analyzing the following options for their new digitization plan, all of which have financial implications for their already reduced budget. They know they can only implement TWO of these options, so they need to choose wisely. Use the exercise sheet to provide a recommendation on which two options they should select and explain why you chose them.
-
Option 1: Buy three more computers so all digitizers can work simultaneously.
-
Option 2: Offer financial support to the national offices to buy flatbed scanners and send/share the logs electronically instead of by post.
-
Option 3: Offer financial compensation to the digitizers. They will not be able to pay all five of them the equivalent of a regular salary, but could cover the costs of part time positions for three of the volunteers.
-
Option 4: Purchase existing biodiversity digitization software in English, which comes with taxonomic entry check and in-built aids to correct geographical information.
-
Option 5: Contract a software development company to develop customized digitization software. For the same price as the commercial software, the developers will provide a solution in the local language, which will match the original data schema perfectly and will also provide a web data portal to expose the results of the digitization effort.
-
Option 6: Organize a course for the five digitizers to improve their skills in taxonomy, computer use and biodiversity informatics standards.
Assign roles
These are the human resources available for this digitization effort. How would you assign roles to maximize the efficiency of the digitization process and produce data of the highest quality possible? Use the exercise sheet to provide your answers.
-
Administrative assistant. No taxonomic knowledge. Basic computer use. Can read 3 languages.
-
Volunteer 1. Basic taxonomic knowledge. Basic computer use.
-
Volunteer 2. Basic taxonomic knowledge. Basic computer use.
-
Volunteer 3. Basic taxonomic knowledge. Basic computer use. Can read 3 languages.
-
Volunteer 4. Basic taxonomic knowledge. Basic computer use. Can read 3 languages.
-
Volunteer 5. Basic taxonomic knowledge. Advanced computer use (including GIS and data analysis tools).
-
Taxonomic expert. Advanced taxonomic knowledge. Advanced computer use (including GIS and data analysis tools).
Data capture
Imagine you are one of the volunteers digitizing the paper logs received at the IBAN headquarters. You have received two paper logs.
-
Download logs 1 and 2: UC2-LS-2-ForCapture.zip. (943 KB)
-
What data structure would you use to reflect the data in these logs?
-
Create a spreadsheet using this structure and the data from the logs.
-
Use the exercise sheet to provide your answers and submit the spreadsheet.
Data management
Taking the role of one of the volunteers with advanced computer skills, imagine you have been assigned the responsibility for data quality issues. Your main task is to reduce the amount of data that is currently discarded before processing (around 15%) due to errors and inconsistencies. You have received a dataset as the raw product of the digitization effort.
-
Download UC2-LS-3-ForCleaning.xlsx. (44 KB)
-
Evaluate the dataset and identify which types of errors are present.
-
Identify possible ways to correct those issues, and perform those corrections for as many of the errors present as you can.
-
Use the exercise sheet to provide your answers and submit the spreadsheet.
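Misspelled species names are one of the error types mentioned in the scenario. As one possible approach, and not the only one, the Python sketch below uses the standard library's `difflib.get_close_matches` to suggest a correction from a reference list. The reference list, the function name and the 0.8 cutoff are assumptions for illustration; a real list would come from a taxonomic checklist, and suggestions should still be reviewed by someone with taxonomic knowledge.

```python
import difflib

# Hypothetical reference list; a real one would come from a taxonomic checklist.
REFERENCE_NAMES = ["Vanessa atalanta", "Pieris rapae", "Aglais io"]


def suggest_name(raw_name, reference=REFERENCE_NAMES, cutoff=0.8):
    """Return the closest reference name, or None if nothing is close enough."""
    cleaned = " ".join(raw_name.split())   # collapse stray whitespace
    if cleaned in reference:
        return cleaned
    matches = difflib.get_close_matches(cleaned, reference, n=1, cutoff=cutoff)
    return matches[0] if matches else None


for name in ["Vanesa atalanta", "Pieris  rapae", "Unknown moth"]:
    print(repr(name), "->", suggest_name(name))
```

Returning `None` rather than the nearest guess keeps doubtful names visible for manual checking instead of silently "correcting" them.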
Data publishing
For this exercise, you will take the role of the taxonomic expert collaborating with IBAN at their headquarters. Some of your previous responsibilities (writing the annual report, and producing the base distribution maps) have been handed over to the volunteers, and you have now been given a new responsibility: publishing the cleaned data online through the GBIF network. The volunteer in charge of data quality has provided a dataset to be published.
-
Download UC2-LS-4-ForPublication.xlsx. (58 KB)
-
Use the previously provided IPT installation to publish the given dataset.
-
Use the exercise sheet to provide your answers and link to the published dataset.
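Publishing through the IPT involves mapping the columns of the dataset to Darwin Core terms. The Python sketch below only illustrates what such a mapping looks like. The spreadsheet headers on the left are hypothetical and will not necessarily match the provided file, while the names on the right are Darwin Core terms. In practice, the mapping is done in the IPT itself.

```python
# Hypothetical spreadsheet headers mapped to Darwin Core terms.
COLUMN_TO_DWC = {
    "Visit ID": "eventID",
    "Date": "eventDate",
    "Species": "scientificName",
    "Count": "individualCount",
    "Stage": "lifeStage",
    "Behaviour": "behavior",
}


def to_darwin_core(row):
    """Rename known columns to Darwin Core terms and drop unmapped ones."""
    mapped = {COLUMN_TO_DWC[key]: value for key, value in row.items() if key in COLUMN_TO_DWC}
    # Assumption: field sightings by observers are human observations.
    mapped["basisOfRecord"] = "HumanObservation"
    return mapped


def unmapped_columns(row):
    """List columns that have no Darwin Core term assigned yet."""
    return [key for key in row if key not in COLUMN_TO_DWC]


# Hypothetical row for illustration.
row = {"Visit ID": "E1", "Date": "2020-10-03", "Species": "Pieris rapae", "Count": 3, "Notes": "windy"}
print(to_darwin_core(row))
print(unmapped_columns(row))
```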