Loading a Custom Dataset

Last updated on 2-Oct-2020 by Jakob Jakobsen Boysen (Platform Lead, boysen@scifeon.com) and Frej Nøhr Larsen (Bioinformatician, larsen@scifeon.com)

1. The Dataset

The dataset can be downloaded by clicking here, and the plugin documentation is available here.

The dataset describes a pH measuring experiment: samples are taken from a solution every hour and the pH value is measured. The results are stored in a Microsoft Excel file as a data matrix with one sample per row.

We would like to be able to upload this file format to Scifeon and create entities for the samples and the experiment.

2. Function Defining Decorators

There are many ways to customize your Scifeon instance. To tell Scifeon what exactly we are modifying, we set a decorator at the beginning of the script.

A decorator is a wrapper for your code and it tells Scifeon where and how to use your script.

There are currently two essential decorators: The @route decorator, which defines a new page on the website (more on that later), and the @scifeonPlugin decorator which adds additional functionality to already existing pages in Scifeon (a "plugin").

A simple plugin is defined below:

The @scifeonPlugin decorator is imported and added to the beginning of the file. The decorator takes a number of parameters, but the mandatory ones are a name and a type. The type decides what kind of plugin we are making. The list of possible plugin types is available in the PLUGIN_TYPE interface, which is also imported into the file.

As we are making a data loader, we use the PLUGIN_TYPE.DATA_LOADER type.

Another important parameter for the decorator is the match parameter. This is a method which decides whether a given plugin is available or not. As we are making a data loader plugin, we (ultimately) only want it to show up when we are uploading a dataset in our specific format. If match is left out, it evaluates to true no matter the dataset type, which is fine while we are simply trying out how to make the data loader.
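As a rough illustration of how the name, type and match parameters fit together, here is a minimal, self-contained sketch. The `scifeonPlugin` function, the `PLUGIN_TYPE` object, the `registry` list, the `isAvailable` helper and the signature of `match` are all local stand-ins written for this example; they are assumptions and not Scifeon's actual exports, which should be imported as described in the plugin documentation.

```typescript
// Local stand-ins (assumptions) so the sketch runs on its own.
// In a real plugin, scifeonPlugin and PLUGIN_TYPE are imported from Scifeon.
const PLUGIN_TYPE = { DATA_LOADER: 'data-loader' } as const;
type PluginType = typeof PLUGIN_TYPE[keyof typeof PLUGIN_TYPE];

interface PluginOptions {
  name: string;                          // mandatory
  type: PluginType;                      // mandatory
  match?: (fileName: string) => boolean; // hypothetical signature
}

// Hypothetical registry that records every decorated class.
const registry: { options: PluginOptions; ctor: new () => unknown }[] = [];

// A decorator factory: takes options, returns a function that receives the class.
function scifeonPlugin(options: PluginOptions) {
  return <T extends new () => unknown>(ctor: T): T => {
    registry.push({ options, ctor });
    return ctor;
  };
}

// A plugin without a match function is treated as always available.
function isAvailable(options: PluginOptions, fileName: string): boolean {
  return options.match ? options.match(fileName) : true;
}

class PhDataLoader {
  init(): void {
    console.log('pH data loader initialised');
  }
}

// In decorator syntax this is written as @scifeonPlugin({...}) above the class;
// here the factory is applied as a plain function call, which has the same effect.
scifeonPlugin({
  name: 'pH Data Loader',
  type: PLUGIN_TYPE.DATA_LOADER,
  match: (fileName) => fileName.toLowerCase().endsWith('.xlsx'),
})(PhDataLoader);
```

With this stand-in, `isAvailable` returns true for a file name ending in `.xlsx` and false for other names, while a plugin registered without `match` is offered for every file.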

Let's start by making a class with an init() method:

The init() method is part of the lifecycle of Aurelia (the software used for the Scifeon framework) and is called as soon as the element is loaded.

3. Accessing File Data

To access the file data, we use Scifeon's data upload page. Open your Scifeon instance and click the button with a levitating arrow in the side panel:

You can then upload files to Scifeon by dragging and dropping them onto the marked area, or you can browse your system by clicking the Select.. button.

Once the files are selected for upload, their data will be available to your data loader. To read them, import the FileContext plugin:

The file data can now be accessed and, for instance, be printed to the console by changing the init() method in your data loader a little:

The context object contains information about the file(s) selected for upload. This includes both metadata, such as creation date and file size, and the data in the file.

In the case of an Excel file, information on the data cells can be found in the wb (workbook) property of the fileInfo object.

4. Workbook Data

The workbook property contains information on the Excel file marked for upload. To reach the data matrix itself, we open the Sheets property. This property contains a list of the worksheets in the file. If you are using the given demo file, there should be a single worksheet called ScifeonDemoSamples.

The worksheet object contains all of the cells in the worksheet in several representations. To access the data of, for instance, cell D3, type ...Sheets.ScifeonDemoSamples["D3"].v, where the triple dots represent the rest of the property chain. The .v property holds the cell value in its Excel data type (integer, text, etc.), which is translated directly to the TypeScript equivalent.
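The property chain can be tried out on a hand-built object. The sketch below is only an approximation: the `Cell`, `Worksheet` and `Workbook` types, the `cellValue` helper and all cell contents are invented for illustration, and the real workbook object carries more properties than shown here.

```typescript
// Minimal shapes modelled on the description above (assumptions).
interface Cell {
  v: string | number | boolean; // raw cell value
}

interface Worksheet {
  '!ref'?: string;                               // range covered by the sheet
  [address: string]: Cell | string | undefined;  // cells keyed by address
}

interface Workbook {
  Sheets: { ScifeonDemoSamples: Worksheet };
}

// Invented demo content; the real file will differ.
const wb: Workbook = {
  Sheets: {
    ScifeonDemoSamples: {
      '!ref': 'A1:D3',
      A2: { v: 'Sample' },
      D2: { v: 'pH' },
      A3: { v: 'S-001' },
      D3: { v: 7.2 },
    },
  },
};

// Returns the raw value of a cell, or undefined for empty or non-cell entries.
function cellValue(sheet: Worksheet, address: string): Cell['v'] | undefined {
  const entry = sheet[address];
  return typeof entry === 'object' ? entry.v : undefined;
}

const d3 = cellValue(wb.Sheets.ScifeonDemoSamples, 'D3');
console.log(d3, typeof d3); // 7.2 number
```

Going through a small helper rather than reading `.v` directly avoids a runtime error when a cell is empty, since empty cells are simply absent from the worksheet object in this sketch.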

5. Iterating through the Cells

To access each cell of the sheet, we then iterate through all of the properties of the ScifeonDemoSamples object.

We use the !ref property, which contains the boundary cells of the Excel sheet, to create an upper bound for a for-loop running through each data row.

The !ref property contains both letters and numbers. Since we already know the number of columns in our data, we only extract the number of rows, using the regular expression /\d+/. We can then iterate from row 3 up to that limit, accessing each data point.
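The loop can be sketched as follows, assuming that !ref holds a range string of the form `A1:B5`. Applied to the whole string, /\d+/ would match the first run of digits (the `1` in `A1`), so this sketch applies it to the end cell only. The types, the `lastRow` and `readRows` helpers, the column layout (sample name in column A, pH in column B) and the cell contents are all assumptions made for the example.

```typescript
interface Cell {
  v: string | number | boolean;
}

interface Worksheet {
  '!ref'?: string;
  [address: string]: Cell | string | undefined;
}

// Invented demo sheet: title in row 1, headers in row 2, data from row 3.
const sheet: Worksheet = {
  '!ref': 'A1:B5',
  A1: { v: 'pH experiment' },
  A2: { v: 'Sample' }, B2: { v: 'pH' },
  A3: { v: 'S-001' }, B3: { v: 7.2 },
  A4: { v: 'S-002' }, B4: { v: 7.0 },
  A5: { v: 'S-003' }, B5: { v: 6.8 },
};

// Extracts the last row number from a range string such as "A1:B5".
function lastRow(ref: string): number {
  const endCell = ref.split(':').pop() ?? ref; // "B5"
  const digits = endCell.match(/\d+/);         // ["5"]
  if (!digits) throw new Error(`Unexpected range: ${ref}`);
  return parseInt(digits[0], 10);
}

// Reads every data row between firstRow and the end of the sheet.
function readRows(ws: Worksheet, firstRow: number): { sample: string; pH: number }[] {
  const limit = lastRow(ws['!ref'] ?? 'A1');
  const rows: { sample: string; pH: number }[] = [];
  for (let row = firstRow; row <= limit; row++) {
    const sample = ws[`A${row}`];
    const pH = ws[`B${row}`];
    if (typeof sample !== 'object' || typeof pH !== 'object') continue; // skip gaps
    rows.push({ sample: String(sample.v), pH: Number(pH.v) });
  }
  return rows;
}

const rows = readRows(sheet, 3);
console.log(rows.length); // 3
```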

6. The Scifeon Datamodel

Before we start writing to the database, let's take a step back and think about what kind of data we want saved and how it translates to the Scifeon datamodel.

Scifeon consists of predefined entity classes, each taking different parameters. These can be examined more closely on the datamodel page in Scifeon (requires a running Scifeon instance).

For the experimental dataset we would like to create entities in Scifeon for the samples defined by the set, as well as an experiment entity with a laboratory step for the samples to belong to.

Moreover, samples are a representation of a laboratory entity, moving from step to step, and do not contain result values. Instead, we generate a result set entity with result values to store the measurements.

Thus, we end up with the following list of entities we wish to create for each experiment:

  • 1 Experiment
  • 1 Step
  • 1 ResultSet
  • 1 Sample for each row in the Excel file
  • 1 ResultValue for each row in the Excel file

7. Generating Scifeon Entities

We make a list of entities that will be saved to the database later. We can then push the individual entities onto it as we create them. Here's an example of how to make the experiment and step entities:

The class of the entity is decided by the eClass property. Different classes have different mandatory properties, but an ID is required for all entities. To make ID generation easier, Scifeon has automated this with the "#" notation. This notation allows us to simply pass a string such as "#expID" to the database, and Scifeon will automatically update it to an eligible ID.

This allows us to generate relations between entities. A step entity, for instance, requires an experimentID. By using the same ID notation for this property as was used for the experiment ID, they will match in the database as well.
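A minimal sketch of such a list is shown below. Only `eClass`, `experimentID` and the "#" notation come from the text above; the `Entity` type, the `id` and `name` property names and the placeholder `"#stepID"` are assumptions, so check the datamodel page for the properties each class actually requires.

```typescript
// Loose entity shape for illustration (assumption, not Scifeon's own type).
interface Entity {
  eClass: string;              // decides the class of the entity
  id: string;                  // "#..." placeholders are replaced by Scifeon
  [property: string]: unknown; // class-specific properties
}

const entities: Entity[] = [];

// The experiment gets the placeholder ID "#expID".
entities.push({
  eClass: 'Experiment',
  id: '#expID',
  name: 'pH measurement', // hypothetical property
});

// The step refers to the experiment by reusing the same placeholder.
entities.push({
  eClass: 'Step',
  id: '#stepID',
  experimentID: '#expID',
  name: 'pH measuring', // hypothetical property
});

console.log(entities.length); // 2
```

Because both entities use the string `"#expID"`, the relation between them is already expressed before any real IDs exist.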

To generate the samples, we loop through the rows of the Excel sheet:

The "ResultValue" class has several ways to save the data depending on the data type and the preference of the developer. Here we simply use the valueText field. If you want to connect the result values to the samples, you can use the subjectID and subjectClass properties.

Remember to also create a "ResultSet" entity connecting the step entity with the result values.
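Putting the pieces together, the full list could be built roughly as follows. The `valueText`, `subjectID` and `subjectClass` properties are taken from the text above; the `Entity` and `Row` types, the `buildEntities` helper, the `stepID` and `resultSetID` property names, the `name` properties and the per-row placeholders such as `"#sample0"` are assumptions made for this sketch.

```typescript
interface Entity {
  eClass: string;
  id: string;
  [property: string]: unknown;
}

// One parsed data row from the Excel sheet (hypothetical shape).
interface Row {
  sample: string;
  pH: number;
}

function buildEntities(rows: Row[]): Entity[] {
  const entities: Entity[] = [
    { eClass: 'Experiment', id: '#expID', name: 'pH measurement' },
    { eClass: 'Step', id: '#stepID', experimentID: '#expID' },
    // The result set connects the step with the result values.
    { eClass: 'ResultSet', id: '#resultSetID', stepID: '#stepID' },
  ];

  rows.forEach((row, i) => {
    const sampleID = `#sample${i}`; // unique placeholder per row
    entities.push({ eClass: 'Sample', id: sampleID, name: row.sample });
    entities.push({
      eClass: 'ResultValue',
      id: `#value${i}`,
      resultSetID: '#resultSetID',
      valueText: String(row.pH), // store the measurement as text
      subjectClass: 'Sample',    // link the value back to its sample
      subjectID: sampleID,
    });
  });

  return entities;
}

const demo = buildEntities([
  { sample: 'S-001', pH: 7.2 },
  { sample: 'S-002', pH: 7.0 },
  { sample: 'S-003', pH: 6.8 },
]);
console.log(demo.length); // 9
```

For N data rows this produces 3 + 2N entities: the experiment, the step and the result set, plus one sample and one result value per row.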

Once we have generated all of the required entities, our entity list contains 43 entities: one experiment, one step, one result set, and a sample plus a result value for each data row.

8. Saving Entities to the Database

Congratulations! You've made all of the entities necessary to generate a complete experiment with results in Scifeon.

To save the entities to the database, we use a special function that interacts with the Aurelia framework and is triggered when the upload button is pushed.

The function looks like this:

The list we made earlier is wrapped in an object which is sent to the upload functionality within Scifeon. If it works, you should see something like the following picture when you click the button: