Assembly¶
XML |
Chado Base Table |
Chado Column |
|---|---|---|
AssemblyName |
analysis |
name |
AssemblyDescription |
analysis |
description |
|
analysis |
program |
|
analysis |
programversion |
N/A |
analysis |
algorithm |
BioSampleAccn |
analysis |
sourcename |
N/A |
analysis |
sourceversion |
N/A |
analysis |
sourceuri |
SubmissionDate |
analysis |
timeexecuted |
Stats |
analysisprop |
type/value |
FtpSites |
analysisprop |
type/value |
SpeciesTaxid |
organism |
chado.analysis_organism |
RsUid |
dbxref |
chado.analysis_dbxref |
GbUid |
dbxref |
chado.analysis_dbxref |
WGS RS_BioProjects/GB_BioProjects |
project |
(Chado table doesnt exist yet) |
BioSampleID |
biomaterial |
(Chado table doesnt exist yet) |
RefSeq_category |
analysisprop |
rdfs:type (sets the analysis type) |
Note that the program and program version are not found directly in the XML. Instead they are extracted from the FTP attribute.
Analysis type¶
The analysis table has no type_id column. The type is therefore set with the rdfs:type property.
The RefSeq_category tag is used to determine the analysis type. Currently, only the value representative genome is supported and mapped to the bundle Genome Assembly (operation:0525), via the type value ‘genome_assembly’. We have thus far come across no other values for this key in the database.
Is an assembly a Chado analysis or project?¶
This is still an open question. This module maps NCBI Assemblies into chado.analysis, but it may split the NCBI assembly record into an analysis and project in the future. This is because the current definition of a Chado analysis is a single program run. Assemblies are typically many programs run in a pipeline.
Undecided mappings¶
We don’t currently know how we will map analyses to biomaterials in Chado. BioSamples that are listed in Assembly records are therefore ignored currently.