Assembly

Assembly to Chado.analysis mappings
XML	Chado Base Table	Chado Column
AssemblyName	analysis	name
AssemblyDescription	analysis	description
`# Assembly method:` (from FTP)	analysis	program
`# Assembly method:` (from FTP)	analysis	programversion
N/A	analysis	algorithm
BioSampleAccn	analysis	sourcename
N/A	analysis	sourceversion
N/A	analysis	sourceuri
SubmissionDate	analysis	timeexecuted
Stats	analysisprop	type/value
FtpSites	analysisprop	type/value
SpeciesTaxid	organism	chado.analysis_organism
RsUid	dbxref	chado.analysis_dbxref
GbUid	dbxref	chado.analysis_dbxref
WGS RS_BioProjects/GB_BioProjects	project	(Chado table doesn't exist yet)
BioSampleID	biomaterial	(Chado table doesn't exist yet)
RefSeq_category	analysisprop	rdfs:type (sets the analysis type)
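The direct tag-to-column rows of this table can be sketched in Python as follows. This is only an illustration under stated assumptions, not the module's implementation: the function name `analysis_row_from_xml`, the `DocumentSummary` wrapper element, and the sample values are all hypothetical, and only the four tags that map straight onto analysis columns are handled.

```python
import xml.etree.ElementTree as ET

# Direct XML tag -> chado.analysis column pairs taken from the table above.
ANALYSIS_COLUMNS = {
    "AssemblyName": "name",
    "AssemblyDescription": "description",
    "BioSampleAccn": "sourcename",
    "SubmissionDate": "timeexecuted",
}


def analysis_row_from_xml(xml_text):
    """Build a dict of chado.analysis column values from one assembly record."""
    root = ET.fromstring(xml_text)
    row = {}
    for tag, column in ANALYSIS_COLUMNS.items():
        element = root.find(f".//{tag}")
        # Missing or empty tags are stored as None rather than guessed.
        has_text = element is not None and element.text
        row[column] = element.text.strip() if has_text else None
    return row


# Hypothetical, heavily trimmed record used for illustration only.
SAMPLE = """<DocumentSummary>
  <AssemblyName>Example_v1.0</AssemblyName>
  <AssemblyDescription>Example assembly</AssemblyDescription>
  <BioSampleAccn>SAMN00000000</BioSampleAccn>
  <SubmissionDate>2017/01/01 00:00</SubmissionDate>
</DocumentSummary>"""

row = analysis_row_from_xml(SAMPLE)
```

Columns listed as N/A in the table (algorithm, sourceversion, sourceuri) are simply left out of the resulting row.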

Note that the program and program version are not found directly in the XML. Instead, both are extracted from the `# Assembly method:` line retrieved via the FTP attribute.
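As a rough sketch of that extraction, the Python function below pulls a program name and version out of a `# Assembly method:` line. It assumes the line follows a `NAME v. VERSION` convention, which is not stated in this document, and the function name `parse_assembly_method` is hypothetical. The module's actual parsing may differ.

```python
def parse_assembly_method(report_text):
    """Return (program, programversion) from a '# Assembly method:' line.

    Assumes the 'NAME v. VERSION' convention. The version is None when
    no ' v. ' separator is present, and both values are None when the
    line is missing entirely.
    """
    prefix = "# Assembly method:"
    for line in report_text.splitlines():
        if line.startswith(prefix):
            value = line[len(prefix):].strip()
            program, sep, version = value.partition(" v. ")
            return program.strip(), (version.strip() if sep else None)
    return None, None


# Hypothetical report header used for illustration only.
example = "# Assembly name: Example_v1.0\n# Assembly method: SOAPdenovo v. 2.04\n"
program, programversion = parse_assembly_method(example)
```

Returning None instead of an empty string keeps "no version given" distinguishable from a version that was parsed but blank.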

Analysis type

The analysis table has no type_id column. The type is therefore stored as an rdfs:type property in analysisprop.

The RefSeq_category tag determines the analysis type. Currently, only the value `representative genome` is supported; it is mapped to the Genome Assembly bundle (operation:0525) via the type value ‘genome_assembly’. We have not yet come across any other value for this tag in the database.
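The lookup described here could be sketched in Python as below. The names `REFSEQ_CATEGORY_TO_TYPE` and `analysis_type_property` are hypothetical, and normalizing case and whitespace before the lookup is an assumption of this sketch rather than documented behaviour.

```python
# Only 'representative genome' is documented as supported.
REFSEQ_CATEGORY_TO_TYPE = {
    "representative genome": "genome_assembly",
}


def analysis_type_property(refseq_category):
    """Return the (property term, value) pair, or None if unsupported."""
    # Assumption: lookup is case- and whitespace-insensitive.
    key = refseq_category.strip().lower()
    type_value = REFSEQ_CATEGORY_TO_TYPE.get(key)
    if type_value is None:
        return None
    return ("rdfs:type", type_value)


supported = analysis_type_property("representative genome")
unsupported = analysis_type_property("some other category")
```

Keeping the supported values in a single dictionary means a newly encountered RefSeq_category value would only need one extra entry.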

Is an assembly a Chado analysis or project?

This is still an open question. This module maps NCBI Assemblies into chado.analysis, but in the future it may split each NCBI assembly record into an analysis and a project. The reason is that a Chado analysis is currently defined as a single program run, whereas an assembly is typically produced by many programs run in a pipeline.

Undecided mappings

We don’t yet know how analyses will be mapped to biomaterials in Chado, so BioSamples listed in Assembly records are currently ignored.