Import
This describes the state of the import code as of 5.51. This is for a common understanding of the code in core.
The Import framework consists of:
- The form layer
- The datasource object
- The parser object
- The user job
- The import table
- The queue
The form layer
There are 6 imports in core (contact, participant, contribution, membership, activity and multiple-field custom data) and each has the following forms:
- DataSource - responsible for initiating the datasource. During postProcess
- the userJob is created
- the datasource is initialised with the data in it being moved to the import table
- MapField - responsible for storing the mapping to be used to map the datasource
to the import fields. During postProcess the
validateValues
function on the Parser is called and the import table is updated with the results. - Preview - responsible for presenting the results of validating the data, permitting download of rows that failed validation. On postProcess this sets off the import
- Summary - this used to be the last step in the flow. However, the summary is now a standalone form that the user lands on, loading the data from the UserJob and import table. (only the contact summary form is now used and as of writing removing the others is a todo).
DataSource
The Datasource is responsible for getting the managing the source data and the import table. Currently there are
two classes CRM_Import_DataSource_CSV
and CRM_Import_DataSource_SQL
and only the Contact import
offers a choice between them. Each datasource needs to offer the following functions:
getInfo()
- declare the source title and permissionsbuildQuickForm
build the ajax snippet that adds the fields specific to this dataSourcegetSubmittableFields
get a list of the fields added via the quickForm. (in practice thesubmittableFields
property is set and the parent getter is used)initialize()
- this function takes the submitted datasource information and transfers it to the import table.
The parent class offers additional functions
setUserJobID
getUserJobID
getColumnHeaders
getColumnCount
getRow
- retrieve a row from the import table - can be adjusted with setLimit
, setOffset
and setStatuses
updateStatus
upddates the status for the given row in the import table.
purge
The import table
The import framework expects the data to be import to be transferred to a table. Some tracking
fields are added to this table. The tracking fields all have leading underscore and are not
returned using getColumnHeaders
:
- _id - row number
- _status - see further down the page
- _status_message
- _entity_id - eg. the imported contact ID - not yet exposed in the UI.
- potentially other import-specific fields. This is used in contact import to store the number of related contacts created - but not displayed in the UI.
The import table persists beyond the import which potentially can allow troubleshooting or auditing. The intention is to add a cleanup process for the import table and user jobs but for now they are dropped during cache clear if more than two days old, or during upgrade.
The following shows interrogation of the datasource during a test to check on the outcome
$dataSource = new CRM_Import_DataSource_CSV($this->userJobID);
$dataSource->setOffset(4);
$dataSource->setLimit(1);
$row = $dataSource->getRow();
$this->assertEquals(1, $row['_entity_id']);
$this->assertEquals(1, $dataSource->getRowCount([CRM_Import_Parser::VALID]));
CRM_Import_DataSource
class. The keys are
the constants and the values are the values stored in the _status
field. Note that
a status might mean 'valid' and also mean 'soft_credit_imported'.
protected function getStatusMapping(): array {
return [
CRM_Import_Parser::VALID => ['imported', 'new', 'soft_credit_imported', 'pledge_payment_imported'],
CRM_Import_Parser::ERROR => ['error', 'invalid', 'soft_credit_error', 'pledge_payment_error'],
CRM_Import_Parser::DUPLICATE => ['duplicate'],
CRM_Import_Parser::NO_MATCH => ['invalid_no_match'],
CRM_Import_Parser::UNPARSED_ADDRESS_WARNING => ['warning_unparsed_address'],
CRM_Contribute_Import_Parser_Contribution::SOFT_CREDIT_ERROR => ['soft_credit_error'],
CRM_Contribute_Import_Parser_Contribution::SOFT_CREDIT => ['soft_credit_imported'],
CRM_Contribute_Import_Parser_Contribution::PLEDGE_PAYMENT => ['pledge_payment_imported'],
CRM_Contribute_Import_Parser_Contribution::PLEDGE_PAYMENT_ERROR => ['pledge_payment_error'],
'new' => ['new'],
];
}
User Job
The user job corresponds to a row in civicrm_user_job
and is used to track the import job.
It holds metadata about the job including
submitted_values
- what was submitted on the formsdataSource
- dataSource metadatatype_id
- used to determine theParser
class.
Parser
This class does the importing. A lot of the work is done by the parent class, based on the
metadata for the class. There is a bit of ambiguity currently around when it is set but at this
stage both calling setMetaData
and init
should ensure it is set. The key functions called
from outside the class are
Field presentation functions:
getAvailableFields()
- get a [name => label] array of fields that can be imported to in this importgetDataPatterns()
- gets an array like [name => dataPattern] (a re-keying of the metadata) for attempting to guess the field if column headers are not used.getHeaderPatterns()
- gets an array like [name => dataPattern] (a re-keying of the metadata) for attempting to guess the field if column headers are used.-
getMappedFieldLabel($mappedField)
- gets the label to present on the preview screen from the mapped field. This may include location, phone type. -
validate
- this is called duringPostProcess
fromMapField
. It runs through all the fields in the dataSource and runs validation on them. The validation is limited to what can be checked without row-specific database queries. ie. it will validate the option value or date is valid but it will not do a lookup oncontact_id
. Validate sets the status runImport
- imports all rows with a status of 'new', updating the import table with the outcome.
Internally the validate
and import
functions rely a lot of these functions
getMappedRow(array $values)
- this receives a row from the DataSource
and maps it to
and api-ish $params
array. It does this by retrieving the saved mapper
submitted on
the MapField
form from the UserJob
and passing it through the parser-specific
getFieldMappings
(for Contact the values are parsed into locations and relationship, the
others are mostly generic).
getTransformedValue
Before returninggetMappedRow
Each field is parsed throughgetTransformedValue
which validates the value against field metadata. This function explodes serialized values to arrays and converts any option values to their id. Invalid values are set toinvalid_import_value.
Note that option values are matched in a case-insensitive way with labels, names or ids.getFieldMetadata
- gets the metadata for the specific field. Does some extra loading in some cases.
Obsolete methods
run
summary
preview
encloseScrub
errorFileName
And see https://github.com/civicrm/civicrm-core/pull/23721#issuecomment-1149616759
Thoughts/ todos:
- Make
skipColumnHeaders
the default - Move
fieldSeparator
from the DataSource form to the form declared byCRM_Import_Datasource_CSV
- Why not allow creation of contacts from (e.g) contribution import
- Add batch size
- Would be really good to be able to be able to specify defaults (e.g default source ie 'my spreadsheet'). It might be in the csv or not (e.g not have to add a currency column to say 'all of these are AUD')
- The UserJob type id is currently hard coded as an option list in the UserJob BAO. It
includes the ParserClass so the
runImport
function cannot currently be used with a non-core parser but that feels like it will be needed for the two existing import extensions.