Bulk data import

Bulk data import in ELMA is used to automatically transfer information from an external data source (for example, from any database or Excel file) to the main database of the ELMA system.

Data is imported into ELMA in packages. A package is the number of records that will be imported in one iteration. Package size is configured in the external data source provider.

We can depict the bulk import process as a diagram:

An external data source is a database or an Excel file from which information must be transferred to ELMA. The external data source provider allows you to present information from an external data source in the form of a flat table. Data contained in the flat table is mapped and converted to the ELMA tree structure. Then it can finally be loaded into the system. This is how each data package is imported until all the data from the external data source is transferred.

In addition, users can intervene in the data import process with the help of a handler. To do this, you need to implement an IBulkDataImportHandler extension point that contains 4 methods:

  • The bool CanUse(Type type) method is executed just once before the import process and shows whether the handler will be used.
  • The void BeforeMapping(DataTable data, BulkDataImportHandlerData handlerData) method is executed at each iteration of the import process after receiving data in the form of a flat table, but before data is mapped, and/or is used to view or make changes to the flat table before data mapping. The method contains a flat table and the mapping data object that will be passed from method to method.
  • The void AfterImportPage(IEnumerable <MapInfo> mapInfos, BulkDataImportHandlerData handlerData) method is executed after loading the mapping results after each iteration of the import process. An example of how this method can be used: the recalculation of access rights when importing documents into previously created folders. For recalculation, it is required to recalculate the access rights to each document separately, this is why this method is better to apply to a package of objects. This method contains a tree data structure and the mapping data object that will be passed from method to method.
  • The void AfterImport(BulkDataImportHandlerData handlerData) method is executed before the completion of the import process. This method can be used, for example, to recalculate access rights in CRM (this extension point is already implemented in the ELMA system). It contains an import data object. You can find an example of an extension point for recalculating access rights to CRM objects below:
    using System;
    using System.Collections.Generic;
    using System.Data;
    
    using EleWise.ELMA.Common.BulkDataImport.Handlers;
    using EleWise.ELMA.Common.BulkDataImport.Mapping.Models;
    using EleWise.ELMA.ComponentModel;
    using EleWise.ELMA.CRM.Models;
    using EleWise.ELMA.Runtime.NH;
    
    using NHibernate;
    
    namespace EleWise.ELMA.CRM.ExtensionPoints
    {
        /// <summary>
        /// Import handler for recalculating access rights to objects of the CRM module
        /// </summary>
        [Component(Order = 100)]
        public class CRMBulkDataImportHandler : IBulkDataImportHandler
        {
            private ISessionProvider sessionProvider;
    
            /// <summary>
            /// Ctor
            /// </summary>
            /// <param name="sessionProvider"><see cref="ISessionProvider"/></param>
            public CRMBulkDataImportHandler(ISessionProvider sessionProvider)
            {
                this.sessionProvider = sessionProvider;
            }
    
            /// <summary>
            /// The handler is triggered if you import mapped Contractors or Leads
            /// </summary>
            /// <param name="type"></param>
            /// <returns></returns>
            public virtual bool CanUse(Type type)
            {
                return type.IsAssignableFrom(typeof(IContractor)) || type.IsAssignableFrom(typeof(ILead));
            }
    
            /// <summary>
            /// Recalculate access rights to all CRM objects
            /// </summary>
            /// <param name="handlerData">Import data </param>
            public virtual void AfterImport(BulkDataImportHandlerData handlerData)
            {
                var session = sessionProvider.GetSession("");
                session.GetNamedQuery("ExecFillUserSecuritySetCache")
                    .SetTimeout(0)
                    .ExecuteUpdate(false);
            }
    
            // implementation of other interface methods
Attention!
  1. Bulk data import is available only for commercial editions of ELMA.
  2. Before starting the import, you must create a backup copy of the ELMA database.
  3. Bulk data import can consume a large amount of resources of both the ELMA server and the DBMS, so it is recommended to perform it during nonworking hours.
  4. It is not recommended to simultaneously run more than one import since this can lead to conflicts related to database locks.
  5. Each imported object must have at least one key field, otherwise, the unique Uid identifier field (if present) will be selected as the key field. If there is no Uid field, all records will be added as new instances of the object. There can be several key fields – these fields are used when searching for existing objects in ELMA.
  6. To avoid performance degradation when performing bulk import, it is strongly recommended to use indexed fields as key fields.
  7. If in the ELMA database there is an entry object, the key field of which matches the value from the imported table, the new object will not be created. Instead, the existing object will be updated.
  8. Before importing data into the Oracle database, make sure that key fields are limited in length, otherwise, objects will not be found and new objects will be created, which may lead to duplication of data. To check the key field of a system object, you need to open the DBMS, find the required table and field, and check its data type for length restrictions. If this is a custom object, you can check its key fields in ELMA Designer on the Objects tab. To do so, open the desired object page and go to the Properties tab.
  9. If there are no rules for a property in the mapping rules, the property will be ignored during the import. If a rule for a property is specified in the mapping rules, but the import column is not specified, the column with the same name as the property name will be searched.
  10. To analyze if the mapping is working correctly, you have to enable logging in the log4net.config file located in ../<ELMA system folder>/Web/Config:
    ...
    <logger name="BulkDataImport" additivity="false">
        <level value="OFF"/> // change the "OFF" value for the required logging type (for example, "INFO" or "ERROR")
        <appender-ref ref="BulkDataImportLog" />
      </logger>
    ...​
    The log files are stored in ../<ELMA system folder>/Web/logs/BulkDataImport.

Here is an example of how to import a list of individuals from an external source into the ELMA database, to be more exact, into the Individual object.

You can find an example of the external data source provider for the Firebird database below:

 [Component]
    public class FirebirdBulkDataImportDataSource : DBDataSourceBase
    {

        internal static string TableName = " ContractorData"; // ContractorData – the name of the DB table from which the data will be imported
        internal static long CountConst = 10;

        private bool canGetCount = false;

        public FirebirdBulkDataImportDataSource(bool canGetCount)
        {
            this.canGetCount = canGetCount;
        }

        public override int PackageSize
        {
            get
            {
                return 5; // size of the package for processing requested from an external data source
            }
        }

        public static string PathDataBase // full path to the database file 
            get
            {
                return Path.Combine(IOExtensions.GetTempPath(), "WorkDir", "commonTest.fdb");
            }
        }

        public static string ConnectionString
        {
            get
            {
                return string.Format("Data Source=localhost;Initial Catalog={0};User ID=sysdba;Password=masterkey;Dialect=3;ServerType=0", PathDataBase); // localhost – address of the DBMS from which data will be imported
            }
        }

        protected override string CommandText()
        {
            return string.Format("select first {0} skip {1} LastName, FirstName, MiddleName, ITN, Phones_String, AddressListCount, Address_Region, Address_Town, Address_Street, Address_Home from {2} order by Id", // LastName, FirstName, MiddleName, ITN, Phones_String, AddressListCount, Address_Region, Address_Town, Address_Street, Address_Home are the object property names to be imported
                PackageSize,
                StartIndex, TableName);
        }

        protected override string CountCommandText()
        {
            return canGetCount
                ? string.Format("select Count(*) from {0}", TableName)
                : string.Empty;
        }

        protected override IDbConnection CreateConnection()
        {
            return new FbConnection(ConnectionString);
        }

    }

Below you can find an example of how to use an external data source provider in order to import an Excel file:

using EleWise.ELMA.Common.BulkDataImport.DataSources;
….
var dataSource = new ExcelDataSourceBase(excelFilePath);
…

As an input parameter, the external data source provider uses the path to the Excel file that contains the flat data table that will be imported.

The limitations imposed by the standard Excel provider are described below:

  • during the import procedure, it is impossible to edit files (i.e. files cannot be deleted/renamed/transferred);
  • only the first row can be used for column names, the values in the unnamed columns (the values in the first row) are ignored;
  • when importing, only one sheet of an Excel file is used.

Download this file with an example of how an external data source provider converts data into a flat table.

A flat table is a table with data. Rows of this table are imported objects whose types are defined by mapping classes, whereas columns are properties of these objects, as well as properties of other objects, if they are associated with an imported object and must be imported with it. For example, when importing documents, each document will correspond to a row. Each column will contain the properties of a document. If a property is complex (it is an object or a list), all its properties should also be located in the same row of the flat table in certain columns. Flat table rows can contain columns with additional data, for example, the number of objects in the list property. You can add all additional data and properties to one column and convert them in the mapping class to minimize the number of columns. We do not recommend this option because it implies extra expenses for the calculation of property values, so it is better to stick to the rule ‘one property – one column’. Simple lists with a large amount of data might be the only exception. For instance, a list of phone numbers: it’s easier to place all phone numbers in a row, separating them with a comma, and break such a row into phones during mapping.

After that mapping rules for changing data into a tree structure should be specified for this table.

In this case, the AddressListCount column is used as a service variable indicating the number of addresses for one contractor. This is how this service variable allows you to organize the mapping of properties with the many-to-many relationship type.

Here is an example of such mapping rules:

private MapBuilder<IContractorIndividual> GetMapBuilder()
        {
var mapper = MapBuilder<IContractorIndividual>.Create(null);

            return mapper
            .Rules(rule =>
            {
rule.Property(p => p.ITN) // property mapping named ITN  
		.Key() // we choose the ITN property as the key property 
.Column("ITNCode"); // We take the ITNCode column as the data source for the property. If the property name and the column name in the table are the same, this row can be omitted. If the names of object fields and/or the fields of the property-object are the same, this string is required.
               
                		rule.Property(p => p.LastName)
                		.Key()
                		.Action(info => info.Row["LastName"].ToString().ToUpper()/*If required, we perform data formatting*/);

 		rule.Property(p => p.FirstName);
                		
		rule.Property(p => p.MiddleName)
                                   .Column("Patronymic");

rule.Property(p => p.DocumentSeries);

rule.Property(p => p.DocumentNumber);

rule.PropertyObject(p => p.DocumentType) // We perform mapping of the Document Type property that has the “Object” data type (Client Document Type object)
		.Rules(documentTypeRules =>
		{
			documentTypeRules.Property(p => p.Name)
			.Column("DocumentType_Name")
			.Key(); // we choose the Name property as the key property 
		});

               		rule.PropertyObject(p => PostalAddress, a => { return a.Row["AddressListCount"] == DBNull.Value ? 0 : Convert.ToInt32(a.Row["AddressListCount"]); })
                		.Rules(ruleAddress =>
// we write mapping rules for the properties of the internal object "Address" 
// choose all properties of an object as the key properties because only the combination of all fields is unique 
               		{
                   		ruleAddress.Property(p => p.Building)
			.Key()
                    		.Action(info => info.Row["Address_Home" + info.Index.Value]);

                    		ruleAddress.Property(p => p.Street)
			.Key()
                    		.Action(info => info.Row["Address_Street" + info.Index.Value]);

                    		ruleAddress.Property(p => p.City)
			.Key()
                 .Action(info => info.Row["Address_Town" + info.Index.Value]);

                    		ruleAddress.Property(p => p.Region)
			.Key()
                    		.Action(info => info.Row["Address_Region" + info.Index.Value]);
                		 });
            });
  });
Attention!
Note that when importing data, validation for required fields and regular expressions, as well as checking for the type of imported data and the null value do not occur. However, the rules imposed by the database structure work, for example, if a field in a database has length restrictions, an error may occur if this restriction is violated in the input data.

As a result of the mapping, we get the following structured information:

The ContractorIndividual object consists of the following properties:

  • ITN – string, key field;
  • LastName – string;
  • FirstName – string;
  • MiddleName – string;
  • DocumentSeries – string;
  • DocumentNumber – string;
  • DocumentType – an object consisting of the property:
    • DocumentType_Name - string, key field;
  • PostalAddress – a list of objects consisting of the properties:
    • Building - string, key field;
    • Street - string, key field;
    • City - string, key field;
    • Region - string, key field.

At the stage of loading, the mapping results are transferred from the RAM to the ELMA database.

The script block for starting import looks as follows:

public virtual void StartImportAction(Context context)
        {
            var mapBuilder = GetMapBuilder();
            var dataSource = new ExcelDataSourceBase(context.DataSource.ContentFilePath);
            // we execute import according to the specified data source and mapping, and get the import process identifier workUid
            context.WorkUid = PublicAPI.Services.BulkDataImport.Import(dataSource, mapBuilder);
        }

To analyze the import process execution in a different script block, implement the following code:

public virtual void CheckImportProcess(Context context)
        {
               var progressInfo = PublicAPI.Services.BulkDataImport.GetImportProgressInfo(context.WorkUid.Value);
            context.ProgressInfo = new JavaScriptSerializer().Serialize(progressInfo);
           // if the service returns mapping results for the specified workUid identifier, the import is still running
            context.Success = progressInfo == null;
        }

Thus, information from an external data source will be imported into the main database of the ELMA system, to be more exact, into instances of such objects as Individual (ContractorIndividual), Client document type (ClientDocumentType), and Address (Address).

Attachments