365DBA Blog

Database Stuff

Get Excel Spreadsheet Names by Mapping a DataSet to System.Object Variable in SSIS

I was creating an import task for a product and the source Excel workbook had a date-stamped name along with date-stamped worksheets within the book (image below) and the date-stamp was not always consistent so I couldn’t use any date conversion/ modification logic to determine what the names of the spreadsheets were going to be. In order to accomplish pulling the data from the individual sheets without knowing exactly what the names were going to be at any given point, I needed to get the names of the spreadsheets and populate variables for each spreadsheet. In my situation, I knew the number of spreadsheets that were in the workbook which helps, but with same additional preparation, I believe you could perform this with a varying amount of spreadsheets.

I did some research and did find some ways of performing this by using a ForEach Loop using an ADO.NET Schema Rowset Enumerator, but that relied heavily on the OLE DB Provider and the registered version of the providers which makes it much more difficult to configure initially and maintain if you move it to different servers and environments. In order to get around some of the issues associated with the ADO.NET Enumerator method, I started looking at using C# code within a script task to loop through the spreadsheet names and I came across the fact that the System.Object variable type was a .Net DataSet and I was using a DataSet within my script task for the list of sheets. The process is quite simple as shown below.

The spreadsheet names were similar to this:

and the relevant tasks in the package are here:

And the variables I am using:

The code for getting the DataSet of all the spreadsheet names is quite simple (in the "Populate SheetNames Variable" task), you need to first make sure that you have a variable with a Data type of Object and that the variable is set to ReadWrite. 

In your script, you need to import the OleDb namespace:

Then you pull in the list of sheets as a DataTable, add it to a DataSet, and set the variable from the DataSet:

As mentioned, I also had a variable I was using to store the location and Excel file name (FullSourcePath) and if your situation is similar, you will need to make sure that the variable is listed in the ReadOnlyVariables section of the script task.
Also, you could still use just code in the same script task to set the name of the sheet variables, but I wanted the possibility of using the list in other places within the package so went with just pulling the entire list out into the context of the entire package.

Once the Object variable is populated with the DataSet of sheet names, you can use that in a Foreach Loop container by using a Foreach ADO Enumerator and specifying your Object variable as the "ADO object source variable" like so:

In your loop, you will be be using a variable to hold the current sheet name so in your "Variable Mappings" section, you will want to specify that variable and make sure you use the index of 2 since that is the Index that holds the actual sheet name:

Once you have the current name in the holding variable, you can use that to specify your specific sheet names via several different methods. I used a script task, but you could use other methods and you could also do it differently by using a switch statement within your code, but whatever works for you is cool. In this script, I am determining if the current sheet name in the SheetName variable contains my known values for determining which sheet it is and notice that I am removing the quotes that are around the name within the DataSet. 

Once your specific sheet names are known and set, they can be used in expressions for queries later on in the package.