code-u-like

C#, .Net, Sql Server, Salesforce, Dynamics CRM etc,
Charity and Not-for-Profit sector

Tuesday, May 19, 2009

Reading a CSV file into a DataTable

If you don't share my opinion that CSV should be banned by an international convention, it might be because you've never had to write code to read a CSV file.

"Ah, but thats easy", you say, "Just use String.Split()". Well, that will work if your CSV file is an academic example that doesn't contain strings that were typed by real human beings. But real-world CSV files generally contain some or all of the following horrors:

- some strings enclosed in speechmarks, but some not
- empty fields
- speechmarks within strings
- commas within strings
- carriage returns within strings

To demonstrate, here is a CSV example that exhibits all of the traits listed above:
"First Name","Last Name","Address","Town","Postcode"
David,O'Leary,"12 Acacia Avenue",London,NW5 3DF
June,Robinson,"14, Abbey Court","Putney",SW6 4FG
Greg,Hampton,"",,
Stephen,James,"""Dunroamin"" 45 Bridge Street",Bristol,BS2 6TG
Daisy,Smith,"Flat 6,
Porchester Court,
24 Dragon Road",Newcastle,NE3 4FG
(Note that the last line has two carriage returns in the Address, which is why it sprawls over three lines. Also speechmarks within speechmark-delimited strings have to be escaped somehow; in this example they are escaped by doubling them up.)

It is possible to deal with pathological cases of CSV using carefully tuned Regular Expressions. It is also possible to use an OdbcConnection and the Microsoft Text ODBC Driver.

But needing to read a CSV file is a very common affliction - surely someone has built an open-source framework to deal with it reliably? Surprisingly, such frameworks are quite hard to come by, but I did eventually find a brilliant one:

LumenWorks.Framework.IO.Csv by Sebastien Lorien

Downloadable from The Code Project, here:
A Fast CSV Reader by Sebastien Lorien

You can download the binaries or the source, its much faster than Regex or ODBC, and it can handle cases like the mangled CSV snippet shown above. It is also open-source, under the MIT Licence.

The design is based on the StreamReader class, and Sebastien gives a few example on the CodeProject page. However the examples mostly show how to bind a CSV straight to a DataGrid. So here is an example of how to use the library to read a CSV file into a DataTable:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Data;

using System.IO;
using LumenWorks.Framework.IO.Csv;

namespace CSVTest
{
    class LumenWorksCSVExample
    {

        /// <summary>
        /// Example code to fetch a CSV file and turn it into a DataTable
        /// using the LumenWorks.Framework.IO.Csv library
        /// </summary>
        /// <param name="filePath">The full path of the CSV file</param>
        /// <returns>A DataTable with all-string columns</returns>
        public DataTable FetchFromCSVFile(string filePath)
        {
            DataTable csvTable = new DataTable();
            using (CsvReader csvReader =
                new CsvReader(new StreamReader(filePath),true))
            {
                csvTable.Load(csvReader);
            }
            return csvTable;
        }
    }
}
So basically, you just use DataTable.Load(). Note that this example assumes that the CSV file has a header line, and that nothing is known about the column types. Hence all columns are stored in the DataTable as strings.

If Load() is a bit too automatic and you need more control over the loading of the rows, maybe to only get certain rows or columns, or to do column type conversions, this longer example might be more useful:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Data;

using System.IO;
using LumenWorks.Framework.IO.Csv;

namespace CSVTest
{
    class LumenWorksCSVExample2
    {
        /// <summary>
        /// Example code to fetch a CSV file and turn it into a DataTable
        /// using the LumenWorks.Framework.IO.Csv library
        /// Note: This example uses loops instead of DataTable.Load()
        /// </summary>
        /// <param name="filePath">The full path of the CSV file</param>
        /// <returns>A DataTable with all-string columns</returns>
        public DataTable FetchFromCSVFileLong(string filePath)
        {
            char delimiter = ',';
            bool hasHeader = true;
            DataTable csvTable = new DataTable();
            using (CsvReader csvReader =
                new CsvReader(new StreamReader(filePath), hasHeader, delimiter))
            {

                int fieldCount = csvReader.FieldCount;
                string[] headers = csvReader.GetFieldHeaders();

                // this bit could be modified to fine-tune the columns
                foreach (string headerLabel in headers)
                    csvTable.Columns.Add(headerLabel, typeof(string));

                while (csvReader.ReadNextRecord())
                {
                    DataRow newRow = csvTable.NewRow();
                    // this bit could be modified to do type conversions, skip columns, etc
                    for (int i = 0; i < fieldCount; i++)
                        newRow[i] = csvReader[i];

                    csvTable.Rows.Add(newRow);
                    
                }
            }
            return csvTable;
        }
    }
}
Of course, on its own this second example is pretty pointless, but you might be able to use it as a baseline to develop something a bit more tailored.

Labels: ,

0 Comments:

Post a Comment

Subscribe to Post Comments [Atom]

<< Home