Parsing CSV data files with PHP, using quotesplit

I originally found this function in the comments on the PHP manual page for split(): http://us2.php.net/manual/en/function.split.php. I recently spent some time making it handle fields that are wrapped in more than one pair of quotes, so it can now deal with input like:

"one"," "two"", """three"""

It will also parse data that is not CSV (comma-separated values); just pass in a different delimiter.

I have two helper functions here. The first removes an element from the array and then rebuilds the array so that the keys are renumbered. The second deals with the multiple-quote issue by stepping through the initial array and removing the extra rows that the repeated quotes create.
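
To make the "extra rows" concrete, here is a small standalone sketch (not part of the functions below) that splits the sample line on the quote character the same way quotesplit does, and then shows the effect of removing one element and renumbering the keys. It uses array_values() for the renumbering, which is an assumption about the simplest way to get that effect; the variable names are made up for illustration.

```php
<?php
// Illustration only: show the raw pieces that explode() produces for the
// sample line before any clean-up. The delimiter is added to both ends,
// mirroring the first step of quotesplit().
$line   = '"one"," "two"", """three"""';
$pieces = explode('"', ',' . $line . ',');

// The line contains 12 quote characters, so explode() returns 13 pieces.
// The empty and whitespace-only entries are the extra rows caused by the
// repeated quotes.
var_export($pieces);

// Removing an element with unset() leaves a gap in the keys...
unset($pieces[3]);

// ...and array_values() rebuilds the array with keys 0, 1, 2, and so on.
$pieces = array_values($pieces);

echo "\n", count($pieces), ' pieces, key 3 now holds: ', $pieces[3], "\n";
```

After the whitespace-only row at key 3 is removed and the keys are rebuilt, key 3 holds the string two, which is the kind of shuffle the second helper repeats until no extra rows remain.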

	
#######################################
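function RemoveArrayElement($array, $removeKey)
{
   # Drop the element, then renumber the keys from 0 so there are no gaps.
   # array_values() also returns an empty array (not null) when nothing is left.
   unset($array[$removeKey]);
   return array_values($array);
}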
#######################################
function DealWithMultipleSurroundingQuotes($splitter, &$getstrings)
{
   # Even keys hold the text between quoted fields, so each one should
   # contain the delimiter. One that does not is an extra row left over
   # from repeated quotes.
   for ($x = 0; $x < count($getstrings); $x += 2) //foreach even key
   {
      if (strpos($getstrings[$x], $splitter) === false) //if splitter is not in row
      {
         if ($x > 0 && trim($getstrings[$x-1]) == '') //if previous row is empty
            //remove previous row
            $getstrings = RemoveArrayElement($getstrings, $x-1);
         else
            //remove current row
            $getstrings = RemoveArrayElement($getstrings, $x);

         return false; //Array changed, so the caller should run us again
      }
   }
   return true; //Function finished successfully!
}
#######################################
function quotesplit( $splitter, $s, $restore_quotes=false )
{
   # First step is to split it up into the bits that are surrounded by quotes
   # and the bits that aren't. Adding the delimiter to the ends simplifies
   # the logic further down

   $getstrings = explode('"', $splitter . $s . $splitter);

   # Keep removing extra rows until none are left
   while(!DealWithMultipleSurroundingQuotes($splitter, $getstrings));

   # $instring toggles so we know if we are in a quoted string or not
   $delimlen = strlen($splitter);
   $instring = 0;
   $result = array();

   foreach ($getstrings as $val)
   {
      if ($instring == 1)
      {
         if($restore_quotes)
         {
            # Add string with quotes to the previous value in the array
            $result[count($result)-1] = $result[count($result)-1]. '"' . addslashes(trim($val)) . '"';
         } else {
            # Overwrite the empty slot left by the previous unquoted chunk
            $result[count($result)-1] = addslashes(trim($val));
         }
         $instring = 0;
      } else {
         # Break up the string according to the delimiter character
         # Each string has extraneous delimiters around it (inc the ones
         #  we added above), so they need to be stripped off
         # With a one-character delimiter the +1 keeps the trailing delimiter,
         #  which leaves an empty last slot for the next quoted value
         $temparray = explode($splitter, substr($val, $delimlen, strlen($val)-$delimlen-$delimlen+1 ) );
         foreach ($temparray as $ival)
            $result[] = addslashes(trim($ival));
         $instring = 1;
      }
   }
   return $result;
}
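
As a cross-check for simple input, PHP's built-in str_getcsv() (available since PHP 5.3) can parse the same kind of line. This is a different technique from quotesplit, not a replacement for it: it does not call addslashes() on the values, and I have not compared how the two treat the multiple-quote input shown earlier. The sample lines and variable names below are made up for illustration.

```php
<?php
// Cross-check using PHP's built-in str_getcsv(), not quotesplit().
// Both sample lines are invented for this example.
$commaLine = '"one","two, with comma",three';
$tabLine   = "alpha\t\"beta gamma\"\tdelta";

// Arguments: input string, delimiter, enclosure, escape character.
// The escape character is passed explicitly because recent PHP versions
// deprecate relying on its default.
$fromComma = str_getcsv($commaLine, ',', '"', '\\');
$fromTab   = str_getcsv($tabLine, "\t", '"', '\\');

print_r($fromComma); // three fields: one / two, with comma / three
print_r($fromTab);   // three fields: alpha / beta gamma / delta
```

Passing "\t" as the second argument is the built-in equivalent of passing a different delimiter to quotesplit.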
