add details about how the CSV plugin code works under the hood

springmeyer 2012-08-31 12:06:19 -07:00
parent 95336ca03a
commit 0b9537e614

@ -5,3 +5,33 @@ This plugin can read tabular data with embedded geometries. It auto-detects colu
This plugin reads the entire file upon initialization and caches features in memory, so rendering is extremely fast after initial startup (for reasonably sized files, under 5-10 MB).
For more details on the motivations and design of this plugin see: https://github.com/mapnik/mapnik/issues/902
Details on auto-detection:
### headers
The plugin requires a header row that gives each column a name, and it parses the first line of the file as headers. If the first line is actually data, parsing behavior will be unpredictable, because the plugin makes no attempt to detect whether the first line is pure data. Ideally, edit your CSV and add headers if it lacks them. If the CSV file is very large or adding headers is not feasible, the plugin can accept them dynamically: pass the `headers` option with a comma-delimited list of names, like `headers=x,y,name`. This would work for a CSV file lacking headers like:
```csv
-122,48,Seattle
0,51,London
```
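As a rough illustration of what supplying `headers` amounts to, here is a minimal Python sketch. It is not the plugin's code: the function name `rows_with_headers` and its parameters are hypothetical, and it splits naively on the separator without handling quoting.

```python
def rows_with_headers(text, headers=None, separator=","):
    """Pair each data row with column names (illustrative only)."""
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        return []
    if headers is not None:
        # Names supplied by the caller, e.g. "x,y,name"; every line is data.
        names = [name.strip() for name in headers.split(",")]
    else:
        # No names supplied: the first line is treated as the header row.
        names = [name.strip() for name in lines[0].split(separator)]
        lines = lines[1:]
    return [dict(zip(names, line.split(separator))) for line in lines]


rows = rows_with_headers("-122,48,Seattle\n0,51,London\n", headers="x,y,name")
print(rows[0]["name"])
```

Without the `headers` argument, the same input would lose its first row to the header slot, which is the unpredictable behavior described above.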
### file size
Because the plugin caches data in memory, large files are not recommended. If your data is large, please choose a more suitable format that Mapnik can read one row at a time. Because the CSV plugin reads all data at initialization, it attempts to detect files over a certain size and throws an error when it finds one. The default threshold is 20 MB. You can change it by passing the `filesize_max` option a different amount in MB, but doing so is not recommended unless you are sure your machine has sufficient memory.
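The size guard can be pictured with a small Python sketch. The helper names are hypothetical, and treating 1 MB as 1024 * 1024 bytes is an assumption, since the text above does not say how the plugin counts a megabyte.

```python
import os

# Assumption: 1 MB = 1024 * 1024 bytes; the plugin may count differently.
BYTES_PER_MB = 1024 * 1024


def exceeds_limit(size_bytes, filesize_max=20.0):
    """Return True if a file of size_bytes is over the limit given in MB."""
    return size_bytes / BYTES_PER_MB > filesize_max


def check_file_size(path, filesize_max=20.0):
    """Raise before loading if the file at path is larger than the limit."""
    size_bytes = os.path.getsize(path)
    if exceeds_limit(size_bytes, filesize_max):
        raise RuntimeError(
            f"{path} is larger than {filesize_max} MB; "
            "raise filesize_max only if enough memory is available"
        )
    return size_bytes
```

Checking the size before reading anything keeps the failure cheap: the error is raised without loading the oversized file into memory.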
### line breaks
The plugin will read the first 2000 bytes of the file to count the occurrences of `\n` and `\r`. Whichever is more plentiful is assumed to be the character that signifies line breaks. This allows the plugin to work automatically with Mac, Windows, and Unix style line breaks.
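The counting approach can be sketched in a few lines of Python. This is an illustration rather than the plugin's code: `detect_newline` is a hypothetical name, and falling back to `\n` on a tie (which is what Windows `\r\n` files produce) is an assumption that the text above does not confirm.

```python
def detect_newline(data, sample_size=2000):
    """Guess the line-break character from the start of the data."""
    sample = data[:sample_size]
    newline_count = sample.count("\n")
    return_count = sample.count("\r")
    # Assumption: ties (e.g. "\r\n" pairs) resolve to "\n".
    return "\r" if return_count > newline_count else "\n"


print(repr(detect_newline("x,y,name\r0,51,London\r")))
```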
### column separator
The plugin will read the first line of the file to count the occurrences of `,`, `\t`, `|`, and `;` as possible delimiters of columns. You can disable this auto-detection by passing the `separator` option.
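A minimal Python sketch of this kind of detection follows. It assumes the most frequent candidate wins and that `,` is the fallback when none appear or when counts tie; neither rule is stated above, and `detect_separator` is a hypothetical name.

```python
def detect_separator(first_line, candidates=(",", "\t", "|", ";")):
    """Guess the column separator from the first line of a file."""
    counts = {char: first_line.count(char) for char in candidates}
    # max() keeps the earliest candidate on ties, so "," wins a tie.
    best = max(candidates, key=lambda char: counts[char])
    # Assumption: fall back to "," when no candidate occurs at all.
    return best if counts[best] > 0 else ","


print(repr(detect_separator("x|y|name")))
```

A header such as `a;b,c;d` would be read as `;`-separated under this rule, which is one reason to pass `separator` explicitly when the data is ambiguous.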
### escape character
The plugin defaults to assuming `\` is the escape character, so data like `"This is a cell\"s data"` would parse well: the value has spaces, so it needs to be quoted, and although it uses the quoting character in the text, that character is properly escaped. If you prefer a different escape character, pass the `escape` option.
### quote character
The plugin defaults to assuming `"` is the quote character that wraps values needing quotes. Therefore a line like `"Main street,USA"` would parse fine and the `,` inside would not be interpreted as a column separator, but `'Main street, USA'` would not parse unless you passed a custom option like `quote='`.
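The effect of the quote and escape characters can be tried out with Python's standard `csv` module. This is a stand-in for illustration, not the plugin's parser, and `parse_line` with its parameter names is hypothetical (the names simply mirror the plugin options).

```python
import csv


def parse_line(line, separator=",", quote='"', escape="\\"):
    """Split one line into cells, honoring quote and escape characters."""
    reader = csv.reader(
        [line],
        delimiter=separator,
        quotechar=quote,
        escapechar=escape,
        doublequote=False,  # quotes inside a value must be escaped, not doubled
    )
    return next(reader)


print(parse_line('"Main street,USA"'))
print(parse_line("'Main street, USA'"))
print(parse_line("'Main street, USA'", quote="'"))
```

With the default quote character, the single-quoted line is split at the comma into two cells; passing `quote="'"` keeps it as one.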