ETL

extraction, transformation, loading

Marc Tobias Metten

Lokku Ltd.

overview

ETL


ISBN 0764567578

examples

  • daily import from legacy systems
  • log file analysis
  • SAP R/3 sales data

nestoria - screenshot

screenshot: Nestoria - Houses to buy in Oxford

nestoria - business overview


UK, Spain, Italy, Germany

data sizes

nestoria - challenges

... prepare and you'll have low maintenance

ETL software

A - commercial

Enterprise

problem solved

IBM DataStage


source

IBM DataStage

A - commercial


ISBN 1599941988

B - Open Source - Kettle

Kettle

Pentaho Data Integration is a powerful, metadata-driven ETL tool designed to bridge the gap between business and IT; Turning your company's data into increased profits.

http://kettle.pentaho.org/

B - Open Source - Kettle

B - Open Source - Sprog

Sprog

B - Open Source - Talend Open Studio

Talend Open Studio

B - Open Source - Talend

C - build-your-own

why?

why we've built our own


source

ETL @ nestoria

convert @ nestoria
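The contents of the convert slide are not preserved. As an illustration only: a common first step when many sources feed one system is to rename each source's own field names onto a single internal schema. The sketch below assumes two hypothetical feeds, `agent_a` and `agent_b`, and a made-up schema (`price`, `city`, `bedrooms`). It is not Nestoria's code.

```python
# Hypothetical per-source field maps: each source names the same data differently.
FIELD_MAPS = {
    "agent_a": {"Price": "price", "Town": "city", "Beds": "bedrooms"},
    "agent_b": {"asking_price": "price", "location": "city", "num_bedrooms": "bedrooms"},
}


def convert(source: str, raw: dict) -> dict:
    """Rename a raw record's fields to the (hypothetical) internal schema.

    Fields that are not listed in the source's map are dropped.
    """
    field_map = FIELD_MAPS[source]
    return {internal: raw[external]
            for external, internal in field_map.items()
            if external in raw}


if __name__ == "__main__":
    raw = {"asking_price": "250000", "location": "Oxford", "num_bedrooms": "3", "ref": "x1"}
    print(convert("agent_b", raw))
```

Keeping the mapping as data rather than code means adding a new source is mostly a matter of adding one more dictionary entry.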

serialize

JSON::XS, Data::Dumper vs XML
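The slide compares Perl's JSON::XS and Data::Dumper with XML for serializing intermediate data. A rough Python analogue using only the standard library (`json` and `xml.etree.ElementTree`) shows two practical differences: JSON round-trips numbers as numbers while XML text nodes come back as strings, and JSON output is more compact. The `listing` record and its fields are made up for illustration.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical listing record used for both formats.
record = {"id": 42, "city": "Oxford", "price": 250000, "bedrooms": 3}


def to_json(rec: dict) -> str:
    # sort_keys gives deterministic output, which makes intermediate files diffable
    return json.dumps(rec, sort_keys=True)


def from_json(text: str) -> dict:
    return json.loads(text)


def to_xml(rec: dict) -> str:
    root = ET.Element("listing")
    for key, value in rec.items():
        ET.SubElement(root, key).text = str(value)
    return ET.tostring(root, encoding="unicode")


def from_xml(text: str) -> dict:
    root = ET.fromstring(text)
    # every value comes back as a string; types must be restored by hand
    return {child.tag: child.text for child in root}


if __name__ == "__main__":
    print(to_json(record))
    print(to_xml(record))
```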

typical ETL transform tasks

Transform @ nestoria
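Typical transform tasks include trimming and normalizing text, parsing numbers out of formatted strings, and dropping duplicates. A minimal sketch of those three tasks follows, assuming records that were already converted to a common schema with hypothetical `city`, `price` and `bedrooms` fields. The duplicate key `(city, price, bedrooms)` is an arbitrary choice for illustration.

```python
import re
from typing import Optional

# First run of digits, allowing thousands separators: "£250,000" -> "250,000"
PRICE_RE = re.compile(r"\d[\d,]*")


def clean_price(value: str) -> Optional[int]:
    """Parse a formatted price string; return None if it holds no number (e.g. "POA")."""
    match = PRICE_RE.search(value)
    return int(match.group().replace(",", "")) if match else None


def transform(records: list) -> list:
    seen = set()
    out = []
    for rec in records:
        clean = {
            "city": rec["city"].strip().title(),   # " oxford " -> "Oxford"
            "price": clean_price(rec["price"]),
            "bedrooms": int(rec["bedrooms"]),
        }
        key = (clean["city"], clean["price"], clean["bedrooms"])
        if key in seen:
            continue  # drop records that are identical after cleaning
        seen.add(key)
        out.append(clean)
    return out
```

Cleaning before de-duplicating matters: the two Oxford records in the test below only compare equal once whitespace, case and currency formatting are normalized.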

lessons learned - transform

free-text data example

Commission:
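A free-text field such as a hand-typed commission can hold almost anything ("1.5% + VAT", "£2,000 fixed fee", "negotiable"). One defensive approach is to try a few narrow patterns and return nothing when none match, so the record can be flagged instead of guessed. This sketch assumes that only a percentage or a pound amount is worth extracting; the patterns and the return shape are hypothetical.

```python
import re
from typing import Optional, Tuple

# Percentage, accepting "." or "," as decimal separator: "1.5%", "1,5 %"
PERCENT_RE = re.compile(r"(\d+(?:[.,]\d+)?)\s*%")
# Pound amount with optional thousands separators: "£2,000", "£ 1500.50"
AMOUNT_RE = re.compile(r"£\s*(\d[\d,]*(?:\.\d+)?)")


def parse_commission(text: str) -> Optional[Tuple[str, float]]:
    """Return ("percent", value) or ("fixed", value), or None if unparseable."""
    pct = PERCENT_RE.search(text)
    if pct:
        return ("percent", float(pct.group(1).replace(",", ".")))
    amount = AMOUNT_RE.search(text)
    if amount:
        return ("fixed", float(amount.group(1).replace(",", "")))
    return None  # leave it for manual review rather than guess
```

Note that the comma means different things in the two patterns: a decimal separator in percentages (common in Spain, Italy and Germany) and a thousands separator in UK pound amounts.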

Loading @ Nestoria

simple
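If loading really is simple, it can be as little as a batched upsert inside one transaction, so a failed run leaves the previous data intact. This sketch uses Python's built-in `sqlite3` with a hypothetical `listings` table. `INSERT OR REPLACE` is SQLite-specific; other databases need their own upsert syntax.

```python
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    # Hypothetical table: one row per listing, keyed by the source's listing id.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS listings ("
        "id TEXT PRIMARY KEY, city TEXT NOT NULL, price INTEGER)"
    )


def load(conn: sqlite3.Connection, listings: list) -> int:
    """Upsert a batch of listings; the whole batch commits or rolls back together."""
    rows = [(l["id"], l["city"], l["price"]) for l in listings]
    with conn:  # commits on success, rolls back if any row fails
        conn.executemany(
            "INSERT OR REPLACE INTO listings (id, city, price) VALUES (?, ?, ?)",
            rows,
        )
    return len(rows)
```

Because the batch is one transaction, a single bad row (here, a missing `city`) aborts the whole batch rather than leaving a half-loaded table.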

lessons learned - loading

scaling ETL

can't give a clear recommendation we did

monitoring

  1. dashboard
  2. alarms
  3. performance
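For alarms, one simple check is to compare each import's record count with a recent baseline and alert on a sharp drop. This sketch assumes a hypothetical history of daily counts per source and an arbitrary 30% threshold; both would need tuning against real data.

```python
from statistics import mean
from typing import List, Optional


def check_import_volume(history: List[int], today: int, max_drop: float = 0.3) -> Optional[str]:
    """Return an alarm message if today's count fell more than max_drop below
    the average of recent runs, otherwise None."""
    if not history:
        return None  # no baseline yet
    baseline = mean(history)
    if baseline == 0:
        return None
    drop = (baseline - today) / baseline
    if drop > max_drop:
        return f"import volume dropped {drop:.0%}: {today} vs avg {baseline:.0f}"
    return None
```

A relative threshold keeps one rule usable for both large and small sources, at the cost of being noisy for sources whose volume naturally swings a lot.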

monitoring @ nestoria

lessons learned - summary

lessons learned - summary II

    $ prove -Ilib -r t/etl/ t/blackbox/etl/
    t/etl/attrhash........................................ok
    t/etl/convert.........................................ok
    t/etl/dropbox.........................................ok
    t/etl/get_summaries_since.............................ok
    t/etl/images..........................................ok
    t/etl/import..........................................ok
    t/etl/object_cache....................................ok
    t/etl/openimmoutils...................................ok
    [...]
    All tests successful.
    Files=56, Tests=1493, 372 wallclock secs (305.88 cusr +  6.26 csys = 312.14 CPU)
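The run above is Perl's `prove` over a `t/etl/` tree. The same idea in Python, shown here with a hypothetical `normalize_city` transform, is a table-driven test: each case pairs a raw input with its expected output, so a new edge case found in a real feed becomes one more line.

```python
import unittest


def normalize_city(raw: str) -> str:
    """Hypothetical transform under test: trim, collapse inner spaces, title-case."""
    return " ".join(raw.split()).title()


class NormalizeCityTest(unittest.TestCase):
    # (raw input, expected output)
    CASES = [
        ("oxford", "Oxford"),
        ("  OXFORD  ", "Oxford"),
        ("st  albans", "St Albans"),
    ]

    def test_cases(self):
        for raw, expected in self.CASES:
            with self.subTest(raw=raw):
                self.assertEqual(normalize_city(raw), expected)


if __name__ == "__main__":
    unittest.main()
```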

recommended CPAN modules I

recommended CPAN modules II

recommended CPAN modules III

Questions?


!! find us in the hallway !!