
Configurations for loading data to MySQL

Here is a description of the different config values and properties.

Properties

* mapreduce.jdbc.driver.class Name of your JDBC driver class, example com.mysql.jdbc.Driver
* mapreduce.jdbc.url DB connect URL example jdbc:mysql://localhost:3306/cumulus
* mapreduce.jdbc.username DB Username
* mapreduce.jdbc.password DB user password
* mapreduce.jdbc.hiho.load.query.suffix Describes your file format, using the syntax of the MySQL LOAD DATA command. Supply only the part from tbl_name onward; the initial part of the command is derived automatically. For example, if on examining your hdfs files you would have written LOAD DATA LOCAL INFILE 'filename.txt' INTO TABLE tbl_name fields terminated by ',' then configure mapreduce.jdbc.hiho.load.query.suffix with only tbl_name fields terminated by ','
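
As a rough illustration, these properties can be set programmatically on a Hadoop Configuration in your job driver before invoking the HIHO load job. This is only a sketch: the class name, credentials, and table details below are placeholders, and how the Configuration is handed to the job depends on your HIHO release.

```java
import org.apache.hadoop.conf.Configuration;

// Minimal sketch: set the MySQL load properties on a Hadoop Configuration.
// All values below are illustrative placeholders.
public class MysqlLoadConfigExample {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        conf.set("mapreduce.jdbc.driver.class", "com.mysql.jdbc.Driver");
        conf.set("mapreduce.jdbc.url", "jdbc:mysql://localhost:3306/cumulus");
        conf.set("mapreduce.jdbc.username", "dbuser");
        conf.set("mapreduce.jdbc.password", "dbpassword");
        // Only the part of the LOAD DATA command from the table name onward.
        conf.set("mapreduce.jdbc.hiho.load.query.suffix",
                 "employee fields terminated by ','");
        // Pass conf to the HIHO load job as documented for your release.
    }
}
```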

Configurations for importing query-based data to Hadoop from RDBMS

The HIHO importer is configurable with user-supplied values. Here is a description of the different config values and properties.

Properties

* mapreduce.jdbc.driver.class Name of your JDBC driver class, example com.mysql.jdbc.Driver
* mapreduce.jdbc.url DB connect URL example jdbc:mysql://localhost:3306/cumulus
* mapreduce.jdbc.username DB Username
* mapreduce.jdbc.password DB user password
* mapreduce.jdbc.input.query The query to run against the database; it fetches the records from the db into Hadoop. Because this query is split across multiple mappers, specify $CONDITIONS at the point where the bounding predicate should be inserted. Example select extractJobEmployee.id, extractJobEmployee.name, extractJobEmployee.age, extractJobEmployee.salary, designations.id, designations.designation from extractJobEmployee, designations where extractJobEmployee.designationId = designations.id and extractJobEmployee.isMarried = ? AND $CONDITIONS
* mapreduce.jdbc.input.orderby This is the column against which the splitting will take place. Example extractJobEmployee.id
* mapreduce.jdbc.hiho.input.outputPath HDFS path where records will be written
* mapreduce.jdbc.hiho.input.outputStrategy Format of the output records. Choose between AVRO and DELIMITED (case sensitive)
* mapreduce.jdbc.hiho.input.delimiter The delimiter used to separate the columns
* mapred.jdbc.input.bounding.query The query which will compute the range of the column on which splitting is happening. Example select min(id)+5, max(id)-5 from extractJobEmployee
* io.serializations Used internally, please preserve this value.
* mapreduce.jdbc.hiho.number.mappers Actual number of mappers to use 
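
Below is a hedged sketch of the same properties set on a Hadoop Configuration. The query, output path, and mapper count are examples only, and io.serializations is left at its default, as noted above.

```java
import org.apache.hadoop.conf.Configuration;

// Minimal sketch: configure the query-based import. Values are illustrative.
public class QueryImportConfigExample {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        conf.set("mapreduce.jdbc.driver.class", "com.mysql.jdbc.Driver");
        conf.set("mapreduce.jdbc.url", "jdbc:mysql://localhost:3306/cumulus");
        conf.set("mapreduce.jdbc.username", "dbuser");
        conf.set("mapreduce.jdbc.password", "dbpassword");
        // $CONDITIONS marks where each mapper's range predicate is inserted.
        conf.set("mapreduce.jdbc.input.query",
                 "select extractJobEmployee.id, extractJobEmployee.name "
               + "from extractJobEmployee where 1 = 1 AND $CONDITIONS");
        conf.set("mapreduce.jdbc.input.orderby", "extractJobEmployee.id");
        conf.set("mapreduce.jdbc.hiho.input.outputPath", "/user/hadoop/import");
        conf.set("mapreduce.jdbc.hiho.input.outputStrategy", "DELIMITED");
        conf.set("mapreduce.jdbc.hiho.input.delimiter", ",");
        // Computes the range of the splitting column.
        conf.set("mapred.jdbc.input.bounding.query",
                 "select min(id), max(id) from extractJobEmployee");
        conf.set("mapreduce.jdbc.hiho.number.mappers", "4");
    }
}
```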

Configurations for importing table data from RDBMS to Hadoop

Properties

* mapreduce.jdbc.driver.class JDBC driver of the database
* mapreduce.jdbc.url DB URL, for example jdbc:mysql://localhost:3306/test
* mapreduce.jdbc.username DB User
* mapreduce.jdbc.password DB password
* mapreduce.jdbc.input.table.name Table from which to fetch data
* mapreduce.jdbc.input.field.names Columns to fetch
* mapreduce.jdbc.input.orderby Column by which splitting will happen across the mappers. This property controls the parallelization of fetching data from the db.
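
As with the query-based import, a minimal sketch of setting these properties follows; the table and column names are placeholders.

```java
import org.apache.hadoop.conf.Configuration;

// Minimal sketch: configure a whole-table import. Values are illustrative.
public class TableImportConfigExample {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        conf.set("mapreduce.jdbc.driver.class", "com.mysql.jdbc.Driver");
        conf.set("mapreduce.jdbc.url", "jdbc:mysql://localhost:3306/test");
        conf.set("mapreduce.jdbc.username", "dbuser");
        conf.set("mapreduce.jdbc.password", "dbpassword");
        conf.set("mapreduce.jdbc.input.table.name", "employee");
        conf.set("mapreduce.jdbc.input.field.names", "id,name,age,salary");
        // Splitting on this column controls how the fetch is parallelized.
        conf.set("mapreduce.jdbc.input.orderby", "id");
    }
}
```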

Configurations for exporting data to an Oracle external table

Properties

* mapreduce.jdbc.driver.class JDBC driver
* mapreduce.jdbc.url Database URL, for example jdbc:oracle:thin:@192.168.128.2:1521:nube
* mapreduce.jdbc.username DB user
* mapreduce.jdbc.password DB user password
* mapreduce.jdbc.hiho.oracle.externaltable.dml External table definition. Please check conf/oracleExport.xml for a sample
* mapreduce.jdbc.hiho.oracle.ftp.serveraddress Oracle server FTP address, for example 192.168.128.2
* mapreduce.jdbc.hiho.oracle.ftp.portnumber FTP port
* mapreduce.jdbc.hiho.oracle.ftp.username FTP user
* mapreduce.jdbc.hiho.oracle.ftp.password FTP password
* mapreduce.jdbc.hiho.oracle.ftp.extdir Folder mapped to an Oracle directory, where the FTP transfer will take place
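
A sketch of the Oracle export configuration follows, assuming placeholder FTP credentials and an abbreviated external table definition (see conf/oracleExport.xml for a real sample).

```java
import org.apache.hadoop.conf.Configuration;

// Minimal sketch: configure the Oracle external table export.
// The DML string and FTP details are placeholders.
public class OracleExportConfigExample {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        conf.set("mapreduce.jdbc.driver.class", "oracle.jdbc.driver.OracleDriver");
        conf.set("mapreduce.jdbc.url", "jdbc:oracle:thin:@192.168.128.2:1521:nube");
        conf.set("mapreduce.jdbc.username", "dbuser");
        conf.set("mapreduce.jdbc.password", "dbpassword");
        // External table definition; see conf/oracleExport.xml for a full sample.
        conf.set("mapreduce.jdbc.hiho.oracle.externaltable.dml",
                 "CREATE TABLE ... ORGANIZATION EXTERNAL ...");
        conf.set("mapreduce.jdbc.hiho.oracle.ftp.serveraddress", "192.168.128.2");
        conf.set("mapreduce.jdbc.hiho.oracle.ftp.portnumber", "21");
        conf.set("mapreduce.jdbc.hiho.oracle.ftp.username", "ftpuser");
        conf.set("mapreduce.jdbc.hiho.oracle.ftp.password", "ftppassword");
        conf.set("mapreduce.jdbc.hiho.oracle.ftp.extdir", "/home/oracle/extdir");
    }
}
```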

Configurations for exporting data to Salesforce

* mapreduce.jdbc.hiho.sf.username Salesforce account username
* mapreduce.jdbc.hiho.sf.password Salesforce password
* mapreduce.jdbc.hiho.sf.sobjectype Salesforce object
* mapreduce.jdbc.hiho.sf.headers Salesforce object headers
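
And a corresponding sketch for the Salesforce export; the account, object type, and header list are placeholders.

```java
import org.apache.hadoop.conf.Configuration;

// Minimal sketch: configure the Salesforce export. Values are illustrative.
public class SalesforceExportConfigExample {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        conf.set("mapreduce.jdbc.hiho.sf.username", "user@example.com");
        conf.set("mapreduce.jdbc.hiho.sf.password", "passwordAndSecurityToken");
        conf.set("mapreduce.jdbc.hiho.sf.sobjectype", "Account");
        conf.set("mapreduce.jdbc.hiho.sf.headers", "Name,Phone,Website");
    }
}
```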
