#------------------------------------------------------------------------------
# $File: apache,v 1.1 2017/04/11 14:52:15 christos Exp $
# apache: file(1) magic for Apache Big Data formats

# Avro files
0	string		Obj		Apache Avro
>3	byte		x		version %d

# ORC files
# Important information is in file footer, which we can't index to :(
0	string		ORC		Apache ORC

# Parquet files
0	string		PAR1		Apache Parquet

# Hive RC files
0	string		RCF		Apache Hive RC file
>3	byte		x		version %d

# Sequence files (and the careless first version of RC file)

0	string		SEQ
>3	byte		<6		Apache Hadoop Sequence file version %d
>3	byte		>6		Apache Hadoop Sequence file version %d
>3	byte		=6
>>5	string		org.apache.hadoop.hive.ql.io.RCFile$KeyBuffer	Apache Hive RC file version 0
>>3	default		x		Apache Hadoop Sequence file version 6