MapFile
A MapFile is a sorted SequenceFile. It consists of two parts: data and index.
index
The index file is the data file's index. It records a subset of the records' keys (by default every 128th key, controlled by io.map.index.interval) together with each indexed record's offset in the data file. When a MapFile is accessed, the entire index file is loaded into memory, and the in-memory index lets the reader seek straight to the region of the data file containing the requested record. Compared with a plain SequenceFile, lookups in a MapFile are therefore fast; the trade-off is that the whole index is held in memory, so keys should be kept as small as practical.
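That memory cost can be tuned. As a minimal sketch (assuming a Hadoop 2.x classpath; io.map.index.interval and io.map.index.skip are the standard property names), both knobs are set on the Configuration before creating the writer or reader:

Configuration conf = new Configuration();
// Writer side: index every 256th key instead of the default 128th.
// A larger interval shrinks the index (on disk and in memory) at the
// cost of a longer forward scan through data for each lookup.
conf.setInt("io.map.index.interval", 256);
// Reader side: keep only every (skip + 1)-th index entry in memory,
// trading a little lookup speed for a smaller memory footprint.
conf.setInt("io.map.index.skip", 1);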
Read/write source code:
package org.apache.hadoop.io;

import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.util.ReflectionUtils;

public class THT_testMapFileWrite1 {

	private static final String[] DATA = { "One, two, buckle my shoe",
			"Three, four, shut the door", "Five, six, pick up sticks",
			"Seven, eight, lay them straight", "Nine, ten, a big fat hen" };

	public static void main(String[] args) throws IOException {
		// String uri = args[0];
		String uri = "file:///D://tmp//map1";
		Configuration conf = new Configuration();
		FileSystem fs = FileSystem.get(URI.create(uri), conf);

		IntWritable key = new IntWritable();
		Text value = new Text();

		// Write ten records. MapFile.Writer requires keys to be
		// appended in strictly increasing order.
		MapFile.Writer writer = null;
		try {
			writer = new MapFile.Writer(conf, fs, uri, key.getClass(),
					value.getClass());
			for (int i = 0; i < 10; i++) {
				key.set(i + 1);
				value.set(DATA[i % DATA.length]);
				writer.append(key, value);
			}
		} finally {
			IOUtils.closeStream(writer);
		}

		// Read all records back sequentially, instantiating the key and
		// value types recorded in the file header.
		MapFile.Reader reader = null;
		try {
			reader = new MapFile.Reader(fs, uri, conf);
			WritableComparable keyR = (WritableComparable) ReflectionUtils
					.newInstance(reader.getKeyClass(), conf);
			Writable valueR = (Writable) ReflectionUtils.newInstance(
					reader.getValueClass(), conf);
			while (reader.next(keyR, valueR)) {
				System.out.printf("%s\t%s\n", keyR, valueR);
			}
		} finally {
			IOUtils.closeStream(reader);
		}
	}
}
Output:
2015-11-08 11:46:09,532 INFO compress.CodecPool (CodecPool.java:getDecompressor(181)) - Got brand-new decompressor [.deflate]
1 One, two, buckle my shoe
2 Three, four, shut the door
3 Five, six, pick up sticks
4 Seven, eight, lay them straight
5 Nine, ten, a big fat hen
6 One, two, buckle my shoe
7 Three, four, shut the door
8 Five, six, pick up sticks
9 Seven, eight, lay them straight
10 Nine, ten, a big fat hen
Running the program creates a directory containing two files, index and data:
[screenshot: index contents]
[screenshot: data contents]
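The point of the index is random access by key, which the sequential read above does not exercise. Here is a minimal sketch of a keyed lookup against the map1 directory written above (the class name THT_testMapFileGet is mine, chosen to match the naming style of this post):

package org.apache.hadoop.io;

import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class THT_testMapFileGet {
	public static void main(String[] args) throws IOException {
		String uri = "file:///D://tmp//map1";
		Configuration conf = new Configuration();
		FileSystem fs = FileSystem.get(URI.create(uri), conf);
		MapFile.Reader reader = null;
		try {
			reader = new MapFile.Reader(fs, uri, conf);
			Text value = new Text();
			// get() binary-searches the in-memory index, seeks to the
			// nearest indexed entry in data, then scans forward to the
			// exact key; it returns null when the key is absent.
			Writable hit = reader.get(new IntWritable(7), value);
			System.out.println(hit == null ? "not found" : value.toString());
		} finally {
			IOUtils.closeStream(reader);
		}
	}
}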
Rebuilding the index
First, delete the index file generated above, then run the following code to rebuild it:
package org.apache.hadoop.io;

//cc MapFileFixer Re-creates the index for a MapFile
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile.Reader;
import org.apache.hadoop.util.ReflectionUtils;

//vv MapFileFixer
public class THT_testMapFileFix {

	public static void main(String[] args) throws Exception {
		// String mapUri = args[0];
		String mapUri = "file:///D://tmp//map1";
		Configuration conf = new Configuration();
		FileSystem fs = FileSystem.get(URI.create(mapUri), conf);

		Path map = new Path(mapUri);
		Path mapData = new Path(map, MapFile.DATA_FILE_NAME);

		// Get the key and value types from the data sequence file
		SequenceFile.Reader reader = new SequenceFile.Reader(fs, mapData, conf);
		Class keyClass = reader.getKeyClass();
		Class valueClass = reader.getValueClass();
		reader.close();

		// Re-create the MapFile's index file
		long entries = MapFile.fix(fs, map, keyClass, valueClass, false, conf);
		System.out.printf("Created MapFile %s with %d entries\n", map, entries);

		// Dump the data file, printing each record's byte position and
		// a "*" wherever a sync point was seen.
		SequenceFile.Reader.Option option1 = Reader.file(mapData);
		SequenceFile.Reader reader1 = null;
		try {
			reader1 = new SequenceFile.Reader(conf, option1);
			Writable key = (Writable) ReflectionUtils.newInstance(
					reader1.getKeyClass(), conf);
			Writable value = (Writable) ReflectionUtils.newInstance(
					reader1.getValueClass(), conf);
			long position = reader1.getPosition();
			while (reader1.next(key, value)) {
				String syncSeen = reader1.syncSeen() ? "*" : "";
				System.out.printf("[%s%s]\t%s\t%s\n", position, syncSeen, key,
						value);
				position = reader1.getPosition(); // beginning of next record
			}
		} finally {
			IOUtils.closeStream(reader1);
		}
	}
}
// ^^ MapFileFixer
The output is as follows:
2015-11-08 12:16:34,015 INFO compress.CodecPool (CodecPool.java:getCompressor(153)) - Got brand-new compressor [.deflate]
Created MapFile file:/D:/tmp/map1 with 10 entries
[128] 1 One, two, buckle my shoe
[173] 2 Three, four, shut the door
[220] 3 Five, six, pick up sticks
[264] 4 Seven, eight, lay them straight
[314] 5 Nine, ten, a big fat hen
[359] 6 One, two, buckle my shoe
[404] 7 Three, four, shut the door
[451] 8 Five, six, pick up sticks
[495] 9 Seven, eight, lay them straight
[545] 10 Nine, ten, a big fat hen
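Two details of MapFile.fix are worth noting, both visible in the Hadoop 2.x source: it returns -1 and does nothing if the index file already exists, and passing true for the dryrun parameter only counts the entries without writing an index. A small illustrative fragment, reusing the fs, map, keyClass, valueClass and conf variables from the program above:

// Dry run: scan data and report how many entries a rebuilt index
// would cover, without creating the index file. Returns -1 if an
// index file is already present.
long wouldCover = MapFile.fix(fs, map, keyClass, valueClass, true, conf);
System.out.println("dry run entry count: " + wouldCover);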