Package org.apache.iceberg.util
Class FileSystemWalker
java.lang.Object
org.apache.iceberg.util.FileSystemWalker
Utility class for recursively traversing file systems and identifying hidden paths. Provides
methods to list files recursively while filtering out hidden paths based on specified criteria.
-
Method Summary
Modifier and Type: static void
Method: listDirRecursivelyWithFileIO(SupportsPrefixOperations io, String dir, Map<Integer, PartitionSpec> specs, Predicate<FileInfo> filter, Consumer<String> fileConsumer)
Description: Recursively lists files in the specified directory that satisfy the given conditions.

Modifier and Type: static void
Method: listDirRecursivelyWithHadoop(String dir, Map<Integer, PartitionSpec> specs, Predicate<org.apache.hadoop.fs.FileStatus> filter, org.apache.hadoop.conf.Configuration conf, int maxDepth, int maxDirectSubDirs, Consumer<String> directoryConsumer, Consumer<String> fileConsumer)
Description: Recursively traverses the specified directory using the Hadoop FileSystem API to collect file paths that meet the conditions.
-
Method Details
-
listDirRecursivelyWithFileIO
public static void listDirRecursivelyWithFileIO(SupportsPrefixOperations io, String dir, Map<Integer, PartitionSpec> specs, Predicate<FileInfo> filter, Consumer<String> fileConsumer)

Recursively lists files in the specified directory that satisfy the given conditions. Uses FileSystemWalker.PartitionAwareHiddenPathFilter to filter out hidden paths.

Parameters:
io - FileIO implementation supporting prefix operations
dir - Base directory to start the recursive listing
specs - Map of partition specs for this table. Used to prevent partition directories from being filtered as hidden paths.
filter - File filter condition; only files satisfying this condition are collected
fileConsumer - Consumer that accepts matching file locations
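A minimal usage sketch (not part of the Javadoc): it assumes the table's FileIO implementation supports prefix operations (for example S3FileIO), that data files live under a "/data" subdirectory of the table location, and an age-based filter; the names ListDataFilesExample and listOldDataFiles are illustrative only.

import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

import org.apache.iceberg.PartitionSpec;
import org.apache.iceberg.Table;
import org.apache.iceberg.io.FileInfo;
import org.apache.iceberg.io.SupportsPrefixOperations;
import org.apache.iceberg.util.FileSystemWalker;

public class ListDataFilesExample {

  // Lists data file locations under the table's data directory that were
  // created before the given timestamp. Assumes the table's FileIO supports
  // prefix operations (e.g. S3FileIO); otherwise the cast below fails.
  static List<String> listOldDataFiles(Table table, long olderThanMillis) {
    SupportsPrefixOperations io = (SupportsPrefixOperations) table.io();
    Map<Integer, PartitionSpec> specs = table.specs();

    // Only files created before the cutoff are handed to the consumer.
    Predicate<FileInfo> filter = info -> info.createdAtMillis() < olderThanMillis;

    List<String> matched = new ArrayList<>();
    FileSystemWalker.listDirRecursivelyWithFileIO(
        io,
        table.location() + "/data", // base directory to start the recursive listing
        specs,
        filter,
        matched::add);              // fileConsumer receives matching file locations
    return matched;
  }
}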
-
listDirRecursivelyWithHadoop
public static void listDirRecursivelyWithHadoop(String dir, Map<Integer, PartitionSpec> specs, Predicate<org.apache.hadoop.fs.FileStatus> filter, org.apache.hadoop.conf.Configuration conf, int maxDepth, int maxDirectSubDirs, Consumer<String> directoryConsumer, Consumer<String> fileConsumer)

Recursively traverses the specified directory using the Hadoop FileSystem API to collect file paths that meet the conditions.

This method provides depth control and a limit on the number of direct subdirectories:
- Stops traversal when the maximum recursion depth is reached and adds the current directory to the pending list
- Stops traversal when the number of direct subdirectories exceeds the threshold and adds those subdirectories to the pending list

Parameters:
dir - The starting directory path to traverse
specs - Map of partition specs for this table. Used to prevent partition directories from being filtered as hidden paths.
filter - File filter condition; only files satisfying this condition are collected
conf - Hadoop configuration used to load the FileSystem
maxDepth - Maximum recursion depth limit
maxDirectSubDirs - Upper limit of subdirectories that can be processed directly
directoryConsumer - Consumer that collects unprocessed directory paths
fileConsumer - Consumer that collects qualifying file paths
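A minimal sketch of the Hadoop-based variant (illustrative, not from the Javadoc): the "/data" subdirectory, the modification-time filter, and the maxDepth/maxDirectSubDirs values are assumptions; the names HadoopWalkExample and walkDataDir are hypothetical. Directories that could not be fully traversed are collected so the caller can list them later.

import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.iceberg.PartitionSpec;
import org.apache.iceberg.Table;
import org.apache.iceberg.util.FileSystemWalker;

public class HadoopWalkExample {

  // Walks the table's data directory with the Hadoop FileSystem API, matching
  // files older than the cutoff. Directories beyond maxDepth, and the children
  // of a directory whose direct subdirectory count exceeds maxDirectSubDirs,
  // are not descended into; their paths go to directoryConsumer instead.
  static List<String> walkDataDir(Table table, Configuration conf, long olderThanMillis) {
    Map<Integer, PartitionSpec> specs = table.specs();

    Predicate<FileStatus> filter =
        status -> status.getModificationTime() < olderThanMillis;

    List<String> deferredDirs = new ArrayList<>(); // directories left unprocessed
    List<String> matchedFiles = new ArrayList<>(); // files that passed the filter

    FileSystemWalker.listDirRecursivelyWithHadoop(
        table.location() + "/data", // starting directory to traverse
        specs,
        filter,
        conf,
        3,                          // maxDepth: illustrative recursion depth limit
        10,                         // maxDirectSubDirs: illustrative subdirectory limit
        deferredDirs::add,          // directoryConsumer: unprocessed directory paths
        matchedFiles::add);         // fileConsumer: qualifying file paths

    // A real job would still need to list deferredDirs by some other means.
    return matchedFiles;
  }
}

The two limits bound how much work a single traversal does on very deep or very wide directory trees; the pending directories handed to directoryConsumer can then be processed separately, for example by a distributed listing.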
-