Class RemoveOrphanFilesProcedure
java.lang.Object
org.apache.iceberg.spark.procedures.RemoveOrphanFilesProcedure
- All Implemented Interfaces:
Procedure
A procedure that removes orphan files in a table.
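As a rough illustration of how a procedure like this is typically reached from user code, the sketch below only builds the text of a Spark SQL CALL statement. It assumes, based on Iceberg's Spark procedure documentation rather than on this page, that the procedure is registered as remove_orphan_files in the catalog's system namespace and accepts named arguments such as table and dry_run. The catalog name my_catalog, the table db.sample, and the helper class itself are hypothetical. The argument names should be checked against the Iceberg version in use.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper: formats a CALL statement for the procedure.
// It does not touch Spark; the resulting string would be handed to SparkSession.sql(...).
final class RemoveOrphanFilesCall {

    // Renders each entry as `name => literal`, preserving the map's iteration order.
    static String callStatement(String catalog, Map<String, Object> namedArgs) {
        StringJoiner args = new StringJoiner(", ", "(", ")");
        for (Map.Entry<String, Object> entry : namedArgs.entrySet()) {
            args.add(entry.getKey() + " => " + literal(entry.getValue()));
        }
        // Assumption: the procedure lives under <catalog>.system.remove_orphan_files.
        return "CALL " + catalog + ".system.remove_orphan_files" + args;
    }

    // Strings are single-quoted with embedded quotes doubled; other values use toString().
    static String literal(Object value) {
        if (value instanceof String) {
            return "'" + ((String) value).replace("'", "''") + "'";
        }
        return String.valueOf(value);
    }

    public static void main(String[] args) {
        Map<String, Object> namedArgs = new LinkedHashMap<>();
        namedArgs.put("table", "db.sample"); // placeholder table name
        namedArgs.put("dry_run", true);      // assumed argument name, see lead-in
        System.out.println(callStatement("my_catalog", namedArgs));
    }
}
```

Building the statement separately from executing it keeps the example free of Spark dependencies. In a real job the string would be passed to an existing SparkSession.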
Field Summary
Fields
Modifier and Type / Field / Description
protected static final org.apache.spark.sql.types.DataType STRING_MAP
protected static final org.apache.spark.sql.types.DataType STRING_ARRAY
-
Method Summary
Modifier and Type / Method / Description
protected SparkActions actions()
builder()
org.apache.spark.sql.catalyst.InternalRow[] call(org.apache.spark.sql.catalyst.InternalRow args): Executes this procedure.
protected void closeService(): Closes this procedure's executor service if a new one was created with BaseProcedure.executorService(int, String).
description(): Returns the description of this procedure.
protected ExecutorService executorService(int threadPoolSize, String nameFormat): Starts a new executor service which can be used by this procedure in its work.
protected Expression filterExpression(org.apache.spark.sql.connector.catalog.Identifier ident, String where)
protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> loadRows
protected SparkTable loadSparkTable(org.apache.spark.sql.connector.catalog.Identifier ident)
protected <T> T modifyIcebergTable(org.apache.spark.sql.connector.catalog.Identifier ident, Function<Table, T> func)
protected org.apache.spark.sql.catalyst.InternalRow newInternalRow(Object... values)
org.apache.spark.sql.types.StructType outputType(): Returns the type of rows produced by this procedure.
parameters(): Returns the input parameters of this procedure.
protected void refreshSparkCache(org.apache.spark.sql.connector.catalog.Identifier ident, org.apache.spark.sql.connector.catalog.Table table)
protected org.apache.spark.sql.SparkSession spark()
protected org.apache.spark.sql.connector.catalog.TableCatalog tableCatalog()
protected Spark3Util.CatalogAndIdentifier toCatalogAndIdentifier(String identifierAsString, String argName, org.apache.spark.sql.connector.catalog.CatalogPlugin catalog)
protected org.apache.spark.sql.connector.catalog.Identifier toIdentifier(String identifierAsString, String argName)
protected <T> T withIcebergTable(org.apache.spark.sql.connector.catalog.Identifier ident, Function<Table, T> func)
-
Field Details
-
STRING_MAP
protected static final org.apache.spark.sql.types.DataType STRING_MAP
-
STRING_ARRAY
protected static final org.apache.spark.sql.types.DataType STRING_ARRAY
-
-
Method Details
-
builder
-
parameters
Description copied from interface: Procedure
Returns the input parameters of this procedure.
-
outputType
public org.apache.spark.sql.types.StructType outputType()
Description copied from interface: Procedure
Returns the type of rows produced by this procedure.
-
call
public org.apache.spark.sql.catalyst.InternalRow[] call(org.apache.spark.sql.catalyst.InternalRow args)
Description copied from interface: Procedure
Executes this procedure.
Spark will align the provided arguments according to the input parameters defined in Procedure.parameters(), either by position or by name, before execution.
Implementations may provide a summary of execution by returning one or many rows as a result. The schema of output rows must match the defined output type in Procedure.outputType().
- Parameters:
args - input arguments
- Returns:
the result of executing this procedure with the given arguments
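The alignment step described for call can be pictured with a small, Spark-free sketch. It shows one plausible way positional and named arguments could be mapped onto the same ordered slots before execution. It is not Spark's actual implementation, and the parameter names table, older_than, and dry_run as well as the ArgumentAlignment class are hypothetical stand-ins for whatever parameters() declares.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: mimics aligning call arguments to declared parameters.
final class ArgumentAlignment {

    // Hypothetical declared parameter order (a real procedure exposes this via parameters()).
    static final List<String> PARAMS = Arrays.asList("table", "older_than", "dry_run");

    // Positional arguments fill slots left to right; missing trailing slots stay null.
    static Object[] alignByPosition(Object... values) {
        if (values.length > PARAMS.size()) {
            throw new IllegalArgumentException("Too many arguments: " + values.length);
        }
        return Arrays.copyOf(values, PARAMS.size());
    }

    // Named arguments are placed in the slot of the matching parameter, in any order.
    static Object[] alignByName(Map<String, Object> named) {
        Object[] slots = new Object[PARAMS.size()];
        for (Map.Entry<String, Object> entry : named.entrySet()) {
            int index = PARAMS.indexOf(entry.getKey());
            if (index < 0) {
                throw new IllegalArgumentException("Unknown argument: " + entry.getKey());
            }
            slots[index] = entry.getValue();
        }
        return slots;
    }

    public static void main(String[] args) {
        Map<String, Object> named = new LinkedHashMap<>();
        named.put("dry_run", true);
        named.put("table", "db.sample");
        // Both calls produce the same aligned row: [db.sample, null, true]
        System.out.println(Arrays.toString(alignByPosition("db.sample", null, true)));
        System.out.println(Arrays.toString(alignByName(named)));
    }
}
```

Either style ends up as one ordered row, which is why an implementation of call can read its arguments by index regardless of how the caller supplied them.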
-
description
Description copied from interface: Procedure
Returns the description of this procedure.
-
spark
protected org.apache.spark.sql.SparkSession spark()
-
actions
-
tableCatalog
protected org.apache.spark.sql.connector.catalog.TableCatalog tableCatalog()
-
modifyIcebergTable
-
withIcebergTable
-
toIdentifier
-
toCatalogAndIdentifier
protected Spark3Util.CatalogAndIdentifier toCatalogAndIdentifier(String identifierAsString, String argName, org.apache.spark.sql.connector.catalog.CatalogPlugin catalog)
-
loadSparkTable
-
loadRows
-
refreshSparkCache
protected void refreshSparkCache(org.apache.spark.sql.connector.catalog.Identifier ident, org.apache.spark.sql.connector.catalog.Table table)
-
filterExpression
protected Expression filterExpression(org.apache.spark.sql.connector.catalog.Identifier ident, String where)
-
newInternalRow
-
closeService
protected void closeService()
Closes this procedure's executor service if a new one was created with BaseProcedure.executorService(int, String). Does not block for any remaining tasks.
-
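To make "does not block for any remaining tasks" concrete, the following sketch uses only java.util.concurrent. It assumes the close is implemented with something like ExecutorService.shutdown(); this page does not confirm that, and the NonBlockingClose class is hypothetical. The point is that shutdown() returns immediately while a submitted task is still unfinished.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Illustrative only: shows a close that does not wait for in-flight work.
final class NonBlockingClose {

    // Returns {isShutdown, isTerminated} as observed right after shutdown(),
    // while one task is still blocked and therefore unfinished.
    static boolean[] shutdownWhileBusy() {
        ExecutorService pool = Executors.newFixedThreadPool(1);
        CountDownLatch release = new CountDownLatch(1);
        pool.execute(() -> {
            try {
                release.await(); // keeps the task pending until released below
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        pool.shutdown(); // returns immediately; already-submitted tasks still run
        boolean[] observed = {pool.isShutdown(), pool.isTerminated()};

        // Clean-up for the demo: let the task finish, then wait for termination.
        release.countDown();
        try {
            pool.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return observed;
    }

    public static void main(String[] args) {
        boolean[] observed = shutdownWhileBusy();
        System.out.println("isShutdown=" + observed[0] + ", isTerminated=" + observed[1]);
    }
}
```

A caller that needs all work to be finished before continuing would have to wait explicitly, for example with awaitTermination, since the close itself gives no such guarantee.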
executorService
Starts a new executor service which can be used by this procedure in its work. The pool will be automatically shut down if withIcebergTable(Identifier, Function) or modifyIcebergTable(Identifier, Function) is called. If these methods are not used, the service can be shut down with closeService() or left to be closed when this class is finalized.
- Parameters:
threadPoolSize - number of threads in the service
nameFormat - name prefix for threads created in this service
- Returns:
the new executor service owned by this procedure
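A minimal sketch of the ownership pattern described above, written against the JDK only: a fixed-size pool whose threads carry a name prefix, and a caller that shuts the pool down in a finally block once its work is done. The NamedPool class, the "-<index>" suffix appended to the prefix, and the daemon flag are assumptions made for illustration; they are not taken from this class's implementation.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative only: a named fixed thread pool that the caller owns and closes.
final class NamedPool {

    // Creates a pool of threadPoolSize threads named "<nameFormat>-<index>".
    static ExecutorService executorService(int threadPoolSize, String nameFormat) {
        AtomicInteger counter = new AtomicInteger();
        ThreadFactory factory = runnable -> {
            Thread thread = new Thread(runnable, nameFormat + "-" + counter.getAndIncrement());
            thread.setDaemon(true); // assumption: pool threads should not keep the JVM alive
            return thread;
        };
        return Executors.newFixedThreadPool(threadPoolSize, factory);
    }

    // Runs one task and reports the worker's name; the pool is always shut down afterwards.
    static String firstThreadName(int threadPoolSize, String nameFormat) {
        ExecutorService pool = executorService(threadPoolSize, nameFormat);
        try {
            return pool.submit(() -> Thread.currentThread().getName()).get();
        } catch (Exception e) {
            throw new IllegalStateException("Task failed", e);
        } finally {
            pool.shutdown(); // mirrors closing the service once the work is done
        }
    }

    public static void main(String[] args) {
        System.out.println(firstThreadName(2, "remove-orphans")); // prints "remove-orphans-0"
    }
}
```

Tying the shutdown to a finally block avoids relying on finalization, which the description mentions only as a fallback.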
-