Class RemoveOrphanFilesProcedure

java.lang.Object
org.apache.iceberg.spark.procedures.RemoveOrphanFilesProcedure
All Implemented Interfaces:
org.apache.spark.sql.connector.catalog.procedures.BoundProcedure, org.apache.spark.sql.connector.catalog.procedures.Procedure, org.apache.spark.sql.connector.catalog.procedures.UnboundProcedure

public class RemoveOrphanFilesProcedure extends Object
A procedure that removes orphan files in a table.
See Also:
  • Field Details

    • STRING_MAP

      protected static final org.apache.spark.sql.types.DataType STRING_MAP
    • STRING_ARRAY

      protected static final org.apache.spark.sql.types.DataType STRING_ARRAY
  • Method Details

    • builder

      public static SparkProcedures.ProcedureBuilder builder()
    • bind

      public org.apache.spark.sql.connector.catalog.procedures.BoundProcedure bind(org.apache.spark.sql.types.StructType inputType)
    • parameters

      public org.apache.spark.sql.connector.catalog.procedures.ProcedureParameter[] parameters()
    • call

      public Iterator<org.apache.spark.sql.connector.read.Scan> call(org.apache.spark.sql.catalyst.InternalRow args)
    • name

      public String name()
    • description

      public String description()
    • requiredInParameter

      protected static org.apache.spark.sql.connector.catalog.procedures.ProcedureParameter requiredInParameter(String name, org.apache.spark.sql.types.DataType dataType)
    • optionalInParameter

      protected static org.apache.spark.sql.connector.catalog.procedures.ProcedureParameter optionalInParameter(String name, org.apache.spark.sql.types.DataType dataType)
    • optionalInParameter

      protected static org.apache.spark.sql.connector.catalog.procedures.ProcedureParameter optionalInParameter(String name, org.apache.spark.sql.types.DataType dataType, String defaultValue)
    • isDeterministic

      public boolean isDeterministic()
      Specified by:
      isDeterministic in interface org.apache.spark.sql.connector.catalog.procedures.BoundProcedure
    • spark

      protected org.apache.spark.sql.classic.SparkSession spark()
    • actions

      protected SparkActions actions()
    • tableCatalog

      protected org.apache.spark.sql.connector.catalog.TableCatalog tableCatalog()
    • modifyIcebergTable

      protected <T> T modifyIcebergTable(org.apache.spark.sql.connector.catalog.Identifier ident, Function<Table,T> func)
    • withIcebergTable

      protected <T> T withIcebergTable(org.apache.spark.sql.connector.catalog.Identifier ident, Function<Table,T> func)
    • toIdentifier

      protected org.apache.spark.sql.connector.catalog.Identifier toIdentifier(String identifierAsString, String argName)
    • toCatalogAndIdentifier

      protected Spark3Util.CatalogAndIdentifier toCatalogAndIdentifier(String identifierAsString, String argName, org.apache.spark.sql.connector.catalog.CatalogPlugin catalog)
    • loadSparkTable

      protected SparkTable loadSparkTable(org.apache.spark.sql.connector.catalog.Identifier ident)
    • loadRows

      protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> loadRows(org.apache.spark.sql.connector.catalog.Identifier tableIdent, Map<String,String> options)
    • refreshSparkCache

      protected void refreshSparkCache(org.apache.spark.sql.connector.catalog.Identifier ident, org.apache.spark.sql.connector.catalog.Table table)
    • filterExpression

      protected Expression filterExpression(org.apache.spark.sql.connector.catalog.Identifier ident, String where)
    • newInternalRow

      protected org.apache.spark.sql.catalyst.InternalRow newInternalRow(Object... values)
    • asScanIterator

      protected Iterator<org.apache.spark.sql.connector.read.Scan> asScanIterator(org.apache.spark.sql.types.StructType readSchema, org.apache.spark.sql.catalyst.InternalRow... rows)
    • closeService

      protected void closeService()
      Closes this procedure's executor service if a new one was created with BaseProcedure.executorService(int, String). Does not block for any remaining tasks.
    • executorService

      protected ExecutorService executorService(int threadPoolSize, String nameFormat)
      Starts a new executor service which can be used by this procedure in its work. The pool will be automatically shut down if withIcebergTable(Identifier, Function) or modifyIcebergTable(Identifier, Function) are called. If these methods are not used then the service can be shut down with closeService() or left to be closed when this class is finalized.
      Parameters:
      threadPoolSize - number of threads in the service
      nameFormat - name prefix for threads created in this service
      Returns:
      the new executor service owned by this procedure