Class RewritePositionDeleteFilesProcedure
java.lang.Object
org.apache.iceberg.spark.procedures.RewritePositionDeleteFilesProcedure
All Implemented Interfaces:
org.apache.spark.sql.connector.catalog.procedures.BoundProcedure,
org.apache.spark.sql.connector.catalog.procedures.Procedure,
org.apache.spark.sql.connector.catalog.procedures.UnboundProcedure
A procedure that rewrites position delete files in a table.
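In Iceberg's Spark integration this class backs the rewrite_position_delete_files stored procedure, so it is normally invoked from Spark SQL rather than instantiated directly. A sketch of such a call, assuming an Iceberg-enabled Spark session; `my_catalog` and `db.sample` are placeholder names:

```sql
-- Rewrite all eligible position delete files in the table
CALL my_catalog.system.rewrite_position_delete_files('db.sample');

-- Same call with named arguments and a rewrite option
CALL my_catalog.system.rewrite_position_delete_files(
  table => 'db.sample',
  options => map('rewrite-all', 'true')
);
```

The procedure returns metrics describing the rewritten delete files; see the Iceberg Spark procedures documentation for the full option list.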
Field Summary

protected static final org.apache.spark.sql.types.DataType STRING_MAP
protected static final org.apache.spark.sql.types.DataType STRING_ARRAY
Method Summary

protected SparkActions actions()
protected Iterator<org.apache.spark.sql.connector.read.Scan> asScanIterator(org.apache.spark.sql.types.StructType readSchema, org.apache.spark.sql.catalyst.InternalRow... rows)
org.apache.spark.sql.connector.catalog.procedures.BoundProcedure bind(org.apache.spark.sql.types.StructType inputType)
builder()
Iterator<org.apache.spark.sql.connector.read.Scan> call(org.apache.spark.sql.catalyst.InternalRow args)
protected void closeService()
    Closes this procedure's executor service if a new one was created with BaseProcedure.executorService(int, String).
description()
protected ExecutorService executorService(int threadPoolSize, String nameFormat)
    Starts a new executor service which can be used by this procedure in its work.
protected Expression filterExpression(org.apache.spark.sql.connector.catalog.Identifier ident, String where)
boolean isDeterministic()
protected org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> loadRows
protected SparkTable loadSparkTable(org.apache.spark.sql.connector.catalog.Identifier ident)
protected <T> T modifyIcebergTable(org.apache.spark.sql.connector.catalog.Identifier ident, Function<Table, T> func)
name()
protected org.apache.spark.sql.catalyst.InternalRow newInternalRow(Object... values)
protected static org.apache.spark.sql.connector.catalog.procedures.ProcedureParameter optionalInParameter(String name, org.apache.spark.sql.types.DataType dataType)
protected static org.apache.spark.sql.connector.catalog.procedures.ProcedureParameter optionalInParameter(String name, org.apache.spark.sql.types.DataType dataType, String defaultValue)
org.apache.spark.sql.connector.catalog.procedures.ProcedureParameter[] parameters()
protected void refreshSparkCache(org.apache.spark.sql.connector.catalog.Identifier ident, org.apache.spark.sql.connector.catalog.Table table)
protected static org.apache.spark.sql.connector.catalog.procedures.ProcedureParameter requiredInParameter(String name, org.apache.spark.sql.types.DataType dataType)
protected org.apache.spark.sql.classic.SparkSession spark()
protected org.apache.spark.sql.connector.catalog.TableCatalog tableCatalog()
protected Spark3Util.CatalogAndIdentifier toCatalogAndIdentifier(String identifierAsString, String argName, org.apache.spark.sql.connector.catalog.CatalogPlugin catalog)
protected org.apache.spark.sql.connector.catalog.Identifier toIdentifier(String identifierAsString, String argName)
protected <T> T withIcebergTable(org.apache.spark.sql.connector.catalog.Identifier ident, Function<Table, T> func)
Field Details

STRING_MAP
protected static final org.apache.spark.sql.types.DataType STRING_MAP

STRING_ARRAY
protected static final org.apache.spark.sql.types.DataType STRING_ARRAY
Method Details

builder

bind
public org.apache.spark.sql.connector.catalog.procedures.BoundProcedure bind(org.apache.spark.sql.types.StructType inputType)

parameters
public org.apache.spark.sql.connector.catalog.procedures.ProcedureParameter[] parameters()

call
public Iterator<org.apache.spark.sql.connector.read.Scan> call(org.apache.spark.sql.catalyst.InternalRow args)

name

description

requiredInParameter
protected static org.apache.spark.sql.connector.catalog.procedures.ProcedureParameter requiredInParameter(String name, org.apache.spark.sql.types.DataType dataType)

optionalInParameter
protected static org.apache.spark.sql.connector.catalog.procedures.ProcedureParameter optionalInParameter(String name, org.apache.spark.sql.types.DataType dataType)

optionalInParameter
protected static org.apache.spark.sql.connector.catalog.procedures.ProcedureParameter optionalInParameter(String name, org.apache.spark.sql.types.DataType dataType, String defaultValue)

isDeterministic
public boolean isDeterministic()
Specified by:
isDeterministic in interface org.apache.spark.sql.connector.catalog.procedures.BoundProcedure

spark
protected org.apache.spark.sql.classic.SparkSession spark()

actions
protected SparkActions actions()

tableCatalog
protected org.apache.spark.sql.connector.catalog.TableCatalog tableCatalog()

modifyIcebergTable
protected <T> T modifyIcebergTable(org.apache.spark.sql.connector.catalog.Identifier ident, Function<Table, T> func)

withIcebergTable
protected <T> T withIcebergTable(org.apache.spark.sql.connector.catalog.Identifier ident, Function<Table, T> func)

toIdentifier
protected org.apache.spark.sql.connector.catalog.Identifier toIdentifier(String identifierAsString, String argName)

toCatalogAndIdentifier
protected Spark3Util.CatalogAndIdentifier toCatalogAndIdentifier(String identifierAsString, String argName, org.apache.spark.sql.connector.catalog.CatalogPlugin catalog)

loadSparkTable
protected SparkTable loadSparkTable(org.apache.spark.sql.connector.catalog.Identifier ident)

loadRows

refreshSparkCache
protected void refreshSparkCache(org.apache.spark.sql.connector.catalog.Identifier ident, org.apache.spark.sql.connector.catalog.Table table)

filterExpression
protected Expression filterExpression(org.apache.spark.sql.connector.catalog.Identifier ident, String where)

newInternalRow
protected org.apache.spark.sql.catalyst.InternalRow newInternalRow(Object... values)

asScanIterator
protected Iterator<org.apache.spark.sql.connector.read.Scan> asScanIterator(org.apache.spark.sql.types.StructType readSchema, org.apache.spark.sql.catalyst.InternalRow... rows)
closeService
protected void closeService()
Closes this procedure's executor service if a new one was created with BaseProcedure.executorService(int, String). Does not block for any remaining tasks.

executorService
protected ExecutorService executorService(int threadPoolSize, String nameFormat)
Starts a new executor service which can be used by this procedure in its work. The pool is automatically shut down if withIcebergTable(Identifier, Function) or modifyIcebergTable(Identifier, Function) is called. If these methods are not used, the service can be shut down with closeService() or left to be closed when this class is finalized.
Parameters:
threadPoolSize - number of threads in the service
nameFormat - name prefix for threads created in this service
Returns:
the new executor service owned by this procedure
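The executor-service contract above can be illustrated with a standalone sketch. This is not Iceberg's implementation; it only demonstrates the pattern the documentation describes: a fixed-size pool whose worker threads carry a recognizable name prefix, shut down explicitly when the caller is done (analogous to closeService()). The class and method names here are hypothetical.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class NamedPoolSketch {

  // Builds a fixed-size pool whose threads are named via the given format,
  // mirroring the executorService(int threadPoolSize, String nameFormat) contract.
  static ExecutorService newNamedPool(int threadPoolSize, String nameFormat) {
    AtomicInteger counter = new AtomicInteger(0);
    ThreadFactory factory = runnable -> {
      Thread t = new Thread(runnable);
      // Each new worker gets the next index: nameFormat is expected to
      // contain a %d placeholder, e.g. "rewrite-pos-deletes-%d".
      t.setName(String.format(nameFormat, counter.getAndIncrement()));
      t.setDaemon(true); // assumption: daemon threads so an abandoned pool cannot block JVM exit
      return t;
    };
    return Executors.newFixedThreadPool(threadPoolSize, factory);
  }

  public static void main(String[] args) throws Exception {
    ExecutorService service = newNamedPool(2, "rewrite-pos-deletes-%d");
    // The submitted task reports which worker thread ran it.
    String workerName = service.submit(() -> Thread.currentThread().getName()).get();
    System.out.println(workerName); // first task runs on thread index 0

    // Analogous to closeService(): stop accepting tasks, then wait briefly.
    service.shutdown();
    service.awaitTermination(10, TimeUnit.SECONDS);
  }
}
```

Shutting the pool down explicitly, rather than relying on finalization, is the safer path the documentation itself recommends via closeService().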