https://nyc2024.pydata.org/cfp/talk/DQGWQF/
=PY
and hit tabType =PY
then hit <tab>
.
Type
2+2
and press Ctrl+Enter.
Click the [PY]
icon to view the returned type and value.
Let's try a multi-line cell.
Type
arr = np.arange(0, 50, 2)
arr + 100
Click the [PY]
icon to view the returned type and value. Notice the custom view for lists and arrays.
Convert the return type to Excel Value. The result will spill onto the grid.
Python can refer to previously defined variables.
Type
arr.sum()
NOTE: This cell must be to the right and/or down from the previous cell. Python cells are executed in row-major order.
Python can refer to regular Excel values using xl()
Enter 5 in cell F1
Create a =PY
cell and type
xl("F1") + 1
Within the Python formula bar, clicking on a cell or range will automatically add the correct xl()
code to refer to the selected cell.
Python automatically converts Tables and Ranges into pandas DataFrames.
Paste the following into cell F10
={"Student","Score"; "Alice",94; "Bob",85; "Cesar",92; "Dot",88;"Eustace",79}
Create a =PY
cell and type
df = xl("F10#", headers=True)
df.plot(kind='bar', x='Student', y='Score')
Switch the output type to Excel Value, then click the small icon to create a chart reference. This allows you to move and resize the chart independent of the chart creation.
The Anaconda Toolbox helps you write Python code which is executed using Python in Excel.
Python in Excel has default conversion from Excel types to Python types when using xl()
, but this can be overridden.
Delete Sheet 1 so we have a fresh start to work with.
Add a new sheet and rename it "Data Conversion".
You should only have two sheets:
Drag the formula to see how each Excel value is converted into Python.
NOTE: Even though Excel stores dates as numbers, the xl()
function looks at the formatting to infer dates and times.
In general, don't drag Python formulas like this
We're doing it here so xl() has a single value for input, but normally for contiguous data you want to select the whole range and work with it as a single table.
Repeating Python formulas will slow down the execution significantly!
When selecting a range, Excel will attempt to infer if the range has headers.
Ranges are always converted into pandas DataFrames. If no headers are provided, it defaults to numbered headers (0-based).
Create a =PY
cell in I5.
Drag F5:G10. Expect Excel to write xl("F5:G10", headers=True)
in the formula bar.
Create another =PY
cell in I6.
Drag F6:G10 (don't include the header). Excel now writes xl("F6:G10")
in the formula bar.
In both cases, the output should say "[PY] DataFrame".
The Initialization section contains code which is run before all other Python cells.
It is currently read-only, but will likely be made editable at some future time.
When it is editable, the Data Conversion and Custom Repr code should be placed here. Until then, we will place it in Cell A1 so it is run before other Python cells.
When providing a custom conversion from an Excel Range to a Python object, obj
is a list of lists.
In cell A1, type
def our_custom_conversion(obj, headers=False, **kwargs):
return np.array(obj)
excel.set_xl_array_conversion(our_custom_conversion)
"xl() conversion function"
Excel ranges will now be converted to a numpy array instead of a pandas DataFrame.
This affects all Python cells.
NOTE: A good practice when definining functions or other things which return None
is to end with a string giving some indication of what the cell does. Think of it like a code comment.
Cells I5 and I6 should now say "[PY] ndarray"
This is great if you only want numpy arrays, but we can make this more flexible.
Update cell A1 to read
def our_custom_conversion(obj, headers=False, type='DataFrame', **kwargs):
if type in {'ndarray', 'numpy', 'np', 'array'}:
if headers:
raise ValueError('Numpy does not support headers')
return np.array(obj, **kwargs)
if type in {None, 'DataFrame', 'pandas', 'pd', 'dataframe'}:
return excel.convert_to_dataframe(obj, headers=headers, **kwargs)
raise TypeError(f'Unknown type: {type}')
excel.set_xl_array_conversion(our_custom_conversion)
"xl() conversion function"
We added a type
keyword, allowing us to specify what output type we want. Importantly, the default is the usual DataFrame
so we are adding extra functionality, not changing the default behavior.
Cells I5 and I6 should now say "[PY] DataFrame" again.
Modify cell I6 with type='array'
to convert to a numpy array.
Modify cell A1.
Add code to create a simple dictionary.
if type in {dict, 'dict'}:
return {row[0]: row[1] for row in obj}
Create a =PY
cell in I7
xl("F6:G10", type=dict)
Now we can easily create Python dictionaries from two Excel columns. Converting first to pandas and then breaking it apart into a dictionary is possible, but this is much easier and much cleaner.
Let's add one more.
Modify cell A1.
Add code to create a list from either a vertical or horizontal input range.
if type in {list, 'list'}:
if len(obj) == 1:
return obj[0]
return [row[0] for row in obj]
Create a =PY
cell in I8
xl("G6:G10", type=list)
Why create and register custom xl() loaders?
For a small amount of code in a workbook, probably not worth the effort. But for a larger workbook, if pandas DataFrame is not the right input format, a lot of code will be created simply to undo that step, complicating the analysis and making the real logic harder to follow.
Magic is happening before your code is sent to Azure. Think of xl() as a pre-processing MACRO, not as a true function.
This will fail
from functools import partial
xlheaders = partial(xl, headers=True)
xlheaders("F5:G10")
The lesson is that while you can customize how xl()
is handled, it isn't actually a true function, but rather acts like a pre-process MACRO in C. Excel does some magic to resolve the reference, and the result is given as obj
to the conversion function. But don't expect xl()
to be as flexible as normal Python functions.
Now we will look at controlling how Python objects are rendered in Excel, both as an Excel Value and as a Python Object.
Let's create a new sheet named "Custom Reprs" to keep our learning cleanly separated.
Create a =PY
cell in D4
50
Look at the card view and notice the fields.
You can click the Extract icon to the right of each field to send it to the grid.
There is also an Insert Data icon when hovering on the Python cell to extract data. Extracting data when something already exists to the right will find the next available cell to the right.
These fields are linked. Let's change cell D4 and watch them update.
list(range(50))
Notice how the value changed as well as the datatype.
Even though the card view shows only the array preview, all fields are still available for use.
The next built-in repr is for pandas DataFrames.
Create a =PY
cell in D6
df = xl("IPhoneReviews[#All]", headers=True)
The card view displays a Table view.
If we select a single column, the same view is used for the pandas Series.
df = xl("IPhoneReviews[#All]", headers=True)
df['ratingScore']
NOTE: The entity card always shows the row index. However, when switching the output to Excel Value, the row index is only shown if it isn't a pure RangeIndex (i.e. 0..n)
To see this behavior, change D6
df = xl("IPhoneReviews[#All]", headers=True)
df[df.country == 'Mexico']['ratingScore']
The row index is no longer a pure RangeIndex, so it will spill to the grid.
Excel also has default handling for static images (either from matplotlib or the Python Image Library (PIL).
Create a =PY
cell in D8
df.groupby('country')['ratingScore'].mean().plot(kind='bar')
We can view the plot in the card view. We can also send the chart to the grid either using a Field or by changing the output type to Excel Value.
For the Excel Value, a new icon appears to create a Reference. This is the easiest way to make the chart larger and move it wherever is appropriate.
However, with the Field option, we can still get a Reference by right-clicking and selecting Picture in Cell -> Create Reference.
What if your Python object is not one of the built-in types?
One approach is to manually return one of the built-in types as the last call in your cell.
Let's look at how Excel handles Python's complex numbers.
Hint: it doesn't
Create a =PY
cell in D10
c = 25 + 42j
The default entity card is used.
Switch the output to Excel Value and we get an #N/A error. Excel doesn't know how to display a complex number in the grid.
Let's help Excel out by giving it something it does know how to render.
c = 25 + 42j
[c.real, c.imag]
Now we get the complex number split into real and imaginary components in the grid.
Excel has a mechanism to associate any Python object with a repr function. Let's create one now.
NOTE: As mentioned before, this would ideally go in Initialization, but as it's not editable, we will place this in cell A1.
Create a =PY
cell in A1
def _complex_value_repr(c):
return [[c.real, c.imag]]
excel.repr.register_repr_xl_value(complex, _complex_value_repr)
This creates a function to return the horizontal result we built previously. Then we register the function for the complex
type.
Change cell D10 so it returns a complex number
c = 25 + 42j
It now works because _complex_value_repr
is being called to render the object as an Excel value.
If you are creating your own custom Class, there is a special dunder method _repr_xl_value_
which will achieve the same result without requiring you to register a function.
If you are writing your own class, this is the better approach.
Create a =PY
cell in A2
import math
class Point:
def __init__(self, x, y):
self.x = x
self.y = y
def distance(self):
return math.sqrt(self.x ** 2 + self.y ** 2)
def _repr_xl_value_(self):
return [['x', self.x],
['y', self.y]]
Create a Point
in cell D12
p = Point(3, 4)
Change the output type to Excel Value and see how the _repr_xl_value_
function controls how the object is spilled to Excel.
NOTE: Notice that I didn't have to register this. Any class with this special function will use it to spill to the grid.
Excel has a way to customize the entity card by returning a specially formatted dictionary.
Detailed information about entity cards: https://learn.microsoft.com/en-us/office/dev/add-ins/excel/excel-data-types-entity-card
Let's create a custom entity card for complex numbers.
Add the following to the bottom of cell A1
def _complex_preview_repr(c):
return {
"type": "Entity",
"text": f"complex({c.real}, {c.imag})",
"properties": {
"real": c.real,
"imag": c.imag,
},
}
excel.repr.register_repr_xl_preview(complex, _complex_preview_repr)
"complex repr"
Switch cell D10 from Excel Value output to Python Object.
Look at the entity card and see the real
and imag
properties.
These are also available as Fields.
Why does imag
come before real
? Excel sorts Fields alphabetically.
We can change the order in the card view, but not in the Field list.
Let's create a custom entity card for our Point
class.
Add the following to the bottom of cell B1
def _repr_xl_preview_(self):
return {
"type": "Entity",
"text": f"Point(x={self.x}, y={self.y})",
"properties": {
"x": self.x,
"y": self.y,
"distance": self.distance(),
"array": (self.x, self.y),
"labels": [['x', self.x], ['y', self.y]]
},
"layouts": {
"card": {
"sections": [
{
"layout": "List",
"properties": ["x", "y", "distance"],
},
{
"layout": "List",
"properties": ["array", "labels"],
}
],
},
},
}
"Point"
Change cell D12 from Excel Value to Python Object and look at the card view.
Click on the icon next to array
and labels
. It shows a nested view of just that element.
Extract the array
Field. One thing to notice is that these fields are normal Excel objects. They can be placed anywhere (doesn't need to respect row-major order).
Also, you can mix them with other Excel formulas. Try this:
=TRANSPOSE(D12.array)
Why would we use custom reprs?
Pros:
Cons:
Associate Python types with a custom repr using
excel.repr.register_repr_xl_preview(<type>, <function>)
If creating a new class, simply include a method
def _repr_xl_preview_(self): ...
Disclaimer: We will be using NLTK for this analysis. However, I am not an expert or even an intermediate user of NLTK. I'm using it here as an example of a workflow using real libraries.
This tutorial is ultimately about using Python in Excel, not best practices for using NLTK.
We will be using the Vader library within NLTK
The dataset we will work with is a set of Amazon reviews about the iPhone.
https://www.kaggle.com/datasets/mrmars1010/iphone-customer-reviews-nlp
Create a new tab named "NLTK Analysis".
Make sure it is to the right of the other sheets, as we will use some of the data conversions defined previously.
On the "NLTK Analysis" sheet, create a =PY
cell in A1
from nltk.sentiment.vader import SentimentIntensityAnalyzer
sia = SentimentIntensityAnalyzer()
df_reviews = xl("IPhoneReviews[#All]", headers=True)
Normally, this would go in Initialization, but it lives here until that page is editable.
Rather than giving the full solution, we will walk through step-by-step so you can see the process of debugging and working towards the final goal, all in Excel.
Let's start by looking at the review descriptions.
Create a =PY
cell in D1
reviews = df_reviews['reviewDescription']
Modify D1 to calculate a sentiment score for each review
reviews = df_reviews['reviewDescription']
scores = reviews.apply(sia.polarity_scores)
Oh no! A #PYTHON! error ->
AttributeError: 'NoneType' object has no attribute 'encode'
Let's try getting the sentiment of just a single review.
reviews = df_reviews['reviewDescription']
#scores = reviews.apply(sia.polarity_scores)
review = reviews[0]
sia.polarity_scores(review)
Now we get a valid result.
Let's do a search on the error, with extra context for good measure.
pandas apply AttributeError: 'NoneType' object has no attribute 'encode'
Let's check for None
values in the ratings description column.
reviews = df_reviews['reviewDescription']
#scores = reviews.apply(sia.polarity_scores)
reviews[reviews.isna()]
Apparently there are 87 blanks. That explains our error.
For now, let's set them as empty strings, essentially giving each a completely neutral sentiment score.
reviews = df_reviews['reviewDescription']
reviews[reviews.isna()] = ""
scores = reviews.apply(sia.polarity_scores)
Now we have a column of dictionaries. Luckily pandas has a method to convert a dictionary into columns based on the keys.
reviews = df_reviews['reviewDescription']
reviews[reviews.isna()] = ""
scores = pd.json_normalize(reviews.apply(sia.polarity_scores))
Looks good. Let's combine with the original DataFrame.
reviews = df_reviews['reviewDescription']
reviews[reviews.isna()] = ""
scores = pd.json_normalize(reviews.apply(sia.polarity_scores))
df_reviews = pd.concat([df_reviews, scores], axis=1)
Are we done?
No, because no one is going to remember what each line is doing two days from now.
Let's add comments because we can -- one more thing that makes Python nicer than regular Excel formulas.
Remember: Readability counts. (Zen of Python)
reviews = df_reviews['reviewDescription']
# Convert missing reviews into empty reviews
reviews[reviews.isna()] = ""
# Compute sentiment scores (compound, neg, neu, pos)
scores = reviews.apply(sia.polarity_scores)
# Convert dictionaries to columns
scores = pd.json_normalize(scores)
# Combine original table with scores
df_reviews = pd.concat([df_reviews, scores], axis=1)
One benefit of Excel over Jupyter Notebooks is that the output is easy to view. Let's make the results visible.
Set cell D1 to output as Excel Value.
Then style using normal Excel options (bold header, column widths, etc).
Unfortunately, we can't convert this into a proper Excel Table.
Spilled results are dynamic and are incompatible with Tables.
We will use the Anaconda Toolbox to demonstrate an easy way to make Python charts.
Click "Visualize with Python", then create a new Box Plot.
For the Data, select "Choose Range" and click on cell D1. The Toolbox is smart enough to know you want the entire spilled table.
Choose ratingScore for the X-Axis.
Choose compount for the Y-Axis.
Choose R1 as the chart output location.
Cell R1 now contains Python code to build the box plot.
Click "Create Reference" to view a larger image.
What about the chart is expected and what is a surprise?
Excel can do box and whisker charts. Let's do something Excel can't do.
Wordcloud generates an image of common words in a paragraph. It's fun, it looks nice, and it can impress the boss.
We will use the Anaconda AI Assistant to help us.
Make a new chat and choose the "Aggregate or summarize a table or range" template.
Choose the range D1 so the chat knows about our combined results table.
Ignore the suggestions and type
Filter the table to only show rating 5, then create a wordcloud from the reviewDescription field
Click the purple Play button below the code block, and choose cell T26 as the location.
Once the wordcloud is generated, click "Create Reference" to make it larger.
Some of the most common words are not about sentiment. Let's remove these using STOPWORDS.
Stopwords are common, but unhelpful words like a, an, the, in, on, etc
.
Wordcloud has a default list, but we can add to the list.
Overwrite cell T25 as follows:
_df = xl("'NLTK Analysis'!$D$1#", headers=True)
from wordcloud import WordCloud, STOPWORDS
import matplotlib.pyplot as plt
# Filter the dataframe for rating
filtered_df = _df[_df['ratingScore'] == xl("T25")]
# Combine all review descriptions into a single string
text = ' '.join(filtered_df['reviewDescription'].astype(str))
# Create a word cloud
stopwords = list(STOPWORDS) + xl("IgnoreWords[Ignore Words]", type=list)
wordcloud = WordCloud(width=800, height=400, background_color='white', stopwords=stopwords).generate(text)
# Display the word cloud
plt.figure(figsize=(10, 5))
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis('off')
plt.show()
Now add some more stop words: Apple, Amazon
The wordchart will regenerate.
NOTE: This is one benefit of creating a Table. As new rows are added, the reference will pick up the change. No need to update the range.
Add a regular Excel 4
, 3
, 2
, 1
in cells U24:X24.
Drag the formula in cell T25 to the right. The Python formula should work as we made everything into references.
Add more stopwords as appropriate: day, will, one, even, à
Python is a vast ecosystem with lots of amazing libraries.
Hopefully you have some new tools in your Excel toolbelt.