Python in Excel Tutorial¶

PyData NYC 2024¶

https://nyc2024.pydata.org/cfp/talk/DQGWQF/

Presenters:¶

  • Jim Kitchen (Anaconda)
  • Timothy Hewitt (Anaconda)

Support¶

  • Owen Price (Anaconda)
  • Chris Gross (Microsoft)
  • Keyur Patel (Microsoft)

Necessary Tools¶

  • Python in Excel
    • Windows Desktop Excel (Microsoft 365 version)
    • Try typing =PY and hit tab
  • Anaconda Cloud account
    • Go to anaconda.cloud and register if you don't already have an account
  • Install AnacondaToolbox add-in
    • Home tab -> Add-ins
    • Search for "anaconda"
    • Click AnacondaToolbox, then install

Python in Excel Basics¶

1¶

Type =PY then hit <tab>.

Type

2+2

and press Ctrl+Enter.

Click the [PY] icon to view the returned type and value.


2¶

Let's try a multi-line cell.

Type

arr = np.arange(0, 50, 2)
arr + 100

Click the [PY] icon to view the returned type and value. Notice the custom view for lists and arrays.

Convert the return type to Excel Value. The result will spill onto the grid.


3¶

Python can refer to previously defined variables.

Type

arr.sum()

NOTE: This cell must be to the right and/or down from the previous cell. Python cells are executed in row-major order.


4¶

Python can refer to regular Excel values using xl()

Enter 5 in cell F1

Create a =PY cell and type

xl("F1") + 1

Within the Python formula bar, clicking on a cell or range will automatically add the correct xl() code to refer to the selected cell.


5¶

Python automatically converts Tables and Ranges into pandas DataFrames.

Paste the following into cell F10

={"Student","Score"; "Alice",94; "Bob",85; "Cesar",92; "Dot",88;"Eustace",79}

Create a =PY cell and type

df = xl("F10#", headers=True)
df.plot(kind='bar', x='Student', y='Score')

Switch the output type to Excel Value, then click the small icon to create a chart reference. This allows you to move and resize the chart independent of the chart creation.


6 - Anaconda Toolbox¶

The Anaconda Toolbox helps you write Python code which is executed using Python in Excel.

  1. Open Anaconda Toolbox
  2. Sign In using your anaconda.cloud account
    • If you don't have one, you can create one now
  3. Click "Visualize with Python"
  4. Create a new Bar Chart
  5. Select Data using Choose Range and highlight the Student/Score range
  6. Select "Student" on the X-Axis
  7. Select "Score" on the Y-Axis
  8. Click "Create" and choose a location for the chart

Data Conversion using xl()¶

Python in Excel has default conversion from Excel types to Python types when using xl(), but this can be overridden.


7 - Fresh Start¶

Delete Sheet 1 so we have a fresh start to work with.

Add a new sheet and rename it "Data Conversion".

You should only have two sheets:

  • Reviews
  • Data Conversion (active)

Scalar xl() conversion¶

8¶

Paste into cell B5

TRUE
Hello PyData
-159
3.14159
5/5/2005
2024/01/01 13:14:15
0:01:59.4

9¶

Create a =PY in C5 and type

val = xl("B5")
type(val).__name__

Drag the formula to see how each Excel value is converted into Python.

NOTE: Even though Excel stores dates as numbers, the xl() function looks at the formatting to infer dates and times.


10 - Don't Try this at Home!¶

In general, don't drag Python formulas like this

We're doing it here so xl() has a single value for input, but normally for contiguous data you want to select the whole range and work with it as a single table.

Repeating Python formulas will slow down the execution significantly!


Range xl() conversion¶

When selecting a range, Excel will attempt to infer if the range has headers.

Ranges are always converted into pandas DataFrames. If no headers are provided, it defaults to numbered headers (0-based).

11¶

Paste into cell F5

={"Student","Score"; "Alice",94; "Bob",85; "Cesar",92; "Dot",88;"Eustace",79}

12¶

Create a =PY cell in I5.

Drag F5:G10. Expect Excel to write xl("F5:G10", headers=True) in the formula bar.

Create another =PY cell in I6.

Drag F6:G10 (don't include the header). Excel now writes xl("F6:G10") in the formula bar.

In both cases, the output should say "[PY] DataFrame".


13 - Initialization¶

The Initialization section contains code which is run before all other Python cells.

It is currently read-only, but will likely be made editable at some future time.

When it is editable, the Data Conversion and Custom Repr code should be placed here. Until then, we will place it in Cell A1 so it is run before other Python cells.


Custom xl array conversion¶

When providing a custom conversion from an Excel Range to a Python object, obj is a list of lists.

14¶

In cell A1, type

def our_custom_conversion(obj, headers=False, **kwargs):
    return np.array(obj)

excel.set_xl_array_conversion(our_custom_conversion)

"xl() conversion function"

Excel ranges will now be converted to a numpy array instead of a pandas DataFrame.

This affects all Python cells.

NOTE: A good practice when definining functions or other things which return None is to end with a string giving some indication of what the cell does. Think of it like a code comment.

Cells I5 and I6 should now say "[PY] ndarray"


15¶

This is great if you only want numpy arrays, but we can make this more flexible.

Update cell A1 to read

def our_custom_conversion(obj, headers=False, type='DataFrame', **kwargs):
    if type in {'ndarray', 'numpy', 'np', 'array'}:
        if headers:
            raise ValueError('Numpy does not support headers')
        return np.array(obj, **kwargs)
    if type in {None, 'DataFrame', 'pandas', 'pd', 'dataframe'}:
        return excel.convert_to_dataframe(obj, headers=headers, **kwargs)
    raise TypeError(f'Unknown type: {type}')

excel.set_xl_array_conversion(our_custom_conversion)

"xl() conversion function"

We added a type keyword, allowing us to specify what output type we want. Importantly, the default is the usual DataFrame so we are adding extra functionality, not changing the default behavior.


16¶

Cells I5 and I6 should now say "[PY] DataFrame" again.

Modify cell I6 with type='array' to convert to a numpy array.


17 - Even more type handling¶

Modify cell A1.

Add code to create a simple dictionary.

if type in {dict, 'dict'}:
        return {row[0]: row[1] for row in obj}

Create a =PY cell in I7

xl("F6:G10", type=dict)

Now we can easily create Python dictionaries from two Excel columns. Converting first to pandas and then breaking it apart into a dictionary is possible, but this is much easier and much cleaner.


18¶

Let's add one more.

Modify cell A1.

Add code to create a list from either a vertical or horizontal input range.

if type in {list, 'list'}:
        if len(obj) == 1:
            return obj[0]
        return [row[0] for row in obj]

Create a =PY cell in I8

xl("G6:G10", type=list)

19 - Discussion¶

Why create and register custom xl() loaders?

For a small amount of code in a workbook, probably not worth the effort. But for a larger workbook, if pandas DataFrame is not the right input format, a lot of code will be created simply to undo that step, complicating the analysis and making the real logic harder to follow.


xl() is a MACRO, not a function¶

Magic is happening before your code is sent to Azure. Think of xl() as a pre-processing MACRO, not as a true function.

20¶

This will fail

addr = "A1"
xl(addr)

21¶

This will fail

f'This is a string talking about {xl("A1")}'

22¶

This will fail

from functools import partial

xlheaders = partial(xl, headers=True)
xlheaders("F5:G10")

23¶

The lesson is that while you can customize how xl() is handled, it isn't actually a true function, but rather acts like a pre-process MACRO in C. Excel does some magic to resolve the reference, and the result is given as obj to the conversion function. But don't expect xl() to be as flexible as normal Python functions.


Custom Reprs¶

Now we will look at controlling how Python objects are rendered in Excel, both as an Excel Value and as a Python Object.

24 - Separation of Concerns¶

Let's create a new sheet named "Custom Reprs" to keep our learning cleanly separated.


25¶

Create a =PY cell in D4

50

Look at the card view and notice the fields.

You can click the Extract icon to the right of each field to send it to the grid.

There is also an Insert Data icon when hovering on the Python cell to extract data. Extracting data when something already exists to the right will find the next available cell to the right.


26¶

These fields are linked. Let's change cell D4 and watch them update.

list(range(50))

Notice how the value changed as well as the datatype.

Even though the card view shows only the array preview, all fields are still available for use.


Built-in reprs¶

Excel has built-in reprs for common Python types.

27¶

The first is a list of values, truncated to limit the number shown.

You have seen this with list. Change D4 to a tuple and see that it uses the same repr.

tuple(range(50))

The same happens for 1-d numpy arrays.

np.array(range(50))

28¶

The next built-in repr is for pandas DataFrames.

Create a =PY cell in D6

df = xl("IPhoneReviews[#All]", headers=True)

The card view displays a Table view.


29¶

If we select a single column, the same view is used for the pandas Series.

df = xl("IPhoneReviews[#All]", headers=True)
df['ratingScore']

NOTE: The entity card always shows the row index. However, when switching the output to Excel Value, the row index is only shown if it isn't a pure RangeIndex (i.e. 0..n)

30¶

To see this behavior, change D6

df = xl("IPhoneReviews[#All]", headers=True)
df[df.country == 'Mexico']['ratingScore']

The row index is no longer a pure RangeIndex, so it will spill to the grid.


Images¶

Excel also has default handling for static images (either from matplotlib or the Python Image Library (PIL).

31¶

Create a =PY cell in D8

df.groupby('country')['ratingScore'].mean().plot(kind='bar')

We can view the plot in the card view. We can also send the chart to the grid either using a Field or by changing the output type to Excel Value.

For the Excel Value, a new icon appears to create a Reference. This is the easiest way to make the chart larger and move it wherever is appropriate.

However, with the Field option, we can still get a Reference by right-clicking and selecting Picture in Cell -> Create Reference.


Create a Custom Repr¶

What if your Python object is not one of the built-in types?

One approach is to manually return one of the built-in types as the last call in your cell.

32¶

Let's look at how Excel handles Python's complex numbers.

Hint: it doesn't

Create a =PY cell in D10

c = 25 + 42j

The default entity card is used.

Switch the output to Excel Value and we get an #N/A error. Excel doesn't know how to display a complex number in the grid.


33¶

Let's help Excel out by giving it something it does know how to render.

c = 25 + 42j
[c.real, c.imag]

Now we get the complex number split into real and imaginary components in the grid.

34¶

If we wanted it to spill horizontally, we could do that as well.

c = 25 + 42j
[[c.real, c.imag]]

Associate an Excel Value¶

Excel has a mechanism to associate any Python object with a repr function. Let's create one now.

NOTE: As mentioned before, this would ideally go in Initialization, but as it's not editable, we will place this in cell A1.

35¶

Create a =PY cell in A1

def _complex_value_repr(c):
    return [[c.real, c.imag]]

excel.repr.register_repr_xl_value(complex, _complex_value_repr)

This creates a function to return the horizontal result we built previously. Then we register the function for the complex type.


36¶

Change cell D10 so it returns a complex number

c = 25 + 42j

It now works because _complex_value_repr is being called to render the object as an Excel value.


Alternate method for Custom Classes¶

If you are creating your own custom Class, there is a special dunder method _repr_xl_value_ which will achieve the same result without requiring you to register a function.

If you are writing your own class, this is the better approach.

37¶

Create a =PY cell in A2

import math

class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def distance(self):
        return math.sqrt(self.x ** 2 + self.y ** 2)

    def _repr_xl_value_(self):
        return [['x', self.x],
                ['y', self.y]]

38¶

Create a Point in cell D12

p = Point(3, 4)

Change the output type to Excel Value and see how the _repr_xl_value_ function controls how the object is spilled to Excel.

NOTE: Notice that I didn't have to register this. Any class with this special function will use it to spill to the grid.


Custom Entity Cards¶

Excel has a way to customize the entity card by returning a specially formatted dictionary.

Detailed information about entity cards: https://learn.microsoft.com/en-us/office/dev/add-ins/excel/excel-data-types-entity-card

39¶

Let's create a custom entity card for complex numbers.

Add the following to the bottom of cell A1

def _complex_preview_repr(c):
    return {
        "type": "Entity",
        "text": f"complex({c.real}, {c.imag})",
        "properties": {
            "real": c.real,
            "imag": c.imag,
        },
    }

excel.repr.register_repr_xl_preview(complex, _complex_preview_repr)

"complex repr"

40¶

Switch cell D10 from Excel Value output to Python Object.

Look at the entity card and see the real and imag properties.

These are also available as Fields.


41¶

Why does imag come before real? Excel sorts Fields alphabetically.

We can change the order in the card view, but not in the Field list.


42¶

Let's create a custom entity card for our Point class.

Add the following to the bottom of cell B1

def _repr_xl_preview_(self):
        return {
            "type": "Entity",
            "text": f"Point(x={self.x}, y={self.y})",
            "properties": {
                "x": self.x,
                "y": self.y,
                "distance": self.distance(),
                "array": (self.x, self.y),
                "labels": [['x', self.x], ['y', self.y]]

            },
            "layouts": {
                "card": {
                    "sections": [
                        {
                            "layout": "List",
                            "properties": ["x", "y", "distance"],
                        },
                        {
                            "layout": "List",
                            "properties": ["array", "labels"],
                        }
                    ],
                },
            },

        }

"Point"

43¶

Change cell D12 from Excel Value to Python Object and look at the card view.

Click on the icon next to array and labels. It shows a nested view of just that element.


44¶

Extract the array Field. One thing to notice is that these fields are normal Excel objects. They can be placed anywhere (doesn't need to respect row-major order).

Also, you can mix them with other Excel formulas. Try this:

=TRANSPOSE(D12.array)

45 - Discussion¶

Why would we use custom reprs?

Pros:

  • Allow a single computation to return multiple values, each accessible as an Excel Field

Cons:

  • If you store too much data inside the returned result, it could slow down computation as all that data must be returned from Azure even if the spreadsheet isn't going to use it

46 - Recap of Custom Reprs¶

Associate Python types with a custom repr using excel.repr.register_repr_xl_preview(<type>, <function>)

If creating a new class, simply include a method def _repr_xl_preview_(self): ...


NLTK Analysis¶

Disclaimer: We will be using NLTK for this analysis. However, I am not an expert or even an intermediate user of NLTK. I'm using it here as an example of a workflow using real libraries.

This tutorial is ultimately about using Python in Excel, not best practices for using NLTK.

We will be using the Vader library within NLTK

https://www.nltk.org/howto/sentiment.html#vader

The dataset we will work with is a set of Amazon reviews about the iPhone.

https://www.kaggle.com/datasets/mrmars1010/iphone-customer-reviews-nlp


Setup¶

Create a new tab named "NLTK Analysis".

Make sure it is to the right of the other sheets, as we will use some of the data conversions defined previously.

47¶

On the "NLTK Analysis" sheet, create a =PY cell in A1

from nltk.sentiment.vader import SentimentIntensityAnalyzer

sia = SentimentIntensityAnalyzer()

df_reviews = xl("IPhoneReviews[#All]", headers=True)

Normally, this would go in Initialization, but it lives here until that page is editable.


Analysis¶

Rather than giving the full solution, we will walk through step-by-step so you can see the process of debugging and working towards the final goal, all in Excel.

48¶

Let's start by looking at the review descriptions.

Create a =PY cell in D1

reviews = df_reviews['reviewDescription']

49¶

Modify D1 to calculate a sentiment score for each review

reviews = df_reviews['reviewDescription']
scores = reviews.apply(sia.polarity_scores)

50¶

Oh no! A #PYTHON! error -> AttributeError: 'NoneType' object has no attribute 'encode'

Let's try getting the sentiment of just a single review.

reviews = df_reviews['reviewDescription']
#scores = reviews.apply(sia.polarity_scores)
review = reviews[0]
sia.polarity_scores(review)

Now we get a valid result.


51¶

Let's do a search on the error, with extra context for good measure.

pandas apply AttributeError: 'NoneType' object has no attribute 'encode'


52¶

Let's check for None values in the ratings description column.

reviews = df_reviews['reviewDescription']
#scores = reviews.apply(sia.polarity_scores)
reviews[reviews.isna()]

Apparently there are 87 blanks. That explains our error.


53 - Debate¶

Should we remove those rows or simply set the rating to an empty string?


54¶

For now, let's set them as empty strings, essentially giving each a completely neutral sentiment score.

reviews = df_reviews['reviewDescription']
reviews[reviews.isna()] = ""
scores = reviews.apply(sia.polarity_scores)

55¶

Now we have a column of dictionaries. Luckily pandas has a method to convert a dictionary into columns based on the keys.

reviews = df_reviews['reviewDescription']
reviews[reviews.isna()] = ""
scores = pd.json_normalize(reviews.apply(sia.polarity_scores))

56¶

Looks good. Let's combine with the original DataFrame.

reviews = df_reviews['reviewDescription']
reviews[reviews.isna()] = ""
scores = pd.json_normalize(reviews.apply(sia.polarity_scores))
df_reviews = pd.concat([df_reviews, scores], axis=1)

57¶

Are we done?

No, because no one is going to remember what each line is doing two days from now.

Let's add comments because we can -- one more thing that makes Python nicer than regular Excel formulas.

Remember: Readability counts. (Zen of Python)

reviews = df_reviews['reviewDescription']
# Convert missing reviews into empty reviews
reviews[reviews.isna()] = ""
# Compute sentiment scores (compound, neg, neu, pos)
scores = reviews.apply(sia.polarity_scores)
# Convert dictionaries to columns
scores = pd.json_normalize(scores)
# Combine original table with scores
df_reviews = pd.concat([df_reviews, scores], axis=1)

View the results¶

58¶

One benefit of Excel over Jupyter Notebooks is that the output is easy to view. Let's make the results visible.

Set cell D1 to output as Excel Value.

Then style using normal Excel options (bold header, column widths, etc).

Unfortunately, we can't convert this into a proper Excel Table.

Spilled results are dynamic and are incompatible with Tables.


Visualize the results¶

We will use the Anaconda Toolbox to demonstrate an easy way to make Python charts.

59¶

On the Formulas ribbon, click Anaconda Toolbox.

Sign In using your anaconda.cloud account.

60¶

Click "Visualize with Python", then create a new Box Plot.

For the Data, select "Choose Range" and click on cell D1. The Toolbox is smart enough to know you want the entire spilled table.

Choose ratingScore for the X-Axis.

Choose compount for the Y-Axis.

Choose R1 as the chart output location.

61¶

Update the title and labels on the Design tab.

Click Create.

62¶

Cell R1 now contains Python code to build the box plot.

Click "Create Reference" to view a larger image.

63 - Discussion¶

What about the chart is expected and what is a surprise?


Wordcloud¶

Excel can do box and whisker charts. Let's do something Excel can't do.

Wordcloud generates an image of common words in a paragraph. It's fun, it looks nice, and it can impress the boss.

64¶

We will use the Anaconda AI Assistant to help us.

Make a new chat and choose the "Aggregate or summarize a table or range" template.

Choose the range D1 so the chat knows about our combined results table.


65¶

Ignore the suggestions and type

Filter the table to only show rating 5, then create a wordcloud from the reviewDescription field

66¶

Click the purple Play button below the code block, and choose cell T26 as the location.

Once the wordcloud is generated, click "Create Reference" to make it larger.


Stopwords¶

Some of the most common words are not about sentiment. Let's remove these using STOPWORDS.

Stopwords are common, but unhelpful words like a, an, the, in, on, etc.

Wordcloud has a default list, but we can add to the list.

68¶

Paste into cell R25 using Paste Special -> Text:

Ignore Words
phone
iphone

69¶

Now convert the stopwords into an Excel Table.

Name it IgnoreWords.


70¶

Add a regular Excel 5 in cell T24


71¶

Overwrite cell T25 as follows:

_df = xl("'NLTK Analysis'!$D$1#", headers=True)

from wordcloud import WordCloud, STOPWORDS
import matplotlib.pyplot as plt

# Filter the dataframe for rating
filtered_df = _df[_df['ratingScore'] == xl("T25")]

# Combine all review descriptions into a single string
text = ' '.join(filtered_df['reviewDescription'].astype(str))

# Create a word cloud
stopwords = list(STOPWORDS) + xl("IgnoreWords[Ignore Words]", type=list)
wordcloud = WordCloud(width=800, height=400, background_color='white', stopwords=stopwords).generate(text)

# Display the word cloud
plt.figure(figsize=(10, 5))
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis('off')
plt.show()

72¶

Now add some more stop words: Apple, Amazon

The wordchart will regenerate.

NOTE: This is one benefit of creating a Table. As new rows are added, the reference will pick up the change. No need to update the range.


73¶

Add a regular Excel 4, 3, 2, 1 in cells U24:X24.

Drag the formula in cell T25 to the right. The Python formula should work as we made everything into references.

74¶

Add more stopwords as appropriate: day, will, one, even, à


Wrap Up¶

Python is a vast ecosystem with lots of amazing libraries.

Hopefully you have some new tools in your Excel toolbelt.

  • Data conversion from Excel to Python via xl()
  • Conversion from Python objects to Excel Objects
  • Custom entity cards with Fields
  • An example of using Excel Tables to control calculations
  • An example of how Python formulas still follow the same rules for duplicating formulas
In [ ]: